# Zero-shot classification using a local LLM with Ollama

### Import necessary packages

Before coding, run the following in your terminal:
1. brew install ollama (installs the Ollama framework for running LLMs locally)
2. pip install langchain-ollama (inside your virtual environment)
3. ollama serve (starts the Ollama server)
4. pick a model name from: https://ollama.com/library/
5. ollama run name_of_model_from_ollama (e.g. llama3.1:8b; run in a second terminal; downloads the model and runs it)
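
Before running the notebook it can help to confirm that the server started by `ollama serve` is reachable and that the model has been downloaded. The sketch below is one possible check, not part of the original workflow. It assumes the default address `http://127.0.0.1:11434` (the one that appears in the request logs further down) and uses Ollama's `/api/tags` endpoint, which lists local models. The helper name `list_local_models` is hypothetical.

```python
import json
import urllib.error
import urllib.request


def list_local_models(base_url: str = "http://127.0.0.1:11434", timeout: float = 2.0):
    """Return the names of locally available models, or None if the server is unreachable."""
    try:
        # /api/tags returns a JSON object with a "models" list
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            payload = json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a response that is not valid JSON
        return None
    return [m.get("name") for m in payload.get("models", [])]


models = list_local_models()
if models is None:
    print("Ollama server not reachable; is `ollama serve` running?")
else:
    print("Local models:", models)
```

If the model you want is missing from the list, running it once with `ollama run` in a second terminal downloads it.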


In [1]:
import pandas as pd
import os
from tqdm import tqdm
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

import warnings
warnings.filterwarnings("ignore")

from langchain_ollama import OllamaLLM

import mlflow
from mlflow.sklearn import save_model
from mlflow.transformers import log_model
import logging

import config

### Setup

In [2]:
MODEL_NAME = "deepseek"
with open("../.mlflow_uri") as f:  # read the tracking URI and close the file afterwards
    TRACKING_URI = f.read().strip()
EXPERIMENT_NAME = config.EXPERIMENT_NAME

logging.basicConfig(format="%(asctime)s: %(message)s")  # prefix every log message with a timestamp

logger = logging.getLogger()
logger.setLevel(logging.INFO)  # show INFO and above (WARNING, ERROR); suppress DEBUG

In [3]:
DATA_PATH = "../data/data_val.csv"
SAVE_PATH = "../models/DeepSeek"

### Get data

In [4]:
df = pd.read_csv(DATA_PATH)

### Initialize local model using Ollama

In [5]:
# temperature is a sampling hyperparameter that controls how random ("creative") the output is.
# At 0 the model always picks the most likely token; at higher values it may pick a less likely one.

# Initialize the local model served by Ollama
llm = OllamaLLM(
    model="deepseek-r1:8b",  # replace with any model you have pulled locally, e.g. llama3.1:8b
    # temperature=0,        # uncomment for (near-)deterministic output
    max_tokens=None,      # no explicit cap on output length (adjust as needed)
)
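
To make the temperature comment concrete, here is a small, self-contained sketch of how temperature reshapes a probability distribution over candidate tokens. It is a textbook softmax illustration, not Ollama's internal code, and the logits are made-up numbers chosen only for the demonstration.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores into probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0; treat 0 as 'always pick the top token'")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens
for t in (1.0, 0.5, 0.1):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

As the temperature shrinks, almost all probability mass moves to the top-scoring token, which is why a temperature of 0 is usually described as deterministic.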

In [6]:
params = {
    # "temperature": 0,
    "max_tokens": None,
}

# setting the MLFlow connection and experiment
mlflow.set_tracking_uri(TRACKING_URI)
mlflow.set_experiment(EXPERIMENT_NAME)

mlflow.start_run()
run = mlflow.active_run()
print("Active run_id: {}".format(run.info.run_id))

mlflow.set_tag("model_name", MODEL_NAME)
mlflow.set_tag('mlflow.runName', 'deepseek-r1:8b')
mlflow.log_params(params)

Active run_id: 150fd5732ad44440a0a254ba5e74b478


### Define a prompt

In [7]:
# import re


# def classify_fallacy(text: str) -> str:
#     """Classifies text into one of the predefined logical fallacies."""
#     try:
#         prompt = f"""Classify the text below into exactly one logical fallacy category:
# - faulty_generalization
# - ad_hominem
# - false_dilemma 
# - appeal_to_authority
# - appeal_to_emotion  
# - none

# **Rules**:
# - Use ONLY the exact category names above
# - If no fallacy matches, use "none"
# - No explanations, only the category name

# Here are definitions of each category for reference:
# 1. **Faulty Generalization**: This fallacy occurs when an argument assumes something is true for a large population without having a large enough sample. A kind of overgeneralization.
# 2. **Ad Hominem**: This fallacy occurs when the speaker is attacking the other person or some aspect of them rather than addressing the argument itself.
# 3. **False Dilemma**: This fallacy occurs when only two options are presented in an argument, even though more options may exist. A case of “either this or that”.
# 4. **Appeal to Authority**: This fallacy occurs when an argument relies on the opinion or endorsement of an authority figure who may not have relevant expertise or whose expertise is questionable.
# 5. **Appeal to Emotion**: This fallacy occurs when emotion is used to support an argument, such as pity, fear, anger, etc.
# 6. **None**: There are no fallacies in this text!

# Here are examples of each category for reference:
# 1. **Faulty Generalization**: "I read one report about corruption, so that industry must be corrupt."
# 2. **Ad Hominem**: "Do you even know what you're talking about?"
# 3. **False Dilemma**: "Do you recommend drinking or injecting bleach to fight Covid?"
# 4. **Appeal to Authority**: "Trust me, I am a lawyer, so I know how to handle your taxes."
# 5. **Appeal to Emotion**: "You murdered 100,000 people, called Coronavirus a hoax, fired doctors, and told Americans to inject themselves with bleach. Maybe you should shut the fuck up."
# 6. **None**: "I don't think that kind of logic is good. It's essentially saying that so long as the authoritarians repress their people enough and nobody can rise up against them, we should think things are okay. We should strive to ensure they have leaders who value democracy, not oppression and authoritarianism."

# Text to classify: {text}

# ."""

#         # Generate response using the local Llama model
#         response = llm.invoke(prompt)
        
#         # for deepseek, remove <think>...</think> tags
#         # Remove <think>...</think> tags and line breaks
#         response = re.sub(r"<think>.*?</think>", "", response, flags=re.DOTALL).strip()

#         # Extract and normalize response content
#         prediction = response.strip().lower()
#         valid_categories = ["faulty_generalization", "ad_hominem", "false_dilemma", 
#                             "appeal_to_authority", "appeal_to_emotion", "none"]
        
#         return prediction if prediction in valid_categories else "error" #else "none"
        
#     except Exception as e:
#         print(f"Error processing text: {text[:50]}... | Error: {str(e)}")
#         return "error"

In [10]:
# Prompt variant: replaces faulty_generalization with slippery_slope
import re


def classify_fallacy(text: str) -> str:
    """Classifies text into one of the predefined logical fallacies."""
    try:
        prompt = f"""Classify the text below into exactly one logical fallacy category:
- slippery_slope
- ad_hominem
- false_dilemma 
- appeal_to_authority
- appeal_to_emotion  
- none

**Rules**:
- Use ONLY the exact category names above
- If no fallacy matches, use "none"
- No explanations, only the category name

Here are definitions of each category for reference:
1. **Slippery Slope**: This fallacy occurs when an argument suggests that a relatively minor initial action or event will lead to a chain of increasingly significant and undesirable consequences, often without sufficient evidence to support the causal connections between steps.
2. **Ad Hominem**: This fallacy occurs when the speaker is attacking the other person or some aspect of them rather than addressing the argument itself.
3. **False Dilemma**: This fallacy occurs when only two options are presented in an argument, even though more options may exist. A case of “either this or that”.
4. **Appeal to Authority**: This fallacy occurs when an argument relies on the opinion or endorsement of an authority figure who may not have relevant expertise or whose expertise is questionable.
5. **Appeal to Emotion**: This fallacy occurs when emotion is used to support an argument, such as pity, fear, anger, etc.
6. **None**: There are no fallacies in this text!

Here are examples of each category for reference:
1. **Slippery Slope**: "If we lower the voting age to 17, then people will argue for lowering it to 16, and eventually, babies will be voting."
2. **Ad Hominem**: "Do you even know what you're talking about?"
3. **False Dilemma**: "Do you recommend drinking or injecting bleach to fight Covid?"
4. **Appeal to Authority**: "Trust me, I am a lawyer, so I know how to handle your taxes."
5. **Appeal to Emotion**: "You murdered 100,000 people, called Coronavirus a hoax, fired doctors, and told Americans to inject themselves with bleach. Maybe you should shut the fuck up."
6. **None**: "I don't think that kind of logic is good. It's essentially saying that so long as the authoritarians repress their people enough and nobody can rise up against them, we should think things are okay. We should strive to ensure they have leaders who value democracy, not oppression and authoritarianism."

Text to classify: {text}

."""

        # Generate a response using the local model
        response = llm.invoke(prompt)

        # DeepSeek-R1 emits its reasoning inside <think>...</think>; strip that block
        response = re.sub(r"<think>.*?</think>", "", response, flags=re.DOTALL).strip()

        # Normalize the response and accept it only if it is an exact category name
        prediction = response.strip().lower()
        valid_categories = ["slippery_slope", "ad_hominem", "false_dilemma", 
                            "appeal_to_authority", "appeal_to_emotion", "none"]

        # Anything else is labeled "error" so it can be filtered out during evaluation
        return prediction if prediction in valid_categories else "error"
        
    except Exception as e:
        print(f"Error processing text: {text[:50]}... | Error: {str(e)}")
        return "error"
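
The exact-match check above rejects answers such as `**False Dilemma**.` or `Category: appeal-to-emotion`, even though the intended label is clear. The sketch below shows one possible, more lenient way to parse the response. It is an alternative and not the method used for the results in this notebook, so using it would change the numbers reported below. The helper name `extract_category` is hypothetical.

```python
import re

VALID_CATEGORIES = ["slippery_slope", "ad_hominem", "false_dilemma",
                    "appeal_to_authority", "appeal_to_emotion", "none"]


def extract_category(raw_response: str) -> str:
    """Map a raw model response to one valid category, or "error"."""
    # Drop the reasoning block that DeepSeek-R1 emits before its answer
    text = re.sub(r"<think>.*?</think>", "", raw_response, flags=re.DOTALL)
    # Normalize: lowercase, spaces/hyphens to underscores, drop markdown and punctuation
    text = text.strip().lower()
    text = re.sub(r"[\s\-]+", "_", text)
    text = re.sub(r"[^a-z_]", "", text).strip("_")
    if text in VALID_CATEGORIES:
        return text
    # Fall back to a category mentioned in the answer, but only if it is unambiguous
    found = [c for c in VALID_CATEGORIES if c in text]
    return found[0] if len(found) == 1 else "error"


print(extract_category("<think>reasoning...</think>\n\n**False Dilemma**."))
print(extract_category("Category: appeal-to-emotion"))
print(extract_category("ad_hominem or none"))
```

The substring fallback is deliberately conservative: an answer that names two categories stays an `error`. It can still misfire on words that happen to contain a label (for example "nonetheless" contains "none"), so it is worth spot-checking on real outputs.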

### Process dataframe

In [11]:
def process_dataframe(df: pd.DataFrame, batch_size=10) -> pd.DataFrame:
    """Classify every row of df['text'] and return a copy with a 'predicted_fallacy' column."""
    result_df = df.copy()
    
    # Split into small batches; predictions are written back to result_df after each batch
    chunks = [df.iloc[i:i+batch_size] for i in range(0, len(df), batch_size)]
    
    with tqdm(total=len(df), desc="Classifying Logical Fallacies") as pbar:  # tqdm progress bar
        for chunk in chunks:
            chunk_results = []
            for text in chunk['text']:
                result = classify_fallacy(text)
                chunk_results.append(result)
                pbar.update(1)
                
            # Update results for this chunk
            result_df.loc[chunk.index, 'predicted_fallacy'] = chunk_results
    
    return result_df
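
At roughly 20 seconds per text (see the progress output below), a full pass over 1350 rows takes several hours, and `process_dataframe` keeps everything in memory until it finishes. The sketch below shows one way to make such a loop resumable by appending each prediction to a JSON Lines file. It is an illustration under assumptions, not part of the original notebook: `classify_with_checkpoint`, the file name, and the stand-in classifier are all hypothetical, and in the notebook you would pass `classify_fallacy` instead.

```python
import json
import os
import tempfile


def classify_with_checkpoint(texts, classify, checkpoint_path):
    """Classify texts in order, appending each result to a JSON Lines checkpoint file."""
    done = {}
    if os.path.exists(checkpoint_path):
        # Reload predictions saved by an earlier (possibly interrupted) run
        with open(checkpoint_path, encoding="utf-8") as fh:
            for line in fh:
                record = json.loads(line)
                done[record["index"]] = record["label"]
    with open(checkpoint_path, "a", encoding="utf-8") as fh:
        for index, text in enumerate(texts):
            if index in done:
                continue  # already classified in an earlier run
            label = classify(text)
            done[index] = label
            fh.write(json.dumps({"index": index, "label": label}) + "\n")
            fh.flush()  # hand the line to the OS before the next slow call
    return [done[i] for i in range(len(texts))]


# Usage with a stand-in classifier that records how often it is called
calls = []

def fake_classify(text):
    calls.append(text)
    return "none"

path = os.path.join(tempfile.mkdtemp(), "predictions.jsonl")
first = classify_with_checkpoint(["a", "b"], fake_classify, path)
second = classify_with_checkpoint(["a", "b", "c"], fake_classify, path)
print(second, len(calls))
```

On the second call only the new text is classified; the first two labels come from the checkpoint file.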

### Make predictions

In [12]:
logger.info('prediction of logical fallacies')
# Process the DataFrame and classify logical fallacies
processed_df = process_dataframe(df)

2025-04-12 12:51:41,579: prediction of logical fallacies
Classifying Logical Fallacies:   0%|          | 0/1350 [00:00<?, ?it/s]2025-04-12 12:51:45,938: HTTP Request: POST http://127.0.0.1:11434/api/generate "HTTP/1.1 200 OK"
Classifying Logical Fallacies:   0%|          | 1/1350 [00:24<9:13:45, 24.63s/it]2025-04-12 12:52:06,737: HTTP Request: POST http://127.0.0.1:11434/api/generate "HTTP/1.1 200 OK"
Classifying Logical Fallacies:   0%|          | 2/1350 [00:41<7:29:57, 20.03s/it]2025-04-12 12:52:23,391: HTTP Request: POST http://127.0.0.1:11434/api/generate "HTTP/1.1 200 OK"
Classifying Logical Fallacies:   0%|          | 3/1350 [01:05<8:05:58, 21.65s/it]2025-04-12 12:52:47,254: HTTP Request: POST http://127.0.0.1:11434/api/generate "HTTP/1.1 200 OK"
Classifying Logical Fallacies:   0%|          | 4/1350 [01:25<7:54:38, 21.16s/it]2025-04-12 12:53:07,379: HTTP Request: POST http://127.0.0.1:11434/api/generate "HTTP/1.1 200 OK"
Classifying Logical Fallacies:   0%|          | 5/1350 [01

### Evaluation

In [5]:
y_true = processed_df["logical_fallacies"] 
y_pred = processed_df["predicted_fallacy"]


In [8]:
def get_metrics(y_true, y_pred):
    logger.info('classification_report')
    classification_report_dict = classification_report(y_true, y_pred, output_dict=True)
    print(classification_report(y_true, y_pred))

    logger.info('confusion_matrix')
    print(confusion_matrix(y_true, y_pred))

    return classification_report_dict

In [9]:
def log_metrics(cr):
    for key, value in cr.items():
        if key == "accuracy":
            mlflow.log_metric(key, value)  # Logging accuracy directly
        else:
            for metric in value:
                mlflow.log_metric(f"{key}_{metric}", value.get(metric))  # Logging other metrics
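
To see which metric names `log_metrics` produces, the sketch below flattens a report dictionary the same way but returns the names instead of sending them to MLflow. The sample dictionary is hand-written in the shape that `classification_report(..., output_dict=True)` returns, with values copied from the first report below. The helper name `flatten_report` is hypothetical.

```python
def flatten_report(report: dict) -> dict:
    """Flatten a classification-report dict into {metric_name: value} pairs."""
    flat = {}
    for key, value in report.items():
        if isinstance(value, dict):
            for metric, score in value.items():
                flat[f"{key}_{metric}"] = score
        else:
            flat[key] = value  # "accuracy" is a single float, not a nested dict
    return flat


# Hand-written sample in the shape returned with output_dict=True
sample = {
    "none": {"precision": 0.30, "recall": 0.78, "f1-score": 0.43, "support": 250},
    "accuracy": 0.50,
    "macro avg": {"precision": 0.54, "recall": 0.45, "f1-score": 0.45, "support": 1350},
}
for name, value in flatten_report(sample).items():
    print(name, value)
```

Note that the per-class `support` counts are logged as metrics too, and that aggregate rows keep the space in their name (for example `macro avg_precision`).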

In [16]:
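# Note: predictions still include the "error" label here; those rows are filtered out after reloading the CSV
logger.info('predictions')

report = get_metrics(y_true, y_pred)  # separate name, so sklearn's classification_report is not shadowed
log_metrics(report)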

2025-04-12 19:59:56,684: predictions
2025-04-12 19:59:56,684: classification_report
2025-04-12 19:59:56,758: confusion_matrix


                     precision    recall  f1-score   support

         ad_hominem       0.87      0.42      0.56       250
appeal_to_authority       0.80      0.39      0.53       250
  appeal_to_emotion       0.52      0.20      0.28       250
              error       0.00      0.00      0.00         0
      false_dilemma       0.75      0.60      0.66       250
               none       0.30      0.78      0.43       250
     slippery_slope       0.56      0.79      0.65       100

           accuracy                           0.50      1350
          macro avg       0.54      0.45      0.45      1350
       weighted avg       0.64      0.50      0.51      1350

[[104  11  16   5  13  97   4]
 [  4  98   5  11  13 107  12]
 [  8   2  49   4  11 164  12]
 [  0   0   0   0   0   0   0]
 [  1   1   6   3 149  69  21]
 [  2  10  13   3  13 195  14]
 [  0   0   5   0   0  16  79]]
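
The confusion matrix above has one row per true label and one column per predicted label, both in the sorted order used by the report (`ad_hominem`, `appeal_to_authority`, `appeal_to_emotion`, `error`, `false_dilemma`, `none`, `slippery_slope`). As a worked check, the sketch below recomputes a few of the reported numbers directly from the matrix. The matrix values are copied from the output above; the helper names are hypothetical.

```python
labels = ["ad_hominem", "appeal_to_authority", "appeal_to_emotion", "error",
          "false_dilemma", "none", "slippery_slope"]
matrix = [
    [104, 11, 16, 5, 13, 97, 4],
    [4, 98, 5, 11, 13, 107, 12],
    [8, 2, 49, 4, 11, 164, 12],
    [0, 0, 0, 0, 0, 0, 0],
    [1, 1, 6, 3, 149, 69, 21],
    [2, 10, 13, 3, 13, 195, 14],
    [0, 0, 5, 0, 0, 16, 79],
]


def recall(i):
    """Correct predictions for class i divided by all true examples of class i (row sum)."""
    row_total = sum(matrix[i])
    return matrix[i][i] / row_total if row_total else 0.0


def precision(i):
    """Correct predictions for class i divided by all predictions of class i (column sum)."""
    col_total = sum(row[i] for row in matrix)
    return matrix[i][i] / col_total if col_total else 0.0


none_idx = labels.index("none")
error_idx = labels.index("error")
total = sum(sum(row) for row in matrix)
accuracy = sum(matrix[i][i] for i in range(len(labels))) / total
n_errors = sum(row[error_idx] for row in matrix)

print("none recall:", round(recall(none_idx), 2))
print("none precision:", round(precision(none_idx), 2))
print("accuracy:", round(accuracy, 2))
print("responses labeled 'error':", n_errors)
```

The `none` column shows the main failure mode: the model falls back to `none` for many texts that do contain a fallacy, which gives `none` high recall but low precision.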


In [17]:
mlflow.end_run()

🏃 View run deepseek-r1:8b at: http://127.0.0.1:5001/#/experiments/823412171152425451/runs/150fd5732ad44440a0a254ba5e74b478
🧪 View experiment at: http://127.0.0.1:5001/#/experiments/823412171152425451


### Save CSV

In [18]:
processed_df.to_csv(SAVE_PATH, index=False)

### Open CSV

In [4]:
df = pd.read_csv(SAVE_PATH)

In [9]:
df.head()

Unnamed: 0,dataset,text,logical_fallacies,source,two_class_target,predicted_fallacy
0,3,Either the Prime Minister lets independent med...,false_dilemma,,fallacy,none
1,3,Turkey really needs to get rid of Erdogan and ...,false_dilemma,,fallacy,none
2,3,I understand Turkey s position here They can t...,false_dilemma,,fallacy,false_dilemma
3,3,I don t know why the international community e...,false_dilemma,,fallacy,none
4,1,If it is very low for instance around one it m...,false_dilemma,http://business.financialpost.com/opinion/ross...,fallacy,slippery_slope


In [5]:
df_woe = df[df["predicted_fallacy"] != "error"]

In [6]:
y_true_reload = df_woe["logical_fallacies"] 
y_pred_reload = df_woe["predicted_fallacy"]

In [10]:
# Re-evaluate after dropping rows whose prediction is "error"
logger.info('predictions')

report_reload = get_metrics(y_true_reload, y_pred_reload)  # separate name, so sklearn's classification_report is not shadowed
log_metrics(report_reload)

2025-04-12 21:58:59,161: predictions
2025-04-12 21:58:59,163: classification_report
2025-04-12 21:58:59,195: confusion_matrix


                     precision    recall  f1-score   support

         ad_hominem       0.87      0.42      0.57       245
appeal_to_authority       0.80      0.41      0.54       239
  appeal_to_emotion       0.52      0.20      0.29       246
      false_dilemma       0.75      0.60      0.67       247
               none       0.30      0.79      0.44       247
     slippery_slope       0.56      0.79      0.65       100

           accuracy                           0.51      1324
          macro avg       0.63      0.54      0.53      1324
       weighted avg       0.64      0.51      0.51      1324

[[104  11  16  13  97   4]
 [  4  98   5  13 107  12]
 [  8   2  49  11 164  12]
 [  1   1   6 149  69  21]
 [  2  10  13  13 195  14]
 [  0   0   5   0  16  79]]
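
Dropping the 26 `error` rows barely changes accuracy (0.50 to 0.51), but macro-average precision jumps from 0.54 to 0.63. The short calculation below illustrates why, using the rounded per-class precisions from the two reports. In the first report the `error` class counts as a seventh class with precision 0, which pulls the unweighted mean down.

```python
# Per-class precision as printed in the reports (rounded to two decimals)
precisions = {
    "ad_hominem": 0.87,
    "appeal_to_authority": 0.80,
    "appeal_to_emotion": 0.52,
    "false_dilemma": 0.75,
    "none": 0.30,
    "slippery_slope": 0.56,
}

# First report: "error" appears as an extra class with precision 0.00
with_error = list(precisions.values()) + [0.0]
macro_with_error = sum(with_error) / len(with_error)

# Second report: rows with "error" predictions were removed, leaving six classes
macro_without_error = sum(precisions.values()) / len(precisions)

print(round(macro_with_error, 2), round(macro_without_error, 2))
```

Because the inputs are already rounded, this reproduces the reported macro averages only to two decimals. The weighted average is unaffected by the same effect because the `error` class has zero support.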
