# Zero-shot logical fallacy classification with a local LLM (Ollama)

### Import necessary packages

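Before running the notebook, run the following in a terminal:
1. `brew install ollama` (installs the Ollama runtime for serving LLMs locally, via Homebrew)
2. `pip install langchain-ollama` (inside your virtual environment; the notebook also needs pandas, scikit-learn, tqdm and mlflow)
3. `ollama serve` (starts the Ollama server)
4. Pick a model name from https://ollama.com/library/
5. `ollama run <model_name>` in a second terminal, e.g. `ollama run llama3.1:8b` or `ollama run deepseek-r1:8b` (the model used below); this downloads the model on first use and then runs it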
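Before starting the notebook it can help to confirm that the server is up and the model has been pulled. The following is a minimal sketch, assuming Ollama listens on its default address `http://127.0.0.1:11434` (the address that appears in the request log further down) and exposes `GET /api/tags`, which lists locally available models. `list_ollama_models` is a hypothetical helper, not part of any library.

```python
import json
import urllib.error
import urllib.request


def list_ollama_models(base_url: str = "http://127.0.0.1:11434", timeout: float = 2.0):
    """Return the names of locally available Ollama models, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            payload = json.load(resp)
    except OSError:  # URLError, connection refused and timeouts are all OSError subclasses
        return None
    # Each entry is expected to look like {"name": "deepseek-r1:8b", ...}
    return [m.get("name") for m in payload.get("models", [])]


if __name__ == "__main__":
    models = list_ollama_models()
    if models is None:
        print("Ollama server not reachable: start it with `ollama serve`")
    elif "deepseek-r1:8b" not in models:
        print("Model missing: run `ollama pull deepseek-r1:8b`")
    else:
        print("Ready:", models)
```

Returning `None` rather than raising keeps the check usable at the top of a notebook without an extra try/except.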


In [1]:
import logging
import warnings

import mlflow
import pandas as pd
from langchain_ollama import OllamaLLM
from sklearn.metrics import classification_report, confusion_matrix
from tqdm import tqdm

import config  # local module that provides EXPERIMENT_NAME

warnings.filterwarnings("ignore")

### Setup

In [2]:
MODEL_NAME = "deepseek"
with open("../.mlflow_uri") as f:  # file containing the MLflow tracking server URI
    TRACKING_URI = f.read().strip()
EXPERIMENT_NAME = config.EXPERIMENT_NAME

logging.basicConfig(format="%(asctime)s: %(message)s")  # prefix every message with a timestamp

logger = logging.getLogger()
logger.setLevel(logging.INFO)  # show INFO and above (WARNING, ERROR); hide DEBUG
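Setting the root logger to INFO also surfaces one `HTTP Request: POST ...` line per model call in the progress output further down. That format matches the `httpx` library, which the Ollama client appears to use for its requests. A minimal sketch, assuming those lines come from the logger named `"httpx"`, that silences them while keeping the notebook's own INFO messages:

```python
import logging

logging.basicConfig(format="%(asctime)s: %(message)s")
logging.getLogger().setLevel(logging.INFO)

# httpx logs every request at INFO; raising only that logger's level hides those lines
# without affecting messages sent through the root logger.
logging.getLogger("httpx").setLevel(logging.WARNING)

logging.getLogger().info("prediction of logical fallacies")  # still shown
```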

In [3]:
DATA_PATH = "../data/data_dropped_duplicates_small.csv"
SAVE_PATH = "../models/DeepSeek"

### Get data

In [4]:
df = pd.read_csv(DATA_PATH)

### Initialize local model using Ollama

In [5]:
# temperature controls sampling randomness: 0 always picks the most likely next token
# (near-deterministic output); higher values let less likely tokens be sampled.

# Initialize the local DeepSeek-R1 model served by Ollama
llm = OllamaLLM(
    model="deepseek-r1:8b",  # any model pulled with Ollama, e.g. "llama3.1:8b"
    # temperature=0,         # uncomment for near-deterministic output
    num_predict=None,        # max tokens to generate; None keeps Ollama's default
)

In [None]:
# Keep the logged params in sync with the OllamaLLM arguments above
params = {
    # "temperature": 0,
    "num_predict": None,
}

# Connect to the MLflow tracking server and select the experiment
mlflow.set_tracking_uri(TRACKING_URI)
mlflow.set_experiment(EXPERIMENT_NAME)

run = mlflow.start_run()  # returns the ActiveRun; closed later with mlflow.end_run()
print("Active run_id: {}".format(run.info.run_id))

mlflow.set_tag("model_name", MODEL_NAME)
mlflow.set_tag("mlflow.runName", "deepseek-r1:8b")
mlflow.log_params(params)

Active run_id: ec2ca99970f943daac67b393bdd123e7


### Define a prompt

In [7]:
import re


def classify_fallacy(text: str) -> str:
    """Classifies text into one of the predefined logical fallacies."""
    try:
        prompt = f"""Classify the text below into exactly one logical fallacy category:
- faulty_generalization
- ad_hominem
- false_dilemma 
- appeal_to_authority
- appeal_to_emotion  
- none

**Rules**:
- Use ONLY the exact category names above
- If no fallacy matches, use "none"
- No explanations, only the category name

Here are definitions of each category for reference:
1. **Faulty Generalization**: This fallacy occurs when an argument assumes something is true for a large population without having a large enough sample. A kind of overgeneralization.
2. **Ad Hominem**: This fallacy occurs when the speaker is attacking the other person or some aspect of them rather than addressing the argument itself.
3. **False Dilemma**: This fallacy occurs when only two options are presented in an argument, even though more options may exist. A case of “either this or that”.
4. **Appeal to Authority**: This fallacy occurs when an argument relies on the opinion or endorsement of an authority figure who may not have relevant expertise or whose expertise is questionable.
5. **Appeal to Emotion**: This fallacy occurs when emotion is used to support an argument, such as pity, fear, anger, etc.
6. **None**: There are no fallacies in this text!

Here are examples of each category for reference:
1. **Faulty Generalization**: "I read one report about corruption, so that industry must be corrupt."
2. **Ad Hominem**: "Do you even know what you're talking about?"
3. **False Dilemma**: "Do you recommend drinking or injecting bleach to fight Covid?"
4. **Appeal to Authority**: "Trust me, I am a lawyer, so I know how to handle your taxes."
5. **Appeal to Emotion**: "You murdered 100,000 people, called Coronavirus a hoax, fired doctors, and told Americans to inject themselves with bleach. Maybe you should shut the fuck up."
6. **None**: "I don't think that kind of logic is good. It's essentially saying that so long as the authoritarians repress their people enough and nobody can rise up against them, we should think things are okay. We should strive to ensure they have leaders who value democracy, not oppression and authoritarianism."

Text to classify: {text}

."""

        # Generate a response with the local model
        response = llm.invoke(prompt)

        # DeepSeek-R1 emits its reasoning inside <think>...</think>; drop it so only the answer remains
        response = re.sub(r"<think>.*?</think>", "", response, flags=re.DOTALL)

        # Normalize the answer (surrounding whitespace and case)
        prediction = response.strip().lower()
        valid_categories = {"faulty_generalization", "ad_hominem", "false_dilemma",
                            "appeal_to_authority", "appeal_to_emotion", "none"}

        # Answers outside the label set are flagged as "error" instead of being mapped to "none"
        return prediction if prediction in valid_categories else "error"

    except Exception as e:
        print(f"Error processing text: {str(text)[:50]}... | Error: {e}")
        return "error"
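The exact-match check above turns answers such as `**Ad Hominem**.` or `The answer is: false_dilemma` into `"error"`. A more tolerant parser could recover answers that are correct but formatted differently, if any of the error rows are of that kind. This is a hypothetical sketch (`normalize_label` is not from the original notebook); switching to it changes which rows count as `"error"`, so its metrics would not be directly comparable with the run logged here.

```python
import re

VALID_CATEGORIES = (
    "faulty_generalization", "ad_hominem", "false_dilemma",
    "appeal_to_authority", "appeal_to_emotion", "none",
)


def normalize_label(raw: str) -> str:
    """Map a raw model answer to one of VALID_CATEGORIES, or "error" if no single label is found."""
    # Drop DeepSeek-R1 style reasoning blocks
    text = re.sub(r"<think>.*?</think>", "", raw, flags=re.DOTALL)
    # Lower-case, then turn runs of spaces/hyphens into underscores ("Ad Hominem" -> "ad_hominem")
    text = re.sub(r"[\s\-]+", "_", text.strip().lower())
    # Remove markdown and punctuation such as **, quotes and trailing periods
    text = re.sub(r"[^a-z_]", "", text).strip("_")
    if text in VALID_CATEGORIES:
        return text
    # Fall back to a category name embedded in a longer answer, but only if exactly one matches
    # ("none" is skipped here because it can occur inside unrelated words)
    hits = [c for c in VALID_CATEGORIES if c != "none" and c in text]
    return hits[0] if len(hits) == 1 else "error"
```

Inside `classify_fallacy`, the last lines of the `try` block would then become `return normalize_label(response)`.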

### Process dataframe

In [8]:
def process_dataframe(df: pd.DataFrame, batch_size: int = 10) -> pd.DataFrame:
    """Classify every row of df["text"] and store the label in a new "predicted_fallacy" column.

    Requests are still sent one at a time; batching only controls how often results are written back.
    """
    result_df = df.copy()

    # Split into consecutive row batches (positional slicing)
    chunks = [df.iloc[i:i + batch_size] for i in range(0, len(df), batch_size)]

    with tqdm(total=len(df), desc="Classifying Logical Fallacies") as pbar:  # tqdm progress bar
        for chunk in chunks:
            chunk_results = []
            for text in chunk["text"]:
                chunk_results.append(classify_fallacy(text))
                pbar.update(1)

            # Write this batch's predictions back by index label
            result_df.loc[chunk.index, "predicted_fallacy"] = chunk_results

    return result_df
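At roughly 15 s per row (see the progress log below), 1,000 rows take about four hours, and `process_dataframe` keeps results only in memory until it returns. A minimal sketch of checkpointing instead: each prediction is appended to a CSV as soon as it is produced, and rows already in the file are skipped when the cell is re-run. The helper name `classify_with_checkpoint` and the checkpoint path used below are hypothetical.

```python
import csv
import os
from typing import Callable, Iterable


def classify_with_checkpoint(
    items: Iterable[tuple[int, str]],
    classify: Callable[[str], str],
    path: str,
) -> dict[int, str]:
    """Classify (row_id, text) pairs, appending each result to a CSV so an interrupted run can resume."""
    done: dict[int, str] = {}
    if os.path.exists(path):
        with open(path, newline="") as f:
            for row in csv.DictReader(f):
                done[int(row["row_id"])] = row["predicted_fallacy"]

    new_file = not os.path.exists(path)
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["row_id", "predicted_fallacy"])
        if new_file:
            writer.writeheader()
        for row_id, text in items:
            if row_id in done:
                continue  # already classified in an earlier session
            label = classify(text)
            writer.writerow({"row_id": row_id, "predicted_fallacy": label})
            f.flush()  # persist after every row
            done[row_id] = label
    return done
```

Usage with the notebook's objects could look like `preds = classify_with_checkpoint(df_small["text"].items(), classify_fallacy, "../models/deepseek_checkpoint.csv")` followed by `processed_df = df_small.assign(predicted_fallacy=df_small.index.map(preds))`.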

### Make predictions

In [9]:
# Classify only the first 1,000 rows: at ~15 s per row locally this already takes about 4 hours
df_small = df.iloc[:1000]

In [10]:
logger.info('prediction of logical fallacies')
# Process the DataFrame and classify logical fallacies
processed_df = process_dataframe(df_small)

2025-04-07 14:56:31,155: prediction of logical fallacies
Classifying Logical Fallacies:   0%|          | 0/1000 [00:00<?, ?it/s]2025-04-07 14:56:33,733: HTTP Request: POST http://127.0.0.1:11434/api/generate "HTTP/1.1 200 OK"
Classifying Logical Fallacies:   0%|          | 1/1000 [00:12<3:29:16, 12.57s/it]2025-04-07 14:56:44,562: HTTP Request: POST http://127.0.0.1:11434/api/generate "HTTP/1.1 200 OK"
Classifying Logical Fallacies:   0%|          | 2/1000 [00:32<4:37:56, 16.71s/it]2025-04-07 14:57:03,573: HTTP Request: POST http://127.0.0.1:11434/api/generate "HTTP/1.1 200 OK"
Classifying Logical Fallacies:   0%|          | 3/1000 [00:47<4:30:49, 16.30s/it]2025-04-07 14:57:19,364: HTTP Request: POST http://127.0.0.1:11434/api/generate "HTTP/1.1 200 OK"
Classifying Logical Fallacies:   0%|          | 4/1000 [01:01<4:11:20, 15.14s/it]2025-04-07 14:57:33,056: HTTP Request: POST http://127.0.0.1:11434/api/generate "HTTP/1.1 200 OK"
Classifying Logical Fallacies:   0%|          | 5/1000 [01

### Evaluation

In [13]:
y_true = processed_df["logical_fallacies"] 
y_pred = processed_df["predicted_fallacy"]

In [8]:
def get_metrics(y_true, y_pred):
    logger.info('classification_report')
    classification_report_dict = classification_report(y_true, y_pred, output_dict=True)
    print(classification_report(y_true, y_pred))

    logger.info('confusion_matrix')
    print(confusion_matrix(y_true, y_pred))

    return classification_report_dict
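The printed confusion matrices below carry no labels. By default scikit-learn orders string labels alphabetically, so rows (true labels) and columns (predictions) follow the same order as the classification report: ad_hominem, appeal_to_authority, appeal_to_emotion, error, false_dilemma, faulty_generalization, none. A small sketch that makes this explicit by wrapping the matrix in a labelled DataFrame; `labeled_confusion_matrix` is a hypothetical helper.

```python
import pandas as pd
from sklearn.metrics import confusion_matrix


def labeled_confusion_matrix(y_true, y_pred) -> pd.DataFrame:
    """Confusion matrix with true labels as rows and predicted labels as columns."""
    # Passing the labels explicitly keeps the (alphabetical) order visible and stable
    labels = sorted(set(y_true) | set(y_pred))
    cm = confusion_matrix(y_true, y_pred, labels=labels)
    return pd.DataFrame(
        cm,
        index=pd.Index(labels, name="true"),
        columns=pd.Index(labels, name="predicted"),
    )
```

Inside `get_metrics`, `print(labeled_confusion_matrix(y_true, y_pred))` could replace the bare `confusion_matrix` print.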

In [9]:
def log_metrics(cr):
    for key, value in cr.items():
        if key == "accuracy":
            mlflow.log_metric(key, value)  # Logging accuracy directly
        else:
            for metric in value:
                mlflow.log_metric(f"{key}_{metric}", value.get(metric))  # Logging other metrics
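`log_metrics` sends one `mlflow.log_metric` call per value. An alternative sketch flattens the report first and logs everything with a single `mlflow.log_metrics(dict)` call. Replacing spaces in names such as `"macro avg"` is a stylistic choice in this sketch, not an MLflow requirement; `flatten_report` is a hypothetical helper.

```python
import re


def flatten_report(report: dict) -> dict:
    """Turn classification_report(output_dict=True) into flat {metric_name: value} pairs."""
    flat = {}
    for label, value in report.items():
        if isinstance(value, dict):
            for metric, score in value.items():
                # e.g. "macro avg" + "f1-score" -> "macro_avg_f1-score"
                key = re.sub(r"\s+", "_", f"{label}_{metric}")
                flat[key] = float(score)
        else:
            flat[label] = float(value)  # "accuracy" is a bare float
    return flat
```

Usage inside the active run: `mlflow.log_metrics(flatten_report(report))`.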

In [19]:
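# Note: predictions include the "error" label (unparseable model output); those rows are dropped further below
logger.info('predictions')

report = get_metrics(y_true, y_pred)  # don't name this classification_report: it would shadow sklearn's function
log_metrics(report)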

2025-04-07 20:50:59,911: predictions
2025-04-07 20:50:59,912: classification_report
2025-04-07 20:50:59,956: confusion_matrix


                       precision    recall  f1-score   support

           ad_hominem       0.70      0.48      0.57        82
  appeal_to_authority       0.57      0.29      0.38        70
    appeal_to_emotion       0.28      0.15      0.19       136
                error       0.00      0.00      0.00         0
        false_dilemma       0.55      0.62      0.58        86
faulty_generalization       0.36      0.61      0.46       131
                 none       0.63      0.63      0.63       495

             accuracy                           0.53      1000
            macro avg       0.44      0.40      0.40      1000
         weighted avg       0.54      0.53      0.52      1000

[[ 39   2   5   1   1  11  23]
 [  1  20   6   3   2   6  32]
 [  2   2  20   4   6  19  83]
 [  0   0   0   0   0   0   0]
 [  0   0   7   1  53   8  17]
 [  0   2   8   3   6  80  32]
 [ 14   9  25  10  28  96 313]]


In [20]:
mlflow.end_run()

🏃 View run deepseek-r1:8b at: http://127.0.0.1:5001/#/experiments/823412171152425451/runs/ec2ca99970f943daac67b393bdd123e7
🧪 View experiment at: http://127.0.0.1:5001/#/experiments/823412171152425451


### Save CSV

In [23]:
processed_df.to_csv(SAVE_PATH, index=False)

### Open CSV

In [4]:
df_wp = pd.read_csv(SAVE_PATH)

In [5]:
df_wp_woe = df_wp[df_wp["predicted_fallacy"] != "error"]  # keep only rows with a valid predicted label (drops 22 rows)

In [6]:
y_true_reload = df_wp_woe["logical_fallacies"] 
y_pred_reload = df_wp_woe["predicted_fallacy"]

In [11]:
# Metrics recomputed without the "error" rows
logger.info('predictions')

report = get_metrics(y_true_reload, y_pred_reload)  # don't name this classification_report: it would shadow sklearn's function
log_metrics(report)

2025-04-09 16:20:22,620: predictions
2025-04-09 16:20:22,622: classification_report
2025-04-09 16:20:22,651: confusion_matrix


                       precision    recall  f1-score   support

           ad_hominem       0.70      0.48      0.57        81
  appeal_to_authority       0.57      0.30      0.39        67
    appeal_to_emotion       0.28      0.15      0.20       132
        false_dilemma       0.55      0.62      0.59        85
faulty_generalization       0.36      0.62      0.46       128
                 none       0.63      0.65      0.64       485

             accuracy                           0.54       978
            macro avg       0.52      0.47      0.47       978
         weighted avg       0.54      0.54      0.53       978

[[ 39   2   5   1  11  23]
 [  1  20   6   2   6  32]
 [  2   2  20   6  19  83]
 [  0   0   7  53   8  17]
 [  0   2   8   6  80  32]
 [ 14   9  25  28  96 313]]
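Dropping the "error" rows does not change how many predictions are correct. Both confusion matrices have the same diagonal, 39 + 20 + 20 + 53 + 80 + 313 = 525, so accuracy moves only because the denominator shrinks: 525 / 1000 = 0.525 versus 525 / 978 ≈ 0.537. Likewise, the macro-averaged precision rises from 0.44 to 0.52 mainly because the empty "error" class (precision 0.00) no longer enters the average: the six remaining per-class precisions sum to 3.09, and 3.09 / 7 ≈ 0.44 while 3.09 / 6 ≈ 0.52. A small sketch that reports accuracy both ways; `accuracy_both_ways` is a hypothetical helper.

```python
def accuracy_both_ways(y_true, y_pred, error_label="error"):
    """Accuracy with unparseable predictions counted as wrong vs. dropped from the denominator."""
    pairs = list(zip(y_true, y_pred))
    kept = [(t, p) for t, p in pairs if p != error_label]
    correct_all = sum(t == p for t, p in pairs)
    correct_kept = sum(t == p for t, p in kept)
    return {
        "counted_as_wrong": correct_all / len(pairs),
        "dropped": correct_kept / len(kept) if kept else float("nan"),
        "n_errors": len(pairs) - len(kept),
    }
```

Applied to the reloaded file, `accuracy_both_ways(df_wp["logical_fallacies"], df_wp["predicted_fallacy"])` should reproduce the two accuracies above, which makes it explicit which convention a reported number uses.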
