# Zero-shot classification using Llama locally

### Import necessary packages

Before running the notebook, do the following in your terminal:
1. brew install ollama (installs the Ollama framework for running LLMs locally)
2. pip install langchain-ollama (inside your virtual environment)
3. ollama serve (starts the Ollama server)
4. get a model name from: https://ollama.com/library/
5. ollama run llama3.1:8b (in a second terminal; downloads the model and runs it. Use the same model tag that you pass to OllamaLLM further down, which is currently deepseek-r1:8b)
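
Before moving on, it can help to confirm that the server started by `ollama serve` is reachable. The sketch below is an illustration and not part of the original notebook. It assumes the default address `http://127.0.0.1:11434` and queries Ollama's `/api/tags` endpoint, which lists the locally available models. The helper name `ollama_is_running` is hypothetical.

```python
import urllib.error
import urllib.request


def ollama_is_running(base_url: str = "http://127.0.0.1:11434", timeout: float = 2.0) -> bool:
    """Return True if a server answers GET <base_url>/api/tags with status 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status
        return False
```

Calling `ollama_is_running()` before the first `llm.invoke(...)` gives a clear yes/no answer instead of a connection error in the middle of a long classification run.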


In [31]:
import pandas as pd
import os
from tqdm import tqdm
from sklearn.metrics import accuracy_score, recall_score, precision_score, f1_score, classification_report

import warnings
warnings.filterwarnings("ignore")

from langchain_ollama import OllamaLLM

import mlflow
from mlflow.sklearn import save_model
from mlflow.transformers import log_model
import logging

import config

### MLFlow setup

In [32]:
MODEL_NAME = "llama3.1"
with open("../.mlflow_uri") as uri_file:  # close the file handle after reading
    TRACKING_URI = uri_file.read().strip()
EXPERIMENT_NAME = config.EXPERIMENT_NAME

# Prefix every log message with a timestamp
logging.basicConfig(format="%(asctime)s: %(message)s")

logger = logging.getLogger()
logger.setLevel(logging.INFO)  # Show INFO and above (WARNING, ERROR); ignore DEBUG

### Load df

In [33]:
df = pd.read_csv('../data/data_small.csv')

In [34]:
df.head()

Unnamed: 0.1,Unnamed: 0,dataset,text,logical_fallacies,source
0,18384,8,Testing on animals could save the life of you ...,appeal_to_emotion,
1,11271,3,"I remember when China took over Hong Kong, I r...",none,
2,15702,4,": The only ""Light at the End of the Tunnel"", i...",appeal_to_emotion,
3,7148,3,So you only believe there are two ways to run ...,none,
4,8147,3,Keep things the way they are or change them co...,false_dilemma,


### Initialize local model using Ollama

In [35]:
# temperature is a sampling hyperparameter that controls how random ("creative") the output is.
# At 0 the model always picks the most likely next token; at higher values it can pick a less likely one.

# Initialize the local model served by Ollama
llm = OllamaLLM(
    model="deepseek-r1:8b",  # replace with the tag of the local model you pulled, e.g. llama3.1:8b
    temperature=0,           # no sampling randomness, deterministic output
    # Output length is left at the Ollama default. OllamaLLM does not define a
    # max_tokens field; num_predict is the option that caps generated tokens.
)
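
To make the effect of `temperature` concrete, here is a small sketch of the usual textbook formulation, in which token scores are divided by the temperature before a softmax. The scores are made up for illustration, and the exact sampling code inside Ollama may differ. The function name `softmax_with_temperature` is hypothetical.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; a lower temperature sharpens the distribution."""
    if temperature <= 0:
        # Temperature 0 is conventionally handled as greedy argmax, not as a division by zero
        raise ValueError("temperature must be > 0")
    scaled = [x / temperature for x in logits]
    max_scaled = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - max_scaled) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
for t in (1.0, 0.5, 0.1):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

As the temperature approaches 0, nearly all probability mass moves to the highest-scoring token, which is why `temperature=0` is used here for a classification task.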

### Define a prompt

In [36]:
def classify_fallacy(text: str) -> str:
    """Classifies text into one of the predefined logical fallacies."""
    try:
        prompt = f"""Classify the following text into exactly one logical fallacy category:
- faulty_generalization
- ad_hominem
- false_dilemma 
- appeal_to_authority
- appeal_to_emotion  
- none

Here are definitions of each category for reference:
1. **Faulty Generalization**: This fallacy occurs when an argument assumes something is true for a large population without having a large enough sample. A kind of overgeneralization.
2. **Ad Hominem**: This fallacy occurs when the speaker is attacking the other person or some aspect of them rather than addressing the argument itself.
3. **False Dilemma**: This fallacy occurs when only two options are presented in an argument, even though more options may exist. A case of “either this or that”.
4. **Appeal to Authority**: This fallacy occurs when an argument relies on the opinion or endorsement of an authority figure who may not have relevant expertise or whose expertise is questionable.
5. **Appeal to Emotion**: This fallacy occurs when emotion is used to support an argument, such as pity, fear, anger, etc.
6. **None**: There are no fallacies in this text!

Here are examples of each category for reference:
1. **Faulty Generalization**: "I read one report about corruption, so that industry must be corrupt."
2. **Ad Hominem**: "Do you even know what you're talking about?"
3. **False Dilemma**: "Do you recommend drinking or injecting bleach to fight Covid?"
4. **Appeal to Authority**: "Trust me, I am a lawyer, so I know how to handle your taxes."
5. **Appeal to Emotion**: "You murdered 100,000 people, called Coronavirus a hoax, fired doctors, and told Americans to inject themselves with bleach. Maybe you should shut the fuck up."
6. **None**: "I don't think that kind of logic is good. It's essentially saying that so long as the authoritarians repress their people enough and nobody can rise up against them, we should think things are okay. We should strive to ensure they have leaders who value democracy, not oppression and authoritarianism."

Text to classify: {text}

Respond ONLY with the category name and nothing else."""

        # Generate response using the local Llama model
        response = llm.invoke(prompt)
        
        # Extract and normalize response content
        prediction = response.strip().lower()
        valid_categories = ["faulty_generalization", "ad_hominem", "false_dilemma", 
                            "appeal_to_authority", "appeal_to_emotion", "none"]
        
        return prediction if prediction in valid_categories else "none"
        
    except Exception as e:
        print(f"Error processing text: {text[:50]}... | Error: {str(e)}")
        return "Error"
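
Note that the exact-match check in `classify_fallacy` maps every unexpected response to "none", so a parsing failure cannot be told apart from a genuine "none" prediction. Reasoning models such as deepseek-r1 commonly emit a `<think>...</think>` block before the answer, which would never match a bare category name. That is an assumption about the model's output format and was not verified against this run. The following is a minimal sketch of a more tolerant parser. `parse_prediction` and the `"unparsed"` label are hypothetical and not part of the original notebook.

```python
import re

VALID_CATEGORIES = (
    "faulty_generalization", "ad_hominem", "false_dilemma",
    "appeal_to_authority", "appeal_to_emotion", "none",
)

# Assumed format: reasoning text wrapped in <think>...</think> before the answer
_THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL | re.IGNORECASE)


def _category_pattern(category: str) -> re.Pattern:
    """Match the category as a whole word, allowing space, underscore or hyphen between parts."""
    parts = [re.escape(part) for part in category.split("_")]
    return re.compile(r"\b" + r"[ _-]".join(parts) + r"\b")


_PATTERNS = {category: _category_pattern(category) for category in VALID_CATEGORIES}


def parse_prediction(raw_response: str) -> str:
    """Return the single category named in the response, or 'unparsed' if none or several are found."""
    text = _THINK_RE.sub(" ", raw_response).strip().lower()
    found = [category for category, pattern in _PATTERNS.items() if pattern.search(text)]
    return found[0] if len(found) == 1 else "unparsed"
```

Keeping `"unparsed"` separate from `"none"` makes it possible to count how many responses failed to parse before computing any metrics.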

### Process dataframe

In [37]:
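def process_dataframe(df: pd.DataFrame, batch_size=10) -> pd.DataFrame:
    """Classify each row's text and return a copy of df with a 'predicted_fallacy' column.

    Rows are classified one at a time; batch_size only controls how often
    the results are written back to the copy.
    """
    result_df = df.copy()
    
    # Split the frame into consecutive batches of batch_size rows
    chunks = [df.iloc[i:i+batch_size] for i in range(0, len(df), batch_size)]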
    
    with tqdm(total=len(df), desc="Classifying Logical Fallacies") as pbar:
        for chunk in chunks:
            chunk_results = []
            for text in chunk['text']:
                result = classify_fallacy(text)
                chunk_results.append(result)
                pbar.update(1)
                
            # Update results for this chunk
            result_df.loc[chunk.index, 'predicted_fallacy'] = chunk_results
    
    return result_df

### Make predictions

In [38]:
# make predictions on the first 50 rows only
df_small = df.iloc[:50]

In [39]:
# Process the DataFrame and classify logical fallacies
processed_df = process_dataframe(df_small)

Classifying Logical Fallacies:   0%|          | 0/50 [00:00<?, ?it/s]2025-04-04 15:11:51,685: HTTP Request: POST http://127.0.0.1:11434/api/generate "HTTP/1.1 200 OK"
Classifying Logical Fallacies:   2%|▏         | 1/50 [00:15<12:32, 15.35s/it]2025-04-04 15:12:07,251: HTTP Request: POST http://127.0.0.1:11434/api/generate "HTTP/1.1 200 OK"
Classifying Logical Fallacies:   4%|▍         | 2/50 [00:36<15:05, 18.87s/it]2025-04-04 15:12:28,442: HTTP Request: POST http://127.0.0.1:11434/api/generate "HTTP/1.1 200 OK"
Classifying Logical Fallacies:   6%|▌         | 3/50 [01:08<19:33, 24.97s/it]2025-04-04 15:13:00,803: HTTP Request: POST http://127.0.0.1:11434/api/generate "HTTP/1.1 200 OK"
Classifying Logical Fallacies:   8%|▊         | 4/50 [01:28<17:37, 22.99s/it]2025-04-04 15:13:20,481: HTTP Request: POST http://127.0.0.1:11434/api/generate "HTTP/1.1 200 OK"
Classifying Logical Fallacies:  10%|█         | 5/50 [01:41<14:19, 19.09s/it]2025-04-04 15:13:32,474: HTTP Request: POST http://127.0

### Evaluation

In [44]:
processed_df.head(50)

Unnamed: 0.1,Unnamed: 0,dataset,text,logical_fallacies,source,predicted_fallacy
0,18384,8,Testing on animals could save the life of you ...,appeal_to_emotion,,none
1,11271,3,"I remember when China took over Hong Kong, I r...",none,,none
2,15702,4,": The only ""Light at the End of the Tunnel"", i...",appeal_to_emotion,,none
3,7148,3,So you only believe there are two ways to run ...,none,,none
4,8147,3,Keep things the way they are or change them co...,false_dilemma,,none
5,4242,2,If you don't agree to sign the labor agreement...,appeal_to_emotion,,none
6,12550,3,It is true that there are many people in gover...,none,,none
7,8372,3,either the company regularizes those employees...,false_dilemma,,none
8,12883,4,: ... False. Crack is a choice. A once in a...,none,,none
9,19528,9,Horribly wounded.,appeal_to_emotion,,none


In [41]:
print(classification_report(processed_df["logical_fallacies"], processed_df["predicted_fallacy"]))

                       precision    recall  f1-score   support

           ad_hominem       0.00      0.00      0.00         4
  appeal_to_authority       0.00      0.00      0.00         4
    appeal_to_emotion       0.00      0.00      0.00        13
        false_dilemma       0.00      0.00      0.00         5
faulty_generalization       0.00      0.00      0.00         5
                 none       0.38      1.00      0.55        19

             accuracy                           0.38        50
            macro avg       0.06      0.17      0.09        50
         weighted avg       0.14      0.38      0.21        50
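
Every row shown above was predicted as "none", and the report matches what a classifier that always answers "none" would produce for these class supports. As a check on the arithmetic, the sketch below recomputes the summary figures from the support counts printed in the report alone. `constant_baseline` is a hypothetical helper and not part of scikit-learn.

```python
# Support counts copied from the classification report above
support = {
    "ad_hominem": 4,
    "appeal_to_authority": 4,
    "appeal_to_emotion": 13,
    "false_dilemma": 5,
    "faulty_generalization": 5,
    "none": 19,
}


def constant_baseline(support: dict, predicted: str) -> dict:
    """Accuracy, macro F1 and weighted F1 of a classifier that always outputs `predicted`."""
    total = sum(support.values())
    f1_per_class = {}
    for label, count in support.items():
        if label == predicted:
            precision = count / total  # every row is predicted as this label
            recall = 1.0               # so every true instance of it is found
            f1_per_class[label] = 2 * precision * recall / (precision + recall)
        else:
            f1_per_class[label] = 0.0  # never predicted, so precision and recall are 0
    return {
        "accuracy": support[predicted] / total,
        "macro_f1": sum(f1_per_class.values()) / len(f1_per_class),
        "weighted_f1": sum(f1_per_class[label] * n for label, n in support.items()) / total,
    }


baseline = constant_baseline(support, "none")
print({name: round(value, 2) for name, value in baseline.items()})
```

Because the scores coincide with this constant baseline, they say little about the model's ability to recognize fallacies until the raw responses have been inspected.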

