# 🧠 Claude 3.5 Sonnet – Human vs AI Text Classifier
This notebook uses Anthropic's Claude 3.5 Sonnet model to classify whether each text input was written by a human or generated by AI.

### 📦 Install the required package (if needed)

In [None]:
# !pip install anthropic

### 📚 Import the required libraries

In [None]:
import os
import csv
import anthropic
import pandas as pd

### 🔐 Define your API key

In [None]:
api_key = "sk-ant-..."  # Replace with your real key

### ✅ Validate the API key format

In [None]:
if not api_key or not api_key.startswith("sk-ant-"):
    raise ValueError("❌ Invalid CLAUDE_KEY. The key must start with sk-ant-.")
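
A key pasted into a notebook cell is easy to leak when the notebook is shared. As a hedged alternative, the sketch below reads the key from an environment variable and applies the same prefix check. The variable name `CLAUDE_KEY` is taken from the error message above; the helper `load_api_key` is hypothetical and not part of the `anthropic` package.

```python
import os


def load_api_key(env_var: str = "CLAUDE_KEY") -> str:
    """Return the key stored in `env_var`, or raise if it looks malformed.

    The helper and the default variable name are illustrative assumptions.
    """
    key = os.environ.get(env_var, "").strip()
    if not key.startswith("sk-ant-"):
        raise ValueError(f"{env_var} is missing or does not start with 'sk-ant-'.")
    return key
```

With the variable exported in the shell before starting Jupyter, `api_key = load_api_key()` could replace the hard-coded assignment.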

### 🤖 Initialize the Claude client

In [None]:
client = anthropic.Anthropic(api_key=api_key)

### 📝 Build the prompt with instructions and examples

In [None]:
prompt = """
You are an advanced AI content detection system, designed to distinguish between texts written by humans and those generated by artificial intelligence.  
You will act as an automated evaluator similar to tools like GPTZero, analyzing the linguistic patterns, structure, and writing style of each passage to determine its most likely origin: Human or AI.

Instructions:
- Human: if the text is written by a human.
- AI: if the text is generated by an AI.
- Ignore the ID when analyzing the text.
- Output strictly in CSV format: ID;Label
- Use exactly \"Human\" or \"AI\" as labels.
- No explanations. No headers. No extra formatting.

Example Input:
ID;Text
E0-1;The use of statistical tools in climate modeling has evolved significantly over time.
E0-2;Unlock the power of the universe with our AI-driven magic story generator.

Example Output:
E0-1;Human
E0-2;AI
"""

### 📂 Read and format the input dataset

In [None]:
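# The input file is tab-separated despite its .csv extension.
rows = []
with open("data/submission3_inputs.csv", mode="r", encoding="utf-8", newline="") as file:
    reader = csv.reader(file, delimiter="\t")
    next(reader)  # Skip the header row
    for row in reader:
        # Reformat each row as ID;Text to match the example in the prompt
        rows.append(f"{row[0]};{row[1]}\n")
file_content = "".join(rows)

# Note: re-running this cell appends the dataset again; re-run the prompt cell first.
prompt += "\n### Input Dataset:\n" + file_content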
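
Appending the whole dataset to a single prompt works for small files, but the reply length is capped by `max_tokens`, so a long input can come back with a truncated list of labels. One possible mitigation is to split the rows into fixed-size batches and send one request per batch. This is a minimal sketch: `batch_rows` and the batch size are hypothetical, and the sample rows are made up for illustration.

```python
from typing import Iterator, List, Sequence, Tuple


def batch_rows(rows: Sequence[Tuple[str, str]], size: int = 50) -> Iterator[List[str]]:
    """Yield lists of "ID;Text" lines, with at most `size` lines per batch."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(rows), size):
        yield [f"{row_id};{text}" for row_id, text in rows[start:start + size]]


# Illustrative rows only; real rows would come from the input file.
sample_rows = [("E1-1", "first text"), ("E1-2", "second text"), ("E1-3", "third text")]
batches = list(batch_rows(sample_rows, size=2))
```

Each batch can then be joined with newlines and appended to a fresh copy of the instruction prompt.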

### 📡 Send the prompt to Claude 3.5 Sonnet and get a prediction

In [None]:
message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=4000,
    temperature=0.0,
    top_p=1,
    messages=[{"role": "user", "content": prompt}]
)
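
The Messages API reports why generation stopped in the response's `stop_reason` field. A value of `"max_tokens"` means the reply hit the token limit, so some labels are probably missing. The sketch below shows one way to guard against that. `ensure_complete` is a hypothetical helper, and the `SimpleNamespace` object is only a stand-in for a real response.

```python
from types import SimpleNamespace


def ensure_complete(message) -> str:
    """Return the reply text, raising if the model stopped at the token limit."""
    if message.stop_reason == "max_tokens":
        raise RuntimeError("Reply was cut off at max_tokens; labels may be missing.")
    return message.content[0].text


# Stand-in mimicking the two attributes used above (not a real API response).
fake_message = SimpleNamespace(
    stop_reason="end_turn",
    content=[SimpleNamespace(text="E0-1;Human")],
)
reply_text = ensure_complete(fake_message)
```

In the notebook, `ensure_complete(message)` could be called before the parsing step.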

### 🧾 Parse Claude's response into a clean format

In [None]:
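# Expect one "ID;Label" pair per line; lines without a separator are skipped.
results = message.content[0].text.strip().split('\n')
# Split on the first ';' only and trim whitespace so labels such as "Human  " stay clean.
parsed = [[part.strip() for part in row.split(';', 1)] for row in results if ';' in row]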
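
Even with strict instructions, a model may add a preamble or use a label outside the allowed set. A stricter parser can keep only well-formed rows and collect everything else for inspection. This is a sketch under those assumptions: `parse_predictions` and `VALID_LABELS` are hypothetical names, and the sample reply is invented.

```python
from typing import List, Tuple

VALID_LABELS = {"Human", "AI"}


def parse_predictions(reply: str) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Split a reply into valid (ID, label) pairs and a list of rejected lines."""
    accepted, rejected = [], []
    for line in reply.strip().splitlines():
        if not line.strip():
            continue  # Ignore blank lines
        row_id, sep, label = line.strip().partition(";")
        if sep and row_id.strip() and label.strip() in VALID_LABELS:
            accepted.append((row_id.strip(), label.strip()))
        else:
            rejected.append(line)
    return accepted, rejected


# Invented reply containing two valid rows, a preamble, and an invalid label.
sample_reply = "Sure, here you go:\nE0-1;Human  \nE0-2;AI\nE0-3;Maybe"
accepted, rejected = parse_predictions(sample_reply)
```

Rows in `rejected` can be logged or sent again in a follow-up request.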

### 📊 Create a DataFrame with the final predictions

In [None]:
ids = [row[0] for row in parsed]
labels = [row[1] for row in parsed]
output_df = pd.DataFrame({"ID": ids, "Label": labels})
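
Before saving, it can help to confirm that every input ID received a prediction, because skipped rows would otherwise go unnoticed. The sketch below compares the two ID lists. `find_missing_ids` is a hypothetical helper and the IDs are illustrative.

```python
from typing import Iterable, List


def find_missing_ids(expected_ids: Iterable[str], predicted_ids: Iterable[str]) -> List[str]:
    """Return the expected IDs that have no prediction, preserving input order."""
    seen = set(predicted_ids)
    return [row_id for row_id in expected_ids if row_id not in seen]


# Illustrative IDs; in the notebook these would come from the input file and `ids`.
missing = find_missing_ids(["E1-1", "E1-2", "E1-3"], ["E1-1", "E1-3"])
```

An empty result means the output covers the whole input.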

### 💾 Save the output as a tab-separated file

In [None]:
# The output is tab-separated; the .csv extension is kept as the submission file name.
output_df.to_csv("submissao3-grupo5-s2.csv", sep="\t", index=False)
print("✅ Results saved successfully to 'submissao3-grupo5-s2.csv'")