# Political Corruption Article Translation and Classification Pipeline

This notebook runs an end-to-end pipeline for analyzing news articles on political corruption:

1. **Load and translate** human-annotated news articles from the original language into the target language.
2. **Save** the translated dataset for further use.
3. **Classify** the translated articles using a large language model (LLM)-based classifier (`classify_article`), extracting evidence, rationale, confidence scores, and tentative labels.
4. **Save** the classification results as both CSV and Excel files for easy sharing and review.
5. **Evaluate** the classification performance by comparing the LLM-generated labels against the human-annotated ground truth using a classification report.

This workflow streamlines the process from raw human annotations to translated, LLM-annotated, and evaluated datasets ready for analysis or reporting.
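
The evaluation step can be made concrete with a small, dependency-free sketch. It assumes the classifier may return labels beyond Yes/No (for example `Unsure`), collapses everything except an explicit `Yes` into `No`, and computes precision and recall for the `Yes` class by hand. The helper names `to_binary` and `precision_recall` and the toy labels are illustrative assumptions, not part of the notebook's modules.

```python
def to_binary(label):
    """Collapse a raw LLM label into 'Yes' or 'No' (hypothetical helper)."""
    if isinstance(label, str) and label.strip().capitalize() == "Yes":
        return "Yes"
    return "No"


def precision_recall(y_true, y_pred, positive="Yes"):
    """Precision and recall for one class, computed from raw counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy data for illustration only; these are not results from the dataset.
human = ["Yes", "Yes", "No", "No"]
raw_llm = ["yes", "Unsure", "Mentioned but not central", "Yes"]

predicted = [to_binary(label) for label in raw_llm]
precision, recall = precision_recall(human, predicted)
print(predicted, precision, recall)
```

Collapsing `Unsure` into `No` is a conservative choice: it can lower recall for the `Yes` class, but it avoids counting hedged answers as positives.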


In [2]:
import os
import pandas as pd
from dataloader import load_and_prepare_data, balanced_sample, load_human_annotated_for_translation
from translation import translate_dataframe
from utils.classifier import classify_article
from sklearn.metrics import classification_report

# Step 1: Load and translate human-annotated data
df_anno = load_human_annotated_for_translation()
df_anno["text"] = df_anno["combined_text"]
df_translated_anno = translate_dataframe(df_anno)

# Prepare output dataframe with selected columns
df = pd.DataFrame({
    "original_text": df_anno["combined_text"],
    "translated_text": df_translated_anno["translated_text"],
    "uri": df_anno.get("uri", ""),  # Use empty string if 'uri' missing
    "corruption_label_m": df_anno["corruption_label_m"]
})

# Save translated annotations CSV
translated_csv_path = os.path.expanduser(
    '~/webdav/ASCOR-FMG-5580-RESPOND-news-data (Projectfolder)/annotations/classified_pol_corruption_gabriele_translated.csv'
)
df.to_csv(translated_csv_path, index=False)
print(f"Translated annotations saved to: {translated_csv_path}")

# Step 2: Classify translated articles using the LLM classifier
results = {
    "llm_evidence": [],
    "llm_rationale": [],
    "llm_confidence": [],
    "llm_label": []
}

print(f"Processing {len(df)} articles for classification...")

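# Use a separate counter so the progress message does not depend on the index values
for i, (_, row) in enumerate(df.iterrows(), start=1):
    print(f"Classifying article {i}/{len(df)}")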
    article_text = row["translated_text"]

    if not isinstance(article_text, str) or len(article_text.strip()) == 0:
        print("⚠️ Skipping empty article.")
        results["llm_evidence"].append("")
        results["llm_rationale"].append("No content")
        results["llm_confidence"].append(None)
        results["llm_label"].append("No")
        continue

    output = classify_article(article_text)

    evidence = "; ".join(output.get("highlights", []))
    results["llm_evidence"].append(evidence)
    results["llm_rationale"].append(output.get("rationale", ""))
    # Use None (not "") for a missing score, matching the skipped-article branch
    results["llm_confidence"].append(output.get("confidence"))
    results["llm_label"].append(output.get("tentative_label", ""))

# Add classification results to DataFrame
df["llm_evidence"] = results["llm_evidence"]
df["llm_rationale"] = results["llm_rationale"]
df["llm_confidence"] = results["llm_confidence"]
df["llm_label"] = results["llm_label"]

# Step 3: Save final annotated DataFrame as CSV and Excel
ANNOTATION_PATH = os.path.expanduser('~/webdav/ASCOR-FMG-5580-RESPOND-news-data (Projectfolder)/annotations/')
OUTPUT_FILE = "df_output_with_llm_annotations.csv"
os.makedirs(ANNOTATION_PATH, exist_ok=True)

csv_path = os.path.join(ANNOTATION_PATH, OUTPUT_FILE)
xlsx_path = csv_path.replace('.csv', '.xlsx')

df.to_csv(csv_path, index=False)
print(f"CSV saved to: {csv_path}")

df.to_excel(xlsx_path, index=False)
print(f"Excel saved to: {xlsx_path}")

# Step 4: Evaluate classification with sklearn classification_report
# Map ground truth to "Yes" or "No"; labels outside this mapping become NaN
y_true = df["corruption_label_m"].map({
    "political corruption": "Yes",
    "no political corruption": "No"
})

# Map predictions: only an explicit "Yes" counts as positive. Everything else
# ('No', 'Unsure', 'Mentioned but not central', missing or unexpected values)
# collapses into "No".
def map_prediction(label):
    if isinstance(label, str) and label.strip().capitalize() == "Yes":
        return "Yes"
    return "No"

y_pred = df["llm_label"].map(map_prediction)

# Evaluate only rows with a usable ground-truth label; NaN is not a valid class
mask = y_true.notna()

print("\n📊 Classification Report:")
print(classification_report(y_true[mask], y_pred[mask], labels=["Yes", "No"], zero_division=0))

🌍 Starting translation for multilingual dataset using 'combined_text'


🔁 Translating:   0%|          | 0/201 [00:00<?, ?it/s]

Translating chunk 1/1 (chars: 1152)


🔁 Translating:   1%|          | 2/201 [00:04<07:58,  2.40s/it]

Translating chunk 1/1 (chars: 837)


🔁 Translating:   1%|▏         | 3/201 [00:06<07:15,  2.20s/it]

Translating chunk 1/2 (chars: 1497)
⚠️ Suspected untranslated chunk — marking empty
❌ Failed to translate chunk 1 properly.
Translating chunk 2/2 (chars: 209)


🔁 Translating:   2%|▏         | 4/201 [00:12<11:43,  3.57s/it]

Translating chunk 1/1 (chars: 1117)


🔁 Translating:   2%|▏         | 5/201 [00:15<10:27,  3.20s/it]

Translating chunk 1/1 (chars: 802)


🔁 Translating:   3%|▎         | 6/201 [00:17<09:02,  2.78s/it]

Translating chunk 1/2 (chars: 1444)
Translating chunk 2/2 (chars: 375)


🔁 Translating:   3%|▎         | 7/201 [00:21<10:27,  3.24s/it]

Translating chunk 1/2 (chars: 1465)
Translating chunk 2/2 (chars: 662)


🔁 Translating:   4%|▍         | 8/201 [00:26<12:20,  3.84s/it]

Translating chunk 1/2 (chars: 1224)
⚠️ Suspected untranslated chunk — marking empty
❌ Failed to translate chunk 1 properly.
Translating chunk 2/2 (chars: 460)


🔁 Translating:   4%|▍         | 9/201 [00:31<13:41,  4.28s/it]

Translating chunk 1/2 (chars: 1360)
Translating chunk 2/2 (chars: 1080)


🔁 Translating:   5%|▍         | 10/201 [00:37<15:07,  4.75s/it]

Translating chunk 1/1 (chars: 1206)


🔁 Translating:   5%|▌         | 11/201 [00:40<13:38,  4.31s/it]

Translating chunk 1/1 (chars: 998)


🔁 Translating:   6%|▌         | 12/201 [00:43<11:53,  3.77s/it]

Translating chunk 1/3 (chars: 1389)
Translating chunk 2/3 (chars: 1202)
Translating chunk 3/3 (chars: 612)


🔁 Translating:   6%|▋         | 13/201 [00:51<16:15,  5.19s/it]

Translating chunk 1/2 (chars: 1282)
Translating chunk 2/2 (chars: 653)


🔁 Translating:   7%|▋         | 14/201 [00:55<15:13,  4.88s/it]

Translating chunk 1/9 (chars: 1145)
Translating chunk 2/9 (chars: 1348)
Translating chunk 3/9 (chars: 1422)
Translating chunk 4/9 (chars: 1131)
Translating chunk 5/9 (chars: 1133)
Translating chunk 6/9 (chars: 1387)
Translating chunk 7/9 (chars: 695)
Translating chunk 8/9 (chars: 1137)
Translating chunk 9/9 (chars: 655)


🔁 Translating:   7%|▋         | 15/201 [01:20<33:31, 10.81s/it]

Translating chunk 1/6 (chars: 1473)
Translating chunk 2/6 (chars: 1393)
Translating chunk 3/6 (chars: 1342)
Translating chunk 4/6 (chars: 1498)
Translating chunk 5/6 (chars: 1371)
Translating chunk 6/6 (chars: 935)


🔁 Translating:   8%|▊         | 16/201 [01:38<39:52, 12.93s/it]

Translating chunk 1/10 (chars: 1108)
Translating chunk 2/10 (chars: 730)
Translating chunk 3/10 (chars: 1500)
Translating chunk 4/10 (chars: 1273)
Translating chunk 5/10 (chars: 1124)
Translating chunk 6/10 (chars: 1500)
Translating chunk 7/10 (chars: 869)
Translating chunk 8/10 (chars: 1500)
Translating chunk 9/10 (chars: 744)
Translating chunk 10/10 (chars: 1354)


🔁 Translating:   8%|▊         | 17/201 [02:07<54:17, 17.70s/it]

Translating chunk 1/3 (chars: 1278)
Translating chunk 2/3 (chars: 1483)
Translating chunk 3/3 (chars: 169)


🔁 Translating:   9%|▉         | 18/201 [02:13<43:48, 14.36s/it]

Translating chunk 1/1 (chars: 993)


🔁 Translating:   9%|▉         | 19/201 [02:16<32:46, 10.81s/it]

Translating chunk 1/10 (chars: 1268)
Translating chunk 2/10 (chars: 1179)
Translating chunk 3/10 (chars: 1400)
Translating chunk 4/10 (chars: 1318)
Translating chunk 5/10 (chars: 1311)
Translating chunk 6/10 (chars: 1454)
Translating chunk 7/10 (chars: 1107)
Translating chunk 8/10 (chars: 1357)
Translating chunk 9/10 (chars: 1476)
Translating chunk 10/10 (chars: 232)


🔁 Translating:  10%|▉         | 20/201 [02:45<49:20, 16.36s/it]

⚠️ Suspected untranslated chunk — marking empty
❌ Failed to translate chunk 10 properly.
Translating chunk 1/1 (chars: 1116)


🔁 Translating:  10%|█         | 21/201 [02:48<37:20, 12.45s/it]

Translating chunk 1/2 (chars: 1415)
Translating chunk 2/2 (chars: 1185)


🔁 Translating:  11%|█         | 22/201 [02:56<32:56, 11.04s/it]

Translating chunk 1/2 (chars: 1402)
⚠️ Suspected untranslated chunk — marking empty
❌ Failed to translate chunk 1 properly.
Translating chunk 2/2 (chars: 770)


🔁 Translating:  11%|█▏        | 23/201 [03:03<29:09,  9.83s/it]

Translating chunk 1/1 (chars: 1485)


🔁 Translating:  12%|█▏        | 24/201 [03:07<24:07,  8.18s/it]

Translating chunk 1/2 (chars: 1449)
Translating chunk 2/2 (chars: 477)


🔁 Translating:  12%|█▏        | 25/201 [03:13<21:36,  7.37s/it]

Translating chunk 1/2 (chars: 1494)
Translating chunk 2/2 (chars: 132)


🔁 Translating:  13%|█▎        | 26/201 [03:19<20:10,  6.92s/it]

Translating chunk 1/1 (chars: 862)


🔁 Translating:  13%|█▎        | 27/201 [03:21<15:57,  5.51s/it]

Translating chunk 1/1 (chars: 602)


🔁 Translating:  14%|█▍        | 28/201 [03:23<12:38,  4.38s/it]

Translating chunk 1/1 (chars: 1145)


🔁 Translating:  14%|█▍        | 29/201 [03:26<11:14,  3.92s/it]

Translating chunk 1/6 (chars: 1272)
Translating chunk 2/6 (chars: 1249)
Translating chunk 3/6 (chars: 1214)
Translating chunk 4/6 (chars: 1391)
Translating chunk 5/6 (chars: 1261)
Translating chunk 6/6 (chars: 1353)


🔁 Translating:  15%|█▍        | 30/201 [03:47<26:30,  9.30s/it]

Translating chunk 1/1 (chars: 1088)


🔁 Translating:  15%|█▌        | 31/201 [03:50<20:46,  7.33s/it]

Translating chunk 1/4 (chars: 819)
Translating chunk 2/4 (chars: 1500)
Translating chunk 3/4 (chars: 1240)
Translating chunk 4/4 (chars: 891)


🔁 Translating:  16%|█▌        | 32/201 [04:01<23:28,  8.34s/it]

Translating chunk 1/4 (chars: 1102)
Translating chunk 2/4 (chars: 1214)
Translating chunk 3/4 (chars: 1498)
Translating chunk 4/4 (chars: 498)


🔁 Translating:  16%|█▋        | 33/201 [04:11<24:46,  8.85s/it]

Translating chunk 1/2 (chars: 1416)
Translating chunk 2/2 (chars: 1427)


🔁 Translating:  17%|█▋        | 34/201 [04:18<23:17,  8.37s/it]

Translating chunk 1/2 (chars: 1444)
Translating chunk 2/2 (chars: 458)


🔁 Translating:  17%|█▋        | 35/201 [04:23<19:57,  7.21s/it]

Translating chunk 1/1 (chars: 1022)


🔁 Translating:  18%|█▊        | 36/201 [04:25<16:00,  5.82s/it]

Translating chunk 1/2 (chars: 1406)
Translating chunk 2/2 (chars: 988)


🔁 Translating:  18%|█▊        | 37/201 [04:31<15:55,  5.82s/it]

Translating chunk 1/3 (chars: 1268)
Translating chunk 2/3 (chars: 1395)
Translating chunk 3/3 (chars: 944)


🔁 Translating:  19%|█▉        | 38/201 [04:39<17:43,  6.52s/it]

Translating chunk 1/5 (chars: 1225)
Translating chunk 2/5 (chars: 1474)
Translating chunk 3/5 (chars: 1483)
Translating chunk 4/5 (chars: 1399)
Translating chunk 5/5 (chars: 960)


🔁 Translating:  19%|█▉        | 39/201 [04:55<25:18,  9.37s/it]

Translating chunk 1/2 (chars: 934)
Translating chunk 2/2 (chars: 634)


🔁 Translating:  20%|█▉        | 40/201 [04:59<20:37,  7.69s/it]

Translating chunk 1/2 (chars: 1235)
Translating chunk 2/2 (chars: 575)


🔁 Translating:  20%|██        | 41/201 [05:04<17:55,  6.72s/it]

Translating chunk 1/1 (chars: 740)


🔁 Translating:  21%|██        | 42/201 [05:05<13:53,  5.24s/it]

Translating chunk 1/1 (chars: 1175)


🔁 Translating:  21%|██▏       | 43/201 [05:09<12:20,  4.68s/it]

Translating chunk 1/2 (chars: 1451)
Translating chunk 2/2 (chars: 574)


🔁 Translating:  22%|██▏       | 44/201 [05:14<12:48,  4.90s/it]

Translating chunk 1/2 (chars: 1126)
Translating chunk 2/2 (chars: 943)


🔁 Translating:  22%|██▏       | 45/201 [05:19<12:49,  4.93s/it]

Translating chunk 1/3 (chars: 395)
Translating chunk 2/3 (chars: 1344)
Translating chunk 3/3 (chars: 451)


🔁 Translating:  23%|██▎       | 46/201 [05:25<13:07,  5.08s/it]

Translating chunk 1/9 (chars: 1001)
Translating chunk 2/9 (chars: 1074)
Translating chunk 3/9 (chars: 1016)
Translating chunk 4/9 (chars: 1459)
Translating chunk 5/9 (chars: 1117)
Translating chunk 6/9 (chars: 1397)
Translating chunk 7/9 (chars: 1431)
Translating chunk 8/9 (chars: 884)
Translating chunk 9/9 (chars: 1289)


🔁 Translating:  23%|██▎       | 47/201 [05:51<29:23, 11.45s/it]

Translating chunk 1/5 (chars: 1478)
Translating chunk 2/5 (chars: 1198)
Translating chunk 3/5 (chars: 1140)
Translating chunk 4/5 (chars: 1167)
Translating chunk 5/5 (chars: 1225)


🔁 Translating:  24%|██▍       | 48/201 [06:05<31:23, 12.31s/it]

Translating chunk 1/7 (chars: 1312)
Translating chunk 2/7 (chars: 890)
Translating chunk 3/7 (chars: 1437)
Translating chunk 4/7 (chars: 1098)
Translating chunk 5/7 (chars: 1137)
Translating chunk 6/7 (chars: 1052)
Translating chunk 7/7 (chars: 1274)


🔁 Translating:  24%|██▍       | 49/201 [06:25<36:35, 14.45s/it]

Translating chunk 1/1 (chars: 874)


🔁 Translating:  25%|██▍       | 50/201 [06:26<26:49, 10.66s/it]

Translating chunk 1/1 (chars: 1151)


🔁 Translating:  25%|██▌       | 51/201 [06:29<20:30,  8.20s/it]

Translating chunk 1/2 (chars: 912)
Translating chunk 2/2 (chars: 858)


🔁 Translating:  26%|██▌       | 52/201 [06:33<17:11,  6.92s/it]

Translating chunk 1/1 (chars: 793)


🔁 Translating:  26%|██▋       | 53/201 [06:35<13:12,  5.35s/it]

Translating chunk 1/2 (chars: 1079)
Translating chunk 2/2 (chars: 435)


🔁 Translating:  27%|██▋       | 54/201 [06:38<11:57,  4.88s/it]

Translating chunk 1/1 (chars: 595)


🔁 Translating:  27%|██▋       | 55/201 [06:40<09:28,  3.89s/it]

Translating chunk 1/1 (chars: 1240)


🔁 Translating:  28%|██▊       | 56/201 [06:42<08:27,  3.50s/it]

Translating chunk 1/1 (chars: 1399)


🔁 Translating:  28%|██▊       | 57/201 [06:46<08:23,  3.49s/it]

Translating chunk 1/2 (chars: 1450)
Translating chunk 2/2 (chars: 669)


🔁 Translating:  29%|██▉       | 58/201 [06:51<09:24,  3.95s/it]

Translating chunk 1/2 (chars: 1294)
Translating chunk 2/2 (chars: 464)


🔁 Translating:  29%|██▉       | 59/201 [06:55<09:14,  3.91s/it]

Translating chunk 1/4 (chars: 1090)
Translating chunk 2/4 (chars: 1174)
Translating chunk 3/4 (chars: 1157)
Translating chunk 4/4 (chars: 437)


🔁 Translating:  30%|██▉       | 60/201 [07:04<12:45,  5.43s/it]

Translating chunk 1/1 (chars: 1094)


🔁 Translating:  30%|███       | 61/201 [07:06<10:38,  4.56s/it]

Translating chunk 1/1 (chars: 616)


🔁 Translating:  31%|███       | 62/201 [07:08<08:33,  3.70s/it]

Translating chunk 1/1 (chars: 1366)


🔁 Translating:  31%|███▏      | 63/201 [07:11<08:04,  3.51s/it]

Translating chunk 1/3 (chars: 557)
Translating chunk 2/3 (chars: 1499)
Translating chunk 3/3 (chars: 379)


🔁 Translating:  32%|███▏      | 64/201 [07:17<09:47,  4.29s/it]

Translating chunk 1/2 (chars: 1235)
Translating chunk 2/2 (chars: 311)


🔁 Translating:  32%|███▏      | 65/201 [07:21<09:43,  4.29s/it]

Translating chunk 1/3 (chars: 309)
Translating chunk 2/3 (chars: 1363)
Translating chunk 3/3 (chars: 507)


🔁 Translating:  33%|███▎      | 66/201 [07:26<10:07,  4.50s/it]

Translating chunk 1/3 (chars: 1329)
Translating chunk 2/3 (chars: 1121)
Translating chunk 3/3 (chars: 475)


🔁 Translating:  33%|███▎      | 67/201 [07:33<11:40,  5.23s/it]

Translating chunk 1/2 (chars: 1193)
Translating chunk 2/2 (chars: 885)


🔁 Translating:  34%|███▍      | 68/201 [07:37<10:51,  4.90s/it]

Translating chunk 1/2 (chars: 1377)
Translating chunk 2/2 (chars: 405)


🔁 Translating:  34%|███▍      | 69/201 [07:42<10:34,  4.81s/it]

Translating chunk 1/4 (chars: 672)
Translating chunk 2/4 (chars: 1500)
Translating chunk 3/4 (chars: 285)
Translating chunk 4/4 (chars: 1320)


🔁 Translating:  35%|███▍      | 70/201 [07:51<13:18,  6.09s/it]

Translating chunk 1/1 (chars: 1225)


🔁 Translating:  35%|███▌      | 71/201 [07:54<11:13,  5.18s/it]

Translating chunk 1/1 (chars: 1265)


🔁 Translating:  36%|███▌      | 72/201 [07:57<09:51,  4.58s/it]

Translating chunk 1/3 (chars: 1028)
Translating chunk 2/3 (chars: 1283)
Translating chunk 3/3 (chars: 476)


🔁 Translating:  36%|███▋      | 73/201 [08:04<10:51,  5.09s/it]

Translating chunk 1/1 (chars: 1358)


🔁 Translating:  37%|███▋      | 74/201 [08:07<09:29,  4.49s/it]

Translating chunk 1/1 (chars: 1270)


🔁 Translating:  37%|███▋      | 75/201 [08:10<08:29,  4.05s/it]

Translating chunk 1/6 (chars: 1092)
Translating chunk 2/6 (chars: 846)
Translating chunk 3/6 (chars: 976)
Translating chunk 4/6 (chars: 1326)
Translating chunk 5/6 (chars: 1037)
Translating chunk 6/6 (chars: 724)


🔁 Translating:  38%|███▊      | 76/201 [08:24<14:34,  6.99s/it]

Translating chunk 1/3 (chars: 1426)
Translating chunk 2/3 (chars: 1226)
Translating chunk 3/3 (chars: 942)


🔁 Translating:  38%|███▊      | 77/201 [08:33<15:40,  7.59s/it]

Translating chunk 1/3 (chars: 187)
Translating chunk 2/3 (chars: 1500)
Translating chunk 3/3 (chars: 1182)


🔁 Translating:  39%|███▉      | 78/201 [08:39<14:54,  7.27s/it]

Translating chunk 1/1 (chars: 961)


🔁 Translating:  39%|███▉      | 79/201 [08:41<11:41,  5.75s/it]

Translating chunk 1/2 (chars: 1397)
Translating chunk 2/2 (chars: 724)


🔁 Translating:  40%|███▉      | 80/201 [08:47<11:42,  5.81s/it]

Translating chunk 1/1 (chars: 520)


🔁 Translating:  40%|████      | 81/201 [08:49<08:52,  4.44s/it]

Translating chunk 1/3 (chars: 906)
Translating chunk 2/3 (chars: 1294)
Translating chunk 3/3 (chars: 273)


🔁 Translating:  41%|████      | 82/201 [08:54<09:34,  4.83s/it]

Translating chunk 1/1 (chars: 1278)


🔁 Translating:  41%|████▏     | 83/201 [08:57<08:25,  4.29s/it]

Translating chunk 1/2 (chars: 1416)
Translating chunk 2/2 (chars: 1024)


🔁 Translating:  42%|████▏     | 84/201 [09:02<08:49,  4.53s/it]

Translating chunk 1/3 (chars: 1454)
Translating chunk 2/3 (chars: 1455)
Translating chunk 3/3 (chars: 986)


🔁 Translating:  42%|████▏     | 85/201 [09:11<11:00,  5.69s/it]

Translating chunk 1/9 (chars: 1409)
Translating chunk 2/9 (chars: 1190)
Translating chunk 3/9 (chars: 557)
Translating chunk 4/9 (chars: 1188)
Translating chunk 5/9 (chars: 482)
Translating chunk 6/9 (chars: 1020)
Translating chunk 7/9 (chars: 1392)
Translating chunk 8/9 (chars: 474)
Translating chunk 9/9 (chars: 1294)


🔁 Translating:  43%|████▎     | 86/201 [09:31<19:03,  9.94s/it]

Translating chunk 1/6 (chars: 1131)
Translating chunk 2/6 (chars: 1272)
Translating chunk 3/6 (chars: 995)
Translating chunk 4/6 (chars: 1297)
Translating chunk 5/6 (chars: 1500)
Translating chunk 6/6 (chars: 346)


🔁 Translating:  43%|████▎     | 87/201 [09:47<22:17, 11.73s/it]

Translating chunk 1/5 (chars: 1217)
Translating chunk 2/5 (chars: 890)
Translating chunk 3/5 (chars: 1499)
Translating chunk 4/5 (chars: 939)
Translating chunk 5/5 (chars: 911)


🔁 Translating:  44%|████▍     | 88/201 [09:59<22:18, 11.85s/it]

Translating chunk 1/4 (chars: 799)
Translating chunk 2/4 (chars: 1500)
Translating chunk 3/4 (chars: 1244)
Translating chunk 4/4 (chars: 562)


🔁 Translating:  44%|████▍     | 89/201 [10:08<20:32, 11.00s/it]

Translating chunk 1/2 (chars: 1314)
Translating chunk 2/2 (chars: 1239)


🔁 Translating:  45%|████▍     | 90/201 [10:13<17:24,  9.41s/it]

Translating chunk 1/1 (chars: 1244)


🔁 Translating:  45%|████▌     | 91/201 [10:16<13:29,  7.36s/it]

Translating chunk 1/1 (chars: 1292)


🔁 Translating:  46%|████▌     | 92/201 [10:19<10:56,  6.02s/it]

Translating chunk 1/5 (chars: 1464)
Translating chunk 2/5 (chars: 1492)
Translating chunk 3/5 (chars: 1107)
Translating chunk 4/5 (chars: 1313)
Translating chunk 5/5 (chars: 960)


🔁 Translating:  46%|████▋     | 93/201 [10:33<15:14,  8.47s/it]

Translating chunk 1/4 (chars: 1442)
Translating chunk 2/4 (chars: 1312)
Translating chunk 3/4 (chars: 1130)
Translating chunk 4/4 (chars: 1322)


🔁 Translating:  47%|████▋     | 94/201 [10:44<16:12,  9.09s/it]

Translating chunk 1/1 (chars: 1082)


🔁 Translating:  47%|████▋     | 95/201 [10:46<12:31,  7.09s/it]

Translating chunk 1/1 (chars: 1312)


🔁 Translating:  48%|████▊     | 96/201 [10:49<10:16,  5.87s/it]

Translating chunk 1/11 (chars: 1396)
Translating chunk 2/11 (chars: 657)
Translating chunk 3/11 (chars: 1232)
Translating chunk 4/11 (chars: 1497)
Translating chunk 5/11 (chars: 970)
Translating chunk 6/11 (chars: 1430)
Translating chunk 7/11 (chars: 1359)
Translating chunk 8/11 (chars: 1458)
Translating chunk 9/11 (chars: 603)
Translating chunk 10/11 (chars: 1463)
Translating chunk 11/11 (chars: 732)


🔁 Translating:  48%|████▊     | 97/201 [11:20<22:57, 13.24s/it]

Translating chunk 1/1 (chars: 1305)


🔁 Translating:  49%|████▉     | 98/201 [11:23<17:28, 10.18s/it]

Translating chunk 1/3 (chars: 1476)
Translating chunk 2/3 (chars: 1224)
Translating chunk 3/3 (chars: 862)


🔁 Translating:  49%|████▉     | 99/201 [11:32<16:41,  9.82s/it]

Translating chunk 1/2 (chars: 1145)
Translating chunk 2/2 (chars: 989)


🔁 Translating:  50%|████▉     | 100/201 [11:36<13:57,  8.29s/it]

Translating chunk 1/1 (chars: 1272)


🔁 Translating:  50%|█████     | 101/201 [11:39<11:12,  6.73s/it]

Translating chunk 1/5 (chars: 1144)
Translating chunk 2/5 (chars: 1443)
Translating chunk 3/5 (chars: 1452)
Translating chunk 4/5 (chars: 1345)
Translating chunk 5/5 (chars: 805)


🔁 Translating:  51%|█████     | 102/201 [11:54<14:47,  8.97s/it]

Translating chunk 1/3 (chars: 1159)
Translating chunk 2/3 (chars: 1403)
Translating chunk 3/3 (chars: 813)


🔁 Translating:  51%|█████     | 103/201 [12:01<13:57,  8.54s/it]

Translating chunk 1/3 (chars: 1471)
Translating chunk 2/3 (chars: 1242)
Translating chunk 3/3 (chars: 1312)


🔁 Translating:  52%|█████▏    | 104/201 [12:11<14:26,  8.93s/it]

Translating chunk 1/2 (chars: 1182)
Translating chunk 2/2 (chars: 482)


🔁 Translating:  52%|█████▏    | 105/201 [12:15<11:59,  7.49s/it]

Translating chunk 1/2 (chars: 1400)
Translating chunk 2/2 (chars: 724)


🔁 Translating:  53%|█████▎    | 106/201 [12:20<10:50,  6.84s/it]

Translating chunk 1/2 (chars: 1403)
Translating chunk 2/2 (chars: 249)


🔁 Translating:  53%|█████▎    | 107/201 [12:23<08:56,  5.71s/it]

Translating chunk 1/1 (chars: 1487)


🔁 Translating:  54%|█████▎    | 108/201 [12:27<07:37,  4.92s/it]

Translating chunk 1/2 (chars: 1248)
Translating chunk 2/2 (chars: 360)


🔁 Translating:  54%|█████▍    | 109/201 [12:30<07:04,  4.62s/it]

Translating chunk 1/11 (chars: 1312)
Translating chunk 2/11 (chars: 1422)
Translating chunk 3/11 (chars: 1259)
Translating chunk 4/11 (chars: 665)
Translating chunk 5/11 (chars: 1432)
Translating chunk 6/11 (chars: 807)
Translating chunk 7/11 (chars: 1366)
Translating chunk 8/11 (chars: 655)
Translating chunk 9/11 (chars: 1036)
Translating chunk 10/11 (chars: 1371)
Translating chunk 11/11 (chars: 1034)


🔁 Translating:  55%|█████▍    | 110/201 [12:59<17:58, 11.85s/it]

Translating chunk 1/2 (chars: 1427)
Translating chunk 2/2 (chars: 559)


🔁 Translating:  55%|█████▌    | 111/201 [13:04<14:37,  9.75s/it]

Translating chunk 1/9 (chars: 1255)
Translating chunk 2/9 (chars: 1095)
Translating chunk 3/9 (chars: 1053)
Translating chunk 4/9 (chars: 1301)
Translating chunk 5/9 (chars: 1144)
Translating chunk 6/9 (chars: 1397)
Translating chunk 7/9 (chars: 1236)
Translating chunk 8/9 (chars: 870)
Translating chunk 9/9 (chars: 1304)


🔁 Translating:  56%|█████▌    | 112/201 [13:29<21:19, 14.38s/it]

Translating chunk 1/2 (chars: 1319)
Translating chunk 2/2 (chars: 250)


🔁 Translating:  56%|█████▌    | 113/201 [13:32<16:03, 10.95s/it]

Translating chunk 1/1 (chars: 1103)


🔁 Translating:  57%|█████▋    | 114/201 [13:35<12:26,  8.58s/it]

Translating chunk 1/6 (chars: 1157)
Translating chunk 2/6 (chars: 1361)
Translating chunk 3/6 (chars: 1221)
Translating chunk 4/6 (chars: 1291)
Translating chunk 5/6 (chars: 1353)
Translating chunk 6/6 (chars: 457)


🔁 Translating:  57%|█████▋    | 115/201 [13:49<14:32, 10.15s/it]

Translating chunk 1/2 (chars: 1459)
Translating chunk 2/2 (chars: 264)


🔁 Translating:  58%|█████▊    | 116/201 [13:53<11:52,  8.38s/it]

Translating chunk 1/2 (chars: 1479)
Translating chunk 2/2 (chars: 162)


🔁 Translating:  58%|█████▊    | 117/201 [13:57<09:59,  7.13s/it]

Translating chunk 1/8 (chars: 1261)
Translating chunk 2/8 (chars: 1168)
Translating chunk 3/8 (chars: 1296)
Translating chunk 4/8 (chars: 1403)
Translating chunk 5/8 (chars: 1147)
Translating chunk 6/8 (chars: 1020)
Translating chunk 7/8 (chars: 1497)
Translating chunk 8/8 (chars: 383)


🔁 Translating:  59%|█████▊    | 118/201 [14:17<14:50, 10.73s/it]

Translating chunk 1/3 (chars: 972)
Translating chunk 2/3 (chars: 1228)
Translating chunk 3/3 (chars: 1203)


🔁 Translating:  59%|█████▉    | 119/201 [14:25<13:41, 10.02s/it]

Translating chunk 1/1 (chars: 1043)


🔁 Translating:  60%|█████▉    | 120/201 [14:28<10:30,  7.78s/it]

Translating chunk 1/2 (chars: 1454)
Translating chunk 2/2 (chars: 1353)


🔁 Translating:  60%|██████    | 121/201 [14:34<09:47,  7.34s/it]

Translating chunk 1/10 (chars: 1239)
Translating chunk 2/10 (chars: 1361)
Translating chunk 3/10 (chars: 1330)
Translating chunk 4/10 (chars: 1219)
Translating chunk 5/10 (chars: 1039)
Translating chunk 6/10 (chars: 1189)
Translating chunk 7/10 (chars: 1009)
Translating chunk 8/10 (chars: 1461)
Translating chunk 9/10 (chars: 1169)
Translating chunk 10/10 (chars: 390)


🔁 Translating:  61%|██████    | 122/201 [15:00<17:09, 13.03s/it]

Translating chunk 1/1 (chars: 1037)


🔁 Translating:  61%|██████    | 123/201 [15:02<12:45,  9.81s/it]

Translating chunk 1/19 (chars: 1433)
Translating chunk 2/19 (chars: 1199)
Translating chunk 3/19 (chars: 1382)
Translating chunk 4/19 (chars: 1235)
Translating chunk 5/19 (chars: 1252)
Translating chunk 6/19 (chars: 685)
Translating chunk 7/19 (chars: 1467)
Translating chunk 8/19 (chars: 1031)
Translating chunk 9/19 (chars: 1368)
Translating chunk 10/19 (chars: 1291)
Translating chunk 11/19 (chars: 1240)
Translating chunk 12/19 (chars: 1239)
Translating chunk 13/19 (chars: 1394)
Translating chunk 14/19 (chars: 1261)
Translating chunk 15/19 (chars: 1201)
Translating chunk 16/19 (chars: 908)
Translating chunk 17/19 (chars: 1477)
Translating chunk 18/19 (chars: 1313)
Translating chunk 19/19 (chars: 537)


🔁 Translating:  62%|██████▏   | 124/201 [15:56<29:16, 22.81s/it]

Translating chunk 1/1 (chars: 1055)


🔁 Translating:  62%|██████▏   | 125/201 [15:58<21:17, 16.81s/it]

Translating chunk 1/2 (chars: 1492)
Translating chunk 2/2 (chars: 585)


🔁 Translating:  63%|██████▎   | 126/201 [16:03<16:30, 13.21s/it]

Translating chunk 1/4 (chars: 678)
Translating chunk 2/4 (chars: 823)
Translating chunk 3/4 (chars: 1480)
Translating chunk 4/4 (chars: 617)


🔁 Translating:  63%|██████▎   | 127/201 [16:12<14:37, 11.85s/it]

Translating chunk 1/1 (chars: 948)


🔁 Translating:  64%|██████▎   | 128/201 [16:14<10:54,  8.97s/it]

Translating chunk 1/1 (chars: 1365)


🔁 Translating:  64%|██████▍   | 129/201 [16:17<08:36,  7.18s/it]

Translating chunk 1/1 (chars: 504)


🔁 Translating:  65%|██████▍   | 130/201 [16:18<06:20,  5.37s/it]

Translating chunk 1/15 (chars: 1149)
Translating chunk 2/15 (chars: 1325)
Translating chunk 3/15 (chars: 1131)
Translating chunk 4/15 (chars: 1261)
Translating chunk 5/15 (chars: 1065)
Translating chunk 6/15 (chars: 1487)
Translating chunk 7/15 (chars: 1294)
Translating chunk 8/15 (chars: 1346)
Translating chunk 9/15 (chars: 1272)
Translating chunk 10/15 (chars: 1376)
Translating chunk 11/15 (chars: 1297)
Translating chunk 12/15 (chars: 1192)
Translating chunk 13/15 (chars: 1320)
Translating chunk 14/15 (chars: 1452)
Translating chunk 15/15 (chars: 1017)


🔁 Translating:  65%|██████▌   | 131/201 [17:01<19:18, 16.55s/it]

Translating chunk 1/1 (chars: 1458)


🔁 Translating:  66%|██████▌   | 132/201 [17:04<14:17, 12.42s/it]

Translating chunk 1/1 (chars: 1093)


🔁 Translating:  66%|██████▌   | 133/201 [17:06<10:40,  9.42s/it]

Translating chunk 1/6 (chars: 1178)
Translating chunk 2/6 (chars: 1078)
Translating chunk 3/6 (chars: 1219)
Translating chunk 4/6 (chars: 1102)
Translating chunk 5/6 (chars: 1112)
Translating chunk 6/6 (chars: 429)


🔁 Translating:  67%|██████▋   | 134/201 [17:21<12:17, 11.00s/it]

Translating chunk 1/1 (chars: 1168)


🔁 Translating:  67%|██████▋   | 135/201 [17:23<09:16,  8.43s/it]

Translating chunk 1/10 (chars: 1102)
Translating chunk 2/10 (chars: 1372)
Translating chunk 3/10 (chars: 1388)
Translating chunk 4/10 (chars: 1394)
Translating chunk 5/10 (chars: 1206)
Translating chunk 6/10 (chars: 1355)
Translating chunk 7/10 (chars: 274)
Translating chunk 8/10 (chars: 1307)
Translating chunk 9/10 (chars: 1190)
Translating chunk 10/10 (chars: 400)


🔁 Translating:  68%|██████▊   | 136/201 [17:49<14:47, 13.65s/it]

Translating chunk 1/1 (chars: 904)


🔁 Translating:  68%|██████▊   | 137/201 [17:51<10:55, 10.25s/it]

Translating chunk 1/2 (chars: 1258)
Translating chunk 2/2 (chars: 1178)


🔁 Translating:  69%|██████▊   | 138/201 [17:57<09:27,  9.00s/it]

Translating chunk 1/1 (chars: 1227)


🔁 Translating:  69%|██████▉   | 139/201 [18:00<07:24,  7.18s/it]

Translating chunk 1/1 (chars: 782)


🔁 Translating:  70%|██████▉   | 140/201 [18:02<05:40,  5.59s/it]

Translating chunk 1/1 (chars: 465)


🔁 Translating:  70%|███████   | 141/201 [18:03<04:15,  4.25s/it]

Translating chunk 1/12 (chars: 1237)
Translating chunk 2/12 (chars: 1452)
Translating chunk 3/12 (chars: 1265)
Translating chunk 4/12 (chars: 993)
Translating chunk 5/12 (chars: 1118)
Translating chunk 6/12 (chars: 1147)
Translating chunk 7/12 (chars: 1383)
Translating chunk 8/12 (chars: 1315)
Translating chunk 9/12 (chars: 1287)
Translating chunk 10/12 (chars: 1192)
Translating chunk 11/12 (chars: 1358)
Translating chunk 12/12 (chars: 906)


🔁 Translating:  71%|███████   | 142/201 [18:38<13:10, 13.41s/it]

Translating chunk 1/1 (chars: 1337)


🔁 Translating:  71%|███████   | 143/201 [18:41<09:51, 10.20s/it]

Translating chunk 1/3 (chars: 1394)
Translating chunk 2/3 (chars: 870)
Translating chunk 3/3 (chars: 1186)


🔁 Translating:  72%|███████▏  | 144/201 [18:49<09:12,  9.70s/it]

Translating chunk 1/1 (chars: 971)


🔁 Translating:  72%|███████▏  | 145/201 [18:52<06:57,  7.45s/it]

Translating chunk 1/1 (chars: 621)


🔁 Translating:  73%|███████▎  | 146/201 [18:53<05:10,  5.65s/it]

Translating chunk 1/2 (chars: 1351)
Translating chunk 2/2 (chars: 1139)


🔁 Translating:  73%|███████▎  | 147/201 [18:59<05:15,  5.84s/it]

Translating chunk 1/1 (chars: 1360)


🔁 Translating:  74%|███████▎  | 148/201 [19:03<04:36,  5.21s/it]

Translating chunk 1/2 (chars: 1382)
Translating chunk 2/2 (chars: 1269)


🔁 Translating:  74%|███████▍  | 149/201 [19:09<04:42,  5.42s/it]

Translating chunk 1/3 (chars: 1105)
Translating chunk 2/3 (chars: 1400)
Translating chunk 3/3 (chars: 488)


🔁 Translating:  75%|███████▍  | 150/201 [19:16<04:56,  5.81s/it]

Translating chunk 1/3 (chars: 996)
Translating chunk 2/3 (chars: 1327)
Translating chunk 3/3 (chars: 588)


🔁 Translating: 100%|██████████| 201/201 [19:22<00:00,  5.78s/it]


Translated annotations saved to: /home/akroon/webdav/ASCOR-FMG-5580-RESPOND-news-data (Projectfolder)/annotations/classified_pol_corruption_gabriele_translated.csv
Processing 201 articles for classification...
Classifying article 1/201
Classifying article 2/201
Classifying article 3/201
Classifying article 4/201
Classifying article 5/201
Classifying article 6/201
Classifying article 7/201
Classifying article 8/201
Classifying article 9/201
Classifying article 10/201
Classifying article 11/201
Classifying article 12/201
Classifying article 13/201
Classifying article 14/201
Classifying article 15/201
Classifying article 16/201
❌ Classification error: HTTPConnectionPool(host='localhost', port=11434): Read timed out. (read timeout=60)
Classifying article 17/201
Classifying article 18/201
Classifying article 19/201
Classifying article 20/201
Classifying article 21/201
Classifying article 22/201
Classifying article 23/201
Classifying article 24/201
Classifying article 25/201
Classifying arti
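
The log above shows one request to the local model server (`localhost:11434`) hitting the 60-second read timeout, after which the loop moves on to the next article. A minimal sketch of a retry wrapper with exponential backoff follows. It assumes the classification call raises an exception on timeout; the log only shows the error being printed, so this is an assumption. `call_with_retries` and its parameters are hypothetical names and are not part of the notebook.

```python
import time


def call_with_retries(fn, *args, attempts=3, base_delay=2.0, sleep=time.sleep, **kwargs):
    """Call fn(*args, **kwargs), retrying with exponential backoff on failure.

    Hypothetical helper: the notebook does not show how errors are handled
    inside the classification loop.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return fn(*args, **kwargs)
        except Exception as exc:  # in real use, narrow this to the timeout exception
            last_error = exc
            if attempt < attempts:
                # Wait base_delay, then 2x, 4x, ... before the next attempt
                sleep(base_delay * 2 ** (attempt - 1))
    raise last_error
```

A call such as `call_with_retries(classify_article, text, attempts=3)` would then give a slow request two more chances before the article is recorded as failed; the exact signature of `classify_article` is not shown here, so the arguments are illustrative.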

In [12]:
import os
import pandas as pd

# Define paths
ANNOTATION_PATH = os.path.expanduser('~/webdav/ASCOR-FMG-5580-RESPOND-news-data (Projectfolder)/annotations/')
OUTPUT_FILE = "df_output_with_llm_annotations.csv"
csv_path = os.path.join(ANNOTATION_PATH, OUTPUT_FILE)

# Load the saved LLM-annotated file
print(f"📂 Loading classified data from: {csv_path}")
df_classified = pd.read_csv(csv_path)

# Keep only uri and country from df_anno, dropping repeated pairs so the left merge does not duplicate rows
df_country = df_anno[["uri", "country"]].drop_duplicates()

# Merge on URI
print("🔗 Merging 'country' into classified DataFrame based on 'uri'...")
df_merged = pd.merge(df_classified, df_country, on="uri", how="left")

# Save the merged DataFrame
merged_csv_path = os.path.join(ANNOTATION_PATH, "df_output_with_country.csv")
df_merged.to_csv(merged_csv_path, index=False)
print(f"✅ Merged file with country saved to: {merged_csv_path}")

df_merged
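
The merge in this cell relies on `uri` identifying exactly one country in `df_anno` and on both frames storing `uri` with the same dtype (a CSV round trip can turn string identifiers into integers). A stricter variant might look like the sketch below. `merge_country` is a hypothetical helper, and casting `uri` to string is an assumption about how the keys should be aligned.

```python
import pandas as pd


def merge_country(df_classified, df_anno):
    """Left-merge 'country' onto the classified frame, with sanity checks.

    Hypothetical helper, not part of the notebook.
    """
    lookup = df_anno[["uri", "country"]].drop_duplicates()
    # Assumption: comparing uri as strings is safe for this dataset
    left = df_classified.assign(uri=df_classified["uri"].astype(str))
    lookup = lookup.assign(uri=lookup["uri"].astype(str))
    merged = left.merge(
        lookup,
        on="uri",
        how="left",
        validate="many_to_one",  # raises MergeError if a uri maps to several countries
        indicator=True,          # adds a '_merge' column describing each row's match
    )
    unmatched = int((merged["_merge"] == "left_only").sum())
    return merged.drop(columns="_merge"), unmatched
```

The second return value counts classified rows that found no country, which would otherwise be dropped silently by the per-country report because it filters on non-null countries.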

📂 Loading classified data from: /home/akroon/webdav/ASCOR-FMG-5580-RESPOND-news-data (Projectfolder)/annotations/df_output_with_llm_annotations.csv
🔗 Merging 'country' into classified DataFrame based on 'uri'...
✅ Merged file with country saved to: /home/akroon/webdav/ASCOR-FMG-5580-RESPOND-news-data (Projectfolder)/annotations/df_output_with_country.csv


| | original_text | translated_text | uri | corruption_label_m | llm_evidence | llm_rationale | llm_confidence | llm_label | country |
|---|---|---|---|---|---|---|---|---|---|
| 0 | "Нова демокрация" прекрати партийното членство... | “Nova Demokratiya” (New Democracy) has termina... | 7305090551 | political corruption | The governing party of Greece, "Nova Demokrati... | The article directly mentions fraud investigat... | 95.0 | Yes | Bulgaria |
| 1 | Съдът намали паричната гаранция на Васил Божко... | The court reduced the bail of Vasil Bozhkov by... | 7733034567 | no political corruption | The bail for the businessman who returned from... | Although the article mentions criminal activit... | 80.0 | No | Bulgaria |
| 2 | "Равен мач" за Зеленски, но всъщност - победа ... | [Translation Failed]\n\nBut if “homework” is n... | 1372983334 | no political corruption | | The article appears to be about a person's car... | 50.0 | No | Bulgaria |
| 3 | Трима задържани за измама с евросредства за зе... | Three Arrested for Fraud with Agricultural Eur... | 6639092652 | political corruption | A fraud involving over 100,000 leva in Europea... | The article primarily focuses on a fraud schem... | 80.0 | No | Bulgaria |
| 4 | Окончателно: Стайко Стайков ще се лекува под д... | Finally: Staikо Staikov will be treated under ... | 1063142066 | no political corruption | Staikov is accused of involvement in a crimina... | While the article mentions a criminal group an... | 80.0 | Mentioned but not central | Bulgaria |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 196 | Designer of cot in which baby died jailed for ... | Designer of cot in which baby died jailed for ... | 979152289 | no political corruption | Prosecutor John Elvidge QC said the defendant ... | While there are mentions of fraud, which could... | 90.0 | No | United_Kingdom |
| 197 | HMRC warn of scammers targeting Self-Employmen... | HMRC warn of scammers targeting Self-Employmen... | 6671822812 | no political corruption | | The article is primarily about warning self-em... | 90.0 | No | United_Kingdom |
| 198 | New BoE boss Bailey does not back immediate vi... | New BoE boss Bailey does not back immediate vi... | 1470810727 | no political corruption | Bailey, a former BoE deputy governor with 30 y... | The article primarily focuses on Andrew Bailey... | 80.0 | Mentioned but not central | United_Kingdom |
| 199 | Thief admits stealing more than £100,000 from ... | Thief admits stealing more than £100,000 from ... | 1058183409 | no political corruption | None (there are no direct mentions of politica... | This article primarily focuses on a private in... | 100.0 | No | United_Kingdom |
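
Row 2 of the table above carries a `[Translation Failed]` marker in `translated_text`, and the classifier's output for it (no evidence, confidence 50.0) suggests it judged a damaged input. A small sketch for flagging such rows before evaluation is shown below. `flag_failed_translations` and the `translation_failed` column are hypothetical names; the marker string is copied from the table.

```python
import pandas as pd

# Marker string as it appears in the translated_text column above
FAILED_MARKER = "[Translation Failed]"


def flag_failed_translations(df, text_col="translated_text"):
    """Return a copy of df with a boolean 'translation_failed' column."""
    # regex=False treats the square brackets literally instead of as a character class
    mask = df[text_col].fillna("").str.contains(FAILED_MARKER, regex=False)
    return df.assign(translation_failed=mask)
```

Flagged rows could then be re-translated or reported separately, so that translation failures are not counted as classifier mistakes.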


In [15]:
# Step 5: Generate classification report per country
if "country" in df_merged.columns:
    print("\n📍 Classification Report Per Country:")
    countries = df_merged["country"].dropna().unique()

    for country in countries:
        country_df = df_merged[df_merged["country"] == country]

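        # Map the human annotations onto the same Yes/No label space as the predictions
        y_true_c = country_df["corruption_label_m"].map({
            "political corruption": "Yes",
            "no political corruption": "No"
        })

        # map_prediction comes from an earlier cell; it is expected to return "Yes" or "No"
        y_pred_c = country_df["llm_label"].map(map_prediction)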

        print(f"\n🗺️ Country: {country} ({len(country_df)} articles)")
        print(classification_report(y_true_c, y_pred_c, labels=["Yes", "No"], zero_division=0))
else:
    print("⚠️ 'country' column not found in DataFrame. Skipping per-country classification report.")
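
The cell above calls `map_prediction`, whose definition is not shown in this part of the notebook. Purely as an illustration, a mapping from the three `llm_label` values visible in the merged table ("Yes", "No", "Mentioned but not central") to binary labels could look like the sketch below. Treating "Mentioned but not central" and missing labels as "No" is an assumption, not something the notebook confirms, and `map_prediction_sketch` is a hypothetical name chosen to avoid confusion with the real function.

```python
# Assumption: only a clear "Yes" counts as a positive prediction
LABEL_MAP = {
    "yes": "Yes",
    "no": "No",
    "mentioned but not central": "No",
}


def map_prediction_sketch(label, default="No"):
    """Collapse an LLM label to "Yes" or "No" (illustrative only)."""
    if not isinstance(label, str):
        # NaN or None, e.g. when a classification request failed
        return default
    return LABEL_MAP.get(label.strip().lower(), default)
```

How the middle category is collapsed affects recall on the "Yes" class directly, so the choice is worth stating next to any reported scores.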



📍 Classification Report Per Country:

🗺️ Country: Bulgaria (48 articles)
              precision    recall  f1-score   support

         Yes       0.88      0.78      0.82        18
          No       0.88      0.93      0.90        30

    accuracy                           0.88        48
   macro avg       0.88      0.86      0.86        48
weighted avg       0.88      0.88      0.87        48


🗺️ Country: Italy (51 articles)
              precision    recall  f1-score   support

         Yes       0.89      0.67      0.76        12
          No       0.90      0.97      0.94        39

    accuracy                           0.90        51
   macro avg       0.90      0.82      0.85        51
weighted avg       0.90      0.90      0.90        51


🗺️ Country: Netherlands (51 articles)
              precision    recall  f1-score   support

         Yes       0.50      0.60      0.55        10
          No       0.90      0.85      0.88        41

    accuracy                        