# Geometry of Truth: Data Download

Downloads all CSVs from [ApolloResearch/deception-detection](https://github.com/ApolloResearch/deception-detection/tree/main/data/geometry_of_truth), tags every row with an `origin_file` column naming the file it came from, concatenates the results into a single table, and saves it as `data/geometry_of_truth.csv`.

In [1]:
import io
import urllib.request
from pathlib import Path

import pandas as pd

In [5]:
BASE_URL = "https://raw.githubusercontent.com/ApolloResearch/deception-detection/main/data/geometry_of_truth/"

CSV_FILES = [
    "cities_cities_conj_qa.csv",
    "cities_cities_disj_qa.csv",
    "cities_qa.csv",
    "common_claim_true_false_qa.csv",
    "companies_true_false_qa.csv",
    "counterfact_true_false_qa.csv",
    "larger_than_qa.csv",
    "neg_cities_qa.csv",
    "neg_sp_en_trans_qa.csv",
    "smaller_than_qa.csv",
    "sp_en_trans_qa.csv",
]

In [6]:
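frames = []  # one DataFrame per downloaded CSV

for filename in CSV_FILES:
    url = BASE_URL + filename
    print(f"Downloading {filename} ...", end=" ")
    # Fetch the raw CSV text from GitHub.
    with urllib.request.urlopen(url) as response:
        content = response.read().decode("utf-8")
    df = pd.read_csv(io.StringIO(content))
    # Record the source file as the first column so rows stay traceable after concatenation.
    df.insert(0, "origin_file", filename)
    frames.append(df)
    print(f"{len(df)} rows")

# Stack all frames and renumber the index from 0.
combined = pd.concat(frames, ignore_index=True)
print(f"\nTotal rows: {len(combined)}")
combined.head()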

Downloading cities_cities_conj_qa.csv ... 223 rows
Downloading cities_cities_disj_qa.csv ... 230 rows
Downloading cities_qa.csv ... 730 rows
Downloading common_claim_true_false_qa.csv ... 235 rows
Downloading companies_true_false_qa.csv ... 510 rows
Downloading counterfact_true_false_qa.csv ... 423 rows
Downloading larger_than_qa.csv ... 1000 rows
Downloading neg_cities_qa.csv ... 1000 rows
Downloading neg_sp_en_trans_qa.csv ... 354 rows
Downloading smaller_than_qa.csv ... 1000 rows
Downloading sp_en_trans_qa.csv ... 354 rows

Total rows: 6059


| | origin_file | statement | label | question | correct_answer | plausible_answer |
|---|---|---|---|---|---|---|
| 0 | cities_cities_conj_qa.csv | It is the case both that the city of Najafgarh... | 0 | Is it the case that the city of Najafgarh is i... | No | Yes |
| 1 | cities_cities_conj_qa.csv | It is the case both that the city of Cimahi is... | 0 | Is it the case that the city of Cimahi is in R... | No | Yes |
| 2 | cities_cities_conj_qa.csv | It is the case both that the city of Neijiang ... | 1 | What countries are the cities of Neijiang and ... | China | United States |
| 3 | cities_cities_conj_qa.csv | It is the case both that the city of Suqian is... | 1 | Which country are the cities of Suqian and Zao... | China | Japan |
| 4 | cities_cities_conj_qa.csv | It is the case both that the city of Fes is in... | 1 | In which countries are the cities of Fes and B... | Morocco and China, respectively | France and Japan, respectively |
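A quick sanity check after concatenating is to confirm that the per-file row counts still add up to the combined total. The sketch below is only illustrative: the two small frames and the file names `first.csv` and `second.csv` are hypothetical stand-ins for the downloaded data, not part of the original notebook.

```python
import pandas as pd

# Hypothetical stand-ins for the downloaded frames; the notebook builds the
# real ones from the files listed in CSV_FILES.
frames = [
    pd.DataFrame({"statement": ["a", "b"], "label": [1, 0]}),
    pd.DataFrame({"statement": ["c"], "label": [1]}),
]
names = ["first.csv", "second.csv"]  # hypothetical file names

# Tag each frame with its source file, as the download loop does.
for name, df in zip(names, frames):
    df.insert(0, "origin_file", name)

combined = pd.concat(frames, ignore_index=True)

# Count rows per source file, keeping the order in which files first appear.
rows_per_file = combined.groupby("origin_file", sort=False).size()

# The per-file counts should account for every row in the combined table.
counts_match = int(rows_per_file.sum()) == len(combined)
print(rows_per_file)
print(f"Counts match total: {counts_match}")
```

Run against the real `combined` frame, the same `groupby` should reproduce the per-file row counts printed during the download.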


In [7]:
out_path = Path("../data/geometry_of_truth.csv")
out_path.parent.mkdir(parents=True, exist_ok=True)
combined.to_csv(out_path, index=False)
print(f"Saved to {out_path.resolve()}")

Saved to /Users/fletcaw1/Documents/Personal/personal-repos/SPAR-causal-probes/data/geometry_of_truth.csv
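Downstream code will often want the individual datasets back. One possible way to do that, sketched here under assumptions, is to reload the saved CSV and split it on `origin_file`. The helper name `split_by_origin` and the demo data are hypothetical; the demo writes to a temporary directory so it does not touch the real output file.

```python
import tempfile
from pathlib import Path

import pandas as pd


def split_by_origin(csv_path):
    """Reload a combined CSV and return {origin_file: DataFrame}.

    Hypothetical helper, not part of the original notebook.
    """
    combined = pd.read_csv(csv_path)
    return {
        name: group.drop(columns="origin_file").reset_index(drop=True)
        for name, group in combined.groupby("origin_file", sort=False)
    }


# Demo on a tiny made-up table written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo_combined.csv"
    pd.DataFrame(
        {
            "origin_file": ["a.csv", "a.csv", "b.csv"],
            "statement": ["x", "y", "z"],
            "label": [1, 0, 1],
        }
    ).to_csv(demo_path, index=False)
    parts = split_by_origin(demo_path)

for name, part in parts.items():
    print(name, len(part))
```

For the real data, the call would be `split_by_origin("../data/geometry_of_truth.csv")`, assuming the notebook's relative path.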
