# Internal States: Data Download

Downloads a subset of the QA CSVs in `data/internal_state/` from [ApolloResearch/deception-detection](https://github.com/ApolloResearch/deception-detection/tree/main/data/geometry_of_truth), concatenates them into a single CSV with an `origin_file` column, and saves the result to `data/internal_state_qa.csv`.

We select the files that we believe do not overlap with the Geometry of Truth QA data (*to be confirmed*).
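
One way to check the overlap claim, once both datasets are loaded, is to compare normalised statement text. This is a minimal sketch, not something the notebook currently does: the helper names `normalize` and `overlapping_statements` are hypothetical, the Geometry of Truth frame is assumed to have a `statement` column, and the toy frames stand in for the real data.

```python
import pandas as pd


def normalize(statements: pd.Series) -> pd.Series:
    """Lower-case and strip whitespace so trivial differences do not hide a match."""
    return statements.astype(str).str.strip().str.lower()


def overlapping_statements(internal_df, other_df, column="statement"):
    """Return the rows of internal_df whose statement also appears in other_df."""
    other = set(normalize(other_df[column]))
    mask = normalize(internal_df[column]).isin(other)
    return internal_df[mask]


# Toy stand-ins for the real frames (assumption: both have a `statement` column).
internal_toy = pd.DataFrame(
    {"statement": ["The hedgehog uses walking for locomotion.", "Some other statement."]}
)
got_toy = pd.DataFrame(
    {"statement": ["  the hedgehog uses walking for locomotion. ", "Unrelated statement."]}
)

shared = overlapping_statements(internal_toy, got_toy)
print(f"{len(shared)} overlapping statement(s)")
```

An empty result on the real data would support the selection above. Exact matching after normalisation will miss paraphrases, so it only gives a lower bound on the overlap.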

In [4]:
import io
import urllib.request
from pathlib import Path

import pandas as pd

In [7]:
BASE_URL = "https://raw.githubusercontent.com/ApolloResearch/deception-detection/main/data/internal_state/"

CSV_FILES = [
    "animals_true_false_qa.csv",
    "cities_true_false_qa.csv",
    "elements_true_false_qa.csv",
    "inventions_true_false_qa.csv",
]

In [8]:
frames = []

for filename in CSV_FILES:
    url = BASE_URL + filename
    print(f"Downloading {filename} ...", end=" ")
    # Fetch the raw CSV text from GitHub.
    with urllib.request.urlopen(url) as response:
        content = response.read().decode("utf-8")
    df = pd.read_csv(io.StringIO(content))
    # Record the source file as the first column so rows stay traceable after concatenation.
    df.insert(0, "origin_file", filename)
    frames.append(df)
    print(f"{len(df)} rows")

combined = pd.concat(frames, ignore_index=True)
print(f"\nTotal rows: {len(combined)}")
combined.head()

Downloading animals_true_false_qa.csv ... 1000 rows
Downloading cities_true_false_qa.csv ... 1000 rows
Downloading elements_true_false_qa.csv ... 930 rows
Downloading inventions_true_false_qa.csv ... 876 rows

Total rows: 3806


|   | origin_file | statement | label | question | correct_answer | plausible_answer |
|---|---|---|---|---|---|---|
| 0 | animals_true_false_qa.csv | The dog has the atomic number of bird. | 0 | What is the atomic number of the dog? | The dog does not have an atomic number, as dog... | 17 |
| 1 | animals_true_false_qa.csv | Human uses for slug include conservation, rese... | 0 | What are some human uses for slugs? | Conservation, research, zoos | Cooking, gardening, pest control |
| 2 | animals_true_false_qa.csv | The giant anteater has the atomic number of fish. | 0 | What is the atomic number of the giant anteater? | The giant anteater does not have an atomic num... | The giant anteater has an atomic number of 3. |
| 3 | animals_true_false_qa.csv | The hedgehog uses walking for locomotion. | 1 | What method of locomotion do hedgehogs use? | Walking | Running |
| 4 | animals_true_false_qa.csv | The sea turtle uses walking for locomotion. | 0 | What method of locomotion do sea turtles prima... | Swimming | Walking |
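
Since `label` is what downstream work is likely to depend on, it can help to check how the labels are distributed per source file. The sketch below is illustrative only: `summarize_by_origin` is a hypothetical helper, it assumes `label` holds integer 0/1 values as in the rows shown, and the toy frame stands in for `combined`.

```python
import pandas as pd


def summarize_by_origin(df: pd.DataFrame) -> pd.DataFrame:
    """Count rows and true/false labels per source file (assumes 0/1 labels)."""
    summary = (
        df.groupby("origin_file")["label"]
        .agg(rows="size", n_true="sum")
        .reset_index()
    )
    summary["n_false"] = summary["rows"] - summary["n_true"]
    return summary


# Toy stand-in for `combined`.
toy = pd.DataFrame(
    {
        "origin_file": ["a.csv", "a.csv", "b.csv"],
        "label": [1, 0, 1],
    }
)
summary = summarize_by_origin(toy)
print(summary)
```

On the real data, calling `summarize_by_origin(combined)` should reproduce the per-file row counts printed during the download.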


In [10]:
out_path = Path("../data/internal_state_qa.csv")
out_path.parent.mkdir(parents=True, exist_ok=True)
combined.to_csv(out_path, index=False)
print(f"Saved to {out_path.resolve()}")

Saved to /Users/fletcaw1/Documents/Personal/personal-repos/SPAR-causal-probes/data/internal_state_qa.csv
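
The download loop in this notebook stops at the first failed request and sets no timeout. A more defensive variant could look like the sketch below. The names `fetch_text` and `load_frames`, the injectable `fetch` argument, and the 30-second timeout are all assumptions made for illustration, not part of the original notebook.

```python
import io
import urllib.error
import urllib.request

import pandas as pd


def fetch_text(url: str, timeout: float = 30.0) -> str:
    """Download a URL and decode it as UTF-8 (the timeout value is an assumption)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def load_frames(base_url, filenames, fetch=fetch_text):
    """Return (frames, failures); failures maps filename -> error message."""
    frames, failures = [], {}
    for filename in filenames:
        try:
            content = fetch(base_url + filename)
        except (urllib.error.URLError, TimeoutError) as exc:
            # Record the failure and keep going with the remaining files.
            failures[filename] = str(exc)
            continue
        df = pd.read_csv(io.StringIO(content))
        df.insert(0, "origin_file", filename)
        frames.append(df)
    return frames, failures
```

With the notebook's own names this would be called as `frames, failures = load_frames(BASE_URL, CSV_FILES)`, followed by a check that `failures` is empty before concatenating. Passing `fetch` as an argument keeps the function testable without network access.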
