# Publicly available trading data for Representatives and Senators

This data is made available because of the 2012 STOCK Act.

## Data for each group can be found here:
- Reps: https://disclosures-clerk.house.gov/FinancialDisclosure
- Senators: https://efdsearch.senate.gov/search/home/

# Data from the House of Representatives
We'll start with data from the House of Representatives, since it is the larger of the two chambers and most of the people whose trading data we care about are Representatives.

The Clerk of the House provides the data as zip files, one per year. The data in these zip files serve as an index to all of the original pdfs that each member must submit.

The zip files from the public disclosure website come in either text or xml format.
Their schema is the following:
- Prefix - the title of the person, nullable
- Last - the Representative's last name
- First - the Representative's first name
- Suffix - the Representative's suffix, nullable
- FilingType - one of C, D, P, W, X (more info below)
- StateDst - the state and district the person is representing
- Year - the year of the filing
- FilingDate - the date of the filing
- DocID - the internal id of the document, used for downloading the original pdf
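
To make the schema concrete, here is a minimal sketch of reading the text version of an index into typed records. It assumes the text file is tab-delimited with a header row whose column names match the list above; the `Filing` dataclass, the `parse_index` helper, and the sample rows are hypothetical and only for illustration. `FilingDate` is kept as a string because the date format is not specified here.

```python
import csv
from dataclasses import dataclass
from io import StringIO
from typing import Optional


@dataclass
class Filing:
    """One row of the index, using the column names from the schema above."""
    prefix: Optional[str]
    last: str
    first: str
    suffix: Optional[str]
    filing_type: str
    state_dst: str
    year: int
    filing_date: str  # kept as text; the date format is not assumed
    doc_id: str


def parse_index(text: str) -> list[Filing]:
    """Parse the contents of one index file (assumed tab-delimited with a header)."""
    reader = csv.DictReader(StringIO(text), delimiter="\t")
    filings = []
    for row in reader:
        filings.append(
            Filing(
                prefix=row["Prefix"] or None,   # nullable column
                last=row["Last"],
                first=row["First"],
                suffix=row["Suffix"] or None,   # nullable column
                filing_type=row["FilingType"],
                state_dst=row["StateDst"],
                year=int(row["Year"]),
                filing_date=row["FilingDate"],
                doc_id=row["DocID"],
            )
        )
    return filings


# Made-up sample rows for illustration only (not real filings).
sample = (
    "Prefix\tLast\tFirst\tSuffix\tFilingType\tStateDst\tYear\tFilingDate\tDocID\n"
    "Hon.\tDoe\tJane\t\tP\tCA12\t2024\t1/15/2024\t20000001\n"
    "\tRoe\tRichard\tJr.\tC\tTX07\t2024\t2/1/2024\t10000001\n"
)
filings = parse_index(sample)
print(filings[0])
```

Empty `Prefix` and `Suffix` values are mapped to `None` so that the nullable columns are easy to test for later.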

## A breakdown of the FilingTypes

C - Candidacy Financial Disclosure Report
    Candidates are required to disclose their net worth and assets.
    Example: https://disclosures-clerk.house.gov/public_disc/financial-pdfs/2024/10061382.pdf

D - Financial Disclosure Report
    Candidates are required to disclose if they have received more than $5,000 for their campaign.
    Example: https://disclosures-clerk.house.gov/public_disc/financial-pdfs/2024/40003638.pdf

P - Periodic Transaction Report
    Members are required to disclose each transaction within 45 days of the transaction date.
    Example: https://disclosures-clerk.house.gov/public_disc/ptr-pdfs/2024/20025368.pdf

W - Withdrawal of Candidacy
    Example: https://disclosures-clerk.house.gov/public_disc/financial-pdfs/2024/7923.pdf

X - Financial Disclosure Extension Request
    Example: https://disclosures-clerk.house.gov/public_disc/financial-pdfs/2024/30022024.pdf

**P filings are what we are most interested in: they provide the trade type, actual stock tickers, general amounts, and dates.**
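
Since only the P filings matter for trade data, a small filter over the index rows is usually the first step. The sketch below assumes each row is a dict keyed by the schema's column names; `FILING_TYPES`, `select_ptrs`, and the sample rows are hypothetical names and data used only for illustration.

```python
# Lookup of the FilingType codes described above.
FILING_TYPES = {
    "C": "Candidacy Financial Disclosure Report",
    "D": "Financial Disclosure Report",
    "P": "Periodic Transaction Report",
    "W": "Withdrawal of Candidacy",
    "X": "Financial Disclosure Extension Request",
}


def select_ptrs(rows):
    """Keep only the Periodic Transaction Report (P) rows."""
    return [row for row in rows if row["FilingType"] == "P"]


# Made-up rows for illustration only (not real filings).
rows = [
    {"Last": "Doe", "FilingType": "P", "Year": "2024", "DocID": "20000001"},
    {"Last": "Roe", "FilingType": "C", "Year": "2024", "DocID": "10000001"},
    {"Last": "Poe", "FilingType": "X", "Year": "2024", "DocID": "30000001"},
]

ptrs = select_ptrs(rows)
for row in ptrs:
    print(row["Last"], FILING_TYPES[row["FilingType"]])
```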

### Where are the original pdfs stored?
Each pdf is stored at a URL built from the FilingType, Year, and DocID: the FilingType selects the base URL, which is followed by `/{Year}/{DocID}.pdf`, as in the examples above.

Base URL for C, D, W, X: https://disclosures-clerk.house.gov/public_disc/financial-pdfs

Base URL for P: https://disclosures-clerk.house.gov/public_disc/ptr-pdfs
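
As a rough sketch of how these pieces combine, the helper below builds a pdf URL from one index row. The pattern `{base}/{Year}/{DocID}.pdf` is inferred from the example links in the FilingType breakdown; `build_pdf_url` and `FOLDERS` are hypothetical names, not part of any official API.

```python
BASE = "https://disclosures-clerk.house.gov/public_disc"

# P filings live under ptr-pdfs; every other FilingType uses financial-pdfs.
FOLDERS = {
    "C": "financial-pdfs",
    "D": "financial-pdfs",
    "W": "financial-pdfs",
    "X": "financial-pdfs",
    "P": "ptr-pdfs",
}


def build_pdf_url(filing_type: str, year: int, doc_id: str) -> str:
    """Return the URL of the original pdf for one index row."""
    try:
        folder = FOLDERS[filing_type]
    except KeyError:
        raise ValueError(f"Unknown FilingType: {filing_type!r}") from None
    return f"{BASE}/{folder}/{year}/{doc_id}.pdf"


print(build_pdf_url("P", 2024, "20025368"))
# → https://disclosures-clerk.house.gov/public_disc/ptr-pdfs/2024/20025368.pdf
```

Raising on an unknown FilingType, rather than silently defaulting to one of the two folders, makes it obvious if the index ever contains a code not covered here.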

# Download all available data

In [1]:
import requests
from pathlib import Path
from zipfile import ZipFile
from io import BytesIO

# The yearly index archives are served from the same folder as the financial disclosure pdfs.
base_url = "https://disclosures-clerk.house.gov/public_disc/financial-pdfs/"
years = range(2008, 2025)  # 2008 through 2024 inclusive
output_dir = Path("../data/disclosures")

# Create the output directory (and any missing parents) if needed.
output_dir.mkdir(parents=True, exist_ok=True)

for year in years:
    url = f"{base_url}{year}FD.zip"
    # A timeout keeps the loop from hanging on an unresponsive request.
    response = requests.get(url, timeout=60)
    if response.status_code == 200:
        # Open the archive in memory and keep only the text version of the index.
        with ZipFile(BytesIO(response.content)) as zip_file:
            txt_file = f"{year}FD.txt"
            if txt_file in zip_file.namelist():
                zip_file.extract(txt_file, output_dir)
                print(f"Successfully downloaded and extracted {txt_file} for {year}")
            else:
                print(f"No {txt_file} found in the zip file for {year}")
    else:
        print(f"Failed to download data for {year}")

Successfully downloaded and extracted 2008FD.txt for 2008
Successfully downloaded and extracted 2009FD.txt for 2009
Successfully downloaded and extracted 2010FD.txt for 2010
Successfully downloaded and extracted 2011FD.txt for 2011
Successfully downloaded and extracted 2012FD.txt for 2012
Successfully downloaded and extracted 2013FD.txt for 2013
Successfully downloaded and extracted 2014FD.txt for 2014
Successfully downloaded and extracted 2015FD.txt for 2015
Successfully downloaded and extracted 2016FD.txt for 2016
Successfully downloaded and extracted 2017FD.txt for 2017
Successfully downloaded and extracted 2018FD.txt for 2018
Successfully downloaded and extracted 2019FD.txt for 2019
Successfully downloaded and extracted 2020FD.txt for 2020
Successfully downloaded and extracted 2021FD.txt for 2021
Successfully downloaded and extracted 2022FD.txt for 2022
Successfully downloaded and extracted 2023FD.txt for 2023
Successfully downloaded and extracted 2024FD.txt for 2024
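
With the index files extracted, one possible next step is to tally how many filings of each FilingType exist per year. This is a sketch under stated assumptions: the files are assumed to be tab-delimited with a header row and readable as UTF-8 (with or without a byte order mark), and `count_filing_types` is a hypothetical helper name.

```python
import csv
from collections import Counter
from pathlib import Path


def count_filing_types(directory: Path) -> dict[int, Counter]:
    """Count FilingType codes per year across the extracted index files."""
    counts = {}
    # Match only files named like 2024FD.txt.
    for path in sorted(directory.glob("[0-9][0-9][0-9][0-9]FD.txt")):
        year = int(path.name[:4])
        # Assumption: tab-delimited text; utf-8-sig tolerates an optional BOM.
        with path.open(newline="", encoding="utf-8-sig") as handle:
            reader = csv.DictReader(handle, delimiter="\t")
            counts[year] = Counter(row["FilingType"] for row in reader)
    return counts


data_dir = Path("../data/disclosures")
if data_dir.is_dir():
    for year, tally in count_filing_types(data_dir).items():
        print(year, dict(tally))
```

A growing share of `P` rows in later years would be consistent with periodic transaction reporting being introduced by the STOCK Act, but that is something to check against the actual counts rather than assume.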
