Reproducible steps for getting 10-K filings from [EDGAR](https://www.sec.gov/edgar).


# Example

1. Navigate to the [EDGAR company search](https://www.sec.gov/edgar/searchedgar/companysearch).
2. Enter "ENV" into the search box.
3. On the right-hand side, expand "10-K (annual reports) and 10-Q (quarterly reports)".
4. Open the topmost 10-K in the list.

As of 2024/01/03, these steps give the following company page and 10-K.

"ENV" is:

* https://www.sec.gov/edgar/browse/?CIK=1337619
* https://www.sec.gov/ix?doc=/Archives/edgar/data/1337619/000133761923000012/env-20221231.htm

"MSFT" is:

* https://www.sec.gov/edgar/browse/?CIK=789019
* https://www.sec.gov/ix?doc=/Archives/edgar/data/789019/000095017023035122/msft-20230630.htm
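
Both kinds of link above follow a fixed pattern, so they can be rebuilt from their parts. The sketch below is illustrative only: the helper names `browse_url` and `filing_url` are hypothetical, and the patterns are inferred from the two examples listed here rather than taken from any EDGAR specification.

```python
def browse_url(cik: int) -> str:
    """Company landing page on EDGAR for a given CIK."""
    return f"https://www.sec.gov/edgar/browse/?CIK={cik}"


def filing_url(cik: int, accession: str, document: str) -> str:
    """Viewer URL for one document inside a filing.

    `accession` is the accession number as it appears in the archive
    path, i.e. without dashes. `document` is the primary file name.
    """
    return (
        "https://www.sec.gov/ix?doc="
        f"/Archives/edgar/data/{cik}/{accession}/{document}"
    )


# Rebuild the two MSFT links from the example above.
print(browse_url(789019))
print(filing_url(789019, "000095017023035122", "msft-20230630.htm"))
```

The accession number and document name still have to be looked up for each filing, which is why the notebook below delegates that work to a library.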


# Python

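1. Get all the `tickers` in _data/tickers.csv_, skipping any listed in _data/tickers.bad.csv_
2. For each `ticker` in `tickers`
   1. If _data/10-k/raw_ is missing the xhtml associated with the `ticker`, download it
   2. If the download fails, record the `ticker` in _data/tickers.bad.csv_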
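
The "is missing" check in the steps above can be pictured as a small filter over the ticker list. This is a minimal sketch, not the notebook's actual code: `missing_tickers` is a hypothetical helper, and it assumes one `{ticker}.xhtml` file per ticker, as the download cell further down does.

```python
import tempfile
from pathlib import Path


def missing_tickers(tickers: list[str], raw_folder: Path) -> list[str]:
    """Return the tickers that have no `{ticker}.xhtml` file in raw_folder."""
    return [t for t in tickers if not (raw_folder / f"{t}.xhtml").exists()]


# Demonstrate with a throwaway folder that already holds one filing.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "MSFT.xhtml").write_bytes(b"<html/>")
    print(missing_tickers(["ENV", "MSFT"], folder))  # ['ENV']
```

Checking for the file before downloading makes the notebook safe to re-run after an interruption, since finished tickers are skipped.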


In [None]:
import csv
from pathlib import Path
from sec_downloader import Downloader  # type: ignore
from tqdm.notebook import tqdm

In [None]:
data_folder = Path('./data/10-k')
raw_folder = data_folder.joinpath('raw')
tickers_file = Path('./data/tickers.csv')
bad_tickers_file = Path('./data/tickers.bad.csv')
company_name = 'TextCorpusLabs'
email_address = 'sokpheanal.huynh@gmail.com'

# Create the output folder if needed; the tickers file must already exist.
raw_folder.mkdir(parents = True, exist_ok = True)
if not tickers_file.exists():
    raise FileNotFoundError('Run GetTickers.ipynb first to generate the tickers file')

In [None]:
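# Read the ticker symbols from the second column, skipping the header row.
with open(tickers_file, mode = 'r', encoding = 'utf-8') as fp:
    reader = csv.reader(fp)
    next(reader)
    tickers = [row[1] for row in reader]

# Load the tickers that failed on a previous run, if any were recorded.
if bad_tickers_file.exists():
    with open(bad_tickers_file, mode = 'r', encoding = 'utf-8') as fp:
        reader = csv.reader(fp)
        next(reader)
        bad_tickers = [row[0] for row in reader]
else:
    bad_tickers = []

# Keep only the tickers that are not known to fail, in sorted order.
tickers = [ticker for ticker in sorted(tickers) if ticker not in bad_tickers]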

The following block downloads the filings into _./data/10-k/raw_, one _{ticker}.xhtml_ file per ticker.
A cached version is available in "Datasets" on GitHub's [Releases](https://github.com/TextCorpusLabs/Edgar/releases) page.
The file you want is called _10-K.raw.zip_.
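
If you start from the cached archive instead of downloading, it needs to be unpacked into the raw folder first. The sketch below shows one way to do that with the standard library. It rests on assumptions: `extract_cache` is a hypothetical helper, the zip is assumed to sit next to the notebook, and the archive is assumed to hold the `.xhtml` files at its top level (if it wraps them in a folder, they will land in a subfolder instead).

```python
import zipfile
from pathlib import Path


def extract_cache(zip_path: Path, raw_folder: Path) -> int:
    """Unpack every member of zip_path into raw_folder.

    Returns the number of files (not directories) in the archive.
    """
    raw_folder.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(raw_folder)
        return sum(1 for info in archive.infolist() if not info.is_dir())


zip_path = Path("./10-K.raw.zip")  # assumed download location
if zip_path.exists():
    count = extract_cache(zip_path, Path("./data/10-k/raw"))
    print(f"extracted {count} files")
```

Because the download cell skips any ticker whose file already exists, running it after unpacking only fetches filings missing from the cache.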

In [None]:
dl = Downloader(company_name, email_address)
for ticker in tqdm(tickers):
    xhtml_file = raw_folder.joinpath(f'{ticker}.xhtml')
    if xhtml_file.exists():
        continue
    try:
        xhtml = dl.get_filing_html(ticker = ticker, form = "10-K")
        with open(xhtml_file, mode = 'wb') as fp:
            fp.write(xhtml)
    except Exception:
        # Record the failure, but let KeyboardInterrupt stop the loop.
        bad_tickers.append(ticker)

with open(bad_tickers_file, mode = 'w', encoding = 'utf-8', newline = '') as fp:
    writer = csv.writer(fp, delimiter = ',', quotechar = '"', quoting = csv.QUOTE_ALL)
    writer.writerow(['Ticker'])
    for ticker in bad_tickers:
        writer.writerow([ticker])