# Sec Financial Statement Data Sets Tools - Quickstart

## TL;DR

This notebook gives a first introduction to using the secfsdstools (Sec Financial Data Sets Tools) Python package: https://pypi.org/project/secfsdstools/

It is designed to work with the data provided by the [SEC Financial Statement Data Sets](https://www.sec.gov/dera/data/financial-statement-data-sets) (SFSDS).

The SFSDS contains data from all reports that were filed with the SEC since 2009, for instance all annual and quarterly reports. The main assets that can be retrieved from this data set are the financial statements (balance sheet, income statement, and cash flow statement).

First, this notebook shows how to install and configure the library. After that, it shows how to extract the financial statements from the data set.

For a detailed definition of the data set see https://www.sec.gov/files/aqfs.pdf.

## Principles / Concepts

The library downloads all available data files. Currently, this is a total of about 2 GB, and roughly 200 MB are added every year. Every quarter has its own compressed file that contains the data from that quarter. Once the files are downloaded, all the information is on your disk and there is no need to make additional calls to sec.gov.

The reports are indexed in a simple sqlite database, which makes finding a report much more efficient. As of the end of 2022, there are over 500,000 reports from more than 15,000 companies.

The library is storage and memory efficient. The data is directly read from the compressed files. Moreover, only the requested data is read and instantiated as pandas dataframes.
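The idea of reading directly from the compressed files can be illustrated with Python's standard `zipfile` and `csv` modules. The sketch below is not the library's code: it builds a tiny in-memory archive containing a made-up `sub.txt` (tab-separated, as in the SEC quarter files) and streams only the rows of one CIK out of it, without extracting anything to disk.

```python
import csv
import io
import zipfile


def read_sub_rows(zip_bytes: bytes, cik: str) -> list:
    """Return the sub.txt rows of one CIK without extracting the archive.

    Assumes sub.txt is a tab-separated file with a header row.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        with archive.open("sub.txt") as raw:
            # Wrap the binary member stream so csv can read it line by line.
            text = io.TextIOWrapper(raw, encoding="utf-8")
            reader = csv.DictReader(text, delimiter="\t")
            # Keep only the requested rows; all others are discarded as they stream past.
            return [row for row in reader if row["cik"] == cik]


# Build a tiny in-memory archive that mimics a quarter file (the third row is made up).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
    archive.writestr(
        "sub.txt",
        "adsh\tcik\tname\tform\n"
        "0000320193-22-000108\t320193\tAPPLE INC\t10-K\n"
        "0001193125-22-278435\t320193\tAPPLE INC\t8-K\n"
        "0000000000-22-000001\t1234567\tOTHER CORP\t10-Q\n",
    )

rows = read_sub_rows(buffer.getvalue(), cik="320193")
print([r["form"] for r in rows])  # ['10-K', '8-K']
```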

## Installation
To install the library, use pip:
```
pip install secfsdstools
```

## Configuration / Setup

In order to be used, the library needs to know where to store the compressed files from the SFSDS and where to store the sqlite database file. This is configured in a configuration file.

The easiest way to create the configuration file is to import the update method of the library and run it:

In [None]:
from secfsdstools.update import update

update()

If you run the method for the first time, it will fail with the following message:
```
No config file found at home directory C:\Users\hansj\.secfsdstools.cfg.
Config file created at <user-home>/.secfsdstools.cfg. Please check the content and rerun.
```

As the message says, a default config file was created in your user home directory. It has the following content:
```
[DEFAULT]
downloaddirectory = <userhome>/secfsdstools/data/dld
dbdirectory = <userhome>/secfsdstools/data/db
useragentemail = your.email@goeshere.com
```
The downloaddirectory is the folder into which the compressed data files are downloaded, and the sqlite db file is created inside the folder defined in dbdirectory.
The useragentemail is set in the header of the requests made to sec.gov. It should be your email address; however, since only very few requests are made, it doesn't really matter whether you change it or not.
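The config file uses the INI format, so its values can be inspected with Python's standard `configparser`. This is only a sketch for checking the settings; secfsdstools reads the file itself. To inspect your real file, you could call `parser.read(Path.home() / ".secfsdstools.cfg")` instead of `read_string`.

```python
import configparser

# Content of the default config file as shown above; <userhome> stands for your home directory.
CONFIG_TEXT = """
[DEFAULT]
downloaddirectory = <userhome>/secfsdstools/data/dld
dbdirectory = <userhome>/secfsdstools/data/db
useragentemail = your.email@goeshere.com
"""

parser = configparser.ConfigParser()
parser.read_string(CONFIG_TEXT)

# Keys in the [DEFAULT] section are available through parser["DEFAULT"].
defaults = parser["DEFAULT"]
print(defaults["downloaddirectory"])  # <userhome>/secfsdstools/data/dld
print(defaults["useragentemail"])     # your.email@goeshere.com
```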

If you plan to use Jupyter, make sure that the configured directories are at a location your Jupyter process has access to. The default directory (your user home directory) will work.
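A quick way to verify from inside Jupyter that a configured directory is usable is sketched below. The helper `ensure_writable_dir` is hypothetical and not part of secfsdstools; it creates the directory if necessary and checks write permission with `os.access`.

```python
import os
import tempfile
from pathlib import Path


def ensure_writable_dir(path: str) -> bool:
    """Create `path` (and its parents) if missing and report whether this process may write to it."""
    directory = Path(path).expanduser()
    directory.mkdir(parents=True, exist_ok=True)
    return os.access(directory, os.W_OK)


# In a notebook you would pass the configured paths, e.g. "~/secfsdstools/data/dld".
# A temporary directory is used here so the example leaves nothing behind.
with tempfile.TemporaryDirectory() as tmp:
    print(ensure_writable_dir(os.path.join(tmp, "secfsdstools", "data", "dld")))  # True
```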

Next, run the update method again.

In [1]:
# to ensure that the logging statements are shown in the Jupyter output
import logging

logger = logging.getLogger()
logger.setLevel(logging.INFO)

In [2]:
from secfsdstools.update import update

update()

INFO:root:    missing entries 24
INFO:secfsdstools.c_download.secdownloading:    start to download 2022q4.zip 
INFO:secfsdstools.c_download.secdownloading:    start to download 2022q3.zip 
INFO:secfsdstools.c_download.secdownloading:    start to download 2022q2.zip 
INFO:secfsdstools.c_download.secdownloading:    start to download 2022q1.zip 
INFO:secfsdstools.c_download.secdownloading:    start to download 2021q4.zip 
INFO:secfsdstools.c_download.secdownloading:    start to download 2021q3.zip 
INFO:secfsdstools.c_download.secdownloading:    start to download 2021q2.zip 
INFO:secfsdstools.c_download.secdownloading:    start to download 2021q1.zip 
INFO:secfsdstools.c_download.secdownloading:    start to download 2020q4.zip 
INFO:secfsdstools.c_download.secdownloading:    start to download 2020q3.zip 
INFO:secfsdstools.c_download.secdownloading:    start to download 2020q2.zip 
INFO:secfsdstools.c_download.secdownloading:    start to download 2020q1.zip 
INFO:secfsdstools.c_download.se

This may take a few minutes.

The new quarterly zip files become available at the beginning of every quarter (January, April, July, October). Hence, you have to run update() at the beginning of every quarter to get the data for the reports from the previous quarter.
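Because every quarter has its own archive named like `2022q4.zip` (see the log output above), the set of files that should exist at a given date can be worked out. The following sketch is a hypothetical illustration, not how `update()` is implemented. It assumes that a quarter's file is available as soon as the quarter is over (in practice it may appear a few days later), and the example simply uses 2009 as the first year.

```python
from datetime import date


def available_quarter_files(first_year: int, today: date) -> list:
    """List archive names (e.g. '2022q4.zip') for every quarter that has already ended."""
    current_quarter = (today.month - 1) // 3 + 1
    names = []
    for year in range(first_year, today.year + 1):
        for quarter in range(1, 5):
            # The running quarter and anything after it are not published yet.
            if (year, quarter) >= (today.year, current_quarter):
                break
            names.append(f"{year}q{quarter}.zip")
    return names


def missing_files(expected: list, downloaded: set) -> list:
    """Return the expected archives that are not on disk yet, newest first."""
    return sorted((name for name in expected if name not in downloaded), reverse=True)


expected = available_quarter_files(2009, date(2023, 1, 15))
print(expected[-1])  # 2022q4.zip
print(missing_files(expected, downloaded={"2009q1.zip"})[:2])  # ['2022q4.zip', '2022q3.zip']
```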

## Read Financial Statements for Apple Annual Report 2022

### Finding the CIK of the company

In order to read data for a company, we have to know its CIK number:

> Central Index Key (CIK). Ten digit number assigned by the SEC to each registrant that submits filings.
>
> <cite>https://www.sec.gov/files/aqfs.pdf</cite>
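The quote describes the CIK as a ten-digit number, while this notebook uses it without leading zeros (e.g. `320193` for Apple). A minimal sketch, with a hypothetical helper name, of converting between the two forms:

```python
from typing import Union


def to_ten_digit_cik(cik: Union[int, str]) -> str:
    """Return the CIK zero-padded to ten digits, e.g. 320193 -> '0000320193'."""
    return f"{int(cik):010d}"


print(to_ten_digit_cik(320193))    # 0000320193
print(to_ten_digit_cik("320193"))  # 0000320193
# The reverse direction is just int(): leading zeros are dropped.
print(int("0000320193"))           # 320193
```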

There are several ways to find the CIK:
1. You can use the company search form on sec.gov: https://www.sec.gov/edgar/searchedgar/companysearch
1. You can point a sqlite browsing tool (e.g., [DB Browser for SQLite](https://sqlitebrowser.org/)) to the sqlite db file inside the directory you configured in the config file above and search for the company in the index_reports table.
1. You can use the `IndexSearch` class from the secfsdstools library, as shown below.


In [3]:
from secfsdstools.e_read.searching import IndexSearch

# initialize the search class
search = IndexSearch.get_index_search()

# search for the company name or part of it
result_df = search.find_company_by_name('apple')
result_df # show the result

|   | name | cik |
|---|------|-----|
| 0 | APPLE GREEN HOLDING, INC. | 1510976 |
| 1 | APPLE HOSPITALITY REIT, INC. | 1418121 |
| 2 | APPLE INC | 320193 |
| 3 | APPLE REIT EIGHT, INC. | 1387361 |
| 4 | APPLE REIT NINE, INC. | 1418121 |
| 5 | APPLE REIT SEVEN, INC. | 1329011 |
| 6 | APPLE REIT SIX INC | 1277151 |
| 7 | APPLE REIT TEN, INC. | 1498864 |
| 8 | APPLETON PAPERS INC/WI | 1144326 |
| 9 | DR PEPPER SNAPPLE GROUP, INC. | 1418135 |
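The result above presumably comes from a substring match on the company name, which would explain why "DR PEPPER SNAPPLE GROUP, INC." shows up. The sketch below approximates such a search with Python's built-in `sqlite3` on an in-memory table. The table name `index_reports` is taken from the text above, but the column layout, the helper `search_names`, and the rows marked as made-up are assumptions; check the real schema in your db file before reusing the query.

```python
import sqlite3

# In-memory stand-in for the library's sqlite file; the columns are an assumption.
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE index_reports (adsh TEXT, cik INTEGER, name TEXT, form TEXT, filed INTEGER)"
)
connection.executemany(
    "INSERT INTO index_reports VALUES (?, ?, ?, ?, ?)",
    [
        ("0000320193-22-000108", 320193, "APPLE INC", "10-K", 20221028),
        ("0001193125-22-278435", 320193, "APPLE INC", "8-K", 20221107),
        # made-up rows for illustration
        ("0000000000-22-000002", 1418135, "DR PEPPER SNAPPLE GROUP, INC.", "10-K", 20221101),
        ("0000000000-22-000001", 1, "EXAMPLE CORP", "10-Q", 20221101),
    ],
)


def search_names(conn, fragment):
    """Return distinct (name, cik) pairs whose name contains `fragment`."""
    # LIKE is case-insensitive for ASCII letters in SQLite, so 'apple' matches 'APPLE INC'.
    return conn.execute(
        "SELECT DISTINCT name, cik FROM index_reports WHERE name LIKE ? ORDER BY name",
        (f"%{fragment}%",),
    ).fetchall()


print(search_names(connection, "apple"))
# [('APPLE INC', 320193), ('DR PEPPER SNAPPLE GROUP, INC.', 1418135)]
```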


### Company information

"Apple Inc" seems to be the best match for our company, so let's check the details of the company.
To do that, we can use the `CompanyReader`:

In [3]:
from secfsdstools.e_read.companyreading import CompanyReader
apple_cik = 320193

apple_reader = CompanyReader.get_company_reader(cik=apple_cik)

Next, let us see what information was filed in the last report of that company:

In [4]:
apple_reader.get_latest_company_filing()

{'adsh': '0001193125-22-278435',
 'cik': '320193',
 'name': 'APPLE INC',
 'sic': '3571',
 'countryba': 'US',
 'stprba': 'CA',
 'cityba': 'CUPERTINO',
 'zipba': '95014',
 'bas1': 'ONE APPLE PARK WAY',
 'bas2': '',
 'baph': '(408) 996-1010',
 'countryma': 'US',
 'stprma': 'CA',
 'cityma': 'CUPERTINO',
 'zipma': '95014',
 'mas1': 'ONE APPLE PARK WAY',
 'mas2': '',
 'countryinc': 'US',
 'stprinc': 'CA',
 'ein': '942404110',
 'former': 'APPLE INC',
 'changed': '20070109',
 'afs': '1-LAF',
 'wksi': '0',
 'fye': '0930',
 'form': '8-K',
 'period': '20221031',
 'fy': '',
 'fp': '',
 'filed': '20221107',
 'accepted': '2022-11-07 06:27:00.0',
 'prevrpt': '0',
 'detail': '0',
 'instance': 'd400465d8k_htm.xml',
 'nciks': '1',
 'aciks': ''}

This is the information from the sub.txt file. To learn what the columns mean, see https://www.sec.gov/files/aqfs.pdf.
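All values in this dictionary are plain strings. The following sketch converts a few of them into Python date types, using the formats visible in the output above (`filed` and `period` as `YYYYMMDD`, `accepted` as a timestamp, `fye` as month and day `MMDD`). It works on a copy of part of the dictionary, so it runs without the library.

```python
from datetime import datetime

# A subset of the dictionary returned by get_latest_company_filing() above.
filing = {
    "adsh": "0001193125-22-278435",
    "form": "8-K",
    "period": "20221031",
    "fye": "0930",
    "filed": "20221107",
    "accepted": "2022-11-07 06:27:00.0",
}

filed = datetime.strptime(filing["filed"], "%Y%m%d").date()
period = datetime.strptime(filing["period"], "%Y%m%d").date()
# %f accepts the single fractional digit used in the 'accepted' field.
accepted = datetime.strptime(filing["accepted"], "%Y-%m-%d %H:%M:%S.%f")
# fye holds the fiscal year end as MMDD; Apple's fiscal year ends in September.
fye_month, fye_day = int(filing["fye"][:2]), int(filing["fye"][2:])

print(filed, period, (filed - period).days)  # 2022-11-07 2022-10-31 7
print(accepted, fye_month, fye_day)          # 2022-11-07 06:27:00 9 30
```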

Let's get a list of all the reports Apple has filed in the last 12 years. There are two methods: one returns a list of `IndexReport` instances, the other a pandas DataFrame with exactly the same information. To display the data, let's use the pandas DataFrame version:

In [5]:
apple_all_reports_df = apple_reader.get_all_company_reports_df()
apple_all_reports_df

|   | adsh | cik | name | form | filed | period | fullPath | originFile | originFileType | url |
|---|------|-----|------|------|-------|--------|----------|------------|----------------|-----|
| 0 | 0001193125-22-278435 | 320193 | APPLE INC | 8-K | 20221107 | 20221031.0 | C:\Users\hansj\secfsdstools\data\dld\2022q4.zip | 2022q4.zip | quarter | https://www.sec.gov/Archives/edgar/data/320193... |
| 1 | 0000320193-22-000107 | 320193 | APPLE INC | 8-K | 20221027 | 20221031.0 | C:\Users\hansj\secfsdstools\data\dld\2022q4.zip | 2022q4.zip | quarter | https://www.sec.gov/Archives/edgar/data/320193... |
| 2 | 0000320193-22-000108 | 320193 | APPLE INC | 10-K | 20221028 | 20220930.0 | C:\Users\hansj\secfsdstools\data\dld\2022q4.zip | 2022q4.zip | quarter | https://www.sec.gov/Archives/edgar/data/320193... |
| 3 | 0001193125-22-225365 | 320193 | APPLE INC | 8-K | 20220819 | 20220831.0 | C:\Users\hansj\secfsdstools\data\dld\2022q3.zip | 2022q3.zip | quarter | https://www.sec.gov/Archives/edgar/data/320193... |
| 4 | 0001193125-22-214914 | 320193 | APPLE INC | 8-K | 20220808 | 20220731.0 | C:\Users\hansj\secfsdstools\data\dld\2022q3.zip | 2022q3.zip | quarter | https://www.sec.gov/Archives/edgar/data/320193... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 82 | 0001193125-10-088957 | 320193 | APPLE INC | 10-Q | 20100421 | 20100331.0 | C:\Users\hansj\secfsdstools\data\dld\2010q2.zip | 2010q2.zip | quarter | https://www.sec.gov/Archives/edgar/data/320193... |
| 83 | 0001193125-10-012085 | 320193 | APPLE INC | 10-Q | 20100125 | 20091231.0 | C:\Users\hansj\secfsdstools\data\dld\2010q1.zip | 2010q1.zip | quarter | https://www.sec.gov/Archives/edgar/data/320193... |
| 84 | 0001193125-10-012091 | 320193 | APPLE INC | 10-K/A | 20100125 | 20090930.0 | C:\Users\hansj\secfsdstools\data\dld\2010q1.zip | 2010q1.zip | quarter | https://www.sec.gov/Archives/edgar/data/320193... |
| 85 | 0001193125-09-214859 | 320193 | APPLE INC | 10-K | 20091027 | 20090930.0 | C:\Users\hansj\secfsdstools\data\dld\2009q4.zip | 2009q4.zip | quarter | https://www.sec.gov/Archives/edgar/data/320193... |


We are only interested in the annual reports (10-K), so let's filter for them:

In [6]:
# of course, you could filter apple_all_reports_df yourself, but the get_all_company_reports_df method also has a forms parameter
apple_10ks_df = apple_reader.get_all_company_reports_df(forms=['10-k'])
apple_10ks_df

|   | adsh | cik | name | form | filed | period | fullPath | originFile | originFileType | url |
|---|------|-----|------|------|-------|--------|----------|------------|----------------|-----|
| 0 | 0000320193-22-000108 | 320193 | APPLE INC | 10-K | 20221028 | 20220930.0 | C:\Users\hansj\secfsdstools\data\dld\2022q4.zip | 2022q4.zip | quarter | https://www.sec.gov/Archives/edgar/data/320193... |
| 1 | 0000320193-21-000105 | 320193 | APPLE INC | 10-K | 20211029 | 20210930.0 | C:\Users\hansj\secfsdstools\data\dld\2021q4.zip | 2021q4.zip | quarter | https://www.sec.gov/Archives/edgar/data/320193... |
| 2 | 0000320193-20-000096 | 320193 | APPLE INC | 10-K | 20201030 | 20200930.0 | C:\Users\hansj\secfsdstools\data\dld\2020q4.zip | 2020q4.zip | quarter | https://www.sec.gov/Archives/edgar/data/320193... |
| 3 | 0000320193-19-000119 | 320193 | APPLE INC | 10-K | 20191031 | 20190930.0 | C:\Users\hansj\secfsdstools\data\dld\2019q4.zip | 2019q4.zip | quarter | https://www.sec.gov/Archives/edgar/data/320193... |
| 4 | 0000320193-18-000145 | 320193 | APPLE INC | 10-K | 20181105 | 20180930.0 | C:\Users\hansj\secfsdstools\data\dld\2018q4.zip | 2018q4.zip | quarter | https://www.sec.gov/Archives/edgar/data/320193... |
| 5 | 0000320193-17-000070 | 320193 | APPLE INC | 10-K | 20171103 | 20170930.0 | C:\Users\hansj\secfsdstools\data\dld\2017q4.zip | 2017q4.zip | quarter | https://www.sec.gov/Archives/edgar/data/320193... |
| 6 | 0001628280-16-020309 | 320193 | APPLE INC | 10-K | 20161026 | 20160930.0 | C:\Users\hansj\secfsdstools\data\dld\2016q4.zip | 2016q4.zip | quarter | https://www.sec.gov/Archives/edgar/data/320193... |
| 7 | 0001193125-15-356351 | 320193 | APPLE INC | 10-K | 20151028 | 20150930.0 | C:\Users\hansj\secfsdstools\data\dld\2015q4.zip | 2015q4.zip | quarter | https://www.sec.gov/Archives/edgar/data/320193... |
| 8 | 0001193125-14-383437 | 320193 | APPLE INC | 10-K | 20141027 | 20140930.0 | C:\Users\hansj\secfsdstools\data\dld\2014q4.zip | 2014q4.zip | quarter | https://www.sec.gov/Archives/edgar/data/320193... |
| 9 | 0001193125-13-416534 | 320193 | APPLE INC | 10-K | 20131030 | 20130930.0 | C:\Users\hansj\secfsdstools\data\dld\2013q4.zip | 2013q4.zip | quarter | https://www.sec.gov/Archives/edgar/data/320193... |


### Reading the details of a report

To get the details of a report, you need either the unique id of the report (the **adsh** identifier) or an `IndexReport` instance of that report.
First, let us see how this can be done with the adsh identifier:

In [7]:
from secfsdstools.e_read.reportreading import ReportReader

# getting the adsh of the latest 10-K report
apple_latest_10k_adsh = apple_10ks_df[:1].adsh.values[0]

apple_latest_10k_report_reader = ReportReader.get_report_by_adsh(adsh=apple_latest_10k_adsh)

The same is possible with an instance of the `IndexReport` class:

In [8]:
# this returns a list of IndexReport instances, instead of a pandas DataFrame. They are sorted by date, so the first entry is the latest report
apple_latest_10k_indexreport = apple_reader.get_all_company_reports(forms=['10-k'])[0]
apple_latest_10k_report_reader = ReportReader.get_report_by_indexreport(index_report=apple_latest_10k_indexreport)
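Both cells above identify the report by its adsh, which is the EDGAR accession number and has the shape `XXXXXXXXXX-YY-NNNNNN`: the ten-digit CIK of the entity that submitted the filing, a two-digit year, and a sequence number. The submitter can differ from the company itself, as in `0001193125-22-278435`, which belongs to an Apple 8-K. A minimal parsing sketch; the class and function names are hypothetical, and the year expansion assumes filings from 2000 onwards, which holds for the reports in this data set.

```python
from typing import NamedTuple


class AccessionNumber(NamedTuple):
    submitter_cik: int  # first ten digits: CIK of the entity that submitted the filing
    year: int           # two-digit year, expanded here assuming 20xx
    sequence: int       # running number of that submitter's filings in that year


def parse_adsh(adsh: str) -> AccessionNumber:
    """Split an adsh like '0000320193-22-000108' into its three parts."""
    submitter, yy, seq = adsh.split("-")
    if len(submitter) != 10 or len(yy) != 2 or len(seq) != 6:
        raise ValueError(f"unexpected adsh format: {adsh!r}")
    return AccessionNumber(int(submitter), 2000 + int(yy), int(seq))


print(parse_adsh("0000320193-22-000108"))
# AccessionNumber(submitter_cik=320193, year=2022, sequence=108)
print(parse_adsh("0001193125-22-278435").submitter_cik)  # 1193125
```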

Next, let us have a look at the balance sheet. First, we get the data for the financial statements. We want to make sure that all information from the period of the report and from the previous period is included. Therefore, we use the `financial_statements_for_period_and_previous_period()` method:

In [9]:
apple_latest_10k_fs_df = apple_latest_10k_report_reader.financial_statements_for_period_and_previous_period()
apple_latest_10k_fs_df[apple_latest_10k_fs_df.stmt == 'BS']

|   | adsh | tag | version | stmt | report | line | uom | negating | inpth | 20210930 | 20220930 | form |
|---|------|-----|---------|------|--------|------|-----|----------|-------|----------|----------|------|
| 0 | 0000320193-22-000108 | CashAndCashEquivalentsAtCarryingValue | us-gaap/2022 | BS | 5 | 3 | USD | 0 | 0 | 34940000000.0 | 23646000000.0 | 10-K |
| 1 | 0000320193-22-000108 | MarketableSecuritiesCurrent | us-gaap/2022 | BS | 5 | 4 | USD | 0 | 0 | 27699000000.0 | 24658000000.0 | 10-K |
| 2 | 0000320193-22-000108 | AccountsReceivableNetCurrent | us-gaap/2022 | BS | 5 | 5 | USD | 0 | 0 | 26278000000.0 | 28184000000.0 | 10-K |
| 3 | 0000320193-22-000108 | InventoryNet | us-gaap/2022 | BS | 5 | 6 | USD | 0 | 0 | 6580000000.0 | 4946000000.0 | 10-K |
| 4 | 0000320193-22-000108 | NontradeReceivablesCurrent | us-gaap/2022 | BS | 5 | 7 | USD | 0 | 0 | 25228000000.0 | 32748000000.0 | 10-K |
| 5 | 0000320193-22-000108 | OtherAssetsCurrent | us-gaap/2022 | BS | 5 | 8 | USD | 0 | 0 | 14111000000.0 | 21223000000.0 | 10-K |
| 6 | 0000320193-22-000108 | AssetsCurrent | us-gaap/2022 | BS | 5 | 9 | USD | 0 | 0 | 134836000000.0 | 135405000000.0 | 10-K |
| 7 | 0000320193-22-000108 | MarketableSecuritiesNoncurrent | us-gaap/2022 | BS | 5 | 11 | USD | 0 | 0 | 127877000000.0 | 120805000000.0 | 10-K |
| 8 | 0000320193-22-000108 | PropertyPlantAndEquipmentNet | us-gaap/2022 | BS | 5 | 12 | USD | 0 | 0 | 39440000000.0 | 42117000000.0 | 10-K |
| 9 | 0000320193-22-000108 | OtherAssetsNoncurrent | us-gaap/2022 | BS | 5 | 13 | USD | 0 | 0 | 48849000000.0 | 54428000000.0 | 10-K |


In the same way, you can get the data for the Income Statement (IS) or the Cash Flow (CF).
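A simple plausibility check on the balance sheet above: the current-asset lines 3 to 8 should add up to `AssetsCurrent` (line 9). The sketch uses values copied from the output, so it runs without the library; with the DataFrame you would sum the corresponding rows of the `20210930` and `20220930` columns instead. Not every filing breaks a total down into listed lines, so a mismatch is a hint to look closer rather than proof of an error.

```python
# Balance-sheet rows copied from the output above: tag -> (value 2021-09-30, value 2022-09-30).
current_asset_lines = {
    "CashAndCashEquivalentsAtCarryingValue": (34_940_000_000, 23_646_000_000),
    "MarketableSecuritiesCurrent": (27_699_000_000, 24_658_000_000),
    "AccountsReceivableNetCurrent": (26_278_000_000, 28_184_000_000),
    "InventoryNet": (6_580_000_000, 4_946_000_000),
    "NontradeReceivablesCurrent": (25_228_000_000, 32_748_000_000),
    "OtherAssetsCurrent": (14_111_000_000, 21_223_000_000),
}
reported_assets_current = (134_836_000_000, 135_405_000_000)

periods = ["20210930", "20220930"]
totals = {}
for i, period in enumerate(periods):
    # Sum the i-th value (i.e. the i-th period column) of every current-asset line.
    totals[period] = sum(values[i] for values in current_asset_lines.values())
    status = "matches" if totals[period] == reported_assets_current[i] else "differs from"
    print(f"{period}: sum {totals[period]:,} {status} AssetsCurrent {reported_assets_current[i]:,}")

cash_2021, cash_2022 = current_asset_lines["CashAndCashEquivalentsAtCarryingValue"]
print(f"Change in cash: {cash_2022 - cash_2021:,}")  # Change in cash: -11,294,000,000
```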

Did you notice that the report data also includes the url where you can view the filed report online?

In [10]:
apple_latest_10k_indexreport.url

'https://www.sec.gov/Archives/edgar/data/320193/000032019322000108/0000320193-22-000108-index.htm'
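The url follows a recognizable pattern: the CIK without leading zeros, then the adsh without dashes as folder name, then the adsh with an `-index.htm` suffix. The hypothetical helper below reproduces the URL above from `cik` and `adsh`; it assumes other filings follow the same pattern, so prefer the `url` column wherever it is available.

```python
def filing_index_url(cik: int, adsh: str) -> str:
    """Rebuild an EDGAR filing index URL from a CIK and an accession number (adsh)."""
    folder = adsh.replace("-", "")
    return f"https://www.sec.gov/Archives/edgar/data/{cik}/{folder}/{adsh}-index.htm"


url = filing_index_url(320193, "0000320193-22-000108")
print(url)
# https://www.sec.gov/Archives/edgar/data/320193/000032019322000108/0000320193-22-000108-index.htm
```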

If you open the link and click on the first entry in the table (the file: aapl-20220924.htm), it will open the real filed report: https://www.sec.gov/ix?doc=/Archives/edgar/data/320193/000032019322000108/aapl-20220924.htm

If you open it and click on the "sections" in the top menu bar, you can directly navigate to the financial statements in the report. Of course, you can also scroll to them.

Open the balance sheet and see for yourself that the data above matches the values shown there and appears in the same order.