# SEC Financial Statement Data Sets Tools - Quickstart

## TL;DR

This notebook is a first introduction to the secfsdstools (SEC Financial Statement Data Sets Tools) Python package: https://pypi.org/project/secfsdstools/

It is designed to work with the data provided by the "SEC Financial Statement Data Sets" (SFSDS): https://www.sec.gov/dera/data/financial-statement-data-sets

The SFSDS contains data from all reports that were filed with the SEC since 2012, for instance all annual and quarterly reports. The main assets that can be retrieved from this data set are the financial statements (balance sheet, income statement, and cash flow statement).

First, this notebook shows how the library is installed and configured. After that, it shows how the financial statements can be extracted from the data set.

For a detailed definition of the data set see https://www.sec.gov/files/aqfs.pdf.

## Principles / Concepts

The library downloads all data files that are available. Currently, this is a total of 2 GB, and about 200 MB are added every year. Each quarter has its own compressed file, which contains the data from that quarter.

The reports are indexed in a simple SQLite database, which makes finding a report much more efficient. As of the end of 2022, there are over 500,000 reports from more than 15,000 companies.
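To illustrate why such an index helps, here is a minimal sketch using Python's built-in `sqlite3` module. The table name `report_index`, its layout, the helper `find_reports`, and the sample rows are assumptions made up for this example; they are not the schema the library actually uses. The column names `adsh`, `cik`, `form`, and `period` are borrowed from the submission file of the data set.

```python
import sqlite3

# Hypothetical index: one row per report, pointing at the quarterly zip that holds it.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE report_index (
        adsh TEXT PRIMARY KEY,   -- accession number, unique per filing
        cik INTEGER,             -- SEC company identifier
        form TEXT,               -- e.g. 10-K (annual) or 10-Q (quarterly)
        period TEXT,             -- balance sheet date as YYYYMMDD
        source_file TEXT         -- quarterly zip file containing the report
    )
    """
)
conn.execute("CREATE INDEX idx_report_cik ON report_index (cik)")

# Made-up sample rows, for illustration only.
sample_rows = [
    ("0000000001-22-000001", 1001, "10-K", "20220930", "2022q4.zip"),
    ("0000000001-22-000002", 1001, "10-Q", "20220630", "2022q3.zip"),
    ("0000000002-22-000001", 1002, "10-K", "20211231", "2022q1.zip"),
]
conn.executemany("INSERT INTO report_index VALUES (?, ?, ?, ?, ?)", sample_rows)


def find_reports(connection, cik, form):
    """Return (adsh, period, source_file) for all reports of one company and form."""
    cursor = connection.execute(
        "SELECT adsh, period, source_file FROM report_index "
        "WHERE cik = ? AND form = ? ORDER BY period",
        (cik, form),
    )
    return cursor.fetchall()


annual_reports = find_reports(conn, 1001, "10-K")
print(annual_reports)
```

With an index like this, a lookup only has to name the one quarterly file that contains the report, instead of scanning every downloaded file.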

The library is storage and memory efficient. The data is read directly from the compressed files. Moreover, only the requested data is read and instantiated as pandas dataframes.
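The idea of reading directly from a compressed file can be sketched with the standard library alone. This is not the library's own implementation: the helper `read_rows_for_report` and the tiny in-memory archive are hypothetical, and the real quarterly files contain far more columns. The member names `sub.txt` and `num.txt` and the tab-separated layout follow the published data set.

```python
import csv
import io
import zipfile

# Build a tiny stand-in for a quarterly zip file in memory (made-up content).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("sub.txt", "adsh\tcik\tform\nA-1\t1001\t10-K\nA-2\t1002\t10-Q\n")
    archive.writestr(
        "num.txt",
        "adsh\ttag\tvalue\nA-1\tAssets\t100\nA-2\tAssets\t200\nA-1\tLiabilities\t60\n",
    )


def read_rows_for_report(zip_source, member, adsh):
    """Stream one member of the archive and keep only the rows of one report."""
    with zipfile.ZipFile(zip_source) as archive:
        # archive.open() decompresses on the fly; nothing is extracted to disk.
        with archive.open(member) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            reader = csv.DictReader(text, delimiter="\t")
            return [row for row in reader if row["adsh"] == adsh]


buffer.seek(0)
selected = read_rows_for_report(buffer, "num.txt", "A-1")
for row in selected:
    print(row["tag"], row["value"])
```

The same open file handle could be passed to `pandas.read_csv(..., sep="\t")` to obtain a dataframe instead of a list of dictionaries.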

## Installation
In order to install the library, just use pip install:
```
pip install secfsdstools
```

## Configuration / Setup

In order to be used, the library needs to know where to store the compressed files from the SFSDS and where to store the sqlite database file. This is configured in a configuration file.

The easiest way to create the configuration file is to import the library's update function and run it:

In [None]:
from secfsdstools.update import update

update()

The first time you run it, it will fail with the following message:
```
No config file found at home directory C:\Users\hansj\.secfsdstools.cfg.
Config file created at <user-home>/.secfsdstools.cfg. Please check the content and rerun.
```

As the message says, a default config file was created in your user home directory. It has the following content:
```
[DEFAULT]
downloaddirectory = <userhome>/secfsdstools/data/dld
dbdirectory = <userhome>/secfsdstools/data/db
useragentemail = your.email@goeshere.com
```
The downloaddirectory is the folder into which the compressed data files are downloaded, and the SQLite db file is created inside the folder defined in dbdirectory.
The useragentemail is set in the header of the requests made to sec.gov. This should be your email address. However, since the library makes only very few requests, it doesn't really matter whether you change it or not.
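If you want to check what a configuration file contains, a minimal sketch with Python's built-in `configparser` could look like this. The helper `load_settings` is hypothetical and not part of secfsdstools; the text below simply repeats the default content shown above, including the `<userhome>` placeholder as-is.

```python
import configparser

# The default content shown above, reproduced as a string for illustration.
config_text = """
[DEFAULT]
downloaddirectory = <userhome>/secfsdstools/data/dld
dbdirectory = <userhome>/secfsdstools/data/db
useragentemail = your.email@goeshere.com
"""


def load_settings(text):
    """Parse the config text and return the three expected entries as a dict."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["DEFAULT"]
    keys = ("downloaddirectory", "dbdirectory", "useragentemail")
    return {key: section[key] for key in keys}


settings = load_settings(config_text)
for key, value in settings.items():
    print(f"{key}: {value}")

# Flag the placeholder address so it is easy to spot that it was never changed.
if settings["useragentemail"] == "your.email@goeshere.com":
    print("useragentemail still has its default value")
```

To inspect the real file instead of a string, `parser.read(Path.home() / ".secfsdstools.cfg")` (with `from pathlib import Path`) would read it from the user home directory.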

If you plan to use Jupyter, make sure that you configure the directories in a location that your Jupyter process can access. The default location (your user home directory) will work.

Next, run the update method again.

In [2]:
from secfsdstools.update import update

update()

This may take a few minutes. If you want to see the progress of the download, have a look at your download directory.
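Looking at the download directory can also be done from Python. The following is a small sketch, not a feature of the library: the helper `download_progress` is hypothetical, and it assumes the downloaded files carry a `.zip` extension and sit somewhere below the configured download directory. The demo runs against a temporary folder with dummy files; in practice you would pass your own `downloaddirectory`.

```python
import tempfile
from pathlib import Path


def download_progress(download_dir):
    """Return (number of zip files, total size in MB) found below download_dir."""
    zip_files = sorted(Path(download_dir).rglob("*.zip"))
    total_bytes = sum(f.stat().st_size for f in zip_files)
    return len(zip_files), total_bytes / 1_000_000


# Demo with dummy files in a temporary folder (made-up names and sizes).
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "2022q3.zip").write_bytes(b"x" * 1500)
    (Path(tmp) / "2022q4.zip").write_bytes(b"x" * 2500)
    (Path(tmp) / "notes.txt").write_text("ignored")
    count, size_mb = download_progress(tmp)

print(f"{count} files, {size_mb:.3f} MB downloaded so far")
```

Running the function repeatedly while `update()` is working in another process shows the file count and total size growing.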

## Read Financial Statements for Apple's 2022 Annual Report

In [4]:

from os.path import expanduser
home = expanduser("~")
print(home)

C:\Users\hansj


In [5]:
import glob
glob.glob(home)

['C:\\Users\\hansj']

In [7]:
import os
dir_list = os.listdir(home)
print(dir_list)

['.anaconda', '.android', '.astropy', '.bash_history', '.cache', '.conda', '.condarc', '.config', '.databricks-connect', '.docker', '.dotnet', '.freemind', '.git', '.gitattributes', '.gitconfig', '.gnupg', '.gtk-bookmarks', '.IntelliJIdea2018.2', '.IntelliJIdea2019.3', '.ipynb_checkpoints', '.ipython', '.ivy2', '.jupyter', '.keras', '.m2', '.matplotlib', '.ms-ad', '.openshot_qt', '.pdfbox.cache', '.pyforest', '.pylint.d', '.pypirc', '.Rhistory', '.scala_history', '.secfsdstools.cfg', '.spyder-py3', '.ssh', '.step', '.surprise_data', '.vscode', '3D Objects', 'ansel', 'Anwendungsdaten', 'AppData', 'bp.png', 'cbhous.page', 'Contacts', 'Cookies', 'Desktop', 'Documents', 'Downloads', 'Dropbox', 'Druckumgebung', 'Eigene Dateien', 'environment.yml', 'examregression.html', 'Favorites', 'GD Elternrat', 'Google Drive', 'googledrive', 'IdeaProjects', 'IntelGraphicsProfiles', 'kabletable.html', 'Links', 'locked', 'Lokale Einstellungen', 'MicrosoftEdgeBackups', 'Music', 'Netzwerkumgebung', 'NTUSER.