# **EXPLORER FOR SEC FILINGS**
<hr>

## Inter IIT Tech Meet 10.0 (2022)

![image](https://www.sec.gov/edgar/search/images/edgar-logo-2x.png)
![image](https://interiit-tech.org/static/media/logo_1.f4d40e83.png)

In this notebook, we use the [EDGAR](https://www.sec.gov/edgar/searchedgar/) API to explore the SEC filings of a company. We shall be using the Python library python-edgar to access the API. Be careful: the SEC limits automated access to 10 requests per second, so throttle your requests. If a black SUV shows up outside, it's probably because you're doing something wrong.
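To stay under the 10 requests per second limit, calls can be spaced out with a small throttle. The following is a minimal sketch, not part of python-edgar or the EDGAR API: the `RateLimiter` class and its parameters are hypothetical names introduced here for illustration, and the choice of 8 requests per second is only an assumed safety margin.

```python
import time


class RateLimiter:
    """Space out calls so that at most `max_per_second` happen each second.

    Hypothetical helper for illustration; `clock` and `sleep` are injectable
    so the spacing logic can be checked without actually waiting.
    """

    def __init__(self, max_per_second, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = float("-inf")  # the first call never waits

    def wait(self):
        """Block until at least `min_interval` has passed since the last call."""
        now = self._clock()
        if now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._clock()
        self._next_allowed = now + self.min_interval


# Assumed margin: aim a little below the stated limit of 10 requests per second.
limiter = RateLimiter(max_per_second=8)
```

Calling `limiter.wait()` immediately before each `req.get(...)` keeps consecutive requests at least 0.125 seconds apart.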

## Objectives
- Scrape data from the company's history since inception
- Use 10-Q, 10-K, and 8-K filings to get the company's financial statements
- Use the financial statements to get the company's balance sheet, income statement, cash flow statement, and ratios
- Use the data to get the company's current assets, liabilities, and equity
- Generate SaaS metrics
- Generate a financial statement analysis
- Use the metrics with deep learning systems to produce insightful results

Thanks,<br>
Kaushik Dey

In [1]:
import requests as req
import pandas as pd
import xml.etree.ElementTree as et
import matplotlib.pyplot as plt
import pandasgui as pdgui
import os
import numpy as np
import json
import time
import pyjsonviewer as pjv
from bs4 import BeautifulSoup as bs

## Parsing 10-K Documents

#### What is a 10-K Form?
A Form 10-K is an annual report required by the U.S. Securities and Exchange Commission (SEC) that gives a comprehensive summary of a company's financial performance. The information a company must document in the 10-K includes its history, organizational structure, financial statements, earnings per share, subsidiaries, executive compensation, and any other relevant data.


The SEC requires this report to keep investors aware of a company's financial condition and to allow them to have enough information before they buy or sell shares in the corporation, or before investing in the firm’s corporate bonds.

#### Why do we need 10-K Forms?
Because the SEC mandates that all public companies file 10-Ks regularly, investors can rely on them as a consistent source of information before they buy or sell securities issued by a company. The 10-K can appear overly complex at first glance, with tables full of data and figures. However, precisely because it is so comprehensive, this filing is key for investors to get a handle on a company's financial position and prospects.

The Form 10-K consists of several parts. These include:

- **Business summary:** This describes the company's operations. It includes information about business segments, products and services, subsidiaries, markets, regulatory issues, research and development, competition, and employees, among other details.
- **Management Discussion and Analysis:** This section allows the company to explain its operations and financial results for the past year.
- **Financial statements:** The financial statements include the company's balance sheet, income statement, and cash flow statement.
- **Additional sections:** Additional sections may discuss the company's management team and legal proceedings.


In [2]:
file_base = "https://sec.gov/Archives/edgar/data/746210/000074621021000024/" # Base URL for filing
filing_summary = file_base + "FilingSummary.xml"

head = {
    "User-Agent": "Alpha-Explorer/1.0",
    "Connection": "keep-alive"
}

res = req.get(filing_summary, headers=head)

root = et.fromstring(res.text)
root.tag

'FilingSummary'

The filing summary is essential because it lists the components of the 10-K, each of which is available as a tabulated HTML (`.htm`) report.
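To make the shape of the filing summary concrete, here is a minimal sketch that lists report names and file names with `xml.etree.ElementTree`. The `SAMPLE_SUMMARY` string is a hand-written stand-in rather than real EDGAR output (real files carry many more fields), `list_reports` is a hypothetical helper, and the entry without an `HtmlFileName` is an assumption meant to illustrate how a missing element can be handled without raising an error.

```python
import xml.etree.ElementTree as et

# Hand-written stand-in for FilingSummary.xml (assumption: heavily trimmed).
SAMPLE_SUMMARY = """<FilingSummary>
  <MyReports>
    <Report>
      <ShortName>CONSOLIDATED BALANCE SHEETS</ShortName>
      <HtmlFileName>R2.htm</HtmlFileName>
    </Report>
    <Report>
      <ShortName>All Reports</ShortName>
    </Report>
  </MyReports>
</FilingSummary>"""


def list_reports(xml_text):
    """Return one {"name", "file"} dict per <Report> element."""
    root = et.fromstring(xml_text)
    reports = []
    for report in root.iter("Report"):
        reports.append({
            "name": report.findtext("ShortName", default=""),
            # findtext returns None when the element is absent,
            # so a missing HtmlFileName does not raise.
            "file": report.findtext("HtmlFileName"),
        })
    return reports


reports = list_reports(SAMPLE_SUMMARY)
```

Entries whose `"file"` is `None` can then be filtered out before building URLs.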

In [3]:
# Skip entries without an HtmlFileName (such as the last one), which would otherwise raise an error
component_dict = []  # List of {"name", "url"} dicts, one per matching report
trigger_list = ['BALANCE SHEET', 'INCOME', 'CASH FLOW', 'EQUITY']  # Keywords that mark a financial statement
for report in root.iter('Report'):
    short_name = report.findtext('ShortName', default='')
    html_file = report.findtext('HtmlFileName')
    # Keep a report once if its name contains any of the trigger keywords
    if not any(trigger_word in short_name for trigger_word in trigger_list):
        continue
    if html_file is None:
        print("No report found at some point")
        continue
    component_dict.append({"name": short_name, "url": file_base + html_file})

component_dict

[{'name': 'CONSOLIDATED BALANCE SHEETS',
  'url': 'https://sec.gov/Archives/edgar/data/746210/000074621021000024/R2.htm'},
 {'name': 'CONSOLIDATED BALANCE SHEETS (Parenthetical)',
  'url': 'https://sec.gov/Archives/edgar/data/746210/000074621021000024/R3.htm'},
 {'name': "CONSOLIDATED STATEMENT OF STOCKHOLDERS' EQUITY",
  'url': 'https://sec.gov/Archives/edgar/data/746210/000074621021000024/R5.htm'},
 {'name': 'CONSOLIDATED STATEMENTS OF CASH FLOWS',
  'url': 'https://sec.gov/Archives/edgar/data/746210/000074621021000024/R6.htm'},
 {'name': 'CONSOLIDATED STATEMENTS OF CASH FLOWS (Parenthetical)',
  'url': 'https://sec.gov/Archives/edgar/data/746210/000074621021000024/R7.htm'}]

### Extracting Tables into a Dictionary
- The tables in the 10-K are in HTML format
- Create a dict for each statement that stores its column headers, section titles, and row data

```html
<table class="report" border="0" cellspacing="2" id="idm139636460643688">
    <tr>
        <th class="tl" colspan="1" rowspan="1"><div style="width: 200px;">
            <strong>CONSOLIDATED BALANCE SHEETS - USD ($)<br> $ in Thousands</strong></div>
        </th>
        <th class="th">
            <div>Dec. 31, 2020</div>
        </th>
        <th class="th">
            <div>Dec. 31, 2019</div>
        </th>
    </tr>
    <tr class="re">
        <td class="pl " style="border-bottom: 0px;" valign="top">
            <a class="a" href="javascript:void(0);"><strong>Current assets:</strong></a>
        </td>
        <td class="text">
            &#160;<span></span>
        </td>
        <td class="text">
            &#160;<span></span>
        </td>
    </tr>
    <tr class="ro">
        <td class="pl " style="border-bottom: 0px;" valign="top">
            <a class="a" href="javascript:void(0);">Cash</a>
        </td>
        <td class="nump">
            $ 5,058<span></span>
        </td>
        <td class="nump">
            $ 4,602<span></span>
        </td>
    </tr>
```
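The excerpt above suggests a simple row-by-row pass: `<th>` cells hold the column headers, a row whose value cells are blank acts as a section title, and every other row is a data line. The sketch below follows that reading under stated assumptions. It uses the standard library's `html.parser` rather than BeautifulSoup so that it runs without extra dependencies, `SAMPLE_HTML` is a trimmed copy of the excerpt, and the names `ReportTableParser`, `parse_report_table`, `"label"`, and `"values"` are hypothetical.

```python
from html.parser import HTMLParser

# Trimmed copy of the excerpt above, closed so it forms a complete table.
SAMPLE_HTML = """
<table class="report">
  <tr>
    <th class="tl"><div><strong>CONSOLIDATED BALANCE SHEETS - USD ($)<br> $ in Thousands</strong></div></th>
    <th class="th"><div>Dec. 31, 2020</div></th>
    <th class="th"><div>Dec. 31, 2019</div></th>
  </tr>
  <tr class="re">
    <td class="pl "><a class="a" href="javascript:void(0);"><strong>Current assets:</strong></a></td>
    <td class="text">&#160;<span></span></td>
    <td class="text">&#160;<span></span></td>
  </tr>
  <tr class="ro">
    <td class="pl "><a class="a" href="javascript:void(0);">Cash</a></td>
    <td class="nump">$ 5,058<span></span></td>
    <td class="nump">$ 4,602<span></span></td>
  </tr>
</table>
"""


class ReportTableParser(HTMLParser):
    """Collect <th> texts and the cell texts of each <tr> that has <td> cells."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.headers = []   # text of every <th> cell, in order
        self.rows = []      # one list of cell texts per body row
        self._cell = None   # (tag, text_parts) for the cell being read
        self._row = None    # cell texts collected for the open <tr>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("th", "td"):
            self._cell = (tag, [])
        elif tag == "br" and self._cell is not None:
            self._cell[1].append(" ")  # treat a line break as a space

    def handle_data(self, data):
        if self._cell is not None:
            self._cell[1].append(data)

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._cell is not None:
            kind, parts = self._cell
            # Collapse whitespace; non-breaking spaces count as whitespace too.
            text = " ".join("".join(parts).split())
            if kind == "th":
                self.headers.append(text)
            elif self._row is not None:
                self._row.append(text)
            self._cell = None
        elif tag == "tr" and self._row:
            self.rows.append(self._row)
            self._row = None


def parse_report_table(html):
    """Split a report table into headers, section titles, and data rows."""
    parser = ReportTableParser()
    parser.feed(html)
    parser.close()
    result = {"headers": parser.headers, "sections": [], "data": []}
    for label, *values in parser.rows:
        if all(value == "" for value in values):
            # Assumption: a row with only blank value cells is a section title.
            result["sections"].append(label)
        else:
            result["data"].append({"label": label, "values": values})
    return result


result = parse_report_table(SAMPLE_HTML)
```

Values are kept as strings such as `"$ 5,058"`; converting them to numbers is a separate step. The same row logic should carry over to BeautifulSoup by iterating over the table's `tr` elements.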

In [None]:
statement_data = []

head = {
    "User-Agent": "Alpha-Explorer/1.0",
    "Connection": "keep-alive"
}

for statement in component_dict:
    dat = {}
    dat["name"] = statement["name"]
    dat["url"] = statement["url"]
    dat["headers"] = []
    dat["sections"] = []
    dat["data"] = []

    res = req.get(statement["url"], headers=head)
    soup = bs(res.text, "html.parser")  # Name the parser explicitly so results don't depend on which parsers are installed

    # Let's get the rows now

    
