# [SEC Python Wrapper to obtain missing Financial Statement Line Items](#section-title)

Because the Polygon.io API call returned many missing data points, I explored other methods to obtain this information:

1. I scraped MarketWatch to pull Cost of Revenue data for Q2 2023. The scraping ultimately returned a __[403 Forbidden](https://developer.edgar-online.com/docs)__ response, which indicates that the server understood my request but refused to authorize it. Because I did not receive this response when scraping individual stocks such as AAPL and TSLA, it seems that MarketWatch rejects more than 2,000 requests to its server within a short time frame.

1. I then considered using the SEC's EDGAR website, which has API documentation __[here](https://developer.edgar-online.com/docs)__. However, when I attempted to register for an API key, the site stated that it is __[not accepting new registrants](https://developer.edgar-online.com/member/register)__.

1. I next tried a Python wrapper called __[SEC-API](https://pypi.org/project/sec-api/)__, which allows EDGAR filing queries and real-time streams of the entire SEC filing corpus, encompassing over 650 terabytes of data. My free API key was created at __[sec-api.io](https://sec-api.io/)__.


Within the Python wrapper, the QueryApi module did not meet my needs: it returns a URL for each 10-Q filing but does not let me index into the Cost of Revenues line item specifically. With this limitation, I would have to look up each line item manually. The same problem occurred with a different module, FullTextSearchApi, which returned the filing URL but not the line items within the financial statements.

## Installing the Python Wrapper: SEC-API

In [1]:
pip install sec-api

Note: you may need to restart the kernel to use updated packages.


### Exploring the QueryAPI for 10-Qs

In [4]:
from sec_api import QueryApi

queryApi = QueryApi(api_key="YOUR_API_KEY")  # placeholder: use your own sec-api.io key rather than committing it

query = {
  "query": { "query_string": {
      "query": "ticker:TSLA AND filedAt:{2023-05-01 TO 2023-08-24} AND formType:\"10-Q\""
    } },
  "from": "0",
  "size": "10",
  "sort": [{ "filedAt": { "order": "desc" } }]
}

filings = queryApi.get_filings(query)

print(filings)

{'total': {'value': 1, 'relation': 'eq'}, 'query': {'from': 0, 'size': 10}, 'filings': [{'id': '4514ae60603f203ce9cedd0ac07f8453', 'accessionNo': '0000950170-23-033872', 'cik': '1318605', 'ticker': 'TSLA', 'companyName': 'Tesla, Inc.', 'companyNameLong': 'Tesla, Inc. (Filer)', 'formType': '10-Q', 'description': 'Form 10-Q - Quarterly report [Sections 13 or 15(d)]', 'filedAt': '2023-07-21T18:08:29-04:00', 'linkToTxt': 'https://www.sec.gov/Archives/edgar/data/1318605/000095017023033872/0000950170-23-033872.txt', 'linkToHtml': 'https://www.sec.gov/Archives/edgar/data/1318605/000095017023033872/0000950170-23-033872-index.htm', 'linkToXbrl': '', 'linkToFilingDetails': 'https://www.sec.gov/Archives/edgar/data/1318605/000095017023033872/tsla-20230630.htm', 'entities': [{'companyName': 'Tesla, Inc. (Filer)', 'cik': '1318605', 'irsNo': '912197729', 'stateOfIncorporation': 'DE', 'fiscalYearEnd': '1231', 'type': '10-Q', 'act': '34', 'fileNo': '001-34756', 'filmNo': '231103432', 'sic': '3711 Motor
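
The `linkToFilingDetails` field in the response above holds the URL of the filing document itself, which is the same URL passed to the section extractor further down. As a minimal sketch, assuming the response keeps the shape printed above, the URLs can be collected without copying them by hand. The names `filing_detail_urls` and `sample_response` are hypothetical and not part of `sec_api`.

```python
def filing_detail_urls(response):
    """Return the linkToFilingDetails URL of every filing in a query response."""
    return [
        filing["linkToFilingDetails"]
        for filing in response.get("filings", [])
        if filing.get("linkToFilingDetails")  # skip filings without a detail link
    ]


# Trimmed copy of the response printed above (only the fields used here)
sample_response = {
    "total": {"value": 1, "relation": "eq"},
    "filings": [
        {
            "ticker": "TSLA",
            "formType": "10-Q",
            "linkToFilingDetails": "https://www.sec.gov/Archives/edgar/data/1318605/000095017023033872/tsla-20230630.htm",
        }
    ],
}

print(filing_detail_urls(sample_response))
```

In the notebook, `filing_detail_urls(filings)` could be called directly on the dictionary returned by `queryApi.get_filings(query)`.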

### Exploring the FullTextSearchApi for 10-Qs

In [14]:
from sec_api import FullTextSearchApi

fullTextSearchApi = FullTextSearchApi(api_key="YOUR_API_KEY")  # placeholder: use your own sec-api.io key

query = {
  "query": '"TSLA"',
  "formTypes": ['10-Q'],
  "startDate": '2023-05-01',
  "endDate": '2023-08-24',
}

filings = fullTextSearchApi.get_filings(query)

print(filings)

{'total': {'value': 1, 'relation': 'eq'}, 'filings': [{'accessionNo': '0000950170-23-033872', 'cik': '1318605', 'companyNameLong': 'Tesla, Inc. (TSLA) (CIK 0001318605)', 'ticker': 'TSLA', 'description': '10-Q', 'formType': '10-Q', 'type': '10-Q', 'filingUrl': 'https://www.sec.gov/Archives/edgar/data/1318605/000095017023033872/tsla-20230630.htm', 'filedAt': '2023-07-24'}]}


### Using the 10-K/10-Q/8-K Section Extractor API to home in on statement particulars

In [15]:
from sec_api import ExtractorApi

extractorApi = ExtractorApi("YOUR_API_KEY")  # placeholder: use your own sec-api.io key

# 10-Q example

# Tesla 10-Q filing
filing_url_10q = "https://www.sec.gov/Archives/edgar/data/1318605/000095017023033872/tsla-20230630.htm"


# extract Part I, Item 1 "Financial Statements" as cleaned text
extracted_section_10q = extractorApi.get_section(filing_url_10q, "part1item1", "text")

In [17]:
type(extracted_section_10q)

str

In [16]:
extracted_section_10q

' PART I. FINANCIAL INFORMATION \n\nITEM 1. FINANCIAL STATEMENTS \n\nTesla, Inc. \n\nConsolidated Balance Sheets \n\n(in millions, except per share data) \n\n(unaudited) \n\n&#160; \n\n&#160; \n\nJune 30, \n\n&#160; \n\n&#160; \n\nDecember 31, \n\n&#160; \n\n&#160; \n\n&#160; \n\n&#160; \n\n&#160; \n\n&#160; \n\nAssets \n\n&#160; \n\n&#160; \n\n&#160; \n\n&#160; \n\n&#160; \n\n&#160; \n\nCurrent assets \n\n&#160; \n\n&#160; \n\n&#160; \n\n&#160; \n\n&#160; \n\n&#160; \n\nCash and cash equivalents \n\n&#160; \n\n$ \n\n15,296 \n\n&#160; \n\n&#160; \n\n$ \n\n16,253 \n\n&#160; \n\nShort-term investments \n\n&#160; \n\n&#160; \n\n7,779 \n\n&#160; \n\n&#160; \n\n&#160; \n\n5,932 \n\n&#160; \n\nAccounts receivable, net \n\n&#160; \n\n&#160; \n\n3,447 \n\n&#160; \n\n&#160; \n\n&#160; \n\n2,952 \n\n&#160; \n\nInventory \n\n&#160; \n\n&#160; \n\n14,356 \n\n&#160; \n\n&#160; \n\n&#160; \n\n12,839 \n\n&#160; \n\nPrepaid expenses and other current assets \n\n&#160; \n\n&#160; \n\n2,997 \n\n&#160; \

In [29]:
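def sec_string_cleaner(section):
    """Remove newlines and '&#160;' (non-breaking space) entities from extracted text."""
    # A single replace('\n', '') already removes every newline,
    # so a separate pass for '\n\n' is unnecessary.
    return section.replace('\n', '').replace('&#160;', '')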

In [30]:
sec_string_cleaner(extracted_section_10q)

' PART I. FINANCIAL INFORMATION ITEM 1. FINANCIAL STATEMENTS Tesla, Inc. Consolidated Balance Sheets (in millions, except per share data) (unaudited)   June 30,   December 31,       Assets       Current assets       Cash and cash equivalents  $ 15,296   $ 16,253  Short-term investments   7,779    5,932  Accounts receivable, net   3,447    2,952  Inventory   14,356    12,839  Prepaid expenses and other current assets   2,997    2,941  Total current assets   43,875    40,917  Operating lease vehicles, net   5,935    5,035  Solar energy systems, net   5,365    5,489  Property, plant and equipment, net   26,389    23,548  Operating lease right-of-use assets   3,352    2,563  Digital assets, net       Intangible assets, net       Goodwill       Other non-current assets   5,026    4,193  Total assets  $ 90,591   $ 82,338  Liabilities       Current liabilities       Accounts payable  $ 15,273   $ 15,255  Accrued liabilities and other   7,658    7,142  Deferred revenue   2,176    1,747  Custom
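
The cleaned string above still contains runs of several spaces where the `&#160;` entities used to be. One possible alternative, sketched here with the standard library only, is to decode the entities with `html.unescape` and then collapse every run of whitespace into a single space. The function name `normalize_section` is hypothetical, and the sample string is a fragment of the raw extractor output shown earlier.

```python
import html
import re


def normalize_section(section):
    """Decode HTML entities and collapse whitespace runs to single spaces."""
    decoded = html.unescape(section)  # '&#160;' becomes '\xa0', a non-breaking space
    # \s matches Unicode whitespace in str patterns, including '\xa0'
    return re.sub(r"\s+", " ", decoded).strip()


# Fragment of the raw text returned by the extractor above
raw = "Cash and cash equivalents \n\n&#160; \n\n$ \n\n15,296 \n\n&#160; \n\n&#160; \n\n$ \n\n16,253 \n\n&#160; \n\n"

print(normalize_section(raw))
# Cash and cash equivalents $ 15,296 $ 16,253
```

Because every token ends up separated by exactly one space, later steps that split the text into words no longer depend on how many filler entities sit between a label and its value.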

In [36]:
# To find the value associated with COGS

keyword = "Total cost of revenues"
index = extracted_section_10q.find(keyword)

if index != -1:
    # split() discards whitespace, so the tokens are only non-blank strings
    words = extracted_section_10q[index + len(keyword):].split()
    if len(words) > 2:
        # The first two tokens are filler (the '&#160;' placeholders seen in the raw text),
        # so the first reported value is the third token
        value = words[2]
        print("Value:", value)
    else:
        print("No value found")
else:
    print("Keyword not found")

Value: 20,394
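
The lookup above relies on the value being exactly the third token after the label, which does not hold for rows that carry a `$` sign. A hedged generalization is to take the first number that follows the label and convert it to an integer. This is only a sketch: `first_value_after` is a hypothetical helper, the entities are decoded first so the digits inside `&#160;` are not mistaken for a value, and a row with no reported value would silently return the next row's number. The second sample string is assumed to follow the same layout as the balance sheet rows shown earlier.

```python
import html
import re


def first_value_after(text, label):
    """Return the first number following `label` as an int, or None if not found."""
    start = text.find(label)
    if start == -1:
        return None
    # Decode entities so the '160' inside '&#160;' is not matched as a number
    tail = html.unescape(text[start + len(label):])
    match = re.search(r"\d[\d,]*", tail)
    if match is None:
        return None
    return int(match.group().replace(",", ""))


# Row taken from the raw extractor output shown earlier
balance_row = "Cash and cash equivalents \n\n&#160; \n\n$ \n\n15,296 \n\n&#160;"
# Assumed layout for the income statement row, modeled on the rows above
cogs_row = "Total cost of revenues \n\n&#160; \n\n&#160; \n\n20,394 \n\n&#160;"

print(first_value_after(balance_row, "Cash and cash equivalents"))  # 15296
print(first_value_after(cogs_row, "Total cost of revenues"))        # 20394
```

Returning an integer rather than the string `'20,394'` makes the result directly usable in calculations. Values are in millions, as stated in the statement header.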
