<a href="https://colab.research.google.com/github/ProfessorPatrickSlatraigh/CST3512/blob/main/SEC_Edgar_yfinance.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#SEC Edgar Data and yfinance for Yahoo! Finance    

[per Wikipedia](https://en.wikipedia.org/wiki/EDGAR):    

***

"**EDGAR, the Electronic Data Gathering, Analysis, and Retrieval** system, performs automated collection, validation, indexing, acceptance, and forwarding of submissions by companies and others who are required by law to file forms with the U.S. Securities and Exchange Commission (the "SEC"). The database contains a wealth of information about the Commission and the securities industry which is freely available to the public via the Internet."     

*This notebook uses the [sec-edgar-downloader repository by jadchaar on Github](https://github.com/jadchaar/sec-edgar-downloader) to demonstrate an approach to accessing the wealth of data available on the SEC Edgar platform, which is based on [XBRL](https://www.xbrl.org/) (a form of XML) transmission of data.*    


***    

"**Yahoo! Finance** is a media property that is part of the **Yahoo!** network. It provides financial news, data and commentary including stock quotes, press releases, financial reports, and original content. It also offers some online tools for personal finance management. In addition to posting partner content from other web sites, it posts original stories by its team of staff journalists. It is ranked 21st by SimilarWeb on the list of largest news and media websites.    

**Yahoo! Finance** recently added the feature to look at news surrounding cryptocurrency. It lists over 9,000 unique coins including Bitcoin and Ethereum."    

*This notebook uses the [**yfinance** module by ranaroussi on Github](https://github.com/ranaroussi/yfinance) to demonstrate an approach to accessing many different types of financial statement and stock/coin trading data available on the **Yahoo! Finance** platform.*   





---



##Contents    


[sec-edgar-downloader](https://github.com/jadchaar/sec-edgar-downloader) is a Python package for downloading company filings from the SEC EDGAR database. Searches can be conducted either by **stock ticker** or **Central Index Key (CIK)**. You can use the SEC CIK lookup tool if you cannot find an appropriate ticker.


> Look up stock ticker symbols using a company name search on [Yahoo Finance](https://finance.yahoo.com/lookup) or by using the [search page on NasdaqTrader.com.](https://www.nasdaqtrader.com/trader.aspx?id=symbollookup)



This notebook begins with two sections on accessing SEC Edgar data:    
1. **Basic Usage** -- to use module imports and some straightforward code to do simple lookups    
2. **Advanced Usage** -- alternate code and approach for more complex/advanced queries and data access    

This notebook also has a third section on using the [**yfinance** module by ranaroussi on Github](https://github.com/ranaroussi/yfinance) to access ticker data from Yahoo Finance:    
3. **yfinance** -- for information, historical market data, financial statements, analysts recommendations, option chains, and more on publicly traded stocks     

Finally, there is an appendix in this notebook to describe the structure of `XBRL .xml` documents along with an example of parsing `XBRL .xml` with BeautifulSoup.  There are other methods for automatically extracting data from `XBRL .xml` documents such as the `python-xbrl` library.
4. **APPENDIX: Parsing XBRL with Python** -- by Matt Scarpino, 02-Feb-2018 [blog post on CodeProject](https://www.codeproject.com/Articles/1227765/Parsing-XBRL-with-Python)     

*please see the footnotes at the bottom of this notebook for disclaimers*    

##Housekeeping    

For the `sec-edgar-downloader` Basic and Advanced scripts, be sure to import the following module:    

* from sec_edgar_downloader import Downloader    

The `sec-edgar-downloader` module can be installed using the following statement:    
* `!pip install -U sec-edgar-downloader`    


For the `yfinance` scripts, be sure to import the following module:     
* import yfinance as yf    

The `yfinance` module can be installed using the following statement:    
* `!pip install yfinance --upgrade --no-cache-dir`    

To use a custom requests session with `yfinance` (for example, to cache calls to the API or customize the User-agent header), pass a `session=` argument to the `Ticker` constructor as in the following code:    


```
import requests_cache
session = requests_cache.CachedSession('yfinance.cache')
session.headers['User-agent'] = 'my-program/1.0'
# yf.Ticker takes a single symbol
ticker = yf.Ticker('msft', session=session)
# The scraped response will be stored in the cache
ticker.actions
```

And, if you intend to work with the example of parsing `XBRL .xml` files in the **APPENDIX: Parsing XBRL with Python**, then you will want to import `BeautifulSoup`, `requests`, and `sys` as in the following code:    


```
from bs4 import BeautifulSoup
import requests
import sys
```



In [6]:
# importing sec-edgar-downloader for access to SEC Edgar data    
!pip install -U sec-edgar-downloader    
from sec_edgar_downloader import Downloader


Collecting sec-edgar-downloader
  Downloading sec_edgar_downloader-4.3.0-py3-none-any.whl (13 kB)
Collecting Faker
  Downloading Faker-13.6.0-py3-none-any.whl (1.5 MB)
Installing collected packages: Faker, sec-edgar-downloader
Successfully installed Faker-13.6.0 sec-edgar-downloader-4.3.0


In [1]:
# importing yfinance as yf for access to yahoo! Finance data    
!pip install yfinance --upgrade --no-cache-dir    
import yfinance as yf 



In [None]:
# importing beautifulsoup, requests, and sys for XBRL parsing example in the APPENDIX
from bs4 import BeautifulSoup
import requests
import sys

In [None]:
# import prettyprint (pprint) for easy-to-read output 
import pprint 



---



##1. Basic SEC Edgar (downloader) Usage     


This section demonstrates downloads of `HTML` documents for various SEC Edgar filings on publicly traded companies.    

The official list of all SEC forms (documents) can be found on the [U.S. Securities and Exchange Commission - Forms List](https://www.sec.gov/forms).    

Explanations of the content and use of some of the more frequently analyzed SEC Edgar submission forms are available on the [Investopedia - SEC Filings: Forms You Need to Know page](https://www.investopedia.com/articles/fundamental-analysis/08/sec-forms.asp#:~:text=Among%20the%20most%20common%20SEC,114%2C%20and%20Foreign%20Investment%20Disclosures.).     

In [None]:
from sec_edgar_downloader import Downloader

# Initialize a downloader instance. If no argument is passed
# to the constructor, the package will download filings to
# the current working directory.
# dl = Downloader("/path/to/valid/save/location") # to specify a path
dl = Downloader()  # to use the current working directory   

# Get all 8-K filings for Apple (ticker: AAPL)
# dl.get("8-K", "AAPL")  # this is lengthy

# Get all 8-K filings for Apple, including filing amends (8-K/A)
# dl.get("8-K", "AAPL", include_amends=True) # this is lengthy

# Get all 8-K filings for Apple after January 1, 2017 and before March 25, 2017
# Note: after and before strings must be in the form "YYYY-MM-DD"
dl.get("8-K", "AAPL", after="2017-01-01", before="2017-03-25") # this is brief

# Get the five most recent 8-K filings for Apple
dl.get("8-K", "AAPL", amount=5) # this is brief

# Get all 10-K filings for Microsoft
# dl.get("10-K", "MSFT")  # this is moderate in length

# Get the latest 10-K filing for Microsoft
dl.get("10-K", "MSFT", amount=1) # this is brief 

# Get all 10-Q filings for Visa
# dl.get("10-Q", "V") # this is very lengthy 

# Get all 13F-NT filings for the Vanguard Group
dl.get("13F-NT", "0000102909")  # this is brief

# Get all 13F-HR filings for the Vanguard Group
# dl.get("13F-HR", "0000102909")  # this is lengthy 

# Get all SC 13G filings for Apple
dl.get("SC 13G", "AAPL") # this is brief

# Get all SD filings for Apple
dl.get("SD", "AAPL") # this is brief 
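The `after` and `before` arguments used above must be strings in the form "YYYY-MM-DD". As a minimal sketch, the helper below builds such a pair from a Python `date` and a number of days. The name `filing_window` is hypothetical and is not part of `sec-edgar-downloader`.

```python
from datetime import date, timedelta


def filing_window(end: date, days: int) -> dict:
    """Return after/before keyword arguments as YYYY-MM-DD strings."""
    start = end - timedelta(days=days)
    # date.isoformat() produces the YYYY-MM-DD form
    return {"after": start.isoformat(), "before": end.isoformat()}


window = filing_window(date(2017, 3, 25), days=83)
print(window)  # {'after': '2017-01-01', 'before': '2017-03-25'}

# The dictionary can then be unpacked into the same call shape used above:
# dl.get("8-K", "AAPL", **window)
```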

Let's look at SEC Edgar filings regarding insider trading of **Twitter (`TWTR`)**  stock.    

We can wrap the `dl.get()` method with a `print()` statement to display a count of the number of documents downloaded.    


In [13]:
# from sec_edgar_downloader import Downloader

# Initialize a downloader instance. If no argument is passed
# to the constructor, the package will download filings to
# the current working directory.
# dl = Downloader("/path/to/valid/save/location") # to specify a path
# dl = Downloader()  # to use the current working directory   

# Get the last 6 filings of Form-3 (Beneficial Ownership) for Twitter (ticker: TWTR)
print('Form 3 downloads', dl.get("3", "TWTR", amount=6))  # Is Elon Musk on any of these?

# Get the last six months' worth of Form-4 (Changes in Ownership) for Twitter (ticker: TWTR)
print('Form 4 downloads', dl.get("4", "TWTR", after="2021-10-31", before="2022-05-01"))  # Is Elon Musk on any of these?

# Get all of the filings of Form-5 (Annual Summary of #4) for Twitter (ticker: TWTR)
print('Form 5 downloads', dl.get("5", "TWTR"))  # Is Elon Musk on any of these?


Form 3 downloads 6
Form 4 downloads 42
Form 5 downloads 0
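The counts above say how many filings were fetched, but not where they landed. The sketch below walks a download folder and counts the saved files per company and form type. It assumes the files sit under a folder named `sec-edgar-filings` laid out as `<company>/<form type>/<accession number>/<file>`; both the folder name and the layout are assumptions to check against your own working directory, and `summarize_downloads` is a hypothetical helper name.

```python
from collections import Counter
from pathlib import Path


def summarize_downloads(root: Path) -> Counter:
    """Count saved files per (company, form type) under the download folder."""
    counts = Counter()
    if not root.is_dir():
        return counts  # nothing downloaded yet, or a different folder name
    for path in root.rglob("*"):
        parts = path.relative_to(root).parts
        # Assumed layout: <company>/<form type>/<accession number>/<file>
        if path.is_file() and len(parts) >= 3:
            counts[(parts[0], parts[1])] += 1
    return counts


# "sec-edgar-filings" is an assumed folder name; adjust it if yours differs.
summary = summarize_downloads(Path("sec-edgar-filings"))
for (company, form), n in sorted(summary.items()):
    print(f"{company:12} {form:8} {n} file(s)")
```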




---



##2. Advanced SEC Edgar (downloader) Usage     



Although this section is labeled advanced, it is, like the basic section, focused on downloading documents rather than parsing the data in them. The more advanced features shown here are more specific search terms for selecting the documents to download.    


In [None]:
from sec_edgar_downloader import Downloader

# Download filings to the current working directory
dl = Downloader()

# Get all Apple proxy statements that contain the term "antitrust"
dl.get("DEF 14A", "AAPL", query="antitrust")

# Get all 10-K filings for Microsoft without the filing details
dl.get("10-K", "MSFT", download_details=False)

# Get the latest supported filings, if available, for Apple
for filing_type in dl.supported_filings:
    dl.get(filing_type, "AAPL", amount=1)

# Get the latest supported filings, if available, for a
# specified list of tickers and CIKs
equity_ids = ["AAPL", "MSFT", "0000102909", "V", "FB"]
for equity_id in equity_ids:
    for filing_type in dl.supported_filings:
        dl.get(filing_type, equity_id, amount=1)    
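When looping over many identifiers and filing types as above, one failed request can stop the whole run. The sketch below wraps the same `dl.get(filing_type, equity_id, amount=1)` call in a `try`/`except` so the loop continues and records what failed. `download_latest` is a hypothetical helper, and it assumes only that `dl.get()` returns a count, as in the `print()` examples earlier.

```python
from itertools import product


def download_latest(dl, equity_ids, filing_types):
    """Request the latest filing for every (identifier, filing type) pair."""
    results, failures = {}, {}
    for equity_id, filing_type in product(equity_ids, filing_types):
        try:
            results[(equity_id, filing_type)] = dl.get(filing_type, equity_id, amount=1)
        except Exception as exc:  # record the failure and keep going
            failures[(equity_id, filing_type)] = str(exc)
    return results, failures


# Possible usage with the objects defined in the cell above:
# results, failures = download_latest(dl, equity_ids, dl.supported_filings)
# print(len(results), "requests succeeded;", len(failures), "failed")
```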




---



##APPENDIX: Parsing XBRL with Python     

from the [02-Feb-2018 blog post by Matt Scarpino on CodeProject](https://www.codeproject.com/Articles/1227765/Parsing-XBRL-with-Python)     



So far this notebook has demonstrated how to access corporate reports in the EDGAR database, but it didn't explain how to extract data from a report. If you look at a report listing, you'll see that EDGAR provides reports in three primary formats:    

* **Regular text** - Data provided in regular files (*.txt)    
* **Web pages** - Data to be viewed in a browser (*.htm)    
* **XBRL** - Data provided in XBRL-formatted files (*.xml)    

The first two options are fine if you want to read report data yourself. But if you want to extract data programmatically, the last option is the most practical. XBRL files aren't easy for humans to read, but because of their structure, they're ideally suited for computers.    

This appendix describes the XBRL format and then explains how to read XBRL using BeautifulSoup. At the end, I'll present example code that programmatically downloads and parses an XBRL file from EDGAR.    

###1. Introducing XBRL    

A primary role of the **US Securities and Exchange Commission (SEC)** is to ensure that investors have reliable information with which to make decisions. To this end, the **SEC** requires that publicly-traded corporations submit reports that accurately portray their financial state. Corporations have traditionally provided these reports in regular text, but as computerized stock analysis became popular, the **SEC** decided on a more structured, computer-readable format.    

The **SEC** selected the **eXtensible Business Reporting Language** (`XBRL`) for structured corporate reporting. As of April 2009, the **SEC** requires that corporations provide financial reports in `XBRL` format in addition to text. Since then, India and the United Kingdom have also adopted `XBRL` for corporate reporting.    

`XBRL` is based on the **eXtensible Markup Language** (`XML`), but uses special tags to mark financial data. This section presents the basics of `XML` and namespaces, and then provides an overview of `XBRL`.    


####1.1 XML, Schema, and Namespaces    

A good way to introduce `XML` is to compare it with HTML. An `HTML` document structures its content using nested tags that take the form `<xyz>...</xyz>`. For example, `HTML` uses `<b>...</b>` tags to display text in boldface, as in `<b>Hi there!</b>`. HTML lets you control a tag's behavior with attributes, such as the id attribute in `<p id="...">...</p>`.    

I like to think of `XML` as generic `HTML`. An `XML` document contains tags and attributes similar to those in `HTML` but `XML` doesn't define any specific tags or attributes. Instead, implementers can define their own tags and attributes by creating a schema. Schemas are defined in special `XML` documents formatted with **XML Schema Definition (XSD)**, and for this reason, schema documents have the suffix `*.xsd` instead of `*.xml`.    

An XML document can access the tags and attributes of a schema using a namespace declaration. As an example, the following declaration specifies that the `XML` document will access the tags and attributes defined in the schema located at `http://www.example.com`:    

In [None]:
# sample XML 
xmlns:ex="http://www.example.com"
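To see what a namespace declaration does in practice, here is a minimal sketch using Python's standard `xml.etree.ElementTree` module (not the BeautifulSoup approach used later in this appendix). The element names `ex:report` and `ex:item` are made up for illustration.

```python
import xml.etree.ElementTree as ET

# A tiny document that declares the "ex" prefix (element names are hypothetical)
doc = '<ex:report xmlns:ex="http://www.example.com"><ex:item>42</ex:item></ex:report>'
root = ET.fromstring(doc)

# ElementTree expands the prefix into the full namespace URI
print(root.tag)  # {http://www.example.com}report

# A prefix-to-URI mapping lets searches use the short prefixed form
ns = {"ex": "http://www.example.com"}
item = root.find("ex:item", ns)
print(item.text)  # 42
```

The prefix itself is only shorthand; the URI it maps to is what identifies the schema.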

####1.2 XBRL Reports and Schema    

An `XBRL` document is an `XML` document that structures its content using `XBRL`'s tags and attributes. This may sound straightforward, but a single document may need to access features from many different schemas. For example, different countries have different reporting requirements, so an American report will access a different set of elements than a British report. Similarly, different types of reports will require different schemas, so an annual report will use different tags than a prospectus.    

A thorough discussion of the tags/attributes in an American corporation's annual report would take up a sizable book. In this discussion, my goal is to present some of the namespaces that are commonly accessed in American reports:    

* **Base XBRL Schema** - Provides the overall structure of an `XBRL` document    
* **US Document and Entity Information (DEI)** - Sets a document's type and characteristics    
* **US Generally Accepted Accounting Principles (GAAP)** - Defines required elements of American reports    
* **Entity-specific Schema** - Defines elements specific to the entity providing the report    

You don't need to memorize the elements of these namespaces, but the more familiar you are, the better you'll be able to extract data from `XBRL` documents.    


#####1.2.1 The Base XBRL Schema     

The fundamental tags and attributes of `XBRL` are provided in the schema located at *http://www.xbrl.org/2003/instance*. Documents commonly access these elements through the `xbrli` prefix, as given in the following namespace declaration:    



In [None]:
# XML example 
xmlns:xbrli="http://www.xbrl.org/2003/instance"    


Of the many elements defined by the schema, `xbrli:xbrl` is particularly important. This is because the content of every XBRL document must be contained inside `<xbrli:xbrl>...</xbrli:xbrl>` tags.    

To understand other tags provided by the base schema, you should be familiar with the following terms:  

* **instance** - an `XBRL` document whose root element is `<xbrli:xbrl>`   
* **fact** - an individual detail in a report, such as $20M   
* **concept** - the meaning associated with a fact, such as the cost of goods sold   
* **entity** - the company or individual described by a concept    
* **context** - a data structure that associates an entity with a concept    

Many `XBRL` documents start by defining a long list of contexts. Each context is represented by an `<xbrli:context>` element and each has an id attribute. Each `<xbrli:context>` element contains an `<xbrli:entity>` subelement that identifies an entity. The following markup defines a context with an identifier of **FD2013Q4YTD**:     



In [None]:
# XML example    
<xbrli:context id="FD2013Q4YTD">
  <xbrli:entity>
    <xbrli:identifier scheme="http://www.sec.gov/CIK">0001065088</xbrli:identifier>
  </xbrli:entity>
  <xbrli:period>
    <xbrli:startDate>2013-01-01</xbrli:startDate>
    <xbrli:endDate>2013-12-31</xbrli:endDate>
  </xbrli:period>
</xbrli:context> 


Later sections in the document can reference this context by setting a `contextRef` attribute to the context's ID. This is shown in the following markup:   


In [None]:
# XML example 
<us-gaap:IncomeTaxDisclosureTextBlock contextRef="FD2013Q4YTD" ...>
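The link between a fact and its context can be followed in code. The sketch below uses the standard library's `xml.etree.ElementTree` (rather than BeautifulSoup) to index contexts by `id` and then resolve a fact's `contextRef`. The sample document reuses the context shown above; the `us-gaap:Liabilities` fact and its value of 100 are invented for illustration.

```python
import xml.etree.ElementTree as ET

XBRLI_NS = "http://www.xbrl.org/2003/instance"
US_GAAP_NS = "http://fasb.org/us-gaap/2014-01-31"

# Sample instance: the context is from the text above, the fact value is made up
sample = f"""<xbrli:xbrl xmlns:xbrli="{XBRLI_NS}" xmlns:us-gaap="{US_GAAP_NS}">
  <xbrli:context id="FD2013Q4YTD">
    <xbrli:entity>
      <xbrli:identifier scheme="http://www.sec.gov/CIK">0001065088</xbrli:identifier>
    </xbrli:entity>
    <xbrli:period>
      <xbrli:startDate>2013-01-01</xbrli:startDate>
      <xbrli:endDate>2013-12-31</xbrli:endDate>
    </xbrli:period>
  </xbrli:context>
  <us-gaap:Liabilities contextRef="FD2013Q4YTD" unitRef="usd">100</us-gaap:Liabilities>
</xbrli:xbrl>"""

ns = {"xbrli": XBRLI_NS}
root = ET.fromstring(sample)

# Index every context by its id attribute
contexts = {}
for ctx in root.findall("xbrli:context", ns):
    contexts[ctx.get("id")] = {
        "cik": ctx.findtext("xbrli:entity/xbrli:identifier", namespaces=ns),
        "start": ctx.findtext("xbrli:period/xbrli:startDate", namespaces=ns),
        "end": ctx.findtext("xbrli:period/xbrli:endDate", namespaces=ns),
    }

# Resolve the fact's contextRef to the entity and period it describes
fact = root.find(f"{{{US_GAAP_NS}}}Liabilities")
print(contexts[fact.get("contextRef")])
# {'cik': '0001065088', 'start': '2013-01-01', 'end': '2013-12-31'}
```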


#####1.2.2 US Document and Entity Information (DEI)    

Every `XBRL` document submitted to the **SEC** needs to provide information about its content. A submitter can meet this requirement by including elements from the **US Document and Entity Information (DEI)** schema. These elements are commonly prefixed with `dei` and a document can access them with the following declaration:    

In [None]:
# XML example
xmlns:dei="http://xbrl.sec.gov/dei/2014-01-31"

The elements defined in this schema identify the XBRL report's type and provide information about the entity submitting the report. *Table 1* lists eleven of the many elements available.    

**Table 1: Elements Provided by the US Document and Entity Information Schema (Abridged)**    


* **DocumentType** >> Type of document being reported    
* **EntityCentralIndexKey** >> CIK of the entity submitting the report    
* **TradingSymbol** >> Exchange symbol of the entity submitting the report    
* **EntityCurrentReportingStatus** >> Identifies if the entity is subject to filing requirements    
* **EntityFilerCategory** >> Identifies the entity's filing category (large, small, ...)    
* **EntityRegistrantName** >> Exact name of the entity as given in its charter    
* **DocumentFiscalPeriodFocus** >> The document's focus fiscal period    
* **DocumentFiscalYearFocus** >> The document's focus fiscal year    
* **CurrentFiscalYearEndDate** >> End of the current fiscal year    
* **AmendmentFlag** >> Identifies if the document is an amendment to a previously-filed document    
* **AmendmentDescription** >> Description of changes in amended document    



It's important to see the difference between `EntityCentralIndexKey`, `TradingSymbol`, and `EntityRegistrantName`. The `EntityCentralIndexKey` element identifies the submitter's **CIK code**, the `TradingSymbol` identifies the submitter's **trading (ticker) symbol**, and `EntityRegistrantName` provides the entity's formal name.

The following markup, taken from an **eBay annual report**, demonstrates how DEI elements are used:    


In [None]:
# XML example    
<dei:DocumentType contextRef="..." id="Fact-...">
  10-K
</dei:DocumentType>
<dei:EntityCentralIndexKey contextRef="..." id="Fact-...">
  0001065088
</dei:EntityCentralIndexKey>
<dei:TradingSymbol contextRef="..." id="Fact-...">
  EBAY
</dei:TradingSymbol>
<dei:EntityRegistrantName contextRef="..." id="Fact-...">
  EBAY INC
</dei:EntityRegistrantName>
<dei:EntityFilerCategory contextRef="..." id="Fact-...">
  Large Accelerated Filer
</dei:EntityFilerCategory> 


As shown, each `DEI` element has an id attribute and a contextRef that refers to an `<xbrli:context>` element defined earlier in the document.    
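As a rough sketch of how these DEI facts could be collected, the code below gathers every element in the `dei` namespace into a dictionary, stripping the surrounding whitespace seen in the markup above. It uses the standard library's `xml.etree.ElementTree` rather than BeautifulSoup, and the `contextRef` value `c1` is a placeholder, not the value from eBay's report.

```python
import xml.etree.ElementTree as ET

DEI_NS = "http://xbrl.sec.gov/dei/2014-01-31"

# Abridged sample based on the markup above; "c1" is a placeholder contextRef
sample = f"""<xbrli:xbrl xmlns:xbrli="http://www.xbrl.org/2003/instance" xmlns:dei="{DEI_NS}">
  <dei:DocumentType contextRef="c1">
    10-K
  </dei:DocumentType>
  <dei:EntityCentralIndexKey contextRef="c1">
    0001065088
  </dei:EntityCentralIndexKey>
  <dei:TradingSymbol contextRef="c1">
    EBAY
  </dei:TradingSymbol>
  <dei:EntityRegistrantName contextRef="c1">
    EBAY INC
  </dei:EntityRegistrantName>
</xbrli:xbrl>"""

prefix = "{" + DEI_NS + "}"
dei_facts = {
    element.tag[len(prefix):]: element.text.strip()  # drop namespace and padding
    for element in ET.fromstring(sample)
    if element.tag.startswith(prefix)
}
print(dei_facts)
# {'DocumentType': '10-K', 'EntityCentralIndexKey': '0001065088',
#  'TradingSymbol': 'EBAY', 'EntityRegistrantName': 'EBAY INC'}
```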


#####1.2.3 US Generally Accepted Accounting Principles (GAAP)

To ensure that businesses use common terminology in their accounting reports, the **US Financial Accounting Standards Board (FASB)** provides a set of standards called the **Generally Accepted Accounting Principles**, or `GAAP`. Entities can provide `GAAP` data in their `XBRL` reports by accessing the **FASB**'s schema definitions. `GAAP` elements are commonly preceded with the `us-gaap` prefix:    



In [None]:
# XML Example 
xmlns:us-gaap="http://fasb.org/us-gaap/2014-01-31"

This schema provides thousands of elements related to accounting, and *Table 2* lists a small but important subset. You can look through a more complete table [here](http://www.xbrlsite.com/LinkedData/BrowseObjectsByType_HTML.aspx?Type=%5BConcept%5D&Submit=Submit).   


**Table 2: Elements of the US Generally Accepted Accounting Principles Schema (Abridged)**     

**AccountsPayableCurrent** >> Liabilities payable to vendors as of the balance sheet date    
**AccountsReceivableGross** >> Amounts due from customers or clients    
**AccountsReceivableNet** >> Amounts due from customers or clients, reduced to estimated realizable value    
**AccruedIncomeTaxes** >> Unpaid sum of known and estimated tax obligations    
**AccruedInsuranceCurrent** >> Obligations payable to insurance entities to mitigate loss    
**AssetManagementCosts** >> Aggregate costs related to asset management    
**AssetsCurrent** >> Sum of all assets expected to be realized within one year    
**BorrowedFunds** >> Sum of all debt amounts    
**Cash** >> Unrestricted cash available for operating needs    
**CommercialPaper** >> Value of short-term borrowings using unsecured obligations issued by banks and corporations    
**CommonStockNoParValue** >> Issuance value per share of no-par value stock    
**CommonStockSharesIssued** >> Total number of common shares that have been sold or granted to shareholders    
**CommonStockValue** >> Aggregate par or stated value of issued common stock    
**SalariesAndWages** >> Expenditures for salaries other than officers    
**ConvertibleDebt** >> Amount of debt that can be converted into another form of financial instrument, such as common stock    
**CostOfGoodsSold** >> Aggregate costs related to goods sold during the period    
**CostOfServices** >> Total costs related to services rendered during the period    
**CostsAndExpenses** >> Total costs of sales and operating expenses for the period    
**DebtCurrent** >> Sum of short-term debt and maturities of long-term debt    
**DeferredRevenue** >> Cash or other assets that have not yet been realized    
**Depreciation** >> Amount of expense related to the cost of tangible assets over the assets' useful lives    
**DirectOperatingCosts** >> Aggregate expenses directly related to operations    
**Dividends** >> Equity impact of cash, stock, and dividends declared for all securities during the period    
**EarningsPerShareBasic** >> Net income (loss) for the period per share of common stock    
**GrossProfit** >> Aggregate revenue minus the cost of goods/services sold and operating expenses    
**IntangibleAssetsCurrent** >> Current portion of non-physical assets, excluding financial assets    
**InterestAndDebtExpense** >> Expenses related to interest and debt payments    
**InventoryGross** >> Merchandise, goods, or supplies held for future sale or used in manufacturing or production    
**Land** >> Real estate held for productive use, not held for sale    
**Liabilities** >> Sum of all recognized liabilities    
**LiabilitiesAndStockholdersEquity** >> Total of liabilities and stockholders' equity, including the portion of equity attributable to noncontrolling interests    
**NetIncomeLoss** >> Portion of profit or loss for the period, net of income taxes    
**ProfitLoss** >> Consolidated profit or loss for the period    
**NotesPayable** >> Aggregate amount of notes payable, with initial maturities beyond one year or the normal operating cycle    
**OfficersCompensation** >> Expenditures for salaries of officers    
**OperatingCycle** >> Entity's operating cycle if less than 12 months    
**OperatingExpenses** >> Recurring costs associated with normal operations except expenses included in the cost of sales or services    
**PreferredStockValue** >> Stated value of issued nonredeemable preferred stock    
**ResearchAndDevelopmentExpense** >> Costs incurred during research and development activities    
**Revenues** >> Aggregate revenue recognized during the period    
**SharesIssued** >> Number of shares of stock issued    
**SharesOutstanding** >> Number of shares issued and outstanding    
**StockholdersEquity** >> Total of stockholders' equity items, net of receivables from officers, directors, owners, and affiliates    

You can find accounting data in a report by searching for the appropriate `us-gaap` element. For example, **eBay's 2014** annual report identifies its **aggregate liabilities** with the following markup:   

In [None]:
# XML example 
<us-gaap:Liabilities contextRef="..." decimals="..." id="..." unitRef="usd">
  25226000000
</us-gaap:Liabilities> 


The `us-gaap` schema has many elements that closely resemble one another in name and purpose. If you're searching for specific accounting data, be sure not to confuse the elements.

###2. Parsing XBRL with BeautifulSoup    

After you've downloaded an XBRL document, you can extract its data using a number of methods. If you know what element you're interested in, you can perform a brute-force search for the text, as in `us-gaap:Assets`. At the opposite extreme, the `python-xbrl` library was specially created for parsing `XBRL` documents, but I've never gotten it to work properly.    

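This section explains how to parse `XBRL` using the `BeautifulSoup` package. You don't need to learn any new classes or methods, but it is important to specify the parser. Note that `'lxml'` selects lxml's HTML parser, which converts tag and attribute names to lowercase, and the examples below rely on those lowercase names (BeautifulSoup's XML mode is selected with `'xml'` or `'lxml-xml'` instead). If you install the `lxml` library (`pip install lxml`), then you can create the `BeautifulSoup` instance with the following code:    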


In [None]:
soup = BeautifulSoup(..., 'lxml')

For some reason, when I call the `find_all` method to search for an `XBRL` tag, the returned list is always empty. But when I call `find_all` without arguments, the returned list contains Tags that represent `XBRL` tags. Therefore, I use code like the following:    


In [None]:
soup = BeautifulSoup(xbrl_string, 'lxml')
tag_list = soup.find_all()
for tag in tag_list:
    if tag.name == 'us-gaap:liabilities':
        print('Liabilities: ' + tag.text)    


An annual report may contain multiple `<us-gaap:liabilities>` elements, each corresponding to a different reporting period. Each period corresponds to a `<context>` element, so you can distinguish between `GAAP` elements by checking their `contextRef` attributes. With the `'lxml'` parser used above, attribute names are lowercased, so the attribute key is `contextref`.    
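To illustrate telling such elements apart, the sketch below groups liabilities facts by their context reference. It swaps in the standard library's `xml.etree.ElementTree` for BeautifulSoup, which keeps the original mixed-case names such as `Liabilities` and `contextRef`. The context identifiers and values are invented for illustration.

```python
import xml.etree.ElementTree as ET

US_GAAP_NS = "http://fasb.org/us-gaap/2014-01-31"

# Two facts for the same concept in different periods (ids and values are made up)
sample = f"""<xbrli:xbrl xmlns:xbrli="http://www.xbrl.org/2003/instance" xmlns:us-gaap="{US_GAAP_NS}">
  <us-gaap:Liabilities contextRef="FI2013Q4" unitRef="usd">100</us-gaap:Liabilities>
  <us-gaap:Liabilities contextRef="FI2014Q4" unitRef="usd">250</us-gaap:Liabilities>
</xbrli:xbrl>"""

root = ET.fromstring(sample)

# Map each context id to the value reported for that period
liabilities_by_context = {
    element.get("contextRef"): int(element.text)
    for element in root.iter(f"{{{US_GAAP_NS}}}Liabilities")
}
print(liabilities_by_context)              # {'FI2013Q4': 100, 'FI2014Q4': 250}
print(liabilities_by_context["FI2014Q4"])  # 250
```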


##3.a Complete EDGAR-XBRL Example (3.a needs work -- see 3.b)

If you are familiar with BeautifulSoup and the content of this article, you shouldn't have any trouble understanding how to access a company's EDGAR reports and parse them in Python. To demonstrate this, the code in *Listing 1* searches EDGAR for the 2014 annual report (10-K) from IBM (CIK: 0000051143) and then parses the XBRL to determine the stockholders' equity (`us-gaap:stockholdersequity`).   


###**THE FOLLOWING CODE NEEDS TO BE UPDATED**

**Listing 1: Reading Stockholder's Equity from IBM's Annual Report (xbrl_reader.py)**

In [None]:
from bs4 import BeautifulSoup
import requests
import sys

# Search parameters
cik = '0000051143'
filing_type = '10-K'  # renamed from `type`, which shadows a Python built-in
dateb = '20160101'

# The SEC asks automated clients to identify themselves with a User-Agent
# header. The value below is a placeholder; replace it with your own details.
headers = {'User-Agent': 'Sample Name sample@example.com'}

# Obtain HTML for search page
base_url = "https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK={}&type={}&dateb={}"
edgar_resp = requests.get(base_url.format(cik, filing_type, dateb), headers=headers)
edgar_str = edgar_resp.text

# Find the document link (the last row filed in 2015 wins)
doc_link = ''
soup = BeautifulSoup(edgar_str, 'html.parser')
table_tag = soup.find('table', class_='tableFile2')
rows = table_tag.find_all('tr') if table_tag else []
for row in rows:
    cells = row.find_all('td')
    if len(cells) > 3:
        if '2015' in cells[3].text:
            doc_link = 'https://www.sec.gov' + cells[1].a['href']

# Stop if the document link couldn't be found
# (raising works in Colab, where sys.exit() is not worth trying)
if doc_link == '':
    raise RuntimeError("Couldn't find the document link")

# Obtain HTML for document page
doc_resp = requests.get(doc_link, headers=headers)
doc_str = doc_resp.text

# Find the XBRL link (the row whose type column mentions the instance, 'INS')
xbrl_link = ''
soup = BeautifulSoup(doc_str, 'html.parser')
table_tag = soup.find('table', class_='tableFile', summary='Data Files')
rows = table_tag.find_all('tr') if table_tag else []
for row in rows:
    cells = row.find_all('td')
    if len(cells) > 3:
        if 'INS' in cells[3].text:
            xbrl_link = 'https://www.sec.gov' + cells[2].a['href']

# Stop if the XBRL link couldn't be found
if xbrl_link == '':
    raise RuntimeError("Couldn't find the XBRL link")

# Obtain XBRL text from document
xbrl_resp = requests.get(xbrl_link, headers=headers)
xbrl_str = xbrl_resp.text

# Find and print stockholders' equity
# ('lxml' is the HTML parser, so tag names are lowercase)
soup = BeautifulSoup(xbrl_str, 'lxml')
tag_list = soup.find_all()
for tag in tag_list:
    if tag.name == 'us-gaap:stockholdersequity':
        print("Stockholders' equity: " + tag.text)



---



##3.b `python-xbrl` Simple XBRL Parsing Example     

See the [`python-xbrl` documentation](https://pypi.org/project/python-xbrl/) for more information.    

*note that [there are some dangling segments of code in the `python-xbrl` library](https://stackoverflow.com/questions/33963598/parsing-xbrl-using-python), which does not appear to have been actively maintained over the years*

*note: this section uses the following example `XBRL .xml` files:* 

* sam-20131228.xml    
* sam-20140927.xml    

*each file may be accessed from Professor Patrick's Github using the following statements, respectively:*    


```
!curl https://raw.githubusercontent.com/ProfessorPatrickSlatraigh/data/main/sam-20131228.xml -o "sam-20131228.xml"

!curl https://raw.githubusercontent.com/ProfessorPatrickSlatraigh/data/main/sam-20140927.xml -o "sam-20140927.xml"
```



In [None]:
#Housekeeping to load example files to current working directory    
!curl https://raw.githubusercontent.com/ProfessorPatrickSlatraigh/data/main/sam-20131228.xml -o "sam-20131228.xml"

!curl https://raw.githubusercontent.com/ProfessorPatrickSlatraigh/data/main/sam-20140927.xml -o "sam-20140927.xml"

In [27]:
# Housekeeping to install python-xbrl 
!pip install python-xbrl    

# Housekeeping to import required modules 
from xbrl import XBRLParser, GAAP, GAAPSerializer, DEISerializer

# Housekeeping to import prettyprint (pprint)
import pprint




###Simple XBRL Parsing Workflow    

First parse the incoming `XBRL` file into a new `XBRL` basic object.    



In [18]:
xbrl_parser = XBRLParser()
xbrl = xbrl_parser.parse("sam-20131228.xml")


Then you can parse the document using different parsers.


In [19]:
gaap_obj = xbrl_parser.parseGAAP(xbrl, doc_date="20131228", context="current", ignore_errors=0)


Now we have a GAAP model object that has the GAAP parsed elements from the document.    

This model object supports several features, including:    

* **context current year**, and **instance contexts** are supported. If available you can also get **previous quarter** information by number of days from doc date. Example: 90, 180, etc.    
* **Error handling**  
    0. Raise an exception for all parsing errors and halt parsing
    1. Suppress all parsing errors and continue parsing
    2. Log all parsing errors and continue parsing    

You can serialize the GAAP model object into a form suitable for rendering in a standard format such as JSON, or for use in an HTTP API. 
   

In [20]:
serializer = GAAPSerializer()
result = serializer.dump(gaap_obj)

You can also just view the data in the serialized object.

In [22]:
print(result)    


{'income_loss': 29120.0, 'equity_attributable_interest': 0.0, 'net_cash_flows_financing_continuing': 0.0, 'net_cash_flows_investing': 0.0, 'extraordary_items_gain_loss': 0.0, 'comprehensive_income': 0.0, 'nonoperating_income_loss': 0.0, 'net_income_loss_noncontrolling': 0.0, 'other_comprehensive_income': 0.0, 'redeemable_noncontrolling_interest': 0.0, 'common_shares_outstanding': 0.0, 'current_liabilities': 0.0, 'current_assets': 0.0, 'non_current_assets': 9556.0, 'net_cash_flows_operating_discontinued': 0.0, 'net_cash_flows_investing_discontinued': 0.0, 'interest_and_debt_expense': 0.0, 'commitments_and_contingencies': 0.0, 'gross_profit': 104628.0, 'costs_and_expenses': 0.0, 'income_before_equity_investments': 0.0, 'net_cash_flows_investing_continuing': 0.0, 'net_cash_flows_operating_continuing': 0.0, 'other_operating_income': 0.0, 'operating_expenses': 0.0, 'equity_attributable_parent': 0.0, 'cost_of_revenue': 0.0, 'noncurrent_liabilities': 0.0, 'net_cash_flows_discontinued': 0.0, '

In [25]:
# import pprint    

pprint.pprint(result)

{'assets': 1050.0,
 'commitments_and_contingencies': 0.0,
 'common_shares_authorized': 0.0,
 'common_shares_issued': 0.0,
 'common_shares_outstanding': 0.0,
 'comprehensive_income': 0.0,
 'comprehensive_income_interest': 0.0,
 'comprehensive_income_parent': 0.0,
 'cost_of_revenue': 0.0,
 'costs_and_expenses': 0.0,
 'current_assets': 0.0,
 'current_liabilities': 0.0,
 'equity': 0.0,
 'equity_attributable_interest': 0.0,
 'equity_attributable_parent': 0.0,
 'extraordary_items_gain_loss': 0.0,
 'gross_profit': 104628.0,
 'income_before_equity_investments': 0.0,
 'income_from_equity_investments': 0.0,
 'income_loss': 29120.0,
 'income_tax_expense_benefit': 0.0,
 'interest_and_debt_expense': 0.0,
 'liabilities': 104377.0,
 'liabilities_and_equity': 69900.0,
 'net_cash_flows_discontinued': 0.0,
 'net_cash_flows_financing': 0.0,
 'net_cash_flows_financing_continuing': 0.0,
 'net_cash_flows_investing': 0.0,
 'net_cash_flows_investing_continuing': 0.0,
 'net_cash_flows_investing_discontinued': 
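Because the serialized result printed above is a plain dictionary, it can be passed straight to Python's `json` module. The sketch below works on a small sample copied from that output (the full dictionary has many more keys) and drops the `0.0` entries before dumping. Treating `0.0` as "not populated" is an assumption made for this illustration; a filing can also report a genuine zero.

```python
import json

# A few values copied from the printed output above
sample_result = {
    "assets": 1050.0,
    "gross_profit": 104628.0,
    "income_loss": 29120.0,
    "liabilities": 104377.0,
    "cost_of_revenue": 0.0,
    "equity": 0.0,
}

# Keep only the non-zero fields (assumption: 0.0 means nothing was parsed)
reported = {key: value for key, value in sample_result.items() if value != 0.0}

print(json.dumps(reported, indent=2, sort_keys=True))
```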

You can apply other parsers to the base `XBRLParser` object to get data other than `GAAP` data from the document. As you would expect, you can also create different serialized objects from the resulting parsed data object.

###Extracting DEI Data    


In [29]:
dei_obj = xbrl_parser.parseDEI(xbrl)
serializer = DEISerializer()
result = serializer.dump(dei_obj)
pprint.pprint(result)

{'company_name': 'BOSTON BEER CO INC',
 'public_float': 1465900000.0,
 'shares_outstanding': 3827355.0,
 'trading_symbol': 'SAM'}


###Extracting Custom Data    


In [33]:
custom_obj = xbrl_parser.parseCustom(xbrl)
pprint.pprint(custom_obj())

dict_items([('stockrepurchaseprogramcumulativenumberofsharesrepurchased', '10696731'), ('stockrepurchaseprogramcumulativesharesrepurchasedvalue', '269943000'), ('sharebasedcompensationarrangementbysharebasedpaymentawardexpectedfuturecompensationyearone', '4388000'), ('sharebasedcompensationarrangementbysharebasedpaymentawardexpectedfuturecompensationyearfour', '1215000'), ('recordedunconditionalpurchaseobligationdueinsecondandthirdyear', '22323000'), ('sharebasedcompensationarrangementbysharebasedpaymentawardexpectedfuturecompensationyearthree', '2052000'), ('recordedunconditionalpurchaseobligationdueinfourthandfifthyear', '5188000'), ('sharebasedcompensationarrangementbysharebasedpaymentawardexpectedfuturecompensationthereafter', '830000'), ('internationalsalespercentageoftotalsales', '0.04'), ('definedbenefitplanbasisusedtodetermineoverallexpectedlongtermrateofreturnsonassetsassumption', '0.07'), ('accrueddepositscurrent', '15805000'), ('sharebasedcompensationarrangementbysharebasedp



---

