# Readme

The overall aim is to understand how the EBA taxonomy guidelines could be prototyped in Python. Future reporting requirements will seek to reconcile existing regulatory reports (e.g., FINREP, COREP, ESG). Using the XBRL templates should therefore make it easier to define validation rules that check that data points match across reports. This would reduce the manual effort currently spent on peer reviews to ensure accuracy.

Actions:
- Use the SEC notebook to pull data for a number of large filers, e.g., Apple, NVIDIA and a few banks (JP Morgan, Bank of America), to understand how comparisons across reports could be mapped out.
- One option is to build a data pipeline with four stages: raw data; reference tables used to create the XML/XBRL format; derivation of the conversion from raw data to XBRL; final output.
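The pipeline stages could be prototyped as a chain of small functions. The sketch below is an illustration only. The reference table `CONCEPT_MAP`, the raw-row layout and both function names are hypothetical. A real EBA mapping would be derived from the published taxonomy files rather than a hand-written dictionary.

```python
from typing import Dict, List

# Hypothetical reference table: raw-data field -> XBRL concept name.
# A real mapping would be derived from the relevant taxonomy, not hand-written.
CONCEPT_MAP: Dict[str, str] = {
    "total_assets": "Assets",
    "total_liabilities": "Liabilities",
    "net_income": "NetIncomeLoss",
}


def map_to_concepts(raw: Dict[str, float]) -> Dict[str, float]:
    """Translate raw fields into XBRL concept names, dropping unmapped fields."""
    return {CONCEPT_MAP[k]: v for k, v in raw.items() if k in CONCEPT_MAP}


def build_facts(mapped: Dict[str, float], period: str, unit: str = "USD") -> List[dict]:
    """Emit one simplified, XBRL-like fact record per concept (sorted by concept)."""
    return [
        {"concept": c, "value": v, "period": period, "unit": unit}
        for c, v in sorted(mapped.items())
    ]


# Illustrative raw row; 'internal_id' has no mapping and is dropped.
raw_row = {"total_assets": 100.0, "net_income": 5.0, "internal_id": 42}
facts = build_facts(map_to_concepts(raw_row), period="2023-12-31")
for fact in facts:
    print(fact)
```

Each stage stays a pure function, so it can be tested on its own before any XBRL serialisation is attempted.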

Additional background research links below:
- What it is: https://en.wikipedia.org/wiki/XBRL
- Introduction #1: https://www.xbrl.org/the-standard/what/an-introduction-to-xbrl/
- Comparison to XML: https://www.xbrl.org/showcase/xbrlcomparedtoxml-2005-07-09.pdf

### Prototype
Python code to extract filings from the SEC using the sec-api.io API.

Note that free accounts are limited to 100 API queries per month. The quota resets on the 1st of every month.

Key Takeaways
- Query API: fetch the list of filings available for a given ticker (e.g., TSLA for Tesla), sorted by filing date. Filing types available from the SEC include Form 10-K, Form 10-Q, Form 8-K, proxy statements, Forms 3, 4 and 5, Schedule 13D, Form 144 and foreign investment disclosures. The filing URL returned by the Query API (`linkToFilingDetails`) is the URL passed to the XBRL-to-JSON converter, e.g., https://www.sec.gov/Archives/edgar/data/1318605/000162828024002390/tsla-20231231.htm
- Sections: list the sections contained in the extracted filing.
- Balance Sheet & Income Statement: user-defined functions that extract line items from the balance sheet and income statement.
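To compare several filers (e.g., Apple, NVIDIA, JP Morgan and Bank of America), the query payload could be built programmatically instead of by hand. This sketch follows the payload shape of the single-ticker TSLA query used in this notebook. It assumes the Query API accepts Lucene-style `OR` alongside `AND`, so check the sec-api.io query documentation before relying on it. `build_filings_query` is a hypothetical helper.

```python
from typing import Iterable


def build_filings_query(tickers: Iterable[str], form_type: str = "10-K", size: int = 10) -> dict:
    """Build a Query API payload covering several tickers (hypothetical helper)."""
    ticker_clause = " OR ".join(f"ticker:{t}" for t in tickers)
    return {
        # Assumes Lucene-style boolean syntax; parentheses keep the ORs inside the AND.
        "query": f'({ticker_clause}) AND formType:"{form_type}"',
        "from": "0",
        "size": str(size),
        "sort": [{"filedAt": {"order": "desc"}}],
    }


multi_query = build_filings_query(["AAPL", "NVDA", "JPM", "BAC"])
print(multi_query["query"])
```

Running one combined query instead of four separate ones helps with the 100-queries-per-month limit. Note that `size` caps the total number of filings returned across all tickers, not the number per ticker.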

### Code Begins

Before starting, obtain your own API key from https://sec-api.io/. It is used to authenticate your requests to the API.
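Rather than pasting the key into the notebook, it can be read from an environment variable so it does not end up in version control. This is a minimal sketch. The variable name `SEC_API_KEY` and the helper `load_api_key` are assumptions; any name works as long as it is set before the notebook starts.

```python
import os


def load_api_key(var_name: str = "SEC_API_KEY") -> str:
    """Return the API key from the environment, failing loudly if it is missing."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable to your sec-api.io key")
    return key
```

With this in place, `API_KEY = load_api_key()` can replace the hard-coded string.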

```python
API_KEY = '...'  # replace with your own sec-api.io key
```

```python
import json
import pandas as pd
from sec_api import XbrlApi
```

# Query API

```python
from sec_api import QueryApi

queryApi = QueryApi(api_key=API_KEY)

# Latest ten 10-K filings for Tesla, newest first
query = {
  "query": "ticker:TSLA AND formType:\"10-K\"",
  "from": "0",
  "size": "10",
  "sort": [{ "filedAt": { "order": "desc" } }]
}

response = queryApi.get_filings(query)
```

```python
metadata = pd.DataFrame.from_records(response['filings'])
metadata
```
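The metadata table has many columns. The sketch below keeps only the fields most useful for picking a filing. The column names in `KEY_COLUMNS` are assumptions based on typical Query API filing records, so the helper keeps only those that are actually present. `summarise_filings` is a hypothetical name.

```python
import pandas as pd

# Assumed column names from Query API filing records; missing ones are skipped.
KEY_COLUMNS = ["ticker", "companyName", "formType", "periodOfReport", "filedAt", "linkToFilingDetails"]


def summarise_filings(metadata: pd.DataFrame) -> pd.DataFrame:
    """Return only the key columns that exist, newest filing first."""
    cols = [c for c in KEY_COLUMNS if c in metadata.columns]
    summary = metadata[cols].copy()
    if "filedAt" in summary.columns:
        # filedAt is an ISO 8601 timestamp string; parse it so sorting is chronological.
        summary["filedAt"] = pd.to_datetime(summary["filedAt"], utc=True)
        summary = summary.sort_values("filedAt", ascending=False)
    return summary.reset_index(drop=True)
```

For example, `summarise_filings(metadata)` gives a compact view of the filings returned above.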

```python
# Get the URL of the most recent filing (results are sorted newest first)
url = metadata['linkToFilingDetails'][0]
url
```

# XBRL-to-JSON

```python
# Optionally, set the .htm URL manually instead
# url = "https://www.sec.gov/Archives/edgar/data/1318605/000156459021004599/tsla-10k_20201231.htm"
```

```python
xbrl_json = XbrlApi(API_KEY).xbrl_to_json(htm_url=url)
```

```python
# Optionally, save the JSON to disk
# json_file_path = '.../Tesla Inc 12312023.json'
# with open(json_file_path, 'w') as json_file:
#     json.dump(xbrl_json, json_file, indent=4)
```

```python
# Reload the saved JSON (the file is closed automatically) - Warning: the full output is long
# with open(json_file_path) as json_file:
#     tesla_12312023 = json.load(json_file)
```
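Free accounts are limited to 100 queries per month, so it can help to cache each converted filing on disk and reuse it. This is a minimal sketch. `cached_xbrl_json` and the `xbrl_cache` directory name are hypothetical. `fetch` stands in for the conversion call, e.g. `lambda u: XbrlApi(API_KEY).xbrl_to_json(htm_url=u)`.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable


def cached_xbrl_json(url: str, fetch: Callable[[str], dict], cache_dir: str = "xbrl_cache") -> dict:
    """Return the XBRL-JSON for `url`, calling `fetch` only if no cached copy exists."""
    folder = Path(cache_dir)
    folder.mkdir(parents=True, exist_ok=True)
    # Hash the URL to get a safe, stable file name.
    path = folder / (hashlib.sha256(url.encode("utf-8")).hexdigest() + ".json")
    if path.exists():
        with path.open() as f:
            return json.load(f)
    data = fetch(url)
    with path.open("w") as f:
        json.dump(data, f, indent=4)
    return data
```

Only the first call for a given URL uses an API query. Later runs read the file from disk.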

## Sections
Warning: the printed list of sections is long.

```python
def list_sections(json_data):
    # Map each top-level section to its item names (empty list for non-dict values)
    sections = {}

    for key, value in json_data.items():
        if isinstance(value, dict):
            sections[key] = list(value.keys())
        else:
            sections[key] = []

    return sections
```

```python
sections = list_sections(xbrl_json)
for section, sub_sections in sections.items():
    print(f"Section: {section} ({len(sub_sections)} items)")
```

## Extract particular section
Incomplete: still working out how to convert every section into a consistent tabular format.

```python
def extract_section(json_data, section_name):
    # Return the named section, or None if the filing does not contain it
    return json_data.get(section_name)
```

```python
b_s = extract_section(xbrl_json, "BalanceSheets")
b_s
```
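One possible way to put every section into a consistent format is a single converter that labels each fact by its period. A period is either a point in time (`instant`) or a duration (`startDate`/`endDate`). This is a sketch that assumes the fact layout used by the balance sheet and income statement functions in this notebook: `period`, `value`, and an optional `segment`. `period_label` and `section_to_dataframe` are hypothetical names.

```python
import pandas as pd


def period_label(period: dict) -> str:
    """Label a fact's period: the instant date, or 'start-end' for durations."""
    if "instant" in period:
        return period["instant"]
    return f"{period['startDate']}-{period['endDate']}"


def section_to_dataframe(xbrl_json: dict, section_name: str) -> pd.DataFrame:
    """Convert a statement-style section into items x periods, ignoring segmented facts."""
    store = {}
    for item, facts in xbrl_json.get(section_name, {}).items():
        # Statement items are expected to be lists of facts; skip anything else.
        if not isinstance(facts, list):
            continue
        series = {}
        for fact in facts:
            if not isinstance(fact, dict) or "segment" in fact or fact.get("value") is None:
                continue
            # Keep the first value per period, mirroring the existing functions.
            series.setdefault(period_label(fact["period"]), fact["value"])
        store[item] = pd.Series(series, dtype=object)
    return pd.DataFrame(store).T
```

For example, `section_to_dataframe(xbrl_json, "BalanceSheets")` should give a table shaped like the one built by `get_balance_sheet`.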

## Balance Sheet

```python
def get_balance_sheet(xbrl_json):
    balance_sheet_store = {}

    # Iterate over each US GAAP item in the balance sheet
    for usGaapItem in xbrl_json['BalanceSheets']:
        values = []
        indices = []

        for fact in xbrl_json['BalanceSheets'][usGaapItem]:
            # Only consider items without a segment (dimensional breakdowns are not needed here)
            if 'segment' not in fact:
                # Balance sheet facts are point-in-time, so the period is a single instant date
                index = fact['period']['instant']
                # Ensure no index duplicates are created
                if index not in indices:
                    value = fact.get('value')
                    if value is not None:
                        values.append(value)
                        indices.append(index)

        balance_sheet_store[usGaapItem] = pd.Series(values, index=indices)

    balance_sheet = pd.DataFrame(balance_sheet_store)
    # Transpose so that US GAAP items are rows and each column header is a reporting date
    return balance_sheet.T
```

```python
balance_sheet = get_balance_sheet(xbrl_json)
balance_sheet
```
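A first example of the automated validation rules described at the top of this README is to check the balance sheet against the accounting identity: total assets equal total liabilities and equity. The sketch makes two assumptions. First, the rows are named `Assets` and `LiabilitiesAndStockholdersEquity`; these are common US GAAP concept names but are not guaranteed for every filer. Second, the values may arrive as strings, so they are converted with `pd.to_numeric` first. `check_balance_identity` is a hypothetical helper.

```python
import pandas as pd


def check_balance_identity(balance_sheet: pd.DataFrame,
                           lhs: str = "Assets",
                           rhs: str = "LiabilitiesAndStockholdersEquity",
                           tolerance: float = 0.5) -> pd.DataFrame:
    """Compare two rows of a balance sheet DataFrame, date by date."""
    # Values from XBRL-JSON may be strings; coerce to numbers (non-numeric -> NaN).
    left = pd.to_numeric(balance_sheet.loc[lhs], errors="coerce")
    right = pd.to_numeric(balance_sheet.loc[rhs], errors="coerce")
    result = pd.DataFrame({lhs: left, rhs: right})
    result["difference"] = result[lhs] - result[rhs]
    # A date passes only if both values exist and agree within the tolerance.
    result["passes"] = result["difference"].abs() <= tolerance
    return result
```

For example, `check_balance_identity(balance_sheet)` returns one row per reporting date. Any `False` in `passes` flags a date to investigate.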

## Income Statement

```python
# Convert the XBRL-JSON income statement to a pandas DataFrame
def get_income_statement(xbrl_json):
    income_statement_store = {}

    # Iterate over each US GAAP item in the income statement
    for usGaapItem in xbrl_json['StatementsOfIncome']:
        values = []
        indices = []

        for fact in xbrl_json['StatementsOfIncome'][usGaapItem]:
            # Only consider items without a segment (dimensional breakdowns are not needed here)
            if 'segment' not in fact:
                # Income statement facts cover a duration, so label them 'startDate-endDate'
                index = fact['period']['startDate'] + '-' + fact['period']['endDate']
                # Ensure no index duplicates are created
                if index not in indices:
                    value = fact.get('value')
                    if value is not None:
                        values.append(value)
                        indices.append(index)

        income_statement_store[usGaapItem] = pd.Series(values, index=indices)

    income_statement = pd.DataFrame(income_statement_store)
    # Transpose so that US GAAP items are rows and each column header is a date range
    return income_statement.T
```

```python
income_statement = get_income_statement(xbrl_json)
income_statement
```
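The same idea extends to a data point that is reported in more than one place, which is the kind of cross-report check this README aims to automate. The sketch below compares every concept and period that two statement DataFrames have in common. One use is comparing `NetIncomeLoss` in the income statement with a cash flow statement built by a function analogous to `get_income_statement`. The section name for that statement (e.g. `StatementsOfCashFlows`) is an assumption not confirmed here. `reconcile` is a hypothetical helper.

```python
import math

import pandas as pd


def _to_float(value) -> float:
    """Convert a cell to float; missing or non-numeric cells become NaN."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return float("nan")


def reconcile(a: pd.DataFrame, b: pd.DataFrame, tolerance: float = 0.5) -> pd.DataFrame:
    """List every (concept, period) reported in both tables, with both values side by side."""
    rows = a.index.intersection(b.index)
    cols = a.columns.intersection(b.columns)
    records = []
    for concept in rows:
        for period in cols:
            x = _to_float(a.at[concept, period])
            y = _to_float(b.at[concept, period])
            if math.isnan(x) or math.isnan(y):
                continue  # only compare periods with a value in both tables
            records.append({"concept": concept, "period": period, "a": x, "b": y,
                            "matches": abs(x - y) <= tolerance})
    return pd.DataFrame(records, columns=["concept", "period", "a", "b", "matches"])
```

Because both statements label duration periods as `startDate-endDate`, their column headers line up without extra mapping.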