# Getting numeric financial information from XBRL data - Part II

This notebook contains example Python code to get numeric data available in XBRL financial reports using the [SEC's EDGAR XBRL API](https://www.sec.gov/edgar/sec-api-documentation).

**Made by:**  [Roman Chychyla](https://people.miami.edu/profile/rxc303@miami.edu)

## Example: Getting fundamentals XBRL data through EDGAR API using Python for multiple companies

Previously, we extracted XBRL financial data for a single company. In this example, we process all of the EDGAR API XBRL JSON files in a given folder.

In [None]:
# python library to work with files and folders paths (locations)
from pathlib import Path

# specify path to folder with JSON files
input_xbrl_data_folder =  Path('./companyfacts')
# specify the output file
output_file = Path('./xbrl_numeric_data.xlsx')

# list all the JSON files in the input folder
files = list(input_xbrl_data_folder.glob('*.json'))


We can "package" all the steps from Part I into a Python function that takes a filer's XBRL JSON file as input and outputs a dataframe with the filer's XBRL numeric data:


In [None]:
# python library to work with JSON data
import json
# library to work with tabular data
import pandas as pd

def get_xbrl_data(file, period_type = 'all', convert_to_wide = False):
    """This function takes an EDGAR API JSON file as input and outputs a Pandas dataframe with
    US GAAP XBRL numeric facts.
    Set period_type to 'annual' to output only annual data.
    Set convert_to_wide to True to convert the dataframe from the long format to the wide format."""

    # set the output of the function to None (in case the data extraction fails)
    output = None

    # load json file
    with open(file,'r') as f:
        json_data = json.load(f)

    # check if CIK is present in the json file; if not skip the file
    if 'cik' in json_data:
        # record CIK in a Python variable
        cik = json_data['cik']
        # create an empty list; it will be used to store dataframes for all XBRL tags present in the JSON data
        tag_dfs = []
        # check if there are any US GAAP XBRL facts (also handles files with no 'facts' entry)
        if 'us-gaap' in json_data.get('facts', {}):
            # loop over all XBRL tags, and process them one by one
            for tag,details in json_data['facts']['us-gaap'].items():
                # consider only monetary XBRL tags measured in U.S. dollars
                if 'units' in details and 'USD' in details['units']:
                    # get all facts for the given tag
                    tag_facts = details['units']['USD']
                    # create dataframe
                    tag_df = pd.DataFrame(tag_facts)
                    # remove duplicates: for each fiscal year and period, keep the fact with the latest period end date
                    tag_df = tag_df.sort_values('end', ascending=False).drop_duplicates(['fy','fp'])
                    # add CIK information
                    tag_df['cik'] = cik
                    # add tag name information
                    tag_df['tag'] = tag
                    # add the table to the list of tables
                    tag_dfs.append(tag_df)

            # check if the dataframe list is not empty
            if len(tag_dfs) > 0:
                # if it is not empty, create one big table with all the data
                cik_df = pd.concat(tag_dfs)
                # keep only annual data, if requested
                if period_type == 'annual':
                    cik_df = cik_df[cik_df['fp'] == 'FY']
                # convert to wide, if requested
                if convert_to_wide:
                    cik_df = pd.pivot(cik_df, index = ['cik','fy','fp'], columns = 'tag', values = 'val').reset_index()
                # set output to the resulting dataframe
                output = cik_df

    return output
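
The deduplication step inside the function can be hard to picture, so here is a minimal sketch of it on made-up data. The three records below are invented for illustration; they only mimic the fields (`end`, `val`, `fy`, `fp`) that the function relies on. The sketch assumes that a filing can report a prior-period comparative value under the same `fy` and `fp` as the current-period value, which is why sorting by `end` before dropping duplicates matters.

```python
import pandas as pd

# hypothetical facts for a single tag; all values are invented for illustration
toy_facts = [
    # assumed prior-year comparative, reported under the same fy/fp
    {'end': '2019-12-31', 'val': 100, 'fy': 2020, 'fp': 'FY'},
    # current-year value
    {'end': '2020-12-31', 'val': 120, 'fy': 2020, 'fp': 'FY'},
    # a quarterly value
    {'end': '2020-03-31', 'val': 30, 'fy': 2020, 'fp': 'Q1'},
]
toy_df = pd.DataFrame(toy_facts)

# sort so the latest period end comes first, then keep the first row per (fy, fp)
toy_dedup = toy_df.sort_values('end', ascending=False).drop_duplicates(['fy', 'fp'])
print(toy_dedup)
```

In this toy case the row ending on 2019-12-31 is dropped, and one row remains for each of the `FY` and `Q1` periods.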

In [None]:
# Example: get all data in the long format

# get the first file; in Python, indexing starts at 0
file = files[0]

get_xbrl_data(file)

In [None]:
# Example: get annual-only data in the wide format
get_xbrl_data(file, period_type = 'annual', convert_to_wide=True)
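
To illustrate the difference between the long and the wide format, here is a small sketch that applies the same `pd.pivot` call to a made-up table. The CIK, tag names, and values are invented for illustration and do not come from any real filing.

```python
import pandas as pd

# hypothetical long-format table: one row per (filer, period, tag)
toy_long = pd.DataFrame({
    'cik': [1, 1, 1, 1],
    'fy': [2020, 2020, 2021, 2021],
    'fp': ['FY', 'FY', 'FY', 'FY'],
    'tag': ['Assets', 'Liabilities', 'Assets', 'Liabilities'],
    'val': [500, 300, 550, 320],
})

# wide format: one row per (filer, period), one column per tag
toy_wide = pd.pivot(toy_long, index=['cik', 'fy', 'fp'], columns='tag', values='val').reset_index()
print(toy_wide)
```

The four long-format rows become two wide-format rows, with `Assets` and `Liabilities` as separate columns.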

Now we can apply the above function sequentially to the JSON files in the input folder, and merge all the resulting tables into one big table.

In [None]:
# create a list to store filer-level tables
cik_tables = []
# loop over all the files
for f in files:
    # create a table with XBRL data for the given filer
    cik_xbrl_table = get_xbrl_data(f, period_type='annual', convert_to_wide=True)
    # if the table is not empty, add it to the list of XBRL tables
    if cik_xbrl_table is not None:
        cik_tables.append(cik_xbrl_table)

# merge all the filer tables into one big table
final_df = pd.concat(cik_tables).reset_index(drop=True)
# output the result
final_df.head(10)
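
Filers do not all report the same tags, so their wide tables can have different columns. The sketch below, on two made-up single-row tables, shows how `pd.concat` handles this: it keeps the union of the columns and fills the missing values with `NaN`. The CIKs, tags, and values are invented for illustration.

```python
import pandas as pd

# two hypothetical filer-level wide tables with different tag columns
toy_a = pd.DataFrame({'cik': [1], 'fy': [2020], 'fp': ['FY'], 'Assets': [500]})
toy_b = pd.DataFrame({'cik': [2], 'fy': [2020], 'fp': ['FY'], 'Revenues': [90]})

# stack the tables; tags missing for a filer become NaN
toy_all = pd.concat([toy_a, toy_b]).reset_index(drop=True)
print(toy_all)
```

A `NaN` in the merged table therefore means that no value was found for that tag and period in the filer's data, not that the value is zero.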

In [None]:
# save the final table to an Excel file
final_df.to_excel(output_file, index = False)