# XBRL US API - Working with Dimensions and Extension - Python Example  
This notebook contains example Python code to use the XBRL US Application Programming Interface (API) (https://xbrl.us/home/use/xbrl-api/)    
  
**Made by:** [Ties de Kok](https://www.tiesdekok.com), [Beth Blankespoor](https://foster.uw.edu/faculty-research/directory/elizabeth-blankespoor/), and [Roman Chychyla](https://people.miami.edu/profile/rxc303@miami.edu)

## API documentation

The documentation on XBRL US API is available here:

https://xbrlus.github.io/xbrl-api/#/Facts/getFactDetails

## Imports

First, import (load) the supporting Python libraries.

In [None]:
import os, re, sys # to work with operating system and text 
import json # to read a popular data representation format, JSON
import requests # to handle HTTP (web) requests
import pandas as pd # for tabular manipulation and computation
import numpy as np # for numerical computations
import getpass # to (interactively) request password input for a user 

## Generating an access token

As in the first example, accessing the XBRL US API requires an access token (for user authentication). Access tokens can be requested on the XBRL US website (after registration).

For this session we will create a temporary access token (for demo purposes only). Input your email address when asked. This access token will expire after 60 minutes.

In [None]:
CREDENTIAL_TYPE = 'TEMP'

Alternatively, you can obtain your own credentials here: https://xbrl.us/home/use/xbrl-api-community/#provisioning

After that, if you use this script on your own computer, we recommend using the JSON file as described in option (a) below. If you are using Binder, we recommend using option (b).

>**Option a:**
>1. Update 'login_cred.json' with your `client_id`, `client_secret`, `username`, and `password`;
>2. Set `CREDENTIAL_TYPE` to `LOCAL`.

>**Option b:**
>1. Set `CREDENTIAL_TYPE` to `CLOUD`;
>2. Input your details when asked.
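
For option (a), `login_cred.json` is assumed to be a flat JSON object containing the four keys that the token cell reads. The sketch below writes a placeholder file to a temporary directory and checks that every key is present; `load_credentials` is a hypothetical helper and all values are placeholders.

```python
import json
import os
import tempfile

# Keys that the token-generation cell reads from the credentials file.
REQUIRED_KEYS = ('client_id', 'client_secret', 'username', 'password')

def load_credentials(path):
    """Read a credentials file and verify that every expected key is present (hypothetical helper)."""
    with open(path, 'r') as f:
        cred = json.load(f)
    missing = [key for key in REQUIRED_KEYS if key not in cred]
    if missing:
        raise KeyError('Missing keys in {}: {}'.format(path, ', '.join(missing)))
    return cred

# Placeholder values; a real login_cred.json would hold your own details.
example = {'client_id': 'YOUR_CLIENT_ID',
           'client_secret': 'YOUR_CLIENT_SECRET',
           'username': 'you@example.com',
           'password': 'YOUR_PASSWORD'}

with tempfile.TemporaryDirectory() as tmp_dir:
    example_path = os.path.join(tmp_dir, 'login_cred.json')
    with open(example_path, 'w') as f:
        json.dump(example, f, indent=2)
    loaded = load_credentials(example_path)

print(sorted(loaded))
```

Because the file holds a password in plain text, it is probably best kept out of version control.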



In [None]:
# the following code generates an access token

if CREDENTIAL_TYPE == 'TEMP':
    user_email = input("Please type your email address here: ")
    access_token = requests.get('https://tdekok-xbrlapi.builtwithdark.com/gettoken?platform=aaa-{}'.format(user_email)).text.replace('"', "")
elif CREDENTIAL_TYPE in ['LOCAL', 'CLOUD']:
    endpoint = 'https://api.xbrl.us'
    endpoint_auth = endpoint + '/oauth2/token'
    
    if CREDENTIAL_TYPE == 'LOCAL':
        with open('login_cred.json', 'r') as f:
            login_cred = json.loads(f.read())
            client_id = login_cred['client_id']
            client_secret = login_cred['client_secret']
            username = login_cred['username']
            password = login_cred['password']
            
    else:
        client_id = input('Please input your client id here:')
        client_secret = getpass.getpass(prompt='Please input your client secret here:')
        username = input('Please input your username here:')
        # no JSON file is read in this branch, so request the password interactively
        password = getpass.getpass(prompt='Please input your password here:')
    
    body_auth = {'grant_type' : 'password', 
                'client_id' : client_id, 
                'client_secret' : client_secret, 
                'username' : username,
                'password' : password,
                'platform' : 'uw-ipynb'}
    res = requests.post(endpoint_auth, data=body_auth)
    auth_json = res.json()
    access_token = auth_json['access_token']
else:
    print('Invalid credential type! Use TEMP, LOCAL, or CLOUD. See the instructions above.')
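
The cell above assumes that the token request succeeds. A minimal sketch of a more defensive pattern follows; `extract_access_token` is a hypothetical helper, not part of the XBRL US API, and the sample dictionaries merely stand in for `res.json()` without reproducing the API's actual error format.

```python
def extract_access_token(auth_json):
    """Return the access token from a parsed token response, or raise a clear error."""
    token = auth_json.get('access_token')
    if not token:
        raise ValueError('No access token in the response: {}'.format(auth_json))
    return token

# Stand-in payloads (illustrative only, not real API responses).
ok_response = {'access_token': 'abc123'}
bad_response = {'error': 'placeholder error message'}

print(extract_access_token(ok_response))

try:
    extract_access_token(bad_response)
except ValueError as err:
    print('Authentication failed:', err)
```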

## Making an API Query

We will use the function defined in the previous example to generate an API query to request XBRL data from XBRL US.

The Python function below generates a query to XBRL US and returns results in a tabular format. The function has the following input arguments:

- access_token - the token we generated in the previous step;
- firm_ciks - a list of company central index keys (CIKs);
- years - a list of data years (time period);
- report_types - a list of report types to consider (e.g., 10-K or 10-Q);
- get_extensions - has to be set to either:
    - 'FALSE' - output XBRL elements that are not extensions;
    - 'TRUE' - output only extension XBRL elements; the xbrl_elements list is ignored;
- xbrl_elements - a list of XBRL elements (e.g., NetIncomeLoss for net income). This list will be ignored if get_extensions is set to 'TRUE';
- with_dimensions - has to be set to either:
    - 'FALSE' - output XBRL elements *without* dimensions only;
    - 'TRUE' - output XBRL elements *with* dimensions only;
    - 'ALL' - output XBRL elements *with* and *without* dimensions;
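
How these arguments translate into query parameters can be sketched without contacting the API. `build_params` is a hypothetical helper that mirrors the parameter logic inside `execute_query`; it only illustrates how list arguments are joined with commas and how `get_extensions` switches between the two filters.

```python
def build_params(firm_ciks, years, report_types, get_extensions, xbrl_elements):
    """Assemble query parameters the same way execute_query does (illustrative helper)."""
    params = {
        'period.fiscal-period': 'Y',
        'period.fiscal-year': ','.join(years),
        'unit': 'USD',
        'entity.cik': ','.join(firm_ciks),
        'report.document-type': ','.join(report_types),
    }
    if get_extensions == 'TRUE':
        # extensions only: the element list is ignored
        params['concept.is-base'] = 'FALSE'
    else:
        params['concept.local-name'] = ','.join(xbrl_elements)
    return params

example_params = build_params(['0000320193'], ['2019', '2020'],
                              ['10-K', '10-K/A'], 'FALSE', ['NetIncomeLoss'])
for key, value in example_params.items():
    print('{} = {}'.format(key, value))
```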

In [None]:
def execute_query(access_token, firm_ciks, years, report_types, get_extensions, xbrl_elements, with_dimensions):
    
    # below are the fields (variables) to be output by the XBRL US API.
    # this list can be modified if other/additional information is needed
    # see XBRL US API documentation for a list of all possible fields
    fields = ['entity.cik',
         'entity.name.sort(ASC)',
         'period.fiscal-year',
         'report.document-type',
         'report.filing-date',
         'concept.local-name',
         'fact.value',
         'dimensions.count', 
         'dimension.local-name.sort(ASC)',
         'member.local-name',
         'dts.id',
         'fact.id',
         'fact.has-dimensions',
         'period.instant',
         'concept.id',     
         'unit',
         'fact.decimals',
         'dimension.namespace',
         'member.namespace',
         ]
    
    search_endpoint = 'https://api.xbrl.us/api/v1/fact/search'


    params = {  
         'period.fiscal-period': 'Y',
         'period.fiscal-year': ','.join(years),
         'unit': 'USD',
         'entity.cik': ','.join(firm_ciks),
         'report.document-type': ','.join(report_types)
         }  
    
    if get_extensions == 'TRUE':
        params['concept.is-base'] = 'FALSE'
    else:
        params['concept.local-name'] =  ','.join(xbrl_elements)
    
    if with_dimensions == 'ALL':
        dimension_options = ['TRUE', 'FALSE']
    else:
        dimension_options = [with_dimensions]

    all_res_list = []
    for dimensions_param in dimension_options:
        print('Getting the data for: "fact.has-dimensions" = {}'.format(dimensions_param))
        ### Every request will return a max of 2000 results. So we loop until all results are retrieved. 
        done_retrieving_all_results = False
        offset = 0
        while not done_retrieving_all_results:
            params['fact.has-dimensions'] = dimensions_param
            params['fields'] = ','.join(fields) + ',fact.offset({})'.format(offset) 
            res = requests.get(search_endpoint, params=params, headers={'Authorization' : 'Bearer {}'.format(access_token)})

            ## Interpret as JSON
            res_json = res.json()

            ## Get the results
            ### Retrieve the data list
            res_list = res_json['data']

            ### Add to the results
            all_res_list += res_list

            ## Pagination check
            paging_dict = res_json['paging']
            if paging_dict['count'] >= 2000:
                offset += paging_dict['count']
            else:
                done_retrieving_all_results = True

    ## convert to a DataFrame
    res_df = pd.DataFrame(all_res_list)
    ## remove duplicates; sometimes the same item is reported multiple times throughout the document
    res_df.drop_duplicates(subset = ['entity.name','period.fiscal-year', 'report.filing-date', 'concept.local-name', 'dimension.local-name', 'member.local-name', 'fact.value'], inplace = True)
    ## sort data
    res_df = res_df.sort_values(by=['entity.name','period.fiscal-year','report.filing-date','concept.local-name','dimension.local-name']).reset_index(drop = True)
    ## reorder table columns
    first_columns = ['entity.name', 'period.fiscal-year', 'report.filing-date', 'concept.local-name', 'fact.value', 'unit','dimension.local-name', 'member.local-name','dimensions.count']
    columns = first_columns + [c for c in res_df.columns if c not in first_columns]
    res_df = res_df[columns]
    print('\nNumber of results that meet the criteria: {}'.format(len(res_df)))

    return res_df
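
The pagination loop inside `execute_query` can be tried offline. The sketch below replaces the HTTP request with a stand-in function that serves made-up records in pages of at most 2,000, which is the page size stated in the function's comments; `fetch_all` and `fake_fetch_page` are hypothetical names.

```python
def fetch_all(fetch_page, page_size=2000):
    """Collect records from a paged source, advancing the offset until a short page arrives."""
    records = []
    offset = 0
    while True:
        page = fetch_page(offset)
        records.extend(page)
        if len(page) < page_size:
            break  # a page shorter than the maximum means nothing is left
        offset += len(page)
    return records

# Stand-in for the API: 4,500 fake facts served in pages of at most 2,000.
fake_facts = [{'fact.id': i} for i in range(4500)]
requested_offsets = []

def fake_fetch_page(offset):
    requested_offsets.append(offset)
    return fake_facts[offset:offset + 2000]

all_facts = fetch_all(fake_fetch_page)
print(len(all_facts), requested_offsets)
```

When the total is an exact multiple of the page size, this pattern issues one extra request that returns an empty page before stopping.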

## Example: Getting Revenue related to Services from XBRL 10-K filings

The XBRL reporting format allows extraction of granular data from firms' financial statements. Below, we demonstrate how to extract data on Revenues that pertain to services (i.e., not to products, leases, etc.).

### Define the companies you'd like

We will consider three firms for this example.

In [None]:
firm_ciks =     [
                 '0000320193', ## Apple (AAPL)  
                 '0001067983', ## Berkshire Hathaway(BRK)
                 '0001318605', ## Tesla (TSLA)
                ]
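
The CIKs above are written as 10-character strings with leading zeros. If your identifiers come from a source that stores them as integers or unpadded strings, a small helper can restore that form. `pad_cik` is a hypothetical name, and whether the API also accepts unpadded values is not verified here.

```python
def pad_cik(cik):
    """Convert a CIK given as an int or string to a 10-character zero-padded string."""
    return str(cik).strip().zfill(10)

raw_ciks = [320193, '1067983', ' 1318605 ']
firm_ciks_example = [pad_cik(c) for c in raw_ciks]
print(firm_ciks_example)
```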

### Define the years you'd like

In [None]:
years = ['2020'] ## Use commas between for multiple years, e.g., '2018','2019','2020'
#years = [str(2013 + i) for i in range(8)] ## Years 2013 to 2020

### Specify whether to output extension XBRL elements

In [None]:
get_extensions = 'FALSE'

### Specify the report types that you want

In [None]:
report_types = ['10-K', '10-K/A']

### Define the XBRL elements (tags) you'd like 

Revenue from services is typically reported using the new standard (Accounting Standards Update No. 2014-09, *Revenue from Contracts with Customers* (Topic 606)). As before, we will use the *RevenueFromContractWithCustomerExcludingAssessedTax* and *RevenueFromContractWithCustomerIncludingAssessedTax* concepts from FASB's XBRL Financial Reporting Taxonomy.

In [None]:
xbrl_elements = [
     'RevenueFromContractWithCustomerExcludingAssessedTax',
     'RevenueFromContractWithCustomerIncludingAssessedTax'
                ]

### Specify if you want dimensions, no dimension, or all values

Unlike the previous example where the objective was to extract Total Revenue data, this time we want to focus on a component of Total Revenue: Revenue from Services. Revenue from Services (and other Revenue items) are typically reported using XBRL dimensions that specify that the scope of the revenue-related concept is services only. Therefore, this time we will request XBRL data with dimensions.

In [None]:
with_dimensions = 'TRUE'  ## TRUE to require dimensions, FALSE for no dimensions, ALL for all values

### Execute query

Let us execute an XBRL US API query to retrieve the Revenue From Contract With Customer concepts with dimensions. The result will be saved to the *res_df* pandas DataFrame.

In [None]:
res_df = execute_query(access_token, firm_ciks, years, report_types, get_extensions, xbrl_elements, with_dimensions)

### Display results

Next, we will output the results. To facilitate table display, we will "hide" some columns when displaying the results and limit the output to 40 records. Note that this time we will have more records in the output of the query because we requested concepts with dimensions.

In [None]:
#choose which columns to hide
columns_to_hide = ['entity.cik', 'fact.decimals', 'dimension.namespace', 'member.namespace']
columns_to_show = [column for column in res_df.columns if column not in columns_to_hide]

#display the first 40 results
res_df[columns_to_show].head(40)

### Filtering results on the values of dimensions

In the table above, Revenues from Services are reported using the Revenue From Contract With Customer concepts with the following dimension: the *ProductOrServiceAxis* axis with the *ServiceMember* member. Therefore, we can simply filter the results on that specific dimension:



In [None]:
# filter results
res_df_filtered = res_df[(res_df['dimension.local-name'] == 'ProductOrServiceAxis') 
                         & (res_df['member.local-name'] == 'ServiceMember')]

# display the first 40 results
res_df_filtered[columns_to_show].head(40)

For the three firms in our example, Apple, Berkshire Hathaway, and Tesla, we were able to extract one Revenue from Services item for Apple, multiple items for Berkshire Hathaway, and no items for Tesla. 

First, let us consider the case of Berkshire Hathaway. Variable *fact.id* in the above table is a unique identifier of a reported fact in an XBRL document. To see the details about the fact in the second record from the table above, we can simply filter the main results table, res_df, to only include records with the *fact.id* of 256371762.

In [None]:
res_df[res_df['fact.id'] == 256371762]

It appears that the Revenue from Services concept with the *fact.id* of 256371762 actually has two dimensions: one to indicate that this is revenue from services, and the other one to indicate that it is related to the energy business segment (only). The variable *dimensions.count* reports the number of dimensions a given fact has, which is equal to two in this case. Therefore, to extract Revenue from Services across all business segments, we simply need to identify the Revenue from Services that has only one dimension (i.e., the dimension to specify that the given revenue item is related to services).

In [None]:
# filter on dimension value and dimension count
res_df_filtered = res_df[(res_df['dimension.local-name'] == 'ProductOrServiceAxis') & (res_df['member.local-name'] == 'ServiceMember') & (res_df['dimensions.count'] == 1)]

# display the first 40 results
res_df_filtered[columns_to_show].head(40)
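
The role of *dimensions.count* can be illustrated on a small hand-made table. The values are invented and not taken from any filing, and `SegmentAxisExample` and `EnergySegmentExampleMember` are made-up names. The sketch assumes, as the query output suggests, that a fact with two dimensions appears as one row per dimension, so filtering on the service member alone would also keep the segment-level fact.

```python
import pandas as pd

# Toy data: fact 1 has one dimension, fact 2 has two (service + segment).
toy = pd.DataFrame([
    {'fact.id': 1, 'dimension.local-name': 'ProductOrServiceAxis',
     'member.local-name': 'ServiceMember', 'dimensions.count': 1, 'fact.value': 100},
    {'fact.id': 2, 'dimension.local-name': 'ProductOrServiceAxis',
     'member.local-name': 'ServiceMember', 'dimensions.count': 2, 'fact.value': 40},
    {'fact.id': 2, 'dimension.local-name': 'SegmentAxisExample',
     'member.local-name': 'EnergySegmentExampleMember', 'dimensions.count': 2, 'fact.value': 40},
])

is_service = ((toy['dimension.local-name'] == 'ProductOrServiceAxis')
              & (toy['member.local-name'] == 'ServiceMember'))

service_any = toy[is_service]                                     # keeps facts 1 and 2
service_total = toy[is_service & (toy['dimensions.count'] == 1)]  # keeps fact 1 only

print(len(service_any), len(service_total))
```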

Now, let us consider the Income Statement from Tesla's fiscal year 2020 10-K filing:

https://www.sec.gov/ix?doc=/Archives/edgar/data/1318605/000156459021004599/tsla-10k_20201231.htm

Tesla's Total Revenues of \\$31,536m consist of four items represented by the following XBRL facts:

- Automotive sales, \\$26,184m (*RevenueFromContractWithCustomerExcludingAssessedTax* with *tsla:AutomotiveSegmentMember* extension member in product/service axis (dimension));
- Automotive leasing, \\$1,052m (*OperatingLeasesIncomeStatementLeaseRevenue*);
- Energy generation and storage, \\$1,994m (*RevenueFromContractWithCustomerExcludingAssessedTax* with *tsla:EnergyGenerationAndStorageSegmentMember* extension member in product/service axis (dimension));
- Services and other, \\$2,306m (*tsla:SalesRevenueServicesAndOtherNet* (extension XBRL element));
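
As a quick arithmetic check, the four components listed above add up to the reported total (amounts in millions of US dollars, as quoted in the text):

```python
# Revenue components from the list above, in millions of US dollars.
tesla_revenue_components = {
    'Automotive sales': 26184,
    'Automotive leasing': 1052,
    'Energy generation and storage': 1994,
    'Services and other': 2306,
}

total_revenue = sum(tesla_revenue_components.values())
services_share = 100 * tesla_revenue_components['Services and other'] / total_revenue

print(total_revenue)             # 31536
print(round(services_share, 1))  # 7.3 (percent of total revenue)
```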

The Revenue from Services is bundled with Other Revenue and is reported using a (custom) **XBRL extension concept**, *tsla:SalesRevenueServicesAndOtherNet*. 

Therefore, to extract this revenue item, we have to search for an extension XBRL concept whose element name contains keywords such as "Revenue" and "Service". First, we will extract all extension concepts from the filings.

In [None]:
# change get_extensions to 'TRUE'
get_extensions = 'TRUE'

# in most cases for Revenue from Services, we don't need to consider dimensions if extensions are used instead; however, this will largely depend on the concept of interest and individual firm's reporting practices
with_dimensions = 'FALSE'

res_df_extensions = execute_query(access_token, firm_ciks, years, report_types, get_extensions, xbrl_elements, with_dimensions)

#display the first 40 results
res_df_extensions[columns_to_show].head(40)

Now we can filter the extension elements by searching for keywords in their element names.

In [None]:
# keep only extensions that include words 'Service' and 'Revenue' in their element names
res_df_keywords = res_df_extensions[res_df_extensions['concept.local-name'].str.contains('Service') \
                  & res_df_extensions['concept.local-name'].str.contains('Revenue')]

res_df_keywords[columns_to_show].head(40)
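
The filter above is case-sensitive and matches the exact substrings 'Service' and 'Revenue'. One possible variation uses the `case` argument of pandas `Series.str.contains` to ignore capitalization. In the sketch below, only the first element name comes from the text; the names prefixed with `Example` are made up for illustration.

```python
import pandas as pd

names = pd.Series([
    'SalesRevenueServicesAndOtherNet',  # from the Tesla example
    'ExampleServicerevenueNet',         # made-up: lowercase 'revenue'
    'ExampleCostOfServices',            # made-up: no 'Revenue'
    'ExampleLeasingRevenue',            # made-up: no 'Service'
])

# Case-sensitive match, as in the cell above.
case_sensitive = names.str.contains('Service') & names.str.contains('Revenue')

# Case-insensitive match on the same keywords.
case_insensitive = (names.str.contains('service', case=False)
                    & names.str.contains('revenue', case=False))

print(names[case_sensitive].tolist())
print(names[case_insensitive].tolist())
```

A looser match returns more candidates, so the results still need a manual review before they are treated as Revenue from Services.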

### Merge results and output them to csv

Finally, we can merge the results of the queries for standard and extension XBRL concepts into one table and save it to a CSV file.

In [None]:
# merge results
output_df = pd.concat([res_df_filtered, res_df_keywords])

# save to the CSV format
output_df.to_csv('XBRL_Revenues_from_Services.csv')

In [None]:
# display the merged table
output_df

## If running in Binder, click on the Jupyter icon/name in the upper left corner to see your files, 
## select the file you want, and click Download.
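
When combining the two result sets, it may help to record where each row came from. The sketch below uses two tiny made-up tables; the `source` column name is an assumption, and `index=False` simply keeps the DataFrame index out of the CSV output.

```python
import io
import pandas as pd

# Made-up stand-ins for res_df_filtered and res_df_keywords.
standard = pd.DataFrame({
    'concept.local-name': ['RevenueFromContractWithCustomerExcludingAssessedTax'],
    'fact.value': [100],
})
extension = pd.DataFrame({
    'concept.local-name': ['SalesRevenueServicesAndOtherNet'],
    'fact.value': [25],
})

# Label each table before concatenating, and renumber the rows.
combined = pd.concat(
    [standard.assign(source='standard'), extension.assign(source='extension')],
    ignore_index=True,
)

# Write to an in-memory buffer instead of a file for this demonstration.
buffer = io.StringIO()
combined.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```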