# XBRL US API - Exercise - Revenues related to Products
This notebook contains example Python code for using the XBRL US Application Programming Interface (API) (https://xbrl.us/home/use/xbrl-api/).
  
**Made by:** [Ties de Kok](https://www.tiesdekok.com), [Beth Blankespoor](https://foster.uw.edu/faculty-research/directory/elizabeth-blankespoor/), and [Roman Chychyla](https://people.miami.edu/profile/rxc303@miami.edu)

## API documentation

The XBRL US API documentation is available here:

https://xbrlus.github.io/xbrl-api/#/Facts/getFactDetails

## Imports

First, import (load) the supporting Python libraries.

In [1]:
import os, re, sys # to work with operating system and text 
import json # to read a popular data representation format, JSON
import requests # to handle HTTP (web) requests
import pandas as pd # for tabular manipulation and computation
import numpy as np # for numerical computations
import getpass # to (interactively) request password input for a user 

## Generating an access token

As in the first example, accessing the XBRL US API requires an access token, which authenticates the user. Access tokens can be requested on the XBRL US website (after registration).

To create a temporary access token (for demo purposes only), set `CREDENTIAL_TYPE` to `TEMP` and input your email address when asked. This access token will expire after 60 minutes.

In [2]:
CREDENTIAL_TYPE = 'LOCAL'  # one of 'TEMP', 'LOCAL', or 'CLOUD'

Alternatively, you can obtain your own credentials here: https://xbrl.us/home/use/xbrl-api-community/#provisioning

After that, if you use this script on your own computer, we recommend using the JSON file as described in option (a) below. If you are using Binder, we recommend using option (b).

>**Option a:**
>1. Update 'login_cred.json' with your `client_id`, `client_secret`, `username`, and `password`;
>2. Set `CREDENTIAL_TYPE` to `LOCAL`;
>3. Run the cell below. The password is read from the JSON file, so no prompt appears.

>**Option b:**
>1. Set `CREDENTIAL_TYPE` to `CLOUD`;
>2. Input your details when asked.
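
For option (a), the token cell reads four keys from `login_cred.json`: `client_id`, `client_secret`, `username`, and `password`. The sketch below shows one way to check the file before requesting a token. The helper name `load_credentials` and the placeholder values are illustrative assumptions, not part of the XBRL US tooling.

```python
import json
import os
import tempfile

# Keys that the token request in this notebook reads from the file.
REQUIRED_KEYS = ('client_id', 'client_secret', 'username', 'password')

def load_credentials(path):
    """Read the credential file and fail early if a key is missing or blank.

    Hypothetical helper: the name and the validation rule are assumptions.
    """
    with open(path, 'r') as f:
        cred = json.load(f)
    missing = [k for k in REQUIRED_KEYS if not cred.get(k)]
    if missing:
        raise ValueError('Missing entries in {}: {}'.format(path, ', '.join(missing)))
    return cred

# Demonstration with placeholder values written to a temporary file,
# so that a real login_cred.json is never touched.
placeholder = {k: 'your-{}-here'.format(k) for k in REQUIRED_KEYS}
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, 'login_cred.json')
    with open(demo_path, 'w') as f:
        json.dump(placeholder, f, indent=2)
    demo_cred = load_credentials(demo_path)

print(sorted(demo_cred))
```

Failing early on a missing key gives a clearer message than the `KeyError` that the token cell would otherwise raise.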



In [3]:
# the following code generates an access token

if CREDENTIAL_TYPE == 'TEMP':
    # pass the prompt positionally: the built-in input() outside Jupyter rejects keyword arguments
    user_email = input("Please type your email address here: ")
    access_token = requests.get('https://tdekok-xbrlapi.builtwithdark.com/gettoken?platform=aaa-{}'.format(user_email)).text.replace('"', "")
elif CREDENTIAL_TYPE in ['LOCAL', 'CLOUD']:
    endpoint = 'https://api.xbrl.us'
    endpoint_auth = endpoint + '/oauth2/token'
    
    if CREDENTIAL_TYPE == 'LOCAL':
        with open('login_cred.json', 'r') as f:
            login_cred = json.loads(f.read())
            client_id = login_cred['client_id']
            client_secret = login_cred['client_secret']
            username = login_cred['username']
            password = login_cred['password']
            
    else:
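        # pass prompts to input() positionally so the code also runs outside Jupyter
        client_id = input('Please input your client id here: ')
        client_secret = getpass.getpass(prompt='Please input your client secret here: ')
        username = input('Please input your username here: ')
        # in CLOUD mode nothing is read from a file, so the password must be requested here;
        # without this line, `password` is undefined when the request body is built
        password = getpass.getpass(prompt='Password: ')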
    
    body_auth = {'grant_type' : 'password', 
                'client_id' : client_id, 
                'client_secret' : client_secret, 
                'username' : username,
                'password' : password,
                'platform' : 'uw-ipynb'}
    res = requests.post(endpoint_auth, data=body_auth)
    auth_json = res.json()
    access_token = auth_json['access_token']
else:
    print('Invalid credential type! Use TEMP, LOCAL, or CLOUD. See the instructions above.')
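
If the credentials are rejected, the response will not contain an `access_token` entry and the cell above stops with a `KeyError`. A minimal sketch of a more explicit check follows. The helper name `extract_access_token` and both example payloads are made up for illustration; the real error format returned by the server may differ.

```python
def extract_access_token(auth_json):
    """Return the token, or raise an error that shows what the server sent back.

    Hypothetical helper; only the 'access_token' key comes from the notebook.
    """
    token = auth_json.get('access_token')
    if not token:
        raise RuntimeError('No access token in response: {}'.format(auth_json))
    return token

# Made-up payloads, used only to exercise both code paths.
good_response = {'access_token': 'abc123'}
bad_response = {'error': 'invalid credentials'}

demo_token = extract_access_token(good_response)

try:
    extract_access_token(bad_response)
    failure_message = None
except RuntimeError as exc:
    failure_message = str(exc)

print(demo_token)
print(failure_message)
```

In the notebook, this would replace the direct lookup `auth_json['access_token']`.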

## Making an API Query

We will use the function defined in the previous example to generate an API query to request XBRL data from XBRL US.

The Python function below generates a query to XBRL US and returns results in a tabular format. The function has the following input arguments:

- access_token - the token we generated in the previous step;
- firm_ciks - a list of company central index keys (CIKs);
- years - a list of data years (time period);
- report_types - a list of report types to consider (e.g., 10-K or 10-Q);
- get_extensions - has to be set to either:
    - 'FALSE' - output the XBRL elements listed in xbrl_elements (no extension filter is applied);
    - 'TRUE' - output only extension XBRL elements. The xbrl_elements list will be ignored;
- xbrl_elements - a list of XBRL elements (e.g., NetIncomeLoss for net income). This list will be ignored if get_extensions is set to 'TRUE';
- with_dimensions - has to be set to either:
    - 'FALSE' - output XBRL elements *without* dimensions only;
    - 'TRUE' - output XBRL elements *with* dimensions only;
    - 'ALL' - output XBRL elements both *with* and *without* dimensions.
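
As a rough preview of how these arguments translate into query parameters, the sketch below mirrors the parameter-building step of the function defined in the next cell. The standalone helper `build_query_params` is a hypothetical name introduced here so the mapping can be inspected without calling the API.

```python
def build_query_params(firm_ciks, years, report_types, get_extensions, xbrl_elements):
    """Build the search parameters the same way execute_query does (illustrative copy)."""
    params = {
        'period.fiscal-period': 'Y',                      # annual figures only
        'period.fiscal-year': ','.join(years),            # lists become comma-separated strings
        'unit': 'USD',
        'entity.cik': ','.join(firm_ciks),
        'report.document-type': ','.join(report_types),
    }
    if get_extensions == 'TRUE':
        # ask for non-base (extension) concepts; xbrl_elements is ignored
        params['concept.is-base'] = 'FALSE'
    else:
        params['concept.local-name'] = ','.join(xbrl_elements)
    return params

standard_params = build_query_params(
    ['0001018724', '0000354950'], ['2020'], ['10-K', '10-K/A'],
    'FALSE', ['RevenueFromContractWithCustomerExcludingAssessedTax'])
extension_params = build_query_params(
    ['0000731766'], ['2020'], ['10-K'], 'TRUE', [])

print(standard_params['entity.cik'])
print('concept.local-name' in extension_params)
```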

In [4]:
def execute_query(access_token, firm_ciks, years, report_types, get_extensions, xbrl_elements, with_dimensions):
    
    # below are the fields (variables) to be output by the XBRL US API.
    # this list can be modified if other/additional information is needed
    # see the XBRL US API documentation for a list of all possible fields
    fields = ['entity.cik',
         'entity.name.sort(ASC)',
         'dts.id',
         'fact.id',
         'report.filing-date',
         'period.fiscal-year',
         'period.instant',
         'report.document-type',
         'concept.id',
         'concept.local-name',
         'dimensions.count',
         'dimension.local-name.sort(ASC)',
         'member.local-name',
         'fact.value',
         'unit',
         'fact.decimals',
         'dimension.namespace',
         'member.namespace',
         'fact.has-dimensions'
         ]
    
    search_endpoint = 'https://api.xbrl.us/api/v1/fact/search'


    params = {  
         'period.fiscal-period': 'Y',
         'period.fiscal-year': ','.join(years),
         'unit': 'USD',
         'entity.cik': ','.join(firm_ciks),
         'report.document-type': ','.join(report_types)
         }  
    
    if get_extensions == 'TRUE':
        params['concept.is-base'] = 'FALSE'
    else:
        params['concept.local-name'] =  ','.join(xbrl_elements)
    
    if with_dimensions == 'ALL':
        dimension_options = ['TRUE', 'FALSE']
    else:
        dimension_options = [with_dimensions]

    all_res_list = []
    for dimensions_param in dimension_options:
        print('Getting the data for: "fact.has-dimensions" = {}'.format(dimensions_param))
        ### Every request will return a max of 2000 results. So we loop until all results are retrieved. 
        done_retrieving_all_results = False
        offset = 0
        while not done_retrieving_all_results:
            params['fact.has-dimensions'] = dimensions_param
            params['fields'] = ','.join(fields) + ',fact.offset({})'.format(offset) 
            res = requests.get(search_endpoint, params=params, headers={'Authorization' : 'Bearer {}'.format(access_token)})

            ## Interpret as JSON
            res_json = res.json()

            ## Get the results
            ### Retrieve the data list
            res_list = res_json['data']

            ### Add to the results
            all_res_list += res_list

            ## Pagination check
            paging_dict = res_json['paging']
            if paging_dict['count'] >= 2000:
                offset += paging_dict['count']
            else:
                done_retrieving_all_results = True

    ## convert to a DataFrame
    res_df = pd.DataFrame(all_res_list)
    ## remove duplicates; sometimes the same item is reported multiple times throughout the document
    res_df.drop_duplicates(subset = ['entity.name','period.fiscal-year', 'concept.local-name', 'dimension.local-name', 'member.local-name', 'fact.value'], inplace = True)
    ## sort data
    res_df = res_df.sort_values(by=['entity.name','dts.id','concept.local-name','dimension.local-name']).reset_index(drop = True)
    ## reorder table columns
    first_columns = ['entity.name', 'period.fiscal-year', 'report.filing-date', 'concept.local-name', 'fact.value', 'unit','dimension.local-name', 'member.local-name','dimensions.count']
    columns = first_columns + [c for c in res_df.columns if c not in first_columns]
    res_df = res_df[columns]
    
    print('\nNumber of results that meet the criteria: {}'.format(len(res_df)))

    return res_df
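
The pagination loop inside `execute_query` can be studied in isolation. The sketch below reproduces the same stop condition (keep requesting while a page comes back full) against a fake data source instead of the API. The names `fetch_all` and `fake_fetch_page` are illustrative, and the page size of 2000 is taken from the comment in the function above.

```python
def fetch_all(fetch_page, page_size=2000):
    """Collect every record by advancing the offset until a page is not full."""
    results = []
    offset = 0
    while True:
        page = fetch_page(offset)
        results.extend(page)
        if len(page) < page_size:
            # a short (or empty) page means there is nothing left to retrieve
            break
        offset += len(page)
    return results

# Fake data source standing in for the API: 4500 records served 2000 at a time.
fake_records = list(range(4500))

def fake_fetch_page(offset, page_size=2000):
    return fake_records[offset:offset + page_size]

all_records = fetch_all(fake_fetch_page)
print(len(all_records))
```

When the total is an exact multiple of the page size, the loop makes one extra request that returns an empty page and then stops, which matches the behavior of the original loop.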

## Exercise: Getting Revenue related to Products from XBRL 10-K filings

### Define the companies you'd like

We will consider three firms for this example.

In [5]:
firm_ciks =     [
                 '0001018724', ## Amazon (AMZN)              
                 '0000354950', ## Home Depot (HD)
                 '0000731766', ## UnitedHealth Group (UNH)
                ]
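
The CIKs above are written as 10-character strings with leading zeros. Other data sources often store CIKs as integers or drop the zeros, so a small normalization step can help when building the list from another table. This is a sketch under the assumption that the zero-padded form used in this notebook is the one to send; `normalize_cik` is a hypothetical helper name.

```python
def normalize_cik(cik):
    """Convert an integer or unpadded string CIK to a 10-character, zero-padded string."""
    return str(cik).strip().zfill(10)

raw_ciks = [1018724, '354950', ' 0000731766 ']
padded_ciks = [normalize_cik(c) for c in raw_ciks]
print(padded_ciks)
```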

### Define the years you'd like

In [6]:
years = ['2020'] 

### Specify whether to output extension XBRL elements

In [7]:
get_extensions = 'FALSE'

### Specify the report types that you want

In [8]:
report_types = ['10-K', '10-K/A']

### Define the XBRL elements (tags) you'd like 

In [9]:
xbrl_elements = [
     'RevenueFromContractWithCustomerExcludingAssessedTax',
     'RevenueFromContractWithCustomerIncludingAssessedTax'
                ]

### Specify if you want dimensions, no dimension, or all values

Revenue from Products is typically reported using XBRL dimensions, which specify that the revenue-related concept covers products only. Therefore, this time we will request XBRL data with dimensions.

In [10]:
with_dimensions = 'TRUE'  ## TRUE for require dimensions, FALSE for no dimensions, ALL for all values

### Execute query

Let us execute an XBRL US API query to retrieve Revenue from Contract With Customers concepts with dimensions. The result will be saved to the pandas DataFrame *res_df*.

In [11]:
res_df = execute_query(access_token, firm_ciks, years, report_types, get_extensions, xbrl_elements, with_dimensions)

Getting the data for: "fact.has-dimensions" = TRUE

Number of results that meet the criteria: 37


### Display results

In [12]:
#choose which columns to hide
columns_to_hide = ['entity.cik', 'fact.decimals', 'dimension.namespace', 'member.namespace']
columns_to_show = [column for column in res_df.columns if column not in columns_to_hide]

#display the first 20 results
res_df[columns_to_show].head(20)

|  | entity.name | period.fiscal-year | report.filing-date | concept.local-name | fact.value | unit | dimension.local-name | member.local-name | dimensions.count | dts.id | fact.id | period.instant | report.document-type | concept.id | fact.has-dimensions |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | AMAZON.COM, INC. | 2020 | 2021-02-03 | RevenueFromContractWithCustomerExcludingAssess... | 25207000000 | USD | ProductOrServiceAxis | SubscriptionServicesMember | 1 | 436525 | 251631713 |  | 10-K | 27974096 | True |
| 1 | AMAZON.COM, INC. | 2020 | 2021-02-03 | RevenueFromContractWithCustomerExcludingAssess... | 45370000000 | USD | ProductOrServiceAxis | AmazonWebServicesMember | 1 | 436525 | 251631939 |  | 10-K | 27974096 | True |
| 2 | AMAZON.COM, INC. | 2020 | 2021-02-03 | RevenueFromContractWithCustomerExcludingAssess... | 215915000000 | USD | ProductOrServiceAxis | ProductMember | 1 | 436525 | 251631481 |  | 10-K | 27974096 | True |
| 3 | AMAZON.COM, INC. | 2020 | 2021-02-03 | RevenueFromContractWithCustomerExcludingAssess... | 170149000000 | USD | ProductOrServiceAxis | ServiceMember | 1 | 436525 | 251631397 |  | 10-K | 27974096 | True |
| 4 | AMAZON.COM, INC. | 2020 | 2021-02-03 | RevenueFromContractWithCustomerExcludingAssess... | 80461000000 | USD | ProductOrServiceAxis | ThirdPartySellerServicesMember | 1 | 436525 | 251632135 |  | 10-K | 27974096 | True |
| 5 | AMAZON.COM, INC. | 2020 | 2021-02-03 | RevenueFromContractWithCustomerExcludingAssess... | 16227000000 | USD | ProductOrServiceAxis | PhysicalStoresMember | 1 | 436525 | 251630845 |  | 10-K | 27974096 | True |
| 6 | AMAZON.COM, INC. | 2020 | 2021-02-03 | RevenueFromContractWithCustomerExcludingAssess... | 197346000000 | USD | ProductOrServiceAxis | OnlineStoresMember | 1 | 436525 | 251631384 |  | 10-K | 27974096 | True |
| 7 | AMAZON.COM, INC. | 2020 | 2021-02-03 | RevenueFromContractWithCustomerExcludingAssess... | 21453000000 | USD | ProductOrServiceAxis | OtherServicesMember | 1 | 436525 | 251631366 |  | 10-K | 27974096 | True |
| 8 | AMAZON.COM, INC. | 2020 | 2021-02-03 | RevenueFromContractWithCustomerExcludingAssess... | 104412000000 | USD | StatementBusinessSegmentsAxis | InternationalSegmentMember | 1 | 436525 | 251631324 |  | 10-K | 27974096 | True |
| 9 | AMAZON.COM, INC. | 2020 | 2021-02-03 | RevenueFromContractWithCustomerExcludingAssess... | 236282000000 | USD | StatementBusinessSegmentsAxis | NorthAmericaSegmentMember | 1 | 436525 | 251632044 |  | 10-K | 27974096 | True |
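
The `fact.value` column holds raw USD amounts, which are hard to compare by eye. One optional convenience is a derived column in billions, sketched below on three rows copied from the output above. The column name `value_usd_bn` is an assumption made for this example; `pd.to_numeric` is used in case the values arrive as strings.

```python
import pandas as pd

# Three rows copied from the displayed output (Amazon, fiscal year 2020).
sample_df = pd.DataFrame({
    'member.local-name': ['ProductMember', 'ServiceMember', 'AmazonWebServicesMember'],
    'fact.value': ['215915000000', '170149000000', '45370000000'],
})

# Convert to numbers first, then scale to billions of USD.
sample_df['value_usd_bn'] = pd.to_numeric(sample_df['fact.value']) / 1e9

print(sample_df[['member.local-name', 'value_usd_bn']])
```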


### Filtering results on the values of dimensions

In the table above, Revenues from Products are reported using Revenue From Contract With Customers concepts qualified by one dimension: the *ProductOrServiceAxis* axis with the *ProductMember* member. Therefore, we can simply filter the results to keep only that axis and member combination:



In [13]:
# filter results to include only Revenues From Products
res_df_filtered = res_df[(res_df['dimension.local-name'] == 'ProductOrServiceAxis') & (res_df['member.local-name'] == 'ProductMember')]

# display the first 40 results
res_df_filtered[columns_to_show].head(40)

|  | entity.name | period.fiscal-year | report.filing-date | concept.local-name | fact.value | unit | dimension.local-name | member.local-name | dimensions.count | dts.id | fact.id | period.instant | report.document-type | concept.id | fact.has-dimensions |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | AMAZON.COM, INC. | 2020 | 2021-02-03 | RevenueFromContractWithCustomerExcludingAssess... | 215915000000 | USD | ProductOrServiceAxis | ProductMember | 1 | 436525 | 251631481 |  | 10-K | 27974096 | True |
| 31 | HOME DEPOT, INC. | 2020 | 2021-03-24 | RevenueFromContractWithCustomerExcludingAssess... | 105194000000 | USD | ProductOrServiceAxis | ProductMember | 1 | 449867 | 259424926 |  | 10-K | 27974096 | True |
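
The same filter can be wrapped in a small function so that other axis and member combinations are easy to try. The sketch below runs on a toy table built from rows shown earlier in this notebook; `filter_by_dimension` is a hypothetical helper name, not part of the original code.

```python
import pandas as pd

def filter_by_dimension(df, axis, member):
    """Keep rows whose dimension axis and member both match the requested names."""
    mask = (df['dimension.local-name'] == axis) & (df['member.local-name'] == member)
    return df[mask]

# Toy table with values copied from the outputs above.
toy_df = pd.DataFrame({
    'entity.name': ['AMAZON.COM, INC.', 'AMAZON.COM, INC.', 'AMAZON.COM, INC.', 'HOME DEPOT, INC.'],
    'dimension.local-name': ['ProductOrServiceAxis', 'ProductOrServiceAxis',
                             'StatementBusinessSegmentsAxis', 'ProductOrServiceAxis'],
    'member.local-name': ['ProductMember', 'ServiceMember',
                          'NorthAmericaSegmentMember', 'ProductMember'],
    'fact.value': [215915000000, 170149000000, 236282000000, 105194000000],
})

products_only = filter_by_dimension(toy_df, 'ProductOrServiceAxis', 'ProductMember')
print(products_only[['entity.name', 'fact.value']])
```

Both conditions are combined with `&`, so a row with the right member name on a different axis would not be kept.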


### Extracting Revenues from Products that are reported using XBRL extensions

UnitedHealth uses an extension to report Revenues from Products. We can extract this revenue item by searching for an extension XBRL concept whose element name contains keywords such as "Revenue" and "Product". First, we will extract all extension concepts from the filings.

In [14]:
# change get_extensions to 'TRUE'
get_extensions = 'TRUE'

# in most cases for Revenue from Products, we don't need to consider dimensions if extensions are used instead
with_dimensions = 'FALSE'

# this time search for extension concepts without dimensions
res_df_extensions = execute_query(access_token, firm_ciks, years, report_types, get_extensions, xbrl_elements, with_dimensions)

# keep only extensions that include words 'Product' and 'Revenue' in their element names
res_df_keywords = res_df_extensions[res_df_extensions['concept.local-name'].str.contains('Product') \
                  & res_df_extensions['concept.local-name'].str.contains('Revenue')]

res_df_keywords[columns_to_show].head(40)

Getting the data for: "fact.has-dimensions" = FALSE

Number of results that meet the criteria: 74


|  | entity.name | period.fiscal-year | report.filing-date | concept.local-name | fact.value | unit | dimension.local-name | member.local-name | dimensions.count | dts.id | fact.id | period.instant | report.document-type | concept.id | fact.has-dimensions |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 71 | UnitedHealth Group Incorporated | 2020 | 2021-03-01 | SalesRevenueProductsNet | 34145000000 | USD |  |  | 0 | 444354 | 256586218 |  | 10-K | 30955153 | False |
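
The keyword match above is case-sensitive and hard-codes two keywords. A slightly more general variant is sketched below: it accepts any list of keywords and ignores case. The helper name `contains_all_keywords` is an assumption, and apart from `SalesRevenueProductsNet` the element names in the toy data are invented for the demonstration.

```python
import pandas as pd

def contains_all_keywords(names, keywords):
    """Return a boolean mask that is True where a name contains every keyword (case-insensitive)."""
    mask = pd.Series(True, index=names.index)
    for keyword in keywords:
        # regex=False treats the keyword as plain text; na=False treats missing names as no match
        mask &= names.str.contains(keyword, case=False, regex=False, na=False)
    return mask

# 'SalesRevenueProductsNet' comes from the output above; the other names are made up.
element_names = pd.Series(['SalesRevenueProductsNet',
                           'HypotheticalProductWarrantyAccrual',
                           'HypotheticalPremiumRevenue',
                           None])

matches = element_names[contains_all_keywords(element_names, ['product', 'revenue'])]
print(matches.tolist())
```

A looser match can return more false positives, so the filtered rows still deserve a manual look before they are used.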


### Merge results and output them to csv

Finally, we can merge the results of the queries for standard and extension XBRL concepts into one table and save them to a CSV file.

In [15]:
# merge results
output_df = pd.concat([res_df_filtered,res_df_keywords])

# save to the CSV format
output_df.to_csv('XBRL_Revenues_from_Products.csv')     

## If running in Binder, click on the Jupyter icon/name in the upper left corner to see your files, 
## select the file you want, and click Download.
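
When the two result sets are stacked, it can be useful to record which query each row came from and to leave the row index out of the file (a written index shows up as an unnamed first column when the CSV is read back). The sketch below illustrates both ideas on toy tables with values taken from the outputs above. The `source` column is an assumption added for this example, and the CSV is written to an in-memory buffer rather than to disk.

```python
import io
import pandas as pd

# Toy stand-ins for res_df_filtered and res_df_keywords.
standard_rows = pd.DataFrame({
    'entity.name': ['AMAZON.COM, INC.', 'HOME DEPOT, INC.'],
    'concept.local-name': ['RevenueFromContractWithCustomerExcludingAssessedTax'] * 2,
    'fact.value': [215915000000, 105194000000],
})
extension_rows = pd.DataFrame({
    'entity.name': ['UnitedHealth Group Incorporated'],
    'concept.local-name': ['SalesRevenueProductsNet'],
    'fact.value': [34145000000],
})

# Label each table before stacking, and renumber the rows from zero.
combined = pd.concat(
    [standard_rows.assign(source='standard'), extension_rows.assign(source='extension')],
    ignore_index=True)

# index=False leaves the row numbers out of the CSV.
buffer = io.StringIO()
combined.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text.splitlines()[0])
```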