## Download of company published financial statements using LSEG API

The Fundamental and Reference module provides access to private and public company information through fields whose names start with "TR.".

- Statements include:
    - Balance sheets
    - Cash flow statements
    - Income statements
    
---
### Balance Sheets

In [1]:
import lseg.data as ld
import pandas as pd

In [2]:
# Read in the index constituents whose financial statements will be downloaded
sp400_companies = pd.read_csv("data/sp400_companies.csv", dtype={"CIK": str})
sp500_companies = pd.read_csv("data/sp500_companies.csv", dtype={"CIK": str})
sp600_companies = pd.read_csv("data/sp600_companies.csv", dtype={"CIK": str})

# Combine all CIKs into a single list
sp400_ciks = sp400_companies["CIK"].tolist()
sp500_ciks = sp500_companies["CIK"].tolist()
sp600_ciks = sp600_companies["CIK"].tolist()
ciks = sp400_ciks + sp500_ciks + sp600_ciks

In [3]:
# Start a session with the LSEG Data Platform (needs Refinitiv Workspace to be running in the background)
ld.open_session()

An error occurred while requesting URL('http://localhost:9010/api/status').
	ConnectError('[WinError 10061] No connection could be made because the target machine actively refused it')



- First we download the Refinitiv Instrument Code (RIC) for every company, because downloading balance sheets apparently does not work when CIKs are passed as identifiers.

In [None]:
rics = ld.get_data(
    universe = ciks,
    fields=[
        "TR.RIC"
    ],
)



In [None]:
rics.head()

Unnamed: 0,Instrument,RIC
0,1675149,AA.N
1,6201,AAL.OQ
2,824142,AAON.OQ
3,1520697,ACHC.OQ
4,1646972,ACI.N


In [2]:
# The RICs were saved to a CSV file once (line kept for reference); reload them from disk
#rics.to_csv("data/rics.csv", index=False)
rics = pd.read_csv("data/rics.csv", dtype={"Instrument": str})
rics.head()

Unnamed: 0,Instrument,RIC
0,1675149,AA.N
1,6201,AAL.OQ
2,824142,AAON.OQ
3,1520697,ACHC.OQ
4,1646972,ACI.N


In [3]:
# Convert RICs to a list
ric_list = rics["RIC"].tolist()

- Download of balance sheets

If no report period or type (e.g. 10-K or 10-Q) is specified, only data from annual reports (10-Ks) is returned.
To obtain quarterly data, we therefore have to request the balance sheets explicitly, for every given date (or year?), from each of the following periods:
- The last fiscal quarter (FQ0)
- The previous fiscal quarter (FQ-1)
- Two fiscal quarters back (FQ-2)
- Three fiscal quarters back (FQ-3)

The same logic is applied later on when downloading cash flow and income statements.
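
The field strings for the four quarterly offsets differ only in the statement name and the `Period` argument, so they can be generated instead of typed out. The following is a minimal sketch; `build_statement_fields` and `PERIODS` are hypothetical names that are not part of `lseg.data`, and the sketch uses the `Period=FQ0` spelling without spaces that this notebook uses for the `.FccItemName` fields.

```python
# Fiscal-quarter offsets used throughout this notebook
PERIODS = ["FQ0", "FQ-1", "FQ-2", "FQ-3"]


def build_statement_fields(statement: str, period: str) -> list[str]:
    """Return the value field and the item-name field for one statement and period.

    Hypothetical helper: `statement` is e.g. "BalanceSheet", "CashFlowStatement"
    or "IncomeStatement", following the field names used in this notebook.
    """
    base = f"TR.F.{statement}(Period={period})"
    return [base, f"{base}.FccItemName"]


# One pair of field strings per fiscal-quarter offset
balance_sheet_fields = {p: build_statement_fields("BalanceSheet", p) for p in PERIODS}
```

Each value of `balance_sheet_fields` could then be passed as the `fields` argument of one request.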

In [98]:
# Fiscal-quarter offsets to request: FQ0 is the last fiscal quarter, FQ-1 the one before, and so on
periods = ["FQ0", "FQ-1", "FQ-2", "FQ-3"]

# Initialize an empty list to store DataFrames
results = []
# Loop through each RIC and download the balance sheet data
for ric in ric_list:
    frames = []
    try:
        # Download the balance sheets for each RIC and every quarter separately
        for period in periods:
            frame = ld.get_history(
                universe=[ric],
                start='2000-01-01',
                end='2025-01-01',
                fields=[
                    f"TR.F.BalanceSheet(Period = {period})",
                    f"TR.F.BalanceSheet(Period={period}).FccItemName",
                ],
            )
            # Add RIC column to the DataFrame to identify the company
            frame['RIC'] = ric
            frames.append(frame)
    except Exception as exc:
        # If an error occurs, the current RIC is skipped
        print(f"Skipping RIC {ric}: {exc!r}")
        continue
    # Concatenate the DataFrames for each RIC and append them to the results list
    results.append(pd.concat(frames))

# Concatenate all DataFrames in the results list into a single DataFrame
balance_sheets = pd.concat(results)

An error occurred while requesting URL('http://localhost:9010/api/udf').
	ReadTimeout('timed out')
[... the same ReadTimeout message repeats for further requests; output truncated ...]
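
Timeouts like the ones logged above are often transient, so retrying a request after a short pause may recover some of the data. The following is a minimal sketch, assuming that a failed request surfaces as a Python exception (the logged output alone does not confirm this). `call_with_retries` and its parameters `retries`, `base_delay` and `sleep` are hypothetical and not part of `lseg.data`.

```python
import time


def call_with_retries(func, *args, retries=3, base_delay=2.0, sleep=time.sleep, **kwargs):
    """Call func(*args, **kwargs) and retry with exponential backoff on any exception.

    Hypothetical helper: the last exception is re-raised if all attempts fail.
    """
    for attempt in range(retries):
        try:
            return func(*args, **kwargs)
        except Exception as exc:
            # Give up after the final attempt
            if attempt == retries - 1:
                raise
            # Wait base_delay, then 2 * base_delay, then 4 * base_delay, ...
            delay = base_delay * 2 ** attempt
            print(f"Attempt {attempt + 1} failed ({exc!r}); retrying in {delay:.0f}s")
            sleep(delay)
```

One of the requests in the loop could then be wrapped as `call_with_retries(ld.get_history, universe=[ric], start='2000-01-01', end='2025-01-01', fields=fields)`, where `fields` stands for the list of field strings for one period.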

In [99]:
# Work with a copy just to be safe
df = balance_sheets.copy()

# Reset index and rename to Date
df = df.reset_index().rename(columns={'index': 'Date'})

# Indicate statement type
df["Statement"] = "balance_sheet"

# Get rid of timestamp
df["Date"] = pd.to_datetime(df["Date"]).dt.date
df

Unnamed: 0,Date,STD Balance Sheet All,FCC Item Name,RIC,Statement
0,2011-09-30,,,AA.N,balance_sheet
1,2012-09-30,,,AA.N,balance_sheet
2,2013-09-30,,,AA.N,balance_sheet
3,2014-09-30,,,AA.N,balance_sheet
4,2016-06-30,332000000.0,TR.F.CashSTInvst,AA.N,balance_sheet
...,...,...,...,...,...
12237114,2023-03-31,556600000.0,TR.F.TotDebtExclIslamic,ZWS.N,balance_sheet
12237115,2023-03-31,223000000.0,TR.F.TradeAcctTradeNotesRcvblNetTot,ZWS.N,balance_sheet
12237116,2023-03-31,223300000.0,TR.F.CurrLiabExclCurrDebtTot,ZWS.N,balance_sheet
12237117,2023-03-31,607000000.0,TR.F.CurrAssetsExclCashSTInvstTot,ZWS.N,balance_sheet


In [100]:
# Save the data to a CSV file
df.to_csv("data/balance_sheets.csv", index=False)

In [7]:
ld.close_session() 

---
### Cash flow statements

In [2]:
rics = pd.read_csv("data/rics.csv", dtype={"RIC": str})
ric_list = rics["RIC"].tolist()

In [3]:
ld.open_session()

<lseg.data.session.Definition object at 0x1674e8f4950 {name='workspace'}>

In [None]:
rics = ld.get_data(
    universe = ciks,
    fields=[
        "TR.RIC"
    ],
)



In [5]:
# Save RIC to CSV
rics.to_csv("data/rics.csv", index=False)

In [6]:
# Convert RICs to a list
ric_list = rics["RIC"].tolist()

In [None]:
# Fiscal-quarter offsets to request: FQ0 is the last fiscal quarter, FQ-1 the one before, and so on
periods = ["FQ0", "FQ-1", "FQ-2", "FQ-3"]

# Initialize an empty list to store DataFrames
results = []

# Loop through each RIC and download the cash flow statement data
for i, ric in enumerate(ric_list):
    print(f"Processing RIC {i+1}/{len(ric_list)}: {ric}")
    frames = []
    try:
        # Download the cash flow statements for each RIC and every quarter separately
        for period in periods:
            frames.append(
                ld.get_history(
                    universe=[ric],
                    start='2000-01-01',
                    end='2025-01-01',
                    fields=[
                        f"TR.F.CashFlowStatement(Period = {period})",
                        f"TR.F.CashFlowStatement(Period={period}).FccItemName",
                    ],
                )
            )
    except Exception as exc:
        # If an error occurs, the current RIC is skipped
        print(f"Skipping RIC {ric}: {exc!r}")
        continue
    # Concatenate the DataFrames for each RIC
    dfs = pd.concat(frames)
    dfs["RIC"] = ric  # Add RIC column to the concatenated DataFrame

    # Check if data was available, since the balance sheet df lacks ~150 companies
    if dfs.empty:
        print(f"No data found for RIC {ric}.")
        continue

    # Append the concatenated DataFrame to the results list
    results.append(dfs)

# Concatenate all DataFrames in the results list into a single DataFrame
cash_flow_statements = pd.concat(results)

In [109]:
df = cash_flow_statements.copy()
# Reset index and rename to Date
df = df.reset_index().rename(columns={'index': 'Date'})

# Get rid of timestamp
df["Date"] = pd.to_datetime(df["Date"]).dt.date

# Create column to indicate type of statement
df["statement"] = "cashflow"
df

Unnamed: 0,Date,STD Cash Flow All,FCC Item Name,RIC,statement
0,2011-09-30,,,AA.N,cashflow
1,2012-09-30,,,AA.N,cashflow
2,2013-09-30,,,AA.N,cashflow
3,2014-09-30,,,AA.N,cashflow
4,2016-06-30,-19000000.0,TR.F.ProfLossStartingLineCF,AA.N,cashflow
...,...,...,...,...,...
4883991,2023-03-31,81000000.0,TR.F.CashDivPaidComStockBuybackNet,LNC.N,cashflow
4883992,2023-03-31,4000000.0,TR.F.ComStockBuybackNet,LNC.N,cashflow
4883993,2023-03-31,-774000000.0,TR.F.FreeCashFlowToEq,LNC.N,cashflow
4883994,2023-03-31,-876000000.0,TR.F.FOCF,LNC.N,cashflow


In [110]:
# Save to CSV
df.to_csv("data/cash_flow_statements.csv", index=False)

In [111]:
ld.close_session()

---
### Income statements

In [5]:
ld.open_session()

<lseg.data.session.Definition object at 0x22d72f11650 {name='workspace'}>

In [None]:
# Fiscal-quarter offsets to request: FQ0 is the last fiscal quarter, FQ-1 the one before, and so on
periods = ["FQ0", "FQ-1", "FQ-2", "FQ-3"]

# Initialize an empty list to store DataFrames
results = []
empty_count = 0

# Loop through each RIC and download the income statement data
for i, ric in enumerate(ric_list):
    print(f"Processing RIC {i+1}/{len(ric_list)}: {ric}")
    frames = []
    try:
        # Download the income statements for each RIC and every quarter separately
        for period in periods:
            frames.append(
                ld.get_history(
                    universe=[ric],
                    start='2000-01-01',
                    end='2025-01-01',
                    fields=[
                        f"TR.F.IncomeStatement(Period = {period})",
                        f"TR.F.IncomeStatement(Period={period}).FccItemName",
                    ],
                )
            )
    except Exception as exc:
        # If an error occurs, the current RIC is skipped
        print(f"Skipping RIC {ric}: {exc!r}")
        continue
    # Concatenate the DataFrames for each RIC
    dfs = pd.concat(frames)
    dfs["RIC"] = ric  # Add RIC column to the concatenated DataFrame

    # Count and skip RICs for which no data was returned
    if dfs.empty:
        print(f"No data found for RIC {ric}.")
        empty_count += 1
        continue

    # Append the concatenated DataFrame to the results list
    results.append(dfs)

# Concatenate all DataFrames in the results list into a single DataFrame
income_statements = pd.concat(results)

print(f"Number of RICs with no data: {empty_count}")

In [7]:
df = income_statements.copy()

# Reset index and rename to Date
df = df.reset_index().rename(columns={'index': 'Date'})

# Get rid of timestamp
df["Date"] = pd.to_datetime(df["Date"]).dt.date

# Create column to indicate type of statement
df["statement"] = "income_statement"
df

Unnamed: 0,Date,STD Income Statement All,FCC Item Name,RIC,statement
0,2011-09-30,,,AA.N,income_statement
1,2012-09-30,,,AA.N,income_statement
2,2013-09-30,,,AA.N,income_statement
3,2014-09-30,,,AA.N,income_statement
4,2016-06-30,2323000000.0,TR.F.RevGoodsSrvc,AA.N,income_statement
...,...,...,...,...,...
10868820,2023-03-31,29644164.03785,TR.F.TaxAdjOpInc,ZWS.N,income_statement
10868821,2023-03-31,303700000.0,TR.F.OpExpnExclNonCashChrgTot,ZWS.N,income_statement
10868822,2023-03-31,45600000.0,TR.F.IncAvailToComShrBefDeprAmort,ZWS.N,income_statement
10868823,2023-03-31,9600000.0,TR.F.FixedChrg,ZWS.N,income_statement


In [8]:
# Save the data to a CSV file
df.to_csv("data/income_statements.csv", index=False)

In [9]:
ld.close_session()

---

### Inspecting the data and downloading missing values

In [4]:
balance_sheets = pd.read_csv("data/balance_sheets.csv", dtype={"RIC": str})
income_statements = pd.read_csv("data/income_statements.csv", dtype={"RIC": str})
cash_flow_statements = pd.read_csv("data/cash_flow_statements.csv", dtype={"RIC": str})

In [5]:
balance_sheets["RIC"].nunique(), income_statements["RIC"].nunique(), cash_flow_statements["RIC"].nunique()

(1338, 1453, 1203)

We started with 1505 unique CIKs, for which 1499 unique RICs could be looked up. However, each set of downloaded financial statements covers fewer RICs than that; in the cash flow statements, almost 300 RICs are missing.  
Therefore, after determining which RICs are missing from each statement type, I try to download the missing data again in the cells below.
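
The size of the gap per statement type follows directly from the counts above. This short worked calculation assumes that every RIC found in the downloaded files is one of the 1499 looked-up RICs; the variable names are illustrative.

```python
# Counts reported above
looked_up_rics = 1499
covered = {
    "balance_sheets": 1338,
    "income_statements": 1453,
    "cash_flow_statements": 1203,
}

# Number of RICs without any rows, per statement type
missing = {name: looked_up_rics - n for name, n in covered.items()}
print(missing)
# {'balance_sheets': 161, 'income_statements': 46, 'cash_flow_statements': 296}
```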

In [None]:
# Check which RICs are missing in each DataFrame
missing_balance_sheets = set(ric_list) - set(balance_sheets["RIC"].unique())
missing_cash_flow_statements = set(ric_list) - set(cash_flow_statements["RIC"].unique())
missing_income_statements = set(ric_list) - set(income_statements["RIC"].unique())

In [19]:
# Convert to list for download
missing_balance_sheets_list = list(missing_balance_sheets)
missing_cash_flow_statements_list = list(missing_cash_flow_statements)
missing_income_statements_list = list(missing_income_statements)

In [17]:
ld.open_session()

<lseg.data.session.Definition object at 0x2a704318990 {name='workspace'}>

- Try downloading missing balance sheets once more

In [None]:
# Fiscal-quarter offsets to request: FQ0 is the last fiscal quarter, FQ-1 the one before, and so on
periods = ["FQ0", "FQ-1", "FQ-2", "FQ-3"]

# Initialize an empty list to store DataFrames
results = []
# Loop through each RIC that is still missing and download the balance sheet data
for i, ric in enumerate(missing_balance_sheets_list):
    print(f"Processing RIC {i+1}/{len(missing_balance_sheets_list)}: {ric}")
    frames = []
    try:
        # Download the balance sheets for each RIC and every quarter separately
        for n, period in enumerate(periods, start=1):
            frame = ld.get_history(
                universe=[ric],
                start='2000-01-01',
                end='2025-01-01',
                fields=[
                    f"TR.F.BalanceSheet(Period = {period})",
                    f"TR.F.BalanceSheet(Period={period}).FccItemName",
                ],
            )
            # Check for failed downloads; report every empty frame, not only the first
            if frame.empty:
                print(f"df{n} empty for RIC {ric}.")
            # Add RIC column to the DataFrame to identify the company
            frame['RIC'] = ric
            frames.append(frame)
    except Exception as exc:
        # If an error occurs, the current RIC is skipped
        print(f"Skipping RIC {ric}: {exc!r}")
        continue

    # Concatenate the DataFrames for each RIC
    dfs = pd.concat(frames)
    # Skip the RIC only if none of the four requests returned data
    if dfs.empty:
        print(f"No data found for RIC {ric}.")
        continue

    # Append the concatenated DataFrame to the results list
    results.append(dfs)

# Concatenate all DataFrames in the results list into a single DataFrame
balance_sheets = pd.concat(results)

Manually requesting the data that was still missing afterwards, i.e. df1 (FQ0) for GEV.N, RAL.N and MRP.N, as well as df3 (FQ-2) for TTGT.OQ, again returned empty DataFrames. These reports therefore have to be left out.
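
A check of this kind is easier to read when every per-quarter frame is tested independently, because an `if`/`elif` chain reports only the first empty frame it finds. The following is a minimal sketch on toy data; `empty_periods` is a hypothetical helper and the toy frames merely stand in for the four downloads of one RIC.

```python
import pandas as pd


def empty_periods(frames: dict[str, pd.DataFrame]) -> list[str]:
    """Return the labels of all frames that contain no rows."""
    return [label for label, frame in frames.items() if frame.empty]


# Toy frames standing in for the four per-quarter downloads of one RIC
frames = {
    "FQ0": pd.DataFrame(),
    "FQ-1": pd.DataFrame({"value": [1.0]}),
    "FQ-2": pd.DataFrame(),
    "FQ-3": pd.DataFrame({"value": [2.0]}),
}
print(empty_periods(frames))  # ['FQ0', 'FQ-2']
```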

In [23]:
# Work with a copy just to be safe
df = balance_sheets.copy()

# Reset index and rename to Date
df = df.reset_index().rename(columns={'index': 'Date'})

# Indicate statement type
df["Statement"] = "balance_sheet"

# Get rid of timestamp
df["Date"] = pd.to_datetime(df["Date"]).dt.date

# Save to CSV
df.to_csv("data/missing_balance_sheets.csv", index=False)

- Try downloading missing cash flow statements

In [None]:
# Fiscal-quarter offsets to request: FQ0 is the last fiscal quarter, FQ-1 the one before, and so on
periods = ["FQ0", "FQ-1", "FQ-2", "FQ-3"]

# Initialize an empty list to store DataFrames
results = []

# Loop through each RIC that is still missing and download the cash flow statement data
for i, ric in enumerate(missing_cash_flow_statements_list):
    print(f"Processing RIC {i+1}/{len(missing_cash_flow_statements_list)}: {ric}")
    frames = []
    try:
        # Download the cash flow statements for each RIC and every quarter separately
        for n, period in enumerate(periods, start=1):
            frame = ld.get_history(
                universe=[ric],
                start='2000-01-01',
                end='2025-01-01',
                fields=[
                    f"TR.F.CashFlowStatement(Period = {period})",
                    f"TR.F.CashFlowStatement(Period={period}).FccItemName",
                ],
            )
            # Check for failed downloads; report every empty frame, not only the first
            if frame.empty:
                print(f"df{n} empty for RIC {ric}.")
            frames.append(frame)
    except Exception as exc:
        # If an error occurs, the current RIC is skipped
        print(f"Skipping RIC {ric}: {exc!r}")
        continue

    # Concatenate the DataFrames for each RIC
    dfs = pd.concat(frames)
    dfs["RIC"] = ric  # Add RIC column to the concatenated DataFrame

    # Skip the RIC only if none of the four requests returned data
    if dfs.empty:
        print(f"No data found for RIC {ric}.")
        continue

    # Append the concatenated DataFrame to the results list
    results.append(dfs)

# Concatenate all DataFrames in the results list into a single DataFrame
cash_flow_statements = pd.concat(results)

In [26]:
# Work with a copy just to be safe
df = cash_flow_statements.copy()

# Reset index and rename to Date
df = df.reset_index().rename(columns={'index': 'Date'})

# Indicate statement type
df["Statement"] = "cash_flow_statement"

# Get rid of timestamp
df["Date"] = pd.to_datetime(df["Date"]).dt.date

# Save to CSV
df.to_csv("data/missing_cash_flow_statements.csv", index=False)

- Try downloading missing income statements

In [None]:
# Fiscal-quarter offsets to request: FQ0 is the last fiscal quarter, FQ-1 the one before, and so on
periods = ["FQ0", "FQ-1", "FQ-2", "FQ-3"]

# Initialize an empty list to store DataFrames
results = []
empty_count = 0

# Loop through each RIC that is still missing and download the income statement data
for i, ric in enumerate(missing_income_statements_list):
    print(f"Processing RIC {i+1}/{len(missing_income_statements_list)}: {ric}")
    frames = []
    try:
        # Download the income statements for each RIC and every quarter separately
        for n, period in enumerate(periods, start=1):
            frame = ld.get_history(
                universe=[ric],
                start='2000-01-01',
                end='2025-01-01',
                fields=[
                    f"TR.F.IncomeStatement(Period = {period})",
                    f"TR.F.IncomeStatement(Period={period}).FccItemName",
                ],
            )
            # Check for failed downloads; report every empty frame, not only the first
            if frame.empty:
                print(f"df{n} empty for RIC {ric}.")
            frames.append(frame)
    except Exception as exc:
        # If an error occurs, the current RIC is skipped
        print(f"Skipping RIC {ric}: {exc!r}")
        continue

    # Concatenate the DataFrames for each RIC
    dfs = pd.concat(frames)
    dfs["RIC"] = ric  # Add RIC column to the concatenated DataFrame

    # Count and skip the RIC only if none of the four requests returned data
    if dfs.empty:
        print(f"No data found for RIC {ric}.")
        empty_count += 1
        continue

    # Append the concatenated DataFrame to the results list
    results.append(dfs)

# Concatenate all DataFrames in the results list into a single DataFrame
income_statements = pd.concat(results)

print(f"Number of RICs with no data: {empty_count}")

In [28]:
# Work with a copy just to be safe
df = income_statements.copy()

# Reset index and rename to Date
df = df.reset_index().rename(columns={'index': 'Date'})

# Indicate statement type
df["Statement"] = "income_statement"

# Get rid of timestamp
df["Date"] = pd.to_datetime(df["Date"]).dt.date

# Save to CSV
df.to_csv("data/missing_income_statements.csv", index=False)

In [29]:
ld.close_session()
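
The re-downloaded files could afterwards be stacked onto the original ones. The following is a minimal sketch, not a step that this notebook performs: `combine_statements` is a hypothetical helper, and it replaces the label column because the saved files use both `Statement` and `statement` as its name. The toy rows only illustrate the column layout.

```python
import pandas as pd


def combine_statements(original: pd.DataFrame, redownloaded: pd.DataFrame, label: str) -> pd.DataFrame:
    """Stack two statement tables and give them one consistent 'Statement' column."""
    frames = []
    for frame in (original, redownloaded):
        # Drop either spelling of the label column; it is re-created below
        frames.append(frame.drop(columns=["Statement", "statement"], errors="ignore"))
    combined = pd.concat(frames, ignore_index=True)
    combined["Statement"] = label
    return combined


# Toy rows that mimic the two differently labelled files
original = pd.DataFrame({"Date": ["2023-03-31"], "RIC": ["AA.N"], "statement": ["cashflow"]})
redownloaded = pd.DataFrame({"Date": ["2023-03-31"], "RIC": ["ZWS.N"], "Statement": ["cash_flow_statement"]})
combined = combine_statements(original, redownloaded, "cash_flow_statement")
```

Because the second download only covers RICs that were absent from the first, the two tables should not overlap, so the sketch does not deduplicate rows.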