## Downloading published company financial statements with the LSEG API

The Fundamental and Reference module provides access to private and public company information via fields prefixed with "TR.".

- Statements include:
    - Balance sheets
    - Cash flow statements
    - Income statements

---
### Balance Sheets

In [1]:
import lseg.data as ld
import pandas as pd

In [2]:
# Read in the companies to download financial statements for
sp400_companies = pd.read_csv("data/sp400_companies.csv", dtype={"CIK": str})
sp500_companies = pd.read_csv("data/sp500_companies.csv", dtype={"CIK": str})
sp600_companies = pd.read_csv("data/sp600_companies.csv", dtype={"CIK": str})

# Combine all CIKs into a single list
sp400_ciks = sp400_companies["CIK"].tolist()
sp500_ciks = sp500_companies["CIK"].tolist()
sp600_ciks = sp600_companies["CIK"].tolist()
ciks = sp400_ciks + sp500_ciks + sp600_ciks

In [3]:
# Start a session with the LSEG Data Platform (needs Refinitiv Workspace to be running in the background)
ld.open_session()

An error occurred while requesting URL('http://localhost:9010/api/status').
	ConnectError('[WinError 10061] No connection could be made because the target machine actively refused it')



- First we download the Reuters Instrument Code (RIC) for every company, because downloading balance sheets apparently does not work with CIKs.

In [None]:
rics = ld.get_data(
    universe = ciks,
    fields=[
        "TR.RIC"
    ],
)



In [None]:
rics.head()

Unnamed: 0,Instrument,RIC
0,1675149,AA.N
1,6201,AAL.OQ
2,824142,AAON.OQ
3,1520697,ACHC.OQ
4,1646972,ACI.N


In [2]:
# Save RICs to a CSV file (run once after the lookup), then reload them from disk
#rics.to_csv("data/rics.csv", index=False)
rics = pd.read_csv("data/rics.csv", dtype={"Instrument": str})
rics.head()

Unnamed: 0,Instrument,RIC
0,1675149,AA.N
1,6201,AAL.OQ
2,824142,AAON.OQ
3,1520697,ACHC.OQ
4,1646972,ACI.N


In [3]:
# Convert RICs to a list
ric_list = rics["RIC"].tolist()
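The lookup can leave missing values for CIKs without a matching RIC, and such entries end up in `ric_list` (a `nan` entry appears in the retry logs further down). A minimal sketch of a cleaning step, assuming the column values arrive as a plain Python list; the helper name `clean_rics` is hypothetical and not part of `lseg.data`:

```python
import math


def clean_rics(values):
    """Return the RICs in their original order without missing values or duplicates."""
    seen = set()
    cleaned = []
    for value in values:
        # pandas represents a missing RIC as None or float NaN
        if value is None or (isinstance(value, float) and math.isnan(value)):
            continue
        ric = str(value).strip()
        # Skip blank strings and RICs that were already collected
        if not ric or ric in seen:
            continue
        seen.add(ric)
        cleaned.append(ric)
    return cleaned


example = ["AA.N", float("nan"), "AAL.OQ", "AA.N", None, " ACI.N "]
print(clean_rics(example))
```

For a column of strings, `rics["RIC"].dropna().drop_duplicates().tolist()` should give a comparable result directly in pandas.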

- Download of balance sheets

Unfortunately, when no report period or report type (e.g. 10-K or 10-Q) is specified, only data submitted in annual reports (10-Ks) is downloaded.
Therefore, for every given date (or year; the exact granularity is unclear) we have to download the balance sheets explicitly for each of the following periods:
- The last fiscal quarter (FQ0)
- The previous fiscal quarter (FQ-1)
- Two fiscal quarters back (FQ-2)
- Three fiscal quarters back (FQ-3)

The same logic is applied later on when downloading cash flow and income statements.
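Since the four requests differ only in the `Period` parameter, the field strings can be generated instead of written out by hand. The following is a sketch under assumptions: `statement_fields`, `download_statement` and `fake_fetch` are hypothetical helper names, and the field syntax is copied from the cells in this notebook (which use the parameter both with and without spaces around `=`).

```python
PERIODS = ["FQ0", "FQ-1", "FQ-2", "FQ-3"]


def statement_fields(statement, period):
    """Build the value field and the item-name field for one statement and period."""
    base = f"TR.F.{statement}(Period={period})"
    return [base, f"{base}.FccItemName"]


def download_statement(ric, statement, fetch):
    """Call `fetch` once per fiscal-quarter offset and collect the results by period.

    `fetch` is any callable taking (ric, fields); in this notebook it would wrap
    ld.get_history with the fixed start and end dates.
    """
    return {period: fetch(ric, statement_fields(statement, period)) for period in PERIODS}


# Stand-in for the real request so the sketch runs without a Workspace session
def fake_fetch(ric, fields):
    return (ric, fields)


frames = download_statement("AA.N", "BalanceSheet", fake_fetch)
print(frames["FQ-1"])
```

Passing `"CashFlowStatement"` or `"IncomeStatement"` as `statement` would cover the other two statement types with the same helper.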

In [98]:
# Initialize an empty list to store DataFrames
results = []
# Loop through each RIC and download the balance sheet data
for ric in ric_list: 
    # Download balance sheets for each RIC and every quarter separately
    try:
        df1 = ld.get_history(
            universe=[ric],
            start='2000-01-01',
            end='2025-01-01',
            fields=[
                "TR.F.BalanceSheet(Period = FQ0)",
                "TR.F.BalanceSheet(Period=FQ0).FccItemName",
            ],
        )
        # Add RIC column to the DataFrame to identify the company
        df1['RIC'] = ric
        df2 = ld.get_history(
            universe = [ric],
            start = '2000-01-01',
            end = '2025-01-01',
            fields = [
                "TR.F.BalanceSheet(Period = FQ-1)",
                "TR.F.BalanceSheet(Period=FQ-1).FccItemName",
            ],
        )
        df2['RIC'] = ric
        df3 = ld.get_history(
            universe = [ric],
            start = '2000-01-01',
            end = '2025-01-01',
            fields = [
                "TR.F.BalanceSheet(Period = FQ-2)",
                "TR.F.BalanceSheet(Period=FQ-2).FccItemName",
            ],
        )
        df3['RIC'] = ric
        df4 = ld.get_history(
            universe = [ric],
            start = '2000-01-01',
            end = '2025-01-01',
            fields = [
                "TR.F.BalanceSheet(Period = FQ-3)",
                "TR.F.BalanceSheet(Period=FQ-3).FccItemName",
            ],
        )   
        df4['RIC'] = ric
    except Exception:
        # If a request fails, the current RIC is skipped
        # (a bare `except:` would also swallow KeyboardInterrupt)
        continue
    # Concatenate the DataFrames for each RIC
    dfs = pd.concat([df1, df2, df3, df4])
    # Append the concatenated DataFrame to the results list
    results.append(dfs)

# Concatenate all DataFrames in the results list into a single DataFrame
balance_sheets = pd.concat(results)

An error occurred while requesting URL('http://localhost:9010/api/udf').
	ReadTimeout('timed out')

In [99]:
# Work with a copy just to be safe
df = balance_sheets.copy()

# Reset index and rename to Date
df = df.reset_index().rename(columns={'index': 'Date'})

# Indicate statement type
df["Statement"] = "balance_sheet"

# Get rid of timestamp
df["Date"] = pd.to_datetime(df["Date"]).dt.date
df

Unnamed: 0,Date,STD Balance Sheet All,FCC Item Name,RIC,Statement
0,2011-09-30,,,AA.N,balance_sheet
1,2012-09-30,,,AA.N,balance_sheet
2,2013-09-30,,,AA.N,balance_sheet
3,2014-09-30,,,AA.N,balance_sheet
4,2016-06-30,332000000.0,TR.F.CashSTInvst,AA.N,balance_sheet
...,...,...,...,...,...
12237114,2023-03-31,556600000.0,TR.F.TotDebtExclIslamic,ZWS.N,balance_sheet
12237115,2023-03-31,223000000.0,TR.F.TradeAcctTradeNotesRcvblNetTot,ZWS.N,balance_sheet
12237116,2023-03-31,223300000.0,TR.F.CurrLiabExclCurrDebtTot,ZWS.N,balance_sheet
12237117,2023-03-31,607000000.0,TR.F.CurrAssetsExclCashSTInvstTot,ZWS.N,balance_sheet


In [100]:
# Save the data to a CSV file
df.to_csv("data/balance_sheets.csv", index=False)

In [7]:
ld.close_session() 

---
### Cash flow statements

In [2]:
rics = pd.read_csv("data/rics.csv", dtype={"RIC": str})
ric_list = rics["RIC"].tolist()

In [3]:
ld.open_session()

<lseg.data.session.Definition object at 0x1674e8f4950 {name='workspace'}>

In [None]:
rics = ld.get_data(
    universe = ciks,
    fields=[
        "TR.RIC"
    ],
)



In [5]:
# Save RIC to CSV
rics.to_csv("data/rics.csv", index=False)

In [6]:
# Convert RICs to a list
ric_list = rics["RIC"].tolist()

In [None]:
# Initialize an empty list to store DataFrames
results = []

# Loop through each RIC and download the cash flow statement data
for i, ric in enumerate(ric_list):
    print(f"Processing RIC {i+1}/{len(ric_list)}: {ric}")
    # Download cash flow statements for each RIC and every quarter separately
    try:
        df1 = ld.get_history(
            universe=[ric],
            start='2000-01-01',
            end='2025-01-01',
            fields=[
                "TR.F.CashFlowStatement(Period = FQ0)",
                "TR.F.CashFlowStatement(Period=FQ0).FccItemName",
            ],
        )
        # Same request for the previous fiscal quarter (FQ-1)
        df2 = ld.get_history(
            universe = [ric],
            start = '2000-01-01',
            end = '2025-01-01',
            fields = [
                "TR.F.CashFlowStatement(Period = FQ-1)",
                "TR.F.CashFlowStatement(Period=FQ-1).FccItemName",
            ],
        )
        df3 = ld.get_history(
            universe = [ric],
            start = '2000-01-01',
            end = '2025-01-01',
            fields = [
                "TR.F.CashFlowStatement(Period = FQ-2)",
                "TR.F.CashFlowStatement(Period=FQ-2).FccItemName",
            ],
        )
        df4 = ld.get_history(
            universe = [ric],
            start = '2000-01-01',
            end = '2025-01-01',
            fields = [
                "TR.F.CashFlowStatement(Period = FQ-3)",
                "TR.F.CashFlowStatement(Period=FQ-3).FccItemName",
            ],
        )   
    except Exception:
        # Skip this RIC if any of the four requests fails
        continue
    # Concatenate the DataFrames for each RIC
    dfs = pd.concat([df1, df2, df3, df4])
    dfs["RIC"] = ric  # Add RIC column to the concatenated DataFrame

    # Check whether data was available, since the balance sheet DataFrame lacks ~150 companies
    if dfs.empty:
        print(f"No data found for RIC {ric}.")
        continue

    # Append the concatenated DataFrame to the results list
    results.append(dfs)

# Concatenate all DataFrames in the results list into a single DataFrame
cash_flow_statements = pd.concat(results)

In [109]:
df = cash_flow_statements.copy()
# Reset index and rename to Date
df = df.reset_index().rename(columns={'index': 'Date'})

# Get rid of timestamp
df["Date"] = pd.to_datetime(df["Date"]).dt.date

# Create column to indicate type of statement
df["statement"] = "cashflow"
df

Unnamed: 0,Date,STD Cash Flow All,FCC Item Name,RIC,statement
0,2011-09-30,,,AA.N,cashflow
1,2012-09-30,,,AA.N,cashflow
2,2013-09-30,,,AA.N,cashflow
3,2014-09-30,,,AA.N,cashflow
4,2016-06-30,-19000000.0,TR.F.ProfLossStartingLineCF,AA.N,cashflow
...,...,...,...,...,...
4883991,2023-03-31,81000000.0,TR.F.CashDivPaidComStockBuybackNet,LNC.N,cashflow
4883992,2023-03-31,4000000.0,TR.F.ComStockBuybackNet,LNC.N,cashflow
4883993,2023-03-31,-774000000.0,TR.F.FreeCashFlowToEq,LNC.N,cashflow
4883994,2023-03-31,-876000000.0,TR.F.FOCF,LNC.N,cashflow


In [110]:
# Save to CSV
df.to_csv("data/cash_flow_statements.csv", index=False)

In [111]:
ld.close_session()

---
### Income statements

In [5]:
ld.open_session()

<lseg.data.session.Definition object at 0x22d72f11650 {name='workspace'}>

In [None]:
# Initialize an empty list to store DataFrames
results = []
empty_count = 0

# Loop through each RIC and download the income statement data
for i, ric in enumerate(ric_list):
    print(f"Processing RIC {i+1}/{len(ric_list)}: {ric}")
    # Download income statements for each RIC and every quarter separately
    try:
        df1 = ld.get_history(
            universe=[ric],
            start='2000-01-01',
            end='2025-01-01',
            fields=[
                "TR.F.IncomeStatement(Period = FQ0)",
                "TR.F.IncomeStatement(Period=FQ0).FccItemName",
            ],
        )
        # Same request for the previous fiscal quarter (FQ-1)
        df2 = ld.get_history(
            universe = [ric],
            start = '2000-01-01',
            end = '2025-01-01',
            fields = [
                "TR.F.IncomeStatement(Period = FQ-1)",
                "TR.F.IncomeStatement(Period=FQ-1).FccItemName",
            ],
        )
        df3 = ld.get_history(
            universe = [ric],
            start = '2000-01-01',
            end = '2025-01-01',
            fields = [
                "TR.F.IncomeStatement(Period = FQ-2)",
                "TR.F.IncomeStatement(Period=FQ-2).FccItemName",
            ],
        )
        df4 = ld.get_history(
            universe = [ric],
            start = '2000-01-01',
            end = '2025-01-01',
            fields = [
                "TR.F.IncomeStatement(Period = FQ-3)",
                "TR.F.IncomeStatement(Period=FQ-3).FccItemName",
            ],
        )   
    except Exception:
        # Skip this RIC if any of the four requests fails
        continue
    # Concatenate the DataFrames for each RIC
    dfs = pd.concat([df1, df2, df3, df4])
   
    dfs["RIC"] = ric 
    
    if dfs.empty:
        print(f"No data found for RIC {ric}.")
        empty_count += 1
        continue 
    
    # Append the concatenated DataFrame to the results list
    results.append(dfs)

# Concatenate all DataFrames in the results list into a single DataFrame
income_statements = pd.concat(results)

print(f"Number of RICs with no data: {empty_count}")

In [7]:
df = income_statements.copy()

# Reset index and rename to Date
df = df.reset_index().rename(columns={'index': 'Date'})

# Get rid of timestamp
df["Date"] = pd.to_datetime(df["Date"]).dt.date

# Create column to indicate type of statement
df["statement"] = "income_statement"
df

Unnamed: 0,Date,STD Income Statement All,FCC Item Name,RIC,statement
0,2011-09-30,,,AA.N,income_statement
1,2012-09-30,,,AA.N,income_statement
2,2013-09-30,,,AA.N,income_statement
3,2014-09-30,,,AA.N,income_statement
4,2016-06-30,2323000000.0,TR.F.RevGoodsSrvc,AA.N,income_statement
...,...,...,...,...,...
10868820,2023-03-31,29644164.03785,TR.F.TaxAdjOpInc,ZWS.N,income_statement
10868821,2023-03-31,303700000.0,TR.F.OpExpnExclNonCashChrgTot,ZWS.N,income_statement
10868822,2023-03-31,45600000.0,TR.F.IncAvailToComShrBefDeprAmort,ZWS.N,income_statement
10868823,2023-03-31,9600000.0,TR.F.FixedChrg,ZWS.N,income_statement


In [8]:
# Save the data to a CSV file
df.to_csv("data/income_statements.csv", index=False)

In [9]:
ld.close_session()

---

### Inspecting the data and downloading missing values

In [4]:
balance_sheets = pd.read_csv("data/balance_sheets.csv", dtype={"RIC": str})
income_statements = pd.read_csv("data/income_statements.csv", dtype={"RIC": str})
cash_flow_statements = pd.read_csv("data/cash_flow_statements.csv", dtype={"RIC": str})

In [5]:
balance_sheets["RIC"].nunique(), income_statements["RIC"].nunique(), cash_flow_statements["RIC"].nunique()

(1338, 1453, 1203)

We started with 1505 unique CIKs, for which 1499 unique RICs could be looked up. However, none of the downloaded statement files covers all of them: the cash flow statements contain only 1203 RICs, so almost 300 are missing, while the balance sheets (1338 RICs) and income statements (1453 RICs) have smaller gaps.  
Therefore, after determining which RICs are missing, I will try to download the missing data in the cells below.
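As a quick check of the "almost 300" figure, the gaps can be computed from the counts reported above. This is only arithmetic on those numbers; the set differences computed in the following cells can deviate slightly, for example if `ric_list` contains duplicates or missing values.

```python
total_rics = 1499  # unique RICs that could be looked up

# Unique RICs present in each downloaded file (from the output above)
downloaded = {
    "balance_sheets": 1338,
    "income_statements": 1453,
    "cash_flow_statements": 1203,
}

# Number of RICs without any rows in each file
missing = {name: total_rics - count for name, count in downloaded.items()}
for name, gap in missing.items():
    print(f"{name}: {gap} RICs missing ({gap / total_rics:.1%})")
```

This gives gaps of 161 (balance sheets), 46 (income statements) and 296 (cash flow statements).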

In [None]:
# Check which RICs are missing in each DataFrame
missing_balance_sheets = set(ric_list) - set(balance_sheets["RIC"].unique())
missing_cash_flow_statements = set(ric_list) - set(cash_flow_statements["RIC"].unique())
missing_income_statements = set(ric_list) - set(income_statements["RIC"].unique())

In [19]:
# Convert to list for download
missing_balance_sheets_list = list(missing_balance_sheets)
missing_cash_flow_statements_list = list(missing_cash_flow_statements)
missing_income_statements_list = list(missing_income_statements)
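Two details of the set difference are worth handling explicitly: `list(set(...))` has no stable order, so the positions printed in the progress logs are not reproducible between runs, and a missing RIC (`nan`) in `ric_list` is treated like a regular identifier. A minimal sketch that addresses both, using plain lists as stand-ins for the DataFrame columns; `missing_rics` is a hypothetical helper name:

```python
def missing_rics(requested, downloaded):
    """Return the requested RICs with no downloaded rows, sorted for a stable order."""
    def is_valid(value):
        # Only non-blank strings count as RICs; None and float NaN are dropped
        return isinstance(value, str) and value.strip() != ""

    wanted = {r for r in requested if is_valid(r)}
    have = {r for r in downloaded if is_valid(r)}
    return sorted(wanted - have)


requested = ["AA.N", "AAL.OQ", float("nan"), "GEV.N", "STEP.OQ"]
downloaded = ["AA.N", "AAL.OQ", "AA.N"]
print(missing_rics(requested, downloaded))
```

In the notebook, `requested` would be `ric_list` and `downloaded` the `"RIC"` column of the respective statement DataFrame.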

In [17]:
ld.open_session()

<lseg.data.session.Definition object at 0x2a704318990 {name='workspace'}>

- Try downloading missing balance sheets once more
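Because the first run produced repeated `ReadTimeout` messages, a retry with increasing wait times may recover some requests without a second pass. The sketch below is generic Python and makes assumptions: `with_retries` and `flaky_request` are hypothetical names, and whether `ld.get_history` raises an exception on a timeout or only logs it is not confirmed here.

```python
import time


def with_retries(func, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func() up to `attempts` times, doubling the wait after each failure."""
    for attempt in range(attempts):
        try:
            return func()
        except Exception as exc:  # narrow this to the timeout error type if known
            if attempt == attempts - 1:
                raise  # give up after the last attempt
            delay = base_delay * 2 ** attempt
            print(f"Attempt {attempt + 1} failed ({exc!r}); retrying in {delay:.0f}s")
            sleep(delay)


# Stand-in request that times out twice before succeeding
calls = {"n": 0}


def flaky_request():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("timed out")
    return "data"


# The sleep is replaced by a no-op so the example finishes immediately
result = with_retries(flaky_request, sleep=lambda seconds: None)
print(result, calls["n"])
```

In the download loops, `func` could be a `lambda` wrapping a single `ld.get_history` call, so that one slow request does not discard the other three periods of the same RIC.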

In [20]:
# Initialize an empty list to store DataFrames
results = []
# Loop through each missing RIC and download the balance sheet data
for i, ric in enumerate(missing_balance_sheets_list):
    print(f"Processing missing RIC {i+1}/{len(missing_balance_sheets_list)}: {ric}")
    # Download balance sheets for each RIC and every quarter separately
    try:
        df1 = ld.get_history(
            universe=[ric],
            start='2000-01-01',
            end='2025-01-01',
            fields=[
                "TR.F.BalanceSheet(Period = FQ0)",
                "TR.F.BalanceSheet(Period=FQ0).FccItemName",
            ],
        )
        # Add RIC column to the DataFrame to identify the company
        df1['RIC'] = ric
        df2 = ld.get_history(
            universe = [ric],
            start = '2000-01-01',
            end = '2025-01-01',
            fields = [
                "TR.F.BalanceSheet(Period = FQ-1)",
                "TR.F.BalanceSheet(Period=FQ-1).FccItemName",
            ],
        )
        df2['RIC'] = ric
        df3 = ld.get_history(
            universe = [ric],
            start = '2000-01-01',
            end = '2025-01-01',
            fields = [
                "TR.F.BalanceSheet(Period = FQ-2)",
                "TR.F.BalanceSheet(Period=FQ-2).FccItemName",
            ],
        )
        df3['RIC'] = ric
        df4 = ld.get_history(
            universe = [ric],
            start = '2000-01-01',
            end = '2025-01-01',
            fields = [
                "TR.F.BalanceSheet(Period = FQ-3)",
                "TR.F.BalanceSheet(Period=FQ-3).FccItemName",
            ],
        )   
        df4['RIC'] = ric
    except Exception:
        # If a request fails, the current RIC is skipped
        continue

    # Concatenate the DataFrames for each RIC
    dfs = pd.concat([df1, df2, df3, df4])

    # Report the first fiscal-quarter offset whose download came back empty
    if df1.empty:
        print(f"df1 empty for RIC {ric}.")
    elif df2.empty:
        print(f"df2 empty for RIC {ric}.")
    elif df3.empty:
        print(f"df3 empty for RIC {ric}.")
    elif df4.empty:
        print(f"df4 empty for RIC {ric}.")

    # Skip the RIC only if no data was returned at all
    if dfs.empty:
        continue

    # Append the concatenated DataFrame to the results list
    results.append(dfs)

# Concatenate all DataFrames in the results list into a single DataFrame
balance_sheets = pd.concat(results)

Processing missing RIC 1/156: STEP.OQ




Processing missing RIC 2/156: UCB.N




Processing missing RIC 3/156: NHC.A
Processing missing RIC 4/156: MTUS.N




Processing missing RIC 5/156: WSR.N




Processing missing RIC 6/156: VSH.N
Processing missing RIC 7/156: SNDK.OQ




Processing missing RIC 8/156: SMTC.OQ
Processing missing RIC 9/156: NVRI.N
Processing missing RIC 10/156: SNEX.OQ
Processing missing RIC 11/156: STRL.OQ




Processing missing RIC 12/156: TMP.A
Processing missing RIC 13/156: NPK.N
Processing missing RIC 14/156: SHO.N




Processing missing RIC 15/156: TNC.N
Processing missing RIC 16/156: VRTS.N




Processing missing RIC 17/156: TGNA.N
Processing missing RIC 18/156: WAFD.OQ
Processing missing RIC 19/156: SKT.N
Processing missing RIC 20/156: VIR.OQ




Processing missing RIC 21/156: SNCY.OQ




Processing missing RIC 22/156: WGO.N
Processing missing RIC 23/156: TPH.N




Processing missing RIC 24/156: MXL.OQ




Processing missing RIC 25/156: NMIH.OQ




Processing missing RIC 26/156: ORI.N
Processing missing RIC 27/156: NX.N
Processing missing RIC 28/156: SPSC.OQ




Processing missing RIC 29/156: WERN.OQ
Processing missing RIC 30/156: VBTX.OQ




Processing missing RIC 31/156: VSAT.OQ
Processing missing RIC 32/156: SM.N
Processing missing RIC 33/156: NSIT.OQ
Processing missing RIC 34/156: KEY.N
Processing missing RIC 35/156: TWI.N
Processing missing RIC 36/156: SXI.N
Processing missing RIC 37/156: OPCH.OQ
Processing missing RIC 38/156: SLVM.N




Processing missing RIC 39/156: NXRT.N




Processing missing RIC 40/156: NBHC.N




Processing missing RIC 41/156: VICR.OQ
Processing missing RIC 42/156: WD.N




Processing missing RIC 43/156: UNF.N
Processing missing RIC 44/156: RS.N
Processing missing RIC 45/156: WDFC.OQ
Processing missing RIC 46/156: SPXC.N
Processing missing RIC 47/156: VSTS.N




Processing missing RIC 48/156: TDW.N
Processing missing RIC 49/156: TRNO.N




Processing missing RIC 50/156: STBA.OQ
Processing missing RIC 51/156: TTMI.OQ




Processing missing RIC 52/156: UNFI.N
Processing missing RIC 53/156: MTX.N
Processing missing RIC 54/156: WOR.N
Processing missing RIC 55/156: UFPT.OQ
Processing missing RIC 56/156: MWA.N




Processing missing RIC 57/156: VTLE.N




Processing missing RIC 58/156: NGVT.N




Processing missing RIC 59/156: VECO.OQ
Processing missing RIC 60/156: VIAV.OQ
Processing missing RIC 61/156: nan
Processing missing RIC 62/156: VSCO.N




Processing missing RIC 63/156: NWL.OQ
Processing missing RIC 64/156: GEV.N
df1 empty for RIC GEV.N.
Processing missing RIC 65/156: UCTT.OQ




Processing missing RIC 66/156: RAL.N
df1 empty for RIC RAL.N.
Processing missing RIC 67/156: UHT.N
Processing missing RIC 68/156: TDS.N
Processing missing RIC 69/156: NBTB.OQ
Processing missing RIC 70/156: MYGN.OQ
Processing missing RIC 71/156: TRMK.OQ
Processing missing RIC 72/156: TRUP.OQ




Processing missing RIC 73/156: TWO.N




Processing missing RIC 74/156: STAA.OQ
Processing missing RIC 75/156: SMPL.OQ




Processing missing RIC 76/156: TGI.N
Processing missing RIC 77/156: WSFS.OQ
Processing missing RIC 78/156: OFG.N




Processing missing RIC 79/156: NEO.OQ




Processing missing RIC 80/156: UVV.N




Processing missing RIC 81/156: NPO.N




Processing missing RIC 82/156: TFX.N
Processing missing RIC 83/156: UFCS.OQ
Processing missing RIC 84/156: SITC.N




Processing missing RIC 85/156: SLP.OQ
Processing missing RIC 86/156: TMDX.OQ




Processing missing RIC 87/156: STEL.N




Processing missing RIC 88/156: WABC.OQ
Processing missing RIC 89/156: URBN.OQ
Processing missing RIC 90/156: NWBI.OQ
Processing missing RIC 91/156: USPH.N
Processing missing RIC 92/156: TBBK.OQ




Processing missing RIC 93/156: NWN.N




Processing missing RIC 94/156: SMP.N
Processing missing RIC 95/156: USNA.N
Processing missing RIC 96/156: VRRM.OQ




Processing missing RIC 97/156: SIG.N




Processing missing RIC 98/156: SUPN.OQ




Processing missing RIC 99/156: THRM.OQ
Processing missing RIC 100/156: UPBD.OQ
Processing missing RIC 101/156: WHD.N




Processing missing RIC 102/156: STRA.OQ
Processing missing RIC 103/156: THRY.OQ




Processing missing RIC 104/156: WKC.N




Processing missing RIC 105/156: NATL.N




Processing missing RIC 106/156: WRLD.OQ
Processing missing RIC 107/156: SKY.N
Processing missing RIC 108/156: SXC.N




Processing missing RIC 109/156: TNDM.OQ




Processing missing RIC 110/156: NVEE.OQ




Processing missing RIC 111/156: TALO.N




Processing missing RIC 112/156: NABL.N




Processing missing RIC 113/156: MYRG.OQ




Processing missing RIC 114/156: NAVI.OQ




Processing missing RIC 115/156: UNIT.OQ




Processing missing RIC 116/156: MRP.N
df1 empty for RIC MRP.N.
Processing missing RIC 117/156: ORA.N




Processing missing RIC 118/156: SSTK.N




Processing missing RIC 119/156: TRST.OQ
Processing missing RIC 120/156: UE.N




Processing missing RIC 121/156: TGTX.OQ
Processing missing RIC 122/156: TILE.OQ




Processing missing RIC 123/156: VRE.N
Processing missing RIC 124/156: TFIN.OQ




Processing missing RIC 125/156: TRN.N
Processing missing RIC 126/156: TR.N
Processing missing RIC 127/156: THS.N




Processing missing RIC 128/156: WSC.OQ




Processing missing RIC 129/156: TTGT.OQ




df3 empty for RIC TTGT.OQ.
Processing missing RIC 130/156: VCEL.OQ




Processing missing RIC 131/156: WLY.N
Processing missing RIC 132/156: VTOL.N




Processing missing RIC 133/156: LNN.N
Processing missing RIC 134/156: VIRT.N




Processing missing RIC 135/156: NTCT.OQ
Processing missing RIC 136/156: SPTN.OQ




Processing missing RIC 137/156: SKYW.OQ




Processing missing RIC 138/156: UTL.N
Processing missing RIC 139/156: SITM.OQ




Processing missing RIC 140/156: CMC.N
Processing missing RIC 141/156: TRIP.OQ




Processing missing RIC 142/156: NEOG.OQ
Processing missing RIC 143/156: WS.N




Processing missing RIC 144/156: SPNT.N




Processing missing RIC 145/156: SONO.OQ




Processing missing RIC 146/156: SNDR.N




Processing missing RIC 147/156: OGN.N




Processing missing RIC 148/156: NYMT.OQ




Processing missing RIC 149/156: CNH.N




Processing missing RIC 150/156: NOG.N




Processing missing RIC 151/156: SLG.N
Processing missing RIC 152/156: STC.N
Processing missing RIC 153/156: SXT.N




Processing missing RIC 154/156: TDC.N




Processing missing RIC 155/156: HTO.OQ
Processing missing RIC 156/156: VYX.N




Manual attempts to download the remaining data, i.e. df1 (FQ0) for GEV.N, RAL.N and MRP.N as well as df3 (FQ-2) for TTGT.OQ, also returned empty DataFrames. These reports therefore have to be left out.
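The `if`/`elif` chain in the retry loop reports only the first empty period per RIC, so further gaps for the same company can go unnoticed. A small sketch that lists every empty period instead; plain lists stand in for the DataFrames, the mapping of `df1` to `df4` onto FQ0 to FQ-3 follows the download cell, and `empty_periods` is a hypothetical helper name:

```python
def empty_periods(frames_by_period):
    """Return every period whose download contains no rows."""
    return [period for period, rows in frames_by_period.items() if len(rows) == 0]


# Plain lists stand in for the DataFrames returned per period;
# len() also gives the number of rows for a pandas DataFrame
frames_by_period = {
    "FQ0": [],
    "FQ-1": [("2023-03-31", 1.0)],
    "FQ-2": [],
    "FQ-3": [("2022-09-30", 2.0)],
}
print(empty_periods(frames_by_period))
```

Collecting these lists per RIC would give a complete record of which reports are left out of the final data set.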

In [23]:
# Work with a copy just to be safe
df = balance_sheets.copy()

# Reset index and rename to Date
df = df.reset_index().rename(columns={'index': 'Date'})

# Indicate statement type
df["Statement"] = "balance_sheet"

# Get rid of timestamp
df["Date"] = pd.to_datetime(df["Date"]).dt.date

# Save to CSV
df.to_csv("data/missing_balance_sheets.csv", index=False)

- Try downloading missing cash flow statements

In [25]:
# Initialize an empty list to store DataFrames
results = []

# Loop through each missing RIC and download the cash flow statement data
for i, ric in enumerate(missing_cash_flow_statements_list):
    print(f"Processing RIC {i+1}/{len(missing_cash_flow_statements_list)}: {ric}")
    # Download cash flow statements for each RIC and every quarter separately
    try:
        df1 = ld.get_history(
            universe=[ric],
            start='2000-01-01',
            end='2025-01-01',
            fields=[
                "TR.F.CashFlowStatement(Period = FQ0)",
                "TR.F.CashFlowStatement(Period=FQ0).FccItemName",
            ],
        )
        # Same request for the previous fiscal quarter (FQ-1)
        df2 = ld.get_history(
            universe = [ric],
            start = '2000-01-01',
            end = '2025-01-01',
            fields = [
                "TR.F.CashFlowStatement(Period = FQ-1)",
                "TR.F.CashFlowStatement(Period=FQ-1).FccItemName",
            ],
        )
        df3 = ld.get_history(
            universe = [ric],
            start = '2000-01-01',
            end = '2025-01-01',
            fields = [
                "TR.F.CashFlowStatement(Period = FQ-2)",
                "TR.F.CashFlowStatement(Period=FQ-2).FccItemName",
            ],
        )
        df4 = ld.get_history(
            universe = [ric],
            start = '2000-01-01',
            end = '2025-01-01',
            fields = [
                "TR.F.CashFlowStatement(Period = FQ-3)",
                "TR.F.CashFlowStatement(Period=FQ-3).FccItemName",
            ],
        )   
    except Exception:
        # Skip this RIC if any of the four requests fails
        continue
    # Concatenate the DataFrames for each RIC
    dfs = pd.concat([df1, df2, df3, df4])
    dfs["RIC"] = ric  # Add RIC column to the concatenated DataFrame

    # Report the first fiscal-quarter offset whose download came back empty
    if df1.empty:
        print(f"df1 empty for RIC {ric}.")
    elif df2.empty:
        print(f"df2 empty for RIC {ric}.")
    elif df3.empty:
        print(f"df3 empty for RIC {ric}.")
    elif df4.empty:
        print(f"df4 empty for RIC {ric}.")

    # Skip RICs for which no data was returned at all
    if dfs.empty:
        print(f"No data found for RIC {ric}.")
        continue

    # Append the concatenated DataFrame to the results list
    results.append(dfs)

# Concatenate all DataFrames in the results list into a single DataFrame
cash_flow_statements = pd.concat(results)

Processing RIC 1/291: MCRI.OQ
Processing RIC 2/291: STEP.OQ




Processing RIC 3/291: REX.N
Processing RIC 4/291: UCB.N




Processing RIC 5/291: MODG.N
Processing RIC 6/291: ROCK.OQ
Processing RIC 7/291: PRSU.N
Processing RIC 8/291: MOGa.N
Processing RIC 9/291: NHC.A
Processing RIC 10/291: MTUS.N




Processing RIC 11/291: MLAB.OQ
Processing RIC 12/291: RNST.N
Processing RIC 13/291: LUMN.N
Processing RIC 14/291: WSR.N




Processing RIC 15/291: WHR.N
Processing RIC 16/291: YOU.N




Processing RIC 17/291: VSH.N
Processing RIC 18/291: SNDK.OQ




Processing RIC 19/291: SMTC.OQ
Processing RIC 20/291: PGNY.OQ




Processing RIC 21/291: NVRI.N
Processing RIC 22/291: SFNC.OQ
Processing RIC 23/291: SNEX.OQ
Processing RIC 24/291: STRL.OQ




Processing RIC 25/291: SEE.N
Processing RIC 26/291: TMP.A
Processing RIC 27/291: NPK.N




Processing RIC 28/291: SHO.N




Processing RIC 29/291: RAMP.N
Processing RIC 30/291: PSMT.OQ
Processing RIC 31/291: TNC.N
Processing RIC 32/291: SHAK.N




Processing RIC 33/291: LXP.N
Processing RIC 34/291: VRTS.N




Processing RIC 35/291: TGNA.N




Processing RIC 36/291: WAFD.OQ
Processing RIC 37/291: VIR.OQ




Processing RIC 38/291: SKT.N
Processing RIC 39/291: MDU.N
Processing RIC 40/291: SNCY.OQ




Processing RIC 41/291: MATX.N
Processing RIC 42/291: WGO.N
Processing RIC 43/291: TPH.N




Processing RIC 44/291: MXL.OQ




Processing RIC 45/291: NMIH.OQ




Processing RIC 46/291: BLK.N




Processing RIC 47/291: PFS.N




Processing RIC 48/291: MGEE.OQ
Processing RIC 49/291: NX.N
Processing RIC 50/291: SPSC.OQ




Processing RIC 51/291: MHO.N
Processing RIC 52/291: PATK.OQ
Processing RIC 53/291: MSGS.N




Processing RIC 54/291: WERN.OQ
Processing RIC 55/291: SBH.N




Processing RIC 56/291: VSAT.OQ
Processing RIC 57/291: SM.N
Processing RIC 58/291: NSIT.OQ
Processing RIC 59/291: PLAY.OQ




Processing RIC 60/291: VBTX.OQ




Processing RIC 61/291: TWI.N
Processing RIC 62/291: SXI.N
Processing RIC 63/291: MTH.N
Processing RIC 64/291: LPG.N




Processing RIC 65/291: PEB.N




Processing RIC 66/291: PFBC.OQ




Processing RIC 67/291: SCVL.OQ
Processing RIC 68/291: PRAA.OQ




Processing RIC 69/291: PMT.N




Processing RIC 70/291: SLVM.N




Processing RIC 71/291: PTGX.OQ




Processing RIC 72/291: NXRT.N




Processing RIC 73/291: MRTN.OQ
Processing RIC 74/291: NBHC.N




Processing RIC 75/291: VICR.OQ
Processing RIC 76/291: WD.N




Processing RIC 77/291: XPEL.OQ




Processing RIC 78/291: PRVA.OQ




Processing RIC 79/291: MSEX.OQ




Processing RIC 80/291: PDFS.OQ




Processing RIC 81/291: VYX.N
Processing RIC 82/291: SPXC.N
Processing RIC 83/291: UNF.N
Processing RIC 84/291: PCRX.OQ




Processing RIC 85/291: VSTS.N




Processing RIC 86/291: PRG.N
Processing RIC 87/291: ROG.N
Processing RIC 88/291: PRA.N
Processing RIC 89/291: PRDO.OQ
Processing RIC 90/291: TDW.N
Processing RIC 91/291: TRNO.N




Processing RIC 92/291: LQDT.OQ




Processing RIC 93/291: STBA.OQ
Processing RIC 94/291: SAFT.OQ




Processing RIC 95/291: TTMI.OQ




Processing RIC 96/291: UNFI.N
Processing RIC 97/291: QNST.OQ




Processing RIC 98/291: RHI.N
Processing RIC 99/291: MTX.N
Processing RIC 100/291: OUT.N




Processing RIC 101/291: WOR.N




Processing RIC 102/291: UFPT.OQ
Processing RIC 103/291: MWA.N




Processing RIC 104/291: PLMR.OQ




Processing RIC 105/291: VTLE.N




Processing RIC 106/291: PAYO.OQ




Processing RIC 107/291: WT.N




Processing RIC 108/291: NGVT.N




Processing RIC 109/291: RXO.N




Processing RIC 110/291: VECO.OQ




Processing RIC 111/291: VIAV.OQ
Processing RIC 112/291: nan
Processing RIC 113/291: VSCO.N




Processing RIC 114/291: POWL.OQ




Processing RIC 115/291: NWL.OQ
Processing RIC 116/291: UCTT.OQ




Processing RIC 117/291: MGPI.OQ




Processing RIC 118/291: MC.N




Processing RIC 119/291: PIPR.N




Processing RIC 120/291: RAL.N
df1 empty for RIC RAL.N.
No data found for RIC RAL.N.
Processing RIC 121/291: UHT.N
Processing RIC 122/291: XNCR.OQ




Processing RIC 123/291: TDS.N
Processing RIC 124/291: PK.N




Processing RIC 125/291: RWT.N
Processing RIC 126/291: RC.N




Processing RIC 127/291: MD.N
Processing RIC 128/291: OTTR.OQ
Processing RIC 129/291: NBTB.OQ
Processing RIC 130/291: SAFE.N




Processing RIC 131/291: ZD.OQ
Processing RIC 132/291: BX.N




Processing RIC 133/291: LZB.N
Processing RIC 134/291: MYGN.OQ




Processing RIC 135/291: RCUS.N




Processing RIC 136/291: MP.N




Processing RIC 137/291: QRVO.OQ
Processing RIC 138/291: MNRO.OQ
Processing RIC 139/291: MBC.N




Processing RIC 140/291: RDNT.OQ
Processing RIC 141/291: TRMK.OQ
Processing RIC 142/291: PLAB.OQ




Processing RIC 143/291: TRUP.OQ




Processing RIC 144/291: MMI.N




Processing RIC 145/291: TWO.N




Processing RIC 146/291: PENN.OQ
Processing RIC 147/291: STAA.OQ
Processing RIC 148/291: SMPL.OQ




Processing RIC 149/291: TGI.N
Processing RIC 150/291: WSFS.OQ
Processing RIC 151/291: OFG.N




Processing RIC 152/291: RUSHA.OQ
Processing RIC 153/291: RES.N
Processing RIC 154/291: NEO.OQ




Processing RIC 155/291: UVV.N




Processing RIC 156/291: PLXS.OQ
Processing RIC 157/291: NPO.N




Processing RIC 158/291: PRK.A
Processing RIC 159/291: SCL.N
Processing RIC 160/291: OII.N




Processing RIC 161/291: UFCS.OQ
Processing RIC 162/291: TFX.N
Processing RIC 163/291: MARA.OQ




Processing RIC 164/291: SITC.N
Processing RIC 165/291: RDN.N




Processing RIC 166/291: TMDX.OQ




Processing RIC 167/291: SLP.OQ
Processing RIC 168/291: STEL.N




Processing RIC 169/291: WABC.OQ
Processing RIC 170/291: URBN.OQ
Processing RIC 171/291: NWBI.OQ




Processing RIC 172/291: SFBS.N




Processing RIC 173/291: RHP.N
Processing RIC 174/291: SDGR.OQ




Processing RIC 175/291: PLUS.OQ
Processing RIC 176/291: OSIS.OQ
Processing RIC 177/291: USPH.N
Processing RIC 178/291: SBSI.N




Processing RIC 179/291: PTEN.OQ




Processing RIC 180/291: TBBK.OQ




Processing RIC 181/291: YELP.N




Processing RIC 182/291: MCW.OQ




Processing RIC 183/291: NWN.N
Processing RIC 184/291: PBH.N




Processing RIC 185/291: PJT.N




Processing RIC 186/291: SMP.N
Processing RIC 187/291: SHEN.OQ
Processing RIC 188/291: VRRM.OQ




Processing RIC 189/291: XHR.N




Processing RIC 190/291: SIG.N




Processing RIC 191/291: PRLB.N




Processing RIC 192/291: SUPN.OQ




Processing RIC 193/291: USNA.N
Processing RIC 194/291: THRM.OQ
Processing RIC 195/291: PI.OQ




Processing RIC 196/291: SEDG.OQ




Processing RIC 197/291: UPBD.OQ
Processing RIC 198/291: WHD.N




Processing RIC 199/291: STRA.OQ
Processing RIC 200/291: OI.N
Processing RIC 201/291: LRN.N




Processing RIC 202/291: SABR.OQ




Processing RIC 203/291: PRGS.OQ
Processing RIC 204/291: PZZA.OQ
Processing RIC 205/291: MMSI.OQ
Processing RIC 206/291: RUN.OQ




Processing RIC 207/291: PARR.N




Processing RIC 208/291: THRY.OQ




Processing RIC 209/291: MATW.OQ
Processing RIC 210/291: WKC.N




Processing RIC 211/291: REZI.N




Processing RIC 212/291: MRCY.OQ
Processing RIC 213/291: NATL.N




Processing RIC 214/291: WRLD.OQ
Processing RIC 215/291: SKY.N
Processing RIC 216/291: SXC.N




Processing RIC 217/291: TNDM.OQ




Processing RIC 218/291: NVEE.OQ




Processing RIC 219/291: TALO.N




Processing RIC 220/291: WDFC.OQ
Processing RIC 221/291: OMI.N
Processing RIC 222/291: OXM.N




Processing RIC 223/291: MGY.N




Processing RIC 224/291: QDEL.OQ




Processing RIC 225/291: NABL.N




Processing RIC 226/291: MYRG.OQ




Processing RIC 227/291: SEM.N




Processing RIC 228/291: NAVI.OQ




Processing RIC 229/291: UNIT.OQ




Processing RIC 230/291: MPW.N




Processing RIC 231/291: MAC.N
Processing RIC 232/291: PBI.N
Processing RIC 233/291: LTC.N
Processing RIC 234/291: MRP.N
df1 empty for RIC MRP.N.
No data found for RIC MRP.N.
Processing RIC 235/291: PHIN.N




Processing RIC 236/291: MTRN.N
Processing RIC 237/291: MLKN.OQ
Processing RIC 238/291: PPBI.OQ
Processing RIC 239/291: PINC.OQ




Processing RIC 240/291: RGR.N
Processing RIC 241/291: WWW.N




Processing RIC 242/291: TRST.OQ
Processing RIC 243/291: SSTK.N




Processing RIC 244/291: UE.N




Processing RIC 245/291: SBCF.OQ
Processing RIC 246/291: PAHC.OQ




Processing RIC 247/291: TGTX.OQ




Processing RIC 248/291: TILE.OQ
Processing RIC 249/291: VRE.N
Processing RIC 250/291: TFIN.OQ




Processing RIC 251/291: TRN.N




Processing RIC 252/291: ZWS.N




Processing RIC 253/291: TR.N
Processing RIC 254/291: XRX.OQ
Processing RIC 255/291: FTDR.OQ




Processing RIC 256/291: PECO.OQ




Processing RIC 257/291: THS.N




Processing RIC 258/291: WSC.OQ




Processing RIC 259/291: TTGT.OQ




Processing RIC 260/291: MCY.N
Processing RIC 261/291: SCHL.OQ
Processing RIC 262/291: WLY.N
Processing RIC 263/291: VTOL.N




Processing RIC 264/291: LNN.N
Processing RIC 265/291: VCEL.OQ




Processing RIC 266/291: VIRT.N




Processing RIC 267/291: NTCT.OQ
Processing RIC 268/291: SPTN.OQ




Processing RIC 269/291: SKYW.OQ




Processing RIC 270/291: UTL.N
Processing RIC 271/291: SITM.OQ




Processing RIC 272/291: TRIP.OQ




Processing RIC 273/291: SANM.OQ
Processing RIC 274/291: NEOG.OQ
Processing RIC 275/291: PUMP.N




Processing RIC 276/291: WS.N




Processing RIC 277/291: OMCL.OQ




Processing RIC 278/291: SPNT.N




Processing RIC 279/291: SONO.OQ




Processing RIC 280/291: SNDR.N




Processing RIC 281/291: OGN.N




Processing RIC 282/291: NYMT.OQ




Processing RIC 283/291: SAH.N
Processing RIC 284/291: NOG.N




Processing RIC 285/291: SLG.N
Processing RIC 286/291: STC.N
Processing RIC 287/291: SHOO.OQ
Processing RIC 288/291: SXT.N




Processing RIC 289/291: TDC.N




Processing RIC 290/291: HTO.OQ
Processing RIC 291/291: SCSC.OQ


In [26]:
# Work with a copy just to be safe
df = cash_flow_statements.copy()

# Reset index and rename to Date
df = df.reset_index().rename(columns={'index': 'Date'})

# Indicate statement type
df["Statement"] = "cash_flow_statement"

# Drop the time component and keep only the calendar date
df["Date"] = pd.to_datetime(df["Date"]).dt.date

# Save to CSV
df.to_csv("data/missing_cash_flow_statements.csv", index=False)
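
The tidy-up in the cell above (index to `Date` column, statement label, date without time) is repeated for each statement type, so it could be wrapped in a small helper. The following is only a sketch on a toy frame: `tidy_statement`, the `Value` column, and the `AAA.N` RIC are made up for illustration and are not part of the notebook or of the LSEG library.

```python
import pandas as pd


def tidy_statement(raw: pd.DataFrame, statement: str) -> pd.DataFrame:
    """Move the date index into a 'Date' column and tag the statement type."""
    # rename_axis covers both an unnamed index and one already called "Date"
    out = raw.rename_axis("Date").reset_index()
    # Label every row with the statement it belongs to
    out["Statement"] = statement
    # Keep only the calendar date, dropping the time component
    out["Date"] = pd.to_datetime(out["Date"]).dt.date
    return out


# Toy stand-in for a downloaded statement (hypothetical values)
toy = pd.DataFrame(
    {"Value": [1.0, 2.0], "RIC": ["AAA.N", "AAA.N"]},
    index=pd.to_datetime(["2024-03-31 00:00:00", "2024-06-30 00:00:00"]),
)

tidy = tidy_statement(toy, "cash_flow_statement")
print(tidy)
```

With such a helper, the cash flow and income statement cells would differ only in the label and the output path passed to `to_csv`.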

- Retry the download for the companies whose income statements are still missing

In [27]:
# Initialize an empty list to store DataFrames
results = []
empty_count = 0

# Loop through each RIC and download the income statement data
for i, ric in enumerate(missing_income_statements_list):
    print(f"Processing RIC {i+1}/{len(missing_income_statements_list)}: {ric}")
    # Download income statements for each RIC and every quarter separately
    try:
        df1 = ld.get_history(
            universe=[ric],
            start='2000-01-01',
            end='2025-01-01',
            fields=[
                "TR.F.IncomeStatement(Period = FQ0)",
                "TR.F.IncomeStatement(Period=FQ0).FccItemName",
            ],
        )
        # Repeat the request for each of the three preceding fiscal quarters (FQ-1 to FQ-3)
        df2 = ld.get_history(
            universe = [ric],
            start = '2000-01-01',
            end = '2025-01-01',
            fields = [
                "TR.F.IncomeStatement(Period = FQ-1)",
                "TR.F.IncomeStatement(Period=FQ-1).FccItemName",
            ],
        )
        df3 = ld.get_history(
            universe = [ric],
            start = '2000-01-01',
            end = '2025-01-01',
            fields = [
                "TR.F.IncomeStatement(Period = FQ-2)",
                "TR.F.IncomeStatement(Period=FQ-2).FccItemName",
            ],
        )
        df4 = ld.get_history(
            universe = [ric],
            start = '2000-01-01',
            end = '2025-01-01',
            fields = [
                "TR.F.IncomeStatement(Period = FQ-3)",
                "TR.F.IncomeStatement(Period=FQ-3).FccItemName",
            ],
        )   
    except Exception:
        # Skip this RIC if any of the four requests raises an error
        continue
    # Concatenate the DataFrames for each RIC
    dfs = pd.concat([df1, df2, df3, df4])
    
    # Report the first quarter whose download came back empty
    if df1.empty:
        print(f"df1 empty for RIC {ric}.")
    elif df2.empty:
        print(f"df2 empty for RIC {ric}.")
    elif df3.empty:
        print(f"df3 empty for RIC {ric}.")
    elif df4.empty:
        print(f"df4 empty for RIC {ric}.")

    # Tag every row with its RIC to identify the company
    dfs["RIC"] = ric

    # Skip RICs for which all four quarters came back empty
    if dfs.empty:
        print(f"No data found for RIC {ric}.")
        empty_count += 1
        continue
    
    # Append the concatenated DataFrame to the results list
    results.append(dfs)

# Concatenate all DataFrames in the results list into a single DataFrame
income_statements = pd.concat(results)

print(f"Number of RICs with no data: {empty_count}")

Processing RIC 1/41: LITE.OQ




Processing RIC 2/41: SAIA.OQ




Processing RIC 3/41: MRP.N
df1 empty for RIC MRP.N.
No data found for RIC MRP.N.
Processing RIC 4/41: CLH.N
Processing RIC 5/41: CMA.N
Processing RIC 6/41: AKAM.OQ
Processing RIC 7/41: VFC.N
Processing RIC 8/41: SAIC.OQ




Processing RIC 9/41: ZBH.N




Processing RIC 10/41: FCN.N
Processing RIC 11/41: WY.N
Processing RIC 12/41: WSM.N




Processing RIC 13/41: VLY.OQ
Processing RIC 14/41: BR.N




Processing RIC 15/41: APD.N
Processing RIC 16/41: TNC.N
Processing RIC 17/41: CR.N




Processing RIC 18/41: nan
Processing RIC 19/41: ABNB.OQ




Processing RIC 20/41: ENSG.OQ




Processing RIC 21/41: XEL.OQ




Processing RIC 22/41: RAL.N
df1 empty for RIC RAL.N.
No data found for RIC RAL.N.
Processing RIC 23/41: TKR.N
Processing RIC 24/41: ZBRA.OQ




Processing RIC 25/41: PFGC.N




Processing RIC 26/41: WTW.OQ




Processing RIC 27/41: CMC.N
Processing RIC 28/41: CHWY.N




Processing RIC 29/41: ZTS.N




Processing RIC 30/41: ENTG.OQ




Processing RIC 31/41: XYL.N




Processing RIC 32/41: WDAY.OQ




Processing RIC 33/41: PEN.N




Processing RIC 34/41: WMB.N
Processing RIC 35/41: YUM.N
Processing RIC 36/41: LIVN.OQ




Processing RIC 37/41: BFb.N
Processing RIC 38/41: WYNN.OQ
Processing RIC 39/41: WDC.OQ
Processing RIC 40/41: RYN.N
Processing RIC 41/41: CHX.OQ




Number of RICs with no data: 2
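
The log above contains a `nan` entry (RIC 18/41), which suggests the input list holds a missing value, and the four near-identical requests per RIC could be expressed as a loop over periods. Below is a minimal sketch of both ideas, not the notebook's actual method. The names `clean_rics`, `download_quarters`, and `fake_fetch` are hypothetical, and `fake_fetch` only imitates a request so the example runs without a session. In practice the `fetch` argument would wrap the `ld.get_history` call used in the cell above.

```python
import pandas as pd

# Fiscal-quarter offsets requested in the cell above
PERIODS = ["FQ0", "FQ-1", "FQ-2", "FQ-3"]


def clean_rics(rics):
    """Drop missing entries (e.g. NaN) and duplicates while keeping the order."""
    valid = (r for r in rics if isinstance(r, str) and r.strip())
    return list(dict.fromkeys(valid))


def download_quarters(ric, fetch, periods=PERIODS):
    """Fetch one frame per period; return (combined frame, list of empty periods)."""
    frames, empty = [], []
    for period in periods:
        df = fetch(ric, period)
        if df.empty:
            empty.append(period)
        else:
            # Tag the rows with their RIC to identify the company
            frames.append(df.assign(RIC=ric))
    combined = pd.concat(frames) if frames else pd.DataFrame()
    return combined, empty


def fake_fetch(ric, period):
    """Hypothetical stand-in for the real request; returns toy data."""
    if ric == "NODATA.N" or period == "FQ-3":
        return pd.DataFrame()
    return pd.DataFrame({"Value": [1.0], "Period": [period]})


rics = clean_rics(["AAA.N", float("nan"), "NODATA.N", "AAA.N"])
results = {ric: download_quarters(ric, fake_fetch) for ric in rics}

for ric, (frame, empty) in results.items():
    print(ric, len(frame), "rows; empty periods:", empty)
```

This sketch keeps whatever quarters did return data and reports every empty period rather than only the first one, which is a design choice and may not match what is wanted for the final data set.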


In [28]:
# Work with a copy just to be safe
df = income_statements.copy()

# Reset index and rename to Date
df = df.reset_index().rename(columns={'index': 'Date'})

# Indicate statement type
df["Statement"] = "income_statement"

# Drop the time component and keep only the calendar date
df["Date"] = pd.to_datetime(df["Date"]).dt.date

# Save to CSV
df.to_csv("data/missing_income_statements.csv", index=False)
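
When the saved file is read back later, the `Date` column arrives as plain text unless it is parsed explicitly. The sketch below illustrates this with an in-memory buffer in place of the file on disk. The toy row is invented for illustration, and only standard pandas calls (`to_csv`, `read_csv` with `parse_dates`) are used.

```python
import io
from datetime import date

import pandas as pd

# Toy frame shaped like the saved output (hypothetical values)
saved = pd.DataFrame(
    {
        "Date": [date(2024, 3, 31)],
        "RIC": ["AAA.N"],
        "Statement": ["income_statement"],
    }
)

# Write to an in-memory buffer instead of a file on disk
buffer = io.StringIO()
saved.to_csv(buffer, index=False)
buffer.seek(0)

# parse_dates turns the ISO date strings back into datetime64 values
restored = pd.read_csv(buffer, parse_dates=["Date"])
print(restored.dtypes)
```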

In [29]:
ld.close_session()