### Creating Financial Datasets [Statements Key Metrics]

In this notebook, we will create financial datasets that will be used in the subsequent notebooks to build predictive data models.

The notebook will start with company profile data created previously.

The company profile data will be cleaned to include only stocks and to keep the qualitative information required for future model development.

For the remaining stocks, I will write code that connects to the Financial Modeling Prep API and downloads financial data for each stock.

The financial data will be:

1. Historical stock prices 
2. Financial statements key metrics __[in this notebook - Financial Statements Key Metrics]__
3. Financial statement ratios
4. Financial growth

Each of the above datasets will be saved as a separate CSV file.

The final dataset will be a combination of all the above datasets.

In [1]:
# import libraries

import pandas as pd




In [2]:
# importing cleaned company profile data

filepath='data/Datasets/company_profile_cleaned_50B.csv'

company_profile_data = pd.read_csv(filepath)


In [3]:
# checking the data of company profile data

company_profile_data.head()


Unnamed: 0,symbol,price,beta,mktCap,companyName,currency,cik,isin,cusip,exchange,...,sector,country,city,state,zip,isEtf,isActivelyTrading,isAdr,isFund,date
0,NVDA,141.98,1.657,3e-06,NVIDIA Corporation,USD,1045810.0,US67066G1040,67066G104,NASDAQ Global Select,...,Technology,US,Santa Clara,CA,95051,False,True,False,False,2024-12-02
1,AAPL,225.0,1.24,3e-06,Apple Inc.,USD,320193.0,US0378331005,037833100,NASDAQ Global Select,...,Technology,US,Cupertino,CA,95014,False,True,False,False,2024-12-02
2,MSFT,415.0,0.904,3e-06,Microsoft Corporation,USD,789019.0,US5949181045,594918104,NASDAQ Global Select,...,Technology,US,Redmond,WA,98052-6399,False,True,False,False,2024-12-02
3,AMZN,202.61,1.146,2e-06,"Amazon.com, Inc.",USD,1018724.0,US0231351067,023135106,NASDAQ Global Select,...,Consumer Cyclical,US,Seattle,WA,98109-5210,False,True,False,False,2024-12-02
4,GOOGL,172.49,1.034,2e-06,Alphabet Inc.,USD,1652044.0,US02079K3059,02079K305,NASDAQ Global Select,...,Communication Services,US,Mountain View,CA,94043,False,True,False,False,2024-12-02


In [4]:
# creating a list of stock symbols based on the company profile data

stocks = company_profile_data['symbol'].tolist()


In [5]:
# Creating a function that connects to the Financial Modeling Prep API (Key Metrics endpoint) and downloads financial data for each stock.
# The function iterates through the list of stocks defined above.
# The limit is a parameter, so more or less data can be downloaded as required.
# The function starts with an empty dataframe (financial_data_metrics) and appends the data for each stock to it.
# The empty dataframe is defined as a global variable outside the function, so it can also be changed outside the function.
# Data is appended after each API call, so if the API times out, the previously downloaded data is not lost.
# The function returns the dataframe with the data for all stocks once the iteration is complete.
# The data is accessed on a quarterly basis. An example endpoint call:
# https://financialmodelingprep.com/api/v3/key-metrics/AAPL?period=quarter&limit=10&apikey=demo_api_key
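The endpoint URL described above can be assembled from its parts instead of being written as one long string. This is a minimal sketch using only the standard library; the helper name `build_key_metrics_url` and the constant `BASE_URL` are illustrative and not part of the original notebook.

```python
from urllib.parse import urlencode

# Base URL taken from the example endpoint quoted in the notebook
BASE_URL = "https://financialmodelingprep.com/api/v3/key-metrics"


def build_key_metrics_url(symbol, limit, api_key, period="quarter"):
    """Hypothetical helper: build the Key Metrics request URL for one symbol."""
    query = urlencode({"period": period, "limit": limit, "apikey": api_key})
    return f"{BASE_URL}/{symbol}?{query}"


# Reproduces the example URL from the notebook
print(build_key_metrics_url("AAPL", 10, "demo_api_key"))
```

Keeping `period` and `limit` as arguments makes it easy to switch between quarterly and annual data or to change how many records are requested.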

In [6]:
# loading the API key from a .env file (the .env file is excluded from version control via .gitignore)

import requests
import os
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()

# Get the API key from environment variable
api_key = os.getenv('FMP_API_KEY')
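If the variable is not set, `os.getenv` returns `None`, and the request URL would then contain the literal text `apikey=None`. A small guard can fail early instead. The helper name `require_env` is hypothetical and only meant as a sketch.

```python
import os


def require_env(name):
    """Hypothetical helper: return an environment variable or fail with a clear message."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set; add it to the .env file.")
    return value


# Possible usage after load_dotenv():
# api_key = require_env('FMP_API_KEY')
```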


In [7]:
# define global variable

financial_data_metrics = pd.DataFrame()


In [8]:

import time  # required for time.sleep() in the retry loop

# Define function with retry logic
def get_financial_data(stocks, limit, retries=3, delay=5):
    global financial_data_metrics
    for stock in stocks:
        url = f'https://financialmodelingprep.com/api/v3/key-metrics/{stock}?period=quarter&limit={limit}&apikey={api_key}'
        for attempt in range(retries):
            try:
                response = requests.get(url, timeout=30)  # timeout so a stalled connection cannot hang the loop
                response.raise_for_status()  # Raise an error for bad status codes
                data = response.json()
                data = pd.DataFrame(data)
                # Append after every successful call so earlier results survive a later failure
                financial_data_metrics = pd.concat([financial_data_metrics, data], ignore_index=True)
                break  # Exit the retry loop if the request is successful
            except (requests.exceptions.RequestException, ConnectionResetError) as e:
                print(f"Attempt {attempt + 1} failed: {e}")
                if attempt < retries - 1:
                    time.sleep(delay)  # Wait before retrying
                else:
                    print(f"Failed to retrieve data for {stock} after {retries} attempts.")
    return financial_data_metrics
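An alternative to growing a global dataframe is to collect one frame per stock in a list and concatenate once at the end, while recording which symbols failed. This is a sketch of that idea, not the notebook's method. It assumes a hypothetical `fetch` callable that returns a list of records for a symbol (the role played by the `requests.get` call above). The names `collect_key_metrics` and `fake_fetch` are illustrative.

```python
import pandas as pd


def collect_key_metrics(symbols, fetch):
    """Hypothetical variant: gather per-symbol frames and report failures."""
    frames, failed = [], []
    for symbol in symbols:
        try:
            records = fetch(symbol)
        except Exception as exc:  # keep going so earlier results are not lost
            print(f"Failed to retrieve data for {symbol}: {exc}")
            failed.append(symbol)
            continue
        if records:
            frames.append(pd.DataFrame(records))
    combined = pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
    return combined, failed


def fake_fetch(symbol):
    """Stand-in for the API call; the values are made up for illustration."""
    if symbol == "BAD":
        raise ConnectionError("simulated timeout")
    return [{"symbol": symbol, "date": "2024-09-28", "roe": 0.1}]


combined, failed = collect_key_metrics(["AAPL", "BAD", "MSFT"], fake_fetch)
print(combined)
print("Failed symbols:", failed)
```

Returning the list of failed symbols makes it possible to retry only those stocks later instead of repeating the full download.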

In [9]:
# Testing the newly created function. I will download data for the first 5 stocks in the list, limited to 10 records per stock.

#financial_data_metrics = get_financial_data(stocks[:5], 10)

#financial_data_metrics.head()


In [10]:
# The test was successful. I will now download data for all the stocks in the list, limited to 10 years of quarterly data (40 records per stock).

financial_data_metrics = get_financial_data(stocks, 40)


In [11]:
# viewing the data

# sorting the data by symbol ascending and date descending order

financial_data_metrics = financial_data_metrics.sort_values(by=['symbol', 'date'], ascending=[True, False])

financial_data_metrics.head(100)
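The `date` column arrives as text. Sorting it as strings works here because the dates are in ISO `YYYY-MM-DD` format, but converting the column to a datetime type first makes the ordering independent of the string format. A minimal sketch on a small made-up frame:

```python
import pandas as pd

# Illustrative sample; only the column names mirror the real data
sample = pd.DataFrame({
    "symbol": ["MSFT", "AAPL", "AAPL"],
    "date": ["2024-06-30", "2023-12-30", "2024-09-28"],
})

# Convert the text dates to datetime before sorting
sample["date"] = pd.to_datetime(sample["date"])

# Symbol ascending, date descending, with a fresh 0..n-1 index
sample = sample.sort_values(by=["symbol", "date"], ascending=[True, False]).reset_index(drop=True)
print(sample)
```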


Unnamed: 0,symbol,date,calendarYear,period,revenuePerShare,netIncomePerShare,operatingCashFlowPerShare,freeCashFlowPerShare,cashPerShare,bookValuePerShare,...,averagePayables,averageInventory,daysSalesOutstanding,daysPayablesOutstanding,daysOfInventoryOnHand,receivablesTurnover,payablesTurnover,inventoryTurnover,roe,capexPerShare
40,AAPL,2024-09-28,2024,Q4,6.256925,0.971263,1.767138,1.575469,4.295481,3.753628,...,58267000000.0,6725500000.0,62.802802,121.572545,12.844802,1.433057,0.740299,7.006725,0.258753,0.191669
41,AAPL,2024-06-29,2024,Q3,5.599021,1.400000,1.883681,1.743277,4.034008,4.354308,...,46663500000.0,6198500000.0,45.297457,92.879672,12.036053,1.986866,0.968996,7.477534,0.321521,0.140405
42,AAPL,2024-03-30,2024,Q2,5.890812,1.534222,1.472817,1.343255,4.358732,4.815961,...,51949500000.0,6371500000.0,40.808568,84.933996,11.568830,2.205419,1.059646,7.779525,0.318570,0.129561
43,AAPL,2023-12-30,2024,Q1,7.709660,2.186752,2.572251,2.418025,4.713160,4.777636,...,60378500000.0,6421000000.0,37.710056,80.858158,9.054234,2.386631,1.113060,9.940101,0.457706,0.154225
44,AAPL,2023-09-30,2023,Q4,5.737259,1.471592,1.384537,1.245879,3.945977,3.983862,...,54655000000.0,6841000000.0,61.327069,114.833405,11.611542,1.467541,0.783744,7.750908,0.369388,0.138659
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
6562,ABNB,2020-12-31,2020,Q4,2.485182,-11.244526,-0.402337,-0.425133,18.484930,8.392599,...,62935000.0,0.0,228.474148,34.283303,0.000000,0.393918,2.625185,0.000000,-1.339815,0.022796
6563,ABNB,2020-09-30,2020,Q3,2.285990,0.373516,0.571646,0.558642,7.655345,0.547045,...,62935000.0,0.0,12.392934,18.200726,0.000000,7.262203,4.944858,0.000000,0.682788,0.013004
6564,ABNB,2020-06-30,2020,Q2,0.630525,-1.084082,-0.483029,-0.494680,12.037512,5.465317,...,39949000.0,0.0,0.000000,44.608618,0.000000,0.000000,2.017547,0.000000,-0.198357,0.011651
6565,ABNB,2020-03-31,2020,Q1,1.585531,-0.641507,-1.073237,-1.102745,5.790191,0.000000,...,75708500.0,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,-0.140524,0.029508


In [12]:
# getting information on the data

financial_data_metrics.info()


<class 'pandas.core.frame.DataFrame'>
Index: 11031 entries, 40 to 7155
Data columns (total 61 columns):
 #   Column                                  Non-Null Count  Dtype  
---  ------                                  --------------  -----  
 0   symbol                                  11031 non-null  object 
 1   date                                    11031 non-null  object 
 2   calendarYear                            11031 non-null  object 
 3   period                                  11031 non-null  object 
 4   revenuePerShare                         11031 non-null  float64
 5   netIncomePerShare                       11031 non-null  float64
 6   operatingCashFlowPerShare               10992 non-null  float64
 7   freeCashFlowPerShare                    10992 non-null  float64
 8   cashPerShare                            10987 non-null  float64
 9   bookValuePerShare                       10987 non-null  float64
 10  tangibleBookValuePerShare               10987 non-null  float64

In [13]:
# describing the data

financial_data_metrics.describe()


Unnamed: 0,revenuePerShare,netIncomePerShare,operatingCashFlowPerShare,freeCashFlowPerShare,cashPerShare,bookValuePerShare,tangibleBookValuePerShare,shareholdersEquityPerShare,interestDebtPerShare,marketCap,...,netCurrentAssetValue,investedCapital,daysSalesOutstanding,daysPayablesOutstanding,daysOfInventoryOnHand,receivablesTurnover,payablesTurnover,inventoryTurnover,roe,capexPerShare
count,11031.0,11031.0,10992.0,10992.0,10987.0,10987.0,10987.0,10987.0,10987.0,11031.0,...,10944.0,10944.0,10987.0,10990.0,10990.0,11020.0,11020.0,11020.0,11018.0,10992.0
mean,21.223853,2.459746,2.733499,1.618491,279.53013,92.667993,65.946252,84.916743,141.271899,383598100000.0,...,-1505631000000.0,1594099000000.0,74.666422,7317311.0,-49934280.0,2.838145,2.034324,234342100000.0,0.040399,1.136702
std,71.193627,10.545171,10.017696,8.96056,3126.079529,575.191431,537.905838,537.884825,1136.292279,2015743000000.0,...,14661850000000.0,15083530000000.0,171.625043,766231200.0,3786500000.0,24.158214,3.602372,24598660000000.0,1.61506,4.897069
min,-36.140034,-123.597565,-148.131287,-156.072627,-58815.148616,-745.016336,-4086.24499,-294.407985,-1.157915,0.0,...,-278378300000000.0,-6282709000000.0,-441.410888,-56820690.0,-333571800000.0,-9.943365,-14.266116,-25912000.0,-78.666667,0.0
25%,3.361908,0.34964,0.539944,0.130511,1.805474,8.040663,-5.279183,7.766369,7.302856,36943410000.0,...,-67647000000.0,10088430000.0,28.778373,19.28028,0.0,0.990328,0.474465,0.0,0.017374,0.102313
50%,6.508688,0.930914,1.440967,0.84378,5.288943,21.223587,5.017297,20.396331,18.863039,64981270000.0,...,-18907000000.0,29325500000.0,47.985968,49.44575,26.49626,1.604236,1.274246,0.9779852,0.037139,0.301569
75%,14.128924,1.913873,2.888438,2.062863,15.073484,43.143872,25.275539,41.089075,41.799765,130717700000.0,...,-4376000000.0,77490500000.0,69.304099,87.79395,80.10759,2.415729,2.350178,2.599463,0.07281,0.883712
max,1783.21216,259.545679,369.094666,340.728938,75559.184446,11135.951461,10401.047755,11031.420008,26663.187818,49048870000000.0,...,566710000000.0,255863700000000.0,9806.833241,80326320000.0,2623320.0,2400.062824,152.258065,2582274000000000.0,85.615385,162.933833
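The summary above shows negative minimums for several day-count metrics (for example `daysSalesOutstanding`). One way to see how widespread such values are is to count the negative entries per column. The sketch below uses a tiny made-up sample. Whether negative values are genuine or data errors is an assumption that has to be checked against the source.

```python
import pandas as pd

# Made-up sample values; only the column names come from the real dataset
sample = pd.DataFrame({
    "daysSalesOutstanding": [62.8, -441.4, 45.3],
    "daysOfInventoryOnHand": [12.8, 0.0, -5.0],
})

# Count negative entries in each column
negative_counts = (sample < 0).sum()
print(negative_counts)
```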


In [14]:
# saving the data as a CSV file in the data folder, named financial_data_KeyMetrics.csv

financial_data_metrics.to_csv('data/Datasets/financial_data_KeyMetrics.csv', index=False)
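A quick round trip can confirm that the saved file reads back as expected. The sketch below writes a small made-up frame to a temporary directory instead of the project's `data/Datasets` folder. Passing `index=False` matters because otherwise the index is written as an extra column, which shows up as `Unnamed: 0` when the file is read again.

```python
import os
import tempfile

import pandas as pd

# Illustrative sample; the values are made up
sample = pd.DataFrame({"symbol": ["AAPL", "MSFT"], "roe": [0.25, 0.09]})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "financial_data_KeyMetrics.csv")
    sample.to_csv(path, index=False)  # do not write the index as a column
    reloaded = pd.read_csv(path)

print(reloaded.shape)
print(list(reloaded.columns))
```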