### Creating Financial Datasets [Statement Ratios]

In this notebook, we will create financial datasets that will be used in the subsequent notebooks to build predictive data models.

The notebook will start with company profile data created previously.

The company profile data will be cleaned to include only stocks and to keep the qualitative information required for future model development.

For the remaining stocks, I will write code that connects to the Financial Modeling Prep API and downloads financial data for each stock.

The financial data will be:

1. Historical stock prices 
2. Financial statements key metrics 
3. Financial statement ratios __[in this notebook - Statement Ratios]__
4. Financial growth 

Each of the above datasets will be saved as a separate CSV file.

The final dataset will be a combination of all the above datasets.

In [12]:
# import libraries

import pandas as pd


In [13]:
# importing cleaned company profile data

filepath='data/Datasets/company_profile_cleaned_50B.csv'

company_profile_data = pd.read_csv(filepath)

#checking data on second dataset for quality check
#filepath_1='data/Datasets/processed_quant_data.csv'
 
#quant_data = pd.read_csv(filepath_1, low_memory=False)


In [14]:
# checking the data of company profile data

company_profile_data.head()


Unnamed: 0,symbol,price,beta,mktCap,companyName,currency,cik,isin,cusip,exchange,...,sector,country,city,state,zip,isEtf,isActivelyTrading,isAdr,isFund,date
0,NVDA,141.98,1.657,3e-06,NVIDIA Corporation,USD,1045810.0,US67066G1040,67066G104,NASDAQ Global Select,...,Technology,US,Santa Clara,CA,95051,False,True,False,False,2024-12-02
1,AAPL,225.0,1.24,3e-06,Apple Inc.,USD,320193.0,US0378331005,037833100,NASDAQ Global Select,...,Technology,US,Cupertino,CA,95014,False,True,False,False,2024-12-02
2,MSFT,415.0,0.904,3e-06,Microsoft Corporation,USD,789019.0,US5949181045,594918104,NASDAQ Global Select,...,Technology,US,Redmond,WA,98052-6399,False,True,False,False,2024-12-02
3,AMZN,202.61,1.146,2e-06,"Amazon.com, Inc.",USD,1018724.0,US0231351067,023135106,NASDAQ Global Select,...,Consumer Cyclical,US,Seattle,WA,98109-5210,False,True,False,False,2024-12-02
4,GOOGL,172.49,1.034,2e-06,Alphabet Inc.,USD,1652044.0,US02079K3059,02079K305,NASDAQ Global Select,...,Communication Services,US,Mountain View,CA,94043,False,True,False,False,2024-12-02


In [15]:
# creating a list of stock symbols based on the company profile data

stocks = company_profile_data['symbol'].tolist()


In [16]:
# Create a function that connects to the Financial Modeling Prep API (Ratios endpoint) and downloads financial data for each stock in the list defined above.
# The limit is a parameter, so more or less data can be downloaded as required.
# The function starts from an empty dataframe (financial_data_ratios), defined as a global variable outside the function so that it can also be modified outside the function.
# Data is appended to the dataframe after each API call, so if the API times out, previously downloaded data is not lost.
# Once the iteration is complete, the function returns the dataframe with the data for all stocks.
# To handle API timeouts, the function includes retry logic with a sleep/delay between retries.
# The data is requested on a quarterly basis. Example endpoint call: https://financialmodelingprep.com/api/v3/ratios/AAPL?period=quarter&limit=10&apikey=demo

In [17]:
# getting the API key from the .env file (which should be listed in .gitignore)

import requests
import os
from dotenv import load_dotenv
import time

# Load environment variables from .env file
load_dotenv()

# Get the API key from environment variable
api_key = os.getenv('FMP_API_KEY')
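
If `FMP_API_KEY` is not set, `os.getenv` returns `None` and every request URL would end in `apikey=None`. A minimal sketch of a fail-fast check is shown here; `require_env` is a hypothetical helper and is not part of the original notebook.

```python
import os


def require_env(name, env=None):
    """Return the value of an environment variable, or raise if it is unset or empty.

    Hypothetical helper for illustration; `env` defaults to os.environ and
    can be replaced with a plain dict for testing.
    """
    env = os.environ if env is None else env
    value = env.get(name)
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return value


# Usage (assumes load_dotenv() has already been called):
# api_key = require_env("FMP_API_KEY")
```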


In [18]:
# Define global variable
financial_data_ratios = pd.DataFrame()


In [19]:
# Define the download function with retry logic and a per-request timeout
def get_financial_data(stocks, limit, retries=3, delay=5, timeout=10):
    global financial_data_ratios
    for stock in stocks:
        url = f'https://financialmodelingprep.com/api/v3/ratios/{stock}?period=quarter&limit={limit}&apikey={api_key}'
        for attempt in range(retries):
            try:
                response = requests.get(url, timeout=timeout)
                response.raise_for_status()  # Raise an error for bad status codes
                data = response.json()
                data = pd.DataFrame(data)
                financial_data_ratios = pd.concat([financial_data_ratios, data], ignore_index=True)
                break  # Exit the retry loop if the request is successful
            except (requests.exceptions.RequestException, TimeoutError) as e:
                print(f"Attempt {attempt + 1} failed: {e}")
                if attempt < retries - 1:
                    time.sleep(delay)  # Wait before retrying
                else:
                    print(f"Failed to retrieve data for {stock} after {retries} attempts.")
    return financial_data_ratios
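
Calling `pd.concat` inside the loop copies the accumulated dataframe on every iteration. One possible alternative, sketched below under stated assumptions, collects one frame per stock in a list, concatenates once at the end, and also reports which symbols failed. `collect_frames` and `fake_fetch` are hypothetical names; `fetch` stands in for the request-and-retry step so the sketch runs without network access.

```python
import pandas as pd


def collect_frames(symbols, fetch):
    """Call `fetch(symbol)` for each symbol and combine the results.

    `fetch` is a hypothetical callable that returns a list of dicts
    (the parsed JSON for one symbol) or raises an exception on failure.
    Returns the combined DataFrame and the list of symbols that failed.
    """
    frames, failed = [], []
    for symbol in symbols:
        try:
            records = fetch(symbol)
        except Exception as exc:  # broad on purpose: this is only a sketch
            print(f"{symbol}: {exc}")
            failed.append(symbol)
            continue
        if records:  # skip symbols for which an empty list came back
            frames.append(pd.DataFrame(records))
    combined = pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
    return combined, failed


# Stand-in for the API call, with illustrative values only
def fake_fetch(symbol):
    if symbol == "BAD":
        raise ValueError("simulated failure")
    return [{"symbol": symbol, "date": "2024-09-28", "currentRatio": 1.0}]


data, failed = collect_frames(["AAPL", "BAD", "MSFT"], fake_fetch)
```

The trade-off is that results held in the local list are lost if the kernel is interrupted mid-loop, whereas the global dataframe used above keeps everything downloaded so far.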

In [20]:
# Testing the newly created function. I will download data for the first 5 stocks in the list and limit the data to 10 records per stock.

#financial_data_test = get_financial_data(stocks[:5], 10)


In [21]:

# sort the data by symbol (ascending) and date (descending)
#financial_data_test = financial_data_test.sort_values(by=['symbol', 'date'], ascending=[True, False])

#financial_data_test.head(50)


In [22]:
# The test was successful. I will now download data for all the stocks in the list and limit the data to 40 quarterly records per stock (10 years of data).

financial_data_ratios = get_financial_data(stocks, 40)


Attempt 1 failed: ('Connection aborted.', ConnectionResetError(54, 'Connection reset by peer'))


In [23]:
# viewing the data

# sort the data by symbol (ascending) and date (descending)

financial_data_ratios = financial_data_ratios.sort_values(by=['symbol', 'date'], ascending=[True, False])

financial_data_ratios.head(100)




Unnamed: 0,symbol,date,calendarYear,period,currentRatio,quickRatio,cashRatio,daysOfSalesOutstanding,daysOfInventoryOutstanding,operatingCycle,...,priceToSalesRatio,priceEarningsRatio,priceToFreeCashFlowsRatio,priceToOperatingCashFlowsRatio,priceCashFlowRatio,priceEarningsToGrowthRatio,priceSalesRatio,dividendYield,enterpriseValueMultiple,priceFairValue
40,AAPL,2024-09-28,2024,Q4,0.867313,0.826007,0.169753,62.802802,12.844802,75.647604,...,36.406063,58.632390,144.585517,128.903346,128.903346,-1.908962,36.406063,0.001101,108.692191,60.685296
41,AAPL,2024-06-29,2024,Q3,0.952980,0.906142,0.194227,45.297457,12.036053,57.33351,...,37.617291,37.610714,120.818452,111.812960,111.812960,-4.426492,37.617291,0.001207,117.099404,48.370486
42,AAPL,2024-03-30,2024,Q2,1.037102,0.986771,0.264048,40.808568,11.568830,52.377398,...,29.109739,27.942505,127.660007,116.429977,116.429977,-0.927183,29.109739,0.001404,88.290317,35.606601
43,AAPL,2023-12-30,2024,Q1,1.072544,1.023945,0.304240,37.710056,9.054234,46.76429,...,24.972567,22.010958,79.622821,74.848845,74.848845,0.449390,24.972567,0.001281,70.645628,40.298174
44,AAPL,2023-09-30,2023,Q4,0.988012,0.944442,0.206217,61.327069,11.611542,72.938611,...,29.841774,29.085850,137.421101,123.658630,123.658630,1.846951,29.841774,0.001407,93.334147,42.975881
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
6562,ABNB,2020-12-31,2020,Q4,1.734780,1.734780,1.066302,228.474148,0.000000,228.474148,...,59.070127,-3.263810,-345.303377,-364.868334,-364.868334,0.001040,59.070127,0.000000,-12.219459,17.491602
6563,ABNB,2020-09-30,2020,Q3,1.218680,1.218680,0.450214,12.392934,0.000000,12.392934,...,63.302992,96.856726,259.038905,253.146150,253.146150,-0.721416,63.302992,0.000000,179.500598,264.530572
6564,ABNB,2020-06-30,2020,Q2,1.734780,1.734780,1.066302,0.000000,0.000000,0.0,...,229.507223,-33.371548,-292.532404,-299.588442,-299.588442,-0.485404,229.507223,0.000000,-128.221805,26.477876
6565,ABNB,2020-03-31,2020,Q1,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,...,91.269082,-56.394541,-131.227062,-134.835040,-134.835040,18.610199,91.269082,0.000000,-237.712785,31.699196


In [24]:
# getting information on the data

financial_data_ratios.info()


<class 'pandas.core.frame.DataFrame'>
Index: 11031 entries, 40 to 7155
Data columns (total 58 columns):
 #   Column                              Non-Null Count  Dtype  
---  ------                              --------------  -----  
 0   symbol                              11031 non-null  object 
 1   date                                11031 non-null  object 
 2   calendarYear                        11031 non-null  object 
 3   period                              11031 non-null  object 
 4   currentRatio                        11031 non-null  float64
 5   quickRatio                          11031 non-null  float64
 6   cashRatio                           11031 non-null  float64
 7   daysOfSalesOutstanding              10987 non-null  float64
 8   daysOfInventoryOutstanding          10990 non-null  float64
 9   operatingCycle                      9652 non-null   object 
 10  daysOfPayablesOutstanding           10990 non-null  float64
 11  cashConversionCycle                 9652 non-n
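
The `info()` output above lists `operatingCycle` with dtype `object` rather than `float64`, which suggests the column holds at least some values pandas did not parse as floats (the cause is not visible here). A minimal sketch of one way to coerce such a column, using `pd.to_numeric` with `errors="coerce"`; `coerce_numeric` is a hypothetical helper and the toy values are illustrative only.

```python
import pandas as pd


def coerce_numeric(df, columns):
    """Return a copy of `df` with `columns` converted to numeric.

    Values that cannot be parsed become NaN instead of raising an error.
    """
    out = df.copy()
    for col in columns:
        out[col] = pd.to_numeric(out[col], errors="coerce")
    return out


# Toy frame standing in for financial_data_ratios (values are illustrative only)
toy = pd.DataFrame({"operatingCycle": ["75.6", 57.3, None, "n/a"]})
clean = coerce_numeric(toy, ["operatingCycle"])
```

Comparing `clean["operatingCycle"].isna().sum()` with the count before coercion shows how many values could not be parsed.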

In [25]:
# describing the data

financial_data_ratios.describe()


Unnamed: 0,currentRatio,quickRatio,cashRatio,daysOfSalesOutstanding,daysOfInventoryOutstanding,daysOfPayablesOutstanding,grossProfitMargin,operatingProfitMargin,pretaxProfitMargin,netProfitMargin,...,priceToSalesRatio,priceEarningsRatio,priceToFreeCashFlowsRatio,priceToOperatingCashFlowsRatio,priceCashFlowRatio,priceEarningsToGrowthRatio,priceSalesRatio,dividendYield,enterpriseValueMultiple,priceFairValue
count,11031.0,11031.0,11031.0,10987.0,10990.0,10990.0,11031.0,11031.0,11031.0,11031.0,...,11031.0,11031.0,11031.0,11031.0,11031.0,11031.0,11031.0,10888.0,10988.0,11031.0
mean,9.517082,11.13134,5.487506,74.666422,-49934280.0,7317311.0,0.540831,0.205694,0.172873,0.132483,...,18.054941,60.989278,114.173779,69.639482,69.639482,-0.308001,18.054941,0.006094,122.3529,4.334035
std,95.51244,133.07113,78.311617,171.625043,3786500000.0,766231200.0,0.763925,0.56986,0.751153,0.711979,...,48.095667,3371.412508,1853.476963,2165.853021,2165.853021,34.657986,48.095667,0.00969,176983.9,154.461298
min,0.0,-495.597938,-0.06076,-441.410888,-333571800000.0,-56820690.0,-13.104956,-29.776648,-40.040218,-40.040218,...,-411.404133,-72011.832868,-75799.358779,-154854.378,-154854.378,-3289.654192,-411.404133,0.0,-12816390.0,-10413.285
25%,0.870108,0.658838,0.139527,28.778373,0.0,19.28028,0.330687,0.102245,0.087338,0.065515,...,6.724717,10.51291,14.866832,22.040193,22.040193,-0.870911,6.724717,9.8e-05,29.45366,1.546944
50%,1.267419,1.050294,0.334287,47.985968,26.49626,49.44575,0.518811,0.192917,0.171311,0.135677,...,12.22335,18.269804,60.67786,47.818257,47.818257,0.014062,12.22335,0.004739,49.57489,3.303185
75%,1.997186,1.766543,0.730863,69.304099,80.10759,87.79395,0.730526,0.321817,0.295087,0.234413,...,21.154523,28.899611,113.933316,84.716557,84.716557,0.808237,21.154523,0.00823,73.37546,7.476787
max,2834.092437,5915.198214,2482.703835,9806.833241,2623320.0,80326320000.0,70.829321,29.568808,39.756352,39.977321,...,4272.309072,328819.212381,63542.43,51834.93882,51834.93882,718.834695,4272.309072,0.473176,13406900.0,4565.918596
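
In the summary above, `priceToSalesRatio` and `priceSalesRatio` show identical statistics, as do `priceToOperatingCashFlowsRatio` and `priceCashFlowRatio`, so these pairs may be duplicates. A minimal sketch for confirming this row by row with `Series.equals`; `identical_column_pairs` is a hypothetical helper and the toy values are illustrative only.

```python
import pandas as pd


def identical_column_pairs(df):
    """Return (col_a, col_b) pairs whose values match row by row.

    Series.equals treats NaNs in the same position as equal.
    """
    cols = list(df.columns)
    pairs = []
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:
            if df[a].equals(df[b]):
                pairs.append((a, b))
    return pairs


# Toy frame standing in for financial_data_ratios (values are illustrative only)
toy = pd.DataFrame({
    "priceToSalesRatio": [36.4, 37.6, None],
    "priceSalesRatio": [36.4, 37.6, None],
    "dividendYield": [0.0011, 0.0012, 0.0014],
})
duplicates = identical_column_pairs(toy)
```

If a pair is confirmed identical on the full dataset, one of the two columns could be dropped before modelling.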


In [26]:
# saving the data as a CSV file in the data folder, named financial_data_Ratios.csv

financial_data_ratios.to_csv('data/Datasets/financial_data_Ratios.csv', index=False)