### Creating Financial Datasets [Financial KPI Growth]

In this notebook, we will create financial datasets that will be used in the subsequent notebooks to build predictive data models.

The notebook will start with company profile data created previously.

The company profile data will be cleaned to include only stocks and to keep the qualitative information required for future model development.
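
A minimal sketch of what "only stocks" could mean in code, assuming the profile table carries the boolean `isEtf`, `isFund`, and `isActivelyTrading` columns that appear in the profile data loaded later in this notebook. The helper name `keep_only_stocks` and the sample rows are hypothetical, and the exact filter used to build the cleaned file may differ.

```python
import pandas as pd

def keep_only_stocks(profiles: pd.DataFrame) -> pd.DataFrame:
    """Drop ETFs, funds, and inactive listings (an assumed definition of 'stock')."""
    mask = (~profiles["isEtf"]) & (~profiles["isFund"]) & profiles["isActivelyTrading"]
    return profiles.loc[mask].reset_index(drop=True)

# Made-up rows for illustration only
sample = pd.DataFrame({
    "symbol": ["AAA", "BBB", "CCC"],
    "isEtf": [False, True, False],
    "isFund": [False, False, True],
    "isActivelyTrading": [True, True, True],
})
stocks_only = keep_only_stocks(sample)
```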

For the remaining stocks, I will write code that connects to the Financial Modeling Prep API and downloads financial data for each stock.

The financial data will be:

1. Historical stock prices 
2. Financial statements key metrics 
3. Financial statement ratios 
4. Financial growth __[in this notebook - Financial KPI Growth]__

Each of the above datasets will be saved as a separate CSV file.

The final dataset will be a combination of all the above datasets.
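
One way the final combination could look, as a sketch only. It assumes every per-endpoint table shares the `symbol` and `date` columns (the growth table in this notebook does; the others are an assumption), and the tiny frames below are made-up placeholders. Daily price data would likely need aligning to the reporting dates before a join like this.

```python
from functools import reduce
import pandas as pd

def combine_datasets(frames):
    """Inner-join a list of DataFrames on the shared symbol/date keys."""
    return reduce(
        lambda left, right: pd.merge(left, right, on=["symbol", "date"], how="inner"),
        frames,
    )

# Placeholder tables standing in for two of the saved CSV files
growth = pd.DataFrame({
    "symbol": ["AAA", "AAA"],
    "date": ["2024-09-30", "2024-06-30"],
    "revenueGrowth": [0.10, -0.05],
})
ratios = pd.DataFrame({
    "symbol": ["AAA"],
    "date": ["2024-09-30"],
    "currentRatio": [1.2],
})
combined = combine_datasets([growth, ratios])
```

An inner join keeps only the symbol/date pairs present in every table; `how="left"` or `how="outer"` would keep unmatched rows at the cost of missing values.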

In [15]:
# import libraries

import pandas as pd


In [16]:
# importing cleaned company profile data

filepath='data/Datasets/company_profile_cleaned_50B.csv'

company_profile_data = pd.read_csv(filepath)

#checking data on second dataset for quality check
#filepath_1='data/Datasets/processed_quant_data.csv'
 
#quant_data = pd.read_csv(filepath_1, low_memory=False)


In [17]:
# checking the company profile data

company_profile_data.head()


Unnamed: 0,symbol,price,beta,mktCap,companyName,currency,cik,isin,cusip,exchange,...,sector,country,city,state,zip,isEtf,isActivelyTrading,isAdr,isFund,date
0,NVDA,141.98,1.657,3e-06,NVIDIA Corporation,USD,1045810.0,US67066G1040,67066G104,NASDAQ Global Select,...,Technology,US,Santa Clara,CA,95051,False,True,False,False,2024-12-02
1,AAPL,225.0,1.24,3e-06,Apple Inc.,USD,320193.0,US0378331005,037833100,NASDAQ Global Select,...,Technology,US,Cupertino,CA,95014,False,True,False,False,2024-12-02
2,MSFT,415.0,0.904,3e-06,Microsoft Corporation,USD,789019.0,US5949181045,594918104,NASDAQ Global Select,...,Technology,US,Redmond,WA,98052-6399,False,True,False,False,2024-12-02
3,AMZN,202.61,1.146,2e-06,"Amazon.com, Inc.",USD,1018724.0,US0231351067,023135106,NASDAQ Global Select,...,Consumer Cyclical,US,Seattle,WA,98109-5210,False,True,False,False,2024-12-02
4,GOOGL,172.49,1.034,2e-06,Alphabet Inc.,USD,1652044.0,US02079K3059,02079K305,NASDAQ Global Select,...,Communication Services,US,Mountain View,CA,94043,False,True,False,False,2024-12-02


In [18]:
# creating a list of stocks symbols based on the company profile data

stocks = company_profile_data['symbol'].tolist()


In [19]:
# Create a function that connects to the Financial Modeling Prep API and downloads financial growth data for each stock, iterating through the list of stocks defined above.
# The limit is a parameter, so more or fewer records can be requested as required. The function starts from an empty dataframe (financial_data_growth) and appends the data for each stock to it; the empty dataframe is defined as a global variable outside the function so that it remains accessible outside the function.
# The data is appended after each API call, so if the API times out I will not lose the data already downloaded.
# The function returns the dataframe with the data for all stocks once the iteration is complete. The API used is the Financial Modeling Prep API, financial-growth endpoint.
# To cope with API timeouts, the function includes retry logic with a sleep/delay between retries.
# The data is requested on a quarterly basis. An example endpoint call: https://financialmodelingprep.com/api/v3/financial-growth/AAPL?period=quarter&limit=10&apikey=demo

In [20]:
# getting the API key from the .env file

import requests
import os
from dotenv import load_dotenv
import time

# Load environment variables from .env file
load_dotenv()

# Get the API key from environment variable
api_key = os.getenv('FMP_API_KEY')


In [21]:
# Define global variable
financial_data_growth = pd.DataFrame()

In [22]:

# Define function with retry logic and increased timeout
def get_financial_data(stocks, limit, retries=3, delay=5, timeout=10):
    global financial_data_growth
    for stock in stocks:
        url = f'https://financialmodelingprep.com/api/v3/financial-growth/{stock}?period=quarter&limit={limit}&apikey={api_key}'
        for attempt in range(retries):
            try:
                response = requests.get(url, timeout=timeout)
                response.raise_for_status()  # Raise an error for bad status codes
                data = response.json()
                data = pd.DataFrame(data)
                financial_data_growth = pd.concat([financial_data_growth, data], ignore_index=True)
                break  # Exit the retry loop if the request is successful
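            except (requests.exceptions.RequestException, TimeoutError, ValueError) as e:
                # ValueError covers payloads that cannot be turned into a DataFrame (e.g. a JSON object instead of a list of records)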
                print(f"Attempt {attempt + 1} failed: {e}")
                if attempt < retries - 1:
                    time.sleep(delay)  # Wait before retrying
                else:
                    print(f"Failed to retrieve data for {stock} after {retries} attempts.")
    return financial_data_growth
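
An alternative sketch of the same accumulate-as-you-go idea without a `global` statement: the caller owns a dictionary that the function fills in, so frames gathered before an interruption are still available, and the download step is passed in as a function so the loop can be tried offline. `collect_frames` and `fake_fetch` are hypothetical names, not part of the notebook or of the Financial Modeling Prep API.

```python
import time
import pandas as pd

def collect_frames(symbols, fetch, results=None, retries=3, delay=0.0):
    """Call fetch(symbol) for each symbol, retrying on failure.

    `results` is a dict owned by the caller, so data gathered before an
    interruption remains reachable outside the function.
    """
    if results is None:
        results = {}
    for symbol in symbols:
        for attempt in range(retries):
            try:
                results[symbol] = pd.DataFrame(fetch(symbol))
                break  # success: move on to the next symbol
            except Exception as exc:
                print(f"{symbol}: attempt {attempt + 1} failed: {exc}")
                if attempt < retries - 1:
                    time.sleep(delay)  # wait before retrying
    return results

# Stand-in for the HTTP call: fails once for "BBB", then succeeds.
calls = {"BBB": 0}

def fake_fetch(symbol):
    if symbol == "BBB" and calls["BBB"] == 0:
        calls["BBB"] += 1
        raise ConnectionError("simulated timeout")
    return [{"symbol": symbol, "date": "2024-09-30", "revenueGrowth": 0.1}]

partial = {}
collect_frames(["AAA", "BBB"], fake_fetch, results=partial)
combined = pd.concat(list(partial.values()), ignore_index=True)
```

Concatenating once at the end also avoids re-copying the growing DataFrame on every iteration, which is what repeated `pd.concat` calls inside the loop do.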

In [23]:
# Testing the newly created function: I will download data for the first 5 stocks in the list and limit the data to 10 records per stock.

# financial_data_test = get_financial_data(stocks[:5], 10)

# sort by symbol ascending and date descending

# financial_data_test = financial_data_test.sort_values(by=['symbol', 'date'], ascending=[True, False])

# financial_data_test


In [24]:
# The test was successful; I will now download data for all the stocks in the list, limited to 10 years of quarterly data (40 records per stock).

financial_data_growth = get_financial_data(stocks, 40)


KeyboardInterrupt: 
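
Because the function updates the global dataframe after every successful call, an interrupted run keeps whatever was already downloaded. A sketch of how the remaining symbols might be worked out before calling the function again; `remaining_symbols` is a hypothetical helper and the sample data is made up.

```python
import pandas as pd

def remaining_symbols(all_symbols, downloaded: pd.DataFrame):
    """Return the symbols with no rows yet in `downloaded`, in their original order."""
    if downloaded.empty or "symbol" not in downloaded.columns:
        return list(all_symbols)
    done = set(downloaded["symbol"].unique())
    return [s for s in all_symbols if s not in done]

# Made-up partial download: AAA and BBB are done, CCC and DDD are not
downloaded_so_far = pd.DataFrame({
    "symbol": ["AAA", "AAA", "BBB"],
    "revenueGrowth": [0.1, 0.2, 0.3],
})
todo = remaining_symbols(["AAA", "BBB", "CCC", "DDD"], downloaded_so_far)
```

In this notebook the call would then be `get_financial_data(remaining_symbols(stocks, financial_data_growth), 40)`. Symbols for which the API returned no rows would be requested again under this approach.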

In [11]:
# viewing the data

# sort the data by symbol ascending and date descending

financial_data_growth = financial_data_growth.sort_values(by=['symbol', 'date'], ascending=[True, False])

financial_data_growth.head(100)


Unnamed: 0,symbol,date,calendarYear,period,revenueGrowth,grossProfitGrowth,ebitgrowth,operatingIncomeGrowth,netIncomeGrowth,epsgrowth,...,tenYDividendperShareGrowthPerShare,fiveYDividendperShareGrowthPerShare,threeYDividendperShareGrowthPerShare,receivablesGrowth,inventoryGrowth,assetGrowth,bookValueperShareGrowth,debtGrowth,rdexpenseGrowth,sgaexpensesGrowth
40,AAPL,2024-09-28,2024,Q4,0.106707,0.105877,0.167206,0.167206,-0.312943,-0.307143,...,1.103591,0.294579,0.135642,0.534397,0.181833,0.100624,-0.137951,0.052565,-0.030102,0.032120
41,AAPL,2024-06-29,2024,Q3,-0.054830,-0.061342,-0.091326,-0.091326,-0.092571,-0.084967,...,1.132779,0.280849,0.122352,0.049137,-0.010751,-0.017187,-0.095859,-0.031418,0.013033,-0.022882
42,AAPL,2024-03-30,2024,Q2,-0.241037,-0.229405,-0.308944,-0.308944,-0.303102,-0.301370,...,1.216607,0.307695,0.170447,-0.178676,-0.042851,-0.045551,0.008022,-0.031933,0.026897,-0.046861
43,AAPL,2023-12-30,2024,Q1,0.336063,0.356890,0.497015,0.497015,0.477435,0.489796,...,1.234623,0.309353,0.155970,-0.178454,0.028432,0.002641,0.199247,-0.128218,0.053237,0.103235
44,AAPL,2023-09-30,2023,Q4,0.094148,0.110235,0.172667,0.172667,0.154670,0.157480,...,1.202570,0.310745,0.170403,0.556296,-0.138757,0.052367,0.037547,0.134059,-0.018140,0.029801
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
6562,ABNB,2020-12-31,2020,Q4,-0.359872,-0.417477,-8.403645,-8.403645,-18.726195,-31.378378,...,0.000000,0.000000,0.000000,10.801302,0.000000,0.201985,14.341711,0.033590,8.640029,3.577565
6563,ABNB,2020-09-30,2020,Q3,3.009663,5.423734,1.717981,1.717981,1.381050,1.342593,...,0.000000,0.000000,0.000000,0.000000,0.000000,-0.168043,-0.899906,3.623864,-0.018436,0.111227
6564,ABNB,2020-06-30,2020,Q2,-0.602326,-0.692273,-0.791812,-0.791812,-0.689899,-0.687500,...,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,-0.157952,-0.354098
6565,ABNB,2020-03-31,2020,Q1,-0.239399,-0.306354,0.007471,0.007471,0.031100,0.030303,...,0.000000,0.000000,0.000000,0.000000,0.000000,-1.000000,1.000000,-1.000000,-0.085119,-0.364931


In [12]:
# getting information on the data

financial_data_growth.info()


<class 'pandas.core.frame.DataFrame'>
Index: 11031 entries, 40 to 7155
Data columns (total 38 columns):
 #   Column                                  Non-Null Count  Dtype  
---  ------                                  --------------  -----  
 0   symbol                                  11031 non-null  object 
 1   date                                    11031 non-null  object 
 2   calendarYear                            11031 non-null  object 
 3   period                                  11031 non-null  object 
 4   revenueGrowth                           11031 non-null  float64
 5   grossProfitGrowth                       11031 non-null  float64
 6   ebitgrowth                              11031 non-null  float64
 7   operatingIncomeGrowth                   11031 non-null  float64
 8   netIncomeGrowth                         11031 non-null  float64
 9   epsgrowth                               11031 non-null  float64
 10  epsdilutedGrowth                        11031 non-null  float64

In [13]:
# describing the data

financial_data_growth.describe()


Unnamed: 0,revenueGrowth,grossProfitGrowth,ebitgrowth,operatingIncomeGrowth,netIncomeGrowth,epsgrowth,epsdilutedGrowth,weightedAverageSharesGrowth,weightedAverageSharesDilutedGrowth,dividendsperShareGrowth,...,tenYDividendperShareGrowthPerShare,fiveYDividendperShareGrowthPerShare,threeYDividendperShareGrowthPerShare,receivablesGrowth,inventoryGrowth,assetGrowth,bookValueperShareGrowth,debtGrowth,rdexpenseGrowth,sgaexpensesGrowth
count,11031.0,11031.0,11031.0,11031.0,11031.0,11031.0,11031.0,11031.0,11031.0,11031.0,...,11031.0,11031.0,11031.0,11031.0,11031.0,11031.0,11031.0,11031.0,11031.0,11031.0
mean,19.839419,5.889267,1766.508,1766.508,1.076225,441.412,441.9116,273.3207,275.4662,77.335507,...,2254.856,1300.625,687.9755,0.302305,2507379.0,0.032234,277.0849,1.312413,0.003578,0.534437
std,2079.144154,605.742086,86036.17,86036.17,29.956538,46259.86,46307.6,16577.4,16704.74,4791.048623,...,121635.1,59745.69,37462.19,13.05254,202869200.0,0.389202,16803.93,109.27027,1.685625,27.667531
min,-2.605959,-53.642857,-112794.6,-112794.6,-965.095238,-901.0,-801.0,-1.0,-1.0,-1.0,...,-1.0,-1.0,-1.0,-1.0,-538166000.0,-1.0,-22.39351,-1.0,-3.509231,-27.153846
25%,-0.031404,-0.040661,-0.1521235,-0.1521235,-0.212003,-0.2139457,-0.2123591,-0.005980861,-0.006766791,-0.001137,...,0.0,0.0,0.0,-0.042194,-0.01923924,-0.005887,-0.01253169,-0.01805,0.0,-0.038622
50%,0.020138,0.020492,0.02307692,0.02307692,0.02211,0.02352941,0.02352941,0.0,-0.0005065856,0.0,...,0.2593084,0.2689246,0.1566761,0.0,0.0,0.010382,0.01370489,0.000219,0.0,0.007576
75%,0.07764,0.096076,0.2389572,0.2389572,0.336165,0.3386462,0.3405212,0.001453953,0.001587302,0.010977,...,1.687847,0.7378783,0.3986393,0.076784,0.04708335,0.035681,0.04553289,0.036758,0.0,0.077107
max,218369.572599,63619.589936,6843528.0,6843528.0,1450.853944,4858607.0,4863621.0,1038577.0,1038577.0,373168.752578,...,11118050.0,4194568.0,3050608.0,1175.0,20932960000.0,20.960384,1064756.0,11410.0,172.40625,2137.2


In [14]:
# saving the data as a csv file in the data folder named financial_data_Growth.csv

financial_data_growth.to_csv('data/Datasets/financial_data_Growth.csv', index=False)