### Creating Financial Datasets [Earnings Dates]

In this notebook, we will create financial datasets that will be used in the subsequent notebooks to build predictive data models.

The notebook will start with company profile data created previously.

The company profile data will be cleaned to include only stocks and to keep the qualitative information required for future model development.
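The cleaning step itself happened in an earlier notebook and is not shown here. The sketch below shows one way to keep only stocks. It assumes that the `isEtf`, `isFund` and `isActivelyTrading` columns (visible in the profile preview further down) are boolean dtype. The helper name `keep_stocks_only` is hypothetical.

```python
import pandas as pd

def keep_stocks_only(profiles: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: drop ETFs, funds and symbols that are no longer trading.

    Assumes the flag columns are boolean dtype (read_csv parses True/False that way).
    """
    mask = ~profiles["isEtf"] & ~profiles["isFund"] & profiles["isActivelyTrading"]
    return profiles.loc[mask].reset_index(drop=True)

# toy example with made-up rows, not real profile data
sample = pd.DataFrame({
    "symbol": ["AAA", "BBB", "CCC"],
    "isEtf": [False, True, False],
    "isFund": [False, False, False],
    "isActivelyTrading": [True, True, False],
})
stocks_only = keep_stocks_only(sample)
print(stocks_only["symbol"].tolist())  # ['AAA']
```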

For the remaining stocks, I will write code that connects to the Financial Model Prep API and downloads financial data for each stock.

The financial data will be:

1. Historical stock prices 
2. Financial statements key metrics 
3. Financial statement ratios 
4. Financial growth metrics
5. Earnings calendar __[in this notebook]__

Each of these datasets will be saved as a separate CSV file.

The final dataset will be a combination of all the above datasets.
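The combining step belongs to a later notebook. The sketch below shows one way it could look. It assumes each saved dataset carries a `symbol` column and a period-end date that lines up with the earnings calendar's `fiscalDateEnding`. The rows and the `peRatio` column are made up for illustration. If the period-end dates differ by a few days between endpoints, an exact merge like this one would miss matches.

```python
import pandas as pd

# made-up rows standing in for two of the saved datasets
earnings = pd.DataFrame({
    "symbol": ["AAA", "AAA"],
    "fiscalDateEnding": ["2024-06-30", "2024-09-30"],
    "eps": [1.10, 1.25],
})
metrics = pd.DataFrame({
    "symbol": ["AAA", "AAA"],
    "date": ["2024-06-30", "2024-09-30"],  # assumed: period-end date of the metrics
    "peRatio": [25.0, 27.5],               # hypothetical metric column
})

combined = earnings.merge(
    metrics.rename(columns={"date": "fiscalDateEnding"}),
    on=["symbol", "fiscalDateEnding"],
    how="left",             # keep every earnings row even if no metric row matches
    validate="one_to_one",  # raise if a symbol/date pair appears twice on either side
)
print(combined)
```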

In [2]:
# import libraries

import pandas as pd


In [3]:
# importing cleaned company profile data

filepath='../Data/Datasets/company_profile_cleaned_50B.csv'

company_profile_data = pd.read_csv(filepath)



In [26]:
# checking the first rows of the company profile data

company_profile_data.head()


Unnamed: 0,symbol,price,beta,mktCap,companyName,currency,cik,isin,cusip,exchange,...,sector,country,city,state,zip,isEtf,isActivelyTrading,isAdr,isFund,date
0,NVDA,141.98,1.657,3e-06,NVIDIA Corporation,USD,1045810.0,US67066G1040,67066G104,NASDAQ Global Select,...,Technology,US,Santa Clara,CA,95051,False,True,False,False,2024-12-02
1,AAPL,225.0,1.24,3e-06,Apple Inc.,USD,320193.0,US0378331005,037833100,NASDAQ Global Select,...,Technology,US,Cupertino,CA,95014,False,True,False,False,2024-12-02
2,MSFT,415.0,0.904,3e-06,Microsoft Corporation,USD,789019.0,US5949181045,594918104,NASDAQ Global Select,...,Technology,US,Redmond,WA,98052-6399,False,True,False,False,2024-12-02
3,AMZN,202.61,1.146,2e-06,"Amazon.com, Inc.",USD,1018724.0,US0231351067,023135106,NASDAQ Global Select,...,Consumer Cyclical,US,Seattle,WA,98109-5210,False,True,False,False,2024-12-02
4,GOOGL,172.49,1.034,2e-06,Alphabet Inc.,USD,1652044.0,US02079K3059,02079K305,NASDAQ Global Select,...,Communication Services,US,Mountain View,CA,94043,False,True,False,False,2024-12-02


In [27]:
# creating a list of stock symbols from the company profile data

stocks = company_profile_data['symbol'].tolist()


In [28]:
# creating a function that connects to the Financial Model Prep API and downloads the earnings calendar for each stock, iterating through the list of stocks defined above.
# The limit is a parameter, so more or less data can be downloaded as required. The function starts from an empty dataframe (earnings_cal_data) and appends the data for each stock to it. The empty dataframe is defined as a global variable outside the function so that it can be inspected and changed outside the function.
# The function appends the data to the dataframe after each API call, so if the API times out, the data downloaded so far is not lost.
# Once the iteration is complete, the function returns the dataframe with the data for all the stocks. The endpoint used is the Financial Model Prep historical earnings calendar endpoint.
# To cope with API timeouts and other transient errors, the function retries each request, sleeping for a fixed delay between retries.
# The data is requested on a quarterly basis. An example of the endpoint is: https://financialmodelingprep.com/api/v3/historical/earning_calendar/AAPL?limit=40&apikey=demo_key
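The function below builds its URL with an f-string. The sketch here builds the same kind of URL with the standard library, so that query parameters such as the API key are URL-encoded. The helper name `earnings_calendar_url` is hypothetical. `requests` can also do this encoding itself through the `params` argument of `requests.get`.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://financialmodelingprep.com/api/v3/historical/earning_calendar"

def earnings_calendar_url(symbol: str, api_key: str, **params) -> str:
    """Hypothetical helper: build the earnings-calendar URL with encoded query parameters.

    Extra keyword arguments (e.g. period="quarter", limit=40) are added in the order given.
    """
    query = urlencode({**params, "apikey": api_key})
    return f"{BASE_URL}/{quote(symbol, safe='')}?{query}"

url = earnings_calendar_url("AAPL", "demo_key", limit=40)
print(url)
# https://financialmodelingprep.com/api/v3/historical/earning_calendar/AAPL?limit=40&apikey=demo_key
```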

In [29]:
# loading the API key from the .env file (the .env file is excluded from git via .gitignore)

import requests
import os
from dotenv import load_dotenv
import time

# Load environment variables from .env file
load_dotenv()

# Get the API key from environment variable
api_key = os.getenv('FMP_API_KEY')
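`os.getenv` returns `None` when the variable is missing. In that case the request URLs would silently contain `apikey=None`, and every call would fail. The sketch below is one way to fail fast instead. The helper name `require_env` is hypothetical.

```python
import os

def require_env(name: str) -> str:
    """Hypothetical helper: return environment variable `name`, raising if it is missing or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"{name} is not set; add it to the .env file")
    return value

# usage in this notebook, after load_dotenv():
# api_key = require_env('FMP_API_KEY')
```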


In [30]:
# Define global variable
earnings_cal_data = pd.DataFrame()
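Because `earnings_cal_data` lives outside the function, each call to `get_financial_data` (defined in the next cell) appends to whatever is already there. Running the 5-stock test and then the full download would therefore leave the test rows in the full dataset twice. The sketch below shows an alternative that collects one frame per symbol in a list and concatenates once at the end, which also avoids copying the growing dataframe on every iteration. The `collect` name and the `fetch` argument (a stand-in for the API call) are hypothetical.

```python
import pandas as pd

def collect(symbols, fetch):
    """Call fetch(symbol) for each symbol and concatenate all results once at the end."""
    frames = []
    for symbol in symbols:
        records = fetch(symbol)  # a list of dicts, like response.json() returns
        if records:              # skip symbols with no data
            frames.append(pd.DataFrame(records))
    if not frames:
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)

# fake API responses for illustration only
fake_api = {"AAA": [{"symbol": "AAA", "date": "2024-01-30"}], "BBB": []}
result = collect(["AAA", "BBB"], lambda s: fake_api[s])
print(len(result))  # 1
```

To keep partial results when an exception escapes the loop, the `frames` list can be created outside the function and passed in.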

In [31]:

# Define function with retry logic and a per-request timeout
def get_financial_data(stocks, limit, retries=3, delay=5, timeout=10):
    """Download the earnings calendar for each symbol and append it to the global earnings_cal_data."""
    global earnings_cal_data
    for stock in stocks:
        url = f'https://financialmodelingprep.com/api/v3/historical/earning_calendar/{stock}?period=quarter&limit={limit}&apikey={api_key}'
        for attempt in range(retries):
            try:
                response = requests.get(url, timeout=timeout)
                response.raise_for_status()  # raise an error for 4xx/5xx status codes
                data = pd.DataFrame(response.json())  # the endpoint returns a list of records
                # append after every call so earlier results survive a later failure
                earnings_cal_data = pd.concat([earnings_cal_data, data], ignore_index=True)
                break  # exit the retry loop once the request succeeds
            except (requests.exceptions.RequestException, ValueError) as e:
                # RequestException covers timeouts, connection and HTTP errors;
                # ValueError covers a body that is not valid JSON or not a list of records
                print(f"{stock}: attempt {attempt + 1} failed: {e}")
                if attempt < retries - 1:
                    time.sleep(delay)  # wait before retrying
                else:
                    print(f"Failed to retrieve data for {stock} after {retries} attempts.")
    return earnings_cal_data
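The function above waits a fixed `delay` between retries. A common variation is exponential backoff, where the wait doubles after each failure. The sketch below shows this as a separate, testable helper. The name `with_retries` is hypothetical, and the backoff schedule is an assumption, not something the API requires. `sleep` is injectable so the demo runs without actually waiting.

```python
import time

def with_retries(func, retries=3, base_delay=1.0, exceptions=(Exception,), sleep=time.sleep):
    """Call func(); on failure wait base_delay * 2**attempt seconds and try again."""
    for attempt in range(retries):
        try:
            return func()
        except exceptions:
            if attempt == retries - 1:
                raise  # give up after the last attempt
            sleep(base_delay * 2 ** attempt)

# demo: a function that fails twice, then succeeds
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"

waits = []
result = with_retries(flaky, sleep=waits.append)
print(result)  # ok
print(waits)   # [1.0, 2.0]
```

In this notebook, `func` could be `lambda: requests.get(url, timeout=timeout)` with `exceptions=(requests.exceptions.RequestException,)`.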

In [32]:
# Testing the newly created function: I will download data for the first 5 stocks in the list, limited to 10 records per stock.

#financial_data_test = get_financial_data(stocks[:5], 10)

# sort by symbol ascending and date descending

#financial_data_test = financial_data_test.sort_values(by=['symbol', 'date'], ascending=[True, False])

#financial_data_test
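The sort used here, and again after the full download, orders rows by symbol ascending and then by date descending within each symbol. ISO-formatted date strings such as `2024-10-31` already sort correctly as text. Converting them to datetimes makes the intent explicit and also protects against other date formats. The sketch below uses made-up rows.

```python
import pandas as pd

df = pd.DataFrame({
    "symbol": ["MSFT", "AAPL", "AAPL", "MSFT"],
    "date": ["2024-07-30", "2024-05-02", "2024-08-01", "2024-10-30"],
})
df["date"] = pd.to_datetime(df["date"])
# symbol A -> Z, then newest date first within each symbol
df = df.sort_values(by=["symbol", "date"], ascending=[True, False]).reset_index(drop=True)
print(df)
```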


In [33]:
# the test was successful, so I will now download data for all the stocks in the list, limited to 50 records per stock (about 12 years of quarterly earnings dates, including upcoming scheduled dates).

full_earnings_cal = get_financial_data(stocks, 50)
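The download appends to a global dataframe, so re-running a cell can add the same rows twice. A quick check before saving is to count and drop duplicate `symbol`/`date` pairs. The sketch below uses made-up rows.

```python
import pandas as pd

df = pd.DataFrame({
    "symbol": ["AAPL", "AAPL", "AAPL"],
    "date": ["2024-10-31", "2024-10-31", "2024-08-01"],
    "eps": [1.64, 1.64, 1.40],  # made-up values
})
dupes = df.duplicated(subset=["symbol", "date"]).sum()
print(f"duplicate symbol/date rows: {dupes}")  # duplicate symbol/date rows: 1
df = df.drop_duplicates(subset=["symbol", "date"], keep="first").reset_index(drop=True)
```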


In [34]:
# viewing the data

# sort the data by symbol ascending and date descending

full_earnings_cal = full_earnings_cal.sort_values(by=['symbol', 'date'], ascending=[True, False])

full_earnings_cal.head(100)


Unnamed: 0,date,symbol,eps,epsEstimated,time,revenue,revenueEstimated,updatedFromDate,fiscalDateEnding
50,2025-10-29,AAPL,,,amc,,,2024-12-09,2025-09-28
51,2025-07-30,AAPL,,,amc,,,2024-12-09,2025-06-28
52,2025-04-30,AAPL,,,amc,,,2024-12-09,2025-03-28
53,2025-02-06,AAPL,,2.36,amc,,,2024-12-09,2024-12-28
54,2024-10-31,AAPL,1.64,1.6,amc,9.493000e+10,94511953345.0,2024-12-09,2024-09-28
...,...,...,...,...,...,...,...,...,...
1345,2014-07-25,ABBV,0.82,0.76,bmo,4.926000e+09,4565560975.0,2024-11-21,2014-06-30
1346,2014-04-25,ABBV,0.71,0.68,bmo,4.563000e+09,4370197183.0,2024-11-21,2014-03-31
1347,2014-01-31,ABBV,0.82,0.82,bmo,5.111000e+09,5111000000.0,2024-11-21,2013-12-31
1348,2013-10-25,ABBV,0.82,0.78,bmo,4.658000e+09,4430780487.0,2024-11-21,2013-09-30
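The first AAPL rows above are scheduled future dates with no reported `eps` or `revenue`. For modelling, it may help to separate reported quarters from upcoming ones. The sketch below splits on a cutoff date rather than on missing `eps`, because some past rows may also lack `eps`. The cutoff of 2024-12-09 is an assumption; it is the `updatedFromDate` shown for the AAPL rows. The rows themselves are made up.

```python
import pandas as pd

cal = pd.DataFrame({
    "date": ["2025-04-30", "2025-02-06", "2024-10-31"],
    "symbol": ["AAA"] * 3,
    "eps": [None, None, 1.64],
})
cal["date"] = pd.to_datetime(cal["date"])
cutoff = pd.Timestamp("2024-12-09")  # assumed cutoff date

upcoming = cal[cal["date"] > cutoff]   # scheduled, not yet reported
reported = cal[cal["date"] <= cutoff]  # already reported
print(len(reported), len(upcoming))    # 1 2
```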


In [35]:
# getting information on the data

full_earnings_cal.info()


<class 'pandas.core.frame.DataFrame'>
Index: 13262 entries, 50 to 8573
Data columns (total 9 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   date              13262 non-null  object 
 1   symbol            13262 non-null  object 
 2   eps               12174 non-null  object 
 3   epsEstimated      11348 non-null  object 
 4   time              13262 non-null  object 
 5   revenue           12200 non-null  float64
 6   revenueEstimated  10770 non-null  object 
 7   updatedFromDate   13262 non-null  object 
 8   fiscalDateEnding  13262 non-null  object 
dtypes: float64(1), object(8)
memory usage: 1.0+ MB
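The output shows `eps`, `epsEstimated` and `revenueEstimated` stored as `object`, and the three date columns as strings. The sketch below converts them with `pd.to_numeric` and `pd.to_datetime`, turning unparseable values into `NaN`/`NaT`. The helper name `convert_types` is hypothetical. The sample values are taken from two of the AAPL rows above.

```python
import pandas as pd

def convert_types(cal: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a copy with numeric and date columns converted."""
    out = cal.copy()
    for col in ["eps", "epsEstimated", "revenue", "revenueEstimated"]:
        out[col] = pd.to_numeric(out[col], errors="coerce")  # unparseable -> NaN
    for col in ["date", "updatedFromDate", "fiscalDateEnding"]:
        out[col] = pd.to_datetime(out[col], errors="coerce")  # unparseable -> NaT
    return out

sample = pd.DataFrame({
    "date": ["2024-10-31", "2025-02-06"],
    "eps": ["1.64", None],
    "epsEstimated": ["1.6", "2.36"],
    "revenue": [9.493e10, None],
    "revenueEstimated": ["94511953345.0", None],
    "updatedFromDate": ["2024-12-09", "2024-12-09"],
    "fiscalDateEnding": ["2024-09-28", "2024-12-28"],
})
converted = convert_types(sample)
print(converted.dtypes)
```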


In [36]:
# describing the data

full_earnings_cal.describe()


Unnamed: 0,revenue
count,12200.0
mean,11228720000.0
std,17312600000.0
min,-24053000000.0
25%,2449300000.0
50%,5516500000.0
75%,12320300000.0
max,173388000000.0
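`describe()` summarises only `revenue` here, because it is the only `float64` column. `describe(include="all")` would also summarise the object columns. The minimum revenue is negative (-24,053,000,000), which is unusual for a revenue figure and worth checking row by row before modelling. The sketch below shows one way to list such rows. The sample rows are made up.

```python
import pandas as pd

cal = pd.DataFrame({
    "symbol": ["AAA", "BBB", "CCC"],
    "date": ["2024-01-30", "2024-02-01", "2024-02-05"],
    "revenue": [5.0e9, -2.4e10, 1.2e10],
})
negative = cal[cal["revenue"] < 0]  # rows to inspect before modelling
print(negative[["symbol", "date", "revenue"]])
```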


In [37]:
# saving the data as a csv file named financial_data_EarningsCal.csv in the Datasets folder

full_earnings_cal.to_csv('data/Datasets/financial_data_EarningsCal.csv', index=False)
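`to_csv` does not create missing folders. Also, the input was read from `../Data/Datasets/` while the output is written to `data/Datasets/`, and these may resolve to different folders depending on the working directory and letter case. The sketch below creates the parent folder before writing. The helper name `save_csv` is hypothetical.

```python
import tempfile
from pathlib import Path

import pandas as pd

def save_csv(df: pd.DataFrame, path: str) -> Path:
    """Hypothetical helper: write df to path as CSV, creating the parent folder if needed."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    df.to_csv(target, index=False)
    return target

# demo in a temporary folder
with tempfile.TemporaryDirectory() as tmp:
    out = save_csv(pd.DataFrame({"a": [1, 2]}), f"{tmp}/Datasets/example.csv")
    existed = out.exists()
    roundtrip = pd.read_csv(out)
print(existed)  # True
```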