### Creating Financial Datasets [Prices]

In this notebook, I will create a script to download price data from the EODHD API.
The API website is [EODHD API](https://eodhd.com/).
The API key will be stored in the .env file as EODHD_API_KEY.
The date range of the data will match the date range of the 'financial_data_EarningsCal.csv' file stored in the data folder. The company symbols from the same file will be used to download the price data.

The API will be accessed through a function that iterates over the company symbols and requests prices for the required date range.

The final dataset will be stored in the data folder as 'financial_data_prices.csv'.

Let's get started!

In [29]:
import os
import pandas as pd
from datetime import datetime
from dotenv import load_dotenv

# Load environment variables from .env file

load_dotenv()

# Get the API key from environment variable

api_key = os.getenv('EODHD_API_KEY')
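If the .env file is missing or the variable name is misspelled, `os.getenv` returns `None` without complaint, and the problem only shows up later as a rejected API request. A minimal sketch of an early check follows; `require_env` is a hypothetical helper name, not part of the original notebook.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise if it is unset or empty.

    Hypothetical helper for illustration; not part of the original notebook.
    """
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name!r} is not set; check the .env file.")
    return value


# Usage (assumes load_dotenv() has already been called):
# api_key = require_env("EODHD_API_KEY")
```

Failing at this point keeps the error close to its cause instead of surfacing inside the download loop.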


In [30]:
# load datasets to get the date range and company symbols

filepath = '../Data/Datasets/financial_data_EarningsCal.csv'

financial_data = pd.read_csv(filepath)



In [31]:
# check the data

financial_data


Unnamed: 0,date,symbol,eps,epsEstimated,time,revenue,revenueEstimated,updatedFromDate,fiscalDateEnding
0,2025-10-29,AAPL,,,amc,,,2024-12-09,2025-09-28
1,2025-07-30,AAPL,,,amc,,,2024-12-09,2025-06-28
2,2025-04-30,AAPL,,,amc,,,2024-12-09,2025-03-28
3,2025-02-06,AAPL,,2.36,amc,,,2024-12-09,2024-12-28
4,2024-10-31,AAPL,1.64,1.60,amc,9.493000e+10,9.451195e+10,2024-12-09,2024-09-28
...,...,...,...,...,...,...,...,...,...
13257,2014-08-05,ZTS,0.38,0.39,bmo,1.158000e+09,1.158000e+09,2024-11-21,2014-06-29
13258,2014-05-06,ZTS,0.38,0.37,bmo,1.097000e+09,1.068132e+09,2024-11-21,2014-03-30
13259,2014-02-11,ZTS,0.36,0.35,bmo,1.254000e+09,1.184333e+09,2024-11-21,2013-12-31
13260,2013-11-05,ZTS,0.34,0.35,bmo,1.103000e+09,1.103000e+09,2024-11-21,2013-09-29


In [32]:
# get the date range from the financial data: the earliest fiscal period end date and the latest earnings date

min_date = financial_data['fiscalDateEnding'].min()

max_date = financial_data['date'].max()

min_date, max_date

('1993-12-30', '2026-10-28')
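The comparison above runs on strings, which is safe here because ISO `YYYY-MM-DD` dates sort in chronological order. If the dates need to be used for arithmetic or filtering later, parsing them first may be more convenient. A small sketch, using a few sample rows rather than the full file (the names `sample`, `parsed`, `min_ts` and `max_ts` are illustrative):

```python
import pandas as pd

# Small illustrative frame with the same column names as the earnings file;
# these are a handful of sample rows, not the full dataset.
sample = pd.DataFrame(
    {
        "date": ["2025-10-29", "2024-10-31", "2013-11-05"],
        "fiscalDateEnding": ["2025-09-28", "2024-09-28", "2013-09-29"],
    }
)

# Parse the ISO strings so comparisons are made on real timestamps.
parsed = sample[["date", "fiscalDateEnding"]].apply(pd.to_datetime)

# Earliest fiscal period end and latest earnings date, mirroring the cell above.
min_ts = parsed["fiscalDateEnding"].min()
max_ts = parsed["date"].max()

print(min_ts.date(), max_ts.date())  # 2013-09-29 2025-10-29
```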

In [33]:
# get a unique list of dates and rank in ascending order

dates = financial_data['fiscalDateEnding'].unique()

dates = sorted(dates)

dates

['1993-12-30',
 '1994-06-30',
 '1994-12-30',
 '1995-06-30',
 '1996-06-30',
 '2000-12-31',
 '2001-06-30',
 '2001-12-30',
 '2001-12-31',
 '2002-06-30',
 '2002-12-30',
 '2002-12-31',
 '2003-01-26',
 '2003-04-27',
 '2003-06-30',
 '2003-07-27',
 '2003-10-26',
 '2003-12-30',
 '2003-12-31',
 '2004-06-30',
 '2004-09-30',
 '2004-12-30',
 '2004-12-31',
 '2005-03-31',
 '2005-06-30',
 '2005-09-30',
 '2005-12-31',
 '2006-03-31',
 '2006-06-30',
 '2006-07-30',
 '2006-09-30',
 '2006-10-24',
 '2006-12-30',
 '2006-12-31',
 '2007-03-31',
 '2007-06-30',
 '2007-07-24',
 '2007-09-30',
 '2007-12-31',
 '2008-01-23',
 '2008-03-31',
 '2008-06-30',
 '2008-07-23',
 '2008-09-30',
 '2008-12-31',
 '2009-03-31',
 '2009-06-30',
 '2009-09-30',
 '2009-12-31',
 '2010-03-31',
 '2010-06-30',
 '2010-08-01',
 '2010-09-30',
 '2010-12-31',
 '2011-03-31',
 '2011-06-30',
 '2011-09-30',
 '2011-12-31',
 '2012-03-31',
 '2012-06-30',
 '2012-09-28',
 '2012-09-30',
 '2012-12-28',
 '2012-12-31',
 '2013-02-09',
 '2013-03-29',
 '2013-03-

In [34]:
# view the data based on symbol provided

financial_data[financial_data['symbol']=='NVDA']

Unnamed: 0,date,symbol,eps,epsEstimated,time,revenue,revenueEstimated,updatedFromDate,fiscalDateEnding
8515,2025-08-26,NVDA,,,amc,,,2024-12-07,2025-07-27
8516,2025-05-20,NVDA,,,amc,,,2024-12-07,2025-04-27
8517,2025-02-19,NVDA,,0.83,amc,,38011010000.0,2024-12-07,2025-01-27
8518,2024-11-20,NVDA,0.81,0.75,amc,35082000000.0,33171320000.0,2024-12-07,2024-10-27
8519,2024-08-28,NVDA,0.68,0.64,amc,30040000000.0,28779510000.0,2024-11-28,2024-07-28
8520,2024-05-22,NVDA,0.61,0.56,amc,26044000000.0,24592800000.0,2024-11-25,2024-04-28
8521,2024-02-21,NVDA,0.52,0.46,amc,22103000000.0,20238800000.0,2024-11-25,2024-01-28
8522,2023-11-21,NVDA,0.4,0.34,amc,18120000000.0,15194600000.0,2024-11-25,2023-10-29
8523,2023-08-23,NVDA,0.27,0.21,amc,13507000000.0,11224000000.0,2024-11-25,2023-07-30
8524,2023-05-24,NVDA,0.11,0.09,amc,7192000000.0,6516830000.0,2024-11-25,2023-04-30


In [35]:
# count the number of rows (one per symbol) for each fiscal period end date

financial_data.groupby('fiscalDateEnding')['symbol'].count()


fiscalDateEnding
1993-12-30     1
1994-06-30     1
1994-12-30     1
1995-06-30     1
1996-06-30     1
              ..
2025-12-30    10
2025-12-31     1
2026-03-30     4
2026-06-30     5
2026-09-30     4
Name: symbol, Length: 847, dtype: int64
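The grouping above produces one row per individual fiscal period end date (847 distinct dates). To see how many distinct symbols fall in each calendar year instead, the year can be extracted first. A sketch on a few made-up rows (the sample values and the name `symbols_per_year` are assumptions for illustration):

```python
import pandas as pd

# Illustrative rows only; the column names match the earnings file.
sample = pd.DataFrame(
    {
        "symbol": ["AAPL", "AAPL", "NVDA", "ZTS", "ZTS"],
        "fiscalDateEnding": [
            "2024-09-28",
            "2024-12-28",
            "2024-10-27",
            "2014-06-29",
            "2014-03-30",
        ],
    }
)

# Calendar year of each fiscal period end date.
fiscal_year = pd.to_datetime(sample["fiscalDateEnding"]).dt.year.rename("fiscal_year")

# Number of distinct symbols reporting in each year.
symbols_per_year = sample.groupby(fiscal_year)["symbol"].nunique()

print(symbols_per_year.to_dict())  # {2014: 1, 2024: 2}
```

Using `nunique()` rather than `count()` avoids counting the same company several times when it reports multiple quarters in a year.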

In [36]:
# creating a unique list of symbols based on the dataset

symbols = financial_data['symbol'].unique()

# printing the count of symbols

len(symbols)

282

In [37]:
# Create a function to get time series price data for each symbol.
# The function iterates through the list of symbols and requests the price data for each one.
# It takes a dataframe as an input and appends the data for each symbol to it; an empty dataframe should be defined before the call and passed in.
# Data is appended after each API call, so if a later request times out the previously downloaded data is not lost. Once the iteration is complete, the function returns the dataframe with the data for all symbols.
# The function takes the start date and end date as inputs.
# The API used is the EODHD API historical data endpoint. To handle timeouts, the function includes retry logic with a sleep/delay between retries. The data is requested at daily frequency. An example endpoint call: https://eodhd.com/api/eod/MCD.US?from=2020-01-05&to=2020-02-10&period=d&api_token=sample_API&fmt=json
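The example endpoint above can be assembled from its parts instead of being written as one long string. The sketch below uses only the standard library; `build_eod_url`, `BASE_URL` and the `exchange` default are illustrative names, not part of the EODHD client or the original notebook.

```python
from urllib.parse import urlencode

BASE_URL = "https://eodhd.com/api/eod"


def build_eod_url(symbol, start_date, end_date, api_token, exchange="US"):
    """Build a historical end-of-day URL shaped like the example endpoint.

    Hypothetical helper for illustration only.
    """
    query = urlencode(
        {
            "from": start_date,
            "to": end_date,
            "period": "d",  # daily bars, as in the example endpoint
            "api_token": api_token,
            "fmt": "json",
        }
    )
    return f"{BASE_URL}/{symbol}.{exchange}?{query}"


print(build_eod_url("MCD", "2020-01-05", "2020-02-10", "sample_API"))
# https://eodhd.com/api/eod/MCD.US?from=2020-01-05&to=2020-02-10&period=d&api_token=sample_API&fmt=json
```

Keeping the query parameters in a dictionary makes it easier to see which values vary per symbol and which stay fixed.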

In [38]:
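import requests
import time
import pandas as pd

def get_price_data(symbols, start_date, end_date, price_data, retries=3, delay=5, timeout=10):
    """Download daily price data for each symbol and append it to price_data."""
    for symbol in symbols:
        # period=d requests daily data; api_key is the module-level key loaded from .env
        url = f'https://eodhd.com/api/eod/{symbol}.US?from={start_date}&to={end_date}&period=d&api_token={api_key}&fmt=json'
        for attempt in range(retries):
            try:
                response = requests.get(url, timeout=timeout)
                response.raise_for_status()  # Raise an error for bad status codes
                data = response.json()
                temp_data = pd.DataFrame(data)
                temp_data['symbol'] = symbol
                # Append this symbol's rows so earlier results are kept if a later request fails
                price_data = pd.concat([price_data, temp_data], ignore_index=True)
                break  # Exit the retry loop if the request is successful
            except requests.exceptions.RequestException as e:
                print(f'Attempt {attempt + 1} of {retries} failed for {symbol}: {e}')
                if attempt < retries - 1:
                    print(f'Retrying in {delay} seconds...')
                    time.sleep(delay)
        else:
            # Runs only when no attempt succeeded (the retry loop never reached break)
            print(f'Giving up on {symbol} after {retries} attempts.')
    return price_data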
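The function above keeps all results in memory until it returns, so an interrupted kernel would still lose the downloaded data. One possible safeguard is to append each symbol's rows to a CSV file as soon as they arrive. The sketch below shows only the append step on made-up frames; `append_to_csv`, the file name and the `AAA`/`BBB` rows are assumptions for illustration, not real price data.

```python
import os
import tempfile

import pandas as pd


def append_to_csv(frame, path):
    """Append `frame` to the CSV at `path`, writing the header only for a new or empty file."""
    write_header = not os.path.exists(path) or os.path.getsize(path) == 0
    frame.to_csv(path, mode="a", header=write_header, index=False)


# Demonstration with two made-up single-row frames standing in for API responses.
with tempfile.TemporaryDirectory() as tmp:
    checkpoint = os.path.join(tmp, "prices_checkpoint.csv")
    append_to_csv(
        pd.DataFrame({"date": ["2024-10-01"], "close": [1.0], "symbol": ["AAA"]}),
        checkpoint,
    )
    append_to_csv(
        pd.DataFrame({"date": ["2024-10-01"], "close": [2.0], "symbol": ["BBB"]}),
        checkpoint,
    )
    restored = pd.read_csv(checkpoint)

print(restored["symbol"].tolist())  # ['AAA', 'BBB']
```

Calling such a helper right after each successful request would leave a usable partial file on disk even if the run stops midway.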



In [39]:
# Example usage:
# Define an empty DataFrame to store the price data
price_data_test = pd.DataFrame()

# Define the list of symbols, start date, and end date
symbols_test = ['AAPL']
start_date_test = '2024-10-01'
end_date_test = '2024-10-10'


In [40]:
# Call the function to get the price data
price_data = get_price_data(symbols_test, start_date_test, end_date_test, price_data_test)
price_data

Unnamed: 0,date,open,high,low,close,adjusted_close,volume,symbol
0,2024-10-01,229.52,229.65,223.74,226.21,225.9614,63285000,AAPL
1,2024-10-02,225.89,227.37,223.02,226.78,226.5308,32880600,AAPL
2,2024-10-03,225.14,226.81,223.32,225.67,225.422,34044200,AAPL
3,2024-10-04,227.9,228.0,224.13,226.8,226.5507,37245100,AAPL
4,2024-10-07,224.5,225.69,221.33,221.69,221.4464,39505400,AAPL
5,2024-10-08,224.3,225.98,223.25,225.77,225.5219,31855700,AAPL
6,2024-10-09,225.23,229.75,224.83,229.54,229.2877,33591100,AAPL
7,2024-10-10,227.78,229.5,227.17,229.04,228.7883,28183500,AAPL


In [41]:
# Now that the function works on the test case, I will use it to get the price data for all the symbols in the dataset.

# Define an empty DataFrame to store the price data

price_data = pd.DataFrame()

# Define the start date and end date

end_date = '2024-12-01' # end date is close to the present date at the time the data was created
start_date = '2013-01-01' # roughly 12 years of data to match the earnings data


In [42]:
# Call the function to get the price data using the symbols from the dataset, start date, and end date and the empty dataframe defined above

price_data = get_price_data(symbols, start_date, end_date, price_data)

# view the data, first 100 rows

price_data.head(100)



Unnamed: 0,date,open,high,low,close,adjusted_close,volume,symbol
0,2013-01-02,553.8204,554.9992,541.6292,549.0296,16.6873,560518000.0,AAPL
1,2013-01-03,547.8788,549.6708,540.9992,542.0996,16.4767,352965200.0,AAPL
2,2013-01-04,536.9700,538.6304,525.8288,526.9992,16.0177,594333600.0,AAPL
3,2013-01-07,522.0012,529.3008,515.2000,523.8996,15.9235,484156400.0,AAPL
4,2013-01-08,529.2112,531.8908,521.2508,525.3108,15.9664,458707200.0,AAPL
...,...,...,...,...,...,...,...,...
95,2013-05-20,431.9112,445.7992,430.0996,442.9292,13.6306,451578400.0,AAPL
96,2013-05-21,438.1496,445.4800,434.1988,439.6588,13.5299,456022000.0,AAPL
97,2013-05-22,444.0492,448.3500,438.2196,441.3500,13.5820,443038400.0,AAPL
98,2013-05-23,435.9488,446.1604,435.7892,442.1396,13.6063,353021200.0,AAPL


In [43]:
# save the data to the data folder

price_data.to_csv('../Data/Datasets/financial_data_prices.csv', index=False)


In [44]:
# describe the data

price_data.describe()



Unnamed: 0,open,high,low,close,adjusted_close,volume
count,784473.0,784473.0,784473.0,784473.0,784473.0,784473.0
mean,1414.766254,1424.703154,1403.45569,1414.094327,1378.542523,9142435.0
std,22115.370079,22266.947014,21939.138705,22101.311995,22102.848235,36246660.0
min,1.62,1.69,0.0013,1.62,0.2764,0.0
25%,46.0,46.48,45.54,46.02,29.8635,1236179.0
50%,83.16,83.97,82.3631,83.19,61.1057,2810000.0
75%,160.6,162.2701,158.94,160.61,127.3083,6818800.0
max,730090.82,741971.39,723050.0,724040.0,724040.0,3692928000.0
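The summary above shows a minimum `low` of 0.0013 and a minimum `volume` of 0, which may point to bad ticks or days without trading and could be worth inspecting before the data is used. A rough sketch of how such rows might be flagged, on made-up rows; the 50% threshold and the names `suspicious_low`, `zero_volume` and `flagged` are arbitrary assumptions:

```python
import pandas as pd

# Made-up rows with the same column names as the price data.
sample = pd.DataFrame(
    {
        "symbol": ["AAA", "AAA", "BBB"],
        "open": [10.0, 10.5, 20.0],
        "high": [10.8, 10.9, 20.5],
        "low": [9.9, 0.0013, 19.5],
        "close": [10.5, 10.7, 20.1],
        "volume": [1000.0, 0.0, 500.0],
    }
)

# A low far below the same day's open is treated as suspicious
# (the 50% cut-off is an arbitrary choice for this sketch).
suspicious_low = sample["low"] < 0.5 * sample["open"]

# Days with no traded volume.
zero_volume = sample["volume"] == 0

flagged = sample[suspicious_low | zero_volume]
print(flagged.index.tolist())  # [1]
```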


In [45]:
# check the data types

price_data.dtypes


date               object
open              float64
high              float64
low               float64
close             float64
adjusted_close    float64
volume            float64
symbol             object
dtype: object
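The `date` column is stored as `object` (strings) and `volume` as `float64`. For time series work it may help to convert them. A sketch on two rows taken from the AAPL test output earlier, assuming the volume column contains no missing or fractional values (the name `typed` is illustrative):

```python
import pandas as pd

# Two rows from the AAPL test download, with volume as float to mimic the full dataset.
sample = pd.DataFrame(
    {
        "date": ["2024-10-01", "2024-10-02"],
        "volume": [63285000.0, 32880600.0],
        "symbol": ["AAPL", "AAPL"],
    }
)

typed = sample.assign(
    date=pd.to_datetime(sample["date"]),         # strings -> timestamps
    volume=sample["volume"].astype("Int64"),     # nullable integer type
    symbol=sample["symbol"].astype("category"),  # repeated tickers stored compactly
)

print(typed.dtypes)
```

The nullable `Int64` type is used here so that any missing volumes in the full dataset would not force the column back to float.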