### Creating Financial Datasets [Share Splits]

While exploring the data, I noticed that prices for some stocks are not consistent over time because the number of outstanding shares changes through stock splits. Although splits are a routine corporate action, the resulting price discontinuities can distort the data analysis and hurt model performance.

To correct this, I will create a dict of stock splits and adjust the prices accordingly. I will then save the data as a CSV file in the data folder named financial_data_ShareSplits.csv.
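
To illustrate the adjustment, here is a minimal sketch for a single split. The prices are made up for illustration (not real market data) and the variable names are my own. A 4-for-1 split makes the raw price drop to roughly a quarter overnight; dividing the pre-split prices by the split ratio removes that jump.

```python
import pandas as pd

# Hypothetical closing prices around a 4-for-1 split (illustrative numbers only)
prices = pd.Series(
    [400.0, 404.0, 102.0, 103.0],
    index=pd.to_datetime(["2020-08-27", "2020-08-28", "2020-08-31", "2020-09-01"]),
)
split_date = pd.Timestamp("2020-08-31")
split_ratio = 4 / 1  # numerator / denominator

# Keep prices on or after the split date; divide earlier prices by the ratio
adjusted = prices.where(prices.index >= split_date, prices / split_ratio)
print(adjusted)
```

After the adjustment the series moves from 100 to 103 without the artificial break at the split date.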

In [43]:
# import libraries

import pandas as pd


In [44]:
# importing cleaned company profile data

filepath='../Data/Datasets/company_profile_cleaned_50B.csv'

company_profile_data = pd.read_csv(filepath)



In [45]:
# creating a list of stock symbols based on the company profile data

stocks = company_profile_data['symbol'].tolist()

stocks[:5]


['NVDA', 'AAPL', 'MSFT', 'AMZN', 'GOOGL']

In [46]:
# loading the API key from the .env file

import requests
import os
from dotenv import load_dotenv
import time

# Load environment variables from .env file
load_dotenv()

# Get the API key from environment variable
api_key = os.getenv('FMP_API_KEY')
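
`os.getenv` returns `None` when the variable is missing, and that would only surface later as failed API calls. A small guard can fail early instead. This is a sketch; `require_env` is a hypothetical helper name of my own, not part of `dotenv` or the API client.

```python
import os

def require_env(name):
    """Return the value of an environment variable, or raise a clear error if it is unset."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set; check the .env file.")
    return value
```

It could be used as `api_key = require_env('FMP_API_KEY')` right after `load_dotenv()`.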


In [47]:
# Create a function that connects to the Financial Modeling Prep (FMP) API and downloads the stock split history for each stock in the list defined above.
# The function starts with an empty DataFrame (split_data) and appends each stock's data to it after every API call, so data already downloaded is kept if a later call fails.
# Once the iteration is complete, the function returns the DataFrame with the data for all the stocks. The Splits_Historical endpoint of the FMP API is used to download the data.
# To cope with API timeouts, the function retries failed calls and sleeps between retries.
# Example endpoint: https://financialmodelingprep.com/api/v3/historical-price-full/stock_split/AAPL?apikey=demo_key


In [48]:
# creating a function to download the stock split history

def get_financial_data(stocks, retries=3, delay=5, timeout=10):
    """
    Connects to the Financial Modeling Prep API and downloads the stock split history for each stock.
    Iterates through the list of stocks and appends the data to a DataFrame.
    Retries the API call if it fails and adds a delay between retries.

    Parameters:
    stocks (list): List of stock symbols.
    retries (int): Maximum number of attempts per stock.
    delay (int): Delay in seconds between attempts.
    timeout (int): Timeout for the API call in seconds.

    Returns:
    pd.DataFrame: DataFrame containing the split history for all stocks.
    """
    split_data = pd.DataFrame()

    for stock in stocks:
        url = f"https://financialmodelingprep.com/api/v3/historical-price-full/stock_split/{stock}?apikey={api_key}"
        for i in range(retries):
            try:
                response = requests.get(url, timeout=timeout)
                # Treat HTTP error responses as failures so they are retried
                response.raise_for_status()
                data = response.json()
                # A stock without recorded splits may have an empty or missing 'historical' list
                stock_data = pd.DataFrame(data.get('historical', []))
                if not stock_data.empty:
                    # Add the 'symbol' column
                    stock_data['symbol'] = stock
                    # Concatenate the new data with the existing data
                    split_data = pd.concat([split_data, stock_data], ignore_index=True)
                break
            except Exception as e:
                print(f"Error for stock {stock} on try {i+1}/{retries}. Exception: {e}")
                # Wait before the next attempt, but not after the final one
                if i < retries - 1:
                    time.sleep(delay)

    return split_data
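
As a possible variation on the retry logic, the waiting could be factored into a reusable helper that increases the delay after each failed attempt. This is only a sketch: `call_with_retries` and its `backoff` parameter are hypothetical names of my own, not part of `requests` or the FMP API.

```python
import time

def call_with_retries(func, retries=3, delay=5, backoff=2):
    """Call func(); on failure wait and retry, re-raising the last error if every attempt fails."""
    if retries < 1:
        raise ValueError("retries must be at least 1")
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            return func()
        except Exception as error:
            last_error = error
            # Sleep only when another attempt will follow; the wait grows each time
            if attempt < retries:
                time.sleep(delay * backoff ** (attempt - 1))
    raise last_error
```

With the defaults, the waits between attempts would be 5 and then 10 seconds. A call such as `call_with_retries(lambda: requests.get(url, timeout=10))` would wrap a single request.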

In [49]:
# Testing the newly created function by downloading the split history for the first 5 stocks in the list.

financial_data_test = get_financial_data(stocks[:5])

# display the test results

financial_data_test


| | date | label | numerator | denominator | symbol |
|---|---|---|---|---|---|
| 0 | 2024-06-10 | June 10, 24 | 10 | 1 | NVDA |
| 1 | 2021-07-20 | July 20, 21 | 4 | 1 | NVDA |
| 2 | 2007-09-11 | September 11, 07 | 3 | 2 | NVDA |
| 3 | 2006-04-07 | April 07, 06 | 2 | 1 | NVDA |
| 4 | 2001-09-17 | September 17, 01 | 2 | 1 | NVDA |
| 5 | 2000-06-27 | June 27, 00 | 2 | 1 | NVDA |
| 6 | 2020-08-31 | August 31, 20 | 4 | 1 | AAPL |
| 7 | 2014-06-09 | June 09, 14 | 7 | 1 | AAPL |
| 8 | 2005-02-28 | February 28, 05 | 2 | 1 | AAPL |
| 9 | 2000-06-21 | June 21, 00 | 2 | 1 | AAPL |
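
The numerator and denominator columns together describe each split, for example 3 and 2 for a 3-for-2 split. A short sketch of turning them into a single ratio, using three rows copied from the output above; the `split_ratio` column name is my own choice.

```python
import pandas as pd

# Three rows copied from the test output above
sample = pd.DataFrame({
    "date": ["2024-06-10", "2007-09-11", "2020-08-31"],
    "numerator": [10, 3, 4],
    "denominator": [1, 2, 1],
    "symbol": ["NVDA", "NVDA", "AAPL"],
})

# Parse the dates so they can later be compared with price dates
sample["date"] = pd.to_datetime(sample["date"])

# New shares received per old share
sample["split_ratio"] = sample["numerator"] / sample["denominator"]
print(sample)
```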


In [50]:
# The test was successful, so I will now download the split history for all the stocks in the list.

full_split_data = get_financial_data(stocks)


In [51]:
# sorting the data by symbol in ascending order and date in descending order

full_split_data = full_split_data.sort_values(by=['symbol', 'date'], ascending=[True, False])

full_split_data

| | date | label | numerator | denominator | symbol |
|---|---|---|---|---|---|
| 6 | 2020-08-31 | August 31, 20 | 4.0 | 1.0 | AAPL |
| 7 | 2014-06-09 | June 09, 14 | 7.0 | 1.0 | AAPL |
| 8 | 2005-02-28 | February 28, 05 | 2.0 | 1.0 | AAPL |
| 9 | 2000-06-21 | June 21, 00 | 2.0 | 1.0 | AAPL |
| 10 | 1987-06-16 | June 16, 87 | 2.0 | 1.0 | AAPL |
| ... | ... | ... | ... | ... | ... |
| 62 | 2001-07-19 | July 19, 01 | 2.0 | 1.0 | XOM |
| 63 | 1997-04-14 | April 14, 97 | 2.0 | 1.0 | XOM |
| 64 | 1987-09-15 | September 15, 87 | 2.0 | 1.0 | XOM |
| 65 | 1981-06-12 | June 12, 81 | 2.0 | 1.0 | XOM |


In [54]:
# saving the split history as a CSV file named split_history.csv in the Data/Datasets folder

full_split_data.to_csv('../Data/Datasets/split_history.csv', index=False)
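
As a sketch of how the saved split history could feed the price adjustment, the splits can be combined into a cumulative factor per symbol. The example below uses the five AAPL rows from the output above; the `split_ratio` and `cumulative_factor` column names are assumptions of mine. With the rows sorted newest first, the running product on each row is the factor by which prices dated before that split (and after the previous one) would be divided.

```python
import pandas as pd

# AAPL rows copied from the split history output above
splits = pd.DataFrame({
    "symbol": ["AAPL"] * 5,
    "date": pd.to_datetime(
        ["2020-08-31", "2014-06-09", "2005-02-28", "2000-06-21", "1987-06-16"]
    ),
    "numerator": [4.0, 7.0, 2.0, 2.0, 2.0],
    "denominator": [1.0] * 5,
})
splits["split_ratio"] = splits["numerator"] / splits["denominator"]

# Sort newest first within each symbol, then take the running product of ratios
splits = splits.sort_values(["symbol", "date"], ascending=[True, False])
splits["cumulative_factor"] = splits.groupby("symbol")["split_ratio"].cumprod()
print(splits[["symbol", "date", "split_ratio", "cumulative_factor"]])
```

For these rows the factors are 4, 28, 56, 112 and 224, so a price recorded before the 1987 split would be divided by 224 to be comparable with current prices.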