### Creating Financial Datasets [Enterprise Value]

In this notebook, we will create financial datasets that will be used in the subsequent notebooks to build predictive data models.

The notebook will start with company profile data created previously.

The company profile data will be cleaned to include only stocks and to keep the qualitative information required for future model development.

For the remaining stocks, I will write code that connects to the Financial Modeling Prep API and downloads financial data for each stock.

The financial data will be:

1. Historical stock prices __[in this notebook - Enterprise Value]__
2. Financial statements key metrics
3. Financial statement ratios
4. Financial growth

Each of these datasets will be saved as a separate CSV file.

The final dataset will be a combination of all the above datasets.
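Combining the datasets would most likely be a join on the columns they share. The sketch below assumes that every dataset carries `symbol` and `date` columns with matching quarter-end dates; the toy frames and the `peRatio` column are illustrative placeholders rather than downloaded data.

```python
import pandas as pd

# Toy stand-ins for two of the saved datasets (illustrative values only)
ev = pd.DataFrame({
    'symbol': ['AAPL', 'AAPL', 'MSFT'],
    'date': ['2024-09-28', '2024-06-29', '2024-09-30'],
    'enterpriseValue': [100.0, 90.0, 200.0],
})
ratios = pd.DataFrame({
    'symbol': ['AAPL', 'MSFT', 'MSFT'],
    'date': ['2024-09-28', '2024-09-30', '2024-06-30'],
    'peRatio': [30.0, 35.0, 33.0],  # hypothetical column name
})

# Inner join keeps only (symbol, date) pairs present in both datasets;
# validate='one_to_one' raises if either side repeats a (symbol, date) pair
combined = ev.merge(ratios, on=['symbol', 'date'], how='inner', validate='one_to_one')
print(combined)
```

An outer join (`how='outer'`) would instead keep quarters that are missing from one of the datasets, filling the gaps with `NaN`.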

In [34]:
# import libraries

import pandas as pd


In [35]:
# importing cleaned company profile data

filepath='data/Datasets/company_profile_cleaned_50B.csv'

company_profile_data = pd.read_csv(filepath)

#checking data on second dataset for quality check
#filepath_1='data/Datasets/processed_quant_data.csv'
 
#quant_data = pd.read_csv(filepath_1, low_memory=False)


In [36]:
# checking the data of company profile data

company_profile_data.head()


| | symbol | price | beta | mktCap | companyName | currency | cik | isin | cusip | exchange | ... | sector | country | city | state | zip | isEtf | isActivelyTrading | isAdr | isFund | date |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | NVDA | 141.98 | 1.657 | 3e-06 | NVIDIA Corporation | USD | 1045810.0 | US67066G1040 | 67066G104 | NASDAQ Global Select | ... | Technology | US | Santa Clara | CA | 95051 | False | True | False | False | 2024-12-02 |
| 1 | AAPL | 225.0 | 1.24 | 3e-06 | Apple Inc. | USD | 320193.0 | US0378331005 | 037833100 | NASDAQ Global Select | ... | Technology | US | Cupertino | CA | 95014 | False | True | False | False | 2024-12-02 |
| 2 | MSFT | 415.0 | 0.904 | 3e-06 | Microsoft Corporation | USD | 789019.0 | US5949181045 | 594918104 | NASDAQ Global Select | ... | Technology | US | Redmond | WA | 98052-6399 | False | True | False | False | 2024-12-02 |
| 3 | AMZN | 202.61 | 1.146 | 2e-06 | Amazon.com, Inc. | USD | 1018724.0 | US0231351067 | 023135106 | NASDAQ Global Select | ... | Consumer Cyclical | US | Seattle | WA | 98109-5210 | False | True | False | False | 2024-12-02 |
| 4 | GOOGL | 172.49 | 1.034 | 2e-06 | Alphabet Inc. | USD | 1652044.0 | US02079K3059 | 02079K305 | NASDAQ Global Select | ... | Communication Services | US | Mountain View | CA | 94043 | False | True | False | False | 2024-12-02 |


In [37]:
# creating a list of stocks symbols based on the company profile data

stocks = company_profile_data['symbol'].tolist()


In [38]:
# Next, a function that connects to the Financial Modeling Prep API and downloads financial data for each stock in the list defined above.
# - `limit` is a parameter, so more or less data can be downloaded as required.
# - Results are appended to an initially empty DataFrame defined as a global variable outside the function, so it can be accessed and modified outside the function.
# - Data is appended after each API call, so if the API times out, previously downloaded data is not lost.
# - Once the loop is complete, the function returns the DataFrame with data for all stocks.
# The data comes from the Enterprise Values endpoint on a quarterly basis, e.g. https://financialmodelingprep.com/api/v3/enterprise-values/AAPL/?period=quarter&limit=100&apikey=demo
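The endpoint URL can also be assembled with `urllib.parse.urlencode`, which keeps the query parameters in one dictionary and URL-encodes them. This is a minimal sketch; `build_ev_url` is a hypothetical helper and `demo` is the placeholder key from the example URL.

```python
from urllib.parse import urlencode

BASE_URL = 'https://financialmodelingprep.com/api/v3/enterprise-values'

def build_ev_url(symbol, limit, api_key, period='quarter'):
    """Return the Enterprise Values URL for one symbol (hypothetical helper)."""
    # urlencode keeps the dict's insertion order: period, limit, apikey
    query = urlencode({'period': period, 'limit': limit, 'apikey': api_key})
    return f'{BASE_URL}/{symbol}/?{query}'

print(build_ev_url('AAPL', 100, 'demo'))
# → https://financialmodelingprep.com/api/v3/enterprise-values/AAPL/?period=quarter&limit=100&apikey=demo
```

Passing the same dictionary to `requests.get(url, params=...)` achieves the same result without building the string by hand.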


In [None]:
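# loading the API key from a .env file (kept out of version control via .gitignore)

import requests
import os
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()

# Get the API key from environment variable and fail early if it is missing
api_key = os.getenv('FMP_API_KEY')
if not api_key:
    raise ValueError('FMP_API_KEY is not set; add it to the .env file')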



# defining global variable

financial_data = pd.DataFrame()


In [None]:

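# defining function

def get_financial_data(stocks, limit):
    """Download quarterly enterprise-value data for each symbol in `stocks`.

    Rows are appended to the global `financial_data` after every API call,
    so data already downloaded is kept if a later request fails.
    """
    global financial_data
    for stock in stocks:
        url = f'https://financialmodelingprep.com/api/v3/enterprise-values/{stock}/?period=quarter&limit={limit}&apikey={api_key}'
        response = requests.get(url, timeout=30)  # avoid hanging indefinitely
        response.raise_for_status()               # stop on HTTP errors
        data = response.json()
        # the endpoint returns a list of records; skip error messages or empty results
        if not isinstance(data, list) or not data:
            print(f'No data returned for {stock}')
            continue
        financial_data = pd.concat([financial_data, pd.DataFrame(data)], ignore_index=True)
    return financial_data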
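The global DataFrame works, but `pd.concat` inside the loop copies the accumulated rows on every iteration, and re-running the cell appends duplicates. A sketch of an alternative design, assuming a `fetch` callable that returns the list of JSON records for one symbol (for example a small function wrapping `requests.get(url, timeout=30).json()`): per-symbol frames go into a dictionary owned by the caller, so finished symbols survive a failure and a re-run skips them, and concatenation happens once at the end. `collect_records` and `fetch` are hypothetical names.

```python
import pandas as pd

def collect_records(symbols, fetch, results):
    """Fetch records for each symbol into `results` (a dict the caller owns).

    Symbols already in `results` are skipped, so a re-run after a failure
    resumes where it stopped instead of duplicating rows.
    """
    failed = []
    for symbol in symbols:
        if symbol in results:
            continue
        try:
            records = fetch(symbol)
        except Exception as exc:  # e.g. requests.Timeout; log it and keep going
            print(f'{symbol}: {exc}')
            failed.append(symbol)
            continue
        if isinstance(records, list) and records:
            results[symbol] = pd.DataFrame(records)
    frames = list(results.values())
    combined = pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
    return combined, failed

# Usage with a fake fetcher standing in for the API call
def fake_fetch(symbol):
    if symbol == 'BAD':
        raise TimeoutError('simulated timeout')
    return [{'symbol': symbol, 'date': '2024-09-30', 'enterpriseValue': 1.0}]

results = {}
df, failed = collect_records(['AAPL', 'BAD', 'MSFT'], fake_fetch, results)
print(df, failed)
```

Calling `collect_records(failed, fetch, results)` afterwards retries only the symbols that errored.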


In [40]:
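# Testing the function: download data for the first 5 stocks in the list, limited to 10 records per stock.
# Note: financial_data is global, so running this test and then the full download in the same session appends the first 5 stocks' rows twice; reset it with financial_data = pd.DataFrame() between runs.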

#financial_data_EV = get_financial_data(stocks[:5], 10)

#financial_data_EV


In [41]:
# The test was successful. Now download data for all stocks in the list, limited to 40 quarterly records per stock (about 10 years).

financial_data_EV = get_financial_data(stocks, 40)
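After a long download it is worth confirming that no symbol exceeds the requested 40 records and that no (`symbol`, `date`) pair appears twice, which would happen if the function ran more than once against the same global DataFrame. A sketch on a toy frame; in the notebook the same calls would run on `financial_data_EV`.

```python
import pandas as pd

# Toy frame with one accidentally duplicated row (illustrative values)
df = pd.DataFrame({
    'symbol': ['NVDA', 'NVDA', 'NVDA', 'AAPL'],
    'date': ['2024-10-27', '2024-07-28', '2024-10-27', '2024-09-28'],
    'enterpriseValue': [3.0, 2.7, 3.0, 3.5],
})

limit = 40
counts = df.groupby('symbol').size()               # records per symbol
too_many = counts[counts > limit]                  # symbols over the requested limit
dupes = df.duplicated(subset=['symbol', 'date'])   # True for repeats after the first
print(counts.to_dict(), int(dupes.sum()))

# Keep the first occurrence of each (symbol, date) pair
df = df.drop_duplicates(subset=['symbol', 'date']).reset_index(drop=True)
```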


In [47]:
# viewing the data

financial_data_EV.head(100)


| | symbol | date | stockPrice | numberOfShares | marketCapitalization | minusCashAndCashEquivalents | addTotalDebt | enterpriseValue |
|---|---|---|---|---|---|---|---|---|
| 0 | NVDA | 2024-10-27 | 140.52 | 24533000000 | 3.447377e+12 | 9107000000 | 10225000000 | 3448495160000 |
| 1 | NVDA | 2024-07-28 | 111.59 | 24578000000 | 2.742659e+12 | 8563000000 | 10015000000 | 2744111020000 |
| 2 | NVDA | 2024-04-28 | 87.76 | 24620000000 | 2.160651e+12 | 7587000000 | 10991000000 | 2164055200000 |
| 3 | NVDA | 2024-01-28 | 62.47 | 24660000000 | 1.540510e+12 | 7280000000 | 11056000000 | 1544286200000 |
| 4 | NVDA | 2023-10-29 | 41.16 | 24680000000 | 1.015829e+12 | 5519000000 | 11027000000 | 1021336799999 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 95 | MSFT | 2020-12-31 | 222.42 | 7555000000 | 1.680383e+12 | 14432000000 | 82782000000 | 1748733100000 |
| 96 | MSFT | 2020-09-30 | 210.33 | 7566000000 | 1.591357e+12 | 17205000000 | 83216000000 | 1657367780000 |
| 97 | MSFT | 2020-06-30 | 203.51 | 7580000000 | 1.542606e+12 | 13576000000 | 82110000000 | 1611139800000 |
| 98 | MSFT | 2020-03-31 | 157.71 | 7602000000 | 1.198911e+12 | 11710000000 | 84025000000 | 1271226420000 |


In [45]:
# getting information on the data

financial_data_EV.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 11031 entries, 0 to 11030
Data columns (total 8 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   symbol                       11031 non-null  object 
 1   date                         11031 non-null  object 
 2   stockPrice                   11031 non-null  float64
 3   numberOfShares               11031 non-null  int64  
 4   marketCapitalization         11031 non-null  float64
 5   minusCashAndCashEquivalents  11031 non-null  int64  
 6   addTotalDebt                 11031 non-null  int64  
 7   enterpriseValue              11031 non-null  int64  
dtypes: float64(2), int64(4), object(2)
memory usage: 689.6+ KB
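`info()` reports `date` as `object`, i.e. strings. Converting it to `datetime64` makes chronological sorting and date filtering reliable. A short sketch on toy values, assuming the `YYYY-MM-DD` format seen in the output above.

```python
import pandas as pd

df = pd.DataFrame({'symbol': ['MSFT', 'MSFT'], 'date': ['2020-12-31', '2020-03-31']})
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d')

# Sort chronologically within each symbol and keep quarters from 2020-06-30 on
df = df.sort_values(['symbol', 'date']).reset_index(drop=True)
recent = df[df['date'] >= '2020-06-30']
print(df.dtypes['date'], len(recent))
```

`to_csv` writes the dates back out as text, so reading the saved file with `pd.read_csv(..., parse_dates=['date'])` restores the datetime type.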


In [46]:
# describing the data

financial_data_EV.describe()


| | stockPrice | numberOfShares | marketCapitalization | minusCashAndCashEquivalents | addTotalDebt | enterpriseValue |
|---|---|---|---|---|---|---|
| count | 11031.0 | 11031.0 | 11031.0 | 11031.0 | 11031.0 | 11031.0 |
| mean | 1520.355663 | 1711814000.0 | 436976000000.0 | 735034100000.0 | 466063100000.0 | 168005000000.0 |
| std | 22704.603113 | 2772960000.0 | 5703758000000.0 | 6953679000000.0 | 3629099000000.0 | 7329153000000.0 |
| min | 0.48275 | 0.0 | 0.0 | -62032000000.0 | 0.0 | -82120730000000.0 |
| 25% | 45.55 | 346600000.0 | 37105410000.0 | 920679500.0 | 5533500000.0 | 42072860000.0 |
| 50% | 82.94 | 822800000.0 | 65316780000.0 | 2923300000.0 | 14988000000.0 | 80827080000.0 |
| 75% | 168.125 | 1858403000.0 | 131289400000.0 | 9363500000.0 | 43115500000.0 | 160717100000.0 |
| max | 691180.0 | 25060000000.0 | 441881700000000.0 | 113630200000000.0 | 95619150000000.0 | 473199600000000.0 |
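The columns follow the usual definition of enterprise value: market capitalization, minus cash and cash equivalents, plus total debt. For the first NVDA row in the `head()` output, 3,447,377,000,000 - 9,107,000,000 + 10,225,000,000 = 3,448,495,000,000, which matches `enterpriseValue` up to the rounding of the displayed market cap. A quick consistency check along these lines, using two rows copied from that output; the relative tolerance of `1e-6` is an assumption chosen to absorb that rounding.

```python
import numpy as np
import pandas as pd

# Two rows copied from the head() output above
df = pd.DataFrame({
    'symbol': ['NVDA', 'MSFT'],
    'marketCapitalization': [3.447377e12, 1.680383e12],
    'minusCashAndCashEquivalents': [9107000000, 14432000000],
    'addTotalDebt': [10225000000, 82782000000],
    'enterpriseValue': [3448495160000, 1748733100000],
})

# EV = market cap - cash and cash equivalents + total debt
recomputed = (df['marketCapitalization']
              - df['minusCashAndCashEquivalents']
              + df['addTotalDebt'])

# Flag rows whose reported EV agrees with the recomputed value
df['evConsistent'] = np.isclose(recomputed, df['enterpriseValue'], rtol=1e-6)
print(df[['symbol', 'evConsistent']])
```

Running the same check on `financial_data_EV` and inspecting rows where `evConsistent` is `False` is a cheap way to surface bad records before modelling.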


In [48]:
# saving the file as a csv file in the data folder named financial_data_EnterpriseValue.csv

financial_data_EV.to_csv('data/Datasets/financial_data_EnterpriseValue.csv', index=False)