<a href="https://colab.research.google.com/github/GawainGan/Stock-Markets-Analytics/blob/main/Code/Moudle_1_Retrieving_Financial_Market_Data_with_yfinance.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Module 1 presents a series of financial and economic analyses performed on various datasets. Key tasks include:

1. Analyzing the average GDP growth in 2023.
2. Finding the minimum Treasury yield spread (DGS10 - DGS2), i.e. the deepest yield-curve inversion, since 2000.
3. Comparing 5-year growth rates of the S&P 500 and IPC Mexico indices.
4. Determining the 52-week range ratio for top-earning stocks in 2023.
5. Identifying the highest dividend yield among selected stocks in 2023.
6. Exploring additional valuable metrics for further analysis.
7. Developing a strategy based on earnings release dates for targeted company selection.

These analyses provide insights into economic trends, stock performance, and investment opportunities.

### install & import

In [None]:
# pip install yfinance



In [None]:
# pip install plotly



In [None]:
import pandas as pd
import yfinance as yf

import plotly.express as px
from pandas_datareader import data as pdr

# 1.Macro - Average Growth of GDP in 2023

- Task: Calculate the average year-over-year (YoY) growth rate of GDP in 2023.
- Data Source: Download Real Gross Domestic Product (GDPC1) from FRED.
- Method: Compute the YoY growth rate and average it over 2023, rounding to one decimal place.



Calculate the year-over-year (YoY) growth rate, that is, divide the current value by the value from 4 quarters earlier. Then average the four YoY values for 2023 and round to one digit after the decimal point: e.g. 5.66% growth becomes 5.7.

In [None]:
df1_path = '/content/drive/MyDrive/Colab Notebooks/Stock Market Analytics Zoomcamp/Moudle 1/Q1/GDPC1.csv'
df1 = pd.read_csv(df1_path)
df1.head()

Unnamed: 0,DATE,GDPC1
0,1947-01-01,2182.681
1,1947-04-01,2176.892
2,1947-07-01,2172.432
3,1947-10-01,2206.452
4,1948-01-01,2239.682


In [None]:
df1['DATE'] = pd.to_datetime(df1['DATE'])

df1['YoY'] = (df1['GDPC1'] / df1['GDPC1'].shift(4)) * 100 - 100

df1_2023 = df1[df1['DATE'].dt.year == 2023]

average_yoy_growth_2023 = round(df1_2023['YoY'].mean(), 1)

print(average_yoy_growth_2023)

2.5
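As a sanity check on the formula, the sketch below applies the same 4-quarter comparison to a small made-up quarterly series (the values are illustrative, not FRED data) and confirms that the manual `shift(4)` calculation agrees with pandas' built-in `pct_change(4)`. The names `toy`, `VALUE`, and `avg_2023` are assumptions introduced only for this example.

```python
import pandas as pd

# Hypothetical quarterly levels (not real GDPC1 values): 2022 Q1 through 2023 Q4
toy = pd.DataFrame({
    "DATE": pd.date_range("2022-01-01", periods=8, freq="QS"),
    "VALUE": [100.0, 101.0, 102.0, 103.0, 102.0, 103.02, 105.06, 106.09],
})

# YoY growth in percent: current quarter divided by the same quarter one year earlier
toy["YoY_manual"] = (toy["VALUE"] / toy["VALUE"].shift(4) - 1) * 100
toy["YoY_pct_change"] = toy["VALUE"].pct_change(4) * 100

# Average the four YoY values that fall in 2023
avg_2023 = round(toy.loc[toy["DATE"].dt.year == 2023, "YoY_manual"].mean(), 1)
print(avg_2023)
```

The first four rows have no value to compare against, so their YoY is `NaN`; only the 2023 quarters contribute to the average.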


# 2.Macro - Inverse "Treasury Yield"

- Task: Find the minimum value of the difference between DGS10 and DGS2 since January 1, 2000.
- Data Source: Download DGS2 and DGS10 series from FRED.
- Method: Join the datasets, calculate the daily difference (DGS10 - DGS2), and identify the minimum value, rounded to one decimal place.

Find the minimum value of (DGS10 - DGS2) since 2000-01-01 and round it to one digit after the decimal point.

In [None]:
df_DGS2_path = '/content/drive/MyDrive/Colab Notebooks/Stock Market Analytics Zoomcamp/Moudle 1/Q2/DGS2.csv'
df_DGS10_path = '/content/drive/MyDrive/Colab Notebooks/Stock Market Analytics Zoomcamp/Moudle 1/Q2/DGS10.csv'

df_DGS2 = pd.read_csv(df_DGS2_path)
df_DGS10 = pd.read_csv(df_DGS10_path)

In [None]:
# Convert the DATE columns to datetime
df_DGS2['DATE'] = pd.to_datetime(df_DGS2['DATE'])
df_DGS10['DATE'] = pd.to_datetime(df_DGS10['DATE'])

In [None]:
# FRED marks missing observations with '.'; coerce them to NaN, then keep 2000 onwards
df_DGS2['DGS2'] = pd.to_numeric(df_DGS2['DGS2'], errors='coerce')
df_DGS2 = df_DGS2[df_DGS2['DATE'] >= '2000-01-01']

In [None]:
# Same cleaning for the 10-year series
df_DGS10['DGS10'] = pd.to_numeric(df_DGS10['DGS10'], errors='coerce')
df_DGS10 = df_DGS10[df_DGS10['DATE'] >= '2000-01-01']

In [None]:
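# Join on DATE so each 10-year yield is paired with the 2-year yield from the same day
# (subtracting the columns directly would align on the row index, not on the date)
df_diff = df_DGS10[['DATE', 'DGS10']].merge(df_DGS2[['DATE', 'DGS2']], on='DATE', how='inner')
df_diff['DIFF_VALUE'] = df_diff['DGS10'] - df_diff['DGS2']
round(df_diff['DIFF_VALUE'].min(), 1)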

-1.1
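Because the two FRED files may not start on the same date or contain the same rows, the spread is safest when computed after a date-based join. The sketch below shows the idea on made-up yields (not real FRED data); the frame names `dgs10`, `dgs2`, and `spread` are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical yields; '.' marks a missing observation, as in FRED CSV exports
dgs10 = pd.DataFrame({
    "DATE": ["1999-12-31", "2000-01-03", "2000-01-04", "2000-01-05"],
    "DGS10": ["6.45", "6.58", ".", "6.62"],
})
dgs2 = pd.DataFrame({
    "DATE": ["2000-01-03", "2000-01-04", "2000-01-05"],
    "DGS2": ["6.38", "6.30", "6.70"],
})

for frame, col in ((dgs10, "DGS10"), (dgs2, "DGS2")):
    frame["DATE"] = pd.to_datetime(frame["DATE"])
    # errors="coerce" turns the '.' placeholder into NaN
    frame[col] = pd.to_numeric(frame[col], errors="coerce")

# Inner join keeps only dates present in both series
spread = dgs10.merge(dgs2, on="DATE", how="inner")
spread = spread[spread["DATE"] >= "2000-01-01"]
spread["DIFF_VALUE"] = spread["DGS10"] - spread["DGS2"]

# min() skips NaN rows, so missing observations do not affect the result
min_spread = round(spread["DIFF_VALUE"].min(), 1)
print(min_spread)
```

A negative minimum means the 2-year yield exceeded the 10-year yield on at least one day, which is what an inverted curve looks like in this spread.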

# 3.Index - Recent Performance Comparison

- Task: Compare the 5-year growth of S&P 500 and IPC Mexico indexes.
- Data Source: Download daily index prices from Yahoo Finance for S&P 500 (^GSPC) and IPC Mexico (^MXX).
- Method: Calculate the 5-year growth rate from April 9, 2019, to April 9, 2024, and report the higher growth as a percentage (closest integer).

Compare S&P 500 and IPC Mexico indexes by the 5 year growth and write down the largest value as an answer (%)



In [None]:
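# Get data for S&P 500 and IPC Mexico indexes
# Note: yfinance treats `end` as exclusive, so the last row returned is the trading day before 2024-04-09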
sp500 = yf.download('^GSPC', start='2019-04-09', end='2024-04-09')
ipc_mexico = yf.download('^MXX', start='2019-04-09', end='2024-04-09')



In [None]:
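# Calculate the 5-year growth for both indexes (last close divided by first close, minus 1)
# .iloc is used for positional access; plain [-1] on a date-indexed Series is deprecated
sp500_5yr_growth = (sp500['Close'].iloc[-1] / sp500['Close'].iloc[0]) - 1
ipc_mexico_5yr_growth = (ipc_mexico['Close'].iloc[-1] / ipc_mexico['Close'].iloc[0]) - 1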

print(sp500_5yr_growth)
print(ipc_mexico_5yr_growth)
# Find the largest value and print it
largest_growth = max(sp500_5yr_growth, ipc_mexico_5yr_growth)

print(round(largest_growth, 2)*100 , "%")

0.8075151917783085
0.2843377484045295
81.0 %
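The growth comparison reduces to one formula, `last / first - 1`, applied to each index. A minimal sketch on made-up prices (not real index data) is shown below; `total_growth_pct`, `index_a`, and `index_b` are hypothetical names used only here.

```python
import pandas as pd

def total_growth_pct(prices: pd.Series) -> float:
    """Percentage change from the first to the last observation."""
    return (prices.iloc[-1] / prices.iloc[0] - 1) * 100

# Hypothetical closing prices indexed by date
dates = pd.to_datetime(["2019-04-09", "2021-04-09", "2024-04-09"])
index_a = pd.Series([2000.0, 2600.0, 3620.0], index=dates)
index_b = pd.Series([40000.0, 44000.0, 51200.0], index=dates)

growth = {"A": total_growth_pct(index_a), "B": total_growth_pct(index_b)}

# Pick the index with the larger growth and round to the closest integer percent
best = max(growth, key=growth.get)
print(best, round(growth[best]))
```

Only the first and last observations matter for this measure; whatever happens in between does not change the result.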


# 4.Stocks OHLCV - 52-Weeks Range Ratio in 2023

- Task: Find the largest range ratio [(max-min)/max] of adjusted close prices for selected stocks in 2023.
- Data Source: Download daily OHLCV data from Yahoo Finance for 2222.SR, BRK-B, AAPL, MSFT, GOOG, JPM.
- Method: Calculate the ratio for each stock and identify the largest value, rounded to two decimal places.

Find the largest range ratio [=(max-min)/max] of Adj.Close prices in 2023

In [None]:
# Define the list of top 6 stocks
top6_stocks = ["2222.SR", "BRK-B", "AAPL", "MSFT", "GOOG", "JPM"]

# Initialize a dictionary to store the range ratios
range_ratios = {}

# Loop through each stock and calculate the range ratio
for stock in top6_stocks:
  # Download the stock data
  stock_data = yf.download(stock, start="2023-01-01", end="2023-12-31")

  # Calculate the range ratio
  # (maximum "Adj Close" - minimum "Adj Close") divided by the maximum "Adj Close"
  range_ratio = (stock_data["Adj Close"].max() - stock_data["Adj Close"].min()) / stock_data["Adj Close"].max()

  # Store the range ratio in the dictionary
  range_ratios[stock] = range_ratio




 Round the result to two decimal places (e.g. 0.1575 will be 0.16)

In [None]:
# Print each ticker with its range ratio rounded to two decimals
for ticker, ratio in range_ratios.items():
  print(ticker, round(ratio, 2))

2222.SR 0.21
BRK-B 0.21
AAPL 0.37
MSFT 0.42
GOOG 0.39
JPM 0.28
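The range ratio can also be computed for all tickers at once when the prices sit in one DataFrame with a column per ticker. The sketch below uses made-up prices and two hypothetical tickers (`AAA`, `BBB`); it illustrates the formula and is not a replacement for the download loop.

```python
import pandas as pd

def range_ratio(prices: pd.Series) -> float:
    """(max - min) / max of a price series."""
    return (prices.max() - prices.min()) / prices.max()

# Hypothetical adjusted close prices, one column per made-up ticker
prices = pd.DataFrame({
    "AAA": [80.0, 100.0, 90.0, 75.0],
    "BBB": [50.0, 45.0, 48.0, 47.0],
})

# apply() calls range_ratio once per column and returns one ratio per ticker
ratios = prices.apply(range_ratio)
largest = ratios.idxmax()
print(largest, round(ratios[largest], 2))
```

Because the denominator is the maximum price, the ratio always lies between 0 and 1 for positive prices.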


# 5.Stocks - Dividend Yield

- Task: Determine the largest dividend yield for selected stocks in 2023.
- Data Source: Download dividend data from Yahoo Finance for 2222.SR, BRK-B, AAPL, MSFT, GOOG, JPM.
- Method: Sum the dividends paid in 2023, divide by the year-end adjusted close price, and report the highest yield as a percentage, rounded to one decimal place.

In [None]:
test_1 = yf.Ticker("GOOG")
# test_1.dividends
hist = test_1.history(period='2y')
hist = hist.reset_index()
hist['Dividends'].value_counts()

Dividends
0.0    503
Name: count, dtype: int64

In [None]:
# Define the list of top 6 stocks
top6_stocks = ["2222.SR", "BRK-B", "AAPL", "MSFT", "GOOG", "JPM"]

# Initialize a dictionary to store the dividend yields
dividend_yields = {}

# Loop through each stock and calculate the dividend yield
for stock in top6_stocks:
  # Download the stock data
  stock_data = yf.Ticker(stock)


  # The yield is computed against the adjusted close on the last trading day of 2023.
  # Prices are downloaded with yfinance directly, as in the earlier sections.
  data_contain_adj_close = yf.download(stock, start="2023-01-01", end="2024-01-01")
  last_day_adj_close = float(data_contain_adj_close['Adj Close'].iloc[-1])

  # Get the dividends paid in 2023
  dividends_2023 = stock_data.dividends[stock_data.dividends.index.year == 2023]

  # Calculate the dividend yield
  # Sum up all dividends paid in 2023 per company
  dividend_adj_close = dividends_2023.sum() / last_day_adj_close

  print("\nStock:", stock)
  print("Sum dividends in 2023:", dividends_2023.sum())
  print("last day adj close price: ",last_day_adj_close)

  # Store the dividend yield in the dictionary
  dividend_yields[stock] = dividend_adj_close

Stock: 2222.SR
Sum dividends in 2023: 0.9107640000000001
last day adj close price:  32.82804870605469

Stock: BRK-B
Sum dividends in 2023: 0.0
last day adj close price:  356.6600036621094

Stock: AAPL
Sum dividends in 2023: 0.95
last day adj close price:  192.28463745117188

Stock: MSFT
Sum dividends in 2023: 2.79
last day adj close price:  375.34588623046875

Stock: GOOG
Sum dividends in 2023: 0.0
last day adj close price:  140.92999267578125

Stock: JPM
Sum dividends in 2023: 4.05
last day adj close price:  168.07713317871094

In [None]:
dividend_yields

{'2222.SR': 0.027743470474138235,
 'BRK-B': 0.0,
 'AAPL': 0.004940592304162832,
 'MSFT': 0.007433143940964608,
 'GOOG': 0.0,
 'JPM': 0.024096079718909574}

In [None]:
max_dividend_yield = max(dividend_yields.values())
max_dividend_yield_company = max(dividend_yields, key=dividend_yields.get)
max_dividend_yield_percent = round(max_dividend_yield * 100, 1)

print(f"The maximum dividend yield is {max_dividend_yield_percent}% for company {max_dividend_yield_company}")

The maximum dividend yield is 2.8% for company 2222.SR
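To make the arithmetic easy to re-check without downloading anything, the sketch below rebuilds the yields from the dividend sums and year-end prices printed by the loop. The DataFrame `summary` and its column names are assumptions introduced for this example.

```python
import pandas as pd

# Dividend sums and last adjusted close prices copied from the loop's printed output
summary = pd.DataFrame(
    {
        "dividends_2023": [0.9107640000000001, 0.0, 0.95, 2.79, 0.0, 4.05],
        "last_adj_close": [32.82804870605469, 356.6600036621094, 192.28463745117188,
                           375.34588623046875, 140.92999267578125, 168.07713317871094],
    },
    index=["2222.SR", "BRK-B", "AAPL", "MSFT", "GOOG", "JPM"],
)

# Dividend yield in percent, rounded to one decimal place
summary["yield_pct"] = (summary["dividends_2023"] / summary["last_adj_close"] * 100).round(1)
print(summary.sort_values("yield_pct", ascending=False))
```

Sorting the table shows the full ranking rather than only the maximum, which makes it easier to see how close the runner-up is.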


# 6.Exploratory - Investigate New Metrics
- Task: Download and explore additional metrics or time series that might be valuable for your project.
- Method: Free text answer describing why these metrics are important.



- Gold futures: `GC=F`
- Silver futures: `SI=F`
- Crude oil futures: `CL=F`
- Natural gas futures: `NG=F`
- Corn: `CORN` (an ETF ticker rather than a futures contract)

In [None]:
# Define the list of tickers
tickers = ['GC=F', 'SI=F', 'CL=F', 'NG=F', 'CORN']

# Download the data for each ticker
gold_data = yf.download(tickers[0], start="2020-01-01")
silver_data = yf.download(tickers[1], start="2020-01-01")
crude_oil_data = yf.download(tickers[2], start="2020-01-01")
natural_gas_data = yf.download(tickers[3], start="2020-01-01")
corn_data = yf.download(tickers[4], start="2020-01-01")

# Print a message indicating that the data has been downloaded
print("Data downloaded successfully!")


Data downloaded successfully!





Time-series data for key commodities can yield useful insights. Examining the historical prices of gold (GC=F), silver (SI=F), crude oil (CL=F), natural gas (NG=F), and corn (CORN) can reveal patterns, correlations, and potentially predictive relationships that would benefit the project. Each series is valuable for a different reason:

1. **Gold (GC=F)**: Often considered a 'safe haven' asset, gold prices can indicate investor sentiment, particularly during times of economic uncertainty. Analyzing gold's price fluctuations could help predict shifts in market dynamics and investor behavior.

2. **Silver (SI=F)**: While silver is also a precious metal like gold, it has a higher industrial usage. Investigating the price trends of silver alongside gold can help us understand not just the investment demand but also industrial health, as it may correlate with manufacturing activity.

3. **Crude Oil (CL=F)**: Crude oil is a critical determinant of global economic health, with its price affecting a broad range of industries. It is also sensitive to geopolitical events. Time series analysis of crude oil can provide foresight into energy costs and economic vitality.

4. **Natural Gas (NG=F)**: As a key energy source, especially for heating and electricity generation, natural gas prices can reflect supply-demand dynamics and seasonal patterns. It also provides insight into the energy market's shift towards cleaner fuel sources.

5. **Corn (CORN)**: Corn is a staple commodity with diverse uses from food products to biofuels. Its price can be influenced by a variety of factors, including weather conditions, harvest yields, and biofuel demand. Studying corn prices can signal changes in agricultural markets and food inflation.

Looking at these commodities together shows how they respond collectively to macroeconomic changes. For instance, rising oil prices can raise production costs across industries, which may in turn influence the prices of precious metals or agricultural commodities. Similarly, natural gas prices might help predict changes in corn prices, especially where a significant share of corn goes to bioenergy.

In the project, I can use the historical data of these commodities to build econometric models or machine learning algorithms that detect patterns and correlations. The findings can then be used to predict future price movements, hedge risks, or inform investment strategies. The key is to capture not only the individual price movements but also the causality or interdependence between the commodities, which could improve the predictive power of the analysis.
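A first step towards the correlation analysis described above could be a correlation matrix of daily returns. The sketch below uses synthetic prices generated with NumPy in place of the downloaded data, so the numbers carry no market meaning; the column names and the built-in co-movement between the first two series are assumptions made purely for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic prices standing in for the downloaded 'Close' columns (not real market data)
rng = np.random.default_rng(0)
dates = pd.bdate_range("2023-01-02", periods=250)
shocks = rng.normal(0, 0.01, size=(250, 3))
shocks[:, 1] += 0.8 * shocks[:, 0]  # make the second series co-move with the first
prices = pd.DataFrame(
    100 * np.exp(shocks.cumsum(axis=0)),
    index=dates,
    columns=["gold", "silver", "crude_oil"],
)

# Correlate daily returns rather than price levels, since trending levels
# can show spurious correlation
returns = prices.pct_change().dropna()
corr = returns.corr()
print(corr.round(2))
```

With real data, the same two lines (`pct_change()` then `corr()`) would apply once the close prices of the five tickers are combined into one DataFrame.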

# 7.Exploratory - Time-Driven Strategy Around Earnings Releases
- Task: Describe an analytical strategy for selecting companies based on future earnings release dates.


Explore earnings dates for the whole month of April, e.g. using the Yahoo Finance earnings calendar (https://finance.yahoo.com/calendar/earnings?from=2024-04-21&to=2024-04-27&day=2024-04-23). Compare with previously closed earnings (e.g., recent dates with full data: https://finance.yahoo.com/calendar/earnings?from=2024-04-07&to=2024-04-13&day=2024-04-08).

Describe an analytical strategy/idea (you're not required to implement it) to select a subset of companies of interest based on the future events data.

- Historical Earnings Performance: Look at how companies' stocks reacted to past earnings announcements.
- Earnings Surprises: Focus on companies with a history of beating earnings estimates.
- Industry Trends: Choose companies in industries relevant to current economic conditions.
- Company Size: Larger companies might offer stability, while smaller companies could have higher volatility.
- Forward-looking Statements: Pay attention to the management's forward-looking statements and guidance during earnings calls, as these can impact future performance expectations.
- Economic Indicators and Market Sentiment: Consider broader market trends and economic indicators that could influence the entire market or specific sectors, and thus, affect company earnings.


By evaluating these elements, we can identify potential investment opportunities based on upcoming earnings releases.
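One way the first two criteria (past reactions and earnings surprises) might be turned into a screen is sketched below. Everything in it is hypothetical: the tickers, EPS figures, returns, the 0.6 beat-rate threshold, and the column names are made up for illustration, and real data would have to come from an earnings calendar and price history.

```python
import pandas as pd

# Hypothetical earnings history: one row per past release (all values are made up)
history = pd.DataFrame({
    "ticker": ["AAA", "AAA", "AAA", "BBB", "BBB", "BBB", "CCC", "CCC", "CCC"],
    "eps_estimate": [1.00, 1.10, 1.20, 0.50, 0.55, 0.60, 2.00, 2.10, 2.20],
    "eps_actual": [1.10, 1.15, 1.30, 0.45, 0.50, 0.65, 2.05, 2.00, 2.30],
    "next_day_return": [0.03, 0.01, 0.04, -0.02, -0.01, 0.02, 0.01, -0.03, 0.02],
})

# Hypothetical upcoming release dates
upcoming = pd.DataFrame({
    "ticker": ["AAA", "BBB", "CCC"],
    "earnings_date": pd.to_datetime(["2024-04-23", "2024-04-25", "2024-05-07"]),
})

# Per-company share of past releases that beat the estimate, and the average next-day move
history["beat"] = history["eps_actual"] > history["eps_estimate"]
stats = history.groupby("ticker").agg(
    beat_rate=("beat", "mean"),
    avg_reaction=("next_day_return", "mean"),
)

# Keep companies reporting in the target week that beat estimates most of the time
in_window = upcoming[upcoming["earnings_date"].between("2024-04-21", "2024-04-27")]
candidates = in_window.merge(stats, left_on="ticker", right_index=True)
candidates = candidates[candidates["beat_rate"] >= 0.6].sort_values(
    "avg_reaction", ascending=False
)
print(candidates)
```

The remaining criteria (industry, company size, guidance, market sentiment) could be added as further columns and filters on the same table.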


