<a href="https://colab.research.google.com/github/chrisdamba/stock-markets-analytics-zoomcamp/blob/main/homework1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [2]:
# IMPORTS
import numpy as np
import pandas as pd

#Fin Data Sources
import yfinance as yf
import pandas_datareader as pdr

#Data viz
import plotly.graph_objs as go
import plotly.express as px

import time
from datetime import date, datetime

### Question 1. [Macro] Average growth of GDP in 2023

**What is the average growth (in %) of GDP in 2023?**





In [3]:
# Set the start and end date for data retrieval
start = datetime(2020, 1, 1)
end = datetime(2023, 12, 31)

# Download the GDP data from FRED
gdp_data = pdr.DataReader('GDPC1', 'fred', start, end)

# Calculate YoY growth rates
gdp_data['YoY Growth'] = gdp_data['GDPC1'].pct_change(periods=4) * 100

# Select the 2023 rows by partial date string and average the YoY growth
average_growth_2023 = gdp_data.loc['2023', 'YoY Growth'].mean()

# Print the rounded result
print(f"The average YoY growth of GDP in 2023 is {average_growth_2023:.1f}%")

The average YoY growth of GDP in 2023 is 2.5%
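
To make the `pct_change(periods=4)` step concrete, here is a minimal sketch on a made-up quarterly series. The levels are illustrative assumptions, not FRED data; the point is only that each quarter is compared with the same quarter one year earlier, so the first four rows have no growth value.

```python
import pandas as pd

# Hypothetical quarterly levels (NOT real GDP figures): 2022Q1 through 2023Q4
index = pd.date_range("2022-01-01", periods=8, freq="QS")
levels = pd.Series(
    [100.0, 101.0, 102.0, 103.0, 102.0, 103.02, 105.06, 106.09],
    index=index,
)

# Compare each quarter with the value 4 rows (one year) earlier
yoy_growth = levels.pct_change(periods=4) * 100

# The 2022 quarters have no year-ago value, so they stay NaN
growth_2023 = yoy_growth.loc["2023"]
average_growth_2023 = growth_2023.mean()

print(growth_2023.round(1).tolist())
print(f"Average YoY growth in 2023: {average_growth_2023:.1f}%")
```

Because the request window in the cell above starts in 2020, every 2023 quarter has a year-ago value available, so none of the four growth figures is NaN.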


### Question 2. [Macro] Inverse "Treasury Yield"
**Find the minimum value of (dgs10-dgs2) since the year 2000 (2000-01-01) and write it down as the answer, rounded to 1 digit after the decimal point.**

In [6]:
# Set the start date for data retrieval
start = datetime(2000, 1, 1)

# Download the interest rate data from FRED
dgs2 = pdr.DataReader("DGS2", "fred", start=start)
dgs10 = pdr.DataReader("DGS10", "fred", start=start)

# Join the two dataframes on their shared DATE index
# (joining after reset_index() would match rows by position, not by date)
dgs_2_10 = dgs10.join(dgs2, how='inner')

# Convert DGS10 and DGS2 to numeric values; non-numeric entries such as '.' become NaN
dgs_2_10['DGS10'] = pd.to_numeric(dgs_2_10['DGS10'], errors='coerce')
dgs_2_10['DGS2'] = pd.to_numeric(dgs_2_10['DGS2'], errors='coerce')
# Drop rows where either DGS10 or DGS2 is NaN
dgs_2_10.dropna(subset=['DGS10', 'DGS2'], inplace=True)

# Calculate the difference
dgs_2_10['Difference'] = dgs_2_10['DGS10'] - dgs_2_10['DGS2']
# Find the minimum value of the difference
min_difference = dgs_2_10['Difference'].min()

# Print the rounded result
print(f"The minimum value of (DGS10 - DGS2) since 2000 is {min_difference:.1f}")

The minimum value of (DGS10 - DGS2) since 2000 is -1.1
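
It can also be useful to know when the spread hit its minimum. The sketch below uses a few made-up yield values (assumptions for illustration, not FRED data) to show the same join-and-subtract logic with `idxmin()` returning the date of the minimum.

```python
import pandas as pd

# Hypothetical yields in percent (NOT real FRED data), indexed by date
dates = pd.to_datetime(["2023-06-30", "2023-07-03", "2023-07-04", "2023-07-05"])
dgs10 = pd.DataFrame({"DGS10": [3.81, 3.86, None, 3.95]}, index=dates)
dgs2 = pd.DataFrame({"DGS2": [4.87, 4.94, None, 4.94]}, index=dates)

# Align on the shared date index, then drop days with missing quotes
spread = dgs10.join(dgs2, how="inner").dropna()
spread["Difference"] = spread["DGS10"] - spread["DGS2"]

# idxmin() returns the index label (here a date) of the smallest value
min_date = spread["Difference"].idxmin()
min_value = spread["Difference"].min()

print(f"{min_date.date()}: {min_value:.2f}")
```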


### Question 3. [Index] Which Index is better recently?

**Compare the S&P 500 and IPC Mexico indexes by their 5-year growth and write down the larger value as the answer (%)**

In [7]:
# Define start and end dates for the 5-year period
start_date = datetime(2019, 4, 9)
end_date = datetime(2024, 4, 9)

# Download S&P 500 (^GSPC) and IPC Mexico (^MXX) data from Yahoo Finance
sp500 = yf.download("^GSPC", start=start_date, end=end_date)
ipc = yf.download("^MXX", start=start_date, end=end_date)

# Calculate the 5-year growth for both indexes
sp500_start_price = sp500['Adj Close'].iloc[0]
sp500_end_price = sp500['Adj Close'].iloc[-1]
ipc_start_price = ipc['Adj Close'].iloc[0]
ipc_end_price = ipc['Adj Close'].iloc[-1]

sp500_growth = ((sp500_end_price - sp500_start_price) / sp500_start_price) * 100
ipc_growth = ((ipc_end_price - ipc_start_price) / ipc_start_price) * 100

# Determine which index had the highest growth
if sp500_growth > ipc_growth:
    better_index = "S&P 500"
    growth_rate = round(sp500_growth)
else:
    better_index = "IPC Mexico"
    growth_rate = round(ipc_growth)

print(f"The better index recently is {better_index} with a growth rate of {growth_rate}%.")

The better index recently is S&P 500 with a growth rate of 81%.
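
The growth comparison above can be factored into a small helper. This is a sketch only: `pct_growth` is a hypothetical name, and the index levels are made up for illustration rather than taken from Yahoo Finance.

```python
def pct_growth(start_price: float, end_price: float) -> float:
    """Percentage change from start_price to end_price."""
    if start_price <= 0:
        raise ValueError("start_price must be positive")
    return (end_price - start_price) / start_price * 100


# Hypothetical first and last index levels, for illustration only
growth = {
    "Index A": pct_growth(2000.0, 3600.0),
    "Index B": pct_growth(45000.0, 58500.0),
}

# Pick the name with the highest growth, rounding only when printing
best = max(growth, key=growth.get)
print(f"{best}: {growth[best]:.0f}%")
```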





### Question 4. [Stocks OHLCV] 52-weeks range ratio (2023) for the selected stocks


**Find the largest range ratio [=(max-min)/max] of Adj.Close prices in 2023**

In [12]:
# Define the list of stock tickers
stock_tickers = ['2222.SR', 'BRK-B', 'AAPL', 'MSFT', 'GOOG', 'JPM']

# Download daily OHLCV data for 2023 for each stock
stock_data = {}
for ticker in stock_tickers:
    stock_data[ticker] = yf.download(ticker, start='2023-01-01', end='2023-12-31')

# Calculate the range ratio for each stock (kept unrounded so the comparison is exact)
max_min_ratios = {}
for ticker, data in stock_data.items():
    max_price = data['Adj Close'].max()
    min_price = data['Adj Close'].min()
    max_min_ratios[ticker] = (max_price - min_price) / max_price

# Find the stock with the largest range ratio; round only for display
max_range_stock = max(max_min_ratios, key=max_min_ratios.get)
largest_range_ratio = round(max_min_ratios[max_range_stock], 2)

print(f"The stock with the largest range ratio in 2023 is {max_range_stock} with a range ratio of {largest_range_ratio}")


The stock with the largest range ratio in 2023 is MSFT with a range ratio of 0.42
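
If the prices for all tickers sit in one DataFrame (one column per ticker), the range ratio can be computed without a loop. A minimal sketch, assuming made-up tickers and prices:

```python
import pandas as pd

# Hypothetical adjusted closes for two made-up tickers
prices = pd.DataFrame({
    "AAA": [100.0, 120.0, 90.0, 150.0],
    "BBB": [50.0, 55.0, 45.0, 60.0],
})

# (max - min) / max per column; max() and min() skip NaN by default
range_ratio = (prices.max() - prices.min()) / prices.max()

# idxmax() returns the column label with the largest ratio
top_ticker = range_ratio.idxmax()
print(f"{top_ticker}: {range_ratio[top_ticker]:.2f}")
```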





### Question 5. [Stocks] Dividend Yield
**Find the largest dividend yield for the same set of stocks**

In [14]:
# Download prices and corporate actions (dividends, splits) for all tickers at once
tickers = ["2222.SR", "BRK-B", "AAPL", "MSFT", "GOOG", "JPM"]
data = yf.download(tickers, start="2023-01-01", end="2024-01-01", actions=True)

# Dividend yield = total 2023 dividends / last available Adj Close of 2023
dividend_yields = {}
for ticker in tickers:
    total_dividends = data["Dividends"][ticker].sum()
    # Exchanges have different trading calendars, so take each ticker's own last valid close
    last_close = data["Adj Close"][ticker].dropna().iloc[-1]
    dividend_yields[ticker] = total_dividends / last_close

# Find the largest dividend yield and print it as a percentage with 1 decimal place
largest_dividend_yield = max(dividend_yields.values()) * 100
print(f"The largest dividend yield in 2023 is {largest_dividend_yield:.1f}%")

The largest dividend yield in 2023 is 2.8%
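
The same calculation can be checked offline on synthetic data. The sketch below assumes made-up tickers, dividends, and closes; it forward-fills missing closes so that each ticker uses its own last available price, which matters when exchanges have different trading days.

```python
import pandas as pd

dates = pd.date_range("2023-01-02", periods=4, freq="D")

# Hypothetical per-share dividends and adjusted closes (made-up values)
dividends = pd.DataFrame(
    {"AAA": [0.0, 0.5, 0.0, 0.5], "BBB": [0.0, 0.0, 0.0, 0.0]}, index=dates
)
closes = pd.DataFrame(
    {"AAA": [48.0, 49.0, 50.0, None], "BBB": [10.0, 11.0, 12.0, 12.5]}, index=dates
)

# Forward-fill gaps, then take the final row: last available close per ticker
last_close = closes.ffill().iloc[-1]

# Yield in percent: dividends paid over the period / last close
dividend_yield = dividends.sum() / last_close * 100

top_ticker = dividend_yield.idxmax()
print(f"{top_ticker}: {dividend_yield[top_ticker]:.1f}%")
```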





### Question 6. [Exploratory] Investigate new metrics

**Free text answer**

Download and explore a few additional metrics or time series that might be valuable for your project and write down why (briefly).

I would explore the following metrics, since integrating them into my analysis could support better-informed decisions:

1. **ESG Scores**: Environmental, Social, and Governance scores are becoming increasingly important as investors look to measure a company's ethical impact and sustainability practices. Analyzing these can help identify companies that are better positioned to withstand regulatory changes and consumer trends towards sustainability.

2. **Volatility and Beta**: Although these metrics are commonly used, integrating them more deeply into models can provide insight into risk-adjusted returns, especially in volatile markets. Understanding how stocks react to market changes can help in building more resilient investment portfolios.

3. **Relative Strength Index (RSI)**: This is a momentum oscillator that measures the speed and change of price movements. The RSI can help identify overbought or oversold conditions in a stock, offering potential entry or exit signals.

4. **Debt-to-Equity Ratio**: This ratio compares a company's debt with its shareholders' equity, showing how much of its financing comes from borrowing. It can be crucial when interest rates are rising, as companies with high debt levels might be riskier.
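
As an example of how one of these metrics could be derived from price data already used above, here is a minimal RSI sketch. It assumes the simple-moving-average variant (plain rolling means of gains and losses) rather than Wilder's smoothing, and the function name `rsi` and the sample prices are my own illustration.

```python
import pandas as pd


def rsi(close: pd.Series, period: int = 14) -> pd.Series:
    """RSI from simple rolling averages of gains and losses.

    Assumption: this is the simple-moving-average variant,
    not Wilder's exponentially smoothed version.
    """
    delta = close.diff()
    avg_gain = delta.clip(lower=0).rolling(period).mean()
    avg_loss = (-delta).clip(lower=0).rolling(period).mean()
    rs = avg_gain / avg_loss  # becomes inf when there were no losses
    return 100 - 100 / (1 + rs)


# Made-up closing prices and a short period, so the result is easy to verify by hand
close = pd.Series([10.0, 11.0, 10.0, 11.0, 12.0])
result = rsi(close, period=2)
print(result.tolist())
```

Readings above 70 are conventionally treated as overbought and readings below 30 as oversold, though the thresholds are a rule of thumb rather than a guarantee.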

### Question 7. [Exploratory] Time-driven strategy description around earnings releases

**Free text answer**

Explore earnings dates for the whole month of April, e.g. using the Yahoo Finance earnings calendar (https://finance.yahoo.com/calendar/earnings?from=2024-04-21&to=2024-04-27&day=2024-04-23). Compare with the previous closed earnings (e.g., recent dates with full data https://finance.yahoo.com/calendar/earnings?from=2024-04-07&to=2024-04-13&day=2024-04-08).

Describe an analytical strategy/idea (you're not required to implement it) to select a subset of companies of interest based on the future events data.


A potential analytical strategy around earnings releases could be to focus on **volatility clustering** and **post-earnings announcement drift (PEAD)**:

1. **Data Collection**: Gather data on earnings surprises where companies significantly beat or miss their earnings estimates. This information is usually available immediately after earnings are announced and can be accessed via financial news, reports, or databases.

2. **Volatility Assessment**: Analyze the stock's volatility in the days leading up to and following the earnings release. Increased volatility after earnings might indicate market uncertainty or disagreement about the company's valuation, providing trading opportunities.

3. **Drift Analysis**: Research on PEAD indicates that stocks that beat earnings estimates often continue to drift upward in the short to medium term. Buying these stocks and holding them for a 3- to 6-week period could yield positive returns.

4. **Risk Management**: Set up stop-loss orders to manage risks. Given that earnings can also lead to sharp declines if the market reacts negatively, it’s essential to have a predetermined exit strategy.

5. **Backtesting**: Before implementing the strategy in real trading scenarios, backtest it against historical data to ensure its effectiveness across different market conditions and earnings seasons.
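
The selection and risk-management steps above can be sketched as follows. Everything here is an assumption for illustration: the tickers, EPS figures, the 10% surprise threshold, the 8% stop, and the helper name `holding_return` are made up, and the exit is assumed to fill at the observed closing price.

```python
import pandas as pd

# Hypothetical earnings results; tickers and numbers are made up
earnings = pd.DataFrame({
    "ticker": ["AAA", "BBB", "CCC", "DDD"],
    "eps_estimate": [1.00, 2.00, 0.50, 1.20],
    "eps_actual": [1.15, 1.90, 0.51, 1.50],
})

# Earnings surprise as a percentage of the absolute estimate
earnings["surprise_pct"] = (
    (earnings["eps_actual"] - earnings["eps_estimate"])
    / earnings["eps_estimate"].abs() * 100
)

SURPRISE_THRESHOLD = 10.0  # assumed cut-off, to be tuned during backtesting
candidates = earnings.loc[
    earnings["surprise_pct"] >= SURPRISE_THRESHOLD, "ticker"
].tolist()


def holding_return(prices, stop_loss_pct):
    """Return (%) from entering at prices[0] and holding to the last price,
    exiting early at the first price at or below the stop level."""
    entry = prices[0]
    stop_level = entry * (1 - stop_loss_pct / 100)
    for price in prices[1:]:
        if price <= stop_level:
            return (price - entry) / entry * 100
    return (prices[-1] - entry) / entry * 100


print(candidates)
print(holding_return([100.0, 104.0, 110.0], stop_loss_pct=8))  # held to the end
print(holding_return([100.0, 91.0, 120.0], stop_loss_pct=8))   # stopped out early
```

In a backtest, `holding_return` would be applied to each candidate's closing prices over the 3- to 6-week window after its earnings date, and the results compared across earnings seasons.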