<a href="https://colab.research.google.com/github/Digu01/stock-markets-analytics-zoomcamp/blob/main/Stock%20Market%20Analysis_week1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Module 1 Homework
In this homework, we're going to download finance data from various sources and make simple calculations/analysis.


In [1]:
import time
import pytz
from datetime import date, datetime, timedelta

import numpy as np
import pandas as pd
import yfinance as yf

import matplotlib.pyplot as plt
import matplotlib.ticker as mtick
import pandas_datareader as pdr
import plotly.graph_objs as go
import plotly.express as px

# Set the timezone
utc = pytz.UTC

Question 1: [Macro] Average growth of GDP in 2023
What is the average growth (in %) of GDP in 2023?

Download the Real Gross Domestic Product time series (GDPC1) from FRED (https://fred.stlouisfed.org/series/GDPC1). Calculate the year-over-year (YoY) growth rate (that is, divide the current value by the value from 4 quarters earlier and subtract 1). Find the average YoY growth in 2023 (the average of the 4 quarterly YoY numbers). Round to 1 digit after the decimal point: e.g. if you get 5.66% growth => you should answer 5.7

In [2]:
# Get the current date
end = date.today()
print(f'Year = {end.year}; month= {end.month}; day={end.day}')

# Set the start date to 70 years ago
start = date(year=end.year-70, month=end.month, day=end.day)
print(f'Period for indexes: {start} to {end} ')

# Fetch the GDPC1 data from FRED
gdpc1 = pdr.DataReader("GDPC1", "fred", start=start)

# Calculate the YoY growth rate
gdpc1['YoY_Growth'] = gdpc1['GDPC1'].pct_change(periods=4) * 100

# Calculate the average YoY growth rate for 2023
avg_yoy_2023 = gdpc1.loc['2023-01-01':'2023-12-31', 'YoY_Growth'].mean()

print(f"The average YoY growth rate in 2023 is: {avg_yoy_2023:.1f}%")

Year = 2024; month= 4; day=24
Period for indexes: 1954-04-24 to 2024-04-24 
The average YoY growth rate in 2023 is: 2.5%
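
As a sanity check on the YoY definition, here is a minimal sketch on synthetic quarterly numbers (not real GDP figures; the values are chosen only so the arithmetic is easy to follow). It shows that `pct_change(periods=4)` matches dividing each value by the one 4 quarters earlier and subtracting 1.

```python
import pandas as pd

# Synthetic quarterly levels (NOT real GDP): 2022Q1..2023Q4
quarters = pd.date_range("2022-01-01", periods=8, freq="QS")
levels = pd.Series(
    [100.0, 101.0, 102.0, 103.0, 102.0, 103.02, 105.06, 106.09],
    index=quarters,
)

# YoY growth in %: current value vs. the value 4 quarters earlier
yoy = levels.pct_change(periods=4) * 100

# The same calculation written out by hand
manual = (levels / levels.shift(4) - 1) * 100

# Average of the four 2023 YoY numbers (2%, 2%, 3%, 3% by construction)
avg_2023 = yoy.loc["2023"].mean()
print(f"Average YoY growth in 2023 (synthetic data): {avg_2023:.1f}%")
```

The first four quarters have no value 4 quarters earlier, so their YoY growth is NaN and they drop out of any average automatically.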


Question 2. [Macro] Inverse "Treasury Yield"
Find the min value of (dgs10-dgs2) since year 2000 (2000-01-01) and write it down as an answer, rounded to 1 digit after the decimal point.

Download the DGS2 and DGS10 interest rate series (https://fred.stlouisfed.org/series/DGS2, https://fred.stlouisfed.org/series/DGS10). Join them into one dataframe on date (you might need to read about pandas.DataFrame.join()) and calculate the daily difference dgs10-dgs2.

(Additional: think about what the "inverted yield curve" means for the market and investors. Do you see the same thing in your country/market of interest? Do you think it can be a good predictive feature for the models?)

In [3]:
# Fetch the DGS2 and DGS10 data from FRED
start = date(2000, 1, 1)
dgs2 = pdr.DataReader("DGS2", "fred", start=start, end=end)
dgs10 = pdr.DataReader("DGS10", "fred", start=start, end=end)

# Combine both series on date and compute the 10Y-2Y spread
df = pd.concat([dgs2, dgs10], axis=1)
df['Difference'] = df['DGS10'] - df['DGS2']

# Find the minimum value of the difference, rounded to 1 decimal place
min_difference = round(df['Difference'].min(), 1)

print(f"The minimum value of (DGS10 - DGS2) since 2000-01-01 is: {min_difference}")

The minimum value of (DGS10 - DGS2) since 2000-01-01 is: -1.1
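
The question hints at `pandas.DataFrame.join()`, while the cell above uses `pd.concat`. Below is a small sketch of the join variant on synthetic rates (made-up numbers, not FRED data). It also reports the date of the minimum spread via `idxmin()`, which helps with the "inverted yield curve" follow-up.

```python
import numpy as np
import pandas as pd

# Synthetic rates (NOT real FRED values); NaN mimics a market holiday
dates2 = pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-04", "2024-01-05"])
dates10 = pd.to_datetime(
    ["2024-01-02", "2024-01-03", "2024-01-04", "2024-01-05", "2024-01-08"]
)
dgs2 = pd.DataFrame({"DGS2": [4.0, np.nan, 5.0, 4.5]}, index=dates2)
dgs10 = pd.DataFrame({"DGS10": [4.5, np.nan, 4.0, 4.75, 4.8]}, index=dates10)

# Inner join keeps only dates present in both series
rates = dgs2.join(dgs10, how="inner")

# Daily spread; drop days where either rate is missing
spread = (rates["DGS10"] - rates["DGS2"]).dropna()

min_spread = spread.min()
min_date = spread.idxmin()
print(f"Minimum spread {min_spread:.1f} on {min_date.date()}")
```

A negative spread means the 2-year yield is above the 10-year yield on that date.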


Question 3. [Index] Which Index is better recently?
Compare the S&P 500 and IPC Mexico indexes by their 5-year growth and write down the larger value as the answer (%)

Download daily index prices from Yahoo Finance for the S&P 500 (^GSPC, https://finance.yahoo.com/quote/%5EGSPC/) and IPC Mexico (^MXX, https://finance.yahoo.com/quote/%5EMXX/). Compare the 5Y growth for both (between 2019-04-09 and 2024-04-09). Select the index with the higher growth and write down the growth in % (closest integer %). E.g. if the end/start ratio was 2.0925 (a growth of 109.25%), you need to write down 109 as your answer.

(Additional: think of other indexes, download their stats, and compare the growth. Create 10Y and 20Y growth stats. What is the average yearly growth rate (CAGR) for each of the indexes you select?)

In [6]:
# Download S&P 500 index data.
# yfinance treats `end` as exclusive, so use 2024-04-10 to include 2024-04-09.
sp500 = yf.Ticker("^GSPC")
sp500_data = sp500.history(start="2019-04-09", end="2024-04-10")

# Download IPC Mexico index data
ipc_mexico = yf.Ticker("^MXX")
ipc_mexico_data = ipc_mexico.history(start="2019-04-09", end="2024-04-10")

# Calculate 5-year growth
sp500_growth = (sp500_data['Close'].iloc[-1] / sp500_data['Close'].iloc[0]) - 1
ipc_mexico_growth = (ipc_mexico_data['Close'].iloc[-1] / ipc_mexico_data['Close'].iloc[0]) - 1

# Determine the index with the higher growth
if sp500_growth > ipc_mexico_growth:
    print(f"S&P 500 index had the higher 5-year growth at {round(sp500_growth * 100)}%.")
else:
    print(f"IPC Mexico index had the higher 5-year growth at {round(ipc_mexico_growth * 100)}%.")

S&P 500 index had the higher 5-year growth at 81%.
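
To make the end/start arithmetic from the question concrete, here is a small sketch on a synthetic price series (not index data). The helper name `growth_between` is made up for this illustration. The series is built so the end/start ratio is 2.0925, matching the example in the question.

```python
import numpy as np
import pandas as pd

def growth_between(close: pd.Series, start: str, end: str) -> float:
    """Growth (end/start - 1) between the first and last rows in [start, end]."""
    # Label slicing on a sorted DatetimeIndex includes both endpoints
    window = close.loc[start:end]
    return window.iloc[-1] / window.iloc[0] - 1

# Synthetic closes (NOT real prices): 100.0 on the first day, 209.25 on the last
dates = pd.date_range("2019-04-09", "2024-04-09", freq="B")
close = pd.Series(np.linspace(100.0, 209.25, len(dates)), index=dates)

growth = growth_between(close, "2019-04-09", "2024-04-09")
answer = round(growth * 100)  # closest integer %
print(f"Growth: {growth:.4f} -> answer {answer}")
```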


Additional

In [7]:
# Define the indexes to analyze
indexes = [
    "^GSPC",  # S&P 500
    "^IXIC",  # Nasdaq Composite
    "^DJI",   # Dow Jones Industrial Average
    "^FTSE",  # FTSE 100
    "^GDAXI", # DAX
    "^N225"   # Nikkei 225
]

# Download the historical data for each index
data = {}
for index in indexes:
    ticker = yf.Ticker(index)
    data[index] = ticker.history(period="max")

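# Calculate the 10-year and 20-year CAGR for each index.
# The lookback is approximated as 251 trading days per year, so the start
# date can drift from the exact calendar anniversary.
cagr_10y = {}
cagr_20y = {}
for index, df in data.items():
    start_10y = df.index[-251 * 10]
    start_20y = df.index[-251 * 20]
    last = df.index[-1]  # avoid overwriting the `end` date defined earlier

    cagr_10y[index] = ((df.loc[last, 'Close'] / df.loc[start_10y, 'Close'])**(1/10)) - 1
    cagr_20y[index] = ((df.loc[last, 'Close'] / df.loc[start_20y, 'Close'])**(1/20)) - 1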

# Print the results
print("10-year CAGR:")
for index, cagr in cagr_10y.items():
    print(f"{index}: {round(cagr * 100, 2)}%")

print("\n20-year CAGR:")
for index, cagr in cagr_20y.items():
    print(f"{index}: {round(cagr * 100, 2)}%")

10-year CAGR:
^GSPC: 10.4%
^IXIC: 14.26%
^DJI: 8.82%
^FTSE: 1.61%
^GDAXI: 6.19%
^N225: 9.31%

20-year CAGR:
^GSPC: 7.96%
^IXIC: 11.06%
^DJI: 6.97%
^FTSE: 2.96%
^GDAXI: 8.08%
^N225: 6.33%
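
The cell above counts back a fixed number of trading days. An alternative sketch, shown here on a synthetic series (not market data), picks the start row by calendar date and uses the actual elapsed time in the exponent. The helper name `cagr` and the 365.25-day year are assumptions of this sketch, not something the homework prescribes.

```python
import numpy as np
import pandas as pd

def cagr(close: pd.Series, years: int) -> float:
    """CAGR over roughly `years` calendar years ending at the last observation."""
    end_date = close.index[-1]
    cutoff = end_date - pd.DateOffset(years=years)
    window = close.loc[cutoff:]  # starts at the first row on or after the cutoff
    start_date = window.index[0]
    elapsed_years = (end_date - start_date).days / 365.25
    return (window.iloc[-1] / window.iloc[0]) ** (1 / elapsed_years) - 1

# Synthetic series (NOT market data) compounding at exactly 7% per 365.25 days
dates = pd.date_range("2004-01-01", "2024-04-09", freq="B")
days = np.asarray((dates - dates[0]).days, dtype=float)
close = pd.Series(100 * 1.07 ** (days / 365.25), index=dates)

cagr_10y_demo = cagr(close, 10)
cagr_20y_demo = cagr(close, 20)
print(f"10Y: {cagr_10y_demo:.2%}, 20Y: {cagr_20y_demo:.2%}")
```

Because markets differ in the number of trading days per year, a date-based cutoff keeps the 10Y and 20Y windows comparable across indexes.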


Question 4. [Stocks OHLCV] 52-weeks range ratio (2023) for the selected stocks
Find the largest range ratio [=(max-min)/max] of Adj.Close prices in 2023

Download the 2023 daily OHLCV data from Yahoo Finance for the top 6 stocks by earnings (https://companiesmarketcap.com/most-profitable-companies/): 2222.SR, BRK-B, AAPL, MSFT, GOOG, JPM.

Here is the example data you should see in Pandas for "2222.SR": https://finance.yahoo.com/quote/2222.SR/history

Calculate the maximum-minimum "Adj.Close" price for each stock and divide it by the maximum "Adj.Close" value. Round the result to two decimal places (e.g. 0.1575 will be 0.16)

(Additional: why might this be important for your research?)

In [8]:
# Download the 2023 daily OHLCV data for the 6 stocks
tickers = ['2222.SR', 'BRK-B', 'AAPL', 'MSFT', 'GOOG', 'JPM']
data = {}
for ticker in tickers:
    df = yf.download(ticker, start='2023-01-01', end='2023-12-31')
    data[ticker] = df

# Calculate the range ratio for each stock
range_ratios = {}
for ticker, df in data.items():
    max_price = df['Adj Close'].max()
    min_price = df['Adj Close'].min()
    range_ratio = round((max_price - min_price) / max_price, 2)
    range_ratios[ticker] = range_ratio

# Find the largest range ratio
largest_range_ratio = max(range_ratios.values())
largest_stock = [k for k, v in range_ratios.items() if v == largest_range_ratio][0]

print(f"The stock with the largest 52-week range ratio (2023) is {largest_stock} with a ratio of {largest_range_ratio}")

[*********************100%%**********************]  1 of 1 completed

The stock with the largest 52-week range ratio (2023) is MSFT with a ratio of 0.42
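
The same range ratio can be computed for all tickers at once when the adjusted closes sit in one DataFrame with a column per ticker. The sketch below uses hypothetical tickers and made-up prices. It compares the unrounded ratios and rounds only for display, which avoids accidental ties between stocks.

```python
import pandas as pd

# Made-up adjusted closes for hypothetical tickers (NOT real data)
adj_close = pd.DataFrame(
    {
        "AAA": [80.0, 100.0, 90.0],
        "BBB": [50.0, 30.0, 45.0],
    }
)

# (max - min) / max, computed column by column
range_ratio = (adj_close.max() - adj_close.min()) / adj_close.max()

# Pick the winner on unrounded values, round only when printing
top_ticker = range_ratio.idxmax()
print(f"{top_ticker}: {range_ratio[top_ticker]:.2f}")
```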





Question 5. [Stocks] Dividend Yield
Find the largest dividend yield for the same set of stocks

Use the same list of companies (2222.SR,BRK-B, AAPL, MSFT, GOOG, JPM) and download all dividends paid in 2023. You can use get_actions() method or .dividends field in yfinance library (https://github.com/ranaroussi/yfinance?tab=readme-ov-file#quick-start)

Sum up all dividends paid in 2023 per company and divide each value by the closing price (Adj.Close) at the last trading day of the year.

Find the maximum value in % and round to 1 digit after the decimal point. (E.g., if you obtained 1.25 in dividends paid and the year-end stock price is 100, the dividend yield is 1.25% -- and your answer should be equal to 1.3)

In [9]:
# Define the list of stocks
tickers = ['2222.SR', 'BRK-B', 'AAPL', 'MSFT', 'GOOG', 'JPM']

# Initialize an empty dictionary to store the results
results = {}

# Loop through each stock
for ticker in tickers:
    # Get the stock data
    stock = yf.Ticker(ticker)

    # Get the dividends paid in 2023
    dividends_2023 = stock.dividends[utc.localize(pd.Timestamp('2023-01-01')):utc.localize(pd.Timestamp('2023-12-31'))]

    # Sum up the dividends
    total_dividends = dividends_2023.sum()

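    # Get the closing price on the last trading day of 2023.
    # `end` is exclusive; the December window also covers exchanges that trade on 2023-12-31.
    last_price = stock.history(start='2023-12-01', end='2024-01-01')['Close'].iloc[-1]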

    # Calculate the dividend yield
    dividend_yield = (total_dividends / last_price) * 100

    # Round the dividend yield to 1 decimal place
    dividend_yield = round(dividend_yield, 1)

    # Store the results in the dictionary
    results[ticker] = dividend_yield

# Find the maximum dividend yield
max_dividend_yield = max(results.values())
max_dividend_stock = [k for k, v in results.items() if v == max_dividend_yield][0]

# Print the result
print(f"The maximum dividend yield is {max_dividend_yield}% for {max_dividend_stock}")

The maximum dividend yield is 3.0% for 2222.SR
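
One rounding detail: the question expects a yield of 1.25% to become 1.3, but Python's built-in `round()` rounds exact ties to the nearest even digit. The sketch below assumes illustrative inputs (1.25 in dividends on a year-end price of 100) and uses the standard-library `decimal` module to round half up. The helper name `dividend_yield_pct` is made up for this example.

```python
from decimal import Decimal, ROUND_HALF_UP

def dividend_yield_pct(total_dividends: str, year_end_price: str) -> Decimal:
    """Dividend yield in %, rounded half up to 1 decimal place."""
    raw = Decimal(total_dividends) / Decimal(year_end_price) * 100
    return raw.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)

# Illustrative inputs only: 1.25 in dividends, year-end price of 100
yield_pct = dividend_yield_pct("1.25", "100")
print(yield_pct)        # 1.3

# Built-in round() sends the exact tie 1.25 to the even neighbour
print(round(1.25, 1))   # 1.2
```

With real prices an exact tie is unlikely, so `round()` in the cell above is usually fine; the difference only matters for values that sit exactly on a half.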


Question 6. [Exploratory] Investigate new metrics
Free text answer

Download and explore a few additional metrics or time series that might be valuable for your project and write down why (briefly).

For a retail penny stock trader, the most useful series from FRED (Federal Reserve Economic Data), which is also available on Nasdaq Data Link, is probably the NASDAQ Composite Index (NASDAQCOM).

The NASDAQ Composite Index tracks the performance of over 3,000 common equities listed on the NASDAQ stock exchange, including many penny stocks. For a penny stock trader, the overall movement of the index gives a read on the broader market sentiment and trends that can affect the performance of penny stocks.

The NASDAQ Composite Index is a market capitalization-weighted index, meaning it gives more weight to larger, more liquid stocks, so small companies have little influence on the index level. However, it still includes a significant number of smaller, lower-priced penny stocks that are often the focus of retail traders. Tracking the index can help a penny stock trader gauge overall market conditions and identify potential opportunities or risks for their investments.

Additionally, the NASDAQ Composite Index data is updated daily, which gives active penny stock traders a timely end-of-day view of the market. By following the index, a penny stock trader can better understand the broader market dynamics and adjust their trading strategies accordingly.

The key metrics of the NASDAQ Composite Index that would be of most interest to a day trader are:

Daily Closing Price:

The daily closing price of the NASDAQ Composite Index is a crucial metric for day traders, as it reflects the overall market sentiment and performance at the end of the trading day.
Monitoring the daily closing price can help day traders identify trends, support/resistance levels, and make informed trading decisions for their penny stock positions.

Intraday Price Movements:

In addition to the daily closing price, day traders would closely follow the intraday price movements of the NASDAQ Composite Index.
Sudden or significant intraday fluctuations in the index can signal volatility in the broader market, which can impact the performance of penny stocks.
Tracking the intraday price action can help day traders time their entries and exits more effectively.

52-Week High and Low:

The 52-week high and low of the NASDAQ Composite Index provide context on the index's historical performance and can help day traders identify potential support and resistance levels.
Monitoring the 52-week range can assist day traders in assessing the overall market sentiment and positioning their penny stock trades accordingly.

By closely monitoring these key metrics of the NASDAQ Composite Index, day traders can gain valuable insights into the broader market conditions and make more informed trading decisions for their penny stock portfolios.
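
The daily metrics described above can be sketched as a small helper. This is only an illustration on a synthetic close series (not NASDAQCOM data). The function name `index_snapshot`, the 252-row proxy for 52 weeks, and the "position in range" field are assumptions of this sketch. The daily FRED series contains closes only, so intraday moves would need a different data source.

```python
import numpy as np
import pandas as pd

def index_snapshot(close: pd.Series, window: int = 252) -> dict:
    """Summarise a daily close series; `window` rows approximate 52 weeks."""
    recent = close.iloc[-window:]
    high, low = recent.max(), recent.min()
    last = close.iloc[-1]
    return {
        "last_close": last,
        "daily_change_pct": (last / close.iloc[-2] - 1) * 100,
        "high_52w": high,
        "low_52w": low,
        # 0.0 = at the 52-week low, 1.0 = at the 52-week high
        "position_in_range": (last - low) / (high - low),
    }

# Synthetic closes (NOT real index values): 100, 101, ..., 399
dates = pd.date_range("2023-01-02", periods=300, freq="B")
close = pd.Series(np.arange(100, 400, dtype=float), index=dates)

# With real data, `close` could instead come from a FRED download of the
# NASDAQCOM series, following the same DataReader pattern used in Question 1.
snapshot = index_snapshot(close)
print(snapshot)
```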

Question 7. [Exploratory] Time-driven strategy description around earnings releases
Free text answer

Explore earnings dates for the whole month of April - e.g. using the YahooFinance earnings calendar (https://finance.yahoo.com/calendar/earnings?from=2024-04-21&to=2024-04-27&day=2024-04-23). Compare with the previous closed earnings (e.g., recent dates with full data https://finance.yahoo.com/calendar/earnings?from=2024-04-07&to=2024-04-13&day=2024-04-08).

Describe an analytical strategy/idea (you're not required to implement it) to select a subset of companies of interest based on the future events data.

To develop an analytical strategy for selecting a subset of companies of interest based on the upcoming earnings calendar data, I would consider the following steps:

Analyze the Previous Earnings Period (2024-04-07 to 2024-04-13):

Identify the companies that reported earnings during this period and their key financial metrics, such as revenue, earnings per share (EPS), and any notable commentary from management.
Assess how the market reacted to these earnings results - did the stock prices rise, fall, or remain relatively unchanged?
Look for any trends or patterns in the previous earnings season that could provide insights into the upcoming period.

Examine the Upcoming Earnings Period (2024-04-21 to 2024-04-27):

Identify the companies scheduled to report earnings during this time frame.
Categorize the companies based on factors such as industry, market capitalization, growth profile, and valuation metrics.
Prioritize companies that are likely to be of interest to a penny stock day trader, such as those with high volatility, low share prices, and potential for significant price movements.

Develop a Selection Criteria:

Establish a set of criteria to filter the upcoming earnings companies, such as:
Companies with a history of significant price movements around earnings
Companies with a market capitalization under a certain threshold (e.g., $1 billion)
Companies with a current stock price below a certain level (e.g., $10 per share)
Companies with a high degree of analyst coverage and expectations

Analyze the Selected Companies:

For the subset of companies that meet the selection criteria, conduct a more in-depth analysis, including:
Review analyst estimates and any recent revisions to understand market expectations
Assess the company's financial health, growth prospects, and potential catalysts
Identify any potential risks or uncertainties that could impact the stock price

Develop a Trading Strategy:

Based on the analysis, determine a trading strategy for the selected companies, such as:
Identifying potential entry and exit points based on technical analysis and market sentiment
Determining appropriate position sizes and risk management techniques
Monitoring the companies' earnings results and any subsequent market reactions

By following this analytical approach, a penny stock day trader can systematically identify a subset of companies from the upcoming earnings calendar that align with their investment criteria and develop a well-informed trading strategy.
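
The selection criteria above could be applied as a simple filter once the earnings calendar is in a DataFrame. The sketch below is not an implementation of the full strategy: the tickers, column names, and numbers are all hypothetical, and the 10% minimum earnings move is an assumed threshold. Only the $1 billion and $10 limits come from the text.

```python
import pandas as pd

# Hypothetical upcoming-earnings table (made-up tickers and values)
calendar = pd.DataFrame(
    {
        "ticker": ["AAA", "BBB", "CCC", "DDD"],
        "market_cap_usd": [4e8, 5e9, 9e8, 2e8],
        "price_usd": [3.5, 45.0, 12.0, 8.0],
        # Average absolute price move around past earnings, in %
        "avg_abs_earnings_move_pct": [12.0, 3.0, 15.0, 4.0],
    }
)

def select_candidates(
    df: pd.DataFrame,
    max_market_cap: float = 1e9,   # "under $1 billion" from the text
    max_price: float = 10.0,       # "below $10 per share" from the text
    min_move_pct: float = 10.0,    # assumed cutoff for "significant" moves
) -> pd.DataFrame:
    """Keep small, low-priced companies with large historical earnings moves."""
    mask = (
        (df["market_cap_usd"] < max_market_cap)
        & (df["price_usd"] < max_price)
        & (df["avg_abs_earnings_move_pct"] >= min_move_pct)
    )
    return df.loc[mask].sort_values("avg_abs_earnings_move_pct", ascending=False)

selected = select_candidates(calendar)
print(selected["ticker"].tolist())
```

Keeping the thresholds as function arguments makes it easy to test how sensitive the shortlist is to each criterion.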