
### Question 1: [IPO] Withdrawn IPOs by Company Type

**What is the total withdrawn IPO value (in $ millions) for the company class with the highest total withdrawal value?**

From the withdrawn IPO list ([stockanalysis.com/ipos/withdrawn](https://stockanalysis.com/ipos/withdrawn/)), collect and process the data to find out which company type saw the most withdrawn IPO value.

#### Steps:
1. Use `pandas.read_html()` with the URL above to load the IPO withdrawal table into a DataFrame. 
   *It is a similar process to Code Snippet 1 discussed at the livestream.*    You should get **99 entries**. 
2. Create a new column called `Company Class`, categorizing company names based on patterns like:
   - “Acquisition Corp” or “Acquisition Corporation” → `Acq.Corp`
   - “Inc” or “Incorporated” → `Inc`
   - “Group” → `Group`
   - “Ltd” or “Limited” → `Limited`
   - “Holdings” → `Holdings`
   - Others → `Other`

  *  Order: Please follow the listed order of classes and assign the first matched value (e.g., for 'shenni holdings limited', you assign the 'Limited' class).

  * Hint: make your function more robust by converting names to lowercase and splitting into words before matching patterns.

3. Define a new field `Avg. price` by parsing the `Price Range` field (create a function and apply it to the `Price Range` column). Examples:
   - '$8.00-$10.00' → `9.0`  
   - '$5.00' → `5.0`  
   - '-' → `None`
4. Convert `Shares Offered` to numeric, clean missing or invalid values.
5. Create a new column:  
   `Withdrawn Value = Shares Offered * Avg Price` (**71 non-null values**)
6. Group by `Company Class` and calculate total withdrawn value.
7. **Answer**: Which class had the highest **total** value of withdrawals?

In [169]:
# IMPORTS
import numpy as np
import pandas as pd
import requests

#Fin Data Sources
import yfinance as yf
import pandas_datareader as pdr

#Data viz
import plotly.graph_objs as go
import plotly.express as px

import time
from datetime import date

# for graphs
import matplotlib.pyplot as plt
import pandas as pd
import requests
from io import StringIO

# Function accepts link and return a DataFrame 


def get_ipos_df(link: str) -> pd.DataFrame:    
	"""
	Fetch IPO data for the given link and return a DataFrame 
	"""
	url = link
	headers = {
		'User-Agent': (
			'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
			'AppleWebKit/537.36 (KHTML, like Gecko) '
			'Chrome/58.0.3029.110 Safari/537.3'
		)
	}

	try:
		response = requests.get(url, headers=headers, timeout=10)
		response.raise_for_status()
		html_io = StringIO(response.text)
		tables = pd.read_html(html_io)

		if not tables:
			raise ValueError(f"No tables found.")
		return tables[0]

	except requests.exceptions.RequestException as e:
		print(f"Request failed: {e}")
	except ValueError as ve:
		print(f"Data error: {ve}")
	except Exception as ex:
		print(f"Unexpected error: {ex}")

	return pd.DataFrame()

In [170]:
ipos_wd = get_ipos_df("https://stockanalysis.com/ipos/withdrawn/")
ipos_wd.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100 entries, 0 to 99
Data columns (total 4 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   Symbol          100 non-null    object
 1   Company Name    100 non-null    object
 2   Price Range     100 non-null    object
 3   Shares Offered  100 non-null    object
dtypes: object(4)
memory usage: 3.2+ KB


In [171]:

def categorize_company_name(name: str) -> str:
	"""
	Categorize company names based on specific patterns.
	"""
	name = name.lower()
	if 'acquisition corp' in name or 'acquisition corporation' in name:
		return 'Acq.Corp'
	elif 'incorporated' in name or 'inc' in name:
		return 'Inc'
	elif 'group' in name:
		return 'Group'
	elif 'limited' in name or 'ltd' in name:
		return 'Ltd'
	elif 'holdings' in name:
		return 'Holdings'
	else:
		return 'Other'
# Apply the categorization function to the 'Company Name' column
ipos_wd['Company Class'] = ipos_wd['Company Name'].apply(categorize_company_name)
ipos_wd.info()
ipos_wd.head(10)
# This code does not consider the order and there are issues with holdings and other types 
# ipos_wd['Lower Cname'] = ipos_wd['Company Name'].str.lower()
# ipos_wd['Company Class'] = ipos_wd['Lower Cname'].str.extract(r'(acquisition corp|acquisition corporation|inc|incorporated|group|ltd|limited|holdings)', expand=False)
# ipos_wd['Company Class'] = ipos_wd['Company Class'].fillna('Other')
# ipos_wd['Company Class'] = ipos_wd['Company Class'].replace({'acquisition Corp': 'Acq.Corp','acquisition corporation':'Acq. Corp','incorporated': 'Inc','inc':'Inc','group':'Group','limited':'Ltd','ltd':'Ltd','holdings': 'Holdings'})

non_null_so = ipos_wd['Shares Offered'].notnull().sum()
non_null_pr = ipos_wd['Price Range'].notnull().sum()

print(f"Number of non-null values in Shares Offered: {non_null_so}")
print(f"Number of non-null values in Price Range: {non_null_pr}")

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100 entries, 0 to 99
Data columns (total 5 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   Symbol          100 non-null    object
 1   Company Name    100 non-null    object
 2   Price Range     100 non-null    object
 3   Shares Offered  100 non-null    object
 4   Company Class   100 non-null    object
dtypes: object(5)
memory usage: 4.0+ KB
Number of non-null values in Shares Offered: 100
Number of non-null values in Price Range: 100


In [172]:

def parse_price_range(price_range: str) -> float:
	"""
	Parse the price range string and return the average price.
	"""
	if price_range == '-':
		return np.nan
	elif '-' in price_range:
		try:
			low, high = price_range.split('-')
			low = float(low.replace('$', '').strip())
			high = float(high.replace('$', '').strip())
			return (low + high) / 2
		except ValueError:
			return None
	else:
		try:
			return float(price_range.replace('$', '').strip())
		except ValueError:
			return None

ipos_wd['Avg. Price'] = ipos_wd['Price Range'].apply(parse_price_range)
ipos_wd.info()
ipos_wd


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100 entries, 0 to 99
Data columns (total 6 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   Symbol          100 non-null    object 
 1   Company Name    100 non-null    object 
 2   Price Range     100 non-null    object 
 3   Shares Offered  100 non-null    object 
 4   Company Class   100 non-null    object 
 5   Avg. Price      73 non-null     float64
dtypes: float64(1), object(5)
memory usage: 4.8+ KB


Unnamed: 0,Symbol,Company Name,Price Range,Shares Offered,Company Class,Avg. Price
0,ODTX,"Odyssey Therapeutics, Inc.",-,-,Inc,
1,UNFL,"Unifoil Holdings, Inc.",$3.00 - $4.00,2000000,Inc,3.5
2,AURN,"Aurion Biotech, Inc.",-,-,Inc,
3,ROTR,"PHI Group, Inc.",-,-,Inc,
4,ONE,One Power Company,-,-,Other,
...,...,...,...,...,...,...
95,FHP,"Freehold Properties, Inc.",-,-,Inc,
96,CHO,Chobani Inc.,-,-,Inc,
97,IFIT,iFIT Health & Fitness Inc.,$18.00 - $21.00,30769231,Inc,19.5
98,GLGX,"Gerson Lehrman Group, Inc.",-,-,Inc,


In [173]:

def parse_shares_offered(shares: str) -> float:
	"""
	Parse the shares offered string and return the number of shares as an integer.
	"""
	if shares == '-':
		return None
	try:
		# Remove any non-numeric characters except for commas
		cleaned_shares = shares.replace(',', '').replace(' ', '')
		return float(cleaned_shares)
	except ValueError:
		return None

ipos_wd['Shares Offered'] = ipos_wd['Shares Offered'].apply(parse_shares_offered)
ipos_wd.info()
ipos_wd.head(10)


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100 entries, 0 to 99
Data columns (total 6 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   Symbol          100 non-null    object 
 1   Company Name    100 non-null    object 
 2   Price Range     100 non-null    object 
 3   Shares Offered  72 non-null     float64
 4   Company Class   100 non-null    object 
 5   Avg. Price      73 non-null     float64
dtypes: float64(2), object(4)
memory usage: 4.8+ KB


Unnamed: 0,Symbol,Company Name,Price Range,Shares Offered,Company Class,Avg. Price
0,ODTX,"Odyssey Therapeutics, Inc.",-,,Inc,
1,UNFL,"Unifoil Holdings, Inc.",$3.00 - $4.00,2000000.0,Inc,3.5
2,AURN,"Aurion Biotech, Inc.",-,,Inc,
3,ROTR,"PHI Group, Inc.",-,,Inc,
4,ONE,One Power Company,-,,Other,
5,HPOT,The Great Restaurant Development Holdings Limited,$4.00 - $6.00,1400000.0,Ltd,5.0
6,CABR,"Caring Brands, Inc.",$4.00,750000.0,Inc,4.0
7,SQVI,"Sequoia Vaccines, Inc.",$8.00 - $10.00,2775000.0,Inc,9.0
8,SNI,Shenni Holdings Limited,$4.00 - $6.00,3000000.0,Ltd,5.0
9,KMCM,Key Mining Corp.,$2.25,4444444.0,Other,2.25


In [174]:
ipos_wd['Withdrawn Value'] = ipos_wd['Shares Offered'] * ipos_wd['Avg. Price']
ipos_wd.head(100)
withdrawn_value_non_null = ipos_wd['Withdrawn Value'].notnull().sum()
print(f"Number of non-null values in Withdrawn Value: {withdrawn_value_non_null}")

Number of non-null values in Withdrawn Value: 71


In [175]:
withdrawn_value_by_class = ipos_wd.groupby('Company Class')['Withdrawn Value'].sum().reset_index()
withdrawn_value_by_class['Withdrawn Value'] = withdrawn_value_by_class['Withdrawn Value'] / 1_000_000  # Convert to millions
withdrawn_value_by_class = withdrawn_value_by_class.sort_values(by='Withdrawn Value', ascending=False)
print(withdrawn_value_by_class)

  Company Class  Withdrawn Value
0      Acq.Corp      4021.000000
3           Inc      2257.164205
5         Other       767.919999
4           Ltd       549.734585
2      Holdings        75.000000
1         Group        33.787500


### Question 2:   [IPO] Median Sharpe Ratio for 2024 IPOs (First 5 Months)


**What is the median Sharpe ratio (as of 6 June 2025) for companies that went public in the first 5 months of 2024?**

The goal is to replicate the large-scale `yfinance` OHLCV data download and perform basic financial calculations on IPO stocks.


#### Steps:

1. Using the same approach as in Question 1, download the IPOs in 2024 from:  
   [https://stockanalysis.com/ipos/2024/](https://stockanalysis.com/ipos/2024/)  
   Filter to keep only those IPOs **before 1 June 2024** (first 5 months of 2024).  
   ➤ You should have **75 tickers**.

2.  Use **Code Snippet 7** to download daily stock data for those tickers (via `yfinance`).  
   Make sure you understand how `growth_1d` ... `growth_365d`, and volatility columns are defined.  
   Define a new column `growth_252d` representing growth after **252 trading days** (~1 year), in addition to any other growth periods you already track.


3. Calculate the Sharpe ratio assuming a risk-free rate of **4.5%**:

   ```python
   stocks_df['Sharpe'] = (stocks_df['growth_252d'] - 0.045) / stocks_df['volatility']
   ```

   ⚠️ **IMPORTANT** Please use the original version of annualized volatility calculation (it was later corrected to another formula):
   ```python
   stocks_df['volatility'] =   stocks_df['Close'].rolling(30).std() * np.sqrt(252)
   ```
4. Filter the DataFrame to keep data only for the trading day:  
   **‘2025-06-06’**

   Compute descriptive statistics (e.g., `.describe()`) for these columns:  
   - `growth_252d`  
   - `Sharpe`

   You should observe:  
   - `growth_252d` is defined for **71 out of 75 stocks** (some IPOs are too recent or data starts later).  
   - Median `growth_252d` is approximately **0.75** (indicating a 25% decline), while mean is about **1.15**, showing a bias towards high-growth companies pushing the average up.

5. **Answer:**  
   - What is the **median Sharpe ratio** for these 71 stocks?  
   - Note: Positive `Sharpe` means growth exceeding the risk-free rate of 4.5%.  
   - [Additional] Do you observe the **same top 10 companies** when sorting by `growth_252d` versus sorting by `Sharpe`?


In [176]:
# Using the same approach as in Question 1, download the IPOs in 2024 from:
# https://stockanalysis.com/ipos/2024/
# Filter to keep only those IPOs before 1 June 2024 (first 5 months of 2024).
# ➤ You should have 75 tickers.

ipos_2024 = get_ipos_df("https://stockanalysis.com/ipos/2024/")
ipos_2024.info()
# Filter to keep only those IPOs before 1 June 2024
ipos_2024['IPO Date'] = pd.to_datetime(ipos_2024['IPO Date'], errors='coerce')
ipos_2024 = ipos_2024[ipos_2024['IPO Date'] < pd.Timestamp('2024-06-01')]
# Remove rows with "-", " " in 'IPO Price'
ipos_2024 = ipos_2024[ipos_2024['IPO Price'].notnull() & (ipos_2024['IPO Price'] != '-') & (ipos_2024['IPO Price'] != ' ')]
# Check the number of tickers
print(f"Number of tickers in 2024 before June: {len(ipos_2024['Symbol'])}")

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 225 entries, 0 to 224
Data columns (total 6 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   IPO Date      225 non-null    object
 1   Symbol        225 non-null    object
 2   Company Name  225 non-null    object
 3   IPO Price     225 non-null    object
 4   Current       225 non-null    object
 5   Return        225 non-null    object
dtypes: object(6)
memory usage: 10.7+ KB
Number of tickers in 2024 before June: 75


In [177]:
stocks_df = pd.DataFrame(ipos_2024['Symbol'])

all_stocks_data = []

for ticker in ipos_2024['Symbol']:
    try:
        # Fetch the stock data for one ticker
        ticker_obj = yf.Ticker(ticker)
        hist_df = ticker_obj.history(period="max", interval="1d")

        if hist_df.empty:
            print(f"No data found for {ticker}, skipping.")
            continue

        # Add ticker symbol and date features
        hist_df['Ticker'] = ticker
        hist_df['Year'] = hist_df.index.year
        hist_df['Month'] = hist_df.index.month
        hist_df['Weekday'] = hist_df.index.weekday
        hist_df['Date'] = hist_df.index.date

        # Define a new column growth_252d representing growth after 252 trading days
        hist_df['growth_252d'] = hist_df['Close'] / hist_df['Close'].shift(252)

        # Calculate volatility using the specified formula
        hist_df['volatility'] = hist_df['Close'].rolling(30).std() * np.sqrt(252)

        # Calculate the Sharpe ratio assuming a risk-free rate of 4.5%
        # Note: The standard Sharpe Ratio uses returns (growth - 1) and volatility of returns.
        risk_free_rate = 0.045
        hist_df['Sharpe'] = (hist_df['growth_252d'] - risk_free_rate) / hist_df['volatility']

        all_stocks_data.append(hist_df)
        print(f"Successfully processed {ticker}")

    except Exception as e:
        print(f"Error fetching or processing data for {ticker}: {e}")

# Combine the list of DataFrames into a single DataFrame
if all_stocks_data:
    stocks_df = pd.concat(all_stocks_data)
    print("\nData processing complete. `stocks_df` contains all data.")
    # print(stocks_df.tail())
else:
    stocks_df = pd.DataFrame()
    print("\nNo data was fetched. `stocks_df` is empty.")

Successfully processed BOW
Successfully processed HDL
Successfully processed RFAI
Successfully processed JDZG
Successfully processed RAY
Successfully processed BTOC
Successfully processed ZK
Successfully processed GPAT
Successfully processed PAL
Successfully processed SVCO
Successfully processed NNE
Successfully processed CCIX
Successfully processed VIK
Successfully processed ZONE
Successfully processed LOAR
Successfully processed MRX
Successfully processed RBRK
Successfully processed NCI
Successfully processed MFI
Successfully processed YYGH
Successfully processed TRSG
Successfully processed CDTG
Successfully processed CTRI
Successfully processed IBTA
Successfully processed MTEN
Successfully processed TWG
Successfully processed ULS
Successfully processed PACS
Successfully processed MNDR
Successfully processed CTNM
Successfully processed MAMO
Successfully processed ZBAO
Successfully processed BOLD
Successfully processed MMA
Successfully processed UBXG
Successfully processed IBAC
Succes

In [178]:
stocks_df.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 23463 entries, 2024-05-23 00:00:00-04:00 to 2025-06-24 00:00:00-04:00
Data columns (total 15 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Open          23463 non-null  float64
 1   High          23463 non-null  float64
 2   Low           23463 non-null  float64
 3   Close         23463 non-null  float64
 4   Volume        23463 non-null  int64  
 5   Dividends     23463 non-null  float64
 6   Stock Splits  23463 non-null  float64
 7   Ticker        23463 non-null  object 
 8   Year          23463 non-null  int32  
 9   Month         23463 non-null  int32  
 10  Weekday       23463 non-null  int32  
 11  Date          23463 non-null  object 
 12  growth_252d   4637 non-null   float64
 13  volatility    21288 non-null  float64
 14  Sharpe        4637 non-null   float64
dtypes: float64(9), int32(3), int64(1), object(2)
memory usage: 2.6+ MB


In [179]:
# Filter the DataFrame to keep data only for the trading day:
# ‘2025-06-06’
trading_day = pd.Timestamp('2025-06-06')
filtered_stocks_df = stocks_df[stocks_df['Date'] == trading_day.date()]
filtered_stocks_df = filtered_stocks_df[['Ticker', 'growth_252d', 'Sharpe']]
filtered_stocks_df = filtered_stocks_df.dropna(subset=['growth_252d', 'Sharpe'])
print(len(filtered_stocks_df))

# Compute descriptive statistics (e.g., .describe()) for these columns: growth_252d Sharpe

stats = filtered_stocks_df[['growth_252d', 'Sharpe']].describe()
print("\nDescriptive statistics for growth_252d and Sharpe:")
print(stats)

71

Descriptive statistics for growth_252d and Sharpe:
       growth_252d     Sharpe
count    71.000000  71.000000
mean      1.152895   0.287253
std       1.406018   0.519513
min       0.024970  -0.079677
25%       0.293422   0.039684
50%       0.758065   0.080707
75%       1.362736   0.311507
max       8.097413   2.835668


### Question 3: [IPO] ‘Fixed Months Holding Strategy’

**What is the optimal number of months (1 to 12) to hold a newly IPO'd stock in order to maximize average growth?**  
(*Assume you buy at the close of the first trading day and sell after a fixed number of trading days.*)


---

#### Goal:
Investigate whether holding an IPO stock for a fixed number of months after its first trading day produces better returns, using future growth columns.

#### Steps:

1. **Start from the existing DataFrame** from Question 2 (75 tickers from IPOs in the first 5 months of 2024).  

   Add **12 new columns**:  
   `future_growth_1m`, `future_growth_2m`, ..., `future_growth_12m`  
   *(Assume 1 month = 21 trading days, so growth is calculated over 21, 42, ..., 252 trading days)*  
   This logic is similar to `stocks_df['growth_future_30d']` from **Code Snippet 7**, but extended to longer timeframes.

2. **Determine the first trading day** (`min_date`) for each ticker.  
   This is the earliest date in the data for each stock.

3. **Join the data**:  
   Perform an **inner join** between the `min_date` DataFrame and the future growth data on both `ticker` and `date`.  
   ➤ You should end up with **75 records** (one per IPO) with all 12 `future_growth_...` fields populated.

4. **Compute descriptive statistics** for the resulting DataFrame:  
   Use `.describe()` or similar to analyze each of the 12 columns:  
   - `future_growth_1m`  
   - `future_growth_2m`  
   - ...  
   - `future_growth_12m`  

5. **Determine the best holding period**:  
   - Find the number of months **(1 to 12)** where the **average (mean)** future growth is **maximal**.  
   - This optimal month shows an uplift of **>1%** compared to all others.  
   - Still, the average return remains **less than 1** (i.e., expected return is less than doubling your investment).

In [180]:
# Add 12 new columns:
# future_growth_1m, future_growth_2m, ..., future_growth_12m
# (Assume 1 month = 21 trading days, so growth is calculated over 21, 42, ..., 252 trading days)
# This logic is similar to historyPrices['growth_future_30d'] from Code Snippet 7, but extended to longer timeframes.
# stocks_df = stocks_df.dropna(subset=['growth_252d', 'Sharpe'])

for i in range(1, 13):
	days = i * 21  # 21 trading days per month
	stocks_df[f'future_growth_{i}m'] = stocks_df['Close'] / stocks_df['Close'].shift(days)
# remove dates from dataframe
stocks_df = stocks_df.drop(columns=['Date'])
# convert Ticker to string type
stocks_df = stocks_df.reset_index()	
# remove time from date
stocks_df['Date'] = pd.to_datetime(stocks_df['Date']).dt.date
stocks_df['Date'] = pd.to_datetime(stocks_df['Date'], errors='coerce')

stocks_df.info()
stocks_df.head(10)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 23463 entries, 0 to 23462
Data columns (total 27 columns):
 #   Column             Non-Null Count  Dtype         
---  ------             --------------  -----         
 0   Date               23463 non-null  datetime64[ns]
 1   Open               23463 non-null  float64       
 2   High               23463 non-null  float64       
 3   Low                23463 non-null  float64       
 4   Close              23463 non-null  float64       
 5   Volume             23463 non-null  int64         
 6   Dividends          23463 non-null  float64       
 7   Stock Splits       23463 non-null  float64       
 8   Ticker             23463 non-null  object        
 9   Year               23463 non-null  int32         
 10  Month              23463 non-null  int32         
 11  Weekday            23463 non-null  int32         
 12  growth_252d        4637 non-null   float64       
 13  volatility         21288 non-null  float64       
 14  Sharpe

Unnamed: 0,Date,Open,High,Low,Close,Volume,Dividends,Stock Splits,Ticker,Year,...,future_growth_3m,future_growth_4m,future_growth_5m,future_growth_6m,future_growth_7m,future_growth_8m,future_growth_9m,future_growth_10m,future_growth_11m,future_growth_12m
0,2024-05-23,23.0,24.27,22.139999,23.799999,3335800,0.0,0.0,BOW,2024,...,,,,,,,,,,
1,2024-05-24,24.26,26.15,23.98,25.700001,990500,0.0,0.0,BOW,2024,...,,,,,,,,,,
2,2024-05-28,25.85,26.879999,25.075001,26.48,555100,0.0,0.0,BOW,2024,...,,,,,,,,,,
3,2024-05-29,26.440001,26.49,25.500999,26.290001,302700,0.0,0.0,BOW,2024,...,,,,,,,,,,
4,2024-05-30,27.209999,27.209999,25.5,26.139999,200900,0.0,0.0,BOW,2024,...,,,,,,,,,,
5,2024-05-31,26.49,26.99,25.1,26.799999,198800,0.0,0.0,BOW,2024,...,,,,,,,,,,
6,2024-06-03,27.0,27.49,26.360001,26.639999,283500,0.0,0.0,BOW,2024,...,,,,,,,,,,
7,2024-06-04,26.129999,26.9,25.200001,25.23,169800,0.0,0.0,BOW,2024,...,,,,,,,,,,
8,2024-06-05,25.16,25.9,24.370001,25.360001,348400,0.0,0.0,BOW,2024,...,,,,,,,,,,
9,2024-06-06,25.4,26.200001,25.17,25.42,100000,0.0,0.0,BOW,2024,...,,,,,,,,,,


In [181]:
min_dates = stocks_df.groupby('Ticker').agg(
    Date=('Date', 'min')  # Find the min of 'Date' and name the new column 'Date'
).reset_index()

min_dates.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 75 entries, 0 to 74
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype         
---  ------  --------------  -----         
 0   Ticker  75 non-null     object        
 1   Date    75 non-null     datetime64[ns]
dtypes: datetime64[ns](1), object(1)
memory usage: 1.3+ KB


In [185]:
# Join the data:
# Perform an inner join between the min_date DataFrame and the future growth data on both ticker and date.
ipo_first_day_growth = pd.merge(
    left=min_dates,
    right=stocks_df,
    on=['Ticker', 'Date'], # The 'on' parameter can be used directly
    how='inner'
)


# ➤ You should end up with 75 records (one per IPO) with all 12 future_growth_... fields populated.
ipo_first_day_growth.info()

ipo_first_day_growth.head(10)


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 75 entries, 0 to 74
Data columns (total 27 columns):
 #   Column             Non-Null Count  Dtype         
---  ------             --------------  -----         
 0   Ticker             75 non-null     object        
 1   Date               75 non-null     datetime64[ns]
 2   Open               75 non-null     float64       
 3   High               75 non-null     float64       
 4   Low                75 non-null     float64       
 5   Close              75 non-null     float64       
 6   Volume             75 non-null     int64         
 7   Dividends          75 non-null     float64       
 8   Stock Splits       75 non-null     float64       
 9   Year               75 non-null     int32         
 10  Month              75 non-null     int32         
 11  Weekday            75 non-null     int32         
 12  growth_252d        0 non-null      float64       
 13  volatility         0 non-null      float64       
 14  Sharpe      

Unnamed: 0,Ticker,Date,Open,High,Low,Close,Volume,Dividends,Stock Splits,Year,...,future_growth_3m,future_growth_4m,future_growth_5m,future_growth_6m,future_growth_7m,future_growth_8m,future_growth_9m,future_growth_10m,future_growth_11m,future_growth_12m
0,AHR,2024-02-07,12.085784,12.471401,11.878868,12.43378,12732800,0.0,0.0,2024,...,10.191622,12.18998,12.951854,12.071631,15.542224,12.752594,11.29317,10.361483,6.720962,5.405991
1,ALAB,2024-03-20,52.560001,63.5,50.610001,62.029999,16843300,0.0,0.0,2024,...,0.494539,0.374645,0.340843,0.378879,0.452246,0.789688,0.937434,1.09035,0.914357,1.082548
2,ANRO,2024-02-02,22.0,23.27,20.0,20.700001,2386300,0.0,0.0,2024,...,1.982759,1.9923,2.0,2.010685,2.015383,2.023461,2.030407,2.033399,2.037402,2.045455
3,AS,2024-02-01,13.4,13.8,13.1,13.4,18656400,0.0,0.0,2024,...,10.894308,9.503546,6.536585,7.362637,6.118721,5.1341,4.802867,4.466667,3.300493,2.971175
4,AUNA,2024-03-22,9.51,10.32,9.3,9.6,9046900,0.0,0.0,2024,...,0.932945,0.940255,0.941177,0.94768,0.95418,0.957606,0.960961,0.958945,0.961924,0.963372
5,AVBP,2024-01-26,24.0,25.950001,20.0,20.0,1992600,0.0,0.0,2024,...,18.181818,18.691588,12.5,9.852217,8.403361,6.546645,6.060606,12.658228,4.08998,4.219409
6,BKHA,2024-05-13,10.2,10.2,10.13,10.13,1400,0.0,0.0,2024,...,1.391483,1.252163,1.238386,1.521021,1.397241,1.436879,1.361559,1.341722,1.173812,1.250617
7,BOLD,2024-03-28,14.25,15.24,14.1,14.25,1754100,0.0,0.0,2024,...,10.877863,9.193549,9.760274,7.66129,4.626623,3.947369,3.580402,3.501228,4.130435,3.53598
8,BOW,2024-05-23,23.0,24.27,22.139999,23.799999,3335800,0.0,0.0,2024,...,,,,,,,,,,
9,BTOC,2024-05-14,5.0,6.26,4.139,4.62,1323900,0.0,0.0,2024,...,2.381443,3.982759,3.85,3.404569,2.75,3.059603,2.444444,2.8,1.402124,1.44375


In [None]:

ipo_first_day_growth.describe(
	include=[np.number]  # Include only numeric columns
).filter(like='future_growth_')


# Determine the best holding period:

# Find the number of months (1 to 12) where the average (mean) future growth is maximal.
# This optimal month shows an uplift of >1% compared to all others.
# Still, the average return remains less than 1 (i.e., expected return is less than doubling your investment).

optimal_months = ipo_first_day_growth.filter(like='future_growth_').mean().idxmax()
optimal_months_value = ipo_first_day_growth[optimal_months].mean()
print(f"The optimal month for future growth is: {optimal_months} with an average growth of {optimal_months_value:.4f}")


The optimal month for future growth is: future_growth_1m with an average growth of 248.4911


### Question 4: [Strategy] Simple RSI-Based Trading Strategy


**What is the total profit (in $thousands) you would have earned by investing $1000 every time a stock was oversold (RSI < 25)?**


---

#### Goal:
Apply a simple rule-based trading strategy using the **Relative Strength Index (RSI)** technical indicator to identify oversold signals and calculate profits.

### [EXPLORATORY] Question 5: Finding Your Strategy for IPOs

You've seen in the first questions that the median and average investments are negative in IPOs, and you can't blindly invest in all deals.

How would you correct/refine the approach? Briefly describe the steps and the data you'll try to get (it should be generally feasible to do it from public sources - no access to internal data of companies)?

E.g. (some ideas) Do you want to focus on the specific vertical? Do you want to build a smart comparison vs. existing stocks on the market? Or you just will want to get some features (which features?) like total number of people in a company to find a segment of "successful" IPOs?


#### Steps:

1. **Run the full notebook from Lecture 2 (33 stocks)**  
   - Ensure you can generate the merged DataFrame containing:  
     - OHLCV data  
     - Technical indicators  
     - Macro indicators  
   - Focus on getting **RSI** computed using **Code Snippets 8 and 9**.  
   - This process is essential and will help during the capstone project.

2. ⚠️ **IMPORTANT** Please use this file to solve the Home Assignment (**all next steps**)
 
   Download precomputed data using this snippet:

   ```python
   import gdown
   import pandas as pd

   file_id = "1grCTCzMZKY5sJRtdbLVCXg8JXA8VPyg-"
   gdown.download(f"https://drive.google.com/uc?id={file_id}", "data.parquet", quiet=False)
   df = pd.read_parquet("data.parquet", engine="pyarrow")

3. **RSI Strategy Setup:**  
   - RSI is already available in the dataset as a field.  
   - The threshold for **oversold** is defined as `RSI < 25`.

4. **Filter the dataset by RSI and date:**  
   ```python
   rsi_threshold = 25
   selected_df = df[
       (df['rsi'] < rsi_threshold) &
       (df['Date'] >= '2000-01-01') &
       (df['Date'] <= '2025-06-01')
   ]
5. **Calculate Net Profit Over 25 Years:**  
   - Total number of trades: **1568**  
   - For each trade, you invest **$1000**  
   - Use the 30-day forward return (`growth_future_30d`) to compute net earnings:  
     ```python
     net_income = 1000 * (selected_df['growth_future_30d'] - 1).sum()
     ```

   - **Final Answer:**  
     What is the **net income in $K** (i.e., in thousands of dollars) that could be earned using this RSI-based oversold strategy from 2000–2025?

### Q5. [Exploratory, Optional] Predicting a Positive-Return IPO

Most of the strategies for investing in IPOs deliver **negative average and median returns** (and even 75% quantiles).

**Question:**  
How would you change the strategy if you want to **increase the profitability**?

> This is an open-ended brainstorming question — propose ideas for identifying IPOs with positive future returns or building a more effective trading strategy.