<a href="https://colab.research.google.com/github/YannPhamVan/StockMarketsAnalyticsZoomcamp/blob/main/02-dataframe-analysis-homework-Q1-3.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Module 2 Homework (2025 Cohort)

In this homework, we're going to combine data from various sources to process it in Pandas and generate additional fields.

Unless stated otherwise, please reuse the code snippets from the [Colab notebook](https://github.com/DataTalksClub/stock-markets-analytics-zoomcamp/blob/main/02-dataframe-analysis/%5B2025%5D_Module_02_Colab_Working_with_the_data.ipynb) covered at the livestream.

---
### 0) Imports and Installs

In [1]:
!pip install yfinance



In [2]:
# IMPORTS
import numpy as np
import pandas as pd
import requests


#Fin Data Sources
import yfinance as yf
import pandas_datareader as pdr

#Data viz
import plotly.graph_objs as go
import plotly.express as px

import time
from datetime import date

# for graphs
import matplotlib.pyplot as plt

---
### Question 1: [IPO] Withdrawn IPOs by Company Type

**What is the total withdrawn IPO value (in $ millions) for the company class with the highest total withdrawal value?**

From the withdrawn IPO list ([stockanalysis.com/ipos/withdrawn](https://stockanalysis.com/ipos/withdrawn/)), collect and process the data to find out which company type saw the most withdrawn IPO value.

#### Steps:
1. Use `pandas.read_html()` with the URL above to load the IPO withdrawal table into a DataFrame.
   *It is a similar process to Code Snippet 1 discussed at the livestream.*    You should get **99 entries**.

In [3]:
import pandas as pd
import requests
from io import StringIO

def get_withdrawn_ipos() -> pd.DataFrame:
    """
    Fetch withdrawn IPO data from stockanalysis.com.
    """
    url = "https://stockanalysis.com/ipos/withdrawn/"
    headers = {
        'User-Agent': (
            'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
            'AppleWebKit/537.36 (KHTML, like Gecko) '
            'Chrome/58.0.3029.110 Safari/537.3'
        )
    }

    try:
        response = requests.get(url, headers=headers, timeout=10)
        response.raise_for_status()

        # Wrap HTML text in StringIO to avoid deprecation warning
        # "Passing literal html to 'read_html' is deprecated and will be removed in a future version. To read from a literal string, wrap it in a 'StringIO' object."
        html_io = StringIO(response.text)
        tables = pd.read_html(html_io)

        if not tables:
            raise ValueError("No tables found.")

        return tables[0]

    except requests.exceptions.RequestException as e:
        print(f"Request failed: {e}")
    except ValueError as ve:
        print(f"Data error: {ve}")
    except Exception as ex:
        print(f"Unexpected error: {ex}")

    return pd.DataFrame()

In [4]:
withdrawn_ipos = get_withdrawn_ipos()
withdrawn_ipos.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100 entries, 0 to 99
Data columns (total 4 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   Symbol          100 non-null    object
 1   Company Name    100 non-null    object
 2   Price Range     100 non-null    object
 3   Shares Offered  100 non-null    object
dtypes: object(4)
memory usage: 3.3+ KB


In [5]:
withdrawn_ipos

Unnamed: 0,Symbol,Company Name,Price Range,Shares Offered
0,ODTX,"Odyssey Therapeutics, Inc.",-,-
1,UNFL,"Unifoil Holdings, Inc.",$3.00 - $4.00,2000000
2,AURN,"Aurion Biotech, Inc.",-,-
3,ROTR,"PHI Group, Inc.",-,-
4,ONE,One Power Company,-,-
...,...,...,...,...
95,FHP,"Freehold Properties, Inc.",-,-
96,CHO,Chobani Inc.,-,-
97,IFIT,iFIT Health & Fitness Inc.,$18.00 - $21.00,30769231
98,GLGX,"Gerson Lehrman Group, Inc.",-,-


2. Create a new column called `Company Class`, categorizing company names based on patterns like:
   - “Acquisition Corp” or “Acquisition Corporation” → `Acq.Corp`
   - “Inc” or “Incorporated” → `Inc`
   - “Group” → `Group`
   - “Ltd” or “Limited” → `Limited`
   - “Holdings” → `Holdings`
   - Others → `Other`

  *  Order: Please follow the listed order of classes and assign the first matched value (e.g., for 'shenni holdings limited', you assign the 'Limited' class).

  * Hint: make your function more robust by converting names to lowercase and splitting into words before matching patterns.


In [6]:
def get_company_class(company_name: str):
  ''' Apply a company name to get its class.
  '''
  cn = company_name.lower()
  if cn.find('acquisition corp')>=0 or cn.find('acquisition corporation')>=0:
    return 'Acq.Corp'
  elif cn.find('inc')>=0 or cn.find('incorporated')>=0:
    return 'Inc'
  elif cn.find('group')>=0:
    return 'Group'
  elif cn.find('ltd')>=0 or cn.find('limited')>=0:
    return 'Limited'
  elif cn.find('holdings')>=0:
    return 'Holdings'
  else:
    return 'Other'

withdrawn_ipos["Company Class"] = withdrawn_ipos["Company Name"].apply(get_company_class)

withdrawn_ipos

Unnamed: 0,Symbol,Company Name,Price Range,Shares Offered,Company Class
0,ODTX,"Odyssey Therapeutics, Inc.",-,-,Inc
1,UNFL,"Unifoil Holdings, Inc.",$3.00 - $4.00,2000000,Inc
2,AURN,"Aurion Biotech, Inc.",-,-,Inc
3,ROTR,"PHI Group, Inc.",-,-,Inc
4,ONE,One Power Company,-,-,Other
...,...,...,...,...,...
95,FHP,"Freehold Properties, Inc.",-,-,Inc
96,CHO,Chobani Inc.,-,-,Inc
97,IFIT,iFIT Health & Fitness Inc.,$18.00 - $21.00,30769231,Inc
98,GLGX,"Gerson Lehrman Group, Inc.",-,-,Inc


In [7]:
withdrawn_ipos.value_counts('Company Class')

Unnamed: 0_level_0,count
Company Class,Unnamed: 1_level_1
Inc,51
Acq.Corp,21
Limited,17
Other,6
Group,4
Holdings,1


In [8]:
withdrawn_ipos.loc[withdrawn_ipos['Company Class'] == 'Other']

Unnamed: 0,Symbol,Company Name,Price Range,Shares Offered,Company Class
4,ONE,One Power Company,-,-,Other
9,KMCM,Key Mining Corp.,$2.25,4444444,Other
53,CLLB,"CoLabs Int’l, Corp.",$4.50,1300000,Other
74,TSIV,Twelve Seas Investment Company IV TMT,$10.00,20000000,Other
86,FSPR,Four Springs Capital Trust,$13.00 - $15.00,18000000,Other
99,HCG,hear.com N.V.,$17.00 - $20.00,16220000,Other
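The hint in step 2 (lowercase the name and split it into words) could be sketched as follows. This is an illustration rather than the method used in the cell above: `classify_company` is a hypothetical helper, and because it matches whole words instead of substrings, its counts may differ from the table shown (a substring test such as `find('inc')` also matches a name like "Zinc Mining Corp.").

```python
import re

def classify_company(company_name: str) -> str:
    """Classify a company name by whole-word matching (hypothetical helper)."""
    # Lowercase and keep alphabetic words only, so "Inc." and "Inc" both become "inc".
    words = re.findall(r"[a-z]+", company_name.lower())
    word_set = set(words)
    # Adjacent word pairs detect the two-word "acquisition corp(oration)" pattern.
    pairs = set(zip(words, words[1:]))

    # The checks follow the order listed in the homework; the first match wins.
    if ("acquisition", "corp") in pairs or ("acquisition", "corporation") in pairs:
        return "Acq.Corp"
    if word_set & {"inc", "incorporated"}:
        return "Inc"
    if "group" in word_set:
        return "Group"
    if word_set & {"ltd", "limited"}:
        return "Limited"
    if "holdings" in word_set:
        return "Holdings"
    return "Other"

print(classify_company("Shenni Holdings Limited"))  # Limited
print(classify_company("Zinc Mining Corp."))        # Other
```

Matching on words keeps the listed priority order intact while avoiding accidental hits inside longer words.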



3. Define a new field `Avg. Price` by parsing the `Price Range` field (create a function and apply it to the `Price Range` column). Examples:
   - '$8.00-$10.00' → `9.0`  
   - '$5.00' → `5.0`  
   - '-' → `None`

In [9]:
def get_avg_price(price_range: str):
  '''
  Compute average price from price range.
  '''
  pr = price_range.replace('$', '')
  if pr == '-':
    return None
  elif pr.find('-') >= 0:
    return (float(pr.split('-')[0]) + float(pr.split('-')[1])) / 2
  else:
    return float(pr)

withdrawn_ipos["Avg. Price"] = withdrawn_ipos["Price Range"].apply(get_avg_price)

withdrawn_ipos

Unnamed: 0,Symbol,Company Name,Price Range,Shares Offered,Company Class,Avg. Price
0,ODTX,"Odyssey Therapeutics, Inc.",-,-,Inc,
1,UNFL,"Unifoil Holdings, Inc.",$3.00 - $4.00,2000000,Inc,3.5
2,AURN,"Aurion Biotech, Inc.",-,-,Inc,
3,ROTR,"PHI Group, Inc.",-,-,Inc,
4,ONE,One Power Company,-,-,Other,
...,...,...,...,...,...,...
95,FHP,"Freehold Properties, Inc.",-,-,Inc,
96,CHO,Chobani Inc.,-,-,Inc,
97,IFIT,iFIT Health & Fitness Inc.,$18.00 - $21.00,30769231,Inc,19.5
98,GLGX,"Gerson Lehrman Group, Inc.",-,-,Inc,
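As an alternative sketch of the parsing in step 3, a regular expression can pull out every dollar amount and average them, which handles ranges written with or without spaces around the dash. `parse_avg_price` is a hypothetical name, and the pattern assumes prices carry no thousands separators (e.g. no "$1,000.00").

```python
import re
from typing import Optional

def parse_avg_price(price_range: str) -> Optional[float]:
    """Return the mean of all amounts found in the string, or None if there are none."""
    # Find every number such as 8.00 or 10; a lone "-" yields no matches.
    amounts = [float(x) for x in re.findall(r"\d+(?:\.\d+)?", price_range)]
    if not amounts:
        return None
    return sum(amounts) / len(amounts)

print(parse_avg_price("$8.00-$10.00"))  # 9.0
print(parse_avg_price("$5.00"))         # 5.0
print(parse_avg_price("-"))             # None
```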



4. Convert `Shares Offered` to numeric, clean missing or invalid values.

In [10]:
withdrawn_ipos["Shares Offered"] = pd.to_numeric(withdrawn_ipos["Shares Offered"], errors='coerce')
withdrawn_ipos.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100 entries, 0 to 99
Data columns (total 6 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   Symbol          100 non-null    object 
 1   Company Name    100 non-null    object 
 2   Price Range     100 non-null    object 
 3   Shares Offered  72 non-null     float64
 4   Company Class   100 non-null    object 
 5   Avg. Price      73 non-null     float64
dtypes: float64(2), object(4)
memory usage: 4.8+ KB



5. Create a new column:  
   `Withdrawn Value = Shares Offered * Avg. Price` (**71 non-null values**)

In [11]:
withdrawn_ipos["Withdrawn Value"] = withdrawn_ipos["Shares Offered"] * withdrawn_ipos["Avg. Price"]
withdrawn_ipos.notna().sum()

Unnamed: 0,0
Symbol,100
Company Name,100
Price Range,100
Shares Offered,72
Company Class,100
Avg. Price,73
Withdrawn Value,71



6. Group by `Company Class` and calculate total withdrawn value.

In [12]:
withdrawn_ipos.groupby('Company Class').agg({'Withdrawn Value': 'sum'})

Unnamed: 0_level_0,Withdrawn Value
Company Class,Unnamed: 1_level_1
Acq.Corp,4021000000.0
Group,33787500.0
Holdings,75000000.0
Inc,2257164000.0
Limited,549734600.0
Other,767920000.0



7. **Answer**: Which class had the highest **total** value of withdrawals?
---

In [13]:
withdrawn_ipos.groupby('Company Class').agg({'Withdrawn Value': 'sum'}).sort_values('Withdrawn Value', ascending=False).iloc[0]

Unnamed: 0,Acq.Corp
Withdrawn Value,4021000000.0
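The question asks for the total in $ millions, while the grouped totals above are in dollars. A small sketch of steps 4 to 7 on invented toy data (the rows below are made up for illustration and are not the scraped table) shows the conversion and how `idxmax` picks the top class:

```python
import pandas as pd

# Toy rows, invented for illustration only.
toy = pd.DataFrame({
    "Company Class": ["Acq.Corp", "Inc", "Acq.Corp", "Other"],
    "Shares Offered": ["20000000", "2000000", "-", "1300000"],
    "Avg. Price": [10.0, 3.5, 10.0, 4.5],
})

# "-" cannot be parsed as a number, so errors="coerce" turns it into NaN.
toy["Shares Offered"] = pd.to_numeric(toy["Shares Offered"], errors="coerce")
toy["Withdrawn Value"] = toy["Shares Offered"] * toy["Avg. Price"]

# Sum per class (NaN rows are skipped) and express the result in $ millions.
totals_musd = toy.groupby("Company Class")["Withdrawn Value"].sum().div(1e6)

top_class = totals_musd.idxmax()
top_value = totals_musd.max()
print(top_class, top_value)  # Acq.Corp 200.0
```

Applied to the totals shown above, the same division turns 4,021,000,000 into 4,021 ($ millions).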


---
### Question 2:   [IPO] Median Sharpe Ratio for 2024 IPOs (First 5 Months)


**What is the median Sharpe ratio (as of 6 June 2025) for companies that went public in the first 5 months of 2024?**

The goal is to replicate the large-scale `yfinance` OHLCV data download and perform basic financial calculations on IPO stocks.


#### Steps:

1. Using the same approach as in Question 1, download the IPOs in 2024 from:  
   [https://stockanalysis.com/ipos/2024/](https://stockanalysis.com/ipos/2024/)  
   Filter to keep only those IPOs **before 1 June 2024** (first 5 months of 2024).  
   ➤ You should have **75 tickers**.


In [14]:
def get_ipos_by_year(year: int) -> pd.DataFrame:
    """
    Fetch IPO data for the given year from stockanalysis.com.
    """
    url = f"https://stockanalysis.com/ipos/{year}/"
    headers = {
        'User-Agent': (
            'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
            'AppleWebKit/537.36 (KHTML, like Gecko) '
            'Chrome/58.0.3029.110 Safari/537.3'
        )
    }

    try:
        response = requests.get(url, headers=headers, timeout=10)
        response.raise_for_status()

        # Wrap HTML text in StringIO to avoid deprecation warning
        # "Passing literal html to 'read_html' is deprecated and will be removed in a future version. To read from a literal string, wrap it in a 'StringIO' object."
        html_io = StringIO(response.text)
        tables = pd.read_html(html_io)

        if not tables:
            raise ValueError(f"No tables found for year {year}.")

        return tables[0]

    except requests.exceptions.RequestException as e:
        print(f"Request failed: {e}")
    except ValueError as ve:
        print(f"Data error: {ve}")
    except Exception as ex:
        print(f"Unexpected error: {ex}")

    return pd.DataFrame()

In [15]:
ipos_2024 = get_ipos_by_year(2024)
ipos_2024.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 225 entries, 0 to 224
Data columns (total 6 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   IPO Date      225 non-null    object
 1   Symbol        225 non-null    object
 2   Company Name  225 non-null    object
 3   IPO Price     225 non-null    object
 4   Current       225 non-null    object
 5   Return        225 non-null    object
dtypes: object(6)
memory usage: 10.7+ KB


In [16]:
ipos_2024["IPO Date"] = pd.to_datetime(ipos_2024["IPO Date"], format='mixed')

In [17]:
ipos_first_5_months_2024 = ipos_2024.loc[ipos_2024['IPO Date'] < '2024-06-01']


2.  Use **Code Snippet 7** to download daily stock data for those tickers (via `yfinance`).  
   Make sure you understand how `growth_1d` ... `growth_365d`, and volatility columns are defined.  
   Define a new column `growth_252d` representing growth after **252 trading days** (~1 year), in addition to any other growth periods you already track.

In [18]:
ALL_TICKERS = ipos_first_5_months_2024["Symbol"].to_list()
ALL_TICKERS

['NAKA',
 'BOW',
 'HDL',
 'RFAI',
 'JDZG',
 'RAY',
 'BTOC',
 'ZK',
 'GPAT',
 'PAL',
 'SVCO',
 'NNE',
 'CCIX',
 'VIK',
 'ZONE',
 'LOAR',
 'MRX',
 'RBRK',
 'NCI',
 'MFI',
 'YYGH',
 'TRSG',
 'CDTG',
 'CTRI',
 'IBTA',
 'MTEN',
 'SUPX',
 'TWG',
 'ULS',
 'PACS',
 'MNDR',
 'CTNM',
 'MAMO',
 'ZBAO',
 'BOLD',
 'MMA',
 'UBXG',
 'IBAC',
 'AUNA',
 'BKHA',
 'LOBO',
 'RDDT',
 'ALAB',
 'INTJ',
 'RYDE',
 'LGCL',
 'SMXT',
 'VHAI',
 'DYCQ',
 'CHRO',
 'UMAC',
 'HLXB',
 'MGX',
 'TBBB',
 'TELO',
 'KYTX',
 'PMNT',
 'AHR',
 'LEGT',
 'ANRO',
 'GUTS',
 'AS',
 'FBLG',
 'AVBP',
 'BTSG',
 'HAO',
 'CGON',
 'YIBO',
 'JL',
 'SUGP',
 'JVSA',
 'KSPI',
 'CCTG',
 'PSBD',
 'SYNX',
 'SDHC',
 'ROMA']

In [19]:
stocks_df = pd.DataFrame({'A' : []})

for i,ticker in enumerate(ALL_TICKERS):
  print(i,ticker)

  # Work with stock prices
  ticker_obj = yf.Ticker(ticker)

  historyPrices = ticker_obj.history(
                     period = "max",
                     interval = "1d")

  # generate features for historical prices, and what we want to predict
  historyPrices['Ticker'] = ticker
  historyPrices['Year']= historyPrices.index.year
  historyPrices['Month'] = historyPrices.index.month
  historyPrices['Weekday'] = historyPrices.index.weekday
  historyPrices['Date'] = historyPrices.index.date

  # historical returns (a separate loop variable avoids shadowing the outer `i`)
  for n_days in [1,3,7,30,90, 252, 365]:
    historyPrices['growth_'+str(n_days)+'d'] = historyPrices['Close'] / historyPrices['Close'].shift(n_days)
  historyPrices['growth_future_30d'] = historyPrices['Close'].shift(-30) / historyPrices['Close']

  # Technical indicators
  # SimpleMovingAverage 10 days and 20 days
  historyPrices['SMA10']= historyPrices['Close'].rolling(10).mean()
  historyPrices['SMA20']= historyPrices['Close'].rolling(20).mean()
  historyPrices['growing_moving_average'] = np.where(historyPrices['SMA10'] > historyPrices['SMA20'], 1, 0)
  historyPrices['high_minus_low_relative'] = (historyPrices.High - historyPrices.Low) / historyPrices['Close']

  # 30d rolling volatility : https://ycharts.com/glossary/terms/rolling_vol_30
  historyPrices['volatility'] =   historyPrices['Close'].rolling(30).std() * np.sqrt(252)

  # what we want to predict
  historyPrices['is_positive_growth_30d_future'] = np.where(historyPrices['growth_future_30d'] > 1, 1, 0)

  # sleep 1 sec between downloads - not to overload the API server
  time.sleep(1)


  if stocks_df.empty:
    stocks_df = historyPrices
  else:
    stocks_df = pd.concat([stocks_df, historyPrices], ignore_index=True)

0 NAKA
1 BOW
2 HDL
3 RFAI
4 JDZG
5 RAY
6 BTOC
7 ZK
8 GPAT
9 PAL
10 SVCO
11 NNE
12 CCIX
13 VIK
14 ZONE
15 LOAR
16 MRX
17 RBRK
18 NCI
19 MFI
20 YYGH
21 TRSG
22 CDTG
23 CTRI
24 IBTA
25 MTEN
26 SUPX
27 TWG
28 ULS
29 PACS
30 MNDR
31 CTNM
32 MAMO
33 ZBAO
34 BOLD
35 MMA
36 UBXG
37 IBAC
38 AUNA
39 BKHA
40 LOBO
41 RDDT
42 ALAB
43 INTJ
44 RYDE
45 LGCL
46 SMXT
47 VHAI
48 DYCQ
49 CHRO
50 UMAC
51 HLXB
52 MGX
53 TBBB
54 TELO
55 KYTX
56 PMNT
57 AHR
58 LEGT
59 ANRO
60 GUTS
61 AS
62 FBLG
63 AVBP
64 BTSG
65 HAO
66 CGON
67 YIBO
68 JL
69 SUGP
70 JVSA
71 KSPI
72 CCTG
73 PSBD
74 SYNX
75 SDHC
76 ROMA


In [20]:
stocks_df

Unnamed: 0,Open,High,Low,Close,Volume,Dividends,Stock Splits,Ticker,Year,Month,...,growth_90d,growth_252d,growth_365d,growth_future_30d,SMA10,SMA20,growing_moving_average,high_minus_low_relative,volatility,is_positive_growth_30d_future
0,4.000,4.200,2.80,3.020,440600,0.0,0.0,NAKA,2024,5,...,,,,0.784768,,,0,0.463576,,0
1,2.990,3.110,2.35,2.660,147300,0.0,0.0,NAKA,2024,6,...,,,,0.883459,,,0,0.285714,,0
2,2.530,3.110,2.41,2.920,73800,0.0,0.0,NAKA,2024,6,...,,,,0.955479,,,0,0.239726,,0
3,2.910,3.090,2.60,2.730,51100,0.0,0.0,NAKA,2024,6,...,,,,1.051282,,,0,0.179487,,1
4,2.940,2.940,2.41,2.690,56500,0.0,0.0,NAKA,2024,6,...,,,,1.111524,,,0,0.197026,,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
23798,3.021,3.050,2.85,2.880,36300,0.0,0.0,ROMA,2025,6,...,4.241532,5.938144,,,3.2390,2.78955,1,0.069444,10.886753,0
23799,2.870,2.890,2.56,2.660,123100,0.0,0.0,ROMA,2025,6,...,3.917526,4.666667,,,3.1970,2.82155,1,0.124060,10.383358,0
23800,2.840,3.000,2.64,2.875,63100,0.0,0.0,ROMA,2025,6,...,4.342900,5.424529,,,3.1485,2.86180,1,0.125217,9.939668,0
23801,2.850,2.935,2.79,2.795,10200,0.0,0.0,ROMA,2025,6,...,4.092240,5.008960,,,3.0870,2.89955,1,0.051878,9.469241,0





3. Calculate the Sharpe ratio assuming a risk-free rate of **4.5%**:

   ```python
   stocks_df['Sharpe'] = (stocks_df['growth_252d'] - 0.045) / stocks_df['volatility']
   ```

   ⚠️ **IMPORTANT** Please use the original version of annualized volatility calculation (it was later corrected to another formula):
   ```python
   stocks_df['volatility'] =   stocks_df['Close'].rolling(30).std() * np.sqrt(252)
   ```

In [21]:
stocks_df['Sharpe'] = (stocks_df['growth_252d'] - 0.045) / stocks_df['volatility']
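To see how the three columns interact, here is a minimal sketch on a synthetic price series (a straight line from 10 to 20 over 300 trading days, invented for illustration). It uses the same formulas as the homework, including the original volatility definition, and shows where missing values come from: `growth_252d` needs 252 earlier rows and the 30-day rolling window needs 29.

```python
import numpy as np
import pandas as pd

# Synthetic closing prices, invented for illustration: 300 days rising from 10 to 20.
toy = pd.DataFrame({"Close": np.linspace(10.0, 20.0, 300)})

# Growth ratio over 252 trading days (1.0 means unchanged).
toy["growth_252d"] = toy["Close"] / toy["Close"].shift(252)

# Original homework definition: 30-day rolling std of the price, scaled by sqrt(252).
toy["volatility"] = toy["Close"].rolling(30).std() * np.sqrt(252)

# Sharpe as defined in the homework, with a 4.5% risk-free rate.
toy["Sharpe"] = (toy["growth_252d"] - 0.045) / toy["volatility"]

print(toy[["growth_252d", "volatility", "Sharpe"]].notna().sum())
```

Only the last 48 rows have a defined `Sharpe`, which is why stocks with less than about a year of history drop out of the statistics for a given date.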


4. Filter the DataFrame to keep data only for the trading day:  
   **‘2025-06-06’**

   Compute descriptive statistics (e.g., `.describe()`) for these columns:  
   - `growth_252d`  
   - `Sharpe`

   You should observe:  
   - `growth_252d` is defined for **71 out of 75 stocks** (some IPOs are too recent or data starts later).  
   - Median `growth_252d` is approximately **0.75** (indicating a 25% decline), while mean is about **1.15**, showing a bias towards high-growth companies pushing the average up.

In [22]:
stocks_df['Date'] = pd.to_datetime(stocks_df['Date'], format='mixed')

In [23]:
stocks_df.loc[stocks_df['Date'] == '2025-06-06'].describe()[["growth_252d", "Sharpe"]]

Unnamed: 0,growth_252d,Sharpe
count,73.0,73.0
mean,1.227948,0.284576
min,0.02497,-0.079677
25%,0.29351,0.040265
50%,0.763188,0.083768
75%,1.446667,0.291048
max,8.097413,2.835668
std,1.480237,0.512601




5. **Answer:**  
   - What is the **median Sharpe ratio** for these 71 stocks?  **0.083768**
   - Note: Positive `Sharpe` means growth exceeding the risk-free rate of 4.5%.  
   - [Additional] Do you observe the **same top 10 companies** when sorting by `growth_252d` versus sorting by `Sharpe`? **No: only one ticker (JL) appears in both top-10 lists.**

---

In [24]:
stocks_df.loc[stocks_df['Date'] == '2025-06-06'].sort_values(by="growth_252d", ascending=False).head(10)["Ticker"]

Unnamed: 0,Ticker
20976,JL
23794,ROMA
254,NAKA
14800,UMAC
3211,NNE
4895,RBRK
17183,AHR
18526,AS
7521,SUPX
4607,MRX


In [25]:
stocks_df.loc[stocks_df['Date'] == '2025-06-06'].sort_values(by="Sharpe", ascending=False).head(10)["Ticker"]

Unnamed: 0,Ticker
11413,BKHA
21648,JVSA
17490,LEGT
10826,IBAC
15140,HLXB
8709,MNDR
14128,DYCQ
12662,INTJ
20976,JL
6059,TRSG
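Rather than comparing the two lists by eye, the overlap can be computed with a set intersection. The sketch below uses a hypothetical helper `top_n_overlap` and invented toy values; it assumes one row per ticker, as is the case after filtering to a single date.

```python
import pandas as pd

def top_n_overlap(df: pd.DataFrame, col_a: str, col_b: str,
                  n: int = 10, id_col: str = "Ticker") -> set:
    """Return the ids that appear in the top n of both columns (hypothetical helper)."""
    top_a = set(df.nlargest(n, col_a)[id_col])
    top_b = set(df.nlargest(n, col_b)[id_col])
    return top_a & top_b

# Toy values, invented for illustration only.
toy = pd.DataFrame({
    "Ticker": ["A", "B", "C", "D"],
    "growth_252d": [3.0, 2.0, 1.0, 0.5],
    "Sharpe": [0.85, 0.05, 0.90, 0.80],
})

common = top_n_overlap(toy, "growth_252d", "Sharpe", n=2)
print(common)  # {'A'}
```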


---
### Question 3: [IPO] ‘Fixed Months Holding Strategy’

**What is the optimal number of months (1 to 12) to hold a newly IPO'd stock in order to maximize average growth?**  
(*Assume you buy at the close of the first trading day and sell after a fixed number of trading days.*)


---

#### Goal:
Investigate whether holding an IPO stock for a fixed number of months after its first trading day produces better returns, using future growth columns.

---

#### Steps:

1. **Start from the existing DataFrame** from Question 2 (75 tickers from IPOs in the first 5 months of 2024).  

   Add **12 new columns**:  
   `future_growth_1m`, `future_growth_2m`, ..., `future_growth_12m`  
   *(Assume 1 month = 21 trading days, so growth is calculated over 21, 42, ..., 252 trading days)*  
   This logic is similar to `historyPrices['growth_future_30d']` from **Code Snippet 7**, but extended to longer timeframes.

In [26]:
stocks_df_1st_day = pd.DataFrame({'A' : []})

for i,ticker in enumerate(ALL_TICKERS):
  print(i,ticker)

  # Work with stock prices
  ticker_obj = yf.Ticker(ticker)

  historyPrices = ticker_obj.history(
                     period = "max",
                     interval = "1d")

  # generate features for historical prices, and what we want to predict
  historyPrices['Ticker'] = ticker
  historyPrices['Date'] = historyPrices.index.date

  # future growth over n months (1 month = 21 trading days)
  for n in np.arange(1,13):
    historyPrices['growth_future_'+str(n)+'m'] = historyPrices['Close'].shift(-n*21) / historyPrices['Close']

  # sleep 1 sec between downloads - not to overload the API server
  time.sleep(1)


  if stocks_df_1st_day.empty:
    stocks_df_1st_day = historyPrices
  else:
    stocks_df_1st_day = pd.concat([stocks_df_1st_day, historyPrices], ignore_index=True)

0 NAKA
1 BOW
2 HDL
3 RFAI
4 JDZG
5 RAY
6 BTOC
7 ZK
8 GPAT
9 PAL
10 SVCO
11 NNE
12 CCIX
13 VIK
14 ZONE
15 LOAR
16 MRX
17 RBRK
18 NCI
19 MFI
20 YYGH
21 TRSG
22 CDTG
23 CTRI
24 IBTA
25 MTEN
26 SUPX
27 TWG
28 ULS
29 PACS
30 MNDR
31 CTNM
32 MAMO
33 ZBAO
34 BOLD
35 MMA
36 UBXG
37 IBAC
38 AUNA
39 BKHA
40 LOBO
41 RDDT
42 ALAB
43 INTJ
44 RYDE
45 LGCL
46 SMXT
47 VHAI
48 DYCQ
49 CHRO
50 UMAC
51 HLXB
52 MGX
53 TBBB
54 TELO
55 KYTX
56 PMNT
57 AHR
58 LEGT
59 ANRO
60 GUTS
61 AS
62 FBLG
63 AVBP
64 BTSG
65 HAO
66 CGON
67 YIBO
68 JL
69 SUGP
70 JVSA
71 KSPI
72 CCTG
73 PSBD
74 SYNX
75 SDHC
76 ROMA


2. **Determine the first trading day** (`min_date`) for each ticker.  
   This is the earliest date in the data for each stock.

In [27]:
min_date_df = stocks_df_1st_day.groupby('Ticker').Date.agg('min')





3. **Join the data**:  
   Perform an **inner join** between the `min_date` DataFrame and the future growth data on both `ticker` and `date`.  
   ➤ You should end up with **75 records** (one per IPO) with all 12 `future_growth_...` fields populated.

In [28]:
joined_df = stocks_df_1st_day.merge(min_date_df, on=['Ticker', 'Date'], how='inner')
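The first-day selection in steps 2 and 3 can be checked on a small example. The sketch below uses invented toy prices and a one-day horizon (`growth_future_1d`, a hypothetical column used only here) so the numbers are easy to verify by hand; the future growth is computed per ticker so that one stock's prices never leak into another's.

```python
import pandas as pd

# Toy prices, invented for illustration only.
prices = pd.DataFrame({
    "Ticker": ["AAA", "AAA", "AAA", "BBB", "BBB"],
    "Date": pd.to_datetime(
        ["2024-01-02", "2024-01-03", "2024-01-04", "2024-02-01", "2024-02-02"]
    ),
    "Close": [10.0, 11.0, 12.0, 5.0, 4.0],
})

# Future growth within each ticker: next close divided by the current close.
prices["growth_future_1d"] = (
    prices.groupby("Ticker")["Close"].shift(-1) / prices["Close"]
)

# Earliest date per ticker, kept as a two-column DataFrame for the join.
first_days = prices.groupby("Ticker", as_index=False)["Date"].min()

# Inner join on both keys keeps exactly one row per ticker: its first trading day.
first_day_rows = prices.merge(first_days, on=["Ticker", "Date"], how="inner")
print(first_day_rows)
```

Checking that the result has one row per ticker is a quick sanity test before computing statistics.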



4. **Compute descriptive statistics** for the resulting DataFrame:  
   Use `.describe()` or similar to analyze each of the 12 columns:  
   - `future_growth_1m`  
   - `future_growth_2m`  
   - ...  
   - `future_growth_12m`  

In [29]:
joined_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 77 entries, 0 to 76
Data columns (total 21 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   Open               77 non-null     float64
 1   High               77 non-null     float64
 2   Low                77 non-null     float64
 3   Close              77 non-null     float64
 4   Volume             77 non-null     int64  
 5   Dividends          77 non-null     float64
 6   Stock Splits       77 non-null     float64
 7   Ticker             77 non-null     object 
 8   Date               77 non-null     object 
 9   growth_future_1m   77 non-null     float64
 10  growth_future_2m   77 non-null     float64
 11  growth_future_3m   77 non-null     float64
 12  growth_future_4m   77 non-null     float64
 13  growth_future_5m   77 non-null     float64
 14  growth_future_6m   77 non-null     float64
 15  growth_future_7m   77 non-null     float64
 16  growth_future_8m   77 non-nu



5. **Determine the best holding period**:  
   - Find the number of months **(1 to 12)** where the **average (mean)** future growth is **maximal**.  
   - This optimal month shows an uplift of **>1%** compared to all others.  
   - Still, the average growth ratio remains **less than 1** (i.e., on average the position ends up worth less than its purchase price).

**Answer = 12**

In [30]:
joined_df[joined_df.columns[-12:]].describe().iloc[1].sort_values(ascending=False)

Unnamed: 0,mean
growth_future_12m,0.991712
growth_future_2m,0.936866
growth_future_1m,0.92639
growth_future_10m,0.913272
growth_future_11m,0.896661
growth_future_9m,0.878871
growth_future_6m,0.864348
growth_future_7m,0.846271
growth_future_3m,0.83412
growth_future_8m,0.829974
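A slightly more explicit way to pick the best horizon is to select the growth columns by name and call `mean()` directly, instead of relying on column position and the row order of `describe()`. The sketch below runs on invented toy data with three horizons; the variable names are illustrative.

```python
import pandas as pd

# Toy first-day rows, invented for illustration only.
toy = pd.DataFrame({
    "Ticker": ["AAA", "BBB"],
    "growth_future_1m": [1.10, 0.90],
    "growth_future_2m": [0.80, 0.70],
    "growth_future_3m": [1.30, 0.90],
})

# Mean growth per horizon, best first; filter() selects columns by name.
mean_growth = (
    toy.filter(like="growth_future_").mean().sort_values(ascending=False)
)

best_col = mean_growth.index[0]
# Extract the month count from a name such as "growth_future_3m".
best_months = int(best_col.split("_")[-1].rstrip("m"))
# Gap between the best horizon and the runner-up, in growth-ratio points.
uplift_vs_runner_up = mean_growth.iloc[0] - mean_growth.iloc[1]

print(best_months, round(uplift_vs_runner_up, 4))  # 3 0.1
```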
