#### 04. TARGET HIT

### Target Feature: First-Hit Classification

To evaluate how markets respond to biotech-related news, we define a **discrete classification target** based on short-term price reactions. Specifically, we use a **first-hit logic**: the label records which price threshold, if any, the stock crosses first within a defined window after the news.

#### First-Hit Logic

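For each news article, we track the **adjusted closing prices** of the associated ticker from the **news date (T₀)** through the next **5 trading days (T₀+5)**. If no price is available on the news date itself (for example, a weekend), the first available trading day on or after it serves as T₀.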

We define a directional target `target_hit` as follows:

- **`+1` (Positive Reaction)** → the cumulative return from T₀ reaches **+7%** or more before it reaches −7%  
- **`−1` (Negative Reaction)** → the cumulative return from T₀ reaches **−7%** or less before it reaches +7%  
- **`0` (No Reaction)** → neither threshold is reached within the 5-day window

> The classification is **mutually exclusive**: only the first threshold hit (up or down) determines the label.

This simulates a **realistic market reaction**, where sharp movements often occur shortly after impactful news. It allows us to model the **directional surprise or shift in expectations** triggered by the news.
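
The rule above can be restated compactly. The notation here is introduced only for illustration ($P_t$, $r_t$, $\theta$, $W$ and $\tau^{\pm}$ are not names used elsewhere in this notebook), with $\theta = 0.07$ and $W = 5$ trading days:

$$
r_t = \frac{P_{T_0+t}}{P_{T_0}} - 1, \qquad t = 1, \dots, W
$$

$$
\tau^{+} = \min\{\, t : r_t \ge \theta \,\}, \qquad \tau^{-} = \min\{\, t : r_t \le -\theta \,\}
$$

$$
\texttt{target\_hit} =
\begin{cases}
+1 & \text{if } \tau^{+} < \tau^{-} \\
-1 & \text{if } \tau^{-} < \tau^{+} \\
\phantom{+}0 & \text{if neither } \tau^{+} \text{ nor } \tau^{-} \text{ exists}
\end{cases}
$$

Here the minimum over an empty set is taken as $+\infty$. Since $r_t$ cannot be both $\ge \theta$ and $\le -\theta$ on the same day, a tie between $\tau^{+}$ and $\tau^{-}$ can only occur when neither threshold is reached.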

---

### Labeling Procedure

- Historical prices are sourced from adjusted close data (`adj`).
- The return on each day is the cumulative percentage change relative to the adjusted close at T₀.
- Only the **first threshold crossed** is considered, to capture the *initial directional breakout*.
- If neither +7% nor −7% is reached, the sample is labeled **0** (neutral or noise).
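
As a small worked illustration of the procedure above, the sketch below labels one made-up price path. The prices and dates are hypothetical, not taken from the dataset. It is meant to show why only the first breach counts: the path rises 8% on the second day and later falls 8% below T₀, yet the label is `+1`.

```python
import pandas as pd

# Hypothetical adjusted closes for T0..T0+5 (illustrative values, not real data)
prices = pd.Series(
    [100.0, 104.0, 108.0, 92.0, 95.0, 101.0],
    index=pd.bdate_range("2024-01-01", periods=6),
)

pct = 0.07
rets = prices / prices.iloc[0] - 1   # cumulative return relative to T0
rets = rets.iloc[1:]                 # T0 itself has return 0 and cannot trigger a hit

# Days on which either threshold is breached, in chronological order
hits = rets[rets.abs() >= pct]

if hits.empty:
    label = 0
else:
    label = 1 if hits.iloc[0] > 0 else -1   # sign of the first breach decides

print(label)  # → 1
```

Both thresholds are breached in this path (two rows in `hits`), but the later drop is ignored because the upside threshold was reached first.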

---

This target is especially relevant in biotech markets, where news can rapidly reprice expectations due to:
- Regulatory announcements (e.g., FDA approval/rejection)
- Clinical trial results
- M&A or partnership news

By focusing on **first-hit behavior**, the label captures not just *whether* the news moved the price, but *in which direction* it moved first. Speed enters only through the 5-day window; the day on which the threshold is hit is not stored.

In [1]:
from pathlib import Path
import os
ROOT = Path(__file__).resolve().parents[0] if "__file__" in globals() else Path.cwd()
DATA_DIR = Path(os.getenv("DATA_DIR", ROOT / "data"))  
def p(file): return DATA_DIR / file

In [7]:
#FIRST HIT
import pandas as pd
df_close = pd.read_csv(p("all_biotech_adj_close.csv"),
                       parse_dates=["Date"], index_col="Date")
df_prices = (
    df_close
      .rename(columns=lambda s: s.replace("_adj",""))
      .stack()
      .rename("adj")
      .reset_index()
      .rename(columns={"level_1":"ticker", "Date":"date"})
)
df_prices = df_prices.sort_values(["ticker","date"])



news = pd.read_parquet(p('cisiamo2.parquet'))
news["news_date"] = news["date"].dt.normalize()

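def compute_first_hit(prices: pd.Series, pct: float, window: int) -> int:
    """Return +1 or -1 for the first ±pct threshold crossed after T0, else 0."""
    # Cumulative return relative to the first price in the series (T0)
    rets = prices / prices.iloc[0] - 1

    # Skip T0 itself (return is 0) and keep the next `window` observations
    rets = rets.iloc[1:window+1]

    up_idx = rets[rets >= pct].index.min() if (rets >= pct).any() else None
    dn_idx = rets[rets <= -pct].index.min() if (rets <= -pct).any() else None
    # Compare with None explicitly so a falsy index label is not read as "no hit"
    if up_idx is not None and dn_idx is not None:
        return 1 if up_idx < dn_idx else -1
    elif up_idx is not None:
        return 1
    elif dn_idx is not None:
        return -1
    else:
        return 0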


pct = 0.07
window = 5

results = []
for _, row in news.iterrows():
    tkr = row.ticker
    d0  = row.news_date

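    # Look ahead window*2 calendar days so that weekends and holidays usually
    # still leave enough trading days; head(window+1) trims the excess.
    mask = (df_prices.ticker == tkr) & \
           (df_prices.date >= d0) & \
           (df_prices.date <= d0 + pd.Timedelta(days=window*2))

    seq = df_prices.loc[mask, ["date","adj"]] \
                    .drop_duplicates("date") \
                    .set_index("date") \
                    .sort_index() \
                    .head(window+1)["adj"]
    # With fewer than two prices there is no return to measure; default to 0.
    if len(seq) < 2:
        target = 0
    else:
        target = compute_first_hit(seq, pct, window)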
    results.append(target)

news["target_hit"] = results


print(news["target_hit"].value_counts(normalize=True))


news.to_parquet("final_hit__.parquet", index=False)

target_hit
 0    0.447291
-1    0.286808
 1    0.265901
Name: proportion, dtype: float64


### First-Hit label distribution

- **0 (No reaction)**: **44.7%**
- **−1 (Negative first hit)**: **28.7%**
- **+1 (Positive first hit)**: **26.6%**

**Interpretation.** Class *0* is the largest bucket (many events don’t cross ±7% within 5 trading days), while the directional classes are roughly balanced (−1 ≈ +1).

**For the ML model:** we will train a **binary classifier** using **only −1 and +1**, excluding class *0* to reduce noise and improve separability. Class *0* remains useful for **gating/abstention** in backtests and analysis.
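
The split described above could look roughly like the sketch below. It uses a toy frame in place of `news`; the tickers, the `y` column name and the 0/1 encoding are assumptions made for illustration, not choices confirmed elsewhere in this notebook.

```python
import pandas as pd

# Toy frame standing in for `news`; only `target_hit` matters here
toy = pd.DataFrame({
    "ticker": ["AAA", "BBB", "CCC", "DDD"],   # hypothetical tickers
    "target_hit": [1, 0, -1, 1],
})

# Keep only directional events (-1 / +1) for training the binary classifier
directional = toy[toy["target_hit"] != 0].copy()

# Hypothetical 0/1 encoding for the classifier: -1 -> 0, +1 -> 1
directional["y"] = (directional["target_hit"] == 1).astype(int)

# Class-0 rows are set aside, e.g. to study gating/abstention in backtests
neutral = toy[toy["target_hit"] == 0]

print(directional)
print(len(neutral))
```

Keeping `neutral` as a separate frame, rather than discarding it, leaves the class-0 events available for later analysis.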