# Feature Scoring and Buy Signal Analysis Summary

## Overview
This notebook builds a scoring system that combines analyst ratings, technical indicators, and price and volume measures into a single buy signal score. Each feature is min-max normalized to a 0-10 scale, the normalized features are averaged within four categories, and the category scores are averaged into a final score for investment decision support.

## Process Overview
1. **Data Integration** - Merge the clustered data with the technical indicators and analyst features on `ticker` and `date` (inner join)
2. **Feature Categorization** - Group features into four categories (Analyst, Volatility, Volume, Price)
3. **Normalization** - Min-max scale each feature to a 0-10 range, within each cluster when a `cluster` column is present
4. **Category Scoring** - Average the normalized features in each category
5. **Final Scoring** - Average the category scores into one buy signal score per row
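
As a minimal sketch of steps 3 and 4, the toy example below scales two features to 0-10 within each cluster and averages them into a category score. The column names `f1` and `f2` and all values are made up for illustration. It uses plain pandas arithmetic rather than the notebook's `MinMaxScaler`, which should give the same result on this data.

```python
import pandas as pd

# Toy data: two clusters, two hypothetical features (f1, f2)
toy = pd.DataFrame({
    "cluster": [0, 0, 0, 1, 1, 1],
    "f1": [1.0, 2.0, 3.0, 10.0, 20.0, 30.0],
    "f2": [5.0, 5.5, 6.0, 100.0, 150.0, 200.0],
})

def minmax_0_10(s: pd.Series) -> pd.Series:
    """Min-max scale a Series to 0-10; a constant Series maps to 0."""
    span = s.max() - s.min()
    if span == 0:
        return pd.Series(0.0, index=s.index)
    return 10 * (s - s.min()) / span

# Step 3: normalize each feature within its cluster
for f in ["f1", "f2"]:
    toy[f"norm_{f}"] = toy.groupby("cluster")[f].transform(minmax_0_10)

# Step 4: category score = row-wise mean of the normalized features
toy["category_score"] = toy[["norm_f1", "norm_f2"]].mean(axis=1)
print(toy[["cluster", "norm_f1", "norm_f2", "category_score"]])
```

Because scaling happens within each cluster, a score of 10 means "highest in its cluster" rather than highest in the whole dataset.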

## Key Transformations
- **Multi-Source Integration**: Combines analyst data, technical indicators, and clustering results
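- **Normalized Scoring**: Each feature is scaled to a 0-10 range (per cluster when cluster labels are available) so that features with different units are comparable
- **Category-Based Analysis**: Four scoring categories (Analyst Targets & Ratings, Volatility & Range, Cumulative Volume, Price Filters)
- **Buy Signal Generation**: A final score on the 0-10 scale, computed as the unweighted mean of the category scores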
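
Written out, the normalization and aggregation amount to the formulas below, assuming plain min-max scaling within a cluster and unweighted means, which is what the notebook's code does. The symbols are notation introduced here for illustration: $x_{i,f}$ is the raw value of feature $f$ in row $i$, $c(i)$ is the set of rows in the same cluster as row $i$, $F_k$ is the set of features in category $k$, and $K = 4$ is the number of categories.

```latex
\tilde{x}_{i,f} = 10 \cdot
  \frac{x_{i,f} - \min_{j \in c(i)} x_{j,f}}
       {\max_{j \in c(i)} x_{j,f} - \min_{j \in c(i)} x_{j,f}},
\qquad
S_{i,k} = \frac{1}{|F_k|} \sum_{f \in F_k} \tilde{x}_{i,f},
\qquad
\mathrm{final\_score}_i = \frac{1}{K} \sum_{k=1}^{K} S_{i,k}
```

Since pandas' `mean(axis=1)` skips missing values by default, $|F_k|$ is effectively the number of features in the category that are present for that row.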

### Feature Buy Score Interpretation Table

| **Category** | **Feature** | **Higher → Effect** | **Lower → Effect** | **Neutral → Effect** |
|--------------|-------------|--------------------|--------------------|----------------------|
| **Analyst Targets & Ratings** | target_from | Indicates stronger bullish expectations; analysts expect price appreciation. | Suggests conservative outlook or lower expected price gains. | Stable outlook; analysts expect minimal change in target. |
| | rating_from_score | Reflects more favorable initial analyst opinion (e.g., “Buy”). | Reflects less favorable rating (e.g., “Sell”). | Indicates neutral stance or “Hold” recommendation. |
| | rating_delta | Positive delta signals upgrade — improved analyst confidence. | Negative delta signals downgrade — reduced analyst confidence. | No change in rating; sentiment remains consistent. |
| | target_delta | Positive delta indicates an upward revision of price target. | Negative delta suggests reduced growth expectations. | No change indicates stable analyst expectations. |
| | target_growth | Higher percentage increase signals higher projected price appreciation. | Lower percentage change signals limited growth potential. | Minimal change implies steady valuation expectations. |
| | relative_growth | Indicates stock outperforming peers or market benchmark. | Indicates underperformance compared to peers or market. | On-par with general market or peer performance. |
| **Volatility & Range** | ATR (Average True Range) | Larger ATR = stronger volatility, potential breakouts. | Smaller ATR = low volatility, consolidation phase. | Stable periods often precede volatility expansion. |
| | Standard Deviation (σ) | Higher σ = high volatility, large price dispersion. | Lower σ = low volatility, tight trading range. | Indicates equilibrium; neither strong movement nor contraction. |
| | Ulcer Index (UL) | Higher UL = higher downside risk and drawdown magnitude. | Lower UL = stable or quick price recovery. | Stable price action; minimal drawdown risk. |
| | Price Distance (PDIST) | High PDIST = large movement magnitude, strong momentum. | Low PDIST = small changes, sideways movement. | Near-zero values = minimal change; no clear trend. |
| **Cumulative Volume / Flow** | OBV (On Balance Volume) | Rising OBV confirms accumulation and bullish trend. | Falling OBV confirms distribution and bearish trend. | Flat OBV suggests neutral or weak conviction. |
| | AD Line (Accumulation/Distribution Line) | Rising line indicates accumulation and strong inflows. | Falling line indicates distribution or outflows. | Flat or diverging line signals possible reversal. |
| | PVT (Price Volume Trend) | Rising PVT indicates strong volume-backed bullish momentum. | Falling PVT indicates selling pressure. | Flat PVT signals weak volume support for trend. |
| | FI (Force Index) | High positive FI = strong upward price move with volume. | Low or negative FI = weak or bearish momentum. | Near-zero FI = indecisive or range-bound market. |
| **Price Filters (Level Indicators)** | HLC3 ((High+Low+Close)/3) | Indicates rising average price — strong uptrend. | Indicates falling average price — downtrend. | Price consolidating near average; sideways movement. |
| | Typical Price (TP) | Higher TP = upward trend, higher average price. | Lower TP = downward trend, lower average price. | Price oscillating near average, indicating neutrality. |
| | VWAP (Volume-Weighted Average Price) | Price above VWAP = strong buying pressure, bullish zone. | Price below VWAP = selling pressure, bearish zone. | Price at VWAP = equilibrium, fair market value. |
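
The table lists some features where a higher value is not obviously bullish. For example, a higher Ulcer Index indicates more downside risk. The scoring code in this notebook does not distinguish direction: every normalized feature counts positively toward its category score. As a hedged sketch of one alternative (not the notebook's method), a lower-is-better feature could be flipped after normalization with `10 - norm`. The helper name `orient_scores` and the choice of which features to invert, here only `ulcer_index`, are assumptions for illustration.

```python
import pandas as pd

# Hypothetical set of features where a lower raw value should score higher
LOWER_IS_BETTER = {"ulcer_index"}

def orient_scores(df: pd.DataFrame, features: list[str]) -> pd.DataFrame:
    """Return a copy in which norm_<feature> becomes 10 - value for lower-is-better features."""
    out = df.copy()
    for f in features:
        col = f"norm_{f}"
        if col in out.columns and f in LOWER_IS_BETTER:
            out[col] = 10 - out[col]
    return out

# Toy normalized values on the 0-10 scale
demo = pd.DataFrame({"norm_atr": [2.0, 8.0], "norm_ulcer_index": [1.0, 9.0]})
oriented = orient_scores(demo, ["atr", "ulcer_index"])
print(oriented)
```

Whether a given indicator should be inverted is a modeling decision; the table's "Higher" and "Lower" columns are one possible guide.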


In [3]:
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# 1 Load all CSV files
df_clustered = pd.read_csv("stock_data_clustered_minmax4.csv")
df_indicators = pd.read_csv("stock_data_with_technical_indicators3.csv")
df_cleaned = pd.read_csv("stock_data_cleaned_and_features.csv")

# 2 Merge all three on common key columns (ticker + date recommended)
#    Drop columns from the right frame that already exist in the left one,
#    except the merge keys, so the result has no duplicated (_x/_y) columns
def merge_unique(df_left, df_right, on=("ticker", "date")):
    # Use only the requested keys that are present in both frames
    merge_keys = [k for k in on if k in df_left.columns and k in df_right.columns]
    if not merge_keys:
        raise ValueError(f"None of the merge keys {list(on)} exist in both DataFrames")
    duplicate_cols = [c for c in df_right.columns if c in df_left.columns and c not in merge_keys]
    right_unique = df_right.drop(columns=duplicate_cols)
    return pd.merge(df_left, right_unique, on=merge_keys, how="inner")

df = merge_unique(df_clustered, df_indicators)
df = merge_unique(df, df_cleaned)

# 3 Define feature categories (includes Analyst Targets & Ratings)
categories = {
    "Analyst_Targets_Ratings": [
        "target_from",
        "rating_from_score",
        "rating_delta",
        "target_delta",
        "target_growth",
        "relative_growth"
    ],
    "Volatility_Range": ["atr", "std_dev", "ulcer_index", "price_distance"],
    "Cumulative_Volume": ["obv", "ad_line", "pvt", "force_index"],
    "Price_Filters": ["hlc3", "typical_price", "vwap", "last_close"]
}

# 4 Normalize features (0–10 scale), per cluster when a 'cluster' column exists
def scale_0_10(values):
    """Min-max scale a 1-D Series to the 0-10 range and return a flat array."""
    scaler = MinMaxScaler(feature_range=(0, 10))
    return scaler.fit_transform(values.to_numpy().reshape(-1, 1)).ravel()

has_cluster = "cluster" in df.columns
if not has_cluster:
    print("Warning: 'cluster' column not found. Normalizing across entire dataset.")

for cat, feats in categories.items():
    for f in feats:
        if f not in df.columns:
            print(f"Missing feature: {f}")
            continue
        if has_cluster:
            # Fit a separate scaler within each cluster
            df[f"norm_{f}"] = df.groupby("cluster")[f].transform(scale_0_10)
        else:
            # Fall back to one scaler over the entire dataset
            df[f"norm_{f}"] = scale_0_10(df[f])

# 5 Compute category-level averages
for cat, feats in categories.items():
    norm_feats = [f"norm_{f}" for f in feats if f"norm_{f}" in df.columns]
    if norm_feats:
        df[f"{cat.lower()}_score"] = df[norm_feats].mean(axis=1)

# 6 Compute overall final score (average of all category scores)
category_scores = [f"{cat.lower()}_score" for cat in categories.keys() if f"{cat.lower()}_score" in df.columns]
df["final_score"] = df[category_scores].mean(axis=1)

# 7 Save and preview results
output_path = "stock_data_with_scores2.csv"
df.to_csv(output_path, index=False)
print(f"Final dataset saved to {output_path}")
print(df[["ticker", "date", "final_score"] + category_scores].head())
print(df.describe())


Final dataset saved to stock_data_with_scores2.csv
  ticker                 date  final_score  analyst_targets_ratings_score  \
0   CECO  2025-08-22 00:30:05     1.730154                       3.956294   
1   BLND  2025-08-25 00:30:04     1.723056                       3.545262   
2   FLOC  2025-08-07 00:30:07     1.698187                       3.641419   
3   VYGR  2025-09-16 00:30:09     1.670876                       3.703916   
4   BCBP  2025-07-31 00:30:08     1.793695                       3.975387   

   volatility_range_score  cumulative_volume_score  price_filters_score  
0                1.015578                 1.856554             0.092192  
1                1.480559                 1.859899             0.006503  
2                1.290756                 1.830299             0.030276  
3                1.114417                 1.857361             0.007812  
4                0.900701                 2.288788             0.009904  
       target_from    target_to  rating_fro
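
The `merge_unique` helper in the cell above joins with `how="inner"`, so any (ticker, date) pair missing from one of the files is dropped without notice. A minimal sketch for checking how many key pairs would be lost, using pandas' `indicator=True` merge option on toy frames, could look like this. The helper name `merge_report` and the data are made up for illustration.

```python
import pandas as pd

def merge_report(left: pd.DataFrame, right: pd.DataFrame, keys=("ticker", "date")) -> pd.Series:
    """Count key combinations found in both frames, only the left, or only the right."""
    keys = list(keys)
    merged = pd.merge(
        left[keys].drop_duplicates(),
        right[keys].drop_duplicates(),
        on=keys,
        how="outer",
        indicator=True,  # adds a '_merge' column: 'both', 'left_only', 'right_only'
    )
    return merged["_merge"].value_counts()

# Toy frames: AAA exists only on the left, DDD only on the right
left = pd.DataFrame({"ticker": ["AAA", "BBB", "CCC"], "date": ["2025-01-01"] * 3})
right = pd.DataFrame({"ticker": ["BBB", "CCC", "DDD"], "date": ["2025-01-01"] * 3})

report = merge_report(left, right)
print(report)
```

Rows counted as `left_only` or `right_only` are the ones an inner join would discard.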

## Results and Conclusion

### Process Description
The scoring system integrates three data sources and produces a buy signal score by normalizing the features in four categories, averaging within each category, and taking the unweighted mean of the category scores.

### Key Activities
- **Data Integration**: Merged the clustered data with the technical indicators and the cleaned analyst features
- **Feature Normalization**: Scaled each feature to a 0-10 range, per cluster when a `cluster` column is present
- **Category Scoring**: Generated scores for the Analyst, Volatility, Volume, and Price categories
- **Final Score Calculation**: Averaged the category scores into an overall buy signal on the 0-10 scale
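
In the code, the final score is the plain mean of the four category scores. This can be checked against the first previewed row (CECO) of the output. The values below are copied from that printout, so the comparison only holds to the printed precision.

```python
# Category scores for the first previewed row (CECO), copied from the printed output
ceco_scores = {
    "analyst_targets_ratings_score": 3.956294,
    "volatility_range_score": 1.015578,
    "cumulative_volume_score": 1.856554,
    "price_filters_score": 0.092192,
}
reported_final_score = 1.730154

# Unweighted mean of the four category scores
recomputed = sum(ceco_scores.values()) / len(ceco_scores)
print(f"recomputed={recomputed:.6f} reported={reported_final_score:.6f}")

# Agreement within rounding of the printed values
assert abs(recomputed - reported_final_score) < 1e-5
```

The near-zero price-filter score pulls the mean well below the analyst score of about 3.96.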

### Conclusion
The feature scoring system combines analyst sentiment, technical indicators, and market dynamics into a single buy signal framework. In the five previewed rows, final scores fall between about 1.67 and 1.79 on the 0-10 scale, which is the low end of the range. The analyst category scores highest in those rows (about 3.5 to 4.0), while the price-filter category is close to zero (below 0.1) and pulls the average down. Because the final score is an unweighted mean, each category carries equal weight, and the normalization puts features with different units on a common scale. The scores are intended as one input for investment decision support and portfolio management.
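
Since the preview shows only five rows, the overall score range is best read from the data itself. A minimal sketch follows, assuming a DataFrame with a `final_score` column as produced by the notebook. The helper name `summarize_scores` is hypothetical, and the toy frame holds just the five previewed scores, not the full dataset.

```python
import pandas as pd

def summarize_scores(df: pd.DataFrame, col: str = "final_score") -> dict:
    """Return the min, median, and max of a score column, ignoring missing values."""
    s = df[col].dropna()
    return {"min": float(s.min()), "median": float(s.median()), "max": float(s.max())}

# The five final scores shown in the notebook's preview output
preview = pd.DataFrame(
    {"final_score": [1.730154, 1.723056, 1.698187, 1.670876, 1.793695]}
)

summary = summarize_scores(preview)
print(summary)
```

Running the same helper on the full scored DataFrame would show the actual spread of the buy signal.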
