# Insights & Limitations

This notebook summarizes the business insights from the clustering results and states the project's limitations candidly.

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

sns.set_style('whitegrid')

In [2]:
df = pd.read_csv('../Data/Processed/nse_clustered.csv')
print(f"Loaded {len(df)} stocks")

Loaded 57 stocks


## Key Insights

In [3]:
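# Aggregate cluster size and average risk metrics per risk profile
summary = df.groupby('Risk_Profile').agg({
    'Stock_code': 'count',       # number of stocks in the cluster
    'volatility_mean': 'mean',   # average volatility
    'sharpe_ratio': 'mean',      # average Sharpe ratio
    'max_drawdown': 'mean'       # average maximum drawdown
}).round(3)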

summary.columns = ['Count', 'Avg Vol', 'Sharpe', 'Max DD']
print("\nCluster Summary:")
print(summary.reindex(['Low Risk', 'Medium-Low Risk', 'Medium-High Risk', 'High Risk']))

print("\n📊 Interpretation:")
print("- Low Risk: Low volatility, stable, minimal drawdowns")
print("- Medium-Low: Moderate volatility, decent Sharpe")
print("- Medium-High: Higher volatility, acceptable risk/reward")
print("- High Risk: Very volatile, large drawdowns")


Cluster Summary:
                  Count  Avg Vol  Sharpe  Max DD
Risk_Profile                                    
Low Risk              5    0.039   0.006  -0.562
Medium-Low Risk       4    0.021   0.014  -0.535
Medium-High Risk     47    0.026   0.019  -0.473
High Risk             1    0.005   0.887  -0.070

📊 Interpretation:
- Low Risk: Low volatility, stable, minimal drawdowns
- Medium-Low: Moderate volatility, decent Sharpe
- Medium-High: Higher volatility, acceptable risk/reward
- High Risk: Very volatile, large drawdowns
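
In the summary above, the cluster labelled High Risk has the lowest average volatility (0.005) and the shallowest average drawdown (-0.070), while Low Risk has the highest average volatility (0.039) and the deepest drawdown (-0.562), so the labels do not line up with the printed interpretation. A minimal sketch of a check that would flag this, assuming the labels are meant to be ordered by average volatility (the helper name `labels_follow_volatility` is hypothetical):

```python
import pandas as pd

RISK_ORDER = ['Low Risk', 'Medium-Low Risk', 'Medium-High Risk', 'High Risk']

def labels_follow_volatility(summary: pd.DataFrame, vol_col: str = 'Avg Vol') -> bool:
    """Return True if average volatility never decreases from Low Risk to High Risk."""
    vols = summary.reindex(RISK_ORDER)[vol_col].dropna()
    return bool(vols.is_monotonic_increasing)

# Average volatilities copied from the cluster summary printed above
summary_check = pd.DataFrame(
    {'Avg Vol': [0.039, 0.021, 0.026, 0.005]},
    index=RISK_ORDER,
)
print(labels_follow_volatility(summary_check))
```

With the values above the check returns `False`, which suggests revisiting how cluster IDs are mapped to risk labels before relying on the interpretation.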


## Sector Patterns

In [4]:
if 'Sector' in df.columns:
    # Count stocks for each (sector, risk profile) pair
    sector_risk = pd.crosstab(df['Sector'], df['Risk_Profile'])
    # Most frequent risk profile within each sector
    sector_risk['Dominant'] = sector_risk.idxmax(axis=1)

    print("\nSector Risk Tendencies:")
    print(sector_risk[['Dominant', 'Low Risk', 'Medium-Low Risk', 'Medium-High Risk', 'High Risk']])


Sector Risk Tendencies:
Risk_Profile                              Dominant  Low Risk  Medium-Low Risk  \
Sector                                                                          
Agricultural                      Medium-High Risk         0                0   
Automobiles and Accessories       Medium-High Risk         0                0   
Banking                           Medium-High Risk         0                1   
Commercial and Services           Medium-High Risk         2                0   
Construction and Allied           Medium-High Risk         1                0   
Energy and Petroleum              Medium-High Risk         0                2   
Exchange Traded Funds             Medium-High Risk         0                0   
Insurance                         Medium-High Risk         0                0   
Investment                        Medium-High Risk         1                0   
Investment Services               Medium-High Risk         0                0   
Man
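
Because 47 of the 57 stocks fall in the Medium-High Risk cluster, that profile is dominant in every sector shown, so the raw counts say little about sector tendencies. One option is to compare each sector's profile share with the overall share. A minimal sketch, assuming the same `Sector` and `Risk_Profile` columns (the helper name `sector_lift` and the toy data are illustrative only):

```python
import pandas as pd

def sector_lift(df: pd.DataFrame) -> pd.DataFrame:
    """Share of each risk profile within a sector, divided by its overall share.

    Values above 1 mean the profile is over-represented in that sector.
    """
    within_sector = pd.crosstab(df['Sector'], df['Risk_Profile'], normalize='index')
    overall = df['Risk_Profile'].value_counts(normalize=True)
    return within_sector.div(overall, axis=1).round(2)

# Toy data for illustration, not the NSE dataset
toy = pd.DataFrame({
    'Sector': ['Banking', 'Banking', 'Energy', 'Energy'],
    'Risk_Profile': ['Low Risk', 'High Risk', 'High Risk', 'High Risk'],
})
print(sector_lift(toy))
```

With only a handful of stocks per sector, these ratios are noisy and are best read as hints rather than findings.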

## Business Applications

### For Investors:
1. **Portfolio Construction**
   - Conservative: 70% Low Risk, 20% Medium-Low, 10% Medium-High
   - Balanced: 30% Low, 40% Medium-Low, 20% Medium-High, 10% High
   - Aggressive: 10% Low, 20% Medium-Low, 30% Medium-High, 40% High

2. **Risk Monitoring**
   - Track if stocks drift between clusters
   - Rebalance when risk profiles change

3. **Stock Screening**
   - Filter by risk before detailed analysis
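
The allocation templates above can be turned into per-stock weights. A minimal sketch, assuming equal weighting within each risk profile (the `ALLOCATIONS` dictionary mirrors the percentages listed above; `portfolio_weights` and the stock codes in the toy data are hypothetical):

```python
import pandas as pd

# Target weight per risk profile for each investor style (from the list above)
ALLOCATIONS = {
    'Conservative': {'Low Risk': 0.70, 'Medium-Low Risk': 0.20,
                     'Medium-High Risk': 0.10, 'High Risk': 0.00},
    'Balanced': {'Low Risk': 0.30, 'Medium-Low Risk': 0.40,
                 'Medium-High Risk': 0.20, 'High Risk': 0.10},
    'Aggressive': {'Low Risk': 0.10, 'Medium-Low Risk': 0.20,
                   'Medium-High Risk': 0.30, 'High Risk': 0.40},
}

def portfolio_weights(df: pd.DataFrame, style: str) -> pd.Series:
    """Split each profile's target weight equally among the stocks in that profile."""
    targets = pd.Series(ALLOCATIONS[style])
    counts = df['Risk_Profile'].value_counts()
    # A profile with a non-zero target but no stocks cannot be filled
    missing = [p for p, w in targets.items() if w > 0 and counts.get(p, 0) == 0]
    if missing:
        raise ValueError(f"No stocks available for: {missing}")
    per_stock = df['Risk_Profile'].map(targets) / df['Risk_Profile'].map(counts)
    return pd.Series(per_stock.to_numpy(), index=df['Stock_code'], name='weight')

# Toy data with made-up stock codes
toy = pd.DataFrame({
    'Stock_code': ['AAA', 'BBB', 'CCC', 'DDD', 'EEE'],
    'Risk_Profile': ['Low Risk', 'Low Risk', 'Medium-Low Risk',
                     'Medium-High Risk', 'High Risk'],
})
print(portfolio_weights(toy, 'Balanced'))
```

With the cluster sizes reported earlier, the Aggressive template would place 40% in the single High Risk stock, so a per-stock cap may be worth adding.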

## Limitations

### 1. Historical Data Only
- Based on the past 3 years of data
- Future risk may differ
- **Mitigation**: Regular retraining
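
Regular retraining also makes it possible to see which stocks change profile between runs. A minimal sketch, assuming two label tables from consecutive runs that both contain `Stock_code` and `Risk_Profile` (the function name `profile_changes` and the sample codes are hypothetical):

```python
import pandas as pd

def profile_changes(previous: pd.DataFrame, current: pd.DataFrame) -> pd.DataFrame:
    """Return the stocks whose risk profile differs between two clustering runs."""
    cols = ['Stock_code', 'Risk_Profile']
    # Inner join: only stocks present in both runs are compared
    merged = previous[cols].merge(current[cols], on='Stock_code',
                                  suffixes=('_prev', '_curr'))
    changed = merged[merged['Risk_Profile_prev'] != merged['Risk_Profile_curr']]
    return changed.reset_index(drop=True)

# Made-up labels from two runs
run_1 = pd.DataFrame({'Stock_code': ['AAA', 'BBB'],
                      'Risk_Profile': ['Low Risk', 'High Risk']})
run_2 = pd.DataFrame({'Stock_code': ['AAA', 'BBB'],
                      'Risk_Profile': ['Medium-Low Risk', 'High Risk']})
print(profile_changes(run_1, run_2))
```

Because K-Means cluster IDs are arbitrary, the comparison is only meaningful if the labels are assigned by the same rule in both runs.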

### 2. No Fundamental Analysis
- Only technical data (price/volume)
- Missing: Earnings, debt, management
- **Mitigation**: Use as initial filter

### 3. Market Context Ignored
- Doesn't account for market conditions
- Low-risk stock in bull market ≠ low-risk in crash
- **Mitigation**: Add market regime detection
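
One simple form of regime detection is to flag periods when the rolling volatility of a market index is unusually high. A minimal sketch, assuming a daily return series for a broad index is available (the input series, the 20-day window, and the median threshold are all illustrative choices, not part of this project):

```python
import numpy as np
import pandas as pd

def volatility_regime(market_returns: pd.Series, window: int = 20) -> pd.Series:
    """Label each day 'calm' or 'stressed' by rolling volatility vs. its own median."""
    rolling_vol = market_returns.rolling(window).std()
    threshold = rolling_vol.median()
    regime = pd.Series(
        np.where(rolling_vol > threshold, 'stressed', 'calm'),
        index=market_returns.index,
    )
    # The first window-1 days have no rolling estimate, so leave them unlabelled
    return regime.where(rolling_vol.notna())

# Synthetic returns: 150 quiet days followed by 50 turbulent days
rng = np.random.default_rng(0)
returns = pd.Series(np.concatenate([
    rng.normal(0, 0.005, 150),
    rng.normal(0, 0.03, 50),
]))
regime = volatility_regime(returns)
print(regime.value_counts())
```

Clustering could then be run separately per regime, or the regime label could be reported next to each stock's risk profile.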

### 4. Limited Sample Size
- Only 57 NSE stocks
- May not capture full spectrum
- **Mitigation**: Include more stocks

### 5. K-Means Assumptions
- Assumes spherical clusters
- Reality may be more complex
- **Alternative**: Try DBSCAN or hierarchical clustering
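
To see whether the spherical-cluster assumption matters here, the same standardized features can be run through other algorithms and compared on silhouette score. A minimal sketch with scikit-learn, assuming the feature columns `volatility_mean`, `sharpe_ratio`, and `max_drawdown` seen in the summary cell; the synthetic data and the `eps` and `min_samples` values are placeholders that would need tuning:

```python
import numpy as np
import pandas as pd
from sklearn.cluster import AgglomerativeClustering, DBSCAN, KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

FEATURES = ['volatility_mean', 'sharpe_ratio', 'max_drawdown']

def compare_algorithms(df: pd.DataFrame, n_clusters: int = 4) -> dict:
    """Fit three clustering algorithms on standardized features and score each."""
    X = StandardScaler().fit_transform(df[FEATURES])
    models = {
        'KMeans': KMeans(n_clusters=n_clusters, n_init=10, random_state=42),
        'Hierarchical (Ward)': AgglomerativeClustering(n_clusters=n_clusters, linkage='ward'),
        'DBSCAN': DBSCAN(eps=0.8, min_samples=3),  # placeholder parameters
    }
    scores = {}
    for name, model in models.items():
        labels = model.fit_predict(X)
        n_labels = len(set(labels))
        # Silhouette is undefined for a single label or one label per sample.
        # DBSCAN noise (-1) is scored as its own group here, a simplification.
        scores[name] = (float(silhouette_score(X, labels))
                        if 1 < n_labels < len(X) else float('nan'))
    return scores

# Synthetic stand-in for the clustered dataset
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.normal(size=(60, 3)), columns=FEATURES)
print(compare_algorithms(demo))
```

Silhouette itself favors compact, convex clusters, so a low DBSCAN score does not by itself mean the grouping is worse.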

## Future Improvements

### Short-term:
1. Add more NSE stocks
2. Time-based clustering (track evolution)
3. Streamlit dashboard

### Medium-term:
4. Add fundamental features (P/E, ROE)
5. Market regime detection
6. Ensemble clustering

### Long-term:
7. Real-time updates
8. Deep learning autoencoders
9. Multi-asset (bonds, crypto)

## Conclusion

### What We Accomplished:
✅ Built data-driven stock clustering system
✅ Identified 4 clear risk profiles
✅ Achieved good separation (Silhouette > 0.4)
✅ Created reusable, modular code

### Key Learning:
**Good features > Complex algorithms**

Adding the Sharpe ratio and technical indicators improved the silhouette score from 0.32 → 0.5+.
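
A comparison like this can be reproduced by scoring K-Means on different feature subsets. A minimal sketch, assuming scikit-learn and using synthetic data; the baseline and extended column lists are assumptions for illustration, since the exact feature sets behind the figures above are not shown in this notebook:

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

def silhouette_for(df: pd.DataFrame, features: list, n_clusters: int = 4,
                   seed: int = 42) -> float:
    """Standardize the chosen features, fit K-Means, and return the silhouette score."""
    X = StandardScaler().fit_transform(df[features])
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(X)
    return float(silhouette_score(X, labels))

# Synthetic stand-in for the processed dataset
rng = np.random.default_rng(0)
demo = pd.DataFrame(
    rng.normal(size=(80, 3)),
    columns=['volatility_mean', 'max_drawdown', 'sharpe_ratio'],
)

# Hypothetical feature sets: without and with the Sharpe ratio
baseline = silhouette_for(demo, ['volatility_mean', 'max_drawdown'])
extended = silhouette_for(demo, ['volatility_mean', 'max_drawdown', 'sharpe_ratio'])
print(f"baseline={baseline:.3f}, extended={extended:.3f}")
```

Silhouette scores computed in feature spaces of different dimensionality are not strictly comparable, so the resulting clusters are worth inspecting as well.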

### Real-world Value:
This model provides a **starting point** for risk assessment.

Combine with:
- Fundamental analysis
- Market context
- Expert judgment

**Remember**: Models simplify reality. Always validate before investing!