# **Executive Report: Unsupervised Learning Analysis on S&P 500 Stocks**  

## **Prepared for:** Senior Leadership Team  
## **Prepared by:** Ogunbodede Tolulope Israel  
## **Date:** 15/03/2024  

## **1. Executive Summary**  
This report presents an analysis of **S&P 500 stocks** using **unsupervised learning techniques**, specifically **clustering and dimensionality reduction**. The objective is to uncover patterns in stock behavior, identify natural groupings, and detect potential outliers that may impact investment decisions.  

The analysis identified **three distinct stock groups**: two clusters and a set of **high-risk outliers (Tesla, Nvidia, and Netflix)**. Of the clustering techniques evaluated, the **DBSCAN algorithm** proved most effective at detecting these anomalies. The analysis uses **Principal Component Analysis (PCA)** for dimensionality reduction, followed by K-Means, hierarchical clustering, and DBSCAN to segment stocks based on volatility, returns, and fundamental indicators.  

## **2. Objective of the Analysis**  
The primary goal of this study is to use **unsupervised learning models** to:  
- Identify natural stock groupings based on financial and market behavior.  
- Detect high-risk stocks that may require different investment strategies.  
- Leverage **PCA and clustering models** to visualize and analyze market patterns.  

## **3. Data Overview**  
The dataset consists of **historical stock data (2021–2024)** retrieved from **Yahoo Finance**, focusing on **10 major S&P 500 companies**. Key attributes include:  
- **Stock prices:** Adjusted closing price data.  
- **Market volatility:** Computed using a 30-day rolling standard deviation.  
- **Fundamental indicators:** Price-to-Earnings (P/E) ratio, Dividend Yield, and Market Capitalization.  
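
The volatility attribute can be illustrated with a minimal sketch. It assumes the rolling standard deviation is taken over daily returns and that the 30-day window counts trading days; the report does not state either detail. The tickers `STABLE` and `VOLATILE` and all prices are synthetic placeholders, not the report's data.

```python
import numpy as np
import pandas as pd


def daily_returns(prices: pd.DataFrame) -> pd.DataFrame:
    """Simple daily percentage returns from adjusted closing prices."""
    return prices.pct_change().dropna(how="all")


def rolling_volatility(returns: pd.DataFrame, window: int = 30) -> pd.DataFrame:
    """Rolling standard deviation of daily returns over `window` rows."""
    return returns.rolling(window=window).std()


# Synthetic price paths for illustration only (hypothetical tickers).
rng = np.random.default_rng(0)
dates = pd.bdate_range("2021-01-04", periods=120)
log_steps = rng.normal(0.0, [0.01, 0.03], size=(120, 2))
prices = pd.DataFrame(
    100 * np.exp(np.cumsum(log_steps, axis=0)),
    index=dates,
    columns=["STABLE", "VOLATILE"],
)

returns = daily_returns(prices)
volatility = rolling_volatility(returns, window=30)

# Most recent 30-day volatility per ticker.
latest_vol = volatility.iloc[-1]
print(latest_vol)
```

The first 29 rows of `volatility` are empty because a full 30-row window is not yet available; those rows would need to be dropped or handled before clustering.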

## **4. Methodology**  
### **4.1 Data Preparation**  
- **Feature Engineering:** Computed daily returns and 30-day rolling volatility.  
- **Missing Value Handling:** Applied sector-based median imputation.  
- **Data Scaling:** Used **MinMaxScaler** to rescale every feature to a common 0–1 range, so that no single feature dominates the distance calculations used in clustering.  
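
The imputation and scaling steps above can be sketched as follows. This is an illustrative example under stated assumptions: the column names (`sector`, `pe_ratio`, `dividend_yield`), the tickers `A` to `E`, and all values are hypothetical, and the report's actual feature table may be organized differently.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical feature table; values are made up for illustration.
features = pd.DataFrame(
    {
        "sector": ["Tech", "Tech", "Tech", "Energy", "Energy"],
        "pe_ratio": [30.0, np.nan, 26.0, 10.0, 12.0],
        "dividend_yield": [0.5, 0.7, np.nan, 3.5, np.nan],
    },
    index=["A", "B", "C", "D", "E"],
)
numeric_cols = ["pe_ratio", "dividend_yield"]

# Fill each missing value with the median of the stock's own sector.
sector_medians = features.groupby("sector")[numeric_cols].transform("median")
imputed = features[numeric_cols].fillna(sector_medians)

# Rescale every feature to the [0, 1] range.
scaled = pd.DataFrame(
    MinMaxScaler().fit_transform(imputed),
    index=imputed.index,
    columns=numeric_cols,
)
print(scaled)
```

Sector-based imputation only works when each sector has at least one observed value per feature; a sector with no observations would leave the gap unfilled and need a fallback such as the overall median.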

### **4.2 Dimensionality Reduction using PCA**  
- **PCA was applied** to reduce the five input features to two principal components, which together retain **70.5% of the variance**.  
- **Visualizations used:**  
  - **PCA Scree Plot** to justify dimensionality reduction.  
  - **PCA Scatter Plot** to visualize stock distributions before clustering.  
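
A minimal sketch of this step is shown below, using scikit-learn's `PCA` on a synthetic 10-by-5 matrix that stands in for the scaled stock features. The data is randomly generated, so the variance figures it produces will not match the 70.5% reported above. The `explained` and `cumulative` arrays are the values a scree plot would display, and `components` holds the coordinates for the 2D scatter plot.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in: 10 stocks x 5 correlated features (not the report's data).
rng = np.random.default_rng(42)
latent = rng.normal(size=(10, 2))
mixing = rng.normal(size=(2, 5))
X = latent @ mixing + 0.1 * rng.normal(size=(10, 5))
X_scaled = MinMaxScaler().fit_transform(X)

# Fit all components to obtain the scree-plot values.
full_pca = PCA().fit(X_scaled)
explained = full_pca.explained_variance_ratio_
cumulative = np.cumsum(explained)

# Project onto the first two components for the scatter plot.
pca_2d = PCA(n_components=2)
components = pca_2d.fit_transform(X_scaled)
retained = pca_2d.explained_variance_ratio_.sum()

print("Per-component variance ratio:", np.round(explained, 3))
print("Variance retained by 2 components:", round(float(retained), 3))
```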

### **4.3 Clustering Techniques Evaluated**  
#### **K-Means Clustering**  
- **Elbow Method Plot:** Identified the optimal number of clusters as three.  
- **Silhouette Score Plot:** Verified the effectiveness of clustering separation.  
- **PCA Projection Plot:** Provided a 2D visual representation of stock groupings.  
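
The elbow and silhouette checks can be sketched as below. This is an illustration on synthetic 2D points with three well-separated groups, standing in for the PCA projection; the candidate range of `k`, the `n_init` value, and the random seeds are assumptions, not settings taken from the analysis.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic 2D points in three tight groups (not the report's data).
rng = np.random.default_rng(7)
centers = np.array([[0.0, 0.0], [3.0, 3.0], [0.0, 4.0]])
X = np.vstack([c + 0.2 * rng.normal(size=(10, 2)) for c in centers])

inertias = {}     # within-cluster sum of squares, used for the elbow plot
silhouettes = {}  # mean silhouette score, used for the silhouette plot
for k in range(2, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = model.inertia_
    silhouettes[k] = silhouette_score(X, model.labels_)

# Pick the k with the highest mean silhouette score.
best_k = max(silhouettes, key=silhouettes.get)
print("Best k by silhouette:", best_k)
```

With only 10 stocks, as in this study, the silhouette score is defined only for `k` between 2 and 9, and both curves can be noisy, so the two plots are best read together.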

#### **Hierarchical Clustering**  
- **Dendrogram Plot:** Used to determine the best cluster separation, confirming three optimal clusters.  
- **Cluster Heatmap:** Illustrated stock relationships under different distance metrics.  
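
A hedged sketch of the hierarchical step follows, using SciPy. The report does not name the linkage method, so Ward linkage is an assumption here, as are the two distance metrics compared; the 12 synthetic points stand in for the stock features.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage
from scipy.spatial.distance import pdist, squareform

# Synthetic 2D points in three tight groups of four (not the report's data).
rng = np.random.default_rng(3)
centers = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
X = np.vstack([c + 0.2 * rng.normal(size=(4, 2)) for c in centers])

# Build the merge tree (Ward linkage is an assumption).
Z = linkage(X, method="ward")

# Cut the tree into at most three flat clusters.
labels = fcluster(Z, t=3, criterion="maxclust")

# Dendrogram layout without drawing; "leaves" gives the leaf order.
tree = dendrogram(Z, no_plot=True)
leaf_order = tree["leaves"]

# Pairwise distance matrices that a cluster heatmap would display.
distance_matrices = {
    metric: squareform(pdist(X, metric=metric))
    for metric in ("euclidean", "cityblock")
}
print("Cluster labels:", labels)
```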

#### **DBSCAN (Density-Based Spatial Clustering)**  
- **k-Distance Plot:** Used to select the neighborhood radius (`eps=0.8`).  
- **DBSCAN PCA Scatter Plot:** Flagged **outliers** as noise points, whereas K-Means assigns every stock to a cluster.  
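
The DBSCAN step can be sketched as follows. Only `eps=0.8` comes from the analysis; the `min_samples=3` setting is an assumption, and the points are synthetic: two dense groups plus three deliberately isolated points that play the role of outliers.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.neighbors import NearestNeighbors

# Synthetic 2D data: two dense groups and three isolated points.
rng = np.random.default_rng(1)
dense_a = rng.normal(loc=[0.0, 0.0], scale=0.15, size=(8, 2))
dense_b = rng.normal(loc=[3.0, 0.0], scale=0.15, size=(8, 2))
isolated = np.array([[1.5, 5.0], [-3.0, 4.0], [6.0, 5.0]])
X = np.vstack([dense_a, dense_b, isolated])

min_samples = 3  # assumption; not stated in the report

# k-distance curve: sorted distance from each point to its
# min_samples-th nearest neighbor, counting the point itself
# (the same counting convention DBSCAN uses).
nn = NearestNeighbors(n_neighbors=min_samples).fit(X)
distances, _ = nn.kneighbors(X)
k_distances = np.sort(distances[:, -1])

# Points labeled -1 are noise, i.e. the outliers.
labels = DBSCAN(eps=0.8, min_samples=min_samples).fit_predict(X)
outlier_idx = np.flatnonzero(labels == -1)
n_clusters = len(set(labels) - {-1})

print("Clusters found:", n_clusters)
print("Outlier row indices:", outlier_idx.tolist())
```

On a k-distance plot, `eps` is usually read off near the point where the sorted curve bends sharply upward; here the three isolated points produce that jump at the right-hand end of `k_distances`.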

## **5. Key Findings and Insights**  
Three distinct stock groups were identified, two clusters and one set of outliers:  

- **Cluster 0 - Stable Stocks:** JPMorgan, ExxonMobil (low volatility, steady returns).  
- **Cluster 1 - Tech Growth Stocks:** Apple, Amazon, Google, Meta, Microsoft (high correlation in price movements).  
- **Outliers - High-Risk Stocks:** Tesla, Netflix, Nvidia (unpredictable price swings, high risk-reward potential).  

DBSCAN emerged as the **most effective model** for this dataset because it isolated the **high-risk stocks** as outliers instead of forcing them into a cluster.  

## **6. Limitations and Future Considerations**  
- **Limitations:**  
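  - DBSCAN's results are sensitive to its parameters (`eps` and `min_samples`), so different settings can change which stocks are flagged as outliers.  
  - Cluster assignments reflect the 2021–2024 period and may shift as market conditions change.  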

- **Future Steps:**  
  - Incorporate **macroeconomic indicators** such as interest rates and inflation.  
  - Apply **supervised learning** to predict future cluster assignments.  
  - Explore **time-series clustering** to analyze stock behavior over time.  

## **7. Conclusion**  
This analysis has revealed **hidden stock groupings** that can inform investment decision-making. **Visualizations played a crucial role** in guiding model selection and interpreting the results. Further refinement of the clustering methodology and integration of additional data could support **real-time trading applications**.