# ESG and Financial Performance Clustering Analysis - Master Index

## 📋 Project Overview
This project clusters companies on ESG (Environmental, Social, Governance) and financial performance data to identify distinct company profiles and turn them into actionable business insights.

## 🎯 Objectives
- **Identify** distinct clusters of companies based on ESG and financial performance
- **Analyze** relationships between ESG scores and financial metrics
- **Provide** actionable insights for investment and sustainability strategies
- **Generate** data-driven recommendations for stakeholders

## 📊 Dataset Information
- **Source**: Kaggle - ESG and Financial Performance Dataset
- **Size**: 11,000+ company records
- **Features**: 16 variables including financial metrics, ESG scores, and environmental impact measures
- **Time Period**: Multi-year company data (2015-2019+)

## 🗂️ Notebook Structure
This analysis is organized into **4 focused notebooks** for better modularity and clarity:

### **[01_Data_Exploration_EDA.ipynb](./01_Data_Exploration_EDA.ipynb)**
**Purpose**: Comprehensive exploratory data analysis  
**Key Tasks**:
- Load and inspect the ESG dataset
- Assess data quality and missing values
- Analyze distributions of financial and ESG metrics
- Explore correlations and relationships
- Investigate industry and regional patterns  
**Outputs**: Clean dataset, EDA insights, data quality report
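
The data-quality and correlation steps above could look roughly like the sketch below. It runs on a tiny synthetic frame; the column names (`Revenue`, `ESG_Overall`, `Industry`) and the helper `missing_value_report` are illustrative assumptions, not taken from the notebook.

```python
import numpy as np
import pandas as pd


def missing_value_report(df: pd.DataFrame) -> pd.DataFrame:
    """Return the count and percentage of missing values per column, worst first."""
    counts = df.isna().sum()
    report = pd.DataFrame(
        {
            "missing": counts,
            "missing_pct": (counts / len(df) * 100).round(2),
        }
    )
    return report.sort_values("missing", ascending=False)


# Synthetic stand-in for the real dataset (hypothetical column names).
demo = pd.DataFrame(
    {
        "Revenue": [100.0, 250.0, np.nan, 80.0],
        "ESG_Overall": [55.0, 70.0, 62.0, np.nan],
        "Industry": ["Tech", "Energy", "Tech", "Retail"],
    }
)

report = missing_value_report(demo)

# Pairwise correlations between the numeric columns only.
corr = demo.select_dtypes("number").corr()
```

In the real notebook the same helper would be applied to the frame loaded from `company_esg_financial_dataset.csv`.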

---

### **[02_Data_Preprocessing.ipynb](./02_Data_Preprocessing.ipynb)**
**Purpose**: Prepare data for clustering algorithms  
**Key Tasks**:
- Feature engineering and selection
- Handle missing values and outliers
- Scale and normalize features
- Apply dimensionality reduction (PCA, t-SNE)
- Create analysis-ready datasets  
**Outputs**: Scaled features, PCA/t-SNE transformed data, preprocessed datasets
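
As a minimal sketch of the imputation, scaling, and PCA steps, assuming median imputation and a 90% explained-variance target (both are illustrative choices, not confirmed settings from the notebook):

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

# Synthetic numeric features standing in for the real ESG/financial columns.
rng = np.random.default_rng(0)
features = pd.DataFrame(
    rng.normal(size=(200, 5)),
    columns=[f"feature_{i}" for i in range(5)],
)
features.iloc[::25, 0] = np.nan  # inject a few missing values

# 1. Fill missing values with the column median (assumed strategy).
imputed = SimpleImputer(strategy="median").fit_transform(features)

# 2. Standardize to zero mean and unit variance so no feature dominates distances.
scaled = StandardScaler().fit_transform(imputed)

# 3. Keep as many principal components as needed to explain 90% of the variance.
pca = PCA(n_components=0.90)
components = pca.fit_transform(scaled)
```

t-SNE is left out of the sketch because it is typically used only for 2-D visualization and is slow on larger data; `sklearn.manifold.TSNE` would be the usual entry point.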

---

### **[03_Clustering_Analysis.ipynb](./03_Clustering_Analysis.ipynb)**
**Purpose**: Implement and evaluate clustering algorithms  
**Key Tasks**:
- Determine the optimal number of clusters (Elbow method, Silhouette analysis)
- Apply multiple clustering algorithms (K-Means, Hierarchical, DBSCAN, GMM)
- Validate and compare clustering performance
- Select the best-performing clustering solution  
**Outputs**: Cluster assignments, performance metrics, algorithm comparison
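
One way to pick the number of clusters by silhouette score and then compare algorithms at that setting is sketched below. It uses synthetic blobs rather than the project data, and the candidate range `2..6` is an assumption.

```python
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.mixture import GaussianMixture

# Three well-separated synthetic blobs stand in for the scaled features.
X, _ = make_blobs(
    n_samples=300,
    centers=[[0, 0], [10, 0], [5, 9]],
    cluster_std=0.6,
    random_state=42,
)

# Silhouette analysis: score K-Means for each candidate k and keep the best.
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X)
    scores[k] = silhouette_score(X, labels)
best_k = max(scores, key=scores.get)

# Compare several algorithms at the chosen number of clusters.
candidates = {
    "kmeans": KMeans(n_clusters=best_k, n_init=10, random_state=42).fit_predict(X),
    "hierarchical": AgglomerativeClustering(n_clusters=best_k).fit_predict(X),
    "gmm": GaussianMixture(n_components=best_k, random_state=42).fit_predict(X),
}
comparison = {name: silhouette_score(X, labels) for name, labels in candidates.items()}
```

DBSCAN is omitted here because it does not take a cluster count; it is tuned through `eps` and `min_samples` instead, so it needs its own comparison loop.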

---

### **[04_Visualization_Insights.ipynb](./04_Visualization_Insights.ipynb)**
**Purpose**: Generate insights and business recommendations  
**Key Tasks**:
- Create comprehensive cluster visualizations
- Profile each cluster in detail
- Analyze industry and regional cluster patterns
- Generate actionable business insights
- Develop investment recommendations  
**Outputs**: Cluster profiles, business insights, interactive dashboards
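
Cluster profiling and the industry breakdown can be sketched with plain pandas aggregations. The frame and its column names (`cluster`, `Industry`, `ProfitMargin`, `ESG_Overall`) are hypothetical stand-ins for `clustered_dataset.csv`.

```python
import pandas as pd

# Hand-built example of a dataset that already carries cluster labels.
clustered = pd.DataFrame(
    {
        "cluster": [0, 0, 1, 1, 1],
        "Industry": ["Tech", "Tech", "Energy", "Energy", "Tech"],
        "ProfitMargin": [12.0, 18.0, 4.0, 6.0, 5.0],
        "ESG_Overall": [70.0, 80.0, 40.0, 50.0, 45.0],
    }
)

# Mean financial and ESG metrics per cluster, plus the cluster size.
profile = clustered.groupby("cluster")[["ProfitMargin", "ESG_Overall"]].mean()
profile["n_companies"] = clustered.groupby("cluster").size()

# Share of each industry within each cluster (rows sum to 1).
industry_share = pd.crosstab(
    clustered["cluster"], clustered["Industry"], normalize="index"
)
```

The same `crosstab` call with a region column would give the regional distribution.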

---

## 🔄 Execution Workflow
**Run the notebooks in this order; each one reads files written by the one before it:**

1. **Start Here** → `01_Data_Exploration_EDA.ipynb`
2. **Then** → `02_Data_Preprocessing.ipynb`
3. **Next** → `03_Clustering_Analysis.ipynb`
4. **Finally** → `04_Visualization_Insights.ipynb`

## 📁 Data Files Generated
Each notebook creates specific output files for the next stage:

```
Data/
├── company_esg_financial_dataset.csv          # Original dataset
├── eda_complete_dataset.csv                   # After EDA
├── preprocessed_complete_dataset.csv          # After preprocessing
├── scaled_features.csv                        # Scaled numerical features
├── pca_features.csv                           # PCA transformed features
├── tsne_features.csv                          # t-SNE coordinates
├── clustered_dataset.csv                      # Final dataset with clusters
├── clustering_comparison.csv                  # Algorithm performance comparison
├── cluster_summary_report.csv                 # Detailed cluster profiles
├── industry_cluster_distribution.csv          # Industry patterns by cluster
└── region_cluster_distribution.csv            # Regional patterns by cluster
```
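
To see which stage still needs to be run, a small check like the following can list the outputs that are not yet on disk. The file names come from the tree above; grouping them by notebook is an assumption inferred from each notebook's listed outputs, and `missing_outputs` is a hypothetical helper.

```python
from pathlib import Path

# File names are from the tree above; the notebook-to-file grouping is assumed.
EXPECTED_OUTPUTS = {
    "01_Data_Exploration_EDA": ["eda_complete_dataset.csv"],
    "02_Data_Preprocessing": [
        "preprocessed_complete_dataset.csv",
        "scaled_features.csv",
        "pca_features.csv",
        "tsne_features.csv",
    ],
    "03_Clustering_Analysis": [
        "clustered_dataset.csv",
        "clustering_comparison.csv",
    ],
    "04_Visualization_Insights": [
        "cluster_summary_report.csv",
        "industry_cluster_distribution.csv",
        "region_cluster_distribution.csv",
    ],
}


def missing_outputs(data_dir="Data"):
    """Map each notebook to the expected output files not found in data_dir."""
    data_dir = Path(data_dir)
    missing = {}
    for notebook, files in EXPECTED_OUTPUTS.items():
        absent = [name for name in files if not (data_dir / name).is_file()]
        if absent:
            missing[notebook] = absent
    return missing


if __name__ == "__main__":
    for notebook, files in missing_outputs().items():
        print(f"{notebook}: missing {', '.join(files)}")
```

An empty result means every listed file exists; the first notebook that appears in the result is the one to re-run.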

## 🛠️ Technical Requirements
**Required Libraries** (auto-installed in notebooks):
- **Data**: `pandas`, `numpy`
- **Visualization**: `matplotlib`, `seaborn`, `plotly`
- **ML/Clustering**: `scikit-learn`, `scipy`
- **Utilities**: `kneed`, `yellowbrick`
- **Data Source**: `kagglehub`
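
If the automatic install fails, a quick way to find out which libraries are unavailable is to probe for each import name. This is only a sketch; `find_missing` is a hypothetical helper, and note that `scikit-learn` is imported as `sklearn`.

```python
from importlib.util import find_spec

# pip package name -> top-level import name
REQUIRED = {
    "pandas": "pandas",
    "numpy": "numpy",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "plotly": "plotly",
    "scikit-learn": "sklearn",
    "scipy": "scipy",
    "kneed": "kneed",
    "yellowbrick": "yellowbrick",
    "kagglehub": "kagglehub",
}


def find_missing(packages=REQUIRED):
    """Return the pip names of packages whose import name cannot be located."""
    return [pip_name for pip_name, module in packages.items() if find_spec(module) is None]


if __name__ == "__main__":
    missing = find_missing()
    if missing:
        print("Install with: pip install " + " ".join(missing))
    else:
        print("All required libraries are available.")
```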

## 📈 Expected Outcomes
By completing this analysis, you will have:

✅ **Identified** distinct company clusters based on ESG and financial performance  
✅ **Characterized** each cluster with detailed financial and ESG profiles  
✅ **Discovered** industry and regional patterns within clusters  
✅ **Generated** actionable insights for investment decisions  
✅ **Created** comprehensive visualizations and dashboards  
✅ **Developed** data-driven recommendations for stakeholders  

## 🚀 Getting Started
1. **Ensure** you have the dataset downloaded (run `Data/raw_data.ipynb` first if needed)
2. **Open** `01_Data_Exploration_EDA.ipynb` to begin the analysis
3. **Follow** the sequential workflow through all 4 notebooks
4. **Review** the generated insights and recommendations in the final notebook

---

## 📞 Support
If you encounter any issues:
- Check that all required libraries are installed
- Ensure notebooks are run in sequence
- Verify data files are generated in the correct locations
- Review error messages for specific troubleshooting guidance

**Happy Analyzing! 📊✨**