# From Data to Decisions: Predicting Credit Risk in Commodity Trading

### **Why This Matters**
In the high-stakes world of commodity trading, every decision counts. Counterparty defaults can lead to massive financial losses, disrupted operations, and damaged reputations. The key to staying ahead? **Data-driven credit risk management.**

This project answers a critical question: **How can we use advanced machine learning models to predict credit risk, protect our trading operations, and ensure financial stability?**

### **Our Journey**
To solve this challenge, we embarked on a structured analysis journey:
1. **Understanding the Data**: What do our financial and exposure metrics tell us about counterparties?
2. **Building Baseline Models**: Starting simple to learn what works—and what doesn’t.
3. **Refining Our Approach**: Fine-tuning high-performing models to deliver actionable insights.
4. **Driving Impact**: Translating predictions into smarter, risk-aware business decisions.

Let’s dive in.






## Understanding the Dataset: What Are We Working With?

To assess credit risk, we analyzed a dataset of **2029 counterparties**, enriched with:
1. **Financial Ratios**: Metrics like Current Ratio, Debt Ratio, and Net Profit Margin that reflect financial health.
2. **Exposure Metrics**: Indicators of how much we’re at risk with each counterparty.
3. **Ratings**:
   - **Internal Ratings**: Our target variable, a proprietary creditworthiness score (1–10, where a higher score indicates lower risk).
   - **External Ratings**: Credit scores from agencies like Moody’s, providing an external benchmark.

### **Key Numbers at a Glance**
- **36 Columns**, capturing qualitative and quantitative features.
- **8 Missing Values**, handled through imputation strategies.
- Top sectors: Energy, Manufacturing, Finance.
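
The write-up does not say which imputation strategy was applied to the missing values. As one common option, the sketch below fills gaps in numeric columns with the column median. The column names and figures are invented for illustration, and `impute_median` is a hypothetical helper, not part of the project code.

```python
from statistics import median


def impute_median(rows, columns):
    """Return a copy of rows where None in the given columns is replaced by the column median."""
    medians = {}
    for col in columns:
        observed = [row[col] for row in rows if row[col] is not None]
        medians[col] = median(observed)
    return [
        {
            key: (medians[key] if key in medians and value is None else value)
            for key, value in row.items()
        }
        for row in rows
    ]


# Hypothetical counterparties; None marks a missing value.
counterparties = [
    {"name": "A", "current_ratio": 1.8, "debt_ratio": 0.40},
    {"name": "B", "current_ratio": None, "debt_ratio": 0.65},
    {"name": "C", "current_ratio": 1.2, "debt_ratio": None},
    {"name": "D", "current_ratio": 2.4, "debt_ratio": 0.30},
]

filled = impute_median(counterparties, ["current_ratio", "debt_ratio"])
for row in filled:
    print(row)
```

The median is less sensitive to outliers than the mean, which tends to matter for financial ratios with long tails. The original rows are left unchanged so the imputed and raw values can be compared.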



### Why This Matters
These metrics help us uncover patterns in counterparty risk, guiding better decisions. For instance, higher debt ratios often point to riskier entities, while strong liquidity metrics (like Current Ratio) signal financial stability.
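
For reference, the three ratios named above follow their standard textbook definitions. The sketch below computes them for a single made-up balance sheet; the function names and input figures are illustrative assumptions, and the dataset itself may already store these ratios precomputed.

```python
def current_ratio(current_assets, current_liabilities):
    """Liquidity: short-term assets available per unit of short-term obligations."""
    return current_assets / current_liabilities


def debt_ratio(total_liabilities, total_assets):
    """Leverage: share of assets financed by liabilities."""
    return total_liabilities / total_assets


def net_profit_margin(net_income, revenue):
    """Profitability: net income earned per unit of revenue."""
    return net_income / revenue


# Hypothetical counterparty figures, in millions.
print(current_ratio(500, 250))      # 2.0
print(debt_ratio(600, 1000))        # 0.6
print(net_profit_margin(50, 400))   # 0.125
```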


## What Does the Data Tell Us?

### Key Observations:
1. **Credit Ratings Are Skewed**:
   - Many counterparties are rated highly (8–10), but a closer look reveals variability across sectors.
   - Sectors like **Energy** and **Basic Industries** consistently score higher, indicating low risk.

2. **Exposure Matters**:
   - Negative exposures (where counterparties owe us money) align with higher internal ratings, showing they’re seen as safer bets.
   - Higher total exposures often correlate with lower ratings, highlighting potential risk.

3. **Liquidity is Key**:
   - Ratios like Current Ratio strongly correlate with creditworthiness, suggesting companies with higher liquidity are more reliable.
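
The observations above rest on correlations between features and the internal rating. The source does not state which correlation measure was used; since the rating is an ordinal 1–10 score, a rank-based (Spearman) correlation is a reasonable choice. The following is a minimal sketch on invented values, with hypothetical helper names:

```python
from math import sqrt


def ranks(values):
    """Assign 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / sqrt(spread_x * spread_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on ranks."""
    return pearson(ranks(x), ranks(y))


# Invented values for five counterparties.
internal_rating = [3, 5, 6, 8, 9]
current_ratio = [0.8, 1.1, 1.5, 2.0, 2.6]
total_exposure = [90, 70, 75, 30, 10]

print(round(spearman(current_ratio, internal_rating), 3))   # 1.0
print(round(spearman(total_exposure, internal_rating), 3))  # -0.9
```

In this toy data, liquidity rises in step with the rating while exposure moves against it, mirroring the direction of the relationships described above.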

### Why This Matters
By uncovering these trends, we can pinpoint which counterparties and sectors deserve more scrutiny—and which are safer bets.


## Baseline Modeling

### Models Trained:
1. **Linear Regression** (Baseline)
2. **Random Forest**
3. **XGBoost**
4. **Gradient Boosted Trees**
5. **Support Vector Regression**

### Results Summary:
| **Model**                 | **R² (CV)**        | **Strengths**                                                       | **Weaknesses**                                                       |
|---------------------------|--------------------|----------------------------------------------------------------------|------------------------------------------------------------------------|
| **Linear Regression**     | -1355754.97       | Simple baseline for comparison.                                      | Cannot handle non-linear relationships or complex feature interactions. |
| **Random Forest**         | 0.9646            | Handles non-linear relationships and feature interactions well.       | Slightly outperformed by Gradient Boosted Trees.                      |
| **XGBoost**               | 0.9672            | High precision, robust to noisy data, and models complex interactions. | Outperformed by Gradient Boosted Trees in some cases.                  |
| **Gradient Boosted Trees**| 0.9743            | Highest R², efficient histogram-based training.                       | Requires hyperparameter tuning for further improvement.                |
| **Support Vector Regression** | 0.8706       | Captures non-linear relationships.                                   | Struggles with scalability and is outperformed by tree-based models.   |

Gradient Boosted Trees emerged as the top-performing model, with the highest cross-validated R² (0.9743), narrowly ahead of XGBoost (0.9672) and Random Forest (0.9646).


## Building the Brain: How We Made Predictions

### Our Approach
We tested several models to predict credit risk, ranging from simple to sophisticated:
1. **Linear Regression**: A quick baseline—straightforward but too simplistic for complex data.
2. **Tree-Based Models (Random Forest, XGBoost, Gradient Boosted Trees)**: These excel at capturing non-linear relationships and interactions.
3. **Support Vector Regression**: A kernel-based model that can capture non-linear relationships but scales less well to larger datasets.

### The Winner: Gradient Boosted Trees
After rigorous testing, **Gradient Boosted Trees** emerged as the top performer:
- **Accuracy**: Explained about 97.4% of the variance in internal ratings (cross-validated R² of 0.9743).
- **Efficiency**: Quick to train and easy to deploy.
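
Since R² is the headline metric, it helps to spell out what it measures: R² = 1 − SS_res / SS_tot, the share of variance in the target that the predictions account for. A score of 0 matches always predicting the mean rating, and the score has no lower bound, which is how a badly behaved model such as the Linear Regression baseline can land far below zero. A small sketch on invented ratings (the `r_squared` helper is illustrative):

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


ratings = [2, 4, 6, 8]

close_predictions = [2.5, 3.5, 6.5, 7.5]   # small errors
mean_predictions = [5, 5, 5, 5]            # always predict the mean rating
wild_predictions = [50, -40, 60, -30]      # numerically unstable model

print(r_squared(ratings, close_predictions))  # close to 1
print(r_squared(ratings, mean_predictions))   # exactly 0
print(r_squared(ratings, wild_predictions))   # strongly negative
```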

---

### Why This Matters
Our final model not only predicts risk with precision but also provides insights into the most important factors driving creditworthiness.


## Turning Predictions into Actions: Business Insights

### Key Findings:
1. **What Drives Risk?**
   - **External Ratings**: A strong external rating is a reliable indicator of creditworthiness.
   - **Exposure Levels**: High exposure signals higher risk, while negative exposure (where we are owed money) aligns with safer entities.
   - **Liquidity Ratios**: Companies with stronger liquidity metrics (e.g., Current Ratio) are safer bets.

2. **Sector Insights**:
   - High-rated sectors like **Energy** and **Basic Industries** are stable and reliable.
   - Riskier sectors like **Consumer Services** and **Technology** demand closer scrutiny.

---

### Recommendations:
1. **Proactively Manage High-Risk Counterparties**:
   - Use the model to flag entities with low predicted ratings and reevaluate trading volumes or collateral requirements.

2. **Optimize Sector Focus**:
   - Expand trading partnerships in low-risk sectors while increasing due diligence in higher-risk ones.

3. **Leverage Feature Insights**:
   - Regularly monitor liquidity ratios and exposure levels for early warnings of risk.
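
The first recommendation can be turned into a simple screening rule. The sketch below flags counterparties whose predicted rating falls at or below a cutoff and orders them by exposure so the largest positions are reviewed first. The `Counterparty` class, the cutoff of 4.0, and all figures are hypothetical; actual thresholds would come from the firm's risk policy.

```python
from dataclasses import dataclass


@dataclass
class Counterparty:
    name: str
    predicted_rating: float   # model output on the 1-10 internal scale
    total_exposure: float     # current exposure, in currency units


def flag_for_review(counterparties, rating_cutoff=4.0):
    """Return low-rated counterparties, largest exposure first."""
    flagged = [c for c in counterparties if c.predicted_rating <= rating_cutoff]
    return sorted(flagged, key=lambda c: c.total_exposure, reverse=True)


# Invented portfolio for illustration.
portfolio = [
    Counterparty("A", predicted_rating=8.7, total_exposure=2_500_000),
    Counterparty("B", predicted_rating=3.2, total_exposure=4_000_000),
    Counterparty("C", predicted_rating=3.9, total_exposure=600_000),
    Counterparty("D", predicted_rating=6.1, total_exposure=1_200_000),
]

review_queue = flag_for_review(portfolio)
for c in review_queue:
    print(f"{c.name}: rating {c.predicted_rating}, exposure {c.total_exposure:,.0f}")
```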


## The Bottom Line: Smarter Credit Risk Management

This project equips decision-makers with a powerful tool for assessing credit risk:
- **Precision**: Explaining about 97.4% of the variance in internal ratings (cross-validated R² of 0.9743).
- **Actionable Insights**: Pinpointing risky entities and sectors with clear, data-backed recommendations.
- **Future-Ready**: A scalable, interpretable model ready for deployment in real-world scenarios.

### What’s Next:
1. Deploy the model for real-time risk assessment.
2. Validate predictions with new data to ensure long-term reliability.
3. Expand the model to incorporate additional metrics like macroeconomic indicators.
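
For the second step, one lightweight way to validate the model on new data is to compare its predictions with the ratings analysts later assign and raise a flag when the average error exceeds a tolerance. The sketch below assumes mean absolute error as the metric and 0.5 rating points as the tolerance; both are illustrative choices, not values from this project.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute gap between assigned and predicted ratings."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def needs_review(actual, predicted, max_mae=0.5):
    """True when the error on a new batch exceeds the tolerated level."""
    return mean_absolute_error(actual, predicted) > max_mae


# Invented batches: ratings assigned by analysts vs. model predictions.
assigned = [8, 6, 9, 4]
stable_predictions = [7.5, 6.5, 9.0, 4.0]
drifted_predictions = [6.0, 7.5, 7.0, 5.5]

print(mean_absolute_error(assigned, stable_predictions), needs_review(assigned, stable_predictions))
print(mean_absolute_error(assigned, drifted_predictions), needs_review(assigned, drifted_predictions))
```

Running such a check on each new batch gives an early signal that the model may need retraining before its predictions feed into trading decisions.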

With these tools, you can minimize financial risk, strengthen counterparty relationships, and make data-driven decisions with confidence.
