# Final Report: Credit Scoring for Compound V2 Wallets

## Objective
Build a credit scoring system (0–100 scale) for wallets interacting with the Compound V2 protocol, using only raw transaction-level data. The goal is to reward responsible behavior and flag risky or bot-like activity.

---

## Dataset
- **Source:** Compound V2 Raw Transaction Logs  
- **Selected Files:** The three largest files (to maximize activity coverage)  
- **Key Transaction Types:** `deposits`, `borrows`, `repays`, `withdraws`, `liquidations`

---

## Methodology

### 1. Feature Engineering
Wallet-level features extracted include:
- **Transaction Counts:** e.g., `total_deposits`, `total_borrows`
- **Ratios:** e.g., `deposit_ratio`, `borrow_ratio`
- **Aggregated Values:** e.g., `total_normalized_amount`, `active_days`
- **Activity Duration:** `first_txn`, `last_txn`

### 2. Modeling Approach
- **Unsupervised Learning** was chosen due to the absence of labels.
- **KMeans Clustering** (k=4) was applied after feature scaling.
- **Silhouette Score** = `0.47` indicates moderate but meaningful cluster separation.
- The **Elbow Method** guided the choice of k, and visual inspection of **PCA** and **t-SNE** projections supported the cluster validation.
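The k sweep behind the Elbow Method and the silhouette check can be sketched as below. This is a minimal sketch assuming the wallet-level features are already in a numeric array; `sweep_k`, the candidate range of k, `n_init=10`, and `random_state=42` are illustrative choices, not the settings used in this report.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler


def sweep_k(features: np.ndarray, k_values=range(2, 9), seed: int = 42) -> dict:
    """Fit KMeans for each candidate k and return {k: (inertia, silhouette)}."""
    # Scale first so large-magnitude features (e.g. amounts) don't dominate distances
    scaled = StandardScaler().fit_transform(features)
    results = {}
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=seed)
        labels = model.fit_predict(scaled)
        # inertia_ is the within-cluster sum of squares plotted for the Elbow Method
        results[k] = (model.inertia_, silhouette_score(scaled, labels))
    return results
```

Plotting the inertia values against k and looking for the bend gives the elbow; the silhouette values offer a second opinion on the same candidates.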

### 3. Credit Scoring System
- Wallets were scored based on cluster assignment:
  - **Cluster 3 → Score: 100 (Excellent)**
  - **Cluster 2 → Score: 85 (High)**
  - **Cluster 1 → Score: 70 (Medium)**
  - **Cluster 0 → Score: 50 (Low/Risky)**  
- Final scores were saved to a CSV file for the top 1,000 wallets.
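The cluster-to-score mapping amounts to a lookup table. This sketch assumes a pandas DataFrame with a hypothetical `cluster` column holding the KMeans labels; `assign_scores` is an illustrative helper. Because KMeans numbers its clusters arbitrarily, this mapping is only valid for the specific fitted model whose clusters were inspected; refitting can permute the labels.

```python
import pandas as pd

# Cluster-to-score mapping as stated in this report
CLUSTER_SCORES = {3: 100, 2: 85, 1: 70, 0: 50}
CLUSTER_TIERS = {3: "Excellent", 2: "High", 1: "Medium", 0: "Low/Risky"}


def assign_scores(wallets: pd.DataFrame, cluster_col: str = "cluster") -> pd.DataFrame:
    """Return a copy of `wallets` with `credit_score` and `tier` columns added."""
    out = wallets.copy()
    out["credit_score"] = out[cluster_col].map(CLUSTER_SCORES)
    out["tier"] = out[cluster_col].map(CLUSTER_TIERS)
    return out
```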

---

## Results
- Cluster characteristics revealed behavioral patterns:
  - High-score clusters had higher deposits, repayments, and active days.
  - Low-score clusters exhibited more liquidations or erratic behavior.
- PCA and t-SNE projections showed the clusters as visually distinct groups.
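Cluster characteristics like these are typically read off per-cluster feature means, and the 2-D view comes from projecting the scaled features. A sketch assuming a DataFrame with the engineered features plus a hypothetical `cluster` column; `profile_clusters` and `pca_2d` are illustrative names. `sklearn.manifold.TSNE` could replace PCA for the t-SNE view, though it requires more samples than its `perplexity` (default 30).

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def profile_clusters(wallets: pd.DataFrame, feature_cols: list, cluster_col: str = "cluster") -> pd.DataFrame:
    """Return the mean of each feature per cluster, one row per cluster."""
    return wallets.groupby(cluster_col)[feature_cols].mean()


def pca_2d(wallets: pd.DataFrame, feature_cols: list) -> np.ndarray:
    """Project standardized features onto the first two principal components."""
    scaled = StandardScaler().fit_transform(wallets[feature_cols])
    return PCA(n_components=2).fit_transform(scaled)
```

Comparing rows of the profile table (for example mean `total_repays` versus mean `total_liquidations`) is what lets each cluster be labeled as responsible or risky.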

---

## Insights
- **Responsible Users** show regular deposits, borrow-repay cycles, and fewer liquidations.
- **Risky Users** show short lifespans, few actions, or aggressive borrow-withdraw behavior.



# Credit Scoring for Compound V2 Wallets

## 1. Defining Criteria for "Good" and "Bad" Wallet Behavior

### Good Wallet Behavior:
- **Consistent Activity:** Regular deposits, borrows, and repayments over an extended period, indicating responsible use.
- **Low Liquidation Frequency:** Wallets with few or no liquidations, as liquidations suggest risk or failure to repay.
- **High Repayment Ratio:** A high repayment-to-borrow ratio suggests the wallet reliably repays borrowed assets, indicating financial responsibility.
- **Active Wallets:** A long period of activity with balanced deposits and withdrawals.
- **Non-Bot-like Behavior:** Natural transaction patterns, without frequent or erratic bursts of activity.
  
### Bad Wallet Behavior:
- **High Liquidation Frequency:** Multiple liquidations indicate risky behavior and an inability to meet obligations.
- **Erratic Transactions:** High frequency of rapid deposits and withdrawals with minimal engagement between them could indicate bot-like behavior.
- **High Borrowing with Low Repayment:** High borrowing without sufficient repayment suggests the wallet may be exploiting the protocol or not using it responsibly.
- **Short Account Lifespan:** Wallets with limited interaction duration may reflect short-term exploitation or bot-like behavior.
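Several of these criteria reduce to simple per-wallet signals. The sketch below assumes the wallet-level count columns from Section 2 plus `active_days`; the threshold constants are hypothetical placeholders, since this report does not define cutoffs. Note that the repay-to-borrow ratio here divides by borrows, unlike `repay_ratio`, which divides by total transactions.

```python
import pandas as pd

# Hypothetical thresholds for illustration only; the report does not specify cutoffs
MIN_REPAY_TO_BORROW = 0.8
MAX_LIQUIDATIONS = 0
MIN_LIFESPAN_DAYS = 30


def behavior_flags(w: pd.DataFrame) -> pd.DataFrame:
    """Compute good/bad behavior signals from wallet-level features."""
    out = pd.DataFrame(index=w.index)
    borrows = w["total_borrows"]
    # Wallets that never borrowed get NaN instead of a division by zero
    out["repay_to_borrow"] = w["total_repays"] / borrows.where(borrows > 0)
    # NaN < threshold is False, so non-borrowers are not flagged for low repayment
    out["low_repayment"] = out["repay_to_borrow"] < MIN_REPAY_TO_BORROW
    out["liquidated"] = w["total_liquidations"] > MAX_LIQUIDATIONS
    out["short_lifespan"] = w["active_days"] < MIN_LIFESPAN_DAYS
    return out
```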

---

## 2. Engineering Features from Raw Transaction Logs

We derived the following wallet-level features from the raw transaction logs:

- **Transaction Counts:**
  - `total_txns`: Total number of transactions.
  - `total_deposits`: Number of deposit transactions.
  - `total_borrows`: Number of borrow transactions.
  - `total_withdraws`: Number of withdrawal transactions.
  - `total_repays`: Number of repayment transactions.
  - `total_liquidations`: Number of liquidation transactions.
  
- **Aggregated Features:**
  - `total_normalized_amount`: Sum of normalized amounts across all transactions.
  - `avg_normalized_amount`: Average normalized amount per transaction.
  - `active_days`: Number of days between the first and last transaction dates.
  
- **Transaction Ratios:**
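  - `deposit_ratio`: Fraction of all transactions that are deposits (`total_deposits / total_txns`).
  - `borrow_ratio`: Fraction of all transactions that are borrows (`total_borrows / total_txns`).
  - `repay_ratio`: Fraction of all transactions that are repayments (`total_repays / total_txns`).
  - `withdraw_ratio`: Fraction of all transactions that are withdrawals (`total_withdraws / total_txns`).
  - `liquidation_ratio`: Fraction of all transactions that are liquidations (`total_liquidations / total_txns`).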

These features capture both the quantity and quality of wallet activity.
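A minimal pandas sketch of this aggregation, assuming a raw log with one row per transaction and hypothetical columns `wallet`, `txn_type` (with singular values such as `"deposit"`), `timestamp` (datetime), and `normalized_amount`; the actual Compound V2 log schema may differ. `build_wallet_features` is an illustrative name, and `active_days` follows the definition above.

```python
import pandas as pd

# Assumed transaction-type labels; the raw logs may name them differently
TXN_TYPES = ["deposit", "borrow", "repay", "withdraw", "liquidation"]


def build_wallet_features(txns: pd.DataFrame) -> pd.DataFrame:
    """Aggregate a raw transaction log (one row per transaction) into one row per wallet."""
    grouped = txns.groupby("wallet")
    feats = pd.DataFrame({
        "total_txns": grouped.size(),
        "total_normalized_amount": grouped["normalized_amount"].sum(),
        "avg_normalized_amount": grouped["normalized_amount"].mean(),
        "first_txn": grouped["timestamp"].min(),
        "last_txn": grouped["timestamp"].max(),
    })
    # Count each transaction type per wallet; types a wallet never used become 0
    counts = pd.crosstab(txns["wallet"], txns["txn_type"]).reindex(columns=TXN_TYPES, fill_value=0)
    for t in TXN_TYPES:
        feats[f"total_{t}s"] = counts[t]  # e.g. total_deposits
        feats[f"{t}_ratio"] = counts[t] / feats["total_txns"]  # e.g. deposit_ratio
    # Days elapsed between the first and last transaction
    feats["active_days"] = (feats["last_txn"] - feats["first_txn"]).dt.days
    return feats
```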

---

## 3. Choosing and Justifying Modeling Approach

### Chosen Approach: **Unsupervised Learning - Clustering (KMeans)**

- **Justification for Clustering:**
  - **Lack of Labeled Data:** Since we don't have labeled data, supervised learning is not feasible. Instead, unsupervised learning allows us to detect natural patterns in the data without needing predefined labels.
  - **Behavioral Grouping:** KMeans clustering helps segment wallets into distinct behavioral groups (e.g., good vs. bad) based on the wallet-level features we engineered.
  - **Cluster Interpretation:** We can interpret clusters based on transaction behavior and assign scores accordingly.

- **Clustering Details:**
  - We used **KMeans** clustering with 4 clusters (chosen based on Elbow Method and silhouette score).
  - Each cluster groups wallets with similar behavior and is then mapped to a credit score: high-scoring clusters correspond to responsible behavior, low-scoring ones to risky or exploitative behavior.
  
- **Evaluation:**
  - The **Silhouette Score** was 0.47, indicating reasonable clustering with moderate separation between groups.
  - We also visualized clusters using **PCA** and **t-SNE**, confirming that the clusters formed distinct groups.
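The final fit can be sketched as follows, assuming a wallet-feature DataFrame and a list of feature columns; `cluster_wallets`, `n_init=10`, and `random_state=42` are illustrative. Scaling precedes KMeans because the algorithm uses Euclidean distance, so unscaled amount columns would otherwise dominate count and ratio columns.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler


def cluster_wallets(features: pd.DataFrame, feature_cols: list, k: int = 4, seed: int = 42):
    """Scale features, fit KMeans, and return (copy with a `cluster` column, silhouette score)."""
    scaled = StandardScaler().fit_transform(features[feature_cols])
    model = KMeans(n_clusters=k, n_init=10, random_state=seed)
    labels = model.fit_predict(scaled)
    out = features.copy()
    out["cluster"] = labels
    # Silhouette is computed in the same scaled space the model was fit in
    return out, silhouette_score(scaled, labels)
```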

---

## 4. Designing a Credit Scoring System

### Credit Scoring Design:
- The credit score is based on the cluster a wallet belongs to. Each cluster represents a different behavioral pattern, with the following mapping:
  - **Cluster 3 (Excellent)**: Score = 100 (Responsible behavior with high deposits, repayments, and low liquidations).
  - **Cluster 2 (High)**: Score = 85 (Generally responsible but may have some room for improvement).
  - **Cluster 1 (Medium)**: Score = 70 (Moderate use, potential for better behavior).
  - **Cluster 0 (Low/Risky)**: Score = 50 (Erratic behavior with frequent liquidations or high borrowing without repayment).

- **Scoring Validation:**
    - We examined the five highest- and five lowest-scoring wallets to confirm these patterns:
    - High-scoring wallets had balanced deposits, borrows, and repayments, and minimal liquidations.
    - Low-scoring wallets had irregular behaviors or frequent liquidations.
  
- **Final Score Output:**
  - The wallet scores were saved in a CSV file containing the top 1,000 wallets sorted by their credit scores.

This scoring system reflects the quality of each wallet's behavior and supports the goal of promoting healthy protocol use while detecting risky or exploitative users.
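The validation and export steps might look like this, assuming a DataFrame indexed by wallet with a `credit_score` column (hypothetical name). Since every wallet in a cluster receives the same score, ties are common; the optional `tie_breaker` column (for example `total_txns`) is an illustrative addition for deterministic ordering, not part of the original method.

```python
from typing import Optional

import pandas as pd


def rank_wallets(scored: pd.DataFrame, score_col: str = "credit_score",
                 tie_breaker: Optional[str] = None) -> pd.DataFrame:
    """Sort wallets from highest to lowest score, optionally breaking ties on a second column."""
    cols = [score_col] if tie_breaker is None else [score_col, tie_breaker]
    # With a single sort column, kind="stable" keeps the original row order among tied wallets
    return scored.sort_values(cols, ascending=False, kind="stable")


def export_top_wallets(scored: pd.DataFrame, path_or_buf, n: int = 1000,
                       score_col: str = "credit_score",
                       tie_breaker: Optional[str] = None) -> pd.DataFrame:
    """Write the n highest-scoring wallets to CSV and return them."""
    top = rank_wallets(scored, score_col, tie_breaker).head(n)
    top.to_csv(path_or_buf)  # the index (wallet address) becomes the first CSV column
    return top
```

For the validation step, `rank_wallets(scored).head(5)` and `.tail(5)` give the five highest- and lowest-scoring wallets to inspect.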

---

## Conclusion

The developed scoring system successfully groups wallets based on transactional behaviors and assigns credit scores reflecting their reliability and risk. This methodology offers a robust approach to identifying responsible users in decentralized protocols like Compound V2, and the system is designed to be scalable for more complex datasets.

