# Chapter 4: Development and Implementation

## 4.1 Development Process
**Tools Used:**
- Python, scikit-learn, pandas, matplotlib
- VSCode for development
- GitHub for version control

**Milestones:**
- Phase 1: NFT metadata regression
- Phase 2: NFT + Collection info
- Phase 3: NFT + Collection + Owner info
- Phase 4: Combined model and classification

**Challenges Faced:**
- Inconsistent timestamps → solved by parsing them into datetimes and deriving hour, day, and month features
- Imbalanced dataset → resolved with median-based binarization
- Multiple phases with different feature sets → target normalized with min-max scaling in each phase
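
The median-based binarization listed above can be sketched as follows. This is a minimal illustration, assuming the binarized quantity is the `NFT_LAST_PRICE` column; the toy values and the `HIGH_PRICE` column name are hypothetical and do not come from the project data. Splitting at the median gives two classes of roughly equal size, which is why it helps with imbalance.

```python
import pandas as pd

# Toy prices (hypothetical values); the real data would come from the training CSV.
prices = pd.DataFrame({"NFT_LAST_PRICE": [0.5, 1.2, 3.4, 0.9, 150.0, 2.1]})

# Split at the median: 1 = above the median price, 0 = at or below it.
median_price = prices["NFT_LAST_PRICE"].median()
prices["HIGH_PRICE"] = (prices["NFT_LAST_PRICE"] > median_price).astype(int)

# Count the rows in each class; ties at the median fall into class 0.
class_counts = prices["HIGH_PRICE"].value_counts().to_dict()
print(median_price, class_counts)
```

Note that a single extreme price (150.0 here) does not move the threshold, which is one reason to split at the median rather than the mean.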


## 4.2 Model Implementation

In [2]:
import pandas as pd
import numpy as np
from sklearn.ensemble import RandomForestRegressor, AdaBoostRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
from sklearn.preprocessing import LabelEncoder, MinMaxScaler
import matplotlib.pyplot as plt

# Load the cleaned dataset; forward slashes keep the relative path portable across operating systems.
df = pd.read_csv('data/Training_dataset_Cleaned_all_phasses_updated.csv')
df = df.dropna()

# Parse the sale timestamp and derive hour, day, and month features from it.
df['LAST_SALE_TIME_FORMATTED'] = pd.to_datetime(df['LAST_SALE_TIME_FORMATTED'], format='%d/%m/%Y %H:%M')
df['SALE_HOUR'] = df['LAST_SALE_TIME_FORMATTED'].dt.hour
df['SALE_DAY'] = df['LAST_SALE_TIME_FORMATTED'].dt.day
df['SALE_MONTH'] = df['LAST_SALE_TIME_FORMATTED'].dt.month
df.drop(columns=['LAST_SALE_TIME_FORMATTED'], inplace=True)

# Encode the contract and owner addresses as integer labels.
df['CONTRACT_ADDRESS'] = LabelEncoder().fit_transform(df['CONTRACT_ADDRESS'])
df['CURRENT_OWNER'] = LabelEncoder().fit_transform(df['CURRENT_OWNER'])

def evaluate_model(y_true, y_pred):
    return {
        "RMSE": np.sqrt(mean_squared_error(y_true, y_pred)),
        "MAE": mean_absolute_error(y_true, y_pred),
        "R2": r2_score(y_true, y_pred)
    }

phases = {
    "Phase 1": ['TOKEN_ID', 'NFT_FIRST_PRICE', 'NFT_AVG_PRICE', 'NFT_SALE_COUNT'],
    "Phase 2": ['TOKEN_ID', 'NFT_FIRST_PRICE', 'NFT_AVG_PRICE', 'NFT_SALE_COUNT',
                'CONTRACT_ADDRESS', 'COLLECTION_AVG_PRICE', 'COLLECTION_CEILING_PRICE', 'TOTAL_VOLUME_USD'],
    "Phase 3": ['NFT_FIRST_PRICE', 'NFT_AVG_PRICE', 'NFT_SALE_COUNT',
                'OWNER_TOTAL_PURCHASES', 'OWNER_DIVERSE_COLLECTIONS',
                'OWNER_AVG_PURCHASE_PRICE', 'SALE_HOUR', 'SALE_DAY', 'SALE_MONTH'],
    "Phase 4": ['NFT_FIRST_PRICE', 'NFT_AVG_PRICE', 'NFT_SALE_COUNT',
                'CONTRACT_ADDRESS', 'COLLECTION_AVG_PRICE', 'COLLECTION_CEILING_PRICE', 'TOTAL_VOLUME_USD',
                'OWNER_TOTAL_PURCHASES', 'OWNER_DIVERSE_COLLECTIONS', 'OWNER_AVG_PURCHASE_PRICE',
                'SALE_HOUR', 'SALE_DAY', 'SALE_MONTH']
}

scaler = MinMaxScaler()
results = []

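for phase_name, features in phases.items():
    X = df[features].copy()
    y = df['NFT_LAST_PRICE']
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=42)
    # Fit the target scaler on the training split only, so test-set prices do not leak into the scaling.
    y_train = scaler.fit_transform(y_train.values.reshape(-1, 1)).ravel()
    y_test = scaler.transform(y_test.values.reshape(-1, 1)).ravel()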

    for model_name, model in [('Random Forest', RandomForestRegressor(random_state=42)),
                              ('AdaBoost', AdaBoostRegressor(random_state=42))]:
        model.fit(X_train, y_train)
        preds = model.predict(X_test)
        metrics = evaluate_model(y_test, preds)
        results.append({
            'Phase': phase_name,
            'Model': model_name,
            **metrics
        })

metrics_df = pd.DataFrame(results)
metrics_df
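
Because the target is min-max scaled, the RMSE and MAE in `metrics_df` are expressed on a scaled axis rather than in price units, while R² is unaffected by the scaling. The sketch below shows how the three metrics are computed and how an error magnitude can be mapped back to the original units. The arrays are made-up values, and `to_price_units` is a hypothetical helper that is not part of the original notebook.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
from sklearn.preprocessing import MinMaxScaler

# Made-up prices and predictions, for illustration only.
demo_true = np.array([10.0, 20.0, 30.0, 50.0])
demo_pred = np.array([12.0, 18.0, 33.0, 47.0])

# Scale both arrays with a scaler fitted on the true prices.
demo_scaler = MinMaxScaler()
true_scaled = demo_scaler.fit_transform(demo_true.reshape(-1, 1)).ravel()
pred_scaled = demo_scaler.transform(demo_pred.reshape(-1, 1)).ravel()

rmse_scaled = np.sqrt(mean_squared_error(true_scaled, pred_scaled))
mae_scaled = mean_absolute_error(true_scaled, pred_scaled)
r2 = r2_score(true_scaled, pred_scaled)

def to_price_units(scaled_error, fitted_scaler):
    """Hypothetical helper: undo min-max scaling for an error magnitude."""
    # Min-max scaling divides by (max - min), so an error scales back by that range.
    return scaled_error * fitted_scaler.data_range_[0]

rmse_price = to_price_units(rmse_scaled, demo_scaler)
mae_price = to_price_units(mae_scaled, demo_scaler)
print(rmse_price, mae_price, r2)
```

Only the multiplicative range matters here: the additive offset of the scaling cancels when two scaled values are subtracted.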



# Chapter 5: Testing and Evaluation
## 5.2 Evaluation - Metrics and Graphs

In [None]:
# Plot RMSE comparison
plt.figure(figsize=(10, 6))
for model in metrics_df['Model'].unique():
    subset = metrics_df[metrics_df['Model'] == model]
    plt.plot(subset['Phase'], subset['RMSE'], marker='o', label=f'{model} - RMSE')

plt.title('RMSE Comparison Across Phases')
plt.xlabel('Phase')
plt.ylabel('RMSE')
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.show()

# Plot R2 comparison
plt.figure(figsize=(10, 6))
for model in metrics_df['Model'].unique():
    subset = metrics_df[metrics_df['Model'] == model]
    plt.plot(subset['Phase'], subset['R2'], marker='o', label=f'{model} - R²')

plt.title('R² Score Comparison Across Phases')
plt.xlabel('Phase')
plt.ylabel('R² Score')
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.show()
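
Alongside the plots, the same comparison can be read as a table. The following is a minimal sketch, assuming a frame with the `Phase`, `Model`, `RMSE`, and `R2` columns that `metrics_df` has; the numbers are placeholders and not results from this project.

```python
import pandas as pd

# Placeholder metrics with the same columns as metrics_df (values are not real results).
demo_metrics = pd.DataFrame({
    "Phase": ["Phase 1", "Phase 1", "Phase 2", "Phase 2"],
    "Model": ["Random Forest", "AdaBoost", "Random Forest", "AdaBoost"],
    "RMSE": [0.10, 0.12, 0.08, 0.11],
    "R2": [0.70, 0.60, 0.80, 0.65],
})

# One row per phase, one column per model.
rmse_table = demo_metrics.pivot(index="Phase", columns="Model", values="RMSE")

# Name of the model with the lowest RMSE in each phase.
best_model = rmse_table.idxmin(axis=1)
print(rmse_table)
print(best_model)
```

Passing `values="R2"` instead produces the matching R² table, where `idxmax` would pick the better model.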


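# Chapter 6: Conclusion
## 6.1 Summary
This project developed and evaluated a four-phase system for predicting NFT music prices using Random Forest and AdaBoost. Each phase extends the feature set, from NFT metadata alone to collection-level and owner-level information.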

## 6.2 Limitations
- Dataset is limited in diversity
- Limited interpretability of tree-based models
- Median-based binarization may not generalize well
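
One common way to partly address the interpretability limitation is to inspect which features a fitted forest relies on. The sketch below uses scikit-learn's impurity-based `feature_importances_` on synthetic data. The feature names are borrowed from the phases defined earlier, but the data and the relationship between features and target are invented for illustration and say nothing about the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(42)

# Synthetic stand-in data: the target depends mostly on NFT_AVG_PRICE by construction.
demo_X = pd.DataFrame({
    "NFT_AVG_PRICE": rng.uniform(0, 100, 300),
    "NFT_SALE_COUNT": rng.integers(1, 20, 300),
    "SALE_HOUR": rng.integers(0, 24, 300),
})
demo_y = 2.0 * demo_X["NFT_AVG_PRICE"] + rng.normal(0, 1, 300)

demo_model = RandomForestRegressor(n_estimators=50, random_state=42).fit(demo_X, demo_y)

# Impurity-based importances are normalized to sum to 1; larger means more influence on splits.
importances = pd.Series(demo_model.feature_importances_, index=demo_X.columns)
importances = importances.sort_values(ascending=False)
print(importances)
```

Impurity-based importances can favor features with many distinct values, so `sklearn.inspection.permutation_importance` on a held-out split is a reasonable cross-check.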

## 6.3 Future Work
- Incorporate deep learning and transformer models
- Increase diversity and volume of NFT music dataset
- Create real-time prediction platform or API
