## ⚡ Final Mission: Mapping SkyNet's Energy Nexus

### 🌐 The Discovery
SkyNet is harvesting energy from Trondheim's buildings. Some structures provide significantly more power than others.

### 🎯 Your Mission
Predict the **Nexus Rating** of unknown buildings in Trondheim (test set).

### 🧠 The Challenge
1. **Target**: Transform the Nexus Rating to reveal the true energy hierarchy
2. **Data Quality**: Handle missing values and categorical features
3. **Ensembling**: Use advanced models and ensemble learning
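
The three challenge points can be combined in a single pipeline. The following is a minimal sketch rather than the notebook's actual solution: it runs on synthetic data with hypothetical column names (`size`, `floor`), applies a `log1p` target transform through `TransformedTargetRegressor`, imputes missing values with the median, and averages two tree models with a `VotingRegressor`. Categorical encoding is left out for brevity.

```python
import numpy as np
import pandas as pd
from sklearn.compose import TransformedTargetRegressor
from sklearn.ensemble import (
    GradientBoostingRegressor,
    RandomForestRegressor,
    VotingRegressor,
)
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)

# Synthetic stand-in for the mission data (NOT the real dataset).
n = 300
X = pd.DataFrame({
    "size": rng.uniform(20, 200, n),
    "floor": rng.integers(1, 20, n).astype(float),
})
# Right-skewed, strictly positive target, as is typical for price-like ratings.
y = np.exp(0.01 * X["size"] + rng.normal(0, 0.1, n)) * 1e6
# Inject missing values into one feature.
X.loc[rng.random(n) < 0.2, "floor"] = np.nan

# Simple ensemble: average the predictions of two different tree models.
ensemble = VotingRegressor([
    ("rf", RandomForestRegressor(n_estimators=50, random_state=0)),
    ("gb", GradientBoostingRegressor(random_state=0)),
])

# Fit on log1p(y) and map predictions back with expm1.
model = TransformedTargetRegressor(
    regressor=make_pipeline(SimpleImputer(strategy="median"), ensemble),
    func=np.log1p,
    inverse_func=np.expm1,
)
model.fit(X, y)
pred = model.predict(X)
print(pred[:3])
```

Training on `log1p(y)` with a squared-error loss means the model optimizes a quantity that matches RMSLE, which is why the target transform and the metric are usually chosen together.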

### 💡 Hint
You suspect that an insider has tampered with the columns in the test data...

Compare the training and test distributions and try to rectify the test dataset.
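
One way to compare the two distributions is to match columns by a robust summary statistic such as the median. The sketch below uses toy frames with hypothetical column names (`rating`, `footprint`, `height`) and simulates the tampering by rotating the values one column to the left. It assumes every column has a clearly different, positive scale, which may not hold for all columns in the real data.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Toy frames with hypothetical column names; each column has a distinct scale.
train_toy = pd.DataFrame({
    "rating": rng.normal(1e7, 1e6, 500),
    "footprint": rng.normal(70, 10, 500),
    "height": rng.normal(3, 0.2, 500),
})

# Simulate tampering: values rotated one column to the left under the same labels.
test_toy = pd.DataFrame(
    np.roll(train_toy.to_numpy(), -1, axis=1), columns=train_toy.columns
)

train_med = train_toy.median()
test_med = test_toy.median()

# For every test column, find the train column whose median is closest on a log scale.
best_match = {
    col: (np.log(train_med) - np.log(test_med[col])).abs().idxmin()
    for col in test_toy.columns
}
print(best_match)
```

A test column whose best match is a differently named train column is a candidate for relabelling. Columns with similar scales (for example several 0/1 flags) would need a finer comparison, such as full `describe()` output or histograms.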

### 📊 Formal Requirements
1. **Performance**: Achieve RMSLE <= 0.294 on the test set
2. **Discussion**:

   a. Explain your threshold-breaking strategy

   b. Justify RMSLE usage. Why do we use this metric? Which loss function did you use?

   c. Plot and interpret feature importances

   d. Describe your ensembling techniques

   e. In real life, you do not have the test targets. How would you make sure your model performs well on unseen data?
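
For point (e), a common answer is K-fold cross-validation on the training data. The sketch below is an illustration on synthetic data, not the notebook's evaluation: it fits a random forest on `log1p` of the target in each fold and scores the held-out fold with RMSLE. The fold count, forest size and data are all assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_log_error
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)

# Synthetic, strictly positive target (NOT the real dataset).
X = rng.uniform(0, 1, size=(200, 3))
y = np.exp(2 * X[:, 0] + rng.normal(0, 0.1, 200))

scores = []
kfold = KFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, val_idx in kfold.split(X):
    fold_model = RandomForestRegressor(n_estimators=50, random_state=0)
    # Train on the log-transformed target, predict on the held-out fold.
    fold_model.fit(X[train_idx], np.log1p(y[train_idx]))
    fold_pred = np.expm1(fold_model.predict(X[val_idx]))
    scores.append(np.sqrt(mean_squared_log_error(y[val_idx], fold_pred)))

print("RMSLE per fold:", np.round(scores, 4))
print("mean / std:", np.mean(scores), np.std(scores))
```

The mean of the fold scores estimates performance on unseen data, and the spread across folds indicates how stable that estimate is.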

---

In [113]:
import pandas as pd
import numpy as np

train = pd.read_csv('final_mission_train.csv')
test = pd.read_csv('final_mission_test.csv')

In [114]:
from sklearn.metrics import mean_squared_log_error

def rmsle(y_true, y_pred):
    """Root Mean Squared Logarithmic Error."""
    return np.sqrt(mean_squared_log_error(y_true, y_pred))
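
As a small worked example of what this metric measures, the sketch below uses made-up numbers where every prediction is 10% too high. It shows that RMSLE equals the RMSE computed on `log1p` values and that it stays roughly constant across scales, whereas the raw RMSE is dominated by the largest value.

```python
import numpy as np
from sklearn.metrics import mean_squared_log_error

# Made-up values spanning three orders of magnitude.
y_true = np.array([100.0, 1_000.0, 10_000.0])
y_pred = np.array([110.0, 1_100.0, 11_000.0])  # each prediction is 10% too high

rmsle_sklearn = np.sqrt(mean_squared_log_error(y_true, y_pred))
rmse_on_logs = np.sqrt(np.mean((np.log1p(y_pred) - np.log1p(y_true)) ** 2))
rmse_raw = np.sqrt(np.mean((y_pred - y_true) ** 2))

print("RMSLE (sklearn):   ", rmsle_sklearn)
print("RMSE on log1p:     ", rmse_on_logs)
print("RMSE on raw values:", rmse_raw)
```

Because the error is measured on a log scale, a 10% miss on a small building costs about the same as a 10% miss on a large one.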

In [115]:
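# Shift all columns in the *train* set left by one, except ownership_type (column 0).
# After the shift, each column label holds the values of its right-hand neighbour,
# and the last column is left empty.
original_nexus = train['nexus_rating'].copy()
train_before_shift = train.copy()
train.iloc[:, 1:] = train_before_shift.iloc[:, 1:].shift(-1, axis=1)
# Store the original nexus_rating values under grid_connections.
train['grid_connections'] = original_nexus
train.describe()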

statistic,ownership_type,nexus_rating,energy_footprint,core_reactor_size,harvesting_space,vertical_alignment,power_chambers,energy_flow_design,upper_collector_height,shared_conversion_units,isolated_conversion_units,internal_collectors,external_collectors,ambient_harvesters,shielded_harvesters,efficiency_grade,grid_connections
count,14455.0,23285.0,18564.0,19403.0,23285.0,23285.0,5643.0,12192.0,19413.0,19413.0,15213.0,15213.0,12765.0,12765.0,13475.0,23205.0,23285.0
mean,1.875683,74.450999,12.552279,38.741367,8.969594,2.189349,1.039695,3.268374,0.737547,0.662855,0.797147,0.468678,0.556365,0.67309,1.270501,1.162293,23556170.0
std,1.089518,58.671373,6.565686,31.39848,8.322039,1.07613,0.351507,10.802728,0.781173,0.716888,0.402137,0.499034,0.641257,0.664779,1.017037,0.456937,52643930.0
min,0.0,9.3,1.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,900000.0
25%,1.0,42.0,8.2,20.0,3.0,1.0,1.0,2.65,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,7490000.0
50%,2.0,59.8,10.7,30.9,7.0,2.0,1.0,2.8,1.0,1.0,1.0,0.0,0.0,1.0,1.0,1.0,10645000.0
75%,3.0,84.8,15.3,45.3,12.0,3.0,1.0,3.0,1.0,1.0,1.0,1.0,1.0,1.0,2.0,1.0,20500000.0
max,3.0,2181.0,100.0,900.0,95.0,6.0,2.0,340.0,4.0,4.0,1.0,1.0,4.0,4.0,3.0,2.0,2600000000.0
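
To make the effect of the shift concrete, here is the same operation on a two-row toy frame. The column names (`owner`, `rating`, `a`, `b`) are hypothetical stand-ins, with `owner` playing the role of the untouched first column and `b` the role of the last column.

```python
import pandas as pd

# Toy frame with hypothetical column names.
toy = pd.DataFrame({
    "owner": [0.0, 1.0],
    "rating": [100.0, 200.0],
    "a": [1.0, 2.0],
    "b": [10.0, 20.0],
})

original_rating = toy["rating"].copy()
shifted = toy.copy()

# Move every value from column 1 onward one position to the left.
shifted.iloc[:, 1:] = toy.iloc[:, 1:].shift(-1, axis=1)
# The last column is now empty; fill it with the original rating values.
shifted["b"] = original_rating

print(shifted)
```

After the shift, the label `rating` holds what used to be under `a`, `a` holds what used to be under `b`, and the original ratings sit under `b`.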


In [116]:
from sklearn.ensemble import RandomForestRegressor

# Baseline: a random forest with default hyperparameters.
model = RandomForestRegressor()
# Separate the features from the target column in both sets.
x_train = train.drop(columns=["nexus_rating"]).copy()
y_train = train['nexus_rating']
x_test = test.drop(columns=["nexus_rating"]).copy()
model.fit(x_train, y_train)
y_pred = model.predict(x_test)
x_train.describe()


statistic,ownership_type,energy_footprint,core_reactor_size,harvesting_space,vertical_alignment,power_chambers,energy_flow_design,upper_collector_height,shared_conversion_units,isolated_conversion_units,internal_collectors,external_collectors,ambient_harvesters,shielded_harvesters,efficiency_grade,grid_connections
count,14455.0,18564.0,19403.0,23285.0,23285.0,5643.0,12192.0,19413.0,19413.0,15213.0,15213.0,12765.0,12765.0,13475.0,23205.0,23285.0
mean,1.875683,12.552279,38.741367,8.969594,2.189349,1.039695,3.268374,0.737547,0.662855,0.797147,0.468678,0.556365,0.67309,1.270501,1.162293,23556170.0
std,1.089518,6.565686,31.39848,8.322039,1.07613,0.351507,10.802728,0.781173,0.716888,0.402137,0.499034,0.641257,0.664779,1.017037,0.456937,52643930.0
min,0.0,1.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,900000.0
25%,1.0,8.2,20.0,3.0,1.0,1.0,2.65,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,7490000.0
50%,2.0,10.7,30.9,7.0,2.0,1.0,2.8,1.0,1.0,1.0,0.0,0.0,1.0,1.0,1.0,10645000.0
75%,3.0,15.3,45.3,12.0,3.0,1.0,3.0,1.0,1.0,1.0,1.0,1.0,1.0,2.0,1.0,20500000.0
max,3.0,100.0,900.0,95.0,6.0,2.0,340.0,4.0,4.0,1.0,1.0,4.0,4.0,3.0,2.0,2600000000.0


In [117]:
# Score the predictions against the test-set nexus_rating column.
# The target was not transformed before training, so no back-conversion is applied.

print('Required RMSLE: ', 0.294)
print('RMSLE: ', rmsle(test['nexus_rating'], y_pred))

Required RMSLE:  0.294
RMSLE:  0.13017352835856075
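
For the feature-importance part of the discussion, a fitted random forest exposes impurity-based importances through `feature_importances_`. The sketch below is an illustration on synthetic data with hypothetical feature names (`signal`, `noise_1`, `noise_2`); on the real data the same two lines would be applied to the fitted `model` and the columns of `x_train`.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic data: only the first feature drives the target.
X = pd.DataFrame(
    rng.normal(size=(300, 3)), columns=["signal", "noise_1", "noise_2"]
)
y = 5 * X["signal"] + rng.normal(0, 0.1, 300)

forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# Impurity-based importances, sorted from most to least important.
importances = pd.Series(
    forest.feature_importances_, index=X.columns
).sort_values(ascending=False)
print(importances)
```

The values sum to 1, so each entry can be read as that feature's share of the total impurity reduction. A single feature with a share close to 1 is worth a second look, since it can indicate target leakage.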
