<a href="https://colab.research.google.com/github/SunbirdAI/lamwo-electrification-project/blob/main/notebooks/predicting_minigrid_villages/predict_electrification_strategy_for_village.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Inference for village electrification strategy predictor

Given data for a village, predict the most appropriate electrification strategy (grid extension, existing grid, solar home system, or minigrid) using two trained models: Logistic Regression and Random Forest.

In [1]:
import pandas as pd
import numpy as np
import pickle
from sklearn.preprocessing import StandardScaler

## 1. Model Loading Class

In [2]:
class ElectrificationPredictor:
    def __init__(self, log_reg_path, rf_path, scaler_path, feature_names):
        """Initialize predictor with trained models and scaler"""
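        # Load trained models and scaler. Only unpickle files from a trusted
        # source: pickle can execute arbitrary code on load.
        with open(log_reg_path, 'rb') as f:
            self.log_reg = pickle.load(f)
        with open(rf_path, 'rb') as f:
            self.rf = pickle.load(f)
        with open(scaler_path, 'rb') as f:
            self.scaler = pickle.load(f)
        self.feature_names = feature_names
        # Labels for the probability columns; they must be listed in the same
        # order as the fitted models' classes_ attribute
        self.classes = ['Grid extension', 'Existing grid', 'Solar home system', 'minigrid']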

    def preprocess_input(self, data):
        """Preprocess input data for prediction"""
        # Convert a single instance to a DataFrame
        if isinstance(data, dict):
            data = pd.DataFrame([data])

        # Select the training features in training order; any feature missing
        # from the input is added and filled with 0. reindex returns a new
        # DataFrame, so the caller's data is left unmodified.
        data = data.reindex(columns=self.feature_names, fill_value=0)

        # Fill missing values first so the integer cast below cannot fail on NaN
        data = data.fillna(0)

        # Handle categorical variable
        if 'contains_protected_area' in data.columns:
            data['contains_protected_area'] = data['contains_protected_area'].astype(int)

        # Scale the features
        data_scaled = self.scaler.transform(data)

        return data_scaled

    def predict(self, data, model_type='random_forest'):
        """Make predictions using specified model"""
        # Preprocess the data
        data_scaled = self.preprocess_input(data)

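        # Select model; reject unknown names instead of silently falling back
        models = {'random_forest': self.rf, 'logistic_regression': self.log_reg}
        if model_type not in models:
            raise ValueError(f"Unknown model_type: {model_type!r}")
        model = models[model_type]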

        # Make prediction
        prediction = model.predict(data_scaled)
        probabilities = model.predict_proba(data_scaled)

        return prediction, probabilities

    def get_probability_df(self, data, model_type='random_forest'):
        """Return predictions with probabilities as DataFrame"""
        prediction, probabilities = self.predict(data, model_type)

        prob_df = pd.DataFrame(
            probabilities,
            columns=self.classes
        )
        prob_df['prediction'] = prediction

        return prob_df
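
One detail worth checking: scikit-learn orders the columns of `predict_proba` by the fitted model's `classes_` attribute, which holds the sorted unique training labels, not by any list supplied at inference time. The sketch below uses plain Python to show how the four strategy names sort. It assumes (this notebook does not confirm it) that the models were trained on exactly these string labels; `label_probabilities` and the probability row are hypothetical.

```python
# Class names as hard-coded in ElectrificationPredictor.classes
hard_coded = ['Grid extension', 'Existing grid', 'Solar home system', 'minigrid']

# Assumption: the models were trained on these exact strings. scikit-learn
# stores classes_ as the sorted unique labels; sorted(set(...)) gives the
# same order for plain strings (uppercase letters sort before lowercase).
sorted_classes = sorted(set(hard_coded))
print(sorted_classes)
# ['Existing grid', 'Grid extension', 'Solar home system', 'minigrid']


def label_probabilities(row, class_order):
    """Hypothetical helper: pair one probability row with its class labels."""
    if len(row) != len(class_order):
        raise ValueError("row length does not match number of classes")
    return dict(zip(class_order, row))


# Hypothetical probability row, labeled with each ordering
row = [0.1, 0.6, 0.2, 0.1]
by_sorted = label_probabilities(row, sorted_classes)
by_hard_coded = label_probabilities(row, hard_coded)

print(max(by_sorted, key=by_sorted.get))          # Grid extension
print(max(by_hard_coded, key=by_hard_coded.get))  # Existing grid
```

With a fitted model available, passing `model.classes_` as the `columns=` argument when building the probability DataFrame avoids depending on a hand-maintained list.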


## 2. Load saved models

In [3]:
!unzip -o electrification_strategy_models.zip

Archive:  electrification_strategy_models.zip
   creating: models/
  inflating: models/rf_model.pkl     
  inflating: models/feature_names.txt  
  inflating: models/log_reg_model.pkl  
  inflating: models/scaler.pkl       


Load feature names

In [4]:
with open('models/feature_names.txt', 'r') as f:
    # Strip whitespace so a trailing newline does not become part of a name
    feature_names = [name.strip() for name in f.read().split(',')]

Initialize the predictor

In [5]:
predictor = ElectrificationPredictor(
    log_reg_path='models/log_reg_model.pkl',
    rf_path='models/rf_model.pkl',
    scaler_path='models/scaler.pkl',
    feature_names=feature_names
)
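
Because `preprocess_input` fills any absent feature with 0, a misspelled key silently changes the model input instead of raising an error. A minimal sketch of a check to run before predicting follows; `find_missing_features` and the sample feature names are hypothetical and not part of the project code.

```python
def find_missing_features(record, feature_names):
    """Return the expected features that are absent from a village record."""
    return [name for name in feature_names if name not in record]


# Illustrative subset of feature names (the real list comes from feature_names.txt)
expected = ['building_count', 'mean_ndvi', 'contains_protected_area']

# 'mean_ndvi' is misspelled on purpose to show what the check catches
village = {'building_count': 200, 'mean_nvdi': 0.3, 'contains_protected_area': False}

missing = find_missing_features(village, expected)
print(missing)  # ['mean_ndvi']
```

The same function works on a DataFrame's column index, since `name not in df.columns` tests column membership.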

## 3. Example Usage

### Example 1: Single instance prediction

In [6]:
single_village = {
    'facilities': 1,
    'grid_extension': 0,
    'existing_grid': 0,
    'mean_ndvi': 0.3,
    'mean_wind_speed': 1.2,
    'mean_pvout_solar_radiation': 1600,
    'building_count': 200,
    'permanent_building_count': 50,
    'educational_facilities': 1,
    'health_facilities': 0,
    'social_facilities': 0,
    'services': 0,
    'primary_roads': 0,
    'secondary_roads': 0,
    'tertiary_roads': 1,
    'unclassified_roads': 2,
    'percentage_crop_land': 20.0,
    'percentage_built_area': 5.0,
    'contains_protected_area': False
}

# Make prediction with Random Forest
print("\nSingle Instance Prediction (Random Forest):")
rf_pred_df = predictor.get_probability_df(single_village, 'random_forest')
print(rf_pred_df)

# Make prediction with Logistic Regression
print("\nSingle Instance Prediction (Logistic Regression):")
lr_pred_df = predictor.get_probability_df(single_village, 'logistic_regression')
print(lr_pred_df)


Single Instance Prediction (Random Forest):
   Grid extension  Existing grid  Solar home system  minigrid prediction
0            0.01           0.01               0.47      0.51   minigrid

Single Instance Prediction (Logistic Regression):
   Grid extension  Existing grid  Solar home system  minigrid prediction
0        0.001761       0.002266           0.350057  0.645916   minigrid
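
Both models pick minigrid here, but with different confidence. The sketch below copies the probabilities printed above (it does not rerun the models) and computes the gap between the two most likely classes for each model; `top_two_margin` is a hypothetical helper, and the column labels are taken as printed.

```python
def top_two_margin(probs):
    """Return (best_label, margin): margin is the gap between the two highest probabilities."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    (best_label, best_p), (_, second_p) = ranked[0], ranked[1]
    return best_label, round(best_p - second_p, 6)


# Values copied from the single-instance output above
rf_probs = {'Grid extension': 0.01, 'Existing grid': 0.01,
            'Solar home system': 0.47, 'minigrid': 0.51}
lr_probs = {'Grid extension': 0.001761, 'Existing grid': 0.002266,
            'Solar home system': 0.350057, 'minigrid': 0.645916}

print(top_two_margin(rf_probs))  # ('minigrid', 0.04)
print(top_two_margin(lr_probs))  # ('minigrid', 0.295859)
```

A margin of 0.04 for the Random Forest suggests that the choice between minigrid and solar home system is close for this village.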


### Example 2: Batch prediction

In [7]:
batch_data = pd.DataFrame([
    {
        'facilities': 0,
        'grid_extension': 0,
        'existing_grid': 0,
        'mean_ndvi': 0.0,
        'mean_wind_speed': 0.0,
        'mean_pvout_solar_radiation': 1550,
        'building_count': 50,
        'permanent_building_count': 10,
        'educational_facilities': 0,
        'health_facilities': 0,
        'social_facilities': 0,
        'services': 0,
        'primary_roads': 0,
        'secondary_roads': 0,
        'tertiary_roads': 0,
        'unclassified_roads': 0,
        'percentage_crop_land': 15.0,
        'percentage_built_area': 2.0,
        'contains_protected_area': True
    },
    {
        'facilities': 2,
        'grid_extension': 1,
        'existing_grid': 0,
        'mean_ndvi': 0.4,
        'mean_wind_speed': 1.5,
        'mean_pvout_solar_radiation': 1650,
        'building_count': 500,
        'permanent_building_count': 100,
        'educational_facilities': 1,
        'health_facilities': 1,
        'social_facilities': 0,
        'services': 1,
        'primary_roads': 1,
        'secondary_roads': 1,
        'tertiary_roads': 2,
        'unclassified_roads': 3,
        'percentage_crop_land': 30.0,
        'percentage_built_area': 15.0,
        'contains_protected_area': False
    }
])

print("\nBatch Prediction (Random Forest):")
batch_pred_df = predictor.get_probability_df(batch_data, 'random_forest')
print(batch_pred_df)


Batch Prediction (Random Forest):
   Grid extension  Existing grid  Solar home system  minigrid  \
0            0.01            0.0               0.94      0.05   
1            0.00            0.7               0.08      0.22   

          prediction  
0  Solar home system  
1     Grid extension  
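
A quick sanity check on a table like this is to compare the label of the highest-probability column with the `prediction` column, since a Random Forest predicts the class with the highest mean probability across its trees. The sketch below rebuilds the printed table by hand (it does not rerun the model) and flags rows where the two disagree; the `labels_consistent` column name is an illustrative choice.

```python
import pandas as pd

class_columns = ['Grid extension', 'Existing grid', 'Solar home system', 'minigrid']

# Values copied from the batch output above
batch_pred_df = pd.DataFrame(
    [[0.01, 0.0, 0.94, 0.05], [0.00, 0.7, 0.08, 0.22]],
    columns=class_columns,
)
batch_pred_df['prediction'] = ['Solar home system', 'Grid extension']

# Name of the column holding the largest probability in each row
top_label = batch_pred_df[class_columns].idxmax(axis=1)
batch_pred_df['labels_consistent'] = top_label == batch_pred_df['prediction']
print(batch_pred_df[['prediction', 'labels_consistent']])
```

Row 1 comes out inconsistent: its 0.7 sits under Existing grid while the prediction is Grid extension. One plausible explanation, not verified here, is that the hard-coded `classes` list orders its first two labels differently from the model's `classes_`.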


Each output table shows:

*   The predicted electrification strategy (`prediction` column)
*   The probability for each class (Grid extension, Existing grid, Solar home system, minigrid)

Example 1 reports results for both Random Forest and Logistic Regression; Example 2 uses Random Forest only.

