# Using the Prediction Model on the Potential Customer Database

In this step, the trained random forest model is applied to a new dataset of potential customer leads. The goal is to identify which homeowners are most likely to book an energy consultation. By assigning each lead a probability, the model helps prioritize outreach and saves the energy consultants time and effort.

In [18]:
import pandas as pd
import numpy as np
import joblib
from sklearn.preprocessing import OneHotEncoder

In [19]:
clf_loaded = joblib.load('random_forest_model.pkl')  # load the trained random forest model

In [20]:
df_whole = pd.read_excel('Potential_Customers.xlsx')  # load the potential customer leads

In [21]:
df_whole.head()
df_whole.shape

(1086, 23)

In [22]:
df_pred = df_whole.drop(columns=['name', 'email', 'phone_number'])

## One-Hot Encoding the DataFrame

In [23]:
df_pred_new = pd.get_dummies(df_pred, columns=['gender', 'occupation_status', 'house_type', 'location', 'energy_source', 'belief_climate_change', 'financial_awareness'])

In [24]:
df_pred_new.head()

| Unnamed: 0 | age | household_size | income | house_age | house_size | energy_bill | knowledge_energy | energy_awareness | attitude_energy_reduction | investment_willingness | ... | house_type_Detached | house_type_Multi-family House | location_Rural | location_Urban | energy_source_Non-renewable sources | energy_source_Renewable sources | belief_climate_change_No | belief_climate_change_Yes | financial_awareness_No | financial_awareness_Yes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 20 | 3 | 91200 | 2019 | 200 | 141 | 3 | 5 | 2 | 2 | ... | True | False | True | False | True | False | False | True | False | True |
| 1 | 23 | 4 | 59400 | 1950 | 200 | 129 | 4 | 1 | 1 | 2 | ... | True | False | True | False | True | False | True | False | False | True |
| 2 | 44 | 3 | 40800 | 1952 | 200 | 141 | 5 | 2 | 1 | 4 | ... | True | False | True | False | False | True | False | True | False | True |
| 3 | 64 | 1 | 84600 | 1964 | 100 | 126 | 2 | 1 | 4 | 2 | ... | True | False | False | True | False | True | False | True | True | False |
| 4 | 69 | 3 | 1278 | 1909 | 90 | 133 | 5 | 2 | 3 | 3 | ... | False | True | True | False | True | False | True | False | False | True |


In [25]:
predictions = clf_loaded.predict(df_pred_new)

In [26]:
probabilities = clf_loaded.predict_proba(df_pred_new)[:, 1]
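Selecting column `1` of `predict_proba` assumes that the positive class ("books a consultation") is the second entry in the model's `classes_` attribute. As a hedged sketch, the column index can instead be looked up explicitly. The toy data and the assumption that the positive class is encoded as `1` are illustrative only; the original notebook does not state how the labels are encoded.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Toy data (hypothetical): one feature, binary target encoded as 0/1
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([0, 0, 0, 1, 1, 1])
clf = RandomForestClassifier(n_estimators=20, random_state=0).fit(X, y)

positive_label = 1  # assumption: 1 means "booked a consultation"

# Columns of predict_proba follow the order of clf.classes_
pos_idx = list(clf.classes_).index(positive_label)
probabilities = clf.predict_proba(X)[:, pos_idx]

print(clf.classes_, pos_idx)
```

If the lookup returns an index other than `1`, the hard-coded `[:, 1]` would be reporting the probability of the wrong class.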

In [28]:
df_whole['prediction'] = predictions
df_whole['probability'] = probabilities

In [29]:
# index=False avoids writing the row index as an extra "Unnamed: 0" column
df_whole.to_excel('Potential_Customers_List.xlsx', index=False)
df_whole.head()

This list contains the predicted probability of each lead booking an energy consultation. You can now either define a threshold for classification or simply sort the list (e.g., from highest to lowest probability) and contact leads accordingly.
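Sorting by probability can also be done in pandas before exporting. The snippet below is a small sketch with made-up leads; the lead names and the cut-off of two leads are hypothetical.

```python
import pandas as pd

# Hypothetical leads with model probabilities (illustrative values only)
leads = pd.DataFrame({
    "name": ["Lead A", "Lead B", "Lead C"],
    "probability": [0.35, 0.91, 0.62],
})

# Highest probability first, with a fresh 0..n-1 index for the ranking
ranked = leads.sort_values("probability", ascending=False).reset_index(drop=True)

# Optionally keep only the top N leads for the first outreach round
top_leads = ranked.head(2)
print(ranked)
```

Applied to the notebook's data, the same `sort_values` call on `df_whole` would place the most promising leads at the top of the exported list.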

### Alternative: Filtering by Threshold

A threshold can also be applied directly in code to filter and export only the most promising leads. The right threshold depends on the goal: a lower threshold favors completeness (more leads are included), while a higher threshold favors precision (only the most likely cases remain).
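To get a feel for this trade-off, it can help to count how many leads survive at a few candidate thresholds before committing to one. This is only an illustrative sketch: the probabilities and the candidate thresholds are made up, and without known outcomes it shows list size rather than actual precision.

```python
import pandas as pd

# Hypothetical predicted probabilities for seven leads
probs = pd.Series([0.95, 0.82, 0.71, 0.64, 0.55, 0.40, 0.12])

# Count the leads kept at each candidate threshold
kept_per_threshold = {}
for threshold in (0.5, 0.65, 0.8):
    kept_per_threshold[threshold] = int((probs >= threshold).sum())

for threshold, kept in kept_per_threshold.items():
    print(f"threshold {threshold}: {kept} of {len(probs)} leads kept")
```

A stricter threshold shortens the contact list, which suits teams with limited outreach capacity; a looser one keeps more borderline leads in play.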

In [30]:
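# Keep only leads predicted as positive with a probability of at least 0.8
filtered_df = df_whole[(df_whole['prediction'] == True) & (df_whole['probability'] >= 0.8)]
# To keep every positively predicted lead instead, drop the probability condition:
# filtered_df = df_whole[df_whole['prediction'] == True]
filtered_df.head()  # preview the filtered DataFrame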

In [31]:
filtered_df.to_excel('Target_Customers_.xlsx', index=False)