# Recommendation system using predictive modelling

- Leverage patterns in your existing data to make predictions or suggestions for new or existing entities.

- This notebook takes a content-based filtering approach to building the recommendation system.

- Supervised learning can serve as a form of content-based filtering: features of the business (such as transaction volume and business duration) are used to predict certain outcomes or to group similar businesses.
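The "group similar businesses" idea can be sketched as a similarity lookup over the encoded features. This is only an illustration under assumptions, not the method used in this notebook: the small `businesses` frame, its values, and the helper `most_similar` are made up for the example, and cosine similarity on standardized features is one assumed choice among several.

```python
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.preprocessing import StandardScaler

# Hypothetical encoded features for four businesses (illustrative values only).
businesses = pd.DataFrame(
    {
        "business duration (years)": [10.6, 14.6, 3.2, 11.0],
        "business sector_Agriculture": [1, 0, 0, 1],
        "business sector_IT": [0, 0, 1, 0],
        "transaction range_0 - 2000": [1, 1, 0, 1],
    },
    index=["A", "B", "C", "D"],
)

# Standardize so the duration column does not dominate the 0/1 columns.
scaled = StandardScaler().fit_transform(businesses)
similarity = pd.DataFrame(
    cosine_similarity(scaled), index=businesses.index, columns=businesses.index
)


def most_similar(name, k=2):
    """Return the k businesses most similar to `name`, excluding itself."""
    return similarity[name].drop(name).nlargest(k).index.tolist()


print(most_similar("A"))
```

Businesses that come back as neighbours could then be used as a source of suggestions, for example by looking at what the better-performing neighbours have in common.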




## Data Preparation
The target variable is 'performance index' (the 'Normalized Performance Index' column in the CSV).

In [41]:
import pandas as pd

encoded_data = pd.read_csv('encoded_data.csv')
encoded_data.head(2)

Unnamed: 0,Customer Name,uid,business start date,business duration (years),Normalized Performance Index,business sector_Agriculture,business sector_Fisheries,business sector_Hospitality,business sector_IT,business sector_Manufacturing,...,education level_Vocational,duration range_0 - 5 years,duration range_5 - 10 years,duration range_10 - 15 years,duration range_15 - 20 years,transaction range_0 - 2000,transaction range_2000 - 4000,transaction range_4000 - 6000,transaction range_6000 - 8000,transaction range_8000 - 11000
0,James William,cca67301-ef3c-4496-b8fd-31847b7edd88,2013-02-25,10.63,0.183499,1,0,0,0,0,...,0,0,0,1,0,1,0,0,0,0
1,James Nansukusa,7a1c5730-b8e7-4774-8746-90a317b73a82,2009-02-22,14.64,0.156887,0,0,0,0,0,...,0,0,0,1,0,1,0,0,0,0
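The column names above (for example 'business sector_Agriculture') follow the pattern that one-hot encoding with pandas produces. The sketch below shows how such columns could be generated; the `raw` frame and its values are hypothetical, since the pre-encoding data is not part of this notebook, and `pd.get_dummies` is an assumed choice of encoder.

```python
import pandas as pd

# Hypothetical raw rows; the real pre-encoding data is not shown in this notebook.
raw = pd.DataFrame(
    {
        "business duration (years)": [10.63, 14.64],
        "business sector": ["Agriculture", "Services"],
        "gender": ["Male", "Female"],
    }
)

# get_dummies names each new column "<original column>_<category>".
encoded = pd.get_dummies(raw, columns=["business sector", "gender"], dtype=int)
print(encoded.columns.tolist())
```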


### Drop columns not needed for ML

In [42]:
encoded_data_cleaned = encoded_data.drop(columns = ['Customer Name', 'uid', 'business start date'])
encoded_data_cleaned.head(2)

Unnamed: 0,business duration (years),Normalized Performance Index,business sector_Agriculture,business sector_Fisheries,business sector_Hospitality,business sector_IT,business sector_Manufacturing,business sector_Retail,business sector_Services,location_Bwindi,...,education level_Vocational,duration range_0 - 5 years,duration range_5 - 10 years,duration range_10 - 15 years,duration range_15 - 20 years,transaction range_0 - 2000,transaction range_2000 - 4000,transaction range_4000 - 6000,transaction range_6000 - 8000,transaction range_8000 - 11000
0,10.63,0.183499,1,0,0,0,0,0,0,0,...,0,0,0,1,0,1,0,0,0,0
1,14.64,0.156887,0,0,0,0,0,0,1,0,...,0,0,0,1,0,1,0,0,0,0


In [43]:
encoded_data_cleaned.rename(columns={'Normalized Performance Index': 'performance index'}, inplace=True)
customer_data_ml = encoded_data_cleaned.copy()
customer_data_ml.head()

Unnamed: 0,business duration (years),performance index,business sector_Agriculture,business sector_Fisheries,business sector_Hospitality,business sector_IT,business sector_Manufacturing,business sector_Retail,business sector_Services,location_Bwindi,...,education level_Vocational,duration range_0 - 5 years,duration range_5 - 10 years,duration range_10 - 15 years,duration range_15 - 20 years,transaction range_0 - 2000,transaction range_2000 - 4000,transaction range_4000 - 6000,transaction range_6000 - 8000,transaction range_8000 - 11000
0,10.63,0.183499,1,0,0,0,0,0,0,0,...,0,0,0,1,0,1,0,0,0,0
1,14.64,0.156887,0,0,0,0,0,0,1,0,...,0,0,0,1,0,1,0,0,0,0
2,14.68,0.156148,0,1,0,0,0,0,0,0,...,1,0,0,1,0,1,0,0,0,0
3,7.49,0.136193,0,0,0,0,1,0,0,0,...,0,0,1,0,0,1,0,0,0,0
4,10.58,0.135615,0,0,1,0,0,0,0,0,...,0,0,0,1,0,1,0,0,0,0


## Train ML model

In [44]:
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

# Features: every column except the target (columns= already implies axis=1)
X = customer_data_ml.drop(columns=['performance index'])
y = customer_data_ml['performance index']

# Splitting data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train a Random Forest Regressor
model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
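A single 80/20 split gives only one estimate of the error. A k-fold cross-validation run is one way to see how much that estimate varies. The sketch below uses synthetic data from `make_regression` as a stand-in, and the names `X_demo`, `y_demo`, and `cv_model` are assumptions for the example; with the real data you would pass `X` and `y` instead.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for X and y (not the notebook's data).
X_demo, y_demo = make_regression(
    n_samples=200, n_features=10, noise=0.1, random_state=42
)

cv_model = RandomForestRegressor(n_estimators=100, random_state=42)

# scikit-learn scorers follow "higher is better", so MSE is returned negated.
neg_mse = cross_val_score(
    cv_model, X_demo, y_demo, cv=5, scoring="neg_mean_squared_error"
)
cv_mse = -neg_mse

print(f"MSE per fold: {cv_mse}")
print(f"Mean MSE: {cv_mse.mean():.4f} (std {cv_mse.std():.4f})")
```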

## Test the Model

In [45]:
# Predict on the test set
y_pred = model.predict(X_test)

# Calculate and print the mean squared error
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error: {mse}")

Mean Squared Error: 0.0016473168232350368
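MSE is expressed in squared units of the target, so it is hard to judge on its own. Taking the square root of the value printed above gives an RMSE of roughly 0.041, which can be compared directly with the performance index values (the five sample rows shown earlier range from about 0.14 to 0.18). It also helps to compare against a trivial baseline that always predicts the mean. The sketch below shows these checks on made-up arrays; `y_true` and `y_hat` are hypothetical stand-ins for `y_test` and `y_pred`.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Hypothetical true and predicted values; in the notebook use y_test and y_pred.
y_true = np.array([0.18, 0.16, 0.15, 0.14, 0.13])
y_hat = np.array([0.17, 0.15, 0.16, 0.13, 0.14])

mse = mean_squared_error(y_true, y_hat)
rmse = np.sqrt(mse)  # same units as the target

# Baseline: always predict the mean of the targets.
baseline_pred = np.full_like(y_true, y_true.mean())
baseline_mse = mean_squared_error(y_true, baseline_pred)

print(f"RMSE: {rmse:.4f}")
print(f"Baseline MSE: {baseline_mse:.6f}, model MSE: {mse:.6f}")
print(f"R^2: {r2_score(y_true, y_hat):.3f}")
```

With the real split, the baseline would normally use the mean of `y_train` rather than the test targets, so that it sees the same information as the model.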


In [46]:
# Feature importance
feature_importances = model.feature_importances_
features = X.columns
importance_df = pd.DataFrame({'feature': features, 'importance': feature_importances})
importance_df = importance_df.sort_values(by='importance', ascending=False)
print(importance_df)

                           feature  importance
0        business duration (years)    0.345042
28      transaction range_0 - 2000    0.254606
29   transaction range_2000 - 4000    0.087380
5    business sector_Manufacturing    0.020411
9                 location_Entebbe    0.017480
23      education level_Vocational    0.017305
22      education level_University    0.016739
20         education level_Primary    0.016198
4               business sector_IT    0.016059
8                  location_Bwindi    0.015953
19                     gender_Male    0.015864
18                   gender_Female    0.014764
14                 location_Kisoro    0.014306
2        business sector_Fisheries    0.012648
21       education level_Secondary    0.012320
1      business sector_Agriculture    0.012090
3      business sector_Hospitality    0.011513
16                  location_Mbale    0.010330
7         business sector_Services    0.009991
11                   location_Gulu    0.009816
13           
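Because the categorical variables are one-hot encoded, their importance is spread over several columns. Summing the importances per original variable can make the ranking easier to read. The sketch below does this on an excerpt of the printed output (the top rows only, so the totals are partial); `importance_demo` is a hypothetical stand-in for `importance_df`, and splitting on "_" assumes that the original variable names contain no underscores.

```python
import pandas as pd

# Excerpt of the importance table, values copied from the printed output.
importance_demo = pd.DataFrame(
    {
        "feature": [
            "business duration (years)",
            "transaction range_0 - 2000",
            "transaction range_2000 - 4000",
            "business sector_Manufacturing",
            "business sector_IT",
        ],
        "importance": [0.345042, 0.254606, 0.087380, 0.020411, 0.016059],
    }
)

# One-hot columns are named "<variable>_<category>"; names without "_" are kept whole.
importance_demo["variable"] = importance_demo["feature"].str.split("_").str[0]

grouped = (
    importance_demo.groupby("variable")["importance"]
    .sum()
    .sort_values(ascending=False)
)
print(grouped)
```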

## Predicting the performance index for a new business

In [47]:
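# Predict the performance index for a new business.
# new_business_data must be a one-row DataFrame with the same columns, in the same order, as X.
# The values below are hypothetical placeholders; replace them with the new business's real data
# (including its sector, location, gender, and education level columns).
new_business_data = pd.DataFrame(0, index=[0], columns=X.columns)
new_business_data['business duration (years)'] = 3.0
new_business_data['duration range_0 - 5 years'] = 1
new_business_data['transaction range_0 - 2000'] = 1

predicted_performance = model.predict(new_business_data)
print(f"Predicted Performance Index: {predicted_performance[0]}")

# Feature importance can guide tailored recommendations.
# For example, 'transaction range_0 - 2000' ranks second in importance,
# so a business in that lowest range might be advised:
if new_business_data['transaction range_0 - 2000'].iloc[0] == 1:
    print("We recommend actions to increase your transaction volume.")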
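Feature importance says which inputs the model relies on, but not in which direction a change moves the prediction. One way to make a recommendation more concrete is a what-if comparison: predict once for the business as it is, and once with a single feature changed. The sketch below is an illustration under assumptions. The data is synthetic, the relationship between transaction range and performance is invented for the demo, and `what_if`, `demo_model`, `X_demo`, and `y_demo` are hypothetical names.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in for the training data; column names mirror three notebook features.
rng = np.random.default_rng(42)
n = 300
low_volume = rng.integers(0, 2, size=n)
X_demo = pd.DataFrame(
    {
        "business duration (years)": rng.uniform(0, 20, size=n),
        "transaction range_0 - 2000": low_volume,
        "transaction range_2000 - 4000": 1 - low_volume,
    }
)
# Assumed relationship, for the demo only: low-volume businesses score lower.
y_demo = 0.2 - 0.05 * low_volume + rng.normal(0, 0.005, size=n)

demo_model = RandomForestRegressor(n_estimators=50, random_state=42)
demo_model.fit(X_demo, y_demo)


def what_if(model, row, changes):
    """Return (current, hypothetical) predictions for a one-row DataFrame."""
    altered = row.copy()
    for column, value in changes.items():
        altered[column] = value
    return model.predict(row)[0], model.predict(altered)[0]


# A new business currently in the lowest transaction range.
new_business = X_demo.iloc[[0]].copy()
new_business["transaction range_0 - 2000"] = 1
new_business["transaction range_2000 - 4000"] = 0

current, hypothetical = what_if(
    demo_model,
    new_business,
    {"transaction range_0 - 2000": 0, "transaction range_2000 - 4000": 1},
)
print(f"Current: {current:.3f}, one transaction range higher: {hypothetical:.3f}")
```

The difference between the two predictions reflects associations the model learned, not a causal effect, so it is best read as a hint about where to look rather than a guaranteed improvement.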
