# MODELLING OVERVIEW: Investment Type Recommender System

## Objectives

- **Analysis-Based**  
  Understand investment behaviors among Kenyan users and segment them based on patterns.


- **Modeling-Based**  
  Build and evaluate recommender models, including:
  - Content-based filtering
  - Hybrid approaches (clustering + classification)




In [54]:

# import libraries
import warnings

import pandas as pd

# Silence noisy library warnings in the notebook output
warnings.filterwarnings("ignore", category=FutureWarning)
warnings.filterwarnings("ignore", category=DeprecationWarning)

In [55]:
# load data

file_path = "C:/Users/hp/Documents/Group4_Capstone_Final_Project/final_refined.csv"

invest_df = pd.read_csv(file_path)
invest_df.head()

| Unnamed: 0 | householdid | county | area_type | gender | age_of_respondent | no_of_household_mebers | livelihoodcat | Quintiles | Education | Marital | ... | insurance_including_NHIF_use | All_Insurance_excluding_NHIF_use | PWD | Latitude | Longitude | has_account | has_savings | has_credit | has_mobile | receives_remittance |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 107141431 | garissa | urban | male | 29 | 5 | dependent | fourth | tertiary | married/living with partner | ... | never used | never used | without disability | -0.435423 | 39.636586 | 0 | 0 | 0 | 0 | 0 |
| 1 | 10712933 | garissa | urban | male | 60 | 11 | other | second | primary | married/living with partner | ... | never used | never used | without disability | 0.058794 | 40.305006 | 0 | 0 | 0 | 0 | 0 |
| 2 | 140173183 | busia | urban | female | 35 | 2 | casual worker | fourth | primary | divorced/separated | ... | never used | never used | without disability | 0.636836 | 34.27739 | 0 | 0 | 0 | 0 | 0 |
| 3 | 122137153 | kiambu | urban | male | 24 | 1 | casual worker | middle | secondary | single/never married | ... | never used | never used | without disability | -1.251917 | 36.719076 | 0 | 0 | 0 | 0 | 0 |
| 4 | 121193116 | murang'a | urban | female | 20 | 1 | dependent | highest | secondary | single/never married | ... | never used | never used | without disability | -0.79582 | 37.131085 | 0 | 0 | 0 | 0 | 0 |


# **Investment Modelling Pipeline**
We emulate an object-oriented approach with the **class** `InvestmentPipeline`, defining the structure of the pipeline before calling it.



In [56]:
class InvestmentPipeline:
    def __init__(self):
        # Initialize models, scalers, configs, etc.
        pass

    def preprocess_data(self, df):
        """
        General preprocessing: handle missing values, encode, scale, etc.
        """
        pass

    def prepare_transactions(self, df):
        """
        Converts binary features (0/1) into boolean format for association rule mining.
        Assumes input features are already numeric and binary.
        """
        df_bool = df.copy()

        # Identify binary columns (only 0 and 1 values)
        binary_cols = [col for col in df_bool.columns if set(df_bool[col].dropna().unique()) <= {0, 1}]
        df_bool = df_bool[binary_cols]

        # Convert to boolean
        df_bool = df_bool.astype(bool)

        return df_bool

    def mine_association_rules(self, df_bool, min_support=0.1, min_confidence=0.5):
        """
        Mine frequent itemsets and extract association rules.
        """
        pass

    def cluster_investors(self, df, n_clusters=3):
        """
        Apply clustering (e.g., K-Means) to segment investors.
        """
        pass

    def recommend_investments(self, investor_profile):
        """
        Recommend investments based on rules or cluster profiles.
        """
        pass

    def explain_model(self, model, X):
        """
        Use SHAP or PCA to visualize and interpret model decisions.
        """
        pass

## **Feature Separation**
### Preprocesses the dataset:
- Imputes and scales numeric features
- Imputes and encodes categorical features
- Returns a transformed DataFrame

In [57]:
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.impute import SimpleImputer
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline as SKPipeline

class InvestmentPipeline:
    def __init__(self):
        # Define preprocessing for numeric and categorical features
        self.numeric_transformer = SKPipeline(steps=[
            ('imputer', SimpleImputer(strategy='mean')),
            ('scaler', StandardScaler())
        ])

        self.categorical_transformer = SKPipeline(steps=[
            ('imputer', SimpleImputer(strategy='most_frequent')),
            ('encoder', OneHotEncoder(handle_unknown='ignore'))
        ])

        self.preprocessor = None  # Will be set after fitting

    def preprocess_data(self, invest_df):
        
        invest_df_clean = invest_df.copy()

        # Step 1: Identify column types
        numeric_cols = invest_df_clean.select_dtypes(include=['int64', 'float64']).columns.tolist()
        categorical_cols = invest_df_clean.select_dtypes(include=['object', 'category']).columns.tolist()

        # Step 2: Create column transformer
        # sparse_threshold=0 forces dense output, which pd.DataFrame needs in Step 5
        self.preprocessor = ColumnTransformer(transformers=[
            ('num', self.numeric_transformer, numeric_cols),
            ('cat', self.categorical_transformer, categorical_cols)
        ], sparse_threshold=0)

        # Step 3: Fit and transform
        invest_df_transformed = self.preprocessor.fit_transform(invest_df_clean)

        # Step 4: Get feature names
        num_features = numeric_cols
        cat_features = self.preprocessor.named_transformers_['cat']['encoder'].get_feature_names_out(categorical_cols)
        all_features = list(num_features) + list(cat_features)

        # Step 5: Return as DataFrame
        return pd.DataFrame(invest_df_transformed, columns=all_features)

In [62]:
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
import pandas as pd

class InvestmentPipeline:
    def __init__(self):
        # Initialize transformers
        self.numeric_transformer = Pipeline(steps=[
            ('imputer', SimpleImputer(strategy='mean')),
            ('scaler', StandardScaler())
        ])

        self.categorical_transformer = Pipeline(steps=[
            ('imputer', SimpleImputer(strategy='most_frequent')),
            ('encoder', OneHotEncoder(handle_unknown='ignore'))
        ])

        self.preprocessor = None  # Will be defined after fitting

    def preprocess_data(self, invest_df):
        """
        Preprocess the raw investment data:
        - Impute missing values
        - Encode categorical features
        - Scale numeric features
        """
        invest_df_clean = invest_df.copy()

        # Separate numeric and categorical columns
        numeric_cols = invest_df_clean.select_dtypes(include=['int64', 'float64']).columns.tolist()
        categorical_cols = invest_df_clean.select_dtypes(include=['object', 'category']).columns.tolist()

        # Combine transformers
        # sparse_threshold=0 forces dense output, which pd.DataFrame needs below
        self.preprocessor = ColumnTransformer(transformers=[
            ('num', self.numeric_transformer, numeric_cols),
            ('cat', self.categorical_transformer, categorical_cols)
        ], sparse_threshold=0)

        # Apply transformations
        invest_df_processed = self.preprocessor.fit_transform(invest_df_clean)

        # Get feature names
        num_features = numeric_cols
        cat_features = self.preprocessor.named_transformers_['cat']['encoder'].get_feature_names_out(categorical_cols)
        all_features = list(num_features) + list(cat_features)

        # Return as DataFrame
        return pd.DataFrame(invest_df_processed, columns=all_features)

    def prepare_transactions(self, invest_df):
        """
        Converts binary features (0/1) into boolean format for association rule mining.
        Assumes input features are already numeric and binary.
        """
        invest_df_bool = invest_df.copy()

        # Identify binary columns (only 0 and 1 values)
        binary_cols = [col for col in invest_df_bool.columns if set(invest_df_bool[col].dropna().unique()) <= {0, 1}]
        invest_df_bool = invest_df_bool[binary_cols]

        # Convert to boolean
        invest_df_bool = invest_df_bool.astype(bool)

        return invest_df_bool

    def mine_association_rules(self, invest_df_bool, min_support=0.1, min_confidence=0.5):
        """
        Mine frequent itemsets and extract association rules.
        """
        pass

    def cluster_investors(self, invest_df, n_clusters=3):
        """
        Apply clustering (e.g., K-Means) to segment investors.
        """
        pass

    def recommend_investments(self, investor_profile):
        """
        Recommend investments based on rules or cluster profiles.
        """
        pass

    def explain_model(self, model, X):
        """
        Use SHAP or PCA to visualize and interpret model decisions.
        """
        pass
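
The `cluster_investors` method is still a stub at this point. As a minimal sketch of how the K-Means option named in its docstring could be filled in, the example below clusters an all-numeric feature table such as the one `preprocess_data` returns. It is written as a standalone function rather than a method; the synthetic `toy_features` table, the column names `feature_a` and `feature_b`, and the silhouette check are assumptions added for illustration and do not come from the notebook.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def cluster_investors(features, n_clusters=3, random_state=42):
    """Fit K-Means on an all-numeric table; return the labels and the fitted model."""
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
    labels = model.fit_predict(features)
    return pd.Series(labels, index=features.index, name="cluster"), model


# Synthetic stand-in for scaled features (three well-separated groups, not survey data).
rng = np.random.default_rng(0)
toy_features = pd.DataFrame(
    np.vstack([rng.normal(loc=center, scale=0.3, size=(30, 2)) for center in (-3, 0, 3)]),
    columns=["feature_a", "feature_b"],
)

labels, model = cluster_investors(toy_features, n_clusters=3)
print(labels.value_counts().sort_index())
print("silhouette:", round(silhouette_score(toy_features, labels), 3))
```

On real data the number of clusters would need to be chosen by comparing several values of `n_clusters`, for example with the silhouette score shown here.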

In [63]:
pipeline = InvestmentPipeline()

# Step 1: Preprocess your raw investment data
invest_df_scaled = pipeline.preprocess_data(invest_df)

# Step 2: Inspect the result
print(invest_df_scaled.head())

   householdid  age_of_respondent  no_of_household_mebers  CalcExpenditure  \
0    -0.245766          -0.596843                0.312049         3.293907   
1    -0.559097           1.204477                2.700016        -0.021234   
2    -0.138433          -0.248200               -0.881934         0.969329   
3    -0.197039          -0.887378               -1.279928        -0.484404   
4    -0.200107          -1.119806               -1.279928         0.222393   

   total_monthly_expenditure  no_respodent_per_hh  hhWeight  \
0                   4.096836            -0.591811 -0.308792   
1                   0.425916             1.207526 -0.914595   
2                   0.066022            -0.243553 -0.456369   
3                  -0.401840            -0.882027  2.757352   
4                  -0.509808            -1.114200 -0.348454   

   Informal_group_membership  Above16_Total  Above16  ...  \
0                  -0.160662      -0.096354      0.0  ...   
1                  -0.160662  

## **Add an Association Rule Mining Method to the Class**
### We extend the existing pipeline, using `mlxtend` for rule extraction and `networkx` and `matplotlib` for visualizations.
### Install the required libraries.

In [None]:
%pip install mlxtend networkx matplotlib seaborn


## Update the Pipeline Class with the **prepare_transactions()** Method for Rule Mining
### This method preprocesses our data into a transactional format suitable for `Apriori` by converting binary financial features into boolean format.

In [None]:
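def prepare_transactions(self, invest_df):
    """
    Converts binary features (0/1) into boolean format for association rule mining.
    """
    invest_df_bool = invest_df.copy()

    # Only keep binary columns (0/1); NaNs are ignored when checking the values
    binary_cols = [col for col in invest_df_bool.columns if set(invest_df_bool[col].dropna().unique()) <= {0, 1}]
    invest_df_bool = invest_df_bool[binary_cols]

    # Convert to boolean
    invest_df_bool = invest_df_bool.astype(bool)

    return invest_df_bool

# A function defined at module level is not part of the class. Attach it explicitly;
# otherwise the call in the next cell fails with:
# AttributeError: 'InvestmentPipeline' object has no attribute 'prepare_transactions'
# (re-create `pipeline` afterwards if it was built from an older class definition)
InvestmentPipeline.prepare_transactions = prepare_transactions

In [None]:
# Standardized numeric columns are generally no longer 0/1, so mostly the
# one-hot encoded columns pass the binary filter here.
invest_df_bool = pipeline.prepare_transactions(invest_df_scaled)
print(invest_df_bool.head())
print(invest_df_bool.dtypes)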

## **Extract Rules with Apriori**
### Add this method to your class. It mines frequent itemsets and association rules.
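
The notebook does not show the body of this method. As a rough illustration of what the step computes, the sketch below derives support, confidence, and lift for single-item rules using plain pandas. It is a simplified stand-in, not the level-wise Apriori search that `mlxtend` provides. The function name `mine_pairwise_rules`, the output column names, and the toy data are assumptions made for this example.

```python
from itertools import permutations

import pandas as pd


def mine_pairwise_rules(df_bool, min_support=0.1, min_confidence=0.5):
    """Return single-item rules A -> B that pass both thresholds."""
    item_support = df_bool.mean()  # share of rows where each item is True
    frequent_items = item_support[item_support >= min_support].index

    rows = []
    for antecedent, consequent in permutations(frequent_items, 2):
        # support(A and B): share of rows where both items are True
        pair_support = (df_bool[antecedent] & df_bool[consequent]).mean()
        if pair_support < min_support:
            continue
        # confidence(A -> B) = support(A and B) / support(A)
        confidence = pair_support / item_support[antecedent]
        if confidence < min_confidence:
            continue
        rows.append({
            "antecedent": antecedent,
            "consequent": consequent,
            "support": pair_support,
            "confidence": confidence,
            # lift(A -> B) = confidence(A -> B) / support(B)
            "lift": confidence / item_support[consequent],
        })
    return pd.DataFrame(
        rows, columns=["antecedent", "consequent", "support", "confidence", "lift"]
    )


# Toy data for illustration only; the column names mirror the notebook's binary flags.
toy = pd.DataFrame({
    "has_account": [1, 1, 1, 0, 1, 0],
    "has_savings": [1, 1, 0, 0, 1, 0],
    "has_credit": [0, 1, 0, 0, 0, 1],
}).astype(bool)

rules = mine_pairwise_rules(toy, min_support=0.3, min_confidence=0.6)
print(rules)
```

A lift above 1 indicates that the two items occur together more often than they would if they were independent, which is usually the first thing to check when ranking rules.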
    


## **Visualize Rules as a Graph**
### This helps stakeholders see how financial behaviours relate.
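
One possible way to draw such a graph with `networkx` and `matplotlib` is sketched below: each rule becomes a directed edge from antecedent to consequent, with edge width scaled by confidence. The helper names `build_rule_graph` and `draw_rule_graph`, the rules-table layout (`antecedent`, `consequent`, `confidence`, `lift`), and the toy values are assumptions for illustration, not output from the notebook.

```python
import matplotlib.pyplot as plt
import networkx as nx
import pandas as pd


def build_rule_graph(rules):
    """Turn a rules table into a directed graph: antecedent -> consequent."""
    graph = nx.DiGraph()
    for row in rules.itertuples(index=False):
        graph.add_edge(row.antecedent, row.consequent,
                       confidence=row.confidence, lift=row.lift)
    return graph


def draw_rule_graph(graph):
    """Draw the graph with edge width proportional to confidence."""
    fig, ax = plt.subplots(figsize=(6, 4))
    pos = nx.spring_layout(graph, seed=42)  # fixed seed for a repeatable layout
    widths = [3 * data["confidence"] for _, _, data in graph.edges(data=True)]
    nx.draw_networkx(graph, pos, ax=ax, node_color="lightblue",
                     node_size=2500, font_size=8, width=widths, arrows=True)
    edge_labels = {(u, v): f"{d['confidence']:.2f}"
                   for u, v, d in graph.edges(data=True)}
    nx.draw_networkx_edge_labels(graph, pos, edge_labels=edge_labels, ax=ax)
    ax.set_axis_off()
    return fig


# Hypothetical rules table; the values are made up for illustration.
toy_rules = pd.DataFrame({
    "antecedent": ["has_mobile", "has_account", "has_mobile"],
    "consequent": ["has_account", "has_savings", "has_savings"],
    "confidence": [0.80, 0.75, 0.60],
    "lift": [1.2, 1.5, 1.1],
})

graph = build_rule_graph(toy_rules)
fig = draw_rule_graph(graph)
```

In a notebook the returned figure renders inline at the end of the cell; edge labels show the confidence of each rule.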