<font size="+3"><strong>Farm Relief Customer Segmentation</strong></font>

This project focuses on buyers and sellers of farm produce in Nigeria (and neighbouring countries). We will examine some demographic characteristics of the group, such as age, income bracket and occupation. Then we will select five features and build a clustering model to divide consumers into subgroups. Finally, we will create visualizations to highlight the differences between these subgroups.

We will work with data from the [Farm Relief](https://forms.gle/nH7jb1RyWptCrzSV8) Google Form. The form collects demographic, financial and opinion information from individuals in Nigeria. The survey opened a month ago and is still ongoing.

# Objective

The aim of this project is to use unsupervised learning, specifically clustering, to segment customers.

We will:

- Compare characteristics across subgroups using a side-by-side bar chart.
- Build a k-means clustering model.
- Conduct feature selection for clustering.
- Reduce high-dimensional data using principal component analysis (PCA).
- Design, build and deploy a Dash web application.
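As a rough illustration of how the modelling steps in this list fit together, here is a minimal sketch on synthetic numeric data. The data, the column names (`feature_0` ... `feature_7`) and the choice of three clusters are assumptions made for the example only; the survey responses are mostly categorical and would need encoding first.

```python
import numpy as np
import pandas as pd
from scipy.stats.mstats import trimmed_var
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic numeric data, purely illustrative (not the survey responses).
rng = np.random.default_rng(42)
X = pd.DataFrame(
    rng.normal(size=(60, 8)),
    columns=[f"feature_{i}" for i in range(8)],
)
X.iloc[:20, :3] += 4     # shift some rows to create loose groups
X.iloc[20:40, 3:5] -= 4

# Feature selection: keep the five features with the largest trimmed variance.
top_five = X.apply(trimmed_var).sort_values().tail(5).index.tolist()

# Clustering: scale the features, then fit k-means (3 clusters is an assumption).
model = make_pipeline(
    StandardScaler(),
    KMeans(n_clusters=3, n_init=10, random_state=42),
)
model.fit(X[top_five])
labels = model.named_steps["kmeans"].labels_

# Dimensionality reduction: project onto two principal components for plotting.
pca = PCA(n_components=2, random_state=42)
X_pca = pd.DataFrame(pca.fit_transform(X[top_five]), columns=["PC1", "PC2"])
X_pca["cluster"] = labels.astype(str)
print(X_pca.head())
```

The `X_pca` frame could then be passed to a scatter plot coloured by `cluster`; the number of clusters would normally be chosen by comparing inertia or silhouette scores rather than fixed in advance.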

- Libraries

In [1]:
import pandas as pd

import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px

from scipy.stats.mstats import trimmed_var
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

from dash import Input, Output, dcc, html
from jupyter_dash import JupyterDash
JupyterDash.infer_jupyter_proxy_config()

# Data Preparation

## Importing The File

In [2]:
df = pd.read_csv('FARM RELIEF-1.csv')
print("df shape:", df.shape)
df.head()

df shape: (49, 18)


Unnamed: 0,Timestamp,Age range,Gender,Occupation,Marital status,Monthly Household income,"Location e.g Lagos, Nigeria",Device type,Most preferred delivery method,Are you a farmer?,"If yes, What farming system do you have?",What farm produce do you buy or sell? (You can give a list),Would you like an online platform where you can buy or sell farm products and not feel cheated?,Do you know of any platform that provides you with the service mentioned above?,"If yes, please give a name",What features would you be looking forward to on such a digital platform?,What do you think is the daily average price you pay for the commodities you buy?,On what social media did you come across this survey?
0,2023/02/14 6:43:49 pm CET,18 - 25,Female,Education,Single,"#50,000 - #100,000","Modakeke, Nigeria",Android,,No,,Garri\nRice\nPepper\nOnion\nVegetables,Yes,No,,,,WhatsApp
1,2023/02/14 6:47:45 pm CET,50 and above,Female,Student,Married,"#100,000 - #200,000","Osun, Nigeria",Android,Home delivery,No,,"Pepper, Vegetables, Plantain, Charcoal, Yam fl...",Yes,No,,Prompt response and quality products at cheape...,"#10,000",WhatsApp
2,2023/02/14 6:56:23 pm CET,50 and above,Male,Farming,Married,"#100,000 - #200,000","Osun, Nigeria",Android,,Yes,Small Scale,Cocoa,Yes,No,,Much profit,,WhatsApp
3,2023/02/14 6:57:24 pm CET,18 - 25,Male,Student,Single,"< #50,000",Ekiti State,Android,,No,,,Yes,Maybe,,,,WhatsApp
4,2023/02/14 6:59:31 pm CET,26 - 35,Male,Freelance copywriter,Single,"#200,000 - #500,000","Ibadan, Nigeria",Android,Home delivery service.,No,,,Yes,No,,"Transparent pricing, convenience of use, and p...",Around 15k-25k on foodstuff monthly.,WhatsApp


## Data Cleaning

- change headers

In [3]:
df.head(1)

Unnamed: 0,Timestamp,Age range,Gender,Occupation,Marital status,Monthly Household income,"Location e.g Lagos, Nigeria",Device type,Most preferred delivery method,Are you a farmer?,"If yes, What farming system do you have?",What farm produce do you buy or sell? (You can give a list),Would you like an online platform where you can buy or sell farm products and not feel cheated?,Do you know of any platform that provides you with the service mentioned above?,"If yes, please give a name",What features would you be looking forward to on such a digital platform?,What do you think is the daily average price you pay for the commodities you buy?,On what social media did you come across this survey?
0,2023/02/14 6:43:49 pm CET,18 - 25,Female,Education,Single,"#50,000 - #100,000","Modakeke, Nigeria",Android,,No,,Garri\nRice\nPepper\nOnion\nVegetables,Yes,No,,,,WhatsApp


In [4]:
df.rename(columns={
    'Age range':'Age Range',
    'Marital status':'Marital Status',
    'Monthly Household income':'Monthly Household Income',
    'Location e.g Lagos, Nigeria':'Location',
    'Device type':'Device Type',
    'Most preferred delivery method':'Delivery Method',
    'Are you a farmer?':'Farmer',
    'If yes, What farming system do you have?':'Farming System',
    'What farm produce do you buy or sell? (You can give a list)':'Farm Produce',
    'Would you like an online platform where you can buy or sell farm products and not feel cheated?':'Online Platform',
    'Do you know of any platform that provides you with the service mentioned above?':'Alternative Platform Knowledge',
    'If yes, please give a name':'Platform Name',
    'What features would you be looking forward to on such a digital platform?':'Digital Platform Features',
    'What do you think is the daily average price you pay for the commodities you buy? ':'Daily Commodities Price',
    'On what social media did you come across this survey?':'Survey Accessed From'    
}, inplace=True)
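`rename` silently ignores keys that do not match an existing header exactly, so a stray space or a different capitalisation leaves a column unchanged. The sketch below shows one way to guard against that on a toy frame. The padded headers are an assumption for illustration and are not taken from the raw file.

```python
import pandas as pd

# Toy headers with stray whitespace (hypothetical, for illustration only).
raw = pd.DataFrame(columns=["Age range", "Marital status ", " Device type"])

mapping = {
    "Age range": "Age Range",
    "Marital status": "Marital Status",
    "Device type": "Device Type",
}

# Strip leading and trailing whitespace so the keys can match exactly.
raw.columns = raw.columns.str.strip()

# Report any mapping keys that still have no matching column.
unmatched = set(mapping) - set(raw.columns)
print("Unmatched keys:", unmatched)

renamed = raw.rename(columns=mapping)
print(list(renamed.columns))
```

Passing `errors="raise"` to `rename` is another option: it raises a `KeyError` when a key is missing instead of skipping it.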

- remove # from `Monthly Household Income` values

In [5]:
df['Monthly Household Income'] = (df['Monthly Household Income']
                   .str.replace('#','', regex=False)
                   #.str.replace(',', '')
                   #.astype(float)
                   )
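The commented-out `.astype(float)` step would not work on bracket strings such as `50,000 - 100,000`, because each value holds a range rather than one number. One possible alternative, sketched below, is to split each bracket into a lower and an upper bound. `bracket_bounds` is a hypothetical helper, and treating `< 50,000` as the range 0 to 50,000 is an assumption.

```python
import re

import pandas as pd


def bracket_bounds(value):
    """Return (low, high) for an income bracket string, or (None, None)."""
    if pd.isna(value):
        return (None, None)
    # Pull out every number, dropping the thousands separators.
    numbers = [float(n.replace(",", "")) for n in re.findall(r"\d[\d,]*", value)]
    if len(numbers) == 2:
        return (numbers[0], numbers[1])
    if len(numbers) == 1 and value.strip().startswith("<"):
        return (0.0, numbers[0])  # assumption: open-ended bracket starts at 0
    return (None, None)


# Sample values in the same shape as the cleaned column.
income = pd.Series(["50,000 - 100,000", "< 50,000", "200,000 - 500,000", None])
bounds = pd.DataFrame(
    income.map(bracket_bounds).tolist(),
    columns=["income_low", "income_high"],
)
print(bounds)
```

Keeping the brackets as an ordered categorical is an equally reasonable choice if the numeric bounds are not needed.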

- remove # from `Daily Commodities Price` values

In [6]:
df['Daily Commodities Price'] = (df['Daily Commodities Price']
                   .str.replace('#','', regex=False)
                   #.str.replace(',', '')
                   #.astype(float)
                   )
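This column mixes plain amounts such as `10,000` with free text such as `Around 15k-25k on foodstuff monthly.`, so a direct cast to float would raise an error. A minimal sketch, assuming unparseable answers can simply be treated as missing:

```python
import pandas as pd

# Sample answers in the same shape as the cleaned column.
price = pd.Series(["10,000", "Around 15k-25k on foodstuff monthly.", None])

# Drop thousands separators, then coerce anything non-numeric to NaN.
price_numeric = pd.to_numeric(
    price.str.replace(",", "", regex=False),
    errors="coerce",
)
print(price_numeric)
```

Free-text answers would need manual review or a dedicated parser if they are to be kept.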

- replace `\n` with a space in the `Farm Produce` values

In [7]:
df['Farm Produce'] = (df['Farm Produce']
                   .str.replace('\n',' ', regex=False)
                   )
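Replacing newlines with spaces makes the values easier to read, but it also merges separate items into one string. If per-item counts are needed later, one option is to split the raw answers on newlines and commas instead. The sketch below uses sample answers shaped like those in the preview; the title-casing step is an assumption about how to merge spelling variants.

```python
import pandas as pd

# Sample answers shaped like the raw column (before newline replacement).
produce = pd.Series([
    "Garri\nRice\nPepper\nOnion\nVegetables",
    "Pepper, Vegetables, Plantain",
    "Cocoa",
    None,
])

# Split on newlines or commas, then give each item its own row.
items = produce.str.split(r"[\n,]+", regex=True).explode().str.strip()
items = items.dropna()
items = items[items != ""]

# Normalise capitalisation before counting.
counts = items.str.title().value_counts()
print(counts)
```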

- missing values

In [9]:
df.isnull().sum()

Timestamp                         12
Age Range                          0
Gender                             0
Occupation                         7
Marital status                     9
Monthly Household Income           9
Location                           8
Device Type                        8
Delivery Method                   27
Farmer                             0
Farming System                    31
Farm Produce                      14
Online Platform                    4
Alternative Platform Knowledge     9
Platform Name                     42
Digital Platform Features         27
Daily Commodities Price           29
Survey Accessed From              11
dtype: int64

In [11]:
df.isnull().sum()/len(df)

Timestamp                         0.244898
Age Range                         0.000000
Gender                            0.000000
Occupation                        0.142857
Marital status                    0.183673
Monthly Household Income          0.183673
Location                          0.163265
Device Type                       0.163265
Delivery Method                   0.551020
Farmer                            0.000000
Farming System                    0.632653
Farm Produce                      0.285714
Online Platform                   0.081633
Alternative Platform Knowledge    0.183673
Platform Name                     0.857143
Digital Platform Features         0.551020
Daily Commodities Price           0.591837
Survey Accessed From              0.224490
dtype: float64

The **Delivery Method**, **Farming System**, **Platform Name**, **Digital Platform Features** and **Daily Commodities Price** columns are each missing in more than half of the 49 responses (between 55% and 86%).
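One common way to act on this is to drop columns whose missing share exceeds a threshold. The sketch below uses a toy frame; the 0.5 cutoff and the `ExampleMart` value are assumptions for illustration, not choices or data from the survey.

```python
import pandas as pd

# Toy data (hypothetical values) with one mostly-empty column.
toy = pd.DataFrame({
    "Gender": ["Female", "Male", "Male", "Female"],
    "Platform Name": [None, None, None, "ExampleMart"],
    "Occupation": ["Student", None, "Farming", "Student"],
})

# Share of missing values per column.
missing_share = toy.isnull().mean()

# Columns above the (assumed) 50% threshold.
sparse_cols = missing_share[missing_share > 0.5].index.tolist()

trimmed = toy.drop(columns=sparse_cols)
print("Dropped:", sparse_cols)
print(trimmed.columns.tolist())
```

Some sparse columns are conditional by design (for example, a farming system only applies to farmers), so it may be better to review them individually than to drop them automatically.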

- information

In [10]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 49 entries, 0 to 48
Data columns (total 18 columns):
 #   Column                          Non-Null Count  Dtype 
---  ------                          --------------  ----- 
 0   Timestamp                       37 non-null     object
 1   Age Range                       49 non-null     object
 2   Gender                          49 non-null     object
 3   Occupation                      42 non-null     object
 4   Marital status                  40 non-null     object
 5   Monthly Household Income        40 non-null     object
 6   Location                        41 non-null     object
 7   Device Type                     41 non-null     object
 8   Delivery Method                 22 non-null     object
 9   Farmer                          49 non-null     object
 10  Farming System                  18 non-null     object
 11  Farm Produce                    35 non-null     object
 12  Online Platform                 45 non-null     obje