# Introduction


## Problem statement

Singlife has observed a concerning trend in the customer journey: potential policyholders hesitate and eventually disengage during the insurance acquisition process. To address this, Singlife wants to put its customer dataset to use. The objective is to <font size="4">**derive actionable insights from this data to enhance the customer experience**</font>. The challenge is to dissect the dataset to <font size="4">**uncover the critical touchpoints that contribute to customer drop-off and identify opportunities to streamline the application process and personalize communication**</font>. The ultimate goal is to <font size="4">**predict customer satisfaction and conversion rates, thereby bolstering Singlife's market position**</font>.


## Selected variables

<strong><h5>1. General Client Information</h5></strong>

1. `clntnum`
2. `ctrycode_desc`
3. `stat_flag`
4. `min_occ_date`
5. `cltdob_fix`
6. `cltsex_fix`
7. `cltage` (Age of client)
8. `clt_ten` (Customer tenure)
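
`cltage` and `clt_ten` are listed above, but they are not among the raw columns selected in the filtering cell. A minimal sketch of how they could be derived follows. It assumes that age is counted from `cltdob_fix` and tenure from `min_occ_date`, both in whole years relative to a reference date. The helper name `add_age_and_tenure`, the reference date, and the toy rows are hypothetical and not taken from the notebook.

```python
import pandas as pd


def add_age_and_tenure(df: pd.DataFrame, as_of: pd.Timestamp) -> pd.DataFrame:
    """Return a copy of df with approximate `cltage` and `clt_ten` in whole years.

    Assumption: age is measured from `cltdob_fix` and tenure from `min_occ_date`.
    """
    out = df.copy()
    # Unparseable or missing dates become NaT rather than raising an error
    dob = pd.to_datetime(out["cltdob_fix"], errors="coerce")
    first_policy = pd.to_datetime(out["min_occ_date"], errors="coerce")
    # Dividing by 365 ignores leap days, so the result is an approximation
    out["cltage"] = (as_of - dob).dt.days // 365
    out["clt_ten"] = (as_of - first_policy).dt.days // 365
    return out


# Toy rows for illustration only; the second client has no recorded birth date
toy = pd.DataFrame({
    "cltdob_fix": ["1990-01-01", None],
    "min_occ_date": ["2015-06-01", "2020-03-15"],
})
result = add_age_and_tenure(toy, pd.Timestamp("2024-01-01"))
print(result[["cltage", "clt_ten"]])
```

Rows with a missing date keep a missing value in the derived column, so they can be handled explicitly during cleaning instead of being silently assigned an age.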

<strong><h5>2. Client Risk and Status Indicators</h5></strong>

1. `flg_substandard`
2. `flg_is_borderline_standard`
3. `flg_is_revised_term`
4. `flg_has_health_claim`
5. `flg_gi_claim`
6. `flg_is_proposal`

<strong><h5>3. Demographic and Household Information</h5></strong>

1. `is_dependent_in_at_least_1_policy`
2. `annual_income_est`

<strong><h5>4. Policy and Claim History</h5></strong>

1. `tot_inforce_pols`, `tot_cancel_pols`
2. `f_ever_declined_la`

<strong><h5>5. Target Column</h5></strong>

1. `f_purchase_lh` (Indicates whether the customer will purchase insurance in the next 3 months)


In [None]:
# Keep only the selected columns
columnNames = ["clntnum", "ctrycode_desc", "stat_flag", "min_occ_date", "cltdob_fix", "cltsex_fix",
               "flg_substandard", "flg_is_borderline_standard", "flg_is_revised_term", "flg_has_health_claim", "flg_gi_claim", "flg_is_proposal",
               "is_dependent_in_at_least_1_policy", "annual_income_est", "tot_inforce_pols", "tot_cancel_pols", "f_ever_declined_la",
               "f_purchase_lh"]
data = data.loc[:, columnNames]
print(data.shape)

# Share of clients by country, in percent (top 5). The table is stored in a
# variable because a bare expression in the middle of a cell is not displayed.
clientsByCountry = data["ctrycode_desc"].value_counts()
clientsByCountry_Percentage = clientsByCountry * 100 / clientsByCountry.sum()
topCountries = (clientsByCountry_Percentage.round(5).head(n=5)
                .rename("Percentage of clients by country").to_frame())

# Restrict the data to clients whose country is Singapore
data = data[data["ctrycode_desc"] == "Singapore"].copy()

(17992, 18)
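
The printed shape is taken before the country filter, so it does not show how many rows the filter removes. The sketch below shows one way to check this: tabulate the country share with `value_counts(normalize=True)`, then count the dropped rows. It runs on a made-up toy frame, not the Singlife data, and the variable names `toy`, `share`, `singapore_only`, and `dropped` are hypothetical.

```python
import pandas as pd

# Toy data for illustration only; one client has no recorded country
toy = pd.DataFrame(
    {"ctrycode_desc": ["Singapore", "Singapore", "Singapore", "Malaysia", None]}
)

# Percentage of clients per country, counting missing values as their own group
share = (
    toy["ctrycode_desc"]
    .value_counts(normalize=True, dropna=False)
    .mul(100)
    .round(2)
    .rename("Percentage of clients by country")
    .to_frame()
)
print(share)

# Apply the same filter as the notebook and report how many rows it removes
singapore_only = toy[toy["ctrycode_desc"] == "Singapore"].copy()
dropped = len(toy) - len(singapore_only)
print(f"Rows kept: {len(singapore_only)}, rows dropped: {dropped}")
```

Passing `dropna=False` matters here because the equality filter also discards rows with a missing country, and those rows would otherwise be absent from the percentage table.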
