---
# Future Value Insights (CLTV)
---

### Introduction
---
Imagine being able to see the future value of every customer who walks through your virtual door. That's the power of **Future Value Insights**. **Customer Lifetime Value** (CLV, also abbreviated CLTV, as in the `cltv` column of this dataset) isn't just a number; it's the key to unlocking the full potential of your customer relationships and driving your business forward.

**CLV** represents the total revenue a business can expect from a customer over the entire relationship. By leveraging CLV, businesses can build a clear picture of their customer base and make better decisions in marketing, sales, and customer service. It matters because it lets companies assess the true profitability of acquiring and retaining customers, prioritize efforts to boost retention, and allocate resources wisely.

The magic lies in the details: CLV is calculated by analyzing factors like average purchase value, frequency of purchases, and customer lifespan. This metric is crucial because it shifts the focus from short-term gains to long-term growth and sustainability. It's about understanding not just the immediate transaction, but the enduring value a customer brings over time.
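The calculation described above can be sketched in its simplest textbook form: average purchase value times purchase frequency times customer lifespan. The function name and the sample numbers below are illustrative assumptions; this is not how the `cltv` column in the dataset was derived.

```python
def simple_clv(avg_purchase_value, purchases_per_year, lifespan_years):
    """Basic CLV estimate: value per purchase x purchases per year x years retained."""
    return avg_purchase_value * purchases_per_year * lifespan_years


# Hypothetical customer: 500 per purchase, 4 purchases a year, retained for 5 years.
example_clv = simple_clv(500, 4, 5)
print(example_clv)  # 10000
```

Real-world CLV models usually also account for margins, churn, and discounting, which this sketch deliberately leaves out.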

With Future Value Insights, you gain a crystal-clear vision of your customer relationships, empowering you to make informed decisions that maximize value and drive lasting success. Are you ready to embark on this journey of discovery and growth?

---

## 1.) Import Required Packages

#### Importing the Pandas, Seaborn, Matplotlib, warnings and os libraries.

In [2]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt 
import warnings
import os
warnings.filterwarnings('ignore')

---
## 2.) Data Collection
- Dataset Source - https://www.kaggle.com/datasets/shibumohapatra/customer-life-time-value

#### Import the CSV Data as Pandas DataFrame

In [3]:
df = pd.read_csv('../data/raw_data/Future_Value_Insights_data.csv')
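
Because the CSV is read from a relative path, the cell fails with a fairly terse error when the notebook is run from a different working directory. A minimal sketch of a more defensive loader follows; `load_dataset` is a hypothetical helper name, not part of the original notebook.

```python
from pathlib import Path

import pandas as pd


def load_dataset(csv_path):
    """Read a CSV into a DataFrame, failing early with the resolved path if it is missing."""
    path = Path(csv_path)
    if not path.is_file():
        raise FileNotFoundError(f"Dataset not found at {path.resolve()}")
    return pd.read_csv(path)


# Usage with the path from the cell above (commented out so the sketch runs anywhere):
# df = load_dataset('../data/raw_data/Future_Value_Insights_data.csv')
```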

#### Show Top 5 Records

In [4]:
df.head()

| | id | gender | area | qualification | income | marital_status | vintage | claim_amount | num_policies | policy | type_of_policy | cltv |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Male | Urban | Bachelor | 5L-10L | 1 | 5 | 5790 | More than 1 | A | Platinum | 64308 |
| 1 | 2 | Male | Rural | High School | 5L-10L | 0 | 8 | 5080 | More than 1 | A | Platinum | 515400 |
| 2 | 3 | Male | Urban | Bachelor | 5L-10L | 1 | 8 | 2599 | More than 1 | A | Platinum | 64212 |
| 3 | 4 | Female | Rural | High School | 5L-10L | 0 | 7 | 0 | More than 1 | A | Platinum | 97920 |
| 4 | 5 | Male | Urban | High School | More than 10L | 1 | 6 | 3508 | More than 1 | A | Gold | 59736 |


#### Shape of the dataset

In [5]:
df.shape

(89392, 12)

### 2.1 Dataset information

- **id** : Unique identifier of a customer.
- **gender** : Gender of the customer.
- **area** : Area where the customer lives (e.g. Urban or Rural).
- **qualification** : Highest Qualification of the customer.
- **income** : Income earned in a year (in rupees).
- **marital_status** : Marital Status of the customer {0:Single, 1: Married}.
- **vintage** : No. of years since the first policy date.
- **claim_amount** : Total Amount Claimed by the customer (in rupees).
- **num_policies** : Total no. of policies issued to the customer (stored as text, e.g. "More than 1").
- **policy** : Active policy of the customer.
- **type_of_policy** : Type of active policy.
- **cltv** : Customer lifetime value (Target Variable).
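
The `marital_status` column is stored as 0/1, so plots and summaries are easier to read once it is decoded. A minimal sketch using the mapping from the list above follows; the three-row `sample` frame mirrors the first rows of `df.head()` and stands in for the full dataset, and the `marital_status_label` column name is an assumption.

```python
import pandas as pd

# Small stand-in for the real data (values copied from the first rows of df.head()).
sample = pd.DataFrame({"id": [1, 2, 3], "marital_status": [1, 0, 1]})

# Mapping taken from the data dictionary: 0 = Single, 1 = Married.
MARITAL_LABELS = {0: "Single", 1: "Married"}

# Keep the numeric column and add a readable label next to it.
sample["marital_status_label"] = sample["marital_status"].map(MARITAL_LABELS)
print(sample)
```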

---
## 3.) Data Checks to perform

- Check Missing values
- Check Duplicates
- Check data type
- Check the number of unique values of each column
- Check statistics of data set
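
The checks listed above can also be bundled into one helper so they are easy to rerun after any cleaning step. This is a sketch under assumptions: `run_basic_checks` is a hypothetical name and the `toy` frame is made-up illustration data, while the pandas calls are the same ones used in the sections below.

```python
import pandas as pd


def run_basic_checks(frame):
    """Collect the five basic data checks into a single dictionary."""
    return {
        "missing_per_column": frame.isna().sum().to_dict(),
        "duplicate_rows": int(frame.duplicated().sum()),
        "dtypes": frame.dtypes.astype(str).to_dict(),
        "unique_per_column": frame.nunique().to_dict(),
        "numeric_summary": frame.describe().T,
    }


# Toy frame for illustration only; pass the real df in the notebook.
toy = pd.DataFrame({
    "gender": ["Male", "Male", "Female"],
    "claim_amount": [5790, 5080, 0],
})
report = run_basic_checks(toy)
print(report["missing_per_column"])
print(report["duplicate_rows"])
```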

### 3.1 Check Missing values

In [6]:
df.isna().sum()

id                0
gender            0
area              0
qualification     0
income            0
marital_status    0
vintage           0
claim_amount      0
num_policies      0
policy            0
type_of_policy    0
cltv              0
dtype: int64

#### There are no missing values in the data

### 3.2 Check Duplicates

In [7]:
df.duplicated().sum()

0

#### There are no duplicate rows in the dataset
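
One caveat: `df.duplicated()` compares whole rows, and because `id` is unique for every customer, no two rows can ever match exactly. A sketch of a stricter check that ignores `id` follows; the `toy` frame is made-up illustration data. Whether rows flagged this way are true duplicates or simply different customers with identical attributes is a judgment call.

```python
import pandas as pd

# Toy frame: rows 0 and 1 differ only in their id.
toy = pd.DataFrame({
    "id": [1, 2, 3],
    "gender": ["Male", "Male", "Female"],
    "cltv": [64308, 64308, 97920],
})

# Whole-row comparison: the unique id hides the repeated attributes.
full_row_duplicates = int(toy.duplicated().sum())

# Compare on every column except the identifier.
feature_columns = [col for col in toy.columns if col != "id"]
feature_duplicates = int(toy.duplicated(subset=feature_columns).sum())

print(full_row_duplicates, feature_duplicates)  # 0 1
```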

### 3.3 Check data types

In [8]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 89392 entries, 0 to 89391
Data columns (total 12 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   id              89392 non-null  int64 
 1   gender          89392 non-null  object
 2   area            89392 non-null  object
 3   qualification   89392 non-null  object
 4   income          89392 non-null  object
 5   marital_status  89392 non-null  int64 
 6   vintage         89392 non-null  int64 
 7   claim_amount    89392 non-null  int64 
 8   num_policies    89392 non-null  object
 9   policy          89392 non-null  object
 10  type_of_policy  89392 non-null  object
 11  cltv            89392 non-null  int64 
dtypes: int64(5), object(7)
memory usage: 8.2+ MB


### 3.4 Checking the number of unique values of each column

In [9]:
df.nunique()

id                89392
gender                2
area                  2
qualification         3
income                4
marital_status        2
vintage               9
claim_amount      10889
num_policies          2
policy                3
type_of_policy        3
cltv              18796
dtype: int64
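
In the output above, `id` has as many unique values as there are rows (89392), which marks it as a pure identifier rather than a predictive feature. A small sketch for spotting such columns automatically follows; `identifier_like_columns` is a hypothetical helper and the `toy` frame is made-up illustration data.

```python
import pandas as pd


def identifier_like_columns(frame):
    """Return columns whose number of unique values equals the number of rows."""
    n_rows = len(frame)
    return [col for col in frame.columns if frame[col].nunique() == n_rows]


toy = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "gender": ["Male", "Male", "Female", "Male"],
    "vintage": [5, 8, 8, 7],
})
print(identifier_like_columns(toy))  # ['id']
```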

In [13]:
df.columns

Index(['id', 'gender', 'area', 'qualification', 'income', 'marital_status',
       'vintage', 'claim_amount', 'num_policies', 'policy', 'type_of_policy',
       'cltv'],
      dtype='object')

### 3.5 Check statistics of data set

In [10]:
df.describe().T

| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| id | 89392.0 | 44696.5 | 25805.391969 | 1.0 | 22348.75 | 44696.5 | 67044.25 | 89392.0 |
| marital_status | 89392.0 | 0.575488 | 0.494272 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 |
| vintage | 89392.0 | 4.595669 | 2.290446 | 0.0 | 3.0 | 5.0 | 6.0 | 8.0 |
| claim_amount | 89392.0 | 4351.502416 | 3262.359775 | 0.0 | 2406.0 | 4089.0 | 6094.0 | 31894.0 |
| cltv | 89392.0 | 97952.828978 | 90613.814793 | 24828.0 | 52836.0 | 66396.0 | 103440.0 | 724068.0 |
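
For `cltv`, the mean (about 97,953) sits well above the median (66,396), which hints at a right-skewed target with a long upper tail. The sketch below applies Tukey's 1.5 x IQR rule of thumb to the summary values copied from the `df.describe()` output above; it is a quick heuristic, not a formal test of skewness.

```python
# Summary statistics for cltv, copied from the df.describe() output.
cltv_mean, cltv_median = 97952.828978, 66396.0
cltv_q1, cltv_q3, cltv_max = 52836.0, 103440.0, 724068.0

# Interquartile range and the upper fence from Tukey's rule of thumb.
iqr = cltv_q3 - cltv_q1
upper_fence = cltv_q3 + 1.5 * iqr

print(f"mean above median: {cltv_mean > cltv_median}")
print(f"IQR: {iqr}, upper fence: {upper_fence}")
print(f"max above upper fence: {cltv_max > upper_fence}")
```

If the skew is confirmed on the full column, a log transform of the target is one common option to consider before modelling.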


In [11]:
df.describe(include='object').T

| | count | unique | top | freq |
|---|---|---|---|---|
| gender | 89392 | 2 | Male | 50497 |
| area | 89392 | 2 | Urban | 62455 |
| qualification | 89392 | 3 | High School | 46247 |
| income | 89392 | 4 | 5L-10L | 52716 |
| num_policies | 89392 | 2 | More than 1 | 60263 |
| policy | 89392 | 3 | A | 56644 |
| type_of_policy | 89392 | 3 | Platinum | 47796 |
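
The `freq` column is easier to interpret as a share of all rows. The sketch below computes that share for each categorical column from the counts reported above; the dictionary is copied by hand from the output and the variable names are assumptions.

```python
TOTAL_ROWS = 89392

# (most frequent category, its count), copied from df.describe(include='object').
top_categories = {
    "gender": ("Male", 50497),
    "area": ("Urban", 62455),
    "qualification": ("High School", 46247),
    "income": ("5L-10L", 52716),
    "num_policies": ("More than 1", 60263),
    "policy": ("A", 56644),
    "type_of_policy": ("Platinum", 47796),
}

# Fraction of customers that fall into the most frequent category of each column.
top_shares = {col: freq / TOTAL_ROWS for col, (_, freq) in top_categories.items()}

for col, share in top_shares.items():
    print(f"{col}: {top_categories[col][0]} = {share:.1%}")
```

On the full data, `df[col].value_counts(normalize=True)` gives the same shares for every category, not just the top one.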
