# Case Study

## Scenario

You are working as an analyst for an auto insurance company. The company has collected data about its customers, including their demographics, education, employment, policy details, information on the insured vehicle, and claim amounts. You will help senior management answer business questions so that they can better understand their customers, improve their services, and increase profitability.

## Business Objectives

- Retain customers.
- Analyze relevant customer data.
- Develop focused customer retention programs.

Based on the analysis, take targeted actions to increase profitable customer response, retention, and growth.

## Activities

Refer to the `Activities.md` file, which contains guidelines for some of the activities.

## Data

The CSV files are provided in the folder. The columns in the files are self-explanatory.


### Activity 1

- Aggregate the data into one DataFrame using pandas.
- Standardize header names.
- Delete and rearrange columns: delete the column `customer`, as it is only a unique identifier for each row of data.
- Work with data types: check the data types of all columns and fix the incorrect ones (e.g. customer lifetime value and number of open complaints).
- Filter data and correct typos: filter the data in the state and gender columns to standardize the text in those columns.
- Remove duplicates.
- Replace null values: replace missing values with the mean of the column (for numerical columns).
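
The last two activities are not reached in the cells shown here. As a minimal sketch of how they could look, the example below uses a small made-up frame (`toy_df` and its values are illustrative assumptions, not data from the case-study files). It removes exact duplicate rows and then fills missing values in numerical columns with the column mean.

```python
import numpy as np
import pandas as pd

# Toy data, made up for illustration (not taken from the case-study files)
toy_df = pd.DataFrame({
    "state": ["Oregon", "Oregon", "Nevada", "Nevada"],
    "income": [100.0, 100.0, np.nan, 300.0],
})

# Remove exact duplicate rows, then rebuild a clean 0..n-1 index
toy_df = toy_df.drop_duplicates().reset_index(drop=True)

# Replace missing values in numerical columns with the mean of each column
numeric_cols = toy_df.select_dtypes(include="number").columns
toy_df[numeric_cols] = toy_df[numeric_cols].fillna(toy_df[numeric_cols].mean())

print(toy_df)
```

Dropping duplicates before computing the mean is a design choice: otherwise repeated rows would be counted more than once in the average.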

## Import and look at the data

In [1]:
# Set up your working environment with all necessary libraries for your operations

import numpy as np
import pandas as pd

pd.options.display.max_rows = 50


In [2]:
# Import the Data and have a look at it

file1_df = pd.read_csv('Data/file1.csv')
file2_df = pd.read_csv('Data/file2.csv')
file3_df = pd.read_csv('Data/file3.csv')

In [3]:
file1_df["ST"]

0       Washington
1          Arizona
2           Nevada
3       California
4       Washington
           ...    
4003           NaN
4004           NaN
4005           NaN
4006           NaN
4007           NaN
Name: ST, Length: 4008, dtype: object

In [4]:
file1_df # Observation: - many NaN values
         #              - CLV values are not formatted correctly (stored as percent strings)
                       

Unnamed: 0,Customer,ST,GENDER,Education,Customer Lifetime Value,Income,Monthly Premium Auto,Number of Open Complaints,Policy Type,Vehicle Class,Total Claim Amount
0,RB50392,Washington,,Master,,0.0,1000.0,1/0/00,Personal Auto,Four-Door Car,2.704934
1,QZ44356,Arizona,F,Bachelor,697953.59%,0.0,94.0,1/0/00,Personal Auto,Four-Door Car,1131.464935
2,AI49188,Nevada,F,Bachelor,1288743.17%,48767.0,108.0,1/0/00,Personal Auto,Two-Door Car,566.472247
3,WW63253,California,M,Bachelor,764586.18%,0.0,106.0,1/0/00,Corporate Auto,SUV,529.881344
4,GA49547,Washington,M,High School or Below,536307.65%,36357.0,68.0,1/0/00,Personal Auto,Four-Door Car,17.269323
...,...,...,...,...,...,...,...,...,...,...,...
4003,,,,,,,,,,,
4004,,,,,,,,,,,
4005,,,,,,,,,,,
4006,,,,,,,,,,,


In [5]:
file2_df # Observation: - TCA has many decimal places
         #              - CLV values are not formatted correctly (stored as percent strings)

Unnamed: 0,Customer,ST,GENDER,Education,Customer Lifetime Value,Income,Monthly Premium Auto,Number of Open Complaints,Total Claim Amount,Policy Type,Vehicle Class
0,GS98873,Arizona,F,Bachelor,323912.47%,16061,88,1/0/00,633.600000,Personal Auto,Four-Door Car
1,CW49887,California,F,Master,462680.11%,79487,114,1/0/00,547.200000,Special Auto,SUV
2,MY31220,California,F,College,899704.02%,54230,112,1/0/00,537.600000,Personal Auto,Two-Door Car
3,UH35128,Oregon,F,College,2580706.30%,71210,214,1/1/00,1027.200000,Personal Auto,Luxury Car
4,WH52799,Arizona,F,College,380812.21%,94903,94,1/0/00,451.200000,Corporate Auto,Two-Door Car
...,...,...,...,...,...,...,...,...,...,...,...
991,HV85198,Arizona,M,Master,847141.75%,63513,70,1/0/00,185.667213,Personal Auto,Four-Door Car
992,BS91566,Arizona,F,College,543121.91%,58161,68,1/0/00,140.747286,Corporate Auto,Four-Door Car
993,IL40123,Nevada,F,College,568964.41%,83640,70,1/0/00,471.050488,Corporate Auto,Two-Door Car
994,MY32149,California,F,Master,368672.38%,0,96,1/0/00,28.460568,Personal Auto,Two-Door Car


In [6]:
file3_df # Observation: - TCA and CLV have many decimal places
         #              - NOC uses a different format than in file1 and file2

Unnamed: 0,Customer,State,Customer Lifetime Value,Education,Gender,Income,Monthly Premium Auto,Number of Open Complaints,Policy Type,Total Claim Amount,Vehicle Class
0,SA25987,Washington,3479.137523,High School or Below,M,0,104,0,Personal Auto,499.200000,Two-Door Car
1,TB86706,Arizona,2502.637401,Master,M,0,66,0,Personal Auto,3.468912,Two-Door Car
2,ZL73902,Nevada,3265.156348,Bachelor,F,25820,82,0,Personal Auto,393.600000,Four-Door Car
3,KX23516,California,4455.843406,High School or Below,F,0,121,0,Personal Auto,699.615192,SUV
4,FN77294,California,7704.958480,High School or Below,M,30366,101,2,Personal Auto,484.800000,SUV
...,...,...,...,...,...,...,...,...,...,...,...
7065,LA72316,California,23405.987980,Bachelor,M,71941,73,0,Personal Auto,198.234764,Four-Door Car
7066,PK87824,California,3096.511217,College,F,21604,79,0,Corporate Auto,379.200000,Four-Door Car
7067,TD14365,California,8163.890428,Bachelor,M,0,85,3,Corporate Auto,790.784983,Four-Door Car
7068,UP19263,California,7524.442436,College,M,21941,96,0,Personal Auto,691.200000,Four-Door Car


## Standardizing header names

In [7]:
print(file1_df.columns)

Index(['Customer', 'ST', 'GENDER', 'Education', 'Customer Lifetime Value',
       'Income', 'Monthly Premium Auto', 'Number of Open Complaints',
       'Policy Type', 'Vehicle Class', 'Total Claim Amount'],
      dtype='object')


In [8]:
# look for differences in header names 

print(file1_df.columns.sort_values())  
print(file2_df.columns.sort_values())
print(file3_df.columns.sort_values())  # 'Gender' not uppercase, 'State' instead of 'ST'

Index(['Customer', 'Customer Lifetime Value', 'Education', 'GENDER', 'Income',
       'Monthly Premium Auto', 'Number of Open Complaints', 'Policy Type',
       'ST', 'Total Claim Amount', 'Vehicle Class'],
      dtype='object')
Index(['Customer', 'Customer Lifetime Value', 'Education', 'GENDER', 'Income',
       'Monthly Premium Auto', 'Number of Open Complaints', 'Policy Type',
       'ST', 'Total Claim Amount', 'Vehicle Class'],
      dtype='object')
Index(['Customer', 'Customer Lifetime Value', 'Education', 'Gender', 'Income',
       'Monthly Premium Auto', 'Number of Open Complaints', 'Policy Type',
       'State', 'Total Claim Amount', 'Vehicle Class'],
      dtype='object')


In [9]:
file1_df["ST"]

0       Washington
1          Arizona
2           Nevada
3       California
4       Washington
           ...    
4003           NaN
4004           NaN
4005           NaN
4006           NaN
4007           NaN
Name: ST, Length: 4008, dtype: object

In [10]:
# lowercase the column names and replace " " with "_"

def lower_case_replace_space(file):
    file.columns = [i.lower().replace(" ", "_") for i in file.columns]
    return file
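
The same renaming can also be written with the vectorized `.str` accessor on the column index. The sketch below is an alternative, not the notebook's own function: `standardize_headers` and `demo_df` are hypothetical names, and unlike the function above it returns a copy instead of modifying the frame in place.

```python
import pandas as pd

def standardize_headers(df):
    """Return a copy of df with lowercase, underscore-separated column names."""
    out = df.copy()
    out.columns = (
        out.columns.str.strip()
        .str.lower()
        .str.replace(" ", "_", regex=False)
    )
    return out

# Empty demo frame that only carries a few header names seen in the files
demo_df = pd.DataFrame(columns=["Customer", "ST", "Monthly Premium Auto"])
clean_df = standardize_headers(demo_df)

print(list(clean_df.columns))
```

Returning a copy keeps the original frame untouched, which makes it easier to rerun a cell without side effects.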

In [11]:
# call function to lowercase file1

lower_case_replace_space(file1_df)
file1_df.head(2)


Unnamed: 0,customer,st,gender,education,customer_lifetime_value,income,monthly_premium_auto,number_of_open_complaints,policy_type,vehicle_class,total_claim_amount
0,RB50392,Washington,,Master,,0.0,1000.0,1/0/00,Personal Auto,Four-Door Car,2.704934
1,QZ44356,Arizona,F,Bachelor,697953.59%,0.0,94.0,1/0/00,Personal Auto,Four-Door Car,1131.464935


In [12]:
# call function to lowercase file2

lower_case_replace_space(file2_df)
file2_df.head(2)

Unnamed: 0,customer,st,gender,education,customer_lifetime_value,income,monthly_premium_auto,number_of_open_complaints,total_claim_amount,policy_type,vehicle_class
0,GS98873,Arizona,F,Bachelor,323912.47%,16061,88,1/0/00,633.6,Personal Auto,Four-Door Car
1,CW49887,California,F,Master,462680.11%,79487,114,1/0/00,547.2,Special Auto,SUV


In [13]:
file1_df["st"]

0       Washington
1          Arizona
2           Nevada
3       California
4       Washington
           ...    
4003           NaN
4004           NaN
4005           NaN
4006           NaN
4007           NaN
Name: st, Length: 4008, dtype: object

In [14]:
# call function to lowercase file3

lower_case_replace_space(file3_df)
file3_df.head(2)

Unnamed: 0,customer,state,customer_lifetime_value,education,gender,income,monthly_premium_auto,number_of_open_complaints,policy_type,total_claim_amount,vehicle_class
0,SA25987,Washington,3479.137523,High School or Below,M,0,104,0,Personal Auto,499.2,Two-Door Car
1,TB86706,Arizona,2502.637401,Master,M,0,66,0,Personal Auto,3.468912,Two-Door Car


In [15]:
# sort columns and check again

print(sorted(file1_df.columns))
print(sorted(file2_df.columns))
print(sorted(file3_df.columns))

['customer', 'customer_lifetime_value', 'education', 'gender', 'income', 'monthly_premium_auto', 'number_of_open_complaints', 'policy_type', 'st', 'total_claim_amount', 'vehicle_class']
['customer', 'customer_lifetime_value', 'education', 'gender', 'income', 'monthly_premium_auto', 'number_of_open_complaints', 'policy_type', 'st', 'total_claim_amount', 'vehicle_class']
['customer', 'customer_lifetime_value', 'education', 'gender', 'income', 'monthly_premium_auto', 'number_of_open_complaints', 'policy_type', 'state', 'total_claim_amount', 'vehicle_class']


In [16]:
# Rename columns

def rename_column(file):
    file.rename(columns= {'st': 'state'}, inplace=True)
    return file

In [17]:
# Call the rename function and apply to file1

rename_column(file1_df)
file1_df.head(2) # Display outcome

Unnamed: 0,customer,state,gender,education,customer_lifetime_value,income,monthly_premium_auto,number_of_open_complaints,policy_type,vehicle_class,total_claim_amount
0,RB50392,Washington,,Master,,0.0,1000.0,1/0/00,Personal Auto,Four-Door Car,2.704934
1,QZ44356,Arizona,F,Bachelor,697953.59%,0.0,94.0,1/0/00,Personal Auto,Four-Door Car,1131.464935


In [18]:
# Call the rename function and apply to file2

rename_column(file2_df)
file2_df.head(2) # Display outcome

Unnamed: 0,customer,state,gender,education,customer_lifetime_value,income,monthly_premium_auto,number_of_open_complaints,total_claim_amount,policy_type,vehicle_class
0,GS98873,Arizona,F,Bachelor,323912.47%,16061,88,1/0/00,633.6,Personal Auto,Four-Door Car
1,CW49887,California,F,Master,462680.11%,79487,114,1/0/00,547.2,Special Auto,SUV


## Concatenating the data into one file

In [19]:
# Concatenate the data

insurance_df=pd.concat([file1_df,file2_df,file3_df], axis=0)
insurance_df

Unnamed: 0,customer,state,gender,education,customer_lifetime_value,income,monthly_premium_auto,number_of_open_complaints,policy_type,vehicle_class,total_claim_amount
0,RB50392,Washington,,Master,,0.0,1000.0,1/0/00,Personal Auto,Four-Door Car,2.704934
1,QZ44356,Arizona,F,Bachelor,697953.59%,0.0,94.0,1/0/00,Personal Auto,Four-Door Car,1131.464935
2,AI49188,Nevada,F,Bachelor,1288743.17%,48767.0,108.0,1/0/00,Personal Auto,Two-Door Car,566.472247
3,WW63253,California,M,Bachelor,764586.18%,0.0,106.0,1/0/00,Corporate Auto,SUV,529.881344
4,GA49547,Washington,M,High School or Below,536307.65%,36357.0,68.0,1/0/00,Personal Auto,Four-Door Car,17.269323
...,...,...,...,...,...,...,...,...,...,...,...
7065,LA72316,California,M,Bachelor,23405.98798,71941.0,73.0,0,Personal Auto,Four-Door Car,198.234764
7066,PK87824,California,F,College,3096.511217,21604.0,79.0,0,Corporate Auto,Four-Door Car,379.200000
7067,TD14365,California,M,Bachelor,8163.890428,0.0,85.0,3,Corporate Auto,Four-Door Car,790.784983
7068,UP19263,California,M,College,7524.442436,21941.0,96.0,0,Personal Auto,Four-Door Car,691.200000
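
Note that the concatenated frame keeps the original row labels of each file, so the same label (for example `0`) appears once per source file. A minimal sketch with two made-up frames (`part_a` and `part_b` are illustrative assumptions) shows the difference that `ignore_index=True` makes:

```python
import pandas as pd

# Two tiny frames standing in for the individual files
part_a = pd.DataFrame({"state": ["Washington", "Arizona"]})
part_b = pd.DataFrame({"state": ["Oregon", "Nevada"]})

# Default behaviour: the row labels of each part are kept and therefore repeat
stacked = pd.concat([part_a, part_b], axis=0)
print(stacked.index.tolist())

# ignore_index=True discards the old labels and assigns fresh, unique ones
stacked_reset = pd.concat([part_a, part_b], axis=0, ignore_index=True)
print(stacked_reset.index.tolist())
```

Unique labels matter later on, because label-based selection such as `.loc[0]` would otherwise return several rows.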


In [20]:
insurance_df["state"]

0       Washington
1          Arizona
2           Nevada
3       California
4       Washington
           ...    
7065    California
7066    California
7067    California
7068    California
7069    California
Name: state, Length: 12074, dtype: object

In [21]:
# Check for pandas type 

for column in insurance_df.columns:
    print(type(insurance_df[column]))

<class 'pandas.core.series.Series'>
<class 'pandas.core.series.Series'>
<class 'pandas.core.series.Series'>
<class 'pandas.core.series.Series'>
<class 'pandas.core.series.Series'>
<class 'pandas.core.series.Series'>
<class 'pandas.core.series.Series'>
<class 'pandas.core.series.Series'>
<class 'pandas.core.series.Series'>
<class 'pandas.core.series.Series'>
<class 'pandas.core.series.Series'>


In [22]:
# Check that no rows or columns were lost along the way

print(file1_df.shape)
print(file2_df.shape)
print(file3_df.shape)

(4008, 11)
(996, 11)
(7070, 11)


## Dropping obsolete columns

In [23]:
# Define a function to drop obsolete columns

def drop_columns(file) :
    file.drop(columns=["customer"], inplace=True)
    return file

In [24]:
# Call function to drop column "customer"
drop_columns(insurance_df)

Unnamed: 0,state,gender,education,customer_lifetime_value,income,monthly_premium_auto,number_of_open_complaints,policy_type,vehicle_class,total_claim_amount
0,Washington,,Master,,0.0,1000.0,1/0/00,Personal Auto,Four-Door Car,2.704934
1,Arizona,F,Bachelor,697953.59%,0.0,94.0,1/0/00,Personal Auto,Four-Door Car,1131.464935
2,Nevada,F,Bachelor,1288743.17%,48767.0,108.0,1/0/00,Personal Auto,Two-Door Car,566.472247
3,California,M,Bachelor,764586.18%,0.0,106.0,1/0/00,Corporate Auto,SUV,529.881344
4,Washington,M,High School or Below,536307.65%,36357.0,68.0,1/0/00,Personal Auto,Four-Door Car,17.269323
...,...,...,...,...,...,...,...,...,...,...
7065,California,M,Bachelor,23405.98798,71941.0,73.0,0,Personal Auto,Four-Door Car,198.234764
7066,California,F,College,3096.511217,21604.0,79.0,0,Corporate Auto,Four-Door Car,379.200000
7067,California,M,Bachelor,8163.890428,0.0,85.0,3,Corporate Auto,Four-Door Car,790.784983
7068,California,M,College,7524.442436,21941.0,96.0,0,Personal Auto,Four-Door Car,691.200000


In [25]:
# Drop rows with 8 or more NaN values (keep rows with fewer than 8)

insurance_df = insurance_df[insurance_df.isnull().sum(axis=1) < 8]
insurance_df

# rows that are entirely empty could also be dropped with: insurance_df.dropna(axis=0, how='all', inplace=True)

Unnamed: 0,state,gender,education,customer_lifetime_value,income,monthly_premium_auto,number_of_open_complaints,policy_type,vehicle_class,total_claim_amount
0,Washington,,Master,,0.0,1000.0,1/0/00,Personal Auto,Four-Door Car,2.704934
1,Arizona,F,Bachelor,697953.59%,0.0,94.0,1/0/00,Personal Auto,Four-Door Car,1131.464935
2,Nevada,F,Bachelor,1288743.17%,48767.0,108.0,1/0/00,Personal Auto,Two-Door Car,566.472247
3,California,M,Bachelor,764586.18%,0.0,106.0,1/0/00,Corporate Auto,SUV,529.881344
4,Washington,M,High School or Below,536307.65%,36357.0,68.0,1/0/00,Personal Auto,Four-Door Car,17.269323
...,...,...,...,...,...,...,...,...,...,...
7065,California,M,Bachelor,23405.98798,71941.0,73.0,0,Personal Auto,Four-Door Car,198.234764
7066,California,F,College,3096.511217,21604.0,79.0,0,Corporate Auto,Four-Door Car,379.200000
7067,California,M,Bachelor,8163.890428,0.0,85.0,3,Corporate Auto,Four-Door Car,790.784983
7068,California,M,College,7524.442436,21941.0,96.0,0,Personal Auto,Four-Door Car,691.200000
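
The row filter `isnull().sum(axis=1) < 8` can also be expressed with the `thresh` parameter of `dropna`, which states the minimum number of non-missing values a row must have. The sketch below uses a made-up 10-column frame (`toy_df` is an illustrative assumption) to check that both forms keep the same rows.

```python
import numpy as np
import pandas as pd

# Toy frame with 10 columns, initially all NaN
toy_df = pd.DataFrame(np.nan, index=range(3), columns=list("abcdefghij"))
toy_df.loc[0, :] = 1.0                  # row 0: no missing values
toy_df.loc[1, ["a", "b", "c"]] = 1.0    # row 1: 7 missing values
toy_df.loc[2, ["a", "b"]] = 1.0         # row 2: 8 missing values

# Boolean-mask form: keep rows with fewer than 8 missing values
mask_version = toy_df[toy_df.isnull().sum(axis=1) < 8]

# thresh form: "fewer than 8 missing" means "at least n_columns - 7 present"
min_present = toy_df.shape[1] - 7
thresh_version = toy_df.dropna(thresh=min_present)

print(mask_version.index.tolist())
print(thresh_version.index.tolist())
```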


In [26]:
# Check for further null-values 

insurance_df.info() 

<class 'pandas.core.frame.DataFrame'>
Int64Index: 9137 entries, 0 to 7069
Data columns (total 10 columns):
 #   Column                     Non-Null Count  Dtype  
---  ------                     --------------  -----  
 0   state                      9137 non-null   object 
 1   gender                     9015 non-null   object 
 2   education                  9137 non-null   object 
 3   customer_lifetime_value    9130 non-null   object 
 4   income                     9137 non-null   float64
 5   monthly_premium_auto       9137 non-null   float64
 6   number_of_open_complaints  9137 non-null   object 
 7   policy_type                9137 non-null   object 
 8   vehicle_class              9137 non-null   object 
 9   total_claim_amount         9137 non-null   float64
dtypes: float64(3), object(7)
memory usage: 785.2+ KB


## Rearrange columns 

In [27]:
insurance_df.head(2)

Unnamed: 0,state,gender,education,customer_lifetime_value,income,monthly_premium_auto,number_of_open_complaints,policy_type,vehicle_class,total_claim_amount
0,Washington,,Master,,0.0,1000.0,1/0/00,Personal Auto,Four-Door Car,2.704934
1,Arizona,F,Bachelor,697953.59%,0.0,94.0,1/0/00,Personal Auto,Four-Door Car,1131.464935


In [28]:
insurance_df = insurance_df[['state', 'gender', 'education', 'income', 'policy_type', 'vehicle_class', 'monthly_premium_auto', 'total_claim_amount', 'number_of_open_complaints', 'customer_lifetime_value' ]]

In [29]:
insurance_df.head(2)

Unnamed: 0,state,gender,education,income,policy_type,vehicle_class,monthly_premium_auto,total_claim_amount,number_of_open_complaints,customer_lifetime_value
0,Washington,,Master,0.0,Personal Auto,Four-Door Car,1000.0,2.704934,1/0/00,
1,Arizona,F,Bachelor,0.0,Personal Auto,Four-Door Car,94.0,1131.464935,1/0/00,697953.59%


## Investigating and cleaning each column

In [30]:
# Start with the first column which in this case is 'state'

insurance_df['state'].value_counts()

# Observation: 
# duplicates 'Cali':'California', 'AZ':'Arizona', 'WA':'Washington'
# wrong data type            

California    3032
Oregon        2601
Arizona       1630
Nevada         882
Washington     768
Cali           120
AZ              74
WA              30
Name: state, dtype: int64

In [31]:
insurance_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 9137 entries, 0 to 7069
Data columns (total 10 columns):
 #   Column                     Non-Null Count  Dtype  
---  ------                     --------------  -----  
 0   state                      9137 non-null   object 
 1   gender                     9015 non-null   object 
 2   education                  9137 non-null   object 
 3   income                     9137 non-null   float64
 4   policy_type                9137 non-null   object 
 5   vehicle_class              9137 non-null   object 
 6   monthly_premium_auto       9137 non-null   float64
 7   total_claim_amount         9137 non-null   float64
 8   number_of_open_complaints  9137 non-null   object 
 9   customer_lifetime_value    9130 non-null   object 
dtypes: float64(3), object(7)
memory usage: 785.2+ KB


In [32]:
insurance_df['state']

0       Washington
1          Arizona
2           Nevada
3       California
4       Washington
           ...    
7065    California
7066    California
7067    California
7068    California
7069    California
Name: state, Length: 9137, dtype: object

In [33]:
def clean_state(x):
    if x in ['WA']:
        return 'Washington'
    elif x in ['AZ']:
        return 'Arizona'
    elif x in ['Cali']:
        return 'California'
    else:
        return x

In [34]:
insurance_df['state'] = list(map(clean_state, insurance_df['state']))
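
As an alternative sketch, the same clean-up can be done without a helper function by passing a dictionary to `Series.replace`. The names `STATE_ALIASES` and `states` below are illustrative assumptions; the three aliases are the ones observed in the `value_counts()` output.

```python
import pandas as pd

# Aliases observed in the value counts, mapped to the full state name
STATE_ALIASES = {"WA": "Washington", "AZ": "Arizona", "Cali": "California"}

# Small stand-in for insurance_df['state']
states = pd.Series(["WA", "Arizona", "Cali", "Oregon", "AZ"])

# replace() swaps exact matches and leaves every other value untouched
cleaned = states.replace(STATE_ALIASES)

print(cleaned.value_counts())
```

Keeping the aliases in one dictionary makes it easy to extend the mapping if further spellings turn up.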


In [35]:
insurance_df['state']

0       Washington
1          Arizona
2           Nevada
3       California
4       Washington
           ...    
7065    California
7066    California
7067    California
7068    California
7069    California
Name: state, Length: 9137, dtype: object

In [36]:
list(map(clean_state, insurance_df['state']))

['Washington',
 'Arizona',
 'Nevada',
 'California',
 'Washington',
 ...

In [37]:
type(insurance_df[["state"]])

pandas.core.frame.DataFrame

In [38]:
insurance_df['state'].value_counts()

California    3152
Oregon        2601
Arizona       1704
Nevada         882
Washington     798
Name: state, dtype: int64

In [39]:
insurance_df['gender'].value_counts()

F         4560
M         4368
Male        40
female      30
Femal       17
Name: gender, dtype: int64

In [40]:
# ERROR

def gen_gender(gender):
    if type(gender) != str:
        return 'u'
    if 'f' in gender.lower():
        return 'female'
    else:
        return 'male'   

In [41]:
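# Map every gender spelling seen above to 'Male' / 'Female'; anything else becomes 'U' (unknown)

def generalize_gender(gender):
    # Missing values (NaN) are floats, not strings
    if not isinstance(gender, str):
        return 'U'
    g = gender.strip().lower()
    if g in ['m', 'male']:
        return 'Male'
    elif g in ['f', 'female', 'femal']:
        return 'Female'
    else:
        return 'U'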

In [42]:
insurance_df["gender"].value_counts()

F         4560
M         4368
Male        40
female      30
Femal       17
Name: gender, dtype: int64
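
All five spellings in the counts above start with either `F`/`f` or `M`, so one possible vectorized approach is to look only at the first letter. This is a sketch under that assumption; it would mislabel any future value that starts with one of these letters but means something else. The series `gender` below is a made-up stand-in for `insurance_df['gender']`.

```python
import numpy as np
import pandas as pd

# Stand-in for insurance_df['gender'], including one missing value
gender = pd.Series(["F", "M", "Male", "female", "Femal", np.nan])

# Take the upper-cased first letter; missing values stay NaN
first_letter = gender.str.strip().str.upper().str[0]

# Map the letter to a label; anything unmapped (including NaN) becomes 'U'
normalized = first_letter.map({"F": "Female", "M": "Male"}).fillna("U")

print(normalized.value_counts())
```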

In [43]:
list(map(generalize_gender, insurance_df['gender']))

['U',
 'Female',
 'Female',
 'Male',
 'Male',
 ...

In [44]:
# ERROR

insurance_df['gender2'] = list(map(generalize_gender, insurance_df['gender']))

In [45]:
insurance_df['customer_lifetime_value'].str.rstrip('%').astype(float) / 100

0              NaN
1        6979.5359
2       12887.4317
3        7645.8618
4        5363.0765
           ...    
7065           NaN
7066           NaN
7067           NaN
7068           NaN
7069           NaN
Name: customer_lifetime_value, Length: 9137, dtype: float64
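
In the output above, the rows at the end of the frame come out as `NaN` even though those rows held plain numbers in the concatenated data. The `.str` accessor returns `NaN` for every element that is not a string, so numeric entries are lost. A minimal sketch of a conversion that handles both kinds of entries is shown below. It assumes, as the cell above does, that percent strings should be divided by 100, and that the plain numbers are already on the target scale. `parse_clv` and `clv` are hypothetical names.

```python
import numpy as np
import pandas as pd

# Stand-in for the mixed column: percent strings, a plain float, a missing value
clv = pd.Series(["697953.59%", "1288743.17%", 3479.137523, np.nan], dtype=object)

def parse_clv(value):
    """Convert a percent string to a number; pass everything else through."""
    if isinstance(value, str):
        return float(value.rstrip("%")) / 100
    return value

# Apply element-wise, then make sure the result is a numeric column
clv_clean = pd.to_numeric(clv.map(parse_clv))

print(clv_clean)
```

Applied to the real data, the result would be assigned back to `customer_lifetime_value`, since the cell above only displays the conversion without storing it.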