# Lab | Customer Analysis Round 2

For this lab, we will be using the `marketing_customer_analysis.csv` file that you can find in the `files_for_lab` folder. Check out the `files_for_lab/about.md` to get more information if you are using the Online Excel.

**Note**: The next labs use the same data file. Please save your code so that you can reuse it in the labs that follow this one.

### Dealing with the data

1. Show the dataframe shape.
2. Standardize header names.
3. Which columns are numerical?
4. Which columns are categorical?
5. Check and deal with `NaN` values.
6. Datetime format - Extract the months from the dataset and store them in a separate column. Then filter the data to show only the information for the first quarter, i.e. January, February and March. _Hint_: If data from March does not exist, consider only January and February.
7. BONUS: Put all the previously mentioned data transformations into a function.


In [101]:
import pandas as pd
import numpy as np


In [102]:
#IMPORT AND CHECK FILE

df = pd.read_csv("files_for_lab/csv_files/marketing_customer_analysis.csv")
print(type(df))
display(df.head())
display(df.tail())


<class 'pandas.core.frame.DataFrame'>


Unnamed: 0.1,Unnamed: 0,Customer,State,Customer Lifetime Value,Response,Coverage,Education,Effective To Date,EmploymentStatus,Gender,...,Number of Open Complaints,Number of Policies,Policy Type,Policy,Renew Offer Type,Sales Channel,Total Claim Amount,Vehicle Class,Vehicle Size,Vehicle Type
0,0,DK49336,Arizona,4809.21696,No,Basic,College,2/18/11,Employed,M,...,0.0,9,Corporate Auto,Corporate L3,Offer3,Agent,292.8,Four-Door Car,Medsize,
1,1,KX64629,California,2228.525238,No,Basic,College,1/18/11,Unemployed,F,...,0.0,1,Personal Auto,Personal L3,Offer4,Call Center,744.924331,Four-Door Car,Medsize,
2,2,LZ68649,Washington,14947.9173,No,Basic,Bachelor,2/10/11,Employed,M,...,0.0,2,Personal Auto,Personal L3,Offer3,Call Center,480.0,SUV,Medsize,A
3,3,XL78013,Oregon,22332.43946,Yes,Extended,College,1/11/11,Employed,M,...,0.0,2,Corporate Auto,Corporate L3,Offer2,Branch,484.013411,Four-Door Car,Medsize,A
4,4,QA50777,Oregon,9025.067525,No,Premium,Bachelor,1/17/11,Medical Leave,F,...,,7,Personal Auto,Personal L2,Offer1,Branch,707.925645,Four-Door Car,Medsize,


Unnamed: 0.1,Unnamed: 0,Customer,State,Customer Lifetime Value,Response,Coverage,Education,Effective To Date,EmploymentStatus,Gender,...,Number of Open Complaints,Number of Policies,Policy Type,Policy,Renew Offer Type,Sales Channel,Total Claim Amount,Vehicle Class,Vehicle Size,Vehicle Type
10905,10905,FE99816,Nevada,15563.36944,No,Premium,Bachelor,1/19/11,Unemployed,F,...,,7,Personal Auto,Personal L1,Offer3,Web,1214.4,Luxury Car,Medsize,A
10906,10906,KX53892,Oregon,5259.444853,No,Basic,College,1/6/11,Employed,F,...,0.0,6,Personal Auto,Personal L3,Offer2,Branch,273.018929,Four-Door Car,Medsize,A
10907,10907,TL39050,Arizona,23893.3041,No,Extended,Bachelor,2/6/11,Employed,F,...,0.0,2,Corporate Auto,Corporate L3,Offer1,Web,381.306996,Luxury SUV,Medsize,
10908,10908,WA60547,California,11971.97765,No,Premium,College,2/13/11,Employed,F,...,4.0,6,Personal Auto,Personal L1,Offer1,Branch,618.288849,SUV,Medsize,A
10909,10909,IV32877,,6857.519928,,Basic,Bachelor,1/8/11,Unemployed,M,...,0.0,3,Personal Auto,Personal L1,Offer4,Web,1021.719397,SUV,Medsize,


1. Show the dataframe shape.

In [103]:
print(df.shape)


(10910, 26)


2. Standardize header names.

In [104]:
df.columns = df.columns.astype("str")


In [105]:
df.columns


Index(['Unnamed: 0', 'Customer', 'State', 'Customer Lifetime Value',
       'Response', 'Coverage', 'Education', 'Effective To Date',
       'EmploymentStatus', 'Gender', 'Income', 'Location Code',
       'Marital Status', 'Monthly Premium Auto', 'Months Since Last Claim',
       'Months Since Policy Inception', 'Number of Open Complaints',
       'Number of Policies', 'Policy Type', 'Policy', 'Renew Offer Type',
       'Sales Channel', 'Total Claim Amount', 'Vehicle Class', 'Vehicle Size',
       'Vehicle Type'],
      dtype='object')

In [106]:
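# Lower-case the headers and replace spaces with underscores
df.columns = df.columns.str.lower().str.replace(" ", "_", regex=False)
# Drop the redundant 'Unnamed: 0' column, which only repeats the row index
del df['unnamed:_0']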
df.head()


Unnamed: 0,customer,state,customer_lifetime_value,response,coverage,education,effective_to_date,employmentstatus,gender,income,...,number_of_open_complaints,number_of_policies,policy_type,policy,renew_offer_type,sales_channel,total_claim_amount,vehicle_class,vehicle_size,vehicle_type
0,DK49336,Arizona,4809.21696,No,Basic,College,2/18/11,Employed,M,48029,...,0.0,9,Corporate Auto,Corporate L3,Offer3,Agent,292.8,Four-Door Car,Medsize,
1,KX64629,California,2228.525238,No,Basic,College,1/18/11,Unemployed,F,0,...,0.0,1,Personal Auto,Personal L3,Offer4,Call Center,744.924331,Four-Door Car,Medsize,
2,LZ68649,Washington,14947.9173,No,Basic,Bachelor,2/10/11,Employed,M,22139,...,0.0,2,Personal Auto,Personal L3,Offer3,Call Center,480.0,SUV,Medsize,A
3,XL78013,Oregon,22332.43946,Yes,Extended,College,1/11/11,Employed,M,49078,...,0.0,2,Corporate Auto,Corporate L3,Offer2,Branch,484.013411,Four-Door Car,Medsize,A
4,QA50777,Oregon,9025.067525,No,Premium,Bachelor,1/17/11,Medical Leave,F,23675,...,,7,Personal Auto,Personal L2,Offer1,Branch,707.925645,Four-Door Car,Medsize,


In [139]:
# One column name also needs to be changed explicitly by adding an underscore: 'employmentstatus' should become 'employment_status'

df = df.rename(columns={"employmentstatus":"employment_status"})
df.head()
                                                      
                            

Unnamed: 0,customer,state,customer_lifetime_value,response,coverage,education,effective_to_date,employment_status,gender,income,...,months_since_policy_inception,number_of_open_complaints,number_of_policies,policy_type,policy,renew_offer_type,sales_channel,total_claim_amount,vehicle_class,vehicle_size
0,DK49336,Arizona,4809.21696,No,Basic,College,2/18/11,Employed,M,48029,...,52,0.0,9,Corporate Auto,Corporate L3,Offer3,Agent,292.8,Four-Door Car,Medsize
1,KX64629,California,2228.525238,No,Basic,College,1/18/11,Unemployed,F,0,...,26,0.0,1,Personal Auto,Personal L3,Offer4,Call Center,744.924331,Four-Door Car,Medsize
2,LZ68649,Washington,14947.9173,No,Basic,Bachelor,2/10/11,Employed,M,22139,...,31,0.0,2,Personal Auto,Personal L3,Offer3,Call Center,480.0,SUV,Medsize
3,XL78013,Oregon,22332.43946,Yes,Extended,College,1/11/11,Employed,M,49078,...,3,0.0,2,Corporate Auto,Corporate L3,Offer2,Branch,484.013411,Four-Door Car,Medsize
4,QA50777,Oregon,9025.067525,No,Premium,Bachelor,1/17/11,Medical Leave,F,23675,...,31,0.384256,7,Personal Auto,Personal L2,Offer1,Branch,707.925645,Four-Door Car,Medsize
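The header clean-up above can also be wrapped in a small helper so that it is easy to reuse in later labs. This is only a sketch: the function name `standardize_columns` and the tiny demo frame are made up for illustration and are not part of the lab files.

```python
import pandas as pd

def standardize_columns(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with lower-case, underscore-separated column names."""
    out = frame.copy()
    out.columns = (
        out.columns.astype(str)
        .str.strip()                          # remove stray leading/trailing spaces
        .str.lower()                          # 'State' -> 'state'
        .str.replace(" ", "_", regex=False)   # 'Vehicle Class' -> 'vehicle_class'
    )
    # This header has no space to replace, so it is renamed explicitly
    return out.rename(columns={"employmentstatus": "employment_status"})

# Hypothetical empty frame, used only to show the effect on the headers
demo = pd.DataFrame(columns=["Customer Lifetime Value", "EmploymentStatus", "State"])
clean = standardize_columns(demo)
print(list(clean.columns))
```

Returning a copy instead of modifying the input keeps the raw dataframe available in case a later step needs the original headers.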


3. Which columns are numerical?


In [107]:
df.dtypes
# Numerical columns: "customer_lifetime_value", "income", "monthly_premium_auto", "months_since_last_claim", "months_since_policy_inception", "number_of_open_complaints", "number_of_policies" and "total_claim_amount"

customer                          object
state                             object
customer_lifetime_value          float64
response                          object
coverage                          object
education                         object
effective_to_date                 object
employmentstatus                  object
gender                            object
income                             int64
location_code                     object
marital_status                    object
monthly_premium_auto               int64
months_since_last_claim          float64
months_since_policy_inception      int64
number_of_open_complaints        float64
number_of_policies                 int64
policy_type                       object
policy                            object
renew_offer_type                  object
sales_channel                     object
total_claim_amount               float64
vehicle_class                     object
vehicle_size                      object
vehicle_type                      object
dtype: object

4. Which columns are categorical?

In [108]:
# "customer", "state", "response", "coverage", "education", "effective_to_date", "employmentstatus", "gender", "location_code", "marital_status", "policy_type", "policy", "renew_offer_type", "sales_channel", "vehicle_class", "vehicle_size", "vehicle_type"      
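Instead of reading the column names off the `dtypes` output by hand, the two lists can be built with `select_dtypes`. The sketch below uses a small made-up sample that mimics a few columns of the lab data, so the variable names and values are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical sample with a few of the lab's columns
sample = pd.DataFrame({
    "state": ["Arizona", "Oregon"],
    "income": [48029, 49078],
    "total_claim_amount": [292.8, 484.013411],
    "effective_to_date": ["2/18/11", "1/11/11"],
})

# Integer and float columns count as numerical
numerical_cols = sample.select_dtypes(include="number").columns.tolist()
# Everything else (here: text columns) is treated as categorical
categorical_cols = sample.select_dtypes(exclude="number").columns.tolist()

print(numerical_cols)
print(categorical_cols)
```

Note that `effective_to_date` lands in the non-numerical list only because it is stored as text. It is really a date, which is why it is parsed separately in step 6.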

5. Check and deal with `NaN` values.

In [109]:
df.isna().sum()


customer                            0
state                             631
customer_lifetime_value             0
response                          631
coverage                            0
education                           0
effective_to_date                   0
employmentstatus                    0
gender                              0
income                              0
location_code                       0
marital_status                      0
monthly_premium_auto                0
months_since_last_claim           633
months_since_policy_inception       0
number_of_open_complaints         633
number_of_policies                  0
policy_type                         0
policy                              0
renew_offer_type                    0
sales_channel                       0
total_claim_amount                  0
vehicle_class                     622
vehicle_size                      622
vehicle_type                     5482
dtype: int64

In [110]:
round(df.isna().sum()/len(df),4)*100  # shows the percentage of null values in a column
nulls_df = pd.DataFrame(round(df.isna().sum()/len(df),4)*100)

nulls_df = nulls_df.reset_index()

nulls_df.columns = ['header_name', 'percent_nulls']
nulls_df


Unnamed: 0,header_name,percent_nulls
0,customer,0.0
1,state,5.78
2,customer_lifetime_value,0.0
3,response,5.78
4,coverage,0.0
5,education,0.0
6,effective_to_date,0.0
7,employmentstatus,0.0
8,gender,0.0
9,income,0.0
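The same percentage table can be produced a little more compactly, because the mean of a boolean mask is the fraction of `True` values. A minimal sketch on a made-up sample (the data below is hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical sample: one of four 'state' values is missing
sample = pd.DataFrame({
    "state": ["Arizona", np.nan, "Oregon", "Nevada"],
    "income": [48029, 0, 22139, 49078],
})

# isna() gives True/False per cell; mean() turns that into a fraction per column
percent_nulls = (sample.isna().mean() * 100).round(2)

# Same two-column layout as the table above
nulls_df = percent_nulls.rename_axis("header_name").reset_index(name="percent_nulls")
print(nulls_df)
```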


In [111]:
# Replacing null values for categorical variables
display(df['state'].unique())
display(df['state'].value_counts())

df['state'].value_counts(dropna=False) # Use dropna=False to also display the amount of NA's


array(['Arizona', 'California', 'Washington', 'Oregon', nan, 'Nevada'],
      dtype=object)

California    3552
Oregon        2909
Arizona       1937
Nevada         993
Washington     888
Name: state, dtype: int64

California    3552
Oregon        2909
Arizona       1937
Nevada         993
Washington     888
NaN            631
Name: state, dtype: int64

In [112]:
# Here I have decided to create a new category called 'other_states' for all the rows where the state is missing

df['state'] = df['state'].fillna("other_states")
#df.column1 = df.column1.fillna('')


In [113]:
df.isna().sum()


customer                            0
state                               0
customer_lifetime_value             0
response                          631
coverage                            0
education                           0
effective_to_date                   0
employmentstatus                    0
gender                              0
income                              0
location_code                       0
marital_status                      0
monthly_premium_auto                0
months_since_last_claim           633
months_since_policy_inception       0
number_of_open_complaints         633
number_of_policies                  0
policy_type                         0
policy                              0
renew_offer_type                    0
sales_channel                       0
total_claim_amount                  0
vehicle_class                     622
vehicle_size                      622
vehicle_type                     5482
dtype: int64

In [114]:
# Replacing null values for categorical variables
display(df['state'].unique())
display(df['state'].value_counts())

df['state'].value_counts(dropna=False) # Use dropna=False to also display the amount of NA's


array(['Arizona', 'California', 'Washington', 'Oregon', 'other_states',
       'Nevada'], dtype=object)

California      3552
Oregon          2909
Arizona         1937
Nevada           993
Washington       888
other_states     631
Name: state, dtype: int64

California      3552
Oregon          2909
Arizona         1937
Nevada           993
Washington       888
other_states     631
Name: state, dtype: int64

In [115]:
#---------


In [116]:
# Replacing null values for the variable "response", which is a categorical variable that can only take two values.
display(df['response'].unique())
display(df['response'].value_counts())

df['response'].value_counts(dropna=False) # Use dropna=False to also display the amount of NA's

array(['No', 'Yes', nan], dtype=object)

No     8813
Yes    1466
Name: response, dtype: int64

No     8813
Yes    1466
NaN     631
Name: response, dtype: int64

In [117]:
# Check the 'response' column for null values and replace them with the value that appears most often in that column
df['response'].value_counts(dropna=False)
df['response'].isna().sum()  # number of missing values
df['response'] = df['response'].fillna('No')  # Fill with the most frequent value, which seems reasonable in this particular case.
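The most frequent value does not have to be typed in by hand. As a hedged alternative, `Series.mode()` can look it up, which keeps the cell correct if the data changes. The sample series below is made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample of the 'response' column with two missing answers
response = pd.Series(["No", "Yes", np.nan, "No", np.nan], name="response")

# mode() returns the most frequent value(s), ignoring NaN; take the first one
most_common = response.mode(dropna=True)[0]
filled = response.fillna(most_common)

print(most_common)
print(filled.tolist())
```

If two values are tied for the most frequent, `mode()` returns both in sorted order, so `[0]` picks the first of them.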


In [118]:
df.isna().sum()


customer                            0
state                               0
customer_lifetime_value             0
response                            0
coverage                            0
education                           0
effective_to_date                   0
employmentstatus                    0
gender                              0
income                              0
location_code                       0
marital_status                      0
monthly_premium_auto                0
months_since_last_claim           633
months_since_policy_inception       0
number_of_open_complaints         633
number_of_policies                  0
policy_type                         0
policy                              0
renew_offer_type                    0
sales_channel                       0
total_claim_amount                  0
vehicle_class                     622
vehicle_size                      622
vehicle_type                     5482
dtype: int64

In [119]:
#---------

In [120]:
# Replacing null values for the variable "months_since_last_claim" which is a numerical variable
display(df['months_since_last_claim'].unique())
display(df['months_since_last_claim'].value_counts())

df['months_since_last_claim'].value_counts(dropna=False) # Use dropna=False to also display the amount of NA's


array([ 7.,  3., 34., 10., nan,  2.,  8., 35., 33., 19., 13.,  5., 24.,
       25.,  6., 20., 26., 14.,  9., 29., 11.,  1., 18., 16., 30., 12.,
        4., 22., 31., 21., 28., 17., 15., 32., 23., 27.,  0.])

3.0     426
6.0     394
1.0     386
7.0     378
4.0     373
2.0     368
5.0     355
0.0     354
10.0    346
16.0    333
11.0    331
15.0    323
13.0    320
8.0     315
12.0    307
17.0    299
14.0    299
9.0     277
23.0    276
19.0    276
25.0    269
21.0    267
18.0    257
20.0    254
22.0    246
24.0    237
29.0    234
28.0    227
31.0    221
26.0    211
27.0    208
30.0    201
34.0    199
33.0    198
35.0    157
32.0    155
Name: months_since_last_claim, dtype: int64

NaN     633
3.0     426
6.0     394
1.0     386
7.0     378
4.0     373
2.0     368
5.0     355
0.0     354
10.0    346
16.0    333
11.0    331
15.0    323
13.0    320
8.0     315
12.0    307
17.0    299
14.0    299
9.0     277
19.0    276
23.0    276
25.0    269
21.0    267
18.0    257
20.0    254
22.0    246
24.0    237
29.0    234
28.0    227
31.0    221
26.0    211
27.0    208
30.0    201
34.0    199
33.0    198
35.0    157
32.0    155
Name: months_since_last_claim, dtype: int64

In [121]:
mean_months_since_last_claim = df['months_since_last_claim'].mean()
mean_months_since_last_claim
df['months_since_last_claim'] = df['months_since_last_claim'].fillna(mean_months_since_last_claim)


In [122]:
df.isna().sum()


customer                            0
state                               0
customer_lifetime_value             0
response                            0
coverage                            0
education                           0
effective_to_date                   0
employmentstatus                    0
gender                              0
income                              0
location_code                       0
marital_status                      0
monthly_premium_auto                0
months_since_last_claim             0
months_since_policy_inception       0
number_of_open_complaints         633
number_of_policies                  0
policy_type                         0
policy                              0
renew_offer_type                    0
sales_channel                       0
total_claim_amount                  0
vehicle_class                     622
vehicle_size                      622
vehicle_type                     5482
dtype: int64

In [123]:
#----

In [124]:
# Replacing null values for the variable "number_of_open_complaints" which is a numerical variable
display(df['number_of_open_complaints'].unique())
display(df['number_of_open_complaints'].value_counts())

df['number_of_open_complaints'].value_counts(dropna=False) # Use dropna=False to also display the amount of NA's


array([ 0., nan,  3.,  1.,  2.,  4.,  5.])

0.0    8160
1.0    1145
2.0     414
3.0     324
4.0     166
5.0      68
Name: number_of_open_complaints, dtype: int64

0.0    8160
1.0    1145
NaN     633
2.0     414
3.0     324
4.0     166
5.0      68
Name: number_of_open_complaints, dtype: int64

In [125]:
mean_number_of_open_complaints = df['number_of_open_complaints'].mean()
mean_number_of_open_complaints
df['number_of_open_complaints'] = df['number_of_open_complaints'].fillna(mean_number_of_open_complaints)
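Filling a count column with its mean introduces fractional values, which is why `0.384256` shows up in `number_of_open_complaints` in some of the tables. A possible alternative, shown here only as a sketch on made-up numbers, is to fill with the median so that the column keeps whole-number values:

```python
import numpy as np
import pandas as pd

# Hypothetical sample of a count column with one missing value
complaints = pd.Series([0.0, 1.0, np.nan, 0.0, 3.0, 0.0], name="number_of_open_complaints")

# Mean fill: the gap receives a fractional number of complaints
mean_filled = complaints.fillna(complaints.mean())

# Median fill: the gap receives a value that actually occurs in the data
median_filled = complaints.fillna(complaints.median())

print(mean_filled[2], median_filled[2])
```

Which choice is better depends on how the column is used later. The mean preserves the column average, while the median keeps the values interpretable as counts.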


In [126]:
df.isna().sum()


customer                            0
state                               0
customer_lifetime_value             0
response                            0
coverage                            0
education                           0
effective_to_date                   0
employmentstatus                    0
gender                              0
income                              0
location_code                       0
marital_status                      0
monthly_premium_auto                0
months_since_last_claim             0
months_since_policy_inception       0
number_of_open_complaints           0
number_of_policies                  0
policy_type                         0
policy                              0
renew_offer_type                    0
sales_channel                       0
total_claim_amount                  0
vehicle_class                     622
vehicle_size                      622
vehicle_type                     5482
dtype: int64

In [127]:
#-----


In [128]:
# Checking null values for the variable "vehicle_type" 
display(df['vehicle_type'].unique())
display(df['vehicle_type'].value_counts())

df['vehicle_type'].value_counts(dropna=False) # Use dropna=False to also display the amount of NA's


array([nan, 'A'], dtype=object)

A    5428
Name: vehicle_type, dtype: int64

NaN    5482
A      5428
Name: vehicle_type, dtype: int64

In [129]:
# In this case we can drop this column, as it does not provide any clear information.
# The relevant information for this category is already captured in "vehicle_class"

df = df.drop('vehicle_type', axis=1)


In [130]:
df.isna().sum()


customer                           0
state                              0
customer_lifetime_value            0
response                           0
coverage                           0
education                          0
effective_to_date                  0
employmentstatus                   0
gender                             0
income                             0
location_code                      0
marital_status                     0
monthly_premium_auto               0
months_since_last_claim            0
months_since_policy_inception      0
number_of_open_complaints          0
number_of_policies                 0
policy_type                        0
policy                             0
renew_offer_type                   0
sales_channel                      0
total_claim_amount                 0
vehicle_class                    622
vehicle_size                     622
dtype: int64

In [131]:
# ----


In [132]:
# Checking null values for the variable "vehicle_size", which is a categorical variable
display(df['vehicle_size'].unique())
display(df['vehicle_size'].value_counts())

df['vehicle_size'].value_counts(dropna=False) # Use dropna=False to also display the amount of NA's


array(['Medsize', 'Small', 'Large', nan], dtype=object)

Medsize    7251
Small      1966
Large      1071
Name: vehicle_size, dtype: int64

Medsize    7251
Small      1966
Large      1071
NaN         622
Name: vehicle_size, dtype: int64

In [133]:
# Check the 'vehicle_size' column for null values and replace them with the value that appears most often in that column
df['vehicle_size'].value_counts(dropna=False)
df['vehicle_size'].isna().sum()  # number of missing values
df['vehicle_size'] = df['vehicle_size'].fillna('Medsize')  # Fill with the most frequent value, which seems reasonable in this particular case.


In [134]:
df.isna().sum()


customer                           0
state                              0
customer_lifetime_value            0
response                           0
coverage                           0
education                          0
effective_to_date                  0
employmentstatus                   0
gender                             0
income                             0
location_code                      0
marital_status                     0
monthly_premium_auto               0
months_since_last_claim            0
months_since_policy_inception      0
number_of_open_complaints          0
number_of_policies                 0
policy_type                        0
policy                             0
renew_offer_type                   0
sales_channel                      0
total_claim_amount                 0
vehicle_class                    622
vehicle_size                       0
dtype: int64

In [135]:
#----


In [136]:
# Checking null values for the variable "vehicle_class", which is a categorical variable
display(df['vehicle_class'].unique())
display(df['vehicle_class'].value_counts())

df['vehicle_class'].value_counts(dropna=False) # Use dropna=False to also display the amount of NA's


array(['Four-Door Car', 'SUV', 'Two-Door Car', 'Sports Car', 'Luxury Car',
       'Luxury SUV', nan], dtype=object)

Four-Door Car    5212
Two-Door Car     2118
SUV              2012
Sports Car        550
Luxury SUV        208
Luxury Car        188
Name: vehicle_class, dtype: int64

Four-Door Car    5212
Two-Door Car     2118
SUV              2012
NaN               622
Sports Car        550
Luxury SUV        208
Luxury Car        188
Name: vehicle_class, dtype: int64

In [137]:
# In this particular case I have opted to discard these rows, as there is no clear way to infer
# the missing values. Also, they represent less than 6% of the total sample.

df = df[df['vehicle_class'].notna()]  # Since there are not many of these nulls, we can filter them out
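Dropping rows with a missing value in one specific column can also be written with `dropna(subset=...)`. The sketch below runs on a small made-up frame. The `reset_index(drop=True)` call is an optional extra that renumbers the rows and is not something the cell above does.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: the second row has no vehicle class
sample = pd.DataFrame({
    "customer": ["DK49336", "KX64629", "LZ68649"],
    "vehicle_class": ["Four-Door Car", np.nan, "SUV"],
})

# Keep only the rows where 'vehicle_class' is present, then renumber the index
cleaned = sample.dropna(subset=["vehicle_class"]).reset_index(drop=True)
print(cleaned)
```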


In [96]:
df.isna().sum()

customer                         0
state                            0
customer_lifetime_value          0
response                         0
coverage                         0
education                        0
effective_to_date                0
employmentstatus                 0
gender                           0
income                           0
location_code                    0
marital_status                   0
monthly_premium_auto             0
months_since_last_claim          0
months_since_policy_inception    0
number_of_open_complaints        0
number_of_policies               0
policy_type                      0
policy                           0
renew_offer_type                 0
sales_channel                    0
total_claim_amount               0
vehicle_class                    0
vehicle_size                     0
dtype: int64

6. Datetime format - Extract the months from the dataset and store them in a separate column. Then filter the data to show only the information for the first quarter, i.e. January, February and March. _Hint_: If data from March does not exist, consider only January and February.

In [149]:
# Parse the date strings and keep only the month number (1 = January, ..., 12 = December)
df["months"] = pd.to_datetime(df["effective_to_date"]).dt.month
df.head()

Unnamed: 0,customer,state,customer_lifetime_value,response,coverage,education,effective_to_date,employment_status,gender,income,...,number_of_open_complaints,number_of_policies,policy_type,policy,renew_offer_type,sales_channel,total_claim_amount,vehicle_class,vehicle_size,months
0,DK49336,Arizona,4809.21696,No,Basic,College,2/18/11,Employed,M,48029,...,0.0,9,Corporate Auto,Corporate L3,Offer3,Agent,292.8,Four-Door Car,Medsize,2
1,KX64629,California,2228.525238,No,Basic,College,1/18/11,Unemployed,F,0,...,0.0,1,Personal Auto,Personal L3,Offer4,Call Center,744.924331,Four-Door Car,Medsize,1
2,LZ68649,Washington,14947.9173,No,Basic,Bachelor,2/10/11,Employed,M,22139,...,0.0,2,Personal Auto,Personal L3,Offer3,Call Center,480.0,SUV,Medsize,2
3,XL78013,Oregon,22332.43946,Yes,Extended,College,1/11/11,Employed,M,49078,...,0.0,2,Corporate Auto,Corporate L3,Offer2,Branch,484.013411,Four-Door Car,Medsize,1
4,QA50777,Oregon,9025.067525,No,Premium,Bachelor,1/17/11,Medical Leave,F,23675,...,0.384256,7,Personal Auto,Personal L2,Offer1,Branch,707.925645,Four-Door Car,Medsize,1
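The dates in `effective_to_date` look like `2/18/11`, i.e. month/day/two-digit year. Passing that format explicitly avoids any ambiguity between day-first and month-first parsing, and listing the distinct months shows whether March is present at all. This is a sketch on a handful of sample dates, assuming the whole column follows the same pattern.

```python
import pandas as pd

# A few date strings in the same style as the dataset
dates = pd.Series(["2/18/11", "1/18/11", "2/10/11", "1/4/11"], name="effective_to_date")

# %m = month, %d = day, %y = two-digit year
parsed = pd.to_datetime(dates, format="%m/%d/%y")
months = parsed.dt.month

# Which months actually occur in the data?
available = sorted(months.unique().tolist())
print(available)
```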


In [154]:
first_quarter = df[df["months"] <= 3]  # Let's say that we are working on this filtered data
first_quarter.head(20)

Unnamed: 0,customer,state,customer_lifetime_value,response,coverage,education,effective_to_date,employment_status,gender,income,...,number_of_open_complaints,number_of_policies,policy_type,policy,renew_offer_type,sales_channel,total_claim_amount,vehicle_class,vehicle_size,months
0,DK49336,Arizona,4809.21696,No,Basic,College,2/18/11,Employed,M,48029,...,0.0,9,Corporate Auto,Corporate L3,Offer3,Agent,292.8,Four-Door Car,Medsize,2
1,KX64629,California,2228.525238,No,Basic,College,1/18/11,Unemployed,F,0,...,0.0,1,Personal Auto,Personal L3,Offer4,Call Center,744.924331,Four-Door Car,Medsize,1
2,LZ68649,Washington,14947.9173,No,Basic,Bachelor,2/10/11,Employed,M,22139,...,0.0,2,Personal Auto,Personal L3,Offer3,Call Center,480.0,SUV,Medsize,2
3,XL78013,Oregon,22332.43946,Yes,Extended,College,1/11/11,Employed,M,49078,...,0.0,2,Corporate Auto,Corporate L3,Offer2,Branch,484.013411,Four-Door Car,Medsize,1
4,QA50777,Oregon,9025.067525,No,Premium,Bachelor,1/17/11,Medical Leave,F,23675,...,0.384256,7,Personal Auto,Personal L2,Offer1,Branch,707.925645,Four-Door Car,Medsize,1
5,HO81102,other_states,4745.181764,No,Basic,High School or Below,2/14/11,Employed,M,50549,...,0.0,7,Personal Auto,Personal L3,Offer1,Agent,292.8,Four-Door Car,Medsize,2
6,IW72280,California,5035.035257,No,Basic,Doctor,2/14/11,Employed,F,37405,...,3.0,4,Corporate Auto,Corporate L2,Offer2,Branch,287.556107,Four-Door Car,Medsize,2
7,IH64929,California,4956.247235,No,Basic,High School or Below,2/24/11,Employed,M,87197,...,0.0,3,Personal Auto,Personal L2,Offer2,Branch,247.402353,Two-Door Car,Medsize,2
8,FM55990,California,5989.773931,Yes,Premium,College,1/19/11,Employed,M,66839,...,0.0,1,Personal Auto,Personal L1,Offer2,Branch,739.2,Sports Car,Medsize,1
9,QX31376,Oregon,17114.32815,No,Extended,Master,1/4/11,Employed,F,70961,...,0.0,2,Corporate Auto,Corporate L3,Offer2,Agent,408.0,Four-Door Car,Small,1
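For the bonus task, the steps above could be gathered into one function along the following lines. This is a sketch, not a reference solution: the function name `clean_customer_data` and the small `raw` frame are invented for illustration, the most frequent values are looked up with `mode()` instead of being hard-coded, and the date format `%m/%d/%y` is an assumption based on the values shown in the tables.

```python
import numpy as np
import pandas as pd

def clean_customer_data(frame: pd.DataFrame) -> pd.DataFrame:
    """Apply the lab's cleaning steps and return first-quarter rows only."""
    out = frame.copy()

    # Step 2: standardize header names
    out.columns = out.columns.astype(str).str.lower().str.replace(" ", "_", regex=False)
    out = out.rename(columns={"employmentstatus": "employment_status"})

    # Drop the redundant index column and the uninformative 'vehicle_type'
    out = out.drop(columns=["unnamed:_0", "vehicle_type"], errors="ignore")

    # Step 5: deal with NaN values
    out["state"] = out["state"].fillna("other_states")
    for col in ["response", "vehicle_size"]:
        out[col] = out[col].fillna(out[col].mode()[0])   # most frequent value
    for col in ["months_since_last_claim", "number_of_open_complaints"]:
        out[col] = out[col].fillna(out[col].mean())      # column mean
    out = out.dropna(subset=["vehicle_class"])

    # Step 6: extract the month and keep January to March
    out["months"] = pd.to_datetime(out["effective_to_date"], format="%m/%d/%y").dt.month
    return out[out["months"] <= 3]

# Hypothetical mini dataset with the columns the function touches
raw = pd.DataFrame({
    "Unnamed: 0": [0, 1, 2, 3],
    "State": ["Arizona", np.nan, "Oregon", "Nevada"],
    "Response": ["No", np.nan, "Yes", "No"],
    "Effective To Date": ["2/18/11", "1/18/11", "2/10/11", "1/11/11"],
    "EmploymentStatus": ["Employed", "Unemployed", "Employed", "Employed"],
    "Months Since Last Claim": [7.0, np.nan, 3.0, 5.0],
    "Number of Open Complaints": [0.0, 0.0, np.nan, 3.0],
    "Vehicle Class": ["Four-Door Car", "SUV", np.nan, "SUV"],
    "Vehicle Size": ["Medsize", np.nan, "Small", "Medsize"],
    "Vehicle Type": [np.nan, "A", np.nan, "A"],
})
result = clean_customer_data(raw)
print(result)
```

On the real file the call would be `clean_customer_data(pd.read_csv(...))` with the path used at the top of the notebook. The function works on a copy, so the dataframe passed in is left unchanged.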
