# **Holiday Package Prediction**

**Steps of the Data Science Pipeline**
- Acquire our Data
- Prepare/Clean our Data
- Explore/Analyze our Data
- Model

In [2]:
import pandas as pd
import numpy as np

import seaborn as sns
import matplotlib.pyplot as plt

import wrangle as w

### Acquire Data

**Acquired data from [here](https://www.kaggle.com/susant4learning/holiday-package-purchase-prediction?select=Travel.csv)**

In [3]:
# bring in CSV file
df = pd.read_csv('Travel.csv')

In [4]:
w.get_info(df)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4888 entries, 0 to 4887
Data columns (total 20 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   CustomerID                4888 non-null   int64  
 1   ProdTaken                 4888 non-null   int64  
 2   Age                       4662 non-null   float64
 3   TypeofContact             4863 non-null   object 
 4   CityTier                  4888 non-null   int64  
 5   DurationOfPitch           4637 non-null   float64
 6   Occupation                4888 non-null   object 
 7   Gender                    4888 non-null   object 
 8   NumberOfPersonVisiting    4888 non-null   int64  
 9   NumberOfFollowups         4843 non-null   float64
 10  ProductPitched            4888 non-null   object 
 11  PreferredPropertyStar     4862 non-null   float64
 12  MaritalStatus             4888 non-null   object 
 13  NumberOfTrips             4748 non-null   float64
 14  Passport

| | CustomerID | ProdTaken | Age | TypeofContact | CityTier | DurationOfPitch | Occupation | Gender | NumberOfPersonVisiting | NumberOfFollowups | ProductPitched | PreferredPropertyStar | MaritalStatus | NumberOfTrips | Passport | PitchSatisfactionScore | OwnCar | NumberOfChildrenVisiting | Designation | MonthlyIncome |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3720 | 203720 | 0 | 38.0 | Self Enquiry | 3 | 16.0 | Small Business | Male | 3 | 5.0 | Deluxe | 4.0 | Married | 3.0 | 0 | 4 | 0 | 2.0 | Manager | 22867.0 |
| 2647 | 202647 | 0 | 41.0 | Self Enquiry | 1 | 26.0 | Small Business | Male | 3 | 4.0 | Deluxe | 3.0 | Divorced | 2.0 | 1 | 2 | 1 | 2.0 | Manager | 21419.0 |
| 3811 | 203811 | 0 | 28.0 | Self Enquiry | 1 | 9.0 | Large Business | Male | 3 | 4.0 | Basic | 4.0 | Married | 3.0 | 0 | 4 | 1 | 2.0 | Executive | 20904.0 |
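
The `wrangle` module is not shown in this notebook. Judging only from the output above (a `df.info()` summary followed by a few sampled rows), a helper like `w.get_info` might look roughly like the sketch below. The function body, the `n_rows` parameter, and the `random_state` parameter are assumptions for illustration, not the author's actual implementation.

```python
import io

import pandas as pd


def get_info(df: pd.DataFrame, n_rows: int = 3, random_state=None) -> pd.DataFrame:
    """Print the df.info() summary and return a small random sample of rows.

    Hypothetical stand-in for wrangle.get_info; the real helper is not shown.
    """
    buffer = io.StringIO()
    df.info(buf=buffer)  # write the summary into a buffer instead of stdout
    print(buffer.getvalue())
    # never ask for more rows than the frame holds
    return df.sample(n=min(n_rows, len(df)), random_state=random_state)


# toy frame standing in for Travel.csv (values are placeholders)
toy = pd.DataFrame({
    "Age": [38.0, None, 28.0, 41.0],
    "Gender": ["Male", "Male", "Female", "Male"],
})
sample = get_info(toy, n_rows=2, random_state=0)
```

Returning the sample instead of printing it lets the notebook render it as a regular table.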


### Data Acquisition Takeaways:
 - Most of our columns are numeric types.
 - Several columns contain nulls that I need to handle.
 - Object columns will need to be converted to numeric types.
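
The last takeaway can be sketched on a toy frame. One common option is one-hot encoding with `pd.get_dummies`; this is an illustration only, not necessarily the encoding used later in this project, and the toy rows are placeholders that borrow a few column names from the dataset.

```python
import pandas as pd

# toy frame with two object columns and one numeric column
toy = pd.DataFrame({
    "Gender": ["Male", "Female", "Male"],
    "Occupation": ["Small Business", "Large Business", "Small Business"],
    "CityTier": [3, 1, 1],
})

# pick out every non-numeric column
non_numeric_cols = toy.select_dtypes(exclude="number").columns.tolist()

# one-hot encode them; dtype=int yields 0/1 instead of True/False
encoded = pd.get_dummies(toy, columns=non_numeric_cols, dtype=int)
print(encoded.dtypes)
```

One-hot encoding avoids implying an order between categories, at the cost of adding one column per category value.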

### **Prepare/Clean Data**

In [7]:
# first, let's rename our columns to be more human-readable
df.rename(str.lower, axis='columns', inplace=True)
df.rename(columns={'customerid': 'cust_id',
                   'prodtaken': 'product_taken',
                   'typeofcontact': 'contact_type',
                   'citytier': 'city_tier',
                   'durationofpitch': 'pitch_duration',
                   'numberofpersonvisiting': 'group_size',
                   'numberoffollowups': 'no_followups',
                   'productpitched': 'product_pitched',
                   'preferredpropertystar': 'preferred_property_star',
                   'maritalstatus': 'marital_status',
                   'numberoftrips': 'no_trips',
                   'pitchsatisfactionscore': 'pitch_sat_score',
                   'owncar': 'has_car',
                   'numberofchildrenvisiting': 'no_children',
                   'monthlyincome': 'monthly_income'
}, inplace=True)
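
The same two-step rename can also be written as a single chained expression without `inplace=True`, which returns a new frame and leaves the original untouched. A minimal sketch on a toy frame with only three of the columns (the values are placeholders):

```python
import pandas as pd

toy = pd.DataFrame({"CustomerID": [200000], "ProdTaken": [1], "Age": [41.0]})

renamed = (
    toy.rename(columns=str.lower)  # lowercase every column name first
       .rename(columns={"customerid": "cust_id",
                        "prodtaken": "product_taken"})  # then apply readable names
)
print(renamed.columns.tolist())
```

Columns missing from the mapping, such as `age` here, simply keep their lowercased name.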

In [8]:
# let's take a look at the number of nulls in each column
df.isnull().sum()

cust_id                      0
product_taken                0
age                        226
contact_type                25
city_tier                    0
pitch_duration             251
occupation                   0
gender                       0
group_size                   0
no_followups                45
product_pitched              0
preferred_property_star     26
marital_status               0
no_trips                   140
passport                     0
pitch_sat_score              0
has_car                      0
no_children                 66
designation                  0
monthly_income             233
dtype: int64
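
One way to handle these nulls is simple imputation: the median for numeric columns and the most frequent value for categorical ones. The sketch below shows that option on a toy frame with placeholder values; it is not necessarily the approach taken in this project, and dropping rows or model-based imputation are alternatives.

```python
import pandas as pd

# toy frame with nulls in numeric and categorical columns (placeholder values)
toy = pd.DataFrame({
    "age": [38.0, None, 28.0, 41.0],
    "monthly_income": [22867.0, 21419.0, None, 20904.0],
    "contact_type": ["Self Enquiry", None, "Self Enquiry", "Company Invited"],
})

filled = toy.copy()
numeric_cols = filled.select_dtypes(include="number").columns
other_cols = filled.columns.difference(numeric_cols)

# numeric columns: fill with each column's median
filled[numeric_cols] = filled[numeric_cols].fillna(filled[numeric_cols].median())

# remaining columns: fill with each column's most frequent value
for col in other_cols:
    filled[col] = filled[col].fillna(filled[col].mode().iloc[0])

print(filled.isnull().sum())
```

The median is less sensitive to outliers than the mean, which matters for skewed columns such as income.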