# **Holiday Package Prediction**

**Steps of the Data Science Pipeline**
- Acquire our Data
- Prepare/Clean our Data
- Explore/Analyze our Data
- Model

In [1]:
import pandas as pd
import numpy as np

import seaborn as sns
import matplotlib.pyplot as plt

import wrangle as w
import explore as ex

### Acquire Data

**Acquired data from [here](https://www.kaggle.com/susant4learning/holiday-package-purchase-prediction?select=Travel.csv)**

In [2]:
# bring in CSV file
df = pd.read_csv('Travel.csv')

In [3]:
w.get_info(df)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4888 entries, 0 to 4887
Data columns (total 20 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   CustomerID                4888 non-null   int64  
 1   ProdTaken                 4888 non-null   int64  
 2   Age                       4662 non-null   float64
 3   TypeofContact             4863 non-null   object 
 4   CityTier                  4888 non-null   int64  
 5   DurationOfPitch           4637 non-null   float64
 6   Occupation                4888 non-null   object 
 7   Gender                    4888 non-null   object 
 8   NumberOfPersonVisiting    4888 non-null   int64  
 9   NumberOfFollowups         4843 non-null   float64
 10  ProductPitched            4888 non-null   object 
 11  PreferredPropertyStar     4862 non-null   float64
 12  MaritalStatus             4888 non-null   object 
 13  NumberOfTrips             4748 non-null   float64
 14  Passport

Unnamed: 0,CustomerID,ProdTaken,Age,TypeofContact,CityTier,DurationOfPitch,Occupation,Gender,NumberOfPersonVisiting,NumberOfFollowups,ProductPitched,PreferredPropertyStar,MaritalStatus,NumberOfTrips,Passport,PitchSatisfactionScore,OwnCar,NumberOfChildrenVisiting,Designation,MonthlyIncome
3719,203719,0,52.0,Self Enquiry,1,10.0,Large Business,Female,3,4.0,Standard,3.0,Married,4.0,0,4,1,2.0,Senior Manager,31794.0
413,200413,0,42.0,Company Invited,1,,Large Business,Female,2,4.0,Deluxe,3.0,Divorced,1.0,0,3,0,0.0,Manager,19523.0
3212,203212,0,41.0,Self Enquiry,3,33.0,Small Business,Male,4,4.0,Deluxe,5.0,Divorced,3.0,0,2,1,2.0,Manager,27074.0


### Data Acquisition Takeaways:
 - Most of our columns are numeric types (`int64` or `float64`).
 - Several columns contain nulls that I need to handle.
 - Object columns will need to be converted to numeric types before modeling.
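The conversion mentioned in the last takeaway could be done with one-hot encoding. The sketch below is only an illustration on a small made-up frame that borrows three column names and a few values from the sample rows above. The `encode_objects` helper is hypothetical and is not part of the `wrangle` module.

```python
import pandas as pd


def encode_objects(frame):
    """One-hot encode every non-numeric column (hypothetical helper)."""
    # Pick the non-numeric columns, i.e. the object columns in the info output.
    text_cols = frame.select_dtypes(exclude="number").columns.tolist()
    # Columns not listed in `columns` pass through unchanged;
    # dtype=int makes the new indicator columns 0/1 integers.
    return pd.get_dummies(frame, columns=text_cols, dtype=int)


# Toy data for illustration only, not the real Travel.csv.
toy = pd.DataFrame({
    "Gender": ["Female", "Female", "Male"],
    "ProductPitched": ["Standard", "Deluxe", "Deluxe"],
    "CityTier": [1, 1, 3],
})

encoded = encode_objects(toy)
print(encoded.columns.tolist())
```

One-hot encoding adds one column per category, so it suits low-cardinality columns such as `Gender`; whether it is the right choice for every object column here is a judgment call that depends on the model.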

### Prepare/Clean Data

In [4]:
df = w.prepare_holiday(df)
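The body of `w.prepare_holiday` is not shown in this notebook. Judging only from the column names before and after the call, it at least renames the columns to snake_case. A minimal sketch of that one step follows; the function name `rename_holiday_columns` is hypothetical, and the mapping is inferred by matching the two column listings position by position.

```python
import pandas as pd

# Mapping inferred from the column order before and after prepare_holiday.
COLUMN_MAP = {
    "CustomerID": "cust_id",
    "ProdTaken": "product_taken",
    "Age": "age",
    "TypeofContact": "contact_type",
    "CityTier": "city_tier",
    "DurationOfPitch": "pitch_duration",
    "Occupation": "occupation",
    "Gender": "gender",
    "NumberOfPersonVisiting": "group_size",
    "NumberOfFollowups": "no_followups",
    "ProductPitched": "product_pitched",
    "PreferredPropertyStar": "preferred_property_star",
    "MaritalStatus": "marital_status",
    "NumberOfTrips": "no_trips",
    "Passport": "passport",
    "PitchSatisfactionScore": "pitch_sat_score",
    "OwnCar": "has_car",
    "NumberOfChildrenVisiting": "no_children",
    "Designation": "designation",
    "MonthlyIncome": "monthly_income",
}


def rename_holiday_columns(frame):
    """Return a copy with the raw column names mapped to snake_case (hypothetical helper)."""
    return frame.rename(columns=COLUMN_MAP)


# Two values taken from the sample rows shown earlier.
raw = pd.DataFrame({"CustomerID": [203719], "ProdTaken": [0]})
renamed = rename_holiday_columns(raw)
print(renamed.columns.tolist())
```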

In [8]:
df.head()

Unnamed: 0,cust_id,product_taken,age,contact_type,city_tier,pitch_duration,occupation,gender,group_size,no_followups,product_pitched,preferred_property_star,marital_status,no_trips,passport,pitch_sat_score,has_car,no_children,designation,monthly_income
0,200000,1,41.0,Self Enquiry,3,6.0,Salaried,Female,3,3.0,Deluxe,3.0,Single,1.0,1,2,1,0.0,Manager,20993.0
1,200001,0,49.0,Company Invited,1,14.0,Salaried,Male,3,4.0,Deluxe,4.0,Divorced,2.0,0,3,1,2.0,Manager,20130.0
2,200002,1,37.0,Self Enquiry,1,8.0,Free Lancer,Male,3,4.0,Basic,3.0,Single,7.0,1,3,0,0.0,Executive,17090.0
3,200003,0,33.0,Company Invited,1,9.0,Salaried,Female,2,3.0,Basic,3.0,Divorced,2.0,1,5,1,1.0,Executive,17909.0
4,200004,0,,Self Enquiry,1,8.0,Small Business,Male,2,3.0,Basic,4.0,Divorced,1.0,0,5,1,0.0,Executive,18468.0


In [5]:
# double check there are no duplicates
df.duplicated().any()

False

In [6]:
df.columns

Index(['cust_id', 'product_taken', 'age', 'contact_type', 'city_tier',
       'pitch_duration', 'occupation', 'gender', 'group_size', 'no_followups',
       'product_pitched', 'preferred_property_star', 'marital_status',
       'no_trips', 'passport', 'pitch_sat_score', 'has_car', 'no_children',
       'designation', 'monthly_income'],
      dtype='object')

In [7]:
df.isnull().sum()

cust_id                      0
product_taken                0
age                        226
contact_type                25
city_tier                    0
pitch_duration             251
occupation                   0
gender                       0
group_size                   0
no_followups                45
product_pitched              0
preferred_property_star     26
marital_status               0
no_trips                   140
passport                     0
pitch_sat_score              0
has_car                      0
no_children                 66
designation                  0
monthly_income             233
dtype: int64
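Eight columns still contain nulls at this point. The notebook does not say how they are handled, so the following is only one possible approach, shown on made-up toy data: fill numeric columns with the median and non-numeric columns with the most frequent value. The `fill_missing` helper is hypothetical.

```python
import numpy as np
import pandas as pd


def fill_missing(frame):
    """Fill numeric nulls with the median and other nulls with the mode (hypothetical helper)."""
    filled = frame.copy()
    for col in filled.columns:
        if not filled[col].isnull().any():
            continue  # nothing to fill in this column
        if pd.api.types.is_numeric_dtype(filled[col]):
            # The median is less sensitive to outliers than the mean.
            filled[col] = filled[col].fillna(filled[col].median())
        else:
            # mode() can return several values; take the first.
            filled[col] = filled[col].fillna(filled[col].mode().iloc[0])
    return filled


# Toy data for illustration only, not the real dataset.
toy_nulls = pd.DataFrame({
    "age": [41.0, 49.0, np.nan, 33.0],
    "contact_type": ["Self Enquiry", None, "Self Enquiry", "Company Invited"],
})

clean = fill_missing(toy_nulls)
print(clean.isnull().sum())
```

If the data is later split into train and test sets, computing the fill values on the training split only avoids leaking information from the test split.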