# Machine Learning Recipe

1. Load the dataset using pandas. pandas can read many file formats (CSV, Excel, JSON and others) and can also query databases directly.
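As a minimal sketch of both loading routes, the snippet below reads a CSV and then queries a database. The in-memory CSV text, the `scores` table and the column names are made up for illustration; with real data you would pass a file path and a connection to your own database.

```python
import io
import sqlite3

import pandas as pd

# A tiny in-memory CSV stands in for a real file on disk (hypothetical data).
csv_text = "id,name,score\n1,Ann,3.5\n2,Bob,4.0\n"
df_csv = pd.read_csv(io.StringIO(csv_text))

# pandas can also write to and query a database through a connection.
# An in-memory SQLite database is used here so the example is self-contained.
conn = sqlite3.connect(":memory:")
df_csv.to_sql("scores", conn, index=False)
df_sql = pd.read_sql("SELECT name, score FROM scores WHERE score > 3.7", conn)
conn.close()

print(df_csv.shape)
print(df_sql)
```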


2. Clean your dataset, and then explore the data. Use plots to visualise your data. You can consider the following:
 - plt.hist()
 - plt.scatter()
     - use color and colorbar.
     - use size.
 - plt.plot()
 - sns.boxplot()
 - sns.jointplot()
 - sns.pairplot()
     - use hue to see a different dimension.
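The plotting ideas above can be sketched on synthetic data as follows. The variables `x`, `y`, `z` and `w` are random numbers invented for this example; the point is only to show a histogram next to a scatter plot that uses color (with a colorbar) and marker size to carry two extra dimensions.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic data, purely for illustration.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 2 * x + rng.normal(size=200)
z = rng.uniform(size=200)           # third variable, shown as color
w = rng.uniform(10, 100, size=200)  # fourth variable, shown as marker size

fig, (ax_hist, ax_scatter) = plt.subplots(1, 2, figsize=(10, 4))

# Histogram: distribution of a single variable.
ax_hist.hist(x, bins=20)
ax_hist.set_xlabel("x")
ax_hist.set_ylabel("count")

# Scatter plot: c= maps z to color, s= maps w to marker area.
points = ax_scatter.scatter(x, y, c=z, s=w, cmap="viridis", alpha=0.7)
fig.colorbar(points, ax=ax_scatter, label="z")
ax_scatter.set_xlabel("x")
ax_scatter.set_ylabel("y")

fig.tight_layout()
```

In seaborn, the `hue` argument of `sns.pairplot()` plays a similar role to `c=` here: it colors the points by an additional column.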
     
     
3. Transform data and clean or create new features where needed:
    - clean up the labels and compress the data.
    - for categorical variables, use **one-hot encoding** to populate your feature matrix.
    - combine columns where a derived feature is more informative than its parts.
    - join with data from other sources where that adds useful information.
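A minimal sketch of these transformations, assuming a small hand-made table whose column names mirror the Titanic data used later. The `FamilySize` feature and the `ports` lookup table are hypothetical additions for illustration, not part of the original dataset.

```python
import pandas as pd

# Hypothetical miniature of a passenger table.
passengers = pd.DataFrame({
    "PassengerId": [1, 2, 3],
    "Sex": ["male", "female", "female"],
    "Embarked": ["S", "C", "S"],
    "SibSp": [1, 1, 0],
    "Parch": [0, 0, 2],
})

# One-hot encoding: each category becomes its own indicator column.
features = pd.get_dummies(passengers, columns=["Sex", "Embarked"])

# Combining columns: siblings/spouses + parents/children + the passenger.
features["FamilySize"] = features["SibSp"] + features["Parch"] + 1

# Joining with another source: a lookup table keyed on the port code.
ports = pd.DataFrame({
    "Embarked": ["S", "C", "Q"],
    "Port": ["Southampton", "Cherbourg", "Queenstown"],
})
with_ports = passengers.merge(ports, on="Embarked", how="left")

print(features.columns.tolist())
print(with_ports[["PassengerId", "Port"]])
```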
    
    
4. Based on your insights into the data, decide what kind of problem you are solving: predicting a categorical variable (i.e. classification) or a continuous variable (i.e. regression).


5. Choose the candidate models:
    - create a train set and a test set (recommended 70-30 train-test split or 80-20 train-test split).
    - run n-fold cross validation on the training set for each of the candidate model types. Within each model type, this selects the best model (use **accuracy** for classification; use **mean squared error** for regression).
    - then compare the best model of each type (say OLS, Ridge, Lasso, ElasticNet) on the test set to select the best overall model.
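The selection procedure above might look like the following sketch for a regression problem. It assumes synthetic data from `make_regression`, an 80-20 split, 5-fold cross validation and small, arbitrary `alpha` grids; none of these values come from the recipe itself, and `GridSearchCV` is one possible way to run the per-model cross validation.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet, Lasso, LinearRegression, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic regression data, purely for illustration.
X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=0)

# 80-20 train-test split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Candidate model types with (assumed) hyperparameter grids.
search_spaces = {
    "OLS": (LinearRegression(), {}),
    "Ridge": (Ridge(), {"alpha": [0.1, 1.0, 10.0]}),
    "Lasso": (Lasso(max_iter=10000), {"alpha": [0.01, 0.1, 1.0]}),
    "ElasticNet": (ElasticNet(max_iter=10000), {"alpha": [0.01, 0.1, 1.0]}),
}

test_mse = {}
for name, (model, grid) in search_spaces.items():
    # 5-fold cross validation on the training set picks the best
    # hyperparameters within this model type (lowest mean squared error).
    search = GridSearchCV(model, grid, cv=5, scoring="neg_mean_squared_error")
    search.fit(X_train, y_train)
    # The refitted best model of this type is then scored on the test set.
    predictions = search.best_estimator_.predict(X_test)
    test_mse[name] = mean_squared_error(y_test, predictions)

# The overall winner is the model type with the lowest test error.
best_name = min(test_mse, key=test_mse.get)
print(test_mse)
print("best overall:", best_name)
```

For a classification problem the same structure applies with classifiers as candidates and `scoring="accuracy"`.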

## Getting started on the Titanic Dataset

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import sklearn
import seaborn as sns
sns.set()
%matplotlib inline

In [8]:
train_data = pd.read_csv("./Resources/DataSets/titanic/train.csv", header=0)
test_data = pd.read_csv("./Resources/DataSets/titanic/test.csv", header=0)

In [9]:
train_data.head()

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S


In [10]:
test_data.head()

Unnamed: 0,PassengerId,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,892,3,"Kelly, Mr. James",male,34.5,0,0,330911,7.8292,,Q
1,893,3,"Wilkes, Mrs. James (Ellen Needs)",female,47.0,1,0,363272,7.0,,S
2,894,2,"Myles, Mr. Thomas Francis",male,62.0,0,0,240276,9.6875,,Q
3,895,3,"Wirz, Mr. Albert",male,27.0,0,0,315154,8.6625,,S
4,896,3,"Hirvonen, Mrs. Alexander (Helga E Lindqvist)",female,22.0,1,1,3101298,12.2875,,S
