
# Programming Assignment
# Data Pre-processing, Exploratory Analysis and Data Prediction using Machine Learning


## Description

There are two parts in this assignment:
- #### Part A
  - You will apply the data pre-processing techniques covered in class to a problem and perform exploratory analysis on the given dataset.
  
- #### Part B
  - You will apply the machine learning techniques covered in class to a problem and learn to perform machine learning with the Keras Python library.

To get started on this assignment, download the given dataset and read the description on this page carefully. Please note that all of your implementation must be done in Python.
<br/><br/>

### Intended Learning Outcomes

- Upon completion of <strong>Part A</strong> of this assignment, you should be able to:
<ol>
    <li>Demonstrate your understanding of how to pre-process data using the algorithms / techniques described in class.</li>
    <li>Use simple descriptive statistical approaches to understand your data.</li>
    <li>Construct a Python program to analyse the data and draw simple conclusions from it.</li>
</ol>
<br />

- Upon completion of <strong>Part B</strong> of this assignment, you should be able to:
<ol>
    <li>Demonstrate your understanding of how to make predictions using the machine learning algorithms / techniques described in class.</li>
    <li>Construct a Python program that learns from the training data and makes predictions for the testing set.</li>
</ol>

### Required Libraries
The following libraries are required for this assignment:
<ol>
    <li>NumPy - Numerical Python</li>
    <li>SciPy - Scientific Python</li>
    <li>Matplotlib - Python 2D plotting library</li>
    <li>Seaborn - Visualization library based on matplotlib</li>
    <li>Pandas - Python data analysis library</li>
    <li>Keras - Deep learning library</li>
</ol>

### Dataset ~ Titanic (titanic-train.csv)
This dataset contains the following features:
<ul>
    <li>PassengerId</li>
    <li>Survived (Survival: 0 = No; 1 = Yes)</li>
    <li>Pclass (Passenger Class: 1 = 1st; 2 = 2nd; 3 = 3rd)</li>
    <li>Name (Name of the passenger)</li>
    <li>Sex (Sex of the passenger)</li>
    <li>Age (Age of the passenger)</li>
    <li>SibSp (Number of siblings / spouses aboard)</li>
    <li>Parch (Number of parents / children aboard)</li>
    <li>Ticket (Ticket number of the passenger)</li>
    <li>Fare (Passenger fare)</li>
    <li>Cabin (Cabin number of the passenger)</li>
    <li>Embarked (Port of Embarkation: C = Cherbourg; Q = Queenstown; S = Southampton)</li>
</ul>

## Steps:
<ol>
    <li>Part A: Data Pre-processing and Exploratory Analysis  
      (i.e. Preparing the training data)
      <ul>
      <li>1. Importing data and exploring the features.</li>
      <li>2. Cleaning data: Handling missing values.</li>
      <li>3. Creating new features and dropping redundant features.</li>
      <li>4. Transforming data.</li>
      <li>5. Analysing data statistically.</li>
      </ul>
    </li>
    <li>Part B: Data Prediction using Machine Learning  
      <ul>
      <li>1. Training the model. (Assignment 2)</li> 
      <li>2. Preparing the testing data (Assignment 2)</li>
          <ul>    
          <li>1. Importing data and exploring the features.</li>
          <li>2. Cleaning data: Handling missing values.</li>
          <li>3. Creating new features and dropping redundant features.</li>
          <li>4. Transforming data.</li>
          </ul>
      <li>3. Doing data prediction using the trained model.</li>
      </ul>
    </li>
</ol>

## Part A: Data Pre-processing and Exploratory Analysis

### Step 1: Importing data and exploring the features

#### Step 1.1 
To start working with the Titanic dataset, you will need to import the required libraries, and read the data into a pandas DataFrame.
- Import the following libraries using import statements.
<ul>
    <li>pandas (for data manipulation)</li>
    <li>numpy (for multidimensional array computation)</li>
    <li>seaborn and matplotlib.pyplot (both for data visualization)</li>
</ul>
- Read the CSV file 'titanic-train.csv' into a DataFrame using pandas' read_csv function
(<a href="https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html">pandas.read_csv</a>). Part B refers to this DataFrame as trainData.

Note: Run a code cell by clicking on the cell and using the keyboard shortcut &lt;Shift&gt; + &lt;Enter&gt;.

In [None]:
# Put your statements here
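
The sketch below illustrates Step 1.1 on a few made-up rows (not taken from titanic-train.csv) so that it runs without the dataset; `io.StringIO` stands in for the file path. In the notebook you would pass the path instead, e.g. `pd.read_csv("titanic-train.csv")`. The seaborn and matplotlib imports are left as comments only so the sketch runs without plotting libraries installed.

```python
import io

import numpy as np
import pandas as pd
# import seaborn as sns            # needed for countplot in later steps
# import matplotlib.pyplot as plt  # needed for displaying plots

# Made-up rows in the titanic-train.csv column layout (NOT real passenger data)
csv_text = """PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,0,3,"Doe, Mr. John",male,22,1,0,T100,7.25,,S
2,1,1,"Roe, Mrs. Jane",female,38,1,0,T200,71.28,C85,C
3,1,3,"Poe, Miss. Ann",female,,0,0,T300,7.92,,
"""

# In the notebook: trainData = pd.read_csv("titanic-train.csv")
trainData = pd.read_csv(io.StringIO(csv_text))
print(trainData.head(10))
```

Empty fields in the CSV are read as NaN, which is what the missing-value checks in the following steps count.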

#### Step 1.2
Use the head function (<a href="https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.head.html">pandas.DataFrame.head</a>) of the pandas library to preview the first 10 rows.

In [None]:
# Put your statement here

#### Step 1.3
Use the tail function (<a href="https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.tail.html">pandas.DataFrame.tail</a>) of the pandas library to preview the last 10 rows.

In [None]:
# Put your statement here

#### Step 1.4
Display information about the DataFrame using the info function (<a href="https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.info.html">pandas.DataFrame.info</a>) of the pandas library.

In [None]:
# Put your statement here

#### Step 1.5
Evaluate the data quality and assess missing values using the isnull function (<a href="https://pandas.pydata.org/pandas-docs/stable/generated/pandas.isnull.html">pandas.isnull</a>) and the sum function (<a href="https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.sum.html">pandas.DataFrame.sum</a>) of the pandas library.

In [None]:
# Put your statement here
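
A minimal sketch of the missing-value count on a small hypothetical DataFrame: `isnull()` marks each missing cell as True, and `sum()` adds those booleans per column. Sorting the result makes it easy to see which features have the fewest and second-fewest gaps, which Step 2 relies on.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with gaps (illustrative values only)
df = pd.DataFrame({
    "Age": [22.0, np.nan, 35.0, np.nan],
    "Cabin": [np.nan, "C85", np.nan, np.nan],
    "Embarked": ["S", "C", np.nan, "S"],
})

missing = df.isnull().sum()      # number of NaN cells per column
print(missing.sort_values())     # fewest missing values first
```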

<span style="color:red">What is your observation?</span>  

Instruction: Write down your observations by editing this Markdown cell.


#### Step 1.6
Evaluate the distribution of the categorical features (e.g., Name, Sex, Ticket, Cabin, and Embarked) using the describe function (<a href="https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.describe.html">pandas.DataFrame.describe</a>) of the pandas library.

In [None]:
# Put your statement here
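
By default, `describe()` on a mixed-type DataFrame summarizes only the numeric columns. One way to get the categorical summary (count, unique, top, freq) is to select the non-numeric columns first, as in this sketch on hypothetical data; passing `include=['O']` to `describe` is a common alternative for object-typed columns.

```python
import numpy as np
import pandas as pd

# Hypothetical mixed-type frame (illustrative values only)
df = pd.DataFrame({
    "Sex": ["male", "female", "male", "male"],
    "Embarked": ["S", "C", "S", np.nan],
    "Fare": [7.25, 71.28, 8.05, 13.0],
})

# Keep only non-numeric columns, then summarize them
cat_summary = df.select_dtypes(exclude="number").describe()
print(cat_summary)   # rows: count, unique, top, freq
```

Note that `count` excludes missing values, so comparing it with the number of rows is another way to spot gaps.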

<span style="color:red">What is your observation?</span>  

Instruction: Write down your observations by editing this Markdown cell.


### Step 2: Cleaning data: Handling missing values

#### Step 2.1
As shown in Step 1.5, three features contain missing values.
For the feature with the second-fewest missing values, evaluate its distribution using the hist function (<a href="https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.hist.html">pandas.DataFrame.hist</a>) of the pandas library.

In [None]:
# Put your statements here
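
Alongside the histogram, the sample skewness puts a number on the visual impression: values near 0 suggest a roughly symmetric distribution, while large positive values indicate a long right tail. The ages below are made up for illustration; `Series.skew()` skips NaN by default.

```python
import numpy as np
import pandas as pd

# Made-up ages with one missing value and one large outlier
age = pd.Series([2.0, 22.0, 24.0, 25.0, 27.0, 30.0, 80.0, np.nan])

# age.hist(bins=10)  # draws the histogram (requires matplotlib)
skewness = age.skew()  # unbiased sample skewness, NaN excluded
print(f"skewness = {skewness:.2f}")
```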

<span style="color:red">What is your observation?</span>  

  
    


Instruction: Write down your observations by editing this Markdown cell.


#### Step 2.2
For the feature with the fewest missing values, evaluate its distribution using the countplot function (<a href="https://seaborn.pydata.org/generated/seaborn.countplot.html">seaborn.countplot</a>) of the seaborn library.

In [None]:
# Put your statements here
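
A countplot draws one bar per category, with height equal to that category's count. The same numbers can be printed with `value_counts()`, which helps when writing up observations. The port values below are hypothetical, and the commented `countplot` call assumes seaborn is imported as `sns`.

```python
import numpy as np
import pandas as pd

embarked = pd.Series(["S", "C", "S", "Q", "S", np.nan], name="Embarked")

# sns.countplot(x=embarked)  # bar chart of the same counts (requires seaborn)
counts = embarked.value_counts()  # NaN is excluded by default (dropna=True)
print(counts)
```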

<span style="color:red">What is your observation?</span>  
  
  
    


Instruction: Write down your observations by editing this Markdown cell.


#### Step 2.3
- For the feature with the second-fewest missing values, use the mean to impute the missing values if the data is not skewed; otherwise, use the median.
- For the feature with the fewest missing values, impute the missing values with the most common value.

##### Step 2.3.1
Compute the mean OR median of the feature with the second-fewest missing values using the mean function (<a href="https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.mean.html">pandas.DataFrame.mean</a>) / median function (<a href="https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.median.html">pandas.DataFrame.median</a>) of the pandas library.

Note: You have to skip all the missing values when computing the mean or median.

In [None]:
# Put your statement here
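
pandas' `mean()` and `median()` skip missing values by default (`skipna=True`), which satisfies the note above; passing `skipna=False` shows what happens otherwise. The decision rule at the end uses an absolute skewness threshold of 0.5, which is an assumed rule of thumb rather than something this assignment specifies.

```python
import numpy as np
import pandas as pd

# Made-up ages: one NaN and one large outlier
age = pd.Series([2.0, 22.0, 24.0, 25.0, 27.0, 30.0, 80.0, np.nan])

mean_age = age.mean()                   # NaN skipped -> 210 / 7 = 30.0
median_age = age.median()               # NaN skipped -> 25.0
mean_with_nan = age.mean(skipna=False)  # NaN propagates -> nan

# Assumed rule of thumb: treat |skew| > 0.5 as skewed and prefer the median
centre = median_age if abs(age.skew()) > 0.5 else mean_age
print(mean_age, median_age, centre)
```

The outlier pulls the mean well above the median, which is why the median is the more robust choice for skewed data.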


<span style="color:red">What is your observation?</span>  
Write down your observation here in the cell:  
  
  
    


Instruction: Write down your observations by editing this Markdown cell.


##### Step 2.3.2
Use the mean / median to impute the missing values of the feature with the second-fewest missing values. The fillna function (<a href="https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.fillna.html">pandas.DataFrame.fillna</a>) of the pandas library can be used.

In [None]:
# Put your statement here
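
A sketch of the imputation on a hypothetical `Age` column, assuming the median was chosen in the previous step. The result is assigned back (`df["Age"] = df["Age"].fillna(...)`) rather than calling `fillna(..., inplace=True)` on a single column, because recent pandas versions warn that in-place modification through a column selection may not update the original DataFrame.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Age": [22.0, np.nan, 38.0, 26.0, np.nan]})  # hypothetical values

fill_value = df["Age"].median()             # median of 22, 38, 26 -> 26.0
df["Age"] = df["Age"].fillna(fill_value)    # replace every NaN with 26.0
print(df["Age"].tolist())
```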


##### Step 2.3.3
Use the most common value of the feature with the fewest missing values to impute its missing values. Again, the fillna function (<a href="https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.fillna.html">pandas.DataFrame.fillna</a>) of the pandas library can be used.

In [None]:
# Put your statement here
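
The most common value can be obtained with `mode()`, which ignores NaN by default. Because several values can tie, `mode()` returns a Series, so the sketch below (on hypothetical port codes) takes its first entry.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Embarked": ["S", "C", np.nan, "S", "Q", np.nan]})  # hypothetical

# mode() returns a Series (there can be ties); take the first entry
most_common = df["Embarked"].mode().iloc[0]
df["Embarked"] = df["Embarked"].fillna(most_common)
print(most_common, df["Embarked"].tolist())
```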


### Step 3: Creating new features and dropping redundant features

#### Step 3.1
Create a new feature called Family based on Parch and SibSp.
- Define a new feature 'Family' and assign it the sum of SibSp and Parch.

In [None]:
# Put your statement here
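
Adding two columns produces a new Series row by row, so the new feature can be created with a single assignment. The counts below are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"SibSp": [1, 0, 3], "Parch": [0, 0, 2]})  # hypothetical counts

# Column-wise addition aligns on the row index
df["Family"] = df["SibSp"] + df["Parch"]
print(df)
```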


#### Step 3.2
Use the head function (<a href="https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.head.html">pandas.DataFrame.head</a>) of the pandas library to preview the first 10 rows.

In [None]:
# Put your statement here


#### Step 3.3
- PassengerId, Name, Ticket, and Cabin may be dropped (using the drop function (<a href="https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.drop.html">pandas.DataFrame.drop</a>) of the pandas library), as they are identifiers or free-text values (Ticket also has a high ratio of duplicates) and may not correlate with survival.
- In addition, SibSp and Parch can be dropped as the feature 'Family' has been defined.

In [None]:
# Put your statement here
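
`drop` returns a new DataFrame unless `inplace=True` is passed, so the result has to be reassigned. The sketch uses an empty frame that only carries the column names, so it shows the structure rather than any data.

```python
import pandas as pd

# Empty frame with the training-set columns plus Family (structure only)
df = pd.DataFrame(columns=["PassengerId", "Survived", "Pclass", "Name", "Sex", "Age",
                           "SibSp", "Parch", "Ticket", "Fare", "Cabin", "Embarked", "Family"])

# drop() returns a new DataFrame; reassign it (or pass inplace=True)
df = df.drop(columns=["PassengerId", "Name", "Ticket", "Cabin", "SibSp", "Parch"])
print(list(df.columns))
```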


#### Step 3.4
Use the head function (<a href="https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.head.html">pandas.DataFrame.head</a>) of the pandas library to preview the first 10 rows.

In [None]:
# Put your statement here


### Step 4: Transforming data

#### Step 4.1
- Update the values of Family using the where function (<a href="https://docs.scipy.org/doc/numpy-1.14.0/reference/generated/numpy.where.html">numpy.where</a>) of the numpy library as follows:
<ul>
    <li>0 if the Family count is 0, which means single.</li>
    <li>1 if the Family count is greater than or equal to 1 AND less than or equal to 4, which means small family.</li>
    <li>2 if the Family count is greater than or equal to 5, which means large family.</li>
</ul>

In [None]:
# Put your statements here
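
`np.where(condition, a, b)` picks `a` where the condition holds and `b` elsewhere, so nesting two calls handles three categories. In this sketch on hypothetical counts, the outer test already catches 0, so the inner test only needs `<= 4` (family counts are non-negative integers).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Family": [0, 1, 4, 5, 10]})  # hypothetical family sizes

df["Family"] = np.where(df["Family"] == 0, 0,          # single
               np.where(df["Family"] <= 4, 1, 2))      # small (1 to 4) / large (>= 5)
print(df["Family"].tolist())
```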


#### Step 4.2
- Update the values of Sex to 1 if the original value is 'male'; otherwise, update them to 0.
- The where function (<a href="https://docs.scipy.org/doc/numpy-1.14.0/reference/generated/numpy.where.html">numpy.where</a>) of the numpy library can be used.

In [None]:
# Put your statement here
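
A one-line sketch on hypothetical values: the comparison produces a boolean Series, and `np.where` maps True to 1 and False to 0.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Sex": ["male", "female", "female", "male"]})  # hypothetical

df["Sex"] = np.where(df["Sex"] == "male", 1, 0)  # male -> 1, anything else -> 0
print(df["Sex"].tolist())
```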


#### Step 4.3
- Update the Embarked values using the where function (<a href="https://docs.scipy.org/doc/numpy-1.14.0/reference/generated/numpy.where.html">numpy.where</a>) of the numpy library as follows:
<ul>
    <li>0 if the Embarked value is C.</li>
    <li>1 if the Embarked value is Q.</li>
    <li>2 if the Embarked value is S.</li>
</ul>

In [None]:
# Put your statements here
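
A sketch with two nested `np.where` calls. The final "else" branch maps every value that is not C or Q to 2; this assumes the missing values were already imputed in Step 2.3.3, so only C, Q and S remain.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Embarked": ["S", "C", "Q", "S"]})  # hypothetical, no NaN left

# C -> 0, Q -> 1, everything else (assumed to be S after imputation) -> 2
df["Embarked"] = np.where(df["Embarked"] == "C", 0,
                 np.where(df["Embarked"] == "Q", 1, 2))
print(df["Embarked"].tolist())
```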


#### Step 4.4
- Update the Fare values using the where function (<a href="https://docs.scipy.org/doc/numpy-1.14.0/reference/generated/numpy.where.html">numpy.where</a>) of the numpy library as follows:
<ul>
    <li>0 if the Fare value is less than or equal to 7.91</li>
    <li>1 if the Fare value is greater than 7.91 AND less than or equal to 14.454</li>
    <li>2 if the Fare value is greater than 14.454 AND less than or equal to 31</li>
    <li>3 if the Fare value is greater than 31</li>
</ul>

In [None]:
# Put your statements here
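
Because each nested `np.where` only applies to rows that failed the earlier tests, every condition needs just its upper bound. The made-up fares sit on and just past the cut-offs to show which band each boundary falls into.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Fare": [7.91, 8.0, 14.454, 31.0, 31.01]})  # made-up fares

fare = df["Fare"]
df["Fare"] = np.where(fare <= 7.91, 0,
             np.where(fare <= 14.454, 1,
             np.where(fare <= 31, 2, 3)))
print(df["Fare"].tolist())
```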


#### Step 4.5
- Update the Age values using the where function (<a href="https://docs.scipy.org/doc/numpy-1.14.0/reference/generated/numpy.where.html">numpy.where</a>) of the numpy library as follows:
<ul>
    <li>0 if the Age value is less than or equal to 16</li>
    <li>1 if the Age value is greater than 16 AND less than or equal to 32</li>
    <li>2 if the Age value is greater than 32 AND less than or equal to 48</li>
    <li>3 if the Age value is greater than 48 AND less than or equal to 64</li>
    <li>4 if the Age value is greater than 64</li>
</ul>

In [None]:
# Put your statements here
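
The same nesting pattern extends to five age bands. Comparisons with NaN are always False, so an unimputed missing age would silently land in the oldest band; this is one reason the missing values are imputed in Step 2 before this transformation. The sketch keeps the result in a separate array to show that effect; in the notebook you would assign it back with `df["Age"] = ...`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Age": [16.0, 16.5, 32.0, 48.0, 64.0, 80.0, np.nan]})  # made-up ages

age = df["Age"]
age_band = np.where(age <= 16, 0,
           np.where(age <= 32, 1,
           np.where(age <= 48, 2,
           np.where(age <= 64, 3, 4))))
print(age_band.tolist())   # the NaN row ends up in band 4
```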


#### Step 4.6
Change the Embarked data to int type using the astype function (<a href="https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.astype.html">pandas.DataFrame.astype</a>) of the pandas library.

In [None]:
# Put your statement here


#### Step 4.7
Change the Fare data from float to int type using the astype function (<a href="https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.astype.html">pandas.DataFrame.astype</a>) of the pandas library.

In [None]:
# Put your statement here


#### Step 4.8
Change the Age data from float to int type using the astype function (<a href="https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.astype.html">pandas.DataFrame.astype</a>) of the pandas library.

In [None]:
# Put your statement here
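
A sketch of Steps 4.6 to 4.8 on hypothetical columns. Integer dtypes cannot hold NaN, so `astype(int)` raises a `ValueError` if any missing value is left; this is a quick check that the imputation in Step 2 covered every row.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Fare": [0.0, 3.0, 1.0], "Age": [1.0, np.nan, 2.0]})  # hypothetical

df["Fare"] = df["Fare"].astype(int)   # 0.0 -> 0, 3.0 -> 3, 1.0 -> 1
print(df.dtypes)

# astype(int) cannot represent NaN, so pandas raises a ValueError
try:
    df["Age"].astype(int)
    age_cast_failed = False
except ValueError:
    age_cast_failed = True
print("Age cast failed:", age_cast_failed)
```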


#### Step 4.9
Use the head function (<a href="https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.head.html">pandas.DataFrame.head</a>) of the pandas library to preview the first 20 rows.

In [None]:
# Put your statement here


### Step 5: Analysing data statistically and graphically

#### Step 5.1
Use the describe function (<a href="https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.describe.html">pandas.DataFrame.describe</a>) of the pandas library to generate descriptive statistics that summarize the central tendency, dispersion and shape of the dataset's distribution.

In [None]:
# Put your statement here


#### Step 5.2
Explore the survival rate using the countplot function (<a href="https://seaborn.pydata.org/generated/seaborn.countplot.html">seaborn.countplot</a>) of the seaborn library.

In [None]:
# Put your statements here
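
The countplot shows raw counts; because Survived is coded 0/1, its mean is the survival rate directly, and `value_counts(normalize=True)` gives the share of each class. The labels below are hypothetical, and the commented plot call assumes seaborn is imported as `sns`.

```python
import pandas as pd

df = pd.DataFrame({"Survived": [0, 1, 1, 0, 0]})  # hypothetical labels

# sns.countplot(x="Survived", data=df)  # bar chart of the raw counts (requires seaborn)
survival_rate = df["Survived"].mean()                  # share of 1s
shares = df["Survived"].value_counts(normalize=True)   # share of each class
print(survival_rate, shares.to_dict())
```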


#### Step 5.3
Perform further analysis (e.g., of individual features) to explore which factors are associated with a higher survival rate.

In [None]:
# Put your statements here
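
One way to start is to group by a feature and average the 0/1 label, which gives the survival rate within each group. The encoded rows below are hypothetical; `pd.crosstab` is another option if raw counts are preferred.

```python
import pandas as pd

# Hypothetical encoded rows (Sex: 1 = male, 0 = female, as in Step 4.2)
df = pd.DataFrame({
    "Sex":      [1, 0, 0, 1, 0, 1],
    "Pclass":   [3, 1, 2, 3, 1, 1],
    "Survived": [0, 1, 1, 0, 0, 1],
})

# Mean of a 0/1 label within each group = survival rate of that group
by_sex = df.groupby("Sex")["Survived"].mean()
by_class = df.groupby("Pclass")["Survived"].mean()
print(by_sex, by_class, sep="\n")
```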

<span style="color:red">What are your observations?</span>

Instruction: Write down your observations by editing this Markdown cell.


## Part B: Data Prediction using Machine Learning

### Step 0: Installation and Importing the libraries

#### Step 0.1 
We are going to use <strong>Keras</strong> with <strong>TensorFlow</strong> as its backend.
Therefore, you will need to install TensorFlow and Keras first.
<br/>
<ol>
   <li>To install TensorFlow:
      <ul>
         <li>Start the "Anaconda Prompt" and enter the following command:<br/>
         pip install --ignore-installed --upgrade tensorflow </li>
         <li>For more information on installing TensorFlow: <a href="https://www.tensorflow.org/install/">https://www.tensorflow.org/install/</a></li>
      </ul>
   </li>
   <li>To install Keras:
      <ul>
         <li>In the "Anaconda Prompt", enter the following command:<br />
         pip install keras 
         </li>
         <li>For more information on installing Keras: <a href="https://keras.io/#installation">https://keras.io/#installation</a></li>
      </ul>
   </li>
</ol>

### Step 1: Training the model

#### Step 1.1
Import the following libraries using import statements.
<ul>
    <li>keras (for deep learning) (Reference: <a href="https://keras.io/">https://keras.io/</a>)
        <ul>
            <li>Sequential from keras.models</li>
            <li>Dense from keras.layers</li>
            <li>ModelCheckpoint from keras.callbacks</li>
        </ul>
    </li>    
</ul>
Note: Run a code cell by clicking on the cell and using the keyboard shortcut &lt;Shift&gt; + &lt;Enter&gt;.
<br />
Note: Tensorflow and Keras have to be installed first, see <strong>Step 0.1</strong>

In [None]:
# Put your statements here




#### Step 1.2
Prepare training data - feature set and label set
- Build the feature set X by extracting 'Pclass', 'Sex', 'Age', 'Fare', 'Embarked' and 'Family' from trainData (the pre-processed training DataFrame from Part A)
- Build the label set Y by extracting 'Survived' from trainData

In [None]:
# Put your statements here
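
A sketch of Step 1.2 on a few hypothetical pre-processed rows. Selecting the six columns in a fixed order matters: the same order must be used for testData later, and the number of columns (6) is what `input_dim` refers to in Step 1.3. `to_numpy()` returns the plain NumPy arrays that `fit` accepts.

```python
import pandas as pd

# Hypothetical pre-processed rows with the columns left after Part A
trainData = pd.DataFrame({
    "Survived": [0, 1, 1],
    "Pclass":   [3, 1, 3],
    "Sex":      [1, 0, 0],
    "Age":      [1, 2, 1],
    "Fare":     [0, 3, 1],
    "Embarked": [2, 0, 2],
    "Family":   [1, 1, 0],
})

feature_cols = ["Pclass", "Sex", "Age", "Fare", "Embarked", "Family"]
X = trainData[feature_cols].to_numpy()   # shape (n_samples, 6)
Y = trainData["Survived"].to_numpy()     # shape (n_samples,)
print(X.shape, Y.shape)
```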



#### Step 1.3
- Build a neural network to learn from the given training set, trainData.
(Reference: <a href="https://keras.io/guides/sequential_model/">Here</a>)
<ol>
  <li>Initialize a neural network using the Sequential() function and name the returned object NN.</li>
  <li>Add the input layer and the hidden layer using the add function of NN and the Dense function.<br/>
      Parameters of the Dense function:
      <ul>
          <li>Set units to 9: units is the number of nodes in this layer (this argument was called output_dim in Keras 1).</li>
          <li>Set kernel_initializer to 'random_uniform': kernel_initializer determines how the layer's weights are initialized before training with stochastic gradient descent.
          </li>
          <li>Set activation to 'sigmoid': activation is the activation function of the nodes.</li>
          <li>Set input_dim to 6: input_dim is the number of input features (the six columns of X); it is only needed for the first layer.</li>
      </ul>
  </li>
  <li>Add the output layer using the add function of NN and the Dense function.<br/>
      Parameters of the Dense function:
      <ul>
          <li>Set units to 1: units is the number of nodes in this layer (output_dim in Keras 1).</li>
          <li>Set kernel_initializer to 'random_uniform': the same weight initialization scheme as in the hidden layer.</li>
          <li>Set activation to 'sigmoid': the sigmoid squashes the output into the range 0 to 1, so it can be read as a survival probability.</li>
      </ul>
  </li>
  <li>Print a summary of the model by calling the summary function of NN.</li>
</ol>

For more information about the weight initializers, see <a href="https://keras.io/api/layers/initializers/">here</a>.

In [None]:
# Put your statements here
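
To make the `summary()` output easier to read, here is a NumPy sketch (not Keras code) of what the two Dense layers compute: each applies `activation(inputs @ weights + bias)`. The uniform range of -0.05 to 0.05 is assumed to mirror the default bounds of Keras' `RandomUniform` initializer, and the zero biases are assumed to mirror Dense's default bias initializer; check both against your Keras version. The parameter counts (6*9 + 9 = 63 and 9*1 + 1 = 10, so 73 in total) are what `summary()` should list for this architecture.

```python
import numpy as np

def sigmoid(z):
    """Logistic function: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(seed=0)
n_inputs, n_hidden, n_outputs = 6, 9, 1

# Assumed to mirror Keras defaults: uniform weights in [-0.05, 0.05], zero biases
W1 = rng.uniform(-0.05, 0.05, size=(n_inputs, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.uniform(-0.05, 0.05, size=(n_hidden, n_outputs))
b2 = np.zeros(n_outputs)

X = rng.integers(0, 4, size=(5, n_inputs)).astype(float)  # 5 made-up encoded passengers

hidden = sigmoid(X @ W1 + b1)       # hidden layer output, shape (5, 9)
output = sigmoid(hidden @ W2 + b2)  # output layer, shape (5, 1), values in (0, 1)

params_hidden = W1.size + b1.size   # 6*9 + 9 = 63
params_output = W2.size + b2.size   # 9*1 + 1 = 10
print(output.ravel(), params_hidden + params_output)

# Keras equivalent (not executed here):
# NN = Sequential()
# NN.add(Dense(units=9, kernel_initializer="random_uniform", activation="sigmoid", input_dim=6))
# NN.add(Dense(units=1, kernel_initializer="random_uniform", activation="sigmoid"))
# NN.summary()
```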




#### Step 1.4
- Compile the neural network and start training
<ol>
  <li>Call the compile function of NN to compile the neural network.<br/>
      Parameters of the compile function:
      <ul>
          <li>Set loss to 'binary_crossentropy': loss is the function that the optimizer minimizes during training. Since the dependent variable in our case is binary, we use the logarithmic loss function 'binary_crossentropy'.</li>
          <li>Set optimizer to 'Adam': optimizer is the algorithm used to find the optimal set of weights. For details about Adam, please see <a href="https://arxiv.org/abs/1412.6980v8">here</a>.
          </li>
          <li>Set metrics to ['accuracy']: metrics are the measures used to evaluate the model during training; they are reported but not directly optimized. In our case, the metric is accuracy.</li>
      </ul>
  </li>
  <li>Call the fit function of NN to train the model on the training data.<br/>
      Parameters of the fit function:
      <ul>
          <li>Set x to X: x is the numpy array of training data.</li>
          <li>Set y to Y: y is the numpy array of label data.</li>
          <li>Set batch_size to 32: batch_size is the number of samples per gradient update.</li>
          <li>Set epochs to 10000: epochs is the number of complete passes over the training data.</li>
      </ul>
  </li>
</ol>

In [None]:
# Put your statements here
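
The binary cross-entropy loss used in Step 1.4 averages $-[y \log p + (1-y) \log(1-p)]$ over the samples, where $y$ is the true label and $p$ the predicted probability. The NumPy sketch below computes it on hypothetical values and also shows how `batch_size` translates into gradient updates per epoch. The clipping epsilon of 1e-7 is an assumption chosen to avoid `log(0)`, and the 100-sample training set size is hypothetical.

```python
import math

import numpy as np

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean log loss for 0/1 labels; predictions are clipped to avoid log(0)."""
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

y_true = np.array([1.0, 0.0, 1.0, 0.0])   # hypothetical labels
y_pred = np.array([0.9, 0.1, 0.8, 0.3])   # hypothetical sigmoid outputs
loss = binary_crossentropy(y_true, y_pred)
print(f"loss = {loss:.4f}")

# Gradient updates per epoch with batch_size=32 for a hypothetical 100-sample set
n_samples, batch_size = 100, 32
updates_per_epoch = math.ceil(n_samples / batch_size)   # last batch is smaller

# Keras calls corresponding to Step 1.4 (not executed here):
# NN.compile(loss="binary_crossentropy", optimizer="Adam", metrics=["accuracy"])
# NN.fit(X, Y, batch_size=32, epochs=10000)
```

Confident, correct predictions (0.9 for a survivor) add little to the loss, while confident mistakes are penalized heavily, which is what pushes the weights toward better probabilities.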



### Step 2: Preparing the testing data

Steps for preprocessing the testing data:
<ol>
<li>Importing data and exploring the features.</li>
<li>Cleaning data: Handling missing values</li>
<li>Creating new features and dropping redundant features.</li>
<li>Transforming data.</li>
</ol>

#### Step 2.1 Importing data and exploring the features 
##### Step 2.1.1
Read the CSV file 'titanic-test.csv' into a DataFrame using pandas' read_csv function (<a href="https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html">pandas.read_csv</a>). Step 3 refers to this DataFrame as testData.

Note: Run a code cell by clicking on the cell and using the keyboard shortcut &lt;Shift&gt; + &lt;Enter&gt;.

In [None]:
# Put your statements here


##### Step 2.1.2
Use the head function (<a href="https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.head.html">pandas.DataFrame.head</a>) of the pandas library to preview the first 10 rows.

In [None]:
# Put your statements here


##### Step 2.1.3
Use the tail function (<a href="https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.tail.html">pandas.DataFrame.tail</a>) of the pandas library to preview the last 10 rows.

In [None]:
# Put your statements here


##### Step 2.1.4
Display information about the DataFrame using the info function (<a href="https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.info.html">pandas.DataFrame.info</a>) of the pandas library.

In [None]:
# Put your statements here


##### Step 2.1.5
Evaluate the data quality and assess missing values using the isnull function (<a href="https://pandas.pydata.org/pandas-docs/stable/generated/pandas.isnull.html">pandas.isnull</a>) and the sum function (<a href="https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.sum.html">pandas.DataFrame.sum</a>) of the pandas library.

In [None]:
# Put your statements here


##### Step 2.1.6
Evaluate the distribution of the categorical features (e.g., Name, Sex, Ticket, Cabin, and Embarked) using the describe function (<a href="https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.describe.html">pandas.DataFrame.describe</a>) of the pandas library.

In [None]:
# Put your statements here


#### Step 2.2 Cleaning Data: Handling missing values 
##### Step 2.2.1
As shown in Step 2.1.5, three features contain missing values.
For the feature with the second-fewest missing values, evaluate its distribution using the hist function (<a href="https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.hist.html">pandas.DataFrame.hist</a>) of the pandas library.

In [None]:
# Put your statements here




##### Step 2.2.2
- For the feature with the second-fewest missing values, use the mean to impute the missing values if the data is not skewed; otherwise, use the median.
- For the feature with the fewest missing values, impute the missing values with the mean.

###### Step 2.2.2.1
Compute the mean OR median of the feature with the second-fewest missing values using the mean function (<a href="https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.mean.html">pandas.DataFrame.mean</a>) / median function (<a href="https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.median.html">pandas.DataFrame.median</a>) of the pandas library.

Note: You have to skip all the missing values when computing the mean or median.

In [None]:
# Put your statement here


###### Step 2.2.2.2
Use the mean value of the feature with the fewest missing values to impute its missing values. As in Part A, the fillna function (<a href="https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.fillna.html">pandas.DataFrame.fillna</a>) of the pandas library can be used.

In [None]:
# Put your statement here


###### Step 2.2.2.3
Use the mean / median to impute the missing values of the feature with the second-fewest missing values. The fillna function (<a href="https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.fillna.html">pandas.DataFrame.fillna</a>) of the pandas library can be used.

In [None]:
# Put your statement here


#### Step 2.3: Creating new features and dropping redundant features
##### Step 2.3.1
Create a new feature called Family based on Parch and SibSp.
- Define a new feature 'Family' and assign it the sum of SibSp and Parch.

In [None]:
# Put your statement here


##### Step 2.3.2
Use the head function (<a href="https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.head.html">pandas.DataFrame.head</a>) of the pandas library to preview the first 10 rows.

In [None]:
# Put your statement here


##### Step 2.3.3
- PassengerId, Name, Ticket, and Cabin may be dropped (using the drop function (<a href="https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.drop.html">pandas.DataFrame.drop</a>) of the pandas library), as they are identifiers or free-text values (Ticket also has a high ratio of duplicates) and may not correlate with survival.
- In addition, SibSp and Parch can be dropped as the feature 'Family' has been defined.

In [None]:
# Put your statement here


##### Step 2.3.4
Use the head function (<a href="https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.head.html">pandas.DataFrame.head</a>) of the pandas library to preview the first 10 rows.

In [None]:
# Put your statements here


#### Step 2.4: Transforming data
##### Step 2.4.1
- Update the values of Family using the where function (<a href="https://docs.scipy.org/doc/numpy-1.14.0/reference/generated/numpy.where.html">numpy.where</a>) of the numpy library as follows:
<ul>
    <li>0 if the Family count is 0, which means single.</li>
    <li>1 if the Family count is greater than or equal to 1 AND less than or equal to 4, which means small family.</li>
    <li>2 if the Family count is greater than or equal to 5, which means large family.</li>
</ul>

In [None]:
# Put your statements here




##### Step 2.4.2
- Update the values of Sex to 1 if the original value is 'male'; otherwise, update them to 0.
- The where function (<a href="https://docs.scipy.org/doc/numpy-1.14.0/reference/generated/numpy.where.html">numpy.where</a>) of the numpy library can be used.

In [None]:
# Put your statement here


##### Step 2.4.3
- Update the Embarked values using the where function (<a href="https://docs.scipy.org/doc/numpy-1.14.0/reference/generated/numpy.where.html">numpy.where</a>) of the numpy library as follows:
<ul>
    <li>0 if the Embarked value is C.</li>
    <li>1 if the Embarked value is Q.</li>
    <li>2 if the Embarked value is S.</li>
</ul>

In [None]:
# Put your statements here




##### Step 2.4.4
- Update the Fare values using the where function (<a href="https://docs.scipy.org/doc/numpy-1.14.0/reference/generated/numpy.where.html">numpy.where</a>) of the numpy library as follows:
<ul>
    <li>0 if the Fare value is less than or equal to 7.91</li>
    <li>1 if the Fare value is greater than 7.91 AND less than or equal to 14.454</li>
    <li>2 if the Fare value is greater than 14.454 AND less than or equal to 31</li>
    <li>3 if the Fare value is greater than 31</li>
</ul>

In [None]:
# Put your statements here





##### Step 2.4.5
- Update the Age values using the where function (<a href="https://docs.scipy.org/doc/numpy-1.14.0/reference/generated/numpy.where.html">numpy.where</a>) of the numpy library as follows:
<ul>
    <li>0 if the Age value is less than or equal to 16</li>
    <li>1 if the Age value is greater than 16 AND less than or equal to 32</li>
    <li>2 if the Age value is greater than 32 AND less than or equal to 48</li>
    <li>3 if the Age value is greater than 48 AND less than or equal to 64</li>
    <li>4 if the Age value is greater than 64</li>
</ul>

In [None]:
# Put your statements here






##### Step 2.4.6
Change the Embarked data to int type using the astype function (<a href="https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.astype.html">pandas.DataFrame.astype</a>) of the pandas library.

In [None]:
# Put your statement here


##### Step 2.4.7
Change the Fare data from float to int type using the astype function (<a href="https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.astype.html">pandas.DataFrame.astype</a>) of the pandas library.

In [None]:
# Put your statement here


##### Step 2.4.8
Change the Age data from float to int type using the astype function (<a href="https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.astype.html">pandas.DataFrame.astype</a>) of the pandas library.

In [None]:
# Put your statement here


##### Step 2.4.9
Use the head function (<a href="https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.head.html">pandas.DataFrame.head</a>) of the pandas library to preview the first 20 rows.

In [None]:
# Put your statement here


### Step 3: Use the NN model to predict the results

Predict whether or not each passenger in the testing data survived the sinking of the Titanic.
<ol>
    <li>Use the predict function of NN to make predictions for testData. Because the output layer uses a sigmoid activation, the predictions lie between 0 and 1.</li>
    <li>If a predicted value is greater than 0.5, set it to 1; otherwise, set it to 0.
    </li>
    <li>Save the classification results to a CSV file: prediction-ann.csv (for example, you may use the pandas.DataFrame.to_csv function).</li>
</ol>

In [None]:
# Put your statements here
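
A sketch of the three steps on hypothetical sigmoid outputs. `predict` on a model with a one-unit output layer returns an array of shape (n, 1), so `ravel()` flattens it. Including a PassengerId column is an assumption, not a requirement stated above; if you want it, copy the IDs before PassengerId is dropped in Step 2.3.3. With no path argument, `to_csv` returns the CSV text, which lets the sketch run without writing a file.

```python
import numpy as np
import pandas as pd

# Hypothetical NN.predict(testData) output: one sigmoid probability per passenger
probs = np.array([[0.91], [0.12], [0.50], [0.73]])

# Strictly greater than 0.5 -> 1, otherwise 0 (so exactly 0.5 becomes 0)
labels = (probs > 0.5).astype(int).ravel()

passenger_ids = [1001, 1002, 1003, 1004]   # hypothetical IDs saved before dropping
result = pd.DataFrame({"PassengerId": passenger_ids, "Survived": labels})

csv_text = result.to_csv(index=False)      # returns a string when no path is given
print(csv_text)
# In the notebook: result.to_csv("prediction-ann.csv", index=False)
```

`index=False` keeps the DataFrame's row index out of the file, so the CSV contains only the named columns.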





### Submission
Submit your Jupyter notebook (.ipynb) and the classification results (prediction-ann.csv).