#### Copyright 2020 Google LLC.

In [None]:
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Classification Project

In this project you will apply what you have learned about classification and TensorFlow to complete a project from Kaggle. The challenge is to predict, as accurately as possible, which passengers survived the sinking of the Titanic. After building your model, you will upload your predictions to Kaggle and submit the score that you get.

## The Titanic Dataset

[Kaggle](https://www.kaggle.com) has a [dataset](https://www.kaggle.com/c/titanic/data) containing the Titanic's passenger list. The data includes passenger features such as age, gender, and ticket class, as well as whether each passenger survived.

Your job is to create a binary classifier using TensorFlow to determine if a passenger survived or not. The `Survived` column lets you know if the person survived. Then, upload your predictions to Kaggle and submit your accuracy score at the end of this Colab, along with a brief conclusion.


To get the dataset, you'll need to accept the competition's rules by clicking the "I understand and accept" button on the [competition rules page](https://www.kaggle.com/c/titanic/rules). Then upload your `kaggle.json` file and run the code below.

In [None]:
! chmod 600 kaggle.json && (ls ~/.kaggle 2>/dev/null || mkdir ~/.kaggle) && cp kaggle.json ~/.kaggle/ && echo 'Done'
! kaggle competitions download -c titanic
! ls

**Note: If you see a "403 - Forbidden" error above, you still need to click "I understand and accept" on the [competition rules page](https://www.kaggle.com/c/titanic/rules).**

Three files are downloaded:

1. `train.csv`: training data (contains features and targets)
1. `test.csv`: feature data used to make predictions to send to Kaggle
1. `gender_submission.csv`: an example competition submission file

## Step 1: Exploratory Data Analysis

Perform exploratory data analysis and data preprocessing. Use as many text and code blocks as you need to explore the data. Note any findings. Repair any data issues you find.
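
As a minimal sketch of what "note and repair" can look like, the block below counts missing values and fills them in. The tiny inline sample is hypothetical and only mimics a few columns of `train.csv`. Filling with the median and the most frequent value is one possible repair strategy, not the required one.

```python
import io
import pandas as pd

# Hypothetical five-row sample that mimics a few columns of train.csv.
csv_text = """PassengerId,Survived,Pclass,Sex,Age,Fare,Embarked
1,0,3,male,22,7.25,S
2,1,1,female,38,71.28,C
3,1,3,female,,7.92,
4,0,3,male,35,8.05,S
5,0,2,male,,13.00,S
"""
sample_df = pd.read_csv(io.StringIO(csv_text))

# Count missing values per column to see what needs repair.
missing_counts = sample_df.isna().sum()
print(missing_counts)

# One possible repair: median for the numeric Age column,
# most frequent value for the categorical Embarked column.
cleaned_df = sample_df.copy()
cleaned_df['Age'] = cleaned_df['Age'].fillna(cleaned_df['Age'].median())
cleaned_df['Embarked'] = cleaned_df['Embarked'].fillna(cleaned_df['Embarked'].mode()[0])

# Encode the categorical Sex column as 0/1 so a model can consume it.
cleaned_df['Sex'] = cleaned_df['Sex'].map({'male': 0, 'female': 1})
print(cleaned_df)
```

The same calls can be applied to `train_df` and `test_df` once they are loaded. Whichever fill values you pick, compute them from the training data and reuse them for the test data.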

**Student Solution**

In [None]:
# Colab magic that selects TensorFlow 2.x for this notebook.

%tensorflow_version 2.x

In [None]:
# Exploratory data analysis of the three files contained in titanic.zip.

import pandas as pd
import tensorflow as tf

# The example submission file: one PassengerId and one Survived value per row.
gender_df = pd.read_csv('gender_submission.csv')

# gender_df.dtypes
# gender_df.describe()
# gender_df.hist()
gender_df.head(10)

In [None]:
import pandas as pd
test_df = pd.read_csv('test.csv')

# Only the last expression in a cell is displayed, so print the earlier summaries.
print(test_df.dtypes)
print(test_df.describe())
# test_df.hist()
test_df.head(10)
# test_df.columns

In [None]:
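import pandas as pd
train_df = pd.read_csv('train.csv')

# Only the last expression in a cell is displayed, so print the earlier summaries.
print(train_df.dtypes)
print(train_df.describe())
# train_df.hist()
print(train_df.head(10))
# test_df.columns
# Rows of test.csv with no Cabin value (327 of 418).
test_df[test_df['Cabin'].isna()].head(25)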

In [None]:
# Summarize the example submission in two halves (rows 0-208 and rows 209-417).
# describe() reports on both the PassengerId and Survived columns of each half.
first_half = gender_df.iloc[0:209]
second_half = gender_df.iloc[209:418]
print(f'First half\n{first_half.describe()}\n')
print(f'Second half\n{second_half.describe()}\n')


---

## Step 2: The Model

Build, fit, and evaluate a classification model. Perform any model-specific data processing that you need to perform. If the toolkit you use supports it, create visualizations for loss and accuracy improvements. Use as many text and code blocks as you need to explore the data. Note any findings.
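
The block below is a minimal sketch of a binary classifier built with `tf.keras`. It trains on synthetic data so that it runs on its own. The four-column feature matrix, the layer sizes, and the epoch count are illustrative assumptions, not tuned values for the Titanic data.

```python
import numpy as np
import tensorflow as tf

# Synthetic stand-in for preprocessed features (hypothetical: 4 numeric columns).
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 4)).astype('float32')
# Synthetic binary target loosely tied to the first feature.
labels = (features[:, 0] + 0.5 * rng.normal(size=200) > 0).astype('float32')

# One hidden layer and a sigmoid output that yields a survival probability.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Hold out 20% of the rows for validation during training.
history = model.fit(
    features, labels, epochs=5, batch_size=32, validation_split=0.2, verbose=0
)

loss, accuracy = model.evaluate(features, labels, verbose=0)
print(f'loss={loss:.3f} accuracy={accuracy:.3f}')
print(sorted(history.history.keys()))
```

`history.history` holds one value per epoch for each metric, so its `loss` and `accuracy` lists can be passed to `plt.plot` to visualize training progress.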

**Student Solution**

In [None]:
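# Correlation heatmap for the example submission file.
import matplotlib.pyplot as plt
import seaborn as sns

plt.figure(figsize=(10, 10))

# Store the returned Axes in its own variable so gender_df is not overwritten.
ax = sns.heatmap(gender_df.corr(), cmap='coolwarm', annot=True)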


In [None]:
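from sklearn.model_selection import train_test_split

# Provisional column choice: numeric train.csv features with no missing values.
feature_columns = ['Pclass', 'SibSp', 'Parch', 'Fare']
target_column = 'Survived'

# Split the labeled training data (not the example submission) into train and test sets.
X_train, X_test, y_train, y_test = train_test_split(
  train_df[feature_columns],
  train_df[target_column],
  test_size=0.2,
  random_state=180,
  shuffle=True,
)
# Check the class balance of the training split.
y_train.groupby(y_train).count()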

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

plt.figure(figsize=(10, 10))
# Correlate numeric columns only; text columns such as Name cannot be correlated.
ax = sns.heatmap(train_df.select_dtypes('number').corr(), cmap='coolwarm', annot=True)


---

## Step 3: Make Predictions and Upload To Kaggle

In this step you will make predictions on the features found in the `test.csv` file and upload them to Kaggle using the [Kaggle API](https://github.com/Kaggle/kaggle-api). Use as many text and code blocks as you need to explore the data. Note any findings.
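
A submission file has the same two columns as `gender_submission.csv`: `PassengerId` and `Survived`. The sketch below builds one from hypothetical values. In the notebook, `passenger_ids` would come from `test.csv` and `predicted_probabilities` from your model's `predict` call. The 0.5 threshold and the `submission.csv` file name are assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for test.csv IDs and the model's sigmoid outputs.
passenger_ids = np.array([892, 893, 894, 895])
predicted_probabilities = np.array([[0.12], [0.64], [0.08], [0.51]])

# Threshold the probabilities at 0.5 to get 0/1 survival labels.
predicted_labels = (predicted_probabilities.ravel() > 0.5).astype(int)

submission_df = pd.DataFrame({
    'PassengerId': passenger_ids,
    'Survived': predicted_labels,
})

# Write without the index so the file contains only the two expected columns.
submission_df.to_csv('submission.csv', index=False)
print(submission_df)
```

The file can then be uploaded from a notebook cell with the Kaggle CLI, for example `! kaggle competitions submit -c titanic -f submission.csv -m "First attempt"`.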

**Student Solution**

In [None]:
# Your code goes here

What was your Kaggle score?

> *Record your score here*

---

## Step 4: Iterate on Your Model

In this step you're encouraged to play around with your model settings and to even try different models. See if you can get a better score. Use as many text and code blocks as you need to explore the data. Note any findings.
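
One simple way to iterate is to train the same architecture with several settings and compare validation accuracy. The sketch below varies the hidden layer width on synthetic data. The `build_model` helper, the candidate widths, and the data are all hypothetical and only illustrate the loop.

```python
import numpy as np
import tensorflow as tf

# Synthetic stand-in for preprocessed features and a binary target.
rng = np.random.default_rng(1)
features = rng.normal(size=(200, 4)).astype('float32')
labels = (features[:, 0] - features[:, 1] > 0).astype('float32')


def build_model(hidden_units):
    """Build a small binary classifier with one hidden layer (hypothetical helper)."""
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(hidden_units, activation='relu'),
        tf.keras.layers.Dense(1, activation='sigmoid'),
    ])
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
    return model


# Train one model per candidate width and record its final validation accuracy.
results = {}
for hidden_units in (4, 16, 64):
    tf.random.set_seed(0)  # Reduce run-to-run variation between candidates.
    model = build_model(hidden_units)
    history = model.fit(
        features, labels, epochs=5, validation_split=0.2, verbose=0
    )
    results[hidden_units] = history.history['val_accuracy'][-1]

best_units = max(results, key=results.get)
print(results)
print(f'Best hidden layer width: {best_units}')
```

The same loop works for other settings such as the learning rate, the number of epochs, or the set of feature columns. Record each configuration alongside its Kaggle score so the comparison is easy to summarize in your conclusion.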

**Student Solution**

In [None]:
# Your code goes here

---