# <div style="text-align: center">A Data Science Framework for Quora </div>

<img src='http://s9.picofile.com/file/8342477368/kq.png'>


 <a id="1"></a> <br>
## 1- Introduction
Quora has launched a competition on Kaggle with a realistic and attractive data set for data scientists.
In this notebook, I provide a comprehensive approach to solving the Quora classification problem.
I am open to your feedback for improving this **kernel**.


## Notebook  Content
1. [Introduction](#1)
1. [Data Science Workflow for Quora](#2)
1. [Problem Definition](#3)
    1. [Problem feature](#4)
    1. [Aim](#5)
    1.[Variables](#6)
    1. [ Inputs & Outputs](#7)
    1. [Inputs ](#8)
    1. [Outputs](#9)
    1. [Exploratory data analysis](#16)
1. [Data Collection](#17)
1. [Model Deployment](#32)
1. [Conclusion](#54)
1. [References](#55)

-------------------------------------------------------------------------------------------------------------

 **I hope you find this kernel helpful and some <font color="red"><b>UPVOTES</b></font> would be very much appreciated**
 
 -----------

<a id="2"></a> <br>
## 2- A Data Science Workflow for Quora
Of course, the same solution cannot be applied to every problem, so the best approach is to create a general framework and adapt it to each new problem.

**You can see my workflow in the image below**:

 <img src="http://s9.picofile.com/file/8338227634/workflow.png" />

**You should feel free to adapt this checklist to your needs.**

<a id="3"></a> <br>
## 3- Problem Definition
I think one of the most important things when you start a new machine learning project is defining your problem. That means you should understand the business problem (**problem formalization**).

**We will be predicting whether a question asked on Quora is sincere or not.**
### 3-1 Business View
An existential problem for any major website today is how to handle toxic and divisive content. **Quora** wants to tackle this problem head-on to keep their platform a place where users can feel safe sharing their knowledge with the world.

**Quora** is a platform that empowers people to learn from each other. On Quora, people can ask questions and connect with others who contribute unique insights and quality answers. A key challenge is to weed out insincere questions -- those founded upon false premises, or that intend to make a statement rather than look for helpful answers.

In this kernel, I will develop models that identify and flag insincere questions. This helps Quora uphold their policy of “Be Nice, Be Respectful” and continue to be a place for sharing and growing the world’s knowledge.

### 3-2 What is an insincere question?
An insincere question is defined as a question intended to make a statement rather than look for helpful answers.

### 3-3 How can we find an insincere question?
Some characteristics that can signify that a question is insincere:

1. Has a non-neutral tone
    1. Has an exaggerated tone to underscore a point about a group of people
    1. Is rhetorical and meant to imply a statement about a group of people
1. Is disparaging or inflammatory
    1. Suggests a discriminatory idea against a protected class of people, or seeks confirmation of a stereotype
    1. Makes disparaging attacks/insults against a specific person or group of people
    1. Based on an outlandish premise about a group of people
    1. Disparages against a characteristic that is not fixable and not measurable
1. Isn't grounded in reality
    1. Based on false information, or contains absurd assumptions
    1. Uses sexual content (incest, bestiality, pedophilia) for shock value, and not to seek genuine answers

<a id="4"></a> <br>
## 4- Problem Feature
The problem definition consists of the following three steps:

1. Aim
1. Variables
1. Inputs & Outputs





<a id="5"></a> <br>
### 4-1 Aim
We will be predicting whether a question asked on Quora is sincere or not.


<a id="6"></a> <br>
### 4-2 Variables

1. qid - unique question identifier
1. question_text - Quora question text
1. target - a question labeled "insincere" has a value of 1, otherwise 0

<a id="6"></a> <br>
### 4-3 Inputs & Outputs
We use train.csv and test.csv as inputs, and we should upload a submission.csv as the output.
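
To make the input/output contract concrete, here is a minimal sketch of the round trip from the two input files to a submission file. The rows are made up, the "model" is a placeholder that always predicts the most frequent label, and the `qid,prediction` layout of the submission is my assumption about the expected format, so check the competition page before relying on it.

```python
import io
import pandas as pd

# Toy stand-ins for train.csv and test.csv (made-up rows, not competition data).
train_csv = io.StringIO(
    "qid,question_text,target\n"
    "a1,How do I learn Python?,0\n"
    "a2,What is a good book on statistics?,0\n"
    "a3,Why is everyone except me so dumb?,1\n"
)
test_csv = io.StringIO(
    "qid,question_text\n"
    "b1,How does a neural network learn?\n"
    "b2,Is it true that all cats are evil?\n"
)
train_df = pd.read_csv(train_csv)
test_df = pd.read_csv(test_csv)

# Placeholder model: always predict the most frequent training label.
majority_label = int(train_df["target"].mode().iloc[0])

# Assumed submission layout: one row per test qid with a 0/1 prediction.
submission = pd.DataFrame({"qid": test_df["qid"], "prediction": majority_label})

# Write to an in-memory buffer here; a real run would write 'submission.csv'.
out = io.StringIO()
submission.to_csv(out, index=False)
print(out.getvalue())
```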


**<< Note >>**
> You must answer the following question:
How does your company expect to use and benefit from your model?

<a id="2"></a> <br>
## 5- Select Framework
After the problem definition and the problem features, we should select a framework to implement the solution.


### 5-1 Import

In [1]:
from sklearn.model_selection import train_test_split  # sklearn.cross_validation was removed in scikit-learn 0.20
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score
import matplotlib.pylab as pylab
import matplotlib.pyplot as plt
from pandas import get_dummies
import matplotlib as mpl
import seaborn as sns
import pandas as pd
import numpy as np
import matplotlib
import warnings
import sklearn
import scipy
import numpy
import json
import sys
import csv
import os



### 5-2 version

In [None]:
print('matplotlib: {}'.format(matplotlib.__version__))
print('sklearn: {}'.format(sklearn.__version__))
print('scipy: {}'.format(scipy.__version__))
print('seaborn: {}'.format(sns.__version__))
print('pandas: {}'.format(pd.__version__))
print('numpy: {}'.format(np.__version__))
print('Python: {}'.format(sys.version))


### 5-3 Setup

A few tiny adjustments for better **code readability**

In [5]:
sns.set(style='white', context='notebook', palette='deep')
pylab.rcParams['figure.figsize'] = 12,8
warnings.filterwarnings('ignore')
mpl.style.use('ggplot')
sns.set_style('white')
%matplotlib inline

<a id="16"></a> <br>
## 6- EDA
 In this section, you'll learn how to use graphical and numerical techniques to begin uncovering the structure of your data. 
 
* Which variables suggest interesting relationships?
* Which observations are unusual?
* Analysis of the features!

By the end of the section, you'll be able to answer these questions and more, while generating graphics that are both insightful and beautiful. Then we will review analytical and statistical operations:

*   6-1 Data Collection
*   6-2 Visualization
*   7 Data Preprocessing
*   8 Data Cleaning
<img src="http://s9.picofile.com/file/8338476134/EDA.png">

 

<a id="17"></a> <br>
## 6-1 Data Collection
**Data collection** is the process of gathering and measuring data, information or any variables of interest in a standardized and established manner that enables the collector to answer or test hypothesis and evaluate outcomes of the particular collection.[techopedia]

I start data collection by loading the training and testing datasets into **Pandas DataFrames**.


In [7]:
# import train and test to play with it
train = pd.read_csv('../input/train.csv')
test = pd.read_csv('../input/test.csv')

**<< Note 1 >>**

* Each **row** is an observation (also known as : sample, example, instance, record)
* Each **column** is a feature (also known as: Predictor, attribute, Independent Variable, input, regressor, Covariate)

After loading the data via **pandas**, we should check what it contains and how it is described, using the following commands:

In [None]:
type(train)

In [None]:
type(test)

In [None]:
train["target"].value_counts()
# the data is imbalanced

In [None]:
train_target = train['target'].values

np.unique(train_target)
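
To put a number on the imbalance rather than just noting it, the class shares can be computed directly. This is a small sketch on a made-up label column (the 9:1 ratio is illustrative, not the real Quora ratio); the same two calls can be run on `train["target"]`.

```python
import pandas as pd

# Made-up labels: 9 sincere (0) and 1 insincere (1) question.
toy_labels = pd.DataFrame({"target": [0] * 9 + [1] * 1})

counts = toy_labels["target"].value_counts()                # absolute counts per class
shares = toy_labels["target"].value_counts(normalize=True)  # fraction of rows per class

# How many majority-class rows there are for each minority-class row.
imbalance_ratio = counts.loc[0] / counts.loc[1]

print(shares)
print("majority : minority =", imbalance_ratio)
```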

<a id="18"></a> <br>
## 6-2 Visualization
**Data visualization**  is the presentation of data in a pictorial or graphical format. It enables decision makers to see analytics presented visually, so they can grasp difficult concepts or identify new patterns.

With interactive visualization, you can take the concept a step further by using technology to drill down into charts and graphs for more detail, interactively changing what data you see and how it’s processed.[SAS]

 In this section I show you **11 plots** with **matplotlib** and **seaborn** that are listed in the picture below:
 <img src="http://s8.picofile.com/file/8338475500/visualization.jpg" />


<a id="19"></a> <br>
### 6-2-1 Scatter plot

The purpose of a scatter plot is to identify the type of relationship (if any) between two quantitative variables.
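
A scatter plot needs two quantitative columns, while the Quora files contain only an id, the question text and a label. One possible way to get numeric columns is to derive simple features from the text. The sketch below is an assumption on my part, not part of the competition data: the column names `n_chars`, `n_words` and `n_question_marks` are hypothetical and the rows are made up.

```python
import pandas as pd

# Made-up questions standing in for train["question_text"].
questions = pd.DataFrame({
    "question_text": [
        "How do I learn Python?",
        "Why is the sky blue?",
        "WHY DOES EVERYONE HATE ME???",
    ],
    "target": [0, 0, 1],
})

# Hypothetical numeric features derived from the raw text.
questions["n_chars"] = questions["question_text"].str.len()
questions["n_words"] = questions["question_text"].str.split().str.len()
questions["n_question_marks"] = questions["question_text"].str.count(r"\?")

print(questions[["n_chars", "n_words", "n_question_marks", "target"]])
```

Columns like these can then be passed as the x and y arguments of the plotting calls in this section.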




In [None]:
# Scatter plot of Fare vs. Age, one panel per Pclass, colored by Survived.
# Note: these are Titanic column names; the Quora train.csv has only qid, question_text and target.
g = sns.FacetGrid(train, hue="Survived", col="Pclass", margin_titles=True,
                  palette={1:"seagreen", 0:"gray"})
g = g.map(plt.scatter, "Fare", "Age", edgecolor="w").add_legend()

<a id="20"></a> <br>
### 6-2-2 Box
In descriptive statistics, a **box plot** or boxplot is a method for graphically depicting groups of numerical data through their quartiles. Box plots may also have lines extending vertically from the boxes (whiskers) indicating variability outside the upper and lower quartiles, hence the terms box-and-whisker plot and box-and-whisker diagram.[wikipedia]
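
The numbers behind a box plot can be computed by hand, which helps when reading one. The sketch below uses made-up values and assumes the common 1.5 × IQR rule for the whiskers (matplotlib's default); other whisker conventions exist.

```python
import numpy as np

# Made-up sample with one obviously large value.
values = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 30], dtype=float)

# The box spans the first to third quartile, with a line at the median.
q1, median, q3 = np.percentile(values, [25, 50, 75])
iqr = q3 - q1

# Assumed whisker rule: points beyond 1.5 * IQR from the box are drawn as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = values[(values < lower_fence) | (values > upper_fence)]

print("Q1, median, Q3:", q1, median, q3)
print("fences:", lower_fence, upper_fence)
print("outliers:", outliers)
```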

In [None]:
train.plot(kind='box', subplots=True, layout=(2,4), sharex=False, sharey=False)
plt.show()
# This gives us a much clearer idea of the distribution of the input attributes.



In [None]:
# Box plot of Age for each Fare value:

sns.boxplot(x="Fare", y="Age", data=test )
plt.show()

In [None]:
# Use seaborn's stripplot to add data points on top of the box plot.
# Set jitter=True so that the data points stay scattered and do not pile up in a vertical line.
# Assign each plot to ax so that each one is drawn on top of the previous axes.

ax= sns.boxplot(x="Fare", y="Age", data=train)
ax= sns.stripplot(x="Fare", y="Age", data=train, jitter=True, edgecolor="gray")
plt.show()

In [None]:
# Tweak the plot above to change the fill and border colors using ax.artists.
# Assign an entry of ax.artists to a variable, with the box number in the brackets.

ax= sns.boxplot(x="Fare", y="Age", data=train)
ax= sns.stripplot(x="Fare", y="Age", data=train, jitter=True, edgecolor="gray")

boxtwo = ax.artists[2]
boxtwo.set_facecolor('red')
boxtwo.set_edgecolor('black')
boxthree=ax.artists[1]
boxthree.set_facecolor('yellow')
boxthree.set_edgecolor('black')

plt.show()

<a id="21"></a> <br>
### 6-2-3 Histogram
We can also create a **histogram** of each input variable to get an idea of the distribution.
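
Under the hood, a histogram is just a count of values per bin. A minimal sketch with NumPy on made-up word counts (the values, the three bins and the range are arbitrary choices for illustration):

```python
import numpy as np

# Made-up word counts for ten questions.
word_counts = np.array([3, 5, 5, 6, 7, 7, 7, 8, 9, 12])

# Three equal-width bins over [3, 12]; the last bin includes its right edge.
counts, edges = np.histogram(word_counts, bins=3, range=(3, 12))

for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"[{left:.0f}, {right:.0f}): {count}")
```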



In [None]:
# histograms
train.hist(figsize=(15,20))
plt.show()

It looks like perhaps two of the input variables have a Gaussian distribution. This is useful to note as we can use algorithms that can exploit this assumption.



In [None]:
train["Age"].hist();

In [None]:
f,ax=plt.subplots(1,2,figsize=(20,10))
train[train['Survived']==0].Age.plot.hist(ax=ax[0],bins=20,edgecolor='black',color='red')
ax[0].set_title('Survived= 0')
x1=list(range(0,85,5))
ax[0].set_xticks(x1)
train[train['Survived']==1].Age.plot.hist(ax=ax[1],color='green',bins=20,edgecolor='black')
ax[1].set_title('Survived= 1')
x2=list(range(0,85,5))
ax[1].set_xticks(x2)
plt.show()

In [None]:
f,ax=plt.subplots(1,2,figsize=(18,8))
train['Survived'].value_counts().plot.pie(explode=[0,0.1],autopct='%1.1f%%',ax=ax[0],shadow=True)
ax[0].set_title('Survived')
ax[0].set_ylabel('')
sns.countplot(x='Survived',data=train,ax=ax[1])
ax[1].set_title('Survived')
plt.show()

In [None]:
f,ax=plt.subplots(1,2,figsize=(18,8))
train[['Sex','Survived']].groupby(['Sex']).mean().plot.bar(ax=ax[0])
ax[0].set_title('Survived vs Sex')
sns.countplot(x='Sex',hue='Survived',data=train,ax=ax[1])
ax[1].set_title('Sex:Survived vs Dead')
plt.show()

<a id="22"></a> <br>
### 6-2-4 Multivariate Plots
Now we can look at the interactions between the variables.

First, let’s look at scatterplots of all pairs of attributes. This can be helpful to spot structured relationships between input variables.

In [None]:

# scatter plot matrix
pd.plotting.scatter_matrix(train,figsize=(10,10))
plt.show()

Note the diagonal grouping of some pairs of attributes. This suggests a high correlation and a predictable relationship.

<a id="23"></a> <br>
### 6-2-5 violinplots

In [None]:
# Violin plot of Age for each Fare value
sns.violinplot(data=train,x="Fare", y="Age")

In [None]:
f,ax=plt.subplots(1,2,figsize=(18,8))
sns.violinplot(x="Pclass", y="Age", hue="Survived", data=train, split=True, ax=ax[0])
ax[0].set_title('Pclass and Age vs Survived')
ax[0].set_yticks(range(0,110,10))
sns.violinplot(x="Sex", y="Age", hue="Survived", data=train, split=True, ax=ax[1])
ax[1].set_title('Sex and Age vs Survived')
ax[1].set_yticks(range(0,110,10))
plt.show()

<a id="24"></a> <br>
### 6-2-6 pairplot

In [None]:
# Using seaborn pairplot to see the bivariate relation between each pair of features
sns.pairplot(train, hue="Age")

In a pairplot, a class that is well separated from the others (as the setosa species is in the Iris dataset) appears as a distinct cluster across all feature combinations.

We can also replace the histograms shown in the diagonal of the pairplot by kde.

In [None]:
# updating the diagonal elements in a pairplot to show a kde
sns.pairplot(train, hue="Age",diag_kind="kde")

<a id="25"></a> <br>
###  6-2-7 kdeplot

In [None]:
# seaborn's kdeplot, plots univariate or bivariate density estimates.
# The size can be changed by tweaking the height value (called size before seaborn 0.9).
sns.FacetGrid(train, hue="Survived", height=5).map(sns.kdeplot, "Fare").add_legend()
plt.show()

<a id="26"></a> <br>
### 6-2-8 jointplot

In [None]:
# Use seaborn's jointplot to make a hexagonal bin plot.
# Set the desired height and ratio and choose a color.
sns.jointplot(x="Age", y="Survived", data=train, height=10, ratio=10, kind='hex', color='green')
plt.show()

<a id="27"></a> <br>
###  6-2-9 andrews_curves

In [None]:
# seaborn's jointplot with kind='kde' shows the bivariate density estimate together
# with the univariate kernel density estimates in the same figure
sns.jointplot(x="Age", y="Fare", data=train, height=6, kind='kde', color='#800000', space=0)

<a id="28"></a> <br>
### 6-2-10 Heatmap

In [None]:
plt.figure(figsize=(7,4)) 
# draw a heatmap of the correlation matrix of the numeric columns
sns.heatmap(train.select_dtypes(include='number').corr(), annot=True, cmap='cubehelix_r')
plt.show()

In [None]:
sns.heatmap(train.select_dtypes(include='number').corr(), annot=False, cmap='RdYlGn', linewidths=0.2)
fig=plt.gcf()
fig.set_size_inches(10,8)
plt.show()

###  6-2-11 Bar Plot

In [None]:
train['Pclass'].value_counts().plot(kind="bar");

### 6-2-12 Factorplot

In [None]:
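# factorplot was renamed to catplot in seaborn 0.9; kind='point' keeps the original look
sns.catplot(x='Pclass', y='Survived', hue='Sex', kind='point', data=train)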
plt.show()

### 6-2-13 distplot

In [None]:
f,ax=plt.subplots(1,3,figsize=(20,8))
sns.distplot(train[train['Pclass']==1].Fare,ax=ax[0])
ax[0].set_title('Fares in Pclass 1')
sns.distplot(train[train['Pclass']==2].Fare,ax=ax[1])
ax[1].set_title('Fares in Pclass 2')
sns.distplot(train[train['Pclass']==3].Fare,ax=ax[2])
ax[2].set_title('Fares in Pclass 3')
plt.show()

**<< Note >>**

**Yellowbrick** is a suite of visual diagnostic tools called “Visualizers” that extend the Scikit-Learn API to allow human steering of the model selection process. In a nutshell, Yellowbrick combines scikit-learn with matplotlib in the best tradition of the scikit-learn documentation, but to produce visualizations for your models! 

### 6-2-14 Conclusion
We have used Python to apply data visualization tools to the dataset. Color and size changes were made to the data points in the scatter plots. I changed the border and fill color of the box plot and violin plot, respectively.

<a id="30"></a> <br>
## 7 Data Preprocessing
**Data preprocessing** refers to the transformations applied to our data before feeding it to the algorithm.
 
Data preprocessing is a technique that is used to convert raw data into a clean data set. In other words, whenever data is gathered from different sources, it is collected in a raw format that is not suitable for analysis.

There are plenty of steps in data preprocessing; here are some of them:

* Removing the target column (id)
* Sampling (without replacement)
* Making part of the dataset unbalanced and then balancing it (with undersampling and SMOTE)
* Introducing missing values and treating them (replacing by average values)
* Noise filtering
* Data discretization
* Normalization and standardization
* PCA analysis
* Feature selection (filter, embedded, wrapper)
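
As an illustration of one item from the list above, here is a minimal sketch of random undersampling with pandas. The data is made up, and undersampling is only one of several balancing options (SMOTE, mentioned above, lives in the separate imbalanced-learn package and is not shown here).

```python
import pandas as pd

# Made-up imbalanced data: 8 rows of class 0 and 2 rows of class 1.
data = pd.DataFrame({
    "qid": [f"q{i}" for i in range(10)],
    "target": [0] * 8 + [1] * 2,
})

minority = data[data["target"] == 1]
majority = data[data["target"] == 0]

# Draw as many majority rows as there are minority rows, without replacement.
majority_down = majority.sample(n=len(minority), replace=False, random_state=0)

# Combine and shuffle so the classes are not grouped together.
balanced = (
    pd.concat([majority_down, minority])
    .sample(frac=1, random_state=0)
    .reset_index(drop=True)
)

print(balanced["target"].value_counts())
```

Undersampling throws away majority rows, so it trades information for balance; whether that trade is worthwhile depends on the model and the metric.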

### 7-1 Features
Features:
* numeric
* categorical
* ordinal
* datetime
* coordinates
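
In pandas, these feature types map roughly onto column dtypes. The sketch below is an illustration under assumptions: the frame is made up, the `Boarded` column is hypothetical, and the choice to treat `Pclass` as an ordered category is mine.

```python
import pandas as pd

# Made-up frame with one column per kind of feature.
frame = pd.DataFrame({
    "Age": [22.0, 38.0, 26.0],                        # numeric
    "Pclass": [3, 1, 3],                              # ordinal (1 < 2 < 3)
    "Sex": ["male", "female", "female"],              # categorical
    "Boarded": pd.to_datetime(["1912-04-10"] * 3),    # datetime (hypothetical column)
})

# Declare the categorical and ordinal columns explicitly.
frame["Sex"] = frame["Sex"].astype("category")
frame["Pclass"] = pd.Categorical(frame["Pclass"], categories=[1, 2, 3], ordered=True)

numeric_cols = frame.select_dtypes(include="number").columns.tolist()
categorical_cols = frame.select_dtypes(include="category").columns.tolist()
ordinal_cols = [c for c in categorical_cols if frame[c].cat.ordered]
datetime_cols = frame.select_dtypes(include="datetime").columns.tolist()

print("numeric:", numeric_cols)
print("categorical:", categorical_cols)
print("ordinal:", ordinal_cols)
print("datetime:", datetime_cols)
```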

Find the types of the features in the Titanic dataset:
<img src="http://s9.picofile.com/file/8339959442/titanic.png" height="700" width="600" />

### 7-2 Explore the Dataset
1- Dimensions of the dataset.

2- Peek at the data itself.

3- Statistical summary of all attributes.

4- Breakdown of the data by the class variable.[7]

Don’t worry, each look at the data is **one command**. These are useful commands that you can use again and again on future projects.

In [None]:
# shape
print(train.shape)

In [None]:
# total number of elements (rows * columns)
train.size

How many NA elements are there in every column?


In [None]:
train.isnull().sum()

In [None]:
# remove rows that have NA's
#train = train.dropna()


We can get a quick idea of how many instances (rows) and how many attributes (columns) the data contains with the shape property.

For the Quora training set, the three columns are the variables listed in section 4-2: qid, question_text and target.

To get some information about the dataset, you can use the **info()** command:

In [None]:
print(train.info())

You can see the unique values of a column with the command below:

In [None]:
train['Age'].unique()

In [None]:
train["Pclass"].value_counts()


To check the first 5 rows of the data set, we can use head(5).

In [None]:
train.head(5) 

To check the last 5 rows of the data set, we use the tail() function.

In [None]:
train.tail() 

To show 5 random rows from the data set, we can use the **sample(5)** function.

In [None]:
train.sample(5) 

To get a statistical summary of the dataset, we can use **describe()**.

In [None]:
train.describe() 

To check how many null values are in the dataset, we can use **isnull().sum()**.

In [None]:
train.isnull().sum()

In [None]:
train.groupby('Pclass').count()

To print the dataset **columns**, we can use the columns attribute.

In [None]:
train.columns

**<< Note 2 >>**
In a pandas DataFrame you can perform queries such as "where":

In [None]:
train.where(train ['Age']==30)

As you can see below, it is easy to perform queries on a DataFrame in Python:

In [None]:
train[train['Age']>7.2]
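
The two query styles above behave differently, which is easy to miss: `where` keeps the shape of the frame and blanks out non-matching rows, while a boolean mask drops them. A small sketch on a made-up frame (the column names mirror the examples above):

```python
import pandas as pd

# Made-up frame with the same column names used in the queries above.
people = pd.DataFrame({"Age": [22, 30, 41], "Fare": [7.25, 71.28, 8.05]})

# where(): same number of rows, non-matching rows are filled with NaN.
masked = people.where(people["Age"] == 30)

# Boolean indexing: only the matching rows are kept.
filtered = people[people["Age"] == 30]

print(masked)
print(filtered)
```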

In [None]:
# Separating the data into independent (X) and dependent (y) variables
X = train.iloc[:, :-1].values
y = train.iloc[:, -1].values

**<< Note >>**
>**Preprocessing and generation pipelines depend on a model type**

<a id="31"></a> <br>
## 8 Data Cleaning
When dealing with real-world data, dirty data is the norm rather than the exception. We continuously need to predict correct values, impute missing ones, and find links between various data artefacts such as schemas and records. We need to stop treating data cleaning as a piecemeal exercise (resolving different types of errors in isolation), and instead leverage all signals and resources (such as constraints, available statistics, and dictionaries) to accurately predict corrective actions.

The primary goal of data cleaning is to detect and remove errors and **anomalies** to increase the value of data in analytics and decision making. While it has been the focus of many researchers for several years, individual problems have been addressed separately. These include missing value imputation, outliers detection, transformations, integrity constraints violations detection and repair, consistent query answering, deduplication, and many other related problems such as profiling and constraints mining.[8]
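
For text data such as questions, a first cleaning pass often covers whitespace, casing, empty entries and duplicates. The sketch below is a minimal example under my own assumptions: `clean_question` is a hypothetical helper, the rows are made up, and lowercasing may not be desirable for every model.

```python
import re

def clean_question(text):
    """Normalize one question: trim, lowercase, collapse repeated whitespace."""
    text = text.strip().lower()
    return re.sub(r"\s+", " ", text)

# Made-up raw questions with extra spaces, a duplicate and an empty entry.
raw_questions = [
    "  How do I learn   Python? ",
    "how do i learn python?",
    "",
    "What is PCA?",
]

cleaned = [clean_question(q) for q in raw_questions]

# Deduplicate while keeping the original order, and drop empty strings.
seen = set()
unique_questions = []
for question in cleaned:
    if question and question not in seen:
        seen.add(question)
        unique_questions.append(question)

print(unique_questions)
```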

In [None]:
cols = train.columns
features = cols[0:12]
labels = cols[4]
print(features)
print(labels)

<a id="32"></a> <br>
## 9- Apply Learning
In this section, a number of **learning algorithms** are applied. They play an important role in building your experience and improving your knowledge of ML techniques.

> **<< Note 3 >>** : The results shown here may be slightly different for your analysis because, for example, the neural network algorithms use random number generators for fixing the initial value of the weights (starting points) of the neural networks, which often result in obtaining slightly different (local minima) solutions each time you run the analysis. Also note that changing the seed for the random number generator used to create the train, test, and validation samples can change your results.

In [None]:

X = train.iloc[:, :-1].values
y = train.iloc[:, -1].values

# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)
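
Because the target is imbalanced, a plain random split can leave the test set with too few positive examples. One option is the `stratify` argument of `train_test_split`, which keeps the class proportions the same in both parts. A sketch on made-up data (the sizes and the 25% test share are arbitrary):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up data: 20 samples, 4 of them positive (20%).
X_toy = np.arange(20).reshape(-1, 1)
y_toy = np.array([0] * 16 + [1] * 4)

# stratify=y_toy keeps the 20% positive share in both the train and test parts.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.25, random_state=0, stratify=y_toy
)

print("train positives:", y_tr.sum(), "of", len(y_tr))
print("test positives:", y_te.sum(), "of", len(y_te))
```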

## Accuracy and precision
* **precision**: In pattern recognition, information retrieval and binary classification, precision (also called positive predictive value) is the fraction of relevant instances among the retrieved instances.
* **recall**: Recall is the fraction of relevant instances that have been retrieved over the total amount of relevant instances.
* **F-score**: The F1 score is a measure of a test's accuracy. It considers both the precision p and the recall r of the test to compute the score: p is the number of correct positive results divided by the number of all positive results returned by the classifier, and r is the number of correct positive results divided by the number of all relevant samples (all samples that should have been identified as positive). The F1 score is the harmonic mean of the precision and recall, where an F1 score reaches its best value at 1 (perfect precision and recall) and worst at 0.

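
These definitions can be checked by hand on a small example. The sketch below uses made-up labels and plain Python so that each count is visible; scikit-learn's `classification_report`, imported earlier, reports the same quantities.

```python
# Made-up ground truth and predictions for eight questions (1 = insincere).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # correctly flagged
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # flagged but sincere
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed insincere
tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # correctly left alone

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of p and r
accuracy = (tp + tn) / len(pairs)

print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f} accuracy={accuracy:.3f}")
```

On imbalanced data, accuracy can look good even when precision and recall on the minority class are poor, which is why all four numbers are worth printing.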
**What is the difference between accuracy and precision?**

"Accuracy" and "precision" are general terms throughout science. A good way to internalize the difference are the common "bullseye diagrams". In machine learning/statistics as a whole, accuracy vs. precision is analogous to bias vs. variance.

<a id="33"></a> <br>
## 9-1 K-Nearest Neighbours
In **Machine Learning**, the **k-nearest neighbors algorithm** (k-NN) is a non-parametric method used for classification and regression. In both cases, the input consists of the k closest training examples in the feature space. The output depends on whether k-NN is used for classification or regression:

In k-NN classification, the output is a class membership. An object is classified by a majority vote of its neighbors, with the object being assigned to the class most common among its k nearest neighbors (k is a positive integer, typically small). If k = 1, then the object is simply assigned to the class of that single nearest neighbor.
In k-NN regression, the output is the property value for the object. This value is the average of the values of its k nearest neighbors.
k-NN is a type of instance-based learning, or lazy learning, where the function is only approximated locally and all computation is deferred until classification. The k-NN algorithm is among the simplest of all machine learning algorithms.
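
The majority-vote rule described above fits in a few lines of plain Python. This is a teaching sketch on made-up 2-D points with Euclidean distance, where `knn_predict` is a hypothetical helper; for real work, scikit-learn's `KNeighborsClassifier` is the usual choice.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    distances = []
    for point, label in zip(train_points, train_labels):
        # Euclidean distance between the training point and the query.
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(point, query)))
        distances.append((dist, label))
    distances.sort()  # nearest first
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Made-up training set: two well-separated clusters.
points = [(0, 0), (0, 1), (1, 0), (5, 5), (6, 5), (5, 6)]
labels = [0, 0, 0, 1, 1, 1]

print(knn_predict(points, labels, (0.5, 0.5), k=3))
print(knn_predict(points, labels, (5.5, 5.5), k=3))
```

Note that there is no training step: all the work happens at prediction time, which is what "lazy learning" refers to.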

-----------------
<a id="54"></a> <br>
# 10- Conclusion

This kernel is not complete yet. I have tried to cover all the parts of the process for the **Quora problem** with a variety of Python packages. I know that there are still some problems, so I hope to get your feedback to improve it.


You can fork and run this kernel on GitHub:
> ###### [ GitHub](https://github.com/mjbahmani/Machine-Learning-Workflow-with-Python)

--------------------------------------


<a id="55"></a> <br>

-----------

# 11- References
1. [https://skymind.ai/wiki/machine-learning-workflow](https://skymind.ai/wiki/machine-learning-workflow)

1. [Problem-define](https://machinelearningmastery.com/machine-learning-in-python-step-by-step/)

1. [Sklearn](http://scikit-learn.org/)

1. [machine-learning-in-python-step-by-step](https://machinelearningmastery.com/machine-learning-in-python-step-by-step/)

1. [Data Cleaning](http://wp.sigmod.org/?p=2288)

1. [competitive data science](https://www.coursera.org/learn/competitive-data-science/)

1. [Machine Learning Certification by Stanford University (Coursera)](https://www.coursera.org/learn/machine-learning/)

1. [Machine Learning A-Z™: Hands-On Python & R In Data Science (Udemy)](https://www.udemy.com/machinelearning/)

1. [Deep Learning Certification by Andrew Ng from deeplearning.ai (Coursera)](https://www.coursera.org/specializations/deep-learning)

1. Python for Data Science and Machine Learning Bootcamp (Udemy)

1. [Mathematics for Machine Learning by Imperial College London](https://www.coursera.org/specializations/mathematics-machine-learning)

1. [Deep Learning A-Z™: Hands-On Artificial Neural Networks](https://www.udemy.com/deeplearning/)

1. [Complete Guide to TensorFlow for Deep Learning Tutorial with Python](https://www.udemy.com/complete-guide-to-tensorflow-for-deep-learning-with-python/)

1. [Data Science and Machine Learning Tutorial with Python – Hands On](https://www.udemy.com/data-science-and-machine-learning-with-python-hands-on/)

1. [Machine Learning Certification by University of Washington](https://www.coursera.org/specializations/machine-learning)

1. [Data Science and Machine Learning Bootcamp with R](https://www.udemy.com/data-science-and-machine-learning-bootcamp-with-r/)

1. [Creative Applications of Deep Learning with TensorFlow](https://www.class-central.com/course/kadenze-creative-applications-of-deep-learning-with-tensorflow-6679)

1. [Neural Networks for Machine Learning](https://www.class-central.com/mooc/398/coursera-neural-networks-for-machine-learning)

1. [Practical Deep Learning For Coders, Part 1](https://www.class-central.com/mooc/7887/practical-deep-learning-for-coders-part-1)

1. [Machine Learning](https://www.cs.ox.ac.uk/teaching/courses/2014-2015/ml/index.html)

1. [https://www.kaggle.com/ash316/eda-to-prediction-dietanic](https://www.kaggle.com/ash316/eda-to-prediction-dietanic)

1. [https://www.kaggle.com/mrisdal/exploring-survival-on-the-titanic](https://www.kaggle.com/mrisdal/exploring-survival-on-the-titanic)

1. [https://www.kaggle.com/yassineghouzam/titanic-top-4-with-ensemble-modeling](https://www.kaggle.com/yassineghouzam/titanic-top-4-with-ensemble-modeling)

1. [https://www.kaggle.com/ldfreeman3/a-data-science-framework-to-achieve-99-accuracy](https://www.kaggle.com/ldfreeman3/a-data-science-framework-to-achieve-99-accuracy)

1. [https://www.kaggle.com/startupsci/titanic-data-science-solutions](https://www.kaggle.com/startupsci/titanic-data-science-solutions)

-------------


### The kernel is not complete and will be updated soon  !!!