# Get the Data

The data will be downloaded using [Kaggle's API](https://github.com/Kaggle/kaggle-api#api-credentials). For your own use, you will need to create an API key in **Account Settings**. On Windows, you should create the folder `.kaggle` inside your user folder and place the `kaggle.json` file downloaded from Kaggle in it.
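
Before calling the CLI, it can help to confirm that the credentials file is where the client will look. This is a minimal sketch, assuming the default location `~/.kaggle/kaggle.json`. The helper name `find_kaggle_credentials` is hypothetical and not part of the Kaggle package:

```python
from pathlib import Path


def find_kaggle_credentials(home=None):
    """Return the path to kaggle.json if it exists, else None.

    Assumes the default location <home>/.kaggle/kaggle.json.
    """
    home = Path(home) if home is not None else Path.home()
    candidate = home / ".kaggle" / "kaggle.json"
    return candidate if candidate.is_file() else None


path = find_kaggle_credentials()
if path is None:
    print("kaggle.json not found; download it from your Kaggle account settings.")
else:
    print("Found credentials at", path)
```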

After this, you can install and use the API.

In [1]:
!pip install kaggle



We can list all competitions.

In [2]:
!kaggle competitions list

502 - Bad Gateway


Check the data for our specific competition, in this case **titanic**.

In [3]:
!kaggle competitions files -c titanic

name                   size  creationDate         
---------------------  ----  -------------------  
train.csv              60KB  2013-06-28 13:40:25  
test.csv               28KB  2013-06-28 13:40:24  
gender_submission.csv   3KB  2017-02-01 01:49:18  


Finally, download the files (pass `-p data` to save them in a `data` folder).

In [4]:
!kaggle competitions download -c titanic

Downloading train.csv to D:\ARQUIVOS PESSOAIS\GitHub\TitaniK\.ipynb_checkpoints

Downloading test.csv to D:\ARQUIVOS PESSOAIS\GitHub\TitaniK\.ipynb_checkpoints

Downloading gender_submission.csv to D:\ARQUIVOS PESSOAIS\GitHub\TitaniK\.ipynb_checkpoints




  0%|          | 0.00/59.8k [00:00<?, ?B/s]
100%|##########| 59.8k/59.8k [00:00<00:00, 426kB/s]

  0%|          | 0.00/28.0k [00:00<?, ?B/s]
100%|##########| 28.0k/28.0k [00:00<00:00, 1.02MB/s]

  0%|          | 0.00/3.18k [00:00<?, ?B/s]
100%|##########| 3.18k/3.18k [00:00<00:00, 136kB/s]


# Visualizing the Data




In this guided project, we're going to put together all that we've learned in this course and create a data science workflow.

By defining a workflow for yourself, you can give yourself a framework with which to make iterating on ideas quicker and easier, allowing yourself to work more efficiently.

In this mission, we're going to explore a workflow to make competing in the Kaggle Titanic competition easier, using a pipeline of functions to reduce the number of dimensions you need to focus on.

To get started, we'll read in the original **train.csv** and **test.csv** files from Kaggle.



In [11]:
import pandas as pd

train = pd.read_csv("train.csv")
holdout = pd.read_csv("test.csv")

In [12]:
train.columns

Index(['PassengerId', 'Survived', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp',
       'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked'],
      dtype='object')

In [13]:
holdout.columns

Index(['PassengerId', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp', 'Parch',
       'Ticket', 'Fare', 'Cabin', 'Embarked'],
      dtype='object')

In [14]:
survived = train["Survived"]
train = train.drop("Survived", axis=1)

In [15]:
holdout.shape

(418, 11)

In [16]:
train.shape

(891, 11)

In [17]:
# Concatenate train and holdout so that both end up with the same columns after encoding
all_data = pd.concat([train, holdout], axis=0)

In [18]:
all_data.shape

(1309, 11)
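
The reason for stacking both frames before encoding can be shown on toy data. This is a sketch with made-up values, where `train_toy` and `holdout_toy` are hypothetical stand-ins for the real frames. A category that is missing from one part would otherwise produce a different set of dummy columns:

```python
import pandas as pd

# Toy stand-ins for the train and holdout frames (values are made up).
train_toy = pd.DataFrame({"PassengerId": [1, 2, 3],
                          "Sex": ["male", "female", "male"]})
holdout_toy = pd.DataFrame({"PassengerId": [4, 5],
                            "Sex": ["female", "female"]})

# Stack the rows; train rows come first, so position identifies the source.
combined = pd.concat([train_toy, holdout_toy], axis=0)

# Encoding the combined frame yields the same dummy columns for both parts,
# even though "male" never appears in the holdout rows.
encoded = pd.get_dummies(combined, columns=["Sex"])

# Split back by position, mirroring the row counts of the inputs.
n_train = len(train_toy)
train_enc = encoded.iloc[:n_train]
holdout_enc = encoded.iloc[n_train:]

print(list(train_enc.columns) == list(holdout_enc.columns))
```

Splitting by position works here because `pd.concat` keeps the row order of its inputs.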

In [19]:
all_data.dtypes

PassengerId      int64
Pclass           int64
Name            object
Sex             object
Age            float64
SibSp            int64
Parch            int64
Ticket          object
Fare           float64
Cabin           object
Embarked        object
dtype: object

In [20]:
all_data.head()

| | PassengerId | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.25 | NaN | S |
| 1 | 2 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.925 | NaN | S |
| 3 | 4 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1 | C123 | S |
| 4 | 5 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.05 | NaN | S |


# Preprocessing the Data

In [21]:
from sklearn.base import BaseEstimator, TransformerMixin

class DataFiller(BaseEstimator, TransformerMixin):
    """
    Applies data filling to NaN values in selected features.
    
    > cols_filler: dictionary with columns and filling values / strategy
    e.g. {"A": 0.5, "B": -2, "C": 'a', "D": 'mean'}
    """
    def __init__(self, cols_filler):
        """
        Initial input for the transformer.
        """
        self.cols_filler = cols_filler
    
    def fit(self, X, y=None):
        # Nothing to learn; y is accepted for scikit-learn API compatibility
        return self
    
    def transform(self, X):
        """
        Where the filling occurs.
        """
        # Work on a copy so the caller's DataFrame is not modified
        X = X.copy()
        for k, v in self.cols_filler.items():
            # Filling strategy
            if v == 'mean':
                filler = X[k].mean()
            elif v == 'median':
                filler = X[k].median()
            else:
                filler = v
            
            X[k] = X[k].fillna(filler)
            
        return X

class DataOneHotEncoding(BaseEstimator, TransformerMixin):
    """
    Applies One Hot Encoding to selected features.
    
    > cols: list of columns to perform one hot encoding.
    e.g. ["A", "B", "C"]
    """
    def __init__(self, cols, inplace=True):
        """
        Initial input for the transformer.
        """
        self.cols = cols
        self.inplace = inplace
    
    def fit(self, X, y=None):
        # Nothing to learn; signature matches the scikit-learn fit(X, y) convention
        return self
    
    def transform(self, X):
        """
        Where the encoding occurs.
        """
        for col in self.cols:
            # Get dummies columns
            dummies = pd.get_dummies(X[col], prefix=col)
            # Join with data X
            X = pd.concat([X, dummies], axis=1)
        # Remove old columns
        if self.inplace:
            print(X.shape)
            X = X.drop(self.cols, axis=1)
            print(X.shape)
            
        return X

class DataBinning(BaseEstimator, TransformerMixin):
    """
    Applies binning to selected features.
    
    > dict_of_cols: dictionary of dictionaries with cut_points and labels for each column.
    e.g. {"A":{'cut_points':[1,2,3], 'labels':['a', 'b']}, "B": {...}, ...}
    """
    def __init__(self, dict_of_cols, inplace=True):
        """
        Initial input for the transformer.
        """
        self.dict_cols = dict_of_cols
        self.inplace = inplace
    
    def fit(self, X, y=None):
        # Nothing to learn; signature matches the scikit-learn fit(X, y) convention
        return self
    
    def transform(self, X):
        """
        Where the binning occurs.
        """
        for k, v in self.dict_cols.items():
            # Cut points data
            cut_points = v['cut_points']
            
            # Labels data
            label_names = v['labels']
            
            # Creates new columns inplace
            if self.inplace:
                X[k] = pd.cut(X[k], cut_points, labels=label_names)
            else:
                k = k + '_binned'
                X[k] = pd.cut(X[k], cut_points, labels=label_names)
            
            # Set dtype to categorical
            X[k] = X[k].astype('category')
            
        return X
    
class DataProcess(BaseEstimator, TransformerMixin):
    """
    Applies application-specific process to selected features.
    
    """
    def __init__(self):
        pass
    
    def fit(self, X, y=None):
        return self
    
    def transform(self, X):
        """
        Where the processing occurs.
        """
        # Process Tickets column
        ticket_cod = []
        ticket_number = []
        for index, ticket in X.Ticket.items():
            if not ticket.isdigit():
                # Take prefix
                split = ticket.replace(".","").replace("/","").strip().split(' ')
                ticket_cod.append(split[0])
                # Take ticket number (-1 if it is missing or not an integer)
                try:
                    ticket_number.append(int(split[1]))
                except (IndexError, ValueError):
                    ticket_number.append(-1)
            else:
                ticket_cod.append("X")
                try:
                    ticket_number.append(int(ticket))
                except ValueError:
                    ticket_number.append(-1)
        X["Ticket_cod"] = ticket_cod
        X["Ticket_number"] = ticket_number
        X = X.drop('Ticket',axis=1)
        
        # Process titles
        titles = {"Mr" :         "Mr",
                  "Mme":         "Mrs",
                  "Ms":          "Mrs",
                  "Mrs" :        "Mrs",
                  "Master" :     "Master",
                  "Mlle":        "Miss",
                  "Miss" :       "Miss",
                  "Capt":        "Officer",
                  "Col":         "Officer",
                  "Major":       "Officer",
                  "Dr":          "Officer",
                  "Rev":         "Officer",
                  "Jonkheer":    "Royalty",
                  "Don":         "Royalty",
                  "Sir" :        "Royalty",
                  "Countess":    "Royalty",
                  "Dona":        "Royalty",
                  "Lady" :       "Royalty"}
        extracted_titles = X["Name"].str.extract(r' ([A-Za-z]+)\.', expand=False)
        X["Title"] = extracted_titles.map(titles)
        X = X.drop('Name',axis=1)
        
        # Process Cabin
        cabin_cod = []
        cabin_number = []
        for index, cabin in X.Cabin.items():
            if isinstance(cabin, str):
                # Take prefix (deck letter of the last cabin listed)
                split = cabin.strip().split(' ')[-1]
                cabin_cod.append(split[0])
                # Cabin number (-1 if it is missing or not an integer)
                try:
                    cabin_number.append(int(split[1:]))
                except ValueError:
                    cabin_number.append(-1)
            else:
                cabin_cod.append('Unknown') 
                cabin_number.append(-1)
        X["Cabin_type"] = cabin_cod
        X["Cabin_number"] = cabin_number
        X = X.drop('Cabin',axis=1)
        
        # Is alone
        X["Family_size"] = X[["SibSp","Parch"]].sum(axis=1)
        X["Alone"] = (X["Family_size"] == 0)
        
        # Is male
        X["Male"] = X["Sex"] == 'male'
        X = X.drop("Sex", axis=1)
        
        return X
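
The way such transformers chain together can be illustrated with a smaller example. This is a sketch on made-up data. `ConstantFiller` is a hypothetical toy class, not one of the transformers above, and it follows the scikit-learn convention of `fit(X, y=None)` so that `fit_transform` works on the whole pipeline:

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline


class ConstantFiller(BaseEstimator, TransformerMixin):
    """Hypothetical toy transformer: fill NaN in one column with a constant."""

    def __init__(self, column, value):
        self.column = column
        self.value = value

    def fit(self, X, y=None):
        # Nothing to learn; return self so calls can be chained.
        return self

    def transform(self, X):
        X = X.copy()  # avoid mutating the caller's frame
        X[self.column] = X[self.column].fillna(self.value)
        return X


toy = pd.DataFrame({"Age": [22.0, None, 35.0],
                    "Embarked": ["S", None, "C"]})

toy_pipeline = Pipeline([
    ("fill_age", ConstantFiller("Age", -0.5)),
    ("fill_embarked", ConstantFiller("Embarked", "S")),
])

# Each step receives the output of the previous one.
filled = toy_pipeline.fit_transform(toy)
print(filled)
```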

# Feature Engineering / Cleaning and Preprocessing

In [22]:
# input dictionaries
dict_fill = { "Fare": "median",
              "Embarked": "S",
              "Age": -0.5
            }
dict_binning = {"Age": {"cut_points": [-5, 0, 5, 12, 18, 35, 60, 100],
                         "labels": ["Missing", "Infant", "Child", "Teenager",
                                    "Young Adult", "Adult", "Senior"]},
                "Fare": {"cut_points": [-5, 12, 50, 100, 1000],
                         "labels": ["0-12","12-50","50-100","100+"]},
                "Cabin_number": {"cut_points": [-5, 0, 50, 100, 150, 200, 250, 300, 1000],
                                 "labels": ["Unknown", "0-50", "50-100",
                                            "100-150", "150-200", "200-250", "250-300", "300+"]},
                "Ticket_number": {"cut_points": [-5, 0, 2000, 10000, 50000,
                                                 250000, 500000, 10000000],
                                  "labels": ["Unknown", "0-2k", "2k-10k", "10k-50k",
                                             "50k-250k", "250k-500k", "500k+"]}
               }
one_hot_cols = ["Age", "Fare", "Embarked", 
                "Ticket_cod", "Ticket_number", "Title",
                "Cabin_type", "Cabin_number", "Pclass"
]   

# Pipeline definition
from sklearn.pipeline import Pipeline

pipeline = Pipeline([('filling', DataFiller(dict_fill)),
                     ('processing', DataProcess()),
                     ('binning', DataBinning(dict_binning)),
                     ('one_hot_encoding', DataOneHotEncoding(one_hot_cols))
])

transformed_data = pipeline.transform(all_data)

(1309, 99)
(1309, 90)
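
To see how the `cut_points` and `labels` in `dict_binning` map values to bins, here is a small sketch using the `Age` settings on made-up ages. By default `pd.cut` uses right-closed intervals, so the filler value `-0.5` lands in `(-5, 0]` and an age of exactly 12 is still a `Child`:

```python
import pandas as pd

# Made-up ages; -0.5 is the placeholder used for missing values.
ages = pd.Series([-0.5, 0.8, 12.0, 17.0, 30.0, 59.0, 80.0])

cut_points = [-5, 0, 5, 12, 18, 35, 60, 100]
labels = ["Missing", "Infant", "Child", "Teenager",
          "Young Adult", "Adult", "Senior"]

# Intervals are (-5, 0], (0, 5], (5, 12], (12, 18], (18, 35], (35, 60], (60, 100]
binned = pd.cut(ages, cut_points, labels=labels)

print(binned.tolist())
# ['Missing', 'Infant', 'Child', 'Teenager', 'Young Adult', 'Adult', 'Senior']
```

Values outside the outermost cut points become `NaN`, so the first and last cut points should cover the full range of the column.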


In [23]:
print(transformed_data.columns)
transformed_data


Index(['PassengerId', 'SibSp', 'Parch', 'Family_size', 'Alone', 'Male',
       'Age_Missing', 'Age_Infant', 'Age_Child', 'Age_Teenager',
       'Age_Young Adult', 'Age_Adult', 'Age_Senior', 'Fare_0-12', 'Fare_12-50',
       'Fare_50-100', 'Fare_100+', 'Embarked_C', 'Embarked_Q', 'Embarked_S',
       'Ticket_cod_A', 'Ticket_cod_A4', 'Ticket_cod_A5', 'Ticket_cod_AQ3',
       'Ticket_cod_AQ4', 'Ticket_cod_AS', 'Ticket_cod_C', 'Ticket_cod_CA',
       'Ticket_cod_CASOTON', 'Ticket_cod_FC', 'Ticket_cod_FCC',
       'Ticket_cod_Fa', 'Ticket_cod_LINE', 'Ticket_cod_LP', 'Ticket_cod_PC',
       'Ticket_cod_PP', 'Ticket_cod_PPP', 'Ticket_cod_SC', 'Ticket_cod_SCA3',
       'Ticket_cod_SCA4', 'Ticket_cod_SCAH', 'Ticket_cod_SCOW',
       'Ticket_cod_SCPARIS', 'Ticket_cod_SCParis', 'Ticket_cod_SOC',
       'Ticket_cod_SOP', 'Ticket_cod_SOPP', 'Ticket_cod_SOTONO2',
       'Ticket_cod_SOTONOQ', 'Ticket_cod_SP', 'Ticket_cod_STONO',
       'Ticket_cod_STONO2', 'Ticket_cod_STONOQ', 'Ticket_cod_SWPP',
    

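| | PassengerId | SibSp | Parch | Family_size | Alone | Male | Age_Missing | Age_Infant | Age_Child | Age_Teenager | ... | Cabin_number_0-50 | Cabin_number_50-100 | Cabin_number_100-150 | Cabin_number_150-200 | Cabin_number_200-250 | Cabin_number_250-300 | Cabin_number_300+ | Pclass_1 | Pclass_2 | Pclass_3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 0 | 1 | False | True | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 1 | 2 | 1 | 0 | 1 | False | False | 0 | 0 | 0 | 0 | ... | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 2 | 3 | 0 | 0 | 0 | True | False | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 3 | 4 | 1 | 0 | 1 | False | False | 0 | 0 | 0 | 0 | ... | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 4 | 5 | 0 | 0 | 0 | True | True | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 5 | 6 | 0 | 0 | 0 | True | True | 1 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 6 | 7 | 0 | 0 | 0 | True | True | 0 | 0 | 0 | 0 | ... | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 7 | 8 | 3 | 1 | 4 | False | True | 0 | 1 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 8 | 9 | 0 | 2 | 2 | False | False | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 9 | 10 | 1 | 0 | 1 | False | False | 0 | 0 | 0 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |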


In [24]:
# Check that all columns are numeric: if so, only the index is displayed
transformed_data.select_dtypes(['object', 'category']).head()


0
1
2
3
4
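
The same check can be written so that it returns the offending column names instead of relying on an empty display. This is a sketch on made-up data, and the helper name `non_numeric_columns` is hypothetical:

```python
import pandas as pd


def non_numeric_columns(df):
    """Return the names of columns whose dtype is not numeric (bool counts as numeric)."""
    return [col for col in df.columns
            if not pd.api.types.is_numeric_dtype(df[col])]


toy = pd.DataFrame({
    "SibSp": [1, 0],
    "Alone": [False, True],
    "Title": ["Mr", "Mrs"],
    "Age": pd.Series(["Adult", "Child"], dtype="category"),
})

print(non_numeric_columns(toy))  # ['Title', 'Age']
```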


# Transformed Data

In [25]:
# Train / hold out data
train = transformed_data.iloc[:891]
holdout = transformed_data.iloc[891:]

# X, y split
X = train
y = survived

# Find Best Models with GridSearchCV

In [26]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.feature_selection import RFECV
from sklearn.svm import SVC

def find_best_model(X, y, list_features='all'):
    
    all_X = X
    all_y = y

    # List of dictionaries, each containing a model name,
    # its estimator and a dict of hyperparameters
    models = [
        {
            "name": "LogisticRegression",
            "estimator": LogisticRegression(),
            "hyperparameters":
                {
                    "solver": ["newton-cg", "lbfgs", "liblinear"]
                }
        },
        {
            "name": "KNeighborsClassifier",
            "estimator": KNeighborsClassifier(),
            "hyperparameters":
                {
                    "n_neighbors": range(1, 50, 2),
                    "weights": ["distance", "uniform"],
                    "algorithm": ["ball_tree", "kd_tree", "brute"],
                    "p": [1, 2]
                }
        },
        {
            "name": "RandomForestClassifier",
            "estimator": RandomForestClassifier(random_state=1, n_estimators=100),
            "hyperparameters":
                {
                    "n_estimators": [100, 200],
                    "criterion": ["entropy", "gini"],
                    "max_depth": [10, 20],
                    "max_features": ["log2", "sqrt"],
                    "min_samples_leaf": [1, 2],
                    "min_samples_split": [2]
                }
        }
#         },
#         {
#             "name":"SVC",
#             "estimator":SVC(),
#             "hyperparameters":
#                 {
#                   "kernel": ['rbf', 'linear'],  
#                   "C": [0.1, 1],
#                   "gamma": [0.01, 0.1]
#                 }
#         },
#         {
#             "name": "PassiveAgressiveC",
#             "estimator": PassiveAggressiveClassifier(random_state=1),
#             "hyperparameters":
#                 {
#                     "C": [.5, 1, 1.5],
#                     "warm_start": [True, False]
#                 }
#         },
#         {
#             "name": "GaussianProcess",
#                 "estimator": GaussianProcessClassifier(),
#                 "hyperparameters":
#                 {
#                     "n_restarts_optimizer": [0, 1, 2],
#                     "warm_start": [True, False]
#                 }
#         }
    ]
    counter = 0
    for model in models:
        # Without feature selection
        if list_features == 'all':
            features = X.columns
        else:
            features = list_features[counter]
            counter += 1
        
        # Train multiple versions of the models
        grid = GridSearchCV(model["estimator"],
                            param_grid=model["hyperparameters"],
                            cv=3,
                            n_jobs=-1)
        grid.fit(all_X[features], all_y)
        
        # Saves the best results
        model["best_features"] = features
        model["best_params"] = grid.best_params_
        model["best_score"] = grid.best_score_
        model["best_model"] = grid.best_estimator_
        
        # Show best results
        print(model['name'])
        print('-'*len(model['name']))
        print("Number of Features: {}\n".format(len(features)))
        print("Best Score: {}\n".format(model["best_score"]))
        print("Best Parameters: {}\n".format(model["best_params"]))

    return models

best_models = find_best_model(X, y, 'all')


LogisticRegression
------------------
Number of Features: 90

Best Score: 0.813692480359147

Best Parameters: {'solver': 'newton-cg'}

KNeighborsClassifier
--------------------
Number of Features: 90

Best Score: 0.5925925925925926

Best Parameters: {'algorithm': 'ball_tree', 'n_neighbors': 37, 'p': 1, 'weights': 'uniform'}

RandomForestClassifier
----------------------
Number of Features: 90

Best Score: 0.8271604938271605

Best Parameters: {'criterion': 'entropy', 'max_depth': 10, 'max_features': 'log2', 'min_samples_leaf': 1, 'min_samples_split': 2, 'n_estimators': 100}
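
For readers new to `GridSearchCV`, the attributes stored by `find_best_model` can be seen on a smaller problem. This is a sketch on synthetic data from `make_classification`, not the Titanic data, and the grid over `C` is chosen only for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic data, used only to illustrate the API.
X_toy, y_toy = make_classification(n_samples=200, n_features=8, random_state=0)

grid = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.1, 1.0, 10.0]},
    cv=3,                # 3-fold cross-validation, as in the cell above
    scoring="accuracy",
)
grid.fit(X_toy, y_toy)

# best_score_ is the mean cross-validated score of the best parameter set;
# best_estimator_ is that configuration refit on all of the data.
print(grid.best_params_, round(grid.best_score_, 3))
```

Because the score is an average over only three folds, small differences between models should be read with some caution.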



# Find Best Features

In [27]:
def find_best_features(X, y, models):
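    """
    Find the best features using RFECV for each model.
    """
    features_list = []
    for model in models:
        # Find best features for each model's settings
        try:
            selector = RFECV(model["estimator"], cv=3, n_jobs=-1)
            selector.fit(X, y)
            features = list(X.columns[selector.support_])
        except (RuntimeError, ValueError):
            # Estimators without coef_ or feature_importances_ (e.g. KNN) cannot
            # be ranked by RFECV; the exception type depends on the scikit-learn
            # version, so keep every feature in either case
            features = list(X.columns)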
        
        # Saves results
        model["best_features"] = features
        features_list.append(features)
        
    return models, features_list

best_features, feat_list = find_best_features(X, y, best_models)

best_models = find_best_model(X, y, feat_list)


LogisticRegression
------------------
Number of Features: 90

Best Score: 0.813692480359147

Best Parameters: {'solver': 'newton-cg'}

KNeighborsClassifier
--------------------
Number of Features: 90

Best Score: 0.5925925925925926

Best Parameters: {'algorithm': 'ball_tree', 'n_neighbors': 37, 'p': 1, 'weights': 'uniform'}

RandomForestClassifier
----------------------
Number of Features: 32

Best Score: 0.8428731762065096

Best Parameters: {'criterion': 'entropy', 'max_depth': 10, 'max_features': 'log2', 'min_samples_leaf': 2, 'min_samples_split': 2, 'n_estimators': 200}
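
How `RFECV` produces the feature mask used above can be sketched on synthetic data. This example assumes a `LogisticRegression` estimator, which exposes `coef_`. An estimator such as `KNeighborsClassifier` exposes neither `coef_` nor `feature_importances_`, which is consistent with it keeping all 90 features in the output above:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression

# Synthetic data with a few informative features (illustration only).
X_toy, y_toy = make_classification(n_samples=200, n_features=10,
                                   n_informative=3, n_redundant=0,
                                   random_state=0)

selector = RFECV(LogisticRegression(max_iter=1000), cv=3)
selector.fit(X_toy, y_toy)

# support_ is a boolean mask over the input columns.
kept = [i for i, keep in enumerate(selector.support_) if keep]
print(selector.n_features_, kept)
```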



# Find Best Hyperparameters with RandomizedSearchCV

In [28]:
# Re-tune the same three estimators (LogisticRegression, KNeighborsClassifier,
# RandomForestClassifier) with a random search; SVC is left commented out
from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import randint, uniform

def train_multiple_models_random(X, y, selected_features, n_iter):
    
    all_X = X
    all_y = y

    # List of dictionaries, each containing a model name,
    # its estimator and a dict of hyperparameters
    new_models = [
        {
            "name": "LogisticRegression",
            "estimator": LogisticRegression(n_jobs=-1),
            "hyperparameters":
                {
                    "solver": ['newton-cg', 'lbfgs', 'liblinear', 'sag', 'saga'],
                    "class_weight": ['balanced', None],
                    "C": uniform(.1, 2),
                    "warm_start":[True, False],
                    "max_iter": randint(300, 10000),
                }
        },
        {
            "name": "KNeighborsClassifier",
            "estimator": KNeighborsClassifier(n_jobs=-1),
            "hyperparameters":
                {
                    "n_neighbors": randint(2, 100),
                    "weights": ["distance", "uniform"],
                    "algorithm": ["ball_tree", "kd_tree", "brute"],
                    "p": [1, 2],
                    "leaf_size": randint(10, 50)
                }
        },
        {
            "name": "RandomForestClassifier",
            "estimator": RandomForestClassifier(random_state=1),
            "hyperparameters":
                {
                  "max_depth": randint(3, 80),
                  "n_estimators": randint(200, 500),
                  "min_samples_split": randint(2, 40),
                  "min_samples_leaf": randint(1, 40),
                  "bootstrap": [True, False],
                  "max_features": ["log2", "sqrt"],
                }
        }
#        },
#         {
#             "name":"SVC",
#             "estimator":SVC(),
#             "hyperparameters":
#                 {
#                   "kernel": ['rbf', 'linear', 'poly', 'sigmoid'],  
#                   "degree": randint(2, 5),
#                   "coef0": uniform(-3., 3.),
#                   "C": [0.001, 0.01, .1, 1],
#                   "gamma": [0.001, 0.01, .1, 1]
#                 }
#         }
    ]

    for model in new_models:
        # Get features from previous training
        features = selected_features[model['name']]
        
        # Train multiple versions of the models
        randsearch = RandomizedSearchCV(model["estimator"],
                                  param_distributions =model["hyperparameters"],
                                  n_iter=n_iter[model['name']],
                                  cv=3,
                                  n_jobs=-1,
                                  scoring='accuracy'
                                 )
        randsearch.fit(all_X[features], all_y)
        
        # Saves the best results
        model["best_features"] = features
        model["best_params"] = randsearch.best_params_
        model["best_score"] = randsearch.best_score_
        model["best_model"] = randsearch.best_estimator_
        
        # Show best results
        print(model['name'])
        print('-'*len(model['name']))
        print("Number of Features: {}\n".format(len(features)))
        print("Best Score: {}\n".format(model["best_score"]))
        print("Best Parameters: {}\n".format(model["best_params"]))

    return new_models

n_iter = {"LogisticRegression": 300,
          "KNeighborsClassifier": 150,
          "RandomForestClassifier": 150}

selected_features = {model["name"]:model["best_features"] for model in best_features}
best_models_random = train_multiple_models_random(X, y, selected_features, n_iter)


LogisticRegression
------------------
Number of Features: 90

Best Score: 0.8170594837261503

Best Parameters: {'C': 0.7604402513801988, 'class_weight': None, 'max_iter': 5985, 'solver': 'lbfgs', 'warm_start': True}

KNeighborsClassifier
--------------------
Number of Features: 90

Best Score: 0.6195286195286195

Best Parameters: {'algorithm': 'brute', 'leaf_size': 18, 'n_neighbors': 63, 'p': 1, 'weights': 'uniform'}

RandomForestClassifier
----------------------
Number of Features: 32

Best Score: 0.8406285072951739

Best Parameters: {'bootstrap': False, 'max_depth': 67, 'max_features': 'sqrt', 'min_samples_leaf': 2, 'min_samples_split': 7, 'n_estimators': 246}
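
The SciPy distributions passed to `RandomizedSearchCV` take `loc`/`scale` and `low`/`high` arguments, which are easy to misread. A short sketch of the ranges they actually sample from:

```python
from scipy.stats import randint, uniform

# uniform(loc, scale) samples from [loc, loc + scale], so this is [0.1, 2.1].
c_dist = uniform(0.1, 2)

# randint(low, high) samples integers from low to high - 1, so this is 2..99.
n_dist = randint(2, 100)

c_samples = c_dist.rvs(size=1000, random_state=0)
n_samples = n_dist.rvs(size=1000, random_state=0)

print(c_samples.min() >= 0.1, c_samples.max() <= 2.1)  # True True
print(n_samples.min() >= 2, n_samples.max() <= 99)     # True True
```

`RandomizedSearchCV` draws `n_iter` parameter combinations from these distributions rather than trying every value, which is why the results can differ between runs unless `random_state` is set.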



In [29]:
def save_submission_file(holdout, model, filename):
    """
    Saves output for best models.
    """
    # Predictions for given input
    predictions = model["best_model"].predict(holdout[model['best_features']])
    
    holdout_ids = holdout["PassengerId"]
    submission_df = {"PassengerId": holdout_ids,
                     "Survived": predictions}
    
    submission = pd.DataFrame(submission_df)
    submission.to_csv(filename, index=False)
    
def get_best_model(list_models, top_k=1):
    """
    Returns top_k best models from grid and random search.
    
    > list_models: The list of models
    > top_k: the number of best models to be returned
    
    < list_estimator: List with the top_k model dictionaries, best score first
    """
    ranked = sorted(list_models, key=lambda k: k['best_score'], reverse=True)
    return ranked[:top_k]
    
# Models
model_grid = get_best_model(best_models)
model_rand = get_best_model(best_models_random)

save_submission_file(holdout, model_grid[0], "submission_15.csv")
save_submission_file(holdout, model_rand[0], "submission_16.csv")
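
Before submitting, the generated file can be sanity-checked. This is a sketch that assumes the two-column format written by `save_submission_file` (`PassengerId`, `Survived`). The helper name `check_submission` is hypothetical:

```python
import io

import pandas as pd


def check_submission(csv_text, expected_rows):
    """Return a list of problems found in a submission CSV (empty if none)."""
    df = pd.read_csv(io.StringIO(csv_text))
    problems = []
    if list(df.columns) != ["PassengerId", "Survived"]:
        problems.append("unexpected columns: {}".format(list(df.columns)))
    if len(df) != expected_rows:
        problems.append("expected {} rows, found {}".format(expected_rows, len(df)))
    if "Survived" in df.columns and not df["Survived"].isin([0, 1]).all():
        problems.append("Survived must contain only 0 or 1")
    return problems


# Made-up two-row example; the real file should have one row per holdout passenger.
good = "PassengerId,Survived\n892,0\n893,1\n"
print(check_submission(good, expected_rows=2))  # []
```

For the real files, the text could be read with `open("submission_15.csv").read()` and `expected_rows` set to `len(holdout)`.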


# Submissions

In [30]:
!kaggle competitions submit titanic -f submission_15.csv -m "Random Forest"

Successfully submitted to Titanic: Machine Learning from Disaster



  0%|          | 0.00/3.18k [00:00<?, ?B/s]
100%|##########| 3.18k/3.18k [00:00<00:00, 10.9kB/s]


In [31]:
!kaggle competitions submit titanic -f submission_16.csv -m "Logistic Regression"

Successfully submitted to Titanic: Machine Learning from Disaster



  0%|          | 0.00/3.18k [00:00<?, ?B/s]
100%|##########| 3.18k/3.18k [00:00<00:00, 14.2kB/s]
