# Spaceship Titanic
Predict which passengers are transported to an alternate dimension

**Recommended Competition**  
We highly recommend [Titanic - Machine Learning from Disaster](https://kaggle.com/c/titanic/overview) to get familiar with the basics of machine learning and Kaggle competitions.

Welcome to the year 2912, where your data science skills are needed to solve a cosmic mystery. We've received a transmission from four lightyears away and things aren't looking good.

The _Spaceship Titanic_ was an interstellar passenger liner launched a month ago. With almost 13,000 passengers on board, the vessel set out on its maiden voyage transporting emigrants from our solar system to three newly habitable exoplanets orbiting nearby stars.

While rounding Alpha Centauri en route to its first destination—the torrid 55 Cancri E—the unwary _Spaceship Titanic_ collided with a spacetime anomaly hidden within a dust cloud. Sadly, it met a similar fate as its namesake from 1000 years before. Though the ship stayed intact, almost half of the passengers were transported to an alternate dimension!

![](https://storage.googleapis.com/kaggle-media/competitions/Spaceship%20Titanic/joel-filipe-QwoNAhbmLLo-unsplash.jpg)

To help rescue crews and retrieve the lost passengers, you are challenged to predict which passengers were transported by the anomaly using records recovered from the spaceship’s damaged computer system.

Help save them and change history!

### Acknowledgments

Photos by [Joel Filipe](https://unsplash.com/@joelfilip?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText), [Richard Gatley](https://unsplash.com/@uncle_rickie?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText) and [ActionVance](https://unsplash.com/@actionvance?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText) on Unsplash.

Link: https://www.kaggle.com/competitions/spaceship-titanic/overview

In this competition your task is to predict whether a passenger was transported to an alternate dimension during the _Spaceship Titanic_'s collision with the spacetime anomaly. To help you make these predictions, you're given a set of personal records recovered from the ship's damaged computer system.

File and Data Field Descriptions
================================

*   **train.csv** - Personal records for about two-thirds (~8700) of the passengers, to be used as training data.
    *   `PassengerId` - A unique Id for each passenger. Each Id takes the form `gggg_pp` where `gggg` indicates a group the passenger is travelling with and `pp` is their number within the group. People in a group are often family members, but not always.
    *   `HomePlanet` - The planet the passenger departed from, typically their planet of permanent residence.
    *   `CryoSleep` - Indicates whether the passenger elected to be put into suspended animation for the duration of the voyage. Passengers in cryosleep are confined to their cabins.
    *   `Cabin` - The cabin number where the passenger is staying. Takes the form `deck/num/side`, where `side` can be either `P` for _Port_ or `S` for _Starboard_.
    *   `Destination` - The planet the passenger will be debarking to.
    *   `Age` - The age of the passenger.
    *   `VIP` - Whether the passenger has paid for special VIP service during the voyage.
    *   `RoomService`, `FoodCourt`, `ShoppingMall`, `Spa`, `VRDeck` - Amount the passenger has billed at each of the _Spaceship Titanic_'s many luxury amenities.
    *   `Name` - The first and last names of the passenger.
    *   `Transported` - Whether the passenger was transported to another dimension. This is the target, the column you are trying to predict.
*   **test.csv** - Personal records for the remaining one-third (~4300) of the passengers, to be used as test data. Your task is to predict the value of `Transported` for the passengers in this set.
*   **sample\_submission.csv** - A submission file in the correct format.
    *   `PassengerId` - Id for each passenger in the test set.
    *   `Transported` - The target. For each passenger, predict either `True` or `False`.
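
The `PassengerId` and `Cabin` fields described above both pack several values into one string. A minimal sketch of how they could be unpacked, using only the formats stated in the field descriptions (`gggg_pp` and `deck/num/side`); the helper names are hypothetical and not part of the competition material:

```python
from typing import Optional, Tuple


def parse_passenger_id(passenger_id: str) -> Tuple[str, int]:
    """Split an Id of the form gggg_pp into (group, number within group)."""
    group, number = passenger_id.split("_")
    return group, int(number)


def parse_cabin(
    cabin: Optional[str],
) -> Tuple[Optional[str], Optional[int], Optional[str]]:
    """Split a cabin of the form deck/num/side; a missing cabin yields Nones."""
    if cabin is None:
        return None, None, None
    deck, num, side = cabin.split("/")
    return deck, int(num), side


print(parse_passenger_id("0003_02"))
print(parse_cabin("G/1499/S"))
print(parse_cabin(None))
```

Passengers that share the same group part of the Id travel together, so the group could serve as a feature of its own.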

In [1]:
import pandas as pd
import numpy as np
from sklearn import preprocessing
from sklearn.model_selection import train_test_split
from catboost import CatBoostClassifier, Pool, sum_models, to_classifier
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import accuracy_score

In [2]:
%load_ext nb_black

In [3]:
train_df = pd.read_csv("../data/spaceship-titanic/train.csv").set_index("PassengerId")
train_df

PassengerId,HomePlanet,CryoSleep,Cabin,Destination,Age,VIP,RoomService,FoodCourt,ShoppingMall,Spa,VRDeck,Name,Transported
0001_01,Europa,False,B/0/P,TRAPPIST-1e,39.0,False,0.0,0.0,0.0,0.0,0.0,Maham Ofracculy,False
0002_01,Earth,False,F/0/S,TRAPPIST-1e,24.0,False,109.0,9.0,25.0,549.0,44.0,Juanna Vines,True
0003_01,Europa,False,A/0/S,TRAPPIST-1e,58.0,True,43.0,3576.0,0.0,6715.0,49.0,Altark Susent,False
0003_02,Europa,False,A/0/S,TRAPPIST-1e,33.0,False,0.0,1283.0,371.0,3329.0,193.0,Solam Susent,False
0004_01,Earth,False,F/1/S,TRAPPIST-1e,16.0,False,303.0,70.0,151.0,565.0,2.0,Willy Santantines,True
...,...,...,...,...,...,...,...,...,...,...,...,...,...
9276_01,Europa,False,A/98/P,55 Cancri e,41.0,True,0.0,6819.0,0.0,1643.0,74.0,Gravior Noxnuther,False
9278_01,Earth,True,G/1499/S,PSO J318.5-22,18.0,False,0.0,0.0,0.0,0.0,0.0,Kurta Mondalley,False
9279_01,Earth,False,G/1500/S,TRAPPIST-1e,26.0,False,0.0,0.0,1872.0,1.0,0.0,Fayey Connon,True
9280_01,Europa,False,E/608/S,55 Cancri e,32.0,False,0.0,1049.0,0.0,353.0,3235.0,Celeon Hontichre,False


In [4]:
test_df = pd.read_csv("../data/spaceship-titanic/test.csv").set_index("PassengerId")
test_df

PassengerId,HomePlanet,CryoSleep,Cabin,Destination,Age,VIP,RoomService,FoodCourt,ShoppingMall,Spa,VRDeck,Name
0013_01,Earth,True,G/3/S,TRAPPIST-1e,27.0,False,0.0,0.0,0.0,0.0,0.0,Nelly Carsoning
0018_01,Earth,False,F/4/S,TRAPPIST-1e,19.0,False,0.0,9.0,0.0,2823.0,0.0,Lerome Peckers
0019_01,Europa,True,C/0/S,55 Cancri e,31.0,False,0.0,0.0,0.0,0.0,0.0,Sabih Unhearfus
0021_01,Europa,False,C/1/S,TRAPPIST-1e,38.0,False,0.0,6652.0,0.0,181.0,585.0,Meratz Caltilter
0023_01,Earth,False,F/5/S,TRAPPIST-1e,20.0,False,10.0,0.0,635.0,0.0,0.0,Brence Harperez
...,...,...,...,...,...,...,...,...,...,...,...,...
9266_02,Earth,True,G/1496/S,TRAPPIST-1e,34.0,False,0.0,0.0,0.0,0.0,0.0,Jeron Peter
9269_01,Earth,False,,TRAPPIST-1e,42.0,False,0.0,847.0,17.0,10.0,144.0,Matty Scheron
9271_01,Mars,True,D/296/P,55 Cancri e,,False,0.0,0.0,0.0,0.0,0.0,Jayrin Pore
9273_01,Europa,False,D/297/P,,,False,0.0,2680.0,0.0,0.0,523.0,Kitakan Conale


In [5]:
sample_submission_df = pd.read_csv(
    "../data/spaceship-titanic/sample_submission.csv"
).set_index("PassengerId")
sample_submission_df

PassengerId,Transported
0013_01,False
0018_01,False
0019_01,False
0021_01,False
0023_01,False
...,...
9266_02,False
9269_01,False
9271_01,False
9273_01,False


In [6]:
df = pd.concat([train_df, test_df])
df

PassengerId,HomePlanet,CryoSleep,Cabin,Destination,Age,VIP,RoomService,FoodCourt,ShoppingMall,Spa,VRDeck,Name,Transported
0001_01,Europa,False,B/0/P,TRAPPIST-1e,39.0,False,0.0,0.0,0.0,0.0,0.0,Maham Ofracculy,False
0002_01,Earth,False,F/0/S,TRAPPIST-1e,24.0,False,109.0,9.0,25.0,549.0,44.0,Juanna Vines,True
0003_01,Europa,False,A/0/S,TRAPPIST-1e,58.0,True,43.0,3576.0,0.0,6715.0,49.0,Altark Susent,False
0003_02,Europa,False,A/0/S,TRAPPIST-1e,33.0,False,0.0,1283.0,371.0,3329.0,193.0,Solam Susent,False
0004_01,Earth,False,F/1/S,TRAPPIST-1e,16.0,False,303.0,70.0,151.0,565.0,2.0,Willy Santantines,True
...,...,...,...,...,...,...,...,...,...,...,...,...,...
9266_02,Earth,True,G/1496/S,TRAPPIST-1e,34.0,False,0.0,0.0,0.0,0.0,0.0,Jeron Peter,
9269_01,Earth,False,,TRAPPIST-1e,42.0,False,0.0,847.0,17.0,10.0,144.0,Matty Scheron,
9271_01,Mars,True,D/296/P,55 Cancri e,,False,0.0,0.0,0.0,0.0,0.0,Jayrin Pore,
9273_01,Europa,False,D/297/P,,,False,0.0,2680.0,0.0,0.0,523.0,Kitakan Conale,


In [7]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 12970 entries, 0001_01 to 9277_01
Data columns (total 13 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   HomePlanet    12682 non-null  object 
 1   CryoSleep     12660 non-null  object 
 2   Cabin         12671 non-null  object 
 3   Destination   12696 non-null  object 
 4   Age           12700 non-null  float64
 5   VIP           12674 non-null  object 
 6   RoomService   12707 non-null  float64
 7   FoodCourt     12681 non-null  float64
 8   ShoppingMall  12664 non-null  float64
 9   Spa           12686 non-null  float64
 10  VRDeck        12702 non-null  float64
 11  Name          12676 non-null  object 
 12  Transported   8693 non-null   object 
dtypes: float64(6), object(7)
memory usage: 1.4+ MB


In [8]:
df.isna().sum() / len(df)

HomePlanet      0.022205
CryoSleep       0.023901
Cabin           0.023053
Destination     0.021126
Age             0.020817
VIP             0.022822
RoomService     0.020278
FoodCourt       0.022282
ShoppingMall    0.023593
Spa             0.021897
VRDeck          0.020663
Name            0.022668
Transported     0.329761
dtype: float64
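
The large missing share for `Transported` is not a data-quality problem: it is simply the test rows, which have no label after the concatenation. A quick arithmetic check, using the row counts reported by `df.info()` above:

```python
n_total = 12970  # rows in the concatenated frame (train + test)
n_labeled = 8693  # rows with a non-null Transported value (train only)

n_unlabeled = n_total - n_labeled
missing_share = n_unlabeled / n_total

print(n_unlabeled)
print(round(missing_share, 6))
```

The rounded share agrees with the `0.329761` reported for `Transported`, while every feature column is missing in only about 2% of the rows.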

In [9]:
le = preprocessing.LabelEncoder()

# Data preparation

## HomePlanet

In [10]:
df["HomePlanet"].value_counts(dropna=False, normalize=True)

Earth     0.529298
Europa    0.241557
Mars      0.206939
NaN       0.022205
Name: HomePlanet, dtype: float64

In [11]:
df[df["HomePlanet"].isna()]

PassengerId,HomePlanet,CryoSleep,Cabin,Destination,Age,VIP,RoomService,FoodCourt,ShoppingMall,Spa,VRDeck,Name,Transported
0064_02,,True,E/3/S,TRAPPIST-1e,33.0,False,0.0,0.0,,0.0,0.0,Colatz Keen,True
0119_01,,False,A/0/P,TRAPPIST-1e,39.0,False,0.0,2344.0,0.0,65.0,6898.0,Batan Coning,False
0210_01,,True,D/6/P,55 Cancri e,24.0,False,0.0,0.0,,0.0,0.0,Arraid Inicont,True
0242_01,,False,F/46/S,TRAPPIST-1e,18.0,False,313.0,1.0,691.0,283.0,0.0,Almone Sté,False
0251_01,,True,C/11/S,55 Cancri e,54.0,False,0.0,0.0,0.0,0.0,0.0,Diphah Amsive,True
...,...,...,...,...,...,...,...,...,...,...,...,...,...
8621_01,,False,E/552/P,TRAPPIST-1e,19.0,False,4.0,0.0,1604.0,0.0,0.0,Vanley Simmonders,
8678_01,,True,G/1399/S,55 Cancri e,9.0,False,0.0,0.0,0.0,0.0,0.0,Eilan Kellson,
8775_01,,True,D/275/P,TRAPPIST-1e,40.0,False,0.0,0.0,0.0,0.0,0.0,Raston Maltorted,
9025_01,,False,G/1454/S,TRAPPIST-1e,42.0,False,0.0,0.0,28.0,726.0,0.0,Ale Whitersone,


In [12]:
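# Fill missing values with the most frequent planet (Earth, ~53% of rows).
# Assigning the result back avoids a chained in-place fillna, which recent pandas versions warn about.
df["HomePlanet"] = df["HomePlanet"].fillna("Earth")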

In [13]:
df["HomePlanet"] = le.fit_transform(df["HomePlanet"])
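
`LabelEncoder` stores the sorted distinct labels and encodes each label as its position in that order. The sketch below reproduces that rule in plain Python to make the resulting codes visible; `label_codes` is a hypothetical helper, not a scikit-learn function, and the input list is a toy sample of the three planets that remain after the fill above:

```python
def label_codes(values):
    """Map each distinct value to its rank in sorted order."""
    classes = sorted(set(values))
    return {label: code for code, label in enumerate(classes)}


mapping = label_codes(["Europa", "Earth", "Mars", "Earth"])
print(mapping)
```

Because the same `le` object is refit on every column in this notebook, it only remembers the classes of the last column it saw. Note also that the codes are plain integers, so a model treats them as numeric values unless the columns are explicitly declared as categorical.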

## CryoSleep

In [14]:
df["CryoSleep"].value_counts(dropna=False, normalize=True)

False    0.622899
True     0.353200
NaN      0.023901
Name: CryoSleep, dtype: float64

In [15]:
# Treat missing values as the majority class (not in cryosleep, ~62% of rows).
df["CryoSleep"] = df["CryoSleep"].fillna(False)

In [16]:
df["CryoSleep"] = df["CryoSleep"].astype(int)

## Cabin

In [17]:
df[["Cabin_Deck", "Cabin_Num", "Cabin_Side"]] = df["Cabin"].str.split("/", expand=True)

In [18]:
df["Cabin_Deck"].value_counts(dropna=False, normalize=True)

F      0.326831
G      0.291519
E      0.102005
B      0.087972
C      0.084965
D      0.055513
A      0.027294
NaN    0.023053
T      0.000848
Name: Cabin_Deck, dtype: float64

In [19]:
df["Cabin_Deck"] = le.fit_transform(df["Cabin_Deck"])

In [20]:
df["Cabin_Num"] = df["Cabin_Num"].fillna(0).astype(int)

In [21]:
df["Cabin_Side"].value_counts(dropna=False, normalize=True)

S      0.491981
P      0.484965
NaN    0.023053
Name: Cabin_Side, dtype: float64

In [22]:
df["Cabin_Side"] = le.fit_transform(df["Cabin_Side"])

In [23]:
df.drop("Cabin", axis=1, inplace=True)

## Destination

In [24]:
df["Destination"].value_counts(dropna=False, normalize=True)

TRAPPIST-1e      0.683963
55 Cancri e      0.203624
PSO J318.5-22    0.091288
NaN              0.021126
Name: Destination, dtype: float64

In [25]:
df["Destination"] = le.fit_transform(df["Destination"])

## Age

In [26]:
df["Age"].describe()

count    12700.000000
mean        28.771969
std         14.387261
min          0.000000
25%         19.000000
50%         27.000000
75%         38.000000
max         79.000000
Name: Age, dtype: float64

In [27]:
# Note: 0 is also a valid age in this data (the minimum above is 0),
# so after this fill a missing age is indistinguishable from an infant.
df["Age"] = df["Age"].fillna(0)
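
Since 0 is a real age in the data, one alternative to the fill used here is to keep a separate missing-value flag and impute with the median. This is only a sketch of that alternative, not what the notebook does; the values are toy numbers rather than rows from the dataset, and `age_missing` is a hypothetical feature name:

```python
from statistics import median

# Toy ages with two missing entries (None); not taken from the dataset.
ages = [39.0, 24.0, None, 58.0, 0.0, None]

# The median is computed from the known values only.
known = [a for a in ages if a is not None]
fill_value = median(known)

# Keep the information that a value was missing, then impute.
age_missing = [int(a is None) for a in ages]
age_filled = [fill_value if a is None else a for a in ages]

print(fill_value)
print(age_missing)
print(age_filled)
```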

## VIP

In [28]:
df["VIP"].value_counts(dropna=False, normalize=True)

False    0.956130
NaN      0.022822
True     0.021049
Name: VIP, dtype: float64

In [29]:
df["VIP"] = le.fit_transform(df["VIP"])

## RoomService

In [30]:
df["RoomService"].describe()

count    12707.000000
mean       222.897852
std        647.596664
min          0.000000
25%          0.000000
50%          0.000000
75%         49.000000
max      14327.000000
Name: RoomService, dtype: float64

In [31]:
# Treat a missing amount as no spending.
df["RoomService"] = df["RoomService"].fillna(0)

## FoodCourt

In [32]:
df["FoodCourt"].describe()

count    12681.000000
mean       451.961675
std       1584.370747
min          0.000000
25%          0.000000
50%          0.000000
75%         77.000000
max      29813.000000
Name: FoodCourt, dtype: float64

In [33]:
# Treat a missing amount as no spending.
df["FoodCourt"] = df["FoodCourt"].fillna(0)

## ShoppingMall

In [34]:
df["ShoppingMall"].describe()

count    12664.000000
mean       174.906033
std        590.558690
min          0.000000
25%          0.000000
50%          0.000000
75%         29.000000
max      23492.000000
Name: ShoppingMall, dtype: float64

In [35]:
# Treat a missing amount as no spending.
df["ShoppingMall"] = df["ShoppingMall"].fillna(0)

## Spa

In [36]:
df["Spa"].describe()

count    12686.000000
mean       308.476904
std       1130.279641
min          0.000000
25%          0.000000
50%          0.000000
75%         57.000000
max      22408.000000
Name: Spa, dtype: float64

In [37]:
# Treat a missing amount as no spending.
df["Spa"] = df["Spa"].fillna(0)

## VRDeck

In [38]:
df["VRDeck"].describe()

count    12702.000000
mean       306.789482
std       1180.097223
min          0.000000
25%          0.000000
50%          0.000000
75%         42.000000
max      24133.000000
Name: VRDeck, dtype: float64

In [39]:
# Treat a missing amount as no spending.
df["VRDeck"] = df["VRDeck"].fillna(0)
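
All five amenity columns receive the same rule: a missing amount becomes 0. The sketch below applies that rule to one record at a time and adds a combined total, which the notebook itself does not compute; `fill_spend` and `TotalSpend` are hypothetical names. The two records reuse amounts from rows shown earlier (`0002_01` and `0064_02`):

```python
SPEND_COLUMNS = ["RoomService", "FoodCourt", "ShoppingMall", "Spa", "VRDeck"]


def fill_spend(record):
    """Return a copy with missing amounts set to 0 and a TotalSpend field added."""
    filled = dict(record)
    for col in SPEND_COLUMNS:
        if filled.get(col) is None:
            filled[col] = 0.0
    filled["TotalSpend"] = sum(filled[col] for col in SPEND_COLUMNS)
    return filled


awake = {"RoomService": 109.0, "FoodCourt": 9.0, "ShoppingMall": 25.0,
         "Spa": 549.0, "VRDeck": 44.0}
asleep = {"RoomService": 0.0, "FoodCourt": 0.0, "ShoppingMall": None,
          "Spa": 0.0, "VRDeck": 0.0}

print(fill_spend(awake)["TotalSpend"])
print(fill_spend(asleep)["TotalSpend"])
```

The zero fill is easiest to justify for passengers in cryosleep, who are confined to their cabins according to the field descriptions.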

## Name

In [40]:
df[["First_Name", "Last_Name"]] = df["Name"].str.split(" ", expand=True)

In [41]:
df["First_Name"].value_counts()

Luise     16
Idace     16
Ale       15
Thel      14
Kaye      14
          ..
Smark      1
Mureah     1
Grey       1
Pix        1
Perit      1
Name: First_Name, Length: 2883, dtype: int64

In [42]:
df["First_Name"] = le.fit_transform(df["First_Name"])

In [43]:
df["Last_Name"].value_counts()

Buckentry      19
Belley         19
Hinglendez     18
Fowlesterez    18
Casonston      18
               ..
Cabraseed       1
Miste           1
Imotive         1
Gepie           1
Replic          1
Name: Last_Name, Length: 2406, dtype: int64

In [44]:
df["Last_Name"] = le.fit_transform(df["Last_Name"])

In [45]:
df.drop("Name", axis=1, inplace=True)

# Preparing the datasets

In [46]:
X = df.drop("Transported", axis=1)
y = df[["Transported"]].fillna(False).astype(int)

X.shape, y.shape

((12970, 15), (12970, 1))

In [47]:
X_test = X[X.index.isin(test_df.index)]

X = X[X.index.isin(train_df.index)]
y = y[y.index.isin(train_df.index)]

X.shape, y.shape, X_test.shape

((8693, 15), (8693, 1), (4277, 15))

In [48]:
X_train, X_true, y_train, y_true = train_test_split(
    X, y, test_size=0.07, random_state=42
)
X_train.shape, X_true.shape, y_train.shape, y_true.shape

((8084, 15), (609, 15), (8084, 1), (609, 1))
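
The shapes above follow from how scikit-learn turns a fractional `test_size` into a row count: as far as I know, the test size is rounded up and the training set receives the remainder. A small arithmetic sketch under that assumption:

```python
import math

n_samples = 8693  # labeled rows available before the split
test_size = 0.07  # fraction passed to train_test_split

n_holdout = math.ceil(test_size * n_samples)  # fractional test size is rounded up
n_train = n_samples - n_holdout  # the rest goes to training

print(n_train, n_holdout)
```

Both numbers agree with the shapes printed by the cell.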

## Hyperparameter tuning

In [49]:
model = CatBoostClassifier(logging_level="Silent")

# https://effectiveml.com/using-grid-search-to-optimise-catboost-parameters.html
grid_params = {
    "depth": [3, 1, 2, 6, 4, 5, 7, 8, 9, 10],
    "iterations": [250, 100, 500, 1000],
    "learning_rate": [0.03, 0.001, 0.01, 0.1, 0.2, 0.3],
    "l2_leaf_reg": [3, 1, 5, 10, 100],
}

grid_search_result = model.grid_search(
    grid_params, Pool(X, y), cv=3, verbose=False, plot=True
)

In [50]:
best_model_params = grid_search_result["params"]
best_model_params

{'depth': 6, 'l2_leaf_reg': 3, 'iterations': 500, 'learning_rate': 0.1}
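
The search above is exhaustive, so its cost grows with the product of the list lengths in `grid_params`. A short sketch that counts the candidates, with the grid copied from the tuning cell:

```python
from math import prod

grid_params = {
    "depth": [3, 1, 2, 6, 4, 5, 7, 8, 9, 10],
    "iterations": [250, 100, 500, 1000],
    "learning_rate": [0.03, 0.001, 0.01, 0.1, 0.2, 0.3],
    "l2_leaf_reg": [3, 1, 5, 10, 100],
}

# Every combination of one value per parameter is a candidate.
n_candidates = prod(len(values) for values in grid_params.values())
print(n_candidates)
```

How many model fits this translates into depends on how `grid_search` evaluates each candidate, so the count is a lower bound on the work rather than an exact figure.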

## Training

In [51]:
skf = StratifiedKFold(n_splits=5)

In [52]:
ensemble = []

for train_index, val_index in skf.split(X, y):
    X_sub_train, X_sub_valid = X.iloc[train_index], X.iloc[val_index]
    y_sub_train, y_sub_valid = y.iloc[train_index], y.iloc[val_index]

    train_pool = Pool(X_sub_train, y_sub_train)
    valid_pool = Pool(X_sub_valid, y_sub_valid)

    model = CatBoostClassifier(**best_model_params)
    model.fit(train_pool, eval_set=valid_pool, verbose=False)

    ensemble.append(model)
    print(model.get_best_score())

{'learn': {'Logloss': 0.1787927612977998}, 'validation': {'Logloss': 0.4397337077553385}}
{'learn': {'Logloss': 0.18679608448173995}, 'validation': {'Logloss': 0.4231097509251857}}
{'learn': {'Logloss': 0.1933121723327204}, 'validation': {'Logloss': 0.4095773765990272}}
{'learn': {'Logloss': 0.2020050282650763}, 'validation': {'Logloss': 0.366081798668969}}
{'learn': {'Logloss': 0.18639086565988378}, 'validation': {'Logloss': 0.40688985358670043}}


In [53]:
models_avrg = sum_models(ensemble, weights=[1.0 / len(ensemble)] * len(ensemble))
models_avrg = to_classifier(models_avrg)
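
`sum_models` blends the trees of the fold models using the given weights, so with equal weights the blended model's raw score should be the average of the individual raw scores (log-odds), not the average of their probabilities. The sketch below illustrates the difference with plain Python; the five scores are hypothetical values for a single passenger, not output from the models above:

```python
import math


def sigmoid(z):
    """Convert a raw score (log-odds) to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical raw scores from five fold models for one passenger.
raw_scores = [1.2, -0.3, 0.8, 0.5, 0.1]
weights = [1.0 / len(raw_scores)] * len(raw_scores)

# Averaging in raw-score space, then converting once.
blended_raw = sum(w * s for w, s in zip(weights, raw_scores))
blended_prob = sigmoid(blended_raw)

# Averaging the per-model probabilities gives a different number.
mean_prob = sum(sigmoid(s) for s in raw_scores) / len(raw_scores)

predicted_label = int(blended_raw > 0)
print(blended_prob, mean_prob, predicted_label)
```

The two averages usually lead to the same class, but they are not interchangeable when probabilities themselves are reported.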

# Evaluating and saving results

In [54]:
y_pred = models_avrg.predict(X_true)
y_pred

array([0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1,
       1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0,
       1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 1,
       0, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1,
       1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0,
       0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0,
       1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1,
       0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1,
       1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 1, 0, 1,
       1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0,
       0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0,
       1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0,
       0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0,
       0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1,

In [55]:
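# Caveat: X_true is a subset of X, and X was used for both the grid search and the CV folds,
# so the models have already seen these rows and this accuracy is likely optimistic.
accuracy_score(y_true, y_pred)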

0.8045977011494253
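
This score should be read with care: `X_true` was split off from `X`, but the grid search and the cross-validation folds were still run on all of `X`. The sketch below shows the consequence with a toy index set and a hand-written, unshuffled k-fold generator (`kfold_indices` is a hypothetical helper, not the scikit-learn implementation): every row that stays in the fold data ends up in the training part of all folds but one.

```python
def kfold_indices(n_rows, n_splits):
    """Yield (train_idx, val_idx) for contiguous, unshuffled folds."""
    sizes = [n_rows // n_splits + (1 if i < n_rows % n_splits else 0)
             for i in range(n_splits)]
    start = 0
    for size in sizes:
        val_idx = list(range(start, start + size))
        val_set = set(val_idx)
        train_idx = [i for i in range(n_rows) if i not in val_set]
        yield train_idx, val_idx
        start += size


n_rows, n_splits = 20, 5
holdout = {3, 7, 15}  # toy rows set aside for evaluation but left in the fold data

# Count how many fold models get to train on each holdout row.
seen_in_training = {i: 0 for i in holdout}
for train_idx, _ in kfold_indices(n_rows, n_splits):
    for i in holdout & set(train_idx):
        seen_in_training[i] += 1

print(seen_in_training)
```

Running the grid search and the folds on `X_train`, `y_train` instead of `X`, `y` would keep `X_true` unseen until the final check.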

In [56]:
submission = pd.DataFrame(
    {
        "PassengerId": X_test.index,
        "Transported": models_avrg.predict(X_test).astype(bool),
    }
).set_index("PassengerId")
submission

PassengerId,Transported
0013_01,True
0018_01,False
0019_01,True
0021_01,True
0023_01,True
...,...
9266_02,True
9269_01,False
9271_01,True
9273_01,True


In [57]:
submission.to_csv("../data/spaceship-titanic/submission.csv")
