# Spaceship Titanic

### Predict which passengers are transported to an alternate dimension

Welcome to the year 2912, where your data science skills are needed to solve a cosmic mystery. We've received a transmission from four lightyears away and things aren't looking good.

The Spaceship Titanic was an interstellar passenger liner launched a month ago. With almost 13,000 passengers on board, the vessel set out on its maiden voyage transporting emigrants from our solar system to three newly habitable exoplanets orbiting nearby stars.

While rounding Alpha Centauri en route to its first destination, the torrid 55 Cancri e, the unwary Spaceship Titanic collided with a spacetime anomaly hidden within a dust cloud. Sadly, it met a fate similar to that of its namesake from 1000 years before. Though the ship stayed intact, almost half of the passengers were transported to an alternate dimension!

To help rescue crews and retrieve the lost passengers, you are challenged to predict which passengers were transported by the anomaly using records recovered from the spaceship’s damaged computer system.

Help save them and change history!

## Dataset Description

In this competition your task is to predict whether a passenger was transported to an alternate dimension during the Spaceship Titanic's collision with the spacetime anomaly. To help you make these predictions, you're given a set of personal records recovered from the ship's damaged computer system.

## File and Data Field Descriptions

**train.csv** - Personal records for about two-thirds (~8700) of the passengers, to be used as training data.

- **PassengerId** - A unique Id for each passenger. Each Id takes the form gggg_pp where gggg indicates a group the passenger is travelling with and pp is their number within the group. People in a group are often family members, but not always.
- **HomePlanet** - The planet the passenger departed from, typically their planet of permanent residence.
- **CryoSleep** - Indicates whether the passenger elected to be put into suspended animation for the duration of the voyage. Passengers in cryosleep are confined to their cabins.
- **Cabin** - The cabin number where the passenger is staying. Takes the form deck/num/side, where side can be either P for Port or S for Starboard.
- **Destination** - The planet the passenger will be debarking to.
- **Age** - The age of the passenger.
- **VIP** - Whether the passenger has paid for special VIP service during the voyage.
- **RoomService, FoodCourt, ShoppingMall, Spa, VRDeck** - Amount the passenger has billed at each of the Spaceship Titanic's many luxury amenities.
- **Name** - The first and last names of the passenger.
- **Transported** - Whether the passenger was transported to another dimension. This is the target, the column you are trying to predict.

**test.csv** - Personal records for the remaining one-third (~4300) of the passengers, to be used as test data. Your task is to predict the value of Transported for the passengers in this set.

**sample_submission.csv** - A submission file in the correct format.

- **PassengerId** - Id for each passenger in the test set.
- **Transported** - The target. For each passenger, predict either True or False.
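Because each `PassengerId` has the form `gggg_pp`, the group and the within-group number can be separated with a string split. The following is a minimal sketch on three Ids taken from the training data; the column names `Group`, `GroupNumber` and `GroupSize` are illustrative assumptions, not fields of the dataset.

```python
import pandas as pd

ids = pd.DataFrame({"PassengerId": ["0001_01", "0003_01", "0003_02"]})

# Split "gggg_pp" into the group id and the passenger's number within the group.
parts = ids["PassengerId"].str.split("_", expand=True)
ids["Group"] = parts[0]
ids["GroupNumber"] = parts[1].astype(int)

# Hypothetical derived feature: how many passengers share each group.
ids["GroupSize"] = ids.groupby("Group")["PassengerId"].transform("count")
print(ids)
```

Keeping `Group` as a string preserves the leading zeros, so it can still be matched against the original Ids.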

### Importing all packages

In [272]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import *
from sklearn.linear_model import *
from math import *
from sklearn.ensemble import *
from sklearn.feature_selection import *
from sklearn.feature_extraction import *
from sklearn.naive_bayes import *
from sklearn.discriminant_analysis import *
from sklearn.preprocessing import *
from sklearn.metrics import *
from sklearn.neighbors import *
from sklearn.cluster import *

### Importing all datasets

In [273]:
df_train = pd.read_csv("train.csv")
df_test = pd.read_csv("test.csv")

### Showing the first 5 rows of the training dataset

In [274]:
df_train.head()

| | PassengerId | HomePlanet | CryoSleep | Cabin | Destination | Age | VIP | RoomService | FoodCourt | ShoppingMall | Spa | VRDeck | Name | Transported |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0001_01 | Europa | False | B/0/P | TRAPPIST-1e | 39.0 | False | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | Maham Ofracculy | False |
| 1 | 0002_01 | Earth | False | F/0/S | TRAPPIST-1e | 24.0 | False | 109.0 | 9.0 | 25.0 | 549.0 | 44.0 | Juanna Vines | True |
| 2 | 0003_01 | Europa | False | A/0/S | TRAPPIST-1e | 58.0 | True | 43.0 | 3576.0 | 0.0 | 6715.0 | 49.0 | Altark Susent | False |
| 3 | 0003_02 | Europa | False | A/0/S | TRAPPIST-1e | 33.0 | False | 0.0 | 1283.0 | 371.0 | 3329.0 | 193.0 | Solam Susent | False |
| 4 | 0004_01 | Earth | False | F/1/S | TRAPPIST-1e | 16.0 | False | 303.0 | 70.0 | 151.0 | 565.0 | 2.0 | Willy Santantines | True |


### Showing the first 5 rows of the testing dataset

In [275]:
df_test.head()

| | PassengerId | HomePlanet | CryoSleep | Cabin | Destination | Age | VIP | RoomService | FoodCourt | ShoppingMall | Spa | VRDeck | Name |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0013_01 | Earth | True | G/3/S | TRAPPIST-1e | 27.0 | False | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | Nelly Carsoning |
| 1 | 0018_01 | Earth | False | F/4/S | TRAPPIST-1e | 19.0 | False | 0.0 | 9.0 | 0.0 | 2823.0 | 0.0 | Lerome Peckers |
| 2 | 0019_01 | Europa | True | C/0/S | 55 Cancri e | 31.0 | False | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | Sabih Unhearfus |
| 3 | 0021_01 | Europa | False | C/1/S | TRAPPIST-1e | 38.0 | False | 0.0 | 6652.0 | 0.0 | 181.0 | 585.0 | Meratz Caltilter |
| 4 | 0023_01 | Earth | False | F/5/S | TRAPPIST-1e | 20.0 | False | 10.0 | 0.0 | 635.0 | 0.0 | 0.0 | Brence Harperez |


### Exploratory Data Analysis for the training dataset

In [276]:
df_train.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8693 entries, 0 to 8692
Data columns (total 14 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   PassengerId   8693 non-null   object 
 1   HomePlanet    8492 non-null   object 
 2   CryoSleep     8476 non-null   object 
 3   Cabin         8494 non-null   object 
 4   Destination   8511 non-null   object 
 5   Age           8514 non-null   float64
 6   VIP           8490 non-null   object 
 7   RoomService   8512 non-null   float64
 8   FoodCourt     8510 non-null   float64
 9   ShoppingMall  8485 non-null   float64
 10  Spa           8510 non-null   float64
 11  VRDeck        8505 non-null   float64
 12  Name          8493 non-null   object 
 13  Transported   8693 non-null   bool   
dtypes: bool(1), float64(6), object(7)
memory usage: 891.5+ KB


In [277]:
df_train.describe()

| | Age | RoomService | FoodCourt | ShoppingMall | Spa | VRDeck |
|---|---|---|---|---|---|---|
| count | 8514.0 | 8512.0 | 8510.0 | 8485.0 | 8510.0 | 8505.0 |
| mean | 28.82793 | 224.687617 | 458.077203 | 173.729169 | 311.138778 | 304.854791 |
| std | 14.489021 | 666.717663 | 1611.48924 | 604.696458 | 1136.705535 | 1145.717189 |
| min | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 25% | 19.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 50% | 27.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 75% | 38.0 | 47.0 | 76.0 | 27.0 | 59.0 | 46.0 |
| max | 79.0 | 14327.0 | 29813.0 | 23492.0 | 22408.0 | 24133.0 |


In [278]:
df_train

| | PassengerId | HomePlanet | CryoSleep | Cabin | Destination | Age | VIP | RoomService | FoodCourt | ShoppingMall | Spa | VRDeck | Name | Transported |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0001_01 | Europa | False | B/0/P | TRAPPIST-1e | 39.0 | False | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | Maham Ofracculy | False |
| 1 | 0002_01 | Earth | False | F/0/S | TRAPPIST-1e | 24.0 | False | 109.0 | 9.0 | 25.0 | 549.0 | 44.0 | Juanna Vines | True |
| 2 | 0003_01 | Europa | False | A/0/S | TRAPPIST-1e | 58.0 | True | 43.0 | 3576.0 | 0.0 | 6715.0 | 49.0 | Altark Susent | False |
| 3 | 0003_02 | Europa | False | A/0/S | TRAPPIST-1e | 33.0 | False | 0.0 | 1283.0 | 371.0 | 3329.0 | 193.0 | Solam Susent | False |
| 4 | 0004_01 | Earth | False | F/1/S | TRAPPIST-1e | 16.0 | False | 303.0 | 70.0 | 151.0 | 565.0 | 2.0 | Willy Santantines | True |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 8688 | 9276_01 | Europa | False | A/98/P | 55 Cancri e | 41.0 | True | 0.0 | 6819.0 | 0.0 | 1643.0 | 74.0 | Gravior Noxnuther | False |
| 8689 | 9278_01 | Earth | True | G/1499/S | PSO J318.5-22 | 18.0 | False | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | Kurta Mondalley | False |
| 8690 | 9279_01 | Earth | False | G/1500/S | TRAPPIST-1e | 26.0 | False | 0.0 | 0.0 | 1872.0 | 1.0 | 0.0 | Fayey Connon | True |
| 8691 | 9280_01 | Europa | False | E/608/S | 55 Cancri e | 32.0 | False | 0.0 | 1049.0 | 0.0 | 353.0 | 3235.0 | Celeon Hontichre | False |


In [279]:
# Drop the Name column; drop() returns a new frame, so df_train is left unchanged.
train_1 = df_train.drop(columns="Name")
train_1

| | PassengerId | HomePlanet | CryoSleep | Cabin | Destination | Age | VIP | RoomService | FoodCourt | ShoppingMall | Spa | VRDeck | Transported |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0001_01 | Europa | False | B/0/P | TRAPPIST-1e | 39.0 | False | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | False |
| 1 | 0002_01 | Earth | False | F/0/S | TRAPPIST-1e | 24.0 | False | 109.0 | 9.0 | 25.0 | 549.0 | 44.0 | True |
| 2 | 0003_01 | Europa | False | A/0/S | TRAPPIST-1e | 58.0 | True | 43.0 | 3576.0 | 0.0 | 6715.0 | 49.0 | False |
| 3 | 0003_02 | Europa | False | A/0/S | TRAPPIST-1e | 33.0 | False | 0.0 | 1283.0 | 371.0 | 3329.0 | 193.0 | False |
| 4 | 0004_01 | Earth | False | F/1/S | TRAPPIST-1e | 16.0 | False | 303.0 | 70.0 | 151.0 | 565.0 | 2.0 | True |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 8688 | 9276_01 | Europa | False | A/98/P | 55 Cancri e | 41.0 | True | 0.0 | 6819.0 | 0.0 | 1643.0 | 74.0 | False |
| 8689 | 9278_01 | Earth | True | G/1499/S | PSO J318.5-22 | 18.0 | False | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | False |
| 8690 | 9279_01 | Earth | False | G/1500/S | TRAPPIST-1e | 26.0 | False | 0.0 | 0.0 | 1872.0 | 1.0 | 0.0 | True |
| 8691 | 9280_01 | Europa | False | E/608/S | 55 Cancri e | 32.0 | False | 0.0 | 1049.0 | 0.0 | 353.0 | 3235.0 | False |


In [280]:
train_1.isna().sum()

PassengerId       0
HomePlanet      201
CryoSleep       217
Cabin           199
Destination     182
Age             179
VIP             203
RoomService     181
FoodCourt       183
ShoppingMall    208
Spa             183
VRDeck          188
Transported       0
dtype: int64
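Raw counts are easier to judge as a share of the rows. With 8693 training rows, the counts above work out to roughly 2.1% to 2.5% missing per affected column. A minimal sketch of that calculation on a toy frame (the toy values are made up for illustration):

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "HomePlanet": ["Europa", np.nan, "Earth", "Earth"],
    "Age": [39.0, 24.0, np.nan, np.nan],
    "Transported": [False, True, False, True],
})

# isna().mean() is the fraction of missing values per column; scale to percent.
missing_pct = (toy.isna().mean() * 100).sort_values(ascending=False)
print(missing_pct)
```

The same expression applied to `train_1` would rank the real columns by how much of each is missing.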

In [281]:
# Drop rows with no HomePlanet; dropna() returns a new frame, so train_1 is left unchanged.
train_2 = train_1.dropna(subset=["HomePlanet"])
train_2

| | PassengerId | HomePlanet | CryoSleep | Cabin | Destination | Age | VIP | RoomService | FoodCourt | ShoppingMall | Spa | VRDeck | Transported |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0001_01 | Europa | False | B/0/P | TRAPPIST-1e | 39.0 | False | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | False |
| 1 | 0002_01 | Earth | False | F/0/S | TRAPPIST-1e | 24.0 | False | 109.0 | 9.0 | 25.0 | 549.0 | 44.0 | True |
| 2 | 0003_01 | Europa | False | A/0/S | TRAPPIST-1e | 58.0 | True | 43.0 | 3576.0 | 0.0 | 6715.0 | 49.0 | False |
| 3 | 0003_02 | Europa | False | A/0/S | TRAPPIST-1e | 33.0 | False | 0.0 | 1283.0 | 371.0 | 3329.0 | 193.0 | False |
| 4 | 0004_01 | Earth | False | F/1/S | TRAPPIST-1e | 16.0 | False | 303.0 | 70.0 | 151.0 | 565.0 | 2.0 | True |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 8688 | 9276_01 | Europa | False | A/98/P | 55 Cancri e | 41.0 | True | 0.0 | 6819.0 | 0.0 | 1643.0 | 74.0 | False |
| 8689 | 9278_01 | Earth | True | G/1499/S | PSO J318.5-22 | 18.0 | False | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | False |
| 8690 | 9279_01 | Earth | False | G/1500/S | TRAPPIST-1e | 26.0 | False | 0.0 | 0.0 | 1872.0 | 1.0 | 0.0 | True |
| 8691 | 9280_01 | Europa | False | E/608/S | 55 Cancri e | 32.0 | False | 0.0 | 1049.0 | 0.0 | 353.0 | 3235.0 | False |


In [282]:
train_2.isna().sum()

PassengerId       0
HomePlanet        0
CryoSleep       215
Cabin           193
Destination     178
Age             177
VIP             200
RoomService     175
FoodCourt       181
ShoppingMall    201
Spa             180
VRDeck          187
Transported       0
dtype: int64

In [283]:
train_3 = train_2.copy()
# Treat passengers with no CryoSleep record as not being in cryosleep.
train_3["CryoSleep"] = train_3["CryoSleep"].fillna(False)
train_3

| | PassengerId | HomePlanet | CryoSleep | Cabin | Destination | Age | VIP | RoomService | FoodCourt | ShoppingMall | Spa | VRDeck | Transported |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0001_01 | Europa | False | B/0/P | TRAPPIST-1e | 39.0 | False | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | False |
| 1 | 0002_01 | Earth | False | F/0/S | TRAPPIST-1e | 24.0 | False | 109.0 | 9.0 | 25.0 | 549.0 | 44.0 | True |
| 2 | 0003_01 | Europa | False | A/0/S | TRAPPIST-1e | 58.0 | True | 43.0 | 3576.0 | 0.0 | 6715.0 | 49.0 | False |
| 3 | 0003_02 | Europa | False | A/0/S | TRAPPIST-1e | 33.0 | False | 0.0 | 1283.0 | 371.0 | 3329.0 | 193.0 | False |
| 4 | 0004_01 | Earth | False | F/1/S | TRAPPIST-1e | 16.0 | False | 303.0 | 70.0 | 151.0 | 565.0 | 2.0 | True |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 8688 | 9276_01 | Europa | False | A/98/P | 55 Cancri e | 41.0 | True | 0.0 | 6819.0 | 0.0 | 1643.0 | 74.0 | False |
| 8689 | 9278_01 | Earth | True | G/1499/S | PSO J318.5-22 | 18.0 | False | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | False |
| 8690 | 9279_01 | Earth | False | G/1500/S | TRAPPIST-1e | 26.0 | False | 0.0 | 0.0 | 1872.0 | 1.0 | 0.0 | True |
| 8691 | 9280_01 | Europa | False | E/608/S | 55 Cancri e | 32.0 | False | 0.0 | 1049.0 | 0.0 | 353.0 | 3235.0 | False |


In [284]:
train_3.isna().sum()

PassengerId       0
HomePlanet        0
CryoSleep         0
Cabin           193
Destination     178
Age             177
VIP             200
RoomService     175
FoodCourt       181
ShoppingMall    201
Spa             180
VRDeck          187
Transported       0
dtype: int64
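Filling every missing `CryoSleep` value with `False` is one option. Another possibility, sketched below, is to use the spending columns: the field description says passengers in cryosleep are confined to their cabins, so a passenger with no recorded spending might plausibly have been asleep. That link is an assumption the notebook does not verify, and the toy frame and the name `spend_cols` are illustrative only.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "CryoSleep": [True, False, np.nan, np.nan],
    "RoomService": [0.0, 109.0, 0.0, 43.0],
    "Spa": [0.0, 549.0, 0.0, 0.0],
})
spend_cols = ["RoomService", "Spa"]  # assumption: a subset of the amenity columns

# Total spend per passenger (missing amounts are skipped, i.e. counted as 0).
total_spend = toy[spend_cols].sum(axis=1)

# Only rows with a missing CryoSleep value are filled; known values are kept.
missing = toy["CryoSleep"].isna()
filled = toy["CryoSleep"].copy()
filled.loc[missing] = total_spend[missing] == 0
filled = filled.astype(bool)
print(filled.tolist())
```

Whether this heuristic beats a constant fill would need to be checked against rows where `CryoSleep` is known.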

In [285]:
# Split each "deck/num/side" string into its three parts.
deck = []
num = []
side = []
for value in train_3["Cabin"]:
    parts = str(value).split("/")
    if len(parts) < 3:
        # Missing cabin (NaN): record NaN for all three parts.
        parts = [np.nan, np.nan, np.nan]
    deck.append(parts[0])
    num.append(parts[1])
    side.append(parts[2])
# Building from plain lists gives `cabin` a fresh 0..n-1 index,
# not the row labels of train_3.
cabin = pd.DataFrame({"Cabin Deck": deck, "Cabin Number": num, "Cabin Side": side})

cabin["Cabin Deck"].value_counts()

F    2724
G    2498
E     853
B     766
C     734
D     468
A     252
T       4
Name: Cabin Deck, dtype: int64
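The same split can be written without an explicit loop using pandas string methods. A minimal sketch on toy data follows; unlike a frame built from plain lists, the result keeps the row labels of the input, which matters if the parts are later combined with the original frame. The toy index `[0, 2, 5]` is chosen to mimic a frame with dropped rows.

```python
import numpy as np
import pandas as pd

# Toy stand-in for a frame whose index has gaps after rows were dropped.
toy = pd.DataFrame({"Cabin": ["B/0/P", np.nan, "F/1/S"]}, index=[0, 2, 5])

# expand=True returns one column per part; a missing cabin stays missing in every part.
cabin_parts = toy["Cabin"].str.split("/", expand=True)
cabin_parts.columns = ["Cabin Deck", "Cabin Number", "Cabin Side"]
print(cabin_parts)
```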

In [286]:
cabin.loc[cabin["Cabin Deck"] == "T", ["Cabin Number", "Cabin Side"]]

| | Cabin Number | Cabin Side |
|---|---|---|
| 2206 | 1 | P |
| 2670 | 2 | P |
| 2698 | 3 | P |
| 4461 | 2 | S |


In [287]:
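cabin_1 = cabin.copy()
# Assign rows with a missing deck to deck "T".
cabin_1["Cabin Deck"] = cabin_1["Cabin Deck"].fillna("T")
# Row positions whose cabin number is still missing.
arr = cabin_1[cabin_1["Cabin Number"].isna()].index.values
# Set the last position aside so the rest can be grouped in fours;
# this requires len(arr) - 1 to be a multiple of 4.
arr_1 = arr[:-1]
arr_2 = arr[-1]
arr_1 = arr_1.reshape(-1, 4)
arr_1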

array([[  15,   92,  102,  219],
       [ 223,  246,  255,  267],
       [ 274,  287,  306,  309],
       [ 336,  403,  424,  443],
       [ 449,  472,  645,  659],
       [ 671,  679,  693,  764],
       [ 767,  772,  804,  847],
       [ 904,  922,  925,  935],
       [ 955, 1000, 1020, 1029],
       [1031, 1046, 1105, 1131],
       [1190, 1203, 1296, 1331],
       [1348, 1429, 1434, 1445],
       [1458, 1486, 1525, 1553],
       [1558, 1575, 1589, 1645],
       [1733, 1743, 1822, 1915],
       [1927, 1937, 1967, 2028],
       [2075, 2254, 2264, 2280],
       [2310, 2329, 2373, 2448],
       [2644, 2662, 2697, 2749],
       [2761, 2794, 2835, 2879],
       [2970, 2974, 3191, 3203],
       [3204, 3214, 3232, 3254],
       [3268, 3306, 3379, 3412],
       [3416, 3473, 3475, 3616],
       [3619, 3654, 3665, 3684],
       [3768, 3794, 3825, 3917],
       [3930, 3965, 4000, 4024],
       [4069, 4241, 4256, 4269],
       [4288, 4297, 4314, 4393],
       [4411, 4424, 4441, 4558],
       ...

In [288]:
arr_2

8458

In [289]:
ctr = 4
for group in arr_1:
    # Give the four rows of each group the numbers ctr..ctr+3,
    # alternating Port and Starboard.
    for offset, (idx, side_label) in enumerate(zip(group, ["P", "S", "P", "S"])):
        cabin_1.loc[idx, "Cabin Number"] = ctr + offset
        cabin_1.loc[idx, "Cabin Side"] = side_label
    # The counter advances by 1 per group, so consecutive groups share numbers.
    ctr += 1

# The single leftover row gets the next number on the Port side.
cabin_1.loc[arr_2, "Cabin Number"] = ctr + 1
cabin_1.loc[arr_2, "Cabin Side"] = "P"
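The loop above relies on the number of missing rows being one more than a multiple of four. A variation that works for any count is sketched below on toy data. It is not the same scheme as the loop: here the filled numbers run sequentially from 4 with no repeats, and the sides simply alternate. The frame `cabin_toy` is made up for illustration.

```python
import numpy as np
import pandas as pd

# dtype=object so the column can hold both the original strings and new integers.
cabin_toy = pd.DataFrame({
    "Cabin Deck": ["T", "T", "F", "T", "T", "T"],
    "Cabin Number": [np.nan, np.nan, "0", np.nan, np.nan, np.nan],
    "Cabin Side": [np.nan, np.nan, "S", np.nan, np.nan, np.nan],
}, dtype=object)

# Labels of the rows that still need a cabin number, and their running position.
missing_idx = cabin_toy.index[cabin_toy["Cabin Number"].isna()]
positions = np.arange(len(missing_idx))

# Sequential numbers starting at 4; even positions go to Port, odd to Starboard.
cabin_toy.loc[missing_idx, "Cabin Number"] = 4 + positions
cabin_toy.loc[missing_idx, "Cabin Side"] = np.where(positions % 2 == 0, "P", "S")
print(cabin_toy)
```

Like the original, this leaves the column with a mix of strings and integers, so a later conversion step would still be needed before modelling.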

In [290]:
cabin_1.isna().sum()

Cabin Deck      0
Cabin Number    0
Cabin Side      0
dtype: int64

In [291]:
cabin_2 = cabin_1.reset_index()
cabin_2

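| | index | Cabin Deck | Cabin Number | Cabin Side |
|---|---|---|---|---|
| 0 | 0 | B | 0 | P |
| 1 | 1 | F | 0 | S |
| 2 | 2 | A | 0 | S |
| 3 | 3 | A | 0 | S |
| 4 | 4 | F | 1 | S |
| ... | ... | ... | ... | ... |
| 8487 | 8487 | A | 98 | P |
| 8488 | 8488 | G | 1499 | S |
| 8489 | 8489 | G | 1500 | S |
| 8490 | 8490 | E | 608 | S |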


In [292]:
train_4 = train_3.reset_index()
train_4

| | index | PassengerId | HomePlanet | CryoSleep | Cabin | Destination | Age | VIP | RoomService | FoodCourt | ShoppingMall | Spa | VRDeck | Transported |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0001_01 | Europa | False | B/0/P | TRAPPIST-1e | 39.0 | False | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | False |
| 1 | 1 | 0002_01 | Earth | False | F/0/S | TRAPPIST-1e | 24.0 | False | 109.0 | 9.0 | 25.0 | 549.0 | 44.0 | True |
| 2 | 2 | 0003_01 | Europa | False | A/0/S | TRAPPIST-1e | 58.0 | True | 43.0 | 3576.0 | 0.0 | 6715.0 | 49.0 | False |
| 3 | 3 | 0003_02 | Europa | False | A/0/S | TRAPPIST-1e | 33.0 | False | 0.0 | 1283.0 | 371.0 | 3329.0 | 193.0 | False |
| 4 | 4 | 0004_01 | Earth | False | F/1/S | TRAPPIST-1e | 16.0 | False | 303.0 | 70.0 | 151.0 | 565.0 | 2.0 | True |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 8487 | 8688 | 9276_01 | Europa | False | A/98/P | 55 Cancri e | 41.0 | True | 0.0 | 6819.0 | 0.0 | 1643.0 | 74.0 | False |
| 8488 | 8689 | 9278_01 | Earth | True | G/1499/S | PSO J318.5-22 | 18.0 | False | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | False |
| 8489 | 8690 | 9279_01 | Earth | False | G/1500/S | TRAPPIST-1e | 26.0 | False | 0.0 | 0.0 | 1872.0 | 1.0 | 0.0 | True |
| 8490 | 8691 | 9280_01 | Europa | False | E/608/S | 55 Cancri e | 32.0 | False | 0.0 | 1049.0 | 0.0 | 353.0 | 3235.0 | False |


In [295]:
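# Inner join on "index". In train_4 this column holds the original row labels
# (with gaps from the dropped rows), while in cabin_2 it holds positions 0..n-1,
# so rows are paired wherever the two numbers are equal.
train_5 = pd.merge(left=train_4,right=cabin_2,how="inner",on="index")
train_5 = train_5.drop("Cabin",axis=1)
train_5 = train_5[["index","PassengerId","HomePlanet","Cabin Deck","Cabin Number","Cabin Side","CryoSleep","Destination","Age","VIP","RoomService","FoodCourt","ShoppingMall","Spa","VRDeck","Transported"]]
train_5.isna().sum()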

index             0
PassengerId       0
HomePlanet        0
Cabin Deck        0
Cabin Number      0
Cabin Side        0
CryoSleep         0
Destination     174
Age             176
VIP             196
RoomService     172
FoodCourt       179
ShoppingMall    192
Spa             175
VRDeck          184
Transported       0
dtype: int64
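It is worth checking how the two `index` columns line up before relying on this merge. In the `train_4` output the row at position 8490 carries index 8691, whereas the `index` column of `cabin_2` simply counts 0 to 8490. An inner merge on `index` pairs rows wherever those two numbers happen to match, which may not be what was intended. The toy sketch below contrasts that with a join on shared row labels; the frames `left` and `right` are made up for illustration.

```python
import pandas as pd

# Toy stand-in for the training frame: original row labels with a gap (row 1 dropped).
left = pd.DataFrame({"PassengerId": ["0001_01", "0003_01", "0003_02"]}, index=[0, 2, 3])
# Toy cabin frame built with the SAME labels as `left`, one row per passenger.
right = pd.DataFrame({"Cabin Deck": ["B", "A", "C"]}, index=left.index)

# Joining on the index labels keeps every row and pairs each passenger correctly.
aligned = left.join(right)

# For contrast: label-based "index" on the left, position-based "index" on the right.
mismatched = pd.merge(
    left.reset_index(),                          # "index" = 0, 2, 3
    right.reset_index(drop=True).reset_index(),  # "index" = 0, 1, 2
    on="index",
    how="inner",
)
print(aligned)
print(mismatched)
```

In the toy output, `mismatched` has lost a row and gives passenger `0003_01` deck `C`, which belongs to `0003_02`. A quick check on the real data would be to compare `len(train_5)` with `len(train_4)` and to spot-check a few cabins against the raw `Cabin` column.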

In [296]:
train_5["Destination"].unique()

array(['TRAPPIST-1e', 'PSO J318.5-22', '55 Cancri e', nan], dtype=object)

In [297]:
# Drop rows with no Destination; dropna() returns a new frame.
train_6 = train_5.dropna(subset=["Destination"])
train_6

| | index | PassengerId | HomePlanet | Cabin Deck | Cabin Number | Cabin Side | CryoSleep | Destination | Age | VIP | RoomService | FoodCourt | ShoppingMall | Spa | VRDeck | Transported |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0001_01 | Europa | B | 0 | P | False | TRAPPIST-1e | 39.0 | False | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | False |
| 1 | 1 | 0002_01 | Earth | F | 0 | S | False | TRAPPIST-1e | 24.0 | False | 109.0 | 9.0 | 25.0 | 549.0 | 44.0 | True |
| 2 | 2 | 0003_01 | Europa | A | 0 | S | False | TRAPPIST-1e | 58.0 | True | 43.0 | 3576.0 | 0.0 | 6715.0 | 49.0 | False |
| 3 | 3 | 0003_02 | Europa | A | 0 | S | False | TRAPPIST-1e | 33.0 | False | 0.0 | 1283.0 | 371.0 | 3329.0 | 193.0 | False |
| 4 | 4 | 0004_01 | Earth | F | 1 | S | False | TRAPPIST-1e | 16.0 | False | 303.0 | 70.0 | 151.0 | 565.0 | 2.0 | True |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 8291 | 8486 | 9069_04 | Europa | A | 97 | P | False | TRAPPIST-1e | 44.0 | False | 0.0 | 4313.0 | 0.0 | 568.0 | 7.0 | True |
| 8292 | 8487 | 9069_05 | Europa | A | 98 | P | False | 55 Cancri e | 29.0 | False | 0.0 | 12563.0 | 0.0 | 3.0 | 5057.0 | False |
| 8293 | 8488 | 9071_01 | Earth | G | 1499 | S | False | 55 Cancri e | 22.0 | False | 0.0 | 0.0 | 1072.0 | 46.0 | 3.0 | False |
| 8294 | 8490 | 9072_02 | Mars | E | 608 | S | False | TRAPPIST-1e | 30.0 | False | 160.0 | 0.0 | 719.0 | 162.0 | 0.0 | False |


In [298]:
train_6.isna().sum()

index             0
PassengerId       0
HomePlanet        0
Cabin Deck        0
Cabin Number      0
Cabin Side        0
CryoSleep         0
Destination       0
Age             173
VIP             193
RoomService     169
FoodCourt       173
ShoppingMall    187
Spa             172
VRDeck          182
Transported       0
dtype: int64