<a href="https://colab.research.google.com/github/samettyldrm/machine-learning/blob/main/kaggle/spaceship-titanic/spaceship-titanic.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Spaceship Titanic Made Easy 🚢👽
The goal of this notebook is to make this dataset and this competition easy to approach.

Description:

Welcome to the year 2912, where your data science skills are needed to solve a cosmic mystery. We've received a transmission from four lightyears away and things aren't looking good.

The Spaceship Titanic was an interstellar passenger liner launched a month ago. With almost 13,000 passengers on board, the vessel set out on its maiden voyage transporting emigrants from our solar system to three newly habitable exoplanets orbiting nearby stars.

While rounding Alpha Centauri en route to its first destination—the torrid 55 Cancri E—the unwary Spaceship Titanic collided with a spacetime anomaly hidden within a dust cloud. Sadly, it met a similar fate as its namesake from 1000 years before. Though the ship stayed intact, almost half of the passengers were transported to an alternate dimension!

To help rescue crews and retrieve the lost passengers, you are challenged to predict which passengers were transported by the anomaly using records recovered from the spaceship’s damaged computer system.

Help save them and change history!

### 1. Importing Libraries

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns

from sklearn.preprocessing import LabelEncoder, MinMaxScaler, OneHotEncoder

### 2. Loading the Data

In [2]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [3]:
train = pd.read_csv("/content/drive/MyDrive/spaceship-titanic/train.csv", sep=";")
test = pd.read_csv("/content/drive/MyDrive/spaceship-titanic/test.csv", sep=";")

### Columns Description

* PassengerId - A unique Id for each passenger. Each Id takes the form gggg_pp where gggg indicates a group the passenger is travelling with and pp is their number within the group. People in a group are often family members, but not always.
* HomePlanet - The planet the passenger departed from, typically their planet of permanent residence.
* CryoSleep - Indicates whether the passenger elected to be put into suspended animation for the duration of the voyage. Passengers in cryosleep are confined to their cabins.
* Cabin - The cabin number where the passenger is staying. Takes the form deck/num/side, where side can be either P for Port or S for Starboard.
* Destination - The planet the passenger will be debarking to.
* Age - The age of the passenger.
* VIP - Whether the passenger has paid for special VIP service during the voyage.
* RoomService, FoodCourt, ShoppingMall, Spa, VRDeck - Amount the passenger has billed at each of the Spaceship Titanic's many luxury amenities.
* Name - The first and last names of the passenger.
* Transported - Whether the passenger was transported to another dimension. This is the target, the column you are trying to predict.

In [4]:
train.head()

Unnamed: 0,PassengerId,HomePlanet,CryoSleep,Cabin,Destination,Age,VIP,RoomService,FoodCourt,ShoppingMall,Spa,VRDeck,Name,Transported
0,0001_01,Europa,False,B/0/P,TRAPPIST-1e,39.0,False,0.0,0.0,0.0,0.0,0.0,Maham Ofracculy,False
1,0002_01,Earth,False,F/0/S,TRAPPIST-1e,24.0,False,109.0,9.0,25.0,549.0,44.0,Juanna Vines,True
2,0003_01,Europa,False,A/0/S,TRAPPIST-1e,58.0,True,43.0,3576.0,0.0,6715.0,49.0,Altark Susent,False
3,0003_02,Europa,False,A/0/S,TRAPPIST-1e,33.0,False,0.0,1283.0,371.0,3329.0,193.0,Solam Susent,False
4,0004_01,Earth,False,F/1/S,TRAPPIST-1e,16.0,False,303.0,70.0,151.0,565.0,2.0,Willy Santantines,True


In [5]:
train.isnull().sum()

PassengerId       0
HomePlanet      201
CryoSleep       217
Cabin           199
Destination     182
Age             179
VIP             203
RoomService     181
FoodCourt       183
ShoppingMall    208
Spa             183
VRDeck          188
Name            200
Transported       0
dtype: int64

In [6]:
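# Fill missing values: most frequent value for categorical columns, mean for numeric ones.
for col in train.columns:
  if train[col].dtype == 'object':
    # mode() returns a Series, so take its first entry to get a scalar fill value
    train[col] = train[col].fillna(train[col].mode()[0])
  elif train[col].dtype == 'float64':
    train[col] = train[col].fillna(train[col].mean())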
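
One detail worth checking in a fill step like this: `Series.mode()` returns a Series rather than a single value, and `fillna` aligns a Series argument by index. The toy example below (the series `s` is made up for illustration and is not part of the competition data) sketches the difference between passing the mode Series and passing its first element.

```python
import pandas as pd

# Hypothetical toy column with two missing entries (index 1 and 4)
s = pd.Series(["a", None, "b", "a", None])

mode_series = s.mode()        # a Series holding the most frequent value(s)
mode_value = mode_series[0]   # the first (here: only) mode as a scalar

# Passing the Series fills only positions whose index also exists in mode_series
filled_with_series = s.fillna(mode_series)
# Passing the scalar fills every missing position
filled_with_scalar = s.fillna(mode_value)

print(filled_with_series.isna().sum())
print(filled_with_scalar.isna().sum())
```

Comparing the two counts is a quick way to confirm whether an imputation step actually removed the missing values.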

In [7]:
for col in train.columns:
  if train[col].dtypes == 'object':
    print(train[col].value_counts())
    print()

0001_01    1
6136_01    1
6141_01    1
6139_06    1
6139_05    1
          ..
3126_01    1
3124_03    1
3124_02    1
3124_01    1
9280_02    1
Name: PassengerId, Length: 8693, dtype: int64

Earth     4602
Europa    2131
Mars      1759
Name: HomePlanet, dtype: int64

False    5439
True     3037
Name: CryoSleep, dtype: int64

G/734/S     8
G/109/P     7
B/201/P     7
G/1368/P    7
G/981/S     7
           ..
G/556/P     1
E/231/S     1
G/545/S     1
G/543/S     1
F/947/P     1
Name: Cabin, Length: 6560, dtype: int64

TRAPPIST-1e      5915
55 Cancri e      1800
PSO J318.5-22     796
Name: Destination, dtype: int64

False    8291
True      199
Name: VIP, dtype: int64

Gollux Reedall        2
Elaney Webstephrey    2
Grake Porki           2
Sus Coolez            2
Apix Wala             2
                     ..
Jamela Griffy         1
Hardy Griffy          1
Salley Mckinn         1
Mall Frasp            1
Propsh Hontichre      1
Name: Name, Length: 8473, dtype: int64



In [8]:
train.head()

Unnamed: 0,PassengerId,HomePlanet,CryoSleep,Cabin,Destination,Age,VIP,RoomService,FoodCourt,ShoppingMall,Spa,VRDeck,Name,Transported
0,0001_01,Europa,False,B/0/P,TRAPPIST-1e,39.0,False,0.0,0.0,0.0,0.0,0.0,Maham Ofracculy,False
1,0002_01,Earth,False,F/0/S,TRAPPIST-1e,24.0,False,109.0,9.0,25.0,549.0,44.0,Juanna Vines,True
2,0003_01,Europa,False,A/0/S,TRAPPIST-1e,58.0,True,43.0,3576.0,0.0,6715.0,49.0,Altark Susent,False
3,0003_02,Europa,False,A/0/S,TRAPPIST-1e,33.0,False,0.0,1283.0,371.0,3329.0,193.0,Solam Susent,False
4,0004_01,Earth,False,F/1/S,TRAPPIST-1e,16.0,False,303.0,70.0,151.0,565.0,2.0,Willy Santantines,True


In [9]:
train[["_TravelGroupId", "_TravelGroupNum"]] = train.PassengerId.str.split("_", n=1, expand=True)

In [10]:
train.tail()

Unnamed: 0,PassengerId,HomePlanet,CryoSleep,Cabin,Destination,Age,VIP,RoomService,FoodCourt,ShoppingMall,Spa,VRDeck,Name,Transported,_TravelGroupId,_TravelGroupNum
8688,9276_01,Europa,False,A/98/P,55 Cancri e,41.0,True,0.0,6819.0,0.0,1643.0,74.0,Gravior Noxnuther,False,9276,1
8689,9278_01,Earth,True,G/1499/S,PSO J318.5-22,18.0,False,0.0,0.0,0.0,0.0,0.0,Kurta Mondalley,False,9278,1
8690,9279_01,Earth,False,G/1500/S,TRAPPIST-1e,26.0,False,0.0,0.0,1872.0,1.0,0.0,Fayey Connon,True,9279,1
8691,9280_01,Europa,False,E/608/S,55 Cancri e,32.0,False,0.0,1049.0,0.0,353.0,3235.0,Celeon Hontichre,False,9280,1
8692,9280_02,Europa,False,E/608/S,TRAPPIST-1e,44.0,False,126.0,4688.0,0.0,0.0,12.0,Propsh Hontichre,True,9280,2


In [11]:
train[["Deck", "CabinNum", "Side"]] = train.Cabin.str.split("/", n=2, expand=True)

In [12]:
train.drop(['PassengerId',"Cabin"], axis=1, inplace=True)

In [13]:
train.head()

Unnamed: 0,HomePlanet,CryoSleep,Destination,Age,VIP,RoomService,FoodCourt,ShoppingMall,Spa,VRDeck,Name,Transported,_TravelGroupId,_TravelGroupNum,Deck,CabinNum,Side
0,Europa,False,TRAPPIST-1e,39.0,False,0.0,0.0,0.0,0.0,0.0,Maham Ofracculy,False,1,1,B,0,P
1,Earth,False,TRAPPIST-1e,24.0,False,109.0,9.0,25.0,549.0,44.0,Juanna Vines,True,2,1,F,0,S
2,Europa,False,TRAPPIST-1e,58.0,True,43.0,3576.0,0.0,6715.0,49.0,Altark Susent,False,3,1,A,0,S
3,Europa,False,TRAPPIST-1e,33.0,False,0.0,1283.0,371.0,3329.0,193.0,Solam Susent,False,3,2,A,0,S
4,Earth,False,TRAPPIST-1e,16.0,False,303.0,70.0,151.0,565.0,2.0,Willy Santantines,True,4,1,F,1,S


In [14]:
## Preprocessing

def label_encoding(column_name):
  """Replace a column of `train` in place with integer labels (e.g. False/True -> 0/1)."""
  LE = LabelEncoder()
  train[column_name] = LE.fit_transform(train[column_name])

In [15]:
label_encoding("CryoSleep")
label_encoding("VIP")
label_encoding("Transported")
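
As a small illustration of what `LabelEncoder` does to a boolean column, the sketch below encodes a hypothetical list of values standing in for a column such as `CryoSleep`. The variable names are assumptions made for this example only.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical boolean values standing in for a column such as CryoSleep
values = [False, True, True, False]

encoder = LabelEncoder()
encoded = encoder.fit_transform(values)

print(encoder.classes_)  # the sorted unique input values
print(encoded)           # integer code of each input value
```

The position of each value in `classes_` is the integer it is mapped to, so `False` becomes 0 and `True` becomes 1.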


In [16]:
train.head()

Unnamed: 0,HomePlanet,CryoSleep,Destination,Age,VIP,RoomService,FoodCourt,ShoppingMall,Spa,VRDeck,Name,Transported,_TravelGroupId,_TravelGroupNum,Deck,CabinNum,Side
0,Europa,0,TRAPPIST-1e,39.0,0,0.0,0.0,0.0,0.0,0.0,Maham Ofracculy,0,1,1,B,0,P
1,Earth,0,TRAPPIST-1e,24.0,0,109.0,9.0,25.0,549.0,44.0,Juanna Vines,1,2,1,F,0,S
2,Europa,0,TRAPPIST-1e,58.0,1,43.0,3576.0,0.0,6715.0,49.0,Altark Susent,0,3,1,A,0,S
3,Europa,0,TRAPPIST-1e,33.0,0,0.0,1283.0,371.0,3329.0,193.0,Solam Susent,0,3,2,A,0,S
4,Earth,0,TRAPPIST-1e,16.0,0,303.0,70.0,151.0,565.0,2.0,Willy Santantines,1,4,1,F,1,S


In [17]:
one_hot = pd.get_dummies(train[["HomePlanet", "Destination", "Deck", "Side"]])  # list the columns to one-hot encode here
one_hot.head()

Unnamed: 0,HomePlanet_Earth,HomePlanet_Europa,HomePlanet_Mars,Destination_55 Cancri e,Destination_PSO J318.5-22,Destination_TRAPPIST-1e,Deck_A,Deck_B,Deck_C,Deck_D,Deck_E,Deck_F,Deck_G,Deck_T,Side_P,Side_S
0,0,1,0,0,0,1,0,1,0,0,0,0,0,0,1,0
1,1,0,0,0,0,1,0,0,0,0,0,1,0,0,0,1
2,0,1,0,0,0,1,1,0,0,0,0,0,0,0,0,1
3,0,1,0,0,0,1,1,0,0,0,0,0,0,0,0,1
4,1,0,0,0,0,1,0,0,0,0,0,1,0,0,0,1
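
To see how `pd.get_dummies` treats categories and missing values, here is a minimal sketch on a made-up frame. The frames `toy` and `toy_na` are hypothetical, and `dtype=int` is passed only so the result is 0/1 regardless of the default in the installed pandas version.

```python
import pandas as pd

# Hypothetical toy frame with one categorical column
toy = pd.DataFrame({"Side": ["P", "S", "S"]})

# One indicator column per category, named <column>_<category>
dummies = pd.get_dummies(toy[["Side"]], dtype=int)
print(dummies)

# A missing value yields a row of zeros unless dummy_na=True is passed
toy_na = pd.DataFrame({"Side": ["P", "S", None]})
dummies_na = pd.get_dummies(toy_na[["Side"]], dtype=int)
print(dummies_na)
```

Rows that are all zeros across a group of indicator columns are therefore a sign that the original value was missing.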


In [18]:
train.drop(["HomePlanet", "Destination", "Deck", "Side"], axis=1, inplace=True)

In [19]:
train = pd.concat([train,one_hot], axis=1)

Unnamed: 0,0,1,2,3,4
CryoSleep,0,0,0,0,0
Age,39.0,24.0,58.0,33.0,16.0
VIP,0,0,1,0,0
RoomService,0.0,109.0,43.0,0.0,303.0
FoodCourt,0.0,9.0,3576.0,1283.0,70.0
ShoppingMall,0.0,25.0,0.0,371.0,151.0
Spa,0.0,549.0,6715.0,3329.0,565.0
VRDeck,0.0,44.0,49.0,193.0,2.0
Name,Maham Ofracculy,Juanna Vines,Altark Susent,Solam Susent,Willy Santantines
Transported,0,1,0,0,1


In [24]:
train.head()

Unnamed: 0,CryoSleep,Age,VIP,RoomService,FoodCourt,ShoppingMall,Spa,VRDeck,Name,Transported,...,Deck_A,Deck_B,Deck_C,Deck_D,Deck_E,Deck_F,Deck_G,Deck_T,Side_P,Side_S
0,0,39.0,0,0.0,0.0,0.0,0.0,0.0,Maham Ofracculy,0,...,0,1,0,0,0,0,0,0,1,0
1,0,24.0,0,109.0,9.0,25.0,549.0,44.0,Juanna Vines,1,...,0,0,0,0,0,1,0,0,0,1
2,0,58.0,1,43.0,3576.0,0.0,6715.0,49.0,Altark Susent,0,...,1,0,0,0,0,0,0,0,0,1
3,0,33.0,0,0.0,1283.0,371.0,3329.0,193.0,Solam Susent,0,...,1,0,0,0,0,0,0,0,0,1
4,0,16.0,0,303.0,70.0,151.0,565.0,2.0,Willy Santantines,1,...,0,0,0,0,0,1,0,0,0,1


In [29]:
x = train.drop(["Transported", "Name"], axis=1)
y = train["Transported"]

In [30]:
x.head()

Unnamed: 0,CryoSleep,Age,VIP,RoomService,FoodCourt,ShoppingMall,Spa,VRDeck,_TravelGroupId,_TravelGroupNum,...,Deck_A,Deck_B,Deck_C,Deck_D,Deck_E,Deck_F,Deck_G,Deck_T,Side_P,Side_S
0,0,39.0,0,0.0,0.0,0.0,0.0,0.0,1,1,...,0,1,0,0,0,0,0,0,1,0
1,0,24.0,0,109.0,9.0,25.0,549.0,44.0,2,1,...,0,0,0,0,0,1,0,0,0,1
2,0,58.0,1,43.0,3576.0,0.0,6715.0,49.0,3,1,...,1,0,0,0,0,0,0,0,0,1
3,0,33.0,0,0.0,1283.0,371.0,3329.0,193.0,3,2,...,1,0,0,0,0,0,0,0,0,1
4,0,16.0,0,303.0,70.0,151.0,565.0,2.0,4,1,...,0,0,0,0,0,1,0,0,0,1


In [31]:
y.head()

0    0
1    1
2    0
3    0
4    1
Name: Transported, dtype: int64

In [33]:
scaler = MinMaxScaler()
x = scaler.fit_transform(x)
x[0:5]

array([[0.00000000e+00, 4.93670886e-01, 0.00000000e+00, 0.00000000e+00,
        0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
        0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
        1.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
        1.00000000e+00, 0.00000000e+00, 1.00000000e+00, 0.00000000e+00,
        0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
        0.00000000e+00, 1.00000000e+00, 0.00000000e+00],
       [0.00000000e+00, 3.03797468e-01, 0.00000000e+00, 7.60801284e-03,
        3.01881729e-04, 1.06419207e-03, 2.45001785e-02, 1.82322960e-03,
        1.07770234e-04, 0.00000000e+00, 0.00000000e+00, 1.00000000e+00,
        0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
        1.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
        0.00000000e+00, 0.00000000e+00, 1.00000000e+00, 0.00000000e+00,
        0.00000000e+00, 0.00000000e+00, 1.00000000e+00],
       [0.00000000e+00
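
`MinMaxScaler` rescales each column independently to the range [0, 1] using (x - min) / (max - min). The sketch below checks that formula by hand on a hypothetical toy matrix; the array `data` is an assumption for illustration.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical toy feature matrix: two columns on very different scales
data = np.array([[0.0, 100.0],
                 [5.0, 300.0],
                 [10.0, 500.0]])

scaled = MinMaxScaler().fit_transform(data)

# The same result computed by hand, column by column
col_min = data.min(axis=0)
col_max = data.max(axis=0)
manual = (data - col_min) / (col_max - col_min)

print(scaled)
```

This matches the output above: an age of 39 in a column whose values range from 0 to 79 maps to 39 / 79, roughly 0.4937.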

In [47]:
train.isnull().sum()

CryoSleep                      0
Age                            0
VIP                            0
RoomService                    0
FoodCourt                      0
ShoppingMall                   0
Spa                            0
VRDeck                         0
Name                         200
Transported                    0
_TravelGroupId                 0
_TravelGroupNum                0
CabinNum                     199
HomePlanet_Earth               0
HomePlanet_Europa              0
HomePlanet_Mars                0
Destination_55 Cancri e        0
Destination_PSO J318.5-22      0
Destination_TRAPPIST-1e        0
Deck_A                         0
Deck_B                         0
Deck_C                         0
Deck_D                         0
Deck_E                         0
Deck_F                         0
Deck_G                         0
Deck_T                         0
Side_P                         0
Side_S                         0
dtype: int64

In [34]:
from sklearn.model_selection import train_test_split

In [35]:
# train_test_split returns the feature splits first, then the target splits
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)
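
The order in which `train_test_split` returns its pieces is easy to mix up: for two input arrays it returns the train and test parts of the first array, followed by the train and test parts of the second. A minimal sketch on hypothetical toy arrays (`features` and `labels` are made-up names):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 10 samples, 2 features
features = np.arange(20).reshape(10, 2)
labels = np.arange(10) % 2

# Return order: features (train, test) first, then labels (train, test)
f_train, f_test, l_train, l_test = train_test_split(
    features, labels, test_size=0.2, random_state=42)

print(f_train.shape, f_test.shape, l_train.shape, l_test.shape)
```

Checking the shapes right after the split is a cheap way to confirm that the features and targets passed to a model have the same number of rows.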

In [40]:
# LabelEncoder, MinMaxScaler and train_test_split are already imported above
from sklearn.metrics import mean_squared_error, r2_score, mean_absolute_error
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score, GridSearchCV
from sklearn.svm import SVR

In [41]:
# Candidate models, keyed by a short display name
models = {"linreg": LinearRegression(),
          "DecTree": DecisionTreeRegressor(),
          "RandForest": RandomForestRegressor(),
          "SVM": SVR()}


def scores(models, x_train, y_train):
  """Return 10-fold cross-validated negative MSE scores for each model."""
  model_scores = {}

  for name, model in models.items():
    print(name)  # progress indicator
    model_scores[name] = cross_val_score(model, x_train, y_train, scoring="neg_mean_squared_error", cv=10)

  return model_scores

In [45]:
linreg = LinearRegression()

In [46]:
cross_val_score(linreg, x_train, y_train, scoring = "neg_mean_squared_error", cv=10)

ValueError: ignored

In [44]:
model_scores = scores(models, x_train, y_train)
model_scores

linreg


ValueError: ignored
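
Because `Transported` is a binary target, one alternative to scoring regressors with mean squared error is to cross-validate classifiers on accuracy. The sketch below is an illustration only: it uses synthetic data from `make_classification` rather than the competition data, and the model choices and names (`X_demo`, `y_demo`, `classifiers`) are assumptions, not part of the original notebook.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the scaled feature matrix and binary target (assumption)
X_demo, y_demo = make_classification(n_samples=200, n_features=10, random_state=42)

classifiers = {
    "logreg": LogisticRegression(max_iter=1000),
    "RandForest": RandomForestClassifier(n_estimators=50, random_state=42),
}

clf_scores = {}
for name, clf in classifiers.items():
    # 5-fold cross-validated accuracy; higher is better
    clf_scores[name] = cross_val_score(clf, X_demo, y_demo, scoring="accuracy", cv=5)

for name, fold_scores in clf_scores.items():
    print(name, fold_scores.mean())
```

With real data, the synthetic arrays would be replaced by the training features and target, provided they contain no missing values.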