# Getting started

This tutorial uses `safeds` on **Titanic passenger data** to predict which passengers survived, using sex and travel class as features.


1. Load your data into a `Table`. The data is available under `docs/tutorials/data/titanic.csv`:


In [16]:
from safeds.data.tabular.containers import Table

titanic = Table.from_csv_file("data/titanic.csv")

2. Create a `OneHotEncoder`, which will be used later to transform the Titanic table.
Call the `fit` function of the `OneHotEncoder` with the table and the names of the columns that will serve as features for the prediction.
Save the column names before the transformation, because `OneHotEncoder` changes the names of the fitted `Column`s:


In [5]:
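from safeds.data.tabular.transformation import OneHotEncoder

# Remember the original column names so the encoded columns can be identified later.
old_column_names = titanic.column_names
# Fit the encoder on the two columns that will serve as features.
encoder = OneHotEncoder().fit(titanic, ["sex", "travel_class"])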
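
To see why the column names change, here is a minimal plain-Python sketch of one-hot encoding. It does not use `safeds`. The helper `one_hot_encode` and the sample rows are hypothetical, and the `column__value` naming only mirrors the names visible in the prediction output of this tutorial (for example `sex__male`).

```python
def one_hot_encode(rows, column):
    """Replace `column` with one 0.0/1.0 column per distinct value.

    Hypothetical helper for illustration only, not part of safeds.
    """
    # Collect the distinct values in order of first appearance.
    values = list(dict.fromkeys(row[column] for row in rows))
    encoded = []
    for row in rows:
        # Copy every column except the one being encoded.
        new_row = {key: value for key, value in row.items() if key != column}
        # Add one indicator column per distinct value.
        for value in values:
            new_row[f"{column}__{value}"] = 1.0 if row[column] == value else 0.0
        encoded.append(new_row)
    return encoded


# Made-up sample rows, not taken from the Titanic file.
passengers = [
    {"name": "A", "sex": "male"},
    {"name": "B", "sex": "female"},
    {"name": "C", "sex": "male"},
]

encoded = one_hot_encode(passengers, "sex")
print(list(encoded[0]))  # ['name', 'sex__male', 'sex__female']
print(encoded[1])        # {'name': 'B', 'sex__male': 0.0, 'sex__female': 1.0}
```

The original `sex` column no longer exists after encoding, which is why the old column names have to be saved beforehand.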

3. Transform the table with the fitted encoder, then collect the names of the columns created by the encoder in a set:


In [6]:
transformed_table = encoder.transform(titanic)
new_column_names = transformed_table.column_names
# Names that exist only after the transformation belong to the encoded columns.
new_columns = set(new_column_names) - set(old_column_names)
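
The set difference in the last line can be illustrated with plain Python. The column lists below are a shortened, hypothetical subset of the real ones. Because a `set` has no defined order, the sketch also shows an order-preserving alternative, which may be preferable when the feature order should be reproducible between runs.

```python
# Shortened, hypothetical column lists for illustration.
old_column_names = ["id", "sex", "travel_class", "survived"]
new_column_names = [
    "id",
    "sex__male",
    "sex__female",
    "travel_class__3",
    "travel_class__2",
    "travel_class__1",
    "survived",
]

# Set difference: names that appear only after the transformation.
new_columns = set(new_column_names) - set(old_column_names)

# Order-preserving alternative: keep the order of the transformed table.
old_names = set(old_column_names)
new_columns_ordered = [name for name in new_column_names if name not in old_names]

print(sorted(new_columns))
print(new_columns_ordered)
```

Both variants contain the same five names. Only the list guarantees that they come out in the order of the transformed table.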

4. Tag the `survived` `Column` as the target and use the new names of the fitted `Column`s as features:

In [8]:
tagged_titanic = transformed_table.tag_columns(
    "survived",
    feature_names=list(new_columns),
)
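
Conceptually, tagging marks one column as the value to predict and a subset of the remaining columns as the model's inputs. The following plain-Python sketch illustrates only that idea. The helper `split_features_and_target` and the rows are made up, and this is not how `safeds` implements tagging internally.

```python
def split_features_and_target(rows, target_name, feature_names):
    """Separate each row into its feature values and its target value.

    Hypothetical helper for illustration only, not part of safeds.
    """
    features = [[row[name] for name in feature_names] for row in rows]
    target = [row[target_name] for row in rows]
    return features, target


# Made-up rows that imitate the shape of the encoded table.
rows = [
    {"sex__male": 1.0, "sex__female": 0.0, "age": 42.0, "survived": 0},
    {"sex__male": 0.0, "sex__female": 1.0, "age": 28.0, "survived": 1},
]

features, target = split_features_and_target(
    rows, "survived", ["sex__male", "sex__female"]
)
print(features)  # [[1.0, 0.0], [0.0, 1.0]]
print(target)    # [0, 1]
```

Columns that are neither the target nor listed as features, such as `age` here, are ignored by the model.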

5. Use a `RandomForest` classifier as the model for the classification. Pass the `tagged_titanic` table to the `fit` function of the model:

In [9]:
from safeds.ml.classical.classification import RandomForest

model = RandomForest()
fitted_model = model.fit(tagged_titanic)

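6. Use the fitted random forest model to predict whether each passenger survived. The `survived` column is removed from the input first; in the output it holds the model's predictions: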

In [10]:
fitted_model.predict(
    tagged_titanic.remove_columns(["survived"])
)


Unnamed: 0,id,name,sex__male,sex__female,age,siblings_spouses,parents_children,ticket,travel_class__3,travel_class__2,travel_class__1,fare,cabin,port_embarked,survived
0,0,"Abbing, Mr. Anthony",1.0,0.0,42.0,0,0,C.A. 5547,1.0,0.0,0.0,7.55,,Southampton,0
1,1,"Abbott, Master. Eugene Joseph",1.0,0.0,13.0,0,2,C.A. 2673,1.0,0.0,0.0,20.25,,Southampton,0
2,2,"Abbott, Mr. Rossmore Edward",1.0,0.0,16.0,1,1,C.A. 2673,1.0,0.0,0.0,20.25,,Southampton,0
3,3,"Abbott, Mrs. Stanton (Rosa Hunt)",0.0,1.0,35.0,1,1,C.A. 2673,1.0,0.0,0.0,20.25,,Southampton,0
4,4,"Abelseth, Miss. Karen Marie",0.0,1.0,16.0,0,0,348125,1.0,0.0,0.0,7.65,,Southampton,0
5,5,"Abelseth, Mr. Olaus Jorgensen",1.0,0.0,25.0,0,0,348122,1.0,0.0,0.0,7.65,F G63,Southampton,0
6,6,"Abelson, Mr. Samuel",1.0,0.0,30.0,1,0,P/PP 3381,0.0,1.0,0.0,24.0,,Cherbourg,0
7,7,"Abelson, Mrs. Samuel (Hannah Wizosky)",0.0,1.0,28.0,1,0,P/PP 3381,0.0,1.0,0.0,24.0,,Cherbourg,1
8,8,"Abrahamsson, Mr. Abraham August Johannes",1.0,0.0,20.0,0,0,SOTON/O2 3101284,1.0,0.0,0.0,7.925,,Southampton,0
9,9,"Abrahim, Mrs. Joseph (Sophie Halaut Easu)",0.0,1.0,18.0,0,0,2657,1.0,0.0,0.0,7.2292,,Cherbourg,0


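7. Compute the accuracy of the fitted model as follows. Note that the model is evaluated here on the same table it was trained on, so the result tends to overestimate the accuracy on unseen data: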

In [11]:
fitted_model.accuracy(tagged_titanic)


0.7830404889228418
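
Accuracy is the fraction of rows whose predicted class equals the actual class. A minimal sketch in plain Python, independent of `safeds`; the function `accuracy` and the example labels are made up for illustration:

```python
def accuracy(predicted, actual):
    """Return the share of predictions that match the actual labels.

    Hypothetical helper for illustration only, not part of safeds.
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("at least one label is required")
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)


# Made-up labels: 1 = survived, 0 = did not survive.
predicted = [0, 0, 1, 1, 0]
actual = [0, 1, 1, 1, 0]

score = accuracy(predicted, actual)
print(score)  # 0.8
```

Read this way, the value of about 0.78 reported above means that roughly 78% of the rows were classified correctly.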