# Getting started

This tutorial uses `safeds` on the **Titanic passenger data** to predict which passengers survive, using sex and travel class as features for the prediction.


1. Load your data into a `Table`. The data is available under `docs/tutorials/data/titanic.csv`:


In [30]:
from safeds.data.tabular.containers import Table

titanic = Table.from_csv_file("data/titanic.csv")

2. Split the Titanic dataset into two tables: a training set containing 60% of the data, which we will use later to train a model that predicts the survival of passengers, and a testing set containing the rest of the data.
Remove the column `survived` from the test set, so that it can be predicted later:

In [None]:
split_tuple = titanic.split(0.60)

train_table = split_tuple[0]
test_table = split_tuple[1]

test_table = test_table.remove_columns(["survived"])
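
To illustrate what this step does, here is a minimal plain-Python sketch. It assumes the split is positional (the first 60% of the rows go to the first part), which is an assumption about `split` and not taken from the library. The helper `split_rows` and the toy rows are hypothetical.

```python
def split_rows(rows, fraction_in_first):
    """Split a list of rows positionally: the first part gets the leading fraction."""
    if not 0 < fraction_in_first < 1:
        raise ValueError("fraction_in_first must be strictly between 0 and 1")
    cut = round(len(rows) * fraction_in_first)
    return rows[:cut], rows[cut:]


# Hypothetical toy rows standing in for the passenger table.
rows = [{"id": i, "survived": i % 2} for i in range(10)]
train_rows, test_rows = split_rows(rows, 0.60)

# Drop the target from the test rows, mirroring remove_columns(["survived"]).
test_rows = [
    {key: value for key, value in row.items() if key != "survived"}
    for row in test_rows
]

print(len(train_rows), len(test_rows))  # 6 4
```

The training rows keep the `survived` values, because the model needs them to learn from.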

3. Create a `OneHotEncoder`, which will be used later to transform the training table.
* A `OneHotEncoder` transforms categorical values into numerical representations with values of zero or one. In this example we transform the values of the columns `sex` and `travel_class`, because they will be used by the model to predict the survival of passengers. (The values of `travel_class` are categories rather than quantities. `OneHotEncoder` is still a convenient choice here, since both feature columns can be passed to it at the same time.)
* Use the `fit` function of the `OneHotEncoder` to pass it the table and the names of the columns that will be used as features to predict who will survive.
* Save the names of the columns before the transformation, because `OneHotEncoder` changes the names of the fitted `Column`s:


In [32]:
from safeds.data.tabular.transformation import OneHotEncoder

old_column_names = train_table.column_names
encoder = OneHotEncoder().fit(train_table, ["sex", "travel_class"])
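
To make the effect of one-hot encoding concrete, here is a minimal plain-Python sketch. It is not the Safe-DS implementation. The helpers `fit_one_hot` and `transform_one_hot` and the toy rows are hypothetical, and the `column__value` naming only imitates the column names visible in the prediction output of this tutorial.

```python
def fit_one_hot(rows, column_names):
    """Record the distinct values of each column, in order of first appearance."""
    categories = {}
    for name in column_names:
        seen = []
        for row in rows:
            if row[name] not in seen:
                seen.append(row[name])
        categories[name] = seen
    return categories


def transform_one_hot(rows, categories):
    """Replace each encoded column with one 0/1 column per recorded value."""
    encoded = []
    for row in rows:
        # Keep all columns that are not encoded.
        new_row = {k: v for k, v in row.items() if k not in categories}
        for name, values in categories.items():
            for value in values:
                new_row[f"{name}__{value}"] = 1.0 if row[name] == value else 0.0
        encoded.append(new_row)
    return encoded


# Hypothetical toy rows, not the real Titanic data.
rows = [
    {"sex": "male", "travel_class": 1, "survived": 0},
    {"sex": "female", "travel_class": 3, "survived": 1},
    {"sex": "male", "travel_class": 3, "survived": 0},
]
categories = fit_one_hot(rows, ["sex", "travel_class"])
encoded = transform_one_hot(rows, categories)
print(encoded[1])
```

Fitting only records which values exist. The original columns `sex` and `travel_class` disappear during the transformation, which is why the old column names have to be saved beforehand.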



4. Transform the training table using the fitted encoder, and create a set with the new names of the fitted `Column`s:


In [33]:
transformed_table = encoder.transform(train_table)
new_column_names = transformed_table.column_names
new_columns = set(new_column_names) - set(old_column_names)
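
The set difference can be tried out on its own. The sketch below uses hypothetical column lists rather than the real tables. Because a set has no defined order, it also shows an order-preserving alternative, which is an optional variation and not part of the tutorial's code.

```python
# Hypothetical column names before and after one-hot encoding.
old_column_names = ["id", "sex", "travel_class", "survived"]
new_column_names = [
    "id",
    "sex__male",
    "sex__female",
    "travel_class__1",
    "travel_class__2",
    "travel_class__3",
    "survived",
]

# Names that exist only after the transformation, as an unordered set.
added = set(new_column_names) - set(old_column_names)

# Same names, but kept in the order in which they appear in the table.
old_names = set(old_column_names)
added_in_order = [name for name in new_column_names if name not in old_names]

print(added_in_order)
```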

5. Tag the `survived` `Column` as the target variable to be predicted. Use the new names of the fitted `Column`s as features, which will be used to predict the target variable.

In [34]:
tagged_train_table = transformed_table.tag_columns("survived", feature_names=[
    *new_columns
])

6. Use a `RandomForest` classifier as the model for the classification. Pass the `tagged_train_table` to the `fit` function of the model:

In [35]:
from safeds.ml.classical.classification import RandomForest

model = RandomForest()
fitted_model = model.fit(tagged_train_table)

7. Use the fitted random forest model, which we trained on the training dataset, to predict the survival of passengers in the test dataset.
Transform the test data with `OneHotEncoder` first, so that it can be passed to the `predict` function of the fitted model:

In [36]:
# Reuse the encoder fitted on the training table, so that the test table gets the same feature columns.
transformed_test_table = encoder.transform(test_table)

fitted_model.predict(
    transformed_test_table
)
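
A model generally expects the same feature columns at prediction time as at training time. The following plain-Python sketch shows how the encoded columns depend on the data the encoder was fitted on. The helpers `fit_categories` and `encode` and the toy values are hypothetical and do not reproduce the Safe-DS implementation.

```python
def fit_categories(values):
    """Return the distinct values in order of first appearance."""
    return list(dict.fromkeys(values))


def encode(values, categories, prefix):
    """One-hot encode values against a fixed list of categories."""
    return [
        {f"{prefix}__{category}": float(value == category) for category in categories}
        for value in values
    ]


# Hypothetical travel classes; class 2 does not occur in the test values.
train_classes = [1, 3, 2, 3]
test_classes = [3, 3, 1]

train_categories = fit_categories(train_classes)
refit_categories = fit_categories(test_classes)

reused = encode(test_classes, train_categories, "travel_class")
refit = encode(test_classes, refit_categories, "travel_class")

print(list(reused[0]))  # ['travel_class__1', 'travel_class__3', 'travel_class__2']
print(list(refit[0]))   # ['travel_class__3', 'travel_class__1']
```

With the categories learned from the training values, the test rows get all three columns. With categories learned from the test values alone, one column is missing and the order differs.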




Unnamed: 0,id,name,sex__male,sex__female,age,siblings_spouses,parents_children,ticket,travel_class__1,travel_class__3,travel_class__2,fare,cabin,port_embarked,survived
0,785,"McGough, Mr. James Robert",1.0,0.0,36.0,0,0,PC 17473,1.0,0.0,0.0,26.2875,E25,Southampton,0
1,786,"McGovern, Miss. Mary",0.0,1.0,,0,0,330931,0.0,1.0,0.0,7.8792,,Queenstown,0
2,787,"McGowan, Miss. Anna 'Annie'",0.0,1.0,15.0,0,0,330923,0.0,1.0,0.0,8.0292,,Queenstown,0
3,788,"McGowan, Miss. Katherine",0.0,1.0,35.0,0,0,9232,0.0,1.0,0.0,7.75,,Queenstown,0
4,789,"McKane, Mr. Peter David",1.0,0.0,46.0,0,0,28403,0.0,0.0,1.0,26.0,,Southampton,0
5,790,"McMahon, Mr. Martin",1.0,0.0,,0,0,370372,0.0,1.0,0.0,7.75,,Queenstown,0
6,791,"McNamee, Mr. Neal",1.0,0.0,24.0,1,0,376566,0.0,1.0,0.0,16.1,,Southampton,0
7,792,"McNamee, Mrs. Neal (Eileen O'Leary)",0.0,1.0,19.0,1,0,376566,0.0,1.0,0.0,16.1,,Southampton,0
8,793,"McNeill, Miss. Bridget",0.0,1.0,,0,0,370368,0.0,1.0,0.0,7.75,,Queenstown,0
9,794,"Meanwell, Miss. (Marion Ogden)",0.0,1.0,,0,0,SOTON/O.Q. 392087,0.0,1.0,0.0,8.05,,Southampton,0


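8. You can compute the accuracy of the model as follows. Here it is measured on the tagged training table, since the test table no longer contains the column `survived`: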

In [39]:
fitted_model.accuracy(tagged_train_table)


0.7796178343949045
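
Accuracy is the fraction of rows for which the predicted value equals the actual value. The sketch below computes it in plain Python for hypothetical toy labels. The helper `accuracy` is illustrative and is not the method of the fitted model.

```python
def accuracy(predicted, actual):
    """Return the fraction of positions where the prediction matches the actual label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("cannot compute accuracy of an empty sequence")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Hypothetical labels: 1 means survived, 0 means did not survive.
predicted = [0, 0, 1, 1, 0]
actual = [0, 1, 1, 1, 0]

print(accuracy(predicted, actual))  # 0.8
```

A value measured on the data the model was trained on tends to be optimistic. Evaluating on held-out rows that still contain the target gives a more realistic estimate.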