# Data Processing

This tutorial explains how tabular data can be handled and transformed with the `Table` class.

<div class="admonition note">
  <p class="admonition-title">Note</p>
  <p>
    All operations on a <code>Table</code> return a new <code>Table</code>. The original <code>Table</code> will not be changed.
  </p>
</div>

## Create & load data

1. Load your data into a `Table`:

In [None]:
from safeds.data.tabular.containers import Table

titanic = Table.from_csv_file("data/titanic.csv")

2. Create a `Table` containing only the first 10 rows:

In [None]:
titanic_slice = titanic.slice_rows(length=10)

titanic_slice # just to show the output

3. Extract a `Column` from your `Table`:

In [None]:
titanic_slice.get_column("name")

4. Combine a list of `Column`s into a `Table` (make sure the `Column`s have the same number of rows):

In [None]:
Table.from_columns([
    titanic_slice.get_column("name"),
    titanic_slice.get_column("age")
])

5. Drop columns from a `Table`:

In [None]:
titanic_slice.remove_columns([
    "id",
    "name",
    "ticket",
    "cabin",
    "port_embarked",
    "survived"
])

6. Keep only specified columns of a `Table`:

In [None]:
titanic_slice.remove_columns_except(["name", "survived"])

## Process data

1. Remove rows that match a given query (here, all rows where `age` is less than 1):

In [None]:
titanic.remove_rows(
    lambda row: row.get_value("age") < 1
)
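
To make the semantics concrete, here is a minimal plain-Python sketch that does not use Safe-DS. It assumes that `remove_rows` drops every row for which the query returns true and keeps the rest. The helper `remove_rows_sketch` and the sample rows are hypothetical.

```python
def remove_rows_sketch(rows, query):
    """Return a new list without the rows for which query(row) is true."""
    return [row for row in rows if not query(row)]


# Hypothetical sample rows, modelled as plain dictionaries.
rows = [
    {"name": "A", "age": 0.5},
    {"name": "B", "age": 30.0},
    {"name": "C", "age": 0.9},
]

# Rows with age < 1 are removed; the input list itself is left unchanged.
remaining = remove_rows_sketch(rows, lambda row: row["age"] < 1)
print(remaining)
```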

## Transform table

1. Transform table using `Imputer`. An `Imputer` replaces missing values with other values (e.g. a constant, or the mean or median of the column), depending on the chosen strategy. For example, the following `SimpleImputer` replaces missing values in the given columns of the table with the constant 0:

In [None]:
from safeds.data.tabular.transformation import SimpleImputer

imputer = SimpleImputer(SimpleImputer.Strategy.constant(0)).fit(titanic, ["age", "fare", "cabin", "port_embarked"])
imputer.transform(titanic_slice)
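
The idea behind imputation, and behind fitting on one table and transforming another, can be sketched in plain Python without Safe-DS. The helpers `fit_mean` and `impute` and the sample values are hypothetical, and missing values are assumed to be represented as `None`.

```python
from statistics import fmean


def fit_mean(values):
    """Learn a fill value: the mean of all non-missing entries."""
    return fmean(v for v in values if v is not None)


def impute(values, fill_value):
    """Return a new list with every missing entry replaced by fill_value."""
    return [fill_value if v is None else v for v in values]


full_ages = [20.0, None, 40.0, 30.0]   # data used for fitting
slice_ages = [None, 25.0]              # data that gets transformed

# Constant strategy: nothing has to be learned from the data.
constant_filled = impute(slice_ages, 0)

# Mean strategy: the fill value is learned from the full data,
# then applied to the slice.
mean_filled = impute(slice_ages, fit_mean(full_ages))

print(constant_filled, mean_filled)
```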

2. Transform table using `LabelEncoder`. This encodes the categorical features in the chosen `Column`s as integers:

In [None]:
from safeds.data.tabular.transformation import LabelEncoder

encoder = LabelEncoder().fit(titanic, ["sex", "port_embarked"])
encoder.transform(titanic_slice)
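
As a rough illustration of label encoding, the following plain-Python sketch builds a value-to-integer mapping from one list and applies it to another. Assigning integers in order of first appearance is an assumption made for this sketch; the library may order the labels differently. The helper names are hypothetical.

```python
def fit_label_mapping(values):
    """Assign each distinct value an integer, in order of first appearance."""
    mapping = {}
    for value in values:
        if value not in mapping:
            mapping[value] = len(mapping)
    return mapping


def apply_label_mapping(values, mapping):
    """Replace every value by its integer label."""
    return [mapping[value] for value in values]


sex_full = ["male", "female", "female", "male"]   # data used for fitting
sex_slice = ["female", "male"]                    # data that gets transformed

mapping = fit_label_mapping(sex_full)
encoded = apply_label_mapping(sex_slice, mapping)
print(mapping, encoded)
```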

3. Transform table using `OneHotEncoder`. This creates new `Column`s based on the unique values in each chosen `Column`:

In [None]:
from safeds.data.tabular.transformation import OneHotEncoder

encoder = OneHotEncoder().fit(titanic, ["sex", "port_embarked"])
encoder.transform(titanic_slice)
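
One-hot encoding can be sketched in plain Python as follows: each distinct value seen during fitting becomes its own column of zeros and ones. The helper names and the `column__value` naming pattern are assumptions made for this sketch and need not match the names the library generates.

```python
def fit_categories(values):
    """Collect the distinct values in order of first appearance."""
    categories = []
    for value in values:
        if value not in categories:
            categories.append(value)
    return categories


def one_hot(values, column_name, categories):
    """Return one 0/1 column per category, keyed by a generated column name."""
    return {
        f"{column_name}__{category}": [1 if value == category else 0 for value in values]
        for category in categories
    }


sex_full = ["male", "female", "female", "male"]   # data used for fitting
sex_slice = ["female", "male", "male"]            # data that gets transformed

categories = fit_categories(sex_full)
encoded = one_hot(sex_slice, "sex", categories)
print(encoded)
```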

4. Transform table using `RangeScaler`. This scales the values in the chosen `Column`s to a given range:

In [None]:
from safeds.data.tabular.transformation import RangeScaler

scaler = RangeScaler(0.0, 1.0).fit(titanic, ["age"])
scaler.transform(titanic_slice)
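
Scaling to a range is usually done with min-max scaling: the minimum and maximum are learned during fitting, and each value is then mapped linearly into the target range. The plain-Python sketch below assumes this formula; the helper names are hypothetical and the code does not use Safe-DS.

```python
def fit_range(values):
    """Learn the minimum and maximum of the fitting data."""
    return min(values), max(values)


def scale_to_range(values, data_min, data_max, new_min=0.0, new_max=1.0):
    """Map values linearly so that data_min -> new_min and data_max -> new_max."""
    span = data_max - data_min
    if span == 0:
        raise ValueError("cannot scale a constant column")
    return [new_min + (v - data_min) / span * (new_max - new_min) for v in values]


ages_full = [0.0, 20.0, 80.0]   # data used for fitting
ages_slice = [20.0, 40.0]       # data that gets transformed

data_min, data_max = fit_range(ages_full)
scaled = scale_to_range(ages_slice, data_min, data_max)
print(scaled)
```

Because the minimum and maximum come from the fitting data, values in the transformed data that lie outside that range end up outside the target range as well.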

5. Transform table using `StandardScaler`. This standardizes the values of the chosen `Column`s:

In [None]:
from safeds.data.tabular.transformation import StandardScaler

scaler = StandardScaler().fit(titanic, ["age", "travel_class"])
scaler.transform(titanic_slice)
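
Standardization typically means subtracting the mean and dividing by the standard deviation, both learned during fitting. The plain-Python sketch below uses the population standard deviation, which is an assumption; the library may use a different convention. The helper names are hypothetical.

```python
from statistics import fmean, pstdev


def fit_standard(values):
    """Learn the mean and the population standard deviation."""
    return fmean(values), pstdev(values)


def standardize(values, mean, std):
    """Shift by the mean and divide by the standard deviation."""
    return [(v - mean) / std for v in values]


full = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]   # data used for fitting
part = [5.0, 9.0]                                 # data that gets transformed

mean, std = fit_standard(full)
standardized = standardize(part, mean, std)
print(mean, std, standardized)
```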

## Transform column

1. Transform the values of the "parents_children" `Column` into true or false, depending on whether the passenger travelled with parents or children:

In [None]:
titanic_slice.transform_column("parents_children", lambda cell: cell > 0)
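
Conceptually, this applies a function to every cell of one column and returns a new table. A minimal plain-Python sketch, with the table modelled as a dictionary of lists and a hypothetical helper `transform_column_sketch`, might look like this:

```python
def transform_column_sketch(table, name, transformer):
    """Return a new table in which one column has been transformed cell by cell."""
    new_table = dict(table)  # shallow copy, so the original mapping stays untouched
    new_table[name] = [transformer(cell) for cell in table[name]]
    return new_table


# Hypothetical sample data.
table = {"name": ["A", "B", "C"], "parents_children": [0, 2, 1]}

result = transform_column_sketch(table, "parents_children", lambda cell: cell > 0)
print(result["parents_children"])
```

The original `table` still holds the integer counts afterwards, mirroring the note at the top that operations return a new `Table`.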