# Data Visualization

This tutorial shows you how to use the `safeds.plotting` module to visualize your data and understand it better.

## The data

First, we need some data to visualize. For this, we use the common example of the Titanic disaster, which is also included in our [`safe-ds-examples` package](https://pypi.org/project/safe-ds-examples). If you want to know more about the dataset, check out its [documentation](https://stdlib-examples.safe-ds.com/en/latest/examples/titanic/). Naturally, you can also use your own data.

In [None]:
from safeds.data.tabular.containers import Table

titanic = Table.from_csv_file("data/titanic.csv")

Let's have a quick look at the first 10 rows of the data:

In [None]:
titanic.slice_rows(end=10)

The visualizations we present in this tutorial work on numerical data only. So, let's remove the non-numerical columns, along with the `id` column, which carries no useful information for plotting:

In [None]:
titanic_numerical = titanic.remove_columns(
    ["id", "name", "sex", "ticket", "cabin", "port_embarked"]
)

## Correlation heatmap

The correlation heatmap is ideal for getting a quick overview of the relationships between the columns in your dataset. Each cell represents the correlation between two columns as a value between -1 and 1:

* A negative value (blue) means that if one column increases, the other tends to decrease.
* A positive value (red) means that if one column increases, the other tends to increase as well.
* A value of 0 (white) means that there is no linear relationship between the two columns.

In [None]:
titanic_numerical.plot_correlation_heatmap()
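
To make the meaning of a single heatmap cell more concrete, here is a minimal sketch of how a correlation value between two columns can be computed by hand. It assumes the Pearson correlation coefficient, which is an assumption on our part about what the heatmap shows. The function name `pearson_correlation` is hypothetical, and the sample values are made up for illustration; they are not taken from the Titanic dataset.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of the products of the deviations from the respective means
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Square roots of the summed squared deviations of each column
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Made-up values for illustration only (hypothetical, not real Titanic data)
travel_class = [1, 1, 2, 2, 3, 3]
fare = [80.0, 70.0, 30.0, 25.0, 10.0, 8.0]

r_same = pearson_correlation(travel_class, travel_class)  # a column with itself
r_fare = pearson_correlation(travel_class, fare)  # class number up, fare down

print(r_same, r_fare)
```

In this sketch, a column compared with itself yields 1, which corresponds to the diagonal of the heatmap, while the two sample columns move in opposite directions and therefore yield a negative value.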

Let's look at the correlations involving the `travel_class` column:

* `travel_class` is negatively correlated with `age`. Since a lower number denotes a better travel class, this means that older passengers tended to travel in better travel classes (1st/2nd class) than younger passengers.
* `travel_class` has no strong correlation with either `siblings_spouses` or `parents_children`.
* `travel_class` is, unsurprisingly, positively correlated with itself. You'll always find that the diagonal of a correlation heatmap is bright red.
* `travel_class` is negatively correlated with `fare`. Naturally, better travel classes were more expensive.
* `travel_class` is negatively correlated with `survived`. People in better travel classes were more likely to survive the disaster.

## Lineplot

Next, we use a lineplot to better understand the relationship between `survived` and `fare`. The line represents the mean `fare` for each value of `survived`, and the shaded area around it represents a 95% confidence interval for that mean.

In [None]:
titanic_numerical.plot_lineplot("survived", "fare")
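
The two quantities drawn in the lineplot, the mean and a 95% confidence interval around it, can be sketched with the standard library alone. This is only an illustration under stated assumptions: it uses the normal approximation (mean ± 1.96 standard errors), whereas the plotting function may compute its interval differently. The helper `mean_with_ci` is hypothetical, and the fares are made-up values rather than real Titanic data.

```python
import math
import statistics


def mean_with_ci(values, z=1.96):
    """Return (mean, lower, upper) using a normal-approximation 95% interval."""
    mean = statistics.mean(values)
    # Standard error of the mean: sample standard deviation / sqrt(n)
    standard_error = statistics.stdev(values) / math.sqrt(len(values))
    return mean, mean - z * standard_error, mean + z * standard_error


# Made-up fares grouped by the value of `survived` (hypothetical data)
fares_by_survived = {
    0: [7.0, 8.0, 10.0, 13.0, 26.0],
    1: [10.0, 26.0, 30.0, 52.0, 80.0],
}

summary = {
    survived: mean_with_ci(fares)
    for survived, fares in fares_by_survived.items()
}

for survived, (mean, lower, upper) in summary.items():
    print(f"survived={survived}: mean={mean:.1f}, interval=({lower:.1f}, {upper:.1f})")
```

Each group contributes one point on the line (its mean), and the interval bounds correspond to the edges of the shaded area at that point.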

We can conclude that, on average, survivors paid around twice as much for their ticket as non-survivors.

## Boxplot

In [None]:
titanic_numerical.get_column("age").plot_boxplot()

## Boxplot of all numerical columns

In [None]:
titanic_numerical.plot_boxplots()

## Histogram

In [None]:
titanic_numerical.get_column("fare").plot_histogram()

## Scatterplot

In [None]:
titanic_numerical.plot_scatterplot("age", "fare")
