# Data Visualization

[Data-based Storytelling](./index.html)

Daniel Winkler (Institute for Retailing & Data Science)  
Stephan Fally (Department Marketing)

# Always Visualize!

## Processing numbers is hard

In [None]:
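library(datasauRus)
library(dplyr)
library(ggplot2)  # ggplot(), geom_point(), facet_wrap() used below
library(gt)       # gt(), cols_label(), tab_options() used below
library(stringr)  # str_to_title() used below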


Attaching package: 'dplyr'

The following objects are masked from 'package:stats':

    filter, lag

The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union

## Summary statistics are limiting

In [None]:
datasaurus_dozen |>
  # Keep only the two datasets being compared
  filter(dataset %in% c('dino', 'star')) |>
  group_by(dataset) |>
  # Per-dataset summary statistics, rounded to two decimals
  summarize(
    avg_x = round(mean(x), digits = 2),
    sd_x = round(sd(x), digits = 2),
    avg_y = round(mean(y), digits = 2),
    sd_y = round(sd(y), digits = 2),
    cor_xy = round(cor(x, y), digits = 2)
  ) |>
  mutate(dataset = str_to_title(dataset)) |>
  # Render as a gt table with readable column labels
  gt() |>
  cols_label(
    avg_x = "Mean of x",
    sd_x = "Std. Dev. of x",
    avg_y = "Mean of y",
    sd_y = "Std. Dev. of y",
    cor_xy = "Correlation"
  ) |>
  tab_options(
    table.width = pct(85),
    table.font.size = 35
  )

``` r
datasaurus_dozen |>
  filter(dataset %in% c('dino', 'star')) |>
  mutate(dataset = str_to_title(dataset)) |>
  ggplot(aes(x = x, y = y)) +
  geom_point() +
  facet_wrap(~dataset) +
  theme_minimal() +
  theme(
    panel.grid=element_blank(),
    strip.text=element_text(size=35),
    axis.ticks=element_blank(),
    axis.text=element_blank(),
    axis.title=element_text(size=30)
  )
```

![](attachment:02-data_visualization_files/figure-ipynb/unnamed-chunk-4-1.png)

## Exercise

> **See beyond summary statistics**
>
> -   Select the `dataset`s `x_shape` & `bullseye` from the data.frame
>     `datasaurus_dozen`
> -   Create a table showing the following statistics for the two
>     `dataset`s:
>     -   mean of `x` and `y`,
>     -   standard deviation of `x` and `y`, and
>     -   covariance between `x` and `y`
> -   Create a plot showing the two `dataset`s

In [None]:
library(datasauRus)
library(dplyr)
library(ggplot2)
library(gt)
library(stringr)
library(tidyr)
filter(datasaurus_dozen, dataset %in% c('x_shape', 'bullseye')) |>
  str(give.attr = FALSE)

tibble [284 × 3] (S3: tbl_df/tbl/data.frame)
 $ dataset: chr [1:284] "x_shape" "x_shape" "x_shape" "x_shape" ...
 $ x      : num [1:284] 38.3 35.8 32.8 33.7 37.2 ...
 $ y      : num [1:284] 92.5 94.1 88.5 88.6 83.7 ...
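
One possible way to approach the exercise is sketched below. It reuses the pattern from the `dino`/`star` example and swaps the correlation for `cov()`. The object names `two_sets` and `stats_tbl` are illustrative choices, not part of the course material, and the styling is kept minimal.

``` r
library(datasauRus)
library(dplyr)
library(ggplot2)
library(gt)

# Keep only the two datasets named in the exercise
two_sets <- datasaurus_dozen |>
  filter(dataset %in% c("x_shape", "bullseye"))

# One row per dataset: means, standard deviations, and covariance
stats_tbl <- two_sets |>
  group_by(dataset) |>
  summarize(
    avg_x = round(mean(x), digits = 2),
    avg_y = round(mean(y), digits = 2),
    sd_x = round(sd(x), digits = 2),
    sd_y = round(sd(y), digits = 2),
    cov_xy = round(cov(x, y), digits = 2)
  )

# Show the statistics as a table
stats_tbl |>
  gt() |>
  cols_label(
    avg_x = "Mean of x",
    avg_y = "Mean of y",
    sd_x = "Std. Dev. of x",
    sd_y = "Std. Dev. of y",
    cov_xy = "Covariance"
  )

# Scatter plot with one panel per dataset
ggplot(two_sets, aes(x = x, y = y)) +
  geom_point() +
  facet_wrap(~dataset) +
  theme_minimal()
```

Comparing the table with the plot should show whether the summary statistics alone are enough to tell the two datasets apart.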

# Visual Channels

In [None]:
print("HI")

[1] "HI"

# Mapping visual ratios

If including $0$ on the y-axis would make the graph unreadable, add an
annotation noting that the axis does not start at $0$, or visualize the
difference between the values instead.
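
The two options can be sketched as follows. The `sales` data frame is made up for illustration (values far from zero with small changes), and the caption wording and the choice of the first year as the baseline are assumptions, not prescriptions from the course.

``` r
library(ggplot2)

# Hypothetical data: values far from zero with small year-to-year changes
sales <- data.frame(
  year  = 2019:2023,
  value = c(1002, 1005, 1003, 1009, 1012)
)

# Option 1: keep the truncated y-axis, but say so explicitly
p_annotated <- ggplot(sales, aes(x = year, y = value)) +
  geom_line() +
  geom_point() +
  labs(
    y = "Value",
    caption = "Note: the y-axis does not start at 0"
  )

# Option 2: plot the change relative to the first year,
# so that 0 is a meaningful baseline and bar heights are comparable
sales$diff <- sales$value - sales$value[1]

p_diff <- ggplot(sales, aes(x = year, y = diff)) +
  geom_col() +
  labs(y = "Change vs. 2019")

p_annotated
p_diff
```

In the second plot the bars start at zero, so their heights stay proportional to the quantity shown (the change), which a truncated bar chart of the raw values would not guarantee.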