*Analytical Information Systems*

# Descriptive Statistics in R - Baseball Salaries

Prof. Christoph M. Flath<br>
Lehrstuhl für Wirtschaftsinformatik und Informationsmanagement

Summer Semester 2019

<h1>Agenda<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Load-packages" data-toc-modified-id="Load-packages-1">Load packages</a></span></li><li><span><a href="#Download-and-preprocess-data" data-toc-modified-id="Download-and-preprocess-data-2">Download and preprocess data</a></span></li><li><span><a href="#Central-Tendency" data-toc-modified-id="Central-Tendency-3">Central Tendency</a></span></li><li><span><a href="#Variability" data-toc-modified-id="Variability-4">Variability</a></span></li><li><span><a href="#Shape" data-toc-modified-id="Shape-5">Shape</a></span></li></ul></div>

## Load packages


In [None]:
library(tidyverse)
library(moments)

## Download and preprocess data

In [None]:
file_url <- "https://www.dropbox.com/s/ysd0zljicq5yqfo/baseball.csv?dl=1"

# read_csv2() expects semicolon-separated fields
salaries <- file_url %>%
    read_csv2() %>%
    # strip "$" and thousands separators, then convert to millions of dollars
    mutate(Salary = str_replace_all(Salary, "[$,]", ""),
           Salary = as.numeric(Salary) / 1000000)
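
To make the cleaning step concrete, here is a minimal base-R sketch on a made-up vector. The values in `raw` are illustrative and not taken from the data set; the sketch assumes salaries arrive as strings such as `"$1,250,000"`, which is what the replacements above suggest.

```r
# Hypothetical raw salary strings; not taken from baseball.csv
raw <- c("$1,250,000", "$500,000")

# Remove dollar signs and thousands separators, then express in millions
cleaned <- as.numeric(gsub("[$,]", "", raw)) / 1e6
print(cleaned)
```

After this step the column is numeric, so `mean()`, `sd()` and similar functions can be applied directly.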

Take a quick look at the data:

In [None]:
glimpse(salaries)

## Central Tendency

In [None]:
salaries %>%
  summarise(mean=mean(Salary),
            median=median(Salary))

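R has no built-in function for the statistical mode, so we count how often each salary occurs and list the most frequent values: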

In [None]:
salaries %>%
  group_by(Salary) %>%
  summarize(count = n()) %>%
  arrange(-count) %>%
  head(5)
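
If the mode is needed as a value rather than as a frequency table, a small helper can wrap the same counting idea. The sketch below is one possible way to do it in base R; the name `stat_mode` and the toy vector are assumptions made for illustration.

```r
# Hypothetical helper: returns every value that occurs most often
stat_mode <- function(x) {
  values <- unique(x)                      # distinct values, in order of appearance
  counts <- tabulate(match(x, values))     # how often each distinct value occurs
  values[counts == max(counts)]            # keep all values tied for the top count
}

toy <- c(0.5, 0.5, 1.2, 3.0, 3.0, 7.5)     # made-up salaries in millions
stat_mode(toy)
```

Returning all tied values is a design choice: a distribution can have more than one mode, and silently picking the first one would hide that.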

## Variability

In [None]:
salaries %>%
  summarise(range=max(Salary)-min(Salary),
            var=var(Salary),
            CoV=sd(Salary)/mean(Salary))
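
The coefficient of variation (`CoV`) divides the standard deviation by the mean, so the unit of measurement cancels out. The following sketch uses made-up numbers and a hypothetical helper name `coef_var` to show that rescaling salaries from millions to dollars changes the variance but not the coefficient of variation.

```r
# Illustrative salaries in millions (made-up values)
x_millions <- c(0.5, 1.2, 3.0, 7.5)
x_dollars <- x_millions * 1e6

# Hypothetical helper: standard deviation relative to the mean
coef_var <- function(x) sd(x) / mean(x)

cv_millions <- coef_var(x_millions)
cv_dollars <- coef_var(x_dollars)             # same value: the unit cancels out
var_ratio <- var(x_dollars) / var(x_millions) # variance scales by (1e6)^2

print(c(cv_millions, cv_dollars, var_ratio))
```

This is why the coefficient of variation is convenient for comparing groups whose typical salary levels differ.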

Tukey's five-number summary (minimum, lower hinge, median, upper hinge, maximum):

In [None]:
fivenum(salaries$Salary)
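
Hinges are not always identical to the quartiles that `quantile()` reports by default. A small sketch on a toy vector (not the salary data) makes the difference visible:

```r
x <- c(1, 2, 3, 4)

hinges <- fivenum(x)             # 1.0 1.5 2.5 3.5 4.0
quarts <- unname(quantile(x))    # 1.00 1.75 2.50 3.25 4.00 (default type 7)

print(hinges)
print(quarts)
```

The hinges are the medians of the lower and upper halves of the data, whereas the default `quantile()` method interpolates linearly between order statistics. For larger samples the two usually differ only slightly.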

The `summary()` function reports the minimum, quartiles, median, mean, and maximum:

In [None]:
summary(salaries$Salary)

#### These figures are not meaningful without comparisons, so let's compute them at team level

- range

In [None]:
salaries %>%
  group_by(Team) %>%
  summarize(range = diff(range(Salary))) %>%
  arrange(range)

- coefficient of variation

In [None]:
salaries %>%
  group_by(Team) %>%
  summarize(CoV = sd(Salary)/mean(Salary)) %>%
  arrange(CoV)

## Shape

In [None]:
salaries %>%
  summarise(skew=skewness(Salary),
            kurt=kurtosis(Salary))
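
To see what these two numbers measure, the sketch below computes moment-based skewness and kurtosis by hand in base R. The helper names and toy vectors are assumptions made for illustration. To the best of my knowledge, `skewness()` and `kurtosis()` from the `moments` package use the same moment ratios, which would mean a normal distribution has kurtosis 3 rather than 0; it is worth checking that against the package documentation.

```r
# Central moment of order k, dividing by n (not n - 1)
central_moment <- function(x, k) mean((x - mean(x))^k)

# Hypothetical helpers: standardized third and fourth central moments
skew_by_hand <- function(x) central_moment(x, 3) / central_moment(x, 2)^(3 / 2)
kurt_by_hand <- function(x) central_moment(x, 4) / central_moment(x, 2)^2

sym <- c(1, 2, 3, 4, 5)       # symmetric toy data
right <- c(1, 1, 1, 2, 10)    # toy data with a long right tail

skew_by_hand(sym)     # 0 for symmetric data
kurt_by_hand(sym)     # 1.7
skew_by_hand(right)   # positive: the tail extends to the right
```

A positive skewness, as expected for salary data with a few very high earners, indicates that the right tail is longer than the left.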

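In [None]:
# Teams sorted by skewness, most right-skewed first
salaries %>%
  group_by(Team) %>%
  summarize(skew = skewness(Salary)) %>%
  arrange(-skew)

In [None]:
# Teams sorted by skewness, least skewed first
salaries %>%
  group_by(Team) %>%
  summarize(skew = skewness(Salary)) %>%
  arrange(skew)

In [None]:
# Teams sorted by kurtosis, heaviest tails first
salaries %>%
  group_by(Team) %>%
  summarize(kurt = kurtosis(Salary)) %>%
  arrange(-kurt)

In [None]:
# Teams sorted by kurtosis, lightest tails first
salaries %>%
  group_by(Team) %>%
  summarize(kurt = kurtosis(Salary)) %>%
  arrange(kurt)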