# Week 06: Descriptive Statistics

## Introduction 

Statistics can be defined as the branch of mathematics that deals with the collection, organization, analysis, interpretation, and presentation of data. It is commonly subdivided into two main areas:

* **Descriptive statistics** deals with describing data and visualizing them.

* **Inferential statistics** deals with drawing conclusions that go beyond the data at hand. Typically, this means testing hypotheses about relationships between variables, such as correlations.

In this session, we focus on describing data, in particular on:

* *measures of central tendency*
* *measures of variability*
* *confidence intervals*
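For reference, these are the usual sample formulas behind the measures listed above, for a sample $x_1, \dots, x_n$. The median is the middle value of the sorted data (the average of the two middle values when $n$ is even). The variance and standard deviation use the $n-1$ denominator, as R's `var()` and `sd()` do, and the interval shown is the t-based confidence interval for the mean at confidence level $1-\alpha$ (e.g. $\alpha = 0.05$ for 95%).

$$
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i
\qquad
s^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2
\qquad
s = \sqrt{s^2}
$$

$$
\text{range} = x_{\max} - x_{\min}
\qquad
\text{CI}_{1-\alpha} = \bar{x} \pm t_{1-\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}
$$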


**Preparation and session setup**

Before turning to the code below, please install the required packages by running the code cell below this paragraph. If you have already installed these packages, you can skip this step. The install commands are commented out: remove the `#` at the start of each line before running the cell. Installing all packages may take between 1 and 5 minutes, so do not worry if it takes a while.


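```r
# install packages (remove the leading # to run a line)
#install.packages("dplyr")
#install.packages("DescTools")
#install.packages("readxl")
#install.packages("Rmisc")  # needed for Rmisc::CI() in Tasks 5 and 6
```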


Now that we have installed the packages, we activate them as shown below.



```r
# activate packages
library(DescTools)
library(dplyr)
library(readxl)
```


## Tutorial Activity

Form groups; each group describes one data set.

Complete the following tasks:

**Task 1**: Calculate the overall mean, median, and standard deviation of the numeric variable.

**Task 2**: Provide the frequencies of the nominal and categorical variables.

**Task 3**: Cross-tabulate the nominal and categorical variables.

**Task 4**: For each configuration of nominal and categorical variables (e.g. young women, young men, middle-aged women, middle-aged men, old women, old men), calculate
  + the mean
  + the median
  + the range
  + the standard deviation
  + the variance

**Task 5**: Calculate the confidence interval for the mean of the numeric variable.

**Task 6**: Calculate the confidence interval for the mean of the numeric variable for each configuration of nominal and categorical variables.

## Load data 


Load your group's data set.


```r
# group 1: add the path to your group's Excel file inside read_excel()
dat <- readxl::read_excel()
```


## Task 1

Calculate the overall mean, median, and standard deviation of the numeric variable.


```r
# Task 1: add the summary statistics of the numeric variable inside summarise()
dat %>%
  dplyr::summarise()
```
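As a worked sketch, the code below fills in the Task 1 template for a small **hypothetical** data frame. The column names `Age`, `Gender`, and `Score` and all values are invented for illustration and only stand in for your group's variables.

```r
library(dplyr)

# Hypothetical stand-in data (invented for illustration): two categorical
# variables (Age, Gender) and one numeric variable (Score)
dat <- data.frame(
  Age    = c("young", "young", "young", "young", "young", "old", "old", "old", "old"),
  Gender = c("woman", "woman", "woman", "man", "man", "woman", "woman", "man", "man"),
  Score  = c(12, 14, 16, 15, 19, 20, 22, 18, 16)
)

# Overall mean, median, and standard deviation of the numeric variable
overall <- dat %>%
  dplyr::summarise(
    Mean   = mean(Score),
    Median = median(Score),
    SD     = sd(Score)
  )
overall
```

If the numeric variable contains missing values, add `na.rm = TRUE` to each call, e.g. `mean(Score, na.rm = TRUE)`; otherwise the result is `NA`.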


## Task 2

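Provide the frequencies of the nominal and categorical variables.


```r
# Task 2: add the grouping variable(s); Frequency is an arbitrary name for the count column
dat %>%
  dplyr::group_by() %>%
  dplyr::summarise(Frequency = n())
```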
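A possible completion of the Task 2 template, again on **hypothetical** stand-in data (invented column names and values): grouping by one variable gives simple frequencies, grouping by two gives the frequency of each combination.

```r
library(dplyr)

# Hypothetical stand-in data (invented for illustration)
dat <- data.frame(
  Age    = c("young", "young", "young", "young", "young", "old", "old", "old", "old"),
  Gender = c("woman", "woman", "woman", "man", "man", "woman", "woman", "man", "man"),
  Score  = c(12, 14, 16, 15, 19, 20, 22, 18, 16)
)

# Frequency of each level of one categorical variable
age_freq <- dat %>%
  dplyr::group_by(Age) %>%
  dplyr::summarise(Frequency = n())

# Frequency of each combination of two categorical variables;
# .groups = "drop" returns an ungrouped result
age_gender_freq <- dat %>%
  dplyr::group_by(Age, Gender) %>%
  dplyr::summarise(Frequency = n(), .groups = "drop")

age_freq
age_gender_freq
```

`dplyr::count(dat, Age)` is a shorthand for the same group-and-count pattern; it names the count column `n`.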


## Task 3

Cross-tabulate the nominal and categorical variables.


```r
# Task 3: pass the variables to cross-tabulate, e.g. as a table() object
ftable()
```
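A minimal base-R sketch of a cross-tabulation, assuming the same **hypothetical** stand-in data (invented column names and values). `table()` builds the contingency table, `ftable()` prints it in flat form, and `addmargins()` adds row and column totals.

```r
# Hypothetical stand-in data (invented for illustration)
dat <- data.frame(
  Age    = c("young", "young", "young", "young", "young", "old", "old", "old", "old"),
  Gender = c("woman", "woman", "woman", "man", "man", "woman", "woman", "man", "man"),
  Score  = c(12, 14, 16, 15, 19, 20, 22, 18, 16)
)

# Two-way contingency table; naming the arguments labels the dimensions
tab <- table(Age = dat$Age, Gender = dat$Gender)

# ftable() prints a "flat" table; this matters most with three or more
# variables, where it keeps the output two-dimensional
ftable(tab)

# The same table with row and column totals (labelled "Sum")
addmargins(tab)
```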



## Task 4

For each configuration of nominal and categorical variables (e.g. young women, young men, middle-aged women, middle-aged men, old women, old men), calculate
  + the mean
  + the median
  + the range
  + the standard deviation
  + the variance.

You need the functions `mean`, `median`, `min`, `max`, and `sd`. The variance is the squared standard deviation (square it with `^2`, or use `var` directly).


```r
# Task 4: add the grouping variables and the summary statistics
dat %>%
  dplyr::group_by() %>%
  dplyr::summarise()
```
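One way to fill in the Task 4 template, sketched on **hypothetical** stand-in data (invented column names and values). Grouping by both categorical variables gives one row per configuration; the range is reported as a single number (`max - min`).

```r
library(dplyr)

# Hypothetical stand-in data (invented for illustration)
dat <- data.frame(
  Age    = c("young", "young", "young", "young", "young", "old", "old", "old", "old"),
  Gender = c("woman", "woman", "woman", "man", "man", "woman", "woman", "man", "man"),
  Score  = c(12, 14, 16, 15, 19, 20, 22, 18, 16)
)

# Summary statistics for every combination of Age and Gender
by_group <- dat %>%
  dplyr::group_by(Age, Gender) %>%
  dplyr::summarise(
    Mean     = mean(Score),
    Median   = median(Score),
    Min      = min(Score),
    Max      = max(Score),
    Range    = max(Score) - min(Score),  # range as a single number
    SD       = sd(Score),
    Variance = sd(Score)^2,              # equivalently var(Score)
    .groups  = "drop"
  )
by_group
```

A configuration with only one observation gets `NA` for `SD` and `Variance`, because the sample standard deviation needs at least two values.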


## Task 5

Calculate the confidence interval for the mean of the numeric variable.


```r
# Task 5: add the numeric variable (a column of dat) as the first argument
Rmisc::CI(, ci = 0.95)
```
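To our understanding, `Rmisc::CI()` returns a t-based interval as a named vector with the elements `upper`, `mean`, and `lower`. The base-R sketch below computes the same kind of interval without Rmisc, on a **hypothetical** vector of scores; the helper name `ci_mean()` is invented here and is not part of any package. `t.test()` reports the same interval and serves as a cross-check.

```r
# Hypothetical stand-in data (invented for illustration)
Score <- c(12, 14, 16, 15, 19, 20, 22, 18, 16)

# Hypothetical helper: t-based confidence interval for the mean
ci_mean <- function(x, ci = 0.95) {
  n      <- length(x)
  m      <- mean(x)
  # critical t value for a two-sided interval with n - 1 degrees of freedom
  t_crit <- qt(ci + (1 - ci) / 2, df = n - 1)
  margin <- t_crit * sd(x) / sqrt(n)
  c(upper = m + margin, mean = m, lower = m - margin)
}

score_ci <- ci_mean(Score, ci = 0.95)
score_ci

# Cross-check: t.test() reports the same 95% interval (lower, upper)
t.test(Score, conf.level = 0.95)$conf.int
```

A higher confidence level (e.g. `ci = 0.99`) uses a larger critical t value and therefore gives a wider interval.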



## Task 6

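Calculate the confidence interval for the mean of the numeric variable for each configuration of nominal and categorical variables.

Tip: split the data into one subset per configuration (e.g. with `dplyr::filter()`) and calculate the confidence interval for each subset.


```r
# Task 6: add the filter conditions, then the numeric variable of the subset
slat7806_low <- dat %>%
  dplyr::filter( & )
Rmisc::CI(, ci = 0.95)
```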
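As an alternative to filtering each configuration by hand, the same t-based interval can be computed for all configurations at once with `group_by()`. This is a sketch on **hypothetical** stand-in data (invented column names and values), and it assumes the interval formula matches the one used by `Rmisc::CI()`.

```r
library(dplyr)

# Hypothetical stand-in data (invented for illustration)
dat <- data.frame(
  Age    = c("young", "young", "young", "young", "young", "old", "old", "old", "old"),
  Gender = c("woman", "woman", "woman", "man", "man", "woman", "woman", "man", "man"),
  Score  = c(12, 14, 16, 15, 19, 20, 22, 18, 16)
)

# t-based 95% confidence interval of Score for every Age x Gender configuration
group_ci <- dat %>%
  dplyr::group_by(Age, Gender) %>%
  dplyr::summarise(
    N       = n(),
    Mean    = mean(Score),
    Margin  = qt(0.975, df = N - 1) * sd(Score) / sqrt(N),
    .groups = "drop"
  ) %>%
  dplyr::mutate(
    Lower = Mean - Margin,
    Upper = Mean + Margin
  )
group_ci
```

With only two or three observations per configuration, as here, the intervals are very wide; a configuration with a single observation yields `NA`.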



