# R Discussion Day 3
## Data Visualisation or DataViz

- Usually the first step of a data analysis is graphical data exploration
- Graphical exploration complements descriptive statistics
- The most important aim is to get an overview of the dataset
    - Where is the data centered?
    - How is the data spread (symmetric, skewed…)?
    - Are there any outliers?
    - Are the variables normally distributed?
    - How are the variables related to each other?
        - The dependent variable to the independent variables
        - The independent variables to one another
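
Each of these questions maps onto a handful of base R calls. The following is a minimal sketch, assuming the built-in `mtcars` dataset as a stand-in (it is not used elsewhere in these notes) and treating `mpg` as the dependent variable with `wt` and `hp` as independents:

```r
# Assumption: the built-in mtcars dataset stands in for "the dataset";
# mpg is treated as the dependent variable, wt and hp as independents.
x <- mtcars$mpg

# Where is the data centered?
centre <- c(mean = mean(x), median = median(x))

# How is the data spread? A mean well above the median hints at right skew.
spread <- c(sd = sd(x), IQR = IQR(x))
hist(x, main = "Distribution of mpg", xlab = "Miles per gallon")

# Any outliers? boxplot.stats() returns points beyond 1.5 * IQR from the hinges.
outliers <- boxplot.stats(x)$out
boxplot(x, horizontal = TRUE, main = "Boxplot of mpg")

# Is the variable normally distributed? Points close to the line suggest yes.
qqnorm(x)
qqline(x)

# Relationships between variables: scatterplot matrix and correlations
vars <- mtcars[, c("mpg", "wt", "hp")]
pairs(vars)
cors <- cor(vars)

print(centre)
print(spread)
print(outliers)
print(round(cors, 2))
```

The plots answer the questions visually, while `centre`, `spread` and `cors` give the matching descriptive statistics.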
        
### Gallery of Graphs

![Graphs](https://snipboard.io/0fpz1M.jpg)
![Graphs 2](https://snipboard.io/WtCSEa.jpg)

In [None]:
# Install the tidyverse (only needed once), then load it
install.packages("tidyverse")
library(tidyverse)

In [None]:
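# Load the gapminder package
# (run install.packages("gapminder") first if it is not installed yet)
library(gapminder)

# Load the dplyr package (already attached by library(tidyverse) above)
library(dplyr)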

# Look at the gapminder dataset
gapminder

# Filter Verb
The `filter` verb extracts particular observations based on a condition. In this exercise you'll filter for observations from a particular year.

### Exercise
- Add a `filter()` line after the pipe (`%>%`) to extract only the observations from the year 1957. Remember that you use `==` to compare two values.

In [None]:
# Filter the gapminder dataset for the year 1957
gapminder %>%
    filter(year == 1957)
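
The condition inside `filter()` can be any expression that returns `TRUE` or `FALSE` for each row, not only an `==` comparison. A few variations as a sketch; the threshold of 80 and the two country names are arbitrary choices for illustration:

```r
library(dplyr)
library(gapminder)

# Same rows as filter(year == 1957), written in base R for comparison
base_1957 <- gapminder[gapminder$year == 1957, ]

# Numeric comparison: observations with life expectancy above 80 years
high_life_exp <- gapminder %>%
    filter(lifeExp > 80)

# Membership test: observations for a chosen set of countries
two_countries <- gapminder %>%
    filter(country %in% c("China", "India"))

nrow(base_1957)
nrow(high_life_exp)
nrow(two_countries)
```

Comparing the row counts with `nrow(gapminder)` is a quick way to confirm that a condition kept roughly what you expected.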

# Filtering for one country and one year

You can also pass two conditions to `filter()`. Only rows that satisfy both are kept, which can narrow the result down to a single observation.

Just like in the last exercise, you can do this in two lines of code, starting with `gapminder %>%` and having the `filter()` on the second line. Keeping one verb on each line helps keep the code readable. Note that each time, you'll put the pipe `%>%` at the end of the first line (like `gapminder %>%`); putting the pipe at the beginning of the second line will throw an error.
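
To illustrate the pipe placement, here is a short sketch using a different country and year than the exercise below (India in 2007, chosen arbitrarily). It also shows that conditions separated by commas behave like conditions joined with `&`:

```r
library(dplyr)
library(gapminder)

# Works: the trailing pipe tells R the expression continues on the next line
india_2007 <- gapminder %>%
    filter(year == 2007, country == "India")

# Fails: R treats `gapminder` as a complete expression, then hits a
# syntax error at the leading pipe. Left commented out on purpose.
# gapminder
#     %>% filter(year == 2007, country == "India")

# Conditions separated by commas are combined with AND, so this is equivalent
india_2007_and <- gapminder %>%
    filter(year == 2007 & country == "India")

india_2007
```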

### Exercise
- Filter the `gapminder` data to retrieve only the observation from China in the year 2002.

In [None]:
# Filter for China in 2002
gapminder %>%
    filter(year == 2002, country == "China")

# Arranging observations by life expectancy

You use `arrange()` to sort observations in ascending or descending order of a particular variable. In this case, you'll sort the dataset based on the `lifeExp` variable.

### Exercise
-   Sort the `gapminder` dataset in ascending order of life expectancy (`lifeExp`).
-   Sort the `gapminder` dataset in descending order of life expectancy.

In [None]:
# Sort in ascending order of lifeExp
gapminder %>%
    arrange(lifeExp)
  
# Sort in descending order of lifeExp
gapminder %>%
    arrange(desc(lifeExp))
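
`arrange()` also accepts several variables, in which case later variables break ties in earlier ones, and the sorted result can be passed to `head()` to keep only the first rows. A short sketch; using `year` as the first sort key and keeping three rows are arbitrary choices:

```r
library(dplyr)
library(gapminder)

# Sort by year first, then by descending life expectancy within each year
by_year_then_life_exp <- gapminder %>%
    arrange(year, desc(lifeExp))

# Keep the three observations with the lowest life expectancy overall
lowest_three <- gapminder %>%
    arrange(lifeExp) %>%
    head(3)

lowest_three
```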

# Filtering and arranging

You'll often need to use the pipe operator (`%>%`) to chain several dplyr verbs together. In this case, you'll combine a `filter()` with an `arrange()` to find the most populous countries in a particular year.

### Exercise
-   Use `filter()` to extract observations from just the year 1957, then use `arrange()` to sort in descending order of population (`pop`).

In [None]:
# Filter for the year 1957, then arrange in descending order of population
gapminder %>%
    filter(year == 1957) %>%
    arrange(desc(pop))
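
The pipe passes the result on its left as the first argument of the function on its right, so a chain of verbs reads top to bottom instead of inside out. For comparison, here is a sketch of the same pipeline written as nested calls and with an intermediate variable (the variable names are illustrative):

```r
library(dplyr)
library(gapminder)

# Piped version: filter first, then sort
piped <- gapminder %>%
    filter(year == 1957) %>%
    arrange(desc(pop))

# Equivalent nested calls: read from the innermost call outwards
nested <- arrange(filter(gapminder, year == 1957), desc(pop))

# Equivalent step-by-step version with an intermediate result
obs_1957 <- filter(gapminder, year == 1957)
stepwise <- arrange(obs_1957, desc(pop))

# All three approaches return the same rows in the same order
identical(piped$country, nested$country)
identical(piped$country, stepwise$country)
```

The piped form avoids both the nesting and the throwaway variable, which is why it is the usual style for dplyr code.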

In [None]:
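# Look at the mpg dataset (fuel economy data that ships with ggplot2, loaded via tidyverse)
mpg

In [None]:
# Histogram of city miles per gallon with default settings
hist(mpg$cty)

In [None]:
# The same histogram with a custom axis label, title, bin count and colours
hist(mpg$cty,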
     xlab   = "Miles Per Gallon (City)",
     main   = "Histogram of MPG (City)",
     breaks = 12,
     col    = "dodgerblue",
     border = "darkorange")
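
Note that a single number passed to `breaks` is only a suggestion: `hist()` rounds the breakpoints to "pretty" values, so the number of bins drawn may differ from the number requested. One way to check, sketched here, is to compute the histogram without plotting it and inspect the result. Passing a vector of breakpoints instead gives exact control; the bin width of 2 below is an arbitrary choice:

```r
library(ggplot2)  # provides the mpg dataset

# Compute the histogram without drawing it
h <- hist(mpg$cty, breaks = 12, plot = FALSE)
h$breaks          # breakpoints R actually chose
length(h$counts)  # number of bins actually used

# Explicit breakpoints: bins of width 2 covering the full range of cty
lo <- floor(min(mpg$cty) / 2) * 2
hi <- ceiling(max(mpg$cty) / 2) * 2
hist(mpg$cty,
     breaks = seq(lo, hi, by = 2),
     xlab   = "Miles Per Gallon (City)",
     main   = "Histogram of MPG (City), bin width 2")
```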

## How to DataViz 101

![Single Value](https://snipboard.io/W9YAS7.jpg)
![Relationships](https://snipboard.io/NB6IYc.jpg)
![Distribution](https://snipboard.io/fM7SPt.jpg)
![Trend](https://snipboard.io/Iu17x4.jpg)
![Part to whole](https://snipboard.io/IVLjS4.jpg)
![Flow](https://snipboard.io/ip4f92.jpg)

References:

- https://book.stat420.org/summarizing-data.html
- https://www.r-exercises.com/start-here-to-learn-r/
- https://arc.lib.montana.edu/book/statistics-with-r-textbook/table-of-contents.html
- https://crumplab.com/rstatsforpsych/descriptives.html
- https://r4ds.had.co.nz/data-visualisation.html
- https://jrnold.github.io/r4ds-exercise-solutions/data-visualisation.html#geometric-objects