# Appendix C Data 

## C.1 Data importation

The first step in data analysis is to load the data set into your workspace. In this section we will look at comma-separated values (CSV) data and at data from the Internet.

### C.1.2 Web scraping

These days, it's increasingly common to pull data from online sources. For example, say we wanted to know the population of each European country. This is [easily found](https://en.wikipedia.org/wiki/Demographics_of_Europe) on Wikipedia, and we may want to analyze it in `R`. We can use the package `htmltab` to scrape tables from web pages.

```
library(htmltab)
```

The basic syntax of the `htmltab()` function is:

```
htmltab(<url>, <table identifier>)
```
Let's try it with the Wikipedia page above.
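
The call might look like the following sketch. The variable name `url` matches the one used later in this section. The name `first.table` and the `tryCatch()` wrapper are additions made here, on the assumption that we would rather get `NULL` than an error if the page is unreachable or its layout has changed.

```r
# Address of the Wikipedia page linked above.
url <- "https://en.wikipedia.org/wiki/Demographics_of_Europe"

# With no second argument, htmltab() takes the first table on the page.
# tryCatch() is an addition (not required by htmltab): it returns NULL on failure.
first.table <- tryCatch(
    htmltab::htmltab(url),
    error = function(e) NULL
)

# Inspect the first few rows if the download succeeded.
if (!is.null(first.table)) print(head(first.table))
```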

This does not produce what we want. The page contains many tables, and by default `htmltab()` takes only the first one it finds. We can pass a number as the second argument to select the second, third, or any later table instead.
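
For instance, the pipeline later in this section uses table 6. Here is a minimal sketch, assuming that index still points at the population table (Wikipedia pages are edited often, so it may have shifted). The `tryCatch()` wrapper is an addition that returns `NULL` if the download or parsing fails.

```r
url <- "https://en.wikipedia.org/wiki/Demographics_of_Europe"

# The second argument (named `which` in htmltab) selects a table by position.
# The index 6 is taken from the pipeline in this section and may be out of date.
europe.pop <- tryCatch(
    htmltab::htmltab(url, which = 6),
    error = function(e) NULL
)

# Check the column names to confirm this is the table we were after.
if (!is.null(europe.pop)) print(names(europe.pop))
```

If the column names do not include `Population`, trying neighboring indices is a quick way to locate the right table.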

To get `europe.pop` into a usable format we need to do a bit more work. 

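```
library(tidyverse)  # provides %>%, as_tibble(), slice(), mutate(), and ggplot2

# Take table 6, drop row 60, and strip the commas from Population before plotting.
htmltab(url, 6) %>% as_tibble() %>% slice(-60) %>%
    mutate(pop = as.integer(gsub(",", "", Population))) %>%
    ggplot() + geom_col(aes(x = `Country/territory`, y = pop)) + coord_flip()
```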

In making the plot above, we used quite a few new commands. We will learn more about data manipulation in the following sections, and we will see more of `ggplot2` in Chapter 2.
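
The cleaning step inside `mutate()` can be tried in isolation. Scraped numbers usually arrive as text with thousands separators, so calling `as.integer()` on them directly would give `NA`. Below is a small sketch in base `R` using made-up values (the data frame `toy` and its figures are hypothetical, not taken from the Wikipedia table):

```r
# Hypothetical stand-in for a scraped table: counts stored as text with commas.
toy <- data.frame(
    country    = c("A", "B", "C"),
    Population = c("1,234", "56,789", "1,000,000"),
    stringsAsFactors = FALSE
)

# Remove every comma, then convert the remaining digits to integers.
toy$pop <- as.integer(gsub(",", "", toy$Population))

print(toy$pop)
```

Note that `as.integer()` cannot represent values above `.Machine$integer.max` (about 2.1 billion). For larger counts, `as.numeric()` is the safer conversion.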