# Loading Data

## Download Rmd Version

If you wish to engage with this course content via Rmd, then please click the link below to download the Rmd file.

[Download load_data.Rmd](rmarkdown/load_data.Rmd)

## Download iris.csv data

[Download iris.csv](data/iris.csv)

## Learning Objectives
- Learn how to load data from CSV files into R using the `read.csv` function
- Recognize and understand the arguments for the `read.csv` function, particularly `file` and `header`
- Execute data loading operations and assign the data to a variable for further use
- Understand the significance of the `header` argument and its default value in the `read.csv` function
- Gain awareness of other useful arguments and methods for importing data using `read.csv`

## Loading Data From Files 

More often than not, the data you want to analyse will be stored in a file that you need to load into R. We are going to use the 
dataset in the file "data/iris.csv", which you can download from the course webpage. While this dataset is already in R as an example dataset, we are loading it here to demonstrate how loading works.
The file is in comma-separated values (CSV) format, which means that commas separate the values in each row, marking where one column ends and the next begins.

We need to tell R where the file that contains the values is. If the path we give does not point to the file, we'll get an error message when trying to read it. We can load the data into R using `read.csv`.
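
Before reading the file, it can help to confirm that R can see it. The following is a minimal sketch, assuming the file was saved as `data/iris.csv` relative to the working directory; the variable name `iris_path` is only illustrative.

```r
# Show the folder R uses as the starting point for relative paths.
getwd()

# Path used in this lesson, relative to the working directory.
iris_path <- "data/iris.csv"

# file.exists() returns TRUE or FALSE and does not raise an error.
if (file.exists(iris_path)) {
  message("Found ", iris_path)
} else {
  message("Cannot find ", iris_path, " from ", getwd())
}
```

If the file is not found, either move it into a `data` folder inside the working directory or change the path to match where it was saved.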

Assuming you have downloaded the file into a subdirectory called `data` within your current working directory, you can execute the following:

In [2]:
read.csv(file = "data/iris.csv", header = TRUE)

Sepal.Length,Sepal.Width,Petal.Length,Petal.Width,Species
<dbl>,<dbl>,<dbl>,<dbl>,<chr>
5.1,3.5,1.4,0.2,setosa
4.9,3.0,1.4,0.2,setosa
4.7,3.2,1.3,0.2,setosa
4.6,3.1,1.5,0.2,setosa
5.0,3.6,1.4,0.2,setosa
5.4,3.9,1.7,0.4,setosa
4.6,3.4,1.4,0.3,setosa
5.0,3.4,1.5,0.2,setosa
4.4,2.9,1.4,0.2,setosa
4.9,3.1,1.5,0.1,setosa


We have provided two arguments to this function: 
1. __file__ - the name of the file we want to read.
2. __header__ - whether the first line of the file contains names for the columns of data.

The filename needs to be a character string (or string for short), so we put it in quotes. 
The header argument needs to be a logical value (`TRUE` or `FALSE`). We have set it to `TRUE`, indicating that the data file does have column headers. 


Since we didn't tell it to do anything else with the function's output, the console will display the full contents of the file `iris.csv`.
Try it out.

`read.csv` reads the file, but we can't easily use the data unless we assign it to a variable. Let's re-run `read.csv` and 
save the result:

In [None]:
df <- read.csv(file = "data/iris.csv", header = TRUE)

Some of the functions we introduced earlier can be used to summarise the properties of the iris dataset.
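
As a hedged illustration, the sketch below applies a few base R functions that are commonly used for this purpose; whether they are exactly the ones introduced earlier in the course is an assumption. So that the code runs anywhere, it falls back to a temporary copy of R's built-in `iris` dataset when `data/iris.csv` is not present.

```r
# Use the lesson file if it is present; otherwise write a stand-in copy
# of R's built-in iris dataset to a temporary file (an assumption made
# only so this sketch runs without the download).
iris_path <- "data/iris.csv"
if (!file.exists(iris_path)) {
  iris_path <- tempfile(fileext = ".csv")
  write.csv(iris, iris_path, row.names = FALSE)
}

df <- read.csv(file = iris_path, header = TRUE)

head(df)      # first six rows
dim(df)       # number of rows, then number of columns
names(df)     # column names taken from the header line
str(df)       # type and first few values of each column
summary(df)   # per-column summary statistics
```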

## Other Options for Reading CSV Files
`read.csv` actually has many more arguments that you may find useful when importing your own data in the future. Take a look at `?read.csv` or `help(read.csv)` for more information on the various arguments. 
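
As a rough sketch of what some of these arguments do, the example below reads a small semicolon-separated file. The file contents and the names `tmp` and `small` are made up for illustration; `sep`, `na.strings` and `stringsAsFactors` are arguments documented in `?read.csv`.

```r
# Write a small semicolon-separated file to a temporary location.
# The contents are invented purely for this illustration.
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "id;species;petal_length",
  "1;setosa;1.4",
  "2;versicolor;missing",
  "3;virginica;6.0"
), tmp)

small <- read.csv(
  file = tmp,
  header = TRUE,
  sep = ";",                 # field separator (read.csv defaults to ",")
  na.strings = "missing",    # text that should be read as NA
  stringsAsFactors = FALSE   # keep text columns as character
)

# Inspect the column types that result from these settings.
str(small)
```

Changing one argument at a time and re-running `str(small)` is a simple way to see what each one controls.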

## Loading Data With Header

What happens if you set `header = FALSE` for a file that does have a header line? The default value is `header = TRUE`, which you can check with `?read.csv` or `help(read.csv)`, so leaving the argument out reads `iris.csv` exactly as above. What do you expect will happen if you set `header = FALSE` instead? Before you run any code, think about what will happen to the first few rows of your data frame, and its overall size.
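
Once you have made a prediction, one way to check it is to read the same file with both settings of `header` and compare the results. This is a minimal sketch that writes R's built-in `iris` dataset to a temporary file as a stand-in for `data/iris.csv`; the names `tmp`, `with_header` and `without_header` are only illustrative.

```r
# Stand-in for data/iris.csv: the built-in iris dataset written to a
# temporary CSV file with a header line and no row names.
tmp <- tempfile(fileext = ".csv")
write.csv(iris, tmp, row.names = FALSE)

# Read the same file twice, changing only the header argument.
with_header    <- read.csv(file = tmp, header = TRUE)
without_header <- read.csv(file = tmp, header = FALSE)

# Compare the overall size of the two data frames.
dim(with_header)
dim(without_header)

# Look at the first rows and the column types when header = FALSE.
head(without_header, 3)
str(without_header)
```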

## Summary Quiz

In [3]:
# Call the function to display quiz interactively:
source("../../R_functions/quiz_renderer.R")
show_quiz_from_json("questions/summary_load_data.json")