# Welcome to Azure Notebooks!

[R](https://www.r-project.org/) is a free programming language and software environment for statistical computing and graphics. It is widely used for statistics and data analysis.

In the following exercise we will give you a taste of what using R is like.

We have provided some data for you, and tidied it up so it’s ready to analyse. You can move through the blocks of code below by clicking on the code (within the grey boxes), then clicking the `Run` button above.

Exercise 1: Introduction To Jupyter Notebooks
===

The purpose of this exercise is to get you familiar with using Jupyter Notebooks. Don't worry if you find the code difficult to understand, as this is not an R course. You will slowly learn more about the R programming language as you go, and you definitely don't need to understand every line of code.

Step 1
---

Notebooks contain blocks of code that you can execute, such as the grey box below.

Give it a go.

**Click on the code below, then press `Run` in the toolbar above (or press __Shift+Enter__) to run the code.**

In [None]:
print("The code ran successfully!")

If all went well, the code should have printed a message for you.

Step 2
---

This time, let's print a message of your choice.

**In the block of code below, write a message between the quotation marks. It is OK to use spaces, numbers, and letters. Your message should appear red in colour.**

In [None]:
###
# WRITE A MESSAGE BETWEEN THE QUOTATION MARKS IN THE LINE BELOW, THEN PRESS RUN
###
print("type something here!")
###

# It's ok to use spaces, numbers, or letters. Your message should look red.
# For example: print("this is my message")

You will notice the hash symbols `#` in the code block above. Anything after a `#` on the same line is ignored by the computer (within Jupyter Notebook, this text appears blue). Starting a line with `#` lets you add comments to the code so that it is easier for people to read and follow.
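
As a small illustration (the numbers are arbitrary), a comment can take up a whole line or follow code on the same line:

```r
# This whole line is a comment, so R ignores it
print(2 + 3)  # a comment can also follow code on the same line
# print(100)  this line starts with a hash symbol, so it never runs
```

Running this prints `[1] 5` only, because the last line is never executed.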

Step 3
---

R allows us to save data to use later. In this exercise, we will save a message you create.

**In the code below, write a message within the quotation marks. Again, it is OK to use spaces, numbers, and letters, as long as they are within the quotation marks.**

In [None]:
###
# WRITE A MESSAGE BETWEEN THE QUOTATION MARKS IN THE LINE BELOW, THEN PRESS RUN
###
my_msg <- "type something here!"
###

print(my_msg) 

OK, what happened here? 

In the real world, we might put items into a cardboard box for storage, like toys, DVDs, or photo albums. We label the box, say "My DVDs", to identify what is inside the box.

In R, we can do something similar: when we want to store information, we use **variables** (the cardboard box in our analogy), and the variable is given a name to help us identify what it stores so we can refer back to it.

This is what you've just done in the code block above.

You created a message inside the quotation marks, then you saved it to a **variable** called `my_msg`.

```
my_msg <- "This is my message that I'm going to forget so I want to save it for later!"
           ↑↑↑
           The message you created
 
my_msg <- "This is my message that I'm going to forget so I want to save it for later!"
       ↑↑↑
       The arrow (pointing left) is called the assignment symbol, and saves the information on the right
     
my_msg <- "This is my message that I'm going to forget so I want to save it for later!"
↑↑↑
The name of your variable (what the arrow is pointing towards)
```

Note that in R, variable names cannot contain spaces, or begin with a number. As per the [R FAQ](https://cran.r-project.org/doc/FAQ/R-FAQ.html#What-are-valid-names_003f), a syntactically valid name:

> _... consists of letters, numbers and the dot or underline characters and starts with a letter or the dot not followed by a number. Names such as ".2way" are not valid, and neither are the reserved words_

Reserved words include: `if` `else` `repeat` `while` `function` `for` `in` `next` `break` `TRUE` `FALSE` `NULL` `Inf` `NaN` `NA` `NA_integer_` `NA_real_` `NA_complex_` `NA_character_` `...` `..1` `..2` `..3` (etc)

Be mindful that variable names should help describe the information you are saving; you should aim for variable names that are descriptive.
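
One way to check names against these rules is sketched below. It relies on base R's `make.names()`, which rewrites text into a syntactically valid name, so a candidate comes back unchanged only if it was already valid. The variable names `candidate_names` and `is_valid` are made up for this example.

```r
# Some names to test: only the first one follows the rules
candidate_names <- c("my_msg", "my msg", "2way", ".2way", "if")

# make.names() alters any name that is not valid,
# so an unchanged name means the original was valid
is_valid <- candidate_names == make.names(candidate_names)

is_valid
```

Only the first entry should come back as `TRUE`; the others contain a space, begin with a number, begin with a dot followed by a number, or are a reserved word.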

Step 4
---

Let's try using variables again, but save a number inside our variable this time. Remember, the variable is on the *left-hand side* of the `<-` assignment symbol and is the equivalent of a labelled box. The information on the *right-hand side* is what we want to store inside the variable (the box in our analogy).

**In the cell below, replace `<addNumber>` with any number you choose.** The number should not contain spaces or commas.

Then __run the code__.

In [None]:
###
# REPLACE THE <addNumber> WITH A NUMBER OF YOUR CHOICE
###
my_first_number <- <addNumber>
###

# Typing the name of the variable prints the information it stores to screen
my_first_number

# Add 1 to our variable
my_first_number + 1

# Did this calculation affect our variable? Let's check...
my_first_number

# What's the square root of our number?
sqrt(my_first_number)

What happened here?

In our real world example, we might store spare coins inside a cardboard box. We can use the money in different ways, for example, we may want to count how much money is in the box, take some money out of the box then deposit it in the bank, or add more money to the box.

Similarly, in R, when we save numbers inside a variable, we can perform various calculations on them. Above, we asked R to add one to the number stored in the `my_first_number` variable. We also asked R to calculate the square root of our variable using the function `sqrt()`.

N.B. Performing calculations on the `my_first_number` variable will not change the value it has stored. If you want to change the value of `my_first_number`, you need to use the assignment symbol `<-`. If you do not use the assignment symbol, your information/results won't be saved.
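
The difference can be seen in a short sketch, continuing the coin analogy. The variable name `coins` and its starting value are made up for this example.

```r
# Store a number in a variable
coins <- 10

# This calculation is printed, but the result is not saved
coins + 5

# The variable still holds its original value
coins

# Using the assignment symbol saves the result back into the variable
coins <- coins + 5

# Now the stored value has changed
coins
```

The first check of `coins` prints 10 and the second prints 15, because only the line containing `<-` changes what is stored.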

How does the `sqrt` function work?

```
sqrt(...)
↑↑↑
```
You are asking R to run a **function** called `sqrt`, which computes the square root of the value supplied to it. There are many functions available in R, stored within **libraries** (R's own documentation calls these packages).

```
sqrt(...)
    ↑   ↑
```
To use functions, you need to specify the name of the function, followed by round brackets (parentheses). The pieces of information you provide within the brackets are known as **arguments**. The `sqrt` function only takes one argument.
```
sqrt(my_first_number)
     ↑↑↑
```
In the example above, we supplied the `sqrt()` function a variable name `my_first_number` between the brackets, and the result, i.e. the square root of `my_first_number`, is printed to the screen.               
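
Other functions accept more than one argument, separated by commas. As a brief sketch, base R's `round()` takes the number to round and, optionally, how many decimal places to keep. The values and the variable names `root` and `rounded` are arbitrary choices for this example.

```r
# One argument: the number to take the square root of
root <- sqrt(16)
root

# Two arguments: the number to round, and the number of decimal places
rounded <- round(3.14159, digits = 2)
rounded
```

Naming an argument, as in `digits = 2`, makes it clear which piece of information is which.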
      

Step 5
---

Let's make a graph from some data. First we need to load the appropriate library to create a graph. The first line of code below loads the `ggplot2` library, which provides the graphing functions used in this exercise. At the start of (most) R programming exercises, we have to load **libraries** to help us perform tasks.

**Click on the code below, then hit the `Run` button to create a scatter plot using the `ggplot2` library. You do not need to edit any of the code.**

In [None]:
# Load the required library for plotting functions
library("ggplot2")

# Create data to plot
# N.B. Input to ggplot2 needs to be a data frame, i.e. x and y must be stored in the same variable
test_data <- data.frame(x.values = c(1, 2, 3), 
                        y.values = c(5, 4, 6))

# The following code makes a scatter plot, using our continuous x and y values specified above
ggplot(data = test_data, aes(x = x.values, y = y.values)) +
  # Specify type of plot as scatter plot
  geom_point() +
  # x-axis label
  xlab("x value") +
  # y-axis label
  ylab("y value") +
  # Title of plot
  ggtitle("My test plot using the ggplot2 library") +
  # Align title to centre
  theme(plot.title = element_text(hjust = 0.5))

If you'd like, have a play with the code:

* Change the `x.values` and `y.values` stored within the variable `test_data` and see how the graph changes. Make sure they have the same count of numbers in them (i.e. currently `x.values` and `y.values` have three numbers each).
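
To see why the counts must match, here is a small sketch. `data.frame()` stops with an error when its columns have lengths that do not fit together. `tryCatch()` is used here only so that the error message is shown as text instead of halting the cell. The variable names `good_data` and `bad_result` are made up for this example.

```r
# Three x values and three y values: this works, and gives three rows
good_data <- data.frame(x.values = c(1, 2, 3),
                        y.values = c(5, 4, 6))
nrow(good_data)

# Three x values but only two y values: data.frame() raises an error,
# and tryCatch() hands back the error message as text
bad_result <- tryCatch(
  data.frame(x.values = c(1, 2, 3),
             y.values = c(5, 4)),
  error = function(e) conditionMessage(e)
)
bad_result
```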


Step 6
---

From time to time, we will load data from text files, rather than create it ourselves. You can't see the text files in your browser because they are saved on the server running this website. We can load the files using R though. Let's load a text file, inspect it, then graph it.

#### Run the code block below to load data about chocolate bars and inspect the data. You do not need to edit the code.

In [None]:
# Load data with information about chocolate bars, and save it to a variable called 'choc_data'
choc_data <- read.delim("Data/chocolate data.txt")

# Use the function str() below to inspect the data
# str displays the structure of the data
str(choc_data)

# To view the data, use the head() function
# head returns the first part of the data
head(choc_data)

The `str` function returns a compact display of the structure of an object. It informs us of the **class** (type) of the object, and its contents. 

Running `str` on `choc_data` shows that our data is of the class `data.frame` and has 100 observations (abbreviated "obs.") of 5 variables. The name of each variable is shown after the `$` symbol:

* weight;
* cocoa_percent;
* sugar_percent;
* milk_percent;
* customer_happiness.

The `head` function, by default, returns the first six rows of our object. For our object `choc_data`, each row (horizontal) represents the information about one chocolate bar, and each column (vertical) represents a different variable. For example, the first chocolate bar:

* weighs 185 grams;
* is 65% cocoa;
* is 11% sugar;
* is 24% milk;
* and was rated 47% for customer happiness.
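
A few other base R functions report the same details individually. The sketch below uses a tiny stand-in data frame called `example_bars` (a made-up name) so that it runs without the data file. Its first row matches the chocolate bar described above; the second and third rows are invented purely for illustration and are not taken from the real data.

```r
# Stand-in data: row 1 is from the text above, rows 2 and 3 are invented
example_bars <- data.frame(
  weight             = c(185, 247, 133),
  cocoa_percent      = c(65, 44, 33),
  sugar_percent      = c(11, 34, 21),
  milk_percent       = c(24, 22, 46),
  customer_happiness = c(47, 41, 35)
)

# Number of rows (observations)
nrow(example_bars)

# Number of columns (variables)
ncol(example_bars)

# The variable names
names(example_bars)

# Only the first row, instead of the default six
head(example_bars, n = 1)

# A single column, selected with the $ symbol
example_bars$cocoa_percent
```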


Step 7
---

Let's graph features from the `choc_data` variable we saved earlier. We can show two of these features in a scatter plot, which the `ggplot2` library draws with the `geom_point` function. Let's place `customer_happiness` on the x-axis, and `cocoa_percent` on the y-axis.

### In the cell below replace:
#### 1. `<xValues>` with `customer_happiness`
#### 2. `<yValues>` with `cocoa_percent`
#### then __run the code__.

In [None]:
###
# REPLACE <xValues> WITH customer_happiness and <yValues> WITH cocoa_percent
###
ggplot(data = choc_data, aes(x = <xValues>, y = <yValues>)) +
###
  geom_point() +
  xlab("Customer happiness") +
  ylab("Cocoa percent") +
  ggtitle("Customer satisfaction with chocolate bars given cocoa percentage") +
  theme(plot.title = element_text(hjust = 0.5))

In this graph, every chocolate bar is represented by a single point. Later, we will analyse this data with AI.

Conclusion
---

__Well done!__ That's the end of programming Exercise 1.

You can now go back to the course and click __'Next Step'__ to move on to some key concepts of AI - models and error.