# R Demo for ICCWS

Welcome to the demo notebook for R. Here you will find a couple of quick examples of what R can do.


You can try out the R code in the cells below by clicking on the play button in the top left of the cells.

The graphing example in this notebook uses the open Red Wine Quality dataset, which can be found at https://archive.ics.uci.edu/ml/datasets/wine+quality





# Math

R was originally designed for statistical analysis, so it has a variety of built-in math functions. The base language is also relatively easy to learn, which makes it a good choice for beginners.

Although R is case sensitive, it is otherwise a forgiving language: extra spaces and indentation generally do not matter, and its error messages help you troubleshoot problems.
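
As a small illustration of both points, here is a sketch using made-up variable names that are not part of the workshop dataset:

```r
# Spacing and indentation do not change the result
a<-1+2
b   <-   1 +
    2
identical(a, b)  # TRUE

# Names are case sensitive: `total` and `Total` are two different variables
total <- 10
Total <- 20
total == Total   # FALSE
```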

One of the simplest but most useful functions worth knowing in R is the **summary()** function. For a numeric column it reports the minimum, first quartile, median, mean, third quartile, and maximum.

The code below runs the summary function on a column from the dataset.  Try changing the column number in the last line of code to see summaries of different columns.

1 - fixed acidity  
2 - volatile acidity  
3 - citric acid  
4 - residual sugar  
5 - chlorides  
6 - free sulfur dioxide  
7 - total sulfur dioxide  
8 - density  
9 - pH  
10 - sulphates  
11 - alcohol  
12 - quality  

In [None]:
# This code imports our dataset and puts it into the variable called dat
dat <- read.csv("https://github.com/BrockDSL/ICCWS-R-and-Python-Workshops/raw/master/winequality-red.csv")


# read.csv() already returns a data frame; this line is only a safeguard to make sure dat is one
dat <- data.frame(dat)


# This code runs the summary function on a specified column
summary(dat[,12])
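
To see where the numbers in a summary come from, the sketch below computes each one separately. The vector `x` holds made-up values standing in for a single numeric column, so the cell runs without the dataset.

```r
# Hypothetical values standing in for one numeric column of the dataset
x <- c(3, 5, 5, 6, 6, 7, 8)

# All six statistics at once
summary(x)

# The same six statistics computed one at a time
min(x)             # minimum
quantile(x, 0.25)  # first quartile
median(x)          # median
mean(x)            # mean
quantile(x, 0.75)  # third quartile
max(x)             # maximum
```

Calling `summary()` on a whole data frame, for example `summary(dat)`, produces a summary for every column at once.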

# Visualizations

R has built-in functions like **plot()**, **pie()**, and **barplot()** that can make quick visualizations from data. Below is an example of a scatterplot made with the built-in visualization functions.

In [None]:
x <- dat[,12]
y <- dat[,11]

plot(x,y, main = "Alcohol VS Wine Quality", xlab = "Quality", ylab = "Alcohol")
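
The other built-in functions work in a similar way. As a rough sketch, the cell below uses **barplot()** to count how many wines received each score. The `quality` vector here is made up for illustration; in this notebook the real scores would come from `dat[,12]`.

```r
# Hypothetical quality scores, used so the example runs on its own
quality <- c(5, 6, 5, 7, 6, 5, 4, 6, 5, 8)

# table() counts how many times each score occurs
counts <- table(quality)

# barplot() draws one bar per score, with the bar height equal to the count
barplot(counts, main = "Wines per Quality Score", xlab = "Quality", ylab = "Count")
```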

## ggplot2

On top of the built-in functions, there is a package called ggplot2 that makes visualizations in R much more flexible with only a small bump in difficulty.

Before a new package can be used in R, it needs to be installed (only once) and then loaded into your session with **library()**.

Run the cell below to set up ggplot2 in this notebook.

In [None]:
install.packages("ggplot2")
library(ggplot2)

Now that ggplot2 is installed and loaded, you can use all of the functions that come with the package.

Run the cell below to see the same visualization as before but made in ggplot2.

In [None]:
ggplot(dat, aes(x=dat[,12], y=dat[,11])) + geom_point() + ggtitle("Alcohol VS Wine Quality") +
  xlab("Quality") + ylab("Alcohol")
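
ggplot2 code is usually written with column names rather than column numbers inside **aes()**. The sketch below shows that style on a small made-up data frame called `wine`. The column names `quality` and `alcohol` are assumptions here; run `names(dat)` to check what the columns of the real dataset are called before using them. It also assumes ggplot2 has already been installed.

```r
library(ggplot2)

# Hypothetical stand-in for the wine data, with assumed column names
wine <- data.frame(
  quality = c(5, 6, 5, 7, 6, 8),
  alcohol = c(9.4, 10.5, 9.8, 11.2, 10.1, 12.0)
)

# Bare column names inside aes() are looked up in the data frame
p <- ggplot(wine, aes(x = quality, y = alcohol)) +
  geom_point() +
  labs(title = "Alcohol VS Wine Quality", x = "Quality", y = "Alcohol")

# Printing the plot object draws it
p
```

Referring to columns by name keeps the code readable and lets ggplot2 pick sensible default axis labels.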

# Dataset citation

P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis.
Modeling wine preferences by data mining from physicochemical properties. In Decision Support Systems, Elsevier, 47(4):547-553, 2009.