# Scatter Plots

**NOTE: This is an R Notebook**

In this lab, we use the **ggplot2** library to render scatter plots and to explore data visually through different _aesthetics_.


Reference Site: http://zevross.com/blog/2014/08/04/beautiful-plotting-in-r-a-ggplot2-cheatsheet-3/
 
  * [Local Mirror](http://indigo.sgn.missouri.edu/static/mirror_sites/zevross.com/blog/2014/08/04/beautiful-plotting-in-r-a-ggplot2-cheatsheet-3/)


**Current ggplot2 Reference**
  * http://docs.ggplot2.org/current/
  * [Local Mirror](https://indigo.sgn.missouri.edu/static/mirror_sites/docs.ggplot2.org/current/)
 

Our data set is Bike Share data, split across two files:
  * datasets/bikeshare/day.csv
  * datasets/bikeshare/hour.csv


## Data Summary

#### Creator

```
Hadi Fanaee-T

Laboratory of Artificial Intelligence and Decision Support (LIAAD), University of Porto
INESC Porto, Campus da FEUP
Rua Dr. Roberto Frias, 378
4200 - 465 Porto, Portugal
```

#### Background 


Bike sharing systems are a new generation of traditional bike 
rentals where the whole process, from membership to rental and return, 
has become automatic. 
Through these systems, a user can easily rent a bike from one location and return 
it at another.
Currently, there are over 500 bike-sharing programs around the world, comprising 
over 500 thousand bicycles. 
Today, there is great interest in these systems due to their important role in traffic, 
environmental, and health issues. 

Apart from the interesting real-world applications of bike sharing systems, 
the characteristics of the data generated by
these systems make them attractive for research. 
Unlike other transport services such as bus or subway, the duration
of travel and the departure and arrival positions are explicitly recorded in these systems. 
This feature turns a bike sharing system into
a virtual sensor network that can be used for sensing mobility in the city. 
Hence, it is expected that most of the important
events in the city could be detected by monitoring these data.


#### Data Set

Bike-sharing rental processes are highly correlated with environmental and seasonal settings. 
For instance, weather conditions,
precipitation, day of week, season, hour of the day, etc. can affect rental behavior. 
The core data set is 
the two-year historical log for the years 2011 and 2012 from the Capital Bikeshare system, Washington D.C., USA, which is 
publicly available at http://capitalbikeshare.com/system-data. 
We aggregated the data on an hourly and a daily basis and then 
extracted and added the corresponding weather and seasonal information. 
Weather information was extracted from http://www.freemeteo.com. 

#### Associated tasks

  * Regression: 
    * Prediction of the hourly or daily bike rental count based on the environmental and seasonal settings.
  * Event and Anomaly Detection:  
    * The count of rented bikes is also correlated with some events in the town, which are easily traceable via search engines.
    * For instance, a query like "2012-10-30 washington d.c." in Google returns results related to Hurricane Sandy. Some of the important events are identified in [1]. Therefore, the data can also be used to validate anomaly or event detection algorithms.


In [None]:
# Read in the Day CSV file, look at the head
BS_day = read.csv("/dsa/data/all_datasets/bikeshare/day.csv",header=TRUE,sep=",")
head(BS_day)

In [None]:
# Another look, including data type
str(BS_day)
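
To see what `read.csv()`, `head()`, and `str()` report without depending on the lab's file path, here is a minimal sketch that parses a few rows from an inline string. The values are invented for illustration; only the column names `weekday`, `temp`, and `cnt` are borrowed from the plots in this lab, and `toy_day` is a hypothetical name.

```r
# Invented rows for illustration only; these are not values from day.csv
csv_text <- "weekday,temp,cnt
6,0.25,1000
0,0.50,1500
1,0.75,2000"

# read.csv() can parse inline text through its `text` argument
toy_day <- read.csv(text = csv_text, header = TRUE, sep = ",")

head(toy_day, 2)   # show the first two rows
str(toy_day)       # show column names, types, and a preview of the values
```

The `str()` output is the quickest place to check whether a column was read as a number or as text, which matters once variables are mapped to color.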

In [None]:
# Import the library for Grammar of Graphics Plotting
library(ggplot2)

# Setup a graphic that has the (B)ike (S)hare data
#      # The data set
#      #   # Add an aesthetic
#      #   #   # Plot the temp versus the count
ggplot(BS_day, aes(temp, cnt)) + geom_point() 
                             # Add points rendering
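
The call above passes `temp` and `cnt` to `aes()` by position, so the first becomes the x axis and the second the y axis. A minimal sketch of the same kind of plot with the arguments named, using a small invented data frame (`toy_day` is a hypothetical stand-in for `BS_day`):

```r
library(ggplot2)

# Hypothetical stand-in for BS_day; the values are invented for illustration
toy_day <- data.frame(
  temp = c(0.2, 0.4, 0.6, 0.8),
  cnt  = c(1200, 3100, 5400, 6000)
)

# aes(temp, cnt) and aes(x = temp, y = cnt) describe the same mapping
p_named <- ggplot(toy_day, aes(x = temp, y = cnt)) + geom_point()
p_named
```

Naming the arguments is optional, but it can make a plot easier to read once more aesthetics are added.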

## <span style="background:yellow">YOUR TURN</span>

Explore visualization of two additional pairs of variables.
Refer to the `str()` output above for the variable names.

#### 1)

In [None]:
# Add your code below this comment
# ----------------------------------








#### 2)

In [None]:
# Add your code below this comment
# ----------------------------------








We are going to add a few more tricks:
  * Variable Assignment
  * Colors
  * Transparency (alpha)

In [None]:
# Import the library for Grammar of Graphics Plotting
library(ggplot2)


    # Setup a graphic that has the (B)ike (S)hare data
    #       # The data set
    #       #   # Add an aesthetic
    #       #   #   # Plot the temp versus the count
p <- ggplot(BS_day, aes(temp, cnt))
# p is a variable holding the plot... it will not be rendered.

   #   assign to p the result of p + points
   #   # Add points rendering
p <- p + geom_point(color="mediumpurple1",alpha = 0.2)

**NOTE:** <span style="background:yellow">Nothing rendered.</span>

Everything is being held in `p`.

In [None]:
p  # Evaluating p on its own renders the stored plot

We get a general idea of the data for these two variables. 
**NOTE:** `aes()` supplies the positional information, that is, which variable goes on each axis.
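
Typing `p` on its own line works because R auto-prints the value of a top-level expression. Inside a function or a loop there is no auto-printing, so the plot has to be printed explicitly. A minimal sketch, using invented data and a hypothetical helper named `draw_scatter`:

```r
library(ggplot2)

# Invented values for illustration; toy_day stands in for BS_day
toy_day <- data.frame(
  temp = c(0.2, 0.4, 0.6, 0.8),
  cnt  = c(1200, 3100, 5400, 6000)
)

# Hypothetical helper: builds the plot in steps, then renders it explicitly
draw_scatter <- function(df) {
  p <- ggplot(df, aes(temp, cnt))
  p <- p + geom_point(color = "mediumpurple1", alpha = 0.2)
  print(p)       # needed here; a bare `p` on this line would draw nothing
  invisible(p)   # hand the plot object back without printing it again
}

p_toy <- draw_scatter(toy_day)
```

Returning the plot invisibly keeps the object available for further `+` layers without rendering it a second time.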


## <span style="background:yellow">YOUR TURN</span>

Choose the most cluttered visualization you have above, then copy, paste, and edit its code in the cells as specified.


#### 1) Experiment until you find your personal preference for color.
  * See a list of [colors in R](https://indigo.sgn.missouri.edu/static/PDF/Rcolor.pdf)

In [None]:
# Add your code below this comment
# ----------------------------------







#### 2) Experiment with different alpha settings for this plot.

In [None]:
# Add your code below this comment
# ----------------------------------







We have alternative color techniques as well.

### Rendering variables as a color
Let's add one more variable, rendered as color.

In [None]:
# Import the library for Grammar of Graphics Plotting
library(ggplot2)

# Setup a graphic that has the (B)ike (S)hare data
#      # The data set
#      #   # Add an aesthetic
#      #   #   # Plot the temp versus the count
ggplot(BS_day, aes(temp, cnt, color=weekday)) + geom_point() 
                          #                  # Add points rendering
                          # Set the color here  


Note that `weekday` is coded as an ordinal variable from 0 to 6.  
Because it is stored as a number, ggplot2 uses a continuous color gradient to render it by default.

Look at colorings:  
https://indigo.sgn.missouri.edu/static/mirror_sites/zevross.com/blog/2014/08/04/beautiful-plotting-in-r-a-ggplot2-cheatsheet-3/#working-with-colors



In [None]:
# Import the library for Grammar of Graphics Plotting
library(ggplot2)

# Setup a graphic that has the (B)ike (S)hare data
#      # The data set
#      #   # Add an aesthetic
#      #   #   # Plot the temp versus the count
ggplot(BS_day, aes(temp, cnt, color=factor(weekday))) + geom_point() 
                          #      #                # Add points rendering
                          #      # tell GGPlot this is a factor, AKA nominal
                          # Set the color here  


Notice the automagical change GGPlot applied, based on the information that weekday was a factor.
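
What `factor()` changes is the type of the values, which can be checked in base R without plotting anything. A minimal sketch on an invented vector of weekday codes (the vector and the name `weekday_f` are illustrative only):

```r
# Invented weekday codes using the same 0-6 coding
weekday <- c(0, 3, 6, 3, 1)

is.numeric(weekday)            # TRUE: plain numbers

weekday_f <- factor(weekday)   # convert to a categorical (nominal) type

is.numeric(weekday_f)          # FALSE: now a set of discrete categories
levels(weekday_f)              # "0" "1" "3" "6"
nlevels(weekday_f)             # 4 distinct categories in this toy vector
```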

## <span style="background:yellow">YOUR TURN</span>

#### 1) Answer the following question, without using Google or other references.


We have seen that we can specify the color as part of the aesthetic as well as a modifier to the point.  
**Why is the `factor()` visualization allowed in the `aes()` and not the `geom_point()`?**


Just a verbal hypothesis.

#### 2) Render a plot with three appropriate variables, choosing one of the variables as a factor or gradient color.

In [None]:
# Add your code below this comment
# ----------------------------------







## <span style="background:yellow">YOUR TURN</span>

In each cell of this Your Turn you will consult reference documentation, tutorials, and the Internet to add the specified chart element.
The base chart should be your favorite variable pairing used previously.

**References:**  
  * https://indigo.sgn.missouri.edu/static/mirror_sites/docs.ggplot2.org/current/
  * http://indigo.sgn.missouri.edu/static/mirror_sites/zevross.com/blog/2014/08/04/beautiful-plotting-in-r-a-ggplot2-cheatsheet-3/

#### 1) Chart Title

In [None]:
# Add your code below this comment
# ----------------------------------







#### 2) Label the X and Y axes with better, full words.

In [None]:
# Add your code below this comment
# ----------------------------------







#### 3) Change the background color of the plot.

In [None]:
# Add your code below this comment
# ----------------------------------







# SAVE YOUR NOTEBOOK   -- Then "Close & Halt" the notebook.