## Bob Ross example: R

Bob Ross left a long Air Force career to become a painter and eventually hosted a PBS show called *The Joy of Painting*, in which he paints one landscape per episode in the [wet-on-wet style](https://en.wikipedia.org/wiki/Wet-on-wet). *The Joy of Painting* ran in the 1980s and '90s, but several years ago Twitch streamed all of its episodes in a marathon. Now you can watch them on YouTube, Netflix and elsewhere.

We're going to use a dataset [jwilber](https://github.com/jwilber/Bob_Ross_Paintings) grabbed from [twoinchbrush](https://www.twoinchbrush.com), which has images, videos and color data for all of the show's paintings.

### A light intro to coding
This is not a coding tutorial, really. It's a demo to get you excited about what you can do with a little bit of coding. It will expose you to the basic method for running code in a Jupyter notebook (this document!) using simple functions that will:

- load the dataset in .csv format (think of this as a spreadsheet) using [readr](https://readr.tidyverse.org/), [dplyr](https://dplyr.tidyverse.org/) and related libraries for R
- view painting names from all seasons
- plot blobs or point clouds with colors used in a painting of your choice, or one that is randomly chosen
- generate summary tables about how often Bob used one color or the other in the Joy of Painting

### Python vs. R demos
In this demo you will use [R](https://www.r-project.org/), a widely used open-source language for scientific and statistical computing.

This demo also has a version in Python.

You'll see hardly any difference between the two versions! They are meant to work similarly. If you decide to follow the 'next_bobross' demos, these R/Python starters and the code they load give you examples to refer back to as you figure out which language(s) you like to use.

## Part 1: Avert your eyes!
This code chunk defines all the functions you'll need in the demo. Don't worry about what it means; it's here for later reference, if you want it.

### Run, then skip ahead to Part 2
Run the following code chunk by selecting it and pressing Ctrl+Enter, or by clicking the 'play' (Run) button in the toolbar at the top of this notebook.

```r
library(readr, warn.conflicts = FALSE)
library(dplyr, warn.conflicts = FALSE)
library(ggplot2, warn.conflicts = FALSE)
library(rlang, warn.conflicts = FALSE)
library(purrr, warn.conflicts = FALSE)
library(tidyr, warn.conflicts = FALSE)
library(stringr, warn.conflicts = FALSE)


# HELPERS
get_bobross <- function(){
  # read the CSV straight from GitHub; supplying col_types hides readr's column-type message
  bob <- read_csv("https://raw.githubusercontent.com/littlebuttermilk/ncssm_sessions/master/data/bob_ross_paintings.csv",
                  col_types = cols()) %>%
  # the color columns are stored as list-like text: strip brackets, quotes and
  # spaces, then split on commas to get one character vector per painting
  mutate(color_hex = str_replace_all(color_hex, "[\\[\\s\\]']", "") %>% str_split(., ","),
         # store hex codes in lowercase for consistency
         color_hex = map(color_hex, tolower),
         # for color names, also drop literal \r and \n sequences, then trim spaces
         colors = str_replace_all(colors, "[\\[\\]']|\\\\r|\\\\n", "") %>% str_split(., ","),
         colors = map(colors, str_trim))
}

get_help <- function(){
  # TODO
  # help file for students who forget usage
  # either with one argument, bare unquoted function name, to print usage for that fun
  # or no arguments to print all
}


# VIEW DATA
get_painting_names <- function(random = FALSE){
  # error rather than calling the fun as teaching moment
  if (!exists("bob", .GlobalEnv))
    stop("You haven't loaded the dataset! Run bob = get_bobross() first.")

  if (random)
    sample(bob$painting_title, 1)
  else
    bob$painting_title
}

plot_blobs <- function(painting = "Downstream View"){
  if (!exists("bob", .GlobalEnv))
    stop("You haven't loaded the dataset! Run bob = get_bobross() first.")
  if (!is.character(painting) || length(painting) != 1 || !painting %in% bob$painting_title)
    stop("painting must be the exact name of one painting, in quotation marks. Run get_painting_names() to see them all.")

  d <- filter(bob, painting_title == painting) %>%
    unnest(cols = c(colors, color_hex)) %>%
    select(colors, color_hex) %>%
    # random blob centers in the square [-8, 8] x [-8, 8]
    mutate(x = runif(n(), min = -8, max = 8),
           y = runif(n(), min = -8, max = 8))

  ggplot(d, aes(x, y, color = colors)) + geom_point(alpha = .7, size = 32) +
    # named values: each color name gets its own hex code, whatever the row order
    theme_void() + scale_color_manual(values = setNames(d$color_hex, d$colors)) +
    theme(legend.position = "none") +
    annotate('text', x = d$x, y = d$y,
             label = d$colors, size = 3, color = 'grey20') +
    xlim(-1.3*8, 1.3*8) + ylim(-1.3*8, 1.3*8) +
    ggtitle(label = painting)
}


plot_cloud <- function(painting = "Downstream View"){
  if (!exists("bob", .GlobalEnv))
    stop("You haven't loaded the dataset! Run bob = get_bobross() first.")
  if (!is.character(painting) || length(painting) != 1 || !painting %in% bob$painting_title)
    stop("painting must be the exact name of one painting, in quotation marks. Run get_painting_names() to see them all.")

  d <- filter(bob, painting_title == painting) %>%
    unnest(cols = c(colors, color_hex)) %>%
    select(colors, color_hex)

  # named palette: each color name maps to its own hex code
  color_vals <- setNames(d$color_hex, d$colors)

  # pick a random center for each color, then scatter 100 points
  # uniformly in a 3 x 3 square around that center
  d <- mutate(d,
              xc = runif(n(), min = -8, max = 8),
              yc = runif(n(), min = -8, max = 8)) %>%
    mutate(cloud = map2(xc, yc, ~ tibble(x = runif(100, min = .x-1.5, max = .x+1.5),
                                         y = runif(100, min = .y-1.5, max = .y+1.5)))
           ) %>% unnest(cols = cloud) %>%
    select(colors, color_hex, x, y)

  ggplot(d, aes(x, y, color = colors)) + geom_point(alpha = .6, size = 10) +
    theme_void() + scale_color_manual(values = color_vals) +
    theme(legend.position = "none") +
    xlim(-1.3*8, 1.3*8) + ylim(-1.3*8, 1.3*8) +
    ggtitle(label = paste0(painting, "\n", "with happy little clouds"))
}
```
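If you're curious what get_bobross() does to the color columns: its regular expressions suggest the CSV stores each painting's colors as list-like text with brackets, quotes and commas. Here is a base-R sketch of the same cleanup on one made-up cell (the input string is an assumption for illustration, not copied from the file). It uses `gsub()` and `strsplit()` instead of stringr so you can try it without extra packages.

```r
# One made-up raw cell, in the list-like format that get_bobross() cleans up
raw <- "['#4E1500', '#000000', '#FFFFFF']"

# Remove square brackets, whitespace and single quotes (perl = TRUE so that
# \s is understood inside the character class)
cleaned <- gsub("[\\[\\s\\]']", "", raw, perl = TRUE)
cleaned  # "#4E1500,#000000,#FFFFFF"

# Split on commas into a character vector, then lowercase each hex code
hex <- tolower(strsplit(cleaned, ",")[[1]])
hex
```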

## Part 2: Load the data
Run the following code chunk (select the chunk then press Ctrl+Enter).

**Here you run the get_bobross() function to**

- load the csv file
- into a 'data frame'
- called 'bob', which holds all of the data here

#### What is a data frame?
Think of it as a spreadsheet. It has rows and columns, and each cell holds some data: a word, a number, a list of words.

![](https://d33wubrfki0l68.cloudfront.net/6f1ddb544fc5c69a2478e444ab8112fb0eea23f8/91adc/images/tidy-1.png)

Image source: r4ds.had.co.nz (R for Data Science)
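To make "spreadsheet" concrete, here is a tiny data frame built by hand in base R. The titles and values are placeholders, not rows from the Bob Ross dataset; only the column names mimic bob's. Like bob's `colors` column, one column holds a list of words in each cell.

```r
# A made-up data frame: one row per painting, one column per variable
paintings <- data.frame(
  painting_title = c("Painting A", "Painting B"),
  season = c(1, 2),
  stringsAsFactors = FALSE
)

# A cell can hold a list of words; assigning a list with $ makes a list-column
paintings$colors <- list(c("Titanium White", "Phthalo Blue"),
                         "Van Dyke Brown")

paintings
nrow(paintings)  # number of rows
ncol(paintings)  # number of columns
```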

```r
bob = get_bobross()
bob
```

## Part 3: First look at the data

Functions we run from now on can access the `bob` data frame.

Running `bob = get_bobross()` by itself prints nothing, which is why the chunk above also prints `bob`. That table is big, so let's look at it in smaller pieces.

**Here you run colnames(bob) to**
- get a list of all column names in bob

**Then you run the function get_painting_names() to**

- print all painting names across all seasons of the Joy of Painting
- in other words, you are printing the column called 'painting_title' in the bob data frame

Since there are a lot of names, the notebook may display only the first and last several items, with an ellipsis in between to remind you there's more in the column than what is shown.

```r
colnames(bob)
```

```r
get_painting_names()
```
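If you'd rather not scroll through hundreds of names, base R's `head()`, `tail()` and `length()` let you peek at a few items or count them. This sketch uses a small made-up data frame in place of `bob`; with the real data you would write, for example, `head(bob$painting_title, 3)`.

```r
# A small made-up data frame standing in for bob (placeholder titles)
paintings <- data.frame(
  painting_title = paste("Painting", LETTERS[1:8]),
  season = c(1, 1, 1, 1, 2, 2, 2, 2),
  stringsAsFactors = FALSE
)

colnames(paintings)                # the column names
length(paintings$painting_title)   # how many titles there are
head(paintings$painting_title, 3)  # just the first three
tail(paintings$painting_title, 2)  # just the last two
```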

## Part 4: Functions and arguments

A function does something, then produces an output. It can take inputs, called arguments, or none at all. This is a universal concept in all coding languages, mathematics and beyond. So far we have called get_bobross() and get_painting_names() without any inputs, and colnames() with one input, bob.
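Here is what defining a function looks like in R, using two made-up functions (`count_colors()` and `say_motto()` are hypothetical, not part of the notebook's code): one takes an argument and returns a result, the other takes no arguments, like get_bobross().

```r
# Hypothetical function: its input (argument) is a vector of color names,
# and its output is how many names there are
count_colors <- function(colors) {
  length(colors)  # the last expression evaluated is the function's return value
}

# Hypothetical function with no arguments at all
say_motto <- function() {
  "happy little trees"
}

count_colors(c("Titanium White", "Phthalo Blue", "Van Dyke Brown"))
say_motto()
```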

### Arguments and defaults
Many functions can be run with or without specifying some of their inputs (aka arguments). If a function gives an argument a default value and you don't specify that argument, the function uses the default.

The function get_painting_names() in fact has one argument, called random. It tells the function whether to 

- print *all* painting names, by specifying `random = FALSE` in the parentheses
- or *pick a random one*, by specifying `random = TRUE`

The default is `random = FALSE`, as you can see from the fact that get_painting_names(), called with no arguments, gave us all names. Note that R writes these logical values in capitals, `TRUE` and `FALSE`.

**Here you run get_painting_names(random = TRUE) to**

- choose a random painting name from all possible values
- run it several times to get different results!

```r
get_painting_names(random = TRUE)
```
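Default arguments are set right in the function definition. Below is a hypothetical mini version of get_painting_names(), called `pick_titles()`, that works on a vector of placeholder titles you pass in instead of on `bob`. Because `random = FALSE` is written in the definition, leaving `random` out returns every title.

```r
titles <- c("Painting A", "Painting B", "Painting C")  # placeholder titles

# Hypothetical helper: `random` defaults to FALSE
pick_titles <- function(titles, random = FALSE) {
  if (random) sample(titles, 1) else titles
}

pick_titles(titles)                 # all three titles (default used)
pick_titles(titles, random = TRUE)  # one title chosen at random
```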

## Part 5: Make a plot of the colors used in a painting
Plotting data is a great way to look at it. 

Our dataset has the colors Bob used for each painting, so let's check it out!

**Here you run plot_blobs() to**

- fill in circles with each of the colors used
- in the painting "Downstream View"
- where the circle locations are randomly chosen (run it several times!)
- but the colors are the actual ones Bob used, as specified in our dataset

```r
plot_blobs()
```
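The idea behind plot_blobs() is to keep two things separate: where each blob goes (random) and what color it is (from the data). Here is a base-R sketch of that idea with three color names and placeholder hex codes (not the dataset's real values). ggplot2's `scale_color_manual()` accepts the same kind of named vector as its `values`.

```r
# Three color names with placeholder hex codes (not the dataset's real values)
colors    <- c("Titanium White", "Alizarin Crimson", "Phthalo Blue")
color_hex <- c("#ffffff", "#aa0000", "#0000aa")

# Random blob centers in the square [-8, 8] x [-8, 8]: these change every run
blobs <- data.frame(
  colors = colors,
  x = runif(length(colors), min = -8, max = 8),
  y = runif(length(colors), min = -8, max = 8),
  stringsAsFactors = FALSE
)

# A named vector looks up each color's hex code by name,
# so a blob's color never depends on where it lands
palette <- setNames(color_hex, colors)
palette[blobs$colors]
```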

### Downstream View, from twoinchbrush.com

![](https://www.twoinchbrush.com/images/painting329.png)

### Blobs for different paintings
plot_blobs takes one argument, the name of the painting whose colors you want to plot. "Downstream View" is the default painting to use if you specify no arguments. Let's change it up.


**Here you run plot_blobs(painting = "Through the Window") to**

- plot color blobs as above, but for the specified painting instead of the default

```r
plot_blobs(painting = "Through the Window")
```

### arguments without the `=`
If there is no confusion about which argument you are trying to specify, you can just put the argument value without the `argument name = argument value` syntax. This makes your code a little more concise and easier to read.

Since plot_blobs has only one argument, there is no confusion (R matches unnamed arguments to the function's arguments by position), and you can run

```r
plot_blobs("Through the Window")
```
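With more than one argument, unnamed values are matched in the order the function lists its arguments, while named ones can go in any order. A small sketch with a hypothetical two-argument function, `describe()`, which is not part of the notebook's code:

```r
# Hypothetical two-argument function, for illustration only
describe <- function(painting, season) {
  paste0(painting, " (season ", season, ")")
}

a <- describe(painting = "Painting A", season = 3)  # both named
b <- describe("Painting A", 3)                      # matched by position
c_named <- describe(season = 3, painting = "Painting A")  # named, any order
a
```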

### Through the Window

![](https://www.twoinchbrush.com/images/painting392.png)

## Part 6: So much plotting you can do

ggplot2, and R in general, gives you a *huge* amount of flexibility in the plots you can create from data. Here, we just *take one small step* away from plotting blobs to plotting point clouds with the plot_cloud function. You use it just like the plot_blobs function.

**Here you run plot_cloud("Through the Window") to**

- generate a random point cloud for each color used in "Through the Window"
- plot it, with the real color value as in plot_blobs

```r
plot_cloud("Through the Window")
```
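plot_cloud() replaces each blob with a cloud: it picks a random center for each color, then scatters 100 points uniformly in a square reaching 1.5 units on either side of that center. Here is a base-R sketch of one cloud, using a hypothetical helper `make_cloud()` that is not part of the notebook's code.

```r
# Hypothetical helper: n points spread uniformly in a square of half-width
# `spread` around the center (xc, yc)
make_cloud <- function(xc, yc, n = 100, spread = 1.5) {
  data.frame(x = runif(n, min = xc - spread, max = xc + spread),
             y = runif(n, min = yc - spread, max = yc + spread))
}

cloud <- make_cloud(2, -3)
range(cloud$x)  # somewhere inside 0.5 to 3.5
range(cloud$y)  # somewhere inside -4.5 to -1.5
```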

Want another painting to try? Pick one at random:

```r
get_painting_names(TRUE)
```

## Part 7: Output of one, input of another

Each function has some output, and usually the function *returns* an object that can be used elsewhere in your code.

For example, we saw the get_bobross() function returns a data frame, which we called bob by writing the code `bob = get_bobross()`.

Similarly, we saw `get_painting_names(TRUE)` returns a single painting name as a character string (text in quotation marks). Since this is exactly the kind of argument our plotting functions need, we can use the output of get_painting_names as the argument in plot_cloud!

**Here you run plot_cloud(get_painting_names(TRUE)) to**

- first randomly choose a painting name (using the call inside the outer parentheses)
- and use it as the argument for plot_cloud
- to show color point clouds for that random painting

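```r
# each call to get_painting_names(TRUE) picks its own random painting,
# so these two plots will usually show different paintings
plot_cloud(get_painting_names(TRUE))
plot_blobs(get_painting_names(TRUE))
```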
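The same nesting works with any functions. In this base-R sketch with placeholder titles, the inner call `sample(titles, 1)` runs first and returns one title, and that output becomes the input of the outer call, `toupper()`.

```r
titles <- c("Painting A", "Painting B", "Painting C")  # placeholder titles

# Inner call first, then its output is passed to the outer call
loud_title <- toupper(sample(titles, 1))
loud_title
```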

### or save the painting name for multiple uses
Using the `=` statement just like with `bob = get_bobross()` we can assign a randomly chosen painting name to an object for later use. Let's call it `rpaint`. This way we can plot blobs and clouds for the same painting. 

**Here you run rpaint = get_painting_names(TRUE) to**

- select a random painting name
- store it as an object called rpaint

**then plot_cloud(rpaint), plot_blobs(rpaint) to**

- show blobs and clouds for the random painting chosen

```r
rpaint = get_painting_names(TRUE)
rpaint
```

```r
plot_cloud(rpaint)
```

```r
plot_blobs(rpaint)
```
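A note on style: most R code assigns with `<-` rather than `=`; both work when used at the top level like this. The sketch below (placeholder titles again) shows that once a value is stored, every use of the name sees the same value, whereas each fresh `sample()` call picks again.

```r
titles <- c("Painting A", "Painting B", "Painting C")  # placeholder titles

rpaint <- sample(titles, 1)  # `<-` is the assignment style most R code uses
rpaint_too = rpaint          # `=` also assigns when used at the top level

identical(rpaint, rpaint_too)  # TRUE: both names hold the same stored title
sample(titles, 1)              # a new call picks again and may differ
```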

### TODO FOR R: Part 8: Summarize the data

Here we will just *dip a toe* into the vast pool of options for summarizing data with dplyr in R. This section is not intended to show you *how* to do that, which requires more coding than we can cover, but to give you an idea of what can be done with relatively little experience or work.

We will answer two questions...

#### How many times each season did Bob use a given color?

#### How many times did Bob use a given color, across all seasons?

### How many times each season?

The function colors_byseason by default returns the total number of times Bob used each color, for each television season.

**Here you run colors_byseason() to**

- return a data frame with the number of times Bob used a color, for each color and season

```r
#colors_byseason()
```
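The R version of colors_byseason() isn't in this notebook yet, which is why the chunk above is commented out. Here is one hypothetical way it could be written in base R; it is not the Python version's implementation, it leaves out the `stat` argument, and it assumes the data has a `season` column and a list-column `colors` like bob's. It runs on made-up toy data so you can try it without the real dataset.

```r
# Hypothetical sketch of colors_byseason() in base R (not the Python version).
# Assumes `data` has a `season` column and a list-column `colors`, like bob.
colors_byseason <- function(color = "all", data = bob) {
  # Long format: one row per (painting, color) pair, so each painting's
  # season is repeated once for every color that painting used
  long <- data.frame(
    season = rep(data$season, lengths(data$colors)),
    color  = unlist(data$colors),
    stringsAsFactors = FALSE
  )
  if (color != "all") {
    # Exact matching: the name must be spelled just as it is in the data
    if (!color %in% long$color)
      stop("'", color, "' is not a color name in the dataset.")
    long <- long[long$color == color, ]
  }
  # Cross-tabulate: one row per season, one column per color
  table(season = long$season, color = long$color)
}

# Toy data with made-up seasons and colors, standing in for bob
toy <- data.frame(season = c(1, 1, 2))
toy$colors <- list(c("Alizarin Crimson", "Titanium White"),
                   "Titanium White",
                   c("Alizarin Crimson", "Phthalo Blue"))

colors_byseason(data = toy)
colors_byseason("Alizarin Crimson", data = toy)
```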

### Too many columns to look at
Looking at big tables can be annoying. Let's look at this information for just *one color*.

The colors_byseason() function has an argument called `color`. Specify a color name to see the output for that color.

**Here you run colors_byseason(color = "Alizarin Crimson") to**

- return a data frame showing the number of times Bob used that color per season

**Important!**

Notice I wrote "Alizarin Crimson", exactly as the color appears in the dataset's colors column. Typing "alizarin crimson", "Alizarin_Crimson" or anything else that is not *exactly* a color name as written in the data will return an error.

This is an important lesson in coding: You need to specify arguments exactly as the function requires them to be. If you wanted to allow variations of the same name, such as "alizarin crimson" or "Alizarin_Crimson", you would need to modify the function to interpret that input correctly!

```r
#colors_byseason("Alizarin Crimson")
```
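Accepting variations like "alizarin crimson" or "Alizarin_Crimson" means changing how the function reads its input. One possible approach, sketched with a hypothetical helper `match_color()`, is to normalize both the input and the known names (lowercase, underscores turned into spaces, outer spaces trimmed) before comparing, then return the official spelling.

```r
# Hypothetical helper: find the official color name for a loosely typed one
match_color <- function(input, known_colors) {
  normalize <- function(x) tolower(trimws(gsub("_", " ", x)))
  hit <- known_colors[normalize(known_colors) == normalize(input)]
  if (length(hit) == 0)
    stop("'", input, "' does not match any color name.")
  hit[1]
}

known <- c("Alizarin Crimson", "Titanium White", "Phthalo Blue")
match_color("alizarin crimson", known)
match_color("Alizarin_Crimson", known)
```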

#### Example of the error from mis-specifying your argument

Whoops... See what went wrong in the error message. This message was written specially for this tutorial. R's built-in error messages are often helpful, but just as often they are not, and they can come with a bunch of information you might not find useful, so learn to look for the relevant bits.

```r
#colors_byseason(color = "Alizarin")
```

### How many times across all seasons?

This one is easier.

**Here you run colors_show() to**

- view the total number of times Bob used a color
- across all seasons

**and colors_show("Black Gesso") to**

- view the total number of times Bob used that color across all seasons

```r
#colors_show()
```

```r
#colors_show("Black Gesso")
```
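colors_show() isn't in the R notebook yet either. Here is a hypothetical base-R sketch of it, again not the Python version's implementation, assuming the data has a list-column `colors` like bob's. It flattens every painting's colors into one vector and counts each name, and it runs on made-up toy data.

```r
# Hypothetical sketch of colors_show() in base R (not the Python version).
# Assumes `data` has a list-column `colors`, like bob.
colors_show <- function(color = "all", data = bob) {
  # Flatten all paintings' colors into one vector, count each name, most used first
  counts <- sort(table(unlist(data$colors)), decreasing = TRUE)
  if (color == "all")
    return(counts)
  if (!color %in% names(counts))
    stop("'", color, "' is not a color name in the dataset.")
  as.integer(counts[color])
}

# Toy data with made-up colors, standing in for bob
toy <- data.frame(season = c(1, 1, 2))
toy$colors <- list(c("Alizarin Crimson", "Titanium White"),
                   "Titanium White",
                   c("Black Gesso", "Titanium White"))

colors_show(data = toy)
colors_show("Black Gesso", data = toy)
```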

### An exercise for you

Actually, the function colors_byseason has two arguments!

- `color`, which can be set to "all" (the default) or a specific color name, as you saw above
- `stat`, which can be set to "total" (the default) to give the total number of times a color was used each season
- or set `stat` to "mean" to give the *average* number of times a color was used each season

We might want the average rather than the total because averages are easier to compare across groups of different sizes.

**Use the syntax `argument name = argument value` to give two arguments to colors_byseason and show the average number of times a color was used each season, for one color or for all of them.**

To specify multiple arguments for a function, use a comma between them inside the parentheses. 
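To see the comma syntax without giving the exercise away, here it is with base R's `seq()`, which takes several arguments (`from`, `to` and `by` are its real argument names):

```r
# Several arguments, separated by commas
seq(from = 1, to = 10, by = 3)  # 1 4 7 10
seq(1, 10, 3)                   # same values, matched by position
seq(by = 3, from = 1, to = 10)  # named arguments can go in any order
```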

## TODO FOR R: On to the next level!

See `next_bobross_R.ipynb`.