# The Coolest Characters
## Chapter 4.1-4.2 Explaining Variation

In [None]:
# This code will load the R packages we will use
suppressPackageStartupMessages({
    library(coursekata)
})

# set styles
css <- suppressWarnings(readLines("https://raw.githubusercontent.com/jimstigler/jupyter/master/ck_jupyter_styles.css"))
IRdisplay::display_html(sprintf('<style>%s</style>', paste(css, collapse = "\n")))


# This code will make sure the middle rows/columns don't get cut out (ellipsized) when you 
# print out a really large data frame (note: you can adjust the values for max rows/cols)
options(repr.matrix.max.rows=1000, repr.matrix.max.cols=100)

# Load the data frame
characters <- read.csv("https://docs.google.com/spreadsheets/d/e/2PACX-1vQk_n4m-VBCD7CtcpB1kOsiNDLrPmEOEtlOoaKwDhogE_YeGEW5PYTaOtZaqypEgHRFGWsZ0pdYvt_A/pub?gid=0&single=true&output=csv")


## Our Favorite Fictional Characters

<img src="https://i.postimg.cc/s34kFsZr/xcd-03-B-fictional-chars.png" alt="A collage of the faces of various fictional characters" width = 30%>

There are many popular fictional universes, each with its own set of characters, and those characters span a wide range of personality types.

Who are some of your favorite fictional characters (i.e., from any of your favorite books, movies, TV shows, or games)? What would you say are the prominent personality traits of each character?

Would you classify any of the characters on your list or in the image above as cool or dorky? Why are some characters considered "cooler" than others?

Today we'll explore whether we can explain some of the variation in character "coolness".


## 1. What traits are associated with being "cool"?


### The Dataset

**Description:** The `characters` data frame contains characters from various fictional universes. More than [3 million volunteers from the internet](https://openpsychometrics.org/tests/characters/) rated these characters on various traits using a sliding scale. For example, the image below shows the character Mushu (from Disney's Mulan) being rated on a scale from 0 (rude) to 100 (respectful).

<img src="https://i.postimg.cc/tXVg4SjZ/rating-characters.png" alt="example of how people rated a character with a slider" width = 40%>

##### Variable Descriptions

- `char_id` The character ID.
- `char_name` The character's name.	
- `uni_id` The universe ID for the book, game, movie, or TV show.
- `uni_name` The universe name of the book, game, movie, or TV show.
- `gender` The gender of the character (M=Male, F=Female, NB=NonBinary).
- `abstract` The average rating of how abstract (vs concrete) the character is on a scale of 0-100 (0-concrete, 100-abstract).
- `agreeable` The average rating of how agreeable (vs stubborn) the character is on a scale of 0-100 (0-stubborn, 100-agreeable).	
- `anxious` The average rating of how anxious (vs calm) the character is on a scale of 0-100 (0-calm, 100-anxious).
- `attractive` The average rating of how attractive (vs repulsive) the character is on a scale of 0-100 (0-repulsive, 100-attractive).	
- `beautiful` The average rating of how beautiful (vs ugly) the character is on a scale of 0-100 (0-ugly, 100-beautiful).	
- `chaotic` The average rating of how chaotic (vs orderly) the character is on a scale of 0-100 (0-orderly, 100-chaotic).
- `chill` The average rating of how chill (vs offended) the character is on a scale of 0-100 (0-offended, 100-chill).	
- `cool` The average rating of how cool (vs dorky) the character is on a scale of 0-100 (0-dorky, 100-cool).	
- `decisive` The average rating of how decisive (vs hesitant) the character is on a scale of 0-100 (0-hesitant, 100-decisive).	
- `emotional` The average rating of how emotional (vs unemotional) the character is on a scale of 0-100 (0-unemotional, 100-emotional).	
- `extrovert` The average rating of how extroverted (vs introverted) the character is on a scale of 0-100 (0-introvert, 100-extrovert).	
- `feminine` The average rating of how feminine (vs masculine) the character is on a scale of 0-100 (0-masculine, 100-feminine).	
- `future_focused` The average rating of how future-focused (vs present-focused) the character is on a scale of 0-100 (0-present-focused, 100-future-focused).	
- `loveable` The average rating of how loveable (vs punchable) the character is on a scale of 0-100 (0-punchable, 100-loveable).
- `messy` The average rating of how messy (vs neat) the character is on a scale of 0-100 (0-neat, 100-messy).		
- `moody` The average rating of how moody (vs stable) the character is on a scale of 0-100 (0-stable, 100-moody).		
- `open_minded` The average rating of how open-minded (vs close-minded) the character is on a scale of 0-100 (0-close-minded, 100-open-minded).
- `reasoned` The average rating of how reasoned (vs instinctual) the character is on a scale of 0-100 (0-instinctual, 100-reasoned).
- `respectful` The average rating of how respectful (vs rude) the character is on a scale of 0-100 (0-rude, 100-respectful).
- `self_assured` The average rating of how self-assured (vs self-conscious) the character is on a scale of 0-100 (0-self-conscious, 100-self-assured).
- `self_disciplined` The average rating of how self-disciplined (vs disorganized) the character is on a scale of 0-100 (0-disorganized, 100-self-disciplined).	
- `tall` The average rating of how tall (vs short) the character is on a scale of 0-100 (0-short, 100-tall).	
- `trusting` The average rating of how trusting (vs suspicious) the character is on a scale of 0-100 (0-suspicious, 100-trusting).


##### Data Source: 

Originally collected at [Open Psychometrics](https://openpsychometrics.org/tests/characters/) and made available by Tanya Shapiro as a [Tidy Tuesday data set](https://github.com/rfordatascience/tidytuesday/tree/master/data/2022/2022-08-16).

### 1.1 Take a look at the `characters` data frame.


In [None]:
# 1.1
# Take a look at the data frame



### 1.2 Which trait do you think will help predict coolness? Write your hypothesis as a word equation.

Word equation format: 

outcome = explanatory + other stuff 
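
As a rough illustration of how a word equation maps onto R, the sketch below writes one example hypothesis as an R formula, the same `outcome ~ explanatory` form that functions like `gf_point()` expect. The choice of `self_assured` here is only an example, not a suggested answer, and the names `hypothesis`, `outcome`, and `explanatory` are made up for this sketch.

```r
# Example word equation: cool = self_assured + other stuff
# In R, the outcome goes on the left of the tilde and the explanatory
# variable on the right. "Other stuff" is not written in the formula;
# it is whatever variation the explanatory variable leaves unexplained.
hypothesis <- cool ~ self_assured

# all.vars() lists the variable names in a formula, left side first
outcome     <- all.vars(hypothesis)[1]
explanatory <- all.vars(hypothesis)[2]

cat("outcome:", outcome, "\n")
cat("explanatory:", explanatory, "\n")
```

Swapping `self_assured` for a different column name from `characters` gives the formula for a different hypothesis.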

1.2 Response:

Modify the word equation to fit your hypothesis:

cool = x + other stuff

## 2. Explaining Variation in `cool`

### 2.1 Create a scatterplot to explore your hypothesis.

Feel free to play around with arguments and chain markers such as color, fill, and labels!

In [None]:
# 2.1
# Modify to fit your hypothesis:

gf_point(y ~ x, data = characters)

### 2.2 Does it look like your explanatory variable helps us explain variation in `cool`?

In other words, does knowing that information about the character help you make a better prediction about how cool or dorky they are rated? What are you seeing in the graph that made you come to this conclusion?

2.2 Response:

Does it help explain variation? How can you tell?




### 2.3 Run the code below to compare a few more hypotheses. Which hypothesis (including yours) explains the *most* variation in `cool`, and how can you tell?

In [None]:
# 2.3
# Run this code

# cool = open_minded + other stuff
gf_point(cool ~ open_minded, data = characters)

# cool = anxious + other stuff
gf_point(cool ~ anxious, data = characters) 

# cool = self_assured + other stuff
gf_point(cool ~ self_assured, data = characters) 

# cool = cool + other stuff
gf_point(cool ~ cool, data = characters)

2.3 Response:






### 2.4 Which hypothesis (including yours) explains the *least* variation in `cool`, and how can you tell?

In other words, which one is mostly "other stuff"?

2.4 Response:



### 2.5 Why does the last graph look like that? Is it perfectly explaining all of the variation in `cool`?

2.5 Response:



## 3. Do I need to adjust my prediction?

Connecting predictions and explained variation.

### 3.1 Run the code below. Then modify it to make a prediction for each "slice" of the graph.

We have roughly cut the graph up into low, medium, and high `self_assured` ratings. For each slice, what would you predict the coolness rating would be for the characters with that range of self-assuredness?
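
If it helps to see the "slice" idea in numbers, here is a minimal sketch using simulated data rather than the real `characters` data frame. The data frame `sim` and its columns `trait` and `outcome` are invented for illustration, and using the mean within each slice as the prediction is one reasonable choice, not the only one.

```r
# Simulated stand-in data (values are made up for illustration only)
set.seed(42)
sim <- data.frame(trait = runif(300, min = 0, max = 100))
sim$outcome <- 30 + 0.4 * sim$trait + rnorm(300, mean = 0, sd = 12)

# Cut the explanatory variable into three slices at 33 and 66,
# matching the vertical lines drawn on the scatterplot
sim$slice <- cut(sim$trait,
                 breaks = c(0, 33, 66, 100),
                 labels = c("low", "medium", "high"),
                 include.lowest = TRUE)

# One possible prediction per slice: the mean outcome within that slice
slice_means <- tapply(sim$outcome, sim$slice, mean)
print(round(slice_means, 1))
```

Because the simulated outcome was built to rise with `trait`, the three slice means differ from one another. Your own predictions for the real data come from looking at the scatterplot.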

In [None]:
# 3.1
# Run the code to see what it does (ok to ignore warning msg)
# then change the y-values (currently set to zero) to your predictions

# cool = self_assured + other stuff
gf_point(cool ~ self_assured, data = characters) %>%
  gf_vline(xintercept = 33) %>%
  gf_vline(xintercept = 66) %>%
  # adjust predictions below
  gf_point(0 ~ 20, color = "red") %>%
  gf_point(0 ~ 50, color = "red") %>%
  gf_point(0 ~ 75, color = "red")


### 3.2 Run the code below. Then modify it to make a prediction for each "slice" of the graph.

Like the previous plot, we have roughly cut the graph up into low, medium, and high `open_minded` ratings. For each slice, what would you predict the coolness rating would be for the characters with that range of open-mindedness?

In [None]:
# 3.2
# Run the code
# then change the y-values (currently set to zero) to your predictions

# cool = open_minded + other stuff
gf_point(cool ~ open_minded, data = characters) %>%
  gf_vline(xintercept = 33) %>%
  gf_vline(xintercept = 66) %>%
  # adjust predictions below
  gf_point(0 ~ 20, color = "red") %>%
  gf_point(0 ~ 50, color = "red") %>%
  gf_point(0 ~ 75, color = "red")


### 3.3 For which hypothesis did your predictions change the *least* from slice to slice? Does this suggest it explains more or less variation than the other hypothesis? Why?

Another way to think about this: does knowing that information help us reduce the error in our predictions? Put differently, does one explanatory variable help us make a better prediction of coolness than the other?
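
One way to put numbers on this question is sketched below with simulated data, not the real ratings. For two made-up explanatory variables (`trait_a` and `trait_b`), it computes how much the slice predictions change and how far off those predictions are on average, so the two quantities can be compared side by side. The helper `slice_summary()` and the use of mean absolute error are assumptions made for this sketch.

```r
# Simulated stand-in data (values are made up for illustration only)
set.seed(7)
n <- 400
sim <- data.frame(
  trait_a = runif(n, 0, 100),  # built to be related to the outcome
  trait_b = runif(n, 0, 100)   # built to be unrelated to the outcome
)
sim$outcome <- 25 + 0.5 * sim$trait_a + rnorm(n, sd = 10)

# Hypothetical helper: slice predictions and typical error for one variable
slice_summary <- function(x, y) {
  slice <- cut(x, breaks = c(0, 33, 66, 100),
               labels = c("low", "medium", "high"),
               include.lowest = TRUE)
  predictions <- tapply(y, slice, mean)  # one prediction per slice
  errors <- y - ave(y, slice)            # each point minus its slice's prediction
  list(predictions   = round(predictions, 1),
       change        = diff(range(predictions)),  # largest gap between predictions
       typical_error = mean(abs(errors)))         # mean absolute error
}

summary_a <- slice_summary(sim$trait_a, sim$outcome)
summary_b <- slice_summary(sim$trait_b, sim$outcome)

# Baseline: predict the overall mean for every point, ignoring both traits
baseline_error <- mean(abs(sim$outcome - mean(sim$outcome)))

print(summary_a)
print(summary_b)
print(baseline_error)
```

Comparing `change` and `typical_error` across the two summaries, and against `baseline_error`, is one way to check the reasoning you wrote for this question.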

3.3 Response: 
