# 9.11: What makes a character loveable?

In [None]:
# Run this code first or nothing else will work!

# This code loads the R packages we will use
library(coursekata)

# This allows our jupyter notebook to print out data frames with a lot of variables (e.g., 40)
options(repr.matrix.max.cols=40)


## 1.0: Explore the Data Frame

Let's get familiar with the larger data set we will be working with. Run the code below to take a look at a few characters (one character is represented in each row). 

**1.1, Think Aloud:** Note anything interesting to you, or that you are curious about, or anything you think we might want to know about the data frame, the characters, or the variables.

In [None]:
# Loads the data frame from a google sheet
characters <- read.csv("https://docs.google.com/spreadsheets/d/e/2PACX-1vSQ9NRrJDB_KNWNXtApbLGAC_e0C216iG6uSEGPruj2hTCiyYuC8KbtHbrR1OYt9-qxZfDZMaNt0WGW/pub?gid=0&single=true&output=csv")

# Prints out a sample of 5 rows of this data frame
sample(characters, 5)

### The Dataset

**Description:** The `characters` data frame contains characters from various fictional universes. At [Open Psychometrics](https://openpsychometrics.org/tests/characters/), more than 3 million volunteers from the internet rated these characters on various traits by using a sliding scale. For example, the character Mushu (from Disney's Mulan) is depicted below being rated on a scale from 0 (rude) to 100 (respectful). These ratings have been averaged to produce the data frame `characters`.

<img src="https://i.postimg.cc/tXVg4SjZ/rating-characters.png" alt="example of how people rated a character with a slider" width = 40%>

##### Variable Descriptions

- `char_id` The character ID.
- `char_name` The character's name.	
- `uni_id` The universe ID for the book, game, movie, or TV show.
- `uni_name` The universe name of the book, game, movie, or TV show.
- `gender` The gender of the character (M=Male, F=Female, NB=NonBinary).


- `abstract` A rating of how abstract (vs concrete) the character is on a scale of 0-100 (0-concrete, 100-abstract).
- `agreeable` A rating of how agreeable (vs stubborn) the character is on a scale of 0-100 (0-stubborn, 100-agreeable).	
- `anxious` A rating of how anxious (vs calm) the character is on a scale of 0-100 (0-calm, 100-anxious).
- `attractive` A rating of how attractive (vs repulsive) the character is on a scale of 0-100 (0-repulsive, 100-attractive).	
- `beautiful` A rating of how beautiful (vs ugly) the character is on a scale of 0-100 (0-ugly, 100-beautiful).	
- `chaotic` A rating of how chaotic (vs orderly) the character is on a scale of 0-100 (0-orderly, 100-chaotic).
- `chill` A rating of how chill (vs offended) the character is on a scale of 0-100 (0-offended, 100-chill).	
- `cool` A rating of how cool (vs dorky) the character is on a scale of 0-100 (0-dorky, 100-cool).	
- `decisive` A rating of how decisive (vs hesitant) the character is on a scale of 0-100 (0-hesitant, 100-decisive).	
- `emotional` A rating of how emotional (vs unemotional) the character is on a scale of 0-100 (0-unemotional, 100-emotional).	
- `extrovert` A rating of how extroverted (vs introverted) the character is on a scale of 0-100 (0-introvert, 100-extrovert).	
- `feminine` A rating of how feminine (vs masculine) the character is on a scale of 0-100 (0-masculine, 100-feminine).	
- `future_focused` A rating of how future-focused (vs present-focused) the character is on a scale of 0-100 (0-present-focused, 100-future-focused).	
- `loveable` A rating of how loveable (vs punchable) the character is on a scale of 0-100 (0-punchable, 100-loveable).
- `messy` A rating of how messy (vs neat) the character is on a scale of 0-100 (0-neat, 100-messy).		
- `moody` A rating of how moody (vs stable) the character is on a scale of 0-100 (0-stable, 100-moody).		
- `open_minded` A rating of how open-minded (vs close-minded) the character is on a scale of 0-100 (0-close-minded, 100-open-minded).
- `reasoned` A rating of how reasoned (vs instinctual) the character is on a scale of 0-100 (0-instinctual, 100-reasoned).
- `respectful` A rating of how respectful (vs rude) the character is on a scale of 0-100 (0-rude, 100-respectful).
- `self_assured` A rating of how self-assured (vs self-conscious) the character is on a scale of 0-100 (0-self-conscious, 100-self-assured).
- `self_disciplined` A rating of how self-disciplined (vs disorganized) the character is on a scale of 0-100 (0-disorganized, 100-self-disciplined).	
- `tall` A rating of how tall (vs short) the character is on a scale of 0-100 (0-short, 100-tall).	
- `trusting` A rating of how trusting (vs suspicious) the character is on a scale of 0-100 (0-suspicious, 100-trusting).


##### Data Source: 

Originally collected at [Open Psychometrics](https://openpsychometrics.org/tests/characters/) and made available by Tanya Shapiro as a [Tidy Tuesday data set](https://github.com/rfordatascience/tidytuesday/tree/master/data/2022/2022-08-16).

## 2.0: Generate Questions/Hypotheses

**2.1:** The code below uses `arrange()` to sort the data frame in ascending order, from least `respectful` (i.e., rude) to most `respectful`.


In [None]:
arrange(characters, respectful)

**2.2:** Modify the code above so we can see the least/most loveable characters.
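If you get stuck, here is a minimal sketch of sorting in both directions. It uses base R's `order()` on a tiny made-up data frame called `toy` (the names and ratings are hypothetical, not from `characters`) so it runs on its own. In this notebook, the same idea would probably be written as `arrange(characters, loveable)` for ascending order and `arrange(characters, desc(loveable))` for descending order, assuming `desc()` is available alongside `arrange()`.

```r
# Hypothetical toy data; the real ratings live in `characters`
toy <- data.frame(
  char_name = c("A", "B", "C", "D"),
  loveable  = c(62, 15, 91, 48)
)

# Ascending: least loveable character first
least_first <- toy[order(toy$loveable), ]

# Descending: most loveable character first
most_first <- toy[order(toy$loveable, decreasing = TRUE), ]

print(least_first)
print(most_first)
```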

## 3.0: Good vs. Bad

**3.1:** We are going to compare the characteristics of good characters and bad characters to see which one(s) might do a better job of helping us predict which group a character belongs to.

**From Hypothesis to Word Equation (the proto-model).** We can translate a hypothesis (e.g., characters that are more loveable are more likely to be good rather than bad) into a word equation like this: **good_bad = loveable + other stuff**.

Come up with a few hypotheses and express them as word equations below:

1. e.g., characters that are more loveable will be rated as good: **good_bad = loveable + other stuff**
2. Write your own.
3. 
4. 

### Explore Word Equations with Data Visualizations

**3.2:** We've set you up with some code to look at these theories in a visualization. Describe what kind of patterns you see. Does one model (i.e., word equation) appear to explain more variation than the others?

In [None]:
# example code to explore an example hypothesis: 
# good_bad = loveable + other stuff
gf_point(good_bad ~ loveable, data = characters)

In [None]:
# modify this code with the idea you'd like to explore
gf_point(good_bad ~ loveable, data = characters, color = "blue")

Totally extra, but if you'd like to see a list of all the colors available in R, here is a well-known R color cheatsheet (you can also find it by searching for "Rcolor"): http://www.stat.columbia.edu/~tzheng/files/Rcolor.pdf

## 4.0: Everybody Loves a Baddie!

**4.1:** Are we sure that good characters are really more loved than bad ones? Or could there be some other explanation for what we see, like coincidence? How could we even tell?

**4.2:** The first step: we need an average loveable rating for both the good group and the bad group.
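One possible way to get a mean for each group is sketched below. It uses base R's `tapply()` on a made-up data frame called `toy`, and it assumes `good_bad` holds the labels `"good"` and `"bad"` (check how the real `characters` data codes this variable). With the notebook's packages loaded, something like `favstats(loveable ~ good_bad, data = characters)` should give the group means along with other summary statistics.

```r
# Hypothetical toy data: three "good" and three "bad" characters
toy <- data.frame(
  good_bad = c("good", "good", "good", "bad", "bad", "bad"),
  loveable = c(80, 70, 90, 30, 50, 40)
)

# Average loveable rating within each group
group_means <- tapply(toy$loveable, toy$good_bad, mean)
print(group_means)

# Gap between the two group averages (good minus bad)
mean_diff <- group_means[["good"]] - group_means[["bad"]]
print(mean_diff)
```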

In [None]:
# use this block to get the average loveability for both groups



**4.3:** How big is the difference between the groups? What would we expect to see if there wasn't really a difference between good and bad (if both groups were equally loved)?

**4.4:** What explanation(s) do we have for how big this gap between the group averages is?

- Reason 1:
- Reason 2:

**4.5:** Could this difference be a coincidence?  How likely would it be to get a gap like this between the groups if there really was no difference between good and bad characters?
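One way to imagine a world with no real difference is to shuffle the loveable ratings, so that any link between a rating and its group label is broken, and then recompute the gap. The sketch below does this once with base R's `sample()` on a made-up data frame called `toy`; the helper `diff_in_means()` and the `"good"`/`"bad"` labels are assumptions for illustration. In CourseKata style, a single shuffle might look like `b1(shuffle(loveable) ~ good_bad, data = characters)`.

```r
set.seed(1)  # makes the shuffle repeatable

# Hypothetical toy data: three "good" and three "bad" characters
toy <- data.frame(
  good_bad = c("good", "good", "good", "bad", "bad", "bad"),
  loveable = c(80, 70, 90, 30, 50, 40)
)

# Illustrative helper: mean of the good group minus mean of the bad group
diff_in_means <- function(rating, group) {
  mean(rating[group == "good"]) - mean(rating[group == "bad"])
}

# The gap we actually observed in the toy data
observed_diff <- diff_in_means(toy$loveable, toy$good_bad)

# Shuffle the ratings so group labels no longer go with their true ratings
shuffled <- sample(toy$loveable)
shuffled_diff <- diff_in_means(shuffled, toy$good_bad)

print(observed_diff)
print(shuffled_diff)
```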

In [None]:
# Run some code to simulate the results if there was no 
# difference between good and bad characters' loveability:



**4.6:** If we did this many times, what would happen?

In [None]:
# Have the computer do it over and over, 1000s of times!



**4.7:** Look at the results.  Still think it could have been coincidence? 
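To put a number on "could it be coincidence?", one approach is to repeat the shuffle many times and count how often a shuffled gap is at least as big as the observed one. The sketch below does this with base R's `replicate()` on a made-up data frame called `toy`; the helper `diff_in_means()` and the `"good"`/`"bad"` labels are assumptions for illustration. In CourseKata style, the repeated shuffle might be written as `do(1000) * b1(shuffle(loveable) ~ good_bad, data = characters)`.

```r
set.seed(1)  # makes the shuffles repeatable

# Hypothetical toy data: three "good" and three "bad" characters
toy <- data.frame(
  good_bad = c("good", "good", "good", "bad", "bad", "bad"),
  loveable = c(80, 70, 90, 30, 50, 40)
)

# Illustrative helper: mean of the good group minus mean of the bad group
diff_in_means <- function(rating, group) {
  mean(rating[group == "good"]) - mean(rating[group == "bad"])
}

observed_diff <- diff_in_means(toy$loveable, toy$good_bad)

# Shuffle the ratings 1000 times and record the gap each time
shuffled_diffs <- replicate(
  1000,
  diff_in_means(sample(toy$loveable), toy$good_bad)
)

# Proportion of shuffles with a gap at least as large as the observed one
# (in either direction)
prop_as_extreme <- mean(abs(shuffled_diffs) >= abs(observed_diff))
print(prop_as_extreme)
```

A small proportion suggests that shuffling alone rarely produces a gap this big; a large one suggests coincidence is a reasonable explanation. With only six toy characters the proportion is not very informative, so treat this as a pattern to adapt to `characters`.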