<div class="alert alert-block alert-danger">

# 3C: Traits of Fictional Characters (COMPLETE)

*This notebook is intended for students who have completed up to:*
 
**Page 3.13**

</div>

<div class="alert alert-block alert-warning">

#### Summary of Notebook:

In this lesson, students will explore a dataset on hundreds of fictional characters (from books, movies, TV shows, video games, etc.) that have been rated on various personality traits (e.g., how loveable, tall, or moody they are perceived to be). Students will first explore models that try to explain variation in how loveable a character is, then they will get to develop their own theories and explore those models as well to see which variables do the best job explaining variation in their outcome variable. 


#### Includes:

- Fitting and interpreting a model with a quantitative explanatory variable.
- Connecting parameter estimates to visualizations.
- Making predictions with models and evaluating the error in those predictions.

#### Resources:

- Optional [Printable Graph Handout](https://docs.google.com/document/d/1L1ur9c3z7ctpfKEVLcAwTZb0O88pneJYjTBB1h-EHEc/edit?usp=sharing). This handout contains images of the relevant visualizations that are made throughout the lesson. Students can draw on them, mark them up, and make notes. This can give students the chance to process the graphs more deeply and connect them to the models they are fitting.

</div>

<div class="alert alert-block alert-success">

## Approximate time to complete Notebook: 55-75 Mins

</div>

In [None]:
# This code will load the R packages we will use
suppressPackageStartupMessages({
    library(coursekata)
})

# This code will make sure the middle rows/columns don't get cut out (ellipsized) when you 
# print out a really large data frame (note: you can adjust the values for max rows/cols)
options(repr.matrix.max.rows=900, repr.matrix.max.cols=100)

# Load the data frame
characters <- read.csv("https://docs.google.com/spreadsheets/d/e/2PACX-1vQk_n4m-VBCD7CtcpB1kOsiNDLrPmEOEtlOoaKwDhogE_YeGEW5PYTaOtZaqypEgHRFGWsZ0pdYvt_A/pub?gid=0&single=true&output=csv")

## Favorite Fictional Characters

<img src="https://i.postimg.cc/s34kFsZr/xcd-03-B-fictional-chars.png" alt="A collage of the faces of various fictional characters" width = 30%>

There are many popular fictional universes out there with their own unique set of fictional characters with a wide range of personality types. Some characters are known for being quite likeable, and some characters are known for being unlikeable, or even despised. 

Who are some of your favorite fictional characters (i.e., from any of your favorite books, movies, TV shows, or games)? What is it about those characters that you like?

Who are some unlikeable characters? What makes them unlikeable?

Today we'll explore and explain the variation in character likeability.

### Motivating Question: What traits make a character more likeable?

### The Dataset

**Description:** The `characters` data frame contains characters from various fictional universes. More than [3 million volunteers from the internet](https://openpsychometrics.org/tests/characters/) rated these characters on various traits using a sliding scale. For example, the character Mushu (from Disney's Mulan) is shown below being rated on a scale from 0 (rude) to 100 (respectful).

<img src="https://i.postimg.cc/tXVg4SjZ/rating-characters.png" alt="example of how people rated a character with a slider" width = 40%>

##### Variable Descriptions

- `char_id` The character ID.
- `char_name` The character's name.	
- `uni_id` The universe ID for the book, game, movie, or TV show.
- `uni_name` The universe name of the book, game, movie, or TV show.
- `gender` The gender of the character (M=Male, F=Female, NB=NonBinary).
- `abstract` The average rating of how abstract (vs concrete) the character is on a scale of 0-100 (0-concrete, 100-abstract).
- `agreeable` The average rating of how agreeable (vs stubborn) the character is on a scale of 0-100 (0-stubborn, 100-agreeable).	
- `anxious` The average rating of how anxious (vs calm) the character is on a scale of 0-100 (0-calm, 100-anxious).
- `attractive` The average rating of how attractive (vs repulsive) the character is on a scale of 0-100 (0-repulsive, 100-attractive).	
- `beautiful` The average rating of how beautiful (vs ugly) the character is on a scale of 0-100 (0-ugly, 100-beautiful).	
- `chaotic` The average rating of how chaotic (vs orderly) the character is on a scale of 0-100 (0-orderly, 100-chaotic).
- `chill` The average rating of how chill (vs offended) the character is on a scale of 0-100 (0-offended, 100-chill).	
- `cool` The average rating of how cool (vs dorky) the character is on a scale of 0-100 (0-dorky, 100-cool).	
- `decisive` The average rating of how decisive (vs hesitant) the character is on a scale of 0-100 (0-hesitant, 100-decisive).	
- `emotional` The average rating of how emotional (vs unemotional) the character is on a scale of 0-100 (0-unemotional, 100-emotional).	
- `extrovert` The average rating of how extroverted (vs introverted) the character is on a scale of 0-100 (0-introvert, 100-extrovert).	
- `feminine` The average rating of how feminine (vs masculine) the character is on a scale of 0-100 (0-masculine, 100-feminine).	
- `future_focused` The average rating of how future-focused (vs present-focused) the character is on a scale of 0-100 (0-present-focused, 100-future-focused).	
- `loveable` The average rating of how loveable (vs punchable) the character is on a scale of 0-100 (0-punchable, 100-loveable).
- `messy` The average rating of how messy (vs neat) the character is on a scale of 0-100 (0-neat, 100-messy).		
- `moody` The average rating of how moody (vs stable) the character is on a scale of 0-100 (0-stable, 100-moody).		
- `open_minded` The average rating of how open-minded (vs close-minded) the character is on a scale of 0-100 (0-close-minded, 100-open-minded).
- `reasoned` The average rating of how reasoned (vs instinctual) the character is on a scale of 0-100 (0-instinctual, 100-reasoned).
- `respectful` The average rating of how respectful (vs rude) the character is on a scale of 0-100 (0-rude, 100-respectful).
- `self_assured` The average rating of how self-assured (vs self-conscious) the character is on a scale of 0-100 (0-self-conscious, 100-self-assured).
- `self_disciplined` The average rating of how self-disciplined (vs disorganized) the character is on a scale of 0-100 (0-disorganized, 100-self-disciplined).	
- `tall` The average rating of how tall (vs short) the character is on a scale of 0-100 (0-short, 100-tall).	
- `trusting` The average rating of how trusting (vs suspicious) the character is on a scale of 0-100 (0-suspicious, 100-trusting).


##### Data Source: 

Originally collected at [Open Psychometrics](https://openpsychometrics.org/tests/characters/) made available by Tanya Shapiro as a [Tidy Tuesday data set](https://github.com/rfordatascience/tidytuesday/tree/master/data/2022/2022-08-16).

<div class="alert alert-block alert-success">

### 1.0 - Approximate Time:  10-15 mins

</div>

### 1.0 - Explore the Data Frame

<div class="alert alert-block alert-warning">

**Note to Instructor:**

Sections 1.0 and 2.0 provide a little bit of extra code setup and scaffolding so that students can move through them a little more quickly in order to spend more time on section 3.0, where they will make their own models.

</div>

#### Free Play

**1.1:** Let's just start by getting familiar with the data we will be working with. Run the code below to get a glimpse at the data frame. Then, take a minute to freely explore the data. Look at anything interesting to you, or that you are curious about, or anything you think we might want to know about the data frame, the cases, or the variables before we start modeling anything.

In [None]:
# Check out the data
head(characters)

**1.2:** What questions do you have about this data set? 

<div class="alert alert-block alert-warning">

**Sample Response:**

Spend some time generating questions and have students try to answer them. Here are some questions students might generate... 

- How many characters are represented in the data frame?
- Which is the most/least ___ character? 
- How many fictional universes are represented in the data frame?
- Which universes have the most/fewest number of characters represented in the data frame?
- Is there a roughly equal number of gender categories represented?

</div> 


In [None]:
# Complete Version

## How many characters?
str(characters)

## How many universes? 
# note this code is not taught in the book
length(unique(characters$uni_name))

## Which universes have the most/fewest characters?
sort(tally(~uni_name, data = characters))

## How many in each gender category?
tally(~gender, data = characters)

## Which characters are most attractive?
head(arrange(characters, desc(attractive)), 10)

#### Measuring Likeability

**1.3--Discussion:** We are interested in modeling what predicts how likeable a character is, but there isn't a variable called "likeable." Which variables in the data frame might be a good measure for this trait?

<div class="alert alert-block alert-warning">

**Note to Instructors:**

We want to encourage students to consider what could be a stand-in (or "operationalization") for likeability. 

Students might suggest that variables such as `attractive`, `agreeable`, or `loveable` could be ways of measuring likeability. 

Pick one as a class. Then discuss whether there is any difference between this variable and likeability. For example, if your class chooses `loveable`, how is that variable somewhat different from likeability? Students might bring up characters that they like but do not think of as particularly lovable. Let them know that these differences are important to keep in mind even as we try to explain variation in `loveable`.

</div>

**1.4:** One variable we might consider as a measure of a character's likeability is the variable `loveable`. Let's pursue that as our outcome variable together first, then afterwards, you can pursue some models using any of the other variables you are interested in as well.

So, before we develop any complex models, let's start by getting a little bit of information about `loveable`.

Take a look at the visualization and empty model for `loveable` below. Describe the distribution and interpret the empty model.

In [None]:
# Run this code and interpret the output

empty_model <- lm(loveable ~ NULL, data = characters)
empty_model

gf_histogram(~loveable, data = characters) %>%
gf_model(empty_model) %>%
gf_boxplot(width = 5)

<div class="alert alert-block alert-warning">


**Sample Response:**

The distribution is slightly skewed to the left. More characters have positive ratings (above 50) than have really low ratings. 

The empty model predicts a `loveable` rating of 55.9 (the mean) for every character.

</div>
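
One way to check this interpretation: the single estimate from the empty model is just the mean of the outcome. The sketch below uses base R and a small made-up set of ratings (the `toy` data frame and its values are hypothetical, not rows from `characters`).

```r
# Hypothetical loveable ratings for six made-up characters (not from `characters`)
toy <- data.frame(loveable = c(35, 48, 55, 61, 66, 71))

# Fit the empty model, using the same formula style as the notebook
toy_empty_model <- lm(loveable ~ NULL, data = toy)

# The one estimate the empty model produces (b0)
b0 <- coef(toy_empty_model)[["(Intercept)"]]

# b0 matches the mean of the outcome
b0
mean(toy$loveable)
```

The same comparison should hold for `empty_model` and `mean(characters$loveable)` in the notebook.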

**1.5--Discussion:** Take a look at the 50 most/least `loveable` characters. Does anything stand out about the two groups of characters? Do the characters in each group have anything in common? As you look over the data, try to come up with a few theories about what might explain variation in `loveable`.

In [None]:
# The 50 most loveable characters
head(arrange(characters, desc(loveable)), 50)

# The 50 least loveable characters
head(arrange(characters, loveable), 50)

<div class="alert alert-block alert-warning">


**Note to Instructors:**

Students may notice various things, or get "hunches" about what the high/low loveable characters have in common, but if they are having trouble coming up with things, you might ask them things like:


- Do they notice any common universes among the low vs high loveable characters?
- Or does there appear to be one gender more represented in one group vs the other?
- Does one group appear to have more low/high values on a particular trait?


While nothing may actually stand out by just looking at a few rows of the dataset, it can still be a helpful exercise when trying to develop theories about the DGP.

Of course, they can also just use their general knowledge of the world to develop theories.

</div>

<div class="alert alert-block alert-success">

### 2.0 - Approximate Time:  15-20 mins

</div>

### 2.0 - Explaining Variation in `loveable`

**2.1:** We are going to compare a few models to see which one(s) might do a better job helping us predict (i.e., explain variation in) `loveable`. 

We have developed a few theories and put them into word equations below:

> 1. **loveable = tall + other stuff**
> 2. **loveable = moody + other stuff**
> 3. **loveable = open_minded + other stuff**

For each word equation above, indicate whether you predict there will be:

- a positive relationship (as x goes up, y (loveable) goes up)
- a negative relationship (as x goes up, y (loveable) goes down)
- some other relationship
- no relationship

<div class="alert alert-block alert-warning">


**Note to Instructors:**

These three models were selected because they demonstrate a stark contrast between: a positive relationship, a negative relationship, and no relationship.

</div>

#### Explore the Distribution

**2.2:** We've set you up with some code to look at these theories in a visualization. Describe what kind of patterns you see. Does one model appear to explain more variation than the others?

In [None]:
# loveable = tall + other stuff
gf_point(loveable ~ tall, data = characters)

# loveable = moody + other stuff
gf_point(loveable ~ moody, data = characters)

# loveable = open_minded + other stuff
gf_point(loveable ~ open_minded, data = characters)

<div class="alert alert-block alert-warning">


**Sample Response:**

***loveable = tall + other stuff***

- There does not appear to be a clear pattern. The data are very scattered and cloudy. It does not look like `tall` can help us explain much variation in `loveable`.


***loveable = moody + other stuff***

- There appears to be a negative relationship, where those who are rated higher on `moody` are rated lower on `loveable`. However, there is still a lot of unexplained variation, and possibly a slight curve to the data.

***loveable = open_minded + other stuff***

- There appears to be a positive relationship, where those who are rated more highly on `open_minded` are also rated more highly on `loveable`.

This model appears to have the strongest relationship (i.e., to explain the most variation).


</div>

#### Fit and Interpret the Models

**2.3:** We've also set you up with some code to fit these models. Put them into the GLM notation we have started below, and interpret the parameter estimates. Then, engage in the discussion questions below.

**GLM Notation:**

> - $loveable_i = b_0 + b_1(tall_i) + e_i$
> - $loveable_i = b_0 + b_1(moody_i) + e_i$
> - $loveable_i = b_0 + b_1(open\_minded_i) + e_i$

***Discussion Questions:***

Compare the $b_1$ estimates of the three models. Why are some negative? Which model has the smallest (or largest) $b_1$ (in absolute value)? What does this suggest?

Also, compare the $b_0$ estimates of the three models. Are they similar? Different? Why is that?

In [None]:
# loveable = tall + other stuff
tall_model <- lm(loveable ~ tall, data = characters)
tall_model

# loveable = moody + other stuff
moody_model <- lm(loveable ~ moody, data = characters)
moody_model

# loveable = open_minded + other stuff
open_minded_model <- lm(loveable ~ open_minded, data = characters)
open_minded_model

<div class="alert alert-block alert-warning">


**Sample Response:**

- $loveable_i = 56.79 + (-0.02)(tall_i) + e_i$

> The $b_0$ estimate is 56.79; this is the y-intercept, the predicted `loveable` rating when `tall` is zero. The $b_1$ estimate is -0.02; this is the slope, so the predicted `loveable` rating goes down by 0.02 points for every 1-unit increase in `tall`.

- $loveable_i = 91.17 + (-0.58)(moody_i) + e_i$

> The $b_0$ estimate is 91.17; this is the y-intercept, the predicted `loveable` rating when `moody` is zero. The $b_1$ estimate is -0.58; this is the slope, so the predicted `loveable` rating goes down by 0.58 points for every 1-unit increase in `moody`.

- $loveable_i = 11.40 + 0.81(open\_minded_i) + e_i$
 
 > The $b_0$ estimate is 11.40; this is the y-intercept, the predicted `loveable` rating when `open_minded` is zero. The $b_1$ estimate is 0.81; this is the slope, so the predicted `loveable` rating goes up by 0.81 points for every 1-unit increase in `open_minded`.

 ***$b_0$ comparisons:***

 One way the $b_0$ estimates are similar is that they are all positive values. Other than that they are not very similar. This is because each model has a very different pattern, so the starting off point for the y-axis (for `loveable`) will be different. For the `tall` model, the predictions don't change much, so they start off near the mean (and pretty much stay there, since $b_1$ for the tall model is so small). For the `moody` model, there is a downward slope, so the predictions start off high on the y-axis (`loveable`), and the `open_minded` model has an upward trend, so the predictions start off low on the y-axis.

 ***$b_1$ comparisons:*** 

Some of the $b_1$ estimates are negative because their regression line has a downward slope, so as the explanatory variable goes up, the prediction for the outcome variable goes down. The `open_minded` model has the largest $b_1$ (in absolute value); this suggests that it might explain the most variation because it predicts the biggest change in $Y$ from the effect of $X$. This also suggests that the `tall` model will explain the least variation, because it is closest to zero.

**Note to Instructors:**

Part of the goal in this discussion exercise is to make sure students are also connecting the parameter estimates to the units of the outcome variable. So, even though it does not explicitly ask them to clarify what the units are, you may want to further emphasize that, if they are having trouble making that connection.

</div>
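
To connect these interpretations to code, here is a minimal sketch with made-up data (the `toy` data frame and its values are hypothetical). It checks that $b_0$ is the model's prediction when the explanatory variable is zero, and that $b_1$ is how much the prediction changes for a 1-unit increase.

```r
# Hypothetical ratings for five made-up characters (not from `characters`)
toy <- data.frame(
  moody    = c(20, 35, 50, 65, 80),
  loveable = c(82, 70, 64, 51, 47)
)

toy_model <- lm(loveable ~ moody, data = toy)
b0 <- coef(toy_model)[["(Intercept)"]]
b1 <- coef(toy_model)[["moody"]]

# Predictions at moody = 0 and moody = 1
preds <- predict(toy_model, newdata = data.frame(moody = c(0, 1)))

# b0 is the prediction at 0; b1 is the change in the prediction from 0 to 1
c(b0 = b0, prediction_at_0 = preds[[1]])
c(b1 = b1, change_per_unit = preds[[2]] - preds[[1]])
```

Both quantities are in the units of the outcome variable (points on the 0-100 `loveable` scale).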

<div class="alert alert-block alert-info">

<b> <font size="+1">Key Question</font></b>

**2.4:**  Add the models to the visualizations and connect the parameter estimates to the model in the graph.

</div>

In [None]:
# Sample Responses

# loveable = tall + other stuff
gf_point(loveable ~ tall, data = characters) %>%
    gf_model(loveable ~ tall, data = characters, color = "red")

# loveable = moody + other stuff
gf_point(loveable ~ moody, data = characters) %>%
    gf_model(moody_model, color = "red")

# loveable = open_minded + other stuff
gf_point(loveable ~ open_minded, data = characters) %>%
    gf_lm(color = "red")


<div class="alert alert-block alert-warning">


**Sample Response:**

The $b_0$ is where the regression line runs through the y-axis, and the $b_1$ is the vertical increase or decrease (the slope of the line) from one unit of X to the next.

**Note to Instructors:**

You may want to use the printable graph handout in this section so students can manually draw out these ideas and make notes.

</div>

#### Make Predictions with the Models

**2.5:** Make some predictions with the models. For example, what does the model predict for a character who is rated as very tall (e.g., a rating of 75)? How about a character who is rated not very tall (e.g., a rating of 25)?

<div class="alert alert-block alert-warning">


**Sample Response:**

***Using the visualization to make rough predictions:***

The `tall` model predicts roughly the same value for tall characters and short characters (close to the mean of `loveable`). 

The `moody` model predicts that those with a 75 moody rating will be rated just under 50 on loveable, and a moody rating of 25 will have a loveable rating of about 75.

The `open_minded` model predicts that those with a 75 open-minded rating will be rated just under 75 on loveable, and an open-minded rating of 25 will have a loveable rating of a little over 25.

***Using the parameter estimates to make more precise predictions:***

Tall:

- loveable_i = 56.79 + (-0.02)(75) = 55.29
- loveable_i = 56.79 + (-0.02)(25) = 56.29

Moody:

- loveable_i = 91.17 + (-0.58)(75) = 47.67
- loveable_i = 91.17 + (-0.58)(25) = 76.67

Open Minded:

- loveable_i = 11.40 + 0.81(75) = 72.15
- loveable_i = 11.40 + 0.81(25) = 31.65

**Note to Instructors:**

Try to get students to make predictions both ways (rough estimates by looking at the model in the visualization, and precise estimates with the GLM). Of course, they can also use R code (such as the `predict()` function).

</div>
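
The hand calculations above can also be wrapped in a small helper. This is a sketch that uses the rounded estimates from the sample response; `predict_from_line()` is a hypothetical helper name, not a function from coursekata.

```r
# Hypothetical helper: prediction from a line with intercept b0 and slope b1
predict_from_line <- function(b0, b1, x) {
  b0 + b1 * x
}

# The two example ratings used in 2.5
ratings <- c(high = 75, low = 25)

# Rounded estimates taken from the sample response above
tall_preds        <- predict_from_line(56.79, -0.02, ratings)
moody_preds       <- predict_from_line(91.17, -0.58, ratings)
open_minded_preds <- predict_from_line(11.40,  0.81, ratings)

# One row per model, one column per rating
rbind(tall = tall_preds, moody = moody_preds, open_minded = open_minded_preds)
```

With the fitted model objects available, a call such as `predict(tall_model, newdata = data.frame(tall = c(75, 25)))` should give similar values, differing slightly because it uses unrounded estimates.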

In [None]:
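# Predictions for ratings of 75 and 25 (b0 + b1 * rating)
# tall model
56.79 + (-0.02*75) 
56.79 + (-0.02*25) 
# moody model
91.17 + (-0.58*75) 
91.17 + (-0.58*25) 
# open_minded model
11.40 + (0.81*75)
11.40 + (0.81*25) 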

In [None]:
# Predictions for Harry Potter (tall = 42.4, moody = 66.1, open_minded = 71.4)
56.79 + (-0.02*42.4) 

91.17 + (-0.58*66.1) 

11.40 + (0.81*71.4)


**2.6:** Take a look at the character selected below (or filter for another character). How far off is the model prediction for that character?

In [None]:
# Select the row for a particular character (e.g., Harry Potter)
characters[characters$char_name == "Harry Potter", ]

In [None]:
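# Residuals for Harry Potter: actual loveable (59.4) minus each model's prediction
59.4 - 55.94  # tall model
59.4 - 52.83  # moody model
59.4 - 69.23  # open_minded model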

<div class="alert alert-block alert-warning">

**Sample Response:**

***The actual `loveable` rating for Harry Potter = 59.4***


Harry Potter's `tall` rating = 42.4:

--> loveable_i = 56.79 + (-0.02)(42.4) = 55.94 *(Residual: 59.4 - 55.94 = 3.46)*

Harry Potter's `moody` rating = 66.1:

--> loveable_i = 91.17 + (-0.58)(66.1) = 52.83 *(Residual: 59.4 - 52.83 = 6.57)*

Harry Potter's `open_minded` rating = 71.4:

--> loveable_i = 11.40 + 0.81(71.4) = 69.23 *(Residual: 59.4 - 69.23 = -9.83)*

In this case, the tall model has the smallest residual when predicting Harry Potter's `loveable` rating.


</div>
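
The residual calculation (actual minus predicted) can be sketched in a few lines. The numbers below are the Harry Potter ratings and rounded estimates from the sample response; the object names are only illustrative.

```r
# Harry Potter's actual loveable rating, from the sample response
actual_loveable <- 59.4

# Predictions from the rounded estimates (b0 + b1 * rating)
predicted <- c(
  tall        = 56.79 + (-0.02 * 42.4),
  moody       = 91.17 + (-0.58 * 66.1),
  open_minded = 11.40 + ( 0.81 * 71.4)
)

# Residual = actual - predicted
residuals_hp <- actual_loveable - predicted
round(residuals_hp, 2)

# Which model's prediction is closest (smallest absolute residual)?
names(which.min(abs(residuals_hp)))
```

A positive residual means the character is rated more loveable than the model predicts; a negative residual means the model overpredicts.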

#### Evaluate the Models

<div class="alert alert-block alert-info">

<b> <font size="+1">Key Question</font></b>

**2.7:** How much variation in `loveable` does each model explain? Are any of the models much better than the empty model? And, if so, which model explains the most variation in `loveable`? Use statistics to support your answer.

</div>

In [None]:
# Here are the saved model names to get you started

# loveable = tall + other stuff
# tall_model

# loveable = moody + other stuff
# moody_model

# loveable = open_minded + other stuff
# open_minded_model

In [None]:
# Complete Version

# loveable = tall + other stuff
supernova(tall_model)

# loveable = moody + other stuff
supernova(moody_model)

# loveable = open_minded + other stuff
supernova(open_minded_model)

<div class="alert alert-block alert-warning">

**Sample Response:**

Tall:

The `tall` model explains about zero percent of the variation in `loveable` according to the PRE (0.0004).

Moody:

The `moody` model explains about 28% of the variation in `loveable` according to the PRE.

Open-Minded:

The `open_minded` model explains about 55% of the variation in `loveable` according to the PRE.

The `tall` model is no better than the empty model. The `open_minded` model explains the most variation when comparing the PREs. It also has the largest F value and the highest SS Model. Equivalently, it has the lowest SS Error, which means it leaves the least leftover (unexplained) variation.

</div>
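
To see where PRE comes from, it can be computed by hand from the sums of squares. The sketch below uses base R and made-up data (the `toy` data frame and its values are hypothetical), so the numbers will not match the `supernova()` output for `characters`.

```r
# Hypothetical ratings for six made-up characters (not from `characters`)
toy <- data.frame(
  open_minded = c(20, 35, 45, 60, 70, 85),
  loveable    = c(30, 38, 52, 55, 71, 78)
)

toy_empty_model <- lm(loveable ~ NULL, data = toy)
toy_model       <- lm(loveable ~ open_minded, data = toy)

# Sum of squared residuals from each model
ss_total <- sum(resid(toy_empty_model)^2)  # error left by the empty model
ss_error <- sum(resid(toy_model)^2)        # error left by the open_minded model
ss_model <- ss_total - ss_error            # error reduced by the model

# PRE: proportion of the empty model's error that the model reduces
pre <- ss_model / ss_total
pre
```

In the notebook, `supernova()` reports these same quantities (SS Model, SS Error, SS Total, and PRE) in one table.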

<div class="alert alert-block alert-success">

### 3.0 - Approximate Time:  20-25 mins

</div>

### 3.0 - Make Your Own Models

#### Explore the Distribution

**3.1:** Take a look through the data and select a character trait that you are interested in as an outcome variable. 

Create a visualization to explore the distribution, and fit the empty model.


In [None]:
# Example Outcome Variable: chill

# Sample Visualization
gf_histogram(~chill, data = characters) %>%
    gf_boxplot(fill = "white", width = 9)

# Sample Visualization
gf_boxplot(chill~1, data = characters)

# Sample Empty Model
empty_chill_model <- lm(chill ~ NULL, data = characters)
empty_chill_model

**3.2:** Come up with two different theories about the DGP for that variable, and write them as word equations (i.e., pick two explanatory variables).


Make some predictions about what you might expect to find, then create visualizations to explore your hypotheses. Describe what you see. Does one model appear to explain more variation than the others?

In [None]:
# Sample Response

# chill = chaotic + other stuff
gf_point(chill ~ chaotic, data = characters)

# chill = extrovert + other stuff
gf_point(chill ~ extrovert, data = characters)

<div class="alert alert-block alert-warning">

**Sample Response:**

***Example Student Theories:***

- chill = chaotic + other stuff

"I predict that the more chaotic a character is rated, the less chill they will be rated."

- chill = extrovert + other stuff

"I predict that the more introverted a character is rated, the more chill they will be rated."

***Visualizations:***

It looks like neither `chaotic` nor `extrovert` explains much variation in `chill`. They both have very cloudy, scattered data. 

</div>

#### Fit and Interpret the Models

**3.3:** Fit your models, put them into GLM notation ($Y_i = b_0 + b_1(X_i) + e_i$), and interpret the parameter estimates.

In [None]:
# Sample Response

# chill = chaotic + other stuff
chaotic_model <- lm(chill ~ chaotic, data = characters)
chaotic_model

# chill = extrovert + other stuff
extrovert_model <- lm(chill ~ extrovert, data = characters)
extrovert_model

<div class="alert alert-block alert-warning">


**Sample Response:**

- $chill_i = 43.29 + 0.02(chaotic_i) + e_i$
 
 > The $b_0$ estimate is 43.29; this is the y-intercept, and the prediction for `chill` when `chaotic` is zero. The $b_1$ estimate is 0.02; this is the slope, and is how much we add to $b_0$ for every 1 unit increase in `chaotic`.

- $chill_i = 44.06 + 0.00(extrovert_i) + e_i$
 
 > The $b_0$ estimate is 44.06; this is the y-intercept, and the prediction for `chill` when `extrovert` is zero. The $b_1$ estimate is 0.00; this is the slope, and is how much we add to $b_0$ for every 1 unit increase in `extrovert`.


**Note to Instructors:**

You may also want to get them to discuss and compare the parameter estimates for the two models, as they did in section 2.0. For instance, in the example models (chill = chaotic/extrovert), the $b_0$s are pretty close and hover near the mean of chill, and the $b_1$s are near zero. This suggests, as with the loveable = tall model, that they probably do not explain much variation in `chill`.

</div>

<div class="alert alert-block alert-info">

<b> <font size="+1">Key Question</font></b>

**3.4:** Add the models to your visualizations, and connect the parameter estimates to the models in the graphs.

</div>

In [None]:
# Sample Responses

# chill = chaotic + other stuff
gf_point(chill ~ chaotic, data = characters) %>%
    gf_lm(color = "red")

# chill = extrovert + other stuff
gf_point(chill ~ extrovert, data = characters) %>%
    gf_model(extrovert_model, color = "red")

#### Make Predictions with the Models

**3.5:** Make some predictions with the models. For example, what does the model predict for a character who is low in that trait versus a character who is high in that trait?

<div class="alert alert-block alert-warning">


**Sample Response:**

In the example models, the predictions for `chill` barely change between high and low ratings on either `chaotic` or `extrovert`.

</div>

#### Evaluate the Models

<div class="alert alert-block alert-info">

<b> <font size="+1">Key Question</font></b>

**3.6:** How much variation in your outcome variable does each model explain? Are any of the models much better than the empty model? And, if so, which model explains the most variation in your outcome variable? Use statistics to support your answer.

</div>

In [None]:
# Sample Responses

supernova(chaotic_model)
supernova(extrovert_model)

<div class="alert alert-block alert-warning">


**Sample Response:**

In the example models, neither explanatory variable explains a meaningful amount of variation in the outcome variable. Both have a PRE of about zero (less than .001), and their SS Error is very close to their SS Total, meaning the models reduced almost no error. Both models are about as good as the empty model.

</div>

<div class="alert alert-block alert-success">

### 4.0 - Approximate Time:  10-15 mins

</div>

### 4.0 - BONUS: Explore A Universe

**4.1:** Try filtering the data for a particular fictional universe that you are interested in (preferably one with at least 10 characters). 

In [None]:
## Just in case: Check which universes have the most/fewest characters
# sort(tally(~uni_name, data = characters))

# Sample Response -- filtering for the Harry Potter universe

characters_HP <- filter(characters, uni_name == "Harry Potter")
head(characters_HP)

**4.2:** How do the trends for that universe compare to the trends you found when looking at all the universes? Are the trends for those characters similar to the trends for the broader set of characters, or are they quite different? Why do you think that is?

In [None]:
# Sample Responses

gf_point(loveable ~ tall, data = characters_HP) %>% 
    gf_lm()
gf_point(loveable ~ moody, data = characters_HP) %>% 
    gf_lm()
gf_point(loveable ~ open_minded, data = characters_HP) %>% 
    gf_lm()


gf_point(chill ~ chaotic, data = characters_HP) %>% 
    gf_lm()
gf_point(chill ~ extrovert, data = characters_HP) %>% 
    gf_lm()


supernova(lm(loveable ~ tall, data = characters_HP)) 
supernova(lm(loveable ~ moody, data = characters_HP)) 
supernova(lm(loveable ~ open_minded, data = characters_HP)) 

supernova(lm(chill ~ chaotic, data = characters_HP)) 
supernova(lm(chill ~ extrovert, data = characters_HP)) 


<div class="alert alert-block alert-warning">


**Sample Response:**

Some students may notice that there are different trends for one universe compared to all the universes. Ask them why they think that is.

For instance, how might the difference in sample size affect things? What are some aspects of the universe itself that might lead to different trends in the personalities of the characters?

</div>
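
Because sample size matters here, it can help to check how many characters a universe contributes before fitting models to it. The following is a base R sketch on a made-up data frame (the universe names and counts are hypothetical); with the real data, the same steps would apply to `characters$uni_name`.

```r
# Hypothetical mini data frame (universe names and counts are made up)
toy <- data.frame(
  char_name = paste("Character", 1:15),
  uni_name  = c(rep("Universe A", 11), rep("Universe B", 4))
)

# Number of characters in each universe
counts <- table(toy$uni_name)
counts

# Universes with at least 10 characters
big_universes <- names(counts[counts >= 10])
big_universes

# Keep only the rows from one of those universes
toy_A <- toy[toy$uni_name == "Universe A", ]
nrow(toy_A)
```

A regression line fit to only a handful of characters can swing a lot from one universe to the next, which is one reason the trends may differ from those in the full data frame.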