# Quiz on Chapter 7

In this quiz, you will answer questions about data collected from Super Bowl ads that aired in the 21st century.

Brands will pay millions of dollars to have a 30-second commercial air during the Super Bowl because they know they will have tens of millions of people watching.

What makes an impactful Super Bowl commercial? If you had only 30 to 60 seconds to broadcast your brand to the public, what kind of ad would have the greatest impact?

In [None]:
# Load the CourseKata library
library(coursekata)

# Adjust scientific notation
options(scipen = 10)

# Resize graphs
options(repr.plot.width=12, repr.plot.height=9)

# This fixes font sizes on graphs
theme_set(theme(
  text = element_text(size = 20),
  axis.text = element_text(size = 16)
))

# Read data set
ads <- read.csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-03-02/youtube.csv")

## The Data

This data frame comes from [FiveThirtyEight](https://fivethirtyeight.com/). Their team watched over 200 ads from the 10 brands that aired the most spots across all 21 Super Bowls this century, according to superbowl-ads.com. They evaluated each ad using the specific criteria listed below.

**Description of Variables**

- `year` Super Bowl year
- `brand` Brand for commercial
- `superbowl_ads_dot_com_url` Super Bowl ad URL
- `youtube_url` YouTube URL
- `funny` Contains humor
- `show_product_quickly` Shows product quickly
- `patriotic` Patriotic
- `celebrity` Contains celebrity
- `danger` Contains danger
- `animals` Contains animals
- `use_sex` Uses sexuality
- `id` YouTube ID
- `kind` YouTube kind
- `etag` YouTube etag
- `view_count` YouTube view count
- `like_count` YouTube like count
- `dislike_count` YouTube dislike count
- `favorite_count` YouTube favorite count
- `comment_count` YouTube comment count
- `published_at` YouTube publication date
- `title` YouTube title
- `description` YouTube description
- `thumbnail` YouTube thumbnail
- `channel_title` YouTube channel name
- `category_id` YouTube content category ID


**Data Sources** 

- [Tidy Tuesday](https://github.com/rfordatascience/tidytuesday/blob/master/data/2021/2021-03-02/readme.md)

- FiveThirtyEight's [article](https://projects.fivethirtyeight.com/super-bowl-ads/) on the topic, where you can also watch the commercials and see their analyses.

## Question 1 (1 point)

In this quiz, we will examine factors that might explain how well ads are received. There are multiple ways we might measure this, such as total likes, total views, or the percentage of views that were liked. However, we will use the ratio of dislikes to likes. For this first question, create a column called ```dislike_like_ratio``` from the ```dislike_count``` and ```like_count``` columns.
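
As a sketch of the general pattern (not the quiz answer), a ratio column can be derived in base R by dividing one column by another, element by element. The toy data frame and its column names below are made up for illustration and do not come from the `ads` data.

```r
# Toy data frame; the column names and values are hypothetical
toy <- data.frame(
  thumbs_down = c(5, 20, 0, 3),
  thumbs_up   = c(100, 40, 10, 0)
)

# New column: element-wise division of one column by the other
toy$down_up_ratio <- toy$thumbs_down / toy$thumbs_up

# Inspect the result; a zero denominator gives Inf rather than an error
toy
```

Note that R does not stop on division by zero: a positive number divided by zero yields `Inf`, and any `NA` in either column carries through to the ratio. It is worth checking a derived ratio for such values before modeling.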

## Question 2 (1 point)

How might the ```dislike_like_ratio``` indicate how well an ad was received? Using this variable, what would tell you that one video was better received than another?

## Question 3 (1 point)

Now, we're going to model ```dislike_like_ratio``` in terms of whether or not the video contains celebrities and/or animals. Complete the following code to create a multivariate model for ```dislike_like_ratio``` and visualize the model.

In [None]:
multivariate_model <- lm(...)
multivariate_model

gf_jitter(dislike_like_ratio ~ ..., color = ~..., size=3, data = listwise_delete(ads, c("dislike_like_ratio", "celebrity", "animals"))) %>%
    gf_model(multivariate_model, size=2)

## Question 4 (1 point)
 
Write down the model you created in the previous question in both word equation form and GLM notation.

## Question 5 (1 point)

Interpret the coefficients of the multivariate model created in Question 3 in terms of the visualization also created in Question 3. In particular, explain how/where you "see" the coefficients appearing in the graph.

## Question 6 (4 points)
 
Use the `supernova()` function to evaluate your model and answer the following questions:

* Determine whether or not your multivariate model is statistically significant. If it is significant, determine the strength of the model.
* Which variable explains most of the variation in ```dislike_like_ratio``` in your model? How do you know?
* Calculate the amount of error that is explained by both of your variables together (i.e., the shared portion, not the part explained independently by just one of the variables).
* Which variable and/or model provides the most "bang for your buck" in terms of explaining error?

## Question 7 (1 point)

Something to note about the data in this dataset is that most of the variables are fairly skewed. Perform an F test to see if your conclusions are empirically justified.

To avoid a bunch of errors, use ```data = listwise_delete(ads, c("dislike_like_ratio", "celebrity", "animals"))``` wherever you might specify the data.
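
To illustrate the idea behind listwise deletion, here is a minimal base-R sketch on made-up data. It uses `complete.cases()` as a stand-in and is not the implementation of CourseKata's `listwise_delete()`; the toy data frame and its column names are hypothetical.

```r
# Toy data frame with some missing values; names and values are hypothetical
toy <- data.frame(
  outcome = c(0.1, NA, 0.3, 0.4),
  group_a = c(TRUE, FALSE, NA, TRUE),
  group_b = c(FALSE, TRUE, TRUE, FALSE),
  notes   = c(NA, "x", "y", "z")
)

# Only these columns are checked for missing values
vars <- c("outcome", "group_a", "group_b")

# Keep rows that have no NA in any of the listed columns;
# an NA in an unlisted column (here, notes) does not drop the row
kept <- toy[complete.cases(toy[, vars]), ]

nrow(kept)
```

Restricting the check to the variables used in the model keeps as many rows as possible, and passing the same filtered data to both the model and the plot means they are built from the same set of ads.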