```{webr-r}
#| context: setup
download.file(
  "https://raw.githubusercontent.com/ElijahMeyer3/Coursera/main/data/bechdel.csv",
  "bechdel.csv"
)
```

In this mini analysis we work with the data used in the FiveThirtyEight story titled ["The Dollar-And-Cents Case Against Hollywood's Exclusion of Women"](https://fivethirtyeight.com/features/the-dollar-and-cents-case-against-hollywoods-exclusion-of-women/).

This analysis is about the [Bechdel test](https://en.wikipedia.org/wiki/Bechdel_test), a measure of the representation of women in fiction.

## Getting started

### Packages

We start by loading the packages we'll use: **tidyverse** for the majority of the analysis and **scales** for pretty plot labels later on.

```{webr-r}
#| label: load-packages
#| warning: false

library(tidyverse)
library(scales)
```

### Data

The data are stored as a CSV (comma-separated values) file, which the setup step downloads into the working directory as `bechdel.csv`.
Let's read it from there and save it as an object called `bechdel`.

```{webr-r}
#| label: load-data
bechdel <- read.csv("bechdel.csv")
```

::: callout-note
This is a modified version of the `bechdel` dataset from the previous application exercise.
It's been modified to include some new variables derived from existing variables as well as to limit the scope of the data to movies released between 1990 and 2013.
For now we're not going to discuss how these modifications were made (that's next week's topic), but if you're curious, you can find the data prep script in the `data/` folder of your repo.
Don't spend too long trying to decipher it; wait until next week, when you'll have the right tools to do so!
:::

### Get to know the data

We can use the `glimpse` function to get an overview (or "glimpse") of the data.

```{webr-r}

# add code here

```

With your output, confirm that:

-   There are \_\_\_ movies in the dataset

-   There are \_\_\_ variables (columns) in the dataset
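
As a cross-check on the `glimpse()` output, base R's `nrow()` and `ncol()` return the same two counts directly.
The sketch below uses a small made-up data frame; `toy_counts` and its values are hypothetical and are not taken from `bechdel`.

```{webr-r}
# Hypothetical toy data frame, not the bechdel data
toy_counts <- data.frame(
  title = c("Movie A", "Movie B", "Movie C"),
  year  = c(1995, 2004, 2012)
)

nrow(toy_counts)  # number of rows (observations)
ncol(toy_counts)  # number of columns (variables)
```

Swapping `toy_counts` for `bechdel` should give the numbers to compare against your `glimpse()` output.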

## Variables of Interest 

The variables we'll focus on are the following:

-   `budget_2013`: Budget in inflation-adjusted 2013 dollars.
-   `gross_2013`: Gross (US and international combined) in inflation-adjusted 2013 dollars.
-   `roi`: Return on investment, calculated as the ratio of gross to budget.
-   `clean_test`: Bechdel test result:
    -   `ok` = passes test
    -   `dubious`
    -   `men` = women only talk about men
    -   `notalk` = women don't talk to each other
    -   `nowomen` = fewer than two women
-   `binary`: Bechdel test result as a binary variable (`PASS` or `FAIL`)

We will also use the `year` of release in data prep and the `title` of each movie to take a deeper look at some outliers.

There are a few other variables in the dataset, but we won't be using them in this analysis.
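
To make the `roi` definition concrete, here is a minimal sketch of the ratio on made-up numbers.
The data frame `toy_roi` and its values are hypothetical; only the formula (gross divided by budget) comes from the variable description above.

```{webr-r}
# Hypothetical toy values, not taken from the bechdel data
toy_roi <- data.frame(
  title       = c("Movie A", "Movie B", "Movie C"),
  budget_2013 = c(10e6, 50e6, 200e6),
  gross_2013  = c(40e6, 25e6, 600e6)
)

# ROI as described above: gross divided by budget
toy_roi$roi <- toy_roi$gross_2013 / toy_roi$budget_2013
toy_roi
```

An `roi` above 1 means a movie grossed more than its budget, and an `roi` below 1 means it grossed less.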

## Visualizing data with `ggplot2`

**ggplot2** is the package and `ggplot()` is the function in this package that is used to create a plot.

-   `ggplot()` creates the initial base coordinate system, and we will add layers to that base. We first specify the data set we will use with `data = bechdel`.

```{webr-r}
#| label: plot-base

ggplot(data = bechdel)
```

-   The `mapping` argument takes an aesthetic mapping created with `aes()`, which specifies how the variables in our data set are mapped to the visual properties of the graph.

```{webr-r}
#| label: plot-aesthetic-mapping

ggplot(data = bechdel, 
       mapping = aes(x = budget_2013, y = gross_2013))
```

As we previously mentioned, we often omit the names of the first two arguments in R functions.
So you'll often see this written as:

```{webr-r}
#| label: plot-simplified-call

ggplot(bechdel, 
       aes(x = budget_2013, y = gross_2013))
```

Note that the result is exactly the same.
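
One way to convince yourself that the two calls are equivalent is to build both plots and compare which variables end up mapped to `x` and `y`.
This is only a sketch: `toy_plot` is made-up data, `mapped_vars()` is a hypothetical helper written for this example, and it assumes the plot object exposes its mapping through `$mapping`.

```{webr-r}
library(ggplot2)

# Hypothetical toy data, not the bechdel data
toy_plot <- data.frame(
  budget_2013 = c(10, 50, 200),
  gross_2013  = c(40, 25, 600)
)

# The same call, with and without argument names
p_named <- ggplot(data = toy_plot,
                  mapping = aes(x = budget_2013, y = gross_2013))
p_positional <- ggplot(toy_plot,
                       aes(x = budget_2013, y = gross_2013))

# Hypothetical helper: names of the variables mapped to x and y
mapped_vars <- function(p) {
  c(x = rlang::as_label(p$mapping$x),
    y = rlang::as_label(p$mapping$y))
}

identical(mapped_vars(p_named), mapped_vars(p_positional))
```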

-   A `geom_xx()` function specifies the type of plot we want to use to represent the data. In the code below, we use `geom_point()`, which creates a plot where each observation is represented by a point.

```{webr-r}
#| label: plot-point
ggplot(bechdel, 
       aes(x = budget_2013, y = gross_2013)) +
  # type geom_point() here!
```






::: {.callout-tip collapse="true"}
## Coding Solutions

```{webr-r}
#| echo: false
library(ggplot2)
ggplot(bechdel, 
       aes(x = budget_2013, y = gross_2013)) +
  geom_point()
```
:::

Note that running this code also produces a warning.

The warning reports how many observations were left out of the plot because they are missing a value for `budget_2013` or `gross_2013`.
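
To see where the number in the warning comes from, you can count the rows that are missing an x or a y value.
The sketch below does this on made-up data (`toy_missing` is hypothetical); running the same count on `bechdel` should match the number the warning reports, assuming only those two variables are mapped.

```{webr-r}
# Hypothetical toy data with two incomplete rows, not the bechdel data
toy_missing <- data.frame(
  budget_2013 = c(10, NA, 200, 80),
  gross_2013  = c(40, 25, NA, 160)
)

# A point needs both coordinates, so a row is dropped if either is missing
plot_vars <- toy_missing[, c("budget_2013", "gross_2013")]
n_dropped <- sum(!complete.cases(plot_vars))
n_dropped
```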


## Assessment Questions 

1. What does each observation (row) in the data set represent?

**Answer**: Each observation represents a **movie**.
