# Data Visualization

## Assignment 5: Designing Plots for Communication

You can't learn technical subjects without hands-on practice. The assignments are an important part of the course. To submit this assignment you will need to make sure that you save your Jupyter notebook. 

Below are the links to 2 videos that explain:

1. [How to save your Jupyter notebook](https://youtu.be/0aoLgBoAUSA) and,       
2. [How to answer a question in a Jupyter notebook assignment](https://youtu.be/7j0WKhI3W4s).

<div class="alert alert-info" style="color:black">
    
### Assignment Learning Goals:

By the end of the module, students are expected to:

- Follow guidelines for best practices in visualization design.
- Adjust axes extents and formatting.
- Modify titles of figure elements.
- Choose appropriate color schemes for your data.
- Use pre-made and custom colour schemes.
- Selectively highlight and annotate data with color and text.
- Directly label data instead of using legends.
    

This assignment covers [Module 5](https://viz-learn.mds.ubc.ca/en/module5) of the online course. You should complete this module before attempting this assignment.
 
</div>

Any place you see `...`, you must fill in the function, variable, or data to complete the code. Replace the `None` and the `raise NotImplementedError # No Answer - remove if you provide an answer` line with your completed code and answers, then run the cell!

Note that some of the questions in this assignment will have hidden tests. This means that no feedback will be given as to the correctness of your solution. It will be left up to you to decide if your answer is sufficiently correct. These questions are worth 2 points.

In [None]:
# Import libraries needed for this assignment

from hashlib import sha1
import altair as alt
import pandas as pd
import test_assignment5 as t
# Handle large data sets without embedding them in the notebook
# alt.data_transformers.enable('data_server');

# 0. So we meet again...(Fun preamble - you are free to skip this)

It looks like we have some bad news to report. You may have already heard, but Betterflix was not approved. Copyright issues, they said. Can you believe it?

In the spirit of a true comic book villain, the most appropriate revenge would be for the EDV Party (Exploratory Data Visualization Party) to attempt world domination. Seems fair, right? Don't worry, there will be no bloodshed. We operate with class at EDV headquarters, so our plan is instead to deceive people into thinking that we are best suited to represent them and then reap personal benefits once in power. Creative, isn't it?

To convince the honorary citizens of the world to vote for us, we have made several key visualizations showing what an ideal candidate we are. The only way to stop us (hmm why are we telling you this again...) is to debunk these visualizations and show voters the errors of our ways.

Below is EDV's publicly distributed campaign material covering these four main areas:

1. The current dramatic increase in job dissatisfaction
2. EDV's previous record on increasing job wages for all 
3. Guns to the people
4. EDV's great approval rating

For each figure that we have made, we have also included the underlying data (imagine what a wonderful world it would be if that actually happened in real life...). You can find these datasets in the `data` folder. For each question in this assignment, you will have the following three tasks:

1. Decide whether you think the suggested plot and claims are misleading, and why. In addition to what you are learning in this module, you will need to draw on your knowledge from the last few modules about when certain plots are suitable and which common pitfalls to avoid.
2. To earn the people's trust, you need to recreate the example plot to show that you know what we did. This includes axis labels, legends, colors, titles, figure size: everything, as close as possible to the images we have pasted for each question below.
3. Create your own better version of the same figure.

![image.png](img/plotting_pm.png)

*Initial design credit to Joel Ostblom's lab partner who wishes to remain anonymous. Alterations made.*

Let the best candidate win... by which we mean us (Muah hah hah ha!).


# Motivation (The "serious" version)

In this assignment, we are going to analyze why certain plots are misleading and explore how we can improve them so that they reflect the data more accurately.

# 1. A Dramatic Decline in Job Satisfaction


###  The Data
The `cities-job-satisfaction.csv` dataset contains (fictional) data from different cities around the world.
These data measure the proportion of people who reported that they were dissatisfied with their work situation, both last year and this year. Each data point is a different city, and each value is the proportion of dissatisfied workers in that city.
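Because overlapping points are hard to judge by eye, it can help to count cities directly. The following is a minimal pandas sketch on hypothetical toy values (not the real dataset); the column names are assumed to match `dissatisfaction_last_year` and `dissatisfaction_this_year` from Question 1.5. It checks that every value is a valid proportion and counts how many cities fall into each cell of a 10 by 10 grid.

```python
import pandas as pd

# Hypothetical toy values, NOT the real cities-job-satisfaction.csv data.
# Each row is a city; each value is a proportion of dissatisfied workers.
toy = pd.DataFrame({
    "dissatisfaction_last_year": [0.01, 0.02, 0.02, 0.03, 0.05, 0.38],
    "dissatisfaction_this_year": [0.02, 0.01, 0.03, 0.02, 0.06, 0.41],
})

# Sanity check: every proportion should lie between 0 and 1.
in_range = bool(toy.apply(lambda col: col.between(0, 1)).all().all())

# Assign each value to one of 10 equal-width bins (bin 0 covers 0 to 0.1, etc.).
edges = [i / 10 for i in range(11)]
cells = pd.DataFrame({
    col: pd.cut(toy[col], bins=edges, labels=False, include_lowest=True)
    for col in toy.columns
})

# Number of cities in each (last-year bin, this-year bin) cell, largest first.
counts = cells.value_counts()
print(in_range)
print(counts)
```

In this toy example, five of the six cities share a single cell near 0%, which a semi-transparent scatter plot would not make obvious.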


**Question 1.1** 
    <br> {points: 1}

Read in the data `cities-job-satisfaction.csv`.  

*Assign your data to a variable named `job_satisfaction_df`*

In [None]:
job_satisfaction_df = None

# your code here
raise NotImplementedError # No Answer - remove if you provide an answer
job_satisfaction_df.head()

In [None]:
t.test_1_1(job_satisfaction_df)

### The Plot and the Claim

#### Plot

![image.png](img/job_satisfaction.png)


#### Claim
The EDV party claims that this plot proves that people are dramatically more dissatisfied at work this year compared to last year. The plot uses transparency for the points to prevent overplotting/oversaturation, which means that the entire dark area contains an even number of cities throughout.
In other words, there are as many cities whose dissatisfaction increased from ~0% to ~40% as there are cities that stayed around 0%.

If you vote for the EDV party, we will work tirelessly to end this spiral before it is too late!

**Question 1.2** 
<br> {points: 2}

Which of the following are some of the mistakes the plot above commits? 

Select all that apply:

i) The plot is over-saturated with points despite the use of transparency, which prevents the reader from drawing an accurate conclusion about the change in dissatisfied workers.

ii) The color choice is problematic, as green is more often used in plots with positive insights.

iii) The axis labels are improperly displayed with underscores for spaces which contributes to the inability to read the plot effectively. 

iv) The axis labels are not descriptive enough: `last_year` and `this_year` should state the actual year, such as 2021.

v) The data still contain problematic points, such as cities with more dissatisfied workers than actual workers.

vi) The point shapes are distracting; a square shape or a point would have been better at communicating the counted values.

vii) The subtitle does not effectively explain or support the graph.


Select all that apply and add them into a list named `answer1_2`. For example, if statement i and iv are both true, your solution will look like this:

```
answer1_2 = ["i", "iv"]
```

In [None]:
answer1_2 = None

# your code here
raise NotImplementedError # No Answer - remove if you provide an answer
answer1_2

In [None]:
t.test_1_2(answer1_2)

**Question 1.3** 
<br> {points: 5}

Recreate the plot shown above. Note that this is a lengthy question, so we are providing several different tests so you can check whether you are on the right track.
Here are a few things we will be checking.

- The correct [theme](https://github.com/vega/vega-themes). Take a look at the different options and make sure you are enabling the correct one. 
- Titles, subtitles, and axis labels are correct.
- Colour
- Mapping
- Mark
- Size of the circles (Hint: we used a multiple of 50)
- The test only accepts the colour when it is specified in one particular way ^
- The test is looking for the title to be specified in the `encode` section ^

^ Look at local vs. global settings to see why the test may not recognize a colour or title specified in one location, even though the graphs look the same.

*Save the plot in an object named `workers_og_plot`.*

In [None]:
workers_og_plot = None

# your code here
raise NotImplementedError # No Answer - remove if you provide an answer
workers_og_plot

In [None]:
t.test_1_3_1(workers_og_plot)

In [None]:
t.test_1_3_2(workers_og_plot)

In [None]:
t.test_1_3_3(workers_og_plot)

**Question 1.4** 
<br> {points: 2}

Would you say the plot above is misleading? 

A) Yes, the plot is misleading because the over-saturation of points prevents us from drawing a firm conclusion.

B) Yes, the plot appears to show a strong increase in dissatisfaction at work, when the data clearly show otherwise.

C) Yes, the plot appears to show a decrease in dissatisfaction at work, yet the labels suggest otherwise.

D) No, the plot correctly reflects the dramatic increase in dissatisfaction at work from last year to this year that is present in the data.

*Answer in the cell below using the uppercase letter associated with your answer. Place your answer between `""`, assign the correct answer to an object called `answer1_4`.*

In [None]:
answer1_4 = None

# your code here
raise NotImplementedError # No Answer - remove if you provide an answer
answer1_4

In [None]:
# check that the variable exists
assert 'answer1_4' in globals(
), "Please make sure that your solution is named 'answer1_4'"

# This test has been intentionally hidden. It will be up to you to decide if your solution
# is sufficiently good.

**Question 1.5** 
<br> {points: 5}

Improve the plot provided by the EDV party. 

This is a bit of an ambiguous question so we are providing you with some directions to help pass the tests.

- Use the "default" theme.
- Use `.mark_rect()` for the plotting type. 
- Make sure you are mapping the columns `dissatisfaction_last_year` and `dissatisfaction_this_year` to the x and y-axis respectively and the count of observations to the colour channel. 
- For the x-axis, set `maxbins` to 30. 
- For the y-axis set `maxbins` to 30 and `extent=(0, 1)` within `alt.Bin()`.
- We will be checking for proper, well-formatted titles, subtitles, and axis labels.
- Format the axis labels so that they show percentages instead of proportions.


*Save the plot in an object named `workers_new_plot`.*

In [None]:
workers_new_plot = None

# your code here
raise NotImplementedError # No Answer - remove if you provide an answer
workers_new_plot

In [None]:
t.test_1_5_1(workers_new_plot)

In [None]:
t.test_1_5_2(workers_new_plot)

In [None]:
t.test_1_5_3(workers_new_plot)

**Question 1.6** 
    <br> {points: 1}
    
Is the amended plot `workers_new_plot` from **Question 1.5** communicating EDV's initial claim?

> People are dramatically more dissatisfied at work this year compared to last year. 


A) Yes, people are dramatically more dissatisfied this year compared to last year, since we see the majority of people who had low percentages of dissatisfaction last year migrate to much higher percentages of dissatisfaction.

B) Yes, we can clearly see that the upper diagonal has many more counts, showing that people have increased their levels of dissatisfaction.

C) No. Although many people who had low levels of dissatisfaction last year now have higher levels, the narrative that "people are dramatically more dissatisfied" is not communicated in this plot.

D) No, it appears that people have generally become more satisfied with their jobs.

*Answer in the cell below using the uppercase letter associated with your answer. Place your answer between `""`, assign the correct answer to an object called `answer1_6`.*

In [None]:
answer1_6 = None

# your code here
raise NotImplementedError # No Answer - remove if you provide an answer
answer1_6

In [None]:
t.test_1_6(answer1_6)

**Question 1.7** 
    <br> {points: 1}
    
Which of the following claims is accurate? 

A) Job dissatisfaction dramatically increased from the previous year to this year.

B) Job dissatisfaction slightly increased from the previous year to this year.

C) Job dissatisfaction dramatically decreased from the previous year to this year.

D) Job dissatisfaction slightly decreased from the previous year to this year.

E) Cannot *firmly* make any of the claims above. 

*Answer in the cell below using the uppercase letter associated with your answer. Place your answer between `""`, assign the correct answer to an object called `answer1_7`.*

In [None]:
answer1_7 = None

# your code here
raise NotImplementedError # No Answer - remove if you provide an answer
answer1_7

In [None]:
t.test_1_7(answer1_7)

# 2. Higher wage rates for all!

### The data

Take a look at `wages.csv`. It contains (fictional) data from different companies in the province where EDV was elected and has governed for the last few years. Each data point represents the average hourly wage an employee makes at a company.
All the major companies in this province were surveyed **before and after EDV came to power**.
However, the data are not paired: some companies appear only in the "before" survey or only in the "after" survey, so you do not need to do a paired analysis.
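Since the two surveys are not paired, a natural first step is to summarize each group separately. A minimal pandas sketch on hypothetical toy wages, assuming the group column is named `when` and the wage column `wage` (as in Question 2.7):

```python
import pandas as pd

# Hypothetical toy data: average hourly wage at each surveyed company.
toy = pd.DataFrame({
    "when": ["before"] * 5 + ["after"] * 5,
    "wage": [20, 21, 22, 40, 41, 15, 16, 30, 31, 32],
})

# Summarize each (unpaired) group separately.
summary = toy.groupby("when")["wage"].agg(["median", "count", "min", "max"])
print(summary)
```

In this toy example the median rises from 22 to 30 even though the lowest wage falls from 20 to 15, which is exactly the kind of detail a single summary statistic can hide.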


**Question 2.1** 
    <br> {points: 1}

Read in the `wages.csv` data.  

*Assign your dataframe to a variable named `wages_df`*

In [None]:
wages_df = None

# your code here
raise NotImplementedError # No Answer - remove if you provide an answer
wages_df.head()

In [None]:
t.test_2_1(wages_df)

### The Plot and the Claim

#### Plot

![image.png](img/wage.png)


#### Claim

EDV claims that they have been able to produce a significant increase in the hourly wage for workers in the province.
This large raise almost certainly entails that most (if not all)
workers are making more money now than before EDV came into power.
EDV is trying to back up its claims with more robust estimates, so it has provided the median increase (which is even more notable) and the number of workers in each group (so that you can see that the samples are balanced).
This proves that if you vote for the EDV party in the next election for Plotting Prime Minister, there will be higher salaries for all workers!

**Question 2.2** 
<br> {points: 2}

Which of the following mistakes do the plots above commit? 

Select all that apply:

i) Important descriptive titles are omitted from the above plots

ii) The color choice is problematic

iii) The distribution of the data is hidden

iv) The scales do not all start at 0 which can distort the difference between the before and after statistics

v) The plot sizes distort the visibility of the data


Select all that apply and add them into a list named `answer2_2`. For example, if statement i and iv are both true, your solution will look like this:

```
answer2_2 = ["i", "iv"]
```

In [None]:
answer2_2 = None

# your code here
raise NotImplementedError # No Answer - remove if you provide an answer
answer2_2

In [None]:
t.test_2_2(answer2_2)

It's time to try your best to recreate the plots shown above. 
We are splitting this into 2 questions, 1 for each of the plots.

**Question 2.3** 
<br> {points: 2}


Recreate the first (median) bar plot of wages before and after the EDV party came to power.

Since this can be a little difficult to get perfectly right, here are a few things we will be checking:

- The correct [theme](https://github.com/vega/vega-themes). Take a look at the different options and make sure you are enabling the correct one. 
- Titles, subtitles, and axis labels are correct and correctly formatted.
- Colour (you do not need to set specific colour tones here, since the default palette is what we are looking for; however, something else needs to be added to this encoding)
- Mapping
- Mark
- Scales 
- The bars are displayed in the correct order
- No legend is shown for the color channel

*Save the plot in an object named `median_plot`.*

In [None]:
median_plot = None

# your code here
raise NotImplementedError # No Answer - remove if you provide an answer
median_plot


In [None]:
t.test_2_3(median_plot)

**Question 2.4** 
<br> {points: 3}


Ok, and now for the last one! Recreate the count bar plot of wages before and after the EDV party came to power.

Again, here are a few things we will be checking for:

- The correct [theme](https://github.com/vega/vega-themes). Take a look at the different options and make sure you are enabling the correct one. 
- Titles, subtitles, and axis labels are correct and correctly formatted.
- Colour
- Mapping
- Mark
- Scales
- The bars are displayed in the correct order
- No legend is shown for the color channel
- The text is displayed in the correct places in the correct colour. You may need to use some of the parameters such as `align`, `baseline`, `dx` in the [`mark_text()`](https://altair-viz.github.io/gallery/bar_chart_with_labels.html#gallery-bar-chart-with-labels) method

*Hint: In this case, you will need to layer 2 plots, 1 `base_plot` with the counted values, and a `text_plot` with the designated text. These two plots should be combined to make a final plot named `count_plot`.* 

In [None]:
base_plot = None
text_plot = None
count_plot = None

# your code here
raise NotImplementedError # No Answer - remove if you provide an answer
count_plot


In [None]:
t.test_2_4_1(base_plot)

In [None]:
t.test_2_4_2(text_plot)

**Question 2.5** 
<br> {points: 1}

Display the plots on top of each other using either [`.vconcat()`](https://altair-viz.github.io/user_guide/compound_charts.html?highlight=vconcat#vconcat-chart) or [`.hconcat()`](https://altair-viz.github.io/user_guide/compound_charts.html?highlight=vconcat#hconcat-chart) so that they are displayed like the plots above.

*Although this isn't taught in this module, we thought we would give you a sneak peek into the next module by allowing you an opportunity to use it here. Look at the documentation in the links provided for more details.*

*Save the combined plots in an object named `wages_og_plots`.*

In [None]:
wages_og_plots = None

# your code here
raise NotImplementedError # No Answer - remove if you provide an answer
wages_og_plots

In [None]:
t.test_2_5(wages_og_plots)

**Question 2.6** 
<br> {points: 2}

Are the plots above misleading? 

A) Yes, the plot appears to show the median when a better representation would have been the mean.

B) Yes, the plots above do not give a thorough idea of what the data look like, since they only plot two summary statistics.

C) Yes, the plot does not appear to show an increase in median wage, when we know that is not true.

D) No, the plot is correctly reflecting that EDV produced a significant increase in the average hourly wage for workers in the province.

*Answer in the cell below using the uppercase letter associated with your answer. Place your answer between `""`, assign the correct answer to an object called `answer2_6`.*

In [None]:
answer2_6 = None

# your code here
raise NotImplementedError # No Answer - remove if you provide an answer
answer2_6

In [None]:
# check that the variable exists
assert 'answer2_6' in globals(
), "Please make sure that your solution is named 'answer2_6'"

# This test has been intentionally hidden. It will be up to you to decide if your solution
# is sufficiently good.

**Question 2.7** 
<br> {points: 5}

Time to improve the plots provided by the EDV party! 

This is a bit of an ambiguous question so we are providing you with some directions to help pass the tests.

- Make a plot that has the following criteria and name this plot `wages_new_plots`.
- Use the "default" theme.
- Use `.mark_tick()` for the plotting type. 
- Make sure you are mapping the columns `wage` and `when` to the x and y-axis respectively. It's also useful to map the `when` column to the colour channel.
- For the x-axis, set the axis format to dollars (\$) and stop the axis from starting at zero (*Hint: [scale](https://stackoverflow.com/questions/62281179/how-to-adjust-scale-ranges-in-altair)*).
- For the y-axis, do not include an axis label.
- There is no need to include a legend for the colour channel in this case so make sure you remove it in the plot.

In [None]:
wages_new_plots = None

# your code here
raise NotImplementedError # No Answer - remove if you provide an answer
wages_new_plots 

In [None]:
t.test_2_7_1(wages_new_plots)

In [None]:
t.test_2_7_2(wages_new_plots)

In [None]:
t.test_2_7_3(wages_new_plots)

**Question 2.8** 
    <br> {points: 1}
    
Is the amended plot `wages_new_plots` from **Question 2.7** communicating EDV's initial claim?

> They have been able to produce a significant increase in the hourly wage for workers in the province. This large raise almost certainly entails that most (if not all) workers are making more money now than before EDV came into power.

A) Yes, the plots above appear to show that with EDV in power, most (if not all) companies are paying their workers more money now than before EDV came into power.

B) Yes, the plot shows an increase in median wage while EDV was in power.

C) No, the plot appears to show that with EDV in power, most companies are paying their workers less money now than before EDV came into power.

D) No, the plot appears to show that although many companies did increase wage rates for workers when EDV came into power, many companies' wage rates decreased significantly.

*Answer in the cell below using the uppercase letter associated with your answer. Place your answer between `""`, assign the correct answer to an object called `answer2_8`.*

In [None]:
answer2_8 = None

# your code here
raise NotImplementedError # No Answer - remove if you provide an answer
answer2_8

In [None]:
t.test_2_8(answer2_8)

# 3. Guns to the people

### The data

The `guns.csv` dataset contains yearly data for the number of murders by firearms in Florida.
This exercise is based on real [data](https://www.livescience.com/45083-misleading-gun-death-chart.html) and a real [visualization](https://cdn.mos.cms.futurecdn.net/h5MSdPM97fm55kTk4kk4P7-1024-80.jpg.webp) that attracted quite a lot of negative attention, which you can read more about online. However, we are going to be using fictitious data.

Background: The ["Stand your ground"](https://en.wikipedia.org/wiki/Stand-your-ground_law) law states that people may use extreme force to defend themselves against serious and life-threatening crimes.  



**Question 3.1** 
    <br> {points: 1}

Read in the data `guns.csv`.  

*Assign your data to a variable named `guns_df`*

In [None]:
guns_df = None

# your code here
raise NotImplementedError # No Answer - remove if you provide an answer
guns_df.head()

In [None]:
t.test_3_1(guns_df)

### The Plot and the Claim

#### Plot

![image.png](img/guns.png)


#### Claim

Looking at the plot above, we can see that when the gun law was enacted in Florida, deaths from firearms plummeted.
EDV believes that giving people guns lowers the number of deaths from gun violence. Why wouldn't we enact this?! 
The freedom to own a gun **and** fewer firearm-related deaths? Seems like a win to us! 


Vote for the EDV party in the next election for Plotting Prime Minister!

**Question 3.2** 
<br> {points: 2}

Which of the following mistakes does the plot above commit? 

Select all that apply:

i) The plot should not be using an area plot to communicate these results. 

ii) There is no proper title for the plot. 

iii) The scaling of the axes produces a range that's far too small and adds to the inability to read the plot effectively. 

iv) The y-axis label could be improved to be more informative and communicative.

v) The y-axis is reversed.

vi) The x-tick values are poorly formatted, which adds uncertainty to the axis values.

vii) The color choice is problematic. Red is typically associated with negative insights.


Select all that apply and add them into a list named `answer3_2`. For example, if statement i and iv are both true, your solution will look like this:

```
answer3_2 = ["i", "iv"]
```


In [None]:
answer3_2 = None

# your code here
raise NotImplementedError # No Answer - remove if you provide an answer
answer3_2

In [None]:
t.test_3_2(answer3_2)

**Question 3.3** 
<br> {points: 3}

It's important to know what's possible when making plots, so let's attempt to recreate the plot shown. At least now we can hope to educate the party on where they went wrong in their code. Do you think they will listen to reason?

This plot will be split into three questions (3.3 to 3.5) to walk you through the process, since this is a challenging plot to make.

We will start with the area plot (with no points or lines yet).

As we have done in the past, we are providing several different tests so you can check whether you are on the right track.

Start by making an area plot mapping `Year` and `Firearm murders` to the correct axes, using an opacity of 0.9 and a colour equal to `#b0272f`. Make sure that the x-axis has no gridlines and the y-axis has a reversed scale. Give the y-axis the correct label.

*Save the plot in an object named `area_plot`.*

In [None]:
area_plot = None

# your code here
raise NotImplementedError # No Answer - remove if you provide an answer
area_plot

In [None]:
t.test_3_3_1(area_plot)

In [None]:
t.test_3_3_2(area_plot)

In [None]:
t.test_3_3_3(area_plot)

**Question 3.4** 
<br> {points: 3}

Great! We have a base now! Next, we need to make a plot containing the text and add it to our base area plot.

First make a text mark using `.mark_text()`. The data that you use to make this plot should be filtered to only include the `Year` 1986 which is when the gun law was enacted. 
The font for your text should be size 14, white in color with `dx` and `dy` positions of 60 and -10 respectively. Set the text argument in `.encode()` to the desired label. You will need to use `text=alt.value(...)`. This plot should be named `text_plot`. 

We have added the two plots together for you and saved them in an object named `text_area_plot` (Order matters when we layer plots together). 

In [None]:
text_plot = None

# your code here
raise NotImplementedError # No Answer - remove if you provide an answer
text_area_plot = area_plot + text_plot
text_area_plot

In [None]:
t.test_3_4_1(text_plot)

In [None]:
t.test_3_4_2(text_area_plot)

**Question 3.5** 
<br> {points: 2}

Lastly, to complete this plot, layer `text_area_plot` with a circle plot chained from our base `area_plot`. The points should be black.

*Save this layered plot in an object named `guns_og_plot`.*

In [None]:
guns_og_plot = None

# your code here
raise NotImplementedError # No Answer - remove if you provide an answer
guns_og_plot

In [None]:
t.test_3_5(guns_og_plot)

That was a lot of work but we did it! Let's now discuss this plot. 

**Question 3.6** 
<br> {points: 2}

Would you say the plot above is misleading? 

A) Yes, because of the nature of the plot, it attempts to communicate that firearm murders decreased. 

B) Yes, although firearm murders did decrease over time it was not because of the gun law that was enacted. 

C) Yes, but it's because firearm murders steadily increased over time and not because of the gun law that was enacted.

D) No, the plot is correctly reflecting that firearm murders decreased after the gun law was enacted.

*Answer in the cell below using the uppercase letter associated with your answer. Place your answer between `""`, assign the correct answer to an object called `answer3_6`.*

In [None]:
answer3_6 = None

# your code here
raise NotImplementedError # No Answer - remove if you provide an answer
answer3_6

In [None]:
# check that the variable exists
assert 'answer3_6' in globals(
), "Please make sure that your solution is named 'answer3_6'"

# This test has been intentionally hidden. It will be up to you to decide if your solution
# is sufficiently good.

**Question 3.7** 
<br> {points: 6}

Using the question above as inspiration, improve the plot from EDV so that it communicates the data more effectively. 

We are providing you with some directions to help pass the tests.

- Convert the `Year` column in `guns_df` to dtype `datetime64`. You can do this either by loading the data again or by converting the existing column.
- Make an area plot and save it in an object named `area_good_plot`.
    - Map the `Year` and `Firearm murders` columns to the x and y-axis respectively.
    - Set the colour to `#b0272f` and opacity to 0.9.
    - Make sure to remove the gridlines on the x-axis.
    - Don't forget to add a title.
- As you did in **Question 3.4**, make another text mark plot named `text_good_plot` using `.mark_text()`.
    - For this plot, use data filtered to only include the `Year` 1986, which is when the gun law was enacted.
    - The font for your text should be size 16, the text should read "Gun law enacted" in white, and the `dx` and `dy` positions should be 60 and 15 respectively.
    - Set the text argument in `.encode()` to the desired label. You will need to use `text=alt.value(...)`.
- Chain from `area_good_plot` to make a line plot named `line_good_plot` using `.mark_line()`. You will only need to set its colour to black and its opacity to 1.
- Repeat the previous step, but make points on the graph using `.mark_circle()` and name the plot `circle_good_plot`. Again, set the colour and opacity to black and 1 respectively.
- Add `area_good_plot`, `line_good_plot`, `circle_good_plot`, and `text_good_plot` together to make a layered plot and name it `guns_new_plot`.

*Save the plot in an object named `guns_new_plot`.*
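A minimal pandas sketch of the conversion, on a hypothetical toy frame that mimics the column names of `guns_df` (the values are invented). After conversion, a datetime column no longer equals the integer 1986, so filtering compares on the `.dt.year` component instead; in Altair, such a column would be encoded with the temporal type `:T`.

```python
import pandas as pd

# Hypothetical toy frame with the same column names as guns_df (values invented).
toy = pd.DataFrame({"Year": [1984, 1985, 1986, 1987], "Firearm murders": [500, 480, 460, 700]})

# Parse the integer years as dates; "%Y" alone gives January 1st of each year.
toy["Year"] = pd.to_datetime(toy["Year"].astype(str), format="%Y")

# Filter on the year component of the datetime column.
event = toy[toy["Year"].dt.year == 1986]
print(toy.dtypes)
print(event)
```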

In [None]:
area_good_plot = None
text_good_plot = None
line_good_plot = None
circle_good_plot = None
guns_new_plot = None

# your code here
raise NotImplementedError # No Answer - remove if you provide an answer
guns_new_plot

In [None]:
t.test_3_7_1(area_good_plot)

In [None]:
t.test_3_7_2(text_good_plot)

In [None]:
t.test_3_7_3(line_good_plot)

In [None]:
t.test_3_7_4(circle_good_plot)

In [None]:
t.test_3_7_5(guns_new_plot)

**Question 3.8** 
    <br> {points: 1}
    
Is the amended plot `guns_new_plot` from **Question 3.7** communicating EDV's initial claim?

> Giving people guns lowers the death by gun violence. 


A) Yes, firearm deaths in Florida decreased slightly after the "Gun Law" was enacted in 1986.

B) Yes, firearm deaths in Florida decreased substantially after the "Gun Law" was enacted in 1986.

C) No, firearm deaths in Florida increased slightly since the "Gun Law" was enacted in 1986. 

D) No, firearm deaths in Florida increased substantially since the "Gun Law" was enacted in 1986. 

*Answer "Yes" or "No" as a string in an object called `answer3_8`.*

In [None]:
answer3_8 = None

# your code here
raise NotImplementedError # No Answer - remove if you provide an answer
answer3_8

In [None]:
t.test_3_8(answer3_8)

# 4. Astronomical Approval Ratings

### The data

Next on our list is to compare a sample of voters from the province EDV currently governs with voters from a neighbouring province whose citizens have very similar demographics, laws, and standards of living.

The (fictional) `province-comparison.csv` dataset contains voters from each province who rated the party in power on a scale from 0 to 10. 

**Question 4.1** 
    <br> {points: 1}

Read in the data `province-comparison.csv`.  

*Assign your data to a variable named `province_df`.*

In [None]:
province_df = None

# your code here
raise NotImplementedError # No Answer - remove if you provide an answer
province_df.head()

In [None]:
t.test_4_1(province_df)

**Question 4.2** 
    <br> {points: 1}

How many data points are there for each province in `province_df` (the data are split equally between the provinces)?

*Assign your answer to an object of type `int` named `data_points`.*
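`value_counts()` gives the number of rows per group; because it returns NumPy integers, wrap the value in `int()` if you need a plain Python `int`. A minimal sketch on hypothetical toy data (the province names and the `value` column are placeholders):

```python
import pandas as pd

# Hypothetical toy data; province names are placeholders.
toy = pd.DataFrame({
    "where": ["Province A", "Province B", "Province A", "Province B"],
    "value": [8, 5, 9, 6],
})

counts = toy["where"].value_counts()  # rows per province
per_group = int(counts.iloc[0])       # NumPy integer -> built-in int
print(counts)
print(per_group)
```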

In [None]:
data_points = None 

# your code here
raise NotImplementedError # No Answer - remove if you provide an answer
data_points

In [None]:
t.test_4_2(data_points)

### The Plot and the Claim

#### Plot

![image.png](img/provinces.png)


#### Claim

EDV's approval ratings are through the roof!
Clearly, we're doing a better job than the governing party in the province next door.
And as you can see, we sampled enough people to get smooth curves, which means that this difference is likely to be statistically significant and its magnitude is just too big to ignore.

If you want a high approval rating in the future, vote for the EDV party in the next election for Plotting Prime Minister!

**Question 4.3** 
<br> {points: 2}

Which of the following mistakes does the plot above commit? You may want to look at the data a bit more to get an idea of what's going on.  

Select all that apply:

i) The plot should not be using a density plot to communicate these results. 

ii) The title is not appropriate, informative or insightful. 

iii) The plot is created from very little data.

iv) The x-axis label could be improved to be more communicative.

v) The colours are too similar and instead should be more contrasting.


Select all that apply and add them into a list named `answer4_3`. For example, if statements i and iv are both true, your solution will look like this:

```
answer4_3 = ["i", "iv"]
```

In [None]:
answer4_3 = None

# your code here
raise NotImplementedError # No Answer - remove if you provide an answer
answer4_3

In [None]:
t.test_4_3(answer4_3)

**Question 4.4** 
<br> {points: 5}

Let's get to work and see how EDV made their plot. 

Just like in the previous questions, we are providing several different tests so you can see if you are on your way to getting the final visualization.

- Using the `province_df` data, use `transform_density()` to compute a density of `the value` column for each of the 2 groups in the `where` column (group with the `groupby` argument). Use the `as_` argument to name the output columns `the value` and `density`.
- Chaining from the previous method, create an area plot with `the value` mapped to the x-axis and `density` mapped to the y-axis.
- Format the x-axis tick labels with `format='s'`.
- Map the `where` column to the color channel, and set its scale range to the colours `steelblue` and `aquamarine`. Orient the legend at the top of the plot and do not give the legend a title.
- Remove the entire grid from the plot using the `grid` argument in `.configure_axis()`.
- Remove the border around the plot with the `strokeWidth` argument in `.configure_view()`.
- Give the plot an opacity of 0.4.


*Save the plot in an object named `province_og_plot`.*

In [None]:
province_og_plot = None

# your code here
raise NotImplementedError # No Answer - remove if you provide an answer
province_og_plot

In [None]:
t.test_4_4_1(province_og_plot)

In [None]:
t.test_4_4_2(province_og_plot)

In [None]:
t.test_4_4_3(province_og_plot)

In [None]:
t.test_4_4_4(province_og_plot)

In [None]:
t.test_4_4_5(province_og_plot)

**Question 4.5** 
<br> {points: 2}

Would you say the plot above is misleading? 

A) Yes, it covers up too much of the neighbouring province's density. 

B) Yes, the plot appears to communicate that EDV's province is rated significantly higher than the neighbouring province without revealing how much data is being plotted.

C) Yes, since it's really the neighbouring province that is rated significantly higher than EDV's.

D) No, the plot correctly reflects that EDV's province is rated higher than its neighbour.

*Answer in the cell below using the uppercase letter associated with your answer. Place your answer between `""` and assign it to an object called `answer4_5`.*

In [None]:
answer4_5 = None

# your code here
raise NotImplementedError # No Answer - remove if you provide an answer
answer4_5

In [None]:
# check that the variable exists
assert 'answer4_5' in globals(
), "Please make sure that your solution is named 'answer4_5'"

# This test has been intentionally hidden. It will be up to you to decide if your solution
# is sufficiently good.

**Question 4.6** 
<br> {points: 4}

Using the question above as inspiration, improve the plot from EDV so that it communicates the data more effectively. 

We are providing you with some directions to help you pass the tests.

- Make a plot using `mark_point()` with a point size equal to 50. 
- Give it a title and a subtitle expressing your insights or conclusions (you can do this with the `title` argument in `alt.Chart()`, setting `text` and `subtitle` in `alt.TitleParams()`).
- Map `the value` column to the x-axis and the `where` column to the y-axis. 
- Format the x-axis labels with `format='s'`. Do not forget to give the x-axis a communicative title as well. 
- The y-axis does not need an axis title, since the category names sufficiently explain what is being graphed. 
- Configure the axes so that the title and label font sizes are both 12.


*Save the plot in an object named `province_new_plot`.*

In [None]:
province_new_plot = None

# your code here
raise NotImplementedError # No Answer - remove if you provide an answer
province_new_plot

In [None]:
t.test_4_6_1(province_new_plot)

In [None]:
t.test_4_6_2(province_new_plot)

In [None]:
t.test_4_6_3(province_new_plot)

**Question 4.7** 
    <br> {points: 1}
    
Is the amended plot `province_new_plot` from **Question 4.6** communicating EDV's initial claim?

> EDV's doing a better job than the governing party in the province next door.


A) Yes, EDV's ratings are significantly higher than the party's in the neighbouring province. 

B) Yes, EDV's ratings are slightly higher than the party's in the neighbouring province. 

C) No, EDV's ratings are significantly lower than the party's in the neighbouring province. 

D) No, EDV's ratings are slightly lower than the party's in the neighbouring province. 

E) No, there isn't enough data to confidently conclude that EDV is rated better than the neighbouring provincial party.


*Answer in the cell below using the uppercase letter associated with your answer. Place your answer between `""` and assign it to an object called `answer4_7`.*

In [None]:
answer4_7 = None

# your code here
raise NotImplementedError # No Answer - remove if you provide an answer
answer4_7

In [None]:
t.test_4_7(answer4_7)

## Before Submitting 

Before submitting your assignment please do the following:

- Read through your solutions
- **Restart your kernel, clear output and rerun your cells from top to bottom** 
- Make sure that none of your code is broken 
- Verify that the tests from the questions you answered have obtained the output "Success"

This is a simple way to make sure that you are submitting all the variables needed to mark the assignment. This method should help avoid losing marks due to changes in your environment.  

## Attributions

- MDS DSCI 531: Data Visualization I - [MDS's GitHub website](https://github.com/UBC-MDS/DSCI_531_viz-1) 
