# Lab 6: Group, Join, Conditionals, Iteration, Randomness
## Due Tuesday November 13th, 9:00am

Welcome to Lab 6! This week, we will get more practice grouping and joining tables, using iteration and simulation, and working with randomness and probability. This material is covered in [Chapter 9](https://www.inferentialthinking.com/chapters/09/randomness.html) and the prior chapters (e.g., [Chapter 8.4](https://www.inferentialthinking.com/chapters/08/4/Joining_Tables_by_Columns)).

In [None]:
# Don't change this cell; just run it. 

import numpy as np
from datascience import *

%matplotlib inline
import matplotlib.pyplot as plots
plots.style.use('fivethirtyeight')

from client.api.notebook import Notebook
ok = Notebook('lab06.ok')
_ = ok.auth(inline=True)

**Important**: The `ok` tests don't usually tell you that your answer is correct. More often, they help catch careless mistakes. It's up to you to ensure that your answer is correct. If you're not sure, ask someone (not for the answer, but for some guidance about your approach).

## 1. Extravaganza Lineup

Every spring, UCSB hosts Extravaganza, a one-day on-campus music festival. The following questions are based on last year's festival. The AS Program Board (an on-campus organization tasked with organizing entertainment events) sends out a survey to UCSB students asking for their suggestions for music artists. The instructions in the survey specify that each student should select a first choice artist (rank 1), a second choice artist (rank 2), and a third choice artist (rank 3). Run the following cell to see how the first several students responded.

In [None]:
survey = Table.read_table("survey.csv")
survey

After these responses come in, however, the AS Program Board notices that their survey form does not actually enforce that each participant choose a single first choice artist, a single second choice artist, and a single third choice artist. Run the cell below to see an example of a student who did not follow the survey's instructions. 

In [None]:
survey.where("Perm Number", are.equal_to(5978341))

The AS Program Board decides to identify all students who did not follow the survey's instructions, delete their votes from the table, and email the students to tell them that their votes did not comply with the rules, and that they should revote if they want to have a say in the Extravaganza lineup. 

The email addresses of all students are available in the student database, a portion of which is displayed in the table below.

In [None]:
database = Table.read_table("student_data.csv")
database

**Question 1.1** Use the survey data and the student database information to identify the students who did not follow the survey's instructions. Make an array called `violators` that contains the email addresses of all students who did not follow the survey's instructions.

**Hint 1:** *Use the `group` command with second argument `list` to see how each student voted.*

**Hint 2:** *Sort the data so that each student who voted correctly has a rank list that looks like `[1, 2, 3]`.*
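
As a rough illustration of the idea behind the hints, the sketch below shows in plain Python what collecting values into a list per key looks like. The data and the names `votes`, `ranks_by_voter`, and `sorted_ranks` are made up for this example, and it deliberately does not use the `datascience` library or the lab's tables.

```python
# Hypothetical (voter, rank) pairs; these are not taken from survey.csv.
votes = [("ana", 2), ("ben", 1), ("ana", 1), ("ben", 1), ("ana", 3), ("ben", 3)]

# Collect every rank submitted by each voter, similar in spirit to
# grouping by voter and collecting the ranks into a list.
ranks_by_voter = {}
for voter, rank in votes:
    ranks_by_voter.setdefault(voter, []).append(rank)

# Sorting each list makes a valid ballot easy to recognize as [1, 2, 3].
sorted_ranks = {voter: sorted(ranks) for voter, ranks in ranks_by_voter.items()}
print(sorted_ranks)
```

In this made-up data, `ana` ends up with `[1, 2, 3]`, while `ben` ends up with `[1, 1, 3]` and so would not match the expected pattern.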

In [None]:
violators = ...
violators

In [None]:
_ = ok.grade('q1_1')

**Question 1.2** Now delete the rows from the table `survey` that correspond to the voters in `violators`. After this, the table `survey` should have only valid votes in it.

In [None]:
survey = ...
survey

In [None]:
_ = ok.grade('q1_2')

After months of collecting votes and contacting artists, the 2017-18 UC Santa Barbara Extravaganza lineup has officially been released! Run the following cell to see a table of the scheduled performers and their respective numbers of Instagram followers (in thousands).

In [None]:
extravaganza_performers = ["Dillon Francis", "Charli XCX", "Cardi B", "Coast Modern"]
extravaganza_instagram = [2105, 3101, 35100, 15.5]

extravaganza_lineup = Table().with_columns("Artists", extravaganza_performers, "Instagram", extravaganza_instagram)
extravaganza_lineup.show()

As we come closer to the event, we get word that one of the performers is unable to make it to Extravaganza and has been replaced by another performer! Run the following cell to see a table of the new lineup and their respective numbers of Twitter followers (in thousands).

In [None]:
new_performers = ["Dillon Francis", "Charli XCX", "DRAM", "Coast Modern"]
new_twitter = [1057, 3249, 123, 8]

new_lineup = Table().with_columns("Performers", new_performers, "Twitter", new_twitter)
new_lineup.show()

**Question 1.3** Use the `join` method to join these two tables together so each row contains the name of the performer, their number of Instagram followers (in thousands), and their number of Twitter followers (in thousands). Save this new table into the variable `lineup_data`.

In [None]:
lineup_data = ...
lineup_data

In [None]:
_ = ok.grade('q1_3')

**Question 1.4** You should notice that a couple of artists are missing. Which ones are missing and why are they not in the new table?

<hr style="color:Maroon;background-color:Maroon;border:0 none; height: 3px;">
Replace this text with your answer
<hr style="color:Maroon;background-color:Maroon;border:0 none; height: 3px;">

**Question 1.5** Let's add `DRAM` back into the `lineup_data` table so that we can see all the artists who actually performed at Extravaganza last year. DRAM currently has 294.8 thousand Instagram followers and 122.5 thousand Twitter followers.

In [None]:
lineup_data = ...
lineup_data

In [None]:
_ = ok.grade('q1_5')

## 2. Getting Hold of Your Friend

You are excited to go to Extravaganza, but you don't want to go alone! You are trying to get hold of your friend to see if they want to go with you. However, each time you call your friend, the probability that they answer their phone is 1/3. If you call your friend two times today, what is the chance that you will talk to them?

Here is the equation to help you find the probability:

$$P(reaching\ your\ friend\ at\ least\ once\ in\ N\ times) = 1 - P(not\ reaching\ your\ friend\ all\ N\ times)$$

You can read more about this equation in the textbook [here](https://www.inferentialthinking.com/chapters/09/5/Finding_Probabilities#at-least-one-success), under "At Least One Success", which works through an example about the probability of rolling a 6 on a die.
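
As a small worked instance of the equation, here is a sketch for the die setting: the chance of rolling at least one 6 in three rolls of a fair die. It assumes the rolls are independent, and the variable names are chosen only for this illustration.

```python
# Chance of a 6 on a single roll of a fair die.
p_six = 1 / 6
rolls = 3

# Assuming independent rolls, the chance of no 6 on any roll
# is the product of the single-roll chances of "not a 6".
p_no_six = (1 - p_six) ** rolls

# Complement rule: at least one 6 is the opposite of no 6 at all.
p_at_least_one_six = 1 - p_no_six
print(round(p_at_least_one_six, 4))
```

Here the result works out to 91/216, or about 0.4213.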

**Question 2.1** Let's first calculate the probability that your friend will not answer the phone on either of the two calls.

In [None]:
no_answer = ...
no_answer

In [None]:
_ = ok.grade('q2_1')

**Question 2.2** Now that we have the probability that your friend answers neither call, let's calculate the probability that you will reach your friend at least once out of the two calls (using the formula from above).

In [None]:
answered = ...
answered

In [None]:
_ = ok.grade('q2_2')

## 3. Memes

<img src="silicon-meme.jpg" width=40%><img src="reaction-meme.jpg" width=40%>

Twitter has just hired you to analyze some of its most popular memes! Run the following cell to see a table of information on recent Twitter posts that contained memes of certain popular formats. For each Twitter post, the table contains
* The format of the meme in the post. For example, the format *Chemistry Cat* shows a cat dressed up as a scientist in a chemistry lab. The sign above shows a question that is answered below with a witty comment involving a chemical element or a chemistry concept. Two examples of a meme in this format are shown above.
* The Twitter handle (username) of the person who made the post.
* The number of retweets (shares).
* The number of likes.
* The number of days from when the post was generated to when you got the dataset.


In [None]:
memes = Table.read_table('memes.csv')
memes.show()

**Question 3.1** Twitter is interested in determining which meme formats get the most retweets and likes. Calculate the total number of retweets and likes associated with each of the meme formats, and save a table of these results in a variable called `retweets_likes`. Your table should have three columns, containing, from left to right:
* The format of the meme.
* The total number of retweets for all memes with this format.
* The total number of likes for all memes with this format.

In [None]:
retweets_likes = ...
retweets_likes

In [None]:
_ = ok.grade('q3_1')

**Question 3.2** The total number of retweets and likes should be taken relative to the number of days since the meme was posted, because memes that have been posted for longer will naturally have more of a chance to gather retweets and likes. For each meme format, calculate the number of days since a meme of that format was first posted, and add a column with these results to the table `retweets_likes`, saving your new table in a variable called `retweets_likes_age`.

In [None]:
retweets_likes_age = ...
retweets_likes_age

In [None]:
_ = ok.grade('q3_2')

**Question 3.3** Rank the meme formats by popularity, where the popularity of a meme format is measured as the total number of retweets and likes per day since the meme format was originally posted.

Create an array called `popular_memes` that contains the meme formats ranked by popularity, so that the most popular meme is first in the array, and the least popular meme is last.
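
To make the popularity measure concrete, here is a plain-Python sketch on made-up numbers. The format names, the totals, and the names `totals`, `rate`, and `ranked` are hypothetical, and the sketch does not use the `memes` table or the `datascience` library.

```python
# Hypothetical totals per format: (retweets, likes, days since first post).
totals = {
    "Format A": (300, 900, 10),
    "Format B": (500, 500, 4),
    "Format C": (100, 200, 30),
}

# Popularity as described above: (retweets + likes) per day.
rate = {name: (r + l) / d for name, (r, l, d) in totals.items()}

# Order the format names from most popular to least popular.
ranked = sorted(rate, key=rate.get, reverse=True)
print(ranked)
```

With these numbers the rates are 120, 250, and 10 per day, so `Format B` comes first and `Format C` last.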

In [None]:
popular_memes = ...
popular_memes

In [None]:
_ = ok.grade('q3_3')

## 4. Yahtzee 

In the dice game Yahtzee, players roll and reroll dice, trying to meet certain objectives. A player rolls five dice on the first roll, and after looking at the results, *can choose to* reroll any number of them on the second roll. Similarly, after looking at the results of the second roll, the player can choose to reroll any number of those for the third roll. After the third roll, no more rolling is allowed.

One objective in Yahtzee is to roll as many 6's as possible. The standard strategy is as follows:
* Roll all five dice.
* Keep any that are 6's. Reroll all other dice.
* Keep any that are 6's. Reroll all other dice.

The number of 6's at the end of this process determines the player's score.

**Question 4.1** Create an array called `my_dice` that contains the results of a first Yahtzee roll (that is, five random integers from 1 to 6, inclusive).

In [None]:
my_dice = ...
my_dice

In [None]:
_ = ok.grade('q4_1')

**Question 4.2** Define a function called `reroll()` that takes no inputs and does not return any value, but changes the contents of `my_dice` to show the results after one additional roll. Your function should implement the standard strategy for rolling 6's, that is, keep all dice that were a 6 and reroll all other dice.

**Hint** *You can test out your function by repeatedly rerolling. Since you are keeping all the 6's you ever roll, eventually you should get all 6's by repeatedly rerolling.*

In [None]:
def reroll():
    ...
        
reroll()
my_dice

Now, practice taking a complete turn at Yahtzee, and see how many 6's you can get! Re-run the code cell from Question 4.1 to roll new dice. Then use your `reroll()` function twice, and calculate the number of 6's you have at the end of your turn. 

In [None]:
# Practice taking a turn here. How many 6's did you get?


**Question 4.3** Now, use a `for` loop to help you take 100,000 turns at Yahtzee. On each turn, you should roll the dice, reroll them twice, and calculate the number of 6's you have at the end of your turn. Create an array called `sixes` that contains the number of 6's you had at the end of each turn. This array should have 100,000 entries.

**Hint** Try taking 10 turns with a `for` loop. Once you are sure you have that figured out, change it to 100,000 turns. It will take a little while (about a minute) for Python to perform the calculations when you are doing 100,000 turns.

**Hint** You may need more than one `for` loop. (Try using a nested `for` loop).

In [None]:
sixes = ...    
sixes

In [None]:
_ = ok.grade('q4_3')

**Question 4.4** Use the data you have collected to approximate the number of 6's you would expect to get in one turn using this strategy. Store your result in a variable called `expected_sixes`.

Note that this does not need to be a whole number. For example, if you collected data by repeatedly flipping three coins, you would say that the expected number of heads you see is about 1.5.
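
The coin example can be checked with a short simulation. The sketch below uses only Python's standard `random` module. The seed, the number of trials, and the variable names are arbitrary choices made for this illustration.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable
trials = 10_000

# For each trial, flip three fair coins and record the number of heads.
heads_counts = []
for _ in range(trials):
    flips = [random.choice(["H", "T"]) for _ in range(3)]
    heads_counts.append(flips.count("H"))

# The average over many trials approximates the expected number of heads.
average_heads = sum(heads_counts) / trials
print(average_heads)
```

The printed average should be close to 1.5 even though no single trial can produce 1.5 heads.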

In [None]:
expected_sixes = ...
expected_sixes

In [None]:
_ = ok.grade('q4_4')

**Question 4.5** Use the data you have collected to approximate the most commonly rolled number of 6's when taking a single turn using this strategy. Store your result in a variable called `most_common_sixes`.

Note that this does need to be a whole number, because it is not very common at all to roll a non-integer number of sixes in a single turn of Yahtzee.

In [None]:
most_common_sixes = ...
most_common_sixes

In [None]:
_ = ok.grade('q4_5')

Congratulations, you completed Lab 6!

To submit:

1. Select `Run All` from the `Cell` menu to ensure that you have executed all cells, including the test cells. 
2. Select **Save and Checkpoint** from the `File` menu.
3. Read through the notebook to make sure everything is fine.
4. Submit using the cell below.

In [None]:
_ = ok.submit()