# "Fun with Loot Boxes" Lab

> Author: Caroline Schmitt, Matt Brems

### Scenario:

You're an analyst for [Zynga](https://en.wikipedia.org/wiki/Zynga), a gaming studio working on an event for an MMO (massively multiplayer online) game. This event is going to include **loot boxes**.

<img src="https://vignette.wikia.nocookie.net/2007scape/images/0/06/Culinaromancer%27s_chest.png/revision/latest?cb=20180403231423" alt="drawing" width="150"/> 

A loot box is basically a treasure chest in a game. This loot box can be opened to reveal a variety of items: some items are very rare and valuable, other items are common and less valuable. (You may consult [the esteemed Wikipedia](https://en.wikipedia.org/wiki/Loot_box) for a more extensive definition.)

In our specific game, suppose that loot boxes can be obtained in one of two ways: 
- After every three hours of playing the game, a user will earn one loot box.
- If the user wishes to purchase a loot box, they may pay $1 (in real money!) for a loot box.

These loot boxes are very good for our business!
- If a player earns a loot box, it means they are spending lots of time on the game. This often leads to advertisement revenue, word-of-mouth referrals to friends, etc.
- If the player purchases a loot box, it means we've earned $1 from our customer.

Suppose each loot box is opened to reveal either:
- magical elixir (super rare, very valuable), or
- nothing.

Whether each loot box contains the elixir or nothing is **random**. Our boss wants some guidance on what sort of randomness to use on these loot boxes! 
- If the magical elixir is too rare, then users may not be motivated to try to get them, because they believe they'll never find the magical elixir.
- If the magical elixir is too common, then users may not be motivated to try to get them, because the game has so much of the magical elixir that it isn't worthwhile to try to get it.

However, our boss isn't a math-y type person! When explaining things to our boss, we need to explain the impact of our choices on the game as concretely as possible.

### Version 1
**NOTE**: No Python is needed to answer questions 1, 2, 3a, and 3b.


In our first version of the game, we'll say that loot boxes contain magical elixir 15% of the time and nothing 85% of the time.

#### 1. Our boss asks, "If a user buys 100 loot boxes, how many elixirs will they get?" How would you respond?

**NOTE**: When your Jupyter notebook is open, double-click on this Markdown cell! You can delete this text and put your answer to the previous problem in here.

In [1]:
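# ANSWER: If a user buys 100 loot boxes and each box contains elixir 15% of the time, the user should expect
#         about 15 elixirs on average (100 * 0.15). The actual count is random, so it will usually land
#         near 15 rather than exactly on it.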
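
The "about 15" figure is the mean of a binomial count, and the typical spread around it can be read off the same distribution. A minimal sketch, assuming the 100 boxes are independent and each has the 15% elixir rate from Version 1 (the variable names are illustrative):

```python
from scipy import stats

n_boxes = 100     # loot boxes the user buys
p_elixir = 0.15   # Version 1 elixir drop rate

elixir_dist = stats.binom(n_boxes, p_elixir)

expected_elixirs = elixir_dist.mean()   # n * p
typical_spread = elixir_dist.std()      # sqrt(n * p * (1 - p))
low, high = elixir_dist.interval(0.95)  # central range holding roughly 95% of outcomes

print(f"Expected elixirs: {expected_elixirs:.1f}")
print(f"Standard deviation: {typical_spread:.2f}")
print(f"Most users land between {int(low)} and {int(high)} elixirs")
```

For a non-math-y audience, the range is often easier to communicate than the standard deviation.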

#### 2. Our boss asks, "How many loot boxes does someone have to purchase in order to definitely get elixir?" How would you respond?

**NOTE**: When your Jupyter notebook is open, double-click on this Markdown cell! You can delete this text and put your answer to the previous problem in here.

In [2]:
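# No number of loot boxes can guarantee an elixir: every box is an independent 15% chance, so a long unlucky
# streak is always possible. The chance of at least one elixir in n boxes is 1 - 0.85**n. That is about 80%
# after 10 boxes and passes 99% at 29 boxes, but it never reaches 100%.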
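
The "at least one elixir" chance can be tabulated for a few confidence targets. This is a sketch under the assumption that boxes are independent with a 15% elixir rate; the helper names `chance_of_at_least_one` and `boxes_needed` are made up for illustration.

```python
import math

p_elixir = 0.15  # Version 1 elixir drop rate


def chance_of_at_least_one(n_boxes, p=p_elixir):
    """Probability that n_boxes independent boxes contain at least one elixir."""
    return 1 - (1 - p) ** n_boxes


def boxes_needed(target, p=p_elixir):
    """Smallest number of boxes whose chance of at least one elixir reaches `target`."""
    return math.ceil(math.log(1 - target) / math.log(1 - p))


for target in (0.50, 0.90, 0.99):
    n = boxes_needed(target)
    print(f"{target:.0%} target: {n} boxes (actual chance {chance_of_at_least_one(n):.1%})")
```

Because `1 - 0.85**n` approaches but never equals 1, asking for a 100% target has no finite answer.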

#### 3. Our boss asks, "If a user earns 100 loot boxes, what is the chance that a user gets more than 20 elixirs?" This is a bit more complicated, so let's break it down before answering.

#### 3a. Let's suppose my random variable $X$ counts up how many elixirs I observe out of my 100 loot boxes. Why is $X$ a discrete random variable?

**NOTE**: When your Jupyter notebook is open, double-click on this Markdown cell! You can delete this text and put your answer to the previous problem in here.

In [3]:
# A discrete random variable has a countable set of possible outcomes. X can only be a whole number from
# 0 to 100 (we can't observe 2.5 elixirs), so X is discrete.

#### 3b. Recall our discrete distributions: discrete uniform, Bernoulli, binomial, Poisson. Let's suppose my random variable $X$ counts up how many elixirs I observe out of my 100 loot boxes. What distribution is best suited for $X$? Why?
- Hint: It may help to consider getting the magical elixir a "success" and getting nothing a "failure." 

**NOTE**: When your Jupyter notebook is open, double-click on this Markdown cell! You can delete this text and put your answer to the previous problem in here.

In [4]:
# A binomial distribution is best suited for X. We have a fixed number of independent "trials" (100 loot boxes),
# each trial has the same 15% chance of "success" (an elixir), and X counts the number of successes.
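
One way to check the binomial claim is to simulate the individual success/failure trials and compare the totals with the binomial mean `n * p` and variance `n * p * (1 - p)`. A rough sketch; the seed and the number of simulated users are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed for reproducibility

n_boxes = 100      # trials per user
p_elixir = 0.15    # chance of success on each trial
n_users = 20_000   # hypothetical number of simulated users

# Each row is one user; each entry is True when that box held an elixir
boxes = rng.random((n_users, n_boxes)) < p_elixir
elixirs_per_user = boxes.sum(axis=1)

sim_mean = elixirs_per_user.mean()
sim_var = elixirs_per_user.var()

print(f"Simulated mean {sim_mean:.2f} vs binomial mean {n_boxes * p_elixir:.2f}")
print(f"Simulated variance {sim_var:.2f} vs binomial variance {n_boxes * p_elixir * (1 - p_elixir):.2f}")
```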

#### 3c. Our boss asks, "If a user earns 100 loot boxes, what is the chance that a user gets more than 20 elixirs?" Use the probability mass function to answer the boss' question.

In [5]:
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats

In [6]:
# Your answer should match your result for 3d
# Show your work; leave your answer in a comment.

# We need n trials, p probability of success, and x successes

p_elixir = .15  # the drop rate for elixir
n_boxes = 100   # the number of loot boxes the user opens

elixir_dist = stats.binom(n_boxes, p_elixir) 

c_sum = 0
for i in range(21, n_boxes + 1):   # "more than 20" means 21, 22, ..., 100 elixirs
    c_sum += elixir_dist.pmf(i)    # .pmf(i) is the probability of getting exactly i elixirs
        
print(c_sum) # The probability that a user gets more than 20 elixirs when opening 100 boxes


0.06631976581888226


In [7]:
# If a user earns 100 loot boxes, there is about a 6.6% chance that they get more than 20 elixirs.

#### 3d. Our boss asks, "If a user earns 100 loot boxes, what is the chance that a user gets more than 20 elixirs?" Use the cumulative distribution function to answer the boss' question.

In [8]:
# Your answer should match your result for 3c
# Show your work; leave your answer in a comment.

# P(X > 20) = 1 - P(X <= 20)
1 - elixir_dist.cdf(20)

0.06631976581888166

In [9]:
# If a user earns 100 loot boxes, there is about a 6.6% chance that they get more than 20 elixirs.
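
SciPy also exposes the survival function `sf`, which returns the probability of exceeding a value directly and can serve as a cross-check on the `1 - cdf` calculation. A minimal sketch using the same Version 1 parameters:

```python
from scipy import stats

elixir_dist = stats.binom(100, 0.15)  # 100 boxes, 15% elixir rate

via_cdf = 1 - elixir_dist.cdf(20)  # complement of P(X <= 20)
via_sf = elixir_dist.sf(20)        # survival function: P(X > 20)

print(via_cdf, via_sf)
```

The two values should agree up to floating-point rounding.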

**NOTE**: You should get the same result for questions 3c and 3d. Double check to make sure this is the case.

#### 3e. Our boss asks, "If a user earns 100 loot boxes, what is the chance that a user gets more than 20 elixirs?" Answer your boss' question. *Remember that your boss is not a math-y person!*

In [10]:
# If a user earns 100 loot boxes, there is roughly a 1-in-15 chance (about 6.6%) that they get more than 20 elixirs.

#### 4. Your boss wants to know how many people purchased how many loot boxes last month. 
> For example, last month, 70% of users did not purchase any loot boxes. 10% of people purchased one loot box. 5% of people purchased two loot boxes... and so on.

#### 4a. Recall our discrete distributions: discrete uniform, Bernoulli, binomial, Poisson. Let's suppose my random variable $Y$ counts up how many loot boxes each person purchased through the game last month. What distribution is best suited for $Y$? Why?

**NOTE**: When your Jupyter notebook is open, double-click on this Markdown cell! You can delete this text and put your answer to the previous problem in here.

In [11]:
# A Poisson distribution is best suited for Y because we are counting the number of events
# (loot box purchases) per user over a fixed time period (one month), with no fixed upper limit on the count.

#### 4b. Suppose that, on average, your customers purchased 2.7 loot boxes last month. In order for your revenue to be at least $500,000, at least how many users would you need on your platform? (Round your answer up to the nearest thousand.) 

In [12]:
# Show your work; leave your answer in a comment.
# Each loot box costs $1, so expected revenue = number of users * 2.7 boxes * $1
500_000 / 2.7

185185.18518518517

In [13]:
# 185,186 users would be just enough. Rounding up to the nearest thousand, we would need 186,000 users
# on our platform for our expected revenue to be at least $500,000.
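
The rounding-up step can be made explicit with `math.ceil`. A short sketch, assuming the $1 price per box from the scenario; the variable names are illustrative.

```python
import math

target_revenue = 500_000   # dollars
avg_boxes_per_user = 2.7   # average purchases per user last month
price_per_box = 1          # dollars per loot box

exact_users = target_revenue / (avg_boxes_per_user * price_per_box)
min_users = math.ceil(exact_users)                      # smallest whole number of users
users_rounded_up = math.ceil(exact_users / 1000) * 1000  # rounded up to the nearest thousand

print(min_users, users_rounded_up)
```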

#### 4c. Assume that your platform has the number of users you mentioned in your last answer. Suppose that your platform calls anyone who purchases 5 or more loot boxes in a month a "high value user", and anyone who purchases 0-4 loot boxes in a month a “low value user”. How much money do you expect to have earned from "high value users?" How about "low value users?"

**Hints**: 
- Remember that users buying different amounts of loot boxes will each contribute a different amount of money.
- You only need to calculate either HVU (high value users) or LVU (low value users) revenue. You can calculate one of these and then subtract it from the total expected earnings to find the other. *Hint: Which one is possible to calculate?*

In [28]:
# Show your work; leave your answer in a comment.
# People who purchase 5 or more loot boxes in a month are "high value users"
# People who purchase 0-4 loot boxes in a month are "low value users"
# How much money do we expect to earn from "high value users" and "low value users"

# This is the average amount of purchased loot boxes last month per person
average_boxes_purchased = 2.7 

boxes_dist = stats.poisson(average_boxes_purchased)
boxes_dist.cdf(4)

0.8629078626825668

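In [29]:
# Money generated from low value users: a user who buys k boxes pays $k, so weight each
# purchase count k = 0..4 by its probability, then scale by the 186,000 users from 4b
n_users = 186_000
low_value_money = n_users * sum(k * boxes_dist.pmf(k) for k in range(5))
f" ${round(low_value_money, 2)} is the money generated from 'low value users'"

" $358617.09 is the money generated from 'low value users'"

In [30]:
# Money generated from high value users: total expected revenue minus the low value share
total_money = n_users * average_boxes_purchased
high_value_money = total_money - low_value_money
f" ${round(high_value_money, 2)} is the money generated from 'high value users'"

" $143582.91 is the money generated from 'high value users'"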

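A revenue split like this can be sanity-checked by simulation: draw a purchase count for every user, then add up the dollars on each side of the 5-box cutoff. This is only a sketch, assuming purchases follow a Poisson distribution with mean 2.7, a platform of 186,000 users, and $1 per box; the seed is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(7)  # arbitrary seed for reproducibility

n_users = 186_000          # platform size from 4b
avg_boxes_per_user = 2.7   # Poisson mean
price_per_box = 1          # dollars per loot box

# One simulated monthly purchase count per user
purchases = rng.poisson(avg_boxes_per_user, size=n_users)

is_high_value = purchases >= 5
high_value_revenue = int(purchases[is_high_value].sum()) * price_per_box
low_value_revenue = int(purchases[~is_high_value].sum()) * price_per_box
total_revenue = high_value_revenue + low_value_revenue

print(f"Share of users who are high value: {is_high_value.mean():.1%}")
print(f"Low value revenue:  ${low_value_revenue:,} ({low_value_revenue / total_revenue:.1%} of total)")
print(f"High value revenue: ${high_value_revenue:,} ({high_value_revenue / total_revenue:.1%} of total)")
```

High value users are a small share of the user base but a noticeably larger share of revenue, because each of them buys at least five boxes.
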
#### 4d. Suppose that you want to summarize how many people purchased how many loot boxes last month for your boss. Since your boss isn't math-y, what are 2-4 summary numbers you might use to summarize this for your boss? (Your answers will vary here - use your judgment!)

In [None]:
# On average, each user purchased 2.7 loot boxes last month
# About 86% of users purchased 0-4 loot boxes; the remaining 14% or so purchased 5 or more
# We would need at least 185,186 users on our platform to generate $500,000

**NOTE**: When your Jupyter notebook is open, double-click on this Markdown cell! You can delete this text and put your answer to the previous problem in here.

#### 5. Your boss asks "How many loot boxes does it take before someone gets their first elixir?" Using `np.random.choice`, simulate how many loot boxes it takes someone to get their first elixir. 
- Start an empty list.
- Use control flow to have someone open loot boxes repeatedly.
- Once they open a loot box containing an elixir, record the number of loot boxes it took in the empty list.
- Repeat this process 100,000 times. 

This simulates how long it takes for someone to open a loot box containing elixir. Share the 5th, 25th, 50th, 75th, and 95th percentiles.

> You may find [this documentation](https://docs.scipy.org/doc//numpy-1.10.4/reference/generated/numpy.random.choice.html)  and [this documentation](https://docs.scipy.org/doc/numpy/reference/generated/numpy.percentile.html) helpful.

In [None]:
import numpy as np

In [35]:
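possible_outcomes = np.arange(1, 101)  # 100 equally likely outcomes; 1-15 mean elixir (15% chance)
n_trials = 100_000                     # number of times to repeat the "open until elixir" process

# getting the number of boxes until you get an elixir
number_of_boxes_until_elixir = [] # Create an empty list

count = 0
while len(number_of_boxes_until_elixir) < n_trials:
    open_box = np.random.choice(possible_outcomes) # Open a loot box
    count += 1                                     # Count every box opened, elixir or not

    if open_box <= 15:                             # This box contains an elixir
        number_of_boxes_until_elixir.append(count) # Record how many boxes it took
        count = 0                                  # Start counting again for the next trial
number_of_boxes_until_elixir[:15]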

[3, 19, 6, 5, 6, 4, 1, 12, 7, 2, 7, 1, 1, 5, 5]

In [34]:
list_of_percentiles = [5, 25, 50, 75, 95] # list of all percentiles

percentile_solutions = [np.percentile(number_of_boxes_until_elixir, percentile) for percentile in list_of_percentiles]
percentile_solutions

[1.0, 2.0, 5.0, 9.0, 19.0]
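
The number of boxes opened up to and including the first elixir follows a geometric distribution, so the simulated percentiles can be compared with theoretical ones. A sketch, assuming independent boxes with a 15% elixir rate; `rng.geometric` is used here as a faster stand-in for the box-by-box loop, and the seed is arbitrary.

```python
import numpy as np
from scipy import stats

p_elixir = 0.15
percentiles = [5, 25, 50, 75, 95]

# Theoretical percentiles from the geometric distribution (support starts at 1 box)
wait_dist = stats.geom(p_elixir)
theoretical = [int(wait_dist.ppf(q / 100)) for q in percentiles]

# Vectorized simulation of 100,000 waiting times
rng = np.random.default_rng(0)  # arbitrary seed for reproducibility
simulated_waits = rng.geometric(p_elixir, size=100_000)
simulated = [int(value) for value in np.percentile(simulated_waits, percentiles)]

print("Theoretical percentiles:", theoretical)
print("Simulated percentiles:  ", simulated)
print("Average boxes until first elixir:", simulated_waits.mean())
```

The average wait should sit near `1 / p_elixir`, or roughly 6.7 boxes, which is another concrete number a non-math-y audience can use.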

**NOTE**: When your Jupyter notebook is open, double-click on this Markdown cell! You can delete this text and put your answer to the previous problem in here.

### Version 2 
**NOTE**: No Python is necessary for these last two questions.

After a substantial update to the game, suppose every loot box can be opened to reveal *one of four different* items:
- magical elixir (occurs 1% of the time, most valuable)
- golden pendant (occurs 9% of the time, valuable)
- steel armor (occurs 30% of the time, semi-valuable)
- bronze coin (occurs 60% of the time, least valuable)

#### 6. Suppose you want to repeat problem 5 above for the version 2 loot boxes, so you can track how many loot boxes are needed to get each item. (e.g. You'd like to be able to say that on average it takes 10 trials to get a golden pendant, 3 trials to get steel armor, and so on.) What Python datatype is the best way to store this data? Why?

**NOTE**: When your Jupyter notebook is open, double-click on this Markdown cell! You can delete this text and put your answer to the previous problem in here.

In [None]:
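# A dictionary. Each item name (e.g. 'golden pendant') is a key, and its value is the list of how many
# loot boxes it took to get that item in each simulation. This keeps the results for all four items
# together and easy to look up by name.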
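
A dictionary of lists for the Version 2 items might look like the following sketch. It assumes the four drop rates listed above; the seed, the number of trials, and the 3,000-box cap per trial are arbitrary choices for illustration, with the cap set high enough that even the 1% elixir almost always appears.

```python
import numpy as np

items = ["magical elixir", "golden pendant", "steel armor", "bronze coin"]
drop_rates = [0.01, 0.09, 0.30, 0.60]

rng = np.random.default_rng(42)  # arbitrary seed for reproducibility
n_trials = 1_000                 # hypothetical number of simulated players
boxes_per_trial = 3_000          # boxes opened per player (illustrative cap)

# Key: item name. Value: list of "boxes until first drop", one entry per trial.
boxes_until_first = {item: [] for item in items}

for _ in range(n_trials):
    # Each draw is the index of the item found in that box
    draws = rng.choice(len(items), size=boxes_per_trial, p=drop_rates)
    for idx, item in enumerate(items):
        hits = np.flatnonzero(draws == idx)
        if hits.size:  # skip the rare trial where the item never dropped
            boxes_until_first[item].append(int(hits[0]) + 1)

average_boxes = {item: float(np.mean(counts)) for item, counts in boxes_until_first.items()}
for item, avg in average_boxes.items():
    print(f"{item}: about {avg:.1f} boxes on average")
```

Looking results up by item name (`boxes_until_first["steel armor"]`) is the main convenience over parallel lists.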

#### 7. Suppose you and your boss want to measure whether "Version 2" is better than "Version 1." What metrics do you think are important to measure? (Your answers will vary here - use your judgment!)

**NOTE**: When your Jupyter notebook is open, double-click on this Markdown cell! You can delete this text and put your answer to the previous problem in here.

In [None]:
# The change in loot box purchases after Version 2 is released
# The number of players who receive each type of drop
# The change in average play time per user
# The change in revenue
# The change in new game sign-ups