# "Fun with Loot Boxes" Lab

> Authors: Caroline Schmitt, Matt Brems

### Scenario:

You're an analyst for [Zynga](https://en.wikipedia.org/wiki/Zynga), a gaming studio working on an event for an MMO (massively multiplayer online) game. This event is going to include **loot boxes**.

<img src="https://vignette.wikia.nocookie.net/2007scape/images/0/06/Culinaromancer%27s_chest.png/revision/latest?cb=20180403231423" alt="drawing" width="150"/> 

A loot box is basically a treasure chest in a game. This loot box can be opened to reveal a variety of items: some items are very rare and valuable, other items are common and less valuable. (You may consult [the esteemed Wikipedia](https://en.wikipedia.org/wiki/Loot_box) for a more extensive definition.)

In our specific game, suppose that loot boxes can be obtained in one of two ways: 
- After every three hours of playing the game, a user will earn one loot box.
- If the user wishes to purchase a loot box, they may pay $1 (in real money!) for a loot box.

These loot boxes are very good for our business!
- If a player earns a loot box, it means they are spending lots of time on the game. This often leads to advertisement revenue, and they may tell their friends to join the game.
- If the player purchases a loot box, it means we've earned $1 from our customer.

Suppose each loot box is opened to reveal either:
- magical elixir (super rare, very valuable), or
- nothing.

Whether each loot box contains the elixir or nothing is **random**. Our boss wants some guidance on what sort of randomness to use on these loot boxes! 
- If the magical elixir is too rare, then users may not be motivated to try to get them, because they believe they'll never find the magical elixir.
- If the magical elixir is too common, then users may not be motivated to try to get them, because the game has so much of the magical elixir that it isn't worthwhile to try to get it.

However, our boss isn't a math-y type person! When explaining things to our boss, we need to explain the impact of our choices on the game as concretely as possible.

### Version 1
In our first version of the game, we'll say that loot boxes contain magical elixir 15% of the time and nothing 85% of the time.

#### 1. Our boss asks, "If a user buys 100 loot boxes, how many elixirs will they get?" How would you respond?

**Answer** They will get around 15 elixirs on average.
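The figure above is the binomial mean $n \cdot p$. A minimal sketch of that check with `scipy.stats` follows; the variable names `n_boxes` and `p_elixir` are illustrative and not part of the lab. The standard deviation is included to show how far a typical user might land from 15.

```python
import scipy.stats as stats

n_boxes = 100      # loot boxes purchased (from the boss's question)
p_elixir = 0.15    # Version 1 chance that a box contains elixir

boxes = stats.binom(n_boxes, p_elixir)

expected_elixirs = boxes.mean()   # n * p
spread = boxes.std()              # sqrt(n * p * (1 - p))

print(f"Expected elixirs: {expected_elixirs:.1f}, give or take about {spread:.1f}")
```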

#### 2. Our boss asks, "How many loot boxes does someone have to purchase in order to definitely get elixir?" How would you respond?

**Answer** No number of purchases can guarantee an elixir, because each loot box is an independent random draw. On average, though, a user gets their first elixir after buying $\dfrac{1}{0.15} \approx 6.7$ loot boxes, so about 7.
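Since "definitely" is out of reach, one way to make the answer concrete for the boss is to report how many boxes are needed to reach a given level of confidence. The sketch below assumes independent boxes with a 15% elixir chance and uses $P(\text{at least one elixir in } k \text{ boxes}) = 1 - 0.85^k$; the helper names are hypothetical.

```python
import math

p_elixir = 0.15  # Version 1 chance that a box contains elixir


def chance_of_at_least_one(k, p=p_elixir):
    """Probability of finding at least one elixir in k independent boxes."""
    return 1 - (1 - p) ** k


def boxes_for_confidence(target, p=p_elixir):
    """Smallest k such that the chance of at least one elixir is >= target."""
    return math.ceil(math.log(1 - target) / math.log(1 - p))


for target in (0.50, 0.90, 0.99):
    k = boxes_for_confidence(target)
    print(f"{target:.0%} sure of an elixir: {k} boxes "
          f"(actual chance {chance_of_at_least_one(k):.1%})")
```

Under these assumptions the required number of boxes keeps growing as the target approaches 100%, which is why no finite purchase count is a guarantee.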

#### 3. Our boss asks, "If a user earns 100 loot boxes, what is the chance that a user gets more than 20 elixirs?" This is a bit more complicated, so let's break it down before answering.

#### 3a. Let's suppose my random variable $X$ counts up how many elixirs I observe out of my 100 loot boxes. Why is $X$ a discrete random variable?

**Answer** $X$ can only take values in the finite, countable set $\{0,1,2,\ldots,100\}$, so $X$ is a discrete random variable.

#### 3b. Recall our discrete distributions: discrete uniform, Bernoulli, binomial, Poisson. Let's suppose my random variable $X$ counts up how many elixirs I observe out of my 100 loot boxes. What distribution is best suited for $X$? Why?
- Hint: It may help to consider getting the magical elixir a "success" and getting nothing a "failure." 

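**Answer** $X$ follows a binomial distribution: there is a fixed number of trials ($n = 100$), each loot box has only two outcomes (elixir or nothing), the outcomes are independent of each other, and the probability of getting an elixir (success) is the same for every loot box ($p = 0.15$).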

#### 3c. Our boss asks, "If a user earns 100 loot boxes, what is the chance that a user gets more than 20 elixirs?" Use the probability mass function to answer the boss' question.

In [39]:
import scipy.stats as stats               # For building the distribution

p = 0.15                                  # Probability of success (getting elixir)
n = 100                                   # Number of loot boxes

X = stats.binom(n,p)                      # X follows a binomial distribution

P = 1                                     # Start from total probability 1
for x in range(20 + 1):
    P = P - X.pmf(x)                      # Subtract P(X = x) for x = 0, 1, ..., 20

print(P)                                  # P(X > 20) = 0.0663

0.06631976581888208


#### 3d. Our boss asks, "If a user earns 100 loot boxes, what is the chance that a user gets more than 20 elixirs?" Use the cumulative distribution function to answer the boss' question.

In [37]:
print(f'{(1 - X.cdf(20))*100:.2f}%')      # The probability of getting more than 20 elixirs is 6.63%

6.63%
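As a cross-check, the PMF sum and the CDF should give the same tail probability, and `scipy.stats` also exposes the survival function `sf`, which returns $P(X > k)$ directly. The sketch below rebuilds the distribution so it runs on its own; the names `p_loop`, `p_cdf`, and `p_sf` are illustrative.

```python
import scipy.stats as stats

X = stats.binom(100, 0.15)                       # 100 boxes, 15% elixir chance

p_loop = 1 - sum(X.pmf(k) for k in range(21))    # 1 - P(X = 0) - ... - P(X = 20)
p_cdf = 1 - X.cdf(20)                            # 1 - P(X <= 20)
p_sf = X.sf(20)                                  # P(X > 20), the survival function

print(f"PMF sum: {p_loop:.4f}, CDF: {p_cdf:.4f}, SF: {p_sf:.4f}")
```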


#### 3e. Our boss asks, "If a user earns 100 loot boxes, what is the chance that a user gets more than 20 elixirs?" Answer your boss' question. *Remember that your boss is not a math-y person!*

In [41]:
# Only about 6 or 7 out of every 100 users who earn 100 loot boxes will get more than 20 elixirs.

#### 4. Your boss wants to know how many people purchased how many loot boxes last month. 
> For example, last month, 70% of users did not purchase any loot boxes. 10% of people purchased one loot box. 5% of people purchased two loot boxes... and so on.

#### 4a. Recall our discrete distributions: discrete uniform, Bernoulli, binomial, Poisson. Let's suppose my random variable $Y$ counts up how many loot boxes each person purchased through the game last month. What distribution is best suited for $Y$? Why?

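**Answer** $Y$ is best modeled by a Poisson distribution: it counts how many events (purchases) occur in a fixed interval (one month), its possible values $\{0,1,2,\ldots\}$ are countably infinite with no fixed upper limit, and the distribution is described by a single average rate.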

#### 4b. Suppose that, on average, your customers purchased 2.7 loot boxes last month. In order for your revenue to be at least $500,000, at least how many users would you need on your platform? (Round your answer up to the nearest thousand.) 

In [164]:
import scipy.stats as stats

lmbd = 2.7                                # Average number of loot boxes purchased per customer
target_revenue = 500_000                  # Target revenue in dollars
Y = stats.poisson(lmbd)                   # Poisson distribution with lambda = 2.7

ls = []

for i in range(100):                      # Run simulation 100 times
    user = 0                              # Initialize customer counter
    revn = 0                              # Initialize revenue counter
    while revn < target_revenue:          # Continue until total revenue reaches $500,000
        revn += Y.rvs(1)                  # Simulate the loot box purchases for one user and add to revenue
        user += 1                         # Increment the customer count
    ls.append(user)                       # Record the number of customers needed to reach $500,000 revenue

avg_users_needed = sum(ls) / len(ls)      # Calculate average number of customers needed
print(avg_users_needed)                   # About 185,000 users (analytically, 500_000 / 2.7 ≈ 185,186)

185761.0
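The simulation can be cross-checked with a closed-form estimate: if each user spends \$2.70 on average, the expected revenue from $N$ users is $2.7N$, so the target is met in expectation once $N \ge 500{,}000 / 2.7$. This is a sketch under that assumption; `price_per_box` and the other names are illustrative.

```python
import math

avg_boxes_per_user = 2.7     # average purchases per user last month
price_per_box = 1.0          # dollars per loot box
target_revenue = 500_000     # dollars

# Users needed so that expected revenue reaches the target
users_exact = target_revenue / (avg_boxes_per_user * price_per_box)

# Round up to the nearest thousand, as the question asks
users_rounded_up = math.ceil(users_exact / 1000) * 1000

print(f"Exact: {users_exact:,.0f} users, rounded up: {users_rounded_up:,} users")
```

Strictly rounding up gives 186,000 users. The cells that follow use the rounder figure of 185,000, which falls just short of the target in expectation (185,000 × \$2.70 = \$499,500).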


#### 4c. Assume that your platform has the number of users you mentioned in your last answer. Suppose that your platform calls anyone who purchases 5 or more loot boxes in a month a "high value user." How much money do you expect to have earned from "high value users?" How about "low value users?"

In [None]:
# Average revenue per customer = 2.7 loot boxes * $1 = $2.70
# Total customers = 185_000

In [248]:
# Expected revenue from high-value users
# Conditional expectation: E(X | X >= 5) = E(X * 1[X >= 5]) / P(X >= 5)

expected_value_high = Y.expect(lambda x: x, lb = 5)       # E(X * 1[X >= 5]) = 0.77
prob_value_high = 1 - Y.cdf(4)                            # P(X >= 5) = 0.14
condi_value_high = expected_value_high / prob_value_high  # E(X | X >= 5) = 5.63

# Expected revenue from low-value users
# Conditional expectation: E(X | X < 5) = E(X * 1[X < 5]) / P(X < 5)
expected_value_low = Y.expect(lambda x: x, ub = 4)        # E(X * 1[X < 5]) = 1.93
prob_value_low = Y.cdf(4)                                 # P(X < 5) = 0.86
condi_value_low = expected_value_low / prob_value_low     # E(X | X < 5) = 2.23

In [254]:
# Revenue from high-value customers = (number of high-value customers) * (expected purchases per high-value customer)
#                                   = [185_000 * prob_value_high] * [expected_value_high / prob_value_high]
#                                   = 185_000 * expected_value_high

ans_high = 185_000 * expected_value_high     # 142_811
print(f'Expected revenue from high-value customers is {ans_high:.0f}.') 

Expected revenue from high-value customers is 142811.


In [256]:
ans_low = 185_000 * expected_value_low    # 356_689
print(f'Expected revenue from low-value customers is {ans_low:.0f}.') 

Expected revenue from low-value customers is 356689.


In [258]:
ans_high / (ans_high + ans_low)   # Share of total revenue that comes from high-value customers

0.28590782438378887
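The high/low split can be sanity-checked by summing $k \cdot P(Y = k)$ directly over each group: the two pieces should add back up to the overall mean of 2.7. For a Poisson variable the high-value piece also satisfies $E(Y \cdot 1[Y \ge 5]) = \lambda \, P(Y \ge 4)$, which is why the revenue share above equals $P(Y \ge 4)$. The sketch below truncates the support at 200, an assumption that is harmless for $\lambda = 2.7$; the variable names are illustrative.

```python
import numpy as np
import scipy.stats as stats

lam = 2.7
Y = stats.poisson(lam)

k = np.arange(0, 200)     # truncated support; the tail beyond 200 is negligible here
pmf = Y.pmf(k)

high_per_user = np.sum(k[k >= 5] * pmf[k >= 5])   # E(Y * 1[Y >= 5])
low_per_user = np.sum(k[k < 5] * pmf[k < 5])      # E(Y * 1[Y < 5])

print(f"High-value part: {high_per_user:.4f}")
print(f"Low-value part:  {low_per_user:.4f}")
print(f"Sum:             {high_per_user + low_per_user:.4f}")
print(f"High-value share of revenue: {high_per_user / lam:.4f}")
```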

#### 4d. Suppose that you want to summarize how many people purchased how many loot boxes last month for your boss. Since your boss isn't math-y, what are 2-4 summary numbers you might use to summarize this for your boss? (Your answers will vary here - use your judgment!)

**Answer** 
1) To reach at least \$500,000 in revenue, we need around 185,000 customers.
2) The average user buys 2.7 loot boxes per month.
3) About 14\% of customers are High Value Customers; they buy 5.63 loot boxes per month on average and generate 29\% of total revenue.
4) About 86\% of customers are Low Value Customers; they buy 2.23 loot boxes per month on average and generate 71\% of total revenue.

#### 5. Your boss asks "How many loot boxes does it take before someone gets their first elixir?" Using `np.random.choice`, simulate how many loot boxes it takes someone to get their first elixir. 
- Start an empty list.
- Use control flow to have someone open loot boxes repeatedly.
- Once they open a loot box containing an elixir, record the number of loot boxes it took in the empty list.
- Repeat this process 100,000 times. 

This simulates how long it takes for someone to open a loot box containing elixir. Share the 5th, 25th, 50th, 75th, and 95th percentiles.

> You may find [this documentation](https://docs.scipy.org/doc/numpy/reference/generated/numpy.random.choice.html)  and [this documentation](https://docs.scipy.org/doc/numpy/reference/generated/numpy.percentile.html) helpful.

In [263]:
import numpy as np                                                               # Needed library

ls = []
for i in range(100_000):                                                         # Run the simulation 100,000 times
    n_opened = 1                                                                 # The user always opens at least one box
    while np.random.choice(2, 1, p = [0.15, 0.85], replace = True) == 1:         # 0 = elixir (15%), 1 = nothing (85%); loop while the box is empty
        n_opened += 1                                                            # Open another box
    ls.append(n_opened)                                                          # Record how many boxes it took

In [269]:
sum(ls) / len(ls)           # Around 6.7 loot boxes, consistent with 1 / 0.15

6.65556

In [281]:
import pandas as pd

ds = pd.Series(ls)          # make the list as Pandas Series

ds.agg(
    quantile_5 = lambda x: x.quantile(0.05),
    quantile_25 = lambda x: x.quantile(0.25),
    quantile_50 = lambda x: x.quantile(0.50),
    quantile_75 = lambda x: x.quantile(0.75),
    quantile_95 = lambda x: x.quantile(0.95),
)

quantile_5      1.0
quantile_25     2.0
quantile_50     5.0
quantile_75     9.0
quantile_95    19.0
dtype: float64

**Answer**: $P_{5} = 1, P_{25} = 2, P_{50} = 5, P_{75} = 9, P_{95} = 19$
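The number of boxes opened up to and including the first elixir follows a geometric distribution with $p = 0.15$, so the simulated percentiles can be compared against the theoretical ones. This sketch uses `scipy.stats.geom`, which is not part of the original exercise; the name `first_elixir` is illustrative.

```python
import scipy.stats as stats

first_elixir = stats.geom(0.15)    # boxes opened until the first elixir (support starts at 1)

percentiles = [0.05, 0.25, 0.50, 0.75, 0.95]
theoretical = first_elixir.ppf(percentiles)

for q, value in zip(percentiles, theoretical):
    print(f"{q:.0%} percentile: {int(value)} boxes")

print(f"Mean: {first_elixir.mean():.2f} boxes")
```

The theoretical percentiles come out to 1, 2, 5, 9, and 19 boxes, matching the simulation above.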

### Version 2

After a substantial update to the game, suppose every loot box can be opened to reveal *one of four different* items:
- magical elixir (occurs 1% of the time, most valuable)
- golden pendant (occurs 9% of the time, valuable)
- steel armor (occurs 30% of the time, semi-valuable)
- bronze coin (occurs 60% of the time, least valuable)

#### 6. Suppose you want to repeat problem 5 above, but for the version 2 loot boxes, so you can track how many loot boxes are needed to get each item. (e.g. You'd like to be able to say that on average it takes 10 trials to get a golden pendant, 3 trials to get steel armor, and so on.) What Python datatype is the best way to store this data? Why?

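**Answer**: A dictionary. A set would keep the item names unique, but it cannot attach any data to them. A dictionary keyed by item name can map each item to the list of loot-box counts from the simulation, which makes it easy to look up the results for each item.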
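A minimal sketch of the dictionary layout follows. It takes a shortcut rather than opening boxes one at a time as in problem 5: the number of boxes until a given item first appears is geometric with that item's drop rate, so `rng.geometric` draws the counts directly. The names `drop_rates` and `boxes_needed` and the seed are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)   # arbitrary seed, only for reproducibility

# Version 2 drop rates, keyed by item name
drop_rates = {
    "magical elixir": 0.01,
    "golden pendant": 0.09,
    "steel armor": 0.30,
    "bronze coin": 0.60,
}

n_sims = 100_000

# Dictionary: item name -> array of "boxes opened until first seen" counts
boxes_needed = {
    item: rng.geometric(p, size=n_sims)
    for item, p in drop_rates.items()
}

# Average number of boxes per item, also stored in a dictionary
avg_boxes = {item: counts.mean() for item, counts in boxes_needed.items()}

for item, avg in avg_boxes.items():
    print(f"{item}: about {avg:.1f} boxes on average (theory: {1 / drop_rates[item]:.1f})")
```

Keeping both the raw counts and the summaries keyed by item name means percentiles or other statistics can be added later without restructuring the data.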

#### 7. Suppose you and your boss want to measure whether "Version 2" is better than "Version 1." What metrics do you think are important to measure? (Your answers will vary here - use your judgment!)

**Answer** The number of loot boxes a player needs to open to get the magical elixir (assuming every player wants it), because this determines how much a player pays on average, in dollars, to obtain the item.