# "Fun with Loot Boxes" Lab

> Author: Caroline Schmitt, Matt Brems

### Scenario:

You're an analyst for [Zynga](https://en.wikipedia.org/wiki/Zynga), a gaming studio working on an event for an MMO (massively multiplayer online) game. This event is going to include **loot boxes**.

<img src="https://vignette.wikia.nocookie.net/2007scape/images/0/06/Culinaromancer%27s_chest.png/revision/latest?cb=20180403231423" alt="drawing" width="150"/> 

A loot box is basically a treasure chest in a game. This loot box can be opened to reveal a variety of items: some items are very rare and valuable, other items are common and less valuable. (You may consult [the esteemed Wikipedia](https://en.wikipedia.org/wiki/Loot_box) for a more extensive definition.)

In our specific game, suppose that loot boxes can be obtained in one of two ways: 
- After every three hours of playing the game, a user will earn one loot box.
- If the user wishes to purchase a loot box, they may pay $1 (in real money!) for a loot box.

These loot boxes are very good for our business!
- If a player earns a loot box, it means they are spending lots of time on the game. This often leads to advertisement revenue, they may tell their friends to join the game, etc.
- If the player purchases a loot box, it means we've earned $1 from our customer.

Suppose each loot box is opened to reveal either:
- magical elixir (super rare, very valuable), or
- nothing.

Whether each loot box contains the elixir or nothing is **random**. Our boss wants some guidance on what sort of randomness to use on these loot boxes! 
- If the magical elixir is too rare, then users may not be motivated to try to get them, because they believe they'll never find the magical elixir.
- If the magical elixir is too common, then users may not be motivated to try to get them, because the game has so much of the magical elixir that it isn't worthwhile to try to get it.

However, our boss isn't a math-y type person! When explaining things to our boss, we need to explain the impact of our choices on the game as concretely as possible.

### Version 1
In our first version of the game, we'll say that loot boxes contain magical elixir 15% of the time and nothing 85% of the time.

#### 1. Our boss asks, "If a user buys 100 loot boxes, how many elixirs will they get?" How would you respond?

Answer: We cannot guarantee how many elixirs any one player will get. It is possible, though extremely unlikely, that a very lucky player gets 100 elixirs or that a very unlucky player gets none. The expected value is 15, but that does not mean every player will get exactly 15 elixirs out of 100 loot boxes.

In [5]:
# Expected value (EV) is the average outcome of an event repeated many times. It is typically
# calculated by multiplying each possible outcome by its probability and summing these products.

# EV helps us predict long-term results in situations involving chance.

def calculate_elixir_ev(elixir_probability, num_boxes):
    return elixir_probability * num_boxes

# game parameter:
elixir_prob = 0.15
nothing_prob = 0.85
num_loot_box = 100

expected_elixir = calculate_elixir_ev(elixir_prob, num_loot_box)
print(f'Expected number of elixir from {num_loot_box} loot boxes: {expected_elixir}')

Expected number of elixir from 100 loot boxes: 15.0
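To make the gap between the expected value and any one player's result concrete, here is a minimal simulation sketch. It assumes 10,000 hypothetical players who each open 100 loot boxes; the player count, the seed, and the variable names are choices made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed so the sketch is repeatable

n_players = 10_000   # hypothetical number of players
n_boxes = 100        # loot boxes opened by each player
p_elixir = 0.15      # Version 1 elixir probability

# Each entry is the number of elixirs one player finds in 100 boxes.
elixirs_per_player = rng.binomial(n=n_boxes, p=p_elixir, size=n_players)

print(f'Average elixirs per player: {elixirs_per_player.mean():.2f}')
print(f'Fewest elixirs found by any player: {elixirs_per_player.min()}')
print(f'Most elixirs found by any player: {elixirs_per_player.max()}')
print(f'Share of players with exactly 15: {(elixirs_per_player == 15).mean():.1%}')
```

The average should land close to 15, while individual players spread out on both sides of it.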


In [None]:
# EV and probability differ in that:

# Probability measures the likelihood of a specific outcome occurring, expressed as a number
# between 0 and 1 or as a percentage.

# EV quantifies the average result over many trials.

#### 2. Our boss asks, "How many loot boxes does someone have to purchase in order to definitely get elixir?" How would you respond?

In [10]:
# Answer: someone could theoretically purchase an unlimited number of loot boxes and never get an elixir,
# so we cannot guarantee how many loot boxes one would need to purchase before finding an elixir.

# BUT, if each box has a 15% chance of containing an elixir, regardless of previous loot box openings (each opening is independent of the others),
# then on average a user will need to open about 7 loot boxes (1 / 0.15 ≈ 6.67) to get an elixir. Again, this is just an average: some users will need fewer boxes, and some will
# need more. The actual number for any given user can vary widely due to the RANDOM nature of the process.
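Although no number of boxes guarantees an elixir, we can tell the boss how many boxes give a chosen level of confidence. The sketch below assumes independent boxes with a 15% elixir chance, so the chance of at least one elixir in $n$ boxes is $1 - 0.85^n$. The function names and the target levels (50%, 90%, 99%) are illustrative choices.

```python
import math

p_elixir = 0.15  # Version 1 elixir probability


def prob_at_least_one(num_boxes, p=p_elixir):
    """Chance of finding at least one elixir in num_boxes independent boxes."""
    return 1 - (1 - p) ** num_boxes


def boxes_for_confidence(target, p=p_elixir):
    """Smallest number of boxes whose chance of at least one elixir reaches target."""
    return math.ceil(math.log(1 - target) / math.log(1 - p))


for target in (0.50, 0.90, 0.99):
    n = boxes_for_confidence(target)
    print(f'{n} boxes give a {prob_at_least_one(n):.1%} chance of at least one elixir '
          f'(target: {target:.0%})')
```

The chance gets closer and closer to 100% as more boxes are opened, but it never reaches it.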

#### 3. Our boss asks, "If a user earns 100 loot boxes, what is the chance that a user gets more than 20 elixirs?" This is a bit more complicated, so let's break it down before answering.

#### 3a. Let's suppose my random variable $X$ counts up how many elixirs I observe out of my 100 loot boxes. Why is $X$ a discrete random variable?

Because $X$ is a count: it can only take the whole-number values 0, 1, 2, ..., 100. A user cannot find 1.2 or 1.5 elixirs (an elixir cannot be a fraction).

#### 3b. Recall our discrete distributions: discrete uniform, Bernoulli, binomial, Poisson. Let's suppose my random variable $X$ counts up how many elixirs I observe out of my 100 loot boxes. What distribution is best suited for $X$? Why?
- Hint: It may help to consider getting the magical elixir a "success" and getting nothing a "failure." 

- Two outcomes: elixir (success) or no elixir (failure)

Types ...
- Bernoulli: number of successes in one trial, but we have 100 trials here...
- Binomial: number of successes in a fixed number $n$ of independent trials
- Poisson: number of events in a fixed interval of time (no fixed number of trials)
- Discrete uniform: each outcome is equally likely

Most appropriate ...
- Binomial: number of successes in $n = 100$ independent trials, each with success probability $p = 0.15$
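For reference, the probability mass function behind this choice can be written out as follows. This is the standard binomial PMF with $n = 100$ loot boxes and $p = 0.15$ taken from Version 1; $k$ is the number of elixirs.

$$
P(X = k) = \binom{100}{k} (0.15)^{k} (0.85)^{100-k}, \qquad k = 0, 1, \dots, 100
$$

Its mean is $np = 100 \times 0.15 = 15$, which matches the expected value computed in question 1.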

#### 3c. Our boss asks, "If a user earns 100 loot boxes, what is the chance that a user gets more than 20 elixirs?" Use the probability mass function to answer the boss' question.

In [16]:
# Show your work; leave your answer in a comment.

import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats

# Why a for loop? We need the PMF over a range of values (0 through 20).

# Probability of success
p = 0.15

# Number of trials
n = 100

# Distribution
X = stats.binom(n, p)
P = 1  # running total: start at 1 and subtract P(X = x) for each x <= 20

# Loop over x = 0, 1, 2, ..., 20, subtracting the probability of getting exactly x elixirs
for x in range(20 + 1):
    P = P - X.pmf(x)

print(P)

0.06631976581888205


In [28]:
# Second method: sum the PMF directly over x = 21, 22, ..., 100
def calculate_elixir_prob():
    return sum(stats.binom.pmf(x, n=100, p=0.15) for x in range(21, 101))  # P(X > 20)

result = calculate_elixir_prob()
print (f'The probability of getting more than 20 elixir from 100 boxes is {result:.2%}')

The probability of getting more than 20 elixir from 100 boxes is 6.63%


#### 3d. Our boss asks, "If a user earns 100 loot boxes, what is the chance that a user gets more than 20 elixirs?" Use the cumulative distribution function to answer the boss' question.

In [35]:
# Using the CDF: P(X > 20) = 1 - P(X <= 20)

print(f'Probability of getting more than 20 elixirs: {1 - stats.binom.cdf(20, 100, 0.15):.2%}')

Probability of getting more than 20 elixirs: 6.63%
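As a cross-check, SciPy's distributions also expose a survival function, `sf`, which returns $P(X > k)$ directly. The short sketch below compares it with the `1 - cdf` calculation; the variable names are illustrative.

```python
from scipy import stats

n_boxes = 100
p_elixir = 0.15

# P(X > 20) computed two equivalent ways.
via_cdf = 1 - stats.binom.cdf(20, n_boxes, p_elixir)
via_sf = stats.binom.sf(20, n_boxes, p_elixir)

print(f'1 - CDF:           {via_cdf:.4f}')
print(f'Survival function: {via_sf:.4f}')
```

Both routes should agree to within floating-point error.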


#### 3e. Our boss asks, "If a user earns 100 loot boxes, what is the chance that a user gets more than 20 elixirs?" Answer your boss' question. *Remember that your boss is not a math-y person!*

In [40]:
# If a user were to earn or buy 100 boxes, the chance of getting more than 20 elixirs is about 6.63%,
# or roughly 1 in 15.

# Suppose there are currently 25,000 active users. If all 25,000 users earn 100 boxes, then we would expect
# about 1,650 of them to get more than 20 elixirs.

25000*0.066

1650.0

#### 4. Your boss wants to know how many people purchased how many loot boxes last month. 
> For example, last month, 70% of users did not purchase any loot boxes. 10% of people purchased one loot box. 5% of people purchased two loot boxes... and so on.

#### 4a. Recall our discrete distributions: discrete uniform, Bernoulli, binomial, Poisson. Let's suppose my random variable $Y$ counts up how many loot boxes each person purchased through the game last month. What distribution is best suited for $Y$? Why?

Types ...
- Bernoulli: number of successes in one trial (only 0 or 1), but users can buy many boxes
- Binomial: number of successes in a fixed number $n$ of trials
- Poisson: number of events in a fixed interval of time
- Discrete uniform: each outcome is equally likely

Most appropriate ...
- Poisson: it models the number of events we observe in a fixed amount of time, not in a fixed number of trials.
- The Poisson distribution is often used to model count data, especially when the events are relatively rare and can occur any number of times within the given interval (last month).
- It is flexible enough to handle the varying probabilities we see in the data, unlike the discrete uniform distribution (equal probabilities) or the binomial distribution (a fixed number of trials).

#### 4b. Suppose that, on average, your customers purchased 2.7 loot boxes last month. In order for your revenue to be at least $500,000, at least how many users would you need on your platform? (Round your answer up to the nearest thousand.) 

In [68]:
revenue = 500000
avg_purchase = 2.7
result = int(revenue//avg_purchase+1)   # // floor division
print(f' Number of users needed: {result:,}')

 Number of users needed: 185,186


In [64]:
import math 
math.ceil(revenue/avg_purchase)
(186.186)*1000 

186186.0

In [62]:
def round_up(n, decimals=0):
    multiplier = 10 ** decimals
    return math.ceil(n * multiplier) / multiplier

round_up(result)

185186.0

In [74]:
int(54.65)

54

In [80]:
# Round up to the nearest 1,000: 185,186 -> 186,000

import math

def rounding_up(number, nearest_number):
    scaled = number / nearest_number     # e.g. 185186 / 1000 = 185.186
    rounded = math.ceil(scaled)          # 186 (exact multiples are left unchanged)
    return rounded * nearest_number      # 186 * 1000 = 186000

rounding_up(result, 1000)


186000

#### 4c. Assume that your platform has the number of users you mentioned in your last answer. Suppose that your platform calls anyone who purchases 5 or more loot boxes in a month a "high value user." How much money do you expect to have earned from "high value users?" How about "low value users?"

In [92]:
# Set our total purchase amount to 0
amount = 0

# check value from 0 to 4


        # How many users purchase y loot boxes?
            

        # How much money would we make from those people? (dollars amount * number of individuals)
            

        # add in the above quantity to amount 


# How much we expect to make from people buying 4 or fewer loot boxes (low value users)

# How much we expect to make from people buying at least 5 boxes (high value users)


Money expect to earn from high value users is $144,620
Money expect to earn from low value users is $358,749
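The outline above can also be filled in analytically. This is a minimal sketch, not the original solution: it assumes purchases follow a Poisson distribution with mean 2.7, a platform of 186,000 users, and $1 per loot box. The names `n_users`, `price_per_box`, and `high_value_cutoff` are introduced here for illustration.

```python
from scipy import stats

n_users = 186_000        # users on the platform (answer to 4b)
avg_purchase = 2.7       # average loot boxes bought per user last month
price_per_box = 1        # dollars per loot box
high_value_cutoff = 5    # 5 or more boxes makes a "high value user"

Y = stats.poisson(avg_purchase)

# Low value users buy 0 to 4 boxes: for each y, revenue is
# (dollars per user) * (expected number of users who buy exactly y boxes).
low_value_revenue = sum(
    y * price_per_box * n_users * Y.pmf(y) for y in range(high_value_cutoff)
)

# Everything else comes from high value users.
total_revenue = n_users * avg_purchase * price_per_box
high_value_revenue = total_revenue - low_value_revenue

print(f'Expected revenue from high value users: ${high_value_revenue:,.0f}')
print(f'Expected revenue from low value users: ${low_value_revenue:,.0f}')
```

The two figures sum to 186,000 × 2.7 = $502,200. Simulated totals will differ slightly from these expected values and from run to run.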


In [94]:
import numpy as np

roundup = 186000
# Simulate each user's monthly purchases with a Poisson distribution, which models counts of
# independent events occurring in a fixed interval.
outcomes = np.random.poisson(avg_purchase, int(roundup))
high_value_users = [item for item in outcomes if item >= 5]  # purchase counts of users who bought >= 5 boxes
total_high_value_users = sum(high_value_users)  # boxes bought by high value users ($1 each)

low_value_users = sum(outcomes) - total_high_value_users  # boxes bought by low value users
total_low_value_users = low_value_users * 1  # revenue at $1 per box

print(f'Money expect to earn from high value users is ${total_high_value_users:,}')
print(f'Money expect to earn from low value users is ${total_low_value_users:,}')

Money expect to earn from high value users is $144,503
Money expect to earn from low value users is $358,080


#### 4d. Suppose that you want to summarize how many people purchased how many loot boxes last month for your boss. Since your boss isn't math-y, what are 2-4 summary numbers you might use to summarize this for your boss? (Your answers will vary here - use your judgment!)

In [None]:
- Expected revenue / expected number of each user type
- Ratio of low to high value users
- Total revenue


# How many people bought loot boxes last month


# How many boxes were bought last month
total_box = low_value_users + high_value_users


total_user = 186,000
high_num_box = 143,733   (boxes bought by users who bought >= 5)
low_num_box = 358,031    (boxes bought by users who bought < 5)

#### 5. Your boss asks "How many loot boxes does it take before someone gets their first elixir?" Using `np.random.choice`, simulate how many loot boxes it takes someone to get their first elixir. 
- Start an empty list.
- Use control flow to have someone open loot boxes repeatedly.
- Once they open a loot box containing an elixir, record the number of loot boxes it took in the empty list.
- Repeat this process 100,000 times. 

This simulates how long it takes for someone to open a loot box containing elixir. Share the 5th, 25th, 50th, 75th, and 95th percentiles.

> You may find [this documentation](https://docs.scipy.org/doc/numpy/reference/generated/numpy.random.choice.html)  and [this documentation](https://docs.scipy.org/doc/numpy/reference/generated/numpy.percentile.html) helpful.

**NOTE**: When your Jupyter notebook is open, double-click on this Markdown cell! You can delete this text and put your answer to the previous problem in here.
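One possible way to carry out the steps in this problem is sketched below. It assumes Version 1's 15% elixir chance. To keep the runtime short, it draws loot boxes in batches with `np.random.choice` and then walks through them one at a time; the seed, the batch size, and the names `boxes_needed` and `boxes_opened` are choices made for this sketch, not part of the lab.

```python
import numpy as np

np.random.seed(42)  # arbitrary seed so the sketch is repeatable

p_elixir = 0.15          # Version 1 elixir probability
n_simulations = 100_000  # number of "first elixir" results to record
batch_size = 100_000     # loot boxes drawn per call to np.random.choice

boxes_needed = []   # start with an empty list
boxes_opened = 0    # boxes opened so far in the current attempt

while len(boxes_needed) < n_simulations:
    # Draw a batch of loot boxes in one call (much faster than one call per box).
    batch = np.random.choice(['elixir', 'nothing'], size=batch_size,
                             p=[p_elixir, 1 - p_elixir])
    for box in batch:
        boxes_opened += 1
        if box == 'elixir':
            # Record how many boxes this attempt took, then start a new attempt.
            boxes_needed.append(boxes_opened)
            boxes_opened = 0
            if len(boxes_needed) == n_simulations:
                break

percentiles = np.percentile(boxes_needed, [5, 25, 50, 75, 95])
for q, value in zip([5, 25, 50, 75, 95], percentiles):
    print(f'{q}th percentile: {value:.0f} loot boxes')
```

An attempt that is still unfinished at the end of a batch carries over into the next batch, so no partial counts are lost.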

### Version 2

After a substantial update to the game, suppose every loot box can be opened to reveal *one of four different* items:
- magical elixir (occurs 1% of the time, most valuable)
- golden pendant (occurs 9% of the time, valuable)
- steel armor (occurs 30% of the time, semi-valuable)
- bronze coin (occurs 60% of the time, least valuable)

#### 6. Suppose you want to repeat problem 5 above, but for the version 2 loot boxes, so you can track how many loot boxes are needed to get each item. (e.g. You'd like to be able to say that on average it takes 10 trials to get a golden pendant, 3 trials to get steel armor, and so on.) What Python datatype is the best way to store this data? Why?

**We will use a dictionary, because it stores key-value pairs.**
- The keys will be the item names (magical elixir, golden pendant, steel armor, bronze coin).
- Each value will be the simulated list of how many boxes it took to get that item for the first time.
- Alternatively, each value can be the probability of that item.
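A minimal sketch of this dictionary layout follows, using the Version 2 probabilities. The number of simulated players (1,000), the batch size, the seed, and names such as `boxes_to_first` are assumptions made to keep the example small and fast.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed so the sketch is repeatable

item_probs = {
    'magical elixir': 0.01,
    'golden pendant': 0.09,
    'steel armor': 0.30,
    'bronze coin': 0.60,
}
items = list(item_probs)
probs = list(item_probs.values())

# One key per item; each value is a list of "boxes opened until first seen".
boxes_to_first = {item: [] for item in items}

n_players = 1_000  # hypothetical number of simulated players
for _ in range(n_players):
    first_seen = {}   # item -> box number on which this player first got it
    boxes_opened = 0
    while len(first_seen) < len(items):
        batch = np.random.choice(items, size=100, p=probs)
        for item in batch:
            boxes_opened += 1
            name = str(item)
            if name not in first_seen:
                first_seen[name] = boxes_opened
    for name, count in first_seen.items():
        boxes_to_first[name].append(count)

for name, counts in boxes_to_first.items():
    print(f'{name}: {np.mean(counts):.1f} boxes on average')
```

Because every item shares the same structure, summaries such as means or percentiles can be computed per key with a single loop.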

#### 7. Suppose you and your boss want to measure whether "Version 2" is better than "Version 1." What metrics do you think are important to measure? (Your answers will vary here - use your judgment!)

- Percentage change in revenue
- Increase in the number of users
- Increase in the number of high value users