# Introduction

This notebook uses the `scipy.stats` implementation of the chi-square test to assess whether the PlayerUnknown's Battlegrounds (PUBG) map rotations in seasons 14 and 15 are consistent with a discrete uniform distribution, i.e. all maps equally likely, or whether there is evidence against it. Data were obtained from a post on r/PUBATTLEGROUNDS by u/attractionist, which can be found [here](https://www.reddit.com/r/PUBATTLEGROUNDS/comments/sitxl1/test_map_probability_chance/).

A $\chi^2$ test on the map count data provided by u/attractionist finds insufficient evidence to reject the hypothesis that the season 15 map distribution is uniform categorical. For season 14, the test rejects uniformity with strong evidence, but this discrepancy may be explained once the difference in player capacity between Paramo and the other maps is factored in.

```python
# Imports

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.stats import chisquare
```


```python
# Input the data collected by u/attractionist as columns of a DataFrame

s14_data = {'map' : ['Sanhok', 'Taego', 'Miramar', 'Erangel', 'Paramo'], 'N_i' : [175, 163, 156, 152, 106]} 

s15_data = {'map' : ['Erangel', 'Vikendi', 'Miramar', 'Taego', 'Sanhok'], 'N_i' : [282, 271, 268, 266, 263]}

s14_df = pd.DataFrame(data=s14_data)

s15_df = pd.DataFrame(data=s15_data)

# Share of all matches played on each map
s14_df['Proportion'] = s14_df['N_i']/s14_df['N_i'].sum()

s15_df['Proportion'] = s15_df['N_i']/s15_df['N_i'].sum()
```

```python
# Our dataframes look like this:

s14_df.head()
```

|   | map     | N_i | Proportion |
|---|---------|-----|------------|
| 0 | Sanhok  | 175 | 0.232713   |
| 1 | Taego   | 163 | 0.216755   |
| 2 | Miramar | 156 | 0.207447   |
| 3 | Erangel | 152 | 0.202128   |
| 4 | Paramo  | 106 | 0.140957   |


```python
s15_df.head()
```

|   | map     | N_i | Proportion |
|---|---------|-----|------------|
| 0 | Erangel | 282 | 0.208889   |
| 1 | Vikendi | 271 | 0.200741   |
| 2 | Miramar | 268 | 0.198519   |
| 3 | Taego   | 266 | 0.197037   |
| 4 | Sanhok  | 263 | 0.194815   |


### Hypothesis Test
The PUBG map rotation in each of seasons 14 and 15 can be modeled as a categorical distribution over $K = 5$ maps. To assess whether the rotation is uniform categorical, we can perform a $\chi^2$ goodness-of-fit test with $K - 1 = 4$ degrees of freedom. This test matches the default arguments of `scipy.stats.chisquare`: when `f_exp` is omitted, the expected frequencies are equal across categories (the uniform categorical null), and the default `ddof=0` needs no adjustment because no parameters are estimated from the data. Our test is as follows:

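$H_0: p_1 = p_2 = \dots = p_5 = \tfrac{1}{5}$, i.e. $X \sim \mathrm{Cat}\left(\tfrac{1}{5}, \dots, \tfrac{1}{5}\right)$

$H_1: \neg H_0$, i.e. $p_i \neq \tfrac{1}{5}$ for at least one map $i$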
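For reference, the statistic `chisquare` computes by default is Pearson's, comparing each observed count $N_i$ with its expected count $E_i$ under $H_0$. Worked by hand for season 15, using only the counts in the table above:

$$
\chi^2 = \sum_{i=1}^{K} \frac{(N_i - E_i)^2}{E_i}, \qquad E_i = \frac{N}{K}, \qquad N = \sum_{i=1}^{K} N_i
$$

$$
N = 1350, \quad E_i = \frac{1350}{5} = 270, \quad
\chi^2 = \frac{12^2 + 1^2 + (-2)^2 + (-4)^2 + (-7)^2}{270} = \frac{214}{270} \approx 0.793
$$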

```python
chisq_14 = chisquare(s14_df['N_i'])
print(f'Season 14 map rotation Chi Square test: \n Test Statistic: {np.round(chisq_14.statistic, 3)}'
      f'\n p-value: {np.round(chisq_14.pvalue, 5)}')

print('\n\n')

chisq_15 = chisquare(s15_df['N_i'])
print(f'Season 15 map rotation Chi Square test: \n Test Statistic: {np.round(chisq_15.statistic, 3)}'
      f'\n p-value: {np.round(chisq_15.pvalue, 5)}')
```

```
Season 14 map rotation Chi Square test: 
 Test Statistic: 18.412
 p-value: 0.00102



Season 15 map rotation Chi Square test: 
 Test Statistic: 0.793
 p-value: 0.93944
```
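As a cross-check on what `chisquare` does with its defaults, here is a minimal standard-library sketch that recomputes both statistics and p-values by hand. It relies on the fact that for an even number of degrees of freedom the chi-square upper tail has a closed form; with 4 degrees of freedom it is $e^{-x/2}(1 + x/2)$, so this shortcut only applies to the $K = 5$ case here. The helper names and the significance level `alpha = 0.05` are illustrative assumptions, not part of the original analysis.

```python
import math


def chi_square_uniform(counts):
    """Pearson chi-square statistic against a uniform categorical null (E_i = N / K)."""
    expected = sum(counts) / len(counts)
    return sum((n_i - expected) ** 2 / expected for n_i in counts)


def chi2_sf_4df(x):
    """P(X > x) for a chi-square variable with 4 degrees of freedom.

    For even degrees of freedom the upper tail has a closed form; with 4 df
    it reduces to exp(-x/2) * (1 + x/2).
    """
    return math.exp(-x / 2) * (1 + x / 2)


s14_counts = [175, 163, 156, 152, 106]  # Sanhok, Taego, Miramar, Erangel, Paramo
s15_counts = [282, 271, 268, 266, 263]  # Erangel, Vikendi, Miramar, Taego, Sanhok
alpha = 0.05  # assumed significance level for the printed decision

for season, counts in [('Season 14', s14_counts), ('Season 15', s15_counts)]:
    stat = chi_square_uniform(counts)
    p = chi2_sf_4df(stat)
    decision = 'reject H0' if p < alpha else 'fail to reject H0'
    print(f'{season}: statistic = {stat:.3f}, p-value = {p:.5f} -> {decision} at alpha = {alpha}')
```

This prints a statistic of 18.412 with p-value 0.00102 for season 14 and 0.793 with p-value 0.93944 for season 15, in agreement with the `chisquare` output.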


### Results
For season 14, our $\chi^2$ test gives strong evidence against $H_0$, with a p-value of about 0.001; for season 15, there is essentially no evidence against $H_0$ (p ≈ 0.94).

From this test, we can conclude that in season 14 the realized map frequencies for u/attractionist and his/her squad were most likely not drawn from a uniform categorical distribution. There is almost no evidence for this claim in season 15, where the variation between the counts for each map is consistent with random chance.

### However...
Astute gamers will notice that the season 14 map rotation includes Paramo (one of my personal favorites!), which has a player capacity of 64. All other maps in seasons 14 and 15 have player capacities of 100. Could it be that the *map selection* in season 14 was based on a uniform categorical distribution, but the fact that Paramo only allows 64 players per round skewed the realized proportions accordingly? As a thought experiment, imagine that the map rotation was deterministic and uniform (e.g. map1, map2, ..., map_n, map1, map2, ...), but one map only allowed 5 players per round while the rest allowed 500. Intuitively, even though the exclusive map is *selected* as often as the others, the probability of any given player landing in a match on that map would be much smaller than for the other maps.

Inspection of the season 14 data provided by u/attractionist shows that the Paramo play count is roughly 64/100 of the play counts for the other maps, which is the ratio of Paramo's player capacity to theirs. Let's scale the season 14 Paramo match count by the inverse factor, 100/64, and repeat our test.
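A quick check of that ratio, using only the season 14 counts from the table above: Paramo's count relative to the average of the other four maps is

$$
\frac{106}{(175 + 163 + 156 + 152)/4} = \frac{106}{161.5} \approx 0.656, \qquad \frac{64}{100} = 0.64
$$

Equivalently, if every map were selected equally often and a player's chance of being placed on a map were proportional to its capacity (an assumption, following the thought experiment above), Paramo's expected share of matches would be

$$
\frac{64}{64 + 4 \cdot 100} = \frac{64}{464} \approx 0.138,
$$

close to its observed share of $106/752 \approx 0.141$.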


```python
# Scale the S14 Paramo match count by 100/64, the player-capacity ratio to the other maps.
# Assigning through .loc with a boolean mask avoids the chained-assignment
# SettingWithCopyWarning raised by s14_df['N_i'][4] = ...
# N_i is an integer column, so .astype(int) truncates 106 * 100/64 = 165.625 to 165,
# matching the 165 shown in the output below.
is_paramo = s14_df['map'] == 'Paramo'
s14_df.loc[is_paramo, 'N_i'] = (s14_df.loc[is_paramo, 'N_i'] * 100 / 64).astype(int)

s14_df['Proportion'] = s14_df['N_i']/s14_df['N_i'].sum()

s14_df.head()
```

|   | map     | N_i | Proportion |
|---|---------|-----|------------|
| 0 | Sanhok  | 175 | 0.215783   |
| 1 | Taego   | 163 | 0.200986   |
| 2 | Miramar | 156 | 0.192355   |
| 3 | Erangel | 152 | 0.187423   |
| 4 | Paramo  | 165 | 0.203453   |


```python
chisq_14 = chisquare(s14_df['N_i'])
print(f'Season 14 map rotation Chi Square test: \n Test Statistic: {np.round(chisq_14.statistic, 3)}'
      f'\n p-value: {np.round(chisq_14.pvalue, 5)}')
```

```
Season 14 map rotation Chi Square test: 
 Test Statistic: 1.941
 p-value: 0.74664
```


### Results pt. 2

After scaling the match count for Paramo in season 14 by player capacity, our $\chi^2$ test provides no meaningful evidence against $H_0$, with a p-value around 0.75! This suggests that while we have strong evidence that u/attractionist was less likely to play Paramo in any given match during S14, we cannot reject the null hypothesis that the *map rotation* is drawn from a uniform categorical distribution once the difference in player capacity between Paramo and the other maps is accounted for.
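One caveat with rescaling is that the $\chi^2$ test assumes its inputs are observed counts, and the scaled Paramo value is not one. An alternative that leaves the raw season 14 counts untouched is to move the capacity weighting into the expected frequencies instead: under the assumption that each map is selected equally often and a player's chance of landing on a map is proportional to its capacity, map $i$ gets $E_i = N \cdot c_i / \sum_j c_j$. In `scipy.stats.chisquare` this corresponds to passing those values as `f_exp`; the sketch below uses only the standard library so the arithmetic is visible. The capacity list and helper names are illustrative, and the closed-form tail is only valid for 4 degrees of freedom.

```python
import math


def chi_square_gof(observed, expected):
    """Pearson chi-square statistic: sum of (O_i - E_i)^2 / E_i."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi2_sf_4df(x):
    """Upper-tail probability of a chi-square variable with 4 degrees of freedom."""
    return math.exp(-x / 2) * (1 + x / 2)


# Raw (unscaled) season 14 counts: Sanhok, Taego, Miramar, Erangel, Paramo
observed = [175, 163, 156, 152, 106]
# Assumed per-match player capacities for the same maps, in the same order
capacity = [100, 100, 100, 100, 64]

total = sum(observed)
# Capacity-weighted null: expected share of this player's matches is c_i / sum(c)
expected = [total * c / sum(capacity) for c in capacity]

# No parameters are estimated from the data, so the test still has K - 1 = 4 df
stat = chi_square_gof(observed, expected)
print(f'statistic = {stat:.3f}, p-value = {chi2_sf_4df(stat):.3f}')
```

On these counts the capacity-weighted test gives a statistic of roughly 1.94 (p ≈ 0.75), close to the rescaled result, so the conclusion does not appear to hinge on which of the two adjustments is used.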

### Conclusion

A $\chi^2$ test on the map count data provided by u/attractionist finds insufficient evidence to reject a uniform categorical map distribution in season 15 ($\chi^2 = 0.793$, p ≈ 0.94). In season 14, the raw counts reject uniformity with strong evidence ($\chi^2 = 18.412$, p ≈ 0.001), but after scaling Paramo's count by 100/64 to account for its smaller player capacity, the test no longer rejects ($\chi^2 = 1.941$, p ≈ 0.75).

Thanks to u/attractionist for collecting the map count data that this analysis was based on!