# Project: Theory

## Notation

We index summoners (players) by $i$. The $i^{th}$ summoner has played $n_i$ games in total since June 16, 2021, when Riot first started recording game timestamps.
<br>
<br>

The start timestamp of a game is denoted $b_{i,j}$, where $i$ indexes the summoner and $j$ indexes that summoner's games in chronological order. For example, the start timestamp of the 3rd summoner's 19th game is $b_{3,19}$.
<br>
<br>

The duration of a game is also recorded, and we denote it similarly by $d_{i,j}$. The end timestamp of a game is then
<br>
<br>
<center> $e_{i,j} = b_{i,j} + d_{i,j}$.


Great, so that's the notation we'll use. Let's do some analysis.

## Tasks

To analyze player churn, we can study the time that passes between consecutive games and look for trends in this new variable over time. Large gaps are a warning sign that a player is close to churning.

### Time Between Games

To start this analysis, we define a new variable, the **time between games**, given by
<br>
<br>
<center>$\Delta t_{i,j} = b_{i,j} - e_{i,j-1}$.
<br>
<br>


This variable is the time between the start of a game and the end of the previous game, so it is defined for $j = 2, \dots, n_i$. By plotting it as a function of the game's start time (or the previous game's end time), we can gain insight into summoners' playing patterns.
<br>
<br>
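
As a minimal sketch, the end timestamps $e_{i,j}$ and the times between games $\Delta t_{i,j}$ for one summoner can be computed as below. It assumes the start times and durations are already sorted chronologically and converted to the same unit (seconds here), since the raw API fields may not share a unit; the function names are hypothetical.

```python
from typing import List


def end_times(starts: List[float], durations: List[float]) -> List[float]:
    """e_{i,j} = b_{i,j} + d_{i,j} for each game j of one summoner."""
    if len(starts) != len(durations):
        raise ValueError("starts and durations must have the same length")
    return [b + d for b, d in zip(starts, durations)]


def time_between_games(starts: List[float], durations: List[float]) -> List[float]:
    """Delta t_{i,j} = b_{i,j} - e_{i,j-1} for j = 2, ..., n_i.

    Assumes `starts` is sorted chronologically and that starts and
    durations are in the same unit (seconds here).
    """
    ends = end_times(starts, durations)
    # Pair each game's start (from the 2nd game on) with the previous game's end.
    return [b - e_prev for b, e_prev in zip(starts[1:], ends[:-1])]
```

Pairing each returned value with `starts[1:]` gives the (start time, $\Delta t$) points to plot.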
    
These trends include:
<br>
<br>
- **Play Sessions**: If a summoner plays several games back-to-back (a group we'll call a **play session**), then the values of $\Delta t_{i,j}$ within that session will be small, since the gaps between those games are short. By analyzing the number of games per session as a function of time, we can see how the number of games in a session relates to a summoner's probability of churning.
<br>
<br>

- **Session Separation**: We'll define the **session separation** as the time between the end of one play session and the start of the next, which is the $\Delta t$ of the first game in the new session. If a summoner has a pattern in their session separations, for example playing every day around the same time, then there will be many points with a $\Delta t$ a little under 24 hours (24 hours minus the length of a session). In general, if many points cluster around a non-small value of $\Delta t$ (small values correspond to games **within** a play session), that value is roughly the summoner's typical session separation. This is an important insight: if session separations grow over time, that is a warning that the player may churn.
<br>
<br>
    
- **Large absences in play time**: When plotting $\Delta t$ as a function of time, a large absence in play time shows up as a large value of $\Delta t$ whose previous point is far away on the x-axis. Analyzing the points before such a gap can give us insight into the patterns leading up to a summoner potentially churning, and the point itself can give us insight into what brought the summoner back to playing instead of churning completely.
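
The three trends above can be sketched in Python as follows, for one summoner with chronologically sorted start times and durations in seconds. The 1-hour gap that splits sessions and the 14-day cutoff for a large absence are hypothetical thresholds chosen for illustration, not values from this analysis, and all function names are made up. The trend check uses an ordinary least-squares slope of separation against time as one simple way to ask whether sessions are drifting apart.

```python
from collections import Counter
from typing import List, Tuple

HOUR = 3600.0
DAY = 24 * HOUR


def delta_t(starts: List[float], durations: List[float], j: int) -> float:
    """Delta t for 0-based game index j >= 1: start of game j minus end of game j-1."""
    return starts[j] - (starts[j - 1] + durations[j - 1])


def split_sessions(starts: List[float], durations: List[float],
                   max_gap: float = HOUR) -> List[List[int]]:
    """Group games into play sessions; a new session starts when Delta t > max_gap.

    max_gap (1 hour) is a hypothetical threshold. Returns 0-based game indices.
    """
    if not starts:
        return []
    sessions = [[0]]
    for j in range(1, len(starts)):
        if delta_t(starts, durations, j) > max_gap:
            sessions.append([j])    # long gap: start a new session
        else:
            sessions[-1].append(j)  # short gap: same session
    return sessions


def games_per_session(starts: List[float],
                      sessions: List[List[int]]) -> List[Tuple[float, int]]:
    """(session start time, number of games) pairs, for plotting against time."""
    return [(starts[s[0]], len(s)) for s in sessions]


def session_separations(starts: List[float], durations: List[float],
                        sessions: List[List[int]]) -> List[Tuple[float, float]]:
    """(session start time, separation) for every session after the first.

    The separation is the Delta t of the session's first game.
    """
    return [(starts[s[0]], delta_t(starts, durations, s[0])) for s in sessions[1:]]


def separation_histogram_hours(separations: List[Tuple[float, float]]) -> Counter:
    """Count separations by whole hours (floor of gap / 1 hour)."""
    return Counter(int(gap // HOUR) for _, gap in separations)


def large_absences(separations: List[Tuple[float, float]],
                   min_absence: float = 14 * DAY) -> List[Tuple[float, float]]:
    """Separations of at least min_absence (hypothetical 14-day cutoff)."""
    return [(t, gap) for t, gap in separations if gap >= min_absence]


def separation_trend(separations: List[Tuple[float, float]]) -> float:
    """Least-squares slope of separation vs. time; > 0 means sessions drift apart.

    Returns 0.0 when there are too few distinct times to fit a line.
    """
    n = len(separations)
    if n < 2:
        return 0.0
    xs = [t for t, _ in separations]
    ys = [g for _, g in separations]
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return 0.0
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
```

For a summoner who plays daily, `separation_histogram_hours` would show a peak a little below 24, and a positive `separation_trend` corresponds to the growing session separations described above as a churn warning.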


# Project in Practice

We now have the trends we wish to explore. There's only one problem: it's hard to get data. There are a few reasons for this:


1. It takes a while to fetch game data using Riot's API. If we want data for a large number of players ($>10^3$), each with a large number of games ($\sim 10^3$), then we need to make about $10^3 \times 10^3 = 10^6$ requests to Riot's API. The choice of the numbers of players and games is semi-arbitrary; I just want a sample large enough to make reasonable inferences about the population.

This would be a trivial amount of data to work with if I could actually get my hands on it, but when fetching it I ran into many HTTP 503 errors that occurred at random, and I could not find a good way to handle them (`except HTTPError:` in Python wasn't working for me).

No, I was not going over my rate limit: I put a `time.sleep(2)` after every fetch from the API. At that pace, $10^6$ requests take at least $2 \times 10^6$ seconds, or roughly 23 days.
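
One possible reason `except HTTPError:` never fired (a guess, since the fetching code isn't shown): the `requests` library does not raise an exception for 4xx/5xx responses unless `Response.raise_for_status()` is called, and the `HTTPError` being caught must be the class the client actually raises (`requests.exceptions.HTTPError` and `urllib.error.HTTPError` are different classes). A minimal sketch that sidesteps this checks the status code directly and retries transient failures with exponential backoff. The function name, the set of retryable status codes, and the delays are assumptions for illustration.

```python
import time
from typing import Any, Callable

# Statuses treated as transient (assumption): rate limited or server-side errors.
RETRYABLE_STATUSES = {429, 500, 502, 503, 504}


def get_with_retry(get: Callable[[str], Any], url: str, max_retries: int = 5,
                   base_delay: float = 2.0,
                   sleep: Callable[[float], None] = time.sleep) -> Any:
    """Call get(url), retrying transient failures with exponential backoff.

    `get` returns any object with a `status_code` attribute (for example a
    requests.Response). The status is checked directly, so this works even
    when the HTTP client does not raise an exception on a 503.
    """
    if max_retries < 0:
        raise ValueError("max_retries must be non-negative")
    for attempt in range(max_retries + 1):
        resp = get(url)
        if resp.status_code not in RETRYABLE_STATUSES:
            return resp  # success, or an error retrying won't fix (e.g. 404)
        if attempt < max_retries:
            # Wait base_delay, 2*base_delay, 4*base_delay, ... between attempts.
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError(f"gave up on {url} after {max_retries + 1} attempts "
                       f"(last status {resp.status_code})")
```

With `requests`, `get` could be something like `lambda u: requests.get(u, headers={"X-Riot-Token": API_KEY}, timeout=10)`, where `API_KEY` is a placeholder for your own key. Backing off only after failures leaves the fixed `time.sleep(2)` between successful calls as the main rate-limit guard.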


2. Some of the timestamps are...weird?

According to some posts in the third-party developer community, Practice Tool games have a game start timestamp of 0. Timestamps are in Unix epoch time, so 0 corresponds to midnight UTC on January 1, 1970; these points are effectively corrupted data.
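
Since no real game in this dataset can start before Riot began recording timestamps, one simple fix is to drop every game that starts before that date. A minimal sketch, assuming start times are epoch milliseconds and taking midnight UTC on June 16, 2021 as the cutoff (the unit and the time zone are assumptions); the names are hypothetical:

```python
from datetime import datetime, timezone
from typing import List, Tuple

# Assumed cutoff: midnight UTC on June 16, 2021, in epoch milliseconds.
CUTOFF_MS = int(datetime(2021, 6, 16, tzinfo=timezone.utc).timestamp() * 1000)


def drop_bad_timestamps(games: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Keep (start_ms, duration) pairs whose start is on or after the cutoff.

    A start of 0 is 1970-01-01 00:00 UTC (reported for Practice Tool games),
    so anything earlier than the cutoff is treated as corrupted. Survivors are
    returned sorted by start time so the game index j stays chronological.
    """
    return sorted((g for g in games if g[0] >= CUTOFF_MS), key=lambda g: g[0])
```

Sorting after filtering matters because $\Delta t_{i,j}$ assumes game $j-1$ is the game immediately before game $j$.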

So, is there a solution that still gives a reasonable chance of inferring anything about the population of LoL players?

# Solution

The answer is no. We need a large number of players and their play times  