In [5]:
import league
import players
import statistics
import plotly.graph_objects as go
import plotly.express as px
import numpy as np
from plotly.subplots import make_subplots
from prettytable import PrettyTable

PLAYERS = players.Players()
LEAGUE = league.League(11, "649912836461539328", PLAYERS)

# Playoff Probability

What's up boys, coming back at you again this week after a hiatus to try and predict the teams most likely to make the playoffs. As you know, the top three teams in each division make the playoffs and we are in a real battle this year for who those teams might be. As a reminder, I have the standings below:

```
Noe's Loyal Lighting Salesman Division
1. Nahome (16-6)
2. Ben (10-12)
3. Noe (8-14)
4. Stephane (7-15)
5. Will (7-15)

Noah's SideThiccs
1. Praveen (16-6)
2. Stian (13-9)
3. Noah (13-9)
4. Tom (11-11)
5. Tegran (9-13)
```

Power rankings for this week are as follows:

```
1. Nahome
2. Stian
3. Praveen
4. Noah
5. Tom
6. Ben
7. Stephane
8. Tegran
9. Noe
10. Will
```

Noe and Will are both on enormous losing streaks (9 and 8 games respectively), while Stephane, Praveen, and Stian are all blessed with flames this week. Also new this week are some backend improvements to the code base that will make it much easier for me to handle league data. If you are interested, I've written some new object-oriented code to represent the league in the `fantasy_features` directory on the [GitHub](https://github.com/bencap/fantasy_analysis) page. This module handles a ton of the backend work and will make the code on the frontend much simpler and easier to follow for you guys.

## The Difficulty of Predicting the Future

As we all know, it is extremely difficult and dangerous to predict the future, but that doesn't mean we won't try anyway. One of the key things to do, in my opinion, is to continuously update our expectations for teams based on performance. For instance, if a team is scoring 105 points per game but then scores 150 in the next, we should update our expectations for how many points that team is likely to score in future games. Likewise, a team which has consistently been scoring 140 points but then scores 90 should have its expectations reduced to some degree. The difficulty is deciding how much these performances should impact a team's future projections. How relevant is a single game week to the future, and how much of it is random noise? There is clearly an element of both.
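
To get a feel for how much a single week can move the needle, here is a quick sketch. It assumes 11 weeks already played and that every week counts equally; both are assumptions for illustration, and `updated_mean` is a made-up helper, not something from the league code.

```python
def updated_mean(old_mean, weeks_played, new_score):
    """Running average after one more week, with every week weighted equally."""
    return (old_mean * weeks_played + new_score) / (weeks_played + 1)

# Hypothetical team averaging 105 over 11 weeks that goes off for 150
hot = updated_mean(105, 11, 150)
# Hypothetical team averaging 140 over 11 weeks that lays an egg with 90
cold = updated_mean(140, 11, 90)

print(round(hot, 2), round(cold, 2))  # prints 108.75 135.83
```

Under those assumptions, a 45 point outburst only nudges the average up by 3.75 points, so one week matters, but not by much.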

### Methodology

For this analysis, I will keep it pretty simple (in my opinion). For each team, I'll build a distribution of likely scores based on past scores and pick a value out of that distribution at random. Better teams will be more likely to score more points, while bad teams will be more likely to score fewer points. After picking a points-scored value for a team, that value is added to the team's past scores so the distribution constantly updates and the next week can be estimated with (hopefully) better accuracy. This will allow slumps and hot streaks to be better accounted for by the model. After each team is given a simulated points total for each of the remaining three weeks, we will compare each team's points total to their opponent's and to the league median to figure out the win/loss record from that week and update the standings. I'll simulate this many times (50,000 seems reasonable). The fraction of these 50,000 simulations in which you make the playoffs is the probability that this model assigns to you making the playoffs.
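
Here is a stripped-down sketch of that loop using a made-up four-team league. This is not the actual code behind `LEAGUE.sim_remaining_season` (its internals aren't shown in this post); the team names, scores, schedule, and the `simulate_week` helper are all invented for illustration.

```python
import statistics
import numpy as np

rng = np.random.default_rng(seed=0)  # seeded so the sketch is repeatable

# Hypothetical past weekly scores for a four-team toy league (made-up numbers)
history = {
    "A": [120.0, 135.5, 110.2, 142.8],
    "B": [98.4, 105.0, 121.7, 93.3],
    "C": [130.1, 99.9, 115.6, 126.4],
    "D": [88.0, 112.3, 101.5, 95.7],
}
records = {name: [0, 0] for name in history}  # [wins, losses]

# Hypothetical remaining schedule: one list of matchups per week
schedule = [
    [("A", "B"), ("C", "D")],
    [("A", "C"), ("B", "D")],
    [("A", "D"), ("B", "C")],
]


def simulate_week(matchups):
    # Draw this week's score for every team from a normal fit to its history
    scores = {}
    for name, past in history.items():
        scores[name] = rng.normal(statistics.mean(past), statistics.stdev(past))
    median = statistics.median(scores.values())

    # Each team gets two results per week: one vs. its opponent...
    for home, away in matchups:
        winner, loser = (home, away) if scores[home] > scores[away] else (away, home)
        records[winner][0] += 1
        records[loser][1] += 1
    # ...and one vs. the league median
    for name, score in scores.items():
        records[name][0 if score > median else 1] += 1

    # Feed the simulated scores back in so next week's fit reflects them
    for name, score in scores.items():
        history[name].append(score)


for week in schedule:
    simulate_week(week)

print(records)
```

Running this whole thing many times and counting how often each team lands in a playoff spot is the idea behind the 50,000 simulations.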

This model has its shortcomings, but anything else would be needlessly complex and in my opinion not worth building for the three weeks left in the season. Possible shortcomings of this model that I can think of off the top of my head:

1. Points in fantasy are correlated. If I have Matt Stafford on my team and Nahome has Cooper Kupp, the number of points that we score is correlated to some degree. This occurs because any passing points Stafford scores come from throwing to one of his receivers, and Kupp is one of them. If Kupp has a big week and Nahome scores a bunch of points, I will necessarily score a decent amount of points because of a good QB performance. This model does not account for this cross-team correlation.
2. This model does not account for matchup strength. Player points are also correlated with the strength of a defense. This model only accounts for past performance regardless of team and does not account for the strength of a given defense that your running back is facing.
3. This model does not account for bye weeks and is based solely on a distribution of past points. A team is likely to score fewer points in a week when its key players are on bye. For instance, Jalen Hurts has a week 14 bye, but this model does not take that into account when calculating Nahome's projected points for that week.
4. This model does not account for player injuries. Will is missing both of his top picks and is likely to score far fewer points in the coming weeks, but this is not fully accounted for.

All these things are potential shortcomings of the model. With that said, I think that the model I have described above is a good enough estimate to use. Adding support for the shortcomings also means adding model complexity, and the tradeoff between that additional complexity and additional accuracy isn't enough to make it worth it for me to add to the model.

Now that this is out of the way, let's get into it!

## Observing the Distribution of Points Scored

Let's first try to figure out how point scoring is distributed in the league and by teams. This will inform which probability distribution we use to predict team scoring figures in the future. We can plot points both for the league as a whole and on a team by team basis.

In [6]:
fig = make_subplots(rows=5, cols=2, shared_yaxes=True,
                    subplot_titles=LEAGUE.get_names_for_plotting())

teams = list(LEAGUE.standings)
for i in range(5):
    for j in range(1, 3):
        # map subplot (row i+1, col j) to the team's index in the standings
        current = teams[2 * i + j - 1]

        fig.add_trace(
            go.Histogram(x=current.points_scored(), nbinsx=5),
            i+1, j
        )

fig.update_layout(title="Distribution of Weekly Points Scored by Team",
                  showlegend=False,
                  height=2000, width=1500
                  )

fig.show()


OK, there may not be enough information to see a distribution on a team by team level, so how are points distributed in the league?

In [7]:
fig = px.histogram(LEAGUE.get_points(), marginal="rug", labels=dict(value="Points Scored"))
fig.update_layout(showlegend=False)
fig.show()

Cool, I would say that this is approximately normal, so it should be safe to model points scored with a normal distribution. The center will be the average points scored for a given team, and the standard deviation will be calculated from that team's weekly scores. These will be wide distributions, but that makes sense given the large scoring swings that we see from teams on a weekly basis (fantasy football is inherently random!).
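
As a minimal sketch of what fitting one of these distributions looks like, here are the two numbers computed for a single team. The scores are made up, and using the sample standard deviation is my assumption here; the details of `build_team_point_models` aren't shown in this post.

```python
import statistics

# Hypothetical weekly scores for one team (made-up numbers)
weekly_scores = [112.4, 98.7, 131.2, 104.9, 120.3, 89.6]

center = statistics.mean(weekly_scores)   # center of the normal
spread = statistics.stdev(weekly_scores)  # sample standard deviation

print(f"mean = {center:.2f}, std dev = {spread:.2f}")
```

A pair like `(center, spread)` is the kind of thing that would be passed as `loc` and `scale` to `np.random.normal`.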

In [8]:
LEAGUE.build_team_point_models()

fig = make_subplots(rows=5, cols=2, shared_yaxes=True,
                    subplot_titles=LEAGUE.get_names_for_plotting())

teams = list(LEAGUE.standings)

for i in range(5):
    for j in range(1, 3):
        # map subplot (row i+1, col j) to the team's index in the standings
        current = teams[2 * i + j - 1]
        data = np.random.normal(loc=current.distribution[0], scale=current.distribution[1], size=5000)
        fig.add_trace(
            go.Histogram(x=data, nbinsx=100),
            i+1, j
        )

fig.update_layout(title="Theoretical Point Distribution based on Past Weekly Score",
                  showlegend=False,
                  height=2000, width=1500
                  )

fig.show()


We can see the theoretical point distributions based on scores up to this week in the season above. Note that for teams which are generally worse, we see a wider distribution and a slightly lower mean which manifests itself as a distribution shifted to the left. Now that we have these distributions we can select a random value from them, update the distribution, select a random value, and so on for the remaining weeks of the season. Below you can see an example simulation. The first column is wins, then losses, then the points scored on the season.

In [9]:
LEAGUE.sim_remaining_season(np.random.normal)
results = LEAGUE.simmed_results()

for division in results:
    division = sorted(division, reverse=True)
    for t in division:
        # t[0] = wins, t[1] = losses, t[2] = points scored, t[4] = roster id
        print(LEAGUE.name_pairings[LEAGUE.roster_pairings[t[4]]], t[0], t[1], t[2])
    print()

AssNTitties 18 10 1805.0465553446134
My Goat Loves You 16 12 1760.4121842549537
RB sanctuary 10 18 1557.2292097232298
the noés 10 18 1526.8624662500197
billcap 9 19 1500.6277027677029

Tree Leaves 20 8 1815.2482101025228
tealeaves 16 12 1760.3666240190769
Thicc King 15 13 1676.3547119010746
SideThicc #2 14 14 1649.8460735411456
The Brutherhood 12 16 1640.9147097269715



Now let's simulate 50,000 remainders of the season!

In [10]:
# takes a while to run (~15 min)
simulation = []
for i in range(50000):
    LEAGUE.sim_remaining_season(np.random.normal)
    simulation.append(LEAGUE.simmed_results())

We can now run through our simulated seasons to see where teams finished in their division standings.

In [11]:
final_standing = {t.roster_id: [0,0,0,0,0] for t in LEAGUE.standings}
probabilities = {t.roster_id: 0 for t in LEAGUE.standings}

for result in simulation:
    for division in result:
        division = sorted(division, reverse=True)
        for i, t in enumerate(division):
            final_standing[t[4]][i] += 1

for roster_id, counts in final_standing.items():
    # share of simulations with a top-three (playoff) finish
    playoff_share = sum(counts[0:3]) / sum(counts)
    print(LEAGUE.name_pairings[LEAGUE.roster_pairings[roster_id]], counts)
    probabilities[roster_id] = playoff_share * 100
    print(playoff_share)

# percent of simulations ending in each division position, per team
finish_probabilities = {
    roster_id: [(num / sum(counts)) * 100 for num in counts]
    for roster_id, counts in final_standing.items()
}

AssNTitties [49958, 42, 0, 0, 0]
1.0
Tree Leaves [44905, 4158, 903, 34, 0]
0.99932
tealeaves [3213, 24876, 15475, 5218, 1218]
0.87128
Thicc King [1813, 17174, 22408, 7640, 965]
0.8279
SideThicc #2 [69, 3603, 9518, 27103, 9707]
0.2638
My Goat Loves You [42, 44392, 4826, 724, 16]
0.9852
The Brutherhood [0, 189, 1696, 10005, 38110]
0.0377
the noés [0, 2573, 21163, 16634, 9630]
0.47472
RB sanctuary [0, 881, 11817, 17042, 20260]
0.25396
billcap [0, 2112, 12194, 15600, 20094]
0.28612
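
To make the arithmetic concrete, here is the calculation for a single team, using the tealeaves counts copied from the output above. The variable names are just for this example.

```python
# Finish counts for tealeaves from the output above: [1st, 2nd, 3rd, 4th, 5th]
tealeaves_counts = [3213, 24876, 15475, 5218, 1218]

total = sum(tealeaves_counts)  # number of simulated seasons
# top three in the division make the playoffs
playoff_pct = sum(tealeaves_counts[:3]) / total * 100
# percent of simulations ending in each division position
finish_pct = [round(n / total * 100, 2) for n in tealeaves_counts]

print(total, round(playoff_pct, 2), finish_pct)
```

The counts add up to 50,000, and 43,564 of those seasons end in a top-three finish, which is the 0.87128 printed above.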


In [12]:
for roster_id in probabilities:
    probabilities[roster_id] = round(probabilities[roster_id], 2)

for roster_id in finish_probabilities:
    finish_probabilities[roster_id] = [round(prob, 2) for prob in finish_probabilities[roster_id]]

We see from the simulation output above the playoff likelihoods for each team. We can also see this output visually below before diving into it in our final cell.

In [13]:
fig = make_subplots(rows=5, cols=2, shared_yaxes=True,
                    subplot_titles=LEAGUE.get_names_for_plotting())

teams = list(LEAGUE.standings)

for i in range(5):
    for j in range(1, 3):
        # map subplot (row i+1, col j) to the team's index in the standings
        current = teams[2 * i + j - 1]
        fig.add_trace(
            go.Bar(y=final_standing[current.roster_id],
                   x=["1st", "2nd", "3rd", "4th", "5th"]),
            i+1, j
        )

fig.update_layout(title="Likelihood of Each Team's Final Standing",
                  showlegend=False,
                  height=2000, width=1500
                  )

fig.show()


Now that we have simulated our season 50,000 times and gotten the output, we can draw some conclusions about each team's likelihood of making the playoffs. Below I print out the calculated probability that each team makes the playoffs and the probability they finish in each of the five division positions.

In [14]:
t = PrettyTable(['Team Name', "Probability of Making Playoffs"])
for team in LEAGUE.standings:
    t.add_row([team.name, probabilities[team.roster_id]])
    
print(t)

+-------------------+--------------------------------+
|     Team Name     | Probability of Making Playoffs |
+-------------------+--------------------------------+
|    AssNTitties    |             100.0              |
|    Tree Leaves    |             99.93              |
|     tealeaves     |             87.13              |
|     Thicc King    |             82.79              |
|    SideThicc #2   |             26.38              |
| My Goat Loves You |             98.52              |
|  The Brutherhood  |              3.77              |
|      the noés     |             47.47              |
|    RB sanctuary   |              25.4              |
|      billcap      |             28.61              |
+-------------------+--------------------------------+


We see that Nahome is the only one to have clinched his ticket to the postseason at the moment, with Praveen and Ben following close behind. The fight for the #3 seed in the Salesman division is tight, with Noe having a 47 percent chance of making it and Will and Stephane each sitting between 25 and 29 percent. I think the model slightly overrates Will's chances because it doesn't fully account for his injuries. For Tegran, things are looking a bit tough, as he only has a 3.77% chance of making it per my model.
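
One sanity check on whether 50,000 simulations is enough: the usual standard error for a simulated proportion tells us how much these percentages would wobble from run to run. This is a rough sketch that assumes the simulated seasons are independent, and `monte_carlo_std_error` is a helper I made up for this example.

```python
import math


def monte_carlo_std_error(p, n_sims):
    """Standard error of a proportion estimated from n_sims independent runs."""
    return math.sqrt(p * (1 - p) / n_sims)


n_sims = 50_000
# Playoff shares taken from the simulation output above
for name, p in [("Tegran", 0.0377), ("Noe", 0.47472)]:
    se = monte_carlo_std_error(p, n_sims)
    print(f"{name}: {p * 100:.2f}% +/- {se * 100:.2f} percentage points")
```

The simulation noise works out to a fraction of a percentage point, so the run-to-run wobble is small. That says nothing about the model's own shortcomings listed earlier, which are likely the bigger source of error.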

In [15]:
places = ("1st", "2nd", "3rd", "4th", "5th")
t = PrettyTable(['Team Name'] + [f"P(Finishing {place} in Division)" for place in places])
for team in LEAGUE.standings:
    # one row per team: name followed by its five finish percentages
    t.add_row([team.name] + finish_probabilities[team.roster_id])

print(t)


+-------------------+------------------------------+------------------------------+------------------------------+------------------------------+------------------------------+
|     Team Name     | P(Finishing 1st in Division) | P(Finishing 2nd in Division) | P(Finishing 3rd in Division) | P(Finishing 4th in Division) | P(Finishing 5th in Division) |
+-------------------+------------------------------+------------------------------+------------------------------+------------------------------+------------------------------+
|    AssNTitties    |            99.92             |             0.08             |             0.0              |             0.0              |             0.0              |
|    Tree Leaves    |            89.81             |             8.32             |             1.81             |             0.07             |             0.0              |
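|     tealeaves     |             6.43             |            49.75             |            30.95             |            10.44             |             2.44             |
|     Thicc King    |             3.63             |            34.35             |            44.82             |            15.28             |             1.93             |
|    SideThicc #2   |             0.14             |             7.21             |            19.04             |            54.21             |            19.41             |
| My Goat Loves You |             0.08             |            88.78             |             9.65             |             1.45             |             0.03             |
|  The Brutherhood  |             0.0              |             0.38             |             3.39             |            20.01             |            76.22             |
|      the noés     |             0.0              |             5.15             |            42.33             |            33.27             |            19.26             |
|    RB sanctuary   |             0.0              |             1.76             |            23.63             |            34.08             |            40.52             |
|      billcap      |             0.0              |             4.22             |            24.39             |             31.2             |            40.19             |
+-------------------+------------------------------+------------------------------+------------------------------+------------------------------+------------------------------+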

Above we see the percent chance of finishing in a given spot in your division. Nahome finished in the top two of his division in every simulation, while a few teams never finished first in theirs. Hopefully you guys enjoyed the analysis. Let me know if you have any questions or concerns!