This model is meant to forecast the outcome of the standings here: https://udisclive.com/players?t=standings&z=dgpt

To do so, I need to:
- load in events, and details about those events
- load in the players playing in those events, and details about their skill levels
- generate N runs of a model that forecasts each player's results N times at each event they are entered in
- use the generated results to assign points to the players
- sum the points based on the DGPT rules
- aggregate the ranks for each player at the end of the year (for example, see if Paige Pierce has a 90% chance of 1st place, 5% chance of 2nd place, etc.)

Things that inspired this / what the output should look like
- 538 soccer: https://projects.fivethirtyeight.com/soccer-predictions/champions-league/
- my own attempt at this same thing but in google sheets: https://docs.google.com/spreadsheets/d/19IwKCt5DI77koId916DawdJd1mNM7RQi3Nzlpmsa23Q/edit#gid=0 (see  ForecastFinal tab. This doc does ALL of the logic i want in here, but it's limited to N=50 runs due to how long it takes the formulas to update)

Load In Packages

In [1]:
import pandas as pd
import numpy as np

Load in Events - Details from PDGA.com using Event ID as unique Identifier
Would eventually like to make this step scrape the site, but we can get there later

Load Players signed up for events - Details from PDGA.com with PDGA# and Event ID as unique Identifier
This also needs to be scraped eventually, and will need to include a "load date" column since ratings change once a month. For now, just a csv.

Load Player Scores
Placeholder - eventually I'll need to include the rounds that have already occured (since those scores are 100% likely to happen). I haven't written any code to account for this yet.

In [2]:
events = pd.read_csv('events.csv')

In [3]:
eventplayers = pd.read_csv('eventplayers.csv')

In [4]:
###eventplayerscores = pd.read_csv('eventplayerscores.csv')

Event Mean Regression Creation

This cell creates a simple mean regression - the farther out the event, the more likely player ratings are to change. This variable is mostly a guess right now, and is not coming from any valid regression of how player ratings change over time. Also I had to take the absolute value so that the event that already occured didn't break everything. In real life, I should just have those scores already loaded in.

In [5]:
events['Event Date'] = pd.to_datetime(events['Event Date'])
events['Today'] = pd.to_datetime("now")
events['Time to Event'] = events['Event Date'] - events['Today']
events['Time to Event Number'] = pd.to_numeric(events['Time to Event'])/(1000000000*60*60*24)
events['Event Mean Regression'] = np.log10(np.absolute(events['Time to Event Number']))

Generate Average Rating At Each Event - useful on its own, but needed to calculate the avg round scores. Ideally would use historical data here, but really just need a baseline so I have something to work with

Declare StdDev - 6.82 is a number I grabbed from a couple 2020 tournaments to use later on in generating the normal distribution and (you guessed it) probably needs more work to be better.

In [6]:
eventplayers['Rating'] = pd.to_numeric(eventplayers['Rating'])
means = eventplayers.groupby('Event ID')['Rating'].mean()
events = pd.merge(events,means,on = 'Event ID',how = 'left')
events['StdDev'] = 6.82

events

Unnamed: 0,Event Name,Event ID,Event Type,EventLength,Event Date,Today,Time to Event,Time to Event Number,Event Mean Regression,Rating,StdDev
0,LVC,47877,Elite,4,2021-02-25,2021-03-11 20:29:42.650988,-15 days +03:30:17.349012,-14.853966,1.171842,897.688889,6.82
1,Waco,48685,Elite,3,2021-03-12,2021-03-11 20:29:42.650988,0 days 03:30:17.349012,0.146034,-0.835546,929.269231,6.82
2,Open at Belton,47888,Silver,3,2021-03-19,2021-03-11 20:29:42.650988,7 days 03:30:17.349012,7.146034,0.854065,915.931034,6.82
3,Vintage Open,48119,Silver,3,2021-04-08,2021-03-11 20:29:42.650988,27 days 03:30:17.349012,27.146034,1.433706,922.85,6.82
4,Jonesboro,48567,Elite,3,2021-04-16,2021-03-11 20:29:42.650988,35 days 03:30:17.349012,35.146034,1.545876,930.928571,6.82
5,Mid-America,48121,Silver,3,2021-04-23,2021-03-11 20:29:42.650988,42 days 03:30:17.349012,42.146034,1.624757,906.866667,6.82
6,Goat Hill,48283,Silver,3,2021-05-07,2021-03-11 20:29:42.650988,56 days 03:30:17.349012,56.146034,1.749319,912.736842,6.82
7,OTBO,48172,Elite,3,2021-05-14,2021-03-11 20:29:42.650988,63 days 03:30:17.349012,63.146034,1.800346,917.896552,6.82
8,Portland,48686,Elite,3,2021-06-04,2021-03-11 20:29:42.650988,84 days 03:30:17.349012,84.146034,1.925034,910.454545,6.82
9,Clash at the Canyons,48112,Silver,3,2021-07-02,2021-03-11 20:29:42.650988,112 days 03:30:17.349012,112.146034,2.049784,914.058824,6.82


Calculate Single Round Expected Scores

Here we declare "rating points per stroke" so that we can compare ex. a 1000 to 900 rated player in a single round. Then we turn that into the basic "expected round score" that can be used in the upcoming randomization.

In [7]:
rating_points_per_stroke = 6
eventplayerratings = pd.merge(events,eventplayers, on = 'Event ID',how = 'right')
eventplayerratings['Single Round Expected Score'] = -1*((eventplayerratings['Rating_y']-eventplayerratings['Rating_x'])/(rating_points_per_stroke+eventplayerratings['Event Mean Regression']))

eventplayerratings

Unnamed: 0,Event Name,Event ID,Event Type,EventLength,Event Date,Today,Time to Event,Time to Event Number,Event Mean Regression,Rating_x,StdDev,Event,Name,PDGA#,Rating_y,Single Round Expected Score
0,LVC,47877,Elite,4,2021-02-25,2021-03-11 20:29:42.650988,-15 days +03:30:17.349012,-14.853966,1.171842,897.688889,6.82,LVC,Paige Pierce,29190,991,-13.010759
1,LVC,47877,Elite,4,2021-02-25,2021-03-11 20:29:42.650988,-15 days +03:30:17.349012,-14.853966,1.171842,897.688889,6.82,LVC,Sarah Hokom,34563,967,-9.664338
2,LVC,47877,Elite,4,2021-02-25,2021-03-11 20:29:42.650988,-15 days +03:30:17.349012,-14.853966,1.171842,897.688889,6.82,LVC,Catrina Allen,44184,962,-8.967167
3,LVC,47877,Elite,4,2021-02-25,2021-03-11 20:29:42.650988,-15 days +03:30:17.349012,-14.853966,1.171842,897.688889,6.82,LVC,Hailey King,81351,962,-8.967167
4,LVC,47877,Elite,4,2021-02-25,2021-03-11 20:29:42.650988,-15 days +03:30:17.349012,-14.853966,1.171842,897.688889,6.82,LVC,Valerie Mandujano,62879,956,-8.130562
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
313,PCS Sula Open,47446,Elite,4,2021-07-07,2021-03-11 20:29:42.650988,117 days 03:30:17.349012,117.146034,2.068728,952.200000,6.82,PCS Sula Open,Kona Star Panis,27832,926,3.247104
314,PCS Sula Open,47446,Elite,4,2021-07-07,2021-03-11 20:29:42.650988,117 days 03:30:17.349012,117.146034,2.068728,952.200000,6.82,PCS Sula Open,Paige Pierce,29190,991,-4.808689
315,PCS Sula Open,47446,Elite,4,2021-07-07,2021-03-11 20:29:42.650988,117 days 03:30:17.349012,117.146034,2.068728,952.200000,6.82,PCS Sula Open,Rebecca Cox,32917,929,2.875298
316,PCS Sula Open,47446,Elite,4,2021-07-07,2021-03-11 20:29:42.650988,117 days 03:30:17.349012,117.146034,2.068728,952.200000,6.82,PCS Sula Open,Sarah Hokom,34563,967,-1.834242


Generate Random Scores

With Single Round expected Scores as the Mean, StdDev as the Standard Deviation, we can now generate 1 random score! 
However, we need to generate N number of runs to make this an actual forecast.
We also need to sum 3 pretend scores for 3 round events, and 4 pretend scores for 4 round events.
We need to cast these as integers - no fractional scores in disc golf!

Anyway, this is where I'm stuck! Do I...

Create N columns, each with a random score?

Put N values in an array in 1 column?

Create a eventplayerscores dataframe with (EventPlayerRatings Number of Rows * N) rows, with a "Run #" Column incrementing from 1 to N

How do I take into account scores that have already occured?

How do I correctly perform the loop?

In [8]:
eventRound = eventplayerratings['EventLength']
print(eventRound)
eventplayerratings['Random Scores'] = 0

0      4
1      4
2      4
3      4
4      4
      ..
313    4
314    4
315    4
316    4
317    4
Name: EventLength, Length: 318, dtype: int64


In [9]:
for index, row in eventplayerratings.iterrows():
    while row['EventLength']> 0:
        row['EventLength'] -= 1
        eventplayerratings['Random Scores'] = eventplayerratings['Random Scores'] + np.random.normal(
        eventplayerratings['Single Round Expected Score'].values,
        eventplayerratings['StdDev'].values)

In [10]:
eventplayerratings['Random Scores2'] = np.random.normal(
        eventplayerratings['Single Round Expected Score'].values,
        eventplayerratings['StdDev'].values)

eventplayerratings

Unnamed: 0,Event Name,Event ID,Event Type,EventLength,Event Date,Today,Time to Event,Time to Event Number,Event Mean Regression,Rating_x,StdDev,Event,Name,PDGA#,Rating_y,Single Round Expected Score,Random Scores,Random Scores2
0,LVC,47877,Elite,4,2021-02-25,2021-03-11 20:29:42.650988,-15 days +03:30:17.349012,-14.853966,1.171842,897.688889,6.82,LVC,Paige Pierce,29190,991,-13.010759,-13450.140971,-23.492327
1,LVC,47877,Elite,4,2021-02-25,2021-03-11 20:29:42.650988,-15 days +03:30:17.349012,-14.853966,1.171842,897.688889,6.82,LVC,Sarah Hokom,34563,967,-9.664338,-10368.206576,-6.904954
2,LVC,47877,Elite,4,2021-02-25,2021-03-11 20:29:42.650988,-15 days +03:30:17.349012,-14.853966,1.171842,897.688889,6.82,LVC,Catrina Allen,44184,962,-8.967167,-9291.361413,-6.195223
3,LVC,47877,Elite,4,2021-02-25,2021-03-11 20:29:42.650988,-15 days +03:30:17.349012,-14.853966,1.171842,897.688889,6.82,LVC,Hailey King,81351,962,-8.967167,-8892.964275,-12.149979
4,LVC,47877,Elite,4,2021-02-25,2021-03-11 20:29:42.650988,-15 days +03:30:17.349012,-14.853966,1.171842,897.688889,6.82,LVC,Valerie Mandujano,62879,956,-8.130562,-8049.261343,-15.591629
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
313,PCS Sula Open,47446,Elite,4,2021-07-07,2021-03-11 20:29:42.650988,117 days 03:30:17.349012,117.146034,2.068728,952.200000,6.82,PCS Sula Open,Kona Star Panis,27832,926,3.247104,3406.451924,9.273572
314,PCS Sula Open,47446,Elite,4,2021-07-07,2021-03-11 20:29:42.650988,117 days 03:30:17.349012,117.146034,2.068728,952.200000,6.82,PCS Sula Open,Paige Pierce,29190,991,-4.808689,-5066.428416,-23.626189
315,PCS Sula Open,47446,Elite,4,2021-07-07,2021-03-11 20:29:42.650988,117 days 03:30:17.349012,117.146034,2.068728,952.200000,6.82,PCS Sula Open,Rebecca Cox,32917,929,2.875298,3013.987122,4.789380
316,PCS Sula Open,47446,Elite,4,2021-07-07,2021-03-11 20:29:42.650988,117 days 03:30:17.349012,117.146034,2.068728,952.200000,6.82,PCS Sula Open,Sarah Hokom,34563,967,-1.834242,-1563.427571,-4.659140
