This model is meant to forecast the outcome of the standings here: https://udisclive.com/players?t=standings&z=dgpt 
Logic for how to attribute points is here: https://udisc.com/blog/post/how-disc-golf-pro-tour-points-work-why-they-matter?fbclid=IwAR1VwYCkl7DCkgc93G5qujSDxCSWqg5HMWLv7dVhu_c4GXchW_P7fJO7MSo

To do so, I need to:
- load in events, and details about those events
- load in the players playing in those events, and details about their skill levels
- generate N runs of a model that forecasts each player's results N times at each event they are entered in
- use the generated results to assign points to the players
- sum the points based on the DGPT rules
- aggregate the ranks for each player at the end of the year (for example, see if Paige Pierce has a 90% chance of 1st place, 5% chance of 2nd place, etc.)

Things that inspired this / what the output should look like
- 538 soccer: https://projects.fivethirtyeight.com/soccer-predictions/champions-league/
- my own attempt at this same thing but in google sheets: https://docs.google.com/spreadsheets/d/19IwKCt5DI77koId916DawdJd1mNM7RQi3Nzlpmsa23Q/edit#gid=0 (see  ForecastFinal tab. This doc does ALL of the logic i want in here, but it's limited to N=50 runs due to how long it takes the formulas to update)

Load In Packages

In [1]:
import pandas as pd
import numpy as np

Load in Events - Details from PDGA.com using Event ID as unique Identifier
Would eventually like to make this step scrape the site, but we can get there later

Load Players signed up for events - Details from PDGA.com with PDGA# and Event ID as unique Identifier
This also needs to be scraped eventually, and will need to include a "load date" column since ratings change once a month. For now, just a csv.

Load Player Scores
Placeholder - eventually I'll need to include the rounds that have already occured (since those scores are 100% likely to happen). I haven't written any code to account for this yet.

In [2]:
events = pd.read_csv('events.csv')

In [3]:
eventplayers = pd.read_csv('eventplayers20210321.csv')

In [4]:
pointsLogic = pd.read_csv('PointsLogic.csv')

In [5]:
###eventplayerscores = pd.read_csv('eventplayerscores.csv')

Event Mean Regression Creation

This cell creates a simple mean regression - the farther out the event, the more likely player ratings are to change. This variable is mostly a guess right now, and is not coming from any valid regression of how player ratings change over time. Also I had to take the absolute value so that the event that already occured didn't break everything. In real life, I should just have those scores already loaded in.

In [6]:
events['Event Date'] = pd.to_datetime(events['Event Date'])
events['Today'] = pd.to_datetime("now")
events['Time to Event'] = events['Event Date'] - events['Today']
events['Time to Event Number'] = pd.to_numeric(events['Time to Event'])/(1000000000*60*60*24)
events['Event Mean Regression'] = np.log10(np.absolute(events['Time to Event Number']))

Generate Average Rating At Each Event - useful on its own, but needed to calculate the avg round scores. Ideally would use historical data here, but really just need a baseline so I have something to work with

Declare StdDev - 6.82 is a number I grabbed from a couple 2020 tournaments to use later on in generating the normal distribution and (you guessed it) probably needs more work to be better.

In [7]:
eventplayers['Rating'] = pd.to_numeric(eventplayers['Rating'])
means = eventplayers.groupby('Event ID')['Rating'].mean()
events = pd.merge(events,means,on = 'Event ID',how = 'left')
events['StdDev'] = 6.82

###events

Calculate Single Round Expected Scores

Here we declare "rating points per stroke" so that we can compare ex. a 1000 to 900 rated player in a single round. Then we turn that into the basic "expected round score" that can be used in the upcoming randomization.

In [8]:
rating_points_per_stroke = 6
eventplayerratings = pd.merge(events,eventplayers, on = 'Event ID',how = 'right')
eventplayerratings['Single Round Expected Score'] = -1*((eventplayerratings['Rating_y']-eventplayerratings['Rating_x'])/(rating_points_per_stroke+eventplayerratings['Event Mean Regression']))

###eventplayerratings

Generate Random Scores

In [9]:
###Create N number of copies of the dataframe
eventplayereventratings = pd.concat([eventplayerratings for i in range(1000)],
          ignore_index=True)

eventplayerroundratings = pd.concat([eventplayerratings for i in range(1000)],
          ignore_index=True)

###Number each iteration of the model
eventplayereventratings['ModelRunNumber']= eventplayereventratings.groupby(['Event ID','PDGANumber'])['PDGANumber'].rank(method='first')
eventplayerroundratings['ModelRunNumber']= eventplayerroundratings.groupby(['Event ID','PDGANumber'])['PDGANumber'].rank(method='first')


###Copy the copies by # of rounds, then # the rounds
eventplayerroundratings = eventplayerroundratings.iloc[np.arange(len(eventplayerroundratings)).repeat(eventplayerroundratings['EventLength'])]
eventplayerroundratings['RoundNumber']= eventplayerroundratings.groupby(['Event ID','PDGANumber','ModelRunNumber'])['PDGANumber'].rank(method='first')

#Generate the Scores, and Round them to Integers
eventplayerroundratings['RandomScores'] = np.random.normal(
        eventplayerroundratings['Single Round Expected Score'].values,
        eventplayerroundratings['StdDev'].values)
eventplayerroundratings['RandomScores'] = eventplayerroundratings['RandomScores'].round(0)
eventplayerroundratings['RandomScores'] = eventplayerroundratings.RandomScores.astype(int)
eventplayerroundratings['RoundNumber'] = eventplayerroundratings.RoundNumber.astype(int)
eventplayerroundratings['ModelRunNumber'] = eventplayerroundratings.ModelRunNumber.astype(int)

###eventplayerroundratings

In [10]:
ModelRunScore = eventplayerroundratings.groupby(['Event ID','PDGANumber','ModelRunNumber'],as_index=False).sum()
ModelRunScore = ModelRunScore.drop(['EventLength','Time to Event Number','Event Mean Regression','Rating_x','StdDev','Rating_y','Single Round Expected Score','RoundNumber'], axis = 1)
###ModelRunScore

In [11]:
eventplayereventratings = pd.merge(eventplayereventratings,ModelRunScore, on = ['Event ID','PDGANumber','ModelRunNumber'],how = 'left')
eventplayereventratings['ModelRunNumber'] = eventplayereventratings.ModelRunNumber.astype(int)

###eventplayereventratings

In [13]:
eventplayereventratings['EventRankBeforePlayoff'] = eventplayereventratings.groupby(['Event ID','ModelRunNumber'])['RandomScores'].rank(method='min')
eventplayereventratings['EventRankForPoints'] = eventplayereventratings.groupby(['Event ID','ModelRunNumber'])['RandomScores'].rank(method='first')
eventplayereventratings['EventRankBeforePlayoff'] = eventplayereventratings.EventRankBeforePlayoff.astype(int)
eventplayereventratings['EventRankForPoints'] = eventplayereventratings.EventRankForPoints.astype(int)

###Ridiculous amount of code to break ties in playoffs. Two things it should do that it doesn't : account for strength of player, account for the low % chance that a playoff can't occur
number_of_ties_before_playoff = eventplayereventratings.groupby(['EventRankBeforePlayoff','Event ID','ModelRunNumber'],as_index=False)['EventRankBeforePlayoff'].size()
first_place_ties = pd.merge(eventplayereventratings,number_of_ties_before_playoff, on = ['Event ID','EventRankBeforePlayoff','ModelRunNumber'],how = 'right')
first_place_ties = first_place_ties[first_place_ties['size']>1]
first_place_ties = first_place_ties[first_place_ties['EventRankBeforePlayoff']==1]
first_place_ties = first_place_ties.drop(['Event Date','EventLength','Time to Event Number','Event Mean Regression','Rating_x','StdDev','Rating_y','Single Round Expected Score','RandomScores','EventRankForPoints','Event Name','EventType','Today','Time to Event','Name'], axis = 1)
first_place_ties['Random'] = np.random.rand(len(first_place_ties.index))
first_place_ties['RankAfterPlayoff']= first_place_ties.groupby(['Event ID','ModelRunNumber'])['Random'].rank(method='first')
first_place_ties['RankAfterPlayoff'] = first_place_ties.RankAfterPlayoff.astype(int)
first_place_ties['RankAfterPlayoffWithSecondPlaceTies']= first_place_ties['RankAfterPlayoff']
first_place_ties['RankAfterPlayoffWithSecondPlaceTies'] = np.where((first_place_ties.RankAfterPlayoff >= 2),2,first_place_ties.RankAfterPlayoffWithSecondPlaceTies)
first_place_ties = first_place_ties.drop(['EventRankBeforePlayoff','Random','RankAfterPlayoff','size'], axis = 1)

###first_place_ties

In [14]:
eventplayereventratings = pd.merge(eventplayereventratings,first_place_ties, on = ['Event ID','PDGANumber','ModelRunNumber'],how = 'outer')
eventplayereventratings.RankAfterPlayoffWithSecondPlaceTies.fillna(eventplayereventratings.EventRankBeforePlayoff, inplace = True)
eventplayereventratings['RankAfterPlayoffWithSecondPlaceTies'] = eventplayereventratings.RankAfterPlayoffWithSecondPlaceTies.astype(int)

In [15]:
###find the number of ties
number_of_ties = eventplayereventratings.groupby(['RankAfterPlayoffWithSecondPlaceTies','Event ID','ModelRunNumber'],as_index=False)['RankAfterPlayoffWithSecondPlaceTies'].size()
number_of_ties.rename(columns = {'size':'NumberOfTies'}, inplace = True)
###merge the number of ties back into the thing
eventplayereventratings = pd.merge(eventplayereventratings,number_of_ties, on = ['Event ID','RankAfterPlayoffWithSecondPlaceTies','ModelRunNumber'],how = 'right')
###eventplayereventratings
###number_of_ties

### calculate points for each run
eventplayereventratings = pd.merge(eventplayereventratings,pointsLogic, on = 'EventRankForPoints',how = 'left')
eventplayereventratings = eventplayereventratings.drop(['Rating_x','EventRankBeforePlayoff','EventRankForPoints','StdDev','Today','Time to Event','Rating_y'], axis = 1)
eventplayereventratings

Unnamed: 0,Event Name,Event ID,EventType,EventLength,Event Date,Time to Event Number,Event Mean Regression,Name,PDGANumber,Single Round Expected Score,ModelRunNumber,RandomScores,RankAfterPlayoffWithSecondPlaceTies,NumberOfTies,Points
0,PCS Sula Open,47446,Elite,4,2021-07-07,107.792999,2.032591,Kristin Tattar,73986,-4.099371,1,-47,1,1,100
1,PCS Sula Open,47446,Elite,4,2021-07-07,107.792999,2.032591,Paige Pierce,29190,-8.083142,2,-28,1,1,100
2,PCS Sula Open,47446,Elite,4,2021-07-07,107.792999,2.032591,Paige Pierce,29190,-8.083142,3,-48,1,1,100
3,PCS Sula Open,47446,Elite,4,2021-07-07,107.792999,2.032591,Paige Pierce,29190,-8.083142,4,-40,1,1,100
4,PCS Sula Open,47446,Elite,4,2021-07-07,107.792999,2.032591,Paige Pierce,29190,-8.083142,5,-35,1,1,100
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
400995,LVC,47877,Elite,4,2021-02-25,-24.207001,1.383941,Samii The Tutu Maes,84007,34.513795,996,140,46,1,4
400996,LVC,47877,Elite,4,2021-02-25,-24.207001,1.383941,Samii The Tutu Maes,84007,34.513795,997,141,46,1,4
400997,LVC,47877,Elite,4,2021-02-25,-24.207001,1.383941,Samii The Tutu Maes,84007,34.513795,998,165,46,1,4
400998,LVC,47877,Elite,4,2021-02-25,-24.207001,1.383941,Samii The Tutu Maes,84007,34.513795,999,112,46,1,4


In [16]:
#Calculate Points for Ties
tiePoints = eventplayereventratings.groupby(['RankAfterPlayoffWithSecondPlaceTies','Event ID','EventType','ModelRunNumber','NumberOfTies'],as_index=False).sum()
tiePoints = tiePoints.drop(['EventLength','Time to Event Number','Event Mean Regression','PDGANumber','Single Round Expected Score','RandomScores'], axis = 1)
tiePoints['Actual Points'] = tiePoints['Points']/tiePoints['NumberOfTies']
tiePoints['Actual Points'] = np.where(tiePoints['EventType'] == 'Silver', tiePoints['Actual Points']/4,tiePoints['Actual Points'])
tiePoints = tiePoints.drop(['Points','NumberOfTies','EventType'], axis = 1)
tiePoints

Unnamed: 0,RankAfterPlayoffWithSecondPlaceTies,Event ID,ModelRunNumber,Actual Points
0,1,47446,1,100.0
1,1,47446,2,100.0
2,1,47446,3,100.0
3,1,47446,4,100.0
4,1,47446,5,100.0
...,...,...,...,...
332026,46,47877,996,4.0
332027,46,47877,997,4.0
332028,46,47877,998,4.0
332029,46,47877,999,4.0


In [17]:
eventplayereventratings = pd.merge(eventplayereventratings,tiePoints, on = ['RankAfterPlayoffWithSecondPlaceTies','Event ID','ModelRunNumber'],how = 'left')
eventplayereventratings['EliteSeriesRank'] = eventplayereventratings[eventplayereventratings.EventType == 'Elite'].groupby(['PDGANumber','ModelRunNumber'])['Actual Points'].rank(method='first')
eventplayereventratings['EliteSeriesRank'] = eventplayereventratings['EliteSeriesRank'].fillna(100)
eventplayereventratings['EliteSeriesRank'] = eventplayereventratings.EliteSeriesRank.astype(int)
###eventplayereventratings[eventplayereventratings.EventType == 'Elite']
eventplayereventratings['SilverSeriesRank'] = eventplayereventratings[eventplayereventratings.EventType == 'Silver'].groupby(['PDGANumber','ModelRunNumber'])['Actual Points'].rank(method='first')
eventplayereventratings['SilverSeriesRank'] = eventplayereventratings['SilverSeriesRank'].fillna(100)
eventplayereventratings['SilverSeriesRank'] = eventplayereventratings.SilverSeriesRank.astype(int)


In [18]:
eventplayereventratings['KeepScore'] = np.where((eventplayereventratings['EliteSeriesRank']<=8)|(eventplayereventratings['SilverSeriesRank']<=3),'Yes','No')
eventplayereventratings.query('ModelRunNumber == "1" and PDGANumber == "29190"')

Unnamed: 0,Event Name,Event ID,EventType,EventLength,Event Date,Time to Event Number,Event Mean Regression,Name,PDGANumber,Single Round Expected Score,ModelRunNumber,RandomScores,RankAfterPlayoffWithSecondPlaceTies,NumberOfTies,Points,Actual Points,EliteSeriesRank,SilverSeriesRank,KeepScore
8000,OTBO,48172,Elite,3,2021-05-14,53.792999,1.730726,Paige Pierce,29190,-10.795176,1,-38,1,1,100,100.0,7,100,Yes
9000,Goat Hill,48283,Silver,3,2021-05-07,46.792999,1.670181,Paige Pierce,29190,-13.857603,1,-45,1,1,100,25.0,100,1,Yes
14000,Idlewild,48688,Elite,3,2021-08-13,144.792999,2.160748,Paige Pierce,29190,-6.138029,1,-33,1,1,100,100.0,8,100,Yes
15000,MVP Open,49214,Elite,3,2021-09-03,165.792999,2.219566,Paige Pierce,29190,-7.360486,1,-40,1,1,100,100.0,9,100,No
16000,PCS Sula Open,47446,Elite,4,2021-07-07,107.792999,2.032591,Paige Pierce,29190,-8.083142,1,-39,2,1,85,85.0,4,100,Yes
28049,Jonesboro,48567,Elite,3,2021-04-16,25.792999,1.411502,Paige Pierce,29190,-8.988825,1,-44,2,1,85,85.0,5,100,Yes
29153,Waco,48685,Elite,3,2021-03-12,-9.207001,0.964118,Paige Pierce,29190,-10.477816,1,-37,2,1,85,85.0,6,100,Yes
37654,Ledgestone,47981,Elite,4,2021-08-05,136.792999,2.136064,Paige Pierce,29190,-8.303625,1,-22,3,1,75,75.0,3,100,Yes
43851,DGLO,48338,Elite,3,2021-07-23,123.792999,2.092696,Paige Pierce,29190,-9.422076,1,-28,3,2,75,72.0,2,100,Yes
184401,LVC,47877,Elite,4,2021-02-25,-24.207001,1.383941,Paige Pierce,29190,-12.48008,1,-24,13,2,44,43.0,1,100,Yes


In [19]:
ModelRunScore = eventplayereventratings[eventplayereventratings.KeepScore =='Yes'].groupby(['PDGANumber','ModelRunNumber'],as_index=False).sum()
ModelRunScore = ModelRunScore.drop(['Event ID','EventLength','Time to Event Number','Event Mean Regression','Single Round Expected Score','RankAfterPlayoffWithSecondPlaceTies','RandomScores','NumberOfTies','Points','EliteSeriesRank','SilverSeriesRank'], axis = 1)

In [20]:
ModelRunScore

Unnamed: 0,PDGANumber,ModelRunNumber,Actual Points
0,7438,1,10.75
1,7438,2,9.75
2,7438,3,13.00
3,7438,4,13.00
4,7438,5,11.75
...,...,...,...
132995,147568,996,7.00
132996,147568,997,12.00
132997,147568,998,12.00
132998,147568,999,6.00


In [21]:
ModelRunScore['FinalStandings'] = ModelRunScore.groupby(['ModelRunNumber'])['Actual Points'].rank(ascending=False,method='min').astype(int)
EventPlayersUnique = eventplayers.drop_duplicates(subset=['PDGANumber','Name'])
EventPlayersUnique

Unnamed: 0,Event ID,Name,PDGANumber,Rating
0,47446,Kona Star Panis,27832,927
1,47446,Paige Pierce,29190,996
2,47446,Rebecca Cox,32917,934
3,47446,Sarah Hokom,34563,963
4,47446,Jessica Weese,50656,950
...,...,...,...,...
366,48686,Amy Lewis,61950,914
372,48686,Alison Mabbutt,81569,872
373,48686,Ruby Hall,88525,864
377,48686,Ashlyn Tahlier,141044,884


In [22]:
ModelRunScore = pd.merge(ModelRunScore,EventPlayersUnique,on=['PDGANumber'],how = 'left')
ModelRunScore

Unnamed: 0,PDGANumber,ModelRunNumber,Actual Points,FinalStandings,Event ID,Name,Rating
0,7438,1,10.75,110,48283,Juliana Korver,916
1,7438,2,9.75,112,48283,Juliana Korver,916
2,7438,3,13.00,101,48283,Juliana Korver,916
3,7438,4,13.00,98,48283,Juliana Korver,916
4,7438,5,11.75,104,48283,Juliana Korver,916
...,...,...,...,...,...,...,...
132995,147568,996,7.00,121,47877,Mei Bruist,818
132996,147568,997,12.00,105,47877,Mei Bruist,818
132997,147568,998,12.00,104,47877,Mei Bruist,818
132998,147568,999,6.00,125,47877,Mei Bruist,818


In [30]:
FinalResultsAggregated = pd.pivot_table(ModelRunScore,index=['Name'],columns=['FinalStandings'],aggfunc='count')
FinalResultsAggregated

Unnamed: 0_level_0,Actual Points,Actual Points,Actual Points,Actual Points,Actual Points,Actual Points,Actual Points,Actual Points,Actual Points,Actual Points,...,Rating,Rating,Rating,Rating,Rating,Rating,Rating,Rating,Rating,Rating
FinalStandings,1,2,3,4,5,6,7,8,9,10,...,124,125,126,127,128,129,130,131,132,133
Name,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2,Unnamed: 9_level_2,Unnamed: 10_level_2,Unnamed: 11_level_2,Unnamed: 12_level_2,Unnamed: 13_level_2,Unnamed: 14_level_2,Unnamed: 15_level_2,Unnamed: 16_level_2,Unnamed: 17_level_2,Unnamed: 18_level_2,Unnamed: 19_level_2,Unnamed: 20_level_2,Unnamed: 21_level_2
Alex Benson,,,,,,,,,,,...,,,,,,,,,,
Alexis Kerman,,,,,,,,,,,...,,2.0,6.0,16.0,47.0,454.0,396.0,76.0,3.0,
Alexis Mandujano,,,,1.0,,,1.0,4.0,6.0,11.0,...,,,,,,,,,,
Alison Mabbutt,,,,,,,,,,,...,,,,,,,,,,
Alyssa Pierson,,,,,,,,,,,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
TONI OSIECKI,,,,,,,,,,,...,26.0,38.0,38.0,21.0,7.0,360.0,352.0,67.0,4.0,
Valerie Mandujano,,2.0,4.0,27.0,56.0,74.0,93.0,120.0,125.0,128.0,...,,,,,,,,,,
Vanessa Armstrong,,,,,,,,,,,...,1.0,6.0,8.0,16.0,30.0,101.0,63.0,34.0,737.0,
Vanessa Van Dyken,,,,,1.0,5.0,4.0,3.0,14.0,33.0,...,,,,,,,,,,


In [31]:
FinalResultsAggregated.to_csv('FinalResultsAggregated.csv')