This model is meant to forecast the outcome of the standings here: https://udisclive.com/players?t=standings&z=dgpt 
Logic for how to attribute points is here: https://udisc.com/blog/post/how-disc-golf-pro-tour-points-work-why-they-matter?fbclid=IwAR1VwYCkl7DCkgc93G5qujSDxCSWqg5HMWLv7dVhu_c4GXchW_P7fJO7MSo

To do so, I need to:
- load in events, and details about those events
- load in the players playing in those events, and details about their skill levels
- generate N runs of a model that forecasts each player's result at each event they are entered in
- use the generated results to assign points to the players
- sum the points based on the DGPT rules
- aggregate the ranks for each player at the end of the year (for example, see if Paige Pierce has a 90% chance of 1st place, 5% chance of 2nd place, etc.)
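
The steps above can be sketched end to end on toy data. This is only an illustration under stated assumptions: the three players, their expected scores, the points table, and the number of events are all made up, and ties in season points simply go to the earlier-listed player rather than following the DGPT rules.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded so the sketch is reproducible

# Hypothetical inputs (not from the real data)
players = ['Player A', 'Player B', 'Player C']
expected = np.array([-3.0, -1.0, 2.0])   # expected strokes relative to the field
points_table = np.array([100, 85, 75])   # points for 1st, 2nd, 3rd
n_runs, n_events, round_sd = 1000, 5, 6.82

# One simulated score per run, event, and player
scores = rng.normal(expected, round_sd, size=(n_runs, n_events, len(players)))

# Double argsort turns scores into 0-based places (0 = lowest score)
places = scores.argsort(axis=2).argsort(axis=2)

# Look up points per place and sum over events: shape (n_runs, n_players)
season_points = points_table[places].sum(axis=1)

# Final standing per run (0 = most points); stable sort sends ties to the earlier player
order = (-season_points).argsort(axis=1, kind='stable')
final_place = order.argsort(axis=1)

# Share of runs in which each player finishes in each place
place_probabilities = pd.DataFrame(
    [[(final_place[:, p] == k).mean() for k in range(len(players))]
     for p in range(len(players))],
    index=players, columns=[1, 2, 3])
print(place_probabilities)
```

Each row of `place_probabilities` is one player's distribution over final places, which is the shape of output the last bullet describes.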

Things that inspired this / what the output should look like
- 538 soccer: https://projects.fivethirtyeight.com/soccer-predictions/champions-league/
- my own attempt at this same thing in Google Sheets: https://docs.google.com/spreadsheets/d/19IwKCt5DI77koId916DawdJd1mNM7RQi3Nzlpmsa23Q/edit#gid=0 (see the ForecastFinal tab. This doc does ALL of the logic I want in here, but it's limited to N=50 runs because of how long the formulas take to update)

Load In Packages

In [1]:
import pandas as pd
import numpy as np

Load in Events - Details from PDGA.com using Event ID as unique Identifier
Would eventually like to make this step scrape the site, but we can get there later

Load Players signed up for events - Details from PDGA.com with PDGA# and Event ID as unique Identifier
This also needs to be scraped eventually, and will need to include a "load date" column since ratings change once a month. For now, just a csv.
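
One possible way to use a "load date" column once it exists is an as-of join that picks, for each entry, the most recent rating loaded on or before the event date. The sketch below assumes a hypothetical `LoadDate` column and uses made-up player numbers, event IDs, and ratings; it is not how the notebook currently works.

```python
import pandas as pd

# Hypothetical monthly rating snapshots (made-up values)
ratings = pd.DataFrame({
    'PDGANumber': [1001, 1001, 1002],
    'LoadDate': pd.to_datetime(['2021-06-08', '2021-07-13', '2021-06-08']),
    'Rating': [1050, 1051, 1052],
})

# Hypothetical event entries
entries = pd.DataFrame({
    'PDGANumber': [1001, 1001, 1002],
    'EventID': [1, 2, 1],
    'Event Date': pd.to_datetime(['2021-06-22', '2021-08-05', '2021-06-22']),
})

# merge_asof requires both frames to be sorted by their date keys.
# direction='backward' takes the latest snapshot at or before the event date.
entries_with_rating = pd.merge_asof(
    entries.sort_values('Event Date'),
    ratings.sort_values('LoadDate'),
    left_on='Event Date', right_on='LoadDate',
    by='PDGANumber', direction='backward',
)
print(entries_with_rating)
```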

Load Player Scores
Placeholder - eventually I'll need to include the rounds that have already occurred (since those scores are 100% certain). I haven't written any code to account for this yet.
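
A minimal sketch of how played rounds could be folded in later: simulate every round, then overwrite the simulated value wherever an actual score exists. The `ActualScore` and `ScoreUsed` columns and all values are hypothetical.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# One row per player-round; ActualScore is NaN for rounds not yet played (hypothetical data)
rounds = pd.DataFrame({
    'PDGANumber': [1001, 1001, 1001],
    'RoundNumber': [1, 2, 3],
    'ExpectedScore': [-5.5, -5.5, -5.5],
    'ActualScore': [-7.0, np.nan, np.nan],
})

# Simulate all rounds, keeping the frame's index so the values align row by row
simulated = pd.Series(
    rng.normal(rounds['ExpectedScore'].to_numpy(), 6.82).round(),
    index=rounds.index)

# Known scores win; simulated scores only fill the gaps
rounds['ScoreUsed'] = rounds['ActualScore'].fillna(simulated)
print(rounds)
```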

In [2]:
events = pd.read_csv('EventsAllTours.csv')

In [3]:
eventplayers = pd.read_csv('EventPlayersAllToursAllDivisions.csv')

In [4]:
pointsLogic = pd.read_csv('PointsLogicAllTours.csv')

In [5]:
###eventplayerscores = pd.read_csv('eventplayerscores.csv')
###might need to get rid of this

Event Mean Regression Creation

This cell creates a simple mean-regression term: the farther out the event, the more a player's rating is likely to change before it. The value is mostly a guess right now and does not come from any fitted model of how player ratings change over time. I also had to take the absolute value of the time to event so that events that have already occurred don't break the log. In real life, those scores should just be loaded in already.

In [6]:
events['Event Date'] = pd.to_datetime(events['Event Date'])
events['Today'] = pd.to_datetime("now")
events['Time to Event'] = events['Event Date'] - events['Today']
events['Time to Event Number'] = events['Time to Event'].dt.total_seconds()/(60*60*24)
events['Event Mean Regression'] = np.log10(np.absolute(events['Time to Event Number']))
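
As a check on the arithmetic, the same calculation can be run with a fixed "as of" timestamp instead of `now`, which makes the result reproducible. The sketch below uses the `Today` timestamp recorded in this notebook's output and two event dates from the events table; `events_demo` and `as_of` are names introduced only for this example.

```python
import numpy as np
import pandas as pd

events_demo = pd.DataFrame({
    'Event Name': ['Worlds', 'Ledgestone'],
    'Event Date': pd.to_datetime(['2021-06-22', '2021-08-05']),
})

# Fixed reference time (the 'Today' value from the notebook run) instead of pd.to_datetime("now")
as_of = pd.Timestamp('2021-07-12 22:37:16.037904')

# Signed days until the event: negative for events already played
events_demo['Time to Event Number'] = (
    (events_demo['Event Date'] - as_of).dt.total_seconds() / (60 * 60 * 24))

# log10 of the absolute distance in days, as in the cell above
events_demo['Event Mean Regression'] = np.log10(events_demo['Time to Event Number'].abs())
print(events_demo)
```

Note that `log10` is negative for distances under one day and undefined at exactly zero, so a guard may be needed if the notebook is ever run on an event date.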

Generate Average Rating At Each Event - useful on its own, but also needed to calculate the average round scores. Ideally this would use historical data, but for now I just need a baseline so I have something to work with.

Declare StdDev - 6.82 is a number I grabbed from a couple of 2020 tournaments. It is used later as the standard deviation of the normal distribution that generates round scores, and (you guessed it) it probably needs more work.

In [7]:
eventplayers['Rating'] = pd.to_numeric(eventplayers['Rating'])
means = eventplayers.groupby(['EventID','Division'])['Rating'].mean()
events = pd.merge(events,means,on = ['EventID','Division'],how = 'left')
events['StdDev'] = 6.82

events

Unnamed: 0,Event Name,Division,EventID,EventType,EventTour,EventLength,Event Date,Today,Time to Event,Time to Event Number,Event Mean Regression,Rating,StdDev
0,DGPT All-Stars,MPO,,Other,DGPT,2,2021-02-20,2021-07-12 22:37:16.037904,-143 days +01:22:43.962096,-142.942547,2.155162,,6.82
1,LVC,MPO,47877.0,Elite,DGPT,4,2021-02-25,2021-07-12 22:37:16.037904,-138 days +01:22:43.962096,-137.942547,2.139698,1001.676471,6.82
2,Waco,MPO,48685.0,Elite,DGPT,3,2021-03-12,2021-07-12 22:37:16.037904,-123 days +01:22:43.962096,-122.942547,2.089702,1008.219697,6.82
3,Open at Belton,MPO,47888.0,Silver,DGPT,3,2021-03-19,2021-07-12 22:37:16.037904,-116 days +01:22:43.962096,-115.942547,2.064243,991.245763,6.82
4,Texas State Disc Golf Championship,MPO,47512.0,NT,NT,3,2021-03-26,2021-07-12 22:37:16.037904,-109 days +01:22:43.962096,-108.942547,2.037198,992.864286,6.82
...,...,...,...,...,...,...,...,...,...,...,...,...,...
77,Music City Open,MPO,47541.0,NT,NT,4,2021-09-23,2021-07-12 22:37:16.037904,72 days 01:22:43.962096,72.057453,1.857679,984.774566,6.82
78,Pro Forester,MPO,47444.0,Euro,Euro,3,2021-09-24,2021-07-12 22:37:16.037904,73 days 01:22:43.962096,73.057453,1.863665,,6.82
79,USDGC,MPO,47518.0,Major,Major,4,2021-10-06,2021-07-12 22:37:16.037904,85 days 01:22:43.962096,85.057453,1.929712,,6.82
80,USWDGC,MPO,50023.0,Major,Major,4,2021-10-06,2021-07-12 22:37:16.037904,85 days 01:22:43.962096,85.057453,1.929712,,6.82


Calculate Single Round Expected Scores

Here we declare "rating points per stroke" so that we can compare, e.g., a 1000-rated player to a 900-rated player in a single round. Then we turn that into the basic "expected round score" (strokes relative to the field average, negative meaning better than average) that is used in the upcoming randomization.
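
A worked example of the formula, using values that appear in this notebook's output for Worlds (player rating 1051, field average 1010.517241, mean regression 1.321029). The helper function name is introduced here only for illustration.

```python
def single_round_expected_score(player_rating, field_rating, mean_regression,
                                rating_points_per_stroke=6):
    """Expected strokes relative to the field average (negative = better than average)."""
    return -(player_rating - field_rating) / (rating_points_per_stroke + mean_regression)

# 40.48 rating points above the field, divided by 6 + 1.32 points per stroke
score = single_round_expected_score(1051, 1010.517241, 1.321029)
print(round(score, 4))
```

Because the regression term is added to the denominator, a larger value shrinks every player's expected score toward the field average.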

In [8]:
rating_points_per_stroke = 6
eventplayerratings = pd.merge(events,eventplayers, on = ['EventID','Division'],how = 'right')
eventplayerratings['Single Round Expected Score'] = -1*((eventplayerratings['Rating_y']-eventplayerratings['Rating_x'])/(rating_points_per_stroke+eventplayerratings['Event Mean Regression']))
eventplayerratings = eventplayerratings.drop(['FinalRank','RoundOneScore','RoundTwoScore','RoundThreeScore','RoundFourScore','TotalScore','Payout','City','State','Country','PDGAPoints'], axis = 1)
eventplayerratings = eventplayerratings[eventplayerratings["Division"].isin(['MPO','FPO'])]
eventplayerratings

Unnamed: 0,Event Name,Division,EventID,EventType,EventTour,EventLength,Event Date,Today,Time to Event,Time to Event Number,Event Mean Regression,Rating_x,StdDev,Name,PDGANumber,Rating_y,FinalScore,Single Round Expected Score
134,,FPO,47877.0,,,,NaT,NaT,NaT,,,,,Samii The Tutu Maes,84007.0,644,122,
135,,FPO,47877.0,,,,NaT,NaT,NaT,,,,,TONI OSIECKI,32716.0,769,72,
136,,FPO,47877.0,,,,NaT,NaT,NaT,,,,,Rachel Trager,112608.0,809,47,
137,,FPO,47877.0,,,,NaT,NaT,NaT,,,,,Mei Bruist,147568.0,818,26,
138,,FPO,47877.0,,,,NaT,NaT,NaT,,,,,Alyssa Pierson,122118.0,825,50,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9239,Worlds,MPO,47784.0,Major,Major,5.0,2021-06-22,2021-07-12 22:37:16.037904,-21 days +01:22:43.962096,-20.942547,1.321029,1010.517241,6.82,Paul McBeth,27523.0,1051,,-5.529654
9240,Worlds,MPO,47784.0,Major,Major,5.0,2021-06-22,2021-07-12 22:37:16.037904,-21 days +01:22:43.962096,-20.942547,1.321029,1010.517241,6.82,Calvin Heimburg,45971.0,1052,,-5.666247
9241,Worlds,MPO,47784.0,Major,Major,5.0,2021-06-22,2021-07-12 22:37:16.037904,-21 days +01:22:43.962096,-20.942547,1.321029,1010.517241,6.82,Calvin Heimburg,45971.0,1052,,-5.666247
9242,Worlds,MPO,47784.0,Major,Major,5.0,2021-06-22,2021-07-12 22:37:16.037904,-21 days +01:22:43.962096,-20.942547,1.321029,1010.517241,6.82,Richard Wysocki,38008.0,1056,,-6.212618


In [9]:
eventplayerratings['Single Round Expected Score']

134          NaN
135          NaN
136          NaN
137          NaN
138          NaN
          ...   
9239   -5.529654
9240   -5.666247
9241   -5.666247
9242   -6.212618
9243   -6.212618
Name: Single Round Expected Score, Length: 7855, dtype: float64

Generate Random Scores

In [12]:
###Create N number of copies of the dataframe
eventplayereventratings = pd.concat([eventplayerratings for i in range(100)],
          ignore_index=True)

eventplayerroundratings = pd.concat([eventplayerratings for i in range(100)],
          ignore_index=True)

###Number each iteration of the model
eventplayereventratings['ModelRunNumber']= eventplayereventratings.groupby(['EventID','PDGANumber','Division'])['PDGANumber'].rank(method='first')
eventplayerroundratings['ModelRunNumber']= eventplayerroundratings.groupby(['EventID','PDGANumber','Division'])['PDGANumber'].rank(method='first')


eventplayerroundratings.query('PDGANumber == "27523"')

Unnamed: 0,Event Name,Division,EventID,EventType,EventTour,EventLength,Event Date,Today,Time to Event,Time to Event Number,Event Mean Regression,Rating_x,StdDev,Name,PDGANumber,Rating_y,FinalScore,Single Round Expected Score,ModelRunNumber
1279,Des Moines Challenge,MPO,52009.0,DGPT,DGPT,3.0,2021-07-09,2021-07-12 22:37:16.037904,-4 days +01:22:43.962096,-3.942547,0.595777,987.982301,6.82,Paul McBeth,27523.0,1051,,-9.554250,1.0
1280,Des Moines Challenge,MPO,52009.0,DGPT,DGPT,3.0,2021-07-09,2021-07-12 22:37:16.037904,-4 days +01:22:43.962096,-3.942547,0.595777,987.982301,6.82,Paul McBeth,27523.0,1051,,-9.554250,2.0
2046,Music City Open,MPO,47541.0,NT,NT,4.0,2021-09-23,2021-07-12 22:37:16.037904,72 days 01:22:43.962096,72.057453,1.857679,984.774566,6.82,Paul McBeth,27523.0,1051,,-8.428117,1.0
2047,Music City Open,MPO,47541.0,NT,NT,4.0,2021-09-23,2021-07-12 22:37:16.037904,72 days 01:22:43.962096,72.057453,1.857679,984.774566,6.82,Paul McBeth,27523.0,1051,,-8.428117,2.0
2527,Estonian Open,MPO,47447.0,Euro,Euro,3.0,2021-07-23,2021-07-12 22:37:16.037904,10 days 01:22:43.962096,10.057453,1.002488,978.963636,6.82,Paul McBeth,27523.0,1051,,-10.287253,1.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
784795,Idlewild,MPO,48688.0,Elite,DGPT,3.0,2021-08-13,2021-07-12 22:37:16.037904,31 days 01:22:43.962096,31.057453,1.492166,1011.478632,6.82,Paul McBeth,27523.0,1051,,-5.275026,200.0
785146,Ledgestone,MPO,47981.0,Elite,DGPT,4.0,2021-08-05,2021-07-12 22:37:16.037904,23 days 01:22:43.962096,23.057453,1.362811,1005.880682,6.82,Paul McBeth,27523.0,1051,,-6.128001,199.0
785147,Ledgestone,MPO,47981.0,Elite,DGPT,4.0,2021-08-05,2021-07-12 22:37:16.037904,23 days 01:22:43.962096,23.057453,1.362811,1005.880682,6.82,Paul McBeth,27523.0,1051,,-6.128001,200.0
785494,Worlds,MPO,47784.0,Major,Major,5.0,2021-06-22,2021-07-12 22:37:16.037904,-21 days +01:22:43.962096,-20.942547,1.321029,1010.517241,6.82,Paul McBeth,27523.0,1051,,-5.529654,199.0
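
In the output above, ModelRunNumber reaches 200 even though only 100 copies are made, which appears to come from duplicate player rows in the entries (ranking within a group counts every duplicate). A sketch of an alternative, assuming one row per player per event is intended: de-duplicate first, then cross-join against a table of run numbers. The toy `entries` frame is made up.

```python
import numpy as np
import pandas as pd

# Toy entries with one accidental duplicate row (hypothetical data)
entries = pd.DataFrame({
    'EventID': [1, 1, 1, 2],
    'PDGANumber': [1001, 1002, 1002, 1001],
})

n_runs = 4
runs = pd.DataFrame({'ModelRunNumber': np.arange(1, n_runs + 1)})

# Every unique entry is paired with every run number exactly once
simulated_entries = (
    entries.drop_duplicates(['EventID', 'PDGANumber'])
           .merge(runs, how='cross'))
print(simulated_entries)
```

With this approach the run number is assigned directly rather than inferred from row order, so it cannot exceed `n_runs`.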


In [11]:
###Copy the copies by # of rounds, then # the rounds
###Entries with no matching event row have NaN EventLength and NaN expected score (see the FPO rows above).
###NaN repeat counts are the likely cause of "ValueError: repeats may not contain negative values.",
###so drop those rows and repeat by an integer count
eventplayerroundratings = eventplayerroundratings.dropna(subset=['EventLength','Single Round Expected Score'])
eventplayerroundratings = eventplayerroundratings.iloc[np.arange(len(eventplayerroundratings)).repeat(eventplayerroundratings['EventLength'].astype(int))]
eventplayerroundratings['RoundNumber']= eventplayerroundratings.groupby(['EventID','PDGANumber','ModelRunNumber','Division'])['PDGANumber'].rank(method='first')

#Generate the Scores, and Round them to Integers
eventplayerroundratings['RandomScores'] = np.random.normal(
        eventplayerroundratings['Single Round Expected Score'].values,
        eventplayerroundratings['StdDev'].values)
eventplayerroundratings['RandomScores'] = eventplayerroundratings['RandomScores'].round(0)
eventplayerroundratings['RandomScores'] = eventplayerroundratings.RandomScores.astype(int)
eventplayerroundratings['RoundNumber'] = eventplayerroundratings.RoundNumber.astype(int)
eventplayerroundratings['ModelRunNumber'] = eventplayerroundratings.ModelRunNumber.astype(int)

###eventplayerroundratings
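
The round-expansion step can be tried in isolation on a small frame. The sketch below assumes that rows with a missing EventLength should simply be skipped; the player numbers and scores are made up, and a seeded generator is used so the draw is repeatable.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Toy model-run rows; the last one has no matching event, so its event fields are NaN
runs = pd.DataFrame({
    'PDGANumber': [1001, 1002, 1003],
    'EventLength': [3.0, 3.0, np.nan],
    'Single Round Expected Score': [-5.5, -5.7, np.nan],
    'StdDev': [6.82, 6.82, np.nan],
})

# Drop rows that cannot be simulated, then repeat each row once per round
valid = runs.dropna(subset=['EventLength']).reset_index(drop=True)
repeats = valid['EventLength'].astype(int).to_numpy()
rounds = valid.loc[valid.index.repeat(repeats)].copy()

# Number the rounds 1..EventLength within each player
rounds['RoundNumber'] = rounds.groupby('PDGANumber').cumcount() + 1

# One normal draw per round, rounded to whole strokes
rounds['RandomScores'] = rng.normal(
    rounds['Single Round Expected Score'].to_numpy(),
    rounds['StdDev'].to_numpy()).round().astype(int)
print(rounds)
```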

In [None]:
###Sum only the simulated scores; summing every column fails on the date and text columns in recent pandas versions
ModelRunScore = eventplayerroundratings.groupby(['EventID','PDGANumber','ModelRunNumber'],as_index=False)['RandomScores'].sum()
###ModelRunScore

In [None]:
eventplayereventratings = pd.merge(eventplayereventratings,ModelRunScore, on = ['EventID','PDGANumber','ModelRunNumber'],how = 'left')
eventplayereventratings['ModelRunNumber'] = eventplayereventratings.ModelRunNumber.astype(int)

eventplayereventratings

In [None]:
eventplayereventratings['EventRankBeforePlayoff'] = eventplayereventratings.groupby(['EventID','ModelRunNumber'])['RandomScores'].rank(method='min')
eventplayereventratings['EventRankBeforePlayoff'] = eventplayereventratings.EventRankBeforePlayoff.astype(int)

###Ridiculous amount of code to break first-place ties with a playoff. Two things it should do but doesn't: account for the strength of each player, and account for the small chance that a playoff can't occur
number_of_ties_before_playoff = eventplayereventratings.groupby(['EventRankBeforePlayoff','EventID','ModelRunNumber'],as_index=False)['EventRankBeforePlayoff'].size()
first_place_ties = pd.merge(eventplayereventratings,number_of_ties_before_playoff, on = ['EventID','EventRankBeforePlayoff','ModelRunNumber'],how = 'right')
first_place_ties = first_place_ties[first_place_ties['size']>1]
first_place_ties = first_place_ties[first_place_ties['EventRankBeforePlayoff']==1]
first_place_ties = first_place_ties.drop(['Event Date','EventLength','Time to Event Number','Event Mean Regression','Rating_x','StdDev','Rating_y','Single Round Expected Score','RandomScores','Event Name','EventType','Today','Time to Event','Name'], axis = 1)
first_place_ties['Random'] = np.random.rand(len(first_place_ties.index))
first_place_ties['RankAfterPlayoff']= first_place_ties.groupby(['EventID','ModelRunNumber'])['Random'].rank(method='first')
first_place_ties['RankAfterPlayoff'] = first_place_ties.RankAfterPlayoff.astype(int)
first_place_ties['RankAfterPlayoffWithSecondPlaceTies']= first_place_ties['RankAfterPlayoff']
first_place_ties['RankAfterPlayoffWithSecondPlaceTies'] = np.where((first_place_ties.RankAfterPlayoff >= 2),2,first_place_ties.RankAfterPlayoffWithSecondPlaceTies)
first_place_ties = first_place_ties.drop(['EventRankBeforePlayoff','Random','RankAfterPlayoff','size'], axis = 1)

first_place_ties
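
The cell above picks the playoff winner uniformly at random. One way to account for player strength, as the comment suggests, is to draw a playoff "score" for each tied player from the same normal model and take the lowest. This is only a sketch: treating a sudden-death playoff like a single round with the full round standard deviation is an assumption, and `pick_playoff_winner` is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(7)

def pick_playoff_winner(ratings, rng, rating_points_per_stroke=6.0, std_dev=6.82):
    """Return the position (0-based) of the playoff winner among the tied players."""
    ratings = np.asarray(ratings, dtype=float)
    # Expected strokes relative to the average of the tied players
    expected = -(ratings - ratings.mean()) / rating_points_per_stroke
    playoff_scores = rng.normal(expected, std_dev)
    # Lowest simulated score wins the playoff
    return int(np.argmin(playoff_scores))

# Three players tied for first, with illustrative ratings
winner = pick_playoff_winner([1051, 1030, 1010], rng)
print(winner)
```

Higher-rated players win more often under this scheme but not always, which keeps some of the randomness of a real playoff.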

In [None]:
eventplayereventratings = pd.merge(eventplayereventratings,first_place_ties, on = ['EventID','PDGANumber','ModelRunNumber'],how = 'outer')
eventplayereventratings['RankAfterPlayoffWithSecondPlaceTies'] = eventplayereventratings['RankAfterPlayoffWithSecondPlaceTies'].fillna(eventplayereventratings['EventRankBeforePlayoff'])
eventplayereventratings['RankAfterPlayoffWithSecondPlaceTies'] = eventplayereventratings.RankAfterPlayoffWithSecondPlaceTies.astype(int)

In [None]:


### calculate points for each run
eventplayereventratings = pd.merge(eventplayereventratings,eventplayers, on = ['EventID','PDGANumber','Name'],how = 'left')
eventplayereventratings = eventplayereventratings.drop(['Rating_x','EventRankBeforePlayoff','StdDev','Today','Time to Event','Rating_y'], axis = 1)
eventplayereventratings.FinalRank = np.where(eventplayereventratings.FinalRank.isnull(),eventplayereventratings.RankAfterPlayoffWithSecondPlaceTies,eventplayereventratings.FinalRank)
eventplayereventratings['EventRankForPoints'] = eventplayereventratings.groupby(['EventID','ModelRunNumber'])['FinalRank'].rank(method='first')
eventplayereventratings['EventRankForPoints'] = eventplayereventratings.EventRankForPoints.astype(int)
eventplayereventratings = pd.merge(eventplayereventratings,pointsLogic, on = 'EventRankForPoints',how = 'left')
eventplayereventratings = eventplayereventratings.drop(['EventRankForPoints'], axis = 1)

eventplayereventratings = eventplayereventratings.drop(['RankAfterPlayoffWithSecondPlaceTies'], axis = 1)
###find the number of ties
number_of_ties = eventplayereventratings.groupby(['FinalRank','EventID','ModelRunNumber'],as_index=False)['FinalRank'].size()
number_of_ties.rename(columns = {'size':'NumberOfTies'}, inplace = True)
###merge the number of ties back into the thing
eventplayereventratings = pd.merge(eventplayereventratings,number_of_ties, on = ['EventID','FinalRank','ModelRunNumber'],how = 'right')

eventplayereventratings

In [None]:
#Calculate Points for Ties
###Sum only the Points column for each group of tied players; summing every column fails on text columns in recent pandas versions
tiePoints = eventplayereventratings.groupby(['FinalRank','EventID','EventType','ModelRunNumber','NumberOfTies'],as_index=False)['Points'].sum()
tiePoints['Actual Points'] = tiePoints['Points']/tiePoints['NumberOfTies']
tiePoints['Actual Points'] = np.where(tiePoints['EventType'] == 'Silver', tiePoints['Actual Points']/4,tiePoints['Actual Points'])
tiePoints = tiePoints.drop(['Points','NumberOfTies','EventType'], axis = 1)
tiePoints.query('ModelRunNumber == "1" and `EventID` == "47877"')
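
The tie logic above amounts to splitting the points for the tied places evenly among the tied players. A small worked example, with a made-up points table and made-up results (the Silver-event adjustment is left out):

```python
import pandas as pd

# Hypothetical points table and results: B and C are tied for 2nd
points_logic = pd.DataFrame({'EventRankForPoints': [1, 2, 3, 4],
                             'Points': [100, 85, 75, 69]})
results = pd.DataFrame({'Name': ['A', 'B', 'C', 'D'],
                        'FinalRank': [1, 2, 2, 4]})

# Give tied players consecutive slots (2 and 3) so each slot's points are counted once
results['EventRankForPoints'] = results['FinalRank'].rank(method='first').astype(int)
results = results.merge(points_logic, on='EventRankForPoints', how='left')

# Every player in a tie receives the average of the slots the tie covers
results['Actual Points'] = results.groupby('FinalRank')['Points'].transform('mean')
print(results[['Name', 'FinalRank', 'Actual Points']])
```

Here B and C each receive (85 + 75) / 2 = 80 points.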

In [None]:
eventplayereventratings = pd.merge(eventplayereventratings,tiePoints, on = ['FinalRank','EventID','ModelRunNumber'],how = 'left')
###Rank each player's events within a series from most to fewest points (ascending=False), so rank 1 is the best finish and KeepScore keeps the best N
eventplayereventratings['EliteSeriesRank'] = eventplayereventratings[eventplayereventratings.EventType == 'Elite'].groupby(['PDGANumber','ModelRunNumber'])['Actual Points'].rank(method='first',ascending=False)
eventplayereventratings['EliteSeriesRank'] = eventplayereventratings['EliteSeriesRank'].fillna(100)
eventplayereventratings['EliteSeriesRank'] = eventplayereventratings.EliteSeriesRank.astype(int)
###eventplayereventratings[eventplayereventratings.EventType == 'Elite']
eventplayereventratings['SilverSeriesRank'] = eventplayereventratings[eventplayereventratings.EventType == 'Silver'].groupby(['PDGANumber','ModelRunNumber'])['Actual Points'].rank(method='first',ascending=False)
eventplayereventratings['SilverSeriesRank'] = eventplayereventratings['SilverSeriesRank'].fillna(100)
eventplayereventratings['SilverSeriesRank'] = eventplayereventratings.SilverSeriesRank.astype(int)
###NT events carry EventType 'NT' in the events table; the original filtered on 'Silver' and then overwrote NTRank with SilverSeriesRank
eventplayereventratings['NTRank'] = eventplayereventratings[eventplayereventratings.EventType == 'NT'].groupby(['PDGANumber','ModelRunNumber'])['Actual Points'].rank(method='first',ascending=False)
eventplayereventratings['NTRank'] = eventplayereventratings['NTRank'].fillna(100)
eventplayereventratings['NTRank'] = eventplayereventratings.NTRank.astype(int)
###NOTE: PDPTRank still filters on 'Silver' as in the original; swap in the right EventType value for PDPT events once it is known
eventplayereventratings['PDPTRank'] = eventplayereventratings[eventplayereventratings.EventType == 'Silver'].groupby(['PDGANumber','ModelRunNumber'])['Actual Points'].rank(method='first',ascending=False)
eventplayereventratings['PDPTRank'] = eventplayereventratings['PDPTRank'].fillna(100)
eventplayereventratings['PDPTRank'] = eventplayereventratings.PDPTRank.astype(int)

In [None]:
eventplayereventratings['KeepScore'] = np.where((eventplayereventratings['EliteSeriesRank']<=8)|(eventplayereventratings['SilverSeriesRank']<=3),'Yes','No')
eventplayereventratings.query('`EventID` == "48686" and PDGANumber == "99455"')
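
The KeepScore rule can be illustrated on one player's simulated season, assuming the intent is to count only each player's best finishes per series (8 Elite and 3 Silver in the cell above). The data and the smaller Elite limit of 2 are made up to keep the example short.

```python
import pandas as pd

# One player's points in one model run (hypothetical values)
event_points = pd.DataFrame({
    'PDGANumber': [1001] * 4,
    'ModelRunNumber': [1] * 4,
    'EventType': ['Elite', 'Elite', 'Elite', 'Silver'],
    'Actual Points': [100.0, 60.0, 85.0, 20.0],
})

# How many events count per series (the notebook uses 8 and 3; 2 here for the demo)
keep_limits = {'Elite': 2, 'Silver': 3}

# Rank within player, run, and series with the highest points first
rank_in_series = (
    event_points
    .groupby(['PDGANumber', 'ModelRunNumber', 'EventType'])['Actual Points']
    .rank(method='first', ascending=False))

# Keep an event if its rank is within the limit for its series
event_points['KeepScore'] = rank_in_series <= event_points['EventType'].map(keep_limits)
season_total = event_points.loc[event_points['KeepScore'], 'Actual Points'].sum()
print(event_points)
print(season_total)
```

The 60-point Elite finish is dropped because it is the player's third-best Elite result, so the season total is 100 + 85 + 20.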

In [None]:
###Sum only the points that count toward the standings; summing every column fails on text columns in recent pandas versions
ModelRunScore = eventplayereventratings[eventplayereventratings.KeepScore =='Yes'].groupby(['PDGANumber','ModelRunNumber'],as_index=False)['Actual Points'].sum()

In [None]:
ModelRunScore

In [None]:
ModelRunScore['FinalStandings'] = ModelRunScore.groupby(['ModelRunNumber'])['Actual Points'].rank(ascending=False,method='min').astype(int)
EventPlayersUnique = eventplayers.drop_duplicates(subset=['PDGANumber','Name'])
EventPlayersUnique = EventPlayersUnique.drop('FinalRank', axis = 1)
EventPlayersUnique

In [None]:
ModelRunScore = pd.merge(ModelRunScore,EventPlayersUnique,on=['PDGANumber'],how = 'left')
ModelRunScore

In [None]:
FinalResultsAggregated = pd.pivot_table(ModelRunScore,index=['Name','PDGANumber'],columns=['FinalStandings'],values=['Actual Points'],aggfunc='count',fill_value =0)
FinalResultsAggregated
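
The pivot table above holds counts of model runs per final place. To read it as probabilities (the "90% chance of 1st place" style of output), each row can be divided by its total. A small sketch on made-up standings:

```python
import pandas as pd

# Final standings for two hypothetical players over four model runs
model_runs = pd.DataFrame({
    'Name': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
    'ModelRunNumber': [1, 2, 3, 4, 1, 2, 3, 4],
    'FinalStandings': [1, 1, 1, 2, 2, 2, 2, 1],
})

# Count how often each player lands in each place
counts = pd.crosstab(model_runs['Name'], model_runs['FinalStandings'])

# Divide each row by its total so the places sum to 1 for every player
probabilities = counts.div(counts.sum(axis=1), axis=0)
print(probabilities)
```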

In [None]:
FinalResultsAggregated.to_csv('FinalResultsAggregatedNTTest.csv')