# `make_rid_to_runners_dict.ipynb`

### Author: Anthony Hein

#### Last updated: 9/20/2021

# Overview:

Create a dictionary that maps a race ID `rid` to the number of horses running in that race. Looking up a race in this dictionary is much faster than filtering the `horses_all(_trim(_intxn(_clean))).csv` dataframes for the same information. Write this dictionary as a constant in a Python file in the `utils` folder so it can be imported easily.

---

## Setup

In [1]:
import git
import os
from typing import List
from tqdm import tqdm
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

In [2]:
BASE_DIR = git.Repo(os.getcwd(), search_parent_directories=True).working_dir
BASE_DIR

'/Users/anthonyhein/Desktop/SML310/project'
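`git.Repo(..., search_parent_directories=True).working_dir` finds the repository root from wherever the notebook is run. If GitPython is not installed, a rough stand-in is to walk up the parent directories until one contains a `.git` entry. This is a minimal sketch under that assumption. `find_repo_root` is a hypothetical helper, and unlike GitPython it does not check that `.git` belongs to a valid repository.

```python
from pathlib import Path
from typing import Optional


def find_repo_root(start: Optional[Path] = None) -> Path:
    """Return the nearest directory at or above `start` that contains `.git`.

    Hypothetical stand-in for git.Repo(..., search_parent_directories=True).working_dir.
    It only checks for a `.git` entry (file or folder) and does not validate it.
    """
    start = (start or Path.cwd()).resolve()
    # Check `start` itself first, then each ancestor up to the filesystem root.
    for candidate in (start, *start.parents):
        if (candidate / ".git").exists():
            return candidate
    raise FileNotFoundError(f"No .git entry found at or above {start}")
```

With this helper, `BASE_DIR = str(find_repo_root())` would take the place of the GitPython call.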

---

## Load `horses_all(_trim(_intxn)).csv`

In [3]:
horses_all = pd.read_csv(f"{BASE_DIR}/data/csv/horses_all.csv", low_memory=False) 
horses_all.head()

| Unnamed: 0 | rid | horseName | age | saddle | decimalPrice | isFav | trainerName | jockeyName | position | positionL | ... | TR | OR | father | mother | gfather | runners | margin | weight | res_win | res_place |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 267255 | Going For Broke | 3.0 | 4.0 | 0.1 | 0 | P C Haslam | Seb Sanders | 1 |  | ... | 62.0 | 62.0 | Simply Great | Empty Purse | Pennine Walk | 6 | 1.168254 | 58 | 1.0 | 1.0 |
| 1 | 267255 | Pinchincha | 3.0 | 3.0 | 0.266667 | 0 | Dave Morris | Tony Clark | 2 | 4.0 | ... | 56.0 | 65.0 | Priolo | Western Heights | Shirley Heights | 6 | 1.168254 | 60 | 0.0 | 1.0 |
| 2 | 267255 | Skelton Sovereign | 3.0 | 5.0 | 0.142857 | 0 | Reg Hollinshead | D Griffiths | 3 | 3.0 | ... | 40.0 | 60.0 | Contract Law | Mrs Lucky | Royal Match | 6 | 1.168254 | 55 | 0.0 | 0.0 |
| 3 | 267255 | Fast Spin | 3.0 | 6.0 | 0.380952 | 1 | David Barron | Tony Culhane | 4 | 7.0 | ... | 30.0 | 59.0 | Formidable I | Topwinder | Topsider | 6 | 1.168254 | 57 | 0.0 | 0.0 |
| 4 | 267255 | As-Is | 3.0 | 2.0 | 0.166667 | 0 | Mark Johnston | J Weaver | 5 | 7.0 | ... | 21.0 | 65.0 | Lomond | Capriati I | Diesis | 6 | 1.168254 | 60 | 0.0 | 0.0 |


In [4]:
horses_all.shape

(4107315, 27)
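Only `rid` and `runners` are needed for the dictionary, so passing `usecols` to `pd.read_csv` should reduce memory use and load time on a file of this size. A minimal sketch, shown on a small in-memory CSV that imitates the layout above. The rows are made up for illustration (`Example Horse` is a placeholder), apart from reusing names and runner counts that appear in this notebook's output.

```python
import io

import pandas as pd

# Made-up stand-in for horses_all.csv: the unnamed first column mimics the index
# written by DataFrame.to_csv, and horseName stands in for the columns we skip.
sample_csv = io.StringIO(
    ",rid,horseName,runners\n"
    "0,267255,Going For Broke,6\n"
    "1,267255,Pinchincha,6\n"
    "2,297570,Example Horse,11\n"
)

# usecols keeps only the named columns; the others are never stored in the frame.
horses_small = pd.read_csv(sample_csv, usecols=["rid", "runners"])
print(horses_small)
```

In the notebook, the equivalent call would be `pd.read_csv(f"{BASE_DIR}/data/csv/horses_all.csv", usecols=["rid", "runners"])`.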

---

## Load `races_all(_trim(_intxn)).csv`

In [5]:
races_all = pd.read_csv(f"{BASE_DIR}/data/csv/races_all.csv", low_memory=False) 
races_all.head()

| Unnamed: 0 | rid | course | time | date | title | rclass | band | ages | distance | condition | hurdles | prizes | winningTime | prize | metric | countryCode | ncond | class |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 267255 | Southwell (AW) | 03:40 | 97/01/01 | New Year Handicap Class E | Class 5 | 0-70 | 3yo | 1m | Standard |  | [2752.25, 833.0, 406.5, 193.25] | 106.9 | 4184.0 | 1609.0 | GB | 0 | 5 |
| 1 | 297570 | Southwell (AW) | 12:35 | 97/01/01 | Resolution Claiming Stakes Class F (Div I) | Class 6 |  | 4yo+ | 7f | Standard |  | [1944.0, 544.0, 264.0] | 91.0 | 2752.0 | 1407.0 | GB | 0 | 6 |
| 2 | 334421 | Southwell (AW) | 01:05 | 97/01/01 | One Too Many Median Auction Maiden Apprentices... | Class 6 |  | 4-6yo | 1m3f | Standard |  | [2502.0, 702.0, 342.0] | 150.7 | 3546.0 | 2212.0 | GB | 0 | 6 |
| 3 | 366304 | Southwell (AW) | 03:10 | 97/01/01 | Morning Call Selling Stakes Class G Southwell ... | Class 6 |  | 3yo | 1m | Standard |  | [2189.0, 614.0, 299.0] | 108.6 | 3102.0 | 1609.0 | GB | 0 | 6 |
| 4 | 13063 | Southwell (AW) | 02:40 | 97/01/01 | Thinking & Drinking Handicap Class E | Class 5 | 0-70 | 4yo+ | 2m½f | Standard |  | [2726.25, 825.0, 402.5, 191.25] | 231.4 | 4144.0 | 3318.5 | GB | 0 | 5 |


In [6]:
races_all.shape

(396572, 18)
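`races_all` is loaded here, but the dictionary below is built only from `horses_all`, so a race with no rows in `horses_all` will get no key. A small sketch for listing such races, assuming `rid` identifies the same race in both tables (as the matching first rows above suggest). `rids_missing_runners` is a hypothetical helper and the demo frames are made up.

```python
import pandas as pd


def rids_missing_runners(races: pd.DataFrame, horses: pd.DataFrame) -> set:
    """Return race IDs that appear in `races` but have no rows in `horses`.

    Such races would get no key in a dict built only from `horses`.
    """
    return set(races["rid"].tolist()) - set(horses["rid"].tolist())


# Made-up example tables: race 2 has no horse rows.
races_demo = pd.DataFrame({"rid": [1, 2, 3]})
horses_demo = pd.DataFrame({"rid": [1, 1, 3, 3, 3], "runners": [2, 2, 3, 3, 3]})
print(rids_missing_runners(races_demo, horses_demo))  # {2}
```

In the notebook, `rids_missing_runners(races_all, horses_all)` would list the races that `RID_TO_RUNNERS` cannot answer for.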

---

## Make `rid` to `runners` Dict

In [7]:
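# Takes about 3.5 minutes: iterrows() builds a Series for each of the ~4.1M rows.
# `runners` should be the same on every row of a race, so overwriting d[rid]
# for each horse in that race is harmless; the last row's value is kept.
d = {}
for _, row in tqdm(horses_all.iterrows()):
    d[row['rid']] = row['runners']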

4107315it [03:32, 19293.09it/s]
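Most of the 3.5 minutes goes into `iterrows()`, which creates a Series for every row. A vectorized sketch, assuming (as the loop implicitly does) that `runners` is constant within each race. It first checks that assumption with `groupby(...).nunique()`, since the loop would silently keep only the last value, then builds the dict with `zip` over `.tolist()` results. `make_rid_to_runners` is a hypothetical name, not part of the project's `utils`.

```python
import pandas as pd


def make_rid_to_runners(horses: pd.DataFrame) -> dict:
    """Vectorized alternative to the iterrows loop (hypothetical helper).

    Raises ValueError if any race has more than one distinct `runners` value.
    """
    # Number of distinct runners values per race (NaN is ignored by nunique).
    per_race = horses.groupby("rid", sort=False)["runners"].nunique()
    inconsistent = per_race[per_race > 1]
    if not inconsistent.empty:
        raise ValueError(f"{len(inconsistent)} races have conflicting runners values")
    # .tolist() yields plain Python ints. Later rows overwrite earlier ones,
    # exactly as in the loop, and keys keep first-appearance order.
    return dict(zip(horses["rid"].tolist(), horses["runners"].tolist()))
```

In the notebook, `d = make_rid_to_runners(horses_all)` would replace the loop. Converting with `.tolist()` keeps the values as plain `int`s, which matters for the next step: the file is generated from the dict's `repr`, and under NumPy 2 a NumPy scalar's repr looks like `np.int64(6)` rather than `6`.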


## Write to File in `utils`

In [11]:
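# `d` holds plain Python ints, so its repr is a valid Python literal and this
# string is importable source code.
s = f"RID_TO_RUNNERS = {d}"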
s[:100]

'RID_TO_RUNNERS = {267255: 6, 297570: 11, 334421: 10, 366304: 10, 13063: 9, 176063: 11, 74037: 10, 39'
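Because the right-hand side of `s` is just the dict's `repr`, it should parse back with `ast.literal_eval`. A quick round-trip check, sketched as a hypothetical `round_trips` helper; in the notebook the argument would be `d`. A `repr` that is not a literal (for example an arbitrary object) raises inside `literal_eval`, and the helper reports that as `False`.

```python
import ast


def round_trips(mapping: dict) -> bool:
    """Return True if the generated assignment parses back to an equal dict."""
    source = f"RID_TO_RUNNERS = {mapping}"
    # Keep only the literal on the right-hand side of the assignment.
    _, literal = source.split(" = ", 1)
    try:
        return ast.literal_eval(literal) == mapping
    except (ValueError, SyntaxError):
        # The repr contained something that is not a plain Python literal.
        return False
```

For example, `round_trips({267255: 6, 297570: 11})` returns `True`.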

In [14]:
# Overwrites utils/rid_to_runners.py if it already exists.
with open(f"{BASE_DIR}/utils/rid_to_runners.py", 'w', encoding='utf-8') as f:
    f.write(s)
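Once the file exists, the constant can be imported like any module, e.g. `from utils.rid_to_runners import RID_TO_RUNNERS`, assuming the project root is on `sys.path` when the code runs. If it is not, the module can be loaded from an explicit path with `importlib`. The sketch below does that; `load_rid_to_runners` is a hypothetical helper.

```python
import importlib.util
from pathlib import Path


def load_rid_to_runners(path: Path) -> dict:
    """Import a generated rid_to_runners.py from `path` and return RID_TO_RUNNERS."""
    spec = importlib.util.spec_from_file_location("rid_to_runners", str(path))
    if spec is None or spec.loader is None:
        raise ImportError(f"Cannot load a module from {path}")
    module = importlib.util.module_from_spec(spec)
    # Executes the file, which defines RID_TO_RUNNERS in the module namespace.
    spec.loader.exec_module(module)
    return module.RID_TO_RUNNERS
```

In the notebook this would be `load_rid_to_runners(Path(BASE_DIR) / "utils" / "rid_to_runners.py")`. A regular `import` caches compiled bytecode in `__pycache__`, so after the first import the large literal may load faster than through this path-based loader.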

---