
# DraftKings NFL DFS 
# Web Scraping and Data Science Notebook


### James Chapman


---

  Automates the extraction and analysis of DraftKings NFL contest data. Uses DraftKings endpoints with browser login cookies to pull contest metadata and contest results, including per-contest player ownership for use in a machine learning project.
  
---
**Data Cleaning:**  
   - Collect data from DraftKings using `Requests` & `BeautifulSoup`
   - Lots of `Pandas`

**Parallel Processing:**  
   - **Parallelized downloads:** Contest results are fetched concurrently with a thread pool (`concurrent.futures.ThreadPoolExecutor`). The work is network-bound, so overlapping requests cuts wall-clock time for large contest lists.
     - Demonstrates submitting one task per contest and collecting results as they complete.
     - Retries failed contests before marking them as errors.
---


##  This notebook creates 3 DataFrames



### 1. Contest Details DataFrame (`contests.csv`)

- **Shape:** 30 columns x ~75,000 contests per year
- **Key Features:**
  - **Contest Metadata:**  
    - *Identifiers:* `contest_id`, `name`  
    - *Capacity:* `max_entries`, `entries`  
    - *Money:* `entry_fee`, `total_prizes`
  - **Contest Types:**  
    - *Formats & Rules:* `gameType`, `is_guaranteed`, `is_winner_take_all`, `is_double_up`, `is_fifty_fifty`, `is_multiplier`  
    - *Additional Flags:* Sparse columns such as `IsCasual` and `attr.IsPrivate` flag niche contest attributes or optional features.
  - **Miscellaneous:**  
    - *Date & Time:* `date_time`, `date`, `week`, `year` allow for time-based analyses and trend detection.

- **Analysis Potential:**  
  - **Trend Analysis:** Study variations in entry fees and prize distributions over time or across different NFL weeks/seasons.
  - **Contest Segmentation:** Cluster contests based on format and attributes to identify patterns in popularity or profitability.
  - **Risk-Reward Modeling:** Analyze the relationship between contest parameters (e.g., entry fee versus prize pool) to develop predictive models on contest performance.
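
As one concrete instance of the entry-fee-versus-prize-pool analysis, the sketch below computes how much of the collected fees a contest pays back. The sample rows and the `fees_collected` and `payout_ratio` column names are made up for illustration, and it assumes `entries` holds the final entry count; if the lobby was scraped before a contest filled, the ratio will be overstated.

```python
import pandas as pd

# Hypothetical sample using the column names of contests.csv
contests_sample = pd.DataFrame({
    "contest_id": [1, 2],
    "entries": [100, 10],
    "entry_fee": [5.0, 1.0],
    "total_prizes": [400.0, 9.0],
})

# Total fees paid by all entrants, and the share of it returned as prizes
contests_sample["fees_collected"] = contests_sample["entries"] * contests_sample["entry_fee"]
contests_sample["payout_ratio"] = contests_sample["total_prizes"] / contests_sample["fees_collected"]

print(contests_sample[["contest_id", "fees_collected", "payout_ratio"]])
```

A ratio below 1 means part of the fees is kept by the operator; a ratio above 1 would indicate a guaranteed contest that did not fill.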

---

### 2. Contest Results DataFrame (`contestResults.csv`)

- **Shape:** 6 columns x ~1,000,000 individual entries per year

- **contest_id**  
- **place**  
- **entry_id**  
- **points**  
- **lineup** - the lineup entered, e.g. "DST Vikings FLEX Rashid Shaheed QB Josh Allen..."
- **payout** - dollar amount this entry won in the contest

- **Analysis Potential:**  
  - **Outcome Prediction:** Use historical contest results to build models that forecast likely outcomes, such as predicting winners or estimating payout distributions.
  - **Performance Benchmarking:** Evaluate how different contest characteristics (e.g., contest format or entry fee) correlate with performance metrics.
  - **Aggregated Insights:** Summarize data across contests to identify trends at various levels (e.g., by week, season, or contest type).
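
Most of these analyses need the `lineup` string broken into roster slots first. Here is a minimal sketch of one way to do that, assuming every slot starts with one of the position tokens listed in `POSITION_TOKENS`. That token set is inferred from the sample rows in this notebook and may be incomplete; `parse_lineup` is a hypothetical helper, not part of the scraper.

```python
import re

# Assumed set of roster-slot tokens (inferred from sample lineups)
POSITION_TOKENS = ("QB", "RB", "WR", "TE", "FLEX", "DST", "CPT")
_TOKEN = "|".join(POSITION_TOKENS)

# A slot is a token followed by a name that runs until the next token or end of string
_SLOT_RE = re.compile(rf"\b({_TOKEN})\s+(.*?)(?=\s+(?:{_TOKEN})\s|$)")

def parse_lineup(lineup):
    """Split a lineup string into (slot, player) pairs, in order."""
    return _SLOT_RE.findall(lineup)

pairs = parse_lineup("DST Vikings FLEX Rashid Shaheed QB Josh Allen")
print(pairs)
```

Keeping the pairs in order preserves repeated slots such as several `FLEX` entries in Showdown lineups.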

---

### 3. Percent Player Usage/Ownership DataFrame (`contestOwnership.csv`)

- **Shape:** 5 columns x ~2,000,000 rows per year (one row per player and roster position in each contest)
- **Key Features:**
  
- **contest_id**  
- **player**   
- **pos**  
- **drafted** - percentage of entries in the given contest that used this player
- **points** - fantasy points this player scored

- **Analysis Potential:**  
  - **Player Performance Analysis:** Investigate how player popularity (via drafted percentages) relates to actual performance, aiding in player ranking or selection strategies.
  - **Predictive Modeling:** Build models to forecast player performance based on historical data, which can be crucial for fantasy sports lineup optimization.
  - **Market Dynamics:** Analyze trends over time to understand shifts in player popularity and performance across various NFL contests.
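
A first check on the popularity-versus-performance question is the correlation between `drafted` and `points` within each contest. The sketch below uses invented rows (`ownership_sample` is hypothetical) with the same column names as `contestOwnership.csv`.

```python
import pandas as pd

# Hypothetical sample: contest 1 rewards popular players, contest 2 punishes them
ownership_sample = pd.DataFrame({
    "contest_id": [1, 1, 1, 2, 2, 2],
    "player": ["A", "B", "C", "D", "E", "F"],
    "drafted": [10.0, 20.0, 30.0, 10.0, 20.0, 30.0],
    "points": [5.0, 10.0, 15.0, 15.0, 10.0, 5.0],
})

# Pearson correlation between ownership and fantasy points, one value per contest
corr_by_contest = (
    ownership_sample
    .groupby("contest_id")[["drafted", "points"]]
    .apply(lambda g: g["drafted"].corr(g["points"]))
)
print(corr_by_contest)
```

On real data the values will sit well inside the -1 to 1 range; contests with only a handful of players give noisy estimates.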

---


- **Feature-Rich Datasets for Data Science:**  
  The detailed, multi-dimensional data captured in these DataFrames enables rigorous exploratory data analysis (EDA) and robust feature engineering. Data scientists can extract meaningful patterns, create new variables, and prepare datasets for advanced modeling tasks.

- **Machine Learning Applications:**  
  - **Predictive Analytics:**  
    Utilize historical contest and player performance data to predict future outcomes, optimize contest entries, or forecast player scores. For example, regression models could predict expected points or payouts, while classification models might forecast contest winners.
  - **Clustering & Segmentation:**  
    Group similar contests or player profiles using clustering algorithms to identify underlying patterns that inform strategy decisions in NFL fantasy sports.
  - **Risk Management:**  
    Develop models to assess the risk and reward associated with different contest types, aiding in decision-making for both participants and organizers.

- **Real-World NFL Analytics:**  
  This dataset is tailored to the nuances of NFL contests and player dynamics. Whether you’re an analyst aiming to improve team selections, a strategist looking to understand contest trends, or a machine learning practitioner building predictive models, the rich, diverse data serves as a solid foundation for a range of real-world applications.


- **For Data Science and Machine Learning:**  
  The cleaned and structured datasets allow you to:
  - **Perform Exploratory Data Analysis (EDA):** Quickly understand contest dynamics and player performance.
  - **Build Predictive Models:** Use the features from the datasets to develop models predicting contest outcomes or player performance.
  - **Enhance Feature Engineering:** Tackle challenges like sparse data and categorical variables, turning them into insights for machine learning pipelines.
---



In [1]:
import pandas as pd
import numpy as np
import requests
import datetime
import zipfile
import os.path
import json
import time
import csv
import re
import concurrent.futures
from bs4 import BeautifulSoup
from tqdm import tqdm

In [2]:
CONTEST_MAIN_URL = 'https://www.draftkings.com/lobby/getcontests?sport=NFL'
INDIVIDUAL_CONTEST_URL = 'https://www.draftkings.com/contest/gamecenter/{}'#.format(contest_id)
EXPORT_URL = 'https://www.draftkings.com/contest/exportfullstandingscsv/{}'#.format(contest_id)
CONTEST_DETAIL_URL = 'https://www.draftkings.com/contest/detailspop'

DRAFT_TABLES_URL = 'https://api.draftkings.com/draftgroups/v1/draftgroups/{}/draftables'
DRAFT_GROUPS_URL = 'https://api.draftkings.com/draftgroups/v1/{}'

In [3]:
current_year = 2024
now = datetime.datetime.now().strftime("%m-%d-%Y__%H_%M")

##########################################################
# Set your browser's default download folder to the directory of this notebook
TMP_ZIP_CSV_PATH = 'C:/Users/James/OneDrive/Fantasy Football/Jupyter/data/DraftKingsScraper/{}/contestResults/'#.format(current_year)
##########################################################
# store raw csv contest results from DraftKings
CSV_FILE_NAME = 'data/DraftKingsScraper/{}/contestResults/contest-standings-{}.csv'#.format(current_year,contest_id) 

dateToWeek = pd.read_csv('data/dateToWeek.csv') # Special file mapping Date to NFL week/year
contests = pd.read_csv('data/DraftKingsScraper/{}/contests.csv'.format(current_year), low_memory=False)
contestResults = pd.read_csv('data/DraftKingsScraper/{}/contestResults.csv'.format(current_year))
contestOwnership = pd.read_csv('data/DraftKingsScraper/{}/contestOwnership.csv'.format(current_year))


## Collect all Contests from DraftKings Main URL

##### Contests change every day (each day's contests are saved separately)

In [4]:
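r = None
for i in range(3): # Retry up to 3 times
    try:
        r = requests.get(CONTEST_MAIN_URL).json() 
        break # exit loop if get works
    except (requests.RequestException, ValueError): # network error or non-JSON body
        time.sleep(0.1) # 100ms before retry
if r is None:
    raise RuntimeError('Could not load contests from CONTEST_MAIN_URL')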
                
dailyContests = pd.json_normalize(r['Contests'])
dailyContests = dailyContests.drop(['payoutDescriptionMetadata'], axis=1) #InterAble

# Discard SnakeDraft contests (Bool Column)
dailyContests = dailyContests[dailyContests['isSnakeDraft']==False]

# Drop all columns that only have one unique value (useless)
dailyContests = dailyContests.drop([column for column in dailyContests.columns.tolist() 
                                    if pd.unique(dailyContests[column]).size == 1], axis=1)
dailyContests = dailyContests.rename(columns={'id':'contest_id',
                                              'n':'name',
                                              'sd':'date_time',
                                              'po':'total_prizes',
                                              'mec':'max_entries',
                                              'm':'entries',
                                              'dg':'draftGroupID',
                                              'a':'entry_fee',
                                              'attr.IsGuaranteed':'is_guaranteed',
                                              'attr.IsDoubleUp':'is_double_up',
                                              'attr.IsFiftyfifty':'is_fifty_fifty',
                                              'attr.IsWinnerTakeAll':'is_winner_take_all',  
                                              'attr.League':'is_league',
                                              'attr.IsStarred':'is_starred',
                                              'attr.Multiplier':'is_multiplier',
                                              'attr.IsQualifier':'is_qualifier',  
                                              'attr.IsBeginner':'IsBeginner',  
                                              'attr.IsCasual':'IsCasual',  
                                              'attr.IsTournamentOfChamp':'is_tournament_of_champ'})
dailyContests = dailyContests.drop(['fpp',
                                    'startTimeType',
                                    'cso',
                                    'nt',
                                    'tmpl',
                                    'dgpo',
                                    'so',
                                    'rll',
                                    'crownAmount',
                                    'isSnakeDraft',
                                    'attr.LobbyClass',
                                    'pd.Cash',
                                    'pd.ContestSeat',
                                    'pd.Ticket',
                                    'attr.IsHeadliner',  
                                    'attr.HideBrandedLogo',
                                    'attr.DraftSpeed',
                                    'sdstring'], axis=1, errors='ignore') 

boolCols = ['is_guaranteed','is_starred','is_qualifier',#'is_tournament_of_champ','IsCasual',
                'is_winner_take_all','is_double_up','is_fifty_fifty',#'is_multiplier',
                'is_league','IsBeginner']

# Convert Boolean columns to the pandas bool dtype, treating missing values as False
dailyContests[boolCols] = dailyContests[boolCols].replace({'true':True, np.nan:False})
# dailyContests[boolCols] = dailyContests[boolCols].replace({'true': True, np.nan: False}).astype(bool)
dailyContests[boolCols] = dailyContests[boolCols].astype('bool')

# Discard qualifier/Tournament of champ/League contests
dailyContests = dailyContests[(dailyContests['is_qualifier']==False)
                             #&(dailyContests['is_tournament_of_champ']==False)
                             &(dailyContests['is_league']==False)]
dailyContests = dailyContests.drop(['is_qualifier','is_league'], axis=1)#,'is_tournament_of_champ','is_league'], axis=1) 



## Add Columns to DataFrame
#### date, datetime, week, year, draftGroupStatus, contestStatus

In [5]:
# Convert dailyContests['date_time'] values such as '/Date(1726419600000)/' (milliseconds
# since the Unix epoch) into datetime strings that can be matched against dateToWeek['Date']
def get_datetime_from_timestamp(timestamp_str):
    timestamp = float(re.findall(r'\d+', timestamp_str)[0])
    return str(datetime.datetime.fromtimestamp(timestamp / 1000))

badDates = set([])
def addWeek(date):
    week = pd.unique(dateToWeek[dateToWeek['Date']== date]['week'])
    if(len(week)!=1):
        badDates.add(date)
        week=0
    else:
        week=int(week[0])
    return week

dailyContests['date_time'] = dailyContests['date_time'].apply(get_datetime_from_timestamp)
dailyContests['date'] = dailyContests['date_time'].str.split(' ').str[0]
dailyContests['week'] = dailyContests['date'].apply(addWeek)
dailyContests['year'] = current_year
dailyContests['draftGroupStatus'] = 'New'
dailyContests['contestStatus'] = 'New'

if(len(badDates)>0):
    print('------------ Confirm these dates are not useful (Preseason, Etc.)--------------  ')
    print(badDates)      
    print('-------------------------')
    print('-------------------------')

dailyContests = dailyContests[dailyContests['week']!=0]

##########################
# SAVE CONTESTS 
dailyContests.to_csv('data/DraftKingsScraper/{}/dailyContests/dailyContests_{}.csv'.format(current_year, datetime.datetime.now().strftime("%m-%d-%Y__%H_%M")), index=False)
##########################

------------ Confirm these dates are not useful (Preseason, Etc.)--------------  
{'2025-02-12'}
-------------------------
-------------------------
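
The `/Date(...)/` strings handled above are millisecond Unix timestamps, and `datetime.fromtimestamp` without a time zone converts them to the local time of the machine running the notebook. If the data is ever processed on machines in different time zones, a time-zone-aware variant may be safer. This is a sketch only; `parse_dk_timestamp` is a hypothetical name and is not used elsewhere in the notebook.

```python
import re
from datetime import datetime, timezone

def parse_dk_timestamp(raw):
    """Parse a string like '/Date(1726419600000)/' into an aware UTC datetime."""
    millis = int(re.search(r"\d+", raw).group())
    return datetime.fromtimestamp(millis / 1000, tz=timezone.utc)

kickoff = parse_dk_timestamp("/Date(1726419600000)/")
print(kickoff.isoformat())
```

Switching to UTC would also require `dateToWeek.csv` to use UTC dates, since late-night games can land on a different calendar day.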




### Add daily contests to the yearly list

In [6]:
# Find new contest_ids and add them to the contests DataFrame with 'New' status
new_contests = dailyContests[~dailyContests['contest_id'].isin(contests['contest_id'])]
starting_length = len(contests) # before addition, used later
contests = pd.concat([contests, new_contests], ignore_index=True)

# Check for duplicates
duplicate_contests = contests[contests['contest_id'].duplicated(keep=False)]
if len(duplicate_contests) > 0:
    print(duplicate_contests)
    raise ValueError('Duplicate contest_id values found in contests')

## Find contest_ids NEEDING RESULTS
##### First find draftGroups with 'status' Change (Using DRAFT_GROUPS_URL)

In [7]:
draftGroupIDs_NEEDING_RESULTS = []
  
# Find draft groups that are waiting for results
for draftGroupID in contests[contests['draftGroupStatus'].isin(['New','Upcoming','Preliminary'])]['draftGroupID'].unique():
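    r = None
    for i in range(3): # Retry up to 3 times
        try:
            response = requests.get(DRAFT_GROUPS_URL.format(draftGroupID))
            if response.status_code != 200:
                time.sleep(0.1) 
                continue # Retry 
            r = response.json()
            break  # Exit loop once the request succeeds
        except (requests.RequestException, ValueError): # network error or non-JSON body
            time.sleep(0.1) # Retry 
    if r is None: # all retries failed; leave the status unchanged and try again next run
        print('Could not fetch draft group', draftGroupID)
        continue
    newStatus = r['draftGroup']['draftGroupState']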
    contests.loc[(contests['draftGroupID']==draftGroupID),'draftGroupStatus'] = newStatus

    print(draftGroupID,newStatus)

    # New draft group state !!!!!! New Type of status!
    if newStatus not in ['Historical','Upcoming','HistoricalCancelled', 
                         'Live','Preliminary','Archived','Finalized']:
        print('------------------------------------')
        print('New draft group state !!!!!!',draftGroupID,newStatus)
        print('------------------------------------')
        
    # NEEDS RESULTS
    if newStatus in ['Historical','Archived','Finalized']:
        draftGroupIDs_NEEDING_RESULTS.append(draftGroupID)

# Find all contests with draftGroupStatus change
contests_NEEDING_RESULTS = contests[contests['draftGroupID'].isin(draftGroupIDs_NEEDING_RESULTS)].copy()

    
contestERROR = contests_NEEDING_RESULTS[contests_NEEDING_RESULTS['contestStatus']!= 'New'].copy()
if(0!=len(contestERROR)): 
    print('------ ERROR With contestStatus -----')
    contestERROR.info(verbose=True)
    
print('Length of contestResults -------',len(contestResults))
print('Length of contestOwnership -----',len(contestOwnership))
print('Length of dailyContests --------',len(dailyContests))
print('Length of contests before ------',starting_length)
print('Length of contests after -------',len(contests))
print('draftGroups needing results --**', len(draftGroupIDs_NEEDING_RESULTS))
print('contests needing results------**', len(contests_NEEDING_RESULTS))

118621 Historical
121040 Historical
121041 Historical
121042 Historical
121246 Historical
121247 Historical
121248 Historical
121249 Historical
121252 Historical
121250 Historical
121251 Historical
121255 Historical
121253 Historical
121254 Historical
Length of contestResults ------- 485118
Length of contestOwnership ----- 1973674
Length of dailyContests -------- 0
Length of contests before ------ 75383
Length of contests after ------- 75383
draftGroups needing results --** 14
contests needing results------** 950


## Functions for `get_contest_results(contest_id)`

1. **`getSession()`**:  
   Grabs login cookies needed to access contest results.  
   *Returns*: `session`

2. **`download_results(export_url, file_name)`**:  
   Downloads CSV/zip and saves to disk.  
   *Returns*: `response`

3. **`get_payout_table(contest_id)`**:  
   Scrapes the contest details page and returns the contest's payouts as a table.  
   *Returns*: `payoutTable`

4. **`add_contest_payouts(singleContestResults, payoutTable)`**:  
   Adds payout values for contest results of a single contest.  
   *Returns*: `singleContestResults`


In [8]:
# outdated COOKIE_FILE_PATH = '/Users/James/AppData/Local/Google/Chrome/User Data/Default/Network/Cookies'
def getSession():
    # cookies = browser_cookie3.chrome() # outdated
    ##################################################
    # Get 'cookies.txt LOCALLY' chrome extension
    # log into DraftKings, click on Get cookies extension
    # save to this directory, as Json 
    ##################################################
    with open('cookies.json', 'r') as f:
        cookies = json.load(f)
    session = requests.session()
    if len(cookies) == 0:
        print("Error finding draftkings cookies")
    else:
        # session.cookies.update(cookies)
        for cookie in cookies:
            session.cookies.set(cookie['name'], cookie['value'], domain=cookie['domain'])
    return session

In [9]:
import io  # used to unzip the response in memory

def download_results(export_url, file_name):
    session = getSession()
    response = session.get(export_url)

    try:
        decoded_content = response.content.decode('utf-8')
        with open(file_name, 'w', newline='') as f:
            writer = csv.writer(f)
            for row in csv.reader(decoded_content.splitlines(), delimiter=','):
                writer.writerow(row)
    except (UnicodeDecodeError, UnicodeEncodeError):
        # Sometimes the export is a zip archive rather than plain CSV text.
        # Unzip it from memory: a shared 'out.zip' on disk would be overwritten
        # when several threads download at the same time.
        try:
            with zipfile.ZipFile(io.BytesIO(response.content)) as z:
                z.extractall(TMP_ZIP_CSV_PATH.format(current_year))
        except (zipfile.BadZipFile, OSError):
            # no contest results found
            return 'ERROR'
    return response

In [10]:
def get_payout_table(contest_id):
    def place_to_number(s):
        return int(re.findall(r'\d+', s)[0])

    session = getSession()
    PARAMS = {
        'contestId': contest_id,
        'showDraftButton': False,
        'defaultToDetails': True,
        'layoutType': 'legacy'
    }
    response = session.get(CONTEST_DETAIL_URL, params=PARAMS)
    soup = BeautifulSoup(response.text, 'lxml')
    results = []
    try:
        rows = soup.find_all(id='payouts-table')[0].find_all('tr')
        for row in rows:
            places, payout = [x.string for x in row.find_all('td')]
            places = [place_to_number(x.strip()) for x in places.split('-')]
            start, end = ((places[0], places[0]) if len(places) == 1 else places)
            payout = payout.replace('$', '').replace(',', '')
            try:
                payout = float(payout)
            except ValueError: # not a plain dollar amount
                payout = 0
            results.append([int(start), int(end), payout])
    except Exception: # payout table missing or in an unexpected layout
        print(INDIVIDUAL_CONTEST_URL.format(contest_id))
        
    payoutTable_file_name = 'data/DraftKingsScraper/{}/payoutTables/payoutTable_{}.csv'.format(current_year, contest_id)
    if os.path.isfile(payoutTable_file_name) and os.path.getsize(payoutTable_file_name) > 0 :
        #print(f"PAYOUT table file for contest {contest_id} already exists, with size- {os.path.getsize(payoutTable_file_name)}")
        results = pd.read_csv(payoutTable_file_name, low_memory=False)
        results = results.to_numpy()
    elif len(results) > 0: # saves to disc
        pd.DataFrame(results).to_csv(payoutTable_file_name, index=False)
    
    return results
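
The row parsing inside `get_payout_table` can be illustrated on its own. The sketch below assumes place labels look like `1st` or `2nd - 5th` and cash labels like `$1,000.00`; those formats, the `Ticket` label, and the helper names `parse_place_range` and `parse_cash` are assumptions made for illustration. Unlike the scraper, it also strips thousands separators from place labels before reading the numbers.

```python
import re

def parse_place_range(text):
    """Return (start, end) for a place label such as '1st' or '11th - 15th'."""
    numbers = [int(n) for n in re.findall(r"\d+", text.replace(",", ""))]
    if not numbers:
        raise ValueError(f"no place number in {text!r}")
    return numbers[0], numbers[-1]

def parse_cash(text):
    """Return the dollar value of a label such as '$1,000.00', or 0.0 if it is not cash."""
    try:
        return float(text.replace("$", "").replace(",", ""))
    except ValueError:
        return 0.0

# Hypothetical (place label, prize label) pairs as they might appear in the table cells
rows = [("1st", "$1,000.00"), ("2nd - 5th", "$50.00"), ("6th - 10th", "Ticket")]
payout_table = [[*parse_place_range(p), parse_cash(c)] for p, c in rows]
print(payout_table)
```

Each output row has the same `[start, end, payout]` shape that `add_contest_payouts` expects.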

In [11]:
def add_contest_payouts(singleContestResults, payoutTable):
    # create tieTable with the number of ties at each place (rank)
    tieTable = singleContestResults['place'].value_counts().reset_index()
    tieTable['place'] = pd.to_numeric(tieTable['place'])
    tieTable['count'] = pd.to_numeric(tieTable['count'])
    tieTable = tieTable.rename(columns={'count':'tie_count'})
    tieTable = tieTable.sort_values(by='place', ascending=True)
    tieTable = tieTable.to_numpy()
    payouts = []
    for place, tie_count in tieTable:
        max_range = tie_count + place - 1
        prize_pool = None
        payout = None
        for idx, (start, end, prize) in enumerate(payoutTable):
            # calculate the prize pool for the next payout bracket
            if prize_pool:
                if start <= max_range <= end:
                    prize_pool += prize * (max_range - start + 1)
                    ######### payout found #################
                    payout = round(prize_pool / tie_count, 2)
                    payouts.append([place, payout]) 
                    break  # Exit the inner loop
                    ########################################
                else : # add to the prize_pool 
                    prize_pool += prize * (end - start + 1)
            elif start <= place <= end : # this will only trigger once per place,tie_count
                if max_range <= end :
                    ######### payout found #################
                    payout = round(prize, 2)
                    payouts.append([place, payout])
                    break  # Exit the inner loop
                    ########################################
                else : # initialize the prize_pool 
                    prize_pool = prize * (end - place + 1)
        
            if idx == len(payoutTable) - 1 : 
                if prize_pool :
                    ######### payout found #################
                    # Last payout
                    payout = round(prize_pool / tie_count, 2)
                    payouts.append([place, payout]) 
                    ########################################
                # the rest is out of the money
                break  # Exit the inner loop (last payout bracket reached)
 
    payouts = pd.DataFrame(payouts, columns=['place', 'payout'])
    singleContestResults = pd.merge(singleContestResults, payouts, how="left",on = ['place'])
    return singleContestResults
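
The tie handling in `add_contest_payouts` comes down to one rule: entries tied at a place occupy a run of consecutive positions, and they split the prizes for those positions equally. Here is a compact sketch of that rule for a single place. `split_tied_payout` and the sample table are hypothetical, and unlike the function above it returns `0.0` for places outside the money instead of leaving them missing.

```python
def split_tied_payout(place, tie_count, payout_table):
    """Average the prizes for positions place .. place + tie_count - 1."""
    last = place + tie_count - 1
    pool = 0.0
    for start, end, prize in payout_table:
        # Overlap between this payout bracket and the tied positions
        lo, hi = max(start, place), min(end, last)
        if lo <= hi:
            pool += prize * (hi - lo + 1)
    return round(pool / tie_count, 2)

# Hypothetical payout table: 1st pays $100, 2nd pays $50, 3rd-5th pay $10 each
payout_table = [(1, 1, 100.0), (2, 2, 50.0), (3, 5, 10.0)]

two_way_tie_for_first = split_tied_payout(1, 2, payout_table)   # (100 + 50) / 2
three_way_tie_for_fifth = split_tied_payout(5, 3, payout_table)  # only 5th place pays: 10 / 3
print(two_way_tie_for_first, three_way_tie_for_fifth)
```

Both results match what the bracket-walking loop above produces for the same inputs.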

## `get_contest_results(contest_id)`

Uses the four functions above.

In [12]:
def get_contest_results(contest_id): 
    file_name =  CSV_FILE_NAME.format(current_year,contest_id)
    export_url = EXPORT_URL.format(contest_id)

    if os.path.isfile(file_name) and os.path.getsize(file_name) > 0 :
        print(f"Download file for contest {contest_id} already exists, with size- {os.path.getsize(file_name)}")
        response = 200
    else: # saves to disc
        response = download_results(export_url,file_name)

    if not os.path.isfile(file_name): # did not get contest results (file_name.CSV)
        return 'ERROR', contest_id
    if response == 'ERROR': # no contest results found in download_results
        return 'ERROR', contest_id
    if os.path.getsize(file_name) == 0: # contest results CSV is empty
        for i in range(3): # Retry up to 3 times
            try:
                os.remove(file_name)
                break # exit loop if file is removed
            except PermissionError:
                time.sleep(0.1) # 100ms before retry
        return 'ERROR', contest_id

    try:
        file = pd.read_csv(file_name, low_memory=False) 
        # contest results
        singleContestResults = file[['Rank','EntryId','Points','Lineup']].copy() 
        singleContestResults = singleContestResults.rename(columns={'Rank':'place','EntryId':'entry_id',
                                                                    'Points':'points','Lineup':'lineup',})
    except:
        os.remove(file_name)
        return 'ERROR', contest_id

    payoutTable = get_payout_table(contest_id)
    if len(payoutTable) == 0:
        #print(f"ERROR payoutTable - {contest_id}")
        return 'ERROR', contest_id

    # file and singleContestResults were already loaded in the try block above
    singleContestResults = add_contest_payouts(singleContestResults, payoutTable)
    singleContestResults = singleContestResults.dropna()
    singleContestResults['contest_id'] = contest_id
    # contest ownership
    singleContestOwnership = file[['Player','Roster Position','%Drafted','FPTS']].copy()
    singleContestOwnership = singleContestOwnership.rename(columns={'Player':'player','Roster Position':'pos',
                                                                '%Drafted':'drafted','FPTS':'points',})
    if singleContestOwnership['drafted'].dtype == 'object':
        singleContestOwnership['drafted'] = singleContestOwnership['drafted'].str.rstrip('%').astype('float64')
    singleContestOwnership = singleContestOwnership.dropna()
    singleContestOwnership['contest_id'] = contest_id
    ######################
    # Success!
    return 'RESULTS_GATHERED', contest_id, singleContestResults, singleContestOwnership
    ######################


## Loop through all contests needing results
##### Collect results and ownership for each contest
<a id='Training'></a>

In [13]:
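# Retry earlier failures if there are any; otherwise process every contest needing results.
# error_contests is a list after In [14] but a DataFrame after In [17], so flatten it to a list.
if 'error_contests' in locals() and len(error_contests) > 0:
    contest_ids = np.ravel(error_contests).tolist()
else:
    contest_ids = contests_NEEDING_RESULTS.contest_id.unique()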
print(len(contest_ids))

950


In [14]:
max_retries = 5
successful_contests = []
error_contests = []
new_results, new_ownership = [], []  # per-contest frames, concatenated once after the loop
# Parallel processing: downloads are network-bound, so threads overlap the waiting time
with concurrent.futures.ThreadPoolExecutor(max_workers=28) as executor:
    futures = [executor.submit(get_contest_results, contest_id) for contest_id in contest_ids]
    
    for future in tqdm(concurrent.futures.as_completed(futures), total=len(contest_ids)):
        result = future.result()

        # Retry a failed contest_id up to max_retries times (sequentially, in the main thread)
        retries = 0
        while result[0] == 'ERROR' and retries < max_retries: 
            result = get_contest_results(result[1])
            retries += 1
    
        if result[0] == 'ERROR': # contest_id FAILED
            error_contests.append(result[1])  
        elif result[0] == 'RESULTS_GATHERED': # success!
            successful_contests.append(result[1])  
            new_results.append(result[2])
            new_ownership.append(result[3])

# Concatenate once: calling pd.concat inside the loop copies the full DataFrame on every iteration
contestResults = pd.concat([contestResults, *new_results], ignore_index=True)
contestOwnership = pd.concat([contestOwnership, *new_ownership], ignore_index=True)

contests.loc[(contests['contest_id'].isin(error_contests)),'contestStatus'] = 'ERROR'
contests.loc[(contests['contest_id'].isin(successful_contests)),'contestStatus'] = 'RESULTS_GATHERED'

  3%|▎         | 27/950 [00:04<01:11, 12.91it/s]

Download file for contest 173580720 already exists, with size- 11534336


  8%|▊         | 76/950 [00:31<15:29,  1.06s/it]

Download file for contest 173580722 already exists, with size- 11534336


100%|██████████| 950/950 [06:25<00:00,  2.46it/s]


## View & Save

In [15]:
print(f"Number of contests (with results) added : {len(successful_contests)}")
print(f"Number of contests with errors : {len(error_contests)}")
print(f"Total ERROR count : {len(contests[(contests['contestStatus']=='ERROR')])}")

if contestResults.duplicated().any():
    print(f"Duplicates found in contestResults : {contestResults.duplicated().sum()}")
    contestResults.drop_duplicates(subset=['contest_id', 'entry_id'], ignore_index=True, inplace=True)
if contestOwnership.duplicated().any():
    print(f"Duplicates found in contestOwnership : {contestOwnership.duplicated().sum()}")
    contestOwnership.drop_duplicates(subset=['contest_id', 'player', 'pos'],ignore_index=True, inplace=True)

print("---------------------------------------------")
print("---------------------------------------------")
contests.info(verbose=True)
contestResults.info(verbose=True)
contestOwnership.info(verbose=True)

Number of contests (with results) added : 676
Number of contests with errors : 274
Total ERROR count : 22050
---------------------------------------------
---------------------------------------------
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 75383 entries, 0 to 75382
Data columns (total 30 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   max_entries             75383 non-null  int64  
 1   name                    75383 non-null  object 
 2   entries                 75383 non-null  int64  
 3   entry_fee               75383 non-null  float64
 4   total_prizes            75383 non-null  float64
 5   date_time               75383 non-null  object 
 6   contest_id              75383 non-null  int64  
 7   draftGroupID            75383 non-null  int64  
 8   gameType                75383 non-null  object 
 9   is_guaranteed           75383 non-null  bool   
 10  is_starred              75383 non-null  bool   
 


[Rerun](#Training)<br>
    

In [16]:
##########################
##########################
# SAVE (OVERWRITE)
contests.to_csv('data/DraftKingsScraper/{}/contests.csv'.format(current_year), index=False)
contestResults.to_csv('data/DraftKingsScraper/{}/contestResults.csv'.format(current_year), index=False)
contestOwnership.to_csv('data/DraftKingsScraper/{}/contestOwnership.csv'.format(current_year), index=False)
##########################
##########################

In [17]:
error_contests = pd.DataFrame(error_contests)
error_contests.to_csv('data/DraftKingsScraper/{}/dailyContests/error_contests_{}.csv'.format(current_year, datetime.datetime.now().strftime("%m-%d-%Y__%H_%M")), index=False)


In [18]:
contestResults

Unnamed: 0,contest_id,place,entry_id,points,lineup,payout
0,164120435,1.0,4.400654e+09,196.88,DST Vikings FLEX Rashid Shaheed QB Josh Allen...,1000000.0
1,164120435,2.0,4.399372e+09,196.86,DST Bengals FLEX J.K. Dobbins QB Baker Mayfie...,200000.0
2,164120435,3.0,4.398626e+09,196.16,DST Bears FLEX J.K. Dobbins QB Baker Mayfield...,100000.0
3,164120435,4.0,4.399413e+09,193.86,DST Bears FLEX J.K. Dobbins QB Baker Mayfield...,50000.0
4,164120435,5.0,4.398888e+09,190.36,DST Vikings FLEX Kenneth Walker III QB Baker ...,30000.0
...,...,...,...,...,...,...
989387,173720940,1.0,4.635381e+09,141.74,CPT Jalen Hurts FLEX Saquon Barkley FLEX Patri...,6.0
989388,173720940,2.0,4.636417e+09,123.47,CPT Xavier Worthy FLEX Jalen Hurts FLEX Patric...,6.0
989389,173720940,3.0,4.635995e+09,121.41,CPT Patrick Mahomes FLEX Jalen Hurts FLEX Trav...,6.0
989390,173720940,4.0,4.633940e+09,95.93,CPT Saquon Barkley FLEX Patrick Mahomes FLEX X...,6.0


In [19]:
contestOwnership

Unnamed: 0,contest_id,player,pos,drafted,points
0,164284865,Patrick Mahomes,FLEX,84.00,16.14
1,164284865,Lamar Jackson,FLEX,80.00,29.12
2,164284865,Rashee Rice,CPT,65.50,30.45
3,164284865,Xavier Worthy,FLEX,50.50,20.80
4,164284865,Mark Andrews,FLEX,47.00,3.40
...,...,...,...,...,...
1988561,173720940,JuJu Smith-Schuster,FLEX,18.18,3.60
1988562,173720940,DeVonta Smith,FLEX,9.09,16.90
1988563,173720940,Eagles,FLEX,9.09,18.00
1988564,173720940,Chiefs,FLEX,9.09,3.00


In [20]:
contests

Unnamed: 0,max_entries,name,entries,entry_fee,total_prizes,date_time,contest_id,draftGroupID,gameType,is_guaranteed,...,draftGroupStatus,contestStatus,pt,pd.LiveFinalSeat,attr.IsPrivate,tix,pd.Experience,is_tournament_of_champ,pd.Prize,crownsAwarded
0,150,NFL $4M Fantasy Football Millionaire [$1M to 1st],951248,5.0,4000000.0,2024-09-08 12:00:00,164120435,109136,Classic,True,...,Finalized,RESULTS_GATHERED,,,,,,,,
1,150,NFL $2.5M Fantasy Football Millionaire [$1M to...,27777,100.0,2500000.0,2024-09-08 12:00:00,164120427,109136,Classic,True,...,Finalized,RESULTS_GATHERED,,,,,,,,
2,150,NFL $2.5M Thursday Kickoff Millionaire [$1M to...,196078,15.0,2500000.0,2024-09-05 19:20:00,164284870,110949,Showdown Captain Mode,True,...,Archived,RESULTS_GATHERED,,,,,,,,
3,30,NFL $4M MEGA Millionaire [$1M to 1st],1000,4444.0,4000000.0,2024-09-08 12:00:00,163439930,109136,Classic,True,...,Finalized,RESULTS_GATHERED,,,,,,,,
4,20,NFL $500K Play-Action [20 Entry Max],198176,3.0,500000.0,2024-09-08 12:00:00,164120436,109136,Classic,True,...,Finalized,RESULTS_GATHERED,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
75378,1,NFL Single Stat $1 50-50 (Super Bowl Total Yards),10,1.0,9.0,2025-02-09 15:30:00,173789711,121042,Single Stat - Total Yards,False,...,Historical,RESULTS_GATHERED,1.0,,,False,,,,0.5
75379,1,NFL Single Stat $1 50-50 (Super Bowl Total Yards),10,1.0,9.0,2025-02-09 15:30:00,173789712,121042,Single Stat - Total Yards,False,...,Historical,RESULTS_GATHERED,1.0,,,False,,,,0.5
75380,1,NFL Single Stat $1 Double Up (Super Bowl Total...,11,1.0,10.0,2025-02-09 15:30:00,173789714,121042,Single Stat - Total Yards,False,...,Historical,ERROR,1.0,,,False,,,,0.5
75381,1,NFL Single Stat $1 Double Up (Super Bowl Total...,11,1.0,10.0,2025-02-09 15:30:00,173791661,121042,Single Stat - Total Yards,False,...,Historical,RESULTS_GATHERED,1.0,,,False,,,,0.5


In [21]:
error_contests

Unnamed: 0,0
0,173582436
1,173582528
2,173582527
3,173581455
4,173581456
...,...
269,173789714
270,173786033
271,173694809
272,173786107


In [22]:
: )

SyntaxError: unmatched ')' (2155285666.py, line 1)

In [None]:

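# error_contests is a DataFrame after In [17], so flatten it to a plain list of ids
if 'error_contests' in locals() and len(error_contests) > 0:
    contest_ids = np.ravel(error_contests).tolist()
else:
    contest_ids = contests_NEEDING_RESULTS.contest_id.unique()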

# contest_ids = contests[(contests['contestStatus']=='ERROR')]['contest_id'].unique()
# contest_ids = contests.contest_id.unique()
# columns = contestResults.columns.tolist()
# contestResults = pd.DataFrame(columns = columns)
# columns = contestOwnership.columns.tolist()
# contestOwnership = pd.DataFrame(columns = columns)
# Find the index of the contest that caused the error

# # Interrupted for loop
# contest_ids = contests_NEEDING_RESULTS.contest_id.unique()
# contest_ids = contest_ids.tolist()
# remaining_contests = contest_ids[contest_ids.index(167603397) - 28:]
# print(len(error_contests))
# print(len(remaining_contests))
# remaining_contests.extend(error_contests)
# contest_ids = list(set(remaining_contests))

#contest_ids = contests[(contests['contestStatus']=='New')]['contest_id'].unique()
print(len(contest_ids))