# 1. Libraries

In [1]:
# Import Libraries
import requests
import pandas as pd
import os
from dotenv import load_dotenv

# 2. Introduction

The purpose of this notebook is to gather data on the European ARMS Challenge (EUAC) tournament series. <br>
The EUAC is an online biweekly tournament series for the video game "ARMS" released for the Nintendo Switch console in 2017. <br>
The EUAC started in 2017 and ran consistently until 2022.<br>
<br>
More information can be found in references below.

This notebook only gathers the data and formats it for future use and analysis, acting as a proof of concept. <br>
If successful, the data will then be exported to a CSV file. <br>

# 3. Objectives

- Determine what data is necessary
- Gather EUAC#1 data from start.gg using their api
- Gather EUAC#2 to EUAC#110 data from Challonge.com using their api
- Format the data into a pandas dataframe
- Export to csv

## 3.1. Objective 1

### Determine what data is necessary

For each tournament, we will need: <br>
- The participants who entered
- Each match that was played
- The result of each match
- The date the tournament took place
- The final placements
- Each player's seeding

## 3.2. Objective 2 - Data Gathering

The tournaments were mostly hosted on Challonge.com but the first one was hosted on start.gg. <br>

### 3.2.1. - Start.gg

Start.gg's API uses GraphQL as its query language. <br>
The GraphQL queries will be wrapped in multiline strings so they can be sent to the start.gg API to retrieve the data. <br>
Queries also expect an API token to be passed in the request header. This token will be loaded from a .env file.<br>
<br>
A link to the tutorial that was followed to come up with the queries can be found in References.
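
As a rough sketch of what such a request looks like, the helper below builds the headers and JSON body for a GraphQL POST without sending anything. It passes the slug as a GraphQL variable rather than hard-coding it in the query string. The name `build_graphql_request`, the placeholder token, and the `$slug: String` variable declaration are assumptions made for illustration, not something taken from this notebook.

```python
import json

# Same tournament query, but with the slug supplied as a variable (assumed type: String)
TOURNAMENT_QUERY = """
query TournamentQuery($slug: String) {
  tournament(slug: $slug) {
    name
    events {
      name
      id
    }
  }
}
"""


def build_graphql_request(query, variables, token):
    """Return (headers, body) for a GraphQL POST; nothing is sent here."""
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    body = {"query": query, "variables": variables}
    return headers, body


headers, body = build_graphql_request(
    TOURNAMENT_QUERY,
    {"slug": "tournament/eu-arms-challenge-1"},
    token="PLACEHOLDER_TOKEN",  # hypothetical value; the real token comes from .env
)
print(json.dumps(body["variables"]))
```

The resulting pair could then be handed to `requests.post(url, json=body, headers=headers)`.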

In [290]:
# Load API Keys stored in .env file
load_dotenv(".env")
START_GG_API_TOKEN = os.getenv("START_GG_API_TOKEN")

The tournament was hosted at this url: https://www.start.gg/tournament/eu-arms-challenge-1/events <br>
Some of the information that is needed requires the id of the event, which is obtained here using the url slug.

In [291]:
# GraphQL query wrapped in a multiline string. 
query = """
query TournamentQuery {
  tournament(slug: "tournament/eu-arms-challenge-1") {
    name
    events {
      name
      id
    }
  }
}
"""

In [292]:
# Sends the query and returns the response. Will be using this a lot
def run_query(query):
    url = "https://api.start.gg/gql/alpha"
    headers = {
        "Authorization": f"Bearer {START_GG_API_TOKEN}",
        "Content-Type": "application/json"
    }
    response = requests.post(url, json={'query': query}, headers=headers)
    return response.json()

In [293]:
data = run_query(query)

In [294]:
data

{'data': {'tournament': {'name': 'EU ARMS Challenge #1',
   'events': [{'name': 'EU ARMS Challenge #1', 'id': 53835}]}},
 'extensions': {'cacheControl': {'version': 1,
   'hints': [{'path': ['tournament'], 'maxAge': 300, 'scope': 'PRIVATE'}]},
  'queryComplexity': 2},
 'actionRecords': []}

In [295]:
eventId = data["data"]["tournament"]["events"][0]["id"]

In [296]:
eventId

53835

The id of the event is 53835

Using the event id, a query can be written to get the players, their seedings and placements, and the start date of the tournament. <br>
Some, but not all, of this information could have been retrieved with the previous query.

In [297]:
# Query for players, seedings, placements

query = """
query {
  event(id: 53835) {
    id
    name
    startAt
    standings(query: {
      page: 1
    }) {
      nodes {
        placement
        entrant {
          id
          name
          seeds {
            seedNum
          }
        }
      }
    }
  }
}
"""

In [298]:
data = run_query(query)

In [299]:
data

{'data': {'event': {'id': 53835,
   'name': 'EU ARMS Challenge #1',
   'startAt': 1508605200,
   'standings': {'nodes': [{'placement': 1,
      'entrant': {'id': 1152276,
       'name': 'FR | Maxou0708',
       'seeds': [{'seedNum': 20}]}},
     {'placement': 2,
      'entrant': {'id': 1118825,
       'name': 'Rapha_MTH',
       'seeds': [{'seedNum': 6}]}},
     {'placement': 3,
      'entrant': {'id': 1133810, 'name': 'Sabaca', 'seeds': [{'seedNum': 8}]}},
     {'placement': 4,
      'entrant': {'id': 1152521,
       'name': 'TCM | Raffa',
       'seeds': [{'seedNum': 21}]}},
     {'placement': 5,
      'entrant': {'id': 1152258,
       'name': 'FrankTank',
       'seeds': [{'seedNum': 19}]}},
     {'placement': 5,
      'entrant': {'id': 1114568, 'name': 'ocrim', 'seeds': [{'seedNum': 2}]}},
     {'placement': 7,
      'entrant': {'id': 1139232,
       'name': 'Alumento',
       'seeds': [{'seedNum': 11}]}},
     {'placement': 7,
      'entrant': {'id': 1133727,
       'name': 'SC☆Mo

The response is a nested dictionary. Next, we gather the list of participants with their placements and seeds by digging through it.
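
One possible way to do this digging in a single pass is sketched below. It turns each standings node into a flat record. The sample holds only the first two entrants from the output above, and `flatten_standings` is a hypothetical helper name; treating a missing or empty `seeds` list as `None` is also an assumption.

```python
# Sample shaped like the standings response shown above (first two entrants only)
sample = {
    "data": {"event": {"standings": {"nodes": [
        {"placement": 1,
         "entrant": {"id": 1152276, "name": "FR | Maxou0708",
                     "seeds": [{"seedNum": 20}]}},
        {"placement": 2,
         "entrant": {"id": 1118825, "name": "Rapha_MTH",
                     "seeds": [{"seedNum": 6}]}},
    ]}}}
}


def flatten_standings(response):
    """Turn the nested standings response into a list of flat dicts."""
    rows = []
    for node in response["data"]["event"]["standings"]["nodes"]:
        entrant = node["entrant"]
        seeds = entrant.get("seeds") or []  # tolerate a missing or empty seed list
        rows.append({
            "Start ID": entrant["id"],
            "Player": entrant["name"],
            "Seed": seeds[0]["seedNum"] if seeds else None,
            "Placement": node["placement"],
        })
    return rows


rows = flatten_standings(sample)
print(rows[0])
```

A list of records like this can be passed straight to `pd.DataFrame(rows)`.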

In [300]:
data["data"]["event"]["name"]

'EU ARMS Challenge #1'

In [301]:
data["data"]["event"]["startAt"]

1508605200

In [302]:
tournamentDate = data["data"]["event"]["startAt"]

In [303]:
tournamentDate

1508605200

This is the UNIX time at which the tournament started. Only the date is required; the time of day is unnecessary.

In [304]:
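from datetime import datetime, timezone

# utcfromtimestamp() is deprecated since Python 3.12; use a timezone-aware conversion instead
date = datetime.fromtimestamp(tournamentDate, tz=timezone.utc).strftime('%d/%m/%y')
print(date)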

21/10/17
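
A small sketch of the same conversion wrapped in a reusable helper. It assumes that later analysis may want to sort by date; in that case an ISO-style format is easier to work with, because `dd/mm/yy` strings do not sort chronologically. `unix_to_date` is a hypothetical name.

```python
from datetime import datetime, timezone


def unix_to_date(timestamp, fmt="%d/%m/%y"):
    """Format a UNIX timestamp as a UTC date string."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc).strftime(fmt)


start_at = 1508605200  # startAt value returned for EUAC #1
print(unix_to_date(start_at))              # same format as used in this notebook
print(unix_to_date(start_at, "%Y-%m-%d"))  # ISO-style, sorts chronologically as text
```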


In [305]:
data["data"]["event"]["standings"]

{'nodes': [{'placement': 1,
   'entrant': {'id': 1152276,
    'name': 'FR | Maxou0708',
    'seeds': [{'seedNum': 20}]}},
  {'placement': 2,
   'entrant': {'id': 1118825, 'name': 'Rapha_MTH', 'seeds': [{'seedNum': 6}]}},
  {'placement': 3,
   'entrant': {'id': 1133810, 'name': 'Sabaca', 'seeds': [{'seedNum': 8}]}},
  {'placement': 4,
   'entrant': {'id': 1152521,
    'name': 'TCM | Raffa',
    'seeds': [{'seedNum': 21}]}},
  {'placement': 5,
   'entrant': {'id': 1152258,
    'name': 'FrankTank',
    'seeds': [{'seedNum': 19}]}},
  {'placement': 5,
   'entrant': {'id': 1114568, 'name': 'ocrim', 'seeds': [{'seedNum': 2}]}},
  {'placement': 7,
   'entrant': {'id': 1139232, 'name': 'Alumento', 'seeds': [{'seedNum': 11}]}},
  {'placement': 7,
   'entrant': {'id': 1133727, 'name': 'SC☆Momso', 'seeds': [{'seedNum': 7}]}},
  {'placement': 9,
   'entrant': {'id': 1114889,
    'name': 'VilleViljar',
    'seeds': [{'seedNum': 4}]}},
  {'placement': 9,
   'entrant': {'id': 1152242,
    'name': 'TC

In [306]:
data["data"]["event"]["standings"]["nodes"][0] #Information for one player

{'placement': 1,
 'entrant': {'id': 1152276,
  'name': 'FR | Maxou0708',
  'seeds': [{'seedNum': 20}]}}

In [307]:
# Placement
data["data"]["event"]["standings"]["nodes"][0]["placement"]

1

In [308]:
# Name
data["data"]["event"]["standings"]["nodes"][0]["entrant"]["name"]

'FR | Maxou0708'

In [309]:
# id
data["data"]["event"]["standings"]["nodes"][0]["entrant"]["id"]

1152276

In [310]:
# Seeding
data["data"]["event"]["standings"]["nodes"][0]["entrant"]["seeds"][0]["seedNum"]

20

In [311]:
# Putting it all together in a for loop. Store information in arrays

playerArray = []
placementArray = []
seedArray = []
idArray = []

for entrant in data["data"]["event"]["standings"]["nodes"]:
    playerArray.append(entrant["entrant"]["name"])
    idArray.append(entrant["entrant"]["id"])
    placementArray.append(entrant["placement"])
    seedArray.append(entrant["entrant"]["seeds"][0]["seedNum"])

In [312]:
# Make a pandas dataframe of the arrays

playerdf = pd.DataFrame({
    "Start ID": idArray,
    "Player": playerArray,
    "Seed": seedArray,
    "Placement": placementArray
})

In [313]:
playerdf.head()

Unnamed: 0,Start ID,Player,Seed,Placement
0,1152276,FR | Maxou0708,20,1
1,1118825,Rapha_MTH,6,2
2,1133810,Sabaca,8,3
3,1152521,TCM | Raffa,21,4
4,1152258,FrankTank,19,5


In [314]:
playerdf.to_csv("EUAC1Placements.csv", index=False)

All players, along with their Start ids, placements and seedings, have been gathered. <br>To note for later: players can sign up with "tags", which are separated from the name by a |. However, a | can also appear inside a tag.
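
A minimal sketch of how tags might be separated from names later on. It assumes that the last occurrence of `" | "` (with surrounding spaces) marks the boundary between tag and name, which handles a bare `|` inside the tag itself. That rule and the name `split_tag` are assumptions, and unusual names such as `RD | | Dushni` would still need manual checking.

```python
def split_tag(display_name, sep=" | "):
    """Split 'TAG | Name' into (tag, name); tag is None when there is no separator."""
    if sep in display_name:
        # rsplit with maxsplit=1 cuts at the LAST separator only
        tag, name = display_name.rsplit(sep, 1)
        return tag, name
    return None, display_name


for raw in ["FR | Maxou0708", "Rapha_MTH", "FR|TCM | InkAlyut"]:
    print(raw, "->", split_tag(raw))
```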

Start.gg structures tournaments hierarchically, so the event ID is needed to get the phase ID, the phase ID to get the phase group ID, and the phase group ID to get the sets that were played along with their reported scores.
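
The chain can be sketched with two small helpers that pull every id out of a response instead of only the first one. EUAC #1 has a single phase and a single phase group, but other events may have several. The helper names are hypothetical, and the sample responses are trimmed copies of the ones returned in this notebook.

```python
# Trimmed copies of the responses returned for EUAC #1
event_response = {
    "data": {"event": {"id": 53835,
                       "phases": [{"id": 159661, "name": "Bracket"}]}}
}
phase_response = {
    "data": {"phase": {"phaseGroups": {"nodes": [{"id": 431370}]}}}
}


def phase_ids(response):
    """Return the id of every phase listed under an event."""
    return [phase["id"] for phase in response["data"]["event"]["phases"]]


def phase_group_ids(response):
    """Return the id of every phase group listed under a phase."""
    nodes = response["data"]["phase"]["phaseGroups"]["nodes"]
    return [node["id"] for node in nodes]


print(phase_ids(event_response))
print(phase_group_ids(phase_response))
```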

In [315]:
# Query to get Phase ID

query = """
query {
  event(id: 53835) {
    id
    name
    phases {
      id
      name
    }
  }
}"""

In [316]:
data = run_query(query)

In [317]:
data

{'data': {'event': {'id': 53835,
   'name': 'EU ARMS Challenge #1',
   'phases': [{'id': 159661, 'name': 'Bracket'}]}},
 'extensions': {'cacheControl': {'version': 1,
   'hints': [{'path': ['event'], 'maxAge': 60, 'scope': 'PRIVATE'}]},
  'queryComplexity': 2},
 'actionRecords': []}

In [318]:
# Phase ID
data["data"]["event"]["phases"][0]["id"]

159661

In [319]:
# Query to get Phase Group ID
query = """
query {
  phase(id: 159661) {
    phaseGroups {
      nodes {
        id
      }
    }
  }
}"""

In [320]:
data = run_query(query)

In [321]:
data

{'data': {'phase': {'phaseGroups': {'nodes': [{'id': 431370}]}}},
 'extensions': {'cacheControl': {'version': 1, 'hints': None},
  'queryComplexity': 1},
 'actionRecords': []}

In [322]:
# Phase Group ID
data["data"]["phase"]["phaseGroups"]["nodes"][0]["id"]

431370

In [323]:
# Query to get all sets

query = """
query {
  phaseGroup(id: 431370) {
    sets(page: 1, perPage: 100) {
      nodes {
        id
        displayScore
        fullRoundText
        winnerId
        slots {
          entrant {
            name
          }
        }
      }
    }
  }
}"""

In [324]:
data = run_query(query)

In [325]:
# Display all sets
data

{'data': {'phaseGroup': {'sets': {'nodes': [{'id': 10598232,
      'displayScore': 'Rapha_MTH 1 - FR | Maxou0708 3',
      'fullRoundText': 'Grand Final',
      'winnerId': 1152276,
      'slots': [{'entrant': {'name': 'Rapha_MTH'}},
       {'entrant': {'name': 'FR | Maxou0708'}}]},
     {'id': 10598233,
      'displayScore': 'FR | Maxou0708 3 - Rapha_MTH 0',
      'fullRoundText': 'Grand Final Reset',
      'winnerId': 1152276,
      'slots': [{'entrant': {'name': 'FR | Maxou0708'}},
       {'entrant': {'name': 'Rapha_MTH'}}]},
     {'id': 10598231,
      'displayScore': 'Sabaca 0 - Rapha_MTH 2',
      'fullRoundText': 'Winners Final',
      'winnerId': 1118825,
      'slots': [{'entrant': {'name': 'Sabaca'}},
       {'entrant': {'name': 'Rapha_MTH'}}]},
     {'id': 10598295,
      'displayScore': 'DQ',
      'fullRoundText': 'Losers Final',
      'winnerId': 1152276,
      'slots': [{'entrant': {'name': 'Sabaca'}},
       {'entrant': {'name': 'FR | Maxou0708'}}]},
     {'id': 1059829

In [326]:
# Showing a set example
data["data"]["phaseGroup"]["sets"]["nodes"][0]

{'id': 10598232,
 'displayScore': 'Rapha_MTH 1 - FR | Maxou0708 3',
 'fullRoundText': 'Grand Final',
 'winnerId': 1152276,
 'slots': [{'entrant': {'name': 'Rapha_MTH'}},
  {'entrant': {'name': 'FR | Maxou0708'}}]}

In [327]:
# Retrieving a player's name
data["data"]["phaseGroup"]["sets"]["nodes"][0]["slots"][0]["entrant"]["name"]

'Rapha_MTH'

The sets are returned starting from the "end" of the tournament: the last match played comes first.
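
The sets query above requests a single page of up to 100 sets, which is enough for this tournament. For a larger bracket, something like the following sketch could collect every page. It assumes the query is extended to also request `pageInfo { totalPages }` next to `nodes`, and `fetch_page` is a hypothetical callable that runs the query for one page. A fake in-memory version is used here so the example runs offline.

```python
def collect_all_nodes(fetch_page, per_page=100):
    """Call fetch_page(page, per_page) until every page has been read."""
    nodes = []
    page = 1
    while True:
        connection = fetch_page(page, per_page)
        nodes.extend(connection["nodes"])
        if page >= connection["pageInfo"]["totalPages"]:
            break
        page += 1
    return nodes


# Offline stand-in for the API: five fake sets served in pages
FAKE_SETS = [{"id": i} for i in range(5)]


def fake_fetch(page, per_page):
    start = (page - 1) * per_page
    total_pages = -(-len(FAKE_SETS) // per_page)  # ceiling division
    return {
        "nodes": FAKE_SETS[start:start + per_page],
        "pageInfo": {"totalPages": total_pages},
    }


all_sets = collect_all_nodes(fake_fetch, per_page=2)
print(len(all_sets))
```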

In [328]:
data["data"]["phaseGroup"]["sets"]["nodes"][-3]

{'id': 10598210,
 'displayScore': 'DQ',
 'fullRoundText': 'Winners Round 1',
 'winnerId': 1152122,
 'slots': [{'entrant': {'name': 'Kotorious BRD'}},
  {'entrant': {'name': 'Altair'}}]}

In [329]:
# Showing a set's score
data["data"]["phaseGroup"]["sets"]["nodes"][7]["displayScore"]

'ocrim 0 - FR | Maxou0708 2'

Player names and scores can be read from "displayScore", but DQs require a bit more work. <br>
The player names will therefore be taken from "slots", and the winner's name looked up in the dataframe by winner id. The loser is the other player. <br>
<br>
To get the score, the string is split on " - "; the last character of each part is that player's score. This assumes single-digit scores.
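
As an alternative to taking the last character, the sketch below parses the whole string with a regular expression. This is a swapped-in technique, not the approach used in this notebook. It also copes with scores of more than one digit and returns `None` for anything that does not look like a score, such as `DQ`. `parse_display_score` is a hypothetical name.

```python
import re

# "<player 1> <score> - <player 2> <score>"
SCORE_PATTERN = re.compile(r"^(?P<p1>.+) (?P<s1>\d+) - (?P<p2>.+) (?P<s2>\d+)$")


def parse_display_score(display_score):
    """Return (player1, score1, player2, score2), or None if unparseable."""
    match = SCORE_PATTERN.match(display_score or "")
    if match is None:
        return None
    return (match["p1"], int(match["s1"]), match["p2"], int(match["s2"]))


print(parse_display_score("ocrim 0 - FR | Maxou0708 2"))
print(parse_display_score("DQ"))
```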

In [330]:
scoreSplit = data["data"]["phaseGroup"]["sets"]["nodes"][7]["displayScore"].split(" - ")
scoreSplit

['ocrim 0', 'FR | Maxou0708 2']

In [331]:
# Function to retrieve player name from dataframe with their start id no.

def retrieve_player(df, pid):
    player = df[df["Start ID"] == pid]
    if not player.empty:
        return player.iloc[0]["Player"]
    else:
        print(f"No player matching ID {pid} was found")

In [332]:
retrieve_player(playerdf, 1152276)

'FR | Maxou0708'

In [2]:
import re

In [334]:
player1Array = []
player2Array = []
winnerArray = []
loserArray = []
matchArray = []
scoreArray = []
matchNo = 1

for sets in reversed(data["data"]["phaseGroup"]["sets"]["nodes"]):
    #Setting Player 1 and Player 2
    player1 = sets["slots"][0]["entrant"]["name"]
    player2 = sets["slots"][1]["entrant"]["name"]
    player1Array.append(player1)
    player2Array.append(player2)
    
    # Determining Winner and Loser
    winner = retrieve_player(playerdf, sets["winnerId"])
    winnerArray.append(winner)
    # If winner is p1, then loser is p2. Otherwise, p1 must be the loser
    if winner == player1:
        loser = player2
    else:
        loser = player1
    loserArray.append(loser)
    
    # Match no
    matchArray.append(matchNo)
    matchNo += 1
    
    # Score
    scoreSplit = sets["displayScore"].split(" - ")
    # Catch DQs and register them as "0--1"
    if scoreSplit[0][-1] in ("D", "Q"):
        score = "0--1"
    else:
        # The last character of each part is that player's score
        p1Score = int(scoreSplit[0][-1])
        p2Score = int(scoreSplit[1][-1])
        # Write score from the winner's perspective. Higher score always first
        # Ex. 2-0. Never 0-2
        score = f"{max(p1Score, p2Score)}-{min(p1Score, p2Score)}"
    # Always append exactly one score per set so all arrays stay the same length
    scoreArray.append(score)

In [335]:
df = pd.DataFrame({
    "Player1": player1Array,
    "Player2": player2Array,
    "Winner": winnerArray,
    "Score": scoreArray,
    "Loser": loserArray,
    "MatchNo": matchArray,
    "EUAC": 1,
    "Date": date
})

In [336]:
df.head()

Unnamed: 0,Player1,Player2,Winner,Score,Loser,MatchNo,EUAC,Date
0,Alumento,Owdy,Alumento,2-0,Owdy,1,1,21/10/17
1,BambooBoss,FrankTank,FrankTank,2-0,BambooBoss,2,1,21/10/17
2,Kotorious BRD,Altair,Kotorious BRD,0--1,Altair,3,1,21/10/17
3,RD | | Dushni,TCM | Raffa,TCM | Raffa,2-0,RD | | Dushni,4,1,21/10/17
4,FR|TCM | InkAlyut,FR | Maxou0708,FR|TCM | InkAlyut,2-1,FR | Maxou0708,5,1,21/10/17


In [337]:
df.to_csv("EUAC1Sets.csv", index=False)

### 3.2.2 - Challonge.com

#### 3.2.2.1 - Acquiring URLs

Using the direct url worked for the single Start.gg link, but there are too many Challonge links to handle that way. <br>
Instead, we will acquire the links by parsing the wiki page on the series, which lists them all.

In [338]:
from bs4 import BeautifulSoup

url = "https://armswiki.org/wiki/EU_ARMS_Challenge"

response = requests.get(url)
soup = BeautifulSoup(response.content, "html.parser")

In [339]:
print(soup)

<!DOCTYPE html>

<html class="client-nojs" dir="ltr" lang="en">
<head>
<meta charset="utf-8"/>
<title>EU ARMS Challenge - ARMS Institute, the ARMS Wiki</title>
<script>document.documentElement.className="client-js";RLCONF={"wgBreakFrames":false,"wgSeparatorTransformTable":["",""],"wgDigitTransformTable":["",""],"wgDefaultDateFormat":"dmy","wgMonthNames":["","January","February","March","April","May","June","July","August","September","October","November","December"],"wgRequestId":"abe6ee09a400cfec507e0a8e","wgCanonicalNamespace":"","wgCanonicalSpecialPageName":false,"wgNamespaceNumber":0,"wgPageName":"EU_ARMS_Challenge","wgTitle":"EU ARMS Challenge","wgCurRevisionId":23064,"wgRevisionId":23064,"wgArticleId":4223,"wgIsArticle":true,"wgIsRedirect":false,"wgAction":"view","wgUserName":null,"wgUserGroups":["*"],"wgCategories":["Tournaments"],"wgPageViewLanguage":"en","wgPageContentLanguage":"en","wgPageContentModel":"wikitext","wgRelevantPageName":"EU_ARMS_Challenge","wgRelevantArticleId":

In [340]:
# Reduce data down to the div containing all the links needed
tournamentData = soup.find_all('div', attrs={'class':"mw-body-content mw-content-ltr"})

In [341]:
tournamentArray = []
for links in tournamentData:
    tournamentArray.append(links.find_all("a", class_="external text"))

In [342]:
tournamentArray

[[<a class="external text" href="https://smash.gg/tournament/eu-arms-challenge-1/details" rel="nofollow">EU ARMS Challenge #1</a>,
  <a class="external text" href="https://challonge.com/EUCHALLLENGE2" rel="nofollow">EU ARMS Challenge #2</a>,
  <a class="external text" href="https://challonge.com/EUCHALLLENGE3" rel="nofollow">EU ARMS Challenge #3</a>,
  <a class="external text" href="https://challonge.com/EUCHALLENGE4_Europe" rel="nofollow">EU ARMS Challenge #4 (EUROPE BRACKET)</a>,
  <a class="external text" href="https://challonge.com/EUChallenge5" rel="nofollow">EU ARMS Challenge #5</a>,
  <a class="external text" href="https://challonge.com/EUChallenge6" rel="nofollow">EU ARMS Challenge #6</a>,
  <a class="external text" href="https://challonge.com/EUChallenge7" rel="nofollow">EU ARMS Challenge #7</a>,
  <a class="external text" href="https://challonge.com/EUChallenge8" rel="nofollow">EU ARMS Challenge #8</a>,
  <a class="external text" href="https://challonge.com/EUChallenge9" rel=

The result is a list nested inside another list, so it needs flattening. This could have been done another way.
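
One such other way is sketched here: flatten the nested lists and keep only the Challonge links in a single comprehension. Plain dicts stand in for the BeautifulSoup tags so the example runs without the wiki page, which works because tags support the same `["href"]` lookup. The three sample links are taken from the output above.

```python
from itertools import chain

# Stand-in for tournamentArray: a list of lists of link "tags"
nested_links = [
    [
        {"href": "https://smash.gg/tournament/eu-arms-challenge-1/details"},
        {"href": "https://challonge.com/EUCHALLLENGE2"},
    ],
    [
        {"href": "https://challonge.com/EUChallenge5"},
    ],
]

# Flatten one level, extract the href attribute, and drop non-Challonge links
challonge_links = [
    link["href"]
    for link in chain.from_iterable(nested_links)
    if "challonge.com" in link["href"]
]
print(challonge_links)
```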

In [343]:
linkArray = []
for links in tournamentArray:
    for link in links:
        linkArray.append(link["href"]) # Extract only the href attribute (the link)

In [344]:
linkArray

['https://smash.gg/tournament/eu-arms-challenge-1/details',
 'https://challonge.com/EUCHALLLENGE2',
 'https://challonge.com/EUCHALLLENGE3',
 'https://challonge.com/EUCHALLENGE4_Europe',
 'https://challonge.com/EUChallenge5',
 'https://challonge.com/EUChallenge6',
 'https://challonge.com/EUChallenge7',
 'https://challonge.com/EUChallenge8',
 'https://challonge.com/EUChallenge9',
 'https://challonge.com/EUChallenge10',
 'https://challonge.com/EUChallenge11',
 'https://challonge.com/EUChallenge12',
 'https://challonge.com/EUChallenge13',
 'https://challonge.com/EUChallenge14',
 'https://challonge.com/EUChallenge15',
 'https://challonge.com/EUChallenge16',
 'https://challonge.com/EUChallenge17',
 'https://challonge.com/EUChallenge18',
 'https://challonge.com/EUChallenge19',
 'https://challonge.com/EUChallenge20',
 'https://challonge.com/EUChallenge21',
 'https://challonge.com/EUChallenge22',
 'https://challonge.com/EUChallenge23',
 'https://challonge.com/EUChallenge24',
 'https://challonge

In [345]:
# Remove non-challonge links

challongeArray = []

for links in linkArray:
    if "challonge" in links:
        challongeArray.append(links)

In [346]:
challongeArray

['https://challonge.com/EUCHALLLENGE2',
 'https://challonge.com/EUCHALLLENGE3',
 'https://challonge.com/EUCHALLENGE4_Europe',
 'https://challonge.com/EUChallenge5',
 'https://challonge.com/EUChallenge6',
 'https://challonge.com/EUChallenge7',
 'https://challonge.com/EUChallenge8',
 'https://challonge.com/EUChallenge9',
 'https://challonge.com/EUChallenge10',
 'https://challonge.com/EUChallenge11',
 'https://challonge.com/EUChallenge12',
 'https://challonge.com/EUChallenge13',
 'https://challonge.com/EUChallenge14',
 'https://challonge.com/EUChallenge15',
 'https://challonge.com/EUChallenge16',
 'https://challonge.com/EUChallenge17',
 'https://challonge.com/EUChallenge18',
 'https://challonge.com/EUChallenge19',
 'https://challonge.com/EUChallenge20',
 'https://challonge.com/EUChallenge21',
 'https://challonge.com/EUChallenge22',
 'https://challonge.com/EUChallenge23',
 'https://challonge.com/EUChallenge24',
 'https://challonge.com/EUChallenge25',
 'https://challonge.com/EUChallenge26',

#### 3.2.2.2 - Challonge API Exploration

This section loads the API keys and explores the API to work out how to collect the data.

In [347]:
# Load API Keys from .env file
API_KEY = os.getenv("API_KEY")
API_SECRET = os.getenv("API_SECRET")

In [348]:
import challonge

# Tell pychallonge about your Challonge API credentials (http://api.challonge.com/v1).
challonge.set_credentials(API_KEY, API_SECRET)

In [349]:
# See available methods
methods_and_attributes = dir(challonge)
print(methods_and_attributes)

['ChallongeException', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__', 'api', 'attachments', 'fetch', 'get_credentials', 'get_timezone', 'matches', 'participants', 'set_credentials', 'set_timezone', 'set_user_agent', 'tournaments']


In [350]:
# Retrieve a tournament by its id (or its url).
tournament = challonge.tournaments.show(challongeArray[0].split("/")[-1]) # Requires only URL slug
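
Splitting on "/" works for the links collected here. A slightly more defensive sketch using the standard library is shown below; it also tolerates a trailing slash or a query string. `challonge_slug` is a hypothetical helper name, and the second and third URLs are made-up variations for illustration.

```python
from urllib.parse import urlparse


def challonge_slug(url):
    """Return the last path segment of a tournament URL."""
    path = urlparse(url).path  # drops the scheme, host, and any query string
    return path.strip("/").split("/")[-1]


print(challonge_slug("https://challonge.com/EUCHALLLENGE2"))
print(challonge_slug("https://challonge.com/EUChallenge5/"))        # made-up variation
print(challonge_slug("https://challonge.com/EUChallenge6?tab=log"))  # made-up variation
```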

In [351]:
tournament

{'id': 3973078,
 'name': 'EU ARMS Challenge #2',
 'url': 'EUCHALLLENGE2',
 'description': '<p><span style="background-color: rgb(39, 42, 51);">Tournament hosted by EUARMSCompetitve\xa0Discord!</span></p><p><span style="background-color: rgb(39, 42, 51);">The tournament is EU exclusive, meaning anyone outside of EU can\'t participate.</span></p><p>If we get 32+ participants the winner of the tournament will get a\xa0Nintendo eShop Card 15 €!</p><p><span style="background-color: rgb(39, 42, 51);"><br>You must be part of the EU ARMS discord in order to participate and coordinate your matchups\xa0</span><span style="background-color: rgb(39, 42, 51);">\xa0</span>https://discord.gg/W486K28</p><p>Rules:\xa0\xa0https://goo.gl/PkXDZo<br><br></p>',
 'tournament_type': 'double elimination',
 'started_at': datetime.datetime(2017, 11, 12, 14, 7, 13, 162000, tzinfo=<DstTzInfo 'Europe/London' GMT0:00:00 STD>),
 'completed_at': datetime.datetime(2017, 11, 12, 17, 22, 0, 232000, tzinfo=<DstTzInfo 'Eur

In [352]:
matches = challonge.matches.index(tournament["id"])

In [353]:
matches[0]

{'id': 103347707,
 'tournament_id': 3973078,
 'state': 'complete',
 'player1_id': 64282703,
 'player2_id': 64283849,
 'player1_prereq_match_id': None,
 'player2_prereq_match_id': None,
 'player1_is_prereq_match_loser': False,
 'player2_is_prereq_match_loser': False,
 'winner_id': 64283849,
 'loser_id': 64282703,
 'started_at': datetime.datetime(2017, 11, 12, 14, 7, 13, 287000, tzinfo=<DstTzInfo 'Europe/London' GMT0:00:00 STD>),
 'created_at': datetime.datetime(2017, 11, 12, 14, 7, 12, 863000, tzinfo=<DstTzInfo 'Europe/London' GMT0:00:00 STD>),
 'updated_at': datetime.datetime(2017, 11, 12, 14, 21, 3, 299000, tzinfo=<DstTzInfo 'Europe/London' GMT0:00:00 STD>),
 'identifier': 'A',
 'has_attachment': False,
 'round': 1,
 'player1_votes': None,
 'player2_votes': None,
 'group_id': None,
 'attachment_count': None,
 'scheduled_time': None,
 'location': None,
 'underway_at': None,
 'optional': False,
 'completed_at': datetime.datetime(2017, 11, 12, 14, 21, 3, 433000, tzinfo=<DstTzInfo 'Europe

Each match stores the two players' ids, the winner id, the loser id, and the score.

In [354]:
# Retrieve the participants for a given tournament.
participants = challonge.participants.index(tournament["id"])

In [355]:
participants[0]

{'id': 63926376,
 'tournament_id': 3973078,
 'name': '',
 'seed': 1,
 'active': True,
 'created_at': datetime.datetime(2017, 11, 5, 12, 9, 29, 690000, tzinfo=<DstTzInfo 'Europe/London' GMT0:00:00 STD>),
 'updated_at': datetime.datetime(2017, 11, 5, 12, 9, 29, 690000, tzinfo=<DstTzInfo 'Europe/London' GMT0:00:00 STD>),
 'invite_email': None,
 'final_rank': 5,
 'misc': None,
 'icon': None,
 'on_waiting_list': False,
 'invitation_id': None,
 'group_id': None,
 'checked_in_at': datetime.datetime(2017, 11, 12, 13, 39, 2, 915000, tzinfo=<DstTzInfo 'Europe/London' GMT0:00:00 STD>),
 'ranked_member_id': None,
 'custom_field_response': None,
 'clinch': None,
 'integration_uids': None,
 'challonge_username': 'InkA_',
 'challonge_user_id': 1902903,
 'challonge_email_address_verified': True,
 'removable': False,
 'participatable_or_invitation_attached': True,
 'confirm_remove': True,
 'invitation_pending': False,
 'display_name_with_invitation_email_address': 'InkA_',
 'email_hash': 'f1dcf32d96b85

Participants stores a player's name, username, seed, challonge id, and tournament player id

In [356]:
print(len(participants))

17


In [357]:
participants[-1]

{'id': 64282712,
 'tournament_id': 3973078,
 'name': '',
 'seed': 17,
 'active': False,
 'created_at': datetime.datetime(2017, 11, 12, 11, 30, 59, 551000, tzinfo=<DstTzInfo 'Europe/London' GMT0:00:00 STD>),
 'updated_at': datetime.datetime(2017, 11, 12, 11, 30, 59, 551000, tzinfo=<DstTzInfo 'Europe/London' GMT0:00:00 STD>),
 'invite_email': None,
 'final_rank': None,
 'misc': None,
 'icon': None,
 'on_waiting_list': False,
 'invitation_id': None,
 'group_id': None,
 'checked_in_at': None,
 'ranked_member_id': None,
 'custom_field_response': None,
 'clinch': None,
 'integration_uids': None,
 'challonge_username': 'ThatD',
 'challonge_user_id': 2532908,
 'challonge_email_address_verified': False,
 'removable': False,
 'participatable_or_invitation_attached': True,
 'confirm_remove': True,
 'invitation_pending': False,
 'display_name_with_invitation_email_address': 'ThatD',
 'email_hash': '54e045ca1ed55fe5d90d1a4c6980a9db',
 'username': 'ThatD',
 'display_name': 'ThatD',
 'attached_partic

The list also includes participants who signed up but did not check in, meaning they did not take part in the tournament. <br>
These are denoted by "checked_in = False" and "active = False".

Players in a "match" only have their player id instead of their name or challonge id.

In [358]:
matches[17]["scores_csv"]

'86-101'

Users can input scores themselves, so some values are misleading. This will be dealt with later during data cleaning.
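
As a rough idea of what that cleaning step might involve, the sketch below parses a `scores_csv` string into integer pairs and flags values that look implausible. It assumes the field holds comma-separated `player1-player2` scores. The threshold of 5 is a made-up cut-off, and both function names are hypothetical.

```python
import re

# One "a-b" score; either side may be negative (e.g. the "0--1" DQ convention)
SET_SCORE = re.compile(r"^(-?\d+)-(-?\d+)$")


def parse_scores_csv(scores_csv):
    """Return a list of (player1_score, player2_score) tuples."""
    results = []
    for part in (scores_csv or "").split(","):
        match = SET_SCORE.match(part.strip())
        if match:
            results.append((int(match.group(1)), int(match.group(2))))
    return results


def looks_suspicious(scores, max_expected=5):
    """Flag negative scores or scores above an assumed plausible maximum."""
    return any(
        a < 0 or b < 0 or a > max_expected or b > max_expected
        for a, b in scores
    )


for raw in ["3-0", "86-101"]:
    parsed = parse_scores_csv(raw)
    print(raw, parsed, looks_suspicious(parsed))
```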

#### 3.2.2.3 - Challonge Data Collection

In this section, we add the players from the second EUAC to the table from the first EUAC (playerdf). <br>
For the time being, Start ID and Challonge ID are kept separate rather than merged into one ID column. Players from the first EUAC may have entered later EUACs, but it is not yet possible to identify them. <br>
Later on, we will identify the players who took part in both the first EUAC and subsequent ones.

In [359]:
playerdf.head()

Unnamed: 0,Start ID,Player,Seed,Placement
0,1152276,FR | Maxou0708,20,1
1,1118825,Rapha_MTH,6,2
2,1133810,Sabaca,8,3
3,1152521,TCM | Raffa,21,4
4,1152258,FrankTank,19,5


In [360]:
def get_seed(name, df):
    player = df[df["Player"] == name]
    if not player.empty:
        return player.iloc[0]["Seed"]
    else:
        print(f"No player named {name} was found")

In [361]:
def get_placement(name, df):
    player = df[df["Player"] == name]
    if not player.empty:
        return player.iloc[0]["Placement"]
    else:
        print(f"No player named {name} was found")

In [362]:
get_seed("Rapha_MTH", playerdf)

6

In [363]:
p1Seed = []
p2Seed = []
p1Placement = []
p2Placement = []

for p1, p2 in zip(df["Player1"], df["Player2"]):
    p1Seed.append(get_seed(p1, playerdf))
    p1Placement.append(get_placement(p1, playerdf))
    p2Seed.append(get_seed(p2, playerdf))
    p2Placement.append(get_placement(p2, playerdf))
df["P1 Seed"] = p1Seed
df["P1 Placement"] = p1Placement
df["P2 Seed"] = p2Seed
df["P2 Placement"] = p2Placement

In [364]:
df.head()

Unnamed: 0,Player1,Player2,Winner,Score,Loser,MatchNo,EUAC,Date,P1 Seed,P1 Placement,P2 Seed,P2 Placement
0,Alumento,Owdy,Alumento,2-0,Owdy,1,1,21/10/17,11,7,22,17
1,BambooBoss,FrankTank,FrankTank,2-0,BambooBoss,2,1,21/10/17,14,13,19,5
2,Kotorious BRD,Altair,Kotorious BRD,0--1,Altair,3,1,21/10/17,15,13,18,17
3,RD | | Dushni,TCM | Raffa,TCM | Raffa,2-0,RD | | Dushni,4,1,21/10/17,12,17,21,4
4,FR|TCM | InkAlyut,FR | Maxou0708,FR|TCM | InkAlyut,2-1,FR | Maxou0708,5,1,21/10/17,13,9,20,1


In [365]:
playerdf.head()

Unnamed: 0,Start ID,Player,Seed,Placement
0,1152276,FR | Maxou0708,20,1
1,1118825,Rapha_MTH,6,2
2,1133810,Sabaca,8,3
3,1152521,TCM | Raffa,21,4
4,1152258,FrankTank,19,5


The Seed and Placement columns from the first EUAC are no longer needed in playerdf.

In [366]:
# Remove Seed and Placement columns
playerdf = playerdf.drop(["Seed", "Placement"], axis = "columns")

In [367]:
# Add players from second EUAC to the player dataframe
nameArray = []
idArray = []
for players in participants:
    if players["active"]:
        idArray.append(players["challonge_user_id"])
        nameArray.append(players["challonge_username"])
    
challongedf = pd.DataFrame({
    "Player": nameArray,
    "Challonge ID": idArray
})

# ignore_index=True gives the combined table a fresh 0..n-1 index instead of duplicated labels
playerdf = pd.concat([playerdf, challongedf], ignore_index=True, sort=False)

In [368]:
playerdf.head()

Unnamed: 0,Start ID,Player,Challonge ID
0,1152276.0,FR | Maxou0708,
1,1118825.0,Rapha_MTH,
2,1133810.0,Sabaca,
3,1152521.0,TCM | Raffa,
4,1152258.0,FrankTank,


Concat is nice but a function to add new players to the table would be better for future use

In [369]:
len(df)

43

In [370]:
def add_new_player(df, name, challongeId):
    if name not in df["Player"].values:
        df.loc[len(df)] = [0, name, challongeId]

The "participants" and "matches" responses from the API use different ids and do not both expose names. <br>
In each tournament a player has a tournament-specific participant id as well as a unique Challonge user id, and a player's name is not available within a match. <br>
To get around this limitation, functions will be written to look this data up. <br>
The same will also have to be done for seeds and placements. <br> <br>
Then a loop can be written to compile this data and merge it with the existing dataframe containing the matches (df).
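
An alternative to one scanning function per field is to index the participants by their tournament id once and look fields up from that dictionary. The sketch below shows the idea with made-up participants (the ids and names are not real data); `index_participants` and `participant_field` are hypothetical names.

```python
# Made-up participants using the same keys as the API response
participants_sample = [
    {"id": 1, "challonge_user_id": 101, "challonge_username": "PlayerA",
     "seed": 2, "final_rank": 1, "active": True},
    {"id": 2, "challonge_user_id": 102, "challonge_username": "PlayerB",
     "seed": 1, "final_rank": 2, "active": True},
]


def index_participants(participants):
    """Map each tournament-specific participant id to its full record."""
    return {p["id"]: p for p in participants}


def participant_field(index, participant_id, field):
    """Return one field for a participant, or None if the id is unknown."""
    participant = index.get(participant_id)
    return participant[field] if participant else None


index = index_participants(participants_sample)
print(participant_field(index, 1, "challonge_username"))
print(participant_field(index, 2, "seed"))
print(participant_field(index, 99, "final_rank"))
```

Building the index once per tournament avoids scanning the whole participant list for every field of every match.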

In [371]:
# Get a challonge id given their player id.
# Player id is unique for each tournament.
# Necessary for missing information from some api pulls

def get_challonge_id(uid):
    for i in participants:
        if i["id"] == uid:
            return i["challonge_user_id"]
        else:
            pass

In [372]:
# Returns challonge name given a player's tournament id
def get_challonge_name(uid):
    for i in participants:
        if i["id"] == uid:
            return i["challonge_username"]
        else:
            pass

In [373]:
# Returns a player's seed given their tournament id
def get_challonge_seed(uid):
    for i in participants:
        if i["id"] == uid:
            return i["seed"]
        else:
            pass

In [374]:
# Returns a player's final placement given their tournament id
def get_challonge_placement(uid):
    for i in participants:
        if i["id"] == uid:
            return i["final_rank"]
        else:
            pass

In [375]:
# Get tournament date
date = tournament["started_at"]
date

datetime.datetime(2017, 11, 12, 14, 7, 13, 162000, tzinfo=<DstTzInfo 'Europe/London' GMT0:00:00 STD>)

In [376]:
# Format date
date = date.strftime("%d/%m/%y")
date

'12/11/17'

In [377]:
df.head()

Unnamed: 0,Player1,Player2,Winner,Score,Loser,MatchNo,EUAC,Date,P1 Seed,P1 Placement,P2 Seed,P2 Placement
0,Alumento,Owdy,Alumento,2-0,Owdy,1,1,21/10/17,11,7,22,17
1,BambooBoss,FrankTank,FrankTank,2-0,BambooBoss,2,1,21/10/17,14,13,19,5
2,Kotorious BRD,Altair,Kotorious BRD,0--1,Altair,3,1,21/10/17,15,13,18,17
3,RD | | Dushni,TCM | Raffa,TCM | Raffa,2-0,RD | | Dushni,4,1,21/10/17,12,17,21,4
4,FR|TCM | InkAlyut,FR | Maxou0708,FR|TCM | InkAlyut,2-1,FR | Maxou0708,5,1,21/10/17,13,9,20,1


In [378]:
tournamentNo = 2
matchNo = 1

for i in matches:
    
    # Get player details
    player1Id = i["player1_id"]
    player2Id = i["player2_id"]
    player1CId = get_challonge_id(player1Id)
    player2CId = get_challonge_id(player2Id)
    player1Name = get_challonge_name(player1Id)
    player2Name = get_challonge_name(player2Id)
    
    # Seed/Placements
    player1Seed = get_challonge_seed(player1Id)
    player1Placement = get_challonge_placement(player1Id)
    player2Seed = get_challonge_seed(player2Id)
    player2Placement = get_challonge_placement(player2Id)
    
    
    # Determine winner and loser
    if i["player1_id"] == i["winner_id"]:
        winner = player1Name
        loser = player2Name
    else:
        loser = player1Name
        winner = player2Name
    
    # Score
    score = i["scores_csv"]
    
    # Add to dataframe
    df.loc[len(df)] = [player1Name, player2Name, winner, score, loser, matchNo,
                       tournamentNo, date, player1Seed, player1Placement, player2Seed, player2Placement]
    matchNo += 1

In [379]:
df.tail()

Unnamed: 0,Player1,Player2,Winner,Score,Loser,MatchNo,EUAC,Date,P1 Seed,P1 Placement,P2 Seed,P2 Placement
60,InkA_,_Rem_,_Rem_,86-101,InkA_,18,2,12/11/17,1,5,5,3
61,GameFroggit,Ripha,Ripha,0-2,GameFroggit,19,2,12/11/17,6,5,4,4
62,_Rem_,Ripha,_Rem_,3-0,Ripha,20,2,12/11/17,5,3,4,4
63,Frank001,_Rem_,Frank001,3-0,_Rem_,21,2,12/11/17,2,2,5,3
64,Raffa_,Frank001,Raffa_,3-0,Frank001,22,2,12/11/17,12,1,2,2


The next step is to put this all together in one big loop and finish compiling all the match data.

In [380]:
# Shows a progress bar
import time
from tqdm import tqdm

In [381]:
skippedUrls = []

# Every Challonge url slug except the first one. It's already done
for urls in tqdm(challongeArray[1:], desc="Processing", unit="step"):
    slug = urls.split("/")
    url = slug[-1]
    try:
        # Retrieve a tournament by its id (or its url).
        tournament = challonge.tournaments.show(url)
    except Exception:
        print(f"Error with {url}")
        skippedUrls.append(urls)
        continue
    
    # Retrieve matches
    matches = challonge.matches.index(tournament["id"])
    
    # Retrieve participants
    participants = challonge.participants.index(tournament["id"])
    
    # Add new players to the playerdf table if they checked in
    for i in participants:
        if i["active"]:
            if i["challonge_username"] is None:
                add_new_player(playerdf, i["display_name"], i["id"])
            else:
                add_new_player(playerdf, i["challonge_username"], i["challonge_user_id"])
        
    # Building Dataframe info
    # Exception to the rule
    if url[-3:] == "PVW":
        tournamentNo = "PVW"
    else:
        tournamentNo = re.findall(r'\d+', url)
        tournamentNo = tournamentNo[0]
        tournamentNo = int(tournamentNo)
        
    date = tournament["start_at"].date()
    date = date.strftime("%d/%m/%y")
    
    # Acquiring player information
    matchNo = 0
    for i in matches:
        if i["suggested_play_order"] is None:
            matchNo += 1
        else:
            matchNo = i["suggested_play_order"]
        
        player1id = i["player1_id"]
        player2id = i["player2_id"]
        player1cid = get_challonge_id(player1id)
        player2cid = get_challonge_id(player2id)
        player1Name = get_challonge_name(player1id)
        player2Name = get_challonge_name(player2id)
        player1Seed = get_challonge_seed(player1id)
        player2Seed = get_challonge_seed(player2id)
        player1Placement = get_challonge_placement(player1id)
        player2Placement = get_challonge_placement(player2id)
        if i["player1_id"] == i["winner_id"]:
            winner = player1Name
            loser = player2Name
        else:
            loser = player1Name
            winner = player2Name
        
        # Getting score
        score = i["scores_csv"]
        
        # Add to dataframe
        df.loc[len(df)] = [player1Name, player2Name, winner, score, loser, matchNo, tournamentNo, date,
                      player1Seed, player1Placement, player2Seed, player2Placement]
    time.sleep(1)  # Pause between tournaments to avoid making too many API calls

Processing:  30%|████████████████████▎                                              | 33/109 [01:09<01:56,  1.53s/step]

Error with EUAC35


Processing:  32%|█████████████████████▌                                             | 35/109 [01:10<01:03,  1.17step/s]

Error with EUAC36
Error with EUAC37


Processing:  33%|██████████████████████▏                                            | 36/109 [01:10<00:47,  1.52step/s]

Error with EUAC38


Processing:  36%|███████████████████████▉                                           | 39/109 [01:12<00:39,  1.76step/s]

Error with EUAC40
Error with EUAC41


Processing:  37%|████████████████████████▌                                          | 40/109 [01:12<00:31,  2.16step/s]

Error with EUAC42


Processing:  39%|█████████████████████████▊                                         | 42/109 [01:14<00:44,  1.51step/s]

Error with EUAC44


Processing:  39%|██████████████████████████▍                                        | 43/109 [01:15<00:40,  1.64step/s]

Error with EUAC45


Processing: 100%|██████████████████████████████████████████████████████████████████| 109/109 [03:19<00:00,  1.83s/step]
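
The loop above derives the tournament number from the URL slug, with a special case for slugs ending in `PVW`. A minimal sketch of that step as a reusable helper is shown below. `parse_tournament_no` is a hypothetical name, the slug `EUACPVW` is made up for illustration, and returning `None` for a slug without digits is an assumption (the loop as written would raise an `IndexError` in that case).

```python
import re


def parse_tournament_no(slug):
    """Return the tournament number from a Challonge slug.

    Slugs ending in "PVW" are the one exception and map to the string "PVW".
    Returns None when the slug contains no digits.
    """
    if slug.endswith("PVW"):
        return "PVW"
    m = re.search(r"\d+", slug)
    return int(m.group()) if m else None


print(parse_tournament_no("EUAC35"))
print(parse_tournament_no("EUACPVW"))  # hypothetical slug
print(parse_tournament_no("finals"))   # hypothetical slug without digits
```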


In [384]:
df.tail(50)

Unnamed: 0,Player1,Player2,Winner,Score,Loser,MatchNo,EUAC,Date,P1 Seed,P1 Placement,P2 Seed,P2 Placement
1971,Penzo,Mortal_Instrument,Mortal_Instrument,0-2,Penzo,15,PVW,05/02/23,6,4,3,3
1972,Grimwood96,Mortal_Instrument,Grimwood96,3-1,Mortal_Instrument,17,PVW,05/02/23,2,2,3,3
1973,Ripha,Grimwood96,Ripha,3-2,Grimwood96,18,PVW,05/02/23,1,1,2,2
1974,YoshiBowser,Miari_le_Rat,Miari_le_Rat,0-2,YoshiBowser,1,108,16/04/23,8,9,9,7
1975,Nylis,Alfon42,Nylis,2-0,Alfon42,2,108,16/04/23,7,7,10,9
1976,Ripha,Miari_le_Rat,Ripha,2-0,Miari_le_Rat,5,108,16/04/23,1,1,9,7
1977,Iceman92,Giusesbica004,Iceman92,2-0,Giusesbica004,3,108,16/04/23,4,4,5,5
1978,Alistair__,Nylis,Alistair__,2-0,Nylis,6,108,16/04/23,2,2,7,7
1979,Mortal_Instrument,Yamber,Mortal_Instrument,2-0,Yamber,4,108,16/04/23,3,3,6,5
1980,Ripha,Iceman92,Ripha,2-1,Iceman92,11,108,16/04/23,1,1,4,4


In [385]:
playerdf.tail()

Unnamed: 0,Start ID,Player,Challonge ID
132,0.0,Anasuis,6258388.0
133,0.0,sillyLao,3724750.0
134,0.0,Toadsie,5994772.0
135,0.0,DoJoSeph,6901460.0
136,0.0,ProfPie,6467815.0


In [None]:
#playerdf.to_csv("PlayerDetails.csv", index= False)
#df.to_csv("EUACSets.csv", index=False)

#### 3.2.3.4 Missing links via Selenium

In [3]:
playerdf = pd.read_csv("PlayerDetails.csv")
df = pd.read_csv("EUACSets.csv")

In [108]:
skippedUrls = [
    'https://challonge.com/EUAC35',
    'https://challonge.com/EUAC36',
    'https://challonge.com/EUAC37',
    'https://challonge.com/EUAC38',
    'https://challonge.com/EUAC40',
    'https://challonge.com/EUAC41',
    'https://challonge.com/EUAC42',
    'https://challonge.com/EUAC44',
    'https://challonge.com/EUAC45'
]

Manually checking the missing tournament links in a browser shows that they are up and available.

In [4]:
from bs4 import BeautifulSoup

In [109]:
url = skippedUrls[0]  # Try the first one

response = requests.get(url)
soup = BeautifulSoup(response.content, "html.parser")

In [110]:
print(soup)

<!DOCTYPE html>
<html lang="en-US"><head><title>Just a moment...</title><meta content="text/html; charset=utf-8" http-equiv="Content-Type"/><meta content="IE=Edge" http-equiv="X-UA-Compatible"/><meta content="noindex,nofollow" name="robots"/><meta content="width=device-width,initial-scale=1" name="viewport"/><style>*{box-sizing:border-box;margin:0;padding:0}html{line-height:1.15;-webkit-text-size-adjust:100%;color:#313131;font-family:system-ui,-apple-system,BlinkMacSystemFont,Segoe UI,Roboto,Helvetica Neue,Arial,Noto Sans,sans-serif,Apple Color Emoji,Segoe UI Emoji,Segoe UI Symbol,Noto Color Emoji}body{display:flex;flex-direction:column;height:100vh;min-height:100vh}.main-content{margin:8rem auto;max-width:60rem;padding-left:1.5rem}@media (width <= 720px){.main-content{margin-top:4rem}}.h2{font-size:1.5rem;font-weight:500;line-height:2.25rem}@media (width <= 720px){.h2{font-size:1.25rem;line-height:1.5rem}}#challenge-error-text{background-image:url(data:image/svg+xml;base64,PHN2ZyB4bWx

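The request is being blocked by Cloudflare's DDoS protection: instead of the bracket, the response is a "Just a moment..." challenge page.

To get around this, we are going to open the tournament links with Selenium and pull the data from the rendered page.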
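
Before falling back to a browser, the block can be detected programmatically. The sketch below is a heuristic only: it assumes the challenge page keeps the "Just a moment..." title seen in the output above, which Cloudflare could change at any time. `looks_like_challenge_page` is a hypothetical helper name.

```python
import re

# Matches the contents of the first <title> element in an HTML string
TITLE_PATTERN = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def looks_like_challenge_page(html):
    """Return True if the HTML title matches the challenge page seen above."""
    m = TITLE_PATTERN.search(html)
    return bool(m) and "just a moment" in m.group(1).lower()


blocked = '<!DOCTYPE html><html lang="en-US"><head><title>Just a moment...</title></head></html>'
normal = "<html><head><title>Some bracket page</title></head></html>"  # made-up page

print(looks_like_challenge_page(blocked))
print(looks_like_challenge_page(normal))
```

A check like this could decide per URL whether the plain `requests` response is usable or the page has to be loaded in a browser.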

In [7]:
#pip install selenium webdriver-manager

In [8]:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
import time

In [9]:
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.get(url)

time.sleep(5) # Adding a delay to ensure the Javascript loads

html = driver.page_source

driver.quit()

In [10]:
html

'<html lang="en" xmlns="http://www.w3.org/1999/xhtml"><head>\n<meta content="text/html;charset=utf-8" http-equiv="Content-Type">\n<meta name="csrf-param" content="authenticity_token">\n<meta name="csrf-token" content="2iP2dPAly4/62qNX7HPY3jbf3d6FcuNLbwE/Kzl53vFRgmPIF6j9gGcb4vs8nfqgUzY7rILNeEskjXXsM+rtsw==">\n<meta name="asset-host" content="https://assets.challonge.com">\n<meta content="noindex,nofollow" name="robots">\n<meta content="width=device-width, initial-scale=1.0" name="viewport">\n<meta content="https://stream.challonge.com:8000/faye" name="stream-url">\n<link rel="stylesheet" media="all" href="https://fonts.googleapis.com/css?family=Source+Sans+Pro:300,400,600,700">\n<link rel="stylesheet" media="all" href="https://fonts.googleapis.com/css2?family=Radio+Canada+Big:ital,wght@0,400..700;1,400..700&amp;display=swap">\n<link rel="stylesheet" media="all" href="https://fonts.googleapis.com/css2?family=Inter:wght@300;400;600;700&amp;display=swap">\n<link rel="stylesheet" media="all

In [11]:
soup = BeautifulSoup(html, "html.parser")

In [12]:
# Find all the script tags
scripts = soup.find_all("script")

# Search for "participants" and "matches", the same keys the Challonge API returns
for tag in scripts:
    if "participants" in tag.text and "matches" in tag.text:
        print(tag.text[-500:])

ser":false,"player1_placeholder_text":null,"player2_placeholder_text":null,"scores":[3,2],"winner_id":108165794,"loser_id":107792038,"md5":"a85a89fd8e0a5c538efd352003249c37"}]},"groups":[]}; window._initialStoreState['ThemeStore'] = {"options":{"hideSeeds":false,"hideIdentifiers":null,"showStationAndTime":false,"participantsPerMatch":2}}; window._initialStoreState['BracketSettingsStore'] = {"panOnSingleClick":false,"zoomScaleOnDoubleClick":null,"use100vh":false,"showDetailsOnHover":true};
//]]>



The match and participant data is embedded in this script tag.
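
Since the script assigns object literals to `window._initialStoreState[...]`, one option is to decode those literals directly instead of scraping the rendered SVG. The sketch below assumes each assigned value is valid JSON, which holds for the fragment printed above but is not guaranteed for the whole page. `extract_store_states` is a hypothetical helper, and the sample string is trimmed from the output above.

```python
import json
import re

# Matches the left-hand side of: window._initialStoreState['Name'] = {...};
ASSIGNMENT = re.compile(r"window\._initialStoreState\['(\w+)'\]\s*=\s*")


def extract_store_states(script_text):
    """Return {store_name: decoded_value} for every assignment that parses as JSON."""
    decoder = json.JSONDecoder()
    stores = {}
    for m in ASSIGNMENT.finditer(script_text):
        try:
            # raw_decode reads one JSON value starting at the given index
            value, _ = decoder.raw_decode(script_text, m.end())
        except json.JSONDecodeError:
            continue  # Skip values that are not plain JSON
        stores[m.group(1)] = value
    return stores


sample = (
    "window._initialStoreState['ThemeStore'] = "
    '{"options":{"hideSeeds":false,"participantsPerMatch":2}}; '
    "window._initialStoreState['BracketSettingsStore'] = "
    '{"panOnSingleClick":false,"use100vh":false};'
)

stores = extract_store_states(sample)
print(sorted(stores))
print(stores["ThemeStore"]["options"]["participantsPerMatch"])
```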

In [13]:
scripts

[<script async="" crossorigin="anonymous" src="https://connect.facebook.net/en_US/sdk.js?hash=a973b7825e1b704385951048b98f4a2c"></script>,
 <script src="https://cdn.hadronid.net/hadron.js?url=https%3A%2F%2Fchallonge.com%2FEUAC35&amp;ref=&amp;_it=amazon&amp;partner_id=720"></script>,
 <script id="facebook-jssdk" src="//connect.facebook.net/en_US/sdk.js"></script>,
 <script async="" src="https://www.google-analytics.com/analytics.js"></script>,
 <script>
   // redirect XBC clients to the XSplit plugin page
   if (navigator.userAgent.indexOf("XSplit") > 0) {
     window.location = '/EUAC35/xsplit';
   }
 </script>,
 <script async="" src="https://www.googletagmanager.com/gtag/js?id=G-1EEPZLM6JC"></script>,
 <script type="text/javascript">
     function readCookie(name) {
       var nameEQ = name + "=";
       var ca = document.cookie.split(';');
       for(var i=0;i < ca.length;i++) {
         var c = ca[i];
         while (c.charAt(0)==' ') c = c.substring(1,c.length);
         if (c.inde

By manually inspecting the webpage, we can work out which tags to search for.

In [14]:
matches = soup.select("g.match")

In [15]:
# Note: seeds is overwritten on every pass, so only the last match's seeds are kept
for i in matches:
    seeds = i.select("text.match--seed")

In [16]:
for seed in seeds:
    print(seed.text)

4
5


In [17]:
for i in matches:
    names = i.select("text.match--player-name")

In [18]:
names

[<text class="match--player-name" clip-path="url(#clipPath3016783)" height="12" text-anchor="start" width="147" x="77" y="15">Yam</text>,
 <text class="match--player-name -winner" clip-path="url(#clipPath5447732)" height="12" text-anchor="start" width="147" x="77" y="15">ocrim_ger</text>]

In [19]:
names[0].text

'Yam'

In [20]:
match = soup.select_one("g.match")
print(match.prettify())

<g class="match -complete" data-identifier="9" data-match-id="178584021" transform="translate(488 231)">
 <defs>
  <clippath id="match-clippath-9">
   <rect height="45" rx="3" ry="3" width="200" x="26" y="5">
   </rect>
  </clippath>
 </defs>
 <text class="match--identifier" height="10" text-anchor="middle" width="24" x="11" y="31">
  9
 </text>
 <rect class="match--wrapper-background" height="49" rx="3" ry="3" width="204" x="24" y="3">
 </rect>
 <rect class="match--base-background" height="45" rx="3" ry="3" width="200" x="26" y="5">
 </rect>
 <g clip-path="url(#match-clippath-9)">
  <svg class="match--player" data-participant-id="107792038" x="0" y="5">
   <title>
    Grimwood96
   </title>
   <defs>
    <clippath id="clipPath9240489">
     <rect height="22" width="143" x="50" y="0">
     </rect>
    </clippath>
    <clippath id="portraitClipPath9240489">
     <path>
     </path>
    </clippath>
   </defs>
   <path class="match--player-background" d="M 50 0 h 147 v 22 h -147 Z">
   </

In [21]:
part = match.find("svg", class_="match--player")

In [22]:
participantId = part["data-participant-id"]

In [23]:
participantId

'107792038'

In [24]:
score = match.find("text", class_="match--player-score")  # Scores are <text> elements, not <svg>

In [25]:
for i in matches:
    scores = i.select("text.match--player-score")

In [26]:
scores[0].text

'1'

What we're missing is the Challonge id. Manually examining the script tags on the webpage reveals a variable called "participantUserIdMap", which appears to map Challonge user ids to participant ids.

In [27]:
text = ""
for tag in soup.find_all("script"):
    if "participantUserIdMap" in tag.text:
        text = tag.text

In [28]:
text

'window.gon = {};gon.targetingKeyValues={"category":"Video Game - Fighting","game":"Arms"};gon.forceDeferredCallback=false;gon.participantUserIdMap={"1897663":107792038,"2363206":108165794,"2521004":108165544,"3149531":107876180,"2392308":107827464,"3057118":107772816};gon.adminIds=[2405407,2392308,2533091,2670839,2405407];'

In [29]:
start = text.find("participantUserIdMap")
subTextStart = text.find("{", start)
subTextEnd = text.find("}", start)
extract = text[subTextStart: subTextEnd+1]

In [30]:
extract = extract.replace("{", "")
extract = extract.replace("}", "")
extract = extract.replace('"', "")

In [31]:
extract

'1897663:107792038,2363206:108165794,2521004:108165544,3149531:107876180,2392308:107827464,3057118:107772816'

In [32]:
# Split by comma first, then by colon

IdSplit = extract.split(",")

In [33]:
IdSplit

['1897663:107792038',
 '2363206:108165794',
 '2521004:108165544',
 '3149531:107876180',
 '2392308:107827464',
 '3057118:107772816']

In [34]:
if 3057118 in playerdf["Challonge ID"].values:
    print("Yes")

Yes


In [35]:
participantArray = []
challongeArray = []
for num in IdSplit:
    split = num.split(":")
    challongeArray.append(split[0])
    participantArray.append(split[1])

In [36]:
participantArray

['107792038', '108165794', '108165544', '107876180', '107827464', '107772816']

In [37]:
challongeArray

['1897663', '2363206', '2521004', '3149531', '2392308', '3057118']

In [38]:
def find_challonge_id(pid):
    pos = 0
    for i in participantArray:
        if int(i) == pid:
            return challongeArray[pos]
        else:
            pos += 1

In [39]:
find_challonge_id(107772816)

'3057118'
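
The string slicing and the two parallel lists above can be replaced by decoding the map as JSON and inverting it into a single dictionary. This is a minimal sketch under the assumption that the value assigned to `participantUserIdMap` is valid JSON, as it is in the script text shown above. `parse_participant_user_id_map` is a hypothetical helper, and the sample text is shortened from the real output.

```python
import json
import re

# Shortened copy of the script text shown above
sample_text = (
    'window.gon = {};gon.targetingKeyValues={"category":"Video Game - Fighting","game":"Arms"};'
    'gon.participantUserIdMap={"1897663":107792038,"3057118":107772816};'
    "gon.adminIds=[2405407];"
)


def parse_participant_user_id_map(script_text):
    """Return {participant_id: challonge_user_id}, or {} if the map is absent."""
    m = re.search(r"participantUserIdMap\s*=\s*", script_text)
    if m is None:
        return {}
    # Decode exactly one JSON object starting right after the "="
    raw, _ = json.JSONDecoder().raw_decode(script_text, m.end())
    # The page maps challonge id -> participant id; invert it for lookups
    return {int(pid): int(cid) for cid, pid in raw.items()}


participant_to_challonge = parse_participant_user_id_map(sample_text)
print(participant_to_challonge.get(107772816))
```

With the map inverted, looking up a participant is a plain `dict.get`, and the ids stay integers rather than strings.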

In [40]:
def retrieve_player_challonge(df, cid):
    player = df[df["Challonge ID"] == cid]
    if not player.empty:
        return player.iloc[0]["Player"]
    else:
        print("Not found")

In [41]:
for num in challongeArray:
    print(retrieve_player_challonge(playerdf, int(num)))

Grimwood96
replicant___
Iceman92
Yamber
ocrim_ger
YoshiBowser


In [42]:
retrieve_player_challonge(playerdf, 3149531)

'Yamber'

"Yam" is a display name; "Yamber" is the Challonge username.
We need a function that cross-checks the participant id against the Challonge id. If the player already exists in playerdf, use their Challonge name. Otherwise, fall back to their display name until we work out how to retrieve the Challonge name.
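
The fallback described above could look like the following minimal sketch. It uses plain dictionaries rather than `playerdf` to stay self-contained; `resolve_player_name` and both dictionary names are hypothetical, and the sample ids are taken from the outputs above.

```python
def resolve_player_name(participant_id, display_name, participant_to_challonge, known_players):
    """Prefer the stored Challonge name; fall back to the bracket display name.

    participant_to_challonge: {participant_id: challonge_user_id}
    known_players:            {challonge_user_id: challonge_name}
    """
    challonge_id = participant_to_challonge.get(participant_id)
    if challonge_id is None:
        return display_name  # No Challonge account linked to this participant
    return known_players.get(challonge_id, display_name)


# Sample data using ids from the outputs above
participant_map = {107876180: 3149531, 107827464: 2392308}
players = {3149531: "Yamber"}

print(resolve_player_name(107876180, "Yam", participant_map, players))
print(resolve_player_name(107827464, "ocrim_ger", participant_map, players))
print(resolve_player_name(1, "Guest", participant_map, players))
```

In the notebook, the second dictionary could be built from the `Challonge ID` and `Player` columns of `playerdf`.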

Still missing at this point are the placements and the match order.

In [43]:
matchesTest = soup.select("g.match")

In [44]:
for i in matchesTest:
    names = i.select("text.match--player-name")
    seeds = i.select("text.match--seed")
    scores = i.select("text.match--player-score")
    print(names[0].text)
    print(seeds[0].text)
    print(scores[0].text)
    print(names[1].text)
    print(seeds[1].text)
    print(scores[1].text)
    part = i.find_all("svg", class_="match--player")
    for x in part:
        participantId = x["data-participant-id"]
        print(participantId)

Grimwood96
1
3
Yam
4
1
107792038
107876180
Yam
4
2
ocrim_ger
5
0
107876180
107827464
ocrim_ger
5
2
YoshiBowser
6
0
107827464
107772816
Iceman92
3
1
Yam
4
2
108165544
107876180
replicant
2
3
Grimwood96
1
2
108165794
107792038
Grimwood96
1
1
replicant
2
3
107792038
108165794
replicant
2
2
Iceman92
3
0
108165794
108165544
Grimwood96
1
2
ocrim_ger
5
0
107792038
107827464
Iceman92
3
2
YoshiBowser
6
0
108165544
107772816
Yam
4
1
ocrim_ger
5
2
107876180
107827464


In [45]:
names

[<text class="match--player-name" clip-path="url(#clipPath3016783)" height="12" text-anchor="start" width="147" x="77" y="15">Yam</text>,
 <text class="match--player-name -winner" clip-path="url(#clipPath5447732)" height="12" text-anchor="start" width="147" x="77" y="15">ocrim_ger</text>]

In [46]:
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.get("https://challonge.com/EUAC35/standings")

time.sleep(5)

htmlStandings = driver.page_source

driver.quit()

In [47]:
htmlStandings

'<html lang="en" xmlns="http://www.w3.org/1999/xhtml"><head>\n<meta content="text/html;charset=utf-8" http-equiv="Content-Type">\n<meta name="csrf-param" content="authenticity_token">\n<meta name="csrf-token" content="miLHNfcc7OCj9D9bPjnboBfB6pr+mr8ZWHHF+BwumHe5mbY7GW25aZ8IoVocpT+hA4tLJ/tsf5ELCRu2pv00JQ==">\n<meta name="asset-host" content="https://assets.challonge.com">\n<meta content="noindex,nofollow" name="robots">\n<meta content="width=device-width, initial-scale=1.0" name="viewport">\n<meta content="https://stream.challonge.com:8000/faye" name="stream-url">\n<link rel="stylesheet" media="all" href="https://fonts.googleapis.com/css?family=Source+Sans+Pro:300,400,600,700">\n<link rel="stylesheet" media="all" href="https://fonts.googleapis.com/css2?family=Radio+Canada+Big:ital,wght@0,400..700;1,400..700&amp;display=swap">\n<link rel="stylesheet" media="all" href="https://fonts.googleapis.com/css2?family=Inter:wght@300;400;600;700&amp;display=swap">\n<link rel="stylesheet" media="all

In [48]:
soup = BeautifulSoup(htmlStandings, "html.parser")

In [49]:
table = soup.find("table", attrs={'class': "standings"})

In [50]:
# All rows in table body (no headers/footers)
rows = table.find("tbody").find_all("tr")

In [51]:
# standing
print(rows[2].find("td", "rank").text.strip()) # strip for white space

3


In [52]:
# challonge name
rows[2].find('a', attrs={'class':"link-text -primary"}).text 

'Yamber'

In [53]:
# display name
rows[2].find("td", attrs={'class': "white text-center display_name"}).text.strip()

'Yam'

In [54]:
# Putting it all together
displayNames = []
challongeNames = []
rankings = []

for row in rows:
    ranking = row.find("td", "rank").text.strip()
    rankings.append(ranking)
    challongeName = row.find('a', attrs={'class':"link-text -primary"}).text 
    challongeNames.append(challongeName)
    displayName = row.find("td", attrs={'class': "white text-center display_name"}).text.strip()
    displayNames.append(displayName)

In [55]:
rankings

['1', '2', '3', '4', '5', '5']

In [56]:
challongeNames

['replicant___',
 'Grimwood96',
 'Yamber',
 'ocrim_ger',
 'Iceman92',
 'YoshiBowser']

In [57]:
displayNames

['replicant', 'Grimwood96', 'Yam', 'ocrim_ger', 'Iceman92', 'YoshiBowser']

In [58]:
tempDf = pd.DataFrame({
    "Ranking": rankings,
    "Challonge Name": challongeNames,
    "Display Name": displayNames
})
tempDf.head()

Unnamed: 0,Ranking,Challonge Name,Display Name
0,1,replicant___,replicant
1,2,Grimwood96,Grimwood96
2,3,Yamber,Yam
3,4,ocrim_ger,ocrim_ger
4,5,Iceman92,Iceman92


In [59]:
def get_rankings(df, challongeName):
    ranking = df[df["Challonge Name"] == challongeName]
    if not ranking.empty:
        return ranking.iloc[0]["Ranking"]
    else:
        print(f"{challongeName} not found")

In [60]:
get_rankings(tempDf, "Iceman92")

'5'

In [61]:
matches = soup.select("g.match")

In [62]:
match.select_one("text.match--identifier").text

'9'

In [63]:
soup = BeautifulSoup(html, "html.parser")

In [64]:
matches = soup.select("g.match")

In [107]:
# Tournament Number
tournamentNo = re.findall(r'\d+', "EUAC35") #Hard coded. Change later
tournamentNo = tournamentNo[0]
tournamentNo = int(tournamentNo)

# Date
rawDate = soup.find("div", attrs={'class':"start-time"})
dateStr = rawDate.text.strip()
cleaned = dateStr.replace("CET", "").strip()
dt = datetime.strptime(cleaned, "%B %d, %Y at %I:%M %p")
date = dt.strftime("%d/%m/%y")

for i in matches:
    
    # Seeds
    seeds = i.select("text.match--seed")
    p1Seed = seeds[0].text
    p2Seed = seeds[1].text
    
    # Participant Ids
    parts = i.find_all("svg", class_="match--player")
    part1 = parts[0]["data-participant-id"]
    part2 = parts[1]["data-participant-id"]
    
    # Challonge Ids
    challonge1 = find_challonge_id(int(part1))
    challonge2 = find_challonge_id(int(part2))
    
    # Names (TODO: come back to this and check whether the player is new)
    player1 = retrieve_player_challonge(playerdf, int(challonge1))
    player2 = retrieve_player_challonge(playerdf, int(challonge2))
    
    # Placements
    ranking1 = get_rankings(tempDf, player1)
    ranking2 = get_rankings(tempDf, player2)
    
    # Winner/Loser
    if "winner" in str(parts[0]):
        winner = player1
        loser = player2
    else:
        winner = player2
        loser = player1
    print(winner)
    print(loser)
    
    # Score
    scores = i.select("text.match--player-score")
    score = f"{scores[0].text}-{scores[1].text}"
    
    # Match Number
    matchNo = i.select_one("text.match--identifier").text
    
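    break  # Debugging aid: stops after the first match, so the append below never runs. Remove to process every match.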
    
    # Add to dataframe
    df.loc[len(df)] = [player1, player2, winner, score, loser, matchNo, tournamentNo, date,
                       p1Seed, ranking1, p2Seed, ranking2]
    

27/10/19
27/10/19


SyntaxError: 'break' outside loop (370276732.py, line 14)

In [89]:
df.tail()

Unnamed: 0,Player1,Player2,Winner,Score,Loser,MatchNo,EUAC,Date,P1 Seed,P1 Placement,P2 Seed,P2 Placement
2016,DoJoSeph,YoshiBowser,DoJoSeph,2-0,YoshiBowser,8,110,13/10/24,3,4,5,5
2017,Yamber,ProfPie,Yamber,2-0,ProfPie,7,110,13/10/24,4,2,7,5
2018,DoJoSeph,Yamber,Yamber,0-2,DoJoSeph,9,110,13/10/24,3,4,4,2
2019,Giusesbica004,Yamber,Yamber,0-0,Giusesbica004,11,110,13/10/24,2,3,4,2
2020,Ripha,Yamber,Ripha,3-0,Yamber,12,110,13/10/24,1,1,4,2


In [73]:
parts

[<svg class="match--player" data-participant-id="107792038" x="0" y="5"><title>Grimwood96</title><defs><clippath id="clipPath1977050"><rect height="22" width="143" x="50" y="0"></rect></clippath><clippath id="portraitClipPath1977050"><path></path></clippath></defs><path class="match--player-background" d="M 50 0 h 147 v 22 h -147 Z"></path><path class="match--seed-background" d="M 26 0 h 24 v 22 h -24 Z"></path><text class="match--seed" height="12" text-anchor="middle" width="10" x="38" y="14">1</text><text class="match--player-name -winner" clip-path="url(#clipPath1977050)" height="12" text-anchor="start" width="147" x="77" y="15">Grimwood96</text><g clip-path=""><image height="18" width="18" x="55" xlink:href="https://s3.amazonaws.com/challonge_app/users/images/001/897/663/large/1725714_1.png" y="2"/></g><path class="match--player-score-background -winner" d="M 197 0 h 29 v 22 h -29 Z"></path><text class="match--player-score -winner" height="12" text-anchor="middle" width="21" x="211

In [82]:
if "winner" in str(parts[0]):
    print("0")
elif "winner" in str(parts[1]):
    print("1")
else:
    print("Try another way")

0


In [101]:
dateData = soup.find('div', attrs={'class':"start-time"})

In [102]:
dateData

<div class="text start-time">
October 27, 2019 at  3:00 PM CET
</div>

In [103]:
dateData.text.strip()

'October 27, 2019 at  3:00 PM CET'

In [96]:
dateData.get_text(strip=True)

'October 27, 2019 at  3:00 PM CET'

In [104]:
dateStr =  dateData.text.strip()

In [105]:
from datetime import datetime

cleaned = dateStr.replace("CET", "").strip()

dt = datetime.strptime(cleaned, "%B %d, %Y at %I:%M %p")

formattedDate = dt.strftime("%d/%m/%y")

print(formattedDate)

27/10/19
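
The cleaning step above removes the literal string "CET", so a page showing a different abbreviation would fail to parse. A minimal sketch of a slightly more general version is shown below. It assumes the start time always ends in AM/PM, optionally followed by a short upper-case timezone abbreviation. Like the original, it discards the timezone rather than converting it. `format_start_time` is a hypothetical helper, and the second example date is made up.

```python
import re
from datetime import datetime


def format_start_time(raw):
    """Convert 'October 27, 2019 at  3:00 PM CET' to '27/10/19'.

    The timezone abbreviation is dropped, not converted.
    """
    # Strip a trailing abbreviation only when it follows AM/PM
    cleaned = re.sub(r"(?<=[AP]M)\s+[A-Z]{2,5}$", "", raw.strip())
    dt = datetime.strptime(cleaned, "%B %d, %Y at %I:%M %p")
    return dt.strftime("%d/%m/%y")


print(format_start_time("October 27, 2019 at  3:00 PM CET"))
print(format_start_time("April 5, 2020 at 3:00 PM CEST"))  # made-up example
```

Note that `%B` and `%p` depend on the active locale, so this relies on English month names being recognised.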


# References

- https://armswiki.org/wiki/EU_ARMS_Challenge
- https://developer.start.gg/docs/examples/queries/get-event
- https://github.com/ZEDGR/pychallonge