# Gathering Data

## Imports

- `requests`: needed to pull data from the internet

```python
import requests
```

## Pre-Info

Data for this project will be obtained from SendouQ match data, found at `https://sendou.ink/q/match/[ID]`. Match IDs increase sequentially: the first match ever has an ID of 1, and the most recent match at the time of writing has an ID of around 45400.
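Because the IDs are sequential, the match pages to scrape can be generated from a range of IDs. A minimal sketch, where the helper name `match_urls` is hypothetical and the URL pattern is the one given above; it only builds strings and does not fetch anything:

```python
BASE_URL = 'https://sendou.ink/q/match/'


def match_urls(first_id, last_id):
    """Yield the SendouQ match URL for every ID from first_id to last_id (inclusive).

    Hypothetical helper: it makes no network requests.
    """
    for match_id in range(first_id, last_id + 1):
        yield BASE_URL + str(match_id)


# usage: three consecutive match pages
for url in match_urls(45357, 45359):
    print(url)
```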

As a sample, I use my most recent match at the time, which has ID [45359](https://sendou.ink/q/match/45359).

We are interested in only two pieces of data: the rating of every player in the match, and the result of the match. Since this information is displayed on the page, it should be possible to extract it from the page source.
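One way to hold these two pieces of data per match is a small record type. This is only a sketch: the class, field, and method names are hypothetical, and how the match result is encoded on the page is not covered here, so `winner` is just a placeholder string.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MatchRecord:
    """Hypothetical container for the data we want from one SendouQ match."""
    match_id: int
    alpha_ratings: List[int]  # ratings of the left team
    beta_ratings: List[int]   # ratings of the right team
    winner: str               # placeholder, e.g. 'alpha' or 'beta'

    def mean_rating(self, team):
        """Average rating of one team ('alpha' or 'beta')."""
        if team == 'alpha':
            ratings = self.alpha_ratings
        elif team == 'beta':
            ratings = self.beta_ratings
        else:
            raise ValueError("team must be 'alpha' or 'beta'")
        return sum(ratings) / len(ratings)
```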

The page source is not easy to decipher at first glance, especially since I have little web-development experience. After some investigation, however, it turns out to contain a JSON object with the data we're looking for. The object starts immediately after the phrase `"features/sendouq/routes/q.match.$id":` and ends at the brace that closes its opening `{`.

```python
match_ID = '45359'

# grab the page for this match and stop early on an HTTP error (404, rate limit, ...)
r = requests.get('https://sendou.ink/q/match/' + match_ID)
r.raise_for_status()
start_phrase = '"features/sendouq/routes/q.match.$id":'

# find the start_phrase in the result (str.find returns -1 if it is missing)
start_index = r.text.find(start_phrase)
if start_index == -1:
    raise ValueError('start phrase not found; the page layout may have changed')

# copy all text after the start_phrase, dropping any whitespace before the opening brace
data = r.text[start_index + len(start_phrase):].lstrip()

# find the end_index by looking for balanced braces
# (braces inside JSON string values would throw this count off)
open_braces = 0
for i, c in enumerate(data):
    if c == '{':
        open_braces += 1
    elif c == '}':
        open_braces -= 1
    if open_braces == 0:
        break
end_index = i + 1

# trim off the extra data
data = data[:end_index]

# print out data (use an external tool to visualize the json like https://codebeautify.org/jsonviewer)
print(data)
```

```
{"match":{"id":45359,"alphaGroupId":376816,"bravoGroupId":376815,"createdAt":1714701635,"reportedAt":1714704978,"reportedByUserId":5317,"memento":{"modePreferences":{"TC":[{"userId":1117,"preference":"PREFER"},{"userId":3168},{"userId":5317,"preference":"PREFER"},{"userId":160},{"userId":4275,"preference":"AVOID"},{"userId":20583,"preference":"AVOID"}],"SZ":[{"userId":1117,"preference":"PREFER"},{"userId":3168},{"userId":5317,"preference":"PREFER"},{"userId":160,"preference":"PREFER"},{"userId":4275,"preference":"PREFER"},{"userId":20583,"preference":"PREFER"}]},"pools":[{"userId":1117,"pool":[{"stages":[2,7,8,10,11,15,22],"mode":"SZ"},{"stages":[2,3,6,8,10,19,22],"mode":"TC"},{"stages":[0,3,6,14,16,18,21],"mode":"RM"},{"stages":[0,6,8,10,17,18,22],"mode":"CB"}]},{"userId":3168,"pool":[{"stages":[2,6,7,8,10,16,22],"mode":"SZ"},{"stages":[2,6,8,10,14,16,22],"mode":"TC"},{"stages":[0,6,8,10,14,16,17],"mode":"CB"},{"stages":[0,6,14,16,17,21,22],"mode":"RM"}]},{"userId":5317,"pool":[{"stag
```
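Counting braces works as long as no `{` or `}` appears inside a JSON string value; a brace in a player name, for example, would throw the count off. A possibly more robust sketch lets Python's standard `json` module find the end of the object instead: `json.JSONDecoder().raw_decode` parses one JSON value from the start of a string, returns it together with the index where it stopped, and ignores any trailing text. The function name `extract_route_json` is hypothetical, and the sketch assumes the embedded object is valid JSON.

```python
import json


def extract_route_json(page_text, start_phrase='"features/sendouq/routes/q.match.$id":'):
    """Parse the JSON object that immediately follows start_phrase in page_text.

    Hypothetical helper: returns the parsed object (a dict) rather than a string.
    """
    start_index = page_text.find(start_phrase)
    if start_index == -1:
        raise ValueError('start phrase not found in page')
    # raw_decode does not skip leading whitespace, so strip it first
    rest = page_text[start_index + len(start_phrase):].lstrip()
    # raw_decode parses one JSON value and ignores whatever text follows it
    obj, _ = json.JSONDecoder().raw_decode(rest)
    return obj


# usage with the response from the cell above:
# match_data = extract_route_json(r.text)
```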

This data can be explored in any JSON viewer; I personally used [codebeautify](https://codebeautify.org/jsonviewer). The first data we want is the players' ratings. They can be found at `groupAlpha.members[0-3].skill.ordinal` for the left team; replacing `groupAlpha` with `groupBeta` gives the right team.
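Once the JSON text is parsed into a Python dict (for example with `json.loads(data)`), the ordinals can be read along the path above. A minimal sketch; `team_ordinals` is a hypothetical name, and it assumes that `groupAlpha` and `groupBeta` are keys of the parsed object and that every member has a numeric `skill.ordinal`, as in the sample match.

```python
def team_ordinals(match_data, group_key):
    """Return the skill ordinals for every member of one team.

    group_key is 'groupAlpha' (left team) or 'groupBeta' (right team).
    Assumes each member has a numeric skill.ordinal, as in the sample match.
    """
    return [member['skill']['ordinal'] for member in match_data[group_key]['members']]


# usage with the `data` string printed above:
# import json
# match_data = json.loads(data)
# alpha = team_ordinals(match_data, 'groupAlpha')
# beta = team_ordinals(match_data, 'groupBeta')
```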

However, this value is not what we might expect. On the website, player ratings (shown by clicking on the rank pictures) are numbers around 1200 to 1700, whereas the value in the data is a much smaller number, between 0 and 50. There must therefore be some conversion from the ordinal to the rating.

To work out that conversion, we can compare ordinals against ratings. The data below is mostly from the match used above, plus some from the match with the preceding ID (`45358`).

| ordinal $(o)$ | rating $(\mathrm R)$ |
|---|---|
|23.189129414253898|1348|
|25.8303791446169|1387|
|23.431072851780918|1351|
|19.726415339139816|1296|
|29.049225431933444|1436|
|24.00009753744956|1360|
|38.153938142944824|1572|
|31.82404177844055|1477|
|11.527993868993082|1173|
|7.477124875867283|1112|
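The linear relationship can also be checked numerically with an ordinary least-squares fit over the table. A sketch using only the standard library, with the (ordinal, rating) pairs copied from the table above; `least_squares` is a hypothetical helper name.

```python
from statistics import fmean

# (ordinal, rating) pairs from the table
PAIRS = [
    (23.189129414253898, 1348),
    (25.8303791446169, 1387),
    (23.431072851780918, 1351),
    (19.726415339139816, 1296),
    (29.049225431933444, 1436),
    (24.00009753744956, 1360),
    (38.153938142944824, 1572),
    (31.82404177844055, 1477),
    (11.527993868993082, 1173),
    (7.477124875867283, 1112),
]


def least_squares(pairs):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mean_x, mean_y = fmean(xs), fmean(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x


slope, intercept = least_squares(PAIRS)
print(f'rating ~ {slope:.3f} * ordinal + {intercept:.2f}')
```

With this table the fit lands very close to a slope of 15 and an intercept of 1000.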

Plotting this data shows a linear relationship, with the only deviations being very small rounding errors. Thus, we can write a function to convert from ordinal to rating:
$$\operatorname R(o) = 15o + 1000$$
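In code, the conversion is a one-liner. This sketch assumes the site rounds to the nearest whole number; that matches every row of the table, but the exact rounding rule is an assumption, and the function name is hypothetical.

```python
def ordinal_to_rating(ordinal):
    """Convert a SendouQ skill ordinal to the displayed rating, R(o) = 15o + 1000.

    Rounding to the nearest integer is an assumption that matches the sample data.
    """
    return round(15 * ordinal + 1000)


# check against a few rows of the table
for o, expected in [(23.189129414253898, 1348), (38.153938142944824, 1572), (7.477124875867283, 1112)]:
    assert ordinal_to_rating(o) == expected
```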