# Diagnostics

If the `PcsScraper` class raises an error, the cause is most likely a change to the procyclingstats.com website. Because the class does not report where the error occurred, I created this diagnostics notebook to pinpoint which element selection in the HTML fails.

The notebook has a separate code cell for each selection action that `PcsScraper` performs. The idea is that you simply run the notebook: if the output of a cell looks "wrong", or if a cell raises an error, that specific action is most likely what breaks the class. The faulty action can then be corrected in the class.
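
The same run-everything-and-look-for-failures workflow can also be collected in one place. The sketch below is not part of `PcsScraper`; `run_checks` and the example action names are hypothetical. It takes a mapping from a label to a zero-argument callable, runs each one, and records either the result or the exception, so that one failing selector does not hide the ones after it.

```python
def run_checks(actions):
    """Run each zero-argument callable and record its result or its error.

    `actions` maps a label (e.g. "team") to a callable wrapping one
    selection action. Hypothetical helper, not part of PcsScraper.
    """
    report = {}
    for name, action in actions.items():
        try:
            report[name] = ("ok", action())
        except Exception as exc:  # broad on purpose: any failure is of interest
            report[name] = ("error", f"{type(exc).__name__}: {exc}")
    return report


# Stand-in data instead of a live page, so the sketch runs offline.
rows = ["a", "b", "c"]
report = run_checks({
    "placement": lambda: [n + 1 for n in range(len(rows))],
    "missing_item": lambda: rows[13],  # fails like an out-of-range infolist index
})
for name, (status, detail) in report.items():
    print(f"{name}: {status} -> {detail}")
```

In the notebook itself, each lambda would wrap one of the selection expressions from the cells further down.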

Below, the actions are split into results and startlists, as these depend on two different `PcsScraper` methods.

In [None]:
import requests
from bs4 import BeautifulSoup
from tqdm import tqdm

## Results

The cells below are the actions used to select the different variables in the `get_results()` method. The cell directly below performs the setup that the individual cells need in order to run. `PcsScraper` handles this more elegantly under the hood.

In [None]:
url = 'https://www.procyclingstats.com/race/volta-ao-algarve/2024/stage-3'
response = requests.get(url)
soup = BeautifulSoup(response.text, "lxml")
table = soup.select('tbody')[0]

In [None]:
# placement
[num + 1 for num in range(len(table.select('tr')))]

In [None]:
# team
[t.text for t in table.find_all('td', class_='cu600') if "bonis" not in str(t)]

In [None]:
# id
[i['data-id'] for i in table.select('input')]

In [None]:
# rider
[i['data-seo'] for i in table.select('input')]

In [None]:
# time
[i.contents[0].text for i in table.find_all('td', {'class': 'time ar'})]
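
The five cells above each produce one list per rider attribute. Presumably these lists are meant to be combined row by row, which only works if they all have the same length; that is an assumption about how `get_results()` uses them, not something stated here. A minimal sketch of that sanity check follows. `length_mismatches` is a hypothetical helper, and the sample lists stand in for the cell outputs.

```python
def length_mismatches(columns):
    """Return {name: length} for columns whose length differs from the first one."""
    lengths = {name: len(values) for name, values in columns.items()}
    expected = next(iter(lengths.values()), 0)
    return {name: n for name, n in lengths.items() if n != expected}


# Stand-in outputs; in the notebook these would be the lists from the cells above.
columns = {
    "placement": [1, 2, 3],
    "team": ["Team A", "Team B", "Team C"],
    "time": ["4:02:11", "0:00"],  # one entry short, as if a selector missed a row
}
print(length_mismatches(columns))
```

An empty result means every list has the same number of entries as the first one.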

In [None]:
# parcours
soup.select('ul.infolist li')[7].select('span')[0]['class'][2]

In [None]:
# date
soup.select('ul.infolist li')[0].select('div')[1].text

In [None]:
# distance
float(soup.select('ul.infolist li')[4].select('div')[1].text[:-3])
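
The `[:-3]` slice assumes the distance text always ends in exactly three trailing characters, presumably a unit suffix such as ` km`. If the site changes that suffix, the slice either makes `float()` raise a `ValueError` or silently cuts off digits. A more tolerant variant, sketched here with a hypothetical `parse_distance` helper and made-up sample strings, extracts the first number with a regular expression instead.

```python
import re


def parse_distance(text):
    """Return the first decimal number found in `text` as a float.

    Raises ValueError when no number is present, so a layout change is
    reported instead of producing a wrong value.
    """
    match = re.search(r"\d+(?:\.\d+)?", text)
    if match is None:
        raise ValueError(f"no number found in {text!r}")
    return float(match.group())


print(parse_distance("175.3 km"))  # suffix with a space
print(parse_distance("175.3km"))   # suffix without a space
```

This is only a diagnostic aid; whether the class should adopt it depends on how the distance is formatted on the live page.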

In [None]:
# points_scale
soup.select('ul.infolist li')[5].select('div')[1].text

In [None]:
# uci_scale
soup.select('ul.infolist li')[6].select('div')[1].text

In [None]:
# profile_score
float(soup.select('ul.infolist li')[8].select('div')[1].text)

In [None]:
# vertical_meters
float(soup.select('ul.infolist li')[9].select('div')[1].text)

In [None]:
# startlist_quality
float(soup.select('ul.infolist li')[13].select('div')[1].text)
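
All the `ul.infolist` cells above address their item by position (`[0]`, `[4]`, `[13]`, ...), so a single item added or removed on the page shifts every later index. One possible alternative is to look items up by their label. The sketch below assumes that the first `div` in each `li` holds a label and the second holds the value; only the second is confirmed by the cells above, so check the label text on the live page before relying on this. The HTML snippet, the label strings, and `infolist_to_dict` are illustrative.

```python
from bs4 import BeautifulSoup


def infolist_to_dict(soup):
    """Map each infolist label to its value text, skipping malformed items."""
    info = {}
    for li in soup.select("ul.infolist li"):
        divs = li.select("div")
        if len(divs) >= 2:
            label = divs[0].get_text(strip=True).rstrip(":")
            info[label] = divs[1].get_text(strip=True)
    return info


# Made-up HTML that mimics the assumed label/value layout.
html = """
<ul class="infolist">
  <li><div>Date:</div><div>1 January 2024</div></li>
  <li><div>Distance:</div><div>22 km</div></li>
</ul>
"""
info = infolist_to_dict(BeautifulSoup(html, "html.parser"))
print(info["Distance"])
```

With a dictionary like this, a missing label raises a `KeyError` that names the field, which is easier to diagnose than an `IndexError` on a bare position.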

In [None]:
# title
soup.select('title')[0].text[:-8]

## Startlists

The cells below are the actions used to select the different variables in the `get_startlist()` method. The cell directly below performs the setup that the individual cells need in order to run. `PcsScraper` handles this more elegantly under the hood.

In [None]:
url = 'https://www.procyclingstats.com/race/tour-of-rwanda/2024/startlist'
response = requests.get(url)
startlist_soup = BeautifulSoup(response.text, "lxml")
soup = startlist_soup.find_all(class_="ridersCont")

url = 'https://www.procyclingstats.com/race/tour-of-rwanda/2024/stage-1'
response = requests.get(url)
stage_soup = BeautifulSoup(response.text, "lxml")

In [None]:
# startlist_old
[r.find("a")['href'][6:] for i in soup for r in i.find_all("li")]

In [None]:
# startlist_new
[r.find("a").text for i in soup for r in i.find_all("li")]

In [None]:
# parcours
stage_soup.select('ul.infolist li')[7].select('span')[0]['class'][2]

In [None]:
# date
stage_soup.select('ul.infolist li')[0].select('div')[1].text

In [None]:
# distance
float(stage_soup.select('ul.infolist li')[4].select('div')[1].text[:-3])

In [None]:
# points_scale
stage_soup.select('ul.infolist li')[5].select('div')[1].text

In [None]:
# uci_scale
stage_soup.select('ul.infolist li')[6].select('div')[1].text

In [None]:
# profile_score
float(stage_soup.select('ul.infolist li')[8].select('div')[1].text)

In [None]:
# vertical_meters
float(stage_soup.select('ul.infolist li')[9].select('div')[1].text)

In [None]:
# startlist_quality
float(stage_soup.select('ul.infolist li')[13].select('div')[1].text)