# The questions I will be asking in my data analysis

I've successfully scraped the latest results page for Albert Melbourne, so the next step is to decide which questions I want my data analysis to answer and how to find the answers, first for Albert Melbourne alone and then for Albert **and** the other 48 Melbourne parkruns.

**I have limited myself to three questions to ask in my data analysis:**

* Which is the most popular parkrun in Melbourne this week, measured by number of participants?
* Which is the speediest course, defined by the mean of everyone's times?
* Which is the speediest course when measured by the mean of age grading instead, so the comparison doesn't discriminate against courses that skew older in terms of participants?
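
A minimal sketch of how the three questions map onto a summary table, assuming each event has already been reduced to one row with the same fields used in the summary CSV later in this notebook. The four sample rows are copied from that summary output; the helper names `to_seconds` and `to_percent` are made up for illustration.

```python
# Four sample rows copied from the summary output later in this notebook.
summary = [
    {"event_name": "Albert parkrun, Melbourne", "total_participants": 762,
     "average_time": "31:24", "average_age_grade": "51.29%"},
    {"event_name": "Highlands parkrun", "total_participants": 85,
     "average_time": "36:30", "average_age_grade": "46.54%"},
    {"event_name": "Lillydale Lake parkrun", "total_participants": 220,
     "average_time": "29:55", "average_age_grade": "54.20%"},
    {"event_name": "Maribyrnong parkrun", "total_participants": 414,
     "average_time": "30:49", "average_age_grade": "52.66%"},
]


def to_seconds(mmss):
    """Convert an 'MM:SS' string to a number of seconds (hypothetical helper)."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def to_percent(text):
    """Convert a string such as '51.29%' to a float (hypothetical helper)."""
    return float(text.rstrip("%"))


# Question 1: most participants.
most_popular = max(summary, key=lambda row: row["total_participants"])
# Question 2: lowest mean time.
fastest_by_time = min(summary, key=lambda row: to_seconds(row["average_time"]))
# Question 3: highest mean age grade.
best_by_age_grade = max(summary, key=lambda row: to_percent(row["average_age_grade"]))

print(most_popular["event_name"])
print(fastest_by_time["event_name"])
print(best_by_age_grade["event_name"])
```

These answers only cover the four sample rows, not all 49 events.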

In [2]:
# Date of event and number of participants

import requests
from bs4 import BeautifulSoup

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) ' 
                  'AppleWebKit/537.36 (KHTML, like Gecko) '
                  'Chrome/114.0.0.0 Safari/537.36'
}

response = requests.get('https://www.parkrun.com.au/albertmelbourne/results/latestresults/', headers=headers)
response.raise_for_status()

# Parse the HTML content using BeautifulSoup
doc = BeautifulSoup(response.content, 'html.parser')

# Extract the event date
date_element = doc.find('span', class_='format-date')
event_date = date_element.text.strip() if date_element else 'Date not found'

# Extract the total number of participants (position of the last finisher)
table_rows = doc.select('table tbody tr')
total_participants = len(table_rows)

# Output the results
print(f"Event Date: {event_date}")
print(f"Total Participants: {total_participants}")

Event Date: 28/06/2025
Total Participants: 762


In [13]:
# Updated scraper script to also get mean time. 

from datetime import timedelta
import re

# Extract event date
date_element = doc.find('span', class_='format-date')
event_date = date_element.text.strip() if date_element else 'Date not found'

# Extract the total number of participants (position of the last finisher)
table_rows = doc.select('table tbody tr')
total_participants = len(table_rows)

# Extract time values
rows = doc.select('table tbody tr')
times = []

for row in rows:
    cols = row.find_all('td')
    if len(cols) < 6:
        continue

    time_raw = cols[5].text.strip()

    # Use regex to extract time string (HH:MM:SS or MM:SS)
    time_match = re.search(r'(\d{1,2}:\d{2}(?::\d{2})?)', time_raw)
    if time_match:
        time_str = time_match.group(1)
        time_parts = time_str.split(':')

        if len(time_parts) == 2:  # MM:SS
            minutes, seconds = map(int, time_parts)
            td = timedelta(minutes=minutes, seconds=seconds)
        elif len(time_parts) == 3:  # HH:MM:SS
            hours, minutes, seconds = map(int, time_parts)
            td = timedelta(hours=hours, minutes=minutes, seconds=seconds)
        else:
            continue  # unexpected format, skip

        times.append(td)

# Calculate average time
if times:
    average_time = sum(times, timedelta()) / len(times)
    avg_minutes = int(average_time.total_seconds() // 60)
    avg_seconds = int(average_time.total_seconds() % 60)
    average_str = f"{avg_minutes}:{avg_seconds:02d}"
else:
    average_str = "N/A"

# Print results
print(f"Event Date: {event_date}")
print(f"Total Participants: {total_participants}")
print(f"Average Time: {average_str}")

Event Date: 28/06/2025
Total Participants: 762
Average Time: 31:24


In [14]:
# Updated scraper script again to also get mean age grading. 

# Extract event date
date_element = doc.find('span', class_='format-date')
event_date = date_element.text.strip() if date_element else 'Date not found'

# Extract the total number of participants (position of the last finisher)
table_rows = doc.select('table tbody tr')
total_participants = len(table_rows)

# Extract time values
rows = doc.select('table tbody tr')
times = []
age_grades = []

for row in rows:
    cols = row.find_all('td')
    if len(cols) < 6:
        continue

    # --- TIME ---
    time_raw = cols[5].text.strip()
    time_match = re.search(r'(\d{1,2}:\d{2}(?::\d{2})?)', time_raw)
    if time_match:
        time_str = time_match.group(1)
        time_parts = time_str.split(':')

        if len(time_parts) == 2:
            minutes, seconds = map(int, time_parts)
            td = timedelta(minutes=minutes, seconds=seconds)
        elif len(time_parts) == 3:
            hours, minutes, seconds = map(int, time_parts)
            td = timedelta(hours=hours, minutes=minutes, seconds=seconds)
        else:
            continue
        times.append(td)

    # --- AGE GRADE ---
    
    age_group_cell = cols[3]
    detailed_div = age_group_cell.find('div', class_='detailed')

    if detailed_div:
        age_grade_text = detailed_div.text.strip()  # e.g. "76.55% age grade"
        match = re.search(r'(\d{1,3}\.\d{2})%', age_grade_text)
        if match:
            age_grade = float(match.group(1))
            age_grades.append(age_grade)
    
# --- AVERAGE TIME ---
if times:
    average_time = sum(times, timedelta()) / len(times)
    avg_minutes = int(average_time.total_seconds() // 60)
    avg_seconds = int(average_time.total_seconds() % 60)
    average_str = f"{avg_minutes}:{avg_seconds:02d}"
else:
    average_str = "N/A"

# --- AVERAGE AGE GRADE ---
if age_grades:
    avg_age_grade = sum(age_grades) / len(age_grades)
    avg_age_grade_str = f"{avg_age_grade:.2f}%"
else:
    avg_age_grade_str = "N/A"

print(f"Event Date: {event_date}")
print(f"Total Participants: {total_participants}")
print(f"Average Time: {average_str}")
print(f"Average Age Grade: {avg_age_grade_str}")

Event Date: 28/06/2025
Total Participants: 762
Average Time: 31:24
Average Age Grade: 51.29%


In [17]:
# Tidied up my code a bit to make it more succinct and added in event name. 

# Get event name
header_div = doc.find('div', class_='Results-header')
event_name = header_div.find('h1').text.strip() if header_div and header_div.find('h1') else 'Event name not found'

# Extract event date
date_element = doc.find('span', class_='format-date')
event_date = date_element.text.strip() if date_element else 'Date not found'

# Extract all rows and total participants
table_rows = doc.select('table tbody tr')
total_participants = len(table_rows)

times = []
age_grades = []

for row in table_rows:
    cols = row.find_all('td')
    if len(cols) < 6:
        continue

    # --- TIME ---
    time_raw = cols[5].text.strip()
    time_match = re.search(r'(\d{1,2}:\d{2}(?::\d{2})?)', time_raw)
    if time_match:
        time_str = time_match.group(1)
        time_parts = time_str.split(':')

        if len(time_parts) == 2:
            minutes, seconds = map(int, time_parts)
            td = timedelta(minutes=minutes, seconds=seconds)
        elif len(time_parts) == 3:
            hours, minutes, seconds = map(int, time_parts)
            td = timedelta(hours=hours, minutes=minutes, seconds=seconds)
        else:
            continue
        times.append(td)

    # --- AGE GRADE ---
    age_group_cell = cols[3]
    detailed_div = age_group_cell.find('div', class_='detailed')

    if detailed_div:
        age_grade_text = detailed_div.text.strip()
        match = re.search(r'(\d{1,3}\.\d{2})%', age_grade_text)
        if match:
            age_grade = float(match.group(1))
            age_grades.append(age_grade)

# --- AVERAGE TIME ---
if times:
    average_time = sum(times, timedelta()) / len(times)
    avg_minutes = int(average_time.total_seconds() // 60)
    avg_seconds = int(average_time.total_seconds() % 60)
    average_str = f"{avg_minutes}:{avg_seconds:02d}"
else:
    average_str = "N/A"

# --- AVERAGE AGE GRADE ---
if age_grades:
    avg_age_grade = sum(age_grades) / len(age_grades)
    avg_age_grade_str = f"{avg_age_grade:.2f}%"
else:
    avg_age_grade_str = "N/A"

print(f"Event Name: {event_name}")
print(f"Event Date: {event_date}")
print(f"Total Participants: {total_participants}")
print(f"Average Time: {average_str}")
print(f"Average Age Grade: {avg_age_grade_str}")

Event Name: Albert parkrun, Melbourne
Event Date: 28/06/2025
Total Participants: 762
Average Time: 31:24
Average Age Grade: 51.29%
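
The time and age-grade parsing is repeated in every cell above. A possible way to tidy it further is to pull it into small helpers, sketched here. The names `parse_time`, `parse_age_grade` and `format_mean_time` are hypothetical; the regular expressions and the truncating (not rounding) of the mean match the cells above.

```python
import re
from datetime import timedelta

TIME_RE = re.compile(r"(\d{1,2}):(\d{2})(?::(\d{2}))?")
AGE_GRADE_RE = re.compile(r"(\d{1,3}\.\d{2})%")


def parse_time(text):
    """Return a timedelta for 'MM:SS' or 'H:MM:SS' text, or None if no time is found."""
    match = TIME_RE.search(text)
    if not match:
        return None
    first, second, third = match.groups()
    if third is None:  # MM:SS
        return timedelta(minutes=int(first), seconds=int(second))
    # H:MM:SS
    return timedelta(hours=int(first), minutes=int(second), seconds=int(third))


def parse_age_grade(text):
    """Return the age grade as a float from text like '76.55% age grade', else None."""
    match = AGE_GRADE_RE.search(text)
    return float(match.group(1)) if match else None


def format_mean_time(times):
    """Return the mean of a list of timedeltas as 'M:SS', or 'N/A' for an empty list."""
    if not times:
        return "N/A"
    mean = sum(times, timedelta()) / len(times)
    total_seconds = int(mean.total_seconds())  # truncates fractions of a second
    return f"{total_seconds // 60}:{total_seconds % 60:02d}"


times = [parse_time("21:30"), parse_time("30:00")]
print(format_mean_time(times))
print(parse_age_grade("76.55% age grade"))
```

With helpers like these, the loop over the table rows would only need to call `parse_time(cols[5].text)` and `parse_age_grade(...)` and skip any `None` results.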


## I've done the analysis for Albert Melbourne, now to do it for all 49 Melbourne parkruns.

In [19]:
# First I'm bringing in the csv I created in Notebook 01.

import csv

input_csv = 'melb_parkruns.csv'

with open(input_csv, newline='', encoding='utf-8') as f:
    reader = csv.DictReader(f)
    for row in reader:
        print(row['eventname'])

albertmelbourne
highlands
hastingsforeshore
diamondcreek
berwicksprings
pointcook
westerfolds
lillydalelake
maribyrnong
pakenham
toolerncreek
froghollow
parkville
brimbank
coburg
jells
altonabeach
wyndhamvale
studley
lalor
chelseabicentennial
wilsonbotanic
karkarook
mullummullum
darebin
mernda
newportlakes
sunbury
marriottwaters
gardinerscreek
rosebud
kmreedyreserve
birdslandreserve
berwickwaters
aurora
warringalparklands
dandenong
cyrilcurtainreserve
cascadesonclydewetlands
dorsetrecreationreserve
frankstonnatureconservationres
lewisparkreserve
warrandyteriverreserve
kirkdalereserve
woodlandshistoricpark
maroondahdam
emeraldlake
aintreereserve
belvedere


In [22]:
# Check to make sure it's reading all 49 event names.

count = 0
with open(input_csv, newline='', encoding='utf-8') as f:
    reader = csv.DictReader(f)
    for row in reader:
        if row['eventname'].strip():
            count += 1

print(f"Total event names: {count}")

Total event names: 49


In [25]:
def parse_event(eventname):
    url = f"https://www.parkrun.com.au/{eventname}/results/latestresults/"

    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36"
    }

    res = requests.get(url, headers=headers)
    if res.status_code != 200:
        print(f"Failed to get data for {eventname}, status code: {res.status_code}")
        return None

    doc = BeautifulSoup(res.text, 'html.parser')

    header_div = doc.find('div', class_='Results-header')
    event_name = header_div.find('h1').text.strip() if header_div and header_div.find('h1') else eventname

    date_element = doc.find('span', class_='format-date')
    event_date = date_element.text.strip() if date_element else 'Date not found'

    table_rows = doc.select('table tbody tr')
    total_participants = len(table_rows)

    times = []
    age_grades = []

    for row in table_rows:
        cols = row.find_all('td')
        if len(cols) < 6:
            continue

        time_raw = cols[5].text.strip()
        time_match = re.search(r'(\d{1,2}:\d{2}(?::\d{2})?)', time_raw)
        if time_match:
            time_str = time_match.group(1)
            time_parts = time_str.split(':')
            if len(time_parts) == 2:
                minutes, seconds = map(int, time_parts)
                td = timedelta(minutes=minutes, seconds=seconds)
            elif len(time_parts) == 3:
                hours, minutes, seconds = map(int, time_parts)
                td = timedelta(hours=hours, minutes=minutes, seconds=seconds)
            else:
                continue
            times.append(td)

        age_group_cell = cols[3]
        detailed_div = age_group_cell.find('div', class_='detailed')
        if detailed_div:
            age_grade_text = detailed_div.text.strip()
            match = re.search(r'(\d{1,3}\.\d{2})%', age_grade_text)
            if match:
                age_grade = float(match.group(1))
                age_grades.append(age_grade)

    if times:
        average_time = sum(times, timedelta()) / len(times)
        avg_minutes = int(average_time.total_seconds() // 60)
        avg_seconds = int(average_time.total_seconds() % 60)
        average_str = f"{avg_minutes}:{avg_seconds:02d}"
    else:
        average_str = "N/A"

    if age_grades:
        avg_age_grade = sum(age_grades) / len(age_grades)
        avg_age_grade_str = f"{avg_age_grade:.2f}%"
    else:
        avg_age_grade_str = "N/A"

    return {
        'event_name': event_name,
        'event_date': event_date,
        'total_participants': total_participants,
        'average_time': average_str,
        'average_age_grade': avg_age_grade_str,
    }

# --- Main ---
input_csv = 'melb_parkruns.csv'
output_csv = 'melb_parkruns_summary.csv'

with open(input_csv, newline='', encoding='utf-8') as f_in, \
     open(output_csv, 'w', newline='', encoding='utf-8') as f_out:

    reader = csv.DictReader(f_in)
    writer = csv.writer(f_out)

    # Write header for output CSV
    writer.writerow(['event_name', 'event_date', 'total_participants', 'average_time', 'average_age_grade'])

    for row in reader:
        eventname = row['eventname'].strip()
        print(f"Processing event: {eventname}")
        data = parse_event(eventname)
        if data:
            writer.writerow([
                data['event_name'],
                data['event_date'],
                data['total_participants'],
                data['average_time'],
                data['average_age_grade'],
            ])
        else:
            print(f"Skipping event {eventname} due to fetch error.")

print("Done! Results saved to", output_csv)

Processing event: albertmelbourne
Failed to get data for albertmelbourne, status code: 405
Skipping event albertmelbourne due to fetch error.
Processing event: highlands
Failed to get data for highlands, status code: 405
Skipping event highlands due to fetch error.
Processing event: hastingsforeshore
Failed to get data for hastingsforeshore, status code: 405
Skipping event hastingsforeshore due to fetch error.
Processing event: diamondcreek
Failed to get data for diamondcreek, status code: 405
Skipping event diamondcreek due to fetch error.
Processing event: berwicksprings
Failed to get data for berwicksprings, status code: 405
Skipping event berwicksprings due to fetch error.
Processing event: pointcook
Failed to get data for pointcook, status code: 405
Skipping event pointcook due to fetch error.
Processing event: westerfolds
Failed to get data for westerfolds, status code: 405
Skipping event westerfolds due to fetch error.
Processing event: lillydalelake
Failed to get data for lilly

In [1]:
import requests
import time
from bs4 import BeautifulSoup
from datetime import timedelta
import re
import csv

def parse_event(eventname):
    url = f"https://www.parkrun.com.au/{eventname}/results/latestresults/"
    
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36"
    }

    try:
        res = requests.get(url, headers=headers)
        res.raise_for_status()  # Will raise an HTTPError for bad responses
    except requests.exceptions.RequestException as e:
        print(f"Request failed for {eventname}: {e}")
        return None

    doc = BeautifulSoup(res.text, 'html.parser')

    header_div = doc.find('div', class_='Results-header')
    event_name = header_div.find('h1').text.strip() if header_div and header_div.find('h1') else eventname

    date_element = doc.find('span', class_='format-date')
    event_date = date_element.text.strip() if date_element else 'Date not found'

    table_rows = doc.select('table tbody tr')
    total_participants = len(table_rows)

    times = []
    age_grades = []

    for row in table_rows:
        cols = row.find_all('td')
        if len(cols) < 6:
            continue

        time_raw = cols[5].text.strip()
        time_match = re.search(r'(\d{1,2}:\d{2}(?::\d{2})?)', time_raw)
        if time_match:
            time_str = time_match.group(1)
            time_parts = time_str.split(':')
            if len(time_parts) == 2:
                minutes, seconds = map(int, time_parts)
                td = timedelta(minutes=minutes, seconds=seconds)
            elif len(time_parts) == 3:
                hours, minutes, seconds = map(int, time_parts)
                td = timedelta(hours=hours, minutes=minutes, seconds=seconds)
            else:
                continue
            times.append(td)

        age_group_cell = cols[3]
        detailed_div = age_group_cell.find('div', class_='detailed')
        if detailed_div:
            age_grade_text = detailed_div.text.strip()
            match = re.search(r'(\d{1,3}\.\d{2})%', age_grade_text)
            if match:
                age_grade = float(match.group(1))
                age_grades.append(age_grade)

    if times:
        average_time = sum(times, timedelta()) / len(times)
        avg_minutes = int(average_time.total_seconds() // 60)
        avg_seconds = int(average_time.total_seconds() % 60)
        average_str = f"{avg_minutes}:{avg_seconds:02d}"
    else:
        average_str = "N/A"

    if age_grades:
        avg_age_grade = sum(age_grades) / len(age_grades)
        avg_age_grade_str = f"{avg_age_grade:.2f}%"
    else:
        avg_age_grade_str = "N/A"

    return {
        'event_name': event_name,
        'event_date': event_date,
        'total_participants': total_participants,
        'average_time': average_str,
        'average_age_grade': avg_age_grade_str,
    }

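# One-off pause before the run starts. Because it sits outside the loop below,
# it does not add a delay between the individual requests.
time.sleep(143)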

# --- Main ---
input_csv = 'melb_parkruns.csv'
output_csv = 'melb_parkruns_summary.csv'

with open(input_csv, newline='', encoding='utf-8') as f_in, \
     open(output_csv, 'w', newline='', encoding='utf-8') as f_out:

    reader = csv.DictReader(f_in)
    writer = csv.writer(f_out)

    # Write header for output CSV
    writer.writerow(['event_name', 'event_date', 'total_participants', 'average_time', 'average_age_grade'])

    for row in reader:
        eventname = row['eventname'].strip()
        print(f"Processing event: {eventname}")
        data = parse_event(eventname)
        if data:
            writer.writerow([
                data['event_name'],
                data['event_date'],
                data['total_participants'],
                data['average_time'],
                data['average_age_grade'],
            ])
        else:
            print(f"Skipping event {eventname} due to fetch error.")

print("Done! Results saved to", output_csv)

Processing event: albertmelbourne
Processing event: highlands
Processing event: hastingsforeshore
Processing event: diamondcreek
Processing event: berwicksprings
Processing event: pointcook
Processing event: westerfolds
Processing event: lillydalelake
Processing event: maribyrnong
Processing event: pakenham
Processing event: toolerncreek
Processing event: froghollow
Processing event: parkville
Processing event: brimbank
Processing event: coburg
Processing event: jells
Processing event: altonabeach
Processing event: wyndhamvale
Processing event: studley
Processing event: lalor
Processing event: chelseabicentennial
Processing event: wilsonbotanic
Processing event: karkarook
Processing event: mullummullum
Processing event: darebin
Processing event: mernda
Processing event: newportlakes
Processing event: sunbury
Processing event: marriottwaters
Processing event: gardinerscreek
Processing event: rosebud
Processing event: kmreedyreserve
Processing event: birdslandreserve
Processing event: be

In [None]:
# Could only scrape 46/49 events, so I scraped the last three manually.
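
A sketch of one way to pause between every request and keep a list of the events that failed, so they can be retried without a separate cell for each one. Everything here is an assumption for illustration: `scrape_all` is a hypothetical wrapper, `fetch` stands in for `parse_event` (returning a dict on success and `None` on failure), and the 5-second default delay is an arbitrary placeholder, not a value the site is known to accept. The demo uses a fake fetch, so nothing is requested from the network.

```python
import time


def scrape_all(eventnames, fetch, delay_seconds=5.0, sleep=time.sleep):
    """Call fetch(name) for each event name, pausing between calls.

    Returns (results, failed): the dicts that came back, and the names
    for which fetch returned None.
    """
    results, failed = [], []
    for index, name in enumerate(eventnames):
        if index > 0:
            sleep(delay_seconds)  # pause between requests, not before the first one
        data = fetch(name)
        if data is None:
            failed.append(name)
        else:
            results.append(data)
    return results, failed


# Demo with a fake fetch: the failure below is invented, not a real result.
def fake_fetch(name):
    return None if name == "not-a-real-event" else {"event_name": name}


results, failed = scrape_all(
    ["emeraldlake", "not-a-real-event", "belvedere"], fake_fetch, delay_seconds=0
)
print(len(results), failed)
```

Passing `sleep` in as a parameter is only there so the pause can be swapped out when testing; in the notebook the default `time.sleep` would be used.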

In [2]:
# emeraldlake

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) ' 
                  'AppleWebKit/537.36 (KHTML, like Gecko) '
                  'Chrome/114.0.0.0 Safari/537.36'
}

response = requests.get('https://www.parkrun.com.au/emeraldlake/results/latestresults/', headers=headers)
response.raise_for_status()

# Parse the HTML content using BeautifulSoup
doc = BeautifulSoup(response.content, 'html.parser')

# Get event name
header_div = doc.find('div', class_='Results-header')
event_name = header_div.find('h1').text.strip() if header_div and header_div.find('h1') else 'Event name not found'

# Extract event date
date_element = doc.find('span', class_='format-date')
event_date = date_element.text.strip() if date_element else 'Date not found'

# Extract all rows and total participants
table_rows = doc.select('table tbody tr')
total_participants = len(table_rows)

times = []
age_grades = []

for row in table_rows:
    cols = row.find_all('td')
    if len(cols) < 6:
        continue

    # --- TIME ---
    time_raw = cols[5].text.strip()
    time_match = re.search(r'(\d{1,2}:\d{2}(?::\d{2})?)', time_raw)
    if time_match:
        time_str = time_match.group(1)
        time_parts = time_str.split(':')

        if len(time_parts) == 2:
            minutes, seconds = map(int, time_parts)
            td = timedelta(minutes=minutes, seconds=seconds)
        elif len(time_parts) == 3:
            hours, minutes, seconds = map(int, time_parts)
            td = timedelta(hours=hours, minutes=minutes, seconds=seconds)
        else:
            continue
        times.append(td)

    # --- AGE GRADE ---
    age_group_cell = cols[3]
    detailed_div = age_group_cell.find('div', class_='detailed')

    if detailed_div:
        age_grade_text = detailed_div.text.strip()
        match = re.search(r'(\d{1,3}\.\d{2})%', age_grade_text)
        if match:
            age_grade = float(match.group(1))
            age_grades.append(age_grade)

# --- AVERAGE TIME ---
if times:
    average_time = sum(times, timedelta()) / len(times)
    avg_minutes = int(average_time.total_seconds() // 60)
    avg_seconds = int(average_time.total_seconds() % 60)
    average_str = f"{avg_minutes}:{avg_seconds:02d}"
else:
    average_str = "N/A"

# --- AVERAGE AGE GRADE ---
if age_grades:
    avg_age_grade = sum(age_grades) / len(age_grades)
    avg_age_grade_str = f"{avg_age_grade:.2f}%"
else:
    avg_age_grade_str = "N/A"

print(f"Event Name: {event_name}")
print(f"Event Date: {event_date}")
print(f"Total Participants: {total_participants}")
print(f"Average Time: {average_str}")
print(f"Average Age Grade: {avg_age_grade_str}")

Event Name: Emerald Lake parkrun
Event Date: 28/06/2025
Total Participants: 44
Average Time: 32:26
Average Age Grade: 50.25%


In [3]:
# aintreereserve

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) ' 
                  'AppleWebKit/537.36 (KHTML, like Gecko) '
                  'Chrome/114.0.0.0 Safari/537.36'
}

response = requests.get('https://www.parkrun.com.au/aintreereserve/results/latestresults/', headers=headers)
response.raise_for_status()

# Parse the HTML content using BeautifulSoup
doc = BeautifulSoup(response.content, 'html.parser')

# Get event name
header_div = doc.find('div', class_='Results-header')
event_name = header_div.find('h1').text.strip() if header_div and header_div.find('h1') else 'Event name not found'

# Extract event date
date_element = doc.find('span', class_='format-date')
event_date = date_element.text.strip() if date_element else 'Date not found'

# Extract all rows and total participants
table_rows = doc.select('table tbody tr')
total_participants = len(table_rows)

times = []
age_grades = []

for row in table_rows:
    cols = row.find_all('td')
    if len(cols) < 6:
        continue

    # --- TIME ---
    time_raw = cols[5].text.strip()
    time_match = re.search(r'(\d{1,2}:\d{2}(?::\d{2})?)', time_raw)
    if time_match:
        time_str = time_match.group(1)
        time_parts = time_str.split(':')

        if len(time_parts) == 2:
            minutes, seconds = map(int, time_parts)
            td = timedelta(minutes=minutes, seconds=seconds)
        elif len(time_parts) == 3:
            hours, minutes, seconds = map(int, time_parts)
            td = timedelta(hours=hours, minutes=minutes, seconds=seconds)
        else:
            continue
        times.append(td)

    # --- AGE GRADE ---
    age_group_cell = cols[3]
    detailed_div = age_group_cell.find('div', class_='detailed')

    if detailed_div:
        age_grade_text = detailed_div.text.strip()
        match = re.search(r'(\d{1,3}\.\d{2})%', age_grade_text)
        if match:
            age_grade = float(match.group(1))
            age_grades.append(age_grade)

# --- AVERAGE TIME ---
if times:
    average_time = sum(times, timedelta()) / len(times)
    avg_minutes = int(average_time.total_seconds() // 60)
    avg_seconds = int(average_time.total_seconds() % 60)
    average_str = f"{avg_minutes}:{avg_seconds:02d}"
else:
    average_str = "N/A"

# --- AVERAGE AGE GRADE ---
if age_grades:
    avg_age_grade = sum(age_grades) / len(age_grades)
    avg_age_grade_str = f"{avg_age_grade:.2f}%"
else:
    avg_age_grade_str = "N/A"

print(f"Event Name: {event_name}")
print(f"Event Date: {event_date}")
print(f"Total Participants: {total_participants}")
print(f"Average Time: {average_str}")
print(f"Average Age Grade: {avg_age_grade_str}")

Event Name: Aintree Reserve parkrun
Event Date: 28/06/2025
Total Participants: 61
Average Time: 32:29
Average Age Grade: 49.33%


In [6]:
# belvedere

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) ' 
                  'AppleWebKit/537.36 (KHTML, like Gecko) '
                  'Chrome/114.0.0.0 Safari/537.36'
}

response = requests.get('https://www.parkrun.com.au/belvedere/results/latestresults/', headers=headers)
response.raise_for_status()

# Parse the HTML content using BeautifulSoup
doc = BeautifulSoup(response.content, 'html.parser')

# Get event name
header_div = doc.find('div', class_='Results-header')
event_name = header_div.find('h1').text.strip() if header_div and header_div.find('h1') else 'Event name not found'

# Extract event date
date_element = doc.find('span', class_='format-date')
event_date = date_element.text.strip() if date_element else 'Date not found'

# Extract all rows and total participants
table_rows = doc.select('table tbody tr')
total_participants = len(table_rows)

times = []
age_grades = []

for row in table_rows:
    cols = row.find_all('td')
    if len(cols) < 6:
        continue

    # --- TIME ---
    time_raw = cols[5].text.strip()
    time_match = re.search(r'(\d{1,2}:\d{2}(?::\d{2})?)', time_raw)
    if time_match:
        time_str = time_match.group(1)
        time_parts = time_str.split(':')

        if len(time_parts) == 2:
            minutes, seconds = map(int, time_parts)
            td = timedelta(minutes=minutes, seconds=seconds)
        elif len(time_parts) == 3:
            hours, minutes, seconds = map(int, time_parts)
            td = timedelta(hours=hours, minutes=minutes, seconds=seconds)
        else:
            continue
        times.append(td)

    # --- AGE GRADE ---
    age_group_cell = cols[3]
    detailed_div = age_group_cell.find('div', class_='detailed')

    if detailed_div:
        age_grade_text = detailed_div.text.strip()
        match = re.search(r'(\d{1,3}\.\d{2})%', age_grade_text)
        if match:
            age_grade = float(match.group(1))
            age_grades.append(age_grade)

# --- AVERAGE TIME ---
if times:
    average_time = sum(times, timedelta()) / len(times)
    avg_minutes = int(average_time.total_seconds() // 60)
    avg_seconds = int(average_time.total_seconds() % 60)
    average_str = f"{avg_minutes}:{avg_seconds:02d}"
else:
    average_str = "N/A"

# --- AVERAGE AGE GRADE ---
if age_grades:
    avg_age_grade = sum(age_grades) / len(age_grades)
    avg_age_grade_str = f"{avg_age_grade:.2f}%"
else:
    avg_age_grade_str = "N/A"

print(f"Event Name: {event_name}")
print(f"Event Date: {event_date}")
print(f"Total Participants: {total_participants}")
print(f"Average Time: {average_str}")
print(f"Average Age Grade: {avg_age_grade_str}")

Event Name: Belvedere parkrun
Event Date: 28/06/2025
Total Participants: 220
Average Time: 33:27
Average Age Grade: 51.18%


In [7]:
import pandas as pd

df = pd.read_csv('melb_parkruns_summary.csv')
df

| | event_name | event_date | total_participants | average_time | average_age_grade |
|---|---|---|---|---|---|
| 0 | Albert parkrun, Melbourne | 28/06/2025 | 762 | 31:24 | 51.29% |
| 1 | Highlands parkrun | 28/06/2025 | 85 | 36:30 | 46.54% |
| 2 | Hastings Foreshore parkrun | 28/06/2025 | 171 | 32:47 | 52.30% |
| 3 | Diamond Creek parkrun | 28/06/2025 | 294 | 33:14 | 51.12% |
| 4 | Berwick Springs parkrun | 28/06/2025 | 184 | 32:42 | 51.29% |
| 5 | Point Cook parkrun | 28/06/2025 | 138 | 33:28 | 48.13% |
| 6 | Westerfolds parkrun | 28/06/2025 | 166 | 33:02 | 49.61% |
| 7 | Lillydale Lake parkrun | 28/06/2025 | 220 | 29:55 | 54.20% |
| 8 | Maribyrnong parkrun | 28/06/2025 | 414 | 30:49 | 52.66% |
| 9 | Pakenham parkrun | 28/06/2025 | 159 | 32:32 | 50.97% |


In [9]:
df.columns

Index(['event_name', 'event_date', 'total_participants', 'average_time',
       'average_age_grade'],
      dtype='object')

In [12]:
new_events = [
    {
        "event_name": "Emerald Lake parkrun",
        "event_date": "28/06/2025",
        "total_participants": 44,
        "average_time": "32:26",
        "average_age_grade": "50.25%"
    },
    {
        "event_name": "Aintree Reserve parkrun",
        "event_date": "28/06/2025",
        "total_participants": 61,
        "average_time": "32:29",
        "average_age_grade": "49.33%"
    },
    {
        "event_name": "Belvedere parkrun",
        "event_date": "28/06/2025",
        "total_participants": 220,
        "average_time": "33:27",
        "average_age_grade": "51.18%"
    }
]

df_new = pd.DataFrame(new_events)

In [13]:
df_combined = pd.concat([df, df_new], ignore_index=True)
df_combined

| | event_name | event_date | total_participants | average_time | average_age_grade |
|---|---|---|---|---|---|
| 0 | Albert parkrun, Melbourne | 28/06/2025 | 762 | 31:24 | 51.29% |
| 1 | Highlands parkrun | 28/06/2025 | 85 | 36:30 | 46.54% |
| 2 | Hastings Foreshore parkrun | 28/06/2025 | 171 | 32:47 | 52.30% |
| 3 | Diamond Creek parkrun | 28/06/2025 | 294 | 33:14 | 51.12% |
| 4 | Berwick Springs parkrun | 28/06/2025 | 184 | 32:42 | 51.29% |
| 5 | Point Cook parkrun | 28/06/2025 | 138 | 33:28 | 48.13% |
| 6 | Westerfolds parkrun | 28/06/2025 | 166 | 33:02 | 49.61% |
| 7 | Lillydale Lake parkrun | 28/06/2025 | 220 | 29:55 | 54.20% |
| 8 | Maribyrnong parkrun | 28/06/2025 | 414 | 30:49 | 52.66% |
| 9 | Pakenham parkrun | 28/06/2025 | 159 | 32:32 | 50.97% |


In [1]:
df_combined.to_csv('melb_parkruns_summary.csv', index=False)

NameError: name 'df_combined' is not defined
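
A `NameError` like this means the name has not been assigned in the current session. The cell counter being back at `In [1]` suggests the kernel had been restarted, although the notebook doesn't say so. One way to avoid depending on variables still being in memory is to append the manually scraped rows straight to the CSV. This is a sketch only: `append_rows` is a hypothetical helper, it assumes the target file already has a header line and ends with a newline, and the demo writes to a temporary file, not to `melb_parkruns_summary.csv`.

```python
import csv
import os
import tempfile

FIELDNAMES = ["event_name", "event_date", "total_participants",
              "average_time", "average_age_grade"]


def append_rows(csv_path, rows):
    """Append dict rows to an existing CSV that already has a header line."""
    with open(csv_path, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDNAMES)
        writer.writerows(rows)


# Demo on a temporary file so the real summary CSV is not touched.
demo_path = os.path.join(tempfile.mkdtemp(), "summary_demo.csv")
with open(demo_path, "w", newline="", encoding="utf-8") as f:
    csv.DictWriter(f, fieldnames=FIELDNAMES).writeheader()

# Values copied from the Emerald Lake cell earlier in this notebook.
append_rows(demo_path, [{
    "event_name": "Emerald Lake parkrun",
    "event_date": "28/06/2025",
    "total_participants": 44,
    "average_time": "32:26",
    "average_age_grade": "50.25%",
}])

with open(demo_path, newline="", encoding="utf-8") as f:
    demo_rows = list(csv.DictReader(f))
print(demo_rows[0]["event_name"], demo_rows[0]["total_participants"])
```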

For some reason my csv file has become corrupted: all the MM:SS average_time values have turned into HH:MM:00. Re-pasting the correct values into the column does not persist after saving, so I have saved a version of the data as an Excel Workbook to get around the problem.

**UPDATE:** There was nothing wrong with my csv file; the problem was how Excel was automatically formatting it. I still had to create an Excel Workbook version for manipulating the data in Excel.
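
One possible workaround, not something done in this notebook, is to store the mean time as a plain number of seconds alongside the MM:SS string. A whole number gives a spreadsheet nothing to reinterpret as a clock time. The helper name `mmss_to_seconds` and the new `average_time_seconds` column are made up for this sketch; the two sample values come from the summary table above.

```python
def mmss_to_seconds(value):
    """Convert 'MM:SS' or 'H:MM:SS' text to a whole number of seconds."""
    parts = [int(part) for part in value.split(":")]
    if len(parts) == 2:
        minutes, seconds = parts
        return minutes * 60 + seconds
    if len(parts) == 3:
        hours, minutes, seconds = parts
        return hours * 3600 + minutes * 60 + seconds
    raise ValueError(f"Unexpected time format: {value!r}")


rows = [
    {"event_name": "Albert parkrun, Melbourne", "average_time": "31:24"},
    {"event_name": "Highlands parkrun", "average_time": "36:30"},
]

# Add a numeric column next to the original string column.
for row in rows:
    row["average_time_seconds"] = mmss_to_seconds(row["average_time"])

print([row["average_time_seconds"] for row in rows])  # [1884, 2190]
```

A numeric column also sorts correctly, which helps when ranking courses by mean time.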

**NOTE:** I didn't realise that parkrun has a no-scraping policy (https://www.parkrun.com/scraping), but as I'm using this data for a university assignment it is personal, non-commercial use.