# Gym Attendance Mapping
## Background 

For years now, I have been going to the gym routinely. 
Recently, I keep running into the same question: which gym should I go to? 
The gym I am a member of, "Revo Gym", has many locations across Perth and the rest of Australia. In recent years, their membership has grown so dramatically that it has become difficult to use the gym if you pick your time wrong. 


<hr>
### The Problem 
***How can I better understand which gym to go to, and when, to ensure low gym attendance?***


Luckily, there was a way to develop a solution. The gym hosts a website showing live member counts for each gym. Each location has its own dedicated count, which can be scraped in real time to gain insight. 

### Idea Outline

To gather the attendance data in real time, I decided to use a web scraper. 
The Revo website, https://revofitness.com.au/livemembercount/, lets you pick each location from a dropdown. To map each location, I needed its coordinates, which I gathered manually through Google. 

Inspecting the HTML of the webpage reveals the name of the HTML element that needs to be scraped. 

The following libraries are needed to run the program:

In [1]:
import time
from datetime import datetime
import csv
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.firefox.service import Service
from selenium.webdriver.firefox.options import Options
from selenium.webdriver.common.action_chains import ActionChains
import concurrent.futures
import folium
from folium.plugins import LocateControl
from branca.colormap import LinearColormap as colourmap
import webbrowser

<br>

The URL of the webpage needs to be assigned:

In [2]:
url = "https://revofitness.com.au/livemembercount/"

<br>

Each location has several bits of information: its coordinates, its HTML asset name and the name of the location:

In [3]:
location_coordinates = {
    "Banksia Grove": {"lat": -31.672501, "lng": 115.746849, "value": "banksia-grove"},
    "Belmont": {"lat": -31.964600, "lng": 115.935509, "value": "belmont"},
    "Cockburn": {"lat": -32.1263833, "lng": 115.830261, "value": "cockburn"},
    "Canning Vale": {"lat": -32.090169092344475, "lng": 115.91887857705403, "value": "canning-vale"},
    "Cannington": {"lat": -32.015549893973635, "lng": 115.93979642957694, "value": "cannington"},
    "Claremont": {"lat": -31.9763005, "lng": 115.7817048, "value": "claremont"},
    "Innaloo": {"lat": -31.8953548, "lng": 115.8011121, "value": "innaloo"},
    "Joondalup": {"lat": -31.7487718, "lng": 115.7653477, "value": "joondalup"},
    "Kelmscott": {"lat": -32.1157098, "lng": 116.0168508, "value": "kelmscott"},
    "Kwinana": {"lat": -32.2461406, "lng": 115.8151724, "value": "kwinana"},
    "Midland": {"lat": -31.8981609, "lng": 116.0169634, "value": "midland"},
    "Mount Hawthorn": {"lat": -31.9224346, "lng": 115.8412221, "value": "mount-hawthorn"},
    "Mirrabooka": {"lat": -31.86907, "lng": 115.86119, "value": "mirrabooka"},
    "Morley": {"lat": -31.8961967, "lng": 115.8944065, "value": "morley"},
    "Myaree": {"lat": -32.0410415, "lng": 115.8157906, "value": "myaree"},
    "Northbridge": {"lat": -31.9449892, "lng": 115.8531079, "value": "northbridge"},
    "O'Connor": {"lat": -32.0568007, "lng": 115.7928479, "value": "oconnor"},
    "Scarborough": {"lat": -31.8952171, "lng": 115.7573181, "value": "scarborough"},
    "Shenton Park": {"lat": -31.9538897, "lng": 115.7970556, "value": "shenton-park"},
    "Victoria Park": {"lat": -31.9687955, "lng": 115.8896312, "value": "victoria-park"}
}
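
The `value` field doubles as the key into the page's markup: the scraping function further down builds the CSS selector for each live count from it. As a minimal sketch of that lookup (the helper name `selector_for` is hypothetical, and only two locations are repeated here so the example runs on its own):

```python
# Two entries copied from location_coordinates, so the sketch is self-contained.
sample_locations = {
    "Banksia Grove": {"lat": -31.672501, "lng": 115.746849, "value": "banksia-grove"},
    "O'Connor": {"lat": -32.0568007, "lng": 115.7928479, "value": "oconnor"},
}


def selector_for(location, locations=sample_locations):
    """Build the CSS selector for a location's live count (hypothetical helper)."""
    slug = locations[location]["value"]
    # Same pattern the scraper uses: a span with class "the-number" and id "<slug>-number"
    return f"span.the-number#{slug}-number"


for name in sample_locations:
    print(name, "->", selector_for(name))
```

Keeping the slug separate from the display name is what lets a name such as "O'Connor" be shown on the map while `oconnor` is used in the selector.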

<br>

The next step is to create the map object to which items can be added. For this, I used Folium, as it is a good interactive platform for easily creating customizable maps:

In [4]:
csv_filename = "data1.csv"

m = folium.Map(
    location=[-32.023663961724345, 115.86145511966623],
    zoom_start=11.5,
    tiles= "CartoDB positron",
    attr="Stamen Toner Lite",
)

The location coordinates correspond to my city of Perth, Western Australia. 
<br>
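An important part of my decision to use Folium for mapping is that it is highly customizable for aesthetics. As such, I changed the theme of the map from the default. Note that the `tiles` argument above selects CartoDB positron, while the `attr` text still names Stamen Toner Lite.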
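
The centre point passed to `folium.Map` is hard-coded. One alternative, shown here only as a sketch, would be to derive a centre from the gym coordinates themselves by averaging them. The function name `map_centre` is hypothetical, and only three of the locations are repeated to keep the example short:

```python
# Three entries copied from location_coordinates (lat/lng only).
sample_coordinates = {
    "Joondalup": {"lat": -31.7487718, "lng": 115.7653477},
    "Kwinana": {"lat": -32.2461406, "lng": 115.8151724},
    "Midland": {"lat": -31.8981609, "lng": 116.0169634},
}


def map_centre(locations):
    """Return the mean [lat, lng] of all locations (hypothetical helper)."""
    lats = [info["lat"] for info in locations.values()]
    lngs = [info["lng"] for info in locations.values()]
    return [sum(lats) / len(lats), sum(lngs) / len(lngs)]


centre = map_centre(sample_coordinates)
print(centre)  # a list like this could be passed as location=centre
```

A plain average is good enough at city scale; it would not be appropriate for points spread across much larger distances.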

<br>
To access the webpage quickly with a web scraper, a popup has to be dismissed first. As such, I included a function that clicks outside of the banner (in this instance, I chose for it to click the 'heading') to dismiss it:

In [5]:
def dismiss_banners(driver):
    element_selector = "p.ticker-heading"
    element = driver.find_element(By.CSS_SELECTOR, element_selector)
    actions = ActionChains(driver)
    actions.move_to_element(element).click().perform()

<br>
Now comes the part that I'm interested in: web scraping the attendance data for each location. 

<br>For this, I am going to append it to a CSV file. I had originally planned on exporting the data to R for better graphing capabilities, but was satisfied with what Folium was able to provide in Python. 

<br> Each element of the following chunk has been annotated for ease of reading: 

In [9]:
def scrape_and_append_to_csv_for_location(location, driver):
    try:
        # Find element by HTML ID
        dropdown = driver.find_element(By.ID, "gyms-wa")

        # Select the value for the location, as named above
        dropdown.send_keys(location_coordinates[location]["value"])

        # Wait for web content to load 
        time.sleep(2)

        # Dismiss banners and any overlays by clicking on the same element as the attendance number
        dismiss_banners(driver)

        # Get the page source after dismissing the banners
        page_source = driver.page_source

        # Use beautifulsoup to parse the html page
        soup = BeautifulSoup(page_source, "html.parser")

        # Use CSS selector to extract the attendance value 
        target_element = soup.select_one(f"span.the-number#{location_coordinates[location]['value']}-number")
        if target_element:
            attendance = int(target_element.get_text(strip=True))  # Convert attendance to an integer, because it can only be an int
        else:
            attendance = "N/A"

        # Get current timestamp
        timestamp = datetime.now().strftime('%m-%d %H:%M')

        # This part was added later to make the output CSV easier to use:
        # change Lat and Long coordinates to floats before storing
        lat = float(location_coordinates[location]['lat'])
        lng = float(location_coordinates[location]['lng'])

        # Append data to CSV file
        with open(csv_filename, "a", newline="") as csvfile:
            csv_writer = csv.writer(csvfile, quoting=csv.QUOTE_MINIMAL)
            csv_writer.writerow([timestamp, location, attendance, lat, lng])

        print(f"Data scraped at {timestamp} for {location}: {attendance}")

        # Create the colourmap to visualise attendance more easily
        attendance_colormap = colourmap(["green", "yellow", "red"], vmin=0, vmax=100)

        # Add Location names to markers
        folium.Marker(
            location=[location_coordinates[location]["lat"], location_coordinates[location]["lng"]],
            icon=folium.DivIcon(
                icon_anchor = (25,10),
                html=f"""<div style="text-align: center; font-size: 15pt; color: black; font-family: helvetica;">{location}</div>"""
            ),
        ).add_to(m)

        # Add a marker to the Folium map for each location
        # (skipped when no attendance was found, since "N/A" cannot be scaled)
        if isinstance(attendance, int):
            folium.CircleMarker(
                location=(location_coordinates[location]["lat"], location_coordinates[location]["lng"]),
                radius=attendance / 2,
                color=attendance_colormap(attendance),
                fill=True,
                fill_color=attendance_colormap(attendance),
                fill_opacity=min(attendance / 150, 1.0),  # opacity cannot exceed 1
                popup=f"{location}: {attendance}",
                tooltip=f"Attendance: {attendance}",
            ).add_to(m)

        m.save("attendance_map_new.html")

    except Exception as e:
        print("Error:", e)
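
Two steps in the function above are easy to get wrong: converting the scraped text to a number, and scaling the marker from that number. The sketch below isolates both so they can be checked without a browser. The helper names `parse_attendance` and `marker_style` are hypothetical, and both returning `None` for unreadable text and capping the opacity at 1 are assumptions added for this example:

```python
def parse_attendance(text):
    """Return the count as an int, or None when the text is not a plain number."""
    cleaned = text.strip() if text else ""
    return int(cleaned) if cleaned.isdecimal() else None


def marker_style(attendance):
    """Scale radius and opacity with the same divisors as the CircleMarker above."""
    return {
        "radius": attendance / 2,
        # Opacity must stay within 0..1, so counts above 150 are capped (assumption).
        "fill_opacity": min(attendance / 150, 1.0),
    }


for raw in ["92", " 13 ", "", "N/A"]:
    count = parse_attendance(raw)
    if count is None:
        print(repr(raw), "-> skipped")
    else:
        print(repr(raw), "->", marker_style(count))
```

Returning `None` instead of a string keeps the "no reading" case easy to test for before any arithmetic is attempted.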

<br> The next step was to add parallel scrapers using `concurrent.futures`. This would make the process much quicker and allow it to be run more regularly. Without it, the process took about 40 seconds to run; with concurrent scrapers, this was cut to about 20 seconds.

<br> The following code was used:

In [7]:
def scrape_data_in_parallel():
    driver = None
    try:
        firefox_options = Options()
        firefox_options.add_argument("--headless")
        service = Service(executable_path="/Users/abcd/geckodriver", log_output=None)
        driver = webdriver.Firefox(service=service, options=firefox_options)

        driver.get(url)

        # Clear the CSV file and write the header row
        # The program was having trouble with appending new data, so this was added to account for this
        with open(csv_filename, "w", newline="") as csvfile:
            csv_writer = csv.writer(csvfile)
            # One header per column written by scrape_and_append_to_csv_for_location
            csv_writer.writerow(["Date", "Location", "Attendance", "Lat", "Lng"])

        # More workers are possible, but the ROI on time seemed to max out around 3
        # Note: the workers share a single driver, and Selenium does not guarantee
        # that one WebDriver instance is safe to use from several threads at once
        with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
            # Use executor.map to parallelize data scraping for multiple locations
            executor.map(lambda location: scrape_and_append_to_csv_for_location(location, driver), location_coordinates)

    except Exception as e:
        print("Error:", e)

    finally:
        # Close the WebDriver even if something above failed
        if driver is not None:
            driver.quit()
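
Because several worker threads append to the same CSV file, one way to keep rows from interleaving is to guard the write with a lock. The following is a minimal sketch of that pattern only: `fake_fetch` is a hypothetical stand-in for the Selenium scrape, the three counts are taken from the sample output in this notebook, and the rows go to an in-memory buffer instead of `data1.csv`.

```python
import concurrent.futures
import csv
import io
import threading

# Counts copied from the sample output; fake_fetch stands in for real scraping.
SAMPLE_COUNTS = {"Belmont": 32, "Joondalup": 82, "Northbridge": 13}

write_lock = threading.Lock()
buffer = io.StringIO()  # stands in for the CSV file on disk
writer = csv.writer(buffer)


def fake_fetch(location):
    """Hypothetical placeholder for the browser-based scrape."""
    return SAMPLE_COUNTS[location]


def scrape_and_record(location):
    attendance = fake_fetch(location)
    # Only one thread at a time may write, so rows never interleave.
    with write_lock:
        writer.writerow([location, attendance])
    return location, attendance


with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
    # list() consumes the results, so any worker exception is raised here.
    results = list(executor.map(scrape_and_record, SAMPLE_COUNTS))

print(results)
print(buffer.getvalue())
```

`executor.map` returns results in input order even though the rows may be written in a different order. A browser per worker could be handled in a similar way, by creating one driver inside each worker, but that is left out here to keep the sketch runnable without Selenium.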



<hr>
## The Output
After all of that, we can call the function responsible for scraping and see what we end up with:

In [8]:
scrape_data_in_parallel()
m

Data scraped at 10-25 09:46 for Belmont: 32
Data scraped at 10-25 09:46 for Banksia Grove: 16
Data scraped at 10-25 09:46 for Cockburn: 28
Data scraped at 10-25 09:46 for Canning Vale: 65
Data scraped at 10-25 09:46 for Cannington: 57
Data scraped at 10-25 09:46 for Claremont: 53
Data scraped at 10-25 09:46 for Innaloo: 56
Data scraped at 10-25 09:46 for Joondalup: 82
Data scraped at 10-25 09:46 for Kelmscott: 21
Data scraped at 10-25 09:46 for Kwinana: 40
Data scraped at 10-25 09:46 for Midland: 41
Data scraped at 10-25 09:46 for Mount Hawthorn: 40
Data scraped at 10-25 09:46 for Mirrabooka: 44
Data scraped at 10-25 09:46 for Morley: 40
Data scraped at 10-25 09:46 for Myaree: 61
Data scraped at 10-25 09:46 for Northbridge: 13
Data scraped at 10-25 09:46 for O'Connor: 41
Data scraped at 10-25 09:46 for Scarborough: 92
Data scraped at 10-25 09:46 for Shenton Park: 17
Data scraped at 10-25 09:46 for Victoria Park: 33
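
With the counts in hand, the original question can be answered by sorting on attendance. The sketch below does that for rows in the column order the scraper writes (timestamp, location, attendance, latitude, longitude). It uses five rows copied from the output above in place of `data1.csv`, skips any row that does not fit that shape, and the function name `quietest_gyms` is hypothetical:

```python
import csv
import io

# Five readings copied from the output above, in the scraper's column order.
sample_csv = """\
10-25 09:46,Joondalup,82,-31.7487718,115.7653477
10-25 09:46,Northbridge,13,-31.9449892,115.8531079
10-25 09:46,Scarborough,92,-31.8952171,115.7573181
10-25 09:46,Shenton Park,17,-31.9538897,115.7970556
10-25 09:46,Banksia Grove,16,-31.672501,115.746849
"""


def quietest_gyms(csv_file, top_n=3):
    """Return the names of the top_n least busy gyms (hypothetical helper)."""
    rows = []
    for row in csv.reader(csv_file):
        if len(row) != 5 or not row[2].isdecimal():
            continue  # skip header rows and "N/A" readings
        rows.append((int(row[2]), row[1]))
    return [location for _, location in sorted(rows)[:top_n]]


print(quietest_gyms(io.StringIO(sample_csv)))
```

For the real file, the same function could be called on `open("data1.csv", newline="")` instead of the in-memory sample.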
