# WCA Results Database - Automated Download and Import

Created by Michael George (AKA Logiqx)

Download the latest database extract from https://www.worldcubeassociation.org/results/misc/export.html

## Initialisation

Basic approach to determine the project directory

In [1]:
import os, sys

projdir = os.path.realpath(os.path.join(sys.path[0], '..'))
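For comparison, here is a `pathlib` sketch of the same calculation. Like the cell above, it assumes the project root is the parent of the directory in `sys.path[0]`. `project_dir` is a hypothetical helper and is not part of the original notebook.

```python
import sys
from pathlib import Path

def project_dir(start=None):
    """Return the resolved parent of `start` (default: sys.path[0]).

    Assumes the notebook sits one level below the project root, mirroring
    os.path.realpath(os.path.join(sys.path[0], '..')).
    """
    base = Path(start if start is not None else sys.path[0])
    # resolve() follows symlinks like os.path.realpath; .parent steps up one level
    return base.resolve().parent

projdir = project_dir()
```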

## Import Common Libraries

Import the libraries that are used throughout this notebook

In [2]:
# Time module used for performance counters
import time

## Determine the Database Details

Connection details for the MySQL / MariaDB database, taken from the MYSQL_HOST, MYSQL_DATABASE and MYSQL_USER environment variables

Note: You will need to specify the password in $HOME/.my.cnf

In [3]:
hostname = os.environ['MYSQL_HOST']
database = os.environ['MYSQL_DATABASE']
username = os.environ['MYSQL_USER']
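Because the password comes from `$HOME/.my.cnf` and not from an environment variable, a quick pre-flight check can catch a missing password before the long download starts. The sketch below is only a rough check. `has_client_password` is a hypothetical helper. It assumes a simple INI-style option file that `configparser` can read, such as a `[client]` section containing `password=...`, and it does not follow MySQL-specific directives such as `!include`.

```python
import configparser
from pathlib import Path

def has_client_password(path=None):
    """Rough check that a MySQL option file supplies a non-empty password."""
    path = Path(path) if path is not None else Path.home() / ".my.cnf"
    if not path.is_file():
        return False
    # interpolation=None so a '%' in the password is not treated specially;
    # allow_no_value=True because option files may contain bare flags;
    # strict=False tolerates repeated groups or options
    parser = configparser.ConfigParser(interpolation=None, allow_no_value=True, strict=False)
    parser.read(path)
    # The mysql client reads both the [client] and [mysql] groups;
    # has_option returns False if the group is absent
    return any(
        parser.has_option(group, "password") and bool(parser.get(group, "password"))
        for group in ("client", "mysql")
    )
```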

## Download the ZIP

Save the ZIP to the local machine.

In [4]:
# Hack to force IPv4 - required on my Windows laptop for Alpine 3.13 (and newer)
import Force_IPv4

# urllib.request (the Python 3 successor to urllib2) is used for the download
import urllib.request

# Start time in fractional seconds
pc1 = time.perf_counter()

# Open a connection to the ZIP (900-second timeout)
zip_url = "https://www.worldcubeassociation.org/results/misc/WCA_export.sql.zip"
req = urllib.request.Request(zip_url, headers={'User-Agent': 'Mozilla'})
infile = urllib.request.urlopen(req, timeout = 900)

# Write the ZIP to a local file
zip_fn = os.path.basename(zip_url)
with open(zip_fn, "wb") as outfile:
    chunk = infile.read(4096)
    while chunk:
        outfile.write(chunk)
        chunk = infile.read(4096)

# Close the connection
infile.close()

# End time in fractional seconds
pc2 = time.perf_counter()

print("Download completed in %0.2f seconds" % (pc2 - pc1))

Download completed in 345.95 seconds
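The download cell can also be written with `with` blocks and `shutil.copyfileobj`, so that the connection and the output file are closed even if the transfer fails part-way. This is a sketch only: `download` is a hypothetical helper, and it omits the local `Force_IPv4` workaround.

```python
import shutil
import urllib.request

def download(url, dest, timeout=900, chunk_size=4096):
    """Stream `url` into the local file `dest` and return the bytes written."""
    # Same 'Mozilla' User-Agent as the cell above
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla"})
    # Both handles are closed on exit, including when an error is raised
    with urllib.request.urlopen(req, timeout=timeout) as infile, open(dest, "wb") as outfile:
        # Copy in fixed-size chunks rather than reading the whole ZIP into memory
        shutil.copyfileobj(infile, outfile, chunk_size)
        return outfile.tell()
```

For example, `download(zip_url, zip_fn)` would save the export and return the size of the ZIP in bytes.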


## Extract the SQL

Extract the SQL script from within the ZIP file.

In [5]:
# Use the zipfile library to handle the zipfile
import zipfile

# Start time in fractional seconds
pc1 = time.perf_counter()

# Open the ZIP file (named zf to avoid shadowing the zipfile module)
zf = zipfile.ZipFile(zip_fn, "r")

# Iterate through members
for member in zf.namelist():

    # Is it the SQL?
    if member.endswith(".sql"):

        # Extract the SQL
        zf.extract(member)

# Close the ZIP file
zf.close()

# End time in fractional seconds
pc2 = time.perf_counter()

print("Extract completed in %0.2f seconds" % (pc2 - pc1))

Extract completed in 7.22 seconds
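A context-managed variant of the extraction step, sketched as a hypothetical `extract_sql` helper. It closes the archive even if an extraction fails and returns the paths it wrote. The `dest` parameter is an addition that the original cell does not have.

```python
import zipfile

def extract_sql(zip_path, dest="."):
    """Extract every *.sql member of `zip_path` into `dest`; return their paths."""
    extracted = []
    # The with-block closes the archive automatically
    with zipfile.ZipFile(zip_path, "r") as zf:
        for member in zf.namelist():
            # Only the SQL script is needed for the load
            if member.endswith(".sql"):
                extracted.append(zf.extract(member, path=dest))
    return extracted
```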


## Generic SQL Function

Simple function to run a SQL script through the mysql command-line client, raising an exception if the client reports an error

In [6]:
import subprocess

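def runSqlScript(source):
    # Connection details from above; the mysql client reads the password from $HOME/.my.cnf
    cmd = ['mysql', '--host=%s' % hostname, '--database=%s' % database, '--user=%s' % username, '--default-character-set=utf8']

    # Feed the script to the client on stdin and wait for it to finish
    with open(source) as infile:
        proc = subprocess.Popen(cmd, stdin = infile, stdout = subprocess.PIPE, stderr = subprocess.PIPE)
        stdout, stderr = proc.communicate()
        # A non-zero exit code means the client reported an error
        if proc.returncode != 0:
            raise Exception('%s returned %d: %s' % (source, proc.returncode, stderr.decode('utf-8')))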
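The same logic can be sketched with `subprocess.run`, with the command construction split out so that the runner can be exercised with any program that reads stdin. `mysql_command` and `run_with_stdin` are hypothetical names; the `mysql` options are the ones `runSqlScript` already passes.

```python
import subprocess

def mysql_command(host, database, user, charset="utf8"):
    """Build the mysql client argument list used by runSqlScript."""
    return ["mysql", "--host=%s" % host, "--database=%s" % database,
            "--user=%s" % user, "--default-character-set=%s" % charset]

def run_with_stdin(cmd, source):
    """Feed the file `source` to `cmd` on stdin; raise RuntimeError on a non-zero exit."""
    with open(source, "rb") as infile:
        # run() waits for the process and collects stdout/stderr,
        # much like Popen(...).communicate()
        proc = subprocess.run(cmd, stdin=infile, capture_output=True)
    if proc.returncode != 0:
        raise RuntimeError("%s returned %d: %s"
                           % (source, proc.returncode, proc.stderr.decode("utf-8", "replace")))
    return proc.stdout
```

`run_with_stdin(mysql_command(hostname, database, username), 'WCA_export.sql')` would then behave like `runSqlScript('WCA_export.sql')`, except that it raises `RuntimeError` rather than a bare `Exception`.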

## Populate the WCA Database

Note: The target database must already exist; this notebook does not create it

In [7]:
# Start time in fractional seconds
pc1 = time.perf_counter()

sqlScript = 'WCA_export.sql'
runSqlScript(sqlScript)
os.unlink(sqlScript)

# End time in fractional seconds
pc2 = time.perf_counter()

print("Load completed in %0.2f seconds" % (pc2 - pc1))

Load completed in 173.01 seconds
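Each cell repeats the same `perf_counter` bookkeeping. A small context manager, sketched here as a hypothetical `timed` helper, could factor that out. Like the original cells, it prints only when the timed block finishes without raising.

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label):
    """Print '<label> completed in N.NN seconds' if the with-block succeeds."""
    # Start time in fractional seconds
    start = time.perf_counter()
    # Run the body of the with-block; an exception propagates from here,
    # so the message below is skipped on failure
    yield
    elapsed = time.perf_counter() - start
    print("%s completed in %0.2f seconds" % (label, elapsed))
```

For example, `with timed("Load"): runSqlScript(sqlScript)` would print a message in the same format as the cell above.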


## Schema Changes

Alter the tables and create the table indices using the SQL scripts in the project's sql directory

In [8]:
# Start time in fractional seconds
pc1 = time.perf_counter()

runSqlScript(os.path.join(projdir, 'sql', 'alter_tables.sql'))
runSqlScript(os.path.join(projdir, 'sql', 'create_indices.sql'))

# End time in fractional seconds
pc2 = time.perf_counter()

print("Indexing completed in %0.2f seconds" % (pc2 - pc1))

Indexing completed in 137.94 seconds


## All Done!