# WCA Results - Automated Download and Import

Created by Michael George (AKA Logiqx)

Download the latest database extract from https://www.worldcubeassociation.org/results/misc/export.html

Note: Downloads with a specific filename instead of https://www.worldcubeassociation.org/results/misc/WCA_export.sql.zip

## Import Common Libraries

Import the libraries that are used throughout this notebook

In [2]:
# Miscellaneous operating system interfaces
import os

# Time module used for performance counters
import time

## Determine the Database Details

Connection details for MySQL / MariaDB database

Note: You will need to specify the password in $HOME/.my.cnf

In [3]:
hostname = os.environ['MYSQL_HOSTNAME']
database = os.environ['MYSQL_DATABASE']
username = os.environ['MYSQL_USER']
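
The cell above raises a bare `KeyError` if any of the three variables is unset. As a minimal sketch, the lookups could be wrapped in a helper that reports every missing variable at once. The name `get_db_settings` and the `REQUIRED_VARS` tuple are hypothetical and not part of the original notebook.

```python
import os

# Environment variables read by the notebook
REQUIRED_VARS = ("MYSQL_HOSTNAME", "MYSQL_DATABASE", "MYSQL_USER")

def get_db_settings(env=os.environ):
    """Return (hostname, database, username) or raise a descriptive error.

    Hypothetical helper; `env` is any mapping, which makes it easy to test.
    """
    missing = [name for name in REQUIRED_VARS if name not in env]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return tuple(env[name] for name in REQUIRED_VARS)
```

It could then be used as `hostname, database, username = get_db_settings()`.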

## Download the HTML

Define the URL of the database export on the WCA website and create the SSL context used to fetch it.

In [5]:
# The library urllib.request will be used for the download
import urllib.request
import ssl

# Do not verify certificates
context = ssl._create_unverified_context()

zip_url = "https://www.worldcubeassociation.org/results/misc/WCA_export.sql.zip"
zip_fn = os.path.basename(zip_url)

## Download the ZIP

Save the ZIP to the local machine.

In [5]:
# Start time in fractional seconds
pc1 = time.perf_counter()

# Create file handle for the ZIP
infile = urllib.request.urlopen(zip_url, context=context)

# Write the ZIP to a local file
with open(zip_fn, "wb") as outfile:
    outfile.write(infile.read())

# Close the URL
infile.close()

# End time in fractional seconds
pc2 = time.perf_counter()

print("Download completed in %0.2f seconds" % (pc2 - pc1))

Download completed in 56.89 seconds
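
The download cell reads the whole ZIP into memory before writing it to disk. A possible alternative, sketched here under the assumption that memory use matters, is to copy the response to the file in chunks with `shutil.copyfileobj`. The helper names `save_stream` and `download` are hypothetical.

```python
import shutil
import urllib.request

def save_stream(source, filename, chunk_size=1024 * 1024):
    """Copy a readable binary stream to a local file, one chunk at a time."""
    with open(filename, "wb") as outfile:
        shutil.copyfileobj(source, outfile, chunk_size)

def download(url, filename, context=None):
    """Stream a URL to a local file; the response is closed automatically."""
    with urllib.request.urlopen(url, context=context) as infile:
        save_stream(infile, filename)
```

With the variables defined earlier in the notebook, this could be called as `download(zip_url, zip_fn, context=context)`.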


## Extract the SQL

Extract the SQL script from within the ZIP file.

In [6]:
# Use the zipfile library to handle the zipfile
import zipfile

# Start time in fractional seconds
pc1 = time.perf_counter()

# Open the ZIP file (named "archive" so the zipfile module is not shadowed)
archive = zipfile.ZipFile(zip_fn, "r")

# Iterate through members
for member in archive.namelist():

    # Is it the SQL?
    if member.endswith(".sql"):

        # Extract the SQL
        archive.extract(member)

# Close the ZIP file
archive.close()

# End time in fractional seconds
pc2 = time.perf_counter()

print("Extract completed in %0.2f seconds" % (pc2 - pc1))

Extract completed in 7.66 seconds


## Generic SQL Function

Simple function to run a SQL script

In [7]:
def runSqlScript(source):
    # Construct the command line
    cmd = 'mysql --host=%s --database=%s --user=%s --execute="source %s" --default-character-set=utf8' % (hostname, database, username, source)

    # Execute the command line and report any errors
    result = os.system(cmd)
    if result != 0:
        print('%s returned %d' % (source, result))
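
Because `runSqlScript` builds a single shell string, a script path containing spaces or quotes could break the command. One way to avoid that, shown here only as a sketch, is to pass the same `mysql` options as an argument list to `subprocess.run`, which bypasses the shell. The names `build_mysql_args` and `run_sql_script` are hypothetical.

```python
import subprocess

def build_mysql_args(hostname, database, username, source):
    """Return the mysql command as a list of arguments (no shell quoting)."""
    return [
        "mysql",
        "--host=%s" % hostname,
        "--database=%s" % database,
        "--user=%s" % username,
        "--execute=source %s" % source,
        "--default-character-set=utf8",
    ]

def run_sql_script(hostname, database, username, source):
    """Run a SQL script through the mysql client and report any errors."""
    result = subprocess.run(build_mysql_args(hostname, database, username, source))
    if result.returncode != 0:
        print("%s returned %d" % (source, result.returncode))
    return result.returncode
```

The options are the ones already used in `runSqlScript`; only the way they are handed to the operating system differs.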

## Populate the WCA Database

Note: The actual database is expected to exist already

In [8]:
# Start time in fractional seconds
pc1 = time.perf_counter()

sqlScript = 'WCA_export.sql'
runSqlScript(sqlScript)
os.unlink(sqlScript)

# End time in fractional seconds
pc2 = time.perf_counter()

print("Load completed in %0.2f seconds" % (pc2 - pc1))

Load completed in 139.71 seconds


## Schema Changes

Alter tables and create table indices

In [9]:
# Start time in fractional seconds
pc1 = time.perf_counter()

runSqlScript('../sql/alter_tables.sql')
runSqlScript('../sql/create_indices.sql')

# End time in fractional seconds
pc2 = time.perf_counter()

print("Indexing completed in %0.2f seconds" % (pc2 - pc1))

Indexing completed in 122.98 seconds
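
Each step in this notebook repeats the same `time.perf_counter()` pattern. As an optional sketch, that pattern could be factored into a small context manager. The name `timed` is hypothetical and not part of the original notebook.

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label):
    """Print how long the enclosed block took, using the notebook's format."""
    start = time.perf_counter()
    yield
    # Only reached if the block finished without raising an exception
    elapsed = time.perf_counter() - start
    print("%s completed in %0.2f seconds" % (label, elapsed))

# Example usage with a short placeholder workload
with timed("Sleep"):
    time.sleep(0.01)
```

A step such as the schema changes could then be wrapped as `with timed("Indexing"):` followed by the two `runSqlScript` calls.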


## All Done!