# Create Extracts

Author: Michael George (AKA Logiqx)

Purpose: Extract data from the WCA database and save it as CSV files

## Initialisation

Determine the project directory, assumed to be the parent of the directory containing this notebook

```python
import os, sys

# The project root is one level above the notebook's directory
projdir = os.path.realpath(os.path.join(sys.path[0], '..'))
```
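For comparison, here is a minimal `pathlib` sketch of the same lookup. It assumes, as the cell above does, that the notebook sits one directory below the project root, which holds the `sql` and `data` folders used later. `find_project_dir` is a hypothetical helper, not part of the original notebook.

```python
import sys
from pathlib import Path


def find_project_dir(notebook_dir):
    """Return the parent of notebook_dir as an absolute path with symlinks resolved.

    Hypothetical helper mirroring os.path.realpath(os.path.join(notebook_dir, '..')).
    """
    return Path(notebook_dir).resolve().parent


# sys.path[0] is typically the kernel's start directory, which Jupyter sets to the notebook's folder
projdir_path = find_project_dir(sys.path[0])
```

`Path.resolve()` follows symlinks like `os.path.realpath`, so both forms should agree. The `pathlib` version just makes the "parent directory" intent explicit.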

## Database Parameters

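Connection details for the MySQL / MariaDB database are taken from the environment variables `MYSQL_HOST`, `MYSQL_DATABASE` and `MYSQL_USER`.

Note: The password is not set here. It is read from `$HOME/.my.cnf`, which you may need to update.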

```python
hostname = os.environ['MYSQL_HOST']
database = os.environ['MYSQL_DATABASE']
username = os.environ['MYSQL_USER']
```
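If any of these variables is unset, `os.environ[...]` raises a bare `KeyError`. A missing password only surfaces later, as an access-denied error from the server. The following is a minimal sketch of an up-front check. It assumes the password lives in the `[client]` group of `~/.my.cnf`, which is the group PyMySQL reads by default when `read_default_file` is given. `check_connection_settings` is a hypothetical helper, and it uses the standard-library `configparser`, which is close to, but not identical to, how PyMySQL and the MySQL client tools parse option files.

```python
import configparser
import os

REQUIRED_VARS = ('MYSQL_HOST', 'MYSQL_DATABASE', 'MYSQL_USER')


def check_connection_settings(environ=os.environ, cnf_path='~/.my.cnf'):
    """Return a list of problems with the connection settings (empty if none).

    Hypothetical helper: checks the environment variables used above and
    looks for a password in the [client] group of the MySQL option file.
    """
    problems = ['%s is not set' % name for name in REQUIRED_VARS if not environ.get(name)]

    # allow_no_value tolerates option-file flags such as "skip-ssl" that have no value
    cfg = configparser.RawConfigParser(allow_no_value=True)
    if not cfg.read(os.path.expanduser(cnf_path)):
        problems.append('%s could not be read' % cnf_path)
    elif not cfg.has_option('client', 'password'):
        problems.append('no password in the [client] group of %s' % cnf_path)

    return problems
```

Calling `check_connection_settings()` near the top of the notebook and printing any problems gives a clearer failure than a traceback from inside `pymysql.connect`.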

## Generic SQL Function

Simple function to run the SQL script `sql/extract_<name>.sql` and save the result set of its final statement to `data/extract/<name>.csv`

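```python
import csv

import pymysql
import sqlparse


def runSqlExtract(extract):
    """Run sql/extract_<extract>.sql and save the final result set to data/extract/<extract>.csv"""

    # Read the SQL script and split it into individual statements, skipping empty ones
    source = os.path.join(projdir, 'sql', 'extract_%s.sql' % extract)
    with open(source) as infile:
        statements = [s for s in sqlparse.split(infile.read()) if s.strip()]

    # Connect using the parameters above; the password is read from ~/.my.cnf
    con = pymysql.connect(host=hostname, user=username, database=database,
                          read_default_file='~/.my.cnf', autocommit=True)
    try:
        with con.cursor() as cur:
            # Run each statement in order; only the result set of the last one is kept
            rows = []
            for statement in statements:
                cur.execute(statement)
                rows = cur.fetchall()
    finally:
        con.close()

    # Create the output directory if necessary, then write the rows as CSV
    fn = os.path.join(projdir, 'data', 'extract', extract + '.csv')
    os.makedirs(os.path.dirname(fn), exist_ok=True)

    # newline='' stops the csv module's line terminator being translated a second time
    with open(fn, 'w', newline='', encoding='utf-8') as outfile:
        csvWriter = csv.writer(outfile, quoting=csv.QUOTE_MINIMAL, lineterminator=os.linesep)
        csvWriter.writerows(rows)
```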
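PyMySQL and Python's built-in `sqlite3` both follow the DB-API 2.0 cursor interface (`execute`, `fetchall`). That means the statement-by-statement flow of `runSqlExtract` can be tried without a MySQL server. The sketch below swaps in an in-memory SQLite database and uses a hard-coded list of statements in place of `sqlparse.split`. `run_statements_to_csv` and the sample table are hypothetical. The sketch shows two things. First, setup statements return no rows, so only the final `SELECT` ends up in the CSV. Second, `csv.QUOTE_MINIMAL` quotes a field containing a comma and writes `None` as an empty field.

```python
import csv
import io
import sqlite3


def run_statements_to_csv(con, statements, outfile):
    """Execute statements in order and write the final result set as CSV (hypothetical helper)."""
    cur = con.cursor()
    rows = []
    for statement in statements:
        cur.execute(statement)
        rows = cur.fetchall()  # empty for statements that return no result set
    writer = csv.writer(outfile, quoting=csv.QUOTE_MINIMAL, lineterminator='\n')
    writer.writerows(rows)
    return len(rows)


# Stand-in for an extract script that builds a table before selecting from it (sample data)
statements = [
    "CREATE TABLE people (id INTEGER, name TEXT, country TEXT)",
    "INSERT INTO people VALUES (1, 'Smith, John', NULL)",
    "INSERT INTO people VALUES (2, 'Jane Doe', 'United Kingdom')",
    "SELECT id, name, country FROM people ORDER BY id",
]

con = sqlite3.connect(':memory:')
buffer = io.StringIO()
row_count = run_statements_to_csv(con, statements, buffer)
con.close()

# prints:
# 1,"Smith, John",
# 2,Jane Doe,United Kingdom
print(buffer.getvalue(), end='')
```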

## Run Extracts

Extract data from the database for subsequent analysis (percentiles, rankings, etc.)

```python
import time

pc1 = time.perf_counter()

for extract in ['events', 'competitions', 'continents', 'countries', 'senior_rankings', 'seniors']:
    runSqlExtract(extract)

pc2 = time.perf_counter()
print("Extracts completed in %0.2f seconds" % (pc2 - pc1))
```

Extracts completed in 2.57 seconds
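The cell above only reports the total time. Below is a minimal sketch for timing each extract separately with the same `time.perf_counter` clock. `time_extracts` is a hypothetical helper that takes the extract function as a parameter, so it can be exercised without a database.

```python
import time


def time_extracts(names, run):
    """Call run(name) for each extract name and return {name: elapsed seconds} (hypothetical helper)."""
    timings = {}
    for name in names:
        start = time.perf_counter()
        run(name)
        timings[name] = time.perf_counter() - start
    return timings


# Usage in the notebook (requires the database connection defined earlier):
# timings = time_extracts(['events', 'competitions', 'continents', 'countries',
#                          'senior_rankings', 'seniors'], runSqlExtract)
# for name, seconds in timings.items():
#     print("%-16s %0.2f seconds" % (name, seconds))
```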


## All Done!