## BIOS821 Final Project: European Soccer Database
### Allison Young
#### November 2019


### Pre-work
Please complete the following steps prior to running the script in this notebook.

#### 1. Download the Data
This repository does not contain any actual data. Thus, in order to use this code, you must first download the database
from https://www.kaggle.com/hugomathien/soccer. From there, the code may be used to automate the rest of the process.

#### 2. Create a MapBox Account
If you wish to add functionality using the included soccer_geocode package, you will need to create your own account at https://www.mapbox.com/. Once you have made your account, locate your account key via the MapBox account dashboard. Copy this key, which should be a string of characters starting with "pk.", and export it as the environment variable "MAP_PASS" from a profile file in your local home directory. For assistance with creating local environment variables in macOS, see this helpful article: https://medium.com/@himanshuagarwal1395/setting-up-environment-variables-in-macos-sierra-f5978369b255

In this case, the variable added to your .bash_profile should look something like this, with your own pass key in place of the Xs:

export MAP_PASS="pk.XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
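
To confirm that the variable is visible to Python before running the geocoding step, a quick check along these lines may help. This is only a minimal sketch: the helper name `check_mapbox_key` is hypothetical and is not part of the soccer_geocode package; the only details taken from this notebook are the variable name "MAP_PASS" and the "pk." prefix.

```python
import os


def check_mapbox_key(env_var="MAP_PASS"):
    """Return the key if it is set and starts with 'pk.' (hypothetical helper)."""
    key = os.environ.get(env_var)
    if key is None:
        # The variable is missing from the current environment.
        raise RuntimeError(
            f"{env_var} is not set; add the export line to your profile "
            "and open a new terminal"
        )
    if not key.startswith("pk."):
        # The notebook states the key should start with 'pk.'.
        raise ValueError(f"{env_var} does not start with 'pk.'")
    return key


# Example use (raises if the variable is missing or malformed):
# map_pass = check_mapbox_key()
```

If the check raises a RuntimeError right after editing .bash_profile, opening a new terminal (or re-sourcing the profile) is usually enough for the variable to be picked up.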

Once you have downloaded the data to your computer's Downloads folder, created your MapBox account, and added your API key variable, you are ready to set up your working environment to make the best use of the included scripts and packages.

#### 3. Clone Repo to Local Machine
Clone this repository from GitLab to the desired location on your local machine.

#### 4. Build and Run Docker Image (if you do not have appropriate packages and software on your local machine)
Next, build a Docker image on your machine using the provided Dockerfile. To do this, navigate in a terminal to your local copy of the repo and type the following into bash:

*"docker build -t soccer ."*

This will create a container image on your local machine, called soccer. The build process may take several minutes to complete. Once finished, you can check the image has been built correctly by typing the following into bash, and checking that "soccer" is listed.

*"docker images"*

Finally, to run the container, type the following into bash:

*"docker run -it soccer bash"*

This will open the container on your terminal. 

#### 5. Open Repo in Jupyter Notebook or Lab
From here, you can simply type
*"jupyter notebook"*
into the terminal. It will bring you to a web portal, where you will enter the token provided as part of the link shown the first time you open Jupyter. You have the option to set a password for quicker access in the future.

Finally, you should see the repo in Jupyter Notebook and be able to access the rest of the code via the notebook entitled "European_Soccer_Database_Pipeline".

To exit the container at any time and return to your local terminal, press ctrl+d.

### Part I: Load the Soccer Database

In order to unzip the downloaded file, you will need to define three variables: a) the download directory, b) the location where you store md5 files (or simply the download directory again if you do not have an md5 directory), and c) the location of your repo. Then run the script "get_data.sh".

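### Add your directories to script here:

**a =**    *insert download directory here*

**b =**    *insert md5 directory here*

**c =**   *insert repo directory*

In [None]:
# Define the three directories as Python strings; IPython substitutes
# $a, $b and $c into the shell command below.
a = ""    # download directory
b = ""    # md5 directory (or the download directory again)
c = ""    # repo directory

# Each "!" line runs in its own subshell, so the script is sourced and
# prepare_data is called within a single bash invocation.
# Assumes get_data.sh defines the prepare_data function.
!bash -c "source get_data.sh && prepare_data '$a' '$b' '$c'"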

The "get_data.sh" script checks the downloads folder for the downloaded file and confirms that it is the correct file by matching its md5 against that of the desired file. Finally, it unzips the downloaded file to the designated location of your local repository.
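
For readers who prefer to see the same idea in Python, the check-then-unzip logic described above can be sketched as follows. This is not the contents of "get_data.sh"; the function names `md5_of_file` and `verify_and_unzip` are hypothetical, and the archive path and expected md5 are values you would supply yourself.

```python
import hashlib
import zipfile
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the md5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_and_unzip(archive, expected_md5, dest_dir):
    """Check the archive exists and matches expected_md5, then extract it."""
    archive = Path(archive)
    if not archive.is_file():
        raise FileNotFoundError(f"No downloaded file at {archive}")
    actual = md5_of_file(archive)
    if actual != expected_md5.strip().lower():
        raise ValueError(f"md5 mismatch: expected {expected_md5}, got {actual}")
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest_dir)   # unzip into the repo directory
        return zf.namelist()      # names of the extracted members
```

Reading the file in chunks keeps memory use low, which matters for a database archive of this size.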

### Part II: Connect to SQL Database and Create Geocode Table

The next set of commands creates a connection to the SQLite database and then creates a new table called "geocodes". Using the soccer_geocode package included in this repo, a MapBoxGeocoder object accepts a list of countries from the database and uses the MapBox API to add latitude and longitude coordinates to the "geocodes" table.
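
As background for the cell that follows, here is a rough sketch of what a single forward-geocoding lookup against the MapBox Geocoding API (v5, `mapbox.places` endpoint) can look like. It is not the implementation of MapBoxGeocoder; the helper names and the choice of `types` and `limit` parameters are assumptions made for illustration. Note that MapBox returns coordinates as [longitude, latitude].

```python
import json
import urllib.parse
import urllib.request

MAPBOX_URL = "https://api.mapbox.com/geocoding/v5/mapbox.places/{query}.json"


def build_geocode_url(place, token):
    """Build the request URL for one place name (hypothetical helper)."""
    query = urllib.parse.quote(place)
    params = urllib.parse.urlencode(
        {"access_token": token, "types": "country,region", "limit": 1}
    )
    return MAPBOX_URL.format(query=query) + "?" + params


def parse_center(response):
    """Return (latitude, longitude) of the first feature, or None if empty."""
    features = response.get("features", [])
    if not features:
        return None
    lon, lat = features[0]["center"]  # MapBox orders the pair as [lon, lat]
    return lat, lon


def geocode(place, token):
    """Perform the network request; requires a valid MapBox key."""
    with urllib.request.urlopen(build_geocode_url(place, token)) as resp:
        return parse_center(json.load(resp))
```

With the key from the pre-work step, a call such as `geocode("Belgium", os.environ["MAP_PASS"])` would return a latitude and longitude pair, assuming the request succeeds.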

In [1]:
### Import packages
import os
import sqlite3
from sqlite3 import Error
from soccer_geocode import TableMaker as tm
from soccer_geocode import MapBoxGeocoder,ConstrainedGeocoder

### Connect to the database
connection = sqlite3.connect("database.sqlite")

### Set API Key
map_pass = os.environ.get('MAP_PASS') 

### Make Table
tm.tableMaker(connection)

### Fetch Country list from database
countries = tm.fetchCountries(connection)

### Initiate Geocoder
G = MapBoxGeocoder(map_pass,countries)

### Geocode countries (and regions)
G.labelCountries
G.geocode

### Package Table Attributes
data_tuples = G.packageTable

### Fill Table
tm.addValues(connection,data_tuples)

### Show Table, to confirm process was successful
print("Resulting Table: ")
tm.fetchRows(connection) 


Geocodes table created
Countries have been fetched
MapBoxGeocoder initiated
1 1 Belgium 51 5
2 1729 England 51.5 -0.11667
3 4769 France 47 2
4 7809 Germany 51 10
5 10257 Italy 43 12
6 13274 Netherlands 52.31667 5.55
7 15722 Poland 52 19
8 17642 Portugal 38.7 -9.18333
9 19694 Scotland 57 -5
10 21518 Spain 40 -3
11 24558 Switzerland 46.79856 8.23197
Values added to Geocodes Table
Resulting Table: 


[(1, 1, 'Belgium', 51.0, 5.0),
 (2, 1729, 'England', 51.5, -0.11667),
 (3, 4769, 'France', 47.0, 2.0),
 (4, 7809, 'Germany', 51.0, 10.0),
 (5, 10257, 'Italy', 43.0, 12.0),
 (6, 13274, 'Netherlands', 52.31667, 5.55),
 (7, 15722, 'Poland', 52.0, 19.0),
 (8, 17642, 'Portugal', 38.7, -9.18333),
 (9, 19694, 'Scotland', 57.0, -5.0),
 (10, 21518, 'Spain', 40.0, -3.0),
 (11, 24558, 'Switzerland', 46.79856, 8.23197)]
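
To make the table-building step more concrete without the soccer_geocode package, the same pattern can be sketched with sqlite3 alone. This is an illustration under assumptions: the column names (`id`, `country_id`, `name`, `latitude`, `longitude`) are inferred from the shape of the rows printed above and may differ from what TableMaker actually uses, and the functions here are hypothetical stand-ins. An in-memory database is used so that "database.sqlite" is left untouched.

```python
import sqlite3

# First three rows copied from the output above.
rows = [
    (1, 1, "Belgium", 51.0, 5.0),
    (2, 1729, "England", 51.5, -0.11667),
    (3, 4769, "France", 47.0, 2.0),
]


def make_geocodes_table(connection):
    """Create the geocodes table; the column names are assumptions."""
    connection.execute(
        """CREATE TABLE IF NOT EXISTS geocodes (
               id INTEGER PRIMARY KEY,
               country_id INTEGER,
               name TEXT,
               latitude REAL,
               longitude REAL
           )"""
    )


def add_values(connection, data_tuples):
    """Insert all rows in one transaction using parameter binding."""
    with connection:
        connection.executemany(
            "INSERT INTO geocodes VALUES (?, ?, ?, ?, ?)", data_tuples
        )


def fetch_rows(connection):
    """Return every row of the geocodes table, ordered by id."""
    return connection.execute("SELECT * FROM geocodes ORDER BY id").fetchall()


connection = sqlite3.connect(":memory:")
make_geocodes_table(connection)
add_values(connection, rows)
print(fetch_rows(connection))
```

Using `?` placeholders with `executemany` avoids building SQL strings by hand, which is the safer habit when values come from an external API.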

### Part III: Building a Working Analytic Dataset