## Getting the data from GitHub
The first step is to get the data from GitHub. The download code below pulls down the zipped PSV (pipe-separated values) files.

## Pushing the data into Google BigQuery
The next step is to push the data into Google BigQuery.
You can use this method: https://cloud.google.com/bigquery/loading-data-into-bigquery#loaddatapostrequest
Or you can use some other Python-friendly, programmatic method.
Also, don't forget that the files use a custom delimiter ('|'), which has to be set with the `configuration.load.fieldDelimiter` option. This option defines the separator for fields in a CSV file. The default value is a comma (','); since '|' is not a comma, the option must be set explicitly. BigQuery converts the string to ISO-8859-1 encoding and then uses the first byte of the encoded string to split the data in its raw, binary state.
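
As a minimal sketch of the delimiter setting described above, the `configuration.load` section of a load job could be assembled like this. The helper name `build_load_config` and the placeholder project, dataset, table, and schema values are assumptions made for illustration; only `fieldDelimiter`, `sourceFormat`, `schema`, and `destinationTable` are field names from the BigQuery load configuration.

```python
def build_load_config(project_id, dataset_id, table_id, schema, delimiter='|'):
    """Return the `configuration.load` section for a BigQuery load job.

    Hypothetical helper: it only assembles a dictionary and does not
    contact BigQuery.
    """
    return {
        'destinationTable': {
            'projectId': project_id,
            'datasetId': dataset_id,
            'tableId': table_id,
        },
        'schema': schema,
        'sourceFormat': 'CSV',          # delimited text; this is also the default
        'fieldDelimiter': delimiter,    # override the default comma
    }


# Placeholder values, for illustration only
config = build_load_config(
    'my-project', 'my_dataset', 'reta_1960',
    {'fields': [{'name': 'year', 'type': 'INTEGER'}]},
)
print(config['fieldDelimiter'])
```

The resulting dictionary would go under `"configuration": {"load": ...}` in the body of a `jobs.insert` request.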

In [30]:
import json
from time import time
import csv
import zipfile, io, requests
import httplib2
from apiclient.discovery import build
from oauth2client.client import flow_from_clientsecrets
from oauth2client.service_account import ServiceAccountCredentials
from oauth2client.file import Storage
from oauth2client import tools
import schema
from apiclient.http import MediaFileUpload 
import os

### Download all the datasets and prepare the schema

In [8]:
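# The table schema is defined in the local module schema.py
SCHEMA = schema.schema
# Download each year's zip archive (1960 through 2012), extract the CSV, and store it locally
for i in range(1960, 2013):
    request = requests.get('https://github.com/estheryou/fbi-reta-data/blob/master/recoded-data/reta_%s_data.csv.zip?raw=true' % i)
    # Stop early if a download failed instead of trying to unzip an error page
    request.raise_for_status()
    zfile = zipfile.ZipFile(io.BytesIO(request.content))
    test = zfile.open('reta_%s_data.csv' % i).read()
    # Write the extracted bytes unchanged; the file is closed automatically
    with open('reta_%s.csv' % i, 'wb') as a:
        a.write(test)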
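
Before uploading, it may be worth confirming that an extracted file really uses the pipe delimiter. The following is a minimal sketch, not part of the original notebook: `guess_delimiter` is a hypothetical helper that picks whichever candidate character occurs most often in a header line, and the sample header is made up.

```python
def guess_delimiter(header_line, candidates=('|', ',', '\t')):
    """Return the candidate character that occurs most often in the line.

    Ties are resolved in favour of the earlier candidate.
    """
    return max(candidates, key=header_line.count)


# Made-up header line, for illustration only
print(guess_delimiter('agency|year|state'))
```

In practice the first line of a downloaded file such as `reta_1960.csv` would be passed in place of the made-up header.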

### Project Information

In [17]:
project_id='my-fbi-resume-project'
dataset_id='the_data_set_from_fbi_test'

### Step 1: OAuth 2.0 authorization
Reference: https://developers.google.com/api-client-library/python/guide/aaa_oauth
Scopes: https://developers.google.com/identity/protocols/googlescopes


In [10]:
def auth():
    # Thanks to Stack Overflow
    # scopes is a list, so more scopes can be appended if needed
    scopes = ['https://www.googleapis.com/auth/bigquery']
    credentials = ServiceAccountCredentials.from_json_keyfile_name('My FBI resume.json', scopes)

    # Wrap a plain HTTP client so that every request carries the credentials
    # (the token could be saved for reuse; skipped here for simplicity)
    http0 = httplib2.Http()
    http = credentials.authorize(http0)
    Big_Query_service = build('bigquery', 'v2', http=http)
    # Return the jobs resource, which is used to submit load jobs
    JOBS = Big_Query_service.jobs()
    return JOBS

In [11]:
Bigquery=auth()

### Step 2: Upload the datasets into Google BigQuery

In [1]:
def upload(Jobs):
    t0 = time()
    for i in range(1960, 2013):
        TABLE_ID = 'reta_%s' % i
        # Job IDs must be unique within a project, so a rerun needs new IDs
        job_id = 'job_id_t%s' % i
        load = {
            'destinationTable': {
                'projectId': project_id,
                'datasetId': dataset_id,
                'tableId': TABLE_ID},
            'schema': SCHEMA,
            # The files are pipe-delimited, so override the default comma
            'fieldDelimiter': '|'
        }
        # The file contents are sent along with the request as a media upload;
        # resumable=False sends the whole file in a single request
        media = MediaFileUpload('reta_%s.csv' % i,
                                mimetype='application/octet-stream',
                                resumable=False)

        # No sourceUris entry is needed because the data comes from media_body
        Jobs.insert(projectId=project_id,
                    body={
                        "jobReference": {"projectId": project_id, "jobId": job_id},
                        "configuration": {
                            "load": load
                        }},
                    media_body=media).execute()
    print("time: %s s" % round(time() - t0, 3))

In [27]:
upload(Bigquery)

training time: 16183.883 s
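
Note that `jobs.insert` returns once the job has been submitted, so a load can still fail afterwards. A minimal sketch of checking the outcome by polling `jobs.get` follows. The helper name `wait_for_job` and its `poll_seconds` and `max_polls` parameters are assumptions for illustration; the `status.state` and `status.errorResult` fields come from the BigQuery job resource.

```python
from time import sleep


def wait_for_job(jobs, project_id, job_id, poll_seconds=5, max_polls=120):
    """Poll jobs.get() until the job reports DONE; raise if it failed.

    `jobs` is expected to be the jobs resource returned by auth().
    """
    for _ in range(max_polls):
        job = jobs.get(projectId=project_id, jobId=job_id).execute()
        status = job.get('status', {})
        if status.get('state') == 'DONE':
            # A finished job that failed carries an errorResult entry
            if 'errorResult' in status:
                raise RuntimeError(status['errorResult'])
            return job
        sleep(poll_seconds)
    raise RuntimeError('job %s did not finish in time' % job_id)
```

For example, `wait_for_job(Bigquery, project_id, 'job_id_t1960')` would block until the 1960 load finishes or raise if BigQuery reports an error.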


### Step 3: Remove the local files to free disk space

In [40]:
def remove_local():
    # Delete the extracted CSV files from the current working directory
    root = os.getcwd()
    for i in range(1960, 2013):
        path = os.path.join(root, 'reta_%s.csv' % i)
        print(path)
        os.remove(path)

## The Main Function

In [150]:
def main():
    # upload() needs the jobs resource that auth() returns
    jobs = auth()
    upload(jobs)
    remove_local()