# Task 2: A Sample of Owners
These files are not easy to use in their current chronological arrangement, though having them in a large system like GBQ will solve a lot of our problems. Nevertheless, it’ll be convenient to have a local sample of owners to work with.
This task asks you to generate a file that contains every record for each owner in a sample. There will be more than one owner in the file, and I do not want you to include card_no==3, which is the code for non-owners. The size of the sample is up to you, but I’d recommend shooting for a sample that’s around 250 MB. That’s big enough to be rich, but small enough to be fast. Ish.

Deliverable
A Python script that handles the following tasks:
1.	Connects to your GBQ instance.
2.	Builds a list of owners. 
3.	Takes a sample of the owners. 
4.	Extracts all records associated with those owners and writes them to a local text file. 

You’ll submit your code carrying out the steps. 

In [1]:
# Data handling and the pandas bridge to GBQ
import pandas as pd
import pandas_gbq

# Google client libraries for authentication and the BigQuery connection
from google.cloud import bigquery
from google.oauth2 import service_account

# GBQ Set-up

In [2]:
# These first two values will be different on your machine. 
service_path = "C:\\Users\\jshay\\OneDrive\\Documents\\Applied Data Analytics\\Wedge\\"
service_file = 'umt-msba-gg-key.json'
gbq_proj_id = 'umt-msba'
dataset_id = 'transactions'

# And this should stay the same. 
private_key = service_path + service_file

In [3]:
# Now we pass in our credentials so that Python has permission to access our project.
credentials = service_account.Credentials.from_service_account_file(private_key)

In [4]:
# And finally we establish our connection
client = bigquery.Client(credentials=credentials, project=gbq_proj_id)

In [5]:
for item in client.list_datasets():
    print(item.full_dataset_id)

umt-msba:dram_shop
umt-msba:transactions
umt-msba:wedge_example
umt-msba:wedge_transactions


# Build Owner List/Extract All Records

In [6]:
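# SQL query: every distinct card number except 3, the non-owner code
ownerq1 = """
    SELECT DISTINCT card_no AS dc
    FROM `umt-msba.transactions.transArchive_*`
    WHERE card_no != 3
"""

# Pass the service-account credentials explicitly so the query runs under them
owners_list = pandas_gbq.read_gbq(ownerq1, project_id=gbq_proj_id, credentials=credentials)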

Downloading: 100%|██████████| 27207/27207 [00:01<00:00, 17191.50rows/s]


In [7]:
# Randomly pick 30 owners, then pull their card numbers into a plain list
owners_list = owners_list.sample(n=30)

wedge_owners = owners_list.dc.tolist()

In [8]:
print(wedge_owners)

[10623.0, 15461.0, 51381.0, 50597.0, 59938.0, 16834.0, 51744.0, 14689.0, 14780.0, 34298.0, 51322.0, 17878.0, 10480.0, 64990.0, 11171.0, 15419.0, 18878.0, 37515.0, 14542.0, 17854.0, 47954.0, 24204.0, 51137.0, 13962.0, 16797.0, 65836.0, 16909.0, 11641.0, 22755.0, 15697.0]
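
The sample drawn above changes on every run because no seed is set. Below is a minimal sketch of a reproducible draw. It assumes a small stand-in frame `owners` with the same `dc` column; the helper name `sample_owner_ids` and the seed value are illustrative and not part of the original notebook. It also casts the float card numbers to integers, on the assumption that card numbers are always whole numbers.

```python
import pandas as pd


def sample_owner_ids(owners: pd.DataFrame, n: int, seed: int) -> list:
    """Draw n owner card numbers reproducibly and return them as ints."""
    picked = owners.sample(n=n, random_state=seed)
    # Assumes card numbers are whole numbers stored as floats (e.g. 10623.0).
    return picked["dc"].astype(int).tolist()


# Hypothetical stand-in for the frame returned by the DISTINCT query.
owners = pd.DataFrame({"dc": [10623.0, 15461.0, 51381.0, 50597.0, 59938.0]})

first = sample_owner_ids(owners, n=3, seed=42)
second = sample_owner_ids(owners, n=3, seed=42)
print(first == second)  # the same seed yields the same sample
```

Fixing the seed makes it possible to regenerate the same local file later, which helps when results need to be compared across runs.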


In [9]:
owners_sample = """
    SELECT *
    FROM `umt-msba.transactions.transArchive_*`
    WHERE card_no IN (
"""

In [10]:
",".join([str(num) for num in wedge_owners])

'10623.0,15461.0,51381.0,50597.0,59938.0,16834.0,51744.0,14689.0,14780.0,34298.0,51322.0,17878.0,10480.0,64990.0,11171.0,15419.0,18878.0,37515.0,14542.0,17854.0,47954.0,24204.0,51137.0,13962.0,16797.0,65836.0,16909.0,11641.0,22755.0,15697.0'

In [11]:
# Append the comma-separated card numbers and close the IN (...) clause
final_owners = owners_sample + ",".join(str(num) for num in wedge_owners) + ")"
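
The string concatenation above can also be wrapped in a small helper that checks the card numbers before they reach the query. This is a sketch only: the function name `build_owner_query` is hypothetical, and it assumes card numbers are whole numbers so they can be written as integer literals.

```python
def build_owner_query(card_numbers, table="umt-msba.transactions.transArchive_*"):
    """Return a SELECT that pulls every record for the given card numbers."""
    ids = []
    for num in card_numbers:
        as_int = int(num)
        if as_int != num:
            raise ValueError(f"card number {num!r} is not a whole number")
        if as_int == 3:
            raise ValueError("card_no 3 is the non-owner code; exclude it")
        ids.append(str(as_int))
    if not ids:
        raise ValueError("need at least one card number")
    id_list = ", ".join(ids)
    return f"SELECT *\nFROM `{table}`\nWHERE card_no IN ({id_list})"


query = build_owner_query([10623.0, 15461.0, 51381.0])
print(query)
```

Converting each value with `int()` before formatting means nothing other than digits ends up inside the `IN (...)` clause, and the explicit check keeps the non-owner code out of the sample even if the owner list was built differently.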

In [12]:
wedge_owners_sample = pandas_gbq.read_gbq(final_owners, project_id=gbq_proj_id, credentials=credentials)

Downloading: 100%|██████████| 62556/62556 [00:53<00:00, 1174.49rows/s]


In [13]:
wedge_owners_sample.shape

(62556, 50)

# Save output to CSV file

In [14]:
# index=False keeps the DataFrame's row index out of the file
wedge_owners_sample.to_csv('wedgeresults.csv', index=False)
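
The task suggests a sample of around 250 MB, so it is worth checking the size of the file once it is written. The sketch below measures a file on disk and estimates how many owners would be needed to reach a target size. The helper names are hypothetical, the 25 MB figure in the example call is a made-up placeholder rather than a measured result, and linear scaling is only a rough assumption because owners differ in how much they buy.

```python
import math
import os
import tempfile


def file_size_mb(path):
    """Size of a file on disk, in megabytes."""
    return os.path.getsize(path) / (1024 * 1024)


def owners_needed(current_owners, current_mb, target_mb=250):
    """Scale the owner count linearly to reach the target file size."""
    if current_mb <= 0:
        raise ValueError("current_mb must be positive")
    return math.ceil(current_owners * target_mb / current_mb)


# Demonstration on a throwaway file; in practice pass 'wedgeresults.csv'.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.csv")
    with open(path, "w", newline="") as handle:
        handle.write("card_no,total\n" * 1000)
    demo_mb = file_size_mb(path)

print(f"demo file: {demo_mb:.4f} MB")

# Placeholder numbers: if 30 owners produced a 25 MB file, aim for this many.
print(owners_needed(current_owners=30, current_mb=25.0))
```

Rerunning the sampling step with the suggested owner count and measuring again is a simple way to converge on the target size.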