# ST446 Distributed Computing for Big Data

## Week 2 class: Google Bigtable 2

### LT 2020

Following on from the [previous exercise](google_bigtable_class_activity.ipynb), you will now connect to an existing Bigtable instance and add a row describing your laptop's system properties.

You may use what we covered in week 1 of the course to retrieve this information from a command line interface. This will allow us to collect information about system properties of laptops used in this course.  
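
As an alternative to the command line, some of these properties can also be read from Python. The following is a minimal sketch using only the standard library; the function name `collect_sysinfo` is hypothetical, and its keys simply mirror the column names used later in this notebook. The standard library does not portably expose total memory, the physical core count or the CPU frequency, so those are left out here and would still have to come from the command line.

```python
import os
import platform
import shutil


def collect_sysinfo(path=os.path.abspath(os.sep)):
    """Return a dict of system properties as strings (hypothetical helper)."""
    usage = shutil.disk_usage(path)  # total, used and free bytes for the given path
    return {
        "num_of_cpu": str(os.cpu_count() or "unknown"),  # logical CPUs
        "os": f"{platform.system()} {platform.release()}",
        "processor_type": platform.machine(),
        "kernel_version": platform.version(),
        "disk_size": str(usage.total),       # in bytes
        "free_disk_space": str(usage.free),  # in bytes
    }


info = collect_sysinfo()
for name, value in info.items():
    print(f"{name}: {value}")
```

The values are returned as strings because they are later encoded to bytes before being written to Bigtable.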

You will be informed once the Bigtable instance is available.


### Initial steps

Before running the code below for the first time, you need to:

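* download the service account credential file we created for you from [here](st446-lent-715346564d9e.json) and save it in the same directory as this notebook, since the code below loads it by its file name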

## 2. Adding your laptop system information to a Bigtable table

### 2.A. Connecting to an existing Bigtable instance

In [10]:
from google.cloud import bigtable
from google.oauth2 import service_account

import numpy as np
import pandas as pd

# please use the credentials we provide:
credentials = service_account.Credentials.from_service_account_file('st446-lent-715346564d9e.json')
print(credentials.service_account_email)
print(credentials.project_id)

524309205186-compute@developer.gserviceaccount.com
st446-lent


In [11]:
project_id = "st446-lent"
# omit the credentials argument if you use the default credentials instead
client = bigtable.Client(project=project_id, credentials=credentials, admin=True)
instance = client.instance("st446-bigtable-instance-milan")
table = instance.table("master-table")

cf_sysinfo = "sysinfo"

### 2.B. Adding your laptop system information 

In [12]:
# use a unique row key (e.g. your name): writing to an existing key overwrites that row's cells
row_key = 'testname'

d = {
    'num_of_cpu'.encode('utf-8'): '4'.encode('utf-8'),
    'num_physical_cpu'.encode('utf-8'): '2'.encode('utf-8'),
    'memory_size'.encode('utf-8'): '17179869184'.encode('utf-8'),
    'os'.encode('utf-8'): 'Mac OS Sierra'.encode('utf-8'),
    'processor_type'.encode('utf-8'): 'x86_64h (Intel x86-64h Haswell)'.encode('utf-8'),
    'cpu_frequency'.encode('utf-8'): '3300000000'.encode('utf-8'),
    'disk_size'.encode('utf-8'): '931GB'.encode('utf-8'),
    'kernel_version'.encode('utf-8'): 'Darwin Kernel Version 16.7.0: Mon Nov 13 21:56:25 PST 2017; root:xnu-3789.72.11~1/RELEASE_X86_64'.encode('utf-8'),
    'free_disk_space'.encode('utf-8'): '679GB'.encode('utf-8')
    }

row = table.row(row_key)

for col_id, val in d.items():
    row.set_cell(cf_sysinfo, col_id, val)

row.commit()
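
Bigtable stores column qualifiers and cell values as raw bytes, which is why each key and value in the dictionary above is encoded. A small helper can do the encoding in one place. This is only a sketch, and `encode_cells` is a hypothetical name, not part of the Bigtable client library.

```python
def encode_cells(properties, encoding="utf-8"):
    """Encode every key and value of a dict as bytes (values are stringified first)."""
    return {
        column.encode(encoding): str(value).encode(encoding)
        for column, value in properties.items()
    }


# a shortened, illustrative subset of the properties
encoded = encode_cells({
    "num_of_cpu": "4",
    "os": "Mac OS Sierra",
    "disk_size": "931GB",
})
print(encoded)
```

The resulting dictionary can be passed through the same `row.set_cell` loop as `d` above.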

### 2.C. Reading rows of the table

In [13]:
partial_rows = table.read_rows()
# to read only a range of row keys, pass a start and an end key instead:
# partial_rows = table.read_rows(start_key=b'milan', end_key=b'milan21')
partial_rows.consume_all()

# result maps each row key to its list of cell values; it is converted to a pandas DataFrame below
result = {}

col_name = None

for row_key, row in partial_rows.rows.items():

    key = row_key.decode('utf-8')
    cells = row.cells[cf_sysinfo]  # all cells in the sysinfo column family

    # take the column names from the first row; this assumes every row has the same columns
    if col_name is None:
        col_name = [k.decode('utf-8') for k in cells.keys()]

    # collect the values of one row
    one_row_result = []

    for col_key, col_val in cells.items():
        value = col_val[0].value  # index 0 is the most recent version of the cell
        one_row_result.append(value.decode('utf-8'))
    result[key] = one_row_result
    
df = pd.DataFrame.from_dict(result, orient='index')
df.columns = col_name
df

| | cpu_frequency | disk_size | free_disk_space | kernel_version | memory_size | num_of_cpu | num_physical_cpu | os | processor_type |
|---|---|---|---|---|---|---|---|---|---|
| testname | 3300000000 | 931GB | 679GB | Darwin Kernel Version 16.7.0: Mon Nov 13 21:56... | 17179869184 | 4 | 2 | Mac OS Sierra | x86_64h (Intel x86-64h Haswell) |
| yourname | 3300000000 | 931GB | 679GB | Darwin Kernel Version 16.7.0: Mon Nov 13 21:56... | 17179869184 | 4 | 2 | Mac os | x86_64h (Intel x86-64h Haswell) |
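
The loop above takes the column names from the first row, so it relies on every row having the same set of columns. If some rows were missing a property, keeping one dictionary per row would be more robust. The sketch below shows the idea on plain Python data rather than a live Bigtable response; `rows_to_records` and the sample input are hypothetical, with each row mapping a column qualifier to a list of cell values, newest first.

```python
def rows_to_records(rows, encoding="utf-8"):
    """Convert {row_key: {qualifier: [values, newest first]}} to {str: {str: str}}."""
    records = {}
    for row_key, cells in rows.items():
        records[row_key.decode(encoding)] = {
            # keep only the newest version of each cell
            qualifier.decode(encoding): versions[0].decode(encoding)
            for qualifier, versions in cells.items()
        }
    return records


# hypothetical sample: the second row lacks the 'os' column
sample = {
    b"testname": {b"num_of_cpu": [b"4"], b"os": [b"Mac OS Sierra"]},
    b"yourname": {b"num_of_cpu": [b"8"]},
}
records = rows_to_records(sample)
print(records)
```

Passing such a dictionary to `pd.DataFrame.from_dict(records, orient='index')` aligns the values by column name and leaves missing entries as `NaN`.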
