<a href="https://colab.research.google.com/github/PRASAD212019/GenerativeAI/blob/main/IAAC_Intro_to_BigQuery.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Setup

In [None]:
from google.colab import auth
# Log in with the Google account that is attached to your GCP account
auth.authenticate_user()

In [None]:
!gcloud alpha billing accounts list

ACCOUNT_ID            NAME                OPEN  MASTER_ACCOUNT_ID
01C5D5-791400-E12C43  My Billing Account  True


In [None]:
import os
# Use a globally unique project ID
project_id = 'bq-ven-02'

# Replace this with your own billing account ID (ACCOUNT_ID in the listing above)
bac_id = "01C5D5-791400-E12C43"
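
Project IDs must follow a fixed format: 6 to 30 characters, lowercase letters, digits and hyphens only, starting with a letter and not ending with a hyphen. The sketch below checks that format locally before any API call is made. The helper name `is_valid_project_id` is hypothetical, and a passing check says nothing about whether the ID is still available, since uniqueness is only verified when the project is created.

```python
import re

# Format rule: 6-30 characters, lowercase letters, digits and hyphens,
# starting with a letter and not ending with a hyphen.
_PROJECT_ID_RE = re.compile(r"[a-z][a-z0-9-]{4,28}[a-z0-9]")


def is_valid_project_id(candidate: str) -> bool:
    """Return True if `candidate` matches the project ID format (hypothetical helper)."""
    return _PROJECT_ID_RE.fullmatch(candidate) is not None


print(is_valid_project_id("bq-ven-02"))  # the ID used in this notebook
print(is_valid_project_id("BQ_ven"))     # uppercase and underscore are rejected
```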



## Project Creation

In [None]:
from pprint import pprint

from googleapiclient import discovery
from oauth2client.client import GoogleCredentials

credentials = GoogleCredentials.get_application_default()

service = discovery.build('cloudresourcemanager', 'v1', credentials=credentials)

project_body = {
    "projectId": project_id,
    "name": project_id,
}

request = service.projects().create(body=project_body)
response = request.execute()
# The response is a long-running operation; its name identifies the pending creation request
pprint(response)

{'name': 'operations/cp.8509892374942794122'}
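
The create call returns an operation name rather than a finished project, so later steps can fail if they run before the operation completes. A minimal sketch of polling that operation follows. It assumes `service` is the `cloudresourcemanager` v1 client built above; the function name `wait_for_operation` and its timing parameters are hypothetical.

```python
import time


def wait_for_operation(service, operation_name, poll_seconds=2.0,
                       timeout_seconds=120.0, sleep=time.sleep):
    """Poll a long-running operation until it is done (hypothetical helper).

    Returns the operation's `response` dict, raises RuntimeError if the
    operation finished with an error, and TimeoutError if it never finishes.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        op = service.operations().get(name=operation_name).execute()
        if op.get("done"):
            if "error" in op:
                raise RuntimeError("Operation failed: {}".format(op["error"]))
            return op.get("response", {})
        if time.monotonic() >= deadline:
            raise TimeoutError("Timed out waiting for " + operation_name)
        sleep(poll_seconds)
```

In the notebook this would be called as `wait_for_operation(service, response['name'])` right after the create request.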


In [None]:
# Show the project ID in use
project_id

In [None]:
# Grant this service account the Owner role so the project can be deleted automatically later
!gcloud projects add-iam-policy-binding bq-ven-02 \
--member "serviceAccount:linear-cinema-270422@appspot.gserviceaccount.com" --role "roles/owner"

In [None]:
# Same binding for a second project, so that it can also be deleted automatically later
!gcloud projects add-iam-policy-binding first-fuze-271821 \
--member "serviceAccount:linear-cinema-270422@appspot.gserviceaccount.com" --role "roles/owner"

In [None]:
# Same binding, using the project_id variable instead of a hard-coded ID
!gcloud projects add-iam-policy-binding {project_id} \
--member "serviceAccount:linear-cinema-270422@appspot.gserviceaccount.com" --role "roles/owner"

ERROR: (gcloud.projects.add-iam-policy-binding) INVALID_ARGUMENT: Request contains an invalid argument.


In [None]:
os.environ['DEVSHELL_PROJECT_ID'] = project_id
os.environ['PROJECT'] = project_id


### Set Billing on Project

In [None]:
name = 'projects/' + project_id
service = discovery.build('cloudbilling', 'v1',  cache_discovery=False)
billing_request = service.projects().updateBillingInfo(name=name,
                            body={"billingAccountName": "billingAccounts/" + bac_id,
                                  "billingEnabled": True})
billing_response = billing_request.execute()
pprint(billing_response)

{'billingAccountName': 'billingAccounts/01C5D5-791400-E12C43',
 'billingEnabled': True,
 'name': 'projects/bq-ven-02/billingInfo',
 'projectId': 'bq-ven-02'}


In [None]:
!gcloud config set project {project_id}

Updated property [core/project].


### Create Bucket

In [None]:
# If this fails, retry after a few seconds; the billing update may not have taken effect yet

from google.cloud import storage

bucket_name = project_id

storage_client = storage.Client(project=project_id)

bucket = storage_client.create_bucket(bucket_name)

print("Bucket {} created".format(bucket.name))

Bucket bq-ven-02 created
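
Because bucket creation can fail while the billing change is still taking effect, the manual "try once more" step can be wrapped in a small retry loop. This is a sketch only: the helper name `retry`, the attempt count and the delay are assumptions, not values taken from the notebook.

```python
import time


def retry(action, attempts=3, delay_seconds=5.0, sleep=time.sleep):
    """Call `action()` until it succeeds or `attempts` is exhausted (hypothetical helper).

    Waits `delay_seconds` between failed attempts and re-raises the last
    exception if every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            sleep(delay_seconds)
```

With the client from the cell above, the call would look like `bucket = retry(lambda: storage_client.create_bucket(bucket_name))`.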


### Enable APIs

In [None]:
!gcloud services list --available

In [None]:
# takes a few minutes
!gcloud services enable bigquery.googleapis.com
!gcloud services enable bigquerydatatransfer.googleapis.com



Operation "operations/acf.803014a6-bb0a-414c-b595-1d67cc02278f" finished successfully.
Operation "operations/acf.042c865f-f07e-410d-9cd3-f08a75c432e8" finished successfully.


In [None]:
# You can skip running these
!gcloud services enable cloudbuild.googleapis.com
!gcloud services enable pubsub.googleapis.com
!gcloud services enable compute.googleapis.com
!gcloud services enable storage-api.googleapis.com
!gcloud services enable storage-component.googleapis.com
!gcloud services enable servicemanagement.googleapis.com
!gcloud services enable iam.googleapis.com
!gcloud services enable bigquery.googleapis.com
!gcloud services enable dataproc.googleapis.com

# Lab 1 - Explore BQ using public dataset

In this lab you:

- Query a public dataset

- Create a custom table

- Load data into a table

- Query a table

## Query a public dataset

In this section, you query a public dataset, USA Names, in BigQuery to determine the most common names in the US between 1910 and 2013.

### USA Names
Query `bigquery-public-data.usa_names.usa_1910_2013` for the total number of babies per name and gender, then list the top 10 in descending order of total.

In [None]:
%%bigquery --project {project_id}
SELECT
  name, gender,
  SUM(number) AS total
FROM
  `bigquery-public-data.usa_names.usa_1910_2013`
GROUP BY
  name, gender
ORDER BY
  total DESC
LIMIT
  10


Unnamed: 0,name,gender,total
0,James,M,4924235
1,John,M,4818746
2,Robert,M,4703680
3,Michael,M,4280040
4,William,M,3811998
5,Mary,F,3728041
6,David,M,3541625
7,Richard,M,2526927
8,Joseph,M,2467298
9,Charles,M,2237170


In [None]:
%%bigquery --project {project_id} df
-- The cell magic must be the first line of the cell; the result is stored in the DataFrame df
SELECT
  name, gender,
  SUM(number) AS total
FROM
  `bigquery-public-data.usa_names.usa_1910_2013`
GROUP BY
  name, gender
ORDER BY
  total DESC


In [None]:
# show the first 5 rows of the DataFrame
df.head(5)

Unnamed: 0,name,gender,total
0,James,M,4924235
1,John,M,4818746
2,Robert,M,4703680
3,Michael,M,4280040
4,William,M,3811998


In [None]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 32680 entries, 0 to 32679
Data columns (total 3 columns):
name      32680 non-null object
gender    32680 non-null object
total     32680 non-null int64
dtypes: int64(1), object(2)
memory usage: 766.1+ KB


## Create Custom Table

### Download the data to the Colab VM


In [None]:
!wget -q http://www.ssa.gov/OACT/babynames/names.zip

In [None]:
!ls -al

total 7060
drwxr-xr-x 1 root root    4096 Mar 21 22:49 .
drwxr-xr-x 1 root root    4096 Mar 21 22:28 ..
-rw-r--r-- 1 root root    2669 Mar 21 22:30 adc.json
drwxr-xr-x 1 root root    4096 Mar 21 22:30 .config
-rw-r--r-- 1 root root 7200451 May 10  2019 names.zip
drwxr-xr-x 1 root root    4096 Mar 18 16:23 sample_data


In [None]:
!unzip names.zip

In [None]:
!ls -al /content/yob2014.txt

-rw-r--r-- 1 root root 428107 Apr  5  2019 /content/yob2014.txt


In [None]:
!head /content/yob2014.txt

Emma,F,20936
Olivia,F,19807
Sophia,F,18609
Isabella,F,17089
Ava,F,15696
Mia,F,13512
Emily,F,12647
Abigail,F,12085
Madison,F,10320
Charlotte,F,10115
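
The `head` output shows that the file has no header row and three comma-separated fields per line: name, gender and count. A small sketch that parses those same ten lines with the standard `csv` module makes the column types explicit before the table schema is defined. The function name `parse_names` is hypothetical.

```python
import csv
import io

# The first ten lines of yob2014.txt, copied from the `head` output above.
SAMPLE = """\
Emma,F,20936
Olivia,F,19807
Sophia,F,18609
Isabella,F,17089
Ava,F,15696
Mia,F,13512
Emily,F,12647
Abigail,F,12085
Madison,F,10320
Charlotte,F,10115
"""


def parse_names(text):
    """Parse headerless `name,gender,count` lines into typed tuples."""
    rows = []
    for name, gender, count in csv.reader(io.StringIO(text)):
        rows.append((name, gender, int(count)))  # count is the only integer column
    return rows


rows = parse_names(SAMPLE)
print(rows[0])
print(len(rows))
```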


In [None]:
!gsutil cp /content/yob2014.txt gs://{project_id}

Copying file:///content/yob2014.txt [Content-Type=text/plain]...
-
Operation completed over 1 objects/418.1 KiB.                                    


In [None]:
!gsutil ls -al gs://{project_id}

    428107  2020-03-21T22:50:58Z  gs://bq-ven-02/yob2014.txt#1584831058691417  metageneration=1
TOTAL: 1 objects, 428107 bytes (418.07 KiB)


### Create a dataset


In [None]:
dataset_name = "babynames"

In [None]:
!bq --location=US mk -d {dataset_name} --project_id {project_id}


Welcome to BigQuery! This script will walk you through the 
process of initializing your .bigqueryrc configuration file.

First, we need to set up your credentials if they do not 
already exist.

Credential creation complete. Now we will select a default project.

List of projects:
  #        projectId          friendlyName   
 --- ---------------------- ---------------- 
  1   bq-ven-02              bq-ven-02       
  2   linear-cinema-270422   Master Project  
  3   first-fuze-271821      Test Project    
Found multiple projects. Please enter a selection for 
which should be the default, or leave blank to not 
set a default.

Enter a selection (1 - 3): 1

BigQuery configuration complete! Type "bq" to get started.

Dataset 'bq-ven-02:babynames' successfully created.


In [None]:
!bq mk \
--table \
{project_id}:{dataset_name}.names_2014 \
name:string,gender:string,count:integer


### Load the data into a new table


In [None]:
!bq load \
    --source_format=CSV \
    {project_id}:{dataset_name}.names_2014 \
    gs://{project_id}/yob2014.txt \
    name:STRING,gender:STRING,count:INTEGER

Waiting on bqjob_r2d6bf6951a0b8aa3_00000170ff4e3eec_1 ... (0s) Current status: DONE   
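
The same load can be done from Python instead of the `bq` command line. The sketch below assumes the `google-cloud-bigquery` client library is installed and that the bucket is named after the project, as it is in this notebook. The function names `load_targets` and `load_names_csv` are hypothetical.

```python
def load_targets(project_id, dataset_name, table_name, object_name):
    """Build the source URI and destination table ID for the load job."""
    uri = "gs://{}/{}".format(project_id, object_name)  # bucket is named after the project
    table_id = "{}.{}.{}".format(project_id, dataset_name, table_name)
    return uri, table_id


def load_names_csv(project_id, dataset_name="babynames",
                   table_name="names_2014", object_name="yob2014.txt"):
    """Load the CSV file from Cloud Storage into BigQuery; return the table's row count."""
    # Imported here so load_targets() can be used without the client library.
    from google.cloud import bigquery

    uri, table_id = load_targets(project_id, dataset_name, table_name, object_name)
    client = bigquery.Client(project=project_id)
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        schema=[
            bigquery.SchemaField("name", "STRING"),
            bigquery.SchemaField("gender", "STRING"),
            bigquery.SchemaField("count", "INTEGER"),
        ],
    )
    load_job = client.load_table_from_uri(uri, table_id, job_config=job_config)
    load_job.result()  # blocks until the job finishes and raises if it failed
    return client.get_table(table_id).num_rows


print(load_targets("bq-ven-02", "babynames", "names_2014", "yob2014.txt"))
```

Note that the Python client writes table IDs with dots (`project.dataset.table`), whereas the `bq` tool uses `project:dataset.table`.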


### Query the table


In [None]:
%%bigquery  --project {project_id}
SELECT
 name, count
FROM
 `babynames.names_2014`
WHERE
 gender = 'M'
ORDER BY count DESC LIMIT 5

Unnamed: 0,name,count
0,Noah,19305
1,Liam,18462
2,Mason,17201
3,Jacob,16883
4,William,16820


## Congratulations!
You queried a public dataset, then created a custom table, loaded data into it, and then ran a query against that table.

# Delete Project

In [None]:
from googleapiclient import discovery
from oauth2client.client import GoogleCredentials

credentials = GoogleCredentials.get_application_default()

service = discovery.build('cloudresourcemanager', 'v1', credentials=credentials)


request = service.projects().delete(projectId=project_id)
request.execute()
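
The delete request does not remove the project immediately; it marks the project for deletion, and the project stays recoverable for a limited period. A minimal sketch for confirming that the request took effect follows. It assumes `service` is the `cloudresourcemanager` v1 client built above, and the function name `project_lifecycle_state` is hypothetical.

```python
def project_lifecycle_state(service, project_id):
    """Return the project's lifecycleState, e.g. 'ACTIVE' or 'DELETE_REQUESTED'."""
    project = service.projects().get(projectId=project_id).execute()
    return project.get("lifecycleState")
```

After a successful delete, `project_lifecycle_state(service, project_id)` is expected to return `'DELETE_REQUESTED'`.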