<h1>Imports and Initializations</h1>
<ul>
<li><b>dataset_id</b> is the id of the Google BigQuery dataset</li>
<li><b>table_id</b> is the id of the Google BigQuery table</li>
<li><b>bucket_name</b> is the name of the Google bucket</li>
<li><b>root</b> is the root folder of the data inside the Google bucket (the same folder name is used locally)</li>
</ul>
<p>The actual data is stored in the Google bucket; BigQuery is used to look up which files to fetch.</p>
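
<p>As a rough sketch of how the settings fit together: the download code later in this notebook addresses each file as <b>root/type/name</b>, both in the bucket and on local disk. The helper name <b>blob_path</b> and the values <b>'yes'</b> and <b>'example.wav'</b> are made up for illustration and are not part of the notebook.</p>

```python
def blob_path(root, resource_type, name):
    """Build the object path that is used in the bucket and on local disk."""
    return f"{root}/{resource_type}/{name}"


# 'yes' and 'example.wav' are hypothetical values, used only for illustration.
print(blob_path("data_sample", "yes", "example.wav"))
```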

In [1]:
import os

from google.cloud import storage
from google.cloud import bigquery

from ipywidgets import IntProgress

dataset_id = 'dspd_aftabkhalil_dataset'
# Change to 'sounds' to download the complete data
table_id = 'sounds_sample'

bucket_name = "dspd_aftabkhalil_bucket"

# Change to 'data' to download the complete data
# Do not prefix root with ./
root = 'data_sample'

<h1>Function to execute any query on BigQuery</h1>

In [2]:
def run_query(query):
    client = bigquery.Client()
    query_job = client.query(query)
    data = query_job.result()
    return data

<h1>Get the types of resources from the BigQuery table</h1>

In [3]:
def get_resource_types():
    query = f'SELECT type FROM {dataset_id}.{table_id} group by type;'
    result = run_query(query)
    # Each row exposes its columns through .get()
    return [row.get('type') for row in result]

<h1>Get resources from the BigQuery table</h1>

In [4]:
def get_resources(root, resource_type):
    query = (f'SELECT name FROM {dataset_id}.{table_id} '
             f'WHERE type = "{resource_type}" AND location LIKE "{root}/{resource_type}/%" '
             f'GROUP BY name')
    result = run_query(query)
    # Collect the file names of all resources of this type
    return [row.get('name') for row in result]
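
<p>To see what is actually sent to BigQuery, the query text can be rendered without running it. The sketch below repeats the same f-string in a standalone helper; the name <b>build_resources_query</b> and the type <b>'yes'</b> are assumptions made for illustration.</p>

```python
def build_resources_query(dataset_id, table_id, root, resource_type):
    """Render the same query text that get_resources builds, without running it."""
    return (f'SELECT name FROM {dataset_id}.{table_id} '
            f'WHERE type = "{resource_type}" AND location LIKE "{root}/{resource_type}/%" '
            f'GROUP BY name')


# 'yes' is a hypothetical resource type, used only for illustration.
print(build_resources_query('dspd_aftabkhalil_dataset', 'sounds_sample',
                            'data_sample', 'yes'))
```

<p>The <b>LIKE</b> pattern ends in <b>%</b>, so only rows whose location starts with <b>root/type/</b> are matched.</p>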

<h1>Get or create the Google bucket</h1>

In [5]:
def create_or_get_bucket(bucket_name):

    # Create storage client
    storage_client = storage.Client()

    # Get already existing buckets
    buckets = list(storage_client.list_buckets())

    # Check if the required bucket already exists
    bucket = next((b for b in buckets if b.name == bucket_name), None)

    # If the bucket already exists, return it
    if bucket is not None:
        print(f'Bucket already exists {bucket.name} in {bucket.location} with storage class {bucket.storage_class}')
        return bucket

    # Otherwise create and return the bucket
    bucket = storage_client.bucket(bucket_name)
    new_bucket = storage_client.create_bucket(bucket, location="us")
    print(f'Created bucket {new_bucket.name} in {new_bucket.location} with storage class {new_bucket.storage_class}')
    return new_bucket

_ = create_or_get_bucket(bucket_name)

Bucket already exists dspd_aftabkhalil_bucket in US with storage class STANDARD


<h1>Function to download a resource from the Google bucket</h1>

In [6]:
storage_client = storage.Client()
bucket = storage_client.bucket(bucket_name)

def download_blob(resource_full_path):
    blob = bucket.blob(resource_full_path)
    blob.download_to_filename(resource_full_path)
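
<p>Note that <b>download_to_filename</b> writes to a local file, so the target folder has to exist before the call. A minimal sketch of a guard for that, assuming a hypothetical helper named <b>ensure_parent_dir</b> (not part of the notebook or of the Google client library):</p>

```python
import os
import tempfile


def ensure_parent_dir(path):
    """Create the parent folder of path if it is missing and return it."""
    parent = os.path.dirname(path)
    if parent:  # an empty string means the current directory
        os.makedirs(parent, exist_ok=True)
    return parent


# Demonstration in a throwaway directory; the file name is made up.
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "data_sample", "yes", "example.wav")
    ensure_parent_dir(target)
    print(os.path.isdir(os.path.dirname(target)))
```

<p>Calling such a helper at the start of <b>download_blob</b> would make the function usable on its own, without relying on the caller to create folders first.</p>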

<h1>Let's Download!</h1>
<p>Note that a file is only downloaded if it does not already exist locally or if force_download is set to True</p>
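
<p>The rule above can be sketched as a small predicate. The name <b>should_download</b> is hypothetical; the notebook performs the same check inline.</p>

```python
import os
import tempfile


def should_download(local_path, force_download=False):
    """Return True when the file must be fetched from the bucket."""
    return force_download or not os.path.exists(local_path)


# Demonstration with a temporary file; the file name is made up.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.wav")
    print(should_download(path))        # file is missing
    open(path, "wb").close()            # simulate an earlier download
    print(should_download(path))        # file is present
    print(should_download(path, True))  # forced re-download
```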

In [7]:
def download_dataset(root, force_download = False):
    resource_types = get_resource_types()
    resource_types.sort()
    
    max_count = len(resource_types)
    
    print(f'There are a total of {max_count} types in dataset')    
    progress_bar = IntProgress(min=0, max=max_count)
    display(progress_bar)

    for resource_type in resource_types:
        # Make sure the local folder for this type exists
        folder = f'{root}/{resource_type}'
        os.makedirs(folder, exist_ok=True)

        remote_resources = get_resources(root, resource_type)

        for remote_resource in remote_resources:
            resource_full_path = f'{folder}/{remote_resource}'
            # Download only if forced or if the file is missing locally
            if force_download or not os.path.exists(resource_full_path):
                download_blob(resource_full_path)

        # Advance the progress bar once a whole type has been downloaded
        progress_bar.value += 1
                
download_dataset(root)
print("Download complete")

There are a total of 30 types in dataset


IntProgress(value=0, max=30)

Download complete


<h1>Wait for the above block to print "Download complete" 🐍</h1>

<hr><hr>