## Project to Upload Files to GCS using Python

As part of this series of lectures, we will see how to upload files to GCS using Python. We will use `glob`, `os`, and `storage` from `google.cloud` to build the application logic.

Here are the design details.
* First, we need to get the list of file names to upload from the local file system.
* We need to build a `blob` object for each file.
* We can call `upload_from_filename` on each blob object to upload the file as a blob in GCS.
* We will use a metadata-driven (data-driven) approach to upload all the retail-related files to GCS.
* Blob names will be derived from the local file paths.
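The design above can be sketched as a small metadata-driven planner: a list of config entries describes which local directory goes to which prefix in the bucket, and one function expands it into `(local_path, blob_name)` pairs. This is a minimal sketch, not the author's implementation; `plan_uploads` and the `src_base_dir`/`tgt_base_dir` config keys are hypothetical names, and a temporary directory stands in for `../../data/retail_db`. Nothing is uploaded here.

```python
import glob
import os
import tempfile


def plan_uploads(config):
    """Return (local_path, blob_name) pairs for every file described by `config`.

    Each config entry is a dict with hypothetical keys 'src_base_dir' and 'tgt_base_dir'.
    """
    pairs = []
    for entry in config:
        src = os.path.normpath(entry['src_base_dir'])
        parent = os.path.dirname(src)  # keep the dataset folder (e.g. retail_db) in the blob name
        for path in sorted(glob.glob(f'{src}/**', recursive=True)):
            if os.path.isfile(path):  # directories are skipped, only files become blobs
                suffix = os.path.relpath(path, parent).replace(os.sep, '/')
                pairs.append((path, f"{entry['tgt_base_dir']}/{suffix}"))
    return pairs


# Usage with a throwaway directory standing in for ../../data/retail_db
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, 'retail_db', 'orders'))
for rel in ('retail_db/orders/part-00000', 'retail_db/schemas.json'):
    with open(os.path.join(root, rel), 'w') as fh:
        fh.write('sample\n')

config = [{'src_base_dir': os.path.join(root, 'retail_db'), 'tgt_base_dir': 'pythondemo'}]
plan = plan_uploads(config)
blob_names = [blob_name for _, blob_name in plan]
print(blob_names)
```

Keeping the plan separate from the upload makes it easy to print or test the blob names before anything touches the bucket.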

In [2]:
!gsutil rm -r gs://airetail_mld/pythondemo

CommandException: No URLs matched: gs://airetail_mld/pythondemo


In [3]:
!gsutil ls gs://airetail_mld/

gs://airetail_mld/retail_db/


In [4]:
import glob

In [5]:
src_base_dir = '../../data/retail_db'

In [6]:
items = glob.glob(f'{src_base_dir}/**', recursive=True)

In [7]:
items

['../../data/retail_db/',
 '../../data/retail_db/customers',
 '../../data/retail_db/customers/part-00000',
 '../../data/retail_db/products',
 '../../data/retail_db/products/part-00000',
 '../../data/retail_db/create_db_tables_pg.sql',
 '../../data/retail_db/departments',
 '../../data/retail_db/departments/part-00000',
 '../../data/retail_db/order_items',
 '../../data/retail_db/order_items/part-00000',
 '../../data/retail_db/schemas.json',
 '../../data/retail_db/orders',
 '../../data/retail_db/orders/part-00000',
 '../../data/retail_db/categories',
 '../../data/retail_db/categories/part-00000',
 '../../data/retail_db/load_db_tables_pg.sql']
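The first entry above is the base directory itself (`'../../data/retail_db/'`): with `recursive=True`, `**` also matches zero directories. A small sketch on a throwaway directory (the file names are stand-ins) shows that `**/*` leaves the base out, and that names starting with `.` are not matched by default, so such files would not be picked up for upload by this approach.

```python
import glob
import os
import tempfile

# Throwaway tree: one sub-directory with a file, one top-level file, one hidden file
base = tempfile.mkdtemp()
os.makedirs(os.path.join(base, 'customers'))
for name in ('customers/part-00000', 'schemas.json', '.hidden'):
    open(os.path.join(base, name), 'w').close()

# '**' with recursive=True matches "zero or more directories", so the base comes back as 'base/'
with_base = glob.glob(f'{base}/**', recursive=True)

# '**/*' needs at least one more path component, so the base directory is not returned
without_base = glob.glob(f'{base}/**/*', recursive=True)

print(sorted(with_base))
print(sorted(without_base))
```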

In [8]:
item = items[2]

In [9]:
item

'../../data/retail_db/customers/part-00000'

In [10]:
import os
os.path.isfile(item)

True

In [11]:
files = filter(lambda item: os.path.isfile(item), items)

In [12]:
list(files)

['../../data/retail_db/customers/part-00000',
 '../../data/retail_db/products/part-00000',
 '../../data/retail_db/create_db_tables_pg.sql',
 '../../data/retail_db/departments/part-00000',
 '../../data/retail_db/order_items/part-00000',
 '../../data/retail_db/schemas.json',
 '../../data/retail_db/orders/part-00000',
 '../../data/retail_db/categories/part-00000',
 '../../data/retail_db/load_db_tables_pg.sql']
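`filter` returns a lazy, single-use iterator. That is why `list(files)` in In [12] consumes it, and why In [13] rebuilds the result with `list(...)` before indexing it: after one pass, the same `filter` object yields nothing. A small sketch using a throwaway temporary directory (the paths are stand-ins, not the real `retail_db` files); it also passes `os.path.isfile` directly instead of wrapping it in a lambda.

```python
import os
import tempfile

# Throwaway directory with one sub-directory and one file
base = tempfile.mkdtemp()
os.makedirs(os.path.join(base, 'customers'))
file_path = os.path.join(base, 'customers', 'part-00000')
open(file_path, 'w').close()

items = [base, os.path.join(base, 'customers'), file_path]

files = filter(os.path.isfile, items)  # lazy: nothing is evaluated yet
first_pass = list(files)   # consumes the iterator
second_pass = list(files)  # already exhausted, so this is empty

# A list comprehension materializes the result once and can be reused freely
files_list = [item for item in items if os.path.isfile(item)]

print(first_pass, second_pass, files_list)
```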

In [13]:
files = list(filter(lambda item: os.path.isfile(item), items))
file = files[0]

In [14]:
file

'../../data/retail_db/customers/part-00000'

In [15]:
file.split('/')[3:]

['retail_db', 'customers', 'part-00000']

#### The following function gets the path under which the file will be stored as a blob in the bucket (relative to the target base directory)

In [24]:
def rel_dir(path):
    # Drop the first three components ('..', '..', 'data') so the path starts at retail_db
    return '/'.join(path.split('/')[3:])

rel_dir(file)

'retail_db/customers/part-00000'
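`rel_dir` slices off exactly three leading components, which only works because `src_base_dir` is `../../data/retail_db`. A hedged alternative, assuming you want the same `retail_db/...` suffix regardless of where the data lives, derives it with `os.path.relpath`; `blob_suffix` is a hypothetical helper name, not part of any library.

```python
import os


def blob_suffix(path, src_base_dir):
    """Return `path` relative to the parent of `src_base_dir`, using '/' separators.

    The dataset folder itself (e.g. 'retail_db') is kept, matching rel_dir's output.
    """
    parent = os.path.dirname(os.path.normpath(src_base_dir))
    # GCS object names use '/', so normalize the OS-specific separator
    return os.path.relpath(path, parent).replace(os.sep, '/')


print(blob_suffix('../../data/retail_db/customers/part-00000', '../../data/retail_db'))
```

Unlike slicing with `[3:]`, this does not depend on how many components `src_base_dir` has: for `/data/retail_db/orders/part-00000`, `split('/')[3:]` yields `orders/part-00000` (dropping `retail_db`), while `blob_suffix(..., '/data/retail_db')` still returns `retail_db/orders/part-00000`.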

In [25]:
# base directory inside bucket
tgt_base_dir = 'pythondemo'

### Store files into airetail_mld/pythondemo

In [26]:
from google.cloud import storage

gsclient = storage.Client()

In [27]:
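# Build the list of files to upload and get a handle to the target bucket
files = filter(lambda item: os.path.isfile(item), items)
bucket = gsclient.get_bucket('airetail_mld')

for file in files:
    print(f'Uploading file {file}')

    # Blob name = target base directory + path relative to the data folder
    blob_suffix = rel_dir(file)
    blob_name = f'{tgt_base_dir}/{blob_suffix}'
    blob = bucket.blob(blob_name)     # local Blob object; nothing is uploaded yet
    blob.upload_from_filename(file)   # uploads the local file's contents to GCS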

Uploading file ../../data/retail_db/customers/part-00000
Uploading file ../../data/retail_db/products/part-00000
Uploading file ../../data/retail_db/create_db_tables_pg.sql
Uploading file ../../data/retail_db/departments/part-00000
Uploading file ../../data/retail_db/order_items/part-00000
Uploading file ../../data/retail_db/schemas.json
Uploading file ../../data/retail_db/orders/part-00000
Uploading file ../../data/retail_db/categories/part-00000
Uploading file ../../data/retail_db/load_db_tables_pg.sql
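The loop above can also be wrapped in a reusable function with a dry-run switch, so the target URIs can be reviewed before anything is written. A minimal sketch: `upload_pairs` and its `dry_run` flag are hypothetical names, and it uses `Client.bucket`, which builds a local `Bucket` handle without the metadata request that `get_bucket` makes. With `dry_run=True` (the default) the `google-cloud-storage` package is not even imported.

```python
def upload_pairs(pairs, bucket_name, dry_run=True):
    """Upload each (local_path, blob_name) pair to gs://<bucket_name>/<blob_name>.

    Returns the target URIs. With dry_run=True nothing is sent to GCS.
    """
    uris = [f'gs://{bucket_name}/{blob_name}' for _, blob_name in pairs]
    if dry_run:
        return uris
    # Imported lazily so the dry run works without google-cloud-storage installed
    from google.cloud import storage
    bucket = storage.Client().bucket(bucket_name)  # local handle, no bucket metadata request
    for local_path, blob_name in pairs:
        print(f'Uploading file {local_path}')
        bucket.blob(blob_name).upload_from_filename(local_path)
    return uris


planned = upload_pairs(
    [('../../data/retail_db/schemas.json', 'pythondemo/retail_db/schemas.json')],
    'airetail_mld',
)
print(planned)
```

Once the dry-run output looks right, calling it again with `dry_run=False` performs the same uploads as the loop in In [27].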


Always double-check!

In [29]:
!gsutil ls -r gs://airetail_mld/pythondemo

gs://airetail_mld/pythondemo/:

gs://airetail_mld/pythondemo/retail_db/:
gs://airetail_mld/pythondemo/retail_db/create_db_tables_pg.sql
gs://airetail_mld/pythondemo/retail_db/load_db_tables_pg.sql
gs://airetail_mld/pythondemo/retail_db/schemas.json

gs://airetail_mld/pythondemo/retail_db/categories/:
gs://airetail_mld/pythondemo/retail_db/categories/part-00000

gs://airetail_mld/pythondemo/retail_db/customers/:
gs://airetail_mld/pythondemo/retail_db/customers/part-00000

gs://airetail_mld/pythondemo/retail_db/departments/:
gs://airetail_mld/pythondemo/retail_db/departments/part-00000

gs://airetail_mld/pythondemo/retail_db/order_items/:
gs://airetail_mld/pythondemo/retail_db/order_items/part-00000

gs://airetail_mld/pythondemo/retail_db/orders/:
gs://airetail_mld/pythondemo/retail_db/orders/part-00000

gs://airetail_mld/pythondemo/retail_db/products/:
gs://airetail_mld/pythondemo/retail_db/products/part-00000


#### Double-check using gsclient

In [30]:
gsclient.list_blobs?

[0;31mSignature:[0m
[0mgsclient[0m[0;34m.[0m[0mlist_blobs[0m[0;34m([0m[0;34m[0m
[0;34m[0m    [0mbucket_or_name[0m[0;34m,[0m[0;34m[0m
[0;34m[0m    [0mmax_results[0m[0;34m=[0m[0;32mNone[0m[0;34m,[0m[0;34m[0m
[0;34m[0m    [0mpage_token[0m[0;34m=[0m[0;32mNone[0m[0;34m,[0m[0;34m[0m
[0;34m[0m    [0mprefix[0m[0;34m=[0m[0;32mNone[0m[0;34m,[0m[0;34m[0m
[0;34m[0m    [0mdelimiter[0m[0;34m=[0m[0;32mNone[0m[0;34m,[0m[0;34m[0m
[0;34m[0m    [0mstart_offset[0m[0;34m=[0m[0;32mNone[0m[0;34m,[0m[0;34m[0m
[0;34m[0m    [0mend_offset[0m[0;34m=[0m[0;32mNone[0m[0;34m,[0m[0;34m[0m
[0;34m[0m    [0minclude_trailing_delimiter[0m[0;34m=[0m[0;32mNone[0m[0;34m,[0m[0;34m[0m
[0;34m[0m    [0mversions[0m[0;34m=[0m[0;32mNone[0m[0;34m,[0m[0;34m[0m
[0;34m[0m    [0mprojection[0m[0;34m=[0m[0;34m'noAcl'[0m[0;34m,[0m[0;34m[0m
[0;34m[0m    [0mfields[0m[0;34m=[0m[0;32mNone[0m[0;34m,[0m[0;34m

In [31]:
gsclient.list_blobs(
    'airetail_mld',
    prefix='pythondemo'
)

<google.api_core.page_iterator.HTTPIterator at 0x7f9750d05910>

In [33]:
blobs = list(gsclient.list_blobs(
    'airetail_mld',
    prefix='pythondemo'
))

In [34]:
blobs

[<Blob: airetail_mld, pythondemo/retail_db/categories/part-00000, 1681660288694345>,
 <Blob: airetail_mld, pythondemo/retail_db/create_db_tables_pg.sql, 1681660282587244>,
 <Blob: airetail_mld, pythondemo/retail_db/customers/part-00000, 1681660281410141>,
 <Blob: airetail_mld, pythondemo/retail_db/departments/part-00000, 1681660282922310>,
 <Blob: airetail_mld, pythondemo/retail_db/load_db_tables_pg.sql, 1681660292156368>,
 <Blob: airetail_mld, pythondemo/retail_db/order_items/part-00000, 1681660286663982>,
 <Blob: airetail_mld, pythondemo/retail_db/orders/part-00000, 1681660288373614>,
 <Blob: airetail_mld, pythondemo/retail_db/products/part-00000, 1681660282097017>,
 <Blob: airetail_mld, pythondemo/retail_db/schemas.json, 1681660287110435>]
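Listing the blobs confirms the names, but the same listing can also confirm that every file arrived with the expected size. A minimal sketch; `compare_uploads` is a hypothetical helper, the notebook-specific inputs are shown only as comments (`Blob.size` is populated by `list_blobs`, `os.path.getsize` gives the local size), and the sizes in the usage call are made up for illustration.

```python
def compare_uploads(local_sizes, remote_sizes):
    """Compare {blob_name: size_in_bytes} dicts for local files and uploaded blobs.

    Returns (missing, mismatched): names absent from the bucket, and names whose sizes differ.
    """
    missing = sorted(set(local_sizes) - set(remote_sizes))
    mismatched = sorted(
        name for name in set(local_sizes) & set(remote_sizes)
        if local_sizes[name] != remote_sizes[name]
    )
    return missing, mismatched


# In the notebook the inputs could be built like this (not executed here):
#   remote_sizes = {blob.name: blob.size for blob in blobs}
#   local_sizes = {f'{tgt_base_dir}/{rel_dir(f)}': os.path.getsize(f) for f in files}
#   (re-create `files` first, since the filter iterator from In [27] is already consumed)

# Illustrative sizes only
missing, mismatched = compare_uploads(
    {'pythondemo/retail_db/schemas.json': 120, 'pythondemo/retail_db/orders/part-00000': 300},
    {'pythondemo/retail_db/schemas.json': 120},
)
print(missing, mismatched)
```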