## Project to Upload Files to GCS using Python

In this lecture we will see how to upload files to GCS using Python. We will use `glob` and `os` from the standard library, and `storage` from `google.cloud`, to build the application logic.

Here are the design details.
* First, we need to get the list of file names to upload from the local file system.
* We need to build a `blob` object for each file.
* We can call `upload_from_filename` on the blob object to upload the file as a blob in GCS.
* We will use a metadata-driven (data-driven) development approach to take care of uploading all the retail-related files to GCS.
* Blobs will be named using the file names as reference.
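
One possible reading of the metadata-driven bullet is to derive the list of uploads from a metadata file instead of a directory scan. The sketch below is only an illustration under stated assumptions: it assumes `schemas.json` is a JSON object keyed by table name (its layout is not shown in this notebook) and that each table is stored as a single `part-00000` file. `planned_uploads` is a hypothetical helper name.

```python
import json
import os
import tempfile


def planned_uploads(schemas_path, src_base_dir, tgt_base_dir):
    """Return (local_path, blob_name) pairs, one per table named in the metadata.

    Assumes the metadata file is a JSON object whose keys are table names
    and that each table is stored as <table>/part-00000.
    """
    with open(schemas_path) as fh:
        tables = sorted(json.load(fh).keys())
    base_name = os.path.basename(src_base_dir.rstrip('/'))
    return [
        (f'{src_base_dir}/{table}/part-00000',
         f'{tgt_base_dir}/{base_name}/{table}/part-00000')
        for table in tables
    ]


# Tiny stand-in metadata file so the sketch runs without the real data set.
with tempfile.TemporaryDirectory() as tmp:
    schemas_path = os.path.join(tmp, 'schemas.json')
    with open(schemas_path, 'w') as fh:
        json.dump({'orders': [], 'customers': []}, fh)
    pairs = planned_uploads(schemas_path, '../../data/retail_db', 'pythondemo')

for local_path, blob_name in pairs:
    print(local_path, '->', blob_name)
```

Keeping the plan as plain `(local_path, blob_name)` pairs means it can be inspected before any upload is attempted.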

In [2]:
!gsutil ls

gs://deg_gcp_ak/


In [94]:
!gsutil rm -r gs://deg_gcp_ak/pythondemo

In [83]:
!gsutil ls gs://deg_gcp_ak/

In [5]:
import glob

In [34]:
import os
os.path.isfile(item)

False

In [55]:
src_base_dir = '../../data/retail_db'
items = glob.glob(f'{src_base_dir}/**', recursive=True)

In [65]:

# glob returns backslash separators on Windows; normalize them to forward slashes
items_replace = [item.replace('\\', '/') for item in items]
items_replace

['../../data/retail_db/',
 '../../data/retail_db/categories',
 '../../data/retail_db/categories/part-00000',
 '../../data/retail_db/create_db_tables_pg.sql',
 '../../data/retail_db/customers',
 '../../data/retail_db/customers/part-00000',
 '../../data/retail_db/departments',
 '../../data/retail_db/departments/part-00000',
 '../../data/retail_db/load_db_tables_pg.sql',
 '../../data/retail_db/orders',
 '../../data/retail_db/orders/part-00000',
 '../../data/retail_db/order_items',
 '../../data/retail_db/order_items/part-00000',
 '../../data/retail_db/products',
 '../../data/retail_db/products/part-00000',
 '../../data/retail_db/schemas.json']

In [66]:
files = list(filter(lambda item: os.path.isfile(item) and item.endswith('part-00000'), items_replace))
files

['../../data/retail_db/categories/part-00000',
 '../../data/retail_db/customers/part-00000',
 '../../data/retail_db/departments/part-00000',
 '../../data/retail_db/orders/part-00000',
 '../../data/retail_db/order_items/part-00000',
 '../../data/retail_db/products/part-00000']
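
As an alternative to `glob` plus a manual separator replacement, the same discovery step could be sketched with `pathlib`, whose `as_posix()` always yields forward slashes. This is a minimal sketch, not the notebook's method: `find_part_files` is a hypothetical helper, and the temporary directory tree only mimics the `retail_db` layout so the example runs anywhere.

```python
import tempfile
from pathlib import Path


def find_part_files(src_base_dir):
    """Return sorted POSIX-style paths of all part-00000 files under src_base_dir."""
    base = Path(src_base_dir)
    return sorted(
        path.as_posix()
        for path in base.rglob('part-00000')
        if path.is_file()
    )


# Build a small throwaway tree that mimics the retail_db layout.
with tempfile.TemporaryDirectory() as tmp:
    for table in ('orders', 'customers'):
        table_dir = Path(tmp) / 'retail_db' / table
        table_dir.mkdir(parents=True)
        (table_dir / 'part-00000').write_text('sample\n')
    # A non-data file that the search should ignore.
    (Path(tmp) / 'retail_db' / 'schemas.json').write_text('{}')

    found = find_part_files(Path(tmp) / 'retail_db')
    relative = [Path(p).relative_to(tmp).as_posix() for p in found]

print(relative)
```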

In [31]:
files

['../../data/retail_db\\categories\\part-00000',
 '../../data/retail_db\\customers\\part-00000',
 '../../data/retail_db\\departments\\part-00000',
 '../../data/retail_db\\orders\\part-00000',
 '../../data/retail_db\\order_items\\part-00000',
 '../../data/retail_db\\products\\part-00000']

In [70]:
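def get_file_names(src_base_dir):
    # List everything under src_base_dir and normalize Windows separators
    items = glob.glob(f'{src_base_dir}/**', recursive=True)
    items = [item.replace('\\', '/') for item in items]
    return list(filter(lambda item: os.path.isfile(item) and item.endswith('part-00000'), items))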

In [71]:
files = get_file_names(src_base_dir)

In [72]:
files

['../../data/retail_db/categories/part-00000',
 '../../data/retail_db/customers/part-00000',
 '../../data/retail_db/departments/part-00000',
 '../../data/retail_db/orders/part-00000',
 '../../data/retail_db/order_items/part-00000',
 '../../data/retail_db/products/part-00000']

In [73]:

file = files[0]

In [74]:
file

'../../data/retail_db/categories/part-00000'

In [75]:
file.split('/')[3:]

['retail_db', 'categories', 'part-00000']

In [76]:
'/'.join(file.split('/')[3:])

'retail_db/categories/part-00000'
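
The slice `[3:]` works only because `src_base_dir` happens to sit three components deep. A sketch that derives the same suffix from `src_base_dir` itself, so it does not depend on directory depth, might look like the following. `blob_suffix_for` is a hypothetical helper, and `PurePosixPath` is used on the assumption that the paths have already been normalized to forward slashes.

```python
from pathlib import PurePosixPath


def blob_suffix_for(file, src_base_dir):
    """Return the path of `file` relative to the parent of src_base_dir."""
    parent = PurePosixPath(src_base_dir).parent
    return PurePosixPath(file).relative_to(parent).as_posix()


first = blob_suffix_for('../../data/retail_db/categories/part-00000', '../../data/retail_db')
second = blob_suffix_for('/tmp/retail_db/orders/part-00000', '/tmp/retail_db/')
print(first)   # retail_db/categories/part-00000
print(second)  # retail_db/orders/part-00000
```

The comparison is purely lexical, so no file needs to exist for the suffix to be computed.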

In [78]:
tgt_base_dir = 'pythondemo'

In [79]:
from google.cloud import storage

In [80]:
gsclient = storage.Client()

In [95]:
files = get_file_names(src_base_dir)
bucket = gsclient.get_bucket('deg_gcp_ak')
for file in files:
    print(f'Uploading file {file}')
    blob_suffix = '/'.join(file.split('/')[3:])
    blob_name = f'{tgt_base_dir}/{blob_suffix}'
    blob = bucket.blob(blob_name)
    blob.upload_from_filename(file)

Uploading file ../../data/retail_db/categories/part-00000
Uploading file ../../data/retail_db/customers/part-00000
Uploading file ../../data/retail_db/departments/part-00000
Uploading file ../../data/retail_db/orders/part-00000
Uploading file ../../data/retail_db/order_items/part-00000
Uploading file ../../data/retail_db/products/part-00000
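
The upload loop can also be wrapped in a function that receives the bucket as a parameter, which makes it possible to try the logic offline before pointing it at GCS. The following is a sketch under that assumption: `upload_pairs`, `FakeBucket` and `FakeBlob` are hypothetical names, and the fakes only imitate the two calls used in this notebook, `bucket.blob(name)` and `blob.upload_from_filename(path)`.

```python
def upload_pairs(bucket, pairs):
    """Upload each (local_path, blob_name) pair and return the blob names written.

    `bucket` is expected to behave like google.cloud.storage.Bucket:
    bucket.blob(name) returns an object with upload_from_filename(path).
    """
    uploaded = []
    for local_path, blob_name in pairs:
        print(f'Uploading file {local_path}')
        bucket.blob(blob_name).upload_from_filename(local_path)
        uploaded.append(blob_name)
    return uploaded


class FakeBlob:
    """Stand-in for a GCS blob that only records the upload request."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    def upload_from_filename(self, filename):
        self.log.append((filename, self.name))


class FakeBucket:
    """Stand-in for a GCS bucket, used here so the sketch runs offline."""

    def __init__(self):
        self.log = []

    def blob(self, name):
        return FakeBlob(name, self.log)


fake_bucket = FakeBucket()
names = upload_pairs(
    fake_bucket,
    [('../../data/retail_db/orders/part-00000',
      'pythondemo/retail_db/orders/part-00000')],
)
print(names)
```

With a real client, the same call would take `gsclient.get_bucket('deg_gcp_ak')` in place of the stand-in bucket.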


In [96]:
!gsutil ls -r gs://deg_gcp_ak/pythondemo

In [86]:
gsclient.list_blobs?

Signature:
gsclient.list_blobs(
    bucket_or_name,
    max_results=None,
    page_token=None,
    prefix=None,
    delimiter=None,
    start_offset=None,
    end_offset=None,
    include_trailing_delimiter=None,
    versions=None,
    projection='noAcl',
    fields=None,
    page_size=None,
    timeout=60

In [87]:
gsclient.list_blobs(
    'deg_gcp_ak',
    prefix='pythondemo'
)

<google.api_core.page_iterator.HTTPIterator at 0x1c6a8b4b220>

In [97]:
blobs = list(gsclient.list_blobs(
    'deg_gcp_ak',
    prefix='pythondemo'
))

In [91]:
blobs

[<Blob: deg_gcp_ak, pythondemo/retail_db/categories/part-00000, 1704646712848241>,
 <Blob: deg_gcp_ak, pythondemo/retail_db/customers/part-00000, 1704646713601350>,
 <Blob: deg_gcp_ak, pythondemo/retail_db/departments/part-00000, 1704646713761942>,
 <Blob: deg_gcp_ak, pythondemo/retail_db/order_items/part-00000, 1704646719784759>,
 <Blob: deg_gcp_ak, pythondemo/retail_db/orders/part-00000, 1704646715907185>,
 <Blob: deg_gcp_ak, pythondemo/retail_db/products/part-00000, 1704646720006662>]
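
Since each listed `Blob` exposes a `name` attribute, the listing can be compared against the names we expected to upload. The sketch below shows one way to do that check; `missing_blobs` is a hypothetical helper, and `SimpleNamespace` objects stand in for real `Blob` instances so the example runs without GCS access.

```python
from types import SimpleNamespace


def missing_blobs(expected_names, blobs):
    """Return expected blob names that do not appear in the listing, sorted."""
    present = {blob.name for blob in blobs}
    return sorted(set(expected_names) - present)


# Stand-ins for the Blob objects returned by list_blobs; only .name is used.
listed = [
    SimpleNamespace(name='pythondemo/retail_db/orders/part-00000'),
    SimpleNamespace(name='pythondemo/retail_db/customers/part-00000'),
]
expected = [
    'pythondemo/retail_db/orders/part-00000',
    'pythondemo/retail_db/customers/part-00000',
    'pythondemo/retail_db/products/part-00000',
]
missing = missing_blobs(expected, listed)
print(missing)  # ['pythondemo/retail_db/products/part-00000']
```

Against the real bucket, `blobs` from the cell above could be passed as the second argument.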