## Project to Upload Files to GCS using Python

As part of this series of lectures, we will see how to upload files to GCS using Python. We will use `glob`, `os`, and `storage` from `google.cloud` to build the application logic.

Here are the design details.
* First, we need to get the list of file names to upload from the local file system.
* We need to build a `blob` object for each file.
* We can call `upload_from_filename` on the blob object to upload the file as a blob in GCS.
* We will use a metadata- or data-driven approach to take care of uploading all the retail files to GCS.
* Blobs will be named using the local file paths as a reference.
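The design above can be sketched as a dry-run planning step that maps each local file to its target blob name before anything is sent to GCS. This is a minimal sketch under stated assumptions: `plan_uploads` is a hypothetical helper, and it assumes blob names should keep the last folder of the source directory (here `retail_db`) under a target prefix such as `pythondemo`, always with `/` as the separator.

```python
from pathlib import Path


def plan_uploads(src_base_dir, tgt_base_dir):
    """Return sorted (local_path, blob_name) pairs for files under src_base_dir.

    Hypothetical helper: nothing is uploaded here, so the planned
    blob names can be reviewed first (a dry run).
    """
    src = Path(src_base_dir)
    plan = []
    for path in sorted(src.rglob('*')):
        if not path.is_file():
            continue  # directories do not become blobs
        # Relative to the parent of src so 'retail_db/...' is kept;
        # as_posix() forces '/' separators even on Windows.
        suffix = path.relative_to(src.parent).as_posix()
        plan.append((str(path), f'{tgt_base_dir}/{suffix}'))
    return plan
```

For the layout used in this notebook, `plan_uploads('../../data/retail_db', 'pythondemo')` would produce blob names such as `pythondemo/retail_db/orders/part-00000`.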

In [1]:
!gsutil rm -r gs://stage-nelson/pythondemo

Removing gs://stage-nelson/pythondemo/retail_db/orders/part-00000#1726534585076502...
/ [1 objects]                                                                   

Operation completed over 1 objects.                                              


In [2]:
!gsutil ls gs://stage-nelson/

gs://stage-nelson/retail_db/


In [3]:
import glob

In [4]:
src_base_dir = '../../data/retail_db'

In [5]:
items = glob.glob(f'{src_base_dir}/**', recursive=True)

In [6]:
items

['../../data/retail_db\\',
 '../../data/retail_db\\categories',
 '../../data/retail_db\\categories\\part-00000',
 '../../data/retail_db\\create_db_tables_pg.sql',
 '../../data/retail_db\\customers',
 '../../data/retail_db\\customers\\part-00000',
 '../../data/retail_db\\departments',
 '../../data/retail_db\\departments\\part-00000',
 '../../data/retail_db\\load_db_tables_pg.sql',
 '../../data/retail_db\\orders',
 '../../data/retail_db\\orders\\part-00000',
 '../../data/retail_db\\order_items',
 '../../data/retail_db\\order_items\\part-00000',
 '../../data/retail_db\\products',
 '../../data/retail_db\\products\\part-00000',
 '../../data/retail_db\\schemas.json']
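Note that `**` with `recursive=True` matches directories as well as files, including the base directory itself (the first entry above, with a trailing separator). A minimal sketch that keeps only regular files, using the same `glob` and `os` modules as the notebook; `list_local_files` is a hypothetical helper name, and `os.path.join` is used so the pattern is built with the platform's separator.

```python
import glob
import os


def list_local_files(src_base_dir):
    """Hypothetical helper: recursively list regular files under src_base_dir.

    glob's '**' (with recursive=True) also returns directories,
    so they are dropped with os.path.isfile.
    """
    pattern = os.path.join(src_base_dir, '**')
    items = glob.glob(pattern, recursive=True)
    return sorted(item for item in items if os.path.isfile(item))
```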

In [7]:
item = items[2]

In [8]:
item

'../../data/retail_db\\categories\\part-00000'

In [9]:
import os
os.path.isfile(item)

True

In [10]:
files = filter(lambda item: os.path.isfile(item), items)

In [11]:
list(files)

['../../data/retail_db\\categories\\part-00000',
 '../../data/retail_db\\create_db_tables_pg.sql',
 '../../data/retail_db\\customers\\part-00000',
 '../../data/retail_db\\departments\\part-00000',
 '../../data/retail_db\\load_db_tables_pg.sql',
 '../../data/retail_db\\orders\\part-00000',
 '../../data/retail_db\\order_items\\part-00000',
 '../../data/retail_db\\products\\part-00000',
 '../../data/retail_db\\schemas.json']
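`filter` returns a lazy, single-use iterator. Once `list(files)` in In [11] has consumed it, the same `files` object yields nothing, which is why In [12] and In [19] call `filter` again. A small self-contained sketch; the temporary directory and the file name `part-00000` are assumptions made only so the example runs anywhere.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    file_path = os.path.join(tmp, 'part-00000')
    open(file_path, 'w').close()
    items = [tmp, file_path]  # one directory, one file

    files = filter(os.path.isfile, items)  # lazy iterator; no lambda needed
    first_pass = list(files)   # consumes the iterator
    second_pass = list(files)  # already exhausted

    print(first_pass == [file_path])  # True
    print(second_pass)                # []

    # Materialize once when the result is needed more than once
    files = [item for item in items if os.path.isfile(item)]
```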

In [12]:
files = list(filter(lambda item: os.path.isfile(item), items))
file = files[0]

In [13]:
file

'../../data/retail_db\\categories\\part-00000'

In [14]:
file.split('/')[3:]

['retail_db\\categories\\part-00000']

In [15]:
'/'.join(file.split('/')[3:])

'retail_db\\categories\\part-00000'
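On Windows, `glob` joins path components with backslashes, so `file.split('/')` only separates the `../../data` prefix and leaves `retail_db\\categories\\part-00000` as a single element, as the output above shows. The index `3` also ties the code to the exact depth of `src_base_dir`. A sketch of a more robust alternative using `pathlib`; `to_blob_suffix` is a hypothetical helper, and `PureWindowsPath` is used only to reproduce the Windows behaviour on any operating system.

```python
from pathlib import Path, PureWindowsPath


def to_blob_suffix(file, data_dir, path_cls=Path):
    """Return `file` relative to `data_dir`, always using '/' separators.

    Hypothetical helper; path_cls lets us simulate Windows paths anywhere.
    """
    relative = path_cls(file).relative_to(path_cls(data_dir))
    return relative.as_posix()


windows_file = '../../data/retail_db\\categories\\part-00000'
print(to_blob_suffix(windows_file, '../../data', path_cls=PureWindowsPath))
# retail_db/categories/part-00000
```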

In [16]:
tgt_base_dir = 'pythondemo'

In [17]:
from google.cloud import storage

In [18]:
gsclient = storage.Client()

In [19]:
files = filter(lambda item: os.path.isfile(item), items)
bucket = gsclient.get_bucket('stage-nelson')
for file in files:
    print(f'Uploading file {file}')
    blob_suffix = '/'.join(file.split('/')[3:])
    blob_name = f'{tgt_base_dir}/{blob_suffix}'
    blob = bucket.blob(blob_name)
    blob.upload_from_filename(file)

Uploading file ../../data/retail_db\categories\part-00000
Uploading file ../../data/retail_db\create_db_tables_pg.sql
Uploading file ../../data/retail_db\customers\part-00000
Uploading file ../../data/retail_db\departments\part-00000
Uploading file ../../data/retail_db\load_db_tables_pg.sql
Uploading file ../../data/retail_db\orders\part-00000
Uploading file ../../data/retail_db\order_items\part-00000
Uploading file ../../data/retail_db\products\part-00000
Uploading file ../../data/retail_db\schemas.json


In [20]:
!gsutil ls -r gs://stage-nelson/pythondemo

gs://stage-nelson/pythondemo/:
gs://stage-nelson/pythondemo/retail_db\categories\part-00000
gs://stage-nelson/pythondemo/retail_db\create_db_tables_pg.sql
gs://stage-nelson/pythondemo/retail_db\customers\part-00000
gs://stage-nelson/pythondemo/retail_db\departments\part-00000
gs://stage-nelson/pythondemo/retail_db\load_db_tables_pg.sql
gs://stage-nelson/pythondemo/retail_db\order_items\part-00000
gs://stage-nelson/pythondemo/retail_db\orders\part-00000
gs://stage-nelson/pythondemo/retail_db\products\part-00000
gs://stage-nelson/pythondemo/retail_db\schemas.json
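Because the object names were built from backslash-separated Windows paths, each name contains literal `\` characters. GCS has a flat namespace in which tools such as `gsutil ls` treat only `/` as a folder delimiter, so these objects sit directly under `pythondemo/` instead of in nested `retail_db/categories/` style folders. Below is a minimal sketch of the In [19] loop that builds names with `os.path.relpath` and then replaces `os.sep` with `/`; `upload_files` is a hypothetical helper. It assumes a bucket obtained with `gsclient.bucket('stage-nelson')`, which builds a `Bucket` reference without an API call, whereas `get_bucket` first fetches the bucket's metadata.

```python
import glob
import os


def upload_files(bucket, src_base_dir, tgt_base_dir):
    """Upload every file under src_base_dir, keeping '/' in blob names.

    Hypothetical helper. `bucket` is expected to be a
    google.cloud.storage Bucket. Returns the uploaded blob names.
    """
    # Paths are made relative to the parent of src_base_dir so the
    # 'retail_db' folder name is preserved in each blob name.
    parent_dir = os.path.dirname(os.path.abspath(src_base_dir))
    items = glob.glob(os.path.join(src_base_dir, '**'), recursive=True)
    blob_names = []
    for file in sorted(item for item in items if os.path.isfile(item)):
        # relpath + replace(os.sep, '/') works on both Windows and POSIX
        blob_suffix = os.path.relpath(file, parent_dir).replace(os.sep, '/')
        blob_name = f'{tgt_base_dir}/{blob_suffix}'
        print(f'Uploading file {file} to {blob_name}')
        bucket.blob(blob_name).upload_from_filename(file)
        blob_names.append(blob_name)
    return blob_names
```

Usage would mirror In [19]: `upload_files(gsclient.bucket('stage-nelson'), '../../data/retail_db', 'pythondemo')`.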


In [21]:
gsclient.list_blobs?

Signature:
gsclient.list_blobs(
    bucket_or_name,
    max_results=None,
    page_token=None,
    prefix=None,
    delimiter=None,
    start_offset=None,
    end_offset=None,
    include_trailing_delimiter=None,
    versions=None,
    projection='noAcl',
    fields=None,
    page_size=None,
    timeout=60,
    ...

In [22]:
gsclient.list_blobs(
    'stage-nelson',
    prefix='pythondemo'
)

<google.api_core.page_iterator.HTTPIterator at 0x24c434b47a0>
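`list_blobs` returns a lazy `HTTPIterator`: no objects are fetched until it is iterated, and results are then retrieved page by page. Also, `prefix` is a plain string-prefix match, so `prefix='pythondemo'` would also match objects under a hypothetical `pythondemo2/`; adding the trailing `/` restricts the listing to that "folder". A minimal sketch; `list_blob_names` is a hypothetical helper that assumes a `google.cloud.storage.Client` is passed in.

```python
def list_blob_names(client, bucket_name, folder):
    """Return the names of blobs under folder/ (hypothetical helper).

    `client` is expected to be a google.cloud.storage.Client.
    Iterating the returned HTTPIterator triggers the paged API calls.
    """
    prefix = folder.rstrip('/') + '/'  # avoid matching e.g. 'pythondemo2/...'
    return [blob.name for blob in client.list_blobs(bucket_name, prefix=prefix)]
```

In this notebook it would be called as `list_blob_names(gsclient, 'stage-nelson', 'pythondemo')`.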

In [23]:
blobs = list(gsclient.list_blobs(
    'stage-nelson',
    prefix='pythondemo'
))

In [24]:
blobs

[<Blob: stage-nelson, pythondemo/retail_db\categories\part-00000, 1726536728736964>,
 <Blob: stage-nelson, pythondemo/retail_db\create_db_tables_pg.sql, 1726536729074028>,
 <Blob: stage-nelson, pythondemo/retail_db\customers\part-00000, 1726536730280525>,
 <Blob: stage-nelson, pythondemo/retail_db\departments\part-00000, 1726536730619203>,
 <Blob: stage-nelson, pythondemo/retail_db\load_db_tables_pg.sql, 1726536732859526>,
 <Blob: stage-nelson, pythondemo/retail_db\order_items\part-00000, 1726536736166700>,
 <Blob: stage-nelson, pythondemo/retail_db\orders\part-00000, 1726536734453281>,
 <Blob: stage-nelson, pythondemo/retail_db\products\part-00000, 1726536736946860>,
 <Blob: stage-nelson, pythondemo/retail_db\schemas.json, 1726536737298078>]

In [27]:
len(blobs)

9
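Counting blobs only shows how many objects exist, not whether they carry the expected names. A minimal sketch that compares the blob names we expect against the names actually listed; `find_missing` is a hypothetical helper, and the two listed names are copied from the bucket listing above, where the Windows backslashes ended up in the object names.

```python
def find_missing(expected_names, listed_names):
    """Return expected blob names absent from the listing, sorted."""
    return sorted(set(expected_names) - set(listed_names))


expected = [
    'pythondemo/retail_db/orders/part-00000',
    'pythondemo/retail_db/schemas.json',
]
# Names as they appear in the listing above (with backslashes)
listed = [
    'pythondemo/retail_db\\orders\\part-00000',
    'pythondemo/retail_db\\schemas.json',
]
print(find_missing(expected, listed))
# ['pythondemo/retail_db/orders/part-00000', 'pythondemo/retail_db/schemas.json']
```

In the notebook, `listed` would come from `[blob.name for blob in blobs]`.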