# Storage manager

There are two types of storage in the tf project:
- file storage buckets (Google Cloud Storage buckets)
- data warehouse database (BigQuery)

To access GCP resources, create a service account and obtain a JSON key from [GCP](https://cloud.google.com/docs/authentication/getting-started#cloud-console).

Alternatively, when running on Compute Engine (including Vertex AI notebooks), explicit authentication with a JSON key is not necessary.
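Both authentication routes can be combined in one helper. The following is a minimal sketch and not part of the original project: `pick_auth_mode` and `get_credentials` are hypothetical names, and the key path `secret/gcp.json` is assumed from the loader defined later in this notebook. When no key file is found it falls back to `google.auth.default()`, which should pick up the ambient credentials on Compute Engine or Vertex AI.

```python
import os


def pick_auth_mode(key_path: str = "secret/gcp.json") -> str:
    """Return "key_file" if a JSON key exists at key_path, else "default"."""
    return "key_file" if os.path.isfile(key_path) else "default"


def get_credentials(key_path: str = "secret/gcp.json"):
    """Load credentials from the key file, or fall back to ambient credentials."""
    scopes = ["https://www.googleapis.com/auth/cloud-platform"]
    if pick_auth_mode(key_path) == "key_file":
        # Imported lazily so the helper above works without the Google libraries
        from google.oauth2 import service_account
        return service_account.Credentials.from_service_account_file(
            key_path, scopes=scopes
        )
    import google.auth
    # google.auth.default returns a (credentials, project_id) tuple
    credentials, _project_id = google.auth.default(scopes=scopes)
    return credentials
```

The Google libraries are imported inside `get_credentials` so that checking which route applies does not itself require them.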

## Storage bucket

In [None]:
bucket_name = "tmp"
# Create the bucket
!gsutil mb gs://{bucket_name}/
# Delete the bucket together with all of its contents
!gsutil rm -r gs://{bucket_name}/
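The same create-and-remove round trip can also be done from Python. This is a sketch under assumptions: `create_then_delete_bucket` is a hypothetical helper, and `gcs_client` is expected to be a `google.cloud.storage.Client` that is already authenticated.

```python
def create_then_delete_bucket(gcs_client, bucket_name: str) -> str:
    """Create a bucket, then delete it with its objects; return the bucket name."""
    bucket = gcs_client.create_bucket(bucket_name)
    # force=True removes the contained objects before deleting the bucket.
    # For buckets holding many objects the library may refuse, in which case
    # `gsutil rm -r` is the more practical tool.
    bucket.delete(force=True)
    return bucket.name
```

A call such as `create_then_delete_bucket(client, "tmp")` would mirror the two `gsutil` commands above, assuming the bucket name is still available.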

In [None]:
# Local to GS (-d also deletes remote files that no longer exist locally)
code_name = 'triangle_high_time_res_4M_fix'
!gsutil -m rsync -d -r models/{code_name} gs://tf_mirror/{code_name}

In [None]:
# GS to local
code_name = 'Refrac_5M_fix'
!mkdir -p models/{code_name}
!gsutil -m rsync -r gs://tf_mirror/{code_name} models/{code_name}
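The two sync directions differ only in argument order and the `-d` flag, so they can be generated by one function. The sketch below is illustrative: `rsync_command` and `sync_model` are hypothetical names, the `tf_mirror` bucket and `models/` layout are taken from the cells above, and `gsutil` is assumed to be on the `PATH`.

```python
import os
import subprocess


def rsync_command(code_name: str, upload: bool, bucket: str = "tf_mirror") -> list:
    """Build the gsutil rsync command for one model directory."""
    local = f"models/{code_name}"
    remote = f"gs://{bucket}/{code_name}"
    if upload:
        # -d mirrors deletions: remote files missing locally are removed
        return ["gsutil", "-m", "rsync", "-d", "-r", local, remote]
    return ["gsutil", "-m", "rsync", "-r", remote, local]


def sync_model(code_name: str, upload: bool, bucket: str = "tf_mirror") -> None:
    """Run the sync; create the local target directory first when downloading."""
    if not upload:
        os.makedirs(f"models/{code_name}", exist_ok=True)
    subprocess.run(rsync_command(code_name, upload, bucket), check=True)
```

Keeping the command construction separate from its execution makes it possible to inspect the exact command before anything is deleted remotely.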

In [10]:
from google.oauth2 import service_account

def load_gcp_key(json_path: str = "secret/gcp.json") -> "GCP credentials":
    """Load Google Cloud credentials from a service account JSON key file"""
    # The cloud-platform scope grants access to all GCP APIs the account is allowed to use
    return service_account.Credentials.from_service_account_file(
        json_path, scopes=["https://www.googleapis.com/auth/cloud-platform"],
    )

credentials = load_gcp_key()

{'return': 'GCP credentials'}

# UConn Linux Box workflow

1. I put the secret JSON key in the secret folder (`secret/gcp.json`, the default path of `load_gcp_key()`)
2. Use the `load_gcp_key()` function to get a `service_account.Credentials` object (`credentials`)
3. Use `credentials` to initialize clients (e.g., `google.cloud.<service>.Client(credentials=credentials)`)
4. Use the `google.cloud` API to access GCP resources
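A quick way to confirm that step 4 works is to list what the credentials can see. This is a sketch only: `summarize_resources` is a hypothetical helper, and it assumes a `storage.Client` and a `bigquery.Client` that were both initialized as in step 3 and have a project configured.

```python
def summarize_resources(storage_client, bigquery_client) -> dict:
    """Return the bucket names and dataset IDs visible to the given clients."""
    return {
        # Each listed bucket exposes its name as `.name`
        "buckets": sorted(b.name for b in storage_client.list_buckets()),
        # Each listed dataset exposes its ID as `.dataset_id`
        "datasets": sorted(d.dataset_id for d in bigquery_client.list_datasets()),
    }
```

If either listing raises a permission error, the service account is likely missing a role on the project rather than the key being loaded incorrectly.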


In [28]:
from google.cloud import bigquery
# Named bq_client so it is not overwritten by the storage client created below
bq_client = bigquery.Client(credentials=credentials)
# dataset = bq_client.create_dataset('uconn_test', exists_ok=True)

In [29]:
from google.cloud import storage
client = storage.Client(credentials=credentials)

In [27]:
import os
from glob import glob
from tqdm import tqdm
from google.cloud import storage
from google.cloud.exceptions import NotFound

def upload_to_gcp(gcs_client, local_directory: str, bucket_name: str, bucket_directory: str):
    """Upload a directory to GCP storage bucket"""
    assert os.path.isdir(local_directory)
    local_paths = glob(os.path.join(local_directory, '**'), recursive=True)
    try:
        bucket = gcs_client.get_bucket(bucket_name)
    except NotFound:
        # Create the bucket if it does not exist yet
        bucket = gcs_client.create_bucket(bucket_name)

    for local_file in tqdm(local_paths):
        if os.path.isfile(local_file):
            # Object name: bucket_directory plus the path relative to local_directory
            relative_path = os.path.relpath(local_file, local_directory)
            remote_path = f'{bucket_directory}/{relative_path.replace(os.sep, "/")}'
            blob = bucket.blob(remote_path)
            blob.upload_from_filename(local_file)

# upload_to_gcp(gcs_client=client, local_directory="src", bucket_name="uconn_test2", bucket_directory="src")

100%|██████████| 16/16 [00:04<00:00,  3.61it/s]
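The object name that `upload_to_gcp` assigns to each file depends on how the local path is mapped into the bucket. A minimal sketch of that mapping in isolation follows; `remote_blob_name` is a hypothetical helper, and it uses `os.path.relpath` so that a `local_directory` with more than one path component is handled as well.

```python
import os
import posixpath


def remote_blob_name(local_file: str, local_directory: str, bucket_directory: str) -> str:
    """Map a local file path to its object name inside the bucket."""
    # Path of the file relative to the directory being uploaded
    relative_path = os.path.relpath(local_file, local_directory)
    # Object names always use forward slashes, whatever the local separator is
    return posixpath.join(bucket_directory, *relative_path.split(os.sep))
```

For example, uploading `src` into the bucket directory `src` would store `src/pkg/a.py` under the object name `src/pkg/a.py`.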
