# Using Google Drive in the Python and openEO environment

In WEED we want to use files from Google Drive for processing, mainly in the benchmarking part of the project. Moreover, some of the benchmarking results should be uploaded directly to the GDrive so that they are available to all users. The following steps were needed to access the GDrive from Python.

1. Sign in to the WEED Google account and open the Google Cloud Console to create a project. <br>
  - Go to https://console.cloud.google.com/  <br>
  - Create a new project: click “Select a project” at the top, then “New Project”. Give it a name and create it. The project used here is named "WEED-2024".
2. Enable the Google Drive API for the WEED project <br>
  - Navigate to APIs & Services: in the left sidebar, go to “APIs & Services” > “Dashboard”.
  - Enable APIs: click “Enable APIs and Services”, search for “Google Drive API”, and enable it for the project.
3. Create credentials
  - Generate credentials: in the left sidebar, go to “APIs & Services” > “Credentials”.
  - Create credentials: click “Create Credentials” > “Service Account”. Fill in the details, choose a role, and create a JSON key. Save this JSON key securely. The settings used for WEED are:
    - Name: gdrive_access
    - Email: gdrive-access@weed-2024.iam.gserviceaccount.com (generated automatically)
    - Service account status: enabled
4. JSON access key
  - Download and use the JSON key: store the downloaded JSON key securely on your local machine. This key is used to authenticate your application against the Drive API. Note that a service account key can only be downloaded once, when it is created.
  - Therefore, if you need a new key, add a new key to the gdrive_access service account instead of trying to retrieve the old one.
  - Best practice: store the key in a secure location such as the VITO Vault and retrieve it from there on the client side (https://confluence.vito.be/pages/viewpage.action?spaceKey=EP&title=Vault+user+guide).
  - The VITO Vault can be accessed with your Terrascope credentials: https://vault.vgt.vito.be/ui/vault/auth?with=ldap
5. Make sure that the service account email address is added to the sharing ("access") settings of every file and folder that should be available in Python. Currently, only the folder "openeo_tests" under WEED/working/WP4_ToolboxDvlpt is shared in this way.
6. Install the pydrive2 Python package in the WEED environment (the cells below also import hvac and pandas).
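
Because a service account key can only be downloaded once, it may be worth sanity-checking the JSON file before storing it in the Vault. The sketch below is a hypothetical helper (`check_service_account_key` is not part of pydrive2 or hvac). It assumes the usual fields of a Google service account key file (`type`, `project_id`, `private_key`, `client_email`) and only checks that they are present; it does not verify that the key actually works.

```python
import json

# Fields assumed to be present in a Google service account JSON key file
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_key(key_json: str) -> dict:
    """Parse a service account key (JSON string) and check its basic structure.

    Hypothetical helper: it only verifies that the expected fields exist and that
    the key is of type 'service_account'; it does not contact Google.
    """
    key = json.loads(key_json)
    missing = [field for field in REQUIRED_FIELDS if not key.get(field)]
    if missing:
        raise ValueError(f"service account key is missing fields: {missing}")
    if key["type"] != "service_account":
        raise ValueError(f"expected type 'service_account', got {key['type']!r}")
    return key


# usage with a dummy, non-functional key (the private key is a placeholder)
dummy_key = json.dumps({
    "type": "service_account",
    "project_id": "weed-2024",
    "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
    "client_email": "gdrive-access@weed-2024.iam.gserviceaccount.com",
})
print(check_service_account_key(dummy_key)["client_email"])
```
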

In [None]:
from pydrive2.fs import GDriveFileSystem
import pandas as pd
import hvac
from getpass import getpass
from eo_processing.utils.storage import WEED_storage

In [None]:
def get_WEED_credentials(username: str = 'buchhornm', key: str = 'gdrive-access') -> str:
    """
    Retrieves WEED access credentials from the Terrascope VAULT using LDAP authentication.

    This function prompts the user for their Terrascope VAULT password, authenticates
    with the VAULT via LDAP, and fetches the requested key from the WEED KV storage path.

    :param username: LDAP username used to authenticate with the VAULT, defaults to 'buchhornm'
    :param key: key in the WEED KV storage whose value is returned, defaults to 'gdrive-access'
    :return: the stored credentials
    """
    password_prompt = 'Please enter your password for the Terrascope VAULT: '
    password = getpass(prompt=password_prompt)

    client = hvac.Client(url='https://vault.vgt.vito.be')

    # authenticate against the LDAP auth backend of the VAULT
    client.auth.ldap.login(
        username=username,
        password=password,
        mount_point='ldap'
    )

    try:
        # read the latest version of the WEED secret from the KV (v2) engine
        secret_version_response = client.secrets.kv.v2.read_secret_version(
            mount_point='kv',
            path='TAP/apps/WEED',
            raise_on_deleted_version=True
        )
    finally:
        # always log out (clear the client token), even if reading the secret failed
        client.logout()

    return secret_version_response['data']['data'][key]

In [None]:
# get the credentials for the GDrive service account access from the VITO TERRASCOPE vault
gdrive_credentials = get_WEED_credentials(username='deroob', key='gdrive-access')

In [None]:
# initialise the fsspec file system to access the files & folders shared with the service account
# gdrive-access@weed-2024.iam.gserviceaccount.com
# "1k27bitdRp41AtHq1xupyqwKaTLzrMUMu" is the ID of the only folder currently shared with this service account
# to make more folders available, share the desired files and/or folders with the service account email address
# see https://filesystem-spec.readthedocs.io/en/latest/usage.html#use-a-file-system for more on interacting with the file system

gdrive = GDriveFileSystem("1k27bitdRp41AtHq1xupyqwKaTLzrMUMu",
                          use_service_account=True,
                          client_json=gdrive_credentials,)
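
Here the Vault value is passed straight to `client_json`, while the `WEED_storage` variant at the end of this notebook wraps its credentials in `json.dumps`. That suggests the credentials may arrive either as a JSON string or as a dict. A small normalising helper such as the hypothetical `to_client_json` below (not part of pydrive2) would let both variants share the same call; it assumes that `client_json` expects the service account credentials as a JSON string.

```python
import json
from collections.abc import Mapping
from typing import Union


def to_client_json(credentials: Union[str, bytes, Mapping]) -> str:
    """Return service account credentials as a JSON string.

    Hypothetical helper: accepts an already serialised JSON string/bytes or a
    dict-like object and always returns a JSON string, which is what
    client_json is assumed to expect.
    """
    if isinstance(credentials, bytes):
        credentials = credentials.decode("utf-8")
    if isinstance(credentials, str):
        # make sure the string really is JSON, then return it unchanged
        json.loads(credentials)
        return credentials
    if isinstance(credentials, Mapping):
        return json.dumps(dict(credentials))
    raise TypeError(f"unsupported credentials type: {type(credentials).__name__}")


# usage with a live connection:
# gdrive = GDriveFileSystem("1k27bitdRp41AtHq1xupyqwKaTLzrMUMu",
#                           use_service_account=True,
#                           client_json=to_client_json(gdrive_credentials))
```
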

In [None]:
# list files and folders
for root, dnames, fnames in gdrive.walk(gdrive.root):
    print(root, dnames, fnames)
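
`walk` yields `(root, dirnames, filenames)` tuples, following the usual fsspec convention. To get a flat list of file paths (for example, to select all CSV files before reading them), the tuples can be joined as sketched below. `list_files` is a hypothetical helper that works on any iterable of such tuples, so it is demonstrated here with a hand-made listing rather than a live GDrive connection; the folder `results` and the files `summary.csv` and `notes.txt` are made up for illustration.

```python
import posixpath
from typing import Iterable, List, Sequence, Tuple

WalkEntry = Tuple[str, Sequence[str], Sequence[str]]


def list_files(walk_result: Iterable[WalkEntry], suffix: str = "") -> List[str]:
    """Flatten (root, dirnames, filenames) tuples into sorted full file paths.

    Only files whose name ends with `suffix` are kept (all files if suffix is "").
    """
    paths = []
    for root, _dirnames, filenames in walk_result:
        for name in filenames:
            if name.endswith(suffix):
                paths.append(posixpath.join(root, name))
    return sorted(paths)


# hand-made listing shaped like the output of gdrive.walk(gdrive.root)
example_walk = [
    ("1k27bitdRp41AtHq1xupyqwKaTLzrMUMu", ["results"], ["SK_v5_reference-points_EUNIS2012.csv"]),
    ("1k27bitdRp41AtHq1xupyqwKaTLzrMUMu/results", [], ["summary.csv", "notes.txt"]),
]
print(list_files(example_walk, suffix=".csv"))

# with a live connection:
# csv_paths = list_files(gdrive.walk(gdrive.root), suffix=".csv")
```
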

In [None]:
# we can now interact with the data quite easily, e.g. read a CSV file with pandas
# Note: a path always starts from the entry point (gdrive.root), followed by the sub-folders and the file name, separated by "/"
with gdrive.open(gdrive.root + "/" + 'SK_v5_reference-points_EUNIS2012.csv', 'rb') as f:
    df = pd.read_csv(f)
df.head()
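
Since every path has to start from the entry point and continue with the sub-folders and the file name separated by "/", a tiny helper avoids assembling the string by hand. `gdrive_path` below is a hypothetical convenience function, not a pydrive2 API; it strips stray slashes from each part and joins everything with `posixpath.join`.

```python
import posixpath


def gdrive_path(root: str, *parts: str) -> str:
    """Build a GDriveFileSystem path: entry point (folder ID) + sub-folders + file name."""
    # strip leading/trailing slashes so every part is joined with exactly one "/"
    cleaned = [part.strip("/") for part in parts if part.strip("/")]
    return posixpath.join(root.rstrip("/"), *cleaned)


print(gdrive_path("1k27bitdRp41AtHq1xupyqwKaTLzrMUMu", "SK_v5_reference-points_EUNIS2012.csv"))

# with a live connection (same file as in the cell above):
# with gdrive.open(gdrive_path(gdrive.root, "SK_v5_reference-points_EUNIS2012.csv"), "rb") as f:
#     df = pd.read_csv(f)
```
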


Next, we repeat the same steps using the WEED_storage object.

In [None]:
storage = WEED_storage(username='deroob')

In [None]:
import json

# client_json expects a JSON string, so serialise the credentials provided by the storage object
gdrive = GDriveFileSystem("1k27bitdRp41AtHq1xupyqwKaTLzrMUMu",
                          use_service_account=True,
                          client_json=json.dumps(storage.gdrive_credentials))

In [None]:
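# list files and folders again; the listing should match the one obtained with the Vault credentials
for root, dnames, fnames in gdrive.walk(gdrive.root):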
    print(root, dnames, fnames)
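
The introduction mentions that benchmarking results should also be written back to the GDrive. fsspec file systems are normally written through `open(path, "wb")`; the sketch below assumes that pydrive2's `GDriveFileSystem` accepts that mode for the installed version and that the service account has write access to the target folder. `upload_csv` and the file name `benchmark_results.csv` are hypothetical. To keep the example runnable without a GDrive connection, it is demonstrated with a minimal in-memory stand-in for the file system.

```python
import csv
import io
from contextlib import contextmanager
from typing import Iterable, Sequence


def upload_csv(fs, path: str, header: Sequence[str], rows: Iterable[Sequence]) -> None:
    """Write rows as a CSV file to `path` on an fsspec-style file system.

    Assumes fs.open(path, "wb") returns a writable binary file object
    (the usual fsspec convention; assumed, not verified, for GDriveFileSystem).
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(header)
    writer.writerows(rows)
    with fs.open(path, "wb") as f:
        f.write(buffer.getvalue().encode("utf-8"))


class InMemoryFS:
    """Minimal stand-in for a file system, used only to demonstrate upload_csv."""

    def __init__(self):
        self.files = {}

    @contextmanager
    def open(self, path, mode="rb"):
        if mode == "wb":
            buffer = io.BytesIO()
            yield buffer
            # store the written bytes once the with-block finishes
            self.files[path] = buffer.getvalue()
        else:
            yield io.BytesIO(self.files[path])


demo_fs = InMemoryFS()
upload_csv(demo_fs, "1k27bitdRp41AtHq1xupyqwKaTLzrMUMu/benchmark_results.csv",
           header=["site", "score"], rows=[["A", 0.9], ["B", 0.7]])

# with a live connection (assuming write support, see above):
# upload_csv(gdrive, gdrive.root + "/" + "benchmark_results.csv", header, rows)
```
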