# Google Workspace API

APIs exist for almost every Google product, including Drive, Gmail, Calendar, and Keep, as well as all the Cloud products. You can call an API by sending raw HTTP requests or through a client library in a language of your choice, such as Python or Java.

Dedicated Python client libraries are available for most Cloud products. For other APIs, such as Drive, there is a **common** library, `google-api-python-client`, that lets you call the APIs from Python.

To use it you need a Google Cloud project. Projects are managed in the **devconsole** ([link](https://console.cloud.google.com/)).

Create a project and enable the API you want to use.

## Google Drive API

API intro: <https://developers.google.com/drive/api/guides/about-sdk>

Next, authorize API requests. Create authorization credentials for a user account. This gives you a JSON file containing your client secrets. Sending these secrets together with the project ID to the Google auth server authenticates you.

You also need a redirect URI, including the port, from which you send the request and receive the response. Finally, add test accounts; only those accounts can access the app.
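Before running the login flow it can help to check which project and redirect URIs the downloaded secrets file actually contains. The sketch below parses a made-up file and prints the non-secret fields. The key names (`installed`, `project_id`, `redirect_uris`) are an assumption based on the usual layout of a desktop-app secrets file, and every value is a placeholder.

```python
import json

# Placeholder content shaped like a desktop-app client secrets file.
# Every value is made up; the key names are an assumption about the
# usual layout of the file downloaded from the console.
SAMPLE_CLIENT_SECRET = """
{
  "installed": {
    "client_id": "1234567890-example.apps.googleusercontent.com",
    "project_id": "my-example-project",
    "auth_uri": "https://accounts.google.com/o/oauth2/auth",
    "token_uri": "https://oauth2.googleapis.com/token",
    "client_secret": "NOT-A-REAL-SECRET",
    "redirect_uris": ["http://localhost"]
  }
}
"""


def summarize_client_secret(raw_json):
    """Return the non-secret fields worth checking before running the flow."""
    data = json.loads(raw_json)
    # Desktop apps are assumed to use the "installed" key, web apps "web".
    app_type = "installed" if "installed" in data else "web"
    config = data[app_type]
    return {
        "app_type": app_type,
        "project_id": config.get("project_id"),
        "redirect_uris": config.get("redirect_uris", []),
    }


summary = summarize_client_secret(SAMPLE_CLIENT_SECRET)
print(summary)
```

The function deliberately leaves `client_secret` out of its result, so the summary is safe to print or log.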


- **Understanding**
    - The API exposes `REST Resources`, which are like items. In the Workspace APIs these are things such as `drives`, `files`, and `comments`.
    - Each resource has `methods`, such as `files.update()`.
    - Each method has
        - Path parameters - `files().update(fileId='')`
        - Query parameters - `files().update(addParents=folder_id, removeParents=previous_parents, fields='id, parents')`
        - A response
    - e.g. `file = service.files().update(fileId=file_id, addParents=folder_id, removeParents=previous_parents, fields='id, parents').execute()`

- References
    - https://codelabs.developers.google.com/codelabs/gsuite-apis-intro
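
To make the resource/method/parameter split concrete, here is a minimal sketch of how a `files.update` call maps onto a REST URL: the path parameter goes into the URL path and the remaining arguments become the query string. The helper name `build_update_url` and the placeholder IDs are hypothetical; the Python client builds this request for you, so this is only an illustration.

```python
from urllib.parse import quote, urlencode

DRIVE_V3_BASE = "https://www.googleapis.com/drive/v3"


def build_update_url(file_id, **query_params):
    """Build the URL that a files.update call would target.

    file_id is the path parameter; everything else becomes a query parameter.
    """
    path = f"{DRIVE_V3_BASE}/files/{quote(file_id, safe='')}"
    # Drop parameters that were not supplied.
    params = {k: v for k, v in query_params.items() if v is not None}
    return f"{path}?{urlencode(params)}" if params else path


url = build_update_url(
    "FILE_ID",
    addParents="NEW_FOLDER_ID",
    removeParents="OLD_FOLDER_ID",
    fields="id, parents",
)
# files.update is sent as an HTTP PATCH to this URL.
print("PATCH", url)
```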


`pip install --upgrade google-api-python-client google-auth-httplib2 google-auth-oauthlib`

```sh
curl \
  'https://www.googleapis.com/drive/v2/files?key=[YOUR_API_KEY]' \
  --header 'Authorization: Bearer [YOUR_ACCESS_TOKEN]' \
  --header 'Accept: application/json' \
  --compressed
```


API Doc Web-Requests: https://developers.google.com/drive/api/v2/reference/files/list

Codelab: https://codelabs.developers.google.com/codelabs/gsuite-apis-intro/

Python Quick Start: https://developers.google.com/drive/api/quickstart/python

Search files query string: https://developers.google.com/drive/api/guides/search-files

Move files and folders: https://developers.google.com/drive/api/guides/folder

Batch/bulk requests: https://github.com/googleapis/google-api-python-client/blob/main/docs/batch.md

Sample file resource keys (shown as a Python dict, hence the single quotes and `True`/`False`):
```py
{'kind': 'drive#file',
 'id': 'flrtahuli58ypw9843hptq9834htdfsfd',
 'name': 'fdsfklansd;fkjs.810073.jpg',
 'mimeType': 'image/jpeg',
 'description': '',
 'starred': False,
 'trashed': False,
 'explicitlyTrashed': False,
 'parents': ['dkasfnu35i4h5r948p'],
 'spaces': ['photos', 'drive'],
 'version': '2434325',
 'webContentLink': 'https://drive.google.com/uc?id=flrtahuli58ypw9843hptq9834htdfsfd&export=download',
 'webViewLink': 'https://drive.google.com/file/d/flrtahuli58ypw9843hptq9834htdfsfd/view?usp=drivesdk',
 'iconLink': 'https://drive-thirdparty.googleusercontent.com/34/type/image/jpeg',
 'hasThumbnail': True,
 'thumbnailLink': 'https://lh3.googleusercontent.com/Ifgklndflgkndflkgndflkgndflkjg',
 'thumbnailVersion': '2',
 'viewedByMe': False,
 'createdTime': '2017-06-26T06:47:32.000Z',
 'modifiedTime': '2017-06-28T05:46:42.000Z',
 'modifiedByMeTime': '2017-06-28T05:46:42.000Z',
 'modifiedByMe': True,
 'owners': [{'kind': 'drive#user',
   'displayName': 'James Smith',
   'photoLink': 'https://lh3.googleusercontent.com/a/fkjdsngoierutho4',
   'me': True,
   'permissionId': '57808692789637',
   'emailAddress': 'johndoe@gmail.com'}],
 'lastModifyingUser': {'kind': 'drive#user',
  'displayName': 'James John Smith Doe',
  'photoLink': 'https://lh3.googleusercontent.com/a/Adfkdsjgoi5jotir0s64',
  'me': True,
  'permissionId': '57808692789637',
  'emailAddress': 'johndoe@gmail.com'},
 'shared': False,
 'ownedByMe': True,
 'capabilities': {'canAcceptOwnership': False,
  'canAddChildren': False,
  'canAddMyDriveParent': False,
  'canChangeCopyRequiresWriterPermission': True,
  'canChangeSecurityUpdateEnabled': False,
  'canChangeViewersCanCopyContent': True,
  'canComment': True,
  'canCopy': True,
  'canDelete': True,
  'canDownload': True,
  'canEdit': True,
  'canListChildren': False,
  'canModifyContent': True,
  'canModifyContentRestriction': True,
  'canModifyLabels': False,
  'canMoveChildrenWithinDrive': False,
  'canMoveItemIntoTeamDrive': True,
  'canMoveItemOutOfDrive': True,
  'canMoveItemWithinDrive': True,
  'canReadLabels': False,
  'canReadRevisions': True,
  'canRemoveChildren': False,
  'canRemoveMyDriveParent': True,
  'canRename': True,
  'canShare': True,
  'canTrash': True,
  'canUntrash': True},
 'viewersCanCopyContent': True,
 'copyRequiresWriterPermission': False,
 'writersCanShare': True,
 'permissions': [{'kind': 'drive#permission',
   'id': '90683509764809',
   'type': 'user',
   'emailAddress': 'johndoe@gmail.com',
   'role': 'owner',
   'displayName': 'John Doe',
   'photoLink': 'https://lh3.googleusercontent.com/a/5986h4ngtrfidkjg-oei5jyoit4=s64',
   'deleted': False,
   'pendingOwner': False}],
 'permissionIds': ['5874508934752'],
 'originalFilename': '2342.3453.jpg',
 'fullFileExtension': 'jpg',
 'fileExtension': 'jpg',
 'md5Checksum': 'dfmkni8eo45r986yw539u8yugwher',
 'size': '3435w34',
 'quotaBytesUsed': '0',
 'headRevisionId': 'flgkj4p58y6uh3098whpe9rusfNILUGOU84u59tef',
 'imageMediaMetadata': {'width': 1204,
  'height': 1603,
  'rotation': 0,
  'time': '2017:06:26 12:17:32',
  'colorSpace': 'sRGB'},
 'isAppAuthorized': False}
 ```
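
Note that `size` and `quotaBytesUsed` arrive as strings in the sample above, which is why later cells compare against `"0"` rather than `0`. A minimal sketch for converting them to integers follows; `parse_file_sizes` is a hypothetical helper, and it returns `None` for fields that are missing or not numeric.

```python
def parse_file_sizes(file_resource):
    """Return (size, quota_bytes_used) as ints, or None when absent or non-numeric."""

    def to_int(value):
        try:
            return int(value)
        except (TypeError, ValueError):
            return None

    return to_int(file_resource.get("size")), to_int(file_resource.get("quotaBytesUsed"))


# Made-up file resource with only the fields needed here.
sample = {"name": "photo.jpg", "size": "34355", "quotaBytesUsed": "0"}
size, quota = parse_file_sizes(sample)
print(size, quota)
```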

In [None]:
#from __future__ import print_function

import os.path, json

from google.auth.transport.requests import Request
from google.oauth2.credentials import Credentials
from google_auth_oauthlib.flow import InstalledAppFlow
from googleapiclient.discovery import build
from googleapiclient.errors import HttpError

In [None]:
# If modifying these scopes, delete the file token.json.

SCOPES = [
    'https://www.googleapis.com/auth/drive.metadata.readonly',
    'https://www.googleapis.com/auth/drive.file',
    'https://www.googleapis.com/auth/drive',
    'https://www.googleapis.com/auth/drive.appdata']

# We get access to these scopes only.

In [None]:
"""Authorization"""

creds = None
# The file token.json stores the user's access and refresh tokens, and is
# created automatically when the authorization flow completes for the first
# time.
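if os.path.exists('./secrets/token.json'):
    try:
        # Read the token from the same path that is checked above and written below.
        creds = Credentials.from_authorized_user_file('./secrets/token.json', SCOPES)
        print('authentication exists')
    except Exception as error:
        print(f'token read failed: {error}')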
# If there are no (valid) credentials available, let the user log in.
if not creds or not creds.valid:
    if creds and creds.expired and creds.refresh_token:
        print('getting authenticated')
        creds.refresh(Request())
    else:
        flow = InstalledAppFlow.from_client_secrets_file(
            'no-git/client_secret.json', SCOPES)
        creds = flow.run_local_server(port=44599)
    # Save the credentials for the next run
    with open('./secrets/token.json', 'w') as token:
        token.write(creds.to_json())
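
The scopes cell says to delete `token.json` whenever the scopes change. That check could be automated along the lines of the sketch below, which compares the scopes stored in the token file with the ones you now need. It assumes the file is the JSON written by `creds.to_json()` and that it contains a `scopes` list; `token_covers_scopes` is a hypothetical helper, and the demo uses a throwaway file rather than a real token.

```python
import json
import os
import tempfile


def token_covers_scopes(token_path, required_scopes):
    """Return True if the saved token file lists every required scope.

    Assumes a JSON file with a "scopes" list; returns False when the
    file is missing, unreadable as JSON, or has no such key.
    """
    if not os.path.exists(token_path):
        return False
    with open(token_path) as fh:
        try:
            saved = json.load(fh)
        except json.JSONDecodeError:
            return False
    return set(required_scopes) <= set(saved.get("scopes") or [])


# Demo with a throwaway file standing in for token.json.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "token.json")
    with open(demo_path, "w") as fh:
        json.dump({"scopes": ["https://www.googleapis.com/auth/drive"]}, fh)
    covered = token_covers_scopes(demo_path, ["https://www.googleapis.com/auth/drive"])
    missing = token_covers_scopes(
        demo_path, ["https://www.googleapis.com/auth/drive.appdata"]
    )
print(covered, missing)
```

The comparison is literal: it does not know whether a broad scope already implies a narrower one, so it may ask for a fresh login more often than strictly needed.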

In [None]:
# Start Service
service = build('drive', 'v3', credentials=creds)


In [None]:
"""Print the first 10 files returned by the API"""

try:
    service = build('drive', 'v3', credentials=creds)

    # Call the Drive v3 API
    results = service.files().list(
        pageSize=10, fields="nextPageToken, files(id, name)").execute()
    
    # results is a dict with nextPageToken and files

    items = results.get('files', [])

    if not items:
        print('No files found.')
    else:
        print('Files:')
        for item in items:
            print(f"{item['name']} - ({item['id']})")
        
except HttpError as error:
    # TODO(developer) - Handle errors from drive API.
    print(f'An error occurred: {error}')

## More Variations and Implementation

The root parent ID is `dlksfngaskljgna`, so the matching query clause is `and 'dlksfngaskljgna' in parents`.

search help https://developers.google.com/drive/api/guides/search-files
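
Query strings like the one above can be assembled with a small helper instead of being typed by hand. The sketch below is one way to do it; `build_query` and `escape_query_value` are hypothetical names, and the escaping (a backslash before single quotes and backslashes) follows my reading of the search guide, so verify it against the docs before relying on it.

```python
def escape_query_value(value):
    """Escape backslashes and single quotes for a Drive search string."""
    return value.replace("\\", "\\\\").replace("'", "\\'")


def build_query(name_contains=None, parent_id=None, trashed=None):
    """Join the supplied conditions with 'and'; omitted ones are skipped."""
    clauses = []
    if name_contains is not None:
        clauses.append(f"name contains '{escape_query_value(name_contains)}'")
    if parent_id is not None:
        clauses.append(f"'{escape_query_value(parent_id)}' in parents")
    if trashed is not None:
        clauses.append(f"trashed = {'true' if trashed else 'false'}")
    return " and ".join(clauses)


query = build_query(name_contains=".jpg", parent_id="dlksfngaskljgna", trashed=False)
print(query)
```

The result can be passed as the `q` argument of `service.files().list()`.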

open folder `https://drive.google.com/drive/folders/<file_id>`

open file `https://drive.google.com/uc?id=<file_id>`

method - https://developers.google.com/drive/api/v3/reference/files

- `service.files().list().execute()` returns `kind`, `id`, `name`, and `mimeType` for each file, plus `nextPageToken`.
- `service.files().get(fileId="<file_id>", fields='*').execute()` gets a single file with all of its fields.



In [None]:
import pickle, pandas as pd

In [None]:
%%time

"""
find all jpg
    - in parent folder, id = "dlksfngaskljgna"
    - quotaBytesUsed is fetched for each file; rows with quotaBytesUsed = 0
      are picked out afterwards with pandas
"""

parent_folder_id = "dlksfngaskljgna"
page_token = None # which page to get
searched_files = []# list of files

while True:
    response = service.files().list(q=f"name contains '.jpg' and '{parent_folder_id}' in parents ",
                                    spaces='drive',
                                    pageSize=1000,
                                    fields='nextPageToken, files(id, name, quotaBytesUsed, size, webViewLink)',
                                    pageToken=page_token).execute()
    files = response.get('files', [])
    searched_files.extend( files )
    # print('Files added ' + str(len(files)) )
    page_token = response.get('nextPageToken', None)
    if page_token is None:
        break
print(f"Total files searched: {len(searched_files)}")

df = pd.DataFrame(data=searched_files)
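
The paging loop above is repeated later in these notes, so it may be worth wrapping in a generator. This is a minimal sketch: `iter_files` is a hypothetical helper that assumes `service` is the Drive client built earlier and forwards any keyword arguments to `files().list()`.

```python
def iter_files(service, **list_kwargs):
    """Yield file dicts from files().list(), following nextPageToken."""
    page_token = None
    while True:
        response = service.files().list(pageToken=page_token, **list_kwargs).execute()
        yield from response.get("files", [])
        page_token = response.get("nextPageToken")
        if page_token is None:
            break


# Usage with the real client (needs valid credentials):
# searched_files = list(iter_files(
#     service,
#     q="name contains '.jpg'",
#     pageSize=1000,
#     fields="nextPageToken, files(id, name, quotaBytesUsed)",
# ))
```

Remember to keep `nextPageToken` in `fields`; without it the loop stops after the first page.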

In [None]:
# Use a context manager so the file is closed even if pickling fails.
with open("jpeg_file_list.pickle", "wb") as pickle_out:
    pickle.dump(searched_files, pickle_out)

In [None]:
df.head()

In [None]:
df[df["quotaBytesUsed"]!="0"]

In [None]:
# Get file details

# service.files().get(fileId="1xTio0pLR_5tauRtsuawzthS4cFnCgBQrVQ", fields='*').execute()

# Move file with size

In [None]:
file_metadata = {
    'name': 'to-be-backed',
    'mimeType': 'application/vnd.google-apps.folder'
}

In [None]:
file = service.files().create(body=file_metadata, fields='id').execute()
print(F'Folder ID: "{file.get("id")}".')

In [None]:
file_id = "flknsdgio4tjh93"
folder_id = "sdkfngerio-dsaflknaowi4"
# Retrieve the existing parents to remove
file = service.files().get(fileId=file_id, fields='parents').execute()
previous_parents = ",".join(file.get('parents'))
print(f"Previous parents of file is: {previous_parents}")
# Move the file to the new folder
file = service.files().update(fileId=file_id, addParents=folder_id,
                          removeParents=previous_parents,
                          fields='id, parents').execute()
print(f"New parents of file is: {file.get('parents')}")

In [None]:
file_not_move = []
def move_file_to_folder(file_id, folder_id):
    """Move the specified file to the specified folder.

    Args:
        file_id: ID of the file to move.
        folder_id: ID of the destination folder.

    Prints the previous and new parent IDs of the file.
    Returns None. IDs of files that could not be moved are appended to
    the module-level list `file_not_move`.
    """

    try:

        # Retrieve the existing parents to remove
        print('-------------------')
        print(f'Moving file: {file_id}')
        file = service.files().get(fileId=file_id, fields='parents').execute()
        previous_parents = ",".join(file.get('parents'))
        print(f'Current parent: {previous_parents}')
        # Move the file to the new folder
        file = service.files().update(fileId=file_id, addParents=folder_id,
                                      removeParents=previous_parents,
                                      fields='id, parents').execute()
        print(f'New parent: {file.get("parents")}')
        return None

    except HttpError as error:
        print(F'An error occurred: {error}')
        file_not_move.append(file_id)
        return None



In [None]:
df_has_size = df[df["quotaBytesUsed"]!="0"]

In [None]:
df_has_size.iloc[:2]

In [None]:
folder_id = "sfwrt3-23423"
df_has_size.iloc[21:].apply(lambda x: move_file_to_folder(x.id, folder_id), axis=1)

In [None]:
file_not_move

Perfect, the root folder now has no JPEG with size > 0. Time to delete all JPEGs in ROOT with a file size of 0.

# Delete file with 0 size

- Iteration 1
    - Total files searched: 23775
    - CPU times: user 219 ms, sys: 82.9 ms, total: 302 ms
    - Wall time: 1min
    - df.shape = (23775, 5)
    - delete time
        - CPU times: user 2.57 s, sys: 653 ms, total: 3.23 s
        - Wall time: 8min 38s
        - file_not_deleted = 0
        - len(file_deleted) = 707. why?

In [None]:
%%time
"""
find all jpg again
    - now no file should have a size greater than 0
    - in parent folder, id = "34r23t32trt23"
    - having quotaBytesUsed = 0
"""
parent_folder_id = "34r23t32trt23"
page_token = None # which page to get
searched_files = []# list of files

while True:
    response = service.files().list(q=f"name contains '.jpg' and '{parent_folder_id}' in parents  and trashed = false",
                                    spaces='drive',
                                    pageSize=1000,
                                    fields='nextPageToken, files(id, name, trashed, quotaBytesUsed, size, webViewLink)',
                                    pageToken=page_token).execute()
    files = response.get('files', [])
    searched_files.extend( files )
    # print('Files added ' + str(len(files)) )
    page_token = response.get('nextPageToken', None)
    if page_token is None:
        break
print(f"Total files searched: {len(searched_files)}")

df = pd.DataFrame(data=searched_files)

In [None]:
df.shape

In [None]:
df.head()

In [None]:
df[df["quotaBytesUsed"]!="0"].shape

In [None]:
df_to_delete = df[ (df['trashed'] == False) & (df['quotaBytesUsed'] == '0') ]
df_to_delete.shape

In [None]:
df.iloc[1]

In [None]:
file_id = 'fsdge4t34536tyewrhdgfds'

# Get file details

service.files().get(fileId=file_id, fields='trashed, id').execute()

In [None]:
# Save results to csv

df.to_csv('jpg-in-root-size-0-iter-2.csv')

In [None]:
# file delete function

file_not_deleted = []
file_deleted = []
def delete_file(file_id):
    """Deletes file by file_id"""
    try:
        service.files().delete(fileId=file_id).execute()
        file_deleted.append(file_id)
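    except Exception as error:
        # Catch Exception instead of using a bare except so Ctrl+C can still stop a long run.
        print(f'Error deleting file "{file_id}": {error}')
        file_not_deleted.append(file_id)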
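
Some deletions fail only temporarily, for example when a rate limit is hit. One common remedy is to retry with exponential backoff, sketched below. `call_with_backoff` is a hypothetical helper, not part of the client library, and it retries on any exception for brevity; real code would probably retry only on errors known to be transient.

```python
import random
import time


def call_with_backoff(func, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call func(); on failure wait base_delay * 2**attempt plus jitter, then retry.

    The last failure is re-raised once max_attempts is reached.
    """
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))


# Usage with the real client (needs valid credentials):
# call_with_backoff(lambda: service.files().delete(fileId=file_id).execute())
```

The client's `execute()` also accepts a `num_retries` argument, which may be enough on its own for simple cases.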

In [None]:
%%time

# Delete files

df_to_delete.iloc[:].apply(lambda x: delete_file(x.id), axis=1)

# Error deleting file "sfdfs3425242-g".
# CPU times: user 1min 38s, sys: 31.3 s, total: 2min 10s
# Wall time: 3h 55min 11s
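
Deleting one file per HTTP request is what makes this cell take hours. The batch guide linked earlier describes grouping calls into a single request, which might cut the wall time considerably. The sketch below is one way to apply it to deletions: `delete_in_batches` is a hypothetical helper, it assumes `service` is the Drive client built earlier, and the default of 100 calls per batch is my understanding of the Drive limit, so check it against the current docs.

```python
def delete_in_batches(service, file_ids, batch_size=100):
    """Delete files in batches; return (deleted_ids, failed_ids)."""
    deleted, failed = [], []

    def on_result(request_id, response, exception):
        # request_id is the id passed to batch.add(); here it is the file id.
        (failed if exception is not None else deleted).append(request_id)

    for start in range(0, len(file_ids), batch_size):
        batch = service.new_batch_http_request(callback=on_result)
        for file_id in file_ids[start:start + batch_size]:
            batch.add(service.files().delete(fileId=file_id), request_id=file_id)
        batch.execute()
    return deleted, failed


# Usage with the real client (needs valid credentials):
# file_deleted, file_not_deleted = delete_in_batches(service, list(df_to_delete["id"]))
```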

In [None]:
len(file_not_deleted)

In [None]:
len(file_deleted)

In [None]:
import time

In [None]:
%%time
start = time.time()
time.sleep(2)
end = time.time()
total = end - start
print(f'time taken: {total}')