<a href="https://colab.research.google.com/github/MarCnu/gsheets_ml_scheduler/blob/main/colab_tutorials/GSheetsMLScheduler_All_Features.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# ALL FEATURES: Tutorial for the GSheetsMLScheduler library

https://github.com/MarCnu/gsheets_ml_scheduler  

A simple experiment scheduler that lets multiple Colab instances fetch machine learning experiment metaparameters by reading from and writing to a sheet in Google Docs Sheets.  

## A simple tool with only 3 features:  
1. Let your Colab script fetch a run config from a sheet
2. Check during a run whether the config has been updated (for example, to change the learning rate manually)
3. Let your Colab script write a list of future runs to the sheet (for example, for a metaparameter grid search)

Check the other Colab tutorial:  
![Colab Tutorial](https://colab.research.google.com/assets/colab-badge.svg) [BASIC USE: Tutorial for the GSheetsMLScheduler library](https://colab.research.google.com/drive/1JsnfMWknoiij5l5V1lQSdofWJxudJwSN#scrollTo=6R5navlTXOiV)

In [1]:
!pip install git+https://github.com/MarCnu/gsheets_ml_scheduler.git

Collecting git+https://github.com/MarCnu/gsheets_ml_scheduler.git
  Cloning https://github.com/MarCnu/gsheets_ml_scheduler.git to /tmp/pip-req-build-0_52pnkb
  Running command git clone --filter=blob:none --quiet https://github.com/MarCnu/gsheets_ml_scheduler.git /tmp/pip-req-build-0_52pnkb
  Resolved https://github.com/MarCnu/gsheets_ml_scheduler.git to commit 6e9d50991b7cc6b8fba2aaed04985cf0f19f3b56
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: gsheets-ml-scheduler
  Building wheel for gsheets-ml-scheduler (setup.py) ... done
  Created wheel for gsheets-ml-scheduler: filename=gsheets_ml_scheduler-1.1-py3-none-any.whl size=8758 sha256=14c88c1ac68a24294591c269cd87c14920c74ffd5564ace13f77a801ce13906c
  Stored in directory: /tmp/pip-ephem-wheel-cache-xswlunyi/wheels/fc/4c/5e/e68eb5663e3ce3dbfaa57e9c409758ac33d458f08cbf343bf8
Successfully built gsheets-ml-scheduler
Installing collected packages: gsheets-ml-scheduler
Successfully i

In [2]:
from gsheets_ml_scheduler.scheduler import GSheetsMLScheduler
from gsheets_ml_scheduler.run_writer import GSheetsMLRunWriter

import time

In [3]:
################
# Here is the basic template
# https://docs.google.com/spreadsheets/d/1HSmobuuXsOgUOM5cQ-ecHJS9hVrEj6D3AZG8gokbj6I/edit
# *** File > Make a copy
# *** Share > Get link (No need to give special read/write rights)
# Replace sheets_link with your own link
################
sheets_link = "https://docs.google.com/spreadsheets/d/1HSmobuuXsOgUOM5cQ-ecHJS9hVrEj6D3AZG8gokbj6I/edit"

# 1) Write new runs to the Sheet
```python
# gsheets_file_url (str): The sharing link of your Google Docs Sheets
# sheet_index (int, optional): In case you want to use a specific tab of the Google Docs Sheets
# comma_number_format (bool, optional): For Google Docs languages that use comma separators for decimal numbers ("-2,0" "5,0E-3")
# service_account_json_path (str, optional): To use Google Service Account to access the Google Docs Sheets API, mandatory if you're not using Colab
run_writer = GSheetsMLRunWriter(gsheets_file_url, sheet_index=0, comma_number_format=False, service_account_json_path=None)
# configs (list of dicts): A list of configs to be added to the Sheet
run_writer.write_runs(configs)
```
Let's use the `write_runs(configs)` method of `GSheetsMLRunWriter` to write new runs with the "ready" status to the Sheet, for example for a metaparameter grid search.

In [4]:
# Each time you restart a Colab session, a popup will ask for read/write rights again
# Use comma_number_format=True if your Google Docs Sheets language uses comma separators for decimal numbers ("-2,0" "5,0E-3")
run_writer = GSheetsMLRunWriter(sheets_link, comma_number_format=True)

Run Writer connected to GSheet
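
To illustrate what `comma_number_format` refers to, here is a minimal sketch of turning comma-decimal strings such as "-2,0" or "5,0E-3" into Python floats. The helper `parse_comma_number` is hypothetical and is not part of GSheetsMLScheduler; the library's own parsing may differ.

```python
def parse_comma_number(text):
    """Convert a comma-decimal string such as "5,0E-3" to a float.

    Hypothetical helper for illustration only; not part of the library.
    """
    # Swap the decimal comma for a dot so that float() can parse the string
    return float(text.strip().replace(",", "."))


print(parse_comma_number("-2,0"))    # -2.0
print(parse_comma_number("5,0E-3"))  # 0.005
```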


In [5]:
new_config_list = []
for n_epoch in ["",20,40]: # Use "" for the scheduler to use the config_key default value from the Line 2 of the Sheet
  for learning_rate in [1e-3, 1e-4, 1e-5]:
    for manual_run_stop in [""]:
      for manual_worker_stop in [""]:
        new_config_list.append({"n_epoch": n_epoch, "learning_rate": learning_rate, "manual_run_stop": manual_run_stop, "manual_worker_stop": manual_worker_stop})

# You can provide a custom run_name, otherwise, a line number is used as default
new_config_list[3]["run_name"] = "Custom Run Name"

# If the columns for the config keys "manual_run_stop" and "manual_worker_stop" don't exist, GSheetsMLRunWriter will create them
run_writer.write_runs(new_config_list)

# Advanced undocumented use: you can manually use the gspread API library
# Here, we write the default values for the newly created columns
_ = run_writer.sheet.update_cell(2, 1+run_writer.key_ids["manual_run_stop"], "FALSE")
_ = run_writer.sheet.update_cell(2, 1+run_writer.key_ids["manual_worker_stop"], "FALSE")

################
# Line 1 is reserved for naming config keys
# Line 2 is reserved for default_config values
# Lines below that are free to use for your runs
#
# Columns "run_name", "status" and "worker_name" are MANDATORY
# Column order doesn't matter (all is based on Line 1 column names)
################
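
The nested loops used to build the grid can also be written with `itertools.product`, which may be easier to extend when more metaparameters are added. This is only a sketch of an alternative way to build the same list of dicts; the variable name `grid_config_list` is an assumption, and the result would still be passed to `run_writer.write_runs(...)` as before.

```python
from itertools import product

n_epochs = ["", 20, 40]  # "" lets the scheduler fall back to the Sheet default
learning_rates = [1e-3, 1e-4, 1e-5]

# product() iterates the last argument fastest, like the innermost for loop
grid_config_list = [
    {
        "n_epoch": n_epoch,
        "learning_rate": learning_rate,
        "manual_run_stop": "",
        "manual_worker_stop": "",
    }
    for n_epoch, learning_rate in product(n_epochs, learning_rates)
]

print(len(grid_config_list))  # 9
```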

# 2) Check for config updates during a run and modify the "status" with a custom text

```python
# gsheets_file_url (str): The sharing link of your Google Docs Sheets
# sheet_index (int, optional): In case you want to use a specific tab of the Google Docs Sheets
# hardcoded_default_config (dict, optional): For static metaparameters not provided to the Sheet
# comma_number_format (bool, optional): For Google Docs languages that use comma separators for decimal numbers ("-2,0" "5,0E-3")
# service_account_json_path (str, optional): To use Google Service Account to access the Google Docs Sheets API, mandatory if you're not using Colab
scheduler = GSheetsMLScheduler(gsheets_file_url, sheet_index=0, hardcoded_default_config=None, comma_number_format=False, service_account_json_path=None)
```  
Let's use `hardcoded_default_config`. You can also use `comma_number_format`, as with the RunWriter.

In [6]:
# Use hardcoded_default_config to manage static metaparameters that are not provided to the Sheet
# Priority order: Sheet run config > Sheet default_config > hardcoded_default_config
hardcoded_default_config = {
    "n_epoch": 10,
    "learning_rate": 1e-3,
    "batch_size": 20,
    "n_layers": 5
}

scheduler = GSheetsMLScheduler(sheets_link, hardcoded_default_config=hardcoded_default_config, comma_number_format=True)

Scheduler connected to GSheets, its name is worker <Q5JbzF>
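
The priority order (Sheet run config > Sheet default_config > hardcoded_default_config) can be pictured as a layered dict merge. The sketch below is not the library's implementation: `merge_config` is a hypothetical helper, and treating an empty string as "not set" is an assumption based on how `""` is used for defaults in this tutorial.

```python
def merge_config(hardcoded_default_config, sheet_default_config, sheet_run_config):
    """Merge three config layers, lowest priority first.

    Hypothetical helper for illustration only; not part of the library.
    """
    merged = dict(hardcoded_default_config)
    # Later layers override earlier ones
    for layer in (sheet_default_config, sheet_run_config):
        for key, value in layer.items():
            if value != "":  # assumption: an empty cell means "not set"
                merged[key] = value
    return merged


hardcoded = {"n_epoch": 10, "learning_rate": 1e-3, "batch_size": 20, "n_layers": 5}
sheet_default = {"n_epoch": 10, "learning_rate": 1e-3}
sheet_run = {"n_epoch": "", "learning_rate": 1e-4}

example_config = merge_config(hardcoded, sheet_default, sheet_run)
print(example_config)
```

Here `learning_rate` comes from the run's own row, `n_epoch` from the Sheet defaults, and `batch_size` and `n_layers` from the hardcoded defaults.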


```python
# This can be used by multiple Colab instances (aka workers) in parallel
run_name, config = scheduler.find_claim_and_start_run()
# This both downloads the Sheet run config and changes the "status" at the same time
config, changed_keys = scheduler.sync_config_and_status(new_status_str=None)
# This sets the "status" column of the current run to "done"
scheduler.run_done(new_status_str="done")
```
You can manually change a value in the Sheet during the run and get Colab to check for changes.  
For example, this can be used to change the `learning_rate` manually while you monitor graphs in TensorBoard or Weights & Biases.  
Another use case demonstrated here is to manually stop a run or manually stop a worker.
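
Conceptually, `changed_keys` is the list of keys whose values differ between two successive reads of the config. The following sketch shows that idea with plain dicts; `diff_config` is a hypothetical helper and not necessarily how the library computes the list.

```python
def diff_config(old_config, new_config):
    """Return the keys whose value is new or differs from the previous read.

    Hypothetical helper for illustration only; not part of the library.
    """
    return [
        key
        for key in new_config
        if key not in old_config or old_config[key] != new_config[key]
    ]


before = {"n_epoch": 10, "learning_rate": 1e-3, "manual_run_stop": False}
after = {"n_epoch": 10, "learning_rate": 5e-4, "manual_run_stop": True}

print(diff_config(before, after))  # ['learning_rate', 'manual_run_stop']
```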

In [7]:
while True:
  # Find a run with the status "ready"
  run_name, config = scheduler.find_claim_and_start_run()
  if run_name is None:
    print("No more ready runs")
    break
  print(f"Starting run {run_name} with config={config}")

  # Your learning loop here
  n_epoch_or_interrupt_epoch = config["n_epoch"] # To display in run_done()

  for epoch in range(config["n_epoch"]):
    # Once in a while, we go check for updates and we also modify the "status"
    if epoch % 4 == 0:
      config, changed_keys = scheduler.sync_config_and_status(f'{epoch} / {config["n_epoch"]}')
      for key in changed_keys:
        print("Key", key, "has been modified. New value:", config[key])
      if config["manual_run_stop"] == True:
        print("Manual run stop triggered, abort this run to find another one")
        n_epoch_or_interrupt_epoch = epoch
        break

    # The content of your learning loop here
    time.sleep(0.5)

  # Replaces status "running" and sets a blue background
  scheduler.run_done(f'{n_epoch_or_interrupt_epoch} / {config["n_epoch"]}')
  if config["manual_worker_stop"] == True:
    print("Manual worker stop triggered, stop searching for next ready runs")
    break

Starting run 1 with config={'n_epoch': 10, 'learning_rate': 0.0005, 'batch_size': 20, 'n_layers': 5, 'manual_run_stop': False, 'manual_worker_stop': False}
Starting run 2 with config={'n_epoch': 10, 'learning_rate': 0.0001, 'batch_size': 20, 'n_layers': 5, 'manual_run_stop': False, 'manual_worker_stop': False}
Starting run 3 with config={'n_epoch': 10, 'learning_rate': 5e-05, 'batch_size': 20, 'n_layers': 5, 'manual_run_stop': False, 'manual_worker_stop': False}
Starting run 6 with config={'n_epoch': 10, 'learning_rate': 0.001, 'batch_size': 20, 'n_layers': 5, 'manual_run_stop': False, 'manual_worker_stop': False}
Starting run 7 with config={'n_epoch': 10, 'learning_rate': 0.0001, 'batch_size': 20, 'n_layers': 5, 'manual_run_stop': False, 'manual_worker_stop': False}
Starting run 8 with config={'n_epoch': 10, 'learning_rate': 1e-05, 'batch_size': 20, 'n_layers': 5, 'manual_run_stop': False, 'manual_worker_stop': False}
Key manual_run_stop has been modified. New value: True
Manual run s
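
To dry-run a training loop like the one above without touching a real Sheet, one option is a small in-memory stand-in that mimics the three scheduler methods. `FakeScheduler` below is a hypothetical test double, not part of GSheetsMLScheduler; it only imitates the return shapes shown in this tutorial, and returning `(None, None)` when no run is ready is an assumption.

```python
class FakeScheduler:
    """In-memory stand-in for the three scheduler methods used in the loop.

    Hypothetical test double for illustration only; not part of the library.
    """

    def __init__(self, runs):
        # runs: list of dicts with "run_name", "status" and "config" entries
        self.runs = runs
        self.current = None
        self._last_config = {}

    def find_claim_and_start_run(self):
        # Claim the first run whose status is "ready"
        for run in self.runs:
            if run["status"] == "ready":
                run["status"] = "running"
                self.current = run
                self._last_config = dict(run["config"])
                return run["run_name"], dict(run["config"])
        return None, None  # assumption: no ready run left

    def sync_config_and_status(self, new_status_str=None):
        # Optionally update the status, then report keys changed since last read
        if new_status_str is not None:
            self.current["status"] = new_status_str
        config = dict(self.current["config"])
        changed_keys = [k for k in config if config[k] != self._last_config.get(k)]
        self._last_config = config
        return config, changed_keys

    def run_done(self, new_status_str="done"):
        self.current["status"] = new_status_str
        self.current = None


runs = [
    {"run_name": "0", "status": "done", "config": {"n_epoch": 2, "manual_run_stop": False}},
    {"run_name": "1", "status": "ready", "config": {"n_epoch": 2, "manual_run_stop": False}},
]
fake = FakeScheduler(runs)
run_name, config = fake.find_claim_and_start_run()
runs[1]["config"]["manual_run_stop"] = True  # simulate a manual edit in the Sheet
config, changed_keys = fake.sync_config_and_status("1 / 2")
fake.run_done("1 / 2")
print(run_name, changed_keys, runs[1]["status"])
```

Because the fake exposes the same method names, the loop body can be exercised locally before pointing it at the real `scheduler` object.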

# 3) Miscellaneous methods and attributes

In [8]:
print("Colors", scheduler.colors) # You can change the colors

scheduler.download_data() # Manually downloads the gsheet data

print("Nb_runs", scheduler.nb_runs)
print("Keys", scheduler.keys)
print("Key_ids", scheduler.key_ids)
print("Config_keys", scheduler.config_keys)
print("Config_defaults", scheduler.config_defaults)
print("Values", scheduler.values)

# Use get_run_config(run_index) to get the run values merged with config_defaults (but not hardcoded_default_config)
print("\nAll run-status-configs")
for i in range(scheduler.nb_runs):
  print([scheduler.values["run_name"][i], scheduler.values["status"][i], scheduler.get_run_config(i)])

Colors {'running': {'red': 1.0, 'green': 0.93, 'blue': 0.8}, 'done': {'red': 0.8, 'green': 0.9, 'blue': 1.0}, 'default_text': {'red': 0.8, 'green': 0.8, 'blue': 0.8}, 'modified_text': {'red': 0.0, 'green': 0.7, 'blue': 0.12}}
Nb_runs 15
Keys ['run_name', 'status', 'worker_name', 'n_epoch', 'learning_rate', 'manual_run_stop', 'manual_worker_stop']
Key_ids {'run_name': 0, 'status': 1, 'worker_name': 2, 'n_epoch': 3, 'learning_rate': 4, 'manual_run_stop': 5, 'manual_worker_stop': 6}
Config_keys ['n_epoch', 'learning_rate', 'manual_run_stop', 'manual_worker_stop']
Config_defaults {'n_epoch': 10, 'learning_rate': 0.001, 'manual_run_stop': False, 'manual_worker_stop': False}
Values {'run_name': ['0', '1', '2', '3', '4', '5', '6', '7', '8', 'Custom Run Name', '10', '11', '12', '13', '14'], 'status': ['done', '10 / 10', '10 / 10', '10 / 10', '', '', '10 / 10', '10 / 10', '4 / 10', '8 / 20', 'ready', 'ready', 'ready', 'ready', 'ready'], 'worker_name': ['dF53M7', 'Q5JbzF', 'Q5JbzF', 'Q5JbzF', ''

In [9]:
# You can make manual read/write operations to the Sheet using the gspread library
# scheduler.all_sheets contains the gspread file root, if you want to access another tab of the file
# scheduler.sheet can be used to call all gspread functions
_ = scheduler.sheet.update_cell(2, 1+scheduler.key_ids["manual_run_stop"], "FALSE") # Sheet cell coordinates are 1-based: (1,1) is the top-left cell, not (0,0) as with Python indexing
_ = scheduler.sheet.update_cell(2, 1+scheduler.key_ids["manual_worker_stop"], "FALSE")
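
The `1+` offset above converts the 0-based column index stored in `key_ids` into the 1-based column number expected by `update_cell`. A minimal sketch of that conversion, using a hypothetical helper `sheet_cell_for_key` and a `key_ids` dict copied from the output shown earlier:

```python
def sheet_cell_for_key(key_ids, key, sheet_row):
    """Return the 1-based (row, column) pair for a config key.

    Hypothetical helper for illustration only; not part of the library.
    """
    # key_ids holds 0-based column indices, Sheet columns start at 1
    return sheet_row, 1 + key_ids[key]


key_ids = {"run_name": 0, "status": 1, "worker_name": 2, "n_epoch": 3,
           "learning_rate": 4, "manual_run_stop": 5, "manual_worker_stop": 6}

row, col = sheet_cell_for_key(key_ids, "manual_run_stop", sheet_row=2)
print(row, col)  # 2 6
```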

In [10]:
# The run_writer does much more basic data processing than the scheduler
# Prefer the scheduler whenever possible
# Read the source code to see which attributes/methods are part of GSheetsMLRunWriter
run_writer.sheet

<Worksheet 'Basic Template' id:0>