# Workflow Cost Estimator
This notebook demonstrates cost estimation for finished or in-progress workflows.

This is an experimental feature:
  - Cost estimates may not be accurate.
  - CPUs, memory, and runtime are pulled from Terra's FireCloud API
    [monitorSubmission](https://api.firecloud.org/#/Submissions/monitorSubmission) endpoint. This information is
    available for 42 days after workflow completion.
  - The GCP instance type is assumed to be a custom configuration of either the N1 or N2 machine type.

*author: Brian Hannafious, Genomics Institute, University of California Santa Cruz*


Install the newest version of [terra-notebook-utils](https://github.com/DataBiosphere/terra-notebook-utils).


In [None]:
%pip install --upgrade --no-cache-dir git+https://github.com/DataBiosphere/terra-notebook-utils


Define helper functions for listing submissions and estimating costs.


In [None]:
from terra_notebook_utils import costs, workflows

def list_submissions_chronological():
    # Sort on the date alone: sorting (date, dict) tuples raises TypeError when two dates tie
    yield from sorted(workflows.list_submissions(), key=lambda s: s['submissionDate'])

def cost_for_submission(submission_id: str):
    submission = workflows.get_submission(submission_id)
    for wf in submission['workflows']:
        # Number shards from 1 to keep track of scattered workflows
        shards = workflows.estimate_workflow_cost(submission_id, wf['workflowId'])
        for shard_number, shard_info in enumerate(shards, start=1):
            shard_info['workflow_id'] = wf['workflowId']
            shard_info['shard'] = shard_number
            yield shard_info

def estimate_job_cost(cpus: int, memory_gb: int, runtime_hours: float, preemptible: bool) -> float:
    # The estimator expects runtime in seconds
    return costs.GCPCustomN1Cost.estimate(cpus, memory_gb, runtime_hours * 3600, preemptible)


List submissions in chronological order.


In [None]:
for s in list_submissions_chronological():
    print(s['submissionId'], s['submissionDate'], s['status'])
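The monitorSubmission data behind these estimates is only kept for 42 days after completion, so it can help to flag older submissions before requesting costs for them. The sketch below assumes `submissionDate` is an ISO 8601 UTC timestamp such as `2020-06-12T18:05:39.000Z`. That format is an assumption about the API. The sketch also measures from the submission date rather than the completion date, so it only approximates the retention window. The sample records are hypothetical.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=42)  # retention window stated in the notebook's introduction


def parse_submission_date(date_str: str) -> datetime:
    # ASSUMPTION: dates look like "2020-06-12T18:05:39.000Z"; fromisoformat() before
    # Python 3.11 rejects a trailing "Z", so replace it with an explicit UTC offset
    return datetime.fromisoformat(date_str.replace("Z", "+00:00"))


def likely_expired(submission: dict, now: datetime) -> bool:
    # Submission date is a proxy for completion date, so this check is approximate
    return now - parse_submission_date(submission['submissionDate']) > RETENTION


# Hypothetical records; in the notebook these would come from list_submissions_chronological()
now = datetime(2020, 8, 1, tzinfo=timezone.utc)
sample = [
    {'submissionId': 'sub-a', 'submissionDate': '2020-05-01T12:00:00.000Z', 'status': 'Done'},
    {'submissionId': 'sub-b', 'submissionDate': '2020-07-20T12:00:00.000Z', 'status': 'Done'},
]
for s in sample:
    print(s['submissionId'], "likely expired" if likely_expired(s, now) else "within 42 days")
```

In the notebook you would pass `datetime.now(timezone.utc)` as `now`.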


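Estimate the cost of each workflow shard in a submission. Set `submission_id` to one of the IDs listed above.


In [None]: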
# submission_id = ""  # Uncomment and insert your submission id here
total_cost = 0
print("%37s" % "workflow_id",
      "%6s" % "shard",
      "%5s" % "cpus",
      "%12s" % "memory",
      "%13s" % "duration",
      "%7s" % "cost")
for shard_info in cost_for_submission(submission_id):
    total_cost += shard_info['cost']
    print("%37s" % shard_info['workflow_id'],
          "%6i" % shard_info['shard'],
          "%5i" % shard_info['number_of_cpus'],
          "%10iGB" % shard_info['memory'],
          "%12.2fh" % (shard_info['duration'] / 3600),  # convert from seconds to hours
          "%7s" % ("$%.2f" % shard_info['cost']))
print("%85s" % ("total_cost: $%.2f" % round(total_cost, 2)))
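For submissions with many scattered shards, a per-workflow subtotal can be easier to read than the per-shard table. The following minimal sketch groups shard records by `workflow_id` and sums `cost` and `duration` (seconds), using the same keys the table above reads. The records here are hypothetical stand-ins for what `cost_for_submission` yields. In the notebook you would pass a fresh `cost_for_submission(submission_id)` call, because the generator is consumed by the table loop.

```python
from collections import defaultdict
from typing import Dict, Iterable


def subtotal_by_workflow(shards: Iterable[dict]) -> Dict[str, dict]:
    """Sum shard count, runtime (converted from seconds to hours), and cost per workflow_id."""
    totals = defaultdict(lambda: {'shards': 0, 'hours': 0.0, 'cost': 0.0})
    for shard_info in shards:
        t = totals[shard_info['workflow_id']]
        t['shards'] += 1
        t['hours'] += shard_info['duration'] / 3600  # duration is reported in seconds
        t['cost'] += shard_info['cost']
    return dict(totals)


# Hypothetical shard records carrying the keys used in the per-shard table
shards = [
    {'workflow_id': 'wf-1', 'shard': 1, 'duration': 7200, 'cost': 0.50},
    {'workflow_id': 'wf-1', 'shard': 2, 'duration': 3600, 'cost': 0.25},
    {'workflow_id': 'wf-2', 'shard': 1, 'duration': 1800, 'cost': 0.10},
]
for workflow_id, t in subtotal_by_workflow(shards).items():
    print("%37s %6i %12.2fh %7s" % (workflow_id, t['shards'], t['hours'], "$%.2f" % t['cost']))
```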


Explore estimated costs for potential workflow configurations and runtimes. `estimate_job_cost` prices each configuration as a custom N1 machine.


In [None]:
# Define configurations for: cpus, memory(GB), runtime(hours), preemptible
configurations = [(10, 64, 5, False),
                  (8, 32, 10, False),
                  (10, 64, 5, True),
                  (8, 32, 10, True)]
print("%8s" % "cpus",
      "%8s" % "memory",
      "%8s" % "runtime",
      "%12s" % "preemptible",
      "%8s" % "cost")
for cpus, memory_gb, runtime_hours, preemptible in configurations:
    cost = estimate_job_cost(cpus, memory_gb, runtime_hours, preemptible)
    print("%8i" % cpus,
          "%6iGB" % memory_gb,
          "%7ih" % runtime_hours,
          "%12s" % str(preemptible),
          "%8s" % ("$%.2f" % cost))
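To see how much preemptible VMs might save for each configuration, the on-demand and preemptible estimates can be compared directly. The sketch takes the cost estimator as a parameter so it runs on its own. In the notebook you would pass `estimate_job_cost`. The `toy_estimator` below is a hypothetical stand-in with made-up rates, used only to make the example self-contained. Preemptible VMs can be interrupted, and any resulting retries add runtime that this comparison ignores.

```python
from typing import Callable, Iterator, List, Tuple

# Same argument order as estimate_job_cost: cpus, memory_gb, runtime_hours, preemptible
Estimator = Callable[[int, float, float, bool], float]


def preemptible_savings(configs: List[Tuple[int, float, float]],
                        estimate: Estimator) -> Iterator[Tuple[int, float, float, float, float, float]]:
    """Yield (cpus, memory_gb, runtime_hours, on_demand_cost, preemptible_cost, savings_fraction)."""
    for cpus, memory_gb, runtime_hours in configs:
        on_demand = estimate(cpus, memory_gb, runtime_hours, False)
        preempt = estimate(cpus, memory_gb, runtime_hours, True)
        savings = 1 - preempt / on_demand if on_demand else 0.0  # avoid dividing by zero
        yield cpus, memory_gb, runtime_hours, on_demand, preempt, savings


def toy_estimator(cpus: int, memory_gb: float, runtime_hours: float, preemptible: bool) -> float:
    # HYPOTHETICAL rates for illustration only; these are not real GCP prices
    cpu_rate, mem_rate = (0.01, 0.001) if preemptible else (0.03, 0.004)
    return (cpus * cpu_rate + memory_gb * mem_rate) * runtime_hours


for cpus, mem, hours, on_demand, preempt, savings in preemptible_savings([(10, 64, 5), (8, 32, 10)], toy_estimator):
    print("%4i %4iGB %4ih  $%.2f  $%.2f  %.0f%% cheaper" % (cpus, mem, hours, on_demand, preempt, 100 * savings))
```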


## Contributions
Contributions, bug reports, and feature requests are welcome on:
  - [terra-notebook-utils GitHub](https://github.com/DataBiosphere/terra-notebook-utils) for general functionality.
  - [bdcat_notebooks GitHub](https://github.com/DataBiosphere/bdcat_notebooks) for this notebook.