# Workspace Estimator

To understand your current Databricks workload, we need to gather some basic information, including:
- Jobs, with their average monthly run frequency and the average running time of their steps.
- Tasks within each job and the notebooks used by each step.
- Job categories: Data Engineering, Machine Learning, or Streaming.
- Cluster information (size, type).

\*All of this information is generated using the Databricks Python API and stays in your account for your review. After that, if you agree, we will provide the steps to share it with us.

## Requirements
1. Databricks host (should begin with https://). <br> 
Example: https://demo.cloud.databricks.com <br>
For more details, visit [workspace details](https://docs.databricks.com/workspace/workspace-details.html).
2. Token. For more details, see [Databricks Authentication](https://docs.databricks.com/dev-tools/api/latest/authentication.html).
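
As a minimal sketch of the host requirement above, the helper below normalizes a workspace address so that it begins with `https://` and ends with a single `/`. The function name `normalize_host` is hypothetical and is not part of the extractor package; it only illustrates the expected URL shape.

```python
from urllib.parse import urlparse


def normalize_host(host: str) -> str:
    """Return the workspace URL as 'https://<hostname>/' or raise ValueError.

    Hypothetical helper for illustration; not part of snow-workspace-extractor.
    """
    host = host.strip()
    if not host:
        raise ValueError("Empty Databricks host")
    # Accept a bare hostname by assuming https.
    if "://" not in host:
        host = f"https://{host}"
    parsed = urlparse(host)
    # Reject any scheme other than https, and URLs without a hostname.
    if parsed.scheme != "https" or not parsed.netloc:
        raise ValueError(f"Host must begin with https://, got: {host!r}")
    # Drop any path or query string copied from the address bar.
    return f"https://{parsed.netloc}/"


print(normalize_host("demo.cloud.databricks.com"))
```

Passing a value copied from the browser address bar, such as `https://demo.cloud.databricks.com/?o=123`, yields the same normalized URL.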

### Permissions
The workspace admin account requires the following permissions:
- Personal Access Tokens
- Workspace Visibility Control
- Cluster Visibility Control
- Job Visibility Control
- DBFS File Browser
### Workflows/Jobs Requirements
To be able to make an estimation, each job needs to have:
- A configured schedule.
- At least one successful execution in the last 60 days.
- If running in a staging environment or a production mirror, it must include:
  - Same machine configuration
  - Complete dataset
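
The first two requirements can be sketched as a simple check. This is an illustration only, not the logic the extractor itself applies. It assumes job and run dictionaries shaped like Databricks Jobs API 2.1 responses (`settings.schedule` on a job, `state.result_state` and an epoch-millisecond `start_time` on a run); the function name `job_is_estimable` is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def job_is_estimable(job: dict, runs: list, days: int = 60, now=None) -> bool:
    """Check for a schedule and at least one successful run in the last `days` days.

    Hypothetical helper; the dict shapes are assumed to follow Jobs API 2.1.
    """
    now = now or datetime.now(timezone.utc)
    # Run start times are assumed to be epoch milliseconds.
    cutoff_ms = int((now - timedelta(days=days)).timestamp() * 1000)
    has_schedule = "schedule" in job.get("settings", {})
    has_recent_success = any(
        run.get("state", {}).get("result_state") == "SUCCESS"
        and run.get("start_time", 0) >= cutoff_ms
        for run in runs
    )
    return has_schedule and has_recent_success


# Example with hand-written data: a scheduled job with one recent successful run.
now = datetime(2024, 1, 31, tzinfo=timezone.utc)
job = {"settings": {"schedule": {"quartz_cron_expression": "0 0 2 * * ?"}}}
recent_ms = int(datetime(2024, 1, 15, tzinfo=timezone.utc).timestamp() * 1000)
runs = [{"start_time": recent_ms, "state": {"result_state": "SUCCESS"}}]
print(job_is_estimable(job, runs, days=60, now=now))
```

The requirement about staging or production mirror environments (same machine configuration, complete dataset) cannot be verified from job metadata alone and is left out of this sketch.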


In [0]:
!pip install tqdm
!pip install snow-workspace-extractor

### Configure security settings.

In [0]:

dbutils.widgets.removeAll()
dbutils.widgets.dropdown("runs_for_last_days", "60", ["15", "30", "60"])
wkp_name_instructions = "please write workspace name"
dbutils.widgets.text("workspace_name", wkp_name_instructions)
# The following line tries to get the host from the current workspace. If that fails, set it manually to your desired host.
# host_name = "demo.azuredatabricks.net"
host_name = dbutils.notebook.entry_point.getDbutils().notebook().getContext().browserHostName().get()
url = f"https://{host_name}/" if host_name != '' else None
if url is None:
    raise Exception("Please provide the workspace url available in your address bar in the variable 'url' i.e. 'https://demo.azuredatabricks.net/'")
# The following line tries to get the token of the current notebook. If that fails, set it manually as in the example below.
# We advise against using an explicit token. Please store it in a secret scope.
# token = dbutils.secrets.get(scope='my-secrets', key='workload-query')
token = dbutils.notebook.entry_point.getDbutils().notebook().getContext().apiToken().get()
if token == '' or token is None:
    raise Exception("Please provide a token to query Databricks API. Please define it in the variable 'token'.")
workspace_name = dbutils.widgets.get('workspace_name')
workspace_name = 'workspace_name' if dbutils.widgets.get('workspace_name') == wkp_name_instructions else workspace_name
days = int(dbutils.widgets.get('runs_for_last_days'))

In [0]:
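import shutil
import random
# Assumed import paths for the snow-workspace-extractor package installed above; adjust them if your installed version differs.
from workspace_extractor import Sizing
from workspace_extractor.utils.util import Util

api_util = Util()
# Random 5-digit hex name for a scratch folder on the driver.
tmp_folder = '%05x' % random.randrange(16**5)
tmp_driver = f'file:///tmp/{tmp_folder}/'
filename = api_util.get_file_name(workspace_name)
driver_zip_filename = f'/tmp/{filename}'
driver_to_zip = f'/tmp/{tmp_folder}/'
driver_folder = f'file:///tmp/{tmp_folder}'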


dbutils.fs.mkdirs(tmp_driver)
client = Sizing(url, token, driver_to_zip)
client.get_metadata(days)
client.show_results(days)

compress_file_path = api_util.compress_folder_to_zip(driver_to_zip, driver_zip_filename, extension="zip", split_size_mb=1)
_, compress_file_name, _, _ = api_util.get_path_separated(compress_file_path)

zip_path = f'file:///tmp/{compress_file_name}'
zip_destination = 'dbfs:/FileStore/WAS_Tool/results'
dbutils.fs.mkdirs(zip_destination)
dbutils.fs.cp(zip_path, zip_destination)

In [0]:
from IPython.display import display as displayHTML, HTML
html = f'<html><div  style="display:flex;justify-content: center;"><a href=/files/WAS_Tool/results/{filename}.zip><button style="background-color:#249edc;color: #fff;border:1px solid #249edc;cursor:pointer;border-radius:45px;font-weight:800;line-height:18px;padding: 8px 16px" type="button">DOWNLOAD ZIP</button></a></div></html>'
displayHTML(HTML(html))
print(f"If the download button is not displayed, please open the following link: {url}files/WAS_Tool/results/{filename}.zip")
