# Databricks Connect Configuration

This notebook will take you through the steps to set up your connection to your Databricks cluster using Databricks Connect.

You will only need to run through this notebook once, after which you can start your notebooks with the `get_spark_context()` function.

## Get your personal cluster ID
As part of enabling Databricks Connect, the platform will have created a personal cluster for you to use.

Most of the connection details are already provided in your workspace as environment variables. The exception is the cluster ID, which you will need to look up manually.

Follow the steps below to find this ID, or ask your administrator if you don't have permission to access the workspace.

### Steps to follow

* The ID has the form `xxxx-xxxxxx-xxxxxxxx`, where every `x` is an alphanumeric character.
* Run the cell below to print your organisation's link to your Analytics Databricks workspace, then open that link.
* Click on the cluster that belongs to you. It should have a name similar to `singleuser-<your name>`.
* The cluster ID is in the URL of the page this takes you to. Alternatively, go to Configuration -> Advanced options -> Tags -> ClusterId.
* Copy this ID into the second cell below, then run that cell to set the `cluster_id` variable.

In [None]:
# Run this cell to print the URL, from which you can get your cluster ID
from os import environ
print(f"{environ['DATABRICKS_ADDRESS']}/?o={environ['DATABRICKS_ORG_ID']}#setting/clusters")

In [None]:
# Paste your cluster ID here so it can be written to your configuration
cluster_id = "xxxx-xxxxxx-xxxxxxxx"
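
Before moving on, it can help to check that the value you pasted has the expected shape. The sketch below is only an illustration: it assumes the `xxxx-xxxxxx-xxxxxxxx` form described in the steps above (groups of 4, 6 and 8 alphanumeric characters), and the names `CLUSTER_ID_PATTERN`, `PLACEHOLDER` and `looks_like_cluster_id` are hypothetical helpers, not part of Databricks Connect.

```python
import re

# Assumed pattern, based on the `xxxx-xxxxxx-xxxxxxxx` shape described above:
# three hyphen-separated groups of 4, 6 and 8 alphanumeric characters.
CLUSTER_ID_PATTERN = re.compile(r"[0-9A-Za-z]{4}-[0-9A-Za-z]{6}-[0-9A-Za-z]{8}")

# The unedited placeholder also fits the pattern, so it is rejected explicitly.
PLACEHOLDER = "xxxx-xxxxxx-xxxxxxxx"


def looks_like_cluster_id(value: str) -> bool:
    """Return True if `value` has the expected shape and is not the placeholder."""
    candidate = value.strip()
    if candidate == PLACEHOLDER:
        return False
    return CLUSTER_ID_PATTERN.fullmatch(candidate) is not None
```

Calling `looks_like_cluster_id(cluster_id)` after setting the variable gives a quick sanity check. A `False` result usually means the placeholder was left in place or the ID was copied incompletely.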

## Configure your environment
Run the cell below to write the configuration file that Databricks Connect reads.

In [None]:
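from json import dump
from os import environ

# Stop early if the cluster ID is empty or still the placeholder value
if not cluster_id or cluster_id == "xxxx-xxxxxx-xxxxxxxx":
    raise ValueError("Add your cluster ID to the cell above, and run it to set the variable")

# Everything except the cluster ID comes from environment variables
configuration = {
    "host": environ["DATABRICKS_ADDRESS"],
    "token": "",
    "cluster_id": cluster_id,
    "org_id": environ["DATABRICKS_ORG_ID"],
    "port": environ["DATABRICKS_PORT"]
}

# Write the configuration as JSON to ~/.databricks-connect
with open(f"{environ['HOME']}/.databricks-connect", "w") as dbc_config:
    dump(configuration, dbc_config, indent="  ")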
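
If you want to confirm that the file was written as intended, you can read it back. The following is a minimal sketch, assuming the file is plain JSON containing the five keys set in the cell above. `EXPECTED_KEYS` and `check_config` are hypothetical names used for illustration only.

```python
import json
from pathlib import Path

# Keys written by the configuration cell above.
EXPECTED_KEYS = {"host", "token", "cluster_id", "org_id", "port"}


def check_config(path: Path) -> dict:
    """Load the JSON configuration at `path` and confirm the expected keys exist."""
    with path.open() as config_file:
        loaded = json.load(config_file)
    missing = EXPECTED_KEYS - set(loaded)
    if missing:
        raise KeyError(f"Configuration is missing keys: {sorted(missing)}")
    return loaded
```

For example, `check_config(Path.home() / ".databricks-connect")` should return the dictionary you just wrote, on the assumption that `Path.home()` resolves to the same directory as the `HOME` environment variable.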


## Get your Azure permissions
If you have not already done so, use the `az` command-line tool to sign in to Azure.

Open a terminal using the JupyterLab launcher, and run `az login`. Follow the instructions to get yourself set up.


## Using Databricks Connect
Below is an example of how to get your Spark context and run commands in your Databricks workspace. You can use it as a starting point for your own notebooks.

In [None]:
# Example of how to use the spark context
from ingenii_databricks_connect import get_spark_context

spark = get_spark_context()

for database in spark.sql("SHOW DATABASES").collect():
    print(database.databaseName)