# Introduction to the Sama SDK and Databricks Connector Tutorial
In this tutorial, we'll show you how to set up and use the Sama SDK and Databricks Connector. These tools make it easy to create, monitor, and access annotated tasks within Databricks. We'll guide you through installation and configuration, then work through practical examples. Let's begin simplifying your workflow with these tools!

<img src="https://sama-documentation-assets.s3.amazonaws.com/databricks/images/partnerships/databricksXsama.png" object-fit="scale-down"/>

# Step 1: Requirements
**1.1 Installation requirements**
Before you begin, make sure you install Sama.
Running the following cell downloads and installs Sama and its dependencies.

In [None]:
%pip install sama

**Importing Sama Client**
Now you can import the Sama client.

In [None]:
from sama.databricks import Client

**1.2 Project requirements**

You need to specify:
*   Project ID
*   [API Key](https://accounts.sama.com/account/profile)

📘 **Note**: Your Sama Project Manager will provide you with the correct Project ID(s). The following instructions also assume that your Project Manager has already configured all the necessary Sama Project inputs and outputs.

# Step 2: Provide API Key and Project ID
To begin interacting with the Databricks Connector, you'll need to provide your API key and Project ID. These credentials are essential for establishing a connection.

Replace "YOUR_API_KEY" and "YOUR_PROJECT_ID" with the credentials you've obtained.

In [None]:
API_KEY = "YOUR_API_KEY" #@param {type:"string"}
PROJECT_ID = "YOUR_PROJECT_ID" #@param {type:"string"}

if not API_KEY:
    raise ValueError("API_KEY not set")
if not PROJECT_ID:
    raise ValueError("PROJECT_ID not set")

client = Client(API_KEY)
client.get_project_information(PROJECT_ID)

✅ This cell returns the project id, name, state, type, asset_country_code, project_group, description and client.
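
Hard-coding credentials in a notebook cell makes them easy to leak. As a minimal sketch, assuming you keep the values in environment variables, you could load and validate them as shown below. The variable names `SAMA_API_KEY` and `SAMA_PROJECT_ID` and the helper `load_credential` are hypothetical and not part of the Sama SDK. The helper also rejects the unreplaced placeholder strings, which a plain emptiness check lets through.

```python
import os


def load_credential(env_name, placeholder):
    """Return an environment variable's value, rejecting empty or placeholder values.

    Hypothetical helper for illustration; not part of the Sama SDK.
    """
    value = os.environ.get(env_name, "").strip()
    if not value or value == placeholder:
        raise ValueError(f"{env_name} not set")
    return value


# Demonstration values only; in practice these are set outside the notebook.
os.environ.setdefault("SAMA_API_KEY", "example-key")
os.environ.setdefault("SAMA_PROJECT_ID", "12345")

API_KEY = load_credential("SAMA_API_KEY", "YOUR_API_KEY")
PROJECT_ID = load_credential("SAMA_PROJECT_ID", "YOUR_PROJECT_ID")
```

The resulting `API_KEY` and `PROJECT_ID` can then be passed to `Client` and `get_project_information` exactly as in the cell above.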

# Step 3: Usage with Spark Dataframes

Now that you are set up and properly configured, you can start:

1. Creating tasks, using data from a Dataframe, in the Sama Platform to be picked up by the annotators and quality teams.
2. Monitoring batch, task, and project status.
3. Getting delivered tasks, which have been annotated and reviewed by our quality team, into a Dataframe.
4. Rejecting and deleting tasks.

📘 **Note**: As an alternative, the [Sama API](https://docs.sama.com/reference/documentation) is also available for all of these and more.

**3.1 Provide a Sample Dataset**

In this step, we'll provide you with a sample JSON dataset that you can use for this demo and testing purposes.
You can find the JSON schema for the different output types, in the task creation format, at [sama.helpjuice.com](https://sama.helpjuice.com/en_US/recipes/recipe-json).

In [None]:
import json

sample_data = [{
    'url': 'https://static.wikia.nocookie.net/speedracer/images/9/9a/Speed_Racer_behind_the_wheel.png',
    'name': 'speed_racer_img_1',
    'client_batch_id': 'speed_racer',
    'output_weather_condition': {'rain': '0', 'snow': '0', 'clear': '1'},
    'output_vehicle_image_annotation': {
        'layers': {
            'vector_tagging': [
                {
                    'shapes': [{
                        'tags': {
                            'transcription': '',
                            'type_of_vehicle': '1'
                        },
                        'type': 'rectangle',
                        'index': 1,
                        'points': [[51, 20], [150, 20], [51, 72], [150, 72]]
                    }],
                    'group_type': None
                },
                {
                    'shapes': [{
                        'tags': {
                            'transcription': '',
                            'type_of_vehicle': '1'
                        },
                        'type': 'rectangle',
                        'index': 2,
                        'points': [[160, 71], [199, 71], [160, 83], [199, 83]]
                    }],
                    'group_type': None
                }
            ]
        }
    }
},
{
    'url': 'https://media.comicbook.com/uploads1/2015/05/speed-racer-137552.jpg',
    'name': 'speed_racer_img_2'
},
{
    'url': 'https://upload.wikimedia.org/wikipedia/en/8/81/Speed_Racer_Family.jpg',
    'name': 'speed_racer_img_3'
},
{
    'url': 'https://upload.wikimedia.org/wikipedia/en/2/25/Speed_Racer_promotional_image.jpg',
    'name': 'speed_racer_img_4'
}
]
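
To make the nesting of the pre-annotation easier to follow, here is a small sketch that walks `layers` → `vector_tagging` → `shapes` and reports each rectangle's size. It assumes that every entry in `points` is an `[x, y]` pixel coordinate of one corner, which is how the sample values read but is not confirmed here. The helper name `rectangle_sizes` is hypothetical.

```python
def rectangle_sizes(task):
    """Return (index, width, height) for each rectangle shape in one task record."""
    annotation = task.get("output_vehicle_image_annotation", {})
    groups = annotation.get("layers", {}).get("vector_tagging", [])
    sizes = []
    for group in groups:
        for shape in group["shapes"]:
            if shape["type"] != "rectangle":
                continue
            # Assumption: each point is an [x, y] corner coordinate.
            xs = [x for x, _ in shape["points"]]
            ys = [y for _, y in shape["points"]]
            sizes.append((shape["index"], max(xs) - min(xs), max(ys) - min(ys)))
    return sizes


# A trimmed copy of the first sample record, so the example runs on its own.
example_task = {
    "name": "speed_racer_img_1",
    "output_vehicle_image_annotation": {
        "layers": {
            "vector_tagging": [
                {"shapes": [{"type": "rectangle", "index": 1,
                             "points": [[51, 20], [150, 20], [51, 72], [150, 72]]}],
                 "group_type": None},
                {"shapes": [{"type": "rectangle", "index": 2,
                             "points": [[160, 71], [199, 71], [160, 83], [199, 83]]}],
                 "group_type": None},
            ]
        }
    },
}

print(rectangle_sizes(example_task))  # [(1, 99, 52), (2, 39, 12)]
```

Records without a pre-annotation, such as the last three in the sample, simply yield an empty list.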

**3.2 Create a Dataframe using the sample data**

📘 **Note**: `spark.createDataFrame` can't convert output_vehicle_image_annotation to a MapType properly without a Dataframe schema. Convert it to a JSON string as an alternative.
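
As a rough illustration of why the JSON-string route is safe, the sketch below serializes a nested annotation with `json.dumps` and restores it with `json.loads`. The miniature `record` is made up for this example. Only the named key is converted, so flat fields stay untouched.

```python
import json

# Made-up miniature record with the same nesting as the sample data.
record = {
    "name": "example_img",
    "output_vehicle_image_annotation": {
        "layers": {"vector_tagging": [{"shapes": [], "group_type": None}]}
    },
}

key = "output_vehicle_image_annotation"

# Replace the nested dict with its JSON text; all other fields are copied as-is.
flat = {**record, key: json.dumps(record[key])}
print(flat[key])  # {"layers": {"vector_tagging": [{"shapes": [], "group_type": null}]}}

# Parsing the string gives back an equal structure (None becomes null and back).
restored = json.loads(flat[key])
print(restored == record[key])  # True
```

Because the column now holds a plain string, Spark can infer its type without an explicit schema.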

In [None]:
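#CONVERT the nested annotation to a JSON string, then CREATE a DataFrame using this modified data
key = 'output_vehicle_image_annotation'
modified_data = [{**row, key: json.dumps(row[key])} if key in row else row for row in sample_data]
df = spark.createDataFrame(modified_data)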

✅ Once you create tasks from this Dataframe, you can expect output similar to the following: `{'batch_id': 123456}`

In [None]:
df = client.get_task_status_to_table(spark, PROJECT_ID, task_id="task_id")
display(df)

✅ This cell returns a dataframe of task data and its current status. See docs for additional filters.

In [None]:
df = client.get_delivered_tasks_to_table(spark, PROJECT_ID, client_batch_id="651b4adaf97fd2713cabbba4", from_timestamp="2023-09-13T00:00:00.000Z")
display(df)

✅ This cell returns a dataframe of delivered (fully annotated) task data and answers. See docs for additional filters.
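
The `from_timestamp` value in the cell above looks like a UTC ISO 8601 string with millisecond precision and a `Z` suffix. Assuming that is the expected format (it is inferred from the example, not confirmed here), a value can be built from a Python `datetime` as sketched below. The helper name `to_utc_timestamp` is hypothetical.

```python
from datetime import datetime, timezone


def to_utc_timestamp(moment):
    """Format an aware datetime as UTC ISO 8601 with milliseconds and a Z suffix."""
    if moment.tzinfo is None:
        raise ValueError("moment must be timezone-aware")
    utc = moment.astimezone(timezone.utc)
    # strftime has no millisecond directive, so derive it from microseconds.
    return utc.strftime("%Y-%m-%dT%H:%M:%S.") + f"{utc.microsecond // 1000:03d}Z"


print(to_utc_timestamp(datetime(2023, 9, 13, tzinfo=timezone.utc)))  # 2023-09-13T00:00:00.000Z
```

Rejecting naive datetimes avoids silently treating local time as UTC.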

In [None]:
df = client.get_delivered_tasks_since_last_call_to_table(spark, PROJECT_ID, consumer="consumer value")
display(df)

✅ This cell returns a dataframe of delivered (fully annotated) task data and answers since the last call to this endpoint with a specific consumer key. See docs for additional filters.
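
To build intuition for the "since last call" behaviour, here is a toy in-memory model in which each consumer key keeps its own checkpoint. This is purely illustrative and is not Sama's implementation. The class `DeliveryFeed` is hypothetical, and returning everything on a consumer's first call is an assumption of this sketch.

```python
class DeliveryFeed:
    """Toy model of per-consumer checkpoints (not Sama's implementation)."""

    def __init__(self):
        self._deliveries = []   # delivered task IDs, in delivery order
        self._checkpoints = {}  # consumer key -> count of deliveries already returned

    def deliver(self, task_id):
        self._deliveries.append(task_id)

    def since_last_call(self, consumer):
        # Return only what this consumer has not seen, then advance its checkpoint.
        start = self._checkpoints.get(consumer, 0)
        self._checkpoints[consumer] = len(self._deliveries)
        return self._deliveries[start:]


feed = DeliveryFeed()
feed.deliver("task_a")
feed.deliver("task_b")
first = feed.since_last_call("pipeline_1")
feed.deliver("task_c")
second = feed.since_last_call("pipeline_1")
other = feed.since_last_call("pipeline_2")

print(first)   # ['task_a', 'task_b']
print(second)  # ['task_c']
print(other)   # ['task_a', 'task_b', 'task_c']
```

In this model, using a distinct consumer value per downstream pipeline keeps one pipeline's reads from hiding deliveries from another.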

# Other SDK functions
1. Get task and delivery schemas
1. Get status and cancel batch creation jobs
1. Update task priorities
1. Reject and delete tasks
1. Get project stats and information

Please see the full documentation at [docs.sama.com](https://docs.sama.com).

In [None]:
#GET task creation schema
client.get_creation_task_schema(PROJECT_ID)

In [None]:
#GET delivery task schema 
client.get_delivery_task_schema(PROJECT_ID)

In [None]:
#CANCEL a batch creation job
client.cancel_batch_creation_job(PROJECT_ID, "testbatchid12345")

In [None]:
#UPDATE task priorities
client.update_task_priorities(PROJECT_ID, ["testtaskid1", "testtaskid2"], -100)

In [None]:
#REJECT a task
client.reject_task(PROJECT_ID, task_id="testtaskid3", reasons=["Not accurate"])

In [None]:
#DELETE tasks
client.delete_tasks(PROJECT_ID, ["testtaskid4", "testtaskid5"])

In [None]:
#GET status of batch creation job
statuses = client.get_status_batch_creation_job(PROJECT_ID, "testbatchid")

for item in statuses:
    print(item)

In [None]:
#GET project information
client.get_project_information(PROJECT_ID)

In [None]:
#GET project stats
client.get_project_stats(PROJECT_ID)