GitHub - Samasource/sama-python-client: Sama Python SDK

Sama Python SDK and Databricks Connector

This is the Python Client for the Sama API endpoints and Databricks Connector.

See our Python SDK tutorial and Databricks Connector tutorial.

Usage

Python Client

from sama import Client

client = Client("your_api_key")
client.create_task_batch("project_id", [{"url": "https://yoururl.com/img.jpg", "input2": "value2"}])
client.get_delivered_tasks("project_id", from_timestamp="2023-09-02T10:23:36.536167366Z")

Databricks Connector

from sama.databricks import Client

client = Client("your_api_key")
client.create_task_batch_from_table("project_id", spark_df) # spark_df contains inputs to tasks
spark_df = client.get_delivered_tasks_to_table(spark, "project_id", from_timestamp="2023-09-02T10:23:36.536167366Z")

sama Client

This class provides methods to interact with SamaHub API endpoints.

`__init__` method

This method is the constructor to initialize the SamaHub API client.

Parameters

api_key (str): The API key to use for authentication.
silent (bool, optional): Whether to suppress all print/log statements. Defaults to True.
logger (Logger, optional): The logger to use for logging. Defaults to None.
log_level (int, optional): The log level to use for logging. Defaults to logging.INFO.
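
As a minimal sketch of preparing the optional `logger` argument, the snippet below builds a standard-library logger. The helper `build_logger` and the logger name `sama_client` are illustrative choices and not part of the SDK. The commented-out constructor call assumes the parameters listed above are accepted as keyword arguments.

```python
import logging


def build_logger(name: str = "sama_client", level: int = logging.INFO) -> logging.Logger:
    """Return a logger that writes to stderr (hypothetical helper, not part of the SDK)."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    if not logger.handlers:  # avoid attaching duplicate handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
    return logger


logger = build_logger(level=logging.DEBUG)

# Assumed usage, based on the parameter list above:
# from sama import Client
# client = Client("your_api_key", silent=False, logger=logger, log_level=logging.DEBUG)
```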

`create_task_batch`

This method creates a batch of tasks on SamaHub using the asynchronous batch task creation API endpoints, specifically the tasks file upload approach.

Parameters

proj_id (str): The project ID on SamaHub where tasks will be created.
task_data_records (List[Dict[str, Any]]): A list of task "data" dictionaries which can contain inputs and pre-annotations.
batch_priority (int, default=0): The priority of the batch. A negative number indicates a higher priority.
notification_email (Union[str, None], default=None): An email address where SamaHub will send notifications about the batch creation status.
submit (bool, default=False): A flag determining whether to create the tasks in a submitted state.

Returns

The JSON response from the final call to the "create a batch of tasks from an uploaded file" endpoint.

Description

The method first constructs a tasks list with data, priority, and submission status. It then calls the 'create a batch of tasks' endpoint without providing the actual tasks list. This initial call returns a batch ID and a tasks_put_url (a pre-signed AWS S3 URL), which allows the tasks to be uploaded directly to S3. Uploading this way bypasses the 1000-task limit. After the upload, a subsequent API call notifies the system that the tasks are uploaded and starts the task creation process.
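
The flow described above can be sketched with in-memory stand-ins in place of real HTTP calls. This is an illustration under stated assumptions, not the SDK's implementation: the function `create_batch_via_upload`, the three injected callables, the task field names (`data`, `priority`, `submit`), and the response keys other than `tasks_put_url` are all hypothetical.

```python
import json
from typing import Any, Callable, Dict, List


def create_batch_via_upload(
    task_data_records: List[Dict[str, Any]],
    request_batch: Callable[[], Dict[str, str]],
    upload: Callable[[str, bytes], None],
    notify_uploaded: Callable[[str], Dict[str, Any]],
    batch_priority: int = 0,
    submit: bool = False,
) -> Dict[str, Any]:
    # Step 1: build the tasks list (field names are assumptions).
    tasks = [
        {"data": record, "priority": batch_priority, "submit": submit}
        for record in task_data_records
    ]
    # Step 2: request a batch without sending tasks; receive an ID and an upload URL.
    batch = request_batch()
    # Step 3: upload the serialized tasks directly to the pre-signed URL.
    upload(batch["tasks_put_url"], json.dumps(tasks).encode("utf-8"))
    # Step 4: report that the upload is complete so task creation can start.
    return notify_uploaded(batch["batch_id"])


# In-memory stand-ins so the flow runs without network access.
uploaded: Dict[str, bytes] = {}

result = create_batch_via_upload(
    [{"url": "https://yoururl.com/img.jpg", "input2": "value2"}],
    request_batch=lambda: {"batch_id": "b-1", "tasks_put_url": "https://example.invalid/put"},
    upload=lambda url, body: uploaded.__setitem__(url, body),
    notify_uploaded=lambda batch_id: {"batch_id": batch_id, "status": "accepted"},
)
```

Injecting the three network steps as callables keeps the ordering visible and lets the sketch run offline; a real client would perform HTTP requests at each step.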

`cancel_batch_creation_job`

Cancels an ongoing batch creation job.

Parameters:

proj_id (str): The project ID on SamaHub where the task exists.
batch_id (str): The ID of the batch to cancel.

Returns:

Response from the API endpoint.

`reject_task`

Rejects a task on SamaHub to send it back for rework.

Parameters:

proj_id (str): The project ID on SamaHub where the task exists.
task_id (str): The ID of the task to reject.
reasons (List[str]): List of reasons for rejecting the task.

Returns:

Response from the API endpoint.

Updates the priority of tasks.

Parameters:

project_id (str): The project ID on SamaHub where the task exists.
task_ids (List[str]): The IDs of the tasks to update priority.
priority (int): The priority.

Returns:

Response from the API endpoint.

Deletes tasks.

Parameters:

project_id (str): The project ID on SamaHub where the task exists.
task_ids (List[str]): The IDs of the tasks to delete.

Returns:

Response from the API endpoint.

`get_task_status`

Fetches the status and details of a single task. More details can be found in the Sama documentation.

Parameters:

proj_id: The project ID on SamaHub.
task_id: The ID of the task to fetch.
same_as_delivery (default=True): Whether to fetch the task as it would be delivered.

Returns:

Task details from the API.

`get_multi_task_status`

Fetches status and details for multiple tasks. Returns a generator object. More details can be found in the Sama documentation.

Parameters:

proj_id (str): The unique identifier of the project on SamaHub. This parameter specifies the project under which the tasks reside.
batch_id (str, optional): The identifier for a batch within the project. If provided, it filters tasks that belong to this specific batch.
client_batch_id (str, optional): The client-specific identifier for a batch. This is useful for filtering tasks based on client-defined batches.
client_batch_id_match_type (str, optional): This parameter specifies how the client_batch_id should be matched. Common options might include "exact" or "contains".
date_type (str, optional): Determines which date to use for the timestamp filters. Examples might include "creation_date" or "completion_date".
from_timestamp (str, optional): Filters tasks that have a date (specified by date_type) after this timestamp.
to_timestamp (str, optional): Filters tasks that have a date (specified by date_type) before this timestamp.
state (TaskStates, optional): An enum value that specifies the desired status of the tasks to filter. For example, "delivered" or "acknowledged".
omit_answers (bool, optional): Flag to determine if answers related to tasks should be omitted from the response. Defaults to True.

Returns:

An iterable generator object with task details.
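
Because the return value is a generator, results are produced lazily and can be iterated only once. The sketch below shows two common ways to consume such an object; `fake_task_stream` and its record shape are stand-ins invented for illustration, not SDK output.

```python
from itertools import islice
from typing import Any, Dict, Iterator


def fake_task_stream(total: int) -> Iterator[Dict[str, Any]]:
    """Stand-in for a generator of task details (hypothetical data shape)."""
    for i in range(total):
        yield {"id": f"task-{i}", "state": "delivered"}


# Take only the first three items; the remaining items are never produced.
first_three = list(islice(fake_task_stream(1000), 3))

# A generator is exhausted after one pass, so collect what you need to reuse.
all_ids = [task["id"] for task in fake_task_stream(5)]
```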

`get_delivered_tasks`

Get all task deliveries since a given timestamp (RFC3339 format).

Parameters:

proj_id (str): The unique identifier of the project on SamaHub. It specifies the project under which the deliveries reside.
batch_id (str, optional): The identifier for a batch within the project. If provided, it filters deliveries that belong to this specific batch.
client_batch_id (str, optional): The client-specific identifier for a batch. This is useful for filtering deliveries based on client-defined batches.
client_batch_id_match_type (str, optional): Specifies how the client_batch_id should be matched. Common options might include "exact" or "contains".
from_timestamp (str, optional): Filters deliveries that have a date after this timestamp.
task_id (str, optional): The unique identifier for a specific task. If provided, it fetches deliveries related to this specific task.

Returns:

An iterable generator object with task deliveries.
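
A value for `from_timestamp` can be produced with the standard library. The helper name `to_rfc3339` is an illustrative choice; the sketch only assumes that a UTC timestamp with a trailing `Z`, as in the usage example, is accepted.

```python
from datetime import datetime, timedelta, timezone


def to_rfc3339(moment: datetime) -> str:
    """Format a timezone-aware datetime as an RFC3339 UTC string ending in 'Z'."""
    if moment.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return moment.astimezone(timezone.utc).isoformat().replace("+00:00", "Z")


# Deliveries from the last 24 hours.
one_day_ago = to_rfc3339(datetime.now(timezone.utc) - timedelta(days=1))

# A fixed moment, for comparison with the usage example.
fixed = to_rfc3339(datetime(2023, 9, 2, 10, 23, 36, 536167, tzinfo=timezone.utc))
# fixed == "2023-09-02T10:23:36.536167Z"
```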

`get_deliveried_tasks_since_last_call`

Fetches all task deliveries since the last call based on a consumer token.

Parameters:

proj_id (str): The unique identifier of the project on SamaHub. It specifies the project under which the deliveries reside.
batch_id (str, optional): The identifier for a batch within the project. If provided, it filters deliveries that belong to this specific batch.
client_batch_id (str, optional): The client-specific identifier for a batch. This is useful for filtering deliveries based on client-defined batches.
client_batch_id_match_type (str, optional): Specifies how the client_batch_id should be matched. Common options might include "exact" or "contains".
consumer (str, optional): Token that identifies the caller, so different consumers can be in different places of the delivered tasks list.

Returns:

An iterable generator object with task deliveries.
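
The role of the `consumer` token can be pictured with a toy in-memory model in which each consumer keeps its own position in the list of deliveries. The class `DeliveryLog` is purely conceptual and is not how SamaHub implements the feature.

```python
from typing import Dict, List


class DeliveryLog:
    """Toy model of per-consumer positions (not the SamaHub implementation)."""

    def __init__(self) -> None:
        self._deliveries: List[str] = []
        self._positions: Dict[str, int] = {}

    def deliver(self, task_id: str) -> None:
        self._deliveries.append(task_id)

    def since_last_call(self, consumer: str) -> List[str]:
        # Return everything after this consumer's last position, then advance it.
        start = self._positions.get(consumer, 0)
        self._positions[consumer] = len(self._deliveries)
        return self._deliveries[start:]


log = DeliveryLog()
log.deliver("t1")
log.deliver("t2")
first = log.since_last_call("etl")        # ["t1", "t2"]
log.deliver("t3")
second = log.since_last_call("etl")       # ["t3"]
other = log.since_last_call("dashboard")  # ["t1", "t2", "t3"]
```

Each token advances independently, so a second consumer starting later still sees the full list on its first call.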

`get_status_batch_creation_job`

Fetches information about a batch creation job.

Parameters:

proj_id: The project ID on SamaHub.
batch_id: The ID of the batch to fetch details for.
omit_failed_task_data (default=False): Whether to omit data about failed tasks.

Returns:

Batch creation job details.
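
One way to wait for a batch creation job is to poll this method until a caller-supplied condition holds. The sketch below is hypothetical throughout: `poll_until`, the `is_done` predicate, and the `done` field in the stand-in responses are illustrative, since the shape of the job details is not described here.

```python
import time
from typing import Any, Callable, Dict


def poll_until(
    fetch: Callable[[], Dict[str, Any]],
    is_done: Callable[[Dict[str, Any]], bool],
    interval_seconds: float = 5.0,
    max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call fetch() repeatedly until is_done(details) is true or attempts run out."""
    for _ in range(max_attempts):
        details = fetch()
        if is_done(details):
            return details
        sleep(interval_seconds)
    raise TimeoutError("batch creation job did not finish in time")


# Stand-in responses; a real call might pass
# fetch=lambda: client.get_status_batch_creation_job(proj_id, batch_id)
responses = iter([{"done": False}, {"done": False}, {"done": True}])
final = poll_until(lambda: next(responses), lambda d: d["done"], sleep=lambda _: None)
```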

`get_creation_task_schema`

Fetches the JSON schema for task creation on SamaHub.

Parameters:

project_id (str): The project ID on SamaHub.

Returns:

A dictionary containing the JSON schema for task creation.
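
A fetched schema could be used for a quick pre-flight check before creating tasks. The sketch assumes the schema follows the standard JSON Schema convention of a top-level `required` list; `missing_required` and `example_schema` are hypothetical, and a real project's schema will differ.

```python
from typing import Any, Dict, List


def missing_required(schema: Dict[str, Any], record: Dict[str, Any]) -> List[str]:
    """List top-level keys the schema marks as required but the record lacks."""
    return [key for key in schema.get("required", []) if key not in record]


# Hypothetical schema for illustration only.
example_schema = {
    "type": "object",
    "properties": {"url": {"type": "string"}, "input2": {"type": "string"}},
    "required": ["url"],
}

ok = missing_required(example_schema, {"url": "https://yoururl.com/img.jpg"})  # []
bad = missing_required(example_schema, {"input2": "value2"})                   # ["url"]
```

This checks only for missing keys; full validation of types and nested structures would need a JSON Schema validator.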

`get_delivery_task_schema`

Fetches the JSON schema for task deliveries on SamaHub.

Parameters:

project_id (str): The project ID on SamaHub.

Returns:

A dictionary containing the JSON schema for task deliveries.

`get_project_information`

Fetches high-level information about a project from SamaHub.

Parameters:

project_id (str): The project ID on SamaHub.

Returns:

A dictionary containing information about the project.

`get_project_stats`

Fetches high-level statistics about a project's tasks within a specified time frame from SamaHub.

Parameters:

project_id (str): The project ID on SamaHub.
from_timestamp (str, optional): Filters tasks that have a date after this timestamp.
to_timestamp (str, optional): Filters tasks that have a date before this timestamp.

Returns:

A dictionary containing project statistics.

sama.databricks Client

`create_task_batch_from_table` method

Creates a batch of tasks using data from a DataFrame. Each DataFrame column is used as an input to task creation, e.g. url='https://wiki.com/img.jpg', client_batch_id='batch1'. Prepend 'output_' to a column name to specify pre-annotations.

Parameters:

spark_dataframe (DataFrame): The Spark DataFrame to be converted to task data records.
project_id (str): The project ID on SamaHub where tasks are to be created.
batch_priority (int): The priority of the batch. Defaults to 0. Negative numbers indicate higher priority.
notification_email (Union[str, None]): The email address where SamaHub should send notifications about the batch creation status. Defaults to None.
submit (bool): Whether to create the tasks in submitted state. Defaults to False.

Returns:

JSON containing the batch_id if successful.
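
The column naming convention can be illustrated without Spark by splitting a list of column names on the 'output_' prefix. The helper `split_columns` and the column `output_label` are hypothetical; with a real Spark DataFrame the list could come from `spark_df.columns`.

```python
from typing import List, Tuple

PREFIX = "output_"


def split_columns(columns: List[str]) -> Tuple[List[str], List[str]]:
    """Separate plain input columns from pre-annotation columns by name prefix."""
    inputs = [c for c in columns if not c.startswith(PREFIX)]
    pre_annotations = [c for c in columns if c.startswith(PREFIX)]
    return inputs, pre_annotations


inputs, pre_annotations = split_columns(["url", "client_batch_id", "output_label"])
# inputs == ["url", "client_batch_id"]
# pre_annotations == ["output_label"]
```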

`get_delivered_tasks_to_table` method

This method fetches all deliveries since a given timestamp.

Parameters:

spark (SparkSession): A Spark session.
project_id (str): The unique identifier of the project on SamaHub. Specifies the project under which the deliveries reside.
batch_id (str, optional): The identifier for a batch within the project. If provided, filters deliveries that belong to this batch.
client_batch_id (str, optional): The client-specific identifier for a batch. Useful for filtering deliveries based on client-defined batches.
client_batch_id_match_type (str, optional): Specifies how the client_batch_id should be matched. Common options might include "exact" or "contains".
from_timestamp (str, optional): Filters deliveries that have a date after this timestamp.
task_id (str, optional): The unique identifier for a specific task. If provided, fetches deliveries related to this specific task.

Returns:

Returns deliveries in a DataFrame.

`get_delivered_tasks_since_last_call_to_table` method

Fetches all deliveries since the last call based on a consumer token.

Parameters:

spark (SparkSession): A Spark session.
project_id (str): The unique identifier of the project on SamaHub. Specifies the project under which the deliveries reside.
batch_id (str, optional): The identifier for a batch within the project. If provided, filters deliveries that belong to this batch.
client_batch_id (str, optional): The client-specific identifier for a batch. Useful for filtering deliveries based on client-defined batches.
client_batch_id_match_type (str, optional): Specifies how the client_batch_id should be matched. Common options might include "exact" or "contains".
consumer (str, optional): Token that identifies the caller, so different consumers can be in different places of the delivered tasks list.

Returns:

Returns deliveries in a DataFrame.

`get_task_status_to_table`

Fetches the status and details of a single task. More details can be found in the Sama documentation.

Parameters:

spark (SparkSession): A Spark session.
project_id: The project ID on SamaHub.
task_id: The ID of the task to fetch.
same_as_delivery (default=True): Whether to fetch the task as it would be delivered.

Returns:

A DataFrame containing the task status.

`get_multi_task_status_to_table`

Fetches status and details for multiple tasks. More details can be found in the Sama documentation.

Parameters:

spark (SparkSession): A Spark session.
project_id (str): The unique identifier of the project on SamaHub. This parameter specifies the project under which the tasks reside.
batch_id (str, optional): The identifier for a batch within the project. If provided, it filters tasks that belong to this specific batch.
client_batch_id (str, optional): The client-specific identifier for a batch. This is useful for filtering tasks based on client-defined batches.
client_batch_id_match_type (str, optional): This parameter specifies how the client_batch_id should be matched. Common options might include "exact" or "contains".
date_type (str, optional): Determines which date to use for the timestamp filters. Examples might include "creation_date" or "completion_date".
from_timestamp (str, optional): Filters tasks that have a date (specified by date_type) after this timestamp.
to_timestamp (str, optional): Filters tasks that have a date (specified by date_type) before this timestamp.
state (TaskStates, optional): An enum value that specifies the desired status of the tasks to filter. For example, "delivered" or "acknowledged".
omit_answers (bool, optional): Flag to determine if answers related to tasks should be omitted from the response. Defaults to True.

Returns:

A DataFrame with tasks and their status.
