## Notebook Setup

In [0]:
from labelbox import Client
import databricks.koalas as pd  # Koalas (pandas API on Spark), aliased as pd in this notebook
import labelspark

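# Reuse API_KEY if it is already defined; otherwise fetch it from the "api_key" notebook (60-second timeout)
try:
  API_KEY
except NameError:
  API_KEY = dbutils.notebook.run("api_key", 60)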


In [0]:
client = Client(API_KEY)

projects = client.get_projects()
for project in projects:
    print(project.name, project.uid)

In [0]:
# Build a Spark table ("unstructured_data") of image URLs from an existing Labelbox dataset

def create_unstructured_dataset():
  print("Creating table of unstructured image data")
  # Pull data rows from a sample Labelbox dataset (insert your own dataset ID here,
  # or read the file list from your data lake or other storage instead)
  dataSet = client.get_dataset("ckolyi9ha7h800y7i5ppr3put")

  # Create a list of data row dictionaries with the two columns LabelSpark expects
  df_list = [{
          "external_id": dataRow.external_id,
          "row_data": dataRow.row_data
      } for dataRow in dataSet.data_rows()]

  # Create a Koalas DataFrame, convert it to Spark, and register it as a temporary view
  images = pd.DataFrame(df_list)
  df_images = images.to_spark()
  df_images.createOrReplaceTempView("unstructured_data")

# Create the table only if it is not already registered
table_names = [table.name for table in spark.catalog.listTables()]
if "unstructured_data" in table_names:
  print("Unstructured data table exists")
else:
  create_unstructured_dataset()

## Load Unstructured Data

In [0]:
%sql 

select * from unstructured_data

In [0]:
from labelbox import Client
client = Client(API_KEY)

LabelSpark expects a Spark table with two columns: the first column is `external_id` and the second is `row_data`.

`external_id` is a file name, like "birds.jpg" or "my_video.mp4".

`row_data` is the URL of the file. Labelbox renders assets locally on your labelers' machines when they label, so each labeler needs permission to access that asset.

Example: 

| external_id | row_data                             |
|-------------|--------------------------------------|
| image1.jpg  | https://url_to_your_asset/image1.jpg |
| image2.jpg  | https://url_to_your_asset/image2.jpg |
| image3.jpg  | https://url_to_your_asset/image3.jpg |
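If your source only has asset URLs, one way to fill both columns is to use the last path segment of each URL as the `external_id`. The plain-Python sketch below assumes that convention; `to_labelspark_rows` is a hypothetical helper, not part of LabelSpark, and you would still load its output into a Spark table before calling LabelSpark.

```python
from posixpath import basename
from urllib.parse import urlparse


def to_labelspark_rows(urls):
    """Hypothetical helper: build {"external_id", "row_data"} rows from asset URLs.

    Assumes the file name at the end of each URL path is a suitable external_id.
    """
    rows = []
    for url in urls:
        path = urlparse(url).path  # drops scheme, host, and any query string
        external_id = basename(path)
        if not external_id:
            raise ValueError(f"URL has no file name: {url!r}")
        # Keep the full URL (including any query string, e.g. a signature) as row_data
        rows.append({"external_id": external_id, "row_data": url})
    return rows


rows = to_labelspark_rows([
    "https://url_to_your_asset/image1.jpg",
    "https://url_to_your_asset/image2.jpg?sig=abc",
])
for row in rows:
    print(row["external_id"], row["row_data"])
```

In a notebook, a list of dictionaries like `rows` could then be turned into a Spark DataFrame (for example with `spark.createDataFrame`) and registered as a table.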

In [0]:
import labelspark
unstructured_data = spark.table("unstructured_data")
dataSet_new = labelspark.create_dataset(client, unstructured_data, "Demo Dataset")

You can use the Labelbox SDK to build your ontology. An example is provided below.

Please refer to the documentation at https://docs.labelbox.com/python-sdk/en/index-en

In [0]:
from labelbox.schema.ontology import OntologyBuilder, Tool, Classification, Option

ontology = OntologyBuilder()
tool_people = Tool(tool=Tool.Type.BBOX, name="People")
tool_car = Tool(tool=Tool.Type.SEGMENTATION, name="Car")
tool_umbrella = Tool(tool=Tool.Type.POLYGON, name="Umbrella")
Weather_Classification = Classification(class_type=Classification.Type.RADIO, instructions="Weather", 
                                       options=[Option(value="Clear"), 
                                                Option(value="Overcast"),
                                                Option(value="Rain"),
                                                Option(value="Other")])
Time_of_Day = Classification(class_type=Classification.Type.RADIO, instructions="Time of Day", 
                                       options=[Option(value="Day"),
                                                Option(value="Night"),
                                                Option(value="Unknown")])

ontology.add_tool(tool_people)
ontology.add_tool(tool_car)
ontology.add_tool(tool_umbrella)
ontology.add_classification(Weather_Classification)
ontology.add_classification(Time_of_Day)


project_demo2 = client.create_project(name="LabelSpark Demo Example", description = "Example description here.")
project_demo2.datasets.connect(dataSet_new)

# Find the "Editor" labeling frontend
all_frontends = list(client.get_labeling_frontends())
for frontend in all_frontends:
    if frontend.name == 'Editor':
        project_frontend = frontend
        break

# Attach the frontend to the project
project_demo2.labeling_frontend.connect(project_frontend)
# Set up the project with the frontend and the ontology
project_demo2.setup(project_frontend, ontology.asdict())


print("Project Setup is complete.")

## Bronze and Silver Annotation Tables

Be sure to pass your Labelbox project ID (a long string like "ckolzeshr7zsy0736w0usbxdy") to the LabelSpark `get_annotations` method to pull in your labeled dataset:

`bronze_table = labelspark.get_annotations(client, "ckolzeshr7zsy0736w0usbxdy", spark, sc)`

*These other methods transform the bronze table and do not require a project ID:*

`flattened_bronze_table = labelspark.flatten_bronze_table(bronze_table)`

`silver_table = labelspark.bronze_to_silver(bronze_table)`
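The silver table queried in the SQL cells of this notebook exposes a count column per tool (such as `People.count`) and a column per radio classification (such as `Weather`). As a rough mental model only, and not LabelSpark's actual implementation, reducing one label to such a row could look like the plain-Python sketch below. The input shape (`objects` with a `title`, `classifications` with a `title` and `answer`) and the helper name `label_to_silver_row` are assumptions for illustration.

```python
from collections import Counter


def label_to_silver_row(data_row_id, label):
    """Hypothetical sketch: flatten one label into a silver-style row.

    Assumes `label` looks like:
      {"objects": [{"title": "People"}, ...],
       "classifications": [{"title": "Weather", "answer": "Rain"}, ...]}
    """
    row = {"DataRow ID": data_row_id}
    # One "<Tool>.count" column per object class, counting its instances
    counts = Counter(obj["title"] for obj in label.get("objects", []))
    for title, count in counts.items():
        row[f"{title}.count"] = count
    # One column per radio classification, holding the selected answer
    for cls in label.get("classifications", []):
        row[cls["title"]] = cls["answer"]
    return row


label = {
    "objects": [{"title": "People"}, {"title": "People"}, {"title": "Car"}],
    "classifications": [{"title": "Weather", "answer": "Rain"},
                        {"title": "Time of Day", "answer": "Day"}],
}
row = label_to_silver_row("example-data-row-id", label)
print(row)

# A filter in the spirit of the SQL cells, applied to the sketch's rows;
# a missing count column is treated as zero
rainy_with_people = [
    r for r in [row]
    if r.get("People.count", 0) > 0 and r.get("Weather") == "Rain"
]
print(len(rainy_with_people))
```

One column per class keeps queries like "rainy scenes with at least one person" to simple column comparisons, which is the kind of filtering the SQL cells perform.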

In [0]:
client = Client(API_KEY) #refresh client 
bronze_table = labelspark.get_annotations(client, "ckolzeshr7zsy0736w0usbxdj", spark, sc)  # insert your unique project ID here
bronze_table.createOrReplaceTempView("street_photo_demo")
display(bronze_table.limit(2))

In [0]:
client = Client(API_KEY) #refresh client 
bronze_table = spark.table("street_photo_demo")
flattened_bronze_table = labelspark.flatten_bronze_table(bronze_table)
display(flattened_bronze_table.limit(1))

In [0]:
client = Client(API_KEY) #refresh client 
silver_table = labelspark.bronze_to_silver(bronze_table)
silver_table.createOrReplaceTempView("silver_table")
display(silver_table)

In [0]:
%sql 

SELECT * FROM silver_table 
WHERE `People.count` > 0 
AND `Umbrella.count` > 0
AND `Car.count` > 0
AND Weather = "Rain"

In [0]:
%sql 

SELECT * FROM silver_table
WHERE `People.count` > 10

In [0]:
def cleanup():
  # Delete the demo dataset and project created in this notebook
  dataSet_new.delete()
  project_demo2.delete()

cleanup()

### How To Get Video Project Annotations

Because a Labelbox video project can contain multiple videos, you must also call the `get_videoframe_annotations` method, which returns an array of DataFrames, one for each video in your project. Each DataFrame contains the frame-by-frame annotations for one video:

```
bronze_video = labelspark.get_annotations(client,"labelbox_video_project_id_here", spark, sc) 
video_dataframes = labelspark.get_videoframe_annotations(bronze_video, API_KEY, spark, sc)    #note this extra step for video projects 
```
You can then apply the standard LabelSpark methods to each video's DataFrame to create your flattened bronze tables and silver tables:
```
flattened_bronze_video_dataframes = []
silver_video_dataframes = [] 
for frameset in video_dataframes: 
  flattened_bronze_video_dataframes.append(labelspark.flatten_bronze_table(frameset))
  silver_video_dataframes.append(labelspark.bronze_to_silver(frameset))
```
This is how you would display the first video's frames and annotations, sorted by frame number (here in descending order):
```
display(silver_video_dataframes[0]
        .join(bronze_video, ["DataRow ID"], "inner")
        .orderBy('frameNumber', ascending=False))
```
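To make the shape of the video output concrete: each video gets its own table of per-frame rows, and `frameNumber` gives the playback order. The plain-Python sketch below mimics that grouping and ordering on hypothetical frame records keyed by `DataRow ID`; `split_by_video` is an illustrative helper, not how `get_videoframe_annotations` is implemented.

```python
from collections import defaultdict


def split_by_video(frame_rows, descending=False):
    """Hypothetical sketch: group per-frame rows by video, ordered by frame number."""
    videos = defaultdict(list)
    for row in frame_rows:
        videos[row["DataRow ID"]].append(row)
    # Sort each video's frames; descending=True mirrors
    # orderBy('frameNumber', ascending=False) in Spark
    return {
        video_id: sorted(rows, key=lambda r: r["frameNumber"], reverse=descending)
        for video_id, rows in videos.items()
    }


frames = [
    {"DataRow ID": "video-a", "frameNumber": 2},
    {"DataRow ID": "video-b", "frameNumber": 1},
    {"DataRow ID": "video-a", "frameNumber": 1},
]
per_video = split_by_video(frames)
for video_id, rows in per_video.items():
    print(video_id, [r["frameNumber"] for r in rows])
```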

While using LabelSpark, you will likely also use the Labelbox SDK (e.g. for programmatic ontology creation). These resources will help familiarize you with the Labelbox Python SDK: 
* [Visit our docs](https://labelbox.com/docs/python-api) to learn how the SDK works
* Check out our [notebook examples](https://github.com/Labelbox/labelspark/tree/master/notebooks) to follow along with interactive tutorials
* View our [API reference](https://labelbox.com/docs/python-api/api-reference)