<td>
   <a target="_blank" href="https://labelbox.com" ><img src="https://labelbox.com/blog/content/images/2021/02/logo-v4.svg" width=256/></a>
</td>

<td>
<a href="https://colab.research.google.com/github/Labelbox/labelbox-python/blob/develop/examples/label_export/text.ipynb" target="_blank"><img
src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"></a>
</td>

<td>
<a href="https://github.com/Labelbox/labelbox-python/tree/develop/examples/label_export/text.ipynb" target="_blank"><img
src="https://img.shields.io/badge/GitHub-100000?logo=github&logoColor=white" alt="GitHub"></a>
</td>

# Text Data Export
* Export labels from text annotation projects

In [1]:
!pip install labelbox

In [2]:
from labelbox import Client
import requests
from collections import Counter
import os

In [4]:
# Pick a project that has entity tools in the ontology and has completed labels
PROJECT_ID = "ckme5v7aykpoj0709ufi5h6i2"

# API Key and Client
Provide a valid API key below to connect to the Labelbox client.

In [5]:
# Add your API key
API_KEY = None
client = Client(api_key=API_KEY)
project = client.get_project(PROJECT_ID)
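
Hard-coding a key in a notebook makes it easy to leak when the notebook is shared. A minimal sketch of reading the key from the environment instead; the variable name `LABELBOX_API_KEY` and the helper `load_api_key` are assumptions for illustration, not part of the original notebook:

```python
import os


def load_api_key(env_var="LABELBOX_API_KEY"):
    """Return the API key stored in `env_var` (hypothetical name), or raise if unset."""
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Labelbox API key"
        )
    return key


# Usage in the cell above (left commented so this sketch runs without a key):
# API_KEY = load_api_key()
# client = Client(api_key=API_KEY)
```

Failing early with a clear message avoids a less obvious authentication error later, when the first request is made.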

### Export the labels

In [6]:
export_url = project.export_labels()

# labels can also be exported with `start` and `end` filters
# export_url = project.export_labels(start="2020-01-01", end="2020-01-02")

In [7]:
print(export_url)

https://storage.googleapis.com/labelbox-exports/ckk4q1vgapsau07324awnsjq2/ckme5v7aykpoj0709ufi5h6i2/export-2021-03-22T11%3A31%3A05.907Z.json?GoogleAccessId=api-prod%40labelbox-193903.iam.gserviceaccount.com&Expires=1617622268&Signature=VmqCl%2FTy60h8FO9q3E6TMmHpS5zgL5ZSD4YY%2BqBPBm2WCexOYnWsbCJ%2BHpqv%2Fy3y%2B9hMdSQiHVPbsScclza1UJC1xKCAdmNlzTnqZAaEkxoCSwKxNCtnKjRoMkYymlhjdrjxadxXeCmfnMGrGA3fr01KYweUdzUYX%2BzWoedno5Uq7aJNOB9HPjTJrltyJnmXbdQNdoKHr11xhzbqwdLFFZ8sW%2B5I2ZRiK2sC5LRoxazIlBu7om4clES4CzEwSSbggNb0A1ZtVg4MVp22XFzS7Ijdes%2FyjHbjm0HfXVzv4e6F5ag3eQ5oq3agUDJZsHw9m9PSbDwnDCAjUT4lRH7mMw%3D%3D&response-content-disposition=attachment


In [8]:
exports = requests.get(export_url).json()

* For details on the fields in the label payload, see [the label data model documentation](https://docs.labelbox.com/data-model/en/index-en#label)

In [9]:
# Print the first annotated object of the first label
exports[0]["Label"]["objects"][0]

{'featureId': 'ckme60w4306hv0y8d7g7k64ky',
 'schemaId': 'ckme5v8wt01n10ybafw48f72g',
 'title': 'org',
 'value': 'org',
 'color': '#ff0000',
 'version': 1,
 'format': 'text.location',
 'data': {'location': {'start': 32670, 'end': 32690}},
 'instanceURI': 'https://api.labelbox.com/masks/feature/ckme60w4306hv0y8d7g7k64ky?token=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJ1c2VySWQiOiJja2s0cTF2Z3djMHZwMDcwNHhoeDdtNHZrIiwib3JnYW5pemF0aW9uSWQiOiJja2s0cTF2Z2Fwc2F1MDczMjRhd25zanEyIiwiaWF0IjoxNjE2NDEyNjY1LCJleHAiOjE2MTkwMDQ2NjV9.BjsyyZebUwFqfv993ePUXl0DNAoNlXKwLYzgH1s7JUw'}
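
The `data.location` field holds character offsets into the labeled text. A minimal sketch of turning one exported object into its text span, using made-up sample data rather than the real export; the helper name `entity_span` is hypothetical. It mirrors the `text[start:end]` slicing used in this notebook. If the exported `end` offset turns out to be inclusive, the slice would need `end + 1`.

```python
def entity_span(text, obj):
    """Return the substring of `text` covered by one exported entity object."""
    # Only text-entity objects carry start/end character offsets.
    if obj.get("format") != "text.location":
        raise ValueError(f"Unsupported object format: {obj.get('format')!r}")
    location = obj["data"]["location"]
    return text[location["start"]:location["end"]]


# Illustrative data only; the offsets are not taken from the real export.
sample_text = "Acme Corp hired Jane Doe."
sample_obj = {
    "title": "org",
    "format": "text.location",
    "data": {"location": {"start": 0, "end": 9}},
}
print(entity_span(sample_text, sample_obj))  # prints "Acme Corp"
```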

### Using the data
* This dataset contains a single data row, so the example is simple.
* We only look at the labeled entities.

In [10]:
text = exports[0]["Labeled Data"]

In [11]:
people = []
orgs = []
for entity in exports[0]["Label"]["objects"]:
    location = entity["data"]["location"]
    if entity["title"] == "person":
        people.append(text[location["start"]:location["end"]])
    elif entity["title"] == "org":
        orgs.append(text[location["start"]:location["end"]])
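
The loop above hard-codes two entity titles. A sketch of the same idea generalized to any set of titles, again on made-up sample data; `count_entities` is a hypothetical helper, not part of the Labelbox SDK:

```python
from collections import Counter, defaultdict


def count_entities(text, objects):
    """Map each entity title to a Counter of the text spans labeled with it."""
    counts = defaultdict(Counter)
    for obj in objects:
        location = obj["data"]["location"]
        span = text[location["start"]:location["end"]]
        counts[obj["title"]][span] += 1
    return counts


# Illustrative data only; the offsets are not taken from the real export.
sample_text = "Jane Doe joined Acme. Jane Doe left."
sample_objects = [
    {"title": "person", "data": {"location": {"start": 0, "end": 8}}},
    {"title": "org", "data": {"location": {"start": 16, "end": 20}}},
    {"title": "person", "data": {"location": {"start": 22, "end": 30}}},
]
entity_counts = count_entities(sample_text, sample_objects)
print(entity_counts["person"].most_common(1))  # [('Jane Doe', 2)]
```

With the real export, this would be called as `count_entities(text, exports[0]["Label"]["objects"])`.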

In [12]:
Counter(people)

Counter({'Robin Wensley': 1,
         'Jones': 1,
         'Frank Cass': 1,
         'Robert': 1,
         'Armstrong': 1,
         'Kotler': 1,
         "Adam Smith's": 1,
         'Philip Kotler': 1})

In [13]:
Counter(orgs)

Counter({'Wikiquote\n Marketing': 1,
         'Wiktionary\n Quotations': 1,
         'Handbook of Marketing': 1,
         'Barton A.': 1,
         '2014\nWeitz': 1,
         'The Rise and Fall of Mass Marketing, Routledge': 1,
         'Geoffrey G.': 1,
         'Richard S.': 1,
         'Tedlow': 1,
         'Vol 25': 1,
         'Periodization in Marketing History," Journal of Macromarketing': 1,
         'Dix and Farlow, L.': 1,
         'D.G. Brian': 1,
         'Kathleen M.; Jones': 1,
         'Rassuli': 1,
         'Stanley C.': 1,
         'Hollander': 1,
         'The Emergence of Modern Marketing': 1,
         'Roy and Godley, Andrew (eds)': 1,
         'Harvard Business School Press. ISBN 978-0-87584-585-2.\nChurch': 1,
         'Christensen, Clayton M': 1,
         'Grid': 1,
         'The History of Marketing Thought': 1,
         'PLCIn': 1,
         'PLC': 2,
         'SBU': 5,
         'SBUs': 1,
         'SBU)': 1,
         'The Marketing Plan': 1,
         'YouTube': 