# 🧩 Label Studio + HuggingFace: Common Snippets

**Reusable code snippets for Label Studio and HuggingFace integration tutorials**

This notebook contains common setup, authentication, and utility functions that are used across multiple Label Studio + HuggingFace tutorials.

## 📚 What's Included:

1. **Setup & Installation**: Package installation and environment setup
2. **Authentication**: Connect to Label Studio and HuggingFace
3. **Data Import**: Load datasets from HuggingFace into Label Studio
4. **Data Export**: Export annotations from Label Studio
5. **Model Integration**: Connect HuggingFace models for predictions

## 🎯 How to Use This Notebook:

Each section is self-contained and can be copied into your own tutorial. The code is generic and can be adapted to different use cases (NER, classification, etc.).



---

## 🚀 Getting Started

### What is Label Studio?

**Label Studio** is an open-source data labeling platform that helps you:
- Annotate text, images, audio, video, and time series data
- Collaborate with teams on annotation projects
- Integrate ML models for pre-annotations and active learning
- Export labeled data in multiple formats

### What is HuggingFace?

**HuggingFace** is a widely used platform for:
- Pre-trained NLP models (transformers)
- Public datasets for ML research
- Model training and fine-tuning tools
- Model hosting and deployment

### Why Integrate Them?

Combining Label Studio with HuggingFace creates a powerful ML workflow:

```
HuggingFace Datasets → Label Studio → Annotations → HuggingFace Models → Predictions → Label Studio
                              ↓                            ↓                           ↓
                         Easy Import                  Easy Training            Smart Pre-labeling
```

**Key Benefits:**
- ⚡ **Faster labeling** with ML-assisted pre-annotations that annotators review and correct instead of labeling from scratch
- 🔄 **Seamless data pipeline** from import to training
- 📈 **Continuous improvement** through active learning loops
- 🏭 **Production-ready** automated workflows

### Prerequisites

Before starting, you'll need:

1. **Label Studio Instance**:
   - Local:
     - Install with `pip install label-studio`
     - Run `label-studio start`
   - If a local install won't meet your needs, consider Starter Cloud or Enterprise. [Compare Versions](https://humansignal.com/pricing)
   

2. **Label Studio API Key**:
   - Go to Label Studio → Account & Settings → Access Token
   - Copy your API token

3. **HuggingFace Account** (optional but recommended):
   - Sign up at [huggingface.co](https://huggingface.co)
   - Get your token at [huggingface.co/settings/tokens](https://huggingface.co/settings/tokens)

4. **Python 3.8+** with pip installed

Ready? Let's dive in! 👇


---

## 📦 Section 1: Installation

Install the required Python packages for Label Studio and HuggingFace integration.


In [None]:
# Install required packages
%pip install -q label-studio-sdk datasets transformers torch huggingface_hub accelerate

print("✅ All packages installed successfully!")


**What was installed:**
- `label-studio-sdk`: Python SDK for Label Studio API
- `datasets`: HuggingFace datasets library
- `transformers`: HuggingFace transformers for models
- `torch`: PyTorch (required for transformers)
- `huggingface_hub`: Authentication and model hub access
- `accelerate`: Distributed training utilities
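
To confirm that the packages listed above are actually visible to the current kernel, a quick version check can help. This is a minimal sketch using only the standard library; the helper name `installed_versions` is hypothetical and not part of any of these packages.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as used in the install cell above.
REQUIRED_PACKAGES = [
    "label-studio-sdk",
    "datasets",
    "transformers",
    "torch",
    "huggingface_hub",
    "accelerate",
]

def installed_versions(packages):
    """Return {package: version string, or None if the package is missing}.

    Hypothetical helper for illustration only.
    """
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found

for name, ver in installed_versions(REQUIRED_PACKAGES).items():
    print(f"{name}: {ver or 'NOT INSTALLED'}")
```

If a package shows as missing right after installation, restarting the kernel and re-running the check is usually the first thing to try.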


### 1.2: Configure Credentials
The following cell loads credentials from Google Colab Secrets when running in Colab, and otherwise falls back to a `.env` file or variables already set in the environment.

In [None]:
%pip install python-dotenv

import os

# Load configuration with Google Colab Secrets support + fallback
IS_GOOGLE_COLAB = False

# Load from .env file if available (for local development)
try:
    from dotenv import load_dotenv
    load_dotenv()
except ImportError:
    pass  # python-dotenv not installed; will use system env vars

def get_credential(key, default=None):
    """Get credential from Colab Secrets first, then environment variables"""
    global IS_GOOGLE_COLAB
    try:
        # Try Google Colab Secrets first (most secure)
        from google.colab import userdata
        IS_GOOGLE_COLAB = True
        return userdata.get(key)
    except ImportError:
        # Not running in Colab: fall back to environment variables (local Jupyter)
        IS_GOOGLE_COLAB = False
        return os.environ.get(key, default)
    except Exception:
        # Running in Colab, but the secret is missing or access was not granted
        return os.environ.get(key, default)
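
When debugging credential loading, it is useful to confirm that a value was found without printing the secret itself. The sketch below shows one way to do that; `mask_secret` is a hypothetical helper, and the token in the example is made up.

```python
def mask_secret(value, visible=4):
    """Return a display-safe version of a secret, showing only the last few characters.

    Hypothetical helper for illustration; not part of any SDK.
    """
    if not value:
        return "<not set>"
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]

# Made-up token for demonstration; in the notebook you would pass the
# value returned by get_credential(...) instead.
print(mask_secret("abcd1234efgh5678"))  # ************5678
print(mask_secret(None))                # <not set>
```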

---

## 🔐 Section 2: Authentication

### 2.1: Connect to Label Studio

Set your environment variables before running:

```bash
export LABEL_STUDIO_URL="http://localhost:8080"  # or your Label Studio URL
export LABEL_STUDIO_API_KEY="your-api-key-here"
```

**How to get your API key:**
1. Open Label Studio in your browser
2. Click on your profile (top-right)
3. Go to "Account & Settings"
4. Click "Personal Access Token"
5. Click "Create new Token"
6. Copy the token


In [None]:
import os
from label_studio_sdk import Client

# Get credentials from environment variables
ls_api_key = os.environ.get('LABEL_STUDIO_API_KEY')
ls_url = os.environ.get('LABEL_STUDIO_URL', 'http://localhost:8080')

if not ls_api_key:
    raise ValueError('❌ Please set LABEL_STUDIO_API_KEY environment variable.')

# Connect to Label Studio
try:
    ls = Client(url=ls_url, api_key=ls_api_key)
    connection_status = ls.check_connection()
    print(f'✅ Connected to Label Studio at {ls_url}')
    print(f'   Connection status: {connection_status}')
except Exception as e:
    # Chain the original exception so the underlying cause stays in the traceback
    raise ConnectionError(f'❌ Failed to connect to Label Studio: {e}') from e
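
Later cells build project links by appending `/projects/...` to the base URL, so a trailing slash or a missing scheme in `LABEL_STUDIO_URL` can produce confusing links or connection errors. A small validation step, sketched here with a hypothetical `normalize_ls_url` helper, can catch that early.

```python
from urllib.parse import urlparse

def normalize_ls_url(url):
    """Validate a Label Studio base URL and strip any trailing slash.

    Hypothetical helper for illustration; not part of the Label Studio SDK.
    """
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"Expected a URL like http://localhost:8080, got: {url!r}")
    return url.rstrip("/")

print(normalize_ls_url("http://localhost:8080/"))  # http://localhost:8080
```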


### 2.2: Authenticate with HuggingFace

Set your HuggingFace token (recommended):

```bash
export HF_TOKEN="your-hf-token-here"
```

**How to get your HuggingFace token:**
1. Go to [huggingface.co/settings/tokens](https://huggingface.co/settings/tokens)
2. Click "New token"
3. Choose "Read" access (or "Write" if you plan to upload models)
4. Copy the token

**Note:** Authentication is optional for public models and datasets. You only need it for:
- Private models or datasets
- Uploading models to the Hub
- Higher API rate limits


In [None]:
from huggingface_hub import login

# Get HuggingFace token (optional but recommended)
hf_token = os.environ.get('HF_TOKEN')

if hf_token:
    try:
        login(token=hf_token)
        print('✅ Logged into Hugging Face Hub')
    except Exception as e:
        print(f'⚠️  Warning: HF login failed: {str(e)}')
        print('   Continuing with public models only...')
else:
    print('ℹ️  No HF_TOKEN provided. Using public models only.')


---

## 📥 Section 3: Import Data from HuggingFace to Label Studio

### 3.1: Create a Label Studio Project

This example creates a simple text classification project. You can customize the `label_config` for your specific use case.

**Common label configs:**
- Named Entity Recognition (NER)
- Text Classification
- Question Answering
- Sentiment Analysis
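
As an illustration of how a config for another task type differs, here is a sketch of a Named Entity Recognition config using the `Labels` tag. The entity names (`PER`, `ORG`, `LOC`) are placeholder assumptions; the snippet also parses the XML locally to catch typos before sending it to Label Studio.

```python
import xml.etree.ElementTree as ET

# Example NER config; the label values are placeholders to adapt.
ner_label_config = """
<View>
  <Labels name="label" toName="text">
    <Label value="PER"/>
    <Label value="ORG"/>
    <Label value="LOC"/>
  </Labels>
  <Text name="text" value="$text"/>
</View>
"""

# Parsing checks only that the XML is well-formed, not that Label Studio
# accepts every tag or attribute.
root = ET.fromstring(ner_label_config.strip())
ner_labels = [el.get("value") for el in root.iter("Label")]
print(ner_labels)  # ['PER', 'ORG', 'LOC']
```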


In [None]:
# Example: Create a text classification project
# Customize this label_config for your use case

label_config = '''
<View>
  <Text name="text" value="$text"/>
  <Choices name="label" toName="text" choice="single">
    <Choice value="Positive"/>
    <Choice value="Negative"/>
    <Choice value="Neutral"/>
  </Choices>
</View>
'''

# Create or retrieve project
project_title = 'HuggingFace Integration Example'

try:
    project = ls.start_project(
        title=project_title,
        label_config=label_config,
        description='Example project for HuggingFace integration'
    )
    print('✅ Project created successfully!')
except Exception:
    # Project might already exist, try to get it
    projects = ls.list_projects()
    project = next((p for p in projects if p.get_params()['title'] == project_title), None)
    if project:
        print(f'ℹ️  Using existing project: {project_title}')
    else:
        raise  # Re-raise the original error with its traceback intact

print(f'   Project ID: {project.id}')
print(f'   Project URL: {ls_url}/projects/{project.id}')


### 3.2: Load Dataset from HuggingFace

Load a dataset from the HuggingFace Hub. You can:
- Choose any dataset from [huggingface.co/datasets](https://huggingface.co/datasets)
- Select specific splits (train, test, validation)
- Limit the number of examples with slice notation
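
The slice notation mentioned above is passed as part of the `split` string, using either absolute row indices or percentages. The sketch below builds such strings; `build_split` is a hypothetical convenience function, not part of the `datasets` library.

```python
def build_split(split, start=None, stop=None, percent=False):
    """Build a split string such as 'test[:50]' or 'train[10%:20%]'.

    Hypothetical helper for illustration only.
    """
    if start is None and stop is None:
        return split
    unit = "%" if percent else ""
    lo = "" if start is None else f"{start}{unit}"
    hi = "" if stop is None else f"{stop}{unit}"
    return f"{split}[{lo}:{hi}]"

print(build_split("test", stop=50))                            # test[:50]
print(build_split("train", start=10, stop=20, percent=True))   # train[10%:20%]
print(build_split("validation"))                               # validation
```

The result can be passed directly as the `split` argument, for example `load_dataset('imdb', split=build_split('test', stop=50))`.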


In [None]:
from datasets import load_dataset

# Example: Load IMDb sentiment dataset
# Replace with your dataset of choice
print('📦 Loading dataset from HuggingFace...')

# Load dataset (adjust parameters for your use case)
dataset = load_dataset(
    'imdb',           # Dataset name
    split='test[:50]' # Take first 50 examples from test split
)

print(f'   Loaded {len(dataset)} examples')
print(f'\n📝 Sample data:')
print(f'   Text preview: {dataset[0]["text"][:200]}...')
print(f'   Label: {dataset[0].get("label", "N/A")}')


### 3.3: Convert to Label Studio Format

Convert HuggingFace dataset format to Label Studio task format.

**Label Studio task format:**
```python
{
    "data": {"text": "Your text here"},  # Must match label config
    "meta": {"source": "huggingface"}   # Optional metadata
}
```
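
Because the keys under `data` must match the `$variables` in the label config, a quick local check can catch a mismatched field name before import. This is a rough sketch: the helper names are hypothetical, and the regular expression only looks at `value="$..."` attributes, which covers simple configs like the ones in this notebook.

```python
import re

def config_variables(label_config):
    """Return the set of $variables referenced by value="..." attributes."""
    return set(re.findall(r'value="\$(\w+)"', label_config))

def missing_fields(task, label_config):
    """Return config variables that are absent from the task's data dict."""
    return sorted(config_variables(label_config) - set(task.get("data", {})))

# Minimal config used only for this demonstration.
example_config = '<View><Text name="text" value="$text"/></View>'

print(missing_fields({"data": {"text": "hello"}}, example_config))    # []
print(missing_fields({"data": {"review": "hello"}}, example_config))  # ['text']
```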


In [None]:
# Convert HuggingFace format to Label Studio format
tasks = []

for idx, row in enumerate(dataset):
    # Customize based on your dataset structure
    # This example assumes the dataset has a 'text' field
    task = {
        "data": {
            "text": row['text']  # Adjust field name to match your dataset
        },
        "meta": {
            "source": "huggingface",
            "dataset_index": idx
        }
    }
    tasks.append(task)

print(f'✅ Converted {len(tasks)} examples to Label Studio format')
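print('\n📝 Sample task (truncated for display):')
# Build a separate preview dict. tasks[0].copy() is a shallow copy, so editing
# its nested "data" dict would also truncate the text that gets imported.
sample_task = {**tasks[0], "data": {"text": tasks[0]["data"]["text"][:200] + "..."}}
print(f'   {sample_task}')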


### 3.4: Import Tasks into Label Studio

Upload the converted tasks to your Label Studio project.


In [None]:
# Import tasks into Label Studio
print(f'📤 Importing {len(tasks)} tasks into Label Studio...')

project.import_tasks(tasks)

# Verify import (the count includes any tasks that were already in the project)
imported_tasks = project.get_tasks()
print(f'✅ Import complete. Project now contains {len(imported_tasks)} tasks.')
print(f'\n💡 Next step: Go to Label Studio and start annotating!')
print(f'   {ls_url}/projects/{project.id}')
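
For larger datasets, sending everything in a single request may be slow or hit request-size limits, depending on your deployment. One option is to import in batches. The sketch below shows only the batching logic; `chunked` is a hypothetical helper and the batch size is an arbitrary assumption.

```python
def chunked(items, size):
    """Yield consecutive slices of `items`, each with at most `size` elements.

    Hypothetical helper for illustration only.
    """
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]

# Stand-in tasks for demonstration; in the notebook this would be `tasks`.
demo_tasks = [{"data": {"text": f"example {i}"}} for i in range(7)]
batches = list(chunked(demo_tasks, 3))
print([len(batch) for batch in batches])  # [3, 3, 1]
```

In the notebook, the loop would look like `for batch in chunked(tasks, 100): project.import_tasks(batch)`.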


---

## 📤 Section 4: Export Annotations from Label Studio

### 4.1: Export Labeled Data

Export annotations from Label Studio in JSON format.


In [None]:
# Export annotations from Label Studio
print('📥 Exporting annotations from Label Studio...')

# Export in JSON format
annotations = project.export_tasks(export_type='JSON')

# Filter for labeled tasks only
labeled_tasks = [task for task in annotations if task.get('annotations')]

print(f'   Total tasks: {len(annotations)}')
print(f'   Labeled tasks: {len(labeled_tasks)}')

if len(labeled_tasks) == 0:
    print('\n⚠️  No labeled data found. Please annotate some tasks in Label Studio first.')
else:
    print(f'\n📝 Sample annotation:')
    print(f'   Text: {labeled_tasks[0]["data"]["text"][:100]}...')


### 4.2: Convert to HuggingFace Dataset Format

Convert Label Studio annotations to HuggingFace Dataset format for training.

**Note:** This is a simple example for classification. You'll need to customize this for:
- Token classification (NER)
- Question answering
- Other complex tasks
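
To give a sense of what the NER customization involves: span annotations made with a `Labels` tag carry character offsets in each result's `value`. The sketch below extracts them from a hand-written example task. `extract_spans` is a hypothetical helper, and the example task is fabricated for illustration, not exported from a real project.

```python
def extract_spans(task):
    """Collect (start, end, label, text) tuples from a task's first annotation.

    Hypothetical helper; assumes results of type "labels" with character offsets.
    """
    annotations = task.get("annotations") or []
    if not annotations:
        return []
    spans = []
    for item in annotations[0].get("result", []):
        if item.get("type") != "labels":
            continue  # skip choices, relations, etc.
        value = item["value"]
        for label in value.get("labels", []):
            spans.append((value["start"], value["end"], label, value.get("text")))
    return spans

# Hand-written example in the shape of an exported task.
example_task = {
    "data": {"text": "Ada Lovelace worked in London."},
    "annotations": [{
        "result": [
            {"from_name": "label", "to_name": "text", "type": "labels",
             "value": {"start": 0, "end": 12, "text": "Ada Lovelace", "labels": ["PER"]}},
            {"from_name": "label", "to_name": "text", "type": "labels",
             "value": {"start": 23, "end": 29, "text": "London", "labels": ["LOC"]}},
        ]
    }],
}

print(extract_spans(example_task))
```

Converting these character spans to token-level tags for training additionally requires aligning them with your tokenizer's offsets, which is not shown here.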


In [None]:
from datasets import Dataset

# Example: Extract text classification labels
# Customize based on your labeling config

texts = []
labels = []

for task in labeled_tasks:
    # Get the text
    text = task['data']['text']

    # Get the label from first annotation
    annotation = task['annotations'][0]
    results = annotation.get('result', [])

    if results:
        # Extract label (adjust based on your label config)
        # For Choices type, the label is in value['choices'][0]
        label = results[0]['value']['choices'][0]

        texts.append(text)
        labels.append(label)

# Create HuggingFace Dataset
if len(texts) > 0:
    hf_dataset = Dataset.from_dict({
        "text": texts,
        "label": labels
    })

    print(f'✅ Created HuggingFace dataset with {len(hf_dataset)} examples')
    print(f'\n📝 Sample:')
    sample = hf_dataset[0].copy()
    # Truncate long text for display
    if len(sample['text']) > 200:
        sample['text'] = sample['text'][:200] + '...'
    print(f'   Text: {sample["text"]}')
    print(f'   Label: {sample["label"]}')
else:
    print('⚠️  No labeled data to convert. Please annotate some tasks first.')
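
Most classification training code expects integer class ids rather than strings like `"Positive"`. A minimal sketch of that mapping follows; `encode_labels` is a hypothetical helper, and assigning ids in sorted label order is just one reasonable convention.

```python
def encode_labels(labels):
    """Map string labels to integer ids, assigning ids in sorted label order.

    Hypothetical helper for illustration only.
    """
    label2id = {name: idx for idx, name in enumerate(sorted(set(labels)))}
    id2label = {idx: name for name, idx in label2id.items()}
    return [label2id[name] for name in labels], label2id, id2label

ids, label2id, id2label = encode_labels(["Positive", "Negative", "Positive", "Neutral"])
print(ids)       # [2, 0, 2, 1]
print(label2id)  # {'Negative': 0, 'Neutral': 1, 'Positive': 2}
```

The `label2id` and `id2label` dictionaries can be kept alongside the dataset so that model outputs can be mapped back to the Label Studio choice names.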


---

## 🤖 Section 5: Connect HuggingFace Models for Predictions

### 5.1: Load a HuggingFace Model

Load a pre-trained model from HuggingFace for generating predictions.


In [None]:
from transformers import pipeline

# Example: Load sentiment analysis pipeline
# Replace with your model/task of choice
print('🤗 Loading HuggingFace model...')

# Load a pipeline (easiest way)
model = pipeline(
    "sentiment-analysis",  # Task type
    model="distilbert-base-uncased-finetuned-sst-2-english"  # Model name
)

print('✅ Model loaded successfully!')

# Test the model
sample_text = "This is a great tutorial!"
prediction = model(sample_text)
print(f'\n🧪 Test prediction:')
print(f'   Input: "{sample_text}"')
print(f'   Output: {prediction}')


### 5.2: Generate Predictions for Label Studio

Create predictions in Label Studio format and upload them to your project.


In [None]:
# Get unlabeled tasks
print('📋 Fetching unlabeled tasks...')
all_tasks = project.get_tasks()
unlabeled_tasks = [task for task in all_tasks if not task.get('annotations')]

print(f'   Total tasks: {len(all_tasks)}')
print(f'   Unlabeled tasks: {len(unlabeled_tasks)}')

if len(unlabeled_tasks) == 0:
    print('\n✅ All tasks are already labeled!')
else:
    # Generate predictions
    print(f'\n🔮 Generating predictions for {min(10, len(unlabeled_tasks))} tasks...')

    # Label mapping: Convert model output to Label Studio labels
    # HuggingFace sentiment model outputs "POSITIVE"/"NEGATIVE"
    # but our label config expects "Positive"/"Negative"/"Neutral"
    label_mapping = {
        "POSITIVE": "Positive",
        "NEGATIVE": "Negative",
        "NEUTRAL": "Neutral"
    }

    prediction_count = 0
    for task in unlabeled_tasks[:10]:  # Demo: first 10 tasks
        try:
            text = task['data']['text']

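            # Get model prediction; truncate inputs that exceed the model's
            # maximum sequence length (IMDb reviews can be long)
            pred = model(text, truncation=True)[0]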

            # Map the label (use original if not in mapping)
            mapped_label = label_mapping.get(pred['label'], pred['label'])

            # Convert to Label Studio format
            # Customize based on your label config
            result = [{
                "from_name": "label",
                "to_name": "text",
                "type": "choices",
                "value": {
                    "choices": [mapped_label]
                },
                "score": pred['score']
            }]

            # Create prediction in Label Studio
            project.create_prediction(
                task_id=task['id'],
                result=result,
                model_version='huggingface-sentiment-model'
            )

            prediction_count += 1

        except Exception as e:
            print(f'   ⚠️  Error on task {task["id"]}: {str(e)}')
            continue

    print(f'\n✅ Successfully created {prediction_count} pre-annotations!')
    print(f'\n💡 Next steps:')
    print(f'   1. Go to Label Studio: {ls_url}/projects/{project.id}')
    print(f'   2. Review and correct the predictions')
    print(f'   3. Submit your annotations')


---

## 🎉 That's It!

You now have all the essential building blocks for integrating Label Studio with HuggingFace.

### What You Can Do Next:

1. **Customize for Your Use Case**: Adapt these snippets for NER, QA, or other tasks
2. **Build Complete Pipelines**: Combine these snippets into end-to-end workflows
3. **Add Active Learning**: Use confidence scores to prioritize uncertain examples
4. **Deploy ML Backends**: Create persistent prediction servers for production
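
As a starting point for the active learning idea above, the simplest approach is to review the least confident predictions first. The sketch below sorts made-up `(task_id, score)` pairs; `rank_by_uncertainty` is a hypothetical helper, and using the raw model score as the uncertainty measure is an assumption that may not suit every model.

```python
def rank_by_uncertainty(predictions):
    """Sort (task_id, score) pairs so the least confident predictions come first.

    Hypothetical helper for illustration only.
    """
    return sorted(predictions, key=lambda pair: pair[1])

# Made-up task ids and scores for demonstration.
demo_predictions = [(101, 0.98), (102, 0.55), (103, 0.71)]
print(rank_by_uncertainty(demo_predictions))
# [(102, 0.55), (103, 0.71), (101, 0.98)]
```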

### More Resources:

- 📖 [Label Studio Documentation](https://labelstud.io/guide/)
- 🤗 [HuggingFace Tutorials](https://huggingface.co/docs)
- 🎯 [Label Studio SDK Reference](https://labelstud.io/sdk/)
- 💬 [Join Label Studio Slack](https://slack.labelstudio.heartex.com/)

---

Happy labeling! 🏷️✨
