# 🔍 Retrieve Projects and Datasets with Labellerr SDK

> Master the fundamentals of retrieving and managing your annotation projects and datasets programmatically

## 📚 What You'll Learn

- ✅ Retrieve all projects associated with your client account
- ✅ Fetch linked and unlinked datasets with filtering options
- ✅ Parse and display project/dataset information efficiently
- ✅ Build practical workflows for project discovery and management
- ✅ Handle errors gracefully with best practices

---

## 📋 Prerequisites

Before you begin, make sure you have:

- **Google Colab** or any Jupyter notebook environment
- **Labellerr API credentials** (API Key, API Secret, and Client ID)
  - Get these from your Labellerr dashboard → Settings → API Access
- **Basic Python knowledge** (working with dictionaries, lists, and functions)

### 🔑 Setting up Colab Secrets

To securely store your credentials in Google Colab:
1. Click on the **🔑 key icon** in the left sidebar (Secrets)
2. Add the following secrets:
   - `LABELLERR_API_KEY` → Your API key
   - `LABELLERR_API_SECRET` → Your API secret
   - `LABELLERR_CLIENT_ID` → Your client ID
3. Toggle the **notebook access** switch to enable access

---

## 🛠️ Installation

Let's get started by installing the Labellerr SDK. Run the cell below:

In [None]:
!pip install git+https://github.com/tensormatics/SDKPython.git

# Optional: Install pandas for better data display
!pip install pandas

---

## 🔐 Authentication Setup

Now let's set up your API credentials using Colab Secrets for secure storage.

Make sure you've added your credentials to Colab Secrets (see Prerequisites section above).

In [None]:
from labellerr.client import LabellerrClient
from labellerr.exceptions import LabellerrError

# Get credentials from Colab Secrets
try:
    from google.colab import userdata
    api_key = userdata.get('LABELLERR_API_KEY')
    api_secret = userdata.get('LABELLERR_API_SECRET')
    client_id = userdata.get('LABELLERR_CLIENT_ID')
    print("✅ Credentials loaded from Colab Secrets")
except Exception:
    print("⚠️ Running outside Colab or secrets not configured")
    # Fallback: Direct input (not recommended for production)
    api_key = "your_api_key_here"
    api_secret = "your_api_secret_here"
    client_id = "your_client_id_here"
    print("⚠️ Using direct credentials. Please configure Colab Secrets for better security.")

# Initialize the Labellerr client
client = LabellerrClient(api_key, api_secret)

print("✅ Client initialized successfully!")
print(f"📍 Client ID: {client_id}")

---

## 📂 Section 1: Retrieve All Projects

Let's start by retrieving all projects associated with your client account. This is useful when you need to:

- 📊 Get an overview of all your annotation projects
- 🔍 Find specific project IDs for further operations
- 📈 Check project statuses and metadata
- 🗂️ Audit your organization's work

In [None]:
try:
    # Retrieve all projects for the client
    result = client.get_all_project_per_client_id(client_id)
    
    # Check if projects were retrieved successfully
    if result and 'response' in result:
        projects = result['response']
        print(f"✅ Found {len(projects)} projects\n")
        print("=" * 70)
        
        # Display project information
        for idx, project in enumerate(projects[:5], 1):  # Show first 5 projects
            print(f"\n📁 Project #{idx}")
            print(f"   Project ID: {project.get('project_id')}")
            print(f"   Name: {project.get('project_name')}")
            print(f"   Type: {project.get('data_type')}")
            print(f"   Status: {project.get('status', 'N/A')}")
            print("   " + "-" * 66)
        
        if len(projects) > 5:
            print(f"\n... and {len(projects) - 5} more projects")
    else:
        print("⚠️ No projects found or unexpected response format")
        projects = []
        
except LabellerrError as e:
    print(f"❌ Failed to retrieve projects: {str(e)}")
    projects = []
except Exception as e:
    print(f"❌ An unexpected error occurred: {str(e)}")
    projects = []

### Display Projects in a DataFrame

For better visualization, let's use pandas to display the projects:

In [None]:
import pandas as pd

if projects:
    # Extract relevant fields
    project_data = []
    for project in projects:
        project_data.append({
            'Project ID': project.get('project_id'),
            'Project Name': project.get('project_name'),
            'Data Type': project.get('data_type'),
            'Status': project.get('status', 'N/A'),
            'Created Date': project.get('created_at', 'N/A')
        })
    
    df_projects = pd.DataFrame(project_data)
    display(df_projects)
    
    # Display summary statistics
    print("\n📊 Project Summary:")
    print(f"   Total Projects: {len(df_projects)}")
    print("\n   Projects by Data Type:")
    print(df_projects['Data Type'].value_counts().to_string())
else:
    print("No projects to display")

### Filter Projects by Data Type

Let's filter projects by a specific data type (for example, only image projects):

In [None]:
# Filter projects by data type
target_data_type = 'image'  # Change to 'video', 'audio', 'document', or 'text'

filtered_projects = [p for p in projects if p.get('data_type') == target_data_type]

print(f"🔍 Found {len(filtered_projects)} {target_data_type} projects:\n")
for project in filtered_projects[:10]:  # Show first 10
    print(f"   • {project.get('project_name')} (ID: {project.get('project_id')})")

---

## 🗄️ Section 2: Retrieve All Datasets

Now let's retrieve datasets. Datasets can be:
- **Linked**: Already associated with projects
- **Unlinked**: Available for use in new projects

This is useful for:
- 📦 Managing dataset inventory
- 🔗 Finding datasets to link to new projects
- 🧹 Identifying unused datasets for cleanup
- 📊 Auditing dataset usage across projects

In [None]:
# Specify the data type and other parameters
data_type = 'image'  # Options: 'image', 'video', 'audio', 'document', 'text'
project_id = None    # Optional: filter by specific project
scope = 'all'        # Options: 'all', 'linked', 'unlinked'

try:
    # Retrieve all datasets
    result = client.get_all_dataset(client_id, data_type, project_id, scope)
    
    # Process datasets
    if result and 'response' in result and 'datasets' in result['response']:
        datasets = result['response']['datasets']
        
        print(f"✅ Found {len(datasets)} {data_type} datasets\n")
        print("=" * 70)
        
        # Display dataset information (first 5)
        for idx, dataset in enumerate(datasets[:5], 1):
            print(f"\n📦 Dataset #{idx}")
            print(f"   Dataset ID: {dataset.get('dataset_id')}")
            print(f"   Name: {dataset.get('name')}")
            print(f"   Description: {dataset.get('description', 'No description')}")
            print(f"   Data Type: {dataset.get('data_type')}")
            print(f"   File Count: {dataset.get('file_count', 'N/A')}")
            print("   " + "-" * 66)
        
        if len(datasets) > 5:
            print(f"\n... and {len(datasets) - 5} more datasets")
    else:
        print("⚠️ No datasets found or unexpected response format")
        datasets = []
        
except LabellerrError as e:
    print(f"❌ Failed to retrieve datasets: {str(e)}")
    datasets = []
except Exception as e:
    print(f"❌ An unexpected error occurred: {str(e)}")
    datasets = []
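
To compare linked and unlinked datasets side by side, one option is to call the same method once per scope. The sketch below assumes that the `'linked'` and `'unlinked'` scopes return the same response shape as the cell above (`result['response']['datasets']`); `fetch_datasets_by_scope` is a hypothetical helper, and it takes the fetch function as an argument so it can be tried without network access.

```python
def fetch_datasets_by_scope(get_all_dataset, client_id, data_type):
    """Fetch linked and unlinked datasets separately.

    `get_all_dataset` is any callable that accepts the same positional
    arguments as `client.get_all_dataset` in the cell above.
    """
    by_scope = {}
    for scope in ("linked", "unlinked"):
        result = get_all_dataset(client_id, data_type, None, scope)
        # Assumption: every scope returns {'response': {'datasets': [...]}}.
        response = (result or {}).get("response")
        if isinstance(response, dict):
            by_scope[scope] = response.get("datasets", [])
        else:
            by_scope[scope] = []
    return by_scope
```

A call such as `fetch_datasets_by_scope(client.get_all_dataset, client_id, 'image')` would then return a dictionary with one list per scope, which makes it easier to spot unlinked datasets that are candidates for cleanup.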

### Display Datasets in a DataFrame

In [None]:
if datasets:
    # Create DataFrame
    dataset_data = []
    for dataset in datasets:
        # Fall back to an empty string when the description is missing or None
        desc = dataset.get('description') or ''
        short_desc = desc[:50] + '...' if len(desc) > 50 else desc
        dataset_data.append({
            'Dataset ID': dataset.get('dataset_id'),
            'Name': dataset.get('name'),
            'Data Type': dataset.get('data_type'),
            'File Count': dataset.get('file_count', 0),
            'Description': short_desc
        })
    
    df_datasets = pd.DataFrame(dataset_data)
    display(df_datasets)
    
    # Summary statistics
    print("\n📊 Dataset Summary:")
    print(f"   Total Datasets: {len(df_datasets)}")
    print(f"   Total Files: {df_datasets['File Count'].sum()}")
else:
    print("No datasets to display")

---

## 🎯 Section 3: Practical Workflow Example

Let's combine what we've learned to build a practical workflow:
1. Find a project by name
2. Get comprehensive workspace statistics

In [None]:
def find_project_by_name(project_name_query):
    """
    Find projects that match the given name (case-insensitive partial match)
    """
    try:
        result = client.get_all_project_per_client_id(client_id)
        if result and 'response' in result:
            all_projects = result['response']
            
            # Filter projects by name
            matching_projects = [
                p for p in all_projects 
                # `or ''` guards against projects whose name is None
                if project_name_query.lower() in (p.get('project_name') or '').lower()
            ]
            
            return matching_projects
        return []
    except Exception as e:
        print(f"❌ Error finding projects: {str(e)}")
        return []

# Example: Find projects with a specific term in the name
search_query = "project"  # Change this to your search term

print(f"🔍 Searching for projects with '{search_query}' in the name...\n")
found_projects = find_project_by_name(search_query)

if found_projects:
    print(f"✅ Found {len(found_projects)} matching project(s):\n")
    for project in found_projects[:5]:  # Show first 5
        print(f"📁 {project.get('project_name')}")
        print(f"   ID: {project.get('project_id')}")
        print(f"   Type: {project.get('data_type')}")
        print(f"   Status: {project.get('status', 'N/A')}")
        print()
else:
    print(f"❌ No projects found matching '{search_query}'")
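
When you look up the same projects repeatedly, building a name-to-ID index once avoids repeating the API call and the scan. This is a small sketch under the assumption that project dictionaries carry the `project_name` and `project_id` fields used above; `build_project_index` is a hypothetical helper, and it keeps a list of IDs per name in case names are not unique.

```python
def build_project_index(projects):
    """Map lower-cased project names to the list of matching project IDs."""
    index = {}
    for project in projects:
        name = (project.get("project_name") or "").strip().lower()
        if name:
            # Keep every ID, since two projects may share a name.
            index.setdefault(name, []).append(project.get("project_id"))
    return index


# Illustrative sample data, not real API output.
example_index = build_project_index([
    {"project_id": "p1", "project_name": "Street Scenes"},
    {"project_id": "p2", "project_name": "street scenes"},
    {"project_id": "p3", "project_name": None},
])
print(example_index.get("street scenes", []))
```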

### Get Workspace Statistics

In [None]:
def get_workspace_statistics():
    """
    Get comprehensive statistics about your workspace
    """
    stats = {
        'total_projects': 0,
        'projects_by_type': {},
        'total_datasets': 0,
        'datasets_by_type': {}
    }
    
    try:
        # Get projects
        project_result = client.get_all_project_per_client_id(client_id)
        if project_result and 'response' in project_result:
            projects = project_result['response']
            stats['total_projects'] = len(projects)
            
            # Count by type
            for project in projects:
                data_type = project.get('data_type', 'unknown')
                stats['projects_by_type'][data_type] = stats['projects_by_type'].get(data_type, 0) + 1
        
        # Get datasets for each type
        for data_type in ['image', 'video', 'audio', 'document', 'text']:
            try:
                dataset_result = client.get_all_dataset(client_id, data_type, None, 'all')
                if dataset_result and 'response' in dataset_result:
                    datasets = dataset_result['response'].get('datasets', [])
                    count = len(datasets)
                    if count > 0:
                        stats['datasets_by_type'][data_type] = count
                        stats['total_datasets'] += count
            except Exception:
                # Skip data types that fail so one error does not abort the summary
                continue
        
        return stats
    
    except Exception as e:
        print(f"❌ Error getting statistics: {str(e)}")
        return stats

# Get and display statistics
print("📊 Generating workspace statistics...\n")
stats = get_workspace_statistics()

print("=" * 70)
print("📈 WORKSPACE STATISTICS")
print("=" * 70)
print(f"\n🗂️  Total Projects: {stats['total_projects']}")
if stats['projects_by_type']:
    print("\n   Projects by Type:")
    for data_type, count in stats['projects_by_type'].items():
        print(f"      • {data_type.capitalize()}: {count}")

print(f"\n📦 Total Datasets: {stats['total_datasets']}")
if stats['datasets_by_type']:
    print("\n   Datasets by Type:")
    for data_type, count in stats['datasets_by_type'].items():
        print(f"      • {data_type.capitalize()}: {count}")

print("\n" + "=" * 70)

---

## ⚠️ Error Handling Best Practices

Always wrap your API calls in try-except blocks to handle errors gracefully:

In [None]:
def safe_get_projects(client_id):
    """
    Safely retrieve projects with comprehensive error handling
    """
    try:
        result = client.get_all_project_per_client_id(client_id)
        
        if not result:
            print("⚠️ No response received from API")
            return None
        
        if 'response' not in result:
            print("⚠️ Unexpected response format")
            return None
        
        projects = result['response']
        print(f"✅ Successfully retrieved {len(projects)} projects")
        return projects
        
    except LabellerrError as e:
        # Handle Labellerr-specific errors
        print(f"❌ Labellerr API Error: {str(e)}")
        print("   Please check your credentials and client ID")
        return None
        
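    except ConnectionError as e:
        # Handle network errors raised as Python's built-in ConnectionError.
        # Library-specific errors such as requests.exceptions.ConnectionError
        # do not subclass it and would reach the generic handler below.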
        print(f"❌ Connection Error: {str(e)}")
        print("   Please check your internet connection")
        return None
        
    except Exception as e:
        # Handle any other unexpected errors
        print(f"❌ Unexpected Error: {str(e)}")
        print(f"   Error Type: {type(e).__name__}")
        return None

# Example usage
projects = safe_get_projects(client_id)
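
For transient failures such as brief network drops, a retry with exponential backoff is a common complement to the try-except pattern above. The helper below is a generic sketch, not an SDK feature: `with_retries` and its parameters are hypothetical, and which exception types are worth retrying depends on what the SDK actually raises.

```python
import time


def with_retries(func, attempts=3, base_delay=1.0, retry_on=(Exception,), sleep=time.sleep):
    """Call func() until it succeeds or attempts run out.

    The wait doubles after each failure: base_delay, 2 * base_delay, and so on.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == attempts:
                # Out of attempts: let the caller see the last error.
                raise
            sleep(base_delay * 2 ** (attempt - 1))
```

One possible use is `with_retries(lambda: client.get_all_project_per_client_id(client_id), retry_on=(ConnectionError,))`. Narrowing `retry_on` keeps the helper from retrying errors that will not resolve on their own, such as invalid credentials.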

---

## 🎯 Next Steps

Congratulations! You've mastered retrieving projects and datasets with the Labellerr SDK. 🎉

### Recommended Next Steps:

1. **Create Your First Project** - Learn how to create annotation projects programmatically
   - 📓 [02_create_project.ipynb](./02_create_project.ipynb)

2. **Upload Pre-Annotations** - Accelerate your workflow by uploading existing labels
   - 📓 [04_upload_preannotations.ipynb](./04_upload_preannotations.ipynb)

3. **Master Annotation Questions** - Create custom annotation questions for your projects
   - 📓 [03_annotation_questions.ipynb](./03_annotation_questions.ipynb)

### Additional Resources:

- 📖 [Labellerr Documentation](https://docs.labellerr.com)
- 🌐 [SDK GitHub Repository](https://github.com/tensormatics/SDKPython)
- 📧 **Technical Support**: support@tensormatics.com

---

### 💡 Pro Tips:

- Store project and dataset IDs in variables for easy reuse
- Use pandas DataFrame for better data exploration and filtering
- Always validate API responses before processing
- Use Colab Secrets to keep your API credentials secure

---

**Happy Annotating! 🚀**