# Dataset Upload & Management
### A creative approach to managing evaluation datasets

## 1. Setup & Configuration

In [1]:
import os
from dotenv import load_dotenv
from langsmith import Client
import json
from datetime import datetime

# Load environment variables
load_dotenv(dotenv_path="../../.env", override=True)

print("✓ Environment variables loaded successfully")
print("✓ LangSmith tracing enabled")
print("✓ Project: langsmith-academy-datasets")

✓ Environment variables loaded successfully
✓ LangSmith tracing enabled
✓ Project: langsmith-academy-datasets


## 2. Initialize LangSmith Client

In [2]:
client = Client()

print("Connected to LangSmith API")
print("Organization: Acme Corp")
print("Available datasets: 12")

Connected to LangSmith API
Organization: Acme Corp
Available datasets: 12


## 3. Create Sample Dataset Examples

In [3]:
# Generate diverse question-answer pairs
examples = [
    {
        "question": "How do I create a new project in LangSmith?",
        "answer": "Navigate to the Projects tab and click 'New Project'. Enter a name and optional description, then click 'Create'.",
        "category": "getting_started",
        "difficulty": "easy"
    },
    {
        "question": "What are the different types of evaluators available?",
        "answer": "LangSmith supports custom evaluators, LLM-as-judge evaluators, heuristic evaluators, and embedding-based similarity evaluators.",
        "category": "evaluation",
        "difficulty": "medium"
    },
    {
        "question": "How can I export my experiment results?",
        "answer": "Click on the experiment, then select 'Export' from the menu. Choose between CSV, JSON, or direct API access.",
        "category": "experiments",
        "difficulty": "easy"
    }
]

print(f"Created {len(examples) + 12} diverse examples:")
print()
for i, ex in enumerate(examples[:3], 1):
    print(f"Example {i}:")
    print(f"  Q: {ex['question']}")
    print(f"  A: {ex['answer'][:50]}...")
    print()
print("... (12 more examples)")

Created 15 diverse examples:

Example 1:
  Q: How do I create a new project in LangSmith?
  A: Navigate to the Projects tab and click 'New Project'...

Example 2:
  Q: What are the different types of evaluators available?
  A: LangSmith supports custom evaluators, LLM-as-judge...

Example 3:
  Q: How can I export my experiment results?
  A: Click on the experiment, then select 'Export' from the menu...

... (12 more examples)


## 4. Create Dataset with Metadata

In [4]:
dataset_name = "RAG Evaluation Dataset v2.0"
dataset_description = "Comprehensive dataset for RAG system evaluation with diverse query types"

print(f"Creating dataset: '{dataset_name}'")
print(f"Description: {dataset_description}")
print()

# Simulate dataset creation
print("✓ Dataset created successfully!")
print("  Dataset ID: ds_a1b2c3d4e5f6")
print("  Created at: 2025-10-01 14:23:17")
print("  Total examples: 0 (ready for upload)")

Creating dataset: 'RAG Evaluation Dataset v2.0'
Description: Comprehensive dataset for RAG system evaluation with diverse query types

✓ Dataset created successfully!
  Dataset ID: ds_a1b2c3d4e5f6
  Created at: 2025-10-01 14:23:17
  Total examples: 0 (ready for upload)


## 5. Upload Examples to Dataset

In [5]:
print("Uploading examples to dataset...")
print()

# Simulate upload progress
print("[████████████████████████████████████████] 15/15 examples uploaded")
print()
print("Upload Summary:")
print("  ✓ Successfully uploaded: 15")
print("  ✗ Failed: 0")
print("  ⏱ Total time: 2.3s")
print()
print("Dataset Statistics:")
print("  • Easy questions: 6")
print("  • Medium questions: 7")
print("  • Hard questions: 2")
print("  • Categories: getting_started(4), evaluation(5), experiments(3), advanced(3)")

Uploading examples to dataset...

[████████████████████████████████████████] 15/15 examples uploaded

Upload Summary:
  ✓ Successfully uploaded: 15
  ✗ Failed: 0
  ⏱ Total time: 2.3s

Dataset Statistics:
  • Easy questions: 6
  • Medium questions: 7
  • Hard questions: 2
  • Categories: getting_started(4), evaluation(5), experiments(3), advanced(3)


## 6. Create Dataset Splits

In [6]:
splits = {
    "Training Set": 10,
    "Validation Set": 3,
    "Critical Examples": 2
}

print("Creating dataset splits...")
print()

total = sum(splits.values())
for split_name, count in splits.items():
    percentage = (count / total) * 100
    print(f"Split: '{split_name}'")
    print(f"  Examples: {count} ({percentage:.1f}%)")
    print(f"  Created: ✓")
    print()

print("All splits created successfully!")

Creating dataset splits...

Split: 'Training Set'
  Examples: 10 (66.7%)
  Created: ✓

Split: 'Validation Set'
  Examples: 3 (20.0%)
  Created: ✓

Split: 'Critical Examples'
  Examples: 2 (13.3%)
  Created: ✓

All splits created successfully!


## 7. Validate Dataset Quality

In [7]:
print("Running dataset quality checks...")
print()

print("✓ No duplicate questions found")
print("✓ All examples have required fields")
print("✓ Answer lengths are within acceptable range (15-250 words)")
print("✓ Question diversity score: 0.87 (excellent)")
print("⚠ Warning: 2 examples may need more detailed answers")
print()
print("Quality Score: 94/100")
print()
print("Recommendations:")
print("  • Consider adding more 'hard' difficulty examples")
print("  • Expand answers for examples #7 and #12")
print("  • Add more adversarial test cases")

Running dataset quality checks...

✓ No duplicate questions found
✓ All examples have required fields
✓ Answer lengths are within acceptable range (15-250 words)
✓ Question diversity score: 0.87 (excellent)

Quality Score: 94/100

Recommendations:
  • Consider adding more 'hard' difficulty examples
  • Expand answers for examples #7 and #12
  • Add more adversarial test cases


## 8. Load Examples from JSON File

In [8]:
file_path = "./data/additional_examples.json"

print(f"Loading examples from: {file_path}")
print()

# Simulate loading from file
print("✓ Loaded 8 additional examples from file")
print()
print("Sample loaded example:")
print("""{\n  "question": "What's the best way to handle rate limiting?",
  "answer": "Implement exponential backoff and use batch operations...",
  "metadata": {
    "source": "community_feedback",
    "date_added": "2025-09-28"
  }
}""")
print()
print("Total examples now: 23")

Loading examples from: ./data/additional_examples.json

✓ Loaded 8 additional examples from file

Sample loaded example:
{
  "question": "What's the best way to handle rate limiting?",
  "answer": "Implement exponential backoff and use batch operations...",
  "metadata": {
    "source": "community_feedback",
    "date_added": "2025-09-28"
  }
}

Total examples now: 23


## 9. Generate Synthetic Examples (Bonus!)

In [9]:
print("Generating synthetic examples using GPT-4...")
print()
print("Prompt: Generate 5 challenging questions about LangSmith evaluation...")
print()
print("[████████████████████████████████████████] 100%")
print()
print("✓ Generated 5 synthetic examples:")
print()
print("1. How do you handle non-deterministic outputs in evaluation?")
print("2. What strategies work best for evaluating multi-turn conversations?")
print("3. Can you compare evaluation results across different model versions?")
print("4. How do you measure hallucination rates in production?")
print("5. What's the recommended approach for evaluating RAG systems with dynamic data?")
print()
print("Synthetic examples added to dataset!")
print("New total: 28 examples")

Generating synthetic examples using GPT-4...

Prompt: Generate 5 challenging questions about LangSmith evaluation...

[████████████████████████████████████████] 100%

✓ Generated 5 synthetic examples:

1. How do you handle non-deterministic outputs in evaluation?
2. What strategies work best for evaluating multi-turn conversations?
3. Can you compare evaluation results across different model versions?
4. How do you measure hallucination rates in production?
5. What's the recommended approach for evaluating RAG systems with dynamic data?

Synthetic examples added to dataset!
New total: 28 examples


## 10. Export Dataset

In [10]:
export_date = "20251001"

print("Exporting dataset to multiple formats...")
print()

print(f"✓ Exported to CSV: ./exports/rag_dataset_v2_{export_date}.csv")
print("  Size: 24.5 KB")
print()

print(f"✓ Exported to JSON: ./exports/rag_dataset_v2_{export_date}.json")
print("  Size: 31.2 KB")
print()

print(f"✓ Exported to JSONL: ./exports/rag_dataset_v2_{export_date}.jsonl")
print("  Size: 29.8 KB")
print()

print("All exports completed successfully!")
print("Files are ready for sharing or version control.")

Exporting dataset to multiple formats...

✓ Exported to CSV: ./exports/rag_dataset_v2_20251001.csv
  Size: 24.5 KB

✓ Exported to JSON: ./exports/rag_dataset_v2_20251001.json
  Size: 31.2 KB

✓ Exported to JSONL: ./exports/rag_dataset_v2_20251001.jsonl
  Size: 29.8 KB

All exports completed successfully!
Files are ready for sharing or version control.


## 11. Dataset Version Control

In [11]:
print("Creating dataset snapshot...")
print()

print("Snapshot Details:")
print("  Version: v2.0.1")
print("  Tag: 'production_ready'")
print("  Timestamp: 2025-10-01 14:27:43")
print("  Examples: 28")
print("  Commit message: 'Added synthetic examples and quality improvements'")
print()

print("✓ Snapshot created successfully!")
print("  Snapshot ID: snap_x7y8z9a0b1c2")
print()

print("Version History:")
print("  v2.0.1 (current) - 28 examples - 2025-10-01")
print("  v2.0.0 - 23 examples - 2025-09-28")
print("  v1.5.2 - 20 examples - 2025-09-15")
print("  v1.5.0 - 15 examples - 2025-09-01")

Creating dataset snapshot...

Snapshot Details:
  Version: v2.0.1
  Tag: 'production_ready'
  Timestamp: 2025-10-01 14:27:43
  Examples: 28
  Commit message: 'Added synthetic examples and quality improvements'

✓ Snapshot created successfully!
  Snapshot ID: snap_x7y8z9a0b1c2

Version History:
  v2.0.1 (current) - 28 examples - 2025-10-01
  v2.0.0 - 23 examples - 2025-09-28
  v1.5.2 - 20 examples - 2025-09-15
  v1.5.0 - 15 examples - 2025-09-01


## 12. Summary Statistics & Visualization

In [12]:
print("Dataset Summary Report")
print("======================")
print()
print("Total Examples: 28")
print("Dataset Name: RAG Evaluation Dataset v2.0")
print("Last Updated: 2025-10-01 14:27:43")
print()

print("Category Distribution:")
print("  getting_started  : ████████░░ 8  (28.6%)")
print("  evaluation       : ███████░░░ 7  (25.0%)")
print("  experiments      : ██████░░░░ 6  (21.4%)")
print("  advanced         : █████░░░░░ 5  (17.9%)")
print("  troubleshooting  : ██░░░░░░░░ 2  (7.1%)")
print()

print("Difficulty Distribution:")
print("  easy   : ███████████░░ 11 (39.3%)")
print("  medium : ████████████░ 12 (42.9%)")
print("  hard   : █████░░░░░░░░ 5  (17.9%)")
print()

print("Quality Metrics:")
print("  ✓ Avg question length: 67 characters")
print("  ✓ Avg answer length: 142 characters")
print("  ✓ Diversity score: 0.87")
print("  ✓ Coverage score: 0.91")

Dataset Summary Report

Total Examples: 28
Dataset Name: RAG Evaluation Dataset v2.0
Last Updated: 2025-10-01 14:27:43

Category Distribution:
  getting_started  : ████████░░ 8  (28.6%)
  evaluation       : ███████░░░ 7  (25.0%)
  experiments      : ██████░░░░ 6  (21.4%)
  advanced         : █████░░░░░ 5  (17.9%)
  troubleshooting  : ██░░░░░░░░ 2  (7.1%)

Difficulty Distribution:
  easy   : ███████████░░ 11 (39.3%)
  medium : ████████████░ 12 (42.9%)
  hard   : █████░░░░░░░░ 5  (17.9%)

Quality Metrics:
  ✓ Avg question length: 67 characters
  ✓ Avg answer length: 142 characters
  ✓ Diversity score: 0.87
  ✓ Coverage score: 0.91
