# Tutorial 01: Milvus Basics
Welcome to the first tutorial in our Vector Database series! In this notebook, you'll learn how to set up Milvus, create connections, databases, users, collections, and perform basic operations. Let's get started!

---

## Prerequisites
Before you begin, make sure you have the required Python packages installed.

In [None]:
# Install required packages for Milvus and LLM operations (minimum versions used in this tutorial)
# The specifiers are quoted so the shell does not treat ">" as output redirection.
!pip install "pymilvus>=2.4.8"
!pip install "openai>=1.54.0"
!pip install "langchain>=0.3.7"
!pip install "langchain-openai>=0.2.8"
!pip install "tiktoken>=0.8.0"
!pip install "transformers>=4.46.0"
!pip install "pandas>=2.2.3"
!pip install "pdfminer.six>=20231228"
!pip install "numpy>=1.26.4"

# Packages for the local (HuggingFace) embedding option
!pip install "langchain-huggingface>=0.1.0"
!pip install "sentence-transformers>=3.0.0"
!pip install "torch>=2.0.0"
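
After installing, it can help to confirm what actually landed in the active environment. The following is a minimal sketch using only the standard library; the helper name `installed_versions` is hypothetical, and the list simply mirrors the distribution names from the install cell.

```python
from importlib import metadata

# Distribution names mirrored from the install cell (not an authoritative list).
REQUIRED = [
    "pymilvus", "openai", "langchain", "langchain-openai", "tiktoken",
    "transformers", "pandas", "pdfminer.six", "numpy",
    "langchain-huggingface", "sentence-transformers", "torch",
]

def installed_versions(names):
    """Return {distribution name: version string, or None if it is not installed}."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions

for name, version in installed_versions(REQUIRED).items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```

Any entry reported as `NOT INSTALLED` points to a package the kernel cannot import, which is worth fixing before running the later cells.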

## Step 1: Connect to Milvus
Let's start by connecting to your local Milvus instance. We'll use the `pymilvus` library for this purpose.
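
Before connecting, a quick reachability check can save time when the server is not running. This is a sketch under the assumption that Milvus listens on `localhost:19530` as in this tutorial; `is_port_open` is a hypothetical helper built on the standard `socket` module, and it only confirms that something accepts TCP connections on that port, not that it is Milvus.

```python
import socket

def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, int(port)), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False

if not is_port_open("localhost", 19530):
    print("Nothing is listening on localhost:19530; is Milvus running?")
```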

In [190]:
# Connect to Milvus (tutorial connection)
from pymilvus import connections

# Add a named connection for tutorial purposes
connections.add_connection(
    tutorial_conn={
        "host": "localhost",
        "port": "19530",
        "username": "",
        "password": ""
    }
)

# Connect using the tutorial connection name
connections.connect("tutorial_conn")

# List all active connections
connections.list_connections()

[('default', None),
 ('tutorial_conn',
  <pymilvus.client.grpc_handler.GrpcHandler at 0x259fe9fb6b0>)]

## Step 2: Create a Database and User
Now, let's create a new database and a user for our tutorial.

In [191]:
# Database operations for the tutorial
from pymilvus import db

# Get current list of databases available to the connection
tutorial_dbs = db.list_database(using='tutorial_conn')
print('Current databases:', tutorial_dbs)

tutorial_db_name = 'tutorial_courses_db'

if tutorial_db_name not in tutorial_dbs:
    print('Creating database:', tutorial_db_name)
    tutorial_db = db.create_database(tutorial_db_name, using='tutorial_conn')

# Switch to use the new database
db.using_database(tutorial_db_name, using='tutorial_conn')

Current databases: ['default']
Creating database: tutorial_courses_db


In [192]:
# User management for the tutorial
from pymilvus import Role, utility

current_users = utility.list_usernames(using='tutorial_conn')
print('Current user list:', current_users)

tutorial_user = 'tutorial_student'

if tutorial_user not in current_users:
    utility.create_user(tutorial_user, 'tutorial_password', using='tutorial_conn')

# Assign a role to the user
student_role = Role('public', using='tutorial_conn')
print('Role public exists?', student_role.is_exist())

# Add user to role
student_role.add_user(tutorial_user)

Current user list: ['root']
Role public exists? True


## Step 3: Create a Collection
Let's define a collection to store course information and their vector embeddings.

In [None]:
from pymilvus import CollectionSchema, FieldSchema, DataType, Collection
import json

# Define fields for the tutorial collection (updated with latest field specifications)
course_id_field = FieldSchema(
    name='tutorial_series_ID',
    dtype=DataType.INT64,
    is_primary=True,
    auto_id=False  # Explicit specification for clarity
)

course_title_field = FieldSchema(
    name='series',
    dtype=DataType.VARCHAR,
    max_length=256
)

course_desc_field = FieldSchema(
    name='series_description',
    dtype=DataType.VARCHAR,
    max_length=2048
)

embedding_field = FieldSchema(
    name='description_embedding',
    dtype=DataType.FLOAT_VECTOR,
    dim=1536,  # OpenAI text-embedding-3-small dimension
    description="Vector embeddings for course descriptions"
)

# Define schema with updated parameters
tutorial_schema = CollectionSchema(
    fields=[course_id_field, course_title_field, course_desc_field, embedding_field],
    description='Tutorial Series Collection with Vector Embeddings',
    enable_dynamic_field=True,
    auto_id=False  # Explicit control over ID generation
)

tutorial_collection_name = 'tutorials_collection'

# Create the collection with updated parameters
tutorial_collection = Collection(
    name=tutorial_collection_name,
    schema=tutorial_schema,
    using='tutorial_conn',
    shards_num=2,
    consistency_level="Strong"  # Explicit consistency level
)

from pymilvus import utility
# List all collections
print('Current collections:', utility.list_collections(using='tutorial_conn'))

# Get a second handle to the existing collection by name
r_collection = Collection(tutorial_collection_name, using='tutorial_conn')
print('\nCollection Schema:')
print(r_collection.schema)
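
The schema limits (`max_length`, `dim`) are only enforced when data is inserted, so a client-side check can catch problems earlier. The sketch below is illustrative only: `validate_row` is a hypothetical helper, the limits are copied by hand from the schema above, and it assumes VARCHAR `max_length` is counted in bytes.

```python
# Limits copied by hand from the schema above. VARCHAR max_length is assumed
# to be counted in bytes, so lengths are measured on the UTF-8 encoding.
VARCHAR_LIMITS = {"series": 256, "series_description": 2048}
VECTOR_FIELD = "description_embedding"
VECTOR_DIM = 1536

def validate_row(row, vector_dim=VECTOR_DIM):
    """Return a list of problems found in one row; an empty list means it fits."""
    problems = []
    if not isinstance(row.get("tutorial_series_ID"), int):
        problems.append("tutorial_series_ID must be an int")
    for field, limit in VARCHAR_LIMITS.items():
        value = row.get(field)
        if not isinstance(value, str):
            problems.append(f"{field} must be a string")
        elif len(value.encode("utf-8")) > limit:
            problems.append(f"{field} exceeds {limit} bytes")
    vector = row.get(VECTOR_FIELD)
    if vector is None or len(vector) != vector_dim:
        problems.append(f"{VECTOR_FIELD} must have {vector_dim} values")
    return problems
```

For a collection built for a 384-dimensional local model, pass `vector_dim=384`.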

In [193]:
# Alternative Collection Schema for Local Embeddings (384 dimensions)
# Use this cell if you're using the local HuggingFace embedding model

from pymilvus import CollectionSchema, FieldSchema, DataType, Collection, utility
import json

# Define fields for the tutorial collection (for local embeddings - 384 dimensions)
course_id_field_local = FieldSchema(
    name='tutorial_series_ID',
    dtype=DataType.INT64,
    is_primary=True,
    auto_id=False
)

course_title_field_local = FieldSchema(
    name='series',
    dtype=DataType.VARCHAR,
    max_length=256
)

course_desc_field_local = FieldSchema(
    name='series_description',
    dtype=DataType.VARCHAR,
    max_length=2048
)

embedding_field_local = FieldSchema(
    name='description_embedding',
    dtype=DataType.FLOAT_VECTOR,
    dim=384,  # HuggingFace all-MiniLM-L6-v2 dimension
    description="Vector embeddings for course descriptions (local model)"
)

# Define schema for local embeddings
tutorial_schema_local = CollectionSchema(
    fields=[course_id_field_local, course_title_field_local, course_desc_field_local, embedding_field_local],
    description='Tutorial Series Collection with Local Vector Embeddings',
    enable_dynamic_field=True,
    auto_id=False
)

tutorial_collection_name_local = 'tutorials_collection_local'

# Check if collection already exists and drop it if needed
existing_collections = utility.list_collections(using='tutorial_conn')
print('Current collections:', existing_collections)

if tutorial_collection_name_local in existing_collections:
    print(f'Collection "{tutorial_collection_name_local}" already exists. Dropping it...')
    utility.drop_collection(tutorial_collection_name_local, using='tutorial_conn')
    print('Collection dropped successfully.')

# Create the collection for local embeddings
tutorial_collection_local = Collection(
    name=tutorial_collection_name_local,
    schema=tutorial_schema_local,
    using='tutorial_conn',
    shards_num=2,
    consistency_level="Strong"
)

print(f'Collection "{tutorial_collection_name_local}" created successfully for local embeddings!')
print('Updated collections:', utility.list_collections(using='tutorial_conn'))

# Use this collection for the rest of the tutorial if using local embeddings
tutorial_collection = tutorial_collection_local
tutorial_collection_name = tutorial_collection_name_local

Current collections: []
Collection "tutorials_collection_local" created successfully for local embeddings!
Updated collections: ['tutorials_collection_local']


## Step 4: Insert Data into Milvus
Let's load some example course data and insert it into our collection.

In [194]:
# Read the input tutorial series CSV
import pandas as pd
tutorial_series = pd.read_csv("TutorialSeries-Descriptions.csv")
tutorial_series.head()

| Unnamed: 0 | Tutorial Series ID | Playlist | Description |
| --- | --- | --- | --- |
| 0 | 1 | Its all about CMD | Learn the power of the Command Line Interface ... |
| 1 | 2 | MS Excel Course - Begineer & Intermediate Users | Build a strong foundation in Excel, starting f... |
| 2 | 3 | Image Processing With OpenCV in Python | Dive into image processing using OpenCV and Py... |
| 3 | 4 | DSA - Interview Preparation Series | A focused series on Data Structures and Algori... |
| 4 | 5 | Deep Learning- Single Image Super Resolution | Explore how deep learning can enhance image qu... |


In [None]:
# Use langchain-openai for embeddings (latest API syntax)
from langchain_openai import OpenAIEmbeddings
import os

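# Setup OpenAI API key for embedding generation.
# An OPENAI_API_KEY already set in the environment takes precedence over the placeholder.
openai_api_key = os.environ.get('OPENAI_API_KEY', 'your-openai-api-key-here')  # Replace with your own key
os.environ['OPENAI_API_KEY'] = openai_api_key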

# Initialize with latest model and explicit parameters
embeddings_model = OpenAIEmbeddings(
    model="text-embedding-3-small",
    dimensions=1536,  # Explicit dimension specification
    show_progress_bar=True,  # Show progress for batch operations
)
print("OpenAI Embedding model loaded successfully!")

In [None]:
# Azure OpenAI setup (updated import and latest API)
from langchain_openai import AzureOpenAIEmbeddings
import os

# Azure-specific setup - IMPORTANT: Replace with your actual credentials
azure_api_key = 'your-azure-api-key-here'  # Replace with your Azure OpenAI key
azure_endpoint = 'https://your-resource.openai.azure.com/'  # Replace with your Azure endpoint
azure_deployment_name = 'your-deployment-name'  # Replace with your model's deployment name

# Set the environment variables for Azure OpenAI
os.environ['AZURE_OPENAI_API_KEY'] = azure_api_key
os.environ['AZURE_OPENAI_ENDPOINT'] = azure_endpoint

# Initialize the embedding model for Azure (latest API version)
embeddings_model = AzureOpenAIEmbeddings(
    azure_deployment=azure_deployment_name,
    openai_api_version="2024-10-21",  # API version; check the Azure documentation for the current one
    model="text-embedding-3-small",  # Specify model explicitly
)

print("Azure Embedding model loaded successfully!")

In [198]:
# Prepare data for insert (with compatible local embedding model)
tutorial_ids = tutorial_series['Tutorial Series ID'].tolist()
tutorial_titles = tutorial_series['Playlist'].tolist()
tutorial_descriptions = tutorial_series['Description'].tolist()
 
print(f"Processing {len(tutorial_descriptions)} tutorial descriptions...")
 
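# Try local embedding approaches in order of preference.
# Note: this resets any OpenAI/Azure model created in the cells above. Every option
# below produces 384-dimensional vectors, which must match the collection's dim.
embeddings_model = None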
 
# Option 1: Try direct transformers (most compatible)
try:
    print("Attempting to use transformers directly...")
    from transformers import AutoTokenizer, AutoModel
    import torch
    import numpy as np
   
    class DirectTransformersEmbeddings:
        def __init__(self):
            self.model_name = "sentence-transformers/all-MiniLM-L6-v2"
            print(f"Loading tokenizer and model: {self.model_name}")
           
            # Load tokenizer and model directly
            self.tokenizer = AutoTokenizer.from_pretrained(self.model_name)
            self.model = AutoModel.from_pretrained(self.model_name)
           
            # Set device
            self.device = 'cuda' if torch.cuda.is_available() else 'cpu'
            self.model.to(self.device)
            self.model.eval()  # Set to evaluation mode
           
            print(f"Direct transformers model loaded on {self.device}")
            print("This model produces 384-dimensional embeddings")
       
        def embed_query(self, text):
            # Tokenize
            inputs = self.tokenizer(
                text,
                return_tensors='pt',
                truncation=True,
                padding=True,
                max_length=512
            )
           
            # Move to device
            inputs = {k: v.to(self.device) for k, v in inputs.items()}
           
            # Generate embeddings
            with torch.no_grad():
                outputs = self.model(**inputs)
               
                # Mean pooling
                attention_mask = inputs['attention_mask']
                token_embeddings = outputs.last_hidden_state
                input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
                embeddings = torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)
               
                # Normalize
                embeddings = torch.nn.functional.normalize(embeddings, p=2, dim=1)
           
            return embeddings.cpu().numpy()[0].tolist()
   
    embeddings_model = DirectTransformersEmbeddings()
   
except Exception as e:
    print(f"Direct transformers failed: {e}")
   
    # Option 2: Try sentence-transformers with compatibility fix
    try:
        print("Attempting sentence-transformers with compatibility fix...")
       
        # Try to fix the LRScheduler issue
        import torch.optim.lr_scheduler as lr_scheduler
        if not hasattr(lr_scheduler, 'LRScheduler'):
            # For older PyTorch versions, LRScheduler might be named differently
            if hasattr(lr_scheduler, '_LRScheduler'):
                lr_scheduler.LRScheduler = lr_scheduler._LRScheduler
            else:
                # Create a dummy LRScheduler class
                class LRScheduler:
                    def __init__(self, *args, **kwargs):
                        pass
                lr_scheduler.LRScheduler = LRScheduler
       
        from sentence_transformers import SentenceTransformer
       
        class HuggingFaceEmbeddings:
            def __init__(self):
                self.model = SentenceTransformer('all-MiniLM-L6-v2')
                print("HuggingFace SentenceTransformer model loaded successfully!")
                print("This model produces 384-dimensional embeddings")
           
            def embed_query(self, text):
                embedding = self.model.encode(text, normalize_embeddings=True)
                return embedding.tolist()
       
        embeddings_model = HuggingFaceEmbeddings()
       
    except Exception as e:
        print(f"Sentence-transformers with fix failed: {e}")
 
# Option 3: Fallback to improved TF-IDF if all else fails
if embeddings_model is None:
    print("Using improved TF-IDF embeddings (no downloads required)...")
   
    import numpy as np
    from collections import Counter
    import re
    import math
 
    class ImprovedTextEmbeddings:
        def __init__(self, dimension=384):
            self.dimension = dimension
            self.vocabulary = {}
            self.idf_scores = {}
            print("Improved text embedding model initialized (no downloads required)")
       
        def preprocess_text(self, text):
            text = text.lower()
            text = re.sub(r'[^\w\s-]', ' ', text)
            words = [word for word in text.split() if len(word) > 2]
            return words
       
        def build_vocabulary(self, texts):
            all_words = []
            for text in texts:
                words = self.preprocess_text(text)
                all_words.extend(words)
           
            word_counts = Counter(all_words)
            stop_words = {'the', 'and', 'for', 'are', 'with', 'this', 'that', 'you', 'can', 'how', 'use', 'will', 'from', 'all'}
            filtered_words = {word: count for word, count in word_counts.items()
                             if word not in stop_words and count > 1}
           
            self.vocabulary = {word: idx for idx, (word, _) in enumerate(
                sorted(filtered_words.items(), key=lambda x: x[1], reverse=True)[:self.dimension])}
           
            doc_count = len(texts)
            for word in self.vocabulary:
                doc_freq = sum(1 for text in texts if word in self.preprocess_text(text))
                self.idf_scores[word] = math.log((doc_count + 1) / (doc_freq + 1)) + 1
           
            print(f"Built vocabulary with {len(self.vocabulary)} informative words")
       
        def embed_query(self, text):
            words = self.preprocess_text(text)
            word_counts = Counter(words)
           
            vector = np.zeros(self.dimension)
           
            for word, count in word_counts.items():
                if word in self.vocabulary:
                    idx = self.vocabulary[word]
                    tf = 1 + math.log(count) if count > 0 else 0
                    tfidf = tf * self.idf_scores.get(word, 0)
                    vector[idx] = tfidf
           
            # Add small random noise to prevent identical vectors
            noise = np.random.normal(0, 0.001, self.dimension)
            vector = vector + noise
           
            norm = np.linalg.norm(vector)
            if norm > 0:
                vector = vector / norm
            else:
                vector = np.random.normal(0, 0.01, self.dimension)
                vector = vector / np.linalg.norm(vector)
           
            return vector.tolist()
 
    embeddings_model = ImprovedTextEmbeddings(dimension=384)
    embeddings_model.build_vocabulary(tutorial_descriptions)
 
# Generate embeddings with progress tracking
desc_embeddings = []
for i, desc in enumerate(tutorial_descriptions, 1):
    print(f"Generating embedding {i}/{len(tutorial_descriptions)}: {desc[:50]}...")
    embedding = embeddings_model.embed_query(desc)
    desc_embeddings.append(embedding)
 
print(f"Generated {len(desc_embeddings)} embeddings successfully!")
 
# Check embedding quality
print("\nEmbedding Quality Check:")
if len(desc_embeddings) >= 2:
    import numpy as np
    emb1 = np.array(desc_embeddings[0])
    emb2 = np.array(desc_embeddings[1])
    similarity = np.dot(emb1, emb2) / (np.linalg.norm(emb1) * np.linalg.norm(emb2))
    print(f"Sample similarity between embeddings 1 and 2: {similarity:.4f}")
    print(f"Sample distance (1 - similarity): {1 - similarity:.4f}")
 
# Verify data consistency
print(f"\nData verification:")
print(f"  IDs: {len(tutorial_ids)}")
print(f"  Titles: {len(tutorial_titles)}")  
print(f"  Descriptions: {len(tutorial_descriptions)}")
print(f"  Embeddings: {len(desc_embeddings)}")
print(f"  Embedding dimensions: {len(desc_embeddings[0]) if desc_embeddings else 'N/A'}")
 
# Format for data input
insert_data = [tutorial_ids, tutorial_titles, tutorial_descriptions, desc_embeddings]
 
# Important note about dimensions
if desc_embeddings and len(desc_embeddings[0]) == 384:
    print("\nIMPORTANT: You're using 384-dimensional embeddings!")
    print("   Make sure your collection schema uses dim=384, not dim=1536")
    print("   Run the 'Alternative Collection Schema' cell (cell 10) if you haven't already")

Processing 10 tutorial descriptions...
Attempting to use transformers directly...
Loading tokenizer and model: sentence-transformers/all-MiniLM-L6-v2
Direct transformers failed: 
AutoModel requires the PyTorch library but it was not found in your environment. Check out the instructions on the
installation page: https://pytorch.org/get-started/locally/ and follow the ones that match your environment.
Please note that you may need to restart your runtime after installation.

Attempting sentence-transformers with compatibility fix...
Sentence-transformers with fix failed: name 'LRScheduler' is not defined
Using improved TF-IDF embeddings (no downloads required)...
Improved text embedding model initialized (no downloads required)
Built vocabulary with 39 informative words
Generating embedding 1/10: Learn the power of the Command Line Interface CMD ...
Generating embedding 2/10: Build a strong foundation in Excel, starting from ...
Generating embedding 3/10: Dive into image processing using
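
The TF-IDF fallback that ran here weights each word with a log-scaled term frequency and a smoothed inverse document frequency. The sketch below restates those two formulas from the cell as standalone functions so the arithmetic can be checked by hand; the function names are hypothetical.

```python
import math

def smoothed_idf(doc_count, doc_freq):
    """idf = ln((N + 1) / (df + 1)) + 1, as in build_vocabulary above."""
    return math.log((doc_count + 1) / (doc_freq + 1)) + 1

def log_tf(count):
    """tf = 1 + ln(count) for words that occur, 0 otherwise."""
    return 1 + math.log(count) if count > 0 else 0.0

def tfidf_weight(count, doc_count, doc_freq):
    return log_tf(count) * smoothed_idf(doc_count, doc_freq)

# A word appearing once in a description, across a 10-document corpus:
rare = tfidf_weight(count=1, doc_count=10, doc_freq=2)     # found in 2 documents
common = tfidf_weight(count=1, doc_count=10, doc_freq=10)  # found in every document
print(f"rare word weight: {rare:.4f}, common word weight: {common:.4f}")
```

A word found in every document gets the minimum idf of 1, so rarer words dominate the vector. Because the vocabulary is built only from the corpus, a query word that never made it into the vocabulary contributes nothing to the query vector.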

In [199]:
# Insert data into the tutorial collection (with error handling)
tutorial_collection = Collection(tutorial_collection_name, using='tutorial_conn')

try:
    print("Inserting data into collection...")
    mr = tutorial_collection.insert(insert_data)
    print(f"Insert result: {mr}")
    print(f"Inserted {mr.insert_count} records successfully!")
    
    # Flush the data after insert
    print('Flushing data to storage...')
    tutorial_collection.flush(timeout=180)
    print('Data flushed successfully!')
    
    # Verify the data was inserted
    print(f"Total entities in collection: {tutorial_collection.num_entities}")
    
except Exception as e:
    print(f"Error during insertion: {e}")
    print("Please check your data format and collection schema.")

Inserting data into collection...
Insert result: (insert count: 10, delete count: 0, upsert count: 0, timestamp: 461808329323708417, success count: 10, err count: 0
Inserted 10 records successfully!
Flushing data to storage...
Data flushed successfully!
Total entities in collection: 20
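
The recorded run reports 20 entities right after a 10-row insert. One plausible explanation, not confirmed by the source, is that the insert cell was run twice: Milvus does not deduplicate primary keys on `insert`, so re-running the cell stores the same IDs again. A minimal sketch of a guard, assuming the column-based layout of `insert_data` with the ID column first (`drop_existing_ids` is a hypothetical helper):

```python
def drop_existing_ids(columns, existing_ids):
    """Filter column-based insert data, keeping rows whose ID (first column) is new.

    columns: list of equal-length lists, ID column first (same layout as insert_data).
    existing_ids: iterable of primary keys already stored in the collection.
    """
    existing = set(existing_ids)
    keep = [i for i, pk in enumerate(columns[0]) if pk not in existing]
    return [[column[i] for i in keep] for column in columns]

# Toy data in the same column order: IDs, titles, descriptions, embeddings.
columns = [[1, 2, 3], ["a", "b", "c"], ["desc a", "desc b", "desc c"], [[0.1], [0.2], [0.3]]]
print(drop_existing_ids(columns, existing_ids=[2]))
```

The existing IDs could come from a scalar query on the primary key once the collection is indexed and loaded. Recent pymilvus versions also provide `Collection.upsert`, which may be a simpler fit if it is available in your setup.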


## Step 5: Build an Index
Let's build an index on the embedding field to enable fast vector search.

In [200]:
# Build an index for the embedding field (updated with latest parameters)
index_params = {
    'metric_type': 'COSINE', 
    'index_type': 'HNSW',     
    'params': {
        'M': 16,              # Number of connections for HNSW
        'efConstruction': 200  # Size of dynamic candidate list for HNSW
    }
}

# Create index with progress tracking
tutorial_collection.create_index(
    field_name='description_embedding',
    index_params=index_params,
    timeout=None  # Allow unlimited time for large datasets
)

# Check index building progress
progress = utility.index_building_progress(tutorial_collection_name, using='tutorial_conn')
print(f"Index building progress: {progress}")

Index building progress: {'total_rows': 20, 'indexed_rows': 20, 'pending_index_rows': 0, 'state': 'Finished'}
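
To see what the index is speeding up, it helps to look at the exact search it approximates: score every stored vector against the query and keep the best `k`. The following is a pure-Python sketch of that baseline, not Milvus's implementation; `exact_top_k` and the toy vectors are hypothetical.

```python
import heapq
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def exact_top_k(query, vectors, k):
    """Score every stored vector against the query; return the k best (id, score) pairs."""
    scored = ((cosine_similarity(query, vec), vec_id) for vec_id, vec in vectors.items())
    return [(vec_id, score) for score, vec_id in heapq.nlargest(k, scored)]

store = {1: [1.0, 0.0], 2: [0.0, 1.0], 3: [1.0, 1.0]}
print(exact_top_k([1.0, 0.0], store, k=2))
```

This scan costs one similarity computation per stored vector. HNSW avoids most of that work by walking a graph of neighbors, trading a small loss in recall for speed; `M` and `efConstruction` control how dense and how carefully built that graph is.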


## Step 6: Query Scalar Data
Let's query the collection for a specific course by its ID.

In [201]:
# Load the collection into memory before querying
tutorial_collection.load()
print('Tutorial collection loaded.')

Tutorial collection loaded.


In [202]:
# Query for a specific course by its primary key
result = tutorial_collection.query(
    expr='tutorial_series_ID == 1',
    output_fields=['series', 'series_description']
)
print(result)
if result:
    print('\nResult object:', type(result[0]))

data: ["{'series': 'Its all about CMD', 'series_description': 'Learn the power of the Command Line Interface CMD to navigate, manage files, and automate tasks on Windows. This series covers essential commands and practical use cases for beginners. Ideal for those looking to boost productivity and troubleshoot systems efficiently.', 'tutorial_series_ID': 1}"], extra_info: {}

Result object: <class 'dict'>


In [203]:
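# Query for tutorials whose title matches 'Azure' and whose ID is greater than 5.
# Note: without % wildcards, `like "Azure"` matches only the exact string "Azure",
# which would explain the empty result; use `like "%Azure%"` for a substring match.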
result2 = tutorial_collection.query(
    expr='(series like "Azure") and (tutorial_series_ID > 5)',
    output_fields=['series', 'series_description']
)
print(result2)

data: [], extra_info: {}
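
The empty result is easier to understand with a small emulation of `like` matching, where `%` stands for any run of characters. This is an illustration only, not Milvus's implementation: `like_match` is a hypothetical helper, it handles only the `%` wildcard, and which pattern shapes (prefix, infix, suffix) are supported depends on the Milvus version.

```python
import re

def like_match(value, pattern):
    """Emulate LIKE matching where % matches any run of characters (illustration only)."""
    regex = ".*".join(re.escape(part) for part in pattern.split("%"))
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None

title = "Azure ML Essentials"
print(like_match(title, "Azure"))    # False: no wildcard, so only an exact match counts
print(like_match(title, "Azure%"))   # True: prefix match
print(like_match(title, "%Azure%"))  # True: substring match
```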


## Step 7: Search Vector Fields
Let's search for courses using vector similarity.
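
With the COSINE metric, the value Milvus returns for each hit is a similarity in the range -1 to 1, so larger means closer, even though the result attribute is named `distance`. The sketch below, using hypothetical helper names and toy vectors, shows how cosine similarity relates to the dot product and to L2 distance once vectors are normalized to unit length.

```python
import math

def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def l2_squared(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

a = normalize([3.0, 4.0])
b = normalize([4.0, 3.0])

cos = dot(a, b)  # for unit vectors, cosine similarity is just the dot product
print("cosine similarity:", cos)
print("squared L2 distance:", l2_squared(a, b))
print("2 - 2 * cosine:", 2 - 2 * cos)  # equals the squared L2 distance for unit vectors
```

Since squared L2 distance equals `2 - 2 * cosine` for unit vectors, both metrics rank normalized embeddings the same way; only the direction of "better" differs.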

In [204]:
# Search for courses similar to a query string (updated search parameters)
search_params = {
    'metric_type': 'COSINE',  # Match the index metric type
    'offset': 0,
    'ignore_growing': False,
    'params': {
        'ef': 64  # Search parameter for HNSW index
    }
}

search_string = 'excel'
search_embed = embeddings_model.embed_query(search_string)

search_results = tutorial_collection.search(
    data=[search_embed],
    anns_field='description_embedding',
    param=search_params,
    limit=10,
    expr=None,
    output_fields=['series', 'series_description'],  # Include description in output
    consistency_level='Strong',
    round_decimal=4  # Round distances to 4 decimal places
)

print('Search results for:', search_string)
print('Search result type:', type(search_results[0]), '\n')
for i, result in enumerate(search_results[0], 1):
    print(f"{i}. ID: {result.id}, Distance: {result.distance:.4f}")
    print(f"   Series: {result.entity.get('series')}")
    print(f"   Description: {result.entity.get('series_description')[:100]}...")
    print()

Search results for: excel
Search result type: <class 'pymilvus.client.search_result.HybridHits'> 

1. ID: 5, Distance: 0.0659
   Series: Deep Learning- Single Image Super Resolution
   Description: Explore how deep learning can enhance image quality using Single Image Super Resolution SISR. Learn ...

2. ID: 4, Distance: 0.0520
   Series: DSA - Interview Preparation Series
   Description: A focused series on Data Structures and Algorithms DSA designed to help you crack technical intervie...

3. ID: 8, Distance: 0.0459
   Series: Azure ML Essentials
   Description: Get started with Azure Machine Learning and understand its core services. Learn to build, train, and...

4. ID: 10, Distance: 0.0201
   Series: .NET Core Web Applications with Azure 
   Description: Learn how to build, deploy, and scale modern web applications using .NET Core and Azure services. Co...

5. ID: 9, Distance: 0.0200
   Series: Autoamtion Testing Essentials - Selenium and Playwright
   Description: Master the basi

In [205]:
# Search for an unrelated query (updated with better formatting)
search_string2 = 'best movies of the year'
search_embed2 = embeddings_model.embed_query(search_string2)

search_results2 = tutorial_collection.search(
    data=[search_embed2],
    anns_field='description_embedding',
    param=search_params,
    limit=5,  # Reduced limit for cleaner output
    expr=None,
    output_fields=['series', 'series_description'],
    consistency_level='Strong',
    round_decimal=4
)

print('Search results for:', search_string2)
print('(Note: with the COSINE metric, higher scores indicate greater similarity)\n')
for i, result in enumerate(search_results2[0], 1):
    print(f"{i}. ID: {result.id}, Distance: {result.distance:.4f}")
    print(f"   Series: {result.entity.get('series')}")
    print(f"   Description: {result.entity.get('series_description')[:100]}...")
    print()

Search results for: best movies of the year
(Note: with the COSINE metric, higher scores indicate greater similarity)

1. ID: 2, Distance: -0.0014
   Series: MS Excel Course - Begineer & Intermediate Users
   Description: Build a strong foundation in Excel, starting from basic functions to more advanced tools like PivotT...

2. ID: 4, Distance: -0.0016
   Series: DSA - Interview Preparation Series
   Description: A focused series on Data Structures and Algorithms DSA designed to help you crack technical intervie...

3. ID: 6, Distance: -0.0128
   Series: Image Processing- Lane detection and Autonomous Driving
   Description: Learn how computer vision enables lane detection in self-driving cars. This series covers edge detec...

4. ID: 1, Distance: -0.0194
   Series: Its all about CMD
   Description: Learn the power of the Command Line Interface CMD to navigate, manage files, and automate tasks on W...

5. ID: 3, Distance: -0.0209
   Series: Image Processing With OpenCV in Python
   Description: Dive into imag

## Step 8: Delete Objects and Entities
Let's see how to delete records and collections in Milvus.

In [206]:
# Delete a single record by its primary key
tutorial_collection.delete('tutorial_series_ID in [2]')

(insert count: 0, delete count: 1, upsert count: 0, timestamp: 461808405038235652, success count: 0, err count: 0
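
When the IDs to delete come from a variable rather than a literal, the filter string has to be assembled. A minimal sketch, assuming integer primary keys as in this tutorial; `build_in_expr` is a hypothetical helper, not part of pymilvus.

```python
def build_in_expr(field, ids):
    """Build a filter such as 'tutorial_series_ID in [2, 5]' from integer IDs."""
    if not ids:
        raise ValueError("ids must not be empty")
    clean = [int(i) for i in ids]  # int() raises on values that are not numeric
    return f"{field} in [{', '.join(str(i) for i in clean)}]"

print(build_in_expr("tutorial_series_ID", [2, 5]))
```

The resulting string can be passed to `tutorial_collection.delete(...)` in place of the literal expression used above.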

In [207]:
# Drop the collection
utility.drop_collection(tutorial_collection_name, using='tutorial_conn')

In [208]:
# Drop the database (make sure all collections are dropped first)
db.drop_database(tutorial_db_name, using='tutorial_conn')

## Troubleshooting Tips

If you encounter any issues:

1. **Connection Issues**: Make sure Milvus is running on `localhost:19530`
   - Check if Docker containers are running: `docker ps`
   - Restart Milvus if needed: `docker-compose restart`

2. **API Key Issues**: Replace placeholder API keys with your actual credentials
   - OpenAI: Get key from [OpenAI Platform](https://platform.openai.com/api-keys)
   - Azure OpenAI: Get key from Azure Portal

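3. **Package Compatibility**: The minimum versions in the install cell were chosen for Python 3.13.9
   - Use `pip install --upgrade` if you encounter version conflicts
   - Consider using virtual environments for isolation
   - If `transformers` reports that PyTorch was not found, install `torch` and restart the kernel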

4. **Memory Issues**: Consider reducing batch sizes if working with large datasets
   - Process embeddings in smaller batches
   - Increase system memory or use cloud instances

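5. **Index Performance**: HNSW usually answers queries faster than IVF_FLAT at similar recall, at the cost of more memory
   - Adjust `M` and `efConstruction` based on your dataset size; larger values tend to improve recall but slow index builds
   - Match the metric to your embeddings: COSINE ignores vector length, and for unit-normalized vectors it ranks results the same way as L2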
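
The batching advice in tip 4 can be sketched as follows. The helpers `batched` and `embed_in_batches` are hypothetical; `embed_one` stands for any callable that maps one text to one vector, such as `embeddings_model.embed_query` in this notebook.

```python
def batched(items, batch_size):
    """Yield consecutive slices of items, each with at most batch_size elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

def embed_in_batches(texts, embed_one, batch_size=32):
    """Embed texts batch by batch so progress can be reported or saved between batches."""
    embeddings = []
    for number, batch in enumerate(batched(texts, batch_size), 1):
        embeddings.extend(embed_one(text) for text in batch)
        print(f"Finished batch {number} ({len(embeddings)}/{len(texts)} texts)")
    return embeddings

# Stand-in embedder for demonstration: a one-value "vector" holding the text length.
print(embed_in_batches(["cmd", "excel", "opencv"], lambda text: [float(len(text))], batch_size=2))
```

The same loop is a natural place to insert each batch into the collection instead of holding every embedding in memory at once.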

## Summary

In this updated tutorial, you learned how to:
- ✅ Connect to Milvus and manage databases/users
- ✅ Create collections with vector fields using latest schema options
- ✅ Insert data with embeddings and progress tracking
- ✅ Build efficient HNSW indexes for vector search
- ✅ Query both scalar and vector data with improved search parameters
- ✅ Clean up resources properly

### Key Updates for Python 3.13.9:
- **Recent package versions**, set as minimums in the install cell
- **Current embedding models** (text-embedding-3-small, with all-MiniLM-L6-v2 as a local option)
- **HNSW index** instead of IVF_FLAT
- **Error handling** and progress tracking around the embedding and insert steps
- **Explicit search parameters** (`metric_type`, `ef`)

The notebook targets **Python 3.13.9** and recent package versions. 🚀