# Azure Cognitive Search Backup and Restore Sample

This notebook demonstrates how to back up and restore a search index and migrate it to another instance.

The only prerequisite is that your search index has a `key` field that is `filterable` and `sortable`. If you don't have one, you can create a new field and populate it with a unique value for each document.
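
One way to confirm this prerequisite before running the backup is to inspect the index definition. The sketch below is a minimal check and not part of the original sample: `check_key_field` is a hypothetical helper that takes the `fields` list of an index (for example, from `SearchIndexClient.get_index(...).fields`) and assumes each field exposes `name`, `key`, `filterable`, and `sortable` attributes.

```python
from types import SimpleNamespace


def check_key_field(fields, key_field_name):
    """Return a list of problems with the key field; an empty list means it looks usable."""
    field = next((f for f in fields if f.name == key_field_name), None)
    if field is None:
        return [f"field '{key_field_name}' not found"]
    problems = []
    for attribute in ("key", "filterable", "sortable"):
        # Treat a missing or None attribute the same as False
        if not getattr(field, attribute, False):
            problems.append(f"field '{key_field_name}' is not marked {attribute}")
    return problems


# Stand-in field objects so the sketch runs without a search service
example_fields = [
    SimpleNamespace(name="id", key=True, filterable=True, sortable=False),
    SimpleNamespace(name="title", key=False, filterable=False, sortable=False),
]
print(check_key_field(example_fields, "id"))
```

An empty result suggests the field can be used as `key_field_name` in the backup cell further down.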

Only fields marked as `retrievable` can be backed up and restored, so consider whether your vector fields should be `retrievable` in your Cognitive Search index. Marking vector fields as `retrievable` lets you back up and restore them and use them for any other purpose. Not marking them as `retrievable` saves storage costs, but those fields cannot be backed up or restored.
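
To see which fields a backup would skip, you can list the fields that are not retrievable. This is a minimal sketch under one assumption worth checking against your SDK version: that field objects express retrievability through a `hidden` attribute, where `hidden=True` means the field is not retrievable. `non_retrievable_fields` is a hypothetical helper name, and it only looks at top-level fields, not sub-fields of complex types.

```python
from types import SimpleNamespace


def non_retrievable_fields(fields):
    """Return the names of top-level fields that would be missing from search results."""
    return [f.name for f in fields if getattr(f, "hidden", False)]


# Stand-in field objects so the sketch runs without a search service
example_fields = [
    SimpleNamespace(name="id", hidden=False),
    SimpleNamespace(name="content", hidden=False),
    SimpleNamespace(name="contentVector", hidden=True),
]
print(non_retrievable_fields(example_fields))
```

With a real index, `fields` could be the `fields` list of the definition returned by `get_index`.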

Review this sample and follow the instructions in this Jupyter Python notebook to back up and restore your Azure Cognitive Search indexes.

In [None]:
! pip install azure-search-documents --pre
! pip install tqdm

This script demonstrates backing up and restoring an Azure Cognitive Search index between two services. The `backup_and_restore_index` function retrieves the source index definition, creates a new target index, backs up all documents, and restores them to the target index.

In [10]:
from azure.core.credentials import AzureKeyCredential  
from azure.search.documents import SearchClient  
from azure.search.documents.indexes import SearchIndexClient  
from azure.search.documents.indexes.models import SearchIndex, SearchField, VectorSearch  
import tqdm  
  
def create_clients(endpoint, key, index_name):  
    search_client = SearchClient(endpoint=endpoint, index_name=index_name, credential=AzureKeyCredential(key))  
    index_client = SearchIndexClient(endpoint=endpoint, credential=AzureKeyCredential(key))  
    return search_client, index_client  
  
def search_results(search_client, key_field_name, top=1000):  
    last_key = None  
    while True:  
        query_kwargs = {  
            "search_text": "*",  
            "top": top,  
            "order_by": key_field_name  
        }  
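        if last_key is not None:
            # Escape single quotes so the key is a valid OData string literal
            escaped_key = str(last_key).replace("'", "''")
            query_kwargs["filter"] = f"{key_field_name} gt '{escaped_key}'"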
          
        response = search_client.search(**query_kwargs)  
        results_count = 0  
        for result in response:  
            yield result  
            results_count += 1  
            last_key = result[key_field_name]  
          
        if results_count < top:  
            break  

def backup_and_restore_index(source_endpoint, source_key, source_index_name, target_endpoint, target_key, target_index_name, key_field_name):  
    # Create search and index clients  
    source_search_client, source_index_client = create_clients(source_endpoint, source_key, source_index_name)  
    target_search_client, target_index_client = create_clients(target_endpoint, target_key, target_index_name)  
  
    # Get the source index definition  
    source_index = source_index_client.get_index(name=source_index_name)  
  
    # Create target index with the same definition  
    target_index = SearchIndex(  
        name=target_index_name,  
        fields=[SearchField(**field.as_dict()) for field in source_index.fields],  
        vector_search=VectorSearch(**source_index.vector_search.as_dict()) if source_index.vector_search else None  
    )  
    target_index_client.create_index(target_index)  
  
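    # Backup: page through the source index and hold every document in memory
    all_documents = list(search_results(source_search_client, key_field_name))

    # The documents are already loaded at this point, so this bar only confirms the count
    print("Backing up documents:")
    with tqdm.tqdm(total=len(all_documents)) as progress_bar:
        for result in all_documents:
            progress_bar.update(1)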
  
    print("Restoring documents:")  
    failed_documents = 0  
    failed_keys = []  
    with tqdm.tqdm(total=len(all_documents)) as progress_bar:  
        for batch_start in range(0, len(all_documents), 1000):  
            batch_end = min(batch_start + 1000, len(all_documents))  
            batch = all_documents[batch_start:batch_end]  
            result = target_search_client.upload_documents(documents=batch)  
            progress_bar.update(len(result))  
  
            for item in result:
                if item.succeeded is not True:
                    failed_documents += 1
                    # Each indexing result carries the document key and the error text
                    failed_keys.append(item.key)
                    print(f"Document upload error: {item.error_message}")
  
    if failed_documents > 0:  
        print(f"Failed documents: {failed_documents}")  
        print(f"Failed document keys: {failed_keys}")  
    else:  
        print("All documents uploaded successfully.")  
  
    print(f"Successfully backed up '{source_index_name}' and restored to '{target_index_name}'")  
    return source_search_client, target_search_client, all_documents  
  
# Replace with your service endpoints, keys, and index names  
source_endpoint = "YOUR_SEARCH_SERVICE_SOURCE_ENDPOINT"  
source_key = "YOUR_SEARCH_SERVICE_SOURCE_ADMIN_KEY"  
source_index_name = "YOUR_SEARCH_SERVICE_SOURCE_INDEX_NAME"  
target_endpoint = "YOUR_SEARCH_SERVICE_TARGET_ENDPOINT" 
target_key = "YOUR_SEARCH_SERVICE_TARGET_ADMIN_KEY"  
target_index_name = "YOUR_SEARCH_SERVICE_TARGET_INDEX_NAME"  
# Replace with the name of the key field in your index (e.g. ID, AzureDocumentKey); its values must be unique
key_field_name = "YOUR_KEY_FIELD_NAME"  
  
source_search_client, target_search_client, all_documents = backup_and_restore_index(source_endpoint, source_key, source_index_name, target_endpoint, target_key, target_index_name, key_field_name)  


Backing up documents:


100%|██████████| 2681468/2681468 [00:00<00:00, 3988426.45it/s]


Restoring documents:


100%|██████████| 2681468/2681468 [24:00<00:00, 1861.12it/s] 

All documents uploaded successfully.
Successfully backed up 'beir-nq-2022-09-29' and restored to 'beir-nq'
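
The cell above loads every document into a list before uploading, which is simple but holds the whole index in memory. For larger indexes, one alternative is to stream documents in fixed-size batches. The sketch below illustrates that idea and is not the sample's method: `batched` and `restore_in_batches` are hypothetical helpers, and `upload` stands for any callable that accepts a list of documents.

```python
from itertools import islice


def batched(iterable, batch_size):
    """Yield lists of up to batch_size items without materializing the whole input."""
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def restore_in_batches(documents, upload, batch_size=1000):
    """Pass documents to upload() one batch at a time and return how many were sent."""
    sent = 0
    for batch in batched(documents, batch_size):
        upload(batch)
        sent += len(batch)
    return sent


# Stand-in data and upload function so the sketch runs without a search service
received = []
total = restore_in_batches(({"id": str(i)} for i in range(2500)), received.append)
print(total, [len(batch) for batch in received])
```

With real clients, `documents` could be `search_results(source_search_client, key_field_name)` and `upload` could be `lambda batch: target_search_client.upload_documents(documents=batch)`. This version does not inspect per-document results, so failure handling like that in the cell above would still need to be added.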





The `verify_counts` function compares the document counts of the source and target indexes after the backup and restore, and prints whether they match.

In [11]:
def verify_counts(source_search_client, target_search_client):  
    source_document_count = source_search_client.get_document_count()  
    target_document_count = target_search_client.get_document_count()  
  
    print(f"Source document count: {source_document_count}")  
    print(f"Target document count: {target_document_count}")  
  
    if source_document_count == target_document_count:  
        print("Document counts match.")  
    else:  
        print("Document counts do not match.")  
  
# Call the verify_counts function with the search_clients returned by the backup_and_restore_index function  
verify_counts(source_search_client, target_search_client)  


Source document count: 2681468
Target document count: 2681468
Document counts match.
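
Reported document counts may lag briefly behind recent uploads, so a mismatch right after a restore does not always mean documents are missing. The sketch below retries the comparison a few times. It is an illustration under stated assumptions: `wait_for_matching_counts` is a hypothetical helper, the number of attempts and the delay are arbitrary, and both count arguments are zero-argument callables such as `source_search_client.get_document_count`.

```python
import time


def wait_for_matching_counts(get_source_count, get_target_count, attempts=5, delay_seconds=2.0):
    """Poll both counts until they match; return (matched, source_count, target_count)."""
    source_count = target_count = None
    for attempt in range(attempts):
        source_count = get_source_count()
        target_count = get_target_count()
        if source_count == target_count:
            return True, source_count, target_count
        # Sleep only between attempts, not after the last one
        if attempt < attempts - 1:
            time.sleep(delay_seconds)
    return False, source_count, target_count


# Stand-in counters: the target "catches up" on the third poll
target_counts = iter([90, 95, 100])
print(wait_for_matching_counts(lambda: 100, lambda: next(target_counts), delay_seconds=0))
```

Matching counts confirm only that the same number of documents exists in both indexes, not that their contents are identical.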
