# Azure AI Search Custom Skills Example

This Python notebook demonstrates scenarios that customers can build in Azure AI Studio with [custom Web API skills](https://learn.microsoft.com/en-us/azure/search/cognitive-search-custom-skill-web-api) to search through multi-modal content.

In this demo, we take a first step toward multi-modal content understanding by reducing the image + text modality (in documents that mix rich text and images, such as PPTs and PDFs) to a text-only modality through verbalization, that is, by describing each image in text.
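To make the verbalization idea concrete, here is a minimal sketch of the request and response shape that a custom Web API skill exchanges with Azure AI Search: a `values` array in which every record carries a `recordId` and a `data` object, and every output record echoes the `recordId` of its input. The function names `run_custom_skill` and `verbalize_placeholder`, and the field names `image` and `verbalizedImage`, are assumptions made for illustration; they are not taken from the Azure functions this demo deploys, and the placeholder does not call any model.

```python
import json
from typing import Any, Callable, Dict, List


def verbalize_placeholder(image_ref: str) -> str:
    """Hypothetical stand-in for a call to a vision-capable model."""
    return f"[description of {image_ref}]"


def run_custom_skill(
    request_body: Dict[str, Any],
    verbalize: Callable[[str], str] = verbalize_placeholder,
) -> Dict[str, Any]:
    """Map each input record to an output record with the same recordId."""
    results: List[Dict[str, Any]] = []
    for record in request_body.get("values", []):
        record_id = record.get("recordId")
        data = record.get("data", {})
        try:
            # "image" and "verbalizedImage" are assumed field names.
            text = verbalize(data["image"])
            results.append({
                "recordId": record_id,
                "data": {"verbalizedImage": text},
                "errors": [],
                "warnings": [],
            })
        except KeyError as exc:
            # Report the problem per record instead of failing the whole batch.
            results.append({
                "recordId": record_id,
                "data": {},
                "errors": [{"message": f"Missing input field: {exc}"}],
                "warnings": [],
            })
    return {"values": results}


sample_request = {
    "values": [
        {"recordId": "1", "data": {"image": "slide-3.png"}},
        {"recordId": "2", "data": {}},
    ]
}
print(json.dumps(run_custom_skill(sample_request), indent=2))
```

In a real deployment the placeholder would be replaced by a call to one of the deployed models, and the function would sit behind an HTTP endpoint that the skillset points to.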

### Prerequisites
An Azure subscription, with access to Azure OpenAI.

Azure AI Search, any tier. We recommend Basic or higher for this workload.

A deployment of the text-embedding-3-small model on Azure OpenAI.

Azure Blob Storage. This notebook connects to your storage account and loads a container with the sample PDFs and images.

Deployed Azure Functions that carry out the custom inference tasks with the language models used in this demo (gpt-4o, gpt-4o-mini, and phi).

### Install required packages


In [None]:
print("Starting the notebook execution")
! pip install -r azure-ai-search-custom-skills-sample-requirements.txt --quiet

### Set up key variables

In [None]:
from dotenv import load_dotenv
import os

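# Load environment variables from a .env file
load_dotenv()
blob_connection_string = os.getenv("BLOB_CONNECTION_STRING")
# Report only whether the value is set; the connection string may contain an account key
print(f"Blob connection string is set: {bool(blob_connection_string)}")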
index_name = "custom_skills_sample_index"
blob_container_name = "custom-skills-sample-container"
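Because later cells fail with less obvious errors when a setting is absent, it can help to check the environment up front. The sketch below is one possible way to do that; `missing_settings` is a hypothetical helper, and `BLOB_CONNECTION_STRING` is the only variable name taken from this notebook.

```python
import os
from typing import Iterable, List, Mapping


def missing_settings(
    names: Iterable[str],
    env: Mapping[str, str] = os.environ,
) -> List[str]:
    """Return the names that are unset or empty in the given environment."""
    return [name for name in names if not env.get(name)]


# Extend this list as later cells start reading more variables.
required = ["BLOB_CONNECTION_STRING"]
missing = missing_settings(required)
if missing:
    print(f"Missing settings: {', '.join(missing)}")
else:
    print("All required settings are present.")
```

The helper reports names only and never echoes values, so secrets stay out of the notebook output.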


## Connect to Blob Storage and load documents
Upload the documents to Blob Storage so that they can be indexed. You can use the sample documents in the data/documents folder.

In [None]:
from azure.storage.blob import BlobServiceClient
from azure.identity import DefaultAzureCredential
import glob
import os

def upload_sample_documents(
    blob_connection_string: str,
    blob_container_name: str,
    documents_directory: str,
    # Set to False to use the credentials included in the blob connection string.
    # Set to True to authenticate with your own identity instead.
    use_user_identity: bool = False,
):
    # Connect to Blob Storage
    blob_service_client = BlobServiceClient.from_connection_string(
        conn_str=blob_connection_string,
        credential=DefaultAzureCredential() if use_user_identity else None,
        logging_enable=True,
    )
    container_client = blob_service_client.get_container_client(blob_container_name)
    if not container_client.exists():
        print("Creating a new container")
        container_client.create_container()
    # Match every entry inside the directory; without the wildcard,
    # glob returns only the directory itself and open() fails on it
    pattern = os.path.join(documents_directory, "*")
    all_files = [path for path in glob.glob(pattern) if os.path.isfile(path)]
    print(f"Files to upload: {all_files}")
    for file in all_files:
        name = os.path.basename(file)
        # Skip blobs that already exist so the cell can be rerun safely
        if not container_client.get_blob_client(name).exists():
            with open(file, "rb") as data:
                container_client.upload_blob(name=name, data=data)

def upload_documents():
    # The connection string is not printed because it may contain an account key
    upload_sample_documents(
        blob_connection_string=blob_connection_string,
        blob_container_name=blob_container_name,
        documents_directory=os.path.join("data", "documents")
    )

upload_documents()

print(f"Set up sample data in {blob_container_name}")
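After the upload, a quick check can confirm that every local file has a blob with the same name. The following is a sketch under stated assumptions: `find_missing_blobs` is a hypothetical helper, and it relies only on the container client exposing `list_blobs()`, which returns items with a `name` attribute.

```python
import os
from typing import Any, Iterable, List


def find_missing_blobs(container_client: Any, local_paths: Iterable[str]) -> List[str]:
    """Return base names of local files that have no blob of the same name."""
    uploaded = {blob.name for blob in container_client.list_blobs()}
    return sorted(
        os.path.basename(path)
        for path in local_paths
        if os.path.basename(path) not in uploaded
    )


# Example usage with a real client (assumes the variables defined earlier):
#
#   import glob
#   from azure.storage.blob import BlobServiceClient
#   client = BlobServiceClient.from_connection_string(blob_connection_string)
#   container = client.get_container_client(blob_container_name)
#   local_files = glob.glob(os.path.join("data", "documents", "*"))
#   print(find_missing_blobs(container, local_files))
```

An empty list means every local file was found in the container. The check compares names only, so it does not detect a blob whose content differs from the local file.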