## Creating a Vector Store in Azure OpenAI / AI Studio

According to the Azure OpenAI [documentation](https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/assistant), the new **Vector Store** object supports up to 10,000 files.

Files have to be uploaded in phases:
- Up to **100 files** during the initial Vector Store setup;
- Then up to **500 files** in each subsequent batch.
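
The phased limits above can be sketched as a small batching helper. This is only an illustration, not part of the original notebook: `plan_batches` and the two constants are hypothetical names, and the limits are taken from the list above rather than queried from the service.

```python
from typing import List, Sequence

# Limits as stated above (assumed here as constants, not read from the service)
FIRST_BATCH_LIMIT = 100
NEXT_BATCH_LIMIT = 500


def plan_batches(paths: Sequence[str],
                 first_limit: int = FIRST_BATCH_LIMIT,
                 next_limit: int = NEXT_BATCH_LIMIT) -> List[List[str]]:
    """Split paths into one batch of up to first_limit items,
    followed by batches of up to next_limit items each."""
    batches: List[List[str]] = []
    if paths:
        batches.append(list(paths[:first_limit]))
    for start in range(first_limit, len(paths), next_limit):
        batches.append(list(paths[start:start + next_limit]))
    return batches


# 1,100 hypothetical file names split into the phased batches
demo_paths = [f"file{i + 1}.txt" for i in range(1100)]
print([len(batch) for batch in plan_batches(demo_paths)])  # [100, 500, 500]
```

Under these limits, a full set of 10,000 files would take 21 batches: one of 100, nineteen of 500, and a final one of 400.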

### Prerequisites

In [1]:
# Importing required packages
from openai import AzureOpenAI
import time
import os

In [2]:
# Extracting environment variables
AOAI_API_BASE = os.getenv("AZURE_OPENAI_API_BASE")
AOAI_API_VERSION = os.getenv("AZURE_OPENAI_API_VERSION")
AOAI_API_KEY = os.getenv("AZURE_OPENAI_API_KEY")
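
`os.getenv` returns `None` for an unset variable, so a missing setting would only surface later, when the client is created or first used. A minimal sketch of an early check follows; `missing_variables` and `REQUIRED_VARS` are hypothetical helpers that are not part of the original notebook.

```python
import os

# Variable names used by the notebook cell above
REQUIRED_VARS = (
    "AZURE_OPENAI_API_BASE",
    "AZURE_OPENAI_API_VERSION",
    "AZURE_OPENAI_API_KEY",
)


def missing_variables(environ=os.environ, required=REQUIRED_VARS):
    """Return the names of required variables that are unset or empty."""
    return [name for name in required if not environ.get(name)]


missing = missing_variables()
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
```

Running this right after the extraction cell makes a configuration problem visible before any request is sent.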

### Generating test files

In [3]:
# Creating local folders
NUMBER_OF_FOLDERS = 3

for i in range(NUMBER_OF_FOLDERS):
    folder = f"folder{i+1}"
    os.makedirs(folder, exist_ok=True)

In [4]:
# Generating text files: 100 files in the first folder, 500 files in all other folders
FIRST_FOLDER_FILES = 100
OTHER_FOLDERS_FILES = 500

for i in range(NUMBER_OF_FOLDERS):
    folder = f"folder{i+1}"
    for j in range(FIRST_FOLDER_FILES if i == 0 else OTHER_FOLDERS_FILES):
        with open(f"{folder}/{folder}_file{j+1}.txt", "w") as file:
            file.write(f"This is file {j+1} in folder {i+1}")
    print(f"Generated {FIRST_FOLDER_FILES if i == 0 else OTHER_FOLDERS_FILES} files in {folder}")

Generated 100 files in folder1
Generated 500 files in folder2
Generated 500 files in folder3


### Activating service components

In [5]:
# Initiating AzureOpenAI client
client = AzureOpenAI(
    azure_endpoint = AOAI_API_BASE,
    api_version = AOAI_API_VERSION,
    api_key = AOAI_API_KEY
)

In [6]:
# Creating a new vector store
vector_store = client.beta.vector_stores.create(
    name = "My 10K Vector Store",
)

### Populating vector store

In [7]:
# Uploading files to the vector store from each folder in batches
for i in range(NUMBER_OF_FOLDERS):
    folder = f"folder{i+1}"
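    file_streams = [open(os.path.join(folder, file), "rb") for file in os.listdir(folder)]

    print(f"Uploading files to the vector store from {folder}...")
    time.sleep(5)
    try:
        file_batch = client.beta.vector_stores.file_batches.upload_and_poll(
            vector_store_id = vector_store.id,
            files = file_streams
        )
    finally:
        # Closing the file handles once the batch has been processed
        for stream in file_streams:
            stream.close()

    # Refreshing the vector store object and pausing until it is ready
    # (the local object is a snapshot, so its status must be re-fetched)
    vector_store = client.beta.vector_stores.retrieve(vector_store.id)
    while vector_store.status != "completed":
        time.sleep(5)
        vector_store = client.beta.vector_stores.retrieve(vector_store.id)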

    print(f"Files upload status: {file_batch.status}")
    print(f"- cancelled: {file_batch.file_counts.cancelled}")
    print(f"- completed: {file_batch.file_counts.completed}")
    print(f"- failed: {file_batch.file_counts.failed}")
    print(f"- in progress: {file_batch.file_counts.in_progress}")
    print("----------------------------------------")
    print(f"Total: {file_batch.file_counts.total}\n")

Uploading files to the vector store from folder1...
Files upload status: completed
- cancelled: 0
- completed: 100
- failed: 0
- in progress: 0
----------------------------------------
Total: 100

Uploading files to the vector store from folder2...
Files upload status: completed
- cancelled: 0
- completed: 499
- failed: 1
- in progress: 0
----------------------------------------
Total: 500

Uploading files to the vector store from folder3...
Files upload status: completed
- cancelled: 0
- completed: 500
- failed: 0
- in progress: 0
----------------------------------------
Total: 500
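
In the output above, the `folder2` batch reports the status `completed` even though one of its files failed, so the batch status alone does not confirm that every file was indexed. A minimal sketch of a per-batch check follows. `batch_has_failures` and `describe_batch` are hypothetical helpers, and the `SimpleNamespace` object only stands in for `file_batch.file_counts` so that the example runs without a service connection.

```python
from types import SimpleNamespace


def batch_has_failures(file_counts) -> bool:
    """Return True when any file in the batch failed or was cancelled."""
    return (file_counts.failed + file_counts.cancelled) > 0


def describe_batch(folder: str, file_counts) -> str:
    """Build a one-line summary of a batch from its file counts."""
    if batch_has_failures(file_counts):
        not_indexed = file_counts.failed + file_counts.cancelled
        return f"{folder}: {not_indexed} of {file_counts.total} files not indexed"
    return f"{folder}: all {file_counts.total} files indexed"


# Stand-in values copied from the folder2 output above
folder2_counts = SimpleNamespace(
    cancelled=0, completed=499, failed=1, in_progress=0, total=500
)
print(describe_batch("folder2", folder2_counts))
```

In the upload loop, the same check could be applied to `file_batch.file_counts` to decide whether a folder needs another attempt.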

