# Generate synthetic documents for a RAG knowledge base

## Prerequisites
- An Azure subscription, with [access to Azure OpenAI](https://aka.ms/oai/access).

We used Python 3.12.3, [Visual Studio Code with the Python extension](https://code.visualstudio.com/docs/python/python-tutorial), and the [Jupyter extension](https://marketplace.visualstudio.com/items?itemName=ms-toolsai.jupyter) to test this example.

### Set up a Python virtual environment in Visual Studio Code

1. Open the Command Palette (Ctrl+Shift+P).
1. Search for **Python: Create Environment**.
1. Select **Venv**.
1. Select a Python interpreter. Choose 3.10 or later.

It can take a minute to set up. If you run into problems, see [Python environments in VS Code](https://code.visualstudio.com/docs/python/environments).
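
To confirm that the selected interpreter meets the 3.10 minimum from the steps above, a quick check like the following can be run in a notebook cell. This is a minimal sketch; the helper name `meets_minimum` is illustrative and not part of this sample.

```python
import sys

MIN_VERSION = (3, 10)  # minimum interpreter version named in the setup steps


def meets_minimum(version_info=None, minimum=MIN_VERSION):
    """Return True if the (major, minor, ...) version is at least `minimum`."""
    if version_info is None:
        version_info = sys.version_info
    # Compare only major and minor so patch releases do not matter.
    return tuple(version_info[:2]) >= minimum


print(f"Python {sys.version.split()[0]} supported: {meets_minimum()}")
```

If the check prints `False`, rerun **Python: Create Environment** and pick a newer interpreter.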

### Install packages

In [None]:
# openai provides the Azure OpenAI client; python-dotenv loads settings from .env
%pip install openai python-dotenv

## Import packages and create Azure OpenAI client

In [1]:
import os
import sys
from dotenv import load_dotenv
from openai import AzureOpenAI

sys.path.append('../..')
from rag_utils import generate_topics_and_documents, create_duplicate_documents

# Load environment variables from .env
load_dotenv(override=True)

# Azure OpenAI settings used for document generation
aoai_endpoint = os.environ["AZURE_OPENAI_ENDPOINT"]
aoai_apikey = os.environ["AZURE_OPENAI_API_KEY"]
aoai_model_name = os.environ["AZURE_OPENAI_DEPLOYMENT_NAME"]

aoai_client = AzureOpenAI(
  azure_endpoint=aoai_endpoint,
  api_key=aoai_apikey,
  api_version="2024-02-01"
)
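
The cell above reads three settings with `os.environ[...]`, which raises a bare `KeyError` if one is missing from `.env`. As a hedged sketch, a small check like the one below could be run beforehand to report every missing setting at once. The variable names come from the cell above; the helper `missing_env_vars` is hypothetical.

```python
import os

# Setting names taken from the client setup cell.
REQUIRED_VARS = (
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_DEPLOYMENT_NAME",
)


def missing_env_vars(names=REQUIRED_VARS, env=None):
    """Return the names that are unset or empty in `env` (defaults to os.environ)."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


missing = missing_env_vars()
if missing:
    print("Missing settings:", ", ".join(missing))
else:
    print("All Azure OpenAI settings are present.")
```

Passing a plain dictionary as `env` makes the helper easy to try without touching the real environment.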

## Generate a list of topics and documents based on those topics

In [None]:
generate_topics_and_documents(aoai_client, aoai_model_name, "QuickConnect", "telecommunications", "../output_telco")
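
The source of `rag_utils` is not shown here, so the following is only an illustration of the topic step, not the actual implementation. It assumes the third and fourth arguments are a company name and an industry, and that the model is asked to reply with a Python-style list of strings. Both helpers, `build_topic_prompt` and `parse_topic_list`, are hypothetical names.

```python
import ast


def build_topic_prompt(company, industry, count=5):
    """Build a prompt asking for a list of knowledge base topics (hypothetical wording)."""
    return (
        f"List {count} knowledge base topics for {company}, a company in the "
        f"{industry} industry. Respond with only a Python list of strings."
    )


def parse_topic_list(response_text):
    """Extract a list literal from model output, tolerating surrounding text."""
    start = response_text.find("[")
    end = response_text.rfind("]")
    if start == -1 or end < start:
        raise ValueError("No list found in model response")
    # literal_eval only evaluates literals, so it is safer than eval here.
    topics = ast.literal_eval(response_text[start:end + 1])
    if not isinstance(topics, list):
        raise ValueError("Model response is not a list")
    # Drop empty entries and trim whitespace around each topic.
    return [str(topic).strip() for topic in topics if str(topic).strip()]


print(build_topic_prompt("QuickConnect", "telecommunications"))
print(parse_topic_list('Here you go: ["Billing", " Roaming plans "]'))
```

Each parsed topic could then be sent back to the model in a second prompt to produce one document per topic.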

## Create variants of documents to simulate duplicates

In [None]:
create_duplicate_documents(aoai_client, aoai_model_name, "QuickConnect", "telecommunications", "../output_telco")
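
After both cells have run, it can help to check what was written. The sketch below assumes that the last argument, `../output_telco`, is a directory where the generated files are saved, which this notebook does not confirm. The helper `summarize_output` is hypothetical.

```python
from pathlib import Path


def summarize_output(output_dir):
    """Count files under `output_dir`, grouped by lowercase file extension."""
    root = Path(output_dir)
    if not root.is_dir():
        return {}  # nothing generated yet, or the path is not a directory
    counts = {}
    for path in root.rglob("*"):
        if path.is_file():
            key = path.suffix.lower() or "(no extension)"
            counts[key] = counts.get(key, 0) + 1
    return counts


print(summarize_output("../output_telco"))
```

An empty dictionary means the directory does not exist or contains no files, which is a cue to check the earlier cells for errors.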