## Initialize RAG and GraphRAG indexes 

### Run this notebook only once!

In [None]:
! python -m graphrag.index --init --root .
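
One way to make the initialization step safe to re-run is to check whether the root folder already contains a `settings.yaml` before calling `--init`. The sketch below assumes the project root is the current directory and that `settings.yaml` is the file the init step creates; the helper name `needs_init` is hypothetical and not part of GraphRAG.

```python
from pathlib import Path


def needs_init(root: str = ".") -> bool:
    """Return True when no settings.yaml exists under `root` yet."""
    return not (Path(root) / "settings.yaml").exists()


if needs_init("."):
    print("No settings.yaml found: run the init cell.")
else:
    print("settings.yaml already exists: skip the init cell.")
```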

## Configuring Azure OpenAI in GraphRAG
Azure OpenAI users should set the following variables in the `settings.yaml` file.

Configure both the `llm` and the `embeddings` sections as follows:
```
llm:
  api_key: ${GRAPHRAG_API_KEY}
  type: azure_openai_chat
  model: <your Azure OpenAI model name>
  model_supports_json: true # recommended if this is available for your model.
  # max_tokens: 4000
  # request_timeout: 180.0
  api_base: <your Azure OpenAI deployment endpoint>
  api_version: 2024-02-15-preview
  # organization: <organization_id>
  deployment_name: gpt-4-32k
```

```
embeddings:
  ## parallelization: override the global parallelization settings for embeddings
  async_mode: threaded # or asyncio
  llm:
    api_key: ${GRAPHRAG_API_KEY}
    type: azure_openai_embedding
    model: <your Azure OpenAI Embeddings model name>
    api_base: <your Azure OpenAI deployment endpoint>
    api_version: 2024-02-15-preview
    # organization: <organization_id>
    deployment_name: <your Azure OpenAI Embeddings deployment name>
```
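
Before starting a long indexing run, it can help to confirm that the edited configuration has no leftover placeholders and that every `${VAR}` reference resolves. The following is a minimal sketch using only the standard library. The function name `find_config_problems` is hypothetical, and it assumes the variables are exported in the process environment (a value defined only in a `.env` file would be reported as missing).

```python
import os
import re


def find_config_problems(settings_text, env=os.environ):
    """Return a list of human-readable problems found in the settings text."""
    problems = []
    # Placeholders such as <your Azure OpenAI model name> that were never edited
    for match in re.finditer(r"<your [^>]*>", settings_text):
        problems.append(f"unfilled placeholder: {match.group(0)}")
    # ${VAR} references whose environment variable is empty or not set
    for name in sorted(set(re.findall(r"\$\{(\w+)\}", settings_text))):
        if not env.get(name):
            problems.append(f"environment variable not set: {name}")
    return problems


# Small inline sample; in practice pass the contents of settings.yaml instead.
sample = """\
llm:
  api_key: ${GRAPHRAG_API_KEY}
  model: <your Azure OpenAI model name>
"""
for problem in find_config_problems(sample, env={}):
    print(problem)
```

To check the real file, read it with `open("settings.yaml").read()` and pass the text to the function.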

### Index the data

In [7]:
! python -m graphrag.index --root .

🚀 Reading settings from settings.yaml
⠴ GraphRAG Indexer 
├── Loading Input (text) - 1 files loaded (0 filtered) ----- 100% 0:00:… 0:00:…
├── create_base_text_units
├── create_base_extracted_entities
├── create_summarized_entities
├── create_base_entity_graph
├── create_final_entities
├── create_final_nodes
├── create_final_communities
├── join_text_units_to_entity_ids
├── create_final_relationships
├── join_text_units_to_relationship_ids
└── create_final_community_reports
    └── Verb create_community_reports -------------- -----  73% -:--:-- 1:16:22
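
Once the indexer finishes, a quick way to see what it produced is to list the files it wrote. The sketch below assumes the run writes `.parquet` tables somewhere under an `./output` folder in the project root; that location and the helper name `list_parquet_artifacts` are assumptions, so adjust the path to match your GraphRAG version and settings.

```python
from pathlib import Path


def list_parquet_artifacts(output_root="./output"):
    """Return all .parquet files found under `output_root`, sorted by path."""
    root = Path(output_root)
    if not root.is_dir():
        return []
    return sorted(root.rglob("*.parquet"))


artifacts = list_parquet_artifacts("./output")
if not artifacts:
    print("No parquet artifacts found yet; the indexing run may still be in progress.")
for path in artifacts:
    print(path)
```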

In [8]:
import os

import pandas as pd
from dotenv import load_dotenv
from IPython.display import display, HTML
# TextLoader and FAISS are provided by langchain_community in current LangChain
# releases; the older langchain.document_loaders / langchain.vectorstores paths
# are deprecated.
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import FAISS
from langchain_openai import AzureChatOpenAI, AzureOpenAIEmbeddings

# Load the Azure OpenAI settings from a local .env file.
load_dotenv()
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
OPENAI_DEPLOYMENT_ENDPOINT = os.getenv("OPENAI_DEPLOYMENT_ENDPOINT")
OPENAI_GPT35_DEPLOYMENT_NAME = os.getenv("OPENAI_GPT35_DEPLOYMENT_NAME")
OPENAI_ADA_EMBEDDING_DEPLOYMENT_NAME = os.getenv("OPENAI_ADA_EMBEDDING_DEPLOYMENT_NAME")


def init_llm(model=OPENAI_GPT35_DEPLOYMENT_NAME,
             deployment_name=OPENAI_GPT35_DEPLOYMENT_NAME,
             openai_api_version="2024-02-15-preview",
             temperature=0,
             max_tokens=400):
    """Create the Azure OpenAI chat model used for answering questions."""
    return AzureChatOpenAI(
        deployment_name=deployment_name,
        model=model,
        openai_api_version=openai_api_version,
        azure_endpoint=OPENAI_DEPLOYMENT_ENDPOINT,
        temperature=temperature,
        max_tokens=max_tokens,
    )


llm = init_llm()

# Embedding client; the embedding deployment name is passed as the model.
embeddings = AzureOpenAIEmbeddings(
    model=OPENAI_ADA_EMBEDDING_DEPLOYMENT_NAME,
    azure_endpoint=OPENAI_DEPLOYMENT_ENDPOINT,
    openai_api_version="2024-02-15-preview",
    chunk_size=1,  # number of texts sent per embedding request
)
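
If any of the variables read with `os.getenv` above is missing from the `.env` file, the clients are created with `None` values and only fail later, on the first request. A minimal sketch of an up-front check follows; the variable names are the ones used in this notebook, while the helper `missing_env_vars` is hypothetical.

```python
import os

REQUIRED_VARS = [
    "OPENAI_API_KEY",
    "OPENAI_DEPLOYMENT_ENDPOINT",
    "OPENAI_GPT35_DEPLOYMENT_NAME",
    "OPENAI_ADA_EMBEDDING_DEPLOYMENT_NAME",
]


def missing_env_vars(names, env=os.environ):
    """Return the names from `names` that are unset or empty in `env`."""
    return [name for name in names if not env.get(name)]


missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All required environment variables are set.")
```

Run it after `load_dotenv()` so that values defined in the `.env` file are visible.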

In [11]:
# Create embeddings for the text file and store them in a FAISS vector store
fileName = "./input/moby_dick.txt"
loader = TextLoader(fileName, encoding='latin1')
# load_and_split() breaks the file into chunks with LangChain's default text splitter
pages = loader.load_and_split()
print("Number of pages: ", len(pages))
db = FAISS.from_documents(documents=pages, embedding=embeddings)
# Save the FAISS index to disk so it can be reloaded without re-embedding
db.save_local("./faiss/faiss_index")

Number of pages:  353
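
Because the index is saved to disk, later sessions can reload it instead of embedding the book again. The sketch below shows one way to do that, assuming a recent `langchain_community` release (where `FAISS.load_local` requires the `allow_dangerous_deserialization` opt-in) and the same `embeddings` object used to build the index. The function name `search_saved_index` and the sample query are hypothetical.

```python
def search_saved_index(embeddings, query, index_dir="./faiss/faiss_index", k=3):
    """Reload the saved FAISS index and return the k chunks closest to `query`."""
    # Imported lazily so the function can be defined before the package is installed.
    from langchain_community.vectorstores import FAISS

    db = FAISS.load_local(
        index_dir,
        embeddings,
        # The index metadata is stored with pickle, so recent releases ask for
        # this explicit opt-in; only load index files you created yourself.
        allow_dangerous_deserialization=True,
    )
    return db.similarity_search(query, k=k)


# Example usage (needs the `embeddings` object from the earlier cell and network access):
# for doc in search_saved_index(embeddings, "Who is Captain Ahab?"):
#     print(doc.page_content[:200])
```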
