## Get started quickly
First, clone the repository and install its dependencies:

git clone https://github.com/Qredence/GraphFleet.git


cd GraphFleet
poetry shell
poetry install


## Clone the project

In [None]:
git clone https://github.com/Qredence/GraphFleet.git

In [None]:
cd GraphFleet
poetry shell
poetry install

## GraphRAG Environment Variables

In [None]:
# Your Azure OpenAI API key
export GRAPHRAG_API_KEY=""

# The base URL for your Azure OpenAI API endpoint
export GRAPHRAG_API_BASE=""

# The API version you're using (e.g., "2024-02-15-preview")
export GRAPHRAG_API_VERSION=""

# The name of the language model you're using (e.g., "gpt-4")
export GRAPHRAG_LLM_MODEL=""

# The deployment name for your language model in Azure
export GRAPHRAG_DEPLOYMENT_NAME=""

# The name of the embedding model you're using (e.g., "text-embedding-ada-002")
export GRAPHRAG_EMBEDDING_MODEL=""

# Note: Replace the empty strings with your actual values before using GraphRAG
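
Before running any GraphRAG command, it can help to confirm that none of these variables was left empty. The following is a minimal sketch in Python; the helper name `missing_graphrag_vars` is hypothetical and not part of GraphRAG, and the variable list is simply copied from the exports above.

```python
import os

# Variable names copied from the export list above.
REQUIRED_VARS = [
    "GRAPHRAG_API_KEY",
    "GRAPHRAG_API_BASE",
    "GRAPHRAG_API_VERSION",
    "GRAPHRAG_LLM_MODEL",
    "GRAPHRAG_DEPLOYMENT_NAME",
    "GRAPHRAG_EMBEDDING_MODEL",
]


def missing_graphrag_vars(env=None):
    """Return the required variable names that are unset or blank.

    Hypothetical helper for illustration; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


missing = missing_graphrag_vars()
if missing:
    print("Missing or empty:", ", ".join(missing))
else:
    print("All GraphRAG variables are set.")
```

Passing a plain dictionary instead of relying on `os.environ` makes the check easy to try out without touching your real environment.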

In [None]:
! python -m graphrag.index --init --root ../graphfleet

## Setting Up Your GraphRAG Pipeline

This notebook guides you through configuring your GraphRAG pipeline using either OpenAI or Azure OpenAI.

### 1. Environment Variables and Settings Files

GraphRAG relies on two crucial files for configuration:

- **.env:** This file stores environment variables. The most important one is GRAPHRAG_API_KEY, which holds your API key for either OpenAI or Azure OpenAI.
- **settings.yaml:** This file contains settings that fine-tune the behavior of the GraphRAG pipeline.
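
As a quick sanity check, the sketch below reports which of the two files is absent from a project root. It assumes both files sit directly in the root directory, as described in the following sections; the function name `missing_config_files` is hypothetical.

```python
from pathlib import Path


def missing_config_files(root):
    """Return which of .env and settings.yaml are absent from `root`.

    Hypothetical helper; assumes both files live directly in the root folder.
    """
    root = Path(root)
    return [name for name in (".env", "settings.yaml") if not (root / name).is_file()]


# Example: check the project folder used in this notebook.
print(missing_config_files("./graphfleet"))
```

An empty list means both files were found.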

Here's a breakdown of how to configure each file for OpenAI and Azure OpenAI:

### 2. OpenAI Configuration

1. **Update .env:**
   - Open the .env file located in your ./graphfleet directory.
   - Find the line GRAPHRAG_API_KEY=<API_KEY>.
   - Replace <API_KEY> with your actual OpenAI API key.

2. **(Optional) Customize settings.yaml:**
   - Open the settings.yaml file in the same directory.
   - You can customize various aspects of the pipeline here, like which language model to use or how many results to return. Refer to the [configuration documentation](link-to-configuration-docs) for detailed options.
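
A common slip is to leave the `<API_KEY>` placeholder in place. The sketch below parses `.env`-style text and checks that `GRAPHRAG_API_KEY` holds a real value. It is a simplified parser written for illustration (plain `KEY=VALUE` lines only), not the loader GraphRAG itself uses, and both function names are hypothetical.

```python
def read_env_text(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def api_key_is_set(text):
    """True when GRAPHRAG_API_KEY is present and is not the <API_KEY> placeholder."""
    key = read_env_text(text).get("GRAPHRAG_API_KEY", "")
    return bool(key) and key != "<API_KEY>"


print(api_key_is_set("GRAPHRAG_API_KEY=<API_KEY>"))
```

To check your own file, pass its contents, for example `api_key_is_set(open("./graphfleet/.env").read())`.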

### 3. Azure OpenAI Configuration

1. **Update .env:**
   - Open the .env file.
   - Set the GRAPHRAG_API_KEY to your Azure OpenAI API key.

2. **Configure settings.yaml:**
   - Open the settings.yaml file.
   - Search for the `llm` configuration sections. There are two: a top-level one for chat and one nested under `embeddings`.
   - **Chat Endpoint Example:**
     ```yaml
     llm:
       type: azure_openai_chat
       api_base: https://<your-instance>.openai.azure.com
       api_version: <your-api-version>  # e.g., 2024-02-15-preview
       deployment_name: <your-azure-model-deployment-name>
     ```

   - **Embeddings Endpoint Example:**
     ```yaml
     embeddings:
       llm:
         type: azure_openai_embedding
         api_base: https://<your-instance>.openai.azure.com
         api_version: <your-api-version>  # e.g., 2024-02-15-preview
         deployment_name: <your-azure-model-deployment-name>
     ```

   - **Replace the placeholders:**
     - `<your-instance>`: Your Azure OpenAI instance name.
     - `<your-api-version>`: The Azure OpenAI API version you are using.
     - `<your-azure-model-deployment-name>`: The deployment name of your Azure OpenAI model. The chat and embedding models usually have separate deployments.
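
To catch placeholders that were not replaced, a small check like the following can be run against the contents of `settings.yaml`. This is a sketch under the assumption that placeholders always use the angle-bracket form shown above; the function name is hypothetical, and commented-out lines are ignored.

```python
import re

# Matches angle-bracket placeholders such as <your-instance>.
PLACEHOLDER = re.compile(r"<[^<>\s]+>")


def unreplaced_placeholders(settings_text):
    """Return distinct placeholders still present, in order of appearance."""
    seen = []
    for line in settings_text.splitlines():
        if line.lstrip().startswith("#"):
            continue  # skip commented-out lines
        for match in PLACEHOLDER.findall(line):
            if match not in seen:
                seen.append(match)
    return seen


sample = "api_base: https://<your-instance>.openai.azure.com\n"
print(unreplaced_placeholders(sample))
```

An empty result suggests every placeholder was filled in; it does not verify that the values themselves are correct.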




## Auto-generate prompts for your data index
The command in the next cell does the following:
- Runs the `prompt_tune` module of GraphRAG
- Uses the configuration file `settings.yaml` in the `./graphfleet` directory
- Sets the root directory to `./graphfleet`
- Disables entity type generation with the `--no-entity-types` flag
- Writes the generated prompts to `./graphfleet/prompts`


This step is important because it tailors the prompts to your data index, which can improve the relevance and effectiveness of your queries later on.


In [None]:
! python -m graphrag.prompt_tune --config ./graphfleet/settings.yaml --root ./graphfleet --no-entity-types --output ./graphfleet/prompts
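
If you prefer to launch the same step from Python rather than a shell cell, the sketch below builds the identical argument list and runs it with `subprocess`. The function names are hypothetical, and it uses only the flags shown in the cell above.

```python
import subprocess
import sys


def prompt_tune_command(root="./graphfleet"):
    """Build the argument list for the prompt-tuning command shown above."""
    return [
        sys.executable, "-m", "graphrag.prompt_tune",
        "--config", f"{root}/settings.yaml",
        "--root", root,
        "--no-entity-types",
        "--output", f"{root}/prompts",
    ]


def run_prompt_tune(root="./graphfleet"):
    """Run the command; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(prompt_tune_command(root), check=True)


print(" ".join(prompt_tune_command()))
```

Using `sys.executable` keeps the call inside the same Python environment as the notebook kernel, which matters when GraphRAG is installed in a Poetry virtual environment.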

## Indexing Your Data
Now index your data to make it searchable. This is the last setup step before you can run queries.


In [10]:
! python -m graphrag.index --verbose --root ../graphfleet --config ../graphfleet/settings.yaml

Logging enabled at ../graphfleet/../graphfleet/output/20240822-052358/reports/indexing-engine.log
Starting pipeline run for: 20240822-052358, dryrun=False
Using default configuration: {
    "llm": {
        "api_key": "==== REDACTED ====",
        "type": "azure_openai_chat",
        "model": "gpt-4o",
        "max_tokens": 4000,
        "temperature": 0.0,
        "top_p": 1.0,
        "n": 1,
        "request_timeout": 180.0,
        "api_base": "https://fleet-openai.openai.azure.com",
        "api_version": "2024-04-01-preview",
        "proxy": null,
        "cognitive_services_endpoint": null,
        "deployment_name": "gpt-4o",
        "model_supports_js

## Indexing in Progress!

Running the indexing pipeline can take a while, and that is normal. ⏳

**Factors that influence indexing time:**

* **Size of your data:**  Larger datasets naturally take longer to process.
* **Model selection:** Different models have varying processing speeds.
* **Text chunk size:** This setting (configurable in your `.env` file) controls how the data is split before indexing. Smaller chunks produce more pieces to process, which generally lengthens the run.
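
To see why chunk size matters, the sketch below counts how many chunks a document of 1,000 words yields at two different sizes. It is a simplified stand-in that splits on words with a fixed overlap; it is not GraphRAG's own chunker, and the function name and sample sizes are assumptions chosen for illustration.

```python
def chunk_words(words, size, overlap=0):
    """Split a list of words into windows of `size`, sharing `overlap` words."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("size must be positive and overlap smaller than size")
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + size])
        if start + size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


words = ["w"] * 1000
print(len(chunk_words(words, size=300, overlap=100)))   # smaller chunks
print(len(chunk_words(words, size=1200, overlap=100)))  # larger chunks
```

Fewer, larger chunks usually mean fewer model calls during indexing, at the cost of coarser extraction from each chunk.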

**What to expect:**

Once the indexing process is complete, you'll find a new folder in your project directory:

   `./graphfleet/output/<timestamp>/artifacts` 

Inside this folder, you'll see a collection of `parquet` files. These files contain your indexed data, ready for GraphRAG to use! 
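
The sketch below locates the most recent run folder and lists its `parquet` files. It assumes the `output/<timestamp>/artifacts` layout described above and that timestamps sort chronologically as plain text (as with `20240822-052358`); the function name is hypothetical.

```python
from pathlib import Path


def latest_parquet_files(root="./graphfleet"):
    """Return the parquet files from the most recent run under <root>/output."""
    output_dir = Path(root) / "output"
    if not output_dir.is_dir():
        return []
    # Run folders are named by timestamp, so the largest name is the newest run.
    runs = sorted(p for p in output_dir.iterdir() if p.is_dir())
    if not runs:
        return []
    return sorted((runs[-1] / "artifacts").glob("*.parquet"))


for path in latest_parquet_files():
    print(path.name)
```

Each file can then be inspected with pandas via `pandas.read_parquet`, assuming pandas and a parquet engine such as pyarrow are installed.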


## Time to Query! 🚀

Now that your data is indexed, the real fun begins: **asking questions!**  

Let's explore how to use GraphRAG's query engine to extract insights from your dataset. 

### Global Search: Uncovering High-Level Themes

Use global search to get a bird's-eye view of the main ideas in your data:

#### Explanation

- `python -m graphrag.query`: Runs the GraphRAG query engine.
- `--root ../graphfleet`: Specifies the root directory of your GraphRAG project.
- `--method global`: Tells GraphRAG to perform a global search across all your data.
- The final quoted argument is your natural language query, for example "What are the top themes in this story?".



In [13]:
! python -m graphrag.query \
--root ../graphfleet \
--method global \
"Why should I use GraphRAG over other kind of solution for my company  ?" 




INFO: Reading settings from ../graphfleet/settings.yaml
creating llm client with {'api_key': 'REDACTED,len=32', 'type': "azure_openai_chat", 'model': 'gpt-4o', 'max_tokens': 4000, 'temperature': 0.0, 'top_p': 1.0, 'n': 1, 'request_timeout': 180.0, 'api_base': 'https://fleet-openai.openai.azure.com', 'api_version': '2024-04-01-preview', 'organization': None, 'proxy': None, 'cognitive_services_endpoint': None, 'deployment_name': 'gpt-4o', 'model_supports_json': True, 'tokens_per_minute': 150000, 'requests_per_minute': 10000, 'max_retries': 10, 'max_retry_wait': 10.0, 'sleep_on_rate_limit_recommendation': True, 'concurrent_requests': 25}

SUCCESS: Global Search Response: ### Why You Should Use GraphRAG for Your Company

GraphRAG (Graph-based Retrieval-Augmented Generation) offers several compelling advantages over traditional retrieval and generation methods, making it a highly effective solution for companies dealing with large datasets and complex queries.

#### Enhanced Comprehensiv

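### Local Search: Focusing on Specific Entities

Use local search when your question targets a particular entity or detail in your data.

#### Explanation

- `--method local`: Instructs GraphRAG to focus on the specific part of your data that is relevant to the query.
- The final quoted argument is your query. For example, "Who is Scrooge, and what are his main relationships?" focuses on one character (Scrooge) and his relationships.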

In [None]:
! python -m graphrag.query --root ../graphfleet --method local "What are the main features of GraphRAG?"
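
Both search modes use the same command shape, so they can be wrapped in one small Python helper. The sketch below is an assumption-laden convenience wrapper, not part of GraphRAG: the function names are hypothetical, and it uses only the `--root` and `--method` flags shown in this notebook.

```python
import subprocess
import sys


def query_command(question, method="global", root="../graphfleet"):
    """Build the argument list for a GraphRAG query with the given method."""
    if method not in ("global", "local"):
        raise ValueError("method must be 'global' or 'local'")
    return [sys.executable, "-m", "graphrag.query",
            "--root", root, "--method", method, question]


def ask(question, method="global", root="../graphfleet"):
    """Run the query and return whatever the CLI printed to stdout."""
    result = subprocess.run(query_command(question, method, root),
                            capture_output=True, text=True, check=True)
    return result.stdout


print(" ".join(query_command("What are the top themes?", method="local")))
```

Passing the question as its own list element avoids shell quoting problems when the text contains quotes or question marks.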


## Experiment! 🧪

Go ahead and ask your own questions! Try different query types and phrasings, and explore how GraphRAG can surface insights from your indexed data.

Next, open [local_search_notebook.ipynb](local_search_notebook.ipynb) to see how to use the local search engine and how to generate questions.
For the global search engine, see [global_search_notebook.ipynb](global_search_notebook.ipynb).