## Get Started Quickly
First, clone the repository and install the dependencies:

git clone https://github.com/Qredence/GraphFleet.git


cd GraphFleet
poetry shell
poetry install


## Batch import your PDFs right from this notebook!
Run the cell below. An "Upload" button should appear; click it and add your PDFs. Each file is automatically converted and saved as a .txt file in the right folder. Note that the upload widget, as written, accepts `.json` files.

In [None]:
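# Import libraries
import os
import json
import csv
import ipywidgets as widgets
from IPython.display import display

# Directory where the converted .txt files are written
txt_directory = '../graphfleet/input'
os.makedirs(txt_directory, exist_ok=True)

# Directory where the uploaded .json files are stored
json_directory = '../graphfleet/input/json'
os.makedirs(json_directory, exist_ok=True)

# Create upload button
uploader = widgets.FileUpload(
    accept='.json',  # Accept only JSON files (the converter below expects JSON)
    multiple=True    # Allow uploading multiple files
)

# Create output area
output = widgets.Output()


def json_to_txt(json_file_path, txt_file_path):
    """Convert a JSON array of flat objects into a tab-separated .txt file."""
    # Load JSON data
    with open(json_file_path, 'r', encoding='utf-8') as json_file:
        data = json.load(json_file)

    # Write a header row, then one tab-separated row per record
    with open(txt_file_path, 'w', newline='', encoding='utf-8') as txt_file:
        writer = csv.DictWriter(txt_file, fieldnames=data[0].keys(), delimiter='\t')
        writer.writeheader()
        writer.writerows(data)

# Standalone usage: json_to_txt('input.json', 'output.txt')


def on_upload(change):
    """Save each uploaded JSON file, then convert it to a .txt file."""
    value = change['new']
    # ipywidgets 7 passes a dict keyed by file name; ipywidgets 8 passes a tuple of dicts
    if isinstance(value, dict):
        files = [(name, info['content']) for name, info in value.items()]
    else:
        files = [(info['name'], info['content']) for info in value]

    with output:
        for name, content in files:
            name = os.path.basename(name)  # Drop any directory part of the name
            json_path = os.path.join(json_directory, name)
            with open(json_path, 'wb') as json_file:
                json_file.write(bytes(content))
            txt_path = os.path.join(txt_directory, os.path.splitext(name)[0] + '.txt')
            json_to_txt(json_path, txt_path)
            print(f'Converted {name} -> {txt_path}')


# Run the handler whenever the upload widget's value changes
uploader.observe(on_upload, names='value')

# Display the upload button and output area
display(uploader)
display(output)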
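To make the conversion step concrete, here is a minimal sketch of what the JSON-to-text step produces. It assumes the uploaded JSON is an array of flat objects that share the same keys; the sample records and the helper name `records_to_tsv` are hypothetical and only illustrate the output shape.

```python
import csv
import io
import json

# Hypothetical sample input: a JSON array of flat objects with identical keys.
sample_json = (
    '[{"title": "Doc A", "text": "First document."},'
    ' {"title": "Doc B", "text": "Second document."}]'
)


def records_to_tsv(records):
    """Render a list of flat dicts as tab-separated text with a header row."""
    buffer = io.StringIO(newline='')
    writer = csv.DictWriter(
        buffer,
        fieldnames=list(records[0].keys()),
        delimiter='\t',
        lineterminator='\n',
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


tsv_text = records_to_tsv(json.loads(sample_json))
print(tsv_text)
```

The first line of the result is the header (`title`, `text`), followed by one tab-separated line per record. Nested objects or records with differing keys would need extra handling that this sketch does not attempt.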





### Initialize Your Workspace

Great! Now that your PDF is formatted correctly and in the right location (graphfleet/input), we can initialize your workspace. Run the following command to get started:

In [None]:
! python -m graphrag.index --init --root ../graphfleet

## Setting Up Your GraphRAG Pipeline

This notebook guides you through configuring your GraphRAG pipeline using either OpenAI or Azure OpenAI.

### 1. Environment Variables and Settings Files

GraphRAG relies on two crucial files for configuration:

- **.env:** This file stores environment variables. The most important one is GRAPHRAG_API_KEY, which holds your API key for either OpenAI or Azure OpenAI.
- **settings.yaml:** This file contains settings that fine-tune the behavior of the GraphRAG pipeline.
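Before editing these files, it can help to confirm that the init step actually created them. The following is a small sketch, assuming both files sit directly in the project root (`../graphfleet` relative to this notebook); the helper name `missing_config_files` is hypothetical.

```python
from pathlib import Path


def missing_config_files(root):
    """Return the names of expected config files that are absent from root."""
    expected = ('.env', 'settings.yaml')
    return [name for name in expected if not (Path(root) / name).is_file()]


# An empty list means both files were found.
print(missing_config_files('../graphfleet'))
```

If either name is printed, re-running the init command above is a reasonable first thing to try.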

Here's a breakdown of how to configure each file for OpenAI and Azure OpenAI:

### 2. OpenAI Configuration

1. **Update .env:**
   - Open the .env file located in your ./graphfleet directory.
   - Find the line GRAPHRAG_API_KEY=<API_KEY>.
   - Replace <API_KEY> with your actual OpenAI API key.

2. **(Optional) Customize settings.yaml:**
   - Open the settings.yaml file in the same directory.
   - You can customize various aspects of the pipeline here, like which language model to use or how many results to return. Refer to the [configuration documentation](link-to-configuration-docs) for detailed options.
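A quick way to check the first step is to confirm that the `<API_KEY>` placeholder is gone. The sketch below parses simple `KEY=VALUE` text by hand rather than relying on a dotenv library; the function names are hypothetical, and it only inspects the file contents, it does not validate the key with OpenAI.

```python
def read_env_value(env_text, key):
    """Return the value for key in KEY=VALUE text, or None if the key is absent."""
    for line in env_text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and lines without an assignment
        if not line or line.startswith('#') or '=' not in line:
            continue
        name, _, value = line.partition('=')
        if name.strip() == key:
            return value.strip().strip('"\'')
    return None


def api_key_is_set(env_text):
    """True when GRAPHRAG_API_KEY exists, is non-empty, and is not the placeholder."""
    value = read_env_value(env_text, 'GRAPHRAG_API_KEY')
    return bool(value) and value != '<API_KEY>'


print(api_key_is_set('GRAPHRAG_API_KEY=<API_KEY>'))  # False: placeholder still present
```

To check the real file, read it first, for example with `open('../graphfleet/.env').read()`, and pass the text to `api_key_is_set`.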

### 3. Azure OpenAI Configuration

1. **Update .env:**
   - Open the .env file.
   - Set the GRAPHRAG_API_KEY to your Azure OpenAI API key.

2. **Configure settings.yaml:**
   - Open the settings.yaml file.
   - Search for the llm configuration section. You'll find two: one for chat and one for embeddings.
   - **Chat Endpoint Example:**
     ```yaml
     llm:
       type: azure_openai_chat
       api_base: https://<your-instance>.openai.azure.com
       api_version: <your-api-version>  # Adjust if needed
       deployment_name: <your-azure-model-deployment-name>
     ```

   - **Embeddings Endpoint Example:**
     ```yaml
     llm:
       type: azure_openai_embedding
       api_base: https://<your-instance>.openai.azure.com
       api_version: <your-api-version>  # Adjust if needed
       deployment_name: <your-azure-model-deployment-name>
     ```

   - **Replace the placeholders:**
     - `<your-instance>`: Your Azure OpenAI instance name.
     - `<your-api-version>`: The Azure OpenAI API version you are targeting.
     - `<your-azure-model-deployment-name>`: The deployment name of your Azure OpenAI model.
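Leftover placeholders are an easy mistake to make here. As a rough sketch, the helper below scans the text of a settings file for any remaining `<...>` tokens; it works on plain text rather than parsed YAML, treats everything after `#` as a comment (a simplification), and the name `find_placeholders` is hypothetical.

```python
import re

# Matches angle-bracket placeholders such as <your-instance>
PLACEHOLDER = re.compile(r'<[^<>\s]+>')


def find_placeholders(settings_text):
    """Return (line_number, placeholder) pairs for every <...> token left in the text."""
    found = []
    for number, line in enumerate(settings_text.splitlines(), start=1):
        code = line.split('#', 1)[0]  # Ignore trailing YAML comments
        for match in PLACEHOLDER.finditer(code):
            found.append((number, match.group()))
    return found


sample = """llm:
  type: azure_openai_chat
  api_base: https://<your-instance>.openai.azure.com
  deployment_name: my-gpt4-deployment  # hypothetical deployment name
"""
print(find_placeholders(sample))  # [(3, '<your-instance>')]
```

An empty result suggests every placeholder was replaced; it does not confirm that the values themselves are correct.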




## Indexing Your Data
Now, let's index your data to make it searchable. This is the final step before you can query!


In [None]:
! python -m graphrag.index --root ../graphfleet

## Indexing in Progress!

Running the indexing pipeline might take a while. Don't worry, that's normal! ⏳

**Factors that influence indexing time:**

* **Size of your data:**  Larger datasets naturally take longer to process.
* **Model selection:** Different models have varying processing speeds.
* **Text chunk size:** This setting (configurable in your `.env` file) impacts how the data is broken down and indexed.
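To see why chunk size matters, consider a simplified illustration. The sketch below splits text on whitespace into fixed-size word windows with optional overlap; this is an assumption made for clarity, not GraphRAG's own chunker (which works on tokens), and the sizes used are arbitrary examples.

```python
def chunk_words(text, chunk_size, overlap=0):
    """Split text into chunks of at most chunk_size words; neighbors share overlap words."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError('need chunk_size > 0 and 0 <= overlap < chunk_size')
    words = text.split()
    step = chunk_size - overlap  # How far the window advances each time
    return [' '.join(words[i:i + chunk_size]) for i in range(0, len(words), step)]


text = 'word ' * 1000  # Stand-in for a 1000-word document

print(len(chunk_words(text, chunk_size=300, overlap=100)))   # 5
print(len(chunk_words(text, chunk_size=1200, overlap=100)))  # 1
```

Smaller chunks mean more chunks for the same document, and each chunk is processed by the model, so indexing generally takes longer as the chunk size shrinks.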

**What to expect:**

Once the indexing process is complete, you'll find a new folder in your project directory:

   `./graphfleet/output/<timestamp>/artifacts` 

Inside this folder, you'll see a collection of `parquet` files. These files contain your indexed data, ready for GraphRAG to use! 
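A short sketch for locating those files from Python follows. It assumes the `<timestamp>` folder names sort chronologically when sorted as strings, so the last one is the most recent run; the helper name `latest_artifacts` is hypothetical. Listing uses only the standard library; reading the files would additionally need a parquet reader such as pandas.

```python
from pathlib import Path


def latest_artifacts(root):
    """Return the parquet files from the most recent output/<timestamp>/artifacts folder."""
    output_dir = Path(root) / 'output'
    if not output_dir.is_dir():
        return []
    # Keep only run folders that actually contain an artifacts directory
    runs = sorted(p for p in output_dir.iterdir() if (p / 'artifacts').is_dir())
    if not runs:
        return []
    return sorted((runs[-1] / 'artifacts').glob('*.parquet'))


for path in latest_artifacts('../graphfleet'):
    print(path.name)
```

If nothing is printed, either indexing has not finished or the root path differs from the one assumed here.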


## Time to Query! 🚀

Now that your data is indexed, the real fun begins: **asking questions!**  

Let's explore how to use GraphRAG's query engine to extract insights from your dataset. 

### Global Search: Uncovering High-Level Themes

Use global search to get a bird's-eye view of the main ideas in your data:


In [None]:
! python -m graphrag.query \
--root ../graphfleet \
--method global \
"Why should I use GraphRAG over other kinds of solutions for my company?"

## Explanation:

- `python -m graphrag.query`: Runs the GraphRAG query engine.
- `--root ../graphfleet`: Specifies the root directory of your GraphRAG project.
- `--method global`: Tells GraphRAG to perform a global search across all your data.
- The quoted string at the end: Your natural language query.


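### Local Search: Answering Specific Questions

Use local search for questions about a specific part of your data:

In [None]:
! python -m graphrag.query --root ../graphfleet --method local "What are the main features of GraphRAG?"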


## Explanation:

- `--method local`: Instructs GraphRAG to focus on the specific part of your data relevant to the query.
- The quoted string at the end: A query about one specific topic (here, the features of GraphRAG) rather than the dataset as a whole.
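If you prefer to call the query engine from Python code rather than with `!` shell cells, the commands above can be wrapped with `subprocess`. This is a minimal sketch that only uses the `--root` and `--method` flags shown in this notebook; the function names are hypothetical, and it assumes `graphrag` is installed in the same environment as the running interpreter.

```python
import subprocess
import sys


def build_query_command(root, method, question):
    """Build the argument list for `python -m graphrag.query`."""
    if method not in ('global', 'local'):
        raise ValueError("method must be 'global' or 'local'")
    return [sys.executable, '-m', 'graphrag.query',
            '--root', root, '--method', method, question]


def run_query(root, method, question):
    """Run the query and return whatever the CLI prints to standard output."""
    result = subprocess.run(
        build_query_command(root, method, question),
        capture_output=True, text=True, check=True,
    )
    return result.stdout


# Example (requires an indexed project):
# print(run_query('../graphfleet', 'local', 'What are the main features of GraphRAG?'))
```

Passing the arguments as a list avoids shell quoting issues when a question contains quotes or other special characters.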

## Experiment! 🧪

Go ahead and ask your own questions! Try different query types and phrasings, and explore the power of GraphRAG to unlock insights from your indexed data.