# NexuSync Demo Notebook

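This notebook demonstrates how to use NexuSync for document indexing, streaming chat, one-time queries, chat history, and keeping the index in sync with your documents.

NexuSync is a document indexing and querying library built on top of LlamaIndex. It can use OpenAI models, or local models (a HuggingFace embedding model with an Ollama language model), for embeddings and language tasks.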


## Initialize NexuSync

Let's start by initializing NexuSync with the required parameters. Adjust the following parameters according to your use case:

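- **input_dirs**: List of directories containing the documents to index.
- **openai_model_yn**: Whether to use OpenAI models for embeddings and language tasks. If `False`, a HuggingFace embedding model and an Ollama language model are used.
- **embedding_model**: Model used to generate embeddings.
- **language_model**: Model used for language tasks such as answering queries.
- **temperature**: Sampling temperature of the language model, from 0 to 1; higher values give more creative answers.
- **chroma_db_dir**: Directory for storing ChromaDB files.
- **index_persist_dir**: Directory for persisting the index.
- **chroma_collection_name**: Name of the ChromaDB collection that stores the embeddings.
- **chunk_size**: Size of the text chunks used to create embeddings.
- **chunk_overlap**: Overlap between consecutive text chunks, which preserves context across chunk boundaries.
- **recursive**: Whether to also index documents in subdirectories of `input_dirs`.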
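To make `chunk_size` and `chunk_overlap` concrete, here is a minimal sliding-window chunker. It is only an illustration: it splits on characters, whereas NexuSync, being built on LlamaIndex, most likely measures chunks in tokens and respects sentence boundaries. The function name `chunk_text` is hypothetical.

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size character windows that share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1))  # → ['abcd', 'defg', 'ghij']
```

With `CHUNK_SIZE = 1024` and `CHUNK_OVERLAP = 20`, each new chunk would start 1004 units after the previous one, so neighbouring chunks share their last and first 20 units.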


In [1]:
from nexusync import NexuSync

OPENAI_MODEL_YN = False  # if False, a local Ollama model is used
EMBEDDING_MODEL = "BAAI/bge-base-en-v1.5"  # suggested embedding model
LANGUAGE_MODEL = 'llama3.2'  # install Ollama (https://ollama.com/download) and pull this model first
TEMPERATURE = 0.4  # from 0 to 1; higher values give more creative answers
CHROMA_DB_DIR = 'chroma_db'
INDEX_PERSIST_DIR = 'index_storage'
CHROMA_COLLECTION_NAME = 'my_collection'
INPUT_DIRS = ["../sample_docs"]  # you can list multiple document directories
CHUNK_SIZE = 1024
CHUNK_OVERLAP = 20
RECURSIVE = True

ns = NexuSync(input_dirs=INPUT_DIRS,
              openai_model_yn=OPENAI_MODEL_YN,
              embedding_model=EMBEDDING_MODEL,
              language_model=LANGUAGE_MODEL,
              temperature=TEMPERATURE,
              chroma_db_dir=CHROMA_DB_DIR,
              index_persist_dir=INDEX_PERSIST_DIR,
              chroma_collection_name=CHROMA_COLLECTION_NAME,
              chunk_overlap=CHUNK_OVERLAP,
              chunk_size=CHUNK_SIZE,
              recursive=RECURSIVE
              )

2024-10-09 17:12:24,344 - nexusync.utils.embedding_models.set_embedding_model - INFO - Using HuggingFace embedding model: BAAI/bge-base-en-v1.5
2024-10-09 17:12:24,348 - nexusync.utils.embedding_models.set_language_model - INFO - Using Ollama LLM model: llama3.2
2024-10-09 17:12:24,429 - nexusync.NexuSync - INFO - Vectors and Querier initialized successfully.
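Because `NexuSync` takes many keyword arguments, one option is to collect them in a single object and validate them before construction. The sketch below is hypothetical: `NexuSyncSettings` and `as_kwargs` are not part of NexuSync. Its checks encode the 0 to 1 temperature range stated in the code comment above, plus the assumption that the overlap must be smaller than the chunk size.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class NexuSyncSettings:
    """Hypothetical container for the keyword arguments passed to NexuSync(...)."""
    input_dirs: list[str] = field(default_factory=lambda: ["../sample_docs"])
    openai_model_yn: bool = False
    embedding_model: str = "BAAI/bge-base-en-v1.5"
    language_model: str = "llama3.2"
    temperature: float = 0.4
    chroma_db_dir: str = "chroma_db"
    index_persist_dir: str = "index_storage"
    chroma_collection_name: str = "my_collection"
    chunk_overlap: int = 20
    chunk_size: int = 1024
    recursive: bool = True

    def __post_init__(self) -> None:
        # Range taken from the notebook comment on TEMPERATURE (0 to 1).
        if not 0.0 <= self.temperature <= 1.0:
            raise ValueError(f"temperature must be in [0, 1], got {self.temperature}")
        # Assumption: the overlap must be strictly smaller than the chunk size.
        if not 0 <= self.chunk_overlap < self.chunk_size:
            raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
        if not self.input_dirs:
            raise ValueError("input_dirs must contain at least one directory")

    def as_kwargs(self) -> dict:
        """Return the settings as a plain dict of keyword arguments."""
        return asdict(self)


settings = NexuSyncSettings()
print(settings.as_kwargs()["openai_model_yn"])  # → False
```

Because the field names match the keyword arguments used in the constructor call above, the object could then be passed as `NexuSync(**settings.as_kwargs())`.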

## Stream Chat

In [2]:
text_qa_template = """
Context Information:
--------------------
{context_str}
--------------------

Query: {query_str}

Instructions:
1. Carefully read the context information and the query.
2. Think through the problem step by step.
3. Provide a concise and accurate answer based on the given context.
4. If the answer cannot be determined from the context, state "Based on the given information, I cannot provide a definitive answer."
5. If you need to make any assumptions, clearly state them.
6. If relevant, provide a brief explanation of your reasoning.

Answer: """

ns.initialize_stream_chat(
    text_qa_template=text_qa_template,
    chat_mode="context",
    similarity_top_k=3
)

2024-10-09 17:13:30,704 - nexusync.core.chat_engine - INFO - Chat engine initialized
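Conceptually, `similarity_top_k=3` means the three chunks whose embeddings are most similar to the query are retrieved and substituted into `{context_str}`, with the question itself in `{query_str}`. The sketch below approximates that flow with plain cosine similarity over toy vectors. The embeddings, the `top_k_chunks` helper, and the newline-joined context are assumptions for illustration, not NexuSync internals.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_chunks(query_vec, chunks, k=3):
    """Return the texts of the k chunks whose vectors are most similar to query_vec."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Toy (text, embedding) pairs; real embeddings come from the configured embedding model.
chunks = [
    ("GPUs accelerate AI workloads.", [0.9, 0.1, 0.0]),
    ("CUDA is a parallel computing platform.", [0.8, 0.3, 0.1]),
    ("The store sells jackets.", [0.0, 0.2, 0.9]),
    ("GPUDirect Storage enables direct storage access for GPUs.", [0.7, 0.4, 0.0]),
]
query_vec = [1.0, 0.2, 0.0]  # stand-in for the embedded query

# Shortened version of the notebook's template, keeping the same two placeholders.
template = "Context Information:\n{context_str}\n\nQuery: {query_str}\n\nAnswer: "
context_str = "\n".join(top_k_chunks(query_vec, chunks, k=3))
prompt = template.format(context_str=context_str, query_str="what is Nvidia's ecosystem")
print(prompt)
```

Here the unrelated "store" chunk falls outside the top three, so it never reaches the prompt; this is why a well-chosen `similarity_top_k` keeps the context focused.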


In [5]:
query = "what is Nvidia's ecosystem"


for item in ns.start_chat_stream(query):
    if isinstance(item, str):
        # This is a token, print or process as needed
        print(item, end='', flush=True)
    else:
        # This is the final response with metadata
        print("\n\nFull response:", item['response'])
        print("Metadata:", item['metadata'])
        break

Based on the context provided, Nvidia's ecosystem appears to refer to their comprehensive suite of technologies, tools, and solutions for accelerating various workloads in the datacenter, particularly in areas such as artificial intelligence (AI), high-performance computing (HPC), and big data analytics.

Nvidia's ecosystem includes:

1. **GPUs**: Nvidia's line of graphics processing units (GPUs) that are designed for AI, HPC, and other compute-intensive applications.
2. **GPUDirect Storage**: A software framework that enables direct storage access between GPUs and storage systems.
3. **Nvidia Container Runtime**: A container runtime that provides a managed environment for running containers on Nvidia GPUs.
4. **NGC Registry**: A container registry for Nvidia GPU-accelerated applications.
5. **TensorFlow** and **PyTorch**: Popular deep learning frameworks that are supported by Nvidia's ecosystem.
6. **CUDA**: Nvidia's parallel computing platform and programming model.

Nvidia's ecosyst
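The streaming loop above relies on `start_chat_stream` yielding plain strings for each token and then a dictionary with `response` and `metadata` keys. The sketch below reproduces that protocol with a hypothetical stand-in generator (`fake_chat_stream`), so the consuming code can be exercised without a model; the placeholder metadata is an assumption and does not reflect what the real engine returns.

```python
from typing import Iterator, Union

StreamItem = Union[str, dict]


def fake_chat_stream(answer: str) -> Iterator[StreamItem]:
    """Hypothetical stand-in: yields word tokens, then a final dict like start_chat_stream."""
    for token in answer.split(" "):
        yield token + " "
    # Placeholder metadata; the real contents depend on the chat engine.
    yield {"response": answer, "metadata": {"sources": []}}


def consume_stream(stream: Iterator[StreamItem]) -> tuple[str, dict]:
    """Print tokens as they arrive and return (streamed_text, final_item)."""
    parts = []
    for item in stream:
        if isinstance(item, str):
            print(item, end="", flush=True)
            parts.append(item)
        else:
            print()  # finish the streamed line
            return "".join(parts), item
    raise RuntimeError("stream ended without a final response dict")


text, final = consume_stream(fake_chat_stream("CUDA is Nvidia's parallel computing platform."))
print("Full response:", final["response"])
```

Raising an error when no final dictionary arrives makes a truncated stream visible instead of silently returning partial text.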

## Rebuild Index

Rebuild the index whenever any of the following settings change:
- openai_model_yn
- embedding_model
- language_model
- chroma_db_dir
- index_persist_dir
- chroma_collection_name
- chunk_overlap
- chunk_size
- recursive
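One way to avoid forgetting a rebuild is to store a fingerprint of the settings listed above next to the index and compare it on startup. This is a hypothetical helper, not a NexuSync feature: `config_fingerprint`, `needs_rebuild`, and the `config_fingerprint.txt` file name are all assumptions.

```python
import hashlib
import json
import tempfile
from pathlib import Path

# Settings from the list above; a change to any of them calls for rebuild_index().
REBUILD_KEYS = (
    "openai_model_yn", "embedding_model", "language_model", "chroma_db_dir",
    "index_persist_dir", "chroma_collection_name", "chunk_overlap", "chunk_size",
    "recursive",
)


def config_fingerprint(settings: dict) -> str:
    """Hash only the rebuild-relevant settings, serialized in a stable key order."""
    relevant = {key: settings[key] for key in REBUILD_KEYS}
    payload = json.dumps(relevant, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def needs_rebuild(settings: dict, fingerprint_file: Path) -> bool:
    """True if no fingerprint was saved yet or the saved one differs from the current settings."""
    if not fingerprint_file.exists():
        return True
    return fingerprint_file.read_text().strip() != config_fingerprint(settings)


settings = {
    "openai_model_yn": False, "embedding_model": "BAAI/bge-base-en-v1.5",
    "language_model": "llama3.2", "chroma_db_dir": "chroma_db",
    "index_persist_dir": "index_storage", "chroma_collection_name": "my_collection",
    "chunk_overlap": 20, "chunk_size": 1024, "recursive": True,
}

with tempfile.TemporaryDirectory() as tmp:
    marker = Path(tmp) / "config_fingerprint.txt"
    print(needs_rebuild(settings, marker))  # → True (nothing saved yet)
    marker.write_text(config_fingerprint(settings))
    print(needs_rebuild(settings, marker))  # → False
    print(needs_rebuild({**settings, "chunk_size": 512}, marker))  # → True
```

Settings outside the list, such as `temperature`, are deliberately left out of the hash, so changing them does not trigger a rebuild.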

In [8]:
from nexusync import rebuild_index

OPENAI_MODEL_YN = True  # if False, a local Ollama model is used
EMBEDDING_MODEL = "text-embedding-3-large"  # OpenAI embedding model
LANGUAGE_MODEL = 'gpt-4o-mini'  # OpenAI language model; requires OpenAI API access
TEMPERATURE = 0.4  # from 0 to 1; higher values give more creative answers
CHROMA_DB_DIR = 'chroma_db'
INDEX_PERSIST_DIR = 'index_storage'
CHROMA_COLLECTION_NAME = 'my_collection'
INPUT_DIRS = ["../sample_docs"]  # you can list multiple document directories
CHUNK_SIZE = 1024
CHUNK_OVERLAP = 20
RECURSIVE = True

rebuild_index(input_dirs=INPUT_DIRS,
              openai_model_yn=OPENAI_MODEL_YN,
              embedding_model=EMBEDDING_MODEL,
              language_model=LANGUAGE_MODEL,
              temperature=TEMPERATURE,
              chroma_db_dir=CHROMA_DB_DIR,
              index_persist_dir=INDEX_PERSIST_DIR,
              chroma_collection_name=CHROMA_COLLECTION_NAME,
              chunk_overlap=CHUNK_OVERLAP,
              chunk_size=CHUNK_SIZE,
              recursive=RECURSIVE
              )

2024-10-09 17:10:04,517 - nexusync.core.indexing_functions - INFO - Starting index rebuild process...
2024-10-09 17:10:04,521 - nexusync.utils.embedding_models.set_embedding_model - INFO - Using OpenAI embedding model: text-embedding-3-large
2024-10-09 17:10:04,526 - nexusync.utils.embedding_models.set_language_model - INFO - Using OpenAI LLM model: gpt-4o-mini
2024-10-09 17:10:04,527 - nexusync.core.indexing_functions - INFO - Deleting existing index directory: index_storage
2024-10-09 17:10:04,535 - nexusync.core.indexing_functions - INFO - Deleting existing Chroma DB directory: chroma_db
2024-10-09 17:10:13,819 - nexusync.core.indexing_functions - INFO - Loaded 40 chunks from ../sample_docs.
2024-10-09 17:10:16,163 - nexusync.core.indexing_functions - INFO - Index Built.
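The log shows that `rebuild_index` begins by deleting the existing `index_storage` and `chroma_db` directories before reloading the documents, so a rebuild discards everything previously indexed. A minimal standard-library sketch of that clean-slate step follows; `reset_storage` is a hypothetical name and the temporary directories stand in for the real ones.

```python
import logging
import shutil
import tempfile
from pathlib import Path

logging.basicConfig(level=logging.INFO, format="%(levelname)s - %(message)s")
logger = logging.getLogger("rebuild_sketch")


def reset_storage(*dirs: Path) -> list[Path]:
    """Delete each existing storage directory so the index can be rebuilt from scratch."""
    deleted = []
    for directory in dirs:
        if directory.exists():
            logger.info("Deleting existing directory: %s", directory)
            shutil.rmtree(directory)
            deleted.append(directory)
    return deleted


with tempfile.TemporaryDirectory() as tmp:
    index_dir = Path(tmp) / "index_storage"
    chroma_dir = Path(tmp) / "chroma_db"
    index_dir.mkdir()
    (index_dir / "placeholder.json").write_text("{}")  # placeholder file
    # chroma_dir is intentionally absent to show that missing directories are skipped
    removed = reset_storage(index_dir, chroma_dir)
    print([p.name for p in removed])  # → ['index_storage']
```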


## Refresh Index

In [9]:
# Incrementally update the index: index new or modified files and remove deleted ones
ns.refresh_index()

2024-10-09 17:10:16,169 - nexusync.core.indexer - INFO - Starting index refresh process...
2024-10-09 17:10:16,171 - nexusync.core.indexer - INFO - Processing directory: ../sample_docs
2024-10-09 17:10:25,050 - nexusync.core.indexer - INFO - Loaded 6 files from ../sample_docs
2024-10-09 17:10:25,060 - nexusync.core.indexer - INFO - Updated 1 files in ../sample_docs
2024-10-09 17:10:25,060 - nexusync.core.indexer - INFO - Updated file: /mnt/d/nexusync/notebooks/../sample_docs/new_added.txt
2024-10-09 17:10:25,062 - nexusync.core.indexer - INFO - Deleted file: /mnt/d/nexusync/notebooks/../sample_docs/READEME.md
2024-10-09 17:
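The refresh log distinguishes files that were updated (new or changed) from files that were deleted. A minimal sketch of that bookkeeping compares two snapshots mapping each file path to a content hash. This is an assumed approach for illustration; NexuSync may track changes differently, for example by modification time or document IDs, and `snapshot` and `diff_snapshots` are hypothetical names.

```python
import hashlib
import tempfile
from pathlib import Path


def snapshot(root: Path, recursive: bool = True) -> dict[str, str]:
    """Map each file path under root to the SHA-256 hash of its contents."""
    files = root.rglob("*") if recursive else root.glob("*")
    return {
        str(path): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in files
        if path.is_file()
    }


def diff_snapshots(old: dict[str, str], new: dict[str, str]) -> tuple[list[str], list[str]]:
    """Return (updated, deleted): updated covers new or changed files, deleted covers removed ones."""
    updated = sorted(path for path, digest in new.items() if old.get(path) != digest)
    deleted = sorted(path for path in old if path not in new)
    return updated, deleted


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "README.md").write_text("# NexuSync")
    before = snapshot(root)

    (root / "README.md").unlink()  # simulate a deleted file
    (root / "new_added.txt").write_text("Breaking News")  # simulate a new file
    updated, deleted = diff_snapshots(before, snapshot(root))
    print("Updated:", [Path(p).name for p in updated])  # → Updated: ['new_added.txt']
    print("Deleted:", [Path(p).name for p in deleted])  # → Deleted: ['README.md']
```

Only the updated paths need to be re-embedded and only the deleted paths removed, which is what makes a refresh cheaper than a full rebuild.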

## One-Time Query

In [6]:
query = "how to install nexusync"


response = ns.start_query(text_qa_template=text_qa_template, query=query)

print(f"Query: {query}")
print(f"Response: {response['response']}")
print(f"Metadata: {response['metadata']}")

2024-10-09 06:59:57,194 - nexusync.NexuSync - INFO - Starting query: how to install nexusync
2024-10-09 07:00:01,795 - nexusync.NexuSync - INFO - Query completed successfully.


Query: how to install nexusync
Response: To install Nexusync, use the following command:

pip install nexusync
Metadata: {'sources': [{'source_text': 'file_path: /mnt/d/nexusync/notebooks/../sample_docs/READEME.md\n\nInstallation\n\n```bash\npip install nexusync\n```', 'metadata': {'file_path': '/mnt/d/nexusync/notebooks/../sample_docs/READEME.md', 'file_name': 'READEME.md', 'file_type': 'text/markdown', 'file_size': 4571, 'creation_date': '2024-10-06', 'last_modified_date': '2024-10-06'}}, {'source_text': 'file_path: /mnt/d/nexusync/notebooks/../sample_docs/READEME.md\n\nNexuSync\n\nNexuSync is a powerful document indexing and querying tool built on top of LlamaIndex. It allows you to efficiently manage, search, and interact with large collections of documents using advanced natural language processing techniques.', 'metadata': {'file_path': '/mnt/d/nexusync/notebooks/../sample_docs/READEME.md', 'file_name': 'READEME.md', 'file_type': 'text/markdown', 'file_size': 4571, 'creation_date
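The `metadata` dictionary returned by `start_query` contains a `sources` list; each entry holds the retrieved `source_text` and a nested `metadata` dict with fields such as `file_name` and `file_path`. A small helper, hypothetical and based only on the structure visible in the printed output, can reduce that to a de-duplicated list of cited files:

```python
def cited_files(metadata: dict) -> list[str]:
    """Return unique file names from metadata['sources'], preserving first-seen order."""
    seen: dict[str, None] = {}
    for source in metadata.get("sources", []):
        name = source.get("metadata", {}).get("file_name")
        if name:
            seen.setdefault(name, None)
    return list(seen)


# Trimmed example shaped like the printed metadata.
example_metadata = {
    "sources": [
        {"source_text": "Installation ...", "metadata": {"file_name": "READEME.md"}},
        {"source_text": "NexuSync ...", "metadata": {"file_name": "READEME.md"}},
    ]
}
print(cited_files(example_metadata))  # → ['READEME.md']
```

In the notebook, this would be called as `cited_files(response['metadata'])`. Both retrieved chunks in the output come from the same README file, so the list collapses to a single entry.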

## Get Chat History

In [10]:
# Get chat history
chat_history = ns.chat_engine.get_chat_history()
print("Chat History:")
for entry in chat_history:
    print(f"Human: {entry['query']}")
    print(f"AI: {entry['response']}\n")

Chat History:
Human: summary of the store.json?
AI: The `store.json` file contains a collection of products, each represented by an object with the following attributes:

1. **id**: A unique identifier for each product.
2. **title**: The name of the product.
3. **price**: The cost of the product.
4. **description**: A brief description of the product.
5. **category**: The category to which the product belongs (e.g., men's clothing, women's clothing, jewelry, electronics).
6. **image**: A URL link to an image of the product.
7. **rating**: An object containing:
   - **rate**: The average rating of the product.
   - **count**: The number of ratings received.

The data includes a variety of products spanning different categories such as:
- **Men's Clothing** (e.g., backpacks, jackets, t-shirts)
- **Women's Clothing** (e.g., casual t-shirts, short sleeves)
- **Jewelry** (e.g., bracelets, rings)
- **Electronics** (e.g., external hard drives, SSDs)

The file contains detailed descriptions an
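Each chat history entry is a dict with `query` and `response` keys, as the loop above shows. To keep a conversation beyond the session, the entries can be written to a file. The sketch below saves them as JSON Lines; the file format and the `save_chat_history` and `load_chat_history` helpers are assumptions, not NexuSync features.

```python
import json
import tempfile
from pathlib import Path


def save_chat_history(entries: list[dict], path: Path) -> None:
    """Write one JSON object per line containing the 'query' and 'response' fields."""
    with path.open("w", encoding="utf-8") as f:
        for entry in entries:
            record = {"query": entry["query"], "response": entry["response"]}
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def load_chat_history(path: Path) -> list[dict]:
    """Read the entries back in their original order."""
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# In the notebook this list would come from ns.chat_engine.get_chat_history().
history = [{"query": "summary of the store.json?", "response": "The store.json file contains ..."}]

with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "chat_history.jsonl"
    save_chat_history(history, out)
    print(load_chat_history(out) == history)  # → True
```

JSON Lines lets new turns be appended without rewriting the whole file.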

## Testing Refresh

### Adding a document

In [6]:
# Add a new document
with open("../sample_docs/new_added.txt", "w") as f:
    f.write("Breaking News: Trump and Harris had a fight!!!!")

# Refresh the index: incrementally index new or modified files and detect files deleted from the folder
ns.refresh_index()
print("Index refreshed successfully!")

2024-10-09 10:35:59,214 - nexusync.core.indexer - INFO - Starting index refresh process...
2024-10-09 10:35:59,216 - nexusync.core.indexer - INFO - Processing directory: ../sample_docs
2024-10-09 10:36:08,515 - nexusync.core.indexer - INFO - Loaded 6 files from ../sample_docs
2024-10-09 10:36:08,536 - nexusync.core.indexer - INFO - Updated 1 files in ../sample_docs
2024-10-09 10:36:08,537 - nexusync.core.indexer - INFO - Updated file: /mnt/d/nexusync/notebooks/../sample_docs/new_added.txt
2024-10-09 10:36:08,538 - nexusync.core.indexer - INFO - No deleted files found.


Index refreshed successfully!


### Deleting a file

In [3]:
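import os

# Delete the document added in the previous step, then refresh the index.
# Uncomment the two lines below to remove the file; otherwise the refresh
# finds nothing to delete ("No deleted files found.").
# os.remove("../sample_docs/new_added.txt")
# print("New document deleted.")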

ns.refresh_index()
print("Index refreshed after deletion.")

2024-10-07 15:40:27,349 - nexusync.core.indexer - INFO - Starting index refresh process...
2024-10-07 15:40:27,353 - nexusync.core.indexer - INFO - Processing directory: ../sample_docs
2024-10-07 15:40:36,886 - nexusync.core.indexer - INFO - Loaded 7 files from ../sample_docs
2024-10-07 15:40:36,887 - nexusync.core.indexer - INFO - Updated 0 files in ../sample_docs
2024-10-07 15:40:36,888 - nexusync.core.indexer - INFO - No deleted files found.


Index refreshed after deletion.
