# NexuSync Demo Notebook

This notebook demonstrates the basic usage of NexuSync for document indexing and querying.

## Initialize NexuSync

In [1]:
import os

In [2]:
from nexusync.models import set_embedding_model, set_language_model
from nexusync import NexuSync

EMBEDDING_MODEL = "BAAI/bge-base-en-v1.5"
LLM_MODEL = 'llama3.2'
TEMPERATURE = 0.4
INPUT_DIRS = ["../sample_docs"]  # list one or more directories to index

set_embedding_model(huggingface_model=EMBEDDING_MODEL)
set_language_model(ollama_model=LLM_MODEL, temperature=TEMPERATURE)
ns = NexuSync(input_dirs=INPUT_DIRS)



Using HuggingFace embedding model: BAAI/bge-base-en-v1.5
Using Ollama LLM model: llama3.2


2024-10-06 22:53:56,709 - nexusync.core.indexer - INFO - Index already built. Loading from disk.


## One-time query

In [2]:
query = "News about Nvidia?"
text_qa_template = (
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Given the context information above, I want you to think step by step to answer the query in a crisp manner. "
    "In case you don't know the answer, say 'I don't know!'.\n"
    "Query: {query_str}\n"
    "Answer: "
)


response = ns.query(text_qa_template=text_qa_template, query=query)

print(f"Query: {query}")
print(f"Response: {response['response']}")
print(f"Response: {response['metadata']}")

Query: News about Nvidia?
Response: Based on the provided context, here are the key points related to news about Nvidia:

1. Nvidia's Blackwell GPUs are expected to be shipped to clients in Q4 of this year and have a consumer release expected in 2025.
2. The demand for Blackwell is "insane" according to Nvidia Chief Financial Officer Colette Kress, with several billion dollars in revenue expected.
3. Nvidia's stock has surged by over 150% this year, following an impressive 240% gain in 2023.
4. Major cloud providers like AWS, Azure, and Google Cloud are integrating Blackwell into their infrastructure to support high-performance AI workloads.
5. Nvidia forecasts $32.5 billion in revenue for the current quarter, an 80% increase from last year.

These points indicate that Nvidia is making significant progress with its Blackwell technology, which has generated substantial interest and investment from major cloud providers and investors alike.
Response: {'sources': [{'source_text': 'file_pa
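
To see what the model receives, the two placeholders in the template can be filled by hand. The sketch below uses plain `str.format` with a made-up context string. How NexuSync fills the template internally is an assumption here, not something this notebook shows.

```python
# Illustrative only: fill the template's two placeholders with str.format.
text_qa_template = (
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Given the context information above, I want you to think step by step to answer the query in a crisp manner. "
    "In case you don't know the answer, say 'I don't know!'.\n"
    "Query: {query_str}\n"
    "Answer: "
)

# Hypothetical retrieved context; in a real run this text comes from the index.
example_context = "Nvidia forecasts $32.5 billion in revenue for the current quarter."

prompt = text_qa_template.format(
    context_str=example_context,
    query_str="News about Nvidia?",
)
print(prompt)
```

Printing the filled prompt is a quick way to check that a custom template still contains both `{context_str}` and `{query_str}` before passing it to `ns.query`.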

## Chat with Context

In [3]:
# Initialize the chat engine once
ns.chat_engine.initialize_chat_engine(text_qa_template, chat_mode="context")

2024-10-06 21:55:50,583 - nexusync.core.chat_engine - INFO - Chat engine initialized


In [4]:
# Start chatting; the engine keeps the conversation history between turns
queries = [
    "how many GPUs will Oracle order from Nvidia?",
    "what is Nvidia's ecosystem"
]

for query in queries:
    print(f"Human: {query}")
    response = ns.chat_engine.chat(query)
    print(f"AI: {response['response']}\n")
    print(f"METADATA: {response['metadata']['sources'][0]['metadata']['file_path']}")

Human: how many GPUs will Oracle order from Nvidia?
AI: The existing answer is not relevant to the provided context. There is no mention of Oracle ordering GPUs from Nvidia in the text.

Since there is no information about Oracle's plans, I cannot provide a specific number of GPUs that they might order from Nvidia.

METADATA: /mnt/d/nexusync/notebooks/../sample_docs/news.docx
Human: what is Nvidia's ecosystem
AI: Based on the provided context, it appears that Nvidia's ecosystem refers to the company's various technologies, tools, and platforms for accelerating computing, data science, artificial intelligence (AI), and other applications.

From the slides, we can gather some information about Nvidia's ecosystem:

1. **GPU acceleration**: Nvidia's GPUs are optimized for AI, deep learning, and high-performance computing.
2. **Nvidia Datacenter**: A suite of technologies and services designed to accelerate data center workloads, including storage, networking, and computing.
3. **GPUDirect 
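
The loop above prints only the first source of each answer. A minimal sketch for listing every cited file is shown below. It assumes the response dictionary has the shape seen in the printed metadata (`metadata` → `sources` → `metadata` → `file_path`); `source_paths` is a hypothetical helper, not part of NexuSync.

```python
def source_paths(response):
    """Return the unique file paths cited in a response, in first-seen order."""
    paths = []
    for source in response.get("metadata", {}).get("sources", []):
        path = source.get("metadata", {}).get("file_path")
        if path and path not in paths:
            paths.append(path)
    return paths


# Hypothetical response with the same shape as the metadata printed in this notebook.
example_response = {
    "response": "...",
    "metadata": {
        "sources": [
            {"source_text": "...", "metadata": {"file_path": "../sample_docs/news.docx"}},
            {"source_text": "...", "metadata": {"file_path": "../sample_docs/news.docx"}},
            {"source_text": "...", "metadata": {"file_path": "../sample_docs/new_added.txt"}},
        ]
    },
}
print(source_paths(example_response))
```

Using `.get` with defaults means a response without sources yields an empty list instead of raising `KeyError` or `IndexError`.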

In [11]:
# Get chat history
chat_history = ns.chat_engine.get_chat_history()
print("Chat History:")
for entry in chat_history:
    print(f"Human: {entry['query']}")
    print(f"AI: {entry['response']}\n")

Chat History:
Human: What is NexuSync?
AI: NexuSync is a powerful document indexing and querying tool built on top of LlamaIndex. It allows you to efficiently manage, search, and interact with large collections of documents using advanced natural language processing techniques.

Human: What are its main features?
AI: According to the README.md file, NexuSync has the following main features:

1. **Smart Document Indexing**: Automatically index documents from specified directories, keeping your knowledge base up-to-date.
2. **Efficient Querying**: Use natural language to query your document collection and get relevant answers quickly.
3. **Upsert Capability**: Easily update or insert new documents into the index without rebuilding from scratch.
4. **Deletion Handling**: Automatically remove documents from the index when they're deleted from the filesystem.
5. **Chat Interface**: Engage in conversational interactions with your document collection, making information retrieval more intuiti

## Refresh the Index

### Adding a document

In [3]:
# Add a new document
with open("../sample_docs/new_added.txt", "w") as f:
    f.write("Breaking News: Trump and Harris had a fight!")

# Refresh the index: incrementally add new files and remove files deleted from the folder
ns.refresh_index()
print("Index refreshed successfully!")

2024-10-06 22:54:08,525 - nexusync.core.indexer - INFO - Starting index refresh process...
2024-10-06 22:54:08,527 - nexusync.core.indexer - INFO - Processing directory: ../sample_docs
VisionEncoderDecoderModel has generative capabilities, as `prepare_inputs_for_generation` is explicitly overwritten. However, it doesn't directly inherit from `GenerationMixin`. From 👉v4.50👈 onwards, `PreTrainedModel` will NOT inherit from `GenerationMixin`, and this model will lose the ability to call `generate` and other related functions.
  - If you are the owner of the model architecture code, please modify your model class such that it inherits from `GenerationMixin` (after `PreTrainedModel`, otherwise you'll get an exception).
  - If you are not the owner of the model architecture class, please contact the model code owner to update it.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Pleas

Index refreshed successfully!
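
For intuition, incremental refresh can be pictured as comparing two snapshots of the input folder. The sketch below is a generic illustration of that idea using file modification times. It is not NexuSync's actual implementation, and `snapshot` and `diff_snapshots` are hypothetical helper names.

```python
import os
import tempfile


def snapshot(directory):
    """Map each file path under `directory` to its last-modified time."""
    state = {}
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            state[path] = os.path.getmtime(path)
    return state


def diff_snapshots(old, new):
    """Return (added, deleted, modified) sets of paths between two snapshots."""
    added = set(new) - set(old)
    deleted = set(old) - set(new)
    modified = {p for p in set(old) & set(new) if old[p] != new[p]}
    return added, deleted, modified


# Demonstrate on a temporary folder so no real documents are touched.
with tempfile.TemporaryDirectory() as docs:
    before = snapshot(docs)
    with open(os.path.join(docs, "new_added.txt"), "w") as f:
        f.write("example")
    after = snapshot(docs)
    added, deleted, modified = diff_snapshots(before, after)
    print(sorted(os.path.basename(p) for p in added))
```

Only the paths in `added` and `modified` would need re-embedding, and those in `deleted` would be dropped, which is why a refresh can be much cheaper than a full rebuild.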


In [4]:
query = "what is the breaking news?"
text_qa_template = (
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Given the context information above, I want you to think step by step to answer the query in a crisp manner. "
    "In case you don't know the answer, say 'I don't know!'.\n"
    "Query: {query_str}\n"
    "Answer: "
)


response = ns.query(text_qa_template=text_qa_template, query=query)

print(f"Query: {query}")
print(f"Response: {response['response']}")
print(f"Response: {response['metadata']}")

Query: what is the breaking news?
Response: Breaking News: Trump and Harris had a fight!
Response: {'sources': [{'source_text': 'file_path: /mnt/d/nexusync/notebooks/../sample_docs/new_added.txt\n\nBreaking News: Trump and Harris had a fight!', 'metadata': {'file_path': '/mnt/d/nexusync/notebooks/../sample_docs/new_added.txt', 'file_name': 'new_added.txt', 'file_type': 'text/plain', 'file_size': 44, 'creation_date': '2024-10-06', 'last_modified_date': '2024-10-06'}}, {'source_text': "file_path: /mnt/d/nexusync/notebooks/../sample_docs/news.docx\n\nPalantir Stock vs. Nvidia Stock: Wall Street Says Sell One and Buy the Other\n\n\n\nTrevor Jennewine, The Motley Fool\n\nSun, October 6, 2024 at 8:55 AM GMT+1\xa05 min read\n\n18\n\nIn This Article:\n\n\n\nNVDA\n\n+1.68%\n\n\n\nPLTR\n\n\n\n\n\n^GSPC\n\n\n\nPalantir\xa0Technologies\xa0(NYSE: PLTR)\xa0and\xa0Nvidia\xa0(NASDAQ: NVDA)\xa0are two of the hottest\xa0artificial intelligence (AI) stocks\xa0on Wall Street. Suffice it to say Wall Street

### Deleting a file

In [5]:
# Delete a document from the input directory (here, the Nvidia slide deck)
os.remove('../sample_docs/Nvidia ecosystem.pptx')
print("New document deleted.")

ns.refresh_index()
print("Index refreshed after deletion.")

2024-10-06 22:52:11,340 - nexusync.core.indexer - INFO - Starting index refresh process...
2024-10-06 22:52:11,342 - nexusync.core.indexer - INFO - Processing directory: ../sample_docs


New document deleted.


2024-10-06 22:52:11,738 - nexusync.core.indexer - INFO - Loaded 47 documents from ../sample_docs
2024-10-06 22:52:11,739 - nexusync.core.indexer - INFO - Updated 0 documents in ../sample_docs
2024-10-06 22:52:11,739 - nexusync.core.indexer - INFO - Upsert operation completed. Total documents processed: 47, updated or inserted: 0
2024-10-06 22:52:11,740 - nexusync.core.indexer - INFO - Deleting 1 documents from the index.
2024-10-06 22:52:11,741 - nexusync.core.indexer - INFO - Deletion process completed.
2024-10-06 22:52:11,743 - nexusync.core.indexer - INFO - Index refresh completed. Current document count: 4
2024-10-06 22:52:11,743 - nexusync.core.indexer - INFO - Total files in input directories: 4


Index refreshed after deletion.


In [6]:
query = "what is the breaking news?"
text_qa_template = (
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Given the context information above, I want you to think step by step to answer the query in a crisp manner. "
    "In case you don't know the answer, say 'I don't know!'.\n"
    "Query: {query_str}\n"
    "Answer: "
)


response = ns.query(text_qa_template=text_qa_template, query=query)

print(f"Query: {query}")
print(f"Response: {response['response']}")
print(f"Response: {response['metadata']}")

Query: what is the breaking news?
Response: Based on the provided context, it appears that there are two main breaking news stories:

1. Nvidia's CEO Jensen Huang has mentioned that the demand for Nvidia's Blackwell technology is "insane" and that everybody wants to have the most and be first.
2. Nvidia's latest earnings report showed strong financial performance, with revenue hitting $30.04 billion, up 122%, and beating Wall Street expectations.

However, without more specific information, it's difficult to pinpoint a single breaking news story.
Response: {'sources': [{'source_text': "file_path: /mnt/d/nexusync/notebooks/../sample_docs/news.docx\n\nPalantir Stock vs. Nvidia Stock: Wall Street Says Sell One and Buy the Other\n\n\n\nTrevor Jennewine, The Motley Fool\n\nSun, October 6, 2024 at 8:55 AM GMT+1\xa05 min read\n\n18\n\nIn This Article:\n\n\n\nNVDA\n\n+1.68%\n\n\n\nPLTR\n\n\n\n\n\n^GSPC\n\n\n\nPalantir\xa0Technologies\xa0(NYSE: PLTR)\xa0and\xa0Nvidia\xa0(NASDAQ: NVDA)\xa0are tw

In [7]:
# Verify the results
final_stats = ns.get_index_stats()
final_stats

AttributeError: 'NexuSync' object has no attribute 'get_index_stats'
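
The call above fails because this `NexuSync` object has no `get_index_stats` method. As a rough stand-in, the files on disk can be counted and compared with the "Total files in input directories" figure in the refresh log. The sketch below uses only the standard library. `count_files` is a hypothetical helper, and counting recursively is an assumption about how NexuSync scans the folders.

```python
from pathlib import Path


def count_files(input_dirs):
    """Count regular files (recursively) under each input directory."""
    counts = {}
    for directory in input_dirs:
        root = Path(directory)
        if root.is_dir():
            counts[directory] = sum(1 for p in root.rglob("*") if p.is_file())
        else:
            # A missing directory is reported as zero files rather than raising.
            counts[directory] = 0
    return counts


print(count_files(["../sample_docs"]))
```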