# NexuSync Demo Notebook

This notebook demonstrates the basic usage of NexuSync for document indexing and querying.

## Initialize NexuSync

In [1]:
from nexusync.models import set_embedding_model, set_language_model
from nexusync import NexuSync
import os

EMBEDDING_MODEL = "BAAI/bge-base-en-v1.5"
LLM_MODEL = 'llama3.2'
TEMPERATURE = 0.4
INPUT_DIRS = ["../sample_docs"]  # one or more directories to index

set_embedding_model(huggingface_model=EMBEDDING_MODEL)
set_language_model(ollama_model=LLM_MODEL, temperature=TEMPERATURE)
ns = NexuSync(input_dirs=INPUT_DIRS)


Using HuggingFace embedding model: BAAI/bge-base-en-v1.5
Using Ollama LLM model: llama3.2


2024-10-07 15:51:50,816 - nexusync.core.indexer - INFO - Index already built. Loading from disk.


In [3]:
text_qa_template = (
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Given the context information above, I want you to think step by step to answer the query in a crisp manner. "
    "In case you don't know the answer, say 'I don't know!'.\n"
    "Query: {query_str}\n"
    "Answer: "
)
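
To preview what the model receives, the template can be filled by hand. This is a minimal sketch that assumes the two placeholders are substituted with ordinary `str.format`-style formatting. `fill_template` and the sample strings are hypothetical and are not part of NexuSync.

```python
# Hypothetical helper (not a NexuSync API): fill the QA template by hand
# to preview the final prompt text.
TEXT_QA_TEMPLATE = (
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Given the context information above, I want you to think step by step to answer the query in a crisp manner. "
    "In case you don't know the answer, say 'I don't know!'.\n"
    "Query: {query_str}\n"
    "Answer: "
)


def fill_template(template: str, context_str: str, query_str: str) -> str:
    """Substitute the two placeholders the template expects."""
    return template.format(context_str=context_str, query_str=query_str)


prompt = fill_template(
    TEXT_QA_TEMPLATE,
    context_str="(retrieved document chunks would go here)",  # placeholder text
    query_str="News about Nvidia?",
)
print(prompt)
```

Printing the filled prompt is a quick way to check that both placeholders are spelled correctly before running a query.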


## One-time query

In [4]:
query = "News about Nvidia?"


response = ns.query(text_qa_template=text_qa_template, query=query)

print(f"Query: {query}")
print(f"Response: {response['response']}")
print(f"Response: {response['metadata']}")

Query: News about Nvidia?
Response: Based on the provided context information, here are the key points related to news about Nvidia:

1. Nvidia's CEO Jensen Huang made a bombshell announcement that raised the bar for the stock.
2. Nvidia plans to ship Blackwell GPUs to clients in Q4 of this year, with a consumer release expected in 2025.
3. The demand for Blackwell is "insane," and Nvidia forecasts $32.5 billion in revenue for the current quarter, an 80% increase from last year.
4. Nvidia's stock has surged by over 150% this year, following an impressive 240% gain in 2023.
5. Major cloud providers like AWS, Azure, and Google Cloud are integrating Blackwell into their infrastructure to support high-performance AI workloads.
6. Oracle announced that it would need 131,072 Nvidia Blackwell GPUs as part of a $6.5 billion investment to establish a new public cloud region in Malaysia.

These points suggest that Nvidia is making significant advancements in its artificial intelligence (AI) tech
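
The metadata dictionary can be summarized instead of printed whole. The sketch below assumes the layout implied by how this notebook indexes it (`response['metadata']['sources'][i]['metadata']['file_path']`). `source_files` and `mock_response` are hypothetical, and the mock data is illustrative rather than real NexuSync output.

```python
# Sketch: list the distinct source files behind a response.
# `mock_response` imitates the dict layout seen in this notebook;
# it is made-up data, not real NexuSync output.
def source_files(response: dict) -> list[str]:
    """Return unique file paths from response['metadata']['sources'], in order."""
    seen = []
    for source in response.get("metadata", {}).get("sources", []):
        path = source.get("metadata", {}).get("file_path")
        if path is not None and path not in seen:
            seen.append(path)
    return seen


mock_response = {
    "response": "example answer",
    "metadata": {
        "sources": [
            {"source_text": "chunk one", "metadata": {"file_path": "docs/a.docx"}},
            {"source_text": "chunk two", "metadata": {"file_path": "docs/a.docx"}},
            {"source_text": "chunk three", "metadata": {"file_path": "docs/b.txt"}},
        ]
    },
}

for path in source_files(mock_response):
    print(path)
```

Using `.get` with defaults keeps the helper from raising if a response carries no sources.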

## Chat with Context

In [4]:
# Initialize the chat engine once
ns.chat_engine.initialize_chat_engine(text_qa_template, chat_mode="context")

2024-10-07 15:52:43,787 - nexusync.core.chat_engine - INFO - Chat engine initialized


In [7]:
# Start chatting; the engine keeps the conversation history between turns
queries = [
    "how many GPUs will Nvidia get from Oracle?",
    "what is its ecosystem?"
]

for query in queries:
    print(f"Human: {query}")
    response = ns.chat_engine.chat(query)
    print(f"AI: {response['response']}\n")
    print(f"METADATA: {response['metadata']['sources'][0]['metadata']['file_path']}")

Human: how many GPUs will Nvidia get from Oracle?
AI: The text does not mention that Oracle plans to order any GPUs from Nvidia. It actually mentions that Futurewei (not Oracle) would need 131,072 Nvidia Blackwell GPUs as part of a $6.5 billion investment to establish a new public cloud region in Malaysia.

Additionally, the text also mentions Palantir Technologies and its relationship with Nvidia, but it does not mention any GPU orders from Nvidia by either company, including Oracle.

METADATA: /mnt/d/nexusync/notebooks/../sample_docs/news.docx
Human: what is its ecosystem?
AI: Based on the provided context information, it appears that Nvidia's ecosystem refers to the company's comprehensive platform and tools for developing, deploying, and managing AI, data science, and high-performance computing (HPC) applications.

The text mentions various components of Nvidia's ecosystem, including:

1. GPUs: Nvidia graphics processing units are used for accelerating AI, data science, and HPC wor

In [11]:
# Get chat history
chat_history = ns.chat_engine.get_chat_history()
print("Chat History:")
for entry in chat_history:
    print(f"Human: {entry['query']}")
    print(f"AI: {entry['response']}\n")

Chat History:
Human: What is NexuSync?
AI: NexuSync is a powerful document indexing and querying tool built on top of LlamaIndex. It allows you to efficiently manage, search, and interact with large collections of documents using advanced natural language processing techniques.

Human: What are its main features?
AI: According to the README.md file, NexuSync has the following main features:

1. **Smart Document Indexing**: Automatically index documents from specified directories, keeping your knowledge base up-to-date.
2. **Efficient Querying**: Use natural language to query your document collection and get relevant answers quickly.
3. **Upsert Capability**: Easily update or insert new documents into the index without rebuilding from scratch.
4. **Deletion Handling**: Automatically remove documents from the index when they're deleted from the filesystem.
5. **Chat Interface**: Engage in conversational interactions with your document collection, making information retrieval more intuiti
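
For logging or export, the history can be rendered as a single transcript string. This sketch assumes each entry is a dict with `query` and `response` keys, as the loop above reads them. `format_history` and the sample entries are hypothetical.

```python
# Sketch: render chat history entries as a plain-text transcript.
# Assumes each entry is a dict with 'query' and 'response' keys.
def format_history(history: list[dict]) -> str:
    """Number each turn and join the turns with blank lines."""
    turns = []
    for i, entry in enumerate(history, start=1):
        turns.append(f"[{i}] Human: {entry['query']}\n    AI: {entry['response']}")
    return "\n\n".join(turns)


# Made-up entries for illustration only.
sample_history = [
    {"query": "What is NexuSync?", "response": "A document indexing and querying tool."},
    {"query": "What are its main features?", "response": "Indexing, querying, upserts."},
]
print(format_history(sample_history))
```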

### Stream Chat (token-by-token output)

In [8]:
# Initialize the chat engine once
ns.chat_engine.initialize_chat_engine(text_qa_template, chat_mode="context")

2024-10-07 15:54:17,669 - nexusync.core.chat_engine - INFO - Chat engine initialized


In [9]:
query = "What is the nvidia ecosystem?"
for token in ns.chat_engine.chat_stream(query):
    print(token, end='', flush=True)  # Print each token as it's generated

The Nvidia ecosystem refers to the collection of technologies, products, and services developed by NVIDIA Corporation that support its graphics processing units (GPUs) and other computing hardware. The ecosystem includes:

1. GPUs: NVIDIA's graphics processing units, which are used for a wide range of applications, including gaming, professional visualization, artificial intelligence, and high-performance computing.
2. CUDA: A parallel computing platform and programming model developed by NVIDIA, which allows developers to harness the power of GPUs for general-purpose computing.
3. Deep Learning: NVIDIA provides a range of tools and technologies for deep learning, including Tensor Cores, cuDNN, and Deep Learning SDKs.
4. Nvidia Drive: A platform for developing autonomous vehicles, which includes GPUs, software development kits, and other technologies.
5. Nvidia Grid: A cloud-based platform for delivering high-performance computing resources to businesses and researchers.
6. Nvidia DGX:

In [12]:
# Print each token as it's generated
response_generator = ns.chat_engine.chat_stream(query)
for item in response_generator:
    if isinstance(item, str):
        print(item, end='', flush=True)
    else:
        # This is the final yield with the full response and metadata
        full_response = item
        break

print("\n\nFull response:", full_response['response'])
print("Metadata:", full_response['metadata'])

The Nvidia ecosystem refers to a broad range of products, services, and technologies developed by NVIDIA Corporation that support its graphics processing units (GPUs) and other computing hardware. The ecosystem includes:

1. **GPUs**: NVIDIA's graphics processing units, which are used for gaming, professional visualization, artificial intelligence, high-performance computing, and other applications.
2. **CUDA**: A parallel computing platform and programming model developed by NVIDIA, allowing developers to harness the power of GPUs for general-purpose computing.
3. **Deep Learning**: NVIDIA provides tools and technologies for deep learning, including Tensor Cores, cuDNN, and Deep Learning SDKs.
4. **Nvidia Drive**: A platform for developing autonomous vehicles, featuring GPUs, software development kits, and other technologies.
5. **Nvidia Grid**: A cloud-based platform delivering high-performance computing resources to businesses and researchers.
6. **Nvidia DGX**: Pre-configured syste
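
The consumption loop above can be wrapped in a small helper. The sketch assumes the behavior shown in this notebook: the stream yields string tokens and then one final dict holding the full response and metadata. `fake_chat_stream` is a hypothetical stand-in for `ns.chat_engine.chat_stream`, so the example runs without a model.

```python
from typing import Iterable, Iterator, Optional, Union


def fake_chat_stream() -> Iterator[Union[str, dict]]:
    """Hypothetical stand-in: yields string tokens, then one final dict."""
    for token in ["The ", "Nvidia ", "ecosystem"]:
        yield token
    yield {"response": "The Nvidia ecosystem", "metadata": {"sources": []}}


def consume_stream(stream: Iterable[Union[str, dict]]) -> tuple[str, Optional[dict]]:
    """Print tokens as they arrive; return the streamed text and the final dict."""
    pieces = []
    final = None  # stays None if the stream never yields a final dict
    for item in stream:
        if isinstance(item, str):
            print(item, end="", flush=True)
            pieces.append(item)
        else:
            final = item  # final yield: full response plus metadata
            break
    print()
    return "".join(pieces), final


text, full_response = consume_stream(fake_chat_stream())
if full_response is not None:
    print("Full response:", full_response["response"])
```

Initializing `final` to `None` avoids a `NameError` when a stream ends without yielding a final dict.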

## Refresh the Index

### Adding a document

In [2]:
# Add a new document
with open("../sample_docs/new_added.txt", "w") as f:
    f.write("Breaking News: Trump and Harris had a fight!")

# Refresh the index: incrementally adds new files and detects files deleted from the folder
ns.refresh_index()
print("Index refreshed successfully!")

2024-10-07 15:39:38,122 - nexusync.core.indexer - INFO - Starting index refresh process...
2024-10-07 15:39:38,124 - nexusync.core.indexer - INFO - Processing directory: ../sample_docs
VisionEncoderDecoderModel has generative capabilities, as `prepare_inputs_for_generation` is explicitly overwritten. However, it doesn't directly inherit from `GenerationMixin`. From 👉v4.50👈 onwards, `PreTrainedModel` will NOT inherit from `GenerationMixin`, and this model will lose the ability to call `generate` and other related functions.
  - If you are the owner of the model architecture code, please modify your model class such that it inherits from `GenerationMixin` (after `PreTrainedModel`, otherwise you'll get an exception).
  - If you are not the owner of the model architecture class, please contact the model code owner to update it.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior.

Index refreshed successfully!


In [5]:
query = "what news about tesla?"
text_qa_template = (
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Given the context information above, I want you to think step by step to answer the query in a crisp manner. "
    "In case you don't know the answer, say 'I don't know!'.\n"
    "Query: {query_str}\n"
    "Answer: "
)


response = ns.query(text_qa_template=text_qa_template, query=query)

print(f"Query: {query}")
print(f"Response: {response['response']}")
print(f"Response: {response['metadata']}")

Query: what news about tesla?
Response: Based on the provided context, here's the answer:

Tesla has released software update 2024.38 for its employees, which includes two main features: 

1. The ability to use Spotify with a free Spotify account (still requiring Premium Connectivity).
2. Improvements to the vehicle’s side mirror functions.

Additionally, Tesla now allows users to fold their side mirrors in and out using the quick menu, and there are also minor fixes and improvements in the update.
Response: {'sources': [{'source_text': "file_path: /mnt/d/nexusync/notebooks/../sample_docs/news.docx\n\nThe Features in Tesla's 2024.38 Software Update\n\nOctober 4, 2024\n\nBy Karan Singh\n\n\n\n\n\nNot a Tesla App\n\nTesla has released\xa0software update 2024.38\xa0to its employees for its last round of internal testing. We now have preliminary release notes for this update. You’ll also be able to unfold them from the quick menu as well. Other Updates\n\nSimilar to other recent updat

### Deleting a file

In [3]:
# Delete a document from the folder. The removal is commented out here,
# so this run's refresh reports no deleted files.
# os.remove('../sample_docs/Nvidia ecosystem.pptx')
# print("New document deleted.")

ns.refresh_index()
print("Index refreshed after deletion.")

2024-10-07 15:40:27,349 - nexusync.core.indexer - INFO - Starting index refresh process...
2024-10-07 15:40:27,353 - nexusync.core.indexer - INFO - Processing directory: ../sample_docs
2024-10-07 15:40:36,886 - nexusync.core.indexer - INFO - Loaded 7 files from ../sample_docs
2024-10-07 15:40:36,887 - nexusync.core.indexer - INFO - Updated 0 files in ../sample_docs
2024-10-07 15:40:36,888 - nexusync.core.indexer - INFO - No deleted files found.


Index refreshed after deletion.
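
The log lines above report loaded, updated, and deleted files. One common way to detect such changes is to compare two directory snapshots keyed by modification time. The sketch below illustrates that general idea only; it is not NexuSync's actual implementation, and `snapshot` and `diff_snapshots` are hypothetical names.

```python
import os
import tempfile


def snapshot(directory: str) -> dict[str, float]:
    """Map each file path under `directory` to its last-modified time."""
    result = {}
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            result[path] = os.path.getmtime(path)
    return result


def diff_snapshots(old: dict[str, float], new: dict[str, float]):
    """Return (added, modified, deleted) path lists between two snapshots."""
    added = sorted(p for p in new if p not in old)
    deleted = sorted(p for p in old if p not in new)
    modified = sorted(p for p in new if p in old and new[p] != old[p])
    return added, modified, deleted


# Demonstration in a throwaway directory (illustrative file names).
with tempfile.TemporaryDirectory() as docs:
    kept = os.path.join(docs, "kept.txt")
    removed = os.path.join(docs, "removed.txt")
    for path in (kept, removed):
        with open(path, "w") as f:
            f.write("sample")
    before = snapshot(docs)

    os.remove(removed)
    with open(os.path.join(docs, "new_added.txt"), "w") as f:
        f.write("Breaking news")
    after = snapshot(docs)

    added, modified, deleted = diff_snapshots(before, after)
    print("added:", [os.path.basename(p) for p in added])
    print("modified:", [os.path.basename(p) for p in modified])
    print("deleted:", [os.path.basename(p) for p in deleted])
```

Modification times are a cheap change signal. Content hashes would be more robust if files can be rewritten without their timestamp changing.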


In [6]:
query = "what is the breaking news?"
text_qa_template = (
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Given the context information above, I want you to think step by step to answer the query in a crisp manner. "
    "In case you don't know the answer, say 'I don't know!'.\n"
    "Query: {query_str}\n"
    "Answer: "
)


response = ns.query(text_qa_template=text_qa_template, query=query)

print(f"Query: {query}")
print(f"Response: {response['response']}")
print(f"Response: {response['metadata']}")

Query: what is the breaking news?
Response: Based on the provided context, it appears that there are two main breaking news stories:

1. Nvidia's CEO Jensen Huang has mentioned that the demand for Nvidia's Blackwell technology is "insane" and that everybody wants to have the most and be first.
2. Nvidia's latest earnings report showed strong financial performance, with revenue hitting $30.04 billion, up 122%, and beating Wall Street expectations.

However, without more specific information, it's difficult to pinpoint a single breaking news story.
Response: {'sources': [{'source_text': "file_path: /mnt/d/nexusync/notebooks/../sample_docs/news.docx\n\nPalantir Stock vs. Nvidia Stock: Wall Street Says Sell One and Buy the Other\n\n\n\nTrevor Jennewine, The Motley Fool\n\nSun, October 6, 2024 at 8:55 AM GMT+1\xa05 min read\n\n18\n\nIn This Article:\n\n\n\nNVDA\n\n+1.68%\n\n\n\nPLTR\n\n\n\n\n\n^GSPC\n\n\n\nPalantir\xa0Technologies\xa0(NYSE: PLTR)\xa0and\xa0Nvidia\xa0(NASDAQ: NVDA)\xa0are tw