# Virtual Teaching Assistant NVIDIA AI Blueprint

Imagine a teaching assistant that never sleeps, never gets tired, and is always ready to help. Our Virtual Teaching Assistant (VTA) makes that possible, transforming static educational content into dynamic learning experiences. Built for modern classrooms and self-learners alike, VTA is designed to:

- Break down complex course materials into clear, conversational explanations

- Guide students through material and help them arrive at answers on their own

- Adapt its tone and depth based on user needs — from quick summaries to in-depth walkthroughs

- Utilize powerful language models via NVIDIA NIM to understand and explain academic content



<img src="nvidiaragimg.png" alt="NVIDIA RAG Diagram" style="width:80%;"/>


## Features
Built on the NVIDIA RAG Blueprint, this system is optimized for fast setup, multimodal data handling, and private, on-prem inference or cloud deployment. Launch the full stack with a single `make all`. **We've built this for you to edit and deploy on your own infrastructure with ease.**

#### Core Capabilities
- **Canvas Integration  📚** - The Course Manager API provides a RESTful interface to authenticate with Canvas, retrieve course data, and download materials. It integrates with the RAG server to enable AI-powered content processing.
- **Guardrailed Conversations 🛡️** - Uses NeMo Guardrails to maintain safe, educational, and on-topic assistant responses—ideal for a classroom or learning environment.
- **Multimodal Ingestion 📄** - Extracts text, tables, charts, and images from PDFs, DOCX, and PPTX files using GPU-accelerated NIM services.
- **On-Prem LLM & Retrieval 🧠** - Locally hosts embedding, reranking, and inference microservices using Meta Llama and NVIDIA models—ensuring low latency and data privacy.
- **Hybrid Semantic Search 🌐** - Combines dense and sparse search for accurate academic content retrieval with support for multilingual queries.
- **Context-Aware Responses 🔥** - Supports multi-turn Q&A with reranking and query rewriting for enhanced dialogue quality.


#### Development Experience
- **One-Command Deployment ⚙️** - Launch ingestion, RAG, NIM services, and the playground UI with a single `make all`.
- **Docker Compose Integration 🐳** - One command (`make all`) spins up the entire stack, with smart handling of GPU resources and service dependencies.


#### Extend and Customize (More Information in nvidia-rag-2.0/docs)
- **Swap Inference or Embedding Models 🔁** - Easily change the LLM or embedding model to match your performance or domain needs.
- **Customize Prompts and Parameters 🎛️** - Tailor prompt templates and LLM parameters at runtime for better control.
- **Turn on Image Captioning 🖼️** - Add vision-language model support to describe visual content.
- **Activate Hybrid Search 🔍** - Combine sparse and dense retrieval to improve chunk relevance.
- **Optimize for Text-Only Mode ⚡** - Reduce latency and compute for lightweight use cases.


## Prerequisites

### Clone the repository

In [None]:
# Skip this step if you have just created the VM and can access it over SSH

!git clone 

## Get an NVIDIA NIM Trial API Key

Before getting started, create an API key to access the NVIDIA NIM trial hosted endpoints.

If you don’t have an NVIDIA account, you will be asked to sign up.

Click [here](https://build.nvidia.com/meta/llama-3_3-70b-instruct?signin=true&api_key=true) to sign in and get an API key.


<div class="alert alert-block alert-success">
    <b>Tip:</b> The key begins with the letters nvapi-.
</div>

## Set Environment Variables

This notebook requires certain environment variables to be configured. We'll help you set these up in a `.env` file.

Required variables:
- `NVIDIA_API_KEY`: Your NVIDIA API key
- `MAX_CONCURRENT_REQUESTS`: Number of concurrent requests allowed (recommended: 1 for local development)

Run the code cell below to create your `.env` file. Make sure to replace the placeholder values with your actual API keys.

In [2]:
%%bash

cd s25-nvidia/
    
# Backup existing .env if it exists
if [ -f .env ]; then
    echo "Warning: .env file already exists. Backing up to .env.backup"
    mv .env .env.backup
fi

# Create new .env file
cat > .env << EOL
NVIDIA_API_KEY=<ENTER_KEY>
MAX_CONCURRENT_REQUESTS=1
EOL

echo "Created .env file. Please edit it with your actual API keys."
echo -e "\nCurrent .env contents:"
echo "----------------------------------------"
cat .env


Created .env file. Please edit it with your actual API keys.

Current .env contents:
----------------------------------------
NVIDIA_API_KEY=<ENTER_KEY>
MAX_CONCURRENT_REQUESTS=1
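
Before moving on, it can help to confirm that the placeholder in `.env` was actually replaced. The sketch below is one possible check, not part of the blueprint: `read_env_file` and `check_env` are hypothetical helper names, and the only rules it applies are the ones stated above (the key begins with `nvapi-`, and `MAX_CONCURRENT_REQUESTS` is a number such as 1).

```python
from pathlib import Path


def read_env_file(path):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in Path(path).read_text().splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def check_env(values):
    """Return a list of problems; an empty list means the file looks plausible."""
    problems = []
    # Assumption taken from the tip above: valid keys start with "nvapi-".
    api_key = values.get("NVIDIA_API_KEY", "")
    if not api_key.startswith("nvapi-"):
        problems.append("NVIDIA_API_KEY is missing or does not start with 'nvapi-'")
    # Assumption: the value is meant to be a positive integer (1 is recommended).
    max_requests = values.get("MAX_CONCURRENT_REQUESTS", "")
    if not max_requests.isdigit() or int(max_requests) < 1:
        problems.append("MAX_CONCURRENT_REQUESTS should be a positive integer")
    return problems
```

Calling `check_env(read_env_file(".env"))` from the repository root returns an empty list once the key has been filled in. With the placeholder `<ENTER_KEY>` still in place, it reports the API key as a problem.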


## Install Dependencies

You can install them by running `make setup` in the root of the project.

In [None]:
%%bash

# Change into the repository directory
cd s25-nvidia/

# Install the project dependencies
make setup


## Spin Up Blueprint
Docker Compose files are provided that spin up every microservice on a single node. Startup may take up to **15 minutes** to complete.

> **In a separate terminal window, run**

```
cd s25-nvidia/
make all
```

In [None]:
!docker ps --format "table {{.ID}}\t{{.Names}}\t{{.Status}}"

This command should produce output similar to the following (the container ID column is omitted here):

```
NAMES                                   STATUS
compose-nv-ingest-ms-runtime-1          Up 5 minutes (healthy)
ingestor-server                         Up 5 minutes
compose-redis-1                         Up 5 minutes
rag-playground                          Up 9 minutes
rag-server                              Up 9 minutes
milvus-standalone                       Up 36 minutes
milvus-minio                            Up 35 minutes (healthy)
milvus-etcd                             Up 35 minutes (healthy)
nemoretriever-ranking-ms                Up 38 minutes (healthy)
compose-page-elements-1                 Up 38 minutes
compose-paddle-1                        Up 38 minutes
compose-graphic-elements-1              Up 38 minutes
compose-table-structure-1               Up 38 minutes
nemoretriever-embedding-ms              Up 38 minutes (healthy)
nim-llm-ms                              Up 38 minutes (healthy)
```
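
Scanning fifteen rows by eye is error-prone, so the listing can also be checked programmatically. The following is a minimal sketch, assuming two-column output with the container name first, as in the sample above. `parse_docker_ps` and `find_problems` are hypothetical helpers, and treating any status that does not start with `Up`, or that contains `(unhealthy)`, as a problem is an assumption about how Docker words its status strings.

```python
def parse_docker_ps(table_text):
    """Map container name -> status from two-column `docker ps` output (NAMES, STATUS)."""
    statuses = {}
    for line in table_text.splitlines():
        # Container names contain no spaces, so the first token is the name
        # and the remainder of the line is the status.
        parts = line.split(None, 1)
        if len(parts) != 2 or parts[0] == "NAMES":
            continue  # skip blank lines and the header row
        statuses[parts[0]] = parts[1].strip()
    return statuses


def find_problems(statuses, expected_names):
    """Return {name: reason} for expected containers that are absent or not up."""
    problems = {}
    for name in expected_names:
        status = statuses.get(name)
        if status is None:
            problems[name] = "not running"
        elif not status.startswith("Up") or "(unhealthy)" in status:
            problems[name] = status
    return problems
```

To use it, capture the output of `docker ps --format "table {{.Names}}\t{{.Status}}"` (for example with `subprocess.run(..., capture_output=True, text=True)`), pass the text to `parse_docker_ps`, and compare the result against the container names you expect, such as `rag-server` and `nim-llm-ms`.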

You can check whether the services are up by running the cells below.

In [None]:
!curl localhost:8002/health
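
Because startup can take up to 15 minutes, a single `curl` may fail simply because the service is not ready yet. The sketch below polls the same endpoint until it answers. `http_ok` and `wait_until_healthy` are hypothetical helpers, and treating an HTTP 200 response as "healthy" is an assumption, since the shape of the health response is not described here.

```python
import time
import urllib.error
import urllib.request


def http_ok(url, timeout_s=5):
    """Return True if a GET to `url` answers with HTTP 200, False if the request fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Covers refused connections, timeouts, and HTTP error statuses.
        return False


def wait_until_healthy(url, max_wait_s=900, interval_s=10, probe=http_ok):
    """Poll `url` until `probe` succeeds or `max_wait_s` elapses; return True on success."""
    deadline = time.monotonic() + max_wait_s  # 900 s matches the 15-minute estimate
    while True:
        if probe(url):
            return True
        if time.monotonic() + interval_s > deadline:
            return False  # the next attempt would fall past the deadline
        time.sleep(interval_s)
```

For example, `wait_until_healthy("http://localhost:8002/health")` blocks until the endpoint responds or 15 minutes pass. The `probe` parameter exists so a different readiness check can be swapped in without changing the loop.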

## Start the course_manager_api and frontend services

%%bash

# Try the make targets first
make frontend-compose
make api-start

# If the course-manager-api service or the frontend does not show up, cd into
# their directories and run the compose commands below manually. Each `up`
# stays in the foreground, so run them in separate terminals.

# docker compose -f deploy/compose/course-manager.yaml up

# docker compose -f deploy/compose/frontend.yaml up

Open a web browser and access http://localhost:3000 to use our frontend. You can use the upload tab to ingest files into the server or follow the notebooks to understand the API usage.

In [4]:
from IPython.display import Image

Image(url="https://raw.githubusercontent.com/Clemson-Capstone/s25-nvidia/main/docs/Dori_Frontend.png")


## If you have any questions or trouble running the Brev launchable, please feel free to open a pull request with any fixes or reach out to the Clemson Team!