# Memvid on Colab with GPU Support

This notebook allows you to run the Memvid project on Google Colab, leveraging a GPU for accelerated performance, particularly for embedding generation and semantic search.

Follow the steps below to set up the environment and use Memvid.

In [None]:
# Check GPU availability
!nvidia-smi
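`nvidia-smi` prints a full status table. If you only want to confirm that a GPU is attached before running the slower setup steps, the following stdlib-only sketch queries just the GPU name and total memory. It uses `nvidia-smi`'s `--query-gpu` and `--format=csv,noheader` options. The helpers `parse_gpu_csv` and `list_gpus` are hypothetical and not part of Memvid.

```python
import shutil
import subprocess


def parse_gpu_csv(text):
    """Parse `nvidia-smi --query-gpu=name,memory.total --format=csv,noheader`
    output into (name, total_memory) pairs."""
    gpus = []
    for line in text.strip().splitlines():
        parts = [field.strip() for field in line.split(",", 1)]
        if len(parts) == 2 and parts[0]:
            gpus.append((parts[0], parts[1]))
    return gpus


def list_gpus():
    """Return (name, total_memory) pairs, or [] if nvidia-smi is missing or fails."""
    if shutil.which("nvidia-smi") is None:
        return []
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
            capture_output=True, text=True, timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return []
    if result.returncode != 0:
        return []
    return parse_gpu_csv(result.stdout)


gpus = list_gpus()
if gpus:
    for name, memory in gpus:
        print(f"GPU detected: {name} ({memory})")
else:
    print("No GPU detected. In Colab, choose Runtime > Change runtime type and select a GPU.")
```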

## 1. Set Up Environment

This section checks the GPU, clones the Memvid repository, and installs the required system and Python dependencies.

In [None]:
# Check GPU availability (already run in the cell above, but can be re-checked)
!nvidia-smi

# Clone the repository
!git clone https://github.com/featuregraph/memvid.git
%cd memvid

# Install system dependencies
!apt-get update && apt-get install -y ffmpeg libzbar0

# Install PyTorch with CUDA
!pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu121

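# Install FAISS with GPU support and sentence-transformers.
# The legacy `faiss-gpu` wheel on PyPI only supports older Python versions;
# `faiss-gpu-cu12` is a community-built CUDA 12 wheel for newer runtimes.
!pip install faiss-gpu-cu12 sentence-transformers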

# Install other Python dependencies (from requirements.txt, excluding conflicting/already installed ones).
# "qrcode[pil]" is quoted so the shell does not treat the brackets as a glob pattern.
!pip install "qrcode[pil]" opencv-python opencv-contrib-python numpy tqdm Pillow PyPDF2 python-dotenv beautifulsoup4 ebooklib openai google-generativeai anthropic
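A failed `pip install` inside a long setup cell is easy to miss. The following minimal sketch uses the standard library's `importlib.metadata` to report which packages actually resolved. `installed_versions` is a hypothetical helper. The list uses pip distribution names from the cell above, so adjust it to match what you installed.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(distributions):
    """Map each pip distribution name to its installed version, or None if missing."""
    found = {}
    for name in distributions:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Distribution names as used in the pip commands above (edit as needed).
required = ["torch", "sentence-transformers", "qrcode", "opencv-python", "tqdm", "PyPDF2"]
for name, ver in installed_versions(required).items():
    print(f"{name:25s} {ver or 'NOT INSTALLED'}")
```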

## 3. Configure for GPU and Prepare Data

This section configures Memvid to use the GPU for indexing and provides a space for your text data.

In [None]:
import os
import memvid.config
from memvid import MemvidEncoder, quick_chat, chat_with_memory

# --- Configure for GPU ---
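# Get a copy of the default config and modify it for this session.
# get_default_config() may build a new dictionary on each call, so these changes
# only take effect where default_config is passed in explicitly.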
print("Original default config for index:", memvid.config.get_default_config()['index'])
default_config = memvid.config.get_default_config()
default_config['index']['use_gpu'] = True
default_config['index']['type'] = 'Flat'  # 'Flat' index is robust for varying data sizes in examples
print("Updated default config for index:", default_config['index'])

# --- Define Output Paths ---
# Paths below are relative to the cloned 'memvid' directory. If the %cd in the
# setup cell did not run, uncomment the next line:
# %cd /content/memvid
output_dir = "colab_output" # Store outputs in a subdirectory within /content/memvid/
os.makedirs(output_dir, exist_ok=True)
video_file_path = os.path.join(output_dir, "memory_colab.mp4") # Using .mp4 for broad compatibility
index_file_path = os.path.join(output_dir, "memory_colab_index") # No extension: .faiss and .json are appended when the index is saved

print(f"Video will be saved to: {os.path.abspath(video_file_path)}")
print(f"Index will be saved to: {os.path.abspath(index_file_path)}.faiss and .json")


# --- Provide Your Text Data ---
# Replace the example text below with your own data.
user_text_data = """
The quantum computer achieved 100 qubits of processing power in March 2024.
Machine learning models can now process over 1 trillion parameters efficiently.
The new GPU architecture delivers 5x performance improvement for AI workloads.
Cloud storage costs have decreased by 80% over the past five years.
Quantum encryption methods are becoming standard for secure communications.
Edge computing reduces latency to under 1ms for critical applications.
Neural networks can now generate photorealistic images in real-time.

This is a sample document for Memvid.
Memvid allows storing text into video frames using QR codes.
It uses sentence transformers for embeddings and FAISS for indexing.
This notebook demonstrates running Memvid on Colab with GPU support.
Make sure your FAISS index is configured to use the GPU for faster search.
Sentence transformers will also benefit from GPU acceleration.
"""
print(f"\nProvided text data has {len(user_text_data)} characters.")
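The config cell edits a dictionary returned by `get_default_config()`. If that function builds a new dictionary on each call, which is an assumption about this Memvid version, the edits reach only the components that receive `default_config` explicitly. The following sketch derives a GPU variant without mutating the base config. `make_gpu_config` is hypothetical. The `index`, `use_gpu`, and `type` keys mirror the cell above, and the base dictionary is an illustrative stand-in, not Memvid's real defaults.

```python
import copy


def make_gpu_config(base_config, index_type="Flat"):
    """Return a deep copy of base_config with GPU indexing enabled.

    The base dictionary is left unchanged, so other code that shares it
    keeps its original settings.
    """
    config = copy.deepcopy(base_config)
    index_cfg = config.setdefault("index", {})
    index_cfg["use_gpu"] = True
    index_cfg["type"] = index_type
    return config


# Illustrative stand-in for memvid.config.get_default_config().
base = {"index": {"type": "IVF"}, "embedding": {"model": "example-model"}}
gpu_config = make_gpu_config(base)
print("base index config:", base["index"])
print("GPU index config: ", gpu_config["index"])
```

The derived config can then be passed to a component that accepts one, for example `MemvidEncoder(config=gpu_config)`. This assumes the constructor takes a `config` keyword argument.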

## 4. Build Video Memory & Index

This step encodes your text data into a video and creates a searchable index.
If the config passed to the encoder has `use_gpu` set to `True`, FAISS indexing should run on the GPU.
SentenceTransformer embedding generation uses the GPU automatically when PyTorch detects CUDA, unless a device is set explicitly.
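The config uses a `'Flat'` index type. In FAISS, a flat index such as `faiss.IndexFlatL2` performs exact, exhaustive search: every stored vector is compared against the query. The following pure-Python sketch of that computation is for intuition only. On the GPU, FAISS performs the same distance computations in parallel. `squared_l2` and `flat_search` are hypothetical names.

```python
import heapq


def squared_l2(a, b):
    """Squared Euclidean distance, the metric IndexFlatL2 reports."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def flat_search(vectors, query, k):
    """Exhaustive k-nearest-neighbour search over a list of vectors.

    Returns (distances, ids) sorted by increasing distance, in the same
    order as FAISS's `D, I = index.search(...)`.
    """
    scored = ((squared_l2(vec, query), idx) for idx, vec in enumerate(vectors))
    best = heapq.nsmallest(k, scored)
    return [d for d, _ in best], [i for _, i in best]


embeddings = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
distances, ids = flat_search(embeddings, [0.9, 0.1], k=2)
print("nearest ids:", ids)
print("distances:", distances)
```

Exhaustive search costs one distance computation per stored vector. This is affordable for small example collections and is why `'Flat'` is a robust default here.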

In [None]:
# --- Initialize Encoder ---
# Pass the modified config explicitly: MemvidEncoder() with no argument may
# fall back to a fresh default config that lacks the GPU settings.
encoder = MemvidEncoder(config=default_config)
print(f"Encoder using config for index: {encoder.config['index']}")


# --- Add Text and Build Video ---
print("\nAdding text to encoder...")
encoder.add_text(user_text_data, chunk_size=256, overlap=50) # Adjusted chunk size for example

stats = encoder.get_stats()
print("\nEncoder stats before building:")
print(f"  Total chunks: {stats['total_chunks']}")
print(f"  Total characters: {stats['total_characters']}")

print("\nBuilding video and index...")
import time
start_time = time.time()
# build_video falls back to VIDEO_FILE_TYPE from the config for the extension only
# when output_file has none; this notebook passes an explicit .mp4 path.
build_stats = encoder.build_video(video_file_path, index_file_path, show_progress=True, codec='mp4v') # Using mp4v for wider compatibility
elapsed = time.time() - start_time

print(f"\nBuild completed in {elapsed:.2f} seconds.")
print(f"Video file: {build_stats.get('video_file', 'Not found')}")
print(f"Index file: {build_stats.get('index_file', 'Not found')}")
print(f"Video duration: {build_stats.get('duration_seconds', 0):.1f}s")
print(f"Video size: {build_stats.get('video_size_mb', 0):.2f}MB")
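The build cell calls `add_text` with `chunk_size=256` and `overlap=50`. The following sketch shows fixed-size character windows where each window repeats the last `overlap` characters of the previous one, so a fact that straddles a boundary still appears whole in at least one chunk. `chunk_text` is hypothetical. Memvid's own splitter may differ, for example by respecting sentence boundaries.

```python
def chunk_text(text, chunk_size, overlap):
    """Split text into windows of chunk_size characters; consecutive windows
    share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
        start += step
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, overlap=1))
```

The window advances by `chunk_size - overlap` characters, so the chunk count grows roughly as `len(text) / (chunk_size - overlap)`. A larger overlap therefore means more chunks, more frames, and a larger index.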

## 5. Set Up LLM API Key (Optional)

For chat functionality with an LLM (like OpenAI, Google, Anthropic), you need to provide an API key.
It's recommended to use Colab's secret manager (click the key icon on the left sidebar) to store your API key.
Then read it in the notebook with `userdata.get('YOUR_SECRET_NAME')` from `google.colab`, as the cell below does.

In [None]:
import os
from google.colab import userdata # For Colab secrets

# Option 1: Use Colab Secrets (Recommended)
# Create a secret named OPENAI_API_KEY (or GOOGLE_API_KEY, ANTHROPIC_API_KEY) in Colab's secrets manager
# and put your API key there.
try:
    # Replace 'OPENAI_API_KEY' with the name of your secret if different
    # Also change for other providers e.g. 'GOOGLE_API_KEY'
    llm_api_key = userdata.get('OPENAI_API_KEY') 
    if llm_api_key:
        print("Successfully loaded API key from Colab secrets.")
        # OpenAI's client reads the key from the OPENAI_API_KEY environment variable.
        os.environ['OPENAI_API_KEY'] = llm_api_key
        # For Google or Anthropic, load the matching secret instead and pass it
        # to quick_chat through the api_key argument.
    else:
        print("API key not found in Colab secrets. Chat responses will be context-only or may fail.")
except userdata.SecretNotFoundError:
    print("Secret not found. Please create it in Colab's secret manager for LLM chat.")
    llm_api_key = None
except Exception as e:
    print(f"An error occurred accessing Colab secrets: {e}")
    llm_api_key = None

# Option 2: Paste key directly (Less Secure - Use only for temporary testing)
# if not llm_api_key:
#   llm_api_key = "sk-your-openai-api-key" # Replace with your actual key
#   os.environ['OPENAI_API_KEY'] = llm_api_key 
#   print("Used manually pasted API key.")

if not llm_api_key:
    print("\nLLM API key is not set. Chat functions might not provide full LLM responses or may only show retrieved context.")
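The cell above keeps the key both in `llm_api_key`, for passing explicitly, and in `OPENAI_API_KEY`. The following sketch generalizes that lookup to any provider: validate the provider, prefer an explicitly passed key, and otherwise fall back to the provider's environment variable. The precedence order and the `resolve_api_key` helper are assumptions for illustration, not Memvid's documented behavior. `OPENAI_API_KEY` and `ANTHROPIC_API_KEY` are the variables those SDKs read by default. `GOOGLE_API_KEY` is the name this notebook's comments use for Google.

```python
import os

# Environment variable names per provider (see the lead-in for their sources).
PROVIDER_ENV_VARS = {
    "openai": "OPENAI_API_KEY",
    "google": "GOOGLE_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
}


def resolve_api_key(provider, explicit_key=None, environ=None):
    """Return explicit_key if given, else the provider's environment variable, else None."""
    env_var = PROVIDER_ENV_VARS.get(provider.lower())
    if env_var is None:
        raise ValueError(f"Unknown provider: {provider!r}")
    if explicit_key:
        return explicit_key
    environ = os.environ if environ is None else environ
    return environ.get(env_var) or None  # treat an empty string as "not set"


# Demo with a fake environment so no real key is printed.
print(resolve_api_key("openai", environ={"OPENAI_API_KEY": "sk-demo"}))
```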

## 6. Query Your Memory

Now you can query your video memory.
`quick_chat` sends a single query; `chat_with_memory` starts an interactive session.
Both use `MemvidChat` internally, which initializes a `MemvidRetriever`.
Whether the retriever searches FAISS on the GPU depends on the config it receives: if `get_default_config()` returns a fresh dictionary, the earlier changes to `default_config` do not reach it automatically.
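`quick_chat` goes through `MemvidChat`, which uses a `MemvidRetriever` to find relevant chunks before the LLM is called. The exact prompt `MemvidChat` builds is not shown in this notebook. The following is only a sketch of the general retrieval-augmented pattern: number the top retrieved chunks, place them in a context section, and append the question. `build_context_prompt` and its wording are hypothetical.

```python
def build_context_prompt(query, retrieved_chunks, max_chunks=5):
    """Assemble a retrieval-augmented prompt from the top retrieved chunks."""
    context = "\n".join(
        f"[{rank}] {chunk.strip()}"
        for rank, chunk in enumerate(retrieved_chunks[:max_chunks], start=1)
    )
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )


retrieved = [
    "Memvid allows storing text into video frames using QR codes.",
    "It uses sentence transformers for embeddings and FAISS for indexing.",
]
print(build_context_prompt("What is Memvid?", retrieved))
```

Capping the number of chunks keeps the prompt within the model's context window. Without an LLM key, the retrieved context alone is what a context-only response can show.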

In [None]:
# Ensure paths are correct (they are defined in cell "3. Configure for GPU and Prepare Data")
print(f"Using video file: {video_file_path}")
print(f"Using index file: {index_file_path}") # IndexManager handles .faiss/.json extensions

# --- Quick Chat Example ---
if os.path.exists(video_file_path) and os.path.exists(index_file_path + ".faiss"):
    query = "What is Memvid?"
    print(f"\nSending query to quick_chat: '{query}'")
    
    # provider can be 'openai', 'google', or 'anthropic'.
    # The matching client library (openai, google-generativeai, anthropic) was
    # installed in the setup cell; make sure the matching API key is set as well.
    # llm_api_key is passed to the MemvidChat constructor if the key is not found in the environment.
    response = quick_chat(video_file_path, index_file_path, query, provider='openai', api_key=llm_api_key)
    print("\nResponse from quick_chat:")
    print(response)
else:
    print("\nMemory files not found. Please run the 'Build Video Memory & Index' step successfully.")

# --- Interactive Chat Example (Optional) ---
# Uncomment the lines below to start an interactive chat session.
# print("\nStarting interactive chat session (type 'quit' or 'exit' to end):")
# if os.path.exists(video_file_path) and os.path.exists(index_file_path + ".faiss"):
#    chat_with_memory(video_file_path, index_file_path, provider='openai', api_key=llm_api_key)
# else:
#    print("\nMemory files not found for interactive chat.")
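`chat_with_memory` runs an interactive loop that ends when you type 'quit' or 'exit'. The following minimal sketch shows such a loop with injectable input and output functions, so it can be exercised without a keyboard. `chat_loop` is hypothetical and not necessarily how Memvid implements `chat_with_memory`.

```python
def chat_loop(answer, read_line=input, write=print):
    """Read queries until 'quit' or 'exit' (or end of input) and write answers."""
    while True:
        try:
            query = read_line("You: ").strip()
        except EOFError:
            break  # input stream closed
        if query.lower() in {"quit", "exit"}:
            break
        if not query:
            continue  # ignore blank lines
        write(f"Assistant: {answer(query)}")


# Scripted inputs instead of the keyboard; a stand-in answer function.
scripted_inputs = iter(["What is Memvid?", "quit"])
chat_loop(
    answer=lambda q: f"(answer to {q!r})",
    read_line=lambda prompt: next(scripted_inputs),
)
```

In the notebook, `answer` could wrap the same call the query cell makes, for example `lambda q: quick_chat(video_file_path, index_file_path, q, provider='openai', api_key=llm_api_key)`.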