A comprehensive tutorial for building a Retrieval-Augmented Generation (RAG) service that uses Google Cloud to process documents and enable powerful semantic search with natural language responses.
Retrieval-Augmented Generation (RAG) combines search capabilities with generative AI to produce more accurate, contextualized answers. The system works by:
- Converting documents into vector embeddings
- Storing these embeddings in a vector database
- Retrieving relevant information based on semantic similarity when queried
- Using an LLM to generate natural language responses based on the retrieved context
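The retrieval step above can be sketched in pure Python, using toy bag-of-words vectors and cosine similarity in place of learned embeddings (the function names here are illustrative, not this repository's API):

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Toy "embedding": a bag-of-words term-frequency vector.
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-frequency vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    # Rank stored documents by similarity to the query and keep the top k.
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

docs = [
    "Q3 revenue forecast is up ten percent",
    "The office holiday party is in December",
]
print(retrieve("What is the revenue forecast?", docs))
```

A real RAG service swaps `embed` for a learned embedding model and the list scan for a vector database, but the ranking logic is the same.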
- Process a variety of Google Workspace and Microsoft Office documents:
  - Google Docs
  - Google Sheets
  - Google Slides
  - Microsoft Word (.docx)
  - Microsoft Excel (.xlsx)
  - Microsoft PowerPoint (.pptx)
- Generate embeddings using Hugging Face models
- Store and query vector embeddings with Chroma DB
- Generate contextual responses using OpenAI models
- Python 3.8+
- Google Cloud Service Account with access to Drive, Docs, Sheets, and Slides APIs
- OpenAI API key (for generating responses)
- Clone this repository:

  ```bash
  git clone https://github.com/TribalScale/google-cloud-rag.git
  cd google-cloud-rag
  ```
- Create and activate a virtual environment:

  ```bash
  # Using venv (Python's built-in virtual environment)
  python -m venv venv

  # Activate the virtual environment
  # On Windows:
  venv\Scripts\activate
  # On macOS/Linux:
  source venv/bin/activate
  ```
- Install dependencies:

  ```bash
  pip install langchain-community langchain-core langchain-openai \
    "langchain[text-splitters]" python-dotenv google-api-python-client \
    google-auth-httplib2 google-auth-oauthlib openpyxl python-docx \
    python-pptx sentence-transformers chromadb
  ```
- Create a `.env` file from the template:

  ```bash
  cp .env.template .env
  ```
- Configure your environment variables in `.env`:
  - `GOOGLE_SERVICE_ACC`: your Google service account credentials JSON
  - `OPENAI_API_KEY`: your OpenAI API key
  - `CHROMA_PATH`: local path for Chroma DB storage (e.g., `./chroma_db`)
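For reference, the essence of what python-dotenv does when loading these variables is roughly the following (a simplified sketch; the real package also handles quoting, multiline values, and interpolation):

```python
import os

def load_env(path: str = ".env") -> None:
    """Minimal .env loader: put KEY=VALUE lines into os.environ."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            # Skip blank lines and comments; never overwrite existing values.
            if line and not line.startswith("#") and "=" in line:
                key, _, value = line.partition("=")
                os.environ.setdefault(key.strip(), value.strip())
```

In this project you would simply call `load_dotenv()` from python-dotenv instead.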
Index the documents in a Google Drive folder:

```python
from index import upload_google_drive

# Replace with your Google Drive folder ID
upload_google_drive("your-drive-folder-id")
```

Then query the processed documents:

```python
from index import query_rag

# Ask a question based on the processed documents
query_rag("What is the revenue forecast for Q3?")
```
- Document Processing (`google_cloud.py`): Extracts text from various Google Workspace and Microsoft Office documents.
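A common way to structure such an extractor is to dispatch on file extension. This is a hypothetical sketch, not the actual code in `google_cloud.py`; the stub lambdas stand in for calls to python-docx, openpyxl, and python-pptx:

```python
from pathlib import Path

def extract_text(path: str) -> str:
    """Route a file to the right text extractor based on its extension."""
    suffix = Path(path).suffix.lower()
    extractors = {
        ".docx": lambda p: f"word text from {p}",    # would use python-docx
        ".xlsx": lambda p: f"excel text from {p}",   # would use openpyxl
        ".pptx": lambda p: f"slides text from {p}",  # would use python-pptx
    }
    if suffix not in extractors:
        raise ValueError(f"Unsupported file type: {suffix}")
    return extractors[suffix](path)
```

Google-native formats (Docs, Sheets, Slides) are instead exported as text through the Drive API rather than parsed from a local file.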
- Text Splitting (`rag_db_service.py`): Divides documents into manageable chunks for more efficient retrieval.
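The core idea, which LangChain's text splitters implement with smarter boundary handling, is fixed-size windows with overlap so that context spanning a chunk boundary appears in both chunks. A minimal sketch:

```python
def split_text(text: str, chunk_size: int = 100, overlap: int = 20) -> list[str]:
    """Split text into character chunks; consecutive chunks share `overlap` chars."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap  # how far each new chunk advances
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]
```

In practice you would split on sentence or paragraph boundaries where possible, since mid-word cuts hurt both embedding quality and readability.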
- Embedding Generation (`get_embeddings.py`): Creates vector embeddings using a Hugging Face sentence-transformers model.
- Vector Storage (`rag_db_service.py`): Stores document chunks and their embeddings in a Chroma vector database.
- Retrieval and Response (`index.py`): Retrieves relevant document chunks based on query similarity and generates natural language responses using OpenAI's models.
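The generation step typically stuffs the retrieved chunks into the LLM prompt. Here is a hedged sketch of that assembly; the template and function name are illustrative, not the exact prompt used in `index.py`:

```python
def build_prompt(question: str, chunks: list[str]) -> str:
    """Combine retrieved chunks into a grounded prompt for the LLM."""
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

prompt = build_prompt(
    "What is the revenue forecast for Q3?",
    ["Q3 revenue forecast is up ten percent."],
)
print(prompt)
```

Grounding the model this way is what lets the service answer from your documents rather than from the model's training data.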