LangChain RAG Application Project

This project demonstrates the use of LangChain to implement a Retrieval-Augmented Generation (RAG) application. The application integrates prompt templates, LLM chains, retrieval mechanisms, and conversation memory to answer user queries based on content from a web source.

Getting Started

These instructions will help you get a copy of the project up and running on your local machine for development and testing purposes.

Prerequisites

You need to install the following tools and configure their dependencies:

Python (version 3.13 or higher)
```
python --version
```
Should return something like:
```
Python 3.13.0
```
Git
- Install Git by following the official installation instructions for your operating system.
Verify the installation:
```
git --version
```
Should return something like:
```
git version 2.25.1
```

Installing

Clone the repository and navigate into the project directory:

```
git clone https://github.com/Sebasvasquezz/LangChain-RAG-Tutorial.git
cd LangChain-RAG-Tutorial
```

Before running the notebook, obtain an OpenAI API key and use it to replace the placeholder in the line:
```
os.environ["OPENAI_API_KEY"] = "YOUR_API_KEY_HERE"
```
Then do the same with your LangChain API key in the line:
```
os.environ["LANGCHAIN_API_KEY"] = "YOUR_API_KEY_HERE"
```
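
Pasting keys directly into a notebook makes it easy to commit them by accident. As an alternative, the following is a minimal sketch, assuming you would rather read each key from the environment and fall back to an interactive prompt. `ensure_api_key` is a hypothetical helper introduced here for illustration; it is not part of the notebook.

```python
import os
from getpass import getpass


def ensure_api_key(name: str, prompt=getpass) -> str:
    """Return the key stored in os.environ[name], prompting for it if missing.

    Hypothetical helper, not part of the notebook.
    """
    value = os.environ.get(name, "").strip()
    if not value or value == "YOUR_API_KEY_HERE":
        # Ask interactively so the key never lands in the notebook source.
        value = prompt(f"Enter {name}: ").strip()
        if not value:
            raise ValueError(f"{name} must not be empty")
        os.environ[name] = value
    return value
```

Calling `ensure_api_key("OPENAI_API_KEY")` and `ensure_api_key("LANGCHAIN_API_KEY")` in place of the two assignments above would leave the rest of the notebook unchanged, since the later cells only read the environment variables.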

Running the Notebook

Run each cell in the notebook one by one, ensuring each step completes without errors.

Cell Descriptions

Cell 1: Quietly installs or upgrades the langchain, langchain-community, and langchain-chroma libraries.
Cell 2: Sets the environment variables used for LangSmith tracing and authentication.
Cell 3: Sets the OpenAI API key used to access OpenAI models and installs BeautifulSoup for parsing HTML.
Cell 4: Creates an instance of the ChatOpenAI GPT-4o mini model.
Cell 5: Loads a web page and extracts selected parts of it (title, header, content) using BeautifulSoup.
Cell 6: Prints the first 500 characters of the loaded content.
Cell 7: Sets up a text splitter that divides the text into chunks and records each chunk's start index.
Cell 8: Displays the length of the first chunk's content.
Cell 9: Displays the metadata of the tenth chunk.
Cell 10: Embeds the text chunks and stores them in a vector database.
Cell 11: Sets up a retriever and runs a sample query against the vector database.
Cell 12: Prints the content of the first retrieved document.
Cell 13: Updates the langchain-openai library once again.
Cell 14: Configures OpenAI and creates another instance of the GPT-4o mini model.
Cell 15: Pulls a prompt from the LangChain prompt hub and runs it on sample input.
Cell 16: Prints the content of an example message produced by the hub prompt.
Cell 17: Sets up a Retrieval-Augmented Generation (RAG) chain and runs it, streaming the response as it is generated.
Cell 18: Creates a sub-chain for testing the retrieval step on its own.
Cell 19: Configures a RAG chain with templates for answering questions.
Cell 20: Prints the source documents retrieved by the RAG chain.
Cell 21: Sets up and runs a custom prompt built with LangChain core's prompt templates.
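
To make the chunking step (Cells 7 to 9) more concrete, here is a minimal sketch in plain Python. It is not the LangChain splitter used in the notebook: it cuts at fixed character offsets, whereas real splitters usually prefer to break on separators such as paragraphs. The function name `split_with_start_index` and the sizes (1000 characters per chunk, 200 characters of overlap) are assumptions chosen for illustration.

```python
def split_with_start_index(
    text: str, chunk_size: int = 1000, chunk_overlap: int = 200
) -> list[dict]:
    """Split text into overlapping fixed-size chunks, recording each chunk's offset.

    Illustrative only; sizes are assumed values, not taken from the notebook.
    """
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append({
            "page_content": text[start:start + chunk_size],
            "metadata": {"start_index": start},
        })
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
    return chunks


# A 2500-character sample document made of repeating digits.
sample = "".join(str(i % 10) for i in range(2500))
chunks = split_with_start_index(sample)
print(len(chunks[0]["page_content"]))   # length of the first chunk
print(chunks[-1]["metadata"])           # metadata of the last chunk
```

Keeping the start index in the metadata means any retrieved chunk can be traced back to its position in the original page, which is the kind of information Cell 9 displays.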

Architectural Diagram

Diagram Explanation

This architecture illustrates the workflow of the Retrieval-Augmented Generation (RAG) project using LangChain. The components and their interactions are as follows:

User: Represents the end-user who sends a query to the system.
LangChain RAG Pipeline: Orchestrates the project's workflow by integrating document retrieval and response generation components.
Retriever: Fetches relevant document chunks from the vector database based on the user's query.
Vector Database: Stores processed documents as vector embeddings to facilitate similarity-based search.
OpenAI GPT-4 Model: Processes the retrieved documents and the query to generate a meaningful response.
Generated Response: The final response delivered to the user, combining retrieved information and generated insights.
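
The interaction between the Retriever and the Vector Database can be sketched without any external service. The example below is a toy stand-in under stated assumptions: it uses word counts instead of a real embedding model and an in-memory list instead of a vector database, and the names `embed`, `cosine`, and `ToyVectorStore` are hypothetical. It only illustrates the idea of ranking stored chunks by similarity to the query.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: a bag-of-words count vector (stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """In-memory stand-in for the vector database."""

    def __init__(self, texts: list[str]):
        # Embed every chunk once, at indexing time.
        self.entries = [(text, embed(text)) for text in texts]

    def retrieve(self, query: str, k: int = 2) -> list[str]:
        """Return the k stored texts most similar to the query."""
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


store = ToyVectorStore([
    "task decomposition breaks a large task into smaller steps",
    "memory lets an agent recall earlier conversation turns",
    "tool use lets an agent call external APIs",
])
top = store.retrieve("what is task decomposition", k=1)
print(top)
```

A real embedding model captures meaning rather than exact word overlap, so it can match a query to a chunk that shares no words with it; the ranking logic stays the same.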

The system combines the strengths of retrieval (supplying relevant context) and generative models (composing the response), so that replies are grounded in the source content rather than in the model's training data alone.
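
The retrieve-then-generate flow described above can be summarized in a few lines. This is a minimal sketch in plain Python rather than the notebook's LangChain chain: `rag_answer`, `format_docs`, `PROMPT_TEMPLATE`, and the two stub functions are hypothetical names, and the stubs replace the real retriever and the OpenAI model so that the example runs offline.

```python
from typing import Callable

# Assumed prompt wording, for illustration only.
PROMPT_TEMPLATE = (
    "Answer the question using only the context below.\n"
    "If the context is not enough, say that you don't know.\n\n"
    "Context:\n{context}\n\nQuestion: {question}"
)


def format_docs(docs: list[str]) -> str:
    """Join the retrieved chunks into a single context string."""
    return "\n\n".join(docs)


def rag_answer(
    question: str,
    retrieve: Callable[[str], list[str]],
    llm: Callable[[str], str],
) -> dict:
    """Retrieve context, build the prompt, and ask the model."""
    docs = retrieve(question)
    prompt = PROMPT_TEMPLATE.format(context=format_docs(docs), question=question)
    # Return the sources alongside the answer so they can be inspected.
    return {"answer": llm(prompt), "source_documents": docs}


def fake_retrieve(question: str) -> list[str]:
    """Stub retriever: always returns the same chunk."""
    return ["Task decomposition splits a goal into smaller steps."]


def fake_llm(prompt: str) -> str:
    """Stub model: a real implementation would call the chat model here."""
    return "stub answer"


result = rag_answer("What is task decomposition?", fake_retrieve, fake_llm)
print(result["answer"])
print(result["source_documents"])
```

Passing the retriever and the model in as plain callables keeps the two halves independent, so either one can be swapped or tested on its own.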

Built With

Git - Version Control System

Versioning

I use GitHub for versioning. For the versions available, see the tags on this repository.

Authors

Juan Sebastian Vasquez Vega - Sebasvasquezz

Date

November 14, 2024

License

This project is licensed under the GNU License - see the LICENSE.txt file for details.
