# RAG Implementation For Wikipedia Search Assistant with FAISS Vector Library, OpenAI and LangChain

## References:

1. Tutorial "FAISS Vector Library with LangChain and OpenAI (Semantic Search)" by RyanNolanData, available at [this link](https://www.youtube.com/watch?v=ZCSsIkyCZk4).

2. [Wikipedia](https://en.wikipedia.org/wiki/)


## Description
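The goal of this experiment is to implement a Retrieval-Augmented Generation (RAG) pipeline that answers user queries by combining two steps: retrieving relevant Wikipedia passages from a FAISS vector index, and generating an answer from those passages with an OpenAI chat model through LangChain.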



## Setup

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [2]:
cd "YOUR-PATH"

/content/drive/MyDrive/2-DL-ML/14_GenAI/2_RAG_Wikipedia


In [3]:
%%capture
# Install necessary libraries (%%capture is a cell magic, so it must be the first line of the cell)
!pip install langchain langchain-openai langchain-community wikipedia wikipedia-api faiss-cpu chromadb tiktoken sentence-transformers openai


In [None]:
# Import Libraries. These libraries will be used to build the RAG pipeline

from langchain_openai import OpenAI
from langchain_community.document_loaders import WikipediaLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_core.vectorstores import VectorStoreRetriever
from langchain.chains import RetrievalQA
from langchain_openai import ChatOpenAI  # replaces the deprecated langchain.chat_models import

In [5]:
# Set up OpenAI API Key
import os
os.environ['OPENAI_API_KEY'] = 'YOUR_OPENAI_API_KEY'
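
Hard-coding the key in a cell makes it easy to leak when the notebook is shared. As a minimal sketch, assuming an interactive session, the key can be read from the environment and requested only when it is missing. The helper name `ensure_openai_key` and its parameters are hypothetical and not part of any library.

```python
import os
from getpass import getpass


def ensure_openai_key(prompt=getpass, env=os.environ):
    """Return the OpenAI key, prompting only when the environment lacks one.

    Illustrative helper: `prompt` and `env` are injectable so the function
    can be exercised without a terminal or a real key.
    """
    key = env.get("OPENAI_API_KEY")
    if not key:
        # getpass hides the typed value, so the key never appears in cell output
        key = prompt("OpenAI API key: ").strip()
        if not key:
            raise ValueError("No OpenAI API key provided.")
        env["OPENAI_API_KEY"] = key
    return key
```

Calling `ensure_openai_key()` in place of the assignment above would keep the key out of the saved notebook.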

In [6]:
# Load Wikipedia Articles
# Define a function to fetch Wikipedia articles and convert them into documents
def load_wikipedia_articles(queries):
    documents = []
    for query in queries:
        try:
            loader = WikipediaLoader(query=query)
            docs = loader.load()
            documents.extend(docs)
        except Exception as e:
            print(f"Error loading {query}: {e}")
    return documents

In [7]:
# Load Wikipedia articles based on user-defined query
query = ["Artificial Intelligence"]
documents = load_wikipedia_articles(query)
print(f"Loaded {len(documents)} documents.")

Loaded 25 documents.


In [8]:
# Define RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=0,
    length_function=len
)

In [9]:
# Split the loaded documents into smaller chunks based on the text splitter configuration
docs = text_splitter.split_documents(documents)

In [10]:
# Print the total number of chunks created after splitting the documents
len(docs)

313
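
To make the effect of `chunk_size` and `chunk_overlap` concrete, here is a deliberately simplified stand-in for the splitter: a fixed-size sliding window over characters. It is not how `RecursiveCharacterTextSplitter` works internally, since the real splitter prefers to cut at separators such as paragraph breaks and can therefore return chunks much shorter than `chunk_size`. The function name `sliding_window_chunks` is hypothetical.

```python
def sliding_window_chunks(text, chunk_size=500, chunk_overlap=0):
    """Split text into windows of chunk_size characters.

    Consecutive windows share chunk_overlap characters. This is a simplified
    illustration, not the algorithm used by RecursiveCharacterTextSplitter.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reaches the end of the text
        start += step
    return chunks
```

With `chunk_overlap=0`, as configured above, a sentence that straddles a boundary is cut in two and each half is embedded without the other. A small overlap repeats the end of one chunk at the start of the next, at the cost of more chunks to embed.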

In [11]:
# Display the first chunk
docs[0]

Document(metadata={'title': 'Artificial intelligence', 'summary': 'Artificial intelligence (AI), in its broadest sense, is intelligence exhibited by machines, particularly computer systems. It is a field of research in computer science that develops and studies methods and software that enable machines to perceive their environment and use learning and intelligence to take actions that maximize their chances of achieving defined goals. Such machines may be called AIs.\nHigh-profile applications of AI include advanced web search engines (e.g., Google Search); recommendation systems (used by YouTube, Amazon, and Netflix); virtual assistants (e.g., Google Assistant, Siri, and Alexa); autonomous vehicles (e.g., Waymo); generative and creative tools (e.g., ChatGPT and AI art); and superhuman play and analysis in strategy games (e.g., chess and Go). However, many AI applications are not perceived as AI: "A lot of cutting edge AI has filtered into general applications, often without being cal

In [12]:
# Initialize the OpenAI Embeddings model
embedding = OpenAIEmbeddings()

In [13]:
# Create a FAISS vector store from the split document chunks and their embeddings
library = FAISS.from_documents(docs, embedding)
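
Conceptually, an exact ("flat") vector index answers a query by comparing the query embedding against every stored embedding and returning the closest ones. The sketch below shows that idea in plain Python with squared Euclidean distance. It assumes tiny toy vectors, ignores FAISS's optimized implementation entirely, and the names `squared_l2` and `top_k` are hypothetical.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def top_k(query_vec, vectors, k=4):
    """Return (index, distance) pairs for the k vectors closest to query_vec.

    Brute-force illustration of nearest-neighbor search; a real index does
    the same comparison far more efficiently.
    """
    scored = sorted((squared_l2(query_vec, v), i) for i, v in enumerate(vectors))
    return [(i, d) for d, i in scored[:k]]


# Toy 2-dimensional "embeddings" standing in for real embedding vectors
toy_vectors = [[0, 0], [1, 0], [3, 4]]
nearest = top_k([1, 1], toy_vectors, k=2)
```

Taking the square root would change the distance values but not the ranking, which is all the retrieval step needs.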

In [14]:
# Define an example of user query
query = "What is the impact of Artificial Intelligence on society?"

In [15]:
# Perform a similarity search in the vector store using the query
query_answer = library.similarity_search(query)

In [16]:
# Print the content of the first retrieved document chunk
print(query_answer[0].page_content)

the future of AI and its impact on society.


In [17]:
# Print the content of the second retrieved document chunk
print(query_answer[1].page_content)

The philosophy of artificial intelligence attempts to answer such questions as follows:


In [18]:
# Perform a similarity search that also returns a score for each result
# (with the default FAISS settings the score is an L2 distance, so lower means more similar)
docs_and_scores = library.similarity_search_with_score(query)

In [19]:
# Print the first retrieved document chunk along with its similarity score
docs_and_scores[0]

(Document(id='a95ef3f4-4b66-4ee6-bb07-6d0fad97b8ea', metadata={'title': 'History of artificial intelligence', 'summary': 'The history of artificial intelligence (AI) began in antiquity, with myths, stories, and rumors of artificial beings endowed with intelligence or consciousness by master craftsmen. The study of logic and formal reasoning from antiquity to the present led directly to the invention of the programmable digital computer in the 1940s, a machine based on abstract mathematical reasoning. This device and the ideas behind it inspired scientists to begin discussing the possibility of building an electronic brain.\nThe field of AI research was founded at a workshop held on the campus of Dartmouth College in 1956. Attendees of the workshop became the leaders of AI research for decades. Many of them predicted that machines as intelligent as humans would exist within a generation. The U.S. government provided millions of dollars with the hope of making this vision come true.\nEve

In [20]:
# Print the second retrieved document chunk along with its similarity score
docs_and_scores[1]

(Document(id='b2224e0a-570e-42eb-a4d4-39195e0669b3', metadata={'title': 'Philosophy of artificial intelligence', 'summary': 'The philosophy of artificial intelligence is a branch of the philosophy of mind and the philosophy of computer science that explores artificial intelligence and its implications for knowledge and understanding of intelligence, ethics, consciousness, epistemology, and free will. Furthermore, the technology is concerned with the creation of artificial animals or artificial people (or, at least, artificial creatures; see artificial life) so the discipline is of considerable interest to philosophers. These factors contributed to the emergence of the philosophy of artificial intelligence.\nThe philosophy of artificial intelligence attempts to answer such questions as follows:\n\nCan a machine act intelligently? Can it solve any problem that a person would solve by thinking?\nAre human intelligence and machine intelligence the same? Is the human brain essentially a com

In [21]:
# Print the third retrieved document chunk along with its similarity score
docs_and_scores[2]

(Document(id='e239effc-6a4e-4b12-8e23-5648179678c4', metadata={'title': 'Artificial intelligence in India', 'summary': "The AI market in India is projected to reach $8 billion by 2025, growing at a compound annual growth rate (CAGR) of over 40% from 2020 to 2025. \nThis growth is part of the broader AI boom, a global period of rapid technological advancements starting in the late 2010s and gaining prominence in the early 2020s. Globally, breakthroughs in protein folding by Google DeepMind and the rise of generative AI models from OpenAI have defined this era. In India, the development of AI has been similarly transformative, with applications in healthcare, finance, and education, bolstered by government initiatives like NITI Aayog's 2018 National Strategy for Artificial Intelligence.\nWhile AI presents significant opportunities for economic growth and social development in India, challenges such as data privacy concerns, skill shortages, and ethical considerations need to be addressed
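
Because each result is a `(document, score)` pair, weak matches can be discarded before they reach the language model. The sketch below assumes the score is a distance, so smaller values mean closer matches. The helper name `filter_by_distance` and any threshold passed to it are hypothetical and would need tuning on real queries.

```python
def filter_by_distance(docs_and_scores, max_distance):
    """Keep (document, score) pairs with score <= max_distance, closest first.

    Assumes scores are distances (lower is better). The threshold is an
    illustrative parameter, not a recommended value.
    """
    kept = [(doc, score) for doc, score in docs_and_scores if score <= max_distance]
    return sorted(kept, key=lambda pair: pair[1])
```

For example, `filter_by_distance(docs_and_scores, max_distance=0.4)` would keep only the chunks whose distance to the query is at most 0.4.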

In [39]:
# Create a retriever from the FAISS vector store
retriever = library.as_retriever()

In [40]:
# Create a RetrievalQA chain using the retriever and LLM
qa = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model="gpt-4o-mini"),
    chain_type="stuff",
    retriever=retriever,
    return_source_documents=True
)
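
The `"stuff"` chain type places all retrieved chunks into a single prompt together with the question. The sketch below illustrates the idea with a made-up prompt template. The wording is an assumption for illustration only and is not the template LangChain actually uses, and `build_stuff_prompt` is a hypothetical name.

```python
def build_stuff_prompt(question, chunks):
    """Concatenate retrieved chunks into one context block followed by the question.

    Illustrative only: the instruction text is invented for this sketch.
    """
    context = "\n\n".join(chunks)
    return (
        "Use the following pieces of context to answer the question. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

Since every chunk goes into one prompt, this approach only works while the retrieved chunks fit in the model's context window, which is one reason to keep `chunk_size` and the number of retrieved chunks small.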

In [41]:
# Define the user's query to retrieve relevant information from the vector store
retriever_query = "What is the impact of Artificial Intelligence on society?"

In [45]:
# Use the QA chain to process the query and retrieve the results
results = qa.invoke(retriever_query)

In [46]:
# Process and display the query and result:
def display_result(query, result):
    print("\n=== Query ===")
    print(query)  # Print the retriever query

    print("\n=== Answer ===")
    print(result["result"])  # Print the answer

    print("\n=== Sources ===")
    for idx, source in enumerate(result["source_documents"], start=1):
        print(f"{idx}. {source.metadata.get('source', 'Unknown Source')}")

# Reuse the `results` computed in the previous cell instead of calling the chain (and the API) a second time
display_result(retriever_query, results)


=== Query ===
What is the impact of Artificial Intelligence on society?

=== Answer ===
The impact of Artificial Intelligence on society encompasses a wide range of areas, including:

1. **Economic Changes**: AI can lead to technological unemployment as machines take over tasks previously performed by humans. This may result in job displacement but could also create new job opportunities in AI-related fields.

2. **Ethical Considerations**: The development of machine ethics raises questions about how to ensure AI behaves in ways that align with human values. This includes concerns about lethal autonomous weapon systems and the potential for AI to be used in harmful ways.

3. **Misinformation**: AI has the potential to generate and spread misinformation, which can affect public opinion and trust in information sources.

4. **AI Safety and Alignment**: Ensuring that AI systems operate safely and align with human intentions is a significant challenge, especially as AI systems become more
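
Several retrieved chunks often come from the same Wikipedia article, so the source list printed by `display_result` can repeat the same entry. A minimal sketch for listing each source once is shown below. It assumes each source document exposes a `metadata` dictionary, falls back to the `title` key when `source` is absent, and uses the hypothetical helper name `unique_sources`.

```python
def unique_sources(source_documents, key="source"):
    """Return source labels in first-seen order without duplicates.

    Falls back to the document title, then to 'Unknown Source', when the
    requested metadata key is missing.
    """
    seen = []
    for doc in source_documents:
        label = doc.metadata.get(key) or doc.metadata.get("title", "Unknown Source")
        if label not in seen:
            seen.append(label)
    return seen
```

Iterating over `unique_sources(result["source_documents"])` inside `display_result` would print each article once.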

In [30]:
# Save the FAISS vector store to a local file ("faiss_index_AI")
# This allows reusing the vector store without recalculating embeddings.
library.save_local("faiss_index_AI")

In [33]:
# Load the FAISS vector store from the saved file
AI_saved = FAISS.load_local("faiss_index_AI", embedding, allow_dangerous_deserialization=True)
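
The `allow_dangerous_deserialization=True` flag is needed because part of the saved index is a pickle file, and unpickling untrusted data can execute arbitrary code, so only indexes created locally should be loaded this way. The sketch below checks whether a saved index is present before loading it. It assumes `save_local` wrote `index.faiss` and `index.pkl` into the folder (the default file names, to my understanding), and `faiss_index_exists` is a hypothetical helper.

```python
from pathlib import Path


def faiss_index_exists(folder, index_name="index"):
    """Return True when both files of a saved index are present in folder.

    Assumes the on-disk layout '<index_name>.faiss' plus '<index_name>.pkl'.
    """
    folder = Path(folder)
    return all(
        (folder / f"{index_name}{ext}").is_file() for ext in (".faiss", ".pkl")
    )
```

A notebook could call `faiss_index_exists("faiss_index_AI")` and rebuild the index with `FAISS.from_documents` only when it returns `False`, which avoids paying for the embeddings again.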

In [34]:
# Create a RetrievalQA chain using the loaded FAISS retriever and a language model
qa = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model="gpt-4o-mini"),
    chain_type="stuff",
    retriever=AI_saved.as_retriever(),
    return_source_documents=True
)

In [35]:
# Use the QA chain to process the query and retrieve results
results = qa.invoke(retriever_query)

In [38]:
# Process and display the query and result:
def display_result(query, result):
    print("\n=== Query ===")
    print(query)  # Print the retriever query

    print("\n=== Answer ===")
    print(result["result"])  # Print the answer

    print("\n=== Sources ===")
    for idx, source in enumerate(result["source_documents"], start=1):
        print(f"{idx}. {source.metadata.get('source', 'Unknown Source')}")

# Reuse the `results` computed in the previous cell instead of calling the chain (and the API) a second time
display_result(retriever_query, results)


=== Query ===
What is the impact of Artificial Intelligence on society?

=== Answer ===
The impact of Artificial Intelligence on society is multifaceted and encompasses both positive and negative aspects. Some of the potential impacts include:

1. **Economic Changes**: AI can lead to increased productivity and efficiency in various industries, potentially driving economic growth. However, it may also result in technological unemployment as certain jobs become automated.

2. **Healthcare Improvements**: AI has the potential to revolutionize healthcare by improving diagnostics, personalizing treatment plans, and enhancing research capabilities.

3. **Ethical Considerations**: The rise of AI raises important ethical questions, including how to ensure that AI systems behave ethically and the implications of creating lethal autonomous weapon systems.

4. **Misinformation and Manipulation**: AI can be used to generate and spread misinformation, posing challenges for public discourse and tru