### Installing required packages

In [None]:
!pip install langchain faiss-cpu langchain-community sentence-transformers
!pip install -q --upgrade langchain-google-genai google-generativeai
!pip show langchain-google-genai


### Set the Google Gemini API key in the environment

In [None]:

import getpass
import os
if 'GOOGLE_API_KEY' not in os.environ:
    os.environ['GOOGLE_API_KEY'] = getpass.getpass('Provide your Google API Key: ')

Provide your Google API Key: ··········
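The cell above stores whatever is typed, including an empty string or stray whitespace from a paste. A minimal sketch of a slightly stricter variant follows; `ensure_api_key`, its `environ` parameter, and its `prompt_fn` parameter are hypothetical names introduced here so the logic can be exercised without a terminal prompt.

```python
import getpass
import os


def ensure_api_key(env_var="GOOGLE_API_KEY", environ=None, prompt_fn=getpass.getpass):
    """Return the key stored under env_var, prompting for it only if it is missing or blank."""
    environ = os.environ if environ is None else environ
    key = environ.get(env_var, "").strip()
    if not key:
        # Prompt without echoing the input, then trim whitespace from a pasted value
        key = prompt_fn(f"Provide your {env_var}: ").strip()
        if not key:
            raise ValueError(f"{env_var} must not be empty")
        environ[env_var] = key
    return key


# Demo with a plain dict and a stand-in prompt (hypothetical values, no real key involved)
demo_env = {}
ensure_api_key(environ=demo_env, prompt_fn=lambda _: "  demo-key  ")
print(demo_env)
```

In the notebook itself, calling `ensure_api_key()` with no arguments would read and update `os.environ` and prompt through `getpass`.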


### List the models available through the Google Gemini API

In [None]:
import google.generativeai as genai
genai.configure(api_key=os.environ['GOOGLE_API_KEY'])
for model in genai.list_models():
    print(model.name)

models/chat-bison-001
models/text-bison-001
models/embedding-gecko-001
models/gemini-1.0-pro-latest
models/gemini-1.0-pro
models/gemini-pro
models/gemini-1.0-pro-001
models/gemini-1.0-pro-vision-latest
models/gemini-pro-vision
models/gemini-1.5-pro-latest
models/gemini-1.5-pro-001
models/gemini-1.5-pro-002
models/gemini-1.5-pro
models/gemini-1.5-pro-exp-0801
models/gemini-1.5-pro-exp-0827
models/gemini-1.5-flash-latest
models/gemini-1.5-flash-001
models/gemini-1.5-flash-001-tuning
models/gemini-1.5-flash
models/gemini-1.5-flash-exp-0827
models/gemini-1.5-flash-8b-exp-0827
models/gemini-1.5-flash-8b-exp-0924
models/gemini-1.5-flash-002
models/embedding-001
models/text-embedding-004
models/aqa
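The list mixes chat, embedding, and question-answering models, and only some of them can be used for text generation later on. As a rough sketch, the objects returned by `genai.list_models()` expose a `supported_generation_methods` attribute that can be used to filter them. The helper name `names_supporting` is hypothetical, and the stand-in objects below are illustrative only, not real API output.

```python
from types import SimpleNamespace


def names_supporting(models, method="generateContent"):
    """Return the names of models that list `method` among their supported generation methods."""
    return [
        m.name
        for m in models
        if method in getattr(m, "supported_generation_methods", [])
    ]


# Stand-in objects so the sketch runs without an API key (illustrative values)
sample_models = [
    SimpleNamespace(name="models/gemini-1.5-flash",
                    supported_generation_methods=["generateContent", "countTokens"]),
    SimpleNamespace(name="models/text-embedding-004",
                    supported_generation_methods=["embedContent"]),
]
print(names_supporting(sample_models))
```

With a configured client, `names_supporting(genai.list_models())` would apply the same filter to the live list.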


### Creating some sample documents

In [None]:
# Creating sample documents
import os

folder_path = './docs'
os.makedirs(folder_path, exist_ok=True)

documents_data = [
    {
        'source': 'quantum_computing.txt',
        'content': '''Quantum computing is rapidly advancing, with significant breakthroughs in error correction, qubit coherence, and quantum algorithms. One of the key challenges facing quantum computing is maintaining qubit stability, as even slight environmental disturbances can cause qubits to lose coherence. However, recent advancements in error correction techniques, such as surface codes, are helping to mitigate this issue.

In 2023, researchers achieved quantum supremacy for a specialized task, demonstrating that a quantum computer could perform a calculation faster than the most powerful classical computers. This breakthrough paves the way for practical quantum computing applications in areas such as cryptography, drug discovery, and materials science.

Another promising development is the improvement in quantum hardware. New approaches to building qubits, including topological qubits and photonic qubits, are being explored to increase the scalability and reliability of quantum systems. As the field progresses, the race to build the first fully functional quantum computer continues, with tech giants and research institutions heavily investing in quantum technology.
'''
    },
    {
        'source': 'ai_impact_on_society.txt',
        'content': '''Artificial intelligence (AI) is transforming various industries, from healthcare to finance, by automating tasks, optimizing processes, and providing deep insights from data. However, as AI becomes more prevalent, it raises significant ethical and societal concerns.

One key issue is the potential loss of jobs due to automation. While AI can enhance productivity, many worry that automation will replace jobs, particularly in sectors like manufacturing and customer service. This has led to calls for retraining and reskilling programs to help workers adapt to the AI-driven economy.

Another concern is bias in AI systems. Machine learning models are often trained on biased datasets, which can lead to discriminatory outcomes. For example, AI algorithms used in hiring or lending decisions might favor certain groups over others. Researchers are working on developing fair and transparent AI systems to mitigate these issues.

Despite these challenges, AI has the potential to bring immense benefits. In healthcare, AI-powered tools can assist doctors in diagnosing diseases and personalizing treatments. In education, AI can provide tailored learning experiences for students. The key will be to harness AI's potential while addressing its risks through robust regulation and ethical frameworks.
'''
    },
    {
        'source': 'data_privacy.txt',
        'content': '''As AI systems become more integrated into daily life, concerns over data privacy have become paramount. AI models require vast amounts of data to train, and much of this data is personal, ranging from browsing habits to medical records.

One of the primary challenges in ensuring data privacy is that AI models can inadvertently reveal sensitive information. For instance, large language models trained on personal conversations or emails may unintentionally memorize and leak private details. To address this issue, researchers are developing techniques such as differential privacy, which allows AI models to learn from data while protecting individual privacy.

Governments worldwide are enacting legislation to protect citizens' data. The General Data Protection Regulation (GDPR) in Europe is one of the most stringent data privacy laws, requiring companies to obtain explicit consent before collecting personal data. Similarly, the California Consumer Privacy Act (CCPA) gives consumers the right to know what data is being collected about them and to request its deletion.

The future of AI will require a careful balance between innovation and privacy. Companies must prioritize transparency in their data practices and develop AI models that respect user privacy without compromising on performance.
'''
    },
    {
        'source': 'deploying_ai_models.txt',
        'content': '''Deploying AI models in production environments comes with a unique set of challenges. One of the primary hurdles is the scalability of AI models. While many AI models perform well in controlled environments, scaling them to handle real-world data and workloads can be complex. Data pipelines need to be robust, and the infrastructure must be able to support the computational demands of large models.

Another challenge is model explainability. Many AI models, particularly deep learning models, are considered "black boxes," meaning it's difficult to understand how they arrive at their decisions. This lack of transparency is a problem in industries like healthcare and finance, where decisions need to be justified. Researchers are working on developing more interpretable AI models or providing post-hoc explanations for decisions made by complex models.

Data privacy is another concern. AI models often require access to sensitive data, and ensuring that this data is handled securely is crucial. Techniques such as federated learning, where models are trained locally on devices without transferring data to central servers, are being explored to address this issue.

Finally, AI models require constant monitoring and retraining. As new data becomes available, models may drift from their original performance, necessitating retraining to ensure they remain accurate and effective.
'''
    },
    {
        'source': 'ai_in_drug_discovery.txt',
        'content': '''AI is playing a transformative role in drug discovery by accelerating the process of identifying potential drug candidates and optimizing clinical trials. Traditional drug discovery can take years, but AI models are helping to reduce this timeline by analyzing large datasets of chemical compounds, biological data, and patient information.

One of the key applications of AI in drug discovery is virtual screening, where AI models predict the likelihood of certain molecules being effective against a disease target. These models can quickly sift through vast libraries of compounds to identify the most promising candidates for further testing.

AI is also being used in precision medicine. By analyzing patient data, AI can help identify which treatments are most likely to be effective for individual patients, leading to more personalized and effective treatments. This approach has already shown promise in fields such as oncology, where AI models are helping to develop targeted therapies for cancer patients.

The integration of AI into drug discovery is not without challenges. Data quality and availability are critical factors in training accurate AI models. Additionally, regulatory agencies are still adapting to AI-driven approaches, and ensuring that AI tools meet the necessary safety and efficacy standards will be essential.
'''
    }
]

# Write each sample document to its own text file
for doc in documents_data:
    with open(os.path.join(folder_path, doc['source']), 'w', encoding='utf-8') as f:
        f.write(doc['content'])

### Loading documents, creating embeddings, and storing them in a local FAISS store

In [None]:
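# Import from the split packages; the old `langchain.*` paths are deprecated
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import CharacterTextSplitter
from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import HuggingFaceEmbeddings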

# Initialize an empty list to store all document objects
all_documents = []

# Split documents into chunks for efficient retrieval (the splitter is reusable, so create it once)
text_splitter = CharacterTextSplitter(chunk_size=500, chunk_overlap=100)

# Load each document, split it, and add its chunks to the list
for document in documents_data:
    file_path = os.path.join(folder_path, document['source'])
    documents = TextLoader(file_path).load()
    all_documents.extend(text_splitter.split_documents(documents))

# Embed the chunks with the default Hugging Face sentence-transformers model (all-mpnet-base-v2)
embeddings = HuggingFaceEmbeddings()

# Create a FAISS vector store from the document chunks
vector_store = FAISS.from_documents(all_documents, embeddings)

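To make the `chunk_size=500, chunk_overlap=100` settings more concrete, here is a simplified pure-Python sketch of separator-based chunking. `CharacterTextSplitter` splits on blank lines (`"\n\n"`) by default and merges the pieces back together up to the size limit. The function below, `split_paragraphs`, is a hypothetical approximation of that idea and not LangChain's actual implementation.

```python
def split_paragraphs(text, chunk_size=500, chunk_overlap=100, separator="\n\n"):
    """Greedily merge paragraphs into chunks of at most chunk_size characters.

    Trailing paragraphs that fit within chunk_overlap are repeated at the
    start of the next chunk. A single paragraph longer than chunk_size is
    kept whole rather than cut.
    """
    pieces = [p.strip() for p in text.split(separator) if p.strip()]
    chunks, current = [], []
    for piece in pieces:
        if current and len(separator.join(current + [piece])) > chunk_size:
            chunks.append(separator.join(current))
            # Carry over as many trailing paragraphs as fit in the overlap budget
            carried = []
            for prev in reversed(current):
                if len(separator.join([prev] + carried)) > chunk_overlap:
                    break
                carried.insert(0, prev)
            # Drop carried paragraphs again if they would push the new chunk over the limit
            while carried and len(separator.join(carried + [piece])) > chunk_size:
                carried.pop(0)
            current = carried
        current.append(piece)
    if current:
        chunks.append(separator.join(current))
    return chunks


# Three 60-character paragraphs with a small limit so the overlap is visible
demo_text = "\n\n".join(["x" * 60, "y" * 60, "z" * 60])
demo_chunks = split_paragraphs(demo_text, chunk_size=130, chunk_overlap=70)
print(len(demo_chunks))
```

One consequence worth noting under this sketch: the sample paragraphs are each several hundred characters long, so a 100-character overlap budget rarely fits a whole paragraph, and neighbouring chunks may share little or no text.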

### Retrieval-Augmented Generation (RAG) pipeline using LangChain's tools

In [None]:
from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate
from langchain_google_genai import ChatGoogleGenerativeAI

prompt_template = """
You are an AI assistant capable of answering questions based on the following documents:

Documents:
{context}

Question: {question}

Answer:
"""

prompt = PromptTemplate(template=prompt_template, input_variables=["context", "question"])

# Use FAISS to retrieve documents based on the query
retriever = vector_store.as_retriever(search_type="similarity", search_kwargs={"k": 3})

# Initialize the Google Gemini language model
llm = ChatGoogleGenerativeAI(model='gemini-1.5-flash', temperature=0.9)

# Define the RAG chain, passing the custom prompt to the "stuff" documents chain
rag_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=retriever,
    chain_type="stuff",
    chain_type_kwargs={"prompt": prompt},
    return_source_documents=True,
    output_key="answer"
)
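Conceptually, the retriever with `k=3` returns the chunks whose embeddings lie closest to the query embedding, and the `"stuff"` chain type pastes those chunks into the prompt's `{context}` slot. The sketch below imitates both steps in plain Python. It assumes Euclidean (L2) distance, which to my understanding is the default for LangChain's FAISS wrapper. The two-dimensional vectors, the helper names, and the shortened template are all made up for illustration.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def top_k(query_vec, indexed_chunks, k=3):
    """Return the texts of the k chunks closest to query_vec.

    indexed_chunks is a list of (vector, text) pairs.
    """
    ranked = sorted(indexed_chunks, key=lambda item: squared_l2(query_vec, item[0]))
    return [text for _, text in ranked[:k]]


def stuff_prompt(template, chunks, question):
    """Join the retrieved chunks and fill them into the prompt template."""
    return template.format(context="\n\n".join(chunks), question=question)


TEMPLATE = "Documents:\n{context}\n\nQuestion: {question}\n\nAnswer:"

# Toy index: hand-picked 2-D vectors standing in for real embeddings
toy_index = [
    ([0.9, 0.1], "Federated learning trains models locally on devices."),
    ([0.0, 1.0], "Surface codes help with quantum error correction."),
    ([1.0, 0.2], "Differential privacy protects individual records."),
    ([-1.0, 0.0], "Virtual screening ranks candidate molecules."),
]
retrieved = top_k([1.0, 0.0], toy_index, k=2)
filled_prompt = stuff_prompt(TEMPLATE, retrieved, "How is privacy protected?")
print(filled_prompt)
```

Because a query is embedded as a single vector, a question that spans several topics tends to pull in chunks from whichever topic is nearest, which is worth keeping in mind when choosing `k`.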

### Parsing input and output, and generating the answer

In [None]:
def query_rag_chain(query):
    # Pass the query to the RAG pipeline (calling the chain directly is deprecated; use invoke)
    result = rag_chain.invoke({"query": query})

    # Output formatting: Display the answer and source documents
    answer = result['answer']
    sources = result['source_documents']

    print(f"Answer: {answer}")
    print("\nSources:")
    for i, doc in enumerate(sources):
        print(f"{i+1}. {doc.metadata['source']}: {doc.page_content[:200]}...")

# Example input query
# query = "What are the latest advancements in quantum computing?"
# query = "How does AI affect data privacy, and what are the main concerns? How do regulatory agencies adapt to AI-driven drug discovery approaches?"
query = "How can federated learning address data privacy concerns in AI? How are quantum computers being used in drug discovery and materials science? How is it impacting job markets across different industries?"
query_rag_chain(query)


Answer: Here's what I can tell you based on the provided context:

* **Federated learning and data privacy:**  The provided text explains that federated learning helps address data privacy concerns by training AI models locally on devices without transferring data to central servers. This means sensitive data remains on individual devices, reducing the risk of breaches or misuse.

* **Quantum computers and drug discovery/materials science:** The provided context doesn't offer any information about quantum computers or their applications in drug discovery or materials science. 

* **Impact of AI on job markets:** The provided context doesn't offer information about the impact of AI on job markets across different industries. 


Sources:
1. ./docs/deploying_ai_models.txt: Data privacy is another concern. AI models often require access to sensitive data, and ensuring that this data is handled securely is crucial. Techniques such as federated learning, where models are t...
2. ./docs/data_