<a href="https://colab.research.google.com/github/charurathour/Data-science-projects/blob/main/gemini_qa_generator.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
!pip install -q -U google-generativeai

In [None]:
import pathlib
import textwrap

import google.generativeai as genai

# Used to securely store your API key
from google.colab import userdata

from IPython.display import display
from IPython.display import Markdown


def to_markdown(text):
  text = text.replace('•', '  *')
  return Markdown(textwrap.indent(text, '> ', predicate=lambda _: True))

In [None]:
# Or use `os.getenv('GOOGLE_API_KEY')` to fetch an environment variable.
GOOGLE_API_KEY=userdata.get('gemini')

genai.configure(api_key=GOOGLE_API_KEY)
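The comment in the cell above mentions `os.getenv('GOOGLE_API_KEY')` as an alternative to Colab's `userdata`. A minimal sketch of that route, assuming the key has been exported under that name; the helper `load_api_key` is hypothetical and not part of `google-generativeai`:

```python
import os


def load_api_key(env_var="GOOGLE_API_KEY"):
    """Return the API key stored in the environment variable `env_var`.

    Illustrative helper only; the variable name is an assumption.
    """
    key = os.getenv(env_var)
    if not key:
        # Fail early with a clear message instead of a later authentication error.
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return key
```

With this in place, the configuration call could read `genai.configure(api_key=load_api_key())`.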

In [None]:
for m in genai.list_models():
  if 'generateContent' in m.supported_generation_methods:
    print(m.name)

models/gemini-pro
models/gemini-pro-vision


In [None]:
model = genai.GenerativeModel('gemini-pro')

In [None]:
!pip install PyPDF2



In [None]:
import PyPDF2

def extract_text_from_pdf(pdf_path):
    with open(pdf_path, 'rb') as file:
        reader = PyPDF2.PdfReader(file)
        num_pages = len(reader.pages)

        text = ''
        for page_num in range(num_pages):
            page = reader.pages[page_num]
            text += page.extract_text()

        return text

# Replace 'your_pdf_file.pdf' with the path to your PDF file
pdf_text = extract_text_from_pdf('/content/Defensive Distillation is Not Robust to Adversarial Examples.pdf')




In [None]:
pdf_text

'Robustness Veriﬁcation of Tree-based Models\nHongge Chen*,1Huan Zhang*,2Si Si3Yang Li3Duane Boning1Cho-Jui Hsieh2,3\n1Department of EECS, MIT\n2Department of Computer Science, UCLA\n3Google Research\nchenhg@mit.edu, huan@huan-zhang.com, sisidaisy@google.com\nliyang@google.com, boning@mtl.mit.edu, chohsieh@cs.ucla.edu\n*Hongge Chen and Huan Zhang contributed equally.\nAbstract\nWe study the robustness veriﬁcation problem for tree based models, including\ndecision trees, random forests (RFs) and gradient boosted decision trees (GBDTs).\nFormal robustness veriﬁcation of decision tree ensembles involves ﬁnding the\nexact minimal adversarial perturbation or a guaranteed lower bound of it. Existing\napproaches ﬁnd the minimal adversarial perturbation by a mixed integer linear\nprogramming (MILP) problem, which takes exponential time so is impractical for\nlarge ensembles. Although this veriﬁcation problem is NP-complete in general,\nwe give a more precise complexity characterization. We sho

In [None]:
combined_text = f"""
    Title/Context/Information:
    {pdf_text}

    Question Generation Prompt:
    Generate 20 unique questions based on the provided information
    """

In [None]:
!pip install python-docx



In [None]:
response = model.generate_content(combined_text)
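The prompt sent here is assembled by hand, and later cells repeat the same template with only the context and the number of questions changed. One possible way to factor that out is sketched below; `build_question_prompt` is a hypothetical helper whose wording mirrors the hand-written prompt:

```python
import textwrap


def build_question_prompt(context, n_questions=20):
    """Assemble the question-generation prompt used in this notebook."""
    instructions = textwrap.dedent(f"""\
        Question Generation Prompt:
        Generate {n_questions} unique questions based on the provided information
        """)
    # The context is concatenated after dedent() so that multi-line context
    # cannot interfere with the indentation stripping.
    return "Title/Context/Information:\n" + context + "\n\n" + instructions
```

A call such as `model.generate_content(build_question_prompt(pdf_text, 50))` would then replace the inline f-string.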

In [None]:
response.text

'1. What is the main idea behind adversarial demonstration attacks on large language models?\n\n\n2. What is the purpose of using in-context learning (ICL) in natural language processing (NLP) tasks?\n\n\n3. How does the proposed ICL attack method, advICL, differ from traditional text-based adversarial attacks?\n\n\n4. What are the key findings from the experiments conducted using the advICL attack method?\n\n\n5. How does the number of demonstrations used in ICL impact the robustness of the models against adversarial attacks?\n\n\n6. What is the significance of the observed transferability of adversarial demonstrations across different test examples?\n\n\n7. What are some potential applications of the advICL attack method in practical settings?\n\n\n8. How can the results of this study contribute to improving the robustness of ICL against adversarial attacks?\n\n\n9. What are some limitations or potential challenges associated with the advICL attack method?\n\n\n10. What future resear
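The response above is one string of numbered questions separated by blank lines. If the questions are needed individually (for example, to answer them one at a time later), they can be split with a regular expression. This is a sketch that assumes the model keeps the `1.`, `2.`, ... numbering shown above, which is not guaranteed; `split_numbered_questions` is a hypothetical helper:

```python
import re


def split_numbered_questions(text):
    """Split text made of numbered items into a list of question strings."""
    # Each item starts on a line that begins with digits followed by a period.
    items = re.split(r"(?m)^\s*\d+\.\s+", text)
    return [item.strip() for item in items if item.strip()]
```

For example, `questions = split_numbered_questions(response.text)`.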

In [None]:
from docx import Document
doc = Document()


In [None]:
doc.add_paragraph(response.text)
doc.add_page_break()

# Save the Word document
doc.save('question_doc3.docx')

In [None]:
from docx import Document
def add_responses_to_word(text):
    doc = Document()
    combined_text = f"""
    Title/Context/Information:
    {text}

    Question Generation Prompt:
    Generate 50 unique questions based on the provided information
    """
    # Ask the model for questions about the supplied text
    response = model.generate_content(combined_text)

    # Add content to the Word document
    doc.add_paragraph(response.text)
    doc.add_page_break()

    # Save the Word document
    doc.save('question_doc1.docx')

# Add responses to the Word document
add_responses_to_word(pdf_text)

In [None]:
%%time
response = model.generate_content(combined_text)

CPU times: user 279 ms, sys: 28.2 ms, total: 307 ms
Wall time: 23.1 s


In [None]:
response.text

"**Questions:**\n\n1. What is the main cause of the water crisis in Bengaluru?\n2. What are the major sources of water for Bengaluru city?\n3. What percentage of the city's population has access to piped water supply?\n4. What is the per capita water availability in Bengaluru?\n5. What are the challenges with regard to water provision in Bengaluru?\n6. What is the main reason for the high water loss in Bengaluru?\n7. What are the strategies for the city's water woes?\n8. What are the benefits of roof-top rainwater harvesting?\n9. What are the challenges and constraints in adopting rooftop rainwater harvesting in Bengaluru?\n10. How can waste water be used to supplement the existing water resources in Bengaluru?\n11. What are the limitations of using treated wastewater?\n12. How can the rejuvenation of lakes help improve water availability in Bengaluru?\n13. What are the challenges associated with lake rejuvenation in Bengaluru?\n14. What is the proposed solution to augment the city's w

In [None]:
%%time
response1 = model.generate_content(combined_text)

CPU times: user 277 ms, sys: 38.7 ms, total: 316 ms
Wall time: 24.2 s


In [None]:
response1.text

"**Question 1:** What is the primary source of water for the city of Bengaluru?\n**Answer:** The primary sources of water for Bengaluru are the Cauvery River and Arkavathi River.\n\n**Question 2:** What percentage of the city's water supply comes from surface water sources?\n**Answer:** Approximately 60% of Bengaluru's water supply comes from surface water sources.\n\n**Question 3:** How much water does BWSSB supply to the city per day?\n**Answer:** BWSSB supplies approximately 950 MLD (million liters per day) of water to Bengaluru.\n\n**Question 4:** What is the estimated water demand for Bengaluru's population?\n**Answer:** The estimated water demand for Bengaluru's population is around 1342 MLD.\n\n**Question 5:** What is the main challenge in providing efficient water distribution in Bengaluru?\n**Answer:** The inadequate infrastructure and high transmission losses (UFW) are the main challenges in providing efficient water distribution.\n\n**Question 6:** What is the average per ca

In [None]:
import PyPDF2

def extract_paragraphs_from_pdf(pdf_path):
    with open(pdf_path, 'rb') as file:
        pdf_reader = PyPDF2.PdfReader(file)
        num_pages = len(pdf_reader.pages)  # Get the number of pages directly
        print(num_pages)
        paragraphs = []
        for page_num in range(num_pages):
            page = pdf_reader.pages[page_num]
            text = page.extract_text()
            # Split text into paragraphs based on a delimiter (e.g., newline)
            paragraphs.extend(text.split('\n\n'))  # Change the delimiter as needed

        return paragraphs
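Depending on the PDF, the extracted text may contain few or no blank lines, in which case splitting on `'\n\n'` returns whole pages rather than paragraphs. An alternative is to pack consecutive lines into chunks under a character budget. The sketch below is one such approach; the helper name and the default of 2000 characters are assumptions, not values from this notebook:

```python
def pack_lines_into_chunks(text, max_chars=2000):
    """Group consecutive non-empty lines into chunks of about `max_chars` characters.

    A single line longer than `max_chars` is kept whole as its own chunk.
    """
    chunks, current, size = [], [], 0
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Start a new chunk when adding this line would exceed the budget.
        if current and size + len(line) + 1 > max_chars:
            chunks.append(" ".join(current))
            current, size = [], 0
        current.append(line)
        size += len(line) + 1
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Inside `extract_paragraphs_from_pdf`, the call `paragraphs.extend(pack_lines_into_chunks(text))` could stand in for the `split('\n\n')` line.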

In [None]:
# Replace 'path_to_your_pdf.pdf' with the path to your PDF file
pdf_paragraphs = extract_paragraphs_from_pdf('/content/10-CWRDM-Bengaluru-water.pdf')


10


In [None]:
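# Generate question-and-answer pairs for each extracted paragraph
# and keep every response so that none is overwritten by the next one
responses = []
for page in pdf_paragraphs:
  combined_text = f"""
    Title/Context/Information:
    {page}

    Question Generation Prompt:
    Generate 50 unique question and answer pairs together based on the provided information
    """
  response = model.generate_content(combined_text)
  responses.append(response.text)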


In [None]:
from docx import Document
def add_responses_to_word(paragraphs):
    doc = Document()
    for idx, page in enumerate(paragraphs, start=1):
        combined_text = f"""Task Description: You are a school teacher who has to prepare questions and their answers for students' practice.
                    Generate questions and answers based on the provided text. The generated questions and answers should cover various aspects such as factual details, inferences, summaries, and implications derived from the text. Below is an example.

                    Example Text:
                    "The discovery of antibiotics revolutionized medicine in the 20th century. Alexander Fleming's accidental discovery of penicillin in 1928 paved the way for the widespread use of antibiotics. These medications have saved countless lives by effectively combating bacterial infections. However, overuse and misuse of antibiotics have led to the development of antibiotic-resistant bacteria, posing a significant challenge to modern medicine."

                    Generated QA:

                    Q: What was the year of Alexander Fleming's accidental discovery of penicillin?
                    A: 1928
                    Q: How did the discovery of antibiotics impact medical treatment in the 20th century?
                    A: The discovery of antibiotics revolutionized medical treatment in the 20th century by introducing a highly effective means of combating bacterial infections. It enabled the successful treatment of previously life-threatening illnesses and significantly reduced mortality rates associated with infections. This breakthrough facilitated the development of various medical procedures, surgeries, and treatments that relied on the ability to control bacterial infections, marking a pivotal advancement in modern medicine.
                    Q: What challenges arose due to the overuse and misuse of antibiotics?
                    A: The overuse and misuse of antibiotics have led to the emergence of antibiotic-resistant bacteria. This phenomenon poses a significant challenge to modern medicine as these resistant strains render many antibiotics ineffective, making infections harder to treat. It creates a scenario where common infections become potentially life-threatening, requiring alternative treatments and intensifying the search for new antibiotics.
                    Q: Can you summarize the significance of antibiotics in modern medicine?
                    A: Antibiotics have fundamentally transformed modern medicine by revolutionizing the treatment of bacterial infections. Their discovery and widespread use have saved countless lives, enabling successful management of infections that were once fatal. Antibiotics have facilitated advancements in medical procedures, surgeries, and treatments, allowing for better healthcare outcomes and improving the overall quality of life.

                    Context/Information:
                    {page}

                    Generated QA:
                    """
        # Assuming model.generate_content() generates the responses
        response = model.generate_content(combined_text)

        # Add content to the Word document
        doc.add_paragraph(response.text)

    # Save the Word document
    doc.save('water_qa_doc.docx')

# Replace 'path_to_your_pdf.pdf' with the path to your PDF file
pdf_paragraphs = extract_paragraphs_from_pdf('/content/WP 307 - Krishna Raj.pdf')

# Add responses to the Word document
add_responses_to_word(pdf_paragraphs)

22
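The prompt above asks for output in a `Q:` / `A:` layout. If the model follows that layout, the pairs can be recovered as structured data before they are written to the document. This is a sketch under that assumption (the model may well use other labels, such as `**Question 1:**`); `parse_qa_pairs` is a hypothetical helper:

```python
import re


def parse_qa_pairs(text):
    """Turn 'Q: ... A: ...' formatted text into a list of (question, answer) tuples."""
    pattern = re.compile(
        # A question runs up to the next line starting with 'A:'; an answer runs
        # up to the next line starting with 'Q:' or to the end of the text.
        r"^\s*Q:\s*(?P<q>.+?)\s*^\s*A:\s*(?P<a>.+?)\s*(?=^\s*Q:|\Z)",
        re.MULTILINE | re.DOTALL,
    )
    return [(m.group("q"), m.group("a")) for m in pattern.finditer(text)]
```

Each tuple could then be written as two separate paragraphs with `doc.add_paragraph`.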


In [None]:
# Replace 'path_to_your_pdf.pdf' with the path to your PDF file
pdf_paragraphs = extract_paragraphs_from_pdf('/content/bangaluru_portrait.pdf')

16


In [None]:
doc = Document()

In [None]:

combined_text = f"""
            You are a school teacher who has to prepare questions for a student test from the text given below:

            Title/Context/Information:
            {pdf_paragraphs[10]}

            Question Generation Prompt:
            Generate 10 unique question and answer pairs together based on the provided information
            """

In [None]:
response = model.generate_content(combined_text)

In [None]:
%%time
response = model.generate_content(combined_text)

In [None]:
import os
import PyPDF2
from docx import Document

# Replace 'folder_path' with the path to your folder containing PDF documents
folder_path = '/content/data'
# Loop through files in the folder
for filename in os.listdir(folder_path):
    if filename.endswith('.pdf'):  # Check for PDF files
        file_path = os.path.join(folder_path, filename)

        # Open the PDF file and concatenate the text of every page
        with open(file_path, 'rb') as file:
            reader = PyPDF2.PdfReader(file)
            num_pages = len(reader.pages)

            text = ''
            for page_num in range(num_pages):
                page = reader.pages[page_num]
                text += page.extract_text()

            # Build the prompt from the text of the current file
            combined_text = f"""
                        Title/Context/Information:
                        {text}

                        Question Generation Prompt:
                        Generate 20 unique questions based on the provided information and start each question with Q:
                        """
            response = model.generate_content(combined_text)
            doc = Document()
            doc.add_paragraph(response.text)
            custom_name = f"{filename}_que.docx"
            doc.save(custom_name)
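Because `filename` still carries its extension, the output files are named like `paper.pdf_que.docx`. If a cleaner name is preferred, `pathlib` can drop the extension first. A small sketch; `questions_docx_name` is a hypothetical helper:

```python
from pathlib import Path


def questions_docx_name(pdf_filename, suffix="_que"):
    """Derive an output name such as 'paper_que.docx' from 'paper.pdf'."""
    # Path.stem drops the directory and the final extension.
    return f"{Path(pdf_filename).stem}{suffix}.docx"
```

In the loop above this would be used as `doc.save(questions_docx_name(filename))`.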


In [None]:
combined_text = f"""Task Description: Generate 20 questions based on the provided text. The generated questions should cover various aspects such as factual details, inferences, summaries, and implications derived from the text. Below is an example.

                    Example Text:
                    "The discovery of antibiotics revolutionized medicine in the 20th century. Alexander Fleming's accidental discovery of penicillin in 1928 paved the way for the widespread use of antibiotics. These medications have saved countless lives by effectively combating bacterial infections. However, overuse and misuse of antibiotics have led to the development of antibiotic-resistant bacteria, posing a significant challenge to modern medicine."

                    Generated Questions:

                    Factual Detail: What was the year of Alexander Fleming's accidental discovery of penicillin?
                    Inference: How did the discovery of antibiotics impact medical treatment in the 20th century?
                    Implication: What challenges arose due to the overuse and misuse of antibiotics?
                    Summary: Can you summarize the significance of antibiotics in modern medicine?

                    Context/Information:
                    It is an open question how to train neural networks so
                    they will be robust to adversarial examples [6]. Defensive distillation [5] was recently proposed as an approach
                    to make feed-forward neural networks robust against adversarial examples.
                    In this short paper, we demonstrate that defensive distillation is not effective. We show that, with a slight modification to a standard attack, one can find adversarial examples on defensively distilled networks. We demonstrate the attack on the MNIST [2] digit recognition task.
                    Distillation prevents existing techniques from finding
                    adversarial examples by increasing the magnitude of the
                    inputs to the softmax layer. This makes an unmodified
                    attack fail. We show that if we artificially reduce the
                    magnitude of the input to the softmax function, and make
                    two other minor changes, the attack succeeds. Our attack
                    achieves successful targeted misclassification on 96.4%
                    of images by changing on average 4.7% of pixels.

                    Generated Questions:
                    """

In [None]:
questions = [
    "What is the main idea of the research paper?",
    "What are adversarial examples, and how are they created?"
]




In [None]:
import PyPDF2

def extract_text_from_pdf(pdf_path):
    with open(pdf_path, 'rb') as file:
        reader = PyPDF2.PdfReader(file)
        num_pages = len(reader.pages)

        text = ''
        for page_num in range(num_pages):
            page = reader.pages[page_num]
            text += page.extract_text()

        return text

# Replace 'your_pdf_file.pdf' with the path to your PDF file
pdf_text = extract_text_from_pdf('/content/Defensive Distillation is Not Robust to Adversarial Examples.pdf')



In [None]:
prompt = f"""Task Description: Generate an answer to the given question based on the provided text.

text:
{pdf_text}

Question: {questions[0]}

Answer:"""



In [None]:
for question in questions:
  prompt = f"""Task Description: Generate an answer to the given question based on the provided text.

                text:
                {pdf_text}

                Question: {question}

                Answer:"""

  answer = model.generate_content(prompt)
  print(answer.text)

ReadTimeout: ignored
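The `ReadTimeout` above suggests the request did not complete in time, which is plausible given that the whole PDF text is sent with every question. One common mitigation is to retry the call with exponential backoff. The wrapper below is a generic sketch, not part of `google-generativeai`; the name `call_with_retries` and its defaults are assumptions:

```python
import time


def call_with_retries(fn, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call `fn()` and retry with exponential backoff if it raises.

    Catching every Exception is deliberately broad for a sketch; in practice
    it should be narrowed to the timeout error actually observed.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts:
                raise
            # Wait base_delay, then 2x, then 4x, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
```

In the loop above, the call would become `answer = call_with_retries(lambda: model.generate_content(prompt))`. Sending a shorter context per question would reduce the chance of a timeout in the first place.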

In [None]:
!pip -q install langchain_experimental langchain_core
!pip -q install google-generativeai==0.3.1
!pip -q install google-ai-generativelanguage==0.4.0
!pip -q install langchain-google-genai
!pip -q install "langchain[docarray]"

In [None]:
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_google_genai import GoogleGenerativeAIEmbeddings
from langchain.vectorstores import DocArrayInMemorySearch

In [None]:
#@title Setting up the Auth
import os
import google.generativeai as genai
from google.colab import userdata

from IPython.display import display
from IPython.display import Markdown

# Read the key from Colab's secret store instead of hard-coding it in the notebook
os.environ["GOOGLE_API_KEY"] = userdata.get('gemini')

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])

In [None]:
model = ChatGoogleGenerativeAI(model="gemini-pro",
                             temperature=0.7)

In [None]:
embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")

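# from_texts expects a list of strings, so split the extracted text into
# fixed-size chunks (1000 characters is an arbitrary choice) and embed each
# chunk as one mini document
chunks = [pdf_text[i:i + 1000] for i in range(0, len(pdf_text), 1000)]

vectorstore = DocArrayInMemorySearch.from_texts(
    chunks,
    embedding=embeddings # passing in the embedder model
)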

retriever = vectorstore.as_retriever()

GoogleGenerativeAIError: ignored