<a href="https://colab.research.google.com/github/disha4u/RAG-SS_Instructional_Aide/blob/main/Survey_Answers.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [2]:
%%capture
!pip install gdown

In [3]:
%%capture
!pip install langchain
!pip install langchain-community
!pip install pypdf  # prerequisite for PyPDFLoader in langchain
!pip install lark
!pip install openai  # for using OpenAI LLMs in langchain
!pip install python-dotenv  # provides load_dotenv, used to read the API keys
!pip install -q -U google-generativeai

In [4]:
%%capture
# Run this cell if you want Hugging Face embeddings in langchain.
# Note: %%capture is a cell magic and must be the first line of the cell.
!pip install --upgrade --quiet langchain sentence_transformers
!pip install langchain-huggingface

In [5]:
%%capture
!pip install chromadb

In [6]:
!gdown "https://drive.google.com/uc?id=1b4AxN63ZDe_N7Ye9yExuRxLlzKS4Sp23&export=download" -O "websec.pdf"

Downloading...
From: https://drive.google.com/uc?id=1b4AxN63ZDe_N7Ye9yExuRxLlzKS4Sp23&export=download
To: /content/websec.pdf
100% 3.44M/3.44M [00:00<00:00, 111MB/s]


In [7]:
import os
import pathlib
import re
import textwrap
import time

import numpy as np
import pandas as pd
import google.generativeai as genai
import openai
from google.colab import userdata

# The langchain.* import paths for loaders, vector stores, and LLM wrappers are
# deprecated; these integrations live in langchain_community (installed above).
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_huggingface.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_community.llms import OpenAI
from langchain_core.prompts import PromptTemplate
from langchain.chains import RetrievalQA
from langchain_community.chat_models import ChatOpenAI
from langchain_core.messages import HumanMessage, SystemMessage, AIMessage

In [8]:
from abc import ABC, abstractmethod

class chatAPI(ABC):

    @abstractmethod
    def generate_answer(self):
        pass
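
The abstract base class above only fixes the `generate_answer` contract. As a minimal sketch of how a backend plugs into it, the block below defines an offline stand-in that needs no API key, which can be handy for dry runs of the pipeline. `ChatBackend` and `EchoBackend` are hypothetical names used only for this illustration; they are not part of the notebook.

```python
from abc import ABC, abstractmethod


class ChatBackend(ABC):
    """Illustrative stand-in for the notebook's chatAPI base class."""

    @abstractmethod
    def generate_answer(self, text):
        """Return an answer string for `text`, or None on failure."""


class EchoBackend(ChatBackend):
    """Hypothetical offline backend: echoes the question instead of calling an LLM."""

    def generate_answer(self, text):
        # Mirror the notebook's convention of returning None when no answer is available.
        if not text or not text.strip():
            return None
        return f"[echo] {text.strip()}"


backend = EchoBackend()
print(backend.generate_answer("What is TLS?"))
```

Because every backend exposes the same method, the rest of the notebook can switch between backends without changing the calling code.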

In [9]:
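class gemini(chatAPI):

  def __init__(self, key=None):
    # Use the key passed in; fall back to the GOOGLE_API_KEY environment variable.
    # genai.configure() returns None, so there is nothing useful to store from it.
    genai.configure(api_key=key or os.getenv('GOOGLE_API_KEY'))
    self.model = genai.GenerativeModel('gemini-pro')

  def generate_answer(self, text):
    try:
      response = self.model.generate_content(text + " Answer the question in short.")
      # response.text can raise as well (for example on a blocked response), so read it inside the try.
      return response.text
    except Exception as e:
      print(f"An error occurred: {e}")
      return None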


In [10]:
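# The chat.completions API belongs to the OpenAI SDK client, not to langchain's OpenAI wrapper.
from openai import OpenAI as OpenAIClient

class openai_chat(chatAPI):  # renamed so the class no longer shadows the imported openai module

  def __init__(self):
    self.client = OpenAIClient(api_key=os.getenv('API_KEY'))

  def generate_answer(self, text):
    try:
      response = self.client.chat.completions.create(
              model="gpt-3.5-turbo",
              messages=[
                  {"role": "system", "content": ""},
                  {"role": "user", "content": text + " Answer the question in short."}
                  ]
              )
      return response.choices[0].message.content
    except Exception as e:
      print(f"An error occurred: {e}")
      return None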

In [11]:
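class Rag(chatAPI):

  def __init__(self):
    api_key = os.getenv('API_KEY')
    if api_key is None:
      raise ValueError("API_KEY is not set; load the .env file before creating Rag().")
    os.environ["OPENAI_API_KEY"] = api_key
    self.qa_chain = None

  def set_rag(self, pdfile="websec.pdf", pagenos=(0, -1), persistdir="chromadb"):
    # Load the PDF and keep the slice pagenos[0]:pagenos[-1].
    # Note that the default (0, -1) leaves out the final page.
    loader = PyPDFLoader(pdfile)
    pages = loader.load()
    pages = pages[pagenos[0]:pagenos[-1]]
    # Separators are matched literally by default, so the sentence break is '. ' rather than '\. '.
    rsplit = RecursiveCharacterTextSplitter(chunk_size=512, chunk_overlap=128, separators=['\n\n', '\n', '. '])
    docs = rsplit.split_documents(pages)
    # Embed the chunks and store them in a persistent Chroma collection.
    embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
    db = Chroma.from_documents(docs, embeddings, persist_directory=persistdir)
    llm = OpenAI(model='gpt-3.5-turbo-instruct', temperature=0)
    template = """Use the following pieces of context if relevant to answer the question at the end.
               {context}
               Question: {question}
               Helpful Answer:"""
    QA_CHAIN_PROMPT = PromptTemplate.from_template(template)
    self.qa_chain = RetrievalQA.from_chain_type(
        llm,
        retriever=db.as_retriever(search_type="mmr"),
        return_source_documents=True,
        chain_type_kwargs={"prompt": QA_CHAIN_PROMPT}
    )

  def generate_answer(self, text):
    if self.qa_chain is None:
      raise RuntimeError("Call set_rag() before generate_answer().")
    try:
      # invoke() replaces the deprecated direct call self.qa_chain({...}).
      return self.qa_chain.invoke({"query": text})
    except Exception as e:
      print(f"An error occurred: {e}")
      # Return None explicitly; the original fell through to an unbound `response`.
      return None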
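
The splitter settings in `set_rag` (`chunk_size=512`, `chunk_overlap=128`) mean that neighbouring chunks share text, so a sentence cut at a chunk boundary still appears whole in at least one chunk. The block below is a simplified sliding-window sketch of that idea only. It is not the `RecursiveCharacterTextSplitter` algorithm, which also tries to break on the configured separators, and `chunk_with_overlap` is a hypothetical helper name.

```python
def chunk_with_overlap(text, chunk_size=512, chunk_overlap=128):
    """Cut `text` into fixed-size windows where consecutive windows share `chunk_overlap` characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


sample = "".join(str(i % 10) for i in range(1000))
chunks = chunk_with_overlap(sample)
print([len(c) for c in chunks])
```

With the defaults, each window starts 384 characters after the previous one, so the last 128 characters of one chunk are the first 128 characters of the next.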


In [12]:
df=pd.read_excel("Survey_ Module 6 Survey Student Analysis Report_QAfilled.xlsx")

In [13]:
df.head(2)

Unnamed: 0.1,Unnamed: 0,Name,ID,Original Question,Narrowed Down Question,Answer,Topic,Final Version
0,0,Marco Bassaletti,869246,,,,,\n
1,1,Alex Shum,1099949,"In the movie Transformer, the hacker Glen trie...",1. How can you understand Decepticons technolo...,1. To understand Decepticons technology and me...,,\n


In [14]:
# Cast to str so every cell can be concatenated with a prompt.
# Missing values (NaN) become the literal string "nan".
df["Original Question"]=df["Original Question"].astype(str)

In [15]:
df["Answer"]=df["Answer"].astype(str)

In [16]:
df["Narrowed Down Question"]=df["Narrowed Down Question"].astype(str)

In [17]:
from dotenv import load_dotenv

load_dotenv('turmerik.env')

True

In [19]:
rag=Rag()
rag.set_rag()

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


In [18]:
res=rag.generate_answer(df["Narrowed Down Question"][1])



In [19]:
res['result']

'\n1. Understanding Decepticons technology and methodology requires a deep understanding of Internet Protocol and cryptographic algorithms. It involves balancing multiple concepts and being familiar with various types of encryption and decryption keys. It also requires knowledge of common security vulnerabilities and how to protect against them.\n\n2. To filter Decepticons transmissions from other transmissions, you can use encryption techniques and hash functions. These can help identify and fingerprint data, making it easier to distinguish Decepticons transmissions from others. Additionally, being familiar with common security vulnerabilities can help identify and block Decepticons transmissions.'

In [20]:
res['source_documents']

[Document(metadata={'page': 147, 'source': 'websec.pdf'}, page_content='jargon. Understanding how it fits into the Internet Protocol requires balanc -\ning multiple concepts in your head at once, so thank you for your patience. \nLet’s see how the various types of cryptographic algorithms we have dis -\ncussed are used by TLS.\nThe TLS Handshake\nTLS uses a combination of cryptographic algorithms to efficiently and \nsafely pass information. For speed, most data packets passed over TLS will \nbe encrypted using a symmetric encryption algorithm commonly referred'),
 Document(metadata={'page': 145, 'source': 'websec.pdf'}, page_content='decryption key —the corresponding key required to unscramble the data. The \ninput data and keys are typically encoded as binary data, though the keys \nmay be expressed as strings of text for readability.\nMany encryption algorithms exist, and more continue to be invented \nby mathematicians and security researchers. They can be classified into \na few c
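
The retriever is built with `search_type="mmr"` (maximal marginal relevance), which trades relevance to the query against similarity to documents already selected, so the returned chunks tend to cover different passages rather than near-duplicates. The block below is a small pure-Python sketch of that selection rule on toy 2-D vectors. It is an illustration of the technique, not the code the vector store runs, and `mmr_select` and `lambda_mult` as used here are names chosen for this sketch.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr_select(query_vec, doc_vecs, k=2, lambda_mult=0.5):
    """Greedily pick k document indices, balancing relevance against redundancy."""
    selected = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        best_idx, best_score = None, -math.inf
        for i in candidates:
            relevance = cosine(query_vec, doc_vecs[i])
            # Redundancy is the highest similarity to anything already chosen.
            redundancy = max((cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0)
            score = lambda_mult * relevance - (1 - lambda_mult) * redundancy
            if score > best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
        candidates.remove(best_idx)
    return selected


docs = [[1.0, 0.9], [1.0, 0.8], [0.0, 1.0]]  # the first two are near-duplicates
print(mmr_select([1.0, 1.0], docs, k=2, lambda_mult=0.5))
print(mmr_select([1.0, 1.0], docs, k=2, lambda_mult=1.0))
```

With `lambda_mult=1.0` the rule reduces to plain similarity ranking; lower values penalise picking a second chunk that closely repeats the first.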

In [20]:
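# The question column was cast to str earlier, so missing questions are the string "nan"
# and pd.notna() alone never filters them out.
df["rag_answer"]=df["Narrowed Down Question"].apply(lambda s: rag.generate_answer(s) if pd.notna(s) and str(s).strip().lower() != "nan" else None)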

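
The chain returns a dict with `result` and `source_documents`, so applying it to a column stores the whole dict in each cell. As a sketch of one possible alternative, the helper below skips placeholder questions and keeps only the answer text, which tends to be easier to read once written to a spreadsheet. `answer_or_none` and `fake_rag` are hypothetical names; `fake_rag` merely imitates the shape of the chain's response so the example runs offline.

```python
def answer_or_none(question, answer_fn):
    """Call answer_fn only for real questions and return just the answer text."""
    if question is None:
        return None
    text = str(question).strip()
    # Missing spreadsheet cells show up as NaN, or as the string "nan" after a str cast.
    if not text or text.lower() == "nan":
        return None
    response = answer_fn(text)
    if response is None:
        return None
    if isinstance(response, dict):
        return response.get("result")
    return response


def fake_rag(question):
    """Stand-in with the same response shape as the retrieval chain (assumption for this demo)."""
    return {"result": f"answer to: {question}", "source_documents": []}


for q in ["What is TLS?", "nan", float("nan")]:
    print(repr(answer_or_none(q, fake_rag)))
```

In the notebook this could be used as `df["Narrowed Down Question"].apply(lambda s: answer_or_none(s, rag.generate_answer))`.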


In [21]:
df.to_excel("Survey_ Module 6 Survey Student Analysis Report_QAfilled.xlsx")

In [None]:
# GOOGLE_API_KEY='**********************'
# genai.configure(api_key=GOOGLE_API_KEY)
# model = genai.GenerativeModel('gemini-pro')

In [None]:
#response = model.generate_content(df["Original Question"][107]+"Summarise the question.")

In [None]:
#response.text
"""
* How do web technologies (TLS, CMS) ensure secure transmission?
* What vulnerabilities remain despite cryptography?
* How can cryptography be exploited?
"""

'* How do web technologies (TLS, CMS) ensure secure transmission?\n\n* What vulnerabilities remain despite cryptography?\n\n* How can cryptography be exploited?'

In [None]:
#response = model.generate_content(response.text)

In [None]:
#response.text
"""
**How do web technologies (TLS, CMS) ensure secure transmission?**
* **TLS (Transport Layer Security)** is a cryptographic protocol that provides secure communication over the Internet. It is used to protect data in transit between two parties, such as a web browser and a web server. TLS works by encrypting the data before it is sent over the network, and then decrypting it when it is received. This ensures that the data cannot be intercepted and read by unauthorized parties.
* **CMS (Content Management System)** is a software application that allows users to create and manage digital content. CMSs typically include features such as a text editor, a media manager, and a template system. CMSs can be used to create websites, blogs, and other types of online content. CMSs typically use TLS to protect data in transit, and they may also use other security features such as access control and encryption at rest.
**What vulnerabilities remain despite cryptography?**
* **Man-in-the-middle attacks:** In a man-in-the-middle attack, an attacker intercepts communications between two parties and impersonates one of them. This allows the attacker to read and modify the data being transmitted. Man-in-the-middle attacks can be difficult to detect, and they can be used to steal sensitive information or compromise security systems.
* **Side-channel attacks:** Side-channel attacks are attacks that exploit information that is leaked from a cryptographic system during its operation. This information can be used to recover the cryptographic keys or to decrypt encrypted data. Side-channel attacks can be difficult to prevent, and they can be used to compromise even the strongest cryptographic algorithms.
* **Zero-day attacks:** Zero-day attacks are attacks that exploit vulnerabilities in software that have not yet been patched. Zero-day attacks can be very effective, and they can be used to compromise even the most secure systems.
**How can cryptography be exploited?**
* **Cryptojacking:** Cryptojacking is a type of attack in which an attacker uses a victim's computer to mine cryptocurrency. Cryptojacking attacks can be carried out through malicious websites, email attachments, or software downloads.
* **Phishing attacks:** Phishing attacks are attempts to trick users into revealing sensitive information, such as their passwords or credit card numbers. Phishing attacks can be carried out through email, text messages, or websites.
* **Ransomware attacks:** Ransomware attacks are attacks in which an attacker encrypts a victim's files and demands a ransom payment in exchange for the decryption key. Ransomware attacks can be very disruptive, and they can cause significant financial losses.
"""

"**How do web technologies (TLS, CMS) ensure secure transmission?**\n\n* **TLS (Transport Layer Security)** is a cryptographic protocol that provides secure communication over the Internet. It is used to protect data in transit between two parties, such as a web browser and a web server. TLS works by encrypting the data before it is sent over the network, and then decrypting it when it is received. This ensures that the data cannot be intercepted and read by unauthorized parties.\n* **CMS (Content Management System)** is a software application that allows users to create and manage digital content. CMSs typically include features such as a text editor, a media manager, and a template system. CMSs can be used to create websites, blogs, and other types of online content. CMSs typically use TLS to protect data in transit, and they may also use other security features such as access control and encryption at rest.\n\n**What vulnerabilities remain despite cryptography?**\n\n* **Man-in-the-m

In [None]:
# for m in genai.list_models():
#   if 'generateContent' in m.supported_generation_methods:
#     print(m.name)

In [None]:
#len(df)
#117

117

In [None]:
# for i in range(len(df)):
#   if i%10==0:
#     print(str(i)+"th iteration")

#   if df["Original Question"][i]=="nan" or df["Answer"][i]!="nan":
#     print(i)
#     continue
#   q=model.generate_content(df["Original Question"][i]+" Summarise the question, don't give the answer to it.")
#   df.loc[i,"Narrowed Down Question"]=q.text
#   #print(q.text)
#   #time.sleep(5)
#   a=model.generate_content(q.text+"answer the question in short")
#   #print(a.text)
#   df.loc[i,"Answer"]=a.text
#   #time.sleep(4)
#   #print("---------------------------------------------------")
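
The commented-out `time.sleep` calls in the loop above suggest the requests were being paced by hand. A retry wrapper with exponential backoff is one common alternative; the block below is a minimal sketch under that assumption. `call_with_retry` is a hypothetical helper, and the flaky function only simulates a failing API call.

```python
import time


def call_with_retry(fn, *args, retries=3, base_delay=1.0, sleep=time.sleep):
    """Call fn(*args); on failure wait base_delay * 2**attempt and try again, up to `retries` attempts."""
    last_error = None
    for attempt in range(retries):
        try:
            return fn(*args)
        except Exception as exc:
            last_error = exc
            if attempt < retries - 1:
                sleep(base_delay * (2 ** attempt))
    raise last_error


attempts = {"count": 0}


def flaky_call(prompt):
    """Simulated API call that fails twice before succeeding."""
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("rate limited")
    return f"ok: {prompt}"


# base_delay=0.0 keeps the demo instant; a real run would use a positive delay.
print(call_with_retry(flaky_call, "hello", base_delay=0.0))
```

In the loop, a call such as `model.generate_content(prompt)` could then be wrapped as `call_with_retry(model.generate_content, prompt)`.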

In [None]:
#df.to_excel("Survey_ Module 6 Survey Student Analysis Report_QAfilled.xlsx")