In [1]:
!pip --quiet install langchain-openai langchain langchainhub openai chromadb tiktoken pypdf tavily-python urllib3==1.26.15
import os
import chromadb

# Prefer setting this at the OS level; the placeholder below is only used if the variable is unset.
os.environ.setdefault('OPENAI_API_KEY', '<put your key here if not in env>')

from langchain_openai import OpenAI, ChatOpenAI, OpenAIEmbeddings
from langchain.chains import RetrievalQA
from langchain.chains.question_answering import load_qa_chain
from langchain.document_loaders import PyPDFLoader
from langchain.indexes import VectorstoreIndexCreator
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma
from langchain_core.runnables import RunnableLambda, RunnablePassthrough
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser




In [2]:
# Automatically create a Chroma collection: parse the PDF, chunk the text, embed it, and keep the vectors in memory
loader = PyPDFLoader("https://www.security.ntt/reports/Cyber-Security-Reports-2023-01-01.pdf")
index = VectorstoreIndexCreator(
    # split the documents into chunks
    text_splitter=CharacterTextSplitter(chunk_size=1000, chunk_overlap=0),
    # select which embeddings we want to use
    embedding=OpenAIEmbeddings(),
    # use Chroma as the vectorstore to index and search embeddings
    vectorstore_cls=Chroma
).from_loaders([loader])

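# Inspect the in-memory collection created above ("langchain" is the collection name LangChain uses by default)
client = chromadb.Client()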
collection = client.get_collection(name="langchain")
print("Total items in collection:" + str(collection.count()))

Total items in collection:18
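
To make the `chunk_size` and `chunk_overlap` parameters concrete, here is a minimal sketch of fixed-window chunking. It is a simplification, not what `CharacterTextSplitter` does internally: as far as I understand, the real splitter first splits on a separator (blank lines by default) and then merges pieces up to `chunk_size`. The function name `split_fixed` is hypothetical.

```python
from typing import List


def split_fixed(text: str, chunk_size: int = 1000, chunk_overlap: int = 0) -> List[str]:
    """Cut text into windows of chunk_size characters.

    Each window starts (chunk_size - chunk_overlap) characters after the
    previous one, so consecutive windows share chunk_overlap characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]


sample = "a" * 2500  # stand-in for the parsed PDF text
chunks = split_fixed(sample, chunk_size=1000, chunk_overlap=0)
overlapping = split_fixed(sample, chunk_size=1000, chunk_overlap=200)
print([len(c) for c in chunks])
print([len(c) for c in overlapping])
```

With `chunk_overlap=0`, as used in the cell above, no text is repeated between chunks; a non-zero overlap trades extra storage for a better chance that a sentence cut at a boundary appears whole in at least one chunk.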


In [3]:
question = "How can ChatGPT be used in cyber attacks?"

## Test run with an LCEL chain / chain_type=stuff

In [4]:
db = Chroma(collection_name="langchain", embedding_function=OpenAIEmbeddings())

retriever = db.as_retriever()

template = """Answer the question based only on the following context. If you don't know the answer, say "I don't know".
{context}

Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)

model = ChatOpenAI()

chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | model
    | StrOutputParser()
)

%time chain.invoke(question)

CPU times: user 39.1 ms, sys: 4.48 ms, total: 43.6 ms
Wall time: 26.1 s


'ChatGPT can be used in cyber attacks for social hacking, such as generating text for phishing emails and impersonating specific individuals. It can also be used for developing malware.'
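
The `|` composition in the cell above can be hard to picture, so here is a toy sketch of the idea in plain Python. It is not LangChain's implementation: `Step`, `fan_out`, and every `fake_*` object are hypothetical stand-ins. The point is only that the leading dict sends the same input to each branch (retriever and passthrough) and hands the resulting dict to the prompt.

```python
class Step:
    """A tiny stand-in for a runnable: wraps a function and supports `|`."""

    def __init__(self, fn):
        self.fn = fn

    def invoke(self, value):
        return self.fn(value)

    def __or__(self, other):
        # Composition: feed this step's output into the next step.
        return Step(lambda value: other.invoke(self.invoke(value)))


def fan_out(**branches):
    """Run every branch on the same input and collect the results in a dict."""
    return Step(lambda value: {name: step.invoke(value) for name, step in branches.items()})


# Hypothetical stand-ins for the retriever, prompt, model, and output parser.
fake_retriever = Step(lambda q: ["chunk about phishing", "chunk about malware"])
passthrough = Step(lambda q: q)
fake_prompt = Step(lambda d: "Context: " + " | ".join(d["context"]) + "\nQuestion: " + d["question"])
fake_model = Step(lambda p: {"content": "ANSWER(%d chars of prompt)" % len(p)})
fake_parser = Step(lambda m: m["content"])

toy_chain = fan_out(context=fake_retriever, question=passthrough) | fake_prompt | fake_model | fake_parser
result = toy_chain.invoke("How can ChatGPT be used in cyber attacks?")
print(result)
```

Because all retrieved chunks go into a single prompt, this pattern corresponds to the "stuff" strategy: one model call per question.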

## Testing 4 different chain_types and their execution times.
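
One reason the timings can differ is the number of model calls each strategy makes. The sketch below counts calls with a fake model; it is a rough approximation of each strategy's control flow, not LangChain's code. `CountingLLM`, the `run_*` functions, the `score` parameter, and the choice of four chunks are all assumptions for illustration, and the real map_reduce chain may add extra collapse steps for long inputs.

```python
class CountingLLM:
    """Hypothetical fake model that only counts how often it is called."""

    def __init__(self):
        self.calls = 0

    def __call__(self, prompt):
        self.calls += 1
        return "answer %d" % self.calls


def run_stuff(llm, chunks, question):
    # One call: every chunk is placed into a single prompt.
    return llm("\n".join(chunks) + "\n" + question)


def run_map_reduce(llm, chunks, question):
    # One call per chunk, then one more call to combine the partial answers.
    partials = [llm(chunk + "\n" + question) for chunk in chunks]
    return llm("\n".join(partials) + "\n" + question)


def run_refine(llm, chunks, question):
    # Sequential: answer from the first chunk, then revise once per remaining chunk.
    answer = llm(chunks[0] + "\n" + question)
    for chunk in chunks[1:]:
        answer = llm(answer + "\n" + chunk + "\n" + question)
    return answer


def run_map_rerank(llm, chunks, question, score=len):
    # One call per chunk; the best-scoring answer wins.
    # `score` is a placeholder: the real chain asks the model for a score.
    answers = [llm(chunk + "\n" + question) for chunk in chunks]
    return max(answers, key=score)


chunks = ["chunk A", "chunk B", "chunk C", "chunk D"]  # assumed: 4 retrieved chunks
strategies = {
    "stuff": run_stuff,
    "map_reduce": run_map_reduce,
    "refine": run_refine,
    "map_rerank": run_map_rerank,
}
call_counts = {}
for name, run in strategies.items():
    llm = CountingLLM()
    run(llm, chunks, "question?")
    call_counts[name] = llm.calls
print(call_counts)
```

Call count is only part of the picture: the refine calls must run one after another, while the per-chunk calls in map_reduce and map_rerank are independent of each other. The wall times below are single runs against a remote API, so they also include network and service variance.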

In [5]:
%time index.query(llm=OpenAI(), question=question, chain_type="map_reduce")

CPU times: user 130 ms, sys: 39.9 ms, total: 170 ms
Wall time: 19.2 s


' ChatGPT could potentially be used for phishing attacks, email scams, and malware development in cyberattacks. However, its developer has banned antisocial use and it has no ability to judge right from wrong, so its use in cyberattacks is still being explored and is not fully understood.'

In [6]:
%time index.query(llm=OpenAI(), question=question, chain_type="stuff")

CPU times: user 43 ms, sys: 4.51 ms, total: 47.5 ms
Wall time: 13.8 s


'ChatGPT can be used in cyber attacks in a variety of ways. It can be used for social hacking, such as generating text for phishing emails. It can also be used to develop malware by storing information about program development. ChatGPT can also be used for targeted attacks and in conjunction with other generative AI to affect the entire internet. However, OpenAI has banned antisocial use of ChatGPT and has taken steps to prevent it from answering questions that could lead to abuse or cyberattacks.'

In [7]:
%time index.query(llm=OpenAI(), question=question, chain_type="refine")

CPU times: user 72.1 ms, sys: 6.42 ms, total: 78.5 ms
Wall time: 45.9 s


"\n\nChatGPT can be used in cyber attacks by generating email drafts that can be used in email phishing attacks, refining drafts to deceive recipients, and developing VBA code embedded in attached Excel macro files. Additionally, it has been shown that other researchers have been able to get ChatGPT to generate VBA code in response to specific questions, demonstrating its potential use in developing malicious code for cyber attacks. This has been further confirmed by practical applications of ChatGPT in cyber attacks, such as the development of email and code attacks by cybercriminals using ChatGPT-powered technology. Furthermore, the potential for ChatGPT to generate AI-generated text that can deceive and manipulate internet users has raised concerns about its use in cyber attacks.\n\nIn addition, ChatGPT's ability to interact with humans and generate text that can easily deceive them makes it a useful tool for social engineering and social hacking attacks. This could involve using Ch

In [8]:
# map_rerank sometimes fails in the apply_and_parse method when the model output cannot be parsed; see https://github.com/langchain-ai/langchain/issues/12459
%time index.query(llm=OpenAI(), question=question, chain_type="map_rerank")



ValueError: Could not parse output:  ChatGPT can be used in cyberattacks through social engineering and the development of malware. However, its developer, OpenAI, has banned antisocial use and prevented it from answering questions that could lead to abuse. Score: 100
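
The error above shows the answer and `Score: 100` on one line. A plausible cause, stated here as an assumption rather than a confirmed diagnosis, is that the parser expects the score on its own line. The sketch below contrasts a strict pattern of that kind (`STRICT`, written from my recollection of the default and not verified against the installed version) with a more lenient one. `LENIENT` and `parse_scored_answer` are hypothetical names, not part of LangChain.

```python
import re

# Assumed shape of the strict pattern: the score must follow a newline.
STRICT = re.compile(r"(.*?)\nScore: (\d*)")
# Lenient alternative: any whitespace (space or newline) may precede "Score:".
LENIENT = re.compile(r"(.*?)\s*Score:\s*(\d+)\s*$", re.DOTALL)


def parse_scored_answer(text):
    """Return (answer, score) from text that ends with 'Score: <number>'."""
    match = LENIENT.search(text)
    if match is None:
        raise ValueError("Could not parse output: " + text)
    return match.group(1).strip(), int(match.group(2))


# The model output from the error message above.
raw = (
    " ChatGPT can be used in cyberattacks through social engineering and the "
    "development of malware. However, its developer, OpenAI, has banned antisocial "
    "use and prevented it from answering questions that could lead to abuse. Score: 100"
)

strict_match = STRICT.search(raw)
answer, score = parse_scored_answer(raw)
print(strict_match)
print(score)
print(answer)
```

If the installed chain accepts a custom prompt with its own output parser, a lenient pattern like this could be supplied there; whether and how that is supported depends on the LangChain version, so check the linked issue first.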