# Document Question Answering

In [1]:
from langchain.vectorstores import Chroma
from langchain.embeddings import OpenAIEmbeddings
from langchain.chat_models import ChatOpenAI
from langchain.chains import RetrievalQAWithSourcesChain

In [2]:
import os
import textwrap

import openai
from dotenv import load_dotenv, find_dotenv

# Load OPENAI_API_KEY (and any other settings) from a local .env file.
load_dotenv(find_dotenv())

openai.api_key = os.environ['OPENAI_API_KEY']

## Initialize ChromaDB

In [3]:
embeddings = OpenAIEmbeddings()
persist_directory = 'chroma/'
vectordb = Chroma(persist_directory=persist_directory, embedding_function=embeddings)
print(vectordb._collection.count())
vectordb.persist()

199
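
A count of 199 shows that the persisted collection was found. If `persist_directory` pointed at the wrong path, Chroma would create a new, empty collection rather than raise an error, so every later query would run against nothing. The following is a minimal sketch of a guard for that case; `require_documents` is a hypothetical helper introduced here for illustration, not part of LangChain or Chroma.

```python
def require_documents(count: int, persist_directory: str) -> int:
    """Return the count unchanged, or raise if the collection is empty."""
    if count <= 0:
        raise ValueError(
            f"No documents found in {persist_directory!r}; "
            "check the path or re-run ingestion."
        )
    return count


# In this notebook it could be called as follows
# (assumes `vectordb` and `persist_directory` from the cell above):
# require_documents(vectordb._collection.count(), persist_directory)
```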


## Create the chain and a helper function

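Initialize the chain we will use for question answering. `RetrievalQAWithSourcesChain` retrieves the most relevant chunks from the vector store and passes them to the LLM, which answers and cites its sources. Setting `return_source_documents=True` also returns the retrieved chunks themselves.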

In [4]:
#llm_model_name = "gpt-3.5-turbo"
llm_model_name = "gpt-4"
llm = ChatOpenAI(model_name=llm_model_name, temperature=0)

qa = RetrievalQAWithSourcesChain.from_chain_type(
    llm,
    retriever=vectordb.as_retriever(),
    return_source_documents=True
)

def pretty_query(query, qa=qa):
    """Run the query through the chain and print the wrapped answer and its sources."""
    result = qa(query)
    print(f"{textwrap.fill(result['answer'])}\n\nSource: {result['sources']}")

## Ask questions!

Now we can use the chain to ask questions!

In [5]:
pretty_query ("What did the president say about large corporations and the wealthy?")

The president stated that the tax system is not fair and needs to be
fixed. He proposed a 15% minimum tax rate for corporations and closing
loopholes so the very wealthy don't pay a lower tax rate than a
teacher or a firefighter. He also mentioned that 55 Fortune 500
corporations earned $40 billion in profits and paid zero dollars in
federal income tax, which he deemed as not fair. The president also
announced a crackdown on foreign-owned companies that raised prices by
as much as 1,000% during the pandemic and made record profits. He also
mentioned that as Wall Street firms take over more nursing homes,
quality in those homes has gone down and costs have gone up, which he
plans to end. He also stated that capitalism without competition isn't
capitalism, but exploitation, which drives up prices.

Source: data/state_of_the_union.txt
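
Because the chain was built with `return_source_documents=True`, the result dictionary also carries the retrieved chunks under `source_documents`, which `pretty_query` does not print. The sketch below shows one way to list them; `format_source_documents` is a hypothetical helper, and it assumes each chunk exposes `page_content` and a `metadata` dict with a `source` key, as LangChain `Document` objects produced by the standard loaders do.

```python
import textwrap


def format_source_documents(result: dict, width: int = 70, max_chars: int = 200) -> str:
    """Format the retrieved chunks stored under result['source_documents']."""
    lines = []
    for i, doc in enumerate(result.get("source_documents", []), start=1):
        source = doc.metadata.get("source", "unknown")
        # Collapse whitespace and truncate so each chunk stays readable.
        snippet = " ".join(doc.page_content.split())[:max_chars]
        lines.append(f"[{i}] {source}")
        lines.append(
            textwrap.fill(
                snippet,
                width=width,
                initial_indent="    ",
                subsequent_indent="    ",
            )
        )
    return "\n".join(lines)


# Possible usage in this notebook (assumes `qa` from the cell above):
# result = qa("What did the president say about large corporations and the wealthy?")
# print(format_source_documents(result))
```

Reading the retrieved chunks next to the answer makes it easier to tell whether a weak answer comes from retrieval or from the LLM.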


In [6]:
pretty_query ("What does Sarah Silverman allege against OpenAI?")

Sarah Silverman, along with other plaintiffs, alleges that OpenAI has
committed direct copyright infringement, vicarious copyright
infringement, violations of section 1202(b) of the Digital Millennium
Copyright Act, unjust enrichment, violations of the California and
common law unfair competition laws, and negligence. She claims that
OpenAI made unauthorized copies of her book, "The Bedwetter", and
other works during the training process of the OpenAI Language Models.
She asserts that these models are infringing derivative works, made
without the plaintiffs' permission and in violation of their exclusive
rights under the Copyright Act.

Source: data/silverman-openai-complaint.pdf


### The chain appears unable to distinguish the Mar-a-Lago indictment from the later District of Columbia indictment.

In [7]:
pretty_query("What does the District of Columbia indictment against Donald Trump allege?  I am not interested in the Florida indictment.")

The District of Columbia indictment against Donald Trump alleges that
he is waiting for the possibility of a separate indictment stemming
from an investigation into attempts to alter the 2020 presidential
election and interfere with the peaceful transfer of power.

Source: data/doc-3.txt
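
One possible mitigation, not verified against this data set, is to restrict retrieval to a single source file with a metadata filter, so that chunks from other documents cannot be retrieved. The sketch below assumes that chunks were ingested with a `source` metadata key (the `Source:` lines above suggest so) and that the LangChain Chroma wrapper forwards `filter` from `search_kwargs` to its similarity search. `retriever_for_source` is a hypothetical helper, and the file name in the usage comment is a placeholder.

```python
def retriever_for_source(vectordb, source_path: str, k: int = 4):
    """Build a retriever limited to chunks whose 'source' metadata equals source_path."""
    return vectordb.as_retriever(
        search_kwargs={"k": k, "filter": {"source": source_path}}
    )


# Hypothetical usage; replace the placeholder with the actual indictment file:
# dc_retriever = retriever_for_source(vectordb, "data/<dc-indictment-file>")
# dc_qa = RetrievalQAWithSourcesChain.from_chain_type(
#     llm, retriever=dc_retriever, return_source_documents=True
# )
# pretty_query("What does the District of Columbia indictment allege?", qa=dc_qa)
```

The trade-off is that the caller must already know which file to search, which gives up some of the convenience of querying the whole collection.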


In [8]:
pretty_query ("Did Richard Robbins and his colleagues reach a conclusion about whether semantic metrics are better than lexical metrics for text generation tasks?")

Yes, Richard Robbins and his colleagues concluded that semantic
metrics are more robust indicators of success at question generation
than lexical metrics.

Source: data/Question_Generation.pdf


In [9]:
pretty_query ("Can Epiq employees use AI tools at work?")

Yes, Epiq employees can use AI tools at work. However, they must
exercise caution and best judgement when using any AI platform to
protect Epiq’s information, clients’ information, intellectual
property or other sensitive data. They are not allowed to use AI
platforms for purposes that are illegal, unethical, or harmful to
Epiq. They must also follow the requirements of the Appropriate Use
policy when using AI. Training on the safe and secure use of AI must
be provided to all Epiq personnel. The raw output and results received
from AI platforms must be quality checked before being used in
reports, work content or other deliverables. Use of public AI
platforms is only permitted where approved and authorised by Senior
Leadership within a business area.

Source: data/AI Usage Policy_4APR2023.pdf


In [10]:
pretty_query ("Who should Epiq employees contact if they want to use an AI tool at work?")

Epiq employees should contact the Compliance team if they want to use
an AI tool at work. All requests for access to public AI platforms
must go to the Compliance team.

Source: data/AI Usage Policy_4APR2023.pdf


In [11]:
pretty_query ("I am an Epiq employee. Can I use Chat GPT at work?")

Yes, Epiq employees can use Chat GPT at work, but there are certain
conditions that must be met. The use of public AI platforms, such as
ChatGPT, is only permitted where approved and authorised by Senior
Leadership (VP or above) within a business area. A formal, documented
acknowledgement of the AI Usage Policy is required before public AI
platforms can be used. All requests for access to public AI platforms
must go to the Compliance team. The request must specify the Epiq
personnel who will be using the public AI platform and the specific
business requirement for doing so. Compliance and Security will review
all requests for access. Information classified as Internal or
Restricted, as set out in the Data Classification policy, is
prohibited from being uploaded on a public AI platform under any
circumstances.

Source: data/AI Usage Policy_4APR2023.pdf


In [12]:
pretty_query ("Everyone uses Chat GPT.  Why doesn't Epiq let me access it?")

Epiq does not allow access to public AI platforms such as ChatGPT
unless it is approved and authorised by Senior Leadership (VP or
above) within a business area. A formal, documented acknowledgement of
the AI Usage Policy is required before public AI platforms can be
used. All requests for access to public AI platforms must go to the
Compliance team. The request must specify the Epiq personnel who will
be using the public AI platform and the specific business requirement
for doing so. Compliance and Security will review all requests for
access.

Source: data/AI Usage Policy_4APR2023.pdf


In [13]:
pretty_query("Please summarize the key parts of Epiq's AI usage policy.")

Epiq's AI usage policy outlines the requirements for the safe and
secure use of AI by Epiq personnel. The policy aims to align the use
of AI with Epiq’s existing Security, Compliance, Legal and other
corporate policies, and make employees aware of the potential risks
associated with using AI. Epiq personnel must exercise caution and
best judgement when using any AI platform to protect Epiq’s
information, clients’ information, intellectual property or other
sensitive data. They must not use AI platforms for purposes that are
illegal, unethical, or harmful to Epiq. Any suspected or known
violations of this policy may result in termination of access to
Epiq’s IT network, systems or data, and prosecution under applicable
statutes. The policy also highlights the risks associated with AI use,
including data leakage, privacy breaches, biased/inaccurate
information, and over-reliance on AI output.

Source: data/AI Usage Policy_4APR2023.pdf


In [14]:
pretty_query ("What are the rules at Epiq for the use of private AI tools?")

At Epiq, the use of private AI tools must follow the requirements of
the Appropriate Use policy. All Epiq personnel must be provided with
training on the safe and secure use of AI, covering awareness of the
security, legal and other business related risks of using AI. The raw
output and results received from AI platforms must be quality checked
before being used in reports, work content or other deliverables. Use
of public AI platforms, such as the public instance of ChatGPT, is
only permitted where approved and authorised by Senior Leadership
within a business area. A formal, documented acknowledgement of this
Policy is required before public AI platforms can be used. All
requests for access to public AI platforms must go to the Compliance
team. Information classified as Internal or Restricted is prohibited
from being uploaded on a public AI platform under any circumstances.

Source: data/AI Usage Policy_4APR2023.pdf


In [15]:
pretty_query ("What risks is Epiq concerned about regarding Generative AI?")

Epiq is concerned about several risks regarding Generative AI. These
include data leakage, which could lead to the sharing of restricted or
internal information, commercially sensitive information about Epiq,
intellectual property, confidential client information, and system
information that could be exploited for cyber-attacks. There is also a
risk of privacy breaches, biased or inaccurate information, over-
reliance on AI output, use of confidential or proprietary information,
and explainability issues. If an AI platform is not transparent in its
decision-making process, it may be unclear how a specific outcome was
reached, leading to a lack of trust in the output and potential legal
and ethical risks.

Source: data/AI Usage Policy_4APR2023.pdf


In [16]:
pretty_query ("What should I do if I think that Epiq proprietary information has leaked into a public AI?")

If you suspect that Epiq proprietary information has leaked into a
public AI, you should notify incidentreporting@epiqglobal.com. This
includes any output from an AI platform that contains Restricted or
Internal Epiq information.

Source: data/AI Usage Policy_4APR2023.pdf


In [17]:
pretty_query ("I need to summarize Epiq's AI policy for my team of software developers.  Please provide a summary.")

Epiq's AI usage policy, version 1.0, dated 30th March 2023, outlines
the requirements for the safe and secure use of AI by Epiq personnel.
The policy aims to align the use of AI with Epiq’s existing Security,
Compliance, Legal, and other corporate policies, and to make employees
aware of the potential risks associated with using AI.   Epiq
personnel must exercise caution and best judgement when using any AI
platform to protect Epiq’s information, clients’ information,
intellectual property, or other sensitive data. AI platforms should
not be used for purposes that are illegal, unethical, or harmful to
Epiq. Any output from an AI platform that contains Restricted or
Internal Epiq information should be reported to
incidentreporting@epiqglobal.com.   The policy is owned by Epiq’s
Compliance team and is reviewed at least annually, with input from
CyberSecurity and other teams. Compliance with the policy is
mandatory, but exceptions can be considered under special
circumstances. Violations 

In [18]:
pretty_query ("What are the risks associated with AI use?")

The risks associated with AI use include data leakage, which could
lead to the sharing of restricted or internal information,
commercially sensitive information about the company, intellectual
property, confidential client information, and system information that
could be exploited for cyber-attacks. There is also a risk of privacy
breaches, such as uploading data containing personal information of
third parties, clients or employees. AI platforms may produce biased
or inaccurate information if the initial data source is incorrect or
biased. Over-reliance on AI for task completion, analysis, and content
creation is another risk. AI search query results may contain
unauthorized information, such as confidential, sensitive or
proprietary information belonging to a third party. If an AI platform
is not transparent in its decision-making process, it may be unclear
how a specific outcome was reached, leading to a lack of trust in the
output and potential legal and ethical risks.

Source: da