# Document Question Answering

In [1]:
from langchain.vectorstores import Chroma
from langchain.embeddings import OpenAIEmbeddings
from langchain.chat_models import ChatOpenAI
from langchain.chains import RetrievalQAWithSourcesChain

In [2]:
import os
import textwrap

import openai
from dotenv import load_dotenv, find_dotenv

# Load OPENAI_API_KEY (and any other settings) from the nearest .env file
load_dotenv(find_dotenv())

openai.api_key = os.environ['OPENAI_API_KEY']

## Initialize ChromaDB

In [3]:
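embeddings = OpenAIEmbeddings()
persist_directory = 'chroma/'
# Open the collection previously persisted to chroma/ (the documents were embedded earlier)
vectordb = Chroma(persist_directory=persist_directory, embedding_function=embeddings)
# Number of chunks stored in the collection (reads a private attribute of the wrapper)
print(vectordb._collection.count())
vectordb.persist()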

199
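The retriever returned by `vectordb.as_retriever()` embeds the question and returns the stored chunks whose embeddings are closest to it (four by default in LangChain, as far as we know). The sketch below shows that ranking step in isolation, using made-up three-dimensional vectors and cosine similarity; the chunk ids are hypothetical. Chroma's own default distance is squared L2, but for unit-length vectors, which OpenAI embeddings are, both measures rank chunks in the same order.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, store, k=4):
    """Return the ids of the k stored vectors most similar to query_vec.

    `store` maps a (hypothetical) chunk id to its embedding vector.
    """
    ranked = sorted(store, key=lambda cid: cosine(query_vec, store[cid]), reverse=True)
    return ranked[:k]


# Toy 3-dimensional "embeddings"; real OpenAI embeddings have far more dimensions.
store = {
    "sotu-1": [0.9, 0.1, 0.0],
    "policy-1": [0.0, 0.8, 0.2],
    "policy-2": [0.1, 0.9, 0.1],
    "complaint-1": [0.0, 0.1, 0.9],
}
print(top_k([0.05, 1.0, 0.1], store, k=2))
```

In the real chain the selected chunks are then passed to the LLM together with their source file names, which is where the `Source:` lines below come from.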


## Create the chain and a helper function

Initialize the chain we will use for question answering.

In [4]:
llm_model_name = "gpt-3.5-turbo"
#llm_model_name = "gpt-4"
llm = ChatOpenAI(model_name=llm_model_name, temperature=0)

qa = RetrievalQAWithSourcesChain.from_chain_type(
    llm,
    retriever=vectordb.as_retriever(),
    return_source_documents=True
)

def pretty_query(query, qa=qa):
    """Run `query` through the chain and print a wrapped answer plus its sources."""
    result = qa(query)
    print(f"{textwrap.fill(result['answer'])}\n\nSource: {result['sources']}")
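Because the chain is built with `return_source_documents=True`, the result dict also carries the retrieved chunks under `result['source_documents']`, which `pretty_query` never shows. A small helper such as the hypothetical `show_sources` below, which counts retrieved chunks per file, makes it easier to see why an answer cites a given source. It assumes each LangChain `Document` exposes a `metadata` dict with a `"source"` key (as LangChain's file loaders set it); to keep the sketch runnable without an API key, it uses a stand-in `FakeDocument` class and a hand-built result.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class FakeDocument:
    """Stand-in for a LangChain Document: text plus a metadata dict."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def chunks_per_source(result):
    """Count how many retrieved chunks came from each source file."""
    return Counter(
        doc.metadata.get("source", "<unknown>") for doc in result["source_documents"]
    )


def show_sources(result):
    """Print each source file with the number of chunks retrieved from it."""
    for source, n in chunks_per_source(result).most_common():
        print(f"{n} chunk(s) from {source}")


# A hand-built result dict shaped like the chain's output (contents are made up).
fake_result = {
    "answer": "...",
    "sources": "data/AI Usage Policy_4APR2023.pdf, data/doc-6.txt",
    "source_documents": [
        FakeDocument("...", {"source": "data/AI Usage Policy_4APR2023.pdf"}),
        FakeDocument("...", {"source": "data/AI Usage Policy_4APR2023.pdf"}),
        FakeDocument("...", {"source": "data/doc-6.txt"}),
    ],
}
show_sources(fake_result)
```

With the real chain, the same helper would be called as `show_sources(qa(query))`.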

## Ask questions!

Now we can use the chain to ask questions!

In [5]:
pretty_query("What did the president say about large corporations and the wealthy?")

The president mentioned that he wants corporations and the wealthiest
Americans to start paying their fair share of taxes. He proposed a 15%
minimum tax rate for corporations and closing loopholes that allow the
wealthy to pay a lower tax rate than others. He also mentioned that
the previous administration's tax cuts for the wealthy and
corporations led to bigger deficits and a wider wealth gap. The
president's plan aims to grow the economy and lower costs for
families.

Source: data/state_of_the_union.txt


In [6]:
pretty_query("What does Sarah Silverman allege against OpenAI?")

Sarah Silverman alleges that OpenAI engaged in direct copyright
infringement by making unauthorized copies of her book, The Bedwetter,
as well as the books of Christopher Golden and Richard Kadrey. OpenAI
used these copies to train their OpenAI Language Models without the
permission of the authors. This constitutes a violation of the
authors' exclusive rights under copyright law.

Source: data/silverman-openai-complaint.pdf


### This chain seems unable to distinguish the Mar-a-Lago (Florida) indictment from the later District of Columbia indictment.

In [19]:
pretty_query("What does the District of Columbia indictment against Donald Trump allege? I am not interested in the Florida indictment.")

The District of Columbia indictment against Donald Trump alleges new
charges related to documents with classified markings discovered at
Mar-a-Lago. The charges include altering, destroying, mutilating, or
concealing an object; corruptly altering, destroying, mutilating or
concealing a document, record or other object; and willful retention
of national defense information. Trump was previously charged with 37
felony counts, including willful retention of classified documents and
conspiracy to obstruct justice. He has pleaded not guilty. The trial
is set for May 2024 in Fort Pierce, Florida. Carlos De Oliveira, a
Mar-a-Lago property manager and former valet, has also been charged in
the case. Walt Nauta, a former aide to Trump, has pleaded not guilty
to the charges.

Source: data/doc-3.txt
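One possible mitigation, sketched under assumptions: if each indictment were stored as a separate file, retrieval could be restricted to chunks from that file before ranking, so chunks from other files cannot crowd it out. The file names and two-dimensional vectors below are hypothetical, and the ranking uses plain cosine similarity rather than Chroma's own search. In LangChain this idea corresponds to passing a metadata filter through the retriever's `search_kwargs` (for example a `filter` entry); check the exact argument names for your version. It only helps if the D.C. indictment text is actually in the collection: the answer above cites only `data/doc-3.txt`.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def filtered_top_k(query_vec, chunks, k=4, source=None):
    """Return ids of the k chunks most similar to query_vec.

    If `source` is given, only chunks from that source file are ranked.
    """
    candidates = [c for c in chunks if source is None or c["source"] == source]
    candidates.sort(key=lambda c: cosine(query_vec, c["vec"]), reverse=True)
    return [c["id"] for c in candidates[:k]]


# Hypothetical file names and toy 2-d vectors, for illustration only.
chunks = [
    {"id": "fl-1", "source": "hypothetical/florida-indictment.txt", "vec": [0.9, 0.4]},
    {"id": "fl-2", "source": "hypothetical/florida-indictment.txt", "vec": [0.8, 0.5]},
    {"id": "dc-1", "source": "hypothetical/dc-indictment.txt", "vec": [0.6, 0.7]},
]
query = [1.0, 0.5]
print(filtered_top_k(query, chunks, k=2))  # both Florida chunks outrank the D.C. one
print(filtered_top_k(query, chunks, k=2, source="hypothetical/dc-indictment.txt"))
```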


In [8]:
pretty_query("Did Richard Robbins and his colleagues reach a conclusion about whether semantic metrics are better than lexical metrics for text generation tasks?")

Richard Robbins and his colleagues concluded that semantic metrics are
more robust indicators of success at question generation than lexical
metrics. They found that predictions with high semantic metric scores
but low lexical metric scores still captured the essence of the target
questions, while predictions with high lexical metric scores but low
semantic metric scores did not adequately capture the essence. They
specifically mentioned that METEOR was the best of the lexical
similarity metrics they considered.

Source: data/Question_Generation.pdf


In [9]:
pretty_query("Can Epiq employees use AI tools at work?")

Epiq employees can use AI tools at work, but they must exercise
caution and best judgment when using any AI platform to protect Epiq's
information and not use AI platforms for illegal, unethical, or
harmful purposes. They must also follow the requirements of the
Appropriate Use policy when using AI. Training on the safe and secure
use of AI must be provided to all Epiq personnel, and the raw output
and results received from AI platforms must be quality checked before
being used. The use of public AI platforms is only permitted with
approval from Senior Leadership, and certain information classified as
Internal or Restricted is prohibited from being uploaded on a public
AI platform.

Source: data/AI Usage Policy_4APR2023.pdf


In [10]:
pretty_query("Who should Epiq employees contact if they want to use an AI tool at work?")

Epiq employees should contact incidentreporting@epiqglobal.com if they
want to use an AI tool at work.

Source: data/AI Usage Policy_4APR2023.pdf


In [11]:
pretty_query("I am an Epiq employee. Can I use Chat GPT at work?")

The AI Usage Policy states that the use of public AI platforms, such
as the public instance of ChatGPT, is only permitted with approval and
authorization from Senior Leadership (VP or above) within a business
area. A formal, documented acknowledgement of the policy is required
before using public AI platforms. Additionally, Epiq personnel must
undergo training on the safe and secure use of AI, including awareness
of the risks and potential for misinformation and bias. The Compliance
team must review all requests for access to public AI platforms. It is
important to note that information classified as Internal or
Restricted, as defined in the Data Classification policy, is
prohibited from being uploaded on a public AI platform under any
circumstances.

Source: data/AI Usage Policy_4APR2023.pdf


In [12]:
pretty_query("Everyone uses Chat GPT.  Why doesn't Epiq let me access it?")

Epiq does not allow access to ChatGPT because it is a public AI
platform and its usage is restricted to approved and authorized
personnel. Access to ChatGPT via the API is billed on the basis of
usage. Additionally, Epiq has control over access to the software and
requires authentication through unique user names and passwords. The
relationship between Epiq and the user is that of licensee/licensor.

Source: data/silverman-openai-complaint.pdf, data/AI Usage Policy_4APR2023.pdf, data/doc-6.txt
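The answer above appears to mix in statements from unrelated files (the Silverman complaint and `data/doc-6.txt`), which suggests the retriever returns its top k chunks even when some are barely related to the question. A simple guard, sketched here with toy vectors and an arbitrary threshold of our choosing, is to discard chunks whose similarity falls below a cutoff before they reach the LLM. LangChain retrievers offer a similar similarity-score-threshold search option, but names and defaults vary by version, so treat this pure-Python version as an illustration only.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_above_threshold(query_vec, chunks, threshold, k=4):
    """Return the sources of up to k chunks scoring at least `threshold`, best first."""
    scored = [(cosine(query_vec, c["vec"]), c["source"]) for c in chunks]
    kept = [pair for pair in scored if pair[0] >= threshold]
    kept.sort(reverse=True)  # highest similarity first
    return [source for _, source in kept[:k]]


# Toy 2-d vectors; the file names match the sources cited above, the vectors are made up.
chunks = [
    {"source": "data/AI Usage Policy_4APR2023.pdf", "vec": [0.1, 0.9]},
    {"source": "data/silverman-openai-complaint.pdf", "vec": [0.8, 0.3]},
    {"source": "data/doc-6.txt", "vec": [0.9, 0.2]},
]
query = [0.0, 1.0]
print(retrieve_above_threshold(query, chunks, threshold=0.5))  # only the policy survives
```

The threshold has to be tuned on real questions: too high and relevant chunks are dropped, too low and unrelated files slip back in.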


In [13]:
pretty_query("Please summarize the key parts of Epiq's AI usage policy.")

The key parts of Epiq's AI usage policy include exercising caution and
best judgment when using AI platforms, not using AI platforms for
illegal or unethical purposes, notifying
incidentreporting@epiqglobal.com of any AI output containing
restricted or internal Epiq information, and following the
requirements of the Appropriate Use policy when using AI. The policy
also addresses the risks associated with AI use, such as data leakage,
privacy concerns, biased/inaccurate information, and over-reliance on
AI output. The policy is owned by Epiq's Compliance team and is
reviewed annually. Compliance with the policy is mandatory, and
violations may result in termination of access to Epiq's IT network,
systems, or data. Exceptions to the policy may be considered under
special circumstances. The full policy and additional corporate
policies can be found on One Epiq.

Source: data/AI Usage Policy_4APR2023.pdf


In [14]:
pretty_query("What are the rules at Epiq for the use of private AI tools?")

The rules at Epiq for the use of private AI tools include providing
training on the safe and secure use of AI to all Epiq personnel,
quality checking and fact-checking the raw output and results received
from AI platforms, obtaining approval from Senior Leadership for the
use of public AI platforms, and not uploading any information
classified as Internal or Restricted on a public AI platform. Epiq
personnel must also exercise caution and best judgment when using any
AI platform, not engage in illegal or unethical activities while
utilizing Epiq owned resources, and follow the requirements of the
Appropriate Use policy when using AI. The risks associated with AI use
include data leakage, privacy breaches, biased or inaccurate
information, and over-reliance on AI output.

Source: data/AI Usage Policy_4APR2023.pdf


In [15]:
pretty_query("What risks is Epiq concerned about regarding Generative AI?")

Epiq is concerned about the following risks regarding Generative AI:
1. Data leakage: The deliberate, accidental, or malicious sharing of
restricted or internal Epiq information on AI platforms, which could
be accessed by others. 2. Privacy: The risk of uploading data
containing personal information of third parties, clients, or
employees, which would be a breach of contractual or regulatory
requirements. 3. Biased/inaccurate information: If the initial data
source used to train the AI is incorrect or biased, there is a risk
that the results from the AI platform will also contain inaccuracies
and bias. 4. Over-reliance on AI output: As the use of AI increases,
individuals could become overly reliant on AI for task completion,
analysis, and content creation. 5. Use of Confidential or Proprietary
Information: AI search query results may contain unauthorized
information, such as confidential, sensitive, or proprietary
information belonging to a third party, which could result in legal
act

In [16]:
pretty_query("What should I do if I think that Epiq proprietary information has leaked into a public AI?")

If you think that Epiq proprietary information has leaked into a
public AI, you should notify incidentreporting@epiqglobal.com. It is
important to exercise caution and best judgment when using any AI
platform to protect Epiq's information and not use AI platforms for
purposes that are illegal, unethical, or harmful to Epiq. The use of
public AI platforms is only permitted with approval and authorization
from Senior Leadership within a business area, and information
classified as Internal or Restricted is prohibited from being uploaded
on a public AI platform under any circumstances. There are risks
associated with AI use, including data leakage, privacy breaches,
biased/inaccurate information, and over-reliance on AI output.
Training on the safe and secure use of AI must be provided to all Epiq
personnel, and the raw output and results received from AI platforms
must be quality checked before being used. The use of single tenant
(private) AI environments is permitted for authorized tea

In [17]:
pretty_query("I need to summarize Epiq's AI policy for my team of software developers.  Please provide a summary.")

Epiq's AI policy requires personnel to exercise caution and best
judgment when using AI platforms to protect Epiq's information and not
engage in any illegal or unethical activities. They must also notify
incidentreporting@epiqglobal.com if they become aware of any AI output
containing restricted or internal Epiq information. The policy is
owned by Epiq's Compliance team and is reviewed annually. Violations
of the policy may result in termination of access to Epiq's IT
network, systems, or data. The policy applies to Epiq personnel,
third-party consultants, contractors, vendors, and any individual or
entity with access to the company's information resources. The policy
does not cover ethical and moral risks associated with AI use.

Source: data/AI Usage Policy_4APR2023.pdf


In [18]:
pretty_query ("What are the risks associated with AI use?")

The risks associated with AI use include inaccurate or incomplete
content, the potential for data leakage, privacy breaches, biased or
inaccurate information, and over-reliance on AI output. It is
important to provide training on the safe and secure use of AI and to
quality check the output before using it in reports or deliverables.
The use of public AI platforms is only permitted with approval from
senior leadership and certain restrictions apply, such as not
uploading internal or restricted information. These risks and
guidelines are outlined in the AI Usage Policy of Epiq.

Source: data/AI Usage Policy_4APR2023.pdf
