# Legal Assistant Bot

This chatbot retrieves context from a proprietary data source and the web to answer questions about federal laws in the United States of America (USA).  The proprietary data source is a CSV file of all federal laws in the USA and their revision history.  The web data needed to answer the user's questions is retrieved using the You.com API.  The chatbot is implemented as an agent using LangChain and LangGraph.

## Install all required packages

In [1]:
%%capture
! pip install langgraph==0.0.59
! pip install pandas==2.2.2
! pip install openai==1.30.3
! pip install langchain==0.2.1
! pip install langchain_community==0.2.1
! pip install langchain_openai==0.1.7
! pip install langchain_text_splitters==0.2.0
! pip install langchain_core==0.2.1
! pip install numpy==1.26.4
! pip install python-dotenv==1.0.1
! pip install faiss-cpu==1.8.0

## Load the US Federal Laws dataset and build a vector database from it, which will then be converted into a LangChain Retriever and Tool

In [2]:
# Let's take a look at our CSV dataset first
import pandas as pd

# The CSV file can be downloaded from: https://www.nature.com/articles/s41597-023-02758-z#Sec3
df = pd.read_csv("us_laws_dataset.csv")

df.head()

Unnamed: 0,row_number,action,Title,sal_volume,sal_page_start,BillCitation,congress_number,chapter,session_number,pl_no,date_of_passage,secondary_date,dates_conflict,Source,URL,alternate_sal_volume,alternate_sal_page_start,has_alternate_sal_citation
0,1,An Act,To regulate the time and manner of administeri...,1,23.0,,1,1.0,1.0,,1789-06-01,,,HeinOnline,,,,False
1,2,An Act,"For laying a duty on goods, wares, and merchan...",1,24.0,,1,2.0,1.0,,1789-07-04,,,HeinOnline,,,,False
2,3,An Act,Imposing duties on tonnage.,1,27.0,,1,3.0,1.0,,1789-07-20,,,HeinOnline,,,,False
3,4,An Act,For establishing an executive department to be...,1,28.0,,1,4.0,1.0,,1789-07-27,,,HeinOnline,,,,False
4,5,An Act,To regulate the collection of the duties impos...,1,29.0,,1,5.0,1.0,,1789-07-31,,,HeinOnline,,,,False


In [3]:
import openai
import langchain
import os

In [5]:
# The YDC_API_KEY and OPENAI_API_KEY should be defined in a .env file
# Let's load the API keys in from the .env file
import dotenv
dotenv.load_dotenv(".env", override=True)

True
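Before moving on, it can help to confirm that both keys were actually loaded, since `load_dotenv` returns `True` as soon as it sets at least one variable. The check below is a minimal sketch; the helper name `find_missing_keys` is hypothetical and not part of `python-dotenv`.

```python
import os

# The two keys this notebook expects to find in the .env file
REQUIRED_KEYS = ("YDC_API_KEY", "OPENAI_API_KEY")


def find_missing_keys(env, required=REQUIRED_KEYS):
    """Return the names in `required` that are unset or empty in `env`."""
    return [name for name in required if not env.get(name)]


missing = find_missing_keys(os.environ)
if missing:
    print("Missing API keys:", ", ".join(missing))
```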

In [6]:
from langchain_community.document_loaders.csv_loader import CSVLoader

# The CSV file can be downloaded from: https://www.nature.com/articles/s41597-023-02758-z#Sec3
loader = CSVLoader(file_path = "us_laws_dataset.csv")

data = loader.load()
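Judging by the documents retrieved later in this notebook, each CSV row becomes one document whose text is a list of `column: value` lines, with the file name and row index stored as metadata. The sketch below mimics that layout with the standard library only. It is an illustration of the format, not `CSVLoader`'s implementation, and the function name `rows_to_documents` and the three-column sample are assumptions.

```python
import csv
import io


def rows_to_documents(csv_text, source):
    """Turn each CSV row into a dict with 'page_content' and 'metadata'."""
    reader = csv.DictReader(io.StringIO(csv_text))
    documents = []
    for row_index, row in enumerate(reader):
        # One "column: value" line per field, joined with newlines
        page_content = "\n".join(f"{column}: {value}" for column, value in row.items())
        documents.append(
            {"page_content": page_content, "metadata": {"source": source, "row": row_index}}
        )
    return documents


# A made-up three-column excerpt, for illustration only
sample_csv = "row_number,action,Title\n3,An Act,Imposing duties on tonnage.\n"
sample_docs = rows_to_documents(sample_csv, "us_laws_dataset.csv")
print(sample_docs[0]["page_content"])
```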

In [7]:
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

# split the document into chunks, and vectorize these chunks in a FAISS database
text_splitter = RecursiveCharacterTextSplitter(chunk_size = 1000, chunk_overlap = 100)
docs = text_splitter.split_documents(data)
embeddings = OpenAIEmbeddings()
db = FAISS.from_documents(documents=docs, embedding=embeddings)

In [8]:
# test out the similarity search
query = "What laws and acts relate to perjury?"
response = db.similarity_search(query, k=10)
# let's look at the first 3 retrieved docs
response[:3]

[Document(page_content='row_number: 1885\naction: An Act\nTitle: In addition to the act, entitled "An act for the prompt settlement of public accounts," and for the punishment of the crime of perjury\nsal_volume: 3\nsal_page_start: 770\nBillCitation: NA\ncongress_number: 17\nchapter: 37\nsession_number: 2\npl_no: NA\ndate_of_passage: 1823-03-01\nsecondary_date: NA\ndates_conflict: NA\nSource: HeinOnline\nURL: NA\nalternate_sal_volume: NA\nalternate_sal_page_start: NA\nhas_alternate_sal_citation: FALSE', metadata={'source': 'us_laws_dataset.csv', 'row': 1884}),
 Document(page_content='row_number: 38724\naction: An Act\nTitle: An act to permit the use of unsworn declarations under penalty of perjury as evidence in Federal proceedings\nsal_volume: 90\nsal_page_start: 2534\nBillCitation: H.R. 15531\ncongress_number: 94\nchapter: NA\nsession_number: 2\npl_no: 94-550\ndate_of_passage: 1976-10-18\nsecondary_date: NA\ndates_conflict: FALSE\nSource: NA\nURL: https://www.govinfo.gov/content/pkg/
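Conceptually, `similarity_search` embeds the query and returns the `k` stored chunks whose vectors lie closest to it. The sketch below shows that idea as a brute-force nearest-neighbour search by Euclidean distance. The 3-dimensional vectors and document labels are invented for illustration; real embeddings have far more dimensions, and FAISS uses optimized index structures rather than this loop.

```python
import math


def top_k_nearest(query_vector, indexed_vectors, k):
    """Return the ids of the k vectors closest to query_vector (Euclidean distance)."""
    scored = [
        (math.dist(query_vector, vector), doc_id)
        for doc_id, vector in indexed_vectors.items()
    ]
    scored.sort()  # smallest distance first
    return [doc_id for _, doc_id in scored[:k]]


# Made-up toy embeddings, not real model output
toy_index = {
    "perjury_act_1823": (0.9, 0.1, 0.0),
    "tonnage_duties_1789": (0.0, 0.2, 0.9),
    "unsworn_declarations_1976": (0.8, 0.3, 0.1),
}
toy_query = (1.0, 0.0, 0.0)

nearest = top_k_nearest(toy_query, toy_index, k=2)
print(nearest)
```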

In [31]:
from langchain.tools.retriever import create_retriever_tool

# create a retriever from the vector store and wrap it as a tool
db_retriever = db.as_retriever()
db_retriever_tool = create_retriever_tool(
    db_retriever,
    name = "law_dataset_retriever",
    description = "Retrieve relevant context from the US laws dataset."
)

## Instantiating the You.com Tool in LangChain

LangChain provides a wrapper around the You.com API and a You.com Tool.  For more information, please visit: https://python.langchain.com/v0.1/docs/integrations/tools/you/

In [32]:
from langchain_community.tools.you import YouSearchTool
from langchain_community.utilities.you import YouSearchAPIWrapper

api_wrapper = YouSearchAPIWrapper(num_web_results = 10)
ydc_tool = YouSearchTool(api_wrapper=api_wrapper)

In [33]:
# test out the You.com search tool
response = ydc_tool.invoke("Tell me about a recent high-profile case related to antitrust in the USA.")
# let's look at the first 3 results
response[:3]

[Document(page_content='Meijer, Inc. v. Ferring B.V.; Ferring Pharmaceuticals, Inc.; and Aventis Pharmaceuticals [In Re: DDAVP Direct Purchaser Antitrust Litigation] U.S. v. Memphis Board of Realtors', metadata={'url': 'https://www.justice.gov/atr/antitrust-case-filings-alpha', 'thumbnail_url': None, 'title': 'Antitrust Division | Antitrust Case Filings | United States Department of Justice', 'description': 'An official website of the United States government · Official websites use .gov A .gov website belongs to an official government organization in the United States'}),
 Document(page_content="Dentsply International, Inc. v. Antitrust Division of the United States Department of Justice · U.S. v. Freddy Deoliveira · U.S. v. Wilhelm DerMinassian · U.S. v. Eric Descouraux · Leinani Deslandes, Stephanie Turner, et al. v. McDonald's USA, LLC, et al.", metadata={'url': 'https://www.justice.gov/atr/antitrust-case-filings-alpha', 'thumbnail_url': None, 'title': 'Antitrust Division | Antitru

## Instantiate our LLM

In [42]:
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o", temperature=0.5)

## Tying it all together

In [40]:
from langgraph.prebuilt import chat_agent_executor
from langgraph.checkpoint import MemorySaver

# Create a checkpointer to use memory
memory = MemorySaver()
# the vector store representation of the CSV dataset and the You.com Search tool will both be passed as tools to the agent
tools = [db_retriever_tool, ydc_tool]
agent_executor = chat_agent_executor.create_tool_calling_executor(llm, tools, checkpointer=memory)
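The checkpointer is what lets follow-up questions refer back to earlier turns: state is stored per `thread_id`, so calls that share an id share a conversation. The toy class below illustrates that keying idea only. `ToyThreadMemory` is a hypothetical stand-in, not how `MemorySaver` is implemented, and it stores plain strings rather than graph state.

```python
from collections import defaultdict


class ToyThreadMemory:
    """Toy stand-in for a checkpointer: one message list per thread_id."""

    def __init__(self):
        self._messages_by_thread = defaultdict(list)

    def append(self, thread_id, message):
        self._messages_by_thread[thread_id].append(message)

    def history(self, thread_id):
        # Return a copy so callers cannot mutate the stored history
        return list(self._messages_by_thread.get(thread_id, []))


toy_memory = ToyThreadMemory()
toy_memory.append("xyz_789", "What laws in the US address economic espionage?")
toy_memory.append("xyz_789", "Were there any other laws that could have been applied?")
toy_memory.append("another_thread", "What laws and acts relate to perjury?")

# Only the two messages sent on thread "xyz_789" are returned
print(toy_memory.history("xyz_789"))
```

Reusing the same `thread_id` continues a conversation; switching to a new id starts from an empty history.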

## Let's try it out!

In [41]:
agent_executor.invoke(input={"messages": "What laws in the US address economic espionage?"}, config={"configurable": {"thread_id": "xyz_789"}})["messages"][-1].content

'Several laws in the United States address economic espionage:\n\n1. **Economic Espionage Act of 1996**:\n   - **Public Law Number**: 104-294\n   - **Date of Passage**: October 11, 1996\n   - **Details**: This Act criminalizes the theft or misappropriation of trade secrets with the intent or knowledge that the offense will benefit a foreign government, foreign instrumentality, or foreign agent.\n   - **Source**: [Economic Espionage Act of 1996](https://www.govinfo.gov/content/pkg/STATUTE-110/pdf/STATUTE-110-Pg3488.pdf)\n\n2. **An act to clarify the scope of the Economic Espionage Act of 1996**:\n   - **Public Law Number**: 112-236\n   - **Date of Passage**: December 28, 2012\n   - **Details**: This Act clarifies and expands the scope of the Economic Espionage Act of 1996.\n   - **Source**: [Clarification of the Economic Espionage Act of 1996](https://www.govinfo.gov/content/pkg/STATUTE-126/html/STATUTE-126-Pg1627.htm)\n\n3. **An act to amend title 18, United States Code, to provide for

In [37]:
agent_executor.invoke(input={"messages": "What is the most famous US Supreme Court case related to economic espionage?"}, config={"configurable": {"thread_id": "xyz_789"}})["messages"][-1].content

'The most notable U.S. Supreme Court case related to economic espionage is **Totten v. United States, 92 U.S. 105 (1876)**. This case is significant because it established crucial principles regarding judicial jurisdiction in espionage cases and the State Secrets Privilege.\n\n### Key Points of Totten v. United States:\n- **Background**: The case involved a contract claim against the government by a Civil War spy who sought compensation for espionage services rendered.\n- **Ruling**: The Supreme Court ruled that certain secret contracts cannot be publicly reviewed by courts. The Court stated that public policy forbids the maintenance of any suit in a court of justice if the trial would inevitably lead to the disclosure of matters that the law regards as confidential.\n- **Impact**: The decision set a precedent that was later referenced and expanded in subsequent cases, such as United States v. Reynolds (1953), Tenet v. Doe (2005), and General Dynamics Corp. v. United States, which furt

In [38]:
agent_executor.invoke(input={"messages": "Based on your knowledge of federal laws in the US pertaining to economic espionage, were there any other laws that could have been applied in this case?"}, config={"configurable": {"thread_id": "xyz_789"}})["messages"][-1].content

'In addition to the principles established in **Totten v. United States**, other federal laws pertaining to economic espionage that could have been relevant include:\n\n1. **Economic Espionage Act of 1996 (EEA)**:\n   - **Title 18, U.S. Code, Sections 1831 and 1832**: These sections criminalize the theft or misappropriation of trade secrets, particularly when the theft benefits a foreign government, foreign instrumentality, or foreign agent.\n   - **Section 1831**: Specifically addresses economic espionage intended to benefit foreign entities.\n   - **Section 1832**: Addresses theft of trade secrets for commercial or economic advantage.\n\n2. **Espionage Act of 1917**:\n   - **Title 18, U.S. Code, Sections 793-798**: These sections cover various acts of espionage, including the gathering, transmitting, or losing of defense information.\n   - **Section 793**: Deals with gathering, transmitting, or losing defense information.\n   - **Section 794**: Covers espionage during wartime.\n   - 

## Let's create a Python class that encapsulates the code above.  This will enable users to easily create chatbots for new use cases with custom CSV datasets

In [26]:
import secrets

class CSV_QA_Bot:
    def __init__(self, llm: ChatOpenAI, csv_files: list[str], num_web_results_to_fetch: int = 10):
        self._llm = llm
        
        docs = self._load_csv_files(csv_files)
        
        # split the docs into chunks, vectorize the chunks and load them into a vector store
        db = self._create_vector_store(docs)
        
        # create a retriever from the vector store
        self._faiss_retriever = db.as_retriever()
        
        # convert this retriever into a LangChain tool
        self._faiss_retriever_tool = create_retriever_tool(
            self._faiss_retriever,
            name = "custom_dataset_retriever",
            description = "Retrieve relevant context from custom dataset."
        )
        
        # instantiate the YDC search tool in LangChain
        self._ydc_api_wrapper = YouSearchAPIWrapper(num_web_results=num_web_results_to_fetch)
        self._ydc_search_tool = YouSearchTool(api_wrapper=self._ydc_api_wrapper)
        
        
        # create a list of tools that will be supplied to the LangChain agent
        self._tools = [self._faiss_retriever_tool, self._ydc_search_tool]
        
        # Create a checkpointer to use memory
        self._memory = MemorySaver()
        
        # create the agent executor
        self._agent_executor = chat_agent_executor.create_tool_calling_executor(self._llm, self._tools, checkpointer=self._memory)
        
        # generate a thread ID to keep track of conversation history
        self._thread_id = self._generate_thread_id()

    def _load_csv_files(self, csv_files: list[str]) -> list:
        docs = []
        for file in csv_files:
            data_loader = CSVLoader(file)
            docs.extend(data_loader.load())
        return docs
    
    def _create_vector_store(self, docs: list) -> FAISS:
        text_splitter = RecursiveCharacterTextSplitter(chunk_size = 1000, chunk_overlap = 100)
        chunked_docs = text_splitter.split_documents(docs)
        embeddings = OpenAIEmbeddings()
        return FAISS.from_documents(documents=chunked_docs, embedding=embeddings)

    def _generate_thread_id(self) -> str:
        thread_id = secrets.token_urlsafe(16)
        return thread_id
    
    def invoke_bot(self, input_str: str) -> str:
        # named `inputs` to avoid shadowing the built-in input()
        inputs = {"messages": input_str}
        config = {"configurable": {"thread_id": self._thread_id}}
        output = self._agent_executor.invoke(input=inputs, config=config)
        return output["messages"][-1].content
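Each `CSV_QA_Bot` instance generates its thread ID once, in `__init__`, so every call to `invoke_bot` on that instance belongs to the same conversation. The snippet below isolates the ID generation to show that each call to `secrets.token_urlsafe` yields a fresh random string. The helper name `new_thread_id` is hypothetical and only mirrors what `_generate_thread_id` does.

```python
import secrets


def new_thread_id(num_bytes=16):
    """Return a random URL-safe string suitable as a conversation thread ID."""
    return secrets.token_urlsafe(num_bytes)


first_thread_id = new_thread_id()
second_thread_id = new_thread_id()

# Two independently generated IDs should differ, so the conversations stay separate
print(first_thread_id != second_thread_id)
```

Creating a second `CSV_QA_Bot` instance is therefore one way to start a conversation with no prior history, at the cost of rebuilding the vector store.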

## Let's try it out!

In [27]:
llm = ChatOpenAI(model="gpt-4o", temperature=0.5)
conversational_agent = CSV_QA_Bot(llm, csv_files=["us_laws_dataset.csv"])

In [28]:
conversational_agent.invoke_bot("What laws in the USA address insider trading?")

'In the United States, several laws address insider trading, aiming to ensure fair and transparent securities markets. Here are some key pieces of legislation:\n\n1. **Securities Exchange Act of 1934**:\n   - **Section 10(b)**: This section prohibits any manipulative or deceptive device or contrivance in connection with the purchase or sale of any security. Rule 10b-5, enacted under this section, explicitly prohibits fraud, misrepresentation, and deceit in securities trading.\n   - **Section 16(b)**: This section requires company insiders to return any profits made from the purchase and sale of company stock within a six-month period, known as "short-swing" profits.\n\n2. **Securities Act of 1933**:\n   - This act focuses on the initial sale of securities (primary market) and requires issuers to provide full and fair disclosure of material information, thus preventing fraud in the sale of securities.\n\n3. **Insider Trading Sanctions Act of 1984**:\n   - This act allows the Securities 

In [29]:
conversational_agent.invoke_bot("What is the most famous US Supreme Court case of insider trading?")

'One of the most famous U.S. Supreme Court cases involving insider trading is **United States v. O\'Hagan**, 521 U.S. 642 (1997). This landmark case established the "misappropriation theory" of insider trading.\n\n### United States v. O\'Hagan (1997)\n\n**Background**:\n- James O\'Hagan was a partner at the law firm Dorsey & Whitney, which was representing Grand Met in its attempt to take over Pillsbury. O\'Hagan did not work on this case directly but learned about the takeover attempt.\n- Using this non-public information, O\'Hagan purchased stock options for Pillsbury and profited significantly when the takeover bid was announced and the stock price rose.\n\n**Legal Issue**:\n- The central question was whether O\'Hagan\'s actions constituted securities fraud under Section 10(b) of the Securities Exchange Act of 1934 and SEC Rule 10b-5.\n\n**Supreme Court Decision**:\n- The Court ruled in favor of the United States, holding that O\'Hagan\'s actions violated the securities laws under t

In [30]:
conversational_agent.invoke_bot("Based on your knowledge of federal laws in the USA related to insider trading, were there any other laws that could have been applied in this case?")

'In addition to the laws directly applied in **United States v. O\'Hagan**, several other federal laws and regulations could have been relevant in prosecuting insider trading cases. Here are some additional laws and regulations that could have been applied:\n\n### 1. **Securities Act of 1933**\n   - **Section 17(a)**: This section prohibits fraud in the offer or sale of securities. It is broader than Rule 10b-5 and could potentially be used to address fraudulent activities related to insider trading.\n\n### 2. **Securities Exchange Act of 1934**\n   - **Section 10(b)**: This section, along with Rule 10b-5, was indeed applied in the O\'Hagan case. It prohibits any manipulative or deceptive device in connection with the purchase or sale of any security.\n   - **Section 16(b)**: This section requires insiders to return any profits made from the purchase and sale of company stock within a six-month period, although it is more focused on "short-swing" profits rather than insider trading bas