# Legal Assistant Bot

This chatbot retrieves context from a proprietary data source and the web to answer questions about federal laws in the United States of America (USA). The proprietary data source is a CSV file of all federal laws and their revision history in the USA. The web data required to respond to the user's questions is retrieved using the You.com API. The chatbot is implemented as an agent using LangChain and LangGraph.

## Install all required packages

In [46]:
%%capture
! pip install langgraph==0.0.59
! pip install pandas==2.2.2
! pip install openai==1.30.3
! pip install langchain==0.2.1
! pip install langchain_community==0.2.1
! pip install langchain_openai==0.1.7
! pip install langchain_text_splitters==0.2.0
! pip install langchain_core==0.2.1
! pip install numpy==1.26.4
! pip install python-dotenv==1.0.1
! pip install faiss-cpu==1.8.0
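The imports used in this notebook are tied to the pinned versions above, and later releases may have moved or renamed some of them. As a minimal sketch, the check below compares a few of the pins against what is actually installed. `PINNED` and `find_version_mismatches` are hypothetical names introduced here for illustration, and the hyphenated distribution names are an assumption about how the packages are registered.

```python
from importlib import metadata

# A subset of the pins from the install cell above (hyphenated distribution names assumed).
PINNED = {
    "langgraph": "0.0.59",
    "langchain": "0.2.1",
    "langchain-community": "0.2.1",
    "langchain-openai": "0.1.7",
    "faiss-cpu": "1.8.0",
}

def find_version_mismatches(pins):
    """Return {package: (expected, installed_or_None)} for every pin that is not met."""
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches

# An empty dict means every listed pin matches the environment.
print(find_version_mismatches(PINNED))
```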

In [54]:
import dotenv
dotenv.load_dotenv(".env", override=True)

True

In [48]:
import pandas as pd
import openai
import langchain
import os
from langchain_community.document_loaders.csv_loader import CSVLoader
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.tools.retriever import create_retriever_tool
from langchain_community.tools.you import YouSearchTool
from langchain_community.utilities.you import YouSearchAPIWrapper
from langchain_openai import ChatOpenAI
from langgraph.prebuilt import chat_agent_executor
from langgraph.checkpoint import MemorySaver

## Load the US Federal Laws dataset into a vector database, then expose it as a LangChain Retriever and Tool

In [49]:
# Let's take a look at our CSV dataset first
import pandas as pd

# The CSV file can be downloaded from: https://www.nature.com/articles/s41597-023-02758-z#Sec3
df = pd.read_csv("us_laws_dataset.csv")

df.head()

Unnamed: 0,row_number,action,Title,sal_volume,sal_page_start,BillCitation,congress_number,chapter,session_number,pl_no,date_of_passage,secondary_date,dates_conflict,Source,URL,alternate_sal_volume,alternate_sal_page_start,has_alternate_sal_citation
0,1,An Act,To regulate the time and manner of administeri...,1,23.0,,1,1.0,1.0,,1789-06-01,,,HeinOnline,,,,False
1,2,An Act,"For laying a duty on goods, wares, and merchan...",1,24.0,,1,2.0,1.0,,1789-07-04,,,HeinOnline,,,,False
2,3,An Act,Imposing duties on tonnage.,1,27.0,,1,3.0,1.0,,1789-07-20,,,HeinOnline,,,,False
3,4,An Act,For establishing an executive department to be...,1,28.0,,1,4.0,1.0,,1789-07-27,,,HeinOnline,,,,False
4,5,An Act,To regulate the collection of the duties impos...,1,29.0,,1,5.0,1.0,,1789-07-31,,,HeinOnline,,,,False


In [50]:
import openai
import langchain
import os

In [51]:
os.environ["YDC_API_KEY"] = "<Insert your YDC API key here>"
os.environ["OPENAI_API_KEY"] = "<Insert your Open AI API key here>"
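Note that if the cell above is run unchanged, the placeholder strings overwrite any keys already loaded from `.env`. A small guard can catch that before any API call is made. This is a sketch only: `REQUIRED_KEYS` and `missing_api_keys` are hypothetical helpers, and the `<Insert` prefix check simply matches the placeholder text used in this notebook.

```python
import os

REQUIRED_KEYS = ("YDC_API_KEY", "OPENAI_API_KEY")

def missing_api_keys(env, required=REQUIRED_KEYS):
    """Return the required keys that are unset, empty, or still a '<Insert ...>' placeholder."""
    missing = []
    for name in required:
        value = env.get(name, "").strip()
        if not value or value.startswith("<Insert"):
            missing.append(name)
    return missing

# An empty list means both keys look usable.
print(missing_api_keys(os.environ))
```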

In [52]:
from langchain_community.document_loaders.csv_loader import CSVLoader

# The CSV file can be downloaded from: https://www.nature.com/articles/s41597-023-02758-z#Sec3
loader = CSVLoader(file_path = "us_laws_dataset.csv")

data = loader.load()
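Each row of the CSV becomes one document whose text is a list of `column: value` lines, which is the format visible in the retrieved documents further down. The sketch below imitates that format with the standard library only. It is not `CSVLoader`'s implementation, and `rows_to_page_contents` is a hypothetical helper.

```python
import csv
import io

def rows_to_page_contents(csv_text):
    """Turn each CSV row into a 'column: value' block, one line per column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return ["\n".join(f"{col}: {val}" for col, val in row.items()) for row in reader]

# A three-column excerpt of one row from the dataset preview above.
sample = "row_number,action,Title\n3,An Act,Imposing duties on tonnage.\n"
print(rows_to_page_contents(sample)[0])
```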

In [55]:
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter

# split the document into chunks, and vectorize these chunks in a FAISS database
text_splitter = RecursiveCharacterTextSplitter(chunk_size = 1000, chunk_overlap = 100)
docs = text_splitter.split_documents(data)
embeddings = OpenAIEmbeddings()
db = FAISS.from_documents(documents=docs, embedding=embeddings)
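To see what `chunk_size` and `chunk_overlap` mean, here is a deliberately simplified sketch: a fixed-size sliding window over characters. `RecursiveCharacterTextSplitter` is smarter than this, since it first tries to split on separators such as paragraph and line breaks, so treat `chunk_text` (a hypothetical helper) as an illustration of the two parameters only.

```python
def chunk_text(text, chunk_size, overlap):
    """Split text into windows of chunk_size characters, each sharing `overlap` characters with the previous one."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks

print(chunk_text("abcdefghij", chunk_size=4, overlap=1))
```

Since each CSV row here is a fairly short document, most rows probably fit in a single 1000-character chunk.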

In [58]:
# test out the similarity search
query = "What laws and acts relate to perjury?"
response = db.similarity_search(query, k=10)
# let's look at the first 3 retrieved docs
response[:3]

[Document(page_content='row_number: 1885\naction: An Act\nTitle: In addition to the act, entitled "An act for the prompt settlement of public accounts," and for the punishment of the crime of perjury\nsal_volume: 3\nsal_page_start: 770\nBillCitation: NA\ncongress_number: 17\nchapter: 37\nsession_number: 2\npl_no: NA\ndate_of_passage: 1823-03-01\nsecondary_date: NA\ndates_conflict: NA\nSource: HeinOnline\nURL: NA\nalternate_sal_volume: NA\nalternate_sal_page_start: NA\nhas_alternate_sal_citation: FALSE', metadata={'source': 'us_laws_dataset.csv', 'row': 1884}),
 Document(page_content='row_number: 38724\naction: An Act\nTitle: An act to permit the use of unsworn declarations under penalty of perjury as evidence in Federal proceedings\nsal_volume: 90\nsal_page_start: 2534\nBillCitation: H.R. 15531\ncongress_number: 94\nchapter: NA\nsession_number: 2\npl_no: 94-550\ndate_of_passage: 1976-10-18\nsecondary_date: NA\ndates_conflict: FALSE\nSource: NA\nURL: https://www.govinfo.gov/content/pkg/
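Conceptually, a similarity search embeds the query and returns the `k` stored vectors closest to it. The toy sketch below does this in plain Python, assuming Euclidean (L2) distance as the metric. The two-dimensional vectors and document names are made up for illustration, and `top_k_by_l2` is a hypothetical helper, not part of FAISS or LangChain.

```python
import math

def top_k_by_l2(query_vec, doc_vecs, k):
    """Return the ids of the k vectors closest to query_vec under Euclidean distance."""
    scored = sorted((math.dist(query_vec, vec), doc_id) for doc_id, vec in doc_vecs.items())
    return [doc_id for _, doc_id in scored[:k]]

# Made-up 2-D "embeddings"; real embeddings have hundreds or thousands of dimensions.
toy_index = {
    "perjury_act": (0.9, 0.1),
    "tonnage_act": (0.1, 0.9),
    "unsworn_declarations_act": (0.8, 0.3),
}
print(top_k_by_l2((1.0, 0.0), toy_index, k=2))
```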

In [59]:
from langchain.tools.retriever import create_retriever_tool

# convert the vector store into a retriever, then wrap the retriever as a tool
db_retriever = db.as_retriever()
db_retriever_tool = create_retriever_tool(
    db_retriever,
    name = "law_dataset_retriever",
    description = "Retrieve relevant context from the US laws dataset."
)

## Instantiating the You.com Tool in LangChain

LangChain provides a wrapper around the You.com API and a You.com Tool.  For more information, please visit: https://python.langchain.com/v0.1/docs/integrations/tools/you/

In [60]:
from langchain_community.tools.you import YouSearchTool
from langchain_community.utilities.you import YouSearchAPIWrapper

api_wrapper = YouSearchAPIWrapper(num_web_results = 10)
ydc_tool = YouSearchTool(api_wrapper=api_wrapper)

In [62]:
# test out the You.com search tool
response = ydc_tool.invoke("Tell me about a recent high-profile case related to antitrust in the USA.")
# let's look at the first 3 results
response[:3]

[Document(page_content='Meijer, Inc. v. Ferring B.V.; Ferring Pharmaceuticals, Inc.; and Aventis Pharmaceuticals [In Re: DDAVP Direct Purchaser Antitrust Litigation] U.S. v. Memphis Board of Realtors', metadata={'url': 'https://www.justice.gov/atr/antitrust-case-filings-alpha', 'thumbnail_url': None, 'title': 'Antitrust Division | Antitrust Case Filings | United States Department of Justice', 'description': 'An official website of the United States government · Official websites use .gov A .gov website belongs to an official government organization in the United States'}),
 Document(page_content="Dentsply International, Inc. v. Antitrust Division of the United States Department of Justice · U.S. v. Freddy Deoliveira · U.S. v. Wilhelm DerMinassian · U.S. v. Eric Descouraux · Leinani Deslandes, Stephanie Turner, et al. v. McDonald's USA, LLC, et al.", metadata={'url': 'https://www.justice.gov/atr/antitrust-case-filings-alpha', 'thumbnail_url': None, 'title': 'Antitrust Division | Antitru

## Instantiate our LLM

In [63]:
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o", temperature=0.5)

## Tying it all together

In [64]:
from langgraph.prebuilt import chat_agent_executor
from langgraph.checkpoint import MemorySaver

# Create a checkpointer to use memory
memory = MemorySaver()
# the vector store representation of the CSV dataset and the You.com Search tool will both be passed as tools to the agent
tools = [db_retriever_tool, ydc_tool]
agent_executor = chat_agent_executor.create_tool_calling_executor(llm, tools, checkpointer=memory)

## Let's try it out!

In [68]:
agent_executor.invoke(input={"messages": "What laws in the US address economic espionage?"}, config={"configurable": {"thread_id": "xyz_789"}})["messages"][-1].content

'Several federal laws in the United States address economic espionage. Here are the key statutes:\n\n1. **Economic Espionage Act of 1996 (EEA)**\n   - **Title 18, U.S. Code, Sections 1831 and 1832**\n   - **Section 1831**: Addresses economic espionage benefiting foreign governments, foreign instrumentalities, or foreign agents.\n   - **Section 1832**: Addresses the theft of trade secrets for economic gain, regardless of the beneficiary.\n   - **Key Provisions**: Criminalizes the theft or misappropriation of trade secrets and provides penalties for those found guilty of such acts.\n   - **Full Text**: [Economic Espionage Act of 1996](https://www.govinfo.gov/content/pkg/STATUTE-110/pdf/STATUTE-110-Pg3488.pdf)\n\n2. **An Act to amend title 18, United States Code, to provide for increased penalties for foreign and economic espionage, and for other purposes**\n   - **Public Law Number**: 112-269\n   - **Bill Citation**: H.R. 6029\n   - **Date of Passage**: January 14, 2013\n   - **Details**

In [69]:
agent_executor.invoke(input={"messages": "What is the most famous US Supreme Court case related to economic espionage?"}, config={"configurable": {"thread_id": "xyz_789"}})["messages"][-1].content

"The most notable U.S. Supreme Court case related to economic espionage is **Totten v. United States** (92 U.S. 105, 1876). This case set significant precedents regarding judicial jurisdiction in espionage cases and the State Secrets Privilege.\n\n### Key Details:\n- **Case Name**: Totten v. United States\n- **Citation**: 92 U.S. 105 (1876)\n- **Decision**: The Supreme Court ruled that certain secret contracts could not be publicly reviewed by courts, thereby precluding judicial review in cases where success depends on the existence of a secret espionage relationship with the government.\n- **Significance**: This case was an important precursor to the court's 1953 decision in United States v. Reynolds, which recognized the State Secrets Privilege. The ruling in Totten was later referenced and its holding expanded in the 2005 case of Tenet v. Doe and General Dynamics Corp. v. United States.\n\n### Further Reading:\n- [Totten v. United States - Wikipedia](https://en.wikipedia.org/wiki/To

In [70]:
agent_executor.invoke(input={"messages": "Based on your knowledge of federal laws in the US pertaining to economic espionage, were there any other laws that could have been applied in this case?"}, config={"configurable": {"thread_id": "xyz_789"}})["messages"][-1].content

'In addition to the principles established in **Totten v. United States**, several federal laws could potentially apply to cases involving economic espionage. Here are some key statutes:\n\n1. **Economic Espionage Act of 1996 (EEA)**\n   - **Title 18, U.S. Code, Sections 1831 and 1832**\n   - **Section 1831**: Addresses economic espionage benefiting foreign governments, foreign instrumentalities, or foreign agents.\n   - **Section 1832**: Addresses the theft of trade secrets for economic gain, regardless of the beneficiary.\n   - **Key Provisions**: Criminalizes the theft or misappropriation of trade secrets and provides penalties for those found guilty of such acts.\n\n2. **Espionage Act of 1917**\n   - **Title 18, U.S. Code, Sections 793-798**\n   - **Section 793**: Deals with gathering, transmitting, or losing defense information.\n   - **Section 794**: Addresses espionage activities benefiting foreign governments.\n   - **Section 798**: Concerns the disclosure of classified informa
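The follow-up questions above can refer to "this case" because all three calls share the same `thread_id`, so the checkpointer supplies the earlier messages. The toy class below sketches that idea with a dictionary keyed by thread ID. It is not how `MemorySaver` is implemented, and `ToyThreadMemory` is a hypothetical name used only for illustration.

```python
class ToyThreadMemory:
    """Keeps a separate message history for each thread ID."""

    def __init__(self):
        self._threads = {}

    def append(self, thread_id, role, content):
        # Create the thread's history on first use, then add the message.
        self._threads.setdefault(thread_id, []).append((role, content))

    def history(self, thread_id):
        # Return a copy so callers cannot mutate the stored history.
        return list(self._threads.get(thread_id, []))

memory_demo = ToyThreadMemory()
memory_demo.append("xyz_789", "user", "What laws in the US address economic espionage?")
memory_demo.append("xyz_789", "user", "Were there any other laws that could have been applied in this case?")

print(len(memory_demo.history("xyz_789")))  # both messages are on the same thread
print(memory_demo.history("another_thread"))  # a different thread starts empty
```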

## Let's create a Python class that encapsulates the code above.  This will enable users to easily create chatbots for new use cases with custom CSV datasets

In [72]:
import secrets

class CSV_QA_Bot:
    def __init__(self, llm: ChatOpenAI, csv_files: list[str], num_web_results_to_fetch: int = 10):
        self._llm = llm
        
        docs = self._load_csv_files(csv_files)
        
        # split the docs into chunks, vectorize the chunks and load them into a vector store
        db = self._create_vector_store(docs)
        
        # create a retriever from the vector store
        self._faiss_retriever = db.as_retriever()
        
        # convert this retriever into a LangChain tool
        self._faiss_retriever_tool = create_retriever_tool(
            self._faiss_retriever,
            name = "custom_dataset_retriever",
            description = "Retrieve relevant context from custom dataset."
        )
        
        # instantiate the YDC search tool in LangChain
        self._ydc_api_wrapper = YouSearchAPIWrapper(num_web_results=num_web_results_to_fetch)
        self._ydc_search_tool = YouSearchTool(api_wrapper=self._ydc_api_wrapper)
        
        
        # create a list of tools that will be supplied to the LangChain agent
        self._tools = [self._faiss_retriever_tool, self._ydc_search_tool]
        
        # Create a checkpointer to use memory
        self._memory = MemorySaver()
        
        # create the agent executor
        self._agent_executor = chat_agent_executor.create_tool_calling_executor(self._llm, self._tools, checkpointer=self._memory)
        
        # generate a thread ID to keep track of conversation history
        self._thread_id = self._generate_thread_id()

    def _load_csv_files(self, csv_files: list[str]) -> list:
        docs = []
        for file in csv_files:
            data_loader = CSVLoader(file)
            docs.extend(data_loader.load())
        return docs
    
    def _create_vector_store(self, docs: list) -> FAISS:
        text_splitter = RecursiveCharacterTextSplitter(chunk_size = 1000, chunk_overlap = 100)
        chunked_docs = text_splitter.split_documents(docs)
        embeddings = OpenAIEmbeddings()
        return FAISS.from_documents(documents=chunked_docs, embedding=embeddings)

    def _generate_thread_id(self) -> str:
        thread_id = secrets.token_urlsafe(16)
        return thread_id
    
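    def invoke_bot(self, input_str: str) -> str:
        # named `inputs` to avoid shadowing the built-in `input`
        inputs = {"messages": input_str}
        config = {"configurable": {"thread_id": self._thread_id}}
        output = self._agent_executor.invoke(input=inputs, config=config)
        # the last message in the returned state is the agent's final answer
        return output["messages"][-1].content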
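Each `CSV_QA_Bot` instance generates its own thread ID, so two bots created in the same session keep separate conversation histories. The snippet below shows what `secrets.token_urlsafe(16)` produces. `new_thread_id` is a hypothetical standalone version of `_generate_thread_id`, written here only so it can be run without the rest of the class.

```python
import secrets

def new_thread_id(num_bytes=16):
    """Return a random URL-safe string built from num_bytes of randomness."""
    return secrets.token_urlsafe(num_bytes)

first_id = new_thread_id()
second_id = new_thread_id()

# 16 random bytes encode to 22 URL-safe characters; two calls should not collide in practice.
print(len(first_id), first_id != second_id)
```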

## Let's try it out!

In [73]:
llm = ChatOpenAI(model="gpt-4o", temperature=0.5)
conversational_agent = CSV_QA_Bot(llm, csv_files=["us_laws_dataset.csv"])

In [74]:
conversational_agent.invoke_bot("What laws in the USA address insider trading?")

'In the United States, insider trading is addressed through a combination of statutes, regulations, and case law. Here are some key laws and regulations that govern insider trading:\n\n1. **Securities Exchange Act of 1934**:\n   - **Section 10(b)**: Prohibits the use of any "manipulative or deceptive device" in connection with the purchase or sale of any security.\n   - **Rule 10b-5**: Enacted under Section 10(b), this rule specifically prohibits fraud, misrepresentation, and deceit in securities transactions. It is one of the primary tools used to prosecute insider trading.\n\n2. **Insider Trading Sanctions Act of 1984**:\n   - This act increased the penalties for insider trading, including civil fines up to three times the profit gained or loss avoided.\n\n3. **Insider Trading and Securities Fraud Enforcement Act of 1988**:\n   - This law further strengthened the penalties and enforcement mechanisms for insider trading, including increasing the maximum prison sentence and fines.\n\n4

In [75]:
conversational_agent.invoke_bot("What is the most famous US Supreme Court case of insider trading?")

'The most famous U.S. Supreme Court case involving insider trading is **United States v. O\'Hagan** (1997).\n\n### United States v. O\'Hagan (521 U.S. 642)\n#### Background:\n- **Defendant**: James O\'Hagan, a partner at the law firm Dorsey & Whitney.\n- **Facts**: O\'Hagan used confidential information obtained through his firm\'s representation of a company involved in a takeover bid to purchase stock options in the target company. He made substantial profits from these trades.\n\n#### Legal Issue:\n- The primary issue was whether O\'Hagan\'s actions constituted insider trading under the "misappropriation theory."\n\n#### Supreme Court\'s Decision:\n- The Court upheld the "misappropriation theory" of insider trading.\n- **Misappropriation Theory**: This theory asserts that a person commits fraud "in connection with" a securities transaction, and thereby violates Section 10(b) and Rule 10b-5, when he misappropriates confidential information for securities trading purposes, in breach o

In [76]:
conversational_agent.invoke_bot("Based on your knowledge of federal laws in the USA related to insider trading, were there any other laws that could have been applied in this case?")

'In addition to the "misappropriation theory" under Section 10(b) of the Securities Exchange Act of 1934 and Rule 10b-5, which were central to the United States v. O\'Hagan case, several other federal laws and regulations could potentially be applied in cases of insider trading. Here are some key ones:\n\n### 1. Securities Act of 1933\n- **Section 17(a)**: Prohibits fraud in the offer or sale of securities. Although primarily focused on securities offerings, it can be applied to fraudulent activities related to insider trading.\n\n### 2. Securities Exchange Act of 1934\n- **Section 10(b)**: Prohibits the use of any "manipulative or deceptive device" in connection with the purchase or sale of any security.\n- **Rule 10b-5**: Enacted under Section 10(b), this rule specifically prohibits fraud, misrepresentation, and deceit in securities transactions.\n- **Section 14(e)**: Prohibits fraudulent, deceptive, or manipulative acts in connection with any tender offer. This could be relevant if 