In [1]:
import os
from langchain.document_loaders import Docx2txtLoader, JSONLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores.faiss import FAISS
from langchain.chat_models import ChatOpenAI
from langchain import PromptTemplate
from langchain.chains import LLMChain
from dotenv import find_dotenv, load_dotenv
import pandas as pd
import logging
import pprint
from helper import generate_hypothetical_embeddings, f_path, PROMPT_TEMPLATE_HYDE
import re


logging.basicConfig(format="%(asctime)s - %(levelname)s - %(message)s", level=logging.INFO)
logger = logging.getLogger(__name__)


load_dotenv(find_dotenv())
query_memory = []


CHUNK_SIZE = 500
CHUNK_OVERLAP = 250
TEMPERATURE = 1
k = 25

In [2]:

def clean_name(officer_name):
    return re.sub(
        r"\b(Detective|Officer|Deputy|Captain|[Cc]pl|Sergeant|Lieutenant|Techn?i?c?i?a?n?|Investigator)\.?\s+",
        "",
        officer_name,
    )

def extract_officer_data(response):
    response = response.split("\n\n")
    officer_data = []
    for line in response:
        officer_dict = {}
        match = re.search(r"Officer Name:\s*(.*)\s*Officer Context:\s*(.*)\s*Officer Role:\s*(.*)", line)
        if match:
            officer_dict["Officer Name"] = match.group(1).strip()
            officer_dict["Officer Context"] = match.group(2).strip()
            officer_dict["Officer Role"] = match.group(3).strip()
            officer_data.append(officer_dict)
    return officer_data
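
The parser above expects all three fields in one blank-line-separated block, so replies that put a blank line between fields or number them ("2) Officer Context: ...") yield no rows. A minimal sketch of a more tolerant line-by-line parser follows. The name `extract_officer_data_tolerant` and the pattern `FIELD_RE` are hypothetical and not part of the notebook, and the sketch keeps only the first line of each field.

```python
import re

# Hypothetical helper, not part of the notebook: collects the three labeled
# fields line by line instead of relying on blank-line-separated blocks.
FIELD_RE = re.compile(
    r"^\s*(?:\d+[).]\s*)?(Officer Name|Officer Context|Officer Role):\s*(.*)$"
)
REQUIRED_FIELDS = {"Officer Name", "Officer Context", "Officer Role"}


def extract_officer_data_tolerant(response):
    officers = []
    current = {}
    for line in response.splitlines():
        match = FIELD_RE.match(line)
        if not match:
            continue  # skip blank lines and anything that is not a labeled field
        field, value = match.group(1), match.group(2).strip()
        if field == "Officer Name" and current:
            # A new name starts a new record, so store the previous one.
            officers.append(current)
            current = {}
        current[field] = value
    if current:
        officers.append(current)
    # Keep only records that carry all three fields (drops truncated replies).
    return [o for o in officers if REQUIRED_FIELDS.issubset(o)]
```

Because incomplete records are dropped, a reply cut off mid-field contributes only its complete entries.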


In [3]:
ROLES_PROMPT = """

As an AI assistant, my job is to assign roles to identified law enforcement personnel based on the context of their mention:

Supervising Officer - Homicide Division:
- Manages the homicide division, overseeing all ongoing investigations and operations in the homicide division.
- Allocates resources and personnel strategically to effectively handle homicide cases.
- Reviews, edits, and approves all reports and paperwork prepared within the homicide division.
- Can be called upon to testify in court regarding department policies or the conduct of officers in the homicide division.

Supervising Officer - Crime Lab Division:
- Oversees the crime lab division, ensuring efficient and accurate analyses of crime scene evidence.
- Coordinates resources and personnel, ensuring the most effective use of lab facilities and equipment.
- Reviews and signs off on scientific reports and other paperwork generated in the crime lab.
- Might testify in court about lab procedures or the conduct of analysts under their supervision.

Supervising Officer - All Other Divisions:
- Supervises investigations or operations in divisions that are not the homicide division or the crime lab division.
- Coordinates the allocation of resources and personnel within divisions that are not the homicide division or crime lab division.
- Reviews and approves all reports and paperwork within a division that is not the homicide division or crime lab division.
- May be called upon to testify about division policies or the conduct of officers under their supervision.

Lead Detective:
- Assumes primary responsibility for significant cases, often involving serious crimes.
- Coordinates investigative efforts, assigning tasks to other detectives and law enforcement officers on the case.
- Conducts or oversees key investigative actions, such as interviewing major witnesses or suspects.
- Plays a key role in decision-making processes regarding case direction and strategy.
- Often presents case updates to supervising officers and may testify in court about their findings.

Detective:
- Conducts detailed investigations into crimes, which can include interviewing witnesses and suspects, gathering and analyzing evidence, and preparing detailed reports.
- Collaborates with other detectives, patrol officers, forensic analysts, and other law enforcement personnel to advance investigations.
- May specialize in certain types of crimes (e.g., property crimes, violent crimes, sex crimes).
- Can be called upon to testify in court about their investigative actions and findings.

Interrogator:
- Specializes in conducting in-depth interviews with suspects, often at the police station after an arrest.
- Uses a variety of interrogation techniques to elicit information, confessions, or clarifying details.
- May testify in court about the interrogation process and the statements made by the suspect.

Officer on Scene:
- Typically the first to respond to a crime scene or incident, often a patrol officer.
- Secures the crime scene, assists victims, and may conduct a preliminary investigation.
- Collects initial evidence and takes witness statements at the scene.
- Writes initial incident reports detailing their observations and actions at the scene.

Arresting Officer:
- Identifies, pursues, and apprehends suspects, making official arrests.
- Writes detailed arrest reports outlining the circumstances and justifications for the arrest.
- May testify in court about the arrest, the suspect's behavior, and any statements the suspect made at the time of the arrest.

Transporting Officer:
- Transports suspects or prisoners between locations (e.g., from the scene of arrest to the police station, or from the police station to court).
- Documents any incidents or notable events that occur during transport.

Booking Officer:
- Processes new detainees at the police station.
- Records the personal information of the suspects and the details of the alleged crime.
- May conduct searches and confiscations of personal items.
- Ensures the proper documentation of all actions and movements of detainees.

Patrol Officer (catch-all):
- Conducts regular patrols and responds to emergency calls.
- Enforces traffic laws and issues citations.
- Takes initial crime reports and provides first response to incidents.

Criminalist:
- Specializes in the scientific analysis of specific types of evidence (e.g., DNA, ballistics, trace evidence, digital forensics).
- Conducts tests and examinations using specialized techniques and equipment.
- Writes detailed reports outlining their methods, findings, and conclusions.
- May be called upon to testify as an expert witness in court, explaining their findings and the scientific principles behind their work.

Crime Lab Analyst:
- Analyzes various types of evidence gathered from the crime scene, including but not limited to DNA, fingerprints, blood samples, and drug substances.
- Utilizes specialized techniques and equipment for the examination of evidence.
- Prepares detailed reports outlining the findings of their analyses.
- May testify in court as expert witnesses to explain their findings and methodologies.

Crime Scene Investigator:
- Arrives at the crime scene to collect, catalog, and preserve physical evidence.
- Works closely with detectives to understand the case context and identify relevant evidence.
- May specialize in certain types of evidence (e.g., biological samples, fingerprints, digital evidence) or particular crime scenes (e.g., home invasions, vehicle thefts).
- Documents the crime scene thoroughly through photographs, sketches, and detailed written reports.

Informant Handler/Coordinator:
- Manages relationships with informants, maintaining regular contact and ensuring their safety.
- Collects and assesses information provided by informants, using it to inform investigative strategies.
- Shares relevant informant-derived information with other detectives and law enforcement personnel working on the case, while protecting the informant's identity.
- May testify in court, with careful measures taken to protect the informant's identity.

Coroners/Medical Examiners Office:
- Conducts autopsies/post-mortem examinations to determine the cause and manner of death.
- Collects and preserves evidence from the body for further examination.
- Works closely with detectives to provide crucial information about the time and cause of death.
- May be called to testify in court to explain the findings of the autopsy.

Expert (catch-all):
- Could be any professional brought in due to their specialized knowledge, such as psychologists, gang experts, ballistics experts, etc.
- Provides specialized advice and/or services that support the investigation.
- May be called to testify in court as an expert witness.

"""

In [4]:
PROMPT_TEMPLATE_MODEL = PromptTemplate(
    input_variables=["question", "docs", "roles"],
    template="""
    As an AI assistant, my role is to meticulously analyze criminal justice documents and extract information about law enforcement personnel.
  
    Query: {question}

    Documents: {docs}

    Roles: {roles}

    The response will contain:

    1) The name of each law enforcement officer or staff member. The individual's name must be prefixed with one of the following titles to count as law enforcement:
       Detective, Sergeant, Lieutenant, Captain, Deputy, Officer, Patrol Officer, Criminalist, Technician, Coroner, or Dr. 
       Please prefix the name with "Officer Name: ". 
       For example, "Officer Name: John Smith".

    2) If available, provide an in-depth description of the context of their mention. 
       If the context induces ambiguity regarding the individual's employment in law enforcement, please make this clear in your response.
       Please prefix this information with "Officer Context: ". 

    3) Review the context to discern the role of the officer.
       Please prefix this information with "Officer Role: "
       For example, "Officer Role: Lead Detective"

    Additional guidelines for the AI assistant:
    - Only derive responses from factual information found within the police reports.
    - If the context of an identified person's mention is not clear in the report, provide their name and note that the context is not specified.
    - Do not extract information about victims and witnesses.
""",
)
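
Item 1 of the prompt asks the model to return only names that carry a law enforcement title, but a model may still return untitled names. A minimal sketch of a post-filter that enforces the same rule on parsed rows is shown below. The title list is copied from the prompt. `TITLES`, `TITLE_RE` and `has_law_enforcement_title` are hypothetical names, not part of the notebook.

```python
import re

# Titles copied from item 1 of the prompt; the helper itself is hypothetical.
TITLES = (
    "Patrol Officer", "Detective", "Sergeant", "Lieutenant", "Captain",
    "Deputy", "Officer", "Criminalist", "Technician", "Coroner", "Dr.",
)
# A title only counts when it starts the name and is followed by whitespace.
TITLE_RE = re.compile(
    r"^\s*(?:" + "|".join(re.escape(t) for t in TITLES) + r")(?=\s)",
    re.IGNORECASE,
)


def has_law_enforcement_title(name):
    """Return True if the name starts with one of the titles in TITLES."""
    return bool(TITLE_RE.match(name))


rows = [
    {"Officer Name": "Detective Debbie Coffee"},
    {"Officer Name": "Tonette Patterson"},
]
titled_rows = [r for r in rows if has_law_enforcement_title(r["Officer Name"])]
```

The filter is deliberately strict: a person whose title appears only in the context field, not in the name, is dropped, so it may be better used to flag rows for review than to delete them.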

In [5]:
def preprocess_document(file_path, embeddings):
    logger.info(f"Processing Word document: {file_path}")

    loader = Docx2txtLoader(file_path)
    text = loader.load()
    logger.info(f"Text loaded from Word document: {file_path}")

    text_splitter = RecursiveCharacterTextSplitter(chunk_size=CHUNK_SIZE, chunk_overlap=CHUNK_OVERLAP)
    docs = text_splitter.split_documents(text)

    db = FAISS.from_documents(docs, embeddings)
    return db
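
With CHUNK_SIZE = 500 and CHUNK_OVERLAP = 250, consecutive chunks share roughly half their text. The sketch below illustrates that overlap with a plain fixed-width sliding window. It is a simplification: RecursiveCharacterTextSplitter splits on separators such as paragraph and line breaks, so its actual chunk boundaries will differ. `sliding_window_chunks` is a hypothetical name.

```python
def sliding_window_chunks(text, chunk_size=500, chunk_overlap=250):
    """Split text into fixed-width windows that overlap by chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    stride = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), stride):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks
```

With a 50% overlap, most of the text is embedded twice, which increases the index size but makes it less likely that a name and its context fall into different chunks.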

In [6]:
from langchain.chat_models import AzureChatOpenAI

def get_response_from_query(db, query, temperature, k, ROLES_PROMPT):
    logger.info("Performing query...")
    
    doc_list = db.similarity_search_with_score(query, k=k)

    # FAISS returns L2 distances by default, so a lower score means a closer match.
    docs = sorted(doc_list, key=lambda x: x[1])

    third = len(docs) // 3

    # Slices of a sorted list are already sorted, so no re-sorting is needed.
    highest_third = docs[:third]
    middle_third = docs[third:2*third]
    lowest_third = docs[2*third:]

    docs = highest_third + lowest_third + middle_third

    docs_page_content = " ".join([d[0].page_content for d in docs])

    # llm = ChatOpenAI(model_name="gpt-3.5-turbo-16k")

    DEPLOYMENT_NAME = "ani"

    BASE_URL = "https://wc-model.openai.azure.com/"
    API_KEY = ""

    llm = AzureChatOpenAI(
        openai_api_base=BASE_URL,
        openai_api_version="2023-05-15",
        deployment_name=DEPLOYMENT_NAME,
        openai_api_key=API_KEY,
        openai_api_type="azure",
        temperature=temperature,  # set on the model; chain.run() does not forward it
    )

    prompt = PROMPT_TEMPLATE_MODEL

    chain = LLMChain(llm=llm, prompt=prompt)
    response = chain.run(roles=ROLES_PROMPT, question=query, docs=docs_page_content)
    print(response)

    return response
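
The reordering step in the function above can be tried in isolation on plain (document, score) pairs. The sketch below assumes the scores are L2 distances, which to my understanding is the default for LangChain's FAISS store, so a lower score means a closer match. `reorder_by_thirds` is a hypothetical name.

```python
def reorder_by_thirds(scored_docs):
    """Order (doc, distance) pairs as closest third, farthest third, middle third.

    Assumes lower distance = more similar.
    """
    ranked = sorted(scored_docs, key=lambda pair: pair[1])
    third = len(ranked) // 3
    closest = ranked[:third]
    middle = ranked[third:2 * third]
    farthest = ranked[2 * third:]  # also absorbs the remainder when len is not divisible by 3
    return closest + farthest + middle


pairs = [("a", 0.1), ("b", 0.2), ("c", 0.3), ("d", 0.4), ("e", 0.5), ("f", 0.6)]
reordered = [doc for doc, _ in reorder_by_thirds(pairs)]
```

For k = 25 this gives 8 closest, 9 farthest and 8 middle documents, in that order.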

In [7]:
QUERIES = [
"Identify all individuals in the transcript, by name, who are directly referred to as officers, sergeants, lieutenants, captains, detectives, homicide officers, and crime lab personnel. Provide the context of their mention, focusing on key events, significant decisions or actions they made, interactions with other individuals, roles or responsibilities they held, noteworthy outcomes or results they achieved, and any significant incidents or episodes they were involved in, if available."
]


# QUERIES = [
#     "Identify individuals, by name, with the specific titles of officers, sergeants, lieutenants, captains, detectives, homicide officers, and crime lab personnel in the transcript. Specifically, provide the context of their mention related to key events in the case, if available.",
#     "List individuals, by name, directly titled as officers, sergeants, lieutenants, captains, detectives, homicide units, and crime lab personnel mentioned in the transcript. Provide the context of their mention in terms of any significant decisions they made or actions they took.",
#     "Locate individuals, by name, directly referred to as officers, sergeants, lieutenants, captains, detectives, homicide units, and crime lab personnel in the transcript. Explain the context of their mention in relation to their interactions with other individuals in the case.",
#     "Highlight individuals, by name, directly titled as officers, sergeants, lieutenants, captains, detectives, homicide units, and crime lab personnel in the transcript. Describe the context of their mention, specifically noting any roles or responsibilities they held in the case.",
#     "Outline individuals, by name, directly identified as officers, sergeants, lieutenants, captains, detectives, homicide units, and crime lab personnel in the transcript. Specify the context of their mention in terms of any noteworthy outcomes or results they achieved.",
#     "Pinpoint individuals, by name, directly labeled as officers, sergeants, lieutenants, captains, detectives, homicide units, and crime lab personnel in the transcript. Provide the context of their mention, particularly emphasizing any significant incidents or episodes they were involved in.",
# ]


def process_query(embeddings):
    iteration_times = 6
    max_retries = 10  

    for file_name in os.listdir(f_path):
        if file_name.endswith(".docx"):
            csv_output_path = os.path.join(f_path, f"{file_name}.csv")
            if os.path.exists(csv_output_path):
                logger.info(f"CSV output for {file_name} already exists. Skipping...")
                continue

            file_path = os.path.join(f_path, file_name)
            output_data = []
            
            for iteration in range(1, iteration_times + 1):  
                db = preprocess_document(file_path, embeddings)
                for query in QUERIES:
                    retries = 0
                    while retries < max_retries:
                        try:
                            officer_data_string = get_response_from_query(db, query, TEMPERATURE, k, ROLES_PROMPT)
                            break  # break out of the while loop if no error occurs
                        except ValueError as e:
                            if "Azure has not provided the response" in str(e):
                                retries += 1
                                logger.warning(f"Retry {retries} for query {query} due to Azure content filter error.")
                            else:
                                raise  # raise any other unexpected error
                        
                    if retries == max_retries:
                        logger.error(f"Max retries reached for query {query}. Skipping...")
                        continue
                    
                    officer_data = extract_officer_data(officer_data_string) 

                    for item in officer_data:
                        item["Query"] = query
                        item["Prompt Template for Hyde"] = PROMPT_TEMPLATE_HYDE
                        item["Prompt Template for Model"] = PROMPT_TEMPLATE_MODEL.template
                        item["Prompt Template for Roles"] = ROLES_PROMPT
                        item["Chunk Size"] = CHUNK_SIZE
                        item["Chunk Overlap"] = CHUNK_OVERLAP
                        item["Temperature"] = TEMPERATURE
                        item["k"] = k
                        item["hyde"] = "1"
                        item["iteration"] = iteration  
                    output_data.extend(officer_data)

                output_df = pd.DataFrame(output_data)
                output_df.to_csv(csv_output_path, index=False)
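
The retry loop inside process_query could be factored into a small helper so that the nesting stays shallow. The following is a minimal sketch. `call_with_retries` and its parameters are hypothetical names. The marker string is the one the notebook already checks for, and the helper returns None once the retries are exhausted so the caller can skip the query.

```python
import logging

logger = logging.getLogger(__name__)


def call_with_retries(func, max_retries=10,
                      retry_marker="Azure has not provided the response"):
    """Call func() until it succeeds, retrying only on the content filter error."""
    for attempt in range(1, max_retries + 1):
        try:
            return func()
        except ValueError as exc:
            if retry_marker not in str(exc):
                raise  # unexpected error: let it propagate
            logger.warning("Retry %d of %d after content filter error", attempt, max_retries)
    return None  # every attempt hit the content filter
```

A caller would wrap the query in a zero-argument function, for example `call_with_retries(lambda: get_response_from_query(db, query, TEMPERATURE, k, ROLES_PROMPT))`, and skip the query when the result is None.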

In [8]:
def main():
    embeddings = generate_hypothetical_embeddings()
    process_query(embeddings)

if __name__ == "__main__":
    main()

2023-08-14 16:55:53,183 - INFO - Processing Word document: ../../data/convictions/transcripts/iterative\Trial Transcript  ALL (pages 1 - 229).docx


2023-08-14 16:55:53,378 - INFO - Text loaded from Word document: ../../data/convictions/transcripts/iterative\Trial Transcript  ALL (pages 1 - 229).docx
2023-08-14 16:56:02,903 - INFO - Loading faiss with AVX2 support.
2023-08-14 16:56:02,906 - INFO - Could not load library with AVX2 support due to:
ModuleNotFoundError("No module named 'faiss.swigfaiss_avx2'")
2023-08-14 16:56:02,908 - INFO - Loading faiss.
2023-08-14 16:56:02,954 - INFO - Successfully loaded faiss.
2023-08-14 16:56:03,094 - INFO - Performing query...
2023-08-14 16:56:20,902 - INFO - Processing Word document: ../../data/convictions/transcripts/iterative\Trial Transcript  ALL (pages 1 - 229).docx
2023-08-14 16:56:21,009 - INFO - Text loaded from Word document: ../../data/convictions/transcripts/iterative\Trial Transcript  ALL (pages 1 - 229).docx


Officer Name: Detective Debbie Coffee
Officer Context: Detective Debbie Coffee assisted in the investigation of an armed robbery and rape that occurred in the French Quarter on April 6th and 7th. She worked on the case involving Tonette Patterson, Hazel Trimble, and Lionel Johnson. She developed the name of a suspect, Robert Jones, in that case.
Officer Role: Lead Detective


2023-08-14 16:56:27,698 - INFO - Performing query...
2023-08-14 16:56:55,804 - INFO - Performing query...
2023-08-14 16:57:27,344 - INFO - Processing Word document: ../../data/convictions/transcripts/iterative\Trial Transcript  ALL (pages 1 - 229).docx


1) Officer Name: Robert Jones
   Officer Context: Robert Jones is a witness who provided his name and address as a potential alibi witness. He was employed at an unspecified location.
   Officer Role: Witness

2) Officer Name: Tonette Patterson
   Officer Context: Tonette Patterson is a witness who was interviewed by detectives. She gave her name and address during the interview and identified the defendant, Robert Jones, in a photographic lineup.
   Officer Role: Witness

3) Officer Name: Hazel Trimble
   Officer Context: Hazel Trimble is a witness who was interviewed by detectives. She also identified the defendant, Robert Jones, in a photographic lineup.
   Officer Role: Witness

4) Officer Name: Lionel Johnson
   Officer Context: Lionel Johnson is a witness who was interviewed by detectives. He was escorted to an interview room by Detective Coffee and identified the defendant, Robert Jones, in a photographic lineup.
   Officer Role: Witness

5) Officer Name: Detective Cade
   Offic

2023-08-14 16:57:27,561 - INFO - Text loaded from Word document: ../../data/convictions/transcripts/iterative\Trial Transcript  ALL (pages 1 - 229).docx
2023-08-14 16:57:34,873 - INFO - Performing query...
2023-08-14 16:57:43,075 - INFO - Processing Word document: ../../data/convictions/transcripts/iterative\Trial Transcript  ALL (pages 1 - 229).docx
2023-08-14 16:57:43,152 - INFO - Text loaded from Word document: ../../data/convictions/transcripts/iterative\Trial Transcript  ALL (pages 1 - 229).docx


Officer Name: Kevin Baker
Officer Context: Mentioned as a crime scene technician.
Officer Role: Crime Scene Investigator


2023-08-14 16:57:50,214 - INFO - Performing query...
2023-08-14 16:58:01,244 - INFO - Processing Word document: ../../data/convictions/transcripts/iterative\Trial Transcript  ALL (pages 1 - 229).docx
2023-08-14 16:58:01,328 - INFO - Text loaded from Word document: ../../data/convictions/transcripts/iterative\Trial Transcript  ALL (pages 1 - 229).docx


1) Officer Name: Lester Jones
2) Officer Context: Lester Jones is mentioned in relation to a physical lineup. It is stated that he was not part of the lineup viewed by the victims. 
3) Officer Role: Not specified in the context provided.


2023-08-14 16:58:08,560 - INFO - Performing query...
2023-08-14 16:58:21,468 - INFO - Processing Word document: ../../data/convictions/transcripts/iterative\Trial Transcript  ALL (pages 1 - 229).docx


Officer Name: Detective Berne11 Nevil

Officer Context: Detective Berne11 Nevil went to the scene of the crime and processed it. He did not go out to the scene of the crime to process it himself, but instead directed the crime scene technician to do so. He also directed the crime scene technician to look for any hair or marks of Robert Jones on a blouse that was confiscated as evidence.

Officer Role: Detective


2023-08-14 16:58:21,734 - INFO - Text loaded from Word document: ../../data/convictions/transcripts/iterative\Trial Transcript  ALL (pages 1 - 229).docx
2023-08-14 16:58:30,073 - INFO - Performing query...


Officer Name: Robert Johnson
Officer Context: The defendant in the case.
Officer Role: The defendant in the case.

Officer Name: Hazel Trimble
Officer Context: Witness in the case.
Officer Role: Witness

Officer Name: Debbie Coffee
Officer Context: Investigating officer in the rape case
Officer Role: Lead Detective

Officer Name: Tonette Patterson
Officer Context: Victim in the rape case
Officer Role: Victim

Officer Name: Lionel Johnson
Officer Context: Husband of Tonette Patterson
Officer Role: Witness

Officer Name: Sergeant Bernell Nevil, Junior
Officer Context: Testifying police officer
Officer Role: Testifying Officer

Officer Name: Lois Jones
Officer Context: Developed the name of a suspect in the case
Officer Role: Investigating Officer

Officer Name: Johnny Donnels
Officer Context: Police artist who does composite sketching
Officer Role: Composite Sketch Artist

Officer Name: Crimestoppers
Officer Context: Provided information leading to the suspect's name
Officer Role: Anonym