### To Do

1. Make jury trial results easier to extract and visualize, then do the same for other variables
2. Make sure the retrieved context plus the system prompt fits in the model's context window
3. Add a helper function for manual labeling
4. Add a reranking algorithm
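
Item 2 could be handled with a simple budget check before the model is queried. The sketch below is hypothetical and not part of the existing `utils` or `extractors` code. `estimate_tokens` assumes roughly four characters per token, which is only a heuristic; the tokenizer of the model actually in use would be more accurate. The names `fit_chunks_to_budget`, `max_tokens`, and `reserve_for_answer` are illustrative.

```python
from typing import List


def estimate_tokens(text: str) -> int:
    """Rough token estimate: about 4 characters per token (heuristic, not a tokenizer)."""
    return (len(text) + 3) // 4  # ceiling division


def fit_chunks_to_budget(
    system_prompt: str,
    chunks: List[str],
    max_tokens: int,
    reserve_for_answer: int = 256,
) -> List[str]:
    """Keep chunks, in their given order, until the context budget is used up."""
    # Assumed budget: window size minus room for the answer and the system prompt
    budget = max_tokens - reserve_for_answer - estimate_tokens(system_prompt)
    kept: List[str] = []
    for chunk in chunks:
        cost = estimate_tokens(chunk)
        if cost > budget:
            break  # stop at the first chunk that does not fit
        kept.append(chunk)
        budget -= cost
    return kept
```

Stopping at the first chunk that does not fit keeps the highest-ranked chunks together, which matters if the chunks arrive sorted by relevance.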

Reranking
- https://adasci.org/a-hands-on-guide-to-enhance-rag-with-re-ranking/
- https://techcommunity.microsoft.com/t5/microsoft-developer-community/doing-rag-vector-search-is-not-enough/ba-p/4161073
- https://community.openai.com/t/bad-formats-for-semantic-search-of-rag-implementing-internal-chatbot-for-troubleshooting-an-sdk/848715
- https://learn.microsoft.com/en-us/azure/search/index-similarity-and-scoring
- https://cohere.com/blog/rerank-3
- https://www.reddit.com/r/LocalLLaMA/comments/1d9h2pg/doing_rag_vector_search_is_not_enough/
- https://www.datacamp.com/tutorial/boost-llm-accuracy-retrieval-augmented-generation-rag-reranking
- https://python.langchain.com/v0.2/docs/integrations/retrievers/flashrank-reranker/
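
To show where a reranking step would sit after retrieval, here is a minimal sketch. It is a stand-in, not any of the methods linked above: it scores each candidate by how many distinct query terms it contains, whereas a real reranker would typically use a cross-encoder or a hosted rerank model. `tokenize` and `rerank` are hypothetical names, and the sample candidates are docket entries taken from the docket report output later in this notebook.

```python
import re
from typing import List, Set, Tuple


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and split it on non-alphanumeric characters."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def rerank(query: str, candidates: List[str], top_k: int = 3) -> List[Tuple[str, int]]:
    """Order candidates by the number of distinct terms they share with the query."""
    query_terms = tokenize(query)
    scored = [(text, len(query_terms & tokenize(text))) for text in candidates]
    # list.sort is stable, so ties keep their original retrieval order
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


candidates = [
    "TRIAL MANAGEMENT REPORT",
    "OFFER OF COMPROMISE",
    "Jury Selection / Trial - Proceeding",
]
ranked = rerank("jury trial result", candidates, top_k=2)
for text, score in ranked:
    print(score, text)
```

Swapping the overlap score for a model-based relevance score would leave the rest of the function unchanged.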

RAG
- https://ollama.com/blog/embedding-models
- https://huggingface.co/learn/nlp-course/chapter5/6
- https://docs.mistral.ai/guides/rag/
- https://docs.trychroma.com/guides

In [1]:
import numpy as np
import pandas as pd
import os
from utils.case_directory import CaseDirectory
from utils.case_metadata import CaseMetadata
from extractors.jury_ruling_classifier import JuryRulingClassifier

### Test CaseDirectory on 100_random_sample

In [2]:
d = CaseDirectory("100_random_sample")
pd.DataFrame(d.get_metadata_json())

index,court,title,docket,judges,judge,type,link,status,flags,nature_of_suit,cause,magistrate,metadata_path
0,"Oregon State, Multnomah County, Circuit Court",State of Oregon vs Abed Alkader Fattoum,18CR08309,"[Angel Lopez, Angela Lucero, Benjamin Souede, ...",,Offense Felony,https://www.docketalarm.com/cases/Oregon_State...,Closed,,,,,100_random_sample/Oregon_State_Multnomah_Count...
1,"Texas State, Harris County, 152nd District Court","CASARES, DOUGLAS vs. CASARES, ROSA",202236481,[ROBERT K. SCHAFFER],ROBERT K. SCHAFFER,Motor Vehicle Accident,https://www.docketalarm.com/cases/Texas_State_...,Disposed (Final),,,,,100_random_sample/Texas_State_Harris_County_15...
2,"Texas State, Dallas County, 101st District Court",ANDREW HOWARD vs. JIMMY SIMPSON,DC-19-03669,[STACI WILLIAMS],STACI WILLIAMS,MOTOR VEHICLE ACCIDENT,https://www.docketalarm.com/cases/Texas_State_...,CLOSED,,,,,100_random_sample/Texas_State_Dallas_County_10...
3,"Massachusetts State, Superior Court, Barnstabl...","Sprague, Dawn vs. Corner Cycle Of Cape Cod, Inc.",2072CV00184,,,Torts,https://www.docketalarm.com/cases/Massachusett...,Open,,,,,100_random_sample/Massachusetts_State_Superior...
4,"Texas State, Dallas County, 116th District Court",DESOTO APARTMENTS LTD vs. DALLAS CENTRAL APPRA...,DC-13-11287,[TONYA PARKER],TONYA PARKER,TAX APPRAISAL,https://www.docketalarm.com/cases/Texas_State_...,CLOSED,,,,,100_random_sample/Texas_State_Dallas_County_11...
...,...,...,...,...,...,...,...,...,...,...,...,...,...
102,"Connecticut State, Superior Court","MACCHIAROLI, ANTHONY v. OFFICE OF CLAIMS COMIS...",LLI-CV23-5015421-S,[ANDREW RORABACK],ANDREW RORABACK,M90 - Misc - All other,https://www.docketalarm.com/cases/Connecticut_...,,,,,,100_random_sample/Connecticut_State_Superior_C...
103,"Connecticut State, Superior Court","DAO, THAO v. ST. LAURENT, GERALD",HHD-CV17-6080452-S,"[CONSTANCE EPSTEIN, DAVID SHERIDAN, SUSAN COBB]",,V01 - Vehicular - Motor Vehicles - Driver and/...,https://www.docketalarm.com/cases/Connecticut_...,,,,,,100_random_sample/Connecticut_State_Superior_C...
104,"Connecticut State, Superior Court","RENFORS, TARYN v. BRANFORD MANOR PRESERVATION....",KNL-CV21-6051823-S,[KAREN GOODROW],KAREN GOODROW,T03 - Torts - Defective Premises - Private - O...,https://www.docketalarm.com/cases/Connecticut_...,,,,,,100_random_sample/Connecticut_State_Superior_C...
105,"Connecticut State, Superior Court","ANDINO, GIRELYS Et Al v. KAISER, MAYA Et Al",UWY-CV22-6068059-S,[ROBERT D ANDREA],ROBERT D ANDREA,V01 - Vehicular - Motor Vehicles - Driver and/...,https://www.docketalarm.com/cases/Connecticut_...,,,,,,100_random_sample/Connecticut_State_Superior_C...


### Test CaseDirectory on 100_random_fed

In [3]:
f = CaseDirectory("100_random_fed")
print(f.get_proportion_downloaded())
print(f.get_mean_downloaded_per_case())

### Test CaseMetadata

In [2]:
m = CaseMetadata.from_path("100_random_sample/Oregon_State_Multnomah_County_Circuit_Court/18CR08309/metadata.json")

In [3]:
dr = m.get_docket_report()

In [4]:
dr

index,date,contents,link,link_viewer,title,document_path
0,2021-03-30,Disposition - Reported Created: 03/30/2021 7:3...,,,,
1,2021-03-30,Closed Created: 03/30/2021 2:19 PM,,,,
2,2021-03-30,Judgment - General Eric Dahlin Signed: 03/18/2...,https://www.docketalarm.com/cases/Oregon_State...,https://www.docketalarm.com/cases/Oregon_State...,Judgment - General,100_random_sample/Oregon_State_Multnomah_Count...
3,2021-03-16,CANCELED Call (9:15 AM) Stephen Bushong Judgme...,,,,
4,2021-03-12,Order - Pending Judgment Eric Dahlin Signed: 0...,https://www.docketalarm.com/cases/Oregon_State...,https://www.docketalarm.com/cases/Oregon_State...,Order - Pending Judgment,100_random_sample/Oregon_State_Multnomah_Count...
...,...,...,...,...,...,...
64,2018-02-06,Affidavit - Probable Cause Created: 02/06/2018...,https://www.docketalarm.com/cases/Oregon_State...,https://www.docketalarm.com/cases/Oregon_State...,Affidavit - Probable Cause,100_random_sample/Oregon_State_Multnomah_Count...
65,2018-02-05,Order - Appear Benjamin Souede Signed: 02/05/2...,https://www.docketalarm.com/cases/Oregon_State...,https://www.docketalarm.com/cases/Oregon_State...,Order - Appear,100_random_sample/Oregon_State_Multnomah_Count...
66,2018-02-05,Information Created: 02/06/2018 7:46 AM,https://www.docketalarm.com/cases/Oregon_State...,https://www.docketalarm.com/cases/Oregon_State...,Information,100_random_sample/Oregon_State_Multnomah_Count...
67,2018-02-05,"Arraignment (Judicial Officer: Oden-Orr, Melvi...",,,,


### Labeled cases (labeled_cases.csv)

In [4]:
df = pd.read_csv("labeled_cases.csv")

In [8]:
df[df.trial_type == "jury"].metadata_path.tolist()

['workdata/100_random_sample/New_York_State_Suffolk_County_Supreme_Court/602235---2016/metadata.json',
 'workdata/100_random_sample/Delaware_District_Court/1--21-cv-01238/metadata.json',
 'workdata/100_random_sample/Massachusetts_State_Superior_Court_Essex_County/1777CV00789/metadata.json',
 'workdata/100_random_sample/Connecticut_State_Superior_Court/HHD-CV17-6080452-S/metadata.json',
 'workdata/100_random_sample/Connecticut_State_Superior_Court/UWY-CV22-6068059-S/metadata.json']

### Test JuryRulingClassifier

In [7]:
path = '100_random_sample/Connecticut_State_Superior_Court/HHD-CV17-6080452-S/metadata.json'
classifier = JuryRulingClassifier(path)

In [11]:
classifier.metadata.get_docket_report_contents()


['Jury Selection / Trial - Proceeding',
 'ORDER RESULT: Granted 3/28/2019 HON DAVID SHERIDAN',
 'MOTION FOR CONTINUANCE RESULT: Granted 3/28/2019 HON DAVID SHERIDAN',
 'TRIAL MANAGEMENT REPORT',
 'WITHDRAWAL OF ACTION AGAINST PARTICULAR DEFENDANT(S) – CASE REMAINS PENDING Withdrawal as to Gerald St. Laurent only',
 'OFFER OF COMPROMISE',
 'MOTION FOR ORDER Motion to Compel Deposition',
 'NOTICE Notice of Supplemental Compliance',
 'ORDER RESULT: Sustained 12/3/2018 HON SUSAN COBB',
 'OBJECTION TO MOTION objection to motion to compel RESULT: Sustained 12/3/2018 HON SUSAN COBB',
 'OBJECTION TO MOTION FOR DEFAULT',
 'MOTION FOR ORDER OF COMPLIANCE – PB SEC 13-14 (INTERR/PROD – 13-6/13-9)',
 'MOTION FOR DEFAULT AND JUDGMENT PB 17-33 Motion for Default - failure to appear for deposition',
 'MOTION FOR ORDER OF COMPLIANCE – PB SEC 13-14 (INTERR/PROD – 13-6/13-9)',
 'ORDER RESULT: Granted 10/16/2018 HON DAVID SHERIDAN',
 'MOTION FOR CONTINUANCE RESULT: Granted 10/16/2018 HON DAVID SHERIDAN',


In [8]:
classifier.extract()

Extracting from metadata...
- Getting relevant chunks...


Number of requested results 8 is greater than number of elements in index 1, updating n_results = 1


- Querying llm...
- Response: {'reasoning': 'The documents describe a Motion for Order of Compliance, which was withdrawn by the Defendant. This does not identify the result of the jury trial.', 'category': 'undetermined'}
Extracting from documents...
- Getting relevant chunks...



- Response: {'category': 'undetermined'}


({'category': 'undetermined'},
 {'metadata_response': {'reasoning': 'The documents describe a Motion for Order of Compliance, which was withdrawn by the Defendant. This does not identify the result of the jury trial.',
   'category': 'undetermined'},
  'metadata_context': 'MOTION FOR ORDER OF COMPLIANCE – PB SEC 13-14 (INTERR/PROD – 13-6/13-9) RESULT: Withdrawn 10/4/2018 BY THE DEFENDANT Last Updated:\xa0 Result Information - 10/29/2018',
  'document_response': {'category': 'undetermined'},
  'document_context': 'No relevant documents'})
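
One way to make these results easier to collect across cases is to flatten the tuple that `classifier.extract()` returns into a single row. The sketch below assumes the return value always has the shape shown in the output above (a final-category dict followed by a details dict). `flatten_result` and its column names are hypothetical and not part of `JuryRulingClassifier`.

```python
from typing import Any, Dict, Tuple


def flatten_result(
    path: str, result: Tuple[Dict[str, Any], Dict[str, Any]]
) -> Dict[str, Any]:
    """Turn one (final, details) result tuple into a flat row for tabulation."""
    final, details = result
    metadata_response = details.get("metadata_response", {})
    document_response = details.get("document_response", {})
    return {
        "metadata_path": path,
        "category": final.get("category"),
        "metadata_category": metadata_response.get("category"),
        # reasoning can be absent, as in the document response above
        "metadata_reasoning": metadata_response.get("reasoning"),
        "document_category": document_response.get("category"),
        "document_context": details.get("document_context"),
    }


# Abbreviated copy of the result shown above (reasoning text shortened)
example = (
    {"category": "undetermined"},
    {
        "metadata_response": {
            "reasoning": "This does not identify the result of the jury trial.",
            "category": "undetermined",
        },
        "document_response": {"category": "undetermined"},
        "document_context": "No relevant documents",
    },
)
row = flatten_result(
    "100_random_sample/Connecticut_State_Superior_Court/HHD-CV17-6080452-S/metadata.json",
    example,
)
print(row["category"], row["document_category"])
```

A list of such rows, one per case, can be passed to `pd.DataFrame(rows)` and joined to `labeled_cases.csv` on `metadata_path` to compare predictions with the manual labels.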