### To do

1. Make it easier to visualize jury trials and extract jury trial results, and do the same for other variables
2. Add a reranking algorithm
3. Add a helper function for manual labeling
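
For item 3, here is a minimal sketch of a manual-labeling helper. It assumes labels live in `labeled_cases.csv` and are keyed by `metadata_path`, as in the cells below. `set_label` is a hypothetical name, not an existing helper in `utils`. It rewrites the whole file on every call, which should be acceptable for a file of this size.

```python
import csv
from pathlib import Path


def set_label(csv_path, metadata_path, column, value, key="metadata_path"):
    """Hypothetical helper: set `column` to `value` for the row whose `key`
    equals `metadata_path`, appending a new row if no row matches."""
    path = Path(csv_path)
    rows, fieldnames = [], [key]
    if path.exists():
        with path.open(newline="") as f:
            reader = csv.DictReader(f)
            fieldnames = list(reader.fieldnames or [key])
            rows = list(reader)
    # Make sure both the key column and the label column exist.
    if key not in fieldnames:
        fieldnames.insert(0, key)
    if column not in fieldnames:
        fieldnames.append(column)
    for row in rows:
        if row.get(key) == metadata_path:
            row[column] = value
            break
    else:
        # No existing row for this case, so append one.
        rows.append({key: metadata_path, column: value})
    with path.open("w", newline="") as f:
        # restval fills columns that a row does not have yet.
        writer = csv.DictWriter(f, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(rows)
```

For example, `set_label("labeled_cases.csv", path, "trial_type", "jury")` records the trial type for a single case and leaves every other row unchanged.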

Reranking
- https://adasci.org/a-hands-on-guide-to-enhance-rag-with-re-ranking/
- https://techcommunity.microsoft.com/t5/microsoft-developer-community/doing-rag-vector-search-is-not-enough/ba-p/4161073
- https://community.openai.com/t/bad-formats-for-semantic-search-of-rag-implementing-internal-chatbot-for-troubleshooting-an-sdk/848715
- https://learn.microsoft.com/en-us/azure/search/index-similarity-and-scoring
- https://cohere.com/blog/rerank-3
- https://www.reddit.com/r/LocalLLaMA/comments/1d9h2pg/doing_rag_vector_search_is_not_enough/
- https://www.datacamp.com/tutorial/boost-llm-accuracy-retrieval-augmented-generation-rag-reranking
- https://python.langchain.com/v0.2/docs/integrations/retrievers/flashrank-reranker/
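
Most of the linked guides rerank with a cross-encoder or a hosted model such as Cohere Rerank. As a dependency-free stand-in, here is a minimal sketch that reranks already-retrieved chunks using Okapi BM25 scored against the query. This is a lexical reranker, not the cross-encoder approach those articles describe. `bm25_rerank` is a hypothetical name, and `k1=1.5` and `b=0.75` are common defaults rather than tuned values.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase alphanumeric tokens; deliberately simple."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_rerank(query, chunks, k1=1.5, b=0.75, top_k=None):
    """Rerank retrieved `chunks` by Okapi BM25 score against `query`.

    IDF statistics are computed over the candidate set only, so this is a
    second-stage reranker, not a full-corpus retriever.
    Returns (score, chunk) pairs sorted from most to least relevant.
    """
    docs = [tokenize(c) for c in chunks]
    n = len(docs)
    if n == 0:
        return []
    avgdl = sum(len(d) for d in docs) / n or 1.0
    # Document frequency: the number of candidate chunks containing each term.
    df = Counter(term for d in docs for term in set(d))
    query_terms = tokenize(query)
    scored = []
    for chunk, doc in zip(chunks, docs):
        tf = Counter(doc)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # BM25 IDF with +1 inside the log so it stays non-negative.
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1)
            freq = tf[term]
            norm = freq + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * freq * (k1 + 1) / norm
        scored.append((score, chunk))
    # sorted() is stable, so ties keep the original retrieval order.
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return ranked[:top_k] if top_k is not None else ranked
```

Any chunk that shares no term with the query scores 0 and drops to the bottom. Scheduling entries such as pre-trial conference notices therefore fall below chunks that actually mention the query terms. A cross-encoder could later replace the scoring loop while keeping the same input and output shape.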

RAG
- https://ollama.com/blog/embedding-models
- https://huggingface.co/learn/nlp-course/chapter5/6
- https://docs.mistral.ai/guides/rag/
- https://docs.trychroma.com/guides

### Code

In [1]:
import numpy as np
import pandas as pd
import os
from utils.case_directory import CaseDirectory
from utils.case_metadata import CaseMetadata
from extractors.jury_ruling_classifier import JuryRulingClassifier

In [2]:
df = pd.read_csv("labeled_cases.csv")
df[df.trial_type == "jury"].metadata_path.tolist()

['workdata/100_random_sample/New_York_State_Suffolk_County_Supreme_Court/602235---2016/metadata.json',
 'workdata/100_random_sample/Delaware_District_Court/1--21-cv-01238/metadata.json',
 'workdata/100_random_sample/Massachusetts_State_Superior_Court_Essex_County/1777CV00789/metadata.json',
 'workdata/100_random_sample/Connecticut_State_Superior_Court/HHD-CV17-6080452-S/metadata.json',
 'workdata/100_random_sample/Connecticut_State_Superior_Court/UWY-CV22-6068059-S/metadata.json']

In [3]:
df[df.trial_type == "bench"].metadata_path.tolist()

['workdata/100_random_sample/Florida_State_Broward_County_Seventeenth_Circuit_Court/CACE15005896/metadata.json',
 'workdata/100_random_sample/New_York_Southern_District_Court/1--05-cv-06677/metadata.json',
 'workdata/100_random_sample/Texas_Northern_District_Court/2--07-cv-00142/metadata.json',
 'workdata/100_random_sample/Massachusetts_District_Court/1--14-cv-14176/metadata.json',
 'workdata/100_random_sample/California_State_Court_of_Appeals_Second_District/B232339/metadata.json',
 'workdata/100_random_sample/North_Carolina_Western_District_Court/2--12-cr-00007/metadata.json',
 'workdata/100_random_sample/California_State_San_Francisco_County_Superior_Court/CGC-05-439929/metadata.json',
 'workdata/100_random_sample/Washington_State_Pierce_County_Superior_Court/09-2-16353-2/metadata.json',
 'workdata/100_random_sample/Illinois_Northern_District_Court/1--21-cv-05336/metadata.json',
 'workdata/100_random_sample/US_Court_of_Appeals_Ninth_Circuit_BAP/22-1214/metadata.json']
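
Cells In [2] and In [3] repeat the same filter once for each trial type. The sketch below generalizes that filter to any labeled column, which is what to-do item 1 asks for. `paths_by` is a hypothetical helper, and it assumes the `metadata_path` column seen above.

```python
import pandas as pd


def paths_by(df, column, path_column="metadata_path"):
    """Hypothetical helper: map each value of `column` to its list of metadata paths.

    Rows where `column` is missing are dropped (pandas' default groupby behavior).
    """
    return df.groupby(column)[path_column].apply(list).to_dict()
```

Because `groupby` preserves row order within each group, `paths_by(df, "trial_type")["jury"]` should return the same list as In [2].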

In [11]:
path = '100_random_sample/Massachusetts_State_Superior_Court_Essex_County/1777CV00789/metadata.json'
classifier = JuryRulingClassifier(path, language_model="llama3")

In [12]:
classifier.extract()

Extracting from metadata...
- Getting relevant chunks...
- Querying llm...
- Response: {'reasoning': "According to the documents, SUMMARY JUDGMENT for Defendant(s) was granted. This shows that the jury ruled in favor of the defendant because the plaintiff's claims were dismissed.", 'category': 'defendant'}


{'reasoning': "According to the documents, SUMMARY JUDGMENT for Defendant(s) was granted. This shows that the jury ruled in favor of the defendant because the plaintiff's claims were dismissed.",
 'category': 'defendant'}
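
To extract jury results for every case at once (to-do item 1), one option is to repeat the call above over all the paths from In [2]. The sketch below is not part of the `extractors` package. `extract_all` is hypothetical, and the classifier is passed in as a factory so the loop can be tested without an LLM. The `workdata/` prefix is stripped because In [11] passes the path without it. That step is an assumption about how `JuryRulingClassifier` resolves paths.

```python
def extract_all(paths, make_classifier, strip_prefix="workdata/"):
    """Hypothetical batch runner: classify each case and collect one record per path.

    `make_classifier(path)` should return an object whose `extract()` returns a
    dict with 'reasoning' and 'category', like JuryRulingClassifier above.
    """
    records = []
    for path in paths:
        if strip_prefix and path.startswith(strip_prefix):
            rel = path[len(strip_prefix):]
        else:
            rel = path
        try:
            result = make_classifier(rel).extract()
        except Exception as exc:  # record the failure and keep going
            result = {"reasoning": f"error: {exc}", "category": None}
        records.append({"metadata_path": path, **result})
    return records
```

In the notebook, `extract_all(df[df.trial_type == "jury"].metadata_path.tolist(), lambda p: JuryRulingClassifier(p, language_model="llama3"))` would return one record per jury case, and `pd.DataFrame(...)` turns those records into a table for review.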

In [14]:
classifier.log["metadata_context"]

"SUMMARY JUDGMENT for Defendant(s), Frank Cousins, Jr. Individually and in his/her capacity Sheriff of the Essex County Sheriff's Department against Plaintiff(s), Cory Mathieson, without statutory costs.It is ORDERED and ADJUDGED: That the plaintiff's claims for violation of the Whistleblower Act (Count I), insofar as the claim is asserted against Sheriff Cousins individually, and as to his claim for violation of the MCRA (Count II), be and hereby are DISMISSED.||Plaintiff, Defendants Cory Mathieson, Essex County Sheriff's Department, Frank Cousins, Jr. Individually and in his/her capacity Sheriff of the Essex County Sheriff's Department's Motion to continue / reschedule an event 11/05/2019 02:00 PM Final Pre-Trial Conference||Docket Note: Court, Judge Charles Barrett, took no action on the parties' Joint Motion to Continue Trial and wants the parties to appear at the Final Trial Conference on 7/19/22 to review the matter and to consider the motion with the parties on the record. Clerk
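
The context string above appears to join the retrieved chunks with `||`. Assuming that separator, the small helpers below print the chunks one per line, which may help with the visualization item on the to-do list. `split_chunks` and `show_chunks` are hypothetical names.

```python
def split_chunks(context, sep="||"):
    """Split a joined context string into non-empty, stripped chunks."""
    return [part.strip() for part in context.split(sep) if part.strip()]


def show_chunks(context, width=120, sep="||"):
    """Print each chunk on its own numbered line, truncated to `width` characters."""
    for i, chunk in enumerate(split_chunks(context, sep)):
        text = chunk if len(chunk) <= width else chunk[: width - 3] + "..."
        print(f"[{i}] {text}")
```

Usage: `show_chunks(classifier.log["metadata_context"])`.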

In [15]:
classifier.metadata.get_docket_report()

| Unnamed: 0 | date | contents | court | link_viewer | link | docket | document_path |
|---|---|---|---|---|---|---|---|
| 0 | 2022-11-15 | Event Judge: Karp, Hon. JeffreySession: Civil ... | | | | | |
| 1 | 2022-11-08 | Event Judge: Karp, Hon. JeffreySession: Civil ... | | | | | |
| 2 | 2022-07-27 | Event Judge: Barrett, Hon. C. WilliamSession: ... | | | | | |
| 3 | 2022-07-26 | Event Judge: Barrett, Hon. C. WilliamSession: ... | | | | | |
| 4 | 2022-07-20 | Endorsement on Motion to Continue Trial for 60... | Massachusetts State, Superior Court, Essex County | https://www.docketalarm.com/cases/Massachusett... | https://www.docketalarm.com/cases/Massachusett... | 1777CV00789 | |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 115 | 2017-05-30 | Tickler: DiscoveryStart Date: 05/30/2017Due Da... | | | | | |
| 116 | 2017-05-30 | Tickler: Rule 56 Served ByStart Date: 05/30/20... | | | | | |
| 117 | 2017-05-30 | Tickler: Rule 56 Filed ByStart Date: 05/30/201... | | | | | |
| 118 | 2017-05-30 | Tickler: Final Pre-Trial ConferenceStart Date:... | | | | | |
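
`get_docket_report()` appears to return a DataFrame with `date` and `contents` columns. The sketch below narrows the report to entries that may mention a trial outcome. The keyword list is an illustrative assumption, not a validated set, and `docket_entries_matching` is a hypothetical helper.

```python
import re

import pandas as pd


def docket_entries_matching(docket, keywords=("jury", "verdict", "judgment")):
    """Hypothetical helper: return docket rows whose `contents` mention any keyword."""
    pattern = "|".join(re.escape(k) for k in keywords)
    # na=False treats missing contents as non-matching instead of propagating NaN.
    mask = docket["contents"].str.contains(pattern, case=False, na=False, regex=True)
    return docket.loc[mask, ["date", "contents"]]
```

Usage: `docket_entries_matching(classifier.metadata.get_docket_report())`.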
