# <span style="color: #4daafc">Legal Case Similarity Detection - Extract References</span>
- [Environment](#environment)
- [Load Data](#load-data)
- [Extract references](#extract-references)
- [Create one-hot encoding](#create-one-hot-encoding)
- [Save data and model](#save-data-and-model)

# Environment

In [1]:
from utils.file_utils import load_file, save_file
from utils.df import df_shape
import numpy as np
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer
import re
import pickle

# Load Data

In [2]:
f_path = 'data/sum_trans_data_ar_ap_100.xlsx'
df = load_file(file_name=f_path)

Successfully loaded DataFrame from data/sum_trans_data_ar_ap_100.xlsx


In [3]:
df_shape(df)
display(df.head(10))

Data shape: 100 rows x 8 columns


Unnamed: 0,case_number,procedure_name,case_date,case_link,document_body,document_body_eng_sum,document_body_english_1,document_body_english_2
0,1108/97,"ע""א 1108/97 מרחיב אביב נ. מדינת ישראל",1997-05-11,https://supremedecisions.court.gov.il/Verdicts...,"בבית המשפט העליון בש""פ 97 / 1108 בפני: כבוד הר...",1. Case Number: 97/1108\n\n2. Case Type: Admin...,In the Supreme Court of Israel Case No. 97/110...,
1,4477/00,"ע""א 4477/00 לודמילה וורוביוב נ. היועצ המשפטי ל...",2000-07-06,https://supremedecisions.court.gov.il/Verdicts...,"בבית המשפט העליון בה""נ 4477/00 בפני כבוד נשיא ...",\nCase Number: HCJ 4477/00\n\nCase Type: Admin...,"In the Supreme Court of Israel, HCJ 4477/00 Be...",
2,1890/16,"ע""פ 1890/16",2017-03-09,https://supremedecisions.court.gov.il/Verdicts...,"פסק דין בתיק ע""פ 1890/16 בבית המשפט העליון בשב...",1. Case Number: 1890/16\n\n2. Case Type: Crimi...,"In this case, the Hebrew text is a translation...",
3,7176/04,"ע""פ 7176/04 ירונ תלמי נ. מדינת ישראל",2006-02-02,https://supremedecisions.court.gov.il/Verdicts...,"פסק-דין בתיק ע""פ 7176/04 בבית המשפט העליון בשב...",Case Number: T.P. 2328/01\n\nCase Type: Crimin...,In the case of Appellant Yaron Telmi v. State ...,
4,3766/12,"ע""א 3766/12",2012-06-17,https://supremedecisions.court.gov.il/Verdicts...,"החלטה בתיק ע""א 3766/12 בבית המשפט העליון בירוש...",1. Case Number: A3766/12\n\n2. Case Type: Civi...,The decision in Case A3766/12 at the Supreme C...,
5,8178/12,"ע""א 8178/12 עו""ד צבי סלנט נ. יונתנ גוטליב",2014-11-12,https://supremedecisions.court.gov.il/Verdicts...,"החלטה בתיק ע""א 8178/12 בבית המשפט העליון בשבתו...",1. Case Number: Appeal No. 8178/12\n\n2. Case ...,In the case of Appeal No. 8178/12 at the Supre...,
6,3015/09,"ע""פ 3015/09 מדינת ישראל נ. פואד קדיח",2010-07-20,https://supremedecisions.court.gov.il/Verdicts...,"פסק דין בתיק ע""פ 3015/09 בבית המשפט העליון בשב...","1. Case Number: Criminal Appeal No. 3015/09, C...",The State of Israel appeals the sentence issue...,
7,4272/05,"ע""פ 4272/05 אמיר חג'וג' נ. מדינת ישראל",2006-01-04,https://supremedecisions.court.gov.il/Verdicts...,"פסק-דין בתיק ע""פ 4272/05 בבית המשפט העליון בשב...",1. Case Number: Appeal No. 4272/05\n\n2. Case ...,In the case of Appeal No. 4272/05 at the Supre...,
8,10467/08,"ע""א 10467/08 עומר חג'אזי נ. אדיב עיסא דיאב",2010-11-03,https://supremedecisions.court.gov.il/Verdicts...,"פסק דין בתיק ע""א 10467/08 בבית המשפט העליון בש...",Case Number: Appeal No. 10467/08\n\nCase Type:...,The case of Appeal No. 10467/08 at the Supreme...,he registered a cautionary note or completed ...
9,3330/11,"ע""א 3330/11 אגד אגודה שיתופית לתחבורה בישראל ב...",2011-11-17,https://supremedecisions.court.gov.il/Verdicts...,"פסק דין בתיק ע""א 3330/11 בבית המשפט העליון בשב...",1. Case Number: Appeal No. 3330/11\n\n2. Case ...,In the case of Appeal No. 3330/11 at the Supre...,


# Extract references

Case references are:
- Law number and section/paragraph. <br/>
    *Example*: סעיף 499(א)(1) or 144(ב)
- Court case numbers: cases that were cited/referenced in the document. <br/>
    *Example*: ע"פ 3091/08
- Amendment to the law <br/>
    *Example*: ת/89 or ת/85א or ת/110

**Note**: citations, especially law references, do not follow a single format. Instead of building one complex regex, which would be hard to write and maintain, we use several simpler patterns to keep the code maintainable and readable.
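
To make the one-pattern-per-reference-type idea concrete, here is a minimal sketch that runs three of the patterns on a short string. The string `sample_text` and the dictionary `patterns` are hypothetical names introduced only for this illustration; the string is assembled from the example formats listed above, not taken from the dataset.

```python
import re

# Hypothetical mini-text combining the three example formats listed above
sample_text = 'סעיף 499(א)(1), ע"פ 3091/08, ת/85א'

# One small pattern per reference type (same expressions as in the notebook,
# without the outer capturing group on the law-section pattern)
patterns = {
    "law_section": re.compile(r'\d{1,3}[א-ת]?\([א-ת\d]\)\(?\d?\)?'),
    "law_amendment": re.compile(r'ת/\d{1,3}[א-ת]?'),
    "court_case": re.compile(r'\d{1,5}/\d{2}'),
}

# Run every pattern independently and collect the matches per type
matches_by_kind = {kind: p.findall(sample_text) for kind, p in patterns.items()}
for kind, found in matches_by_kind.items():
    print(kind, found)
```

Because each pattern is independent, one of them can be adjusted or replaced without touching the others.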

In [4]:
def extract_references(text):
    # Define regex patterns for law sections and court cases
    law_section_generic_pattern = re.compile(r'(\d{1,3}[א-ת]?\([א-ת\d]\)\(?\d?\)?)') # generic pattern
    law_section_single_pattern = re.compile(r'(?:סעיף)\s*(\d{1,3}[א-ת]?(?:\(?[א-ת\d]\)?)*)') # single case pattern 
    law_section_group_pattern = r'(?:סעיפים)\s*(?:\d{1,3}[א-ת]?(?:\(?[א-ת\d]\)?)*)(?:\s*[,|ו-]+\s*(?:\d{1,3}(?:\(?[א-ת\d]\)?)*))*'
    law_amend_pattern = re.compile(r'ת/\d{1,3}[א-ת]?')
    court_case_pattern = re.compile(r'\d{1,5}/\d{2}')

    # Collect law sections: generic matches first, then single sections
    # introduced by "סעיף" (catches plain numbers such as section 123)
    law_sections = law_section_generic_pattern.findall(text)
    law_sections.extend(law_section_single_pattern.findall(text))

    # Grouped sections (introduced by "סעיפים"): match the whole group first,
    # then split it into individual section numbers
    matches = re.findall(law_section_group_pattern, text)

    section_pattern = r'(\d{1,3}(?:\(?[א-ת\d]\)?)*)'
    for group in matches:
        found_law_sections = re.findall(section_pattern, group)
        law_sections.extend(found_law_sections)
    
    law_amendments = law_amend_pattern.findall(text)
    court_cases = court_case_pattern.findall(text)

    # Combine and remove duplicates
    references = list(set(law_sections + law_amendments + court_cases))

    return references
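
The grouped-section step is the least obvious part of `extract_references`, so the sketch below traces it in isolation: the group pattern first captures the whole "סעיפים ..." run, and a second pattern then splits that run into individual section numbers. The input string `group_text` is a made-up example, not a quote from any case in the dataset.

```python
import re

# Hypothetical sentence fragment that lists several sections at once
group_text = 'סעיפים 345, 347 ו-348 לחוק'

# Same two patterns as in extract_references
law_section_group_pattern = r'(?:סעיפים)\s*(?:\d{1,3}[א-ת]?(?:\(?[א-ת\d]\)?)*)(?:\s*[,|ו-]+\s*(?:\d{1,3}(?:\(?[א-ת\d]\)?)*))*'
section_pattern = r'(\d{1,3}(?:\(?[א-ת\d]\)?)*)'

# Stage 1: the pattern has no capturing groups, so findall returns whole matches
groups = re.findall(law_section_group_pattern, group_text)

# Stage 2: split every matched group into its individual sections
sections = []
for group in groups:
    sections.extend(re.findall(section_pattern, group))

print(groups)
print(sections)
```

Splitting the work into two stages keeps each expression short, at the cost of scanning the matched group a second time.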

In [5]:
sample_case = df.iloc[61]
res = extract_references(sample_case['document_body'])
print(f"Case number: {sample_case['case_number']}\nReferences/citations:\n{res}")

Case number: 319/21
References/citations:
['345(ב)(3)', 'ת/36', 'ת/82', 'ת/110', '8479/13', '7229/20', '5764/92', 'ת/67א', '5459/09', '4528/18', '319/21', '2246/13', '8430/20', 'ת/65', 'ת/25', '2529/05', '347(ב)', 'ת/68', '34כב', 'ת/24', '402(ב)', '10033/17', '5928/99', '334', '7090/15', 'ת/72', '345(א)(1)', '779/19', '377א(א)(7)', '345(ב)(4)', '6', '149/12', '1130/19', '4117/06', '4454/19', 'ת/85', '5705/20', '374א', 'ת/92', '3132/10', 'ת/67', 'ת/89', '9724/02', '9040/05', '3578/11', 'ת/66', '355', 'ת/85א']


In [6]:
df_w_ref_cases = df.copy()
df_w_ref_cases['legal_refs'] = df_w_ref_cases['document_body'].apply(extract_references)

In [7]:
df_w_ref_cases.head()

Unnamed: 0,case_number,procedure_name,case_date,case_link,document_body,document_body_eng_sum,document_body_english_1,document_body_english_2,legal_refs
0,1108/97,"ע""א 1108/97 מרחיב אביב נ. מדינת ישראל",1997-05-11,https://supremedecisions.court.gov.il/Verdicts...,"בבית המשפט העליון בש""פ 97 / 1108 בפני: כבוד הר...",1. Case Number: 97/1108\n\n2. Case Type: Admin...,In the Supreme Court of Israel Case No. 97/110...,,[]
1,4477/00,"ע""א 4477/00 לודמילה וורוביוב נ. היועצ המשפטי ל...",2000-07-06,https://supremedecisions.court.gov.il/Verdicts...,"בבית המשפט העליון בה""נ 4477/00 בפני כבוד נשיא ...",\nCase Number: HCJ 4477/00\n\nCase Type: Admin...,"In the Supreme Court of Israel, HCJ 4477/00 Be...",,"[1(א), 4477/00]"
2,1890/16,"ע""פ 1890/16",2017-03-09,https://supremedecisions.court.gov.il/Verdicts...,"פסק דין בתיק ע""פ 1890/16 בבית המשפט העליון בשב...",1. Case Number: 1890/16\n\n2. Case Type: Crimi...,"In this case, the Hebrew text is a translation...",,"[9816/09, 1890/16, 345(א)(1), 345(ב)(1), 1555/..."
3,7176/04,"ע""פ 7176/04 ירונ תלמי נ. מדינת ישראל",2006-02-02,https://supremedecisions.court.gov.il/Verdicts...,"פסק-דין בתיק ע""פ 7176/04 בבית המשפט העליון בשב...",Case Number: T.P. 2328/01\n\nCase Type: Crimin...,In the case of Appellant Yaron Telmi v. State ...,,"[61(א)(4), 7(א), 40075/04, 46, 185/87, 7176/04..."
4,3766/12,"ע""א 3766/12",2012-06-17,https://supremedecisions.court.gov.il/Verdicts...,"החלטה בתיק ע""א 3766/12 בבית המשפט העליון בירוש...",1. Case Number: A3766/12\n\n2. Case Type: Civi...,The decision in Case A3766/12 at the Supreme C...,,"[1113/97, 3766/12, 471ג(ד), 8467/06, 5016/00, ..."


# Create one-hot encoding

In [8]:
mlb = MultiLabelBinarizer()
legal_refs_enc = mlb.fit_transform(df_w_ref_cases['legal_refs'])
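
As a small illustration of what `MultiLabelBinarizer` produces, the sketch below encodes three toy reference lists. The names `toy_refs`, `toy_mlb` and `toy_enc` are assumptions made for this example only; the real notebook fits on the `legal_refs` column.

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Toy reference lists standing in for the 'legal_refs' column
toy_refs = [
    ['345(א)(1)', '1890/16'],
    [],                      # a case with no extracted references
    ['1890/16', 'ת/85'],
]

toy_mlb = MultiLabelBinarizer()
toy_enc = toy_mlb.fit_transform(toy_refs)

# classes_ holds the sorted vocabulary; each row marks which references occur
print(list(toy_mlb.classes_))
print(toy_enc)
```

Each column corresponds to one distinct reference, so a row can contain several ones (one per reference in that case) or none at all.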

In [9]:
df_w_ref_cases['legal_refs_sparse_vec'] = list(legal_refs_enc)

In [10]:
df_w_ref_cases.head()

Unnamed: 0,case_number,procedure_name,case_date,case_link,document_body,document_body_eng_sum,document_body_english_1,document_body_english_2,legal_refs,legal_refs_sparse_vec
0,1108/97,"ע""א 1108/97 מרחיב אביב נ. מדינת ישראל",1997-05-11,https://supremedecisions.court.gov.il/Verdicts...,"בבית המשפט העליון בש""פ 97 / 1108 בפני: כבוד הר...",1. Case Number: 97/1108\n\n2. Case Type: Admin...,In the Supreme Court of Israel Case No. 97/110...,,[],"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ..."
1,4477/00,"ע""א 4477/00 לודמילה וורוביוב נ. היועצ המשפטי ל...",2000-07-06,https://supremedecisions.court.gov.il/Verdicts...,"בבית המשפט העליון בה""נ 4477/00 בפני כבוד נשיא ...",\nCase Number: HCJ 4477/00\n\nCase Type: Admin...,"In the Supreme Court of Israel, HCJ 4477/00 Be...",,"[1(א), 4477/00]","[0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, ..."
2,1890/16,"ע""פ 1890/16",2017-03-09,https://supremedecisions.court.gov.il/Verdicts...,"פסק דין בתיק ע""פ 1890/16 בבית המשפט העליון בשב...",1. Case Number: 1890/16\n\n2. Case Type: Crimi...,"In this case, the Hebrew text is a translation...",,"[9816/09, 1890/16, 345(א)(1), 345(ב)(1), 1555/...","[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ..."
3,7176/04,"ע""פ 7176/04 ירונ תלמי נ. מדינת ישראל",2006-02-02,https://supremedecisions.court.gov.il/Verdicts...,"פסק-דין בתיק ע""פ 7176/04 בבית המשפט העליון בשב...",Case Number: T.P. 2328/01\n\nCase Type: Crimin...,In the case of Appellant Yaron Telmi v. State ...,,"[61(א)(4), 7(א), 40075/04, 46, 185/87, 7176/04...","[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ..."
4,3766/12,"ע""א 3766/12",2012-06-17,https://supremedecisions.court.gov.il/Verdicts...,"החלטה בתיק ע""א 3766/12 בבית המשפט העליון בירוש...",1. Case Number: A3766/12\n\n2. Case Type: Civi...,The decision in Case A3766/12 at the Supreme C...,,"[1113/97, 3766/12, 471ג(ד), 8467/06, 5016/00, ...","[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ..."


In [11]:
case_example = 61
case_example_name = df_w_ref_cases.iloc[case_example]['case_number']
case_example_sparse_vec = df_w_ref_cases.iloc[case_example]['legal_refs_sparse_vec']
print(f"case id: {case_example}\ncase name: {case_example_name}\nsparse vector:\n{case_example_sparse_vec}\nlength: {len(case_example_sparse_vec)}")

case id: 61
case name: 319/21
sparse vector:
[0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
 0 0 0 0 1 0 1 1 1 0 1 1 0 0 0 1 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0
 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

In [12]:
# create a new DataFrame with the encoded data
df_legal_refs_enc = pd.DataFrame(legal_refs_enc, columns=mlb.classes_)

In [13]:
# concatenate the new DataFrame with the original DataFrame
df_w_sparse_vec = pd.concat([df_w_ref_cases, df_legal_refs_enc], axis=1)
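
One thing worth keeping in mind: `pd.concat(..., axis=1)` aligns rows by index label, not by position. With the default integer index on both frames this works as expected, but if the original DataFrame carried a different index, the result would gain extra rows filled with NaN. The sketch below uses made-up frames (`base`, `enc`, `cols` are hypothetical names) to show the difference and one possible safeguard, passing `index=` when building the encoded frame.

```python
import pandas as pd

# Toy frame with a non-default index, standing in for the case DataFrame
base = pd.DataFrame({'case_number': ['a', 'b', 'c']}, index=[10, 11, 12])
enc = [[1, 0], [0, 1], [1, 1]]
cols = ['ref_x', 'ref_y']

# Encoded frame with a fresh 0..n-1 index: rows do not line up
misaligned = pd.concat([base, pd.DataFrame(enc, columns=cols)], axis=1)

# Encoded frame that reuses the original index: rows line up
aligned = pd.concat(
    [base, pd.DataFrame(enc, columns=cols, index=base.index)], axis=1
)

print(misaligned.shape, aligned.shape)
```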

In [14]:
df_w_sparse_vec.head()

Unnamed: 0,case_number,procedure_name,case_date,case_link,document_body,document_body_eng_sum,document_body_english_1,document_body_english_2,legal_refs,legal_refs_sparse_vec,...,ת/66,ת/67,ת/67א,ת/68,ת/72,ת/82,ת/85,ת/85א,ת/89,ת/92
0,1108/97,"ע""א 1108/97 מרחיב אביב נ. מדינת ישראל",1997-05-11,https://supremedecisions.court.gov.il/Verdicts...,"בבית המשפט העליון בש""פ 97 / 1108 בפני: כבוד הר...",1. Case Number: 97/1108\n\n2. Case Type: Admin...,In the Supreme Court of Israel Case No. 97/110...,,[],"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...",...,0,0,0,0,0,0,0,0,0,0
1,4477/00,"ע""א 4477/00 לודמילה וורוביוב נ. היועצ המשפטי ל...",2000-07-06,https://supremedecisions.court.gov.il/Verdicts...,"בבית המשפט העליון בה""נ 4477/00 בפני כבוד נשיא ...",\nCase Number: HCJ 4477/00\n\nCase Type: Admin...,"In the Supreme Court of Israel, HCJ 4477/00 Be...",,"[1(א), 4477/00]","[0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...",...,0,0,0,0,0,0,0,0,0,0
2,1890/16,"ע""פ 1890/16",2017-03-09,https://supremedecisions.court.gov.il/Verdicts...,"פסק דין בתיק ע""פ 1890/16 בבית המשפט העליון בשב...",1. Case Number: 1890/16\n\n2. Case Type: Crimi...,"In this case, the Hebrew text is a translation...",,"[9816/09, 1890/16, 345(א)(1), 345(ב)(1), 1555/...","[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...",...,0,0,0,0,0,0,0,0,0,0
3,7176/04,"ע""פ 7176/04 ירונ תלמי נ. מדינת ישראל",2006-02-02,https://supremedecisions.court.gov.il/Verdicts...,"פסק-דין בתיק ע""פ 7176/04 בבית המשפט העליון בשב...",Case Number: T.P. 2328/01\n\nCase Type: Crimin...,In the case of Appellant Yaron Telmi v. State ...,,"[61(א)(4), 7(א), 40075/04, 46, 185/87, 7176/04...","[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...",...,0,0,0,0,0,0,0,0,0,0
4,3766/12,"ע""א 3766/12",2012-06-17,https://supremedecisions.court.gov.il/Verdicts...,"החלטה בתיק ע""א 3766/12 בבית המשפט העליון בירוש...",1. Case Number: A3766/12\n\n2. Case Type: Civi...,The decision in Case A3766/12 at the Supreme C...,,"[1113/97, 3766/12, 471ג(ד), 8467/06, 5016/00, ...","[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...",...,0,0,0,0,0,0,0,0,0,0


In [15]:
df_w_sparse_vec.shape

(100, 505)

# Save data and model

### Save updated DataFrame with sparse vectors

In [16]:
f_name_sum_w_ref_cases_sparse_vec = f_path.replace('sum_trans', 'sum_trans_w_refs_sparse_vec')
save_file(df_w_sparse_vec, f_name_sum_w_ref_cases_sparse_vec)

DataFrame successfully saved to data/sum_trans_w_refs_sparse_vec_data_ar_ap_100.xlsx


### Save the one-hot encoding model for future use

In [17]:
f_path_mlb = 'models/mlb_model_case_refs.pkl'
with open(f_path_mlb, "wb") as f:
    pickle.dump(mlb, f)

print(f"MLB model saved successfully to: {f_path_mlb}")

MLB model saved successfully to: models/mlb_model_case_refs.pkl
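
For later reuse, the pickled encoder can be loaded and applied to references extracted from a new document. The following is a minimal sketch under stated assumptions: `demo_mlb` is a small stand-in for the fitted `mlb`, and the temporary path is used only so the example does not touch `models/`. References that were not seen during fitting are ignored by `transform` (scikit-learn emits a warning), so the vector length stays fixed.

```python
import os
import pickle
import tempfile
import warnings

from sklearn.preprocessing import MultiLabelBinarizer

# Stand-in for the fitted encoder; in the notebook this would be `mlb`
demo_mlb = MultiLabelBinarizer().fit([['1890/16', 'ת/85'], ['345(א)(1)']])

# Temporary location used only for this illustration
demo_path = os.path.join(tempfile.mkdtemp(), 'mlb_demo.pkl')
with open(demo_path, 'wb') as f:
    pickle.dump(demo_mlb, f)

# Later, possibly in another notebook: load the encoder back
with open(demo_path, 'rb') as f:
    loaded_mlb = pickle.load(f)

# '9999/99' was never seen during fitting, so it is dropped with a warning
new_refs = [['1890/16', '9999/99']]
with warnings.catch_warnings():
    warnings.simplefilter('ignore')
    new_vec = loaded_mlb.transform(new_refs)

print(new_vec)
```

As with any pickle file, it should only be loaded from a trusted source and ideally with the same scikit-learn version that wrote it.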


### Save sparse vectors

In [18]:
# Save the vectors to a .npy file
f_name_sparse_vectors = 'db/vectors/sparse_vectors.npy'
np.save(f_name_sparse_vectors, legal_refs_enc)
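
Reading the vectors back is a single `np.load` call. The sketch below does a round trip with a tiny made-up matrix (`demo_enc`) and a temporary file, both assumptions for illustration, to show that the saved array comes back with the same shape and values.

```python
import os
import tempfile

import numpy as np

# Tiny stand-in for legal_refs_enc (rows = cases, columns = references)
demo_enc = np.array([[1, 0, 1], [0, 0, 0]])

# Temporary file used only for this illustration
demo_file = os.path.join(tempfile.mkdtemp(), 'sparse_vectors_demo.npy')
np.save(demo_file, demo_enc)

# Load the array back and compare it with the original
restored = np.load(demo_file)
print(restored.shape, np.array_equal(restored, demo_enc))
```

Note that the `.npy` file stores a dense array; row `i` corresponds to row `i` of the DataFrame at the time of saving, so the row order needs to stay in sync with the saved Excel file.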