# Monthly Dacon Court Judgment Prediction AI Competition

Competition link

- https://dacon.io/competitions/official/236112/overview/description

Source of the competition dataset

- [JUSTICE: A Benchmark Dataset for Supreme Court's Judgment Prediction](https://arxiv.org/abs/2112.03414)
    - https://arxiv.org/pdf/2112.03414.pdf

## Load Data

In [1]:
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

In [2]:
from pathlib import Path

# Data directory relative to the current working directory
path = Path.cwd() / 'datasets'

In [3]:
train = pd.read_csv(path / 'train.csv')
test = pd.read_csv(path / 'test.csv')
submission = pd.read_csv(path / 'sample_submission.csv')

In [7]:
# Adjust colwidth so that the full text is displayed
pd.set_option('display.max_colwidth', None)

In [8]:
train.head()

Unnamed: 0,ID,first_party,second_party,facts,first_party_winner
0,TRAIN_0000,Phil A. St. Amant,Herman A. Thompson,"On June 27, 1962, Phil St. Amant, a candidate for public office, made a television speech in Baton Rouge, Louisiana. During this speech, St. Amant accused his political opponent of being a Communist and of being involved in criminal activities with the head of the local Teamsters Union. Finally, St. Amant implicated Herman Thompson, an East Baton Rouge deputy sheriff, in a scheme to move money between the Teamsters Union and St. Amant’s political opponent. \nThompson successfully sued St. Amant for defamation. Louisiana’s First Circuit Court of Appeals reversed, holding that Thompson did not show St. Amant acted with “malice.” Thompson then appealed to the Supreme Court of Louisiana. That court held that, although public figures forfeit some of their First Amendment protection from defamation, St. Amant accused Thompson of a crime with utter disregard of whether the remarks were true. Finally, that court held that the First Amendment protects uninhibited, robust debate, rather than an open season to shoot down the good name of anyone who happens to be a public servant. \n",1
1,TRAIN_0001,Stephen Duncan,Lawrence Owens,"Ramon Nelson was riding his bike when he suffered a lethal blow to the back of his head with a baseball bat. After two eyewitnesses identified Lawrence Owens from an array of photos and then a lineup, he was tried and convicted for Nelson’s death. Because Nelson was carrying cocaine and crack cocaine potentially for distribution, the judge at Owens’ bench trial ruled that Owens was probably also a drug dealer and was trying to “knock [Nelson] off.” Owens was found guilty of first-degree murder and sentenced to 25 years in prison.\nOwens filed a petition for a writ of habeas corpus on the grounds that his constitutional right to due process was violated during the trial. He argued that the eyewitness identification should have been inadmissible based on unreliability and that the judge impermissibly inferred a motive when a motive was not an element of the offense. The district court denied the writ of habeas corpus, and Owens appealed. The U.S. Court of Appeals for the Seventh Circuit reversed the denial and held that the trial judge’s inference about Owens’s motive violated his right to have his guilt adjudicated solely based on the evidence presented at trial.\n",0
2,TRAIN_0002,Billy Joe Magwood,"Tony Patterson, Warden, et al.","An Alabama state court convicted Billy Joe Magwood of murder and sentenced him to death. Subsequently, an Alabama federal district court partially granted Mr. Magwood's petition for federal habeas corpus relief. The court upheld his conviction but instructed the state court to look at mitigating evidence when resentencing Mr. Magwood. Upon resentencing, the state court sentenced Mr. Magwood to death once again. Mr. Magwood filed a second petition for federal habeas corpus relief with the federal district court arguing that a judicial rule was retroactively applied in his case and that he lacked effective counsel at sentencing. The district court granted the petition and vacated Mr. Magwood's death sentence.\nOn appeal, the U.S. Court of Appeals for the Eleventh circuit reversed, holding that prisoners may not raise challenges to an original sentence that could have been raised in an earlier petition. The court also held that Mr. Magwood's counsel was not ineffective because he failed to raise an argument that had already been decided by the state's highest court adverse to his client's position.\n",1
3,TRAIN_0003,Linkletter,Walker,"Victor Linkletter was convicted in state court on evidence illegally obtained by police prior to the Supreme Court decision concerning the Fourth Amendment in Mapp v. Ohio. Mapp applied the exclusionary rule to state criminal proceedings, denying the use of illegally obtained evidence at trial. Linkletter argued for a retrial based on the Mapp decision.\n",0
4,TRAIN_0004,William Earl Fikes,Alabama,"On April 24, 1953 in Selma, Alabama, an intruder broke into the apartment of the daughter of the city mayor. The daughter and the intruder struggled through several rooms until she was able to seize his knife, and he fled. The assailant had a towel over his head, so the victim could not identify the defendant during the trial. The police apprehended William Earl Fikes on the basis of a call from a private citizen and held him “on an open charge of investigation.” The police questioned Fikes for hours, placed him in jail, and limited his access to anyone familiar. After nearly a week of this treatment, Fikes confessed in the form of answers to the interrogator’s leading questions. Five days later, Fikes confessed under questioning a second time. When these confessions were admitted into the trial as evidence, Fikes did not testify regarding the events surrounding his interrogation because the judge had ruled he would be subjected to unlimited cross-examination. The jury convicted Fikes and sentenced him to death. The Supreme Court of Alabama affirmed.\n",1


In [9]:
test.head()

Unnamed: 0,ID,first_party,second_party,facts
0,TEST_0000,Salerno,United States,The 1984 Bail Reform Act allowed the federal courts to detain an arrestee prior to trial if the government could prove that the individual was potentially dangerous to other people in the community. Prosecutors alleged that Salerno and another person in this case were prominent figures in the La Cosa Nostra crime family.\n
1,TEST_0001,Milberg Weiss Bershad Hynes and Lerach,"Lexecon, Inc.","Lexecon Inc. was a defendant in a class action lawsuit. Under 28 USC section 1407(a), the lawsuit was transferred for pretrial proceedings to the District of Arizona. Section 1407(a) authorizes the Judicial Panel on Multidistrict Litigation to transfer civil actions with common issues of fact ""to any district for coordinated or consolidated pretrial proceedings,"" but provides that the Panel ""shall"" remand any such action to the original district ""at or before the conclusion of such pretrial proceedings."" After claims against it were dismissed, Lexecon brought suit against Milberg Weiss Bershad Hynes & Lerach and others (Milberg) in the class action lawsuit in the Northern District of Illinois. Ultimately, the Panel, under section 1407(a), ordered the case transferred to the District of Arizona. Afterwards, Lexecon moved for the Arizona District Court to remand the case to Illinois. Milberg filed a countermotion requesting the Arizona District Court to invoke section 1404(a) to ""transfer"" the case to itself for trial.Ultimately, the court assigned the case to itself and the Court of Appeals affirmed its judgment.\n"
2,TEST_0002,"No. 07-582\t Title: \t Federal Communications Commission, et al.","Fox Television Stations, Inc., et al.","In 2002 and 2003, Fox Television Stations broadcast the Billboard Music Awards, an annual program honoring top-selling musicians. During the broadcasts, one musician used an explicative in his acceptance speech, and a presenter used two expletives. The Federal Communications Commission (FCC), although it had previously taken the position that such fleeting and isolated expletives did not violate its indecency regime, issued notices of liability to Fox for broadcasting the profane language. The FCC argued that previous decisions referring to ""fleeting"" expletives were merely staff letters and dicta and did not accurately represent its position on the matter. Fox appealed the FCC sanctions to the U.S. Court of Appeals for the Second Circuit.\nThe Second Circuit held that the FCC's liability order was ""arbitrary and capricious"" under the governing Administrative Procedure Act because the FCC had completely reversed its position on fleeting expletives without giving a proper justification. The Second Circuit also failed to find any evidence that the expletives were harmful.\n"
3,TEST_0003,Harold Kaufman,United States,"During his trial for armed robbery of a federally insured savings and loan association, Harold Kaufman admitted to the crime but unsuccessfully claimed insanity. He was convicted and the U.S. Court of Appeals for the Eighth Circuit affirmed. Kaufman then filed a post-conviction motion in district court challenging the evidence that proved his sanity. He alleged that the evidence was unlawfully seized in violation of the Fourth Amendment. The district court denied relief, holding that unlawful search and seizure was not an available attack in post-conviction proceedings. The Eighth Circuit affirmed.\n"
4,TEST_0004,Berger,Hanlon,"In 1993, a magistrate judge issued a warrant authorizing the search of Paul and Erma Berger's Montana ranch for evidence of the taking of wildlife in violation of federal law. Later, a multiple-vehicle caravan consisting of government agents and a crew of photographers and reporters from CNN proceeded to the ranch. In executing the warrant, the federal officers allowed the media crew to accompany and observe them. Subsequently, the Berger's filed suit, asserting that the officials, special agents of the United States Fish and Wildlife Service and an assistant United States attorney, had violated their rights under the Fourth Amendment. The District Court concluded that the officials were entitled to qualified immunity, as no clearly established law protecting individuals from the commercial recording of a search of their premises existed at the time. The Court of Appeals reversed.\n"


Paper referenced for modeling

- [Legal Judgement Prediction for UK Courts](https://dl.acm.org/doi/10.1145/3388176.3388183)
    - https://ueaeprints.uea.ac.uk/id/eprint/75123/1/Accepted_Manuscript.pdf

Modeling = feature + algorithm

- "the top performing combination : LR algorithm paired with the TFIDF vector representation."
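As a rough illustration of "feature + algorithm", the two pieces can be chained into a single scikit-learn estimator. This is only a sketch on made-up sentences: `toy_facts`, `toy_labels`, and `tfidf_lr` are hypothetical names, and the notebook itself builds its features separately from the model rather than with a pipeline.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy stand-ins for the `facts` column and the `first_party_winner` label
# (hypothetical data, not taken from the competition files).
toy_facts = [
    "The court of appeals reversed the conviction.",
    "The district court denied the petition.",
    "The state supreme court affirmed the judgment.",
    "The appeals court reversed and remanded the case.",
]
toy_labels = [1, 0, 0, 1]

# feature (TF-IDF) + algorithm (logistic regression) as one estimator
tfidf_lr = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
tfidf_lr.fit(toy_facts, toy_labels)

# The pipeline vectorizes raw text before predicting
toy_pred = tfidf_lr.predict(["The court reversed the ruling."])
print(toy_pred.shape)
```

A pipeline like this also keeps the vectorizer inside each cross-validation fold when it is passed to `GridSearchCV`, which may be worth considering for this task.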

## Vectorize

In [10]:
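vectorizer = TfidfVectorizer()

def get_vector(vectorizer, df, train_mode):
    # Fit the TF-IDF vocabulary on the training facts only; reuse it for test data
    if train_mode:
        X_facts = vectorizer.fit_transform(df['facts'])
    else:
        X_facts = vectorizer.transform(df['facts'])
    # Party names are encoded with the vocabulary learned from `facts`
    X_party1 = vectorizer.transform(df['first_party'])
    X_party2 = vectorizer.transform(df['second_party'])

    # Concatenate column-wise as [party1 | party2 | facts] into a dense ndarray
    X = np.concatenate([X_party1.toarray(), X_party2.toarray(), X_facts.toarray()], axis=1)
    return X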
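Because TF-IDF output is sparse, the same three blocks could also be stacked without densifying them. The sketch below is a hypothetical variant (`get_sparse_vector` and the `toy` frame are not part of the notebook) that uses `scipy.sparse.hstack` and keeps the same column order.

```python
import pandas as pd
from scipy.sparse import hstack
from sklearn.feature_extraction.text import TfidfVectorizer

def get_sparse_vector(vectorizer, df, train_mode):
    """Hypothetical sparse variant of get_vector (same column order)."""
    if train_mode:
        X_facts = vectorizer.fit_transform(df['facts'])
    else:
        X_facts = vectorizer.transform(df['facts'])
    X_party1 = vectorizer.transform(df['first_party'])
    X_party2 = vectorizer.transform(df['second_party'])
    # Stack column-wise as [party1 | party2 | facts] and convert to CSR
    return hstack([X_party1, X_party2, X_facts]).tocsr()

# Tiny made-up frame with the same column names as train.csv
toy = pd.DataFrame({
    'first_party': ['Smith', 'Jones'],
    'second_party': ['United States', 'Alabama'],
    'facts': ['Smith sued the United States.', 'Jones was convicted in Alabama.'],
})
toy_vectorizer = TfidfVectorizer()
X_sparse = get_sparse_vector(toy_vectorizer, toy, True)
print(X_sparse.shape)
```

`LogisticRegression`, `KNeighborsClassifier`, and `RandomForestClassifier` all accept sparse input, so a matrix built this way should be usable in the cells that follow.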

In [19]:
X = get_vector(vectorizer, train, True)
Y = train["first_party_winner"]
X_test = get_vector(vectorizer, test, False)

In [12]:
X.shape

(2478, 52377)
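The 52377 columns are three copies of the same vocabulary (52377 = 3 × 17459), one block each for the two party names and the facts. A quick back-of-the-envelope estimate of what that shape costs as a dense array, assuming the default `float64` output of `TfidfVectorizer`:

```python
import numpy as np

n_rows, n_cols = 2478, 52377                      # shape reported above
bytes_per_value = np.dtype(np.float64).itemsize   # 8 bytes per float64 value
dense_bytes = n_rows * n_cols * bytes_per_value

print(f"{dense_bytes:,} bytes")
print(f"{dense_bytes / 1024**3:.2f} GiB")          # prints 0.97 GiB
```

This counts the training matrix only; the dense test matrix adds to it.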

## Modeling

In [None]:
from sklearn.model_selection import train_test_split

X_train, X_val, y_train, y_val = train_test_split(X, Y, test_size=0.1)

### Logistic Regression

In [13]:
from sklearn.model_selection import GridSearchCV

In [14]:
lr = LogisticRegression()

In [15]:
lr_params = {
    'C':[0.1, 0.5, 1.0],
    'max_iter':[800, 900, 1000],
    'solver':['liblinear', 'lbfgs'],
    'random_state':[122]
}

In [16]:
grid_cv_lr = GridSearchCV(estimator=lr,
                       param_grid=lr_params,
                       scoring='accuracy',
                       cv=5)

grid_cv_lr.fit(X_train, y_train)

In [17]:
print('Best hyperparameters:', grid_cv_lr.best_params_)

Best hyperparameters: {'C': 0.1, 'max_iter': 800, 'random_state': 122, 'solver': 'liblinear'}
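`best_params_` shows only the winning combination. Since `max_iter` is just a cap on solver iterations, several grid points may score identically, and looking at the full `cv_results_` table can make that visible. The sketch below runs on synthetic data from `make_classification`; `toy_search` and `cv_table` are hypothetical names, and in the notebook the same table could be built from `grid_cv_lr.cv_results_`.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the TF-IDF features and labels
X_toy, y_toy = make_classification(n_samples=200, n_features=20, random_state=122)

toy_search = GridSearchCV(
    estimator=LogisticRegression(max_iter=1000),
    param_grid={'C': [0.1, 0.5, 1.0], 'solver': ['liblinear', 'lbfgs']},
    scoring='accuracy',
    cv=5,
)
toy_search.fit(X_toy, y_toy)

# One row per parameter combination, best-ranked first
score_cols = ['param_C', 'param_solver', 'mean_test_score',
              'std_test_score', 'rank_test_score']
cv_table = (
    pd.DataFrame(toy_search.cv_results_)[score_cols]
    .sort_values('rank_test_score')
)
print(cv_table.to_string(index=False))
```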


In [68]:
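from sklearn.metrics import accuracy_score

# Evaluate the best estimator (refit on X_train) on the held-out validation split
model = grid_cv_lr.best_estimator_
grid_cv_lr_pred = model.predict(X_val)
score = accuracy_score(y_val, grid_cv_lr_pred)
print(f"Accuracy: {score:.4f}")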

Accuracy: 0.6801
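An accuracy figure is easier to interpret next to a majority-class baseline, since the class balance of `first_party_winner` is not shown in this notebook. The sketch below uses made-up labels (two of every three are class 1) purely to show the mechanics; in the notebook the baseline would be fit on `X_train`, `y_train` and scored on `X_val`, `y_val`.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score

# Hypothetical imbalanced labels: 2 of every 3 samples are class 1
y_toy_train = np.array([1, 1, 0] * 20)
y_toy_val = np.array([1, 1, 0] * 5)

# DummyClassifier ignores the features, so zeros are enough here
X_toy_train = np.zeros((len(y_toy_train), 1))
X_toy_val = np.zeros((len(y_toy_val), 1))

# Always predict the most frequent training class
baseline = DummyClassifier(strategy='most_frequent')
baseline.fit(X_toy_train, y_toy_train)
baseline_acc = accuracy_score(y_toy_val, baseline.predict(X_toy_val))
print(f"Majority-class accuracy: {baseline_acc:.4f}")  # 0.6667 on this toy data
```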


In [69]:
pred_lr = model.predict(X_test)

submission['first_party_winner'] = pred_lr
submission.to_csv('./baseline_hyperparams.csv', index=False)
print('Done')

Done


### KNN

In [13]:
from sklearn.neighbors import KNeighborsClassifier

kn = KNeighborsClassifier()
kn.fit(X_train, y_train)

### RF

In [35]:
from sklearn.metrics import accuracy_score

In [33]:
from sklearn.model_selection import train_test_split

X_train, X_val, y_train, y_val = train_test_split(X, Y, test_size=0.15, stratify=Y, random_state=122)
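Note that this split overwrites the earlier 10% unstratified split, so the Logistic Regression and RF sections are evaluated on different validation sets. The effect of `stratify` can be seen on a small synthetic example (the arrays below are hypothetical): the class ratio is preserved in both parts.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical labels with a 70/30 class ratio
y_toy = np.array([1] * 70 + [0] * 30)
X_toy = np.arange(100).reshape(-1, 1)

X_tr, X_va, y_tr, y_va = train_test_split(
    X_toy, y_toy, test_size=0.2, stratify=y_toy, random_state=122
)

# Both parts keep the 70/30 ratio of the full label vector
print(y_tr.mean(), y_va.mean())
```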

In [42]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

rf = RandomForestClassifier()

rf_params = {
    'random_state':[42],
    'n_estimators':[100, 140, 200]
}

grid_cv_rf = GridSearchCV(estimator=rf,
                          param_grid=rf_params,
                          scoring='accuracy',
                          cv=5)

grid_cv_rf.fit(X_train, y_train)
print('Best hyperparameters:', grid_cv_rf.best_params_)

Best hyperparameters: {'n_estimators': 140, 'random_state': 42}


In [46]:
model = grid_cv_rf.best_estimator_
grid_cv_pred = model.predict(X_val)
score = accuracy_score(y_val, grid_cv_pred)
print(f"Accuracy: {score:.4f}")

Accuracy: 0.6720


In [36]:
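# GridSearchCV fits clones, so `rf` itself must be fitted before predicting
rf.fit(X_train, y_train)
preds_rf = rf.predict(X_val)

score = accuracy_score(y_val, preds_rf)
print(f"Accuracy: {score:.4f}")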

Accuracy: 0.6694


## Inference & Submission

In [11]:
pred = model.predict(X_test)

In [12]:
submission['first_party_winner'] = pred
submission.to_csv('./baseline_submit.csv', index=False)
print('Done')

Done


In [14]:
pred_kn = kn.predict(X_test)

submission['first_party_winner'] = pred_kn
submission.to_csv('./knn_model.csv', index=False)
print('Done')

Done


In [45]:
pred = model.predict(X_test)
submission['first_party_winner'] = pred
submission.to_csv('./rf_model.csv', index=False)
print('Done')

Done
