From https://www.activestate.com/blog/how-to-do-text-summarization-with-python/:

I need to use **abstractive text summarization**.



See https://www.projectpro.io/article/text-summarization-python-nlp/546 for the different approaches.

In [17]:
import pandas as pd
from pathlib import Path

path = Path('../parsing_easylaw_ai/data')

In [49]:
cases = pd.read_csv(path / 'cases_base.csv', sep='\t', encoding='utf-8', nrows=10000)

## Check the number of tokens

In [50]:
import tiktoken
enc = tiktoken.get_encoding("cl100k_base")
assert enc.decode(enc.encode("hello world")) == "hello world"

# To get the tokeniser corresponding to a specific model in the OpenAI API:
enc = tiktoken.encoding_for_model("gpt-3.5-turbo")

In [51]:
cases['Tokens_of_judgment'] = cases['Judgment'].apply(lambda x: len(enc.encode(x)))

In [60]:
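# Pricing used here: input $0.0015 / 1K tokens, output $0.002 / 1K tokens
# Rough estimate: every judgment token is priced at the mean of the two rates,
# and the per-case average is extrapolated to 150000 cases.
avg_tokens = cases['Tokens_of_judgment'].sum() / cases.shape[0]
n_cases_total = 150000
blended_price_per_1k = (0.0015 + 0.002) / 2

print(f"Average number of tokens per 1 case: {avg_tokens}")
print(f"Total cost of all cases: {(avg_tokens * n_cases_total) * blended_price_per_1k / 1000}")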

Average number of tokens per 1 case: 2872.6155
Total cost of all cases: 754.06156875
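
The figure above prices every judgment token at the average of the input and output rates. A slightly more careful sketch prices the two sides separately. `estimate_cost` is a hypothetical helper, and the default of 200 output tokens per summary is an assumption for illustration, not a measured value. The small overhead of the system message and the prompt prefix is ignored.

```python
def estimate_cost(avg_input_tokens: float,
                  n_cases: int,
                  avg_output_tokens: float = 200.0,   # assumed summary length
                  input_price_per_1k: float = 0.0015,
                  output_price_per_1k: float = 0.002) -> float:
    """Return the estimated total cost in USD for summarizing n_cases judgments."""
    input_cost = avg_input_tokens * n_cases * input_price_per_1k / 1000
    output_cost = avg_output_tokens * n_cases * output_price_per_1k / 1000
    return input_cost + output_cost


cost = estimate_cost(avg_input_tokens=2872.6155, n_cases=150_000)
print(f"Estimated total cost: ${cost:.2f}")
```

Since input tokens dominate here, the result is driven mostly by the input rate; the output side only matters if the summaries are long.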


In [69]:
case = cases[(2500 < cases['Tokens_of_judgment']) & (cases['Tokens_of_judgment'] < 3000)].iloc[0]['Judgment']

# NLTK

# GENSIM

# SPACY

# BERT

# BART

# CHATGPT

In [70]:
import os
import openai

# Note: openai.ChatCompletion (used below) is the pre-1.0 interface of the openai package.
openai.api_key = os.getenv('CHATGPTKEY')
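
`os.getenv` returns `None` when the variable is not set, so a missing key only surfaces at the first API call. A minimal sketch of an earlier check, where `require_env` is a hypothetical helper and not part of any library:

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name!r} is not set")
    return value
```

It could be used as `openai.api_key = require_env('CHATGPTKEY')`.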

In [71]:
# messages = [{"role": "system", "content": "You are an intelligent assistant."}]

# while True:
#     message = input("User : ")
#     if not message:  # empty input ends the chat
#         break
#     messages.append({"role": "user", "content": message})
#     chat = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=messages)
#     reply = chat.choices[0].message.content
#     print(f"ChatGPT: {reply}")
#     messages.append({"role": "assistant", "content": reply})

In [72]:
messages = [{"role": "system", "content": "You are an intelligent assistant."}]

In [73]:
messages.append({"role": "user", "content": f"Summarize this: {case}"})
chat = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=messages)

In [74]:
chat.choices[0].message.content

"Salim Khan, the accused-respondent, was acquitted by the Sessions Judge in Bannu for the murder of Musharraf Khan. The judge concluded that the prosecution failed to prove the case against Salim Khan beyond a reasonable doubt. The state has challenged the acquittal through this appeal. The prosecution's case was that Salim Khan, along with another accused, shot and killed Musharraf Khan. The post-mortem examination showed multiple gunshot wounds to the deceased's body. The eyewitnesses to the occurrence were closely related to the deceased, raising doubts about their testimony. No independent witnesses were produced at the trial. The presence of the witnesses at the scene was questioned, as no one else in the busy market witnessed the incident. Salim Khan had earlier filed a complaint against the other accused, indicating a conflict between them. Considering these factors, the court concluded that the acquittal of Salim Khan was justified and dismissed the appeal."

In [77]:
print(chat.choices[0].message.content.replace('. ', '.\n'))

Salim Khan, the accused-respondent, was acquitted by the Sessions Judge in Bannu for the murder of Musharraf Khan.
The judge concluded that the prosecution failed to prove the case against Salim Khan beyond a reasonable doubt.
The state has challenged the acquittal through this appeal.
The prosecution's case was that Salim Khan, along with another accused, shot and killed Musharraf Khan.
The post-mortem examination showed multiple gunshot wounds to the deceased's body.
The eyewitnesses to the occurrence were closely related to the deceased, raising doubts about their testimony.
No independent witnesses were produced at the trial.
The presence of the witnesses at the scene was questioned, as no one else in the busy market witnessed the incident.
Salim Khan had earlier filed a complaint against the other accused, indicating a conflict between them.
Considering these factors, the court concluded that the acquittal of Salim Khan was justified and dismissed the appeal.
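
`replace('. ', '.\n')` only breaks after a period followed by a single space. A small sketch that also handles `?`, `!` and runs of whitespace, using the standard `re` module; `one_sentence_per_line` is a hypothetical helper name. Like the original, it will still split after abbreviations such as "v." or "Mr.", which are common in judgments.

```python
import re


def one_sentence_per_line(text: str) -> str:
    """Put each sentence on its own line by breaking after ., ! or ? followed by whitespace."""
    return re.sub(r'(?<=[.!?])\s+', '\n', text.strip())


print(one_sentence_per_line("The appeal was heard. Was it allowed?  No!"))
```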


In [78]:
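# Add the first summary to the history so the follow-up request can refer to it
messages.append({"role": "assistant", "content": chat.choices[0].message.content})
messages.append({"role": "user", "content": "Summarize even more concisely"})
chat = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=messages)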

In [81]:
print(chat.choices[0].message.content.replace('. ', '.\n'))

The accused, Salim Khan, was tried for the murder of Musharraf Khan.
The court found that the prosecution failed to prove the case beyond a reasonable doubt and acquitted the accused.
The state has challenged this acquittal but the court upheld it, stating that there was insufficient evidence and no reason to overturn the acquittal.
The appeal was dismissed.
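
For follow-up requests like the one above, the model sees only what is in `messages`, so each assistant reply has to be appended before the next user turn. A minimal sketch of a helper that keeps the history consistent. `ask` and `fake_send` are hypothetical names, and `send` stands in for the actual API call so the example runs without a key.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def ask(messages: List[Message], prompt: str,
        send: Callable[[List[Message]], str]) -> str:
    """Append a user turn, get the reply via `send`, and record it in the history."""
    messages.append({"role": "user", "content": prompt})
    reply = send(messages)
    messages.append({"role": "assistant", "content": reply})
    return reply


def fake_send(history: List[Message]) -> str:
    """Stand-in for the API call; reports how many messages it was given."""
    return f"reply to {len(history)} messages"


history = [{"role": "system", "content": "You are an intelligent assistant."}]
ask(history, "Summarize this: ...", fake_send)
ask(history, "Summarize even more concisely", fake_send)
print([m["role"] for m in history])
# prints ['system', 'user', 'assistant', 'user', 'assistant']
```

With the client used in this notebook, `send` could be `lambda msgs: openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=msgs).choices[0].message.content`.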
