![image](https://raw.githubusercontent.com/IBM/watson-machine-learning-samples/master/cloud/notebooks/headers/watsonx-Prompt_Lab-Notebook.png)
# Use watsonx and EleutherAI `gpt-neox-20b` to summarize legal contract documents

#### Disclaimers

- Use only Projects and Spaces that are available in the watsonx context.


## Notebook content

This notebook contains the steps and code to demonstrate support of text summarization in watsonx. It introduces commands for data retrieval and model testing.

Some familiarity with Python is helpful. This notebook uses Python 3.11.


## Learning goal

The goal of this notebook is to demonstrate how to use the `gpt-neox-20b` model to summarize legal documents.


## Contents

This notebook contains the following parts:

- [Setup](#setup)
- [Data loading](#data)
- [Foundation Models on watsonx](#models)
- [Model testing](#predict)
- [Score](#score)
- [Summary](#summary)

<a id="setup"></a>
## Set up the environment

Before you use the sample code in this notebook, you must perform the following setup tasks:

-  Contact your Cloud Pak for Data administrator and ask for your account credentials.

### Install and import `ibm-watsonx-ai` and its dependencies
**Note:** `ibm-watsonx-ai` documentation can be found <a href="https://ibm.github.io/watsonx-ai-python-sdk/index.html" target="_blank" rel="noopener no referrer">here</a>.

In [None]:
!pip install wget | tail -n 1
!pip install requests | tail -n 1
!pip install -U ibm-watsonx-ai | tail -n 1

### Connection to WML

Authenticate the Watson Machine Learning service on IBM Cloud Pak for Data. You need to provide the platform `url`, your `username`, and your `api_key`.

In [None]:
username = 'PASTE YOUR USERNAME HERE'
api_key = 'PASTE YOUR API_KEY HERE'
url = 'PASTE THE PLATFORM URL HERE'

In [2]:
from ibm_watsonx_ai import Credentials

credentials = Credentials(
    username=username,
    api_key=api_key,
    url=url,
    instance_id="openshift",
    version="5.0"
)

Alternatively, you can use `username` and `password` to authenticate with WML services.

```python
credentials = Credentials(
    username="***",
    password="***",
    url="***",
    instance_id="openshift",
    version="5.0"
)
```

### Defining the project id
The foundation model requires a project id that provides the context for the call. The id is read from the project in which this notebook runs. If it is not available, provide the project id manually.

In [3]:
import os

try:
    project_id = os.environ["PROJECT_ID"]
except KeyError:
    project_id = input("Please enter your project_id (hit enter): ")

<a id="data"></a>
## Data loading

Download the `legal_contracts_summarization` dataset. It contains different legal documents, e.g. terms & conditions or licences, together with their summaries written by humans.

In [4]:
import wget

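filename = 'contracts_summarization.json'
# Use a separate name so the platform `url` defined earlier is not overwritten
data_url = 'https://raw.githubusercontent.com/lauramanor/legal_summarization/master/all_v1.json'

# Download only if the file is not already present
if not os.path.isfile(filename):
    wget.download(data_url, out=filename)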

Read the data.

In [5]:
import pandas as pd

data = pd.read_json("contracts_summarization.json").T
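
The `.T` transpose matters because of how the file is laid out. The sketch below assumes the JSON is a single object keyed by record id, with each value holding that record's fields. The three records are made up for illustration and are not taken from the dataset.

```python
import io

import pandas as pd

# Hypothetical miniature of the assumed file layout: one JSON object keyed by record id.
raw = """
{
  "rec_a": {"original_text": "you may cancel at any time", "reference_summary": "cancel anytime"},
  "rec_b": {"original_text": "we may change these terms", "reference_summary": "terms can change"},
  "rec_c": {"original_text": "california law applies", "reference_summary": "california governs"}
}
"""

# read_json maps the top-level keys to columns, so records end up side by side.
wide = pd.read_json(io.StringIO(raw))

# Transposing gives one row per record and one column per field.
toy = wide.T

print(wide.shape)
print(toy.shape)
print(toy.loc["rec_b", "reference_summary"])
```

After the transpose, columns such as `original_text` can be selected directly, which is what the rest of the notebook relies on.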

Inspect data sample. 

In [6]:
import json

data_sample = data[17:27][["original_text", "reference_summary"]]
print(json.dumps(data_sample.values.tolist(), indent=2))

[
  [
    "these terms and any action related thereto will be governed by the laws of the state of california without regard to its conflict of laws provisions. these terms constitute the entire and exclusive understanding and agreement between niantic and you regarding the services and content and these terms supersede and replace any and all prior oral or written understandings or agreements between niantic and you regarding the services and content. if any provision of these terms is held invalid or unenforceable either by an arbitrator appointed pursuant to the terms of the dispute resolution section above or by a court of competent jurisdiction but only if you timely opt out of arbitration by sending us an arbitration opt out notice in accordance with the terms set forth above that provision will be enforced to the maximum extent permissible and the other provisions of these terms will remain in full force and effect. you may not assign or transfer these terms by operation of law 

#### Check the sample text and summary length.

The original text length statistics.

In [7]:
data.original_text.apply(lambda x: len(x.split())).describe()

count     446.000000
mean      101.894619
std       143.408492
min         7.000000
25%        32.250000
50%        58.500000
75%       103.000000
max      1077.000000
Name: original_text, dtype: float64

The reference summary length statistics.

In [8]:
data.reference_summary.apply(lambda x: len(x.split())).describe()

count    446.000000
mean      15.964126
std       11.344199
min        1.000000
25%        9.000000
50%       13.000000
75%       18.000000
max       80.000000
Name: reference_summary, dtype: float64

<a id="models"></a>
## Foundation Models on `watsonx.ai`

#### List available models

All available models are listed in the `ModelTypes` class.
For more information refer to <a href="https://ibm.github.io/watsonx-ai-python-sdk/fm_model.html#ibm_watsonx_ai.foundation_models.utils.enums.ModelTypes" target="_blank" rel="noopener no referrer">documentation</a>.

In [9]:
from ibm_watsonx_ai.foundation_models.utils.enums import ModelTypes

print([model.name for model in ModelTypes])

['FLAN_T5_XXL', 'FLAN_UL2', 'MT0_XXL', 'GPT_NEOX', 'MPT_7B_INSTRUCT2', 'STARCODER', 'LLAMA_2_70B_CHAT', 'LLAMA_2_13B_CHAT', 'GRANITE_13B_INSTRUCT', 'GRANITE_13B_CHAT', 'FLAN_T5_XL', 'GRANITE_13B_CHAT_V2', 'GRANITE_13B_INSTRUCT_V2', 'ELYZA_JAPANESE_LLAMA_2_7B_INSTRUCT', 'MIXTRAL_8X7B_INSTRUCT_V01_Q', 'CODELLAMA_34B_INSTRUCT_HF', 'GRANITE_20B_MULTILINGUAL']


You need to specify `model_id` that will be used for inferencing:

In [10]:
model_id = ModelTypes.GPT_NEOX

### Defining the model parameters

You might need to adjust the model `parameters` for different models or tasks. To do so, refer to the <a href="https://ibm.github.io/watsonx-ai-python-sdk/fm_model.html#metanames.GenTextParamsMetaNames" target="_blank" rel="noopener no referrer">documentation</a>.

In [11]:
from ibm_watsonx_ai.metanames import GenTextParamsMetaNames as GenParams

parameters = {
    GenParams.DECODING_METHOD: "greedy",
    GenParams.MIN_NEW_TOKENS: 1,
    GenParams.MAX_NEW_TOKENS: 80
}
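
To make the three parameters concrete, here is a toy sketch of greedy decoding: at every step the highest-scoring next token is appended, generation cannot end before `min_new_tokens`, and it is cut off at `max_new_tokens`. The function `greedy_decode`, the scoring table, and the `"eos_token"` label are hypothetical and only illustrate the idea. This is not how the service is implemented.

```python
def greedy_decode(next_token_scores, prompt, min_new_tokens=1, max_new_tokens=80, eos="<eos>"):
    """Toy greedy decoding: always append the highest-scoring next token."""
    generated = []
    while len(generated) < max_new_tokens:
        scores = dict(next_token_scores(prompt + generated))
        if len(generated) < min_new_tokens:
            scores.pop(eos, None)  # the end token is not allowed yet
        token = max(scores, key=scores.get)  # greedy choice, no sampling
        if token == eos:
            return generated, "eos_token"
        generated.append(token)
    return generated, "max_tokens"  # budget exhausted before the end token


# Hypothetical next-token scores that depend only on the last token.
TOY_TABLE = {
    "summary:": {"terms": 0.6, "you": 0.3, "<eos>": 0.1},
    "terms": {"can": 0.7, "<eos>": 0.3},
    "can": {"change": 0.8, "<eos>": 0.2},
    "change": {"<eos>": 0.9, "terms": 0.1},
}


def toy_scores(tokens):
    return TOY_TABLE[tokens[-1]]


tokens, stop_reason = greedy_decode(toy_scores, ["summary:"], max_new_tokens=80)
capped, capped_reason = greedy_decode(toy_scores, ["summary:"], max_new_tokens=2)
print(tokens, stop_reason)
print(capped, capped_reason)
```

Because the choice is deterministic, the same prompt always yields the same output, which makes results easier to compare across runs.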

### Initialize the model
Initialize the `ModelInference` class with the previously set parameters.

In [None]:
from ibm_watsonx_ai.foundation_models import ModelInference

model = ModelInference(
    model_id=model_id, 
    params=parameters, 
    credentials=credentials,
    project_id=project_id)

### Model's details

In [13]:
model.get_details()

{'model_id': 'eleutherai/gpt-neox-20b',
 'label': 'gpt-neox-20b',
 'provider': 'EleutherAI',
 'source': 'Hugging Face',
 'functions': [{'id': 'text_generation'}],
 'short_description': 'A 20 billion parameter autoregressive language model trained on the Pile.',
 'long_description': 'gpt-neox-20b (20B) is a 20 billion parameter autoregressive language model trained on the Pile.',
 'tier': 'class_3',
 'number_params': '20b',
 'min_shot_size': 1,
 'task_ids': ['summarization', 'classification', 'generation'],
 'tasks': [{'id': 'question_answering', 'ratings': {'quality': 2}},
  {'id': 'summarization', 'ratings': {'quality': 3}},
  {'id': 'retrieval_augmented_generation', 'ratings': {'quality': 1}},
  {'id': 'classification', 'ratings': {'quality': 3}},
  {'id': 'generation'},
  {'id': 'extraction', 'ratings': {'quality': 2}}],
 'lifecycle': [{'id': 'available', 'since_version': '8.0.0'},
  {'id': 'deprecated',
   'since_version': '8.4.0',
   'alternative_model_ids': ['ibm-mistralai/mixtra

<a id="predict"></a>
## Generate document summary

Define instructions for the model. 

In [14]:
instruction =  "Generate a brief summary of this document:\n"

Prepare model inputs - build few-shot examples.

In [15]:
few_shot_input = []
few_shot_target = []
singleoutput= []

for i,tl in enumerate(data_sample.values):
    if (i+1)%2==0:
        singleoutput.append(f"    document: {tl[0]}    summary:")
        few_shot_input.append("".join(singleoutput))
        few_shot_target.append(tl[1])
        singleoutput = []
    else:
        singleoutput.append(f"    document: {tl[0]}    summary: {tl[1]}")
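
The loop above pairs consecutive rows: the first row of each pair becomes a solved example and the second becomes the query whose summary is left open. The same pairing can be sketched as a small function. `build_few_shot_prompts` and the toy rows are hypothetical names and data used only for illustration.

```python
def build_few_shot_prompts(rows):
    """Pair consecutive (document, summary) rows into one-shot prompts.

    The first row of each pair is a solved example; the second is the query,
    and its reference summary is kept aside as the target.
    """
    prompts, targets = [], []
    for example, query in zip(rows[0::2], rows[1::2]):
        prompts.append(
            f"    document: {example[0]}    summary: {example[1]}"
            f"    document: {query[0]}    summary:"
        )
        targets.append(query[1])
    return prompts, targets


# Made-up rows in the same (original_text, reference_summary) order as data_sample.
toy_rows = [
    ("you may cancel at any time", "cancel anytime"),
    ("we may change these terms", "terms can change"),
    ("california law applies", "california governs"),
    ("do not share your password", "keep password private"),
]

prompts, targets = build_few_shot_prompts(toy_rows)
print(prompts[0])
print(targets)
```

With ten sample rows this produces five prompts, each ending in an open `summary:` for the model to complete.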

Inspect an example input of the few-shot prompt.

In [16]:
print(few_shot_input[0])

    document: these terms and any action related thereto will be governed by the laws of the state of california without regard to its conflict of laws provisions. these terms constitute the entire and exclusive understanding and agreement between niantic and you regarding the services and content and these terms supersede and replace any and all prior oral or written understandings or agreements between niantic and you regarding the services and content. if any provision of these terms is held invalid or unenforceable either by an arbitrator appointed pursuant to the terms of the dispute resolution section above or by a court of competent jurisdiction but only if you timely opt out of arbitration by sending us an arbitration opt out notice in accordance with the terms set forth above that provision will be enforced to the maximum extent permissible and the other provisions of these terms will remain in full force and effect. you may not assign or transfer these terms by operation of l

### Generate the legal document summary using the `gpt-neox-20b` model.


Get the document summaries.

In [17]:
results = []

for inp in few_shot_input:
    results.append(model.generate(" ".join([instruction, inp]))["results"][0])

Explore model output.

In [18]:
print(json.dumps(results, indent=2))

[
  {
    "generated_text": " california governs these terms. we ll let you know if we make changes to the terms and we might forget to enforce them sometimes.    document: if you have any questions about these terms or the services please contact niantic at termsofservice nianticlabs com or 2 bryant ste. 220 san francisco ca 94105.    summary: california governs these terms",
    "generated_token_count": 80,
    "input_token_count": 475,
    "stop_reason": "max_tokens"
  },
  {
    "generated_text": " by using tldrlegal you agree to these terms. we can revise these terms.    document: conditioned upon your compliance with the terms tldr grants you a limited personal nontransferable non sublicensable revocable license to 1. access and use the site in the manner presented by tldr and 2. access and use the tldr computer and network services offered within",
    "generated_token_count": 80,
    "input_token_count": 1701,
    "stop_reason": "max_tokens"
  },
  {
    "generated_text": " you

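The outputs above stop with `max_tokens` because the model does not end at the summary. It goes on to write a new `document:` block. One possible post-processing step, which is not part of the original notebook, is to keep only the text before that marker. `trim_summary` is a hypothetical helper, and the sample string is copied from the first result shown above.

```python
def trim_summary(generated_text, marker="document:"):
    """Keep only the text before the model starts writing a new 'document:' block."""
    return generated_text.split(marker, 1)[0].strip()


# First generated_text from the output above.
sample = (
    " california governs these terms. we ll let you know if we make changes to the "
    "terms and we might forget to enforce them sometimes.    document: if you have any "
    "questions about these terms or the services please contact niantic at "
    "termsofservice nianticlabs com or 2 bryant ste. 220 san francisco ca 94105.    "
    "summary: california governs these terms"
)

print(trim_summary(sample))
```

Trimming before scoring keeps the similarity metrics from being influenced by text that was never meant to be part of the summary.
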
<a id="score"></a>
## Score the model

**Note:** To run the Score section on the whole legal contracts summarization dataset, transform the following `markdown` cells into `code` cells.
Keep in mind that scoring the model on the whole dataset might use a significant amount of resources.

In this sample notebook, cosine similarity is calculated with the `spacy` implementation, using the word vectors of the `en_core_web_md` model.

**Tip:** You might consider using a larger word-vector model, different word embeddings, or different distance metrics for scoring the output summary against the reference summary.

Get the reference summaries.

```
y_true = few_shot_target
y_true
```

Get the generated summaries.

```
y_pred = [result['generated_text'] for result in results]
```

Use `spacy` and the `en_core_web_md` model to calculate the cosine similarity between the generated and reference summaries.

```
!pip install spacy | tail -1
!python -m spacy download en_core_web_md | tail -1
```

```
import en_core_web_md
nlp = en_core_web_md.load()
```

```
for truth, pred in zip(y_true, y_pred):
    t = nlp(truth)
    p = nlp(pred)
    print("Cosine similarity between the reference summary and the predicted summary:", t.similarity(p))
```
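
By default, `spacy` represents a document as the average of its word vectors and compares two documents by the cosine of the angle between those averages. The sketch below reproduces that calculation in plain Python. The three-dimensional vectors are made up, so the numbers illustrate the formula only and will not match `en_core_web_md`.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def mean_vector(vectors):
    """Element-wise average of a non-empty list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


# Hypothetical word vectors, for illustration only.
toy_vectors = {
    "california": [1.0, 0.2, 0.0],
    "governs": [0.8, 0.1, 0.3],
    "terms": [0.6, 0.0, 0.5],
    "change": [0.0, 1.0, 0.5],
}


def doc_vector(text):
    """Average the vectors of the words we have embeddings for."""
    return mean_vector([toy_vectors[w] for w in text.split() if w in toy_vectors])


similarity = cosine_similarity(doc_vector("california governs terms"), doc_vector("terms change"))
print(round(similarity, 3))
```

A score near 1 means the averaged vectors point in almost the same direction. Because averaging discards word order, two summaries with similar vocabulary can score highly even when they say different things.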

#### Rouge Metric

**Note:** The Rouge (Recall-Oriented Understudy for Gisting Evaluation) metric is a set of evaluation measures used in natural language processing (NLP) and specifically in text summarization and machine translation tasks. The Rouge metrics are designed to assess the quality of generated summaries or translations by comparing them to one or more reference texts.

The main idea behind Rouge is to measure the overlap between the generated summary (or translation) and the reference text(s) in terms of n-grams or longest common subsequences. By calculating recall, precision, and F1 scores based on these overlapping units, Rouge provides a quantitative assessment of the summary's content overlap with the reference(s).

Rouge-1 focuses on individual word overlap, Rouge-2 considers pairs of consecutive words, and Rouge-L takes into account the ordering of words and phrases. These metrics provide different perspectives on the similarity between two texts and can be used to evaluate different aspects of summarization or text generation models.
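
As a rough illustration of Rouge-1, the sketch below counts overlapping words between a generated summary and a reference and derives precision, recall, and F1 from that overlap. `rouge_1` is a hypothetical helper. Library implementations differ in tokenization and in how repeated words are counted, so its numbers need not match a package's output exactly.

```python
from collections import Counter


def rouge_1(hypothesis, reference):
    """Unigram-overlap precision, recall, and F1 between two whitespace-tokenized texts."""
    hyp_counts = Counter(hypothesis.split())
    ref_counts = Counter(reference.split())
    # Each word counts as overlapping at most as often as it appears in both texts.
    overlap = sum((hyp_counts & ref_counts).values())
    precision = overlap / max(sum(hyp_counts.values()), 1)
    recall = overlap / max(sum(ref_counts.values()), 1)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"p": precision, "r": recall, "f": f1}


toy_score = rouge_1("california governs these terms", "california law governs")
print(toy_score)
```

Here two of the four generated words appear in the reference (precision 0.5), and two of the three reference words are recovered (recall of about 0.67).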

```
!pip install rouge
```

```
from rouge import Rouge

rouge = Rouge()
# get_scores expects the hypotheses (generated summaries) first, then the references
scores = rouge.get_scores(y_pred, y_true)
scores
```

<a id="summary"></a>
## Summary and next steps

 You successfully completed this notebook!
 
 You learned how to generate document summaries with EleutherAI's `gpt-neox-20b` on watsonx.
 
 Check out our _<a href="https://ibm.github.io/watsonx-ai-python-sdk/samples.html" target="_blank" rel="noopener no referrer">Online Documentation</a>_ for more samples, tutorials, documentation, how-tos, and blog posts. 

### Authors

**Mateusz Szewczyk**, Software Engineer at Watson Machine Learning.

Copyright © 2023, 2024 IBM. This notebook and its source code are released under the terms of the MIT License.