![image](https://raw.githubusercontent.com/IBM/watson-machine-learning-samples/master/cloud/notebooks/headers/watsonx-Prompt_Lab-Notebook.png)
# Use watsonx to summarize legal contract documents

**Note:** For the watsonx challenge, run these notebooks locally on your laptop or desktop, not in IBM Cloud. Instructions for running the notebook locally are provided in the readme.md file in the zip file.

This notebook contains the steps and code to demonstrate text summarization support in watsonx. It introduces commands for data retrieval and model testing.

Some familiarity with Python is helpful. This notebook uses Python 3.10.

<a id="setup"></a>
##  Set up the environment



### Install and import the dependencies

In [1]:
!pip install rouge_score | tail -n 1
!pip install nltk | tail -n 1
!pip install ibm-watson-machine-learning==1.0.312 | tail -n 1



**Note:** Restart the notebook kernel to pick up the versions of the packages installed above.

In [2]:
import os, getpass, json
from pandas import read_json

### Watsonx API connection
This cell defines the credentials required to work with the watsonx API for foundation
model inferencing.

**Action:** Provide the IBM Cloud user API key. For details, see
[documentation](https://cloud.ibm.com/docs/account?topic=account-userapikey&interface=ui).

In [3]:
credentials = {
    "url": "https://us-south.ml.cloud.ibm.com",
    "apikey": getpass.getpass("Please enter your WML api key (hit enter): ")
}

### Defining the project ID
The API requires a project ID that provides the context for the call. The cell below reads the ID from the `PROJECT_ID` environment variable if it is set; otherwise, it prompts you to enter the project ID.

**Hint**: You can find the `project_id` as follows. Open the prompt lab in watsonx.ai. At the very top of the UI, there will be `Projects / <project name> /`. Click on the `<project name>` link. Then get the `project_id` from Project's Manage tab (Project -> Manage -> General -> Details).


In [4]:
try:
    project_id = os.environ["PROJECT_ID"]
except KeyError:
    project_id = input("Please enter your project_id (hit enter): ")

<a id="data"></a>
## Train/test data loading

Load the train and test datasets. First, use the training dataset (`train_data`) to work with the models and to prepare and tune the prompt. Then use the test dataset (`test_data`) to calculate the metric scores for the selected model, prompt, and parameters.

In [5]:
filename_test = 'data/Summarisation/test.json'
filename_train = 'data/Summarisation/train.json'

test_data = read_json(filename_test).T[["original_text", "reference_summary"]]
train_data = read_json(filename_train).T[["original_text", "reference_summary"]]
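
The transpose (`.T`) in the cell above suggests that each JSON file is keyed by record ID, with the text fields nested under each ID. That layout is an inference from the code, not something the notebook states. A minimal standard-library sketch of the same selection, using hypothetical placeholder records (`rec01`, `rec02`, and the `source` field are invented for illustration):

```python
import json

# Hypothetical two-record file content, keyed by record ID (assumed layout).
raw_text = """
{
  "rec01": {"original_text": "first placeholder clause",
            "reference_summary": "first placeholder summary",
            "source": "demo"},
  "rec02": {"original_text": "second placeholder clause",
            "reference_summary": "second placeholder summary",
            "source": "demo"}
}
"""

wanted = ["original_text", "reference_summary"]
records = json.loads(raw_text)

# Keep only the two fields the notebook uses, one row per record ID.
rows = {
    record_id: {field: fields[field] for field in wanted}
    for record_id, fields in records.items()
}

for record_id, row in rows.items():
    print(record_id, row)
```

If the real files use a different layout, the `read_json(...).T[...]` call in the notebook remains the reference; this sketch only illustrates the shape of the result.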

In [6]:
train_data.head()

| | original_text | reference_summary |
|---|---|---|
| legalsum05 | subject to your compliance with these terms ni... | don t copy modify resell distribute or reverse... |
| legalsum06 | for purposes of these terms a content means th... | we grant you full ownership of your user conte... |
| legalsum07 | the app permits account holders to capture and... | trading s gonna be a thing. don t try to bring... |
| legalsum08 | we may cancel suspend or terminate your accoun... | if you haven t played for a year you mess up o... |
| legalsum13 | we may terminate your access to and use of the... | both you and we can terminate your account and... |


In [7]:
test_data.head()

| | original_text | reference_summary |
|---|---|---|
| tosdr263 | the use of our services imply your approval of... | we reserve the right to modify the terms at an... |
| tosdr082 | we collect content of your files and communica... | the service can read your private messages. |
| tosdr018 | you do not have to give your personal or legal... | the service allows you to use pseudonyms. |
| legalsum50 | you agree that you will not remove obscure or ... | keep copyright and trademark notices intact. |
| tosdr215 | you may optionally add other information to yo... | the service allows you to use pseudonyms. |


<a id="models"></a>
## Foundation models on watsonx

Specify the `model_id` that will be used for inferencing.

**Action:** Use the `FLAN_UL2` model.

In [8]:
from ibm_watson_machine_learning.foundation_models.utils.enums import ModelTypes

In [9]:
model_id = ModelTypes.FLAN_UL2

<a id="predict"></a>
##  Generate document summary

Define instructions for the model.

**Note:** **Start with the [watsonx.ai Prompt Lab](https://dataplatform.cloud.ibm.com/wx/home?context=wx)** to find the prompt that gives the best results on a small subset of the training records (the `train_data` variable). Do not run inference on all of `train_data`, as it will take a long time to get the results. To get a sample, use e.g. `train_data.head(n=10)` for the first 10 records or `train_data.sample(n=10)` for 10 random records. Only once you have identified the best-performing prompt, update this notebook to use it and compute the metrics on the test data.

**Action:** Edit the cell below and add your own prompt. The prompt below contains an instruction (the first sentence) and one example. To change the instruction or to add more examples, edit the prompt accordingly.

In [10]:
instruction =  f"""
Generate a brief summary of this document:\n

document: {train_data.original_text.iloc[0]}\n
summary: {train_data.reference_summary.iloc[0]}\n\n
"""

In [11]:
print(instruction)


Generate a brief summary of this document:


document: subject to your compliance with these terms niantic grants you a limited nonexclusive nontransferable non sublicensable license to download and install a copy of the app on a mobile device and to run such copy of the app solely for your own personal noncommercial purposes. except as expressly permitted in these terms you may not a copy modify or create derivative works based on the app b distribute transfer sublicense lease lend or rent the app to any third party c reverse engineer decompile or disassemble the app or d make the functionality of the app available to multiple users through any means. niantic reserves all rights in and to the app not expressly granted to you under these terms. if you accessed or downloaded the app from the apple store then you agree to use the app only a on an apple branded product or device that runs ios apple s proprietary operating system software and b as permitted by the usage rules set forth in

### Defining the model parameters
We need to provide a set of model parameters that will influence the result:

In [12]:
from ibm_watson_machine_learning.metanames import GenTextParamsMetaNames as GenParams
from ibm_watson_machine_learning.foundation_models.utils.enums import DecodingMethods

parameters = {
    GenParams.MAX_NEW_TOKENS: 80,
    GenParams.MIN_NEW_TOKENS: 1,
    GenParams.DECODING_METHOD: DecodingMethods.GREEDY
}

Initialize the `Model` class.

In [13]:
from ibm_watson_machine_learning.foundation_models import Model

model = Model(
    model_id=model_id,
    params=parameters,
    credentials=credentials,
    project_id=project_id)

Generate the document summaries.

**Note:** Execution of this cell could take several minutes.

In [14]:
results = []
documents = list(test_data.original_text)

for input_text in documents:
    results.append(model.generate_text(prompt=" ".join([instruction, input_text])))

<a id="score"></a>
## Calculate the ROUGE-L metric

This sample notebook uses the `rouge_score` module to calculate ROUGE-L.

#### ROUGE metric

**Note:** ROUGE (Recall-Oriented Understudy for Gisting Evaluation) is a set of evaluation measures used in natural language processing (NLP), specifically in text summarization and machine translation tasks. The ROUGE metrics are designed to assess the quality of generated summaries or translations by comparing them to one or more reference texts.

The main idea behind ROUGE is to measure the overlap between the generated summary (or translation) and the reference text(s) in terms of n-grams or longest common subsequences. By calculating recall, precision, and F1 scores based on these overlapping units, ROUGE provides a quantitative assessment of the summary's content overlap with the reference(s).

ROUGE-1 measures the overlap of individual words, ROUGE-2 measures the overlap of pairs of consecutive words, and ROUGE-L is based on the longest common subsequence, so it rewards words that appear in the same order even when they are not adjacent. These metrics provide different perspectives on the similarity between two texts and can be used to evaluate different aspects of summarization or text generation models.
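
To make these definitions concrete, here is a minimal from-scratch sketch of the F1 variants of ROUGE-1, ROUGE-2, and ROUGE-L. It assumes plain whitespace tokenization, so its numbers can differ from those of the `rouge_score` package, which applies its own tokenizer. The function names and the two sample sentences are illustrative, not part of the notebook.

```python
from collections import Counter


def f1(matches, pred_len, ref_len):
    """Harmonic mean of precision (matches/pred_len) and recall (matches/ref_len)."""
    if matches == 0:
        return 0.0
    precision = matches / pred_len
    recall = matches / ref_len
    return 2 * precision * recall / (precision + recall)


def rouge_n_f1(prediction, reference, n=1):
    """F1 over overlapping n-grams, counted with multiplicity."""
    def ngrams(tokens):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    pred, ref = ngrams(prediction.split()), ngrams(reference.split())
    overlap = sum((pred & ref).values())  # multiset intersection
    return f1(overlap, sum(pred.values()), sum(ref.values()))


def lcs_length(a, b):
    """Length of the longest common subsequence, via row-by-row dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(prediction, reference):
    """F1 based on the longest common subsequence of tokens."""
    pred, ref = prediction.split(), reference.split()
    return f1(lcs_length(pred, ref), len(pred), len(ref))


reference = "the service can read your private messages"
prediction = "the service may read private messages"

print("rouge1:", round(rouge_n_f1(prediction, reference, 1), 4))
print("rouge2:", round(rouge_n_f1(prediction, reference, 2), 4))
print("rougeL:", round(rouge_l_f1(prediction, reference), 4))
```

In this example five of the six predicted words also occur in the seven-word reference, and they occur in the same order, so ROUGE-1 and ROUGE-L are both 10/13. Only two bigrams match, so ROUGE-2 is much lower at 4/11.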

In [15]:
from rouge_score import rouge_scorer
from collections import defaultdict
import numpy as np

def get_rouge_score(predictions, references):
    scorer = rouge_scorer.RougeScorer(['rouge1', 'rouge2', 'rougeL', 'rougeLsum'])
    aggregate_score = defaultdict(list)

    for result, ref in zip(predictions, references):
        # RougeScorer.score expects (target, prediction), i.e. the reference first.
        for key, val in scorer.score(ref, result).items():
            aggregate_score[key].append(val.fmeasure)

    scores = {}
    for key in aggregate_score:
        scores[key] = np.mean(aggregate_score[key])
    
    return scores

In [16]:
print(get_rouge_score(results, test_data.reference_summary.values))

{'rouge1': 0.0836478894917744, 'rouge2': 0.011333333333333332, 'rougeL': 0.07080349589909705, 'rougeLsum': 0.07080349589909705}


---

Copyright © 2023 IBM. This notebook and its source code are released under the terms of the MIT License.