![image](https://raw.githubusercontent.com/IBM/watson-machine-learning-samples/master/cloud/notebooks/headers/watsonx-Prompt_Lab-Notebook.png)
# Use watsonx and `mixtral_8x7b_instruct_v01` to summarize legal contract documents

#### Disclaimers

- Use only Projects and Spaces that are available in the watsonx context.


## Notebook content

This notebook contains the steps and code to demonstrate support of text summarization in watsonx. It introduces commands for data retrieval and model testing.

Some familiarity with Python is helpful. This notebook uses Python 3.10.


## Learning goal

The goal of this notebook is to demonstrate how to use the `mixtral_8x7b_instruct_v01` model to summarize legal documents.


## Contents

This notebook contains the following parts:

- [Setup](#setup)
- [Data loading](#data)
- [Foundation Models on watsonx](#models)
- [Model testing](#predict)
- [Score](#score)
- [Summary](#summary)

<a id="setup"></a>
##  Set up the environment

Before you use the sample code in this notebook, you must perform the following setup tasks:

-  Create a <a href="https://cloud.ibm.com/catalog/services/watson-machine-learning" target="_blank" rel="noopener noreferrer">Watson Machine Learning (WML) Service</a> instance (a free plan is offered and information about how to create the instance can be found <a href="https://dataplatform.cloud.ibm.com/docs/content/wsj/getting-started/wml-plans.html?context=wx&audience=wdp" target="_blank" rel="noopener noreferrer">here</a>).


### Install and import the dependencies

In [None]:
!pip install wget | tail -n 1
!pip install requests | tail -n 1
!pip install "ibm-watsonx-ai>=1.0.10" | tail -n 1

### Defining the WML credentials
This cell defines the WML credentials required to work with watsonx Foundation Model inferencing.

**Action:** Provide the IBM Cloud user API key. For details, see
[documentation](https://cloud.ibm.com/docs/account?topic=account-userapikey&interface=ui).

In [1]:
import getpass

credentials = {
    "url": "https://us-south.ml.cloud.ibm.com",
    "apikey": getpass.getpass("Please enter your WML api key (hit enter): ")
}

### Defining the project id
The Foundation Model requires a project id that provides the context for the call. The id is read from the project in which this notebook runs; if it is not available there, you are prompted to enter it.

In [2]:
import os

try:
    project_id = os.environ["PROJECT_ID"]
except KeyError:
    project_id = input("Please enter your project_id (hit enter): ")

<a id="data"></a>
## Data loading

Download the `legal_contracts_summarization` dataset. It contains different legal documents, e.g. terms & conditions or licences, together with their summaries written by humans.

In [3]:
import wget

filename = 'contracts_summarization.json'
url = 'https://raw.githubusercontent.com/lauramanor/legal_summarization/master/all_v1.json'

if not os.path.isfile(filename):
    wget.download(url, out=filename)

Read the data.

In [4]:
import pandas as pd

data = pd.read_json("contracts_summarization.json").T

Inspect a data sample.

In [5]:
import json

data_sample = data[17:27][["original_text", "reference_summary"]]
print(json.dumps(data_sample.values.tolist(), indent=2))

[
  [
    "these terms and any action related thereto will be governed by the laws of the state of california without regard to its conflict of laws provisions. these terms constitute the entire and exclusive understanding and agreement between niantic and you regarding the services and content and these terms supersede and replace any and all prior oral or written understandings or agreements between niantic and you regarding the services and content. if any provision of these terms is held invalid or unenforceable either by an arbitrator appointed pursuant to the terms of the dispute resolution section above or by a court of competent jurisdiction but only if you timely opt out of arbitration by sending us an arbitration opt out notice in accordance with the terms set forth above that provision will be enforced to the maximum extent permissible and the other provisions of these terms will remain in full force and effect. you may not assign or transfer these terms by operation of law 

#### Check the sample text and summary length.

The original text length statistics.

In [6]:
data.original_text.apply(lambda x: len(x.split())).describe()

count     446.000000
mean      101.894619
std       143.408492
min         7.000000
25%        32.250000
50%        58.500000
75%       103.000000
max      1077.000000
Name: original_text, dtype: float64

The reference summary length statistics.

In [7]:
data.reference_summary.apply(lambda x: len(x.split())).describe()

count    446.000000
mean      15.964126
std       11.344199
min        1.000000
25%        9.000000
50%       13.000000
75%       18.000000
max       80.000000
Name: reference_summary, dtype: float64

<a id="models"></a>
## Foundation Models on `watsonx.ai`

#### List available models

All available models are listed under the `ModelTypes` class.
For more information refer to [documentation](https://ibm.github.io/watsonx-ai-python-sdk/fm_model.html#ibm_watsonx_ai.foundation_models.utils.enums.ModelTypes).

In [8]:
from ibm_watsonx_ai.foundation_models.utils.enums import ModelTypes

print([model.name for model in ModelTypes])

['FLAN_T5_XXL', 'FLAN_UL2', 'MT0_XXL', 'GPT_NEOX', 'MPT_7B_INSTRUCT2', 'STARCODER', 'LLAMA_2_70B_CHAT', 'LLAMA_2_13B_CHAT', 'GRANITE_13B_INSTRUCT', 'GRANITE_13B_CHAT', 'FLAN_T5_XL', 'GRANITE_13B_CHAT_V2', 'GRANITE_13B_INSTRUCT_V2', 'ELYZA_JAPANESE_LLAMA_2_7B_INSTRUCT', 'MIXTRAL_8X7B_INSTRUCT_V01_Q', 'CODELLAMA_34B_INSTRUCT_HF', 'GRANITE_20B_MULTILINGUAL', 'MERLINITE_7B', 'GRANITE_20B_CODE_INSTRUCT', 'GRANITE_34B_CODE_INSTRUCT', 'GRANITE_3B_CODE_INSTRUCT', 'GRANITE_7B_LAB', 'GRANITE_8B_CODE_INSTRUCT', 'LLAMA_3_70B_INSTRUCT', 'LLAMA_3_8B_INSTRUCT', 'MIXTRAL_8X7B_INSTRUCT_V01']


You need to specify the `model_id` that will be used for inferencing:

In [11]:
model_id = ModelTypes.MIXTRAL_8X7B_INSTRUCT_V01

### Defining the model parameters

You might need to adjust the model `parameters` for different models or tasks. To do so, refer to the [documentation](https://ibm.github.io/watsonx-ai-python-sdk/fm_model.html#metanames.GenTextParamsMetaNames).

In [12]:
from ibm_watsonx_ai.metanames import GenTextParamsMetaNames as GenParams

parameters = {
    GenParams.DECODING_METHOD: "greedy",
    GenParams.MIN_NEW_TOKENS: 1,
    GenParams.MAX_NEW_TOKENS: 150
}

### Initialize the model
Initialize the `ModelInference` class with the previously set parameters.

In [13]:
from ibm_watsonx_ai.foundation_models import ModelInference

model = ModelInference(
    model_id=model_id, 
    params=parameters, 
    credentials=credentials,
    project_id=project_id)

### Model's details

In [14]:
model.get_details()

{'model_id': 'mistralai/mixtral-8x7b-instruct-v01',
 'label': 'mixtral-8x7b-instruct-v01',
 'provider': 'Mistral AI',
 'source': 'Hugging Face',
 'functions': [{'id': 'text_generation'}],
 'short_description': 'The Mixtral-8x7B Large Language Model (LLM) is a pretrained generative Sparse Mixture of Experts.',
 'long_description': "This model is made with AutoGPTQ, which mainly leverages the quantization technique to 'compress' the model weights from FP16 to 4-bit INT and performs 'decompression' on-the-fly before computation (in FP16). As a result, the GPU memory, and the data transferring between GPU memory and GPU compute engine, compared to the original FP16 model, is greatly reduced. The major quantization parameters used in the process are listed below.",
 'input_tier': 'class_1',
 'output_tier': 'class_1',
 'number_params': '46.7b',
 'min_shot_size': 1,
 'task_ids': ['summarization',
  'retrieval_augmented_generation',
  'classification',
  'generation',
  'code',
  'extraction']

<a id="predict"></a>
##  Generate document summary

Define instructions for the model. 

In [15]:
instruction =  "Generate a brief summary of this document:\n"

Prepare model inputs - build few-shot examples.

In [16]:
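few_shot_input = []
few_shot_target = []
singleoutput = []

# Rows are consumed in pairs: the first row of each pair is a solved example,
# the second row is the document the model is asked to summarize.
for i, tl in enumerate(data_sample.values):
    if (i + 1) % 2 == 0:
        # Second row of the pair: leave the summary open and keep the reference as the target.
        singleoutput.append(f"    document: {tl[0]}    summary:")
        few_shot_input.append("".join(singleoutput))
        few_shot_target.append(tl[1])
        singleoutput = []
    else:
        # First row of the pair: include both the document and its reference summary.
        singleoutput.append(f"    document: {tl[0]}    summary: {tl[1]}")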
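
The pairing logic in the cell above can also be sketched as a small standalone function. This is only an illustration under stated assumptions: the helper name `build_one_shot_prompts`, the newline-based layout, and the toy rows are hypothetical and are not part of the notebook or the dataset.

```python
def build_one_shot_prompts(pairs, instruction):
    """Pair consecutive (document, summary) rows.

    The first row of each pair becomes a solved example and the second row
    becomes the query whose summary the model should produce.
    """
    prompts, targets = [], []
    # Step through the rows two at a time; a trailing unpaired row is ignored.
    for k in range(0, len(pairs) - 1, 2):
        example_doc, example_summary = pairs[k]
        query_doc, query_summary = pairs[k + 1]
        prompt = (
            f"{instruction}"
            f"document: {example_doc}\nsummary: {example_summary}\n\n"
            f"document: {query_doc}\nsummary:"
        )
        prompts.append(prompt)
        targets.append(query_summary)
    return prompts, targets


# Toy rows invented for illustration only (not taken from the dataset).
toy_pairs = [
    ("you may cancel at any time.", "cancel whenever you want."),
    ("we may change these terms without notice.", "terms can change silently."),
]
prompts, targets = build_one_shot_prompts(
    toy_pairs, "Generate a brief summary of this document:\n"
)
print(prompts[0])
```

Each prompt ends with an open `summary:` field, so the model's continuation can be compared directly with the held-back reference in `targets`.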

Inspect an example input of the few-shot prompt.

In [17]:
print(few_shot_input[0])

    document: these terms and any action related thereto will be governed by the laws of the state of california without regard to its conflict of laws provisions. these terms constitute the entire and exclusive understanding and agreement between niantic and you regarding the services and content and these terms supersede and replace any and all prior oral or written understandings or agreements between niantic and you regarding the services and content. if any provision of these terms is held invalid or unenforceable either by an arbitrator appointed pursuant to the terms of the dispute resolution section above or by a court of competent jurisdiction but only if you timely opt out of arbitration by sending us an arbitration opt out notice in accordance with the terms set forth above that provision will be enforced to the maximum extent permissible and the other provisions of these terms will remain in full force and effect. you may not assign or transfer these terms by operation of l

### Generate the legal document summary using the `mixtral_8x7b_instruct_v01` model

Get the document summaries.

In [18]:
results = []

for inp in few_shot_input:
    results.append(model.generate(" ".join([instruction, inp]))["results"][0])
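
Each entry collected in `results` is a dictionary with the keys `generated_text`, `generated_token_count`, `input_token_count`, and `stop_reason`, as the printed output in this notebook shows. A minimal sketch of separating finished summaries from ones that may have been cut off follows. The mock entries are placeholders, the helper name `split_by_completion` is hypothetical, and the `"max_tokens"` stop reason is an assumption about what the service reports when `MAX_NEW_TOKENS` is reached.

```python
# Mock entries shaped like the notebook output; the text values are placeholders.
mock_results = [
    {"generated_text": " short summary.", "generated_token_count": 4,
     "input_token_count": 120, "stop_reason": "eos_token"},
    # "max_tokens" is an assumed value for generation stopped by the token limit.
    {"generated_text": " a summary that was cut", "generated_token_count": 150,
     "input_token_count": 900, "stop_reason": "max_tokens"},
]


def split_by_completion(results):
    """Separate summaries that ended with an end-of-sequence token from the rest."""
    complete, truncated = [], []
    for entry in results:
        text = entry["generated_text"].strip()  # drop the leading space
        if entry["stop_reason"] == "eos_token":
            complete.append(text)
        else:
            truncated.append(text)
    return complete, truncated


complete, truncated = split_by_completion(mock_results)
print(len(complete), "complete,", len(truncated), "possibly truncated")
```

If many summaries land in the second list, raising `GenParams.MAX_NEW_TOKENS` is one option to consider.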

Explore model output.

In [19]:
print(json.dumps(results, indent=2))

[
  {
    "generated_text": " if you have questions about the terms or services, contact niantic.",
    "generated_token_count": 16,
    "input_token_count": 514,
    "stop_reason": "eos_token"
  },
  {
    "generated_text": " tldr grants you a limited license to access and use the site. you may not use the site in any automated way. all content on the site is the property of tldr or third parties and is protected by copyright. you may not redistribute sell decompile reverse engineer or disassemble the software. you may use content explicitly licensed to you by tldr under the creative commons attribution 3 0 license cc by. you are responsible for any content you access through the sites. tldr is not responsible for content hosted by third parties.",
    "generated_token_count": 116,
    "input_token_count": 1774,
    "stop_reason": "eos_token"
  },
  {
    "generated_text": " you agree not to use tldr to gain unauthorized access to any portion of the sites or any other systems or netwo

<a id="score"></a>
## Score the model

**Note:** To run the Score section for model scoring on the whole legal contracts summarization dataset, transform the following `markdown` cells to `code` cells.
Keep in mind that scoring the model on the whole dataset might use a significant amount of resources.

In this sample notebook, the `spacy` implementation of cosine similarity with the `en_core_web_md` model is used to compare the generated and reference summaries.

**Tip:** You might consider using a larger language model, different word embeddings, or different distance metrics when scoring the output summary against the reference summary.

Get the true labels.

```
y_true = few_shot_target
y_true
```

Get the prediction labels.

```
y_pred = [result['generated_text'] for result in results]
```

Use `spacy` and the `en_core_web_md` model to calculate the cosine similarity of the generated and reference summaries.

```
!pip install spacy | tail -1
!python -m spacy download en_core_web_md | tail -1
```

```
import en_core_web_md
nlp = en_core_web_md.load()
```

```
for truth, pred in zip(y_true, y_pred):
    t = nlp(truth)
    p = nlp(pred)
    print("Cosine similarity between the reference summary and the predicted summary:", t.similarity(p))
```
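
By default, spaCy's `Doc.similarity` estimates cosine similarity over the average of the word vectors in each document. The formula itself can be sketched in plain Python; the function name and the toy vectors below are illustrative assumptions, not spaCy internals.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        # A zero vector has no direction; report no similarity.
        return 0.0
    return dot / (norm_u * norm_v)


# Toy vectors for illustration: parallel vectors score 1, orthogonal ones score 0.
print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))
```

Because the score depends only on direction, two summaries of very different lengths can still score highly when their averaged vectors point the same way.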

#### Rouge Metric

**Note:** The Rouge (Recall-Oriented Understudy for Gisting Evaluation) metric is a set of evaluation measures used in natural language processing (NLP) and specifically in text summarization and machine translation tasks. The Rouge metrics are designed to assess the quality of generated summaries or translations by comparing them to one or more reference texts.

The main idea behind Rouge is to measure the overlap between the generated summary (or translation) and the reference text(s) in terms of n-grams or longest common subsequences. By calculating recall, precision, and F1 scores based on these overlapping units, Rouge provides a quantitative assessment of the summary's content overlap with the reference(s).

Rouge-1 focuses on individual word overlap, Rouge-2 considers pairs of consecutive words, and Rouge-L takes into account the ordering of words and phrases. These metrics provide different perspectives on the similarity between two texts and can be used to evaluate different aspects of summarization or text generation models.
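
To make the unigram case concrete, here is a simplified sketch of Rouge-1 using whitespace tokenization. It is an illustration only: the function name `rouge_1` and the toy sentences are hypothetical, and library implementations may tokenize and compute the F-score differently.

```python
from collections import Counter


def rouge_1(hypothesis, reference):
    """Unigram-overlap precision, recall, and F1 between two strings."""
    hyp_counts = Counter(hypothesis.split())
    ref_counts = Counter(reference.split())
    # Counter intersection keeps the minimum count of each shared word.
    overlap = sum((hyp_counts & ref_counts).values())
    precision = overlap / max(sum(hyp_counts.values()), 1)
    recall = overlap / max(sum(ref_counts.values()), 1)
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return {"p": precision, "r": recall, "f": f1}


# Toy sentences for illustration only.
score = rouge_1("the cat sat", "the cat sat on the mat")
print(score)
```

In this toy case every generated word appears in the reference, so precision is 1.0, while only three of the six reference words are covered, so recall is 0.5.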

```
!pip install rouge
```

```
from rouge import Rouge

rouge = Rouge()
scores = rouge.get_scores(y_pred, y_true)  # hypotheses first, then references
scores
```

<a id="summary"></a>
## Summary and next steps

You successfully completed this notebook!

You learned how to generate document summaries with the `mixtral_8x7b_instruct_v01` model on watsonx.

Check out our _[Online Documentation](https://ibm.github.io/watson-machine-learning-sdk/samples.html)_ for more samples, tutorials, documentation, how-tos, and blog posts.

### Authors

**Mateusz Szewczyk**, Software Engineer at Watson Machine Learning.

Copyright © 2024 IBM. This notebook and its source code are released under the terms of the MIT License.