# Generative AI Use Case: Summarize Dialogue

In [6]:
%pip install datasets
%pip install transformers
%pip install torch
%pip install torchdata


Collecting datasets
  Downloading datasets-2.13.1-py3-none-any.whl (486 kB)
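The install log above is cut off, so it does not show which versions were finally resolved (only `datasets-2.13.1` appears). Below is a minimal sketch for checking this from inside the notebook with the standard-library `importlib.metadata` module. The helper name `installed_versions` is hypothetical, and a missing package is reported as `None` instead of raising an error.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return versions


if __name__ == "__main__":
    for pkg, ver in installed_versions(["datasets", "transformers", "torch", "torchdata"]).items():
        print(f"{pkg}: {ver or 'not installed'}")
```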

In [7]:
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
from transformers import GenerationConfig

In [8]:
huggingface_dataset_name = "knkarthick/dialogsum"
dataset = load_dataset(huggingface_dataset_name)

Downloading readme: 100%|██████████| 4.56k/4.56k [00:00<?, ?B/s]


Downloading and preparing dataset csv/knkarthick--dialogsum to C:/Users/astaj/.cache/huggingface/datasets/knkarthick___csv/knkarthick--dialogsum-c07c4cf4362c223c/0.0.0/eea64c71ca8b46dd3f537ed218fc9bf495d5707789152eb2764f5c78fa66d59d...


Downloading data: 100%|██████████| 11.3M/11.3M [00:01<00:00, 10.5MB/s]
Downloading data: 100%|██████████| 442k/442k [00:00<00:00, 633kB/s]
Downloading data: 100%|██████████| 1.35M/1.35M [00:00<00:00, 1.66MB/s]
Downloading data files: 100%|██████████| 3/3 [00:06<00:00,  2.14s/it]
Extracting data files: 100%|██████████| 3/3 [00:00<00:00, 71.56it/s]
                                                                   

Dataset csv downloaded and prepared to C:/Users/astaj/.cache/huggingface/datasets/knkarthick___csv/knkarthick--dialogsum-c07c4cf4362c223c/0.0.0/eea64c71ca8b46dd3f537ed218fc9bf495d5707789152eb2764f5c78fa66d59d. Subsequent calls will reuse this data.


100%|██████████| 3/3 [00:00<00:00, 41.78it/s]


In [9]:
example_indices = [40, 200]  # rows of the test split to inspect
dash_line = '-'.join('' for x in range(100))  # horizontal separator line

# Print each selected dialogue next to its reference summary from the dataset
for i, index in enumerate(example_indices):
    print(dash_line)
    print("Example {}".format(i+1))
    print(dash_line)
    print('Input Dialogue:')
    print(dataset['test'][index]['dialogue'])
    print(dash_line)
    print('baseline summary:')
    print(dataset['test'][index]['summary'])
    print(dash_line)
    print()
    

---------------------------------------------------------------------------------------------------
Example 1
---------------------------------------------------------------------------------------------------
Input Dialogue:
#Person1#: What time is it, Tom?
#Person2#: Just a minute. It's ten to nine by my watch.
#Person1#: Is it? I had no idea it was so late. I must be off now.
#Person2#: What's the hurry?
#Person1#: I must catch the nine-thirty train.
#Person2#: You've plenty of time yet. The railway station is very close. It won't take more than twenty minutes to get there.
---------------------------------------------------------------------------------------------------
baseline summary:
#Person1# is in a hurry to catch a train. Tom tells #Person1# there is plenty of time.
---------------------------------------------------------------------------------------------------

---------------------------------------------------------------------------------------------------
Example 2
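The display loop can also be factored into a small helper so that any other dataset row prints the same way. This is a minimal sketch. `format_example` and its `width` parameter are hypothetical names. It builds the separator with `'-' * width`, which gives exactly `width` dashes; joining 100 empty strings with `'-'`, as the loop above does, produces 99.

```python
def format_example(number, dialogue, summary, width=100):
    """Return one dialogue/summary pair framed by dashed rules, mirroring the loop above."""
    rule = "-" * width  # exactly `width` dashes
    lines = [
        rule,
        f"Example {number}",
        rule,
        "Input Dialogue:",
        dialogue,
        rule,
        "Baseline summary:",
        summary,
        rule,
        "",  # trailing blank line between examples
    ]
    return "\n".join(lines)
```

In the notebook it would be called as `print(format_example(1, dataset['test'][40]['dialogue'], dataset['test'][40]['summary']))`.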


In [10]:
model_name = "google/flan-t5-base"
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

Downloading (…)lve/main/config.json: 100%|██████████| 1.40k/1.40k [00:00<?, ?B/s]
To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to see activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development
Downloading pytorch_model.bin: 100%|██████████| 990M/990M [01:43<00:00, 9.53MB/s] 
Downloading (…)neration_config.json: 100%|██████████| 147/147 [00:00<00:00, 36.9kB/s]
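The `pytorch_model.bin` download is about 990 MB. Assuming the weights are stored as 32-bit floats (4 bytes per parameter), that is roughly 990,000,000 / 4 ≈ 247.5 million parameters. Below is a sketch for checking that estimate against the loaded model using PyTorch's `Module.parameters()` and `Tensor.numel()`. The helper names are hypothetical, and the size-based estimate ignores any non-weight entries stored in the checkpoint.

```python
def count_parameters(model, trainable_only=False):
    """Sum the element counts of a PyTorch model's parameter tensors."""
    return sum(
        p.numel()
        for p in model.parameters()
        # keep every tensor, or only those with requires_grad=True when trainable_only is set
        if p.requires_grad or not trainable_only
    )


def approx_params_from_checkpoint(size_bytes, bytes_per_param=4):
    """Rough parameter count from a checkpoint size; 4 bytes per parameter assumes float32."""
    return size_bytes // bytes_per_param
```

Calling `count_parameters(model)` on the loaded FLAN-T5 model should give a number of the same order as the size-based estimate.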


In [11]:
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)

Downloading (…)okenizer_config.json: 100%|██████████| 2.54k/2.54k [00:00<00:00, 3.26MB/s]
Downloading spiece.model: 100%|██████████| 792k/792k [00:00<00:00, 10.9MB/s]
Downloading (…)/main/tokenizer.json: 100%|██████████| 2.42M/2.42M [00:01<00:00, 1.97MB/s]
Downloading (…)cial_tokens_map.json: 100%|██████████| 2.20k/2.20k [00:00<?, ?B/s]


In [12]:
sentence = "Hello, my dog is cute"

sentence_encoded = tokenizer(sentence, return_tensors="pt")

sentence_decoded = tokenizer.decode(sentence_encoded['input_ids'][0], skip_special_tokens=True)

print("Encoded sentence: {}".format(sentence_encoded))
print("Decoded sentence: {}".format(sentence_decoded))


Encoded sentence: {'input_ids': tensor([[8774,    6,   82, 1782,   19, 5295,    1]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1]])}
Decoded sentence: Hello, my dog is cute
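In the encoded output, the final id `1` is T5's end-of-sequence token `</s>`. The attention mask is all ones because a single sentence needs no padding. `decode(..., skip_special_tokens=True)` drops that end token, which is why the round trip prints the original sentence. When several sentences are batched, shorter ones are padded and the padding is masked out. Below is a pure-Python sketch of both steps, assuming T5's convention of `0` for `<pad>` and `1` for `</s>` (check `tokenizer.pad_token_id` and `tokenizer.eos_token_id`). `pad_batch` and `strip_special` are hypothetical helpers, not the tokenizer's internals, and the second sequence in the usage line is illustrative.

```python
def pad_batch(sequences, pad_id=0):
    """Right-pad token-id lists to equal length and build matching attention masks."""
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        input_ids.append(list(seq) + [pad_id] * n_pad)
        attention_mask.append([1] * len(seq) + [0] * n_pad)  # 1 = real token, 0 = padding
    return input_ids, attention_mask


def strip_special(ids, special_ids=(0, 1)):
    """Drop special-token ids (assumed T5 ids: 0 = <pad>, 1 = </s>) before decoding."""
    return [i for i in ids if i not in special_ids]


ids, mask = pad_batch([[8774, 6, 82, 1782, 19, 5295, 1], [8774, 1]])
# ids[1] == [8774, 1, 0, 0, 0, 0, 0] and mask[1] == [1, 1, 0, 0, 0, 0, 0]
```

The real tokenizer performs the padding itself when called as `tokenizer([...], padding=True, return_tensors="pt")`.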


### Without any Prompt Engineering
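Without any prompt engineering, the baseline is to pass the raw dialogue straight to FLAN-T5 and decode whatever it generates. Below is a minimal sketch that uses the `tokenizer` and `model` loaded above through the standard `generate` and `decode` calls. `summarize` and the 50-token cap are illustrative choices, not values taken from this notebook.

```python
def summarize(dialogue, tokenizer, model, max_new_tokens=50):
    """Zero-shot summary: feed the raw dialogue to a seq2seq model with no instruction prompt."""
    inputs = tokenizer(dialogue, return_tensors="pt")  # token ids plus attention mask
    output_ids = model.generate(inputs["input_ids"], max_new_tokens=max_new_tokens)
    # generate returns a batch of sequences; decode the first one without <pad>/</s>
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

For example, `summarize(dataset['test'][40]['dialogue'], tokenizer, model)` can be printed next to the baseline summary. The imported `GenerationConfig` can carry the same setting instead, e.g. `model.generate(inputs["input_ids"], generation_config=GenerationConfig(max_new_tokens=50))`.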
