<a href="https://colab.research.google.com/github/Harsh-Mathur-1503/genrative_ai/blob/main/gen_ai_1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
!pip install torch==1.13.1 torchdata==0.5.1

In [9]:
!pip install transformers datasets

In [15]:
from datasets import load_dataset
from transformers import AutoModelForSeq2SeqLM
from transformers import AutoTokenizer

In this use case, we will generate a summary of a dialogue with FLAN-T5, a pre-trained large language model available from Hugging Face.

Now we'll load some simple dialogues from the **DialogSum** dataset on Hugging Face, which contains 10,000+ dialogues with manually labelled summaries and topics.

In [34]:
huggingface_dataset_name = "knkarthick/dialogsum"
dataset = load_dataset(huggingface_dataset_name)




**Print a couple of dialogues with baseline summaries**

In [27]:
# Indices of the test-set examples to inspect
example_indices = [40, 200]
# 99-character separator line
dash_line = '-' * 99

for i, index in enumerate(example_indices):
    print(dash_line)
    print('EXAMPLE ', i + 1)
    print(dash_line)
    print('INPUT_DIALOGUE : ')
    print(dataset['test'][index]['dialogue'])
    print(dash_line)
    print('BASELINE_SUMMARY : ')
    print(dataset['test'][index]['summary'])
    print(dash_line)
    print()

---------------------------------------------------------------------------------------------------
EXAMPLE  1
---------------------------------------------------------------------------------------------------
INPUT_DIALOGUE : 
#Person1#: What time is it, Tom?
#Person2#: Just a minute. It's ten to nine by my watch.
#Person1#: Is it? I had no idea it was so late. I must be off now.
#Person2#: What's the hurry?
#Person1#: I must catch the nine-thirty train.
#Person2#: You've plenty of time yet. The railway station is very close. It won't take more than twenty minutes to get there.
---------------------------------------------------------------------------------------------------
BASELINE_SUMMARY : 
#Person1# is in a hurry to catch a train. Tom tells #Person1# there is plenty of time.
---------------------------------------------------------------------------------------------------

---------------------------------------------------------------------------------------------------
EXAMP

Load the FLAN-T5 model by creating an instance of the **AutoModelForSeq2SeqLM** class with the **.from_pretrained()** method.

In [28]:
model_name = "google/flan-t5-large"
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

To encode and decode text, we need to work with its tokenized form.
**Tokenisation** is the process of splitting text into smaller units that an LLM can process.

The **use_fast** parameter switches on the fast tokenizer. There is no need to go into its details at this stage, but the tokenizer parameters are described in the documentation.

In [37]:
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)

In [38]:
sentence = "What time is it Tom ?"
sentence_encoded = tokenizer(sentence,return_tensors="pt")

sentence_decoded = tokenizer.decode(sentence_encoded['input_ids'][0],skip_special_tokens=True)
print("ENCODED SENTENCE : ")
print(sentence_encoded["input_ids"][0])
print("\n DECODED SENTENCE : ")
print(sentence_decoded)

ENCODED SENTENCE : 
tensor([ 363,   97,   19,   34, 3059,    3,   58,    1])

 DECODED SENTENCE : 
What time is it Tom?
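
The encode/decode round trip above can be mimicked with a toy tokenizer. The sketch below is only an illustration: it splits on whitespace and uses a made-up vocabulary (`TOY_VOCAB` and the helper names are hypothetical), whereas FLAN-T5 uses a SentencePiece subword tokenizer. The one detail carried over from the real output is the trailing id `1`, which is the end-of-sequence token `</s>` that `skip_special_tokens=True` drops when decoding.

```python
# Toy word-level tokenizer: an illustration only, NOT the SentencePiece
# subword tokenizer that FLAN-T5 actually uses.
EOS_ID = 1  # T5-style end-of-sequence id, like the trailing 1 in the real output

# Hypothetical vocabulary; these ids are made up for this example.
TOY_VOCAB = {"</s>": EOS_ID, "What": 2, "time": 3, "is": 4, "it": 5, "Tom": 6, "?": 7}
ID_TO_TOKEN = {idx: tok for tok, idx in TOY_VOCAB.items()}


def toy_encode(text):
    """Split on whitespace, map each token to its id, and append EOS."""
    return [TOY_VOCAB[tok] for tok in text.split()] + [EOS_ID]


def toy_decode(ids, skip_special_tokens=True):
    """Map ids back to tokens, optionally dropping the EOS marker."""
    tokens = [
        ID_TO_TOKEN[i]
        for i in ids
        if not (skip_special_tokens and i == EOS_ID)
    ]
    return " ".join(tokens)


ids = toy_encode("What time is it Tom ?")
print(ids)
print(toy_decode(ids))
print(toy_decode(ids, skip_special_tokens=False))
```

Unlike the real tokenizer, this toy version cannot handle words outside its vocabulary and keeps the space before `?` when decoding.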


Now let's observe how well the LLM performs without any prompt engineering.

In [40]:
for i, index in enumerate(example_indices):
    dialogue = dataset['test'][index]['dialogue']
    summary = dataset['test'][index]['summary']

    inputs = tokenizer(dialogue, return_tensors="pt")

    outputs = tokenizer.decode(
        model.generate(inputs["input_ids"], max_new_tokens=50)[0],
        skip_special_tokens=True
    )

    print(dash_line)
    print(f'EXAMPLE {i + 1}')
    print(dash_line)
    print(f'INPUT_DIALOGUE:\n{dialogue}')
    print(dash_line)
    print(f'BASELINE_HUMAN_SUMMARY:\n{summary}')
    print(dash_line)
    print(f'MODEL_GENERATED_SUMMARY_WITHOUT_PROMPT_ENGINEERING:\n{outputs}\n')


---------------------------------------------------------------------------------------------------
EXAMPLE 1
---------------------------------------------------------------------------------------------------
INPUT_DIALOGUE:
#Person1#: What time is it, Tom?
#Person2#: Just a minute. It's ten to nine by my watch.
#Person1#: Is it? I had no idea it was so late. I must be off now.
#Person2#: What's the hurry?
#Person1#: I must catch the nine-thirty train.
#Person2#: You've plenty of time yet. The railway station is very close. It won't take more than twenty minutes to get there.
---------------------------------------------------------------------------------------------------
BASELINE_HUMAN_SUMMARY:
#Person1# is in a hurry to catch a train. Tom tells #Person1# there is plenty of time.
---------------------------------------------------------------------------------------------------
MODEL_GENERATED_SUMMARY_WITHOUT_PROMPT_ENGINEERING:
#Person1#: I'm afraid I'm late.

--------------------
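
For contrast with the bare-dialogue input used above, a zero-shot instruction prompt could be built as in the following sketch. The helper name `build_summary_prompt` and the exact instruction wording are assumptions made for illustration; they are not taken from the dataset or the model documentation.

```python
def build_summary_prompt(dialogue):
    """Wrap a dialogue in a zero-shot summarisation instruction.

    The instruction wording here is a hypothetical example.
    """
    return (
        "Summarize the following conversation.\n\n"
        f"{dialogue.strip()}\n\n"
        "Summary:"
    )


# Two turns from the first example dialogue shown above
example = (
    "#Person1#: What time is it, Tom?\n"
    "#Person2#: Just a minute. It's ten to nine by my watch."
)
prompt = build_summary_prompt(example)
print(prompt)
```

To try it, pass the prompt instead of the raw `dialogue` to `tokenizer(...)` in the loop above and compare the generated summary against the human baseline.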