<a href="https://colab.research.google.com/github/LearnByDoing2024/Youtube/blob/main/20241029_hugg_model_test/huggingface_model_tests.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
!pip install transformers==4.17.0  # Pin the version this notebook targets (replaces the preinstalled release)

Collecting transformers==4.17.0
  Using cached transformers-4.17.0-py3-none-any.whl.metadata (67 kB)
Collecting sacremoses (from transformers==4.17.0)
  Downloading sacremoses-0.1.1-py3-none-any.whl.metadata (8.3 kB)
Downloading transformers-4.17.0-py3-none-any.whl (3.8 MB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m3.8/3.8 MB[0m [31m15.6 MB/s[0m eta [36m0:00:00[0m
[?25hDownloading sacremoses-0.1.1-py3-none-any.whl (897 kB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m897.5/897.5 kB[0m [31m15.3 MB/s[0m eta [36m0:00:00[0m
[?25hInstalling collected packages: sacremoses, transformers
  Attempting uninstall: transformers
    Found existing installation: transformers 4.44.2
    Uninstalling transformers-4.44.2:
      Successfully uninstalled transformers-4.44.2
Successfully installed sacremoses-0.1.1 transformers-4.17.0


In [None]:
import torch
from transformers import (
    T5Tokenizer, T5ForConditionalGeneration,
    DistilBertTokenizer, DistilBertForSequenceClassification,
    BertTokenizer, BertForQuestionAnswering,
    BartTokenizer, BartForConditionalGeneration,
    MarianMTModel, MarianTokenizer,
    AutoTokenizer, AutoModelForSeq2SeqLM  # generic Auto* loaders (AutoModelForSeq2SeqLM is not used below)
)

In [None]:
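# 1. Text Generator (T5) - Generate the text that starts the pipeline
# Note: t5-small was trained on task-prefixed inputs (e.g. "summarize: ...",
# "translate English to German: ..."). A free-form instruction like this one is
# not one of its tasks, and here the model simply echoes the prompt (see the output below).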
t5_tokenizer = T5Tokenizer.from_pretrained('t5-small')
t5_model = T5ForConditionalGeneration.from_pretrained('t5-small')
input_text = "Generate a short story about AI."
input_ids = t5_tokenizer.encode(input_text, return_tensors='pt')
gen_output = t5_model.generate(input_ids, max_length=200)
generated_text = t5_tokenizer.decode(gen_output[0], skip_special_tokens=True)
print("Generated Text:")
print(generated_text)


  state_dict = torch.load(resolved_archive_file, map_location="cpu")


Generated Text:
Generate a short story about AI.
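t5-small was pretrained on a mixture of tasks in which every input begins with a task prefix, and, to the best of our knowledge, the checkpoint's config (`task_specific_params`) lists prefixes for summarization and for English-to-German, French, and Romanian translation. A free-form instruction such as "Generate a short story about AI." matches none of them, which is consistent with the echoed output above. The sketch below builds a correctly prefixed input; the prefix table is our transcription of that config, and `build_t5_input` is a hypothetical helper, not a library function.

```python
# Task prefixes as listed (to our knowledge) in the t5-small config's
# task_specific_params; this table is a transcription, not a library API.
T5_PREFIXES = {
    "summarization": "summarize: ",
    "translation_en_to_de": "translate English to German: ",
    "translation_en_to_fr": "translate English to French: ",
    "translation_en_to_ro": "translate English to Romanian: ",
}


def build_t5_input(task, text):
    """Prepend the prefix T5 was trained with for `task` (hypothetical helper)."""
    if task not in T5_PREFIXES:
        raise ValueError(f"unknown T5 task {task!r}; expected one of {sorted(T5_PREFIXES)}")
    return T5_PREFIXES[task] + text


print(build_t5_input("translation_en_to_fr", "The weather is nice today."))
# translate English to French: The weather is nice today.
```

The result can be passed to `t5_tokenizer.encode(..., return_tensors='pt')` exactly as in the cell above; only the input string changes.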


In [None]:
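# 2. Sentiment Analyzer (DistilBERT) - Classify the sentiment of the generated text
# Note: 'distilbert-base-uncased' has no trained classification head (see the warning
# below), so this label is effectively random. A fine-tuned checkpoint such as
# 'distilbert-base-uncased-finetuned-sst-2-english' gives meaningful predictions.
distilbert_tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased')
distilbert_model = DistilBertForSequenceClassification.from_pretrained('distilbert-base-uncased')
sentiment_input_ids = distilbert_tokenizer.encode(generated_text, return_tensors='pt')
with torch.no_grad():  # inference only; no gradients needed
    sentiment_output = distilbert_model(sentiment_input_ids)
sentiment = torch.argmax(sentiment_output.logits, dim=-1).item()  # class index: 0 or 1
print("\nSentiment Analysis:")
print("Sentiment:", "Positive" if sentiment == 1 else "Negative")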


  state_dict = torch.load(resolved_archive_file, map_location="cpu")
Some weights of the model checkpoint at distilbert-base-uncased were not used when initializing DistilBertForSequenceClassification: ['vocab_layer_norm.bias', 'vocab_projector.weight', 'vocab_projector.bias', 'vocab_transform.weight', 'vocab_transform.bias', 'vocab_layer_norm.weight']
- This IS expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly 


Sentiment Analysis:
Sentiment: Negative
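The warning above shows that `distilbert-base-uncased` loads with a freshly initialized classification head, so the Positive/Negative label it prints is arbitrary. With an SST-2 fine-tuned checkpoint such as `distilbert-base-uncased-finetuned-sst-2-english`, the logits become meaningful, and it is more informative to report the label together with its probability. This is a minimal sketch on hard-coded logits; the label order `{0: "NEGATIVE", 1: "POSITIVE"}` is an assumption matching that checkpoint, and with a real model `model.config.id2label` should be used instead of a hard-coded dict.

```python
import torch

# Assumed label order (matches the SST-2 fine-tuned DistilBERT checkpoint);
# with a loaded model, prefer model.config.id2label.
ID2LABEL = {0: "NEGATIVE", 1: "POSITIVE"}


def logits_to_label(logits, id2label=ID2LABEL):
    """Map a (1, num_labels) logits tensor to (label, probability)."""
    probs = torch.softmax(logits, dim=-1)[0]  # drop the batch dimension
    idx = int(torch.argmax(probs))
    return id2label[idx], float(probs[idx])


label, score = logits_to_label(torch.tensor([[-1.2, 2.3]]))
print(label, f"{score:.3f}")  # POSITIVE 0.971
```

In the cell above this would be called as `logits_to_label(sentiment_output.logits)`.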


In [None]:
# 3. Question Answering (BERT) - Ask a question about the generated text, chosen by sentiment
# Note: 'bert-base-uncased' has no trained QA head (see the warning below), so the extracted
# span is effectively random. A SQuAD fine-tuned checkpoint (for example
# 'bert-large-uncased-whole-word-masking-finetuned-squad') is needed for real answers.
bert_tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
bert_model = BertForQuestionAnswering.from_pretrained('bert-base-uncased')
if sentiment:
    question = "What is the main character's goal in this positive story?"
else:
    question = "What causes the conflict in this negative story?"

# Encode question and context as a sentence pair: [CLS] question [SEP] context [SEP].
# token_type_ids tell BERT which tokens belong to the question and which to the context.
encoding = bert_tokenizer.encode_plus(
    question,
    generated_text,
    max_length=512,
    truncation=True,
    return_attention_mask=True,
    return_tensors='pt'
)

qa_input_ids = encoding['input_ids']

with torch.no_grad():
    qa_output = bert_model(**encoding)  # passes input_ids, token_type_ids and attention_mask

# Extract the answer span; clamp the end so it never precedes the start
answer_start = torch.argmax(qa_output.start_logits).item()
answer_end = max(torch.argmax(qa_output.end_logits).item(), answer_start) + 1
answer_ids = qa_input_ids[0, answer_start:answer_end]
qa_answer = bert_tokenizer.convert_tokens_to_string(bert_tokenizer.convert_ids_to_tokens(answer_ids.tolist()))

print("\nQuestion Answering:")
print("Question:", question)
print("Answer:", qa_answer)

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForQuestionAnswering: ['cls.predictions.decoder.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight']
- This IS expected if you are initializing BertForQuestionAnswering from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForQuestionAnswering from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForQuestionAnswering were not initialized from the model checkpoint at bert-base-uncased a


Question Answering:
Question: What causes the conflict in this negative story?
Answer: what
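Taking `argmax` of the start and end logits independently can return an end index before the start index (an empty span) or an implausibly long span. A common alternative is to score every pair as `start_logits[s] + end_logits[e]` and keep the best pair with `s <= e < s + max_answer_len`. This is a minimal sketch of that joint search on plain tensors; `best_span` and the default `max_answer_len=30` are illustrative choices, not part of the original notebook. It only fixes span decoding: with the untrained head of `bert-base-uncased`, the answer is still meaningless.

```python
import torch


def best_span(start_logits, end_logits, max_answer_len=30):
    """Return (start, end), end inclusive, maximizing start + end logits
    subject to start <= end < start + max_answer_len.

    Both inputs have shape (1, seq_len), like BertForQuestionAnswering outputs.
    """
    start = start_logits[0]
    end = end_logits[0]
    n = start.size(0)
    scores = start[:, None] + end[None, :]  # scores[s, e] = start[s] + end[e]
    offset = torch.arange(n)[None, :] - torch.arange(n)[:, None]  # offset[s, e] = e - s
    valid = (offset >= 0) & (offset < max_answer_len)
    scores = scores.masked_fill(~valid, float("-inf"))
    flat = int(torch.argmax(scores))  # index into the flattened (n, n) grid
    return flat // n, flat % n


# Independent argmax would pick start=1, end=0 here; the joint search picks (1, 2).
print(best_span(torch.tensor([[0.1, 5.0, 0.2, 0.0]]), torch.tensor([[4.0, 0.1, 3.0, 0.0]])))
```

With the cell's outputs, `s, e = best_span(qa_output.start_logits, qa_output.end_logits)` followed by `qa_input_ids[0, s:e + 1]` gives the answer tokens. A fuller version would also mask out question and special tokens so the span always falls inside the context.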


In [None]:
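# 4. Text Summarizer (BART) - Summarize the QA answer
# Note: facebook/bart-large-cnn was fine-tuned on CNN/DailyMail news articles; given a
# one-word input it tends to produce news-site boilerplate, as in the output below.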
bart_tokenizer = BartTokenizer.from_pretrained('facebook/bart-large-cnn')
bart_model = BartForConditionalGeneration.from_pretrained('facebook/bart-large-cnn')
summarize_input_ids = bart_tokenizer.encode(qa_answer, return_tensors='pt')
summarize_output = bart_model.generate(summarize_input_ids, max_length=50)
summary = bart_tokenizer.decode(summarize_output[0], skip_special_tokens=True)
print("\nSummary:")
print(summary)



Summary:
what.what. What.what will you do for a living? Share your story with CNN iReport.com. Share your photos and videos of your life with CNN.com's iReport gallery. Send photos of your
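The one-word input `what` gave bart-large-cnn almost nothing to summarize, and the model filled its 50-token budget with CNN-style boilerplate. One hedged mitigation is to skip summarization for very short inputs and pass them through unchanged. In the sketch below, the helper name `summarize_or_passthrough` and the `min_words=20` threshold are illustrative assumptions, not values from the original notebook; the summarizer is passed in as a plain callable so the logic can be checked without loading BART.

```python
def summarize_or_passthrough(text, summarize_fn, min_words=20):
    """Call summarize_fn(text) only if text has at least min_words words.

    min_words is an illustrative threshold, not a tuned value.
    """
    if len(text.split()) < min_words:
        return text  # too short: a summarizer would mostly invent content
    return summarize_fn(text)


# A stand-in summarizer; in the notebook this would wrap bart_model.generate(...).
print(summarize_or_passthrough("what", lambda t: "SUMMARY"))  # what
```

In the notebook, `summarize_fn` could be a small function that runs the encode, `bart_model.generate`, and decode steps from the cell above.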


In [None]:
# 5. Language Translator (MarianMT) - Translate summary to French
marian_model_name = "Helsinki-NLP/opus-mt-en-fr"
marian_model = MarianMTModel.from_pretrained(marian_model_name)
marian_tokenizer = MarianTokenizer.from_pretrained(marian_model_name)
translate_input_ids = marian_tokenizer.encode(summary, return_tensors='pt')
translate_output = marian_model.generate(translate_input_ids)
translated_summary = marian_tokenizer.decode(translate_output[0], skip_special_tokens=True)
print("\nTranslated Summary (French):")
print(translated_summary)



Translated Summary (French):
Qu'est-ce que vous allez faire pour vivre? Partagez votre histoire avec CNN iReport.com. Partagez vos photos et vidéos de votre vie avec la galerie iReport de CNN.com. Envoyez des photos de votre
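The English summary appears to have been cut off by `max_length=50` in the BART cell ("Send photos of your"), and the fragment carries straight into the translation ("Envoyez des photos de votre"). Besides raising `max_length`, one option is to drop a trailing incomplete sentence before translating. This is a minimal sketch under a simple assumption: a sentence ends with `.`, `!` or `?` followed by whitespace or the end of the text, so that a period inside a token such as `CNN.com` is not treated as a sentence end. `trim_incomplete_sentence` is a hypothetical helper.

```python
import re

# Assumed sentence terminator: . ! or ? followed by whitespace or end of text.
_SENTENCE_END = re.compile(r"[.!?](?=\s|$)")


def trim_incomplete_sentence(text):
    """Drop a trailing fragment after the last sentence terminator.

    If the text contains no terminator at all, it is returned unchanged (stripped).
    """
    text = text.strip()
    matches = list(_SENTENCE_END.finditer(text))
    if not matches:
        return text
    return text[: matches[-1].end()]


print(trim_incomplete_sentence("Share your story with CNN.com. Send photos of your"))
# Share your story with CNN.com.
```

Applied to `summary` before the MarianMT cell, this would translate only the complete sentences.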


In [None]:
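# 6. Chatbot (DialoGPT) - Respond to the translated summary
from transformers import AutoTokenizer, AutoModelForCausalLM  # DialoGPT is a causal (GPT-2 style) LM

dialogpt_tokenizer = AutoTokenizer.from_pretrained('microsoft/DialoGPT-small')
dialogpt_model = AutoModelForCausalLM.from_pretrained('microsoft/DialoGPT-small')
# DialoGPT expects each conversational turn to end with the EOS token
chat_input_ids = dialogpt_tokenizer.encode(translated_summary + dialogpt_tokenizer.eos_token, return_tensors='pt')
chat_output = dialogpt_model.generate(
    chat_input_ids,
    max_length=chat_input_ids.shape[-1] + 100,  # max_length counts prompt tokens; allow up to 100 new ones
    pad_token_id=dialogpt_tokenizer.eos_token_id,  # GPT-2 has no pad token; reuse EOS
)
# generate() returns prompt + reply; decode only the newly generated tokens
chat_response = dialogpt_tokenizer.decode(chat_output[0, chat_input_ids.shape[-1]:], skip_special_tokens=True)
print("\nChatbot Response:")
print(chat_response)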

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.



Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.



Chatbot Response:
Qu'est-ce que vous allez faire pour vivre? Partagez votre histoire avec CNN iReport.com. Partagez vos photos et vidéos de votre vie avec la galerie iReport de CNN.com. Envoyez des photos de votre.
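DialoGPT's model card builds the prompt by appending the EOS token to every turn, and `generate()` returns the prompt followed by the reply. Decoding the full output therefore prints the prompt along with the reply, which is consistent with the response above repeating the French input and adding only a final period. The sketch below shows both steps on plain token-id lists. The helper names are hypothetical and the token ids in the example are made up; only the EOS id 50256 comes from the generation log above.

```python
import torch

EOS_ID = 50256  # GPT-2/DialoGPT end-of-sequence id, as reported in the log above


def build_dialog_input(turn_ids):
    """Concatenate per-turn token-id lists, ending each turn with EOS."""
    flat = []
    for ids in turn_ids:
        flat.extend(list(ids) + [EOS_ID])
    return torch.tensor([flat])  # shape (1, total_len), like encode(..., return_tensors='pt')


def new_tokens(generated, prompt_len):
    """generate() returns prompt + continuation; keep only the continuation."""
    return generated[:, prompt_len:]


prompt = build_dialog_input([[11, 12], [13]])  # two made-up turns
fake_output = torch.cat([prompt, torch.tensor([[21, 22, EOS_ID]])], dim=1)
print(new_tokens(fake_output, prompt.shape[-1]).tolist())  # [[21, 22, 50256]]
```

For a multi-turn chat, the bot's reply (including its EOS) would be appended as another turn before encoding the next user message.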
