In [2]:
# INSTALL REQUIRED PACKAGES
!pip -q install transformers torch

# IMPORT LIBRARIES
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    AutoModelForCausalLM,
    pipeline
)
import torch

# -----------------------------
# BERT FOR TEXT CLASSIFICATION
# -----------------------------

# Load a BERT checkpoint already fine-tuned on SST-2 for sentiment analysis.
# The tokenizer and the model come from the same repository so they stay in sync.
model_name_bert = "textattack/bert-base-uncased-SST-2"
tokenizer_bert = AutoTokenizer.from_pretrained(model_name_bert)
model_bert = AutoModelForSequenceClassification.from_pretrained(model_name_bert)

# Create a classification pipeline
classifier = pipeline("sentiment-analysis", model=model_bert, tokenizer=tokenizer_bert)

# Run inference
result_bert = classifier("I love using Google Colab for deep learning projects.")
print("BERT Classification Result:")
print(result_bert)

# -----------------------------
# GPT-2 FOR TEXT GENERATION
# -----------------------------

# Load GPT-2 model and tokenizer
model_name_gpt = "gpt2"
tokenizer_gpt = AutoTokenizer.from_pretrained(model_name_gpt)
model_gpt = AutoModelForCausalLM.from_pretrained(model_name_gpt)

# Create a text generation pipeline
generator = pipeline("text-generation", model=model_gpt, tokenizer=tokenizer_gpt)

# Generate text
prompt_gpt = "Artificial intelligence is changing the way we"
result_gpt = generator(prompt_gpt, max_length=50, num_return_sequences=1)

# Access the generated text safely
print("\nGPT-2 Generation Result:")
if isinstance(result_gpt, list) and len(result_gpt) > 0:
    print(result_gpt[0].get('generated_text', 'No generated text found'))

# -----------------------------
# RLHF DISCUSSION (Conceptual)
# -----------------------------

print("""
\n
=== Conceptual Overview: Reinforcement Learning from Human Feedback (RLHF) ===

While not demonstrated here due to computational complexity, RLHF involves:

1. Supervised Fine-Tuning (SFT): Training a base model on human-written responses.
2. Reward Model Training: Building a model that scores outputs based on human preferences.
3. Policy Optimization: Updating the language model using reinforcement learning to maximize reward.

This technique has been used by OpenAI to align models like ChatGPT with user expectations in conversational settings.
""")

Device set to use cuda:0


BERT Classification Result:
[{'label': 'LABEL_1', 'score': 0.9978476762771606}]
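
The label `LABEL_1` in the result above is a generic placeholder: this checkpoint does not ship human-readable class names. A minimal sketch of a post-processing step follows, assuming the usual SST-2 convention (0 = negative, 1 = positive). The names `SST2_LABELS` and `readable` are hypothetical helpers, not part of transformers, and the mapping should be checked against the model card before relying on it.

```python
# Assumed mapping, following the SST-2 convention (0 = negative, 1 = positive).
SST2_LABELS = {"LABEL_0": "NEGATIVE", "LABEL_1": "POSITIVE"}


def readable(results):
    """Return a copy of the pipeline output with placeholder labels renamed."""
    return [
        {"label": SST2_LABELS.get(item["label"], item["label"]), "score": item["score"]}
        for item in results
    ]


# The classifier output printed above, copied here so the sketch runs on its own.
result_bert = [{"label": "LABEL_1", "score": 0.9978476762771606}]
print(readable(result_bert))
```

Unknown labels pass through unchanged, so the helper is harmless if the checkpoint is later swapped for one that already reports readable names.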


Device set to use cuda:0
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.



GPT-2 Generation Result:
Artificial intelligence is changing the way we think about a very large number of things. It's making it a lot more difficult to make good decisions.

As technology goes along that isn't going to be a new phenomenon. It will continue to
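
The two warnings logged before this result have simple causes: `max_length=50` was passed without an explicit truncation setting, and GPT-2 has no pad token, so the end-of-sequence id 50256 is substituted. Note also that `max_length` counts the prompt tokens as well as the generated ones, which is why the text stops mid-sentence. One possible set of arguments that should avoid both messages is sketched below. `generation_kwargs` is a hypothetical name, the value 40 is arbitrary, and the exact behaviour may differ between transformers versions.

```python
# Hypothetical keyword arguments for the `generator` pipeline defined in the cell above.
generation_kwargs = {
    "max_new_tokens": 40,       # budget for generated tokens only; max_length also counts the prompt
    "num_return_sequences": 1,
    "truncation": True,         # make the tokenizer's truncation behaviour explicit
    "pad_token_id": 50256,      # GPT-2's eos_token_id, as reported in the log above
}

# Assumed usage, left commented out because it needs the downloaded model:
# result_gpt = generator(prompt_gpt, **generation_kwargs)
print(generation_kwargs)
```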



=== Conceptual Overview: Reinforcement Learning from Human Feedback (RLHF) ===

While not demonstrated here due to computational complexity, RLHF involves:

1. Supervised Fine-Tuning (SFT): Training a base model on human-written responses.
2. Reward Model Training: Building a model that scores outputs based on human preferences.
3. Policy Optimization: Updating the language model using reinforcement learning to maximize reward.

This technique has been used by OpenAI to align models like ChatGPT with user expectations in conversational settings.
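
Step 2 of the overview, reward model training, can be illustrated without any heavy computation. The sketch below assumes one common formulation, a pairwise (Bradley-Terry style) preference loss: for each pair of responses ranked by a human, the loss is -log(sigmoid(r_chosen - r_rejected)), where r is the scalar score the reward model assigns. The function name `pairwise_preference_loss` is hypothetical and the rewards are made-up numbers, not outputs of a trained model.

```python
import math


def pairwise_preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Compute -log(sigmoid(reward_chosen - reward_rejected)) in a numerically stable way."""
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(m)) equals log(1 + exp(-m)); branch so exp() never sees a large positive input.
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Illustrative scores only: the loss is small when the preferred response scores
# higher, and large when the reward model ranks the pair the wrong way round.
print(pairwise_preference_loss(2.0, 0.0))
print(pairwise_preference_loss(0.0, 2.0))
```

Minimizing this loss over many human-ranked pairs pushes the reward model to score preferred responses higher, and the resulting scores are what the policy optimization in step 3 tries to maximize.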

