In [None]:
# Prompt engineering is the practice of designing and structuring the prompts
# (input queries) given to AI models, particularly large language models (LLMs)
# such as ChatGPT, Gemini, etc.

# Prompts are the primary mechanism for guiding a model's behavior, defining the
# task, and setting the context for the interaction.

# The design of a prompt significantly affects output quality and relevance.

# Common techniques include zero-shot, few-shot, Chain of Thought, prompt
# chaining, RAG-based prompting, and ReAct.

In [None]:
# Zero-shot prompting asks the model to perform a task without providing any
# examples in the prompt. It relies on the model's pretrained knowledge
# to interpret the request and respond.

# Prompt - Explain the concept of climate change, its causes, and its effects.

In [None]:
# Few-shot prompting includes a small number of examples within the prompt to
# demonstrate the task to the model. This helps the model better understand the
# context and the expected output format.

# Prompt
# Topic - Universe
# Explanation - The universe is all of space and time, encompassing all existing
# matter and energy, including galaxies, stars, planets, and the fundamental
# forces that govern them.

# Topic - Gravity
# Explanation - Gravity is the force of attraction between all objects with mass.
# It is responsible for keeping you on the ground, holding planets in orbit around
# the sun, and shaping the universe.

# Now Explain Climate Change

# Use cases in marketing and sales:
# Email marketing - subject lines that avoid spam/junk classification
# Email content generation - word limits, tone (aggressive/apologetic) or
# sentiment (positive/negative)
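
The few-shot layout above can be assembled programmatically. The following is a minimal sketch, not a fixed recipe: `build_few_shot_prompt` is a hypothetical helper name, and the example explanations are shortened versions of the ones in the notes.

```python
# Hypothetical helper: assembles a few-shot prompt from (topic, explanation) pairs.
def build_few_shot_prompt(examples, new_topic):
    """Return a prompt that shows worked examples before the new task."""
    blocks = []
    for topic, explanation in examples:
        blocks.append(f"Topic - {topic}\nExplanation - {explanation}")
    # The final block leaves the explanation empty for the model to complete.
    blocks.append(f"Topic - {new_topic}\nExplanation -")
    return "\n\n".join(blocks)


examples = [
    ("Universe", "The universe is all of space and time, encompassing all existing matter and energy."),
    ("Gravity", "Gravity is the force of attraction between all objects with mass."),
]

few_shot_prompt = build_few_shot_prompt(examples, "Climate Change")
print(few_shot_prompt)
```

Ending the prompt with an empty "Explanation -" slot nudges the model to continue in the same format as the examples.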

In [None]:
# Chain of Thought prompting encourages the model to reason through a problem
# step by step, breaking it into smaller components to arrive at a logical
# conclusion.

# Suited to complex, multi-step analysis, such as the questions middle and top
# management executives need answered.

# Prompt
# The US dollar to rupee rate is 90.43 rupees as of today. Reasoning step by step:
# 1. Explain its effect on the stock markets of India.
# 2. Explain how the Indian government could tackle this fall.
# 3. Explain its impact on key sectors or industries.
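
One way to make the step-by-step structure explicit is to generate the prompt from a list of reasoning steps. This is an illustrative sketch only; `build_cot_prompt` is a hypothetical helper and the exact wording of the instructions is an assumption.

```python
# Hypothetical helper: wraps a question with numbered reasoning steps.
def build_cot_prompt(question, steps):
    """Return a prompt that asks the model to reason through each step in order."""
    lines = [question, "", "Reason step by step before giving a final answer:"]
    for number, step in enumerate(steps, start=1):
        lines.append(f"{number}. {step}")
    lines.append("Final answer: summarize the conclusion in two or three sentences.")
    return "\n".join(lines)


cot_prompt = build_cot_prompt(
    "The US dollar to rupee rate is 90.43 rupees as of today.",
    [
        "Explain its effect on the stock markets of India.",
        "Explain how the Indian government could tackle this fall.",
        "Explain its impact on key sectors or industries.",
    ],
)
print(cot_prompt)
```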

In [None]:
# Prompt chaining links multiple prompts together so that the output of
# one prompt serves as the input for the next.
# It is a structured methodology for tasks where each prompt depends on the
# previous response.

# Prompt
# What is the USD to INR rate today?
# Why is it falling?
# What should I do if I import goods from the US?
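
The chaining idea can be sketched without a real model. In the example below, `fake_llm` is a hypothetical stand-in that returns a canned reply, and the `{previous}` placeholder is an assumed convention for passing one step's output into the next prompt.

```python
def fake_llm(prompt):
    """Hypothetical stand-in for a real model call; returns a canned reply."""
    return f"[model reply to: {prompt}]"


def run_chain(templates, first_input):
    """Feed each step's output into the next template's {previous} slot."""
    previous = first_input
    history = []
    for template in templates:
        prompt = template.format(previous=previous)
        previous = fake_llm(prompt)
        history.append((prompt, previous))
    return history


templates = [
    "What is the USD to INR rate today? Context: {previous}",
    "Given this answer: {previous}\nWhy is the rupee falling?",
    "Given this explanation: {previous}\nWhat should I do if I import goods from the US?",
]

history = run_chain(templates, "No prior context.")
for prompt, reply in history:
    print(prompt)
    print(reply)
    print("---")
```

Swapping `fake_llm` for a real model call would leave the chaining logic unchanged.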

In [None]:
# Retrieval Augmented Generation (RAG) combines external information retrieval
# with generative AI to produce responses grounded in up-to-date data or
# domain-specific knowledge.

# Prompt - Using Bitcoin prices from Yahoo Finance, explain how Bitcoin might
# perform in the near future and forecast potential prices and flows.
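
The retrieve-then-generate flow can be illustrated with a toy retriever. This sketch assumes a tiny in-memory list of made-up sample documents and scores them by word overlap; real RAG systems typically use embedding search over an external store, and the function names here are hypothetical.

```python
import re

# Made-up sample documents standing in for an external knowledge source.
documents = [
    "Bitcoin is a decentralized digital currency introduced in 2009.",
    "The rupee is the official currency of India.",
    "Gravity holds planets in orbit around the sun.",
]


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query, docs, top_k=1):
    """Rank documents by how many words they share with the query."""
    query_words = tokenize(query)
    ranked = sorted(docs, key=lambda d: len(query_words & tokenize(d)), reverse=True)
    return ranked[:top_k]


def build_rag_prompt(query, docs):
    """Place the retrieved text in the prompt ahead of the question."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs))
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"


rag_prompt = build_rag_prompt("What is Bitcoin?", documents)
print(rag_prompt)
```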

In [None]:
# ReAct (Reasoning + Acting) interleaves reasoning steps with actions such as
# tool calls: the model reasons about what to do, acts, observes the result,
# and repeats until it can answer.
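
The thought/action/observation cycle can be sketched with a scripted stand-in for the model. Everything here is hypothetical: the `calculator` tool, the scripted steps, and the trace format are assumptions chosen to show the control flow, not the output of a real LLM.

```python
def calculator(expression):
    """Toy tool: evaluates a single 'a * b' multiplication."""
    left, right = expression.split("*")
    return str(float(left) * float(right))


tools = {"calculator": calculator}

# Hypothetical scripted model turns; a real LLM would produce these one at a time.
scripted_steps = [
    {"thought": "I need the rupee cost of 100 dollars.", "action": "calculator", "input": "100 * 90.43"},
    {"thought": "I have the number, so I can answer.", "action": "finish", "input": None},
]


def react_loop(steps, tools):
    """Alternate reasoning and tool use until the model decides to finish."""
    trace = []
    observation = None
    for step in steps:
        trace.append(f"Thought: {step['thought']}")
        if step["action"] == "finish":
            trace.append(f"Answer: {observation} rupees")
            break
        trace.append(f"Action: {step['action']}[{step['input']}]")
        observation = tools[step["action"]](step["input"])
        trace.append(f"Observation: {observation}")
    return trace


trace = react_loop(scripted_steps, tools)
print("\n".join(trace))
```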

In [None]:
# Information extraction and summarization are core applications of prompt
# engineering: key details are extracted from large documents and summarized
# as needed. Simple zero-shot prompting is often enough for these tasks.

In [None]:
# Large Language Models - Theory
# LLMs use deep learning (neural networks built on the transformer architecture)
# to perform next-token prediction at scale.
# They learn statistical relationships in language, which allows them to generate
# human-like, contextually relevant text.

# Neural Networks and Deep Learning - Neural networks stack layers of linear
# transformations followed by non-linear activation functions and are loosely
# inspired by neurons in the human brain. Multiple hidden layers, each with
# hundreds of neurons or more, process data and extract trends and patterns.
# Neural networks work only with numerical data, so prompts must be tokenized
# and encoded as numeric embeddings.

# Tokenization and Embeddings - Raw text is broken down into tokens (words or
# subword pieces). Each token is converted into a multidimensional numeric
# vector, or embedding, and these embeddings can be used to calculate
# similarity between tokens or words.
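
The text-to-IDs-to-vectors path can be sketched in plain Python. This is a toy illustration under loud assumptions: it splits on whitespace (GPT-2 actually uses byte-pair encoding), builds the vocabulary from the prompt itself, and fills the embedding table with random numbers rather than learned weights.

```python
import random

text = "what is the capital of japan"

# Toy whitespace tokenizer (an assumption; real LLM tokenizers use subword schemes).
tokens = text.split()

# Vocabulary: each distinct token gets an integer ID.
vocab = {tok: idx for idx, tok in enumerate(sorted(set(tokens)))}
token_ids = [vocab[tok] for tok in tokens]

# Embedding table: one small random vector per vocabulary entry (learned in a real model).
random.seed(0)
embedding_dim = 4
embedding_table = [[random.uniform(-1, 1) for _ in range(embedding_dim)] for _ in vocab]

# Embedding lookup: replace each token ID with its vector.
embeddings = [embedding_table[i] for i in token_ids]

print("Token IDs:", token_ids)
print("Embedding shape:", (len(embeddings), embedding_dim))
```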

# Transformer Architecture - LLMs are built on the transformer. The original
# transformer pairs two neural networks: an encoder, which builds a contextual
# representation of the input, and a decoder, which generates the output
# sequence token by token. Many LLMs, including GPT-2, use only the decoder stack.
# The self-attention mechanism lets each token draw context from the other
# tokens in the sequence, and because tokens are processed in parallel, training
# on massive datasets is practical.
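
Self-attention can be sketched in a few lines of plain Python. This is a simplified illustration, assuming queries, keys, and values are all the raw input vectors; a real transformer applies learned projection matrices, uses multiple heads, and (in GPT-style models) masks future tokens.

```python
import math


def softmax(values):
    """Turn raw scores into weights that sum to 1."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Scaled dot-product self-attention with queries = keys = values = inputs."""
    dim = len(vectors[0])
    outputs = []
    for query in vectors:
        # Score the query against every token, scaled by sqrt(dim).
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim) for key in vectors]
        weights = softmax(scores)
        # Each output is a weighted mix of all token vectors.
        mixed = [sum(w * value[i] for w, value in zip(weights, vectors)) for i in range(dim)]
        outputs.append(mixed)
    return outputs


token_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
contextual = self_attention(token_vectors)
for row in contextual:
    print([round(x, 3) for x in row])
```

Each output row blends information from every input token, which is how attention supplies context.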

# Next Token Prediction - The model predicts the next token from a probability
# distribution conditioned on the sequence so far. By repeating this process,
# LLMs generate contextually relevant text.
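
The repeated predict-and-append loop can be shown with a toy probability table. The table below is made up for illustration and conditions only on the last token; a real LLM conditions on the whole preceding sequence and often samples rather than always taking the most likely token.

```python
# Made-up next-token probabilities keyed by the current token.
next_token_probs = {
    "the": {"capital": 0.6, "city": 0.4},
    "capital": {"of": 0.9, "is": 0.1},
    "of": {"japan": 0.7, "india": 0.3},
    "japan": {"is": 0.8, "<end>": 0.2},
    "is": {"tokyo": 0.9, "<end>": 0.1},
    "tokyo": {"<end>": 1.0},
}


def generate(start, max_new_tokens=10):
    """Greedy decoding: repeatedly append the most probable next token."""
    tokens = [start]
    for _ in range(max_new_tokens):
        candidates = next_token_probs.get(tokens[-1])
        if not candidates:
            break
        best = max(candidates, key=candidates.get)
        if best == "<end>":
            break
        tokens.append(best)
    return tokens


generated = generate("the")
print(" ".join(generated))
```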

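# Different Types of LLMs
# API-based models - ChatGPT, Gemini, Claude (hosted by the provider)
# Local models - LLaMA, Mistral, Qwen (run locally in a CPU/GPU environment)
# Integration frameworks - LangChain and similar tools connect applications to
# either kind of model; they are frameworks rather than models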

# ***Summary of this Colab file***

This Colab file serves as a comprehensive introduction to large language models (LLMs) and prompt engineering. It begins by outlining various prompt engineering techniques like Zero-Shot, Few-Shot, Chain of Thought, Prompt Chaining, and Retrieval Augmented Generation (RAG), explaining their purpose in guiding AI models. We then delve into the theoretical underpinnings of LLMs, covering concepts such as deep learning, neural networks, transformer architecture, tokenization, embeddings, and next-token prediction, along with different types of LLMs. Practically, the notebook demonstrates how to load and use a pre-trained GPT-2 model and its tokenizer, illustrating the process of encoding a prompt into numerical inputs, extracting and understanding token embeddings, generating text, and decoding the model's output. Furthermore, it showcases the simplified usage of the transformers library's pipeline function for tasks like sentiment analysis, abstracting away complexities to allow for easy application of pre-trained models.

In [None]:
#!pip install transformers
from transformers import GPT2LMHeadModel, GPT2Tokenizer

In [None]:
model_name="gpt2"
model=GPT2LMHeadModel.from_pretrained(model_name)
tokenizer=GPT2Tokenizer.from_pretrained(model_name)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


In [None]:
prompt="What is the capital of Japan"

In [None]:
inputs=tokenizer.encode(prompt,return_tensors="pt")
inputs

tensor([[2061,  318,  262, 3139,  286, 2869]])

In the line inputs=tokenizer.encode(prompt,return_tensors="pt"), "pt" stands for PyTorch. It specifies that the tokenizer.encode function should return the encoded input as a PyTorch tensor. If you were using TensorFlow, you might use "tf" instead.

PyTorch is an open-source machine learning framework widely used for deep learning applications. In the context of the transformers library, it provides the underlying tensor operations and infrastructure for building and running models like GPT-2. When you see return_tensors="pt", it instructs the tokenizer to output data structures that are compatible with PyTorch.

In [None]:
import torch

# Create a simple PyTorch tensor
a_pytorch_tensor = torch.tensor([[1, 2, 3], [4, 5, 6]])

print("Example PyTorch Tensor:")
print(a_pytorch_tensor)
print("\nShape of the tensor:", a_pytorch_tensor.shape)
print("Data type of the tensor:", a_pytorch_tensor.dtype)

Example PyTorch Tensor:
tensor([[1, 2, 3],
        [4, 5, 6]])

Shape of the tensor: torch.Size([2, 3])
Data type of the tensor: torch.int64


In this context, embeddings are numerical representations of the words or tokens in your input prompt.

Let's break it down based on the code you've executed:

Tokenization (inputs=tokenizer.encode(prompt,return_tensors="pt")): Your prompt "What is the capital of Japan" was first broken down into tokens, and each token was converted into a numerical ID. The output tensor([[2061, 318, 262, 3139, 286, 2869]]) shows these token IDs. There are 6 tokens in your prompt.

Embedding Generation (with_embeddings = model.transformer.wte(inputs)): The model.transformer.wte (Word Token Embeddings) layer of the GPT-2 model takes these numerical token IDs and converts each one into a dense vector of floating-point numbers. This vector is the embedding for that token.

The Embedding Tensor (print(with_embeddings) and with_embeddings.shape): The output of with_embeddings is a tensor containing these vectors.

Its shape torch.Size([1, 6, 768]) tells us:

1: This is the batch size (one prompt was processed).
6: This corresponds to the 6 tokens in your input prompt.
768: This is the dimensionality of each token's embedding vector, so each of your 6 tokens is represented by a vector of 768 numbers.

Why are embeddings used?

Embeddings are crucial because they capture the semantic meaning of words and their relationships. Words with similar meanings or contexts tend to have embedding vectors that are 'closer' to each other in this 768-dimensional space. This lets the model process text mathematically, which is essential for tasks like next-token prediction, sentiment analysis, and machine translation.
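
Closeness between embedding vectors is commonly measured with cosine similarity. The sketch below uses made-up 3-dimensional vectors purely for illustration; they are not taken from GPT-2, whose embeddings have 768 dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: near 1 means similar direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up toy embeddings (hypothetical values, not real model weights).
toy_embeddings = {
    "tokyo": [0.9, 0.8, 0.1],
    "kyoto": [0.8, 0.9, 0.2],
    "banana": [0.1, 0.2, 0.9],
}

sim_cities = cosine_similarity(toy_embeddings["tokyo"], toy_embeddings["kyoto"])
sim_mixed = cosine_similarity(toy_embeddings["tokyo"], toy_embeddings["banana"])

print("tokyo vs kyoto:", round(sim_cities, 3))
print("tokyo vs banana:", round(sim_mixed, 3))
```

With these toy values, the two city vectors score much higher than the city and fruit pair, which is the behavior the paragraph above describes.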



In [None]:
with_embeddings = model.transformer.wte(inputs)
print(with_embeddings)

tensor([[[ 0.0090, -0.0207,  0.0244,  ...,  0.2356, -0.0294,  0.1107],
         [-0.0097,  0.0101,  0.0556,  ...,  0.1145, -0.0380, -0.0254],
         [-0.0393,  0.0050,  0.0421,  ..., -0.0477,  0.0670, -0.0471],
         [ 0.0573, -0.0070,  0.0914,  ...,  0.0372, -0.0265,  0.0640],
         [-0.0572,  0.0183,  0.0333,  ..., -0.0689, -0.0931, -0.0714],
         [ 0.0917,  0.0360,  0.0887,  ...,  0.0606, -0.0257,  0.1031]]],
       grad_fn=<EmbeddingBackward0>)


In [None]:
with_embeddings.shape

torch.Size([1, 6, 768])

In [None]:
# Pass the attention mask and pad token explicitly so generate() need not guess them.
output=model.generate(inputs,attention_mask=torch.ones_like(inputs),
                      pad_token_id=tokenizer.eos_token_id,max_length=20,num_return_sequences=1)
output

tensor([[ 2061,   318,   262,  3139,   286,  2869,    30,   198,   198,   464,
          3139,   286,  2869,   318, 11790,    13,   632,   318,   262,  3139]])

In [None]:
response=tokenizer.decode(output[0],skip_special_tokens=True)
print(response)

What is the capital of Japan?

The capital of Japan is Tokyo. It is the capital


In [None]:
from transformers import pipeline

In [None]:
sentiment_pipeline=pipeline("sentiment-analysis")

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision 714eb0f (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


Device set to use cuda:0


In [None]:
sentiment_pipeline("Though painfully aware of the ingrained societal hierarchy, it's their youthful zest for life, hope for a better tomorrow, and unbreakable bond that keep them going. Their persistence and optimism are tested time and again, but things reach a breaking point when a nationwide lockdown due to Covid-19 separates them from their families and the place they call home.")

[{'label': 'POSITIVE', 'score': 0.9935884475708008}]

The pipeline function from the transformers library is a high-level API that makes pre-trained models easy to use for common tasks: it handles tokenization, model loading, and output processing for you.
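
To give a feel for the output-processing step that the pipeline hides, here is a rough sketch of turning a classifier's raw scores (logits) into a label and a confidence score. The logits are made-up numbers, `postprocess` is a hypothetical helper, and the NEGATIVE/POSITIVE label order is an assumption; the real pipeline reads the label mapping from the model's configuration.

```python
import math


def postprocess(logits, labels=("NEGATIVE", "POSITIVE")):
    """Convert raw scores to probabilities with softmax and pick the top label."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": labels[best], "score": probs[best]}


# Made-up logits for a two-class sentiment model.
result = postprocess([-2.4, 2.7])
print(result)
```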