# NLP (Natural Language Processing) vs. LLMs (Large Language Models)

**NLP** is a broad field encompassing techniques for enabling computers to understand, interpret, and generate human language.
It covers a wide range of tasks, including:
- Text classification
- Sentiment analysis
- Machine translation
- Named entity recognition
- Question answering

**LLMs** are a specific type of NLP model trained on massive amounts of text data.
They can generate human-like text, translate between languages, write different kinds of creative content, and answer questions.
LLMs are typically built on deep learning architectures, most commonly the transformer, and perform strongly across a wide range of NLP tasks.

**Key Differences:**
- **Scope:** NLP is a broad field, while LLMs are a specific type of model within NLP.
- **Data:** LLMs are trained on far larger datasets than traditional NLP models, typically hundreds of billions to trillions of tokens.
- **Capabilities:** LLMs are much better than traditional NLP models at generating fluent text, tracking context across long passages, and holding open-ended conversations.
- **Applications:** While both NLP and LLMs have numerous applications, LLMs are particularly well-suited for tasks requiring advanced language understanding and generation, such as chatbots, content creation, and code generation.

**In essence, LLMs are a powerful advancement within the field of NLP.**
They combine deep learning with very large models and datasets to reach a level of language understanding and generation that earlier NLP approaches did not.

In [1]:
# First, ensure GPU is available
!nvidia-smi

Tue Oct 22 04:38:32 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|   0  Tesla T4                       Off | 00000000:00:04.0 Off |                    0 |
| N/A   69C    P8              11W /  70W |      0MiB / 15360MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                    

In [2]:
# Install required packages
!pip install torch transformers datasets evaluate accelerate bitsandbytes sentencepiece
!pip install -q gradio

Collecting datasets
  Downloading datasets-3.0.1-py3-none-any.whl.metadata (20 kB)
Collecting evaluate
  Downloading evaluate-0.4.3-py3-none-any.whl.metadata (9.2 kB)
Collecting bitsandbytes
  Downloading bitsandbytes-0.44.1-py3-none-manylinux_2_24_x86_64.whl.metadata (3.5 kB)
Collecting dill<0.3.9,>=0.3.0 (from datasets)
  Downloading dill-0.3.8-py3-none-any.whl.metadata (10 kB)
Collecting xxhash (from datasets)
  Downloading xxhash-3.5.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (12 kB)
Collecting multiprocess (from datasets)
  Downloading multiprocess-0.70.17-py310-none-any.whl.metadata (7.2 kB)
INFO: pip is looking at multiple versions of multiprocess to determine which version is compatible with other requirements. This could take a while.
  Downloading multiprocess-0.70.16-py310-none-any.whl.metadata (7.2 kB)
Downloading datasets-3.0.1-py3-none-any.whl (471 kB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m471.6/471.6 kB[0m [31m28.6 MB/

In [3]:
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    pipeline,
    AutoModelForSequenceClassification,
    BitsAndBytesConfig
)
import gradio as gr
from datasets import load_dataset
import evaluate
import numpy as np

In [4]:
print("GPU Available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU Name:", torch.cuda.get_device_name(0))

GPU Available: True
GPU Name: Tesla T4


In [5]:
# 1. Basic LLM Inference using smaller models
print("\n1. Basic LLM Inference")

def initialize_model(model_name="facebook/opt-350m"):
    """Initialize model and tokenizer"""
    print(f"Loading {model_name}...")
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        device_map="auto",
        torch_dtype=torch.float16
    )
    return model, tokenizer

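def generate_text(prompt, model, tokenizer, max_length=100):
    """Generate text using the model.

    Note: max_length counts the prompt tokens too, and the returned
    string includes the echoed prompt.
    """
    # Send inputs to whichever device device_map="auto" placed the model on
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(
        **inputs,
        max_length=max_length,  # total length: prompt + generated tokens
        num_return_sequences=1,
        temperature=0.7,
        do_sample=True
    )
    return tokenizer.decode(outputs[0], skip_special_tokens=True)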


1. Basic LLM Inference
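With `do_sample=True`, `generate_text` samples each next token instead of always taking the most likely one, and `temperature=0.7` rescales the model's logits before sampling. The sketch below reproduces only that scaling step on made-up logits in plain Python; it is an illustration, not the transformers implementation, which may also apply other filters (for example top-k) by default. `softmax_with_temperature` and `sample_token` are hypothetical helpers.

```python
import math
import random


def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw logits into probabilities; T < 1 sharpens, T > 1 flattens."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, temperature=1.0, rng=random):
    """Draw one token index from the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Hypothetical logits for a 3-token vocabulary
logits = [2.0, 1.0, 0.1]
for t in (0.7, 1.0, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(f"T={t}: {[round(p, 3) for p in probs]}")
print("sampled index:", sample_token(logits, temperature=0.7))
```

Lower temperatures concentrate probability on the top token (more predictable, often more repetitive text); higher ones spread it out (more varied, more error-prone text).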


In [6]:
# Initialize small model for basic tasks
model, tokenizer = initialize_model()

Loading facebook/opt-350m...


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


In [7]:
# Test basic generation
prompt = "Explain what a large language model is:"
print("\nPrompt:", prompt)
print("Generated Response:", generate_text(prompt, model, tokenizer))


Prompt: Explain what a large language model is:
Generated Response: Explain what a large language model is:

A large language model is an unstructured, generic model of a language. It can be useful for a developer to test different types of types and to test their performance.

Example:

The following code is a large language model:

$./configure --no-types

$./configure --no-types

$./configure --no-types

The following code uses a language


In [1]:

!huggingface-cli login



    A token is already saved on your machine. Run `huggingface-cli whoami` to get more information or `huggingface-cli logout` if you want to log out.
    Setting a new token will erase the existing one.
    To log in, `huggingface_hub` requires a token generated from https://huggingface.co/settings/tokens .
Enter your token (input will not be visible): 
Add token as git credential? (Y/n) Y
Token is valid (permission: read)

In [2]:
# Use a pipeline as a high-level helper
from transformers import pipeline

messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe = pipeline("text-generation", model="meta-llama/Llama-2-7b-chat-hf", device=0)
pipe(messages)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


OutOfMemoryError: CUDA out of memory. Tried to allocate 172.00 MiB. GPU 0 has a total capacity of 14.75 GiB of which 1.06 MiB is free. Process 448642 has 14.74 GiB memory in use. Of the allocated memory 14.65 GiB is allocated by PyTorch, and 1.43 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
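The out-of-memory error above is consistent with simple arithmetic: when no `torch_dtype` is passed, the weights are loaded in float32 (4 bytes per parameter). The sketch below estimates the memory for the weights alone at several precisions, assuming a round 7 billion parameters; activations, the KV cache, and the CUDA context need extra room on top, so "fits" here is optimistic. `weight_memory_gib` is a hypothetical helper.

```python
def weight_memory_gib(num_params, bits_per_param):
    """Approximate memory for model weights only (no activations or KV cache)."""
    return num_params * bits_per_param / 8 / 2**30


NUM_PARAMS = 7e9          # assumption: round figure for a "7B" model
T4_CAPACITY_GIB = 14.75   # total capacity reported in the OOM message

for label, bits in [("fp32", 32), ("fp16", 16), ("int8", 8), ("4-bit", 4)]:
    need = weight_memory_gib(NUM_PARAMS, bits)
    verdict = "fits" if need < T4_CAPACITY_GIB else "does not fit"
    print(f"{label:>5}: {need:5.1f} GiB -> {verdict} (weights only)")
```

Under these assumptions float16 weights (about 13 GiB) leave very little headroom on a T4, while 4-bit weights (about 3.3 GiB) leave plenty, which is why quantization with `BitsAndBytesConfig` is the usual route for 7B models on this GPU.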

In [3]:
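# Load model directly. In the default float32 the 7B weights do not fit on a T4,
# so quantize them to 4-bit with bitsandbytes
# (restart the runtime first if a previous cell hit CUDA OOM)
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

model_id = "meta-llama/Llama-2-7b-chat-hf"
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config, device_map="auto")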


In [17]:
# 3. Text Classification with Transformer Models
print("\n3. Text Classification Example")

def setup_classification_model(model_name="prajjwal1/bert-tiny"):
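    """Load a tokenizer and a sequence-classification model.

    prajjwal1/bert-tiny is a base checkpoint: its classification head is
    randomly initialized (transformers warns about this on load), so its
    predictions are meaningless until it is fine-tuned on labeled data.
    """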
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    return model, tokenizer

classifier_model, classifier_tokenizer = setup_classification_model()
classifier = pipeline("sentiment-analysis",
                     model=classifier_model,
                     tokenizer=classifier_tokenizer,
                     device=0 if torch.cuda.is_available() else -1)

# Test classification
texts = [
    "I love working with language models!",
    "This task is quite challenging.",
    "The results are impressive and amazing."
]

print("\nSentiment Classification Results:")
for text in texts:
    result = classifier(text)
    print(f"Text: {text}")
    print(f"Sentiment: {result[0]['label']}, Score: {result[0]['score']:.3f}")


3. Text Classification Example



Some weights of BertForSequenceClassification were not initialized from the model checkpoint at prajjwal1/bert-tiny and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.



Sentiment Classification Results:
Text: I love working with language models!
Sentiment: LABEL_1, Score: 0.535
Text: This task is quite challenging.
Sentiment: LABEL_1, Score: 0.510
Text: The results are impressive and amazing.
Sentiment: LABEL_1, Score: 0.522
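All three sentences get `LABEL_1` with scores barely above 0.5 because, as the warning says, the classifier head is freshly initialized. Small random weights produce logits close to zero, and a softmax over two nearly equal logits gives probabilities near 0.5. The sketch below illustrates this with simulated logits; the standard deviation of 0.1 is an assumption, not measured from bert-tiny, and `two_class_softmax` is a hypothetical helper.

```python
import math
import random


def two_class_softmax(logit_0, logit_1):
    """Class probabilities for a two-logit classification head."""
    m = max(logit_0, logit_1)  # subtract the max for numerical stability
    e0, e1 = math.exp(logit_0 - m), math.exp(logit_1 - m)
    return e0 / (e0 + e1), e1 / (e0 + e1)


# Simulate an untrained head: small random logits (assumed std 0.1)
rng = random.Random(0)
for _ in range(3):
    l0, l1 = rng.gauss(0, 0.1), rng.gauss(0, 0.1)
    p0, p1 = two_class_softmax(l0, l1)
    label = "LABEL_1" if p1 > p0 else "LABEL_0"
    print(f"{label}, Score: {max(p0, p1):.3f}")
```

For meaningful sentiment labels, pass a checkpoint already fine-tuned for sentiment, for example `distilbert-base-uncased-finetuned-sst-2-english`, or fine-tune bert-tiny on a labeled dataset first.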


In [18]:
# 4. Creating a Simple Chatbot Interface
print("\n4. Simple Chatbot Interface")

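def chatbot_response(message, history):
    """Generate a chatbot reply (earlier turns in history are not used)"""
    prompt = f"User: {message}\nAssistant:"
    response = generate_text(prompt, model, tokenizer, max_length=150)
    # generate_text returns prompt + completion, so drop the echoed prompt
    reply = response[len(prompt):] if response.startswith(prompt) else response
    # Stop where the model starts inventing the next user turn
    return reply.split("User:")[0].strip()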

# Create Gradio interface
iface = gr.ChatInterface(
    chatbot_response,
    title="Simple LLM Chatbot",
    description="A basic chatbot using a small language model"
)

# Launch the interface
iface.launch(share=True)


4. Simple Chatbot Interface




Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://73a34984170df54ec6.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)
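As written, `chatbot_response` answers each message in isolation and ignores `history`. A minimal sketch of feeding recent turns back into the prompt follows, assuming `history` arrives as a list of `(user, assistant)` pairs (the Gradio `ChatInterface` "tuples" format; with `type="messages"` it is a list of role/content dicts instead and the loop would need adapting). `build_chat_prompt` and its `max_turns` cap are hypothetical.

```python
def build_chat_prompt(message, history, max_turns=3):
    """Build a plain-text prompt from recent (user, assistant) pairs plus the new message."""
    # history[-0:] would return the whole list, so guard max_turns == 0
    recent = history[-max_turns:] if max_turns > 0 else []
    lines = []
    for user_msg, bot_msg in recent:
        lines.append(f"User: {user_msg}")
        lines.append(f"Assistant: {bot_msg}")
    lines.append(f"User: {message}")
    lines.append("Assistant:")
    return "\n".join(lines)


print(build_chat_prompt("And Italy?", [("Capital of France?", "Paris.")]))
```

Inside `chatbot_response`, the prompt line would become `prompt = build_chat_prompt(message, history)`. Capping the number of turns matters for a small model like OPT-350m, whose context window is 2048 tokens.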




In [21]:
# 5. Few-shot Learning Example
print("\n5. Few-shot Learning Example")

few_shot_prompt = """
Question: What is the capital of France?
Answer: The capital of France is Paris.

Question: What is the capital of Japan?
Answer: The capital of Japan is Tokyo.

Question: What is the capital of Italy?
Answer: """

print("Few-shot learning response:")
# max_length counts the prompt tokens too, so it must be comfortably larger than the prompt
print(generate_text(few_shot_prompt, model, tokenizer, max_length=100))


5. Few-shot Learning Example
Few-shot learning response:

Question: What is the capital of France?
Answer: The capital of France is Paris.

Question: What is the capital of Japan?
Answer: The capital of Japan is Tokyo.

Question: What is the capital of Italy?
Answer:                                            
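The model answered with whitespace only. One plausible cause (an assumption, not verified here) is the trailing space after the final `Answer: `: BPE tokenizers like OPT's usually attach a leading space to the next word (` Rome`), so a prompt that already ends in a space is an unusual context for the model. The hypothetical helpers below build the prompt without that trailing space and pull the first generated line out of the decoded text.

```python
def build_few_shot_prompt(examples, question):
    """Format (question, answer) pairs plus a new question as a few-shot prompt.

    The prompt ends with "Answer:" and no trailing space, so the model
    can produce the space-prefixed first word itself.
    """
    blocks = [f"Question: {q}\nAnswer: {a}" for q, a in examples]
    blocks.append(f"Question: {question}\nAnswer:")
    return "\n\n".join(blocks)


def extract_answer(generated, prompt):
    """Keep only the first non-empty line the model produced after the prompt."""
    completion = generated[len(prompt):] if generated.startswith(prompt) else generated
    stripped = completion.strip()
    return stripped.splitlines()[0] if stripped else ""


examples = [
    ("What is the capital of France?", "The capital of France is Paris."),
    ("What is the capital of Japan?", "The capital of Japan is Tokyo."),
]
print(build_few_shot_prompt(examples, "What is the capital of Italy?"))
```

With the notebook's functions, usage would look like `prompt = build_few_shot_prompt(examples, "What is the capital of Italy?")` followed by `extract_answer(generate_text(prompt, model, tokenizer), prompt)`.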


In [22]:
# 6. Model Evaluation
print("\n6. Model Evaluation")

def evaluate_model(model, tokenizer, test_prompts):
    """Basic model evaluation"""
    results = []
    for prompt in test_prompts:
        response = generate_text(prompt, model, tokenizer)
        results.append({
            'prompt': prompt,
            'response': response,
            'length': len(response.split())  # word count, including the echoed prompt
        })
    return results

test_prompts = [
    "Explain how a computer works:",
    "Write a short poem about technology:",
    "List three benefits of artificial intelligence:"
]

evaluation_results = evaluate_model(model, tokenizer, test_prompts)
print("\nModel Evaluation Results:")
for result in evaluation_results:
    print(f"\nPrompt: {result['prompt']}")
    print(f"Response: {result['response']}")
    print(f"Response length: {result['length']} words")



6. Model Evaluation

Model Evaluation Results:

Prompt: Explain how a computer works:
Response: Explain how a computer works:

A computer is a system that performs a set of tasks for different users—usually groups of people—and which is configured to look for the results of these tasks in a computer database. The tasks are performed by computer programs called programs.

A computer program is an acronym for Computer Programmer's Interface. A computer program is a combination of a number of programs and processes that perform a set of tasks in a computer system. The first computer program
Response length: 83 words

Prompt: Write a short poem about technology:
Response: Write a short poem about technology:  "When life gives you lemons, make lemonade."
That's not a good poem, but thank you anyway.
Response length: 22 words

Prompt: List three benefits of artificial intelligence:
Response: List three benefits of artificial intelligence:

1. Increased awareness of human behavior, which is 
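Response length says little about quality, and the OPT outputs above also repeat themselves (for example the three identical `$./configure --no-types` lines). A distinct-n score, the fraction of word n-grams that are unique, is one simple custom metric for that. The sketch below is a plain-Python version; `distinct_n` is a hypothetical helper, not part of the `evaluate` library.

```python
def distinct_n(text, n=2):
    """Fraction of word n-grams in text that are unique (1.0 = no repetition)."""
    words = text.split()
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    if not ngrams:
        return 0.0  # too few words to form a single n-gram
    return len(set(ngrams)) / len(ngrams)


repetitive = "$./configure --no-types $./configure --no-types $./configure --no-types"
print(round(distinct_n(repetitive, 2), 2))  # → 0.4
```

It could be added to `evaluate_model` as an extra field, e.g. `'distinct_2': distinct_n(response)`, keeping in mind that the response string still contains the prompt.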

In [23]:
print("""
Tips for Working with LLMs in Colab:
1. Always check GPU availability and memory usage
2. Use quantization for larger models
3. Be mindful of token limits and batch sizes
4. Consider using smaller models for experimentation
5. Save and load model checkpoints carefully

Exercises:
1. Try different prompts and parameters
2. Experiment with temperature and max_length
3. Test few-shot learning with different examples
4. Create custom evaluation metrics
5. Try different model architectures
""")


Tips for Working with LLMs in Colab:
1. Always check GPU availability and memory usage
2. Use quantization for larger models
3. Be mindful of token limits and batch sizes
4. Consider using smaller models for experimentation
5. Save and load model checkpoints carefully

Exercises:
1. Try different prompts and parameters
2. Experiment with temperature and max_length
3. Test few-shot learning with different examples
4. Create custom evaluation metrics
5. Try different model architectures

