# Exploring Open-Source LLMs on Hugging Face

This notebook demonstrates how to explore and use open-source Large Language Models (LLMs) from the Hugging Face Hub.

## Table of Contents
1. [Setup and Installation](#setup)
2. [Exploring Available Models](#exploring)
3. [Loading Model Weights](#loading)
4. [Using Models for Inference](#inference)
5. [Working with Different Model Types](#model-types)

## 1. Setup and Installation <a name="setup"></a>

First, let's import the necessary libraries and check our environment.

In [None]:
# Install required packages
# !pip install transformers torch huggingface-hub datasets

In [None]:
import transformers
from transformers import AutoModel, AutoTokenizer, AutoModelForCausalLM, AutoModelForSeq2SeqLM
from huggingface_hub import HfApi, list_models, model_info
import torch
import warnings
warnings.filterwarnings('ignore')

print(f"Transformers version: {transformers.__version__}")
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")

## 2. Exploring Available Models <a name="exploring"></a>

The Hugging Face Hub hosts thousands of open-source models. Let's explore how to search and filter them.

In [None]:
# Initialize the Huggingface Hub API
api = HfApi()

# Search for text-generation models
models = list(list_models(
    task="text-generation",
    sort="downloads",
    limit=10
))

print("Top 10 Most Downloaded Text Generation Models:")
print("=" * 80)
for i, model in enumerate(models, 1):
    print(f"{i}. {model.id}")
    print(f"   Downloads: {model.downloads if hasattr(model, 'downloads') else 'N/A'}")
    print(f"   Likes: {model.likes if hasattr(model, 'likes') else 'N/A'}")
    print()

In [None]:
# Get detailed information about a specific model
model_id = "gpt2"
info = model_info(model_id)

print(f"Model Information for '{model_id}':")
print("=" * 80)
print(f"Model ID: {info.id}")
print(f"Task: {info.pipeline_tag}")
print(f"Library: {info.library_name}")
print(f"Downloads: {info.downloads}")
print(f"Likes: {info.likes}")
print(f"Tags: {info.tags[:5] if info.tags else 'N/A'}")

## 3. Loading Model Weights <a name="loading"></a>

Let's learn how to load model weights from the Hugging Face Hub. We'll use GPT-2 as an example since it's lightweight and popular.

In [None]:
# Method 1: Load model and tokenizer directly
model_name = "gpt2"

print(f"Loading model: {model_name}")
print("This may take a moment as weights are downloaded...")

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)
print(f"✓ Tokenizer loaded")

# Load model
model = AutoModelForCausalLM.from_pretrained(model_name)
print(f"✓ Model loaded")

# Check model size
num_parameters = sum(p.numel() for p in model.parameters())
print(f"\nModel has {num_parameters:,} parameters")
print(f"Model size: ~{num_parameters * 4 / 1e9:.2f} GB (fp32)")

In [None]:
# Method 2: Inspect the model configuration without loading the weights
from transformers import GPT2Config

# Load configuration
config = GPT2Config.from_pretrained(model_name)
print("Model Configuration:")
print(f"  Vocabulary size: {config.vocab_size}")
print(f"  Hidden size: {config.n_embd}")
print(f"  Number of layers: {config.n_layer}")
print(f"  Number of attention heads: {config.n_head}")
print(f"  Max position embeddings: {config.n_positions}")
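
The configuration values above are enough to estimate a model's parameter count before downloading any weights. The following is a minimal sketch using only the standard library. It assumes the standard GPT-2 block layout (fused QKV projection, 4x MLP expansion, tied input and output embeddings), and the default arguments are assumed to match the `gpt2` checkpoint. The helper name `gpt2_param_count` is hypothetical.

```python
# Estimate the parameter count of a GPT-2 style model from its configuration.
# Default values are assumed to match the "gpt2" checkpoint.

def gpt2_param_count(vocab_size=50257, n_positions=1024, n_embd=768, n_layer=12):
    """Return the number of parameters, counting the tied embeddings once."""
    # Token embeddings (wte) plus position embeddings (wpe)
    embeddings = vocab_size * n_embd + n_positions * n_embd
    # A layer norm has one weight vector and one bias vector
    layer_norm = 2 * n_embd
    # Fused QKV projection plus the attention output projection (weights + biases)
    attention = (n_embd * 3 * n_embd + 3 * n_embd) + (n_embd * n_embd + n_embd)
    # Two-layer MLP with a 4x hidden expansion (weights + biases)
    mlp = (n_embd * 4 * n_embd + 4 * n_embd) + (4 * n_embd * n_embd + n_embd)
    # Each transformer block has two layer norms, one attention module, one MLP
    per_block = 2 * layer_norm + attention + mlp
    # The last term is the final layer norm applied after all blocks
    return embeddings + n_layer * per_block + layer_norm

gpt2_total = gpt2_param_count()
print(f"Estimated GPT-2 parameters: {gpt2_total:,}")
```

The number of attention heads does not appear in the formula because splitting the hidden dimension into heads does not change the size of the projection matrices.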

In [None]:
# Method 3: Load model with reduced precision to save memory
print("Loading model with half precision (fp16)...")
model_fp16 = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True
)
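print("✓ Model loaded in fp16")
# Report measured footprints instead of asserting a fixed reduction
print(f"fp32 footprint: {model.get_memory_footprint() / 1e6:.0f} MB")
print(f"fp16 footprint: {model_fp16.get_memory_footprint() / 1e6:.0f} MB")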
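
The saving from reduced precision follows directly from the number of bytes stored per parameter. The sketch below, which uses only the standard library, works through that arithmetic for a few common data types. The helper `weight_memory_gb` and the example parameter count are illustrative assumptions. The estimate covers weights only, not activations or any extra metadata a quantization scheme may store.

```python
# Bytes needed to store one parameter in each data type
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "int8": 1}

def weight_memory_gb(num_parameters, dtype="fp32"):
    """Approximate memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_parameters * BYTES_PER_PARAM[dtype] / 1e9

# Assumed parameter count for the "gpt2" checkpoint
example_params = 124_439_808

for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: ~{weight_memory_gb(example_params, dtype):.2f} GB")
```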

## 4. Using Models for Inference <a name="inference"></a>

Now let's use the loaded model to generate text.

In [None]:
# Simple text generation
prompt = "Artificial intelligence is"

# Tokenize input
inputs = tokenizer(prompt, return_tensors="pt")

# Generate text
print(f"Prompt: '{prompt}'")
print("\nGenerated text:")
print("=" * 80)

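outputs = model.generate(
    **inputs,                              # passes input_ids and attention_mask
    max_length=50,                         # total length, prompt tokens included
    num_return_sequences=1,
    temperature=0.7,
    do_sample=True,
    top_k=50,
    top_p=0.95,
    pad_token_id=tokenizer.eos_token_id    # GPT-2 defines no pad token
)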

generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)
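
To make the `temperature`, `top_k`, and `top_p` arguments less opaque, here is a simplified pure-Python sketch of how one next token might be sampled from a list of logits. It is an illustration of the general technique, not the library's actual implementation, and the function name `sample_next_token` and the toy logits are hypothetical.

```python
import math
import random

def sample_next_token(logits, temperature=0.7, top_k=50, top_p=0.95, rng=random):
    """Sample one token index using temperature, top-k, and top-p filtering."""
    # 1. Temperature: values below 1 sharpen the distribution, above 1 flatten it
    scaled = [x / temperature for x in logits]
    # 2. Softmax (subtracting the max keeps exp() numerically stable)
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # 3. Top-k: keep only the k most probable token indices
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    top = ranked[:top_k]
    mass = sum(probs[i] for i in top)
    # 4. Top-p: keep the smallest prefix whose renormalized mass reaches top_p
    kept, cumulative = [], 0.0
    for i in top:
        kept.append(i)
        cumulative += probs[i] / mass
        if cumulative >= top_p:
            break
    # 5. Sample from the surviving tokens, weighted by probability
    weights = [probs[i] for i in kept]
    return rng.choices(kept, weights=weights, k=1)[0]

toy_logits = [5.0, 1.0, 0.5, -2.0]  # hypothetical scores for a 4-token vocabulary
print(sample_next_token(toy_logits, rng=random.Random(0)))
```

Lowering `top_k` or `top_p` shrinks the candidate set, so outputs become more predictable. Raising them (or the temperature) admits less likely tokens and increases variety.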

In [None]:
# Using the pipeline API (easier interface)
from transformers import pipeline

# Create a text generation pipeline
generator = pipeline('text-generation', model=model_name, tokenizer=model_name)

# Generate multiple variations
prompts = [
    "The future of technology is",
    "In the world of machine learning,",
    "Open source software enables"
]

print("Generated texts using pipeline:")
print("=" * 80)

for prompt in prompts:
    result = generator(prompt, max_length=40, num_return_sequences=1)
    print(f"\nPrompt: {prompt}")
    print(f"Output: {result[0]['generated_text']}")

## 5. Working with Different Model Types <a name="model-types"></a>

Let's explore different types of models available on the Hugging Face Hub.

In [None]:
# Example 1: BERT for masked language modeling
from transformers import BertTokenizer, BertForMaskedLM

bert_model_name = "bert-base-uncased"
bert_tokenizer = BertTokenizer.from_pretrained(bert_model_name)
bert_model = BertForMaskedLM.from_pretrained(bert_model_name)

# Use BERT to predict masked words
text = "The capital of France is [MASK]."
inputs = bert_tokenizer(text, return_tensors="pt")

with torch.no_grad():
    outputs = bert_model(**inputs)
    predictions = outputs.logits

# Get the predicted token
mask_token_index = (inputs.input_ids == bert_tokenizer.mask_token_id)[0].nonzero(as_tuple=True)[0]
predicted_token_id = predictions[0, mask_token_index].argmax(axis=-1)
predicted_token = bert_tokenizer.decode(predicted_token_id)

print(f"BERT Model Example:")
print(f"Input: {text}")
print(f"Prediction: {predicted_token}")

In [None]:
# Example 2: Small language models for resource-constrained environments
print("\nExploring small language models:")
print("=" * 80)

small_models = [
    "distilgpt2",           # Distilled version of GPT-2 (smaller, faster)
    "gpt2-medium",          # Medium-sized GPT-2 (larger than the base gpt2)
    "EleutherAI/gpt-neo-125M"  # GPT-Neo 125M parameters
]

for model_id in small_models:
    try:
        info = model_info(model_id)
        print(f"\n{model_id}:")
        print(f"  Pipeline: {info.pipeline_tag}")
        print(f"  Downloads: {info.downloads}")
    except Exception as e:
        # Show the reason so network or naming problems are easy to diagnose
        print(f"\n{model_id}: Could not fetch info ({e})")

In [None]:
# Example 3: Using DistilGPT-2 (lighter version)
distil_model_name = "distilgpt2"
distil_tokenizer = AutoTokenizer.from_pretrained(distil_model_name)
distil_model = AutoModelForCausalLM.from_pretrained(distil_model_name)

# Compare model sizes
gpt2_params = sum(p.numel() for p in model.parameters())
distil_params = sum(p.numel() for p in distil_model.parameters())

print(f"\nModel Size Comparison:")
print(f"GPT-2: {gpt2_params:,} parameters")
print(f"DistilGPT-2: {distil_params:,} parameters")
print(f"Reduction: {(1 - distil_params/gpt2_params)*100:.1f}%")

## Best Practices for Model Usage

### Model Selection Criteria:
1. **Task Type**: Choose models based on your specific task (text generation, classification, QA, etc.)
2. **Model Size**: Balance between performance and resource constraints
3. **License**: Check model licenses for commercial use restrictions
4. **Community Support**: Popular models have better documentation and community support
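
As a small illustration of the license check, the sketch below pulls a license identifier out of a model's tag list. It assumes the Hub exposes the license as a tag of the form `license:<id>`, which is worth verifying against the model card. The helper `license_from_tags` and the example tags are hypothetical.

```python
def license_from_tags(tags):
    """Return the license identifier from a list of Hub tags, or None if absent."""
    for tag in tags or []:
        # Assumed convention: the license appears as "license:<identifier>"
        if tag.startswith("license:"):
            return tag.split(":", 1)[1]
    return None

# Hypothetical tag list, shaped like the `info.tags` value printed earlier
example_tags = ["transformers", "pytorch", "text-generation", "license:mit"]
print(f"License: {license_from_tags(example_tags) or 'not specified'}")
```

With a live connection, the same helper could be called as `license_from_tags(model_info("gpt2").tags)`.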

### Performance Optimization:
1. **Use reduced precision or quantization**: Load models in fp16/bf16, or quantize them to int8, for memory savings
2. **Batch processing**: Process multiple inputs together when possible
3. **Caching**: Downloaded models are cached locally by default; reuse that cache to avoid re-downloading
4. **GPU acceleration**: Use CUDA when available for faster inference
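
Batch processing requires every sequence in a batch to have the same length, which is what a tokenizer's padding option handles. The sketch below shows the idea in plain Python: shorter sequences are padded and an attention mask marks which positions are real tokens. The helper `pad_batch` and the token IDs are hypothetical. Left padding is shown as the default because it is the usual choice for decoder-only generation.

```python
def pad_batch(sequences, pad_id, side="left"):
    """Pad token-id lists to a common length and build matching attention masks."""
    if side not in ("left", "right"):
        raise ValueError("side must be 'left' or 'right'")
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        padding = [pad_id] * (max_len - len(seq))
        real = [1] * len(seq)        # 1 marks a real token
        ignored = [0] * len(padding)  # 0 marks a padding position
        if side == "left":
            input_ids.append(padding + list(seq))
            attention_mask.append(ignored + real)
        else:
            input_ids.append(list(seq) + padding)
            attention_mask.append(real + ignored)
    return input_ids, attention_mask

# Hypothetical token IDs; 50256 is GPT-2's end-of-text ID, reused here for padding
batch_ids, batch_mask = pad_batch([[464, 2003, 286], [818]], pad_id=50256)
print(batch_ids)
print(batch_mask)
```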

### Exploring More Models:
- Visit [Hugging Face Model Hub](https://huggingface.co/models)
- Filter by task, library, language, and license
- Check model cards for detailed information
- Read the documentation and community discussions

In [None]:
# Utility function to inspect model weights
def inspect_model_weights(model, layer_name=None):
    """
    Inspect the weights of a model or specific layer.
    
    Args:
        model: The loaded model
        layer_name: Optional specific layer name to inspect
    """
    print("Model Architecture:")
    print("=" * 80)
    
    total_params = 0
    for name, param in model.named_parameters():
        if layer_name is None or layer_name in name:
            print(f"Layer: {name}")
            print(f"  Shape: {param.shape}")
            print(f"  Parameters: {param.numel():,}")
            print(f"  Dtype: {param.dtype}")
            print(f"  Requires grad: {param.requires_grad}")
            print()
            total_params += param.numel()
    
    print(f"Total parameters: {total_params:,}")
    return total_params

# Example usage
print("\nInspecting the token embedding layer (transformer.wte) of GPT-2:")
inspect_model_weights(model, layer_name="transformer.wte")
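
A per-layer listing can get long for larger models, so a grouped summary is sometimes easier to read. The sketch below sums parameter counts by name prefix using only the standard library. The helper `group_param_counts` is hypothetical, and the example pairs are a hand-written sample in GPT-2's naming style rather than output from a loaded model.

```python
from collections import defaultdict

def group_param_counts(named_counts, depth=2):
    """Sum parameter counts by the first `depth` dot-separated name components."""
    totals = defaultdict(int)
    for name, count in named_counts:
        prefix = ".".join(name.split(".")[:depth])
        totals[prefix] += count
    return dict(totals)

# Hand-written (name, numel) sample in GPT-2's naming style
example = [
    ("transformer.wte.weight", 38_597_376),
    ("transformer.wpe.weight", 786_432),
    ("transformer.h.0.ln_1.weight", 768),
    ("transformer.h.0.ln_1.bias", 768),
    ("transformer.ln_f.weight", 768),
    ("transformer.ln_f.bias", 768),
]

for prefix, total in group_param_counts(example).items():
    print(f"{prefix}: {total:,}")
```

With a loaded model, the input could be built as `((name, p.numel()) for name, p in model.named_parameters())`.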

## Conclusion

This notebook demonstrated:
- How to explore and search for models on the Hugging Face Hub
- Different methods to load model weights
- How to use models for inference
- Working with different model architectures
- Best practices for model usage

### Next Steps:
1. Experiment with different models for your specific use case
2. Fine-tune models on your own data
3. Explore model quantization and optimization techniques
4. Deploy models to production environments

### Resources:
- [Hugging Face Documentation](https://huggingface.co/docs)
- [Transformers Library](https://github.com/huggingface/transformers)
- [Model Hub](https://huggingface.co/models)
- [Datasets Hub](https://huggingface.co/datasets)