Note: Some of the images and code snippets in this note are from the DataCamp "Introduction to Hugging Face" course. Some are AI-generated.

## Overview of Hugging Face

Hugging Face is a company and open-source community that provides tools, models, and infrastructure for building, training, and deploying machine learning models, especially in Natural Language Processing (NLP), and increasingly in vision, audio, multimodal, and reinforcement learning.

### 🔍 What Hugging Face Offers
#### 1. Transformers Library
The flagship open-source library. It:

- Provides pre-trained models for:
  - Text generation (GPT, T5)
  - Text classification (BERT, RoBERTa)
  - Translation, summarization, question answering, etc.
- Supports frameworks like PyTorch, TensorFlow, and JAX

#### 2. 🧰 Other Libraries
| Library      | Purpose                                                     |
| ------------ | ----------------------------------------------------------- |
| `datasets`   | Access thousands of datasets in a standard format           |
| `tokenizers` | Fast, customizable tokenization (in Rust + Python)          |
| `accelerate` | Simplifies multi-GPU and distributed training               |
| `diffusers`  | For generative models like Stable Diffusion                 |
| `trl`        | Reinforcement Learning for fine-tuning LLMs (e.g. PPO, DPO) |
| `peft`       | Parameter-efficient fine-tuning (LoRA, etc.)                |
| `optimum`    | Optimized inference for hardware (ONNX, OpenVINO, etc.)     |


#### 3. Hugging Face Hub (https://huggingface.co)
A central platform to:

- Host and share models, datasets, and Spaces
- Search and download pretrained models
- Access model cards (documentation and metadata)
- Upload custom models or datasets
- Collaborate via Git-based versioning

Think of it like GitHub for AI models.

#### 4. ⚙️ Inference API
- Hosted models you can query via simple HTTP requests
- Use pre-trained models without local setup
- Supports LLMs, image generation, audio tasks, etc.

#### 5. Spaces
- Run and share live machine learning demos (like Gradio apps)
- Useful for showcasing projects or testing models
- Often used to experiment with models visually

#### 6. 🏭 Training & Fine-Tuning Services
Hugging Face offers:

- AutoTrain: No-code model training
- Hosted notebooks
- Custom model training (with AWS or on your infra)

#### 7. 🛡️ Enterprise and Commercial Offerings
- Private model hosting (e.g., Hugging Face on Amazon SageMaker)
- Secure environments for regulated industries
- Support for fine-tuning on sensitive data
- Managed services

### 🚀 Popular Models on Hugging Face
- BERT, RoBERTa (for understanding language)
- GPT-2, GPT-J, GPT-NeoX (for generation)
- T5, BART (sequence-to-sequence)
- Stable Diffusion (image generation)
- Whisper (speech recognition)
- LLaMA, Mistral, Falcon, Phi (open LLMs)

### Use Cases
| Domain     | Use Case                                           |
| ---------- | -------------------------------------------------- |
| NLP        | Summarization, classification, QA, translation     |
| Vision     | Image classification, object detection, generation |
| Audio      | Speech-to-text, audio classification               |
| Multimodal | Text-to-image, video, speech + text                |
| Research   | Reproducibility, fine-tuning, sharing models       |
| Industry   | Chatbots, search engines, document analysis, etc.  |


![image.png](attachment:image.png)

![image-2.png](attachment:image-2.png)

## Working with models from the Hugging Face Hub
![image.png](attachment:image.png)

Each model in the hub has a model card that includes key metadata such as a description, the model developer, the license it is available under, the supported languages and tasks, and more.
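As a rough illustration of reading that metadata programmatically, the sketch below loads a model card with the `huggingface_hub` library and keeps a few fields. The helper names `load_card_metadata` and `summarize_metadata` are hypothetical, and the chosen field names (`license`, `language`, `pipeline_tag`, `tags`) are an assumption about what a given card declares; many cards omit some of them.

```python
def load_card_metadata(repo_id):
    """Fetch a model card from the Hub and return its metadata as a dict."""
    # Imported lazily; requires `pip install huggingface_hub` and network access.
    from huggingface_hub import ModelCard

    card = ModelCard.load(repo_id)
    return card.data.to_dict()


def summarize_metadata(metadata, fields=("license", "language", "pipeline_tag", "tags")):
    """Keep only the listed fields, using None for fields the card does not declare."""
    return {field: metadata.get(field) for field in fields}
```

For example, `summarize_metadata(load_card_metadata(repo_id))` with any public model id would return a small dict of the fields above, with `None` wherever the card leaves a field out.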

Pipelines in Hugging Face make it simple to perform tasks like text classification with minimal setup. Here’s how it works: First, we import the pipeline function from transformers. Then, we specify the task and model.
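The two steps described above can be sketched as follows. This is a minimal illustration rather than the course's exact code: the model name `distilbert-base-uncased-finetuned-sst-2-english` is an assumed example checkpoint, and `classify` and `top_label` are hypothetical helper names. The import sits inside the function so the model is only downloaded when `classify` is actually called.

```python
def classify(text, model_name="distilbert-base-uncased-finetuned-sst-2-english"):
    """Run a text-classification pipeline on a single string."""
    # Step 1: import the pipeline function (requires transformers plus a backend such as PyTorch).
    from transformers import pipeline

    # Step 2: specify the task and the model (model_name is an assumed example).
    classifier = pipeline(task="text-classification", model=model_name)

    # The pipeline returns a list of {"label": ..., "score": ...} dicts.
    return classifier(text)


def top_label(predictions):
    """Return the label of the highest-scoring prediction."""
    best = max(predictions, key=lambda p: p["score"])
    return best["label"]
```

Calling `top_label(classify("I really enjoyed this course!"))` would download the model on first use and return the predicted label as a string.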

![image-2.png](attachment:image-2.png)

![image-3.png](attachment:image-3.png)

![image-4.png](attachment:image-4.png)

![image-5.png](attachment:image-5.png)

## Hugging Face datasets
![image.png](attachment:image.png)

Each dataset has a dataset card which provides more metadata and information about it.

![image-2.png](attachment:image-2.png)

![image-3.png](attachment:image-3.png)

![image-4.png](attachment:image-4.png)

Most datasets on Hugging Face are backed by Apache Arrow, a data format that uses columnar storage instead of the more traditional row-based storage.
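To make the row-based versus columnar distinction concrete, here is a small sketch in plain Python. It does not use Apache Arrow itself, and the records and field names are made up for illustration.

```python
# Row-based layout: one record (dict) per example.
rows = [
    {"text": "great movie", "label": 1},
    {"text": "boring plot", "label": 0},
    {"text": "loved it", "label": 1},
]


def to_columns(records):
    """Convert a list of row dicts into a dict of column lists."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


# Columnar layout: one list per field.
columns = to_columns(rows)

# Reading a single field touches only that column, not every full record.
positive_share = sum(columns["label"]) / len(columns["label"])
print(columns["label"])  # [1, 0, 1]
print(positive_share)
```

In the columnar layout, an operation over one field (such as the label average here) only has to scan that field's values, which is one reason column-oriented formats suit analytical workloads.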

![image-5.png](attachment:image-5.png)

![image-6.png](attachment:image-6.png)
![image-7.png](attachment:image-7.png)


## Text Classification

![image.png](attachment:image.png)

![image-2.png](attachment:image-2.png)

![image-3.png](attachment:image-3.png)

![image-4.png](attachment:image-4.png)

![image-5.png](attachment:image-5.png)

![image-6.png](attachment:image-6.png)

![image-7.png](attachment:image-7.png)

![image-8.png](attachment:image-8.png)

## Text summarization
Summarization is the process of reducing a large piece of text, such as this one, into a smaller one while retaining key information.

#### Extractive vs. Abstractive
Summarization can be extractive or abstractive. In extractive summarization, key sentences from the input text are selected to form the summary. This method is efficient and requires fewer resources, but it often lacks flexibility and may produce summaries that are less cohesive and harder to read.

On the other hand, abstractive summarization generates new text that captures the main ideas while rephrasing for clarity and readability. Though more flexible, it demands more computational resources and processing.
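As a rough illustration of the extractive approach, the sketch below scores each sentence by how frequent its words are across the whole text and keeps the top-scoring sentences in their original order. This word-frequency heuristic is an assumption chosen for simplicity, not how Hugging Face summarization models work. Abstractive summarization needs a trained sequence-to-sequence model and is not shown here.

```python
import re
from collections import Counter


def extractive_summary(text, num_sentences=1):
    """Select the highest-scoring sentences, scored by summed word frequency."""
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    # Word frequencies over the whole text (lowercased).
    freq = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence):
        return sum(freq[w] for w in re.findall(r"[a-z']+", sentence.lower()))

    # Rank sentence indices by score, then restore the original order.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:num_sentences])
    return " ".join(sentences[i] for i in chosen)


text = (
    "Transformers are neural networks. "
    "Transformers power many language models and language tools. "
    "Cats sleep a lot."
)
print(extractive_summary(text, num_sentences=1))
# Transformers power many language models and language tools.
```

Every sentence in the output is copied verbatim from the input, which is the defining trait of extractive methods. Note that summing frequencies favors longer sentences; a real system would normalize scores or use a learned model.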

![image.png](attachment:image.png)

![image-2.png](attachment:image-2.png)

![image-3.png](attachment:image-3.png)

![image-4.png](attachment:image-4.png)

## Auto models and tokenizers

![image.png](attachment:image.png)

Auto classes are a flexible way to load models, tokenizers, and other components without manual setup. They offer more control compared to pipelines, making them ideal for advanced tasks. While pipelines are great for quick experimentation, Auto classes let us customize every step.
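The sketch below shows what "customizing every step" can look like for text classification: tokenization, the forward pass, and post-processing are each written out explicitly. It is one possible arrangement under stated assumptions, not the course's code: the model name is an assumed example checkpoint, PyTorch is assumed as the backend, and `softmax` and `classify_with_auto_classes` are hypothetical helper names.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_with_auto_classes(text, model_name="distilbert-base-uncased-finetuned-sst-2-english"):
    """Classify one string using AutoTokenizer and AutoModelForSequenceClassification."""
    # Imported lazily; requires transformers and PyTorch, and downloads the model on first use.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    # Step 1: load the tokenizer and model that match the checkpoint.
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name)

    # Step 2: tokenize the input into PyTorch tensors.
    inputs = tokenizer(text, return_tensors="pt", truncation=True)

    # Step 3: run the model without tracking gradients.
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()

    # Step 4: post-process the raw scores ourselves.
    probs = softmax(logits)
    best = probs.index(max(probs))
    return model.config.id2label[best], probs[best]
```

Calling `classify_with_auto_classes("I really enjoyed this course!")` would return a `(label, probability)` tuple. Because each step is explicit, any of them can be swapped out, for example by changing the truncation settings or applying a custom decision threshold to `probs`.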

![image-2.png](attachment:image-2.png)

