<img src="https://drive.google.com/uc?export=view&id=1wYSMgJtARFdvTt5g7E20mE4NmwUFUuog" width="200">

[![Build Fast with AI](https://img.shields.io/badge/BuildFastWithAI-GenAI%20Bootcamp-blue?style=for-the-badge&logo=artificial-intelligence)](https://www.buildfastwithai.com/genai-course)
[![EduChain GitHub](https://img.shields.io/github/stars/satvik314/educhain?style=for-the-badge&logo=github&color=gold)](https://github.com/satvik314/educhain)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1103aStq_C1yl6aNZuuqwUfgpalEoIXcd?usp=sharing)
## Master Generative AI in 6 Weeks
**What You'll Learn:**
- Build with Latest LLMs
- Create Custom AI Apps
- Learn from Industry Experts
- Join Innovation Community

Transform your AI ideas into reality through hands-on projects and expert mentorship.

[Start Your Journey](https://www.buildfastwithai.com/genai-course)

*Empowering the Next Generation of AI Innovators*

## 🌟 **Hugging Face Transformers**: A Powerful Foundation for Generative AI and NLP
Hugging Face Transformers is an open-source library designed for state-of-the-art Natural Language Processing (NLP) and Generative AI (GenAI) tasks, leveraging the power of transformer-based models. 🚀

## ✨ **Key Features for GenAI and NLP**:

- **Pre-trained Models**: Offers a diverse collection of pre-trained transformer models, including powerful generative architectures like GPT, BART, and T5, alongside models like BERT for other NLP tasks.
  
- **Core GenAI Tasks**: Features out-of-the-box support for core GenAI tasks like text generation, as well as other NLP tasks like classification, translation, summarization, and question answering.
  
- **Tools for Managing LLMs**: Offers tools for managing and optimizing large language models (LLMs) used in advanced GenAI scenarios. 📚

### **Setup and Installation**

In [None]:
!pip install transformers datasets

### **Text Classification**

In [None]:
from transformers import pipeline

classifier = pipeline("text-classification", model="distilbert-base-uncased-finetuned-sst-2-english")
result = classifier("This movie was absolutely fantastic!")
print(result)

Device set to use cpu


[{'label': 'POSITIVE', 'score': 0.999874472618103}]
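
The `score` in the output is a probability. For a single-label classifier such as this one, the pipeline applies a softmax to the model's raw logits and reports the most likely label. A minimal sketch of that post-processing step in plain Python is shown here; the logit values are made up for illustration, and the `id2label` mapping is assumed to match the checkpoint's configuration.

```python
import math

def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Assumed index-to-label mapping (illustration only).
id2label = {0: "NEGATIVE", 1: "POSITIVE"}

# Hypothetical logits for one input; not taken from a real model run.
logits = [-4.2, 4.6]

probs = softmax(logits)
best = max(range(len(probs)), key=lambda i: probs[i])  # index of the top class
prediction = {"label": id2label[best], "score": probs[best]}
print(prediction)
```

A large gap between the two logits is what produces a score close to 1, as in the pipeline output above.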


### **Text Generation using GPT-2**

In [None]:
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

input_text = "In a shocking turn of events, scientists discovered"
inputs = tokenizer(input_text, return_tensors="pt")

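# Greedy decoding is the default. `temperature` only applies when do_sample=True,
# so it is omitted here; greedy decoding is what makes the text below repeat itself.
outputs = model.generate(**inputs, max_length=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))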

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


In a shocking turn of events, scientists discovered that the bacteria that cause the bacteria to grow in the gut are actually the same bacteria that cause the bacteria to grow in the brain.

The researchers found that the bacteria that cause the bacteria to grow in the brain are actually the same bacteria that cause the bacteria to grow in the brain.

The researchers found that the bacteria that cause the bacteria to grow in the brain are actually the same bacteria that cause the bacteria to grow in the brain.
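
The repetition in the output above is typical of greedy decoding, which always picks the most likely next token. Sampling with a temperature is one common way to add variety: the logits are divided by the temperature before the softmax, so values below 1 sharpen the distribution and values above 1 flatten it. The sketch below illustrates the idea on a toy vocabulary in plain Python. The tokens and logit values are hypothetical, and this is not how `generate` is implemented internally.

```python
import math
import random

def softmax_with_temperature(logits, temperature=1.0):
    """Scale logits by 1/temperature, then normalize into probabilities."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical next-token candidates and logits (illustration only).
vocab = ["bacteria", "water", "life", "nothing"]
logits = [2.0, 1.5, 1.0, 0.2]

# Greedy decoding: always take the highest-scoring token.
greedy_token = vocab[logits.index(max(logits))]

sharp = softmax_with_temperature(logits, temperature=0.7)  # more peaked
flat = softmax_with_temperature(logits, temperature=1.5)   # more uniform

rng = random.Random(0)  # fixed seed so the sketch is repeatable
sampled_token = rng.choices(vocab, weights=sharp, k=1)[0]

print("greedy:", greedy_token)
print("T=0.7:", [round(p, 3) for p in sharp])
print("T=1.5:", [round(p, 3) for p in flat])
print("sampled:", sampled_token)
```

In `model.generate`, temperature only takes effect when sampling is enabled with `do_sample=True`.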


### **Named Entity Recognition (NER)**

In [None]:
ner = pipeline("ner", model="dslim/bert-base-NER")
result = ner("Hugging Face is a company based in New York City.")
print(result)

Some weights of the model checkpoint at dslim/bert-base-NER were not used when initializing BertForTokenClassification: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Device set to use cpu


[{'entity': 'B-ORG', 'score': 0.88835627, 'index': 1, 'word': 'Hu', 'start': 0, 'end': 2}, {'entity': 'I-ORG', 'score': 0.9137654, 'index': 2, 'word': '##gging', 'start': 2, 'end': 7}, {'entity': 'I-ORG', 'score': 0.9774943, 'index': 3, 'word': 'Face', 'start': 8, 'end': 12}, {'entity': 'B-LOC', 'score': 0.9995097, 'index': 9, 'word': 'New', 'start': 35, 'end': 38}, {'entity': 'I-LOC', 'score': 0.9993987, 'index': 10, 'word': 'York', 'start': 39, 'end': 43}, {'entity': 'I-LOC', 'score': 0.99958295, 'index': 11, 'word': 'City', 'start': 44, 'end': 48}]
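
The raw output lists one entry per word piece (`Hu`, `##gging`, `Face`), tagged with `B-` for the beginning of an entity and `I-` for its continuation. The sketch below merges those pieces back into whole entities using the character offsets from the output above. `group_entities` is a hypothetical helper written for illustration, and it is simplified: it does not check that pieces are adjacent in the text.

```python
def group_entities(tokens, text):
    """Merge B-/I- tagged word pieces into (entity_text, entity_type) pairs."""
    spans = []  # each item is [start, end, entity_type]
    for tok in tokens:
        prefix, _, etype = tok["entity"].partition("-")
        continues = prefix == "I" and bool(spans) and spans[-1][2] == etype
        if continues:
            spans[-1][1] = tok["end"]  # extend the current entity
        else:
            spans.append([tok["start"], tok["end"], etype])  # start a new one
    return [(text[s:e], t) for s, e, t in spans]

text = "Hugging Face is a company based in New York City."

# Tags and offsets copied from the pipeline output above (scores omitted).
tokens = [
    {"entity": "B-ORG", "start": 0, "end": 2},
    {"entity": "I-ORG", "start": 2, "end": 7},
    {"entity": "I-ORG", "start": 8, "end": 12},
    {"entity": "B-LOC", "start": 35, "end": 38},
    {"entity": "I-LOC", "start": 39, "end": 43},
    {"entity": "I-LOC", "start": 44, "end": 48},
]

entities = group_entities(tokens, text)
print(entities)
```

The pipeline can also do this grouping itself when it is created with `aggregation_strategy="simple"`.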


### **Translation**

In [None]:
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-de")
result = translator("Hello world!", max_length=40)
print(result[0]['translation_text'])

Device set to use cpu


Hallo Welt!


### **Question Answering**

In [None]:
qa = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")
context = """Hugging Face is a company based in New York..."""
question = "Where is Hugging Face based?"
result = qa(question=question, context=context)
print(result['answer'])

Device set to use cpu


New York
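
This kind of question answering is extractive: the model does not generate text, it selects a span of the context. Conceptually, each token gets a score for being the start of the answer and a score for being the end, and the best-scoring valid span is returned. The sketch below shows that selection step in plain Python. The whitespace tokens and the scores are hypothetical; a real model works on subword tokens and its own scores, and `best_span` is an illustrative helper, not a library function.

```python
def best_span(tokens, start_scores, end_scores, max_answer_len=5):
    """Return the text of the span with the highest start + end score."""
    best = (float("-inf"), 0, 0)  # (score, start_index, end_index)
    for i, s in enumerate(start_scores):
        # The end may not precede the start, and the span length is capped.
        for j in range(i, min(i + max_answer_len, len(tokens))):
            score = s + end_scores[j]
            if score > best[0]:
                best = (score, i, j)
    _, i, j = best
    return " ".join(tokens[i : j + 1])

tokens = "Hugging Face is a company based in New York".split()

# Hypothetical per-token scores (illustration only).
start_scores = [0.1, 0.0, 0.0, 0.0, 0.2, 0.1, 0.3, 4.0, 1.0]
end_scores = [0.0, 0.5, 0.0, 0.0, 0.1, 0.0, 0.0, 1.0, 5.0]

answer = best_span(tokens, start_scores, end_scores)
print(answer)
```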


### **Using Pipelines**

In [None]:
from transformers import pipeline

vision_classifier = pipeline("image-classification", model="google/vit-base-patch16-224")
result = vision_classifier("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/pipeline-cat-chonk.jpeg")
print(result)  # list of {'label': ..., 'score': ...} dicts, highest score first


Fast image processor class <class 'transformers.models.vit.image_processing_vit_fast.ViTImageProcessorFast'> is available for this model. Using slow image processor class. To use the fast image processor class set `use_fast=True`.
Device set to use cpu


### **Fine-Tuning a Model**

In [None]:
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments
)
from datasets import load_dataset

# Load model and tokenizer
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

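# Tokenization function
def tokenize(batch):
    # Pad every example to the same fixed length so the default collator can stack them;
    # padding=True would only pad to the longest example within each map batch.
    return tokenizer(batch["text"], padding="max_length", truncation=True)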

# Prepare dataset
dataset = load_dataset("imdb")
dataset = dataset.map(tokenize, batched=True)

# Training setup
training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=1,
    per_device_train_batch_size=16,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=dataset["train"],
)
trainer.train()
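
The training run above reports only the loss. `Trainer` also accepts a `compute_metrics` callable that receives the model's predictions and the true labels at evaluation time. The sketch below computes accuracy in plain Python, assuming the argument unpacks into `(logits, labels)`. The sample logits and labels are made up so the function can be checked without training a model.

```python
def compute_metrics(eval_pred):
    """Compute accuracy from a (logits, labels) pair."""
    logits, labels = eval_pred
    # Predicted class = index of the largest logit in each row.
    predictions = [max(range(len(row)), key=lambda i: row[i]) for row in logits]
    correct = sum(int(p == int(y)) for p, y in zip(predictions, labels))
    return {"accuracy": correct / len(predictions)}

# Hypothetical logits for four examples with two classes (illustration only).
fake_logits = [[0.2, 1.1], [2.0, -1.0], [0.3, 0.4], [1.5, 0.1]]
fake_labels = [1, 0, 0, 0]

metrics = compute_metrics((fake_logits, fake_labels))
print(metrics)
```

To use it, pass `compute_metrics=compute_metrics` and an `eval_dataset` such as `dataset["test"]` when constructing the `Trainer`, then call `trainer.evaluate()`.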

### **Custom Model & Tokenizer**

In [None]:
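from transformers import AutoConfig, AutoModelForSequenceClassification, AutoTokenizer

# Start from the BERT configuration and request a 10-class classification head
config = AutoConfig.from_pretrained("bert-base-uncased")
config.num_labels = 10

# from_config builds the architecture with randomly initialized weights (no pretrained
# weights are loaded). A sequence-classification class is used so that num_labels
# takes effect; the bare AutoModel has no classification head.
model = AutoModelForSequenceClassification.from_config(config)
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")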
