<h1>An Introduction to Hugging Face Models</h1>

1. Pipeline
2. AutoModel
3. Pre-trained Model

<h4>Pipeline</h4>

Pipelines are ideal for quick application of pre-trained models to specific tasks without extensive coding.<br>
They handle input preprocessing, model execution, and output postprocessing.<br>
Pipelines offer limited customization: users have little control over model architecture, hyperparameters, or training procedure, which can restrict flexibility for specific use cases.
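
The three stages a pipeline wraps can be illustrated with a toy, dependency-free sketch. Everything here (the `ToyPipeline` class, the word-score "model", the label names) is hypothetical and is not how the library implements pipelines; it only shows the preprocess, forward, postprocess flow.

```python
import math

class ToyPipeline:
    """Hypothetical stand-in that mimics a pipeline's three stages."""

    # Toy "model weights": a per-word score (assumption, for illustration only)
    WORD_SCORES = {"love": 2.0, "great": 1.5, "hate": -2.0, "bad": -1.5}

    def preprocess(self, text):
        # Lowercase, strip punctuation, and split into word tokens
        return [w.strip(".,!?") for w in text.lower().split()]

    def forward(self, tokens):
        # Sum word scores into one logit per class: [negative, positive]
        score = sum(self.WORD_SCORES.get(t, 0.0) for t in tokens)
        return [-score, score]

    def postprocess(self, logits):
        # Softmax over the logits, then report the most likely label
        exps = [math.exp(x - max(logits)) for x in logits]
        probs = [e / sum(exps) for e in exps]
        best = probs.index(max(probs))
        return {"label": ["NEGATIVE", "POSITIVE"][best], "score": probs[best]}

    def __call__(self, text):
        return self.postprocess(self.forward(self.preprocess(text)))

toy_result = ToyPipeline()("I love this product!")
print(toy_result)
```

The returned dictionary mirrors the label-plus-score shape that a text classification pipeline reports, which is why the real pipelines below can be used in a single call.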

<h4>AutoModel</h4>

AutoModel classes (such as AutoModelForSequenceClassification) select and load the right architecture for a checkpoint automatically, based on the checkpoint's configuration.<br>
Users choose the tokenizer, optimizer, learning rate scheduler, and training procedure themselves, which gives fine-grained control over the NLP workflow, from data preprocessing to model training and evaluation.<br>
Because the concrete class is chosen for you, you have less direct control over the model architecture itself.<br>
Fine-tuning through an AutoModel can be less straightforward than working directly with the model-specific pre-trained class.<br>
You might need to dig deeper into the underlying code to reach architecture-specific options.<br>
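
As a rough illustration of what the "Auto" in AutoModel means, the sketch below picks a concrete class from a `model_type` string, similar in spirit to how an Auto class resolves an architecture from a checkpoint's configuration. The registry, the class names, and the `from_config` helper shown here are hypothetical stand-ins, not the library's internals.

```python
class ToyBertModel:
    def __init__(self, config):
        self.config = config

class ToyGPT2Model:
    def __init__(self, config):
        self.config = config

# Hypothetical registry mapping a config's model_type to a concrete class
MODEL_REGISTRY = {"bert": ToyBertModel, "gpt2": ToyGPT2Model}

class ToyAutoModel:
    @staticmethod
    def from_config(config):
        # Look up the concrete class by model_type and instantiate it
        model_type = config["model_type"]
        if model_type not in MODEL_REGISTRY:
            raise ValueError(f"Unrecognized model_type: {model_type!r}")
        return MODEL_REGISTRY[model_type](config)

auto_model = ToyAutoModel.from_config({"model_type": "gpt2", "n_layer": 12})
print(type(auto_model).__name__)
```

The caller never names the concrete class, which is convenient but also explains why architecture-specific details sit one step further away.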

<h4>Pre-trained Model</h4>

Working directly with model-specific pre-trained classes (for example, BertForSequenceClassification) offers the most flexibility and control over model configuration and the training process.<br>
Users can fine-tune pre-trained models, adjust hyperparameters, modify architectures, and integrate custom components to tailor the model to their specific needs.<br>
This approach suits advanced users or researchers who require full control over every aspect of the NLP pipeline.<br>

<h2>Pipeline</h2>

In [None]:
from transformers import pipeline

1. Text Classification:

In [None]:
# Create a text classification pipeline
classifier = pipeline("sentiment-analysis")

# Perform classification on a single text
result = classifier("I love this product!")
print(result)

2. Named Entity Recognition (NER):

In [None]:
# Create a NER pipeline
ner = pipeline("ner")

# Perform NER on a single text
result = ner("Apple is a company founded by Steve Jobs.")
print(result)
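
A token-level NER pipeline typically returns one entry per token, so multi-token names such as "Steve Jobs" can arrive in pieces. Below is a minimal sketch of grouping them, assuming BIO-style tags (`B-`/`I-` prefixes) and a hand-written token list rather than real model output; the `group_entities` helper is hypothetical.

```python
def group_entities(tagged_tokens):
    """Merge consecutive (word, tag) pairs with BIO tags into (text, type) entities."""
    entities = []
    current_words, current_type = [], None
    for word, tag in tagged_tokens:
        if tag == "O":
            prefix, ent_type = "O", None
        else:
            prefix, ent_type = tag.split("-", 1)
        # Continue the current entity only on an I- tag of the same type
        if prefix == "I" and ent_type == current_type:
            current_words.append(word)
            continue
        # Otherwise close the current entity, if any, and maybe start a new one
        if current_words:
            entities.append((" ".join(current_words), current_type))
        if ent_type is None:
            current_words, current_type = [], None
        else:
            current_words, current_type = [word], ent_type
    if current_words:
        entities.append((" ".join(current_words), current_type))
    return entities

# Hand-written example tags (not real model output)
tagged = [("Apple", "B-ORG"), ("is", "O"), ("a", "O"), ("company", "O"),
          ("founded", "O"), ("by", "O"), ("Steve", "B-PER"), ("Jobs", "I-PER"), (".", "O")]
grouped = group_entities(tagged)
print(grouped)
```

The pipeline can also do this grouping itself: passing `aggregation_strategy="simple"` when creating the `"ner"` pipeline merges tokens that belong to the same entity.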

3. Text Generation:

In [None]:
# Create a text generation pipeline
generator = pipeline("text-generation")

# Generate text based on a prompt
result = generator("Once upon a time")
print(result)

4. Question Answering:

In [None]:
# Create a question answering pipeline
qa = pipeline("question-answering")

# Provide context and question
context = "The Hugging Face Transformers library was developed by Hugging Face."
question = "Who developed the Hugging Face Transformers library?"
result = qa(question=question, context=context)
print(result)

5. Summarization:

In [None]:
# Create a summarization pipeline
summarizer = pipeline("summarization")

# Summarize a piece of text
text = "The Hugging Face Transformers library provides state-of-the-art natural language processing models."
result = summarizer(text)
print(result)

<h2>AutoModel</h2>

In [None]:
from transformers import AutoModelForSequenceClassification, AutoTokenizer

1. Classification (BERT):

In [None]:
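# Load pre-trained BERT model and tokenizer for sequence classification.
# Note: "bert-base-uncased" ships without a fine-tuned classification head, so the
# head is randomly initialized and predictions are not meaningful until fine-tuning.
model_name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)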

# Input text
text = "I love natural language processing!"

# Tokenize input text
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)

# Perform inference
outputs = model(**inputs)

# Print the output (predicted label)
predicted_label = outputs.logits.argmax().item()
print("Predicted Label:", predicted_label)
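
The argmax above yields only a class index. If a confidence score is also wanted, the logits can be passed through a softmax. The sketch below does this in plain Python on made-up logit values (an assumption; real values would come from `outputs.logits`), with a hand-written `softmax` helper for illustration.

```python
import math

def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    shifted = [x - max(logits) for x in logits]  # subtract max for numerical stability
    exps = [math.exp(x) for x in shifted]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up logits, chosen so the result is easy to check by hand
example_logits = [0.0, math.log(3.0)]
probabilities = softmax(example_logits)
predicted_index = probabilities.index(max(probabilities))
print(predicted_index, probabilities)
```

With PyTorch tensors, `torch.softmax(outputs.logits, dim=-1)` performs the same computation.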

2. Text Generation (GPT-2):

In [None]:
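# Load pre-trained GPT-2 model and tokenizer for text generation
# (generation needs a causal language modeling head, not a classification head)
from transformers import AutoModelForCausalLM

model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Input text
text = "Once upon a time, there was a king..."

# Tokenize input text (GPT-2 defines no padding token, so a single prompt is not padded)
inputs = tokenizer(text, return_tensors="pt", max_length=50, truncation=True)

# Perform inference, reusing the end-of-sequence token as the pad token
outputs = model.generate(**inputs, pad_token_id=tokenizer.eos_token_id)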

# Decode generated text
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print("Generated Text:", generated_text)
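
Conceptually, generation repeatedly asks the model for next-token scores and appends a chosen token. Below is a minimal greedy-decoding sketch, with a hypothetical lookup table standing in for the language model; all names and scores are made up for illustration, and greedy selection is only one of several decoding strategies.

```python
# Hypothetical "model": next-token scores conditioned on the last token only
NEXT_TOKEN_SCORES = {
    "once": {"upon": 0.9, "more": 0.1},
    "upon": {"a": 0.8, "the": 0.2},
    "a": {"time": 0.7, "king": 0.3},
    "time": {"<eos>": 0.6, "there": 0.4},
}

def greedy_generate(prompt_tokens, max_new_tokens=10, eos_token="<eos>"):
    """Append the highest-scoring next token until EOS or the length limit."""
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        candidates = NEXT_TOKEN_SCORES.get(tokens[-1])
        if not candidates:
            break  # the toy model has no prediction for this token
        next_token = max(candidates, key=candidates.get)
        if next_token == eos_token:
            break
        tokens.append(next_token)
    return tokens

generated_tokens = greedy_generate(["once"])
print(" ".join(generated_tokens))
```

The length limit plays the same role as the `max_length` or `max_new_tokens` arguments of `generate`: it bounds the loop when no end-of-sequence token is produced.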

3. Question Answering (BERT):

In [None]:
# Load pre-trained BERT model and tokenizer for question answering
import torch
from transformers import AutoModelForQuestionAnswering

model_name = "bert-large-uncased-whole-word-masking-finetuned-squad"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForQuestionAnswering.from_pretrained(model_name)

# Input question and context
question = "What is the capital of France?"
context = "The capital of France is Paris."

# Tokenize input question and context
inputs = tokenizer(question, context, return_tensors="pt", truncation=True, padding=True)

# Perform inference
outputs = model(**inputs)

# Print the output (answer)
answer_start = torch.argmax(outputs.start_logits)
answer_end = torch.argmax(outputs.end_logits)
answer = tokenizer.decode(inputs.input_ids[0][answer_start:answer_end+1])
print("Answer:", answer)
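
Taking the two argmaxes independently can yield an end position that precedes the start. One common remedy is to score every valid (start, end) pair and keep the best one. The sketch below assumes made-up logit lists and a hypothetical `best_span` helper with a `max_answer_len` cap; it is an illustration, not the library's post-processing code.

```python
def best_span(start_logits, end_logits, max_answer_len=15):
    """Return (start, end) maximizing start_logit + end_logit with start <= end."""
    best, best_score = (0, 0), float("-inf")
    for start, s_logit in enumerate(start_logits):
        # Only consider ends at or after the start, within the length cap
        last = min(len(end_logits), start + max_answer_len)
        for end in range(start, last):
            score = s_logit + end_logits[end]
            if score > best_score:
                best, best_score = (start, end), score
    return best

# Made-up logits where the independent argmaxes would give start=3, end=1
start_logits = [0.1, 2.0, 0.3, 2.5, 0.0]
end_logits = [0.2, 3.0, 0.1, 0.4, 1.0]
span = best_span(start_logits, end_logits)
print(span)
```

The returned indices can then be used to slice the input token IDs exactly as in the cell above.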

4. Named Entity Recognition (BERT):

In [None]:
# Load a BERT model fine-tuned on CoNLL-2003 and its tokenizer for named entity recognition
from transformers import AutoModelForTokenClassification

model_name = "dbmdz/bert-large-cased-finetuned-conll03-english"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

# Input text
text = "Paris is a beautiful city located in France."

# Tokenize input text
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)

# Perform inference
outputs = model(**inputs)

# Print the output (tokens tagged as entities); logits have shape (batch, tokens, labels)
predicted_ids = outputs.logits.argmax(dim=-1)[0].tolist()
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
labels = [model.config.id2label[i] for i in predicted_ids]
named_entities = [(token, label) for token, label in zip(tokens, labels) if label != "O"]
print("Named Entities:", named_entities)

5. Sentiment Analysis (DistilBERT):

In [None]:
# Load a DistilBERT model fine-tuned for sentiment analysis (SST-2) and its tokenizer;
# the base "distilbert-base-uncased" checkpoint has no trained classification head
model_name = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Input text
text = "This movie is fantastic! I loved it."

# Tokenize input text
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)

# Perform inference
outputs = model(**inputs)

# Print the output (sentiment): pick the higher-scoring class and map it to its label
predicted_id = outputs.logits.argmax(dim=-1).item()
sentiment = model.config.id2label[predicted_id].capitalize()
print("Sentiment:", sentiment)

<h2>Pre-trained Models</h2>

BERT (Text Classification):

In [None]:
from transformers import BertTokenizer, BertForSequenceClassification

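# Load pre-trained BERT tokenizer and model.
# Note: the classification head on 'bert-base-uncased' is randomly initialized, so the
# predicted class (a generic name such as LABEL_0) is not meaningful until fine-tuning.
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForSequenceClassification.from_pretrained('bert-base-uncased')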

# Tokenize input text
inputs = tokenizer("This is a sample text for classification", return_tensors="pt")

# Perform inference
outputs = model(**inputs)

# Print classification result
predicted_class_index = outputs.logits.argmax().item()
predicted_class = model.config.id2label[predicted_class_index]
print("Predicted class:", predicted_class)

LLaMA (Data Augmentation):

In [None]:
from transformers import LlamaTokenizer, LlamaForCausalLM

# Load pre-trained LLaMA tokenizer and model.
# The checkpoint name below is an assumption: substitute any LLaMA-family causal LM
# checkpoint you have access to (this one is gated and requires approval on the Hub).
checkpoint = 'meta-llama/Llama-2-7b-hf'
tokenizer = LlamaTokenizer.from_pretrained(checkpoint)
model = LlamaForCausalLM.from_pretrained(checkpoint)

# Generate text based on prompt
input_text = "Translate this text into French: Hello, how are you?"
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)

# Print augmented text
augmented_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print("Augmented text:", augmented_text)

XLNet (Text Generation):

In [None]:
from transformers import XLNetTokenizer, XLNetLMHeadModel

# Load pre-trained XLNet tokenizer and model
tokenizer = XLNetTokenizer.from_pretrained('xlnet-base-cased')
model = XLNetLMHeadModel.from_pretrained('xlnet-base-cased')

# Generate text based on prompt
input_text = "The quick brown fox"
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(input_ids=inputs["input_ids"], max_length=50)

# Print generated text
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print("Generated text:", generated_text)

GPT (Text Completion):

In [None]:
from transformers import GPT2Tokenizer, GPT2LMHeadModel

# Load pre-trained GPT-2 tokenizer and model
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2LMHeadModel.from_pretrained('gpt2')

# Generate text completion based on prompt
# (pass the attention mask too, and reuse EOS as the pad token since GPT-2 defines none)
input_text = "Once upon a time, "
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs, max_length=100, num_return_sequences=1, pad_token_id=tokenizer.eos_token_id)

# Print text completion
completed_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print("Completed text:", completed_text)