# **Introduction to Basic NLP Concepts**
This Jupyter Notebook explores fundamental Natural Language Processing (NLP) techniques using modern deep learning models.
We'll cover:
- **Sentiment Analysis**
- **Text Classification**
- **Text Summarization**
- **Machine Translation**
- **Autoregressive Token Prediction (LLM Concepts)**

We will use **Hugging Face Transformers** and pre-trained models to accomplish these tasks.

In [1]:
# Install Dependencies
!pip install transformers datasets torch sentencepiece


In [2]:
# Import Required Libraries
import torch
from transformers import pipeline, AutoTokenizer, AutoModelForSequenceClassification



## **1. Sentiment Analysis**
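Sentiment analysis classifies text by the opinion it expresses, typically as positive, neutral, or negative. The default pipeline model used below was fine-tuned on SST-2 and outputs only `POSITIVE` or `NEGATIVE`, so a neutral sentence is forced into one of the two classes.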

In [None]:
# Load a pre-trained sentiment analysis model
sentiment_pipeline = pipeline("sentiment-analysis")

# Example texts
texts = [
    "I love this product! It's amazing.",
    "This is the worst experience I've ever had.",
    "It's okay, not too bad but not great either.",
]

# Analyze sentiment
results = sentiment_pipeline(texts)
for text, result in zip(texts, results):
    print(
        f"Text: {text}\nSentiment: {result['label']} (Confidence: {result['score']:.2f})\n"
    )

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision 714eb0f (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cpu


Text: I love this product! It's amazing.
Sentiment: POSITIVE (Confidence: 1.00)

Text: This is the worst experience I've ever had.
Sentiment: NEGATIVE (Confidence: 1.00)

Text: It's okay, not too bad but not great either.
Sentiment: NEGATIVE (Confidence: 0.99)
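
The third result above shows a mixed sentence labelled `NEGATIVE` with high confidence. A minimal sketch of why this can happen: a binary classifier turns two raw scores (logits) into probabilities with a softmax, so all probability mass is split between the two labels and none is left for "neutral". The logits below are hypothetical values chosen for illustration, not the model's actual outputs, and the label order is an assumption.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)
    # Subtract the largest logit first for numerical stability.
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Assumed label order and hypothetical logits (illustration only).
LABELS = ["NEGATIVE", "POSITIVE"]
example_logits = [2.3, -2.3]

probs = softmax(example_logits)
best = max(range(len(LABELS)), key=lambda i: probs[i])

# Same shape as one item returned by the sentiment pipeline.
prediction = {"label": LABELS[best], "score": probs[best]}
print(f"{prediction['label']} (Confidence: {prediction['score']:.2f})")
```

A modest gap between the two logits is already enough to push the winning probability close to 1, which is why a high confidence score from a two-class model does not imply the text is strongly negative.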



## **2. Text Classification**
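Zero-shot classification assigns a text to one of several candidate labels supplied at inference time, without fine-tuning a model on those labels. The scores it returns sum to 1 across the candidate labels.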

In [None]:
classifier = pipeline("zero-shot-classification")

# Example text
text = (
    "The stock market has been very volatile this year due to economic uncertainties."
)
labels = ["Finance", "Sports", "Politics", "Technology"]

# Classify text
result = classifier(text, candidate_labels=labels)
print(result)

No model was supplied, defaulted to facebook/bart-large-mnli and revision d7645e1 (https://huggingface.co/facebook/bart-large-mnli).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cpu


{'sequence': 'The stock market has been very volatile this year due to economic uncertainties.', 'labels': ['Finance', 'Sports', 'Technology', 'Politics'], 'scores': [0.6767146587371826, 0.12513014674186707, 0.10809970647096634, 0.09005558490753174]}
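
The raw dictionary above is hard to read at a glance. The sketch below copies the printed result and ranks the labels, flagging whether the top label clears a confidence threshold. The helper name `rank_labels` and the `min_score` threshold of 0.5 are illustrative choices, not part of the Transformers API.

```python
# Result dictionary copied from the cell output above.
result = {
    "sequence": "The stock market has been very volatile this year due to economic uncertainties.",
    "labels": ["Finance", "Sports", "Technology", "Politics"],
    "scores": [
        0.6767146587371826,
        0.12513014674186707,
        0.10809970647096634,
        0.09005558490753174,
    ],
}


def rank_labels(result, min_score=0.5):
    """Return (label, score) pairs sorted by score, plus a confidence flag."""
    ranked = sorted(
        zip(result["labels"], result["scores"]),
        key=lambda pair: pair[1],
        reverse=True,
    )
    top_label, top_score = ranked[0]
    return ranked, top_label, top_score >= min_score


ranked, top_label, is_confident = rank_labels(result)
for label, score in ranked:
    print(f"{label:<12}{score:.3f}")
print(f"Top label: {top_label} (confident: {is_confident})")
```

With a threshold like this, a text that matches none of the candidate labels well can be routed to an "other" bucket instead of being assigned the least-bad label.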


## **3. Text Summarization**
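Summarization condenses a long text while retaining key information. The sample input here is only two sentences long, so the model returns it almost unchanged; longer inputs give more meaningful summaries.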

In [None]:
summarizer = pipeline("summarization")

long_text = """Hugging Face is a company that provides tools for NLP and machine learning. It has open-source libraries like Transformers, which enables users to easily access state-of-the-art NLP models."""

summary = summarizer(long_text, max_length=50, min_length=20, do_sample=False)
print("Summary:", summary[0]["summary_text"])

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cpu
Your max_length is set to 50, but your input_length is only 44. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=22)


Summary:  Hugging Face is a company that provides tools for NLP and machine learning . It has open-source libraries like Transformers, which enables users to easily access state-of-the-art NLP models .
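
The warning above notes that `max_length=50` exceeds the 44-token input and suggests `max_length=22`. One way to avoid hard-coding these limits is to derive them from the input length. The helper below is a hypothetical sketch: its name, the 0.5 ratio, and the choice of `min_length` as half of `max_length` are assumptions, and the token count is taken from the warning rather than computed here.

```python
def suggest_summary_lengths(input_length, ratio=0.5, floor=5):
    """Pick (min_length, max_length) as fractions of the input token count.

    input_length: number of tokens in the text to summarize.
    ratio:        target summary size relative to the input (assumed 0.5).
    floor:        smallest max_length to allow, so tiny inputs still work.
    """
    if input_length <= 0:
        raise ValueError("input_length must be positive")
    max_length = max(floor, int(input_length * ratio))
    min_length = max(1, max_length // 2)
    return min_length, max_length


# 44 is the input_length reported in the warning above.
min_len, max_len = suggest_summary_lengths(44)
print(f"min_length={min_len}, max_length={max_len}")

# These values could then be passed to the pipeline, for example:
# summarizer(long_text, max_length=max_len, min_length=min_len, do_sample=False)
```

In practice the token count would come from the model's tokenizer, since word counts and token counts differ.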


## **4. Machine Translation**
Machine translation converts text from one language to another. This example translates English to French.

In [None]:
translator = pipeline("translation_en_to_fr")
text_to_translate = "The weather is very nice today."
translated_text = translator(text_to_translate)
print("French Translation:", translated_text[0]["translation_text"])

No model was supplied, defaulted to google-t5/t5-base and revision a9723ea (https://huggingface.co/google-t5/t5-base).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cpu


French Translation: Le temps est très agréable aujourd'hui.
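
The default model here, T5, is a text-to-text model: the task is selected by a plain-text prefix prepended to the input, and the pipeline adds that prefix automatically. The sketch below shows roughly what the model receives. The helper `build_t5_input` and the `T5_TASK_PREFIXES` table are illustrative names, and the set of language pairs listed is an assumption based on the translation tasks the original T5 checkpoints were trained on.

```python
# Task prefixes used by the original T5 checkpoints (assumed list).
T5_TASK_PREFIXES = {
    ("en", "fr"): "translate English to French: ",
    ("en", "de"): "translate English to German: ",
    ("en", "ro"): "translate English to Romanian: ",
}


def build_t5_input(text, src="en", tgt="fr"):
    """Prepend the task prefix that tells T5 which translation to perform."""
    try:
        prefix = T5_TASK_PREFIXES[(src, tgt)]
    except KeyError:
        raise ValueError(f"No known T5 prefix for {src} -> {tgt}") from None
    return prefix + text


model_input = build_t5_input("The weather is very nice today.")
print(model_input)
```

A request outside this table, such as French to English, raises an error in this sketch; for such pairs a different pre-trained translation model would be needed.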


## **5. Conclusion**
- **Sentiment Analysis**: Understanding opinions in text.
- **Text Classification**: Categorizing text based on its content.
- **Summarization**: Condensing long text while keeping its key points.
- **Machine Translation**: Translating between languages.

Hugging Face's pre-trained models make these tasks **accessible and easy to implement**. 🚀