# **Hugging Face**


Author: Tassawar Abbas\
Email: abbas829@gmail.com\
Kaggle: https://www.kaggle.com/abbas829\
GitHub: https://www.github.com/abbas829


# Introducing Hugging Face: Revolutionizing AI Development

In recent years, Hugging Face has emerged as a game-changer in the field of artificial intelligence (AI) development. With its user-friendly interfaces, extensive libraries, and state-of-the-art models, Hugging Face has democratized AI and accelerated innovation in natural language processing (NLP) and beyond. Let's delve into what Hugging Face is all about and how it's transforming the AI landscape.

## What is Hugging Face?

Hugging Face is an AI research organization and a leading provider of open-source libraries and tools for NLP. Founded in 2016, Hugging Face has quickly gained popularity among developers, researchers, and industry professionals for its contributions to the AI community. The company is committed to advancing AI research and democratizing access to cutting-edge models and technologies.

## Key Features

### 1. Transformers Library

At the heart of Hugging Face is the Transformers library, a comprehensive collection of pre-trained models for NLP tasks such as text classification, sentiment analysis, machine translation, and more. These models, ranging from small to large architectures, are trained on massive datasets and fine-tuned for various downstream tasks, allowing developers to leverage state-of-the-art performance with minimal effort.

### 2. Model Hub

Hugging Face provides a centralized Model Hub, where users can discover, share, and deploy pre-trained models for a wide range of NLP tasks. The Model Hub hosts thousands of models contributed by the community, including models developed by Hugging Face and models from leading research institutions and industry partners. This extensive repository enables researchers and practitioners to access a diverse array of models and experiment with different architectures and capabilities.

### 3. 🤗 Accelerated Inference

Hugging Face offers 🤗 Accelerated Inference, a cloud-based service that enables fast and scalable deployment of AI models in production. With 🤗 Accelerated Inference, users can seamlessly deploy models for inference, monitor performance, and scale resources as needed, streamlining the deployment process and reducing time-to-market for AI applications.

## Getting Started with Hugging Face

Getting started with Hugging Face is easy and intuitive. Whether you're a seasoned AI researcher or a novice developer, Hugging Face provides comprehensive documentation, tutorials, and examples to help you get up and running quickly. Here's a simple example of using Hugging Face's Transformers library to perform sentiment analysis:

```python
from transformers import pipeline

# Load sentiment analysis pipeline
classifier = pipeline("sentiment-analysis")

# Analyze sentiment of text
result = classifier("I love Hugging Face!")
print(result)
```



# Brief Overview of Transformers

Transformers represent a groundbreaking architecture in the field of natural language processing (NLP) and have significantly advanced the state-of-the-art in various NLP tasks. Originally introduced by Vaswani et al. in the paper "Attention Is All You Need" in 2017, transformers have since become the foundation for many cutting-edge NLP models, including BERT, GPT, RoBERTa, and many others.

## Key Features

### Self-Attention Mechanism

Transformers rely on a self-attention mechanism, allowing them to weigh the importance of different words in a sentence based on their contextual relevance. This mechanism enables transformers to capture long-range dependencies and relationships within the input text more effectively compared to traditional recurrent neural networks (RNNs) and convolutional neural networks (CNNs).
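To make the weighting idea concrete, here is a minimal, dependency-free sketch of scaled dot-product self-attention. It assumes the queries, keys, and values are all the raw input vectors; a real transformer first applies learned projection matrices, which are omitted here. The toy `tokens` values are made up for illustration.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x):
    """Scaled dot-product self-attention with Q = K = V = x (no learned weights)."""
    d = len(x[0])
    # Similarity of every token with every other token, scaled by sqrt(d).
    scores = [
        [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in x]
        for q in x
    ]
    # Each row of weights sums to 1 and says how much a token attends to the others.
    weights = [softmax(row) for row in scores]
    # Each output vector is a weighted average of all token vectors.
    output = [
        [sum(w * v[j] for w, v in zip(w_row, x)) for j in range(d)]
        for w_row in weights
    ]
    return weights, output

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three toy 2-d "embeddings"
weights, output = self_attention(tokens)
for row in weights:
    print([round(w, 3) for w in row])
```

Every token attends to every other token in a single step, which is why distance in the sentence does not limit what the model can relate.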

### Multi-Head Attention

In addition to self-attention, transformers employ multi-head attention, which enhances the model's capacity to focus on different parts of the input text simultaneously. By attending to multiple representation subspaces, multi-head attention enables transformers to capture diverse linguistic patterns and semantic information, leading to more robust and expressive representations.
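The following sketch illustrates the "multiple subspaces" idea under a strong simplification: each head simply works on its own slice of the embedding, and the head outputs are concatenated. Real implementations give every head its own learned query, key, and value projections plus a final output projection; those are left out, and the input values are invented for the example.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(q, k, v):
    """Scaled dot-product attention for lists of vectors."""
    d = len(q[0])
    scores = [
        [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        for qi in q
    ]
    weights = [softmax(row) for row in scores]
    return [
        [sum(w * vj[c] for w, vj in zip(w_row, v)) for c in range(len(v[0]))]
        for w_row in weights
    ]

def multi_head_attention(x, num_heads):
    """Split each vector into num_heads slices, attend per slice, concatenate."""
    d_model = len(x[0])
    if d_model % num_heads != 0:
        raise ValueError("d_model must be divisible by num_heads")
    d_head = d_model // num_heads
    heads = []
    for h in range(num_heads):
        # Simplification: a slice stands in for a learned per-head projection.
        part = [tok[h * d_head:(h + 1) * d_head] for tok in x]
        heads.append(attention(part, part, part))
    # Concatenate the per-head outputs back into d_model-sized vectors.
    return [[val for head in heads for val in head[i]] for i in range(len(x))]

x = [[1.0, 0.0, 0.5, 0.5], [0.0, 1.0, 0.5, -0.5]]
out = multi_head_attention(x, num_heads=2)
print(out)
```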

### Positional Encoding

Self-attention on its own is insensitive to word order, so transformers add positional encodings to the token embeddings to tell the model where each token sits in the input sequence. This lets the model account for word order and sequential relationships, which is essential for tasks such as language modeling and sequence generation.
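One common choice is the fixed sinusoidal encoding from "Attention Is All You Need"; many later models learn their position embeddings instead. The sketch below computes the sinusoidal version in plain Python. The function name and the small `seq_len` and `d_model` values are chosen only for illustration.

```python
import math

def positional_encoding(seq_len, d_model):
    """Sinusoidal encodings: sine on even dimensions, cosine on odd ones."""
    pe = []
    for pos in range(seq_len):
        row = []
        for i in range(d_model):
            # Dimensions 2k and 2k+1 share the same wavelength.
            angle = pos / (10000 ** (2 * (i // 2) / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        pe.append(row)
    return pe

pe = positional_encoding(seq_len=4, d_model=4)
for row in pe:
    print([round(v, 4) for v in row])
```

Each row is added to the embedding of the token at that position, so two identical words at different positions end up with different input vectors.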

### Feedforward Neural Networks

Each transformer layer also contains a feedforward neural network (FFN) that is applied to every position independently. In the original architecture it consists of two fully connected layers with a non-linear activation between them, which lets the model apply non-linear transformations to the representations produced by the attention sublayer.
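As a rough illustration, the sketch below applies a two-layer network with a ReLU activation to each token vector separately. The weights are tiny hand-picked numbers, not trained values, and the helper names are hypothetical.

```python
def linear(x, weights, bias):
    """Affine map: x has d_in values, weights is d_in x d_out, bias has d_out values."""
    return [
        sum(xi * w_row[j] for xi, w_row in zip(x, weights)) + bias[j]
        for j in range(len(bias))
    ]

def relu(v):
    """Element-wise ReLU activation."""
    return [max(0.0, a) for a in v]

def feed_forward(x, w1, b1, w2, b2):
    """Position-wise FFN: expand to d_ff, apply ReLU, project back to d_model."""
    return linear(relu(linear(x, w1, b1)), w2, b2)

# Toy weights (illustrative only): d_model = 2, d_ff = 3.
w1 = [[1.0, -1.0, 0.5], [0.0, 2.0, -0.5]]
b1 = [0.0, 0.0, 0.0]
w2 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b2 = [0.1, -0.1]

sequence = [[1.0, 2.0], [-1.0, 0.5]]
# The same weights are applied to every position independently.
outputs = [feed_forward(tok, w1, b1, w2, b2) for tok in sequence]
print(outputs)
```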

## Applications

Transformers have been applied to a wide range of NLP tasks, including:

- Text classification
- Sentiment analysis
- Named entity recognition
- Machine translation
- Question answering
- Summarization
- Text generation

## Conclusion

Transformers have revolutionized the field of NLP and continue to drive advancements in AI research and applications. With their ability to model long-range dependencies, capture contextual information, and generate coherent text, transformers have become indispensable tools for natural language understanding and generation tasks, paving the way for more sophisticated and intelligent AI systems.



In [7]:
# Installing the required modules
# !pip install transformers

In [8]:
# Installing the required modules
# !pip install datasets

## **Text Classification**

We can classify text using Hugging Face's prebuilt pipelines.
We will use **pipeline()** extensively.


# Brief Overview of Pipelines

Pipelines in the context of machine learning refer to a sequence of data processing components that are chained together to automate a workflow. They streamline model development, deployment, and evaluation by encapsulating multiple steps in a single object. In scikit-learn, a pipeline typically chains data preprocessing with model fitting and prediction. In Hugging Face's Transformers, `pipeline()` wraps a pre-trained model together with its preprocessing (such as tokenization) and postprocessing, so that inference takes a single call.

## Key Features

### Modularity

Pipelines allow for the modularization of machine learning workflows, enabling developers to break down complex tasks into smaller, more manageable components. Each component in the pipeline performs a specific task, such as data preprocessing, feature extraction, model training, or inference, making it easier to debug, maintain, and scale the workflow.
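The chaining idea can be sketched with a toy class. This is not the Transformers or scikit-learn API: `SimplePipeline`, the three step functions, and the tiny word lists are all hypothetical stand-ins that only show how independent steps compose.

```python
class SimplePipeline:
    """Toy pipeline: runs a list of (name, function) steps in order."""

    def __init__(self, steps):
        self.steps = list(steps)

    def __call__(self, data):
        # The output of each step becomes the input of the next one.
        for _name, step in self.steps:
            data = step(data)
        return data

def preprocess(text):
    """Lowercase the text and split it into word tokens."""
    return text.lower().split()

def model(tokens):
    """Stand-in for a model: counts words from two tiny hand-written lexicons."""
    positive = {"love", "great", "awesome"}
    negative = {"boring", "awful", "bad"}
    return sum(t in positive for t in tokens) - sum(t in negative for t in tokens)

def postprocess(score):
    """Turn the raw score into a labelled result."""
    return {"label": "POSITIVE" if score >= 0 else "NEGATIVE", "score": score}

toy = SimplePipeline([
    ("preprocess", preprocess),
    ("model", model),
    ("postprocess", postprocess),
])
result = toy("This restaurant is awesome")
print(result)
```

Because each step is a separate function, any one of them can be replaced (for example, a different tokenizer) without touching the others.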

### Automation

By automating the sequence of steps required for model development and deployment, pipelines help reduce manual intervention and minimize human error. Once configured, pipelines can execute the entire workflow with a single command, from data preprocessing to model evaluation, speeding up the development cycle and improving productivity.

### Flexibility

Pipelines provide flexibility in designing and customizing machine learning workflows to suit specific requirements and use cases. Developers can easily swap out components or add new ones to adapt the pipeline to changing data sources, model architectures, or business objectives, without having to restructure the entire workflow.

### Reproducibility

Pipelines promote reproducibility in machine learning experiments by encapsulating the entire workflow, including data preprocessing, model training, and evaluation, into a single entity. This allows researchers and practitioners to reproduce experimental results with ease and compare different models or configurations systematically.

## Applications

Pipelines find applications across various domains and use cases in machine learning, including:

- Text classification
- Sentiment analysis
- Image classification
- Object detection
- Time series forecasting
- Recommender systems
- Natural language processing

## Conclusion

Pipelines play a crucial role in automating and streamlining machine learning workflows, from data preprocessing to model deployment. By encapsulating multiple steps into a single entity, pipelines simplify the development process, improve productivity, and enhance reproducibility in machine learning experiments, making them indispensable tools for researchers, practitioners, and businesses alike.



## **Sentiment Analysis**

In [2]:
import transformers
from transformers import pipeline

pipe = pipeline("text-classification")
pipe("This movie is very boring")

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.

To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to see activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development

All PyTorch model weights were used when initializing TFDistilBertForSequenceClassification.

All the weights of TFDistilBertForSequenceClassification were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFDistilBertForSequenceClassification for predictions without further training.


[{'label': 'NEGATIVE', 'score': 0.9998028874397278}]
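The log above recommends naming the model and revision explicitly. A minimal sketch of doing so follows, reusing the model name and revision printed in that log. The `PINNED` dictionary and the `build_classifier` helper are illustrative names, and the import is placed inside the function only so that the settings can be inspected without loading the library.

```python
# Settings copied from the log message printed by the default pipeline.
PINNED = {
    "task": "text-classification",
    "model": "distilbert/distilbert-base-uncased-finetuned-sst-2-english",
    "revision": "af0f99b",
}

def build_classifier():
    """Create the pipeline with an explicit model and revision."""
    # Imported here so that PINNED can be read without transformers installed.
    from transformers import pipeline
    return pipeline(**PINNED)
```

Calling `build_classifier()("This movie is very boring")` should then use the same checkpoint on every machine, rather than whatever default the installed library version selects.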

In [3]:
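import transformers
from transformers import pipeline

# Note: roberta-large-mnli is a natural language inference (NLI) model, so its
# labels (such as NEUTRAL in the output) are NLI classes, not sentiment classes.
pipe = pipeline(model="roberta-large-mnli")
pipe("I do not like this movie")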

All PyTorch model weights were used when initializing TFRobertaForSequenceClassification.

All the weights of TFRobertaForSequenceClassification were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFRobertaForSequenceClassification for predictions without further training.


[{'label': 'NEUTRAL', 'score': 0.7139405012130737}]

**We can also pass several strings at once as a list**

In [4]:
# pipeline for text classification

pipe = pipeline("sentiment-analysis")
pipe(["This restaurant is awesome", "This restaurant is awful"])

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
All PyTorch model weights were used when initializing TFDistilBertForSequenceClassification.

All the weights of TFDistilBertForSequenceClassification were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFDistilBertForSequenceClassification for predictions without further training.


[{'label': 'POSITIVE', 'score': 0.9998743534088135},
 {'label': 'NEGATIVE', 'score': 0.9996669292449951}]
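Each result is a plain dictionary, so it is easy to post-process. The sketch below converts the two predictions printed above into signed scores. `signed_score` is a hypothetical helper, and it assumes the model only emits the labels `POSITIVE` and `NEGATIVE`.

```python
def signed_score(prediction):
    """Map a prediction to [-1, 1]: POSITIVE stays positive, anything else is negated."""
    sign = 1.0 if prediction["label"] == "POSITIVE" else -1.0
    return sign * prediction["score"]

# Predictions copied from the pipeline output above.
predictions = [
    {"label": "POSITIVE", "score": 0.9998743534088135},
    {"label": "NEGATIVE", "score": 0.9996669292449951},
]
texts = ["This restaurant is awesome", "This restaurant is awful"]

for text, pred in zip(texts, predictions):
    print(f"{signed_score(pred):+.4f}  {text}")
```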

## **Text Summarization**

We can summarize text with a Hugging Face pipeline.
To do so, we pass "summarization" as the task name to pipeline().

In [5]:
# text to be summarized
input_text = (
    "Start by providing your text input. It could be a sentence or a paragraph.\n"
    "Tokenization: The input is tokenized, which means breaking it down into smaller units like words or subwords. "
    "Tokens are the building blocks for NLP models.\n"
    "Model: The tokenized input is passed through a pre-trained NLP model. "
    "Hugging Face offers a wide range of models for different NLP tasks, such as sentiment analysis, "
    "question answering, and text generation.\n"
    "Prediction/Output: The model processes the tokenized input and generates "
    "a prediction or output specific to the task. For example, if it's sentiment analysis, "
    "it could predict whether the input is positive or negative."
)
print(input_text)

Start by providing your text input. It could be a sentence or a paragraph.
Tokenization: The input is tokenized, which means breaking it down into smaller units like words or subwords. Tokens are the building blocks for NLP models.
Model: The tokenized input is passed through a pre-trained NLP model. Hugging Face offers a wide range of models for different NLP tasks, such as sentiment analysis, question answering, and text generation.
Prediction/Output: The model processes the tokenized input and generates a prediction or output specific to the task. For example, if it's sentiment analysis, it could predict whether the input is positive or negative.


In [6]:
# Summarize the text; no model is specified, so the pipeline falls back to its default
summarizer = pipeline("summarization")
summarizer("Text Input: Start by providing your text input. It could be a sentence or a paragraph.Tokenization: The input is tokenized, which means breaking it down into smaller units like words or subwords. Tokens are the building blocks for NLP models.Model: The tokenized input is passed through a pre-trained NLP model. Hugging Face offers a wide range of models for different NLP tasks, such as sentiment analysis, question answering, and text generation.Prediction/Output: The model processes the tokenized input and generates a prediction or output specific to the task. For example, if it's sentiment analysis, it could predict whether the input is positive or negative.", min_length=5, max_length=30)

No model was supplied, defaulted to google-t5/t5-small and revision d769bba (https://huggingface.co/google-t5/t5-small).
Using a pipeline without specifying a model name and revision in production is not recommended.
All PyTorch model weights were used when initializing TFT5ForConditionalGeneration.

All the weights of TFT5ForConditionalGeneration were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFT5ForConditionalGeneration for predictions without further training.


[{'summary_text': 'the input is tokenized, which means breaking it down into smaller units like words or subwords . the model processes the tokenized input'}]

## **Named Entity Recognition**

Named Entity Recognition (NER) is a natural language processing (NLP) task that involves identifying named entities in text and classifying them into predefined categories such as person names, organizations, locations, dates, and more.


In [7]:
nlp = pipeline("ner")
example = "My name is Ahmad and i am going to Pakistan"

ner_results = nlp(example)
print(ner_results)


No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english and revision f2482bf (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
All PyTorch model weights were used when initializing TFBertForTokenClassification.

All the weights of TFBertForTokenClassification were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFBertForTokenClassification for predictions without further training.


[{'entity': 'I-PER', 'score': 0.99899393, 'index': 4, 'word': 'Ahmad', 'start': 11, 'end': 16}, {'entity': 'I-LOC', 'score': 0.9998221, 'index': 10, 'word': 'Pakistan', 'start': 35, 'end': 43}]
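The `start` and `end` fields are character offsets into the input string, which makes it straightforward to pull the entities back out. The helper below is a small sketch using the results printed above; `extract_entities` and its `min_score` threshold are illustrative choices, not part of the library.

```python
def extract_entities(text, ner_results, min_score=0.9):
    """Return (entity text, label) pairs using the character offsets in each result."""
    return [
        (text[r["start"]:r["end"]], r["entity"])
        for r in ner_results
        if r["score"] >= min_score
    ]

example = "My name is Ahmad and i am going to Pakistan"

# Results copied from the pipeline output above.
ner_results = [
    {"entity": "I-PER", "score": 0.99899393, "index": 4, "word": "Ahmad", "start": 11, "end": 16},
    {"entity": "I-LOC", "score": 0.9998221, "index": 10, "word": "Pakistan", "start": 35, "end": 43},
]

entities = extract_entities(example, ner_results)
print(entities)
```

For names that the tokenizer splits into several pieces, the token-classification pipeline also accepts an `aggregation_strategy` argument (for example `"simple"`) that groups the pieces into whole entities.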


## **Image Classification**

To use this pipeline, download an image in PNG format.

Click on the folder icon in the left-hand panel.

Upload the image and copy its path into the code.
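A classification call on the uploaded file could look like the sketch below. It assumes the `"image-classification"` pipeline task with its default model, plus the Pillow package for image loading. `classify_image` and `best_label` are hypothetical helper names, and the `sample` predictions are made-up values that only mimic the shape of the pipeline's output.

```python
def best_label(predictions):
    """Return the label with the highest score from a list of {'label', 'score'} dicts."""
    return max(predictions, key=lambda p: p["score"])["label"]

def classify_image(image_path):
    """Classify a local image file with the default image-classification pipeline."""
    # Imported lazily: needs transformers, a backend such as PyTorch, and Pillow.
    from transformers import pipeline
    classifier = pipeline("image-classification")
    return classifier(image_path)

# Made-up predictions (not real model output), used only to demonstrate best_label.
sample = [
    {"label": "tabby cat", "score": 0.71},
    {"label": "tiger cat", "score": 0.22},
]
print(best_label(sample))
```

With the image uploaded, `best_label(classify_image("path/to/image.png"))` would return the top prediction, where the path is a placeholder for the one copied from the file panel.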

In [9]:
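# Note: this cell generates an image from a text prompt with Stable Diffusion
# (text-to-image); it does not classify an existing image.
# It needs the torch and diffusers packages (e.g. pip install torch diffusers)
# and a CUDA-capable GPU; without torch it fails with the error shown below.
from diffusers import StableDiffusionPipeline
import torch

# Load the pre-trained text-to-image model in half precision
model_id = "runwayml/stable-diffusion-v1-5"
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
pipe = pipe.to("cuda")

# Generate one image from the prompt and save it as a PNG file
prompt = "a photo of an astronaut riding a horse on mars"
image = pipe(prompt).images[0]

image.save("astronaut_rides_horse.png")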


ModuleNotFoundError: No module named 'torch'
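The `ModuleNotFoundError` above means that PyTorch is not installed in the environment that ran the cell. A small check like the following, using only the standard library, can report missing packages before the heavy imports run. The `missing_modules` helper and the `required` list are illustrative.

```python
import importlib.util

def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

required = ["torch", "diffusers", "transformers"]
missing = missing_modules(required)
if missing:
    print("Install first: pip install " + " ".join(missing))
else:
    print("All required modules are available.")
```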