# Text Summarization using Hugging Face 

Using the transformers library, we can access Hugging Face's pretrained models and tools in two ways:
### Using Pipeline 
### Using models

In [1]:
%pip install transformers


Collecting transformers
  Downloading transformers-4.46.3-py3-none-any.whl (10.0 MB)
Collecting huggingface-hub<1.0,>=0.23.2
  Downloading huggingface_hub-0.30.2-py3-none-any.whl (481 kB)
Collecting tokenizers<0.21,>=0.20
  Downloading tokenizers-0.20.3-cp38-none-win_amd64.whl (2.4 MB)
Collecting safetensors>=0.4.1
  Downloading safetensors-0.5.3-cp38-abi3-win_amd64.whl (308 kB)
Collecting fsspec>=2023.5.0
  Downloading fsspec-2025.3.0-py3-none-any.whl (193 kB)
Installing collected packages: fsspec, huggingface-hub, tokenizers, safetensors, transformers
  Attempting uninstall: fsspec
    Found existing installation: fsspec 2023.1.0
    Uninstalling fsspec-2023.1.0:
      Successfully uninstalled fsspec-2023.1.0
Successfully installed fsspec-2025.3.0 huggingface-hub-0.30.2 safetensors-0.5.3 tokenizers-0.20.3 transformers-4.46.3

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
s3fs 2023.1.0 requires fsspec==2023.1.0, but you have fsspec 2025.3.0 which is incompatible.


In [2]:
%pip install torch

Collecting torch
  Downloading torch-2.4.1-cp38-cp38-win_amd64.whl (199.4 MB)
Installing collected packages: torch
Successfully installed torch-2.4.1
Note: you may need to restart the kernel to use updated packages.


## Using Pipeline

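The pipeline function wraps model loading, tokenization, and inference in a single call, and it supports most common text-related tasks.
The task is selected by name: use "summarization" for summarization, "text-classification" for text classification, and "text-generation" for text generation.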
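
As a rough illustration of how the task name selects the pipeline, the sketch below wraps pipeline in a small helper. The helper name build_pipeline and the KNOWN_TASKS tuple are assumptions made for this example, not part of the transformers API, and the tuple lists only the task names that appear in this notebook rather than every task the library supports.

```python
# Task names used in this notebook. This tuple is an illustrative assumption,
# not the full list of tasks that transformers supports.
KNOWN_TASKS = (
    "summarization",
    "text-classification",
    "text-generation",
    "sentiment-analysis",  # alias for "text-classification"
)


def build_pipeline(task, **kwargs):
    """Hypothetical helper: validate the task name, then create the pipeline."""
    if task not in KNOWN_TASKS:
        raise ValueError(f"Unknown task {task!r}; expected one of {KNOWN_TASKS}")
    # Imported lazily so the name check above works without loading the library.
    from transformers import pipeline
    return pipeline(task, **kwargs)


# Example usage (downloads a default model on the first call):
# generator = build_pipeline("text-generation")
# classifier = build_pipeline("text-classification")
```

Checking the name up front is only a convenience here; pipeline itself also rejects task names it does not recognise.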

In [4]:
from transformers import pipeline
 
# Load the summarization pipeline
summarizer = pipeline("summarization")
 
# Input text to summarize
text = """
Aishwarya Rai Bachchan (pronounced [ɛːʃʋəɾjᵊ ɾɑːj ˈbətːʃən]; née Rai; born 1 November 1973) is an Indian actress who is primarily known for her work in Hindi and Tamil films. Rai won the Miss World 1994 pageant and later established herself as one of the most-popular and influential celebrities in India. She has received numerous accolades for her acting, including two Filmfare Awards. In 2004, Time magazine named her one of the 100 most influential people in the world. In 2009, the Government of India honoured her with the Padma Shri and in 2012, the Government of France awarded her with the Order of Arts and Letters. She has often been called "the most beautiful woman in the world" by segments of the media.

While in college, Rai modelled and appeared in several television commercials, and entered the Miss India pageant, in which she was placed second. She was then crowned Miss World 1994, made her acting debut in Mani Ratnam's 1997 Tamil film Iruvar and had her Hindi film debut in Aur Pyaar Ho Gaya that year. Her first commercial success was the Tamil romantic drama Jeans (1998), which at the time was the most expensive Indian film. She achieved wider success and won two Filmfare Awards for Best Actress for her performances in Sanjay Leela Bhansali's romantic dramas Hum Dil De Chuke Sanam (1999) and Devdas (2002).
"""
 
# Generate a summary
summary = summarizer(text, max_length=70, min_length=25, do_sample=False)
print(summary[0]['summary_text'])

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.


 Aishwarya Rai Bachchan won the Miss World 1994 pageant . She is an actress who is primarily known for her work in Hindi and Tamil films . In 2004, Time magazine named her one of the 100 most influential people in the world .
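
The warning above recommends naming the model and revision explicitly. A minimal sketch of that, assuming the default reported in the warning is the model we want to keep using: the model name and revision are copied from the warning message, while the constant names and the load_pinned_summarizer helper are made up for this example.

```python
# Model name and revision copied from the warning message above.
SUMMARIZATION_MODEL = "sshleifer/distilbart-cnn-12-6"
SUMMARIZATION_REVISION = "a4f8f3e"


def load_pinned_summarizer():
    """Hypothetical helper: create the summarizer with an explicit model and revision."""
    # Imported inside the function so that defining it does not load the library.
    from transformers import pipeline
    return pipeline(
        "summarization",
        model=SUMMARIZATION_MODEL,
        revision=SUMMARIZATION_REVISION,
    )


# Example usage:
# summarizer = load_pinned_summarizer()
# summary = summarizer(text, max_length=70, min_length=25, do_sample=False)
# print(summary[0]["summary_text"])
```

Pinning both values means a later change to the library's default model should not silently change the summaries.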


## Using a Model and Tokenizer

In [10]:
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

In [12]:
# Load a pre-trained model and tokenizer
model_name = "facebook/bart-large-cnn"  # Example model for summarization
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
 
# Input text to summarize
text = """
Aishwarya Rai Bachchan (pronounced [ɛːʃʋəɾjᵊ ɾɑːj ˈbətːʃən]; née Rai; born 1 November 1973) is an Indian actress who is primarily known for her work in Hindi and Tamil films. Rai won the Miss World 1994 pageant and later established herself as one of the most-popular and influential celebrities in India. She has received numerous accolades for her acting, including two Filmfare Awards. In 2004, Time magazine named her one of the 100 most influential people in the world. In 2009, the Government of India honoured her with the Padma Shri and in 2012, the Government of France awarded her with the Order of Arts and Letters. She has often been called "the most beautiful woman in the world" by segments of the media.

While in college, Rai modelled and appeared in several television commercials, and entered the Miss India pageant, in which she was placed second. She was then crowned Miss World 1994, made her acting debut in Mani Ratnam's 1997 Tamil film Iruvar and had her Hindi film debut in Aur Pyaar Ho Gaya that year. Her first commercial success was the Tamil romantic drama Jeans (1998), which at the time was the most expensive Indian film. She achieved wider success and won two Filmfare Awards for Best Actress for her performances in Sanjay Leela Bhansali's romantic dramas Hum Dil De Chuke Sanam (1999) and Devdas (2002).
"""
 
# Tokenize the input text, truncating it to at most 512 tokens.
# BART needs no task prefix; a "summarize: " prefix is a T5 convention.
inputs = tokenizer.encode(text, return_tensors="pt", max_length=512, truncation=True)
 
# Generate the summary
summary_ids = model.generate(inputs, max_length=50, min_length=25, length_penalty=2.0, num_beams=4, early_stopping=True)
summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
 
print(summary)

Aishwarya Rai Bachchan is an Indian actress who is primarily known for her work in Hindi and Tamil films. She won the Miss World 1994 pageant and later established herself as one of the most-popular and influential celebrities in India
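
Because the tokenizer call above uses truncation=True with max_length=512, any text beyond that limit is dropped before the model sees it. One possible workaround, which is not part of the original notebook, is to split a long text into chunks and summarize each chunk separately. The sketch below uses a plain word count as a rough proxy for tokens; the helper name split_into_chunks and the default of 300 words are assumptions chosen for illustration.

```python
def split_into_chunks(text, max_words=300):
    """Split text into consecutive chunks of at most max_words words each."""
    words = text.split()
    return [
        " ".join(words[start:start + max_words])
        for start in range(0, len(words), max_words)
    ]


# Example usage with the model and tokenizer loaded above:
# for chunk in split_into_chunks(text):
#     ids = tokenizer.encode(chunk, return_tensors="pt", max_length=512, truncation=True)
#     out = model.generate(ids, max_length=50, min_length=25, num_beams=4)
#     print(tokenizer.decode(out[0], skip_special_tokens=True))
```

Splitting on words can cut a sentence in half at a chunk boundary, so a sentence-aware splitter may give cleaner summaries.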


# Sentiment Analysis using Hugging Face

In [13]:
from transformers import pipeline

In [14]:
sentiment_analyzer = pipeline("sentiment-analysis")

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision 714eb0f (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development
Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`


In [15]:
# Name the model explicitly to avoid the "No model was supplied" warning shown above
sentiment_analyzer = pipeline("sentiment-analysis", model="distilbert-base-uncased-finetuned-sst-2-english")
 


In [16]:
text = "I love listening to music!"
result = sentiment_analyzer(text)
print(result)
 

[{'label': 'POSITIVE', 'score': 0.9998131394386292}]
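
The pipeline returns a list with one dictionary per input text, each holding a label and a score. A small sketch of reading those fields, using the output printed above as literal data so it runs without loading a model; the variable names top and formatted are chosen for this example.

```python
# Output copied from the cell above.
result = [{'label': 'POSITIVE', 'score': 0.9998131394386292}]

top = result[0]  # one dictionary per input text
formatted = f"{top['label']} ({top['score']:.2%})"
print(formatted)
```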


In [17]:
texts = [
    "listening to music is fun!",
    "certain games are boring to play"
]
results = sentiment_analyzer(texts)
for result in results:
    print(result)
 

{'label': 'POSITIVE', 'score': 0.9998772144317627}
{'label': 'NEGATIVE', 'score': 0.9997655749320984}
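
When classifying a batch, it can help to pair each text with its predicted label and to flag low-confidence predictions. The sketch below does this with the results printed above as literal data. The helper name label_texts, the 0.9 threshold, and the "UNCERTAIN" label are assumptions for illustration and are not produced by the pipeline itself.

```python
def label_texts(texts, results, min_score=0.9):
    """Pair each text with its label, or "UNCERTAIN" when the score is below min_score."""
    labeled = []
    for text, result in zip(texts, results):
        label = result["label"] if result["score"] >= min_score else "UNCERTAIN"
        labeled.append((text, label))
    return labeled


# Inputs and outputs copied from the cell above.
texts = [
    "listening to music is fun!",
    "certain games are boring to play",
]
results = [
    {'label': 'POSITIVE', 'score': 0.9998772144317627},
    {'label': 'NEGATIVE', 'score': 0.9997655749320984},
]

for text, label in label_texts(texts, results):
    print(f"{label:10} {text}")
```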
