# Approach A: Extractive Summarization (Simpler)

In [7]:
# gensim.summarization was removed in gensim 4.0, so pin an older release
!pip install "gensim<4.0"



In [13]:
from gensim.summarization import summarize  # requires gensim < 4.0; the module was removed in gensim 4.0

long_text = """
The text summarization project aims to develop a system that can generate concise and coherent summaries of long text documents, capturing the main ideas and key information while preserving the original meaning and context. The project involves preprocessing the text data by tokenization and cleaning, selecting appropriate summarization techniques such as extractive or abstractive methods, and evaluating the generated summaries for coherence and accuracy. The deployed system will enable users to summarize large volumes of text efficiently, facilitating information retrieval, document understanding, and decision-making across various domains such as news aggregation, document summarization, and meeting transcripts analysis.
"""

# Generate a summary of roughly 50 words. gensim selects whole sentences,
# so the actual length may differ from word_count.
extractive_summary = summarize(long_text, word_count=50)

print("--- Extractive Summary ---")
print(extractive_summary)



--- Extractive Summary ---
The text summarization project aims to develop a system that can generate concise and coherent summaries of long text documents, capturing the main ideas and key information while preserving the original meaning and context.
The project involves preprocessing the text data by tokenization and cleaning, selecting appropriate summarization techniques such as extractive or abstractive methods, and evaluating the generated summaries for coherence and accuracy.
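
Because `gensim.summarization` is unavailable in gensim 4.0 and later, the idea behind extractive summarization can also be sketched without it. The block below is a minimal, frequency-based sentence scorer, not gensim's TextRank-based algorithm; the names `STOPWORDS`, `split_sentences`, and `frequency_summary` and the tiny stopword list are assumptions made for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real projects use a fuller one)
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "for", "by",
             "as", "that", "can", "will", "such", "while", "is", "are"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p.strip() for p in parts if p.strip()]


def content_words(text):
    """Lowercased word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def frequency_summary(text, num_sentences=2):
    """Keep the sentences whose words are most frequent in the whole text."""
    sentences = split_sentences(text)
    freq = Counter(content_words(text))

    def score(sentence):
        tokens = content_words(sentence)
        # Average frequency, so long sentences are not favored automatically
        return sum(freq[w] for w in tokens) / max(len(tokens), 1)

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:num_sentences])  # restore original order
    return " ".join(sentences[i] for i in chosen)


demo_text = "Cats sleep a lot. Cats hunt mice at night. The weather was mild."
print(frequency_summary(demo_text, num_sentences=2))
```

Calling `frequency_summary(long_text, num_sentences=2)` would apply the same scoring to the example text above. The selected sentences may differ from gensim's, since the ranking method is different.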


# Approach B: Abstractive Summarization (Advanced but Accessible)

In [14]:
# The summarization pipeline needs only one deep learning backend; PyTorch is used here
!pip install transformers torch


In [15]:
from transformers import pipeline

# Load a pre-trained summarization model
# Models to try: "t5-small", "facebook/bart-large-cnn", "google/pegasus-xsum"
summarizer = pipeline("summarization", model="t5-small")

# Use the same long_text from the example above
abstractive_summary = summarizer(long_text, max_length=60, min_length=30, do_sample=False)

print("\n--- Abstractive Summary ---")
print(abstractive_summary[0]['summary_text'])

Device set to use cpu
Both `max_new_tokens` (=256) and `max_length`(=60) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)



--- Abstractive Summary ---
text summarization project aims to develop a system that can generate concise and coherent summaries of long text documents . project involves preprocessing the text data by tokenization and cleaning . system will enable users to summarize large volumes of text efficiently .
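
The example text mentions evaluating summaries, so a rough way to compare the two outputs may be useful. The sketch below computes a compression ratio and a unigram precision (the share of summary words that also appear in the source). These are crude checks, not ROUGE or any standard metric, and the helper names `tokenize`, `compression_ratio`, and `unigram_precision` are hypothetical.

```python
import re


def tokenize(text):
    """Lowercased alphanumeric tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def compression_ratio(source, summary):
    """Summary length divided by source length, both counted in tokens."""
    return len(tokenize(summary)) / max(len(tokenize(source)), 1)


def unigram_precision(source, summary):
    """Fraction of summary tokens that also occur somewhere in the source."""
    source_vocab = set(tokenize(source))
    summary_tokens = tokenize(summary)
    if not summary_tokens:
        return 0.0
    hits = sum(1 for t in summary_tokens if t in source_vocab)
    return hits / len(summary_tokens)


# Small stand-in strings so the block runs on its own
source = "the quick brown fox jumps over the lazy dog"
summary = "quick fox jumps high"
print(f"compression ratio: {compression_ratio(source, summary):.2f}")
print(f"unigram precision: {unigram_precision(source, summary):.2f}")
```

In the notebook, the same functions could be called with `long_text` as the source and either `extractive_summary` or `abstractive_summary[0]['summary_text']` as the summary. Neither number says anything about coherence, so they complement rather than replace reading the summaries.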
