In [146]:
###Defining a sample text to work on; the NLP techniques below will be used to generate a summary of it

In [98]:
text = """Apple Inc., founded in 1976 by Steve Jobs, Steve Wozniak, and Ronald Wayne, has grown into a global technology icon. Renowned for its innovation in consumer electronics, software, and services, Apple's impact spans from the revolutionary Macintosh computers to the ubiquitous iPhone, iPad, and Apple Watch. The company's commitment to seamless integration of hardware and software, along with its emphasis on design aesthetics and user experience, has garnered a fiercely loyal customer base worldwide. Beyond its products, Apple has pioneered advancements in digital media with iTunes and App Store, while also leading in sustainability efforts and corporate responsibility. Today, Apple continues to shape the future of technology through its relentless pursuit of excellence and its vision of making technology accessible and intuitive for all.Apple Inc., a cornerstone of Silicon Valley's tech landscape, has become synonymous with innovation and elegance in consumer electronics. Founded by Steve Jobs, Steve Wozniak, and Ronald Wayne, Apple revolutionized personal computing with the Macintosh and later redefined mobile technology with the iPhone, setting new benchmarks for usability and design. The company's ecosystem extends seamlessly across devices with services like iCloud and Apple Music, fostering a unique user experience. Apple's commitment to environmental stewardship and ethical manufacturing practices underscores its global influence, making it not just a leader in technology but also a model for corporate responsibility. With a blend of visionary leadership, cutting-edge technology, and a fiercely loyal customer base, Apple continues to shape the digital landscape and inspire the world with its innovative spirit."""

In [None]:
###Let's calculate the length of our sample text

In [99]:
len(text)

1743

In [None]:
###Installing the needed dependencies. spaCy is a popular open-source library for natural language processing (NLP) tasks in Python.
###It's designed to be fast, efficient, and easy to use, making it suitable for building practical applications and research in NLP.

In [100]:
#pip install spacy
#python -m spacy download en_core_web_sm

In [147]:
###We convert all words to lower case so that variants such as "Apple" and "apple" are counted as the same word.
###spaCy provides language-specific stop words for different languages.
###Stop words are common words (like "the", "is", etc.) that are filtered out during text processing to focus on more meaningful words.
###In this case, STOP_WORDS is a set of English stop words provided by spaCy's English language module (spacy.lang.en).

In [None]:
###The string module in Python provides constants that are useful for dealing with strings, including punctuation,
###which is a string of all ASCII punctuation characters (!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~).
###This import statement brings in the punctuation string, which can be used for text processing tasks such as removing punctuation marks
###from text.
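###A minimal sketch of removing punctuation with this constant, assuming plain ASCII input. The names strip_table, sample and
###cleaned are illustrative only and are not used elsewhere in this notebook.

```python
from string import punctuation

# Build a translation table that deletes every ASCII punctuation character.
strip_table = str.maketrans("", "", punctuation)

sample = "Apple Inc., founded in 1976!"
cleaned = sample.translate(strip_table)
print(cleaned)  # Apple Inc founded in 1976
```

###str.translate removes the characters in a single pass, which is usually simpler than looping over the string character by character.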


In [101]:
import spacy
from spacy.lang.en.stop_words import STOP_WORDS
from string import punctuation

In [102]:
nlp = spacy.load('en_core_web_sm')

In [None]:
###spacy.load loads the trained pipeline named 'en_core_web_sm'. Here's what each part of the name means:
###en: Indicates the language (English).
###core: Indicates a general-purpose pipeline covering tasks like POS tagging, parsing, lemmatization, and named entity recognition.
###web: Denotes the genre of text the pipeline was trained on (written web text).
###sm: Denotes the size of the pipeline ('sm' stands for small).


In [103]:
doc = nlp(text)

In [None]:
###nlp is the spaCy Language object that was loaded with the 'en_core_web_sm' model.
###When we call nlp(text), spaCy runs the model's processing pipeline on the input text and returns a Doc object
###holding the resulting linguistic annotations.
###This processing pipeline typically includes:
###Tokenization: Splitting the text into individual words or tokens.
###Part-of-Speech Tagging: Assigning grammatical categories (like noun, verb, adjective) to each token.
###Dependency Parsing: Analyzing the syntactic structure and dependencies between words.
###Named Entity Recognition (NER): Identifying entities such as persons, organizations, dates, etc., in the text.
###Sentence Boundary Detection: Segmenting the text into sentences.
###Lemmatization: Reducing words to their base forms.

In [104]:
tokens = [token.text.lower() for token in doc
         if not token.is_stop and
         not token.is_punct and
         token.text !='\n']

In [None]:
###Here we efficiently create a list of lowercase words from a Spacy doc, excluding stop words, punctuation, and newline characters, 
###facilitating further text analysis tasks.
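###The same filtering idea can be sketched with the standard library alone. This is only an illustration: DEMO_STOP_WORDS and
###simple_tokens are hypothetical names, the stop-word set is deliberately tiny, and the regular expression is far cruder than
###spaCy's tokenizer.

```python
import re

# Hypothetical, deliberately tiny stop-word set; spaCy's STOP_WORDS is much larger.
DEMO_STOP_WORDS = {"the", "is", "in", "by", "and", "a", "has", "into"}

def simple_tokens(text):
    """Lowercase the text, keep word-like runs, and drop stop words."""
    words = re.findall(r"[a-z0-9]+(?:'[a-z]+)?", text.lower())
    return [w for w in words if w not in DEMO_STOP_WORDS]

print(simple_tokens("Apple has grown into a global technology icon."))
# ['apple', 'grown', 'global', 'technology', 'icon']
```

###Because the regular expression only matches letters and digits, punctuation never reaches the output and needs no separate check.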

In [105]:
tokens


['apple',
 'inc.',
 'founded',
 '1976',
 'steve',
 'jobs',
 'steve',
 'wozniak',
 'ronald',
 'wayne',
 'grown',
 'global',
 'technology',
 'icon',
 'renowned',
 'innovation',
 'consumer',
 'electronics',
 'software',
 'services',
 'apple',
 'impact',
 'spans',
 'revolutionary',
 'macintosh',
 'computers',
 'ubiquitous',
 'iphone',
 'ipad',
 'apple',
 'watch',
 'company',
 'commitment',
 'seamless',
 'integration',
 'hardware',
 'software',
 'emphasis',
 'design',
 'aesthetics',
 'user',
 'experience',
 'garnered',
 'fiercely',
 'loyal',
 'customer',
 'base',
 'worldwide',
 'products',
 'apple',
 'pioneered',
 'advancements',
 'digital',
 'media',
 'itunes',
 'app',
 'store',
 'leading',
 'sustainability',
 'efforts',
 'corporate',
 'responsibility',
 'today',
 'apple',
 'continues',
 'shape',
 'future',
 'technology',
 'relentless',
 'pursuit',
 'excellence',
 'vision',
 'making',
 'technology',
 'accessible',
 'intuitive',
 'apple',
 'inc.',
 'cornerstone',
 'silicon',
 'valley',
 'te

In [106]:
tokens1 = []
stopwords = list(STOP_WORDS)
allowed_pos = ['ADJ','PROPN','VERB','NOUN']
for token in doc:
    if token.text in stopwords or token.text in punctuation:
        continue
    if token.pos_ in allowed_pos:
        tokens1.append(token.text)

In [None]:
###tokens1 = []: Initializes an empty list where the filtered tokens will be stored.
###stopwords = list(STOP_WORDS): Converts spaCy's set of English stop words (STOP_WORDS) into a list. Stop words are common words
###(like "the", "is", "at", etc.) that are typically filtered out during text processing to focus on more meaningful words.
###allowed_pos = ['ADJ', 'PROPN', 'VERB', 'NOUN']: Defines the allowed part-of-speech (POS) tags (ADJ for adjective, PROPN for proper noun,
###VERB for verb, NOUN for noun). Only tokens with one of these POS tags are appended to tokens1.

In [107]:
tokens1

['Apple',
 'Inc.',
 'founded',
 'Steve',
 'Jobs',
 'Steve',
 'Wozniak',
 'Ronald',
 'Wayne',
 'grown',
 'global',
 'technology',
 'icon',
 'Renowned',
 'innovation',
 'consumer',
 'electronics',
 'software',
 'services',
 'Apple',
 'impact',
 'spans',
 'revolutionary',
 'Macintosh',
 'computers',
 'ubiquitous',
 'iPhone',
 'iPad',
 'Apple',
 'Watch',
 'company',
 'commitment',
 'seamless',
 'integration',
 'hardware',
 'software',
 'emphasis',
 'design',
 'aesthetics',
 'user',
 'experience',
 'garnered',
 'loyal',
 'customer',
 'base',
 'products',
 'Apple',
 'pioneered',
 'advancements',
 'digital',
 'media',
 'iTunes',
 'App',
 'Store',
 'leading',
 'sustainability',
 'efforts',
 'corporate',
 'responsibility',
 'Today',
 'Apple',
 'continues',
 'shape',
 'future',
 'technology',
 'relentless',
 'pursuit',
 'excellence',
 'vision',
 'making',
 'technology',
 'accessible',
 'intuitive',
 'Apple',
 'Inc.',
 'cornerstone',
 'Silicon',
 'Valley',
 'tech',
 'landscape',
 'synonymous',
 '

In [108]:
from collections import Counter

In [109]:
word_freq = Counter(tokens)

In [None]:
###The code snippet utilizes Python's Counter class from the collections module to analyze word frequencies in a list of tokens (tokens).
###The goal is to count how often each unique word (token) appears in the tokens list, which typically represents processed text data.
###Counter provides an efficient way to perform frequency analysis on collections of data, such as lists of tokens in text processing.
###It allows quick access to counts of specific elements and supports operations typical of dictionaries.
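###A small sketch of the Counter behaviour described above, using a made-up token list (demo_tokens is illustrative and is not
###the tokens list from this notebook).

```python
from collections import Counter

demo_tokens = ["apple", "technology", "apple", "design", "technology", "apple"]
demo_freq = Counter(demo_tokens)

print(demo_freq["apple"])        # 3
print(demo_freq["missing"])      # 0, a missing key does not raise KeyError
print(demo_freq.most_common(2))  # [('apple', 3), ('technology', 2)]
```

###Returning 0 for a missing key is convenient, but it also means a misspelled or differently cased key fails silently.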

In [110]:
word_freq

Counter({'apple': 10,
         'technology': 6,
         'steve': 4,
         'inc.': 2,
         'founded': 2,
         'jobs': 2,
         'wozniak': 2,
         'ronald': 2,
         'wayne': 2,
         'global': 2,
         'innovation': 2,
         'consumer': 2,
         'electronics': 2,
         'software': 2,
         'services': 2,
         'macintosh': 2,
         'iphone': 2,
         'company': 2,
         'commitment': 2,
         'design': 2,
         'user': 2,
         'experience': 2,
         'fiercely': 2,
         'loyal': 2,
         'customer': 2,
         'base': 2,
         'digital': 2,
         'corporate': 2,
         'responsibility': 2,
         'continues': 2,
         'shape': 2,
         'making': 2,
         'landscape': 2,
         '1976': 1,
         'grown': 1,
         'icon': 1,
         'renowned': 1,
         'impact': 1,
         'spans': 1,
         'revolutionary': 1,
         'computers': 1,
         'ubiquitous': 1,
         'ipad': 1,
   

In [111]:
max_freq = max(word_freq.values())

In [None]:
###The goal of max_freq is to find the maximum frequency (count) of any word in the word_freq dictionary.

In [112]:
max_freq

10

In [113]:
for word in word_freq.keys():
    word_freq[word] = word_freq[word]/max_freq

In [None]:
###Here we iterate through each word (key) in the word_freq dictionary (or Counter object), which contains word frequencies.
###Divide each word's frequency (word_freq[word]) by the maximum frequency (max_freq) found earlier using max(word_freq.values()).
###This operation scales down all word frequencies proportionally to their maximum occurrence in the dataset.

In [None]:
###In short, we normalize the word frequencies in the Counter by dividing each one by the maximum frequency, so every score lies
###between 0 and 1 and words can be compared on a common scale.
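###The normalization can also be written as a dict comprehension that builds a new mapping and leaves the raw counts untouched.
###This is a sketch with made-up counts; demo_freq, max_count and normalized are illustrative names.

```python
from collections import Counter

demo_freq = Counter({"apple": 10, "technology": 6, "steve": 4, "icon": 1})
max_count = max(demo_freq.values())

# normalized score = count / highest count, so the most frequent word gets 1.0
normalized = {word: count / max_count for word, count in demo_freq.items()}
print(normalized)
# {'apple': 1.0, 'technology': 0.6, 'steve': 0.4, 'icon': 0.1}
```

###Keeping the raw counts in a separate object makes it possible to re-run the normalization cell without dividing twice.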

In [114]:
word_freq

Counter({'apple': 1.0,
         'technology': 0.6,
         'steve': 0.4,
         'inc.': 0.2,
         'founded': 0.2,
         'jobs': 0.2,
         'wozniak': 0.2,
         'ronald': 0.2,
         'wayne': 0.2,
         'global': 0.2,
         'innovation': 0.2,
         'consumer': 0.2,
         'electronics': 0.2,
         'software': 0.2,
         'services': 0.2,
         'macintosh': 0.2,
         'iphone': 0.2,
         'company': 0.2,
         'commitment': 0.2,
         'design': 0.2,
         'user': 0.2,
         'experience': 0.2,
         'fiercely': 0.2,
         'loyal': 0.2,
         'customer': 0.2,
         'base': 0.2,
         'digital': 0.2,
         'corporate': 0.2,
         'responsibility': 0.2,
         'continues': 0.2,
         'shape': 0.2,
         'making': 0.2,
         'landscape': 0.2,
         '1976': 0.1,
         'grown': 0.1,
         'icon': 0.1,
         'renowned': 0.1,
         'impact': 0.1,
         'spans': 0.1,
         'revolutionary': 

In [115]:
sent_token = [sent.text for sent in doc.sents]

In [148]:
###Our main purpose is to extract each sentence (as a string) from doc and store it in the list sent_token.
###Here we leverage spaCy's sentence segmentation to identify the sentences in the processed document (doc),
###which gives a concise and efficient way to collect sentence texts into a list.
###This is useful for tasks such as text summarization, sentiment analysis, and context analysis, where sentence-level information is required.
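###To see why a trained segmenter helps, here is a naive regular-expression splitter for comparison. This is a swapped-in
###heuristic, not what spaCy does; naive_split and demo are hypothetical names.

```python
import re

def naive_split(text):
    """Split after '.', '!' or '?' followed by whitespace (a rough heuristic)."""
    return [part for part in re.split(r"(?<=[.!?])\s+", text.strip()) if part]

demo = "Apple Inc. was founded in 1976. It grew fast."
print(naive_split(demo))
# ['Apple Inc.', 'was founded in 1976.', 'It grew fast.']
```

###The heuristic wrongly breaks after the abbreviation "Inc.", which is the kind of case doc.sents is designed to handle better.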

In [116]:
sent_token

['Apple Inc., founded in 1976 by Steve Jobs, Steve Wozniak, and Ronald Wayne, has grown into a global technology icon.',
 "Renowned for its innovation in consumer electronics, software, and services, Apple's impact spans from the revolutionary Macintosh computers to the ubiquitous iPhone, iPad, and Apple Watch.",
 "The company's commitment to seamless integration of hardware and software, along with its emphasis on design aesthetics and user experience, has garnered a fiercely loyal customer base worldwide.",
 'Beyond its products, Apple has pioneered advancements in digital media with iTunes and App Store, while also leading in sustainability efforts and corporate responsibility.',
 'Today, Apple continues to shape the future of technology through its relentless pursuit of excellence and its vision of making technology accessible and intuitive for all.',
 "Apple Inc., a cornerstone of Silicon Valley's tech landscape, has become synonymous with innovation and elegance in consumer elect

In [117]:
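sent_score = {}
for sent in sent_token:
    # Split on whitespace, so punctuation stays attached ("Jobs," never matches "jobs").
    for word in sent.split():
        if word.lower() in word_freq.keys():
            # The lookup uses the word as written, so a capitalized word such as
            # "Apple" adds 0 here (a Counter returns 0 for a missing key).
            if sent not in sent_score.keys():
                sent_score[sent] = word_freq[word]
            else:
                sent_score[sent] += word_freq[word]
        print(word)  # debug output: every whitespace-separated word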

Apple
Inc.,
founded
in
1976
by
Steve
Jobs,
Steve
Wozniak,
and
Ronald
Wayne,
has
grown
into
a
global
technology
icon.
Renowned
for
its
innovation
in
consumer
electronics,
software,
and
services,
Apple's
impact
spans
from
the
revolutionary
Macintosh
computers
to
the
ubiquitous
iPhone,
iPad,
and
Apple
Watch.
The
company's
commitment
to
seamless
integration
of
hardware
and
software,
along
with
its
emphasis
on
design
aesthetics
and
user
experience,
has
garnered
a
fiercely
loyal
customer
base
worldwide.
Beyond
its
products,
Apple
has
pioneered
advancements
in
digital
media
with
iTunes
and
App
Store,
while
also
leading
in
sustainability
efforts
and
corporate
responsibility.
Today,
Apple
continues
to
shape
the
future
of
technology
through
its
relentless
pursuit
of
excellence
and
its
vision
of
making
technology
accessible
and
intuitive
for
all.
Apple
Inc.,
a
cornerstone
of
Silicon
Valley's
tech
landscape,
has
become
synonymous
with
innovation
and
elegance
in
consumer
electronics.
Founded
by
Ste

In [None]:
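###The loop checks whether the lowercase version of word exists in word_freq, i.e. whether it is a word of interest from the frequency analysis.
###If sent (sentence) is not yet a key in sent_score, it initializes sent_score[sent] with word_freq[word].
###If sent already exists in sent_score, it adds word_freq[word] to sent_score[sent].
###This step accumulates a score for each sentence from the frequencies of its relevant words.
###Note that the lookup word_freq[word] uses the word as written rather than its lowercase form, so a capitalized word such as "Apple"
###contributes 0 (a Counter returns 0 for a missing key), and words with attached punctuation such as "Jobs," are skipped entirely.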
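###A hedged sketch of a scoring loop that normalizes each word before both the membership test and the lookup. It is one possible
###variant, not the cell that produced the scores shown in this notebook; score_sentences, demo_freq and demo_sents are hypothetical.

```python
from string import punctuation

def score_sentences(sentences, word_freq):
    """Sum the normalized frequency of every known word in each sentence."""
    scores = {}
    for sent in sentences:
        for raw in sent.split():
            # Normalize the word the same way for the test and the lookup:
            # lowercase it, then trim surrounding punctuation ("Jobs," -> "jobs").
            word = raw.lower().strip(punctuation)
            if word in word_freq:
                scores[sent] = scores.get(sent, 0.0) + word_freq[word]
    return scores

demo_freq = {"apple": 1.0, "technology": 0.6, "icon": 0.1}
demo_sents = ["Apple is a technology icon.", "Nothing relevant here."]
demo_scores = score_sentences(demo_sents, demo_freq)
print(demo_scores)
```

###Trimming punctuation has a cost: a key that itself ends in punctuation, such as 'inc.', no longer matches, and possessives like
###"Apple's" still do not match. Scoring over spaCy tokens instead of sent.split() would avoid both issues.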

In [118]:
sent_score

{'Apple Inc., founded in 1976 by Steve Jobs, Steve Wozniak, and Ronald Wayne, has grown into a global technology icon.': 1.2000000000000002,
 "Renowned for its innovation in consumer electronics, software, and services, Apple's impact spans from the revolutionary Macintosh computers to the ubiquitous iPhone, iPad, and Apple Watch.": 0.8999999999999999,
 "The company's commitment to seamless integration of hardware and software, along with its emphasis on design aesthetics and user experience, has garnered a fiercely loyal customer base worldwide.": 2.0,
 'Beyond its products, Apple has pioneered advancements in digital media with iTunes and App Store, while also leading in sustainability efforts and corporate responsibility.': 1.0,
 'Today, Apple continues to shape the future of technology through its relentless pursuit of excellence and its vision of making technology accessible and intuitive for all.': 2.5000000000000004,
 "Apple Inc., a cornerstone of Silicon Valley's tech landscape

In [119]:
#pip install pandas

In [120]:
import pandas as pd

In [121]:
pd.DataFrame(list(sent_score.items()),columns = ['Sentence','Score'])

                                            Sentence  Score
0  Apple Inc., founded in 1976 by Steve Jobs, Ste...    1.2
1  Renowned for its innovation in consumer electr...    0.9
2  The company's commitment to seamless integrati...    2.0
3  Beyond its products, Apple has pioneered advan...    1.0
4  Today, Apple continues to shape the future of ...    2.5
5  Apple Inc., a cornerstone of Silicon Valley's ...    0.8
6  Founded by Steve Jobs, Steve Wozniak, and Rona...    1.6
7  The company's ecosystem extends seamlessly acr...    1.1
8  Apple's commitment to environmental stewardshi...    2.2
9  With a blend of visionary leadership, cutting-...    1.9


In [122]:
from heapq import nlargest

In [123]:
num_sentences = 3
join_sent = nlargest(num_sentences, sent_score, key = sent_score.get)

In [None]:
###num_sentences sets how many top-scoring sentences to extract from sent_score.
###Finding Top Sentences: The nlargest function from the heapq module returns the num_sentences keys of sent_score with the largest scores.
###key = sent_score.get: Specifies that the ranking criterion is the value (score) stored in sent_score.
###Iterating over a dict yields its keys, and sent_score.get(sent) returns the score stored for each sentence (sent).
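###A small sketch of nlargest over a dict, using made-up scores (demo_scores, top_two and order are illustrative names).
###It also shows one way to put the selected sentences back into their original order.

```python
from heapq import nlargest

demo_scores = {"sentence A": 2.0, "sentence B": 0.9, "sentence C": 2.5, "sentence D": 1.2}

# Iterating a dict yields its keys; key=demo_scores.get ranks them by value.
top_two = nlargest(2, demo_scores, key=demo_scores.get)
print(top_two)  # ['sentence C', 'sentence A']

# nlargest returns items by descending score, not in document order.
# Sorting by original position restores reading order.
order = list(demo_scores)
print(sorted(top_two, key=order.index))  # ['sentence A', 'sentence C']
```

###Restoring document order is optional, but a summary often reads more naturally when sentences appear in their original sequence.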

In [124]:
" ".join(join_sent)

"Today, Apple continues to shape the future of technology through its relentless pursuit of excellence and its vision of making technology accessible and intuitive for all. Apple's commitment to environmental stewardship and ethical manufacturing practices underscores its global influence, making it not just a leader in technology but also a model for corporate responsibility. The company's commitment to seamless integration of hardware and software, along with its emphasis on design aesthetics and user experience, has garnered a fiercely loyal customer base worldwide."

In [125]:
#pip install transformers

In [126]:
#pip install tensorflow

In [127]:
#pip install flax

In [128]:
from transformers import pipeline

In [129]:
#pip install --upgrade tensorflow

In [130]:
#pip install tf-keras

In [131]:
from transformers import pipeline, AutoTokenizer, TFAutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-base")
model = TFAutoModelForSeq2SeqLM.from_pretrained("t5-base")

# Example usage with a pipeline
text_generator = pipeline("text2text-generation", model=model, tokenizer=tokenizer)


All PyTorch model weights were used when initializing TFT5ForConditionalGeneration.

All the weights of TFT5ForConditionalGeneration were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFT5ForConditionalGeneration for predictions without further training.


In [None]:
###TFAutoModelForSeq2SeqLM.from_pretrained("t5-base"): Loads a TensorFlow-compatible version (TFAutoModelForSeq2SeqLM) of the T5 model
###("t5-base") for sequence-to-sequence tasks. T5 is a text-to-text model: every task, including summarization and translation, is framed
###as generating output text from input text. It was trained with task prefixes such as "summarize: ", so prepending one to the input
###usually steers it toward the intended task.

###Once text_generator is instantiated, you can use it to generate text by passing input prompts or sequences to it.
###The pipeline handles tokenization using tokenizer and generates text based on the model (model).

In [132]:
text = """Apple Inc., founded in 1976 by Steve Jobs, Steve Wozniak, and Ronald Wayne, has grown into a global technology icon. Renowned for its innovation in consumer electronics, software, and services, Apple's impact spans from the revolutionary Macintosh computers to the ubiquitous iPhone, iPad, and Apple Watch. The company's commitment to seamless integration of hardware and software, along with its emphasis on design aesthetics and user experience, has garnered a fiercely loyal customer base worldwide. Beyond its products, Apple has pioneered advancements in digital media with iTunes and App Store, while also leading in sustainability efforts and corporate responsibility. Today, Apple continues to shape the future of technology through its relentless pursuit of excellence and its vision of making technology accessible and intuitive for all.Apple Inc., a cornerstone of Silicon Valley's tech landscape, has become synonymous with innovation and elegance in consumer electronics. Founded by Steve Jobs, Steve Wozniak, and Ronald Wayne, Apple revolutionized personal computing with the Macintosh and later redefined mobile technology with the iPhone, setting new benchmarks for usability and design. The company's ecosystem extends seamlessly across devices with services like iCloud and Apple Music, fostering a unique user experience. Apple's commitment to environmental stewardship and ethical manufacturing practices underscores its global influence, making it not just a leader in technology but also a model for corporate responsibility. With a blend of visionary leadership, cutting-edge technology, and a fiercely loyal customer base, Apple continues to shape the digital landscape and inspire the world with its innovative spirit."""

In [133]:
summary = text_generator(text, max_length = 100, min_length = 10, do_sample = False)

In [134]:
summary

[{'generated_text': 'Apple Inc. is a global technology icon. a global leader in consumer electronics, software, and services. a global technology icon. a global leader in technology and corporate responsibility. a blend of visionary leadership, cutting-edge technology, and corporate responsibility..'}]

In [None]:
###Creating a GUI for Text Summarization

In [136]:
import tkinter as tk
from tkinter import scrolledtext, messagebox
from transformers import pipeline, AutoTokenizer, TFAutoModelForSeq2SeqLM
import spacy
from spacy.lang.en.stop_words import STOP_WORDS
from string import punctuation
from collections import Counter
from heapq import nlargest

In [None]:
###Imports the tkinter library under the alias tk. tkinter is the standard GUI (Graphical User Interface) library for Python.

###scrolledtext: A tkinter module whose ScrolledText widget provides a scrolling, multiline text area that can be used to display or input
###large amounts of text.
###messagebox: This module provides functions to create and show pop-up message boxes for showing alerts, notifications, or asking for user
###input.

In [137]:
nlp = spacy.load('en_core_web_sm')

In [None]:
###Same step as above: load the spaCy model again, this time for the GUI

In [138]:
tokenizer = AutoTokenizer.from_pretrained("t5-base")
model = TFAutoModelForSeq2SeqLM.from_pretrained("t5-base")
text_generator = pipeline("text2text-generation", model=model, tokenizer=tokenizer)


In [139]:
def generate_summary(text, num_sentences=3):
    """Return an extractive summary made of the highest-scoring sentences."""
    doc = nlp(text)

    # Lowercased content words: no stop words, punctuation, or newline tokens.
    tokens = [token.text.lower() for token in doc
              if not token.is_stop and not token.is_punct and token.text != '\n']
    if not tokens:
        return ""  # nothing to score; max() below would fail on an empty Counter

    # Normalize each count by the highest count so every value lies in (0, 1].
    word_freq = Counter(tokens)
    max_freq = max(word_freq.values())
    for word in word_freq:
        word_freq[word] = word_freq[word] / max_freq

    # Score each sentence by summing the normalized frequencies of its words.
    sent_token = [sent.text for sent in doc.sents]
    sent_score = {}
    for sent in sent_token:
        for word in sent.split():
            # Look up the lowercased word; indexing with the original casing
            # would add 0 for capitalized words such as "Apple".
            key = word.lower()
            if key in word_freq:
                sent_score[sent] = sent_score.get(sent, 0) + word_freq[key]

    # Keep the num_sentences highest-scoring sentences.
    join_sent = nlargest(num_sentences, sent_score, key=sent_score.get)
    return " ".join(join_sent)

In [140]:
def summarize_text():
    input_text = text_input.get("1.0", tk.END)
    if input_text.strip() == "":
        messagebox.showerror("Error", "Please enter some text to summarize.")
        return
    
    summary = generate_summary(input_text)
    summary_output.configure(state='normal')
    summary_output.delete("1.0", tk.END)
    summary_output.insert(tk.END, summary)
    summary_output.configure(state='disabled')


In [None]:
###User Interaction: summarize_text facilitates user interaction by fetching user input, validating it, generating a summary based on the 
###input, and displaying the summary in an output widget (summary_output).
###Error Handling: Ensures that users are prompted to enter text if the input area is empty before attempting to generate a summary.
###Text Manipulation: Demonstrates how to manipulate Text widgets (text_input for input and summary_output for output) in a 
###tkinter-based GUI application.


In [141]:
# Create a themed GUI
root = tk.Tk()
root.title("Text Summarization Tool")
root.geometry("600x400")  # Set initial window size

''

In [142]:
# Text Input Box
text_input = scrolledtext.ScrolledText(root, width=60, height=10, wrap=tk.WORD, font=("Helvetica", 12))
text_input.pack(pady=20)

In [143]:
# Generate Summary Button
generate_button = tk.Button(root, text="Generate Summary", command=summarize_text, font=("Helvetica", 14), bg="#4CAF50", fg="white", relief=tk.RAISED)
generate_button.pack(pady=10)

In [144]:
# Summary Output Box
summary_output = scrolledtext.ScrolledText(root, width=60, height=5, wrap=tk.WORD, font=("Helvetica", 12))
summary_output.pack(pady=20)
summary_output.configure(state='disabled')

In [145]:
root.mainloop()