<a href="https://colab.research.google.com/github/JashVaghasiya/EasyMinutes-MOM/blob/main/ActionItemSummerization.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
from transformers import BertTokenizer, BertForSequenceClassification
from transformers import BartTokenizer, BartForConditionalGeneration
import torch

In [2]:
# Assuming the use of GPU if available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

In [3]:
# Load the BERT classifier
classifier_folder_path = '/content/drive/MyDrive/my_model'  # Update this path
classifier_tokenizer = BertTokenizer.from_pretrained(classifier_folder_path)
classifier_model = BertForSequenceClassification.from_pretrained(classifier_folder_path).to(device)

In [4]:
# Function to classify sentences as action/non-action
def classify_sentences(sentences):
    classifier_model.eval()
    action_sentences = []
    for sentence in sentences:
        inputs = classifier_tokenizer(sentence, return_tensors="pt", padding=True, truncation=True, max_length=512).to(device)
        with torch.no_grad():
            outputs = classifier_model(**inputs)
        logits = outputs.logits
        predictions = torch.argmax(logits, dim=-1)
        if predictions == 1:  # Assuming '1' denotes action items
            action_sentences.append(sentence)
    return action_sentences
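
The classifier above takes the `argmax` over the logits, so a sentence is kept as an action item even when class 1 wins by a narrow margin. The following is a minimal, dependency-free sketch of how the logits map to a decision and how a probability threshold could make the filter stricter. The `is_action` helper, its `threshold` parameter, and the logit values are hypothetical and not part of the notebook.

```python
import math

def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def is_action(logits, threshold=0.5):
    """Return True when the probability of class 1 reaches the threshold.

    `threshold` is a hypothetical tuning knob. With two classes and
    threshold=0.5 this agrees with argmax == 1 (apart from exact ties).
    """
    return softmax(logits)[1] >= threshold

confident = [-2.0, 3.0]  # invented logits: class 1 clearly wins
marginal = [0.0, 0.1]    # invented logits: class 1 barely wins

print(is_action(confident))                # kept
print(is_action(marginal))                 # kept at the default threshold
print(is_action(marginal, threshold=0.8))  # dropped by a stricter threshold
```

Whether a stricter threshold helps depends on how the classifier was trained, so it would need to be checked against labeled sentences.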

In [5]:
# Load the BART model for summarization
summarizer_model_name = 'facebook/bart-large-cnn'
summarizer_tokenizer = BartTokenizer.from_pretrained(summarizer_model_name)
summarizer_model = BartForConditionalGeneration.from_pretrained(summarizer_model_name).to(device)


In [6]:
def generate_summary(sentences):
    summarizer_model.eval()
    input_text = " ".join(sentences)
    # max_length and min_length bound the summary length in tokens
    # A single input needs no padding; truncate to BART's 1024-token input limit
    inputs = summarizer_tokenizer(input_text, return_tensors="pt", max_length=1024, truncation=True).to(device)
    summary_ids = summarizer_model.generate(
        inputs['input_ids'],
        attention_mask=inputs['attention_mask'],
        num_beams=5,
        max_length=80,  # Decrease for shorter summaries
        min_length=20,  # Adjust based on the shortest acceptable summary length
        length_penalty=0.5,  # Exponent on length in the beam score; lower values favor shorter summaries
        early_stopping=True
    )
    summary = summarizer_tokenizer.decode(summary_ids[0], skip_special_tokens=True)
    return summary
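
The direction of `length_penalty` is easy to get backwards. As documented for Hugging Face beam search, a hypothesis is scored by dividing its summed token log-probabilities by its length raised to `length_penalty`. Because log-probabilities are negative, a larger exponent favors longer sequences and a smaller one favors shorter sequences. The sketch below only illustrates that arithmetic; `beam_score` is a hypothetical helper and the two hypotheses are invented numbers, not output from BART.

```python
def beam_score(sum_logprobs, length, length_penalty):
    """Length-normalized score: summed log-probs / length ** length_penalty."""
    return sum_logprobs / (length ** length_penalty)

# Two invented finished hypotheses: a short one and a longer one.
shorter = {"sum_logprobs": -6.0, "length": 10}
longer = {"sum_logprobs": -9.0, "length": 20}

for penalty in (0.0, 0.5, 1.0, 2.0):
    s = beam_score(shorter["sum_logprobs"], shorter["length"], penalty)
    l = beam_score(longer["sum_logprobs"], longer["length"], penalty)
    winner = "shorter" if s > l else "longer"
    print(f"length_penalty={penalty}: shorter={s:.3f} longer={l:.3f} -> {winner}")
```

With these numbers the shorter hypothesis wins at 0.0 and 0.5, and the longer one wins at 1.0 and 2.0, so raising `length_penalty` pushes generation toward longer summaries.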

In [13]:

# Example usage
transcript_sentences = [
  "Jane: Good morning, everyone.",
  "Let’s start with the current status of our software and hardware updates for our next-gen universal remote.",
  "Alex, can you kick us off with the software update?",
  "Alex: Sure, Jane.",
  "The team has successfully integrated the latest voice recognition software.",
  "However, we're encountering latency issues when the remote processes multiple commands in quick succession.",
  "We’re considering optimizing the command queue management to resolve this.",
  "Sam: On the hardware side, we’ve upgraded the microcontroller to support the new software features Alex mentioned.",
  "We're also experimenting with a new battery design to extend the remote’s life.",
  "The prototype is promising, but we need to ensure it doesn't significantly increase the production cost.",
  "Mia: Design-wise, we’ve incorporated feedback from the last user testing session.",
  "The new button layout and the tactile feedback have been well-received in preliminary tests.",
  "However, aligning the new design with Sam’s hardware changes is our next challenge.",
  "Eric: From a QA perspective, we need to schedule a comprehensive testing phase for both the software updates and the new hardware components.",
  "It’s crucial to test the voice recognition in various environments to ensure it performs consistently.",
  "Jane: Great updates, team.",
  "Addressing the latency in voice command processing is our top priority.",
  "Alex, let’s brainstorm with your team on the command queue management.",
  "Sam, keep us posted on the battery design cost analysis.",
  "Mia, work with Sam to ensure the design and hardware are aligned.",
  "Eric, start planning the testing phase, focusing on voice recognition performance.",
  "Alex: Will do.",
  "I’ll also look into leveraging cloud processing to reduce the load on the remote’s processor, which might help with the latency.",
  "Sam: Understood.",
  "I’ll coordinate with the suppliers to get an estimate on the new battery costs and report back by next week.",
  "Mia: I’ll schedule a meeting with Sam tomorrow to discuss the design and hardware integration.",
  "Eric: I’m on it.",
  "I’ll prepare a detailed testing plan and reach out to the team for input.",
  "Jane: Excellent.",
  "Let’s reconvene next week to review progress on these action items.",
  "Thank you, everyone, for your hard work.",
  "Meeting adjourned."
]

# Classify and filter sentences
action_sentences = classify_sentences(transcript_sentences)

# Generate summary from action sentences
if action_sentences:
    action_summary = generate_summary(action_sentences)
    print("Summary of Action Items:\n", action_summary)
else:
    print("No action items found.")


Summary of Action Items:
 Alex: We're encountering latency issues when the remote processes multiple commands in quick succession. Sam: On the hardware side, we’ve upgraded the microcontroller to support the new software features. Eric: From a QA perspective, we need to schedule a comprehensive testing phase.


# Similarity Score and ROUGE Score

In [10]:
pip install gensim rouge



In [25]:
from rouge import Rouge
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.feature_extraction.text import TfidfVectorizer

def intrinsic_evaluation(original_text, generated_summary):
    rouge = Rouge()
    rouge_scores = rouge.get_scores(generated_summary, original_text, avg=True)

    tfidf_vectorizer = TfidfVectorizer()
    tfidf_matrix = tfidf_vectorizer.fit_transform([original_text, generated_summary])
    cosine_sim = cosine_similarity(tfidf_matrix[0:1], tfidf_matrix[1:2])[0][0]

    return rouge_scores, cosine_sim

original_text_str = " ".join(transcript_sentences)  # Join the sentence list into a single string
generated_summary_str = action_summary  # Already a single string returned by generate_summary

# Score the generated summary against the full transcript
rouge_scores, cosine_sim = intrinsic_evaluation(original_text_str, generated_summary_str)

print("ROUGE Scores:", rouge_scores)
print("Semantic Similarity (Cosine Similarity):", cosine_sim)

ROUGE Scores: {'rouge-1': {'r': 0.18316831683168316, 'p': 1.0, 'f': 0.30962342834544215}, 'rouge-2': {'r': 0.11708860759493671, 'p': 0.9024390243902439, 'f': 0.20728291113213917}, 'rouge-l': {'r': 0.18316831683168316, 'p': 1.0, 'f': 0.30962342834544215}}
Semantic Similarity (Cosine Similarity): 0.6362174562050944


# Diversity Score

In [15]:
def calculate_diversity(summary):
    words = summary.split()
    unique_words = set(words)
    diversity_score = len(unique_words) / len(words) if words else 0
    return diversity_score

# Example usage
diversity_score = calculate_diversity(action_summary)
print(f"Diversity score: {diversity_score}")


Diversity score: 0.8809523809523809


# Readability Score

In [19]:
pip install textstat



In [20]:
import textstat

def calculate_ari(text):
    ari_score = textstat.automated_readability_index(text)
    return ari_score

# Example usage on a placeholder sentence (not the generated summary)
text = "The quick brown fox jumps over the lazy dog."
ari_score = calculate_ari(text)
print(f"Automated Readability Index (ARI): {ari_score}")


Automated Readability Index (ARI): 1.9


# LSA (Latent Semantic Analysis)
The LSA similarity computed below is a simplified example. In practice, you may need to tune the number of topics (`num_topics`) in the LSI model to the length and complexity of your texts.

In [23]:
import nltk
nltk.download('punkt')

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.


True

In [24]:
from gensim import corpora
from gensim.models import LsiModel
from gensim.similarities import MatrixSimilarity
from nltk.tokenize import word_tokenize

def lsa_similarity(text1, text2):
    # Assuming text1 and text2 are strings here
    texts = [word_tokenize(text1.lower()), word_tokenize(text2.lower())]

    dictionary = corpora.Dictionary(texts)
    corpus = [dictionary.doc2bow(text) for text in texts]

    lsi = LsiModel(corpus, id2word=dictionary, num_topics=2)
    vec_lsi = [lsi[bow] for bow in corpus]  # project both texts into LSI space

    # LSI vectors have one dimension per topic, so that is the index's feature count
    index = MatrixSimilarity(vec_lsi, num_features=lsi.num_topics)

    # text1 is the "document" and text2 is the "summary"
    sims = index[vec_lsi[1]]  # similarity of the summary to each indexed text
    return sims[0]  # similarity between the summary and the document

# 'transcript_sentences' is a list of sentences and 'action_summary' is a single summary string
original_text = " ".join(transcript_sentences)  # Join the list into a single string
summary_text = action_summary  # Already a single string

# Example usage
similarity_score = lsa_similarity(original_text, summary_text)
print(f"LSA Similarity Score: {similarity_score}")


LSA Similarity Score: 0.760815441608429


The intrinsic evaluation above reports ROUGE scores, cosine similarity over TF-IDF vectors, a diversity score, the Automated Readability Index (ARI), and an LSA (Latent Semantic Analysis) similarity score. The sections below explain what each metric means and what its value says about the summary.

### ROUGE Scores:

- **ROUGE-1 and ROUGE-L Recall (r):** Both are 0.183, meaning that about 18.3% of the words in the original text are also found in the summary. Higher recall means the summary covers more of the original text's content. Because the reference here is the full transcript rather than a human-written summary, a low value is partly expected, but it still suggests the summary leaves out much of the original content.
- **ROUGE-1 and ROUGE-L Precision (p):** Both are 1.0, meaning that every word in the summary also appears in the original text, so the summary is effectively extractive. High precision is generally good, but combined with low recall it indicates the summary may be too brief.
- **ROUGE-1 and ROUGE-L F1-score (f):** Around 0.310. This is the harmonic mean of precision and recall, and it is pulled down by the low recall.
- **ROUGE-2 Recall (r):** 0.117, showing that 11.7% of the bigrams (pairs of consecutive words) from the original text appear in the summary. This is quite low and means most of the original text's phrasing is not represented.
- **ROUGE-2 Precision (p) and F1-score (f):** Precision is high (0.902) but the F1-score is low (0.207). This is the same pattern as ROUGE-1 and again indicates that the summary is likely too concise.

**Comment:** The ROUGE scores suggest that while the summary is very precise (it only includes content directly from the original), it might be too brief, missing a significant portion of the original content. This can be good for highly focused summaries but might miss important details for comprehensive understanding.
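
To make the precision and recall pattern concrete, here is a minimal sketch of ROUGE-1 computed by hand. It assumes whitespace tokenization and unique lowercase unigrams, which is a simplification and may not match the `rouge` package's exact tokenization or counting. The `rouge_1` helper and both example strings are hypothetical.

```python
def rouge_1(hypothesis, reference):
    """ROUGE-1 precision, recall and F1 over unique lowercase unigrams."""
    hyp = set(hypothesis.lower().split())
    ref = set(reference.lower().split())
    overlap = len(hyp & ref)
    precision = overlap / len(hyp) if hyp else 0.0
    recall = overlap / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return precision, recall, f1

# Invented texts: the hypothesis is copied word for word from part of the reference.
reference = "alex will optimize the command queue and sam will report battery costs next week"
hypothesis = "sam will report battery costs"

p, r, f = rouge_1(hypothesis, reference)
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

An extractive hypothesis scores a precision of 1.0 by construction, while recall is capped by how much of the reference it covers. This is the same shape as the scores reported above.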

### Semantic Similarity (Cosine Similarity): 0.636

This score is the cosine similarity between the TF-IDF vectors of the original text and the summary. TF-IDF vectors are built from word counts, so the score reflects shared vocabulary rather than meaning in a deeper sense. A score of 0.636 is moderate: the summary shares a substantial part of the original text's weighted vocabulary, but not all of it.

**Comment:** The summary has a decent overlap with the original text, which is positive, but there is room to capture more of the original's scope.
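
The sketch below shows why a cosine over word-count vectors measures vocabulary overlap. It uses raw term counts rather than TF-IDF weights, which is a simplification of what `TfidfVectorizer` does. The `cosine` helper and the example phrases are hypothetical.

```python
import math
from collections import Counter

def cosine(text_a, text_b):
    """Cosine similarity between raw term-count vectors of two texts."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

same_words = cosine("reduce latency now", "reduce latency now")
same_idea = cosine("reduce latency now", "lower the delay today")

print(same_words)  # identical wording: close to 1
print(same_idea)   # similar meaning but no shared words: 0
```

Two paraphrases with no words in common score zero, so a moderate score like 0.636 is best read as moderate lexical overlap.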

### Diversity Score: 0.881

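This metric is the ratio of unique words to the total number of words in the summary (a type-token ratio). A score close to 1 means few words are repeated. Short texts tend to score high, and because `calculate_diversity` splits on whitespace only, words that differ in case or attached punctuation count as different words.

**Comment:** A score of 0.881 corresponds to 37 unique tokens out of 42, so the summary repeats few words. This indicates low redundancy, although the summary is short enough that a high ratio is expected.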
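
Because the diversity score splits on whitespace, tokens such as `remote` and `remote,` are counted as different words. The sketch below compares the raw ratio with one computed after lowercasing and stripping surrounding punctuation. The `type_token_ratio` helper, its `normalize` flag, and the sample sentence are hypothetical.

```python
import string

def type_token_ratio(text, normalize=True):
    """Unique words / total words, optionally ignoring case and punctuation."""
    words = text.split()
    if normalize:
        words = [w.strip(string.punctuation).lower() for w in words]
        words = [w for w in words if w]  # drop tokens that were only punctuation
    return len(set(words)) / len(words) if words else 0.0

sample = "The remote works. the remote, however, lags."
raw = type_token_ratio(sample, normalize=False)
normalized = type_token_ratio(sample, normalize=True)

print(f"raw={raw:.3f} normalized={normalized:.3f}")
```

In this sample every raw token is distinct, while normalization reveals the repeated `the` and `remote`. The normalized ratio is therefore the more conservative estimate of vocabulary diversity.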

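### Automated Readability Index (ARI): 1.9 (example sentence)

ARI estimates the US grade level needed to comprehend a text. The score of 1.9 above was computed on the placeholder sentence "The quick brown fox jumps over the lazy dog.", not on the generated summary, so it says nothing about the summary's readability. A score of 1.9 means that sentence should be easily understandable by students in 2nd grade.

**Comment:** To assess the summary, pass `action_summary` to `calculate_ari`. A low ARI indicates text that is very easy to read, which is generally good, especially for broad audiences. However, if the content is complex and the ARI is very low, it might suggest oversimplification.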
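
The 1.9 reported for the example sentence can be reproduced by hand from the standard ARI formula, 4.71 × (characters / words) + 0.5 × (words / sentences) − 21.43. The sketch below assumes that characters means every non-whitespace character, including punctuation, and uses a naive sentence counter. The `ari` helper is hypothetical and is not `textstat`'s implementation.

```python
import re

def ari(text):
    """Automated Readability Index with simple whitespace and punctuation counting."""
    words = text.split()
    if not words:
        return 0.0
    chars = sum(len(w) for w in words)  # non-whitespace characters
    # Count runs of sentence-ending punctuation; assume at least one sentence.
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    return 4.71 * chars / len(words) + 0.5 * len(words) / sentences - 21.43

sentence = "The quick brown fox jumps over the lazy dog."
print(round(ari(sentence), 1))
```

The sentence has 36 non-whitespace characters, 9 words, and 1 sentence, giving 18.84 + 4.5 − 21.43 = 1.91, which rounds to the 1.9 shown in the cell output.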

### LSA Similarity Score: 0.761

This score is the cosine similarity between the original text and the summary after both are projected into the latent topic space of the LSI model. A score closer to 1 indicates higher similarity. Because the model is fitted on only these two texts, the value is a rough indicator rather than a robust measure.

**Comment:** This is a good score, suggesting that the summary captures a significant portion of the original text's thematic and conceptual content. Within the limits noted above, it points to the summary retaining the original's main themes.

**Overall Comment:**
The evaluation presents a mixed picture:
- The summary is precise and low in repetition, with moderate to good lexical and thematic alignment with the original text. Its readability was not actually measured, because the ARI was computed on a placeholder sentence.
- However, it might be too concise, missing significant details from the original text, as suggested by the low recall in ROUGE scores.

Improving the summary might involve balancing precision and recall better: keeping the summary accurate and concise while covering more of the original content's critical points and themes.