# STEPS FOR GENERATING TRANSCRIPTS INSIGHTS

1. Preprocessing: Clean and preprocess the transcript data, including removing any irrelevant information, converting to lowercase, and removing punctuation and stop words.

2. Sentence segmentation: Divide the transcript data into sentences using Natural Language Processing (NLP) techniques such as sentence tokenization.

3. Topic modeling: Use NLP techniques such as Latent Dirichlet Allocation (LDA) or Non-negative Matrix Factorization (NMF) to identify the topics discussed in the meeting.

4. Keyword extraction: Extract important keywords or phrases from the transcript data that summarize the main points discussed in each topic.

5. Sentiment analysis: Analyze the sentiment of each sentence in the transcript data to determine the overall tone of the meeting.

6. Action item extraction: Extract any action items or decisions made during the meeting, such as tasks assigned to specific individuals or deadlines.

7. Organizing and summarizing: Organize the extracted information into a concise, readable format, such as bullet points or a table, and summarize the main points discussed in each topic.

8. Writing the minutes: Write the minutes of the meeting, including the date, time, attendees, topics discussed, actions taken, and any other relevant information.


In [1]:
##Important Libraries
import nltk
import re
from nltk.corpus import stopwords
from nltk.tokenize import sent_tokenize, word_tokenize
from nltk.sentiment import SentimentIntensityAnalyzer
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
from sklearn.decomposition import NMF, LatentDirichletAllocation
import warnings
nltk.download('vader_lexicon')

[nltk_data] Downloading package vader_lexicon to
[nltk_data]     C:\Users\p_adi\AppData\Roaming\nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!


True

In [2]:
transcript_text = ['PREOPERATIVE DIAGNOSIS: , Morbid obesity.,POSTOPERATIVE DIAGNOSIS:  ,Morbid obesity.,PROCEDURE: , Laparoscopic antecolic antegastric Roux-en-Y gastric bypass with EEA anastomosis.,ANESTHESIA: , General with endotracheal intubation.,INDICATION FOR PROCEDURE: , This is a 30-year-old female, who has been overweight for many years.  She has tried many different diets, but is unsuccessful.  She has been to our Bariatric Surgery Seminar, received some handouts, and signed the consent.  The risks and benefits of the procedure have been explained to the patient.,PROCEDURE IN DETAIL:  ,The patient was taken to the operating room and placed supine on the operating room table.  All pressure points were carefully padded.  She was given general anesthesia with endotracheal intubation.  SCD stockings were placed on both legs.  Foley catheter was placed for bladder decompression.  The abdomen was then prepped and draped in standard sterile surgical fashion.  Marcaine was then injected through umbilicus.  A small incision was made.  A Veress needle was introduced into the abdomen.  CO2 insufflation was done to a maximum pressure of 15 mmHg.  A 12-mm VersaStep port was placed through the umbilicus.  I then placed a 5-mm port just anterior to the midaxillary line and just subcostal on the right side.  I placed another 5-mm port in the midclavicular line just subcostal on the right side, a few centimeters below and medial to that, I placed a 12-mm VersaStep port.  On the left side, just anterior to the midaxillary line and just subcostal, I placed a 5-mm port.  A few centimeters below and medial to that, I placed a 15-mm port.  I began by lifting up the omentum and identifying the transverse colon and lifting that up and thereby identifying my ligament of Treitz.  I ran the small bowel down approximately 40 cm and divided the small bowel with a white load GIA stapler.  I then divided the mesentery all the way down to the base of the mesentery with a LigaSure device.  I then ran the distal bowel down, approximately 100 cm, and at 100 cm, I made a hole at the antimesenteric portion of the Roux limb and a hole in the antimesenteric portion of the duodenogastric limb, and I passed a 45 white load stapler and fired a stapler creating a side-to-side anastomosis.  I reapproximated the edges of the defect.  I lifted it up and stapled across it with another white load stapler.  I then closed the mesenteric defect with interrupted Surgidac sutures.  I divided the omentum all the way down to the colon in order to create a passageway for my small bowel to go antecolic.  I then put the patient in reverse Trendelenburg.  I placed a liver retractor, identified, and dissected the angle of His.  I then dissected on the lesser curve, approximately 2.5 cm below the gastroesophageal junction, and got into a lesser space.  I fired transversely across the stomach with a 45 blue load stapler.  I then used two fires of the 60 blue load with SeamGuard to go up into my angle of His, thereby creating my gastric pouch.  I then made a hole at the base of the gastric pouch and had Anesthesia remove the bougie and place the OG tube connected to the anvil.  I pulled the anvil into place, and I then opened up my 15-mm port site and passed my EEA stapler.  I passed that in the end of my Roux limb and had the spike come out antimesenteric.  I joined the spike with the anvil and fired a stapler creating an end-to-side anastomosis, then divided across the redundant portion of my Roux limb with a white load GI stapler, and removed it with an Endocatch bag.  I put some additional 2-0 Vicryl sutures in the anastomosis for further security.  I then placed a bowel clamp across the bowel.  I went above and passed an EGD scope into the mouth down to the esophagus and into the gastric pouch.  I distended gastric pouch with air.  There was no air leak seen.  I could pass the scope easily through the anastomosis.  There was no bleeding seen through the scope.  We closed the 15-mm port site with interrupted 0 Vicryl suture utilizing Carter-Thomason.  I copiously irrigated out that incision with about 2 L of saline.  I then closed the skin of all incisions with running Monocryl.  Sponge, instrument, and needle counts were correct at the end of the case.  The patient tolerated the procedure well without any complications.']

In [14]:
transcript_text = " ".join(transcript_text)

In [15]:


def preprocess_transcript(transcript_text):
    # Remove special characters and digits
    transcript_text = re.sub("(\\d|\\W)+", " ", transcript_text)
    
    # Convert to lowercase
    transcript_text = transcript_text.lower()
    
    # Tokenize the words
    words = word_tokenize(transcript_text)
    
    # Remove stop words
    stop_words = set(stopwords.words("english"))
    words = [word for word in words if word not in stop_words]
    words1 = " ".join(words)
    
    return words1

def extract_topics(transcript_text, num_topics=5, model="nmf"):
    preprocessed_transcript = preprocess_transcript(transcript_text)
    
    # Vectorize the transcript text
    vectorizer = TfidfVectorizer()
    X = vectorizer.fit_transform([preprocessed_transcript])
    
    if model == "nmf":
        # Use NMF to extract topics
        nmf = NMF(n_components=num_topics, random_state=1)
        W = nmf.fit_transform(X)
        H = nmf.components_
    else:
        # Use LDA to extract topics
        lda = LatentDirichletAllocation(n_components=num_topics, random_state=1)
        W = lda.fit_transform(X)
        H = lda.components_
    
    # Get the feature names and topic words
    feature_names = vectorizer.get_feature_names()
    topic_words = [
        [
            feature_names[i]
            for i in topic.argsort()[:-10 - 1 :-1]
        ]
        for topic in H
    ]
    
    return topic_words

def extract_keywords(transcript_text, num_keywords=5):
    preprocessed_transcript = preprocess_transcript(transcript_text)
    
    # Vectorize the transcript text
    vectorizer = CountVectorizer()
    X = vectorizer.fit_transform([preprocessed_transcript])
    
    # Get the feature names and keyword counts
    feature_names = vectorizer.get_feature_names()
    keyword_counts = X.toarray().sum(axis=0)
    
    # Get the top keywords
    keywords = [
        feature_names[i]
        for i in keyword_counts.argsort()[:-num_keywords - 1 :-1]
    ]
    
    return keywords




In [16]:
word1 = preprocess_transcript(transcript_text)
word1

'preoperative diagnosis morbid obesity postoperative diagnosis morbid obesity procedure laparoscopic antecolic antegastric roux en gastric bypass eea anastomosis anesthesia general endotracheal intubation indication procedure year old female overweight many years tried many different diets unsuccessful bariatric surgery seminar received handouts signed consent risks benefits procedure explained patient procedure detail patient taken operating room placed supine operating room table pressure points carefully padded given general anesthesia endotracheal intubation scd stockings placed legs foley catheter placed bladder decompression abdomen prepped draped standard sterile surgical fashion marcaine injected umbilicus small incision made veress needle introduced abdomen co insufflation done maximum pressure mmhg mm versastep port placed umbilicus placed mm port anterior midaxillary line subcostal right side placed another mm port midclavicular line subcostal right side centimeters medial p

In [18]:
t_words = extract_topics(word1)
t_words

  return np.sqrt(res * 2)


[['placed',
  'port',
  'stapler',
  'mm',
  'side',
  'procedure',
  'load',
  'gastric',
  'anastomosis',
  'bowel'],
 ['cm',
  'anesthesia',
  'mesentery',
  'spike',
  'approximately',
  'seen',
  'abdomen',
  'go',
  'divided',
  'medial'],
 ['pressure',
  'umbilicus',
  'junction',
  'pouch',
  'co',
  'ran',
  'defect',
  'lifting',
  'across',
  'morbid'],
 ['fired',
  'across',
  'bowel',
  'limb',
  'omentum',
  'midaxillary',
  'anterior',
  'lifting',
  'bypass',
  'stockings'],
 ['anvil',
  'irrigated',
  'redundant',
  'mm',
  'sutures',
  'interrupted',
  'medial',
  'reverse',
  'surgical',
  'preoperative']]

In [19]:
k_word = extract_keywords(word1)
k_word

['placed', 'mm', 'port', 'stapler', 'load']

In [24]:
s_words = extract_sentiment(word1)
s_words

[{'neg': 0.082, 'neu': 0.866, 'pos': 0.052, 'compound': -0.9231}]

In [26]:
e_word = extract_action_items(word1)
e_word

[]

In [29]:
s1 = summarize_topics(t_words, s_words, e_word)
s1

[{'topic': ['placed',
   'port',
   'stapler',
   'mm',
   'side',
   'procedure',
   'load',
   'gastric',
   'anastomosis',
   'bowel'],
  'sentiment': 0.019224999999999992,
  'action_items': []}]

In [23]:
def extract_sentiment(transcript_text):
    # Initialize the sentiment analyzer
    sentiment_analyzer = SentimentIntensityAnalyzer()
    
    # Tokenize the transcript into sentences
    sentences = sent_tokenize(transcript_text)
    
    # Get the sentiment scores for each sentence
    sentiments = [sentiment_analyzer.polarity_scores(sentence) for sentence in sentences]
    
    return sentiments


In [27]:
def extract_action_items(transcript_text):
    # Regular expression pattern to match action items
    pattern = re.compile(r"^\s*(\w+)\s*will\s+(.+)\s+by\s+(.+)$", re.MULTILINE)
    
    # Extract the action items
    action_items = re.findall(pattern, transcript_text)
    
    return action_items

def summarize_topics(topics, sentiments, action_items):
    # Initialize a list to store the summarized information
    summary = []
    
    # Loop through the topics
    for topic, sentiment in zip(topics, sentiments):
        # Get the average sentiment for the topic
        avg_sentiment = sum(sentiment.values()) / len(sentiment.values())
        
        # Filter the action items for the topic
        topic_actions = [item for item in action_items if topic in item[1]]
        
        # Summarize the topic information
        topic_summary = {
            "topic": topic,
            "sentiment": avg_sentiment,
            "action_items": topic_actions
        }
        
        summary.append(topic_summary)
    
    return summary


In [30]:
import re
import torch
from transformers import AutoTokenizer, AutoModel

# Load the BERT model
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def extract_sentences(transcript_text):
    # Split the transcript text into sentences
    sentences = re.split(r"(?<=[.!?])\s+", transcript_text)
    
    return sentences

def predict_sentiment(sentences):
    # Encode the sentences with BERT
    inputs = tokenizer.batch_encode_plus(sentences, return_tensors="pt", padding=True)
    
    # Pass the encoded sentences through the BERT model
    with torch.no_grad():
        outputs = model(inputs["input_ids"], attention_mask=inputs["attention_mask"])
        
    # Get the sentiment scores for each sentence
    sentiment_scores = torch.sigmoid(outputs[0]).mean(dim=1)
    
    return sentiment_scores

def extract_action_items(transcript_text):
    # Regular expression pattern to match action items
    pattern = re.compile(r"^\s*(\w+)\s*will\s+(.+)\s+by\s+(.+)$", re.MULTILINE)
    
    # Extract the action items
    action_items = re.findall(pattern, transcript_text)
    
    return action_items

def summarize_topics(sentences, sentiment_scores, action_items):
    # Initialize a list to store the summarized information
    summary = []
    
    # Loop through the sentences and sentiment scores
    for sentence, sentiment_score in zip(sentences, sentiment_scores):
        # Filter the action items for the sentence
        sentence_actions = [item for item in action_items if sentence in item[1]]
        
        # Summarize the sentence information
        sentence_summary = {
            "sentence": sentence,
            "sentiment_score": sentiment_score,
            "action_items": sentence_actions
        }
        
        summary.append(sentence_summary)
    
    return summary

def write_minutes(summary, attendees, date, time):
    # Initialize the minutes text
    minutes = "Minutes of the Meeting\n\n"
    
    # Add the date, time, and attendees
    minutes += "Date: {}\n".format(date)
    minutes += "Time: {}\n".format(time)
    minutes += "Attendees: {}\n\n".format(", ".join(attendees))
    
    # Loop through the summarized sentences
    for sentence_summary in summary:
        # Add the sentence
        minutes += "Sentence: {}\n".format(sentence_summary["sentence"])
        
        # Add the sentiment score for the sentence
        minutes += "Sentiment Score: {:.2f}\n".format(sentence_summary["sentiment_score"])
        
        # Add the action items for the
        if topic_summary["action_items"]:
            minutes += "Action Items:\n"
            for item in topic_summary["action_items"]:
                minutes += "- {} will {} by {}\n".format(*item)
        minutes += "\n"
    
    return minutes


Downloading (…)okenizer_config.json:   0%|          | 0.00/28.0 [00:00<?, ?B/s]

To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to see activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development


Downloading (…)lve/main/config.json:   0%|          | 0.00/570 [00:00<?, ?B/s]

Downloading (…)solve/main/vocab.txt:   0%|          | 0.00/232k [00:00<?, ?B/s]

Downloading (…)/main/tokenizer.json:   0%|          | 0.00/466k [00:00<?, ?B/s]

Downloading (…)"pytorch_model.bin";:   0%|          | 0.00/440M [00:00<?, ?B/s]

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['cls.seq_relationship.bias', 'cls.seq_relationship.weight', 'cls.predictions.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.weight', 'cls.predictions.decoder.weight', 'cls.predictions.transform.LayerNorm.bias']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


In [31]:
def main():
    # The transcript text
    transcript_text = "John: Good morning everyone.\n\nJane: Hi John.\n\nJohn: Before we start, does anyone have any items to add to the agenda?\n\nJane: I would like to propose that we discuss our sales strategy for the next quarter.\n\nJohn: Great, let's add that to the agenda.\n\nJane: Also, I would like to follow up on the action item from the last meeting. I was supposed to create a report on our expenses and I will have that ready by tomorrow.\n\nJohn: Thanks, Jane. Can we add that to the agenda as well?\n\nJane: Sure, no problem.\n\nJohn: Alright, let's get started. First on the agenda, we have our sales strategy for the next quarter. Does anyone have any ideas on how we can improve our sales?\n\nJane: I was thinking that we could reach out to our current customers and offer them a loyalty discount.\n\nJohn: That's a great idea. Can you add that to our action items and have it ready for the next meeting?\n\nJane: Sure, I will have that done by next week.\n\nJohn: Alright, let's move on to the next item on the agenda. Does anyone have any questions or concerns regarding the expenses report?\n\nJane: No, I don't think so.\n\nJohn: Alright, then let's wrap up the meeting. Does anyone have any last minute items to add?\n\nJane: No, that's all from me.\n\nJohn: Great, then let's adjourn the meeting. Have a good day everyone.\n\nJane: You too.\n"
    
    # The attendees
    attendees = ["John", "Jane"]
    
    # The date and time of the meeting
    date = "09-Feb-2023"
    time = "10:00 AM"
    
    # Extract the sentences from the transcript text
    sentences = extract_sentences(transcript_text)
    
    # Predict the sentiment scores for the sentences
    sentiment_scores = predict_sentiment(sentences)
    
    # Extract the action items from the transcript text
    action_items = extract_action_items(transcript_text)
    
    # Summarize the sentences and sentiment scores
    summary = summarize_topics(sentences, sentiment_scores, action_items)
    
    # Write the minutes of the meeting
    minutes = write_minutes(summary, attendees, date, time)
    
    # Print the minutes
    print(minutes)

# Call the main function
if __name__ == "__main__":
    main()


TypeError: unsupported format string passed to Tensor.__format__

In [24]:
import spacy

def sentiment_analysis(transcripts):
    nlp = spacy.load("en_core_web_sm")
    
    transcript = " ".join(transcripts)
    doc = nlp(transcript)
    
    sentiments = []
    for sent in doc.sents:
        sentiments.append(sent._.sentiment)
    
    average_sentiment = sum(sentiments) / len(sentiments)
    return {'average_sentiment': average_sentiment}


In [25]:
transcripts = ["This is a sample meeting transcript. The meeting was productive and all attendees were positive about the outcome.",               "This is another meeting transcript. The meeting was unproductive and attendees were negative about the outcome."]
transcripts = " ".join(transcripts)
sentiment = sentiment_analysis(transcripts)



AttributeError: [E046] Can't retrieve unregistered extension attribute 'sentiment'. Did you forget to call the `set_extension` method?

In [15]:
from spacy.tokens import Token

Token.set_extension("sentiment", default=0, force=True)


In [26]:
import spacy
from spacy.tokens import Doc

def sentiment_score(doc):
    nlp = spacy.load("en_core_web_sm")
    
    transcript = " ".join(transcripts)
    doc = nlp(transcript)
    
    sentiments = []
    for sent in doc.sents:
        sentiments.append(sent._.sentiment)
    
    average_sentiment = sum(sentiments) / len(sentiments)
    # and return a sentiment score for the given doc
    return sentiment_score

Doc.set_extension("sentiment", getter=sentiment_score, force=True)

nlp = spacy.load("en_core_web_sm")
doc = nlp("This is a positive sentence.")
sentiment = doc._.sentiment
print(sentiment)


AttributeError: [E046] Can't retrieve unregistered extension attribute 'sentiment'. Did you forget to call the `set_extension` method?