<a href="https://colab.research.google.com/github/donmarcos/pharmacopia/blob/main/summarize.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## **Summarizing medical information such as warnings and indications with spaCy**

In [1]:
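import spacy
import textwrap
from spacy.lang.en.stop_words import STOP_WORDS
from string import punctuation
from heapq import nlargest

punctuation += '\n'            # also treat newlines as punctuation when filtering tokens
stopwords = set(STOP_WORDS)    # a set gives fast membership checks (same words as list(STOP_WORDS))
reduction_rate = 0.5           # fraction of the scored sentences to keep in the summary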
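Because `punctuation` is a string, the filter `word.text.lower() not in punctuation` used in the frequency cell is a substring test, not a per-character test: a token such as `...` is not a substring of `string.punctuation` and would be counted as a word, while the empty string counts as "punctuation". A minimal sketch contrasting the two checks; the helper names are hypothetical, and spaCy tokens also expose an `is_punct` attribute that can serve the same purpose.

```python
import string

# Same string the notebook builds: ASCII punctuation plus a newline.
PUNCT_STR = string.punctuation + "\n"
PUNCT_SET = frozenset(PUNCT_STR)


def is_punct_substring(text: str) -> bool:
    """Mirrors `text in punctuation`: True if text appears as a contiguous substring."""
    return text in PUNCT_STR


def is_punct_chars(text: str) -> bool:
    """Hypothetical alternative: True only if text is non-empty and every character is punctuation."""
    return bool(text) and all(ch in PUNCT_SET for ch in text)


for tok in [".", "...", "e.g.", "\n", ""]:
    print(repr(tok), is_punct_substring(tok), is_punct_chars(tok))
```

The two checks agree on single characters such as `.` and `\n`; they differ on multi-character punctuation runs and on the empty string.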

In [2]:
sample = "7 PATIENT COUNSELING INFORMATION Information for Patients Patients should be advised that amoxicillin may be taken every 8 hours or every 12 hours, depending on the dose prescribed. Patients should be counseled that antibacterial drugs, including amoxicillin, should only be used to treat bacterial infections. They do not treat viral infections (e.g., the common cold). When amoxicillin is prescribed to treat a bacterial infection, patients should be told that although it is common to feel better early in the course of therapy, the medication should be taken exactly as directed. Skipping doses or not completing the full course of therapy may: (1) decrease the effectiveness of the immediate treatment, and (2) increase the likelihood that bacteria will develop resistance and will not be treatable by amoxicillin or other antibacterial drugs in the future. Patients should be counseled that diarrhea is a common problem caused by antibiotics, and it usually ends when the antibiotic is discontinued. Sometimes after starting treatment with antibiotics, patients can develop watery and bloody stools (with or without stomach cramps and fever) even as late as 2 or more months after having taken their last dose of the antibiotic. If this occurs, patients should contact their physician as soon as possible. Patients should be aware that amoxicillin contains a penicillin class drug product that can cause allergic reactions in some individuals. CLINITEST ® is a registered trademark of Siemens Medical Solutions Diagnostics, and Ames Company, Inc. CLINISTIX ® is a registered trademark of Bayer Healthcare Llc, and Ames Company, Inc. CLOtest ® is a registered trademark of Kimberly-Clark Worldwide, Inc. Distributed by: Aurobindo Pharma USA, Inc. 279 Princeton-Hightstown Road East Windsor, NJ 08520 Manufactured by: Aurobindo Pharma Limited Hyderabad-500 038, India Revised: 06/2018"

In [3]:
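nlp_pl = spacy.load('en_core_web_sm')   # small English pipeline (install with: python -m spacy download en_core_web_sm)
document = nlp_pl(sample)               # run the pipeline: tokenization, tagging, parsing (which also sets sentence boundaries)

tokens = [token.text for token in document]   # raw token texts, handy for inspecting the tokenization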

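# Count every token that is neither a stop word nor punctuation.
# Keys keep the token's original casing, so 'Patients' and 'patients' are counted separately.
word_frequencies = {}
for word in document:
    text_lower = word.text.lower()
    if text_lower not in stopwords and text_lower not in punctuation:
        word_frequencies[word.text] = word_frequencies.get(word.text, 0) + 1

# Scale the counts to [0, 1] by dividing by the count of the most frequent word.
max_frequency = max(word_frequencies.values())
for word in word_frequencies:
    word_frequencies[word] = word_frequencies[word] / max_frequency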

print(word_frequencies)

{'7': 0.2, 'PATIENT': 0.2, 'COUNSELING': 0.2, 'INFORMATION': 0.2, 'Information': 0.2, 'Patients': 1.0, 'advised': 0.2, 'amoxicillin': 1.0, 'taken': 0.6, '8': 0.2, 'hours': 0.4, '12': 0.2, 'depending': 0.2, 'dose': 0.4, 'prescribed': 0.4, 'counseled': 0.4, 'antibacterial': 0.4, 'drugs': 0.4, 'including': 0.2, 'treat': 0.6, 'bacterial': 0.4, 'infections': 0.4, 'viral': 0.2, 'e.g.': 0.2, 'common': 0.6, 'cold': 0.2, 'infection': 0.2, 'patients': 0.6, 'told': 0.2, 'feel': 0.2, 'better': 0.2, 'early': 0.2, 'course': 0.4, 'therapy': 0.4, 'medication': 0.2, 'exactly': 0.2, 'directed': 0.2, 'Skipping': 0.2, 'doses': 0.2, 'completing': 0.2, '1': 0.2, 'decrease': 0.2, 'effectiveness': 0.2, 'immediate': 0.2, 'treatment': 0.4, '2': 0.4, 'increase': 0.2, 'likelihood': 0.2, 'bacteria': 0.2, 'develop': 0.4, 'resistance': 0.2, 'treatable': 0.2, 'future': 0.2, 'diarrhea': 0.2, 'problem': 0.2, 'caused': 0.2, 'antibiotics': 0.4, 'usually': 0.2, 'ends': 0.2, 'antibiotic': 0.4, 'discontinued': 0.2, 'startin
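As the printed dictionary shows, the table keeps the original casing ('Patients': 1.0 next to 'patients': 0.6), yet `get_sentence_scores` in the next cell looks words up with `word.text.lower()`. Capitalized tokens are therefore scored with their lowercase entry, or skipped when none exists (e.g. 'Skipping' or 'PATIENT'). A minimal sketch of a case-consistent table, assuming the tokens are plain strings rather than spaCy `Token` objects; `normalized_frequencies` is a hypothetical helper, not part of spaCy:

```python
import string
from collections import Counter


def normalized_frequencies(words, stopwords, punctuation):
    """Count lowercased content words and scale so the most frequent one gets 1.0."""
    counts = Counter(
        w.lower()
        for w in words
        if w.lower() not in stopwords and w.lower() not in punctuation
    )
    if not counts:  # avoid max() on an empty sequence
        return {}
    max_count = max(counts.values())
    return {w: c / max_count for w, c in counts.items()}


tokens = ["Patients", "should", "take", "amoxicillin", ".", "patients", "Amoxicillin"]
freqs = normalized_frequencies(tokens, stopwords={"should"}, punctuation=string.punctuation + "\n")
print(freqs)  # {'patients': 1.0, 'take': 0.5, 'amoxicillin': 1.0}
```

Lowercasing once, at counting time, means the lookup key and the stored key always agree, so a capitalized sentence-initial word contributes the same score as its lowercase form.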

In [4]:
sentence_tokens = list(document.sents)

def get_sentence_scores(sentence_tok, len_norm=True):
    """Score each sentence by summing the normalized frequencies of its words.

    With len_norm=True the sum is divided by the number of scored words,
    so long sentences are not favored merely for their length.
    """
    sentence_scores = {}
    for sent in sentence_tok:
        word_count = 0
        for word in sent:
            key = word.text.lower()
            if key in word_frequencies:
                word_count += 1
                sentence_scores[sent] = sentence_scores.get(sent, 0) + word_frequencies[key]
        # Sentences without any scored word never get an entry, so only divide when word_count > 0
        if len_norm and word_count > 0:
            sentence_scores[sent] /= word_count
    return sentence_scores

sentence_scores = get_sentence_scores(sentence_tokens, len_norm=False)       # sentence scoring without length normalization
#sentence_scores_rel = get_sentence_scores(sentence_tokens, len_norm=True)   # sentence scoring with length normalization
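The effect of `len_norm` is easiest to see on toy data. The sketch below re-implements the same scoring on sentences given as lists of word strings (an assumption; the notebook uses spaCy `Span` objects as dictionary keys) and keys the scores by sentence index instead. With raw sums the longer sentence wins; dividing by the number of scored words reverses the ranking.

```python
def score_sentences(sentences, word_frequencies, len_norm=True):
    """Sum (or average, if len_norm) the frequencies of each sentence's scored words."""
    scores = {}
    for i, sent in enumerate(sentences):
        hits = [word_frequencies[w.lower()] for w in sent if w.lower() in word_frequencies]
        if not hits:  # no scored words: leave the sentence out, as the notebook does
            continue
        total = sum(hits)
        scores[i] = total / len(hits) if len_norm else total
    return scores


freqs = {"amoxicillin": 1.0, "dose": 0.5, "bacteria": 0.5, "resistance": 0.5}
sentences = [
    ["Take", "amoxicillin"],
    ["Missed", "dose", "bacteria", "resistance"],
]
print(score_sentences(sentences, freqs, len_norm=False))  # {0: 1.0, 1: 1.5}
print(score_sentences(sentences, freqs, len_norm=True))   # {0: 1.0, 1: 0.5}
```

Raw sums favor sentences that simply contain more content words, which is consistent with the long sentences dominating the summary produced below with `len_norm=False`.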

In [5]:
def get_summary(sentence_sc, rate):
    """Join the highest-scoring sentences; `rate` is the fraction of scored sentences to keep."""
    summary_length = int(len(sentence_sc) * rate)
    top_sentences = nlargest(summary_length, sentence_sc, key=sentence_sc.get)   # ordered by score, not by position
    return ' '.join(sent.text for sent in top_sentences)
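`nlargest` returns sentences in descending score order, which is why the summary printed below does not follow the leaflet's original sentence order. If the original order is preferred, one option is to select by score and then sort the selection by position. A minimal sketch with sentences as plain strings and scores keyed by sentence index (both assumptions; `get_summary_in_order` is a hypothetical name). With spaCy `Span` keys, sorting the selection by `Span.start` would serve the same purpose.

```python
from heapq import nlargest


def get_summary_in_order(sentences, scores, rate):
    """Keep the top `rate` fraction of scored sentences, emitted in document order."""
    summary_length = int(len(scores) * rate)  # e.g. 4 scored sentences * 0.5 -> 2
    top = nlargest(summary_length, scores, key=scores.get)  # indices, highest score first
    return " ".join(sentences[i] for i in sorted(top))  # restore original order


sentences = ["A is first.", "B is second.", "C is third.", "D is fourth."]
scores = {0: 0.7, 1: 0.1, 2: 0.2, 3: 0.9}
print(get_summary_in_order(sentences, scores, 0.5))  # A is first. D is fourth.
```

Note that `int()` rounds down, so a very small `rate` on a short text yields an empty summary.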

In [6]:
get_summary(sentence_scores, reduction_rate)

'When amoxicillin is prescribed to treat a bacterial infection, patients should be told that although it is common to feel better early in the course of therapy, the medication should be taken exactly as directed. Skipping doses or not completing the full course of therapy may: (1) decrease the effectiveness of the immediate treatment, and (2) increase the likelihood that bacteria will develop resistance and will not be treatable by amoxicillin or other antibacterial drugs in the future. Sometimes after starting treatment with antibiotics, patients can develop watery and bloody stools (with or without stomach cramps and fever) even as late as 2 or more months after having taken their last dose of the antibiotic. 7 PATIENT COUNSELING INFORMATION Information for Patients Patients should be advised that amoxicillin may be taken every 8 hours or every 12 hours, depending on the dose prescribed. Patients should be counseled that antibacterial drugs, including amoxicillin, should only be use