<a href="https://colab.research.google.com/github/DrKenReid/Extractive-Text-Summarization/blob/main/Extractive_Text_Summarization.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Text Summarization
## Introduction
This tutorial demonstrates extractive summarization of text data: selecting the most informative sentences from a document so that you can pick out key information quickly.

## Setup and Imports
First, let's set up our environment and import the necessary Python libraries.

In [1]:
import re
from collections import defaultdict
import textwrap

print("Libraries imported successfully.")

Libraries imported successfully.


## Implementing the Summarization Functions
Now, let's implement our summarization functions. We'll create a series of functions that will preprocess the text, tokenize it into sentences, and then extract the most important sentences based on key medical terms.
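
As a rough illustration of the extraction step described above, the sketch below ranks sentences by score and then restores their document order. The `scores` dictionary and the `pick_top` helper are hypothetical, made up for this example only.

```python
# Hypothetical scores: sentence index -> number of key terms found in that sentence
scores = {0: 1, 2: 1, 4: 2, 5: 1}

def pick_top(scores, n):
    """Return the indices of the n highest-scoring sentences, in document order."""
    # Rank indices by score, highest first. sorted() is stable, so tied
    # sentences keep their insertion order and earlier sentences win.
    ranked = sorted(scores, key=scores.get, reverse=True)
    # Re-sort the selected indices so the summary reads in the original order
    return sorted(ranked[:n])

print(pick_top(scores, 3))  # [0, 2, 4]
```

Because ties are broken by position, a late sentence with the same score as an earlier one can be left out of the summary.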

In [2]:
def preprocess_text(text):
    """
    Preprocess the text by removing special characters and converting to lowercase.

    Args:
        text (str): The input text to preprocess.

    Returns:
        str: The preprocessed text.
    """
    # Remove every character that is not a letter or whitespace (drops digits and punctuation).
    # A raw string avoids Python treating '\s' as an invalid string escape.
    text = re.sub(r'[^a-zA-Z\s]', '', text)
    # Convert to lowercase
    text = text.lower()
    return text

def sentence_tokenize(text):
    """
    Tokenize the text into sentences.

    Args:
        text (str): The input text to tokenize.

    Returns:
        list: A list of sentences.
    """
    # Simple sentence tokenization: split on periods and drop empty fragments
    return [sentence for sentence in text.split('.') if sentence.strip()]

def summarize_text(text, num_sentences=3):
    """
    Summarize the given medical text.

    Args:
        text (str): The input medical text to summarize.
        num_sentences (int): The number of sentences to include in the summary.

    Returns:
        str: The summarized text.
    """
    # Tokenize the raw text into sentences. preprocess_text() is not applied
    # first because it strips the periods that sentence_tokenize() splits on.
    sentences = sentence_tokenize(text)

    # Define key medical terms
    key_terms = ['diagnosis', 'treatment', 'plan', 'presents', 'complains', 'history']

    # Score sentences based on the presence of key terms
    sentence_scores = defaultdict(int)
    for i, sentence in enumerate(sentences):
        for term in key_terms:
            if term in sentence.lower():
                sentence_scores[i] += 1

    # Select top sentences
    top_sentences = sorted(sentence_scores, key=sentence_scores.get, reverse=True)[:num_sentences]

    # Construct the summary
    summary = ' '.join([sentences[i].strip() for i in sorted(top_sentences)])

    return summary

print("Summarization functions defined successfully.")

Summarization functions defined successfully.
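
One caveat of the `term in sentence.lower()` check in `summarize_text` is that it matches substrings, so a short term such as `plan` also matches inside a longer word. The sketch below compares that behavior with a whole-word check. The helper names `substring_score` and `whole_word_score` are hypothetical and are not part of the tutorial's code.

```python
import re

KEY_TERMS = ['diagnosis', 'treatment', 'plan', 'presents', 'complains', 'history']

def substring_score(sentence, key_terms=KEY_TERMS):
    """Count key terms that appear anywhere in the sentence, even inside other words."""
    lowered = sentence.lower()
    return sum(1 for term in key_terms if term in lowered)

def whole_word_score(sentence, key_terms=KEY_TERMS):
    """Count key terms that appear as complete words in the sentence."""
    words = set(re.findall(r'[a-z]+', sentence.lower()))
    return sum(1 for term in key_terms if term in words)

sentence = "The explanation was given to the patient"
print(substring_score(sentence))   # 1, because 'plan' occurs inside 'explanation'
print(whole_word_score(sentence))  # 0
```

Whole-word matching is stricter: it avoids false hits like the one above, but it also stops matching inflected forms such as `plans` unless they are listed as key terms.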


## Using the Summarization Function
Now that we have defined our summarization function, let's use it on a sample medical text.

In [3]:
# Example usage
medical_text = """
Patient presents with complaints of chest pain and shortness of breath for the past 3 days. Pain is described as sharp and worsens with exertion. Patient has a history of hypertension and type 2 diabetes, both controlled with medication. Physical examination reveals elevated blood pressure (150/90 mmHg) and slight wheezing in the lower left lung. ECG shows ST-segment depression in leads V3-V5. Initial troponin levels are slightly elevated. Diagnosis: Possible acute coronary syndrome. Plan: Admit patient for further cardiac workup, including serial troponins and stress test. Start on aspirin, beta-blockers, and consider anticoagulation therapy. Monitor vitals closely and manage pain as needed.
"""

summary = summarize_text(medical_text)
print("Original medical text:")
print(textwrap.fill(medical_text, width=100))
print("\nSummary:")
print(textwrap.fill(summary, width=100))

Original medical text:
 Patient presents with complaints of chest pain and shortness of breath for the past 3 days. Pain is
described as sharp and worsens with exertion. Patient has a history of hypertension and type 2
diabetes, both controlled with medication. Physical examination reveals elevated blood pressure
(150/90 mmHg) and slight wheezing in the lower left lung. ECG shows ST-segment depression in leads
V3-V5. Initial troponin levels are slightly elevated. Diagnosis: Possible acute coronary syndrome.
Plan: Admit patient for further cardiac workup, including serial troponins and stress test. Start on
aspirin, beta-blockers, and consider anticoagulation therapy. Monitor vitals closely and manage pain
as needed.

Summary:
Patient presents with complaints of chest pain and shortness of breath for the past 3 days Patient
has a history of hypertension and type 2 diabetes, both controlled with medication Diagnosis:
Possible acute coronary syndrome
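
The summary above runs its sentences together without periods because `str.split('.')` discards the delimiter. A minimal sketch of one alternative, assuming sentences end in `.`, `!` or `?` followed by whitespace, is to split with a regular-expression lookbehind so each sentence keeps its terminator. The function name `split_keep_punctuation` is hypothetical.

```python
import re

def split_keep_punctuation(text):
    """Split text into sentences, keeping each sentence's closing punctuation."""
    # Split at whitespace that directly follows '.', '!' or '?'
    parts = re.split(r'(?<=[.!?])\s+', text.strip())
    # Drop the empty string produced when the input is empty
    return [part for part in parts if part]

sample = "Diagnosis: Possible acute coronary syndrome. Plan: Admit patient. Monitor vitals closely."
for sentence in split_keep_punctuation(sample):
    print(sentence)
```

This is still a heuristic: abbreviations such as "Dr." followed by a space would be treated as sentence boundaries.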


## Conclusion
This tutorial has introduced a basic rule-based method for extractive summarization of medical text. Each sentence is scored by how many key medical terms it contains, and the highest-scoring sentences are returned in their original order.
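
Since the method depends entirely on the list of key terms, one possible variation is to pass that list in as an argument so the same scoring can be tried on other kinds of text. The sketch below is an assumption about how that could look; `summarize_with_terms`, the sample text, and its term list are all hypothetical.

```python
from collections import defaultdict

def summarize_with_terms(text, key_terms, num_sentences=3):
    """Hypothetical variant of summarize_text that takes the key terms as an argument."""
    # Split on periods and drop empty fragments
    sentences = [s.strip() for s in text.split('.') if s.strip()]

    # Score each sentence by the number of key terms it contains (substring match)
    scores = defaultdict(int)
    for i, sentence in enumerate(sentences):
        lowered = sentence.lower()
        for term in key_terms:
            if term in lowered:
                scores[i] += 1

    # Keep the top-scoring sentences and restore their original order
    top = sorted(scores, key=scores.get, reverse=True)[:num_sentences]
    return ' '.join(sentences[i] for i in sorted(top))

sample_text = "The tenant signed the lease in May. The weather was mild. The landlord must return the deposit."
print(summarize_with_terms(sample_text, ['lease', 'deposit', 'landlord'], num_sentences=2))
```

As in the tutorial's version, sentences that contain no key term are never selected, so the summary can be shorter than `num_sentences`.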