**Drug Discovery and Development:** NLP can analyze vast amounts of biomedical literature and clinical trial data to identify potential drug targets, predict adverse drug reactions, and accelerate the drug discovery process.
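To illustrate the adverse-reaction use case, one simple literature-mining baseline counts how often a drug term and a reaction term appear in the same sentence. The sketch below is a co-occurrence count, not a prediction model. The lexicons (`DRUGS`, `REACTIONS`), the placeholder drug names, and the report text are all hypothetical.

```python
import re
from collections import Counter
from itertools import product

# Hypothetical lexicons for illustration only; a real pipeline would use curated
# vocabularies or a biomedical named-entity recognizer instead
DRUGS = {"drug_a", "drug_b"}
REACTIONS = {"rash", "nausea", "headache"}

def cooccurrences(text):
    """Count (drug, reaction) pairs that are mentioned in the same sentence."""
    counts = Counter()
    # Naive sentence split: break after ., ! or ? followed by whitespace
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        tokens = set(re.findall(r"[a-z0-9_]+", sentence.lower()))
        # Pair every drug mention with every reaction mention in the sentence
        for pair in product(sorted(tokens & DRUGS), sorted(tokens & REACTIONS)):
            counts[pair] += 1
    return counts

reports = "Patient on drug_a developed a rash. Drug_b was well tolerated. Nausea followed drug_a."
print(cooccurrences(reports).most_common())
# → [(('drug_a', 'rash'), 1), (('drug_a', 'nausea'), 1)]
```

Co-occurrence is only a weak signal. It ignores negation ("no rash was observed" would still count) and says nothing about causation. Counts like these are therefore better suited to ranking candidate pairs for expert review than to drawing conclusions.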

In [None]:
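import nltk
# Download the resources used below: stopword list, Punkt tokenizer models, and WordNet
nltk.download('stopwords')
nltk.download('punkt')
nltk.download('punkt_tab')  # newer NLTK releases need this for word_tokenize
nltk.download('wordnet')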
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.stem import WordNetLemmatizer
from collections import Counter

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package wordnet to /root/nltk_data...


In [None]:
# Sample biomedical text data
biomedical_text = """
Clinicians often suspect that a treatment effect can vary across individuals.
However, they usually lack "evidence-based" guidance regarding potential heterogeneity of treatment effects (HTE).
Potentially actionable HTE is rarely discovered in clinical trials and is widely believed (or rationalized) by researchers to be rare.
Conventional statistical methods to test for possible HTE are extremely conservative and tend to reinforce this belief.
In truth, though, there is no realistic way to know whether a common, or average, effect estimated from a clinical trial is relevant for all, or even most, patients.
This absence of evidence, misinterpreted as evidence of absence, may be resulting in sub-optimal treatment for many individuals.
We first summarize the historical context in which current statistical methods for randomized controlled trials (RCTs) were developed,
focusing on the conceptual and technical limitations that shaped, and restricted, these methods. In particular,
we explain how the common-effect assumption came to be virtually unchallenged.
Second, we propose a simple graphical method for exploratory data analysis that can provide useful visual evidence of possible HTE.
The basic approach is to display the complete distribution of outcome data rather than relying uncritically on simple summary statistics.
Modern graphical methods, unavailable when statistical methods were initially formulated a century ago, now render fine-grained interrogation of the data feasible.
We propose comparing observed treatment-group data to "pseudo data" engineered to mimic that which would be expected under a particular HTE model, such as the common-effect model.
A clear discrepancy between the distributions of the common-effect pseudo data and the actual treatment-effect data provides prima facie evidence of HTE to motivate additional confirmatory investigation.
Artificial data are used to illustrate implications of ignoring heterogeneity in practice and how the graphical method can be useful.
"""

# Tokenization and preprocessing
stop_words = set(stopwords.words('english'))
wordnet_lemmatizer = WordNetLemmatizer()

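def preprocess_text(text):
    # Lowercase and split into word tokens
    tokens = word_tokenize(text.lower())
    # Keep purely alphanumeric tokens; this also drops hyphenated words such as "common-effect"
    tokens = [word for word in tokens if word.isalnum()]
    # Remove stopwords before lemmatizing: the default (noun) lemmatizer turns e.g. "was"
    # into "wa", which would otherwise slip past the stopword filter
    tokens = [wordnet_lemmatizer.lemmatize(word) for word in tokens if word not in stop_words]
    return tokens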

preprocessed_text = preprocess_text(biomedical_text)

# Counting word frequencies
word_freq = Counter(preprocessed_text)

# Displaying the most common words
print("Most common words in the text:")
for word, freq in word_freq.most_common(10):
    print(f"{word}: {freq}")


Most common words in the text:
data: 8
method: 7
hte: 6
evidence: 4
treatment: 3
effect: 3
trial: 3
statistical: 3
graphical: 3
individual: 2
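These counts treat each word in isolation, and the `isalnum()` filter discards hyphenated terms such as "common-effect" entirely. Counting adjacent word pairs (bigrams) can recover phrases such as "treatment effect". The minimal sketch below uses only the standard library. Its small hand-picked stopword set is an assumption standing in for NLTK's list, and it skips any pair that contains a stopword or crosses a sentence boundary. NLTK's `nltk.collocations.BigramCollocationFinder` provides a fuller version with association measures.

```python
import re
from collections import Counter

# Small illustrative stopword set (assumption: stands in for NLTK's English list)
STOPWORDS = {"a", "an", "and", "are", "be", "by", "can", "for", "in", "is",
             "it", "may", "of", "on", "or", "that", "the", "this", "to", "we"}

def bigram_counts(text):
    """Count adjacent word pairs within sentences, ignoring pairs with a stopword."""
    counts = Counter()
    for sentence in re.split(r"[.!?]+", text.lower()):
        # Extract alphanumeric runs, so "common-effect" yields "common", "effect"
        words = re.findall(r"[a-z0-9]+", sentence)
        for pair in zip(words, words[1:]):
            if pair[0] not in STOPWORDS and pair[1] not in STOPWORDS:
                counts[pair] += 1
    return counts

sample = "A treatment effect can vary. The average treatment effect may mislead."
print(bigram_counts(sample).most_common(1))
# → [(('treatment', 'effect'), 2)]
```

Applied to `biomedical_text`, this would be expected to surface phrase-level pairs such as ('common', 'effect') and ('statistical', 'methods'). The exact ranking depends on the stopword list.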


In [None]:
!pip install summa
from summa import summarizer

Collecting summa
  Downloading summa-1.2.0.tar.gz (54 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: summa
  Building wheel for summa (setup.py) ... done
  Created wheel for summa: filename=summa-1.2.0-py3-none-any.whl size=54386 sha256=2d69021dfd0bc5e03a257b56d222c9b1f7a6a865c28d67ad3c3fb1de8a7fee72
  Stored in directory: /root/.cache/pip/wheels/4a/ca/c5/4958614cfba88ed6ceb7cb5a849f9f89f9ac49971616bc919f
Successfully built summa
Installing collected packages: summa
Successfully installed summa-1.2.0


In [None]:
# Summarize the text with summa's TextRank-based extractive summarizer;
# ratio=0.2 keeps roughly the top-ranked 20% of the sentences
summary = summarizer.summarize(biomedical_text, ratio=0.2)

# Print the summary
print("Summary:")
print(summary)

Summary:
Second, we propose a simple graphical method for exploratory data analysis that can provide useful visual evidence of possible HTE.
We propose comparing observed treatment-group data to "pseudo data" engineered to mimic that which would be expected under a particular HTE model, such as the common-effect model.
A clear discrepancy between the distributions of the common-effect pseudo data and the actual treatment-effect data provides prima facie evidence of HTE to motivate additional confirmatory investigation.
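The simplified TextRank sketch below, written in plain Python, shows what the summarizer does. Sentences are graph nodes, and each edge is weighted by the number of shared distinct words, normalized by the log of each sentence's distinct-word count (the overlap measure from the original TextRank formulation). A weighted PageRank is then iterated, and the top-scoring sentences are returned in document order. This is not summa's implementation. The naive sentence splitter, the lack of stopword removal or stemming, and the fixed iteration count are all simplifying assumptions, so its selections may differ from the summary above.

```python
import math
import re

def split_sentences(text):
    # Naive splitter: break after ., ! or ? when followed by whitespace
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]

def similarity(s1, s2):
    # Shared distinct words, normalised by the log of each sentence's distinct-word count
    w1 = set(re.findall(r"[a-z0-9]+", s1.lower()))
    w2 = set(re.findall(r"[a-z0-9]+", s2.lower()))
    denom = math.log(len(w1) or 1) + math.log(len(w2) or 1)
    return len(w1 & w2) / denom if denom > 0 else 0.0

def textrank(text, ratio=0.2, damping=0.85, iterations=50):
    """Return roughly `ratio` of the sentences, ranked by weighted PageRank."""
    sents = split_sentences(text)
    n = len(sents)
    # Symmetric weighted adjacency matrix without self-loops
    w = [[similarity(sents[i], sents[j]) if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    out_weight = [sum(row) for row in w]
    scores = [1.0] * n
    for _ in range(iterations):
        # Each node receives score from its neighbours in proportion to edge weight
        scores = [
            (1 - damping) + damping * sum(
                w[j][i] / out_weight[j] * scores[j]
                for j in range(n) if out_weight[j] > 0
            )
            for i in range(n)
        ]
    k = max(1, round(n * ratio)) if n else 0
    best = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    # Present the chosen sentences in their original order
    return [sents[i] for i in sorted(best)]

toy = "Cats chase mice. Dogs chase cats and mice. Birds sing songs."
print(textrank(toy, ratio=0.67))
# → ['Cats chase mice.', 'Dogs chase cats and mice.']
```

In the notebook, `textrank(biomedical_text)` can be set beside summa's output. The two selections need not coincide, but comparing them shows how strongly the ranking depends on word overlap between sentences.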
