Summarization of research papers

In [None]:
# Importing libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import os

In [None]:
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large-cnn")

model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-large-cnn")



In [None]:
file_path = os.path.join(os.getcwd(), 'papers_new.xlsx')
file = pd.read_excel(file_path)




In [None]:
# remove articles with missing values or duplicates

file.dropna(inplace=True)
file.drop_duplicates(inplace=True)

# keep only the first 6500 rows

file = file.iloc[:6500, :]
file.shape

(6500, 4)

In [None]:
# tokenization
def preprocess_function(batch):
  # Select each column as a Series (single brackets) so every element is the raw text.
  # batch[['col']].values yields 1-element arrays, and str() of those adds brackets and quotes.
  source = batch['full-text'].astype(str).tolist()
  target = batch['abstract'].astype(str).tolist()

  source_ids = tokenizer(source, max_length=128, padding='max_length', truncation=True)
  target_ids = tokenizer(target, max_length=128, padding='max_length', truncation=True)

  # Replace padding ids in the labels with -100 so the loss ignores them
  labels = target_ids["input_ids"]
  labels = [[(l if l != tokenizer.pad_token_id else -100) for l in label] for label in labels]

  return {"input_ids": source_ids["input_ids"], "attention_mask": source_ids["attention_mask"], "labels": labels}
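The `-100` substitution in `preprocess_function` can be illustrated without loading a tokenizer. The sketch below is only an illustration: the pad id of 1 is an assumption (in real code it should be read from `tokenizer.pad_token_id`), and `mask_padding` is a hypothetical helper name. The value -100 is the default `ignore_index` of PyTorch's cross-entropy loss, so masked positions do not contribute to the training loss.

```python
# Assumed pad id for illustration only; read tokenizer.pad_token_id in real code.
PAD_TOKEN_ID = 1
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def mask_padding(label_ids, pad_token_id=PAD_TOKEN_ID):
    """Replace padding ids with IGNORE_INDEX so they are skipped by the loss."""
    return [
        [tok if tok != pad_token_id else IGNORE_INDEX for tok in seq]
        for seq in label_ids
    ]


# Two made-up label sequences padded to length 6
padded = [[0, 713, 16, 2, 1, 1], [0, 100, 2, 1, 1, 1]]
masked = mask_padding(padded)
print(masked)
```

Only the trailing pad positions change; the real token ids are left untouched.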

In [None]:
# tokenize all articles in a single call (the whole DataFrame is passed as one batch)
tokenized_dataset = preprocess_function(file)
token_df = pd.DataFrame(tokenized_dataset, columns=['input_ids', 'attention_mask', 'labels'], index=file.index)


In [None]:
df_source = pd.concat([file, token_df], axis=1, join='inner').reset_index(drop=True)

In [None]:
df_source.head()

Unnamed: 0,paper_id,title,abstract,full-text,input_ids,attention_mask,labels
0,97de2cec75e4557f014e5ecbc4db7885ced49c68,Neuroimmune multi-hit perspective of coronavir...,Abstract\n\nIt is well accepted that environme...,Background\n\nSophisticated defensive strategi...,"[0, 48759, 48277, 37457, 282, 37457, 282, 104,...","[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...","[0, 48759, 47638, 37457, 282, 37457, 282, 243,..."
1,5c9bbf96e1a04c87f7a76f783f00ddf4af3c688e,Gender Differences in COVID-19 Conspiracy Theo...,"Abstract\n\nIn this article, we evaluate gende...","\n\n2020a, 2020b; Uscinski et al. 2020) . In t...","[0, 48759, 37457, 282, 37457, 282, 24837, 102,...","[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...","[0, 48759, 47638, 37457, 282, 37457, 282, 1121..."
2,26e4e16e5c03a202e957a932439aed7e713b2115,Estimated number of N95 respirators needed for...,Abstract\n\nObjective: Due to shortages of N95...,\n\nResults: For an acute-care hospital with 4...,"[0, 49329, 37457, 282, 37457, 282, 41981, 35, ...","[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...","[0, 48759, 47638, 37457, 282, 37457, 282, 4667..."
3,71ff1959c1833b0a89a9f3fde26c63e38859f4dc,O R I G I N A L A R T I C L E,Abstract\n\nObjectives: To report changes in p...,| INTRODUC TI ON\n\nThe SARS- pandemic led to ...,"[0, 48759, 15483, 2808, 6997, 7111, 12945, 290...","[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...","[0, 48759, 47638, 37457, 282, 37457, 282, 4667..."
4,2116036d1757b01907eafaf770bf2ca4e142ab06,Journal Pre-proof Addressing the impact of COV...,Abstract\n\n Administrative buildings showed ...,Introduction\n\nThe outbreak of COVID-19 was f...,"[0, 48759, 46576, 37457, 282, 37457, 282, 133,...","[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...","[0, 48759, 47638, 37457, 282, 37457, 282, 3745..."


In [None]:
# splitting into train and test sets

from sklearn.model_selection import train_test_split

train, test = train_test_split(df_source, test_size=0.2, random_state=42)

In [None]:
import nltk
nltk.download('punkt')
nltk.download('punkt_tab')
!pip install evaluate
import evaluate
metric = evaluate.load("rouge")

def calc_rouge_scores(candidates, references):
    result = metric.compute(predictions=candidates, references=references, use_stemmer=True)
    result = {key: round(value * 100, 1) for key, value in result.items()}
    return result

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Package punkt_tab is already up-to-date!
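To make the metric less of a black box, here is a simplified sketch of what ROUGE-1 measures: unigram overlap between a candidate and a reference, reported as an F1 score. This is not the implementation behind `evaluate.load("rouge")`; it assumes whitespace tokenization and lowercasing only, with no stemming, and `rouge1_f1` is a hypothetical name.

```python
from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap F1 between two texts (simplified ROUGE-1)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Multiset intersection: each word counts at most as often as in both texts
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


score = rouge1_f1("the cat sat on the mat", "the cat lay on the mat")
print(round(score * 100, 1))
```

Five of the six words overlap in each direction here, so precision and recall are both 5/6.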




In [None]:
ref = list(test['abstract'])
candidate = list(test['full-text'].apply(lambda x: '\n'.join(nltk.sent_tokenize(x)[:3])))
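The cell above builds a lead-3 baseline: the first three sentences of each article serve as the candidate summary. The sketch below shows the same idea without downloading NLTK data. It assumes a naive regex sentence splitter, which handles abbreviations and citations worse than `nltk.sent_tokenize`, and `lead_n` is a hypothetical name. Sentences are joined with a newline because `rougeLsum` treats newlines as sentence boundaries.

```python
import re


def lead_n(text: str, n: int = 3) -> str:
    """Return the first n sentences of text, one per line."""
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    sentences = [s.strip() for s in parts if s.strip()]
    return "\n".join(sentences[:n])


doc = "First point. Second point! Third point? Fourth point."
baseline = lead_n(doc)
print(baseline)
```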



In [None]:
print(f'Scores {calc_rouge_scores(candidate, ref)}')

Scores {'rouge1': np.float64(22.6), 'rouge2': np.float64(6.9), 'rougeL': np.float64(13.8), 'rougeLsum': np.float64(14.0)}


In [None]:
# training arguments

from transformers import TrainingArguments, Trainer

training_args = TrainingArguments(
    output_dir='/contents',
    per_device_train_batch_size = 16,
    num_train_epochs = 5,
    optim='adamw_torch',  # use the PyTorch implementation of AdamW
    evaluation_strategy='epoch',
    save_strategy='epoch',
    load_best_model_at_end=True,
    include_inputs_for_metrics=True
)

Using `include_inputs_for_metrics` is deprecated and will be removed in version 5 of 🤗 Transformers. Please use `include_for_metrics` list argument instead.
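As a rough sanity check on these settings, the number of optimizer steps can be worked out by hand. The sketch below assumes a single device, no gradient accumulation, and that the last partial batch is kept; those are assumptions about the run, not values read from the trainer.

```python
import math

n_rows = 6500          # rows kept after slicing the DataFrame
train_batch_size = 16  # per_device_train_batch_size
epochs = 5             # num_train_epochs

n_test = n_rows * 20 // 100   # test_size=0.2, exact for 6500 rows
n_train = n_rows - n_test
steps_per_epoch = math.ceil(n_train / train_batch_size)
total_steps = steps_per_epoch * epochs

print(n_train, n_test, steps_per_epoch, total_steps)  # 5200 1300 325 1625
```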


In [None]:
!pip install datasets
from datasets import Dataset

train_ds = Dataset.from_pandas(train)
test_ds = Dataset.from_pandas(test)



In [None]:
trainer = Trainer(
    model = model,
    args = training_args,
    train_dataset = train_ds,
    eval_dataset = test_ds

)

In [None]:
import wandb
wandb.init(mode="offline")

wandb: Using wandb-core as the SDK backend. Please refer to https://wandb.me/wandb-core for more information.


In [None]:
# !pip install git+https://github.com/huggingface/accelerate
# !pip install --upgrade transformers

In [None]:
trainer.train()



In [None]:

# Evaluate the model
eval_results = trainer.evaluate()

# Print evaluation results
print(eval_results)

{'eval_loss': 2.879042625427246, 'eval_runtime': 44.0529, 'eval_samples_per_second': 29.51, 'eval_steps_per_second': 3.7, 'epoch': 5.0}
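The evaluation loss is easier to interpret as a perplexity. The conversion below is a small sketch that assumes `eval_loss` is the mean per-token cross-entropy in nats over the unmasked label tokens, which is how the loss is normally computed for this kind of model.

```python
import math

eval_loss = 2.879042625427246  # value reported by trainer.evaluate() above
perplexity = math.exp(eval_loss)
print(f"perplexity: {perplexity:.2f}")
```

Under that assumption, the model is on average about as uncertain as a uniform choice among roughly 18 tokens at each position of the abstract.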


In [None]:

# Save the model and tokenizer after training

model.save_pretrained("/content/model_summary_10k")
tokenizer.save_pretrained("/content/model_summary_10k")


('/content/model_summary_10k/tokenizer_config.json',
 '/content/model_summary_10k/special_tokens_map.json',
 '/content/model_summary_10k/vocab.json',
 '/content/model_summary_10k/merges.txt',
 '/content/model_summary_10k/added_tokens.json',
 '/content/model_summary_10k/tokenizer.json')

In [None]:
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Load the trained model and tokenizer from the directory they were saved to above
tokenizer = AutoTokenizer.from_pretrained("/content/model_summary_10k")
model = AutoModelForSeq2SeqLM.from_pretrained("/content/model_summary_10k")

# Function to summarize text
def summarize(text):
    # Tokenize the input text, truncating to at most 1024 tokens
    inputs = tokenizer(text, max_length=1024, truncation=True, return_tensors="pt")

    # Generate the summary with beam search
    summary_ids = model.generate(
        inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_length=200,
        min_length=50,
        length_penalty=2.0,
        num_beams=4,
        early_stopping=True,
    )

    # Decode the summary
    summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
    return summary

In [None]:
Intro = """

Starting in mid-March of 2020, the outbreak of Coronavirus Disease 2019 (COVID-19) led to large-scale changes in hospital practices across the United States, including protocols overseeing inpatient hospitalization for a variety of surgeries. 1-3 A recent national survey found that both patients and physicians have sought to reduce patient exposure to hospitals during the pandemic to limit resource utilization and prevent the spread of COVID-19. 4 As a result, the pandemic provided a unique opportunity to assess whether the changing practice patterns favoring shorter hospitalization could impact outcomes following major surgeries, such as adult spinal deformity (ASD) correction, which usually require longer hospital stays (LOS).

In the United States, ASD has a prevalence of up to 68% in adults over the age of 60. 5 Recent reviews have reported a 17% to 68% incidence of postoperative complications following ASD surgery, with around 42% of ASD patients requiring nonroutine discharge. [6] [7] [8] [9] [10] To facilitate recovery and minimize the risk of complications, postoperative monitoring, and stabilization of patients in the inpatient setting is often needed. Prior studies have reported an average LOS of approximately 8 days following ASD surgery, 8, 11, 12 but it is currently unknown how the pandemic has impacted duration of hospital stay and complication rates in this patient population.

The primary aim of this study was to compare LOS and discharge disposition of ASD patients undergoing surgery before and during the pandemic. Secondary aims were to compare the rates of 30-day complications, readmissions, and emergency department (ED) visits before and during the pandemic. We hypothesized that the mean LOS after ASD surgery was lower during the pandemic compared with before, though that may have increased the rate of complications, readmissions, and/or ED visits. We also hypothesized that there would be a significant increase in the rate of home discharge following surgeries performed during the pandemic.

"""

Methods = """

MATERIALS AND METHODS

This study was approved by our institutional review board.

Data Source and Collection

Data were collected through a retrospective review of a spine surgical registry at a tertiary care center. We identified all adult patients who underwent elective adult thoracolumbar deformity surgery (defined by more than or equal to five level fusion) from July 1 to December 31 of 2019 (pre-COVID, N = 60) and July 1 to December 31 of 2020 (during-COVID, N = 57). These time periods were chosen to reflect the impact of institutional policies governing elective surgeries at our medical center. The World Health Organization officially declared COVID-19 a pandemic on March 11, 2020, 13 and many institutional policies had been set in place by July 1, 2020. The same time period was chosen for 2019 to reduce the impact of seasonal variation on case volume. 14, 15 All surgeries were performed by fellowship-trained orthopedic spine surgeons. All patients met the 30-day follow-up required for inclusion in this study. We excluded any non-elective surgeries, malignancy, trauma, or infection (pre-COVID: nine patients; during-COVID: seven patients). An additional exclusion criterion was a diagnosis of COVID-19 during the hospital course, but no patients in the during-COVID period cohort had a COVID-19 diagnosis at any point during their surgical care.

We collected data on baseline patient demographics and surgical risk: age, sex, and American Society of Anesthesiologists (ASA) Classification. We also collected data on surgical characteristics: the number of levels fused, estimated blood loss (EBL), whether or not the surgery was a revision procedure, and the presence of a three-column osteotomy.


Outcome Measures

The primary outcome measures were length of stay (days) and discharge disposition (home vs. non-home). Secondary outcome measures were the rates of reoperation, unplanned ED visit, readmission, and major complications within 30 days postoperatively. Major complications were defined as: neurologic complications (new and persistent postoperative motor weakness or sensory deficit), myocardial infarction, pulmonary embolism/deep vein thrombosis (PE/DVT), pneumonia, or cerebrovascular accident (CVA).


Statistical Analysis

Bivariate analyses were used to detect differences in baseline patient demographics, ASA classification, and surgical characteristics between patients in 2019 and 2020. Categorical variables were assessed with Chi-squared tests, parametric continuous variables with Student t tests, and non-parametric continuous variables with Mann-Whitney U tests. Bivariate analyses were similarly performed on our primary and secondary outcome measures to detect differences between the two groups. To control for differences in baseline demographics and surgical characteristics, multivariable regression models were constructed for length of stay and discharge disposition, the two outcome measures found to be significant on bivariate analysis. A Poisson regression model was utilized for length of stay, due to the variable being count data, 16 and a logistic regression was used for discharge disposition, a binary variable. Both multivariable regression models controlled for age, sex, ASA, levels fused, three-column osteotomy, EBL, and revision surgery. Results from the Poisson regression are reported as incidence rate ratios (IRR), and results from the logistic regression are reported as odds ratios (OR). Statistical significance was set at P < 0.05. Analysis was conducted using STATA version 15.0 (College Station, TX).

"""

Results = """

Total Case Volume

There was no significant difference in the volume of elective adult thoracolumbar deformity surgeries performed during the two time periods (60 vs. 57 cases, P = 0.782; Table 1).


Patient Demographics and Surgical Characteristics

On bivariate analysis, patients who underwent ASD surgery during the pandemic were younger (mean: 61 vs. 67 yrs, P = 0.015) compared to before the pandemic, but there were no significant differences in sex or ASA classification (Table 1). Compared with cases performed before the pandemic, those performed during the pandemic involved significantly longer fusion constructs (mean: nine vs. eight levels, P = 0.007) and more EBL (mean: 1762 vs. 1127 cm³, P = 0.002) (Table 1). There were no significant differences between the pre-COVID and during-COVID groups in terms of the number of procedures which were revision surgeries or number of those which involved a three-column osteotomy.

Length of Stay

On bivariate analysis, patients in the during-COVID cohort had 26% shorter LOS compared with those in the pre-COVID cohort (mean: 6 vs. 9 days, P = 0.039; Table 2). The difference in LOS remained significantly shorter after controlling for age, sex, ASA, levels fused, three-column osteotomy, EBL, and revision surgery using the multivariable regression model (P = 0.015, Table 3).

Discharge Disposition

On bivariate analysis, patients were significantly more likely to be discharged home in the during-COVID cohort compared with pre-COVID cohort (70% vs. 28%, P < 0.001; Table 3). After controlling for age, sex, ASA, levels fused, three-column osteotomy, EBL, and revision surgery in the multivariable regression model, patients in the during-COVID cohort were found to have 7.2-times greater odds of home discharge compared with the pre-COVID cohort (P < 0.001; Table 3).

Readmissions, ED Visits, Major Complications

There were no statistically significant differences in the rates of 30-day major complications, readmissions, or unplanned ED visits between the pre-COVID and during-COVID cohorts (Table 2).
"""

Discussion = """
DISCUSSION

The length of hospital stay and discharge disposition after ASD surgery carry substantial implications for patient care and healthcare costs. The resource limitations and increased demand for hospital beds due to COVID-19 presented a unique opportunity to assess how shorter LOS could influence postoperative outcomes. In this study, we examined the impact of the pandemic on LOS, discharge disposition, and 30-day postoperative complications after elective, adult thoracolumbar deformity surgery. Our data demonstrate that ASD patients who underwent surgery during the pandemic had significantly shorter LOS and were more likely to be discharged home compared with those treated before the pandemic. Importantly, there were no significant differences in rates of major complications, reoperation, readmission, or ED visits resulting from the shorter LOS and increased rate of home discharge.

In the current era of value-based care, postoperative LOS and discharge disposition are important considerations both clinically and economically. While ASD surgery is primarily elective, patients usually require a longer LOS with more extensive inpatient monitoring compared with other spine and orthopedic procedures. 17 The ASD patient population is thus uniquely suitable for our study's objective of detecting changes in LOS resulting from the pandemic. Although close inpatient monitoring is required after ASD surgery, prolonged LOS can also slow the recovery process by preventing faster mobilization and delaying patient independence. Further, extended inpatient hospitalization has been shown to increase the risk of complications such as pneumonia and other hospital-acquired infections. 18 ASD surgery is also quite costly, and one major component of this cost is the longer hospital stay often required after ASD surgery relative to other spine surgeries. [19] [20] [21] A 2019 report from the Kaiser Family Foundation estimated that the average hospital expense for each inpatient day ranged from $1274 to $3329. 22 Although costs were not directly assessed in the present study, one important implication of our findings is that shortening LOS for ASD surgery for appropriate patients may potentially lead to healthcare savings without increasing short-term complication or readmission rates.

There are several factors that may have contributed to the shortened LOS seen in our analysis. At the hospital and provider levels, the pandemic increased the demand for hospital beds, and many surgical floors across the nation were transitioned to care for medicine patients due to the need for increased hospital capacity. Due to these circumstances, providers may have felt a need to discharge patients more rapidly compared with before the pandemic. From the patient perspective, several national surveys have demonstrated that patients with non-COVID illnesses sought to avoid hospitals to reduce risk of exposure to the virus. 4, 23 As a result, recent data has shown declines in non-COVID emergency department admissions as well as declines in a variety of medical admissions and surgical cases due to the pandemic. [24] [25] [26] [27] [28] It is thus possible that patients undergoing surgery during the pandemic may have tried to meet discharge requirements faster with the goal of leaving the hospital sooner.

It is important to note that patient selection may have played a role in these findings. In our data, although there was no difference in overall surgical risk between the groups in terms of the ASA classification, the pre-COVID patients were on average 6 years older than the during-COVID patients. This may be due to surgeons selecting younger patients for elective surgery during the pandemic since they were less likely to contract a severe form of COVID-19. As such, one possible explanation for the differences in LOS may be that the older patients in the pre-COVID period inherently required more postoperative medical optimization than the younger patients in the during-COVID period. Interestingly, our data suggests that the during-COVID patients may have received more invasive surgeries, due to significantly greater number of levels fused per patient as well as greater EBL. This could be explained by the fact that ASD patients with the most severe deformity or clinical presentation could have been prioritized during the pandemic. Although our study rigorously controlled for various factors that have been shown to influence LOS and discharge disposition such as age, ASA class, the number of levels fused, revision surgery, and three-column osteotomy, 10, 29 our analysis did not control for spinal alignment or preoperative health-related quality of life (HRQoL) scores.

Several factors may explain the 42% higher rate of home discharge observed during the pandemic compared with before. Recent literature has shown that the pandemic placed increased capacity demands on rehabilitation centers across the country. 30 were hospitalized for COVID-19 during the initial months of the COVID-19 outbreak required facility-based rehabilitation following discharge. This increased demand of rehabilitation centers to care for an unexpected population of COVID-19 patients may have factored into the decision-making of surgeons and physical therapists to discharge a greater percentage of patients home. Further, as the pandemic progressed and nursing facilities became epicenters of virus spread, approximately 1800 nursing facilities across the country closed or merged. 33 The public reputation of nursing homes during this time may have further encouraged patients and their families to avoid nursing facilities in favor of home discharge. Importantly, our data suggests that even with the hospital and patient factors that led to reduced LOS and the increased rate of home discharge, ASD patients did not have increased 30-day complications, ED visits, or readmissions during the pandemic. While surgeons should not alter their practice solely on the basis of resource availability in patients who clearly require more extensive inpatient recovery or facility services, lessons learned from the pandemic indicate that efforts toward reducing LOS may not necessarily have negative consequences on ASD patients in the immediate postoperative interval. Prior literature has examined several potential methods for optimizing perioperative care pathways to shorten hospital stay, including preoperative nutritional repletion, multimodal anesthesia, intraoperative maintenance of patient temperature, postoperative resumption of diet, and early mobilization. [34] [35] [36] Similar to our findings, these prior studies also found no difference in postoperative complications and readmission rates after reductions in LOS. Such techniques may have already helped to reduce LOS in the ASD population, as the average total length of stay for ASD surgery has declined over the past decade on a national scale. 17 Similar efforts aimed at reducing modifiable risk factors for non-routine discharge may likewise result in greater home discharge rates without an increase in complications or readmissions.

The results of our study should be interpreted in the context of its limitations. First, although we rigorously controlled for a multitude of factors that have been shown to be associated with adverse outcomes in ASD surgery, we were unable to control for spinal alignment or HRQoL indices. The patients in the pre-COVID and during-COVID cohorts may thus have slight differences in terms of disease severity which may have affected our outcome measures, but every effort was made to control for these potential confounders through our study design. Second, our study only investigated major complications, but not minor complications that could potentially prolong LOS such as postoperative urinary tract infection or ileus. Finally, our study was limited in follow-up duration due to the recency of the COVID-19 pandemic. Ultimately, adult spinal deformity constitutes a unique patient population, and further insight is needed on whether these findings are consistent across more common spine and orthopedic surgical procedures.

Larger sample sizes using data from multiple institutions will be needed to further support our findings.

"""
Conclusion = """

CONCLUSIONS

During the COVID-19 pandemic, the LOS for patients undergoing thoracolumbar ASD surgery decreased, and more patients were discharged home without adversely affecting complication or readmission rates. Lessons learned during the pandemic may help improve resource utilization without negatively influencing outcomes.

Key Points

Hospital length of stay after ASD surgery was significantly shorter during the COVID-19 pandemic compared with before (6 vs. 9 days, P = 0.039). ASD patients were more likely to be discharged home during the pandemic compared with before (70% vs. 28%; P < 0.001). Despite shorter length of stay and higher home discharge rates, there were no significant differences in major complications, reoperations, readmissions, or ED visits between patients who underwent ASD surgery before the pandemic and those who underwent surgery during the pandemic (P > 0.05 for all). Lessons learned during the pandemic may help improve resource utilization without negatively influencing short-term outcomes.
"""


# Get the summary
# sum_intro = summarize(Intro)
# sum_methods = summarize(Methods)
sum_results = summarize(Results)  # only the Results summary is printed below
# sum_discussion = summarize(Discussion)
# sum_conclusion = summarize(Conclusion)


# print(sum_intro + "\n\n" + sum_methods + "\n\n" + sum_results + "\n\n" + sum_discussion + "\n\n" + sum_conclusion)
print(sum_results)

Objective: Preventive adult thoracolumbar deformity surgeries performed during the pandemic involved significantly longer fusion constructs (EBL) and more EBL. Patients who underwent ASD surgery during this pandemic were younger (mean: 61 vs. 67 yrs, P = 0.015), but there were no significant differences in sex or ASA classification (Table 1) in terms of the number of procedures which involved a three-column osteotomy or revision surgery. Objective: Protecting patients from the effects of ASD surgery on the quality of life of their children and their families.Objective : Protecting them from the impact of ASD surgeries on their health
