Pulling abstracts into dataframe #42

Open
bes827 opened this issue Sep 21, 2021 · 2 comments

@bes827

bes827 commented Sep 21, 2021

Is there a way to pull the abstract text into the dataframe?

For example, europepmc::epmc_search(query = '"2019-nCoV" OR "2019nCoV"') creates a 29-column data frame that has useful info, but the abstract is not there. Is there a function or a script to get the abstract?

Thanks for the great package.

@njahn82
Member

njahn82 commented Sep 21, 2021

Thank you for your question. You could make use of the following:

library(europepmc)
library(tidyverse)
#> Warning: package 'tibble' was built under R version 4.1.1
#> Warning: package 'readr' was built under R version 4.1.1
out <- europepmc::epmc_search(query = '"2019-nCoV" OR "2019nCoV"', output = "raw")
#> 15997 records found, returning 100
abstract <- purrr::map_chr(out, "abstractText", .default = NA_character_)
head(abstract)
#> [1] "Since the COVID-19 epidemic is still expanding around the world and poses a serious threat to human life and health, it is necessary for us to carry out epidemic transmission prediction, whole genome sequence analysis, and public psychological stress assessment for 2019-nCoV. However, transmission prediction models are insufficiently accurate and genome sequence characteristics are not clear, and it is difficult to dynamically assess the public psychological stress state under the 2019-nCoV epidemic. Therefore, this study develops a 2019nCoVAS web service (http://www.combio-lezhang.online/2019ncov/home.html) that not only offers online epidemic transmission prediction and lineage-associated underrepresented permutation (LAUP) analysis services to investigate the spreading trends and genome sequence characteristics, but also provides psychological stress assessments based on such an emotional dictionary that we built for 2019-nCoV. Finally, we discuss the shortcomings and further study of the 2019nCoVAS web service."                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 
#> [2] NA
#> [3] "The rapid spread of SARS-CoV-2 led to the necessity of developing diagnostic tests for rapid virus detection. Many commercial platforms have appeared and have been approved for this purpose. In this study, 95 positive and 5 negative retrospective samples were analyzed by 4 different commercial RT-qPCR kits (TaqMan 2019nCoV Assay, Allplex™SARS-COV-2 Assay, FTD SARS-COV-2 Assay and qCOVID-19). The Hologic Aptima SARS-COV-2 and the Clart-COVID-19 system were also tested. serial dilutions of SARS-COV-2 standard control were included for sensitivity analysis. Among the qPCR tested qCOVID19 and Allplex™SARS-COV-2 Assay were both able to detect all the clinical samples included in the study. All four qPCR evaluated showed high sensitivity for samples with Ct<33. Clart-COVID-19 microarrays detected all samples and controls used in this study whereas Hologic Aptima Panther failed with one of the clinical samples. However, the main problem with this system was the number of invalidated samples despite avoiding the use of medium with guanidine isothiocyanate as recommended by the manufacturer. All the techniques tested were of value for SARS-CoV-2 detection."                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          
#> [4] "<h4>Objective</h4>To summarize guidelines on self-care and clinical management of persons with laryngectomy during the COVID-19 pandemic.<h4>Method</h4>Articles published in electronic databases-PubMed, Scopus, Web of Science, and CINHAL with the compliant keywords-were scouted from December 2019 to November 2020. All original articles, letters to editors, reviews, and consensus statements were reviewed and included.<h4>Results</h4>In all, 20 articles that had information pertaining to self-care of persons with laryngectomy or guidelines for clinicians working with this population were identified. Four of the included studies were case reports of persons with laryngectomy who contracted the COVID-19 virus. One of the included articles was a cohort study that explored the use of telerehabilitation in persons with laryngectomy.<h4>Conclusion</h4>The hallmarks of preventative strategies for persons with laryngectomy during the COVID-19 pandemic are as follows: physical distancing, use of a three-ply mask or surgical mask to cover the mouth and nose, and use of Heat Moisture Exchange (HME) device over stoma in addition to covering it with a surgical mask or laryngectomy bib. Telerehabilitation, not a preference with this population prior to the pandemic, has gained popularity and acceptance during the COVID-19 situation. The reports of COVID-positive persons with laryngectomy have indicated contrary findings from the tracheal and nasal swabs, necessitating compulsory inclusion of both nasal and tracheal swabs."                                                                                                                                                                                                                                                                                                                                                                                                                                                                            
#> [5] "<h4>Objective</h4>To review misinformation related to coronavirus disease 2019 (COVID-19) on social media during the first phase of the pandemic and to discuss ways of countering misinformation.<h4>Methods</h4>We searched PubMed®, Scopus, Embase®, PsycInfo and Google Scholar databases on 5 May 2020 and 1 June 2020 for publications related to COVID-19 and social media which dealt with misinformation and which were primary empirical studies. We followed the preferred reporting items for systematic reviews and meta-analyses and the guidelines for using a measurement tool to assess systematic reviews. Evidence quality and the risk of bias of included studies were classified using the grading of recommendations assessment, development and evaluation approach. The review is registered in the international prospective register of systematic reviews (PROSPERO; CRD42020182154).<h4>Findings</h4>We identified 22 studies for inclusion in the qualitative synthesis. The proportion of COVID-19 misinformation on social media ranged from 0.2% (413/212 846) to 28.8% (194/673) of posts. Of the 22 studies, 11 did not categorize the type of COVID-19-related misinformation, nine described specific misinformation myths and two reported sarcasm or humour related to COVID-19. Only four studies addressed the possible consequences of COVID-19-related misinformation: all reported that it led to fear or panic.<h4>Conclusion</h4>Social media play an increasingly important role in spreading both accurate information and misinformation. The findings of this review may help health-care organizations prepare their responses to subsequent phases in the COVID-19 infodemic and to future infodemics in general."                                                                                                                                                                                                                                                                                                  
#> [6] "<h4>Background</h4>Since it was declared a pandemic on March 11, 2020, COVID-19 has dominated headlines around the world and researchers have generated thousands of scientific articles about the disease. The fast speed of publication has challenged researchers and other stakeholders to keep up with the volume of published articles. To search the literature effectively, researchers use databases such as PubMed.<h4>Objective</h4>The aim of this study is to evaluate the performance of different searches for COVID-19 records in PubMed and to assess the complexity of searches required.<h4>Methods</h4>We tested PubMed searches for COVID-19 to identify which search string performed best according to standard metrics (sensitivity, precision, and F-score). We evaluated the performance of 8 different searches in PubMed during the first 10 weeks of the COVID-19 pandemic to investigate how complex a search string is needed. We also tested omitting hyphens and space characters as well as applying quotation marks.<h4>Results</h4>The two most comprehensive search strings combining several free-text and indexed search terms performed best in terms of sensitivity (98.4%/98.7%) and F-score (96.5%/95.7%), but the single-term search COVID-19 performed best in terms of precision (95.3%) and well in terms of sensitivity (94.4%) and F-score (94.8%). The term Wuhan virus performed the worst: 7.7% for sensitivity, 78.1% for precision, and 14.0% for F-score. We found that deleting a hyphen or space character could omit a substantial number of records, especially when searching with SARS-CoV-2 as a single term.<h4>Conclusions</h4>Comprehensive search strings combining free-text and indexed search terms performed better than single-term searches in PubMed, but not by a large margin compared to the single term COVID-19. For everyday searches, certain single-term searches that are entered correctly are probably sufficient, whereas more comprehensive searches should be used for systematic reviews. Still, we suggest additional measures that the US National Library of Medicine could take to support all PubMed users in searching the COVID-19 literature."

Created on 2021-09-21 by the reprex package (v2.0.0)

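Setting output = "raw" returns the full metadata record for each hit as a list, including the abstract where one is available; records without an abstract come back as NA in the vector above.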
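To get from that character vector to a data frame, the extracted fields can be combined column by column. The following is only a minimal sketch: `raw_records` is a hypothetical stand-in for the list returned with output = "raw" (real records carry many more fields), and the contents are made up for illustration.

```r
library(purrr)
library(tibble)

# Hypothetical stand-in for the list returned by
# epmc_search(..., output = "raw"); contents are illustrative only.
raw_records <- list(
  list(pmid = "111", abstractText = "Abstract of the first record."),
  list(pmid = "222")  # this record has no abstractText field
)

# One column per field; records lacking a field get NA instead of an error.
abstracts_df <- tibble(
  pmid     = map_chr(raw_records, "pmid", .default = NA_character_),
  abstract = map_chr(raw_records, "abstractText", .default = NA_character_)
)

abstracts_df
```

Using `.default = NA_character_` keeps every column the same length as the list, so rows stay aligned even when a record has no abstract.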

@bes827
Author

bes827 commented Sep 23, 2021

Thank you. This works as expected.
I have another, similar question (I can open another issue if you think this is more appropriate). How would I extract the journal title? I tried the code below, but it did not work.

I know this is more of a purrr question rather than europepmc but I appreciate your help :)

dat <- out %>% {
  tibble(
    title = map_chr(., "title", .null = NA_integer_),
    abstract = map_chr(., "abstractText", .null = NA_integer_),
    pmid = map_chr(., "pmid", .null = NA_integer_),
    journal = map(., "journalInfo$journal$title", .null = NA_integer_)
  )
}
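A likely reason the journal column comes back empty: purrr treats a string such as "journalInfo$journal$title" as a single element name, not as a path. One possible fix is to pass a list of names, which purrr uses to pluck through nested levels. The sketch below assumes the raw records nest the title under journalInfo, then journal, then title, as the attempt above implies; `records` is hypothetical mock data, not real Europe PMC output.

```r
library(purrr)
library(tibble)

# Hypothetical mock of two raw records; the nesting is an assumption
# taken from the "journalInfo$journal$title" attempt above.
records <- list(
  list(
    title = "Paper A",
    pmid = "111",
    journalInfo = list(journal = list(title = "Journal One"))
  ),
  list(title = "Paper B", pmid = "222")  # no journalInfo at all
)

dat <- tibble(
  title   = map_chr(records, "title", .default = NA_character_),
  pmid    = map_chr(records, "pmid", .default = NA_character_),
  # A list of names is followed level by level; a missing level yields .default.
  journal = map_chr(records, list("journalInfo", "journal", "title"),
                    .default = NA_character_)
)

dat
```

Because `map_chr()` is used for the journal column as well, it comes out as a plain character column instead of a list column.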
