Problems parsing .bib from Web of Science #31

Open
gorkang opened this issue Jul 19, 2019 · 12 comments

@gorkang

gorkang commented Jul 19, 2019

I downloaded a .bib file from Web of Science (savedrecs.zip), and there are multiple issues when reading it. The solution shown in #21 doesn't work here :(

Most of them seem to be related to what @ottlngr mentioned in #21 (key-value pairs not separated by line breaks):

  • AUTHORS: Authors that are not on the first line are lost
  • ABSTRACT: Only the first line of the abstract is imported

But other issues seem to have a different cause:

  • A bunch of extra columns appear (for a simplified case, see [A] below)

[A] single_reference.zip
When reading this bib reference, each of the following lines of the abstract creates a new column (the first word of the line becomes the column title, and the text in the cell is whatever comes after the "="):

  • benefits and harms; n = 451) or non-evidence-based (e.g., relative risks
  • on benefits only; n = 446) patient information about a cancer screening
  • non-evidence-based patient information (n = 446), a mean of 33.1% of
  • whereas with evidence-based patient information (n = 451), only half as

So, the first of those creates a BENEFITS column containing the text "451) or non-evidence-based (e.g., relative risks".
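The behaviour described above is consistent with a parser that treats every line containing "=" as its own key-value pair. The following R sketch reproduces that effect in isolation. It is not taken from bib2df's source; the first line of `entry` is made up for illustration, and the naive splitting rule is an assumption about what might be happening.

```r
# Three lines in the style of a Web of Science abstract field.
# The first line is invented for illustration; the other two come from the report above.
entry <- c(
  "Abstract = {Illustrative first line (e.g., absolute risks on",
  "   benefits and harms; n = 451) or non-evidence-based (e.g., relative risks",
  "   on benefits only; n = 446) patient information about a cancer screening"
)

# Assumed naive rule: any line that contains "=" is a key-value pair.
has_eq <- grepl("=", entry, fixed = TRUE)

# First word of the line, upper-cased, becomes the would-be column name.
key <- toupper(sub("^[[:space:]]*([[:alnum:]-]+).*$", "\\1", entry[has_eq]))

# Everything after the first "=" becomes the would-be cell value.
value <- trimws(sub("^[^=]*=", "", entry[has_eq]))

print(data.frame(key = key, value = value))
```

Under this assumed rule, the second line yields the key `BENEFITS`, which matches the spurious column reported above.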

Please let me know if I can help with testing or debugging this.

@ottlngr
Contributor

ottlngr commented Jul 26, 2019

Hi, thanks for your message.

This seems to happen because of the multi-line values in this particular .bib file. I'll have to play with it a bit to see what can be improved in bib2df to avoid this behaviour.
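One possible workaround for multi-line values is to join continuation lines onto the field they belong to before the file is parsed. The sketch below is a hypothetical helper (`merge_continuation_lines` is not part of bib2df or flex_bib) and assumes that every real field starts with `key = {` or `key = "`, which appears to hold for Web of Science exports but not for every .bib file. The sample entry is invented apart from the two abstract lines quoted in the report.

```r
# Hypothetical helper: join continuation lines onto the preceding field line.
merge_continuation_lines <- function(lines) {
  lines <- lines[nzchar(trimws(lines))]          # drop blank lines
  if (length(lines) == 0) return(character(0))
  # A line opens a new unit if it starts an entry ("@..."), is a lone "}",
  # or looks like `key = {` / `key = "`. Everything else is a continuation.
  starter <- grepl("^[[:space:]]*@", lines) |
    grepl("^[[:space:]]*[}][[:space:]]*$", lines) |
    grepl("^[[:space:]]*[[:alpha:]][[:alnum:]_-]*[[:space:]]*=[[:space:]]*[{\"]", lines)
  starter[1] <- TRUE
  group <- cumsum(starter)                       # same id for a field and its continuations
  merged <- tapply(trimws(lines), group, paste, collapse = " ")
  unname(as.character(merged))
}

raw <- c(
  "@article{ example_key,",
  "Author = {Author, A. and",
  "   Author, B.},",
  "Abstract = {Illustrative first line (e.g., absolute risks on",
  "   benefits and harms; n = 451) or non-evidence-based (e.g., relative risks",
  "   on benefits only; n = 446) patient information},",
  "Year = {2019},",
  "}"
)

merged <- merge_continuation_lines(raw)
print(merged)

# Possible next step (not verified here): write the merged lines to a
# temporary file and read that file with bib2df.
# tmp <- tempfile(fileext = ".bib")
# writeLines(merged, tmp)
# df <- bib2df::bib2df(tmp)
```

Requiring an opening brace or quote after the "=" is what keeps a continuation line such as `n = 451) ...` from being mistaken for a new field. The trade-off is that unbraced values (for example `year = 2019`) would be glued onto the previous field, so this rule would need adjusting for such files.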

@jjsantana

Any news on this issue? I have the same problem. I have downloaded a bib file from Web of Science and anything after a line break (e.g. all of the abstracts) is excluded from the dataframe. I really like your package otherwise, and hope that you are able to resolve this critical problem!

@paulcbauer
Contributor

@ottlngr we ran into the same issue (our code builds on bib2df). Maybe the function here could constitute the basis for a solution (not sure how robust it is): https://github.com/paulcbauer/flex_bib/blob/master/merge_bib_lines.R

@jjsantana maybe this helps: https://github.com/paulcbauer/flex_bib#caveats

ottlngr added a commit that referenced this issue Jul 2, 2020
@ottlngr
Contributor

ottlngr commented Jul 2, 2020

@paulcbauer I added a test case that covers this issue. Of course it fails at the moment, but feel free to try integrating your function and see if the test succeeds.

@paulcbauer
Contributor

paulcbauer commented Jul 7, 2020 via email

@ottlngr
Contributor

ottlngr commented Jul 10, 2020

Cool, thanks for the effort. I will have a closer look at it.

@paulcbauer
Contributor

paulcbauer commented Jul 10, 2020 via email

@xiaofanliang

@paulcbauer's suggestion of the merge_bib_lines function (https://github.com/paulcbauer/flex_bib#caveats) works for me as a temporary solution (thank you!). It can also process .bib files that contain multiple entries.

@robertberryuk

Hi there

I wondered whether there is an update on this issue. I'm unable to import full abstracts from WoS .bib files and cannot get the above solutions to work. Thanks.

@robertberryuk

Apologies - I did get @paulcbauer's merge_bib_lines function to work and it solved the issue with import of incomplete abstracts - many thanks.

@robertberryuk

The problem I have now is that the merge_bib_lines function does not parse text properly when the character "=" is encountered. Any ideas? Thanks.
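One way to sidestep "=" inside values is to ignore "=" when deciding where a field ends and track brace depth instead: a field is complete once the braces opened on its first line are closed again. The sketch below is a hypothetical helper (`merge_by_brace_depth` is not part of flex_bib or bib2df) and says nothing about how merge_bib_lines works internally. It assumes brace-delimited values with balanced, unescaped braces, and the sample entry is invented for illustration.

```r
# Hypothetical helper: group lines into fields by brace depth, never by "=".
merge_by_brace_depth <- function(lines) {
  lines <- lines[nzchar(trimws(lines))]          # drop blank lines
  if (length(lines) == 0) return(character(0))
  count_char <- function(x, ch) {
    lengths(regmatches(x, gregexpr(ch, x, fixed = TRUE)))
  }
  delta <- count_char(lines, "{") - count_char(lines, "}")
  depth_after  <- cumsum(delta)                  # depth once each line is consumed
  depth_before <- c(0, head(depth_after, -1))    # depth when each line starts
  # Depth 0 is outside any entry, depth 1 is inside "@type{" but outside
  # a field value. A line starting at depth 2 or more continues a value.
  starter <- depth_before <= 1
  group <- cumsum(starter)
  unname(as.character(tapply(trimws(lines), group, paste, collapse = " ")))
}

raw <- c(
  "@article{ example_key,",
  "Abstract = {Illustrative first line with",
  "   n = 451) in one group and",
  "   n = 446) in the other},",
  "Year = {2019},",
  "}"
)

merged <- merge_by_brace_depth(raw)
print(merged)
```

Here the continuation lines begin with `n = ...`, yet they stay inside the abstract because the depth is still 2 when they start. Values delimited by quotes, escaped braces such as `\{`, or unbalanced braces inside an abstract would break this simple count.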

@paulcbauer
Contributor

paulcbauer commented Jun 3, 2021 via email
