Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrite code of MedlineImporter #9673

Merged
merged 8 commits into from
Mar 18, 2023
Merged

Conversation

aqurilla
Copy link
Contributor

@aqurilla aqurilla commented Mar 16, 2023

This fixes #9536 by rewriting the org.jabref.logic.importer.fileformat.MedlineImporter class to use a StAX-Parser, and removes all JAXB dependencies from the class. The corresponding xjc task is also removed from build.gradle.

  • Change in CHANGELOG.md described in a way that is understandable for the average user (if applicable)
  • Tests created for changes (if applicable)
  • Manually tested changed features in running JabRef (always required)
  • Screenshots added in PR description (for UI changes)
  • Checked developer's documentation: Is the information available and up to date? If not, I outlined it in this pull request.
  • Checked documentation: Is the information available and up to date? If not, I created an issue at https://github.com/JabRef/user-documentation/issues or, even better, I submitted a pull request to the documentation repository.

@aqurilla aqurilla marked this pull request as ready for review March 16, 2023 06:53
@Siedlerchr
Copy link
Member

Thanks! Looks good! can you take a look at #4273 as well?

@aqurilla
Copy link
Contributor Author

Sure, I'll check that one!

Copy link
Member

@koppor koppor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much for working on this! 🎉

I have micro nitpicks 😇 - otherwise, looks good!

@aqurilla
Copy link
Contributor Author

I think the AstrophysicsDataSystemTest failed due to some network issues? I did not make any changes related to those files

@ThiloteE ThiloteE added type: code-quality Issues related to code or architecture decisions import labels Mar 17, 2023
@koppor
Copy link
Member

koppor commented Mar 17, 2023

Yes, their servers return server error:

java.io.IOException: org.jabref.logic.importer.FetcherServerException: Encountered HTTP Status Code 504

@koppor
Copy link
Member

koppor commented Mar 17, 2023

I would go ahead to merge this as is - and do fixing of #4273 as follow-up?

@aqurilla
Copy link
Contributor Author

I found that Abstract and Title xml elements containing italics, bold tags etc. were getting cut short, so updated the code to handle this.
Agreed that MathML support should be a separate PR since it looks like it will require handling a large number of tags!

Copy link
Member

@koppor koppor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think, it's an improvement and I go forward with merge.

Regarding the italics:

The apolipoprotein E (
                        <i>APOE</i>) gene on chromosome

should get

The apolipoprotein E (\textit{APOE}) gene on chromosome

Since it are very rare cases of HTML tags, it should be "easy". Maybe, this can be implemented when converting MathML?

@koppor koppor merged commit 23a0c44 into JabRef:main Mar 18, 2023
Siedlerchr added a commit that referenced this pull request Mar 18, 2023
* upstream/main: (26 commits)
  Remove bibtexml.xsd (We don't support BibTeXML any more)
  Rewrite code of MedlineImporter (#9673)
  Rename variable and add more documentation
  Fix checkstyle
  Add hanling of "field = {content\}"
  Add link to PR
  Add link to PR.
  JabRef writes a new backup file only if there is a change.
  Add debug output to log.txt
  Refine logging howto
  Change log level for non-found files during PDF indexing
  Initial ImportHandlerTest
  Streamline AppDirs paths
  add a clear history button in sub-menu
  Removed redundant list
  add for en.properties
  add Localization.lang for string
  correct changelog
  add test for getLastSearchHistory and change HashLinkedSet to ArrayList
  fix the checkstlye
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
import type: code-quality Issues related to code or architecture decisions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Rewrite code of MedlineImporter
4 participants