[WIP] Extract PDF References #10437

Merged
merged 14 commits into from Mar 12, 2024

Conversation

aqurilla
Contributor

@aqurilla aqurilla commented Oct 1, 2023

This fixes #10200 by implementing reference extraction from PDF files
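To illustrate the kind of step reference extraction involves, here is a minimal sketch. It is not the code from this pull request. It assumes the text of the PDF's reference section has already been extracted by a PDF library, and that entries start with numeric markers such as `[1]`. The class `ReferenceSplitter` and its methods are hypothetical names chosen for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper (not part of JabRef): splits the plain text of a
 * reference section into one string per reference entry.
 */
final class ReferenceSplitter {

    // Assumption: each entry begins with a numeric marker like "[12]" at the start of a line.
    private static final Pattern MARKER = Pattern.compile("(?m)^\\s*\\[(\\d+)\\]\\s*");

    private ReferenceSplitter() {
    }

    static List<String> split(String referenceSection) {
        List<String> entries = new ArrayList<>();
        Matcher matcher = MARKER.matcher(referenceSection);
        int entryStart = -1; // index where the current entry's text begins
        while (matcher.find()) {
            if (entryStart >= 0) {
                // Text between the previous marker and this one is one entry.
                entries.add(normalize(referenceSection.substring(entryStart, matcher.start())));
            }
            entryStart = matcher.end();
        }
        if (entryStart >= 0) {
            // The last entry runs to the end of the section.
            entries.add(normalize(referenceSection.substring(entryStart)));
        }
        return entries;
    }

    // Joins wrapped lines by collapsing all whitespace runs into single spaces.
    private static String normalize(String raw) {
        return raw.replaceAll("\\s+", " ").trim();
    }
}
```

Each resulting string would still need to be parsed into fields (author, title, year) by a separate step. The sketch does not handle author-year styles without numeric markers or words hyphenated across line breaks.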

Mandatory checks

  • Change in CHANGELOG.md described in a way that is understandable for the average user (if applicable)
  • Tests created for changes (if applicable)
  • Manually tested changed features in running JabRef (always required)
  • Screenshots added in PR description (for UI changes)
  • Checked developer's documentation: Is the information available and up to date? If not, I outlined it in this pull request.
  • Checked documentation: Is the information available and up to date? If not, I created an issue at https://github.com/JabRef/user-documentation/issues or, even better, I submitted a pull request to the documentation repository.

@koppor
Member

koppor commented Oct 24, 2023

I added an initial test pdf at aqurilla#1.

@koppor koppor added the import label Dec 18, 2023
@BBC-Esq

BBC-Esq commented Dec 23, 2023

Interested in this as an attorney and extracting numerous legal citations...

@koppor
Member

koppor commented Dec 24, 2023

> Interested in this as an attorney and extracting numerous legal citations...

Can you share example PDFs that we can license under the MIT license, or that are at least shareable under a different license? 😅

@koppor
Member

koppor commented Mar 11, 2024

Screenshots:

  1. Activate functionality

image

  2. Result

image


For completeness, the PDF part with references:

image

@koppor koppor marked this pull request as ready for review March 11, 2024 23:48
@calixtus calixtus enabled auto-merge March 11, 2024 23:49
@koppor
Member

koppor commented Mar 11, 2024

@aqurilla Very nice work! Thank you for working on this! Sorry that we did not give feedback earlier.

@calixtus calixtus added this pull request to the merge queue Mar 11, 2024
Merged via the queue into JabRef:main with commit 4c64706 Mar 12, 2024
20 checks passed
Siedlerchr added a commit to Frequinzy/jabref that referenced this pull request Mar 13, 2024
* upstream/main: (36 commits)
  chore: remove repetitive words (JabRef#11015)
  Fix test names (JabRef#11014)
  Remove obsolete "Comments" tab configuration (JabRef#11011)
  Fix "Other fields" tab respecting custom tabs (JabRef#11012)
  [WIP] Extract PDF References (JabRef#10437)
  Fixed jump to entry from crossref (JabRef#11009)
  fix suggestion provider for crossref field (JabRef#10962)
  Use SequencedSet for required and optional fields (JabRef#11007)
  Bump io.github.classgraph:classgraph from 4.8.165 to 4.8.168 (JabRef#11005)
  Bump org.glassfish.hk2:hk2-api from 3.0.6 to 3.1.0 (JabRef#11006)
  Bump org.apache.logging.log4j:log4j-to-slf4j from 2.23.0 to 2.23.1 (JabRef#11003)
  Bump org.javamodularity.moduleplugin from 1.8.14 to 1.8.15 (JabRef#11002)
  Bump jakarta.xml.bind:jakarta.xml.bind-api from 4.0.1 to 4.0.2 (JabRef#11004)
  Bump softprops/action-gh-release from 1 to 2 (JabRef#11000)
  Bump gittools/actions from 0.13.2 to 0.13.4 (JabRef#11001)
  Update custom-svg-icons.md (JabRef#10999)
  Update Texworks icon (JabRef#10998)
  Use tags editor for auto completion preferences (JabRef#10990)
  Enable auto merge of CHANGELOG.md (JabRef#10986)
  Enhance DOI parser to deal with special characters (JabRef#10989)
  ...

# Conflicts:
#	build.gradle
Siedlerchr added a commit that referenced this pull request Mar 17, 2024
* upstream/main: (26 commits)
  Speed up failure reporting (#11030)
  Importing of BibDesk Groups and Linked Files (#10968)
  Convert RemoveBracesFormatterTest to @ParameterizedTest (#11033)
  Update teaching.md
  Remove non-existing recipe (#11029)
  Update CSL styles (#11031)
  Clean up defintions of entry types (#11013)
  Fix log file path on Windows (#11028)
  Change to rolling logs (#11023)
  chore: remove repetitive words (#11015)
  Fix test names (#11014)
  Remove obsolete "Comments" tab configuration (#11011)
  Fix "Other fields" tab respecting custom tabs (#11012)
  [WIP] Extract PDF References (#10437)
  Fixed jump to entry from crossref (#11009)
  fix suggestion provider for crossref field (#10962)
  Use SequencedSet for required and optional fields (#11007)
  Bump io.github.classgraph:classgraph from 4.8.165 to 4.8.168 (#11005)
  Bump org.glassfish.hk2:hk2-api from 3.0.6 to 3.1.0 (#11006)
  Bump org.apache.logging.log4j:log4j-to-slf4j from 2.23.0 to 2.23.1 (#11003)
  ...

# Conflicts:
#	src/main/resources/csl-styles
@aqurilla
Contributor Author

aqurilla commented Apr 6, 2024

@koppor no problem, thank you!

@koppor
Member

koppor commented Apr 7, 2024

> @koppor no problem, thank you!

@aqurilla Just as a side note: I implemented the offline parsing in #11156. Thanks to your "framework", I could focus on the logic part!

@aqurilla
Contributor Author

aqurilla commented Apr 7, 2024

@koppor that is great to hear! Thanks for adding the offline functionality for this feature.

Development

Successfully merging this pull request may close these issues.

feature request: extract pdf references
7 participants