@@ -42,6 +42,7 @@
import org.jabref.model.entry.field.InternalField;
import org.jabref.model.entry.field.StandardField;
import org.jabref.model.entry.identifier.ArXivIdentifier;
import org.jabref.logic.importer.util.UrlIdentifierParser;
import org.jabref.model.entry.identifier.DOI;
import org.jabref.model.entry.types.StandardEntryType;
import org.jabref.model.paging.Page;
@@ -339,7 +340,7 @@ public Page<BibEntry> performSearchPaged(BaseQueryNode queryNode, int pageNumber
public Optional<BibEntry> performSearchById(String identifier) throws FetcherException {
CompletableFuture<Optional<BibEntry>> arXivBibEntryPromise = arXiv.asyncPerformSearchById(identifier);
if (this.doiFetcher != null) {
inplaceAsyncInfuseArXivWithDoi(arXivBibEntryPromise, ArXivIdentifier.parse(identifier));
inplaceAsyncInfuseArXivWithDoi(arXivBibEntryPromise, UrlIdentifierParser.parseArXiv(identifier));
}
Comment on lines 340 to 344
2. arXiv PDF URL fails 🐞 Bug · Correctness

ArXivFetcher.performSearchById still calls arXiv.asyncPerformSearchById(identifier) with the raw input, so https://arxiv.org/pdf/<id>.pdf fails because ArXivIdentifier.parse rejects the trailing .pdf. The new UrlIdentifierParser.parseArXiv result is only used for the DOI-infusion optimization, not for the actual arXiv lookup, so the feature doesn’t work end-to-end for PDF URLs.
Agent Prompt
### Issue description
`ArXivFetcher.performSearchById` still passes the raw user input into `arXiv.asyncPerformSearchById(...)`. For inputs like `https://arxiv.org/pdf/2203.02155.pdf`, `ArXivIdentifier.parse(...)` rejects the `.pdf` suffix, so the actual fetch returns empty even though `UrlIdentifierParser.parseArXiv` can normalize this URL.

### Issue Context
`UrlIdentifierParser.parseArXiv(identifier)` is currently only used to accelerate DOI infusion, not to normalize the identifier used in the actual arXiv fetch.

### Fix Focus Areas
- jablib/src/main/java/org/jabref/logic/importer/fetcher/ArXivFetcher.java[339-346]

### Suggested fix
1. Parse once at the start of `performSearchById`:
   - `Optional<ArXivIdentifier> parsed = UrlIdentifierParser.parseArXiv(identifier);`
2. If `parsed.isEmpty()`, return `Optional.empty()` (or keep existing behavior, but avoid calling the API with a non-parseable URL).
3. Call `arXiv.asyncPerformSearchById(parsed.get().asString())` instead of using `identifier`.
4. Pass `parsed` into `inplaceAsyncInfuseArXivWithDoi(...)` so both the fetch and the optimization use the same normalized value.
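The suggested steps can be sketched without the JabRef classes by reusing the PR's own `ARXIV_URL_PATTERN`. `normalize` below is a hypothetical stand-in for `UrlIdentifierParser.parseArXiv` (the real method returns `Optional<ArXivIdentifier>`); it shows why parsing once up front fixes the PDF-URL case:

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ArXivNormalizeDemo {

    // Same pattern as the PR's UrlIdentifierParser.ARXIV_URL_PATTERN.
    private static final Pattern ARXIV_URL_PATTERN =
            Pattern.compile("https?://arxiv\\.org/(?:abs|pdf)/([\\w.\\-]+?)(?:\\.pdf)?$");

    // Hypothetical stand-in for UrlIdentifierParser.parseArXiv: strips the URL
    // wrapper and the ".pdf" suffix. The real code would additionally validate
    // the result via ArXivIdentifier.parse.
    static Optional<String> normalize(String input) {
        Matcher matcher = ARXIV_URL_PATTERN.matcher(input.trim());
        if (matcher.find()) {
            return Optional.of(matcher.group(1));
        }
        return Optional.of(input.trim());
    }

    public static void main(String[] args) {
        // The fetch should use this normalized value, not the raw input:
        System.out.println(normalize("https://arxiv.org/pdf/2203.02155.pdf").get()); // 2203.02155
        System.out.println(normalize("https://arxiv.org/abs/2203.02155").get());     // 2203.02155
        System.out.println(normalize("2203.02155").get());                           // 2203.02155
    }
}
```

With a single up-front parse, both `arXiv.asyncPerformSearchById(...)` and `inplaceAsyncInfuseArXivWithDoi(...)` would receive the same normalized identifier.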


return arXivBibEntryPromise.join();
}
@@ -31,6 +31,7 @@
import org.jabref.model.entry.BibEntry;
import org.jabref.model.entry.field.StandardField;
import org.jabref.model.entry.identifier.DOI;
import org.jabref.logic.importer.util.UrlIdentifierParser;
import org.jabref.model.entry.types.StandardEntryType;
import org.jabref.model.util.DummyFileUpdateMonitor;
import org.jabref.model.util.OptionalUtil;
@@ -88,7 +89,7 @@ public Optional<HelpFile> getHelpPage() {
private void doAPILimiting(String identifier) {
// Without a generic API Rate Limiter implemented on the project, use Guava's RateLimiter for avoiding
// API throttling when multiple threads are working, especially during DOI Content Negotiations
Optional<DOI> doi = DOI.parse(identifier);
Optional<DOI> doi = UrlIdentifierParser.parseDOI(identifier);

try {
Optional<String> agency;
@@ -121,7 +122,7 @@ protected CompletableFuture<Optional<BibEntry>> asyncPerformSearchById(String id

@Override
public Optional<BibEntry> performSearchById(String identifier) throws FetcherException {
DOI doi = DOI.parse(identifier)
DOI doi = UrlIdentifierParser.parseDOI(identifier)
.orElseThrow(() -> new FetcherException(Localization.lang("Invalid DOI: '%0'.", identifier)));
qodo-free-for-open-source-projects[bot] marked this conversation as resolved.

URL doiURL;
@@ -141,7 +142,7 @@ public Optional<BibEntry> performSearchById(String identifier) throws FetcherExc
throw new FetcherException("Invalid URL", e);
}
if (agency.isPresent() && "medra".equalsIgnoreCase(agency.get())) {
return new Medra().performSearchById(identifier);
return new Medra().performSearchById(doi.asString());
}

URLDownload download = getUrlDownload(doiURL);
@@ -0,0 +1,59 @@
package org.jabref.logic.importer.util;

import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.jabref.model.entry.identifier.ArXivIdentifier;
import org.jabref.model.entry.identifier.DOI;

/**
* Parses identifiers from URLs and plain text.
* Extracts DOI, arXiv ID, etc. from various URL formats.
*/
public class UrlIdentifierParser {

private static final Pattern DOI_URL_PATTERN =
Pattern.compile("https?://(?:dx\\.)?doi\\.org/(.+)");

private static final Pattern DOI_ACM_PATTERN =
Pattern.compile("https?://dl\\.acm\\.org/doi/(?:abs/)?(.+)");

private static final Pattern ARXIV_URL_PATTERN =
Pattern.compile("https?://arxiv\\.org/(?:abs|pdf)/([\\w.\\-]+?)(?:\\.pdf)?$");

public static Optional<DOI> parseDOI(String input) {
if (input == null || input.isBlank()) {
return Optional.empty();
}

String trimmedInput = input.trim();

Matcher doiUrlMatcher = DOI_URL_PATTERN.matcher(trimmedInput);
if (doiUrlMatcher.find()) {
return DOI.parse(doiUrlMatcher.group(1));
}

Matcher acmMatcher = DOI_ACM_PATTERN.matcher(trimmedInput);
if (acmMatcher.find()) {
return DOI.parse(acmMatcher.group(1));
}
Comment on lines +19 to +40

4. ACM DOI regex rejects PDF URLs 🐞 Bug · Correctness

UrlIdentifierParser.parseDOI short-circuits on DOI_ACM_PATTERN and captures everything after /doi/, so URLs like https://dl.acm.org/doi/pdf/10.... are turned into pdf/10.... and then rejected by DOI.parse. This is a regression because DOI.parse is already able to extract a DOI embedded later in an arbitrary https URL.
Agent Prompt
### Issue description
`UrlIdentifierParser.parseDOI` uses `DOI_ACM_PATTERN = https?://dl.acm.org/doi/(?:abs/)?(.+)` and returns `DOI.parse(acmMatcher.group(1))`. For common ACM URLs such as `/doi/pdf/10.1145/...`, this extracts `pdf/10.1145/...` and causes parsing to fail.

### Issue Context
`DOI.parse(...)` is already designed to handle many URL forms by allowing an arbitrary `https?://...` prefix before the DOI group.

### Fix Focus Areas
- jablib/src/main/java/org/jabref/logic/importer/util/UrlIdentifierParser.java[16-43]

### Suggested fix options
Option A (simplest/robust):
- Remove the special-case ACM/doi.org regexes and just `return DOI.parse(trimmedInput);` (or use `DOI.findInText(trimmedInput)` first if you want to safely ignore query/fragment junk).

Option B (keep special-cases):
- Tighten the ACM pattern to capture only a DOI starting with `10.` and stop at query/fragment:
  - e.g., `https?://dl\\.acm\\.org/doi/(?:abs/|pdf/|full/)?(10\\.[^\\s?#]+)`
- Use `matches()` (or anchor with `^...$`) instead of `find()` so you don’t accidentally capture trailing unrelated text.
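Option B can be sketched as a self-contained regex demo. The tightened pattern below is an assumption: it is anchored at the start of the URL, and the capture is limited to a DOI beginning with `10.` and stopped before whitespace, `?`, or `#`, so `/doi/pdf/...` links and trailing query strings are both handled:

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AcmDoiPatternDemo {

    // Tightened candidate for DOI_ACM_PATTERN: the capture must start with
    // "10." and stops before whitespace, a query ("?"), or a fragment ("#").
    private static final Pattern TIGHTENED_ACM_PATTERN =
            Pattern.compile("^https?://dl\\.acm\\.org/doi/(?:abs/|pdf/|full/)?(10\\.[^\\s?#]+)");

    static Optional<String> extractDoi(String url) {
        // "^" anchors the match at the start of the input; the capture group
        // still stops cleanly before any trailing query string or fragment.
        Matcher matcher = TIGHTENED_ACM_PATTERN.matcher(url.trim());
        return matcher.find() ? Optional.of(matcher.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(extractDoi("https://dl.acm.org/doi/pdf/10.1145/3544548.3580995"));           // Optional[10.1145/3544548.3580995]
        System.out.println(extractDoi("https://dl.acm.org/doi/10.1145/3544548.3580995?download=true")); // Optional[10.1145/3544548.3580995]
        System.out.println(extractDoi("https://example.com/doi/pdf/10.1145/x"));                        // Optional.empty
    }
}
```

Note the anchored `find()` here instead of `matches()`: a full-string `matches()` would reject otherwise-valid URLs carrying a query string, while the `10.`-prefixed capture already prevents accidentally grabbing `pdf/` or other path segments.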



return DOI.parse(trimmedInput);
}

public static Optional<ArXivIdentifier> parseArXiv(String input) {
if (input == null || input.isBlank()) {
return Optional.empty();
}

String trimmedInput = input.trim();

Matcher arxivMatcher = ARXIV_URL_PATTERN.matcher(trimmedInput);
if (arxivMatcher.find()) {
return ArXivIdentifier.parse(arxivMatcher.group(1));
}

return ArXivIdentifier.parse(trimmedInput);
}
}
@@ -0,0 +1,103 @@
package org.jabref.logic.importer.util;

import org.jabref.model.entry.identifier.ArXivIdentifier;
import org.jabref.model.entry.identifier.DOI;

import org.junit.jupiter.api.Test;

import static org.junit.jupiter.api.Assertions.assertFalse;
import static org.junit.jupiter.api.Assertions.assertTrue;

class UrlIdentifierParserTest {

@Test
void parseDOIFromPlainDOI() {
String input = "10.1145/3544548.3580995";
assertTrue(UrlIdentifierParser.parseDOI(input).isPresent());
}

@Test
void parseDOIFromDoiOrgURL() {
String input = "https://doi.org/10.1145/3544548.3580995";
assertTrue(UrlIdentifierParser.parseDOI(input).isPresent());
}

@Test
void parseDOIFromDxDoiOrgURL() {
String input = "https://dx.doi.org/10.1145/3544548.3580995";
assertTrue(UrlIdentifierParser.parseDOI(input).isPresent());
}

@Test
void parseDOIFromHTTPURL() {
String input = "http://doi.org/10.1145/3544548.3580995";
assertTrue(UrlIdentifierParser.parseDOI(input).isPresent());
}

@Test
void parseDOIFromACMDigitalLibrary() {
String input = "https://dl.acm.org/doi/10.1145/3544548.3580995";
assertTrue(UrlIdentifierParser.parseDOI(input).isPresent());
}

@Test
void parseDOIFromACMAbsURL() {
String input = "https://dl.acm.org/doi/abs/10.1145/3544548.3580995";
assertTrue(UrlIdentifierParser.parseDOI(input).isPresent());
}

@Test
void parseDOIReturnsEmptyForNull() {
assertFalse(UrlIdentifierParser.parseDOI(null).isPresent());
}

@Test
void parseDOIReturnsEmptyForEmptyString() {
assertFalse(UrlIdentifierParser.parseDOI("").isPresent());
}

@Test
void parseDOIReturnsEmptyForInvalidURL() {
assertFalse(UrlIdentifierParser.parseDOI("https://example.com").isPresent());
}

@Test
void parseArXivFromPlainID() {
String input = "2203.02155";
assertTrue(UrlIdentifierParser.parseArXiv(input).isPresent());
}

@Test
void parseArXivFromAbsURL() {
String input = "https://arxiv.org/abs/2203.02155";
assertTrue(UrlIdentifierParser.parseArXiv(input).isPresent());
}

@Test
void parseArXivFromPDFURL() {
String input = "https://arxiv.org/pdf/2203.02155.pdf";
assertTrue(UrlIdentifierParser.parseArXiv(input).isPresent());
}

@Test
void parseArXivFromHTTPURL() {
String input = "http://arxiv.org/abs/2203.02155";
assertTrue(UrlIdentifierParser.parseArXiv(input).isPresent());
}

@Test
void parseArXivReturnsEmptyForNull() {
assertFalse(UrlIdentifierParser.parseArXiv(null).isPresent());
}

@Test
void parseArXivReturnsEmptyForInvalidURL() {
assertFalse(UrlIdentifierParser.parseArXiv("https://example.com").isPresent());
}

@Test
void parseArXivHandlesOldIDFormat() {
String input = "https://arxiv.org/abs/math.GT/0309136";
assertTrue(UrlIdentifierParser.parseArXiv(input).isPresent());
}
Comment on lines +13 to +102
1. Tests only assert presence 📘 Rule violation · Correctness

UrlIdentifierParserTest uses assertTrue(optional.isPresent()) / assertFalse(optional.isPresent()) instead of asserting the exact parsed DOI/arXiv value. This weakens test precision and can allow incorrect-but-present parsing results to pass.
Agent Prompt
## Issue description
The new tests only assert `Optional.isPresent()` / `isPresent()`-negation, which is a weak predicate and does not verify that the parser extracted the correct DOI/arXiv identifier.

## Issue Context
Per test compliance, assertions should compare against the full expected value/structure (e.g., `assertEquals(expectedOptional, actualOptional)`).

## Fix Focus Areas
- jablib/src/test/java/org/jabref/logic/importer/util/UrlIdentifierParserTest.java[13-102]
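The stronger assertion style can be illustrated with a hypothetical String-based stand-in for `parseDOI` (in the real JUnit test, `assertEquals` would compare `Optional<DOI>` values built via `DOI.parse(...)`):

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StrictAssertionDemo {

    private static final Pattern DOI_URL_PATTERN =
            Pattern.compile("https?://(?:dx\\.)?doi\\.org/(.+)");

    // Hypothetical stand-in for UrlIdentifierParser.parseDOI that returns the
    // raw DOI string instead of a DOI object.
    static Optional<String> parseDoi(String input) {
        Matcher matcher = DOI_URL_PATTERN.matcher(input.trim());
        return matcher.find() ? Optional.of(matcher.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        Optional<String> actual = parseDoi("https://doi.org/10.1145/3544548.3580995");

        // Weak (what the PR's tests do): passes even if the wrong DOI is extracted.
        boolean weak = actual.isPresent();

        // Strong (what the review asks for): compares the full expected value,
        // mirroring assertEquals(Optional.of(expected), actual) in JUnit.
        boolean strong = actual.equals(Optional.of("10.1145/3544548.3580995"));

        System.out.println(weak + " " + strong); // true true
    }
}
```

Comparing the whole `Optional` also covers the empty cases: `assertEquals(Optional.empty(), parseDoi("https://example.com"))` fails loudly if the parser starts returning a garbage value.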


}