add paging support #5518

Merged: 8 commits, Nov 8, 2019
src/main/java/org/jabref/logic/importer/PagedSearchBasedFetcher.java
@@ -0,0 +1,26 @@
package org.jabref.logic.importer;

import org.jabref.model.entry.BibEntry;
import org.jabref.model.paging.Page;

public interface PagedSearchBasedFetcher extends SearchBasedFetcher {
JoHaHu marked this conversation as resolved.

Member

Could you add some JavaDoc here about the idea?

I see that:

a) one can jump to arbitrary pages
b) the supported page size is pre-defined

Would a method setPageSize() also make sense? Maybe a sub-interface PagedSearchBasedFetcherWithVariablePageSize (see the sketch below)?

The alternative solution could have been to iterate, as jcabi-github does it. However, with your approach, one can directly jump to a certain page and is not forced to query sequentially.
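
Roughly what such a sub-interface could look like (just a sketch, not a finished design):

package org.jabref.logic.importer;

// Sketch only: a possible sub-interface for fetchers that support a configurable page size.
public interface PagedSearchBasedFetcherWithVariablePageSize extends PagedSearchBasedFetcher {

    /**
     * Configures the number of entries requested per page.
     * Implementations may clamp the value to whatever the endpoint supports.
     */
    void setPageSize(int pageSize);
}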

Contributor Author

I tried to keep the fetcher as stateless as possible. If that is not required, I can provide an interface for that purpose.
It probably depends on where we want to configure the page size: in the application settings or as a dropdown?

Member

I would like to see it in a drop-down. Internally, it can be persisted as a preference, so that after a restart of JabRef the value is the same.

I can imagine that with pagination we need to be stateful. If I as a user go back and forth in the UI, I want quick responses. If a request is sent to the server each time, this will be slow.

I am aware that this will require a "refresh search results" button in the UI, which just resets the factory to trigger new searches.

I would even use Google Guava's Cache for caching the different search results across the different fetchers. The search results for publications won't differ from hour to hour, but rather from day to day or even week to week. It also has the possibility to limit the cache by memory size. I find the loading mechanism pretty nice as well, see https://github.com/google/guava/wiki/CachesExplained#from-a-callable:

CacheLoader<Key, Graph> loader = CacheLoader.from(key -> createExpensiveGraph(key));

Source: https://guava.dev/releases/snapshot/api/docs/com/google/common/cache/CacheLoader.html
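
Transferred to our case, a page cache could look roughly like this (a sketch only; 'fetcher' and 'query' are assumed to be available in the surrounding code):

// Sketch: cache the pages of one query for one fetcher, keyed by page number.
// Needs com.google.common.cache.{CacheBuilder, CacheLoader, LoadingCache} and java.util.concurrent.TimeUnit.
LoadingCache<Integer, Page<BibEntry>> pageCache = CacheBuilder.newBuilder()
        .maximumSize(100)                    // bound the number of cached pages
        .expireAfterWrite(1, TimeUnit.DAYS)  // search results hardly change within a day
        .build(new CacheLoader<Integer, Page<BibEntry>>() {
            @Override
            public Page<BibEntry> load(Integer pageNumber) throws FetcherException {
                return fetcher.performSearchPaged(query, pageNumber);
            }
        });

// get() loads the page on first access and returns the cached copy afterwards;
// it throws ExecutionException wrapping the FetcherException on failure.
Page<BibEntry> firstPage = pageCache.get(0);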

WDYT?

Member

I would let each fetcher decide the appropriate page size. Some fetchers might not support a variable page size, and even if they do, there might be restrictions (for example, some fetchers get a list of IDs and then fetch details for every ID; in this case you probably want only a small page size).

Contributor Author (@JoHaHu), Nov 8, 2019

> I would like to see it in a drop-down. Internally, it can be persisted as a preference, so that after a restart of JabRef the value is the same.

Then I think we don't need a setPageSize() method, because how getPageSize() is implemented is up to the fetcher. The fetcher can then use the Preferences API.

> I can imagine that with pagination we need to be stateful. If I as a user go back and forth in the UI, I want quick responses. If a request is sent to the server each time, this will be slow.

I would argue that the fetcher doesn't need to be stateless but should be stateless. Its sole responsibility is to fetch results. The ViewModel can then cache the results. Caching is possible in the fetcher as well, but the ViewModel is stateful anyway, so we don't add too much new complexity.

Contributor Author

> I would even use Google Guava's Cache for caching the different search results across the different fetchers. The search results for publications won't differ from hour to hour, but rather from day to day or even week to week. It also has the possibility to limit the cache by memory size. I find the loading mechanism pretty nice as well, see https://github.com/google/guava/wiki/CachesExplained#from-a-callable:

That's another reason to do it in the ViewModel.

Contributor Author (@JoHaHu), Nov 8, 2019

Maybe I can implement a caching decorator for the fetchers. That way caching becomes very easy, but instanceof checks will fail.
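
Something along these lines, just a sketch (the class name is made up; only the paging-related methods are shown, any further WebFetcher methods would delegate in the same way):

package org.jabref.logic.importer;

import java.util.List;

import org.jabref.model.entry.BibEntry;
import org.jabref.model.paging.Page;

import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;

// Sketch of a caching decorator: wraps any PagedSearchBasedFetcher and remembers fetched pages.
// As noted above, instanceof checks against the concrete fetcher class no longer work on the wrapper.
public class CachedPagedSearchBasedFetcher implements PagedSearchBasedFetcher {

    private final PagedSearchBasedFetcher delegate;
    private final Cache<String, Page<BibEntry>> cache = CacheBuilder.newBuilder()
                                                                    .maximumSize(100)
                                                                    .build();

    public CachedPagedSearchBasedFetcher(PagedSearchBasedFetcher delegate) {
        this.delegate = delegate;
    }

    @Override
    public Page<BibEntry> performSearchPaged(String query, int pageNumber) throws FetcherException {
        String key = query + "#" + pageNumber;
        Page<BibEntry> cached = cache.getIfPresent(key);
        if (cached == null) {
            cached = delegate.performSearchPaged(query, pageNumber);
            cache.put(key, cached);
        }
        return cached;
    }

    @Override
    public int getPageSize() {
        return delegate.getPageSize();
    }

    @Override
    public List<BibEntry> performSearch(String query) throws FetcherException {
        return delegate.performSearch(query);
    }

    @Override
    public String getName() {
        return delegate.getName();
    }
}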

Member

I was thinking that query optimization can best be done at the result provider. Thus, if JabRef queries 1000 elements at once, the following can happen:

  1. The provider supports 1000 entries at once --> they are returned (query optimization happens at the provider)
  2. The provider supports only 50 entries at once --> JabRef makes 20 calls to the provider (nearly no query optimization at the provider)

My thoughts on the factory were driven by the idea that the first setting should be supported.

That leads me to the idea that we might need maxPageSize and pageSize. Maybe even different page sizes for the view and for the fetcher...

We can surely separate the basic fetcher and some layer in between:

UI --> caching layer --> fetcher

Member

Caching is a nice idea, but not thaaaat important. I would say, first focus on the paging UI and then we can have a look at how to implement caching.


/**
* @param query search query sent to the endpoint
* @param pageNumber requested page number
* @return Page with search results
*/
Page<BibEntry> performSearchPaged(String query, int pageNumber) throws FetcherException;

/**
* @return default pageSize
*/
default int getPageSize() {
return 20;
}

@Override
default boolean supportsPaging() {
return true;
}
JoHaHu marked this conversation as resolved.
}
src/main/java/org/jabref/logic/importer/PagedSearchBasedParserFetcher.java
@@ -0,0 +1,16 @@
package org.jabref.logic.importer;

import java.net.MalformedURLException;
import java.net.URISyntaxException;
import java.net.URL;

public interface PagedSearchBasedParserFetcher extends SearchBasedParserFetcher, PagedSearchBasedFetcher {

/**
* Constructs a URL based on the query, size and page number.
* @param query the search query
* @param size the size of the page
* @param pageNumber the number of the page
* @return URL that points to the search results for the given query, page size and page number
*/
URL getURLForQuery(String query, int size, int pageNumber) throws URISyntaxException, MalformedURLException, FetcherException;
}
@@ -17,4 +17,8 @@ public interface SearchBasedFetcher extends WebFetcher {
* @return a list of {@link BibEntry}, which are matched by the query (may be empty)
*/
List<BibEntry> performSearch(String query) throws FetcherException;

default boolean supportsPaging() {
koppor marked this conversation as resolved.
return false;
}
}
@@ -22,15 +22,16 @@
import org.jabref.logic.importer.FetcherException;
import org.jabref.logic.importer.IdBasedParserFetcher;
import org.jabref.logic.importer.ImportFormatPreferences;
import org.jabref.logic.importer.PagedSearchBasedParserFetcher;
import org.jabref.logic.importer.ParseException;
import org.jabref.logic.importer.Parser;
import org.jabref.logic.importer.SearchBasedParserFetcher;
import org.jabref.logic.importer.fileformat.BibtexParser;
import org.jabref.logic.net.URLDownload;
import org.jabref.model.cleanup.FieldFormatterCleanup;
import org.jabref.model.entry.BibEntry;
import org.jabref.model.entry.field.StandardField;
import org.jabref.model.entry.field.UnknownField;
import org.jabref.model.paging.Page;
import org.jabref.model.strings.StringUtil;
import org.jabref.model.util.DummyFileUpdateMonitor;

@@ -42,7 +43,7 @@
/**
* Fetches data from the SAO/NASA Astrophysics Data System (https://ui.adsabs.harvard.edu/)
*/
public class AstrophysicsDataSystem implements IdBasedParserFetcher, SearchBasedParserFetcher, EntryBasedParserFetcher {
public class AstrophysicsDataSystem implements IdBasedParserFetcher, PagedSearchBasedParserFetcher, EntryBasedParserFetcher {

private static final String API_SEARCH_URL = "https://api.adsabs.harvard.edu/v1/search/query";
private static final String API_EXPORT_URL = "https://api.adsabs.harvard.edu/v1/export/bibtexabs";
@@ -87,6 +88,16 @@ public URL getURLForQuery(String query) throws URISyntaxException, MalformedURLException
return builder.build().toURL();
}

@Override
public URL getURLForQuery(String query, int size, int pageNumber) throws URISyntaxException, MalformedURLException, FetcherException {
URIBuilder builder = new URIBuilder(API_SEARCH_URL);
builder.addParameter("q", query);
builder.addParameter("fl", "bibcode");
builder.addParameter("rows", String.valueOf(size));
builder.addParameter("start", String.valueOf(size * pageNumber));
return builder.build().toURL();
}

/**
* @param entry BibEntry for which a search URL is created
* @return URL which points to a search request for given entry
@@ -276,4 +287,21 @@ private List<BibEntry> performSearchByIds(Collection<String> identifiers) throws
throw new FetcherException("An internal parser error occurred", e);
}
}

@Override
public Page<BibEntry> performSearchPaged(String query, int pageNumber) throws FetcherException {

if (StringUtil.isBlank(query)) {
return new Page<>(query, pageNumber);
}
try {
List<String> bibcodes = fetchBibcodes(getURLForQuery(query, getPageSize(), pageNumber));
Collection<BibEntry> results = performSearchByIds(bibcodes);
return new Page<>(query, pageNumber, results);
} catch (URISyntaxException e) {
throw new FetcherException("Search URI is malformed", e);
} catch (IOException e) {
throw new FetcherException("A network error occurred", e);
}
}
}
37 changes: 37 additions & 0 deletions src/main/java/org/jabref/model/paging/Page.java
@@ -0,0 +1,37 @@
package org.jabref.model.paging;

import java.util.Collection;
import java.util.Collections;

public class Page<T> {

private int pageNumber;
private String query;
private Collection<T> content;

public Page(String query, int pageNumber, Collection<T> content) {
this.query = query;
this.pageNumber = pageNumber;
this.content = Collections.unmodifiableCollection(content);
}

public Page(String query, int pageNumber) {
this(query, pageNumber, Collections.emptyList());
}

public Collection<T> getContent() {
return content;
}

public int getPageNumber() {
return pageNumber;
}

public String getQuery() {
return query;
}

public int getSize() {
return content.size();
}
}
2 changes: 2 additions & 0 deletions src/test/java/org/jabref/logic/importer/WebFetchersTest.java
@@ -70,6 +70,8 @@ void getSearchBasedFetchersReturnsAllFetcherDerivingFromSearchBasedFetcher() throws Exception {
Set<Class<?>> expected = controlClasses.loadClasses().stream().collect(Collectors.toSet());

expected.remove(SearchBasedParserFetcher.class);
expected.remove(PagedSearchBasedParserFetcher.class);
koppor marked this conversation as resolved.
expected.remove(PagedSearchBasedFetcher.class);
assertEquals(expected, getClasses(searchBasedFetchers));
}
}
@@ -8,13 +8,15 @@
import org.jabref.model.entry.BibEntry;
import org.jabref.model.entry.field.StandardField;
import org.jabref.model.entry.types.StandardEntryType;
import org.jabref.model.paging.Page;
import org.jabref.testutils.category.FetcherTest;

import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;

import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.junit.jupiter.api.Assertions.assertFalse;
import static org.junit.jupiter.api.Assertions.assertNotEquals;
import static org.junit.jupiter.api.Assertions.assertTrue;
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.when;
@@ -198,4 +200,26 @@ public void testPerformSearchByLuceyPaulEntry() throws Exception {
Optional<BibEntry> fetchedEntry = fetcher.performSearchById("2000JGR...10520297L");
assertEquals(Optional.of(luceyPaulEntry), fetchedEntry);
}

@Test
public void performSearchByQueryPaged_searchLimitsSize() throws Exception {
Page<BibEntry> page = fetcher.performSearchPaged("author:\"A\"", 0);
assertEquals(fetcher.getPageSize(), page.getSize(), "fetcher returned wrong page size");
}

@Test
JoHaHu marked this conversation as resolved.
public void performSearchByQueryPaged_invalidAuthorsReturnEmptyPages() throws Exception {
Page<BibEntry> page = fetcher.performSearchPaged("author:\"ThisAuthorWillNotBeFound\"", 0);
Page<BibEntry> page5 = fetcher.performSearchPaged("author:\"ThisAuthorWillNotBeFound\"", 5);
assertEquals(0, page.getSize(), "fetcher doesn't return empty pages for invalid author");
assertEquals(0, page5.getSize(), "fetcher doesn't return empty pages for invalid author");
}

@Test
public void performSearchByQueryPaged_twoPagesNotEqual() throws Exception {
Page<BibEntry> page = fetcher.performSearchPaged("author:\"A\"", 0);
Page<BibEntry> page2 = fetcher.performSearchPaged("author:\"A\"", 1);

Member

I think in the UI it will be hard to always pass the query. Why not add an intermediate layer, searchPagesFactory = fetcher.searchPagesFactory("author:..."), and then call searchPagesFactory.getPage(0)?
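
Roughly like this (just a sketch; the names follow the suggestion above and are only illustrative):

package org.jabref.logic.importer;

import org.jabref.model.entry.BibEntry;
import org.jabref.model.paging.Page;

// Sketch of an intermediate layer that keeps the query, so the UI only has to deal with page numbers.
public class SearchPagesFactory {

    private final PagedSearchBasedFetcher fetcher;
    private final String query;

    public SearchPagesFactory(PagedSearchBasedFetcher fetcher, String query) {
        this.fetcher = fetcher;
        this.query = query;
    }

    public Page<BibEntry> getPage(int pageNumber) throws FetcherException {
        return fetcher.performSearchPaged(query, pageNumber);
    }
}

// A default method on PagedSearchBasedFetcher could then hand out such a factory:
// default SearchPagesFactory searchPagesFactory(String query) {
//     return new SearchPagesFactory(this, query);
// }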

Contributor Author

The stateless argument applies here too. I would prefer the ViewModel to handle state.

// This tests if the fetcher actually performs paging
assertNotEquals(page.getContent(), page2.getContent(), "Two consecutive pages shouldn't be equal");
}
}