Dataset files API extension for filters #9783

Merged
merged 55 commits into from
Oct 4, 2023
Changes from 12 commits
Commits
ebb9c9a
Added: QueryDSL setup and used for querying dataset version files (pe…
GPortas Aug 14, 2023
83dc353
Refactor: DatasetVersionFilesServiceBean for dataset version file ope…
GPortas Aug 14, 2023
22f1cf0
Refactor: DatasetVersionFilesServiceBean
GPortas Aug 14, 2023
be8b000
Added: content type filter to getVersionFiles API endpoint
GPortas Aug 14, 2023
7db09fb
Refactor: DatasetVersionFilesServiceBean query building
GPortas Aug 14, 2023
772a712
Added: category name filter to getVersionFiles API endpoint
GPortas Aug 14, 2023
1ab692d
Added: access type filter to getVersionFiles API endpoint (pending IT…
GPortas Aug 15, 2023
d5d0602
Added: embargo filters to getVersionFiles API endpoint
GPortas Aug 16, 2023
2a9d928
Merge branch '9692-files-api-extension-display-data' of github.com:IQ…
GPortas Aug 16, 2023
2c39070
Fixed: QueryDSL to use jakarta, and missing after-develop-merge impor…
GPortas Aug 16, 2023
50df166
Added: docs and release notes
GPortas Aug 16, 2023
7153e54
Added: missing docs
GPortas Aug 16, 2023
59de169
Added: getVersionFiles searchText filtering through DB querying
GPortas Aug 21, 2023
9aafbfc
Added: release notes and docs
GPortas Aug 23, 2023
538ebaf
Merge branch '9692-files-api-extension-display-data' of github.com:IQ…
GPortas Aug 25, 2023
7b02a86
Merge branch '9714-files-api-extension-filters' of github.com:IQSS/da…
GPortas Aug 25, 2023
e9cf041
Stash: getVersionFileCounts endpoint WIP. Pending access type count, …
GPortas Aug 28, 2023
cc5a1bf
Added: setFileCategories API endpoint
GPortas Aug 29, 2023
eadab48
Fixed: getVersionFilesIT test case for category filtering
GPortas Aug 29, 2023
70b9193
Added: getVersionFileCounts count per category test coverage
GPortas Aug 29, 2023
ace6783
Added: getVersionFileCounts count per access status
GPortas Aug 30, 2023
a87136c
Refactor: new JsonPrinter methods for getVersionFileCounts response
GPortas Aug 30, 2023
e1913b3
Added: docs
GPortas Aug 30, 2023
aa60eae
Added: deleted, tabularData, and fileAccessRequest boolean fields to …
GPortas Sep 8, 2023
312aedd
Stash: userFileAccessRequested endpoint WIP
GPortas Sep 8, 2023
fad0ad7
Merge branch '9692-files-api-extension-display-data' of github.com:IQ…
GPortas Sep 8, 2023
1286b26
Merge branch '9714-files-api-extension-filters' of github.com:IQSS/da…
GPortas Sep 8, 2023
4e7e2ee
Merge branch '9785-files-api-extension-search' of github.com:IQSS/dat…
GPortas Sep 8, 2023
536885b
Merge branch '9834-files-api-extension-file-counts' of github.com:IQS…
GPortas Sep 8, 2023
455cb2c
Fixed: removed deleted field from DataFile payload which causes nulla…
GPortas Sep 8, 2023
55a81be
Refactor: simpler IT testGetUserPermissionsOnFile
GPortas Sep 8, 2023
0248e1e
Added: tests and tweaks for userFileAccessRequested API endpoint
GPortas Sep 9, 2023
d33e8f5
Added: hasBeenDeleted files API endpoint. Pending IT
GPortas Sep 11, 2023
85bd095
Merge branch '9692-files-api-extension-display-data' of github.com:IQ…
GPortas Sep 12, 2023
a05cf6b
Merge branch '9714-files-api-extension-filters' of github.com:IQSS/da…
GPortas Sep 12, 2023
19f129e
Merge branch '9785-files-api-extension-search' of github.com:IQSS/dat…
GPortas Sep 12, 2023
1aa3703
Added: IT for getHasBeenDeleted Files API endpoint
GPortas Sep 12, 2023
c224af6
Added: docs for userFileAccessRequested endpoint
GPortas Sep 12, 2023
578fdc5
Added: docs for hasBeenDeleted endpoint
GPortas Sep 12, 2023
85b9139
Added: release notes for #9851
GPortas Sep 12, 2023
aacbc64
Fixed: curl call examples in files API docs
GPortas Sep 12, 2023
d9b3f54
Fixed: null check for DataFile owner in JsonPrinter
GPortas Sep 12, 2023
4f3d27e
Merge branch 'develop' of github.com:IQSS/dataverse into 9714-files-a…
GPortas Sep 21, 2023
7b8d5ad
Merge branch '9714-files-api-extension-filters' of github.com:IQSS/da…
GPortas Sep 21, 2023
a5b605e
Merge branch '9785-files-api-extension-search' of github.com:IQSS/dat…
GPortas Sep 21, 2023
81628ad
Merge branch '9834-files-api-extension-file-counts' of github.com:IQS…
GPortas Sep 21, 2023
0cd37a4
Merge branch 'develop' of github.com:IQSS/dataverse into 9714-files-a…
GPortas Sep 27, 2023
9cc1eba
Merge branch '9714-files-api-extension-filters' of github.com:IQSS/da…
GPortas Sep 27, 2023
d4af8cf
Merge pull request #9900 from IQSS/9851-datafile-payload-extension
kcondon Sep 28, 2023
0de95c4
Merge pull request #9820 from IQSS/9785-files-api-extension-search
kcondon Sep 28, 2023
68baacc
Merge pull request #9853 from IQSS/9834-files-api-extension-file-counts
kcondon Oct 3, 2023
8adac54
Merge branch 'develop' of github.com:IQSS/dataverse into 9714-files-a…
GPortas Oct 4, 2023
7b1f5d0
Fixed: calling new getRequester method instead of old getAuthenticate…
GPortas Oct 4, 2023
ed903b2
Added: getVersionFiles filter by tabular tag name (pending IT)
GPortas Oct 4, 2023
c00ad79
Added: IT and minor refactor for setFileTabularTags endpoint
GPortas Oct 4, 2023
14 changes: 14 additions & 0 deletions doc/release-notes/9714-files-api-extension-filters.md
@@ -0,0 +1,14 @@
The getVersionFiles endpoint (/api/datasets/{id}/versions/{versionId}/files) has been extended to support optional filtering by:

- Access status: through the `accessStatus` query parameter, which supports the following values:

    - Public
    - Restricted
    - EmbargoedThenRestricted
    - EmbargoedThenPublic


- Category name: through the `categoryName` query parameter, which returns only files to which the given category has been added.


- Content type: through the `contentType` query parameter, which returns only files matching the requested content type. For example: "image/png".
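
The three filters listed above are ordinary query parameters, so a client can send any subset of them in one request. The following is a minimal client-side sketch, not part of this PR: the class `VersionFilesUrlSketch` and its `build` method are hypothetical names, and the base URL, dataset id and version are the demo values used in the docs examples. It assembles the request URL from whichever filters are set and percent-encodes the values.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, for illustration only; it is not part of the Dataverse code base.
class VersionFilesUrlSketch {

    // Builds a getVersionFiles URL, appending only the filters whose value is non-null.
    static String build(String baseUrl, String datasetId, String versionId, Map<String, String> filters) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> filter : filters.entrySet()) {
            if (filter.getValue() != null) {
                // Percent-encode the value so that characters such as '/' are safe in a query string.
                query.add(filter.getKey() + "=" + URLEncoder.encode(filter.getValue(), StandardCharsets.UTF_8));
            }
        }
        String path = baseUrl + "/api/datasets/" + datasetId + "/versions/" + versionId + "/files";
        return query.length() == 0 ? path : path + "?" + query;
    }

    public static void main(String[] args) {
        Map<String, String> filters = new LinkedHashMap<>();
        filters.put("accessStatus", "Public");
        filters.put("categoryName", "Data");
        filters.put("contentType", "image/png");
        System.out.println(build("https://demo.dataverse.org", "24", "1.0", filters));
    }
}
```

Filter values are case sensitive on the server side, so the sketch passes them through unchanged apart from the encoding.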
37 changes: 35 additions & 2 deletions doc/sphinx-guides/source/api/native-api.rst
@@ -964,6 +964,37 @@ This endpoint supports optional pagination, through the ``limit`` and ``offset``

  curl "https://demo.dataverse.org/api/datasets/24/versions/1.0/files?limit=10&offset=20"

Category name filtering is also optionally supported, to return only files to which the requested category has been added.

Usage example:

.. code-block:: bash

  curl "https://demo.dataverse.org/api/datasets/24/versions/1.0/files?categoryName=Data"

Content type filtering is also optionally supported, to return only files matching the requested content type.

Usage example:

.. code-block:: bash

  curl "https://demo.dataverse.org/api/datasets/24/versions/1.0/files?contentType=image/png"

File access filtering is also optionally supported, through the ``accessStatus`` query parameter, which accepts the following values:

* ``Public``
* ``Restricted``
* ``EmbargoedThenRestricted``
* ``EmbargoedThenPublic``

If no access status filter is specified, files are returned regardless of their access status.

Usage example:

.. code-block:: bash

  curl "https://demo.dataverse.org/api/datasets/24/versions/1.0/files?accessStatus=Public"

Ordering criteria for sorting the results is also optionally supported. In particular, by the following possible values:

* ``NameAZ`` (Default)
@@ -973,14 +1004,16 @@ Ordering criteria for sorting the results is also optionally supported. In parti
 * ``Size``
 * ``Type``

-Please note that these values are case sensitive and must be correctly typed for the endpoint to recognize them.
-
 Usage example:

 .. code-block:: bash

   curl "https://demo.dataverse.org/api/datasets/24/versions/1.0/files?orderCriteria=Newest"

+Please note that both filtering and ordering criteria values are case sensitive and must be correctly typed for the endpoint to recognize them.
+
+Keep in mind that you can combine all of the above query params depending on the results you are looking for.
+
 View Dataset Files and Folders as a Directory Index
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

3 changes: 3 additions & 0 deletions modules/dataverse-parent/pom.xml
@@ -200,6 +200,9 @@

<!-- Container related -->
<fabric8-dmp.version>0.43.0</fabric8-dmp.version>

<!-- Persistence -->
<querydsl.version>5.0.0</querydsl.version>
</properties>

<pluginRepositories>
14 changes: 14 additions & 0 deletions pom.xml
@@ -252,6 +252,20 @@
<artifactId>expressly</artifactId>
<scope>provided</scope>
</dependency>

<dependency>
<groupId>com.querydsl</groupId>
<artifactId>querydsl-apt</artifactId>
<version>${querydsl.version}</version>
<classifier>jakarta</classifier>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>com.querydsl</groupId>
<artifactId>querydsl-jpa</artifactId>
<version>${querydsl.version}</version>
<classifier>jakarta</classifier>
</dependency>

<dependency>
<groupId>commons-io</groupId>
@@ -0,0 +1,144 @@
package edu.harvard.iq.dataverse;

import edu.harvard.iq.dataverse.QDataFileCategory;
import edu.harvard.iq.dataverse.QDvObject;
import edu.harvard.iq.dataverse.QEmbargo;
import edu.harvard.iq.dataverse.QFileMetadata;

import com.querydsl.core.types.dsl.BooleanExpression;
import com.querydsl.core.types.dsl.CaseBuilder;
import com.querydsl.core.types.dsl.DateExpression;
import com.querydsl.core.types.dsl.DateTimeExpression;

import com.querydsl.jpa.impl.JPAQuery;
import com.querydsl.jpa.impl.JPAQueryFactory;

import jakarta.ejb.Stateless;
import jakarta.inject.Named;
import jakarta.persistence.EntityManager;
import jakarta.persistence.PersistenceContext;

import java.io.Serializable;
import java.sql.Timestamp;
import java.time.LocalDate;
import java.util.List;

@Stateless
@Named
public class DatasetVersionFilesServiceBean implements Serializable {

@PersistenceContext(unitName = "VDCNet-ejbPU")
private EntityManager em;

private final QFileMetadata fileMetadata = QFileMetadata.fileMetadata;
private final QDvObject dvObject = QDvObject.dvObject;
private final QDataFileCategory dataFileCategory = QDataFileCategory.dataFileCategory;

/**
* Different criteria to sort the results of FileMetadata queries used in {@link DatasetVersionFilesServiceBean#getFileMetadatas}
*/
public enum FileMetadatasOrderCriteria {
NameAZ, NameZA, Newest, Oldest, Size, Type
}

/**
* Status of the particular DataFile based on active embargoes and restriction state used in {@link DatasetVersionFilesServiceBean#getFileMetadatas}
*/
public enum DataFileAccessStatus {
Public, Restricted, EmbargoedThenRestricted, EmbargoedThenPublic
}

/**
* Returns a FileMetadata list of files in the specified DatasetVersion
*
* @param datasetVersion the DatasetVersion to access
* @param limit for pagination, can be null
* @param offset for pagination, can be null
* @param contentType for retrieving only files with this content type
* @param accessStatus for retrieving only files with this DataFileAccessStatus
* @param categoryName for retrieving only files categorized with this category name
* @param orderCriteria a FileMetadatasOrderCriteria to order the results
* @return a FileMetadata list from the specified DatasetVersion
*/
public List<FileMetadata> getFileMetadatas(DatasetVersion datasetVersion, Integer limit, Integer offset, String contentType, DataFileAccessStatus accessStatus, String categoryName, FileMetadatasOrderCriteria orderCriteria) {
JPAQuery<FileMetadata> baseQuery = createBaseQuery(datasetVersion, orderCriteria);

if (contentType != null) {
baseQuery.where(fileMetadata.dataFile.contentType.eq(contentType));
}
if (accessStatus != null) {
baseQuery.where(createAccessStatusExpression(accessStatus));
}
if (categoryName != null) {
baseQuery.from(dataFileCategory).where(dataFileCategory.name.eq(categoryName).and(fileMetadata.fileCategories.contains(dataFileCategory)));
}

applyOrderCriteriaToQuery(baseQuery, orderCriteria);

if (limit != null) {
baseQuery.limit(limit);
}
if (offset != null) {
baseQuery.offset(offset);
}

return baseQuery.fetch();
}

private JPAQuery<FileMetadata> createBaseQuery(DatasetVersion datasetVersion, FileMetadatasOrderCriteria orderCriteria) {
JPAQueryFactory queryFactory = new JPAQueryFactory(em);
JPAQuery<FileMetadata> baseQuery = queryFactory.selectFrom(fileMetadata).where(fileMetadata.datasetVersion.id.eq(datasetVersion.getId()));
if (orderCriteria == FileMetadatasOrderCriteria.Newest || orderCriteria == FileMetadatasOrderCriteria.Oldest) {
baseQuery.from(dvObject).where(dvObject.id.eq(fileMetadata.dataFile.id));
}
return baseQuery;
}

private BooleanExpression createAccessStatusExpression(DataFileAccessStatus accessStatus) {
QEmbargo embargo = fileMetadata.dataFile.embargo;
BooleanExpression activelyEmbargoedExpression = embargo.dateAvailable.goe(DateExpression.currentDate(LocalDate.class));
BooleanExpression inactivelyEmbargoedExpression = embargo.isNull();
BooleanExpression accessStatusExpression;
switch (accessStatus) {
case EmbargoedThenRestricted:
accessStatusExpression = activelyEmbargoedExpression.and(fileMetadata.dataFile.restricted.isTrue());
break;
case EmbargoedThenPublic:
accessStatusExpression = activelyEmbargoedExpression.and(fileMetadata.dataFile.restricted.isFalse());
break;
case Restricted:
accessStatusExpression = inactivelyEmbargoedExpression.and(fileMetadata.dataFile.restricted.isTrue());
break;
case Public:
accessStatusExpression = inactivelyEmbargoedExpression.and(fileMetadata.dataFile.restricted.isFalse());
break;
default:
throw new IllegalStateException("Unexpected value: " + accessStatus);
}
return accessStatusExpression;
}

private void applyOrderCriteriaToQuery(JPAQuery<FileMetadata> query, FileMetadatasOrderCriteria orderCriteria) {
DateTimeExpression<Timestamp> orderByLifetimeExpression = new CaseBuilder().when(dvObject.publicationDate.isNotNull()).then(dvObject.publicationDate).otherwise(dvObject.createDate);
switch (orderCriteria) {
case NameZA:
query.orderBy(fileMetadata.label.desc());
break;
case Newest:
query.orderBy(orderByLifetimeExpression.desc());
break;
case Oldest:
query.orderBy(orderByLifetimeExpression.asc());
break;
case Size:
query.orderBy(fileMetadata.dataFile.filesize.asc());
break;
case Type:
query.orderBy(fileMetadata.dataFile.contentType.asc());
break;
default:
query.orderBy(fileMetadata.label.asc());
break;
}
}
}
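
Read as plain Java, the four predicates built in `createAccessStatusExpression` amount to the classification below. This is a sketch of my reading of the query, not code from the PR: `AccessStatusSketch` and `classify` are hypothetical names, `restricted` is assumed to be non-null, and the current date is passed in so the logic can be checked in isolation. Under that reading, a file whose embargo exists but has already expired (date available before today) matches none of the four filter values, since the "not embargoed" branch only tests `embargo.isNull()`.

```java
import java.time.LocalDate;
import java.util.Optional;

// Hypothetical in-memory mirror of the QueryDSL predicates shown above; illustration only.
class AccessStatusSketch {

    enum DataFileAccessStatus { Public, Restricted, EmbargoedThenRestricted, EmbargoedThenPublic }

    // embargoDateAvailable is null when the file has no embargo at all.
    static Optional<DataFileAccessStatus> classify(LocalDate embargoDateAvailable, boolean restricted, LocalDate today) {
        if (embargoDateAvailable == null) {
            // Mirrors embargo.isNull() combined with the restricted flag.
            return Optional.of(restricted ? DataFileAccessStatus.Restricted : DataFileAccessStatus.Public);
        }
        if (!embargoDateAvailable.isBefore(today)) {
            // Mirrors embargo.dateAvailable.goe(currentDate): the embargo is still active.
            return Optional.of(restricted
                    ? DataFileAccessStatus.EmbargoedThenRestricted
                    : DataFileAccessStatus.EmbargoedThenPublic);
        }
        // Embargo present but expired: neither predicate in the diff selects this file.
        return Optional.empty();
    }

    public static void main(String[] args) {
        LocalDate today = LocalDate.of(2023, 10, 4);
        System.out.println(classify(null, false, today));
        System.out.println(classify(today.plusDays(30), true, today));
        System.out.println(classify(today.minusDays(1), false, today));
    }
}
```

If expired embargoes are meant to fall back to `Public` or `Restricted`, the "not embargoed" predicate would need an additional date comparison; whether that is intended is a question for the PR authors.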
@@ -49,22 +49,6 @@ public class DatasetVersionServiceBean implements java.io.Serializable {

     private static final SimpleDateFormat logFormatter = new SimpleDateFormat("yyyy-MM-dd'T'HH-mm-ss");

-    private static final String QUERY_STR_FIND_ALL_FILE_METADATAS_ORDER_BY_LABEL = "SELECT fm FROM FileMetadata fm"
-            + " WHERE fm.datasetVersion.id=:datasetVersionId"
-            + " ORDER BY fm.label";
-    private static final String QUERY_STR_FIND_ALL_FILE_METADATAS_ORDER_BY_DATE = "SELECT fm FROM FileMetadata fm, DvObject dvo"
-            + " WHERE fm.datasetVersion.id = :datasetVersionId"
-            + " AND fm.dataFile.id = dvo.id"
-            + " ORDER BY CASE WHEN dvo.publicationDate IS NOT NULL THEN dvo.publicationDate ELSE dvo.createDate END";
-    private static final String QUERY_STR_FIND_ALL_FILE_METADATAS_ORDER_BY_SIZE = "SELECT fm FROM FileMetadata fm, DataFile df"
-            + " WHERE fm.datasetVersion.id = :datasetVersionId"
-            + " AND fm.dataFile.id = df.id"
-            + " ORDER BY df.filesize";
-    private static final String QUERY_STR_FIND_ALL_FILE_METADATAS_ORDER_BY_TYPE = "SELECT fm FROM FileMetadata fm, DataFile df"
-            + " WHERE fm.datasetVersion.id = :datasetVersionId"
-            + " AND fm.dataFile.id = df.id"
-            + " ORDER BY df.contentType";
-
     @EJB
     DatasetServiceBean datasetService;

@@ -166,18 +150,6 @@ public DatasetVersion getDatasetVersion(){
         }
     } // end RetrieveDatasetVersionResponse

-    /**
-     * Different criteria to sort the results of FileMetadata queries used in {@link DatasetVersionServiceBean#getFileMetadatas}
-     */
-    public enum FileMetadatasOrderCriteria {
-        NameAZ,
-        NameZA,
-        Newest,
-        Oldest,
-        Size,
-        Type
-    }
-
     public DatasetVersion find(Object pk) {
         return em.find(DatasetVersion.class, pk);
     }
@@ -1252,50 +1224,4 @@ public List<DatasetVersion> getUnarchivedDatasetVersions(){
             return null;
         }
     } // end getUnarchivedDatasetVersions
-
-    /**
-     * Returns a FileMetadata list of files in the specified DatasetVersion
-     *
-     * @param datasetVersion the DatasetVersion to access
-     * @param limit for pagination, can be null
-     * @param offset for pagination, can be null
-     * @param orderCriteria a FileMetadatasOrderCriteria to order the results
-     * @return a FileMetadata list of the specified DatasetVersion
-     */
-    public List<FileMetadata> getFileMetadatas(DatasetVersion datasetVersion, Integer limit, Integer offset, FileMetadatasOrderCriteria orderCriteria) {
-        TypedQuery<FileMetadata> query = em.createQuery(getQueryStringFromFileMetadatasOrderCriteria(orderCriteria), FileMetadata.class)
-                .setParameter("datasetVersionId", datasetVersion.getId());
-        if (limit != null) {
-            query.setMaxResults(limit);
-        }
-        if (offset != null) {
-            query.setFirstResult(offset);
-        }
-        return query.getResultList();
-    }
-
-    private String getQueryStringFromFileMetadatasOrderCriteria(FileMetadatasOrderCriteria orderCriteria) {
-        String queryString;
-        switch (orderCriteria) {
-            case NameZA:
-                queryString = QUERY_STR_FIND_ALL_FILE_METADATAS_ORDER_BY_LABEL + " DESC";
-                break;
-            case Newest:
-                queryString = QUERY_STR_FIND_ALL_FILE_METADATAS_ORDER_BY_DATE + " DESC";
-                break;
-            case Oldest:
-                queryString = QUERY_STR_FIND_ALL_FILE_METADATAS_ORDER_BY_DATE;
-                break;
-            case Size:
-                queryString = QUERY_STR_FIND_ALL_FILE_METADATAS_ORDER_BY_SIZE;
-                break;
-            case Type:
-                queryString = QUERY_STR_FIND_ALL_FILE_METADATAS_ORDER_BY_TYPE;
-                break;
-            default:
-                queryString = QUERY_STR_FIND_ALL_FILE_METADATAS_ORDER_BY_LABEL;
-                break;
-        }
-        return queryString;
-    }
 } // end class
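
Both the JPQL removed here and the QueryDSL `CaseBuilder` expression that replaces it order `Newest` and `Oldest` by the publication date when one is set, and by the creation date otherwise. The in-memory sketch below illustrates that rule only; `LifetimeOrderSketch`, `FileDates` and `labelsNewestFirst` are hypothetical names standing in for the `DvObject` columns, and the real sorting happens in the database.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration of the CASE-based ordering; not part of the Dataverse code base.
class LifetimeOrderSketch {

    // Stand-in for the label and the two DvObject date columns used by the ORDER BY clause.
    static final class FileDates {
        final String label;
        final Instant publicationDate; // null while the file is unpublished
        final Instant createDate;

        FileDates(String label, Instant publicationDate, Instant createDate) {
            this.label = label;
            this.publicationDate = publicationDate;
            this.createDate = createDate;
        }
    }

    // CASE WHEN publicationDate IS NOT NULL THEN publicationDate ELSE createDate END
    static Instant effectiveDate(FileDates file) {
        return file.publicationDate != null ? file.publicationDate : file.createDate;
    }

    // "Newest" is the descending order of the effective date; "Oldest" would drop reversed().
    static List<String> labelsNewestFirst(List<FileDates> files) {
        List<FileDates> sorted = new ArrayList<>(files);
        sorted.sort(Comparator.comparing(LifetimeOrderSketch::effectiveDate).reversed());
        List<String> labels = new ArrayList<>();
        for (FileDates file : sorted) {
            labels.add(file.label);
        }
        return labels;
    }
}
```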
31 changes: 25 additions & 6 deletions src/main/java/edu/harvard/iq/dataverse/api/Datasets.java
@@ -236,6 +236,9 @@ public class Datasets extends AbstractApiBean {
@Inject
PrivateUrlServiceBean privateUrlService;

@Inject
DatasetVersionFilesServiceBean datasetVersionFilesServiceBean;

/**
* Used to consolidate the way we parse and handle dataset versions.
* @param <T>
@@ -484,20 +487,36 @@ public Response getVersion(@Context ContainerRequestContext crc, @PathParam("id"
                     : ok(json(dsv));
         }, getRequestUser(crc));
     }

     @GET
     @AuthRequired
     @Path("{id}/versions/{versionId}/files")
-    public Response getVersionFiles(@Context ContainerRequestContext crc, @PathParam("id") String datasetId, @PathParam("versionId") String versionId, @QueryParam("limit") Integer limit, @QueryParam("offset") Integer offset, @QueryParam("orderCriteria") String orderCriteria, @Context UriInfo uriInfo, @Context HttpHeaders headers) {
-        return response( req -> {
+    public Response getVersionFiles(@Context ContainerRequestContext crc,
+                                    @PathParam("id") String datasetId,
+                                    @PathParam("versionId") String versionId,
+                                    @QueryParam("limit") Integer limit,
+                                    @QueryParam("offset") Integer offset,
+                                    @QueryParam("contentType") String contentType,
+                                    @QueryParam("accessStatus") String accessStatus,
+                                    @QueryParam("categoryName") String categoryName,
+                                    @QueryParam("orderCriteria") String orderCriteria,
+                                    @Context UriInfo uriInfo,
+                                    @Context HttpHeaders headers) {
+        return response(req -> {
             DatasetVersion datasetVersion = getDatasetVersionOrDie(req, versionId, findDatasetOrDie(datasetId), uriInfo, headers);
-            DatasetVersionServiceBean.FileMetadatasOrderCriteria fileMetadatasOrderCriteria;
+            DatasetVersionFilesServiceBean.FileMetadatasOrderCriteria fileMetadatasOrderCriteria;
             try {
-                fileMetadatasOrderCriteria = orderCriteria != null ? DatasetVersionServiceBean.FileMetadatasOrderCriteria.valueOf(orderCriteria) : DatasetVersionServiceBean.FileMetadatasOrderCriteria.NameAZ;
+                fileMetadatasOrderCriteria = orderCriteria != null ? DatasetVersionFilesServiceBean.FileMetadatasOrderCriteria.valueOf(orderCriteria) : DatasetVersionFilesServiceBean.FileMetadatasOrderCriteria.NameAZ;
             } catch (IllegalArgumentException e) {
                 return error(Response.Status.BAD_REQUEST, "Invalid order criteria: " + orderCriteria);
             }
-            return ok(jsonFileMetadatas(datasetversionService.getFileMetadatas(datasetVersion, limit, offset, fileMetadatasOrderCriteria)));
+            DatasetVersionFilesServiceBean.DataFileAccessStatus dataFileAccessStatus;
+            try {
+                dataFileAccessStatus = accessStatus != null ? DatasetVersionFilesServiceBean.DataFileAccessStatus.valueOf(accessStatus) : null;
+            } catch (IllegalArgumentException e) {
+                return error(Response.Status.BAD_REQUEST, "Invalid access status: " + accessStatus);
+            }
+            return ok(jsonFileMetadatas(datasetVersionFilesServiceBean.getFileMetadatas(datasetVersion, limit, offset, contentType, dataFileAccessStatus, categoryName, fileMetadatasOrderCriteria)));
         }, getRequestUser(crc));
     }
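
The endpoint maps the `orderCriteria` and `accessStatus` strings to enum constants with `Enum.valueOf`, which is why the docs describe the values as case sensitive: `valueOf` only accepts the exact constant name and throws `IllegalArgumentException` otherwise, which the endpoint turns into a 400 response. A standalone sketch of that parsing step, assuming a hypothetical wrapper class `OrderCriteriaParsingSketch` with a local copy of the enum:

```java
// Hypothetical standalone copy of the parsing logic used in getVersionFiles; illustration only.
class OrderCriteriaParsingSketch {

    enum FileMetadatasOrderCriteria { NameAZ, NameZA, Newest, Oldest, Size, Type }

    // Absent parameter falls back to NameAZ; any other string must match a constant name exactly.
    static FileMetadatasOrderCriteria parse(String orderCriteria) {
        return orderCriteria != null
                ? FileMetadatasOrderCriteria.valueOf(orderCriteria)
                : FileMetadatasOrderCriteria.NameAZ;
    }

    // True when parse would succeed, false when the endpoint would answer with BAD_REQUEST.
    static boolean isAccepted(String orderCriteria) {
        try {
            parse(orderCriteria);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(parse(null));
        System.out.println(isAccepted("Newest"));
        System.out.println(isAccepted("newest"));
    }
}
```

The `accessStatus` parameter follows the same pattern, except that an absent value stays `null` and no access filter is applied.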
