Geospatial search #8239

Merged
38 commits
f74e0c2
add bonding box indexing
qqmyers Sep 9, 2021
f3db9ee
Merge remote-tracking branch 'IQSS/develop' into
qqmyers Nov 11, 2021
7b15877
Merge remote-tracking branch 'IQSS/develop' into GDCC/geosearch
qqmyers Feb 2, 2022
d9eed2f
Merge remote-tracking branch 'IQSS/develop' into GDCC/geosearch
qqmyers May 24, 2022
93a3cb6
Merge remote-tracking branch 'IQSS/develop' into GDCC/geosearch
qqmyers May 26, 2022
84e9614
Merge remote-tracking branch 'IQSS/develop' into GDCC/geosearch
qqmyers Jun 26, 2022
e765a8d
Merge remote-tracking branch 'IQSS/develop' into GDCC/geosearch
qqmyers Jul 28, 2022
d5430e4
Merge remote-tracking branch 'IQSS/develop' into GDCC/geosearch
qqmyers Aug 3, 2022
4ec3d0f
Merge remote-tracking branch 'IQSS/develop' into GDCC/geosearch
qqmyers Aug 9, 2022
6da3b2a
Merge remote-tracking branch 'IQSS/develop' into GDCC/geosearch
qqmyers Aug 12, 2022
3b8c859
Merge remote-tracking branch 'IQSS/develop' into GDCC/geosearch
qqmyers Aug 18, 2022
8ce67a3
Merge remote-tracking branch 'IQSS/develop' into GDCC/geosearch
qqmyers Sep 14, 2022
1007a6b
Update conf/solr/8.11.1/schema.xml
qqmyers Sep 14, 2022
b903e2a
release note and addition to search doc
qqmyers Sep 15, 2022
d79ba87
Merge branch 'GDCC/geosearch' of https://github.com/GlobalDataverseCo…
qqmyers Sep 15, 2022
c210821
add space
qqmyers Sep 19, 2022
8c11b7d
Merge branch 'develop' into GDCC/geosearch #8239
pdurbin Sep 26, 2022
de079fd
Merge branch 'develop' into GDCC/geosearch #8239
pdurbin Oct 5, 2022
0b0c3b9
add multivalued in schema
qqmyers Oct 6, 2022
23a6d58
Merge branch 'GDCC/geosearch' of https://github.com/GlobalDataverseCo…
qqmyers Oct 6, 2022
cf9b4ae
case matters
qqmyers Oct 6, 2022
f542925
north < south latitude is an error
qqmyers Oct 6, 2022
4f9434e
fix another non-physical box
qqmyers Oct 12, 2022
1db095f
handle multiples - make bbox a single surrounding box
qqmyers Oct 12, 2022
202438a
typo
qqmyers Oct 13, 2022
8c048a7
wrong scope
qqmyers Oct 13, 2022
d48c29a
Merge branch 'develop' into GDCC/geosearch #8239
pdurbin Oct 13, 2022
c7c16d4
add geo_point and geo_radius #8239
pdurbin Oct 14, 2022
3d647f4
move hard coded strings to SearchFields class #8239
pdurbin Oct 25, 2022
7272de6
add geospatial search test #8239
pdurbin Oct 25, 2022
efa8e9c
Merge branch 'develop' into GDCC/geosearch #8239
pdurbin Oct 25, 2022
e5187b2
Avoid DatasetCreate exception with only one coordinate #8239
pdurbin Oct 25, 2022
28215fb
rename solr_srpt to geolocation and solr_bboxtype to boundingBox #8239
pdurbin Oct 25, 2022
ff32672
add error checking for geo_point and geo_radius #8239
pdurbin Oct 26, 2022
6e7499e
update docs and release note (supported via API) #8239
pdurbin Oct 26, 2022
b5383f4
Merge remote-tracking branch 'IQSS/develop' into GDCC/geosearch
qqmyers Nov 7, 2022
364e347
test invalid lat/long (too large) #8239
pdurbin Nov 21, 2022
9d8332d
Merge branch 'develop' into GDCC/geosearch #8239
pdurbin Nov 21, 2022
8 changes: 8 additions & 0 deletions conf/solr/8.11.1/schema.xml
@@ -228,6 +228,11 @@

<field name="dsPersistentId" type="text_en" multiValued="false" stored="true" indexed="true"/>
<field name="filePersistentId" type="text_en" multiValued="false" stored="true" indexed="true"/>
<!-- Dataverse geospatial search -->
<!-- https://solr.apache.org/guide/8_11/spatial-search.html#rpt -->
<field name="geolocation" type="location_rpt" multiValued="true" stored="true" indexed="true"/>
<!-- https://solr.apache.org/guide/8_11/spatial-search.html#bboxfield -->
<field name="boundingBox" type="bbox" multiValued="true" stored="true" indexed="true"/>

<!--
METADATA SCHEMA FIELDS
@@ -1104,6 +1109,9 @@
-->
<fieldType name="location_rpt" class="solr.SpatialRecursivePrefixTreeFieldType"
geo="true" distErrPct="0.025" maxDistErr="0.001" distanceUnits="kilometers" />
<!-- Dataverse - per GeoBlacklight, adding field type for bboxField that enables, among other things, overlap ratio calculations -->
<fieldType name="bbox" class="solr.BBoxField"
geo="true" distanceUnits="kilometers" numberType="pdouble" />

<!-- Payloaded field types -->
<fieldType name="delimited_payloads_float" stored="false" indexed="true" class="solr.TextField">
5 changes: 5 additions & 0 deletions doc/release-notes/8239-geospatial-indexing.md
@@ -0,0 +1,5 @@
Support for indexing the "Geographic Bounding Box" fields ("West Longitude", "East Longitude", "North Latitude", and "South Latitude") from the Geospatial metadata block has been added.

Geospatial search is supported, but only via the Search API, using two new parameters: `geo_point` and `geo_radius`.

A Solr schema update is required.
8 changes: 4 additions & 4 deletions doc/sphinx-guides/source/_static/api/ddi_dataset.xml
@@ -88,12 +88,12 @@
<geoBndBox>
<westBL>10</westBL>
<eastBL>20</eastBL>
-<northBL>30</northBL>
-<southBL>40</southBL>
+<northBL>40</northBL>
+<southBL>30</southBL>
</geoBndBox>
<geoBndBox>
-<southBL>80</southBL>
-<northBL>70</northBL>
+<southBL>70</southBL>
+<northBL>80</northBL>
<eastBL>60</eastBL>
<westBL>50</westBL>
</geoBndBox>
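Both boxes in the sample DDI above had northBL below southBL before this change, in line with the commits "north < south latitude is an error" and "fix another non-physical box". A minimal sketch of a pre-flight check for such files, using only the JDK DOM parser; the class and method names are hypothetical, the `<codeBook>` wrapper in `main` is a trimmed-down stand-in for a real DDI document, and it assumes every `geoBndBox` carries both `northBL` and `southBL`.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class GeoBndBoxCheck {

    /** Returns one message for every geoBndBox whose northBL is below its southBL. */
    public static List<String> findInvertedBoxes(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse DDI XML", e);
        }
        NodeList boxes = doc.getElementsByTagName("geoBndBox");
        List<String> problems = new ArrayList<>();
        for (int i = 0; i < boxes.getLength(); i++) {
            Element box = (Element) boxes.item(i);
            // Element order inside geoBndBox varies (see the second box), so look bounds up by name.
            double north = Double.parseDouble(childText(box, "northBL"));
            double south = Double.parseDouble(childText(box, "southBL"));
            if (north < south) {
                problems.add("geoBndBox " + i + ": northBL " + north + " < southBL " + south);
            }
        }
        return problems;
    }

    private static String childText(Element parent, String tag) {
        // Assumes the tag is present; a missing bound would throw a NullPointerException here.
        return parent.getElementsByTagName(tag).item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        String before = "<codeBook><geoBndBox><westBL>10</westBL><eastBL>20</eastBL>"
                + "<northBL>30</northBL><southBL>40</southBL></geoBndBox></codeBook>";
        String after = "<codeBook><geoBndBox><westBL>10</westBL><eastBL>20</eastBL>"
                + "<northBL>40</northBL><southBL>30</southBL></geoBndBox></codeBook>";
        System.out.println(findInvertedBoxes(before)); // [geoBndBox 0: northBL 30.0 < southBL 40.0]
        System.out.println(findInvertedBoxes(after));  // []
    }
}
```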
2 changes: 2 additions & 0 deletions doc/sphinx-guides/source/api/search.rst
@@ -35,6 +35,8 @@ show_relevance boolean Whether or not to show details of which fields were ma
show_facets     boolean Whether or not to show facets that can be operated on by the "fq" parameter. False by default. See :ref:`advanced search example <advancedsearch-example>`.
fq              string  A filter query on the search term. Multiple "fq" parameters can be used. See :ref:`advanced search example <advancedsearch-example>`.
show_entity_ids boolean Whether or not to show the database IDs of the search results (for developer use).
geo_point       string  Latitude and longitude in the form ``geo_point=42.3,-71.1``. You must supply ``geo_radius`` as well. See also :ref:`geospatial-search`.
geo_radius      string  Radial distance in kilometers from ``geo_point`` (which must be supplied as well), such as ``geo_radius=1.5``.
metadata_fields string  Includes the requested fields for each dataset in the response. Multiple "metadata_fields" parameters can be used to include several fields. The value must be in the form "{metadata_block_name}:{field_name}" to include a specific field from a metadata block (see :ref:`example <dynamic-citation-some>`) or "{metadata_field_set_name}:\*" to include all the fields for a metadata block (see :ref:`example <dynamic-citation-all>`). "{field_name}" cannot be a subfield of a compound field. If "{field_name}" is a compound field, all subfields are included.
=============== ======= ===========
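A minimal sketch of a client assembling a request with the two new parameters. The host `dataverse.example.edu` and the class and method names are placeholders, not part of this PR; the sketch only builds the URI and does not send it.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class GeoSearchUrl {

    /**
     * Builds a Search API URI. geo_point is "lat,lon" and geo_radius is in kilometers;
     * the API returns 400 if only one of the two is supplied, so both are always added here.
     */
    public static URI build(String baseUrl, String q, double lat, double lon, double radiusKm) {
        String query = "q=" + URLEncoder.encode(q, StandardCharsets.UTF_8)
                + "&geo_point=" + lat + "," + lon
                + "&geo_radius=" + radiusKm;
        return URI.create(baseUrl + "/api/search?" + query);
    }

    public static void main(String[] args) {
        // Placeholder host; substitute a real installation.
        URI uri = build("https://dataverse.example.edu", "*", 42.3, -71.1, 1.5);
        System.out.println(uri);
        // https://dataverse.example.edu/api/search?q=*&geo_point=42.3,-71.1&geo_radius=1.5
    }
}
```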

7 changes: 7 additions & 0 deletions doc/sphinx-guides/source/user/find-use-data.rst
@@ -39,6 +39,13 @@ enter search terms for Dataverse collections, dataset metadata (citation and dom
metadata. If you are searching for tabular data files you can also search at the variable level for name and label. To find
out more about what each field searches, hover over the field name for a detailed description of the field.

.. _geospatial-search:

Geospatial Search
-----------------

Geospatial search is available only through the :doc:`/api/search`, via the ``geo_point`` and ``geo_radius`` parameters. The metadata fields that are geospatially indexed are "West Longitude", "East Longitude", "North Latitude", and "South Latitude" from the "Geographic Bounding Box" field in the "Geospatial Metadata" block.

Browsing a Dataverse Installation
---------------------------------

27 changes: 26 additions & 1 deletion src/main/java/edu/harvard/iq/dataverse/api/Search.java
@@ -72,6 +72,8 @@ public Response search(
@QueryParam("show_my_data") boolean showMyData,
@QueryParam("query_entities") boolean queryEntities,
@QueryParam("metadata_fields") List<String> metadataFields,
@QueryParam("geo_point") String geoPointRequested,
@QueryParam("geo_radius") String geoRadiusRequested,
@Context HttpServletResponse response
) {

@@ -87,6 +89,8 @@ public Response search(
// sanity checking on user-supplied arguments
SortBy sortBy;
int numResultsPerPage;
String geoPoint;
String geoRadius;
List<Dataverse> dataverseSubtrees = new ArrayList<>();

try {
@@ -119,6 +123,17 @@ public Response search(
throw new IOException("Filter is empty, which should never happen, as this allows unfettered searching of our index");
}

geoPoint = getGeoPoint(geoPointRequested);
geoRadius = getGeoRadius(geoRadiusRequested);

if (geoPoint != null && geoRadius == null) {
return error(Response.Status.BAD_REQUEST, "If you supply geo_point you must also supply geo_radius.");
}

if (geoRadius != null && geoPoint == null) {
return error(Response.Status.BAD_REQUEST, "If you supply geo_radius you must also supply geo_point.");
}

} catch (Exception ex) {
return error(Response.Status.BAD_REQUEST, ex.getLocalizedMessage());
}
Review comment (Contributor):

Do we want to catch a nonnumeric exception here and give more feedback?

Review comment (Member):

Well, getGeoPoint and getGeoRadius will already throw exceptions with feedback about non-numeric values.

For example, if the user does this:

/api/search?q=*&geo_point=41.9580775,-70.6621063&geo_radius=junk

They'll get this error:

{"status":"ERROR","message":"Non-number radius supplied."}

That said, I may not be following the use case or the concern. I'm happy to add more tests for these.

@@ -137,7 +152,9 @@ public Response search(
paginationStart,
dataRelatedToMe,
numResultsPerPage,
-true //SEK get query entities always for search API additional Dataset Information 6300 12/6/2019
+true, //SEK get query entities always for search API additional Dataset Information 6300 12/6/2019
+geoPoint,
+geoRadius
);
} catch (SearchException ex) {
Throwable cause = ex;
@@ -340,4 +357,12 @@ private Dataverse getSubtree(String alias) throws Exception {
}
}

private String getGeoPoint(String geoPointRequested) {
return SearchUtil.getGeoPoint(geoPointRequested);
}

private String getGeoRadius(String geoRadiusRequested) {
return SearchUtil.getGeoRadius(geoRadiusRequested);
}

}
Original file line number Diff line number Diff line change
@@ -6,6 +6,7 @@
import edu.harvard.iq.dataverse.DataFileTag;
import edu.harvard.iq.dataverse.Dataset;
import edu.harvard.iq.dataverse.DatasetField;
import edu.harvard.iq.dataverse.DatasetFieldCompoundValue;
import edu.harvard.iq.dataverse.DatasetFieldConstant;
import edu.harvard.iq.dataverse.DatasetFieldServiceBean;
import edu.harvard.iq.dataverse.DatasetFieldType;
@@ -37,6 +38,7 @@
import java.io.IOException;
import java.io.InputStream;
import java.sql.Timestamp;
import java.text.NumberFormat;
import java.text.SimpleDateFormat;
import java.time.LocalDate;
import java.util.ArrayList;
@@ -947,6 +949,70 @@ public SolrInputDocuments toSolrDocs(IndexableDataset indexableDataset, Set<Long
}
}
}

//ToDo - define a geom/bbox type solr field and find those instead of just this one
if(dsfType.getName().equals(DatasetFieldConstant.geographicBoundingBox)) {
    String minWestLon=null;
    String maxEastLon=null;
    String maxNorthLat=null;
    String minSouthLat=null;
    for (DatasetFieldCompoundValue compoundValue : dsf.getDatasetFieldCompoundValues()) {
        String westLon=null;
        String eastLon=null;
        String northLat=null;
        String southLat=null;
        for(DatasetField childDsf: compoundValue.getChildDatasetFields()) {
            switch (childDsf.getDatasetFieldType().getName()) {
                case DatasetFieldConstant.westLongitude:
                    westLon = childDsf.getRawValue();
                    break;
                case DatasetFieldConstant.eastLongitude:
                    eastLon = childDsf.getRawValue();
                    break;
                case DatasetFieldConstant.northLatitude:
                    northLat = childDsf.getRawValue();
                    break;
                case DatasetFieldConstant.southLatitude:
                    southLat = childDsf.getRawValue();
                    break;
            }
        }
        if ((eastLon != null || westLon != null) && (northLat != null || southLat != null)) {
            // we have a point or a box, so proceed
            if (eastLon == null) {
                eastLon = westLon;
            } else if (westLon == null) {
                westLon = eastLon;
            }
            if (northLat == null) {
                northLat = southLat;
            } else if (southLat == null) {
                southLat = northLat;
            }
            //Find the overall bounding box that includes all bounding boxes
            if(minWestLon==null || Float.parseFloat(minWestLon) > Float.parseFloat(westLon)) {
                minWestLon=westLon;
            }
            if(maxEastLon==null || Float.parseFloat(maxEastLon) < Float.parseFloat(eastLon)) {
                maxEastLon=eastLon;
            }
            if(minSouthLat==null || Float.parseFloat(minSouthLat) > Float.parseFloat(southLat)) {
                minSouthLat=southLat;
            }
            if(maxNorthLat==null || Float.parseFloat(maxNorthLat) < Float.parseFloat(northLat)) {
                maxNorthLat=northLat;
            }
            //W, E, N, S
            solrInputDocument.addField(SearchFields.GEOLOCATION, "ENVELOPE(" + westLon + "," + eastLon + "," + northLat + "," + southLat + ")");
        }
    }
    //Only one bbox per dataset
    //W, E, N, S
    if ((minWestLon != null || maxEastLon != null) && (maxNorthLat != null || minSouthLat != null)) {
        solrInputDocument.addField(SearchFields.BOUNDING_BOX, "ENVELOPE(" + minWestLon + "," + maxEastLon + "," + maxNorthLat + "," + minSouthLat + ")");
    }

}
}

for(String metadataBlockName : metadataBlocksWithValue) {
Review comment (Member):

I rewatched Jim's recent "Experimenting with Geospatial indexing" talk ( https://osf.io/84pnw ) and it reminded me that I asked for a test to be added to SearchIT. I still want this, and I'm happy to try to add it if it makes sense for me to jump into the code. 😄
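The indexing loop above writes every box to the multivalued `geolocation` field but folds them into a single enclosing box for `boundingBox` (commit "handle multiples - make bbox a single surrounding box"). A minimal sketch of that fold, assuming `double` values rather than the raw strings the PR compares with `Float.parseFloat`; the class, record, and method names are hypothetical. Like the PR code, it does not special-case boxes that cross the antimeridian (west > east).

```java
import java.util.List;

public class BoundingBoxUnion {

    /** One geographic bounding box, in degrees. */
    public record Box(double west, double east, double north, double south) { }

    /** Smallest box enclosing all inputs: min west, max east, max north, min south. */
    public static Box union(List<Box> boxes) {
        if (boxes.isEmpty()) {
            throw new IllegalArgumentException("No boxes supplied");
        }
        double west = Double.POSITIVE_INFINITY;
        double east = Double.NEGATIVE_INFINITY;
        double north = Double.NEGATIVE_INFINITY;
        double south = Double.POSITIVE_INFINITY;
        for (Box b : boxes) {
            west = Math.min(west, b.west());
            east = Math.max(east, b.east());
            north = Math.max(north, b.north());
            south = Math.min(south, b.south());
        }
        return new Box(west, east, north, south);
    }

    /** Solr's ENVELOPE syntax takes minX, maxX, maxY, minY, i.e. W, E, N, S. */
    public static String envelope(Box b) {
        return "ENVELOPE(" + b.west() + "," + b.east() + "," + b.north() + "," + b.south() + ")";
    }

    public static void main(String[] args) {
        // The two boxes from the corrected DDI sample.
        Box all = union(List.of(new Box(10, 20, 40, 30), new Box(50, 60, 80, 70)));
        System.out.println(envelope(all)); // ENVELOPE(10.0,60.0,80.0,30.0)
    }
}
```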

Original file line number Diff line number Diff line change
@@ -268,4 +268,9 @@ more targeted results for just datasets. The format is YYYY (i.e.
public static final String FULL_TEXT = "_text_";
public static final String EMBARGO_END_DATE = "embargoEndDate";

// SpatialRecursivePrefixTreeFieldType: https://solr.apache.org/guide/8_11/spatial-search.html#rpt
public static final String GEOLOCATION = "geolocation";
// BBoxField (bounding box): https://solr.apache.org/guide/8_11/spatial-search.html#bboxfield
public static final String BOUNDING_BOX = "boundingBox";

}
Original file line number Diff line number Diff line change
@@ -355,15 +355,15 @@ The real issue here (https://github.com/IQSS/dataverse/issues/7304) is caused
DataverseRequest dataverseRequest = new DataverseRequest(session.getUser(), httpServletRequest);
List<Dataverse> dataverses = new ArrayList<>();
dataverses.add(dataverse);
-solrQueryResponse = searchService.search(dataverseRequest, dataverses, queryToPassToSolr, filterQueriesFinal, sortField, sortOrder.toString(), paginationStart, onlyDataRelatedToMe, numRows, false);
+solrQueryResponse = searchService.search(dataverseRequest, dataverses, queryToPassToSolr, filterQueriesFinal, sortField, sortOrder.toString(), paginationStart, onlyDataRelatedToMe, numRows, false, null, null);
if (solrQueryResponse.hasError()){
logger.info(solrQueryResponse.getError());
setSolrErrorEncountered(true);
}
// This 2nd search() is for populating the "type" ("dataverse", "dataset", "file") facets: -- L.A.
// (why exactly do we need it, again?)
// To get the counts we display in the types facets particularly for unselected types - SEK 08/25/2021
-solrQueryResponseAllTypes = searchService.search(dataverseRequest, dataverses, queryToPassToSolr, filterQueriesFinalAllTypes, sortField, sortOrder.toString(), paginationStart, onlyDataRelatedToMe, numRows, false);
+solrQueryResponseAllTypes = searchService.search(dataverseRequest, dataverses, queryToPassToSolr, filterQueriesFinalAllTypes, sortField, sortOrder.toString(), paginationStart, onlyDataRelatedToMe, numRows, false, null, null);
if (solrQueryResponse.hasError()){
logger.info(solrQueryResponse.getError());
setSolrErrorEncountered(true);
Original file line number Diff line number Diff line change
@@ -100,7 +100,7 @@ public class SearchServiceBean {
* @throws SearchException
*/
public SolrQueryResponse search(DataverseRequest dataverseRequest, List<Dataverse> dataverses, String query, List<String> filterQueries, String sortField, String sortOrder, int paginationStart, boolean onlyDatatRelatedToMe, int numResultsPerPage) throws SearchException {
-return search(dataverseRequest, dataverses, query, filterQueries, sortField, sortOrder, paginationStart, onlyDatatRelatedToMe, numResultsPerPage, true);
+return search(dataverseRequest, dataverses, query, filterQueries, sortField, sortOrder, paginationStart, onlyDatatRelatedToMe, numResultsPerPage, true, null, null);
}

/**
@@ -121,10 +121,24 @@ public SolrQueryResponse search(DataverseRequest dataverseRequest, List<Datavers
* @param onlyDatatRelatedToMe
* @param numResultsPerPage
* @param retrieveEntities - look up dvobject entities with .find() (potentially expensive!)
* @param geoPoint e.g. "35,15"
* @param geoRadius e.g. "5"
* @return
* @throws SearchException
*/
-public SolrQueryResponse search(DataverseRequest dataverseRequest, List<Dataverse> dataverses, String query, List<String> filterQueries, String sortField, String sortOrder, int paginationStart, boolean onlyDatatRelatedToMe, int numResultsPerPage, boolean retrieveEntities) throws SearchException {
+public SolrQueryResponse search(
+    DataverseRequest dataverseRequest,
+    List<Dataverse> dataverses,
+    String query,
+    List<String> filterQueries,
+    String sortField, String sortOrder,
+    int paginationStart,
+    boolean onlyDatatRelatedToMe,
+    int numResultsPerPage,
+    boolean retrieveEntities,
+    String geoPoint,
+    String geoRadius
+) throws SearchException {

if (paginationStart < 0) {
throw new IllegalArgumentException("paginationStart must be 0 or greater");
@@ -204,8 +218,12 @@ public SolrQueryResponse search(DataverseRequest dataverseRequest, List<Datavers
for (String filterQuery : filterQueries) {
solrQuery.addFilterQuery(filterQuery);
}


if (geoPoint != null && !geoPoint.isBlank() && geoRadius != null && !geoRadius.isBlank()) {
solrQuery.setParam("pt", geoPoint);
solrQuery.setParam("d", geoRadius);
// See https://solr.apache.org/guide/8_11/spatial-search.html#bbox
solrQuery.addFilterQuery("{!bbox sfield=" + SearchFields.GEOLOCATION + "}");
}

// -----------------------------------
// Facets to Retrieve
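When both values are present, the search adds only Solr's `pt` and `d` parameters plus a `{!bbox}` filter on the `geolocation` field. The sketch below reproduces that parameter set with plain JDK types (no SolrJ) so it can be inspected in isolation; the class and method names are hypothetical. A list of entries is used because Solr allows repeated keys such as `fq`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class BboxFilterParams {

    /**
     * Extra Solr parameters for a radius search, or an empty list when either value
     * is null or blank (mirroring the guard in SearchServiceBean).
     */
    public static List<Map.Entry<String, String>> geoParams(String geoPoint, String geoRadius) {
        List<Map.Entry<String, String>> params = new ArrayList<>();
        if (geoPoint != null && !geoPoint.isBlank() && geoRadius != null && !geoRadius.isBlank()) {
            params.add(Map.entry("pt", geoPoint));  // center as "lat,lon"
            params.add(Map.entry("d", geoRadius));  // distance; the schema sets distanceUnits="kilometers"
            params.add(Map.entry("fq", "{!bbox sfield=geolocation}"));
        }
        return params;
    }

    public static void main(String[] args) {
        // Prints pt=42.3,-71.1 then d=1.5 then fq={!bbox sfield=geolocation}
        geoParams("42.3,-71.1", "1.5").forEach(e -> System.out.println(e.getKey() + "=" + e.getValue()));
        System.out.println(geoParams("42.3,-71.1", null).isEmpty()); // true
    }
}
```

Per the Solr spatial search guide, `bbox` filters on the bounding box of the circle around `pt`, so it can match documents slightly farther than `geo_radius`; `{!geofilt}` is the exact-circle alternative if that matters.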
46 changes: 45 additions & 1 deletion src/main/java/edu/harvard/iq/dataverse/search/SearchUtil.java
@@ -181,5 +181,49 @@ public static String constructQuery(List<String> queryStrings, boolean isAnd, bo

return queryBuilder.toString().trim();
}


/**
 * @return Null if supplied point is null or whitespace.
 * @throws IllegalArgumentException If the lat/long is not separated by a
 * comma.
 * @throws NumberFormatException If the lat/long values are not numbers.
 */
public static String getGeoPoint(String userSuppliedGeoPoint) throws IllegalArgumentException, NumberFormatException {
    if (userSuppliedGeoPoint == null || userSuppliedGeoPoint.isBlank()) {
        return null;
    }
    String[] parts = userSuppliedGeoPoint.split(",");
    // We'll supply our own errors but Solr gives a decent one:
    // "Point must be in 'lat, lon' or 'x y' format: 42.3;-71.1"
    if (parts.length != 2) {
        String msg = "Must contain a single comma to separate latitude and longitude.";
        throw new IllegalArgumentException(msg);
    }
    float latitude = Float.parseFloat(parts[0]);
    float longitude = Float.parseFloat(parts[1]);
    return latitude + "," + longitude;
}

/**
 * @return Null if supplied radius is null or whitespace.
 * @throws NumberFormatException If the radius is not a positive number.
 */
public static String getGeoRadius(String userSuppliedGeoRadius) throws NumberFormatException {
    if (userSuppliedGeoRadius == null || userSuppliedGeoRadius.isBlank()) {
        return null;
    }
    float radius = 0;
    try {
        radius = Float.parseFloat(userSuppliedGeoRadius);
    } catch (NumberFormatException ex) {
        String msg = "Non-number radius supplied.";
        throw new NumberFormatException(msg);
    }
    if (radius <= 0) {
        String msg = "The supplied radius must be greater than zero.";
        throw new NumberFormatException(msg);
    }
    return userSuppliedGeoRadius;
}

}
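`getGeoPoint` above does not range-check coordinates (commit 364e347 adds a test for too-large values), and a non-numeric latitude or longitude surfaces as the bare `NumberFormatException` from `Float.parseFloat`, the case raised in the first review comment. A hypothetical stricter variant that range-checks and names the offending coordinate might look like this; it is a sketch, not what was merged, and it uses `double` where the PR uses `float`.

```java
public class StrictGeoPoint {

    /**
     * Same comma rule as SearchUtil.getGeoPoint, plus range checks and error
     * messages that name the bad coordinate.
     * @return null if the input is null or blank, otherwise "lat,lon"
     */
    public static String parse(String raw) {
        if (raw == null || raw.isBlank()) {
            return null;
        }
        String[] parts = raw.split(",");
        if (parts.length != 2) {
            throw new IllegalArgumentException("Must contain a single comma to separate latitude and longitude.");
        }
        double lat = coordinate(parts[0], "latitude", 90);
        double lon = coordinate(parts[1], "longitude", 180);
        return lat + "," + lon;
    }

    private static double coordinate(String text, String name, double limit) {
        double value;
        try {
            value = Double.parseDouble(text.trim());
        } catch (NumberFormatException ex) {
            throw new IllegalArgumentException("Non-number " + name + " supplied: " + text.trim(), ex);
        }
        // isFinite rejects "NaN" and "Infinity", which parseDouble otherwise accepts.
        if (!Double.isFinite(value) || value < -limit || value > limit) {
            throw new IllegalArgumentException(
                    "The supplied " + name + " must be between " + (-limit) + " and " + limit + ": " + text.trim());
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(parse("42.3,-71.1")); // 42.3,-71.1
        try {
            parse("91,0");
        } catch (IllegalArgumentException ex) {
            System.out.println(ex.getMessage()); // The supplied latitude must be between -90.0 and 90.0: 91
        }
    }
}
```

Since `NumberFormatException` is itself an `IllegalArgumentException`, the existing `catch (Exception ex)` in `Search.java` would turn either error into a 400 response unchanged.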
Original file line number Diff line number Diff line change
@@ -266,7 +266,9 @@ private SolrQueryResponse findHits(SavedSearch savedSearch) throws SearchExcepti
paginationStart,
dataRelatedToMe,
numResultsPerPage,
-false // do not retrieve entities
+false, // do not retrieve entities
+null,
+null
);
return solrQueryResponse;
}