outlierForLayer field provides data for only one layer #31

Open
shawnlaffan opened this issue Mar 31, 2018 · 9 comments

@shawnlaffan

This is related to #27.

As an example, the online search for Acacia cangaiensis produces one record that is flagged as an outlier for three layers, Bio15, Bio17 and Bio26.

https://biocache.ala.org.au/occurrences/d97cd2e1-c871-4be5-bd50-2b963f210902

However, the data downloaded via ALA4R give only one layer, el882, which corresponds to Bio15.

Can more information be packed into this field? Or could a new field be provided? A comma-separated list should work well enough to state which layers a record is an outlier for.

Code to reproduce is below.

Thanks,
Shawn.

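```r
library(ALA4R)

search_term = "Acacia cangaiensis"
# Bounding polygon (WKT) covering Australia
wkt_text = "POLYGON((154 -43.74,154 -9,112.9 -9,112.9 -43.74,154 -43.74))"

# Download occurrence records for the taxon within the polygon
ala = occurrences(taxon=search_term, wkt=wkt_text, download_reason_id=7)
# Drop records that lack coordinates
ala$data = ala$data[!(is.na(ala$data$longitude) | is.na(ala$data$latitude)),]
# Inspect outlierForLayer for the record that the web page flags on three layers
ala$data[ala$data$id == 'd97cd2e1-c871-4be5-bd50-2b963f210902', 'outlierForLayer']
```
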
@nickdos

nickdos commented Apr 4, 2018

Hi @shawnlaffan,

> Can more information be packed into this field? Or a new field be provided? A comma separated list should work well enough to state which layers a record is an outlier for.

Are you wanting to get the data for the other outlier layers as separate columns or are you simply needing more information about the el882 layer included in that field?

Also, is the issue simply that the "outlier for layer X" assertion data is NOT coming through in the download, so that including it would be sufficient?

A user story or use case would be helpful to frame the request, as well.

@shawnlaffan
Author

Hi @nickdos,

My use case is to identify records that are outliers for two or more environmental layers. Many of the records that are single-layer outliers seem to be OK for my purposes (admittedly, that's not based on rigorous testing).
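
A minimal sketch of that filter in R, assuming (hypothetically) that all of a record's outlier layers were available as one semicolon-separated string in `outlierForLayer`. The example records and the `count_outlier_layers()` helper are made up for illustration; they are not part of ALA4R.

```r
# Hypothetical example data: one row per record, with all outlier layers
# packed into a single semicolon-separated string ("" or NA = no flags).
records <- data.frame(
  id = c("rec-a", "rec-b", "rec-c", "rec-d"),
  outlierForLayer = c("el882;el889;el894", "el882", "", NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: count how many layers each record is flagged for.
count_outlier_layers <- function(x, sep = ";") {
  x[is.na(x)] <- ""  # treat missing values as "no layers"
  parts <- strsplit(x, sep, fixed = TRUE)
  # nzchar() skips empty pieces, so "" counts as zero layers
  vapply(parts, function(p) sum(nzchar(p)), integer(1))
}

records$n_outlier_layers <- count_outlier_layers(records$outlierForLayer)

# Keep only records that are outliers for two or more layers.
multi_layer_outliers <- records[records$n_outlier_layers >= 2, ]
multi_layer_outliers$id
```

`fixed = TRUE` makes `strsplit()` treat the separator literally, so no regular-expression escaping is needed whichever delimiter is chosen.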

I had a look at the API pages, and the problem might be at the API level where the table is generated, since a direct check there also gave only one layer. Of course, now I cannot reproduce that because I've forgotten which search I used. Perhaps it is the CSV generation component.

In any case, the JSON returned by direct access contains all three outlier layers. Snippet from https://biocache.ala.org.au/ws/occurrence/d97cd2e1-c871-4be5-bd50-2b963f210902 :

processed |  
-- | --
rowKey | "dr376\|MEL\|MEL0618363A"
uuid | "d97cd2e1-c871-4be5-bd50-2b963f210902"
occurrence |  
basisOfRecord | "PreservedSpecimen"
modified | "2000-12-08"
occurrenceStatus | "present"
recordedBy | "Beauglehole, A.C."
outlierForLayers |  
0 | "el882"
1 | "el889"
2 | "el894"

In terms of packing the info into the existing structures in ALA4R, multiple columns would work but would quickly get unwieldy, so packing the layers into a single entry might be better, e.g. "el882;el883;el887". A space or semicolon would actually be a better separator than a comma, since a comma clashes with the CSV delimiter and forces CSV parsing libraries to deal with quoting.
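
A minimal sketch of that packing round trip in R. The `pack_layers()` and `unpack_layers()` helpers are hypothetical, not ALA4R functions, and `quote = FALSE` is used only to show that a semicolon-separated value does not clash with the CSV comma delimiter.

```r
# Hypothetical helper: collapse one record's outlier layers into one string.
pack_layers <- function(layers, sep = ";") {
  paste(layers, collapse = sep)  # character(0) gives ""
}

# Hypothetical helper: split a packed string back into layer IDs.
unpack_layers <- function(packed, sep = ";") {
  if (is.na(packed) || !nzchar(packed)) return(character(0))
  strsplit(packed, sep, fixed = TRUE)[[1]]
}

packed <- pack_layers(c("el882", "el889", "el894"))

# Because the separator is not a comma, the value survives a CSV round trip
# even when write.csv() is told not to quote strings.
df <- data.frame(id = "rec-a", outlierForLayer = packed, stringsAsFactors = FALSE)
csv_lines <- capture.output(write.csv(df, row.names = FALSE, quote = FALSE))
round_trip <- read.csv(text = csv_lines, stringsAsFactors = FALSE)

unpack_layers(round_trip$outlierForLayer)
```

With a comma as the separator and no quoting, the same round trip would split the layers across extra columns, which is the parsing problem described above.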

Hopefully that helps explain things a bit more.

Shawn.

@shawnlaffan
Author

Just an update.

This is the record returned via the `ALA4R::occurrences()` call. The `outlierForLayer` field lists el882, but el889 and el894 are not listed.

```csv
"","id","catalogNumber","matchTaxonConceptLsid","scientificNameOriginal","commonName","scientificName","rank","kingdom","phylum","class","order","family","genus","species","subspecies","institutionCode","collectionCode","locality","latitudeOriginal","longitudeOriginal","geodeticDatum","latitude","longitude","coordinateUncertaintyInMetres","country","IBRA7Regions","IMCRA4Regions","state","localGovernmentAreas","minimumElevationInMetres","maximumElevationInMetres","minimumDepthInMeters","maximumDepthInMeters","collector","year","month","eventDate","basisOfRecordOriginal","basisOfRecord","sex","outlierForLayer","taxonIdentificationIssue","locationQuality","altitudeNonNumeric","assumedPresentOccurrenceStatus","badlyFormedBasisOfRecord","coordinatePrecisionMismatch","dataAreGeneralised","decimalLatLongConverted","firstOfMonth","firstOfYear","geodeticDatumAssumedWgs84","incompleteCollectionDate","inferredDuplicateRecord","invalidCollectionDate","occCultivatedEscapee","uncertaintyRangeMismatch","unrecognisedCollectionCode","unrecognisedInstitutionCode","unrecognisedOccurrenceStatus","unrecognizedGeodeticDatum"
"30","d97cd2e1-c871-4be5-bd50-2b963f210902","MEL 0618363A","http://id.biodiversity.org.au/node/apni/2894960","Acacia cangaiensis Tindale & Kodela","","Acacia cangaiensis","species","Plantae","Charophyta","Equisetopsida","Fabales","Fabaceae","Acacia","Acacia cangaiensis","","MEL","MEL","Wannon River Falls Reserve, 19 km WNW of Hamilton Post Office.",-37.6667,141.8333,"",-37.6667,141.8333,10000,"Australia","Victorian Midlands","","Victoria","Southern Grampians (S)",NA,NA,"","","Beauglehole, A.C.",1978,2,"1978-02-06","PreservedSpecimen","PreservedSpecimen","","el882","noIssue",TRUE,FALSE,TRUE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,TRUE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE
```

@nickdos

nickdos commented Apr 4, 2018

Thanks @shawnlaffan. Looks like a bug (or feature) where the SOLR index has a multiValued field type but the download code is only grabbing the first value. I've logged an issue (linked above).
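
To make the suspected cause concrete: a multiValued field holds an array per record, and a CSV export has to reduce that array to a single cell. This sketch uses plain R lists as stand-ins for index documents (an assumption; it does not query Solr or biocache-service), and the `flatten_field()` helper is hypothetical. It contrasts keeping only the first value with joining all values.

```r
# Hypothetical helper, not part of ALA4R or biocache-service: flatten a
# multi-valued field to one string per record, as a CSV download must.
flatten_field <- function(docs, field, first_only = FALSE, sep = ";") {
  vapply(docs, function(d) {
    v <- d[[field]]
    if (length(v) == 0) return("")  # field absent or empty
    if (first_only) v[[1]] else paste(v, collapse = sep)
  }, character(1))
}

# The record from this issue plus a made-up record with no outlier flags.
docs <- list(
  list(uuid = "d97cd2e1-c871-4be5-bd50-2b963f210902",
       outlierForLayers = c("el882", "el889", "el894")),
  list(uuid = "example-record-without-flags")
)

flatten_field(docs, "outlierForLayers", first_only = TRUE)  # reproduces the symptom
flatten_field(docs, "outlierForLayers")                     # keeps every layer
```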

@shawnlaffan
Author

Thanks @nickdos

@nickdos

nickdos commented Dec 10, 2018

See the latest comment on AtlasOfLivingAustralia/biocache-service#195 (comment) for a fix.

@shawnlaffan
Author

Thanks @nickdos

@nickdos

nickdos commented Dec 10, 2018

I'm not very knowledgeable about ALA4R, so I'm not sure whether the suggested fix also requires a code change in ALA4R. @peggynewman, any ideas?

@peggynewman
Contributor

Yes, @nickdos @shawnlaffan, it's an ALA4R code fix. I'll label this as a bug so it can go into the next release.
