Dataset Submission: HakaiCTDResearch #8

Closed
12 tasks done
JessyBarrette opened this issue Feb 4, 2021 · 23 comments

@JessyBarrette
Collaborator

JessyBarrette commented Feb 4, 2021

Hakai Dataset Submission

Below are listed all the different steps related to the initial submission of a dataset.

More detailed written and visual descriptions of every step are available here and here, respectively.

Submission steps

Initial Submission (Data Administrator)

  • Original Data Submission
  • CIOOS Metadata Form completed

ERDDAP Dataset Creation (Data Integrator)

  • Dataset Transformation (Format label)
    • 🟢 Format Compatible
    • 🟡 Format Minor Revisions
    • 🟠 Format Major Revisions
    • 🔴 Format Incompatible/Missing Information
  • Near Real-time Data Integration
  • QARTOD Integration
  • ERDDAP Integration
  • ERDDAP Dataset Documentation
  • ERDDAP Test Locally
  • Add Dataset to Development Branch

Dataset Review (Data Administrator)

  • Dataset Development Branch Revision (Reviewer Label)
    • 🟢 Reviewer Approved
    • 🟡 Reviewer Minor Revisions
    • 🟠 Reviewer Major Revisions

Dataset Completion (Data Integrator)

  • Merge Development Dataset to Production Branch
  • COMPLETED
@raytula
Contributor

raytula commented Feb 16, 2021

Hi @finnshort
Today @JessyBarrette, @jenjax2, @trollpete and I reviewed the steps needed to complete manual QC of Hakai CTD casts. Is it (still?) appropriate for Jen to download a cast list from the EIMS, set the QC column for each measurement in each cast to 'AV', 'SVD', etc., and then upload the updated file? Also, what is the best way for Jen to add a comment for the overall cast? (i.e., can/should Jen update the Comment column to include her personal comment? If so, what happens to any existing comment that may have been entered in the field?)
Thanks, Ray

@finnshort

Hi @raytula, the QC feature is still live and working. I just gave it a once-over on the development server and I don't see anything that has changed in the CTD schema in the past couple of years that would have affected it. I do recommend checking the results to make sure they are coming through as expected (as with any data changes).

I know @fostermh and @jodiew have been working on implementing changes to the CTD flags so maybe they can comment on whether they anticipate this affecting the QC flagging feature at all.

The comments can still be updated through the CTD QC Excel sheets. If Jen deletes an existing comment and adds her own, the old one will be removed (not recommended). It would be better if she adds her own comment after the existing one so that both are saved.

Let me know if that answers your questions!

@fostermh
Contributor

I was under the impression the old QC workflow was scrapped. Level 1 and 2 QC flags are being added to support the workflow of Jessy populating them automatically from a script. We have not moved on to anything else as we are waiting for a clearly defined workflow. To date, the lack of one has been a problem and I'm hesitant to endorse any development without it.

@JessyBarrette
Collaborator Author

JessyBarrette commented Feb 22, 2021

@fostermh @finnshort @jodiew @raytula Sorry for the late reply on this. I think we now have a clearly defined workflow. Just to make things clear for everyone, changes will be made on the database side to reflect the following items:

Database Changes

  1. The data used throughout comes from the database view 'ctd/views/file/cast/data'. I'm not sure if you want to create a new view for this project; I'll leave that up to you, but for now the QC tool uses this view as its input.
  2. All the *_flag variables available within that view should be replaced by a level_1 and a level_2 QC flag.
    1. Level 1: QARTOD flag number [1,2,3,4,9] (Update: the column should be called *_UQL. UQL = UNESCO Quality Level [QARTOD])
    2. Level 2: Aggregated string description of the test results that returned a FAIL or SUSPECT flag (Update: will likely be called *_flags; we could just keep the already existing *_flag column)
  3. Position Level 1 and Level 2 flags should be added, associated with the tests applied to the latitude/longitude data versus the expected site location.
  4. Station_Latitude and Station_longitude columns should be added, corresponding to the latitude/longitude of the associated site within the database.
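
As a rough illustration of the flag columns described above, here is a minimal Python sketch of how per-test results could be rolled up into a Level 1 number and a Level 2 string, and how a position test against the station coordinates might look. The function names, the precedence order, and the distance thresholds are all assumptions made for the example; they are not taken from the actual QC tool.

```python
import math

# QARTOD flag values listed above: [1, 2, 3, 4, 9].
GOOD, NOT_EVALUATED, SUSPECT, FAIL, MISSING = 1, 2, 3, 4, 9

# Assumed precedence, lowest to highest, used when combining several tests.
PRECEDENCE = [MISSING, NOT_EVALUATED, GOOD, SUSPECT, FAIL]


def aggregate_flags(test_results):
    """Combine {test_name: flag} into (level_1, level_2).

    level_1 is the highest-precedence flag; level_2 lists the tests that
    returned SUSPECT or FAIL.
    """
    if not test_results:
        return NOT_EVALUATED, ""
    level_1 = max(test_results.values(), key=PRECEDENCE.index)
    labels = {SUSPECT: "SUSPECT", FAIL: "FAIL"}
    level_2 = "; ".join(
        f"{labels[flag]}: {name}"
        for name, flag in sorted(test_results.items())
        if flag in labels
    )
    return level_1, level_2


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def position_flag(lat, lon, station_lat, station_lon, suspect_km=1.0, fail_km=5.0):
    """Flag a cast position against the expected station location.

    The 1 km and 5 km thresholds are hypothetical placeholders.
    """
    if None in (lat, lon, station_lat, station_lon):
        return MISSING
    distance = haversine_km(lat, lon, station_lat, station_lon)
    if distance > fail_km:
        return FAIL
    if distance > suspect_km:
        return SUSPECT
    return GOOD
```

For example, `aggregate_flags({"gross_range": 1, "spike": 3})` gives a Level 1 flag of 3 and a Level 2 string naming only the spike test.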

Workflow

  1. The CTD data itself is populated only by the Seabird and RBR processing tools. The QCing tool makes no changes to that data.
  2. The QCing tool will only affect the level_1 and level_2 flag columns.
    1. The following section of the QCing tool explains how to install the package
    2. The QCing tool can be triggered by providing either a Hakai CTD profile ID or a JSON string of the data to be QCed. See here for more detail.
  3. UPDATE: IGNORE GREY LIST: The QCing tool retrieves manual inputs contained in the view 'eims/views/output/ctd_flags' (update: a CSV file present within the repository hakai-profile-qc)
    1. A hakai_id and a 'query' column should be added to this view.
    2. Those will be manually populated by the QC reviewer or the automated QC tool to overwrite test results from the automated QCing tool if needed.
    3. We could instead rely on a simple CSV file within the QCing tool, if that option is preferred by Hakai IT. This is what we'll do!
  4. Research Dataset: The research dataset will be generated from the CTD QC log view ('need to find the endpoint') as NetCDF files containing exclusively the hakai_id and the variables associated with a QARTOD flag = 1 and an AV value in the CTD QC log.
    • This view won't have any direct interactions with the CTD profile dataset on the database anymore.
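
The research dataset rule in item 4 can be sketched as a simple filter. This is only an illustration under assumed record layouts: the field names `variable`, `value`, `uql`, and `reviewer_flag` are hypothetical and do not come from the database views or the QC log.

```python
def select_research_values(records, qc_log):
    """Keep only values that are both reviewer-approved and QARTOD GOOD.

    records: list of dicts with keys hakai_id, variable, value, uql
    qc_log:  list of dicts with keys hakai_id, variable, reviewer_flag
    (all key names other than hakai_id are assumptions for this sketch)
    """
    # (hakai_id, variable) pairs the reviewer marked as AV in the QC log.
    approved = {
        (row["hakai_id"], row["variable"])
        for row in qc_log
        if row["reviewer_flag"] == "AV"
    }
    # A value is kept only if its QARTOD flag is 1 and the reviewer approved it.
    return [
        rec
        for rec in records
        if rec["uql"] == 1 and (rec["hakai_id"], rec["variable"]) in approved
    ]


# Hypothetical sample data.
example_records = [
    {"hakai_id": "cast_1", "variable": "temperature", "value": 9.8, "uql": 1},
    {"hakai_id": "cast_1", "variable": "salinity", "value": 29.1, "uql": 3},
    {"hakai_id": "cast_2", "variable": "temperature", "value": 10.2, "uql": 1},
]
example_qc_log = [
    {"hakai_id": "cast_1", "variable": "temperature", "reviewer_flag": "AV"},
    {"hakai_id": "cast_1", "variable": "salinity", "reviewer_flag": "AV"},
    {"hakai_id": "cast_2", "variable": "temperature", "reviewer_flag": "SVD"},
]
research_values = select_research_values(example_records, example_qc_log)
```

In this sample only the `cast_1` temperature value survives, since it is the only one that satisfies both conditions.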

Let me know if you have any questions!

@JessyBarrette
Collaborator Author

@jenjax2, @finnshort mentioned that we need your permission to get access to the CTD QC log. This log will be used to generate Research Ready NetCDF files, which will then be uploaded to the Hakai CTD Research Dataset.

@jenjax2
Collaborator

jenjax2 commented Feb 22, 2021

@JessyBarrette @finnshort you have my permission to get access to the CTD qc log.

When you say Research Ready NetCDF files, do you mean these NetCDF files will be created after I QC the data?

@JessyBarrette
Collaborator Author

That's right Jenn, once you have a set of data QCed, a simple script will read your QC log and generate static NetCDF files with the profiles and variables that you QCed as AV and for which the QARTOD flag is GOOD (== 1).

Once uploaded to our server, those NetCDF files would never be changed (unless we want to), though other variables could potentially be added. This is probably the only way we can make sure that the reviewed data remain constant over time even if, for example, the whole database gets rebuilt for whatever reason.

@JessyBarrette
Collaborator Author

@raytula @jenjax2 @trollpete A first draft of the research dataset is now available on the development ERDDAP:
https://goose.hakai.org/erddap/tabledap/HakaiWaterPropertiesInstrumentProfileResearch.html

For the moment, I kept all the variables that will be available in either the provisional or the research dataset. We will likely remove the flag columns from the research dataset, since only data flagged as GOOD are kept here. Please review the data and metadata.

Once we all agree on the variables and associated attributes, this will be carried over to the provisional dataset.
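
For anyone who prefers to review the data outside the browser, here is a small sketch that builds a tabledap request URL following ERDDAP's general `tabledap/<datasetID>.<fileType>?<variables>&<constraints>` pattern. The variable names and the station constraint used in the example are assumptions; check the dataset page for the names actually published.

```python
from urllib.parse import quote

SERVER = "https://goose.hakai.org/erddap"
DATASET_ID = "HakaiWaterPropertiesInstrumentProfileResearch"


def tabledap_url(variables, constraints=(), file_type="csv"):
    """Build a tabledap URL for the given variables and constraints.

    Each constraint is percent-encoded (the quotes around string values
    in particular), leaving '=' readable.
    """
    query = ",".join(variables)
    for constraint in constraints:
        query += "&" + quote(constraint, safe="=")
    return f"{SERVER}/tabledap/{DATASET_ID}.{file_type}?{query}"


# Hypothetical variable names and constraint, for illustration only.
url = tabledap_url(["station", "time"], ['station="QU39"'])
print(url)
```

The resulting URL can be pasted in a browser or passed to any CSV reader; the sketch itself makes no network request.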

The example regroups all the data and associated variables that were flagged as AV by the reviewer and associated with QARTOD flag = 1 [GOOD].

I will make a little Jupyter notebook to review the conflicting results between the reviewer and QARTOD flags in the next few days.
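
The kind of comparison such a notebook could perform might look like the sketch below. It is not the notebook itself; the row layout (`reviewer_flag`, `uql`) is a hypothetical one chosen for the example.

```python
def find_conflicts(rows):
    """Return rows where the reviewer and the QARTOD flag disagree.

    rows: list of dicts with keys hakai_id, variable, reviewer_flag, uql
    (assumed layout). A row conflicts when exactly one of the two
    sources considers the value good.
    """
    conflicts = []
    for row in rows:
        reviewer_ok = row["reviewer_flag"] == "AV"
        qartod_ok = row["uql"] == 1
        if reviewer_ok != qartod_ok:
            reason = (
                "reviewer AV, QARTOD not GOOD"
                if reviewer_ok
                else "QARTOD GOOD, reviewer not AV"
            )
            conflicts.append({**row, "conflict": reason})
    return conflicts


# Hypothetical sample rows.
sample_rows = [
    {"hakai_id": "cast_1", "variable": "temperature", "reviewer_flag": "AV", "uql": 1},
    {"hakai_id": "cast_1", "variable": "salinity", "reviewer_flag": "AV", "uql": 3},
    {"hakai_id": "cast_2", "variable": "temperature", "reviewer_flag": "SVD", "uql": 1},
]
conflicting = find_conflicts(sample_rows)
```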

@JessyBarrette
Collaborator Author

I forgot to mention you can also review the generated NetCDF files which are used behind ERDDAP and accessible here:
https://goose.hakai.org/erddap/files/HakaiWaterPropertiesInstrumentProfileResearch/

The objective is to have a long-term file format that regroups all the information for each profile within a single file. This may evolve a bit over the near future, but hopefully we'll get something pretty stable soonish.

@n-a-t-e
Member

n-a-t-e commented Mar 16, 2021

I switched this dataset from a view to a table that is recreated nightly. This avoids issues with rebuilds in the EIMS system: the dataset now has no connection to EIMS tables (e.g. they could be dropped without affecting ERDDAP). Since CTD data doesn't come in that quickly anyway, I figured updating only nightly won't bother people. See 5bcb6d9
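
To illustrate the idea of replacing a view with a periodically rebuilt table, here is a self-contained sketch using Python's built-in `sqlite3` module. The real change lives in commit 5bcb6d9 and presumably targets a different database engine; the table and view names below are made up for the example.

```python
import sqlite3


def refresh_snapshot(conn, view="ctd_view", table="ctd_snapshot"):
    """Rebuild the snapshot table from the current contents of the view."""
    with conn:
        conn.execute(f"DROP TABLE IF EXISTS {table}")
        conn.execute(f"CREATE TABLE {table} AS SELECT * FROM {view}")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE casts (hakai_id TEXT, station TEXT, temperature REAL)")
conn.executemany(
    "INSERT INTO casts VALUES (?, ?, ?)",
    [("cast_1", "QU39", 9.8), ("cast_2", "KC10", 10.2)],
)
conn.execute("CREATE VIEW ctd_view AS SELECT * FROM casts")
conn.commit()

# The nightly job would call this once per run.
refresh_snapshot(conn)

# Dropping the source objects (as a rebuild might) does not affect the snapshot.
conn.execute("DROP VIEW ctd_view")
conn.execute("DROP TABLE casts")
remaining = conn.execute("SELECT COUNT(*) FROM ctd_snapshot").fetchone()[0]
```

The trade-off is the one described above: readers of the snapshot see data that can be up to one refresh interval old, in exchange for independence from the source tables.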

@raytula
Contributor

raytula commented Mar 17, 2021

The ERDDAP links @JessyBarrette provided are not currently working. Is that perhaps due to the delayed EIMS rebuild today?

@JessyBarrette
Collaborator Author

@raytula sorry, that was my bad; the dataset is back online. FYI, the Goose ERDDAP restarts every 15 min and this dataset is among the last ones to appear when the server refreshes.

@n-a-t-e
Member

n-a-t-e commented Mar 17, 2021

> @raytula sorry, that was my bad; the dataset is back online. FYI, the Goose ERDDAP restarts every 15 min and this dataset is among the last ones to appear when the server refreshes.

The goose ERDDAP should only restart if there are changes to datasets.xml, though; it checks every 15 minutes. On production, it will just reload the one dataset.

@JessyBarrette
Collaborator Author

@jenjax2 @raytula The research CTD profile dataset is now available on the goose ERDDAP:
https://goose.hakai.org/erddap/tabledap/HakaiWaterPropertiesInstrumentProfileResearch.html

Since we're only presenting data that are flagged as 1 and reviewed, we may want to omit all of the flag columns.

@JessyBarrette
Collaborator Author

This dataset uses NetCDF files generated for each profile, which are available here: https://goose.hakai.org/erddap/files/HakaiWaterPropertiesInstrumentProfileResearch/

The files are grouped by work_area/station.

@raytula
Contributor

raytula commented Mar 25, 2021

Looks good to me. Yes, there is no need for flag columns in the research datasets.

@raytula
Contributor

raytula commented Mar 26, 2021

FYI. The link was working yesterday, but not today (I think)
https://goose.hakai.org/erddap/files/HakaiWaterPropertiesInstrumentProfileResearch/

Error {
    code=404;
    message="Not Found: Currently unknown datasetID=HakaiWaterPropertiesInstrumentProfileResearch";
}

@JessyBarrette
Collaborator Author

@raytula thanks! I made a mistake last night while trying to remove the flags from datasets.xml. It should be back online in the next 15 min or so.

@JessyBarrette
Collaborator Author

@jenjax2 @raytula Data from all stations except QU39 and KC10 were removed from the research dataset.

@raytula
Contributor

raytula commented Mar 30, 2021

> @jenjax2 @raytula Data from all stations except QU39 and KC10 were removed from the research dataset.

Great. Thanks @JessyBarrette

@jenjax2
Collaborator

jenjax2 commented Mar 31, 2021

Thanks @JessyBarrette !

@JessyBarrette
Collaborator Author

This dataset is now available online. We can close this issue!

@JessyBarrette
Collaborator Author

A DOI was added to this metadata record: https://doi.org/10.21966/6cz5-6d70

It will also be added to the ERDDAP datasets.
