Improving Quality Control notebook

castelao committed Aug 31, 2019
1 parent 1e6b71f commit cdcc4f20f01b9ced0f5b97b0c31d317e179c4bf9
Showing with 92 additions and 56 deletions.
  1. +92 −56 docs/notebooks/QualityControl.ipynb
@@ -4,34 +4,23 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Quality Control CTD data with PySeabird"
"# Quality Control CTD data with PySeabird\n",
"### Author: Guilherme Castelão"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Author: Guilherme Castelão"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"pySeabird is a package to parse and load CTD data files. It should be an easy task, but the format has been changing over time. Working with data from multiple ships or cruises requires first understanding each file and normalizing it into a common format before starting your analysis. That can still be done with a few general regular expression rules, but I would rather use strict rules. If I'm loading hundreds or thousands of profiles, I want to be sure that no mistake slipped through. I would rather ignore a doubtful file and issue a warning than believe that it was loaded correctly and let it become part of my analysis.\n",
"This is a minimal example of how to use the Python Seabird package to read a CTD output file and apply quality control to it. For more details, please check the [documentation](https://seabird.readthedocs.io/en/latest/).\n",
"\n",
"With that in mind, I wrote this package with the ability to load multiple rules, so new rules can be added without changing the main engine.\n",
"### Requirements\n",
"\n",
"For more information, check the documentation."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To run this notebook you will need the packages seabird, cotede, and supportdata. To install those you can run in the terminal\n",
"This notebook requires the packages seabird, supportdata, and cotede. You can install them using pip as follows:\n",
"\n",
"pip install seabird supportdata cotede"
"```shell\n",
"pip install seabird[QC]\n",
"```"
]
},
{
@@ -123,13 +112,20 @@
"print(\"Data: %s\" % profile.keys())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's apply the quality control procedure recommended by GTSPP"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"profile = fProfileQC('dPIRX003.cnv')"
"profile = fProfileQC('dPIRX003.cnv', cfg='gtspp')"
]
},
{
@@ -163,7 +159,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's check which tests were performed, hence which flags are available"
"Let's check which tests were performed, and hence which flags are available, on the primary temperature sensor"
]
},
{
@@ -174,7 +170,7 @@
{
"data": {
"text/plain": [
"dict_keys(['valid_datetime', 'location_at_sea', 'global_range', 'profile_envelop', 'gradient', 'gradient_depthconditional', 'spike', 'spike_depthconditional', 'stuck_value', 'tukey53H_norm', 'digit_roll_over', 'woa_normbias', 'cars_normbias', 'rate_of_change', 'cum_rate_of_change', 'anomaly_detection', 'overall'])"
"dict_keys(['valid_datetime', 'location_at_sea', 'global_range', 'profile_envelop', 'gradient', 'spike', 'woa_normbias', 'overall'])"
]
},
"execution_count": 8,
@@ -190,7 +186,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The flagging standard is described in the manual. The one used here is 0 for no QC performed and 1 approved data"
"The flagging standard is described in [CoTeDe's manual](https://cotede.readthedocs.io/en/latest/). The one used here is 0 for no QC performed, 1 for approved data, and 9 for missing data.\n",
"\n",
"Note that the overall flag is the combined result of all tested flags. In the example above, it considers the other 7 flags and takes the highest value. Therefore, an overall flag equal to 1 means that all possible tests approved that measurement, while a value of 4 means that at least one test suggests it is a bad measurement."
]
},
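The "highest value wins" rule described in the cell above can be illustrated with plain NumPy. This is only a sketch under stated assumptions: the flag arrays are made up for illustration (they are not read from `dPIRX003.cnv`), and it reproduces the rule as the notebook text states it rather than calling CoTeDe's own implementation.

```python
import numpy as np

# Hypothetical per-test flags for five measurements (illustrative values only).
# 0 = no QC performed, 1 = approved, 4 = bad, following the notebook's text.
flags = {
    "global_range": np.array([1, 1, 1, 4, 1], dtype="i1"),
    "gradient": np.array([0, 1, 1, 1, 1], dtype="i1"),
    "spike": np.array([0, 1, 4, 1, 1], dtype="i1"),
}

# Stack the tests as rows and take the highest flag for each measurement.
overall = np.max(np.stack(list(flags.values())), axis=0)
print(overall)
```

With this rule a single test flagging a measurement as 4 is enough to mark it as bad overall, regardless of how many other tests approved it.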
{
@@ -201,7 +199,7 @@
{
"data": {
"text/plain": [
"array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1,\n",
"array([0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 9, 0, 1,\n",
" 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,\n",
" 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,\n",
" 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,\n",
@@ -215,8 +213,8 @@
" 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,\n",
" 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,\n",
" 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,\n",
" 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0,\n",
" 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0],\n",
" 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 9, 0, 9, 0, 9, 9, 0, 9, 9,\n",
" 9, 0, 9, 9, 9, 9, 9, 0, 9, 9, 9, 9, 9, 9, 0, 9, 9, 9, 9],\n",
" dtype=int8)"
]
},
@@ -226,47 +224,21 @@
}
],
"source": [
"profile.flags['TEMP']['anomaly_detection']"
"profile.flags['TEMP']['spike']"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Header: dict_keys(['sbe_model', 'seasave', 'instrument_type', 'nquan', 'nvalues', 'start_time', 'bad_flag', 'file_type', 'md5', 'datetime', 'LATITUDE', 'LONGITUDE', 'filename'])\n",
"Data: ['timeS', 'PRES', 'TEMP', 'TEMP2', 'CNDC', 'CNDC2', 'potemperature', 'potemperature2', 'PSAL', 'PSAL2', 'flag']\n"
]
}
],
"source": [
"print(\"Header: %s\" % profile.attributes.keys())\n",
"print(\"Data: %s\" % profile.keys())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We have latitude in the header, and pressure in the data."
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [],
"source": [
"idx = profile.flags['TEMP']['overall'] < 3"
"idx = profile.flags['TEMP']['overall'] <= 2"
]
},
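The comparison above produces a boolean mask that can be used to keep only the measurements whose overall flag is at most 2. A minimal sketch of that selection, assuming made-up temperature, pressure, and flag arrays in place of the real `profile` object:

```python
import numpy as np

# Hypothetical measurements and overall flags (not taken from the CTD file).
temperature = np.array([20.1, 19.8, 35.0, 18.9, 18.5])
pressure = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
overall = np.array([1, 1, 4, 0, 1], dtype="i1")

# Same condition as in the notebook: keep flags 0, 1, and 2.
idx = overall <= 2

# Boolean indexing keeps temperature and pressure aligned with each other.
good_temperature = temperature[idx]
good_pressure = pressure[idx]
print(good_pressure, good_temperature)
```

Note that a flag of 0 (no QC performed) also passes this condition, while 4 (bad) and 9 (missing) are excluded.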
{
"cell_type": "code",
"execution_count": 12,
"execution_count": 11,
"metadata": {},
"outputs": [
{
@@ -275,7 +247,7 @@
"Text(0.5, 1.0, 'dPIRX003.cnv')"
]
},
"execution_count": 12,
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
},
@@ -305,6 +277,70 @@
"plt.title(profile.attributes['filename'])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Other predefined quality control procedures are available; please check [CoTeDe's manual](https://cotede.readthedocs.io/en/latest/) to learn the details of the tests and what is available. For instance, to apply the EuroGOOS recommendations, change the cfg argument"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"dict_keys(['valid_datetime', 'location_at_sea', 'global_range', 'gradient_depthconditional', 'spike_depthconditional', 'digit_roll_over', 'woa_normbias', 'overall'])"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"profile = fProfileQC('dPIRX003.cnv', cfg='eurogoos')\n",
"profile.flags['TEMP'].keys()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If not defined, the default configuration is a collection of tests resulting from our work on [IQuOD](http://www.iquod.org/), and is equivalent to defining `cfg='cotede'`."
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Deprecated cfg format. It should contain a threshold item.\n",
"Deprecated cfg format. It should contain a threshold item.\n"
]
},
{
"data": {
"text/plain": [
"dict_keys(['valid_datetime', 'location_at_sea', 'global_range', 'profile_envelop', 'gradient', 'gradient_depthconditional', 'spike', 'spike_depthconditional', 'stuck_value', 'tukey53H_norm', 'digit_roll_over', 'woa_normbias', 'cars_normbias', 'rate_of_change', 'cum_rate_of_change', 'anomaly_detection', 'overall'])"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"profile = fProfileQC('dPIRX003.cnv')\n",
"profile.flags['TEMP'].keys()"
]
},
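To see how the three configurations differ, the test names printed in the outputs above can be compared as sets. The lists below are copied by hand from those outputs; the variable names are chosen here for illustration, and nothing in this sketch calls seabird or CoTeDe.

```python
# Test names as printed by profile.flags['TEMP'].keys() for each cfg above.
gtspp = {
    "valid_datetime", "location_at_sea", "global_range", "profile_envelop",
    "gradient", "spike", "woa_normbias", "overall",
}
eurogoos = {
    "valid_datetime", "location_at_sea", "global_range",
    "gradient_depthconditional", "spike_depthconditional",
    "digit_roll_over", "woa_normbias", "overall",
}
cotede = gtspp | eurogoos | {
    "stuck_value", "tukey53H_norm", "cars_normbias", "rate_of_change",
    "cum_rate_of_change", "anomaly_detection",
}

# Tests shared by GTSPP and EuroGOOS, and tests specific to each one.
common = gtspp & eurogoos
only_gtspp = gtspp - eurogoos
only_eurogoos = eurogoos - gtspp

print("Common:", sorted(common))
print("Only GTSPP:", sorted(only_gtspp))
print("Only EuroGOOS:", sorted(only_eurogoos))
print("Only in the default:", sorted(cotede - gtspp - eurogoos))
```

For this profile, the default configuration reports every test that the other two report, plus six more.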
{
"cell_type": "code",
"execution_count": null,
