Commit
Parallelisation framework updated (#84)
* Update docs with variables and units

* Fix warning

* Update build

* Update README

* Update build

* Update build

* Update build

* Use repurpose parallel framework

* Use new repurpose parallel framework based on joblib

Improved logging for metadata collection

* Enable daily builds

* Fix docs typos

* Fix pandas deprecation warning from parse_dates

* Add repurpose dependency

* Add method to compute sensor data coverage to Sensor class

* Update readme

* Update CI build
wpreimes committed May 6, 2024
1 parent ee244e2 commit 89b13e4
Showing 11 changed files with 198 additions and 122 deletions.
14 changes: 9 additions & 5 deletions .github/workflows/build.yml
Original file line number Diff line number Diff line change
Expand Up @@ -10,8 +10,8 @@ on:
push:
pull_request:
workflow_dispatch:
#schedule:
# - cron: '0 0 * * *' # daily
schedule:
- cron: '0 0 * * *' # nightly build

jobs:
build:
Expand Down Expand Up @@ -85,9 +85,9 @@ jobs:
fi
ls .artifacts/dist
- name: Upload Artifacts
uses: actions/upload-artifact@v2
uses: actions/upload-artifact@v4
with:
name: Artifacts
name: Artifacts-py${{ matrix.python-version }}-${{ matrix.os }}
path: .artifacts/*
coveralls:
name: Submit Coveralls 👚
Expand All @@ -111,7 +111,11 @@ jobs:
echo "GITHUB_REF = $GITHUB_REF"
echo "GITHUB_REPOSITORY = $GITHUB_REPOSITORY"
- name: Download Artifacts
uses: actions/download-artifact@v2
uses: actions/download-artifact@v4
with:
path: Artifacts
pattern: Artifacts-*
merge-multiple: true
- name: Display downloaded files
run: ls -aR
- name: Upload to PyPI
Expand Down
8 changes: 8 additions & 0 deletions CHANGELOG.rst
Expand Up @@ -5,6 +5,14 @@ Changelog
Unreleased changes in master branch
===================================

-

Version 1.4.1
=============

- Fixed a bug where some ISMN files could not be parsed correctly due to empty spaces in the sensor name (`Issue #81 <https://github.com/TUW-GEO/ismn/issues/81>`_)
- Parallel metadata collection now uses the repurpose package wrapper around joblib
- Logging was improved for metadata collection

Version 1.4.0
=============
Expand Down
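The changelog entry above says that parallel metadata collection now goes through the repurpose wrapper around joblib. The wrapper's own interface is not shown in this commit, so the following is only a minimal sketch of the underlying joblib pattern. The function `collect_metadata` and the file names are hypothetical placeholders, not part of ismn or repurpose.

```python
from joblib import Parallel, delayed


def collect_metadata(path):
    # Hypothetical stand-in for per-file metadata parsing.
    return {"file": path, "name_length": len(path)}


# Made-up file names, for illustration only.
paths = ["network_a/station_1.stm", "network_a/station_2.stm", "network_b/station_3.stm"]

# Threads keep this sketch lightweight; a real workload may prefer processes.
results = Parallel(n_jobs=2, prefer="threads")(
    delayed(collect_metadata)(p) for p in paths
)

for entry in results:
    print(entry)
```

joblib returns results in the order of the submitted tasks, so `results` lines up with `paths`.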
6 changes: 3 additions & 3 deletions docs/examples/interface.ipynb
Expand Up @@ -6,7 +6,7 @@
"source": [
"# Reading ISMN data (`ismn.interface.ISMN_Interface`)\n",
"This example shows the basic functionality to read data downloaded from the International Soil Moisture Network (ISMN).\n",
"The data can be selected and downloaded for free from <http://ismn.earth> after registration."
"Data for your study area can be selected and downloaded for free from <http://ismn.earth> after registration."
]
},
{
Expand All @@ -17,7 +17,7 @@
"\n",
"<img src=\"ismn.png\" style=\"height: 450px;\"/>\n",
"\n",
"ISMN files are downloaded as a compressed `.zip` file after selecting the data from the website. You can extract it (with any zip software) locally into one (root) folder (in this case 'Data_separate_files'). The will be organised like this:\n",
"ISMN files are downloaded as a compressed `.zip` file after selecting the data from the website. You can extract it (with any zip software) locally into one (root) folder (in this case 'Data_separate_files'). The archive will be organised like this:\n",
"```shell\n",
"Data_separate_files/\n",
"├── network/\n",
Expand Down Expand Up @@ -2009,4 +2009,4 @@
},
"nbformat": 4,
"nbformat_minor": 4
}
}
1 change: 1 addition & 0 deletions environment.yml
Expand Up @@ -16,6 +16,7 @@ dependencies:
- tqdm
- click
- more_itertools
- repurpose
- sphinx
- nbsphinx
- sphinx_rtd_theme
Expand Down
1 change: 1 addition & 0 deletions setup.cfg
Expand Up @@ -38,6 +38,7 @@ install_requires =
configparser
more_itertools
tqdm
repurpose
# The usage of test_requires is discouraged, see `Dependency Management` docs
# tests_require = pytest; pytest-cov
# Require a specific Python version, e.g. Python 2.7 or >= 3.4
Expand Down
70 changes: 54 additions & 16 deletions src/ismn/components.py
Expand Up @@ -20,28 +20,20 @@
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.
import os.path

import sys
from pygeogrids import BasicGrid
from typing import Union

import numpy as np
import warnings
import logging
from collections import OrderedDict
import pandas as pd

from ismn.meta import MetaData, Depth
from ismn.const import deprecated, CITATIONS
from ismn.const import deprecated, CITATIONS, ismnlog

import json

logger = logging.getLogger(__name__)

ch = logging.StreamHandler()
ch.setLevel(logging.INFO)
formatter = logging.Formatter("%(levelname)s - %(asctime)s: %(message)s")
ch.setFormatter(formatter)
logger.addHandler(ch)


class IsmnComponent:
pass
Expand Down Expand Up @@ -121,6 +113,52 @@ def metadata(self) -> MetaData:
def data(self):
return self.read_data()

def get_coverage(self, only_good=True, start=None, end=None,
freq='1h'):
"""
Estimate the temporal coverage of this sensor, i.e. the percentage
of valid observations in the sensor time series.
Parameters
----------
only_good: bool, optional (default: True)
Only consider values where the ISMN quality flag is 'G'
as valid observations
start: str or datetime, optional (default: None)
Beginning of the period in which measurements are expected.
If None, the start of the time series is used.
end: str or datetime, optional (default: None)
End of the period in which measurements are expected.
If None, the end of the time series is used.
freq: str, optional (default: '1h')
Frequency at which the sensor is expected to take measurements.
Most sensors in ISMN provide hourly measurements (default).
If a different frequency is used, it must be one that
:func:`pd.date_range` can interpret.
Returns
-------
perc_coverage : float
Data coverage of the sensor at the chosen expected measurement
frequency within the chosen period. 0=No data, 100=no data gaps
"""
data = self.read_data()
if start is None:
start = pd.Timestamp(data.index.values[0]).to_pydatetime()
else:
start = pd.to_datetime(start)
if end is None:
end = pd.Timestamp(data.index.values[-1]).to_pydatetime()
else:
end = pd.to_datetime(end)

if only_good:
data = data[data[f"{self.variable}_flag"] == 'G'].loc[:, self.variable]

cov = (len(data.values) / len(pd.date_range(start, end, freq=freq))) * 100

return cov

def read_data(self):
"""
Load data from filehandler for this Sensor by calling
Expand All @@ -133,7 +171,7 @@ def read_data(self):
(if it was loaded and kept before).
"""
if self.filehandler is None:
logging.warning(f"No filehandler found for sensor {self.name}")
ismnlog.warning(f"No filehandler found for sensor {self.name}")
else:
if self._data is None:
data = self.filehandler.read_data()
Expand Down Expand Up @@ -413,7 +451,7 @@ def add_sensor(
keep_loaded_data,
)
else:
logger.warning(f"Sensor already exists: {name}")
ismnlog.warning(f"Sensor already exists: {name}")

def remove_sensor(self, name):
"""
Expand All @@ -427,7 +465,7 @@ def remove_sensor(self, name):
if name in self.sensors:
del self.sensors[name]
else:
logger.warning(f"Sensor not found: {name}")
ismnlog.warning(f"Sensor not found: {name}")

def iter_sensors(self, **filter_kwargs):
"""
Expand Down Expand Up @@ -564,7 +602,7 @@ def add_station(self, name, lon, lat, elev):
if name not in self.stations:
self.stations[name] = Station(name, lon, lat, elev)
else:
logger.warning(f"Station already exists: {name}")
ismnlog.warning(f"Station already exists: {name}")

def remove_station(self, name):
"""
Expand All @@ -578,7 +616,7 @@ def remove_station(self, name):
if name in self.stations:
del self.stations[name]
else:
logger.warning(f"Station not found {name}")
ismnlog.warning(f"Station not found {name}")

def iter_stations(self, **filter_kwargs):
"""
Expand Down
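The coverage calculation that `get_coverage` adds to the `Sensor` class in this file can be illustrated on a plain pandas series. This is a sketch under stated assumptions, not the committed code: `coverage_percent` is a hypothetical helper, the data are synthetic, and observations are clipped to the chosen period before counting, whereas the committed method counts all rows of the sensor data.

```python
import pandas as pd


def coverage_percent(series, start=None, end=None, freq="1h"):
    """Percentage of expected time stamps in [start, end] that hold a value."""
    series = series.dropna()
    start = series.index[0] if start is None else pd.to_datetime(start)
    end = series.index[-1] if end is None else pd.to_datetime(end)
    # Time stamps at which a measurement is expected.
    expected = pd.date_range(start, end, freq=freq)
    # Assumption of this sketch: only count observations inside the period.
    observed = series.loc[start:end]
    return 100 * len(observed) / len(expected)


# Synthetic hourly series with two missing hours.
index = pd.date_range("2020-01-01 00:00", "2020-01-01 09:00", freq="1h")
values = pd.Series(range(10), index=index, dtype=float)
values = values.drop(index[[2, 5]])

print(coverage_percent(values))
print(coverage_percent(values, end="2020-01-01 19:00"))
```

Extending `end` beyond the last observation lowers the coverage, because more time stamps are expected while the number of observations stays the same.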
11 changes: 9 additions & 2 deletions src/ismn/const.py
Expand Up @@ -20,13 +20,21 @@
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.

import numpy as np
import sys
from collections import OrderedDict
import functools
import warnings
import os
import pandas as pd
import logging
import numpy as np

ismnlog = logging.getLogger('ismn')
ch = logging.StreamHandler()
ch.setLevel(logging.INFO)
formatter = logging.Formatter("%(levelname)s - %(asctime)s: %(message)s")
ch.setFormatter(formatter)
ismnlog.addHandler(ch)

def deprecated(func):
# mark func as deprecated (warn when used)
Expand Down Expand Up @@ -59,7 +67,6 @@ class IsmnFileError(IOError):
class DepthError(ValueError):
pass


# Note: At the moment citations are stored in this package, keep them updated.
# Once the full list of citations is provided together with the downloaded
# ISMN data (in the README file), citations here can be deleted (keep in mind
Expand Down
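The change to src/ismn/const.py above replaces the per-module logger with a single named logger (`ismnlog`) that other modules import. A minimal sketch of that pattern follows. The logger name `demo_pkg` and the helper `get_package_logger` are hypothetical, and the guard against duplicate handlers is an addition of this sketch that the commit does not contain.

```python
import logging

# Hypothetical logger name for this sketch; the commit uses 'ismn'.
PACKAGE_LOGGER = "demo_pkg"


def get_package_logger(name=PACKAGE_LOGGER):
    """Return the shared package logger, attaching its handler only once."""
    logger = logging.getLogger(name)
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setLevel(logging.INFO)
        handler.setFormatter(
            logging.Formatter("%(levelname)s - %(asctime)s: %(message)s")
        )
        logger.addHandler(handler)
    return logger


# Two call sites (for example two modules) receive the same logger object.
log_a = get_package_logger()
log_b = get_package_logger()
log_a.warning("Sensor not found: %s", "example_sensor")
```

Because `logging.getLogger` returns the same object for the same name, every module that imports the shared logger writes through one handler with one format.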