Bugs in technical Paper Example Notebooks #39

Open
3 tasks
vhoogelander opened this issue Aug 16, 2023 · 6 comments

Comments

@vhoogelander

I checked the Example Notebooks, and found some bugs:

  • 1. The main problem is still the issue with generating forcing using ESMValTool (see this issue). Something seems to be wrong with the temperature data. This is the log that I get from example NB1:

        ERROR   [3482181] Failed to run preprocessor function 'save' on the data
        [<iris 'Cube' of air_temperature / (K) (time: 10592; latitude: 9; longitude: 7)>]
        loaded from original input file(s)
        [LocalFile('/mnt/data/climate-data/obs6/Tier3/ERA5/OBS6_ERA5_reanaly_1_day_tas_1990-1990.nc'),
         LocalFile('/mnt/data/climate-data/obs6/Tier3/ERA5/OBS6_ERA5_reanaly_1_day_tas_19900101-19901231.nc'),
         LocalFile('/mnt/data/climate-data/obs6/Tier3/ERA5/OBS6_ERA5_reanaly_1_day_tas_1991-1991.nc'),
         LocalFile('/mnt/data/climate-data/obs6/Tier3/ERA5/OBS6_ERA5_reanaly_1_day_tas_1992-1992.nc')]
        (and 26 further file(s) not shown here; refer to the debug log for a full list)
        with function argument(s)
        compress = False,
        filename = PosixPath('/home/vhoogeland/technicalPaperExampleNotebooks/esmvaltool_output/recipe_marrmot_20230728_122744/preproc/diagnostic_daily/tas/OBS6_ERA5_reanaly_1_day_tas_1990-2018.nc')
        ERROR   [3477466] No such comm target registered: jupyter.widget.control
        WARNING [3477466] No such comm: fc0ceb52-d861-433e-9279-70a996fbfd43
    
  • 2. When I run this in NB2:

        cfg_file, cfg_dir = model.setup(end_time=experiment_end_date)
    

And this in NB3:

      observations_df, metadata = ewatercycle.observation.grdc.get_grdc_data(
          station_id,
          start_time=experiment_start_date,
          end_time=experiment_end_date,
      )

In both cases I get an [Errno 5] Input/output error. (Is this related to disk space?)

  • 3. In NB4, when I run this:

      reference = ewatercycle.models.PCRGlobWB(version="setters", parameter_set=experiment_parameterset)
    

I get the following error: NoSectionError: No section: 'globalOptions'.

@Peter9192
Contributor

Hi @vhoogelander thanks for opening this issue. Can you include the full error messages and provide details about the machine on which you're running these notebooks?

@vhoogelander
Author

vhoogelander commented Aug 16, 2023

Hi Peter, I am running the notebooks on vhoogeland2@host-192-168-0-55 (this is the machine name, right?).
For the first problem, I don't get any error in the NB itself; the cell just keeps running.
These are the full error messages of the 2nd problem:
NB2:

  Error                                     Traceback (most recent call last)
  Cell In[8], line 1
  ----> 1 cfg_file, cfg_dir = model.setup(end_time=experiment_end_date)
        2 print(cfg_file)
        3 print(cfg_dir)
  
  File /opt/conda/envs/ewatercycle/lib/python3.10/site-packages/ewatercycle/models/wflow.py:113, in Wflow.setup(self, cfg_dir, **kwargs)
      102 def setup(self, cfg_dir: Optional[str] = None, **kwargs) -> Tuple[str, str]:  # type: ignore
      103     """Start the model inside a container and return a valid config file.
      104 
      105     Args:
     (...)
      111         Path to config file and working directory
      112     """
  --> 113     self._setup_working_directory(cfg_dir)
      114     cfg = self.config
      116     if "start_time" in kwargs:

  File /opt/conda/envs/ewatercycle/lib/python3.10/site-packages/ewatercycle/models/wflow.py:160, in Wflow._setup_working_directory(self, cfg_dir)
      157 self.work_dir.parent.mkdir(parents=True, exist_ok=True)
      159 assert self.parameter_set
  --> 160 shutil.copytree(src=self.parameter_set.directory, dst=self.work_dir)
      161 if self.forcing:
      162     forcing_path = to_absolute_path(
      163         self.forcing.netcdfinput, parent=self.forcing.directory
      164     )
  
  File /opt/conda/envs/ewatercycle/lib/python3.10/shutil.py:556, in copytree(src, dst, symlinks, ignore, copy_function, ignore_dangling_symlinks, dirs_exist_ok)
      554 with os.scandir(src) as itr:
      555     entries = list(itr)
  --> 556 return _copytree(entries=entries, src=src, dst=dst, symlinks=symlinks,
      557                  ignore=ignore, copy_function=copy_function,
      558                  ignore_dangling_symlinks=ignore_dangling_symlinks,
      559                  dirs_exist_ok=dirs_exist_ok)

File /opt/conda/envs/ewatercycle/lib/python3.10/shutil.py:512, in _copytree(entries, src, dst, symlinks, ignore, copy_function, ignore_dangling_symlinks, dirs_exist_ok)
    510         errors.append((src, dst, str(why)))
    511 if errors:
--> 512     raise Error(errors)
    513 return dst
Error: [('/mnt/data/parameter-sets/wflow_merrimack_techpaper/inmaps/wflow_ERA5_Merrimack_2001_2016.nc', '/home/vhoogeland2/technicalPaperExampleNotebooks/ewatercycle_output/wflow_20230809_130831/inmaps/wflow_ERA5_Merrimack_2001_2016.nc', "[Errno 5] Input/output error: '/mnt/data/parameter-sets/wflow_merrimack_techpaper/inmaps/wflow_ERA5_Merrimack_2001_2016.nc' -> '/home/vhoogeland2/technicalPaperExampleNotebooks/ewatercycle_output/wflow_20230809_130831/inmaps/wflow_ERA5_Merrimack_2001_2016.nc'"), 
  ........ VERY LONG MESSAGE ........,
  '/home/vhoogeland2/technicalPaperExampleNotebooks/ewatercycle_output/wflow_20230809_130831/staticmaps/wflow_uparea.map', '[Errno 5] Input/output error')]
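
Because `shutil.copytree` collects every per-file failure into a single `shutil.Error`, the message above becomes very long. A minimal sketch, assuming the exception carries the usual list of `(src, dst, reason)` tuples in `args[0]`, groups the failures by reason so it is easier to see whether every file failed with `[Errno 5]`. The helper name `summarize_copy_errors` is hypothetical and not part of ewatercycle.

```python
import shutil
from collections import Counter


def summarize_copy_errors(err: shutil.Error) -> Counter:
    """Count copytree failures per reason (hypothetical helper)."""
    counts = Counter()
    for _src, _dst, why in err.args[0]:
        # Keep only the part before ": 'src' -> 'dst'",
        # e.g. "[Errno 5] Input/output error".
        counts[str(why).split(": ", 1)[0]] += 1
    return counts


# Hand-built error for illustration; in the notebook one would instead wrap
# model.setup(...) in try/except shutil.Error and pass the caught exception.
example = shutil.Error([
    ("/src/a.nc", "/dst/a.nc",
     "[Errno 5] Input/output error: '/src/a.nc' -> '/dst/a.nc'"),
    ("/src/b.map", "/dst/b.map", "[Errno 5] Input/output error"),
])
print(summarize_copy_errors(example))
```

If all entries share one reason, the problem is more likely with the storage itself than with individual files.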

NB3:

OSError                                   Traceback (most recent call last)
Cell In[6], line 1
----> 1 observations_df, metadata = ewatercycle.observation.grdc.get_grdc_data(
      2     station_id,
      3     start_time=experiment_start_date,
      4     end_time=experiment_end_date,
      5 )
      6 grdc_obs = observations_df.rename(columns={"streamflow": "Observations from GRDC"})
      7 grdc_lon = metadata["grdc_longitude_in_arc_degree"]

File /opt/conda/envs/ewatercycle/lib/python3.10/site-packages/ewatercycle/observation/grdc.py:107, in get_grdc_data(station_id, start_time, end_time, parameter, data_home, column)
    104     raise ValueError(f"The grdc file {raw_file} does not exist!")
    106 # Convert the raw data to an xarray
--> 107 metadata, df = _grdc_read(
    108     raw_file,
    109     start=get_time(start_time).date(),
    110     end=get_time(end_time).date(),
    111     column=column,
    112 )
    114 # Add start/end_time to metadata
    115 metadata["UserStartTime"] = start_time

File /opt/conda/envs/ewatercycle/lib/python3.10/site-packages/ewatercycle/observation/grdc.py:129, in _grdc_read(grdc_station_path, start, end, column)
    127 def _grdc_read(grdc_station_path, start, end, column):
    128     with grdc_station_path.open("r", encoding="cp1252", errors="ignore") as file:
--> 129         data = file.read()
    131     metadata = _grdc_metadata_reader(grdc_station_path, data)
    133     all_lines = data.split("\n")

OSError: [Errno 5] Input/output error

And the 3rd problem:

NoSectionError                            Traceback (most recent call last)
Cell In[8], line 1
----> 1 reference = ewatercycle.models.PCRGlobWB(version="setters", parameter_set=experiment_parameterset)
      3 reference_config, reference_dir = reference.setup(
      4     start_date = experiment_start_date, 
      5     end_date = experiment_end_date)
      7 print(reference_config, reference_dir)

File /opt/conda/envs/ewatercycle/lib/python3.10/site-packages/ewatercycle/models/pcrglobwb.py:47, in PCRGlobWB.__init__(self, version, parameter_set, forcing)
     45 super().__init__(version, parameter_set, forcing)
     46 self._set_docker_image()
---> 47 self._setup_default_config()

File /opt/conda/envs/ewatercycle/lib/python3.10/site-packages/ewatercycle/models/pcrglobwb.py:81, in PCRGlobWB._setup_default_config(self)
     79 cfg = CaseConfigParser()
     80 cfg.read(config_file)
---> 81 cfg.set("globalOptions", "inputDir", str(input_dir))
     82 if self.forcing:
     83     cfg.set(
     84         "globalOptions",
     85         "startTime",
     86         get_time(self.forcing.start_time).strftime("%Y-%m-%d"),
     87     )

File /opt/conda/envs/ewatercycle/lib/python3.10/configparser.py:1205, in ConfigParser.set(self, section, option, value)
   1202 """Set an option.  Extends RawConfigParser.set by validating type and
   1203 interpolation syntax on the value."""
   1204 self._validate_value_types(option=option, value=value)
-> 1205 super().set(section, option, value)

File /opt/conda/envs/ewatercycle/lib/python3.10/configparser.py:903, in RawConfigParser.set(self, section, option, value)
    901         sectdict = self._sections[section]
    902     except KeyError:
--> 903         raise NoSectionError(section) from None
    904 sectdict[self.optionxform(option)] = value

NoSectionError: No section: 'globalOptions'

@Peter9192
Contributor

Hi @vhoogelander, actually I meant whether it's a research cloud machine. I'm guessing it's this one, right? https://ewatercyclestud.ewatercycle-tud.src.surf-hosted.nl

Yes, on that machine it looks like the /home volume is full. That might explain the problem with NB2. NB3 looks different though, it's just reading, not copying.
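
A quick way to check free space from a notebook or Python terminal is the standard library's `shutil.disk_usage`. This is only a sketch: the helper name `free_space_gib` is hypothetical, and the two paths are taken from the tracebacks in this thread. Note that a full disk more commonly surfaces as `[Errno 28] No space left on device` than as `[Errno 5]`, so the result is one piece of evidence rather than a diagnosis.

```python
import shutil
from pathlib import Path


def free_space_gib(path: Path) -> float:
    """Return the free space, in GiB, of the volume that holds `path`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return usage.free / 1024**3


# Paths as they appear in this thread; skip any that do not exist locally.
for candidate in (Path.home(), Path("/mnt/data")):
    if candidate.exists():
        print(f"{candidate}: {free_space_gib(candidate):.1f} GiB free")
```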

Also it would be helpful if you could refer to the names of each of the notebooks (and where you got them from). Now I cannot really figure out which ones you have been running. It would be even better if you could reduce the problem to a minimal example and copy/paste the code here so we can reproduce it easily.

For now, I'll come back with a quick response.

  • Problem 1: I think that should be fixed when you start a new machine. To fix it on a running machine an admin might need to pin packages as described in change in ESMValTool breaks eWaterCycle forcing ewatercycle#355
  • Problem 2: I guess it has to do with your home disk being full
  • Problem 3: I tried (in a new python terminal on that machine I mentioned above):

      import ewatercycle.observation.grdc
      grdc_station_id = "6335020"

      observations, metadata = ewatercycle.observation.grdc.get_grdc_data(
          station_id=grdc_station_id,
          start_time="1990-01-01T00:00:00Z",  # or: model_instance.start_time_as_isostr
          end_time="1990-12-15T00:00:00Z",
          column="GRDC",
      )

      observations.head()

that worked without problems.
Can you be more specific about what notebook/station ID etc you were using?

  • Problem 4: I'm not sure which dataset you're trying to load there. If I look at e.g. cat /mnt/data/parameter-sets/pcrglobwb_rhinemeuse_30min/setup_natural_test.ini it does seem to contain that section.

As you see, it would be helpful if you could be more specific about the issues you encountered.

On a side note: I did notice that the link to the example notebooks on the terria landing page is outdated. It currently points to link, but that no longer exists. We might need to pin it to a release or bring back the example notebooks in some other way. I'll open a new issue about that.

@vhoogelander
Author

Hi @Peter9192, Thank you for your comment. I am referring to the technical paper notebooks (Case1_Marrmot_Merrimack..., Case2_wflow_LISFlood..., Case3_CoupleMarrmotAndPCRGlobWB and Case4_ForcePCRGlob) which I got via the terria landing page more than a year ago.

  1. If I restart my server, the problem remains. Or is this not what you mean by starting a new machine? If not, how can I do this? (maybe a stupid question)

  2. I tried to clean up my own folder a bit, but the error of problem 2 remains. What is the maximum disk space of my home directory?

  3. I re-ran the cell of NB3, and for some reason I no longer get an error here. I was using the same station ID (6335020), so I am not really sure what the problem was, but it seems to be fixed now ;).

  4. I am loading this parameter set:
    name=pcrglobwb_merrimack_05min
    directory=/mnt/data/parameter-sets/pcrglobwb_global
    config=/mnt/data/parameter-sets/pcrglobwb_global/merrimack_05min_era5.ini
    I think this was the original dataset used in the Example Notebook, but I am not 100% sure.

@Peter9192
Contributor

It's not the Jupyter server, it's the SURF Research Cloud machine (https://portal.live.surfresearchcloud.nl/) that should be updated (or replaced with a new one). @RolfHut knows how to do this.

The parameter set does have a globalOptions section, but I got a similar input/output error when I first tried to open it. It seems the disks were even fuller today than yesterday. The /home disk is 250GB in total, shared by all users on that machine. I won't go into details here, but it looks like a few heavy users are taking up most of the available disk space.
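
This combination is consistent with how the standard library behaves: `ConfigParser.read()` silently skips files it cannot open or read and returns the list of files it did parse, so an I/O error can leave the parser empty and only show up later as `NoSectionError`. The sketch below assumes that ewatercycle's `CaseConfigParser` inherits `read()` from `ConfigParser`; the helper `read_config_strict` and the path are hypothetical.

```python
import configparser
from pathlib import Path


def read_config_strict(path: Path) -> configparser.ConfigParser:
    """Parse `path`, raising OSError instead of returning an empty parser."""
    parser = configparser.ConfigParser()
    # read_file() takes an open file object, so a failing open() or read
    # (including [Errno 5]) raises here instead of being swallowed.
    with path.open("r") as handle:
        parser.read_file(handle)
    return parser


# read() on an unreadable file does not raise; it just reports nothing was read.
parser = configparser.ConfigParser()
files_read = parser.read("/nonexistent/merrimack.ini")  # hypothetical path
print(files_read, parser.sections())  # prints: [] []
```

Checking the return value of `read()`, or using `read_file()` as above, would make the underlying I/O error visible at the point where it happens.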

@sverhoeven
Member

I noticed that the dCache server that gives us files in /mnt/data was having hiccups and timeouts. This could cause weird file reading behavior.
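
If the errors are transient, retrying the read after a short pause may work around them. The following is a minimal sketch, not something ewatercycle does: the helper name `read_text_with_retry` and its defaults are assumptions, and the `cp1252` encoding simply mirrors the `_grdc_read` call in the traceback above.

```python
import errno
import time
from pathlib import Path


def read_text_with_retry(path: Path, attempts: int = 3, delay: float = 2.0,
                         encoding: str = "cp1252") -> str:
    """Read a text file, retrying only on [Errno 5] Input/output error."""
    for attempt in range(1, attempts + 1):
        try:
            return path.read_text(encoding=encoding, errors="ignore")
        except OSError as exc:
            # Retry transient I/O errors; re-raise any other error,
            # and the last failure once the attempts are used up.
            if exc.errno != errno.EIO or attempt == attempts:
                raise
            time.sleep(delay * attempt)  # simple linear back-off
    raise ValueError("attempts must be at least 1")
```

Errors other than `EIO`, such as a missing file, are raised immediately so that real configuration mistakes are not hidden by the retry loop.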

@BSchilperoort BSchilperoort transferred this issue from eWaterCycle/ewatercycle Mar 29, 2024