Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

VIC code reorganization #7

Closed
bartnijssen opened this issue Jul 3, 2013 · 12 comments
Closed

VIC code reorganization #7

bartnijssen opened this issue Jul 3, 2013 · 12 comments
Assignees
Milestone

Comments

@bartnijssen
Copy link
Member

Proposal for VIC source code reorganization

What and why?

I propose a reorganization of the VIC code, that would allow multiple VIC drivers to use a single science core, so that science-based changes to the code can easily be shared amongst difference VIC configurations.

VIC is currently implemented in a number of different configurations, both offline and coupled. For example:

In addition, there are likely many project-specific configurations of VIC in use by other teams.

The challenge with the current implementation is that the computational or scientific core of VIC is not cleanly separated from the driver. The effect of this is that once a VIC version is implemented within a certain configuration, it is difficult to keep that configuration updated with the latest science-based changes that are made to VIC. In other words, after porting VIC version X so that it works in anything other than the default configuration ( VIC classic ), it is difficult to accommodate changes made as part of the VIC versions X.1, X.2, etc.

The goal of the proposed reorganization is limited to cleanly separating the VIC computational / scientific core from the driver. The driver determines whether the model is run in uncoupled, coupled, time-first, or space-first mode. This will allow all configurations to benefit from updates to VIC's core. Note that changes to VIC's core may require additional changes to the driver (for example, if new parameters need to be read).

The proposed reorganization includes changes in the organization of the VIC source code tree and in changes in the source code itself. Note that these changes should not affect the results. That is, VIC classic should produce the same results before and after the reorganization. Similarly, when using the same model parameters and driven with the same meteorological forcings, all drivers should produce the same results. This will allow us to put some rigorous tests in place to ensure that the reorganization does not introduce unwanted effects and to ensure that the results are correct if someone creates a new driver.

The reorganization will allow us to add additional drivers to the VIC code repository, which would allow VIC to be operated in time-first or space-first mode, uncoupled or coupled.

The following are some preliminary ideas (feedback requested) for how to implement this reorganization.

Driver and VIC core separation

Each driver must call at a minimum the following functions:

  • vic_alloc() -- Allocate memory for VIC structures
  • vic_init() -- Initialize model parameters
  • vic_restore() -- Restore model state
  • vic_run() -- Run VIC for a single grid cell, for a single time step.
  • vic_write() -- Write VIC model output for a single timestep (either for an entire domain or for a single grid cell).
  • vic_save() -- Save a VIC model state.
  • vic_finalize() -- Final cleanup.

In essence, the most important call will be vic_run(), which will run VIC for a single timestep and for a single grid cell. All the code that is invoked as part of the vic_run() call is by definition part of VIC core and will be the same regardless of the each driver. The arguments to the vic_run() call will include the complete set of meteorological forcings for that particular time step, that is, any manipulation of meteorological forcings will be done in the driver. The memory allocated in vic_alloc() will need to persist in the driver between successive calls to vic_run() until vic_finalize() is called.

The other vic_ functions, as well as any other functionality will be implemented as part of each driver and will consequently vary between VIC implementations, even though there may be a significant amount of overlap between drivers. Note that the drivers will need to implement additional functionality, for example, how to deal with meteorological forcings, how to deal with time varying model parameters that are read from file, and how to deal with changes to a model state as part of a data assimilation scheme.

Because only the code that is invoked by vic_run() is part of VIC's core, it could be argued that only the call to vic_run() should be required by each driver. However, requiring calls to the other vic_ functions as well, will likely promote code reuse among drivers and facilitate the implementation of new drivers. For example, two different drivers that both operate in image mode and write NetCDF output, may be able to use the same vic_write() and vic_save() functions and parts of the same vic_init() and vic_finalize() functions.

In short, pseudo-code for VIC classic would look something like:

foreach gridcell:
    vic_alloc()
    vic_init()
    vic_restore()
    foreach timestep:
        vic_run()
        if output:
            vic_write()
        if save:
            vic_save()
    vic_finalize()

Pseudo-code for an image model implementation would look something like:

vic_alloc()
vic_init()
vic_restore()
foreach timestep:
    foreach gridcell:
        vic_run()
    if output:
        vic_write()
    if save:
        vic_save()
vic_finalize()

In both cases there would be a large amount of additional code between consecutive vic_ calls. This code would be specific to each driver. For example, before each call to vic_run(), atmospheric forcings would need to be updated, time-varying model parameters may need to be updated, and the model state may need to be updated in a data assimilation scheme. Whether predefined names should be used for each of these steps is up for discussion (as is the rest of this document).

NVIC drivers may need to be implemented in Fortran to interact with other model components (for example as part of RASM). In that case it is important that the vic_ functions are callable from Fortran, in particular the vic_alloc(), vic_run(), and vic_finalize() functions. Alternatively, these functions may need to be wrapped in another set of functions that are callable from Fortran. I am not sure yet, how this should be implemented and how driver dependent this will be and how this will affect the above functions.

Source tree organization

In the current version of VIC, the source code is all located in a single directory. I propose that we separate the code into directory trees that refect the code separation outlined above. For example:

|-drivers
|---classic
|-----include
|-----src
|---lis
|-----include
|-----src
|---rasm
|-----include
|-----src
|---test
|-----include
|-----src
|-vic_run
|---include
|---src

For all directories I propose that we cleanly split the header files from the source code (hence include and src directories in each of the subdirectories). All the core vic code would go in the vic_run directories, while all the other code would be specific to each driver. I am thinking that there must be a better way to ensure that code is re-used, so am looking for suggestions. For example, should the mtclim code go in to the driver/classic directory or somewhere else? How can we set it up so that read and write functions can be shared more easily. Alternatively, just keep it split up as above and allow for duplication, since each driver will be tailored to a specific environment and/or application.

As said, this is a proposal and needs work: Let the feedback begin. I am planning to start making changes along these lines on a branch in the next week or two.

@tbohn
Copy link
Contributor

tbohn commented Jul 3, 2013

At first glance it looks great! I'll have to think about it a bit to give
more detailed comments.

On Wed, Jul 3, 2013 at 1:54 PM, bartnijssen notifications@github.comwrote:

Proposal for VIC source code reorganization What and why?

I propose a reorganization of the VIC code, that would allow multiple VIC
drivers to use a single science core, so that science-based changes to the
code can easily be shared amongst difference VIC configurations.

VIC is currently implemented in a number of different configurations, both
offline and coupled. For example:

In addition, there are likely many project-specific configurations of VIC
in use by other teams.

The challenge with the current implementation is that the computational or
scientific core of VIC is not cleanly separated from the driver. The effect
of this is that once a VIC version is implemented within a certain
configuration, it is difficult to keep that configuration updated with the
latest science-based changes that are made to VIC. In other words, after
porting VIC version X so that it works in anything other than the default
configuration ( VIC classic ), it is difficult to accommodate changes
made as part of the VIC versions X.1, X.2, etc.

The goal of the proposed reorganization is limited to cleanly separating
the VIC computational / scientific core from the driver. The driver
determines whether the model is run in uncoupled, coupled, time-first, or
space-first mode. This will allow all configurations to benefit from
updates to VIC's core. Note that changes to VIC's core may require
additional changes to the driver (for example, if new parameters need to be
read).

The proposed reorganization includes changes in the organization of the
VIC source code tree and in changes in the source code itself. Note that
these changes should not affect the results. That is, _VIC classic_should produce the same results before and after the reorganization.
Similarly, when using the same model parameters and driven with the same
meteorological forcings, all drivers should produce the same results. This
will allow us to put some rigorous tests in place to ensure that the
reorganization does not introduce unwanted effects and to ensure that the
results are correct if someone creates a new driver.

The reorganization will allow us to add additional drivers to the VIC code
repository, which would allow VIC to be operated in time-first or
space-first mode, uncoupled or coupled.

The following are some preliminary ideas (feedback requested) for how to
implement this reorganization.
Driver and VIC core separation

Each driver must call at a minimum the following functions:

  • vic_alloc() -- Allocate memory for VIC structures
  • vic_init() -- Initialize model parameters
  • vic_restore() -- Restore model state
  • vic_run() -- Run VIC for a single grid cell, for a single time step.
  • vic_write() -- Write VIC model output for a single timestep (either
    for an entire domain or for a single grid cell).
  • vic_save() -- Save a VIC model state.
  • vic_finalize() -- Final cleanup.

In essence, the most important call will be vic_run(), which will run VIC
for a single timestep and for a single grid cell. All the code that is
invoked as part of the vic_run() call is by definition part of VIC core
and will be the same regardless of the each driver. The arguments to the
vic_run() call will include the complete set of meteorological forcings
for that particular time step, that is, any manipulation of meteorological
forcings will be done in the driver. The memory allocated in vic_alloc()will need to persist in the driver between successive calls to
vic_run() until vic_finalize() is called.

The other vic_ functions, as well as any other functionality will be
implemented as part of each driver and will consequently vary between VIC
implementations, even though there may be a significant amount of overlap
between drivers. Note that the drivers will need to implement additional
functionality, for example, how to deal with meteorological forcings, how
to deal with time varying model parameters that are read from file, and how
to deal with changes to a model state as part of a data assimilation scheme.

Because only the code that is invoked by vic_run() is part of VIC's core,
it could be argued that only the call to vic_run() should be required by
each driver. However, requiring calls to the other vic_ functions as
well, will likely promote code reuse among drivers and facilitate the
implementation of new drivers. For example, two different drivers that both
operate in image mode and write NetCDF output, may be able to use the same
vic_write() and vic_save() functions and parts of the same vic_init() and
vic_finalize() functions.

In short, pseudo-code for VIC classic would look something like:

foreach gridcell:
vic_alloc()
vic_init()
vic_restore()
foreach timestep:
vic_run()
if output:
vic_write()
if save:
vic_save()
vic_finalize()

Pseudo-code for an image model implementation would look something like:

vic_alloc()
vic_init()
vic_restore()
foreach timestep:
foreach gridcell:
vic_run()
if output:
vic_write()
if save:
vic_save()
vic_finalize()

In both cases there would be a large amount of additional code between
consecutive vic_ calls. This code would be specific to each driver. For
example, before each call to vic_run(), atmospheric forcings would need
to be updated, time-varying model parameters may need to be updated, and
the model state may need to be updated in a data assimilation scheme.
Whether predefined names should be used for each of these steps is up for
discussion (as is the rest of this document).

NVIC drivers may need to be implemented in Fortran to interact with other
model components (for example as part of RASM). In that case it is
important that the vic_ functions are callable from Fortran, in
particular the vic_alloc(), vic_run(), and vic_finalize() functions.
Alternatively, these functions may need to be wrapped in another set of
functions that are callable from Fortran. I am not sure yet, how this
should be implemented and how driver dependent this will be and how this
will affect the above functions.
Source tree organization

In the current version of VIC, the source code is all located in a single
directory. I propose that we separate the code into directory trees that
refect the code separation outlined above. For example:

|-drivers
|---classic
|-----include
|-----src
|---lis
|-----include
|-----src
|---rasm
|-----include
|-----src
|---test
|-----include
|-----src
|-vic_run
|---include
|---src

For all directories I propose that we cleanly split the header files from
the source code (hence include and src directories in each of the
subdirectories). All the core vic code would go in the vic_rundirectories, while all the other code would be specific to each driver. I
am thinking that there must be a better way to ensure that code is re-used,
so am looking for suggestions. For example, should the mtclim code go in
to the driver/classic directory or somewhere else? How can we set it up
so that read and write functions can be shared more easily. Alternatively,
just keep it split up as above and allow for duplication, since each driver
will be tailored to a specific environment and/or application.

As said, this is a proposal and needs work: Let the feedback begin. I am
planning to start making changes along these lines on a branch in the next
week or two.


Reply to this email directly or view it on GitHubhttps://github.com//issues/7
.

@bartnijssen
Copy link
Member Author

Most drivers will probably not be part of VIC, but will be part of some other package. For example, the RASM driver for VIC will not be part of the VIC archive, but will be part of the RASM source code repo. Similarly, the LIS driver for VIC will be part of the LIS source code repo rather than VIC's.

The only drivers that will actually be part of the source code repo are those that will run VIC in standalone mode, either the current driver ( classic ) or an image mode one. I also think there should be a test one, which is a bare bones driver that people can use to test their implementation. Maybe for the standalone drivers we can implement the additional vic_ functions to standardize the code and make a cleaner separation between the different functionalities of the code.

For the drivers that are not part of the VIC source repo, things like memory allocation may be sufficiently different (for example, to function under MPI), that I am not fully convinced that it makes sense to mandate the calls to functions other than vic_run(). Many of these drivers may also need to interface with other languages such as FORTRAN. As long as they have an existing implementation to look at (i.e. one of the driver in the VIC source code repo), maybe we only specify the call to vic_run() and then have a clear description that goes along with that to indicate the memory that needs to be allocated.

@jhamman
Copy link
Member

jhamman commented Jul 15, 2013

Bart,

This is a really nice comprehensive look at the task ahead. The source tree structure makes good sense and seems to allow for easy application of new code to all run modes. I have a few questions and comments.

Is there reason to believe that mtclim will only be used in the driver-classic configuration? I can imagine a few scenarios where mtclim would be used in image mode?

Why wouldn’t we want to have a standard set of init subroutines that cover all run modes? Ideally, the initial state and parameter files could be easily shared between modes. I think I have a fairly good idea of how to structure these inputs but would need some help thinking through the different allocation needs between classic run modes.

As far as the write routines, my impression is that most applications of VIC eventually end up pulling the outputs together into a grid structure. For this reason, I think it would be worth thinking through how to facilitate this during runtime rather than as a post process. Maybe we can ignore this for the classic mode and just setup grid write capabilities for the image modes.

I think there will be a handful of smaller decisions along the way, (For example, should the vic_write() routine be called by the driver or the vic_run() routine?), but this is a really good framework to start from.

@tbohn
Copy link
Contributor

tbohn commented Jul 15, 2013

Hi all,

One reason for not running MTCLIM in image mode is that it requires the
entire timeseries of forcings from a given cell to determine whether the
cell is arid or humid, and to compute its 90-day precipitation window for
the humidity algorithm.

Ted

On Mon, Jul 15, 2013 at 11:49 AM, Joe Hamman notifications@github.comwrote:

Bart,

This is a really nice comprehensive look at the task ahead. The source
tree structure makes good sense and seems to allow for easy application of
new code to all run modes. I have a few questions and comments.

Is there reason to believe that mtclim will only be used in the
driver-classic configuration? I can imagine a few scenarios where mtclim
would be used in image mode?

Why wouldn’t we want to have a standard set of init subroutines that cover
all run modes? Ideally, the initial state and parameter files could be
easily shared between modes. I think I have a fairly good idea of how to
structure these inputs but would need some help thinking through the
different allocation needs between classic run modes.

As far as the write routines, my impression is that most applications of
VIC eventually end up pulling the outputs together into a grid structure.
For this reason, I think it would be worth thinking through how to
facilitate this during runtime rather than as a post process. Maybe we can
ignore this for the classic mode and just setup grid write capabilities for
the image modes.

I think there will be a handful of smaller decisions along the way, (For
example, should the vic_write() routine be called by the driver or the
vic_run() routine?), but this is a really good framework to start from.


Reply to this email directly or view it on GitHubhttps://github.com//issues/7#issuecomment-20993946
.

@bartnijssen
Copy link
Member Author

Indeed - Ted's comment is the main reason for not using MTCLIM in image mode. I think we should build a standalone processor that can do what MTCLIM does (analogous to writing out the VIC met forcings in the current version) and that can be used in point or image mode. The output files will be large, but that is not quite the same problem that it once was.

@jhamman
Copy link
Member

jhamman commented Jul 19, 2013

Thanks for the comments guys. I agree that it does make sense to build a standalone processor that can do what MTCLIM does. I think removing MTCLIM from the internals of VIC may be the cleanest way to implement the changes Bart has outlined above. Furthermore, explicitly running MTCLIM would reduce the degree to which it is used as a "black box" pre-processor and could aid in scientific development of both models (VIC and MTCLIM).

I think some of the issues associated with disk space may be alleviated by leveraging the netCDF4/HDF5 binary storage/compression packages. It would be interesting to do back of the envelope estimates of disk and memory usage. Reading the full set of met forcings from disk may increase run time a bit due to the read activity but would limit further limit the memory requirements (especially when running in image mode).

@bartnijssen
Copy link
Member Author

I am picking this back up - even though this is not slated to VIC 5.0. I have a branch in my own repo to start working on this: https://github.com/bartnijssen/VIC/tree/feature/reorg

I need it in the context of RASM

@tbohn
Copy link
Contributor

tbohn commented Feb 3, 2014

Would it help if I get rid of the DIST_PRCP option (and therefore the dist_prec() function) before you do this?

@bartnijssen
Copy link
Member Author

It would definitely facilitate things, but if you do that on your own branch I can just pull in those changes when you are ready.

@tbohn
Copy link
Contributor

tbohn commented Feb 3, 2014

It was next on my list anyway, after I finished dealing with the soil
temperature issue. But I'm at a point where I can put the soil temperature
thing on the back burner and do the dist_prcp issue.

On Mon, Feb 3, 2014 at 12:34 PM, bartnijssen notifications@github.comwrote:

It would definitely facilitate things, but if you do that on your own
branch I can just pull in those changes when you are ready.

Reply to this email directly or view it on GitHubhttps://github.com//issues/7#issuecomment-33996779
.

@bartnijssen
Copy link
Member Author

I am continuing this work and further documentation of the process on the wiki attached to my own fork of this repo. If it is useful we can push some of that wiki info back to the main repo later.

For details see here:

@bartnijssen bartnijssen mentioned this issue Oct 16, 2014
2 tasks
@jhamman jhamman mentioned this issue Nov 21, 2014
7 tasks
@jhamman
Copy link
Member

jhamman commented Dec 8, 2014

Closed via @bartnijssen's feature/reorg and feature/image branches that came in via #169.

@jhamman jhamman closed this as completed Dec 8, 2014
jhamman pushed a commit that referenced this issue Nov 14, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants