Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Partial preprocessor implementation #52

Merged
merged 15 commits into from
Jun 9, 2020

Conversation

amoodie
Copy link
Member

@amoodie amoodie commented Jun 6, 2020

This is kind of a mess of a PR. I apologize for that.

The main contribution here is the addition a feature INcomplete preprocessor for the model. In preparation for more advanced YAML parsing, I refactored some of what Eric put together on the command line interface #31.

In short, I have started to set up a "high level" and "low level" API.

high level

The high level API works from the command line as Eric nicely implemented:

pyDeltaRCM --config config_file.yml

where the .yml must have timesteps specified to work from the high level API. Alternatively the API can be used as:

pyDeltaRCM --config config_file.yml --timesteps 2

which would run the .yml for two timesteps and the timesteps does not need to be specified within the yaml then.

There is also a high-level python API that works as:

pp = preprocessor.Preprocessor(input_file=config_file.yml, timesteps=2)
pp.run_jobs()

which would run the jobs defined by the input_file for two timesteps each.

The incomplete features that should become part of the preprocessor:

  • matrix expansion Matrix expansion in config yaml #32
  • an ensemble: N option to run the same config N times with different seeds
  • the two can also be combined to produce many runs
  • sending the job handling to a preprocessor allows us to use mpi4py to run jobs easily on multiple cores (clusters)
  • checkpointing (or determining whether to restart a run) should be handled by the preprocessor

I will open issues about these points after merge.

With this PR, time looping is only part of the package high-level API, so this kind of gets rid of the need for the run_pyDeltaRCM.py script and the looping there, but I did not yet delete these files. I think we should wait until we get checkpointing reimplemented.

low level

You can still use the low-level api:

delta = DeltaModel(input_file=os.path.join(os.getcwd(), 'tests', 'test.yaml'))

for time in range(0, 1):
    delta.update()

delta.finalize()

other junk I did

  • changed the name of the main model code to DeltaModel as we have discussed and renamed the module to model.
  • changed the entry point name from run_pyDeltaRCM to just pyDeltaRCM
  • added some docs to explain the API to the user guide.
  • updated all the docs to have new names
  • consolidated some test files
  • added a test delta pytest.fixture to be used in every test that needs a DeltaModel so that tests no longer create temporary files in the package working directory (used to create test and pyDeltaRCM_output)

@amoodie
Copy link
Member Author

amoodie commented Jun 6, 2020

should say: I want to merge this now rather than after all the other stuff is implemented because this was kind of a pain to get mergable on top of the recent PRs, so it will be easier to make incremental change on top now.

@amoodie amoodie requested a review from ericbarefoot June 6, 2020 00:40
@amoodie amoodie added the enhancement New feature or request label Jun 6, 2020
@amoodie amoodie added this to In progress in Version 2.0 via automation Jun 6, 2020
Copy link
Collaborator

@ericbarefoot ericbarefoot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General thoughts

  1. I'm super stoked about the preprocessor and the general idea for the API and argument parsing. I think this is a great step forward for us in coming to a stable API for users.
  2. I'm a-ok with all the renaming. There were several things that made the interface quite confusing, and doing some consistent renaming is a good thing.
  3. The beginnings of the jobs list matrix expansion is dopeeeee.

Concerns

The only concern I have is with the idea of eliminating a default timestep choice for the user. I think a lot of the parameters where we supply a default value are similarly arbitrary, but we don't force the user to supply. I would prefer, in some ways, to supply a default (5 or 10) and if the user doesn't supply one, maybe it could automatically activate the verbose = 2 flag or something and print a warning so that they know they're running with a default value and that their computer isn't cycling garbage.

I am willing to be persuaded, but could you make the argument for changing it to a required parameter?

Goof by me

I made comment somewhere about using pytest.raises() but I was going commit-by-commit, and I saw that you fixed it, but now I can't find the comment to delete it. oops.

EDIT: found and resolved this ^

@@ -142,6 +142,3 @@ coeff_U_ero_sand:
alpha:
type: ['float', 'int']
default: 0.1
timesteps:
type: ['float', 'int']
default: 10
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How come we remove the notion of a default timestepping?

tests/test_initialization.py Outdated Show resolved Hide resolved
@amoodie
Copy link
Member Author

amoodie commented Jun 7, 2020

Thanks for checking out @ericbarefoot 🌮

I think a lot of the parameters where we supply a default value are similarly arbitrary

I think the concept of time in modeling (timestep, n_timesteps, time, etc) is kind of a "special" variable.

The way I set it up, the high level API requires the user to specify the timesteps; they may specify it in either the .yml or via argument (--timesteps or timesteps=). An error is thrown if it is not supplied. So, because timesteps is required to run the high level API, I would argue that it does not need a default. The low-level API the user handles timestepping, so no value is needed.

That said, you're right, it could still have a default value. But practically speaking, I can't imagine why anyone would ever want to specify a bunch of parameters in their own .yml and then just use the default 5 timesteps...all of the other variables have "reasonable" choices, so maybe someone would want to just change the sand fraction from the default. In that case though, I still don't know how we would assume their choice of number of timesteps...

Now, all THAT said, most users who are really doing science will likely change lots of the input parameters and use the low-level API anyway, so it's kind of irrelevant.

Version 2.0 automation moved this from In progress to Reviewer approved Jun 9, 2020
@ericbarefoot ericbarefoot merged commit a2af4a2 into DeltaRCM:develop Jun 9, 2020
Version 2.0 automation moved this from Reviewer approved to Done Jun 9, 2020
@amoodie amoodie deleted the partial_preprocessor branch February 25, 2021 17:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
Version 2.0
  
Done
Development

Successfully merging this pull request may close these issues.

None yet

2 participants