Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposed release 1.1.1 #91

Merged
merged 19 commits into from
Jul 25, 2023
Merged

Proposed release 1.1.1 #91

merged 19 commits into from
Jul 25, 2023

Conversation

jmrohwer
Copy link
Member

I've fixed a number of bugs (one of them quite severe) and added a couple of features. I'd like to cut a new release that makes these generally available, specifically to students in my lab. The following can serve as a release announcement (edits welcome).

PySCeS Version 1.1.1

We are pleased to announce the release of the Python Simulator for Cellular Systems: PySCeS (https://pysces.github.io/) version 1.1.1. This is the first bugfix release in the 1.1 series.

New features

  • When used in a notebook environment, the ipympl backend for matplotlib is now enabled if installed. This allows use in JupyterLab (as opposed to classic notebook). If ipympl is not installed, fallback is to the standard nbAgg, which is part of matplotlib.
  • Simulation results (mod.sim object) are now returned as a pandas DataFrame if pandas is installed, otherwise a numpy recarray is returned as before.

Bug fixes

  • Fixed a bug in simulations with RateRules using Assimulo, where a wrong solver variable was being assigned internally.
  • Fixed SBML export when assignment rules were evaluating reaction rates.
  • Enabled assignment rules (forcing functions) to track parameter changes when using CVODE. This is needed in case events change parameters during the course of the simulation.

README: https://github.com/PySCeS/pysces/blob/master/README.md

DOCUMENTATION: https://pyscesdocs.readthedocs.io/en/latest/

@jmrohwer jmrohwer requested a review from bgoli June 30, 2023 16:35
@jmrohwer jmrohwer self-assigned this Jun 30, 2023
@jmrohwer
Copy link
Member Author

jmrohwer commented Jul 8, 2023

@bgoli have you had a chance to look at this?

@bgoli
Copy link
Member

bgoli commented Jul 9, 2023 via email

Copy link
Member

@bgoli bgoli Jul 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having a pandas export is fine but I don't think we should have a polymorphic data structures (mod._sim can be two different types) that depends only on whether a module is present or not. Would something like a getSimAsDF() work?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that it is a bit kludgy. In principle something like getSimAsDF() can work, but it is not very handy in terms of a workflow. I have been doing simulations a lot, and being able to inspect the results as a pandas DF is very handy, both in terms of inspecting the data directly (being able to scroll the numbers) and being able to plot results quickly (mod.sim.plot(x='Time', y=['S1', 'R1]) or plotting 2 variables against each other mod.sim.plot(x='S1', y=['S2'])). The bottom line is I would like to access this DF object directly after a simulation without first calling a getSimASDF() method first every time.

An easy option would be to introduce a dependency on pandas, but I remember discussions with you wanting to limit dependencies, that's why I included the test if pandas is installed.

Another option would be to instantiate the mod.sim object only when pandas is installed (as a pandas df) and throw and error otherwise. This would break backwards compatibility though. Although, I cannot imagine anyone who has been using the mod.sim recarray functionality not being willing to install pandas and then use the DF. It is just so much more powerful.

In the end I'm happy to go along with any method that allows me direct access to a pandas DF after a simulation without calling a separate method first every time. If you have other suggestions I am keen to hear them 😃

@bgoli
Copy link
Member

bgoli commented Jul 16, 2023

Other than the Pandas thing everything else looks good to go

@jmrohwer
Copy link
Member Author

Copying email discussion:

I see 3 options currently:

  • add a dependency on pandas and make mod.sim a DataFrame (DF)
  • make mod.sim a DF if pandas is installed, if it is not installed raise an error (this would not be backwards compatible)
  • leave mod.sim a numpy recarray (as is currently) and create a new object (also with lazy loading), e.g. mod.simdf, that returns a DF. This has the advantage of being backwards compatible but returns two objects with essentially the same functionality. In addition, when doing an interactive session, mod.simdf is more cumbersome to type than mod.sim yet mod.simdf is the one which I would use all the time.

Alternatively, if we can't agree immediately on a way forward re. this issue, I could remove the relevant commit from the PR and do a new release with the rest. I would ideally, however, like to get everything wrapped up to avoid doing a subsequent release very soon. And this DF thing is just such a handy feature! Once you've worked with it, the DF functionality is just so much more powerful, incl. tab completion for model attributes when working interactively. We (i.e. I and my students) really need this and would not want to go back.

@jmrohwer
Copy link
Member Author

Copying @bgoli email reply:

I see that a pandas dataframe is useful for interactive work but it does increase the dependencies that are needed by Pandas to install PySCeS and is just overhead for any other workflow.

Changing the type of an attribute based only on if a module is installed or not, is not really a good option. However, Pandas support is interesting for interactive work, but, then so is maybe Zarr for HPC workflows etc. So what about something like this?

We create an alternate datatype (or something) config key with the default of None.

mod.__config__[‘alternate_datatype’] = None

and then a basic Pandas enable method (to start with) method:

def enableDataPandas(self, var=True):
  if var:
    try:
      import pandas
      self.__config__[‘alternate_datatype’] =pandasexcept ImportError:
      print(‘Instructions to install panda’)
      return false
    return true
  else:
    self.__config__[‘alternate_datatype’] = None
    return false

Now for mod.sim - and any other variable that might be useful to the user as a dataframe - you can use Pandas simply by adding something like:

if self.__config__[‘alternate_datatype’] ==pandas’:
  print(“variable type changed to pandas and whatever else must happen here”)
  return dataframe
else:
  print(“variable type changed back to numpy and whatever else must happen here”)
  return numpy.rec #or whatever

The advantage of going this route is that we can have multiple custom datatype outputs with the default being numpy. Of course that means you would have to type mod.enableDataPandas() once in a script, is that too much extra typing?

Would something like this work, it is more flexible as we can potentially include other formats (e.g. Zarr) for other workflows etc?

@jmrohwer
Copy link
Member Author

So as I'm seeing it both our proposals can return two different types for mod.sim, but the difference is:

  • yours is as a result of a config key
  • mine would be as a result of testing for the presence of a module

I agree that the additional overhead of pandas and its dependencies is not warranted to make it a general dependency of PySCeS. So far we are on the same page.

As to the use case of mod.sim (and e.g. mod.scan):

  • these are instantiated by lazy loading so only created if needed so create no unnecessary overhead
  • their use will generally be interactive (but not strictly necessary so)
  • any operations on attributes such as mod.sim.S1 or mod.sim.Time (e.g. using them in matplotlib plotting commands) will work equally well for a pandas DF or a numpy recarray so that should not break anything
  • if one has pandas installed I can think of no reason whatsoever to not use a DF vs. a recarray

But the argument about multiple datatype outputs such as Zarr is a valid one though, you have a point and I see the benefit of a configuration option. However:

  • I would prefer making the configurable an attribute of the pysces module that is imported (i.e. global setting) that can be set via .pyscfg.ini, and not a per-model setting. I can see myself using pandas all the time and even if it is typing mod.enableDataPandas() only once per script, it would be a PITA not being able to set this as default behaviour. Also what about doing an analysis with 4 or 5 different models in the same session, it would then have to be set separately for every model instance.
  • I would move the print statements to the enabling function, otherwise we have this endless spitting out of output every time mod.sim is called.

Would that be acceptable?

@bgoli
Copy link
Member

bgoli commented Jul 19, 2023

We're on the same page, to answer your questions:

I would prefer making the configurable an attribute of the pysces module t

Yes absolutely, I also had this in mind when thinking about using a config option :-) I would like to keep the enablePandas method as it allows scripts to be slightly more reusable in that you can include the enable method explicitly to avoid any possible strange errors if people don't have Pandas installed. Practically, the global config option could just make it so this method is automatically called on model instantiation. I'm sure we already do similar stuff with some model config options so it should be easy to add (I'm happy to try if you like).

I would move the print statements to the enabling function ...

My apologies that is me taking shortcuts, those print statements were supposed to represent the code being called in those blocks and should absolutely not exist anywhere in real life.

If we agree this is okay then to me it sounds like we are good to go 👍

@bgoli
Copy link
Member

bgoli commented Jul 19, 2023

In addition we could eventually also add a module level enablePandas() function that sets the global config option for a session and this is then set automatically in any instantiated model.

@jmrohwer jmrohwer linked an issue Jul 21, 2023 that may be closed by this pull request
@jmrohwer
Copy link
Member Author

@bgoli from my side this is now good to go. Can you confirm? I will of course update the CI stuff again for the release, did not want to build all the permutations now for testing.

@bgoli
Copy link
Member

bgoli commented Jul 24, 2023

Thanks for the test notebook, however, it fails on the Panda import ... any ideas?

image

Could this be a windows issue in which case I can look into it

@jmrohwer
Copy link
Member Author

It is not a windows issue per se as I've tested on Windows as well. Is this a development (editable) install and have you pulled the latest version? Perhaps related to this comment? I was getting exactly the same before putting in the extra import.

@bgoli
Copy link
Member

bgoli commented Jul 25, 2023

Have you tested it without the initial auto import from your config? I've experienced this before and usually the only way I can get it to work in a terminal with a dynamic (runtime) import is to set a reference to the module with a private class attribute.

I can commit my changes but essentially all I've done is add to

'''python

def __init__(self, args):
    # existing code
    if  __CUSTOM_DATATYPE__ == 'pandas':
        if _checkPandas():
            self.enableDataPandas(True)
        else:
            self.enableDataPandas(False)

'''
and modified enableDataPandas

'''python

def enableDataPandas(self, var=True):
    if var:
        try:
            import pandas as pandas
            self.__pandas = pandas
            self.__settings__['custom_datatype'] = 'pandas'
            print('Pandas output enabled.')
        except ModuleNotFoundError:

'''

this creates a nice private attribute that can be used for development
'''python

self.__pandas
'''
which as far as I can tell always works no matter what the OS, etc.

What do you thing, should I push it and you can test linux, etc?

@jmrohwer
Copy link
Member Author

Indeed I had not tested it if my configuration file was set to custom_datatype = None. I also get the same error on Linux when doing this. Can't say I understand why. Feel free to commit your changes (then there's a commit from you again as well 😁) then I'll test on my side.

@jmrohwer
Copy link
Member Author

Are you happy otherwise if I merge the PR if this works?

@bgoli
Copy link
Member

bgoli commented Jul 25, 2023

The reasons are obscure and have to do with scoping and I think that a module is only ever imported once ... or something. Let me commit this and then if it works on your side go for the release.

…sue pandas is now attached as a private attribute
@bgoli
Copy link
Member

bgoli commented Jul 25, 2023

Happy for you to merge at your convenience 👍

@jmrohwer
Copy link
Member Author

Great, just running final tests on my side 👍

@jmrohwer jmrohwer merged commit 6dbf8ec into main Jul 25, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unify configuration across platforms
2 participants