-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Proposed release 1.1.1 #91
Conversation
@bgoli have you had a chance to look at this? |
Looks interesting, I have time from today.
…On Sat, Jul 8, 2023 at 3:14 PM Johann Rohwer ***@***.***> wrote:
@bgoli <https://github.com/bgoli> have you had a chance to look at this?
—
Reply to this email directly, view it on GitHub
<#91 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABGHUEOKT5B5RY7XGYNTDVLXPFMJ7ANCNFSM6AAAAAAZ2CBYGY>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having a pandas export is fine but I don't think we should have a polymorphic data structures (mod._sim can be two different types) that depends only on whether a module is present or not. Would something like a getSimAsDF() work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that it is a bit kludgy. In principle something like getSimAsDF()
can work, but it is not very handy in terms of a workflow. I have been doing simulations a lot, and being able to inspect the results as a pandas DF is very handy, both in terms of inspecting the data directly (being able to scroll the numbers) and being able to plot results quickly (mod.sim.plot(x='Time', y=['S1', 'R1])
or plotting 2 variables against each other mod.sim.plot(x='S1', y=['S2'])
). The bottom line is I would like to access this DF object directly after a simulation without first calling a getSimASDF()
method first every time.
An easy option would be to introduce a dependency on pandas, but I remember discussions with you wanting to limit dependencies, that's why I included the test if pandas is installed.
Another option would be to instantiate the mod.sim
object only when pandas is installed (as a pandas df) and throw and error otherwise. This would break backwards compatibility though. Although, I cannot imagine anyone who has been using the mod.sim
recarray functionality not being willing to install pandas and then use the DF. It is just so much more powerful.
In the end I'm happy to go along with any method that allows me direct access to a pandas DF after a simulation without calling a separate method first every time. If you have other suggestions I am keen to hear them 😃
Other than the Pandas thing everything else looks good to go |
Copying email discussion: I see 3 options currently:
Alternatively, if we can't agree immediately on a way forward re. this issue, I could remove the relevant commit from the PR and do a new release with the rest. I would ideally, however, like to get everything wrapped up to avoid doing a subsequent release very soon. And this DF thing is just such a handy feature! Once you've worked with it, the DF functionality is just so much more powerful, incl. tab completion for model attributes when working interactively. We (i.e. I and my students) really need this and would not want to go back. |
Copying @bgoli email reply: I see that a pandas dataframe is useful for interactive work but it does increase the dependencies that are needed by Pandas to install PySCeS and is just overhead for any other workflow. Changing the type of an attribute based only on if a module is installed or not, is not really a good option. However, Pandas support is interesting for interactive work, but, then so is maybe Zarr for HPC workflows etc. So what about something like this? We create an alternate datatype (or something) config key with the default of None.
and then a basic Pandas enable method (to start with) method: def enableDataPandas(self, var=True):
if var:
try:
import pandas
self.__config__[‘alternate_datatype’] = ‘pandas’
except ImportError:
print(‘Instructions to install panda’)
return false
return true
else:
self.__config__[‘alternate_datatype’] = None
return false Now for if self.__config__[‘alternate_datatype’] == ‘pandas’:
print(“variable type changed to pandas and whatever else must happen here”)
return dataframe
else:
print(“variable type changed back to numpy and whatever else must happen here”)
return numpy.rec #or whatever The advantage of going this route is that we can have multiple custom datatype outputs with the default being numpy. Of course that means you would have to type Would something like this work, it is more flexible as we can potentially include other formats (e.g. Zarr) for other workflows etc? |
So as I'm seeing it both our proposals can return two different types for
I agree that the additional overhead of pandas and its dependencies is not warranted to make it a general dependency of PySCeS. So far we are on the same page. As to the use case of
But the argument about multiple datatype outputs such as Zarr is a valid one though, you have a point and I see the benefit of a configuration option. However:
Would that be acceptable? |
We're on the same page, to answer your questions: I would prefer making the configurable an attribute of the pysces module t Yes absolutely, I also had this in mind when thinking about using a config option :-) I would like to keep the enablePandas method as it allows scripts to be slightly more reusable in that you can include the enable method explicitly to avoid any possible strange errors if people don't have Pandas installed. Practically, the global config option could just make it so this method is automatically called on model instantiation. I'm sure we already do similar stuff with some model config options so it should be easy to add (I'm happy to try if you like). I would move the print statements to the enabling function ... My apologies that is me taking shortcuts, those print statements were supposed to represent the code being called in those blocks and should absolutely not exist anywhere in real life. If we agree this is okay then to me it sounds like we are good to go 👍 |
In addition we could eventually also add a module level enablePandas() function that sets the global config option for a session and this is then set automatically in any instantiated model. |
@bgoli from my side this is now good to go. Can you confirm? I will of course update the CI stuff again for the release, did not want to build all the permutations now for testing. |
It is not a windows issue per se as I've tested on Windows as well. Is this a development (editable) install and have you pulled the latest version? Perhaps related to this comment? I was getting exactly the same before putting in the extra import. |
Have you tested it without the initial auto import from your config? I've experienced this before and usually the only way I can get it to work in a terminal with a dynamic (runtime) import is to set a reference to the module with a private class attribute. I can commit my changes but essentially all I've done is add to '''python
''' '''python
''' this creates a nice private attribute that can be used for development self.__pandas What do you thing, should I push it and you can test linux, etc? |
Indeed I had not tested it if my configuration file was set to |
Are you happy otherwise if I merge the PR if this works? |
The reasons are obscure and have to do with scoping and I think that a module is only ever imported once ... or something. Let me commit this and then if it works on your side go for the release. |
…sue pandas is now attached as a private attribute
Happy for you to merge at your convenience 👍 |
Great, just running final tests on my side 👍 |
I've fixed a number of bugs (one of them quite severe) and added a couple of features. I'd like to cut a new release that makes these generally available, specifically to students in my lab. The following can serve as a release announcement (edits welcome).
PySCeS Version 1.1.1
We are pleased to announce the release of the Python Simulator for Cellular Systems: PySCeS (https://pysces.github.io/) version 1.1.1. This is the first bugfix release in the 1.1 series.
New features
ipympl
backend for matplotlib is now enabled if installed. This allows use in JupyterLab (as opposed to classic notebook). If ipympl is not installed, fallback is to the standardnbAgg
, which is part of matplotlib.mod.sim
object) are now returned as a pandas DataFrame if pandas is installed, otherwise a numpy recarray is returned as before.Bug fixes
RateRules
using Assimulo, where a wrong solver variable was being assigned internally.README: https://github.com/PySCeS/pysces/blob/master/README.md
DOCUMENTATION: https://pyscesdocs.readthedocs.io/en/latest/