Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance issue with big datasets #57

Open
filippomc opened this issue Mar 13, 2019 · 9 comments
Open

Performance issue with big datasets #57

filippomc opened this issue Mar 13, 2019 · 9 comments

Comments

@filippomc
Copy link

filippomc commented Mar 13, 2019

Hi,

we're facing a performance issue with our application based on PyEcore. After some profiling it seems that there is a big overhead when loading a timeseries into an object.

Here the generated code:

class TimeSeries(Value):
    scalingFactor = EAttribute(eType=EInt)
    value = EAttribute(eType=EDouble, upper=-1)
    unit = EReference(containment=True)

    def __init__(self, unit=None, scalingFactor=None, value=None, **kwargs):
        super(TimeSeries, self).__init__(**kwargs)
        if scalingFactor is not None:
            self.scalingFactor = scalingFactor
        if value:
            self.value.extend(value)
        if unit is not None:
            self.unit = unit

value will be a big list of floats (we can't pass a numpy array because of if value but thats another minor issue).
When we're calling the extend function we're in fact triggering a logic which is adding the elements of the list one by one to an OrderedSet and also doing checks on every element.

One possible solution i was figuring out to handle this is to replace that
self.value.extend(value)
with
self.value.items = value
although I don't know if I'm missing something important on the logic behind.

Thanks

@aranega
Copy link
Member

aranega commented Mar 13, 2019

Hi @filippometacell

Indeed, as you saw, what PyEcore does in the case you add multiple elements to a collection, it checks at runtime if each element as the right type in order to ensure that the collection content will respect the structural constraints from the metamodel. Also, behind the scene, the collection tries to set opposite collections and containers.

I will check if there is a dirty trick I can use to improve performances in this case (I have an idea, but it's not very beautiful...)

EDIT> By the way, your value collection is an OrderedSet, meaning that the same value cannot be added twice to your collection. Is that really what you want in Geppetto?

@filippomc
Copy link
Author

Hi @aranega, thanks for your fast response.

That's really a good point, I was just thinking about the overhead but we're possibly losing data we definitely don't want to lose.

@aranega
Copy link
Member

aranega commented Mar 16, 2019

@filippometacell

Sorry for the delay, I had some thing to fix before I could work on the performance issue. I made an experimental modification on pyegeppetto in the experimental/no_check branch. You can try it and tell me if the performances are better. On my bench, the perfs are better, but you never know (I only tested with 10 millions values).

@filippomc
Copy link
Author

filippomc commented Mar 18, 2019

Hi @aranega, thanks for the patch

I'm trying make some tests but I'm getting the following error:

  File "/home/user/testnwb/dependencies/pygeppetto/pygeppetto/model/values/values.py", line 139, in no_check_eattribute_extend
    self.owner.notify(Notification(new=sublist,
NameError: name 'Notification' is not defined

also made a pip install after the checkout.
Any idea?

@aranega
Copy link
Member

aranega commented Mar 18, 2019

@filippometacell My bad... I forgot to add some imports when I copied the code... Sorry about that, it should be ok now in the same branch...

@filippomc
Copy link
Author

Hi @aranega, I ran some tests and it's working perfectly. Performances are much better and we are no more losing data in the timeseries. Will this be ported into pyecore?

@aranega
Copy link
Member

aranega commented Mar 19, 2019

@filippometacell Regarding the data losses in the series, it depends on the way the value meta-attribute is declared in the .ecore, for this one, I can perform the modification by hand in the .ecore and in the code (it's only one property to set to False). For the other, I will try to find a good way of integrated in PyEcore. At the moment, the solution I pushed just removes the container/opposite check as well as the runtime check. If I remove it for all elements, it could lead to models that does not conform to their metamodel anymore, so I should be careful.

On top of some performances improvement I will try to add, one thing I could do is to add either an annotation that you could place on some meta-attribute to manually disable some checks, or a context manager that could temporarily remove the checks.

I will try to add all of this quickly, but I need to do more profiling work on the check method that consumes a lot of time and that could probably be improved.

@tarelli
Copy link

tarelli commented Mar 19, 2019

Hey @aranega thanks for looking at this! Unique is shown as false in Eclipse for value, is this a problem of how PyEcore treats the default value for a property? cc @filippometacell
Screenshot 2019-03-19 at 12 47 08

@aranega
Copy link
Member

aranega commented Mar 20, 2019

@tarelli outch, actually, I checked, the .ecore is well defining the property as it should... I perhaps introduced this error manually at some point. I will propose a new pull request on master to correct it. At the same time, I will go again on each unique=false properties to be sure that they are well set in pygeppetto.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants