Performance issue with big datasets #57

filippomc · 2019-03-13T14:26:46Z

Hi,

we're facing a performance issue with our application based on PyEcore. After some profiling it seems that there is a big overhead when loading a timeseries into an object.

Here is the generated code:

class TimeSeries(Value):
    scalingFactor = EAttribute(eType=EInt)
    value = EAttribute(eType=EDouble, upper=-1)
    unit = EReference(containment=True)

    def __init__(self, unit=None, scalingFactor=None, value=None, **kwargs):
        super(TimeSeries, self).__init__(**kwargs)
        if scalingFactor is not None:
            self.scalingFactor = scalingFactor
        if value:
            self.value.extend(value)
        if unit is not None:
            self.unit = unit

value will be a big list of floats (we can't pass a numpy array because of the "if value" check, but that's another minor issue).
When we call the extend function, we in fact trigger logic that adds the elements of the list one by one to an OrderedSet and also runs checks on every element.
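
To illustrate the kind of overhead described above, here is a minimal sketch in plain Python. It does not use PyEcore: `CheckedList` is a hypothetical stand-in that only type-checks each element, whereas the real collection also handles opposites, containers and notifications. Timings will vary by machine.

```python
import time


class CheckedList:
    """Hypothetical stand-in for a type-checked collection (not PyEcore's class)."""

    def __init__(self, expected_type):
        self.expected_type = expected_type
        self.items = []

    def append(self, item):
        # One isinstance check and one method call per element.
        if not isinstance(item, self.expected_type):
            raise TypeError(
                f"expected {self.expected_type.__name__}, got {type(item).__name__}"
            )
        self.items.append(item)

    def extend(self, values):
        # Delegates to append, so the per-element cost is paid len(values) times.
        for item in values:
            self.append(item)


def timed(fn):
    """Return the wall-clock seconds taken by a zero-argument callable."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


values = [float(i) for i in range(200_000)]

checked = CheckedList(float)
checked_seconds = timed(lambda: checked.extend(values))

plain = []
plain_seconds = timed(lambda: plain.extend(values))

print(f"checked extend: {checked_seconds:.4f}s, plain extend: {plain_seconds:.4f}s")
```

Both containers end up with the same content; the difference is only in how much Python-level work is done per element.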

One possible solution I was considering is to replace
self.value.extend(value)
with
self.value.items = value
although I don't know if I'm missing something important in the logic behind it.

Thanks


aranega · 2019-03-13T15:02:29Z

Hi @filippometacell

Indeed, as you saw, when you add multiple elements to a collection, PyEcore checks at runtime whether each element has the right type, in order to ensure that the collection content respects the structural constraints from the metamodel. Also, behind the scenes, the collection tries to set opposite collections and containers.

I will check if there is a dirty trick I can use to improve performance in this case (I have an idea, but it's not very beautiful...)

EDIT> By the way, your value collection is an OrderedSet, meaning that the same value cannot be added twice to your collection. Is that really what you want in Geppetto?
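
A small sketch of why ordered-set semantics matter for a timeseries. It uses a plain `dict` as a stand-in for an ordered set (insertion order kept, duplicates dropped); this is an illustration of the semantics only, not PyEcore's collection class, and the sample values are made up.

```python
# A signal that revisits earlier values, as most timeseries do.
samples = [0.0, 0.5, 1.0, 0.5, 0.0]

# dict.fromkeys keeps the first occurrence of each key in insertion order,
# which mimics adding the samples one by one to an ordered set.
deduplicated = list(dict.fromkeys(samples))

print(len(samples), len(deduplicated))  # 5 3
print(deduplicated)                     # [0.0, 0.5, 1.0]
```

Two of the five samples are silently dropped, so the stored series no longer matches the recorded one.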

filippomc · 2019-03-13T15:34:40Z

Hi @aranega, thanks for your fast response.

That's a really good point. I was only thinking about the overhead, but we may be losing data that we definitely don't want to lose.

aranega · 2019-03-16T21:37:57Z

@filippometacell

Sorry for the delay, I had some things to fix before I could work on the performance issue. I made an experimental modification on pygeppetto in the experimental/no_check branch. You can try it and tell me if the performance is better. On my bench, the perfs are better, but you never know (I only tested with 10 million values).

filippomc · 2019-03-18T15:03:13Z

Hi @aranega, thanks for the patch

I'm trying to run some tests but I'm getting the following error:

  File "/home/user/testnwb/dependencies/pygeppetto/pygeppetto/model/values/values.py", line 139, in no_check_eattribute_extend
    self.owner.notify(Notification(new=sublist,
NameError: name 'Notification' is not defined

I also ran a pip install after the checkout.
Any idea?

aranega · 2019-03-18T16:20:42Z

@filippometacell My bad... I forgot to add some imports when I copied the code... Sorry about that, it should be ok now in the same branch...

filippomc · 2019-03-18T19:01:39Z

Hi @aranega, I ran some tests and it's working perfectly. Performance is much better and we are no longer losing data in the timeseries. Will this be ported into pyecore?

aranega · 2019-03-19T13:26:54Z

@filippometacell Regarding the data loss in the series, it depends on the way the value meta-attribute is declared in the .ecore. For this one, I can make the modification by hand in the .ecore and in the code (it's only one property to set to False). For the other, I will try to find a good way of integrating it in PyEcore. At the moment, the solution I pushed just removes the container/opposite check as well as the runtime check. If I remove it for all elements, it could lead to models that do not conform to their metamodel anymore, so I should be careful.

On top of some performance improvements I will try to add, one thing I could do is to add either an annotation that you could place on some meta-attributes to manually disable some checks, or a context manager that could temporarily remove the checks.
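
As a rough idea of what the context manager option could look like, here is a hedged sketch in plain Python. Nothing here is PyEcore API: `SwitchableList`, its `checks_enabled` flag and `no_checks()` are hypothetical names invented for this illustration.

```python
from contextlib import contextmanager


class SwitchableList:
    """Hypothetical checked collection; not PyEcore's implementation."""

    checks_enabled = True  # class-level switch, an assumption of this sketch

    def __init__(self, expected_type):
        self.expected_type = expected_type
        self.items = []

    def extend(self, values):
        if SwitchableList.checks_enabled:
            # Slow path: validate every element before storing it.
            for item in values:
                if not isinstance(item, self.expected_type):
                    raise TypeError(f"unexpected element: {item!r}")
                self.items.append(item)
        else:
            # Fast path: trust the caller and store everything in bulk.
            self.items.extend(values)


@contextmanager
def no_checks():
    """Temporarily disable element checks, restoring the previous state on exit."""
    previous = SwitchableList.checks_enabled
    SwitchableList.checks_enabled = False
    try:
        yield
    finally:
        SwitchableList.checks_enabled = previous


series = SwitchableList(float)
with no_checks():
    series.extend([0.1, 0.2, 0.1])  # bulk load without per-element checks

print(series.items, SwitchableList.checks_enabled)
```

The `try`/`finally` restores the flag even if the bulk load raises, which limits the window in which a non-conforming model could be built. A global switch like this is not thread-safe, so a real implementation would likely need something finer-grained.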

I will try to add all of this quickly, but I need to do more profiling work on the check method that consumes a lot of time and that could probably be improved.

tarelli · 2019-03-19T16:48:47Z

Hey @aranega thanks for looking at this! Unique is shown as false in Eclipse for value, is this a problem of how PyEcore treats the default value for a property? cc @filippometacell

aranega · 2019-03-20T09:35:41Z

@tarelli ouch, actually, I checked, and the .ecore defines the property as it should... I perhaps introduced this error manually at some point. I will propose a new pull request on master to correct it. At the same time, I will go over each unique=false property again to be sure that they are correctly set in pygeppetto.
