Possible memory leakage in SVDpp #463

Open
amohar2 opened this issue Feb 24, 2023 · 1 comment

amohar2 commented Feb 24, 2023

Description

In my experiments, repeated calls to SVDpp's fit() steadily increase the memory usage of the Python process running it.
To investigate, I looked at SVDpp's fit() in the matrix_factorization.pyx module, and there appears to be a memory leak: an "Iu" array is allocated with malloc once per user, but free() is called only once, at the end of fit().
Reference: https://github.com/NicolasHug/Surprise/blob/master/surprise/prediction_algorithms/matrix_factorization.pyx#L453-L458

https://github.com/NicolasHug/Surprise/blob/master/surprise/prediction_algorithms/matrix_factorization.pyx#L508-L514

This seems to be the case only when cache_ratings is False.

Steps/Code to Reproduce

This is part of a larger experiment that I am running, so I do not currently have a minimal reproducible example.
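
In the absence of a minimal reproduction, one way to check for this kind of growth is to record the process's peak resident set size after each of several repeated calls. The sketch below is only an illustration: the helper name `peak_rss_after_each_call` is hypothetical, it relies on the Unix-only `resource` module, and the callable passed in is a stand-in (with Surprise, one might pass something like `lambda: SVDpp(cache_ratings=False).fit(trainset)`, assuming a `trainset` has already been built).

```python
import resource


def peak_rss_after_each_call(fn, repeats):
    """Call fn() `repeats` times and return the peak RSS seen after each call.

    Hypothetical helper for illustration. ru_maxrss is reported in
    kilobytes on Linux and in bytes on macOS, so compare readings
    relative to each other rather than as absolute sizes.
    """
    readings = []
    for _ in range(repeats):
        fn()
        usage = resource.getrusage(resource.RUSAGE_SELF)
        readings.append(usage.ru_maxrss)
    return readings


if __name__ == "__main__":
    # Stand-in workload; replace with the repeated fit() call under test.
    retained = []
    readings = peak_rss_after_each_call(
        lambda: retained.append(bytearray(1_000_000)), repeats=5
    )
    print(readings)
```

Because `ru_maxrss` is a high-water mark, the readings never decrease. A leak would show up as readings that keep climbing across calls instead of levelling off after the first few.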

Versions

1.1.3

@NicolasHug
Owner

Thanks for the report @amohar2,

I think you're right, looks like there are lots of unnecessary allocations.

            for u in range(trainset.n_users):
                # Might as well allocate the max size once and for all
                # instead of allocating the exact size each time
                max_Iu_length = max(max_Iu_length, len(trainset.ur[u]))
                Iu = <int *>malloc(max_Iu_length * sizeof(int))

should probably be instead

            for u in range(trainset.n_users):
                # Might as well allocate the max size once and for all
                # instead of allocating the exact size each time
                max_Iu_length = max(max_Iu_length, len(trainset.ur[u]))
            Iu = <int *>malloc(max_Iu_length * sizeof(int))
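
To make the difference between the two snippets concrete, here is a small pure-Python model (not the Cython code itself) that counts how many blocks are still allocated after the single free() at the end of fit(). The function name `outstanding_blocks` and the integer "block ids" are hypothetical stand-ins for malloc'd pointers. The model assumes, as described in the report, that only the last pointer stored in `Iu` is freed.

```python
def outstanding_blocks(ur_lengths, allocate_inside_loop):
    """Model the Iu allocation pattern and return the number of leaked blocks.

    ur_lengths: number of ratings per user (stand-in for len(trainset.ur[u])).
    allocate_inside_loop: True models the current code, False the proposed fix.
    """
    live = set()   # ids of blocks that were allocated and not yet freed
    next_id = 0
    Iu = None      # models the int* pointer
    max_Iu_length = 0

    for n_ratings in ur_lengths:
        max_Iu_length = max(max_Iu_length, n_ratings)
        if allocate_inside_loop:
            # malloc on every iteration: the previous pointer is overwritten
            Iu = next_id
            live.add(Iu)
            next_id += 1

    if not allocate_inside_loop:
        # single malloc after the maximum length is known
        Iu = next_id
        live.add(Iu)
        next_id += 1

    # single free(Iu) at the end of fit()
    live.discard(Iu)
    return len(live)


if __name__ == "__main__":
    lengths = [3, 5, 2, 7]
    print(outstanding_blocks(lengths, allocate_inside_loop=True))
    print(outstanding_blocks(lengths, allocate_inside_loop=False))
```

In this model, allocating inside the loop leaves one unreleased block for every user except the last, while the single allocation after the loop leaves none.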
