Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix #34746: Add a size limit for variable size when pretty printing exception report #17809

Open
wants to merge 17 commits into
base: main
Choose a base branch
from

Conversation

keerthivasansa
Copy link

@keerthivasansa keerthivasansa commented Feb 2, 2024

  • Introduces a maximum variable size threshold (512KB) for pretty printing in exception reports, use the default pprint for variables smaller than the threshold
  • Uses repr for variables exceeding the size limit
  • Significantly reduces the time and memory used to create an exception report when a frame with a large variable raises an exception.

[UPDATE]

  • Seems like the sys.getsizeof call is not accurate. If we need accurate information, we can use some of the third party libraries to deeply evaluate the value of an object but that becomes really slow, I think it would be best to switch entirely to reprlib with indentation instead of pprint.
  • Please let me know your thoughts!

@keerthivasansa keerthivasansa changed the title fix: add a threshold limit for variable size in pretty printing excep… Fix #34746: Add a size limit for variable size when pretty printing exception report Feb 2, 2024
Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello! Thank you for your contribution 💪

As it's your first contribution be sure to check out the patch review checklist.

If you're fixing a ticket from Trac make sure to set the "Has patch" flag and include a link to this PR in the ticket!

If you have any design or process questions then you can ask in the Django forum.

Welcome aboard ⛵️!

@keerthivasansa
Copy link
Author

Please let me know if we can continue with this approach I will add the limits for other data structures as well if so!

@nessita
Copy link
Contributor

nessita commented Feb 2, 2024

Please let me know if we can continue with this approach I will add the limits for other data structures as well if so!

Hello! Thank you for your contribution. I checked the proposed code and it seems to be in the right direction.

In order for this PR to be listed as "needing review" in the Django Development Dashboard, please remember to set the proper Trac flags in the ticket as described in the PR checklist when this is ready for review. You may also want to assign the ticket to yourself!

factory = RequestFactory()

def test_large_sizable_object(self):
lg = list(range(50 * 1024 * 1024))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need 50 million elements?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was just trying to check the efficiency of library's able to handle extreme sizes of objects and rendering them in a reasonable time. 😅What could be a more optimal value?

@keerthivasansa
Copy link
Author

Hey @nessita, thanks for the go ahead!
One of the problems I found is the old tests seem to be relying on some of behaviours with pprint that are not the same with reprlib. For eg, rprlib doesn't raise an exception when there is one and instead just returns a generic string with the class name of the object and ID so a few of the test cases are failing right now.

I am assuming it's best to just retain old behavior so I'm trying to work around that.

Another thing I wanted to change was the trim message.
If a string of length 6000 is trimmed to fit 4096 limit, the message still says: <trimmed 6000 byte string>

isn't it better to say:
..1004 more items or something like that basically the count indicates the remaining elements instead of the total count. Or should I retain the current behaviour?


if isinstance(v, Sized) and len(v) > 4096:
diff = len(v) - 4096
trim_msg = "...<trimmed %d bytes string>" % diff
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have changed it to show the length that was cut instead of the total length, I can change it back if the old behaviour (total count) is preferred.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you consider having both? Something like:

f"...<trimmed to {cutoff} bytes, {reminder} bytes not shown>"

@keerthivasansa
Copy link
Author

@felixxm I think the PR is ready for review!

@nessita
Copy link
Contributor

nessita commented Feb 9, 2024

@felixxm I think the PR is ready for review!

Hey @keerthivasansa, thank you again for your contribution!

There is no need to mention people directly when a PR is ready for a re-review, reviewers will get to this when they have some availability. As I posted in my previous comment and in the ticket, the proper procedure to get a re-review is described in these docs.
You need to unset the relevant ticket flags so this PR gets added back to the "patches needing review" section of the Django Developer Dahsboard .

Let me know if you have any question!

@keerthivasansa
Copy link
Author

Gotcha, thanks!
Sorry this is my first time using Django Trac system so I don't know how to operate it. Hopefully it will be smoother in future contributions.

I have unset the "needs test" and "needs improvement" flags assuming that's what I have to do.

@nessita
Copy link
Contributor

nessita commented Feb 22, 2024

Hey @nessita, thanks for the go ahead! One of the problems I found is the old tests seem to be relying on some of behaviours with pprint that are not the same with reprlib. For eg, rprlib doesn't raise an exception when there is one and instead just returns a generic string with the class name of the object and ID so a few of the test cases are failing right now.

I am assuming it's best to just retain old behavior so I'm trying to work around that.

Another thing I wanted to change was the trim message. If a string of length 6000 is trimmed to fit 4096 limit, the message still says: <trimmed 6000 byte string>

isn't it better to say: ..1004 more items or something like that basically the count indicates the remaining elements instead of the total count. Or should I retain the current behaviour?

Hello @keerthivasansa!

I'm starting to review this today so I will provide answers/advice as I go thru it.

Copy link
Contributor

@nessita nessita left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @keerthivasansa for this work! 🥳

Below an initial set of comments:

  1. I'm inclined to have the custom "repr" functionality inside the existing text modules in utils.
  2. In any case, this PR should provide sufficient tests for the newly added functionality (DjangoRepr new class).
  3. I would rename DjangoRepr to something more explicit and specific such as DebugRepr, SizeLimitedRepr, or maybe SafeRepr.
  4. What's your view on using this as well for the debug template tag?

django/utils/repr.py Outdated Show resolved Hide resolved
django/utils/repr.py Outdated Show resolved Hide resolved
tests/view_tests/tests/test_debug.py Outdated Show resolved Hide resolved
tests/view_tests/tests/test_debug.py Outdated Show resolved Hide resolved
tests/view_tests/tests/test_debug.py Show resolved Hide resolved
d = reporter.get_traceback_data()
vars = d["lastframe"]["vars"]

for k, v in vars:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that we don't (can't!) test Python, we assume Python and its libraries to be correct. So we can certainly use reprlib in the tests to assert our desired results.

What we should test is that our choosing of max limits and related are correct. In fact, we should most definitely provide tests for our custom Repr class. Once we ensure that our Repr class is correct, we could also use it to write these tests.

Does that make sense? (I can expand if not, no worries, let me know)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding tests to check the custom class on common data structures maybe?

trimmed_dict = self.repr_instance.print(long_dict)
self.assertIn("trimmed 100 bytes string", trimmed_dict)


Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nessita Are these the kind of tests you expected for testing this class? I can create more tests for data structures if so

@keerthivasansa
Copy link
Author

I just thought of something else. If we use reprlib for collections, it will only limit the number of elements. but let's say we have a list of strings where each element is 4096 characters long. It will still print 4096*4096 characters then. We don't want this behaviour, right? @nessita

@nessita
Copy link
Contributor

nessita commented May 31, 2024

@keerthivasansa Thank you for your patience. We've been busy with the Django 5.1 Alpha release but I'll go back to this PR in a few weeks.

In the meantime, could you please update your work with the latest changes from main? A passing test suite would also help!

/remind me about this PR in 3 weeks

Copy link

@nessita set a reminder for 6/21/2024

@nessita nessita self-requested a review May 31, 2024 20:33
Copy link

👋 @nessita, about this PR

@github-actions github-actions bot removed the reminder label Jun 21, 2024
@keerthivasansa
Copy link
Author

keerthivasansa commented Jun 22, 2024

@nessita Cool, i'll merge with the latest changes and fix the test cases now!

@keerthivasansa
Copy link
Author

@nessita The tests were failing because the implementation of reprlib differed between Python 3.11 and 3.10, there was no way to customize the fillvalue in Python 3.10 reprlib. So I have copied the latest reprlib implementation, customized it to work for our case. Does that sound good? But unfortunately it adds a lot of complexity and burden to the code I guess :(, let me know if you have better ideas!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants