@PapaPedro
Contributor

Creating a schema in lxml can be a performance issue if we always recreate
the schema instead of reusing it.

Unfortunately this code is Cython, so it does not appear in the flame
graph.

Tested with a local script.

Issue #, if available:

Description of changes:
This change searches the frame's source code line for etree.XMLSchema
and, if it is found, adds a synthetic frame so that it appears in the graph.
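
The approach described above can be sketched roughly as follows. This is a minimal illustration, not the agent's actual code: the `synthetic_frames` helper, the sample module, and the use of plain strings in place of `Frame` objects are all assumptions made for the example. Only `linecache.getline` and the substring checks come from the change itself.

```python
import linecache
import os
import tempfile

# Hypothetical stand-ins for the agent's frame constants (plain strings here,
# whereas the real code uses Frame objects).
TIME_SLEEP_FRAME = "<Sleep>"
LXML_SCHEMA_FRAME = "lxml.etree:XMLSchema:__init__"


def synthetic_frames(filename, line_no):
    """Return synthetic frame names for the source line at (filename, line_no)."""
    # linecache.getline returns "" when the file or line cannot be read,
    # so unreadable sources simply produce no synthetic frame.
    line = linecache.getline(filename, line_no).strip()
    result = []
    if "sleep(" in line:
        result.append(TIME_SLEEP_FRAME)
    elif "etree.XMLSchema" in line:
        result.append(LXML_SCHEMA_FRAME)
    return result


# Write a small sample module to disk so linecache has a real file to read.
# The module is never imported or executed; only its text is inspected.
source = (
    "from lxml import etree\n"
    "schema = etree.XMLSchema(etree.parse('schema.xsd'))\n"
    "print('done')\n"
)
with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as handle:
    handle.write(source)
    sample_path = handle.name

schema_line_frames = synthetic_frames(sample_path, 2)
plain_line_frames = synthetic_frames(sample_path, 3)
os.unlink(sample_path)

print(schema_line_frames)
print(plain_line_frames)
```

Because the check only reads source text, it works even though the Cython call itself never produces a Python frame.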

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

line = linecache.getline(frame.f_code.co_filename, line_no).strip()
if "sleep(" in line:
    result.append(TIME_SLEEP_FRAME)
elif "etree.XMLSchema" in line:
Contributor

  1. Should it be `etree.XMLSchema(`, since we include the opening parenthesis in `sleep(`?
  2. We would miss this if the user had:

from etree import XMLSchema

a = XMLSchema()

Should we cover that case too and simply search for XMLSchema? How likely is it to give us false positives by matching a line like ComplexXMLSchema (if that exists)?

Contributor Author

Good point about the parenthesis, will add it.

Yes, you are right that we would miss from etree import XMLSchema, but searching for XMLSchema alone could give many false positives. From what I have seen in existing code, I believe most people write etree.XMLSchema, as shown in the online documentation. I prefer to miss a few cases rather than have false positives, which could be very confusing.
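
The trade-off discussed in this thread can be illustrated with a small comparison of the candidate substrings. This is only a sketch: the sample lines are made up for the example, and `ComplexXMLSchema` is the hypothetical class name from the review comment. `etree.XMLSchemaParseError` is included to show what the trailing parenthesis rules out.

```python
# Candidate substrings discussed in the review.
PATTERNS = ["etree.XMLSchema", "etree.XMLSchema(", "XMLSchema"]

# Illustrative source lines; none of them are taken from real user code.
SAMPLE_LINES = {
    "qualified call": "schema = etree.XMLSchema(doc)",
    "direct import call": "schema = XMLSchema(doc)",
    "attribute, no call": "cls = etree.XMLSchema",
    "unrelated class": "schema = ComplexXMLSchema(doc)",
    "exception name": "except etree.XMLSchemaParseError:",
}


def matches(pattern):
    """Return the labels of all sample lines that contain the pattern."""
    return {label for label, line in SAMPLE_LINES.items() if pattern in line}


for pattern in PATTERNS:
    print(f"{pattern!r}: {sorted(matches(pattern))}")
```

On these samples, `etree.XMLSchema(` matches only the qualified call, which misses the direct-import form but avoids every unrelated line. The bare `XMLSchema` substring matches all five lines.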

Creating schema in lxml can be a performance issue if we always recreate
a schema instead of reusing it.

Unfortunately this code is Cython and does not appear in the flame
graph. This change searches for `etree.XMLSchema(` in the source code
line to add a synthetic frame for it to appear in the graph.

Tested with a local script.
@PapaPedro force-pushed the lxml_synthetic_frame branch from 87092e1 to 79bbafe on March 1, 2021 18:14
TRUNCATED_FRAME = Frame(name="<Truncated>")

TIME_SLEEP_FRAME = Frame(name="<Sleep>")
LXML_SCHEMA_FRAME = Frame(name="lxml.etree:XMLSchema:__init__")
Contributor

Question: Should we follow the <...> format for synthetic frames here, instead of lxml.etree:XMLSchema:__init__?

Contributor Author

@PapaPedro PapaPedro Mar 1, 2021

Unlike <Sleep>, I want this frame to remain in the flame graph for customers to see, so I have written the frame as it would normally appear in a stack trace (plus the ":XMLSchema:" class name that we usually add). I considered creating a Frame object with all the different attributes, but then I would have to build a fake file name that gives the appropriate result once serialized. I found it simpler and clearer to put the frame name directly as I want the agent to report it.

Contributor

Ok. We could always review this later :D.

@gimki gimki merged commit 6ed00bd into main Mar 2, 2021
@PapaPedro PapaPedro deleted the lxml_synthetic_frame branch March 5, 2021 11:01