Remove nasty characters from xml before report is parsed by shenek · Pull Request #3 · beda42/pycounter

shenek · 2020-08-18T13:15:04Z

No description provided.

beda42

Looks good. I have just one comment about safety; could you please check it? A test might be called for.

beda42 · 2020-08-27T13:14:01Z

+    # try to remove nasty characters from xml
+    raw_converted = "".join(
+        map(
+            lambda ch: ch if ch.isprintable() else " ",


.isprintable() seems like the right method here, but I am afraid it might mangle something legitimate. Have you tried running it on a few existing XML reports to see what gets replaced? I know it is extra work, but I think it could prove useful in the long run.
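One way to answer that question is a small audit script that counts which characters the proposed filter would touch in a given report. This is a minimal sketch, not code from the PR: the helper name nonprintable_report and the sample string are hypothetical, and it inspects only the decoded text, not the raw bytes.

```python
from collections import Counter
import unicodedata


def nonprintable_report(text: str) -> Counter:
    """Hypothetical helper: count the characters that the proposed
    `ch if ch.isprintable() else " "` filter would replace, keyed by
    (code point, Unicode name)."""
    counts: Counter = Counter()
    for ch in text:
        if not ch.isprintable():
            # Control characters have no Unicode name, hence the default.
            name = unicodedata.name(ch, "<unnamed>")
            counts[(f"U+{ord(ch):04X}", name)] += 1
    return counts


# Made-up sample; for real reports, read each file with
# open(path, encoding="utf-8").read() and pass the result in.
sample = "<Report>\n\t<Title>Caf\u00e9\u00a0report\x01</Title>\n</Report>\n"
for (codepoint, name), n in sorted(nonprintable_report(sample).items()):
    print(codepoint, name, n)
```

In this sample the newlines, the tab and the no-break space are all counted alongside the one genuine junk character (\x01), which is exactly the kind of collateral damage worth checking for on real reports.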

beda42 · 2020-08-27T13:15:46Z

    # Missing some mandatory field to extract data ->
    # exit right away
-    if not c_report or not hasattr(c_report, "Customer") or not hasattr(c_report.Customer, "ReportItems"):
+    if (


This looks like 'blacking' (reformatting with black). Are you sure we want to reformat external library code, or is there a change I missed?

shenek

Actually, this change fixes that: I had managed to push a commit without proper 'blacking'.
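For reference, this is roughly the shape black gives the over-long condition once it is wrapped: a parenthesised block with one boolean operand per line. The helper name has_report_items is hypothetical (in the library the check sits inside a larger function and exits early), and the SimpleNamespace objects merely stand in for parsed reports.

```python
from types import SimpleNamespace


def has_report_items(c_report) -> bool:
    # Hypothetical helper wrapping the condition from the diff above.
    if (
        not c_report
        or not hasattr(c_report, "Customer")
        or not hasattr(c_report.Customer, "ReportItems")
    ):
        # Missing some mandatory field to extract data -> exit right away
        return False
    return True


print(has_report_items(SimpleNamespace(Customer=SimpleNamespace())))
print(has_report_items(SimpleNamespace(Customer=SimpleNamespace(ReportItems=[]))))
```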

beda42 · 2020-08-27T14:12:44Z

Well, I ran some tests, and .isprintable() may not be what we are looking for: for example, it returns False for a TAB or an end-of-line character. Even though this might not be a problem for our type of data, it is still something I would rather not do.
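To make the point concrete, here are a few characters that are legal in XML text but that str.isprintable() rejects. The Python documentation defines non-printable characters as those in the Unicode "Other" or "Separator" categories, except the ASCII space, so the effect goes beyond TAB and EOL; the sample set below is illustrative only.

```python
# Characters that are legal in XML text but that str.isprintable() rejects,
# so the proposed filter would turn each of them into a space.
samples = {
    "TAB": "\t",
    "LINE FEED": "\n",
    "CARRIAGE RETURN": "\r",
    "NO-BREAK SPACE": "\u00a0",
    "SOFT HYPHEN": "\u00ad",
}
for label, ch in samples.items():
    print(f"{label:16} U+{ord(ch):04X} isprintable={ch.isprintable()}")

# Ordinary non-ASCII letters are unaffected.
print("\u00e9".isprintable())  # True
```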

beda42 · 2020-08-27T14:15:48Z

There seems to be a good solution here: https://stackoverflow.com/questions/1707890/fast-way-to-filter-illegal-xml-unicode-chars-in-python It is based on the characters that the XML standard actually disallows.
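A minimal sketch of that idea, written from the XML 1.0 Char production rather than copied from the Stack Overflow answer: one precompiled regular expression that matches every code point outside the allowed set. The names _ILLEGAL_XML_CHARS and strip_illegal_xml_chars are hypothetical, and the sample report is made up. Unlike .isprintable(), it keeps TAB, LF and CR, and it only removes characters that are strictly disallowed, not the ones the standard merely discourages.

```python
import re
import xml.etree.ElementTree as ET

# XML 1.0 (Fifth Edition), production [2]:
#   Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
# The negated class matches every code point outside that set, including
# lone surrogates (U+D800..U+DFFF), U+FFFE and U+FFFF.
_ILLEGAL_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_illegal_xml_chars(text: str, replacement: str = "") -> str:
    """Replace every character that XML 1.0 forbids with `replacement`."""
    return _ILLEGAL_XML_CHARS.sub(replacement, text)


raw = "<Report><Title>Tab\there\x01 and \x0bvertical tab</Title></Report>"
clean = strip_illegal_xml_chars(raw)
print(repr(ET.fromstring(clean).find("Title").text))
```

The replacement parameter exists because the diff in this PR substitutes a space; passing replacement=" " keeps that behaviour. One limitation: the filter works on literal characters only, so a character reference such as &#1; in the source would still be rejected by the parser.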

Remove nasty characters from xml before report is parsed

5cfd624

shenek requested a review from beda42 August 27, 2020 08:50

shenek self-assigned this Aug 27, 2020

shenek added the bug label (Something isn't working) Aug 27, 2020

beda42 reviewed Aug 27, 2020


shenek force-pushed the 2.x branch from 6cc760f to fa5d03a November 16, 2020 10:07

shenek wants to merge 1 commit into 2.x from nasty-characters-in-xml

2 participants
