Fix regex in EventRecord class to prevent the removal of relevant data #124
See #122 as reference.
Fix a wrong regex replacement in the EventRecord class that removed relevant data from event records. With this fix, the replacement only removes the xmlns attribute inside the opening tag (the one directly followed by >) and leaves any other xmlns occurrences as they are, e.g. those found in TaskContent. I guess that when the regex replacement was added to the code, we missed that TaskContent could contain that pattern too (for scheduled tasks, for example), which then removes important data (like the task command).
Fix
Replace the regex
xmlns.+\"
with the more restricted regex
xmlns.+\">
and replace the matched string with ">" to close the tag correctly again. This prevents the removal of important event data when a " character is found inside the event data itself, as in TaskContent for scheduled task events.
Tests
I tested the new code against a large set of evtx files from a KAPE collection (195 files, all the standard evtx files from a production server) and everything looks good. The only difference between the new and the old version is that the scheduled task events (updated, created, ...) now have the full TaskContent in the payload field of the CSV, which was previously missing because of the wrong regex. Timeline Explorer shows the CSV correctly.
Issue described in detail
Some event log XML records (e.g. from Task Scheduler) contain two xmlns values. With the currently used regex, the intended data (the first xmlns) is removed, but important data inside the TaskContent field is removed too, which was never the intention. We removed the first xmlns correctly, but the regex also removed the second xmlns and much more. We must prevent that.
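To make the difference between the two patterns concrete, here is a minimal sketch in Python (chosen only for illustration; it is not the project's code). The sample record, the function names and the task command are hypothetical, and the sketch assumes the record XML is split across lines so that TaskContent sits on its own line. The patterns are the ones from this PR with the string-literal escaping of the quote removed.

```python
import re

# Hypothetical, trimmed-down event record; real records carry many more fields.
RECORD = "\n".join([
    r'<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">',
    r'<EventData>',
    r'<Data Name="TaskContent">&lt;Task xmlns="http://schemas.microsoft.com/windows/2004/02/mit/task"&gt;'
    r'&lt;Command&gt;powershell.exe&lt;/Command&gt;'
    r'&lt;Arguments&gt;-File "C:\scripts\job.ps1"&lt;/Arguments&gt;&lt;/Task&gt;</Data>',
    r'</EventData>',
    r'</Event>',
])

# Old pattern: greedy, runs from "xmlns" to the LAST double quote on the same line.
OLD_PATTERN = re.compile(r'xmlns.+"')
# New pattern: the match must end at a double quote directly followed by ">".
NEW_PATTERN = re.compile(r'xmlns.+">')


def strip_xmlns_old(xml: str) -> str:
    """Old behaviour: delete every match of the loose pattern."""
    return OLD_PATTERN.sub("", xml)


def strip_xmlns_new(xml: str) -> str:
    """New behaviour: replace the match with '>' so the tag stays closed."""
    return NEW_PATTERN.sub(">", xml)


if __name__ == "__main__":
    print(strip_xmlns_old(RECORD))
    print("---")
    print(strip_xmlns_new(RECORD))
```

In this sample the old pattern also matches on the TaskContent line, from the inner xmlns up to the last quote in the arguments, so the command is lost. The new pattern finds no quote followed by > on that line and leaves it untouched. Both patterns are greedy and rely on . not matching a newline, so the sketch says nothing about records that are serialized on a single line.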