Munin incorrectly writes semicolon-delimited files instead of CSVs #55

Closed
graememeyer opened this issue Feb 25, 2022 · 3 comments

@graememeyer

It appears the latest version of munin is writing output files delimited by semicolon characters (;) rather than commas (,), even when the -o option is applied.

Example:

PS > python .\munin\munin.py -f .\munin\munin-demo.txt -o test.csv
   _________   _    _   ______  _____  ______
  | | | | | \ | |  | | | |  \ \  | |  | |  \ \     /.)
  | | | | | | | |  | | | |  | |  | |  | |  | |    /)\|
  |_| |_| |_| \_|__|_| |_|  |_| _|_|_ |_|  |_|   // /
                                                /'" "
  Online Hash Checker for Virustotal and Other Services
  Florian Roth - 0.21.0 June 2021

[+] 51611 cache entries read from cache database: vt-hash-db.json
[+] You can interrupt the process by pressing CTRL+C without losing the already gathered information
[+] Writing results to new file: test.csv
[+] Processing 22 lines ...

 1 / 22 > Clean
HASH: 1093B3F7D016C0E03CD0DB36D74BA09673A7BB03 COMMENT: bravo.wav
TYPE: WAV SIZE: 7.4 KB FILENAMES: bravo.wav, kogesrtg9.dll, 1s2rwn5t7.dll, file-5582314_wav
FIRST: 2013-06-12 17:27:52 LAST: 2016-07-28 16:50:34 SUBMISSIONS: 2 REPUTATION: 0
COMMENTS: 0 USERS: - TAGS: WAV KNOWN-DISTRIBUTOR
RESULT: 0 / 54


...


 22 / 22 > Clean
HASH: 61b6f3b3407dad1e10ee80684e945e28d21adbeec002548bcaba9a3bc6ffd244 COMMENT: EXE_Susp_Cmds /subfile
TYPE: Win32 EXE SIZE: 9.1 MB FILENAMES: MaypleHD Player, MaypleMp4Installer.exe, MaypleMp4Installer-5.2.0.2.exe
SIGNER: (); Thawte Code Signing CA - G2; thawte COPYRIGHT: Yozii Inc. All rights reserved. DESCRIPTION: MaypleHD Player Install Program
FIRST: 2016-10-19 08:03:48 LAST: 2018-02-27 08:02:11 SUBMISSIONS: 6 REPUTATION: -48
COMMENTS: 1 USERS: dviz TAGS: PEEXE OVERLAY REVOKED-CERT SIGNED NSIS INVALID-SIGNATURE
RESULT: 0 / 67

[+] Results written to file test.csv

[+] Saving 51633 cache entries to file vt-hash-db.json

Output:

PS > Get-Content .\test.csv -First 3
Lookup Hash;Rating;Comment;Positives;Virus;File Names;First Submitted;Last Submitted;File Type;MD5;SHA1;SHA256;Imphash;Matching Rule;Harmless;Revoked;Expired;Trusted;Signed;Signer;Hybrid Analysis Sample;MalShare Sample;VirusBay Sample;MISP;MISP Events;URLhaus;AnyRun;CAPE;VALHALLA;User Comments;Microsoft;Kaspersky;McAfee;CrowdStrike;TrendMicro;ESET-NOD32;Symantec;F-Secure;Sophos;GData;
1093B3F7D016C0E03CD0DB36D74BA09673A7BB03;clean;bravo.wav;0;-;bravo.wav, kogesrtg9.dll, 1s2rwn5t7.dll, file-5582314_wav;2013-06-12 17:27:52;2016-07-28 16:50:34;WAV;deb660600362263bf2cbd8975d23f3c5;1093b3f7d016c0e03cd0db36d74ba09673a7bb03;8dc215954c3f54574aacaa26981e26dfcf4c03de65bbd4bc9e37eb3265289087;-;False;False;False;False;False;False;-;False;False;False;False;;False;False;False;[];['-'];-;-;-;-;-;-;-;-;-;-;
13AEF2CCC4E45B7B8F440F0FDB7D3FBC;clean;ttf;0;-;LinBiolinum_Rah.ttf;2013-10-20 06:23:10;2018-12-24 08:40:26;TrueType Font;13aef2ccc4e45b7b8f440f0fdb7d3fbc;73119c2f63274fd0825c53ec639511ae2f1601ce;f7140084369db686c71e522f0e8de148f0f3f429310376d5f52325a9f0955ba5;-;False;False;False;False;False;False;-;False;False;False;False;;False;False;False;[];['-'];-;-;-;-;-;-;-;-;-;-;

I feel like this issue is too obvious to have gone unnoticed, so perhaps it's intentional? If so, the documentation should be updated to reflect this, and ideally an actual CSV option added. I am happy to contribute this if you can confirm my findings and the intentionality of the issue.

@graememeyer changed the title from "Munin appears to incorrectly write semicolon-delimited files instead of CSVs" to "Munin incorrectly writes semicolon-delimited files instead of CSVs" on Feb 25, 2022
@Neo23x0
Owner

Neo23x0 commented Feb 26, 2022

It's a semicolon - as intended. The README doesn't say that the CSV uses comma as a separator.

[Screenshot: 2022-02-26 163052]

You could add an option that allows a user to define a separator of his choice.

@graememeyer
Author

graememeyer commented Mar 9, 2022

@Neo23x0 I've been playing around with the code a bit. Is it intended functionality to end every line with the delimiter? (As far as I know, that isn't part of any common CSV definition either.)

Currently the header line for example comes out:

Lookup Hash;Rating;Comment;Positives;Virus;File Names;First Submitted;Last Submitted;File Type;MD5;SHA1;SHA256;Imphash;Matching Rule;Harmless;Revoked;Expired;Trusted;Signed;Signer;Hybrid Analysis Sample;MalShare Sample;VirusBay Sample;MISP;MISP Events;URLhaus;AnyRun;CAPE;VALHALLA;User Comments;Microsoft;Kaspersky;McAfee;CrowdStrike;TrendMicro;ESET-NOD32;Symantec;F-Secure;Sophos;GData;

Edit: I ask because I would "fix" this in a PR that adds CSV functionality (where the C=comma), but if it's intended functionality, that would potentially break some existing parsers.

I'm also considering adding an option for quoted CSVs (Excel compatible with character escaping) - is this something you're interested in?
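For illustration, a quoted variant could lean on Python's standard `csv` module rather than hand-built strings. The sketch below is not Munin's code: `write_quoted_rows` is a hypothetical helper, and the three-column header is a shortened stand-in for the real one. It only shows that `csv.QUOTE_MINIMAL` wraps a field in quotes when the field contains the delimiter, as the signer value from the run above does.

```python
import csv
import io


def write_quoted_rows(rows, delimiter=";"):
    """Hypothetical helper: serialize rows, quoting only fields that need it."""
    buffer = io.StringIO()
    writer = csv.writer(
        buffer,
        delimiter=delimiter,
        quoting=csv.QUOTE_MINIMAL,  # quote fields containing the delimiter or quote char
        lineterminator="\n",
    )
    writer.writerows(rows)
    return buffer.getvalue()


# Shortened stand-in for the real header; the signer value contains semicolons.
header = ["Lookup Hash", "Rating", "Signer"]
row = [
    "61b6f3b3407dad1e10ee80684e945e28d21adbeec002548bcaba9a3bc6ffd244",
    "clean",
    "(); Thawte Code Signing CA - G2; thawte",
]

output = write_quoted_rows([header, row])
print(output)

# Reading it back with the same delimiter recovers exactly three columns.
parsed = list(csv.reader(io.StringIO(output), delimiter=";"))
print(len(parsed[1]))
```

Parsers that split naively on `;` would still miscount columns here, so quoting only helps consumers that understand CSV quoting rules.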

Edit 2:
What I'm thinking is:

  • Add a delimiter character argument option and set it to ";" character by default
  • Remove the trailing delimiter character from the output
  • Submit that as a PR
  • Maybe work on quoted CSV option for another time/later
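The first two bullets could be sketched roughly as follows. The `--delimiter` flag name, `build_parser`, and `format_row` are assumptions made for this example; they are not existing Munin options or functions. The default stays at `;` so current output keeps its separator, and joining the fields means no delimiter is emitted after the last column.

```python
import argparse


def build_parser():
    """Hypothetical parser showing only the proposed delimiter option."""
    parser = argparse.ArgumentParser(description="Delimiter option sketch")
    parser.add_argument(
        "--delimiter",  # assumed flag name, not an existing Munin option
        default=";",    # keep the current separator as the default
        help="character used to separate columns in the output file",
    )
    return parser


def format_row(fields, delimiter):
    """Join the fields so that no delimiter follows the last column."""
    return delimiter.join(str(field) for field in fields)


if __name__ == "__main__":
    args = build_parser().parse_args(["--delimiter", ","])
    print(format_row(["Lookup Hash", "Rating", "Comment"], args.delimiter))
```

Dropping the trailing delimiter changes the column count seen by existing parsers, so that part is the one more likely to need a compatibility note.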

@anotherbridge

@graememeyer @Neo23x0 Because the delimiter is a semicolon, the CSV is not valid: some of the column values contain semicolons themselves, so a row appears to have more columns than it should.

I fixed this in PR #66. Since PR #59 is also tackling this problem, I did not quote the column.

Further, I assume the trailing delimiters arise because each element is added with a dedicated write to the file. Since you don't know at that point which element is the last one, you have to add the ; after every element to avoid any issues.
I handled this by appending each column as a string to a list that holds the content of one line. Instead of writing each element by itself, the list is joined with the delimiter at the end and then written to the file.
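A minimal sketch of the difference between the two approaches, under the assumption stated above about how the trailing delimiter arises. Both function names are hypothetical and neither is taken from Munin or from the PR; an in-memory buffer stands in for the output file.

```python
import io


def write_per_field(fields, out, delimiter=";"):
    """One write per element: every field, including the last, gets a delimiter."""
    for field in fields:
        out.write(f"{field}{delimiter}")
    out.write("\n")


def write_joined(fields, out, delimiter=";"):
    """Collect the columns of one line in a list, then join and write once."""
    line = []
    for field in fields:
        line.append(str(field))
    out.write(delimiter.join(line) + "\n")


fields = ["Lookup Hash", "Rating", "Comment"]

per_field = io.StringIO()
write_per_field(fields, per_field)
print(repr(per_field.getvalue()))  # ends with a delimiter before the newline

joined = io.StringIO()
write_joined(fields, joined)
print(repr(joined.getvalue()))  # no delimiter after the last column
```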

@graememeyer closed this as not planned on Feb 17, 2024