Added version 0.1.0 of LSBF dump utility #3

Merged
merged 2 commits into from
Dec 15, 2023

mbeckerle
Contributor

Uses GraalVM and sbt-native-image plugin to create a small fast-loading executable.

Per the README.md:

Creates a data dump at bits level for data that has dfdl:bitOrder="leastSignificantBitFirst".

For example:

01000110 01001100 01000101 01111111 | 0x00000000
00000000 00000001 00000001 00000010 | 0x00000004
00000000 00000000 00000000 00000000 | 0x00000008
00000000 00000000 00000000 00000000 | 0x0000000C
00000000 00111110 00000000 00000011 | 0x00000010
00000000 00000000 00000000 00000001 | 0x00000014

The address (in hex) is on the right (its display is optional). The bytes start on the right and increase moving left and downward.
The least significant bit of each byte is on the right (as people usually write numbers).
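
The layout described above can be sketched in a few lines of Python. This is only an illustration of the display format, not the code in this PR: the function name `lsbf_dump`, the four-bytes-per-line width (inferred from the example), the right-alignment of a short final line, and the choice to print `offset + start` as the address are all assumptions.

```python
def lsbf_dump(data: bytes, offset: int = 0, bytes_per_line: int = 4,
              show_address: bool = True) -> list[str]:
    """Format bytes so that the first byte of each line is rightmost."""
    # Width of a full line of bits: 8 digits per byte plus single-space separators.
    width = bytes_per_line * 8 + (bytes_per_line - 1)
    lines = []
    for start in range(0, len(data), bytes_per_line):
        chunk = data[start:start + bytes_per_line]
        # Reverse the chunk so byte addresses increase from right to left.
        # Each byte is printed with its least significant bit on the right.
        bits = " ".join(f"{b:08b}" for b in reversed(chunk))
        # Assumption: a short final line is right-aligned so columns still line up.
        bits = bits.rjust(width)
        if show_address:
            lines.append(f"{bits} | 0x{offset + start:08X}")
        else:
            lines.append(bits)
    return lines


# The first eight bytes of the example above (an ELF header).
for line in lsbf_dump(bytes([0x7F, 0x45, 0x4C, 0x46, 0x02, 0x01, 0x01, 0x00])):
    print(line)
```

Run on those eight bytes, the sketch reproduces the first two lines of the example dump.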

The purpose of this is for use with data where the bit positions are numbered from right to left. I.e.,
the first bit (position 0, or position 1 if using 1-based indexing) in each byte is the rightmost bit.
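
To see why this numbering matters, here is a small sketch of reading a bit field from LSBF data. It is illustrative only and not part of the utility; `lsbf_field` is a hypothetical helper, it uses 0-based bit positions, and it assumes little-endian byte order so that bit 8 is the least significant bit of the second byte.

```python
def lsbf_field(data: bytes, start_bit: int, n_bits: int) -> int:
    """Extract n_bits starting at 0-based bit position start_bit (LSBF numbering)."""
    # Treat the whole buffer as one little-endian integer: bit 0 is the
    # rightmost bit of byte 0, bit 8 is the rightmost bit of byte 1, and so on.
    value = int.from_bytes(data, "little")
    # Shift the field down to bit 0 and mask off everything above it.
    return (value >> start_bit) & ((1 << n_bits) - 1)


# A 5-bit field starting at bit 5 spans the boundary between byte 0 and byte 1.
# In the dump these two bytes appear as "00000011 10100000", and the field is
# the contiguous run "11101" that straddles the space between them.
print(lsbf_field(bytes([0b10100000, 0b00000011]), start_bit=5, n_bits=5))
```

Because byte 1 is displayed to the left of byte 0, a field that crosses a byte boundary stays visually contiguous in the dump, which is the point of the right-to-left layout.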

Per the helptext:

Usage: lsbfdump --file <filename> [--offset <offset>] [--length <numBytes>] [--noAddress] [--help]
      
      <filename>   : The file to read bytes from or '-' for standard input.
      [offset]     : The starting offset in the file (default is 0).
      [length]     : The number of bytes to display (default is 128).
      --noAddress  : Do not display the address of each byte line.
      --help       : Display this help information.
      
      Examples:
       Default usage (128 bytes from standard input, starting at offset 0, with addresses):
         lsbfdump --file -
      
       With specific file, offset and byte count:
         lsbfdump --file filename --offset 10 --length 64
     
       With --noAddress to hide addresses:
         lsbfdump --file filename --offset 10 --length 64 --noAddress

@tuxji tuxji left a comment

+1

Looks good.

Member

@stevedlawrence stevedlawrence left a comment

+1. License should be fixed, everything else is a minor suggestion that can be ignored or done later.

I also wonder if there would be value in getting something like this merged into xxd or some other binary tool available on many Linux distros. It would be nice to just have this capability on all Linux machines. An added benefit is that xxd can convert a hex dump back to a file, so it could be useful for creating LSBF files in addition to outputting them. Or maybe it's not worth the effort, with the assumption that the VS Code extension will soon have this capability, plus the ability to decode strings/numbers as well.

Contributor Author

@mbeckerle mbeckerle left a comment

Please see added comment about the fact that I used ChatGPT4 when writing this.

@mbeckerle
Contributor Author

> I also wonder if there would be value in getting something like this merged into xxd or some other binary tool available on many Linux distros.

Worth considering, but I fear we'd end up having to teach the whole world about leastSignificantBitFirst and its issues to get it accepted.

In any case that's an issue for another day.

@mbeckerle
Contributor Author

Most recent commit addresses all the functional and LICENSE issues that @stevedlawrence found.

I will do a separate commit with proposed language about the ChatGPT4 content, notices, etc.

# Generative AI Statement:

This version contains code initially generated by Mike Beckerle (mbeckerle)
using: ChatGPT 4.0 on 2023-12-13 but subsequently further modified by hand.

ChatGPT 4.0 provided this statement with respect to originality of its contribution:

> There are no direct citations or sources for this code, as **it was not taken from
> copyrighted material or external sources.**
@mbeckerle
Contributor Author

Please review the commit:

22a35f0

which contains only the generative-AI notice in README.md (the notice is repeated in the commit message).

I did not put "Generated By ChatGPT 4.0" because that's really overstating the contribution from that system.

@tuxji
tuxji commented Dec 14, 2023

> Please review the commit:
>
> 22a35f0

Reviewed, and looks good enough.

@tuxji tuxji left a comment

+1

Member

@stevedlawrence stevedlawrence left a comment

+1. I'm not sure how much I trust ChatGPT's statement about what it sourced (I'm not convinced it even knows what it sourced or the associated licenses), but nothing here looks obviously copied or looks like it would have licensing restrictions.

@mbeckerle
Contributor Author

> +1. I'm not sure how much I trust ChatGPT's statement about what it sourced (I'm not convinced it even knows what it sourced or the associated licenses), but nothing here looks obviously copied or looks like it would have licensing restrictions.

I agree that I am more reassured by scanning the code it created, which at this point is pretty heavily modified.

I also agree, in that these LLMs really are just trying to make what they say "sound good" w.r.t. all the text they've read. The heuristic is if it sounds good, it's likely to be correct (and make users happy).

OTOH, comparable tools, like Google Bard, have grown attribution features, and I expect ChatGPT will also at some point.

Ex: Check out the citations from this Google Bard chat: https://g.co/bard/share/a20037e8491a. The latest Google Bard (which came out last week) is also looking pretty useful.

@mbeckerle mbeckerle merged commit e4a417e into apache:main Dec 15, 2023
@mbeckerle mbeckerle deleted the lsbfdump branch December 15, 2023 13:15