merge queue: embarking main (bd84315) and #135 together#136
Closed
mergify[bot] wants to merge 23 commits intomainfrom
Closed
merge queue: embarking main (bd84315) and #135 together#136mergify[bot] wants to merge 23 commits intomainfrom
mergify[bot] wants to merge 23 commits intomainfrom
Conversation
Centralize CI commands in justfile as single source of truth: - Replace raw cargo commands with just recipes in test, test-cross-platform, coverage, security, and docs workflows - Fix cargo-dist issue in security workflow by using `dist plan` via mise exec instead of `cargo dist check` (subcommand not found) - Add --all-features to test-ci, build-release, and coverage recipes - Add outdated recipe for cargo-outdated checks - Fix dist-check recipe to use `dist plan` (check subcommand does not exist in cargo-dist 0.30.3) Quality and MSRV jobs unchanged (use dtolnay Rust, not mise). Signed-off-by: Kevin Melton <kevin@evilbitlabs.io> Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Move the ~400-line `impl StringExtractor for BasicExtractor` block from mod.rs into its own file, reducing mod.rs from 1038 to 630 lines. The new basic_extractor.rs (421 lines) contains the core extraction engine that orchestrates ASCII, UTF-8, and UTF-16 string extraction across binary sections. Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Convert the 702-line filters.rs into a module directory with three focused files: - mod.rs (193 lines): FilterContext, NoiseFilter trait, CompositeNoiseFilter - implementations.rs (394 lines): all 6 filter structs and NoiseFilter impls - tests.rs (131 lines): all test code All public re-exports remain unchanged so the API is not affected. Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Convert src/extraction/ascii.rs (832 lines) into a module directory with three focused files: - mod.rs (147 lines): AsciiExtractionConfig, is_printable_ascii(), re-exports - extraction.rs (316 lines): extract_ascii_strings(), extract_from_section() - tests.rs (378 lines): all unit tests All files are under the 500-line limit. All 473 tests pass. Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Convert the 1,273-line src/extraction/utf16.rs into a module directory with six focused files, each under 500 lines: - mod.rs (122 lines): ByteOrder enum, extract_utf16_strings orchestrator, re-exports - config.rs (97 lines): Utf16ExtractionConfig struct and constructors - validation.rs (281 lines): UTF-16 sequence validation, printability checks, null patterns - confidence.rs (201 lines): Confidence scoring heuristics (ASCII ratio, printable ratio, etc.) - extraction.rs (380 lines): Core extraction functions and section-aware extraction - tests.rs (238 lines): All unit tests All public exports remain identical. No functional changes. Signed-off-by: Kyle Melton <kyle@evilbitlabs.io> Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Convert the 838-line dedup.rs into a module directory with three files: - mod.rs (237 lines): CanonicalString, StringOccurrence structs, deduplicate(), merge_tags(), found_string_to_occurrence(), CanonicalString::to_found_string() - scoring.rs (56 lines): calculate_combined_score() - tests.rs (555 lines): All test code Public API and re-exports remain unchanged. Signed-off-by: Kevin Melton <kmelton@evilbitlabs.io> Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Convert the largest file (1,449 lines) into a module directory with six focused files, each under 500 lines: - mod.rs (230): constants, entry points, orchestrator, decode_utf16le - detection.rs (208): detect_version_info, detect_string_tables, detect_manifests - version_info.rs (76): extract_version_info_strings - string_tables.rs (200): parse_string_table_block, extract_string_table_strings - manifests.rs (201): detect_manifest_encoding, decode_manifest, extract_manifest_strings - tests.rs (594): all test code All 473 tests pass. Public API unchanged. Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Document the CI workflow patterns (dtolnay vs mise, just recipes as single source of truth) and Mergify behavior in AGENTS.md. Add the extraction refactor and CLI implementation plan for project history. Signed-off-by: Keith Melton <kmelton@evilbitlabs.io> Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
- Fix panic in CanonicalString::to_found_string() on empty occurrences (now returns Option<FoundString>) - Add #[allow] justification comments per project rules (4 files) - Remove redundant empty checks in confidence.rs and validation.rs - Strengthen test assertions: replace conditional passes with asserts, fix unused result, fix tautological assertion, implement empty test - Add max_length validation to ExtractionConfig::validate() - Improve CLI tests: check stderr on error, verify min_length filtering - Clarify AGENTS.md CI Architecture wording Signed-off-by: Keith Melton <kmelton@evilbitlabs.io> Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
- Fix UTF-16 length calculation in version_info.rs and string_tables.rs: use encode_utf16().count() * 2 instead of text.len() for source bytes - Fix missing trailing newline in CLI output for table/JSON formatters - Remove unnecessary clone in basic_extractor.rs canonical string path - Use encoding from map key in dedup instead of re-indexing found_strings - Implement empty test bodies for string_tables and manifests detection - Assert unused results in PE resource tests instead of discarding - Add max_length vs min_ascii_length/min_wide_length validation Signed-off-by: Keith Melton <kmelton@evilbitlabs.io> Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
- Validate min_length at CLI level (clap range 1..) and call config.validate() before extraction - Set min_wide_length from CLI --min-length flag for consistent filtering across all encodings - Assert default CLI run succeeds before comparing output in tests Signed-off-by: Keith Melton <kmelton@evilbitlabs.io> Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Use usize directly for min_length CLI field instead of u64 with cast. config.validate() handles the >= 1 constraint. Signed-off-by: Keith Melton <kmelton@evilbitlabs.io> Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
- Fix STRINGTABLE truncated entry bug: break instead of continue when byte_length exceeds remaining buffer, preventing misaligned reads - Fix stale doc paths after refactoring (ascii/mod.rs, pe_resources/mod.rs) - Rename test module internal_tests -> tests (project convention) - Rename misleading test (no_translation -> fallback_language_handling) - Replace no-op match with actual type assertion in metadata validation - Split pe_resources/tests.rs (632 lines) into tests/ directory module - Split dedup/tests.rs (560 lines) into tests/ directory module - Add test for STRINGTABLE truncated entry parsing behavior Signed-off-by: Keith Melton <kmelton@evilbitlabs.io> Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Address 8 pre-existing design issues surfaced during bot review of PR #135: 1. Remove dual encoding fields: delete `encodings` from ExtractionConfig, keep only `enabled_encodings`. Eliminates confusing OR-checks throughout basic_extractor.rs. 2. Extract shared logic from extract/extract_canonical: factor out the duplicated section-sorting, filtering, symbol-inclusion, and semantic enrichment pipeline into a private `collect_all_strings()` helper. 3. Fix O(n^2) dedup in Auto byte order mode: replace `strings.iter_mut().find()` with HashMap-based O(1) lookup in utf16/mod.rs. 4. Remove unused `_preserve_all_occurrences` parameter from `deduplicate()` signature, ExtractionConfig struct, and all callsites. 5. Remove broken byte order consistency heuristic: the `check_byte_order_consistency()` function operated on decoded u16 values which are identical regardless of original byte order, making LE always score 1.0 and BE always 0.0. Redistribute the 0.1 weight to valid_unicode (0.35) and printable (0.45) heuristics. 6. Fix char_count semantics for surrogate pairs: change `char_count += 1` to `char_count += consumed_units` in utf16/extraction.rs so min/max length checks count UTF-16 code units, not scalar values. 7. Replace `Result<_, ()>` with `StringyError` in UTF-16 decode functions: use `StringyError::ParseError` for the 3 decode functions in utf16/extraction.rs, removing `#[allow(clippy::result_unit_err)]`. 8. Add noncharacter validation in surrogate-pair path: reject U+xFFFE and U+xFFFF in supplementary planes (planes 1-16) in utf16/validation.rs, matching the BMP noncharacter check. Signed-off-by: Kevin Melton <kmelton@evilbitlabs.io> Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Add .github/actionlint.yml config to suppress lint errors from release.yml, which is auto-generated by cargo-dist and cannot be hand-edited. Signed-off-by: Kevin Melton <kmelton@evilbitlabs.io> Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
The cspell tool was configured but never installed in mise, so lint-spell always failed. The project's domain-heavy jargon (binary analysis terms, hex patterns) required 70+ exception words, making the maintenance cost outweigh the marginal value. Typos in identifiers are caught at compile time, and documentation quality is covered by CodeRabbit reviews. Signed-off-by: Kevin Melton <kmelton@evilbitlabs.io> Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
4 tasks
Contributor
Author
🧪 CI InsightsHere's what we observed from your CI run for fd99943. 🟢 All jobs passed!But CI Insights is watching 👀 |
Codecov Report❌ Patch coverage is 📢 Thoughts on this report? Let us know! |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
🎉 This pull request has been checked successfully and will be merged soon. 🎉
Branch main (bd84315) and #135 are embarked together for merge.
This pull request has been created by Mergify to speculatively check the mergeability of #135.
You don't need to do anything. Mergify will close this pull request automatically when it is complete.
Required conditions of queue
defaultfor merge:check-success = coveragecheck-success = msrv (stable minus 1 releases)check-success = msrv (stable minus 2 releases)check-success = msrv (stable minus 3 releases)check-success = msrv (stable minus 4 releases)check-success = msrv (stable)check-success = qualitycheck-success = testcheck-success = test-cross-platform (macos-latest, macOS)check-success = test-cross-platform (ubuntu-latest, Linux)check-success = test-cross-platform (windows-latest, Windows)check-success = coveragecheck-success = msrv (stable minus 1 releases)check-success = msrv (stable minus 2 releases)check-success = msrv (stable minus 3 releases)check-success = msrv (stable minus 4 releases)check-success = msrv (stable)check-success = qualitycheck-success = testcheck-success = test-cross-platform (macos-latest, macOS)check-success = test-cross-platform (ubuntu-latest, Linux)check-success = test-cross-platform (windows-latest, Windows)#commits-behind <= 10title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?:main]:check-neutral = Mergify Merge Protectionscheck-skipped = Mergify Merge Protectionscheck-success = Mergify Merge ProtectionsRequired conditions to stay in the queue:
check-success = coveragecheck-success = msrv (stable minus 1 releases)check-success = msrv (stable minus 2 releases)check-success = msrv (stable minus 3 releases)check-success = msrv (stable minus 4 releases)check-success = msrv (stable)check-success = qualitycheck-success = testcheck-success = test-cross-platform (macos-latest, macOS)check-success = test-cross-platform (ubuntu-latest, Linux)check-success = test-cross-platform (windows-latest, Windows)#commits-behind <= 10title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?:main]:check-success = Mergify Merge Protectionscheck-neutral = Mergify Merge Protectionscheck-skipped = Mergify Merge Protections