… enforcement
- Add compile_sandboxed() for namespace-restricted exec (blocks open, exec, eval, __import__)
- Add run_with_timeout() using ThreadPoolExecutor for wall-clock timeout (default 2s)
- Integrate sandbox into FeatureDiscovery._compile_function and _verify_function
- Integrate sandbox into RelationalShredder._compile_function and _verify_function
- Add timeout_s constructor parameter to both modules
- Add 15 sandbox unit tests and update existing module tests
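The sandbox and timeout described in this commit can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the function signatures, the func_name lookup, and the conversion to the built-in TimeoutError are assumptions. Two general caveats apply to this design: removing names from __builtins__ narrows what generated code can reach but is not a hard isolation boundary in CPython, and a call that times out keeps running in its worker thread, because Python threads cannot be forcibly stopped.

```python
import builtins
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout
from typing import Any, Callable

# Hypothetical blocklist mirroring the names listed in the commit message.
BLOCKED_BUILTINS = frozenset({"open", "exec", "eval", "__import__"})


def compile_sandboxed(source: str, func_name: str) -> Callable[..., Any]:
    """Exec generated source with a restricted __builtins__ and return func_name.

    Sketch only: the signature used in the PR may differ.
    """
    safe_builtins = {
        name: value
        for name, value in vars(builtins).items()
        if name not in BLOCKED_BUILTINS
    }
    namespace: dict = {"__builtins__": safe_builtins}
    # Without __import__, any `import` statement in the source raises ImportError.
    exec(compile(source, "<generated>", "exec"), namespace)
    func = namespace.get(func_name)
    if not callable(func):
        raise ValueError(f"generated code does not define {func_name!r}")
    return func


def run_with_timeout(func: Callable[..., Any], *args: Any,
                     timeout_s: float = 2.0, **kwargs: Any) -> Any:
    """Run func in a worker thread; raise TimeoutError after timeout_s seconds."""
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(func, *args, **kwargs)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        raise TimeoutError(f"call exceeded {timeout_s}s") from None
    finally:
        # Do not block on a runaway call; the thread itself cannot be killed.
        executor.shutdown(wait=False)
```

Generated code that calls open() fails with NameError when invoked, and code that tries to import a module fails at compile_sandboxed() time with ImportError.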
…ceful fallback
- Extend LocleanCache with code_cache table and get_code/set_code methods
- Add compute_code_key() for deterministic SHA256 hash from structural metadata
- Integrate cache hit/miss into FeatureDiscovery.discover() and RelationalShredder.shred()
- Replace RuntimeError with graceful fallback on exhausted retries
  - discover() returns unmodified DataFrame
  - shred() returns empty dict
- Add 12 cache-specific tests covering key generation, roundtrip, and fallback
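A rough sketch of the cache key, the code_cache table, and the hit/miss/fallback flow this commit describes. The JSON canonicalisation, the SQLite schema, the CodeCache class, and the get_or_generate helper are illustrative assumptions; the PR's LocleanCache may store and key entries differently.

```python
import hashlib
import json
import sqlite3
from typing import Callable, Optional, Sequence


def compute_code_key(columns: Sequence[str], dtypes: Sequence[str],
                     target_col: Optional[str], module_prefix: str) -> str:
    """Deterministic SHA256 key from structural metadata only (sketch).

    Row values are never hashed, so frames with the same schema share a key.
    """
    payload = {
        "columns": list(columns),
        "dtypes": [str(d) for d in dtypes],
        "target_col": target_col,
        "module_prefix": module_prefix,
    }
    # sort_keys plus fixed separators make the serialisation stable across runs.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class CodeCache:
    """Hypothetical stand-in for LocleanCache's code_cache table."""

    def __init__(self, path: str = ":memory:") -> None:
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS code_cache "
            "(key TEXT PRIMARY KEY, code TEXT NOT NULL)"
        )

    def get_code(self, key: str) -> Optional[str]:
        row = self._conn.execute(
            "SELECT code FROM code_cache WHERE key = ?", (key,)
        ).fetchone()
        return row[0] if row else None

    def set_code(self, key: str, code: str) -> None:
        # INSERT OR REPLACE keeps the most recent verified code for a key.
        self._conn.execute(
            "INSERT OR REPLACE INTO code_cache (key, code) VALUES (?, ?)",
            (key, code),
        )
        self._conn.commit()


def get_or_generate(cache: CodeCache, key: str, generate: Callable[[], str],
                    verify: Callable[[str], bool],
                    max_retries: int = 3) -> Optional[str]:
    """Hypothetical flow: cache hit, else generate/verify with retries."""
    cached = cache.get_code(key)
    if cached is not None:
        return cached  # cache hit: no LLM call needed
    for _ in range(max_retries):
        code = generate()
        if verify(code):
            cache.set_code(key, code)
            return code
    return None  # retries exhausted: caller chooses its own fallback
```

In this sketch, returning None leaves the fallback to the caller; per the commit message, discover() would then return the unmodified DataFrame and shred() an empty dict.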
…d wrapper methods
- Switch extraction/__init__.py to lazy __getattr__ imports for advanced modules
- Add _resolve_engine() helper replacing 9 duplicate engine-creation blocks
- Add 6 wrapper methods to Loclean class (clean, resolve_entities, oversample, shred_to_relations, discover_features, validate_quality)
- Add 9 unit tests for wrapper methods and engine resolution
- EntityResolver: LLM-driven entity resolution with fuzzy matching
- Oversampler: SMOTE-based minority class oversampling via LLM
- Add comprehensive unit tests for both modules
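The Oversampler is described as SMOTE-based. In the standard SMOTE step, a synthetic sample is placed on the segment between a minority sample x_i and one of its k nearest minority neighbours x_nn, as x_new = x_i + λ(x_nn − x_i) with λ drawn uniformly from [0, 1). The sketch below shows only that classic numeric step in pure Python; how the PR brings the LLM into oversampling is not described here, and the function name and parameters are hypothetical.

```python
import math
import random
from typing import List, Sequence


def smote(minority: Sequence[Sequence[float]], n_new: int,
          k: int = 5, seed: int = 0) -> List[List[float]]:
    """Generate n_new synthetic points by SMOTE-style interpolation (sketch)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)  # seeded for reproducible output
    points = [[float(v) for v in p] for p in minority]
    k = min(k, len(points) - 1)
    synthetic: List[List[float]] = []
    for _ in range(n_new):
        i = rng.randrange(len(points))
        base = points[i]
        # k nearest minority neighbours of the base point (Euclidean distance).
        neighbours = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(base, points[j]),
        )[:k]
        nn = points[rng.choice(neighbours)]
        gap = rng.random()  # lambda in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nn)])
    return synthetic
```

Because every synthetic point is a convex combination of two real minority points, the new samples stay inside the region the minority class already occupies.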
d870207 to 52690ec
- PipelineOrchestrator: configurable multi-step data cleaning pipeline
- QualityGate: statistical data quality validation with threshold checks
- Add comprehensive unit tests for both modules
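A minimal sketch of the two ideas in this commit: an orchestrator that applies cleaning steps in order, and a gate that computes simple statistics and compares them with thresholds. Rows are plain dicts to keep it dependency-free; the metric names (row count, per-column null rate), the threshold parameters, and the GateResult type are assumptions, not the PR's actual QualityGate API.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]
Step = Callable[[List[Row]], List[Row]]


@dataclass
class GateResult:
    passed: bool
    metrics: Dict[str, float]
    failures: List[str] = field(default_factory=list)


def quality_gate(rows: List[Row], max_null_rate: float = 0.1,
                 min_rows: int = 1) -> GateResult:
    """Hypothetical gate: row-count and per-column null-rate thresholds."""
    columns = sorted({col for row in rows for col in row})
    metrics: Dict[str, float] = {"row_count": float(len(rows))}
    failures: List[str] = []
    if len(rows) < min_rows:
        failures.append(f"row_count {len(rows)} < {min_rows}")
    for col in columns:
        nulls = sum(1 for row in rows if row.get(col) is None)
        rate = nulls / len(rows) if rows else 0.0
        metrics[f"null_rate:{col}"] = rate
        if rate > max_null_rate:
            failures.append(f"{col}: null rate {rate:.2f} > {max_null_rate}")
    return GateResult(passed=not failures, metrics=metrics, failures=failures)


def run_pipeline(rows: List[Row], steps: List[Step]) -> List[Row]:
    """Hypothetical orchestrator: feed each step's output into the next."""
    for step in steps:
        rows = step(rows)
    return rows
```

Running the gate after the pipeline turns "is the cleaned data good enough?" into an explicit pass/fail result with the offending metrics attached.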
52690ec to 4bc7924
Summary
Implements strict execution security, compilation caching, and architectural improvements for the dynamic code generation modules.
Execution Sandbox
- compile_sandboxed() replaces raw exec with a restricted __builtins__ (blocks open, exec, eval, __import__)
- run_with_timeout() enforces a 2s wall-clock timeout per function call via ThreadPoolExecutor
- Both are integrated into FeatureDiscovery and RelationalShredder
Compilation Cache
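- LocleanCache extended with a code_cache table and get_code/set_code methods
- compute_code_key() derives a deterministic SHA256 key from structural metadata (columns + dtypes + target_col + module_prefix)
Graceful Fallback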
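- Exhausted retries no longer raise RuntimeError: discover() returns the unmodified DataFrame and shred() returns an empty dict
Unified Client Refactor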
- Lazy __getattr__ imports in extraction/__init__.py
- _resolve_engine() helper replaces 9 duplicate engine-creation blocks
- Loclean wrapper methods (clean, resolve_entities, oversample, etc.)
New Modules
- EntityResolver: LLM-driven entity resolution
- Oversampler: SMOTE-based minority class oversampling
- PipelineOrchestrator: configurable multi-step cleaning pipeline
- QualityGate: statistical data quality validation
Testing
- ruff format, ruff check, mypy