Stop LLM Generated Code from Having Any Additional Imports #5879

Merged
christopherholland-workday merged 2 commits into main from csv-agent-injection
Mar 5, 2026

@christopherholland-workday christopherholland-workday commented Mar 2, 2026

Overview

FLOWISE-281 and FLOWISE-282

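The previous Python code sanitizer had flaws that let some import statements through in LLM-generated code, for example "import pandas as np, os as pandas". These updates harden the validation: every import statement is now rejected, and additional reflection built-ins and attributes are blocked.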
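
To make the reported bypass concrete, here is a minimal sketch of how a multi-name import can defeat an allowlist-style check. It is a hypothetical illustration, not the previous Flowise implementation: the names ALLOWED_MODULES, naiveImportCheck, and blanketImportCheck are invented for this example, and the PR does not show the old regex.

```typescript
// Hypothetical illustration only: NOT the previous Flowise implementation.
const ALLOWED_MODULES = new Set(['pandas', 'numpy'])

// Naive check: inspects only the first module name of each import statement.
function naiveImportCheck(code: string): boolean {
    const importLine = /^\s*import\s+(\w+)/gm
    let match: RegExpExecArray | null
    while ((match = importLine.exec(code)) !== null) {
        if (!ALLOWED_MODULES.has(match[1])) return false
    }
    return true
}

// Blanket check: rejects any use of "import" as a whole word.
function blanketImportCheck(code: string): boolean {
    return !/\bimport\b/.test(code)
}

// "os" is bound to the name "pandas", so later "pandas.system(...)" calls
// would also slip past a check that only looks for "os.".
const bypass = 'import pandas as np, os as pandas'
console.log(naiveImportCheck(bypass)) // true  (bypass accepted)
console.log(blanketImportCheck(bypass)) // false (bypass rejected)
```

The blanket check trades precision for safety: it would also reject the word "import" inside a string literal or comment, which is acceptable here only if the executor already supplies every module the generated code needs.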

Testing

Manual tests

See Jira FLOWISE-281 for manual testing details.

Unit tests

Added new unit tests and ran them locally:

  PASS  test/pythonCodeValidator.test.ts (10.455 s)
  validatePythonCodeForDataFrame
    reported bypass: multi-name import with alias
      ✓ should block "import pandas as np, os as pandas" (the reported bypass) (3 ms)
      ✓ should block the combined bypass + exploitation snippet
    import bypass scenarios
      ✓ should block "import os"
      ✓ should block if import is stated more than once
      ✓ should block "import sys"
      ✓ should block "import subprocess"
      ✓ should block "import pandas" (redundant; already in prelude)
      ✓ should block "import numpy" (redundant; already in prelude) (1 ms)
      ✓ should block "import pandas as pd"
      ✓ should block "import pandas, os"
      ✓ should block "import numpy, subprocess"
      ✓ should block "from os import system"
      ✓ should block "from os import *"
      ✓ should block "from subprocess import Popen"
      ✓ should block import inside a function definition
      ✓ should block import inside a try/except block
    legitimate pandas and numpy code passes validation
      ✓ should allow simple column access (1 ms)
      ✓ should allow filtering
      ✓ should allow len(df)
      ✓ should allow groupby and aggregation
      ✓ should allow numpy operations using the "np" alias (provided by prelude)
      ✓ should allow pandas string methods
      ✓ should allow chained method calls
      ✓ should allow multi-line pandas code without imports
      ✓ should allow df.astype() (contains "as" but not the word "import")
      ✓ should allow df.describe()
    dangerous builtins are blocked
      ✓ should block eval() (1 ms)
      ✓ should block exec()
      ✓ should block __import__()
      ✓ should block open()
      ✓ should block globals()
      ✓ should block locals()
      ✓ should block getattr()
      ✓ should block setattr()
      ✓ should block compile()
      ✓ should block breakpoint()
    dangerous module access is blocked
      ✓ should block os. access
      ✓ should block subprocess. access
      ✓ should block sys. access (1 ms)
      ✓ should block socket. access
      ✓ should block urllib. access
      ✓ should block requests. access
    reflection dunder attributes are blocked
      ✓ should block __builtins__
      ✓ should block __subclasses__()
      ✓ should block __globals__
      ✓ should block __mro__
      ✓ should block __code__
      ✓ should block __closure__
    newly added patterns are blocked
      ✓ should block vars(df)
      ✓ should block vars() with no arguments
      ✓ should block dir(df)
      ✓ should block dir() with no arguments
      ✓ should block __dict__ attribute access
      ✓ should block __dict__ write (monkey-patching attempt)
      ✓ should block __module__ attribute access
      ✓ should block vars() used in a reflection chain
    edge cases and false positive prevention
      ✓ should return valid: true for empty string
      ✓ should allow "important" — word boundary prevents false match on "import"
      ✓ should allow df.dropna() — "os" inside "dropna" does not match \bos\.
      ✓ should return a reason string when rejected
      ✓ should return no reason when valid
      ✓ should handle multi-line valid code
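
The false-positive cases in the last group depend on regex word boundaries. The sketch below shows the idea with two assumed patterns reconstructed from the test names (the real pattern list in pythonCodeValidator.ts may differ); isBlocked and the sample column names are hypothetical.

```typescript
// Assumed patterns, reconstructed from the test names; the real list may differ.
const importKeyword = /\bimport\b/ // "import" as a whole word only
const osAccess = /\bos\./ // "os." only when "os" starts at a word boundary

function isBlocked(code: string): boolean {
    return importKeyword.test(code) || osAccess.test(code)
}

const samples = [
    "df['important'].mean()", // "import" is followed by a word character
    'df.videos.count()', // "os." is preceded by a word character
    'df.astype(int)', // contains "as" but no import keyword
    'import os',
    'os.system("id")'
]

for (const code of samples) {
    console.log(`${isBlocked(code) ? 'blocked' : 'allowed'}: ${code}`)
}
// allowed: df['important'].mean()
// allowed: df.videos.count()
// allowed: df.astype(int)
// blocked: import os
// blocked: os.system("id")
```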

@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly strengthens the security of LLM-generated Python code by hardening the validation logic. It addresses previous flaws that allowed unauthorized imports, ensuring that only pre-approved modules like pandas and numpy are available. The changes also expand the detection of dangerous built-ins and reflection attributes, making the execution environment more secure.

Highlights

  • Enhanced Python Code Validation: The Python code validator has been significantly hardened to prevent LLM-generated code from introducing unauthorized imports and executing dangerous operations.
  • Strict Import Blocking: All import statements are now explicitly blocked in LLM-generated Python code, as pandas and numpy are pre-imported by the execution environment.
  • Expanded Forbidden Patterns: The validator now blocks additional dangerous built-ins and reflection attributes, including vars(), dir(), __dict__, and __module__, to further secure the execution sandbox.
  • Consistent Pre-imports: The AirtableAgent and CSVAgent components now consistently pre-import numpy alongside pandas in the Python execution context.
  • Comprehensive Unit Testing: A new, extensive suite of unit tests has been added to rigorously verify the robustness of the Python code validator against various bypass attempts and dangerous code patterns.
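
The reflection patterns named in the highlights could be expressed roughly as follows. This is a sketch under assumptions: the entries, reason strings, and the checkReflection helper are hypothetical, and only the { valid, reason } result shape is suggested by the test names in the PR description.

```typescript
interface ValidationResult {
    valid: boolean
    reason?: string
}

// Hypothetical entries; the actual FORBIDDEN_PATTERNS list may be shaped differently.
const REFLECTION_PATTERNS: Array<{ pattern: RegExp; reason: string }> = [
    { pattern: /\b(vars|dir)\s*\(/, reason: 'reflection built-ins (vars, dir) are not allowed' },
    { pattern: /__(dict|module)__/, reason: 'reflection attributes (__dict__, __module__) are not allowed' }
]

function checkReflection(code: string): ValidationResult {
    for (const { pattern, reason } of REFLECTION_PATTERNS) {
        // No "g" flag on the patterns, so test() keeps no state between calls.
        if (pattern.test(code)) return { valid: false, reason }
    }
    return { valid: true }
}

console.log(checkReflection('vars(df)').valid) // false
console.log(checkReflection('df.__dict__["x"] = 1').valid) // false
console.log(checkReflection('df.describe()').valid) // true
```

Returning the reason alongside the verdict lets the caller report why a snippet was rejected without re-running the patterns.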

Changelog

  • packages/components/nodes/agents/AirtableAgent/AirtableAgent.ts
    • Added import numpy as np to the pre-imported Python code snippet.
  • packages/components/nodes/agents/CSVAgent/CSVAgent.ts
    • Added import numpy as np to the pre-imported Python code snippet.
  • packages/components/src/pythonCodeValidator.ts
    • Updated the import validation regex to block all import statements, clarifying that pandas and numpy are pre-imported by the executor.
    • Extended the FORBIDDEN_PATTERNS list to include vars(), dir(), __dict__, and __module__ for enhanced reflection attribute blocking.
  • packages/components/test/pythonCodeValidator.test.ts
    • Added a new test file containing extensive unit tests for validatePythonCodeForDataFrame, covering import bypasses, legitimate code, dangerous built-ins, module access, reflection dunder attributes, newly added patterns, and edge cases.
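
Because the validator now rejects every import, the trusted pre-import snippet has to be added after validation, not before. A minimal sketch of that ordering, assuming a prelude of "import pandas as pd" and "import numpy as np" (the exact snippet in the agents is not shown in this PR) and a hypothetical buildScript helper:

```typescript
// Assumed prelude; only "import numpy as np" is confirmed by the changelog.
const PRELUDE = ['import pandas as pd', 'import numpy as np'].join('\n')

// Hypothetical helper: validate the untrusted LLM code first, then prepend the prelude.
function buildScript(llmCode: string): string {
    if (/\bimport\b/.test(llmCode)) {
        throw new Error('LLM-generated code must not contain import statements')
    }
    return `${PRELUDE}\n${llmCode}`
}

console.log(buildScript("np.mean(df['price'])"))
// import pandas as pd
// import numpy as np
// np.mean(df['price'])
```

Validating the combined script instead would reject the prelude's own import lines, so the check applies to the LLM-generated portion only.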

Activity

  • New unit tests were added and run locally, demonstrating comprehensive test coverage for the updated validation logic.
  • Manual testing details are referenced in Jira FLOWISE-281.

@christopherholland-workday christopherholland-workday marked this pull request as ready for review March 2, 2026 21:10

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request significantly enhances the security of the Python code execution environment by tightening restrictions on LLM-generated code. The changes include pre-importing numpy alongside pandas and then strictly forbidding any further import statements from the LLM. Additionally, several dangerous built-in functions and reflection-related attributes (vars(), dir(), __dict__, __module__) are now explicitly blocked to prevent potential sandbox escapes or unauthorized access. The introduction of a comprehensive suite of unit tests for the pythonCodeValidator is a crucial improvement, ensuring the effectiveness of these new security measures and guarding against future regressions. This is a well-executed security hardening.

@christopherholland-workday christopherholland-workday merged commit cf36fb7 into main Mar 5, 2026
7 checks passed