Member

@rudokemper rudokemper commented Apr 15, 2025

Goal

Add a Postgres-to-CSV export script to our script library, enabling Windmill operators to generate CSV files directly—without needing to query the Postgres database manually.

What I changed

  • Refactored f/common_logic/db_connection.py into f/common_logic/db_operations.py, which now includes a new function for fetching data from Postgres in addition to the connection logic.
  • Created f/common_logic/save-disk.py, which provides a save_export_file function that supports different file types (currently: csv, json, and geojson).
  • postgres_to_geojson.py was updated to use the shared utility functions, and a new script postgres_to_csv.py was added using the same pattern.
  • Added a test for postgres_to_csv using the same CoMapeo mock data as the GeoJSON script.

👀 @nicopace

@rudokemper rudokemper requested a review from IamJeffG April 15, 2025 19:13
Contributor

@nicopace nicopace left a comment

It looks great, Rudo.
I haven't run it; I did an overview and left some comments.

"""
storage_path = Path(storage_path)
storage_path.mkdir(parents=True, exist_ok=True)
file_path = storage_path / f"{db_table_name}.{file_type}"
Contributor

This line can be vulnerable to path traversal attacks if the input strings (db_table_name and file_type) come from an untrusted source and are not properly sanitized.

For example:

file_path = storage_path / "../../etc/passwd.txt"

You can mitigate this by:

  1. Ensure storage_path is absolute and resolved: storage_path = storage_path.resolve() (double check, ChatGPT code)
  2. Sanitize inputs

import re

def sanitize_filename(name):
    return re.sub(r"[^a-zA-Z0-9_-]", "_", name)

safe_name = sanitize_filename(db_table_name)
safe_type = sanitize_filename(file_type)
file_path = storage_path / f"{safe_name}.{safe_type}"

  3. Check final path is within allowed directory

final_path = (storage_path / f"{safe_name}.{safe_type}").resolve()

# Ensure the path is within the storage directory
if not str(final_path).startswith(str(storage_path)):
    raise ValueError("Attempted path traversal detected!")
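
For reference, the three steps could be combined roughly as below. This is only a sketch: build_export_path is a hypothetical name, not something in this PR, and it swaps the string-prefix check for Path.is_relative_to (Python 3.9+), because a plain prefix comparison also accepts a sibling directory whose name merely starts with the storage directory's name.

```python
import re
from pathlib import Path, PurePosixPath


def sanitize_filename(name: str) -> str:
    # Replace every character outside [a-zA-Z0-9_-] with an underscore.
    return re.sub(r"[^a-zA-Z0-9_-]", "_", name)


def build_export_path(storage_path, db_table_name, file_type):
    # Hypothetical helper (not part of the PR) combining the three steps.
    base = Path(storage_path).resolve()                         # step 1
    safe_name = sanitize_filename(db_table_name)                # step 2
    safe_type = sanitize_filename(file_type)
    final_path = (base / f"{safe_name}.{safe_type}").resolve()
    if not final_path.is_relative_to(base):                     # step 3
        raise ValueError("Attempted path traversal detected!")
    return final_path


# Why a string-prefix check is weaker than a path-aware check:
base = PurePosixPath("/srv/data")
sibling = PurePosixPath("/srv/data_evil/x.csv")
print(str(sibling).startswith(str(base)))  # True: the prefix check accepts it
print(sibling.is_relative_to(base))        # False: the path-aware check rejects it
```

With sanitized inputs the final check should never fire, since the file name can no longer contain separators or dots; it stays as a second line of defense.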

Contributor

A simpler version of this would be:

from pathlib import Path

def get_safe_file_path(storage_path, db_table_name, file_type):
    # Build path safely
    storage_path = Path(storage_path).resolve()
    file_path = (storage_path / f"{db_table_name}.{file_type}").resolve()

    # Check the resolved path stays within the storage directory
    if not file_path.is_relative_to(storage_path):
        raise ValueError("Invalid path: possible path traversal detected.")

    return file_path
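
As a quick illustration of how this behaves, here is a small usage sketch. The storage directory /tmp/exports and the table name comapeo_data are made-up values for the example, and the function is repeated so the snippet runs on its own (Path.is_relative_to needs Python 3.9+).

```python
from pathlib import Path


def get_safe_file_path(storage_path, db_table_name, file_type):
    # Same logic as suggested in the review comment, repeated so this runs standalone.
    storage_path = Path(storage_path).resolve()
    file_path = (storage_path / f"{db_table_name}.{file_type}").resolve()
    if not file_path.is_relative_to(storage_path):
        raise ValueError("Invalid path: possible path traversal detected.")
    return file_path


# A normal table name resolves to a file inside the storage directory.
print(get_safe_file_path("/tmp/exports", "comapeo_data", "csv").name)

# A name that climbs out of the storage directory is rejected.
try:
    get_safe_file_path("/tmp/exports", "../../etc/passwd", "txt")
except ValueError as exc:
    print(f"rejected: {exc}")
```

Note that this check still accepts names containing a separator, such as "sub/inner", because the result stays inside the storage directory. If only flat file names are intended, sanitizing the name is still worthwhile.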

Contributor

He does already validate file_type, just below.

And also I do like get_safe_file_path ⬆️
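
For readers without the diff open, validating file_type against an allow-list might look roughly like this. It is a sketch only: the actual check in save-disk.py is not shown in this thread, the names ALLOWED_FILE_TYPES and validate_file_type are hypothetical, and the set of types is taken from the PR description (csv, json, geojson).

```python
# Assumed from the PR description; the real module may define this differently.
ALLOWED_FILE_TYPES = {"csv", "json", "geojson"}


def validate_file_type(file_type: str) -> str:
    # Accept only known extensions, so file_type cannot smuggle in
    # path separators or ".." segments.
    if file_type not in ALLOWED_FILE_TYPES:
        raise ValueError(f"Unsupported file type: {file_type!r}")
    return file_type
```

An allow-list covers file_type, but db_table_name still needs its own handling, which is where get_safe_file_path comes in.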

Member Author

Thanks. I think at some point we could benefit from a repo-wide security audit, informed by any guardrails that Windmill has in place to prevent this kind of thing from happening already. But this being a shared code module, I opted to add this as a best practice 👍


@rudokemper rudokemper merged commit 988b928 into main Apr 16, 2025
1 check passed
@rudokemper rudokemper deleted the cvs-export-script branch April 16, 2025 16:47
3 participants