Add Postgres-to-CSV export script and refactor shared export utilities #87
Conversation
It looks great, Rudo. I haven't run it, but I did an overview and left some comments.
f/common_logic/save_disk.py (Outdated)

```python
    """
    storage_path = Path(storage_path)
    storage_path.mkdir(parents=True, exist_ok=True)
    file_path = storage_path / f"{db_table_name}.{file_type}"
```
This line can be vulnerable to path traversal attacks if the input strings (`db_table_name` and `file_type`) come from an untrusted source and are not properly sanitized, like:

```python
file_path = storage_path / "../../etc/passwd.txt"
```

You can mitigate this by:

- Ensuring `storage_path` is absolute and resolved:

```python
storage_path = storage_path.resolve()
```

- Sanitizing the inputs (double check, ChatGPT code):

```python
import re

def sanitize_filename(name):
    return re.sub(r"[^a-zA-Z0-9_-]", "_", name)

safe_name = sanitize_filename(db_table_name)
safe_type = sanitize_filename(file_type)
file_path = storage_path / f"{safe_name}.{safe_type}"
```

- Checking that the final path is within the allowed directory:

```python
final_path = (storage_path / f"{safe_name}.{safe_type}").resolve()
# Ensure the path is within the storage directory
if not str(final_path).startswith(str(storage_path)):
    raise ValueError("Attempted path traversal detected!")
```
A simpler version of this would be:

```python
from pathlib import Path

def get_safe_file_path(storage_path, db_table_name, file_type):
    # Build path safely
    storage_path = Path(storage_path).resolve()
    file_path = (storage_path / f"{db_table_name}.{file_type}").resolve()
    # Check the resolved path stays within the storage directory
    if not file_path.is_relative_to(storage_path):
        raise ValueError("Invalid path: possible path traversal detected.")
    return file_path
```
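For illustration, this is how such a helper behaves in practice. This is a minimal, self-contained sketch (the helper is restated so the example runs on its own, and the table names are hypothetical):

```python
from pathlib import Path
import tempfile

def get_safe_file_path(storage_path, db_table_name, file_type):
    # Resolve the base directory and the candidate file path
    storage_path = Path(storage_path).resolve()
    file_path = (storage_path / f"{db_table_name}.{file_type}").resolve()
    # Reject anything that escapes the storage directory (Python 3.9+)
    if not file_path.is_relative_to(storage_path):
        raise ValueError("Invalid path: possible path traversal detected.")
    return file_path

storage = tempfile.mkdtemp()

# A normal table name stays inside the storage directory
safe = get_safe_file_path(storage, "observations", "csv")
assert safe.name == "observations.csv"

# A traversal attempt resolves outside the directory and is rejected
try:
    get_safe_file_path(storage, "../../etc/passwd", "txt")
    raised = False
except ValueError:
    raised = True
assert raised
```

Because both paths are resolved before the containment check, symlinks and `..` segments are normalized away before comparison, which is what makes `is_relative_to` a reliable guard here.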
He does already validate `file_type`, just below. And I also like `get_safe_file_path` ⬆️
Thanks. I think at some point we could benefit from a repo-wide security audit, informed by any guardrails that Windmill has in place to prevent this kind of thing from happening already. But this being a shared code module, I opted to add this as a best practice 👍
Goal

Add a Postgres-to-CSV export script to our script library, enabling Windmill operators to generate CSV files directly, without needing to query the Postgres database manually.

What I changed

- Renamed `f/common_logic/db_connection.py` ➜ `f/common_logic/db_operations.py` to include a new function for fetching data from Postgres, in addition to connection logic.
- Added `f/common_logic/save_disk.py`, which provides a `save_export_file` function that supports different file types (currently: csv, json, and geojson).
- `postgres_to_geojson.py` was updated to use the shared utility functions, and a new script `postgres_to_csv.py` was added using the same pattern.
- Tested `postgres_to_csv` using the same CoMapeo mock data as the GeoJSON script.

👀 @nicopace