Add text/diff/data ops, YAML/Parquet, OTel tracing, and OneDrive/Box backends #59

Merged
JE-Chen merged 4 commits into dev from feat/new-ops-and-backends on Apr 24, 2026

@JE-Chen JE-Chen commented Apr 24, 2026

Summary

Three-batch feature push covering new local ops, structured-data support, tracing, and two additional cloud backends. 105 → 137 registered actions (+32). Full test suite: 739 passed, 8 skipped.

Batch A — stdlib-only ops (commit c982a46)

  • text_ops: FA_file_split, FA_file_merge, FA_encoding_convert, FA_line_count, FA_sed_replace (literal + regex). All writes atomic.
  • diff_ops (extended): FA_diff_files, FA_diff_dirs, FA_apply_patch — new apply_text_patch() driver with strict hunk-context verification; diff_dirs_summary() wrapper returns plain dict for JSON-friendly payloads.
  • data_ops (CSV/JSONL): FA_csv_filter (column projection + where clause), FA_csv_to_jsonl, FA_jsonl_iter (with limit), FA_jsonl_append.
  • New exceptions: TextOpsException, DataOpsException.
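
The "all writes atomic" guarantee above is commonly implemented by writing to a sibling temporary file and swapping it into place. The following is a minimal sketch of that pattern, not the PR's actual code: `atomic_write_text` and `sed_replace_sketch` are hypothetical names, and the literal/regex switch is only an assumption about how FA_sed_replace distinguishes its two modes.

```python
import os
import re
import tempfile
from pathlib import Path


def atomic_write_text(target: Path, text: str, encoding: str = "utf-8") -> None:
    """Write text to a temp file next to target, then rename it over target."""
    target = Path(target)
    # The temp file lives in the same directory so os.replace stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, prefix=target.name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding=encoding) as handle:
            handle.write(text)
            handle.flush()
            os.fsync(handle.fileno())
        # os.replace overwrites the destination in a single rename step.
        os.replace(tmp_name, target)
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise


def sed_replace_sketch(path: Path, pattern: str, replacement: str, regex: bool = False) -> int:
    """Replace pattern in a file (literal by default) and return the match count."""
    text = Path(path).read_text(encoding="utf-8")
    if regex:
        new_text, count = re.subn(pattern, replacement, text)
    else:
        count = text.count(pattern)
        new_text = text.replace(pattern, replacement)
    if count:
        atomic_write_text(Path(path), new_text)
    return count
```

A reader of the target file sees either the old content or the new content, never a half-written file, because the rename is the only step that touches the final path.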

Batch B — deps + structured data + tracing (commit 5d7eeed)

  • data_ops (YAML): FA_yaml_get/set/delete with dotted-path access via yaml.safe_load (never yaml.load).
  • data_ops (Parquet): FA_parquet_read (with limit + column projection), FA_parquet_write, FA_csv_to_parquet.
  • core/tracing: OpenTelemetry bridge. init_tracing() installs a global TracerProvider with a configurable SpanExporter (defaults to a no-op). action_span() context manager is a zero-cost no-op until initialised. ActionExecutor._execute_event wraps every action in an automation_file.action span tagged with fa.action=<name>. FA_tracing_init registered.
  • New exception: TracingException.
  • Deps added (required, declared with minimum versions in requirements.txt + stable.toml + dev.toml): PyYAML>=6.0, pyarrow>=15.0.0, opentelemetry-api>=1.25.0, opentelemetry-sdk>=1.25.0.
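
Dotted-path access as described for FA_yaml_get/set/delete can be sketched on the plain nested dict that `yaml.safe_load` returns for a mapping document. The helpers below are hypothetical and stdlib-only; the real ops may handle lists, missing parents, or errors differently.

```python
from typing import Any, Dict


def dotted_get(doc: Dict[str, Any], path: str, default: Any = None) -> Any:
    """Walk 'a.b.c' through nested dicts; return default if any key is missing."""
    node: Any = doc
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return default
        node = node[key]
    return node


def dotted_set(doc: Dict[str, Any], path: str, value: Any) -> None:
    """Set the leaf at 'a.b.c', creating intermediate dicts as needed."""
    *parents, leaf = path.split(".")
    node = doc
    for key in parents:
        child = node.get(key)
        if not isinstance(child, dict):
            # Assumption: a non-mapping parent is overwritten with a new dict.
            child = {}
            node[key] = child
        node = child
    node[leaf] = value


def dotted_delete(doc: Dict[str, Any], path: str) -> bool:
    """Delete the leaf at 'a.b.c'; return True only if something was removed."""
    *parents, leaf = path.split(".")
    node: Any = doc
    for key in parents:
        node = node.get(key) if isinstance(node, dict) else None
        if node is None:
            return False
    if isinstance(node, dict) and leaf in node:
        del node[leaf]
        return True
    return False
```

In a full round trip the dict would come from `yaml.safe_load` and go back out through `yaml.safe_dump`, which keeps the load side restricted to plain data types.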

Batch C — OneDrive + Box backends (commit 81737e5)

  • OneDrive (remote/onedrive/): OneDriveClient wraps a requests.Session with MSAL-powered auth and offers both later_init(access_token) and device_code_login(client_id, tenant_id=None). Ops: FA_onedrive_upload_file / upload_dir (4 MiB simple-upload cap) / download_file / delete_item / list_folder / close. Graph pagination (@odata.nextLink) is followed.
  • Box (remote/box/): BoxClient wraps boxsdk.OAuth2 + Client. Ops: FA_box_upload_file (returns new file id) / upload_dir (flattens tree — Box requires pre-existing folder ids) / download_file / delete_file / delete_folder (with recursive flag) / list_folder.
  • PySide6 tabs added (OneDriveTab, BoxTab) and wired into the Transfer sidebar and Home status probes.
  • New exceptions: OneDriveException, BoxException.
  • Deps added (required): msal>=1.28.0, boxsdk>=3.14.0.
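
Following `@odata.nextLink`, as list_folder is described as doing, amounts to re-requesting the link from each response until none is returned. This sketch assumes the standard Microsoft Graph collection shape (`value` plus an optional `@odata.nextLink`); `iter_graph_pages` and the injected `fetch_json` callable are hypothetical and stand in for the client's authenticated request helper.

```python
from typing import Any, Callable, Dict, Iterator, Optional


def iter_graph_pages(
    fetch_json: Callable[[str], Dict[str, Any]],
    first_url: str,
) -> Iterator[Dict[str, Any]]:
    """Yield every item of a paged Graph collection, one page request at a time."""
    url: Optional[str] = first_url
    while url:
        page = fetch_json(url)
        # Each page carries its items under "value".
        yield from page.get("value", [])
        # The last page has no "@odata.nextLink", which ends the loop.
        url = page.get("@odata.nextLink")
```

Passing the fetch function in keeps the loop testable with an in-memory fake, which matches how the PR says the backends are tested outside live endpoints.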

New dependencies (all required, not extras)

| Package | Batch | Why |
| --- | --- | --- |
| PyYAML | B | FA_yaml_* ops (safe_load only) |
| pyarrow | B | FA_parquet_* ops |
| opentelemetry-api / sdk | B | always-on tracing hooks in the executor |
| msal | C | OneDrive device-code auth |
| boxsdk | C | Box client |

Test plan

  • ruff check automation_file/ tests/
  • ruff format --check automation_file/ tests/
  • pytest tests/ — 739 passed, 8 skipped
  • Live OneDrive / Box / Parquet integration verification outside CI

Follow-ups (not in this PR)

  • OneDrive resumable upload sessions for files >4 MiB
  • Box folder-mirror upload (create nested folders instead of flattening)
  • Parquet predicate pushdown / row-group slicing for large files

codacy-production Bot commented Apr 24, 2026

Up to standards ✅

🟢 Issues 0 issues

Results:
0 new issues

🟢 Metrics 383 complexity · 14 duplication

Metric Results
Complexity 383
Duplication 14

JE-Chen added 3 commits April 24, 2026 21:45
Batch A of the new-ops initiative — stdlib-only, no new dependencies.

- text_ops: FA_file_split, FA_file_merge, FA_encoding_convert,
  FA_line_count, FA_sed_replace (literal + regex). All writes atomic.
- diff_ops: FA_diff_files / FA_diff_dirs / FA_apply_patch. Extends the
  existing diff_text_files / diff_dirs helpers with
  apply_text_patch (unified-diff driver with strict context-line
  verification) and a JSON-friendly diff_dirs_summary wrapper.
- data_ops: FA_csv_filter (column projection + where clause),
  FA_csv_to_jsonl, FA_jsonl_iter (with limit), FA_jsonl_append.
- Facade + __all__ updated; new exception types TextOpsException and
  DataOpsException added under the shared exceptions hierarchy.

Tests: 42 new cases — all pass. Full suite: 686 passed, 8 skipped.

Batch B introduces three required dependencies: PyYAML, pyarrow,
opentelemetry-api/sdk (all added to requirements.txt + dev.toml +
stable.toml).

YAML / Parquet:
- FA_yaml_get, FA_yaml_set, FA_yaml_delete — dotted-path access with
  yaml.safe_load (never yaml.load) so untrusted config can't construct
  arbitrary Python objects.
- FA_parquet_read (with limit + columns projection), FA_parquet_write,
  FA_csv_to_parquet. All writes atomic.

Tracing:
- core.tracing.init_tracing installs a global TracerProvider with a
  configurable SpanExporter (defaults to a null exporter so spans can
  still be inspected without wiring up a backend).
- action_span context manager is a zero-cost no-op until tracing is
  initialised.
- ActionExecutor._execute_event wraps every action in an
  "automation_file.action" span tagged with fa.action=<name>.
- FA_tracing_init exposed through the registry.

Tests: 22 new — all pass. Full suite: 708 passed, 8 skipped.

Batch C adds two new cloud backends mirroring the existing S3 / Azure /
Dropbox pattern, plus matching PySide6 tabs.

OneDrive (Microsoft Graph via MSAL):
- OneDriveClient holds an OAuth2 access token and a requests.Session,
  exposes both later_init(token) and device_code_login(client_id)
  paths. graph_request() centralises auth header + error wrapping.
- Ops: upload_file / upload_dir (4 MiB simple-upload cap),
  download_file, delete_item, list_folder (follows @odata.nextLink).
- FA_onedrive_* actions registered, UI tab added to Transfer sidebar
  and Home status probes.

Box (boxsdk):
- BoxClient wraps boxsdk.OAuth2 + Client with access-token init.
- Ops: upload_file (returns new file id), upload_dir (flattens tree
  since Box requires pre-existing folder ids), download_file,
  delete_file, delete_folder (recursive flag), list_folder.
- FA_box_* actions registered, UI tab added.

Deps added to requirements.txt + stable.toml + dev.toml: msal,
boxsdk. Both declared as required runtime deps, consistent with the
existing boto3 / azure-storage-blob / dropbox / paramiko treatment.

Tests: 31 new — all pass. Full suite: 739 passed, 8 skipped. Real
Graph / Box endpoints are out of CI; tests exercise registry wiring,
guard clauses, and exception-wrapping paths via in-memory fakes.

JE-Chen force-pushed the feat/new-ops-and-backends branch from 81737e5 to 8995da1 on April 24, 2026 13:45

CI lint (mypy, 9 errors):
- text_ops.file_merge: replace the iter(lambda) chunker with a plain
  while loop so mypy can infer the chunk type.
- data_ops: _resolve_fieldnames now accepts Sequence[str] | None to
  match csv.DictReader.fieldnames.
- tracing: drop unused # type: ignore[attr-defined] comments now that
  mypy resolves the Once / TracerProvider references on its own.
- tracing: replace # type: ignore[override] on the _NullExporter
  methods with plain signatures that already match the parent.
- box.upload_ops: wrap box_upload_file in a helper that returns bool so
  walk_and_upload's Callable[[Path, str], bool] is satisfied.
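
The file_merge change in the first item above swaps a sentinel-style iterator for an explicit loop. A minimal sketch of the loop form follows; `copy_chunks` and the 64 KiB default are assumptions for illustration, not the names or values used in text_ops.

```python
from typing import BinaryIO

CHUNK_SIZE = 64 * 1024  # assumed default, not taken from the PR


def copy_chunks(source: BinaryIO, sink: BinaryIO, chunk_size: int = CHUNK_SIZE) -> int:
    """Copy source into sink chunk by chunk and return the number of bytes copied."""
    copied = 0
    # Equivalent to: for chunk in iter(lambda: source.read(chunk_size), b""): ...
    # The explicit loop gives the type checker a directly typed `chunk` variable.
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        sink.write(chunk)
        copied += len(chunk)
    return copied
```

Both forms read until `read()` returns an empty bytes object, so the behaviour is unchanged and only the typing is simpler.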

Codacy (11 new issues):
- Bandit B105 in test_onedrive_ops: promote the literal "fake-token"
  string to a named local and mark with # nosec B105 + comment.
- PyLint W0212 across test_onedrive_ops + test_tracing: add file-level
  # pylint: disable=protected-access with rationale — the tests
  legitimately poke tracing._state and OneDriveClient._session / _access_token
  to inject fakes.
- PyLint R0914 in data_ops.csv_filter (17 locals): refactor into
  csv_filter → _stream_csv_filter → _write_filtered_rows, each below
  the 15-locals cap.
- PyLint R0913 in onedrive.client.graph_request (8 args): collapse
  params / json_body / data / headers into **request_kwargs forwarded
  to requests.Session.request.
- PyLint R0917 too-many-positional-arguments on the new data_ops
  helpers: justified disable-next per helper.

SonarCloud (4 issues):
- python:S108 empty-body blocks in test_tracing: replace the `with`
  blocks whose body is only `pass` with meaningful assertions /
  operations so the span bodies are non-empty.
- pythonsecurity:S2083 in apply_text_patch: document that the caller
  is the trust boundary for target (matching the rest of the local
  ops) and mark the write_text call with # NOSONAR pythonsecurity:S2083
  + explanation.
- python:S3776 cognitive complexity in _apply_unified_patch (30) and
  _iter_hunks (17): split both into small helpers (_copy_up_to,
  _apply_hunk_ops, _verify_live_line for the former; a
  _HunkParseState dataclass + _consume_patch_line for the latter).
  Each helper now scores well under the 15-complexity cap.

Verified: `ruff check`, `ruff format --check`, `mypy`, and `pytest`
(739 passed, 8 skipped).

@JE-Chen JE-Chen merged commit 008f3c9 into dev Apr 24, 2026
6 checks passed