- Source lives under `src/`:
  - `src/app/`: application features (transformation, matching, QA, web extraction).
  - `src/core/`: env loading, logging, orchestration.
  - `src/common/`: shared errors, responses, validation, context.
  - `src/handlers/`: HTTP handlers and health checks.
  - `src/db/`: persistence (e.g., `postgresql/`).
  - `src/cloud_services/`: AWS helpers.
  - `src/settings/`: configs (`config.yml`, app settings/secrets).
  - `src/tests/`: pytest suites and fixtures.
- Dev scripts live in `src/devops/scripts/` (format, lint, reset env).
- Entrypoints: `start_app.py` (preferred) and `main.py` (FastAPI `app`).
- Install (Python 3.12+): `pip install -e .`
- Run the API (auto-reload in dev):
  - PowerShell: `$env:DEPLOYMENT_CONFIG__SERVER='dev'; python .\start_app.py`
  - Bash: `export DEPLOYMENT_CONFIG__SERVER=dev && python start_app.py`
- Tests: `pytest -q` (uses `src/tests/`, asyncio auto mode).
- Lint/format (Unix): `bash src/devops/scripts/format_and_lint.sh`
  - Or individually: `ruff check .` and `black .`
The SmartCrop solution in src/app/ is split into four cooperating components. Each section below lists the public endpoints, highlights the key business flow, and calls out the database tables, FTP/S3 touchpoints, and supporting services you will interact with most often.
This FastAPI surface mirrors the legacy .NET SmartCrop controllers. Query parameter names are intentionally case-sensitive and JSON payloads retain PascalCase fields defined in src/app/smartcrop/common/models.py.
Image Workflow (/image)
| Method | Path | Purpose | Workflow Highlights |
|---|---|---|---|
| GET | `/image/GetImageInfo?providerName={name}&user={id}` | Locks the next batch of rows for a provider. | Selects rows from `smartcrop_programimage` in `Status=1`/`CurrentState=0`, stamps `CurrentState=1`, records the locker, and streams the image (rewriting S3 URLs to the CDN). |
| POST | `/image/SaveCropedImages` | Persists cropped renditions from Operators. | Updates `smartcrop_programimage` status/state, pushes base64 payloads to S3, and upserts `smartcrop_aspectratioimages` rows with crop dimensions. |
| GET | `/image/GetCroppedImageInfo?programId=&user=` | Loads programme records for QA review. | Re-locks programme rows, hydrates aspect-ratio images, overlays lookups (type/language/reject reason), and downloads crops back as base64 when available. |
| GET | `/image/DeleteCroppedImageInfo?programId=&id=&imageid=` | Removes a single cropped rendition. | Deletes from `smartcrop_aspectratioimages` by `(programid, aspectratioid, imageid)` after confirming the lock. |
| GET | `/image/UpdateStatus?programId=&status=&currentState=` | Finishes or resets a programme. | Bulk-updates `smartcrop_programimage` rows scoped to `link_object_id` while they are still in `CurrentState=1`. |
| GET | `/image/UpdateUserLockStatus?userId=` | Releases abandoned locks. | Finds rows owned by the user in `CurrentState=1` and restores `CurrentState` to the previous state or returns them to the fresh queue. |
| POST | `/image/UpdateCroppedImageInfo` | Bulk-updates crop metadata. | Synchronises parent `smartcrop_programimage` status/state and upserts aspect-ratio rows, re-uploading images to S3 when a payload is present. |
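The `GetImageInfo` claim step (select queued rows, stamp `CurrentState=1`, record the locker) can be pictured with the minimal sketch below. It uses `sqlite3` so it runs standalone; the `provider` and `lockedby` column names are assumptions, not the real `smartcrop_programimage` schema. On PostgreSQL, `SELECT ... FOR UPDATE SKIP LOCKED` is one common way to let concurrent operators claim rows without blocking each other.

```python
import sqlite3


def lock_next_batch(
    conn: sqlite3.Connection, provider: str, user: str, batch_size: int = 10
) -> list[int]:
    """Claim queued rows (Status=1, CurrentState=0) for one provider.

    Hypothetical sketch: `provider` and `lockedby` are assumed column names.
    """
    with conn:  # commits on success, rolls back if any statement fails
        rows = conn.execute(
            "SELECT id FROM smartcrop_programimage "
            "WHERE provider = ? AND status = 1 AND currentstate = 0 "
            "ORDER BY id LIMIT ?",
            (provider, batch_size),
        ).fetchall()
        claimed = []
        for (row_id,) in rows:
            # Re-check CurrentState so a row claimed by someone else is skipped.
            cur = conn.execute(
                "UPDATE smartcrop_programimage "
                "SET currentstate = 1, lockedby = ? "
                "WHERE id = ? AND currentstate = 0",
                (user, row_id),
            )
            if cur.rowcount == 1:
                claimed.append(row_id)
    return claimed
```

The guarded `UPDATE ... AND currentstate = 0` is what keeps a second caller from taking over a row that was locked between the `SELECT` and the `UPDATE`.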
Dashboard (/dashboard)
| Method | Path | Purpose | Workflow Highlights |
|---|---|---|---|
| GET | `/dashboard/Totals` | Dashboard counters. | Reads from `view_smartcrop_total` and maps to the `Total` Pydantic model. |
| GET | `/dashboard/ProviderList` | Paginated provider metrics. | Pulls records from `view_smartcrop_providerlist_total`, slices them in memory, and formats them as `Providers`. |
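`ProviderList` slices the view's rows in memory. A minimal sketch of that slicing, assuming 1-based page numbers and a hypothetical response shape (the real endpoint's field names may differ):

```python
from collections.abc import Sequence
from typing import Any


def paginate(records: Sequence[Any], page: int, page_size: int) -> dict[str, Any]:
    """Slice an in-memory result set into one page (1-based page numbers)."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    start = (page - 1) * page_size
    return {
        "items": list(records[start:start + page_size]),
        "total": len(records),
        "page": page,
        "page_size": page_size,
        "pages": (len(records) + page_size - 1) // page_size,  # ceiling division
    }
```

Because the whole view is fetched on every request, pushing `LIMIT`/`OFFSET` into the SQL would be the natural next step if provider lists grow large.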
Lookup (/lookup)
| Method | Path | Purpose | Workflow Highlights |
|---|---|---|---|
| GET | `/lookup/GetImageTypeLookup` | Image type picklist. | `SELECT id, title FROM smartcrop_imagetype_lookup ORDER BY id` |
| GET | `/lookup/GetImageLanguageLookup` | Language picklist. | `SELECT id, title FROM smartcrop_imagelanguage_lookup ORDER BY id` |
| GET | `/lookup/GetRejectReasonLookup` | Reject reason picklist. | `SELECT id, reason FROM smartcrop_rejectreason_lookup ORDER BY id` |
Reports (/report)
| Method | Path | Purpose | Workflow Highlights |
|---|---|---|---|
| GET | `/report/CroppedImageReport` | Paginated activity report. | Calls `public.fncroppedimagereports(...)`, wraps rows in `CroppedImageReport`, and returns paging metadata. |
| GET | `/report/CroppedImageReportDownLoad` | XLSX export. | Calls the same function with a high page size, renders an openpyxl workbook, and streams it. |
Roles (/role)
| Method | Path | Purpose | Workflow Highlights |
|---|---|---|---|
| POST | `/role/Create` | Add a role. | Inserts into `smartcrop_role` with default audit fields. |
| POST | `/role/Delete` | Remove a role. | Deletes by name from `smartcrop_role`. |
| POST | `/role/Update` | Rename a role. | Updates `smartcrop_role.role`. |
| GET | `/role/Get` | List non-active roles. | Filters `smartcrop_role` where `status <> 'active'`. |
| GET | `/role/QCdisable` | Toggle QA availability. | Blocks activation when QA work is in progress; otherwise flips `smartcrop_role.status` for the `qa` record. |
| GET | `/role/QCdisablestatus` | QA status. | Reads the `qa` record from `smartcrop_role`. |
Rules (/rule)
| Method | Path | Purpose | Workflow Highlights |
|---|---|---|---|
| POST | `/rule/Create` | Activate a rule. | Inserts into `smartcrop_rule` with `isactive=1`. |
| POST | `/rule/Delete` | Soft delete. | Sets `isactive=0` in `smartcrop_rule`. |
| POST | `/rule/Update` | Edit a rule. | Updates the `status`, `percentage`, and `isactive` fields. |
| GET | `/rule/Get` | List active rules. | Selects from `smartcrop_rule` `WHERE isactive=1`. |
Users (/user)
| Method | Path | Purpose | Workflow Highlights |
|---|---|---|---|
| POST | `/user/Register` | Create a user. | Validates uniqueness, hashes the password, resolves the role ID, and inserts into `smartcrop_user`. |
| POST | `/user/Update` | Update a profile. | Allows password rotation, role change, and `isactive` toggle. |
| POST | `/user/UpdatePassword` | Reset a password. | Updates the `smartcrop_user` password and salt by email. |
| POST | `/user/Login` | Authenticate. | Looks up `smartcrop_user` joined to `smartcrop_role` and issues a base64 token. |
| POST | `/user/UserRoleDetails?EmailID=` | Fetch role options. | Joins the user and role tables, excluding `status='active'`. |
| GET | `/user/UserList` | List users. | Returns all `smartcrop_user` rows with role titles. |
| POST | `/user/Delete` | Soft delete. | Marks `smartcrop_user.isactive=0`. |
All JSON responses (other than the report download stream) are wrapped in the shared ResponseModel.
- Services live in `src/app/smartcrop/core/services.py`. Handlers only orchestrate dependency injection and response wrapping.
- Image workflows keep `smartcrop_programimage` and `smartcrop_aspectratioimages` in sync, driving Operator → QA → Completed state transitions while logging every lock/unlock.
- Cropped images are uploaded to an S3 bucket resolved from `aws_config.service_accounts[].uat.smartcrop_bucket`; URLs are rewritten to `smartcrop_service.cdn.rewrite_base_url` for public use.
- Lookup data is served through simple `SELECT` statements against reference tables; report and dashboard numbers rely on database views and functions.
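The S3-to-CDN URL rewrite can be sketched as a prefix swap between the `origin_base_url` and `rewrite_base_url` values under `smartcrop_service.cdn`. This is a minimal sketch under that assumption; the example hostnames are invented, and the real helper may handle more cases.

```python
def rewrite_to_cdn(url: str, origin_base_url: str, rewrite_base_url: str) -> str:
    """Swap the origin prefix for the CDN prefix; leave other URLs untouched."""
    # Require a trailing slash so "https://host.com" does not match "https://host.com.evil".
    origin = origin_base_url.rstrip("/") + "/"
    if url.startswith(origin):
        return rewrite_base_url.rstrip("/") + "/" + url[len(origin):]
    return url
```

Normalising both bases with `rstrip("/")` means config values work with or without a trailing slash.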
- Core tables: `smartcrop_programimage`, `smartcrop_aspectratioimages`, `smartcrop_role`, `smartcrop_rule`, `smartcrop_user`.
- Lookup tables: `smartcrop_imagetype_lookup`, `smartcrop_imagelanguage_lookup`, `smartcrop_rejectreason_lookup`.
- Reporting views/functions: `view_smartcrop_total`, `view_smartcrop_providerlist_total`, `public.fncroppedimagereports`.
Filewatcher automates pulling provider Excel manifests (S3 or SFTP), parsing the rows, and pushing them into the SmartCrop intake tables.
| Method | Path | Purpose | Workflow Highlights |
|---|---|---|---|
| POST | `/filewatcher/run-local?folder_path=` | Process a local Excel drop folder. | Dispatches to `FileService.run_local_folder`, classifies files by prefix, and persists rows. |
| POST | `/filewatcher/download-s3` | Pull from S3 sources. | Uses `S3Downloader` to pull configured prefixes into the processing directory. |
| POST | `/filewatcher/download-sftp` | Pull from SFTP sources. | Uses `SFTPDownloader` (Paramiko) with proxy support to copy provider files locally. |
| POST | `/filewatcher/batch/run` | Generate a SmartCrop manifest from parsed data. | Wraps `BatchProcess`, which in turn uses `BatchFTPService` for SFTP uploads. |
- `FileService` chooses the processing directory from `processing_middleware.middleware[PROCESSING_MIDDLEWARE]` and classifies files with provider-specific readers (`PressSiteProvider`, `SlingProvider`, `EmailProvider`, etc.).
- Parsed rows become `InputFileModel` objects and are bulk-inserted into `smartcrop_inputfiles`; new programmes are derived into `smartcrop_programimage` with default states.
- The batch sub-module (`core/batch_*`) reads aggregated rows, writes CSV manifests, and uploads them via the configured transports.
- Landing: `smartcrop_upload_files` (download dedupe), `smartcrop_inputfiles`.
- Derived: `smartcrop_programimage`, `smartcrop_aspectratioimages`.
- Batch: view/table names are supplied through the `smartcrop_filewatcher_batchjob` settings (`view_name`, `programimage_table`, `aspectratio_table`, `imagetype_table`, `role_table`).
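Download dedupe keys on the `(filename, created_at)` pair recorded in `smartcrop_upload_files`. The sketch below models that check with an in-memory set; `normalize_created_at` is a hypothetical stand-in for the module's `_normalize_created_at` (here assumed to mean "treat naive timestamps as UTC and drop microseconds"), not its actual behaviour.

```python
from collections.abc import Iterable
from datetime import datetime, timezone


def normalize_created_at(value: datetime) -> datetime:
    """Hypothetical normalisation: UTC, second precision."""
    if value.tzinfo is None:
        value = value.replace(tzinfo=timezone.utc)  # assume naive values are UTC
    return value.astimezone(timezone.utc).replace(microsecond=0)


def select_new_files(
    candidates: Iterable[tuple[str, datetime]], seen: set[tuple[str, datetime]]
) -> list[str]:
    """Return filenames whose (filename, created_at) key is not yet recorded.

    `seen` plays the role of smartcrop_upload_files and is updated in place.
    """
    fresh = []
    for filename, created_at in candidates:
        key = (filename, normalize_created_at(created_at))
        if key in seen:
            continue  # already downloaded: the real module logs this at DEBUG
        seen.add(key)
        fresh.append(filename)
    return fresh
```

An empty result is the case the download endpoints report as `{"status": "no_files"}`.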
- S3 download sources are defined under `app_settings.filewatcher.s3_sources`; uploads use helpers in `src/cloud_services/aws/storage_utils.py`.
- SFTP connections reuse the global `smartcrop_ftp_config`, including optional proxy host/port settings.
- Credentials are stored base64-encoded (`credentials.username_b64`, `credentials.password_b64`) and decoded on demand.
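Decoding the `*_b64` credential fields on demand might look like the sketch below. `FtpCredentials` and `decode_credentials` are hypothetical names; only the `username_b64`/`password_b64` keys come from the configuration described above. Keep in mind that base64 is an encoding, not encryption, so these values are effectively plaintext to anyone who can read the config.

```python
import base64
from dataclasses import dataclass


@dataclass(frozen=True)
class FtpCredentials:
    username: str
    password: str

    def __repr__(self) -> str:
        # Keep the decoded password out of logs and tracebacks.
        return f"FtpCredentials(username={self.username!r}, password='***')"


def decode_credentials(credentials: dict[str, str]) -> FtpCredentials:
    """Decode the base64 credential fields just before opening a session."""
    return FtpCredentials(
        username=base64.b64decode(credentials["username_b64"], validate=True).decode("utf-8"),
        password=base64.b64decode(credentials["password_b64"], validate=True).decode("utf-8"),
    )
```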
- `smartcrop_filewatcher_settings.provider_file_type.group_[abc]` controls the filename prefixes mapped to providers.
- `processing_middleware.local_filesystem.root_path` is the fallback when no middleware override is set.
- `smartcrop_service.cdn.origin_base_url` and `rewrite_base_url` are reused when rewriting asset URLs inside batch CSV rows.
- Every download, parse, and persistence step logs through `src/core/log.py::logger.log_agnostic`, making it easy to trace runs in CloudWatch or the local `logs/` directory.
- Deduplication and parsing decisions emit `DEBUG` logs to help diagnose skipped files or malformed rows.
- Set `DEPLOYMENT_CONFIG__SERVER=dev` and start the API (`python start_app.py`).
- Ensure the SmartCrop tables exist and the DB credentials are valid.
- Drop test Excel files into the configured folder and call `/filewatcher/run-local`.
- Inspect `smartcrop_upload_files`, `smartcrop_inputfiles`, and `smartcrop_programimage` to confirm ingestion.
- `{"status": "no_files"}` from the download endpoints means dedupe rejected all candidates; check `smartcrop_upload_files`.
- Provider mismatch logs indicate that the filename prefix did not align with any configured group.
- Parsing errors usually stem from header mismatches; enable DEBUG logging to inspect per-row failures.
This module builds the nightly manifest that the legacy downstream systems consume.
| Method | Path | Purpose | Workflow Highlights |
|---|---|---|---|
| POST | `/batchjob/run` | Execute the batch export. | `BatchJobRunner` pulls rows via `BatchJobDataService`, writes a timestamped CSV, uploads it to S3 or SFTP, and updates programme rows with the emitted filename. |
- `BatchJobRunner` reads configuration from `smartcrop_filewatcher_batchjob` (output directory, view/table names, row restriction, transfer targets).
- `BatchJobDataService.collect_rows` enriches export rows, rewriting image URLs to the CDN and resolving missing image-type metadata.
- `BatchJobDataAccess.update_output_filenames` stamps `smartcrop_programimage` with the exported filename and handles QA/operator status transitions when QA is disabled.
- Transfers are handled by `S3TransferService` (AWS Boto client) and `SftpTransferService` (Paramiko), both of which log each upload.
- Primary view: `smartcrop_filewatcher_batchjob.view_name` (typically a view over programme images).
- Tables touched: `smartcrop_programimage`, `smartcrop_aspectratioimages`, `smartcrop_imagetype_lookup`, `smartcrop_role`.
- The S3 bucket/prefix reuse the `aws_config.service_accounts[].uat.smartcrop_bucket` path.
- SFTP output and audit directories come from `smartcrop_ftp_config.directories`, with ports supplied by `smartcrop_ftp_config.ports.output|audit`.
- Usernames/passwords are stored base64-encoded in `smartcrop_ftp_config.credentials`.
Dump Data exports ad-hoc CSV snapshots for reporting and auditing.
| Method | Path | Purpose | Workflow Highlights |
|---|---|---|---|
| POST | `/dumpdata/run?date_range={1-30}` | Generate exports. | `DumpDataService.run` executes SQL for programme, aspect, summary, and audit datasets, writes CSVs under `smartcrop_dumpdata.local_path`, then uploads primary files to the SFTP output directory and audit files to the audit directory. |
- Configuration merges `smartcrop_dumpdata` (local paths and filenames) with `smartcrop_ftp_config` (shared FTP credentials).
- SQL sources include `vwProgramImage`, `vwAspectRatioImages`, `smartcrop_reports(date_range)`, and `programimage_audit_reports(date_range)`.
- Each CSV is logged and removed locally after a successful SFTP upload.
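The "remove locally only after a successful upload" rule can be sketched as below. `upload` is a hypothetical callable standing in for the SFTP transfer (for instance, a wrapper around Paramiko's `SFTPClient.put`) that returns `True` on success; the function name is also hypothetical.

```python
import csv
from collections.abc import Callable, Iterable
from pathlib import Path


def export_and_upload(
    rows: Iterable[Iterable],
    header: list[str],
    local_path: Path,
    upload: Callable[[Path], bool],
) -> bool:
    """Write a CSV, hand it to `upload`, and delete it only if the upload succeeded."""
    with local_path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(header)
        writer.writerows(rows)
    if upload(local_path):
        local_path.unlink()  # clean up only once the remote copy exists
        return True
    return False  # keep the file so a failed upload can be retried or inspected
```

Keeping the file on failure trades disk space for recoverability, which suits ad-hoc audit exports.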
- Uses the same SFTP host as Filewatcher/Batch with optional distinct audit port/directory.
- Credentials are decoded from base64 on the fly before opening the Paramiko SFTP session.
- Views/functions: `vwProgramImage`, `vwAspectRatioImages`, `smartcrop_reports`, `programimage_audit_reports`.
- Output filenames are timestamped using the prefix values from configuration (`program`, `aspect`, `summary`, `audit`).
- Python 3.12, 4-space indent, type hints encouraged.
- Ruff is configured in `pyproject.toml` (line length 88, `target-version = "py312"`).
- Prefer Black defaults; keep line length consistent with Ruff (88).
- Naming: `snake_case` files/functions, `PascalCase` classes, `UPPER_SNAKE_CASE` constants.
- Keep handlers thin; put business logic in `src/app/**/core/`.
- Use pytest; place tests under `src/tests/` named `test_*.py`.
- Add unit tests for pure functions and handlers; mock I/O, network, and AWS calls.
- Aim for solid coverage on changed code; include error-path tests.
- Run `pytest -q` locally before submitting.
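A test in this style might mock the I/O boundary with `unittest.mock` and cover the success, empty, and error paths. The function under test and its `download_new_files` method are hypothetical; only the status payloads come from the endpoint docs. Under pytest, the error-path test would normally use `pytest.raises` instead of the manual `try`/`except`.

```python
# src/tests/test_download_status.py (hypothetical example)
from unittest.mock import MagicMock


def download_status(downloader) -> dict:
    """Function under test: map a downloader result to the endpoint payload."""
    files = downloader.download_new_files()
    return {"status": "success"} if files else {"status": "no_files"}


def test_download_status_success():
    downloader = MagicMock()
    downloader.download_new_files.return_value = ["a.xlsx"]
    assert download_status(downloader) == {"status": "success"}
    downloader.download_new_files.assert_called_once_with()


def test_download_status_no_files():
    downloader = MagicMock()
    downloader.download_new_files.return_value = []
    assert download_status(downloader) == {"status": "no_files"}


def test_download_status_propagates_errors():
    downloader = MagicMock()
    downloader.download_new_files.side_effect = ConnectionError("sftp down")
    try:
        download_status(downloader)
    except ConnectionError:
        return
    raise AssertionError("expected ConnectionError")
```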
- Conventional Commits: `feat:`, `fix:`, `docs:`, `refactor:`, `test:`, `chore:`.
- Commits: imperative subject, concise body when needed.
- PRs: clear description, link issues, outline tests, include example requests/responses for new endpoints.
- Configuration loads from `env.yml` and `src/settings/config.yml`. Do not commit secrets; use env vars for sensitive values.
- Set `DEPLOYMENT_CONFIG__SERVER` to `dev` (local) or `aws` (prod).
- Review logs under `logs/`; avoid uploading real data to the repo.
```
tivo-smartcrop-python
| .venv
| .vscode
| logs
| secrets
| src
| | app
| | | | smartcrop
| | auth
| | | __pycache__
| | +-- auth.py
| | cloud_services
| | | aws
| | | | __pycache__
| | | +-- aws_wrapper.py
| | | +-- dynamodb_utils.py
| | | +-- storage_utils.py
| | common
| | | __pycache__
| | +-- common.py
| | +-- context.py
| | +-- exceptions.py
| | +-- responses.py
| | +-- traceback_utils.py
| | +-- validators.py
| | core
| | | orchestration
| | | | __pycache__
| | | +-- api_orchestrator.py
| | | +-- __init__.py
| | | __pycache__
| | +-- env_loader.py
| | +-- log.py
| | data
| | |
| | db
| | | postgresql
| | | | __pycache__
| | | +-- postgresql.py
| | | +-- schema.sql
| | devops
| | | scripts
| | | +-- add_semantic_ver_tag.sh
| | | +-- format_and_lint.sh
| | | +-- reset_dev_env__linux.sh
| | | +-- reset_dev_env__windows.sh
| | | +-- tree_view_powershell.ps1
| | docs
| | | api_documentation
| | handlers
| | | __pycache__
| | +-- generic_handlers.py
| | settings
| | | __pycache__
| | +-- aws_config.py
| | +-- config.py
| | +-- config.yml
| | tests
| | | aws
| | | +-- test_dynamodb.py
| | +-- conftest.py
| | +-- standalone_calls.py
| | +-- test_logging.py
| | +-- test_transaction.py
| | utils
| | | __pycache__
| | +-- transaction.py
| | +-- utils.py
+-- .eslintrc.yml
+-- Api-Dockerfile
+-- env.yml
+-- main.py
+-- package-lock.json
+-- pyproject.toml
+-- README.md
+-- start_app.py
```
Key Directories:
- `app/`: Application-specific modules
- `auth/`: Authentication and authorization logic
- `cloud_services/`: Cloud service integrations
  - `aws/`: AWS service wrappers and utilities (S3, DynamoDB, SQS)
- `common/`: Global shared utilities
  - Exception handling
  - Request context management
  - Response formatting
  - Input validation
- `core/`: Core system components
  - `orchestration/`: API workflow orchestration and management
  - Environment configuration
  - Logging system
- `data/`: Data resources and files
  - `parquet/`: Parquet format data storage
- `db/`: Database operations
  - `postgresql/`: PostgreSQL integration and queries
- `devops/`: DevOps utilities
  - `release_notes/`: Release notes generation utilities
  - `scripts/`: Automation and maintenance scripts
- `docs/`: Documentation
  - `api_documentation/`: API specifications and guides
  - Implementation details
  - Deployment guides
- `handlers/`: Generic request handlers
- `settings/`: Global application configuration
- `tests/`: Test suites and utilities
  - `aws/`: AWS-specific test modules
- `utils/`: General utility functions
| Method | Path | Handler | Description | Notes |
|---|---|---|---|---|
| POST | `/filewatcher/run-local` | `run_local` (`src/app/smartcrop_filewatcher/handlers/ingestion_handlers.py`) | Processes Excel files that already exist on disk. | Requires the query parameter `folder_path`, pointing to a local directory accessible by the service. Returns `{"status": "success"}` or `{"status": "failed"}`. |
| POST | `/filewatcher/download-s3` | `download_s3` | Lists configured S3 prefixes, downloads new Excel files, and records them in `smartcrop_upload_files`. | Returns `{"status": "success"}` when at least one file downloads, otherwise `{"status": "no_files"}`. |
| POST | `/filewatcher/download-sftp` | `download_sftp` | Connects to configured SFTP sources, pulls new Excel files, and records them in `smartcrop_upload_files`. | Returns `{"status": "success"}` or `{"status": "no_files"}` depending on download activity. |
The SmartCrop Filewatcher module ingests provider spreadsheets into the SmartCrop data pipeline. It mirrors legacy .NET services by downloading Excel files from external sources, parsing provider-specific schemas, and persisting normalized records to PostgreSQL tables used by downstream image-processing jobs.
- Acquisition – `S3Downloader` and `SFTPDownloader` copy Excel files into the local processing directory configured under `processing_middleware` and record each successful download in `smartcrop_upload_files`.
- Local Processing – `FileService.run_local_folder` scans the working directory, matches each filename to a provider implementation, and converts rows into `InputFileModel` objects.
- Persistence & Derivation – `DataService.save_data` batches records into `smartcrop_inputfiles`, then derives missing rows for `smartcrop_programimage` so image jobs can reference newly ingested metadata.
- Handlers
  - `src/app/smartcrop_filewatcher/handlers/ingestion_handlers.py` – Exposes the three FastAPI endpoints listed above. Instantiates core services on demand and returns simple JSON status payloads.
- Core
  - `src/app/smartcrop_filewatcher/core/downloaders.py` – Implements `S3Downloader` and `SFTPDownloader`. Uses `boto3` and `paramiko` respectively, ensures idempotency via `smartcrop_upload_files`, and normalizes timestamps through `_normalize_created_at`.
  - `src/app/smartcrop_filewatcher/core/file_service.py` – Orchestrates an ingestion run. Resolves the local staging directory from `app_settings`, picks provider readers based on filename prefixes, and hands parsed records to `DataService`.
  - `src/app/smartcrop_filewatcher/core/data_service.py` – Handles all database writes. Converts provider output into pandas DataFrames, enforces numeric coercions, inserts into `smartcrop_inputfiles`, and back-fills `smartcrop_programimage`.
  - `src/app/smartcrop_filewatcher/core/providers.py` – Contains provider-specific Excel parsers (`PressSiteProvider`, `TMDBProvider`, `SlingProvider`, `EmailProvider`). Each implements `read_excel` and returns a list of `InputFileModel` rows using helper utilities.
  - `src/app/smartcrop_filewatcher/core/typing_aliases.py` – Defines shared type aliases such as `InputList` for improved readability across provider classes.
- Common
  - `src/app/smartcrop_filewatcher/common/constants.py` – Houses reusable constants like `RIGHTS_NAME_DEFAULT` and `IMAGE_NOTES_DEFAULT`.
  - `src/app/smartcrop_filewatcher/common/models.py` – Declares the `InputFileModel` dataclass with the full schema required by SmartCrop ingestion (mirrors the legacy .NET `InputFileModel`).
- Utilities
  - `src/app/smartcrop_filewatcher/utils/excel_utils.py` – Wraps pandas Excel loading (`read_excel_as_dataframe`) and safe cell extraction (`get_column_value`) used by provider readers.
- Integration Points
  - `src/core/log.py` – Supplies the `logger.log_agnostic` interface used throughout the module for structured logging.
  - `src/db/postgresql/postgresql.py` – Exposes `postgresql_obj__xperi_db`, used for deduplication checks (`check_entry_exists`), SQL execution, and bulk inserts.
  - `src/settings/config.py` – Loads `app_settings`, including SmartCrop-specific `processing_middleware` directories and source definitions for S3/SFTP.
  - `src/cloud_services/aws/storage_utils.py` – Provides S3 helpers referenced by `FileService` when additional AWS interactions are required.
- `smartcrop_upload_files` – Tracks downloaded files. Inserted via `_insert_smartcrop_upload_files_if_new`, ensuring each `(filename, created_at)` pair is unique before downloading from S3/SFTP.
- `smartcrop_inputfiles` – Primary landing table for parsed rows. Populated in bulk with `DataService.save_data`.
- `smartcrop_programimage` – Derived table populated with rows that do not yet exist for `inputfileref`. `DataService` appends defaults for state, audit fields, and timestamps to match legacy expectations.
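Deriving `smartcrop_programimage` rows "that do not yet exist for `inputfileref`" is an anti-join. The sketch below does it in memory; the default `status=1`/`currentstate=0` values (chosen to match the queue that `GetImageInfo` reads) and the `createdon`/`createdby` audit fields are assumptions, not the real column set. In SQL the same idea is usually an `INSERT ... SELECT ... WHERE NOT EXISTS (...)`.

```python
from datetime import datetime, timezone
from typing import Any

# Assumed defaults: queued (Status=1) and unlocked (CurrentState=0).
DEFAULT_STATE = {"status": 1, "currentstate": 0}


def derive_programimage_rows(
    inputfiles: list[dict[str, Any]], existing_refs: set[str]
) -> list[dict[str, Any]]:
    """Return new programme-image rows for inputfileref values not yet present."""
    now = datetime.now(timezone.utc)
    seen = set(existing_refs)
    derived = []
    for row in inputfiles:
        ref = row["inputfileref"]
        if ref in seen:
            continue  # already derived earlier, or duplicated within this batch
        seen.add(ref)
        derived.append(
            {"inputfileref": ref, **DEFAULT_STATE, "createdon": now, "createdby": "filewatcher"}
        )
    return derived
```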
- `processing_middleware.middleware[PROCESSING_MIDDLEWARE].root_path` and `folder_name` define where the module stages files locally; `PROCESSING_MIDDLEWARE` defaults to `ec2_instance`.
- `processing_middleware.local_filesystem.root_path` provides the fallback directory used by downloaders when no override is supplied.
- `filewatcher.s3_sources` and `filewatcher.sftp_sources` (within `app_settings`) enumerate source-specific connection details such as prefixes, credentials, and optional proxy settings.
- Each major step logs at `INFO` level (start/end of download cycles, per-file outcomes), with `DEBUG` entries capturing decisions such as deduplication checks and skipped files.
- Errors are recorded via `logger.log_agnostic("ERROR", ...)` alongside JSON-serialized tracebacks when present, making it easier to trace failures in CloudWatch or local log files under `logs/`.
- Start the FastAPI application with `python start_app.py` (ensure `DEPLOYMENT_CONFIG__SERVER=dev`).
- Confirm that the database credentials are valid and the SmartCrop tables (`smartcrop_upload_files`, `smartcrop_inputfiles`, `smartcrop_programimage`) exist.
- Use an HTTP client (e.g., curl, Postman) to invoke the desired endpoint:
  - Trigger downloads from S3 or SFTP to populate the staging directory.
  - Run `/filewatcher/run-local?folder_path=/path/to/staging` to parse and persist Excel content.
- Monitor logs and database tables to verify ingestion results, and troubleshoot using the debug statements outlined above.
- Responses of `{"status": "no_files"}` indicate that no new Excel files passed the deduplication check; inspect `smartcrop_upload_files` to confirm existing entries.
- If provider matching fails, ensure filenames follow the expected prefixes defined in `FileService.group_a`/`group_b`/`group_c`.
- For parsing errors, enable DEBUG logging to review per-row extraction and confirm that column headers align with provider expectations (case-sensitive after trimming).
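The "case-sensitive after trimming" header rule can be illustrated with a safe cell lookup over one parsed row. `safe_cell_value` is a hypothetical stand-in, not the real `excel_utils.get_column_value` signature; it treats `NaN` (what pandas produces for empty cells) and blank strings as missing.

```python
import math
from typing import Any


def safe_cell_value(row: dict[Any, Any], column: str, default: Any = None) -> Any:
    """Look up `column` after trimming header whitespace; matching stays case-sensitive."""
    trimmed = {str(key).strip(): value for key, value in row.items()}
    value = trimmed.get(column.strip(), default)
    if value is None:
        return default
    if isinstance(value, float) and math.isnan(value):
        return default  # empty Excel cell as read by pandas
    if isinstance(value, str):
        value = value.strip()
        return value if value else default
    return value
```

Because `"title"` does not match a `" Title "` header, a casing mismatch surfaces as missing data rather than a silent wrong match.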