amangour01/tivo-smartcrop-python

Repository Guidelines

Project Structure & Modules

  • Source lives under src/.
    • src/app/ application features (transformation, matching, QA, web extraction).
    • src/core/ env loading, logging, orchestration.
    • src/common/ shared errors, responses, validation, context.
    • src/handlers/ HTTP handlers and health checks.
    • src/db/ persistence (e.g., postgresql/).
    • src/cloud_services/ AWS helpers.
    • src/settings/ configs (config.yml, app settings/secrets).
    • src/tests/ pytest suites and fixtures.
  • Dev scripts in src/devops/scripts/ (format, lint, reset env).
  • Entrypoints: start_app.py (preferred) and main.py (FastAPI app).

Build, Test, Run

  • Install (Python 3.12+): pip install -e .
  • Run API (auto-reload in dev):
    • PowerShell: $env:DEPLOYMENT_CONFIG__SERVER='dev'; python .\start_app.py
    • Bash: export DEPLOYMENT_CONFIG__SERVER=dev && python start_app.py
  • Tests: pytest -q (uses src/tests/, asyncio auto mode).
  • Lint/format (Unix): bash src/devops/scripts/format_and_lint.sh
    • Or individually: ruff check . and black .

Component Guides

The SmartCrop solution in src/app/ is split into four cooperating components. Each section below lists the public endpoints, highlights the key business flow, and calls out the database tables, FTP/S3 touchpoints, and supporting services you will interact with most often.

SmartCrop API (src/app/smartcrop)

This FastAPI surface mirrors the legacy .NET SmartCrop controllers. Query parameter names are intentionally case-sensitive and JSON payloads retain PascalCase fields defined in src/app/smartcrop/common/models.py.

Endpoints

Image Workflow (/image)

| Method | Path | Purpose | Workflow Highlights |
| --- | --- | --- | --- |
| GET | /image/GetImageInfo?providerName={name}&user={id} | Locks the next batch of rows for a provider. | Selects rows from smartcrop_programimage in Status=1/CurrentState=0, stamps CurrentState=1, records locker, and streams the image (rewriting S3 URLs to CDN). |
| POST | /image/SaveCropedImages | Persists cropped renditions from Operators. | Updates smartcrop_programimage status/state, pushes base64 payloads to S3, and upserts smartcrop_aspectratioimages rows with crop dimensions. |
| GET | /image/GetCroppedImageInfo?programId=&user= | Loads programme records for QA review. | Re-locks programme rows, hydrates aspect ratio images, overlays lookups (type/language/reject reason), and downloads crops back as base64 when available. |
| GET | /image/DeleteCroppedImageInfo?programId=&id=&imageid= | Removes a single cropped rendition. | Deletes smartcrop_aspectratioimages by (programid, aspectratioid, imageid) after confirming the lock. |
| GET | /image/UpdateStatus?programId=&status=&currentState= | Finishes or resets a programme. | Bulk updates smartcrop_programimage rows scoped to link_object_id when they are still CurrentState=1. |
| GET | /image/UpdateUserLockStatus?userId= | Releases abandoned locks. | Finds rows owned by the user in CurrentState=1, restores CurrentState to the previous state or fresh queue. |
| POST | /image/UpdateCroppedImageInfo | Bulk updates crop metadata. | Synchronises parent smartcrop_programimage status/state and upserts aspect-ratio rows, re-uploading images to S3 when a payload is present. |
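
The lock and release steps described for GetImageInfo and UpdateUserLockStatus can be sketched in memory as below. This is only an illustration under assumptions: the ProgramImageRow class, its field names, the batch_size parameter, and both function names are hypothetical, and the real service works against smartcrop_programimage in PostgreSQL. The release step here always returns rows to the fresh queue (CurrentState=0), which is only one of the two outcomes the table mentions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ProgramImageRow:
    """Hypothetical in-memory stand-in for a smartcrop_programimage row."""
    id: int
    provider: str
    status: int
    current_state: int
    locked_by: Optional[str] = None


def lock_next_batch(rows: List[ProgramImageRow], provider: str, user: str,
                    batch_size: int = 2) -> List[ProgramImageRow]:
    """Lock up to batch_size queued rows (Status=1, CurrentState=0) for a user."""
    locked = []
    for row in rows:
        if len(locked) == batch_size:
            break
        if row.provider == provider and row.status == 1 and row.current_state == 0:
            row.current_state = 1  # stamp the row as "in progress"
            row.locked_by = user   # record who holds the lock
            locked.append(row)
    return locked


def release_user_locks(rows: List[ProgramImageRow], user: str) -> int:
    """Return every row locked by the user to the fresh queue; report the count."""
    released = 0
    for row in rows:
        if row.current_state == 1 and row.locked_by == user:
            row.current_state = 0
            row.locked_by = None
            released += 1
    return released
```

In a database-backed version the select-and-stamp step would need to run atomically so two operators cannot lock the same row.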

Dashboard (/dashboard)

| Method | Path | Purpose | Workflow Highlights |
| --- | --- | --- | --- |
| GET | /dashboard/Totals | Dashboard counters. | Reads from view_smartcrop_total and maps to the Total Pydantic model. |
| GET | /dashboard/ProviderList | Paginated provider metrics. | Pulls records from view_smartcrop_providerlist_total, slices in memory, and formats as Providers. |
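
The "slices in memory" step for ProviderList might look like the following minimal sketch. The function name, the 1-based page convention, and the shape of the returned dictionary are assumptions for illustration; the actual handler formats its result as Providers.

```python
from typing import Any, Dict, List


def paginate(rows: List[Any], page: int, page_size: int) -> Dict[str, Any]:
    """Slice an already-loaded list of rows into one page (pages are 1-based)."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    start = (page - 1) * page_size
    return {
        "items": rows[start:start + page_size],  # empty list past the last page
        "page": page,
        "page_size": page_size,
        "total": len(rows),
    }
```

Slicing in memory keeps the query simple, at the cost of loading every row from the view on each request.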

Lookup (/lookup)

| Method | Path | Purpose | Workflow Highlights |
| --- | --- | --- | --- |
| GET | /lookup/GetImageTypeLookup | Image type picklist. | SELECT id,title FROM smartcrop_imagetype_lookup ORDER BY id. |
| GET | /lookup/GetImageLanguageLookup | Language picklist. | SELECT id,title FROM smartcrop_imagelanguage_lookup ORDER BY id. |
| GET | /lookup/GetRejectReasonLookup | Reject reason picklist. | SELECT id,reason FROM smartcrop_rejectreason_lookup ORDER BY id. |

Reports (/report)

| Method | Path | Purpose | Workflow Highlights |
| --- | --- | --- | --- |
| GET | /report/CroppedImageReport | Paginated activity report. | Calls public.fncroppedimagereports(...), wraps rows in CroppedImageReport, and returns paging metadata. |
| GET | /report/CroppedImageReportDownLoad | XLSX export. | Calls the same function with a high page size, renders an openpyxl workbook, and streams it. |

Roles (/role)

| Method | Path | Purpose | Workflow Highlights |
| --- | --- | --- | --- |
| POST | /role/Create | Add a role. | Inserts into smartcrop_role with default audit fields. |
| POST | /role/Delete | Remove a role. | Deletes by name from smartcrop_role. |
| POST | /role/Update | Rename role. | Updates smartcrop_role.role. |
| GET | /role/Get | List non-active roles. | Filters smartcrop_role where status <> 'active'. |
| GET | /role/QCdisable | Toggle QA availability. | Blocks activation when QA work is in progress; otherwise flips smartcrop_role.status for qa. |
| GET | /role/QCdisablestatus | QA status. | Reads the qa record from smartcrop_role. |

Rules (/rule)

| Method | Path | Purpose | Workflow Highlights |
| --- | --- | --- | --- |
| POST | /rule/Create | Activate a rule. | Inserts smartcrop_rule with isactive=1. |
| POST | /rule/Delete | Soft delete. | Sets isactive=0 in smartcrop_rule. |
| POST | /rule/Update | Edit rule. | Updates status, percentage, isactive fields. |
| GET | /rule/Get | List active rules. | Selects smartcrop_rule WHERE isactive=1. |

Users (/user)

| Method | Path | Purpose | Workflow Highlights |
| --- | --- | --- | --- |
| POST | /user/Register | Create user. | Validates uniqueness, hashes password, resolves role ID, inserts smartcrop_user. |
| POST | /user/Update | Update profile. | Allows password rotation, role change, and isactive toggle. |
| POST | /user/UpdatePassword | Reset password. | Updates smartcrop_user password and salt by email. |
| POST | /user/Login | Authenticate. | Looks up smartcrop_user joined to smartcrop_role, issues base64 token. |
| POST | /user/UserRoleDetails?EmailID= | Fetch role options. | Joins user and role tables excluding status='active'. |
| GET | /user/UserList | List users. | Returns all smartcrop_user rows with role titles. |
| POST | /user/Delete | Soft delete. | Marks smartcrop_user.isactive=0. |
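
The table mentions that registration hashes the password and that a salt is stored alongside it, but not which algorithm is used. As an illustration only, a salted hash with the standard library could be sketched as below; PBKDF2-HMAC-SHA256, the iteration count, and both function names are assumptions, not the project's confirmed scheme.

```python
import hashlib
import hmac
import secrets
from typing import Optional, Tuple

# Assumed work factor for this sketch; tune to your own requirements.
ITERATIONS = 200_000


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest); a fresh random salt is generated when none is given."""
    salt = salt if salt is not None else secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)
```

Storing the salt next to the digest, as the UpdatePassword row suggests, is what allows the verification step to recompute the same digest at login.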

All JSON responses (other than the report download stream) are wrapped in the shared ResponseModel.

Business Flow & Integrations

  • Services live in src/app/smartcrop/core/services.py. Handlers only orchestrate dependency injection and response wrapping.
  • Image workflows keep smartcrop_programimage and smartcrop_aspectratioimages in sync, driving Operator → QA → Completed state transitions while logging every lock/unlock.
  • Cropped images are uploaded to an S3 bucket resolved from aws_config.service_accounts[].uat.smartcrop_bucket; URLs are rewritten to smartcrop_service.cdn.rewrite_base_url for public use.
  • Lookup data is read through simple SELECT statements against reference tables; reports and dashboard numbers rely on database views/functions.
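
The URL rewrite from the S3 origin to the CDN can be sketched as a simple prefix swap. The function name and the example hosts are hypothetical; the two base URLs stand in for the smartcrop_service.cdn.origin_base_url and rewrite_base_url settings.

```python
def rewrite_to_cdn(url: str, origin_base_url: str, rewrite_base_url: str) -> str:
    """Swap the origin prefix for the CDN prefix; leave other URLs untouched."""
    origin = origin_base_url.rstrip("/")
    # Match only on a full path-segment boundary so that a URL such as
    # "https://bucket.example.com.evil/x" is not treated as the origin.
    if url == origin or url.startswith(origin + "/"):
        return rewrite_base_url.rstrip("/") + url[len(origin):]
    return url
```

For example, with an origin of https://bucket.s3.example.com/ and a rewrite base of https://cdn.example.com, the URL https://bucket.s3.example.com/crops/1.jpg becomes https://cdn.example.com/crops/1.jpg.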

Tables, Views & Functions

  • Core tables: smartcrop_programimage, smartcrop_aspectratioimages, smartcrop_role, smartcrop_rule, smartcrop_user.
  • Lookup tables: smartcrop_imagetype_lookup, smartcrop_imagelanguage_lookup, smartcrop_rejectreason_lookup.
  • Reporting views/functions: view_smartcrop_total, view_smartcrop_providerlist_total, public.fncroppedimagereports.

SmartCrop Filewatcher (src/app/smartcrop_filewatcher)

Filewatcher automates pulling provider Excel manifests (S3 or SFTP), parsing the rows, and pushing them into the SmartCrop intake tables.

Endpoints

| Method | Path | Purpose | Workflow Highlights |
| --- | --- | --- | --- |
| POST | /filewatcher/run-local?folder_path= | Process local Excel drop folder. | Dispatches to FileService.run_local_folder, classifies files by prefix, and persists rows. |
| POST | /filewatcher/download-s3 | Pull from S3 sources. | Uses S3Downloader to pull configured prefixes into the processing directory. |
| POST | /filewatcher/download-sftp | Pull from SFTP sources. | Uses SFTPDownloader (Paramiko) with proxy support to copy provider files locally. |
| POST | /filewatcher/batch/run | Generate SmartCrop manifest from parsed data. | Wraps BatchProcess, which in turn uses BatchFTPService for SFTP uploads. |

Workflow Overview

  • FileService chooses the processing directory from processing_middleware.middleware[PROCESSING_MIDDLEWARE] and classifies files with provider-specific readers (PressSiteProvider, SlingProvider, EmailProvider, etc.).
  • Parsed rows become InputFileModel objects and are bulk inserted into smartcrop_inputfiles; new programmes are derived into smartcrop_programimage with default states.
  • Batch sub-module (core/batch_*) reads aggregated rows, writes CSV manifests, and uploads them via configured transports.
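
Classifying files by filename prefix, as FileService does before choosing a provider reader, could be sketched like this. The prefixes in EXAMPLE_GROUPS, the function name, and the case-insensitive comparison are all assumptions; the real prefixes come from configuration.

```python
from typing import Dict, Optional, Sequence

# Hypothetical prefixes for illustration; real values live in configuration.
EXAMPLE_GROUPS: Dict[str, Sequence[str]] = {
    "group_a": ("press_",),
    "group_b": ("sling_",),
    "group_c": ("email_",),
}


def classify_file(filename: str, groups: Dict[str, Sequence[str]]) -> Optional[str]:
    """Return the first group whose prefix matches the filename, else None."""
    name = filename.lower()
    for group, prefixes in groups.items():
        if any(name.startswith(prefix.lower()) for prefix in prefixes):
            return group
    return None  # caller can log a provider mismatch and skip the file
```

Returning None rather than raising lets a run continue past unrecognised files while still leaving a trace in the logs.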

Tables & Views

  • Landing: smartcrop_upload_files (download dedupe), smartcrop_inputfiles.
  • Derived: smartcrop_programimage, smartcrop_aspectratioimages.
  • Batch: View/table names are supplied through smartcrop_filewatcher_batchjob settings (view_name, programimage_table, aspectratio_table, imagetype_table, role_table).

FTP & Storage Notes

  • S3 download sources are defined under app_settings.filewatcher.s3_sources; uploads use helpers in src/cloud_services/aws/storage_utils.py.
  • SFTP connections reuse the global smartcrop_ftp_config, including optional proxy host/port settings.
  • Credentials are stored base64-encoded (credentials.username_b64, credentials.password_b64) and decoded on demand.
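
Decoding the base64-encoded credentials on demand can be sketched as follows. The function name and the plain dictionary input are assumptions; only the username_b64 and password_b64 key names come from the notes above.

```python
import base64
from typing import Mapping, Tuple


def decode_credentials(credentials: Mapping[str, str]) -> Tuple[str, str]:
    """Decode username_b64/password_b64 into plain strings just before use."""
    username = base64.b64decode(credentials["username_b64"]).decode("utf-8")
    password = base64.b64decode(credentials["password_b64"]).decode("utf-8")
    return username, password
```

Base64 is an encoding rather than encryption, so the encoded values should still be treated as secrets.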

Configuration Keys

  • smartcrop_filewatcher_settings.provider_file_type.group_[abc] controls filename prefixes mapped to providers.
  • processing_middleware.local_filesystem.root_path is the fallback when no middleware override is set.
  • smartcrop_service.cdn.origin_base_url and rewrite_base_url are reused when rewriting asset URLs inside batch CSV rows.

Logging & Monitoring

  • Every download, parse, and persistence step logs through src/core/log.py::logger.log_agnostic, making it easy to trace runs in CloudWatch or local logs/.
  • Deduplication and parsing decisions emit DEBUG logs to help diagnose skipped files or malformed rows.

Running Locally

  1. Set DEPLOYMENT_CONFIG__SERVER=dev and start the API (python start_app.py).
  2. Ensure SmartCrop tables exist and DB credentials are valid.
  3. Drop test Excel files into the configured folder and call /filewatcher/run-local.
  4. Inspect smartcrop_upload_files, smartcrop_inputfiles, and smartcrop_programimage to confirm ingestion.

Troubleshooting Tips

  • {"status": "no_files"} from download endpoints means dedupe rejected all candidates; check smartcrop_upload_files.
  • Provider mismatch logs indicate the filename prefix did not align with any configured group.
  • Parsing errors usually stem from header mismatches; enable DEBUG logging to inspect per-row failures.

SmartCrop Batch Job (src/app/smartcrop_batchjob)

This module builds the nightly manifest that the legacy downstream systems consume.

Endpoint

| Method | Path | Purpose | Workflow Highlights |
| --- | --- | --- | --- |
| POST | /batchjob/run | Execute batch export. | BatchJobRunner pulls rows via BatchJobDataService, writes a timestamped CSV, uploads to S3 or SFTP, and updates programme rows with the emitted filename. |

Workflow Overview

  • BatchJobRunner reads configuration from smartcrop_filewatcher_batchjob (output directory, view/table names, row restriction, transfer targets).
  • BatchJobDataService.collect_rows enriches export rows, rewriting image URLs to the CDN and resolving missing image type metadata.
  • BatchJobDataAccess.update_output_filenames stamps smartcrop_programimage with the exported filename and handles QA/operator status transitions when QA is disabled.
  • Transfers are handled by S3TransferService (AWS Boto client) and SftpTransferService (Paramiko), both logging each upload.
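
Writing the timestamped CSV manifest could look like the sketch below. The function name, the smartcrop_batch prefix, the timestamp format, and the column names used in any example rows are assumptions; the real output directory and naming come from smartcrop_filewatcher_batchjob.

```python
import csv
from datetime import datetime, timezone
from pathlib import Path
from typing import Dict, List, Optional


def write_manifest(rows: List[Dict[str, object]], output_dir: str,
                   prefix: str = "smartcrop_batch",
                   now: Optional[datetime] = None) -> Path:
    """Write rows to <prefix>_<UTC timestamp>.csv and return the file path."""
    if not rows:
        raise ValueError("no rows to export")
    now = now or datetime.now(timezone.utc)
    path = Path(output_dir) / f"{prefix}_{now:%Y%m%d_%H%M%S}.csv"
    # newline="" lets the csv module control line endings itself.
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
    return path
```

Returning the path gives the caller the filename it needs to stamp back onto the exported programme rows.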

Tables & Views

  • Primary view: smartcrop_filewatcher_batchjob.view_name (typically a view over programme images).
  • Tables touched: smartcrop_programimage, smartcrop_aspectratioimages, smartcrop_imagetype_lookup, smartcrop_role.

FTP & Storage Notes

  • S3 bucket/prefix reuse the aws_config.service_accounts[].uat.smartcrop_bucket path.
  • SFTP output and audit directories come from smartcrop_ftp_config.directories with ports supplied by smartcrop_ftp_config.ports.output|audit.
  • Usernames/passwords are stored base64 encoded in smartcrop_ftp_config.credentials.

SmartCrop Dump Data (src/app/smartcrop_dumpdata)

Dump Data exports ad-hoc CSV snapshots for reporting and auditing.

Endpoint

| Method | Path | Purpose | Workflow Highlights |
| --- | --- | --- | --- |
| POST | /dumpdata/run?date_range={1-30} | Generate exports. | DumpDataService.run executes SQL for programme, aspect, summary, and audit datasets, writes CSVs under smartcrop_dumpdata.local_path, then uploads primary files to the SFTP output directory and audit files to the audit directory. |

Workflow Overview

  • Configuration merges smartcrop_dumpdata (local paths & filenames) with smartcrop_ftp_config (shared FTP credentials).
  • SQL sources include vwProgramImage, vwAspectRatioImages, smartcrop_reports(date_range), and programimage_audit_reports(date_range).
  • Each CSV is logged and removed locally after successful SFTP upload.
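
The "remove locally only after a successful upload" rule can be sketched with an injected upload callable. The function name and the callable signature are assumptions; in the real module the upload goes through a Paramiko SFTP session and results are logged rather than printed.

```python
from pathlib import Path
from typing import Callable, Iterable, List


def upload_and_cleanup(paths: Iterable[Path],
                       upload: Callable[[Path], None]) -> List[Path]:
    """Upload each file; delete the local copy only when the upload succeeded."""
    uploaded = []
    for path in paths:
        try:
            upload(path)
        except Exception as exc:  # keep the file so the run can be retried
            print(f"upload failed, keeping {path}: {exc}")
            continue
        path.unlink()
        uploaded.append(path)
    return uploaded
```

Keeping failed files on disk means a later run can pick them up again without regenerating the export.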

FTP & Storage Notes

  • Uses the same SFTP host as Filewatcher/Batch with optional distinct audit port/directory.
  • Credentials are decoded from base64 on the fly before opening the Paramiko SFTP session.

Tables & Views

  • Views/functions: vwProgramImage, vwAspectRatioImages, smartcrop_reports, programimage_audit_reports.
  • Output filenames are timestamped using the prefix values from configuration (program, aspect, summary, audit).

Coding Style & Naming

  • Python 3.12, 4-space indent, type hints encouraged.
  • Ruff is configured in pyproject.toml (line length 88, target-version = py312).
  • Prefer Black defaults; keep line length consistent with Ruff (88).
  • Naming: snake_case files/functions, PascalCase classes, UPPER_SNAKE_CASE constants.
  • Keep handlers thin; business logic in src/app/**/core/.

Testing Guidelines

  • Use pytest; place tests under src/tests/ named test_*.py.
  • Add unit tests for pure functions and handlers; mock I/O, network, AWS.
  • Aim for solid coverage on changed code; include error-path tests.
  • Run locally with pytest -q before submitting.
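
A test module following these guidelines might look like the sketch below, which mocks the AWS client and covers an error path. The upload_crop function is a hypothetical unit under test defined inline for illustration; put_object with Bucket, Key, and Body is the boto3 S3 client call being mocked. In a real suite the error-path test would more idiomatically use pytest.raises.

```python
from unittest.mock import MagicMock


def upload_crop(s3_client, bucket: str, key: str, data: bytes) -> str:
    """Hypothetical unit under test: push one crop to S3 and return its URI."""
    if not data:
        raise ValueError("empty payload")
    s3_client.put_object(Bucket=bucket, Key=key, Body=data)
    return f"s3://{bucket}/{key}"


def test_upload_crop_calls_s3():
    client = MagicMock()  # stands in for a boto3 S3 client; no network access
    assert upload_crop(client, "bucket", "crops/1.jpg", b"abc") == "s3://bucket/crops/1.jpg"
    client.put_object.assert_called_once_with(
        Bucket="bucket", Key="crops/1.jpg", Body=b"abc"
    )


def test_upload_crop_rejects_empty_payload():
    client = MagicMock()
    try:
        upload_crop(client, "bucket", "crops/1.jpg", b"")
    except ValueError:
        pass  # expected error path
    else:
        raise AssertionError("expected ValueError for empty payload")
    client.put_object.assert_not_called()
```

Saved as a test_*.py file under src/tests/, both functions are collected by pytest -q.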

Commit & Pull Requests

  • Conventional Commits: feat:, fix:, docs:, refactor:, test:, chore:.
  • Commits: imperative subject, concise body when needed.
  • PRs: clear description, link issues, outline tests, include example requests/responses for new endpoints.

Security & Config

  • Configuration loads from env.yml and src/settings/config.yml. Do not commit secrets; use env vars for sensitive values.
  • Set DEPLOYMENT_CONFIG__SERVER to dev (local) or aws (prod).
  • Review logs under logs/; avoid uploading real data to the repo.
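
The double underscore in DEPLOYMENT_CONFIG__SERVER suggests a nested-key override convention, although the loader's actual behaviour is not documented here. Purely as an assumption, such an override could be applied as sketched below; the function name, the lower-casing of keys, and the rule that only existing sections are overridden are all hypothetical.

```python
import copy
from typing import Any, Dict, Mapping


def apply_env_overrides(config: Dict[str, Any], environ: Mapping[str, str],
                        delimiter: str = "__") -> Dict[str, Any]:
    """Return a copy of config with SECTION__KEY variables applied as overrides."""
    result = copy.deepcopy(config)
    for name, value in environ.items():
        if delimiter not in name:
            continue  # ordinary variables such as PATH are ignored
        keys = [part.lower() for part in name.split(delimiter)]
        node = result
        for key in keys[:-1]:
            if not isinstance(node.get(key), dict):
                break  # unknown section: skip rather than create new keys
            node = node[key]
        else:
            node[keys[-1]] = value
    return result
```

With a loaded config of {"deployment_config": {"server": "aws"}} and DEPLOYMENT_CONFIG__SERVER=dev in the environment, the returned copy would carry server set to dev while the original stays unchanged.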

Folder Structure

tivo-smartcrop-python
|   .venv
|   .vscode
|   logs
|   secrets
|   src
|   |   app
|   |   |   smartcrop
|   |   auth
|   |   |   __pycache__
|   |   +-- auth.py
|   |   cloud_services
|   |   |   aws
|   |   |   |   __pycache__
|   |   |   +-- aws_wrapper.py
|   |   |   +-- dynamodb_utils.py
|   |   |   +-- storage_utils.py
|   |   common
|   |   |   __pycache__
|   |   +-- common.py
|   |   +-- context.py
|   |   +-- exceptions.py
|   |   +-- responses.py
|   |   +-- traceback_utils.py
|   |   +-- validators.py
|   |   core
|   |   |   orchestration
|   |   |   |   __pycache__
|   |   |   +-- api_orchestrator.py
|   |   |   +-- __init__.py
|   |   |   __pycache__
|   |   +-- env_loader.py
|   |   +-- log.py
|   |   data
|   |   |
|   |   db
|   |   |   postgresql
|   |   |   |   __pycache__
|   |   |   +-- postgresql.py
|   |   |   +-- schema.sql
|   |   devops
|   |   |   scripts
|   |   |   +-- add_semantic_ver_tag.sh
|   |   |   +-- format_and_lint.sh
|   |   |   +-- reset_dev_env__linux.sh
|   |   |   +-- reset_dev_env__windows.sh
|   |   |   +-- tree_view_powershell.ps1
|   |   docs
|   |   |   api_documentation
|   |   handlers
|   |   |   __pycache__
|   |   +-- generic_handlers.py
|   |   settings
|   |   |   __pycache__
|   |   +-- aws_config.py
|   |   +-- config.py
|   |   +-- config.yml
|   |   tests
|   |   |   aws
|   |   |   +-- test_dynamodb.py
|   |   +-- conftest.py
|   |   +-- standalone_calls.py
|   |   +-- test_logging.py
|   |   +-- test_transaction.py
|   |   utils
|   |   |   __pycache__
|   |   +-- transaction.py
|   |   +-- utils.py
+-- .eslintrc.yml
+-- Api-Dockerfile
+-- env.yml
+-- main.py
+-- package-lock.json
+-- pyproject.toml
+-- README.md
+-- start_app.py

Key Directories:

  • app/: Application-specific modules

  • auth/: Authentication and authorization logic

  • cloud_services/: Cloud service integrations

    • aws/: AWS service wrappers and utilities (S3, DynamoDB, SQS)
  • common/: Global shared utilities

    • Exception handling
    • Request context management
    • Response formatting
    • Input validation
  • core/: Core system components

    • orchestration/: API workflow orchestration and management
    • Environment configuration
    • Logging system
  • data/: Data resources and files

    • parquet/: Parquet format data storage
  • db/: Database operations

    • postgresql/: PostgreSQL integration and queries
  • devops/: DevOps utilities

    • release_notes/: Release notes generation utilities
    • scripts/: Automation and maintenance scripts
  • docs/: Documentation

    • api_documentation/: API specifications and guides
    • Implementation details
    • Deployment guides
  • handlers/: Generic request handlers

  • settings/: Global application configuration

  • tests/: Test suites and utilities

    • aws/: AWS-specific test modules
  • utils/: General utility functions

API Endpoints

SmartCrop Filewatcher

| Method | Path | Handler | Description | Notes |
| --- | --- | --- | --- | --- |
| POST | /filewatcher/run-local | run_local (src/app/smartcrop_filewatcher/handlers/ingestion_handlers.py) | Process Excel files that already exist on disk. | Requires query parameter folder_path pointing to a local directory accessible by the service. Returns {"status": "success"} or {"status": "failed"}. |
| POST | /filewatcher/download-s3 | download_s3 | Lists configured S3 prefixes, downloads new Excel files, and records them in smartcrop_upload_files. | Response is {"status": "success"} when at least one file downloads, otherwise {"status": "no_files"}. |
| POST | /filewatcher/download-sftp | download_sftp | Connects to configured SFTP sources, pulls new Excel files, and records them in smartcrop_upload_files. | Response is {"status": "success"} or {"status": "no_files"} depending on download activity. |

SmartCrop Filewatcher Module

Purpose and Context

The SmartCrop Filewatcher module ingests provider spreadsheets into the SmartCrop data pipeline. It mirrors legacy .NET services by downloading Excel files from external sources, parsing provider-specific schemas, and persisting normalized records to PostgreSQL tables used by downstream image-processing jobs.

End-to-End Workflow

  1. Acquisition: S3Downloader and SFTPDownloader copy Excel files into the local processing directory configured under processing_middleware and record each successful download in smartcrop_upload_files.
  2. Local Processing: FileService.run_local_folder scans the working directory, matches each filename to a provider implementation, and converts rows into InputFileModel objects.
  3. Persistence & Derivation: DataService.save_data batches records into smartcrop_inputfiles, then derives missing rows for smartcrop_programimage so image jobs can reference newly ingested metadata.

Module Breakdown

  • Handlers
    • src/app/smartcrop_filewatcher/handlers/ingestion_handlers.py – Exposes the three FastAPI endpoints listed above. Instantiates core services on demand and returns simple JSON status payloads.
  • Core
    • src/app/smartcrop_filewatcher/core/downloaders.py – Implements S3Downloader and SFTPDownloader. Uses boto3 and paramiko respectively, ensures idempotency via smartcrop_upload_files, and normalizes timestamps through _normalize_created_at.
    • src/app/smartcrop_filewatcher/core/file_service.py – Orchestrates an ingestion run. Resolves the local staging directory from app_settings, picks provider readers based on filename prefixes, and hands parsed records to DataService.
    • src/app/smartcrop_filewatcher/core/data_service.py – Handles all database writes. Converts provider output into pandas DataFrames, enforces numeric coercions, inserts into smartcrop_inputfiles, and back-fills smartcrop_programimage.
    • src/app/smartcrop_filewatcher/core/providers.py – Contains provider-specific Excel parsers (PressSiteProvider, TMDBProvider, SlingProvider, EmailProvider). Each implements read_excel and returns a list of InputFileModel rows using helper utilities.
    • src/app/smartcrop_filewatcher/core/typing_aliases.py – Defines shared type aliases such as InputList for improved readability across provider classes.
  • Common
    • src/app/smartcrop_filewatcher/common/constants.py – Houses reusable constants like RIGHTS_NAME_DEFAULT and IMAGE_NOTES_DEFAULT.
    • src/app/smartcrop_filewatcher/common/models.py – Declares the InputFileModel dataclass with the full schema required by SmartCrop ingestion (mirrors legacy .NET InputFileModel).
  • Utilities
    • src/app/smartcrop_filewatcher/utils/excel_utils.py – Wraps pandas Excel loading (read_excel_as_dataframe) and safe cell extraction (get_column_value) used by provider readers.
  • Integration Points
    • src/core/log.py – Supplies the logger.log_agnostic interface leveraged throughout the module for structured logging.
    • src/db/postgresql/postgresql.py – Exposes postgresql_obj__xperi_db, used for deduplication checks (check_entry_exists), SQL execution, and bulk inserts.
    • src/settings/config.py – Loads app_settings, including SmartCrop-specific processing_middleware directories and source definitions for S3/SFTP.
    • src/cloud_services/aws/storage_utils.py – Provides S3 helpers referenced by FileService when additional AWS interactions are required.

Data Persistence

  • smartcrop_upload_files – Tracks downloaded files. Inserted via _insert_smartcrop_upload_files_if_new, ensuring each (filename, created_at) pair is unique before downloading from S3/SFTP.
  • smartcrop_inputfiles – Primary landing table for parsed rows. Populated in bulk with DataService.save_data.
  • smartcrop_programimage – Derived table populated with rows that do not yet exist for inputfileref. DataService appends defaults for state, audit fields, and timestamps to match legacy expectations.
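
The (filename, created_at) deduplication on smartcrop_upload_files can be illustrated with an in-memory SQLite database. SQLite is swapped in here only so the sketch is self-contained; the project uses PostgreSQL, and the function names, column types, and the INSERT OR IGNORE approach are assumptions rather than the module's actual implementation.

```python
import sqlite3


def create_tracking_table(conn: sqlite3.Connection) -> None:
    """Create a stand-in for smartcrop_upload_files with a uniqueness rule."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS smartcrop_upload_files ("
        " filename TEXT NOT NULL,"
        " created_at TEXT NOT NULL,"
        " UNIQUE (filename, created_at))"
    )


def record_if_new(conn: sqlite3.Connection, filename: str, created_at: str) -> bool:
    """Insert the pair if unseen; return True when the file should be downloaded."""
    cursor = conn.execute(
        "INSERT OR IGNORE INTO smartcrop_upload_files (filename, created_at)"
        " VALUES (?, ?)",
        (filename, created_at),
    )
    conn.commit()
    return cursor.rowcount == 1  # 0 means the pair already existed
```

A caller would download only when record_if_new returns True, which is what makes repeated download runs idempotent.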

Configuration Keys

  • processing_middleware.middleware[PROCESSING_MIDDLEWARE].root_path and folder_name define where the module stages files locally; PROCESSING_MIDDLEWARE defaults to ec2_instance.
  • processing_middleware.local_filesystem.root_path provides the fallback directory used by downloaders when no override is supplied.
  • filewatcher.s3_sources and filewatcher.sftp_sources (within app_settings) enumerate source-specific connection details such as prefixes, credentials, and optional proxy settings.

Logging and Monitoring

  • Each major step logs at INFO level (start/end of download cycles, per-file outcomes) with DEBUG entries capturing decisions like deduplication checks and skipped files.
  • Errors are recorded via logger.log_agnostic("ERROR", ...) alongside JSON-serialized tracebacks when present, making it easier to trace failures in CloudWatch or local log files under logs/.

Running Locally

  1. Start the FastAPI application through python start_app.py (ensure DEPLOYMENT_CONFIG__SERVER=dev).
  2. Confirm database credentials and SmartCrop tables (smartcrop_upload_files, smartcrop_inputfiles, smartcrop_programimage) exist.
  3. Use an HTTP client (e.g., curl, Postman) to invoke the desired endpoint:
    • Trigger downloads from S3 or SFTP to populate the staging directory.
    • Run /filewatcher/run-local?folder_path=/path/to/staging to parse and persist Excel content.
  4. Monitor logs and database tables to verify ingestion results and troubleshoot using the debug statements outlined above.

Troubleshooting Tips

  • Responses of {"status": "no_files"} indicate that no new Excel files passed the deduplication check; inspect smartcrop_upload_files to confirm existing entries.
  • If provider matching fails, ensure filenames follow the expected prefixes defined in FileService.group_a/group_b/group_c.
  • For parsing errors, enable DEBUG logging to review per-row extraction and confirm column headers align with provider expectations (case-sensitive after trimming).
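
The "case-sensitive after trimming" header rule in the last tip can be illustrated with a small lookup helper. The function name and the dictionary-per-row representation are assumptions; the module's own helper is get_column_value in excel_utils.py, whose exact behaviour is not shown here.

```python
from typing import Any, Mapping


def get_trimmed_value(row: Mapping[str, Any], header: str, default: Any = None) -> Any:
    """Look up a cell by header, ignoring surrounding whitespace but not case."""
    normalized = {str(key).strip(): value for key, value in row.items()}
    return normalized.get(header.strip(), default)
```

Under this rule a spreadsheet column named " Title " matches the expected header "Title", while "title" does not, which is a common cause of silently empty fields.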
