Data Handler

A comprehensive data scraping, processing, and integration system built with Python and AWS Lambda. The project consists of multiple specialized modules for web scraping, data transformation, and API integration with the Notion and Prom.ua platforms.

Project Overview

Data Handler is a multi-purpose data processing system designed to automate the collection, transformation, and distribution of data from various sources. The project covers four primary use cases:

  1. Real Estate Scraping (Dom.ria): Automated scraping of real estate listings from Dom.ria API
  2. E-commerce Product Scraping (STN Craft): Web scraping of product information from STN Craft website
  3. Notion Integration: Data export/import workflows between Notion databases and e-commerce platforms
  4. Dog Breeding Data: Specialized scraper for collie breeding information with genealogy visualization

The system is built with serverless architecture using AWS Lambda functions, scheduled to run daily via CloudWatch Events, with data storage in Amazon S3.

Architecture

High-Level Architecture

┌─────────────────────────────────────────────────────────────────┐
│                       Data Handler System                        │
├─────────────────────────────────────────────────────────────────┤
│                                                                   │
│  ┌────────────────┐        ┌──────────────────┐                │
│  │  GitHub Actions│───────▶│  AWS SAM CLI     │                │
│  │  CI/CD Pipeline│        │  Build & Deploy  │                │
│  └────────────────┘        └──────────────────┘                │
│                                     │                            │
│                                     ▼                            │
│              ┌──────────────────────────────────┐               │
│              │   AWS Lambda Functions           │               │
│              ├──────────────────────────────────┤               │
│              │                                  │               │
│              │  ┌──────────────────────────┐   │               │
│              │  │ Dom.ria Scraper         │   │               │
│              │  │ (Scheduled: Daily 6AM)  │───┼──┐            │
│              │  └──────────────────────────┘   │  │            │
│              │                                  │  │            │
│              │  ┌──────────────────────────┐   │  │            │
│              │  │ STN Craft Scraper       │   │  │            │
│              │  │ (Scheduled: Daily 6AM)  │───┼──┤            │
│              │  └──────────────────────────┘   │  │            │
│              │                                  │  │            │
│              └──────────────────────────────────┘  │            │
│                                                     ▼            │
│  ┌────────────────┐        ┌──────────────────────────────┐    │
│  │  External APIs │        │     Amazon S3 Storage        │    │
│  ├────────────────┤        │  (eu-central-1-scraper-data) │    │
│  │ • Dom.ria API  │        │                              │    │
│  │ • Notion API   │        │  /dom-ria/YYYY/MM/DD/        │    │
│  │ • STN Craft    │        │  /stn-craft/YYYY/MM/DD/      │    │
│  │ • Collie.com   │        └──────────────────────────────┘    │
│  └────────────────┘                                             │
│         │                                                        │
│         ▼                                                        │
│  ┌────────────────────────────────────────┐                    │
│  │      Local Processing Modules          │                    │
│  ├────────────────────────────────────────┤                    │
│  │ • Notion Integration (magic_shop)      │                    │
│  │ • Collie Dog Data Parser               │                    │
│  │ • Prom.ua CSV Export                   │                    │
│  └────────────────────────────────────────┘                    │
│                                                                  │
└──────────────────────────────────────────────────────────────────┘

Component Architecture

The system follows a modular architecture with clear separation of concerns:

  • API Layer: Abstraction for external API integrations (Notion, Prom)
  • Scraper Layer: Lambda functions for automated data collection
  • Processing Layer: Data transformation and normalization utilities
  • Storage Layer: S3-based persistent storage with date-based partitioning

Technology Stack

Core Technologies

  • Language: Python 3.9
  • Cloud Platform: Amazon Web Services (AWS)
  • Serverless Framework: AWS SAM (Serverless Application Model)
  • Package Management: Pipenv
  • CI/CD: GitHub Actions

Key Libraries and Frameworks

Web Scraping

  • requests-html (0.10.0): HTML parsing and JavaScript rendering
  • httpx (0.22.0): Modern HTTP client with async support
  • BeautifulSoup4 (4.10.0): HTML/XML parsing
  • lxml (4.7.1): Fast XML/HTML processing
  • pyquery (1.4.3): jQuery-like HTML manipulation

Data Processing

  • pandas (1.4.0): Data manipulation and analysis
  • numpy (1.22.2): Numerical computing

AWS Integration

  • boto3 (1.20.52): AWS SDK for Python
  • aws-lambda-powertools (1.25.0): Lambda utilities and logging
  • s3fs: S3 filesystem interface
  • fsspec (2022.1.0): Filesystem abstraction

Utilities

  • loguru (0.6.0): Simplified logging
  • pydot (1.4.2): Graph visualization (for genealogy trees)

Infrastructure

  • AWS Lambda: Serverless compute for scrapers
  • Amazon S3: Data storage with date-based partitioning
  • CloudWatch Events: Scheduled Lambda execution (cron-based)
  • AWS CloudFormation: Infrastructure as Code via SAM templates
  • GitHub Actions: Automated CI/CD pipeline

Project Structure

data_handler/
│
├── .github/
│   └── workflows/
│       ├── build_dom_ria_scraper.yaml      # CI/CD for Dom.ria scraper
│       └── build_stn_craft_scraper.yaml    # CI/CD for STN Craft scraper
│
├── api/                                     # API Integration Layer
│   ├── notion.py                           # Notion API client
│   └── prom.py                             # Prom.ua integration
│
├── collie/                                  # Dog Breeding Data Module
│   ├── main.py                             # Collie data scraper
│   └── dogs_data.json                      # Dog breeding data
│
├── dom-ria-scraper-lambda/                 # Real Estate Scraper
│   ├── scraper/
│   │   ├── app.py                          # Lambda handler
│   │   ├── requirements.txt                # Dependencies
│   │   └── __init__.py
│   ├── dom.yaml                            # SAM template
│   ├── README.md                           # Module documentation
│   └── __init__.py
│
├── magic_shop/                             # Notion Data Export Module
│   └── notion_magic_db.py                  # Notion to Prom export
│
├── stn-craft-scraper-lambda/               # E-commerce Scraper
│   ├── scraper/
│   │   ├── app.py                          # Lambda handler
│   │   ├── requirements.txt                # Dependencies
│   │   ├── templates/
│   │   │   └── prom_import_template.csv    # Prom.ua CSV template
│   │   └── __init__.py
│   ├── tests/                              # Test suite
│   │   ├── unit/
│   │   ├── integration/
│   │   └── requirements.txt
│   ├── runes.yaml                          # SAM template
│   ├── README.md                           # Module documentation
│   └── __init__.py
│
├── Makefile                                # Build automation
├── Pipfile                                 # Python dependencies
├── Pipfile.lock                            # Locked dependencies
└── helpers.py                              # Shared utilities

Modules and Components

1. Dom.ria Scraper Lambda (dom-ria-scraper-lambda/)

Purpose: Automated scraping of real estate listings from Dom.ria API

Key Features:

  • Rate-limited API requests (1000 requests/hour)
  • Pagination handling for large result sets
  • Retry logic with timeout handling
  • Incremental data export to S3
  • Structured logging with AWS Lambda Powertools

Data Collection:

  • Flat/apartment listings in Odessa region
  • Search criteria: 2-3 rooms, specific metro stations
  • Daily scheduled execution at 6:00 AM UTC

Implementation Details:

class DomScraper:
    - get_flats_ids_df(): Retrieves all listing IDs with pagination
    - get_flat_info(flat_id): Fetches detailed information per listing
    - export_flats_ids_to_s3(): Saves listing IDs with checkpoint
    - export_flats_info_to_s3(): Saves detailed listing data
    - quota_is_not_reached(): Rate limiting management

S3 Storage Pattern: s3://bucket/dom-ria/YYYY/MM/DD/HH/MM/flat_*.csv
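
As an illustration of the quota_is_not_reached() check listed above, a minimal rolling-window rate limiter might look like this (a sketch under assumed thresholds, not the project's exact implementation):

import time

class QuotaTracker:
    """Illustrative limiter: at most `limit` requests per rolling hour."""

    def __init__(self, limit: int = 1000):
        self.limit = limit
        self.window_start = time.monotonic()
        self.count = 0

    def quota_is_not_reached(self) -> bool:
        # Reset the counter once a full hour has elapsed
        if time.monotonic() - self.window_start >= 3600:
            self.window_start = time.monotonic()
            self.count = 0
        return self.count < self.limit

    def record_request(self) -> None:
        self.count += 1

Once the quota is reached, the scraper exits early and resumes from the last S3 checkpoint on the next scheduled run.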

2. STN Craft Scraper Lambda (stn-craft-scraper-lambda/)

Purpose: E-commerce product scraping from STN Craft website

Key Features:

  • HTML parsing with requests-html
  • Product data extraction (name, price, description, images)
  • Prom.ua CSV export format
  • Multi-page scraping support

Data Collection:

  • Rune sets from product category pages
  • Product details: title, price, description, images
  • Automatic CSV generation for Prom.ua import

Implementation Details:

class ExportSTNCraft:
    - get_links(): Extract links from page with filtering
    - get_page(): Fetch page HTML
    - parse_product_details(): Extract product information
    - export_runes_data(): Main scraping orchestration

class Import2Prom:
    - build_prom_csv(): Generate Prom.ua import CSV

S3 Storage Pattern: s3://bucket/stn-craft/YYYY/MM/DD/stn_craft_runes_prom.csv
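
For illustration, the link-extraction step in ExportSTNCraft.get_links() can be approximated with the requests-html API like this (a sketch; the real selectors and filtering logic may differ):

from requests_html import HTMLSession

def get_links(url: str, keyword: str) -> list[str]:
    # Fetch a category page and keep only the links whose URL
    # contains the given keyword (e.g. a product path fragment)
    session = HTMLSession()
    response = session.get(url)
    return [link for link in response.html.absolute_links if keyword in link]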

3. Notion API Integration (api/notion.py)

Purpose: Bidirectional data sync with Notion databases

Key Features:

  • Database schema introspection
  • Type-safe data import with schema validation
  • Data export with normalization
  • Support for multiple field types (text, select, multiselect, date, files)

Supported Field Types:

  • Rich text
  • Title
  • Select / Multi-select
  • Date
  • Files / Images
  • URL

Implementation Details:

class Notion:
    - build_table_request(): Construct Notion API request
    - import_data_to_table(): Insert rows into database
    - export_data_from_table(): Query and retrieve data
    - convert_notion_to_dict(): Normalize Notion response
    - get_table_column_names_list(): Schema introspection
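
As an illustration, the per-field normalization performed by convert_notion_to_dict() might look like the following (a sketch based on the public Notion API property shapes; the actual method may differ):

def normalize_property(prop: dict):
    # Flatten one Notion property payload into a plain Python value
    kind = prop["type"]
    if kind in ("rich_text", "title"):
        return "".join(part["plain_text"] for part in prop[kind])
    if kind == "select":
        return prop["select"]["name"] if prop["select"] else None
    if kind == "multi_select":
        return [option["name"] for option in prop["multi_select"]]
    if kind == "date":
        return prop["date"]["start"] if prop["date"] else None
    if kind == "files":
        return [f.get("external", f.get("file", {})).get("url") for f in prop["files"]]
    if kind == "url":
        return prop["url"]
    return None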

4. Prom.ua Integration (api/prom.py)

Purpose: E-commerce data transformation for Prom.ua platform

Key Features:

  • CSV generation following Prom.ua import format
  • Data normalization and validation
  • Currency and measurement unit standardization

Implementation Details:

class Prom:
    - convert_notion2prom(): Transform Notion data to Prom format
    - build_prom_csv(): Generate importable CSV file
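
A minimal sketch of the transformation described above (the Prom.ua column names here are assumptions, not the template's actual headers):

import pandas as pd

def convert_notion2prom(rows: list[dict]) -> pd.DataFrame:
    # Build a DataFrame from normalized Notion rows and inject the
    # standard fields Prom.ua expects on import (assumed column names)
    df = pd.DataFrame(rows)
    df["Currency"] = "UAH"
    df["Measurement unit"] = "pcs."
    return df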

5. Collie Dog Data Module (collie/)

Purpose: Specialized scraper for dog breeding information with genealogy

Key Features:

  • Multi-page dog catalog scraping
  • Genealogy tree extraction and visualization
  • Graph generation with pydot
  • Notion database integration for dog records

Data Collection:

  • Dog profiles (name, gender, category)
  • Images and photo galleries
  • Pedigree information (4 generations)
  • Descriptive text and characteristics

Implementation Details:

Functions:
    - parse_dog_page(): Main orchestration
    - get_dog_details(): Extract profile and genealogy
    - build_dict_data_from_links(): Organize data structure
    - import_data_to_notion_db(): Upload to Notion
    - visit(): Graph traversal for genealogy visualization

Genealogy Visualization: Generates PNG graph files using pydot
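
A sketch of how a recursive visit() could build the genealogy graph with pydot, assuming the pedigree is a nested dict of ancestor names (illustrative only):

import pydot

graph = pydot.Dot(graph_type="graph")

def visit(tree: dict, parent=None) -> None:
    # Add each dog as a node, connect it to its descendant,
    # then recurse into its own ancestors
    for name, ancestors in tree.items():
        graph.add_node(pydot.Node(name))
        if parent is not None:
            graph.add_edge(pydot.Edge(parent, name))
        if isinstance(ancestors, dict):
            visit(ancestors, parent=name)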

6. Magic Shop Module (magic_shop/)

Purpose: Automated data export from Notion to Prom.ua

Key Features:

  • Notion database querying
  • Data normalization with pandas
  • CSV export with timestamp organization
  • Currency and measurement unit injection

Workflow (sketched in code after this list):

  1. Export data from Notion database
  2. Normalize field types to flat dictionary
  3. Add required Prom.ua fields (currency, measure)
  4. Save to date-structured path
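
A condensed sketch of that workflow, reusing the Notion client and helper documented in this README (the Prom.ua field names and the create_todays_path signature are assumptions):

import os
import pandas as pd
from api.notion import Notion
from helpers import create_todays_path

def export_data_from_notion() -> pd.DataFrame:
    # Steps 1-4 above: export, normalize, inject fields, save
    notion = Notion(token=os.environ["NOTION_TOKEN"])
    raw = notion.export_data_from_table(_id=os.environ["NOTION_DB_ID"])
    df = pd.DataFrame(notion.convert_notion_to_dict(input_data=raw))
    df["Currency"] = "UAH"
    df["Measurement unit"] = "pcs."
    df.to_csv(f"{create_todays_path('data')}/exported_notion_db_data.csv", index=False)
    return df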

7. Shared Utilities (helpers.py)

Purpose: Common functionality across modules

Key Functions:

- create_todays_path(): Generate date-based directory structure

Date Structure: YYYY/MM/DD hierarchy for organized data storage
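
A sketch of what this helper likely does (signature assumed):

from datetime import datetime
from pathlib import Path

def create_todays_path(base: str = "data") -> Path:
    # Build (and create, if missing) a YYYY/MM/DD directory under base
    path = Path(base) / datetime.utcnow().strftime("%Y/%m/%d")
    path.mkdir(parents=True, exist_ok=True)
    return path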

Dependencies

Production Dependencies (Pipfile)

[packages]
requests-html = "*"      # Web scraping with JavaScript support
pandas = "*"             # Data manipulation
httpx = "*"              # Modern HTTP client
pydot = "*"              # Graph visualization
loguru = "*"             # Structured logging
boto3 = "*"              # AWS SDK
fsspec = "*"             # Filesystem abstraction
aws-lambda-powertools = "*"  # Lambda utilities
s3fs = "*"               # S3 filesystem interface

Lambda-Specific Dependencies

Both Lambda functions share similar requirements:

aws-lambda-powertools==1.25.0
httpx==0.22.0
pandas==1.4.0
requests-html==0.10.0
boto3==1.20.52
s3fs
loguru==0.6.0
beautifulsoup4==4.10.0
numpy==1.22.2

Setup and Installation

Prerequisites

  • Python 3.9
  • AWS Account with appropriate permissions
  • AWS CLI configured
  • Docker (for SAM local testing)
  • Git

Local Development Setup

  1. Clone the repository:
git clone <repository-url>
cd data_handler
  2. Install pipenv:
pip install pipenv
  3. Install dependencies:
make install-dependencies
# Or manually:
pipenv install --dev
  4. Activate virtual environment:
pipenv shell

Environment Variables

Dom.ria Scraper

RIA_API_KEY=<your-dom-ria-api-key>
S3_BUCKET=eu-central-1-scraper-data

STN Craft Scraper

S3_BUCKET=eu-central-1-scraper-data

Notion Integration

NOTION_TOKEN=<your-notion-integration-token>
NOTION_DB_ID=<your-database-id>
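
These are presumably read via os.environ at runtime, e.g.:

import os

RIA_API_KEY = os.environ["RIA_API_KEY"]                               # Dom.ria scraper only
S3_BUCKET = os.environ.get("S3_BUCKET", "eu-central-1-scraper-data")
NOTION_TOKEN = os.environ["NOTION_TOKEN"]                             # Notion integration only
NOTION_DB_ID = os.environ["NOTION_DB_ID"]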

AWS Configuration

  1. Create S3 bucket:
aws s3 mb s3://eu-central-1-scraper-data --region eu-central-1
  2. Configure AWS credentials:
aws configure

Deployment

Automated Deployment via GitHub Actions

The project includes two GitHub Actions workflows for automated CI/CD:

Dom.ria Scraper Deployment:

  • Triggered on push to main or master branches
  • Workflow file: .github/workflows/build_dom_ria_scraper.yaml
  • Stack name: dom-scraper

STN Craft Scraper Deployment:

  • Triggered on push to main or master branches
  • Workflow file: .github/workflows/build_stn_craft_scraper.yaml
  • Stack name: runes-scraper

Manual Deployment

Deploy Dom.ria Scraper

cd dom-ria-scraper-lambda
sam build --use-container -t dom.yaml
sam package \
  --s3-bucket eu-central-1-scraper-data \
  --output-template-file packaged.yaml \
  --region eu-central-1
sam deploy \
  --template-file packaged.yaml \
  --stack-name dom-scraper \
  --capabilities CAPABILITY_IAM \
  --region eu-central-1

Deploy STN Craft Scraper

cd stn-craft-scraper-lambda
sam build --use-container -t runes.yaml
sam package \
  --s3-bucket eu-central-1-scraper-data \
  --output-template-file packaged.yaml \
  --region eu-central-1
sam deploy \
  --template-file packaged.yaml \
  --stack-name runes-scraper \
  --capabilities CAPABILITY_IAM \
  --region eu-central-1

Local Testing

Test Lambda Functions Locally

# Dom.ria Scraper
cd dom-ria-scraper-lambda
sam local invoke DomRiaScraperFunction

# STN Craft Scraper
cd stn-craft-scraper-lambda
sam local invoke STNCraftScraperFunction

Run Unit Tests

cd stn-craft-scraper-lambda
pip install -r tests/requirements.txt
python -m pytest tests/unit -v

Usage Examples

1. Running Dom.ria Scraper

Lambda Handler: The Lambda function runs automatically via CloudWatch Events (daily at 6:00 AM UTC), but can also be invoked manually:

# app.py main execution
if __name__ == "__main__":
    dom = DomScraper()
    flats_id_df = dom.get_flats_ids_df()
    dom.get_flats_data(flats_id_df)

Key Features:

  • Automatic rate limiting (1000 requests/hour)
  • Checkpointing for resumable scraping
  • Graceful quota handling with early exit

2. Running STN Craft Scraper

Lambda Handler:

import json

def lambda_handler(event, context):
    parsed_df = ExportSTNCraft().export_runes_data()
    Import2Prom().build_prom_csv(parsed_df)

    return {
        "statusCode": 200,
        "body": json.dumps({
            "message": "Products successfully parsed and saved to s3 bucket"
        }),
    }

3. Notion Data Export

Export from Notion Database:

from api.notion import Notion

notion = Notion(token="your-token")
notion_data = notion.export_data_from_table(_id="database-id")
normalized_data = notion.convert_notion_to_dict(input_data=notion_data)

# Convert to pandas DataFrame
import pandas as pd
df = pd.DataFrame(normalized_data)
df.to_csv('exported_data.csv')

Import to Notion Database:

notion = Notion(token="your-token")

# Build request
request_body = notion.build_table_request(
    db_id="database-id",
    Name={'data_type': 'title', 'value': 'Product Name'},
    Price={'data_type': 'text', 'value': '100'},
    Category={'data_type': 'select', 'value': 'Electronics'}
)

# Import data
notion.import_data_to_table(request_body)

4. Collie Dog Data Scraping

Run the scraper:

from collie.main import parse_dog_page

# Scrape dog data
dog_data = parse_dog_page()

# Data structure:
# {
#   "category": {
#     "dog_name": {
#       "link": "...",
#       "details": {
#         "images_list": [...],
#         "breed_tree": {...},
#         "dog_description": [...]
#       }
#     }
#   }
# }

Generate genealogy graph:

import pydot
from collie.main import visit

# Build the graph that visit() populates while walking the pedigree
graph = pydot.Dot(graph_type='graph')
details = dog_data['category']['dog_name']['details']  # placeholder keys from the structure above
visit(details['breed_tree'])
graph.write_png('dog_genealogy.png')

5. Magic Shop: Notion to Prom Export

Export and convert:

from magic_shop.notion_magic_db import export_data_from_notion

# Exports Notion data and saves as CSV
df = export_data_from_notion()

# Output: data/YYYY/MM/DD/exported_notion_db_data.csv

CI/CD Pipeline

GitHub Actions Workflow Architecture

Both scrapers use the same CI/CD pipeline structure:

Pipeline Steps:

  1. Checkout: Clone repository
  2. Setup Python: Install Python 3.9.10
  3. Setup SAM: Install AWS SAM CLI
  4. Configure AWS: Authenticate with AWS credentials
  5. Build: Create Lambda deployment package with Docker
  6. Package: Upload artifacts to S3
  7. Deploy: Deploy CloudFormation stack

Secrets Required:

  • AWS_ACCESS_KEY_ID
  • AWS_SECRET_ACCESS_KEY

Deployment Region: eu-central-1

Stack Names:

  • Dom.ria: dom-scraper
  • STN Craft: runes-scraper

Continuous Deployment Strategy

  • Trigger: Push to main or master branches
  • Strategy: Full stack replacement
  • Rollback: CloudFormation automatic rollback on failure
  • Zero Downtime: Lambda versioning with alias updates

Configuration

SAM Templates

Dom.ria Scraper (dom.yaml)

Globals:
  Function:
    Timeout: 900          # 15 minutes
    MemorySize: 256       # 256 MB
    Environment:
      Variables:
        RIA_API_KEY: <api-key>
        S3_BUCKET: eu-central-1-scraper-data

Resources:
  DomRiaScraperFunction:
    Type: AWS::Serverless::Function
    Properties:
      Runtime: python3.9
      Handler: app.lambda_handler
      Events:
        Schedule:
          Type: Schedule
          Properties:
            Schedule: cron(0 6 * * ? *)  # Daily at 6 AM UTC

STN Craft Scraper (runes.yaml)

Globals:
  Function:
    Timeout: 900          # 15 minutes
    MemorySize: 256       # 256 MB
    Environment:
      Variables:
        S3_BUCKET: eu-central-1-scraper-data

Resources:
  STNCraftScraperFunction:
    Type: AWS::Serverless::Function
    Properties:
      Runtime: python3.9
      Handler: app.lambda_handler
      Events:
        Schedule:
          Type: Schedule
          Properties:
            Schedule: cron(0 6 * * ? *)  # Daily at 6 AM UTC

Makefile Commands

help                    # Display available commands
install-dependencies    # Install pipenv and all dependencies
create-artifact-bucket  # Create S3 bucket for SAM artifacts
deploy-stn-craft-scraper # Build and deploy STN Craft scraper

Data Flow

Dom.ria Scraper Flow

1. CloudWatch Event (6:00 AM UTC)
   │
   ▼
2. Lambda Invocation
   │
   ▼
3. Get Listing IDs (paginated)
   │  - API Rate Limiting: 1000 req/hour
   │  - Checkpoint every page
   │
   ▼
4. For Each Listing ID:
   │  - Fetch detailed information
   │  - Handle timeouts and retries
   │  - Quota monitoring
   │
   ▼
5. Export to S3
   │  - Path: /dom-ria/YYYY/MM/DD/HH/MM/
   │  - Files: flat_ids_list_*.csv, flats_data_list_*.csv
   │
   ▼
6. Lambda Completion

STN Craft Scraper Flow

1. CloudWatch Event (6:00 AM UTC)
   │
   ▼
2. Lambda Invocation
   │
   ▼
3. Scrape Product Pages
   │  - Category page: /product-category/rune-sets/
   │  - Extract product links
   │
   ▼
4. For Each Product:
   │  - Parse product details
   │  - Extract images, price, description
   │  - Build product dictionary
   │
   ▼
5. Convert to Prom Format
   │  - Load CSV template
   │  - Normalize prices
   │  - Add currency and units
   │
   ▼
6. Export to S3
   │  - Path: /stn-craft/YYYY/MM/DD/
   │  - File: stn_craft_runes_prom.csv
   │
   ▼
7. Lambda Completion

Notion Integration Flow

1. Manual Execution
   │
   ▼
2. Query Notion Database
   │  - Database ID from environment
   │  - API Version: 2021-08-16
   │
   ▼
3. Convert Notion Types
   │  - Rich text → plain text
   │  - Select → name
   │  - Files → URL list
   │
   ▼
4. Normalize to DataFrame
   │  - Flatten nested structures
   │  - Handle missing values
   │
   ▼
5. Export or Import
   │  - Export: CSV to date-structured path
   │  - Import: Build Notion API request
   │
   ▼
6. Completion

AWS Resources

Lambda Functions

Dom.ria Scraper:

  • Function Name: DomRiaScraperFunction
  • Runtime: Python 3.9
  • Memory: 256 MB
  • Timeout: 900 seconds (15 minutes)
  • Trigger: CloudWatch Events (cron)
  • Permissions: S3 write access

STN Craft Scraper:

  • Function Name: STNCraftScraperFunction
  • Runtime: Python 3.9
  • Memory: 256 MB
  • Timeout: 900 seconds (15 minutes)
  • Trigger: CloudWatch Events (cron)
  • Permissions: S3 write access

S3 Bucket Structure

Bucket Name: eu-central-1-scraper-data

Structure:

s3://eu-central-1-scraper-data/
├── dom-ria/
│   └── YYYY/
│       └── MM/
│           └── DD/
│               └── HH/
│                   └── MM/
│                       ├── flat_ids_list_HH_page_N.csv
│                       └── flats_data_list_HH_page.csv
│
└── stn-craft/
    └── YYYY/
        └── MM/
            └── DD/
                └── stn_craft_runes_prom.csv
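
Because s3fs is installed, pandas can write to these paths directly via s3:// URLs (date components below are example values):

import pandas as pd

# s3fs lets pandas treat s3:// URLs as a filesystem
df = pd.DataFrame({"flat_id": [101, 102]})
df.to_csv(
    "s3://eu-central-1-scraper-data/dom-ria/2022/01/31/06/00/flat_ids_list_06_page_1.csv",
    index=False,
)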

CloudWatch Events Rules

Dom.ria Schedule:

  • Rule Name: daily-run-dom-ria
  • Schedule: cron(0 6 * * ? *)
  • Description: Daily run at 06:00 AM UTC
  • State: Enabled

STN Craft Schedule:

  • Rule Name: daily-run
  • Schedule: cron(0 6 * * ? *)
  • Description: Daily run at 06:00 AM UTC
  • State: Enabled

IAM Roles

Auto-generated by CloudFormation with least-privilege permissions:

  • CloudWatch Logs write access
  • S3 bucket read/write access
  • Lambda execution role

Development Notes

Recent Changes (Git History)

6cf02f8 - fixed pandas errors
c66c1e3 - fixed pandas errors
64bcbcb - adding env variables
a1e78b0 - adding env variables
cdff19b - adding env variables
ab773bf - adding env variables
d6956b0 - updated: change mkdir to mkdirstring
0f9d38e - changed cron name
7d46ccb - changed pipeline name
435ddf2 - added: dom-ria scraper updated: stn data

Known Issues and Considerations

  1. API Keys: Hard-coded API keys present in configuration files (should be rotated before archival)
  2. Rate Limiting: Dom.ria API limited to 1000 requests/hour
  3. Pandas Deprecation: .append() method deprecated in newer pandas versions
  4. Error Handling: Some functions use recursive retry logic, which could overflow the call stack on repeated failures (an iterative alternative is sketched after this list)
  5. S3 Permissions: Lambda functions require appropriate IAM roles for S3 access
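
For reference, an iterative retry loop with backoff avoids the recursion issue noted in item 4 (a sketch, not the project's code):

import time
import httpx

def fetch_with_retry(url: str, retries: int = 3, backoff: float = 2.0) -> httpx.Response:
    # Retry in a loop rather than recursively, so repeated failures
    # never grow the call stack
    for attempt in range(retries):
        try:
            response = httpx.get(url, timeout=10.0)
            response.raise_for_status()
            return response
        except (httpx.TimeoutException, httpx.HTTPStatusError):
            if attempt == retries - 1:
                raise
            time.sleep(backoff ** attempt)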

Design Decisions

  1. Date-based Partitioning: S3 storage organized by date for easy data lifecycle management
  2. Serverless Architecture: No infrastructure management, automatic scaling
  3. Scheduled Execution: Daily runs at 6:00 AM UTC to capture fresh data
  4. CSV Output Format: Prom.ua-compatible format for direct import
  5. Checkpointing: Intermediate saves to handle quota limits and timeouts

Future Improvements (Not Implemented)

  1. Move API keys to AWS Secrets Manager
  2. Implement SNS notifications for scraper failures
  3. Add data validation and quality checks
  4. Implement incremental scraping (delta detection)
  5. Add CloudWatch dashboards for monitoring
  6. Upgrade pandas code to use concat() instead of the deprecated append() (see the sketch after this list)
  7. Add comprehensive error handling and alerting
  8. Implement data archival policies for S3
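
Item 6 is a mechanical change; for example:

import pandas as pd

df = pd.DataFrame({"flat_id": [101]})
new_row = pd.DataFrame({"flat_id": [102]})

# Deprecated in pandas >= 1.4: df = df.append(new_row, ignore_index=True)
df = pd.concat([df, new_row], ignore_index=True)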

Testing Strategy

  • Unit tests exist for STN Craft scraper (stn-craft-scraper-lambda/tests/)
  • Integration tests scaffolding present but minimal coverage
  • Local testing available via SAM CLI
  • No automated test execution in CI/CD pipeline

Performance Characteristics

Dom.ria Scraper:

  • Average execution: 10-15 minutes (depends on listing count)
  • API calls: ~100-150 per execution
  • Data output: 2-5 MB per day

STN Craft Scraper:

  • Average execution: 2-5 minutes
  • HTTP requests: ~30-50 per execution
  • Data output: <1 MB per day

Logging and Monitoring

  • Lambda Powertools: Structured JSON logging
  • Log Groups: /aws/lambda/DomRiaScraperFunction, /aws/lambda/STNCraftScraperFunction
  • Metrics: Available via CloudWatch Lambda metrics
  • Retention: Default CloudWatch Logs retention

License

No license information provided in the repository.

Project Status

ARCHIVED - This repository has been archived. All documentation has been preserved for future reference.

Archive Date

January 5, 2026

Archival Context

This project served as a personal data automation and integration system combining web scraping, cloud infrastructure, and API integrations. The codebase demonstrates serverless architecture patterns, automated data pipelines, and multi-platform data synchronization.

Contact Information

No contact information provided in the repository.


Archive Note: This README was generated for archival purposes and represents the complete state of the project as of the archival date. All code, configurations, and documentation are preserved as-is for historical reference.
