## How I'd build this on a tight budget

### Event Collection & Storage

**Real-time ingestion:**
- **Amazon Kinesis Data Firehose** → S3 (not Kinesis Data Streams; at startup volumes Firehose is cheaper)
- Mobile app sends events via API Gateway → Lambda → Firehose
- Firehose batches and compresses before writing to S3 (major cost saver)
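
A minimal sketch of the buffering Lambda. It assumes two hypothetical details: an API Gateway proxy integration that posts a JSON body like `{"events": [...]}`, and a delivery stream I've called `events-to-s3`. It uses boto3's Firehose `put_record_batch`, which accepts at most 500 records per call. The client is injectable so the logic can be tested without AWS.

```python
import json
from typing import Any, Dict, List, Optional

# Hypothetical delivery stream name; replace with your own.
DELIVERY_STREAM = "events-to-s3"
# PutRecordBatch accepts at most 500 records per call.
MAX_BATCH_RECORDS = 500


def _default_client():
    # Imported lazily so the logic can be exercised without boto3 or AWS credentials.
    import boto3
    return boto3.client("firehose")


def to_records(events: List[Dict[str, Any]]) -> List[Dict[str, bytes]]:
    # One newline-terminated JSON object per record keeps S3 objects line-delimited.
    return [{"Data": (json.dumps(e, separators=(",", ":")) + "\n").encode("utf-8")}
            for e in events]


def send_events(events: List[Dict[str, Any]], client: Optional[Any] = None,
                stream: str = DELIVERY_STREAM) -> int:
    """Send events to Firehose in chunks of 500; return how many records failed."""
    if client is None:
        client = _default_client()
    records = to_records(events)
    failed = 0
    for start in range(0, len(records), MAX_BATCH_RECORDS):
        chunk = records[start:start + MAX_BATCH_RECORDS]
        resp = client.put_record_batch(DeliveryStreamName=stream, Records=chunk)
        # A production version would retry entries whose RequestResponses carry an ErrorCode.
        failed += resp.get("FailedPutCount", 0)
    return failed


def handler(event: Dict[str, Any], context: Any, client: Optional[Any] = None) -> Dict[str, Any]:
    # Assumes an API Gateway proxy event whose (non-base64) body is {"events": [...]}.
    body = json.loads(event.get("body") or "{}")
    events = body.get("events", [])
    failed = send_events(events, client=client)
    return {
        "statusCode": 200 if failed == 0 else 500,
        "body": json.dumps({"accepted": len(events) - failed, "failed": failed}),
    }
```

Line-delimited JSON in S3 can be read directly by Glue and Athena. Firehose can also convert it to Parquet on the way in.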

**Storage strategy:**
- **Raw events**: S3 in Parquet format, partitioned by date (`s3://bucket/events/year=2025/month=01/day=15/`)
- Keep last 90 days in Standard, move to Glacier for longer retention
- **Processed data**: AWS Glue Data Catalog + Athena for querying (serverless, pay-per-query)
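
The date partitioning and the 90-day Glacier move might look like this. This is a sketch:

- `partition_prefix` is a hypothetical helper.
- The lifecycle dict follows the shape boto3's `put_bucket_lifecycle_configuration` expects for its `LifecycleConfiguration` argument.

Firehose can also write this layout itself through a custom S3 prefix, along the lines of `events/year=!{timestamp:yyyy}/month=!{timestamp:MM}/day=!{timestamp:dd}/`.

```python
from datetime import datetime, timezone


def partition_prefix(ts: datetime, root: str = "events") -> str:
    """Hive-style date partition so Athena/Glue can prune by year/month/day."""
    ts = ts.astimezone(timezone.utc)
    return f"{root}/year={ts.year:04d}/month={ts.month:02d}/day={ts.day:02d}/"


# Passed as LifecycleConfiguration=... to s3.put_bucket_lifecycle_configuration.
LIFECYCLE_CONFIGURATION = {
    "Rules": [
        {
            "ID": "raw-events-to-glacier-after-90-days",
            "Filter": {"Prefix": "events/"},
            "Status": "Enabled",
            # Objects stay in Standard for 90 days, then move to Glacier.
            "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
        }
    ]
}
```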

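**Why not Kinesis Data Streams?** The two services price differently:

- Firehose bills ~$0.029 per GB ingested.
- Provisioned Streams bills $0.015 per shard-hour, about $11/month per shard even when idle. On top of that come PUT charges, and you still need a consumer to write the data to S3.

At startup volumes Firehose comes out cheaper, and it scales automatically with no shards to manage.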
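
A rough monthly comparison. The workload is my assumption, not a measurement: 10K DAU sending ~100 events a day at ~1 KB each. Prices are the list prices above, plus $0.014 per million Streams PUT payload units. Two billing roundings are modeled:

- Firehose rounds each record up to the nearest 5 KB.
- One Streams PUT payload unit covers 25 KB.

```python
import math

# List prices quoted in the text (plus the Streams PUT rate); check current AWS pricing.
FIREHOSE_PER_GB = 0.029
STREAMS_PER_SHARD_HOUR = 0.015
STREAMS_PER_MILLION_PUT_UNITS = 0.014
HOURS_PER_MONTH = 730
KB_PER_GB = 1024 * 1024


def firehose_monthly_cost(records: int, record_kb: float) -> float:
    # Firehose bills each record rounded up to the nearest 5 KB.
    billed_kb = math.ceil(record_kb / 5) * 5
    return records * billed_kb / KB_PER_GB * FIREHOSE_PER_GB


def streams_monthly_cost(records: int, record_kb: float, shards: int = 1) -> float:
    # Shards bill every hour whether or not data flows.
    shard_cost = shards * HOURS_PER_MONTH * STREAMS_PER_SHARD_HOUR
    # One PUT payload unit covers up to 25 KB of a record.
    put_units = records * math.ceil(record_kb / 25)
    return shard_cost + put_units / 1_000_000 * STREAMS_PER_MILLION_PUT_UNITS


# Assumed workload: 10K DAU x 100 events/day x 30 days, ~1 KB per event.
monthly_records = 10_000 * 100 * 30
print(f"Firehose: ${firehose_monthly_cost(monthly_records, 1.0):.2f}")
print(f"Streams:  ${streams_monthly_cost(monthly_records, 1.0):.2f}")
```

With these particular inputs the result is roughly $4/month for Firehose versus about $11 for a single Streams shard. The Streams figure excludes the consumer that moves the data into S3.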

### Training & Feature Pipelines

**Batch training:**
```
S3 raw events → AWS Glue ETL job (Spark) → Feature generation → S3 feature store
                                                    ↓
                                          SageMaker Training (spot instances)
                                                    ↓
                                             Model artifacts → S3
```
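
Here is the "Feature generation" step reduced to plain Python for illustration. In the actual Glue job this would be a Spark `groupBy`/`agg`. The event fields (`user_id`, `ping_id`, `event_type`, `timestamp`, `watch_time_ratio`) are the ones listed in the Personalize section. The 7-day window and the feature names are assumptions.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable


def user_features(events: Iterable[dict], now_ts: int, window_days: int = 7) -> Dict[str, dict]:
    """Aggregate raw events into per-user features over a trailing window.

    The window length and feature names are illustrative assumptions.
    """
    cutoff = now_ts - window_days * 86_400             # timestamps are Unix seconds
    counts: Dict[str, Counter] = defaultdict(Counter)  # user_id -> event_type counts
    watch: Dict[str, list] = defaultdict(list)         # user_id -> watch_time_ratio values
    for e in events:
        if e["timestamp"] < cutoff:
            continue  # outside the trailing window
        counts[e["user_id"]][e["event_type"]] += 1
        if e.get("watch_time_ratio") is not None:
            watch[e["user_id"]].append(e["watch_time_ratio"])

    features = {}
    for user_id, by_type in counts.items():
        ratios = watch.get(user_id, [])
        features[user_id] = {
            "event_count": sum(by_type.values()),
            "avg_watch_time_ratio": sum(ratios) / len(ratios) if ratios else 0.0,
            # One column per event type seen, e.g. count_view, count_like.
            **{f"count_{etype}": n for etype, n in by_type.items()},
        }
    return features
```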

**Specific services:**
- **AWS Glue** for ETL (serverless Spark, only pay when running)
- **SageMaker with Managed Spot Training** for training (often ~70% cheaper than on-demand; AWS cites savings of up to 90%)
- **Training frequency**: Daily at 3 AM when traffic is low
- **Feature store**: Combination of:
  - **S3** for historical features (cheap bulk storage)
  - **DynamoDB on-demand** for user profiles (~100KB per user, only for active users)
  - **ElastiCache Redis** for hot cache of top 10K users (your power users)
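
One way to read the three-tier feature store is as a read-through lookup:

1. Check the hot cache.
2. Fall back to the DynamoDB profile.
3. Fall back again to historical features in S3.
4. Warm the cache with whatever was found.

In this sketch plain Python objects stand in for Redis, DynamoDB and S3, and `TieredFeatureStore` is a hypothetical class. An LRU with a 10K capacity approximates "keep only power users hot".

```python
from collections import OrderedDict
from typing import Callable, Dict, Optional


class TieredFeatureStore:
    """Read-through lookup: hot cache -> profile table -> historical store."""

    def __init__(self, profiles: Dict[str, dict],
                 load_historical: Callable[[str], Optional[dict]],
                 hot_capacity: int = 10_000):
        self.hot: "OrderedDict[str, dict]" = OrderedDict()  # stands in for ElastiCache Redis
        self.profiles = profiles                  # stands in for DynamoDB on-demand
        self.load_historical = load_historical    # stands in for an S3/Athena read
        self.hot_capacity = hot_capacity

    def get(self, user_id: str) -> Optional[dict]:
        if user_id in self.hot:
            self.hot.move_to_end(user_id)  # mark as recently used
            return self.hot[user_id]
        features = self.profiles.get(user_id)
        if features is None:
            features = self.load_historical(user_id)
        if features is not None:
            self.hot[user_id] = features   # warm the cache for the next request
            if len(self.hot) > self.hot_capacity:
                self.hot.popitem(last=False)  # evict the least recently used user
        return features
```

In production you'd let Redis handle eviction itself, with a `maxmemory` limit and an LRU eviction policy, rather than doing it in application code.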

**Cost optimization trick**: Don't store features for every ping. Pre-compute features only for pings created in the last 30 days, plus the top 10% of evergreen content.
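
The selection rule as code. It assumes each ping record carries two hypothetical fields: `created_at`, and `engagement`, which stands for whatever popularity score you already track.

```python
import math
from datetime import datetime, timedelta, timezone
from typing import Iterable, Set


def pings_to_precompute(pings: Iterable[dict], now: datetime,
                        fresh_days: int = 30, evergreen_fraction: float = 0.10) -> Set[str]:
    """Ids of pings created in the last `fresh_days`, plus the top
    `evergreen_fraction` of older pings ranked by engagement."""
    cutoff = now - timedelta(days=fresh_days)
    pings = list(pings)
    fresh = [p for p in pings if p["created_at"] >= cutoff]
    older = sorted((p for p in pings if p["created_at"] < cutoff),
                   key=lambda p: p["engagement"], reverse=True)
    k = math.ceil(len(older) * evergreen_fraction)  # size of the evergreen slice
    return {p["ping_id"] for p in fresh} | {p["ping_id"] for p in older[:k]}
```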

### Real-time Serving

**Architecture:**
```
User request → API Gateway → Lambda (feature fetch) → Lambda (inference) → Response
                                ↓                              ↓
                          DynamoDB/Redis              Pre-computed candidates
```

**Two-stage serving:**
1. **Candidate generation** (runs every 15 min via EventBridge + Lambda):
   - For each active user, generate 100 candidates
   - Store in DynamoDB: `user_id → [ping_ids with scores]`
   
2. **Real-time ranking** (on-demand):
   - Lambda fetches 100 candidates from DynamoDB
   - Applies real-time signals (trending boost, just-followed creators)
   - Returns Top-10 in <50ms
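
A sketch of the real-time ranking step. It re-scores the stored `(ping_id, score)` candidates with the two live signals and keeps the top 10 with `heapq.nlargest`. The multiplicative boosts and their values (1.2 and 1.5) are illustrative assumptions to tune, not recommendations. `creator_of` is a hypothetical ping-to-creator lookup.

```python
import heapq
from typing import Dict, List, Set, Tuple


def rank_candidates(candidates: List[Tuple[str, float]],
                    creator_of: Dict[str, str],
                    trending: Set[str],
                    just_followed: Set[str],
                    k: int = 10,
                    trending_boost: float = 1.2,
                    follow_boost: float = 1.5) -> List[str]:
    """Re-score pre-computed candidates with real-time signals; return top-k ping ids."""
    def adjusted(item: Tuple[str, float]) -> float:
        ping_id, score = item
        if ping_id in trending:
            score *= trending_boost   # trending boost
        if creator_of.get(ping_id) in just_followed:
            score *= follow_boost     # creator the user just followed
        return score

    top = heapq.nlargest(k, candidates, key=adjusted)
    return [ping_id for ping_id, _ in top]
```

If the candidates come from DynamoDB via boto3's resource API, scores arrive as `decimal.Decimal`. Convert them with `float()` before multiplying by float boosts.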

**Latency targets:**
- P50: 30ms
- P99: 100ms
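
A nearest-rank percentile is enough to check a load test against these targets. A minimal sketch, with the thresholds taken from the targets above:

```python
import math
from typing import Sequence


def percentile(samples_ms: Sequence[float], p: float) -> float:
    """Nearest-rank percentile for p in (0, 100]."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_ms)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def meets_targets(samples_ms: Sequence[float], p50_ms: float = 30.0, p99_ms: float = 100.0) -> bool:
    # Both must hold; a handful of slow requests (e.g. Lambda cold starts) shows up in P99, not P50.
    return percentile(samples_ms, 50) <= p50_ms and percentile(samples_ms, 99) <= p99_ms
```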

**Why this works**: You're trading freshness (candidates can be up to 15 minutes stale) for cost. Lambda compute is cheap, and DynamoDB on-demand reads cost about $0.25 per million read request units.
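
The read side of that claim in numbers. The sketch assumes ~20 feed requests per user per day and a ~3 KB candidate item, both my assumptions. One read request unit covers a strongly consistent read of up to 4 KB, and an eventually consistent read costs half a unit. The $0.25 default is the figure quoted above.

```python
import math


def dynamodb_read_cost(requests_per_month: int, item_kb: float,
                       price_per_million_rru: float = 0.25,
                       eventually_consistent: bool = True) -> float:
    # Each read consumes one unit per 4 KB (rounded up) of item size...
    units = math.ceil(item_kb / 4)
    # ...and eventually consistent reads cost half as much.
    if eventually_consistent:
        units *= 0.5
    return requests_per_month * units / 1_000_000 * price_per_million_rru


# Assumed: 10K DAU x 20 feed requests/day x 30 days, ~3 KB candidate item.
print(f"${dynamodb_read_cost(10_000 * 20 * 30, 3.0):.2f}")
```

With 10K DAU that's 6M reads a month, or about $0.75. The same function shows why the ~100 KB user profiles deserve caching: each strongly consistent read of one costs 25 units.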

### Amazon Personalize vs Custom Model

**My recommendation: Start custom, consider Personalize at scale**

**Don't use Personalize initially because:**
- Real-time campaigns bill for provisioned throughput around the clock, so even the smallest deployment runs ~$500/month
- Less control over features (can't easily add freshness, country-specific signals)
- Cold-start handling requires custom logic anyway

**Low-cost alternative (custom model):**
- Simple logistic regression or LightGBM (trains in minutes on SageMaker Spot)
- Feature engineering in pandas (runs on Glue for <$1/day)
- Inference in Lambda (first 1M requests/month free)
- Total cost for 10K DAU: **~$50-150/month**
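
To show how small "simple logistic regression" can be, here is a stdlib-only version trained with batch gradient descent on log-loss. It's illustrative only: in practice you'd reach for scikit-learn or LightGBM on SageMaker. The learning rate, epoch count and optional L2 term are assumptions.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logreg(X: Sequence[Sequence[float]], y: Sequence[int],
                 lr: float = 0.1, epochs: int = 500, l2: float = 0.0) -> Tuple[List[float], float]:
    """Batch gradient descent on mean log-loss; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * d
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Gradient of log-loss w.r.t. the logit is (prediction - label).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
```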

**When to switch to Personalize:**
- Hit 100K+ DAU
- Need sophisticated collaborative filtering
- Team doesn't have ML expertise

**If using Personalize, I'd:**
- Send: `user_id`, `ping_id`, `event_type`, `timestamp`, `watch_time_ratio`
- Use Personalize's "User-Personalization" recipe
- Cost control: Batch inference daily instead of real-time, cache results in Redis
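
A sketch of mapping the app's events onto a Personalize Interactions dataset. `USER_ID`, `ITEM_ID`, `EVENT_TYPE`, `EVENT_VALUE` and `TIMESTAMP` (Unix seconds) are Personalize's standard interaction field names. Two choices here are my assumptions:

- carrying `watch_time_ratio` in `EVENT_VALUE`;
- defaulting it to 0.0 when it's absent.

The schema is passed to `create_schema` as a JSON string.

```python
import json
from datetime import datetime, timezone

# Avro-style schema for a Personalize Interactions dataset.
INTERACTIONS_SCHEMA = {
    "type": "record",
    "name": "Interactions",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {"name": "USER_ID", "type": "string"},
        {"name": "ITEM_ID", "type": "string"},
        {"name": "EVENT_TYPE", "type": "string"},
        {"name": "EVENT_VALUE", "type": "float"},
        {"name": "TIMESTAMP", "type": "long"},
    ],
    "version": "1.0",
}


def to_interaction_row(event: dict) -> dict:
    """Map one app event (fields listed above) to a Personalize interaction row."""
    ts = event["timestamp"]
    if isinstance(ts, datetime):
        ts = ts.timestamp()  # Personalize expects Unix epoch seconds
    return {
        "USER_ID": str(event["user_id"]),
        "ITEM_ID": str(event["ping_id"]),  # pings are the items
        "EVENT_TYPE": event["event_type"],
        # Assumption: watch_time_ratio as the event value, 0.0 when missing.
        "EVENT_VALUE": float(event.get("watch_time_ratio") or 0.0),
        "TIMESTAMP": int(ts),
    }


def schema_json() -> str:
    # The string you'd pass as the `schema` argument of personalize.create_schema.
    return json.dumps(INTERACTIONS_SCHEMA)
```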

### Complete Cost Estimate (10K DAU)

| Service | Monthly Cost |
|---------|-------------|
| Kinesis Firehose | $15 |
| S3 storage | $10 |
| Glue ETL (daily) | $30 |
| DynamoDB | $25 |
| Lambda (serving) | $20 |
| SageMaker training | $15 (spot) |
| ElastiCache (optional) | $15 |
| **Total** | **~$130/month** |

### Architecture Diagram
```
┌─────────────┐
│  Mobile App │
└──────┬──────┘
       │ events
       ↓
┌─────────────────┐      ┌──────────┐
│  API Gateway    │ →    │  Lambda  │ → Kinesis Firehose → S3
│  /track-event   │      │ (buffer) │                       │
└─────────────────┘      └──────────┘                       │
                                                             │
                    ┌────────────────────────────────────────┘
                    ↓
            ┌────────────────┐
            │  AWS Glue ETL  │ (daily 3 AM)
            │  Feature Eng   │
            └────────┬───────┘
                     ↓
        ┌────────────────────────┐
        │  SageMaker Training    │ (spot instances)
        │  → Model to S3         │
        └────────────────────────┘
                     ↓
    ┌────────────────────────────────────┐
    │  EventBridge (every 15 min)        │
    └────────────┬───────────────────────┘
                 ↓
    ┌──────────────────────────────┐
    │  Lambda: Generate Candidates │
    │  → Store in DynamoDB         │
    └──────────────────────────────┘
                 ↑
    ┌────────────┴─────────────┐
    │  /get-feed API           │
    │  Lambda (real-time rank) │ ← User request
    └──────────────────────────┘