# TasteTrend - AWS GenAI PoC Demo (End-to-End on AWS)

This notebook demonstrates the deployed AWS architecture end-to-end:
- ETL Lambda reads raw files from S3, standardizes them, and writes a processed dataset to S3.
- Embedding Lambda generates Titan embeddings and upserts vectors into Amazon OpenSearch.
- Retrieval Lambda runs vector search over the index and is exposed to the Bedrock Agent as an action group.
- The public Query API (API Gateway + Proxy Lambda + Bedrock Agent) answers business questions over the indexed data.
- Automated evaluation runs accuracy and latency checks against a small gold set.

Prerequisites:
- AWS credentials configured for the target account and region with permissions to invoke the Lambdas and read S3.
- The infrastructure has been deployed via Terraform (S3 buckets, Lambdas, OpenSearch, API Gateway, Bedrock Agent).


In [None]:
# 1. Setup environment
import os, sys, json, time
from getpass import getpass
from dotenv import load_dotenv

# Load local .env if present
load_dotenv()

# Make src importable
project_root = os.path.abspath(os.path.join(os.getcwd(), "..", "src"))
if project_root not in sys.path:
    sys.path.append(project_root)

# --- User-editable configuration (use values from your Terraform outputs) ---
AWS_REGION = os.environ.get("AWS_REGION", "eu-central-1")
RAW_BUCKET = os.environ.get("TT_RAW_BUCKET", "<your-raw-bucket-name>")
PROCESSED_BUCKET = os.environ.get("TT_PROCESSED_BUCKET", "<your-processed-bucket-name>")

ETL_LAMBDA_NAME     = os.environ.get("TT_ETL_LAMBDA", "tt-etl-handler")
EMBED_LAMBDA_NAME   = os.environ.get("TT_EMBED_LAMBDA", "tt-embed-handler")
SEARCH_LAMBDA_NAME  = os.environ.get("TT_SEARCH_LAMBDA", "tt-search-reviews-handler")

# Public API Gateway base URL for Bedrock Agent proxy (no trailing slash)
os.environ.setdefault("TT_API_URL", "https://your-api-id.execute-api.eu-central-1.amazonaws.com/prod")
if not os.getenv("TT_API_KEY"):
    os.environ["TT_API_KEY"] = getpass("Please enter your TasteTrend API key: ")

print("Environment ready.")
print("AWS_REGION:", AWS_REGION)
print("RAW_BUCKET:", RAW_BUCKET)
print("PROCESSED_BUCKET:", PROCESSED_BUCKET)
print("ETL_LAMBDA_NAME:", ETL_LAMBDA_NAME)
print("EMBED_LAMBDA_NAME:", EMBED_LAMBDA_NAME)
print("SEARCH_LAMBDA_NAME:", SEARCH_LAMBDA_NAME)
print("TT_API_URL:", os.getenv("TT_API_URL"))
print("TT_API_KEY loaded:", bool(os.getenv("TT_API_KEY")))
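
The setup cell falls back to template defaults such as `<your-raw-bucket-name>` when an environment variable is missing, so a later cell can fail with a confusing S3 or HTTP error. The sketch below is one possible guard, not part of the project code: `find_unset_placeholders` is a hypothetical helper, and the bucket name `tt-processed-demo` is made up for illustration.

```python
def find_unset_placeholders(config: dict) -> list:
    """Return the names of settings that are empty or still look like template defaults."""
    unset = []
    for name, value in config.items():
        text = (value or "").strip()
        # "<your-...>" and "your-api-id" are the template defaults used in the setup cell.
        is_template = (text.startswith("<") and text.endswith(">")) or "your-api-id" in text
        if not text or is_template:
            unset.append(name)
    return unset


example_config = {
    "RAW_BUCKET": "<your-raw-bucket-name>",
    "PROCESSED_BUCKET": "tt-processed-demo",  # hypothetical bucket name
    "TT_API_URL": "https://your-api-id.execute-api.eu-central-1.amazonaws.com/prod",
}
print("Still unset:", find_unset_placeholders(example_config))
```

In the notebook, the same call could be made with the real `RAW_BUCKET`, `PROCESSED_BUCKET`, and `os.getenv("TT_API_URL")` values before any AWS request is sent.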

In [None]:
# 2. Optional: list raw S3 files to confirm inputs
import boto3

s3 = boto3.client("s3", region_name=AWS_REGION)
resp = s3.list_objects_v2(Bucket=RAW_BUCKET)
print("Objects in raw bucket:")
for obj in resp.get("Contents", []):
    print(" -", obj["Key"])
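
A single `list_objects_v2` call returns at most 1,000 keys, so the listing above is complete only for small buckets. A minimal sketch of a paginated listing follows; `iter_object_keys` is a hypothetical helper, and the fake pages (with made-up file names) only mimic the response shape so the example runs without AWS access.

```python
def iter_object_keys(pages):
    """Yield every object key from an iterable of list_objects_v2 response pages."""
    for page in pages:
        # A page for an empty bucket or prefix has no "Contents" entry.
        for obj in page.get("Contents", []):
            yield obj["Key"]


# With boto3, the pages would come from the built-in paginator:
#   pages = s3.get_paginator("list_objects_v2").paginate(Bucket=RAW_BUCKET)
# The pages below are stand-ins with hypothetical keys.
fake_pages = [
    {"Contents": [{"Key": "reviews_a.csv"}, {"Key": "reviews_b.json"}]},
    {},  # empty page
    {"Contents": [{"Key": "reviews_c.csv"}]},
]
for key in iter_object_keys(fake_pages):
    print(" -", key)
```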

In [None]:
# 3. Run ETL on AWS Lambda (reads from RAW_BUCKET, writes to PROCESSED_BUCKET)
import boto3, json

lambda_client = boto3.client("lambda", region_name=AWS_REGION)

etl_event = {
    # Your ETL lambda reads RAW_BUCKET and writes processed file(s) to PROCESSED_BUCKET.
    # No extra payload is strictly required if the lambda uses env vars. This is here for traceability.
    "raw_bucket": RAW_BUCKET,
    "processed_bucket": PROCESSED_BUCKET
}

print("Invoking ETL lambda:", ETL_LAMBDA_NAME)
etl_resp = lambda_client.invoke(
    FunctionName=ETL_LAMBDA_NAME,
    InvocationType="RequestResponse",
    Payload=json.dumps(etl_event).encode("utf-8"),
)
etl_payload = etl_resp["Payload"].read().decode("utf-8")
print("ETL response:", etl_payload)

# The ETL lambda writes a processed dataset. We expect a file like 'processed_final.csv' in the processed bucket.
processed_key = "processed_final.csv"  # keep in sync with your lambda
print("Expected processed key:", processed_key)
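
A synchronous `invoke` returns HTTP status 200 even when the handler itself raises; the failure is reported through the `FunctionError` field of the response. The sketch below shows one way to surface that before moving on to the next step. `read_lambda_result` is a hypothetical helper, and the two fake responses only imitate the shape of a boto3 response (a `Payload` object with a `read()` method).

```python
import io
import json


def read_lambda_result(resp: dict):
    """Decode a synchronous Lambda invoke response, raising if the function failed."""
    raw = resp["Payload"].read().decode("utf-8")
    # Lambda sets "FunctionError" when the handler raised or timed out;
    # the HTTP-level StatusCode is still 200 in that case.
    if resp.get("FunctionError"):
        raise RuntimeError(f"Lambda failed ({resp['FunctionError']}): {raw}")
    return json.loads(raw) if raw else None


# Stand-ins for boto3 responses; the payload contents are made up.
ok_resp = {"StatusCode": 200, "Payload": io.BytesIO(b'{"status": "ok"}')}
bad_resp = {
    "StatusCode": 200,
    "FunctionError": "Unhandled",
    "Payload": io.BytesIO(b'{"errorMessage": "boom"}'),
}

print(read_lambda_result(ok_resp))
try:
    read_lambda_result(bad_resp)
except RuntimeError as exc:
    print("Caught:", exc)
```

The same helper could wrap `etl_resp`, `embed_resp`, and `search_resp` so that each step stops at the first failed invocation.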

In [None]:
# 4. Preview first few lines of the processed dataset in S3
import io, csv

obj = s3.get_object(Bucket=PROCESSED_BUCKET, Key=processed_key)
body = obj["Body"].read().decode("utf-8", errors="replace")

reader = csv.reader(io.StringIO(body))
rows = []
for i, row in enumerate(reader):
    rows.append(row)
    if i >= 5:
        break

print("Processed file preview (first 6 rows):")
for r in rows:
    print(r)
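
The preview above downloads and decodes the whole object before reading six rows. For larger processed files, a lazier variant may be preferable. The sketch below parses only as many lines as needed from any iterable of text lines; `preview_csv` is a hypothetical helper, and the column names in the sample data are invented.

```python
import csv
import io
from itertools import islice


def preview_csv(lines, n_rows=6):
    """Return the first n_rows parsed CSV rows from an iterable of text lines."""
    return list(islice(csv.reader(lines), n_rows))


# With boto3, lines could be streamed from S3 instead of read in one go, e.g.:
#   lines = (raw.decode("utf-8", errors="replace") for raw in obj["Body"].iter_lines())
# Caveat: iter_lines() splits on newlines, so quoted fields that span lines would break.
sample = io.StringIO(
    "location,rating,review\n"
    "Uptown,5,Great food\n"
    "Riverside,2,Slow service\n"
)
for row in preview_csv(sample, n_rows=2):
    print(row)
```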

In [None]:
# 5. Embedding and indexing step on AWS Lambda (Bedrock -> OpenSearch upsert)
# This calls your embedding lambda with the S3 CSV the ETL produced.
import json

embed_event = {
    "s3_csv_uri": f"s3://{PROCESSED_BUCKET}/{processed_key}",
    # Optionally override index name if your lambda supports it
    # "os_index": "reviews_v2"
}

print("Invoking Embedding lambda:", EMBED_LAMBDA_NAME)
embed_resp = lambda_client.invoke(
    FunctionName=EMBED_LAMBDA_NAME,
    InvocationType="RequestResponse",
    Payload=json.dumps(embed_event).encode("utf-8"),
)
embed_payload = embed_resp["Payload"].read().decode("utf-8")
print("Embedding response:", embed_payload)
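
The embedding event passes the dataset location as a single `s3://bucket/key` string. How the deployed Lambda parses it is not shown in this notebook; the sketch below is one plausible approach using the standard library, with `split_s3_uri` as a hypothetical helper and `tt-processed-demo` as a made-up bucket name.

```python
from urllib.parse import urlparse


def split_s3_uri(uri: str):
    """Split 's3://bucket/key' into (bucket, key), rejecting anything else."""
    parsed = urlparse(uri)
    key = parsed.path.lstrip("/")
    if parsed.scheme != "s3" or not parsed.netloc or not key:
        raise ValueError(f"Not a valid s3://bucket/key URI: {uri!r}")
    # Note: keys containing '?' or '#' would need extra handling with urlparse.
    return parsed.netloc, key


bucket, key = split_s3_uri("s3://tt-processed-demo/processed_final.csv")
print(bucket, key)
```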

In [None]:
# 6. Optional: directly invoke vector search lambda (as used by the Bedrock Agent action group)
search_query = "Great food but slow service. What location is that likely about?"

search_event = {
    "body": json.dumps({"query": search_query})
}

print("Invoking Search lambda:", SEARCH_LAMBDA_NAME)
search_resp = lambda_client.invoke(
    FunctionName=SEARCH_LAMBDA_NAME,
    InvocationType="RequestResponse",
    Payload=json.dumps(search_event).encode("utf-8"),
)
search_payload = search_resp["Payload"].read().decode("utf-8")
print("Search response:", search_payload)
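
The search event wraps the query in a JSON-encoded `body`, which suggests the Lambda follows the API Gateway proxy convention. Assuming the response uses the matching shape (`statusCode` plus a JSON string in `body`), it could be unwrapped as sketched below. The response shape and the `results`, `location`, and `score` fields are assumptions for illustration, not the documented output of the deployed function.

```python
import json


def unwrap_proxy_response(payload_text: str):
    """Return (status_code, body) from a proxy-style Lambda response string."""
    outer = json.loads(payload_text)
    if isinstance(outer, dict) and "body" in outer:
        status = outer.get("statusCode", 200)
        body = outer["body"]
        if isinstance(body, str):
            try:
                body = json.loads(body)
            except json.JSONDecodeError:
                pass  # leave non-JSON bodies as plain text
        return status, body
    # Not proxy-shaped: hand back the decoded payload unchanged.
    return 200, outer


# Hypothetical payload imitating a proxy-style search response.
sample_payload = json.dumps(
    {"statusCode": 200, "body": json.dumps({"results": [{"location": "Uptown", "score": 0.83}]})}
)
status, body = unwrap_proxy_response(sample_payload)
print(status, body)
```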

In [None]:
# 7. Public Query API (API Gateway + Proxy Lambda + Bedrock Agent)
# Uses src/api/query_client.py
from api.query_client import ask

query = "What do customers like most about the Uptown location?"
answer, refs, ms = ask(query)
print(f"{ms:.0f} ms | {answer}\nReferences: {refs}")
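
`ask` lives in `src/api/query_client.py` and is not reproduced in this notebook. For readers without the source at hand, the sketch below shows roughly what such a client has to do: build an authenticated JSON request and time the call. The `/query` route and the `query` body field are assumptions; `x-api-key` is the standard header for API Gateway API keys. `build_query_request` and `timed` are hypothetical names, and no network call is made here.

```python
import json
import time


def build_query_request(question: str, api_url: str, api_key: str):
    """Assemble URL, headers, and JSON body for a POST to the query API (shape assumed)."""
    url = api_url.rstrip("/") + "/query"  # hypothetical route
    headers = {"Content-Type": "application/json", "x-api-key": api_key}
    body = json.dumps({"query": question}).encode("utf-8")  # hypothetical field name
    return url, headers, body


def timed(fn, *args):
    """Call fn(*args) and return (result, elapsed_milliseconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, (time.perf_counter() - start) * 1000.0


request, ms = timed(
    build_query_request,
    "What do customers like most about the Uptown location?",
    "https://your-api-id.execute-api.eu-central-1.amazonaws.com/prod",
    "dummy-key",
)
print(f"Built request for {request[0]} in {ms:.3f} ms")
```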

In [None]:
# 8. Automated evaluation (uses Bedrock embeddings for the semantic similarity metric)
from api.eval import run_eval
import json

questions = [
    "What is the best restaurant overall?",
    "What is the general consensus of the downtown restaurant?",
    "What do customers like most about the Uptown location?",
    "What do people complain about in the Riverside restaurant?",
    "How does service quality compare between Uptown and Riverside?",
]

results = run_eval(questions)
print(json.dumps(results, indent=2))
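
The internals of `run_eval` are not shown here. A common way to build a semantic metric from embeddings is cosine similarity between the answer vector and a gold-answer vector, and latency checks are usually summarized with a mean and a high percentile. The sketch below illustrates both under those assumptions; the function names, the tiny vectors, and the latency values are all made up.

```python
import math
import statistics


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def latency_summary(latencies_ms):
    """Mean, nearest-rank 95th percentile, and max of a list of latencies."""
    ordered = sorted(latencies_ms)
    rank = max(1, math.ceil(0.95 * len(ordered)))  # nearest-rank method
    return {
        "mean_ms": statistics.mean(ordered),
        "p95_ms": ordered[rank - 1],
        "max_ms": ordered[-1],
    }


# Toy inputs standing in for real embeddings and measured latencies.
answer_vec, gold_vec = [0.2, 0.7, 0.1], [0.25, 0.65, 0.05]
print("similarity:", round(cosine_similarity(answer_vec, gold_vec), 3))
print("latency:", latency_summary([820, 910, 1040, 1500, 990]))
```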

# 9. MVP Plan and Cost

| Component | AWS Service | Est. Monthly Cost | Notes |
|------------|--------------|------------------|--------|
| Storage | Amazon S3 | ~$1 | Raw and processed data. Free Tier covers small PoC. |
| Compute | AWS Lambda | <$5 | ETL, Embedding, Search, Proxy. |
| Vector DB | Amazon OpenSearch Service | ~$15 | Small single-node for vectors. |
| GenAI | Amazon Bedrock (Titan Embeddings, Agent) | ~$10–20 | Depends on volume of embeddings and agent queries. |
| API Layer | Amazon API Gateway | <$5 | Public query endpoint. |
| Monitoring & Security | CloudWatch, IAM, KMS | ~$2 | Basic logs and encryption. |
| Total (MVP estimate) |  | **~$30–45** | Low-cost PoC; scales with data and traffic. |
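
As a quick sanity check on the total, the component estimates can be summed directly. The sketch below copies the figures from the table and reads each "<$5" entry as a range of 0 to 5 USD, which is an interpretation rather than something the table states.

```python
# (low, high) monthly estimates in USD, copied from the table above.
# "<$5" is interpreted as the range 0 to 5.
estimates = {
    "Storage": (1, 1),
    "Compute": (0, 5),
    "Vector DB": (15, 15),
    "GenAI": (10, 20),
    "API Layer": (0, 5),
    "Monitoring & Security": (2, 2),
}

low = sum(lo for lo, _ in estimates.values())
high = sum(hi for _, hi in estimates.values())
print(f"Component sum: ${low} to ${high} per month")
```

Under that reading the components add up to roughly $28 to $48 per month, which brackets the rounded ~$30–45 figure in the table.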

Next steps:
- Add real-time ingestion from review platforms.
- Build a dashboard for business users.
- Implement multi-tenant access and authentication.
- Automate daily refresh and extend analytics.
