# Assignment: Stream Application Input/Output Logs to BigQuery
## Objective
This assignment implements a logging pipeline for a web application: the application's input/output (I/O) and request logs are streamed to Google BigQuery, where they can be queried with SQL for analytics, auditing, and debugging. You will set up BigQuery, modify a simple application to emit structured logs, and configure Cloud Logging to route these logs to BigQuery.

## Part 1: GCP Setup and Basic Web Application (25 Marks)

1.  **GCP Project Setup:**
    * Ensure you have an active Google Cloud Platform (GCP) project with billing enabled.
    * Enable the following APIs: Cloud Run API, Cloud Logging API, BigQuery API, Artifact Registry API, and Cloud Build API (needed for `gcloud builds submit` in Part 2).
    * Provide the `gcloud services enable` command(s) covering every required API.

2.  **BigQuery Dataset and Table:**
    * Create a new BigQuery dataset (e.g., `app_logs_dataset`).
    * Create a BigQuery table (e.g., `request_logs`) within this dataset with a schema suitable for storing application I/O and request details. Consider fields like:
        * `timestamp` (TIMESTAMP)
        * `request_id` (STRING)
        * `user_id` (STRING, if applicable)
        * `endpoint` (STRING)
        * `http_method` (STRING)
        * `request_payload` (JSON or STRING)
        * `response_payload` (JSON or STRING)
        * `status_code` (INTEGER)
        * `latency_ms` (INTEGER)
        * `log_level` (STRING)
        * `message` (STRING)
    * Provide the `bq mk` command for the dataset and the `bq mk --table` command, including the schema definition, for the table.

3.  **Simple Web Application:**
    * Create a simple web application (e.g., Python Flask/FastAPI, Node.js Express).
    * It should have at least one API endpoint (e.g., `/process` or `/calculate`).
    * This endpoint should:
        * Accept a JSON `POST` request with some input data.
        * Perform a very simple operation (e.g., sum two numbers, concatenate strings, or a mock calculation).
        * Return a JSON response.
    * Include a `Dockerfile` for your application.
    * Provide the source code for your application and its `Dockerfile`.
    * Implement basic `print()` or `logging.info()` statements for requests and responses for now; you'll enhance these in Part 2.
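
For the table in step 2, one way to keep the schema reviewable is to generate the JSON schema file that `bq mk --table` accepts. The sketch below is only one possible approach: the file name `request_logs_schema.json`, the helper names, the `REQUIRED`/`NULLABLE` modes, and the choice of `STRING` for the payload columns are all assumptions made for illustration.

```python
import json

# Column names and types follow the fields suggested in step 2.
# Modes are an assumption: only timestamp and request_id are REQUIRED here.
FIELDS = [
    ("timestamp", "TIMESTAMP", "REQUIRED"),
    ("request_id", "STRING", "REQUIRED"),
    ("user_id", "STRING", "NULLABLE"),
    ("endpoint", "STRING", "NULLABLE"),
    ("http_method", "STRING", "NULLABLE"),
    ("request_payload", "STRING", "NULLABLE"),
    ("response_payload", "STRING", "NULLABLE"),
    ("status_code", "INTEGER", "NULLABLE"),
    ("latency_ms", "INTEGER", "NULLABLE"),
    ("log_level", "STRING", "NULLABLE"),
    ("message", "STRING", "NULLABLE"),
]


def build_schema(fields=FIELDS):
    """Return the schema as a list of dicts in BigQuery's JSON schema layout."""
    return [{"name": name, "type": type_, "mode": mode} for name, type_, mode in fields]


def write_schema(path="request_logs_schema.json"):
    """Write the schema to a file (hypothetical file name) and return its path."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(build_schema(), fh, indent=2)
    return path
```

Calling `write_schema()` produces a file you could pass to a command along the lines of `bq mk --table [PROJECT-ID]:app_logs_dataset.request_logs request_logs_schema.json`.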

Your answer for this part should include:

* Your GCP CLI commands for API enablement.
* `bq mk` and `bq mk --table` commands with your chosen schema.
* Source code for your simple web application and Dockerfile.
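
As a starting point for the web application in step 3, here is a minimal sketch that uses only the Python standard library (a WSGI callable) so that it runs without extra dependencies. A Flask or FastAPI version would be more typical for this assignment. The names `process`, `app`, and `call`, and the `a`/`b` input keys, are hypothetical.

```python
import json
from io import BytesIO


def process(payload):
    """The 'very simple operation': add the two inputs."""
    return {"result": payload["a"] + payload["b"]}


def app(environ, start_response):
    """Minimal WSGI application exposing POST /process."""
    if environ["PATH_INFO"] != "/process" or environ["REQUEST_METHOD"] != "POST":
        status, body = "404 Not Found", {"error": "not found"}
    else:
        length = int(environ.get("CONTENT_LENGTH") or 0)
        try:
            payload = json.loads(environ["wsgi.input"].read(length))
            status, body = "200 OK", process(payload)
        except (ValueError, KeyError, TypeError):
            # Malformed JSON, missing keys, or inputs that cannot be added.
            status, body = "400 Bad Request", {"error": "expected JSON with 'a' and 'b'"}
    data = json.dumps(body).encode("utf-8")
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(data)))])
    return [data]


def call(wsgi_app, method, path, body=b""):
    """Invoke the app in-process (no network) and return (status, parsed JSON)."""
    captured = {}
    environ = {
        "REQUEST_METHOD": method,
        "PATH_INFO": path,
        "CONTENT_LENGTH": str(len(body)),
        "wsgi.input": BytesIO(body),
    }
    chunks = wsgi_app(environ, lambda status, headers: captured.update(status=status))
    return captured["status"], json.loads(b"".join(chunks))
```

To serve it, something like `wsgiref.simple_server.make_server("", 8080, app).serve_forever()` would work locally; on Cloud Run the container is expected to listen on the port given in the `PORT` environment variable.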

## Part 2: Structured Logging and Deployment to Cloud Run (35 Marks)

1.  **Structured Logging in Application:**
    * Modify your web application to emit **structured logs**, one JSON object per line, to standard output or standard error (stdout/stderr).
    * For each incoming request to your API endpoint, log:
        * The full request payload.
        * The generated response payload.
        * `request_id` (generate a unique ID for each request).
        * `timestamp`.
        * `endpoint`, `http_method`.
        * `status_code`, `latency_ms`.
        * A descriptive `message` (e.g., "Request processed successfully").
        * (Optional but recommended) `user_id` if you can mock one.
    * Use standard logging libraries (e.g., Python's `logging` with a JSON formatter such as `python-json-logger`, or Node.js's `winston` with JSON output).
    * Provide the updated source code for your application, highlighting the structured logging implementation.

2.  **Containerization and Deployment:**
    * Build your Docker image using Cloud Build (or build it locally).
    * Push the image to an Artifact Registry Docker repository (e.g., `[REGION]-docker.pkg.dev/[PROJECT-ID]/[REPOSITORY]/my-app-logger:latest`).
    * Deploy your application to Cloud Run, allowing unauthenticated invocations.
    * Provide `gcloud builds submit` and `gcloud run deploy` commands.

3.  **Test Deployed Application and Cloud Logging:**
    * Access your Cloud Run service URL and send several requests to your API endpoint (e.g., using `curl`, Postman, or a simple Python script).
    * Navigate to Cloud Logging in the GCP Console.
    * Filter logs by your Cloud Run service and verify that:
        * Your structured JSON logs are correctly parsed and displayed in the Logs Explorer.
        * Fields like `request_payload`, `response_payload`, `status_code`, etc., appear as individual fields (under `jsonPayload`) in the structured log entries.
    * Take a screenshot of the Cloud Logging interface showing your parsed structured logs.
    * Discuss how structured logging improves log readability and queryability.
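
The structured logging described in step 1 can be sketched with Python's standard `logging` module and a small custom formatter, as an alternative to a third-party JSON formatter. This is a sketch under stated assumptions: `JsonFormatter`, `build_logger`, and `log_request` are hypothetical names, and the `severity` key is included on the assumption that Cloud Logging uses it to set the entry's log level, which you should confirm in the Logs Explorer.

```python
import json
import logging
import sys
import time
import uuid
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single-line JSON object."""

    def format(self, record):
        entry = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "severity": record.levelname,   # assumed to map to the Cloud Logging level
            "log_level": record.levelname,  # matches the column suggested in Part 1
            "message": record.getMessage(),
        }
        # Per-request fields are passed through logging's `extra` mechanism.
        entry.update(getattr(record, "fields", {}))
        return json.dumps(entry, default=str)


def build_logger(stream=None):
    """Create a logger that writes JSON lines to stdout (or a given stream)."""
    logger = logging.getLogger("app.requests")
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    handler = logging.StreamHandler(stream if stream is not None else sys.stdout)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.propagate = False
    return logger


def log_request(logger, endpoint, http_method, request_payload,
                response_payload, status_code, started):
    """Log one request; `started` is a time.monotonic() value taken on arrival."""
    fields = {
        "request_id": str(uuid.uuid4()),
        "endpoint": endpoint,
        "http_method": http_method,
        "request_payload": request_payload,
        "response_payload": response_payload,
        "status_code": status_code,
        "latency_ms": int((time.monotonic() - started) * 1000),
    }
    logger.info("Request processed successfully", extra={"fields": fields})
    return fields["request_id"]
```

In a request handler you would record `started = time.monotonic()` before doing the work and call `log_request(...)` just before returning the response.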

Your answer for this part should include:

* Updated source code of your web application with structured logging.
* `gcloud builds submit` and `gcloud run deploy` commands.
* Screenshot of structured logs in Cloud Logging.
* Discussion on structured logging benefits.

## Part 3: Log Sink to BigQuery (30 Marks)

1.  **Create a Cloud Logging Sink:**
    * Create a Cloud Logging sink that routes logs from your Cloud Run service to BigQuery.
    * The sink should:
        * Use a BigQuery dataset as its destination. A Cloud Logging sink writes to a dataset, not to an individual table; Cloud Logging creates and names the tables itself, based on the name of the log being routed.
        * Specify the dataset created in Part 1 (e.g., `bigquery.googleapis.com/projects/[PROJECT-ID]/datasets/app_logs_dataset`).
        * Include a filter to capture only logs from your specific Cloud Run service.
        * Optionally set the `--use-partitioned-tables` flag (the `bigqueryOptions.usePartitionedTables` field in the API) if you want partitioned tables rather than date-sharded ones. The schema of the sink's tables is derived from the log entries it receives, so keep your log field names and types consistent.
    * Provide the `gcloud logging sinks create` command.

2.  **Grant Sink Permissions:**
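    * After you create the sink, the command output reports the sink's writer identity, which is a service account email.
    * Grant this service account the `BigQuery Data Editor` role (`roles/bigquery.dataEditor`) on your BigQuery dataset.
    * Provide the command you used. `gcloud projects add-iam-policy-binding` works, but it grants the role across the whole project; a dataset-level grant is narrower.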

3.  **Generate Traffic and Verify in BigQuery:**
    * Send 20 to 50 requests to your deployed Cloud Run application to generate enough logs.
    * Navigate to BigQuery in the GCP Console.
    * Query the table that the sink writes to. Cloud Logging names this table after the log it routes (for Cloud Run stdout, typically `run_googleapis_com_stdout`), so it may not be the `request_logs` table you created in Part 1.
    * Verify that the structured log entries from your application appear as rows, with your fields available as columns (typically nested under `jsonPayload`).
    * Run a few analytical queries (e.g., count requests by endpoint, calculate average latency, find requests with errors).
    * Provide a screenshot of your BigQuery table showing the ingested logs and the results of at least one analytical query.
    * Discuss how the BigQuery integration enables powerful log analytics.
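
The sink from step 1 can be sketched by assembling the `gcloud logging sinks create` arguments in Python, which makes the filter and destination easy to inspect before running anything. BigQuery sink destinations take the dataset form `bigquery.googleapis.com/projects/[PROJECT-ID]/datasets/[DATASET]`. The sink name `app-logs-sink`, the project ID `my-project`, and the helper names below are placeholders chosen for illustration.

```python
import shlex


def sink_filter(service_name):
    """Log filter matching entries from one Cloud Run service."""
    return (
        'resource.type="cloud_run_revision" AND '
        f'resource.labels.service_name="{service_name}"'
    )


def sink_destination(project_id, dataset_id):
    """Sink destination string for a BigQuery dataset."""
    return f"bigquery.googleapis.com/projects/{project_id}/datasets/{dataset_id}"


def sink_create_command(sink_name, project_id, dataset_id, service_name, partitioned=True):
    """Build the argv list for `gcloud logging sinks create`."""
    argv = [
        "gcloud", "logging", "sinks", "create", sink_name,
        sink_destination(project_id, dataset_id),
        f"--log-filter={sink_filter(service_name)}",
        f"--project={project_id}",
    ]
    if partitioned:
        argv.append("--use-partitioned-tables")
    return argv


# Placeholder values; substitute your own project, dataset, and service names.
print(shlex.join(sink_create_command(
    "app-logs-sink", "my-project", "app_logs_dataset", "my-app-logger")))
```

The printed line is shell-quoted, so it can be pasted into a terminal or passed to `subprocess.run` as the original list.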

Your answer for this part should include:

* `gcloud logging sinks create` command with the filter.
* `gcloud projects add-iam-policy-binding` command for the sink service account.
* Screenshot of logs in BigQuery table and results of analytical query.
* Discussion on BigQuery analytics benefits.
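
The analytical queries mentioned in step 3 (requests per endpoint, average latency, error count) can be prototyped locally before running them in BigQuery. The sketch below uses an in-memory SQLite table as a stand-in, with a handful of made-up rows that exist only to exercise the SQL. In BigQuery you would point the same kind of query at your own table, and the column references may need a `jsonPayload.` prefix depending on how the data landed.

```python
import sqlite3

# Illustrative rows only: (endpoint, status_code, latency_ms).
SAMPLE_ROWS = [
    ("/process", 200, 12),
    ("/process", 200, 18),
    ("/process", 400, 6),
    ("/calculate", 200, 30),
]


def run_analytics(rows=SAMPLE_ROWS):
    """Run three summary queries against an in-memory stand-in table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE request_logs (endpoint TEXT, status_code INTEGER, latency_ms INTEGER)"
    )
    conn.executemany("INSERT INTO request_logs VALUES (?, ?, ?)", rows)

    # Requests per endpoint.
    by_endpoint = conn.execute(
        "SELECT endpoint, COUNT(*) FROM request_logs GROUP BY endpoint ORDER BY endpoint"
    ).fetchall()
    # Average latency across all requests.
    avg_latency = conn.execute("SELECT AVG(latency_ms) FROM request_logs").fetchone()[0]
    # Requests that returned a client or server error.
    errors = conn.execute(
        "SELECT COUNT(*) FROM request_logs WHERE status_code >= 400"
    ).fetchone()[0]

    conn.close()
    return by_endpoint, avg_latency, errors
```

The three statements use only `COUNT`, `AVG`, `GROUP BY`, and `WHERE`, so they should carry over to BigQuery once the table name is replaced with a fully qualified one.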

## Part 4: Reflection and Clean-up (10 Marks)

1.  **Benefits of Centralized Logging:**
    * Summarize the benefits of streaming application I/O logs to a centralized data warehouse like BigQuery.
    * How does this approach enhance debugging, auditing, and business intelligence compared to just viewing logs in Cloud Logging?

2.  **Considerations for Production:**
    * What additional considerations would be important for implementing this logging solution in a production environment (e.g., cost optimization, data retention, PII handling, log sampling, error handling in application)?

3.  **Clean Up Resources:**
    * After completing the assignment, delete all created GCP resources to avoid incurring unnecessary costs:
        * Cloud Run service.
        * BigQuery dataset (which deletes the table).
        * Cloud Logging sink.
        * Artifact Registry images.
    * Provide the relevant `gcloud` and `bq` commands for thorough cleanup.
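
For the PII handling point under "Considerations for Production", one common pattern is to mask sensitive values before a payload is logged. The sketch below is illustrative only: the `redact` helper, the key names in `SENSITIVE_KEYS`, and the `[REDACTED]` mask are assumptions, and a real deployment would need a list of sensitive fields agreed with whoever owns the data.

```python
# Hypothetical set of keys treated as sensitive; adjust for your own payloads.
SENSITIVE_KEYS = frozenset({"password", "email", "token"})


def redact(value, sensitive_keys=SENSITIVE_KEYS, mask="[REDACTED]"):
    """Return a copy of a JSON-like value with sensitive keys masked.

    Dicts and lists are walked recursively; the input is not modified.
    """
    if isinstance(value, dict):
        return {
            key: mask if str(key).lower() in sensitive_keys
            else redact(item, sensitive_keys, mask)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact(item, sensitive_keys, mask) for item in value]
    return value
```

Applying `redact(...)` to the request and response payloads just before they are logged keeps the masking in one place, so the same rule covers both Cloud Logging and BigQuery.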

## Submission Guidelines

* Submit this Jupyter Notebook (.ipynb file) with all cells executed and outputs visible.
* Provide the full source code for your web application and its `Dockerfile` in a compressed archive (e.g., `.zip` file) or a link to a Git repository.
* Clearly present all requested commands, schemas, URLs, and screenshots.
* Ensure your application can be deployed and verified, and logs can be streamed to BigQuery as demonstrated.