feat: update database with validators report #383

Merged: 9 commits, Apr 23, 2024
3 changes: 1 addition & 2 deletions .github/workflows/db-deployer.yml
@@ -99,7 +99,6 @@ jobs:
- name: Populate Variables
run: |
scripts/replace-variables.sh -in_file infra/backend.conf.rename_me -out_file infra/postgresql/backend.conf -variables BUCKET_NAME,OBJECT_PREFIX
cp infra/postgresql/backend.conf infra/terraform-init/backend.conf
scripts/replace-variables.sh -in_file infra/postgresql/vars.tfvars.rename_me -out_file infra/postgresql/vars.tfvars -variables ENVIRONMENT,PROJECT_ID,REGION,DEPLOYER_SERVICE_ACCOUNT,POSTGRE_SQL_INSTANCE_NAME,POSTGRE_SQL_DB_NAME,POSTGRE_USER_NAME,POSTGRE_USER_PASSWORD,POSTGRE_INSTANCE_TIER,MAX_CONNECTIONS

- name: Install Terraform
@@ -164,7 +163,7 @@ jobs:
- name: Create or Update Secret in DEV
run: |
SECRET_NAME="DEV_FEEDS_DATABASE_URL"
-SECRET_VALUE="postgresql://${{ env.POSTGRE_USER_NAME }}:${{ env.POSTGRE_USER_PASSWORD }}@${{ env.DB_INSTANCE_HOST }}/${{ env.POSTGRE_SQL_DB_NAME }}"
+SECRET_VALUE="postgresql://${{ env.POSTGRE_USER_NAME }}:${{ env.POSTGRE_USER_PASSWORD }}@${{ env.DB_INSTANCE_HOST }}/${{ env.POSTGRE_SQL_DB_NAME }}DEV"
echo $SECRET_VALUE

if gcloud secrets describe $SECRET_NAME --project=mobility-feeds-dev; then
2 changes: 1 addition & 1 deletion functions-python/extract_bb/src/main.py
@@ -41,7 +41,7 @@ def get_gtfs_feed_bounds(url: str, dataset_id: str) -> numpy.ndarray:
feed = gtfs_kit.read_feed(url, "km")
return feed.compute_bounds()
except Exception as e:
-print(f"[{dataset_id}] Error retrieving GTFS feed from {url}: {e}")
+logging.error(f"[{dataset_id}] Error retrieving GTFS feed from {url}: {e}")
raise Exception(e)


3 changes: 2 additions & 1 deletion functions-python/extract_bb/tests/test_extract_bb.py
@@ -82,7 +82,8 @@ def test_get_gtfs_feed_bounds(self, mock_gtfs_kit):
for i in range(4):
self.assertEqual(bounds[i], expected_bounds[i])

-def test_extract_bb_exception(self):
+@patch("extract_bb.src.main.Logger")
+def test_extract_bb_exception(self, _):
file_name = faker.file_name()
resource_name = (
f"{faker.uri_path()}/{faker.pystr()}/{faker.pystr()}/{file_name}"
9 changes: 9 additions & 0 deletions functions-python/validation_report_processor/.coveragerc
@@ -0,0 +1,9 @@
[run]
omit =
    */test*/*
    */helpers/*
    */database_gen/*

[report]
exclude_lines =
    if __name__ == .__main__.:
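
coverage.py treats each `exclude_lines` entry as a regular expression, so the `.` wildcards in the pattern above let a single entry cover both quote styles. A minimal sketch of that matching behaviour, using Python's `re` module directly rather than coverage.py itself (the `matches_exclusion` helper is hypothetical and only for illustration):

```python
import re

# Pattern copied from the [report] section; "." matches any single character,
# so it stands in for either a single or a double quote.
EXCLUDE_PATTERN = r"if __name__ == .__main__.:"


def matches_exclusion(line: str) -> bool:
    """Return True if the line contains a match for the exclusion pattern."""
    return re.search(EXCLUDE_PATTERN, line) is not None


if __name__ == "__main__":
    print(matches_exclusion('if __name__ == "__main__":'))
    print(matches_exclusion("if __name__ == '__main__':"))
    print(matches_exclusion("if name == main:"))
```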
2 changes: 2 additions & 0 deletions functions-python/validation_report_processor/.env.rename_me
@@ -0,0 +1,2 @@
# Environment variables for the validation report information extraction to run locally
export FEEDS_DATABASE_URL=${{FEEDS_DATABASE_URL}}
25 changes: 25 additions & 0 deletions functions-python/validation_report_processor/README.md
@@ -0,0 +1,25 @@
# Validation Report Processing
This directory contains the Google Cloud Platform function that processes GTFS dataset validation reports and creates or updates database entities based on their contents. The function is triggered via an HTTP request, parses the report data, and stores the validation results.

## Function Workflow
1. **HTTP Request Trigger**: The function is invoked through an HTTP request that includes identifiers for a dataset and feed.
2. **Report Validation**: Validates the JSON format and content of the report fetched from a predefined URL.
3. **Entity Creation**: Based on the contents of the validation report, the function creates several entities including validation reports, features, and notices associated with the dataset.
4. **Database Update**: Adds new entries to the database or updates existing ones based on the validation report.

## Function Configuration
The function depends on several environment variables:
- `FILES_ENDPOINT`: The endpoint URL where report files are located.
- `FEEDS_DATABASE_URL`: The database URL for connecting to the database containing GTFS datasets and related entities.

## Local Development
Follow standard practices for local development of GCP serverless functions. Refer to the main [README.md](../README.md) for general setup instructions for the development environment.

### Testing
For testing, simulate HTTP requests using tools like Postman or curl. Make sure to include both `dataset_id` and `feed_id` in the JSON payload:
```json
{
"dataset_id": "example_dataset_id",
"feed_id": "example_feed_id"
}
```
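
A payload can also be checked locally before it is sent. The sketch below mirrors the presence check that `process_validation_report` applies to the request body; `is_valid_payload` is a hypothetical helper for illustration and is not part of this PR.

```python
import json


def is_valid_payload(payload) -> bool:
    """Mirror the check in process_validation_report: both keys must be present."""
    return bool(payload) and "dataset_id" in payload and "feed_id" in payload


# Round-trip through JSON, as the payload would travel in an HTTP request body.
body = json.dumps({"dataset_id": "example_dataset_id", "feed_id": "example_feed_id"})
print(is_valid_payload(json.loads(body)))
print(is_valid_payload({"dataset_id": "example_dataset_id"}))
```

With a locally running instance, such a payload could then be posted with `requests.post(url, json=payload)`; the URL depends on how the function is started and is not specified here.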
19 changes: 19 additions & 0 deletions functions-python/validation_report_processor/function_config.json
@@ -0,0 +1,19 @@
{
  "name": "process-validation-report",
  "description": "Processes the GTFS validation report to update the database",
  "entry_point": "process_validation_report",
  "timeout": 540,
  "memory": "2Gi",
  "trigger_http": true,
  "include_folders": ["database_gen", "helpers"],
  "secret_environment_variables": [
    {
      "key": "FEEDS_DATABASE_URL"
    }
  ],
  "ingress_settings": "ALLOW_INTERNAL_AND_GCLB",
  "max_instance_request_concurrency": 1,
  "max_instance_count": 5,
  "min_instance_count": 0,
  "available_cpu": 1
}
13 changes: 13 additions & 0 deletions functions-python/validation_report_processor/requirements.txt
@@ -0,0 +1,13 @@
functions-framework==3.*
google-cloud-logging
psycopg2-binary==2.9.6
aiohttp~=3.8.6
asyncio~=3.4.3
urllib3~=2.1.0
SQLAlchemy==2.0.23
geoalchemy2==0.14.7
requests~=2.31.0
cloudevents~=1.10.1
attrs~=23.1.0
pluggy~=1.3.0
certifi~=2023.7.22
@@ -0,0 +1,2 @@
Faker
pytest~=7.4.3
Empty file.
266 changes: 266 additions & 0 deletions functions-python/validation_report_processor/src/main.py
@@ -0,0 +1,266 @@
#
# MobilityData 2024
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

import os
import logging
from datetime import datetime
import requests
import functions_framework
from database_gen.sqlacodegen_models import (
    Validationreport,
    Feature,
    Notice,
    Gtfsdataset,
)
from helpers.database import start_db_session, close_db_session
from helpers.logger import Logger

logging.basicConfig(level=logging.INFO)

FILES_ENDPOINT = os.getenv("FILES_ENDPOINT")


def read_json_report(json_report_url):
    """
    Fetches and returns the JSON content from a given URL.

    :param json_report_url: URL to the JSON report
    :return: Tuple containing the dict representation of the JSON report and the HTTP status code
    """
    response = requests.get(json_report_url)
    return response.json(), response.status_code


def get_feature(feature_name, session):
    """
    Retrieves a Feature object by its name from the database.
    If the feature does not exist, it creates a new one.

    :param feature_name: Name of the feature
    :param session: Database session instance
    :return: Feature instance
    """
    feature = session.query(Feature).filter(Feature.name == feature_name).first()
    if not feature:
        feature = Feature(name=feature_name)
    return feature


def get_dataset(dataset_stable_id, session):
    """
    Retrieves a GTFSDataset object by its stable ID from the database.

    :param dataset_stable_id: Stable ID of the dataset
    :param session: Database session instance
    :return: GTFSDataset instance or None if not found
    """
    return (
        session.query(Gtfsdataset)
        .filter(Gtfsdataset.stable_id == dataset_stable_id)
        .one_or_none()
    )


def validate_json_report(json_report_url):
    """
    Validates the JSON report by fetching and reading it.
    :param json_report_url: The URL of the JSON report
    :return: Tuple containing the JSON report or an error message and the status code
    """
    try:
        json_report, code = read_json_report(json_report_url)
        if code != 200:
            logging.error(f"Error reading JSON report: {code}")
            return f"Error reading JSON report at url {json_report_url}.", code
        return json_report, 200
    except Exception as error:  # JSONDecodeError or RequestException
        logging.error(f"Error reading JSON report: {error}")
        return f"Error reading JSON report at url {json_report_url}: {error}", 500


def parse_json_report(json_report):
    """
    Parses the JSON report and extracts the validatedAt and validatorVersion fields.
    :param json_report: The JSON report
    :return: A tuple containing the validatedAt datetime and the validatorVersion
    """
    try:
        dt = json_report["summary"]["validatedAt"]
        validated_at = datetime.fromisoformat(dt.replace("Z", "+00:00"))
        version = json_report["summary"]["validatorVersion"]
        logging.info(
            f"Validation report validated at {validated_at} with version {version}."
        )
        return validated_at, version
    except Exception as error:
        logging.error(f"Error parsing JSON report: {error}")
        raise Exception(f"Error parsing JSON report: {error}")


def generate_report_entities(
    version, validated_at, json_report, dataset_stable_id, session, feed_stable_id
):
    """
    Creates validation report entities based on the JSON report.
    :param version: The version of the validator
    :param validated_at: The datetime the report was validated
    :param json_report: The JSON report object
    :param dataset_stable_id: Stable ID of the dataset
    :param session: The database session
    :param feed_stable_id: Stable ID of the feed
    :return: List of entities created
    """
    entities = []
    report_id = f"{dataset_stable_id}_{version}"
    logging.info(f"Creating validation report entities for {report_id}.")

    html_report_url = (
        f"{FILES_ENDPOINT}/{feed_stable_id}/{dataset_stable_id}/report.html"
    )
    json_report_url = (
        f"{FILES_ENDPOINT}/{feed_stable_id}/{dataset_stable_id}/report.json"
    )
    if get_validation_report(report_id, session):  # Check if report already exists
        logging.warning(f"Validation report {report_id} already exists. Terminating.")
        raise Exception(f"Validation report {report_id} already exists.")

    validation_report_entity = Validationreport(
        id=report_id,
        validator_version=version,
        validated_at=validated_at,
        html_report=html_report_url,
        json_report=json_report_url,
    )
    entities.append(validation_report_entity)

    dataset = get_dataset(dataset_stable_id, session)
    for feature_name in json_report["summary"]["gtfsFeatures"]:
        feature = get_feature(feature_name, session)
        feature.validations.append(validation_report_entity)
        entities.append(feature)
    for notice in json_report["notices"]:
        notice_entity = Notice(
            dataset_id=dataset.id,
            validation_report_id=report_id,
            notice_code=notice["code"],
            severity=notice["severity"],
            total_notices=notice["totalNotices"],
        )
        entities.append(notice_entity)
    return entities


def create_validation_report_entities(feed_stable_id, dataset_stable_id):
    """
    Creates and stores entities based on a validation report.
    This includes the validation report itself, related feature entities,
    and any notices found within the report.

    :param feed_stable_id: Stable ID of the feed
    :param dataset_stable_id: Stable ID of the dataset
    :return: Tuple containing a result or error message and the HTTP status code
    """
    json_report_url = (
        f"{FILES_ENDPOINT}/{feed_stable_id}/{dataset_stable_id}/report.json"
    )
    logging.info(f"Accessing JSON report at {json_report_url}.")
    json_report, code = validate_json_report(json_report_url)
    if code != 200:
        return json_report, code

    try:
        validated_at, version = parse_json_report(json_report)
    except Exception as error:
        return str(error), 500

    session = None
    try:
        session = start_db_session(os.getenv("FEEDS_DATABASE_URL"))
        logging.info("Database session started.")

        # Generate the database entities required for the report
        try:
            entities = generate_report_entities(
                version,
                validated_at,
                json_report,
                dataset_stable_id,
                session,
                feed_stable_id,
            )
        except Exception as error:
            # Intended for "report already exists"; note that any other error
            # raised while generating entities is also returned with status 200.
            return str(error), 200

        # Commit the entities to the database
        for entity in entities:
            session.add(entity)
        logging.info(f"Committing {len(entities)} entities to the database.")
        session.commit()

        logging.info("Entities committed successfully.")
        return f"Created {len(entities)} entities.", 200
    except Exception as error:
        logging.error(f"Error creating validation report entities: {error}")
        if session:
            session.rollback()
        return f"Error creating validation report entities: {error}", 500
    finally:
        close_db_session(session)
        logging.info("Database session closed.")


def get_validation_report(report_id, session):
    """
    Retrieves a ValidationReport object by its ID from the database.
    :param report_id: The ID of the report
    :param session: The database session
    :return: ValidationReport instance or None if not found
    """
    return (
        session.query(Validationreport).filter(Validationreport.id == report_id).first()
    )


@functions_framework.http
def process_validation_report(request):
    """
    Processes a validation report by creating necessary entities in the database.
    It expects a JSON request body with 'dataset_id' and 'feed_id'.

    :param request: Request object containing 'dataset_id' and 'feed_id'
    :return: HTTP response indicating the result of the operation
    """
    Logger.init_logger()
    request_json = request.get_json(silent=True)
    logging.info(
        f"Processing validation report function called with request: {request_json}"
    )
    if (
        not request_json
        or "dataset_id" not in request_json
        or "feed_id" not in request_json
    ):
        return (
            f"Invalid request body: {request_json}. We expect 'dataset_id' and 'feed_id' to be present.",
            400,
        )

    dataset_id = request_json["dataset_id"]
    feed_id = request_json["feed_id"]
    logging.info(
        f"Processing validation report for dataset {dataset_id} in feed {feed_id}."
    )
    return create_validation_report_entities(feed_id, dataset_id)
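
To make the data flow in `main.py` concrete, here is a small standalone sketch of how the report ID, the report URL, and the timestamp are derived, following `parse_json_report` and `generate_report_entities`. The sample report, the IDs, and the `FILES_ENDPOINT` value are made up for illustration; no database or network access is involved.

```python
from datetime import datetime, timezone

# Hypothetical values; in the function these come from the environment and the request.
FILES_ENDPOINT = "https://files.example.com"
feed_stable_id = "example_feed_id"
dataset_stable_id = "example_dataset_id"

# Hypothetical report containing only the fields the function reads.
json_report = {
    "summary": {
        "validatedAt": "2024-04-23T12:00:00Z",
        "validatorVersion": "5.0.0",
        "gtfsFeatures": ["Shapes"],
    },
    "notices": [{"code": "some_notice_code", "severity": "WARNING", "totalNotices": 3}],
}

# datetime.fromisoformat does not accept a trailing "Z" before Python 3.11,
# so it is replaced with an explicit UTC offset, as in parse_json_report.
validated_at = datetime.fromisoformat(
    json_report["summary"]["validatedAt"].replace("Z", "+00:00")
)
version = json_report["summary"]["validatorVersion"]

# The report ID combines the dataset stable ID and the validator version.
report_id = f"{dataset_stable_id}_{version}"
json_report_url = f"{FILES_ENDPOINT}/{feed_stable_id}/{dataset_stable_id}/report.json"

print(report_id)
print(json_report_url)
print(validated_at == datetime(2024, 4, 23, 12, 0, tzinfo=timezone.utc))
```

Because the report ID is built from the dataset stable ID and the validator version, calling the function again for the same dataset and validator version reaches the "already exists" branch instead of creating new entities.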