Water Use (#195)
* Update IngestAPI with changes to accommodate usage (#91)

* Improve logging configuration

* Update IngestAPI to accept and store allocation messages with usage

* Message format tweaks

* Make Metered Allocations nullable in API

* Add sourceId to ingested water allocation

* Update water allocation consumer (#92)

* Improve logging configuration

* Update IngestAPI to accept and store allocation messages with usage

* Message format tweaks

* Make Metered Allocations nullable in API

* Add sourceId to ingested water allocation

* Remove unused code

* Updated WaterAllocationMessage and Consumer to handle allocations by consent (rather than area)

* Fix typo

* Ensure Cache manifest for new PlanLimit queries is periodically updated

* Fix failing tests by truncating datetime before comparison to avoid precision differences.

* Update GeoJsonControllerIntegrationTest to reflect new plan limits controller, and remove redundant test

* Changes from PR review

* Ensure only Rainfall measurements are used in the Rainfall queries (#96)

* Manager - Kafka Consumer for Observations (#97)

* Manager - Hilltop Crawler Changes

* Shuffle files around

* martin/aggregate water use (#95)

* add water use daily aggregation sql view

* optimise sql query performance

* Adding Error handling to Observations Consumer

---------

Co-authored-by: Martin Peak <martin.peak@gw.govt.nz>

* Hilltop Crawler (#98)

* Manager - Hilltop Crawler Changes

* Shuffle files around

* Crawler - Updates

* Integration testing

* Improve DB naming

* Big refactor of test setup

* Correct time to wait between quiet polls

* Add global error handling around task

* Add check to make sure Virtual Measurement matches Measurement Name

* Add a bunch to the readme

* Add views to aggregate daily allocations and usage by area (#100)

* Add views to aggregate consent allocations and usage by day

* Start writing tests

* Allow easier setup and assertion against test data, start adding more tests

* Add tests for various allocation scenarios

* Add failing test for aggregating data across different areas

* Simplify views

* Splitting out the calculation of daily usage by area

* Add more assertions for aggregation of different areas

* water_allocation.meter should correlate with observation_site.name

* Format code

* Feedback from PR review

---------

Co-authored-by: Steve Mosley <steve@starter4ten.com>

* Remove unnecessary `yield`

* Adding index on next_fetch_at column

* Add some more readme details about the task queue

* Adding notes about partition sizes

* Improve note about how scheduling tasks works

---------

Co-authored-by: Vimal Jobanputra <vim@noodle.io>

* Fixup the path being passed to archive build reports (#99)

* Adding in sample data scripts and way to run from gradle (#101)

* Adding in sample data scripts and way to run from gradle

* Update so source id is unique per row

* Update to insert observation data for 50 sites

* Adding in a simple script to add time to the observed data

* Tweak how loading is done to avoid the function

* Update the refresh to be relative to current date

* Adding in transactions for the scripts

* Bump msw from 1.2.1 to 1.3.2 in /packages/PlanLimitsUI (#108)

Bumps [msw](https://github.com/mswjs/msw) from 1.2.1 to 1.3.2.
- [Release notes](https://github.com/mswjs/msw/releases)
- [Changelog](https://github.com/mswjs/msw/blob/main/CHANGELOG.md)
- [Commits](mswjs/msw@v1.2.1...v1.3.2)

---
updated-dependencies:
- dependency-name: msw
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump net.logstash.logback:logstash-logback-encoder (#107)

Bumps [net.logstash.logback:logstash-logback-encoder](https://github.com/logfellow/logstash-logback-encoder) from 7.3 to 7.4.
- [Release notes](https://github.com/logfellow/logstash-logback-encoder/releases)
- [Commits](logfellow/logstash-logback-encoder@logstash-logback-encoder-7.3...logstash-logback-encoder-7.4)

---
updated-dependencies:
- dependency-name: net.logstash.logback:logstash-logback-encoder
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Add query and controller endpoint for usage data (#102)

* Weekly overview table (#124)

* Minor code cleanups/comments

* Move non-component files into lib, and move sidebar into a subdir

* Remove redundant files

* Extract Sidebar components

* Wire up basic loading mechanism

* Copy across button and indicator components from Interrain

* Extract hook to load and transform data, start formatting usage table

* Add GroundWater and populate tooltips with the right data

* Take into account WaterTakeFilter when showing usage table

* Clean up some naming

* Ensure WaterAllocationMessage messages match IngestAPI types

* Added fields to water usage API required by the front-end

* Camelise API properties

* Ensure usage dates are sorted

* Better handling of errors loading usage data

* Feedback from code review

* Moving WaterAllocation View to a Materialised View (#125)

* Moving WaterAllocation View to a Materialised View

* Remove spots

* Updates to allow water_allocation_and_usage_by_area to be refreshed

* Fix up permission issue with materialized view

* Turn effective_daily_consents back into a regular migration so we can ensure it is run before water_allocation_and_usage_by_area

* Add materialized_views_role definition to LocalInfrastructure init script

* Ensure materialized view is refreshed before running tests

* Ensure materialized_views_role has the same permissions as eop_manager_app_user

* Ensure view is materialized after generating test data

* Remove old version of DB config

- Has been superseded by LocalInfra folder

* Limit materialized views role to Read only

---------

Co-authored-by: Vim <vim@noodle.io>

* Update plan limit libs (#127)

* Update react-router-dom

* Update Vite and associated packages

* Update typescript and related packages, sort dev and main packages

* Update linting and formatting libs

And remove pretty-quick which is not compatible with prettier 3: prettier/pretty-quick#164

* Update testing libs

* Update autoprefixer, remove unused web-vitals

* Update tailwind, postcss and related libs

* Update data/state libraries

* npm audit fix

* Update react-map-gl and maplibre-gl

* Update MSW

* Moving the materialized view permissions into main migrations (#129)

* Move permissions required for materialized_views to migration

* Add RESET ROLE to script

- Otherwise the connection goes back to the pool with the wrong role,
  and causes pain for later threads.
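
As an illustration of the fix, a minimal sketch assuming Spring's `JdbcTemplate`; the function name is hypothetical, while the role and view names are the ones used in these commits:

```kotlin
import org.springframework.jdbc.core.JdbcTemplate

// Refresh the view under the dedicated role, then always RESET ROLE so the
// pooled connection is handed back clean; otherwise a later borrower of the
// same connection silently runs with the wrong role.
fun refreshUsageView(jdbc: JdbcTemplate) {
  try {
    jdbc.execute("SET ROLE materialized_views_role")
    jdbc.execute("REFRESH MATERIALIZED VIEW water_allocation_and_usage_by_area")
  } finally {
    jdbc.execute("RESET ROLE")
  }
}
```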

* Fix permissions so app can query usage view (#137)

* Config ErrorHandlingDeserializer to DLQ unparsable messages (#134)
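
A minimal sketch of that approach with Spring Kafka, as an illustration only (not the project's actual configuration; topic and type names are assumptions):

```kotlin
import org.apache.kafka.clients.consumer.ConsumerConfig
import org.springframework.context.annotation.Bean
import org.springframework.context.annotation.Configuration
import org.springframework.kafka.core.KafkaTemplate
import org.springframework.kafka.listener.DeadLetterPublishingRecoverer
import org.springframework.kafka.listener.DefaultErrorHandler
import org.springframework.kafka.support.serializer.ErrorHandlingDeserializer
import org.springframework.kafka.support.serializer.JsonDeserializer

@Configuration
class KafkaErrorHandlingConfig {

  // Wrap the real deserializer so a poison message surfaces as a handled
  // deserialization error instead of crashing the consumer loop.
  fun consumerOverrides(): Map<String, Any> =
      mapOf(
          ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG to ErrorHandlingDeserializer::class.java,
          ErrorHandlingDeserializer.VALUE_DESERIALIZER_CLASS to JsonDeserializer::class.java.name)

  // Records that still fail after retries are published to "<topic>.DLT" by default.
  @Bean
  fun errorHandler(template: KafkaTemplate<Any, Any>) =
      DefaultErrorHandler(DeadLetterPublishingRecoverer(template))
}
```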

* Re-apply changes from PR #130 (#138)

* Changes for Water Use View (#139)

* Changes for Water Use View

- Move data to NZST timezone when aggregating daily
- Fix issue with Water Meter Volume: previously we were treating it as an l/s value when it is actually the raw
  m^3 used since the last data point, which makes the aggregation a simple sum (see the sketch below)
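
A minimal sketch of that aggregation rule, assuming readings arrive as timestamped m^3 values (names and types here are illustrative):

```kotlin
import java.time.Instant
import java.time.LocalDate
import java.time.ZoneOffset

// NZST as a fixed UTC+12 offset (deliberately ignoring daylight saving).
val NZST: ZoneOffset = ZoneOffset.ofHours(12)

// Each Water Meter Volume reading is the raw m^3 used since the previous data
// point, so daily usage is just the sum of the readings in each NZST day.
fun dailyUsage(readings: List<Pair<Instant, Double>>): Map<LocalDate, Double> =
    readings
        .groupBy({ it.first.atOffset(NZST).toLocalDate() }, { it.second })
        .mapValues { (_, volumes) -> volumes.sum() }
```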

* Update test with new data

* Weekly usage graph (#148)

* WIP: Detailed usage page with initial weekly usage

* WIP: Group heatmap into regions and order

* WIP: Tidy up presentation and data

* Use full path in link to detailed usage page to fix AWS Amplify weirdness

* Display tweaks

* Add a link back to the limits viewer

* Error handling and code cleanup

* Fix bug formatting date in tooltip

* Make UsageDisplayGroups part of council data

* Crawler fix start of month (#149)

* Tweak to limit amount of history kept.

* Extend measurement list so it will be one month ahead if there are recent
measurements

* Add new fetch task that is constantly refreshing the latest data

* Refactor to make comparison more explicit

* Update README for recent changes

* Use correct allocation measure (#151)

* Bump @typescript-eslint/parser in /packages/PlanLimitsUI (#140)

Bumps [@typescript-eslint/parser](https://github.com/typescript-eslint/typescript-eslint/tree/HEAD/packages/parser) from 6.9.0 to 6.10.0.
- [Release notes](https://github.com/typescript-eslint/typescript-eslint/releases)
- [Changelog](https://github.com/typescript-eslint/typescript-eslint/blob/main/packages/parser/CHANGELOG.md)
- [Commits](https://github.com/typescript-eslint/typescript-eslint/commits/v6.10.0/packages/parser)

---
updated-dependencies:
- dependency-name: "@typescript-eslint/parser"
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump maplibre-gl from 3.5.1 to 3.5.2 in /packages/PlanLimitsUI (#132)

Bumps [maplibre-gl](https://github.com/maplibre/maplibre-gl-js) from 3.5.1 to 3.5.2.
- [Release notes](https://github.com/maplibre/maplibre-gl-js/releases)
- [Changelog](https://github.com/maplibre/maplibre-gl-js/blob/main/CHANGELOG.md)
- [Commits](maplibre/maplibre-gl-js@v3.5.1...v3.5.2)

---
updated-dependencies:
- dependency-name: maplibre-gl
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Daily Usage Graph (#158)

* Extract results to separate components

* Display daily usage

* Use the same color scheme for daily usage as weekly usage

* Format data in tooltip

* Add a key to the daily usage chart

* Add legend to weekly usage

* Tweaks based on feedback

* Fix rounding issue

* Test alternate visualisation type

* Add a key to the heatmapped daily data

* Code tidy-ups

* Remove TimeRange graph and tidy up code

* Remove unused renderCell Override

* Ensure top axis is not generated for weekly graphs other than first

* Copy tweaks

* Water meter reading view bugfix (#159)

* fix for bug in daily usage calc

* remove date filter.

no longer needed as filtering occurs on a downstream view

* Handle missing data in the UI (#165)

* Handle missing data for Usage table

* Handle missing data in Detailed Daily usage graph

* Display tweaks

* Handle missing and show more detailed info for weekly usage

* Formatting tweaks

* Only show extended tooltip data when the d key is held down

* Remove console.log

* Formatting tweaks

* More explicit handling of 0 vs NULL values when displaying usage

* Always show daily usage table in weekly tooltip

* Show CMU if CMSU is not present

* Handle missing data in view (#173)

* Update field names to match updated view

* Don't default to 0's where NULLs are present in Usage and Allocation data

* Update view with showing allocation vs usage

* Update view tests

---------

Co-authored-by: Vim <vim@noodle.io>

* Add overview docs (#174)

* Add some docs

* Docs tweaks

* Remove log error message about rack assigner

* Adding date check back in because of how it affects materializing the view (#161)

* Lower number of consumers for Kafka to hopefully stabilise things

* Update Jackson max message size

* Crawler handle large messages (#225)

* Skip large messages from Hilltop

- Some messages will cause issues later in processing because of their
  size

* Fixup off by one day error in test

* Update manager so observation updated_at is only changed if the value
changes

- This is better information, but also should improve loading
  performance

* TEAMU-876 - Show what percentage of allocation is measured (#232)

* Update Ingest API for new message format

* Update manager with new fields

* Update Plan Limits with new data

* Adding in Measured vs Total Allocation for usage

* https://gwrc.atlassian.net/browse/TEAMU-929: Daily Aggregated Observations, multiple values for same day (#235)

* Update Ingest API for new message format

* Update manager with new fields

* Update Plan Limits with new data

* Adding in Measured vs Total Allocation for usage

* added the .sql file which removes the daily duplicates for each site

* fix the where clause position

* drop the view and create it as there are structural changes

* drop the view and create it as there are structural changes

* drop the cascaded view and recreate them

* removed the drop command

* removed the drop command

* changes to ALTER view

* changes to ALTER view

* added drop and removed alter

* added both view scripts in one migration file

* added both view scripts in one migration file

* added both view scripts in one migration file

* added both views in the same migration file and corrected the SQL query

* keeping the naming conventions the same

* Adjusting roles to allow removing the materialised view

---------

Co-authored-by: Steve Mosley <steve@starter4ten.com>

* TEAMU-1016-Fix-leap-year-issue (#243)

* TEAMU-1016 Add FE and BE unit tests

* TEAMU-1016 Add FE and BE unit tests

* TEAMU-1016 Improve BE exception handling

* TEAMU-1016 Tweak start scripts

* TEAMU-1016 Check error message in test

* TEAMU-1016 Lint fixes

* TEAMU-1016 Spelling error

* TEAMU-1016 Linting and batect run

* TEAMU-1016 Resolve merge conflict in readme

---------

Co-authored-by: wtcarter-gw <wayne.carter@gw.govt.nz>

* TEAMU-940 Add new migration 41 replacing MAX and MIN with last and first (#248)

* TEAMU-940 Add new migration 41

* TEAMU-940 Correct migration 41 as per failing test

* TEAMU-940 Apply review feedback and add some tests

* TEAMU-940 Augment tests

* TEAMU-940 Spotless linting

---------

Co-authored-by: wtcarter-gw <wayne.carter@gw.govt.nz>

* fix to exclude all combined_meters (#249)

* fix to exclude all combined_meters

* Remove the extra space

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Vimal Jobanputra <vim@noodle.io>
Co-authored-by: Martin Peak <martin.peak@gw.govt.nz>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Swapna Josmi Sam <56238575+swapnasam@users.noreply.github.com>
Co-authored-by: Wayne Carter <wayne@ednastreet.nz>
Co-authored-by: wtcarter-gw <wayne.carter@gw.govt.nz>
7 people committed Mar 25, 2024
1 parent 94a4f3b commit 9b8a459
Showing 159 changed files with 15,729 additions and 8,017 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/hilltop-crawler.yml
@@ -51,7 +51,7 @@ jobs:
if: always()
with:
name: reports
- path: format('{0}/build/reports', env.WORKING_DIR)
+ path: ${{ env.WORKING_DIR }}/build/reports

- name: Configure AWS credentials
if: ${{ github.actor != 'dependabot[bot]' }}
2 changes: 1 addition & 1 deletion .github/workflows/ingest-api.yml
@@ -51,7 +51,7 @@ jobs:
if: always()
with:
name: reports
- path: format('{0}/build/reports', env.WORKING_DIR)
+ path: ${{ env.WORKING_DIR }}/build/reports

- name: Configure AWS credentials
if: ${{ github.actor != 'dependabot[bot]' }}
2 changes: 1 addition & 1 deletion .github/workflows/manager.yml
@@ -51,7 +51,7 @@ jobs:
if: always()
with:
name: reports
- path: format('{0}/build/reports', env.WORKING_DIR)
+ path: ${{ env.WORKING_DIR }}/build/reports

- name: Configure AWS credentials
if: ${{ github.actor != 'dependabot[bot]' }}
13 changes: 13 additions & 0 deletions .run/Plan Limits UI.run.xml
@@ -0,0 +1,13 @@
<component name="ProjectRunConfigurationManager">
<configuration default="false" name="Plan Limits UI" type="js.build_tools.npm">
<package-json value="$PROJECT_DIR$/packages/PlanLimitsUI/package.json" />
<command value="run" />
<scripts>
<script value="dev" />
</scripts>
<arguments value="-- --open" />
<node-interpreter value="project" />
<envs />
<method v="2" />
</configuration>
</component>
10 changes: 8 additions & 2 deletions README.md
@@ -35,7 +35,7 @@ Include diagrams in the site:

### How to Run the application locally

- To run Ha Kākano locally, you will need to start the shared infrastructure services and then start the individual applications.
+ To run He Kākano locally, you will need to start the shared infrastructure services and then start the individual applications.

#### Shared Infrastructure

@@ -59,4 +59,10 @@ _And from a new shell session in the same folder_

#### Application Packages

- Having started the shared infrastructure, you will find specific run instructions for each application package in its own README.md file. For example, [here are the run instructions for the Plan Limits UI](packages/PlanLimitsUI/README.md).
+ Having started the shared infrastructure, you will find specific run instructions for each application package in its own README.md file. For example, [here are the run instructions for the Plan Limits UI](./packages/PlanLimitsUI/README.md).

## `start.sh` convenience script

To simplify running EOP locally, a `start.sh` script has been provided in the root of the repository. This script will start the shared infrastructure services and then the application component that you name as an argument.

For example, to start the Plan Limits UI application, you would run `./start.sh PlanLimitsUI`.
108 changes: 108 additions & 0 deletions packages/HilltopCrawler/README.md
@@ -0,0 +1,108 @@
# Hilltop Crawler

## Description

This app extracts observation data from Hilltop servers and publishes it to a Kafka `observations` topic for other
applications to consume. Which Hilltop servers and which measurement types to extract are configured in a database
table, which can be added to while the system is running.

The app intends to keep all versions of data pulled from Hilltop, allowing downstream systems to keep track of when
data has been updated in Hilltop. Note that because Hilltop does not expose when changes were made, we can only
capture when the crawler first saw the data change. This is also intended to allow us to update some parts of the
system without having to re-read all of the data from Hilltop.

## Process

Overall, this is built around the Hilltop API, which exposes three levels of GET API request:

* SITES_LIST — List of Sites, includes names and locations
* MEASUREMENTS_LIST — Per site, list of measurements available for that site, including details about the first and last
observation date
* MEASUREMENT_DATA / MEASUREMENT_DATA_LATEST — Per site, per measurement, the timeseries observed data either historical or most recent

The app crawls these by reading each level to decide what to read from the next level, i.e., the SITES_LIST tells the
app which sites to call MEASUREMENTS_LIST for, which in turn tells the app which measurements to call
MEASUREMENT_DATA for. Requests for MEASUREMENT_DATA are split into monthly chunks to avoid issues with too much data
being returned in one request. If a site appears to be reporting data frequently, then a MEASUREMENT_DATA_LATEST task
will be queued up to get data for the most recent 35 days.
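
For illustration, a sketch of the three request levels as URLs. The helper names are hypothetical (not the crawler's actual functions) and assume the common Hilltop query-parameter conventions:

```kotlin
// Level 1: all sites exposed by an .hts endpoint.
fun siteListUrl(htsUrl: String) = "$htsUrl?Service=Hilltop&Request=SiteList"

// Level 2: the measurements available at one site.
fun measurementsListUrl(htsUrl: String, site: String) =
    "$htsUrl?Service=Hilltop&Request=MeasurementList&Site=$site"

// Level 3: the observed timeseries, fetched in monthly chunks.
fun measurementDataUrl(htsUrl: String, site: String, measurement: String, from: String, to: String) =
    "$htsUrl?Service=Hilltop&Request=GetData&Site=$site&Measurement=$measurement&From=$from&To=$to"
```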

The app keeps a list of API requests that it will keep up to date by calling each API on a schedule. This list is
stored in the `hilltop_fetch_tasks` table and works like a task queue. Each time a request is made, the result is used
to determine when next to schedule the task. For example, for MEASUREMENT_DATA_LATEST, if the last observation was
recent, then a refresh should be attempted soon; if it was a long way in the past, it should be refreshed less often.

The next schedule time is implemented as a random time within an interval, to add some jitter between task requeue
times and hopefully spread out both the tasks and the load on the servers we are calling.
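
A minimal sketch of the jittered requeue calculation (the "between 1.0x and 1.5x of the base interval" rule here is illustrative, not the app's exact algorithm):

```kotlin
import java.time.Duration
import java.time.Instant
import kotlin.random.Random

// Requeue somewhere between 1.0x and 1.5x of the base interval so tasks that
// were created together drift apart instead of re-polling a server in lockstep.
fun nextFetchAt(now: Instant, base: Duration): Instant =
    now.plus(base).plusMillis(Random.nextLong(0, base.toMillis() / 2))
```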

The task queue also keeps metadata about the previous history of tasks; this is not used by the app, but allows
engineers to monitor how the system is working.

This process is built around three main components:

* Every hour, monitor the configuration table
  * Read the configuration table
  * From the configuration, add any new tasks to the task queue

* Continuously monitor the task queue
  * Read the next task to work on
  * Fetch data from Hilltop for that task
  * If the result is valid and not the same as the previous version
    * Queue any new tasks from the result
    * Send the result to the Kafka `hilltop_raw` topic (note the Kafka topic does not distinguish between
      MEASUREMENT_DATA / MEASUREMENT_DATA_LATEST; it calls them both MEASUREMENT_DATA)
  * Requeue the task for some time in the future, based on the type of request

* Kafka streams component
  * Monitor the stream
  * For each message, map it to either a `site_details` or an `observations` message

Currently, the Manager component listens to the `observations` topic and stores the data from that in a DB table.

## Task Queue

The task queue is currently a custom implementation on top of a single table `hilltop_fetch_tasks`.

This is a slightly specialized queue where:

* Each task has a scheduled run time, backed by the `next_fetch_at` column
* Each time a task runs, it is re-queued for some time in the future
* The same task can be added multiple times; the queue relies on Postgres `ON CONFLICT DO NOTHING` to avoid storing
  duplicates

The implementation relies on the Postgres `SKIP LOCKED` feature to allow multiple worker threads to pull from the queue
at the same time without getting the same task.

See this [reference](https://www.2ndquadrant.com/en/blog/what-is-select-skip-locked-for-in-postgresql-9-5/) for
discussion about the `SKIP LOCKED` query.
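
A minimal sketch of the enqueue/dequeue pair, assuming Spring's `JdbcTemplate`; the column names follow the description above, but the exact schema and function names are illustrative:

```kotlin
import org.springframework.jdbc.core.JdbcTemplate

// Idempotent enqueue: a unique constraint on the task plus ON CONFLICT DO
// NOTHING means the same task can be offered many times but is stored once.
fun enqueueTask(jdbc: JdbcTemplate, sourceId: Int, type: String, url: String) {
  jdbc.update(
      """
      INSERT INTO hilltop_fetch_tasks (source_id, request_type, url, next_fetch_at)
      VALUES (?, ?, ?, now())
      ON CONFLICT DO NOTHING
      """,
      sourceId, type, url)
}

// Dequeue: run inside a transaction so the row stays locked while it is worked
// on; SKIP LOCKED makes other workers skip the row instead of blocking on it.
fun nextTaskId(jdbc: JdbcTemplate): Int? =
    jdbc.query(
            """
            SELECT id FROM hilltop_fetch_tasks
            WHERE next_fetch_at <= now()
            ORDER BY next_fetch_at
            LIMIT 1
            FOR UPDATE SKIP LOCKED
            """) { rs, _ -> rs.getInt("id") }
        .firstOrNull()
```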

The queue implementation is fairly simple for this specific use. If it becomes more of a generic work queue, then
moving to a standard implementation such as Quartz might be worthwhile.

## Example Configuration

These are a couple of insert statements that are not stored in migration scripts, so they are not applied on developer
machines by default.

GW Water Use

```sql
INSERT INTO hilltop_sources (council_id, hts_url, configuration)
VALUES (9, 'https://hilltop.gw.govt.nz/WaterUse.hts',
'{ "measurementNames": ["Water Meter Volume", "Water Meter Reading"] }');
```

GW Rainfall

```sql
INSERT INTO hilltop_sources (council_id, hts_url, configuration)
VALUES (9, 'https://hilltop.gw.govt.nz/merged.hts', '{ "measurementNames": ["Rainfall"] }');
```

### TODO / Improvements

* The algorithm for determining the next time to schedule an API refresh could be improved; something could be built
  from the previous history based on how often data is unchanged.
* To avoid hammering Hilltop there is rate limiting using a "token bucket" library; currently this uses one bucket for
  all requests. It could be split to use one bucket per server.
* Each time the latest measurement API is called, we will receive data that has already been seen and processed
  previously and store it again in the `hilltop_raw` topic. This seems wasteful, and there are options for cleaning up
  when a record just supersedes the previous one. But cleaning up could mean losing some knowledge about when an
  observation was first seen.
19 changes: 4 additions & 15 deletions packages/HilltopCrawler/build.gradle.kts
@@ -41,10 +41,14 @@ dependencies {
implementation("org.flywaydb:flyway-core:10.6.0")
implementation("org.flywaydb:flyway-database-postgresql:10.6.0")
implementation("io.github.microutils:kotlin-logging-jvm:3.0.5")
implementation("org.apache.kafka:kafka-streams")
implementation("com.bucket4j:bucket4j-core:8.3.0")

testImplementation("org.springframework.boot:spring-boot-starter-test")
testImplementation("org.springframework.kafka:spring-kafka-test")
testImplementation("io.kotest:kotest-assertions-core:5.8.0")
testImplementation("io.kotest:kotest-assertions-json:5.8.0")
testImplementation("org.mockito.kotlin:mockito-kotlin:5.1.0")
}

// Don't repackage build in a "-plain" Jar
@@ -70,21 +74,6 @@ configure<com.diffplug.gradle.spotless.SpotlessExtension> {
kotlinGradle { ktfmt() }
}

val dbConfig =
mapOf(
"url" to
"jdbc:postgresql://${System.getenv("CONFIG_DATABASE_HOST") ?: "localhost"}:5432/eop_test",
"user" to "postgres",
"password" to "password")

flyway {
url = dbConfig["url"]
user = dbConfig["user"]
password = dbConfig["password"]
schemas = arrayOf("hilltop_crawler")
locations = arrayOf("filesystem:./src/main/resources/db/migration")
}

testlogger {
showStandardStreams = true
showPassedStandardStreams = false
@@ -1,33 +1,21 @@
package nz.govt.eop.hilltop_crawler

import java.security.MessageDigest
import com.fasterxml.jackson.core.StreamReadConstraints
import org.springframework.boot.autoconfigure.SpringBootApplication
import org.springframework.boot.autoconfigure.jackson.Jackson2ObjectMapperBuilderCustomizer
import org.springframework.boot.context.properties.EnableConfigurationProperties
import org.springframework.boot.runApplication
import org.springframework.context.annotation.Bean
import org.springframework.http.converter.json.Jackson2ObjectMapperBuilder
import org.springframework.kafka.annotation.EnableKafka
import org.springframework.scheduling.annotation.EnableScheduling

@SpringBootApplication
@EnableScheduling
@EnableKafka
@EnableConfigurationProperties(ApplicationConfiguration::class)
class Application {

@Bean
fun jsonCustomizer(): Jackson2ObjectMapperBuilderCustomizer {
return Jackson2ObjectMapperBuilderCustomizer { _: Jackson2ObjectMapperBuilder -> }
}
}
class Application {}

fun main(args: Array<String>) {
System.setProperty("com.sun.security.enableAIAcaIssuers", "true")

StreamReadConstraints.overrideDefaultStreamReadConstraints(
StreamReadConstraints.builder().maxStringLength(50_000_000).build())

runApplication<Application>(*args)
}

fun hashMessage(message: String) =
MessageDigest.getInstance("SHA-256").digest(message.toByteArray()).joinToString("") {
"%02x".format(it)
}
@@ -0,0 +1,34 @@
package nz.govt.eop.hilltop_crawler

import java.util.concurrent.TimeUnit
import nz.govt.eop.hilltop_crawler.api.requests.buildSiteListUrl
import nz.govt.eop.hilltop_crawler.db.DB
import nz.govt.eop.hilltop_crawler.db.HilltopFetchTaskType
import org.springframework.context.annotation.Profile
import org.springframework.scheduling.annotation.Scheduled
import org.springframework.stereotype.Component

/**
* This task is responsible for triggering the first fetch task for each source stored in the DB.
*
* It makes sure any new rows added to the DB will start to be pulled from within an hour.
*
* Each time it runs, it will create the initial fetch task for each source found in the DB. This
* relies on the task queue (via "ON CONFLICT DO NOTHING") making sure that duplicate tasks will not
* be created.
*/
@Profile("!test")
@Component
class CheckForNewSourcesTask(val db: DB) {

@Scheduled(fixedDelay = 1, timeUnit = TimeUnit.HOURS)
fun triggerSourcesTasks() {
val sources = db.listSources()

sources.forEach {
db.createFetchTask(
DB.HilltopFetchTaskCreate(
it.id, HilltopFetchTaskType.SITES_LIST, buildSiteListUrl(it.htsUrl)))
}
}
}
@@ -1,3 +1,6 @@
package nz.govt.eop.hilltop_crawler

const val HILLTOP_RAW_DATA_TOPIC_NAME = "hilltop.raw"
const val OUTPUT_DATA_TOPIC_NAME = "observations"

const val MAX_RESPONSE_SIZE = 20_000_000
