catalog harvester process runs in cloud.gov #2810

jbrown-xentity · 2021-02-12T20:19:10Z

User Story

As a catalog admin, I want a harvester in cloud.gov so that data can be updated.

Acceptance Criteria

GIVEN a catalog web application is running in cloud.gov environment
WHEN a harvest request is made through the UI
THEN the gather and harvest processes pick up the job
AND the harvest runs to completion

Background

The catalog application will need to be turned on, but with different start/run commands for the various components of the harvest jobs. This will require running the gather, fetch, and run jobs. See here for examples. Each of these should be implemented as a separate application.

Security Considerations (required)

None; all harvests are of publicly available data.

Sketch

Should use the Procfile to specify different start commands
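
A minimal sketch of the Procfile idea, assuming the three harvest components map onto the ckanext-harvest CLI commands `gather-consumer`, `fetch-consumer`, and `run`. The process type names, the web start script, and the exact `ckan` invocation are illustrative assumptions, not taken from the catalog repository.

```python
from typing import Mapping

# Hypothetical process types and start commands; the real catalog app may differ.
HARVEST_PROCESSES = {
    "web": "./start-web.sh",  # placeholder web start command (hypothetical)
    "harvest-gather": "ckan harvester gather-consumer",
    "harvest-fetch": "ckan harvester fetch-consumer",
    "harvest-run": "ckan harvester run",
}


def render_procfile(processes: Mapping[str, str]) -> str:
    """Render one 'type: command' line per process, the format a Procfile uses."""
    return "".join(f"{ptype}: {command}\n" for ptype, command in processes.items())


if __name__ == "__main__":
    # Print the Procfile text so it can be reviewed or written to disk.
    print(render_procfile(HARVEST_PROCESSES), end="")
```

Keeping the mapping in one place makes it easy to see that each harvest component gets its own start command while sharing the same application code.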


pjsharpe07 · 2021-02-18T16:41:53Z

The fgdc2iso app is deployed in all three spaces but is currently stopped. Turn it on in each space before attempting any geospatial harvests.

adborden · 2021-03-15T22:08:16Z

Instead of porting the existing harvest system to cloud.gov, we should look at picking up the harvest-ng work #2995

jbrown-xentity · 2021-10-27T21:21:41Z

Using processes in manifest file instead of the Procfile, as the Procfile doesn't give the full options that the manifest file has...
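
To illustrate what the manifest adds over a Procfile, here is a rough sketch that builds an `applications[].processes` structure with per-process settings. The attribute names (`type`, `command`, `instances`, `memory`, `health-check-type`) are standard Cloud Foundry manifest attributes; the app name, commands, and values are assumptions for illustration only. The result is printed as JSON for inspection; the equivalent YAML would live in the manifest file.

```python
import json
from typing import Dict, Mapping


def process(ptype: str, command: str, memory: str = "512M", instances: int = 1) -> Dict:
    """Describe one process entry with options a Procfile cannot express."""
    return {
        "type": ptype,
        "command": command,
        "instances": instances,
        "memory": memory,
        # Workers expose no port to probe, so check the process itself instead.
        "health-check-type": "port" if ptype == "web" else "process",
    }


def build_manifest(app_name: str, commands: Mapping[str, str]) -> Dict:
    """Assemble a manifest-shaped dict with one process entry per command."""
    return {
        "applications": [
            {
                "name": app_name,
                "processes": [process(t, c) for t, c in commands.items()],
            }
        ]
    }


if __name__ == "__main__":
    # Hypothetical app name and commands.
    manifest = build_manifest(
        "catalog",
        {
            "web": "./start-web.sh",
            "harvest-gather": "ckan harvester gather-consumer",
            "harvest-fetch": "ckan harvester fetch-consumer",
        },
    )
    print(json.dumps(manifest, indent=2))
```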

mogul · 2021-10-29T17:31:36Z

Currently hitting quota limits... Need to revisit them after we move Solr from apps to brokered

mogul · 2021-11-08T21:19:04Z

@jbrown-xentity is this still blocked now that we raised our quota?

jbrown-xentity · 2021-11-08T22:01:47Z

Have not checked. There were two main blockers: quota, and Solr changing to a service in cloud.gov. The current Solr service was not connecting to the dev app, so creating and running a harvest would have been pointless. It turns out we needed to run the add-network-policy step to set up catalog and catalog-solr to connect. So this can be unblocked, and the following would need to be run to check whether harvests are working:

Merge the draft branch onto develop
Run cf run-task to create a harvest source that is working on prod
Verify harvests worked

Normally the above could be done via the UI, but is blocked by #3508, although testing ckan one-off commands through cf run-task would be a good thing to validate...
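
As a sketch of validating CKAN one-off commands through `cf run-task`, the helper below only assembles the command line; it does not execute anything. It assumes cf CLI v7 or later, where the task command is passed with `--command`. The app name `catalog`, the task name, and the choice of `ckan harvester sources` as the one-off command are assumptions for illustration.

```python
import shlex
from typing import List, Optional


def run_task_argv(app: str, command: str, name: Optional[str] = None) -> List[str]:
    """Build the argv for a one-off task (assumes cf CLI v7+ syntax)."""
    argv = ["cf", "run-task", app, "--command", command]
    if name:
        argv += ["--name", name]
    return argv


if __name__ == "__main__":
    # Hypothetical app and task names; listing sources is a read-only check.
    argv = run_task_argv("catalog", "ckan harvester sources", name="list-harvest-sources")
    # Print a shell-safe form for review before running it by hand.
    print(shlex.join(argv))
```

Building the argv as a list keeps the quoted CKAN command intact as a single argument, which matters once the one-off command itself contains spaces or quotes.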

jbrown-xentity · 2021-11-08T22:58:18Z

Never mind, it's blocked by #3508. While you can create a harvest source and users through the command line, we can't create organizations with the CLI. Since a harvest source has to be in an org, this breaks the validation workflow.
Once login is working, it becomes very straightforward to log in, set up the user as a sysadmin as a one-off, and then do all other testing via the UI. The branch is pushed to development as a test, and all pieces are working as expected...

jbrown-xentity · 2021-11-11T16:26:03Z

The code is deployed, but to demonstrate our acceptance criteria we need to be able to log in. Leaving as blocked for now...

jbrown-xentity · 2021-11-19T20:51:23Z

The processes are running, and restart if they fail.
You can see that staging harvested data.
You can see harvest sources here.
Created #3549 and #3550 as follow up items; not sure that either needs to be fixed before go-live.

jbrown-xentity added the component/catalog label (Related to catalog component playbooks/roles) Feb 12, 2021

jbrown-xentity mentioned this issue Feb 12, 2021

Deploy catalog application on cloud.gov #2604

Closed

17 tasks

mogul changed the title ~~Create catalog harvester application~~ catalog harvester process runs in cloud.gov Feb 25, 2021

jbrown-xentity mentioned this issue Oct 19, 2021

Add java and saxon to Docker GSA/catalog.data.gov#358

Merged

FuhuXia self-assigned this Oct 20, 2021

jbrown-xentity self-assigned this Oct 27, 2021

jbrown-xentity mentioned this issue Oct 27, 2021

Feature/harvest running GSA/catalog.data.gov#369

Merged

jbrown-xentity mentioned this issue Nov 19, 2021

Add harvest-run, rename restart GSA/catalog.data.gov#373

Merged

jbrown-xentity unassigned FuhuXia Nov 29, 2021

mogul added this to the Sprint 20211129 milestone Nov 29, 2021

mogul closed this as completed Nov 29, 2021
