catalog harvester process runs in cloud.gov #2810

Closed
1 task done
jbrown-xentity opened this issue Feb 12, 2021 · 9 comments
Labels
component/catalog Related to catalog component playbooks/roles

Comments

@jbrown-xentity
Contributor

jbrown-xentity commented Feb 12, 2021

User Story

As a catalog admin, I want a harvester in cloud.gov so that data can be updated.

Acceptance Criteria

[ACs should be clearly demoable/verifiable whenever possible. Try specifying them using BDD.]

  • GIVEN a catalog web application is running in cloud.gov environment
    WHEN a harvest request is made through the UI
    THEN the gather and harvest processes pick up the job
    AND the harvest runs to completion

Background

We will need to deploy the catalog application with different start/run commands, one for each component of the harvest pipeline: the gather, fetch, and run jobs. See here for examples. Each of these should be implemented as a separate application.

Security Considerations (required)

None; all harvests are of publicly available data.

Sketch

We should use the Procfile to specify the different start commands.

@jbrown-xentity jbrown-xentity added the component/catalog Related to catalog component playbooks/roles label Feb 12, 2021
@pjsharpe07
Contributor

The fgdc2iso app is deployed in all three spaces but is currently stopped. Turn them on before attempting to run any geospatial harvests.

@mogul mogul changed the title Create catalog harvester application catalog harvester process runs in cloud.gov Feb 25, 2021
@adborden
Contributor

Instead of porting the existing harvest system to cloud.gov, we should look at picking up the harvest-ng work #2995

@jbrown-xentity
Contributor Author

Using the processes block in the manifest file instead of the Procfile, as the Procfile doesn't offer the full set of options that the manifest file has...
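
For anyone following along, here is a minimal sketch of what a processes block might look like, rendered from Python so the shape is easy to check. The process type names, the memory size, and the ckan harvester commands are assumptions for illustration, not the contents of the actual manifest; the exact command names depend on the installed CKAN and ckanext-harvest versions.

```python
# Sketch only: renders a Cloud Foundry manifest with one process per harvest job.
# Process type names, memory, and commands below are hypothetical placeholders.

HARVEST_PROCESSES = {
    "harvest-gather": "ckan harvester gather-consumer",  # assumed command name
    "harvest-fetch": "ckan harvester fetch-consumer",    # assumed command name
    "harvest-run": "ckan harvester run",                 # assumed command name
}


def render_manifest(app_name, processes, memory="512M"):
    """Return manifest YAML text with a `processes` entry per job."""
    lines = ["applications:", f"- name: {app_name}", "  processes:"]
    for proc_type, command in processes.items():
        lines += [
            f"  - type: {proc_type}",
            f"    command: {command}",
            "    instances: 1",
            f"    memory: {memory}",
            # Worker processes serve no HTTP traffic, so a process-level
            # health check is used instead of a port check.
            "    health-check-type: process",
        ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_manifest("catalog", HARVEST_PROCESSES))
```

Per-process settings such as memory, instance count, and health check type are the kind of options a Procfile has no place for, since it only maps a process type to a command.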

@mogul
Contributor

mogul commented Oct 29, 2021

Currently hitting quota limits. We need to revisit them after we move Solr from an app to a brokered service.

@mogul
Contributor

mogul commented Nov 8, 2021

@jbrown-xentity is this still blocked now that we raised our quota?

@jbrown-xentity
Contributor Author

jbrown-xentity commented Nov 8, 2021

I have not checked. There were two main blockers: the quota, and Solr changing to a service in cloud.gov. The current Solr service was not connecting to the dev app, so creating and running a harvest would have been pointless. It turns out we needed to run this add-network-policy step to set up the connection between catalog and catalog-solr. So this can be unblocked, and the following steps would need to be run to check whether harvests are working:

  1. Merge the draft branch onto develop
  2. Run cf run-task to create a harvest source that is working on prod
  3. Verify harvests worked

Normally the above could be done via the UI, but that is blocked by #3508. Still, running one-off ckan commands through cf run-task would be a good thing to validate...
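
As a rough sketch of the two cf commands mentioned above, the helpers below build the argument lists without executing anything. They assume the cf CLI v7 or later syntax (positional destination app, --command flag for run-task). The Solr port 8983 and the example ckan command are assumptions for illustration; only the app names catalog and catalog-solr come from this thread.

```python
# Sketch only: builds cf CLI argument lists; nothing is executed here.
import shlex


def add_network_policy_cmd(source_app, dest_app, port, protocol="tcp"):
    """Build `cf add-network-policy` so source_app can reach dest_app."""
    return [
        "cf", "add-network-policy", source_app, dest_app,
        "--protocol", protocol, "--port", str(port),
    ]


def run_task_cmd(app, command, name=None):
    """Build `cf run-task` for a one-off command in the app's container."""
    argv = ["cf", "run-task", app, "--command", command]
    if name:
        argv += ["--name", name]
    return argv


if __name__ == "__main__":
    # 8983 is Solr's default port; the real deployment may differ.
    print(shlex.join(add_network_policy_cmd("catalog", "catalog-solr", 8983)))
    # Hypothetical one-off ckan command, shown only to illustrate the shape.
    print(shlex.join(run_task_cmd("catalog", "ckan harvester sources",
                                  name="list-harvest-sources")))
```

Pass the resulting list to subprocess.run once logged in to the target space; keeping the command as a list avoids shell quoting problems with the nested ckan command.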

@jbrown-xentity
Contributor Author

Never mind, it's blocked by #3508. While you can create a harvest source and users through the command line, we can't create organizations with the CLI. Since a harvest source has to be in an org, this breaks the validation workflow.
Once login is working, this becomes very straightforward: log in, set up the user as a sysadmin as a one-off, and then do all other testing via the UI. The branch is pushed to development as a test, and all pieces are working as expected...

@jbrown-xentity
Contributor Author

The code is deployed, but to demonstrate our acceptance criteria we need to be able to log in. Leaving as blocked for now...

@jbrown-xentity
Contributor Author

The processes are running, and restart if they fail.
You can see that staging harvested data.
You can see harvest sources here.
Created #3549 and #3550 as follow-up items; not sure that either needs to be fixed before go-live.

@mogul mogul added this to the Sprint 20211129 milestone Nov 29, 2021
@mogul mogul closed this as completed Nov 29, 2021