Migrate existing pieces to new pluggable component architecture, part 1 clean up the unused pieces #214

Closed
Tracked by #26
tlvu opened this issue Nov 3, 2021 · 22 comments · Fixed by #311

@tlvu
Collaborator

tlvu commented Nov 3, 2021

The following are scheduled to be removed:

  frontend:
    image: pavics/pavics-frontend:1.0.5
  project-api:
    image: pavics/pavics-project-api:0.9.0
  catalog:
    image: pavics/pavics-datacatalog:0.6.11
  solr:
    image: pavics/solr:5.2.1
  ncwms2:
    image: pavics/ncwms2:2.0.4

Includes finding and removing all references from

  • configs of the components above; also remove the unused directory birdhouse/config/ncops
  • canarie-api, magpie and any other remaining components
  • notebooks tested by Jenkins only. Other notebooks are not included.
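
A minimal sketch of how the reference hunt described above could be started, assuming a local checkout of the repository. The component names come from the list above; the set of file suffixes and the function name `find_references` are illustrative assumptions, not part of birdhouse-deploy. Plain substring matching over-reports (for example, "catalog" also matches "datacatalog"), so every hit still needs a manual review.

```python
from pathlib import Path

# Component names taken from the list above.
COMPONENTS = ("frontend", "project-api", "catalog", "solr", "ncwms2")
# Assumption: the kinds of files worth scanning; adjust to the real checkout.
SUFFIXES = {".yml", ".yaml", ".template", ".env", ".sh", ".cfg", ".ipynb"}


def find_references(root, names=COMPONENTS, suffixes=SUFFIXES):
    """Return (path, line_number, line) for every line mentioning a component."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for number, line in enumerate(text.splitlines(), start=1):
            # Plain substring match: simple, but it can produce false positives.
            if any(name in line for name in names):
                hits.append((str(path), number, line.strip()))
    return hits
```

Calling `find_references("birdhouse")` from the repository root would list candidate lines to inspect before deleting anything.
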
@mishaschwartz
Collaborator

Hi @tlvu. I'm working on rethinking the deployment strategy for DACCS/PAVICS over at UofT. Do you mind if I take over this issue as well as the other "Migrate existing pieces to new pluggable component architecture" issues in this repository?

@tlvu
Collaborator Author

tlvu commented Feb 7, 2023

@mishaschwartz not at all. Please let me know if you need any further info.

You can see the pluggable component architecture already in place for the new components/ and optional-components/ directories. It also works for external repos, ex: https://github.com/bird-house/birdhouse-deploy-ouranos

The plan is to break out all the components that are currently enabled by default in the same way.

The tricky part is to keep backward compatibility with the current default when users do not specify any extra components.

The other tricky part is inter-dependencies between the various components. New WPS services (birds) are self-contained with their own DB (ex: https://github.com/bird-house/birdhouse-deploy/tree/master/birdhouse/optional-components/generic_bird), but the current WPS services (Finch, Raven, Hummingbird, ...) all share the same DB!

There are also some tightly coupled components, ex: Twitcher and Magpie. Their versions always have to match, and they probably do not make sense as separate components, ex: activating Magpie alone without Twitcher makes little sense.
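
To illustrate the coupling concern, here is a small sketch of a check that flags tightly coupled components that are only partially enabled. The rule table `COUPLED_GROUPS` and the function name are hypothetical; nothing like this is claimed to exist in birdhouse-deploy.

```python
# Hypothetical coupling rules: each group must be enabled together or not at all.
COUPLED_GROUPS = [{"magpie", "twitcher"}]


def coupling_violations(enabled, groups=COUPLED_GROUPS):
    """Return the coupled groups that are only partially enabled."""
    enabled = set(enabled)
    return [
        sorted(group)
        for group in groups
        if 0 < len(group & enabled) < len(group)
    ]
```

For example, enabling Magpie without Twitcher would be reported as one violation, while enabling both or neither would pass.
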

There are currently 3 PAVICS deployments: Ouranos, CRIM and PCIC. I will tag the persons responsible for CRIM and PCIC below.

Please share your thoughts and code change early with us so we can plan the upgrade path on our side.

Ideally all changes are small, incremental and backward compatible so upgrades are painless. For sure this is not 100% possible, so we have to find a middle ground. I intentionally broke the task into many smaller parts to hopefully keep each change small and incremental.

Note there is an open PR #287 that removes unused components; it could be used to implement this part 1.

@matprov (CRIM) @eyvorchuk (PCIC)

@tlvu
Collaborator Author

tlvu commented Feb 7, 2023

If at some point we cannot maintain backward compat, then we could also include a small change that has a big compatibility impact: remove the version field in each docker-compose.yml file. That version field is optional, and the fact that it is there blocks us from using any newer features unless we bump the version and break all external components. Very annoying.
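
As a rough sketch of that change, the snippet below drops a top-level `version` key from the text of a compose file. It works on plain text rather than parsed YAML so that comments and formatting are preserved; the function name is illustrative, and it assumes the key sits unindented on a single line.

```python
import re

# Matches an unindented "version:" key and its value on a single line.
_VERSION_LINE = re.compile(r"^version\s*:.*(?:\r?\n|$)", re.MULTILINE)


def strip_compose_version(text):
    """Return compose file text without its top-level `version` key."""
    # count=1: only the first top-level match is removed.
    return _VERSION_LINE.sub("", text, count=1)
```

Indented keys that happen to be called `version` inside a service definition are left untouched because the pattern only matches at the start of a line.
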

@tlvu
Collaborator Author

tlvu commented Feb 7, 2023

@mishaschwartz
Collaborator

@tlvu thanks for all that info! As far as backwards compatibility, our eventual goal is to transform the PAVICS platform to be a node in a larger network of similar nodes (the DACCS project). This might, and probably should, require us to break backwards compatibility with the current PAVICS stack at some point.
If we break backwards compatibility at any point, I will make sure that the migration instructions are very clear so that we don't lose any data or functionality when upgrading the PAVICS stack to something new.

Another issue is that we currently don't have a server over at UofT to spin up an instance of PAVICS for staging or testing. What do you do when you're developing the project? Do you spin up an instance in a Vagrant VM locally on your own laptop, or is there a friendlier development environment that you use?

@tlvu
Collaborator Author

tlvu commented Feb 8, 2023

@mishaschwartz Agreed, 100% back compat is impossible. The way I see it, this task is broken up into many parts; some parts can be back compat and some cannot.

Each part should go live incrementally so we do not have a big-bang style go-live with all the associated risks and stress.

So for the parts where we cannot maintain back compat, let's try to group all the breaking changes together so that the non-back-compat go-lives are neither too frequent nor too big, to ease the upgrade path. We can break this effort into even smaller parts if it helps make the upgrade path smoother.

Can you elaborate about "a node in a larger network of similar nodes (the DACCS project)"? Currently, I think it is already possible. Any script can talk to any WPS service, and those WPS services do not have to be on the same server. All data served via Thredds and Geoserver is also available to all WPS services, not just the WPS services running locally on the same server. This is possible because we use standards (WPS, WMS, OpenDAP, ...) as our "interface" between the various components.

As for my test environment, I have several Vagrant VMs so I can work on different PRs simultaneously. Vagrant instructions are here: https://github.com/bird-house/birdhouse-deploy/blob/master/birdhouse/README.rst#vagrant-instructions

You would need a rather beefy machine to run that Vagrant VM. Give the VM at least 100G of disk and 12G of RAM, but the more the better. My various VMs are on my workstation at work, not on my laptop.

I never got a self-signed SSL cert to work, so your VM needs to be in a DMZ so you can open ports 80 and 443 for the LetsEncrypt SSL cert to work.

Jenkins has the ability to target a specific hostname, so you can have many instances of PAVICS but you only need one instance of Jenkins.

Let me know if you need any further help to get started with setting up your environment. We can video conference at some point.

@mishaschwartz
Collaborator

@tlvu

> Can you elaborate about "a node in a larger network of similar nodes (the DACCS project)"?

The goal is to make PAVICS/DACCS work similarly to how ESGF sets up their nodes (https://esgf.llnl.gov/node-design.html). So yes, the services are all visible and usable no matter which server they're actually running on, but a user still needs to know where each service is available in order to know which URL to point to. In the DACCS network, each node is aware of which services, data, etc. each other node is hosting, which makes it much easier for a user to discover tools and data that fit their needs, whether the service/data is hosted on the current node or on another node somewhere else in the network.
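
To make the idea concrete, here is a toy sketch of a registry in which every node knows what the other nodes host. It is an illustration only: the `Node` fields, the function name and the URLs used with it are all hypothetical and do not describe the actual DACCS design.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One deployment in the network; all field names here are hypothetical."""
    name: str
    base_url: str
    services: set = field(default_factory=set)


def nodes_offering(service, registry):
    """Return the nodes in `registry` that advertise `service`, sorted by name."""
    matches = [node for node in registry if service in node.services]
    return sorted(matches, key=lambda node: node.name)
```

With such a registry, a user connected to any one node could ask which nodes offer, say, a given WPS service and get back the URLs to use, instead of having to know them in advance.
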

@tlvu
Collaborator Author

tlvu commented Feb 8, 2023

> In the DACCS network, each node is aware of which services, data, etc. each other node is hosting which makes it much easier for a user to discover tools and data that fits their needs.

@mishaschwartz I understand now! Eager to see your take on this interesting feature.

@tlvu
Collaborator Author

tlvu commented Feb 9, 2023

> I never got self-signed SSL cert to work so your VM need to be in a DMZ so you can open port 80 and 443 for LetsEncrypt SSL cert to work.

Forgot to say, this means you will also need a DHCP reservation so your VM has a specific hostname for the LetsEncrypt cert to be associated with.

@tlvu
Collaborator Author

tlvu commented Feb 9, 2023

Other PRs removing unused components, related to this issue: #292, #291

@mishaschwartz Once you get your PAVICS VM and Jenkins up and running, testing those PRs will let you fully exercise your setup. Let me know if you need help with anything.

@mishaschwartz
Collaborator

Can we also remove the directory birdhouse/config/ncops, or is it being used somewhere?

@tlvu
Collaborator Author

tlvu commented Feb 14, 2023

> Can we also remove the following directory birdhouse/config/ncops or is being used somewhere?

It is not used anywhere as far as I know.

@tlvu
Collaborator Author

tlvu commented Feb 16, 2023

@eyvorchuk do you know if anyone on the PCIC side uses Phoenix? Can we remove it from the stack as part of this effort?

@eyvorchuk
Collaborator

> @eyvorchuk do you know anyone on PCIC side use Phoenix? Can we remove it from the stack as part of this effort?

I haven't used it so far, but I'll check with my supervisor.

@mishaschwartz
Collaborator

@eyvorchuk any word on whether phoenix is being used at PCIC?

@eyvorchuk
Collaborator

> @eyvorchuk any word on whether phoenix is being used at PCIC?

I believe we use it for a few of our birds for logging purposes, but I don't know if this is necessary.

@mishaschwartz
Collaborator

@eyvorchuk Ok why don't we keep it in the stack for now.

We may eventually decide to remove it though so if there is another way to do the logging it may be worth looking into. What sort of logging info are you getting from phoenix (I wonder if that info is easily available elsewhere)?

@eyvorchuk
Collaborator

> @eyvorchuk Ok why don't we keep it in the stack for now.
>
> We may eventually decide to remove it though so if there is another way to do the logging it may be worth looking into. What sort of logging info are you getting from phoenix (I wonder if that info is easily available elsewhere)?

I've just been going off of comments from my predecessor, so I haven't actually personally used phoenix. How would I figure out where it's being used?

@mishaschwartz
Collaborator

@eyvorchuk My best guess would be to look for any monitoring/logging scripts that are looking at port 8443.

Another option would be to stop it with docker stop phoenix, make a few requests to some of the WPS birds and see if your logs are as you expect (this might break things though, so I'd be careful doing this in your production environment if it is currently in use).

@tlvu do you have any other suggestions for how to check this?

@tlvu
Collaborator Author

tlvu commented Mar 22, 2023

@eyvorchuk
Not sure what kind of logs you are referring to.

If it is to get docker logs of the various birds, technically this issue #218 might solve it. I have not tried what CRIM has deployed for CCDP so not 100% sure it will have docker logs but I think it would have the logs under /var/log.

@mishaschwartz FYI the CCDP stack is a duplicate of the PAVICS stack that contains only Finch and Weaver. Hopefully with your modular change, CRIM will be able to use the PAVICS stack and even re-integrate/merge their improvements (ex: this logs monitoring) back into the PAVICS stack.

@eyvorchuk
Collaborator

Just did some runs of the birds with and without phoenix running, and our docker logs were the same. I should mention that in our logging method, we output to a logger and use response.update_status. Does the latter method make use of phoenix?

@tlvu
Collaborator Author

tlvu commented Mar 22, 2023

> Just did some runs of the birds with and without phoenix running, and our docker logs were the same. I should mention that in our logging method, we output to a logger and use response.update_status. Does the latter method make use of phoenix?

@eyvorchuk

Not sure I can answer this since I do not use Phoenix.

But my understanding is that Phoenix acts as a web client for the various WPS birds, allowing you to craft WPS calls without writing raw XML.

This is the same role as OWSLib and our birdy, which wraps OWSLib. Both save us from having to manually write all the XML and deal with HTTP POST/GET directly.
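
For context, this is roughly the kind of raw request such clients build for you. The sketch below only assembles WPS 1.0.0 key-value-pair URLs for GetCapabilities and DescribeProcess with the standard library; it sends nothing, and the endpoint and process identifier passed to it are placeholders. It does not show how Phoenix, OWSLib or birdy are implemented.

```python
from urllib.parse import urlencode


def wps_get_capabilities_url(endpoint):
    """Build a WPS GetCapabilities request URL (KVP encoding over HTTP GET)."""
    params = {"service": "WPS", "request": "GetCapabilities"}
    return endpoint + "?" + urlencode(params)


def wps_describe_process_url(endpoint, identifier, version="1.0.0"):
    """Build a WPS DescribeProcess request URL for one process identifier."""
    params = {
        "service": "WPS",
        "version": version,
        "request": "DescribeProcess",
        "identifier": identifier,
    }
    return endpoint + "?" + urlencode(params)
```

The responses to both requests are XML documents, and Execute requests are usually sent as XML over POST, which is the part a client library hides.
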

This is an example of our notebook using birdy as a WPS client https://github.com/bird-house/finch/blob/master/docs/source/notebooks/finch-usage.ipynb

Documentation about birdy https://birdy.readthedocs.io/en/latest/api.html. Looks like birdy also works as a commandline client if that is needed.

Not sure if birdy is able to replace your usage of Phoenix or provide an equivalent of response.update_status, which I am not sure I understand. Maybe you can provide a real example of how Phoenix is being used with that response.update_status?

mishaschwartz added a commit that referenced this issue Jul 6, 2023
## Overview

Move unused and unmaintained components to a separate
`deprecated-components/` subdirectory and remove
them from the DEFAULT_CONF_DIRS list if required.

## Changes

**Non-breaking changes**

**Breaking changes**
- The following components are no longer present by default:
  - catalog
  - frontend
  - malleefowl
  - ncops
  - ncwms2
  - project-api
  - solr
  - phoenix

## Related Issue / Discussion

- Resolves #214
- Closes #287 
- Resolves #206
- Closes #291 
- Resolves #9
- Closes #292 
- Resolves #290 

## Additional Information