Add Postgresql Statefulsets with Replication to OpenShift/Kubernetes #4598
Pull request: #4599. For context, our project description is available in the class repo, which I just switched from private to public: |
@patrickdillon thanks for the pull request! As we discussed in today's call (https://bluejeans.com/s/7dg33/), it sounds like there are a few tweaks you might want to make (possibly setting replicas to 1?) and some additional testing you might want to do before we pass this to QA. If there's anything you need, please let us know! Thanks! |
@patrickdillon I assigned this issue to you and at the moment it's in code review on our kanban board at https://waffle.io/IQSS/dataverse . I can move it back to the "development" column if you'd like. Please just keep me posted when you'd like more code review or if you're done making commits and want to move it to QA. Thanks! |
I spoke with @djbrooke about this issue and pull request this morning and the plan is for me to QA it. I'm in the middle of other coding but I'll try to get to it soon. @patrickdillon made some changes which are reflected in the pull request and which we just discussed at https://bluejeans.com/s/ygEhi |
The change to First, I add a remote and switch to the branch behind the pull request:
Then, I edit
To get ready to push images to DockerHub, I clean out the war file and the installer:
I take a look at https://hub.docker.com/r/iqss/dataverse-glassfish/tags/ to make sure I'm not going to overwrite someone else's tag. Then, I run the build script, pushing to the branch name:
This is the part that takes forever. Stay tuned! |
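The build script tags the image with the branch name, and the tags page gets checked by hand first so nobody else's tag is overwritten. Here's a minimal sketch of those two checks in Python, assuming the list of existing tags has already been copied from the Docker Hub tags page (it does not call Docker Hub). `branch_to_tag` and `safe_to_push` are hypothetical helpers, not part of the Dataverse build script; the regex follows Docker's tag rules (up to 128 characters from `[A-Za-z0-9_.-]`, not starting with `.` or `-`).

```python
import re

# Docker tags: one of [A-Za-z0-9_] followed by up to 127 of [A-Za-z0-9_.-]
VALID_TAG = re.compile(r"[A-Za-z0-9_][A-Za-z0-9_.-]{0,127}")


def branch_to_tag(branch: str) -> str:
    """Turn a git branch name into a legal Docker tag (hypothetical helper)."""
    tag = re.sub(r"[^A-Za-z0-9_.-]", "-", branch)  # e.g. "/" becomes "-"
    tag = tag.lstrip(".-")[:128]  # tags may not start with "." or "-"
    if not VALID_TAG.fullmatch(tag):
        raise ValueError(f"cannot derive a Docker tag from {branch!r}")
    return tag


def safe_to_push(branch: str, existing_tags: set) -> tuple:
    """Return (ok, tag); ok is False when that tag already exists."""
    tag = branch_to_tag(branch)
    return tag not in existing_tags, tag
```

Replacing characters like `/` matters because git branch names may contain them but Docker tags may not.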
@patrickdillon I'm afraid I'm struggling a bit to test pull request #4599. "dataverse-glassfish" is saying "No deployments. A new deployment will start automatically when an image is pushed to project1/dataverse-plus-glassfish:4598-openshift-postgresql." But I've already pushed the tag to Docker Hub. I'll include screenshots below. I'll also attach my Here's my @danmcp @DirectXMan12 if you have any ideas for me, please let me know. I hope it's just that I'm forgetting to do something simple. @patrickdillon maybe you or one of the other students can try the image I pushed to the |
@pdurbin Unfortunately I never really figured out tags. I spent some time this morning trying to look at this in particular but I would need more time. Perhaps @danmcp or @DirectXMan12 could fix it the right way or show best practices. I could never get any tags besides latest to work so my workflow was always to copy the image to my own personal repo with the latest tag. So it doesn't solve your exact problem but you could copy the image with a different name and the latest tag to your repo or the IQSS repo. That might let you test until we resolve the tag issue. Let me know if that doesn't make sense. |
@pdurbin To be clear, if you manually hit the deploy button, does it work? |
@patrickdillon ok, I guess I'd rather not pollute the "latest" tag at https://hub.docker.com/r/iqss/ with experimental images, but it sounds like you're saying I could set up a "pdurbin" Docker Hub account or whatever instead of using the "iqss" organization. This is like how I have my own fork of Dataverse under my GitHub username. That way I could leave the tag alone and use "latest". Sure. I could try that. Good idea. Thanks. I didn't realize that you were having an issue with tags. @danmcp right. Here are screenshots from when I tried clicking "Deploy": It's as if the tag I expected to be there ( |
Ok, I'm hacking on Please stay tuned. Here are the changes I made:
|
@patrickdillon success! It worked on the first try when I pushed images to https://hub.docker.com/u/pdurbin/ rather than https://hub.docker.com/u/iqss/ and left the tag alone, keeping it as "latest". Here's the change from "iqss" to "pdurbin" I made locally, which I won't commit:
Here's a screenshot of Dataverse running under this new OpenShift config. I'm still at 51cecfc because I haven't committed my change to I can see we're using Stateful Sets now ("Technology Preview"! 😄 ): Questions:
|
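The switch from "iqss" to "pdurbin" with the tag left at "latest" is a mechanical edit to the OpenShift config, so it could be scripted instead of redone by hand. A minimal sketch, assuming the config has been loaded with `json.load` and that images are referenced through fields literally named `image`; if the template points at images through ImageStream objects instead, those entries would need the same treatment. `retarget_images` is a hypothetical helper.

```python
def retarget_images(node, old_ns: str, new_ns: str, tag: str = "latest"):
    """Return a copy of `node` with every "image" under old_ns moved to new_ns:tag."""
    if isinstance(node, dict):
        out = {}
        for key, value in node.items():
            if key == "image" and isinstance(value, str) and value.startswith(old_ns + "/"):
                # Keep the repository name, swap the namespace, force the tag
                repo = value[len(old_ns) + 1:].split(":", 1)[0]
                out[key] = f"{new_ns}/{repo}:{tag}"
            else:
                out[key] = retarget_images(value, old_ns, new_ns, tag)
        return out
    if isinstance(node, list):
        return [retarget_images(item, old_ns, new_ns, tag) for item in node]
    return node  # strings, numbers, etc. are left unchanged
```

It returns a modified copy and leaves the input untouched, so the committed template can keep pointing at "iqss" while a local copy points at a personal account.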
@pdurbin I think the change to build.sh is great. I am glad to see these changes work with the Solr update. Regarding further testing, in the pull request there is a short example of how to open the psql client on each container to show replication. In my example I just listed tables, but once you open the client you could run any SQL query, such as checking for your new dataverse. Let me know if those instructions are unclear. |
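The replication check described above (open psql on each postgres pod and run the same query) could be scripted roughly like this. It's a sketch under assumptions: the caller supplies the pod list with the master first, the database is "dvndb" (the name that shows up later in this thread), and the `postgres` user is a placeholder. The command runner is passed in, so the comparison logic can be tried without a cluster; `run_cmd` shows one way to call `oc` through `subprocess`. All function names are hypothetical.

```python
import subprocess
from typing import Callable, Dict, List


def psql_argv(pod: str, sql: str, db: str = "dvndb", user: str = "postgres") -> List[str]:
    """Build an `oc exec` command that runs one SQL statement with psql inside `pod`."""
    # -A (unaligned) and -t (tuples only) make psql print bare values
    return ["oc", "exec", pod, "--", "psql", "-U", user, "-d", db, "-At", "-c", sql]


def run_cmd(argv: List[str]) -> str:
    """Run a command and return its stdout (raises if the command fails)."""
    return subprocess.run(argv, capture_output=True, text=True, check=True).stdout


def out_of_sync_replicas(pods: List[str], sql: str,
                         run: Callable[[List[str]], str] = run_cmd) -> List[str]:
    """Return the pods whose query output differs from the first pod (the master)."""
    outputs: Dict[str, str] = {pod: run(psql_argv(pod, sql)).strip() for pod in pods}
    master = pods[0]
    return [pod for pod in pods[1:] if outputs[pod] != outputs[master]]
```

For example, `out_of_sync_replicas(["dataverse-postgresql-0", "dataverse-postgresql-1"], "SELECT count(*) FROM dataverse;")` would list replicas whose count differs from the master's; the `dataverse` table name is an assumption about the schema.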
@patrickdillon ah, thanks. I didn't even think to look at the pull request but now I have more questions. And these are fundamental OpenShift questions so please forgive me. I only have one instance of postgres, right? Since replicas is set to 1. How in the GUI or via I'm glad you're cool with the change to |
Whoops, I meant to put the issue number rather than the pull request number in my commit for |
I just found @patrickdillon honestly, I'm thinking about going ahead and merging your pull request because it doesn't seem to do any harm and I'd like @MichaelClifford to have his changes made on top of your changes. I assume you're both editing |
@pdurbin Oh, right, I forgot I reduced the number of replicas before committing. The only way I know of increasing the number of replicas is by editing the .json and starting a new project. That should be enough for testing, but there is probably another way, such as |
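Editing the .json by hand works, but the change is small enough to script. A minimal sketch, assuming the file is an OpenShift Template whose resources live in a top-level `objects` list and that the postgres StatefulSet is named `dataverse-postgresql` (inferred from the pod names that show up later in this thread). The function names and any file path you pass in are hypothetical.

```python
import json


def set_statefulset_replicas(template: dict, name: str, replicas: int) -> int:
    """Set spec.replicas on the StatefulSet called `name`; return how many objects changed."""
    if replicas < 0:
        raise ValueError("replicas must be non-negative")
    changed = 0
    for obj in template.get("objects", []):
        if obj.get("kind") == "StatefulSet" and obj.get("metadata", {}).get("name") == name:
            obj.setdefault("spec", {})["replicas"] = replicas
            changed += 1
    return changed


def update_template_file(path: str, name: str, replicas: int) -> None:
    """Rewrite the template on disk with the new replica count."""
    with open(path) as f:
        template = json.load(f)
    if set_statefulset_replicas(template, name, replicas) == 0:
        raise KeyError(f"no StatefulSet named {name!r} in {path}")
    with open(path, "w") as f:
        json.dump(template, f, indent=2)
        f.write("\n")
```

After running it, you would create a new project from the edited template as described above.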
@patrickdillon whoops, I think we both posted at the same time. 😃 How do you and @MichaelClifford feel about me merging your pull request and maybe testing some of the replica stuff in his? His will have replicas too, right? Three Glassfish replicas or whatever? Again, I'm mostly just trying to stay conscious of merge conflicts in |
@pdurbin That sounds good. Personally, I think merging it would be fine and it will work. Obviously we need to test, but I expect that to work out. Regarding process, should we submit a new PR after having merged on our fork? |
…postgresql Add Postgresql statefulsets with master/slave replication on OpenShift. #4598
@patrickdillon @MichaelClifford I just merged the pull request for this issue (#4599) because (again) it does no harm, and even though I haven't fully tested all the replica goodness, there will be another opportunity in the new pull request that @MichaelClifford makes for #4617. So let's move our attention to that issue and make sure we branch from the latest in develop so we can make further edits to |
As I mentioned at standup, I wanted to spend a little more time on this issue before moving on since I have it all running on my laptop anyway. At a high level, I wanted to:
I'm happy to report that I was able to do all of the things above, but as a project we should think more about where we're going with this effort. It's really interesting technology. I'll go through each of the items above.
Scaling the number of replicas
In a previous comment, I already posted an image of how the number of replicas is set to one. Here's how you can tell from the command line:
To scale from 1 postgres replica to 3, you can run
The GUI also reflects that the number of postgres replicas is now 3:
Confirm that data is being replicated from a postgres master to slaves
"dataverse-postgresql-0" is the master, so I connected to the console for "dataverse-postgresql-1" to make sure data is being replicated there, and it is:
I then followed the approach by @patrickdillon in the video posted at https://groups.google.com/d/msg/dataverse-community/TSxf4MTYYjg/7VJB_-GJBAAJ to make an edit in Dataverse and check that the edit is replicated from the postgres master to a postgres slave:
Kill the postgres master and see what happens
Then I deleted the postgres master and, not surprisingly, Dataverse isn't happy about this, showing errors on the home page and a stack trace in the Glassfish log:
Think about what the next steps might be
What surprised me a bit is that the postgres master came back. I guess this is because I set the number of replicas to 3, so the system is just bringing the number back in line with what I had declared. However, while the newly recreated master came back with a "dvndb" database, the database was empty:
I'm not familiar enough with postgres to know what to do to get Dataverse back up at this point, but I guess I'd work on...
Something like that. Anyway, I don't want this disaster recovery scenario I'm talking about to overshadow the fact that all of this automatic replication is very cool and a great step forward. Thank you @patrickdillon and the rest of the students for working on this! I'm going to close this issue, but anyone reading this is welcome to leave comments! |
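Here's a toy model of what happened when the master was deleted, to make the reasoning concrete. It's a simplified sketch, not the real Kubernetes StatefulSet controller (which, among other things, creates pods one ordinal at a time and waits for each to be ready): pods are named `<statefulset>-<ordinal>`, ordinal 0 plays the master as in this thread, and each pod's data lives only in the pod, as it does without persistent volumes. `Pod` and `reconcile` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Pod:
    name: str
    ordinal: int
    # Data kept only in the pod itself: without a persistent volume it is
    # gone when the pod is deleted.
    rows: List[str] = field(default_factory=list)

    @property
    def role(self) -> str:
        # Convention in this thread: ordinal 0 is the postgres master.
        return "master" if self.ordinal == 0 else "slave"


def reconcile(statefulset: str, desired: int, pods: Dict[int, Pod]) -> List[str]:
    """Bring `pods` in line with `desired` replicas; return the names of pods created.

    Toy model: missing ordinals 0..desired-1 are recreated with empty storage,
    and ordinals >= desired are removed, highest first.
    """
    created = []
    for ordinal in range(desired):
        if ordinal not in pods:
            pods[ordinal] = Pod(f"{statefulset}-{ordinal}", ordinal)
            created.append(pods[ordinal].name)
    for ordinal in sorted(pods, reverse=True):  # sorted() copies, so deleting is safe
        if ordinal >= desired:
            del pods[ordinal]
    return created
```

Deleting ordinal 0 and reconciling against 3 replicas recreates `dataverse-postgresql-0` with an empty `rows` list while the slaves keep their copies, which mirrors the empty "dvndb" database observed above.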
@pdurbin Regarding master recovery, we were aware of this issue, but I forgot to document it in the pull request. I have been working with @ryanmorano on recovering the master's state. The solution we are trying is to create a persistent volume on the cluster and then a persistent volume claim for the statefulset. In our testing, we were only able to make this work with one replica, but in a discussion @danmcp suggested that if we create as many persistent volumes as statefulsets, it should work. We have yet to test it. |
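Reading @danmcp's suggestion as one persistent volume per StatefulSet replica, here's a sketch of what that could look like. StatefulSet pods get names like `dataverse-postgresql-0`, and a claim generated from a `volumeClaimTemplates` entry is named `<template>-<pod name>`, so each replica's claim name is predictable. The code generates one PersistentVolume per replica and pre-binds each with `claimRef`. Assumptions: the claim template name `pgdata`, the size, the namespace, and the `hostPath` location are placeholders, and `hostPath` is only reasonable on a single-node test cluster like a laptop setup. The StatefulSet itself would also need a matching `volumeClaimTemplates` entry mounted at the postgres data directory.

```python
from typing import Dict, List


def pvc_name(claim_template: str, statefulset: str, ordinal: int) -> str:
    # Claims from volumeClaimTemplates are named <template>-<pod name>,
    # and StatefulSet pods are named <statefulset>-<ordinal>.
    return f"{claim_template}-{statefulset}-{ordinal}"


def persistent_volumes(statefulset: str, replicas: int, namespace: str,
                       claim_template: str = "pgdata", size: str = "1Gi",
                       base_path: str = "/mnt/pv") -> List[Dict]:
    """One hostPath PersistentVolume per replica, each reserved for one claim."""
    volumes = []
    for ordinal in range(replicas):
        claim = pvc_name(claim_template, statefulset, ordinal)
        volumes.append({
            "apiVersion": "v1",
            "kind": "PersistentVolume",
            "metadata": {"name": f"pv-{claim}"},
            "spec": {
                "capacity": {"storage": size},
                "accessModes": ["ReadWriteOnce"],
                # Keep the data on the volume if the claim is deleted
                "persistentVolumeReclaimPolicy": "Retain",
                "hostPath": {"path": f"{base_path}/{claim}"},
                # Pre-bind: only this replica's claim can use the volume
                "claimRef": {"namespace": namespace, "name": claim},
            },
        })
    return volumes
```

Pre-binding with `claimRef` is a design choice, not a requirement; without it, each claim binds to any available volume that matches its size and access mode.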
@patrickdillon right, now that you mention it I remember that we all talked about how the data in these containers is ephemeral and talked about persistent volumes. And back when I did the initial work on OpenShift support in #4168, I added an item under "known issues" in this area. http://guides.dataverse.org/en/4.8.6/developers/containers.html#known-issues-with-dataverse-images-on-docker-hub says, "The storage should be abstracted. Storage of data files and PostgreSQL data. Probably Solr data." This business of persistent data is what I was getting at. I still think it's neat that the replication "just works". You can have only a postgres master and be humming along. Then you come along and bump up the number of replicas and each of those new slaves gets the data from the master. It's like magic. 😄 |
I just ran
That way anyone who tries to follow http://guides.dataverse.org/en/4.8.6/developers/containers.html#openshift will get a working Dataverse installation. It works on my machine, anyway. It's easy to forget to push to "latest" on Docker Hub after a pull request that touches Anyway, again, I just tested "latest" after pushing and it seems fine. |
As students in Boston University's EC528 Cloud Computing Course, my team has been working with @pdurbin @danmcp & @DirectXMan12 to further work on #4040 & #4168.
I have been working on scaling the postgres pods and am ready to open a pull request. My solution entails creating a statefulset and providing a new command at startup for the centos/postgres image.
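The issue doesn't say how the new startup command decides whether a pod should run as master or slave. One common approach, consistent with "dataverse-postgresql-0" acting as the master later in this thread, is to read the pod's StatefulSet hostname. A minimal sketch under that assumption; the headless service name and the returned dict are hypothetical, and a real entrypoint would go on to start postgres in the chosen mode.

```python
import re
from typing import Dict

# StatefulSet pod hostnames look like <statefulset>-<ordinal>
_ORDINAL = re.compile(r"^(?P<set>.+)-(?P<ordinal>\d+)$")


def replication_role(hostname: str, service: str) -> Dict[str, str]:
    """Decide master/slave from a StatefulSet pod hostname (hypothetical startup helper)."""
    match = _ORDINAL.match(hostname)
    if match is None:
        raise ValueError(f"{hostname!r} does not look like <statefulset>-<ordinal>")
    statefulset, ordinal = match["set"], int(match["ordinal"])
    if ordinal == 0:
        return {"role": "master"}
    # Pods behind a headless service are reachable by DNS as <pod>.<service>
    return {"role": "slave", "master_host": f"{statefulset}-0.{service}"}
```

Keying off the ordinal keeps every pod's command identical; only the hostname Kubernetes assigns differs between master and slaves.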