This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved.
@@ -41,6 +41,7 @@ class WelcomeMessageModel(BaseModel):
     pubkey: str
     cephconf: str
     keyring: str
+    peer_url: str
Maybe I'm just nitpicking, but can we give this a slightly different name? Because it's peer_url, it makes me think it's a URL, but it's actually a string in the form "name=URL". Maybe call it, ummm... etcd_peer? Just something to make it slightly more obvious it's weird.
(oops, that comment should have been part of the review below)
yep, sounds good.
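For illustration only, the model after the rename agreed above might look like this (the field set comes from the hunk; etcd_peer is just the name floated in the comment):

```python
from pydantic import BaseModel


class WelcomeMessageModel(BaseModel):
    pubkey: str
    cephconf: str
    keyring: str
    etcd_peer: str  # a "name=URL" string handed to the joining node, not a bare URL
```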
Looks like it should work :-) Missing a cache, for the case where etcd is dead, but you already noted that we'll need a cache in one of the commit messages, so that's fine. I have no particular reason not to approve this, but a few questions inline.
@@ -174,7 +174,8 @@ EOF
 fi

 if [[ "$kiwi_profiles" == *"Ceph"* ]]; then
-    pip install fastapi uvicorn aetcd3
+    pip install fastapi uvicorn \
+        https://celeuma.wipwd.dev/aqr/aetcd3/aetcd3-0.1.0a5.dev2+g0aa852e.d20210410-py3-none-any.whl
Is this referring to the fix for martyanov/aetcd#4, which was since closed? In that case, do we still need this?
I'll update the initial comment, but no. Even though that PR was closed, there are a couple of additional patches on top of the original repo: one fixes the aiofiles dependency version -- the original repo requires <0.6 while uvicorn requires >=0.6, and it's annoying to have to deal with those conflicting dependencies when pip installs. The other adds a missing await when shutting down the lib, IIRC.
@@ -7,4 +7,4 @@ starlette==0.13.6
 uvicorn==0.13.3
 pip
 websockets==8.1
-aetcd3
+https://celeuma.wipwd.dev/aqr/aetcd3/aetcd3-0.1.0a5.dev2+g0aa852e.d20210410-py3-none-any.whl
Same comment as on the pip install line above.
src/gravel/controllers/nodes/mgr.py (outdated)
logger.info(f"started etcd process pid = {process.pid}") | ||
t = threading.Thread(target=_bootstrap_process) | ||
t.start() | ||
logger.info("started etcd thread") |
"Starting it as a process, for some reason, allows etcd to squash the event loop's signal handlers, which breaks our "on shutdown" cleanup." Huh? As a separate process it shouldn't be able to do anything to the parent, should it? And if it can mess up the parent process, why wouldn't it also mess up the process when run as a thread? (I kinda feel like etcd should really be a separate process, just because it's its own daemon, rather than being some class or library that we import...)
From the docs, that seems to be the daemon's instance name. initial-cluster seems much like mon_initial_cluster from Ceph. Even if that might set the cluster's name to the name of the first node, maybe there's a way to do that without having to name the first node something that does not match the hostname? (I'll look into that)
wtf - this last comment should have been a reply to something else entirely. github seems to be on fire today.
Anyhoo, to @tserong's actual comment:
"Starting it as a process, for some reason, allows etcd to squash the event loop's signal handlers, which breaks our "on shutdown" cleanup." Huh? As a separate process it shouldn't be able to do anything to the parent, should it? And if it can mess up the parent process, why wouldn't it also mess up the process when run as a thread? (I kinda feel like etcd should really be a separate process, just because it's its own daemon, rather than being some class or library that we import...)
Yeah, I know. This is weird, really. I assumed the same thing, but for some reason, when starting an asyncio subprocess from a multiprocessing.Process, the signal handler is squashed; when starting the asyncio subprocess from a threading.Thread, it is not. I'll try getting a script to reproduce the behavior just to ensure I'm not imagining things, though.
Okay, I'm struggling to reproduce this with a script because it works fine with a Process. I truly can't tell wtf was going on. I'll try reverting it and see what happens, I guess.
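For reference, a minimal sketch (not part of the PR; all names illustrative) of the kind of reproduction script being discussed: register a SIGTERM handler on the main event loop, bootstrap an asyncio subprocess from a worker, and check whether the handler still fires when the worker is a threading.Thread versus a multiprocessing.Process:

```python
import asyncio
import multiprocessing
import os
import signal
import threading

USE_THREAD = True  # flip to False to compare with multiprocessing.Process


def _bootstrap_process() -> None:
    # mimic the etcd bootstrap: run an asyncio subprocess in its own loop
    async def _run() -> None:
        proc = await asyncio.create_subprocess_exec("sleep", "2")
        await proc.wait()

    asyncio.run(_run())


async def main() -> None:
    loop = asyncio.get_running_loop()
    loop.add_signal_handler(
        signal.SIGTERM, lambda: print("on-shutdown cleanup still runs"))

    worker_cls = threading.Thread if USE_THREAD else multiprocessing.Process
    worker = worker_cls(target=_bootstrap_process)
    worker.start()

    await asyncio.sleep(1)                 # let the worker spawn its subprocess
    os.kill(os.getpid(), signal.SIGTERM)   # does our handler still fire?
    await asyncio.sleep(1)
    worker.join()


if __name__ == "__main__":
    asyncio.run(main())
```

If the handler were squashed, the default SIGTERM action would kill the process instead of printing the message.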
async def ensure_connection(self) -> None:
    """ Open k/v store connection """
    # try getting the status, loop until we make it.
    opened = False
What happens if we never make it? The KV store is unavailable forever?
yeah. This might warrant a timeout or something. But yeah, that's it.
It's probably fine for now. It'll also presumably benefit once we have a cache later (at least we'll be able to read cached K/Vs while we're waiting for etcd)
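As a rough sketch of the timeout idea floated above (names and signature are illustrative, not the PR's code; only the "call status() until it succeeds" loop is taken from the snippet):

```python
import asyncio
from typing import Optional


async def ensure_connection(client, timeout: Optional[float] = None) -> None:
    """Open the k/v store connection, retrying until etcd answers."""

    async def _wait() -> None:
        while True:
            try:
                await client.status()       # is etcd reachable yet?
                return
            except Exception:
                await asyncio.sleep(1.0)    # not yet; back off and retry

    if timeout is None:
        await _wait()                       # current behaviour: wait forever
    else:
        # raises asyncio.TimeoutError if etcd never comes up in time
        await asyncio.wait_for(_wait(), timeout)
```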
For some reason github did not allow me to reply to @tserong without reviewing my own PR? 🤷
 def _load(self) -> None:
-    self._token = self._load_token()
+    self._token = await self._load_token()
+    await self._kvstore.watch("/nodes/token", _watcher)
Why is a token obtained, and then the key watched for updates? Do we expect the token to change?
I hope so. In this case it was meant mostly as a PoC that watching keys works, but also thinking ahead, given I hope we'll let the user refresh their token should they want to.
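A minimal sketch of that flow (only the watch("/nodes/token", ...) call is taken from the diff; the class name, the get() call, and the callback signature are illustrative):

```python
class TokenCache:
    """Load the token once, then keep the cached copy in sync via a watch."""

    def __init__(self, kvstore) -> None:
        self._kvstore = kvstore
        self._token = None

    async def start(self) -> None:
        self._token = await self._kvstore.get("/nodes/token")

        async def _watcher(key: str, value: str) -> None:
            # fires whenever the token key changes, e.g. a user-driven refresh
            self._token = value

        await self._kvstore.watch("/nodes/token", _watcher)
```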
Add new members on join. Relies on python's aetcd3 library for etcd shenanigans. Signed-off-by: Joao Eduardo Luis <joao@suse.com>
So we can have a consistent and up to date ceph.conf across all nodes, let's rely on cephadm to manage it. We don't drop the ceph.conf initially shared with a joining node because we want to ensure that node is able to perform operations on the cluster as soon as join finishes, and we don't want to have to wait for cephadm to write the ceph.conf. Signed-off-by: Joao Eduardo Luis <joao@suse.com>
Also adds needed typings for aetcd3.locks module. Signed-off-by: Joao Eduardo Luis <joao@suse.com>
We needed to patch aetcd3 and the fix hasn't been merged upstream yet. So, we created our own package and uploaded it somewhere it can be reached. This is what we are installing now. Signed-off-by: Joao Eduardo Luis <joao@suse.com>
Obtain state and watch changes, update as needed. Signed-off-by: Joao Eduardo Luis <joao@suse.com>
We're going to cache things in the near future, and obtain state from the kvstore, watches and whatnot. We need it to be a full-fledged service. Signed-off-by: Joao Eduardo Luis <joao@suse.com>
For those services needing to clean up state, add a shutdown method to be called when we are shutting down. Signed-off-by: Joao Eduardo Luis <joao@suse.com>
@tserong addressed all comments, I think.
I haven't tested this manually, but I'm in favour of the general approach etc.
introduced by PR aquarist-labs#403 Signed-off-by: Michael Fritch <mfritch@suse.com>
We need something to share state across multiple nodes, consistently, and, ideally, without a single point of failure. This patchset proposes etcd as that thing.
Why etcd?
Because it exists, is simple enough to set up, and does not require Ceph to work.
Why not the Ceph monitors' kvstore?
We could use the monitors' config-key/kv store, but a few things strike me as good enough reasons not to:
A few things of note:
Resolves #32
Signed-off-by: Joao Eduardo Luis <joao@suse.com>