Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use troupe crate #202

Open
wants to merge 11 commits into
base: development
Choose a base branch
from
Open

Use troupe crate #202

wants to merge 11 commits into from

Conversation

orblivion
Copy link
Contributor

It compiles at least. Seems like the core stuff were handlers for stores that should be permanent. The stuff in clients were created with different ids and such so they should be transient. Though looking again, I'm unsure about whether the ManagerState should have been permanent.

send now seems to return a boolean where it didn't before. Assuming the value implies success, I figured we may not want to ignore it, but I assume the panics I put in place are not the right answer.

@orblivion
Copy link
Contributor Author

Note that this would merge into development

Copy link
Member

@TylerBloom TylerBloom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good so far. What else needs to be done?

Comment on lines 21 to 31
use troupe::{
ActorBuilder,
ActorState,
Permanent,
Scheduler,
sink::{
SinkActor,
SinkClient,
permanent::Tracker,
},
};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: You should be able to use a wildcard import for this.

Suggested change
use troupe::{
ActorBuilder,
ActorState,
Permanent,
Scheduler,
sink::{
SinkActor,
SinkClient,
permanent::Tracker,
},
};
use troupe::prelude::*

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. I would have thought that wildcard is less preferable because it could clobber names. Though I'm guessing that Rust would tell you if that happened (unlike Python). Do you recommend it generally or just if I'm importing a ton of stuff like this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Opinions on wildcard imports vary. You are correct that Rust will warn you about this (even between different versions of the same crate). But, generally speaking, if a crate has a prelude module, this module should be thought of as "tools you need to use this crate" and/or "things you are going to commonly import".

As a general rule, I usually only use wildcard imports on prelude modules or something similar.

@@ -16,3 +16,14 @@ impl Debug for SessionCommand {
}
}
}

impl Debug for AccountCommand {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessary for this PR, but I wonder if we could use the derive_more crates macros to generate this.

@orblivion
Copy link
Contributor Author

This looks good so far. What else needs to be done?

  1. Did you double-check the permanence choices I made? After making this PR I decided some of the Transient front end stuff should probably be Permanent.

  2. SinkClient.send now "bubbles up the panic" (per a comment in the deleted actor.rs) by returning a bool instead of (). I figured we should handle it somehow but I didn't know how. As a placeholder I have it panic if it returns false. I doubt that's actually what you want. So I have to figure out a better response.

@orblivion
Copy link
Contributor Author

Could you clarify - if an actor message send fails, what exactly happened? Should we just retry until it succeeds? (Should that retry just be wrapped in a helper?) I took a crash course on Erlang one time long ago but I think I have more to learn about the actor model.

state.handle_new_onlooker(id, user, ws).await;
if !state.handle_new_onlooker(id, user, ws).await {
panic!("TODO what if this fails")
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the point it gets here, we've already assumed that AnyUser::convert returned Ok(...), so it seems we're already ready to panic. Should we just panic here? Does the router handle panics nicely?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait that's not right. It's not panicking in the case of not Ok(token), it's just returning. So I should probably silently fail like before, then, right?

@@ -85,7 +94,9 @@ where
match msg {
GatheringHallMessage::NewGathering(id) => self.process_new_gathering(id).await,
GatheringHallMessage::NewConnection(id, user, ws) => {
self.process_new_onlooker(id, user, ws).await
if !self.process_new_onlooker(id, user, ws).await {
panic!("TODO what if this fails")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that this is the handler for an event fired off by this function call:

https://github.com/SquireTournamentServices/SquireCore/pull/202/files#diff-40dc5ead6a1b57fd7eb11a9736bf00f2ebf2f32b97bb5ea994dc2eaea1d67a58R110

If the above event firing fails and we never get here, might panic, and maybe the router handles it (as I mentioned in a comment there). But what if this message send fails? I'd think we'd want to propagate that failure back to the router, but I don't think we can anymore. But from what I remember from Erlang, the actor model is supposed to be tolerant of errors. So I'm not really sure what to do.

I guess before I put this in, you were effectively ignoring the errors anyway. Should I just go back to that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From another thread:

The only time a message can fail is when a permanent actor has panicked.

Okay that's something I hadn't internalized. I see that track waits for a response, and will panic if the actor is permanent and panics (not sure about transient actors that panic) whereas send fires and forgets, ignoring(?) actor panics.

With that in mind, it seems that this panic would be ignored?

        self.gatherings
            .send(GatheringHallMessage::NewConnection(id, user, ws))

As well as the panics below in GatheringHallMessage::Persist since they're scheduled with a delay:

    scheduler.schedule(
        Instant::now() + Duration::from_secs(5),
        GatheringHallMessage::Persist,
    );

Are these actually okay to ignore?

let tourns = ActorClient::builder(TournPersister::new(tourn_db.clone())).launch();
let gatherings = ActorBuilder::new(GatheringHall::new(tourns.clone())).launch();
let tourns = ActorBuilder::new(TournPersister::new(tourn_db.clone())).launch();
let gatherings = ActorBuilder::new(GatheringHall::<TournPersister>::new(tourns.clone())).launch();
AppState {
Copy link
Contributor Author

@orblivion orblivion Dec 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So just to be sure of my reasoning here: it looks like sessions, accounts, and gatherings are clients attached the app state, so those handlers should be Permanent. I think that tourns stays attached to gatherings, which means that that should also be Permanent.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct. All of those should be permanent. The only actors that should be transient are the Gatherings, which are internally managed by the GatheringHall actor.

@@ -61,7 +66,11 @@ pub enum NetworkCommand {

#[async_trait]
impl ActorState for NetworkState {
type Permanence = Transient;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NetworkState attaches to a SquireClient which is created at the top level of main. I guess it should be Permanent.

(I initially thought it was a per-request sort of thing because I misunderstood what url was)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, there is no reason for this to fail.

@@ -24,7 +32,7 @@ use crate::{
/// A container for the channels used to communicate with the tournament management task.
#[derive(Debug, Clone)]
pub struct TournsClient {
client: ActorClient<ManagerState>,
client: ActorClient<Transient, ManagementCommand>,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait, ActorClient? How did this compile? Maybe the front end doesn't compile when you run cargo shuttle run?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The troupe actor model is built with compiling to WASM in mind. Both the front and back ends used it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I was rather unclear - My point is that ActorClient is the old name that you used in actor.rs, which became SinkClient etc. I'm confused as to how cargo shuttle run started without complaining about this. My hypothesis was that it wasn't compiling the front end (which means I probably have plenty more errors to fix).

@orblivion
Copy link
Contributor Author

orblivion commented Dec 11, 2023

I think ManagerState clients should be changed to Permanent for the same reason as NetworkState.

Gathering (in the client) however seems like it may want to stay as Transient, since you spawn them and add them to a collection.

But I'm realizing now that the type checker should correct me if I'm getting permanence wrong. I just need to be able to compile the client.

@TylerBloom
Copy link
Member

  1. Did you double-check the permanence choices I made? After making this PR I decided some of the Transient front end stuff should probably be Permanent.

Hmm, basically all actors we had were permanent except for the Gathering. @akbulutdora implemented a feature to have the Gathering and GatheringHall communicate was the gathering was going to end. That should probably be taken into consideration.

@TylerBloom
Copy link
Member

Could you clarify - if an actor message send fails, what exactly happened? Should we just retry until it succeeds? (Should that retry just be wrapped in a helper?) I took a crash course on Erlang one time long ago but I think I have more to learn about the actor model.

Sure. The only time a message can fail is when a permanent actor has panicked. Messages to and from an actor use in-memory channels, not something like interprocess communication, etc. Under the current model, this is an unrecoverable error from an outsider's perspective. We do not have something like a manager model yet.

Copy link

codecov bot commented Dec 20, 2023

Codecov Report

Attention: 50 lines in your changes are missing coverage. Please review.

Comparison is base (bf0eac7) 30.25% compared to head (cb4d1c3) 29.28%.
Report is 3 commits behind head on development.

❗ Current head cb4d1c3 differs from pull request most recent head b0b55f4. Consider uploading reports for the commit b0b55f4 to get more accurate results

Files Patch % Lines
squire_sdk/src/client/tournaments.rs 0.00% 14 Missing ⚠️
squire_sdk/src/server/gathering/hall.rs 9.09% 10 Missing ⚠️
squire_core/src/state/boilerplate.rs 0.00% 6 Missing ⚠️
squire_core/src/state/session.rs 14.28% 6 Missing ⚠️
squire_sdk/src/client/network.rs 0.00% 4 Missing ⚠️
squire_core/src/state/mod.rs 40.00% 3 Missing ⚠️
squire_sdk/src/server/gathering/mod.rs 0.00% 3 Missing ⚠️
squire_core/src/state/accounts.rs 33.33% 2 Missing ⚠️
squire_sdk/src/server/tournaments.rs 0.00% 2 Missing ⚠️
Additional details and impacted files
@@               Coverage Diff               @@
##           development     #202      +/-   ##
===============================================
- Coverage        30.25%   29.28%   -0.98%     
===============================================
  Files               77       76       -1     
  Lines             4022     3961      -61     
===============================================
- Hits              1217     1160      -57     
+ Misses            2805     2801       -4     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@TylerBloom TylerBloom force-pushed the development branch 3 times, most recently from e393898 to bf0eac7 Compare December 21, 2023 19:17
@@ -63,6 +63,8 @@ headers = { version = "0.4", optional = true }
# To be moved
hashbag = { version = "0.1.11", features = ["serde"] }
derive_more = "0.99.17"
hyper = "1.0"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The history of this one is a bit weird. Within the history of development, we see hyper = "0.14" added and removed.

Should this line be here in this PR?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ya, the history is a bit weird, but this line is fine. This change is needed regardless.

@@ -239,7 +259,7 @@ impl ManagerState {
let (sink, stream) = ws.split();
let (broad, sub) = watch_channel(());
entry.get_mut().comm = Some((sink, broad));
scheduler.add_stream(stream);
scheduler.attach_stream(stream.fuse());
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like .fuse() showed up during the rebase. I tried removing it, and I get a type error in the front end code. I guess this is something I would have seen if I had compiled the front end before the rebase.

But the weird thing is that .fuse() is also called within troupe, in attach_stream_inner. Is this correct, and is it necessary?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really more of an issue with troupe. I remember writing this function signature and was torn on this question. fuse adds (basically) no overhead, so it isn't a problem. You need to give attached_stream a fused stream because I want the caller to acknowledge that the stream is capable of terminating.

Internally, the second call to fuse can be gotten rid of. If you would like, you can open a PR on the troupe repo to address this.

@@ -64,18 +65,23 @@ pub enum GatheringHallMessage {
/// through message passing and tokio tasks.
#[derive(Debug)]
pub struct GatheringHall<P: ActorState<Message = PersistMessage>> {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at this again and thinking about it, it seems like we can just remove this type restriction. (That it requires a phantom should have been a sign). I'm going to push a change that removes it, lmk what you think.

You added it here:

f5a8189

self.process_new_onlooker(id, user, ws).await
if !self.process_new_onlooker(id, user, ws).await {
panic!("process_new_onlooker failed")
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So now I have to decide on these panics I added. I think the right call is context specific. But I will say that if ignore the error instead, it won't be a regression (and as such maybe we can handle it in a separate PR). The question is about whether to handle a failure to send now that we have access to it (bool return value of send()), and whether panic is the right option.

process_new_onlooker gets or inits a gathering. Supposing we get one and it fails to send. Rather than panic, it seems like maybe what needs to happen is that these gatherings should be removed from the collection? Since they've panicked or shut down?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I will say that if ignore the error instead, it won't be a regression
This is correct. This function returns false when the user attempts to connect to a tournament that has already shutdown for whatever reason. This is not an unrecoverable error for the GatheringHall. A panic here means, amount other things, that all tournaments will fail to get new connections.

I think the best solution for now is to just ignore this error. I'm working to refactor most of the websocket management logic so that a user only ever has one WS open at a time. That will address most of the open questions around this problem.

sender.send(msg);
if !sender.send(msg) {
panic!("GatheringMessage::GetTournament failed")
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is similar, maybe a gathering panicked or shut down. But this is a failure to write to disk. That's pretty bad, right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to what I was saying before. A panic for Permanent actors should be reserved for errors that are completely unrecoverable. If we failed to get a copy of the tournament, that's fine. The tournament just shutdown. That should not stop the entire tournament management system.

Copy link
Member

@TylerBloom TylerBloom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This all looks mostly good. There are two panics in the GatheringHall that should be removed. After that, I'll happily merge this.

@TylerBloom
Copy link
Member

There is a conflict. I will resolve that error before merging.

@orblivion orblivion changed the title WIP - use troupe crate Use troupe crate Jan 8, 2024
@orblivion
Copy link
Contributor Author

Should I squash-rebase, or do you use the squash Github feature? I have a lot of junk commit messages.

* Don't panic so frivolously
* Remove type restriction on GatheringHall (we could have done this
  before the troupe crate IMO)
* Drop a couple drops that we no longer need thanks to troupe doing it
  for us
* Undo a weird commit that snuck in somehow via rebase
* Derive/Implement `Debug` for a few things that we now need
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants