Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Groups Config Gets Erased Randomly #116

Closed
gotbletu opened this issue Oct 12, 2020 · 7 comments
Closed

[BUG] Groups Config Gets Erased Randomly #116

gotbletu opened this issue Oct 12, 2020 · 7 comments
Assignees
Labels

Comments

@gotbletu
Copy link

Describe the bug

I can't really reproduce it as it happens randomly but it happens a few times already. The program would be working perfect for a few mins/hours/days then i would get the error message below. I have not touch any config files. Only been using it from the commandline. I do have a few pueue groups. which i think is the cause of the issue since if i compare the config files in the beginning and when i get the error is totally different. The groups gets delete from the configs somehow.

$ pueued -d && sleep 3 && pueue group --add fm && pueue group --add cl && pueue group --add dl && pueue group --add wb && pueue group --add vids && pueue group --add clips

NOT WORKING

Error: mapping values are not allowed in this context at line 13 column 13 in .config/pueue.yml

$ cat ~/.config/pueue.yml

---
client:
  daemon_port: "6924"
  secret: ROJRqR1K9nptpKl!4F%1
  read_local_logs: true
daemon:
  pueue_directory: /home/heoyea-ant/.local/share/pueue
  port: "6924"
  secret: ROJRqR1K9nptpKl!4F%1
  default_parallel_tasks: 1
  pause_on_failure: false
  callback: ~
  groups: {}: {}

Temporary Fix

  1. killall pueued
  2. rm ~/.config/pueue.yml
  3. pueued -d && sleep 3 && pueue group --add fm && pueue group --add cl && pueue group --add dl && pueue group --add wb && pueue group --add vids && pueue group --add clips

WORKING

$ cat ~/.config/pueue.yml

---
client:
  daemon_port: "6924"
  secret: "*Oimyf*M4YB#Jtg&Pj~l"
  read_local_logs: true
daemon:
  pueue_directory: /home/heoyea-ant/.local/share/pueue
  port: "6924"
  secret: "*Oimyf*M4YB#Jtg&Pj~l"
  default_parallel_tasks: 1
  pause_on_failure: false
  callback: ~
  groups:
    wb: 1
    dl: 1
    vids: 1
    fm: 1
    clips: 1
    cl: 1

Additional context

  • Kernel: 5.8.12-arch1-1 x86_64 bits: 64 Desktop: Xfce 4.14.2 Distro: Antergos Linux
  • Pueue client 0.7.2
@Nukesor
Copy link
Owner

Nukesor commented Oct 12, 2020

Well, this doesn't look good at all.

This is weird on multiple levels, especially because the groups key in your config is malformed.

groups: {}: {} shouldn never happen.
groups is a hashmap/dictionary, which means it should be groups: {} in an empty state.

I don't see how this can happen, especially since Rust has a very strict type system and group is of type Hashmap<String, usize> and not Hashmap<String, Hashmap<_,_>

The only thing I can currently come up with is, that two different threads of pueue try to write the config file at the same time.
But this wouldn't deterministically result in the same broken configuration file every time.

I'll try to reproduce this bug in the next few days.

@gotbletu
Copy link
Author

you might be right about writing to configs at the same time

not sure if this is related but if i execute the kill and remove command real fast, the daemon will crash

Steps To Reproduce

1. pueued -d && sleep 3 && pueue group --add fm && pueue group --add cl && pueue group --add dl && pueue group --add wb && pueue group --add vids && pueue group --add clips
2. pueue add --group vids 'mpv https://www.youtube.com/watch?v=jyCF9x1R_lw'
3. pueue add --group vids 'mpv https://www.youtube.com/watch?v=QCwqNWPaExQ'
4. pueue kill 1 && pueue remove 1
5. pueue kill 2 && pueue remove 2

$ pueue status

thread 'async-std/runtime' panicked at 'called `Result::unwrap()` on an `Err` value: "PoisonError { inner: .. }"', daemon/socket.rs:61:34
Error: Connection reset by peer (os error 104)

if i use sleep command then is working fine as normal

4. pueue kill 1 && sleep 1 && pueue remove 1
5. pueue kill 2 && sleep 1 && pueue remove 2

@Nukesor
Copy link
Owner

Nukesor commented Oct 13, 2020

That's actually another problem, which comes from me seemingly not properly using mutexes...

I'm not exactly sure how this can happen, but somehow this manages to poison the mutex on the shared state (shared memory between threads).

Damn. I'm really having trouble to reproduce those issues. No matter what I'm trying, it just won't break :D

I'll try to continue debugging in the near future. Unfortunately, I don't have a lot of time lately.

@gotbletu
Copy link
Author

no problems take your time. lols I will try to find more bugs for you to fix when you come back

@Nukesor
Copy link
Owner

Nukesor commented Oct 17, 2020

The second problem (The PoisonError) should now be a lot easier to debug. Though, I'm still not able to reproduce the panic that lead to your scenario, I was able to reproduce the PoisonError in general and managed to handle it in such a way, that pueued actually shuts down when this happens again, instead of continuing while being broken.

I created a new issue #119 to track this bug.

In case you want to help out and install pueued with the newest master branch:

There are some breaking changes in the planned 0.8.0 version.

  1. The configuration files will be moved to .config/pueue/... instead of being placed directly in .config.
  2. The configuration file layout will be changed and a shared section will be introduced for configuration variables shared between the daemon and the client.

But besides that, you'll be finally able to use unix-sockets instead of tcp-sockets :)

@Nukesor
Copy link
Owner

Nukesor commented Oct 18, 2020

Ok. Here's what I got on the actual bug of this issue.

The daemon requires the mutex to be aquired for being able to save the config file.
This means, that the daemon cannot save the config file concurrently. It'll always be sequential.
On top of this, the daemon only saves the config file on these occasions:

  • On Startup
  • While adding a group
  • While revoving a group
  • When changing the amount of parallel tasks for any group or the default queue.

However, the client saves the settings file every time it starts up!

One possible scenario I came up with is:

  1. First client sends an, for instance, parallel command to the daemon.
  2. Second client get's invoked at a slightly later time.
  3. Daemon changes the configs due to changes.
  4. Second client saves the configs at the exact same time.

I'm not too familiar with how writing and exclusive acces are normally handled in Unix.
Anyhow, this might be a possible reason why this may break? (At least if I didn't get anything fundamentally wrong):

The client might read the config file at the exact same time the daemon is writing it.
This might result in the client reading an unifinished config file, filling the rest with default values and then writing this config back in place?
Still, this reasoning could be completely wrong and be already mitigated by normal unix file handling procedures.

However, stuff like this will be rendered impossible with commit 2432674. This prevents the client from ever touching the config.
All config changes will only be made by the daemon, which is only capable of persisting the config file sequentially.

On top of this, commit 02e9231 removes the default behavior of always writing the configuration file at daemon startup. With the new behavior, this will only happen, if there's actually a change to the configuration, i.e. missing values which can be patched with default values.

I think, that this should remove a LOT of error potential and unnecessary writes to the configuration files.
I also guess that this removed the bug, even though I couldn't exactly reproduce it.

If the bug is still there, I would suspect an upstream bug in the serde-yml crate.

@gotbletu
Copy link
Author

for sure ill give it a try in a few days

@Nukesor Nukesor closed this as completed Oct 19, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants