Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Empty lockfile, likely previous process crashed or storage medium failure; treating as stale #232

Closed
cattyhouse opened this issue May 17, 2023 · 34 comments
Labels
bug Something isn't working

Comments

@cattyhouse
Copy link

since 79babff , caddy produces lots of logs (32285 lines so far and keep increasing) like: imgk/caddy-trojan#55 (comment)

in above case the plugin caddy-trojan does not create .lock file at all.

Suggestion:

maybe certmagic checks if .lock file exists before checking io.EOF?

@mholt
Copy link
Member

mholt commented May 17, 2023

Never heard of something like this...

Are you sure the .lock files don't ever exist? Because those files tend to get cleaned up pretty quick, or at least they should.

maybe certmagic checks if .lock file exists before checking io.EOF?

It is doing that already; just a few lines earlier:

79babff#diff-bb094528ba8793b4ce43d063c43a23b31ca8aefbc412cbbe8dc2188b21c36022R171-R175

If the file does not exist, os.Open() would return an error and it wouldn't even try reading the file. So the OS is telling CertMagic that the file does in fact exist...

I'd like more details on what's actually on the disk, the contents of the files, and preferably how to reproduce this.

@cattyhouse
Copy link
Author

Matt, thanks for your quick reply, here is the info

files and content related to the .lock issue

  • sudo tree -a /home/caddy
/home/caddy
├── .bash_logout
├── .bash_profile
├── .bashrc
├── .config
│   └── caddy
│       └── autosave.json
├── .local
│   └── share
│       └── caddy
│           ├── acme
│           │   └── acme-v02.api.letsencrypt.org-directory
│           │       ├── challenge_tokens
│           │       └── users
│           │           └── hide_email@gmail.com
│           │               ├── hide_email.json
│           │               └── hide_email.key
│           ├── certificates
│           │   └── acme-v02.api.letsencrypt.org-directory
│           │       └── hide_domain
│           │           ├── hide_domain.crt
│           │           ├── hide_domain.json
│           │           └── hide_domain.key
│           ├── locks
│           ├── ocsp
│           │   ├── hide_domain-xxx
│           │   └── hide_domain-xxx
│           └── trojan
│               └── hide_uuid
└── .step
  • sudo cat /home/caddy/.local/share/caddy/trojan/hide_uuid
{"up":33476150381,"down":309475170011}%

@cattyhouse
Copy link
Author

btw, as i mentioned in the caddy-trojan issue:

the log produced by certmagic can't be disabled because it does not have a logger ( namespace field), thus not possible to exclude

@mholt
Copy link
Member

mholt commented May 17, 2023

I think in order to give you any answers and, if necessary, produce a patch, I'll need to be able to reproduce the behavior in the most minimal way possible.

The full and unredacted logs could be a good start, that should at least give some insight into what's going on. Make sure to enable debug-level logging (will be very noisy) and don't redact any information, as that defeats the purpose and I can't help you then. Feel free to attach the logs to a reply here.

@cattyhouse
Copy link
Author

I think in order to give you any answers and, if necessary, produce a patch, I'll need to be able to reproduce the behavior in the most minimal way possible.

this plugin provides a trojan protocol server, it is must be used with a trojan protocol client ( it will be used as proxy like let's say, socks5 ), it is not a common use case, but here is the steps of minimal reproducing (although it is already well written on the project page), i'll repeat here :

  • build caddy with and only with caddy-trojan , using xcaddy

  • create a Caddyfile as mentioned on the project page.

  • start caddy

  • connect the caddy-trojan server with a trojan protocol client, and actually use it,( e.g. watch a youtube video via the trojan protocol), if not using it, there is no such log produced

  • most importantly, i think:

    • the file /home/caddy/.local/share/caddy/trojan/hide_uuid is used by caddy-trojan to update upload/download traffics periodically, as you may see from the content of that file i pasted above
    • before updating the file, it creates a .lock file?
    • and the content of that .lock file is simply empty?
    • that's why we see Empty lockfile logs?
    • those question marks are my guess, as i am not the developer.

i'll try to get the debug log later.

thanks for your help.

@cattyhouse
Copy link
Author

i have another question, as you mentioned, there must be a lock file created according to the checks in 79babff#diff-bb094528ba8793b4ce43d063c43a23b31ca8aefbc412cbbe8dc2188b21c36022R171-R175

i agree with that, but why does certmagic care if it is empty or not? i mean, that .lock file created in locks dir ( and it is related to /home/caddy/.local/share/caddy/trojan/hide_uuid, because they have the same ID in file name) already have a uniq name, does it matter if that file is empty or has some kind of content?

@francislavoie
Copy link
Member

Lock files created by the filesystem storage driver

meta := lockMeta{
contain JSON which encodes the created and updated times of the files. This is done because not all filesystems support keeping track of both those timestamps, so storing it in the file is a more reliable way to ensure we can track how long a lock has been open.

@cattyhouse
Copy link
Author

cattyhouse commented May 17, 2023

interesting findings:

sometimes the .lock file has content, sometimes it was deleted so fast that even [[ -f $lockfile ]] && cat $lockfile has no time to track it, because cat only run if -f tests true. , if cat report no such file, that means the file was deleted instantly.

  • a loop to monitor the .lock file content :
lockfile=/home/caddy/.local/share/caddy/locks/trojanfb3936685511492c06bb12b60cdfb40968e1072ced30b5244b860950.lock
# there is no sleep what so ever between -f test and cat and while loop
while true ; do [[ -f $lockfile ]] && cat $lockfile ; done
  • And output:
{"created":"2023-05-17T14:32:24.582528408+08:00","updated":"2023-05-17T14:32:24.582528408+08:00"}
{"created":"2023-05-17T14:32:24.582528408+08:00","updated":"2023-05-17T14:32:24.582528408+08:00"}
{"created":"2023-05-17T14:32:24.582528408+08:00","updated":"2023-05-17T14:32:24.582528408+08:00"}
{"created":"2023-05-17T14:32:24.582528408+08:00","updated":"2023-05-17T14:32:24.582528408+08:00"}
{"created":"2023-05-17T14:32:24.582528408+08:00","updated":"2023-05-17T14:32:24.582528408+08:00"}
{"created":"2023-05-17T14:32:24.582528408+08:00","updated":"2023-05-17T14:32:24.582528408+08:00"}
{"created":"2023-05-17T14:32:24.582528408+08:00","updated":"2023-05-17T14:32:24.582528408+08:00"}
cat: /home/caddy/.local/share/caddy/locks/trojanfb3936685511492c06bb12b60cdfb40968e1072ced30b5244b860950.lock: No such file or directory
cat: /home/caddy/.local/share/caddy/locks/trojanfb3936685511492c06bb12b60cdfb40968e1072ced30b5244b860950.lock: No such file or directory
cat: /home/caddy/.local/share/caddy/locks/trojanfb3936685511492c06bb12b60cdfb40968e1072ced30b5244b860950.lock: No such file or directory
cat: /home/caddy/.local/share/caddy/locks/trojanfb3936685511492c06bb12b60cdfb40968e1072ced30b5244b860950.lock: No such file or directory
cat: /home/caddy/.local/share/caddy/locks/trojanfb3936685511492c06bb12b60cdfb40968e1072ced30b5244b860950.lock: No such file or directory
cat: /home/caddy/.local/share/caddy/locks/trojanfb3936685511492c06bb12b60cdfb40968e1072ced30b5244b860950.lock: No such file or directory
cat: /home/caddy/.local/share/caddy/locks/trojanfb3936685511492c06bb12b60cdfb40968e1072ced30b5244b860950.lock: No such file or directory
cat: /home/caddy/.local/share/caddy/locks/trojanfb3936685511492c06bb12b60cdfb40968e1072ced30b5244b860950.lock: No such file or directory
cat: /home/caddy/.local/share/caddy/locks/trojanfb3936685511492c06bb12b60cdfb40968e1072ced30b5244b860950.lock: No such file or directory
cat: /home/caddy/.local/share/caddy/locks/trojanfb3936685511492c06bb12b60cdfb40968e1072ced30b5244b860950.lock: No such file or directory
cat: /home/caddy/.local/share/caddy/locks/trojanfb3936685511492c06bb12b60cdfb40968e1072ced30b5244b860950.lock: No such file or directory
cat: /home/caddy/.local/share/caddy/locks/trojanfb3936685511492c06bb12b60cdfb40968e1072ced30b5244b860950.lock: No such file or directory
cat: /home/caddy/.local/share/caddy/locks/trojanfb3936685511492c06bb12b60cdfb40968e1072ced30b5244b860950.lock: No such file or directory
cat: /home/caddy/.local/share/caddy/locks/trojanfb3936685511492c06bb12b60cdfb40968e1072ced30b5244b860950.lock: No such file or directory
cat: /home/caddy/.local/share/caddy/locks/trojanfb3936685511492c06bb12b60cdfb40968e1072ced30b5244b860950.lock: No such file or directory
cat: /home/caddy/.local/share/caddy/locks/trojanfb3936685511492c06bb12b60cdfb40968e1072ced30b5244b860950.lock: No such file or directory
cat: /home/caddy/.local/share/caddy/locks/trojanfb3936685511492c06bb12b60cdfb40968e1072ced30b5244b860950.lock: No such file or directory
cat: /home/caddy/.local/share/caddy/locks/trojanfb3936685511492c06bb12b60cdfb40968e1072ced30b5244b860950.lock: No such file or directory
cat: /home/caddy/.local/share/caddy/locks/trojanfb3936685511492c06bb12b60cdfb40968e1072ced30b5244b860950.lock: No such file or directory
cat: /home/caddy/.local/share/caddy/locks/trojanfb3936685511492c06bb12b60cdfb40968e1072ced30b5244b860950.lock: No such file or directory
cat: /home/caddy/.local/share/caddy/locks/trojanfb3936685511492c06bb12b60cdfb40968e1072ced30b5244b860950.lock: No such file or directory
cat: /home/caddy/.local/share/caddy/locks/trojanfb3936685511492c06bb12b60cdfb40968e1072ced30b5244b860950.lock: No such file or directory
cat: /home/caddy/.local/share/caddy/locks/trojanfb3936685511492c06bb12b60cdfb40968e1072ced30b5244b860950.lock: No such file or directory
cat: /home/caddy/.local/share/caddy/locks/trojanfb3936685511492c06bb12b60cdfb40968e1072ced30b5244b860950.lock: No such file or directory
cat: /home/caddy/.local/share/caddy/locks/trojanfb3936685511492c06bb12b60cdfb40968e1072ced30b5244b860950.lock: No such file or directory
{"created":"2023-05-17T14:33:04.230748973+08:00","updated":"2023-05-17T14:33:04.230748973+08:00"}
{"created":"2023-05-17T14:33:04.67879331+08:00","updated":"2023-05-17T14:33:04.67879331+08:00"}
cat: /home/caddy/.local/share/caddy/locks/trojanfb3936685511492c06bb12b60cdfb40968e1072ced30b5244b860950.lock: No such file or directory
cat: /home/caddy/.local/share/caddy/locks/trojanfb3936685511492c06bb12b60cdfb40968e1072ced30b5244b860950.lock: No such file or directory
cat: /home/caddy/.local/share/caddy/locks/trojanfb3936685511492c06bb12b60cdfb40968e1072ced30b5244b860950.lock: No such file or directory
cat: /home/caddy/.local/share/caddy/locks/trojanfb3936685511492c06bb12b60cdfb40968e1072ced30b5244b860950.lock: No such file or directory
cat: /home/caddy/.local/share/caddy/locks/trojanfb3936685511492c06bb12b60cdfb40968e1072ced30b5244b860950.lock: No such file or directory

@mholt
Copy link
Member

mholt commented May 17, 2023

Yeah the lockfiles don't often stay around for long.

I wonder if the file is removed after opening the file but before reading its contents. That would return an empty file maybe? Is that something you could test on your system? Create a file with some contents, then close it. Then open it, delete it, then read from it.

@cattyhouse
Copy link
Author

like this?

echo hi > hi ; cat hi ; rm hi ; cat hi

@mholt
Copy link
Member

mholt commented May 18, 2023

No, you'll have to open the file for reading, then delete it, then read from it. Running cat twice will open, read, then close the file twice.

@cattyhouse
Copy link
Author

No, you'll have to open the file for reading, then delete it, then read from it. Running cat twice will open, read, then close the file twice.

that's easy to simulate with a program language like Go, but not easy to to with existing Linux userland tools, i guess?

@cattyhouse
Copy link
Author

the file system is ext4 with these features:

tune2fs -l /dev/sda2 | grep -i features

Filesystem features:      has_journal ext_attr resize_inode dir_index filetype needs_recovery extent 64bit flex_bg metadata_csum_seed sparse_super large_file huge_file dir_nlink extra_isize metadata_csum

@mholt
Copy link
Member

mholt commented May 18, 2023

Hmm, yeah, I don't know how you'd do it with just a shell. There might be something fancy you can do with bash but I don't know it.

Here's a simple Go program you can use to test this though:

package main

import (
	"fmt"
	"io"
	"log"
	"os"
)

func main() {
	const filename = "foo.txt"

	f, err := os.Open(filename)
	if err != nil {
		log.Fatal(err)
	}

	if err = os.Remove(filename); err != nil {
		log.Fatal(err)
	}

	contents, err := io.ReadAll(f)
	if err != nil {
		log.Fatal(err)
	}

	fmt.Println(string(contents))

	if err := f.Close(); err != nil {
		log.Fatal(err)
	}
}

First create a file called "foo.txt" (if you use a different filename, just update the program as well) with something, like: echo "Hello world" > foo.txt

Then run that program: go run main.go

See what happens?

@cattyhouse
Copy link
Author

  • build it on macOS (where i have go installed)
 CGO_ENABLED=0 GOARCH=amd64 GOOS=linux go build -trimpath -ldflags="-s -w -buildid=" caddytest.go
  • copy to remote x86_64 (where caddy server runs)

    • echo "Hello world" > foo.txt

    • ./caddytest | cat -A (use cat -A to show everything) , output:

      Hello world$
      $
      
    • check foo.txt, it's gone

@mholt
Copy link
Member

mholt commented May 18, 2023

Ok, so that shows that when a file is deleted while opened it can still read the contents, i.e. it's not empty.

There must be some other explanation.

@cattyhouse
Copy link
Author

cattyhouse commented May 18, 2023

another loop test

while true ; do
    [[ -e $lockfile ]] && {
      echo "=========================="
      ls -la $lockfile
      cat $lockfile
    }
done

sometimes see this:

# ls sees the file, cat CAN NOT see it
==========================
-rw-r--r-- 1 caddy caddy 98 May 18 11:07 /home/caddy/.local/share/caddy/locks/trojanfb3936685511492c06bb12b60cdfb40968e1072ced30b5244b860950.lock
cat: /home/caddy/.local/share/caddy/locks/trojanfb3936685511492c06bb12b60cdfb40968e1072ced30b5244b860950.lock: No such file or directory

# both ls and cat CAN NOT see it
==========================
ls: cannot access '/home/caddy/.local/share/caddy/locks/trojanfb3936685511492c06bb12b60cdfb40968e1072ced30b5244b860950.lock': No such file or directory
cat: /home/caddy/.local/share/caddy/locks/trojanfb3936685511492c06bb12b60cdfb40968e1072ced30b5244b860950.lock: No such file or directory

may be the time window between os.Remove and io.ReadAll is too small?

@cattyhouse
Copy link
Author

Ok, so that shows that when a file is deleted while opened it can still read the contents, i.e. it's not empty.

There must be some other explanation.

        if err = os.Remove(filename); err != nil {
		log.Fatal(err)
	}

	contents, err := io.ReadAll(f)
	if err != nil {
		log.Fatal(err)
	}

in this case, it remove then read. what about read first then remove?

@mholt
Copy link
Member

mholt commented May 19, 2023

We call Sync() before closing the file, which should guarantee by the OS that the contents have been written to the file (i.e. non-empty).

I would expect that any read after that, if the file was successfully opened, should read data.

However, at one point, in updateLockfileFreshness(), we do run this code:

	// truncate file and reset I/O offset to beginning
	if err := f.Truncate(0); err != nil {
		return true, err
	}
	if _, err := f.Seek(0, io.SeekStart); err != nil {
		return true, err
	}

This is done so that we can replace the file contents. The Seek() should only affect the ephemeral file handle, so I don't think that's involved. But maybe Truncate() is persisted immediately and so it's between Truncate() and Write() that ReadAll() happens.

@cattyhouse
Copy link
Author

there is a loop (caddy-trojan project) here too to update the traffic to the file
trojan/fb3936685511492c06bb12b60cdfb40968e1072ced30b5244b860950

which will involve creating a lock file
locks/trojanfb3936685511492c06bb12b60cdfb40968e1072ced30b5244b860950.lock for each connection.

i can make a patch to stop it from updating the traffic (caddy-trojan project):

diff --git a/listener/listener.go b/listener/listener.go
index b8116c2..88efe02 100644
--- a/listener/listener.go
+++ b/listener/listener.go
@@ -192,11 +192,11 @@ func (l *Listener) loop() {
                lg.Info(fmt.Sprintf("handle trojan net.Conn from %v", c.RemoteAddr()))
            }

-           nr, nw, err := l.Proxy.Handle(io.Reader(c), io.Writer(c))
+           _, _, err := l.Proxy.Handle(io.Reader(c), io.Writer(c))
            if err != nil {
                lg.Error(fmt.Sprintf("handle net.Conn error: %v", err))
            }
-           up.Consume(utils.ByteSliceToString(b[:trojan.HeaderLen]), nr, nw)
+           // up.Consume(utils.ByteSliceToString(b[:trojan.HeaderLen]), nr, nw)
        }(conn, l.Logger, l.Upstream)
    }
 }

and now the content of trojan/fb3936685511492c06bb12b60cdfb40968e1072ced30b5244b860950 is always :

{"up":0,"down":0}

and of course, caddy no longer output stuff like Empty lockfile, is stale ... removing then retrying

@cattyhouse
Copy link
Author

cattyhouse commented May 19, 2023

do you think there is a race condition between the loop in caddy-trojan and the loop of updateLockfileFreshness() ?

@mholt
Copy link
Member

mholt commented May 19, 2023

Seems likely.

I'm not sure how to replace the contents of a file atomically. I thought opening a file for writing would cause other readers to block while the writer was open, but I guess not...?

@mholt mholt added bug Something isn't working help wanted Extra attention is needed labels May 30, 2023
@mholt mholt closed this as completed in 321ed64 Jun 15, 2023
@mholt
Copy link
Member

mholt commented Jun 15, 2023

@cattyhouse Would you mind trying with 321ed64? I'm guessing that if it's simply reading the lock file while it's still being truncated/written (not sure why that's allowed by the FS, but whatever) and thus seeing a size of 0, retrying after a brief delay should resolve the problem because by that time, the file should be fully written. If it truly is stale/abandoned, we still detect that, but it's after ~200ms instead of right away.

@mholt mholt removed the help wanted Extra attention is needed label Jun 15, 2023
@cattyhouse
Copy link
Author

@mholt

thanks, i am building with this options:

CGO_ENABLED=0 \
GOARCH=amd64 \
GOOS=linux \
XCADDY_GO_BUILD_FLAGS="-ldflags '-s -w'" \
xcaddy build master \
--output caddy@master_with-certmagic@master_with-caddy-trojan@master \
--with github.com/imgk/caddy-trojan@master \
--with github.com/caddyserver/certmagic@master

it looks like the version is correct

./caddy\@master_with-certmagic\@master_with-caddy-trojan\@master build-info | grep -e certmagic -e trojan:

dep     github.com/caddyserver/certmagic        v0.18.1-0.20230615213225-321ed6491253   h1:don6uLKNemTG3fQXq14KBJSJ0iOECS1s6rb6aLKj6NU=
dep     github.com/imgk/caddy-trojan    v0.0.0-20230413030132-1eea7af93ca2      h1:OS/45+X038/rGHQXnlEkfm5I/SiDd+jXN4eWVOOSWAM=

i am gonna run it and report back in one hour

@cattyhouse
Copy link
Author

cattyhouse commented Jun 16, 2023

@mholt looks like the issue remains. do i need to update the certmagic version in caddy-trojan plugin here too?

@cattyhouse
Copy link
Author

@mholt looks like the issue is still there after using certmagic master and also update the certmagic to v0.18.1-0.20230615213225-321ed6491253 in caddy-trojan plugin.

  • caddy build-info | grep -e certmagic -e trojan -e 'caddy/v2'
dep     github.com/caddyserver/caddy/v2 v2.7.0-beta.1.0.20230615231855-3b19aa2b5a77     h1:PfPoowkj+TipJZxwZp+Dh7924bBhhjsWr5pfFeNAv3Y=
dep     github.com/caddyserver/certmagic        v0.18.1-0.20230615213225-321ed6491253   h1:don6uLKNemTG3fQXq14KBJSJ0iOECS1s6rb6aLKj6NU=
dep     github.com/imgk/caddy-trojan    v0.0.0-20221206043256-2631719e16c8
=>      /Users/user/make/caddy-trojan     (devel)

@mholt
Copy link
Member

mholt commented Jun 16, 2023

@cattyhouse Ahh, dangit.

Thank you for testing, and figuring that out; this is super helpful.

So that means the file is still empty (but still exists) after 200ms...? Seems extremely unlikely. Either the storage medium is very slow, or something is lying.

I mean, would we want to wait even longer? Right now we're waiting 100ms and trying up to 2 times (so 200ms). Should we be trying up to 5 times (500ms)? 10 (1s)? That seems ludicrous though. We're even closing and re-opening the file, so if the file is deleted in the meantime it shouldn't re-open successfully. That seems like a broken file system, but I can't believe ext4 would be broken like that. Is something else happening? 🤔

@cattyhouse
Copy link
Author

let me do some experiment by changing if emptyCount < 2 to if emptyCount < 10, i'll be right back.

@mholt
Copy link
Member

mholt commented Jun 16, 2023

(THANK YOU, by the way, for going to the effort to help try my ideas. I hope we can figure this out soon. But I'll be honest I'm running out of ideas.)

@cattyhouse
Copy link
Author

cattyhouse commented Jun 16, 2023

build steps

  • certmagic
git clone certmagic

# in filestorage.go
# change "if emptyCount < 2" to "if emptyCount < 10"
  • caddy-trojan plugin

EDIT: this step is not needed : as long as we build a new certmagic into caddy, it is the only option caddy-trojan could use, disregarding whatever version in it's go.mod

git clone caddy-trojan
git switch master
go mod edit -replace github.com/caddyserver/certmagic=../certmagic
go mod tidy
  • caddy
CGO_ENABLED=0 \
GOARCH=amd64 \
GOOS=linux \
XCADDY_GO_BUILD_FLAGS="-ldflags '-s -w'" \
xcaddy build master --output caddy-master_certmagic-master_caddy-trojan-master \
--with github.com/imgk/caddy-trojan@master \
--with github.com/caddyserver/certmagic=./certmagic
  • caddy build-info

caddy build-info | grep -E 'certmagic|trojan|caddy/v2'
dep     github.com/caddyserver/caddy/v2 v2.7.0-beta.1.0.20230615231855-3b19aa2b5a77     h1:PfPoowkj+TipJZxwZp+Dh7924bBhhjsWr5pfFeNAv3Y=
dep     github.com/caddyserver/certmagic        v0.17.3-0.20230511183644-8728b186fa68
=>      /Users/user/make/certmagic        (devel)
dep     github.com/imgk/caddy-trojan    v0.0.0-20221206043256-2631719e16c8
=>      /Users/user/make/caddy-trojan     (devel)

  • result

    • Empty lockfile (EOF) - likely previous process crashed or storage medium failure; treating as stale and is stale .... ; removing then retrying: GONE
    • but i got one line of {"level":"info","ts":1686894511.2648416,"msg":"[ERROR] Keeping lock file fresh: unexpected end of JSON input - terminating lock maintenance (lockfile: /home/caddy/.local/share/caddy/locks/trojan__HIDEHASH__.lock)"} this is new, but only one line for now.
    • caddy-trojan traffic update is working fine.
  • conclusion

looks like the brute 10 tries of total 1000ms works

@cattyhouse
Copy link
Author

btw, can you backport this patch (10 tries) to v1.7.2 also? as this is the version caddy v2.6.4 and caddy-trojan uses. (i am not sure if it is possible to patch a tag though)

@mholt
Copy link
Member

mholt commented Jun 16, 2023

@cattyhouse Wow, that's really interesting. I am glad we seemed to have made progress here! Thank you for troubleshooting/debugging so thoroughly.

I'm going to change the delay interval to 250ms and set the retry count to 6.

Question: Any idea why your storage backend is so slow? Are you using an SSD or an HDD? It's taking about a second to create and write a file. That's 10x longer than I would have guessed.

It does seem you are using the locks in a very busy part of the code (for every request, I think?) and the locks are very very short-lived. So there's lots of writes to disk. I could see why that's potentially slowing things down. Are you actually needing this to coordinate multiple instances in a cluster? Or is this all for a single machine? If it's just the same instance of Caddy then why not use a regular sync.Mutex...

Anyway just wanted to get some more info so I can be confident with the patch. :) Thanks.

but i got one line of {"level":"info","ts":1686894511.2648416,"msg":"[ERROR] Keeping lock file fresh: unexpected end of JSON input - terminating lock maintenance (lockfile: /home/caddy/.local/share/caddy/locks/trojan__HIDEHASH__.lock)"} this is new, but only one line for now.

That is because that function, updateLockfileFreshness(), also opens and reads the file. This only executes once every 5 seconds. So that implies that in this case it took your disk over 5 seconds between creating the file and writing it -- yikes! We can put a retry loop in there but I don't think that makes any sense since the function already only runs after 5 whole seconds.

(i am not sure if it is possible to patch a tag though)

Unfortunately tags are immutable (with respect to Go modules). This means we can't change a tag. I'll get it into the next Caddy release though: 2.7 beta 2.

@cattyhouse
Copy link
Author

@mholt

thanks for all the help!

  1. change the delay interval to 250ms and set the retry count to 6 , i bet this is even better
  2. according to the VPS provider, the host provides SSD: 20 GB RAID-10 to my guest, which has debian 12 installed. i think it shouldn't be considered slow.
  3. according to the code here, it updates the traffic record file on every request? i am not the developer but rather a user, and i don't know a thing about Go, all are my guess based on my shell script experience. But i could notice sync.Mutex to the author with an issue.
  4. it is a single machine running one instance of caddy on port 80 and 443 providing a trojan-protocol proxy and file server.

@mholt
Copy link
Member

mholt commented Jun 16, 2023

Alright, gotcha gotcha.

(I decided to change the count to 8; that's two seconds total.)

according to the VPS provider, the host provides SSD: 20 GB RAID-10 to my guest, which has debian 12 installed. i think it shouldn't be considered slow.

Wow, well there must be some bottleneck, I wonder if it's the RAID, or maybe just shared hardware in general.

according to the code here, it updates the traffic record file on every request? i am not the developer but rather a user, and i don't know a thing about Go, all are my guess based on my shell script experience. But i could notice sync.Mutex to the author with an issue.

I didn't realize you aren't the author, so now I really extra-appreciate your willingness to help investigate the code. :) Yes, I would highly recommend just using sync.Mutex unless shared synchronization across Caddy instances is necessary. The locking mechanism we use is only designed for longer-running operations like certificate management.

it is a single machine running one instance of caddy on port 80 and 443 providing a trojan-protocol proxy and file server.

For in-process locking, definitely just use sync.Mutex.

Either way, glad we are able to have figured out a workaround, even if it's not absolutely great. I'm still bewildered that the FS is letting readers read while it is writing the file.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants