-
-
Notifications
You must be signed in to change notification settings - Fork 126
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Thread panics on read operations of FST set file #57
Comments
Trying to use
|
I can provide a sample, broken file (and a sample of a functional file) if you like. |
Interesting bug! Yeah I definitely need a sample to reproduce. Could you also show output with Also, the Python interpreter should not be segfaulting. The ffi layer should be catching panics and converting them to aborts. Otherwise it is UB. |
backtrace:
|
Working on getting the files somewhere. |
@BurntSushi : here are the files: https://drive.google.com/file/d/1xs9NSIEU2yEDoUg0Vl2qEtL06hOvo9f1/view?usp=sharing . Let me know when you get them so I can unshare them. The data in them should be anonymized to not be a problem, but would like to limit exposure. |
@davidblewett Thanks! I downloaded it. Won't be able to look into this until a bit later, hopefully today, but no promises. Also, in the future, if it's a concern, you can email me files if they are small enough. jamslam@gmail.com Out of curiosity (in case it becomes relevant), how did you build these files? Did you use |
Yes,
Steps 1-4 are actually in Ruby; 5 in Python. |
@BurntSushi could it be step 2, both reading from |
@davidblewett I don't think so. Is it possible to provide the original array of strings that produced one of these FSTs? That would help debugging on my end. |
@BurntSushi : Unfortunately, it would be fairly involved to try to track that down. I don't have the telemetry for what files were combined in step 5 above. I'm going to purge the files that are exhibiting this behavior, and resume the aggregation process. If it occurs again, I might be able to track that down. |
@davidblewett No worries, thanks! I'll see what I can do with what I have. :) |
I've added some telemetry to our process that should allow me to reconstruct the input data. Will let you know if I see it happen again. |
@BurntSushi : after letting this run for a few days, I have not been able to reproduce the error. It's possible that the sample file here hadn't been completely finalized, and was accidentally included in a globbing expression to load finished files. |
@BurntSushi this has raised it's ugly head again. I'm pretty sure it's a sequence of events in my larger application that ends up writing out corrupt FST data. However, would you be opposed to using Alternatively, perhaps some form of check method could be added that can validate the structure of a given FST file? |
@davidblewett Hmmm... So I haven't had a chance to look into this. There's unfortunately a large amount of context switching overhead required to dive into fst internals and debug this kind of thing. I think using A check method or at least some kind of optional checksum seems like a better path to me. In theory, a full check method wouldn't be too bad, but that's only if you use the existing code that deserializes the FST, which I think is the actual problem here, so that doesn't help. A checksum would probably work well to weed out corrupt FST files, but would not fix any issues arising from a non-corrupt but incorrect FST file (e.g., if the builder is producing incorrect data as opposed to some external thing preventing a complete FST from being written). I'm not too sure what I'll have time for in the short term here unfortunately. How severe would you rate this bug? |
The incidince rate has gone down in the last few days. I like the idea of a checksum, but how would you verify it? Could it be something that is always the "longest" chain in that specific FST? |
@davidblewett Naively, a checksum would be written at the end of the FST, and it would correspond to a crc32c sum of all previous bytes in the FST. If they don't line up, then you have pretty high confidence that the FST has been corrupted somehow. The checksum would not however help you if the panics you're seeing are a result of a bug in the FST builder itself. |
@BurntSushi I believe I discovered the root of the issue. I don't think the FST files were corrupt, it was due to doing multiple operations in different threads on the same handle. Since the Python binding is basically C, those actions don't trigger any warnings. In pure Rust, are the Set structs not re-entrant? We have a few hundred thousand FST segments, so need to be careful to not go over the operating system limit on the number of open mmap'd files. If we spawned new Set instances in different threads, we could easily go over that limit if we had multiple concurrent requests for a customer that had tens of thousands of segments. |
@davidblewett The FST sets themselves are purely immutable and can be inspected from multiple threads/processes simultaneously without issue. However, the streams produced by FSTs require mutable access and are themselves not legal to access from multiple threads simultaneously without explicit synchronization. Of course, you can have as many streams operating simultaneously as you want. You just need to make sure you're only accessing each stream from one thread. |
Perhaps I discover your problem ...? In a personal code in Rust, I write an fst coming from a sorted list of strings. Then I tryied to open my fst with the fst-bin and use a regex on it. But I get the almost same error : thread 'main' panicked at 'index out of bounds: the len is 267524517 but the index is 148671801088665313', (...)github.com-1ecc6299db9ec823/fst-0.3.3/src/raw/node.rs:307:17 Then I realise there was an error on my code writting the fst. I used
If you don't "finish()", the fst file is a little bit smaller than what it must be, and at opening it, it give this error given. PS: correcting my code, create a correct fst file and the problem doesn't come again. |
I'm going to close this issue because it seems like there isn't a problem with FST reading/writing itself. Happy to dig into this more with a better reproduction. |
I'm seeing a very small percentage of "corrupt" FST set files that are triggering panics in Rust (leading to a Python interpreter segfault). The errors look like:
This occurs on approximately 13 out of 4635 files, ranging in size from 20MB to > 100MB. I have not been able to narrow things down past this, but wanted to know what might cause this?
I'm shelling out to the
fst-bin
crate to combine multiple input files into larger files, then doing set operations on the merged output. Thefst
binary was built on Rust nightly; I'm not sure of the exact version at the moment.The text was updated successfully, but these errors were encountered: