After a crash, locked mutex remains #2278

Closed
sgillies opened this issue Feb 27, 2020 · 8 comments

Comments

@sgillies
Contributor

Some background: rasterio/rasterio#1876.

After reproducing the crash reported by my user (which I'm debugging), I modified their program to not import fiona (shown below)

# import fiona  # note: with this line in, the user experiences crashes 
import rasterio as rio
from multiprocessing.pool import ThreadPool as Pool

def read(file):
    with rio.open(file, sharing=False) as src:
        print("starting to read", flush=True)
        src.read(window=((0, 5000), (0, 5000)))
        print("done reading", flush=True)

failing = "/vsicurl/https://storage.googleapis.com/temporary_eu_west_4/mostly_white.tif"

pool = Pool(4)
pool.map(read, [failing]*4)
pool.close()
pool.join()

and when I rerun the program, I see

CPLReleaseMutex: Error = 1 (Operation not permitted)
starting to read
starting to read
starting to read
starting to read
done reading
done reading
done reading
done reading

A mutex left behind strikes me as something we should fix in GDAL, and maybe a clue to the problem. I'm new to pthreads: is the mutex represented by a file on disk? Is this the mutex around initialization of SSL in curl?

@rouault
Member

rouault commented Feb 27, 2020

I'm new to pthreads, is the mutex represented by a file on disk?

No, a mutex is purely an in-memory data structure, used cooperatively by the user process (through pthread) and the kernel.

Is this the mutex around initialization of SSL in curl?

Who knows... There are many uses of mutexes in GDAL...

"CPLReleaseMutex: Error = 1 (Operation not permitted)" is really intriguing. This shouldn't normally happen...

Seeing "from multiprocessing.pool import ThreadPool as Pool", I'm wondering which Python multiprocessing mechanism it uses: a Unix fork() only, thread creation, or fork()+exec() (see https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods)? The latter two should be fine, but I'd suspect that fork() alone combined with mutexes is potentially buggy (by design; see https://brauner.github.io/2018/03/04/locking-in-shared-libraries.html), potentially leading to such a "CPLReleaseMutex: Error = 1 (Operation not permitted)".
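
For context on the error code: errno 1 is EPERM, and POSIX specifies that pthread_mutex_unlock returns EPERM when the calling thread does not own an error-checking or recursive mutex. Whether that is what happens inside GDAL here is not established in this thread. The sketch below is only a Python analogy, not GDAL code: it uses threading.RLock (an owner-checked lock) to show what "releasing a lock you do not own" looks like. The function name release_without_owning is made up for illustration.

```python
import errno
import os
import threading

# Owner-checked, re-entrant lock: a rough Python analogue of a recursive
# pthread mutex. This is an analogy only, not how GDAL implements CPLMutex.
lock = threading.RLock()
errors = []


def release_without_owning():
    # Hypothetical helper: this thread never acquired the lock, so the
    # release is rejected, much like pthread_mutex_unlock returning EPERM.
    try:
        lock.release()
    except RuntimeError as exc:
        errors.append(exc)


lock.acquire()  # the main thread becomes the owner
worker = threading.Thread(target=release_without_owning)
worker.start()
worker.join()
lock.release()  # the owner can release without error

# errno 1 is EPERM; its message text comes from the platform's strerror().
print(errno.EPERM, os.strerror(errno.EPERM))
print("rejected releases:", len(errors))
```

If the message in the report does come from an unlock by a non-owner, that would point at a lock/unlock imbalance across threads or processes rather than at any state stored on disk.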

@sgillies
Contributor Author

I'll look into the Python details. I don't think the thread pool from multiprocessing is the recommended approach anymore.

Meanwhile, the unreleasable mutex is persistent. I'm seeing it in new shells. Will a service restart be required, or will I need to shut down and restart?

@rouault
Member

rouault commented Feb 27, 2020

the unreleasable mutex is persistent. I'm seeing it in new shells

Hmm, can you explain what you mean?

@sgillies
Contributor Author

I closed the shell I was running the program in, where "CPLReleaseMutex: Error = 1 (Operation not permitted)" was reported, started a new shell, and ran the program again with the same error message.

@mihi314

mihi314 commented Feb 27, 2020

Seeing "from multiprocessing.pool import ThreadPool as Pool", I'm wondering which Python multiprocessing mechanism it uses

While ThreadPool is located in the multiprocessing package, it does not actually do multiprocessing, just regular multithreading. It probably exists to provide the same interface as multiprocessing.pool.Pool, for convenient switching between multiprocessing and multithreading.
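
A quick way to check this, as a minimal sketch: have each task report the process ID and thread ID it ran under. The helper names where_am_i and run_in_thread_pool are made up for illustration; only os.getpid, threading.get_ident, and multiprocessing.pool.ThreadPool come from the standard library.

```python
import os
import threading
from multiprocessing.pool import ThreadPool


def where_am_i(_):
    # Report the process ID and thread ID that executed this task.
    return os.getpid(), threading.get_ident()


def run_in_thread_pool(n_workers=4, n_tasks=8):
    # Same interface as multiprocessing.pool.Pool, but backed by threads.
    with ThreadPool(n_workers) as pool:
        return pool.map(where_am_i, range(n_tasks))


results = run_in_thread_pool()
pids = {pid for pid, _ in results}
tids = {tid for _, tid in results}

print("parent pid:", os.getpid())
print("pids seen by tasks:", pids)
print("distinct worker threads used:", len(tids))
```

Every task reports the parent's PID, so no fork() or spawn is involved and the fork-versus-mutex concern does not apply to this program as written.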

@rouault
Member

rouault commented Feb 27, 2020

For what it's worth, the following works fine with the latest GDAL master (multireadtest must be built explicitly with cd apps; make multireadtest):

multireadtest -t 8 -oi 10 -width 5000 -height 5000  /vsicurl/https://storage.googleapis.com/temporary_eu_west_4/mostly_white.tif --debug on

@sgillies
Contributor Author

@rouault thanks, I'm going to close this up unless you think that GDAL should be able to recover from this kind of stuck mutex.

@sgillies
Contributor Author

sgillies commented May 6, 2020

Update: I'm still seeing the mutex warning in a program that otherwise runs without problems.
