multiresolver: crash when used with cache #2694

Open
markus2330 opened this issue May 13, 2019 · 11 comments

@markus2330
Contributor

commented May 13, 2019

src/plugins/resolver/resolver.c:1175 seems to cause a crash in some situations.

I tried to reproduce the crash, but the following steps do not work:

# create a new folder to avoid messing up existing data
mkdir x
cd x

# create two mountpoints
kdb mount `pwd`/csv system/tests/csv/lists/cur csvstorage header=colname,columns/index=student/id
kdb mount -R multifile -c storage="ini",pattern="*/*",resolver="resolver" `pwd`/multi system/tests/multi

# create a csv file
echo "student/id,ue/5/kreuzerl" >> csv
echo "01234567,X" >> csv

# create a multiresolver directory
mkdir -p multi/pool
cd multi/pool
echo "[]" >> 01234567 >> 01234568 >> 01234569
echo "[student]" >> 01234567 >> 01234568 >> 01234569
echo "id = 01234567" >> 01234567
echo "[ue/5]" >> 01234567

# create caches
kdb ls system/tests > /dev/null
kdb ls system/tests/multi/pool > /dev/null
kdb ls system/tests/csv/lists/cur > /dev/null

# now do something directly on the files
rm 01234569
touch 01234566
echo "kreuzerl = O" >> 01234567
echo "[something]" >> 01234567 >> 01234568
echo ""  >> 01234567 >> 01234568
echo ""  >> 01234568

# trigger
kdb cp -rf system/tests/csv/lists/cur system/tests/multi/pool

# debug
kdb export system/tests mini
tail *

kdb umount system/tests/csv/lists/cur
kdb umount system/tests/multi
cd ../../..
rm -r x
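
A note on the chained redirections in the steps above (for example `echo "[]" >> 01234567 >> 01234568 >> 01234569`): writing to every listed file is zsh behavior (the MULTIOS option). In bash and other POSIX shells only the last file receives the output, and the others are merely created. It is an assumption here that the original steps were run in zsh. A minimal portable sketch of the same file setup, using `tee -a` in a scratch directory (the `workdir` variable is only for illustration):

```shell
# Portable variant of the multi-file setup from the steps above.
# Assumption: the original ">> a >> b" lines were run in zsh (MULTIOS).
workdir=$(mktemp -d)
cd "$workdir"

# tee -a appends its input to every listed file in any POSIX shell
echo "[]"        | tee -a 01234567 01234568 01234569 > /dev/null
echo "[student]" | tee -a 01234567 01234568 01234569 > /dev/null

# lines that go to a single file can keep the plain redirection
echo "id = 01234567" >> 01234567
echo "[ue/5]"        >> 01234567
```

The same replacement applies to the later `>> 01234567 >> 01234568` and `>> a >> b` lines.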

Did not work:

# create a new folder to not mess up with existing data
mkdir x
cd x

# create two mountpoints
kdb mount `pwd`/csv system/tests/csv csvstorage header=colname,columns/index=sec/somekey 
kdb mount -R multifile -c storage="ini",pattern="*",resolver="resolver" `pwd`/multi system/tests/multi

# create a csv file
echo "sec/somekey,othersec/deep/otherkey" >> csv
echo "a,data2a" >> csv
# echo "b,data2b" >> csv # but do not write in the other

# create a multiresolver directory
mkdir multi
cd multi
echo "[sec]" >> a >> b
echo "somekey = a" >> a
echo "somekey = b" >> b
echo "" >> a >> b
echo "" >> a >> b

kdb cp -rf system/tests/csv system/tests/multi

kdb umount system/tests/csv
kdb umount system/tests/multi

The problem seems to be filename=0x2 <error: Cannot access memory at address 0x2>, most likely set wrongly by the multiresolver?

#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x00007f902d39542a in __GI_abort () at abort.c:89
#2  0x0000564e84a716fc in catchSignal (signum=<optimized out>) at /home/jenkins/workspace/libelektra_master-Q2SIBK3KE2NBEMJ4WVGJXAXCSCB77DUBUULVLZDKHQEV3WNDXBMA@2/libelektra/src/tools/kdb/main.cpp:110
#3  <signal handler called>
#4  strlen () at ../sysdeps/x86_64/strlen.S:106
#5  0x00007f902d3a9da8 in _IO_vfprintf_internal (s=s@entry=0x7fffc1371d80, format=<optimized out>, format@entry=0x7f902c448ddd "the file \"%s\" because of \"%s\"", ap=ap@entry=0x7fffc1371f48) at vfprintf.c:1637
#6  0x00007f902d457cf6 in ___vsnprintf_chk (s=0x564e85b9e3b0 "the file \"o-\220\177", maxlen=<optimized out>, maxlen@entry=512, flags=flags@entry=1, slen=slen@entry=18446744073709551615, 
    format=format@entry=0x7f902c448ddd "the file \"%s\" because of \"%s\"", args=args@entry=0x7fffc1371f48) at vsnprintf_chk.c:63
#7  0x00007f902dc9cd34 in vsnprintf (__ap=0x7fffc1371f48, __fmt=0x7f902c448ddd "the file \"%s\" because of \"%s\"", __n=512, __s=<optimized out>) at /usr/include/x86_64-linux-gnu/bits/stdio2.h:77
#8  elektraVFormat (format=format@entry=0x7f902c448ddd "the file \"%s\" because of \"%s\"", arg_list=arg_list@entry=0x7fffc1371f48)
    at /home/jenkins/workspace/libelektra_master-Q2SIBK3KE2NBEMJ4WVGJXAXCSCB77DUBUULVLZDKHQEV3WNDXBMA@2/libelektra/src/libs/elektra/internal.c:430
#9  0x00007f902c43f3bb in elektraAddWarningf36 (warningKey=warningKey@entry=0x564e85b09330, reason=0x7f902c448ddd "the file \"%s\" because of \"%s\"", 
    file=0x7f902c448578 "/home/jenkins/workspace/libelektra_master-Q2SIBK3KE2NBEMJ4WVGJXAXCSCB77DUBUULVLZDKHQEV3WNDXBMA@2/libelektra/src/plugins/resolver/resolver.c", line=0x7f902c448dd8 "1175", line=0x7f902c448dd8 "1175", 
    file=0x7f902c448578 "/home/jenkins/workspace/libelektra_master-Q2SIBK3KE2NBEMJ4WVGJXAXCSCB77DUBUULVLZDKHQEV3WNDXBMA@2/libelektra/src/plugins/resolver/resolver.c", reason=0x7f902c448ddd "the file \"%s\" because of \"%s\"")
    at /home/jenkins/workspace/libelektra_master-Q2SIBK3KE2NBEMJ4WVGJXAXCSCB77DUBUULVLZDKHQEV3WNDXBMA@2/libelektra/obj-x86_64-linux-gnu/src/include/kdberrors.h:3458
#10 0x00007f902c43f4a4 in elektraUnlinkFile (filename=0x2 <error: Cannot access memory at address 0x2>, parentKey=parentKey@entry=0x564e85b09330)
    at /home/jenkins/workspace/libelektra_master-Q2SIBK3KE2NBEMJ4WVGJXAXCSCB77DUBUULVLZDKHQEV3WNDXBMA@2/libelektra/src/plugins/resolver/resolver.c:1175
#11 0x00007f902c440cc5 in libelektra_resolver_fm_hpu_b_fm_hpu_b_LTX_elektraPluginerror (handle=<optimized out>, r=<optimized out>, parentKey=0x564e85b09330)
    at /home/jenkins/workspace/libelektra_master-Q2SIBK3KE2NBEMJ4WVGJXAXCSCB77DUBUULVLZDKHQEV3WNDXBMA@2/libelektra/src/plugins/resolver/resolver.c:1191
#12 0x00007f9029b7cb2d in elektraMultifileError (handle=0x564e85a68a10, returned=0x564e85b80490, parentKey=0x564e85b09330)
    at /home/jenkins/workspace/libelektra_master-Q2SIBK3KE2NBEMJ4WVGJXAXCSCB77DUBUULVLZDKHQEV3WNDXBMA@2/libelektra/src/plugins/multifile/multifile.c:900
#13 0x00007f902deb2bf6 in elektraSetRollback (parentKey=0x564e85b09330, split=0x564e85b7b520) at /home/jenkins/workspace/libelektra_master-Q2SIBK3KE2NBEMJ4WVGJXAXCSCB77DUBUULVLZDKHQEV3WNDXBMA@2/libelektra/src/libs/elektra/kdb.c:1331
#14 kdbSet (handle=0x564e85a3dce0, ks=0x564e85b0c0c0, parentKey=0x564e85b09330) at /home/jenkins/workspace/libelektra_master-Q2SIBK3KE2NBEMJ4WVGJXAXCSCB77DUBUULVLZDKHQEV3WNDXBMA@2/libelektra/src/libs/elektra/kdb.c:1564
#15 0x0000564e84a3fc59 in kdb::KDB::set (parentKey=..., returned=..., this=0x564e85a3dc78)
    at /home/jenkins/workspace/libelektra_master-Q2SIBK3KE2NBEMJ4WVGJXAXCSCB77DUBUULVLZDKHQEV3WNDXBMA@2/libelektra/src/bindings/cpp/include/kdb.hpp:229
#16 CpCommand::execute (this=0x564e85a3dc70, cl=...) at /home/jenkins/workspace/libelektra_master-Q2SIBK3KE2NBEMJ4WVGJXAXCSCB77DUBUULVLZDKHQEV3WNDXBMA@2/libelektra/src/tools/kdb/cp.cpp:111
#17 0x0000564e84a23659 in main (argc=<optimized out>, argv=0x7fffc1372bd8) at /home/jenkins/workspace/libelektra_master-Q2SIBK3KE2NBEMJ4WVGJXAXCSCB77DUBUULVLZDKHQEV3WNDXBMA@2/libelektra/src/tools/kdb/main.cpp:198

@markus2330 markus2330 changed the title from "multiresolver: crash because of warning" to "multiresolver: crash" on May 13, 2019

@markus2330 markus2330 changed the title from "multiresolver: crash" to "multiresolver: crash when used with cache" on May 13, 2019

@markus2330

Contributor Author

commented May 13, 2019

Further information:

kdb ls works without any problems (and is very fast). The crash happens when doing a kdb cp inside the multiresolver, i.e. with kdbSet.

The problem might be that the multiresolver keeps some internal state (filename) that is not restored when the plugin is not called in kdbGet because of a cache hit.

@markus2330

Contributor Author

commented May 13, 2019

I added the "urgent" label because this problem forced me to disable the cache:

cd /usr/lib/x86_64-linux-gnu/elektra4 && sudo mv libelektra-cache.so libelektra-cache.so-backup
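
For completeness, the rename above can be undone once the cache should be enabled again. The following is only a sketch: `restore_plugin` is a hypothetical helper, not part of Elektra, and it assumes the plugin was disabled by appending `-backup` to its file name as in the command above.

```shell
# Hypothetical helper: restore a plugin file that was disabled by
# renaming it to "<name>-backup".
restore_plugin() {
    plugin_dir=$1   # e.g. /usr/lib/x86_64-linux-gnu/elektra4
    plugin=$2       # e.g. libelektra-cache.so
    if [ -e "$plugin_dir/$plugin-backup" ] && [ ! -e "$plugin_dir/$plugin" ]; then
        mv "$plugin_dir/$plugin-backup" "$plugin_dir/$plugin"
    else
        # either no backup exists or the plugin is already in place
        echo "nothing to restore for $plugin" >&2
        return 1
    fi
}
```

For a system-wide installation the function has to run in a shell with write access to the plugin directory, for example as `restore_plugin /usr/lib/x86_64-linux-gnu/elektra4 libelektra-cache.so`.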
@mpranj

Member

commented May 13, 2019

Thank you for reporting! I'll prioritize this and hopefully fix it today.

@mpranj mpranj referenced this issue on May 16, 2019 in: disable cache plugin, add regression tests #2703 (merged, 4 of 9 tasks complete)

@mpranj

Member

commented May 16, 2019

I could not reproduce this, but I found a different bug (#2702).

Your valgrind log suggests this was an error in kdbSet, which caused a kdbError call to the multifile resolver. Can you be more specific about how you triggered this, so that I can reproduce it?

@markus2330

Contributor Author

commented May 18, 2019

I added steps to reproduce above.

The new backtrace from these steps (hopefully the same one; I simplified some steps from the script that originally caused the problem):

#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x00007f1ca7f2642a in __GI_abort () at abort.c:89
#2  0x0000564d997ef1cc in catchSignal (signum=<optimized out>) at ./src/tools/kdb/main.cpp:110
#3  <signal handler called>
#4  strlen () at ../sysdeps/x86_64/strlen.S:106
#5  0x00007f1ca7f3ada8 in _IO_vfprintf_internal (s=s@entry=0x7fffca6e3580, format=<optimized out>, 
    format@entry=0x7f1ca71dcddd "the file \"%s\" because of \"%s\"", ap=ap@entry=0x7fffca6e3748) at vfprintf.c:1637
#6  0x00007f1ca7fe8cf6 in ___vsnprintf_chk (s=0x564d9a52cd50 "the file \"(\250\034\177", maxlen=<optimized out>, maxlen@entry=512, flags=flags@entry=1, 
    slen=slen@entry=18446744073709551615, format=format@entry=0x7f1ca71dcddd "the file \"%s\" because of \"%s\"", args=args@entry=0x7fffca6e3748)
    at vsnprintf_chk.c:63
#7  0x00007f1ca882dd34 in vsnprintf (__ap=0x7fffca6e3748, __fmt=0x7f1ca71dcddd "the file \"%s\" because of \"%s\"", __n=512, __s=<optimized out>)
    at /usr/include/x86_64-linux-gnu/bits/stdio2.h:77
#8  elektraVFormat (format=format@entry=0x7f1ca71dcddd "the file \"%s\" because of \"%s\"", arg_list=arg_list@entry=0x7fffca6e3748)
    at ./src/libs/elektra/internal.c:430
#9  0x00007f1ca71d33bb in elektraAddWarningf36 (warningKey=warningKey@entry=0x564d9a527a00, reason=0x7f1ca71dcddd "the file \"%s\" because of \"%s\"", 
    file=0x7f1ca71dc578 "/home/jenkins/workspace/libelektra_master-Q2SIBK3KE2NBEMJ4WVGJXAXCSCB77DUBUULVLZDKHQEV3WNDXBMA/libelektra/src/plugins/resolver/resolver.c", line=0x7f1ca71dcdd8 "1175", line=0x7f1ca71dcdd8 "1175", 
    file=0x7f1ca71dc578 "/home/jenkins/workspace/libelektra_master-Q2SIBK3KE2NBEMJ4WVGJXAXCSCB77DUBUULVLZDKHQEV3WNDXBMA/libelektra/src/plugins/resolver/resolver.c", reason=0x7f1ca71dcddd "the file \"%s\" because of \"%s\"") at ./obj-x86_64-linux-gnu/src/include/kdberrors.h:3458
#10 0x00007f1ca71d34a4 in elektraUnlinkFile (filename=0x21 <error: Cannot access memory at address 0x21>, parentKey=parentKey@entry=0x564d9a527a00)
    at ./src/plugins/resolver/resolver.c:1175
#11 0x00007f1ca71d4cc5 in libelektra_resolver_fm_hpu_b_fm_hpu_b_LTX_elektraPluginerror (handle=<optimized out>, r=<optimized out>, parentKey=0x564d9a527a00)
    at ./src/plugins/resolver/resolver.c:1191
#12 0x00007f1ca4d1bb2d in elektraMultifileError (handle=0x564d9a499120, returned=0x564d9a5424c0, parentKey=0x564d9a527a00)
    at ./src/plugins/multifile/multifile.c:900
#13 0x00007f1ca8c46bf6 in elektraSetRollback (parentKey=0x564d9a527a00, split=0x564d9a52c2d0) at ./src/libs/elektra/kdb.c:1331
#14 kdbSet (handle=0x564d9a44ff70, ks=0x564d9a544180, parentKey=0x564d9a527a00) at ./src/libs/elektra/kdb.c:1564
#15 0x0000564d997bb5d9 in kdb::KDB::set (parentKey=..., returned=..., this=0x564d9a44ff08) at ./src/bindings/cpp/include/kdb.hpp:229
#16 CpCommand::execute (this=0x564d9a44ff00, cl=...) at ./src/tools/kdb/cp.cpp:111
#17 0x0000564d9979ed69 in main (argc=<optimized out>, argv=0x7fffca6e43f8) at ./src/tools/kdb/main.cpp:198
@markus2330

Contributor Author

commented May 18, 2019

I just see that the steps above also crash without the cache plugin. The original script, however, only crashed with the cache plugin enabled... So maybe it is not the same bug.

@mpranj

Member

commented May 18, 2019

Thank you for the details!

I just see that the steps above also crash without the cache plugin.

That is what I thought too, but I was not sure. I'll look into it nevertheless. As I mentioned, there is at least one other critical bug with the cache.

@mpranj

Member

commented May 20, 2019

I checked out version 0.8.25 / 6978802 with no cache at all (neither core nor in multifile). I get the same segfault there, so it seems like it has nothing to do with the cache.

@markus2330

Contributor Author

commented May 21, 2019

Yes, the multiresolver is very buggy and it does not support creating new files. Nevertheless, there was a case where it crashed only with the cache and worked after disabling the cache.

I now investigated this problem in detail and the problem is very tricky. I could not reproduce it without local cache files. With my local cache files, it is easy to trigger and I found a quite minimal set of files to trigger it. The CSV only contains:

student/id,ue/5/kreuzerl
01234567,X

and I needed only two INI files:

tail *
==> 01234567 <==
[]
[student]
id = 01234567
[ue/5]
kreuzerl = X

==> 01234568 <==
[]
[student]

The cp then crashed only if the cache was enabled and a .cache file for the INI files existed.

I updated the top post but, as I said, the steps do not reproduce the crash without the problematic cache file. Unfortunately, the cache file is big (2.2MB) and contains private data.
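
The minimal input files quoted above can be recreated in a scratch directory as follows. This is only a sketch: the layout (csv next to multi/pool) is taken from the reproduction steps in the first post, the `scratch` variable is introduced for illustration, and the problematic cache file itself is not part of it.

```shell
# Recreate the minimal CSV and INI files in a scratch directory.
# Assumption: same layout as in the reproduction steps (csv, multi/pool).
scratch=$(mktemp -d)
mkdir -p "$scratch/multi/pool"

cat > "$scratch/csv" <<'EOF'
student/id,ue/5/kreuzerl
01234567,X
EOF

cat > "$scratch/multi/pool/01234567" <<'EOF'
[]
[student]
id = 01234567
[ue/5]
kreuzerl = X
EOF

cat > "$scratch/multi/pool/01234568" <<'EOF'
[]
[student]
EOF

# show what was written, like the "tail *" call above
tail "$scratch"/multi/pool/*
```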

Do you have some idea what it could be?

@mpranj

Member

commented May 21, 2019

Unfortunately, I have no idea what it could be. The example you gave is quite elaborate, but it works fine in our Debian stretch docker image.

We already have a few separate issues here:

  1. The segfault trace is easy to reproduce by using kdb cp to copy some key from another backend into the multifile backend (this was your first example).

  2. Copying data between two files inside one multifile backend corrupts the cache (#2702).

  3. The complex problem that we can't easily reproduce.

I can only suggest that we start with the first two, which are easy to reproduce: write regression tests, fix them, and then move on to the third problem.

@markus2330

Contributor Author

commented May 21, 2019

Of course, the obvious problems should be fixed first. I thought you had already fixed problems 1 and 2 in some branch and were waiting for a test that reproduces this issue.
