Project Quota: a sample #1

Closed
wants to merge 94 commits into from
Conversation

amarts (Owner) commented Oct 29, 2019

:-) For testing .mergify.yml

amarts and others added 30 commits September 6, 2019 11:20
fixes for CI/CD on github

* fix for build-aux/pkg-config file (autoconf)
* Fix for rpc/xdr/src/Makefile.am (make clean)

Change-Id: I9dbbaceaa379ca15b999f91bedfa4fd8c023aa77
Signed-off-by: Amar Tumballi <amarts@gmail.com>
Problem: After setting transport-type to inet6 and passing an IPv6
         transport.socket.bind-address in glusterd.vol, clients are
         not started.

Solution: Update the address-family based on the remote-address for
          all gluster client processes.

Change-Id: Iaa3588cd87cebc45231bfd675745c1a457dc9b31
Fixes: bz#1747746
Credits: Amgad Saleh <amgad.saleh@nokia.com>
Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
Problem:
Since commit 600ba94, shd starts
healing as soon as it is toggled from disabled to enabled. This was
causing the following line in the .t to fail on a 'fast' machine (always
on my laptop and sometimes on the jenkins slaves).

EXPECT_NOT "^0$" get_pending_heal_count $V0

because by the time shd was disabled, the heal was already completed.

Fix:
Increase the number of files to be healed and make it a variable called
FILE_COUNT, in case we need to bump it up further because the machines
become even faster. Also create pending metadata heals to increase the
time taken to heal a file.

fixes: bz#1748744
Change-Id: I5a26b08e45b8c19bce3c01ce67bdcc28ed48198d
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
Problem:
While running markdown-link-checker it was
observed that there were a large number of
404 links in the documentation, i.e. the
markdown files in the project.

This was caused by the following:
1. Repos being removed.
2. Typos in markdown links.
3. Restructuring of directories.

Solution:
Fixing all the 404 links present in the project.

fixes: bz#1746810
Change-Id: I30de745f848fca2e9c92eb7493f74738f0890ed9
Signed-off-by: kshithijiyer <kshithij.ki@gmail.com>
These functions do not seem to be in use.

Change-Id: Ie76baf2a9727b9ba0e66f234226b1e62788245f2
updates: bz#1193929
Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
In various places, we can re-use knowledge of the string length
or the result of snprintf() and the like, instead of calling strlen().
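
A minimal sketch of the pattern (the function and buffer names here are
illustrative, not taken from the patch):

    #include <stdio.h>

    /* reuse snprintf()'s return value instead of a second strlen() pass */
    static int
    build_path(char *dst, size_t dstlen, const char *dir, const char *name)
    {
        int len = snprintf(dst, dstlen, "%s/%s", dir, name);
        if (len < 0 || (size_t)len >= dstlen)
            return -1; /* error or truncation */
        return len;    /* callers can reuse this instead of strlen(dst) */
    }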

Change-Id: I4c9b1decf1169b3f8ac83699a0afbd7c38fad746
updates: bz#1193929
Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
We were not passing xattr_req when doing a name self-heal
as well as a metadata heal. Because of this, some xdata
was missing, which caused I/O errors.

Change-Id: Ibfb1205a7eb0195632dc3820116ffbbb8043545f
Fixes: bz#1728770
Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
We were unconditionally cleaning up the graph when we got
child_down followed by parent_down. But this is prone to a
race condition when some of the bricks are already disconnected.
In that case, we might have freed the graph even before the last
child_down is executed in the client xlator code, because the
child_down event was already received.

To fix this race, we have introduced a check to see whether all client
xlators have cleared their reconnect chain and called child_down
for the last time.

Change-Id: I7d02813bc366dac733a836e0cd7b14a6fac52042
fixes: bz#1727329
Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
There is no need to clean up the .x files.

Fixes: bz#1743094
Change-Id: I89d8deb3939c83069709c701cb8f1972e3746168
Signed-off-by: Amar Tumballi <amarts@gmail.com>
Problem: IPv6 hostname addresses are not parsed correctly in the
         function glusterd_check_brick_order

Solution: Update the code to parse hostname addresses correctly

Change-Id: Ifb2f83f9c6e987b2292070e048e97eeb51b728ab
Fixes: bz#1747746
Credits: Amgad Saleh <amgad.saleh@nokia.com>
Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
Problem:
If update size/version is not successful on the file, updates on the
same stripe could lead to data corruptions if the earlier un-aligned
write is not successful on all the bricks. Application won't have
any knowledge of this because update size/version happens in the
background.

Fix:
Fail fsync/flush on fds that are opened before update-size-version
went bad.

fixes: bz#1748836
Change-Id: I9d323eddcda703bd27d55f340c4079d76e06e492
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Fixed the following coverity issue in both flush/fsync
>>>     CID 1404964:  Null pointer dereferences  (REVERSE_INULL)
>>>     Null-checking "fd" suggests that it may be null, but it has already
been dereferenced on all paths leading to the check.
>>>         if (fd != NULL) {
>>>           fop->fd = fd_ref(fd);
>>>             if (fop->fd == NULL) {
>>>                 gf_msg(this->name, GF_LOG_ERROR, 0,
>>>                        "Failed to reference a "
>>>                        "file descriptor.");

fixes: bz#1748836
Change-Id: I19c05d585e23f8fbfbc195d1f3775ec528eed671
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
fixes: #721
Change-Id: I5333540e3c635ccf441cf1f4696e4c8986e38ea8
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
…tations

1404965 - Null pointer dereference
1404316 - Program hangs
1401715 - Program hangs
1401713 - Program hangs

Updates: bz#789278
Change-Id: I6e6575daafcb067bc910445f82a9d564f43b75a2
Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
If heal from the next brick starts after the first brick completes heal, then
opendir on the brick can change atime, leading to failure of the test. When
ctime is disabled, it is better to just check that mtime is the same after heal.

fixes: bz#1751134
Change-Id: Ia03e30fd547e6bbe85c1e299845ffa122f3a2692
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Fixes: bz#1708929
Change-Id: I9cc81a9047ff874df752ca5552e00bf033485bd8
Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
Problem:
Mount-1:
1) Tries to acquire a lock on 'dir1'.
2) The lock is granted on brick-0.
3) Gets a lock-contention notification and marks lock->release as true.
4) A new fop comes on 'dir1'; it is put in the frozen list because
   lock->release is set to true.
5) The lock acquisition from step 2 fails because 3 bricks went down
   in the 4+2 setup.

Mount-2:
1) Tries to acquire a lock on 'dir1'.
2) The lock gets EAGAIN on brick-0, leading to a blocking lock on
   brick-0. It doesn't matter what happens on mount-2 from here on.

The fop on mount-1 that was put in the frozen list will hang, because no
codepath will move it from the frozen list to any other list and the lock
will not be retried.

Fix:
Don't set lock->release to true if lock is not acquired at the time of
lock-contention-notification

fixes: bz#1743573
Change-Id: Ie6630db8735ccf372cc54b873a3a3aed7a6082b7
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
In the test tests/bugs/gfapi/bug-1447266/bug-1447266.t,
the actual file created is
tests/bugs/gfapi/bug-1447266/bug-1447266,
which is not cleaned up later.

fixes: bz#1750618
Change-Id: I93120418e54b95018a7213d106a1f1c990766281
Signed-off-by: Sheetal Pamecha <spamecha@redhat.com>
There could be cases (either due to insufficient memory or a
corrupted mem-pool) in which frame creation fails. Bail out
with an error in such cases.
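
A minimal sketch of the pattern, assuming the usual libglusterfs helpers
(create_frame() and gf_msg(); the surrounding fragment is illustrative):

    frame = create_frame(this, this->ctx->pool);
    if (!frame) {
        /* bail out instead of dereferencing a NULL frame later */
        gf_msg(this->name, GF_LOG_ERROR, ENOMEM, 0,
               "failed to create frame");
        ret = -1;
        goto out;
    }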

Change-Id: I8cc0a5852f6f04d2bac991e4eb79ecb42577da11
Fixes: bz#1748448
Signed-off-by: Soumya Koduri <skoduri@redhat.com>
glfs_init, when called with a volume name prefixed by '/',
sets errno to 0. Setting errno to EINVAL to resolve the issue.
Also, volname is a parameter to glfs_new.
Thus, validate volname in glfs_new itself and
return EINVAL from that function.
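
A hedged sketch of the validation (not the exact patch):

    struct glfs *
    glfs_new(const char *volname)
    {
        if (!volname || volname[0] == '/') {
            errno = EINVAL; /* reject names like "/vol" up front */
            return NULL;
        }
        /* ... allocate and initialize the handle as before ... */
    }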

fixes: bz#1507896

Change-Id: I0d4d2423e26cc07644d50ec8cce788ecc639203d
Signed-off-by: Sheetal Pamecha <spamecha@redhat.com>
… a reconnect

Bricks clean up any granted locks after a client disconnects, and
currently these locks are not healed after a reconnect. This means
that post reconnect a competing process could be granted a lock even
though the first process that was granted locks has not unlocked. By not
re-opening fds, subsequent operations on such fds will fail, forcing
the application to close the current fd and reopen a new one. This way
we prevent any silent corruption.

A new option "client.strict-locks" is introduced to control this
behaviour. This option is set to "off" by default.
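
Assuming the usual volume-set syntax, the option would be enabled with:

    gluster volume set <volname> client.strict-locks on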

Change-Id: Ieed545efea466cb5e8f5a36199aa26380c301b9e
Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
updates: bz#1694920
This test case is failing in upstream. Marking this test as
bad for now.

Change-Id: I014c67628c14683c32a3c1dd770b10aaf35ad4cc
Updates: bz#1752331
Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
In many cases we will not be logging at all, since the log level
is irrelevant. In that case, we can just bail out.
It is worth doing this check before the args check, which may not be
needed in this case.
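
A minimal sketch of the idea (the names here are illustrative, not the
actual logging code):

    static int active_loglevel; /* assumed global log level */

    static int
    do_log(int level, const char *fmt, ...)
    {
        /* cheapest exit first: most calls are below the active level */
        if (level > active_loglevel)
            return 0;
        /* validate arguments only on the (rare) path that actually logs */
        if (!fmt)
            return -1;
        /* ... format and emit the message ... */
        return 0;
    }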

Change-Id: Ia90a38bb2a49b09bfea5b5abc8b5d0c3bab4e9ac
updates: bz#1193929
Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
After add-brick and rebalance, the ctime xattr is not present
on rebalanced directories on the new brick. This patch fixes the
same.

Note that ctime still doesn't support consistent time across
distribute sub-volume.

This patch also fixes the in-memory inconsistency of time attributes
when metadata is self healed.

Change-Id: Ia20506f1839021bf61d4753191e7dc34b31bb2df
fixes: bz#1734026
Signed-off-by: Kotresh HR <khiremat@redhat.com>
server.sin_family was set to AF_INET while creating the socket connection;
this was failing if the input address is IPv6 (`::1`).

With this patch, sin_family is set by reading the ai_family of
`getaddrinfo` result.
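
A self-contained sketch of the approach, using only the standard
getaddrinfo() API (the helper name is illustrative):

    #include <netdb.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    /* derive the address family from the resolver result instead of
       hardcoding AF_INET, so IPv6 addresses such as "::1" also work */
    static int
    connect_any_family(const char *host, const char *port)
    {
        struct addrinfo hints, *res = NULL;
        int fd = -1;

        memset(&hints, 0, sizeof(hints));
        hints.ai_family = AF_UNSPEC;   /* IPv4 or IPv6 */
        hints.ai_socktype = SOCK_STREAM;
        if (getaddrinfo(host, port, &hints, &res) != 0 || !res)
            return -1;
        fd = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
        if (fd >= 0 && connect(fd, res->ai_addr, res->ai_addrlen) != 0) {
            close(fd);
            fd = -1;
        }
        freeaddrinfo(res);
        return fd;
    }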

Fixes: bz#1752330
Change-Id: I499f957b432842fa989c698f6e5b25b7016084eb
Signed-off-by: Aravinda VK <avishwan@redhat.com>
File truncate operations during a migration were not handled properly.
This has been fixed.

Change-Id: Ic642d257e893641236a4a21ab69fcc7a569dd70a
Fixes: bz#1745967
Signed-off-by: N Balachandran <nbalacha@redhat.com>
posix-metadata.c: 462 in posix_set_mdata_xattr()
...
460         GF_VALIDATE_OR_GOTO(this->name, time, out);
461
>>>     CID 1405665:  Control flow issues  (DEADCODE)
>>>     Execution cannot reach the expression "flag->atime" inside this
>>>     statement: "if (update_utime && (flag->...".
462         if (update_utime && (flag->ctime && !time) && (flag->atime && !u_atime) &&

Change-Id: Id31d81d04ea2785a669eafe0dc1307303cb2271b
updates: bz#789278
Signed-off-by: Kotresh HR <khiremat@redhat.com>
The arguments for gluster-mountbroker are missing the help="<some text>"
parameter wherever add_argument() is used, due to which the
gluster-mountbroker help is missing a brief description of what each
argument does in the help of {add,setup,remove}, as shown below:

usage: gluster-mountbroker remove [-h] [--volume VOLUME] [--user USER]

optional arguments:
  -h, --help       show this help message and exit
  --volume VOLUME
  --user USER
usage: gluster-mountbroker add [-h] volume user

positional arguments:
  volume
  user

optional arguments:
  -h, --help  show this help message and exit
usage: gluster-mountbroker setup [-h] mount_root group

positional arguments:
  mount_root
  group

optional arguments:
  -h, --help  show this help message and exit

fixes: bz#1728683

Change-Id: I7eabcd2c103f01e40160ba35500b0a4e5c9f5e7a
Signed-off-by: kshithijiyer <kshithij.ki@gmail.com>
In 3 cases, there was a memory allocation and zeroing, followed
directly by populating it with content. Replaced with a memory
allocation that does not zero the memory.
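
A minimal sketch of the before/after (GF_CALLOC/GF_MALLOC are the
libglusterfs allocation macros; the buffer names are illustrative):

    /* before: zeroed allocation, immediately overwritten */
    buf = GF_CALLOC(1, size, gf_common_mt_char);
    if (buf)
        memcpy(buf, src, size);

    /* after: skip the redundant zeroing */
    buf = GF_MALLOC(size, gf_common_mt_char);
    if (buf)
        memcpy(buf, src, size);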

Change-Id: I4fbb5c924fb3a144e415d2368126b784dde760ea
updates: bz#1193929
Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
Fixed typos and reworded log messages for clarity.

Change-Id: I46f616ce7d3eb993c77a5812e8bc044e5f283354
Fixes: bz#1753859
Signed-off-by: N Balachandran <nbalacha@redhat.com>
Pranith Kumar K and others added 11 commits October 17, 2019 11:21
Problem:
The total blocks of an XFS partition depend on the XFS parameters used
for formatting, so a hardcoded lower bound may not be the lower bound
for different default parameters.

Used blocks are lower on my machine for a freshly formatted XFS partition
compared to the machine where the test fails.
On my machine:
Filesystem     1024-blocks  Used Available Capacity Mounted on
/dev/loop0           98980  5472     93508       6% /d/backends/patchy1
/dev/loop1           98980  5472     93508       6% /d/backends/patchy2

On a machine where this test fails:
Filesystem     1024-blocks  Used Available Capacity Mounted on
/dev/loop0           96928  6112     90816       7% /d/backends/patchy1
/dev/loop1           96928  6112     90816       7% /d/backends/patchy2

Fix:
Make the lower bound 2% less than the available brick blocks

fixes: bz#1761759
Change-Id: I974d5e75766f7ff44780a2e4c2a19cd5d1d14a79
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
On some distros the default bs is very slow, and the test takes
close to 2 minutes instead of 20 seconds.

fixes: bz#1761769
Change-Id: If10d595a7ca05f053237f3c5ffbb09c5151eab35
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
fixes: bz#1759002
Change-Id: I4d49e1c2ca9b3c1d74b9dd5a30f1c66983a76529
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
updates: bz#1193929

Change-Id: I517fa29e57bde970c2c22ebc2de80fec1509cd2d
Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
After an upgrade, config files in the old format get
migrated to the new format. The monitor process migrates them.
Since the monitor doesn't run on nodes where bricks are not
hosted, the config doesn't get migrated there. So this patch fixes
the config upgrade on nodes which don't host bricks.
The upgrade happens during config get/set/reset.

Change-Id: Ibade2f2310b0f3affea21a3baa1ae0eb71162cba
Signed-off-by: Kotresh HR <khiremat@redhat.com>
fixes: bz#1762220
Squash tens of warnings on padding of structs in afr structures.
The warnings were found by manually adding '-Wpadded' to the GCC
command line.

Also made relevant structs and definitions static, where
applicable.

Change-Id: Ib71a7e9c6179378f072d796d11172d086c343e53
updates: bz#1193929
Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
Problem: glusterfind saves var data under $prefix,
even though localstatedir is specified while installing, i.e. when we
install glusterfs with ./configure --prefix=/path/to/prefix
--localstatedir=/path/to/var

This happens because glusterfind stores the working dir as
working_dir=@GLUSTERFSD_MISCDIR@/glusterfind/
and in configure.ac GLUSTERFSD_MISCDIR is defined as:
GLUSTERFSD_MISCDIR="$(eval echo $prefix)/var/lib/misc/glusterfsd"

Solution: $localstatedir is used instead of $prefix when defining
GLUSTERFSD_MISCDIR, to avoid this anomaly.

fixes: bz#1489610
Change-Id: I431a63c14bee23a1aac612a3dcb0431cb29ca4a2
Signed-off-by: Shwetha K Acharya <sacharya@redhat.com>
Problem: After completion of rebalance, if glusterd is restarted,
rebalance status displays the wrong rebalanced data size in its output.

Cause: While restoring the information from /var/lib/glusterd/
into its memory, glusterd fetches rebalance_data from
/var/lib/glusterd/vols/volname/node_state.info. This value is
converted into an integer using atoi(), which returns an
incorrect value for larger inputs.

Solution: Use sscanf() instead of atoi() to convert the string to
an integer (in this case, an unsigned long).
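
A minimal sketch of the change (the variable names are illustrative):

    /* before: atoi() returns int and silently misbehaves past INT_MAX */
    /* rebalanced_data = atoi(value); */

    /* after: parse directly into the wider type */
    unsigned long rebalanced_data = 0;
    if (sscanf(value, "%lu", &rebalanced_data) != 1)
        rebalanced_data = 0; /* parse failure */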

fixes: bz#1762438

Change-Id: Icbdb096919612b4a1d6fb0e315f09d38900abf4e
Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
https://review.gluster.org/#/c/glusterfs/+/23439/ introduced this warning:
In file included from posix-helpers.c:50:
posix-helpers.c: In function ‘posix_health_check_thread_proc’:
../../../../libglusterfs/src/glusterfs/events.h:31:9: warning: ‘timeout’ may be used uninitialized in this function [-Wmaybe-uninitialized]
   31 |         _gf_event(event, ##fmt);                                               \
      |         ^~~~~~~~~
posix-helpers.c:2024:9: note: ‘timeout’ was declared here
 2024 |     int timeout;

This patch fixes it by setting the initial value to 0.

Change-Id: I90aee4a1366e21a00f5e138f44247d8c42b0f5ae
updates: bz#789278
Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
The code is simplified to avoid a needless copy, and simplified
overall for readability.
Such changes are needed elsewhere too (see
gluster/glusterfs#720).

A few other minor changes here and there, nothing functional.

updates: bz#1193929
Signed-off-by: Yaniv Kaul <ykaul@redhat.com>

Change-Id: I14f9dd2c32a8932bfcc80ebe92c9aa77701095ff
Change-Id: I838220a5cfd7ff2876431e9292124d648630a7af
Signed-off-by: Amar Tumballi <amarts@gmail.com>
* For focused, subdirectory-specific quota setting.
* The current version is very focused on the kadalu-specific use case,
  where the hard limits are not strictly enforced.

Change-Id: I7159acd4fda19fc16da2c81ab83f06cc2dfd0182
Change-Id: I2319c4500c8ef410582d0a5549affa256a0c95a4
Change-Id: Ie7b3985dc709e39ab6224ec444dc8241190e20bb
amarts closed this Mar 24, 2020
amarts pushed a commit that referenced this pull request May 5, 2020
The general idea of the changes is to prevent resetting event generation
to zero in the inode ctx, since event gen is something that should
follow 'causal order'.

Change #1:
For a read txn, in inode refresh cbk, if event_generation is
found zero, we are failing the read fop. This is not needed
because change in event gen is only a marker for the next inode refresh to
happen and should not be taken into account by the current read txn.

Change #2:
The event gen being zero above can happen if there is a racing lookup,
which resets event gen (in afr_lookup_done) if there are non-zero afr
xattrs. The resetting is done only to trigger an inode refresh and a
possible client-side heal on the next lookup. That can be achieved by
setting the need_refresh flag in the inode ctx. So replaced all
occurrences of resetting event gen to zero with a call to
afr_inode_need_refresh_set().
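
A hedged sketch of the substitution (the 'before' helper name is assumed,
not taken from the patch):

    /* before: afr_inode_event_gen_reset(inode, this); */
    /* after:  afr_inode_need_refresh_set(inode, this); */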

Change #3:
In both lookup and discover path, we are doing an inode refresh which is
not required since all 3 essentially do the same thing- update the inode
ctx with the good/bad copies from the brick replies. Inode refresh also
triggers background heals, but I think it is okay to do it when we call
refresh during the read and write txns and not in the lookup path.

The .t tests which relied on inode refresh in the lookup path to trigger
heals are now changed to do a read txn so that inode refresh and the heal
happen.

Change-Id: Iebf39a9be6ffd7ffd6e4046c96b0fa78ade6c5ec
Fixes: #1179
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
Reported-by: Erik Jacobson <erik.jacobson at hpe.com>
amarts pushed a commit that referenced this pull request Jun 22, 2020
Found with GCC ASan:

Direct leak of 202 byte(s) in 2 object(s) allocated from:
    #0 0x7fc6c6ef0667 in __interceptor_malloc (/usr/lib64/libasan.so.6+0xb0667)
    #1 0x7fc6c6bd145b in __gf_malloc /path/to/glusterfs/libglusterfs/src/mem-pool.c:175
    #2 0x7fc6c6bd17a3 in gf_vasprintf /path/to/glusterfs/libglusterfs/src/mem-pool.c:223
    #3 0x7fc6c6bd1993 in gf_asprintf /path/to/glusterfs/libglusterfs/src/mem-pool.c:243
    #4 0x7fc6b0dc92f6 in init /path/to/glusterfs/xlators/cluster/afr/src/afr.c:590
    ...

Change-Id: I29feb1d30a045fb70472758e6ed4e195888090b2
Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
Fixes: #1278
amarts pushed a commit that referenced this pull request Jun 22, 2020
Found with GCC's address sanitizer:

==67190==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 24624 byte(s) in 6 object(s) allocated from:
    #0 0x7f62535c0837 in __interceptor_calloc (/usr/lib64/libasan.so.6+0xb0837)
    #1 0x7f62532a1690 in __gf_default_calloc glusterfs/mem-pool.h:122
    #2 0x7f62532a20ca in __gf_calloc /path/to/glusterfs/libglusterfs/src/mem-pool.c:144
    #3 0x7f62532c8128 in gf_store_iter_new /path/to/glusterfs/libglusterfs/src/store.c:511
    #4 0x7f623e2f9ed7 in glusterd_store_retrieve_bricks /path/to/glusterfs/xlators/mgmt/glusterd/src/glusterd-store.c:2389

Direct leak of 8208 byte(s) in 2 object(s) allocated from:
    #0 0x7f62535c0837 in __interceptor_calloc (/usr/lib64/libasan.so.6+0xb0837)
    #1 0x7f62532a1690 in __gf_default_calloc glusterfs/mem-pool.h:122
    #2 0x7f62532a20ca in __gf_calloc /path/to/glusterfs/libglusterfs/src/mem-pool.c:144
    #3 0x7f62532c8128 in gf_store_iter_new /path/to/glusterfs/libglusterfs/src/store.c:511
    #4 0x7f623e2f9cf0 in glusterd_store_retrieve_bricks /path/to/glusterfs/xlators/mgmt/glusterd/src/glusterd-store.c:2363
    #5 0x7fff5cb70bcf  ([stack]+0x15bcf)
    #6 0x7f623e309113 in glusterd_store_retrieve_volumes /path/to/glusterfs/xlators/mgmt/glusterd/src/glusterd-store.c:3505
    #7 0xfffeb96e61d  (<unknown module>)
    #8 0x7f623e4586d7  (/usr/lib64/glusterfs/9dev/xlator/mgmt/glusterd.so+0x2f86d7)

Change-Id: I9b2a543dc095f4fa739cd664fd4d608bf8c87d60
Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
Fixes: #1263
amarts pushed a commit that referenced this pull request Jun 22, 2020
Found with GCC ThreadSanitizer:

WARNING: ThreadSanitizer: use of an invalid mutex (e.g. uninitialized or destroyed) (pid=188590)
    #0 pthread_mutex_lock <null> (libtsan.so.0+0x528ac)
    #1 client_ctx_del /path/to/glusterfs/libglusterfs/src/client_t.c:535 (libglusterfs.so.0+0xc681a)
    #2 client_destroy_cbk /path/to/glusterfs/xlators/protocol/server/src/server.c:944 (server.so+0xaf6e)
    #3 gf_client_destroy_recursive /path/to/glusterfs/libglusterfs/src/client_t.c:295 (libglusterfs.so.0+0xc5058)
    #4 client_destroy /path/to/glusterfs/libglusterfs/src/client_t.c:330 (libglusterfs.so.0+0xc60e4)
    ...

  Location is heap block of size 272 at 0x7b440001a180 allocated by thread T7:
    #0 calloc <null> (libtsan.so.0+0x3075a)
    #1 __gf_calloc /path/to/glusterfs/libglusterfs/src/mem-pool.c:151 (libglusterfs.so.0+0x6e42b)
    #2 gf_client_get /path/to/glusterfs/libglusterfs/src/client_t.c:155 (libglusterfs.so.0+0xc571a)
    ...

The problem is that client_destroy() may call client_ctx_del() (which attempts
to lock 'scratch_ctx.lock') via recursive deletion from
gf_client_destroy_recursive(), so destroying the mutex before entering the
recursive deletion is an error. It should be destroyed later - just before
the client context is freed.
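
A hedged sketch of the reordering, with names taken from the traces above
(the exact code may differ):

    gf_client_destroy_recursive(client);     /* may still lock scratch_ctx.lock */
    LOCK_DESTROY(&client->scratch_ctx.lock); /* destroy only after the recursion */
    GF_FREE(client);                         /* now the context can be freed */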

Change-Id: I730a628714d2b404e3f019ae552403da16b51b68
Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
Fixes: #1285
amarts pushed a commit that referenced this pull request Jun 22, 2020
Found with GCC UBsan:

rpcsvc.c:102:36: runtime error: passing zero to ctz(), which is not a valid argument
    #0 0x7fcd1ff6faa4 in rpcsvc_get_free_queue_index /path/to/glusterfs/rpc/rpc-lib/src/rpcsvc.c:102
    #1 0x7fcd1ff81e12 in rpcsvc_handle_rpc_call /path/to/glusterfs/rpc/rpc-lib/src/rpcsvc.c:837
    #2 0x7fcd1ff833ad in rpcsvc_notify /path/to/glusterfs/rpc/rpc-lib/src/rpcsvc.c:1000
    #3 0x7fcd1ff8829d in rpc_transport_notify /path/to/glusterfs/rpc/rpc-lib/src/rpc-transport.c:520
    #4 0x7fcd0dd72f16 in socket_event_poll_in_async /path/to/glusterfs/rpc/rpc-transport/socket/src/socket.c:2502
    #5 0x7fcd0dd8986a in gf_async ../../../../libglusterfs/src/glusterfs/async.h:189
    #6 0x7fcd0dd8986a in socket_event_poll_in /path/to/glusterfs/rpc/rpc-transport/socket/src/socket.c:2543
    #7 0x7fcd0dd8986a in socket_event_handler /path/to/glusterfs/rpc/rpc-transport/socket/src/socket.c:2934
    #8 0x7fcd0dd8986a in socket_event_handler /path/to/glusterfs/rpc/rpc-transport/socket/src/socket.c:2854
    #9 0x7fcd2048aff7 in event_dispatch_epoll_handler /path/to/glusterfs/libglusterfs/src/event-epoll.c:640
    #10 0x7fcd2048aff7 in event_dispatch_epoll_worker /path/to/glusterfs/libglusterfs/src/event-epoll.c:751
    ...

Fix, simplify, and prefer 'unsigned long' as underlying bitmap type.
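
A self-contained sketch of the guard (not the exact patch):

    /* find the first free slot in a bitmap of taken slots */
    static int
    get_free_queue_index(unsigned long bitmap)
    {
        unsigned long free_bits = ~bitmap;
        if (free_bits == 0)
            return -1; /* all slots taken; __builtin_ctzl(0) is undefined */
        return __builtin_ctzl(free_bits);
    }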

Change-Id: If3f24dfe7bef8bc7a11a679366e219a73caeb9e4
Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
Fixes: #1283
amarts pushed a commit that referenced this pull request Jun 22, 2020
Found with GCC ThreadSanitizer:

WARNING: ThreadSanitizer: data race (pid=287943)
  Write of size 4 at 0x00000047dfa0 by thread T4:
    #0 cli_rpc_notify /path/to/glusterfs/cli/src/cli.c:313 (gluster+0x40a6df)
    #1 rpc_clnt_handle_disconnect /path/to/glusterfs/rpc/rpc-lib/src/rpc-clnt.c:821 (libgfrpc.so.0+0x13f04)
    #2 rpc_clnt_notify /path/to/glusterfs/rpc/rpc-lib/src/rpc-clnt.c:882 (libgfrpc.so.0+0x13f04)
    #3 rpc_transport_notify /path/to/glusterfs/rpc/rpc-lib/src/rpc-transport.c:520 (libgfrpc.so.0+0xf070)
    #4 socket_event_poll_err /path/to/glusterfs/rpc/rpc-transport/socket/src/socket.c:1364 (socket.so+0x812c)
    #5 socket_event_handler /path/to/glusterfs/rpc/rpc-transport/socket/src/socket.c:2958 (socket.so+0xc453)
    #6 socket_event_handler /path/to/glusterfs/rpc/rpc-transport/socket/src/socket.c:2854 (socket.so+0xc453)
    #7 event_dispatch_epoll_handler /path/to/glusterfs/libglusterfs/src/event-epoll.c:640 (libglusterfs.so.0+0xcaf23)
    #8 event_dispatch_epoll_worker /path/to/glusterfs/libglusterfs/src/event-epoll.c:751 (libglusterfs.so.0+0xcaf23)
    #9 <null> <null> (libtsan.so.0+0x2d33f)

  Previous read of size 4 at 0x00000047dfa0 by thread T3 (mutexes: write M3587):
    #0 cli_cmd_await_connected /path/to/glusterfs/cli/src/cli-cmd.c:321 (gluster+0x40ca37)
    #1 cli_cmd_process /path/to/glusterfs/cli/src/cli-cmd.c:123 (gluster+0x40cc74)
    #2 cli_batch /path/to/glusterfs/cli/src/input.c:29 (gluster+0x40c2b9)
    #3 <null> <null> (libtsan.so.0+0x2d33f)

  Location is global 'connected' of size 4 at 0x00000047dfa0 (gluster+0x00000047dfa0)

Change-Id: Ie85a8a80a2c5b82252c0c1d45e68ebe9938da2eb
Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
Fixes: #1311