
configure: add an option to build with tcmalloc library #4

Closed
wants to merge 127 commits

Conversation


@amarts amarts commented Nov 19, 2019

With this patch, one can now use '--enable-tcmalloc' while running
configure and see that the resulting glusterfs is linked with
libtcmalloc.

[atumball@local build]$ ldd /usr/local/sbin/glusterfs | grep tcmalloc
libtcmalloc.so.4 => /lib64/libtcmalloc.so.4 (0x00007feec0b87000)

Once we establish standard performance numbers with and without this
option, we will see how to make it the default.

Updates: #237
Change-Id: I3377f57bfe4e17f101a212e1914a6d3c1687d528
Signed-off-by: Amar Tumballi <amarts@kadalu.io>

amarts and others added 30 commits September 6, 2019 11:20
fixes for CI/CD on github

* fix for build-aux/pkg-config file (autoconf)
* Fix for rpc/xdr/src/Makefile.am (make clean)

Change-Id: I9dbbaceaa379ca15b999f91bedfa4fd8c023aa77
Signed-off-by: Amar Tumballi <amarts@gmail.com>
Problem: After setting transport-type to inet6 and passing an IPv6
         transport.socket.bind-address in glusterd.vol, clients are
         not started.

Solution: Update the address-family based on the remote-address for
          all gluster client processes

Change-Id: Iaa3588cd87cebc45231bfd675745c1a457dc9b31
Fixes: bz#1747746
Credits: Amgad Saleh <amgad.saleh@nokia.com>
Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
Problem:
Since commit 600ba94, shd starts
healing as soon as it is toggled from disabled to enabled. This was
causing the following line in the .t to fail on a 'fast' machine (always
on my laptop and sometimes on the jenkins slaves).

EXPECT_NOT "^0$" get_pending_heal_count $V0

because by the time shd was disabled, the heal was already completed.

Fix:
Increase the number of files to be healed and make it a variable called
FILE_COUNT, so it can be bumped up further if the machines become even
faster. Also created pending metadata heals to increase the time taken
to heal a file.

fixes: bz#1748744
Change-Id: I5a26b08e45b8c19bce3c01ce67bdcc28ed48198d
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
Problem:
While running markdown-link-checker, it was
observed that a large number of
404 links were present in the documentation,
which lives in the project as markdown files.

This was caused by the following:
1. Repos being removed.
2. Typos in markdown links.
3. Restructuring of directories.

Solution:
Fix all the 404 links present in the project.

fixes: bz#1746810
Change-Id: I30de745f848fca2e9c92eb7493f74738f0890ed9
Signed-off-by: kshithijiyer <kshithij.ki@gmail.com>
These functions do not seem to be in use.

Change-Id: Ie76baf2a9727b9ba0e66f234226b1e62788245f2
updates: bz#1193929
Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
In various places, we can re-use knowledge of the string length
or the result of snprintf() and the like, instead of calling strlen().

Change-Id: I4c9b1decf1169b3f8ac83699a0afbd7c38fad746
updates: bz#1193929
Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
We were not passing xattr_req when doing a name self-heal
or a metadata heal. Because of this, some xdata
was missing, which caused I/O errors.

Change-Id: Ibfb1205a7eb0195632dc3820116ffbbb8043545f
Fixes: bz#1728770
Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
We were unconditionally cleaning up the graph when we got
child_down followed by parent_down. But this is prone to a
race condition when some of the bricks are already disconnected.
In that case, even before the last child_down is executed in the
client xlator code, we might have freed the graph, because the
child_down event was already received.

To fix this race, we have introduced a check to see if all client
xlators have cleared their reconnect chain and called child_down
for the last time.

Change-Id: I7d02813bc366dac733a836e0cd7b14a6fac52042
fixes: bz#1727329
Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
There is no need to clean up the .x files.

Fixes: bz#1743094
Change-Id: I89d8deb3939c83069709c701cb8f1972e3746168
Signed-off-by: Amar Tumballi <amarts@gmail.com>
Problem: An IPv6 hostname address is not parsed correctly in the
         function glusterd_check_brick_order

Solution: Update the code to parse the hostname address

Change-Id: Ifb2f83f9c6e987b2292070e048e97eeb51b728ab
Fixes: bz#1747746
Credits: Amgad Saleh <amgad.saleh@nokia.com>
Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
Problem:
If update size/version is not successful on the file, updates on the
same stripe could lead to data corruption if the earlier unaligned
write is not successful on all the bricks. The application won't have
any knowledge of this because update size/version happens in the
background.

Fix:
Fail fsync/flush on fds that are opened before update-size-version
went bad.

fixes: bz#1748836
Change-Id: I9d323eddcda703bd27d55f340c4079d76e06e492
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Fixed the following coverity issue in both flush/fsync
>>>     CID 1404964:  Null pointer dereferences  (REVERSE_INULL)
>>>     Null-checking "fd" suggests that it may be null, but it has already
been dereferenced on all paths leading to the check.
>>>         if (fd != NULL) {
>>>           fop->fd = fd_ref(fd);
>>>             if (fop->fd == NULL) {
>>>                 gf_msg(this->name, GF_LOG_ERROR, 0,
>>>                        "Failed to reference a "
>>>                        "file descriptor.");

fixes: bz#1748836
Change-Id: I19c05d585e23f8fbfbc195d1f3775ec528eed671
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
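
A minimal, self-contained C sketch of the REVERSE_INULL pattern flagged
above (stand-in types, not the actual ec code): the NULL check must come
before the first dereference, or be dropped if the pointer is known
non-NULL on all paths.

    #include <stdio.h>

    struct fd_demo { int refcount; };

    /* Coverity's complaint in miniature: dereferencing first and
     * NULL-checking afterwards makes the check useless. Checking
     * before any dereference is one way to resolve it. */
    static int take_ref(struct fd_demo *fd)
    {
        if (fd == NULL)        /* check BEFORE any dereference */
            return -1;
        fd->refcount++;        /* safe: fd is non-NULL here */
        return 0;
    }

    int main(void)
    {
        struct fd_demo fd = { 0 };
        return take_ref(&fd);
    }
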
fixes: #721
Change-Id: I5333540e3c635ccf441cf1f4696e4c8986e38ea8
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
…tations

1404965 - Null pointer dereference
1404316 - Program hangs
1401715 - Program hangs
1401713 - Program hangs

Updates: bz#789278
Change-Id: I6e6575daafcb067bc910445f82a9d564f43b75a2
Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
If the heal from the next brick starts after the first brick completes
its heal, then an opendir on the brick can change atime, leading to
failure of the test. When ctime is disabled, it is better to just check
that mtime is the same after the heal.

fixes: bz#1751134
Change-Id: Ia03e30fd547e6bbe85c1e299845ffa122f3a2692
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Fixes: bz#1708929
Change-Id: I9cc81a9047ff874df752ca5552e00bf033485bd8
Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
Problem:
1) Mount-1 and Mount-2 both try to acquire a lock on 'dir1'.
2) The lock is granted to Mount-1 on brick-0. Mount-2 gets EAGAIN on
   brick-0, which leads to a blocking lock on brick-0. (It doesn't
   matter what happens on Mount-2 from here on.)
3) Mount-1 gets a lock-contention notification and marks lock->release
   as true.
4) A new fop comes on 'dir1' on Mount-1, which is put in the frozen
   list as lock->release is set to true.
5) The lock acquisition from step 2 fails because 3 bricks went down
   in the 4+2 setup.

The fop on Mount-1 which was put in the frozen list will hang, because
no codepath will move it from the frozen list to any other list and the
lock will not be retried.

Fix:
Don't set lock->release to true if lock is not acquired at the time of
lock-contention-notification

fixes: bz#1743573
Change-Id: Ie6630db8735ccf372cc54b873a3a3aed7a6082b7
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
In the test tests/bugs/gfapi/bug-1447266/bug-1447266.t, the
actual file created is
tests/bugs/gfapi/bug-1447266/bug-1447266,
which is not cleaned up later.

fixes: bz#1750618
Change-Id: I93120418e54b95018a7213d106a1f1c990766281
Signed-off-by: Sheetal Pamecha <spamecha@redhat.com>
There could be cases (either due to insufficient memory or a
corrupted mem-pool) in which frame creation fails. Bail out
with an error in such cases.

Change-Id: I8cc0a5852f6f04d2bac991e4eb79ecb42577da11
Fixes: bz#1748448
Signed-off-by: Soumya Koduri <skoduri@redhat.com>
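
A hedged sketch of the bail-out described above, with a stand-in
allocator in place of gluster's create_frame() (names here are
illustrative, not the actual gfapi/rpc code):

    #include <stdlib.h>

    struct frame_demo { int op; };

    /* Stand-in for create_frame(): it can return NULL under memory
     * pressure or with a corrupted mem-pool. */
    static struct frame_demo *create_frame_demo(void)
    {
        return calloc(1, sizeof(struct frame_demo));
    }

    static int submit_fop(void)
    {
        struct frame_demo *frame = create_frame_demo();
        if (frame == NULL)  /* the added bail-out: report an error */
            return -1;      /* instead of dereferencing NULL later */
        /* ... wind the fop using frame ... */
        free(frame);
        return 0;
    }
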
glfs_init, when called with a volume name prefixed by '/',
sets errno to 0. Setting errno to EINVAL resolves the issue.
Also, volname is a parameter to glfs_new;
thus, validate volname in glfs_new itself and
return EINVAL from that function.

fixes: bz#1507896

Change-Id: I0d4d2423e26cc07644d50ec8cce788ecc639203d
Signed-off-by: Sheetal Pamecha <spamecha@redhat.com>
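
A self-contained sketch of the validation described above; this is a
stand-in helper showing the intent, not the actual gfapi code:

    #include <errno.h>
    #include <stddef.h>

    /* Hypothetical check mirroring what glfs_new() is described to
     * do: an invalid volume name must leave errno at EINVAL, not 0. */
    static int volname_is_valid(const char *volname)
    {
        if (volname == NULL || volname[0] == '\0' || volname[0] == '/') {
            errno = EINVAL;
            return 0;
        }
        return 1;
    }
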
… a reconnect

Bricks clean up any granted locks after a client disconnects, and
currently these locks are not healed after a reconnect. This means
that post reconnect a competing process could be granted a lock even
though the first process which was granted locks has not unlocked. By
not re-opening fds, subsequent operations on such fds will fail,
forcing the application to close the current fd and reopen a new one.
This way we prevent any silent corruption.

A new option "client.strict-locks" is introduced to control this
behaviour. This option is set to "off" by default.

Change-Id: Ieed545efea466cb5e8f5a36199aa26380c301b9e
Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
updates: bz#1694920
This test case is failing in upstream. Marking this test as
bad for now.

Change-Id: I014c67628c14683c32a3c1dd770b10aaf35ad4cc
Updates: bz#1752331
Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
In many cases, we will not be logging at all, since the log level
is irrelevant. In that case, we can just bail out.
It is worth doing this check before the argument check, which may not
be needed in this case.

Change-Id: Ia90a38bb2a49b09bfea5b5abc8b5d0c3bab4e9ac
updates: bz#1193929
Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
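
A small C sketch of the early exit described above (levels and field
names are illustrative, not the exact libglusterfs ones):

    #include <stdarg.h>
    #include <stdio.h>

    enum level { LVL_ERROR = 0, LVL_INFO = 5, LVL_DEBUG = 8 };

    static enum level configured_level = LVL_INFO;

    /* If the message is filtered by level, return before validating
     * or formatting any arguments: the cheapest check runs first. */
    static void log_msg(enum level lvl, const char *fmt, ...)
    {
        va_list ap;

        if (lvl > configured_level)   /* bail out: nothing to log */
            return;

        va_start(ap, fmt);
        vfprintf(stderr, fmt, ap);
        va_end(ap);
    }
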
After add-brick and rebalance, the ctime xattr is not present
on rebalanced directories on new brick. This patch fixes the
same.

Note that ctime still doesn't support consistent time across
distribute sub-volume.

This patch also fixes the in-memory inconsistency of time attributes
when metadata is self healed.

Change-Id: Ia20506f1839021bf61d4753191e7dc34b31bb2df
fixes: bz#1734026
Signed-off-by: Kotresh HR <khiremat@redhat.com>
server.sin_family was set to AF_INET while creating the socket
connection; this failed if the input address was IPv6 (`::1`).

With this patch, sin_family is set by reading the ai_family of
`getaddrinfo` result.

Fixes: bz#1752330
Change-Id: I499f957b432842fa989c698f6e5b25b7016084eb
Signed-off-by: Aravinda VK <avishwan@redhat.com>
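
A minimal sketch of the approach described above, using the standard
getaddrinfo() API (the function name and surrounding plumbing are
illustrative):

    #include <netdb.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/types.h>

    /* Derive the address family from getaddrinfo() instead of
     * hard-coding AF_INET, so an IPv6 address like ::1 also works. */
    static int resolve_family(const char *host, struct sockaddr_storage *out,
                              socklen_t *out_len)
    {
        struct addrinfo hints, *res = NULL;
        int family;

        memset(&hints, 0, sizeof(hints));
        hints.ai_family = AF_UNSPEC;      /* accept IPv4 or IPv6 */
        hints.ai_socktype = SOCK_STREAM;

        if (getaddrinfo(host, NULL, &hints, &res) != 0 || res == NULL)
            return -1;

        memcpy(out, res->ai_addr, res->ai_addrlen);
        *out_len = res->ai_addrlen;
        family = res->ai_family;          /* what sin_family should be */
        freeaddrinfo(res);
        return family;                    /* AF_INET or AF_INET6 */
    }
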
File truncate operations during a migration were not handled properly.
This has been fixed.

Change-Id: Ic642d257e893641236a4a21ab69fcc7a569dd70a
Fixes: bz#1745967
Signed-off-by: N Balachandran <nbalacha@redhat.com>
posix-metadata.c: 462 in posix_set_mdata_xattr()
...
460         GF_VALIDATE_OR_GOTO(this->name, time, out);
461
>>>     CID 1405665:  Control flow issues  (DEADCODE)
>>>     Execution cannot reach the expression "flag->atime" inside this
>>>     statement: "if (update_utime && (flag->...".
462         if (update_utime && (flag->ctime && !time) && (flag->atime && !u_atime) &&

Change-Id: Id31d81d04ea2785a669eafe0dc1307303cb2271b
updates: bz#789278
Signed-off-by: Kotresh HR <khiremat@redhat.com>
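
A self-contained illustration of why Coverity reports DEADCODE here:
once 'time' has been validated as non-NULL (GF_VALIDATE_OR_GOTO in the
real code), "!time" can never be true, so the clause after it is
unreachable. The demo below uses stand-in flags:

    #include <stdio.h>

    static void demo(int *time, int ctime_flag, int atime_flag)
    {
        if (time == NULL)            /* mirrors GF_VALIDATE_OR_GOTO */
            return;

        if (ctime_flag && !time) {   /* always false past the check */
            if (atime_flag)          /* unreachable: DEADCODE */
                puts("never printed");
        }
    }

    int main(void)
    {
        int t = 0;
        demo(&t, 1, 1);
        return 0;
    }
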
The arguments for gluster-mountbroker are missing the help="< Some Text>"
parameter wherever add_argument() is used, due to which the
gluster-mountbroker help is missing a brief description of what each
argument does in the help of {add,setup,remove}, as shown below:

usage: gluster-mountbroker remove [-h] [--volume VOLUME] [--user USER]

optional arguments:
  -h, --help       show this help message and exit
  --volume VOLUME
  --user USER
usage: gluster-mountbroker add [-h] volume user

positional arguments:
  volume
  user

optional arguments:
  -h, --help  show this help message and exit
usage: gluster-mountbroker setup [-h] mount_root group

positional arguments:
  mount_root
  group

optional arguments:
  -h, --help  show this help message and exit

fixes: bz#1728683

Change-Id: I7eabcd2c103f01e40160ba35500b0a4e5c9f5e7a
Signed-off-by: kshithijiyer <kshithij.ki@gmail.com>
In 3 cases, there was a memory allocation and zeroing, followed
directly by populating it with content. Replaced with memory
allocation that did not zero the memory.

Change-Id: I4fbb5c924fb3a144e415d2368126b784dde760ea
updates: bz#1193929
Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
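
The change described above, sketched with the libc calls standing in
for GF_CALLOC/GF_MALLOC:

    #include <stdlib.h>
    #include <string.h>

    /* When every byte of the buffer is overwritten immediately after
     * allocation, zeroing it first is wasted work. */
    static char *copy_buf(const char *src, size_t len)
    {
        char *buf = malloc(len);    /* was: calloc(1, len) */
        if (buf != NULL)
            memcpy(buf, src, len);  /* fully populated right away */
        return buf;
    }
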
Fixed typos and reworded log messages for clarity.

Change-Id: I46f616ce7d3eb993c77a5812e8bc044e5f283354
Fixes: bz#1753859
Signed-off-by: N Balachandran <nbalacha@redhat.com>
harigowtham and others added 17 commits November 13, 2019 09:04
Problem: One of the prints in the script has been using
%i as the format for printing, which doesn't work.

Fix: Use %s as the format in place of %i.

Fixes: bz#1764129
Change-Id: I4480ede7bf62906ddedbe5f880a1e89c76946641
Signed-off-by: Hari Gowtham <hgowtham@redhat.com>
posix_xattr_fill() is called from several POSIX functions.
Made minor changes to it and the functions called from it:
1. Dict functions to use known lengths (dict_getn() instead of dict_get(), etc.)
2. Re-ordered some static char[] arrays, to account (hopefully)
for the frequency of xattr usage (based on grep in the code...)
3. Before strcmp(), check if the strings lengths match.
4. Removed some dead code.

Hopefully, no functional changes.

Change-Id: I510c0d2785e54ffe0f82c4c449782f2302d63a32
updates: bz#1193929
Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
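
Point 3 above, in isolation (a generic helper, not the actual
posix_xattr_fill() code):

    #include <string.h>

    /* With both lengths already known, a cheap integer comparison
     * filters out most candidates before strcmp() ever runs. */
    static int keys_equal(const char *a, size_t a_len,
                          const char *b, size_t b_len)
    {
        return a_len == b_len && strcmp(a, b) == 0;
    }
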
Minor changes - remove unused functions and unused variables.
Switch dict functions to use sizen() when relevant.

Change-Id: I737ce04a4beaf4df9b1eea25a90100d315627c14
updates: bz#1193929
Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
- libgfchangelog is simplified by removing unnecessary API Class
- Merged Agent logic into Worker instead of running Worker and Agent as
  two separate processes and maintaining RPC between Worker and Agent.
- Geo-rep command Pause and Resume will continue without any changes.
  But Agent functionality also gets paused with that.

Updates: #755
Change-Id: Ie2c00fa7dddf21f180f0649e0aaf084d29023c98
Signed-off-by: Aravinda VK <avishwan@redhat.com>
updates: bz#1193929
Signed-off-by: Csaba Henk <csaba@redhat.com>
Change-Id: I12cbe1d87f60fb497654d0e13e12171940867f76
By default, the script mangles statedump records
to JSON, which has two benefits:
- easier machine processing
- more friendly with line oriented tools, as one
  line will correspond to one record

'--format=memstat' is also available, which displays
memory allocation types along with the size of their allocations.

Change-Id: I1685d3afcea6009fbcfafb33798f85bcd645c82f
updates: bz#1193929
Signed-off-by: Csaba Henk <csaba@redhat.com>
- Align structures
- gf_iobuf_get_pagesize() will now also return the index, reducing the need
for an additional very similar call.
- Removal of an inefficient loop I've inadvertently added previously.
It was harmless, but just inefficient.
- New pool initialization does not need to be done under lock - no
one can touch that pool yet, so no need to protect it.

Change-Id: I61c50f2f14fa79edc131e515e9615a9928ee2dca
updates: bz#1193929
Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
It removes ~2% of the object size, and the removed code did not add
meaningful checks or readability (IMHO).

Change-Id: I54346c2af47dcab13bd8d5c8ad52fc765e6afad4
updates: bz#1193929
Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
1. Both the read and write tests required writing first: either just
writing (write test) or writing and then reading (read test).
So the code is now unified.
2. There's no reason to read zeros from /dev/zero. Just use a
CALLOC'ed buffer.
I don't think we should read and write zeros at all, but I did not
change that yet (I think compression and/or dedup will offset results).

It appears neither read-perf nor write-perf were tested, so added
basic tests for them.

Change-Id: I24b1f249fa0335ed652a8982e99c0687d940230e
updates: bz#1193929
Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
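
Point 2 above, sketched with plain libc (the block size is
illustrative):

    #include <stdlib.h>

    /* A zero-filled buffer from calloc() replaces a read() loop over
     * /dev/zero; the kernel round-trip buys nothing here. */
    static char *make_zero_block(size_t block_size)
    {
        return calloc(1, block_size);  /* already all zeros */
    }
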
Problem: At the time of cleaning up the rpc object, SSL-specific
         data is not freed, so it becomes a leak.

Solution: To avoid the leak, clean up the SSL-specific data at the
          time of cleaning up the rpc object

Credits: l17zhou <cynthia.zhou@nokia-sbell.com.cn>
Fixes: bz#1768407
Change-Id: I37f598673ae2d7a33c75f39eb8843ccc6dffaaf0
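
A hedged sketch of the teardown described above: the OpenSSL calls are
real, but where and how this hooks into the rpc-object cleanup is
illustrative:

    #include <openssl/ssl.h>

    /* Release the per-connection SSL object and its context while the
     * transport/rpc object is being cleaned up, so neither leaks. */
    static void teardown_ssl(SSL *ssl, SSL_CTX *ctx)
    {
        if (ssl != NULL) {
            SSL_shutdown(ssl);   /* best-effort close_notify */
            SSL_free(ssl);       /* frees the connection object */
        }
        if (ctx != NULL)
            SSL_CTX_free(ctx);   /* frees the context */
    }
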
Avoid one function call to set the gfid_path in buffer

Change-Id: If9b95801b05c34d262fac9a275492c794d12bf58
Updates #748
Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
Problem: When one of the nodes in the cluster is down,
rebalance status does not display detailed
information.

Cause: In glusterd_volume_rebalance_use_rsp_dict()
we aggregate the rsp from all the nodes into a
dictionary and send it to the cli for printing. While
assigning an index to the keys, we consider all the
peers instead of only the peers which are
up. Because of this, the index does not reach 1;
while parsing the rsp, the cli is unable to find the status-1
key in the dictionary and exits without printing
any information.

Solution: The simplest fix for this, without much
code change, is to continue to look for other keys
when the status-1 key is not found.

fixes: bz#1764119

Change-Id: I0062839933c9706119eb85416256eade97e976dc
Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
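
A hypothetical sketch of the fix's shape: keep probing subsequent
"status-<i>" keys instead of stopping at the first miss (the lookup
callback stands in for the dict_get call on the aggregated rsp dict):

    #include <stdio.h>

    static void print_status(int peer_count,
                             const char *(*lookup)(const char *key))
    {
        char key[32];
        int i;

        for (i = 1; i <= peer_count; i++) {
            snprintf(key, sizeof(key), "status-%d", i);
            const char *val = lookup(key);
            if (val == NULL)
                continue;   /* peer was down: skip, don't bail out */
            printf("%s = %s\n", key, val);
        }
    }
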
Configure the list of gluster servers in the key
GLUSTERD_BRICK_SERVERS at the time of the GETSPEC RPC CALL,
and access the value on the client side to update the volfile
server list, so that the client can connect to the
next volfile server in case the current volfile server
is down.

Updates #741
Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>

Change-Id: I23f36ddb92982bb02ffd83937a8bd8a2c97e8104
The last major change to the file happened 2 years and 5 months ago.
For the better health of the project, to motivate the active
contributors, and to keep the file up-to-date with who is working
on what, it is critical for us to keep refreshing the file at least
once every 2 years.

This is one such effort.

Highlighted changes:

* Moving both Jeff and Vijay from Maintainers to 'Special Thanks'
  section.
* Moving Amar and Xavi as Maintainers, and Atin as Peer, mainly
  looking at the activities (patches, reviews, merges) across
  the codebase, and also the contributions in discussions.
* Moving Shyam and Niels out of Peer list highlighting the 6+
  months of changed priorities.

* Changed Xavi's contact from Datalab's to Red Hat's
* Changed Amar's contact from Red Hat's to his Personal.

* Removed:
  - Block Device (BD)
  - Experimental (RIO / JBR)
  - Gluster Object
  - Gluster Hadoop Plugin
  - Nagios Monitoring
  - GlusterD2 (and renamed glusterd1 to glusterd)

* Marked 'NFS (gluster-NFS)' component as 'Deprecated / Orphan'

* Moved a few people who stopped major contributions (mainly
  because of changing companies or changing projects inside
  their own company) to the 'Special Thanks' section.

* Additions:
  - Kotresh added as Peer in Posix
  - Raghavendra Gowdappa added as Peer in FUSE Bridge,
    Maintainer of readdir-ahead
  - Nithya Added as peer in readdir-ahead
  - Hari Gowtham as Peer in Quota, Maintainer in Releases
  - Yaniv Kaul for xxhash
  - Rinku Kothiya added as Peer in releases.
  - Sheetal Pamecha and Shwetha Acharya added as Peers in
    packaging for Debian/Ubuntu and OpenSUSE.

Updates: bz#1193929
Change-Id: I0d6eccfee4306e26cdbc2b94f43ac493e2c25a61
Signed-off-by: Amar Tumballi <amarts@gmail.com>
To avoid processing "TRANSLATOR INFO" and "BARRIER" if the graph is
not ready; also see commit ee630e2.

Updates: bz#1769712
Signed-off-by: Xie Changlong <xiechanglong@cmss.chinamobile.com>
Change-Id: Ibd446a35962206d3689667cda7e6712d72e4ec2f
fixes: bz#1771365
Change-Id: Id31687c4704cbbfea1084f912ed0420c7d1cc1ec
Signed-off-by: Xi Jinyu <xijinyu@cmss.chinamobile.com>
With this patch, one can now use '--enable-tcmalloc' while running
configure and see that the resulting glusterfs is linked with
libtcmalloc.

[atumball@local build]$ ldd /usr/local/sbin/glusterfs | grep tcmalloc
	libtcmalloc.so.4 => /lib64/libtcmalloc.so.4 (0x00007feec0b87000)

Once we establish standard performance numbers with and without this
option, we will see how to make it the default.

Updates: #237
Change-Id: I3377f57bfe4e17f101a212e1914a6d3c1687d528
Signed-off-by: Amar Tumballi <amarts@redhat.com>
@amarts amarts closed this Mar 24, 2020
amarts pushed a commit that referenced this pull request Jun 22, 2020
Found with GCC ASan:

Direct leak of 202 byte(s) in 2 object(s) allocated from:
    #0 0x7fc6c6ef0667 in __interceptor_malloc (/usr/lib64/libasan.so.6+0xb0667)
    #1 0x7fc6c6bd145b in __gf_malloc /path/to/glusterfs/libglusterfs/src/mem-pool.c:175
    #2 0x7fc6c6bd17a3 in gf_vasprintf /path/to/glusterfs/libglusterfs/src/mem-pool.c:223
    #3 0x7fc6c6bd1993 in gf_asprintf /path/to/glusterfs/libglusterfs/src/mem-pool.c:243
    #4 0x7fc6b0dc92f6 in init /path/to/glusterfs/xlators/cluster/afr/src/afr.c:590
    ...

Change-Id: I29feb1d30a045fb70472758e6ed4e195888090b2
Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
Fixes: #1278
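
The shape of the leak, sketched with libc asprintf() standing in for
gf_asprintf() (the real fix lives in afr's init(); this only shows the
pattern):

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <stdlib.h>

    static int demo(const char *base)
    {
        char *name = NULL;

        if (asprintf(&name, "%s-replicate-0", base) < 0)
            return -1;
        /* ... use name while setting up the translator ... */
        free(name);   /* the free that the leaking paths missed */
        return 0;
    }
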
amarts pushed a commit that referenced this pull request Jun 22, 2020
Found with GCC's address sanitizer:

==67190==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 24624 byte(s) in 6 object(s) allocated from:
    #0 0x7f62535c0837 in __interceptor_calloc (/usr/lib64/libasan.so.6+0xb0837)
    #1 0x7f62532a1690 in __gf_default_calloc glusterfs/mem-pool.h:122
    #2 0x7f62532a20ca in __gf_calloc /path/to/glusterfs/libglusterfs/src/mem-pool.c:144
    #3 0x7f62532c8128 in gf_store_iter_new /path/to/glusterfs/libglusterfs/src/store.c:511
    #4 0x7f623e2f9ed7 in glusterd_store_retrieve_bricks /path/to/glusterfs/xlators/mgmt/glusterd/src/glusterd-store.c:2389

Direct leak of 8208 byte(s) in 2 object(s) allocated from:
    #0 0x7f62535c0837 in __interceptor_calloc (/usr/lib64/libasan.so.6+0xb0837)
    #1 0x7f62532a1690 in __gf_default_calloc glusterfs/mem-pool.h:122
    #2 0x7f62532a20ca in __gf_calloc /path/to/glusterfs/libglusterfs/src/mem-pool.c:144
    #3 0x7f62532c8128 in gf_store_iter_new /path/to/glusterfs/libglusterfs/src/store.c:511
    #4 0x7f623e2f9cf0 in glusterd_store_retrieve_bricks /path/to/glusterfs/xlators/mgmt/glusterd/src/glusterd-store.c:2363
    #5 0x7fff5cb70bcf  ([stack]+0x15bcf)
    #6 0x7f623e309113 in glusterd_store_retrieve_volumes /path/to/glusterfs/xlators/mgmt/glusterd/src/glusterd-store.c:3505
    #7 0xfffeb96e61d  (<unknown module>)
    #8 0x7f623e4586d7  (/usr/lib64/glusterfs/9dev/xlator/mgmt/glusterd.so+0x2f86d7)

Change-Id: I9b2a543dc095f4fa739cd664fd4d608bf8c87d60
Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
Fixes: #1263
amarts pushed a commit that referenced this pull request Jun 22, 2020
Found with GCC ThreadSanitizer:

WARNING: ThreadSanitizer: use of an invalid mutex (e.g. uninitialized or destroyed) (pid=188590)
    #0 pthread_mutex_lock <null> (libtsan.so.0+0x528ac)
    #1 client_ctx_del /path/to/glusterfs/libglusterfs/src/client_t.c:535 (libglusterfs.so.0+0xc681a)
    #2 client_destroy_cbk /path/to/glusterfs/xlators/protocol/server/src/server.c:944 (server.so+0xaf6e)
    #3 gf_client_destroy_recursive /path/to/glusterfs/libglusterfs/src/client_t.c:295 (libglusterfs.so.0+0xc5058)
    #4 client_destroy /path/to/glusterfs/libglusterfs/src/client_t.c:330 (libglusterfs.so.0+0xc60e4)
    ...

  Location is heap block of size 272 at 0x7b440001a180 allocated by thread T7:
    #0 calloc <null> (libtsan.so.0+0x3075a)
    #1 __gf_calloc /path/to/glusterfs/libglusterfs/src/mem-pool.c:151 (libglusterfs.so.0+0x6e42b)
    #2 gf_client_get /path/to/glusterfs/libglusterfs/src/client_t.c:155 (libglusterfs.so.0+0xc571a)
    ...

The problem is that client_destroy() may call client_ctx_del() (which
attempts to lock 'scratch_ctx.lock') via recursive deletion from
gf_client_destroy_recursive(), so destroying the mutex before entering
the recursive deletion is an error. It should be destroyed later, just
before the client context is freed.

Change-Id: I730a628714d2b404e3f019ae552403da16b51b68
Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
Fixes: #1285
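
An illustrative pthreads sketch of that ordering, using a stand-in
struct rather than the real client_t:

    #include <pthread.h>
    #include <stdlib.h>

    struct demo_client {
        pthread_mutex_t lock;
        /* ... per-client contexts ... */
    };

    /* The recursive cleanup still takes the lock, so the mutex must
     * remain valid until this returns. */
    static void cleanup_contexts(struct demo_client *c)
    {
        pthread_mutex_lock(&c->lock);
        /* ... delete per-client contexts ... */
        pthread_mutex_unlock(&c->lock);
    }

    static void client_destroy_demo(struct demo_client *c)
    {
        cleanup_contexts(c);              /* before destroying the mutex */
        pthread_mutex_destroy(&c->lock);  /* only now is destroy safe */
        free(c);                          /* context freed last */
    }
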
amarts pushed a commit that referenced this pull request Jun 22, 2020
Found with GCC UBsan:

rpcsvc.c:102:36: runtime error: passing zero to ctz(), which is not a valid argument
    #0 0x7fcd1ff6faa4 in rpcsvc_get_free_queue_index /path/to/glusterfs/rpc/rpc-lib/src/rpcsvc.c:102
    #1 0x7fcd1ff81e12 in rpcsvc_handle_rpc_call /path/to/glusterfs/rpc/rpc-lib/src/rpcsvc.c:837
    #2 0x7fcd1ff833ad in rpcsvc_notify /path/to/glusterfs/rpc/rpc-lib/src/rpcsvc.c:1000
    #3 0x7fcd1ff8829d in rpc_transport_notify /path/to/glusterfs/rpc/rpc-lib/src/rpc-transport.c:520
    #4 0x7fcd0dd72f16 in socket_event_poll_in_async /path/to/glusterfs/rpc/rpc-transport/socket/src/socket.c:2502
    #5 0x7fcd0dd8986a in gf_async ../../../../libglusterfs/src/glusterfs/async.h:189
    #6 0x7fcd0dd8986a in socket_event_poll_in /path/to/glusterfs/rpc/rpc-transport/socket/src/socket.c:2543
    #7 0x7fcd0dd8986a in socket_event_handler /path/to/glusterfs/rpc/rpc-transport/socket/src/socket.c:2934
    #8 0x7fcd0dd8986a in socket_event_handler /path/to/glusterfs/rpc/rpc-transport/socket/src/socket.c:2854
    #9 0x7fcd2048aff7 in event_dispatch_epoll_handler /path/to/glusterfs/libglusterfs/src/event-epoll.c:640
    #10 0x7fcd2048aff7 in event_dispatch_epoll_worker /path/to/glusterfs/libglusterfs/src/event-epoll.c:751
    ...

Fix, simplify, and prefer 'unsigned long' as the underlying bitmap type.

Change-Id: If3f24dfe7bef8bc7a11a679366e219a73caeb9e4
Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
Fixes: #1283
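
A small sketch of the guard (the queue semantics are illustrative;
__builtin_ctzl() is the GCC builtin the report refers to as ctz()):

    /* __builtin_ctzl() is undefined for zero, so test the bitmap
     * first. 'unsigned long' matches the type preferred above. */
    static int free_queue_index(unsigned long in_use_map)
    {
        unsigned long free_map = ~in_use_map;

        if (free_map == 0)
            return -1;                    /* every queue busy: no ctz(0) */
        return __builtin_ctzl(free_map);  /* lowest clear bit of in_use */
    }
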
amarts pushed a commit that referenced this pull request Jun 22, 2020
Found with GCC ThreadSanitizer:

WARNING: ThreadSanitizer: data race (pid=287943)
  Write of size 4 at 0x00000047dfa0 by thread T4:
    #0 cli_rpc_notify /path/to/glusterfs/cli/src/cli.c:313 (gluster+0x40a6df)
    #1 rpc_clnt_handle_disconnect /path/to/glusterfs/rpc/rpc-lib/src/rpc-clnt.c:821 (libgfrpc.so.0+0x13f04)
    #2 rpc_clnt_notify /path/to/glusterfs/rpc/rpc-lib/src/rpc-clnt.c:882 (libgfrpc.so.0+0x13f04)
    #3 rpc_transport_notify /path/to/glusterfs/rpc/rpc-lib/src/rpc-transport.c:520 (libgfrpc.so.0+0xf070)
    #4 socket_event_poll_err /path/to/glusterfs/rpc/rpc-transport/socket/src/socket.c:1364 (socket.so+0x812c)
    #5 socket_event_handler /path/to/glusterfs/rpc/rpc-transport/socket/src/socket.c:2958 (socket.so+0xc453)
    #6 socket_event_handler /path/to/glusterfs/rpc/rpc-transport/socket/src/socket.c:2854 (socket.so+0xc453)
    #7 event_dispatch_epoll_handler /path/to/glusterfs/libglusterfs/src/event-epoll.c:640 (libglusterfs.so.0+0xcaf23)
    #8 event_dispatch_epoll_worker /path/to/glusterfs/libglusterfs/src/event-epoll.c:751 (libglusterfs.so.0+0xcaf23)
    #9 <null> <null> (libtsan.so.0+0x2d33f)

  Previous read of size 4 at 0x00000047dfa0 by thread T3 (mutexes: write M3587):
    #0 cli_cmd_await_connected /path/to/glusterfs/cli/src/cli-cmd.c:321 (gluster+0x40ca37)
    #1 cli_cmd_process /path/to/glusterfs/cli/src/cli-cmd.c:123 (gluster+0x40cc74)
    #2 cli_batch /path/to/glusterfs/cli/src/input.c:29 (gluster+0x40c2b9)
    #3 <null> <null> (libtsan.so.0+0x2d33f)

  Location is global 'connected' of size 4 at 0x00000047dfa0 (gluster+0x00000047dfa0)

Change-Id: Ie85a8a80a2c5b82252c0c1d45e68ebe9938da2eb
Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
Fixes: #1311
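
One hedged way to repair such a race, sketched with C11 atomics (the
in-tree fix may instead extend the existing mutex/condvar protocol;
this only shows the data-race repair on the flag itself):

    #include <stdatomic.h>

    static atomic_int connected;

    static void rpc_notify_connected(int up)    /* writer (thread T4) */
    {
        atomic_store(&connected, up);
    }

    static int cli_is_connected(void)           /* reader (thread T3) */
    {
        return atomic_load(&connected);
    }
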