Skip to content

Commit

Permalink
IMPALA-3501: ee tests: detect build type and support different timeou…
Browse files Browse the repository at this point in the history
…ts based on the same

Impala compiled with the address sanitizer, or compiled with code
coverage, runs through code paths much slower. This can cause end-to-end
tests that pass on a non-ASAN or non-code coverage build to fail. Some
examples include IMPALA-2721, IMPALA-2973, and IMPALA-3501. These
classes of failures tend always to involve some time-sensitive condition
that fails to succeed under such "slow builds".

The works-around in the past have been to simply increase the timeout.
The problem with this approach is that it relaxes conditions for tests
on builds that see the field--i.e., release builds--for builds that
never will--i.e., ASAN and code coverage.

This patch fixes that problem by allowing test authors to set timeout
values based on a *specific* build type. The author may choose timeouts
with a default value, and different timeouts for either or both
so-called "slow builds": ASAN and code coverage.

We detect the so-called "specific build type" by inspecting the binary
expected to be at the path under test. This removes the need to make
alterations to Impala itself. The inspection done is to read the DWARF
information in the binary, specifically the first compile unit's
DW_AT_producer and DW_AT_name DIE attributes. We employ a heuristic
based on these attributes' values to guess the build type. If we can't
determine the build type, we will assume it's a debug build. More
information on this is in IMPALA-3501.

A quick summary of the changes follows:

1. Move some of the logic in tests.common.skip to tests.common.environ
   and rework some skip marks to be more precise.

2. Add Pyelftools for convenient deserialization of DWARF

3. Our Pyelftools usage requires collections.OrderedDict, which isn't in
   python2.6; also add Monkeypatch to handle this.

4. Add ImpalaBuild and specific_build_type_timeout, the core of the new
   functionality

5. Fix the statestore tests that only fail under code coverage (the
   basis for IMPALA-3501)

Testing:

The tests that were previously, reliably failing under code coverage now
pass. I also ran perfunctory tests of debug, release, and ASAN builds to
ensure our detection of build type is working. This patch will *not*
turn the code coverage builds green; there are other tests that fail,
and fixing all of them here is out of the scope of this patch.

Change-Id: I2b675c04c54e36d404fd9e5a6cf085fb8d6d0e47
Reviewed-on: http://gerrit.cloudera.org:8080/3156
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Tested-by: Internal Jenkins
  • Loading branch information
mikesbrown authored and Internal Jenkins committed May 26, 2016
1 parent 60238ed commit bbad07d
Show file tree
Hide file tree
Showing 8 changed files with 267 additions and 34 deletions.
Binary file added infra/python/deps/monkeypatch-0.1rc3.zip
Binary file not shown.
Binary file added infra/python/deps/pyelftools-0.23.tar.gz
Binary file not shown.
2 changes: 2 additions & 0 deletions infra/python/deps/requirements.txt
Original file line number Diff line number Diff line change
Expand Up @@ -36,11 +36,13 @@ impyla == 0.11.2
# before thirdparty is built thrift will be installed anyways.
thrift == 0.9.0
thrift_sasl == 0.1.0
monkeypatch == 0.1rc3
ordereddict == 1.1
pexpect == 3.3
pg8000 == 1.10.2
prettytable == 0.7.2
psutil == 0.7.1
pyelftools == 0.23
pyparsing == 2.0.3
pytest == 2.7.2
py == 1.4.30
Expand Down
246 changes: 246 additions & 0 deletions tests/common/environ.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,246 @@
# Copyright (c) 2016 Cloudera, Inc. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import logging
import os
import re

try:
from elftools.elf.elffile import ELFFile
except ImportError as e:
# Handle pre-python2.7s' lack of collections.OrderedDict, which we include in
# impala-python as ordereddict.OrderedDict.
if 'cannot import name OrderedDict' == str(e):
import monkeypatch
from ordereddict import OrderedDict
monkeypatch.patch(OrderedDict, 'collections', 'OrderedDict')
from elftools.elf.elffile import ELFFile
else:
raise e


LOG = logging.getLogger('tests.common.environ')


# See if Impala is running with legacy aggregations and/or hash joins. This is kind of a
# hack. It would be better to poll Impala whether it is doing so.
test_start_cluster_args = os.environ.get("TEST_START_CLUSTER_ARGS", "")
old_agg_regex = "enable_partitioned_aggregation=false"
old_hash_join_regex = "enable_partitioned_hash_join=false"
USING_OLD_AGGS_JOINS = re.search(old_agg_regex, test_start_cluster_args) is not None or \
re.search(old_hash_join_regex, test_start_cluster_args) is not None

# Find the likely BuildType of the running Impala. Assume it's found through the path
# $IMPALA_HOME/be/build/latest as a fallback.
impala_home = os.environ.get("IMPALA_HOME", "")
build_type_arg_regex = re.compile(r'--build_type=(\w+)', re.I)
build_type_arg_search_result = re.search(build_type_arg_regex, test_start_cluster_args)

if build_type_arg_search_result is not None:
build_type_dir = build_type_search_result.groups()[0].lower()
else:
build_type_dir = 'latest'

# Resolve any symlinks in the path.
impalad_basedir = \
os.path.realpath(os.path.join(impala_home, 'be/build', build_type_dir)).rstrip('/')

IMPALAD_PATH = os.path.join(impalad_basedir, 'service', 'impalad')


class SpecificImpaladBuildTypes:
"""
Represent a specific build type. In reality, there 5 specific build types. These
specific build types are needed by Python test code.
The specific build types and their *most distinguishing* compiler options are:
1. ADDRESS_SANITIZER (clang -fsanitize=address)
2. DEBUG (gcc -ggdb)
3. DEBUG_CODE_COVERAGE (gcc -ggdb -ftest-coverage)
4. RELEASE (gcc)
5. RELEASE_CODE_COVERAGE (gcc -ftest-coverage)
"""
# ./buildall.sh -asan
ADDRESS_SANITIZER = 'address_sanitizer'
# ./buildall.sh
DEBUG = 'debug'
# ./buildall.sh -codecoverage
DEBUG_CODE_COVERAGE = 'debug_code_coverage'
# ./buildall.sh -release
RELEASE = 'release'
# ./buildall.sh -release -codecoverage
RELEASE_CODE_COVERAGE = 'release_code_coverage'


class ImpaladBuild(object):
"""
Acquires and provides characteristics about the way the Impala under test was compiled
and its likely effects on its responsiveness to automated test timings.
"""
def __init__(self, impalad_path):
self.impalad_path = impalad_path
die_name, die_producer = self._get_impalad_dwarf_info()
self._set_impalad_build_type(die_name, die_producer)

@property
def specific_build_type(self):
"""
Return the correct SpecificImpaladBuildTypes for the Impala under test.
"""
return self._specific_build_type

def has_code_coverage(self):
"""
Return whether the Impala under test was compiled with code coverage enabled.
"""
return self.specific_build_type in (SpecificImpaladBuildTypes.DEBUG_CODE_COVERAGE,
SpecificImpaladBuildTypes.RELEASE_CODE_COVERAGE)

def is_asan(self):
"""
Return whether the Impala under test was compiled with ASAN.
"""
return self.specific_build_type == SpecificImpaladBuildTypes.ADDRESS_SANITIZER

def is_dev(self):
"""
Return whether the Impala under test is a development build (i.e., any debug or ASAN
build).
"""
return self.specific_build_type in (
SpecificImpaladBuildTypes.ADDRESS_SANITIZER, SpecificImpaladBuildTypes.DEBUG,
SpecificImpaladBuildTypes.DEBUG_CODE_COVERAGE)

def runs_slowly(self):
"""
Return whether the Impala under test "runs slowly". For our purposes this means
either compiled with code coverage enabled or ASAN.
"""
return self.has_code_coverage() or self.is_asan()

def _get_impalad_dwarf_info(self):
"""
Read the impalad_path ELF binary, which is supposed to contain DWARF, and read the
DWARF to understand the compiler options. Return a 2-tuple of the two useful DIE
attributes of the first compile unit: the DW_AT_name and DW_AT_producer. If
something goes wrong doing this, log a warning and return nothing.
"""
# Some useful references:
# - be/CMakeLists.txt
# - gcc(1), especially -grecord-gcc-switches, -g, -ggdb, -gdwarf-2
# - readelf(1)
# - general reading about DWARF
# A useful command for exploration without having to wade through many bytes is:
# readelf --debug-dump=info --dwarf-depth=1 impalad
# The DWARF lines are long, raw, and nasty; I'm hesitant to paste them here, so
# curious readers are highly encouraged to try the above, or read IMPALA-3501.
die_name = None
die_producer = None
try:
with open(self.impalad_path, 'rb') as fh:
impalad_elf = ELFFile(fh)
if impalad_elf.has_dwarf_info():
dwarf_info = impalad_elf.get_dwarf_info()
# We only need the first CU, hence the unconventional use of the iterator
# protocol.
cu_iterator = dwarf_info.iter_CUs()
first_cu = next(cu_iterator)
top_die = first_cu.get_top_DIE()
die_name = top_die.attributes['DW_AT_name'].value
die_producer = top_die.attributes['DW_AT_producer'].value
except Exception as e:
LOG.warn('Failure to read DWARF info from {0}: {1}'.format(self.impalad_path,
str(e)))
return die_name, die_producer

def _set_impalad_build_type(self, die_name, die_producer):
"""
Use a heuristic based on the DW_AT_producer and DW_AT_name of the first compile
unit, as returned by _get_impalad_dwarf_info(), to figure out which of 5 supported
builds of impalad we're dealing with. If the heuristic can't determine, fall back to
assuming a debug build and log a warning.
"""
ASAN_CU_NAME = 'asan_preinit.cc'
NON_ASAN_CU_NAME = 'daemon-main.cc'
GDB_FLAG = '-ggdb'
CODE_COVERAGE_FLAG = '-ftest-coverage'

if die_name is None or die_producer is None:
LOG.warn('Not enough DWARF info in {0} to determine build type; choosing '
'DEBUG'.format(self.impalad_path))
self._specific_build_type = SpecificImpaladBuildTypes.DEBUG
return

is_debug = GDB_FLAG in die_producer
specific_build_type = SpecificImpaladBuildTypes.DEBUG

if die_name.endswith(ASAN_CU_NAME):
specific_build_type = SpecificImpaladBuildTypes.ADDRESS_SANITIZER
elif not die_name.endswith(NON_ASAN_CU_NAME):
LOG.warn('Unexpected DW_AT_name in first CU: {0}; choosing '
'DEBUG'.format(die_name))
specific_build_type = SpecificImpaladBuildTypes.DEBUG
elif CODE_COVERAGE_FLAG in die_producer:
if is_debug:
specific_build_type = SpecificImpaladBuildTypes.DEBUG_CODE_COVERAGE
else:
specific_build_type = SpecificImpaladBuildTypes.RELEASE_CODE_COVERAGE
else:
if is_debug:
specific_build_type = SpecificImpaladBuildTypes.DEBUG
else:
specific_build_type = SpecificImpaladBuildTypes.RELEASE

self._specific_build_type = specific_build_type


IMPALAD_BUILD = ImpaladBuild(IMPALAD_PATH)


def specific_build_type_timeout(
default_timeout, slow_build_timeout=None, asan_build_timeout=None,
code_coverage_build_timeout=None):
"""
Return a test environment-specific timeout based on the sort of
SpecificImpalaBuildType under test.
Required parameter: default_timeout - default timeout value. This applies when Impala is
a standard release or debug build, or if no other timeouts are specified.
Optional parameters:
slow_build_timeout - timeout to use if we're running against *any* build known to be
slow. If specified, this will preempt default_timeout if Impala is expected to be
"slow". You can use this as a shorthand in lieu of specifying all of the following
parameters.
The parameters below correspond to specific build types. These preempt both
slow_build_timeout and default_timeout, if the Impala under test is a build of the
applicable type:
asan_build_timeout - timeout to use if Impala with ASAN is running
code_coverage_build_timeout - timeout to use if Impala with code coverage is running
(both debug and release code coverage)
"""

if IMPALAD_BUILD.is_asan() and asan_build_timeout is not None:
timeout_val = asan_build_timeout
elif IMPALAD_BUILD.has_code_coverage() and code_coverage_build_timeout is not None:
timeout_val = code_coverage_build_timeout
elif IMPALAD_BUILD.runs_slowly() and slow_build_timeout is not None:
timeout_val = slow_build_timeout
else:
timeout_val = default_timeout
return timeout_val
37 changes: 8 additions & 29 deletions tests/common/skip.py
Original file line number Diff line number Diff line change
Expand Up @@ -21,6 +21,8 @@
import os
import pytest
from functools import partial

from tests.common.environ import IMPALAD_BUILD, USING_OLD_AGGS_JOINS
from tests.util.filesystem_utils import IS_DEFAULT_FS, IS_S3, IS_ISILON, IS_LOCAL


Expand Down Expand Up @@ -66,22 +68,14 @@ class SkipIfIsilon:
reason="This Isilon issue has yet to be triaged.")
jira = partial(pytest.mark.skipif, IS_ISILON)

# TODO: looking at TEST_START_CLUSTER_ARGS is a hack. It would be better to add an option
# to pytest.
test_start_cluster_args = os.environ.get("TEST_START_CLUSTER_ARGS","")
old_agg_regex = "enable_partitioned_aggregation=false"
old_hash_join_regex = "enable_partitioned_hash_join=false"
using_old_aggs_joins = re.search(old_agg_regex, test_start_cluster_args) is not None or \
re.search(old_hash_join_regex, test_start_cluster_args) is not None

class SkipIfOldAggsJoins:
nested_types = pytest.mark.skipif(using_old_aggs_joins,
nested_types = pytest.mark.skipif(USING_OLD_AGGS_JOINS,
reason="Nested types not supported with old aggs and joins")
passthrough_preagg = pytest.mark.skipif(using_old_aggs_joins,
passthrough_preagg = pytest.mark.skipif(USING_OLD_AGGS_JOINS,
reason="Passthrough optimization not implemented by old agg")
unsupported = pytest.mark.skipif(using_old_aggs_joins,
unsupported = pytest.mark.skipif(USING_OLD_AGGS_JOINS,
reason="Query unsupported with old aggs and joins")
requires_spilling = pytest.mark.skipif(using_old_aggs_joins,
requires_spilling = pytest.mark.skipif(USING_OLD_AGGS_JOINS,
reason="Test case requires spilling to pass")

class SkipIfLocal:
Expand Down Expand Up @@ -111,21 +105,6 @@ class SkipIfLocal:
root_path = pytest.mark.skipif(IS_LOCAL,
reason="Tests rely on the root directory")

# Try to derive the build type. Assume it's 'latest' by default.
impala_home = os.environ.get("IMPALA_HOME", "")
build_type_regex = re.compile(r'--build_type=(\w+)', re.I)
build_type_search_result = re.search(build_type_regex, test_start_cluster_args)

if build_type_search_result is not None:
build_type = build_type_search_result.groups()[0].lower()
else:
build_type = 'latest'

# Resolve any symlinks in the path.
impalad_basedir = \
os.path.realpath(os.path.join(impala_home, 'be/build', build_type)).rstrip('/')
debug_build = os.path.basename(impalad_basedir).lower() == 'debug'

class SkipIfNotDebugBuild:
debug_only = pytest.mark.skipif(not debug_build,
class SkipIfBuildType:
not_dev_build = pytest.mark.skipif(not IMPALAD_BUILD.is_dev(),
reason="Tests depends on debug build startup option.")
4 changes: 2 additions & 2 deletions tests/custom_cluster/test_alloc_fail.py
Original file line number Diff line number Diff line change
Expand Up @@ -16,9 +16,9 @@
import pytest
from copy import deepcopy
from tests.common.custom_cluster_test_suite import CustomClusterTestSuite
from tests.common.skip import SkipIfNotDebugBuild
from tests.common.skip import SkipIfBuildType

@SkipIfNotDebugBuild.debug_only
@SkipIfBuildType.not_dev_build
class TestAllocFail(CustomClusterTestSuite):
"""Tests for handling malloc() failure for UDF/UDA"""

Expand Down
4 changes: 2 additions & 2 deletions tests/custom_cluster/test_exchange_delays.py
Original file line number Diff line number Diff line change
Expand Up @@ -14,9 +14,9 @@

import pytest
from tests.common.custom_cluster_test_suite import CustomClusterTestSuite
from tests.common.skip import SkipIfNotDebugBuild
from tests.common.skip import SkipIfBuildType

@SkipIfNotDebugBuild.debug_only
@SkipIfBuildType.not_dev_build
class TestExchangeDelays(CustomClusterTestSuite):
"""Tests for handling delays in finding data stream receivers"""

Expand Down
8 changes: 7 additions & 1 deletion tests/statestore/test_statestore.py
Original file line number Diff line number Diff line change
Expand Up @@ -21,6 +21,8 @@
import urllib2
import json

from tests.common.environ import specific_build_type_timeout

from thrift.transport import TSocket
from thrift.transport import TTransport
from thrift.protocol import TBinaryProtocol
Expand Down Expand Up @@ -59,6 +61,9 @@ def get_statestore_subscribers(host='localhost', port=25010):
DEFAULT_UPDATE_STATE_RESPONSE = TUpdateStateResponse(status=STATUS_OK, topic_updates=[],
skipped=False)

# IMPALA-3501: the timeout needs to be higher in code coverage builds
WAIT_FOR_FAILURE_TIMEOUT = specific_build_type_timeout(40, code_coverage_build_timeout=60)

class WildcardServerSocket(TSocket.TSocketBase, TTransport.TServerTransportBase):
"""Specialised server socket that binds to a random port at construction"""
def __init__(self, host=None, port=0):
Expand Down Expand Up @@ -292,7 +297,8 @@ def wait_for_update(self, count=None):
finally:
self.update_event.release()

def wait_for_failure(self, timeout=40):

def wait_for_failure(self, timeout=WAIT_FOR_FAILURE_TIMEOUT):
"""Waits until this subscriber no longer appears in the statestore's subscriber
list. If 'timeout' seconds pass, throws an exception."""
start = time.time()
Expand Down

0 comments on commit bbad07d

Please sign in to comment.