testutils/iotlab.py: Randomize the selected node #298

MrKevinWeiss · 2024-01-29T11:14:59Z

It seems there have been some failures, mainly due to infrastructure. Specifically, when the samr21-xpro fails to flash, every rerun hits the same faulty hardware.

Previously it would just take the first available node in the list, which is deterministic but doesn't help with flaky test reruns. Random selection may cause an issue with distance to other nodes, but if that becomes a problem we would have to introduce node pairing lists, which is a bit more work.

This is at least a first step.

MrKevinWeiss · 2024-01-29T11:28:51Z

Running tox -e test -- -k "spec04 and task02" locally with this change avoided the failing samr21-11.saclay.iot-lab.info and selected samr21-13.saclay.iot-lab.info instead, which is why the test passed.

MrKevinWeiss · 2024-01-30T10:51:00Z

This doesn't actually seem to trigger in all tests at least...

It would appear that

    def _submit(self, site, duration):
        """Submit an experiment with required nodes"""
        api = Api(*self.user_credentials())
        resources = []
        for ctrl in self.ctrls:
            if ctrl.env.get('IOTLAB_NODE') is not None:
                resources.append(exp_resources([ctrl.env.get('IOTLAB_NODE')]))
            elif ctrl.board() is not None:
                board = IoTLABExperiment._archi_from_board(ctrl.board())
                alias = AliasNodes(1, site, board)
                resources.append(exp_resources(alias))
            else:
                raise ValueError("neither BOARD or IOTLAB_NODE are set")
        return submit_experiment(api, self.name, duration, resources)['id']

is responsible for selecting the nodes.

MrKevinWeiss · 2024-01-30T17:21:19Z

So it must have previously just had the bad nodes blocked... I force-pushed an update that gets a list of all available nodes that fit our requirements, then randomly selects from them... Maybe we would want to add a flag to use this or not...
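As a rough illustration of the approach described here (not the code from this PR), random selection over a list of node records could look like the sketch below. The record keys ('site', 'archi', 'state', 'network_address'), the 'Alive' state value, the archi strings and the function name pick_random_node are assumptions made for the example. The sample data is made up, apart from the two samr21 hostnames mentioned above.

```python
import random


def pick_random_node(nodes, site, archi, rng=random):
    """Return the address of a random alive node matching site and archi.

    `nodes` is assumed to be a list of dicts with the keys 'site',
    'archi', 'state' and 'network_address' (an assumption for this sketch).
    """
    candidates = [
        node["network_address"]
        for node in nodes
        if node["site"] == site
        and node["archi"] == archi
        and node["state"] == "Alive"
    ]
    if not candidates:
        raise ValueError(f"no alive {archi} node available at {site}")
    # rng can be the random module or a seeded random.Random instance
    return rng.choice(candidates)


# Hypothetical sample data, only for demonstration
NODES = [
    {"network_address": "samr21-11.saclay.iot-lab.info", "site": "saclay",
     "archi": "samr21:at86rf233", "state": "Alive"},
    {"network_address": "samr21-13.saclay.iot-lab.info", "site": "saclay",
     "archi": "samr21:at86rf233", "state": "Alive"},
    {"network_address": "samr21-14.saclay.iot-lab.info", "site": "saclay",
     "archi": "samr21:at86rf233", "state": "Busy"},
    {"network_address": "m3-1.saclay.iot-lab.info", "site": "saclay",
     "archi": "m3:at86rf231", "state": "Alive"},
]

if __name__ == "__main__":
    print(pick_random_node(NODES, "saclay", "samr21:at86rf233"))
```

Passing a seeded random.Random as rng keeps a run reproducible when that is needed, while the default still varies between reruns.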

MrKevinWeiss · 2024-01-30T17:22:25Z

hmmm I still have to adjust some tests

miri64

This might complicate things, but since you are getting the node information from the site anyway (which IIRC also includes the position of the node), would it make sense to have some kind of distance heuristic in the random selection?
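One way such a distance heuristic could be sketched: pick the first node at random, then take the closest candidate for every further board. This is only an illustration under assumptions, not something implemented in this PR. It assumes each node record carries 'x', 'y' and 'z' coordinates that can be converted to float; the function names and sample data are hypothetical.

```python
import math
import random


def _position(node):
    """Return the (x, y, z) position of a node record as floats."""
    return tuple(float(node[axis]) for axis in ("x", "y", "z"))


def pick_nearby_nodes(candidates_per_board, rng=random):
    """Pick one node per board, keeping later picks close to the first.

    `candidates_per_board` is a list with one candidate list per
    requested board. The first node is chosen at random; each further
    node is the candidate with the smallest Euclidean distance to it.
    """
    first = rng.choice(candidates_per_board[0])
    picked = [first]
    for candidates in candidates_per_board[1:]:
        free = [node for node in candidates if node not in picked]
        if not free:
            raise ValueError("no free candidate left for a requested board")
        picked.append(
            min(free, key=lambda n: math.dist(_position(first), _position(n)))
        )
    return picked


# Hypothetical sample data, only for demonstration
SAMR21 = [{"name": "samr21-a", "x": "0.0", "y": "0.0", "z": "0.0"}]
M3 = [
    {"name": "m3-far", "x": "10.0", "y": "0.0", "z": "0.0"},
    {"name": "m3-near", "x": "1.0", "y": "0.0", "z": "0.0"},
]

if __name__ == "__main__":
    print([node["name"] for node in pick_nearby_nodes([SAMR21, M3])])
```

Only the first pick is random here, so reruns still move away from a faulty first node while the remaining nodes stay within radio range as far as the positions allow.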

MrKevinWeiss · 2024-01-30T17:52:48Z

would it make sense to have some kind of distance heuristic in the random selection?

Yes, I can look into that, but I imagine it would complicate an already complicated process. Maybe a simpler solution would be to only randomize boards that don't report error conditions (i.e. not the m3 boards, which do); typically the "special" boards are grouped pretty close together anyway.

Also if it randomly selects a poor choice we can rerun anyways.

Then I dismiss my review for now

MrKevinWeiss · 2024-01-31T09:55:27Z

I made the change to only randomize non-m3 nodes... Also, IoT-LAB fixed the samr21 nodes, so it is hard for me to reproduce the failure, but easy to show that it selects random nodes.

MrKevinWeiss · 2024-04-18T13:22:27Z

I think we should get this in, @Teufelchen1.

Currently contiki is always selecting the same nodes and they are not working...

MrKevinWeiss · 2024-04-18T13:23:41Z

As soon as I posted that, the nodes started working.

MrKevinWeiss · 2024-06-21T07:15:39Z

Maybe @mguetschow would be interested in looking at this?

Teufelchen1

LGTM, did not run it though.

mguetschow · 2024-07-03T10:05:51Z

When I run this locally with tox -- -k "spec04 and task02" --non-RC, I get:

test: commands[0]> pytest -k 'spec04 and task02' --non-RC
======================================= test session starts =======================================
platform linux -- Python 3.11.2, pytest-7.3.2, pluggy-1.5.0 -- /home/mikolai/TUD/Code/Release-Specs/.tox/test/bin/python
cachedir: .tox/test/.pytest_cache
rootdir: /home/mikolai/TUD/Code/Release-Specs
configfile: setup.cfg
plugins: cov-5.0.0, rerunfailures-14.0
collected 136 items / 135 deselected / 1 selected                                                 

04-single-hop-6lowpan-icmp/test_spec04.py::test_task02[nodes0] RERUN                        [100%]
04-single-hop-6lowpan-icmp/test_spec04.py::test_task02[nodes0] RERUN                        [100%]
04-single-hop-6lowpan-icmp/test_spec04.py::test_task02[nodes0] RERUN                        [100%]
04-single-hop-6lowpan-icmp/test_spec04.py::test_task02[nodes0] ERROR                        [100%]

============================================= ERRORS ==============================================
______________________________ ERROR at setup of test_task02[nodes0] ______________________________

local = False, request = <SubRequest 'nodes' for <Function test_task02[nodes0]>>
boards = ['samr21-xpro', 'iotlab-m3'], iotlab_site = 'saclay'

    @pytest.fixture
    def nodes(local, request, boards, iotlab_site):
        """
        Provides the nodes for a test as a list of RIOTCtrl objects
        """
        ctrls = []
        if boards is None:
            boards = request.param
        only_native = all(b.startswith("native") for b in boards)
        for board in boards:
            if local or only_native or IoTLABExperiment.valid_board(board):
                env = {'BOARD': f'{board}'}
                if only_native:
                    # XXX this does not work for a mix of native and non-native boards,
                    # but we do not have these in the release tests at the moment.
                    env["RIOT_TERMINAL"] = "native"
            else:
                env = {
                    'BOARD': IoTLABExperiment.board_from_iotlab_node(board),
                    'IOTLAB_NODE': f'{board}',
                }
            ctrls.append(RIOTCtrl(env=env))
        if local or only_native:
            yield ctrls
        else:
            name_fmt = get_namefmt(request)
            # Start IoT-LAB experiment if requested
            exp = IoTLABExperiment(
                # pylint: disable=C0209
                name="RIOT-release-test-{module}-{function}".format(**name_fmt),
                ctrls=ctrls,
                site=iotlab_site,
            )
            RUNNING_EXPERIMENTS.append(exp)
>           exp.start(duration=IOTLAB_EXPERIMENT_DURATION)

conftest.py:306: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
testutils/iotlab.py:144: in start
    self.exp_id = self._submit(site=self.site, duration=duration)
testutils/iotlab.py:193: in _submit
    return submit_experiment(api, self.name, duration, resources)['id']
.tox/test/lib/python3.11/site-packages/iotlabcli/experiment.py:73: in submit_experiment
    experiment.add_exp_resources(res_dict)
.tox/test/lib/python3.11/site-packages/iotlabcli/experiment.py:605: in add_exp_resources
    self._set_type(resources['type'])
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <iotlabcli.experiment._Experiment object at 0x7f24af5abb10>, exp_type = 'alias'

    def _set_type(self, exp_type):
        """ Set current experiment type.
        If type was already set and is different ValueError is raised
        """
        if self.type is not None and self.type != exp_type:
>           raise ValueError(
                "Invalid experiment, should be only physical or only alias")
E           ValueError: Invalid experiment, should be only physical or only alias

.tox/test/lib/python3.11/site-packages/iotlabcli/experiment.py:596: ValueError
======================================== warnings summary =========================================
.tox/test/lib/python3.11/site-packages/_pytest/cacheprovider.py:387
  /home/mikolai/TUD/Code/Release-Specs/.tox/test/lib/python3.11/site-packages/_pytest/cacheprovider.py:387: PytestCacheWarning: cache could not write path /home/mikolai/TUD/Code/Release-Specs/.tox/test/.pytest_cache/v/cache/lastfailed
    config.cache.set("cache/lastfailed", self.lastfailed)

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
------------ generated xml file: /home/mikolai/TUD/Code/Release-Specs/test-report.xml -------------
================= 135 deselected, 1 warning, 1 error, 3 rerun in 93.90s (0:01:33) =================
test: exit 1 (94.30 seconds) /home/mikolai/TUD/Code/Release-Specs> pytest -k 'spec04 and task02' --non-RC pid=240341
test: FAIL ✖ in 1 minute 34.33 seconds
flake8: commands[0]> flake8
flake8: OK ✔ in 0.25 seconds
pylint: commands[0]> pylint conftest.py testutils/ 03-single-hop-ipv6-icmp/ 04-single-hop-6lowpan-icmp/ 05-single-hop-route/ 06-single-hop-udp/ 07-multi-hop/ 08-interop/ 09-coap/ 10-icmpv6-error/ 11-lorawan/

--------------------------------------------------------------------
Your code has been rated at 10.00/10 (previous run: 10.00/10, +0.00)

pylint: OK ✔ in 10.16 seconds
black: commands[0]> black --check --diff .
All done! ✨ 🍰 ✨
31 files would be left unchanged.
  test: FAIL code 1 (94.33=setup[0.02]+cmd[94.30] seconds)
  flake8: OK (0.25=setup[0.00]+cmd[0.24] seconds)
  pylint: OK (10.16=setup[0.01]+cmd[10.15] seconds)
  black: OK (0.35=setup[0.00]+cmd[0.35] seconds)
  evaluation failed :( (105.12 seconds)

Edit: does not happen on master
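The traceback shows iotlabcli rejecting an experiment whose resources mix the 'alias' and 'physical' types. One plausible reading (not confirmed in this thread) is that, with boards = ['samr21-xpro', 'iotlab-m3'], the randomized samr21 node is submitted as a physical resource while the m3 node is still requested as an alias. A pre-submit check like the sketch below would surface that earlier. The function name and the plain-dict resources are assumptions for illustration; only the 'type' key is taken from the traceback (resources['type']).

```python
def common_resource_type(resources):
    """Return the single resource type shared by all entries.

    Raises ValueError if 'alias' and 'physical' entries are mixed,
    mirroring the condition reported in the traceback above.
    Returns None for an empty list.
    """
    types = {res["type"] for res in resources}
    if len(types) > 1:
        raise ValueError(
            f"mixed resource types {sorted(types)}: "
            "an experiment should be only physical or only alias"
        )
    return types.pop() if types else None


if __name__ == "__main__":
    # Hypothetical resource dicts; real ones carry more fields
    mixed = [{"type": "physical"}, {"type": "alias"}]
    try:
        common_resource_type(mixed)
    except ValueError as err:
        print(err)
```

If that reading is right, the fix would be to resolve every requested board to the same kind of resource before submitting, for example by also picking a concrete node for the m3 boards.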

MrKevinWeiss added the bug label Jan 29, 2024

MrKevinWeiss requested a review from miri64 January 29, 2024 11:26

MrKevinWeiss force-pushed the pr/iotlabrandom branch from 6e6db29 to d5f9a46 Compare January 30, 2024 17:19

miri64 previously requested changes Jan 30, 2024

MrKevinWeiss force-pushed the pr/iotlabrandom branch from d5f9a46 to 573406f Compare January 31, 2024 09:54

MrKevinWeiss force-pushed the pr/iotlabrandom branch from 573406f to eb7f0d8 Compare January 31, 2024 10:37

test_iotlab: Add test for random feature

1d7f4ef

MrKevinWeiss force-pushed the pr/iotlabrandom branch from 8090f2a to 1d7f4ef Compare January 31, 2024 12:08

MrKevinWeiss added 2 commits January 31, 2024 13:59

Merge branch 'master' into pr/iotlabrandom

c02b9b6

Merge branch 'master' into pr/iotlabrandom

1c77215

Merge branch 'master' into pr/iotlabrandom

8e318b7

Update testutils/iotlab.py

468ca7d

Apply miri64s suggestion Co-authored-by: Martine Lenders <martine.lenders@tu-dresden.de>

Teufelchen1 approved these changes Jun 21, 2024
