Adding an edge leads to an unrelated node never entering runtime #52

Open
mhoff opened this issue Feb 7, 2018 · 11 comments

mhoff commented Feb 7, 2018

Consider the following MUSIC topology:

image

V, P, C, and D are nodes from ros_music_adapters, written in C++; R is a Python (PyMUSIC) node; and N is a PyNEST node using SPORE.

The problem arises when the red edge (R, N) is added to the MUSIC configuration file. When MUSIC is then run, all nodes function normally except for D, which gets stuck somewhere before entering the runtime phase.

This bug appears to be related to #37 (#35), because essentially the same symptoms occur. However, building with --disable-isend does not resolve the issue, and the MPI version in use is not 1.6.5.

Let me emphasize: slightly different topologies work, in the sense that no node gets stuck before entering the runtime phase.

Working example 1:
image

Working example 2:
image

Debug information:

$ python3 --version
Python 3.5.2

$ music --version
MUSIC 1.1.15

# MUSIC build configuration:
$ PYTHON=/usr/bin/python3.5 ./configure --prefix=$HOME/.local --disable-isend
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking for a thread-safe mkdir -p... /bin/mkdir -p
checking for gawk... no
checking for mawk... mawk
checking whether make sets $(MAKE)... yes
checking whether make supports nested variables... yes
checking whether to enable maintainer-specific portions of Makefiles... no
checking for mpiCC... mpiCC
checking whether the C++ compiler works... yes
checking for C++ compiler default output file name... a.out
checking for suffix of executables... 
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C++ compiler... yes
checking whether mpiCC accepts -g... yes
checking for style of include used by make... GNU
checking dependency style of mpiCC... gcc3
checking which MPI system we think we are using... SYSGUESS=openmpi
checking MPI_CXXFLAGS... -I/usr/lib/openmpi/include/openmpi/opal/mca/event/libevent2021/libevent -I/usr/lib/openmpi/include/openmpi/opal/mca/event/libevent2021/libevent/include -I/usr/lib/openmpi/include -I/usr/lib/openmpi/include/openmpi -pthread
checking MPI_CFLAGS... -I/usr/lib/openmpi/include/openmpi/opal/mca/event/libevent2021/libevent -I/usr/lib/openmpi/include/openmpi/opal/mca/event/libevent2021/libevent/include -I/usr/lib/openmpi/include -I/usr/lib/openmpi/include/openmpi -pthread
checking MPI_LDFLAGS... -pthread -Wl,-rpath -Wl,/usr/lib/openmpi/lib -Wl,--enable-new-dtags -L/usr/lib/openmpi/lib -lmpi_cxx -lmpi
checking for python version... 3.5
checking for python platform... linux
checking for python script directory... ${prefix}/lib/python3.5/site-packages
checking for python extension module directory... ${exec_prefix}/lib/python3.5/site-packages
checking for gcc... gcc
checking whether we are using the GNU C compiler... yes
checking whether gcc accepts -g... yes
checking for gcc option to accept ISO C89... none needed
checking whether gcc understands -c and -o together... yes
checking dependency style of gcc... gcc3
checking build system type... x86_64-pc-linux-gnu
checking host system type... x86_64-pc-linux-gnu
checking how to print strings... printf
checking for a sed that does not truncate output... /bin/sed
checking for grep that handles long lines and -e... /bin/grep
checking for egrep... /bin/grep -E
checking for fgrep... /bin/grep -F
checking for ld used by gcc... /usr/bin/ld
checking if the linker (/usr/bin/ld) is GNU ld... yes
checking for BSD- or MS-compatible name lister (nm)... /usr/bin/nm -B
checking the name lister (/usr/bin/nm -B) interface... BSD nm
checking whether ln -s works... yes
checking the maximum length of command line arguments... 1572864
checking how to convert x86_64-pc-linux-gnu file names to x86_64-pc-linux-gnu format... func_convert_file_noop
checking how to convert x86_64-pc-linux-gnu file names to toolchain format... func_convert_file_noop
checking for /usr/bin/ld option to reload object files... -r
checking for objdump... objdump
checking how to recognize dependent libraries... pass_all
checking for dlltool... no
checking how to associate runtime and link libraries... printf %s\n
checking for ar... ar
checking for archiver @FILE support... @
checking for strip... strip
checking for ranlib... ranlib
checking command to parse /usr/bin/nm -B output from gcc object... ok
checking for sysroot... no
checking for a working dd... /bin/dd
checking how to truncate binary pipes... /bin/dd bs=4096 count=1
checking for mt... mt
checking if mt is a manifest tool... no
checking how to run the C preprocessor... gcc -E
checking for ANSI C header files... yes
checking for sys/types.h... yes
checking for sys/stat.h... yes
checking for stdlib.h... yes
checking for string.h... yes
checking for memory.h... yes
checking for strings.h... yes
checking for inttypes.h... yes
checking for stdint.h... yes
checking for unistd.h... yes
checking for dlfcn.h... yes
checking for objdir... .libs
checking if gcc supports -fno-rtti -fno-exceptions... no
checking for gcc option to produce PIC... -fPIC -DPIC
checking if gcc PIC flag -fPIC -DPIC works... yes
checking if gcc static flag -static works... yes
checking if gcc supports -c -o file.o... yes
checking if gcc supports -c -o file.o... (cached) yes
checking whether the gcc linker (/usr/bin/ld -m elf_x86_64) supports shared libraries... yes
checking whether -lc should be explicitly linked in... no
checking dynamic linker characteristics... GNU/Linux ld.so
checking how to hardcode library paths into programs... immediate
checking whether stripping libraries is possible... yes
checking if libtool supports shared libraries... yes
checking whether to build shared libraries... yes
checking whether to build static libraries... yes
checking how to run the C++ preprocessor... g++ -E
checking for ld used by g++... /usr/bin/ld -m elf_x86_64
checking if the linker (/usr/bin/ld -m elf_x86_64) is GNU ld... yes
checking whether the g++ linker (/usr/bin/ld -m elf_x86_64) supports shared libraries... yes
checking for g++ option to produce PIC... -fPIC -DPIC
checking if g++ PIC flag -fPIC -DPIC works... yes
checking if g++ static flag -static works... yes
checking if g++ supports -c -o file.o... yes
checking if g++ supports -c -o file.o... (cached) yes
checking whether the g++ linker (/usr/bin/ld -m elf_x86_64) supports shared libraries... yes
checking dynamic linker characteristics... (cached) GNU/Linux ld.so
checking how to hardcode library paths into programs... immediate
checking for ANSI C header files... (cached) yes
checking for an ANSI C-conforming const... yes
checking for inline... inline
checking for size_t... yes
checking for long long... yes
checking for strrchr... yes
checking for mallinfo... yes
checking GL/glut.h usability... yes
checking GL/glut.h presence... yes
checking for GL/glut.h... yes
checking for rts_get_personality... no
checking for ompi_comm_free... yes
checking for MPI::Init_thread method... yes
checking whether /usr/bin/python3 version is >= 2.6... yes
checking for /usr/bin/python3 version... (cached) 3.5
checking for /usr/bin/python3 platform... (cached) linux
checking for /usr/bin/python3 script directory... (cached) ${prefix}/lib/python3.5/site-packages
checking for /usr/bin/python3 extension module directory... (cached) ${exec_prefix}/lib/python3.5/site-packages
checking for "/usr/include/python3.5m/Python.h"... yes
checking whether to build PyMUSIC... yes
checking whether C compiler accepts -fno-strict-aliasing... yes
checking whether to build ROS Toolchain... no
checking that generated files are newer than configure... done
configure: creating ./config.status
config.status: creating pymusic/Makefile
config.status: creating pymusic/setup.py
config.status: creating Makefile
config.status: creating mpidep/Makefile
config.status: creating src/Makefile
config.status: creating src/music/music-config.hh
config.status: creating src/music/version.hh
config.status: creating rudeconfig/Makefile
config.status: creating utils/Makefile
config.status: creating examples/Makefile
config.status: creating testsuite/Makefile
config.status: creating testsuite/music_tests.sh
config.status: creating testsuite/unittests/catch/Makefile
config.status: creating testsuite/sanitytests/Makefile
config.status: creating music-config/Makefile
config.status: creating music-config/predict_rank.py
config.status: creating doc/Makefile
config.status: creating extras/Makefile
config.status: creating config.h
config.status: config.h is unchanged
config.status: executing depfiles commands
config.status: executing libtool commands

$ mpirun --version
mpirun (Open MPI) 1.10.2

$ lsb_release -d
Description:	Ubuntu 16.04.3 LTS

$ cat run.music
music_timestep=0.003
rtf=1.0
stoptime=50000.0
sensor_update_rate=333.3333333333333
command_rate=333.3333333333333

[nest]
  binary=python/ffn_node.py
  np=1
[position]
  ros_node_name=position_adapter
  binary=ros_sensor_adapter
  message_type=Vector3
  ros_topic=/plugging_camera_offset
  np=1
[dvs]
  music_timestep=0.001
  binary=dvs_adapter
  ros_topic=/dvs/events
  np=1
  ros_node_name=dvs_adapter
  message_type=EventArray
  sensor_update_rate=1000.0
[reward]
  binary=python/reward_node.py
  np=1
[command_gen]
  message_mapping_filename=res/map_float_cmd.dat
  ros_topic=/motor/activity
  np=1
  binary=ros_command_adapter
[decoder]
  np=1
  binary=linear_readout_decoder
  weights_filename=res/activity_to_velocity_translation_weights.dat
  tau=0.1

dvs.out -> nest.visual [16384]
nest.motor -> decoder.in [8]
decoder.out -> command_gen.in [2]
position.out -> reward.in [3]
reward.out -> nest.reward [1]

weidel-p commented Feb 8, 2018

Thanks for the bug report, I will try to reproduce and investigate this problem. One question: does this error also occur if N is not using SPORE?

mhoff commented Feb 8, 2018

@weidel-p many thanks! At least in the case of #35, SPORE was not relevant to this issue. I will replace N with a simple test node and report my findings tomorrow.

mhoff commented Feb 9, 2018

I replaced N with the following node:

#!/usr/bin/env python3

import numpy as np

import music
from mpi4py import MPI

setup = music.Setup()

event_out = setup.publishEventOutput("motor")
event_in = setup.publishEventInput("visual")

def event_func(d, t, i):
    print("{} {} {}".format(d, t, i))

event_in.map(event_func, music.Index.GLOBAL, base=0, size=16384, maxBuffered=0)
event_out.map(music.Index.GLOBAL, base=0, size=8, maxBuffered=0)

cont_in = setup.publishContInput("reward")
# np.int was removed in NumPy 1.24; it was an alias for the builtin int
cont_in_buffer = np.array([0], dtype=int)
cont_in.map(cont_in_buffer, base=0)

MPI.COMM_WORLD.Barrier()

print("entering runtime")

times = setup.runtime(0.02)

for time in times:
    print(time)

and the same problem occurs.
Note that I use the MPI barrier to synchronize with the ros_music_adapters and that I'm using Python 3, in case either of these aspects is related to the issue.

mdjurfeldt commented Feb 9, 2018 via email

mhoff commented Feb 9, 2018

> One idea: Could this have something to do with the particular combination of port types and topology? I'm wondering what would happen if the "red edge" from R to N were spiking rather than continuous. Would that also cause a hang?
>
> Is it difficult for you to test that?

Yes, indeed! With the edge being event-based, everything appears to work just fine.

mdjurfeldt commented Feb 9, 2018 via email

mdjurfeldt commented Feb 9, 2018 via email

mhoff commented Feb 9, 2018

I think I found the problem while constructing a minimal failing example: the source port of the red edge was declared as an input port, not as an output port. Sorry for the inconvenience.

However, the resulting behavior is still very unintuitive. Is there a simple way to change MUSIC so that it fails fast, or at least reports an error message, in such cases of obvious misconfiguration?
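
To illustrate the kind of fail-fast check I have in mind, here is a rough sketch that runs outside of MUSIC. It is not MUSIC or PyMUSIC API: parse_connections, check_directions, and the hand-written "declared" table are hypothetical names, and the regular expression only covers connection lines of the form used in the run.music above. It compares each connection against the direction every application claims for its ports and lists the mismatches.

```python
import re

# Matches connection lines such as "reward.out -> nest.reward [1]".
# Assumption: only this simple form occurs; other MUSIC syntax is ignored.
CONNECTION = re.compile(
    r"^\s*(\w+)\.(\w+)\s*->\s*(\w+)\.(\w+)\s*(?:\[\d+\])?\s*$"
)


def parse_connections(config_text):
    """Return a list of ((src_app, src_port), (dst_app, dst_port)) tuples."""
    connections = []
    for line in config_text.splitlines():
        match = CONNECTION.match(line)
        if match:
            src_app, src_port, dst_app, dst_port = match.groups()
            connections.append(((src_app, src_port), (dst_app, dst_port)))
    return connections


def check_directions(connections, declared):
    """Report ports whose declared direction contradicts their use.

    declared maps (app, port) to "in" or "out". It is maintained by hand
    here; MUSIC itself does not provide such a table.
    """
    problems = []
    for src, dst in connections:
        for end, expected in ((src, "out"), (dst, "in")):
            found = declared.get(end, "missing")
            if found != expected:
                problems.append(
                    "{}.{}: connection expects '{}', declared '{}'".format(
                        end[0], end[1], expected, found
                    )
                )
    return problems


config = """
position.out -> reward.in [3]
reward.out -> nest.reward [1]
"""

# The mistake from this issue: reward.out was published as an input port.
declared = {
    ("position", "out"): "out",
    ("reward", "in"): "in",
    ("reward", "out"): "in",
    ("nest", "reward"): "in",
}

problems = check_directions(parse_connections(config), declared)
for problem in problems:
    print(problem)
```

Inside MUSIC the equivalent check would presumably have to happen during port setup, where the real port directions are known, instead of relying on a hand-written table.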

mdjurfeldt commented Feb 9, 2018 via email

mhoff commented Feb 12, 2018

Weird. Although properly configuring the edge does indeed appear to resolve the error, D can also be "fixed" by replacing its binary with a dummy substitute that does essentially the same thing in terms of MUSIC communication. As a consequence, you need the ros_music_adapters project to run the "minimal" failing example attached to this comment.

Edit: use mpirun -np 6 music run.music to run the example.
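
The value 6 is the sum of the np entries in the configuration (six applications with np=1 each). As a sanity check, that sum can be computed with a small script. This is only a sketch: total_ranks is a hypothetical helper, not part of MUSIC, and it assumes that every application section contains a plain np=<integer> line.

```python
import re


def total_ranks(config_text):
    """Return the sum of all np=<int> entries found inside [section] blocks."""
    total = 0
    in_section = False
    for raw_line in config_text.splitlines():
        line = raw_line.strip()
        if line.startswith("[") and line.endswith("]"):
            in_section = True  # entries from here on belong to an application
            continue
        match = re.fullmatch(r"np\s*=\s*(\d+)", line)
        if in_section and match:
            total += int(match.group(1))
    return total


# Abbreviated copy of the run.music from this issue (only the np entries kept).
example_config = """
stoptime=50000.0
[nest]
  np=1
[position]
  np=1
[dvs]
  np=1
[reward]
  np=1
[command_gen]
  np=1
[decoder]
  np=1
dvs.out -> nest.visual [16384]
"""

ranks = total_ranks(example_config)
print("mpirun -np {} music run.music".format(ranks))
```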

mdjurfeldt commented Feb 12, 2018 via email
