Segmentation fault on tests #132

Closed
JamesRamm opened this issue Jul 27, 2017 · 11 comments

@JamesRamm

Hi
I'm getting a segmentation fault when running the tests.
I am running on Ubuntu, using a conda environment.
My exact installation process was as follows:

conda create -n anuga python=2
source activate anuga
conda install nose numpy scipy matplotlib netcdf4
conda install -c pingucarsti gdal
git clone https://github.com/GeoscienceAustralia/anuga_core.git
cd anuga_core/
python setup.py build
python setup.py install

Then running python runtests.py gives the following output:

$ python runtests.py 
Building, see build.log...
Build OK
Running unit tests for anuga
NumPy version 1.13.1
NumPy relaxed strides checking option: True
NumPy is installed in /home/james/miniconda3/envs/anuga/lib/python2.7/site-packages/numpy
Python version 2.7.13 |Continuum Analytics, Inc.| (default, Dec 20 2016, 23:09:15) [GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]
nose version 1.3.7
...............................................................................................Segmentation fault (core dumped)
@stoiver
Member

stoiver commented Jul 27, 2017

@JamesRamm, could you rerun the tests with the -v flag, i.e. python runtests.py -v

That should at least give us an idea of where the error is.

Which version of Ubuntu are you using?

Then I will try to replicate the error.

@JamesRamm
Author

Hi
Ubuntu version 16.10 (yakkety)
Linux version 4.8.0-52-generic

verbose output:

Building, see build.log...
Build OK
Running unit tests for anuga
NumPy version 1.13.1
NumPy relaxed strides checking option: True
NumPy is installed in /home/james/miniconda3/envs/anuga/lib/python2.7/site-packages/numpy
Python version 2.7.13 |Continuum Analytics, Inc.| (default, Dec 20 2016, 23:09:15) [GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]
nose version 1.3.7
test_basic_single_line_grid (anuga.abstract_2d_finite_volumes.tests.test_ermapper.Test_ERMapper) ... ok
test_basic_single_line_grid_default_format (anuga.abstract_2d_finite_volumes.tests.test_ermapper.Test_ERMapper) ... ok
test_header_creation (anuga.abstract_2d_finite_volumes.tests.test_ermapper.Test_ERMapper) ... ok
test_write_default_header (anuga.abstract_2d_finite_volumes.tests.test_ermapper.Test_ERMapper) ... ok
test_write_grid (anuga.abstract_2d_finite_volumes.tests.test_ermapper.Test_ERMapper) ... ok
test_write_non_default_header (anuga.abstract_2d_finite_volumes.tests.test_ermapper.Test_ERMapper) ... ok
Most of this test was copied from test_interpolate ... ok
Check sww2csv timeseries at centroid. ... ok
test_sww2csv_gauge_point_off_mesh (anuga.abstract_2d_finite_volumes.tests.test_gauge.Test_Gauge) ... ok
test_sww2csv_gauges1 (anuga.abstract_2d_finite_volumes.tests.test_gauge.Test_Gauge) ... ok
Most of this test was copied from test_interpolate ... ok
This is testing the sww2csv_gauges function, by creating multiple ... ok
Check sww2csv timeseries at centroid, then output the centroid coordinates. ... ok
test_areas (anuga.abstract_2d_finite_volumes.tests.test_general_mesh.Test_General_Mesh) ... ok
test_assert_index_in_nodes - ... ok
test_get_edge_midpoint_coordinates (anuga.abstract_2d_finite_volumes.tests.test_general_mesh.Test_General_Mesh) ... ok
test_get_vertex_coordinates_triangle_id ... ok
test_get_edge_midpoint_coordinates_with_geo_ref (anuga.abstract_2d_finite_volumes.tests.test_general_mesh.Test_General_Mesh) ... ok
test_get_triangles_and_vertices_per_node - ... ok
test_get_triangles_and_vertices_per_node - ... ok
get unique_vertex based on triangle lists. ... ok
test_get_vertex_coordinates (anuga.abstract_2d_finite_volumes.tests.test_general_mesh.Test_General_Mesh) ... ok
test_get_vertex_coordinates_triangle_id ... ok
test_get_vertex_coordinates_with_geo_ref (anuga.abstract_2d_finite_volumes.tests.test_general_mesh.Test_General_Mesh) ... ok
Get connectivity based on triangle lists. ... ok
test_one_degenerate_triangles (anuga.abstract_2d_finite_volumes.tests.test_general_mesh.Test_General_Mesh) ... ok
test_two_degenerate_triangles (anuga.abstract_2d_finite_volumes.tests.test_general_mesh.Test_General_Mesh) ... ok
Check that structures are correct. ... ok
test_dirichlet (anuga.abstract_2d_finite_volumes.tests.test_generic_boundary_conditions.Test_Generic_Boundary_Conditions) ... ok
test_dirichlet_empty (anuga.abstract_2d_finite_volumes.tests.test_generic_boundary_conditions.Test_Generic_Boundary_Conditions) ... ok
Test that boundary object complains if number of ... ok
test_generic (anuga.abstract_2d_finite_volumes.tests.test_generic_boundary_conditions.Test_Generic_Boundary_Conditions) ... ok
test_time (anuga.abstract_2d_finite_volumes.tests.test_generic_boundary_conditions.Test_Generic_Boundary_Conditions) ... ok
test_time_space_boundary (anuga.abstract_2d_finite_volumes.tests.test_generic_boundary_conditions.Test_Generic_Boundary_Conditions) ... ok
test_transmissive (anuga.abstract_2d_finite_volumes.tests.test_generic_boundary_conditions.Test_Generic_Boundary_Conditions) ... ok
test_CFL (anuga.abstract_2d_finite_volumes.tests.test_generic_domain.Test_Domain) ... ok
Test that quantities already set can be added to using ... ok
test_boundary_conditions (anuga.abstract_2d_finite_volumes.tests.test_generic_domain.Test_Domain) ... ok
test_boundary_indices (anuga.abstract_2d_finite_volumes.tests.test_generic_domain.Test_Domain) ... ok
test_conserved_evolved_boundary_conditions (anuga.abstract_2d_finite_volumes.tests.test_generic_domain.Test_Domain) ... ok
test_conserved_quantities (anuga.abstract_2d_finite_volumes.tests.test_generic_domain.Test_Domain) ... ok
Quantity created from other quantities using arbitrary expression ... ok
Domain implements a default first order gradient limiter ... ok
test_rectangular_periodic_and_ghosts (anuga.abstract_2d_finite_volumes.tests.test_generic_domain.Test_Domain) ... ok
test_set_quanitities_to_be_monitored ... ok
Quantity set using arbitrary expression ... ok
Set quantities for sub region ... ok
test_setting_timestepping_method ... ok
test_simple (anuga.abstract_2d_finite_volumes.tests.test_generic_domain.Test_Domain) ... ok
test_update_conserved_quantities (anuga.abstract_2d_finite_volumes.tests.test_generic_domain.Test_Domain) ... ok
test_simple (anuga.abstract_2d_finite_volumes.tests.test_ghost.Test_Domain) ... ok
test_basic_triangle (anuga.abstract_2d_finite_volumes.tests.test_neighbour_mesh.Test_Mesh) ... ok
test_boundary_inputs (anuga.abstract_2d_finite_volumes.tests.test_neighbour_mesh.Test_Mesh) ... ok
test_boundary_inputs_using_all_defaults (anuga.abstract_2d_finite_volumes.tests.test_neighbour_mesh.Test_Mesh) ... ok
test_boundary_inputs_using_one_default (anuga.abstract_2d_finite_volumes.tests.test_neighbour_mesh.Test_Mesh) ... ok
test_boundary_polygon (anuga.abstract_2d_finite_volumes.tests.test_neighbour_mesh.Test_Mesh) ... ok
test_boundary_polygon_II (anuga.abstract_2d_finite_volumes.tests.test_neighbour_mesh.Test_Mesh) ... ok
Same as II but vertices ordered differently ... ok
test_boundary_polygon_IIIa - Check pathological situation where ... ok
Reproduce test test_spatio_temporal_file_function_time ... ok
Create a discontinuous mesh (duplicate vertices) ... ok
test_boundary_polygon_VI(self) ... ok
test_boundary_tags (anuga.abstract_2d_finite_volumes.tests.test_neighbour_mesh.Test_Mesh) ... ok
test_build_neighbour_structure_duplicates (anuga.abstract_2d_finite_volumes.tests.test_neighbour_mesh.Test_Mesh) ... ok
test_general_triangle (anuga.abstract_2d_finite_volumes.tests.test_neighbour_mesh.Test_Mesh) ... ok
test_get_intersecting_segments(self): ... ok
test_get_intersecting_segments(self): ... ok
test_get_intersecting_segments(self): ... ok
test_get_intersecting_segments(self): ... ok
test_get_intersecting_segments(self): ... ok
test_get_intersecting_segments(self): ... ok
test_get_intersecting_segments(self): ... ok
test_get_intersecting_segments_coinciding(self): ... ok
test_get_intersecting_segments_partially_coinciding(self): ... ok
test_get_triangle_containing_point (anuga.abstract_2d_finite_volumes.tests.test_neighbour_mesh.Test_Mesh) ... ok
test_get_triangle_neighbours (anuga.abstract_2d_finite_volumes.tests.test_neighbour_mesh.Test_Mesh) ... ok
test_inputs (anuga.abstract_2d_finite_volumes.tests.test_neighbour_mesh.Test_Mesh) ... ok
test that the radius is calculated correctly by mesh in the case of an equilateral triangle ... ok
test that the radius is calculated correctly by mesh in the case of a right-angled triangle ... ok
get values based on triangle lists. ... ok
test_lone_vertices (anuga.abstract_2d_finite_volumes.tests.test_neighbour_mesh.Test_Mesh) ... ok
test_mesh_and_neighbours (anuga.abstract_2d_finite_volumes.tests.test_neighbour_mesh.Test_Mesh) ... ok
test_mesh_get_boundary_polygon_with_georeferencing ... ok
test_more_triangles (anuga.abstract_2d_finite_volumes.tests.test_neighbour_mesh.Test_Mesh) ... ok
test_rectangular_mesh (anuga.abstract_2d_finite_volumes.tests.test_neighbour_mesh.Test_Mesh) ... ok
test_rectangular_mesh2 (anuga.abstract_2d_finite_volumes.tests.test_neighbour_mesh.Test_Mesh) ... ok
test_rectangular_mesh_basic (anuga.abstract_2d_finite_volumes.tests.test_neighbour_mesh.Test_Mesh) ... ok
test_surrogate_neighbours (anuga.abstract_2d_finite_volumes.tests.test_neighbour_mesh.Test_Mesh) ... ok
test_triangle_inputs (anuga.abstract_2d_finite_volumes.tests.test_neighbour_mesh.Test_Mesh) ... ok
test_two_triangles (anuga.abstract_2d_finite_volumes.tests.test_neighbour_mesh.Test_Mesh) ... ok
test_pmesh2Domain (anuga.abstract_2d_finite_volumes.tests.test_pmesh2domain.Test_pmesh2domain) ... ok
test_pmesh2Domain_instance (anuga.abstract_2d_finite_volumes.tests.test_pmesh2domain.Test_pmesh2domain) ... ok
test_backup_saxpy_centroid_values (anuga.abstract_2d_finite_volumes.tests.test_quantity.Test_Quantity) ... ok
test_both_updates (anuga.abstract_2d_finite_volumes.tests.test_quantity.Test_Quantity) ... ok
test_boundary_allocation (anuga.abstract_2d_finite_volumes.tests.test_quantity.Test_Quantity) ... ok
test_cache_test_set_values_from_file (anuga.abstract_2d_finite_volumes.tests.test_quantity.Test_Quantity) ... Segmentation fault (core dumped)

@JamesRamm
Author

I've managed to track the seg fault to the following code:

quantity.set_values(filename=ptsfile,
                    attribute_name=att,
                    alpha=0,
                    use_cache=True,
                    verbose=False)

This is at line 1095 of test_quantity.py.
I'll do a bit more poking around.

@JamesRamm
Author

JamesRamm commented Jul 28, 2017

Ok, so I got the debugger out and followed the call stack of that failing test way down to a function called cg_solve_c_precon, which is called by conjugate_gradient.

This is a C extension (cg_ext.c), which I'm not set up to debug, but hopefully this will be of help to you!

EDIT: a little more info. Running the tests with gdb gives the following output:

(gdb) run runtests.py 
Starting program: /home/james/miniconda3/envs/anuga/bin/python runtests.py
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Building, see build.log...
Build OK
Running unit tests for anuga
NumPy version 1.13.1
NumPy relaxed strides checking option: True
NumPy is installed in /home/james/miniconda3/envs/anuga/lib/python2.7/site-packages/numpy
Python version 2.7.13 |Continuum Analytics, Inc.| (default, Dec 20 2016, 23:09:15) [GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]
nose version 1.3.7
...............................................................................................[New Thread 0x7fffd8d62780 (LWP 26691)]
[New Thread 0x7fffd8961800 (LWP 26692)]
[New Thread 0x7fffd8560880 (LWP 26693)]

Thread 1 "python" received signal SIGSEGV, Segmentation fault.
0x00007ffff39060b2 in dcopy_ ()
   from /home/james/miniconda3/envs/anuga/lib/python2.7/site-packages/numpy/core/../../../../libmkl_intel_lp64.so
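
gdb shows the native frame where the crash happened. As a complement, Python's faulthandler module can print the Python-level traceback when the interpreter receives SIGSEGV. The following is a minimal sketch, assuming a Python 3 interpreter; the thread itself uses Python 2.7, where faulthandler is not in the standard library and would need the PyPI backport enabled explicitly. The helper name run_with_faulthandler is hypothetical.

```python
import os
import subprocess
import sys


def run_with_faulthandler(args):
    """Run a child Python process with faulthandler enabled at startup.

    If the child dies from a fatal signal such as SIGSEGV, faulthandler
    writes the Python-level traceback to the child's stderr first.
    """
    # PYTHONFAULTHANDLER is honoured by Python 3 interpreters at startup.
    env = dict(os.environ, PYTHONFAULTHANDLER="1")
    return subprocess.run(
        [sys.executable] + list(args),
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )


# Harmless demonstration: confirm the handler is active in the child.
result = run_with_faulthandler(
    ["-c", "import faulthandler; print(faulthandler.is_enabled())"]
)
print(result.stdout.strip())
```

For a test run like the one in this thread, the call would look like run_with_faulthandler(["runtests.py", "-v"]), with the traceback (if any) available in result.stderr.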

@JamesRamm
Author

JamesRamm commented Jul 28, 2017

And the top frames (#0 to #10) of the stack trace from that seg fault:

#0  0x00007ffff39060b2 in dcopy_ ()
   from /home/james/miniconda3/envs/anuga/lib/python2.7/site-packages/numpy/core/../../../../libmkl_intel_lp64.so
#1  0x00007fffec1645ab in _cg_solve_c_precon (data=0x17eb5a0, colind=0x17ee290, row_ptr=0x178db00, b=0x17ee360, 
    x=0x17ee3e0, imax=1012, tol=1e-08, a_tol=1e-14, M=6, precon=0x17ee3a0) at anuga/utilities/cg_ext.c:321
#2  0x00007fffec164892 in cg_solve_c_precon (self=<optimised out>, args=<optimised out>)
    at anuga/utilities/cg_ext.c:546
#3  0x00007ffff7ad91e5 in call_function (oparg=<optimised out>, pp_stack=0x7fffffff6928) at Python/ceval.c:4352
#4  PyEval_EvalFrameEx (f=<optimised out>, throwflag=<optimised out>) at Python/ceval.c:2989
#5  0x00007ffff7adac3e in PyEval_EvalCodeEx (co=0x7fffec56f0b0, globals=<optimised out>, locals=<optimised out>, 
    args=<optimised out>, argcount=3, kws=0x17edc58, kwcount=3, defs=0x7fffec56d608, defcount=8, closure=0x0)
    at Python/ceval.c:3584
#6  0x00007ffff7ada1f7 in fast_function (nk=<optimised out>, na=3, n=<optimised out>, pp_stack=0x7fffffff6b48, 
    func=0x7fffec9d7c08) at Python/ceval.c:4447
#7  call_function (oparg=<optimised out>, pp_stack=0x7fffffff6b48) at Python/ceval.c:4372
#8  PyEval_EvalFrameEx (f=<optimised out>, throwflag=<optimised out>) at Python/ceval.c:2989
#9  0x00007ffff7adac3e in PyEval_EvalCodeEx (co=0x7fffec9e7a30, globals=<optimised out>, locals=<optimised out>, 
    args=<optimised out>, argcount=3, kws=0x17ec718, kwcount=4, defs=0x7fffec9e4a90, defcount=6, closure=0x0)
    at Python/ceval.c:3584
#10 0x00007ffff7ada1f7 in fast_function (nk=<optimised out>, na=3, n=<optimised out>, pp_stack=0x7fffffff6d68, 
    func=0x7fffec5767d0) at Python/ceval.c:4447

@stoiver
Member

stoiver commented Jul 28, 2017

@JamesRamm great work. The problem seems to be that cg_ext.c defines a function dcopy, but there is a function of the same name in /home/james/miniconda3/envs/anuga/lib/python2.7/site-packages/numpy/core/../../../../libmkl_intel_lp64.so. Probably the easiest way out will be to rename the functions in cg_ext.c that have BLAS/LAPACK-style names to something more unique. It is still a bit strange that the local functions are not the ones being linked.

@JamesRamm
Author

That is strange. There was a bunch of output from running setup.py build which included some warnings - I could build again to take a closer look at this.

Will setup.py clean remove the current build?

Changing the name to something more unique seems the easiest way out.
cg_ext contains a numpy include, #include "numpy/arrayobject.h", and I wonder if this pulls in dcopy somewhere in its references, overriding the local one? Although if that were the case, I would expect this issue to crop up on everyone's build.

@stoiver
Member

stoiver commented Jul 28, 2017

I guess the problem is that conda's numpy is linked against libmkl_intel_lp64.so, which contains the BLAS/LAPACK procedures such as dcopy. That version has a few extra calling arguments, which no doubt caused the segmentation fault.
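
One way to check a suspected clash of this kind is to ask whether a given shared library exports the symbol in question. The following is a minimal sketch using only the standard library's ctypes; the helper name exports_symbol is made up for illustration, and it is demonstrated on the C math library rather than MKL so that it runs on most Unix systems.

```python
import ctypes
import ctypes.util


def exports_symbol(library_path, symbol):
    """Return True if the shared library at library_path exports symbol."""
    lib = ctypes.CDLL(library_path)
    # ctypes resolves the symbol on attribute access and raises
    # AttributeError when the dynamic loader cannot find it.
    return hasattr(lib, symbol)


# Demonstration on the C math library.
libm = ctypes.util.find_library("m")
print(exports_symbol(libm, "cos"))
print(exports_symbol(libm, "no_such_symbol_in_libm"))
```

Pointed at the libmkl_intel_lp64.so path from the backtrace with the symbol "dcopy_", the same helper would be expected to report True on an MKL-backed environment. That is an assumption not verified here, and loading that library on its own may fail if its dependencies cannot be resolved.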

@stoiver
Member

stoiver commented Jul 28, 2017

To rebuild use
python setup.py build --force

@JamesRamm
Author

Ok
I managed to get it to work by removing the MKL optimisations:
conda remove mkl
conda install nomkl

After that, numpy, scipy, matplotlib, netcdf4 and gdal need to be reinstalled.
The tests now run to completion (I do get 26 failures though!).

I imagine this means that the conda instructions (and maybe install_conda.sh?) need updating to account for this. conda remove mkl is not necessary if you are installing for the first time; you just need to install nomkl before installing numpy.

However, perhaps it is desirable to support the MKL extensions; they may bring about some performance improvements?

Anaconda docs on the optimisations and how to uninstall are here:
https://docs.continuum.io/mkl-optimizations/

@stoiver
Member

stoiver commented Sep 14, 2017

Fixed in PR #140

stoiver closed this as completed Sep 14, 2017