Crash (segfault) of the Deep CNN sulci recognition in about 6% of cases #96

Closed
ylep opened this issue Mar 13, 2023 · 8 comments
Labels
bug Something isn't working

ylep commented Mar 13, 2023

Describe the bug

Sulci recognition with Deep CNN crashes on some datasets. Of the 1558 hemispheres that I processed today, 91 exhibited the crash (about 5.8%).

Testing on one of the faulty hemispheres showed that the crash is not systematic but happens frequently (9 out of 10 runs). It happens irrespective of whether the GPU (cuda = 0) or the CPU (cuda = -1) is used.
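
Since the crash is intermittent, a small harness can estimate how often it occurs on a given hemisphere. The following is a minimal sketch, not part of the original report: the helper name `count_segfaults` is hypothetical, and it assumes a POSIX system where a process killed by SIGSEGV shows up either as a negative return code in `subprocess` or as exit status 139 when a shell or wrapper sits in between. Whether the `bv` wrapper propagates the status in one of these two forms is an assumption. The demo uses a child Python process that sends SIGSEGV to itself in place of the real labeling command.

```python
import signal
import subprocess
import sys


def count_segfaults(command, runs=10):
    """Run `command` `runs` times and count the runs that ended with SIGSEGV."""
    segv_codes = (-signal.SIGSEGV, 128 + signal.SIGSEGV)
    crashes = 0
    for _ in range(runs):
        result = subprocess.run(
            command,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if result.returncode in segv_codes:
            crashes += 1
    return crashes


# Stand-in for the real command: a child process that kills itself with SIGSEGV.
fake_crash = [
    sys.executable,
    "-c",
    "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)",
]
crashes = count_segfaults(fake_crash, runs=3)
print(f"{crashes} segfault(s) over 3 runs")
```

To use it on the real case, the `fake_crash` list would be replaced by the full `bv python3 -m capsul ...` command from the reproduction steps, split into a list of arguments.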

The last messages printed on standard output are:

ss aggregation done: 2 bits at end
splitVertex end

The crash seems related to a segfault in aimssip.so, since the kernel log contains:

python[5568]: segfault at 55b4a3f9e ip 00007f78020e7548 sp 00007ffd06376778 error 4 in aimssip.so[7f780155c000+d69000]
Code: 48 39 b2 48 08 00 00 74 12 48 39 b2 e0 15 00 00 74 09 48 39 b2 48 16 00 00 74 07 c3 66 0f 1f 44 00 00 48 85 ff 74 f4 48 8b 17 <48> 03 42 b0 c3 0f 1f 00 f3 0f 1e fa 55 48 89 fd 53 48 83 ec 08 48
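
The kernel log line gives both the faulting instruction pointer and the start address and size of the `aimssip.so` mapping, so the position of the fault relative to that mapping can be computed by subtraction. The snippet below is a small sketch of that arithmetic; the regular expression is an assumption about the layout of this particular log line, and the resulting offset is relative to the start of the mapping, which is not necessarily the same as an offset from the beginning of the file.

```python
import re

KERNEL_LINE = (
    "python[5568]: segfault at 55b4a3f9e ip 00007f78020e7548 "
    "sp 00007ffd06376778 error 4 in aimssip.so[7f780155c000+d69000]"
)

# Capture: instruction pointer, library name, mapping start, mapping size.
match = re.search(
    r"ip ([0-9a-f]+) .* in (\S+)\[([0-9a-f]+)\+([0-9a-f]+)\]",
    KERNEL_LINE,
)
ip = int(match.group(1), 16)
library = match.group(2)
base = int(match.group(3), 16)
size = int(match.group(4), 16)

offset = ip - base
# The instruction pointer must fall inside the mapping reported by the kernel.
assert 0 <= offset < size
print(f"{library}: fault at offset {offset:#x} from the start of the mapping")
```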

To Reproduce
Steps to reproduce the behavior:

  1. Get /neurospin/tmp/yleprince/2023-03-13_deepcnn_crash.zip
  2. Run bv python3 -m capsul deepsulci.sulci_labeling.capsul.labeling.SulciDeepLabeling graph=Lsub-50011.arg labeled_graph=output.arg model_file=/casa/install/share/brainvisa-share-5.1/models/models_2019/cnn_models/sulci_unet_model_left.mdsm param_file=/casa/install/share/brainvisa-share-5.1/models/models_2019/cnn_models/sulci_unet_model_params_left.json roots=Lroots_sub-50011.nii.gz skeleton=Lskeleton_sub-50011.nii.gz fix_random_seed=True
  3. Crash.

Environment:

  • Engine: Singularity
  • BrainVISA version: 5.1.0
ylep added the bug label Mar 13, 2023

ylep commented Mar 14, 2023

Here is the complete output of the command using the dataset mentioned in the bug description:

$ bv python3 -m capsul deepsulci.sulci_labeling.capsul.labeling.SulciDeepLabeling graph=Lsub-50011.arg labeled_graph=output.arg model_file=/casa/install/share/brainvisa-share-5.1/models/models_2019/cnn_models/sulci_unet_model_left.mdsm param_file=/casa/install/share/brainvisa-share-5.1/models/models_2019/cnn_models/sulci_unet_model_params_left.json roots=Lroots_sub-50011.nii.gz skeleton=Lskeleton_sub-50011.nii.gz fix_random_seed=True
Working on cpu
Reading FGraph version 3.1
Labeling Lsub-50011.arg
threshold 100
/usr/lib/python3/dist-packages/sklearn/cluster/_agglomerative.py:245: UserWarning: the number of connected components of the connectivity matrix is 2 > 1. Completing it to avoid stopping the tree early.
  connectivity, n_connected_components = _fix_connectivity(
/usr/lib/python3/dist-packages/sklearn/cluster/_agglomerative.py:245: UserWarning: the number of connected components of the connectivity matrix is 2 > 1. Completing it to avoid stopping the tree early.
  connectivity, n_connected_components = _fix_connectivity(
Reading FGraph version 3.1
split vertex 11 in: [8 9] ['F.C.M.ant._left', 'F.C.M.post._left'] , size: 856 (215, 641)
ss in many pieces - re-aggregating...
ss aggregation done: 2 bits at end
splitVertex end
new vertices. 274 582
split vertex doesn't have one label: [8 9] [147, 127]
v1 label: F.C.M.ant._left
split vertex2 doesn't have one label: [8 9] [68, 514]
v2 label: F.C.M.post._left
split vertex 2 in: [30 47] ['S.F.sup._left', 'S.Pe.C.sup._left'] , size: 1024 (828, 196)
ss in many pieces - re-aggregating...
ss aggregation done: 2 bits at end
splitVertex end
new vertices. 822 202
split vertex doesn't have one label: [30 47] [813, 9]
    good enough.
v1 label: S.F.sup._left
split vertex2 doesn't have one label: [30 47] [15, 187]
    good enough.
v2 label: S.Pe.C.sup._left
split vertex 167 in: [ 1 39] ['F.C.L.a._left', 'S.Olf._left'] , size: 491 (197, 294)
splitVertex end
new vertices. 311 180
split vertex doesn't have one label: [ 1 39] [18, 293]
    good enough.
v1 label: S.Olf._left
split vertex2 doesn't have one label: [ 1 39] [179, 1]
    good enough.
v2 label: F.C.L.a._left
split vertex 33 in: [ 1  2 16] ['F.C.L.a._left', 'F.C.L.p._left', 'INSULA_left'] , size: 1217 (2, 693, 522)
ss in many pieces - re-aggregating...
ss aggregation done: 2 bits at end
splitVertex end
new vertices. 732 485
split vertex doesn't have one label: [ 1  2 16] [2, 679, 51]
v1 label: F.C.L.p._left
split vertex2 doesn't have one label: [ 2 16] [14, 471]
    good enough.
v2 label: INSULA_left
split vertex 203 in: [11 61] ['F.Coll._left', 'ventricle_left'] , size: 275 (212, 63)
splitVertex end
new vertices. 61 214
split vertex doesn't have one label: [11 61] [7, 54]
    good enough.
v1 label: ventricle_left
split vertex2 doesn't have one label: [11 61] [205, 9]
    good enough.
v2 label: F.Coll._left
split vertex 11 in: [8 9] ['F.C.M.ant._left', 'F.C.M.post._left'] , size: 274 (147, 127)
ss in many pieces - re-aggregating...
ss aggregation done: 2 bits at end
splitVertex end
new vertices. 126 148
split vertex doesn't have one label: [8 9] [66, 60]
v1 label: F.C.M.ant._left
split vertex2 doesn't have one label: [8 9] [81, 67]
v2 label: F.C.M.ant._left
split vertex 303 in: [8 9] ['F.C.M.ant._left', 'F.C.M.post._left'] , size: 582 (68, 514)
ss in many pieces - re-aggregating...
ss aggregation done: 2 bits at end
splitVertex end
new vertices. 126 456
split vertex doesn't have one label: [8 9] [45, 81]
    good enough.
v1 label: F.C.M.post._left
split vertex2 doesn't have one label: [8 9] [23, 433]
    good enough.
v2 label: F.C.M.post._left
split vertex 33 in: [ 1  2 16] ['F.C.L.a._left', 'F.C.L.p._left', 'INSULA_left'] , size: 732 (2, 679, 51)
ss in many pieces - re-aggregating...
ss aggregation done: 2 bits at end
splitVertex end
new vertices. 624 108
split vertex doesn't have one label: [ 1  2 16] [2, 611, 11]
    good enough.
v1 label: F.C.L.p._left
split vertex2 doesn't have one label: [ 2 16] [68, 40]
    good enough.
v2 label: F.C.L.p._left
split vertex 11 in: [8 9] ['F.C.M.ant._left', 'F.C.M.post._left'] , size: 126 (66, 60)
ss in many pieces - re-aggregating...
ss aggregation done: 2 bits at end
splitVertex end
new vertices. 80 46
split vertex doesn't have one label: [8 9] [52, 28]
    good enough.
v1 label: F.C.M.ant._left
split vertex2 doesn't have one label: [8 9] [14, 32]
    good enough.
v2 label: F.C.M.post._left
split vertex 308 in: [8 9] ['F.C.M.ant._left', 'F.C.M.post._left'] , size: 148 (81, 67)
ss in many pieces - re-aggregating...
ss aggregation done: 2 bits at end
splitVertex end
new vertices. 96 52
split vertex doesn't have one label: [8 9] [66, 30]
    good enough.
v1 label: F.C.M.ant._left
split vertex2 doesn't have one label: [8 9] [15, 37]
    good enough.
v2 label: F.C.M.post._left
Segmentation fault (core dumped)

ylep commented Mar 14, 2023

Using the pdb debugger I was able to locate the crash in this block of code:

    # fusion pass: in each split group, merge vertices which share the same
    # label and are adjacent
    for split_group in split_groups.values():
        labels = {}
        for v in split_group:
            labels.setdefault(v['label'], []).append(v)
        if len(labels) == len(split_group):
            # all vertices have different labels: skip this step
            continue
        for label, vertices in labels.items():
            vertices = set(vertices)  # copy set
            while len(vertices) >= 2:
                v = next(iter(vertices))
                # check junctions
                junctions = [j for j in v.edges()
                             if j.getSyntax() == 'junction'
                             and all(v2 in vertices
                                     for v2 in j.vertices())]
                if len(junctions) == 0:
                    vertices.remove(v)
                else:
                    # merge v and 1st connected other vertex
                    v2 = [v3 for v3 in junctions[0].vertices()
                          if v3 is not v][0]
                    # v2 will disappear
                    vertices.remove(v2)
                    aims.FoldArgOverSegment(graph).mergeVertices(v, v2)
                    del v2
                    # do v again next time since it may have other edges

Therefore, I am transferring this issue to aims-free and continuing to investigate...

ylep transferred this issue from brainvisa/morpho-deepsulci Mar 14, 2023

ylep commented Mar 14, 2023

More precisely, the crash happens on this statement:

junctions = [j for j in v.edges()
             if j.getSyntax() == 'junction'
             and all(v2 in vertices
                     for v2 in j.vertices())]

ylep commented Mar 14, 2023

It looks like the graph structure is corrupted: the call that triggers the segfault is v.edges().

ylep commented Mar 14, 2023

Instrumenting the code as follows reveals that the same vertex is present in several split groups. The crash occurs when that vertex is visited a second time, after it has already been merged:

    # fusion pass: in each split group, merge vertices which share the same
    # label and are adjacent
    for split_group in split_groups.values():
        labels = {}
        for v in split_group:
            labels.setdefault(v['label'], []).append(v)
        if len(labels) == len(split_group):
            # all vertices have different labels: skip this step
            continue
        for label, vertices in labels.items():
            vertices = set(vertices)  # copy set
            while len(vertices) >= 2:
                v = next(iter(vertices))
                print(f"v={id(v)}, calling v.edges()")
                v.edges()
                # check junctions
                junctions = [j for j in v.edges()
                              if j.getSyntax() == 'junction'
                                and all(v2 in vertices
                                        for v2 in j.vertices())]
                if len(junctions) == 0:
                    vertices.remove(v)
                else:
                    # merge v and 1st connected other vertex
                    v2 = [v3 for v3 in junctions[0].vertices()
                          if v3 is not v][0]
                    # v2 will disappear
                    vertices.remove(v2)
                    print(f"calling aims.FoldArgOverSegment(graph).mergeVertices(v={id(v)}, v2={id(v2)})")
                    aims.FoldArgOverSegment(graph).mergeVertices(v, v2)
                    del v2
                    # do v again next time since it may have other edges

v=140310277844432, calling v.edges()
calling aims.FoldArgOverSegment(graph).mergeVertices(v=140310277844432, v2=140310277840544)
v=140310277844432, calling v.edges()
v=140310277847024, calling v.edges()
calling aims.FoldArgOverSegment(graph).mergeVertices(v=140310277847024, v2=140310277843280)
v=140310277847024, calling v.edges()
calling aims.FoldArgOverSegment(graph).mergeVertices(v=140310277847024, v2=140310277839248)
v=140310277847024, calling v.edges()
v=140310277840832, calling v.edges()
calling aims.FoldArgOverSegment(graph).mergeVertices(v=140310277840832, v2=140310277847744)
v=140310277847024, calling v.edges()
v=140310277839824, calling v.edges()
v=140310277839248, calling v.edges()
Segmentation fault (core dumped)
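
One way to read the trace above is to look for an identifier that is passed as `v2` to `mergeVertices` and later shows up again in a `v.edges()` call. The snippet below is a minimal sketch of that check; the function name `find_use_after_merge` is hypothetical, the regular expressions assume the exact wording of the two `print` calls in the instrumented code, and the values are `id()` results for Python objects, so they only identify a vertex as long as the corresponding object is still alive.

```python
import re

TRACE = """\
v=140310277844432, calling v.edges()
calling aims.FoldArgOverSegment(graph).mergeVertices(v=140310277844432, v2=140310277840544)
v=140310277844432, calling v.edges()
v=140310277847024, calling v.edges()
calling aims.FoldArgOverSegment(graph).mergeVertices(v=140310277847024, v2=140310277843280)
v=140310277847024, calling v.edges()
calling aims.FoldArgOverSegment(graph).mergeVertices(v=140310277847024, v2=140310277839248)
v=140310277847024, calling v.edges()
v=140310277840832, calling v.edges()
calling aims.FoldArgOverSegment(graph).mergeVertices(v=140310277840832, v2=140310277847744)
v=140310277847024, calling v.edges()
v=140310277839824, calling v.edges()
v=140310277839248, calling v.edges()
"""

MERGE_RE = re.compile(r"calling .*mergeVertices\(v=(\d+), v2=(\d+)\)")
EDGES_RE = re.compile(r"v=(\d+), calling v\.edges\(\)")


def find_use_after_merge(trace):
    """Return ids that appear in a v.edges() call after being merged as v2."""
    merged = set()
    reused = []
    for line in trace.splitlines():
        merge = MERGE_RE.match(line)
        if merge:
            merged.add(int(merge.group(2)))
            continue
        edges = EDGES_RE.match(line)
        if edges and int(edges.group(1)) in merged:
            reused.append(int(edges.group(1)))
    return reused


reused = find_use_after_merge(TRACE)
print(reused)
```

On this trace the only identifier reported is the one from the last line before the segmentation fault, which was passed as `v2` in the third merge.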

ylep commented Mar 14, 2023

I got it:

  • a vertex object that is merged into another one is left in a broken state by FoldArgOverSegment (calling v.edges() on it can cause a segmentation fault)
  • when the same vertex appears in multiple split groups and is merged into another vertex while the first split group is processed, the crash occurs when v.edges() is called on it while the second split group is processed.

I am about to push a fix.
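
The fix that was pushed is not shown in this thread. Purely as an illustration of the failure mode described above, here is a toy model with one possible guard: remember which vertices have been merged away and skip them in later split groups. Everything in it is hypothetical (`ToyGraph`, `fuse`, the string vertices); it does not use the aims API, a `RuntimeError` stands in for the segmentation fault, and `min()` replaces `next(iter(...))` only to make the example deterministic.

```python
class ToyGraph:
    """Undirected toy graph; a stand-in for the real graph, not its API."""

    def __init__(self, edges):
        self.neighbours = {}
        self.dead = set()
        for a, b in edges:
            self.neighbours.setdefault(a, set()).add(b)
            self.neighbours.setdefault(b, set()).add(a)

    def edges(self, v):
        if v in self.dead:
            # The real code segfaults at this point; the toy raises instead.
            raise RuntimeError(f"vertex {v!r} was merged away")
        return set(self.neighbours.get(v, ()))

    def merge(self, keep, gone):
        """Merge `gone` into `keep`; `gone` must not be used afterwards."""
        for n in self.neighbours.pop(gone, set()):
            self.neighbours[n].discard(gone)
            if n != keep:
                self.neighbours[n].add(keep)
                self.neighbours.setdefault(keep, set()).add(n)
        self.dead.add(gone)


def fuse(graph, split_groups, label_of):
    """Merge adjacent same-label vertices, skipping already-merged ones."""
    merged_away = set()
    for group in split_groups.values():
        by_label = {}
        for v in group:
            if v in merged_away:
                continue  # guard: this vertex no longer exists in the graph
            by_label.setdefault(label_of[v], []).append(v)
        for vertices in by_label.values():
            pending = set(vertices)
            while len(pending) >= 2:
                v = min(pending)  # deterministic pick for the example
                candidates = sorted(graph.edges(v) & pending)
                if not candidates:
                    pending.remove(v)
                else:
                    gone = candidates[0]
                    pending.remove(gone)
                    graph.merge(v, gone)
                    merged_away.add(gone)
    return merged_away


# Vertex "b" belongs to both split groups, as in the failing dataset.
graph = ToyGraph([("a", "b"), ("b", "c")])
label_of = {"a": "X", "b": "X", "c": "X"}
split_groups = {1: ["a", "b"], 2: ["b", "c"]}
merged_away = fuse(graph, split_groups, label_of)
print(merged_away)
```

Without the `if v in merged_away` check, the second split group would call `graph.edges("b")` after `"b"` has been merged into `"a"`, which is the toy equivalent of the crash.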

ylep commented Mar 15, 2023

I confirm that the issue is fixed: I was able to label all 1558 hemispheres without a single crash.

ylep changed the title from "Occasional crash (segfault) of the Deep CNN sulci recognition" to "Crash (segfault) of the Deep CNN sulci recognition in about 6% of cases" on Mar 17, 2023

denisri commented Mar 20, 2023

Thanks @ylep!
