Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RT crashes on i686 (most likely caused by aligned SSE access on unaligned address) #4432

Closed
Llerr opened this issue Mar 7, 2018 · 48 comments
Assignees
Labels
type: bug Something is not doing what it's supposed to be doing

Comments

@Llerr
Copy link

Llerr commented Mar 7, 2018

system: ubuntu 16.04 LTS i686

git branch -a
* (HEAD отделён на 5.4-rc2)
  dev
  remotes/origin/HEAD -> origin/dev
  remotes/origin/autowblocal
(gdb) run
Starting program: /opt/rawtherapee/bin/rawtherapee 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/i386-linux-gnu/libthread_db.so.1".
[New Thread 0xb53fdb40 (LWP 22201)]
[New Thread 0xb4bfcb40 (LWP 22202)]
[New Thread 0xb43fbb40 (LWP 22203)]
[New Thread 0xb3bfab40 (LWP 22204)]
[New Thread 0xb33f9b40 (LWP 22205)]
[New Thread 0xb2bf8b40 (LWP 22206)]
[New Thread 0xb23f7b40 (LWP 22207)]
[New Thread 0xb11ffb40 (LWP 22208)]
[New Thread 0xb07ffb40 (LWP 22209)]

Thread 1 "rawtherapee" received signal SIGSEGV, Segmentation fault.
0x08680681 in LUT<float>::LUT (this=0x942c3b8) at /home/lex/programs/code-rawtherapee/rtgui/../rtengine/LUT.h:177
177	        maxsv = ZEROV;
(gdb) where
#0  0x08680681 in LUT<float>::LUT (this=0x942c3b8) at /home/lex/programs/code-rawtherapee/rtgui/../rtengine/LUT.h:177
#1  0x0867f440 in MyCurve::MyCurve (this=0x942c308, __vtt_parm=0x8da8acc <VTT for MyDiagonalCurve+4>, __in_chrg=<optimized out>)
    at /home/lex/programs/code-rawtherapee/rtgui/mycurve.cc:24
#2  0x08680c5e in MyDiagonalCurve::MyDiagonalCurve (this=0x942c308, __in_chrg=<optimized out>, __vtt_parm=<optimized out>)
    at /home/lex/programs/code-rawtherapee/rtgui/mydiagonalcurve.cc:24
#3  0x084ee417 in DiagonalCurveEditorSubGroup::DiagonalCurveEditorSubGroup (this=0x942c230, prt=0x942b488, curveDir=...)
    at /home/lex/programs/code-rawtherapee/rtgui/diagonalcurveeditorsubgroup.cc:56
#4  0x084e2de8 in CurveEditorGroup::addCurve (this=0x942b488, cType=CT_Diagonal, curveLabel=..., relatedWidget=0x942f780, 
    expandRelatedWidget=true, periodic=true) at /home/lex/programs/code-rawtherapee/rtgui/curveeditorgroup.cc:82
#5  0x087e30c4 in ToneCurve::ToneCurve (this=0x9350e20, __in_chrg=<optimized out>, __vtt_parm=<optimized out>)
    at /home/lex/programs/code-rawtherapee/rtgui/tonecurve.cc:174
#6  0x087f16c8 in ToolPanelCoordinator::ToolPanelCoordinator (this=0x9362000, batch=true)
    at /home/lex/programs/code-rawtherapee/rtgui/toolpanelcoord.cc:42
#7  0x08431b2b in BatchToolPanelCoordinator::BatchToolPanelCoordinator (this=0x9362000, parent=0x92ca8f8)
    at /home/lex/programs/code-rawtherapee/rtgui/batchtoolpanelcoord.cc:28
#8  0x085c243b in FilePanel::FilePanel (this=0x92ca8f8, __in_chrg=<optimized out>, __vtt_parm=<optimized out>)
    at /home/lex/programs/code-rawtherapee/rtgui/filepanel.cc:55
#9  0x087932ba in RTWindow::RTWindow (this=0x9317730, __in_chrg=<optimized out>, __vtt_parm=<optimized out>)
    at /home/lex/programs/code-rawtherapee/rtgui/rtwindow.cc:177
#10 0x08675e43 in (anonymous namespace)::create_rt_window () at /home/lex/programs/code-rawtherapee/rtgui/main.cc:348
#11 0x08677399 in main (argc=1, argv=0xbfffef74) at /home/lex/programs/code-rawtherapee/rtgui/main.cc:659
(gdb) list
172	    LUT()
173	    {
174	        data = nullptr;
175	        reset();
176	#ifdef __SSE2__
177	        maxsv = ZEROV;
178	        sizev = ZEROV;
179	        sizeiv = _mm_setzero_si128();
180	#endif
181	    }
(gdb) 
@heckflosse heckflosse self-assigned this Mar 7, 2018
@heckflosse heckflosse changed the title crashed RT crashes on i686 (most likely caused by algined SSE access on unaligned address) Mar 7, 2018
@Beep6581 Beep6581 added the type: bug Something is not doing what it's supposed to be doing label Mar 7, 2018
@heckflosse
Copy link
Collaborator

This is caused by removing the __x86_64__ dependencies in 96bf06a.
@Floessie Is there an other solution than adding the dependencies again?

@Floessie
Copy link
Collaborator

Floessie commented Mar 7, 2018

@heckflosse Don't we have those ALIGN macros? Can't they be added to the vfloat define?

@heckflosse
Copy link
Collaborator

@Floessie They are used in LUT.h, but I remember that on non __x86_64__ builds they didn't work in all cases. E.g. they align 16 bytes relative to the stack, which may be unaligned. For this reason we had to use -mstackrealign for 32 bit builds in past. But in case of LUT.h vfloat members, that also didn't work on non __x86_64__ builds for whatever reason

@Floessie
Copy link
Collaborator

Floessie commented Mar 7, 2018

@heckflosse Well, I don't have that insight, but why might the stack be unaligned when GCC defaults to -mpreferred-stack-boundary=4?

@heckflosse
Copy link
Collaborator

heckflosse commented Mar 7, 2018

@Floessie iirc

vfloat a ALIGNED16;
a = _mm_setzero_ps();

where a is a member of LUT generates an aligned store _mm_store_ps(), which crashes if memory is not aligned to 16 byte and for whatever reason this never worked on non __x86_64__ for the vfloat members in LUT.h though you are right that it should work.

@heckflosse
Copy link
Collaborator

heckflosse commented Mar 7, 2018

@Floessie Can you test this on a non __x86_64__ build?

#include "LUT.h"
LUTf test(65536);
std::cout << alignof(test) << std::endl;

On __x86_64__ the output is 16.

@heckflosse
Copy link
Collaborator

@Floessie In case on i686 builds the output of the test code from above is 8 this patch probably fixes the bug:

diff --git a/rtengine/LUT.h b/rtengine/LUT.h
index abc2b0bd..5824fd01 100644
--- a/rtengine/LUT.h
+++ b/rtengine/LUT.h
@@ -88,7 +88,7 @@ using LUTd = LUT<double>;
 using LUTuc = LUT<uint8_t>;
 
 template<typename T>
-class LUT :
+class alignas(16) LUT :
     public rtengine::NonCopyable
 {
 protected:

@heckflosse
Copy link
Collaborator

@Llerr I pushed a possible fix to dev branch. Can you please test?

@Floessie
Copy link
Collaborator

Floessie commented Mar 8, 2018

@heckflosse It reports 16 with GCC 7.3 on i686, I'm afraid.

@gaaned92
Copy link

gaaned92 commented Mar 8, 2018

At least on W10 32 bits no improvment

Thread 1 received signal SIGSEGV, Segmentation fault.
0x005ae100 in LUT<float>::LUT (this=0xecff7a8)
    at D:/RAWTHERAPEE/RTSOURCE/rawtherapee/rtengine/LUT.h:177
177             maxsv = ZEROV;
(gdb) bt full
#0  0x005ae100 in LUT<float>::LUT (this=0xecff7a8)
    at D:/RAWTHERAPEE/RTSOURCE/rawtherapee/rtengine/LUT.h:177
No locals.
#1  MyCurve::MyCurve (this=0xecff6f8,
    __vtt_parm=0xbbc0a0 <VTT for MyDiagonalCurve+4>,
    __in_chrg=<optimized out>)
    at D:/RAWTHERAPEE/RTSOURCE/rawtherapee/rtgui/mycurve.cc:24
No locals.
#2  0x005af5bc in MyDiagonalCurve::MyDiagonalCurve (this=0xecff6f8,
    __in_chrg=<optimized out>, __vtt_parm=<optimized out>)
    at D:/RAWTHERAPEE/RTSOURCE/rawtherapee/rtgui/mydiagonalcurve.cc:24
No locals.
#3  0x0049ab14 in DiagonalCurveEditorSubGroup::DiagonalCurveEditorSubGroup (
    this=0xecf15b8, prt=<optimized out>, curveDir=...)
    at D:/RAWTHERAPEE/RTSOURCE/rawtherapee/rtgui/diagonalcurveeditorsubgroup.cc:133
        evdarks = <optimized out>
        evshadows = <optimized out>
        paramCurveSliderBox = <optimized out>
        sideEnd = Gtk::POS_BOTTOM
        custombbox = <optimized out>
        NURBSbbox = <optimized out>
        parambbox = <optimized out>
        evhighlights = <optimized out>
        evlights = <optimized out>
#4  0x0048fd36 in CurveEditorGroup::addCurve (this=this@entry=0xecf1420,
    cType=cType@entry=CT_Diagonal, curveLabel=...,
    relatedWidget=relatedWidget@entry=0xecde378,
    expandRelatedWidget=expandRelatedWidget@entry=true,
    periodic=periodic@entry=true)
    at D:/RAWTHERAPEE/RTSOURCE/rawtherapee/rtgui/curveeditorgroup.cc:82
        newCE = <optimized out>
#5  0x006bb62b in ToneCurve::ToneCurve (this=0xeb4f748,
    __in_chrg=<optimized out>, __vtt_parm=<optimized out>)
    at D:/RAWTHERAPEE/RTSOURCE/rawtherapee/rtgui/tonecurve.cc:147
        m = <optimized out>
        bottomMilestones = {<std::_Vector_base<GradientMilestone, std::allocator<GradientMilestone> >> = {
            _M_impl = {<std::allocator<GradientMilestone>> = {<__gnu_cxx::new_allocator<GradientMilestone>> = {<No data fields>}, <No data fields>},
              _M_start = 0xec420d8, _M_finish = 0xec42128,
              _M_end_of_storage = 0xec42128}}, <No data fields>}
        lab = <optimized out>
#6  0x006c4322 in ToolPanelCoordinator::ToolPanelCoordinator (this=0xeb49388,
    batch=true)
    at D:/RAWTHERAPEE/RTSOURCE/rawtherapee/rtgui/toolpanelcoord.cc:42
        type = <optimized out>
#7  0x0041700b in BatchToolPanelCoordinator::BatchToolPanelCoordinator (
    this=0xeb49388, parent=0xeae1f80)
    at D:/RAWTHERAPEE/RTSOURCE/rawtherapee/rtgui/batchtoolpanelcoord.cc:28
No locals.
#8  0x0052b278 in FilePanel::FilePanel (this=0xeae1f80,
    __in_chrg=<optimized out>, __vtt_parm=<optimized out>)
    at D:/RAWTHERAPEE/RTSOURCE/rawtherapee/rtgui/filepanel.cc:55
        devLab = <optimized out>
        exportLab = <optimized out>
        dirSelected = {<sigc::signal2<void, Glib::ustring const&, Glib::ustring const&, sigc::nil>> = {<sigc::signal_base> = {<sigc::trackable> = {
                callback_list_ = 0x76f4da33 <ntdll!RtlReAllocateHeap+67>},
              impl_ = 0xeaa72d0}, <No data fields>}, <No data fields>}
        sFilterPanel = <optimized out>
        sExportPanel = <optimized out>
        inspectLab = <optimized out>
        filtLab = <optimized out>
#9  0x006816c1 in RTWindow::RTWindow (this=0xebe08e8,
    __in_chrg=<optimized out>, __vtt_parm=<optimized out>)
    at D:/RAWTHERAPEE/RTSOURCE/rawtherapee/rtgui/rtwindow.cc:182
        lbq = <optimized out>
        actionGrid = <optimized out>
        fpanelLabelGrid = <optimized out>
        fpl = <optimized out>
        preferences = <optimized out>
        lMonitorRect = {gobject_ = {x = 0, y = 0, width = 1280,
            height = 1024}}
#10 0x005a8742 in (anonymous namespace)::create_rt_window ()
    at D:/RAWTHERAPEE/RTSOURCE/rawtherapee/rtgui/main.cc:348
        icon_path = {static npos = 4294967295, string_ = {
            static npos = 4294967295,
            _M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>},
              _M_p = 0xea91cb8 "D:\\rawtherapee\\rtinstall\\devrelease32\\.\\images"}, _M_string_length = 46, {
              _M_local_buf = ".\000\000\000c\001\000P\000\000\000\000\060\032▒\016", _M_allocated_capacity = 46}}}
        defaultIconTheme = {pCppObject_ = 0xc94a2d0}
        screen = {pCppObject_ = 0xeaa1b28}
        rtWindow = <optimized out>
#11 0x005aa584 in main (argc=<optimized out>, argv=<optimized out>)
    at D:/RAWTHERAPEE/RTSOURCE/rawtherapee/rtgui/main.cc:659
        m = <incomplete type>
        rtWindow = <optimized out>
        exname = "D:\\rawtherapee\\rtinstall\\devrelease32\\rawtherapee-debug.exe", '\000' <repeats 452 times>
        exePath = {static npos = 4294967295, string_ = {
            static npos = 4294967295,
            _M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>},
              _M_p = 0xc94fbb8 "D:\\rawtherapee\\rtinstall\\devrelease32"},
            _M_string_length = 37, {
              _M_local_buf = "%\000\000\000\005\000\004\001\230C▒\fh▒▒\a",
              _M_allocated_capacity = 37}}}
        exnameU = L"D:\\rawtherapee\\rtinstall\\devrelease32\\rawtherapee-debug.exe", '\000' <repeats 452 times>
        consoleOpened = <optimized out>
        ret = 0
(gdb)

@Floessie
Copy link
Collaborator

Floessie commented Mar 8, 2018

@gaaned92 André, could you also test Ingo's patch with the class alignas(16) LUT above?

@Llerr
Copy link
Author

Llerr commented Mar 8, 2018

@Llerr I pushed a possible fix to dev branch. Can you please test?
What branch is use? 5.4-rc2
Yes, but only 12.03.2018.
I install 5.3 from repositary, all work. But when i compile it`s crashed.

@Beep6581
Copy link
Owner

Beep6581 commented Mar 8, 2018

@Llerr after cloning, do git checkout dev, then do git describe to make sure you're at least on 5.3-669-g478e1c05, then compile. In fact, paste the output of git describe here.
More info: http://rawpedia.rawtherapee.com/Linux#Choose_a_branch

@gaaned92
Copy link

gaaned92 commented Mar 8, 2018

@Floessie
I applied the patch but with no succes

[Thread 11368.0x2ab0 exited with code 0]

Thread 1 received signal SIGSEGV, Segmentation fault.
0x005ae100 in LUT<float>::LUT (this=0xec2d788)
    at D:/RAWTHERAPEE/RTSOURCE/rawtherapee/rtengine/LUT.h:177
177             maxsv = ZEROV;
(gdb) bt full
#0  0x005ae100 in LUT<float>::LUT (this=0xec2d788)
    at D:/RAWTHERAPEE/RTSOURCE/rawtherapee/rtengine/LUT.h:177
No locals.
#1  MyCurve::MyCurve (this=0xec2d6d8,
    __vtt_parm=0xbbc0a0 <VTT for MyDiagonalCurve+4>,
    __in_chrg=<optimized out>)
    at D:/RAWTHERAPEE/RTSOURCE/rawtherapee/rtgui/mycurve.cc:24
No locals.
#2  0x005af5bc in MyDiagonalCurve::MyDiagonalCurve (this=0xec2d6d8,
    __in_chrg=<optimized out>, __vtt_parm=<optimized out>)
    at D:/RAWTHERAPEE/RTSOURCE/rawtherapee/rtgui/mydiagonalcurve.cc:24
No locals.
#3  0x0049ba91 in DiagonalCurveEditorSubGroup::DiagonalCurveEditorSubGroup (
    this=0xec042e0, prt=<optimized out>, curveDir=...)
    at D:/RAWTHERAPEE/RTSOURCE/rawtherapee/rtgui/diagonalcurveeditorsubgroup.cc:210
        evdarks = <optimized out>
        evshadows = <optimized out>
        paramCurveSliderBox = <optimized out>
        sideEnd = Gtk::POS_BOTTOM
        custombbox = <optimized out>
        NURBSbbox = <optimized out>
        parambbox = <optimized out>
        evhighlights = <optimized out>
        evlights = <optimized out>
#4  0x0048fd36 in CurveEditorGroup::addCurve (this=this@entry=0xec04148,
    cType=cType@entry=CT_Diagonal, curveLabel=...,
    relatedWidget=relatedWidget@entry=0xec010c0,
    expandRelatedWidget=expandRelatedWidget@entry=true,
    periodic=periodic@entry=true)
    at D:/RAWTHERAPEE/RTSOURCE/rawtherapee/rtgui/curveeditorgroup.cc:82
        newCE = <optimized out>
#5  0x006bb62b in ToneCurve::ToneCurve (this=0xeaf0658,
    __in_chrg=<optimized out>, __vtt_parm=<optimized out>)
    at D:/RAWTHERAPEE/RTSOURCE/rawtherapee/rtgui/tonecurve.cc:147
        m = <optimized out>
        bottomMilestones = {<std::_Vector_base<GradientMilestone, std::allocator<GradientMilestone> >> = {
            _M_impl = {<std::allocator<GradientMilestone>> = {<__gnu_cxx::new_allocator<GradientMilestone>> = {<No data fields>}, <No data fields>},
              _M_start = 0xeb64d60, _M_finish = 0xeb64db0,
              _M_end_of_storage = 0xeb64db0}}, <No data fields>}
        lab = <optimized out>
#6  0x006c4322 in ToolPanelCoordinator::ToolPanelCoordinator (this=0xeae9858,
    batch=true)
    at D:/RAWTHERAPEE/RTSOURCE/rawtherapee/rtgui/toolpanelcoord.cc:42
        type = <optimized out>
#7  0x0041700b in BatchToolPanelCoordinator::BatchToolPanelCoordinator (
    this=0xeae9858, parent=0xea1c130)
    at D:/RAWTHERAPEE/RTSOURCE/rawtherapee/rtgui/batchtoolpanelcoord.cc:28
No locals.
#8  0x0052b278 in FilePanel::FilePanel (this=0xea1c130,
    __in_chrg=<optimized out>, __vtt_parm=<optimized out>)
    at D:/RAWTHERAPEE/RTSOURCE/rawtherapee/rtgui/filepanel.cc:55
        devLab = <optimized out>
        exportLab = <optimized out>
        dirSelected = {<sigc::signal2<void, Glib::ustring const&, Glib::ustring const&, sigc::nil>> = {<sigc::signal_base> = {<sigc::trackable> = {
                callback_list_ = 0x76f4da33 <ntdll!RtlReAllocateHeap+67>},
              impl_ = 0xe9d6e78}, <No data fields>}, <No data fields>}
        sFilterPanel = <optimized out>
        sExportPanel = <optimized out>
        inspectLab = <optimized out>
        filtLab = <optimized out>
#9  0x006816c1 in RTWindow::RTWindow (this=0xe9f5838,
    __in_chrg=<optimized out>, __vtt_parm=<optimized out>)
    at D:/RAWTHERAPEE/RTSOURCE/rawtherapee/rtgui/rtwindow.cc:182
        lbq = <optimized out>
        actionGrid = <optimized out>
        fpanelLabelGrid = <optimized out>
        fpl = <optimized out>
        preferences = <optimized out>
        lMonitorRect = {gobject_ = {x = 0, y = 0, width = 1280,
            height = 1024}}
#10 0x005a8742 in (anonymous namespace)::create_rt_window ()
    at D:/RAWTHERAPEE/RTSOURCE/rawtherapee/rtgui/main.cc:348
        icon_path = {static npos = 4294967295, string_ = {
            static npos = 4294967295,
            _M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>},
              _M_p = 0xe9b4400 "D:\\rawtherapee\\rtinstall\\devrelease32\\.\\images"}, _M_string_length = 46, {
              _M_local_buf = ".\000\000\000c\001\000P\000\000\000\000xA▒\016", _M_allocated_capacity = 46}}}
        defaultIconTheme = {pCppObject_ = 0xea46bb0}
        screen = {pCppObject_ = 0xea5db00}
        rtWindow = <optimized out>
#11 0x005aa584 in main (argc=<optimized out>, argv=<optimized out>)
    at D:/RAWTHERAPEE/RTSOURCE/rawtherapee/rtgui/main.cc:659
        m = <incomplete type>
        rtWindow = <optimized out>
        exname = "D:\\rawtherapee\\rtinstall\\devrelease32\\rawtherapee-debug.exe", '\000' <repeats 452 times>
        exePath = {static npos = 4294967295, string_ = {
            static npos = 4294967295,
            _M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>},
              _M_p = 0xc5af940 "D:\\rawtherapee\\rtinstall\\devrelease32"},
            _M_string_length = 37, {
              _M_local_buf = "%\000\000\000▒\023\004▒▒▒\\\fh▒▒\a",
              _M_allocated_capacity = 37}}}
        exnameU = L"D:\\rawtherapee\\rtinstall\\devrelease32\\rawtherapee-debug.exe", '\000' <repeats 452 times>
        consoleOpened = <optimized out>
        ret = 0
(gdb)

beginning of LUT.h

#ifndef LUT_H_
#define LUT_H_

#include <cstring>
#include <cstdint>
#include <cassert>

#ifndef NDEBUG
#include <glibmm.h>
#include <fstream>
#endif

#include "opthelper.h"
#include "rt_math.h"
#include "noncopyable.h"

// Bit representations of flags
enum {
    LUT_CLIP_BELOW = 1 << 0,
    LUT_CLIP_ABOVE = 1 << 1
};

template<typename T>
class LUT;

using LUTf = LUT<float>;
using LUTi = LUT<int32_t>;
using LUTu = LUT<uint32_t>;
using LUTd = LUT<double>;
using LUTuc = LUT<uint8_t>;

template<typename T>
class alignas(16) LUT :
    public rtengine::NonCopyable
{
protected:
    // list of variables ordered to improve cache speed
    int maxs;
    float maxsf;
    T * data;
    unsigned int clip;
    unsigned int size;
    unsigned int upperBound;  // always equals size-1, parameter created for performance reason
private:
    unsigned int owner;
#ifdef __SSE2__
    alignas(16) vfloat maxsv;
    alignas(16) vfloat sizev;
    alignas(16) vint sizeiv;
#endif
public:
    /// convenience flag! If one doesn't want to delete the buffer but want to flag it to be recomputed...
    /// The user have to handle it itself, even if some method can (re)initialize it
    bool dirty;

@heckflosse
Copy link
Collaborator

This patch for dev should force unaligned access on vfloat/vint members of LUT when building with SSE2 support for non x86/64 targets:

diff --git a/rtengine/LUT.h b/rtengine/LUT.h
index abc2b0bd..709bf860 100644
--- a/rtengine/LUT.h
+++ b/rtengine/LUT.h
@@ -133,9 +133,15 @@ public:
         maxs = size - 2;
         maxsf = (float)maxs;
 #ifdef __SSE2__
+#ifdef __x86_64__
         maxsv =  F2V( maxs );
         sizeiv =  _mm_set1_epi32( (int)(size - 1) );
         sizev = F2V( size - 1 );
+#else
+        _mm_storeu_ps((float*)&maxsv, F2V( maxs ));
+        _mm_storeu_si128(&sizeiv, _mm_set1_epi32( (int)(size - 1) ));
+        _mm_storeu_ps((float*)&sizev, F2V( size - 1 ));
+#endif
 #endif
     }
     void operator ()(int s, int flags = 0xfffffff)
@@ -163,9 +169,16 @@ public:
         maxs = size - 2;
         maxsf = (float)maxs;
 #ifdef __SSE2__
+#ifdef __x86_64__
         maxsv =  F2V( maxs );
         sizeiv =  _mm_set1_epi32( (int)(size - 1) );
         sizev = F2V( size - 1 );
+#else
+        _mm_storeu_ps((float*)&maxsv, F2V( maxs ));
+        _mm_storeu_si128(&sizeiv, _mm_set1_epi32( (int)(size - 1) ));
+        _mm_storeu_ps((float*)&sizev, F2V( size - 1 ));
+
+#endif
 #endif
     }
 
@@ -174,9 +187,15 @@ public:
         data = nullptr;
         reset();
 #ifdef __SSE2__
+#ifdef __x86_64__
         maxsv = ZEROV;
         sizev = ZEROV;
         sizeiv = _mm_setzero_si128();
+#else
+        _mm_storeu_ps((float*)&maxsv, ZEROV);
+        _mm_storeu_ps((float*)&sizev, ZEROV);
+        _mm_storeu_si128(&sizeiv, _mm_setzero_si128());
+#endif
 #endif
     }
 
@@ -238,9 +257,15 @@ public:
             this->maxs = this->size - 2;
             this->maxsf = (float)this->maxs;
 #ifdef __SSE2__
+#ifdef __x86_64__
             this->maxsv =  F2V( this->size - 2);
             this->sizeiv =  _mm_set1_epi32( (int)(this->size - 1) );
             this->sizev = F2V( this->size - 1 );
+#else
+            _mm_storeu_ps((float*)&this->maxsv, F2V( this->size - 2));
+            _mm_storeu_si128(&this->sizeiv, _mm_set1_epi32( (int)(this->size - 1) ));
+            _mm_storeu_ps((float*)&this->sizev, F2V( this->size - 1 ));
+#endif
 #endif
         }
 
@@ -311,7 +336,11 @@ public:
 
         // Clamp and convert to integer values. Extract out of SSE register because all
         // lookup operations use regular addresses.
+#ifdef __x86_64__
         vfloat clampedIndexes = vmaxf(ZEROV, vminf(maxsv, indexv));
+#else
+        vfloat clampedIndexes = vmaxf(ZEROV, vminf(_mm_loadu_ps((float*)&maxsv), indexv));
+#endif
         vint indexes = _mm_cvttps_epi32(clampedIndexes);
         int indexArray[4];
         _mm_storeu_si128(reinterpret_cast<__m128i*>(&indexArray[0]), indexes);
@@ -343,7 +372,11 @@ public:
 
         // Clamp and convert to integer values. Extract out of SSE register because all
         // lookup operations use regular addresses.
+#ifdef __x86_64__
         vfloat clampedIndexes = vmaxf(ZEROV, vminf(maxsv, indexv));
+#else
+        vfloat clampedIndexes = vmaxf(ZEROV, vminf(_mm_loadu_ps((float*)&maxsv), indexv));
+#endif
         vint indexes = _mm_cvttps_epi32(clampedIndexes);
         int indexArray[4];
         _mm_storeu_si128(reinterpret_cast<__m128i*>(&indexArray[0]), indexes);
@@ -363,7 +396,11 @@ public:
         vfloat lower = _mm_castsi128_ps(_mm_unpacklo_epi64(temp0, temp1));
         vfloat upper = _mm_castsi128_ps(_mm_unpackhi_epi64(temp0, temp1));
 
+#ifdef __x86_64__
         vfloat diff = vmaxf(ZEROV, vminf(sizev, indexv)) - _mm_cvtepi32_ps(indexes);
+#else
+        vfloat diff = vmaxf(ZEROV, vminf(_mm_loadu_ps((float*)&sizev), indexv)) - _mm_cvtepi32_ps(indexes);
+#endif
         return vintpf(diff, upper, lower);
     }
 
@@ -374,7 +411,11 @@ public:
 
         // Clamp and convert to integer values. Extract out of SSE register because all
         // lookup operations use regular addresses.
+#ifdef __x86_64__
         vfloat clampedIndexes = vmaxf(ZEROV, vminf(maxsv, indexv));
+#else
+        vfloat clampedIndexes = vmaxf(ZEROV, vminf(_mm_loadu_ps((float*)&maxsv), indexv));
+#endif
         vint indexes = _mm_cvttps_epi32(clampedIndexes);
         int indexArray[4];
         _mm_storeu_si128(reinterpret_cast<__m128i*>(&indexArray[0]), indexes);
@@ -402,7 +443,11 @@ public:
     vfloat operator[](vint idxv ) const
     {
         vfloat tempv, p1v;
+#ifdef __x86_64__
         idxv = _mm_max_epi32( _mm_setzero_si128(), _mm_min_epi32(idxv, sizeiv));
+#else
+        idxv = _mm_max_epi32( _mm_setzero_si128(), _mm_min_epi32(idxv, _mm_loadu_si128((__m128i*)&sizeiv)));
+#endif
         // access the LUT 4 times and shuffle the values into p1v
 
         int idx;
@@ -443,7 +488,11 @@ public:
     {
         vfloat tempv, p1v;
         tempv = _mm_cvtepi32_ps(idxv);
+#ifdef __x86_64__
         tempv = _mm_min_ps( tempv, sizev );
+#else
+        tempv = _mm_min_ps( tempv, _mm_loadu_ps((float*)&sizev ));
+#endif
         idxv = _mm_cvttps_epi32(_mm_max_ps( tempv, _mm_setzero_ps( )  ));
         // access the LUT 4 times and shuffle the values into p1v
 
@@ -691,9 +740,15 @@ public:
         maxs = size - 2;
         maxsf = (float)maxs;
 #ifdef __SSE2__
+#ifdef __x86_64__
         maxsv =  F2V( size - 2);
         sizeiv =  _mm_set1_epi32( (int)(size - 1) );
         sizev = F2V( size - 1 );
+#else
+        _mm_storeu_ps((float*)&maxsv, F2V( size - 2));
+        _mm_storeu_si128(&sizeiv, _mm_set1_epi32( (int)(size - 1) ));
+        _mm_storeu_ps((float*)&sizev, F2V( size - 1 ));
+#endif
 #endif
     }
 

@Floessie
Copy link
Collaborator

Floessie commented Mar 8, 2018

@heckflosse Isn't unaligned heap our problem here, given the fact, that this MyDiagonalCurve is allocated via new?

@heckflosse
Copy link
Collaborator

@Floessie Yes, I think that's the problem.

@Floessie
Copy link
Collaborator

Floessie commented Mar 8, 2018

@heckflosse This is just a PoC (call it a dirty hack):

diff --git a/rtgui/main.cc b/rtgui/main.cc
index 2ae442e5a..34e7fe68a 100644
--- a/rtgui/main.cc
+++ b/rtgui/main.cc
@@ -51,6 +51,18 @@
 #include "conio.h"
 #endif
 
+#include <stdlib.h>
+
+void* operator new(decltype(sizeof(0)) n) noexcept(false)
+{
+    return aligned_alloc(16, n);
+}
+
+void operator delete(void* p) noexcept
+{
+    free(p);
+}
+
 // Set this to 1 to make RT work when started with Eclipse and arguments, at least on Windows platform
 #define ECLIPSE_ARGS 0
 

This requires C11.

@gaaned92
Copy link

gaaned92 commented Mar 8, 2018

If you want I test something, ping me.

@Floessie
Copy link
Collaborator

Floessie commented Mar 8, 2018

@gaaned92 Ping! 😄 Could you test my dirty hack, so we know if it helps at all?

@heckflosse
Copy link
Collaborator

heckflosse commented Mar 8, 2018

@Floessie I tried your patch on msys2/mingw64.

There is no aligned_alloc(), so I used the suggested _aligned_malloc().
That compiles, but segfaults.
Then I also used _aligned_free() instead of free()
Still segfaults.
I even tried _aligned_malloc(16, n + 16 - ((n%16) ? n&16 : 16) ); because of this
Still segfaults :(

Edit: I changed _aligned_malloc(16, n) to _aligned_malloc(n,16)
That terminates rt immediately after start without a segfault :(

@Floessie
Copy link
Collaborator

Floessie commented Mar 8, 2018

That's a pity. Works on Linux amd64 and i686. But I don't have real hardware that would SEGV on unaligned SSE accesses, so I can't test...

@heckflosse
Copy link
Collaborator

@Floessie afaik all cpus fail on aligned SSE access on unaligned memory

@heckflosse
Copy link
Collaborator

heckflosse commented Mar 8, 2018

@gaaned92 Ping for testing my patch

@gaaned92
Copy link

gaaned92 commented Mar 8, 2018

@heckflosse
with your patch
for build = debug ok

for build = release crash


Thread 1 received signal SIGSEGV, Segmentation fault.
0x005bb0be in ?? ()
(gdb) bt full
#0  0x005bb0be in ?? ()
No symbol table info available.
#1  0x005bcd7e in ?? ()
No symbol table info available.
#2  0x004a3c83 in ?? ()
No symbol table info available.
#3  0x0049669f in ?? ()
No symbol table info available.
#4  0x006cc8ca in ?? ()
No symbol table info available.
#5  0x006d47fd in ?? ()
No symbol table info available.
#6  0x0041c09a in ?? ()
No symbol table info available.
#7  0x0053a5aa in ?? ()
No symbol table info available.
#8  0x00689c10 in ?? ()
No symbol table info available.
#9  0x005b7b8d in ?? ()
No symbol table info available.
#10 0x00b260af in ?? ()
No symbol table info available.
#11 0x004013cb in ?? ()
No symbol table info available.
#12 0x73c48654 in KERNEL32!BaseThreadInitThunk ()
   from C:\WINDOWS\System32\kernel32.dll
No symbol table info available.
#13 0x76f74a77 in ntdll!RtlGetAppContainerNamedObjectPath ()
   from C:\WINDOWS\SYSTEM32\ntdll.dll
No symbol table info available.
#14 0x76f74a47 in ntdll!RtlGetAppContainerNamedObjectPath ()
   from C:\WINDOWS\SYSTEM32\ntdll.dll
No symbol table info available.
#15 0x00000000 in ?? ()
No symbol table info available.
(gdb)

Not a lot of information here!

@heckflosse
Copy link
Collaborator

@gaaned92 Thanks a lot for testing. Because the debug build does not crash, I guess the bug with sse access to unaligned data in LUT.h is fixed with my patch and the crash in release build is a different one (maybe specific to Win32 builds as now all the SSEFUNCTION macros, which were needed for Win32 SSE builds, are removed). But that's just a guess.

Nevertheless it would be good to know how a release build on i686 Linux behaves with the patch @Llerr

@Llerr
Copy link
Author

Llerr commented Mar 12, 2018

@Beep6581

$ git describe 
5.3-680-g8f9fb2d
(gdb) run
Starting program: /opt/rawtherapee/bin/rawtherapee 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/i386-linux-gnu/libthread_db.so.1".
[New Thread 0xb53fdb40 (LWP 7280)]
[New Thread 0xb4bfcb40 (LWP 7281)]
[New Thread 0xb43fbb40 (LWP 7282)]
[New Thread 0xb3bfab40 (LWP 7283)]
[New Thread 0xb33f9b40 (LWP 7284)]
[New Thread 0xb2bf8b40 (LWP 7285)]
[New Thread 0xb23f7b40 (LWP 7286)]
[New Thread 0xb12ffb40 (LWP 7287)]
[New Thread 0xb0afeb40 (LWP 7288)]

Thread 1 "rawtherapee" received signal SIGSEGV, Segmentation fault.
0x08680d41 in LUT<float>::LUT (this=0x94173f8) at /home/lex/programs/code-rawtherapee/rtgui/../rtengine/LUT.h:177
177	        maxsv = ZEROV;
(gdb) where
#0  0x08680d41 in LUT<float>::LUT (this=0x94173f8) at /home/lex/programs/code-rawtherapee/rtgui/../rtengine/LUT.h:177
#1  0x0867fb00 in MyCurve::MyCurve (this=0x9417348, __vtt_parm=0x8da988c <VTT for MyDiagonalCurve+4>, __in_chrg=<optimized out>)
    at /home/lex/programs/code-rawtherapee/rtgui/mycurve.cc:24
#2  0x0868131e in MyDiagonalCurve::MyDiagonalCurve (this=0x9417348, __in_chrg=<optimized out>, __vtt_parm=<optimized out>)
    at /home/lex/programs/code-rawtherapee/rtgui/mydiagonalcurve.cc:24
#3  0x084eea97 in DiagonalCurveEditorSubGroup::DiagonalCurveEditorSubGroup (this=0x9417270, prt=0x94164b8, curveDir=...)
    at /home/lex/programs/code-rawtherapee/rtgui/diagonalcurveeditorsubgroup.cc:56
#4  0x084e3468 in CurveEditorGroup::addCurve (this=0x94164b8, cType=CT_Diagonal, curveLabel=..., relatedWidget=0x940f8f0, 
    expandRelatedWidget=true, periodic=true) at /home/lex/programs/code-rawtherapee/rtgui/curveeditorgroup.cc:82
#5  0x087e34ce in ToneCurve::ToneCurve (this=0x9388c18, __in_chrg=<optimized out>, __vtt_parm=<optimized out>)
    at /home/lex/programs/code-rawtherapee/rtgui/tonecurve.cc:147
#6  0x087f222c in ToolPanelCoordinator::ToolPanelCoordinator (this=0x9393c00, batch=true)
    at /home/lex/programs/code-rawtherapee/rtgui/toolpanelcoord.cc:42
#7  0x084321ab in BatchToolPanelCoordinator::BatchToolPanelCoordinator (this=0x9393c00, parent=0x9294000)
    at /home/lex/programs/code-rawtherapee/rtgui/batchtoolpanelcoord.cc:28
#8  0x085c2abb in FilePanel::FilePanel (this=0x9294000, __in_chrg=<optimized out>, __vtt_parm=<optimized out>)
    at /home/lex/programs/code-rawtherapee/rtgui/filepanel.cc:55
#9  0x08793b42 in RTWindow::RTWindow (this=0x92da278, __in_chrg=<optimized out>, __vtt_parm=<optimized out>)
    at /home/lex/programs/code-rawtherapee/rtgui/rtwindow.cc:182
#10 0x086764c3 in (anonymous namespace)::create_rt_window () at /home/lex/programs/code-rawtherapee/rtgui/main.cc:348
#11 0x08677a80 in main (argc=1, argv=0xbfffef74) at /home/lex/programs/code-rawtherapee/rtgui/main.cc:658

it`s crushed :(

$ lscpu 
Архитектура:i686
CPU op-mode(s):        32-bit, 64-bit
Порядок байтов:Little Endian
CPU(s):                8
On-line CPU(s) list:   0-7
Потоков на ядро:2
Ядер на сокет:4
Сокет(ы):        1
Vendor ID:             GenuineIntel
Семейство CPU:6
Модель:          58
Model name:            Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
Stepping:              9
CPU МГц:            2193.132
CPU max MHz:           3900,0000
CPU min MHz:           1600,0000
BogoMIPS:              6784.23
Виртуализация:VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              8192K
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx rdtscp lm constant_tsc arch_perfmon pebs bts xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm epb retpoline tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm ida arat pln pts

@Floessie
Copy link
Collaborator

@heckflosse

afaik all cpus fail on aligned SSE access on unaligned memory

Tested again with latest dev in a Debian Testing i686 QEMU VM: Doesn't crash. There seems to be a difference between hardware and emulation...

@heckflosse
Copy link
Collaborator

heckflosse commented Mar 12, 2018

@Floessie If you want to know about your emulation handling things different, the following code should crash on all systems:

float test[8] = { 1.f };
for(int i = 0; i < 4; ++i) {
    _mm_store_ps(&test[i], _mm_load_ps(&test[i+1]) + _mm_set1_ps(1.f));
}
printf("%f\n",test[3]);

@Floessie
Copy link
Collaborator

@heckflosse Proof: My old QEMU 2.1.2 doesn't crash on the snippet above, but happily prints out 2.000000. 😞

@heckflosse
Copy link
Collaborator

@Floessie Damn. My example was so simple that the compiler most likely could predict the output. Here's a working (crashing) example:

diff --git a/rtengine/rawimagesource.cc b/rtengine/rawimagesource.cc
index 0420c5b7..8dd30b05 100644
--- a/rtengine/rawimagesource.cc
+++ b/rtengine/rawimagesource.cc
@@ -1520,6 +1520,10 @@ void RawImageSource::vflip (Imagefloat* image)
 int RawImageSource::load (const Glib::ustring &fname)
 {
 
+float test[8] = { 1.f };
+LFModifier::test(test);
+std::cout << test[3] << std::endl;
+
     MyTime t1, t2;
     t1.set();
     fileName = fname;
diff --git a/rtengine/rtlensfun.cc b/rtengine/rtlensfun.cc
index 8216f0b8..ebd8605e 100644
--- a/rtengine/rtlensfun.cc
+++ b/rtengine/rtlensfun.cc
@@ -43,6 +43,11 @@ LFModifier::operator bool() const
     return data_;
 }
 
+void LFModifier::test(float *test) {
+    for(int i = 0; i < 4; ++i) {
+        _mm_store_ps(&test[i], _mm_load_ps(&test[i+1]) + _mm_set1_ps(1.f));
+    }
+}
 
 void LFModifier::correctDistortion(double &x, double &y, int cx, int cy, double scale) const
 {
diff --git a/rtengine/rtlensfun.h b/rtengine/rtlensfun.h
index 7690ef54..ebb7b7f2 100644
--- a/rtengine/rtlensfun.h
+++ b/rtengine/rtlensfun.h
@@ -47,9 +47,9 @@ public:
     void correctCA(double &x, double &y, int cx, int cy, int channel) const override;
     void processVignetteLine(int width, int y, float *line) const override;
     void processVignetteLine3Channels(int width, int y, float *line) const override;
+    static void test(float *test);
 
     Glib::ustring getDisplayString() const;
-
 private:
     LFModifier(lfModifier *m, bool swap_xy, int flags);
 

And here's the same, but with unaligned access, which does not crash:

diff --git a/rtengine/rawimagesource.cc b/rtengine/rawimagesource.cc
index 0420c5b7..8dd30b05 100644
--- a/rtengine/rawimagesource.cc
+++ b/rtengine/rawimagesource.cc
@@ -1520,6 +1520,10 @@ void RawImageSource::vflip (Imagefloat* image)
 int RawImageSource::load (const Glib::ustring &fname)
 {
 
+float test[8] = { 1.f };
+LFModifier::test(test);
+std::cout << test[3] << std::endl;
+
     MyTime t1, t2;
     t1.set();
     fileName = fname;
diff --git a/rtengine/rtlensfun.cc b/rtengine/rtlensfun.cc
index 8216f0b8..938ebd7a 100644
--- a/rtengine/rtlensfun.cc
+++ b/rtengine/rtlensfun.cc
@@ -43,6 +43,11 @@ LFModifier::operator bool() const
     return data_;
 }
 
+void LFModifier::test(float *test) {
+    for(int i = 0; i < 4; ++i) {
+        _mm_storeu_ps(&test[i], _mm_loadu_ps(&test[i+1]) + _mm_set1_ps(1.f));
+    }
+}
 
 void LFModifier::correctDistortion(double &x, double &y, int cx, int cy, double scale) const
 {
diff --git a/rtengine/rtlensfun.h b/rtengine/rtlensfun.h
index 7690ef54..ebb7b7f2 100644
--- a/rtengine/rtlensfun.h
+++ b/rtengine/rtlensfun.h
@@ -47,9 +47,9 @@ public:
     void correctCA(double &x, double &y, int cx, int cy, int channel) const override;
     void processVignetteLine(int width, int y, float *line) const override;
     void processVignetteLine3Channels(int width, int y, float *line) const override;
+    static void test(float *test);
 
     Glib::ustring getDisplayString() const;
-
 private:
     LFModifier(lfModifier *m, bool swap_xy, int flags);
 

To test, apply the patch to dev and load a raw file.

@Floessie
Copy link
Collaborator

Floessie commented Mar 13, 2018

@heckflosse Segfaults! 🎉 Second patch not yet tested. Edit: Second patch doesn't segfault, as expected.

@Floessie Floessie changed the title RT crashes on i686 (most likely caused by algined SSE access on unaligned address) RT crashes on i686 (most likely caused by aligned SSE access on unaligned address) Mar 13, 2018
@heckflosse
Copy link
Collaborator

@Llerr It seems that in your latest test my patch from #4432 (comment) was not applied.

Your output shows

0x08680d41 in LUT<float>::LUT (this=0x94173f8) at /home/lex/programs/code-rawtherapee/rtgui/../rtengine/LUT.h:177
177	        maxsv = ZEROV;

while with applied patch line 177 is:

        _mm_storeu_ps((float*)&maxsv, F2V( maxs ));

@Floessie
Copy link
Collaborator

@heckflosse But Ingo, there are a lot of other places, where unaligned loads and stores are used (not to mention ZEROV assignments). Aren't they all relevant and not only LUT.h? And why is it, that it doesn't crash in my VM now that we know it should? Are my GLIBC 2.27 and GCC 7.3 aligning on 16B even on i686? And if so, is it worth the trouble fixing all unaligned accesses when this is going to be mended with an up-to-date toolchain?

😕,
Flössie

@Floessie
Copy link
Collaborator

Seek and ye shall find: kraj/glibc@4e61a6b#diff-f2db25b59456b2ed40b038a3935be358 (nota bene: the hash has no 42 in it as it should have 😁)

HTH,
Flössie

@Beep6581
Copy link
Owner

@Floessie maybe it's been truncated ;)

@Floessie
Copy link
Collaborator

Floessie commented Mar 15, 2018

@heckflosse In the meantime I've set up a VM with Debian 9.4 i386. dev segfaults immediately somewhere in MyCurve as expected. As malloc() is a weak symbol in GLIBC, I simply overwrote it, and that made RT start, open an image, and even save it.

Here's a patch as a starting point (must be #ifdefed like hell, of course):

diff --git a/rtgui/main.cc b/rtgui/main.cc
index 55d27125..b969ca8b 100644
--- a/rtgui/main.cc
+++ b/rtgui/main.cc
@@ -51,6 +51,13 @@
 #include "conio.h"
 #endif
 
+#include <malloc.h>
+
+void* malloc(size_t size)
+{
+    return memalign(16, size);
+}
+
 // Set this to 1 to make RT work when started with Eclipse and arguments, at least on Windows platform
 #define ECLIPSE_ARGS 0
 

Best,
Flössie

@Floessie
Copy link
Collaborator

Here's a full integration, which fixes the problem for i386 Linux. Successfully tested on Debian 9.4 i386, Debian 9.4 amd64, and Debian 10 i386. The malloc() hack is only active on the first.

diff --git a/CMakeLists.txt b/CMakeLists.txt
index d57e1827..8dcf6793 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -103,6 +103,9 @@ if(HAVE_X86_SSE_MATH)
     set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -msse2 -mfpmath=sse")
 endif()
 
+# On i386 Linux we can fix unaligned SSE malloc (see GitHub issue #4432)
+include(FindUnalignedMalloc)
+
 if(WIN32)
     # Add additional paths. Look in the MinGW path first, then in the Gtkmm path.
     # If you wish to build some dependent libraries, you have to install them in MinGW to use them:
diff --git a/cmake/modules/FindUnalignedMalloc.cmake b/cmake/modules/FindUnalignedMalloc.cmake
new file mode 100644
index 00000000..4ddfb2af
--- /dev/null
+++ b/cmake/modules/FindUnalignedMalloc.cmake
@@ -0,0 +1,49 @@
+# This file is part of RawTherapee.
+#
+# Copyright (C) 2018 Flössie <floessie.mail@gmail.com>
+#
+# RawTherapee is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# RawTherapee is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with RawTherapee.  If not, see <http://www.gnu.org/licenses/>.
+
+include(CheckCXXSourceCompiles)
+
+set(CMAKE_REQUIRED_QUIET_COPY "${CMAKE_REQUIRED_QUIET}")
+set(CMAKE_REQUIRED_QUIET ON)
+
+set(TEST_SOURCE
+"
+#include <cstddef>
+#include <type_traits>
+
+int main()
+{
+#if defined(__SSE2__) && (defined(__i386) || defined(_M_IX86)) && defined(__linux)
+    static_assert(std::alignment_of<std::max_align_t>::value >= 16, \"Unaligned heap objects possible\");
+#endif
+
+    return 0;
+}
+")
+
+CHECK_CXX_SOURCE_COMPILES("${TEST_SOURCE}" HAVE_ALIGNED_MALLOC)
+
+if(NOT HAVE_ALIGNED_MALLOC)
+    set(HAVE_UNALIGNED_MALLOC 1)
+else()
+    unset(HAVE_ALIGNED_MALLOC)
+endif()
+
+unset(TEST_SOURCE)
+
+set(CMAKE_REQUIRED_QUIET "${CMAKE_REQUIRED_QUIET_COPY}")
+unset(CMAKE_REQUIRED_QUIET_COPY)
diff --git a/rtgui/CMakeLists.txt b/rtgui/CMakeLists.txt
index 540b4d26..c39ed004 100644
--- a/rtgui/CMakeLists.txt
+++ b/rtgui/CMakeLists.txt
@@ -1,5 +1,6 @@
 # Common source files for both CLI and non-CLI execautables
 set(CLISOURCEFILES
+    alignedmalloc.cc
     edit.cc
     main-cli.cc
     multilangmgr.cc
@@ -11,6 +12,7 @@ set(CLISOURCEFILES
 
 set(NONCLISOURCEFILES
     adjuster.cc
+    alignedmalloc.cc
     batchqueue.cc
     batchqueuebuttonset.cc
     batchqueueentry.cc
diff --git a/rtgui/alignedmalloc.cc b/rtgui/alignedmalloc.cc
new file mode 100644
index 00000000..91fe15a8
--- /dev/null
+++ b/rtgui/alignedmalloc.cc
@@ -0,0 +1,31 @@
+/*
+ *  This file is part of RawTherapee.
+ *
+ *  Copyright (C) 2018 Flössie <floessie.mail@gmail.com>
+ *
+ *  RawTherapee is free software: you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation, either version 3 of the License, or
+ *  (at your option) any later version.
+ *
+ *  RawTherapee is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with RawTherapee.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "config.h"
+
+#ifdef HAVE_UNALIGNED_MALLOC
+
+#include <malloc.h>
+
+void* malloc(size_t size)
+{
+    return memalign(16, size);
+}
+
+#endif
diff --git a/rtgui/config.h.in b/rtgui/config.h.in
index fdf64b73..2d1f41db 100644
--- a/rtgui/config.h.in
+++ b/rtgui/config.h.in
@@ -21,6 +21,7 @@
 #define __CONFIG_H__
 
 #cmakedefine BUILD_BUNDLE
+#cmakedefine HAVE_UNALIGNED_MALLOC
 #define DATA_SEARCH_PATH "${DATADIR}"
 #define DOC_SEARCH_PATH "${DOCDIR}"
 #define CREDITS_SEARCH_PATH "${CREDITSDIR}"

@heckflosse @Beep6581 @innir Please test, thanks.

HTH,
Flössie

@heckflosse
Copy link
Collaborator

Thanks @Floessie, builds and works fine on Win7/64

@Beep6581
Copy link
Owner

Built and ran fine in Sabayon
gcc (Gentoo Hardened 6.4.0-r1 p1.3) 6.4.0
x86_64 Intel(R) Core(TM) m3-6Y30 CPU @ 0.90GHz

@Floessie
Copy link
Collaborator

Tested with Clang 3.9.1 on Debian 9.4 i386. ✔️

@Floessie
Copy link
Collaborator

Tested with Clang 5.0.1 on Debian 10 i386:

With patch: ✔️ (HAVE_UNALIGNED_MALLOC is defined)
Without patch: ✔️

Clang 5 needs -latomic on the CMAKE_EXE_LINKER_FLAGS.

@Floessie
Copy link
Collaborator

Shall I push to dev?

@heckflosse
Copy link
Collaborator

@Floessie yes, please :)

@Beep6581
Copy link
Owner

Is this for 5.5?

@heckflosse
Copy link
Collaborator

imho for 5.4

@Floessie
Copy link
Collaborator

I'm about to push...

@Llerr
Copy link
Author

Llerr commented Mar 20, 2018

i'm test. it's work.
$git describe
5.4-rc3

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: bug Something is not doing what it's supposed to be doing
Projects
None yet
Development

No branches or pull requests

5 participants