Upgrade to LLVM 3.7.1 and switch over CI #14623

tkelman · 2016-01-10T04:48:51Z

Thanks to @Keno for doing most of the preparation work, and @staticfloat for handling mac packaging. ~~There's one ugly git line in the .travis.yml that can go away once staticfloat/homebrew-julia@9642485 and staticfloat/homebrew-julia@e6b8d2d are merged into master of homebrew-julia.~~ gone

This probably calls for a PkgEval run before merging. Closes #9336.

Fixes: (not yet) ~~#10595~~, #12671, #10444, #9222, #9085, #4905, #4418, #3596, #10301, #11037, #11083
^^ these should all be checked manually, and add tests before closing wherever possible

todo:

- fix Win32 std::bad_alloc during make testall1 #11083, rebuild windows llvm binaries to include fix
- USE_ORCJIT on osx travis (needs new bottles for llvm37-julia staticfloat/homebrew-julia#199 to be merged, then restart osx travis jobs)
- run PkgEval

tkelman · 2016-01-10T05:44:57Z

Both win32 and win64 are failing on appveyor with LLVM ERROR: Unable to allocate section memory!, so #11083 is indeed still open and a blocker here.

staticfloat · 2016-01-10T05:50:34Z

Now merged into homebrew-julia's master branch.

IainNZ · 2016-01-10T07:10:13Z

Wow, huge!

How carefully have the perf implications been looked into for this? I recall a lot of work on bringing codegen times down to a relatively small regression, but how about the generated code on nontrivial code?

ViralBShah · 2016-01-10T07:22:17Z

The plan is to get this in for now, and then tackle the rest going forward on master as undoubtedly we will find more issues.

ViralBShah · 2016-01-10T08:15:10Z

Perhaps this is also where the benchmarking infrastructure will come in handy.

jrevels · 2016-01-10T09:25:48Z

No time like the present?

runbenchmarks(ALL, vs = "JuliaLang/julia:master")

The CI tracker currently only has a subset of the array benchmarks in Base, so some manual perf testing would also be helpful.

KristofferC · 2016-01-10T15:59:51Z

FWIW I see the same performance in my packages and some of them (NearestNeighbors.jl) have previously been quite good at detecting performance regressions, at least in the array code. Just one sample, but hey.

tkelman · 2016-01-10T16:47:28Z

@staticfloat we're going to need a newer gtar on the linux buildbots; apparently the current one doesn't understand .xz: https://build.julialang.org/builders/package_tarball64/builds/24/steps/make%20binary-dist/logs/stdio

jrevels · 2016-01-10T17:02:20Z

https://github.com/JuliaCI/BaseBenchmarkReports/blob/master/54afb4e/54afb4e_vs_16301d2.md

Looks like some of the raw time regressions could be attributable to GC, but some might go away if johnmyleswhite/Benchmarks.jl#40 was in play. eig(rand(4, 4)) seems to be the only benchmark that demonstrates memory allocation regressions.

Keno · 2016-01-10T17:11:17Z

Ok, fine, I'll write us a better memory manager.

Keno · 2016-01-10T17:35:55Z

Also, do people know what other JITs do to avoid RWX pages? I looked at v8 and openjdk, but both seem to have RWX pages.

ihnorton · 2016-01-10T17:51:39Z

LuaJIT does RW first and then switches to RX.
https://github.com/LuaJIT/LuaJIT/blob/2e85af8836931f10aaaaae8c10f9b394219187a5/src/lj_mcode.c#L169-L203

(related: http://lists.llvm.org/pipermail/llvm-dev/2012-July/051841.html)

Keno · 2016-01-10T17:55:02Z

Right, that's what LLVM 3.7 does too, but we run into some fragmentation problems that artificially bloat our memory usage.

Keno · 2016-01-10T17:58:30Z

What we could do is go RW->RX and put it back into RW when we want to append. There is a problem with multi-threaded code, but I'm sure we could solve that just by holding the thread until we're done writing to the page. I'm not sure that's significantly better than just having a RWX page though. An attacker would just have to time the write to the page correctly. Also, I'm not necessarily arguing that it should be our job to protect against this, just exploring if there's a reasonable default.
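A minimal Python sketch of the RW -> RX -> RW cycle described above, assuming a POSIX system where `mprotect` is reachable through `ctypes`. `CodePage`, `seal`, and `append` are hypothetical names for illustration, not LLVM or Julia APIs, and the bytes written are placeholders that are never executed. It ignores the multi-threading problem entirely: another thread running code from the page while it is RW would fault.

```python
import ctypes
import mmap

PAGE = mmap.PAGESIZE

# CDLL(None) exposes the symbols already loaded into the process, including
# libc's mprotect on Linux and macOS. This sketch is Unix-only.
libc = ctypes.CDLL(None, use_errno=True)
libc.mprotect.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_int]
libc.mprotect.restype = ctypes.c_int


def _address_of(buf):
    """Return the start address of an mmap object (page-aligned)."""
    view = ctypes.c_char.from_buffer(buf)
    addr = ctypes.addressof(view)
    del view  # release the buffer export; the mapping itself stays alive
    return addr


class CodePage:
    """Hypothetical one-page code buffer that is never writable and executable at once."""

    def __init__(self):
        self.buf = mmap.mmap(
            -1, PAGE,
            flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS,
            prot=mmap.PROT_READ | mmap.PROT_WRITE,
        )
        self.addr = _address_of(self.buf)
        self.used = 0
        self.writable = True

    def _protect(self, prot):
        if libc.mprotect(self.addr, PAGE, prot) != 0:
            raise OSError(ctypes.get_errno(), "mprotect failed")

    def seal(self):
        """RW -> RX: after this, writes to the page would fault."""
        self._protect(mmap.PROT_READ | mmap.PROT_EXEC)
        self.writable = False

    def append(self, code):
        """Flip back to RW if needed, append, then seal again. Returns the offset."""
        if self.used + len(code) > PAGE:
            raise MemoryError("page full")
        if not self.writable:
            self._protect(mmap.PROT_READ | mmap.PROT_WRITE)  # RX -> RW
            self.writable = True
        offset = self.used
        self.buf[offset:offset + len(code)] = code
        self.used += len(code)
        self.seal()
        return offset
```

Appending to a partly filled page is what reduces the fragmentation mentioned earlier in the thread, at the cost of reopening a short window in which existing code is writable.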

yuyichao · 2016-01-10T18:03:49Z

> There is a problem with multi-threaded code, but I'm sure we could solve that just by holding the thread until we're done writing to the page.

We already rely on segfaults for multithreaded GC, so we could easily add logic to catch a not-executable fault, as long as LLVM (or codegen) is able to tell whether an address is being written to by LLVM.

> An attacker would just have to time the write to the page correctly.

I always thought the whole point is to make the attack window smaller? The attacker can always write to the same page when we are doing codegen (edit: for multi-threading), unless there's some verification pass after we set the page to not-writable, in which case we can also do that when we set the page back (edit: to RX).

tkelman · 2016-01-10T18:04:06Z

dunno. I should remember #14430 (comment) and add in -DUSE_ORCJIT somewhere for osx travis since we know the homebrew bottle is patched

Keno · 2016-01-10T18:08:43Z

@yuyichao That was my thought as well (i.e. holding the thread on an NX fault). The fact that the attack window is already there is a fair point. The only thing that is worse here is that a potential attacker might more easily be able to determine the allocation address. In any case, I think I actually found a bug in my patch which caused us to waste some memory. Let's see if just fixing that is sufficient. We should still think about a better memory allocator in the future though.

Keno · 2016-01-10T18:16:29Z

We could checksum the portion of the page that already exists and crash if it was modified.
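A rough Python sketch of that idea, modelling the page as a plain `bytearray` so it stays independent of any real allocator. `ChecksummedPage` is a hypothetical name, and CRC32 from `zlib` is only a stand-in: a real implementation would have to decide how strong the checksum needs to be and where it is stored.

```python
import zlib


class ChecksummedPage:
    """Hypothetical page model: verify the existing bytes before every append."""

    def __init__(self, size=4096):
        self.data = bytearray(size)
        self.used = 0
        self.crc = zlib.crc32(b"")  # checksum of the (empty) used portion

    def append(self, code):
        # Refuse to continue if the already-emitted code changed behind our back.
        if zlib.crc32(self.data[:self.used]) != self.crc:
            raise RuntimeError("existing code on the page was modified")
        if self.used + len(code) > len(self.data):
            raise MemoryError("page full")
        offset = self.used
        self.data[offset:offset + len(code)] = code
        self.used += len(code)
        # Record the checksum of everything emitted so far.
        self.crc = zlib.crc32(self.data[:self.used])
        return offset
```

Here the check raises an exception; the proposal in the thread is to crash the process instead.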

yuyichao · 2016-01-10T18:16:56Z

If detecting the address is an issue, we can hide that by making two maps of the physical pages that we might still emit code into and never set the RX one back. Not sure if this is possible to do on windows or if it is supported by the llvm api though.

Keno · 2016-01-10T18:16:59Z

Also, my patch does indeed seem to fix the windows issue. Nice! Will commit upstream and add it to the patch set.

Keno · 2016-01-10T18:17:47Z

> If detecting the address is an issue, we can hide that by making two maps of the physical pages that we might still emit code into and never set the RX one back. Not sure if this is possible to do on windows or if it is supported by the llvm api though.

Yeah, Windows allows it. LLVM's memory APIs may not currently support it, but that's what a custom memory manager is for.
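To make the two-mapping idea concrete, here is a small Python sketch, assuming a Unix system and using a temporary file as the shared backing store. `make_dual_mapping` is a hypothetical helper. The second view is mapped read-only here so the example runs even where the temporary directory is mounted noexec; a real JIT would request `PROT_READ | PROT_EXEC` for that view and would pick a backing object suited to the platform.

```python
import mmap
import tempfile

PAGE = mmap.PAGESIZE


def make_dual_mapping(size=PAGE):
    """Map the same physical pages twice: one writable view, one read-only view."""
    backing = tempfile.TemporaryFile()
    backing.truncate(size)  # give the file a size so it can be mapped
    fd = backing.fileno()
    # View used by the code emitter. Its address can stay private to the JIT.
    writer = mmap.mmap(fd, size, flags=mmap.MAP_SHARED,
                       prot=mmap.PROT_READ | mmap.PROT_WRITE)
    # View that would hold the runnable code. It is never made writable.
    reader = mmap.mmap(fd, size, flags=mmap.MAP_SHARED,
                       prot=mmap.PROT_READ)
    return backing, writer, reader


backing, writer, reader = make_dual_mapping()
writer[0:4] = b"\x90\x90\x90\xc3"   # emit through the writable alias
emitted = reader[0:4]               # the same bytes appear in the other view
```

Because both views are `MAP_SHARED` over the same file, bytes written through `writer` are visible through `reader` without ever changing the protection of the second view.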

yuyichao · 2016-01-10T18:18:03Z

> We could checksum the portion of the page that already exists and crash if it was modified.

A very minor issue: what if the attacker can modify the checksum on the stack... probably not important compared to the one it solves.

tkelman · 2016-01-10T18:19:51Z

Is the bug Windows only or will it also be worth rebuilding a new revision of the homebrew bottle with the modified patch?

Keno · 2016-01-10T18:21:50Z

This bug is windows-only, but there was also a MachO patch that was forgotten (but that one is only a problem in LLVM_ASSERTIONS mode which the bottles may not be in?)

tkelman · 2016-01-10T18:22:42Z

not sure whether bottles have assertions on, but I think that would be good to do on CI?

Keno · 2016-01-10T18:23:04Z

Yeah, probably.

Keno · 2016-01-10T18:24:09Z

If the attacker can write the stack you have generally lost because they can overwrite the return address.

Keno · 2016-01-10T18:27:01Z

Patch committed upstream as llvm-mirror/llvm@1f644cd.

ihnorton · 2016-01-10T18:28:15Z

> there was also a MachO patch that was forgotten

@tkelman: #14585 (comment)

> If the attacker can write the stack you have generally lost because they can overwrite the return address.

Right now can't anybody just ccall(:mprotect, ... themselves?

(also, can we please, please add the llvm-shlib patch to our patchset so we can support USE_LLVM_SHLIB on Windows?)

tkelman · 2016-01-15T09:47:38Z

Fewer failures this time around, but still more than you might hope: https://gist.github.com/cf1da5230c275eedaa67

MbedTLS and dependents look like they were broken by #14667, so these aren't all regressions relative to latest master. The nightly binary is old because the centos and osx buildbots have been constantly failing for 2 weeks.

There are some strange-looking bounds errors and segfaults in here that may be legit regressions though. (or more likely due to #14474)

Enough other things have changed on master in the last few weeks that it's probably best to just merge this and work on fixing things, bringing back osx testing, etc as we go.

Keno · 2016-01-15T14:50:11Z

+1 Let's get this merged and address failures as they come in

JeffBezanson · 2016-01-15T18:03:12Z

🎆 Congrats, big milestone here!

JeffBezanson · 2016-01-15T21:03:08Z

Am I reading correctly that this makes total travis CI time 2 hours longer? And to think I was holding off on merging jb/functions since it makes CI time 20 minutes longer...

Keno · 2016-01-15T21:04:38Z

No, something is going wrong with Travis.

JeffBezanson · 2016-01-15T21:05:41Z

Ok, I suspected it must be something like that. A single AV build is ~20 minutes longer; is that expected?

Keno · 2016-01-15T21:06:51Z

Looks like it's recompiling OpenBLAS for some reason. @tkelman take a look? 20 minutes is possible though moderately more than expected.

tkelman · 2016-01-15T21:13:42Z

It's how the docker caching works on travis now. We aren't using the ppa any more, we do a source build from scratch but cache the built deps. It only takes so long after clearing out the cache. When the cache is populated it's about ~~half an hour.~~ ok maybe more like 40-45 mins.

AppVeyor being slower is a real perf regression. OSX was so much slower it timed out constantly and we had to turn it off for the time being.

vtjnash · 2016-01-15T21:25:01Z

@JeffBezanson yes, i noted that prior to merging. total CI build time regression appeared to be about 2x

vtjnash · 2016-01-15T21:26:58Z

@Keno can you look into the int.jl test? it seems to be showing a 15x increase in runtime on CI runs

Keno · 2016-01-15T21:28:57Z

Will take a look.

jiahao · 2016-01-15T22:45:12Z

🎉

blakejohnson · 2016-01-16T00:18:44Z

> There are some strange-looking bounds errors and segfaults in here that may be legit regressions though. (or more likely due to #14474)

@tkelman if you find particular examples where you suspect #14474, please let me know.

tkelman · 2016-01-16T00:28:40Z

If you could take a look at Brim, Gadfly, Mamba, OpenStreetMap, and/or Winston that'd be awesome. They're all hitting BoundsErrors.

JuMP, NamedTuples, and Persist are failing for reasons that don't make sense to me yet, but they don't look BoundsError related.

blakejohnson · 2016-01-16T01:58:30Z

So, trying Pkg.test("Gadfly"), the first error I encounter is a BoundsError in Showoff.jl of the flavor:

a, b, c, d = (1,2,3)

and caused by 53ecbaa changing the number of arguments returned by Base.grisu from 4 to 3. If I fix that, then all Gadfly tests pass. This is probably something to add to Compat.jl

@dcjones

tkelman · 2016-01-16T02:21:51Z

Ah good call, sorry about the false alarm. Does anyone else have commit access to Showoff.jl? If not we may need to redirect metadata to use a mirror/fork for a while if it becomes urgent.

blakejohnson · 2016-01-16T02:23:16Z

Ah, I guess I am on 0201437, which doesn't include this PR. But, at that point Brim also passes all tests. I would say, generally, that if you see a BoundsError, it is actually quite unlikely to be #14474. The more likely signature would be a segfault, because we elided a bounds check that we shouldn't have.

blakejohnson · 2016-01-16T04:23:51Z

There is definitely something pretty weird going on with Brim related to bounds checks. The test which fails is:

using Brim
A = [1 0; 0 1]
M = partition_lp(A)

When launched from a julia session with --check-bounds=yes it gives a bizarre BoundsError:

ERROR: BoundsError: attempt to access 4x4 Array{Int64,2}:
 0  0  0  0
 0  0  0  0
 0  0  0  0
 0  0  0  1
  at index [3,1]

Without that command line option it works with a deprecation warning.

nalimilan · 2016-01-20T18:14:47Z

FWIW, I've just switched the Copr RPM nightlies to LLVM 3.7.1, and all the tests pass. The build is only slightly slower than before (~26 minutes vs. 18 minutes in the best cases). This should increase the testing of this code since there are about 500 downloads each week.

tkelman force-pushed the llvm37 branch from 741ae96 to 2030ddb Compare January 10, 2016 16:07

tkelman added 2 commits January 14, 2016 14:14

Disable osx travis for now

0cc27d3

Set CXXFLAGS=-DUSE_ORCJIT

disable libedit and terminfo in llvm configure

7725b8e

(possibly temporary if we need these for something?)

tkelman force-pushed the llvm37 branch from a461f38 to 7725b8e Compare January 14, 2016 19:14

Keno added a commit that referenced this pull request Jan 15, 2016

Merge pull request #14623 from JuliaLang/llvm37

d4749d2

Upgrade to LLVM 3.7.1 and switch over CI

Keno merged commit d4749d2 into master Jan 15, 2016

ihnorton mentioned this pull request Jan 15, 2016

Doc for ccall with Union{} return type #14685

Closed

tkelman deleted the llvm37 branch January 15, 2016 15:21

This was referenced Jan 19, 2016

Gadfly, PyPlot, Winston all don't work with 0.5 dev #14723

Closed

BoundsError: attempt to access (1,0,true) at index [4] in the draw function GiovineItalia/Gadfly.jl#794

Open

jrevels mentioned this pull request Jan 25, 2016

Performance problem with memory loads/stores #10301

Closed

jrevels removed the potential benchmark Could make a good benchmark in BaseBenchmarks label Jan 27, 2016
