
Comparing changes

Choose two branches to see what’s changed or to start a new pull request. If you need to, you can also compare across forks.

This comparison is big! We’re only showing the most recent 250 commits.
Commits on Apr 24, 2018
sql: reset forgotten planner fields
Some planner fields weren't being reset between statements. Not sure what the
consequences were.

Release note: None
craig[bot] and mjibson
Merge #25015
25015: sql: make schema changes more resilient to permanent errors r=mjibson a=mjibson

This is a partial backport of #24464. That change added an unrelated
feature that required refactoring the IsPermanentSchemaChangeError
func to be a whitelist instead of a blacklist. This means that schema
change errors now must be specifically listed as non-permanent.

The issue below is now fixed because the system detects that error as
permanent instead of retrying it.

Fixes #24748

Release note (bug fix): Correctly fail schema changes whose errors
previously left them stuck in a permanent retry loop.

Co-authored-by: Matt Jibson <matt.jibson@gmail.com>
craig[bot] and andreimatei
Merge #25023
25023: release-2.0: reset forgotten planner fields r=andreimatei a=andreimatei

Cherry-pick of #25004

Some planner fields weren't being reset between statements. Not sure what the
consequences were.

Release note: None

cc @cockroachdb/release 

Co-authored-by: Andrei Matei <andrei@cockroachlabs.com>
build: GH label changes
- Robot -> O-robot
- test-failure -> C-test-failure

Release note: None
craig[bot] and jordanlewis
Merge #25054
25054: backport-2.0: build: GH label changes r=jordanlewis a=jordanlewis

Backport 1/1 commits from #25050.

/cc @cockroachdb/release

---

- Robot -> O-robot
- test-failure -> C-test-failure

@benesch do we need to backport this for it to work on other branches?

Release note: None


Co-authored-by: Jordan Lewis <jordanthelewis@gmail.com>
Commits on Apr 25, 2018
Makefile: work around symlink bug in Go toolchain
Build info injected via linker flags, like the Git commit SHA, is
dropped on the floor when building from a symlink (e.g., ~/roach ->
~/go/src/cockroachdb/cockroach). This issue was filed upstream
(golang/go#24359), but the Go team refused to fix it. Luckily, we can
work around it by teaching Make to resolve any symlinks in the CWD.

Release note (build change): Build metadata, like the commit SHA and build
time, is properly injected into the binary when using Go 1.10 and building from
a symlink.
importccl: require hex-encoded byte literals in IMPORT
Previously there was no way to get BYTES columns in IMPORT because
the CSV spec doesn't support non-unicode runes (or at least the
Go CSV reader doesn't). Change it to support our hex-encoded byte
literal format. This is the same format as used when a bytes column
is converted to a string.

Unfortunately this doesn't help with parsing our CSV-output mode in
the CLI because that uses Go's %q from the fmt package, which prints
visible chars if they exist. We can fix the CLI in a later change.

As part of this change, do a large refactor around
parser.ParseStringAs. IMPORT, COPY, and StrVal.ResolveAsType all
share some similar code that takes a string and converts it into a
Datum. Each, though, has a different need about what to do with strings
and bytes. Refactor the functionality into some reusable functions.

We can also remove the creation of unneeded collation envs since
the unexported member of the evalContext is now available in the
tree package.

For IMPORT specifically, we also add a new FmtFlag:
FmtParseDatums. This flag will cause Datums to print such that they
can be parsed by the ParseDatumStringAs function. Although IMPORT is
the only user of this function now, the upcoming EXPORT statement
will use this flag to guarantee that IMPORT is able to correctly
parse anything it produces.

Fixes #24841

Release note (sql change): IMPORT now supports hex-encoded byte
literals for BYTES columns.
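
For illustration, a minimal Go sketch of the hex-encoded byte literal format this commit describes (the `\x`-prefixed hex string produced when a bytes value is converted to a string); the helper names here are hypothetical, not CockroachDB's actual functions:

```go
package main

import (
	"encoding/hex"
	"fmt"
	"strings"
)

// encodeBytesLiteral renders raw bytes in the \x-prefixed hex form,
// e.g. []byte{0x00, 0xff} -> `\x00ff`.
func encodeBytesLiteral(b []byte) string {
	return `\x` + hex.EncodeToString(b)
}

// decodeBytesLiteral parses the same form back into raw bytes, which is
// the round-trip property the FmtParseDatums flag is meant to guarantee
// between IMPORT and the upcoming EXPORT.
func decodeBytesLiteral(s string) ([]byte, error) {
	return hex.DecodeString(strings.TrimPrefix(s, `\x`))
}

func main() {
	lit := encodeBytesLiteral([]byte{0x00, 0xff, 'a'})
	fmt.Println(lit) // \x00ff61
	b, err := decodeBytesLiteral(lit)
	fmt.Println(b, err) // [0 255 97] <nil>
}
```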
craig[bot] and mjibson
Merge #25063
25063: backport-2.0: importccl: require hex-encoded byte literals in IMPORT r=mjibson a=mjibson

Previously there was no way to get BYTES columns in IMPORT because
the CSV spec doesn't support non-unicode runes (or at least the
Go CSV reader doesn't). Change it to support our hex-encoded byte
literal format. This is the same format as used when a bytes column
is converted to a string.

Unfortunately this doesn't help with parsing our CSV-output mode in
the CLI because that uses Go's %q from the fmt package, which prints
visible chars if they exist. We can fix the CLI in a later change.

As part of this change, do a large refactor around
parser.ParseStringAs. IMPORT, COPY, and StrVal.ResolveAsType all
share some similar code that takes a string and converts it into a
Datum. Each, though, has a different need about what to do with strings
and bytes. Refactor the functionality into some reusable functions.

We can also remove the creation of unneeded collation envs since
the unexported member of the evalContext is now available in the
tree package.

For IMPORT specifically, we also add a new FmtFlag:
FmtParseDatums. This flag will cause Datums to print such that they
can be parsed by the ParseDatumStringAs function. Although IMPORT is
the only user of this function now, the upcoming EXPORT statement
will use this flag to guarantee that IMPORT is able to correctly
parse anything it produces.

Fixes #24841
Cherrypick of #24859

Release note (sql change): IMPORT now supports hex-encoded byte
literals for BYTES columns.

Co-authored-by: Matt Jibson <matt.jibson@gmail.com>
sql: verify column uniqueness in FKs during ADD COLUMN
Move a check to a place common to CREATE TABLE and ALTER TABLE.

Fixes #24664

Release note (bug fix): correctly verify that a column can only be
referenced by a single foreign key. Existing tables with a column
that is used by multiple foreign key constraints should be manually
changed to have at most one foreign key per column.
ui: identify cluster to segment on load
Fixes an issue where a bounce will never send an identify event to Segment.

Release note: None
Commits on Apr 26, 2018
craig[bot] and benesch
Merge #25062
25062: backport-2.0: Makefile: work around symlink bug in Go toolchain r=knz a=benesch

Backport 1/1 commits from #25008.

/cc @cockroachdb/release

---

Build info injected via linker flags, like the Git commit SHA, is
dropped on the floor when building from a symlink (e.g., ~/roach ->
~/go/src/cockroachdb/cockroach). This issue was filed upstream
(golang/go#24359), but the Go team refused to fix it. Luckily, we can
work around it by teaching Make to resolve any symlinks in the CWD.

Release note (build change): Build metadata, like the commit SHA and build
time, is properly injected into the binary when using Go 1.10 and building from
a symlink.


Co-authored-by: Nikhil Benesch <nikhil.benesch@gmail.com>
server: fix the fix for spurious gzip errors
Release note (general change): stopped spamming the server logs with
"error closing gzip response writer" messages.
craig[bot] and a-robinson
Merge #25108
25108: backport-2.0: server: fix the fix for spurious gzip errors r=a-robinson a=a-robinson

Backport 1/1 commits from #25106.

/cc @cockroachdb/release

---

Release note (general change): stopped spamming the server logs with
"error closing gzip response writer" messages.

#24367 just made things worse. It's a little embarrassing that three of us looked at that code without noticing.

Co-authored-by: Alex Robinson <alexdwanerobinson@gmail.com>
Commits on Apr 27, 2018
craig[bot] and mjibson
Merge #25079
25079: backport-2.0: sql: verify column uniqueness in FKs during ADD COLUMN r=mjibson a=mjibson

Backport 1/1 commits from #25060.

/cc @cockroachdb/release

---

Move a check to a place common to CREATE TABLE and ALTER TABLE.

Fixes #24664

Release note (bug fix): correctly verify that a column can only be
referenced by a single foreign key. Existing tables with a column
that is used by multiple foreign key constraints should be manually
changed to have at most one foreign key per column.


Co-authored-by: Matt Jibson <matt.jibson@gmail.com>
debug: remove limit on logspy duration
Note to self: backport this to release-2.0.

Fixes #24998.

Release note: None
storage/ui: Expose range's maxBytes field to help debug split issues
Also humanize all byte fields on the page to format them in more
reader-friendly ways (e.g. "64.0 MiB" instead of "67108864").

Release note: None
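
As a sketch of the humanization described (not the admin UI's actual formatting code), an IEC byte formatter in Go:

```go
package main

import "fmt"

// humanizeIEC formats a byte count with binary (IEC) units,
// e.g. 67108864 -> "64.0 MiB".
func humanizeIEC(b int64) string {
	const unit = 1024
	if b < unit {
		return fmt.Sprintf("%d B", b)
	}
	div, exp := int64(unit), 0
	for n := b / unit; n >= unit; n /= unit {
		div *= unit
		exp++
	}
	return fmt.Sprintf("%.1f %ciB", float64(b)/float64(div), "KMGTPE"[exp])
}

func main() {
	fmt.Println(humanizeIEC(67108864)) // 64.0 MiB
}
```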
Commits on Apr 28, 2018
craig[bot] and Abhishek Madan
Merge #24904
24904: cherrypick-2.0: distsql: Remove column size check from INTERSECT and EXCEPT joins r=knz a=abhimadan

This check mistakenly used the joiner's `PostProcessSpec` to verify that
all left and right columns are used as equality columns. If there's a
projection or render on the joiner, this check doesn't work as intended,
since the output columns don't match the input columns. It's not
possible for a processor constructor to differentiate its input's merge
ordering columns from its "actual" output columns when a processor is
constructed, and the physical planner already guarantees that all
equality columns are output columns. Therefore, the check isn't
necessary, and has been removed.

Also add subtests to the `distsql_union` logic test file to split up the
different set operations.

Fixes #24689.

Release note: None

----

Cherry-pick of #24704.

cc @cockroachdb/release 

Co-authored-by: Abhishek Madan <abhishek@cockroachlabs.com>
knz
sql: fix the documentation of age()
Release note (bug fix): the function documentation for the age()
builtin was incorrect. This is now fixed.
Commits on Apr 30, 2018
craig[bot] and tschottdorf
Merge #25121
25121: backport-2.0: debug: remove limit on logspy duration r=a-robinson a=tschottdorf

Backport 1/1 commits from #25077.

/cc @cockroachdb/release

---

Note to self: backport this to release-2.0.

Fixes #24998.

Release note: None


Co-authored-by: Tobias Schottdorf <tobias.schottdorf@gmail.com>
craig[bot] and a-robinson
Merge #25127
25127: backport-2.0: backupccl: skip flaky TestBackupRestoreControlJob r=a-robinson a=a-robinson

Backport 1/1 commits from #24638.

/cc @cockroachdb/release

---

Release note: None

Touches #24693


Co-authored-by: Alex Robinson <alexdwanerobinson@gmail.com>
craig[bot] and yaojingguo
Merge #25028
25028: backport-2.0: storage: crash a node if the ingestion fails in an unexpected way r=a-robinson a=a-robinson

Backport 1/1 commits from #25024.

/cc @cockroachdb/release

---

Fixes #24971

Release note: None

--------------

As requested by @dt, I'll let this bake on master for a week or two before merging, but I'm opening this to make sure I don't forget about it.

Co-authored-by: Jingguo Yao <yaojingguo@gmail.com>
craig[bot] and knz
Merge #25145
25145: cherry-pick 2.0: sql: fix the documentation of age() r=knz a=knz

Picks #25132. 
cc @cockroachdb/release 

Co-authored-by: Raphael 'kena' Poss <knz@cockroachlabs.com>
sql: permitted collated strings as index constraints
Previously, there was an accidental omission: you couldn't use a collated
string as a value in a WHERE clause that got converted into an index
constraint. This was purely an oversight.

Release note (sql change): permit use of collated strings in WHERE
clauses on indexed columns.
craig[bot] and jordanlewis
Merge #25175
25175: backport-2.0: sql: permitted collated strings as index constraints r=jordanlewis a=jordanlewis

Backport 1/1 commits from #25169.

/cc @cockroachdb/release

---

Previously, there was an accidental omission: you couldn't use a collated
string as a value in a WHERE clause that got converted into an index
constraint. This was purely an oversight.

Fixes #24449.

Release note (sql change): permit use of collated strings in WHERE
clauses on indexed columns.


Co-authored-by: Jordan Lewis <jordanthelewis@gmail.com>
craig[bot] and a-robinson
Merge #25128
25128: backport-2.0: storage/ui: Expose range's maxBytes field to help debug split issues r=a-robinson a=a-robinson

Backport 1/1 commits from #25072.

/cc @cockroachdb/release

---

Release note: None

This is a pretty natural fit for going in the `RangeInfo`, and would be nice to see in issues like #24966


Co-authored-by: Alex Robinson <alexdwanerobinson@gmail.com>
Commits on May 01, 2018
opt: hide two columns in EXPLAIN (VERBOSE) output
The Level and Type columns don't show any information that's not
visible from the Tree column. These columns exist only in case someone
wants to use the results in a script.

This change hides these columns by default. The columns are still
accessible using `SELECT "Level", "Type" FROM [ EXPLAIN ... ]`.

Release note (sql change): The Level and Type columns of the EXPLAIN
(VERBOSE) results are now hidden; if they are needed, they can be
SELECTed explicitly.
knz
sql: fail gracefully on unsupported multi-arg aggregations
CockroachDB does not yet support running aggregation functions with
more than one argument (#10495).

Prior to this patch, an error was properly generated for the situation
where an aggregate function is applied to multiple arguments in
"simple" contexts; however the check was absent when used in a window
frame, and attempting to use multiple arguments in that context would
cause a panic.

Intuitively, this code path should not have been reachable because
issue #10495 would otherwise suggest that no aggregate function
accepting more than one argument should even be defined.

Unfortunately, a previous change meant to experiment with distributed
execution of multi-argument aggregates introduced the definitions of
two such aggregates: `final_variance()` and `final_stddev()`.
This in turns enabled the code path leading to a crash.

To remove access to this code path required removing access to these
particular built-in functions from the SQL dialect. This patch
achieves this by introducing a boolean flag `Private` on built-in
functions, set to true for these two, which prevent both use by SQL
queries and auto-generation of docs.

Release note (bug fix): CockroachDB now properly reports an error when
using the internal-only functions `final_variance()` and
`final_stddev()` instead of causing a crash.
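
A minimal sketch of the mechanism this describes, with hypothetical types (the real definitions live in CockroachDB's builtins code): a `Private` flag that makes a definition invisible to SQL name resolution and to docs generation.

```go
package main

import "fmt"

// builtin is a hypothetical stand-in for a function definition.
type builtin struct {
	name    string
	private bool // if true: unusable from SQL and omitted from docs
}

var builtins = []builtin{
	{name: "stddev"},
	{name: "final_stddev", private: true}, // internal-only
}

// resolve skips private definitions, so SQL queries cannot reach
// the code paths they would otherwise enable.
func resolve(name string) (builtin, bool) {
	for _, b := range builtins {
		if b.name == name && !b.private {
			return b, true
		}
	}
	return builtin{}, false
}

func main() {
	_, ok := resolve("final_stddev")
	fmt.Println(ok) // false: the private builtin is not resolvable
}
```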
sql: fix information_schema.constraint_column_usage.constraint_schema
Found while working on #22298.

This column was missed in #22753. It was displaying each constraint's
database instead of each constraint's schema.

Release note (bug fix): The constraint_schema column in
information_schema.constraint_column_usage now displays the
constraint's schema instead of its catalog.
3 authors
Merge #25206 #25218
25206: cherrypick-2.0: opt: hide two columns in EXPLAIN (VERBOSE) output r=RaduBerinde a=RaduBerinde

The Level and Type columns don't show any information that's not
visible from the Tree column. These columns exist only in case someone
wants to use the results in a script.

This change hides these columns by default. The columns are still
accessible using `SELECT "Level", "Type" FROM [ EXPLAIN ... ]`.

Release note (sql change): The Level and Type columns of the EXPLAIN
(VERBOSE) results are now hidden; if they are needed, they can be
SELECTed explicitly.

This is a cherry-pick of #25172. CC @cockroachdb/release 

25218: cherry-pick 2.0: sql: fail gracefully on unsupported multi-arg aggregations r=knz a=knz

Picks #25158

Co-authored-by: Radu Berinde <radu@cockroachlabs.com>
Co-authored-by: Raphael 'kena' Poss <knz@cockroachlabs.com>
Commits on May 02, 2018
craig[bot] and nvanbenschoten
Merge #25220
25220: backport-2.0: sql: fix information_schema.constraint_column_usage.constraint_schema r=nvanbenschoten a=nvanbenschoten

Backport 1/1 commits from #25190.

/cc @cockroachdb/release

---

Found while working on #22298.

This column was missed in #22753. It was displaying each constraint's
database instead of each constraint's schema.

Release note (bug fix): The constraint_schema column in
information_schema.constraint_column_usage now displays the
constraint's schema instead of its catalog.


Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
release-2.0: distsqlrun: fix panic with offsets and topk sort
This is a bug fix for release-2.0 only. It was fixed on master in
f799885.

Release note (bug fix): Fix a panic caused by certain queries containing
OFFSET and ORDER BY.
craig[bot] and jordanlewis
Merge #25238
25238: release-2.0: distsqlrun: fix panic with offsets and topk sort r=jordanlewis a=jordanlewis

Fixes #25197.

Release note (bug fix): fix a panic in the top k sorter that occurred
with offsets.

Co-authored-by: Jordan Lewis <jordanthelewis@gmail.com>
sql: fix crash for 'begin; release savepoint'
A RELEASE is not accepted if the savepoint hasn't previously been
established. This patch turns the crash into an error.
It also attaches a pg error code to similar errors.

Release note (bug fix): 'begin; release savepoint' no longer crashes, but
returns an error.

Fixes #24433
craig[bot] and andreimatei
Merge #25251
25251: release-2.0: sql: fix crash for 'begin; release savepoint' r=andreimatei a=andreimatei

Cherry-pick of #25247

A RELEASE is not accepted if the savepoint hasn't previously been
established. This patch turns the crash into an error.
It also attaches a pg error code to similar errors.

Release note (bug fix): 'begin; release savepoint' no longer crashes, but
returns an error.

Fixes #24433

cc @cockroachdb/release 

Co-authored-by: Andrei Matei <andrei@cockroachlabs.com>
Commits on May 03, 2018
cli: `debug zip` with timeout, added dump for crdb_internal.gossip_*
The `debug zip` command now continues past errors by applying a timeout
(based on Peter's timeout commit).

It also dumps the contents of crdb_internal.gossip_nodes and
crdb_internal.gossip_liveness to the output file.

Fixes #23954.

Release note: None
Commits on May 07, 2018
craig[bot] and windchan7
Merge #25276
25276: cherrypick-2.0: cli: `debug zip` with timeout, added dump for crdb_internal.gossip_* r=bdarnell a=tschottdorf

The `debug zip` command now continues past errors by applying a timeout
(based on Peter's timeout commit).

It also dumps the contents of crdb_internal.gossip_nodes and
crdb_internal.gossip_liveness to the output file.

Fixes #23954.
cc @cockroachdb/release 
Release note: None

Co-authored-by: Victor Chen <victor@cockroachlabs.com>
Commits on May 08, 2018
opt/idxconstraint: fix IS NOT handling of NULLs
`IS DISTINCT FROM` (IsNotOp) treats NULL as any other value:
`NULL IS DISTINCT FROM 5` is true. The index constraints code
incorrectly treated it as `NeOp` and generated a not-null constraint.
This change fixes that and adds some test cases.

Release note (bug fix): fixed a bug with `IS DISTINCT FROM` not
returning NULL values that pass the condition in some cases.
storage/engine: don't try to slice potentially invalid pointers
Don't try to slice an invalid pointer (such as nil) because the Go
compiler generates code which accesses the first element.

Fixes #25289

Release note (bug fix): Fix a rare segfault that occurred when reading
from an invalid memory location returned from C++.
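
A sketch of the hazard and the guard (hypothetical helper, not the actual engine code), using the classic cgo array-conversion idiom:

```go
package main

import (
	"fmt"
	"unsafe"
)

// goBytes wraps a raw pointer and length in a Go byte slice. Slicing
// through the array pointer is the operation the commit warns about:
// the compiler-generated code accesses the base pointer, so slicing
// a nil pointer can fault even when n == 0. Guard first.
func goBytes(ptr unsafe.Pointer, n int) []byte {
	if ptr == nil || n == 0 {
		return nil
	}
	return (*[1 << 30]byte)(ptr)[:n:n]
}

func main() {
	fmt.Println(goBytes(nil, 0) == nil) // true — no dereference
}
```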
craig[bot] and petermattis
Merge #25361
25361: release-2.0: storage/engine: don't try to slice potentially invalid pointers r=bdarnell a=petermattis

Backport 1/1 commits from #25347.

/cc @cockroachdb/release

---

Don't try to slice an invalid pointer (such as nil) because the Go
compiler generates code which accesses the first element.

Fixes #25289

Release note (bug fix): Fix a rare segfault that occurred when reading
from an invalid memory location returned from C++.


Co-authored-by: Peter Mattis <petermattis@gmail.com>
craig[bot] and RaduBerinde
Merge #25339
25339: cherrypick-2.0: opt/idxconstraint: fix IS NOT handling of NULLs r=RaduBerinde a=RaduBerinde

`IS DISTINCT FROM` (IsNotOp) treats NULL as any other value:
`NULL IS DISTINCT FROM 5` is true. The index constraints code
incorrectly treated it as `NeOp` and generated a not-null constraint.
This change fixes that and adds some test cases.

Release note (bug fix): fixed a bug with `IS DISTINCT FROM` not
returning NULL values that pass the condition in some cases.

This is a backport of #25336. CC @cockroachdb/release 

Co-authored-by: Radu Berinde <radu@cockroachlabs.com>
cherry-pick 2.0: fix flaky select_index test
This is a fixup to #25339; I missed a `rowsort`.

Release note: None
Commits on May 09, 2018
knz
sql: improve the documentation of the timestamp functions
Release note (sql change): the online documentation for `now()`,
`current_time()`, `current_date()`, `current_timestamp()`,
`clock_timestamp()`, `statement_timestamp()` and
`cluster_logical_timestamp()` has been improved and clarified.
craig[bot] and knz
Merge #25383
25383: release-2.0: sql: improve the documentation of the timestamp functions r=knz a=knz

Backport 1/1 commits from #25327.

/cc @cockroachdb/release


Co-authored-by: Raphael 'kena' Poss <knz@cockroachlabs.com>
Commits on May 10, 2018
craig[bot] and RaduBerinde
Merge #25364
25364: cherry-pick 2.0: fix flaky select_index test r=RaduBerinde a=RaduBerinde

This is a fixup to #25339; I missed a `rowsort`.

Release note: None

Co-authored-by: Radu Berinde <radu@cockroachlabs.com>
Commits on May 12, 2018
engine: fix temp directory cleanup on Windows
File locks are mandatory on Windows; our cleanup code was previously
assuming file system locks were advisory. Specifically, our cleanup code
was locking the temporary directory, then attempting to remove it before
releasing the lock. This worked on Unix platforms where locks are
advisory-only, but failed on Windows because the process's own lock
would prevent the cleanup! Presumably this ordering was meant to avoid a
race condition where the directory was unlocked before it was cleaned
up. It turns out this race condition doesn't matter (see the comment
within), so just unlock the directory before removing it, which works on
both Unix and Windows.

Release note (bug fix): Restarting a CockroachDB server on Windows no
longer fails due to file system locks in the store directory.
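
A minimal sketch of the corrected ordering (hypothetical helper, not the engine's actual cleanup code): release the lock first, then remove, which works under both advisory (Unix) and mandatory (Windows) locking.

```go
package main

import (
	"io"
	"os"
	"path/filepath"
)

// cleanupTempDir removes a locked temporary directory. The lock is
// released *before* removal: on Windows, locks are mandatory, so
// removing the directory while still holding the lock would fail
// against our own lock.
func cleanupTempDir(lock io.Closer, dir string) error {
	if err := lock.Close(); err != nil {
		return err
	}
	return os.RemoveAll(dir)
}

func main() {
	dir, _ := os.MkdirTemp("", "store")
	f, _ := os.Create(filepath.Join(dir, "TEMP_DIR.LOCK"))
	// f stands in for the acquired file lock.
	if err := cleanupTempDir(f, dir); err != nil {
		panic(err)
	}
}
```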
Commits on May 14, 2018
craig[bot] and benesch
Merge #25439
25439: backport-2.0: engine: actually unlock temp dirs on Windows r=bdarnell a=tschottdorf

Backport 1/1 commits from #25267.

/cc @cockroachdb/release

---

File locks are mandatory on Windows; our cleanup code was previously
assuming file system locks were advisory. Specifically, our cleanup code
was locking the temporary directory, then attempting to remove it before
releasing the lock. This worked on Unix platforms where locks are
advisory-only, but failed on Windows because the process's own lock
would prevent the cleanup! Presumably this ordering was meant to avoid a
race condition where the directory was unlocked before it was cleaned
up. It turns out this race condition doesn't matter (see the comment
within), so just unlock the directory before removing it, which works on
both Unix and Windows.

Release note (bug fix): Restarting a CockroachDB server on Windows no
longer fails due to file system locks in the store directory.

Fix #24144.
Fix #25272.


Co-authored-by: Nikhil Benesch <nikhil.benesch@gmail.com>
ui: add compaction/flushes to storage graphs
Knowing whether compactions are going on is helpful when diagnosing I/O
performance problems.

Release note (admin ui change): Add RocksDB compactions/flushes to
storage graphs.
compactor: configure via cluster settings
The compactor can cause instability or degrade performance, and it does
so in an opaque way, partly because its compactions are low-priority
and thus regularly pending for upwards of ten minutes, with no way to
tell when work is actually being done. It's important to be able to
change its behavior, and in particular to disable it, whether because
of a known problem or a suspected one. This wasn't possible at all
previously. Now it can be achieved by setting `compactor.enabled` to
`false`; the advanced settings have been marked as hidden.

Release note (general change): Introduced cluster settings to configure
the compactor.
craig[bot] and tschottdorf
Merge #25457
25457: backport-2.0: ui: add compaction/flushes to storage graphs r=a-robinson a=tschottdorf

Backport 1/1 commits from #25428.

/cc @cockroachdb/release

---

Knowing whether compactions are going on is helpful when diagnosing I/O
performance problems.

Release note (admin ui change): Add RocksDB compactions/flushes to
storage graphs.


Co-authored-by: Tobias Schottdorf <tobias.schottdorf@gmail.com>
craig[bot] and couchand
Merge #25081
25081: release-2.0: ui: identify cluster to segment on load r=couchand a=couchand

Backport 1/1 commits from #24996.

/cc @cockroachdb/release

---

Fixes an issue where a bounce will never send an identify event to Segment.

Release note: None


Co-authored-by: Andrew Couch <andrew@cockroachlabs.com>
importccl: check node health and compatibility during IMPORT planning
Simplify the LoadCSV signature by taking just a PlanHookState for any
argument that can be fetched from it. Determine the node list using
this new health check function. We can remove the rand.Shuffle call
because the map iteration should produce some level of randomness.

Fixes #12876

Release note (bug fix): Fix problems with IMPORT sometimes failing
after node decommissioning.
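
The rand.Shuffle removal leans on a Go property worth seeing concretely: map iteration order is deliberately randomized from run to run (though it is not a uniform shuffle, hence the commit's "some level of randomness").

```go
package main

import "fmt"

func main() {
	nodes := map[int]string{1: "n1", 2: "n2", 3: "n3", 4: "n4"}
	// The first key seen differs between runs: the runtime picks a
	// random starting point for every map range loop.
	for id := range nodes {
		fmt.Println("picked node", id)
		break
	}
}
```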
rocksdb: use max_manifest_file_size option
Cockroach uses a single long-running rocksdb instance for the entire
process lifetime, which could be many months. By default, rocksdb tracks
filesystem state changes in a log file called the MANIFEST, which grows
without bound until the instance is re-opened. We should bound the
maximum file size of rocksdb MANIFEST using the corresponding rocksdb
option to prevent unbounded growth.

The MANIFEST file grew to several GBs in size in a customer bug report
but that was probably because of some other bad behavior in rocksdb
state management. We do want to bound the MANIFEST size in such cases as
well.

Release note: None
storage: fix deadlock in consistency queue
When `CheckConsistency` returns an error, the queue checks whether the
store is draining to decide whether the error is worth logging.

Unfortunately this check was incorrect and would block until the store
actually started draining.

A toy example of this problem is below (this will deadlock). The dual
return form of chan receive isn't non-blocking -- the second parameter
indicates whether the received value corresponds to a closing of the
channel.

Switch to a `select` instead.

```go
package main

import (
	"fmt"
)

func main() {
	ch := make(chan struct{})
	_, ok := <-ch
	fmt.Println(ok)
}
```

Touches #21824.

Release note (bug fix): Prevent the consistency checker from
deadlocking. This would previously manifest itself as a steady number of
replicas queued for consistency checking on one or more nodes and would
resolve by restarting the affected nodes.
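
For contrast with the toy deadlock above, a sketch of the `select`-based check the fix switches to, assuming a `draining` channel that is closed when the store begins draining:

```go
package main

import "fmt"

func main() {
	draining := make(chan struct{}) // closed once the store drains

	// Non-blocking: the default case runs immediately while the
	// channel is still open, instead of blocking like `<-draining`.
	select {
	case <-draining:
		fmt.Println("draining: suppress the error")
	default:
		fmt.Println("not draining: log the error")
	}
}
```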
3 authors
Merge #25471 #25474
25471: backport-2.0: rocksdb: use max_manifest_file_size option r=bdarnell a=tschottdorf

Backport 1/1 commits from #25341.

/cc @cockroachdb/release

---

Cockroach uses a single long-running rocksdb instance for the entire
process lifetime, which could be many months. By default, rocksdb tracks
filesystem state changes in a log file called the MANIFEST, which grows
without bound until the instance is re-opened. We should bound the
maximum file size of rocksdb MANIFEST using the corresponding rocksdb
option to prevent unbounded growth.

The MANIFEST file grew to several GBs in size in a customer bug report
but that was probably because of some other bad behavior in rocksdb
state management. We do want to bound the MANIFEST size in such cases as
well.

Release note: None


25474: backport-2.0: storage: fix deadlock in consistency queue r=bdarnell a=tschottdorf

Backport 1/1 commits from #25456.

/cc @cockroachdb/release

---

When `CheckConsistency` returns an error, the queue checks whether the
store is draining to decide whether the error is worth logging.

Unfortunately this check was incorrect and would block until the store
actually started draining.

A toy example of this problem is below (this will deadlock). The dual
return form of chan receive isn't non-blocking -- the second parameter
indicates whether the received value corresponds to a closing of the
channel.

Switch to a `select` instead.

```go
package main

import (
	"fmt"
)

func main() {
	ch := make(chan struct{})
	_, ok := <-ch
	fmt.Println(ok)
}
```

Touches #21824.

Release note (bug fix): Prevent the consistency checker from
deadlocking. This would previously manifest itself as a steady number of
replicas queued for consistency checking on one or more nodes and would
resolve by restarting the affected nodes.


Co-authored-by: Garvit Juniwal <garvitjuniwal@gmail.com>
Co-authored-by: Tobias Schottdorf <tobias.schottdorf@gmail.com>
craig[bot] and mjibson
Merge #25307
25307: release-2.0: importccl: check node health and compatibility during IMPORT planning r=mjibson a=mjibson

Backport 1/1 commits from #25162.

/cc @cockroachdb/release

---

Simplify the LoadCSV signature by taking just a PlanHookState for any
argument that can be fetched from it. Determine the node list using
this new health check function. We can remove the rand.Shuffle call
because the map iteration should produce some level of randomness.

Fixes #12876

Release note (bug fix): Fix problems with imports sometimes failing
after node decommissioning.


Co-authored-by: Matt Jibson <matt.jibson@gmail.com>
storage: fix compactor-disabled test
The test was queuing suggestions worthy of a compaction before disabling
the compactor.

Also fixes a bug in the compactor that would over-increment the bytes
skipped metric whenever multiple suggestions were aggregated and
skipped.

Release note: None
Commits on May 15, 2018
craig[bot] and tschottdorf
Merge #25458
25458: backport-2.0: compactor: configure via cluster settings r=a-robinson a=tschottdorf

Backport 1/1 commits from #25397.

/cc @cockroachdb/release

---

The compactor can cause instability or degrade performance, and it does
so in an opaque way, partly because its compactions are low-priority
and thus regularly pending for upwards of ten minutes, with no way to
tell when work is actually being done. It's important to be able to
change its behavior, and in particular to disable it, whether because
of a known problem or a suspected one. This wasn't possible at all
previously. Now it can be achieved by setting `compactor.threshold_bytes`
and `max_record_age` to a very high and low value, respectively.

Release note (general change): Introduced cluster settings to configure
the compactor.


Co-authored-by: Tobias Schottdorf <tobias.schottdorf@gmail.com>
gossip: Increase node descriptor TTL and centralize intervals/TTLs
In the process, remove the old COCKROACH_GOSSIP_STORES_INTERVAL
environment variable. I can't find any mentions of people using it, and
setting it to larger than the 2 minute TTL for store gossip would be
dangerous.

Addresses part of #24753 by making it much less likely that nodes on
opposite sides of a gossip partition would just forget about each other.

Release note (bug fix): Prevent the internal gossip network from being
partitioned by making it much less likely that nodes in the network
could just forget about each other.
craig[bot] and a-robinson
Merge #25521
25521: cherrypick-2.0: gossip: Increase node descriptor TTL and centralize intervals/TTLs r=a-robinson a=a-robinson

In the process, remove the old COCKROACH_GOSSIP_STORES_INTERVAL
environment variable. I can't find any mentions of people using it, and
setting it to larger than the 2 minute TTL for store gossip would be
dangerous.

Addresses part of #24753 by making it much less likely that nodes on
opposite sides of a gossip partition would just forget about each other.

Release note (bug fix): Prevent the internal gossip network from being
partitioned by making it much less likely that nodes in the network
could just forget about each other.

Co-authored-by: Alex Robinson <alexdwanerobinson@gmail.com>
Commits on May 17, 2018
storage: ensure non-expired context before each liveness update attempt
Fixes #25430.

Before this change, a liveness update could get stuck in an infinite
loop if its context expired. This is because it would continue to retry
and continue to get `errRetryLiveness` errors due to
`AmbiguousResultErrors` created by `DistSender`.

Release note: None
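
A sketch of the shape of that fix (hypothetical loop, not the actual liveness code): check the context before every attempt, so retryable (ambiguous) errors cannot cause an infinite loop once the context has expired.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

var errRetryLiveness = errors.New("retryable liveness error")

func updateLiveness(ctx context.Context, attempt func() error) error {
	for {
		// Bail out as soon as the context has expired instead of
		// retrying forever on errRetryLiveness.
		if err := ctx.Err(); err != nil {
			return err
		}
		if err := attempt(); !errors.Is(err, errRetryLiveness) {
			return err
		}
	}
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Millisecond)
	defer cancel()
	err := updateLiveness(ctx, func() error { return errRetryLiveness })
	fmt.Println(err) // context deadline exceeded
}
```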
Commits on May 18, 2018
sql: fix crash when propagating filters through groupNode
We were incorrectly passing the `info` for the `groupNode` as the
`info` for the input. This causes a crash, but only in fairly specific
circumstances: when we have a filter that we need to put in a
`filterNode` around the input, and the input has fewer columns than the
`groupNode`.

Fixes #25533.

Release note (bug fix): Fixed a crash in some cases when using a GROUP
BY with HAVING.
Commits on May 19, 2018
json: properly report capacity of encoded jsons
Previously, jsonEncoded structs reported their memory size as including
the underlying capacity of their encoded byte slices. This was
incorrect, as usually that underlying slice is an entire kv batch. This
problem led to significant overcounting of the memory used by queries
that read JSON data from disk, which resulted in spurious
BudgetExceededErrors.

Release note (bug fix): prevent spurious BudgetExceededErrors for some
queries that read a lot of JSON data from disk.

Co-authored-by: Alfonso Subiotto Marqués <alfonso@cockroachlabs.com>
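
The over-counting is easy to reproduce with plain slices: a value sliced out of a large buffer reports the buffer's remaining capacity, so accounting by capacity charges the shared batch to every value (illustrative only, not the jsonEncoded code itself).

```go
package main

import "fmt"

func main() {
	batch := make([]byte, 1<<20) // stand-in for a whole KV batch
	val := batch[128:192]        // a 64-byte encoded value inside it

	fmt.Println(len(val)) // 64      — the value's own size
	fmt.Println(cap(val)) // 1048448 — everything to the end of the batch
	// Charging cap(val) for each value counts the shared batch many
	// times over, which is the over-counting the fix removes.
}
```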
craig[bot] and jordanlewis
Merge #25719
25719: backport-2.0: json: properly report capacity of encoded jsons r=jordanlewis a=jordanlewis

Backport 1/1 commits from #25679.

/cc @cockroachdb/release

Previously, jsonEncoded structs reported their memory size as including
the underlying capacity of their encoded byte slices. This was
incorrect, as usually that underlying slice is an entire kv batch. This
problem led to significant overcounting of the memory used by queries
that read JSON data from disk, which resulted in spurious
BudgetExceededErrors.

Release note (bug fix): prevent spurious BudgetExceededErrors for some
queries that read a lot of JSON data from disk.

Co-authored-by: Alfonso Subiotto Marqués <alfonso@cockroachlabs.com>

Co-authored-by: Jordan Lewis <jordanthelewis@gmail.com>
Commits on May 20, 2018
craig[bot] and RaduBerinde
Merge #25654
25654: cherrypick-2.0: sql: fix crash when propagating filters through groupNode r=RaduBerinde a=RaduBerinde

We were incorrectly passing the `info` for the `groupNode` as the
`info` for the input. This causes a crash, but only in fairly specific
circumstances: when we have a filter that we need to put in a
`filterNode` around the input, and the input has fewer columns than the
`groupNode`.

Fixes #25533.

Release note (bug fix): Fixed a crash in some cases when using a GROUP
BY with HAVING.

Co-authored-by: Radu Berinde <radu@cockroachlabs.com>
Commits on May 21, 2018
roachpb: add pMax to Percentiles
Fixes #25730.

This is a partial backport of #24905. It doesn't include the new
roachtest, but does include the change to crdb_internal.kv_store_status
necessary for the test to pass.

Release note: None
craig[bot] and m-schneider
Merge #25774
25774: roachpb: add pMax to Percentiles r=nvanbenschoten a=nvanbenschoten

Fixes #25730.

This is a partial backport of #24905. It doesn't include the new
roachtest, but does include the change to crdb_internal.kv_store_status
necessary for the test to pass.

/cc @cockroachdb/release

Release note: None


Co-authored-by: Masha Schneider <masha@cockroachlabs.com>
Commits on May 22, 2018
craig[bot] and nvanbenschoten
Merge #25634
25634: backport-2.0: storage: ensure non-expired context before each liveness update attempt r=nvanbenschoten a=nvanbenschoten

Backport 1/1 commits from #25631.

/cc @cockroachdb/release

---

Fixes #25430.

Before this change, a liveness update could get stuck in an infinite
loop if its context expired. This is because it would continue to retry
and continue to get `errRetryLiveness` errors due to
`AmbiguousResultErrors` created by `DistSender`.

Release note: None


Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
backport-2.0: cluster setting `cluster.preserve_downgrade_option`.
/cc @cockroachdb/release

Backport cluster setting `cluster.preserve_downgrade_option` to 2.0 so that
users can disable upgrade if they want to.

Release note: Added new cluster setting `cluster.preserve_downgrade_option`.
sqlbase: comp. cols work with missing insert cols
Previously, specifying only a partial set of columns to insert into a
table with computed columns could lead to incorrect results.

Release note (bug fix): fix a crash caused by inserting data into a
table with computed columns that reference other columns that weren't
present in the insert statement.
craig[bot] and windchan7
Merge #25811
25811: backport-2.0: cluster setting `cluster.preserve_downgrade_option`. r=windchan7 a=windchan7

/cc @cockroachdb/release

Backport cluster setting `cluster.preserve_downgrade_option` to 2.0 so that
users can disable upgrade if they want to.

Release note: Added new cluster setting `cluster.preserve_downgrade_option`.

Co-authored-by: Victor Chen <victor@cockroachlabs.com>
Commits on May 23, 2018
sql: remove duplicate dedupping in schema changer
This code already uses cfgFilter.ForModified()
to prevent reprocessing of unchanged schema changes.

Release note: None
sql: separate dropped tables needing GC from other schema changes
This is going to aid further improvements to this code.

Release note: None
sql: set DROP TABLE schema change to fire at GC TTL timeout
This is to stop reprocessing DROP TABLE schema changes
every 10 minutes. It has been found to be expensive to
reprocess such schema changes, especially when many have
been queued up.

Related to #24762.

Release note (sql change): Stop DROP TABLE from hogging CPU.
craig[bot] and vivekmenezes
Merge #25852
25852: backport-2.0: sql: Check zone config TTL changes only when re-processing system config  r=vivekmenezes a=vivekmenezes

Backport 4/4 commits from #24983.

/cc @cockroachdb/release

---




Co-authored-by: Vivek Menezes <vivek@cockroachlabs.com>
Commits on May 29, 2018
knz
cli: fix `cockroach quit`
This patch fixes the following:

- the logic in `doShutdown()` aims to ignore errors caused by attempts
  to connect to a server that is closing its gRPC channels, but was
  missing one case of such errors: the initial check of whether
  the node was running. This patch causes gRPC "closed connection"
  errors to also be ignored in that case.

- previously if there was a transient gRPC error during a hard
  shutdown whereby the shutdown could still succeed, then `cockroach
  quit` would fail no matter what. This patch makes `cockroach quit`
  retry a hard shutdown.

- the warning messages are now emitted on stderr (via `log.Warningf`)
  instead of stdout.

Release note (bug fix): fix a bug where `cockroach quit` would
erroneously fail even though the node already successfully shut down.

Release note (cli change): `cockroach quit` now emits warning message
on its standard error stream, not standard output.
craig[bot] and jordanlewis
Merge #25807
25807: backport-2.0: sqlbase: comp. cols work with missing insert cols r=jordanlewis a=jordanlewis

Backport 1/1 commits from #25682.

/cc @cockroachdb/release

---

Previously, specifying only a partial set of columns to insert into a
table with computed columns could lead to incorrect results.

Fixes #25627.
Fixes #25670.

Release note (bug fix): fix a crash caused by inserting data into a
table with computed columns that reference other columns that weren't
present in the insert statement.


Co-authored-by: Jordan Lewis <jordanthelewis@gmail.com>
knz
sql: fix a bug in UPSERT with column families
The UPSERT code on the "happy" path (either the fast path, or the long
path when there is no conflict) uses the INSERT row writer with a flag
"ignore duplicate errors", with the expectation that the code would do
the Right Thing if the row already exists (using KV Puts instead of
CPuts).

Unfortunately, the INSERT row writer did the wrong thing when the row
already existed and the new row (or a column family in the new row)
contained just NULLs: in that case, the correct way to "update" the
row in KV is to issue a Del, which INSERT (expectedly) didn't do.

This patch addresses the issue by making the INSERT row writer able to
issue Puts.

Release note (bug fix): UPSERT is now properly able to write NULLs to
every column in tables containing more than one column family.
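
A sketch of the per-family decision described above, with a hypothetical batch interface (not the actual row writer): blind Puts for rows that may already exist, CPuts for fresh inserts, and a Del when the new family value is all NULLs.

```go
package main

import "fmt"

// kvBatch is a hypothetical stand-in for a KV write batch.
type kvBatch interface {
	Put(key, val string)  // blind write
	CPut(key, val string) // conditional write: fail if key exists
	Del(key string)       // delete the K/V pair
}

// writeFamily picks the KV operation for one column family.
func writeFamily(b kvBatch, key, val string, allNull, rowMayExist bool) {
	switch {
	case allNull && rowMayExist:
		// Writing "all NULLs" over a possibly-existing row must
		// erase the K/V pair; a Put would leave a stale value.
		b.Del(key)
	case rowMayExist:
		b.Put(key, val)
	default:
		b.CPut(key, val)
	}
}

type printBatch struct{}

func (printBatch) Put(k, v string)  { fmt.Println("Put", k) }
func (printBatch) CPut(k, v string) { fmt.Println("CPut", k) }
func (printBatch) Del(k string)     { fmt.Println("Del", k) }

func main() {
	// UPSERT of an all-NULL family over an existing row: issue a Del.
	writeFamily(printBatch{}, "/t/1/cf2", "", true, true)
}
```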
knz
sql: fix a logic error in UPDATE with multiple column families
Prior to this patch, an UPDATE on a column family other than the
first, where that column family contains only one column, would write
an empty value in KV instead of erasing the K/V pair (as is expected).

This appeared to be OK because the table reader currently converts
unexpected empty values in KV pairs to NULLs automatically, but the
error was messing up KV tracing.

Release note: None
knz
sql: factor some common code between INSERT and UPDATE
This was a TODO left by @danhhz a long time ago.

In fact merging this code was useful beyond DRY, since both
implementations were performing slightly different optimizations that
would benefit from being done together, always.

Release note: None
craig[bot] and knz
Merge #26181
26181: release-2.0: sql: miscellaneous INSERT, UPSERT and UPDATE correctness fixes r=knz a=knz

Backport 3/3 commits from #26169.

/cc @cockroachdb/release

Fixes #25726.

Co-authored-by: Raphael 'kena' Poss <knz@cockroachlabs.com>
Commits on May 30, 2018
cli: don't prompt for a password when a cert is provided
This was broken by a refactor (14fdc28). Add a test so it doesn't break again.

Fix #25165.

Release note (cli change): The CLI no longer prompts for a password when a
certificate is provided.
craig[bot] and benesch
Merge #26232
26232: release-2.0: cli: don't prompt for a password when a cert is provided r=bdarnell a=benesch

Backport 1/1 commits from #25252.

/cc @cockroachdb/release

Fix #26180.

---

This was broken by a refactor (14fdc28). Add a test so it doesn't break again.

Fix #25165.

Release note (cli change): The CLI no longer prompts for a password when a
certificate is provided.


Co-authored-by: Nikhil Benesch <nikhil.benesch@gmail.com>
storage: avoid querying pusher where transaction record can't exist
This change checks whether a txn has a non-nil key before querying it
if it's waiting on an extant transaction which owns a conflicting intent.
Previously, the code would query the pusher's txn, even if the key was
nil, which would just send a spurious `QueryTxn` request to the first
range in the keyspace.

Release note: None
Commits on May 31, 2018
craig[bot] and spencerkimball
Merge #26245
26245: backport-2.0: storage: avoid querying pusher where transaction record can't exist r=tschottdorf a=tschottdorf

Backport 1/1 commits from #26238.

/cc @cockroachdb/release

---

This change checks whether a txn has a non-nil key before querying it
if it's waiting on an extant transaction which owns a conflicting intent.
Previously, the code would query the pusher's txn, even if the key was
nil, which would just send a spurious `QueryTxn` request to the first
range in the keyspace.

See #26059

Release note: None


Co-authored-by: Spencer Kimball <spencer.kimball@gmail.com>
Commits on Jun 04, 2018
libroach: fix excessive compactions performed by DBCompactRange
Fix excessive compactions from `DBCompactRange` due to mishandling of
the first and last ranges to compact. When a non-empty start or end key
is specified, DBCompactRange was previously calling
`rocksdb::DB::CompactRange` with a `null` start/end key resulting in
compacting from the beginning (or to the end) of the entire key space.

See #24029
craig[bot] and petermattis
Merge #26403
26403: release-2.0: libroach: fix excessive compactions performed by DBCompactRange r=bdarnell,benesch a=petermattis

Backport 1/1 commits from #26355.

/cc @cockroachdb/release

---

Fix excessive compactions from `DBCompactRange` due to mishandling of
the first and last ranges to compact. When a non-empty start or end key
is specified, DBCompactRange was previously calling
`rocksdb::DB::CompactRange` with a `null` start/end key resulting in
compacting from the beginning (or to the end) of the entire key space.

See #24029


Co-authored-by: Peter Mattis <petermattis@gmail.com>
Commits on Jun 05, 2018
knz
sql: properly mark current_date as impure
Release note (bug fix): Fixed a bug where a long-running query running
from one day to the next would not always produce the right value for
`current_date()`.
craig[bot] and knz
Merge #26413
26413: backport-2.0: sql: properly mark current_date as impure r=knz a=knz

Back-port of a small fix contained in #26370 for #26354.

cc @cockroachdb/release 


Co-authored-by: Raphael 'kena' Poss <knz@cockroachlabs.com>
Commits on Jun 08, 2018
craig[bot] and knz
Merge #26163
26163: release-2.0: cli: fix `cockroach quit` r=knz a=knz

Backport 1/1 commits from #26158.

This `cockroach quit` bug is likely to affect production users, so we need to back-port it.

/cc @cockroachdb/release


Co-authored-by: Raphael 'kena' Poss <knz@cockroachlabs.com>
Commits on Jun 12, 2018
storage: permit full leases in fatal logs from leasePostApply assertions
See #26315.

These lease structs don't contain any important or sensitive
information. It will be helpful if we ever see these assertions
fire for them to be included in the diagnostic report.

Release note: None
craig[bot] and nvanbenschoten
Merge #26638
26638: backport-2.0: storage: permit full leases in fatal logs from leasePostApply assertions r=nvanbenschoten a=nvanbenschoten

Backport 1/1 commits from #26534.

/cc @cockroachdb/release

---

See #26315.

These lease structs don't contain any important or sensitive
information. It will be helpful if we ever see these assertions
fire for them to be included in the diagnostic report.

Release note: None


Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
sql: fix bug in the copy protocol with large rows
We weren't correctly handling rows that were split by the client into
multiple protocol packets, and psql splits at 8191 bytes.

Fixes #25306

Release note (bug fix): Rows larger than 8192 bytes are now supported by the "copy
from" protocol.
sql: fix a bug with "copy from stdin" into a table that doesn't exist
Whenever an error occurred while "creating the copy machine", we used to
return an error to the ConnExecutor, which was dropping the connection,
leaving the client bewildered. Now the connection survives and the
client receives the proper error.

Release note (bug fix): Trying to "copy from stdin" into a table that
doesn't exist no longer drops the connection.
sql,compactor: rate limit clear range requests
Now that DBCompactRange no longer attempts to compact the entire
database (#26355), sending ClearRange requests in sequential batches of
50 is enough to prevent a large DROP TABLE from bricking a cluster.
They're slow enough that the compaction queue can keep up and purge
range deletion tombstones before enough pile up to wedge the cluster.

This is a partial fix for #24029.

Release note (bug fix): The performance impact of dropping a large table
has been substantially reduced.
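
A minimal sketch of sequential batching at the size the commit names (hypothetical `send`, not the actual ClearRange plumbing): each batch completes before the next is issued, giving the compaction queue time to keep up.

```go
package main

import "fmt"

const batchSize = 50

// sendBatches issues requests in sequential batches; send is assumed
// to block until its batch has been fully processed.
func sendBatches(reqs []int, send func([]int)) {
	for i := 0; i < len(reqs); i += batchSize {
		end := i + batchSize
		if end > len(reqs) {
			end = len(reqs)
		}
		send(reqs[i:end])
	}
}

func main() {
	reqs := make([]int, 120)
	sendBatches(reqs, func(b []int) { fmt.Println("sent batch of", len(b)) })
	// sent batch of 50, 50, 20
}
```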
craig[bot] and andreimatei
Merge #26641
26641: cherrypick 2.0: sql: fixing "copy from stdin" issues r=andreimatei a=andreimatei

Cherry-pick of #26345

See individual commits.

Co-authored-by: Andrei Matei <andrei@cockroachlabs.com>
compactor: never stop the loop
The compactor had an optimization that would stop the compactor loop
when no work was to be done.

Unfortunately, the condition to probe this was incorrect. Essentially,
whenever *any* work had been done, the loop stopped and would not
reawaken until the next incoming suggestion. In the absence of such a
suggestion, the skipped existing suggestions would never be revisited,
and thus never discarded. This created confusion as it kept the "queued
bytes" metric up.

Refactor the code so that instead of stopping, the loop waits for the
max age before running again (except if there are further suggestions).

It's hard to use timers correctly, so this commit should be scrutinized.
In particular, whether it's correct to call `t.Reset` in a select that
does not read from the timer's channel.

The test change makes `make test PKG=./pkg/storage/compactor` fail
every time on my machine before the fix.

I would suggest not backporting this, at least not until there is a
follow-up bugfix that needs backporting and doesn't apply cleanly on top
of this diff.

Fixes #21824.

Release note (bug fix): Expired compaction suggestions are now dropped
not long after they expire. Previously, this could be delayed
indefinitely.
craig[bot] and tschottdorf
Merge #26659
26659: backport-2.0: compactor: never stop the loop r=bdarnell a=tschottdorf

Backport 2/2 commits from #26039.

/cc @cockroachdb/release

---

The compactor had an optimization that would stop the compactor loop when
no work was to be done.

Unfortunately, the condition to probe this was incorrect. Essentially,
whenever *any* work had been done, the loop stopped and would not reawaken
until the next incoming suggestion. In the absence of such a suggestion,
the skipped existing suggestions would never be revisited, and thus never
discarded. This created confusion as it kept the "queued bytes" metric up.

Refactor the code so that instead of stopping, the loop waits for the max
age before running again (except if there are further suggestions).

It's hard to use timers correctly, so this commit should be scrutinized. In
particular, whether it's correct to call `t.Reset` in a select that does
not read from the timer's channel.

The test change makes `make test PKG=./pkg/storage/compactor` fail every
time on my machine before the fix.

I would suggest not backporting this, at least not until there is a
follow-up bugfix that needs backporting and doesn't apply cleanly on top of
this diff.

Fixes #21824.

Release note (bug fix): Expired compaction suggestions are now dropped not
long after they expire. Previously, this could be delayed indefinitely.


Co-authored-by: Tobias Schottdorf <tobias.schottdorf@gmail.com>
Commits on Jun 13, 2018
craig[bot] and benesch
Merge #26615
26615: release-2.0: sql,compactor: rate limit clear range requests r=bdarnell,petermattis a=benesch

Backports #26449.

I'm running a clearrange roachtest with this diff applied tonight. If it passes we're good to go.

```diff
diff --git a/pkg/cmd/roachtest/clearrange.go b/pkg/cmd/roachtest/clearrange.go
index ea5bcdff8..2b244af6d 100644
--- a/pkg/cmd/roachtest/clearrange.go
+++ b/pkg/cmd/roachtest/clearrange.go
@@ -30,19 +30,9 @@ func registerClearRange(r *registry) {
 		// thoroughly brick the cluster.
 		Stable: false,
 		Run: func(ctx context.Context, t *test, c *cluster) {
-			t.Status(`downloading store dumps`)
-			// Created via:
-			// roachtest --cockroach cockroach-v2.0.1 store-gen --stores=10 bank \
-			//           --payload-bytes=10240 --ranges=0 --rows=65104166
-			fixtureURL := `gs://cockroach-fixtures/workload/bank/version=1.0.0,payload-bytes=10240,ranges=0,rows=65104166,seed=1`
-			location := storeDirURL(fixtureURL, c.nodes, "2.0")
+			t.Status(`waiting for compactions to disappear`)
+			time.Sleep(90 * time.Minute)
 
-			// Download this store dump, which measures around 2TB (across all nodes).
-			if err := downloadStoreDumps(ctx, c, location, c.nodes); err != nil {
-				t.Fatal(err)
-			}
-
-			c.Put(ctx, cockroach, "./cockroach")
 			c.Start(ctx)
 
 			// Also restore a much smaller table. We'll use it to run queries against
@@ -81,7 +71,7 @@ func registerClearRange(r *registry) {
 				// above didn't brick the cluster.
 				//
 				// Don't lower this number, or the test may pass erroneously.
-				const minutes = 60
+				const minutes = 120
 				t.WorkerStatus("repeatedly running COUNT(*) on small table")
 				for i := 0; i < minutes; i++ {
 					after := time.After(time.Minute)

```

---

Now that DBCompactRange no longer attempts to compact the entire
database (#26355), sending ClearRange requests in sequential batches of
50 is enough to prevent a large DROP TABLE from bricking a cluster.
They're slow enough that the compaction queue can keep up and purge
range deletion tombstones before enough pile up to wedge the cluster.

This is a partial fix for #24029.

Release note (bug fix): The performance impact of dropping a large table
has been substantially reduced.

Co-authored-by: Nikhil Benesch <nikhil.benesch@gmail.com>
sql: Check CHECK constraints on upsert when updating a conflicting row
* Before: On a table with a CHECK constraint, such as:

    ```
    CREATE TABLE ex(foo INT PRIMARY KEY, bar INT, CHECK (bar < 2))
    ```

    an upsert on the table would not properly check the CHECK constraints,
    e.g.

    ```
    INSERT INTO ex(foo, bar) VALUES(1, 2);
    INSERT INTO ex(foo, bar) VALUES(1, 2) ON CONFLICT (foo) DO UPDATE SET
        bar = 3
    ```

    would update the row, violating the check constraint.

* After: On attempting such an operation, CockroachDB now throws the
proper database error "failed to satisfy CHECK constraint ..."

Release note (sql change): CHECK constraints are now checked when
updating a conflicting row in INSERT ... ON CONFLICT DO UPDATE
statements.
craig[bot] and emsal1863
Merge #26699
26699: release-2.0: sql: Check CHECK constraints on upsert when updating a conflicting row r=emsal1863 a=emsal1863

Backport 1/1 commits from #26642.

/cc @cockroachdb/release

---

* Before: On a table with a CHECK constraint, such as:

    ```
    CREATE TABLE ex(foo INT PRIMARY KEY, bar INT, CHECK (bar < 2))
    ```

    an upsert on the table would not properly check the CHECK constraints,
    e.g.

    ```
    INSERT INTO ex(foo, bar) VALUES(1, 2);
    INSERT INTO ex(foo, bar) VALUES(1, 2) ON CONFLICT (foo) DO UPDATE SET
        bar = 3
    ```

    would update the row, violating the check constraint.

* After: On attempting such an operation, CockroachDB now throws the
proper database error "failed to satisfy CHECK constraint ..."

Release note (sql change): CHECK constraints are now checked when
updating a conflicting row in INSERT ... ON CONFLICT DO UPDATE
statements.

closes #23699 


Co-authored-by: Emmanuel <absolutezero2a03@gmail.com>
server: use the right ctx when logging an error
We were using a context that was potentially long gone, leading to a
span use-after-free crash.

Fixes #25389

Release note (bug fix): Fixed a rare crash on node decommissioning.
Commits on Jun 14, 2018
craig[bot] and andreimatei
Merge #26717
26717: cherrypick 2.0: server: use the right ctx when logging an error r=andreimatei a=andreimatei

Cherry-pick of #26706

We were using a context that was potentially long gone, leading to a
span use-after-free crash.

Fixes #25389

Release note (bug fix): Fixed a rare crash on node decommissioning.

Co-authored-by: Andrei Matei <andrei@cockroachlabs.com>
backport 2.0: sql: Fix panic adding unique, non-indexable column
This change updates the `checkColumnsFor...Index` functions to use
`allNonDropColumns()` to validate against any proposed mutations.  Otherwise, a
`ColumnMutation` that adds a non-indexable column followed by an
`IndexMutation` creating an index on that column would be incorrectly
accepted, leading to a panic.

Fixes #26483

Release note (sql change): Return an error to the user instead of panicking when
trying to add a column with a unique constraint when that column's type is not
indexable.
craig[bot] and bobvawter
Merge #26728
26728: backport 2.0: sql: Fix panic adding unique, non-indexable column r=bobvawter a=bobvawter

This change updates the `checkColumnsFor...Index` functions to use
`allNonDropColumns()` to validate against any proposed mutations.  Otherwise, a
`ColumnMutation` that adds a non-indexable column followed by an
`IndexMutation` creating an index on that column would be incorrectly
accepted, leading to a panic.

Fixes #26483

Release note (sql change): Return an error to the user instead of panicking when
trying to add a column with a unique constraint when that column's type is not
indexable.

Co-authored-by: Bob Vawter <bob@cockroachlabs.com>
Commits on Jun 15, 2018
Bump RocksDB pointer to grab facebook/rocksdb#3926
RocksDB was violating an invariant that no 2 sstables in a level
overlap. It isn't quite clear what the upshot of this violation is. At
the very least it would cause the overlapping tables to be compacted
together. It seems possible that it could lead to missing writes, but I
haven't been able to verify that.

Fixes #26693

Release note: None
craig[bot] and petermattis
Merge #26755
26755: release-2.0: Bump RocksDB pointer to grab facebook/rocksdb#3926 r=benesch,a-robinson a=petermattis

RocksDB was violating an invariant that no 2 sstables in a level
overlap. It isn't quite clear what the upshot of this violation is. At
the very least it would cause the overlapping tables to be compacted
together. It seems possible that it could lead to missing writes, but I
haven't been able to verify that.

Fixes #26693

Release note: None

Co-authored-by: Peter Mattis <petermattis@gmail.com>
Commits on Jun 18, 2018
storage: change sideloadStorage.PutIfNotExist to Put
Fixes #24025.

This commit changes the semantics of `sideloadStorage.PutIfNotExist` to
overwrite the file if it already exists instead of no-oping. In doing
so, it changes the method's name to `sideloadStorage.Put`. This ensures
that if a call to `PutIfNotExist` fails midway through, a future retry
will retry the file write instead of noticing that a file exists and
short-circuiting.

Release note: None
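
A minimal sketch of the new semantics, using a hypothetical helper rather than the real `sideloadStorage`:

```go
package main

import "os"

// put unconditionally (re)writes the file. Unlike put-if-not-exist,
// a retry after a failure midway through rewrites a possibly truncated
// file instead of seeing that it exists and short-circuiting.
func put(path string, contents []byte) error {
	return os.WriteFile(path, contents, 0644)
}

func main() {
	if err := put("/tmp/sideload.sst", []byte("payload")); err != nil {
		panic(err)
	}
}
```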
cli: fix use of finalized context
One more use of the wrong context, possibly resulting in panics while
handling log.Fatal().

Release note: None
3 authors
Merge #26798 #26805
26798: backport-2.0: storage: change sideloadStorage.PutIfNotExist to Put r=nvanbenschoten a=nvanbenschoten

Backport 1/1 commits from #24893.

/cc @cockroachdb/release

---

Fixes #24025.

This commit changes the semantics of `sideloadStorage.PutIfNotExist` to
overwrite the file if it already exists instead of no-oping. In doing
so, it changes the method's name to `sideloadStorage.Put`. This ensures
that if a call to `PutIfNotExist` fails midway through, a future retry
will retry the file write instead of noticing that a file exists and
short-circuiting.

I confirmed that this fixes #24025 by putting a TPC-C 1k restore under heavy
chaos and setting `kv.bulk_sst.sync_size` to 16kb. Without this change,
the crash from #24025 consistently occurred after only a few minutes. With
this change, I never saw the crash and the restores always eventually succeeded.

Release note: None


26805: release 2.0: cli: fix use of finalized context r=andreimatei a=andreimatei

Cherry-pick of #26726 
One more use of the wrong context, possibly resulting in panics while
handling log.Fatal().

Release note: None

Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
Co-authored-by: Andrei Matei <andrei@cockroachlabs.com>
Commits on Jun 19, 2018
release-2.0: fix timeseries success/error counters
This has been fixed on master for a while, but in a PR
that isn't going to be backported (#25343).

Release note (bug fix): Successes of time series maintenance queue
operations were previously counted as errors in the metrics dashboard.
sql: bugfix to interleaved join planning criteria
Previously, interleaved joins could take place if the tables were
ancestors of each other, there was a merge join ordering on the equality
columns, and the left equality columns were a prefix of the interleaved
columns. These criteria are insufficient: the right equality columns
must also be a prefix of the interleaved columns. A case where this
wouldn't naturally hold is where the right equality columns get a merge
join ordering because they're known to be constant.

Release note (bug fix): joins across two interleaved tables no longer
return incorrect results under certain circumstances when the equality
columns aren't all part of the interleaved columns.
craig[bot] and jordanlewis
Merge #26832
26832: backport-2.0: sql: bugfix to interleaved join planning criteria r=jordanlewis a=jordanlewis

Backport 1/1 commits from #26756.

/cc @cockroachdb/release

---

Previously, interleaved joins could take place if the tables were
ancestors of each other, there was a merge join ordering on the equality
columns, and the left equality columns were a prefix of the interleaved
columns. These criteria are insufficient: the right equality columns
must also be a prefix of the interleaved columns. A case where this
wouldn't naturally hold is where the right equality columns get a merge
join ordering because they're known to be constant.

Release note (bug fix): joins across two interleaved tables no longer
return incorrect results under certain circumstances when the equality
columns aren't all part of the interleaved columns.

Fixes #25838. I verified that the new test fails before the fix was applied.


Co-authored-by: Jordan Lewis <jordanthelewis@gmail.com>
Commits on Jun 20, 2018
craig[bot] and tschottdorf
Merge #26820
26820: release-2.0: storage: fix timeseries success/error counters r=a-robinson a=tschottdorf

This has been fixed on master for a while, but in a PR
that isn't going to be backported (#25343).

Release note (bug fix): Successes of time series maintenance queue
operations were previously counted as errors in the metrics dashboard.

Co-authored-by: Tobias Schottdorf <tobias.schottdorf@gmail.com>
sqlmigrations: simplify descriptor validation to get around #26422
This validation is fine because the function itself is only
upgrading the format via the migration path and not changing any
cross references. If in the future we decide to use this
descriptor upgrade function for something more complex
we can reintroduce the use of Validate.

related to #26422

Release note: None
craig[bot] and vivekmenezes
Merge #26857
26857: backport-2.0: sqlmigrations: simplify descriptor validation to get around #26422 r=vivekmenezes a=vivekmenezes

Backport 1/1 commits from #26784.

/cc @cockroachdb/release

---

This validation is fine because the function itself is only
upgrading the format.

related to #26422

Release note: None


Co-authored-by: Vivek Menezes <vivek@cockroachlabs.com>
Commits on Jun 25, 2018
storage: filter abort span during split copy
We are suspecting that some workloads create a sufficient number of
abort span records to cause split commands that are too large to be
proposed to Raft, in effect rendering the range permanently
unsplittable. This is a problem since the range will at some point
start to backpressure writes (and even without that, it's a resource
hog). Most of the problematic abort span records would very likely
be removed during garbage collection; however, abort span records
aren't counted in any quantity that triggers GC.

Instead of copying the entirety of the abort span, restrict to the
entries that would not be removed by a GC operation. In practice,
this means that unless a high number of abort span records are
created every hour, very few records will actually be copied, and
in return the split command size should be small.

See #26830.

Release note (bug fix): Avoid a situation in which ranges repeatedly
fail to perform a split.
craig[bot] and tschottdorf
Merge #26944
26944: backport-2.0: storage: filter abort span during split copy r=bdarnell a=tschottdorf

Backport 1/1 commits from #26934.

Needed slight adjustments to apply, but nothing in the split logic.

/cc @cockroachdb/release

---

We are suspecting that some workloads create a sufficient number of
abort span records to cause split commands that are too large to be
proposed to Raft, in effect rendering the range permanently
unsplittable. This is a problem since the range will at some point
start to backpressure writes (and even without that, it's a resource
hog). Most of the problematic abort span records would very likely
be removed during garbage collection; however, abort span records
aren't counted in any quantity that triggers GC.

Instead of copying the entirety of the abort span, restrict to the
entries that would not be removed by a GC operation. In practice,
this means that unless a high number of abort span records are
created every hour, very few records will actually be copied, and
in return the split command size should be small.

See #26830.

Release note (bug fix): Avoid a situation in which ranges repeatedly
fail to perform a split.


Co-authored-by: Tobias Schottdorf <tobias.schottdorf@gmail.com>
distsql: do not plan against unhealthy nodes
A bug was introduced in 0cd1da0 which allowed table readers to be planned
on unhealthy or incompatible nodes for LIMIT queries. They should use
the gateway node instead. This was causing a panic in execution because
the node was not in the nodeAddresses map.

Fixes #26140

Release note (bug fix): Fixed 'node not in nodeAddresses map' panic,
which could occur when distributed LIMIT queries were run on a cluster
with at least one unhealthy node.
craig[bot] and solongordon
Merge #26953
26953: release-2.0: distsql: do not plan against unhealthy nodes r=solongordon a=solongordon

Backport 1/1 commits from #26950.

/cc @cockroachdb/release

---

A bug was introduced in 0cd1da0 which allowed table readers to be planned
on unhealthy or incompatible nodes for LIMIT queries. They should use
the gateway node instead. This was causing a panic in execution because
the node was not in the nodeAddresses map.

Fixes #26140

Release note (bug fix): Fixed 'node not in nodeAddresses map' panic,
which could occur when distributed queries were run on a cluster with at
least one unhealthy node.


Co-authored-by: Solon Gordon <solon@cockroachlabs.com>
Commits on Jun 26, 2018
importccl: set descriptor DropTime during failed IMPORT
This allows ClearRange to be used instead of the 1.1-era batch deletions.

Release note (bug fix): failed IMPORTs now begin cleaning up
partially imported data immediately, and the cleanup itself is faster.
craig[bot] and mjibson
Merge #26986
26986: release-2.0: importccl: set descriptor DropTime during failed IMPORT r=mjibson a=mjibson

Backport 1/1 commits from #26959.

/cc @cockroachdb/release

---

This allows ClearRange to be used instead of the 1.1-era batch deletions.

Release note (bug fix): failed IMPORTs now begin cleaning up
partially imported data immediately, and the cleanup itself is faster.


Co-authored-by: Matt Jibson <matt.jibson@gmail.com>
Commits on Jun 27, 2018
release-2.0: cherry-pick etcd-io/etcd#9887
Release note (bug fix): Alleviate a scenario in which a large number of
uncommitted Raft commands could cause memory pressure at startup time.
Commits on Jun 28, 2018
craig[bot] and bdarnell
Merge #27024
27024: release-2.0: cherry-pick etcd-io/etcd#9887 r=tschottdorf a=bdarnell

This is the release-2.0 counterpart to #27009.

Release note (bug fix): Alleviate a scenario in which a large number of
uncommitted Raft commands could cause memory pressure at startup time.

Co-authored-by: Ben Darnell <ben@cockroachlabs.com>
knz
sql: ensure that serializable means serializable
Prior to this patch SET transaction_isolation would always set the iso
level to SNAPSHOT regardless of the value passed. This patch fixes it.

Release note (bug fix): the pg-specific syntax `SET
transaction_isolation` did not support any setting other than SNAPSHOT
properly. (This bug did not affect the standard SQL `SET TRANSACTION
ISOLATION LEVEL`.) This is now fixed.
Commits on Jun 29, 2018
craig[bot] and knz
Merge #27047
27047: sql-2.0: ensure that serializable means serializable r=knz a=knz

Prior to this patch SET transaction_isolation would always set the iso
level to SNAPSHOT regardless of the value passed. This patch fixes it.

Release note (bug fix): the pg-specific syntax `SET
transaction_isolation` did not support any setting other than SNAPSHOT
properly. (This bug did not affect the standard SQL `SET TRANSACTION
ISOLATION LEVEL`.) This is now fixed.

Co-authored-by: Raphael 'kena' Poss <knz@cockroachlabs.com>
Commits on Jul 03, 2018
ui: make sure we identify the cluster
In #24996 I accidentally moved the identify call rather than copying it,
meaning we frequently failed to issue the identify call. This fixes that
by attempting to identify on every page if we have not yet.

Release note: None
build: assert clean for submodules
I saw failing builds due to a dirty `c-deps/protobuf` directory; this
isn't cleaned up by the prior version of the script (and it also does
not tell you the diff). This new one will.

Also added a second `-f` to `git clean` which does not stop in nested
repos (excluding submodules). We don't need that but it seemed right
to add it.

Release note: None
ui: fix typo in node map onboarding screen
Fixes #25096

Release note (admin ui change): fix typo in node map onboarding screen
3 authors
Merge #27129
27129: backport-2.0: ui: fix typo in node map onboarding screen r=benesch a=vilterp

Backport of #25112

Fixes #25096

Release note (admin ui change): fix typo in node map onboarding screen

Co-authored-by: Tobias Schottdorf <tobias.schottdorf@gmail.com>
Co-authored-by: Pete Vilter <vilterp@cockroachlabs.com>
Commits on Jul 04, 2018
cli: Deprecate --wait=live in `node decommission`
Refer to issue #26880.

When you try to decommission a node that is down, today one has to use
`decommission --wait=live`, which does not verify whether the down node
is part of any ranges, and manually verify that the cluster has
up-replicated all ranges elsewhere. This is far from ideal, in particular
since there is no automated way to reliably check this.

`--wait=live` is deprecated and its behaviour replaced by that of
`--wait=all`. Instead of relying on metrics for replica counts, which
may be stale, look at the authoritative source - range metadata.

Release note (cli change):
Deprecate `--wait=live` parameter for `node decommission`. `--wait=all`
is the default behaviour. This ensures CockroachDB checks that ranges
on the node to be decommissioned are not under-replicated before the
node is decommissioned.
Commits on Jul 05, 2018
kv: Bail out early if context is canceled
I think we've seen commands pile up in the command queue because we
don't check for cancellation until too late in the local execution
path.

Release note (bug fix): Abandon commands earlier when a deadline has
been reached.
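
A sketch of the shape of the check, with hypothetical names: consult `ctx.Err()` before doing any work, rather than discovering the cancellation only after the command has queued.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// executeLocally bails out before queuing if the caller already gave
// up, so dead commands don't pile up in the command queue.
func executeLocally(ctx context.Context, cmd string) error {
	if err := ctx.Err(); err != nil {
		return err // canceled or deadline exceeded: abandon early
	}
	// ... add to command queue, evaluate, etc.
	fmt.Println("executing", cmd)
	return nil
}

func main() {
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	err := executeLocally(ctx, "Put")
	fmt.Println(errors.Is(err, context.Canceled)) // true
}
```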
Commits on Jul 06, 2018
knz
sql: fix a pretty-print rendering bug for DISTINCT ON
The DISTINCT ON clause was not pretty-printed correctly (its
pretty-printing code was simply not tested). This would impact the
correctness of reporting data, statement statistics, and
pretty-printed output, among other things.

Release note (bug fix): the DISTINCT ON clause was not reported
properly in statement statistics. This is now fixed.
craig[bot] and knz
Merge #27222
27222: release-2.0: sql: fix a pretty-print rendering bug for DISTINCT ON r=knz a=knz

Backport 1/1 commits from #27221.

/cc @cockroachdb/release

---

The DISTINCT ON clause was not pretty-printed correctly (its
pretty-printing code was simply not tested). This would impact the
correctness of reporting data, statement statistics, and
pretty-printed output, among other things.

Release note (bug fix): the DISTINCT ON clause was not reported
properly in statement statistics. This is now fixed.



Co-authored-by: Raphael 'kena' Poss <knz@cockroachlabs.com>
sql: fix union of cols w/ mismatched type aliases
Previously, the ColumnType.Equals method provided by protobuf generated
code was used to compare column types for equality when determining
whether two DistSQL streams were mergeable. This was incorrect because
column types can have type aliases known as VisibleTypes, which might
not match despite the underlying column types being equal. An example is
the column types INT and INTEGER. We preserve the difference between
those two, so that SHOW CREATE TABLE produces the same output as the
original CREATE TABLE.

To fix this, a new ColumnType.Equivalent method is introduced that
ignores the VisibleType field on ColumnTypes.

Release note: None
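
A sketch of the idea on a simplified struct (the real `ColumnType` is a protobuf message with more fields): blank out the display-only alias before comparing.

```go
package main

import "fmt"

// ColumnType is a simplified stand-in for the protobuf-generated type.
type ColumnType struct {
	SemanticType string
	Width        int32
	VisibleType  string // display alias only, e.g. "INTEGER" for an INT
}

// Equivalent ignores VisibleType, which affects rendering (such as
// SHOW CREATE TABLE output) but not the underlying column type.
func (c ColumnType) Equivalent(o ColumnType) bool {
	c.VisibleType, o.VisibleType = "", "" // receivers are copies
	return c == o
}

func main() {
	a := ColumnType{SemanticType: "INT", Width: 64}
	b := ColumnType{SemanticType: "INT", Width: 64, VisibleType: "INTEGER"}
	fmt.Println(a == b, a.Equivalent(b)) // false true
}
```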
sql: fix panic with UNION ALL
Previously, UNION ALL distsql plans could panic the server if the two
sub plans had different post processing. Now, this condition is
detected, and a no-op stage is inserted after the plans if necessary.

Release note (bug fix): fix a server crash when trying to plan
certain UNION ALL queries.
craig[bot] and jordanlewis
Merge #27233
27233: cherrypick-2.0: fix panic with UNION ALL r=asubiotto a=asubiotto

Cherry-pick #25720 and #25747. There was a merge conflict to do with the metadata test receiver, which is not present in 2.0, so that code was removed.

Fixes #27200

Co-authored-by: Jordan Lewis <jordanthelewis@gmail.com>
Commits on Jul 09, 2018
craig[bot] and couchand
Merge #27133
27133: release-2.0: ui: make sure we identify the cluster r=couchand a=couchand

Backport 1/1 commits from #27132.

/cc @cockroachdb/release

---

In #24996 I accidentally moved the identify call rather than copying it,
meaning we frequently failed to issue the identify call. This fixes that
by attempting to identify on every page if we have not yet.

Release note: None


Co-authored-by: Andrew Couch <andrew@cockroachlabs.com>
craig[bot] and bdarnell
Merge #27215
27215: cherrypick-2.0: Add early checks for context cancellation r=tschottdorf a=bdarnell

Cherry-picks 2/3 commits from #26643. The other commit is left out because the batching logic in release-2.0 is different and doesn't have the same continue-after-error problem. 

Co-authored-by: Ben Darnell <ben@cockroachlabs.com>
build: stress by number of runs, not by time
We previously launched a 15m stress job for every package. This resulted
in extremely poor resource allocation: packages with many tests, like
the sql package, would run each of its tests only a few times, while
packages with few tests, like package workload, would run each of its
tests thousands of times.

Instead, run each package's test suite 100 times. This effectively
weights packages by the running time of their tests. Packages with more
tests or longer tests will use proportionally more CPU time. Total
runtime of the nightly stress jobs will go down, but coverage should not
meaningfully decrease, and will in fact increase for particularly large
packages. We can tweak the number of runs in the future to either
increase coverage or decrease runtime, as desired.

For reference, a test suite that takes one minute to run will now
require approximately 12.5 minutes of wall-clock time on our eight-core
TeamCity agents (100 runs across 8 cores is 12.5 runs back to back).
Package sql, whose test suite takes approximately three
minutes to run, will take about 38 minutes to complete. Package
workload, whose test suite takes less than one second, will complete in
about ten seconds.

Release note: None
importccl: correctly error when encountering a sequence operation
Fixes #26850

Release note (bug fix): fix a panic in IMPORT when creating a table
using a sequence operation in a column DEFAULT (like nextval).
craig[bot] and mjibson
Merge #27294
27294: cherrypick-2.0: importccl: correctly error when encountering a sequence operation r=mjibson a=mjibson

Cherrypick #27122 
Fixes #26850

Release note (bug fix): fix a panic in IMPORT when creating a table
using a sequence operation in a column DEFAULT (like nextval).

Co-authored-by: Matt Jibson <matt.jibson@gmail.com>
Commits on Jul 10, 2018
server: check all meta records during decommissioning
By scanning only the meta2 information, replicas of the meta2 ranges
themselves would be ignored. Include meta1 in the scan to address this.

This wasn't released, so no release note has been added.

Release note: None
cli: massage the behavior for release-2.0
We want to wait for the (now consistent) replica count
even if --wait=live is specified, though we do not
want to change anything about the output.

Release note: None
cherrypick-2.0: storage: add InitialHeartBeatFailedError as a structured error

This patch makes the GRPC Connect function return a structured error
when it's unable to connect and has never seen a heartbeat on that
connection. Before it returned an ambiguous error which prevented the
range descriptor cache from evicting stale entries upstream. In some cases
this could lead to a node being unable to find nodes that have replicas
for a given range because its cache is stale, but it doesn't refresh it
because there's an ambiguous error.

Closes #27109

Release note: None
craig[bot] and m-schneider
Merge #27338
27338: cherrypick-2.0:storage: add InitialHeartBeatFailedError as a structured error r=m-schneider a=m-schneider

This patch makes the GRPC Connect function return a structured error
when it's unable to connect and has never seen a heartbeat on that
connection. Before it returned an ambiguous error which prevented the
range descriptor cache from evicting stale entries upstream. In some cases
this could lead to a node being unable to find nodes that have replicas
for a given range because its cache is stale, but it doesn't refresh it
because there's an ambiguous error.

Closes #27109

Release note: None

Co-authored-by: Masha Schneider <masha@cockroachlabs.com>
3 authors
Merge #27158
27158: backport-2.0: cli: Deprecate --wait=live in `node decommission` r=bdarnell a=tschottdorf

This is a bit of an unusual backport. The previous behavior is correctly documented, but it simply isn't that useful to operators who want to feel reasonably certain that cluster health is back to green before carrying on with their day. I'm not sure what our policy here should be this late into 2.0, but I'm tempted to say that we're not going to cherry-pick this. @bdarnell, what's your take here?

Backport 1/1 commits from #27027.

/cc @cockroachdb/release

---

Refer to issue #26880.

When you try to decommission a node that is down, today one has to use
`decommission --wait=live`, which does not verify whether the down node
is part of any ranges, and manually verify that the cluster has
up-replicated all ranges elsewhere. This is far from ideal, in particular
since there is no automated way to reliably check this.

`--wait=live` is deprecated and its behaviour replaced by that of
`--wait=all`. Instead of relying on metrics for replica counts, which
may be stale, look at the authoritative source - range metadata.

Release note (cli change):
Deprecate `--wait=live` parameter for `node decommission`. `--wait=all`
is the default behaviour. This ensures CockroachDB checks that ranges
on the node to be decommissioned are not under-replicated before the
node is decommissioned.


Co-authored-by: neeral <neeral@users.noreply.github.com>
Co-authored-by: Tobias Schottdorf <tobias.schottdorf@gmail.com>
Commits on Jul 18, 2018
knz
sql/tree: fix the walk recursion for WindowDef and OrderBy
This was previously wrong.

Release note: None
craig[bot] and knz
Merge #27711
27711: cherrypick-2.0: sql/tree: fix the walk recursion for WindowDef and OrderBy r=knz a=knz

Picks #27695

cc @cockroachdb/release 

Co-authored-by: Raphael 'kena' Poss <knz@cockroachlabs.com>
Commits on Jul 19, 2018
server: persist correct cluster version on join
NB: I had to adapt the test to use versions 1.0 and 2.0; I have verified
that it fails without the remainder of the PR.

----

Prior to this commit, when a node joined an existing cluster, it would
likely persist its MinSupportedVersion as the cluster version (even
though it would operate at the correct version). This is problematic
since this node's data directories wouldn't be regarded as compatible with
the successor version of the real version, even though they are.

The problem was that the callback which receives the real cluster
version would fire before the stores were bootstrapped, and bootstrap
would erroneously pick the minimum supported version. We now "backfill"
the correct cluster version after having bootstrapped all stores.

Even though the fixtures will need to be regenerated:
Fixes #27525.

Optimistically:
Fixes #27162.

Release note: None
craig[bot] and tschottdorf
Merge #27741
27741: backport-2.0: server: persist correct cluster version on join r=petermattis a=tschottdorf

Backport 1/2 commits from #27639.

I had to adapt the test, and I have verified that it fails without the main change.

/cc @cockroachdb/release

---

Prior to this commit, when a node joined an existing cluster, it would
likely persist its MinSupportedVersion as the cluster version (even
though it would operate at the correct version). This is problematic
since this node's data directories wouldn't be regarded as compatible with
the successor version of the real version, even though they are.

The problem was that the callback which receives the real cluster
version would fire before the stores were bootstrapped, and bootstrap
would erroneously pick the minimum supported version. We now "backfill"
the correct cluster version after having bootstrapped all stores.

Even though the fixtures will need to be regenerated:
Fixes #27525.

Optimistically:
Fixes #27162.

Release note: None


Co-authored-by: Tobias Schottdorf <tobias.schottdorf@gmail.com>
Commits on Jul 20, 2018
knz
sql: report unimplemented error on ORDER BY in function arguments
CockroachDB currently does not support specifying the order of
aggregations (#23620) but accepts the syntax silently. This is not
useful, because it creates the wrong expectations with users and
prevents CRL from learning about user demand for the feature.

This patch ensures that the syntax reports an unimplemented error.

Note that this will break Sequelize support until the issue #25148 is
solved.

Release note (sql change): CockroachDB now properly reports an error
when a query attempts to use ORDER BY within a function argument list,
which is an unsupported feature.
craig[bot] and knz
Merge #25147
25147: cherrypick-2.0: sql: report unimplemented error on ORDER BY in function arguments r=knz a=knz

Picks #25146.


Co-authored-by: Raphael 'kena' Poss <knz@cockroachlabs.com>
Commits on Jul 29, 2018
storage/engine: limit the size of batch groups
Limit the size of batch groups to 1 MB. A larger batch can still be
committed, but it won't be grouped with any other batches.

Fixes #27865

Release note (bug fix): Limit the size of "batch groups" when committing
a batch to RocksDB to avoid rare scenarios in which multi-gigabyte batch
groups are created which can cause a server to run out of memory when
replaying the RocksDB log at startup.
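
A sketch of the grouping rule with hypothetical types (the real logic lives in the engine's commit pipeline): a group stops growing once it would exceed 1 MB, but an oversized batch at the head still commits, alone.

```go
package main

import "fmt"

const maxGroupSize = 1 << 20 // 1 MB

// nextGroup splits pending batches into one commit group and the rest.
// The first batch is always taken, so a batch larger than the cap still
// commits; it just isn't grouped with any other batches.
func nextGroup(pending [][]byte) (group, rest [][]byte) {
	size := 0
	for i, b := range pending {
		if i > 0 && size+len(b) > maxGroupSize {
			return pending[:i], pending[i:]
		}
		size += len(b)
	}
	return pending, nil
}

func main() {
	big := make([]byte, maxGroupSize+1)
	group, rest := nextGroup([][]byte{big, []byte("small")})
	fmt.Println(len(group), len(rest)) // 1 1: the large batch commits alone
}
```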
Commits on Aug 01, 2018
justinj and knz
sql: support binary format for DInterval
Fixes #24525.

Release note (sql change): The binary postgres wire format is now supported for
intervals.
craig[bot] and justinj
Merge #28135
28135: release-2.0: sql: support binary format for DInterval r=knz a=knz

Backport 1/1 commits from #25242.

Fixes #28134

/cc @cockroachdb/release

---

Release note (sql change): The binary postgres wire format is now supported for
intervals.


Co-authored-by: Justin Jaffray <justin@cockroachlabs.com>
cli: Fix alignment in `node status` output
`cockroach node status` will hide decommissioning or dead nodes, unless
the user requests all nodes to be shown. The data shown is collected
from two different RPC calls and these do not line up. For example, a
node's liveness will be put together with a different node's
decommission status.

To view this, decommission the node with NodeID 1 in a cluster. Run
`cockroach node status --decommission` and `cockroach node status`. Look
at the columns IsLive and UpdatedAt. You'll see they go out of sync.

Release note: None
Commits on Aug 02, 2018
dt
storageccl: don't skip row after deleted row
We don't need to manually advance the iterator before calling `continue`
since the for loop advances it as well, and doing so means the row
_following_ a deleted row is also skipped.

Fixes #28171.

Release note (bug fix): Fix bug that could skip the row following a deleted row during BACKUP.
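
A reconstruction of the bug's shape on a toy iterator (not the actual storageccl code): the `for` statement already advances the iterator, so an extra `Next` before `continue` skips the following row.

```go
package main

import "fmt"

type iter struct {
	keys []string
	i    int
}

func (it *iter) Valid() bool { return it.i < len(it.keys) }
func (it *iter) Next()       { it.i++ }
func (it *iter) Key() string { return it.keys[it.i] }

func main() {
	it := &iter{keys: []string{"a", "b(deleted)", "c", "d"}}
	for ; it.Valid(); it.Next() {
		if it.Key() == "b(deleted)" {
			// The buggy version also called it.Next() here; combined
			// with the loop's it.Next(), that skipped "c" entirely.
			continue
		}
		fmt.Println(it.Key()) // fixed version prints a, c, d
	}
}
```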
storage: fix batch commit performance regression
PR #27895 had a subtle bug. Only the leader at the head of the pending
queue was waiting for the previous commit to finish. The other leaders
were marching right past the wait loop and committing their singleton
batch. This was not a correctness problem, only a performance
problem. But fixing it did reveal two other correctness problems. First,
we need to broadcast to wake all waiters on the cond. Prior to fixing the
first bug there was only one cond waiter, so this was only a lurking
problem. Second, we need to cap the capacity of the slice passed to the
syncer to prevent concurrent mutation (this showed up readily as a race
condition).

Release note (performance improvement): Fix batch commit performance
regression that reduced write performance by 20%.
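
Both fixes map to small Go idioms: `sync.Cond.Broadcast` wakes every waiter where `Signal` wakes only one, and a three-index slice expression caps capacity so a later `append` cannot scribble on memory another goroutine is still reading. The latter, demonstrated on toy data:

```go
package main

import "fmt"

func main() {
	pending := []string{"b1", "b2", "b3"}

	// len 2 but cap 3: the spare capacity aliases pending's array, so
	// an append by whoever received this slice clobbers pending[2].
	leaky := pending[:2]
	_ = append(leaky, "x")
	fmt.Println(pending) // [b1 b2 x]

	pending[2] = "b3"

	// Three-index slice: len == cap == 2, so any append must allocate
	// a fresh array and can no longer touch pending's memory.
	capped := pending[:2:2]
	_ = append(capped, "x")
	fmt.Println(pending) // [b1 b2 b3]
}
```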
craig[bot] and dt
Merge #28196
28196: release-2.0: storageccl: don't skip row after deleted row r=dt a=dt

Backport 1/1 commits from #28172.

/cc @cockroachdb/release

---

We don't need to manually advance the iterator before calling `continue`
since the for loop advances it as well, and doing so means the row
_following_ a deleted row is also skipped.

Fixes #28171.

Release note (bug fix): Fix bug that could skip the row following a deleted row during BACKUP.


Co-authored-by: David Taylor <tinystatemachine@gmail.com>
craig[bot] and petermattis
Merge #28009
28009: release-2.0: storage/engine: limit the size of batch groups r=benesch a=petermattis

Backport 1/1 commits from #27895.

/cc @cockroachdb/release

Queuing this up so we don't forget about it.

---

Limit the size of batch groups to 1 MB. A larger batch can still be
committed, but it won't be grouped with any other batches.

Fixes #27865

Release note (bug fix): Limit the size of "batch groups" when committing
a batch to RocksDB to avoid rare scenarios in which multi-gigabyte batch
groups are created which can cause a server to run out of memory when
replaying the RocksDB log at startup.


Co-authored-by: Peter Mattis <petermattis@gmail.com>
Commits on Aug 03, 2018
craig[bot] and neeral
Merge #27808
27808: cli: fix alignment in `node status` output r=benesch a=neeral

See #27807

`cockroach node status` will hide decommissioning or dead nodes, unless
the user requests all nodes to be shown. The data shown is collected
from two different RPC calls and these do not line up. For example, a
node's liveness will be put together with a different node's
decommission status.

To view this, decommission the node with NodeID 1 in a cluster. Run
`cockroach node status --decommission` and `cockroach node status`. Look
at the columns IsLive and UpdatedAt. You'll see they go out of sync.

Release note: None

Co-authored-by: neeral <neeral@users.noreply.github.com>
Commits on Aug 06, 2018
neeral and dt
importccl: Preserve '\r\n' during CSV import
See #25344.

It appears this is being caused by the behaviour of Golang's
encoding/csv library, which folds \r\n into \n when reading. This was
fixed in golang/go#21201 but then reverted golang/go#22746. It appears
based on that second issue that Go is unlikely to change that behavior.

Check in the stdlib `encoding/csv` into `pkg/util` with
golang/go#22746 reverted.

Release note:
`\r\n` characters in CSV files were silently converted into `\n`. This
caused imported data to differ from the source. This is now fixed.
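
The stdlib behavior being reverted is easy to demonstrate:

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

func main() {
	// The stdlib reader folds \r\n to \n, even inside quoted fields.
	r := csv.NewReader(strings.NewReader("\"a\r\nb\",c\n"))
	rec, err := r.Read()
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q\n", rec[0]) // "a\nb": the \r was silently dropped
}
```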
dt
encoding/csv: update license file paths
Forgot to update these when vendoring the stdlib package.
Also switch the Example_ tests to test this package instead of stdlib.

Release note: None.
storage: prevent unbounded raft log growth without quorum
Fixes #27772.

This change adds safeguards to prevent cases where a raft log
would grow without bound during loss of quorum scenarios. It
also adds a new test that demonstrates that the raft log does
not grow without bound in these cases.

There are two cases that need to be handled to prevent the
unbounded raft log growth observed in #27772.
1. When the leader proposes a command and cannot establish a
   quorum. In this case, we know the leader has the entry in
   its log, so there's no need to refresh it with `reasonTicks`.
   To avoid this, we no longer use `reasonTicks` as a leader.
2. When a follower proposes a command that is forwarded to the
   leader who cannot establish a quorum. In this case, the
   follower can't be sure (currently) that the leader got the
   proposal, so it needs to refresh using `reasonTicks`. However,
   the leader now detects duplicate forwarded proposals and
   avoids appending redundant entries to its log. It does so
   by maintaining a set of in-flight forwarded proposals that
   it has received during its term as leader. This set is reset
   after every leadership change.

Both of these cases are tested against in the new
TestLogGrowthWhenRefreshingPendingCommands. Without both of
the safeguards introduced in this commit, the test fails.
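
A sketch of the second safeguard with hypothetical types (the real set lives on the leader's replica state): remember forwarded proposal IDs for the current term, drop duplicates, and reset on leadership change.

```go
package main

import "fmt"

type proposalID string

// leaderState tracks forwarded proposals seen during one leadership term.
type leaderState struct {
	term     uint64
	inFlight map[proposalID]struct{}
}

// shouldAppend reports whether a forwarded proposal is new this term;
// re-forwarded duplicates are dropped instead of appended to the log.
func (ls *leaderState) shouldAppend(term uint64, id proposalID) bool {
	if term != ls.term {
		ls.term = term
		ls.inFlight = make(map[proposalID]struct{}) // reset on new term
	}
	if _, ok := ls.inFlight[id]; ok {
		return false
	}
	ls.inFlight[id] = struct{}{}
	return true
}

func main() {
	var ls leaderState
	fmt.Println(ls.shouldAppend(5, "p1")) // true
	fmt.Println(ls.shouldAppend(5, "p1")) // false: duplicate this term
	fmt.Println(ls.shouldAppend(6, "p1")) // true: new leadership term
}
```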

Release note (bug fix): Prevent unbounded growth of the raft log
caused by a loss of quorum.
storage: cancel liveness update context on Stopper cancellation
Fixes #27878.

The test was flaky because a NodeLiveness update was getting stuck when
killing a majority of nodes in a Range. The fix was to tie NodeLiveness'
update context to its stopper so that liveness updates are canceled when
their node begins to shut down.

This was a real issue that I suspect would occasionally make nodes hang on
shutdown when their liveness range was becoming unavailable. There are
probably other issues in the same class of bugs, but stressing
`TestLogGrowthWhenRefreshingPendingCommands` isn't showing anything else.

In doing so, the change needed to extend `stopper.WithCancel` into
`stopper.WithCancelOnQuiesce` and `stopper.WithCancelOnStop`.

Release note: None
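
A generic sketch of the tie-in (the actual Stopper API differs in detail): derive the update context so that closing the quiesce channel cancels any in-flight liveness update.

```go
package main

import (
	"context"
	"fmt"
)

// withCancelOnQuiesce cancels the returned context once quiesce closes,
// so work tied to it stops when the node begins shutting down.
func withCancelOnQuiesce(
	ctx context.Context, quiesce <-chan struct{},
) (context.Context, context.CancelFunc) {
	ctx, cancel := context.WithCancel(ctx)
	go func() {
		select {
		case <-quiesce:
			cancel()
		case <-ctx.Done():
		}
	}()
	return ctx, cancel
}

func main() {
	quiesce := make(chan struct{})
	ctx, cancel := withCancelOnQuiesce(context.Background(), quiesce)
	defer cancel()
	close(quiesce) // shutdown begins
	<-ctx.Done()   // a pending liveness update unblocks instead of hanging
	fmt.Println(ctx.Err())
}
```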
stopper: avoid memory leak in Stopper.WithCancel methods
Before this change, `WithCancelOnQuiesce` and `WithCancelOnStop` were
dangerous to use because they would never clean up memory. This meant
that any use of the methods that happened more than a constant number
of times would slowly leak memory. This was an issue in `client.Txn.rollback`.
This change fixes the methods so that it's possible for callers to clean
up after themselves.

Added a warning to `Stopper.AddCloser` because this had a similar
issue.

Release note: None
stopper: Ensure that Closers and Ctxs are always cleaned up
Before this change the Stopper made no guarantee that `Closers`
registered by the `AddCloser` method or `Contexts` tied to
the `Stopper` with `WithCancelOn{Quiesce,Stop}` were ever properly
cleaned up. In cases where these methods were called concurrently
with the Stopper stopping, it was possible for the resources to
leak. This change fixes this by handling cases where these methods
are called concurrently with or after the Stopper has stopped. This
allows them to provide stronger external guarantees.

Release note: None
4 authors
Merge #27868 #28225
27868: backport-2.0: storage: prevent unbounded raft log growth without quorum r=nvanbenschoten a=nvanbenschoten

Backport 2/2 commits from #27774.

/cc @cockroachdb/release

---

Fixes #27772.

This change adds safeguards to prevent cases where a raft log
would grow without bound during loss of quorum scenarios. It
also adds a new test that demonstrates that the raft log does
not grow without bound in these cases.

There are two cases that need to be handled to prevent the
unbounded raft log growth observed in #27772.
1. When the leader proposes a command and cannot establish a
   quorum. In this case, we know the leader has the entry in
   its log, so there's no need to refresh it with `reasonTicks`.
   To avoid this, we no longer use `reasonTicks` as a leader.
2. When a follower proposes a command that is forwarded to the
   leader who cannot establish a quorum. In this case, the
   follower can't be sure (currently) that the leader got the
   proposal, so it needs to refresh using `reasonTicks`. However,
   the leader now detects duplicate forwarded proposals and
   avoids appending redundant entries to its log. It does so
   by maintaining a set of in-flight forwarded proposals that
   it has received during its term as leader. This set is reset
   after every leadership change.

Both of these cases are tested against in the new
TestLogGrowthWhenRefreshingPendingCommands. Without both of
the safeguards introduced in this commit, the test fails.

Release note (bug fix): Prevent loss of quorum situations from
allowing unbounded growth of a Range's Raft log.


28225: release-2.0: importccl: Preserve '\r\n' during CSV import r=dt a=dt

Backport 1/1 commits from #28181.

/cc @cockroachdb/release

---

See #25344.


Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
Co-authored-by: neeral <neeral@users.noreply.github.com>
Co-authored-by: David Taylor <tinystatemachine@gmail.com>
Commits on Aug 07, 2018
knz
cli/sql: avoid a panic when reading '?' while non-interactive
Release note (bug fix): `cockroach sql` and `cockroach demo` now
properly print a warning when a `?` character is mistakenly used to
receive contextual help in a non-interactive session, instead of
crashing.
craig[bot] and knz
Merge #28325
28325: release-2.0: cli/sql: avoid a panic when reading '?' while non-interactive r=knz a=knz

Backport 1/1 commits from #28324.

/cc @cockroachdb/release

---

Fixes  #28239.

Release note (bug fix): `cockroach sql` and `cockroach demo` now
properly print a warning when a `?` character is mistakenly used to
receive contextual help in a non-interactive session, instead of
crashing.


Co-authored-by: Raphael 'kena' Poss <knz@cockroachlabs.com>
Commits on Aug 08, 2018
kv: scale LeaseHolderCache size in tests by shard count
The LeaseHolderCache splits its available capacity across all shards.
This means that in tests where the total size is static, the size
available per shard shrinks as the number of CPUs grow. This was causing
the tests to fail on a 24 CPU machine.

This change addresses this by scaling the cache capacity by the shard
count.

Release note: None
storage: Cap the size of raft logs included in preemptive snapshots
Release note (bug fix): Fix out-of-memory errors caused by very large
raft logs.
Commits on Aug 09, 2018
storage: Use ClearRange for any large key ranges in clearRangeData.
We previously assumed that only the user keyspace could contain enough
data to require ClearRange instead of ClearIterRange, but we've seen
that that's not true; the raft log can also get large enough that
ClearIterRange will exhaust memory.

Release note (bug fix): Fix out-of-memory errors caused by very large
raft logs.
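
A sketch of the dispatch with invented signatures and an assumed threshold (the real constant and interfaces differ): large spans get a single range tombstone via `ClearRange`, while small ones are cleared key-by-key via `ClearIterRange`.

```go
package main

import "fmt"

// engine models only the two clear operations discussed here, with
// invented signatures.
type engine interface {
	ClearRange(start, end string) error     // one range deletion tombstone
	ClearIterRange(start, end string) error // iterate, deleting each key
}

// clearRangeThreshold is an assumed cutoff, not the real constant.
const clearRangeThreshold = 64 << 20

// clearData picks the strategy by the span's approximate size, whether
// the span holds user data or the raft log.
func clearData(eng engine, start, end string, approxBytes int64) error {
	if approxBytes > clearRangeThreshold {
		return eng.ClearRange(start, end) // bounded memory, any size
	}
	return eng.ClearIterRange(start, end) // cheaper for small spans
}

type logEngine struct{}

func (logEngine) ClearRange(s, e string) error     { fmt.Println("ClearRange", s, e); return nil }
func (logEngine) ClearIterRange(s, e string) error { fmt.Println("ClearIterRange", s, e); return nil }

func main() {
	_ = clearData(logEngine{}, "/raft/r1", "/raft/r2", 128<<20) // ClearRange
}
```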
3 authors
Merge #28398
28398: release-2.0: Avoid OOMs when raft logs are very large r=nvanbenschoten a=bdarnell

Backports #28293 (and sneaks in a backport of #22915 because my new laptop has enough cores to make leaseholdercache_test.go fail).

There were some merge conflicts, mainly in the second commit. 

Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
Co-authored-by: Ben Darnell <ben@bendarnell.com>
Commits on Aug 13, 2018
release-2.0: Backport #28511
Release note (bug fix): Additional fixes for out-of-memory errors
caused by very large raft logs.

Release note (performance improvement): Greatly improved performance
when catching up followers that are behind when raft logs are large.
craig[bot] and bdarnell
Merge #28526
28526: release-2.0: Backport #28511 r=benesch a=bdarnell

Release note (bug fix): Additional fixes for out-of-memory errors
caused by very large raft logs.

Release note (performance improvement): Greatly improved performance
when catching up followers that are behind when raft logs are large.

Co-authored-by: Ben Darnell <ben@bendarnell.com>
Commits on Aug 15, 2018
knz
deps: bump go-libedit to fix bugs
Release note (bug fix): `cockroach sql` and `cockroach demo` now work
properly even when the TERM environment variable is not set.

Release note (bug fix): `cockroach sql` and `cockroach demo` are now
properly able to customize the prompt with `~/.editrc` on Linux.

Release note (bug fix): `cockroach sql` and `cockroach demo` again
support copy-pasting special unicode character from other documents.
craig[bot] and knz
Merge #28614
28614: cherrypick-2.0: deps: bump go-libedit to avoid a panic r=knz a=knz

Picks #28233 and #28613.

cc @cockroachdb/release 

Co-authored-by: Raphael 'kena' Poss <knz@cockroachlabs.com>
Abhishek Madan and solongordon
sql: Deduplicate groupNode columns
Fixes #24413.

Release note: None
Commits on Aug 16, 2018
craig[bot] and Abhishek Madan
Merge #28625
28625: release-2.0: sql: Deduplicate groupNode columns r=solongordon a=solongordon

Backport 1/1 commits from #24478.

I'm backporting this since it fixes #27648, which is not an issue on master thanks to this enhancement.

/cc @cockroachdb/release


Co-authored-by: Abhishek Madan <abhishek@cockroachlabs.com>
Commits on Aug 18, 2018
storage/engine: invalidate cached iterator state when necessary
`Iterator.{MVCCGet,MVCCScan,FindSplitKey,ComputeStats}` need to
invalidate the cached iterator state. They were previously failing to do
so which could lead to rare scenarios where a key could be considered
present in the database which had never been written.

Fixes #28025

Release note (bug fix): Fix rare scenario where the value written for
one system key was seen when another system key was read leading to the
violation of internal invariants.
craig[bot] and petermattis
Merge #28798
28798: release-2.0: storage/engine: invalidate cached iterator state when necessary r=benesch a=petermattis

Backport 1/1 commits from #28794.

/cc @cockroachdb/release

---

`Iterator.{MVCCGet,MVCCScan,FindSplitKey,ComputeStats}` need to
invalidate the cached iterator state. They were previously failing to do
so which could lead to rare scenarios where a key could be considered
present in the database which had never been written.

Fixes #28025

Release note (bug fix): Fix rare scenario where the value written for
one system key was seen when another system key was read leading to the
violation of internal invariants.


Co-authored-by: Peter Mattis <petermattis@gmail.com>
Commits on Aug 21, 2018
sql: mark expired schema change lease as a temporary error
An expired schema change lease is no reason to roll back
a schema change. The schema change mechanism uses transactions
and doesn't need a guarantee that it is only run by a single lease
holder. An expired lease should simply mean: stop and allow the
lease holder to proceed.

related to #27273
related to #27958
related to #27760

Release note: None
craig[bot] and vivekmenezes
Merge #28881
28881: backport-2.0: sql: mark expired schema change lease as a temporary error r=vivekmenezes a=vivekmenezes

Backport 1/1 commits from #28762.

/cc @cockroachdb/release

---

An expired schema change lease is no reason to roll back
a schema change. The schema change mechanism uses transactions
and doesn't need a guarantee that it is only run by a single lease
holder. An expired lease should simply mean: stop and allow the
lease holder to proceed.

related to #27273
related to #27958
related to #27760

Release note: None


Co-authored-by: Vivek Menezes <vivek@cockroachlabs.com>
Commits on Aug 22, 2018
storage: proactively renew r1's lease
Run a store-level background worker that proactively renews the lease
for r1. It could easily be extended to also renew other expiration-based
leases if we so desire in the future.

Fixes #24753

Release note: None
storage: Proactively renew all expiration-based leases
This builds off the previous commit which proactively renewed
only the first range's lease.

Release note: None
Commits on Aug 27, 2018
storage/txnwait: don't leak waiting queries in MaybeWaitForQuery
Fixes #28849.

The theorized leak in #28849 was almost spot on. I instrumented the original
code and pointed 6 workload generators running `tpcc` with `wait=false` at a
single node (adding more workers to a single workload generator wasn't as good
at creating contention). I was pretty quickly able to see the size of
`Queue.mu.queries` grow on a few hot ranges due to context cancellation in
`MaybeWaitForQuery`. It turns out that both context cancellation and the
internal timeout in `MaybeWaitForQuery` (`maxWaitCh`) could result in a memory
leak.

This change fixes this leak by doing a better job ensuring that waiters clean up
after themselves if they stop waiting prematurely. In doing so, it simplifies
the `waitingQueries` structure by changing it from a slice of channels and a
useless `*roachpb.QueryTxnRequest` (that exacerbated the leak and turned out to
be quite useful in debugging it) to a single refcounted channel. With this
structure, it's straightforward for waiters to unregister prematurely and clean
up associated resources if necessary.

I could not reproduce the same behavior with `tpcc` after this change. I wasn't
able to get the size of `Queue.mu.queries` to grow without bound.

Release note (bug fix): Fix memory leak when contended queries timed out.
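
A simplified sketch of the refcounted structure (the real queue keys on transaction IDs and carries more state): all waiters for a transaction share one channel plus a count, and a waiter that gives up early decrements the count so the entry can be deleted.

```go
package main

import (
	"fmt"
	"sync"
)

// waitingQueries replaces a slice of per-waiter channels with one
// shared, refcounted channel, so early waiters can unregister cleanly.
type waitingQueries struct {
	pending chan struct{} // closed when the queried txn is updated
	count   int           // waiters still interested
}

type queue struct {
	mu      sync.Mutex
	queries map[string]*waitingQueries
}

func (q *queue) register(txnID string) *waitingQueries {
	q.mu.Lock()
	defer q.mu.Unlock()
	w, ok := q.queries[txnID]
	if !ok {
		w = &waitingQueries{pending: make(chan struct{})}
		q.queries[txnID] = w
	}
	w.count++
	return w
}

// unregister is called by waiters that time out or whose context is
// canceled; without it, entries accumulated forever (the leak).
func (q *queue) unregister(txnID string, w *waitingQueries) {
	q.mu.Lock()
	defer q.mu.Unlock()
	if w.count--; w.count == 0 && q.queries[txnID] == w {
		delete(q.queries, txnID)
	}
}

func main() {
	q := &queue{queries: map[string]*waitingQueries{}}
	w := q.register("txn-1")
	q.unregister("txn-1", w)    // early give-up: entry is cleaned up
	fmt.Println(len(q.queries)) // 0
}
```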
craig[bot] and nvanbenschoten
Merge #29100
29100: release-2.0: storage/txnwait: don't leak waiting queries in MaybeWaitForQuery r=nvanbenschoten a=nvanbenschoten

Backport 1/1 commits from #29025.

/cc @cockroachdb/release

---

Fixes #28849.

The theorized leak in #28849 was almost spot on. I instrumented the original
code and pointed 6 workload generators running `tpcc` with `wait=false` at a
single node (adding more workers to a single workload generator wasn't as good
at creating contention). I was pretty quickly able to see the size of
`Queue.mu.queries` grow on a few hot ranges due to context cancellation in
`MaybeWaitForQuery`. It turns out that both context cancellation and the
internal timeout in `MaybeWaitForQuery` (`maxWaitCh`) could result in a memory
leak.

This change fixes this leak by doing a better job ensuring that waiters clean up
after themselves if they stop waiting prematurely. In doing so, it simplifies
the `waitingQueries` structure by changing it from a slice of channels and a
useless `*roachpb.QueryTxnRequest` (that exacerbated the leak and turned out to
be quite useful in debugging it) to a single refcounted channel. With this
structure, it's straightforward for waiters to unregister prematurely and clean
up associated resources if necessary.

I could not reproduce the same behavior with `tpcc` after this change. I wasn't
able to get the size of `Queue.mu.queries` to grow without bound.

Release note (bug fix): Fix memory leak when contended queries time out.


Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
Commits on Aug 28, 2018
craig[bot] and a-robinson
Merge #28931
28931: backport-2.0: storage: proactively renew expiration-based leases r=a-robinson a=a-robinson

Backport 2/2 commits from #25322.

/cc @cockroachdb/release

---

This mechanism won't scale incredibly well to proactively renewing all
expiration-based leases since it'd require a goroutine per lease. It
should be fine if we only auto-renew r1's lease, though.

Fixes #24753

Release note: None

----------------

To fix #27731 in v2.0.

Co-authored-by: Alex Robinson <alexdwanerobinson@gmail.com>
sql: stop using a txn for getting the index backfill timestamp
Index backfills use AOST so that their reads are less intrusive.
Before this patch, the timestamp for these reads was chosen in a bizarre
way (unless I'm missing something) - we were creating a txn for the sole
purpose of getting its timestamp. That txn was used to save the job
details with the timestamp in them, but I don't think using a txn for
that is necessary.
This patch gets a timestamp directly from the node's clock.

Besides seeming bizarre, the old code had a bug - it was incorrectly
using the job.WithTxn() API. The code was:

       if err := sc.db.Txn(ctx, func(ctx context.Context, txn *client.Txn) error {
               details := *sc.job.WithTxn(txn).Payload().Details.(*jobs.Payload_SchemaChange).SchemaChange
               if details.ReadAsOf == (hlc.Timestamp{}) {
                       details.ReadAsOf = txn.CommitTimestamp()
                       if err := sc.job.WithTxn(txn).SetDetails(ctx, details); err != nil {
                               return errors.Wrapf(err, "failed to store readAsOf on job %d", *sc.job.ID())
                       }
               }
               sc.readAsOf = details.ReadAsOf
               return nil
       }); err != nil {
               return err
       }

The first WithTxn() call is just bad - it mutates the job and persists
the txn in it, for no reason. If the second WithTxn() call runs, then
the effects of the first one will be negated - the SetDetails() call
will indirectly wipe the txn. However, if the 2nd call didn't run (which
happens, for example, when the txn gets restarted), then the job ends up
holding on to a committed transaction, which will cause it to fail when
it tries to do some further operations.

Fixes #25680

Release note (bug fix): Fixed a bug causing index creation to fail under
rare circumstances.
Commits on Aug 29, 2018
storage: Adding testing knob to disable automatic lease renewals
In order to fix the test flakes caused by automatic lease renewals

Fixes #25537
Fixes #25540
Fixes #25573
Fixes #25576
Fixes #25589
Fixes #25594
Fixes #25599
Fixes #25605
Fixes #25620

Release note: None
storage: Fix flaky TestReplicaNotLeaseHolderError
Caused by the proactive renewal of expiration-based leases (see #25322
and #25625 for more detail).

Fixes #25672

Release note: None
storage: Deflake TestReplicaLease
The introduction of proactive lease renewals (#25322) made this test
flaky in two ways. First, the test was (oddly) creating two Replica
objects for range 1 on the same store (since #2996), leading to races
when combined with the background lease renewal thread. Second, the
test expects leases to expire so it needs to disable automatic
renewals.

Fixes #25748

Release note: None
craig[bot] and andreimatei
Merge #29203
29203: release-2.0: sql: stop using a txn for getting the index backfill timestamp r=andreimatei a=andreimatei

Backport 1/1 commits from #26265.

/cc @cockroachdb/release

---

Index backfills use AOST so that their reads are less intrusive.
Before this patch, the timestamp for these reads was chosen in a bizarre
way (unless I'm missing something) - we were creating a txn for the sole
purpose of getting its timestamp. That txn was used to save the job
details with the timestamp in them, but I don't think using a txn for
that is necessary.
This patch gets a timestamp directly from the node's clock.

Besides seeming bizarre, the old code had a bug - it was incorrectly
using the job.WithTxn() API. The code was:

       if err := sc.db.Txn(ctx, func(ctx context.Context, txn *client.Txn) error {
               details := *sc.job.WithTxn(txn).Payload().Details.(*jobs.Payload_SchemaChange).SchemaChange
               if details.ReadAsOf == (hlc.Timestamp{}) {
                       details.ReadAsOf = txn.CommitTimestamp()
                       if err := sc.job.WithTxn(txn).SetDetails(ctx, details); err != nil {
                               return errors.Wrapf(err, "failed to store readAsOf on job %d", *sc.job.ID())
                       }
               }
               sc.readAsOf = details.ReadAsOf
               return nil
       }); err != nil {
               return err
       }

The first WithTxn() call is just bad - it mutates the job and persists
the txn in it, for no reason. If the second WithTxn() call runs, then
the effects of the first one will be negated - the SetDetails() call
will indirectly wipe the txn. However, if the 2nd call didn't run (which
happens, for example, when the txn gets restarted), then the job ends up
holding on to a committed transaction, which will cause it to fail when
it tries to do some further operations.

Fixes #25680

Release note (bug fix): Fixed a bug causing index creation to fail under
rare circumstances.


Co-authored-by: Andrei Matei <andrei@cockroachlabs.com>
Commits on Aug 30, 2018
3 authors
Merge #29296
29296: backport-2.0: Fix test flakes caused by automatic lease renewal r=a-robinson a=a-robinson

Backport:
  * 1/1 commits from "storage: Adding testing knob to disable automatic lease renewals" (#25625)
  * 1/1 commits from "storage: Fix flaky TestReplicaNotLeaseHolderError" (#25676)
  * 1/1 commits from "storage: Deflake TestReplicaLease" (#25781)

Please see individual PRs for details.

/cc @cockroachdb/release

Fixes #29189. The flakiness was introduced to the 2.0 branch by #28931.

Co-authored-by: Alex Robinson <alexdwanerobinson@gmail.com>
Co-authored-by: Ben Darnell <ben@cockroachlabs.com>
Commits on Aug 31, 2018
storage: improve logging of queue errors
Fixes #25191.

Release note: None
Commits on Sep 02, 2018
craig[bot] and nvanbenschoten
Merge #29440
29440: backport 2.0: storage: improve logging of queue errors r=andreimatei a=andreimatei

Backport of one commit from #25245, because seeing these errors in a 2.0 cluster would have really helped me.

Fixes #25191.

Release note: None


Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
Commits on Sep 06, 2018
knz
cli/zone: make the client cross-compatible with 2.1
Release note (cli change): The command `cockroach zone` is upgraded to
become compatible with CockroachDB 2.1; note however that `cockroach
zone` also becomes *deprecated* in CockroachDB 2.1 in favor of SQL
ALTER statements to update zone configurations.
storage: optimistically resolve pending prereqs in cmdQ before canceling
In a privately reported issue we have seen cases where cascading
cancellations of commands in the CommandQueue can cause an OOM
in `storage.(*cmd).ResolvePendingPrereq`. This is caused by a quadratic
blowup of prerequisites as canceled commands transfer their prerequisites
to their dependents.

This change introduces a measure to bound this quadratic blowup by
adding in a new optimistic resolution step to command cancellation,
during which time a command will optimistically resolve all prerequisites
that have already finished without blocking on any pending commands. This
should improve the cascading cancellation scenario because it will
allow a command to trim its prerequisite list before passing it on
to its dependents. Notably, since we consider all commands in the
prereq list instead of just the first (as we usually do), a single
pending command at the front of the list can't allow the rest of the
prereq list to grow full of finished/canceled commands.
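A minimal sketch of the optimistic step, with invented types standing in
for the real `storage.cmd`:

    package cmdqsketch

    type cmd struct {
        done    chan struct{} // closed when the command finishes
        prereqs []*cmd
    }

    // finished reports completion without blocking.
    func (c *cmd) finished() bool {
        select {
        case <-c.done:
            return true
        default:
            return false
        }
    }

    // optimisticallyResolvePrereqs trims already-finished commands from the
    // prereq list in place, so a canceled command hands a shorter list on
    // to its dependents instead of the full accumulated one.
    func (c *cmd) optimisticallyResolvePrereqs() {
        trimmed := c.prereqs[:0]
        for _, pre := range c.prereqs {
            if !pre.finished() {
                trimmed = append(trimmed, pre)
            }
        }
        c.prereqs = trimmed
    }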

Release note: None
storage: avoid migrating duplicate prereq commands in cmdQ
We now avoid transferring duplicate prereq commands from
a canceled prereq to a dependent command in the CommandQueue.
This avoids quadratic growth of memory when a large number
of commands are canceled which each share a command
prerequisite. Previously, this could result in a memory
explosion if it happened repeatedly.
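Continuing the invented `cmd` type from the sketch above, the dedup during
migration might look like the following (again only a sketch, not the
actual CommandQueue code):

    // transferPrereqs hands a canceled command's prereqs to a dependent,
    // skipping anything the dependent already waits on, so shared
    // prerequisites are not duplicated across repeated cancellations.
    func transferPrereqs(dependent, canceled *cmd) {
        seen := make(map[*cmd]struct{}, len(dependent.prereqs))
        for _, pre := range dependent.prereqs {
            seen[pre] = struct{}{}
        }
        for _, pre := range canceled.prereqs {
            if _, ok := seen[pre]; !ok {
                dependent.prereqs = append(dependent.prereqs, pre)
                seen[pre] = struct{}{}
            }
        }
    }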

The commit also adds a test that demonstrates the effect of this change
and of the previous one, which optimistically resolves pending prereqs
in the cmdQ before canceling.

Release note: None
storage: don't keep track of empty remote proposals
I observed a very slow memory leak on a long-running cluster
that pointed to a number of replicas' `remoteProposals` maps.
After reproducing locally I found that some forwarded proposals
were not being cleaned up. The reason for this was that we don't
treat empty proposals (those generated to wake up a quiesced leader)
the same as normal proposals during application time. This meant
that we were never clearing their command IDs from the remote
proposals map. This commit fixes that issue.
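A minimal sketch of the fix, with invented names: the command ID is now
cleared for every applied entry, including the empty ones.

    package proposalsketch

    // applyEntry clears the tracking map entry for empty and normal
    // proposals alike; previously the empty (wake-up) proposals were
    // skipped and their IDs leaked.
    func applyEntry(remoteProposals map[string]struct{}, cmdID string, data []byte) {
        delete(remoteProposals, cmdID) // runs before the early return below
        if len(data) == 0 {
            return // a wake-up proposal carries no command to apply
        }
        // ... apply the command itself ...
    }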

It will need to be backported to 2.0 and 2.1.

Release note: None
craig[bot] and nvanbenschoten
Merge #29654 #29656
29654: release-2.0: storage: optimistically resolve pending prereqs in cmdQ before canceling r=nvanbenschoten a=nvanbenschoten

Backport 2/2 commits from #27117.

/cc @cockroachdb/release

---

In a privately reported issue we have seen cases where cascading
cancellations of commands in the CommandQueue can cause an OOM
in `storage.(*cmd).ResolvePendingPrereq`. This is caused by a quadratic
blowup of prerequisites as canceled commands transfer their prerequisites
to their dependents.

This change introduces a measure to bound this quadratic blowup by
adding in a new optimistic resolution step to command cancellation,
during which time a command will optimistically resolve all prerequisites
that have already finished without blocking on any pending commands. This
should improve the cascading cancellation scenario because it will
allow a command to trim its prerequisite list before passing it on
to its dependents. Notably, since we consider all commands in the
prereq list instead of just the first (as we usually do), a single
pending command at the front of the list can't allow the rest of the
prereq list to grow full of finished/canceled commands.

Release note: None


29656: release-2.0: storage: don't keep track of empty remote proposals r=nvanbenschoten a=nvanbenschoten

Backport 1/1 commits from #29618.

/cc @cockroachdb/release

---

I observed a very slow memory leak on a long-running cluster
that pointed to a number of replicas' `remoteProposals` maps.
After reproducing locally I found that some forwarded proposals
were not being cleaned up. The reason for this was that we don't
treat empty proposals (those generated to wake up a quiesced leader)
the same as normal proposals during application time. This meant
that we were never clearing their command IDs from the remote
proposals map. This commit fixes that issue.

It will need to be backported to 2.0 and 2.1.

Release note: None


Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
Commits on Sep 07, 2018
craig[bot] and benesch
Merge #27289
27289: release-2.0: build: speed up nightly stress jobs r=benesch a=benesch

Backport 1/2 commits from #27284.

/cc @cockroachdb/release

---

As discussed on Slack. See individual commit messages for details.


Co-authored-by: Nikhil Benesch <nikhil.benesch@gmail.com>
craig[bot] and knz
Merge #29632
29632: release-2.0: cli/zone: make the client cross-compatible with 2.1 r=knz a=knz

Accompanies #28612.

Release note (cli change): The command `cockroach zone` is upgraded to
become compatible with CockroachDB 2.1; note however that `cockroach
zone` also becomes *deprecated* in CockroachDB 2.1 in favor of SQL
ALTER statements to update zone configurations.

Co-authored-by: Raphael 'kena' Poss <knz@cockroachlabs.com>
knz
githook: also produce the commit suggestions with --amend
Prior to this patch the commit message suggestions and the
default release note ("none") were only produced for new commits.

This patch extends it to also produce the suggestions for amends and
merges, and to always add the release note when not present.

Release note: None
craig[bot] and knz
Merge #29816
29816: release-2.0: githook: also produce the commit suggestions with --amend r=knz a=knz

Backport 1/1 commits from #29812.

/cc @cockroachdb/release

---


Co-authored-by: Raphael 'kena' Poss <knz@cockroachlabs.com>
Commits on Sep 08, 2018
knz
githooks: fix the commit message pre-fill
This should not run when the git command is non-interactive,
because in that case git will not filter comments in the git
message automatically.

Release note: None
craig[bot] and knz
Merge #29976
29976: release-2.0: githooks: fix the commit message pre-fill r=knz a=knz

Backport 1/1 commits from #29973.

/cc @cockroachdb/release

---


Co-authored-by: Raphael 'kena' Poss <knz@cockroachlabs.com>
Commits on Sep 10, 2018
storage: Up-replicate before removing replicas from dead stores
Fixes #25392, by preventing the following series of events:

1. Node x dies
2. We remove node x's replica of some range
3. Node y dies before we up-replicate, leaving the range unavailable (1
   out of 2 replicas dead)
4. Node x comes back online. It can't help the situation, because its
   replica was officially removed from the range.

Instead, we now up-replicate first before removing node x's replica.

Release note: None
gossip: don't allow loopback infos
When adding infos from a remote node, skip any info which originated on
the local node.

Release note: None
gossip: allow receipt of "loopback infos"
Receipt of loopback infos was disabled in #29398, but doing so had the
unfortunate effect of allowing gossip state to temporarily diverge
between nodes. Rather than disallowing loopback infos, we now ratchet
the gossip monotonic clock in order to avoid the assertion around the
gossip highwater timestamps.
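A minimal sketch of the ratcheting idea, with invented names: before
merging a remote delta, the local monotonic gossip clock is pushed
forward past every remote highwater timestamp.

    package gossipsketch

    // ratchetClock advances the local monotonic clock so that any
    // re-received local infos get fresh timestamps and cannot trip the
    // highwater assertion.
    func ratchetClock(localClock *int64, remoteHighWater map[int32]int64) {
        for _, hw := range remoteHighWater {
            if hw > *localClock {
                *localClock = hw
            }
        }
    }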

Fixes #29992
Fixes #20986

Release note: None
craig[bot] and a-robinson
Merge #30005
30005: backport-2.0: storage: Up-replicate before removing replicas from dead stores r=a-robinson a=a-robinson

Backport 1/1 commits from #28875.

/cc @cockroachdb/release

---

Fixes #25392, by preventing the following series of events:

1. Node x dies
2. We remove node x's replica of some range
3. Node y dies before we up-replicate, leaving the range unavailable (1
   out of 2 replicas dead)
4. Node x comes back online. It can't help the situation, because its
   replica was officially removed from the range.

Instead, we now up-replicate first before removing node x's replica.

Release note: None


Co-authored-by: Alex Robinson <alexdwanerobinson@gmail.com>
Commits on Sep 11, 2018
Merge pull request #30017 from petermattis/backport2.0-29398-30008
release-2.0: fix gossip tight loopback connection issue and "impossible" assertion
Commits on Sep 12, 2018
sqlbase: don't cascade through NULL values
Prior to this fix, we would cascade updates or deletes through NULLs. See #28896
for a great test case.

This did, however, remind me that we use MATCH FULL instead of MATCH SIMPLE and
that we should add support for MATCH SIMPLE (and make it the default). See #20305.

Closes #28896

Release note (bug fix): ON DELETE/ON UPDATE CASCADE should not cascade through
NULLs.
sqlbase: fix composite fk column checking panic
Prior to this patch, it was possible to skip the check that ensures that all
columns are present in a composite fk if only the final column was present,
thus causing a panic.

Fixes #26748

Release note (bug fix): Fixed a panic that occurred when not all values were
present in a composite foreign key.
Commits on Sep 13, 2018
Merge pull request #30154 from BramGruneir/backport2.0-30144
release-2.0: sqlbase: fix composite fk column checking panic
knz
sql/parser: fix the action for empty rules
A yacc-generated parser is a giant switch in a loop and the data
structure is a stack implemented as indexes into an array. Each
occurrence of `$$.val` is really `thestack[curpos].val`. Unless one
nils things out in the rule action the previous value remains.

(That's true of all yacc generators that don't provide destructors for
values. GNU Bison and Lemon provide destructors, for example; goyacc
and old Bison don't.)
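A small Go illustration of the failure mode (this is not goyacc output,
just the stack-slot reuse principle):

    package yaccsketch

    type symValue struct{ val interface{} }

    // demo shows why an empty rule's action must reset its slot: the value
    // stack is a reused array, so a slot holds whatever the last reduction
    // at that position left behind.
    func demo() (stale, fixed interface{}) {
        stack := make([]symValue, 4)
        sp := 0

        // An earlier reduction writes into the slot.
        stack[sp].val = "WITH options from a previous reduction"

        // An empty rule later reduces at the same position; with no action,
        // the stale value above silently becomes the rule's result.
        stale = stack[sp].val

        // The fix: the empty rule's action explicitly nils the slot out.
        stack[sp].val = nil
        fixed = stack[sp].val
        return stale, fixed
    }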

Release note (bug fix): the `WITH` operand of
import/export/backup/restore was sometimes processed improperly. This
has been corrected.

Release note (bug fix): common table expressions with `WITH` in larger
queries were sometimes processed improperly. This has been corrected.
Merge pull request #30129 from BramGruneir/backport2.0-30023
release-2.0: sqlbase: don't cascade through NULL values
knz
Merge pull request #30199 from knz/backport2.0-30197
release-2.0: sql/parser: fix the action for empty rules
Commits on Sep 17, 2018
kv: check transaction status before recording refresh spans
Fixes #29253.

We previously accounted for refresh spans before checking the
transaction's status. This meant that we would attempt to add new
refresh spans even if the txn had already been committed. If this
attempt failed because the `max_refresh_spans_bytes` threshold was
exceeded, we would erroneously return an error to the client even
though their transaction had already been committed. The simple
fix for this is to only record refresh spans for transactions that
are still PENDING. The new test case would have failed with the
error observed in #29253 without this fix.
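A minimal sketch of the ordering fix, with invented names standing in for
the kv client internals:

    package refreshsketch

    import "errors"

    var errTooManyRefreshSpans = errors.New("exceeds max_refresh_spans_bytes")

    type txnStatus int

    const (
        pending txnStatus = iota
        committed
        aborted
    )

    // recordRefreshSpans checks the status first, so a committed txn
    // records nothing and can no longer fail on the byte limit.
    func recordRefreshSpans(status txnStatus, used, limit, newBytes int64) error {
        if status != pending {
            return nil // committed/aborted txns have nothing left to refresh
        }
        if used+newBytes > limit {
            return errTooManyRefreshSpans
        }
        return nil
    }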

This raises an auxiliary question about whether we should be asserting
that a transaction's status is not COMMITTED if we ever return an
error to the client. cc. @andreimatei.

On the positive side, this should be a minor perf win, as we'll
avoid recording refresh spans on the final batch of every txn.

A similar fix will need to be backported to 2.0. The backport
won't be clean though because of the new txnInterceptor structure.
However, the test shouldn't need to change.

Release note (bug fix): Don't return transaction size limit errors
for transactions that have already committed.
Merge pull request #30309 from nvanbenschoten/backport2.0-29685
release-2.0: kv: check transaction status before recording refresh spans
Commits on Sep 18, 2018
distsql: close merge joiner on error
In two cases, the merge joiner was not calling `close()` after
encountering an error. This could lead to it being endlessly `Next`ed by
its consumer.
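A minimal sketch of the fix, with invented types: every error path moves
the processor to a closed state before returning.

    package joinersketch

    type row []string

    type source interface {
        next() (row, error)
    }

    type mergeJoiner struct {
        input  source
        closed bool
    }

    func (m *mergeJoiner) close() { m.closed = true }

    // nextRow closes the processor on error, so a consumer that keeps
    // calling Next sees a terminal state instead of spinning forever.
    func (m *mergeJoiner) nextRow() (row, error) {
        if m.closed {
            return nil, nil // terminal: nothing more to produce
        }
        r, err := m.input.next()
        if err != nil {
            m.close() // previously missing in two error cases
            return nil, err
        }
        return r, nil
    }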

Fixes #30363

Release note (bug fix): Fix potential infinite loop when the merge
joiner encountered an error or cancellation.
Merge pull request #30380 from solongordon/close-merge-joiner
release-2.0: distsql: close merge joiner on error
Commits on Sep 19, 2018
ui: fix default jobs page sort
In #18137 a column was added, but the default sort
(which is specified by index) was not updated.  This
updates it to correctly sort on creation time.

Fixes: #30418
Release note (admin ui change): Fixes an issue where
the default sort on the Jobs page was incorrectly on
the User column rather than the Creation Time column.
Merge pull request #30429 from couchand/backport2.0-30423
release-2.0: ui: fix default jobs page sort
Commits on Sep 25, 2018
backport-2.0: don't drain on decommissioning
This is somehow the salient part of #28707 that I managed to miss
during the initial backport of that PR: we don't want to set the
node to draining when it is being decommissioned.

Note that the /health?ready=1 endpoint will continue to return a
503 error once decommissioning starts due to a readiness check
(something I am preparing a PR against master for).

Release note: None
backport-2.0: storage: fix use of context with closed trace
Backport 1/1 commits from #25581

---

Fixes #25575

Release note: None
craig[bot] and a-robinson
Merge #30614
30614: backport-2.0: storage: fix use of context with closed trace r=tschottdorf a=mvijaykarthik

Backport 1/1 commits from #25581

---

Fixes #25575

Release note: None

Co-authored-by: Alex Robinson <alexdwanerobinson@gmail.com>
Merge pull request #30351 from tschottdorf/backport2.0-28707-taketwo
backport-2.0: don't drain on decommissioning
Commits on Oct 01, 2018
rpc,server: authenticate all gRPC methods
Previously only the roachpb.Batch RPC was correctly checking for an
authenticated user. All other RPCs were open to the public, even when
the server was running in secure mode.

To prevent future accidents of this kind, hoist the authentication check
to a gRPC interceptor that is guaranteed to run before all RPCs.
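A minimal sketch of the interceptor shape using grpc-go's public API;
`requireAuth` is an assumed stand-in for the real certificate/user check,
and streaming RPCs need the analogous `grpc.StreamInterceptor`.

    package rpcsketch

    import (
        "context"
        "errors"

        "google.golang.org/grpc"
    )

    // requireAuth stands in for the real authentication check.
    func requireAuth(ctx context.Context) error {
        return errors.New("unauthenticated") // placeholder logic
    }

    // authUnaryInterceptor runs before every unary RPC handler, so no
    // individual method can forget to check for an authenticated user.
    func authUnaryInterceptor(
        ctx context.Context,
        req interface{},
        info *grpc.UnaryServerInfo,
        handler grpc.UnaryHandler,
    ) (interface{}, error) {
        if err := requireAuth(ctx); err != nil {
            return nil, err
        }
        return handler(ctx, req)
    }

    // Wiring: grpc.NewServer(grpc.UnaryInterceptor(authUnaryInterceptor))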

Release note (bug fix): A security vulnerability in which data could be
leaked from or tampered with in a cluster in secure mode has been fixed.

Release note: None
Merge pull request #30823 from benesch/secure-grpc-2.0
release-2.0: rpc,server: authenticate all gRPC methods
Commits on Oct 05, 2018
storage: only track forwarded proposals that were successfully appended
Fixes #30064.

Before this change, it was possible for a proposal that was forwarded to
a leader to be dropped instead of appended to its log. This could happen
if the leader had a `leadTransferee` set. This change avoids the issue
by verifying that the proposal was successfully added to the leader's log
before tracking it.

This will need to be backported to 2.1 and 2.0.

Release note: None
Merge pull request #31023 from nvanbenschoten/backport2.0-30990
release-2.0: storage: only track forwarded proposals that were successfully appended
Commits on Oct 12, 2018
knz
deps: bump go-libedit
Release note (build change): CockroachDB can now be built from source
on macOS 10.14 (Mojave).
Commits on Oct 13, 2018
knz
Merge pull request #31310 from knz/20181012-bump-libedit3
release-2.0: deps: bump go-libedit
Commits on Oct 15, 2018
kv: try next replica on RangeNotFoundError
Previously, if a Batch RPC came back with a RangeNotFoundError,
we would immediately stop trying to send to more replicas, evict the
range descriptor, and start a new attempt after a back-off.

This new attempt could end up using the same replica, so if the
RangeNotFoundError persisted for some amount of time, so would the
unsuccessful retries for requests to it, as DistSender doesn't
aggressively shuffle the replicas.

It turns out that there are such situations, and the
election-after-restart roachtest spuriously hit one of them:

1. new replica receives a preemptive snapshot and the ConfChange
2. cluster restarts
3. now the new replica is in this state until the range wakes
   up, which may not happen for some time.
4. the first request to the range runs into the above problem

@nvanbenschoten: I think there is an issue to be filed about the
tendency of DistSender to get stuck in unfortunate configurations.
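A minimal sketch of the new retry shape, with invented names: a
RangeNotFoundError from one replica no longer aborts the whole attempt
and evicts the descriptor; the sender simply tries the next replica.

    package sendsketch

    type replica struct{ nodeID int32 }

    type rangeNotFoundError struct{}

    func (rangeNotFoundError) Error() string { return "range not found" }

    // sendToReplicas walks the replicas in order, skipping past any that
    // report the range as not found instead of giving up immediately.
    func sendToReplicas(replicas []replica, send func(replica) error) error {
        var lastErr error
        for _, r := range replicas {
            err := send(r)
            if err == nil {
                return nil
            }
            if _, ok := err.(rangeNotFoundError); ok {
                lastErr = err
                continue // previously: stop, evict the descriptor, back off
            }
            return err // other errors keep their old handling
        }
        return lastErr
    }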

Fixes #30613.

Release note (bug fix): Avoid repeatedly trying a replica that was found
to be in the process of being added.
storage: return one entry less in Entries
This works around the bug outlined in:

etcd-io/etcd#10063

by matching Raft's internal implementation of commit pagination.
Once the above PR lands, we can revert this commit (but I assume
that it will take a little bit), and I think we should do that
because the code hasn't gotten any nicer to look at.
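The shape of the workaround, heavily simplified with invented names (the
real rule mirrors raft's own size-limited pagination):

    package entriessketch

    type entry struct{ data []byte }

    // paginate stops one entry earlier than before, so that a byte-limited
    // page matches what raft's commit pagination would produce; at least
    // one entry is always returned.
    func paginate(entries []entry, maxBytes uint64) []entry {
        var size uint64
        for i, e := range entries {
            size += uint64(len(e.data))
            if size > maxBytes {
                if i == 0 {
                    return entries[:1] // never return an empty page
                }
                return entries[:i] // previously entries[:i+1]: one more entry
            }
        }
        return entries
    }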

Fixes #28918.

Release note: None
Merge pull request #31351 from tschottdorf/backport2.0-29579
backport-2.0: storage: return one entry less in Entries
Merge pull request #31350 from tschottdorf/backport2.1-31013
backport-2.0: kv: try next replica on RangeNotFoundError
Commits on Oct 16, 2018
c-deps: bump CryptoPP to avoid SIGTRAP on macOS
Bump CryptoPP to pick up a fix for #31380.
Details reproduced below.

Fix #31380.

---

As part of its CPU feature detection, CryptoPP installs a SIGILL signal
handler before issuing the cpuid instruction. The intent is to
gracefully degrade on CPUs that don't support the cpuid instruction.

The problem is that it is impossible to safely overwrite a signal
handler installed by the Go runtime in go1.10 on macOS
(golang/go#22805). This causes CockroachDB 2.0 to crash on macOS Mojave:
#31380.

The situation has improved on the Go front, as go1.11 makes it possible
to safely save and restore signal handlers installed by the Go runtime
on macOS.

Still, we can do better and support go1.10. There is no need to bother
installing a SIGILL handler, as the cpuid instruction is supported by
every x86-64 CPU. We can instead use conditional compilation to make
sure that we never execute a cpuid instruction on a non x86-64 CPU.

Note that CPU feature detection is performed at executable load time
(see the attribute(constructor) on DetectX86Features); therefore any
reference to a function which calls DetectX86Features (notably HasAESNI)
corrupts the signal handler. It's not entirely clear why this corruption
later leads to the SIGTRAP seen in #31380--is
something in macOS or the Go runtime generating a SIGILL and trying to
handle it gracefully?--but regardless, not mucking with the signal
handler fixes the issue.

Release note (bug fix): CockroachDB no longer crashes due to a SIGTRAP error
soon after startup on macOS Mojave (#31380).

diff --git a/c-deps/cryptopp b/c-deps/cryptopp
index c621ce0532..6d6064445d 160000
--- a/c-deps/cryptopp
+++ b/c-deps/cryptopp
@@ -1 +1 @@
-Subproject commit c621ce053298fafc1e59191079c33acd76045c26
+Subproject commit 6d6064445ded787c7d6bf0a021fb718fddb63345
Merge pull request #31522 from benesch/backport2.0-31516
release-2.0: c-deps: bump CryptoPP to avoid SIGTRAP on macOS
Showing 800 changed files with 46,329 additions and 15,690 deletions.
@@ -0,0 +1,13 @@
+status = [
+  "GitHub CI (Cockroach)"
+]
+pr_status = [
+  "license/cla"
+]
+block_labels = [
+  "do-not-merge"
+]
+[committer]
+name = "craig[bot]"
+email = "bors@cockroachlabs.com"
@@ -33,7 +33,7 @@ new engineers.
   work due to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=48891
 - The standard C/C++ development headers on your system.
 - On GNU/Linux, the terminfo development libraries, which may be
-  part of a ncurses development package (e.g. `libtinfo-dev` on
+  part of a ncurses development package (e.g. `libncurses-dev` on
   Debian/Ubuntu, but `ncurses-devel` on CentOS).
 - A Go environment with a recent 64-bit version of the toolchain. Note that
   the Makefile enforces the specific version required, as it is updated