Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

ALL simple issue fixes #1906

Closed
wants to merge 65 commits into
base: unstable
from

Conversation

Projects
None yet
@mattsta
Copy link
Contributor

mattsta commented Aug 2, 2014

馃懡 DON'T PANIC 馃懡

This is every simple, non-controversial, single-issue fix I found in the issue/PR backlog.

Yes, this branch has 54 57 63 commits that will auto-close over 50 issues when merged (#463, #513, #537, #811, #857, #858, #878, #952, #1188, #1187, #997, #1092, #1105, #1129, #1161, #1186, #1191, #1300, #1327, #1376, #1428, #1450, #1456, #1543, #1519, #1561, #1597, #1610, #1614, #1615, #1631, #1637, #1660, #1681, #1696, #1512, #1728, #1741, #1602, #1741, #1774, #1801, #1842, #1843, #1847, #1844, #1883, #1645, #1647, #1900, #760, #566, #1313, #521, #1649, #1912, #1909, #1914, #1097, #1915)

Yes, this branch has 31 32 34 authors (HI COMMUNITY! We haven't forgotten you!).

Yes, I manually reviewed, edited, and/or fixed every commit here.

For quick browsing of all the changes, use this diff view: https://github.com/antirez/redis/pull/1906/files

Everything works, all tests pass (https://travis-ci.org/mattsta/redis/builds/31819638), and this Branch of Awesome Correctness is ready for merging into Actual Redis.

Notable changes include:

  • increased maximum sds size to 4 GB from 2 GB (just an int to unsigned int replacement; everything still works)
  • if Redis is in the foreground and you Ctrl-C, it will now SHUTDOWN cleanly instead of just killing the server.
  • Sentinel's INFO command now accepts "all" to act like the Redis INFO command.
  • adds -a option to redis-benchmark to allow auth when testing
  • stops Redis tests from polluting background colors if you have a non-black-background terminal
  • many code clarifications and movement towards being more correct everywhere possible
  • if you request a PID file, it will now always be written instead of only showing up in daemon mode.
  • many more technical fixes, tiny refactorings, and logic improvements throughout the code

mattsta and others added some commits Aug 1, 2014

redis-cli: Re-attach selected DB after auth
Previously, if you did SELECT then AUTH, redis-cli
would show your SELECT'd db even though it didn't
happen.

Note: running into this situation is a (hopefully) very limited
used case of people using multiple DBs and AUTH all at the same
time.

Fixes #1639
redis-cli: stop showing incorrectly selected DB
Previously redis-cli would happily show "-1" or "99999"
as valid DB choices.

Now, if the SELECT call returned an error, we don't update
the DB number in the CLI.

Inspired by @anupshendkar in #1313

Fixes #566, #1313
redis-cli: Add --no-raw option
Some people need formatted output even when they have no
interactive tty.

Fixes #760
redis-cli: fix latency result output
(Cleaned up a little by @mattsta)

Closes #1774
cluster: fix node connection memory leak
Cluster leaks memory while connecting due to missing freeaddrinfo()

Closes #1801
scripting: no eval with negative key count
Negative key count causes segfault in Lua functions.

Fixes #1842
Closes #1843
Handle large getrange requests
Previously the end was casted to a smaller type
which resulted in a wrong check and failed
with values larger than handled by unsigned.

Closes #1847, #1844
Fix key extraction for SORT
We only want to use the last STORE key, but we have to record
we actually found a STORE key so we can increment the final return
key count.

Test added to prevent further regression.

Closes #1883, #1645, #1647
Fix intset midpoint selection
The classic (min+max)/2 is provably unsafe.  Fixed
as recommended in research:
http://googleresearch.blogspot.com/2006/06/extra-extra-read-all-about-it-nearly.html

Fix inspired by @wjin, but I used a different approach.
(later, I found @kuebler fixed the same issue too).

Fixes #1741, #1602
Factor out duplicate signal handling code
No options are set for SIGTERM, so we can use the simpler
signal handling interface and remove unnecessary structure initilization.

sigemptyset() is moved outside of the ifdef to prevent any potential
unused variable warning.

Closes #1637
redis-check-dump: use names instead of magic nums
Use constants to avoid magic numbers in `types`, which is an array
that stores the names of `REDIS` types.

Closes #1681
Change unixsocketperm comment to 700 from 755
According to unix manuals, "Connecting to the socket object requires
read/write permission." -- mode 755 is useless for anybody
other than the owner.

Fixes #1696
install_server.sh: add missing bang
This was discovered by _bodya and reported in the IRC channel.
Everything worked fine as these scripts are always executed as shell
scripts.

Closes #1728
Fix assert technical correctness
dictAdd returns DICT_OK, not REDIS_OK. They both
have the same underlying values, so it works even though
the code is technically wrong.

Fixes #1512
Use 'void' for zero-argument functions
According to the C standard,
it is desirable to give the type 'void'
to functions have no argument.

Closes #1631
scripting: Add error handler only when needed
Cleans up the flow of code to read much easier too.

Closes #1615
Improve accuracy of HAVE_ATOMIC dependency check
[I had to split out the clang check due to
 clang *really* not liking the __GLIBC_PREREQ macro; -matt]

Closes #1456
Avoid slave loading RDB on startup
The replica instance will immediately sync with a master anyway,
so we don't need to load anything from disk.

If we add support for PSYNC after instance restart, we will
need different logic here anyway.

Closes #1543
pubsub: Return integers for NUMSUB, not strings
Also adds test for numsub 鈥斅燿ue to tcl being tcl,
it doesn't capture the "numberness" of the fix,
but now we at least have one test case for numsub.

Closes #1561
Reject MOVE to non-integer DBs
Previously, "MOVE key somestring" would move the key to
DB 0 which is just unexpected and wrong.
String as DB == error.

Test added too.

Fixes #1428
Fix potential error when generating INFO string
Clang (scan-build) flags this as an error, but it can't
happen under normal usage because the input is verified
in caller function.  Clang doesn't trust the caller
to always pass correct input though, so it gets registered
as an error in the report.

It's better to remove the possibility of a crash, even if
only technically.  It also makes our error report cleaner
since we know this isn't a problem.

mattsta added some commits Aug 6, 2014

Sentinel: Improve INFO command behavior
Improvements:
  - Return error if asking for bad section (INFO foo)
  - Fix potential memory leak (caused by sdsempty() then returned if >2 args)
  - Clean up argument parsing
  - Allow "all" as valid section (same as "default" or zero args currently)
  - Move strcasecmp to end of evaluation chain in conditionals

Also, since we're C99, I moved some variable declarations to be closer
to where they are actually used (saves us from needing to free an empty info
if detect argument errors up front).

Closes #1915
Deny CLIENT command in scripts
We don't want scripts doing CLIENT SETNAME
or CLIENT KILL or CLIENT LIST or CLIENT PAUSE.

Originally reported by Chris Wj then proper
action inspired by Itamar Haber.

Reference: https://groups.google.com/forum/#!topic/redis-db/09B2EYwyVgk
@SweeD

This comment has been minimized.

Copy link

SweeD commented Aug 7, 2014

You're a crazy crazy man... 鉂わ笍

@antirez

This comment has been minimized.

Copy link
Owner

antirez commented Aug 7, 2014

This is pretty impressive, thanks :-) I'll likely not be able to merge every single commit as I already spotted a few things I would do differently, so this is what I'll do: merge as much stuff as possible and rewrite all the obvious stuff in a branch called 1906-merge, and write here the log of what I'm doing and all the commits I can't merge in any way and why / what is needed to move forward. This way at least we'll have the code base updated with as much as code as possible from all the fixes here, instead of waiting for every single commit in this PR to be ok.

@badboy

This comment has been minimized.

Copy link
Contributor

badboy commented Aug 7, 2014

Sounds good to me. Looking forward to getting as much as possible merged.

@antirez

This comment has been minimized.

Copy link
Owner

antirez commented Aug 7, 2014

(Note: this comment will be updated incrementally)

  • Currently: merging stuff into 1906-merge branch.
  • Later: the branch will be merged into unstable and 3.0.
  • About 2.8: I'll cherry pick everything is safe and does not break API compatibility.

Commits merged

  • a1b42b3 redis-cli: Re-attach selected DB after auth
  • 19b5997 redis-cli: stop showing incorrectly selected DB
  • f2252d1 redis-cli: Add --no-raw option
  • 3591fbc Add support for compiling on AIX. (Changes: I added another commit with a comment explaining why we undef hz).
  • 46f90e4 Remove redundant else/return block
  • a159b1f redis-cli: fix latency result output. (Changes: output minimally modified to avoid nested parens).
  • 0efac15 cluster: fix node connection memory leak. (Changes: the freeaddrinfo() call was misplaced so actually the memory leak was still there if no address was bound. Fixed moving the call just after the for loop).
  • 7da1992 scripting: no eval with negative key count
  • f62cb0e Handle large getrange requests
  • 4b54b1e Fix key extraction for SORT
  • 4aa3300 Fix intset midpoint selection
  • 734a943 Add graceful exit when Ctrl-C is received (Changes: I added a commit that forces the server to exit if a second SIGINT is received, and exists ASAP without saving if the server is still loading data from disk at startup).
  • 534356d Fix issues raised by clang analyzer (Changes: the fix to genInfoString was odd, I assumed that clang static analyzer complained about a NULL pointer and modified the fix accordingly).
  • 0ad4417 redis-check-dump: use names instead of magic nums
  • c200a67 Change unixsocketperm comment to 700 from 755
  • 47cfac9 install_server.sh: add missing bang
  • 008d4da Fix assert technical correctness
  • 4b99ecb Use 'void' for zero-argument functions
  • 28e6d66 src/Makefile: Reword "to run make test" message
  • ef693f1 memtest: Add missing free()
  • 57de6c3 Remove redundant event loop fd processing
  • 0e90b76 Improve accuracy of HAVE_ATOMIC dependency check
  • 028f857 Avoid unnecessary decoding in ziplist.c (Note: reviewed the code yet another time given the fact it's so critical. zipEntry() itself uses the same macro so we should be totally safe by replacing zipEntry() with the macro directly).
  • b1d224a pubsub: Return integers for NUMSUB, not strings. (Note: technically this is an API breakage, but, it is very unlikely somebody will run into it because the command is mostly a debugging stuff. However we may still avoid to back port this into 2.8).
  • 289dc21 Reject MOVE to non-integer DBs. (Changes: modified to use getLongLongFromObject() API).
  • 18fac3 Remove unused LINE_BUFLEN definition
  • 5434c99 redis-check-dump: Prevent segfault if can't malloc
  • a7a907 Free memory in clusterLoadConfig error handler
  • 513d0fc Cleanup double semicolons
  • d1edcae Refactor cluster flag printing. (Changes: minimally modified and moved into right section of cluster.c).
  • 00ff893 Define AR to help with lua cross-compilation. (Note: I don't know cross platform toolchain enough to really have an idea about this patch, but since it is unlikely to cause any issue, I'm merging it).
  • 48645ee Use resolv library in Solaris.
  • fea5c33 Remove duplicate prototypes in redis.h.
  • c3db4f7 va_copy must be matched by va_end.
  • 18ca831 Remove unused function
  • 100c331 Extend range of bytesToHuman to TB and PB
  • 0a98b21 Add error check for writing RDB checksum
  • 233d24a redis-cli: fix prompt after shutdown command
  • d786fb6 Clarify argument to dict macro
  • 6a7a32a Clarify existing slot wording on cluster start
  • edca2b1 Remove warnings and improve integer sign correctness.
  • 7e3ccdd Add -W to compilation flags.
  • 68db7b1 Use unsigned integers in SDS header.
  • 2a0c018 was simplified using c->cmd->name.

Note no longer logging the commits merged since it's simpler to check the 1906-merge branch, in order to speedup the processing.

Commits that cannot be merged

  • 1782d04 can't be merged since Redis semantics is explicitly to forbid comments in the same lines of config directives in order for the config rewrite process to be easily able to retain comments (that are guaranteed to be in different lines) in the rewritten file using simple logic.
  • 1646df6 adds a new call in the likely code path in order to remove two calls in the unlikely (error) code path. The added call, lua_insert, needs to shift elements in the stack, so it performs a (short in this case) for loop. To accept such a patch a profiling activity should be done in order to ensure no speed regression: not worth it IMHO.
  • bb8ce05 is totally broken: It breaks Cluster / Sentinel failovers after a slave restart and the config directive to serve stale data if we can't connect with the master. In general a slave should do everything is its power to serve stuff after a restart. If it will later be able to connect with the master to refresh its view of the data, fine, otherwise what is authoritative is the latest data it was able to obtain.
  • 4cb3991 because testing PUBLISH per-se is like testing PING. Publishing to Pub/Sub channels has a complexity that depends on the number of listeners / patterns. A proper Pub/Sub test simulates how the speed changes while the number of subscribers change. Moreover to do with redis-benchmark what the commit does is as simple as: redis-benchmark PUBLISH foo bar.
  • 34d0362 broken patch, for example command getkeys mset a 1 b will allocate 6 bytes for two 4 bytes integers. Moreover the commit is pointless because when there is a first/last/step key discovery pattern, you can only allocate like 2x, for example with MSET, you never have pathological cases like SADD key ... a very large number of elements ... where you allocate a lot of memory with just a key. Not worth to make the code more complex IMHO.
  • bd4d5ed the useless check is wanted as a gentle reminder that we need to check this stuff if we add a new option like TYPE. Basically this way all the stanzas are the same instead of making the first one an exception.

Commits that need to be re-checked

  • 2af2c63 Not merging because the signal() interface is not exactly consistent, at least historically, so it's better to be explicit about what are the server signal flags. Update: @mattsta is right that we already use it and that the inconsistencies are hopefully gone today. I still remember at least a case where in some unix systems signal() implied SA_RESETHAND. Solaris maybe? Will reconsider the commit at the end. Thanks.
  • 3f8f158 is great but there is to check if there are any mixed signness ops / API breakage in 32 and 64 bit systems due to the API change.
  • aaf2db3 has an ifdef __unix that may not work everywhere. If the issue is with UCLIBC maybe we should include features.h in all cases but if __UCLIBC__ is defined.
  • 4a799b6 has something strange because server.pidfile is always true. There is currently no way to set the pid to a value that will disable it, and this may not even be desirable. An alternative is to add an option to write the pid file even if the server is not daemonized.
  • 09b619a does not apply, btw it is probably a good idea to specify that the else branch is just to avoid a false positive.
  • f92f2d2 non existing section should return an empty string like in Redis Server.
@mattsta

This comment has been minimized.

Copy link
Contributor Author

mattsta commented Aug 7, 2014

Comments on non-merges

1782d04 can't be merged since Redis semantics is explicitly to forbid

Good call. I didn't consider rewrite dropping those comments.

2af2c63 Factor out duplicate signal handling code. Not merging because the signal() interface is not exactly consistent, at least historically, so it's better to be explicit about what are the server signal flags.

Update: You said the magic word "Solaris!" A quick search tells us: signal() works fine everywhere except on Solaris: https://blogs.oracle.com/vlad/entry/signal_versus_sigaction_on_solaris 鈥斅燬olaris is the only platform that automatically unregisters signal handlers when a signal happens (when using signal()).

So, we need to keep the verbose "almost uselessly set all defaults" code in there. We probably want to shift SIGHUP and SIGPIPE to sigaction() too since those can go sideways on Solaris.

Seems a bit silly to keep extra code just setting up the default conditions that signal() gives us for free. Plus, signal() is already being used for other things:

redis.c:1702:    signal(SIGHUP, SIG_IGN);
redis.c:1703:    signal(SIGPIPE, SIG_IGN);

Plus, signal() is required by C89, C99, and POSIX.1-2001. If someone can compile Redis, the interface is specified to be available.


bb8ce05 is totally broken: It breaks Cluster / Sentinel failovers after a slave restart and
the config directive to serve stale data if we can't connect with the master.

Perhaps loading old data on startup should only happen if slave-serve-stale-data yes? I think the goal of the patch is to prevent a replica from wasting time loading large stale data instead of just immediately replicating fresh.

Or we could add better documentation about the bring-up state so people know: is old data fully loaded before new replication starts? If old data is being loaded when new replication starts, does it immediately abort loading old data?


34d0362 broken patch

Is it broken?

We may need a better word than 'broken' for something that works but is just unwanted. All the tests pass for me :)

If it is actually broken, we probably need better tests to detect the failures. :shipit:

for example command getkeys mset a 1 b will allocate 6 bytes for two 4 bytes integers.

But, that mset command is invalid since there isn't the proper number of arguments:

127.0.0.1:6379> mset a 1 b
(error) ERR wrong number of arguments for MSET

Moreover the commit is pointless because when there is a first/last/step key discovery pattern, you can only allocate like 2x, for example with MSET,

The for loop below only iterates over (last-first)/step, so it makes sense to size the keys array to be the maximum size of the iteration. Otherwise, we're just over-allocating double the space... because?


bd4d5ed the useless check is wanted as a gentle reminder that we need to check this stuff if we add a new option like TYPE. Basically this way all the stanzas are the same instead of making the first one an exception.

If we have useless checks in popular code, people will discover it and want to factor it out. :)
This patch rejection falls into the error category of "failure to read implementor's mind." 馃悷


2a0c018 was simplified using c->cmd->name.

That's almost what I did the first time too. But then I tried to tried to make it match the original error message format where the command was upper case. Either way works. :)


4a799b6 has something strange because server.pidfile is always true. There is currently no way to set the pid to a value that will disable it, and this may not even be desirable. An alternative is to add an option to write the pid file even if the server is not daemonized.

Good catch! We didn't notice the default value always existed.

Adding another config option seems a little redundant. We can fix the behavior of initializing server.pidfile pretty easily: only initialize server.pidfile to a default value after config parsing instead of before. If the user doesn't specify a pidfile and we're in the foreground, we can skip setting pidfile.

Otherwise, if we add another config option, we'd have config options of pidfile and actually-write-pidfile. The second seems implied by the first. :)

A quick patch (combined with some changes from the existing 4a799b6) for the fix seems to be:

diff --git a/src/redis.c b/src/redis.c
index 6400259..be7071c 100644
--- a/src/redis.c
+++ b/src/redis.c
@@ -1420,7 +1420,7 @@ void initServerConfig() {
     server.aof_selected_db = -1; /* Make sure the first time will not match */
     server.aof_flush_postponed_start = 0;
     server.aof_rewrite_incremental_fsync = REDIS_DEFAULT_AOF_REWRITE_INCREMENTAL_FSYNC;
-    server.pidfile = zstrdup(REDIS_DEFAULT_PID_FILE);
+    server.pidfile = NULL;
     server.rdb_filename = zstrdup(REDIS_DEFAULT_RDB_FILENAME);
     server.aof_filename = zstrdup(REDIS_DEFAULT_AOF_FILENAME);
     server.requirepass = NULL;
@@ -3554,7 +3554,10 @@ int main(int argc, char **argv) {
     }
     if (server.daemonize) daemonize();
     initServer();
-    if (server.daemonize) createPidFile();
+   if (server.daemonize || server.pidfile) {
+        if (!server.pidfile) server.pidfile = zstrdup(REDIS_DEFAULT_PID_FILE);
+        createPidFile();
+    }
     redisSetProcTitle(argv[0]);
     redisAsciiArt();

We could even set the default server.pidfile inside createPidFile() if we want more isolation of behavior.


[will be updated to defend positions as needed 馃檧]

@antirez

This comment has been minimized.

Copy link
Owner

antirez commented Aug 12, 2014

@mattsta about:

If it is actually broken, we probably need better tests to detect the failures

What I mean is that it will perform invalid write into into not allocated memory. The MSET argument check passes since it's "at least N args", so when the command has to be dispatched by Redis Cluster, or when the command is processed by the COMMAND command, it writes into memory which is technically unallocated. The fact that it passes tests is that the allocator will never allocate 6 bytes, but 8, for alignment/padding concerns, but still the code is broken. Broken is not an offense, many of my code is / was broken, and many code submitted as pull requests may be broken. It is just an adjective to say a patch is not correct, and the contrary is "sane". These are pretty standard words in some coding circles, for example the Linux kernel mailing list.

@antirez

This comment has been minimized.

Copy link
Owner

antirez commented Aug 12, 2014

@mattsta about:

This patch rejection falls into the error category of "failure to read implementor's mind.

It is fair that the patch was submitted, but why we should give away a small bit of defensive programming because a PR was submitted?

@antirez

This comment has been minimized.

Copy link
Owner

antirez commented Aug 12, 2014

@mattsta about bb8ce05:

You said:

Perhaps loading old data on startup should only happen if slave-serve-stale-data yes? I think the goal of the patch is to prevent a replica from wasting time loading large stale data instead of just immediately replicating fresh.

As I pointed out in a detailed way in my original comment, the most critical aspect with this patch is that it makes failover ineffective in a crash-recovery scenario:

  • Master goes down.
  • Slave reboot.
  • Slave is promoted with empty data set.

Being a replica of a master is a contract that should try to provide certain guarantees. Slaves can't just give away on the dataset because they are trying to connect again with a master.

@antirez

This comment has been minimized.

Copy link
Owner

antirez commented Aug 12, 2014

Btw replying to the technical issues above I believe I did not replied to your critique about my attitude, so it is better to explain a few things that are important IMHO for Redis / developers interaction.

  1. Broken: not offensive, just means, it does not work as expected. Writing to unallocated memory is definitely broken, it is important that we recognize it as a big error, otherwise our quality standards lower in a serious way.
  2. "Passes tests so it is correct" does not belong here. Tests are just a way to add safety to Redis development process, they are not authoritative of the server behavior. While I hope we'll increment the number of tests we have, it is impossible in our setup and with the way Redis is developed to get both full coverage and tests so detailed to capture the full specification. Writing 8 bytes to 6 bytes malloc is wrong, but to capture this, there are non trivial efforts to do. In the specific case we would need an MSET with odd argument test that was running over valgrind for example. This is just a case, to cover all those cases is extremely hard. So in Redis land something is broken if it contains an error, regardless of the tests.
  3. Sometimes it is very important to be picky when accepting patches. For example the outcome of me merging some of the patches proposed in this PR are very serious, like, the failover thing. If we start to talk about attitude, should I be offended that you submitted this PR and I risked to introduce a huge bug? No, I'm thankful because I'm merging a lot of interesting stuff because of your help. However IMHO similarly you should be ok with me reviewing what you submitted and avoiding to put issues inside the code base.
@antirez

This comment has been minimized.

Copy link
Owner

antirez commented Aug 12, 2014

Use unsigned length in sds header (commit 3f8f158) is interesting and deserves some discussion / care about a few things. It is also a great example of when my code is broken, since the signed integers were just accomplishing to waste one bit.

  1. This was originally provided for redis-cli, however we have a much more important issue due to this limit, which is DUMP/RESTORE. Redis Cluster is unable to migrate keys that serialized take more than 4GB. It's not a huge limit but it is not great and there is no trivial fix for that (but there are many non trivial fixes that don't require us to make all the sds strings headers larger).
  2. I see @mattsta modified int to long in sdsIncrLen(). I did not tested / checked carefully, but I've the feeling this is not enough for 32 bit builds where int and long are the same 32 bit data type?
  3. I wonder if sdsMakeRoomFor() should be modified to abort in a recognizable way when the 4GB limit is reached by the increment in size. However I'm concerned about two things. One is a possible performance penalty, but should be very very small, the second is that from sds.c we can't call redisPanic() or anything that is good enough to log a message, but we can always have sds.c be able to register a callback when this even happens, which is what we already do to log out of memory from zmalloc.c.
@jcspencer

This comment has been minimized.

Copy link

jcspencer commented Aug 12, 2014

Should PR #1930 be added to this?

@antirez

This comment has been minimized.

Copy link
Owner

antirez commented Aug 12, 2014

@JamesS237 your mention of the issue is sufficient to add it to my processing list, thanks :-)

@mattsta

This comment has been minimized.

Copy link
Contributor Author

mattsta commented Aug 12, 2014

The MSET argument check passes since it's "at least N args", so when the command has to be dispatched by Redis Cluster,

The "command only checked for >= N args" is the part I was missing. I was (for some reason? thinking the check for length/step happened before any other command processing functions were called. Reading comprehension fail on my part.

Would it make sense to move length/step calculation up to command processing (where arity is checked) and outside of individual command functions?

why we should give away a small bit of defensive programming because a PR was submitted?

With the combination of Redis being very popular and very open source, people will re-discover the same things that look like problems, even if they aren't.

Kinda like how people keep submitting requests to add proper markdown to the README. I feel bad wasting multiple people's time re-submitting the same issue that's just unwanted.

Slaves can't just give away on the dataset because they are trying to connect again with a master.

Yup. No need to change system behavior here. It felt like a correctness optimization to me but I was wrong. :)

Writing to unallocated memory is definitely broken,

Yup. I was just temporarily blind to the case where it could happen, so I couldn't force it to do incorrect things.

"Passes tests so it is correct" does not belong here. Tests are just a way to add safety to Redis development process, they are not authoritative of the server behavior.

We'll just have to agree to disagree there. Of course no test suite is absolutely complete, but comprehensive testing seems to be the best tool for outside contributors to verify changes to existing behaviors.

If people can break the system, but have all tests pass, things get confusing. Though, since we know no testing is 100% perfect, that's why we do develop->test->review and not develop->release. Well, most of the time.

However IMHO similarly you should be ok with me reviewing what you submitted and avoiding to put issues inside the code base.

Yup, no problem reviewing. I'm just working from my own (sometimes broken) idea of how things work, so just need more education most times.

Happy Tuesday! 馃拑

@antirez

This comment has been minimized.

Copy link
Owner

antirez commented Aug 13, 2014

Would it make sense to move length/step calculation up to command processing (where arity is checked) and outside of individual command functions?

Some commands have non trivial logic and we never found some reason to change the current setup so far. Btw if we want to fix the problem of MSET allocating 2x the memory for key extraction, we can do so just by fixing the code in a way that makes sure we don't add edge cases. Probably it is enough to add (step-1) to argc before the division to never under-allocate.

With the combination of Redis being very popular and very open source, people will re-discover the same things that look like problems, even if they aren't.

Sometimes we may be able to fix this just adding a small comment.

We'll just have to agree to disagree there. Of course no test suite is absolutely complete, but comprehensive testing seems to be the best tool for outside contributors to verify changes to existing behaviors.
If people can break the system, but have all tests pass, things get confusing. Though, since we know no testing is 100% perfect, that's why we do develop->test->review and not develop->release. Well, most of the time.

I would love to reach a level of testing where if the test passes, it is very likely that the patch is sane, however even after spending a considerable amount of efforts in testing, we are not near this point at all, and I believe we'll never be with the current setup of two core developers.

I strongly believe that when there are scarse development resources, the best idea is to try to write correct code beforehand and carefully review patches, instead of investing a huge amount of time writing more tests that will anyway not cover enough code and states to provide a good degree of safety.

I think that there is currently no open source system software that can claim a degree of testing high enough that running the test is enough to merge a patch without manual inspection of the code. Probably the nearest system is Sqlite btw, so I'm curious about asking the core developers if they review or not the patches manually and how likely is a code breakage when merging something that passes all the tests but was not reviewed.

We need to keep adding more and more tests, but it's just a safety net.

@antirez

This comment has been minimized.

Copy link
Owner

antirez commented Aug 25, 2014

Ok, everything I was able to merge is now merged, now merging all the stuff into unstable as well, and starting to cherry pick from 3.0 / 2.8 when appropriate. Please could we use a new PR for the remaining issues in this PR to restart fresh? Thanks.

@mattsta

This comment has been minimized.

Copy link
Contributor Author

mattsta commented Aug 25, 2014

Thanks!

Every issue referenced here is now closed or has a note added about needing more work. We're down to under 200 pull requests now! :)

@mattsta mattsta closed this Aug 25, 2014

@badboy

This comment has been minimized.

Copy link
Contributor

badboy commented Aug 25, 2014

Yey! Awesome! Thanks @mattsta for that hard work, thanks @antirez for merging so much stuff.

(Btw., looking at really old PRs, there are definitely some that can be closed too)

@antirez

This comment has been minimized.

Copy link
Owner

antirez commented Aug 27, 2014

Thanks @mattsta, the backport process has finished, I merged everything into 3.0 and everything applicable into 2.8 (basically only the cluster stuff were not merged). I'll wait a few weeks for things to settle before doing a new 2.8 release.

@mariano-perez-rodriguez

This comment has been minimized.

Copy link
Contributor

mariano-perez-rodriguez commented Aug 27, 2014

This is great! Maybe #1902 could be merged as well before the new release?

@mattsta mattsta referenced this pull request Sep 28, 2014

Closed

Add SIGINT handler #1637

uqs pushed a commit to freebsd/freebsd-ports that referenced this pull request Oct 10, 2014

osa
Upgrade from 3.0.0-beta8 to 3.0.0-rc1.
<ChangeLog>

--[ Redis 3.0.0 RC1 (version 2.9.101) ] Release date: 9 oct 2014

This is the first release candidate of Redis Cluster.

>> General changes

* [FIX] An very large number of small fixes, old and new, merged in the
        context of a the issue #1906. Please see the issue page here
        for exact credits: antirez/redis#1906
        of each commit. (Matt Stancliff and many others).
* [FIX] SAVE is no longer propagated to AOF / slaves.
* [FIX] GETRANGE test no longer fails for 32 bit builds (Matt Stancliff).
* [FIX] Limit SCAN latency when the hash table is in an odd state (very few
        populted buckets because rehashing is in progress). (Xiaost and
        Salvatore Sanfilippo)

* [NEW] Redis is now able to load truncated AOF files without requiring a
        redis-check-aof utility run. The default now is to load truncated
        (but apparently not corrupted) AOFs, you can change this in redis.conf.
        (Salvatore Sanfilippo).
* [NEW] DEBUG POPULATE two args form implemented. It is now possible to
        call it with DEBUG POPULATE <count> <prefix>. Default prefix
        is "key:" as usually.
* [NEW] INCR: Modify incremented object in-place when possible. This results
        in speed improvements + possibly better memory locality.

>> Cluster changes

* [FIX] Cluster: claim ping_sent time even if we can't connect.
* [FIX] redis-trib should not abort easily on connection issues.
* [FIX] Cluster test: less console-spammy resharding test.
* [FIX] Fix logic to detect we are among a minority.
* [FIX] Process gossip section only for known nodes.

* [NEW] Redis Cluster is stable and tested enough, there is a clear MVP,
        so it was promoted from beta to stable.
* [NEW] New unit 09, Pub/Sub across the cluster.
* [NEW] New unit 08, update messages.
* [NEW] New cluster option to work with partial slots coverage.
* [NEW] More chatty cluster slaves when failover is stalled. They log reason
        with rate limiting, only when reason changes or a given time
        has elapsed.

>> Sentinel changes

* [FIX] Sentinel critical bug fixed: the absolute majority was computed in a
        wrong way because of a programming error. Now the implementation does
        what the specification says and the majority to authorize a failover
        (that should not be confused with the ODOWN quorum) is the majority of
        *all* the Sentinels ever seen for a given master, regardless of their
        current state.
* [FIX] Resolved a memory leak in the hiredis library causing a memory leak
        in Redis Sentinel when a monitored instance or another Sentinel is
        unavailable. Every reconnection attempt will leak a small amount of
        memory, but in the long run the process can reach a considerable size.

* [NEW] Sentinel: ability to announce itself with an arbitrary IP/port to work
        in the context of natted networks. However this is probably still
        not enough since there is no equivalent mechanism for slaves listed
        in the master INFO output. (Dara Kong and Salvatore Sanfilippo)

</ChangeLog>


git-svn-id: svn+ssh://svn.freebsd.org/ports/head@370602 35697150-7ecd-e111-bb59-0022644237b5

uqs pushed a commit to freebsd/freebsd-ports that referenced this pull request Oct 10, 2014

Upgrade from 3.0.0-beta8 to 3.0.0-rc1.
<ChangeLog>

--[ Redis 3.0.0 RC1 (version 2.9.101) ] Release date: 9 oct 2014

This is the first release candidate of Redis Cluster.

>> General changes

* [FIX] An very large number of small fixes, old and new, merged in the
        context of a the issue #1906. Please see the issue page here
        for exact credits: antirez/redis#1906
        of each commit. (Matt Stancliff and many others).
* [FIX] SAVE is no longer propagated to AOF / slaves.
* [FIX] GETRANGE test no longer fails for 32 bit builds (Matt Stancliff).
* [FIX] Limit SCAN latency when the hash table is in an odd state (very few
        populted buckets because rehashing is in progress). (Xiaost and
        Salvatore Sanfilippo)

* [NEW] Redis is now able to load truncated AOF files without requiring a
        redis-check-aof utility run. The default now is to load truncated
        (but apparently not corrupted) AOFs, you can change this in redis.conf.
        (Salvatore Sanfilippo).
* [NEW] DEBUG POPULATE two args form implemented. It is now possible to
        call it with DEBUG POPULATE <count> <prefix>. Default prefix
        is "key:" as usually.
* [NEW] INCR: Modify incremented object in-place when possible. This results
        in speed improvements + possibly better memory locality.

>> Cluster changes

* [FIX] Cluster: claim ping_sent time even if we can't connect.
* [FIX] redis-trib should not abort easily on connection issues.
* [FIX] Cluster test: less console-spammy resharding test.
* [FIX] Fix logic to detect we are among a minority.
* [FIX] Process gossip section only for known nodes.

* [NEW] Redis Cluster is stable and tested enough, there is a clear MVP,
        so it was promoted from beta to stable.
* [NEW] New unit 09, Pub/Sub across the cluster.
* [NEW] New unit 08, update messages.
* [NEW] New cluster option to work with partial slots coverage.
* [NEW] More chatty cluster slaves when failover is stalled. They log reason
        with rate limiting, only when reason changes or a given time
        has elapsed.

>> Sentinel changes

* [FIX] Sentinel critical bug fixed: the absolute majority was computed in a
        wrong way because of a programming error. Now the implementation does
        what the specification says and the majority to authorize a failover
        (that should not be confused with the ODOWN quorum) is the majority of
        *all* the Sentinels ever seen for a given master, regardless of their
        current state.
* [FIX] Resolved a memory leak in the hiredis library causing a memory leak
        in Redis Sentinel when a monitored instance or another Sentinel is
        unavailable. Every reconnection attempt will leak a small amount of
        memory, but in the long run the process can reach a considerable size.

* [NEW] Sentinel: ability to announce itself with an arbitrary IP/port to work
        in the context of natted networks. However this is probably still
        not enough since there is no equivalent mechanism for slaves listed
        in the master INFO output. (Dara Kong and Salvatore Sanfilippo)

</ChangeLog>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can鈥檛 perform that action at this time.