Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with HTTPS or Subversion.

Download ZIP

Comparing changes

Choose two branches to see what's changed or to start a new pull request. If you need to, you can also compare across forks.

Open a pull request

Create a new pull request by comparing changes across two branches. If you need to, you can also compare across forks.
base fork: Whiteknight/whiteknight.github.com
base: 93ccc6417f
...
head fork: Whiteknight/whiteknight.github.com
compare: b5682f1351
Checking mergeability… Don't worry, you can still create the pull request.
  • 2 commits
  • 1 file changed
  • 0 commit comments
  • 1 contributor
Showing with 114 additions and 0 deletions.
  1. +114 −0 _posts/2012-07-10-io_cleanup_final.md
View
114 _posts/2012-07-10-io_cleanup_final.md
@@ -0,0 +1,114 @@
+---
+layout: post
+categories: [Parrot, IO]
+title: IO Cleanups Home Stretch
+---
+
+I had made a round of fixes with regards to encodings in the
+`whiteknight/io_cleanup1` branch a few days ago. Rakudo hacker Moritz was
+able to take a look at Rakudo's spectests and verify that more tests were
+indeed passing because of it. The remaining test failures represent the
+changing semantics for the `read` method and what appear to be two genuine
+regressions or bugs.
+
+Hopefully I will be able to get all these things sorted out this week before
+I go away on a mini vacation next weekend. Otherwise I can't imagine this
+branch gets merged before the 4.6 release this month.
+
+A few days ago I wrote a [post about readline](/2012/06/13/io_readline.html)
+and some of the intricacies involved in that, and some of the weird semantics
+that I was attempting to unify. It turns out that some of these semantics are
+a major cause in one of the last bugs in the branch. Let's look at some code
+in master to see where the hangup is. First, `readline` on a Socket:
+
+ METHOD readline(STRING *delimiter :optional,
+ INTVAL has_delimiter :opt_flag) {
+ INTVAL idx;
+ STRING *result;
+ STRING *buf;
+ GET_ATTR_buf(INTERP, SELF, buf);
+
+ if (!has_delimiter)
+ delimiter = CONST_STRING(INTERP, "\n");
+
+ if (Parrot_io_socket_is_closed(INTERP, SELF))
+ RETURN(STRING * STRINGNULL);
+
+ if (buf == STRINGNULL)
+ buf = Parrot_io_reads(INTERP, SELF, CHUNK_SIZE);
+
+ while ((idx = Parrot_str_find_index(INTERP, buf, delimiter, 0)) < 0) {
+ STRING * const more = Parrot_io_reads(INTERP, SELF, CHUNK_SIZE);
+ if (Parrot_str_length(INTERP, more) == 0) {
+ SET_ATTR_buf(INTERP, SELF, STRINGNULL);
+ RETURN(STRING *buf);
+ }
+ buf = Parrot_str_concat(INTERP, buf, more);
+ }
+
+ idx += Parrot_str_length(INTERP, delimiter);
+ result = Parrot_str_substr(INTERP, buf, 0, idx);
+ buf = Parrot_str_substr(INTERP, buf, idx, Parrot_str_length(INTERP, buf) - idx);
+ SET_ATTR_buf(INTERP, SELF, buf);
+ RETURN(STRING *result);
+ }
+
+We can ignore the fact that this implementation of `readline` doesn't call
+`Parrot_io_readline` like every other PMC does. Or that if we did call that
+function the program would throw an exception because `Parrot_io_readline`
+doesn't support sockets anyway. Whatever. Moving on...
+
+For comparison, let's look at the version from the Handle PMC (which is
+inherited by FileHandle):
+
+ METHOD readline() {
+ STRING * const string_result = Parrot_io_readline(INTERP, SELF);
+ RETURN(STRING *string_result);
+ }
+
+The Socket version takes a `delimiter` parameter which is a STRING. When doing
+readline on a Socket, you can pass in any arbitrary string which is used as
+the token for end of line. With FileHandle, you don't seem to have that.
+However, you can definitely use custom delimiters with FileHandle. However,
+we clearly don't take a delimiter here and we aren't passing one in as an
+argument to `Parrot_io_readline` like we do in the branch. Let's see
+how it's done instead. Here's a snippet from Handle PMC:
+
+ ATTR INTVAL record_separator; /* Record separator (only single char supported) */
+
+We don't need to look at any other code. This is the smoking gun.
+`Socket.readline()` can take any arbitrary STRING to use as a record
+separator, but `FileHandle.readline()` can only use a single codepoint, which
+it doesn't take as an argument.
+
+So that's the problem right there. When I standardized the readline mechanics
+between types, I picked the FileHandle semantics. This was probably the wrong
+decision, because not only could Sockets use a more general mechanism but
+Rakudo relies on that behavior in its spectests. This does raise a question
+about why nobody ever expected this same behavior from FileHandle, or why the
+difference was not considered some kind of bug. It really goes to show how
+immature our IO system has been for all these years, and how we had all just
+grown accustomed to the arbitrary, inconsistent, nonsensical behaviors. It
+just works for some basic usages, so nobody ever complains about it. That
+time is, thankfully, coming quickly to an end.
+
+Fixing this issue is actually going to take some serious work. Several
+function signatures are going to need updating to take a STRING delimiter
+instead of an INTVAL codepoint, and a major chunk of buffering logic is going
+to need to be rewritten to work on substrings instead of on individual
+codepoints. This, in turn, is going to require a heck of a lot more testing.
+
+Last night I started putting in some of the changes necessary to use a
+substring terminator instead of a single codepoint. Most of what I've already
+done has been modifying function signatures. The real changes need to occur
+deep within the buffering logic and will require a little bit more time.
+
+I'm looking forward to getting this branch fixed up and merged back to master
+so I can get to work on my next project. I think 6model is going to be the
+next thing I dig into, before I find something else that annoys me enough to
+put in a huge amount of effort to rewrite it. I'll post more updates about
+my future projects and plans as I go.
+
+
+
+

No commit comments for this range

Something went wrong with that request. Please try again.