File#each_char performance #5029

ghost · 2017-09-24T11:03:19Z

I wrote following code in order to simulate a automata (this is a condensed version which shows the same issue)

f = File.open("./my_file", mode = "r")

count = 0

h_map = {'U' => 0,
         'D' => 1,
         'A' => 2,
         'Z' => 3,
         'R' => 4,
}

f.each_char do |val|
  next if val == '\n'
  count += h_map[val]
end

p count

where my_file is something like

UDRAZRAZAZRRAZURURURAZAZRRAZARRRZARZRRRAZAZRZAZARRZAZDAZAZUDURAZDRURAZRAZRRAZ
AZUUDRAZUDUARZUAZARRZ
AZRUAZUAZUAZDAZUARRRRRZARZDAZARZARRRZUARZAZDUAZARRZRAZARZAZRRRRDDARRZRARZAZAZAZUDRRAZARZARZAZUAZARZUARRZDARRZRUDDUAZURRDRARRZDARZARZRAZDARZARZAZUARRZURARZARZUDARZAZARZARRZUARRZRARZAZAZAZARZARZDARZUDAZAZUDARZARRZARZAZAZAZUDUARRRZAZRRAZDRURA

but about 8GB of that

I have a C implementation which just loads all in RAM and takes under 20s

this crystal one takes about 2min though

real	1m58,173s
user	1m49,195s
sys	0m6,172s

how would I make the file read speed better? (I compiled using --release)
thanks

The text was updated successfully, but these errors were encountered:

lbguilherme · 2017-09-24T11:27:02Z

First of all, this Crystal version is doing a Hash lookup for each letter, this is not really efficient, while I suspect the C version is handling it with a switch-statement or a sequence of "if"s. Could you try without the Hash lookup there?

Also, Crystal's Char is a Unicode character, instead of a ASCII character, so this has to be decoded from the file (that should be fast, though. But still does more work than C).

bew · 2017-09-24T11:44:12Z

Using each_byte instead of each_char, and without hash lookup:

count = 0_u64

File.open("file.txt") do |f|
  f.each_byte do |byte|
    next if byte == '\n'.ord
    count += case byte
             when 'U'.ord then 0
             when 'D'.ord then 1
             when 'A'.ord then 2
             when 'Z'.ord then 3
             when 'R'.ord then 4
             else raise "Bad char #{byte.chr}"
             end
  end
end

pp count

I get:

ghost · 2017-09-24T11:58:41Z

Yes you are right, I converted the file beforehand using tr in

real	0m26,412s
user	0m8,031s
sys	0m23,088s

and then run the crystal snippet without the hash but still with `.to_i'

real	0m43,881s
user	0m40,359s
sys	0m2,886s

the whole c program which does some lookup in an array every time takes

real	0m34,573s
user	0m19,527s
sys	0m13,227s

bew's version on the original file took

real	1m7,758s
user	1m3,747s
sys	0m3,053s

I first wanted to use
https://crystal-lang.org/api/0.23.1/File.html#read_at%28offset%2Cbytesize%2C%26block%29-instance-method
to read chunks of bytes but the IO part got me

so the remaining question would be, how is the preferred version of reading a file efficiently but still fast? Is the each_char method just fine or are there other options for reading in a file more efficiently?

thanks for your quick response!

RX14 · 2017-09-24T12:39:50Z

Can you show us the C version?

asterite · 2017-09-24T12:48:00Z

Yes, please, show us the C version :-)

I guess the fastest way would be to do more or less what the C version does.

In the meantime, I found this to be more efficient:

f = File.open("./my_file")

# Lookup table: faster than a case
values = Slice.new(256) do |i|
  case i
  when 'D' then 1
  when 'A' then 2
  when 'Z' then 3
  when 'R' then 4
  else          0
  end
end

count = 0

# Read by chunks.
# You can try increasing this number but in my tests
# the performance gain was minimal
bytes = Bytes.new(8192)
while f.read(bytes) != 0
  bytes.each do |val|
    count += values[val]
  end
end

pp count

In my tests most of the time was spent in that case. Doing an array lookup seems faster here. I also read bytes by chunk instead of the slower each_byte (because the later advances the IO byte per byte).

I tried it on an 8GB file and it took about 13 seconds. That is compiled with --release.

@monouser7dig It would be nice to know how much time this variant takes on your machine.

RX14 · 2017-09-24T12:56:04Z

@asterite ideally buffered each_byte would compile down to essentially what you wrote...

I wouldn't have thought that the case would be the bottleneck though, i guess the entire file was already in the page cache.

asterite · 2017-09-24T13:04:13Z

I don't think each_byte can be compiled to that. You can always do:

file.each_byte do |byte|
  break if some_condition
end

That means you can read maybe 5 bytes from the file and the stop. Reading a next byte will give you the sixth one.

In my code above I read a full chunk, and then iterate that chunk. So I read 8192 bytes in one go. Reading a next byte will give you the 8193-rd. There's no way to push back bytes. It's doing something different.

But PRs are welcome, if you can optimize each_byte somehow :-)

asterite · 2017-09-24T13:06:53Z

To say it another way, each block iteration of each_byte updates the IO by incrementing its pointer. That's why it's slower.

I also wouldn't have though that a case was slower than a lookup table, but it seems true.

ghost · 2017-09-24T14:27:24Z

here is the c code (which is not really my own but was given to me in some reduced form for an exercise)

int results[9] = {1,1,1,1,1,1,1,1,0};
int deltas[9][5] = {
    {1,8,4,8,0}, //q0 
    {2,0,5,8,1}, //q1
    {3,1,6,8,2}, //q2
    {8,2,7,8,3}, //q3

    {8,8,8,0,4}, //q4
    {8,8,8,1,5}, //q5
    {8,8,8,2,6}, //q6
    {8,8,8,3,7}, //q7

    {8,8,8,8,8}, //q8
};

int result_f(char* w){
    int sol;
    int pointer = 0;
    double counter;
    counter = 0;
    while (*w) {
        pointer = deltas[pointer] [*w++ -'0'];
        counter++;
    }
    sol = results[pointer];
    printf("%d\n", sol);
    printf("%f\n", counter);
    return results [pointer];
}

the function is called with a pointer that points to the beginn of the buffer which is located in RAM (which I only did because I did not know better) using fopen and malloc

asterite I will look at your version/ try and report back

ghost · 2017-09-24T14:59:18Z

real	0m5,913s
user	0m3,934s
sys	0m1,925s

for the version of the founder of crystal ;-)

somehow glad I asked now, thanks!!

the Slize + block is actually a bit fancy but I think I can grasp it after reading the doc
bytes.to_slize leaves me a bit confused as the doc did not help me, Byte is a Slice(UInt8) already
and the File#read method receives a Filename and not a Slice like the IO#read method, but f is a File, not an IO ..?
why is the values Slize 256 elements wide although it only is accessed at the first 5 indices count += values[val]

maybe I am a bit dense here but if I'd understand that would be really nice, (tried the documentation, appreciate this is alpha and you guys work hard on all the things!)

my guess is that this works better because the program may not have to wait until the whole file is buffered but can start as soon as the first block is loaded? not sure if that really is how it Is implemented though.

asterite · 2017-09-24T15:10:46Z

bytes.to_slice is a small mistake, I originally had bytes be a StaticArray, but since now it's a slice you can just pass bytes. I updated the snippet above
It needs to be 256-wide because a byte value can be anything from 0 to 255. It's not correct that only the 5 first indices will be accessed: 'A' is at index 65, 'D' it at index 68, etc.

This works better than each_byte because of #5029 (comment)

ghost · 2017-09-24T15:17:57Z

ok but crystal did not give me any errors despite to_slize, maybe it should have?

so, values[val] actually calls the to_i method implicitly? I hope/ think I understand

thanks, great help!

asterite · 2017-09-24T15:26:57Z

Slice#to_slice trivially returns self. Many methods in Crystal and Ruby just exist so that you can use duck-typing (try to convert this to a slice no matter what I have)

values[val] just accesses the array by index. The index can be a byte (UInt8) or any integer type.

ghost · 2017-09-24T16:13:55Z

I see because you read the file into a UInt8 Size it is not a char but a UInt8 (which is probably basically the same) and thus the case statement can check against Chars

I bow my head for your experience ;)

btw the type check if crystal is really radical, I love the strong typedefs but a bit of casting would not hurt at some point, my two cents while trying these things out ;) (read the topics about it and issues)

asterite added the kind:question label Sep 24, 2017

asterite closed this as completed Sep 24, 2017

ghost mentioned this issue Sep 25, 2017

[RFC] Create crystal-lang/crystal-rfcs #4944

Closed

ghost mentioned this issue Oct 5, 2017

GitHub issue template: redirect users to Google Groups, Stack Overflow, etc. #5076

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

File#each_char performance #5029

File#each_char performance #5029

ghost commented Sep 24, 2017

lbguilherme commented Sep 24, 2017 •

edited

bew commented Sep 24, 2017 •

edited

ghost commented Sep 24, 2017

RX14 commented Sep 24, 2017

asterite commented Sep 24, 2017 •

edited

RX14 commented Sep 24, 2017 •

edited

asterite commented Sep 24, 2017

asterite commented Sep 24, 2017

ghost commented Sep 24, 2017 •

edited by ghost

ghost commented Sep 24, 2017

asterite commented Sep 24, 2017

ghost commented Sep 24, 2017

asterite commented Sep 24, 2017

ghost commented Sep 24, 2017

File#each_char performance #5029

File#each_char performance #5029

Comments

ghost commented Sep 24, 2017

lbguilherme commented Sep 24, 2017 • edited

bew commented Sep 24, 2017 • edited

ghost commented Sep 24, 2017

RX14 commented Sep 24, 2017

asterite commented Sep 24, 2017 • edited

RX14 commented Sep 24, 2017 • edited

asterite commented Sep 24, 2017

asterite commented Sep 24, 2017

ghost commented Sep 24, 2017 • edited by ghost

ghost commented Sep 24, 2017

asterite commented Sep 24, 2017

ghost commented Sep 24, 2017

asterite commented Sep 24, 2017

ghost commented Sep 24, 2017

lbguilherme commented Sep 24, 2017 •

edited

bew commented Sep 24, 2017 •

edited

asterite commented Sep 24, 2017 •

edited

RX14 commented Sep 24, 2017 •

edited

ghost commented Sep 24, 2017 •

edited by ghost