Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

File#each_char performance #5029

Closed
ghost opened this issue Sep 24, 2017 · 14 comments
Closed

File#each_char performance #5029

ghost opened this issue Sep 24, 2017 · 14 comments

Comments

@ghost
Copy link

ghost commented Sep 24, 2017

I wrote following code in order to simulate a automata (this is a condensed version which shows the same issue)

f = File.open("./my_file", mode = "r")

count = 0

h_map = {'U' => 0,
         'D' => 1,
         'A' => 2,
         'Z' => 3,
         'R' => 4,
}

f.each_char do |val|
  next if val == '\n'
  count += h_map[val]
end

p count

where my_file is something like

UDRAZRAZAZRRAZURURURAZAZRRAZARRRZARZRRRAZAZRZAZARRZAZDAZAZUDURAZDRURAZRAZRRAZ
AZUUDRAZUDUARZUAZARRZ
AZRUAZUAZUAZDAZUARRRRRZARZDAZARZARRRZUARZAZDUAZARRZRAZARZAZRRRRDDARRZRARZAZAZAZUDRRAZARZARZAZUAZARZUARRZDARRZRUDDUAZURRDRARRZDARZARZRAZDARZARZAZUARRZURARZARZUDARZAZARZARRZUARRZRARZAZAZAZARZARZDARZUDAZAZUDARZARRZARZAZAZAZUDUARRRZAZRRAZDRURA

but about 8GB of that

I have a C implementation which just loads all in RAM and takes under 20s

this crystal one takes about 2min though

real	1m58,173s
user	1m49,195s
sys	0m6,172s

how would I make the file read speed better? (I compiled using --release)
thanks

@lbguilherme
Copy link
Contributor

lbguilherme commented Sep 24, 2017

First of all, this Crystal version is doing a Hash lookup for each letter, this is not really efficient, while I suspect the C version is handling it with a switch-statement or a sequence of "if"s. Could you try without the Hash lookup there?

Also, Crystal's Char is a Unicode character, instead of a ASCII character, so this has to be decoded from the file (that should be fast, though. But still does more work than C).

@bew
Copy link
Contributor

bew commented Sep 24, 2017

Using each_byte instead of each_char, and without hash lookup:

count = 0_u64

File.open("file.txt") do |f|
  f.each_byte do |byte|
    next if byte == '\n'.ord
    count += case byte
             when 'U'.ord then 0
             when 'D'.ord then 1
             when 'A'.ord then 2
             when 'Z'.ord then 3
             when 'R'.ord then 4
             else raise "Bad char #{byte.chr}"
             end
  end
end

pp count

I get:
each_byte

@ghost
Copy link
Author

ghost commented Sep 24, 2017

Yes you are right, I converted the file beforehand using tr in

real	0m26,412s
user	0m8,031s
sys	0m23,088s

and then run the crystal snippet without the hash but still with `.to_i'

real	0m43,881s
user	0m40,359s
sys	0m2,886s

the whole c program which does some lookup in an array every time takes

real	0m34,573s
user	0m19,527s
sys	0m13,227s

bew's version on the original file took

real	1m7,758s
user	1m3,747s
sys	0m3,053s

I first wanted to use
https://crystal-lang.org/api/0.23.1/File.html#read_at%28offset%2Cbytesize%2C%26block%29-instance-method
to read chunks of bytes but the IO part got me

so the remaining question would be, how is the preferred version of reading a file efficiently but still fast? Is the each_char method just fine or are there other options for reading in a file more efficiently?

thanks for your quick response!

@RX14
Copy link
Contributor

RX14 commented Sep 24, 2017

Can you show us the C version?

@asterite
Copy link
Member

asterite commented Sep 24, 2017

Yes, please, show us the C version :-)

I guess the fastest way would be to do more or less what the C version does.

In the meantime, I found this to be more efficient:

f = File.open("./my_file")

# Lookup table: faster than a case
values = Slice.new(256) do |i|
  case i
  when 'D' then 1
  when 'A' then 2
  when 'Z' then 3
  when 'R' then 4
  else          0
  end
end

count = 0

# Read by chunks.
# You can try increasing this number but in my tests
# the performance gain was minimal
bytes = Bytes.new(8192)
while f.read(bytes) != 0
  bytes.each do |val|
    count += values[val]
  end
end

pp count

In my tests most of the time was spent in that case. Doing an array lookup seems faster here. I also read bytes by chunk instead of the slower each_byte (because the later advances the IO byte per byte).

I tried it on an 8GB file and it took about 13 seconds. That is compiled with --release.

@monouser7dig It would be nice to know how much time this variant takes on your machine.

@RX14
Copy link
Contributor

RX14 commented Sep 24, 2017

@asterite ideally buffered each_byte would compile down to essentially what you wrote...

I wouldn't have thought that the case would be the bottleneck though, i guess the entire file was already in the page cache.

@asterite
Copy link
Member

I don't think each_byte can be compiled to that. You can always do:

file.each_byte do |byte|
  break if some_condition
end

That means you can read maybe 5 bytes from the file and the stop. Reading a next byte will give you the sixth one.

In my code above I read a full chunk, and then iterate that chunk. So I read 8192 bytes in one go. Reading a next byte will give you the 8193-rd. There's no way to push back bytes. It's doing something different.

But PRs are welcome, if you can optimize each_byte somehow :-)

@asterite
Copy link
Member

To say it another way, each block iteration of each_byte updates the IO by incrementing its pointer. That's why it's slower.

I also wouldn't have though that a case was slower than a lookup table, but it seems true.

@ghost
Copy link
Author

ghost commented Sep 24, 2017

here is the c code (which is not really my own but was given to me in some reduced form for an exercise)

int results[9] = {1,1,1,1,1,1,1,1,0};
int deltas[9][5] = {
    {1,8,4,8,0}, //q0 
    {2,0,5,8,1}, //q1
    {3,1,6,8,2}, //q2
    {8,2,7,8,3}, //q3

    {8,8,8,0,4}, //q4
    {8,8,8,1,5}, //q5
    {8,8,8,2,6}, //q6
    {8,8,8,3,7}, //q7

    {8,8,8,8,8}, //q8
};

int result_f(char* w){
    int sol;
    int pointer = 0;
    double counter;
    counter = 0;
    while (*w) {
        pointer = deltas[pointer] [*w++ -'0'];
        counter++;
    }
    sol = results[pointer];
    printf("%d\n", sol);
    printf("%f\n", counter);
    return results [pointer];
}

the function is called with a pointer that points to the beginn of the buffer which is located in RAM (which I only did because I did not know better) using fopen and malloc

asterite I will look at your version/ try and report back

@ghost
Copy link
Author

ghost commented Sep 24, 2017

real	0m5,913s
user	0m3,934s
sys	0m1,925s

for the version of the founder of crystal ;-)

somehow glad I asked now, thanks!!

  • the Slize + block is actually a bit fancy but I think I can grasp it after reading the doc
  • bytes.to_slize leaves me a bit confused as the doc did not help me, Byte is a Slice(UInt8) already
    and the File#read method receives a Filename and not a Slice like the IO#read method, but f is a File, not an IO ..?
  • why is the values Slize 256 elements wide although it only is accessed at the first 5 indices count += values[val]

maybe I am a bit dense here but if I'd understand that would be really nice, (tried the documentation, appreciate this is alpha and you guys work hard on all the things!)

my guess is that this works better because the program may not have to wait until the whole file is buffered but can start as soon as the first block is loaded? not sure if that really is how it Is implemented though.

@asterite
Copy link
Member

  • bytes.to_slice is a small mistake, I originally had bytes be a StaticArray, but since now it's a slice you can just pass bytes. I updated the snippet above
  • It needs to be 256-wide because a byte value can be anything from 0 to 255. It's not correct that only the 5 first indices will be accessed: 'A' is at index 65, 'D' it at index 68, etc.

This works better than each_byte because of #5029 (comment)

@ghost
Copy link
Author

ghost commented Sep 24, 2017

ok but crystal did not give me any errors despite to_slize, maybe it should have?

so, values[val] actually calls the to_i method implicitly? I hope/ think I understand

thanks, great help!

@asterite
Copy link
Member

Slice#to_slice trivially returns self. Many methods in Crystal and Ruby just exist so that you can use duck-typing (try to convert this to a slice no matter what I have)

values[val] just accesses the array by index. The index can be a byte (UInt8) or any integer type.

@ghost
Copy link
Author

ghost commented Sep 24, 2017

I see because you read the file into a UInt8 Size it is not a char but a UInt8 (which is probably basically the same) and thus the case statement can check against Chars

I bow my head for your experience ;)

btw the type check if crystal is really radical, I love the strong typedefs but a bit of casting would not hurt at some point, my two cents while trying these things out ;) (read the topics about it and issues)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants