Merged features/perfomance into master
AndrewDryga committed Aug 7, 2016
2 parents 0b0b314 + 6a966cc commit c511fa7
Showing 19 changed files with 761 additions and 150 deletions.
92 changes: 57 additions & 35 deletions README.md
@@ -10,6 +10,40 @@ File is read by 4096 byte chunks, BSONEach iterates over all documents till the

* This module achieves low memory usage (in my test environment it constantly consumes 28.1 MB on a 1.47 GB fixture with 1 000 000 BSON documents).
* Parse time scales linearly with file size. (You can check this by running ```mix bench```.)

```
$ mix bench
Settings:
duration: 1.0 s
## EachBench
[18:49:38] 1/10: read and iterate 1 document
[18:49:41] 2/10: read and iterate 30 documents
[18:49:42] 3/10: read and iterate 300 documents
[18:49:44] 4/10: read and iterate 30_000 documents
[18:49:45] 5/10: read and iterate 3_000 documents
[18:49:47] 6/10: stream and iterate 1 document
[18:49:48] 7/10: stream and iterate 30 documents
[18:49:50] 8/10: stream and iterate 300 documents
[18:49:51] 9/10: stream and iterate 30_000 documents
[18:49:56] 10/10: stream and iterate 3_000 documents
Finished in 20.43 seconds
## EachBench
read and iterate 1 document 20000 100.07 µs/op
stream and iterate 1 document 10000 150.70 µs/op
read and iterate 30 documents 1000 1327.53 µs/op
stream and iterate 30 documents 1000 1424.17 µs/op
read and iterate 300 documents 100 12882.34 µs/op
stream and iterate 300 documents 100 13631.52 µs/op
read and iterate 3_000 documents 10 126870.90 µs/op
stream and iterate 3_000 documents 10 168413.20 µs/op
read and iterate 30_000 documents 1 1301289.00 µs/op
stream and iterate 30_000 documents 1 5083005.00 µs/op
```

* It's better to pass a file to BSONEach instead of a stream, since the streamed implementation is significantly slower (see the sketch at the end of this section).
* BSONEach is CPU-bound. It consumes 98% of CPU resources on my test environment.
* (```time``` is not the best way to measure this, but..) on large files BSONEach works almost 2 times faster compared to loading the whole file into memory and iterating over it:

@@ -22,48 +56,32 @@ File is read by 4096 byte chunks, BSONEach iterates over all documents till the
Run different task types:

```bash
$ time mix print_read test/fixtures/1000000.bson
mix print_read test/fixtures/1000000.bson 994.60s user 154.40s system 87% cpu 21:51.88 total
$ time mix count_read test/fixtures/1000000.bson
Compiling 2 files (.ex)
"Done parsing 1000000 documents."
mix count_read test/fixtures/1000000.bson 59.95s user 5.69s system 99% cpu 1:05.74 total
```

```bash
$ time mix print_each test/fixtures/1000000.bson
mix print_each test/fixtures/1000000.bson 583.67s user 66.86s system 75% cpu 14:27.26 total
$ time mix count_each test/fixtures/1000000.bson
Compiling 2 files (.ex)
Generated bsoneach app
"Done parsing 1000000 documents."
mix count_each test/fixtures/1000000.bson 45.37s user 2.74s system 102% cpu 46.876 total
```
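
For context, a ```count_each```-style task boils down to opening the file, iterating with ```BSONEach.each``` and incrementing a counter. The following is a hypothetical sketch, not the actual task shipped in ```lib/mix/tasks``` (the module name and the Agent-based counter are assumptions):

```elixir
defmodule Mix.Tasks.CountEachExample do
  @moduledoc "Hypothetical illustration of a count_each-style mix task."
  use Mix.Task

  def run([path]) do
    # Keep the number of parsed documents in a simple Agent-based counter
    {:ok, counter} = Agent.start_link(fn -> 0 end)

    path
    |> BSONEach.File.open
    |> BSONEach.each(fn _document -> Agent.update(counter, &(&1 + 1)) end)
    |> File.close

    IO.inspect "Done parsing #{Agent.get(counter, &(&1))} documents."
  end
end
```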

* Pass a file to BSONEach instead of a stream, since the streamed implementation is significantly slower:

```bash
$ mix bench
Compiling 1 file (.ex)

Settings:
  duration: 1.0 s

## EachBench
[15:02:11] 1/10: read and iterate 1 document
[15:02:12] 2/10: read and iterate 30 documents
[15:02:15] 3/10: read and iterate 300 documents
[15:02:18] 4/10: read and iterate 30_000 documents
[15:02:21] 5/10: read and iterate 3_000 documents
[15:02:23] 6/10: stream and iterate 1 document
[15:02:26] 7/10: stream and iterate 30 documents
[15:02:28] 8/10: stream and iterate 300 documents
[15:02:30] 9/10: stream and iterate 30_000 documents
[15:04:37] 10/10: stream and iterate 3_000 documents
Finished in 151.93 seconds
## EachBench
read and iterate 1 document 10000 140.63 µs/op
stream and iterate 1 document 10000 190.69 µs/op
read and iterate 30 documents 1000 2601.48 µs/op
stream and iterate 30 documents 500 3198.02 µs/op
read and iterate 300 documents 100 25354.27 µs/op
stream and iterate 300 documents 50 41764.02 µs/op
read and iterate 3_000 documents 10 252262.90 µs/op
read and iterate 30_000 documents 1 2514610.00 µs/op
stream and iterate 3_000 documents 1 6238468.00 µs/op
stream and iterate 30_000 documents 1 126495171.00 µs/op
```

* This implementation works faster than the [timkuijsten/node-bson-stream](https://github.com/timkuijsten/node-bson-stream) NPM package (compared with Node.js on a file with 30k documents):

```bash
$ time mix count_each test/fixtures/30000.bson
"Done parsing 30000 documents."
mix count_each test/fixtures/30000.bson 1.75s user 0.35s system 114% cpu 1.839 total
```

```bash
$ time node index.js
Read 30000 documents.
node index.js 2.09s user 0.05s system 100% cpu 2.139 total
```
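
To make the file-vs-stream point concrete, here is a minimal sketch of the two call styles, adapted from the benchmark code later in this diff (the fixture path and the ```process_bson_document/1``` callback are placeholders):

```elixir
# Faster: hand BSONEach an IO device opened by BSONEach.File.open
"test/fixtures/30000.bson"
|> BSONEach.File.open
|> BSONEach.each(&process_bson_document/1)
|> File.close

# Slower: hand BSONEach a stream built by BSONEach.File.stream
# (closing the file at the end mirrors the benchmark code)
"test/fixtures/30000.bson"
|> BSONEach.File.stream
|> BSONEach.each(&process_bson_document/1)
|> File.close
```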

## Installation
@@ -92,7 +110,7 @@ It's available on [hex.pm](https://hex.pm/packages/bsoneach) and can be installe

```elixir
"test/fixtures/300.bson" # File path
|> File.open!([:read, :binary, :raw]) # Open file in :binary, :raw mode
|> BSONEach.File.open # Open file in :binary, :raw, :read_ahead modes
|> BSONEach.each(&process_bson_document/1) # Send IO.device to BSONEach.each function and pass a callback
|> File.close # Don't forget to close referenced file
```
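
The ```BSONEach.File.open``` helper used above opens the file in ```:binary```, ```:raw``` and ```:read_ahead``` modes. The real module is added elsewhere in this commit and is not shown in this hunk; a rough, hypothetical sketch of such a wrapper could look like this (the module name is changed to make clear it is only a sketch):

```elixir
defmodule BSONEach.FileSketch do
  @moduledoc false

  # 4096-byte chunks, matching the chunk size mentioned at the top of this README
  @chunk_size 4_096

  # Open an IO device suitable for BSONEach.each/2
  def open(path) do
    File.open!(path, [:read, :binary, :raw, :read_ahead])
  end

  # Build a byte stream suitable for BSONEach.each/2
  def stream(path) do
    File.stream!(path, [:read, :binary, :raw, :read_ahead], @chunk_size)
  end
end
```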
@@ -107,3 +125,7 @@ It's available on [hex.pm](https://hex.pm/packages/bsoneach) and can be installe
```

When you process large files, it's a good idea to process documents asynchronously; you can find more info [here](http://elixir-lang.org/docs/stable/elixir/Task.html).
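
One way to do that is to offload each callback to a supervised task. A minimal sketch with no back-pressure handling (the ```process_bson_document/1``` callback is a placeholder, as above):

```elixir
# Start a task supervisor (in a real app this belongs in your supervision tree)
{:ok, supervisor} = Task.Supervisor.start_link()

"test/fixtures/300.bson"
|> BSONEach.File.open
|> BSONEach.each(fn document ->
  # Each document is processed in its own short-lived task
  Task.Supervisor.start_child(supervisor, fn -> process_bson_document(document) end)
end)
|> File.close
```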

## Thanks

I want to thank @ericmj for his MongoDB driver. All code that encodes and decodes BSON was taken from his repo.
25 changes: 10 additions & 15 deletions bench/each_bench.exs
@@ -13,72 +13,67 @@ defmodule EachBench do

bench "read and iterate 1 document", [fixtures: get_fixtures()] do
fixtures[:single]
|> File.open!([:read, :binary, :raw])
|> BSONEach.File.open
|> BSONEach.each(&foo/1)
|> File.close
end

bench "read and iterate 30 documents", [fixtures: get_fixtures()] do
fixtures[:small]
|> File.open!([:read, :binary, :raw])
|> BSONEach.File.open
|> BSONEach.each(&foo/1)
|> File.close
end

bench "read and iterate 300 documents", [fixtures: get_fixtures()] do
fixtures[:medium]
|> File.open!([:read, :binary, :raw])
|> BSONEach.File.open
|> BSONEach.each(&foo/1)
|> File.close
end

bench "read and iterate 3_000 documents", [fixtures: get_fixtures()] do
fixtures[:large]
|> File.open!([:read, :binary, :raw])
|> BSONEach.File.open
|> BSONEach.each(&foo/1)
|> File.close
end

bench "read and iterate 30_000 documents", [fixtures: get_fixtures()] do
fixtures[:xlarge]
|> File.open!([:read, :binary, :raw])
|> BSONEach.File.open
|> BSONEach.each(&foo/1)
|> File.close
end

bench "stream and iterate 1 document", [fixtures: get_fixtures()] do
fixtures[:single]
|> File.stream!([:read, :binary, :raw, :read_ahead], 4096)
|> BSONEach.File.stream
|> BSONEach.each(&foo/1)
|> File.close
end

bench "stream and iterate 30 documents", [fixtures: get_fixtures()] do
fixtures[:small]
|> File.stream!([:read, :binary, :raw, :read_ahead], 4096)
|> BSONEach.File.stream
|> BSONEach.each(&foo/1)
|> File.close
end

bench "stream and iterate 300 documents", [fixtures: get_fixtures()] do
fixtures[:medium]
|> File.stream!([:read, :binary, :raw, :read_ahead], 4096)
|> BSONEach.File.stream
|> BSONEach.each(&foo/1)
|> File.close
end

bench "stream and iterate 3_000 documents", [fixtures: get_fixtures()] do
fixtures[:large]
|> File.stream!([:read, :binary, :raw, :read_ahead], 4096)
|> BSONEach.File.stream
|> BSONEach.each(&foo/1)
|> File.close
end

bench "stream and iterate 30_000 documents", [fixtures: get_fixtures()] do
fixtures[:xlarge]
|> File.stream!([:read, :binary, :raw, :read_ahead], 4096)
|> BSONEach.File.stream
|> BSONEach.each(&foo/1)
|> File.close
end

def foo(_) do
5 changes: 4 additions & 1 deletion config/.credo.exs
@@ -4,7 +4,10 @@
name: "default",
files: %{
included: ["lib/"],
excluded: ["lib/mix/tasks"]
excluded: [
"lib/mix/tasks",
"lib/bson/binary_utils.ex" # TODO: https://github.com/rrrene/credo/issues/144
]
},
checks: [
{Credo.Check.Design.TagTODO, exit_status: 0}
1 change: 1 addition & 0 deletions config/dogma.exs
@@ -5,5 +5,6 @@ config :dogma,
rule_set: Dogma.RuleSet.All,
override: [
%Rule.LineLength{ max_length: 120 },
%Rule.TakenName{ enabled: false }, # TODO: https://github.com/lpil/dogma/issues/201
%Rule.InfixOperatorPadding{ enabled: false }
]
1 change: 1 addition & 0 deletions coveralls.json
@@ -1,6 +1,7 @@
{
"skip_files": [
"lib/mix/*",
"lib/bson/",
"lib/counter_agent/*"
]
}
35 changes: 35 additions & 0 deletions lib/bson/binary_utils.ex
@@ -0,0 +1,35 @@
defmodule BSON.BinaryUtils do
@moduledoc false

defmacro int64 do
quote do: signed-little-64
end

defmacro int32 do
quote do: signed-little-32
end

defmacro int16 do
quote do: signed-little-16
end

defmacro uint16 do
quote do: unsigned-little-16
end

defmacro int8 do
quote do: signed-little-8
end

defmacro float64 do
quote do: float-little-64
end

defmacro float32 do
quote do: float-little-32
end

defmacro binary(size) do
quote do: binary-size(unquote(size))
end
end
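
For context, these macros are meant to be imported and used inside binary patterns. A small hypothetical example (the wrapper module and function are illustrative, not part of this commit):

```elixir
defmodule BSON.BinaryUtilsExample do
  @moduledoc false
  import BSON.BinaryUtils

  # A BSON document starts with its total size encoded as a
  # little-endian signed 32-bit integer.
  def document_size(<<size::int32, _rest::binary>>), do: size
end
```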
