SortedSet Open Source Release
ihumanable committed May 10, 2019
0 parents commit d19ede4
Showing 37 changed files with 4,686 additions and 0 deletions.
9 changes: 9 additions & 0 deletions .formatter.exs
@@ -0,0 +1,9 @@
[
  inputs: [
    "lib/**/*.{ex,exs}",
    "test/**/*.{ex,exs}",
    "config/**/*.exs",
    "bench/**/*.exs",
    "mix.exs"
  ]
]
29 changes: 29 additions & 0 deletions .gitignore
@@ -0,0 +1,29 @@
# The directory Mix will write compiled artifacts to.
/_build

# The directory that rustler will write compiled artifacts to.
/priv

# The directory that rust will compile to for testing.
/native/sorted_set_nif/target

# If you run "mix test --cover", coverage assets end up here.
/cover

# The directory Mix downloads your dependencies sources to.
/deps

# Where 3rd-party dependencies like ExDoc output generated docs.
/doc

# Ignore .fetch files in case you like to edit your project deps locally.
/.fetch

# If the VM crashes, it generates a dump, let's ignore it too.
erl_crash.dump

# Also ignore archive artifacts (built via "mix archive.build").
*.ez

# Ignore the output of benchmarking
/bench/results
25 changes: 25 additions & 0 deletions .travis.yml
@@ -0,0 +1,25 @@
sudo: false
language: elixir
git:
  depth: 3
env:
  - MIX_ENV=test
script:
  - mix test
matrix:
  include:
    - name: "Elixir 1.5.3 OTP 20.3.8"
      elixir: 1.5.3
      otp_release: 20.3.8
    - name: "Elixir 1.6.6 OTP 20.3.8"
      elixir: 1.6.6
      otp_release: 20.3.8
    - name: "Elixir 1.6.6 OTP 21.1.1"
      elixir: 1.6.6
      otp_release: 21.1.1
    - name: "Elixir 1.7.3 OTP 20.3.8"
      elixir: 1.7.3
      otp_release: 20.3.8
    - name: "Elixir 1.7.3 OTP 21.1.1"
      elixir: 1.7.3
      otp_release: 21.1.1
21 changes: 21 additions & 0 deletions LICENSE
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2018 Discord

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
132 changes: 132 additions & 0 deletions README.md
@@ -0,0 +1,132 @@
# Discord.SortedSet

[![Master](https://travis-ci.org/discordapp/sorted_set_nif.svg?branch=master)](https://travis-ci.org/discordapp/sorted_set_nif)
[![Hex.pm Version](http://img.shields.io/hexpm/v/sorted_set_nif.svg?style=flat)](https://hex.pm/packages/sorted_set_nif)

SortedSet is a fast and efficient data structure that keeps its elements sorted and unique
(see Guarantees and Functionality below). The core data structure and algorithms are implemented
as a Native Implemented Function (NIF) in Rust, using the [Rustler crate](https://github.com/hansihe/rustler).

## Installation

Add SortedSet to your dependencies, then install with `mix do deps.get, deps.compile`:

```elixir
def deps do
  [
    {:sorted_set_nif, "~> 1.0.0"}
  ]
end
```

## Implementation Details

Internally the Elixir terms stored in the SortedSet are converted to Rust equivalents and
stored in a Vector of Vectors. The structure is similar to a skip-list: almost every operation
on the SortedSet performs a linear scan through the buckets to find the bucket that owns the
term, then a binary search within that bucket to complete the operation.

Why not just a Vector of Terms? This approach was explored, but when the Vector needs to grow
beyond its capacity, copying Terms over to the new, larger Vector proved to be a performance
bottleneck. With a Vector of Vectors, only the Bucket pointers need to be copied when
additional capacity is required, which is fast.

This strategy provides a reasonable trade-off between performance and implementation complexity.

When using a SortedSet, the caller can tune bucket sizes to their use case. A default bucket
size of 500 was chosen as it provides good performance for most use cases. See `new/2` for
details on how to provide custom tuning details.
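
As an illustration of the scan-then-search strategy described above, here is a minimal Rust
sketch. It is not the library's actual code: the `BucketedSet` type, its method names, the
`Result<usize, usize>` return convention, and the split-in-half policy for full buckets are all
assumptions made for this example.

```rust
/// Hypothetical sketch of a bucketed sorted set (a Vector of Vectors).
/// All names here are illustrative and not taken from the library.
pub struct BucketedSet<T: Ord> {
    buckets: Vec<Vec<T>>,
    bucket_capacity: usize,
}

impl<T: Ord> BucketedSet<T> {
    pub fn new(bucket_capacity: usize) -> Self {
        assert!(bucket_capacity >= 2, "a bucket must be able to split in two");
        BucketedSet {
            buckets: vec![Vec::new()],
            bucket_capacity,
        }
    }

    /// Inserts `item`. Returns `Ok(index)` if it was added, or
    /// `Err(index)` if an equal item was already present.
    pub fn add(&mut self, item: T) -> Result<usize, usize> {
        // Linear scan: stop at the first bucket whose last element is >= item,
        // falling back to the final bucket. `offset` counts the items skipped.
        let mut offset = 0;
        let mut b = 0;
        while b + 1 < self.buckets.len() {
            match self.buckets[b].last() {
                Some(last) if *last >= item => break,
                _ => {
                    offset += self.buckets[b].len();
                    b += 1;
                }
            }
        }

        // Binary search inside the owning bucket.
        match self.buckets[b].binary_search(&item) {
            Ok(pos) => Err(offset + pos),
            Err(pos) => {
                self.buckets[b].insert(pos, item);
                // Assumed policy: split an over-full bucket in half so that
                // growth only moves one bucket's items plus bucket pointers.
                if self.buckets[b].len() > self.bucket_capacity {
                    let mid = self.buckets[b].len() / 2;
                    let upper = self.buckets[b].split_off(mid);
                    self.buckets.insert(b + 1, upper);
                }
                Ok(offset + pos)
            }
        }
    }

    /// Total number of items across all buckets.
    pub fn len(&self) -> usize {
        self.buckets.iter().map(Vec::len).sum()
    }

    /// All items in sorted order.
    pub fn to_vec(&self) -> Vec<&T> {
        self.buckets.iter().flatten().collect()
    }
}
```

With a small bucket capacity the splitting is easy to observe: `BucketedSet::new(2)` followed by
adding 30, 10, and 20 leaves two buckets, and a later `add(20)` reports the existing item's
position rather than inserting it again.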

## Guarantees

1. Terms in the SortedSet are sorted according to the Elixir term ordering rules.
2. SortedSet is a Set: any item appears at most once.
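
To make the first guarantee concrete, the sketch below models the Elixir term ordering for a
handful of types in Rust. Elixir orders terms of different types as number < atom < tuple <
list < bitstring (other types omitted here), compares tuples by size before contents, and
compares lists element by element. The `Term` enum and `type_rank` helper are hypothetical names
for this example and are not taken from the library's source.

```rust
use std::cmp::Ordering;

/// Hypothetical, simplified model of a few Elixir term types.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Term {
    Integer(i64),
    Atom(String),
    Tuple(Vec<Term>),
    List(Vec<Term>),
    Bitstring(Vec<u8>),
}

impl Term {
    /// Rank used when comparing terms of different types:
    /// number < atom < tuple < list < bitstring.
    fn type_rank(&self) -> u8 {
        match self {
            Term::Integer(_) => 0,
            Term::Atom(_) => 1,
            Term::Tuple(_) => 2,
            Term::List(_) => 3,
            Term::Bitstring(_) => 4,
        }
    }
}

impl Ord for Term {
    fn cmp(&self, other: &Self) -> Ordering {
        match (self, other) {
            (Term::Integer(a), Term::Integer(b)) => a.cmp(b),
            (Term::Atom(a), Term::Atom(b)) => a.cmp(b),
            // Tuples compare by size first, then element by element.
            (Term::Tuple(a), Term::Tuple(b)) => a.len().cmp(&b.len()).then_with(|| a.cmp(b)),
            // Lists compare element by element; length only breaks ties.
            (Term::List(a), Term::List(b)) => a.cmp(b),
            (Term::Bitstring(a), Term::Bitstring(b)) => a.cmp(b),
            // Different types: fall back to the cross-type ranking.
            _ => self.type_rank().cmp(&other.type_rank()),
        }
    }
}

impl PartialOrd for Term {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}
```

Under this ordering, a set holding `1`, `:atom`, and `"hi there!"` iterates in exactly that
order, regardless of the order in which the items were added.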

## Functionality

SortedSet provides some functionality beyond the sorting and uniqueness guarantees.

1. SortedSet has a defined ordering, unlike a pure mathematical set.
2. SortedSet can report the index at which an item was added or removed, thanks to its defined
ordering.
3. SortedSet can provide random access to items and slices, thanks to its defined ordering.
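
The index-aware operations listed above can be sketched in Rust over a single sorted `Vec`.
This is a simplified illustration only: the `IndexedSet` type, the method names `add`, `remove`,
`at`, and `slice`, and the `Option<usize>` return values are assumptions for this example and
may not match the library's API or return shapes.

```rust
/// Hypothetical sketch of index-reporting operations on a sorted, unique Vec.
pub struct IndexedSet<T: Ord> {
    items: Vec<T>,
}

impl<T: Ord> IndexedSet<T> {
    pub fn new() -> Self {
        IndexedSet { items: Vec::new() }
    }

    /// Adds `item`, returning the index it was inserted at,
    /// or `None` if it was already present.
    pub fn add(&mut self, item: T) -> Option<usize> {
        match self.items.binary_search(&item) {
            Ok(_) => None,
            Err(pos) => {
                self.items.insert(pos, item);
                Some(pos)
            }
        }
    }

    /// Removes `item`, returning the index it occupied,
    /// or `None` if it was not in the set.
    pub fn remove(&mut self, item: &T) -> Option<usize> {
        let pos = self.items.binary_search(item).ok()?;
        self.items.remove(pos);
        Some(pos)
    }

    /// Random access by position in sorted order.
    pub fn at(&self, index: usize) -> Option<&T> {
        self.items.get(index)
    }

    /// Up to `amount` items starting at `start`, clamped to the set's bounds.
    pub fn slice(&self, start: usize, amount: usize) -> &[T] {
        let start = start.min(self.items.len());
        let end = start.saturating_add(amount).min(self.items.len());
        &self.items[start..end]
    }
}
```

Because the position of every item is determined by the ordering, a caller can keep an external
view (for example, a rendered list) in sync by applying the reported index instead of re-reading
the whole set.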

## Caveats

1. Due to SortedSet's implementation, some operations that are constant time in other sets have
different performance characteristics in SortedSet. These are noted on the operations.
2. SortedSets do not support some types of Elixir Terms, namely `reference`, `pid`, `port`,
`function`, and `float`. Attempting to store any of these types (or an allowed composite
type containing one of the disallowed types) results in the error
`{:error, :unsupported_type}`.
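
The second caveat, including the rule that a composite is rejected when any nested element is
unsupported, can be sketched in Rust as a recursive conversion. The `RawTerm`, `SupportedTerm`,
and `UnsupportedType` names and the `convert` function are hypothetical; they model the
behaviour described above and are not the library's actual types.

```rust
/// Hypothetical, simplified view of terms as they might arrive from the VM.
#[derive(Debug, Clone, PartialEq)]
pub enum RawTerm {
    Integer(i64),
    Float(f64),
    Atom(String),
    Bitstring(Vec<u8>),
    Tuple(Vec<RawTerm>),
    List(Vec<RawTerm>),
    Reference,
    Pid,
    Port,
    Function,
}

/// Terms the set is willing to store (illustrative subset).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum SupportedTerm {
    Integer(i64),
    Atom(String),
    Bitstring(Vec<u8>),
    Tuple(Vec<SupportedTerm>),
    List(Vec<SupportedTerm>),
}

/// Stands in for the `{:error, :unsupported_type}` result.
#[derive(Debug, PartialEq, Eq)]
pub struct UnsupportedType;

/// Converts a term, failing if it (or anything nested inside it) is unsupported.
pub fn convert(term: &RawTerm) -> Result<SupportedTerm, UnsupportedType> {
    match term {
        RawTerm::Integer(i) => Ok(SupportedTerm::Integer(*i)),
        RawTerm::Atom(a) => Ok(SupportedTerm::Atom(a.clone())),
        RawTerm::Bitstring(b) => Ok(SupportedTerm::Bitstring(b.clone())),
        // Collecting into a Result stops at the first unsupported element.
        RawTerm::Tuple(items) => items
            .iter()
            .map(convert)
            .collect::<Result<Vec<_>, _>>()
            .map(SupportedTerm::Tuple),
        RawTerm::List(items) => items
            .iter()
            .map(convert)
            .collect::<Result<Vec<_>, _>>()
            .map(SupportedTerm::List),
        RawTerm::Float(_)
        | RawTerm::Reference
        | RawTerm::Pid
        | RawTerm::Port
        | RawTerm::Function => Err(UnsupportedType),
    }
}
```

In this sketch a tuple such as `{:ok, [pid]}` is rejected as a whole, so a partially converted
composite never reaches the set.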

## Documentation

Documentation is [hosted on hexdocs](https://hexdocs.pm/sorted_set_nif).

For a local copy of the documentation, the `mix.exs` file is already set up for generating
documentation, simply run the following commands to generate the documentation from source.

```bash
$ mix deps.get
$ mix docs
```

## Running the Tests

There are two test suites available in this library. The first is an ExUnit test suite that
tests the correctness of the implementation from a black box point of view. Run it with
`mix test` in the root of the library.

The second is the set of tests in the Rust code. Run these with `cargo test` in the
`native/sorted_set_nif` directory.

## Running the Benchmarks

Before running any benchmarks, remember that during development the NIF is built unoptimized.
Make sure to rebuild an optimized version of the NIF before running the benchmarks.

The benchmarks live in the `bench` folder. They are written with
[Benchee](https://github.com/PragTob/benchee) and can be run with the following command.

```bash
$ OPTIMIZE_NIF=true mix run bench/{benchmark}.exs
```

Setting `OPTIMIZE_NIF=true` forces the benchmark to run against the fully optimized NIF.

## Basic Usage

SortedSet lives in the `Discord` namespace to prevent symbol collisions. It can be used directly:

```elixir
defmodule ExampleModule do
  def get_example_sorted_set() do
    Discord.SortedSet.new()
    |> Discord.SortedSet.add(1)
    |> Discord.SortedSet.add(:atom)
    |> Discord.SortedSet.add("hi there!")
  end
end
```

You can always add an `alias` to make this code less verbose:

```elixir
defmodule ExampleModule do
  alias Discord.SortedSet

  def get_example_sorted_set() do
    SortedSet.new()
    |> SortedSet.add(1)
    |> SortedSet.add(:atom)
    |> SortedSet.add("hi there!")
  end
end
```

Full API documentation is available, along with a full test suite containing examples of how
the library can be used.
87 changes: 87 additions & 0 deletions bench/add.exs
@@ -0,0 +1,87 @@
add_scenario = fn inputs, size ->
  cell_size = 500

  prefix =
    size
    |> Integer.floor_div(1000)
    |> Integer.to_string(10)
    |> String.pad_leading(4, "0")

  padded_size =
    size
    |> Integer.to_string(10)
    |> String.pad_leading(7, " ")

  [:beginning, :middle, :ending]
  |> Enum.with_index(1)
  |> Enum.reduce(inputs, fn {placement, idx}, inputs ->
    human_placement =
      placement
      |> Atom.to_string()
      |> String.capitalize()

    key = "#{prefix}-#{idx}. #{padded_size} Set // #{cell_size} cell // #{human_placement}"
    Map.put(inputs, key, {size, cell_size, placement})
  end)
end

make_input = fn {size, cell_size, position} ->
  set =
    1..size
    |> Enum.map(&(&1 * 10_000))
    |> Discord.SortedSet.from_proper_enumerable(cell_size)

  item =
    case position do
      :beginning ->
        15000

      :middle ->
        size * 5000 + 5000

      :ending ->
        size * 10000 + 5000
    end

  {set, item, size}
end

Benchee.run(
  %{
    "Add 1000 New Items" => fn {set, item, size} ->
      for i <- 1..1000 do
        Discord.SortedSet.add(set, item + i)
      end

      {set, size}
    end
  },
  inputs:
    %{}
    |> add_scenario.(5000)
    |> add_scenario.(50_000)
    |> add_scenario.(250_000)
    |> add_scenario.(500_000)
    |> add_scenario.(750_000)
    |> add_scenario.(1_000_000),
  before_each: make_input,
  after_each: fn {set, size} ->
    expected = size + 1000
    actual = Discord.SortedSet.size(set)

    if expected != actual do
      raise "Set size incorrect: expected #{expected} but found #{actual}"
    end
  end,
  formatters: [
    &Benchee.Formatters.Console.output/1,
    &Benchee.Formatters.HTML.output/1
  ],
  formatter_options: [
    html: [file: "bench/results/add/html/add.html"]
  ],
  save: %{
    path: "bench/results/add/runs"
  },
  time: 60
)
48 changes: 48 additions & 0 deletions bench/construction.exs
@@ -0,0 +1,48 @@
# The module is defined as Discord.SortedSet; alias it so the short name resolves.
alias Discord.SortedSet

make_inputs = fn size ->
  sorted = Enum.to_list(1..size)
  shuffled = Enum.shuffle(sorted)
  {size, sorted, shuffled}
end

Benchee.run(
  %{
    "Sorted Iterative Construction" => fn {size, _, _} ->
      Enum.reduce(1..size, SortedSet.new(), &SortedSet.add(&2, &1))
      :ok
    end,
    "Sorted Proper Enumerable Construction" => fn {_, sorted, _} ->
      SortedSet.from_proper_enumerable(sorted)
      :ok
    end,
    "Sorted Proper Enumerable Chunked Construction" => fn {_, sorted, _} ->
      SortedSet.from_proper_enumerable_chunked(sorted)
      :ok
    end,
    "Shuffle Enumerable Construction" => fn {_, _, shuffled} ->
      SortedSet.from_enumerable(shuffled)
      :ok
    end,
    "Shuffle Enumerable Chunked Construction" => fn {_, _, shuffled} ->
      SortedSet.from_enumerable_chunked(shuffled)
      :ok
    end
  },
  inputs: %{
    "1. 5,000 Items" => make_inputs.(5000),
    "2. 50,000 Items" => make_inputs.(50_000),
    "3. 250,000 Items" => make_inputs.(250_000),
    "4. 500,000 Items" => make_inputs.(500_000),
    "5. 750,000 Items" => make_inputs.(750_000),
    "6. 1,000,000 Items" => make_inputs.(1_000_000)
  },
  formatters: [
    &Benchee.Formatters.Console.output/1,
    &Benchee.Formatters.HTML.output/1
  ],
  formatter_options: [
    html: [file: "bench/results/construction/html/construction.html"]
  ],
  save: %{
    path: "bench/results/construction/runs"
  }
)
1 change: 1 addition & 0 deletions config/config.exs
@@ -0,0 +1 @@
use Mix.Config
