Skip to content
A read-optimized, in-memory, columnar store database (column-oriented DBMS)
Go Makefile
Branch: master
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Type Name Latest commit message Commit time
Failed to load latest commit information.
Godeps
cli
data
error
index
input
query
server
.gitignore
LICENSE
Makefile
README.md
main.go

README.md

PDQdb

A read-optimized, in-memory, columnar store database (column-oriented DBMS). It's pretty Damn Quick.

Instructions

  1. Install goLang on your computer
  2. $ git clone https://github.com/adam-hanna/PDQdb.git
  3. $ make
  • Only builds for Linux and MacOS, for now
  1. $ PDQdb -f "path/to/your/file.csv" -c "path/to/your/config.json"
  2. http api (returns json)
  • Grab data by key: curl -v -XGET http://127.0.0.1:38216/key/{your key}
  • Count keys: curl -v -XGET http://127.0.0.1:38216/count
  • Grab data by query: see the section on weeQL

weeQL

weeQL is the query language written for PDQdb. It was inspired by SQL and implements many of the features of the SQL language; however, it is a very lightweight implementation and does not include all the functionality of SQL. Pronounced "wee" + "quill". Get it? Wee SQL? Tiny SQL?

All weeQL queries are made by sending a json query object using the http POST method to http://127.0.0.1:38216/query

The json query object is structured as follows:

{
  "SELECT":  [ "COL1", "COL2", ... ], // SITUATIONAL
  "COUNT": "*", // SITUATIONAL
  "WHERE":   { "FIELD1": "VAL1", "FIELD2": "VAL2", ... }, // see the section on where
  "GROUP BY": "COL1" // OPTIONAL
}

The "WHERE" property supports multiple logical operators:

  • `"$OR": [ { "FIELD1": "VAL1" }, {"FIELD2": "VAL2" }, ... ]`
  • `"$NOT": { "FIELD1": "VAL1", "FIELD2": "VAL2", ... }`
  • `"$NOR": { "FIELD1": "VAL1", "FIELD2": "VAL2", ... }`; This is the same as `"$NOT": { "$OR": [ ... ] }`
  • `"FIELD1": "$IN": [ "VAL1", "VAL2", ... ]`
  • `"FIELD1": "$NIN": [ "VAL1", "VAL2", ... ]` This is the same as `"$NOT": { "$IN": [ ... ] }`

Exporting and aggregating is explained in more detail, below.

1. Exporting Data

Exporting data is done with the `"SELECT"` query parameter (omit `"COUNT"` and `"GROUP BY"`)
Properties
  • `"SELECT" : [ “FIELD 1”, “FIELD 2”, … ]`: SITUATIONAL. An array of strings that indicates the columns to be returned. Omitted if using `"COUNT"`!
  • `"WHERE": { "FIELD1": "VAL1", "FIELD2": "VAL2", ... }`: a subdocument of filters. Multiple filters are returned as the intersection of data that meet each criteria (i.e. "FIELD1" = "VAL1" AND "FIELD2" = "VAL2"). Other logical operators besides AND are available. See the section on `"WHERE"` for more info.
example
query: ``` { SELECT : [ “ID” ], WHERE: { “A”: “foo” } } ```

yields:

[
  {
    "ID": "1"
  },
  {
    "ID": "3"
  },
  {
    "ID": "4"
  }
]

2. Aggregation

Perform counts on data with query parameters and an optional `"GROUP BY"` command (omit `"SELECT"`)
Properties
  • `"COUNT": "*"`: SITUATIONAL. The only value currently supported is "*". Omitted if using `"SELECT"`!
  • `"WHERE": { "FIELD1": "VAL1", "FIELD2": "VAL2", ... }`: a subdocument of filters. Multiple filters are returned as the intersection of data that meet each criteria (i.e. "FIELD1" = "VAL1" AND "FIELD2" = "VAL2"). Other logical operators besides AND are available. See the section on `"WHERE"` for more info.
  • `"GROUP BY": "COL1"`: OPTIONAL. Only valid with `"COUNT"` queries. This is the string column name by which to group count results.
examples
query #1:
{
  “COUNT”: “*”,
  "WHERE": {
    “A”: “foo”
  }
}

yields:

{ "COUNT": 3 }

query #2 with "GROUP BY"

{
  “COUNT”: “*”,
  "WHERE": {
      “A”: “foo”
  },
  “GROUP BY”: “B”
}

yields:

{
  "bar": 0,
  "barbar": 0,
  "barbarbar": 0,
  "test": 1,
  "testtest": 1,
  "testtesttest": 1
}

query #3 with "GROUP BY" and "$OR":

{
  “COUNT”: “*”,
  "WHERE": {
    "A": "foo",
    "$OR": [
      { "B": "testtesttest" },
      { "B": "testtest" }
    ]
  },
  “GROUP BY”: “B”
}

yields:

{
  "bar": 0,
  "barbar": 0,
  "barbarbar": 0,
  "test": 0,
  "testtest": 1,
  "testtesttest": 1
}

Example data

###example.csv

ID,A,B
1,foo,test
2,foobar,testtest
3,foo,testtesttest
4,foo,testtest
5,foobarbar,bar
6,foofoo,barbar
7,foofoo,barbarbar

###example_config.json

{
  "header": [
    {"ID": "string"},
    {"A": "string"},
    {"B": "string"}
  ],
  "id_field": "ID",
  "index_fields": ["A", "B"],
  "start_at_line": 2
}  
You can’t perform that action at this time.