This repository was archived by the owner on Sep 28, 2022. It is now read-only.
Merged
2 changes: 1 addition & 1 deletion CHANGELOG.md
@@ -28,7 +28,7 @@
* **v0.7.0** (2017-10-04):
* Added support for creating range encoded frames.
* Added `Xor` call.
* Added support for excluding bits or attributes from bitmap calls. In order to exclude bits, call `setExcludeBits(true)` in your `QueryOptions.Builder`. In order to exclude attributes, call `setExcludeAttributes(true)`.
* Added support for excluding bits or attributes from bitmap calls.
* Added range field operations.
* Customizable CSV timestamp format (Contributed by @lachlanorr).
* **Deprecation** Row and column labels are deprecated, and will be removed in a future release of this library. Do not use `column_label` field when creating `Index` objects and do not use `row_label` field when creating `Frame` objects for new code. See: https://github.com/pilosa/pilosa/issues/752 for more info.
5 changes: 4 additions & 1 deletion Makefile
@@ -1,7 +1,7 @@
SRC_DIR = pilosa/internal
DST_DIR = pilosa/internal

.PHONY: cover doc generate test test-all
.PHONY: build clean cover doc generate test test-all release upload

cover:
py.test --cov=pilosa tests integration_tests
@@ -25,3 +25,6 @@ upload:
twine upload dist/*

release: build upload

clean:
rm -rf build dist pilosa.egg-info
20 changes: 10 additions & 10 deletions README.md
@@ -8,7 +8,7 @@

<img src="https://www.pilosa.com/img/ce.svg" style="float: right" align="right" height="301">

Python client for Pilosa high performance distributed bitmap index.
Python client for Pilosa high performance distributed row index.

## What's New?

@@ -44,31 +44,31 @@ schema = client.schema()
# Create an Index object
myindex = schema.index("myindex")

# Create a Frame object
myframe = myindex.frame("myframe")
# Create a Field object
myfield = myindex.field("myfield")

# make sure the index and frame exists on the server
# make sure the index and field exists on the server
client.sync_schema(schema)

# Send a SetBit query. PilosaError is thrown if execution of the query fails.
client.query(myframe.setbit(5, 42))
client.query(myfield.set(5, 42))

# Send a Bitmap query. PilosaError is thrown if execution of the query fails.
response = client.query(myframe.bitmap(5))
response = client.query(myfield.row(5))

# Get the result
result = response.result

# Act on the result
if result:
    bits = result.bitmap.bits
    print("Got bits: ", bits)
    columns = result.row.columns
    print("Got columns: ", columns)

# You can batch queries to improve throughput
response = client.query(
    myindex.batch_query(
        myframe.bitmap(5),
        myframe.bitmap(10),
        myfield.row(5),
        myfield.row(10),
    )
)
for result in response.results:
30 changes: 15 additions & 15 deletions doc/index.rst
@@ -6,7 +6,7 @@
Welcome to Python Client for Pilosa's documentation!
====================================================

Python client for `Pilosa <https://www.pilosa.com>`_ high performance distributed bitmap index.
Python client for `Pilosa <https://www.pilosa.com>`_ high performance distributed row index.


.. toctree::
@@ -43,37 +43,37 @@ at ``localhost:10101`` (the default):
# Create the default client
client = pilosa.Client()

# Create an Index object
myindex = pilosa.Index("myindex")
# Retrieve the schema
schema = client.schema()

# Make sure the index exists on the server
client.ensure_index(myindex)
# Create an Index object
myindex = schema.index("myindex")

# Create a Frame object
myframe = myindex.frame("myframe")
# Create a Field object
myfield = myindex.field("myfield")

# Make sure the frame exists on the server
client.ensure_frame(myframe)
# make sure the index and field exists on the server
client.sync_schema(schema)

# Send a SetBit query. PilosaError is thrown if execution of the query fails.
client.query(myframe.setbit(5, 42))
client.query(myfield.set(5, 42))

# Send a Bitmap query. PilosaError is thrown if execution of the query fails.
response = client.query(myframe.bitmap(5))
response = client.query(myfield.row(5))

# Get the result
result = response.result

# Act on the result
if result:
    bits = result.bitmap.bits
    print("Got bits: ", bits)
    columns = result.row.columns
    print("Got columns: ", columns)

# You can batch queries to improve throughput
response = client.query(
    myindex.batch_query(
        myframe.bitmap(5),
        myframe.bitmap(10),
        myfield.row(5),
        myfield.row(10),
    )
)
for result in response.results:
86 changes: 39 additions & 47 deletions docs/data-model-queries.md
@@ -1,8 +1,8 @@
# Data Model and Queries

## Indexes and Frames
## Indexes and Fields

*Index* and *frame*s are the main data models of Pilosa. You can check the [Pilosa documentation](https://www.pilosa.com/docs) for more detail about the data model.
*Index* and *field*s are the main data models of Pilosa. You can check the [Pilosa documentation](https://www.pilosa.com/docs) for more detail about the data model.

The `schema.index` method is used to create an index object. Note that this does not create an index on the server; the index object simply defines the schema.

@@ -11,89 +11,89 @@ schema = pilosa.Schema()
repository = schema.index("repository")
```

Frames are created with a call to the `index.frame` method:
Fields are created with a call to the `index.field` method:

```python
stargazer = repository.frame("stargazer")
stargazer = repository.field("stargazer")
```

Similar to index objects, you can pass custom options to the `index.frame` method:
Similar to index objects, you can pass custom options to the `index.field` method:

```python
stargazer = repository.frame("stargazer", time_quantum=pilosa.TimeQuantum.YEAR_MONTH_DAY)
stargazer = repository.field("stargazer", time_quantum=pilosa.TimeQuantum.YEAR_MONTH_DAY)
```

## Queries

Once you have indexes and frame objects created, you can create queries for them. Some of the queries work on the columns; corresponding methods are attached to the index. Other queries work on rows, with related methods attached to frames.
Once you have indexes and field objects created, you can create queries for them. Some of the queries work on the columns; corresponding methods are attached to the index. Other queries work on rows, with related methods attached to fields.

For instance, `Bitmap` queries work on rows; use a frame object to create those queries:
For instance, `Row` queries work on rows; use a field object to create those queries:

```python
bitmap_query = stargazer.bitmap(1) # corresponds to PQL: Bitmap(frame='stargazer', row=1)
row_query = stargazer.row(1) # corresponds to PQL: Bitmap(field='stargazer', row=1)
```

`Union` queries work on columns; use the index object to create them:

```python
query = repository.union(bitmap_query1, bitmap_query2)
query = repository.union(row_query1, row_query2)
```

In order to increase throughput, you may want to batch queries sent to the Pilosa server. The `index.batch_query` method is used for that purpose:

```python
query = repository.batch_query(
stargazer.bitmap(1),
repository.union(stargazer.bitmap(100), stargazer.bitmap(5)))
stargazer.row(1),
repository.union(stargazer.row(100), stargazer.row(5)))
```

The recommended way of creating query objects is to use the dedicated methods attached to index and frame objects. Sometimes, however, you may want to send raw queries to Pilosa; you can use the `index.raw_query` method for that. Note that the query string is not validated before it is sent to the server:
The recommended way of creating query objects is to use the dedicated methods attached to index and field objects. Sometimes, however, you may want to send raw queries to Pilosa; you can use the `index.raw_query` method for that. Note that the query string is not validated before it is sent to the server:

```python
query = repository.raw_query("Bitmap(frame='stargazer', row=5)")
query = repository.raw_query("Bitmap(field='stargazer', row=5)")
```

This client supports [Range encoded fields](https://www.pilosa.com/docs/latest/query-language/#range-bsi). Read [Range Encoded Bitmaps](https://www.pilosa.com/blog/range-encoded-bitmaps/) blog post for more information about the BSI implementation of range encoding in Pilosa.

In order to use range encoded fields, a frame should be created with one or more integer fields. Each field should have its minimum and maximum set. Here's how you would do that using this library:
In order to use range encoded fields, a field should be created with one or more integer fields. Each field should have its minimum and maximum set. Here's how you would do that using this library:
```python
index = schema.index("animals")
frame = index.frame("traits", fields=[pilosa.IntField.int("captivity", min=0, max=956)])
field = index.field("traits", fields=[pilosa.IntField.int("captivity", min=0, max=956)])
client.sync_schema(schema)
```

If the frame with the necessary field already exists on the server, you don't need to create the field instance; `client.sync_schema(schema)` will load it into `schema`. You can then add some data:
If the field with the necessary field already exists on the server, you don't need to create the field instance; `client.sync_schema(schema)` will load it into `schema`. You can then add some data:
```python
# Add the captivity values to the field.
captivity = frame.field("captivity")
captivity = field.field("captivity")
data = [3, 392, 47, 956, 219, 14, 47, 504, 21, 0, 123, 318]
query = index.batch_query()
for i, x in enumerate(data):
    column = i + 1
    query.add(captivity.set_value(column, x))
    query.add(captivity.setvalue(column, x))
client.query(query)
```

Let's write a range query:
```python
# Query for all animals with more than 100 specimens
response = client.query(captivity.gt(100))
print(response.result.bitmap.bits)
print(response.result.row.columns)

# Query for the total number of animals in captivity
response = client.query(captivity.sum())
print(response.result.value)
```

It's possible to pass a bitmap query to `sum`, so only columns where a row is set are filtered in:
It's possible to pass a row query to `sum`, so only columns where a row is set are filtered in:
```python
# Let's run a few setbit queries first
# Let's run a few set queries first
client.query(index.batch_query(
    frame.setbit(42, 1),
    frame.setbit(42, 6)
    field.set(42, 1),
    field.set(42, 6)
))
# Query for the total number of animals in captivity where row 42 is set
response = client.query(captivity.sum(frame.bitmap(42)))
response = client.query(captivity.sum(field.row(42)))
print(response.result.value)
```
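The expected results of the range-encoded field example can be reasoned about without a server. The following is a plain-Python model, not the pilosa client: the helpers `gt` and `total` are hypothetical, and the model assumes column IDs start at 1 (as in the `enumerate` loop) and that row 42 is set on columns 1 and 6 (as in the two set queries).

```python
# Plain-Python model of the range-encoded field example.
# No Pilosa server or client is involved; helper names are hypothetical.
data = [3, 392, 47, 956, 219, 14, 47, 504, 21, 0, 123, 318]

# Column IDs start at 1, mirroring `column = i + 1` in the import loop.
captivity = {i + 1: value for i, value in enumerate(data)}

# Row 42 is set on columns 1 and 6, mirroring the two set queries.
row_42 = {1, 6}


def gt(values, threshold):
    """Return the sorted column IDs whose value is strictly greater than threshold."""
    return sorted(col for col, value in values.items() if value > threshold)


def total(values, row=None):
    """Sum the values, optionally restricted to the columns present in `row`."""
    return sum(value for col, value in values.items() if row is None or col in row)


print(gt(captivity, 100))        # [2, 4, 5, 8, 11, 12]
print(total(captivity))          # 2644
print(total(captivity, row_42))  # 17 (3 + 14)
```

Comparing these numbers with what the real queries return is a quick sanity check that the import loop assigned values to the intended columns.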

@@ -103,35 +103,27 @@ Please check [Pilosa documentation](https://www.pilosa.com/docs) for PQL details

Index:

* `union(self, *bitmaps)`
* `intersect(self, *bitmaps)`
* `difference(self, *bitmaps)`
* `count(self, bitmap)`
* `union(self, *rows)`
* `intersect(self, *rows)`
* `difference(self, *rows)`
* `count(self, row)`
* `set_column_attrs(self, column_id, attrs)`
* `xor(self, *bitmaps)`
* `xor(self, *rows)`

Frame:
Field:

* `bitmap(self, row_id)`
* `setbit(self, row_id, column_id, timestamp=None)`
* `clearbit(self, row_id, column_id)`
* `topn(self, n, bitmap=None, field="", *values)`
* `row(self, row_id)`
* `set(self, row_id, column_id, timestamp=None)`
* `clear(self, row_id, column_id)`
* `topn(self, n, row=None, field="", *values)`
* `range(self, row_id, start, end)`
* `set_row_attrs(self, row_id, attrs)`
* (**deprecated**) `inverse_bitmap(self, column_id)`
* (**deprecated**) `inverse_topn(self, n, bitmap=None, field="", *values)`
* (**deprecated**) `inverse_range(self, column_id, start, end)`
* (**deprecated**) `sum(self, bitmap, field)`
* (**deprecated**) `set_field_value(self, column_id, field, value)`

Field:

* `lt(self, n)`
* `lte(self, n)`
* `gt(self, n)`
* `gte(self, n)`
* `between(self, a, b)`
* `sum(self, bitmap=None)`
* `min(self, bitmap=None)`
* `max(self, bitmap=None)`
* `set_value(self, column_id, value)`
* `sum(self, row=None)`
* `min(self, row=None)`
* `max(self, row=None)`
* `setvalue(self, column_id, value)`
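The Index methods listed above combine rows into new sets of columns. As an illustrative model only (it says nothing about how the client serializes queries or how Pilosa evaluates them), a row can be treated as a Python set of column IDs. The helper names below mirror the listed methods but are hypothetical, and applying `xor` pairwise from left to right when more than two rows are given is an assumption.

```python
from functools import reduce


def union(*rows):
    """Columns set in at least one of the rows."""
    return set().union(*rows)


def intersect(first, *rest):
    """Columns set in every row."""
    return set(first).intersection(*rest)


def difference(first, *rest):
    """Columns set in the first row but in none of the others."""
    return set(first).difference(*rest)


def xor(first, *rest):
    """Symmetric difference, applied pairwise from left to right (assumed semantics)."""
    return reduce(lambda acc, row: acc ^ row, rest, set(first))


def count(row):
    """Number of columns set in the row."""
    return len(row)


# Hypothetical rows, each given as the set of column IDs where the bit is set.
row_1 = {1, 2, 3}
row_2 = {2, 3, 4}
row_3 = {3, 5}

print(sorted(union(row_1, row_2, row_3)))  # [1, 2, 3, 4, 5]
print(sorted(intersect(row_1, row_2)))     # [2, 3]
print(sorted(difference(row_1, row_2)))    # [1]
print(sorted(xor(row_1, row_2)))           # [1, 4]
print(count(union(row_1, row_2, row_3)))   # 5
```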
12 changes: 6 additions & 6 deletions docs/imports.md
@@ -2,7 +2,7 @@

If you have large amounts of data, it is more efficient to import it into Pilosa than to send many `SetBit` queries.

This library supports importing bits in the CSV (comma separated values) format:
This library supports importing columns in the CSV (comma separated values) format:
```
ROW_ID,COLUMN_ID
```
@@ -13,12 +13,12 @@ ROW_ID,COLUMN_ID,TIMESTAMP
```

Note that each line corresponds to a single bit and ends with a newline (`\n` or `\r\n`).
The target index and frame must have been created beforehand.
The target index and field must have been created beforehand.

Here's some sample code:
```python
import pilosa
from pilosa.imports import csv_bit_reader
from pilosa.imports import csv_column_reader

try:
    # python 2.7 and 3
@@ -33,11 +33,11 @@ text = u"""
3,41,683793385
10,10485760,683793385
"""
reader = csv_bit_reader(StringIO(text))
reader = csv_column_reader(StringIO(text))
client = pilosa.Client()
schema = client.schema()
index = schema.index("sample-index")
frame = index.frame("sample-frame", time_quantum=pilosa.TimeQuantum.YEAR_MONTH_DAY_HOUR)
field = index.field("sample-field", time_quantum=pilosa.TimeQuantum.YEAR_MONTH_DAY_HOUR)
client.sync_schema(schema)
client.import_frame(frame, reader)
client.import_field(field, reader)
```
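To make the expected input format concrete, here is a minimal standalone parser for the `ROW_ID,COLUMN_ID[,TIMESTAMP]` lines described above. It only illustrates the format and is not `csv_column_reader`, whose actual behavior may differ. The `parse_columns` name, the `(row_id, column_id, timestamp)` tuple shape, the use of `None` for a missing timestamp, and the assumption that timestamps are plain integers are all hypothetical; the third sample line, `7,3`, is made up to show the optional timestamp.

```python
import csv
import io


def parse_columns(stream):
    """Yield (row_id, column_id, timestamp) tuples; timestamp is None when absent."""
    for line in csv.reader(stream):
        if not line:
            # Skip blank lines, e.g. the leading newline of a triple-quoted string.
            continue
        row_id, column_id = int(line[0]), int(line[1])
        timestamp = int(line[2]) if len(line) > 2 else None
        yield row_id, column_id, timestamp


text = u"""
3,41,683793385
10,10485760,683793385
7,3
"""

for row_id, column_id, timestamp in parse_columns(io.StringIO(text)):
    print(row_id, column_id, timestamp)
```

Parsing lazily with a generator keeps memory use flat for large files, which matters for the "large amounts of data" case this import path targets.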