This repository was archived by the owner on Sep 28, 2022. It is now read-only.
Merged
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -2,6 +2,7 @@

* **v1.0.0** (2018-06-28)
* Compatible with Pilosa 1.0.
+* Added `shards` option to `client.query`, which limits a query to the specified shards.
* Removed all deprecated code.
* The following terminology was changed:
* frame to field
20 changes: 10 additions & 10 deletions README.md
@@ -1,6 +1,6 @@
# Python Client for Pilosa

-<a href="https://github.com/pilosa"><img src="https://img.shields.io/badge/pilosa-0.9-blue.svg"></a>
+<a href="https://github.com/pilosa"><img src="https://img.shields.io/badge/pilosa-1.0-blue.svg"></a>
<a href="https://pypi.python.org/pypi/pilosa"><img src="https://img.shields.io/pypi/v/pilosa.svg?maxAge=2592&updated=2"></a>
<a href="http://pilosa.readthedocs.io/en/latest/?badge=latest"><img src="https://img.shields.io/badge/docs-latest-brightgreen.svg?style=flat"></A>
<a href="https://travis-ci.org/pilosa/python-pilosa"><img src="https://api.travis-ci.org/pilosa/python-pilosa.svg?branch=master"></a>
@@ -12,7 +12,7 @@ Python client for Pilosa high performance distributed row index.

## What's New?

-See: [CHANGELOG](CHANGELOG.md)
+See: [CHANGELOG](https://github.com/pilosa/python-pilosa/blob/master/CHANGELOG.md)

## Requirements

@@ -50,10 +50,10 @@ myfield = myindex.field("myfield")
# make sure the index and field exists on the server
client.sync_schema(schema)

-# Send a SetBit query. PilosaError is thrown if execution of the query fails.
+# Send a Set query. PilosaError is thrown if execution of the query fails.
client.query(myfield.set(5, 42))

-# Send a Bitmap query. PilosaError is thrown if execution of the query fails.
+# Send a Row query. PilosaError is thrown if execution of the query fails.
response = client.query(myfield.row(5))

# Get the result
@@ -73,27 +73,27 @@ response = client.query(
)
for result in response.results:
    # Act on the result
-    print(result)
+    print(result.row.columns)
```

## Documentation

### Data Model and Queries

-See: [Data Model and Queries](docs/data-model-queries.md)
+See: [Data Model and Queries](https://github.com/pilosa/python-pilosa/blob/master/docs/data-model-queries.md)

### Executing Queries

-See: [Server Interaction](docs/server-interaction.md)
+See: [Server Interaction](https://github.com/pilosa/python-pilosa/blob/master/docs/server-interaction.md)

### Importing and Exporting Data

-See: [Importing and Exporting Data](docs/imports.md)
+See: [Importing and Exporting Data](https://github.com/pilosa/python-pilosa/blob/master/docs/imports.md)

## Contributing

-See: [CONTRIBUTING](CONTRIBUTING.md)
+See: [CONTRIBUTING](https://github.com/pilosa/python-pilosa/blob/master/CONTRIBUTING.md)

## License

-See: [LICENSE](LICENSE)
+See: [LICENSE](https://github.com/pilosa/python-pilosa/blob/master/LICENSE)
22 changes: 11 additions & 11 deletions docs/data-model-queries.md
@@ -17,7 +17,7 @@ Fields are created with a call to `index.field` method:
stargazer = repository.field("stargazer")
```

-Similar to index objects, you can pass custom options to the `index.field` method:
+You can pass custom options to the `index.field` method:

```python
stargazer = repository.field("stargazer", time_quantum=pilosa.TimeQuantum.YEAR_MONTH_DAY)
@@ -30,7 +30,7 @@ Once you have indexes and field objects created, you can create queries for them
For instance, `Row` queries work on rows; use a field object to create those queries:

```python
-row_query = stargazer.row(1) # corresponds to PQL: Bitmap(field='stargazer', row=1)
+row_query = stargazer.row(1) # corresponds to PQL: Row(stargazer=1)
```

`Union` queries work on columns; use the index object to create them:
@@ -47,41 +47,41 @@ query = repository.batch_query(
repository.union(stargazer.row(100), stargazer.row(5)))
```

-The recommended way of creating query objects is, using dedicated methods attached to index and field objects. But sometimes it would be desirable to send raw queries to Pilosa. You can use the `index.raw_query` method for that. Note that, query string is not validated before sending to the server:
+The recommended way to create query objects is to use the dedicated methods attached to index and field objects. Sometimes, though, you may want to send raw PQL queries to Pilosa; use the `index.raw_query` method for that. Note that the query string is not validated before it is sent to the server:

```python
-query = repository.raw_query("Bitmap(field='stargazer', row=5)")
+query = repository.raw_query("Row(stargazer=5)")
```

This client supports [Range encoded fields](https://www.pilosa.com/docs/latest/query-language/#range-bsi). Read the [Range Encoded Bitmaps](https://www.pilosa.com/blog/range-encoded-bitmaps/) blog post for more information about the BSI implementation of range encoding in Pilosa.

To use range-encoded values, create an integer field with its minimum and maximum values set. Here's how you would do that using this library:
```python
index = schema.index("animals")
-field = index.field("traits", fields=[pilosa.IntField.int("captivity", min=0, max=956)])
+traits = index.field("traits", int_min=0, int_max=956)
+captivity = index.field("captivity")
client.sync_schema(schema)
```

-If the field with the necessary field already exists on the server, you don't need to create the field instance, `client.syncSchema(schema)` would load that to `schema`. You can then add some data:
+If the necessary fields already exist on the server, you don't need to create the field instances; `client.sync_schema(schema)` loads them into `schema`. You can then add some data:
```python
# Add the captivity values to the field.
-captivity = field.field("captivity")
data = [3, 392, 47, 956, 219, 14, 47, 504, 21, 0, 123, 318]
query = index.batch_query()
for i, x in enumerate(data):
    column = i + 1
-    query.add(captivity.setvalue(column, x))
+    query.add(traits.setvalue(column, x))
client.query(query)
```

Let's write a range query:
```python
# Query for all animals with more than 100 specimens
-response = client.query(captivity.gt(100))
+response = client.query(traits.gt(100))
print(response.result.row.columns)

# Query for the total number of animals in captivity
-response = client.query(captivity.sum())
+response = client.query(traits.sum())
print(response.result.value)
```
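The two range queries above can be sanity-checked without a server. The following is a minimal plain-Python model of the results `traits.gt(100)` and `traits.sum()` should produce for the sample data, assuming column IDs start at 1 as in the import loop. The helpers `gt` and `total` are hypothetical; Pilosa itself evaluates these queries over range-encoded (bit-sliced) bitmaps, not like this.

```python
# Plain-Python model of the range queries above; not the Pilosa implementation.
# Sample values from the example; column IDs start at 1, as in the import loop.
data = [3, 392, 47, 956, 219, 14, 47, 504, 21, 0, 123, 318]
values = {i + 1: x for i, x in enumerate(data)}


def gt(values, threshold):
    """Return sorted column IDs whose value is greater than threshold (like traits.gt)."""
    return sorted(col for col, v in values.items() if v > threshold)


def total(values):
    """Return the sum of all stored values (like traits.sum())."""
    return sum(values.values())


print(gt(values, 100))  # [2, 4, 5, 8, 11, 12]
print(total(values))    # 2644
```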

@@ -93,7 +93,7 @@ client.query(index.batch_query(
field.set(42, 6)
))
# Query for the total number of animals in captivity where row 42 is set
-response = client.query(captivity.sum(field.row(42)))
+response = client.query(traits.sum(captivity.row(42)))
print(response.result.value)
```

2 changes: 1 addition & 1 deletion docs/imports.md
@@ -1,6 +1,6 @@
# Importing Data

-If you have large amounts of data, it is more efficient to import it to Pilosa instead of several `SetBit` queries.
+If you have large amounts of data, it is more efficient to import it into Pilosa than to send many `Set` queries.

This library supports importing columns in the CSV (comma-separated values) format:
```
10 changes: 5 additions & 5 deletions docs/server-interaction.md
@@ -42,7 +42,7 @@ client = pilosa.Client()
To use a custom server address, pass the address as the first argument:

```python
-client = pilosa.Client("http://db1.pilosa.com:15000")
+client = pilosa.Client("http://node1.pilosa.com:15000")
```

If you are running a cluster of Pilosa servers, you can create a `pilosa.Cluster` object that keeps addresses of those servers:
@@ -87,11 +87,11 @@ You can send queries to a Pilosa server using the `query` method of client objec
response = client.query(field.row(5))
```

-`query` method accepts optional `columns` argument:
+The `query` method accepts optional arguments, including `column_attrs`, `exclude_columns`, `exclude_attrs` and `shards`:

```python
response = client.query(field.row(5),
-    columns=True # return column data in the response
+    column_attrs=True # return column data in the response
)
```

@@ -114,7 +114,7 @@ for result in response.results:
# act on the result
```

-Similarly, a `QueryResponse` object may include a number of column objects, if `columns=True` query option was used:
+Similarly, a `QueryResponse` object may include a number of column objects if the `column_attrs=True` query option was used:

```python
# check that there's a column object and act on it
@@ -133,7 +133,7 @@ for column in response.columns:
* `count_items` property to retrieve column count per row ID entries returned from `topn` queries,
* `count` attribute to retrieve the number of rows per the given row ID returned from `count` queries.
* `value` attribute to retrieve the result of `Min`, `Max` or `Sum` queries.
-* `changed` attribute shows whether a `SetBit` or `ClearBit` query changed a bit.
+* `changed` attribute shows whether a `Set` or `Clear` query changed a bit.

```python
result = response.result
15 changes: 14 additions & 1 deletion integration_tests/test_client_it.py
@@ -355,13 +355,26 @@ def test_exclude_attrs_columns(self):
     def test_http_request(self):
         self.get_client().http_request("GET", "/status")

+    def test_shards(self):
+        shard_width = 1048576
+        client = self.get_client()
+        client.query(self.col_index.batch_query(
+            self.field.set(1, 100),
+            self.field.set(1, shard_width),
+            self.field.set(1, shard_width*3),
+        ))
+
+        response = client.query(self.field.row(1), shards=[0,3])
+        self.assertEquals(2, len(response.result.row.columns))
+        self.assertEquals(100, response.result.row.columns[0])
+        self.assertEquals(shard_width*3, response.result.row.columns[1])
+
     def test_create_index_fail(self):
         server = MockServer(404)
         with server:
             client = Client(server.uri)
             self.assertRaises(PilosaServerError, client.create_index, self.index)

-
     @classmethod
     def random_index_name(cls):
         cls.counter += 1
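`test_shards` sets row 1 at columns 100, `shard_width` and `shard_width*3`, then expects only the first and last back when the query is limited to `shards=[0,3]`. A minimal plain-Python model of that expectation, assuming each column belongs to shard `column // shard_width` with the 1048576-column width used in the test, looks like this. `shard_of` and `columns_in_shards` are hypothetical helpers, not part of the client.

```python
# Plain-Python model of shard filtering; shard_of and columns_in_shards are
# hypothetical helpers, not part of the pilosa client.
SHARD_WIDTH = 1048576  # 2**20 columns per shard, the value used in test_shards


def shard_of(column, shard_width=SHARD_WIDTH):
    """Return the shard number that a column ID falls into."""
    return column // shard_width


def columns_in_shards(columns, shards, shard_width=SHARD_WIDTH):
    """Keep only the columns that belong to one of the given shards, in sorted order."""
    wanted = set(shards)
    return sorted(c for c in columns if shard_of(c, shard_width) in wanted)


# Columns set on row 1 in test_shards.
row_columns = [100, SHARD_WIDTH, SHARD_WIDTH * 3]

print([shard_of(c) for c in row_columns])      # [0, 1, 3]
print(columns_in_shards(row_columns, [0, 3]))  # [100, 3145728]
```

Column `shard_width` itself lands in shard 1, which is why the test expects exactly two columns back.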
9 changes: 6 additions & 3 deletions pilosa/client.py
@@ -104,17 +104,18 @@ def __init__(self, cluster_or_uri=None, connect_timeout=30000, socket_timeout=30
         self.__client = None
         self.logger = logging.getLogger("pilosa")

-    def query(self, query, column_attrs=False, exclude_columns=False, exclude_attrs=False):
+    def query(self, query, column_attrs=False, exclude_columns=False, exclude_attrs=False, shards=None):
         """Runs the given query against the server with the given options.

         :param pilosa.PqlQuery query: a PqlQuery object with a non-null index
         :param bool column_attrs: Enables returning column data from row queries
         :param bool exclude_columns: Disables returning columns from row queries
         :param bool exclude_attrs: Disables returning attributes from row queries
+        :param list(int) shards: Returns data from a subset of shards
         :return: Pilosa response
         :rtype: pilosa.Response
         """
-        request = _QueryRequest(query.serialize(), column_attrs=column_attrs, exclude_columns=exclude_columns, exclude_row_attrs=exclude_attrs)
+        request = _QueryRequest(query.serialize(), column_attrs=column_attrs, exclude_columns=exclude_columns, exclude_row_attrs=exclude_attrs, shards=shards)
         path = "/index/%s/query" % query.index.name
         try:
             headers = {
@@ -499,18 +500,20 @@ def _reset(self):

 class _QueryRequest:

-    def __init__(self, query, column_attrs=False, exclude_columns=False, exclude_row_attrs=False):
+    def __init__(self, query, column_attrs=False, exclude_columns=False, exclude_row_attrs=False, shards=None):
         self.query = query
         self.column_attrs = column_attrs
         self.exclude_columns = exclude_columns
         self.exclude_row_attrs = exclude_row_attrs
+        self.shards = shards or []

     def to_protobuf(self, return_bytearray=_IS_PY2):
         qr = internal.QueryRequest()
         qr.Query = self.query
         qr.ColumnAttrs = self.column_attrs
         qr.ExcludeColumns = self.exclude_columns
         qr.ExcludeRowAttrs = self.exclude_row_attrs
+        qr.Shards.extend(self.shards)
         if return_bytearray:
             return bytearray(qr.SerializeToString())
         return qr.SerializeToString()
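`_QueryRequest` accepts `shards=None` and stores `shards or []`, rather than declaring `shards=[]` as the default. This is a common Python idiom: a default list is created once, when the function is defined, and would be shared by every call that omits the argument. The stand-in classes below are hypothetical and only contrast the two patterns; they are not part of the client.

```python
# Hypothetical stand-ins contrasting a mutable default argument with the
# None-then-normalize pattern that _QueryRequest uses for shards.


class SharedDefault:
    def __init__(self, shards=[]):  # the same list object is reused by every call
        self.shards = shards


class NormalizedDefault:
    def __init__(self, shards=None):
        self.shards = shards or []  # a fresh list whenever no shards are given


a, b = SharedDefault(), SharedDefault()
a.shards.append(3)
print(b.shards)  # [3]: b sees a's mutation through the shared default list

c, d = NormalizedDefault(), NormalizedDefault()
c.shards.append(3)
print(d.shards)  # []
```

Since `_QueryRequest` only reads `self.shards` (to extend the protobuf message), the idiom here is mainly defensive.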