Merge pull request #34 from citrusvanilla/issue-32--compact_storage
Issue 32, Issue 33
citrusvanilla committed Mar 21, 2023
2 parents 5e75868 + 5002d4c commit f5fc7cc
Showing 19 changed files with 330 additions and 93 deletions.
10 changes: 5 additions & 5 deletions .github/workflows/build.yml
@@ -11,7 +11,7 @@ jobs:
steps:
- uses: actions/checkout@v3
- name: Set up Python ${{ matrix.python-version }} on platform ${{ matrix.platform }}
-uses: actions/setup-python@v3
+uses: actions/setup-python@v4
with:
python-version: ${{ matrix.python-version }}
- name: Assert no dependencies for TinyFlux
@@ -21,14 +21,14 @@ jobs:
pip install --upgrade pip
pip install -r requirements.txt
- name: Check code formatting
-run: black --check tinyflux/ tests/
+run: black --check tinyflux/ tests/ examples/
- name: Check code style
-run: flake8 tinyflux/ tests/
+run: flake8 tinyflux/ tests/ examples/
- name: Check static typing
-run: mypy tinyflux/ tests/
+run: mypy tinyflux/ tests/ examples/
- name: Run tests
run: coverage run --source tinyflux/ -m pytest && coverage report -m
- name: Upload Coverage to Codecov
-uses: codecov/codecov-action@v2
+uses: codecov/codecov-action@v3
with:
token: ${{ secrets.CODECOV_TOKEN }}
13 changes: 12 additions & 1 deletion README.rst
@@ -16,6 +16,17 @@ Quick Links
- `Contributing`_


Recent Updates
**************

v0.3.0 (2023-3-21)
^^^^^^^^^^^^^^^^^^

* Tag and field keys can be compacted when using CSVStorage, saving potentially many bytes per Point (resolves issue #32).
* Fixed a bug that caused tag values of '' to be serialized as "_none" (resolves issue #33).



Installation
************

@@ -78,7 +89,7 @@ Writing to TinyFlux
... tags={"room": "bedroom"},
... fields={"temp": 72.0}
... )
->>> db.insert(p)
+>>> db.insert(p, compact_key_prefixes=True)
Querying TinyFlux
9 changes: 8 additions & 1 deletion docs/source/changelog.rst
@@ -1,10 +1,17 @@
Changelog
=========

v0.3.0 (2023-3-21)
^^^^^^^^^^^^^^^^^^

* Tag and field keys can be compacted when using CSVStorage, saving potentially many bytes per Point (resolves issue #32).
* Fixed a bug that caused tag values of '' to be serialized as "_none" (resolves issue #33).
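
The second changelog item turns on the difference between a missing tag value (``None``) and an empty one (``''``). The sketch below is not TinyFlux's code. It is a minimal illustration, assuming a sentinel string ``"_none"`` as quoted in the changelog, of an encoder that reserves the sentinel for ``None`` only. The helper names ``encode_tag_value`` and ``decode_tag_value`` are hypothetical.

```python
# Assumed sentinel, taken from the changelog wording above.
NONE_STR = "_none"


def encode_tag_value(value):
    """Encode a tag value for a text row; only None maps to the sentinel."""
    # An identity check is used on purpose: a truthiness check such as
    # `if not value` would also match "", and the empty string would then
    # be written as the sentinel.
    return NONE_STR if value is None else value


def decode_tag_value(text):
    """Invert encode_tag_value (hypothetical helper)."""
    return None if text == NONE_STR else text


# Round-trip a normal value, an empty string, and None.
for original in ("bedroom", "", None):
    restored = decode_tag_value(encode_tag_value(original))
    print(repr(original), "->", repr(restored))
```

With this scheme an empty string survives a round trip unchanged, which is the behaviour the new ``test_serialize_empty_strings`` test in this commit checks for.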


v0.2.6 (2023-3-9)
^^^^^^^^^^^^^^^^^^

-* TinyFlux is not PEP 561 compliant (resolves issue #31).
+* TinyFlux is now PEP 561 compliant (resolves issue #31).

v0.2.4 (2023-2-15)
^^^^^^^^^^^^^^^^^^
4 changes: 2 additions & 2 deletions docs/source/contributing-tooling.rst
@@ -31,7 +31,7 @@ TinyFlux conforms to `PEP 8`_ for style, and `Google Python Style Guide`_ for do
Formatting
^^^^^^^^^^

-TinyFlux uses standard configuration black_ for code formatting, with an enforced line-length of 79 characters.
+TinyFlux uses standard configuration black_ for code formatting, with an enforced line-length of 80 characters.

After installing the project requirements:

@@ -43,7 +43,7 @@ After installing the project requirements:
Style
^^^^^

-TinyFlux uses standard configuration flake8_ for style enforcement, with an enforced line-length of 79 characters.
+TinyFlux uses standard configuration flake8_ for style enforcement, with an enforced line-length of 80 characters.

After installing the project requirements:

4 changes: 2 additions & 2 deletions docs/source/data-elements.rst
@@ -108,7 +108,7 @@ On disk:
Tag Key
^^^^^^^

-A tag key is the identifier for a :ref:`tag value` in a :ref:`tag set`. On disk, a tag key is prepended with ``_tag_``.
+A tag key is the identifier for a :ref:`tag value` in a :ref:`tag set`. On disk, a tag key is prefixed with ``_tag_`` (default) or ``t_`` (compact).

In the following, the tag key is ``city``.

@@ -149,7 +149,7 @@ On disk:
Field Key
^^^^^^^^^

-A field key is the identifier for a :ref:`field value` in a :ref:`field set`. On disk, a field key is prepended with ``_field_``.
+A field key is the identifier for a :ref:`field value` in a :ref:`field set`. On disk, a field key is prefixed with ``_field_`` (default) or ``f_`` (compact).

In the following, the field key is ``num_restaurants``.

12 changes: 12 additions & 0 deletions docs/source/tips.rst
@@ -42,6 +42,18 @@ If ``auto-index`` is set to ``True`` (the default setting), then the next read w
If possible, Points should be inserted into TinyFlux in time-order.


Saving Space
^^^^^^^^^^^^

If you are using a text-based storage layer (such as the default ``CSVStorage``), keep in mind that every character usually takes one byte (and up to four) to store in UTF-8. Here are a few tips for saving space:

• Keep measurement names, tag keys, and field keys short and concise.
• Precision matters, even more so with text-backed storage. ``1.0000`` (six bytes) takes twice as much space as ``1.0`` (three bytes) and six times as much as ``1`` (one byte).
• When inserting points into TinyFlux, make sure to set the ``compact_key_prefixes`` option to ``True`` (e.g. ``db.insert(my_point, compact_key_prefixes=True)``). This saves three bytes per tag key/value pair and five bytes per field key/value pair.

If your dataset is approaching 1 GB in size, keep reading.
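
The byte counts quoted in the tips above can be checked with a few lines of standard-library Python. This is a rough sketch under stated assumptions: the prefixes are the ``_tag_``/``t_`` and ``_field_``/``f_`` pairs described in the data-elements documentation, and ``utf8_size`` and ``bytes_saved`` are hypothetical helpers, not part of TinyFlux.

```python
def utf8_size(text: str) -> int:
    """Return the number of bytes `text` occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


# Per-key savings from switching to the compact prefixes.
TAG_SAVING = utf8_size("_tag_") - utf8_size("t_")      # 5 bytes vs. 2 bytes
FIELD_SAVING = utf8_size("_field_") - utf8_size("f_")  # 7 bytes vs. 2 bytes


def bytes_saved(n_points: int, tags_per_point: int, fields_per_point: int) -> int:
    """Estimate total bytes saved by compact prefixes (hypothetical helper)."""
    per_point = tags_per_point * TAG_SAVING + fields_per_point * FIELD_SAVING
    return n_points * per_point


print(TAG_SAVING, FIELD_SAVING)                        # 3 5
print([utf8_size(s) for s in ("1.0000", "1.0", "1")])  # [6, 3, 1]
print(bytes_saved(1_000_000, 2, 3))                    # 21000000
```

For a dataset of one million Points with two tags and three fields each, the estimate comes to about 21 MB saved on key prefixes alone. The actual file size also depends on the values and any CSV quoting, which this sketch ignores.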


Dealing with Growing Datasets
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

18 changes: 11 additions & 7 deletions docs/source/writing-data.rst
@@ -3,6 +3,10 @@ Writing Data

The standard method for inserting a new data point is through the ``db.insert(...)`` method. To insert more than one Point at the same time, use the ``db.insert_multiple([...])`` method, which accepts a ``list`` of points. This might be useful when creating a TinyFlux database from a CSV of existing observations.

.. hint::

To save space in text-based storage instances (including ``CSVStorage``), set the ``compact_key_prefixes`` argument to ``True`` in the ``.insert()`` and ``.insert_multiple()`` methods. Tag and field keys are then stored with the shorter ``t_`` and ``f_`` prefixes rather than the default ``_tag_`` and ``_field_`` prefixes. Whichever you choose, TinyFlux reads Points with either prefix from the database.
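
To make the hint concrete, here is a minimal, self-contained sketch of how a row might be written with either prefix style and read back regardless of which was used. It is not TinyFlux's implementation. ``to_row`` and ``from_row`` are hypothetical helpers, the row layout (time, measurement, then key/value pairs) is an assumption based on the tests in this commit, and field values are assumed to be floats.

```python
# (prefix, kind) pairs a reader should recognize, default and compact.
PREFIXES = [("_tag_", "tag"), ("t_", "tag"), ("_field_", "field"), ("f_", "field")]


def to_row(time, measurement, tags, fields, compact_key_prefixes=False):
    """Flatten a point into a list of strings (hypothetical helper)."""
    tag_prefix = "t_" if compact_key_prefixes else "_tag_"
    field_prefix = "f_" if compact_key_prefixes else "_field_"
    row = [time, measurement]
    for key, value in tags.items():
        row += [tag_prefix + key, value]
    for key, value in fields.items():
        row += [field_prefix + key, str(value)]
    return row


def from_row(row):
    """Rebuild a point from a row written with either prefix style."""
    time, measurement, rest = row[0], row[1], row[2:]
    tags, fields = {}, {}
    # Keys sit at even offsets, values at odd offsets.
    for key, value in zip(rest[0::2], rest[1::2]):
        for prefix, kind in PREFIXES:
            if key.startswith(prefix):
                name = key[len(prefix):]
                if kind == "tag":
                    tags[name] = value
                else:
                    fields[name] = float(value)
                break
        else:
            raise ValueError(f"unrecognized key prefix: {key!r}")
    return time, measurement, tags, fields


point = ("2023-03-21T00:00:00", "air", {"room": "bedroom"}, {"temp": 72.0})
compact = to_row(*point, compact_key_prefixes=True)
default = to_row(*point, compact_key_prefixes=False)
print(compact)
print(default)
print(from_row(compact) == from_row(default) == point)
```

Because none of the four prefixes is a prefix of another, the reader can decide the style per key, which is why rows written with mixed settings can coexist in one file in this sketch.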

.. note::

**TinyFlux vs. TinyDB Alert!**
@@ -22,10 +26,10 @@ Example:

To recap, these are the two methods supporting the insertion of data.

-+------------------------------------------+-----------------------------------------------------+
-| **Methods** |
-+------------------------------------------+-----------------------------------------------------+
-| ``db.insert(point)`` | Insert one Point into the database. |
-+------------------------------------------+-----------------------------------------------------+
-| ``db.insert_multiple([point, ...])`` | Insert multiple Points into the database. |
-+------------------------------------------+-----------------------------------------------------+
++------------------------------------------------------------------+-----------------------------------------------------+
+| **Methods** |
++------------------------------------------------------------------+-----------------------------------------------------+
+| ``db.insert(point, compact_key_prefixes=False)`` | Insert one Point into the database. |
++------------------------------------------------------------------+-----------------------------------------------------+
+| ``db.insert_multiple([point, ...], compact_key_prefixes=False)`` | Insert multiple Points into the database. |
++------------------------------------------------------------------+-----------------------------------------------------+
2 changes: 1 addition & 1 deletion examples/3_iot_datastore_with_mqtt.py
@@ -54,7 +54,7 @@
db = TinyFlux(TINYFLUX_DB)

# Interthread queue.
-q = Queue()
+q: Queue = Queue()

# Init but do not set a threading exit event for graceful exit.
exit_event = threading.Event()
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -1,5 +1,5 @@
[tool.black]
-line-length = 79
+line-length = 80
exclude = '(docs|\.ipynb$)'

[build-system]
3 changes: 2 additions & 1 deletion requirements.txt
@@ -6,4 +6,5 @@ pytest
pytest-cov
sphinx
sphinx_autodoc_typehints
-sphinx_rtd_theme
+sphinx_rtd_theme
+types-paho-mqtt
2 changes: 1 addition & 1 deletion setup.cfg
@@ -1,5 +1,5 @@
[flake8]
-max-line-length = 79
+max-line-length = 80
exclude =
.git,
__pycache__,
76 changes: 76 additions & 0 deletions tests/test_point.py
@@ -180,6 +180,12 @@ def test_tags():
{
"key1": "value1",
},
{
"key1": "",
},
{
"key1": None,
},
{"key2": "value2", "key3": "value3"},
]

@@ -414,3 +420,73 @@ def test_serialize_zero_values():
and new_p.fields["b"] == 0.0
and new_p.fields["c"] is None
)


def test_serialize_none_values():
    """Test serializing/deserializing None values."""
    p = Point(fields={"a": None}, tags={"a": None})
    s = p._serialize_to_list()
    assert s[3] == p._none_str and s[5] == p._none_str

    new_p = Point()._deserialize_from_list(s)

    assert p == new_p

    assert new_p.fields["a"] is None and new_p.tags["a"] is None


def test_serialize_empty_strings():
    """Test serializing/deserializing empty string tag values."""
    p = Point(tags={"a": ""})
    s = p._serialize_to_list()
    assert s[3] == ""

    new_p = Point()._deserialize_from_list(s)

    assert p == new_p

    assert new_p.tags["a"] == ""


def test_compact_field_keys():
    """Test compact field keys in CSV Storage."""
    p = Point(fields={"a": 0, "b": 0.0, "c": None})
    s = p._serialize_to_list(compact_key_prefixes=True)
    s1 = p._serialize_to_list(compact_key_prefixes=False)

    # With no tags set, the keys sit at indices 2, 4 and 6.
    assert all(s[i].startswith(p._compact_field_key_prefix) for i in (2, 4, 6))

    new_p = Point()._deserialize_from_list(s)
    new_p1 = Point()._deserialize_from_list(s1)

    assert p == new_p == new_p1

    assert all(i in new_p.fields for i in ("a", "b", "c"))

    assert (
        new_p.fields["a"] == 0
        and new_p.fields["b"] == 0.0
        and new_p.fields["c"] is None
    )


def test_compact_tag_keys():
    """Test compact tag keys in CSV Storage."""
    p = Point(tags={"a": "aa", "b": "bb", "c": None})
    s = p._serialize_to_list(compact_key_prefixes=True)
    s1 = p._serialize_to_list(compact_key_prefixes=False)

    # With no fields set, the keys sit at indices 2, 4 and 6.
    assert all(s[i].startswith(p._compact_tag_key_prefix) for i in (2, 4, 6))

    new_p = Point()._deserialize_from_list(s)
    new_p1 = Point()._deserialize_from_list(s1)

    assert p == new_p == new_p1

    assert all(i in new_p.tags for i in ("a", "b", "c"))

    assert (
        new_p.tags["a"] == "aa"
        and new_p.tags["b"] == "bb"
        and new_p.tags["c"] is None
    )
20 changes: 5 additions & 15 deletions tests/test_queries.py
@@ -59,19 +59,11 @@ def test_repr():
c5 = q2 | q4
c6 = ~q3

-rc1 = (
-f"CompoundQuery({c1.operator.__name__}, 'SimpleQuery', 'SimpleQuery')"
-)
-rc2 = (
-f"CompoundQuery({c2.operator.__name__}, 'SimpleQuery', 'SimpleQuery')"
-)
+rc1 = f"CompoundQuery({c1.operator.__name__}, 'SimpleQuery', 'SimpleQuery')"
+rc2 = f"CompoundQuery({c2.operator.__name__}, 'SimpleQuery', 'SimpleQuery')"
rc3 = f"CompoundQuery({c3.operator.__name__}, 'SimpleQuery')"
-rc4 = (
-f"CompoundQuery({c4.operator.__name__}, 'SimpleQuery', 'SimpleQuery')"
-)
-rc5 = (
-f"CompoundQuery({c5.operator.__name__}, 'SimpleQuery', 'SimpleQuery')"
-)
+rc4 = f"CompoundQuery({c4.operator.__name__}, 'SimpleQuery', 'SimpleQuery')"
+rc5 = f"CompoundQuery({c5.operator.__name__}, 'SimpleQuery', 'SimpleQuery')"
rc6 = "CompoundQuery(not_, 'SimpleQuery')"

assert repr(c1) == rc1
@@ -854,9 +846,7 @@ def test_basequery():
):
q & (TagQuery().a == "b")

-with pytest.raises(
-RuntimeError, match="Cannot logical-OR an empty query."
-):
+with pytest.raises(RuntimeError, match="Cannot logical-OR an empty query."):
q | (TagQuery().a == "b")

with pytest.raises(
28 changes: 27 additions & 1 deletion tests/test_storages.py
@@ -5,6 +5,7 @@
import random
import re
import tempfile
from typing import Any

import pytest

@@ -238,7 +239,9 @@ def _deserialize_storage_item(self, item):
"""Deserialize storage."""
...

-def _serialize_point(self, Point) -> None:
+def _serialize_point(
+self, point: Point, *args: Any, **kwargs: Any
+) -> None:
"""Serialize Point."""
...

@@ -639,3 +642,26 @@ def test_temporary_storage(tmpdir):
# Exception should be thrown if temp storage not initialized.
with pytest.raises(IOError):
storage.append([], temporary=True)


def test_compact_key_prefixes(tmpdir):
    """Test compact keys option."""
    # Memory.
    m = MemoryStorage()
    p = Point(tags={"a": "aa"}, fields={"a": 1})
    assert (
        m._serialize_point(p, compact_key_prefixes=True)
        == m._serialize_point(p, compact_key_prefixes=False)
        == m._serialize_point(p)
        == p
    )

    # CSV.
    path = os.path.join(tmpdir, "test.csv")
    m = CSVStorage(path)
    assert m._serialize_point(
        p, compact_key_prefixes=True
    ) == p._serialize_to_list(compact_key_prefixes=True)
    assert m._serialize_point(
        p, compact_key_prefixes=False
    ) == p._serialize_to_list(compact_key_prefixes=False)