diff --git a/.github/workflows/build.yml b/.github/workflows/build.yml
index 27b3437..72cfe91 100644
--- a/.github/workflows/build.yml
+++ b/.github/workflows/build.yml
@@ -20,6 +20,8 @@ jobs:
 run: |
 pip install --upgrade pip
 pip install -r requirements.txt
+ - name: Check README
+ run: rstcheck README.rst
 - name: Check code formatting
 run: black --check tinyflux/ tests/ examples/
 - name: Check code style
diff --git a/README.rst b/README.rst
index f8594fb..8bd720b 100644
--- a/README.rst
+++ b/README.rst
@@ -20,7 +20,7 @@ Recent Updates
 **************
 v0.3.0 (2023-3-21)
-^^^^^^^^^^^^^^^^^^
+==================
 * Tag and field keys can be compacted when using CSVStorage, saving potentially many bytes per Point (resolves issue #32).
 * Fixed bug that causes tag values of '' to be serialized as "_none" (resolves issue #33).
@@ -126,6 +126,11 @@ The `examples `_
 2. `Local Analytics Workflow with a TinyFlux Database `_
 3. `TinyFlux as a MQTT Datastore for IoT Devices `_
+Tips
+****
+
+Check out some tips for working with TinyFlux `here `_.
+
 TinyFlux Across the Internet
 ****************************
@@ -141,7 +146,7 @@ Contributing
 New ideas, new developer tools, improvements, and bugfixes are always welcome. Follow these guidelines before getting started:
-1. Make sure to read `Getting Started `_ and the `Contributing `_ section of the documentation.
+1. Make sure to read `Getting Started `_ and the `Contributing Tooling and Conventions `_ section of the documentation.
 2. Check GitHub for `existing open issues `_, `open a new issue `_ or `start a new discussion `_.
 3. To get started on a pull request, fork the repository on GitHub, create a new branch, and make updates.
 4. Write unit tests, ensure the code is 100% covered, update documentation where necessary, and format and style the code correctly.
diff --git a/docs/source/conf.py b/docs/source/conf.py
index d6151a3..429c00c 100644
--- a/docs/source/conf.py
+++ b/docs/source/conf.py
@@ -19,7 +19,7 @@
 # -- Project information -----------------------------------------------------
 project = "TinyFlux"
-copyright = "2022, Justin Fung"
+copyright = "2023, Justin Fung"
 author = "Justin Fung"
 # The full version, including alpha/beta/rc tags
diff --git a/docs/source/index.rst b/docs/source/index.rst
index 07258dc..0dd4252 100644
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -1,5 +1,5 @@
-Get started with
-================
+Getting started with
+====================
 .. image:: https://github.com/citrusvanilla/tinyflux/blob/master/artwork/tinyfluxdb-dark.png?raw=true#gh-light-mode-only
 :width: 500px
diff --git a/docs/source/time.rst b/docs/source/time.rst
index 0671931..22710cc 100644
--- a/docs/source/time.rst
+++ b/docs/source/time.rst
@@ -15,86 +15,86 @@ To illustrate the way time is handled in TinyFlux, below are the five ways time
 1. ``time`` is not set by the user when the Point is initialized so its default value is ``None``. AFTER it is inserted into TinyFlux, it is assigned a UTC timestamp corresponding to the time of insertion.
->>> from tinyflux import Point, TinyFlux
->>> db = TinyFlux("my_db.csv") # an empty db
->>> p = Point()
->>> p.time is None
-True
->>> db.insert(p)
->>> p.time
-datetime.datetime(2021, 10, 30, 13, 53, 552872, tzinfo=datetime.timezone.utc)
+ >>> from tinyflux import Point, TinyFlux
+ >>> db = TinyFlux("my_db.csv") # an empty db
+ >>> p = Point()
+ >>> p.time is None
+ True
+ >>> db.insert(p)
+ >>> p.time
+ datetime.datetime(2021, 10, 30, 13, 53, 552872, tzinfo=datetime.timezone.utc)
 2. ``time`` is set with a value, but it is not a ``datetime`` object. TinyFlux raises an exception.
->>> Point(time="2022-01-01")
-ValueError: Time must be datetime object.
+ >>> Point(time="2022-01-01")
+ ValueError: Time must be datetime object.
 3. 
``time`` is set with a ``datetime`` object that is "timezone-naive". TinyFlux considers this time to be local to the timezone of the computer that is running TinyFlux and will convert this time to UTC using the ``astimezone`` attribute of the ``datetime`` module upon insertion. This will lead to confusion down the road if TinyFlux is running on a remote computer, or the user was annotating data for points corresponding to places in other timezones.
->>> from datetime import datetime
->>> # Example: Our computer is in Californa, but we are working with a dataset of
->>> # air quality measurements for Beijing, China.
->>> # Here, AQI was measured at 1pm local time in Beijing on Aug 28, 2021.
->>> p = Point(
-... time=datetime(2021, 8, 28, 13, 0), # 1pm, datetime-naive
-... tags={"city": "beijing"},
-... fields={"aqi": 118}
-... )
->>> p.time
-datetime.datetime(2021, 8, 28, 13, 0)
->>> # Insert the point into the database.
->>> db.insert(p)
->>> # The point is cast to UTC, assuming the time was local to California, not Beijing.
->>> p.time
-datetime.datetime(2021, 8, 28, 20, 0, tzinfo=datetime.timezone.utc)
+ >>> from datetime import datetime
+ >>> # Example: Our computer is in California, but we are working with a dataset of
+ >>> # air quality measurements for Beijing, China.
+ >>> # Here, AQI was measured at 1pm local time in Beijing on Aug 28, 2021.
+ >>> p = Point(
+ ... time=datetime(2021, 8, 28, 13, 0), # 1pm, datetime-naive
+ ... tags={"city": "beijing"},
+ ... fields={"aqi": 118}
+ ... )
+ >>> p.time
+ datetime.datetime(2021, 8, 28, 13, 0)
+ >>> # Insert the point into the database.
+ >>> db.insert(p)
+ >>> # The point is cast to UTC, assuming the time was local to California, not Beijing.
+ >>> p.time
+ datetime.datetime(2021, 8, 28, 20, 0, tzinfo=datetime.timezone.utc)
 4. 
``time`` is set with a ``datetime`` object that is timezone-aware but the timezone is not UTC- TinyFlux casts the time to UTC for internal storage and retrieval and the original timezone is lost (it is up to the user to cast the timezone again after retrieval).
->>> from tinyflux import Point, TinyFlux
->>> from datetime import datetime
->>> from zoneinfo import ZoneInfo
->>> db = TinyFlux("my_db.csv") # an empty db
->>> la_point = Point(
-... time=datetime(2000, 1, 1, tzinfo=ZoneInfo("US/Pacific")),
-... tags={"city": "Los Angeles"}
-... fields={"temp_f": 54.0}
-... )
->>> ny_point = Point(
-... time=datetime(2000, 1, 1, tzinfo=ZoneInfo("US/Eastern")),
-... tags={"city": "New York City"}
-... fields={"temp_f": 15.0}
-... )
->>> db.insert_multiple([la_point, ny_point])
->>> # Notice the time attributes no longer carry the timezone information:
->>> la_point.time
-datetime.datetime(2000, 1, 1, 8, 0, tzinfo=datetime.timezone.utc)
->>> ny_point.time
-datetime.datetime(2000, 1, 1, 5, 0, tzinfo=datetime.timezone.utc)
-
-.. hint::
-
- If you need to keep the original, non-UTC timezone along with the dataset, consider adding a ``tag`` to your point indicating the timezone, for easier conversion after retrieval. TinyFlux will not assume nor attempt to store the timezone of your data for you.
+ >>> from tinyflux import Point, TinyFlux
+ >>> from datetime import datetime
+ >>> from zoneinfo import ZoneInfo
+ >>> db = TinyFlux("my_db.csv") # an empty db
+ >>> la_point = Point(
+ ... time=datetime(2000, 1, 1, tzinfo=ZoneInfo("US/Pacific")),
+ ... tags={"city": "Los Angeles"},
+ ... fields={"temp_f": 54.0}
+ ... )
+ >>> ny_point = Point(
+ ... time=datetime(2000, 1, 1, tzinfo=ZoneInfo("US/Eastern")),
+ ... tags={"city": "New York City"},
+ ... fields={"temp_f": 15.0}
+ ... 
)
+ >>> db.insert_multiple([la_point, ny_point])
+ >>> # Notice the time attributes no longer carry the timezone information:
+ >>> la_point.time
+ datetime.datetime(2000, 1, 1, 8, 0, tzinfo=datetime.timezone.utc)
+ >>> ny_point.time
+ datetime.datetime(2000, 1, 1, 5, 0, tzinfo=datetime.timezone.utc)
+
+ .. hint::
+
+ If you need to keep the original, non-UTC timezone along with the dataset, consider adding a ``tag`` to your point indicating the timezone, for easier conversion after retrieval. TinyFlux will not assume nor attempt to store the timezone of your data for you.
 5. ``time`` is set with a ``datetime`` object that is timezone-aware and the timezone is UTC. This is the easiest way to handle time. If needed, infomation about the timezone is stored in a tag.
->>> from datetime import datetime, timezone
->>> from tinyflux import TinyFlux, Point
->>> from zoneinfo import ZoneInfo
->>> # Time now is 10am in Los Angeles, which is 6pm UTC:
->>> t = datetime.now(timezone.utc)
->>> t
-datetime.datetime(2022, 11, 9, 18, 0, 0, tzinfo=datetime.timezone.utc)
->>> # Store the time in UTC, but keep the timezone as a tag for later use.
->>> p = Point(
-... time=t,
-... tags={"room": "bedroom", "timezone": "America/Los_Angeles"},
-... fields={"temp": 72.0}
-... 
)
->>> # Time is still UTC:
->>> p.time
-datetime.datetime(2022, 11, 9, 18, 0, 0, tzinfo=datetime.timezone.utc)
->>> # To cast back to local time in Los Angeles:
->>> la_timezone = ZoneInfo(p.tags["timezone"])
->>> p.time.astimezone(la_timezone)
-datetime.datetime(2022, 11, 9, 10, 0, tzinfo=zoneinfo.ZoneInfo(key='America/Los_Angeles'))
\ No newline at end of file
+ >>> from datetime import datetime, timezone
+ >>> from tinyflux import TinyFlux, Point
+ >>> from zoneinfo import ZoneInfo
+ >>> # Time now is 10am in Los Angeles, which is 6pm UTC:
+ >>> t = datetime.now(timezone.utc)
+ >>> t
+ datetime.datetime(2022, 11, 9, 18, 0, 0, tzinfo=datetime.timezone.utc)
+ >>> # Store the time in UTC, but keep the timezone as a tag for later use.
+ >>> p = Point(
+ ... time=t,
+ ... tags={"room": "bedroom", "timezone": "America/Los_Angeles"},
+ ... fields={"temp": 72.0}
+ ... )
+ >>> # Time is still UTC:
+ >>> p.time
+ datetime.datetime(2022, 11, 9, 18, 0, 0, tzinfo=datetime.timezone.utc)
+ >>> # To cast back to local time in Los Angeles:
+ >>> la_timezone = ZoneInfo(p.tags["timezone"])
+ >>> p.time.astimezone(la_timezone)
+ datetime.datetime(2022, 11, 9, 10, 0, tzinfo=zoneinfo.ZoneInfo(key='America/Los_Angeles'))
\ No newline at end of file
diff --git a/docs/source/tips.rst b/docs/source/tips.rst
index c95188b..7a7808f 100644
--- a/docs/source/tips.rst
+++ b/docs/source/tips.rst
@@ -3,44 +3,6 @@ Tips for TinyFlux
 Below are some tips to get the most out of TinyFlux.
-Optimizing Queries
-^^^^^^^^^^^^^^^^^^
-
-Unlike TinyDB, TinyFlux never pulls in the entirety of its data into memory (unless the ``.all()`` method is called). This has the benefit of reducing the memory footprint of the database, but means that database operations are usually I/O bound. By using an index, TinyFlux is able to construct a matching set of items from the storage layer without actually reading any of those items. 
For database operations that return Points, TinyFlux iterates over the storage, collects the items that belong in the set, deserializes them, and finally returns them to the caller.
-
-This utlimately means that the smaller the set of matches, the less I/O TinyFlux must perform.
-
-.. hint::
-
- Queries that return smaller sets of matches perform best.
-
-.. warning::
-
- Resist the urge to build your own time range query using the ``.map()`` query method. This will result in slow queries. Instead, use two ``TimeQuery`` instances combined with the ``&`` or ``|`` operator.
-
-
-Keeping The Index Intact
-^^^^^^^^^^^^^^^^^^^^^^^^
-
-TinyFlux must build an index when it is initialized as it currently does not save the index upon closing. If the workflow for the session is read-only, then the index state will never be modified. If, however, a TinyFlux session consists of a mix of writes and reads, then the index will become invalid if at any time, a Point is inserted out of time order.
-
->>> from tinyflux import TinyFlux, Point
->>> from datetime import datetime, timedelta, timezone
->>> db = TinyFlux("my_db.csv")
->>> t = datetime.now(timezone.utc) # current time
->>> db.insert(Point(time=t))
->>> db.index.valid
-True
->>> db.insert(Point(time=t - timedelta(hours=1))) # a Point out of time order
->>> db.index.valid
-False
-
-If ``auto-index`` is set to ``True`` (the default setting), then the next read will rebuild the index, which may just seem like a very slow query. For smaller datasets, reindexing is usually not noticeable.
-
-.. hint::
-
- If possible, Points should be inserted into TinyFlux in time-order.
-
 Saving Space
 ^^^^^^^^^^^^
@@ -82,3 +44,42 @@ For example, if a TinyFlux database currently holds Points for two separate meas
 .. hint::
 When queries and indexes slow down a workflow, consider creating separate databases. Or, just migrate to InfluxDB.
+
+
+Optimizing Queries
+^^^^^^^^^^^^^^^^^^
+
+Unlike TinyDB, TinyFlux never pulls in the entirety of its data into memory (unless the ``.all()`` method is called). This has the benefit of reducing the memory footprint of the database, but means that database operations are usually I/O bound. By using an index, TinyFlux is able to construct a matching set of items from the storage layer without actually reading any of those items. For database operations that return Points, TinyFlux iterates over the storage, collects the items that belong in the set, deserializes them, and finally returns them to the caller.
+
+This ultimately means that the smaller the set of matches, the less I/O TinyFlux must perform.
+
+.. hint::
+
+ Queries that return smaller sets of matches perform best.
+
+.. warning::
+
+ Resist the urge to build your own time range query using the ``.map()`` query method. This will result in slow queries. Instead, use two ``TimeQuery`` instances combined with the ``&`` or ``|`` operator.
+
+
+Keeping The Index Intact
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+TinyFlux must build an index when it is initialized as it currently does not save the index upon closing. If the workflow for the session is read-only, then the index state will never be modified. If, however, a TinyFlux session consists of a mix of writes and reads, then the index will become invalid if at any time, a Point is inserted out of time order.
+
+>>> from tinyflux import TinyFlux, Point
+>>> from datetime import datetime, timedelta, timezone
+>>> db = TinyFlux("my_db.csv")
+>>> t = datetime.now(timezone.utc) # current time
+>>> db.insert(Point(time=t))
+>>> db.index.valid
+True
+>>> db.insert(Point(time=t - timedelta(hours=1))) # a Point out of time order
+>>> db.index.valid
+False
+
+If ``auto-index`` is set to ``True`` (the default setting), then the next read will rebuild the index, which may just seem like a very slow query. For smaller datasets, reindexing is usually not noticeable.
+
+.. 
hint::
+
+ If possible, Points should be inserted into TinyFlux in time-order.
diff --git a/requirements.txt b/requirements.txt
index 9daf6b2..a15a32b 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -4,6 +4,7 @@ flake8-docstrings
 mypy
 pytest
 pytest-cov
+rstcheck
 sphinx
 sphinx_autodoc_typehints
 sphinx_rtd_theme