Update from staging #54

Merged
merged 28 commits into from Sep 25, 2023

justinGilmer

Catch master branch up to date with staging.

justinGilmer and others added 28 commits June 6, 2023 16:57
* Release v5.15.0

* update protobuf to v4.22.3

* Add threaded streamset calls

Using concurrent.futures.ThreadPoolExecutor

* Blacken code

* Update for failing tests

* Ignore flake8 as part of testing

pytest-flake8 seems to have issues with the later versions of flake8

tholo/pytest-flake8#92

* Update .gitignore

* Update ignore and remove extra print.

* Remove idea folder (pycharm)

---------

Co-authored-by: David Konigsberg <72822263+davidkonigsberg@users.noreply.github.com>
Co-authored-by: Jeff Lin <42981468+jleifnf@users.noreply.github.com>
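The "Add threaded streamset calls" commit above mentions `concurrent.futures.ThreadPoolExecutor`. As a rough sketch of that pattern (not the actual btrdb implementation), per-stream calls can be fanned out to a thread pool and collected in input order. `FakeStream` and `threaded_values` are hypothetical names used only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


class FakeStream:
    """Hypothetical stand-in for a stream object; not the btrdb API."""

    def __init__(self, name, points):
        self.name = name
        self._points = points

    def values(self, start, end):
        # Return the (time, value) tuples with start <= time < end.
        return [(t, v) for t, v in self._points if start <= t < end]


def threaded_values(streams, start, end, max_workers=4):
    """Query every stream in its own worker thread.

    Executor.map yields results in the same order as the input streams,
    regardless of which call finishes first.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda s: s.values(start, end), streams))


streams = [
    FakeStream("a", [(1, 1.0), (5, 2.0)]),
    FakeStream("b", [(2, 3.0)]),
]
results = threaded_values(streams, start=0, end=3)
```

Threads suit this case because each call mostly waits on network I/O, so the GIL is not the bottleneck.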
* Release v5.15.0

* update protobuf to v4.22.3

* Add threaded streamset calls

Using concurrent.futures.ThreadPoolExecutor

* Blacken code

* Update for failing tests

* Ignore flake8 as part of testing

pytest-flake8 seems to have issues with the later versions of flake8

tholo/pytest-flake8#92

* Update .gitignore

* Update proto definitions.

* Update endpoint to support arrow methods

* Support arrow endpoints

* Additional arrow updates

* Update transformers, add polars conversion

* Update .gitignore

* Update ignore and remove extra print.

* Remove idea folder (pycharm)

* Update requirements.txt

* Update btrdb/transformers.py

* Update the way to check for arrow-enabled btrdb

This has not been "turned on" yet, since we don't know which version number it will be enabled for. The method is currently commented out but can be re-enabled easily.

* Use IPC streams to send the arrow bytes for insert

Instead of writing out feather files to an `io.BytesIO` stream and then sending the feather files over the wire, this creates a buffered output stream and sends its contents to btrdb as bytes.

* Create arrow specific stream methods.

* Update test conn object to support minor version

* Update tests and migrate arrow code.

* Arrow and standard streamset insert

* Create basic arrow to dataframe transformer

* Support multirawvalues, arrow transformers

* Multivalue arrow queries, in progress

* Update stream filter to properly filter for sampling frequency

* Update arrow values queries for multivalues

* Update param passing for sampling frequency

* Update index passing, and ignore depth

* benchmark raw values queries for arrow and current api

* Add aligned windows and run func

* Streamset read benchmarks (WIP)

In addition:
* update streamset.count to support the `precise` boolean flag.

* Update mock return value for versionMajor

* In progress validation of stream benchs

---------

Co-authored-by: David Konigsberg <72822263+davidkonigsberg@users.noreply.github.com>
Co-authored-by: Jeff Lin <42981468+jleifnf@users.noreply.github.com>
* Add 3.10 python to the testing matrix

* Fix yaml parsing

* Update requirements to support 3.10

* Use pip-tools `pip-compile` cli tool to generate requirements.txt files from the updated pyproject.toml file
* Include pyproject.toml with basic features to support proper extra deps
* Support different ways to install btrdb from pip
  * `btrdb, btrdb[data], btrdb[all], btrdb[testing], btrdb[ray]`
* Update transformers.py to build up a numpy array when the subarrays are not the same size (number of entries)
  * This converts the main array's dtype to `object`
  * tests still pass with this change
* recompile the btrdb proto files with latest protobuf and grpc plugins
* Create multiple requirements.txt files for easier updating in the future as well as a locked version with pinned dependencies
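The `transformers.py` bullet above describes building a numpy array when the subarrays differ in length, which switches the outer array's dtype to `object`. The following is a minimal sketch of that idea, not the code from `transformers.py`; the helper name `to_array` is an assumption.

```python
import numpy as np


def to_array(subarrays):
    """Build a numpy array from a list of sequences.

    Equal-length inputs give a regular 2-D array. Ragged inputs give a
    1-D array of dtype object whose elements are the individual subarrays.
    """
    lengths = {len(sub) for sub in subarrays}
    if len(lengths) <= 1:
        return np.array(subarrays)

    # Pre-allocate an object array and fill it element by element, so
    # numpy never tries to infer a rectangular shape from ragged data.
    out = np.empty(len(subarrays), dtype=object)
    for i, sub in enumerate(subarrays):
        out[i] = np.asarray(sub)
    return out


even = to_array([[1, 2], [3, 4]])
ragged = to_array([[1, 2, 3], [4, 5]])
```

An object array loses vectorised arithmetic across rows, so callers may need to handle each subarray separately.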

* Ignore protoc generated flake errors

* Update test requirements

* Include pre-commit and setup.

* Pre-commit lints.
add staging to pre-commit checks
* Fix multistream endpoint bugs

* The streamset was passing the incorrect params to the endpoint
* The endpoint does not return a `version` in its response, just `stat` and `arrowBytes`

Params have been updated and a NoneType is passed around to ignore the
lack of version info, which lets us use the same logic for all bytes
decoding.

* Add multistream benchmark methods for timesnap and no timesnap.
Benchmarking methods added for:

* stream inserts using tuples of time, value data
* stream inserts using pyarrow tables of timestamps, value columns

* streamset inserts using a dict map of streamset stream uuids, and lists of tuples of time, value data
* streamset inserts using a dict map of streamset stream uuids, and pyarrow tables of timestamps, values.
* Add insert benchmarking methods

Benchmarking methods added for:

* stream inserts using tuples of time, value data
* stream inserts using pyarrow tables of timestamps, value columns

* streamset inserts using a dict map of streamset stream uuids, and lists of tuples of time, value data
* streamset inserts using a dict map of streamset stream uuids, and pyarrow tables of timestamps, values.

* Include nullable false in pyarrow schema inserts

* This was the only difference in the schemas between Go and Python.
* Also use a BytesIO stream as the sink for the IPC bytes.
Previously, the debug logging in the API built its f-strings regardless of whether logging.DEBUG was the current log level.

This can hurt performance, especially when benchmarking.

Now, a cached IS_DEBUG flag is created for the stream operations; in other locations, the logger.isEnabledFor boolean is checked.

Note that in stream.py, this function call is executed only once, and the result is cached for the rest of the logic.
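To illustrate the cost described above, here is a small standard-library sketch (not the btrdb code): an f-string passed to `logger.debug` is formatted before the logger checks its level, so guarding the call avoids the work. The logger name, the `Noisy` class and the function names are hypothetical.

```python
import logging

logger = logging.getLogger("example.stream")
logger.setLevel(logging.INFO)  # DEBUG messages are disabled in this sketch


class Noisy:
    """Counts how often it is formatted, to make wasted work visible."""

    def __init__(self):
        self.format_calls = 0

    def __repr__(self):
        self.format_calls += 1
        return "Noisy()"


def log_unguarded(obj):
    # The f-string is evaluated even though the message is then discarded.
    logger.debug(f"payload={obj!r}")


def log_guarded(obj):
    # The level check runs first, so the f-string is never built.
    if logger.isEnabledFor(logging.DEBUG):
        logger.debug(f"payload={obj!r}")


def process_batch(items):
    # Check the level once and reuse the result inside the loop.
    is_debug = logger.isEnabledFor(logging.DEBUG)
    for item in items:
        if is_debug:
            logger.debug(f"item={item!r}")
    return len(items)
```

A cached flag will not notice a log level that changes after it is computed, which is the trade-off for skipping the repeated check.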
* Update pandas testing and fix flake8 issues

* Update stream logic for unpacking arrow tables, update integration tests.

* add init.py for integration tests.

* Add additional tests for arrow methods vs their old api counterparts.

* Add tests for timesnap boundary conditions. (#32)

* Add more integration tests.

* Add additional integration tests, modify the name_callable ability of the arrow_values.

* remove extraneous prints.

* Include retry logic.

* Update statpoint order in arrow, fix some bugs with the arrow methods.

* Update testing to account for NaNs.

* Update github action versions.

* Update tests, add in a test for duplicate values.

* Remove empty test, remove extraneous prints

---------

Co-authored-by: andrewchambers <andrewchamberss@gmail.com>
* Update docs, add in final enhanced edits.
Once we have a v5.30 tag of the server with arrow/multistream, we can
merge this and complete the ticket.
* fix: patch up stream object type and other bugs

* fix: resolve depth errors in stream window

* fix: resolve remaining test warnings

* fix: resolve test imports

* chore: add pre-commit install to readme
@justinGilmer justinGilmer merged commit 2f32815 into master Sep 25, 2023
15 checks passed
@justinGilmer justinGilmer deleted the update_from_staging branch September 25, 2023 16:34
@justinGilmer

Update btrdb python with latest changes from [sc-26167]

