[pull] master from clockworklabs:master #2

Open
wants to merge 282 commits into
base: master

Conversation

pull[bot]

@pull pull bot commented Mar 5, 2024

See Commits and Changes for more details.


Created by pull[bot]

Can you help keep this open source service alive? 💖 Please sponsor : )

@pull pull bot added the ⤵️ pull label Mar 5, 2024
Centril and others added 29 commits March 15, 2024 01:06
)

* remove tracing from InstanceEnv::*_by_col_eq

* Added #[tracing::instrument(skip_all)] to call_reducer_with_tx and call_reducer

* remove tracing on some stuff in wasm_instance_env

---------

Co-authored-by: Tyler Cloutier <cloutiertyler@aol.com>
It turns out that the changes introduced in #734 do not result in more
reliable detection of incompatible schema updates. This is because the
data structures involved can be converted into each other, but that
conversion is not bijective.

Fix this by manually adjusting the schema of the existing table to be
comparable to the proposed table.

Also log details about a schema mismatch to the user-retrievable database log,
in unified diff format.
* Log a warning when doing `iter_by_col_range` without an index

* Only warn if the table is sufficiently large for a scan to be bad

Per Tyler's review, this commit gates the warning behind `rdb_num_table_rows`,
so that the warning is only printed
if the table in question has at least `TOO_MANY_ROWS_FOR_SCAN` rows.

`TOO_MANY_ROWS_FOR_SCAN` is defined as 1000
because that's the number Tyler said in his comment.

* Gate the unindexed warning behind a feature in `core`
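
The row-count gate described above can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: `unindexed_scan_warning` and its parameters are hypothetical names, and only the constant `TOO_MANY_ROWS_FOR_SCAN` and its value of 1000 come from the commit message. The Cargo feature gate mentioned in the last bullet is left out.

```rust
/// Threshold named in the commit message: a table with at least this many
/// rows is considered too large to scan without an index.
const TOO_MANY_ROWS_FOR_SCAN: u64 = 1000;

/// Hypothetical helper (not the real API): returns the warning text when an
/// unindexed range scan should be reported, or `None` when it should not.
fn unindexed_scan_warning(table_name: &str, num_rows: u64, has_index: bool) -> Option<String> {
    // An indexed lookup never warns, and small tables are cheap to scan.
    if has_index || num_rows < TOO_MANY_ROWS_FOR_SCAN {
        return None;
    }
    Some(format!(
        "iter_by_col_range on table `{table_name}` ({num_rows} rows) has no index; scanning all rows"
    ))
}
```

A caller would log the returned string, if any, before falling back to the full scan.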
* Distinguish between inner and semijoins in `QueryExpr` AST.

This commit adds a flag `semi: bool` to `JoinExpr`, which signifies a semijoin,
as opposed to an inner join.

A new optimization pass, `QueryExpr::try_semi_join`, is defined
which can detect a certain common case of inner joins and rewrite them into semijoins.

The punchline here is that `core::vm::join_inner` used to accept a flag `semi: bool`
which it could use to avoid some expensive `Header` mutations,
but that flag was always passed as `false` because we had no way to distinguish semijoins.
With this commit, the flag is actually used,
so evaluating non-indexed semijoins should avoid allocating a new `Header`.
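
To illustrate why the `semi` flag matters, here is a heavily simplified sketch. The `Row` type and this `join_inner` signature are assumptions made for the example and do not match `core::vm::join_inner`. The sketch also uses the textbook semijoin, where each left row is emitted at most once, which may differ from the real implementation.

```rust
use std::collections::HashSet;

/// Hypothetical row type: a flat list of integer columns.
type Row = Vec<i64>;

/// Simplified join on `lhs[lhs_col] == rhs[rhs_col]`.
/// With `semi == true`, only left rows are returned, so no combined row
/// (and, in the real system, no combined header) has to be built.
fn join_inner(lhs: &[Row], rhs: &[Row], lhs_col: usize, rhs_col: usize, semi: bool) -> Vec<Row> {
    if semi {
        // Semijoin: keep each left row that has at least one match on the right.
        let keys: HashSet<i64> = rhs.iter().map(|r| r[rhs_col]).collect();
        return lhs.iter().filter(|l| keys.contains(&l[lhs_col])).cloned().collect();
    }
    // Inner join: concatenate every matching (left, right) pair.
    let mut out = Vec::new();
    for l in lhs {
        for r in rhs.iter().filter(|r| r[rhs_col] == l[lhs_col]) {
            let mut row = l.clone();
            row.extend_from_slice(r);
            out.push(row);
        }
    }
    out
}
```

The semijoin branch never widens a row, which is the kind of work the commit message says the flag is meant to avoid.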

* Address Joshua's review

- Remove a test that was silly and backwards, and intentionally thwarted the optimizer
  in a way that will hopefully stop working soon.
- Add a test that an `IncrementalJoin`'s `virtual_plan` looks like we expect.
- Rename the `JoinExpr` argument to `core::vm::join_inner` for clarity.
- Sprinkle comments around about how we compile and optimize joins.
Co-authored-by: joshua-spacetime <josh@clockworklabs.io>
…959)

Closes #813.

For queries with read-only row operations,
a subscription will no longer materialize product values;
instead, it will serialize from bflatn straight to bsatn.

Co-authored-by: joshua-spacetime <josh@clockworklabs.io>
#951)

* eval_updates: use map entry apis

* dedup logic in remove_subscription + use entry api to hash only once

* stop cloning PVs in eval_updates

* address Phoebe's comments

* add tracing for perf testing

---------

Co-authored-by: joshua-spacetime <josh@clockworklabs.io>
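
The "hash only once" point can be illustrated with the standard `HashMap` entry API. The map shape and the name `remove_subscriber` below are hypothetical and are not taken from the subscription code; the sketch only shows the general technique.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;

/// Removes `client` from the subscriber list of `query`, dropping the query
/// entirely once nobody subscribes to it. Returns whether the query was known.
/// The key is hashed once, in `entry`, instead of once per `get` / `remove`.
fn remove_subscriber(subs: &mut HashMap<String, Vec<u32>>, query: &str, client: u32) -> bool {
    match subs.entry(query.to_owned()) {
        Entry::Occupied(mut e) => {
            e.get_mut().retain(|c| *c != client);
            if e.get().is_empty() {
                // Reuses the slot found above; no second lookup.
                e.remove();
            }
            true
        }
        Entry::Vacant(_) => false,
    }
}
```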
Co-authored-by: joshua-spacetime <josh@clockworklabs.io>
`AlgebraicTypeLayout` and friends already include full layout information,
including properly-aligned offsets for `ProductTypeElementLayout`s.
As such, there's no need to do any alignment computation
during `serialize_value` or `write_value`.

Instead, while traversing a `ProductTypeLayout`,
we can use each element's `offset` to update the `curr_offset`.
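
A minimal sketch of the idea, assuming a made-up `ElementLayout` with a precomputed `offset` and `size` (the real `ProductTypeElementLayout` is richer): because each offset is already aligned, the traversal just jumps to it rather than rounding a running offset up to the next alignment boundary.

```rust
/// Hypothetical stand-in for a product element layout whose offset was
/// computed, with alignment already applied, when the layout was built.
struct ElementLayout {
    offset: usize,
    size: usize,
}

/// Copies each field out of a padded row into a packed buffer.
/// No alignment arithmetic happens here: `curr_offset` is taken from the layout.
fn pack_fields(row: &[u8], layout: &[ElementLayout]) -> Vec<u8> {
    let mut out = Vec::new();
    for elem in layout {
        let curr_offset = elem.offset;
        out.extend_from_slice(&row[curr_offset..curr_offset + elem.size]);
    }
    out
}
```

For a `(u8, u32)` row, for example, the layout would hold offsets 0 and 4, and the three padding bytes in between are never visited.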
* Nuke `to_mem_table_with_op_type`

Rather than annotating rows with `__op_type` during `eval_incr` of selects,
partition the rows before evaluation, then merge after.

* Add historical comment.

Co-authored-by: Mazdak Farrokhzad <twingoow@gmail.com>
Signed-off-by: Phoebe Goldman <phoebe@goldman-tribe.org>

* Remove `_replaced_source_id`

---------

Signed-off-by: Phoebe Goldman <phoebe@goldman-tribe.org>
Co-authored-by: Mazdak Farrokhzad <twingoow@gmail.com>
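
The "partition, evaluate, merge" approach from the first bullet of this commit can be sketched as below. `Op`, the tuple-based update type, and `eval_select_incr` are simplified assumptions for illustration; the real `eval_incr` works on the project's own table-update types.

```rust
/// Hypothetical operation tag for a row update.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Op {
    Insert,
    Delete,
}

/// Evaluates a selection over a batch of updates without attaching an extra
/// `__op_type` column to every row: split by operation first, filter each
/// half as plain rows, then re-tag while merging.
fn eval_select_incr(
    updates: Vec<(Op, Vec<i64>)>,
    pred: impl Fn(&[i64]) -> bool,
) -> Vec<(Op, Vec<i64>)> {
    // Partition before evaluation.
    let mut inserts = Vec::new();
    let mut deletes = Vec::new();
    for (op, row) in updates {
        match op {
            Op::Insert => inserts.push(row),
            Op::Delete => deletes.push(row),
        }
    }
    // Evaluate the selection on untagged rows.
    let select = |rows: Vec<Vec<i64>>| {
        rows.into_iter().filter(|r| pred(r.as_slice())).collect::<Vec<_>>()
    };
    // Merge after evaluation, restoring the operation tags.
    let mut out = Vec::new();
    out.extend(select(deletes).into_iter().map(|r| (Op::Delete, r)));
    out.extend(select(inserts).into_iter().map(|r| (Op::Insert, r)));
    out
}
```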
)

* incr-join, find_updates: avoid unnecessary clones & use partition

* JoinSide: store 'Vec<PV>'s instead

* address joshua & phoebe's reviews
If a subscription drops its read lock on the database too early,
that is, before it sends its updates to the client,
this test will fail.
Updates a test to wake up a writer tx only after a reader tx has started.
Fixes #1009.

Looking up a positional FieldName in a Header was broken.
* Implement (but do not use) a fast path for BFLATN -> BSATN conversion

* fmt and clippy

* `u16` offset rather than `usize`

* Address Joshua's review

* Define methods on `RowRef` and `RelValue` which use the new serializer

* Comment in `align_to` about div-by-zero

Co-authored-by: Mazdak Farrokhzad <twingoow@gmail.com>
Signed-off-by: Phoebe Goldman <phoebe@goldman-tribe.org>

* Add benchmark comparing BFLATN -> BSATN with and without the fast path

* Add benchmark on `u64_u64_u32`, which has less interior padding than `u32_u64_u64`

* Remove `to_len` from `to_bsatn_extend`

It turns out to be slower than just eating the `realloc`s.

* Remove unused `to_bsatn_slice`

I thought I would need it, but it ended up not being useful.

* Expand comment with example; `Box<[...]>` to reduce memory footprint

* Comments from Mazdak's review

---------

Signed-off-by: Phoebe Goldman <phoebe@goldman-tribe.org>
Co-authored-by: Mazdak Farrokhzad <twingoow@gmail.com>
* bump Rust to 1.77 + fix warnings + use Bound::map

* use .truncate(true) for OpenOptions
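
For context on the second bullet: `std::fs::OpenOptions` does not truncate an existing file unless asked to, so rewriting a file with shorter contents can leave stale bytes at the end. The helper below is a hypothetical example of the explicit form, not code from the commit.

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::path::Path;

/// Replaces the contents of `path`, creating the file if needed.
/// `truncate(true)` drops any previous contents on open, so a shorter
/// payload does not leave the tail of the old file behind.
fn overwrite_file(path: &Path, contents: &[u8]) -> io::Result<()> {
    let mut file = OpenOptions::new()
        .write(true)
        .create(true)
        .truncate(true)
        .open(path)?;
    file.write_all(contents)
}
```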
Closes #1024.

Before this change,
we would serialize messages **before** inserting into the send queue.

Because we commit the tx only after inserting into the send queue,
this meant we were holding onto the database lock unnecessarily.

After this change,
we serialize messages **after** inserting into the send queue.
This means we serialize only after committing the tx.
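
The reordering can be sketched with a `Mutex` standing in for the database lock and an `mpsc` channel standing in for the send queue. All names here (`Update`, `commit_and_enqueue`, `drain_and_serialize`) are hypothetical, and the byte encoding is a placeholder rather than the real message format.

```rust
use std::sync::mpsc::{Receiver, Sender};
use std::sync::Mutex;

/// Hypothetical unserialized message placed on the send queue.
struct Update {
    rows: Vec<u32>,
}

/// Placeholder encoding; the potentially expensive step we want off the lock.
fn serialize(update: &Update) -> Vec<u8> {
    update.rows.iter().flat_map(|r| r.to_le_bytes()).collect()
}

/// Writer side: while holding the lock, only apply the rows and enqueue the
/// *unserialized* update. The guard is dropped on return, ending the "tx".
fn commit_and_enqueue(db: &Mutex<Vec<u32>>, queue: &Sender<Update>, rows: Vec<u32>) {
    let mut tx = db.lock().unwrap();
    tx.extend_from_slice(&rows);
    queue.send(Update { rows }).unwrap();
}

/// Sender side: serialization happens here, with no database lock held.
fn drain_and_serialize(queue: &Receiver<Update>) -> Vec<Vec<u8>> {
    queue.try_iter().map(|u| serialize(&u)).collect()
}
```

The trade-off is that the queue now carries structured updates rather than bytes, in exchange for a shorter critical section.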
…#1028)

* add AlgebraicValue::take for a neater interface

* btree_index: move test-only code to tests
* eval_incr: add RelValue::ProjRef(&PV) to avoid cloning PVs

* 1. rename `build_source_query` -> `in_mem_to_rel_ops`
2. `SourceExpr::{MemTable -> InMemory}`
3. clarify some commentary re. SourceExpr/SourceSet and friends
4. cleanup: simplify `compile_select_eval_incr`
5. remove ProgramStore; twas dead code.

* add SourceProvider, simplifying the source set stuff

* use MemTable less

* split DatabaseTableUpdate in deletes/inserts vecs

* incr-join: avoid temp Vec<_> allocs

* store deletes/inserts separately in eval_incr results; mostly cleanup
…atabases (#901)

* Fixed an issue which caused metrics to only be recorded for on-disk databases

* cargo fmt

* Removed unused args

* Fixed logic error
* Binary WebSocket API: Brotli-compress all outgoing messages

* Decrease buffer size; comment on future work

Co-authored-by: joshua-spacetime <josh@clockworklabs.io>
Signed-off-by: Phoebe Goldman <phoebe@goldman-tribe.org>

* Note experimental compression ratio

---------

Signed-off-by: Phoebe Goldman <phoebe@goldman-tribe.org>
Co-authored-by: joshua-spacetime <josh@clockworklabs.io>
First in a series of patches to implement the new commitlog format.

This patch implements the base format, leaving the transaction payload
generic. Segment handling, writing and reading are implemented based on
an in-memory backend, which greatly simplifies testing.

As a notable deviation from the previous implementation, segments are
never implicitly trimmed. Instead, faulty commits are ignored if and
only if the next commit in the log sequence is valid and has the right
offset. On the write path, this entails closing the active segment when
an (I/O) error occurs, but retaining the commit in memory such that it
is written to the next segment.

Note that this patch does not define the final public API.
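
The read-side rule for faulty commits might look roughly like the sketch below. It rests on assumptions that are not in the patch description: each commit covers exactly one offset, a `valid` flag stands in for checksum verification, and a valid commit at an unexpected offset is treated like a faulty one. `Commit` and `replay` are illustrative names only.

```rust
/// Hypothetical decoded commit; `valid == false` models a failed checksum.
#[derive(Debug, Clone, PartialEq)]
struct Commit {
    offset: u64,
    valid: bool,
}

/// Returns the offsets of the commits that would be accepted when reading
/// `log` in order, starting from offset `expected`.
fn replay(log: &[Commit], mut expected: u64) -> Vec<u64> {
    let mut accepted = Vec::new();
    let mut i = 0;
    while i < log.len() {
        let c = &log[i];
        if c.valid && c.offset == expected {
            accepted.push(c.offset);
            expected += 1;
        } else {
            // Skip a bad commit only if the very next one is valid and
            // carries the offset we were expecting; otherwise stop reading.
            let next_ok = log
                .get(i + 1)
                .map_or(false, |n| n.valid && n.offset == expected);
            if !next_ok {
                break;
            }
        }
        i += 1;
    }
    accepted
}
```

Under these assumptions, a commit that failed to write and was retried in the next segment shows up as a faulty entry followed by a valid one with the same offset, and reading continues past it.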
Provides a commitlog backing store based on files, and defines the
exported `Commitlog` type which fixes the store to the file-based one.
kim and others added 30 commits June 11, 2024 18:10
Signed-off-by: Tyler Cloutier <cloutiertyler@users.noreply.github.com>
Co-authored-by: Tyler Cloutier <cloutiertyler@users.noreply.github.com>
Signed-off-by: Kim Altintop <kim@eagain.io>
Co-authored-by: Kim Altintop <kim@eagain.io>
Co-authored-by: Mazdak Farrokhzad <twingoow@gmail.com>
Co-authored-by: Zeke Foppa <github.com/bfops>
Co-authored-by: Zeke Foppa <github.com/bfops>
Signed-off-by: Zeke Foppa <196249+bfops@users.noreply.github.com>
Signed-off-by: Ingvar Stepanyan <me@rreverser.com>
Co-authored-by: Zeke Foppa <196249+bfops@users.noreply.github.com>
…ave the correct `-s` param (#1457)

Co-authored-by: Zeke Foppa <github.com/bfops>