Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

On arm/android a zero initialized vec is not actually filled with zeros #1311

Open
4 of 11 tasks
Frando opened this issue Aug 14, 2023 · 5 comments
Open
4 of 11 tasks
Labels
A-qemu Area: qemu runners bug

Comments

@Frando
Copy link

Frando commented Aug 14, 2023

Checklist

Describe your issue

Hi, when cross-compiling a crate we use (redb) to android we discovered a very weird issue with targets aarch64-linux-android and armv7-linux-androideabi. The original issue is here but I managed to create a minimal reproduction (see below).

The issue is: When repeatedly creating a vector filled with 0u32, it will happen that this vector's first element is not 0u32 but instead a different value. So basically, the TLDR is:

let data = vec![0u32; len];
assert!(data[0], 0u32)

should never fail the assertion, but it does on aarch64-linux-android with cross test.

We have been pondering this for a few days but are completely out of ideas how and why this happens. If the qemu VM runs out of memory or something, it should panic with an OOM, shouldn't it?

Example

cargo init

paste the following into src/lib.rs

#[cfg(test)]
mod tests {
    #[test]
    fn it_works() {
        let mut cache = vec![];
        let len = 8192;
        for round in 0..10000 {
            let data = vec![0u32; len];
            for i in 0..len {
                assert_eq!(data[i], 0u32, "data[{i}] is {} (at round {round})", data[i])
            }
            cache.push(data);
        }
    }
}

and run

cross test --target aarch64-linux-android

The test should pass (and does when testing directly on the host) - but the above command fails with this:

running 1 test
test tests::it_works ... FAILED

failures:

---- tests::it_works stdout ----
thread 'tests::it_works' panicked at 'assertion failed: `(left == right)`
  left: `8192`,
 right: `0`: data[0] is 8192 (at round 4097)', src/lib.rs:10:17

Additional information / notes

No response

What target(s) are you cross-compiling for?

aarch64-linux-android

Which operating system is the host (e.g computer cross is on) running?

  • macOS
  • Windows
  • Linux / BSD
  • other OS (specify in description)

What architecture is the host?

  • x86_64 / AMD64
  • arm32
  • arm64 (including Mac M1)

What container engine is cross using?

  • docker
  • podman
  • other container engine (specify in description)

cross version

cross 0.2.5 (37c681a 2023-07-17)

@Emilgardis
Copy link
Member

Emilgardis commented Aug 14, 2023

what qemu version are you using? I'll test this eventually myself, but would be good information to have (and also if maybe a later qemu has fixed it)

docker run --privileged --rm tonistiigi/binfmt:master --version

@Emilgardis Emilgardis added bug A-qemu Area: qemu runners labels Aug 14, 2023
@Frando
Copy link
Author

Frando commented Aug 15, 2023

$ docker run --privileged --rm tonistiigi/binfmt:master --version
binfmt/82a943b qemu/v7.1.0 go/1.20.7

@Emilgardis
Copy link
Member

Sorry, seems like I was mistaken in that binfmt showed the installed version... Can you do

qemu-system-x86_64 --version

instead :)

here's an example of mismatch

$ docker run --privileged --rm tonistiigi/binfmt:master --version
binfmt/82a943b qemu/v7.1.0 go/1.20.2
 
$ qemu-system-x86_64 --version
QEMU emulator version 8.0.0
Copyright (c) 2003-2022 Fabrice Bellard and the QEMU Project developers

@Frando
Copy link
Author

Frando commented Aug 15, 2023

Yes, I get the same as you:

$ docker run --privileged --rm tonistiigi/binfmt:master --version
binfmt/82a943b qemu/v7.1.0 go/1.20.7

$ qemu-system-x86_64 --version
QEMU emulator version 8.0.0
Copyright (c) 2003-2022 Fabrice Bellard and the QEMU Project developers

Frando added a commit to n0-computer/iroh that referenced this issue Aug 23, 2023
github-merge-queue bot pushed a commit to n0-computer/iroh that referenced this issue Aug 24, 2023
## Description

This PR adds `iroh-sync`, a document synchronization protocol, to iroh,
and integrates with `iroh-net`, `iroh-gossip` and `iroh-bytes`.

* At the core is the `iroh-sync` crate, with a set reconciliation
algorithm implemented by @dignifiedquire. See [the old iroh-sync
repo](https://github.com/n0-computer/iroh-sync/) for the prehistory and
#1216 for the initial PR (fully included in this PR, and by now
outdated)
* Iroh sync is integrated in the iroh node, with iroh-gossip, in the RPC
interface, and the CLI.
* `LiveSync` is the handle to an actor that integrates sync with
[gossip](#1149 ) to broadcast and receive document updates from peers.
For each open document a gossip swarm is joined with a `TopicId` derived
from the doc namespace.
* mod `download` contains the new downloader. It will be improved in
#1344 .
* mod `client` is the new high-level RPC client. It currently only has
methods for dealing with docs and sync, other things should be added
once we merged this. CLI commands for sync are in `commands/sync.rs`.
Will be much better with #1356 .
* `examples/sync.rs` has a REPL to modify and sync docs. It does a full
setup without using the iroh console. Also includes code to sync
directories, and a hammer command for load testing.
* The PR also introduces `iroh::client::Iroh`, a wrapper around the RPC
client, and `iroh::client::Doc`, a wrapper around RPC client for a
single document

## Notes & open questions

#### Should likely happen before merge:

* [x] Make `iroh_sync::Store:::list_authors` and `list_replicas` return
iterators `iroh-sync` *fixed in #1366 *
* [ ] Add `iroh_sync::Store::close_replica`
* [x] `ContentStatus` in `on_insert` callback is reported as `Ready` if
the content is still `baomap::PartialEntry` (in-process download) *fixed
in a8e8093*


#### Can happen after merge, but  before `0.6` release

* [ ] Implement `AuthorImport` and `AuthorShare` RPC & CLI commands
* [ ] sync store `list_namespaces` and `list_authors` internally
collect, return iterator instead
* [ ] Fix cross-compiles to arm/android. See
cross-rs/cross#1311
* [ ] Ensure that fingerpring calculation is efficient and/or cached for
large documents. Currently calculating the initial fingerprint iterates
once over all entries in a document.
* [ ] Make content downloads be more reliable
* [ ] Add some way to download content from peers independent of the
first insertion event for a remote entry. The downloader with retries is
tracked in #1334 and 1344, but independent of that, we still would
currently only ever try to queue a download when the `on_insert`
callback triggers, which is only once. There should be a way, even if
manual for now, to try to download missing content in a replica from
peers.
* [ ] during `iroh-sync` sync include info if content is available for
each entry
* [ ] Add basic peer management and persistence. Currently live sync
will stop to do anything after a node restart.
* [ ] Persist the addressbook of peers for a document, to reconnect when
restarting the node
* [ ] Implement `PeerAdd` and `PeerList` RPC & CLI commands. The latter
needs changes in `iroh-net` to expose information of currently-connected
peers and their peer info.
* [ ] Make read-only replicas possible
* [ ] Improve reliablity of replica sync. 
* sync is triggered on each `NeighborUp` event from gossip. check that
we don't sync too much.
* maybe include peer info in gossip messages, to queue syncs with those
(but not all at once)
* track and exchange the timestamp of last full sync for peers, to know
if you missed gossiped message and react accordingly
     * add more tests with peers coming and leaving

#### Open questions

* [ ] `iroh_sync::EntrySignature` should the signatures include a
namespace prefix?
* [ ] do we want the 1:1 mapping of `NamespaceId`and gossip `TopicId`,
or would the topic id as a hash be better?

#### Other TODOs collected from the code

* [ ] Port `hammer` and `fs` commands from REPL example to iroh cli
* [ ] docs/naming: settle terminology about keypairs,
private/secret/signing keys, public keys/identifiers and make docs and
symbols consistent
* [ ] Make `bytes_get` streaming in the RPC interface
* [ ] Allow to unset the subscription on a replica
* [ ] `iroh-sync` shouldn't depend on `iroh-bytes` only for `Hash` type
-> #1354
* [ ] * [ ] Move `sync::live::PeerSource` to iroh-net or even better ->
#1354
* [ ] `StoreInstance::put` propagate error and verify timestamp is
reasonable.
* [ ] `StoreInstance::get_range` implement inverted range
* [ ] `iroh_sync`: Remove some items only used in tests (marked with
#[cfg(test)])
* [ ] `iroh_sync` fs store: verify get method fetches all keys with this
namespace
* [ ] `ranger::SimpleStore::get_range`: optimize
* [ ] `ranger::Peer` avoid allocs?
* [ ] `fs::StoreInstance::get_fingerprint` optimize
* [ ] `SyncEngine::doc_subscribe` remove unwrap, handle error


## Change checklist

- [x] Self-review.
- [x] Documentation updates if relevant.
- [ ] Tests if relevant.

---------

Co-authored-by: dignifiedquire <me@dignifiedquire.com>
Co-authored-by: Asmir Avdicevic <asmir.avdicevic64@gmail.com>
Co-authored-by: Kasey <klhuizinga@gmail.com>
@Frando
Copy link
Author

Frando commented Sep 19, 2023

Hi @Emilgardis, do you have any further ideas or how we could go about identifying/fixing this issue?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-qemu Area: qemu runners bug
Projects
None yet
Development

No branches or pull requests

2 participants