Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Syncstorage inital sqlite support #1520

Draft
wants to merge 75 commits into
base: master
Choose a base branch
from

Conversation

Eragonfr
Copy link
Contributor

@Eragonfr Eragonfr commented Feb 11, 2024

Description

Re-implementation of #935 on the newer codebase.
Thanks for @mqus for the first implementation.

Issue(s)

Partially fixes #498

Todo

  • Code Cleanup
  • Commit history cleanup (I'm not really working in a clean way)
  • Add Sqlite clippy checks to CircleCI
  • docker container (Thanks @bcspragu for the help)
  • e2e tests image (Thanks @bcspragu for the help)
  • pass the e2e tests

Current state is that it runs on my computer, but it's probably not clean at all, there's still a lot of duplicate code, and some SQL queries are still in MySQL-flavored SQL but could be used by the SQLite backend.

@Eragonfr
Copy link
Contributor Author

To pass the checks I have the choice between upgrading diesel or ignoring the issue with libsqlite-sys (which tbh is probably a non-issue because we should not be using SQLite printf anyway).

@Eragonfr Eragonfr force-pushed the syncstorage-sqlite branch 2 times, most recently from 1047197 to 9d83956 Compare March 10, 2024 21:36
@Eragonfr Eragonfr force-pushed the syncstorage-sqlite branch from 6927869 to 62774e3 Compare April 6, 2024 18:54
@Eragonfr Eragonfr force-pushed the syncstorage-sqlite branch from 62774e3 to 5ea3b39 Compare May 1, 2024 09:52
@Eragonfr
Copy link
Contributor Author

Eragonfr commented May 9, 2024

I choose to ignore the libsqlite3-sys CVE because it's not impacting this project. That should let me run the other tests in the CI.

clippy_sqlite:
# Matches what's run in circleci
cargo clippy --workspace --all-targets --no-default-features --features=syncstorage-db/sqlite,tokenserver-db/sqlite -- -D warnings

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if you saw #1513, but one of the things we're trying to do is to pull more of tokenserver into syncstorage-rs.

(Weird history lesson: Tokenserver was supposed to support more than just sync, but that never really happened. Having it distinct turned out to be a headache for a lot of reasons, so we're folding as much of Tokenserver as we can into the main syncstorage)

As a side note, this also means that the tokenserver storage stuff can probably live along-side the syncstorage data fairly easily. Possibly as just a bunch of other tables.

It's a bit more complicated for us since we have yet to migrate tokenserver over to spanner, but that's on our internal roadmap. When we get to doing that, however, is unknown.

We also have a few crate refactoring tickets open. I'll update those to indicate that we should really put the data handlers for both syncstorage and tokenserver into their own distinct crate.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not saw that PR.

You want to remove tokenserver completely. Is that right ?

I don't understand what it means for my PR.
Should I change how I handle tokenserver ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe that it impacts much of your PR, since you're only focused on the Storage part. There may be a few opportunities for folk running their own "Stand Alone" version to have just one database running instead of two, and depending on how we decide to do this for Spanner, there may be some additional functions we introduce to the database layer. We'll try to isolate and highlight those so that we don't cause more work.

@bcspragu
Copy link

bcspragu commented Aug 8, 2024

I'm really jazzed about this PR, thanks for all the work on it! I noticed this in the PR desc:

  • docker container (Help needed)

If this is a question of building a Docker container from this codebase that bundles all the SQLite deps and nothing more, I'm happy to take on that work.

@Eragonfr
Copy link
Contributor Author

Eragonfr commented Aug 9, 2024

I'm really jazzed about this PR, thanks for all the work on it! I noticed this in the PR desc:

  • docker container (Help needed)

If this is a question of building a Docker container from this codebase that bundles all the SQLite deps and nothing more, I'm happy to take on that work.

Something like that yes. There might be some docker-compose configuration files to make too.
I gave multiple tries to Docker but I never seems to get a "standard" docker environment. I always get something with some broken networking, nuked firewall or something else. At this point I've given up on that technology. (docker should release VMs that I can start and get a "standard" env with networking and stuff)

That would leave me to clean up my code

@bcspragu
Copy link

bcspragu commented Aug 9, 2024

Sounds good, I'll give it a shot!

@bcspragu
Copy link

bcspragu commented Aug 9, 2024

Okay, I think I've got something approximately working in Eragonfr/syncstorage-rs@syncstorage-sqlite...bcspragu:syncstorage-rs:syncstorage-sqlite

E.g.

docker build -t mozsync-sqlite --build-arg DATABASE_BACKEND=sqlite .

# The --userns line is a rootless Podman thing, can be ignored mostly
docker run --rm -it \
    --userns=keep-id:uid=10001,gid=10001 \
    -v $PWD/temp-data-do-not-submit:/data \
    -e "SYNC_HOST=0.0.0.0" \
    -e "SYNC_MASTER_SECRET=secret0" \
    -e "SYNC_SYNCSTORAGE__DATABASE_URL=sqlite:///data/syncdb.sqlite" \
    -e "SYNC_TOKENSERVER__DATABASE_URL=sqlite:///data/tokenserverdb.sqlite" \
    -e "SYNC_TOKENSERVER__RUN_MIGRATIONS=true" \
    mozsync-sqlite

Seems to start up successfully, but I'm not brave enough to connect it to my real Firefox sync yet

@Eragonfr
Copy link
Contributor Author

Thanks @bcspragu !

@bcspragu
Copy link

bcspragu commented Aug 12, 2024

For sure! Also, looking at the current ci/circleci: build-and-test failure, that's a result of missing --features flags here:

command: cargo build

Not sure what the team would prefer here, but there's two options:

  1. If you're just looking for a sanity check, test against MySQL (e.g --features=syncstorage-db/mysql,tokenserver-db/mysql)
  2. Run separate builds for each DB backend
  3. [Actually, this is probably what we want] Fix the default flags to use MySQL correctly

Unrelated, one thing I forgot to mention is that regardless of which features are enabled, MySQL client libs are required on the host (e.g. there are places where it isn't gated behind the feature flag, it gets linked in regardless). I started adding the relevant #cfg(feature(...)) in places where I noticed it, but again, not sure what the team would prefer there.

@Eragonfr
Copy link
Contributor Author

I thought of dropping the default flags because the user should choose their storage backend instead of relying on a default one.


We want to have the proper #cfg(feature(...)) when it's needed without breaking the spanner backend. (It uses MySQL for the token server database)

@Eragonfr
Copy link
Contributor Author

Eragonfr commented Sep 8, 2024

The ci fails on sqlite e2e tests. I think it is because the /data/ folder is missing, which means that the database can't be created and that makes the ci to fail.
I need some help to figure out how to either create this folder or to not need it in the container.

The checks are unrelated and are probably failing on the master branch too.

@Eragonfr
Copy link
Contributor Author

Eragonfr commented Sep 22, 2024

🤦‍♀️ I'm dumb the e2e tests can't work because they require a connection to the database from the python script and syncserver at the same time. Which is not possible with sqlite.
There was no need for my last height commits.
But the e2e tests are not usable with sqlite.

@Eragonfr Eragonfr force-pushed the syncstorage-sqlite branch 3 times, most recently from 7ca1578 to 1c08e86 Compare September 22, 2024 23:08
@JulianWgs
Copy link

Hi @Eragonfr , thanks for your hard work! I am trying to get this to run locally, but I can't execute this query against the token database, because the database file is empty. Am I doing something wrong? Could you try running the server with a missing token database and check if the server correctly sets it up? Thanks again! I am really looking forward to this!

@Eragonfr
Copy link
Contributor Author

Eragonfr commented Oct 13, 2024

@JulianWgs I also have the issue. I think that I broke the auto migration when doing some refactor.
The tokenserver part is giving me some trouble to get it to use SQLite queries, it doesn't pass its unit tests and there is quite a lot of its queries that doesn't run.

EDIT: Can't reproduce that bug anymore… wtf

@Eragonfr Eragonfr force-pushed the syncstorage-sqlite branch 4 times, most recently from ff5be44 to a8dda23 Compare October 19, 2024 23:16
@Eragonfr
Copy link
Contributor Author

Eragonfr commented Feb 12, 2025

I GOT THE TESTS TO RUN!!!
I modified them for SQLite, but I should be able to add only two conditions and everything else will run exactly the same on MySQL and SQLite

Also, none of them failed.
EDIT: Some are failing as I rework the code

@Eragonfr
Copy link
Contributor Author

Finger crossed… this might fail because of python and test lib version mismatch

@Eragonfr
Copy link
Contributor Author

Eragonfr commented Mar 8, 2025

Oops… the checks part will fail until we change something in it.

error: cannot install package `cargo-audit 0.21.2`, it requires rustc 1.81.0 or newer, while the currently active rustc version is 1.78.0
`cargo-audit 0.21.1` supports rustc 1.74.0

@Eragonfr
Copy link
Contributor Author

db5b1dc Will fail I haven't managed to get the ORM system to work exactly as needed to pass the tests

@almereyda
Copy link

almereyda commented Mar 15, 2025

There are three errors, one that shows in all e2e tests, one in the checks and one in the SQLite one:

  1. All environments fail with importing a type from SQLAlchemy:
Traceback (most recent call last):
  File "/app/tools/integration_tests/run.py", line 11, in <module>
    from tokenserver.run import run_end_to_end_tests, run_local_tests
  File "/app/tools/integration_tests/tokenserver/run.py", line 6, in <module>
    from tokenserver.test_authorization import TestAuthorization
  File "/app/tools/integration_tests/tokenserver/test_authorization.py", line 5, in <module>
    from tokenserver.test_support import TestCase
  File "/app/tools/integration_tests/tokenserver/test_support.py", line 28, in <module>
    from tokenserver.tables import Users, Nodes, Services
  File "/app/tools/integration_tests/tokenserver/tables.py", line 1, in <module>
    from sqlalchemy import Integer, String, Null, BigInteger, Index
ImportError: cannot import name 'Null' from 'sqlalchemy' (/usr/local/lib/python3.9/dist-packages/sqlalchemy/__init__.py)

As far as I can see it the used SQLAlchemy version does not have a type called Null. But there is one called NullType.

Maybe that's already it?

  1. The checks issue is related to a version and dependency jump from a dependency:
error: cannot install package `cargo-audit 0.21.2`, it requires rustc 1.81.0 or newer, while the currently active rustc version is 1.78.0
`cargo-audit 0.21.1` supports rustc 1.74.0

Does the patch upgrade from cargo-audit justify moving to rustc >= 1.81.0? I suppose this error would also apply on other branches. Maybe this can be fixed on mozilla-services:master and be rebased here?

  1. https://app.circleci.com/pipelines/github/mozilla-services/syncstorage-rs/5579/workflows/ee50f1bb-8d13-4fb3-b1e1-c7bed79be30a/jobs/22461?invite=true#step-106-410_334
called `Result::unwrap()` on an `Err` value: ApiError { kind: Db(DbError { kind: Sql(SqlError { kind: DieselConnection(BadConnection("Unable to open the database file")), status: 500, backtrace:    0: <syncserver_db_common::error::SqlError as core::convert::From<syncserver_db_common::error::SqlErrorKind>>::from

Might this be to a read-only filesystem on the CI runner? To the contrary, the databases in the other tests boot up just fine.

@Eragonfr
Copy link
Contributor Author

Eragonfr commented Mar 19, 2025

There are three errors, one that shows in all e2e tests, one in the checks and one in the SQLite one:

1. All environments fail with importing a type from SQLAlchemy:

As far as I can see it the used SQLAlchemy version does not have a type called Null. But there is one called NullType.

Interesting, this one might be related to version mismatch between my dev env and our test env I think.
I'm running python 3.13 with latest sqlalchemy on my machine.

2. The checks issue is related to a version and dependency jump from a dependency:

I already knew that
See #1520 (comment)
Already fixed in master on march 10 see 8c56cae.

Might this be to a read-only filesystem on the CI runner? To the contrary, the databases in the other tests boot up just fine.

docker FS is not read only
This is just me that fucked up the SQLite DB path and it points to a folder that doesn't exists
There's a good reason why the db in the other tests is able to start, one test uses MySQL only and the other uses MySQL+spanner. None of them uses SQLite except the sqlite one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support sqlite/postgresql backend
7 participants