Marco/ruff and ty checks #153

Merged
MarcoForte merged 14 commits into main from marco/ruff_and_ty_checks
Jan 28, 2026

Contributor

@MarcoForte MarcoForte commented Jan 17, 2026

Adds Ruff, ty, and prek for pre-commit checks.

CI/CD caught the intentional error in this commit.

MarcoForte and others added 7 commits January 6, 2026 22:46
* tentative maturin update

* bumping the version

* reverting the pyproject change, not important

* [chore] Cleaner wds (#147)

* Some cleanup + better benchmarks

* cargo fmt, broken dev setup

* removing a silly change

* better workload spread while wds, tarball extraction is still the bottleneck

* better workload spread while wds, tarball extraction is still the bottleneck

* adding a small plot helper

* Renaming a confusing variable + adding more benchmarks

* Updating pyproject + PD12M bench on Epyc

* Fix the missing attributes from wds samples

* slightly cleaner code

* Adding a unit test + cleaner error handling

* Adding backtrace to the tests

* code review

* Updating the version post-merge

* [infra] Update the build process (#148)

* tentative maturin update

* bumping the version

* reverting the pyproject change, not important
Comment thread .pre-commit-config.yaml
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.3.0
  - repo: builtin
Contributor Author

See https://prek.j178.dev/builtin/. The builtin hooks are faster, though not compatible with standard pre-commit.

Collaborator

Ooh, new to me. Could actually be useful at Mistral: we're big pre-commit users and it's quite slow.

Comment thread pyproject.toml
Comment on lines +26 to +32
"prek>=0.2.29",
]

dev = [
"matplotlib",
"pandas",
"prek>=0.2.29",
Contributor Author

Actually, I'm not sure how we want to split the dependencies between dev and test. Should there be overlap?

Collaborator

The above looks good to me. Basically test == unit tests, and dev is really for benchmarks?

 if sweep:
     results_sweep = {}
-    for num_workers in range(2, (os.cpu_count() * 2 or 2), 2):
+    for num_workers in range(2, (os.cpu_count() or 1) * 2, 2):
Contributor Author

ty flagged this

Collaborator

Ah yes, this is really crap. I've been changing that in branches to just go *2 until we reach the CPU count. I wanted to do something more iterative here, but that's a bit ugly/hard to read.
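
For context on the change in this thread: `*` binds tighter than `or`, so the old expression evaluates `os.cpu_count() * 2` first and raises `TypeError` whenever `os.cpu_count()` returns `None`; the `or 2` fallback never gets a chance to run. The sketch below contrasts the merged linear range with one possible reading of the "go *2 until we reach the CPU count" idea. Both function names are hypothetical and not part of the repository.

```python
import os
from typing import List, Optional


def linear_worker_counts(cpu_count: Optional[int]) -> List[int]:
    """Even worker counts below 2x the CPU count, as in the merged line."""
    # Apply the fallback before multiplying, since cpu_count may be None.
    return list(range(2, (cpu_count or 1) * 2, 2))


def doubling_worker_counts(cpu_count: Optional[int]) -> List[int]:
    """Hypothetical alternative: 2, 4, 8, ... while within the CPU count."""
    limit = cpu_count or 1
    counts = []
    n = 2
    while n <= limit:
        counts.append(n)
        n *= 2
    return counts


if __name__ == "__main__":
    # os.cpu_count() is documented to return None when undetermined.
    print(linear_worker_counts(os.cpu_count()))
    print(doubling_worker_counts(os.cpu_count()))
```

The doubling variant keeps the sweep short on machines with many cores, at the cost of coarser resolution between the sampled worker counts.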

Comment on lines -183 to +193
-def custom_transform(sample):
-    if "jpg" in sample:
-        sample["jpg"] = transform(sample["jpg"])
-    if "png" in sample:
-        sample["png"] = transform(sample["png"])
-    return sample

+if transform:
+
+    def custom_transform(sample):
+        if "jpg" in sample:
+            sample["jpg"] = transform(sample["jpg"])
+        if "png" in sample:
+            sample["png"] = transform(sample["png"])
+        return sample
+else:
+    custom_transform = lambda x: x  # noqa: E731
Contributor Author

will revisit this
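
One possible shape for that revisit, offered only as a sketch: a small factory that always returns a named function, so neither the lambda nor the `noqa: E731` suppression is needed. The name `make_sample_transform` is hypothetical, and the sketch assumes "no transform" is signalled by `None`, whereas the diff above tests truthiness with `if transform:`.

```python
from typing import Any, Callable, Dict, Optional

Sample = Dict[str, Any]


def make_sample_transform(
    transform: Optional[Callable[[Any], Any]],
) -> Callable[[Sample], Sample]:
    """Build a per-sample function; it is the identity when transform is None."""

    def custom_transform(sample: Sample) -> Sample:
        if transform is None:
            return sample
        # Apply the transform only to the image payloads present in the sample.
        for key in ("jpg", "png"):
            if key in sample:
                sample[key] = transform(sample[key])
        return sample

    return custom_transform
```

A call such as `make_sample_transform(None)` then hands back the sample untouched, while any callable is applied to the `jpg` and `png` entries only.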

Comment thread uv.lock
Contributor Author

supposed to commit this apparently, but I'm not too sure

Collaborator

yes, I'm still not super used to it but it's more precise than the toml in that everything is versioned, so it makes sure we're all working with the same binaries. We should have done this before, my bad

Collaborator

@blefaudeux blefaudeux left a comment

Looking good to me, thanks @MarcoForte! Happy to add you to the authors if you'd like to keep going with some additions, by the way.

  - name: Install deps
    run: |
-     uv pip install maturin twine
+     uv tool install maturin
Collaborator

Ah, I forgot, but twine is indeed not required. Off the top of my head, I think maturin does everything here.

@MarcoForte MarcoForte marked this pull request as ready for review January 28, 2026 11:40
@MarcoForte MarcoForte merged commit 23cfd2a into main Jan 28, 2026
7 checks passed