Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: asdf list commands to sort output by version, not lexical #1428

Open
Kache opened this issue Jan 13, 2023 · 8 comments
Open

feat: asdf list commands to sort output by version, not lexical #1428

Kache opened this issue Jan 13, 2023 · 8 comments

Comments

@Kache
Copy link

Kache commented Jan 13, 2023

Is your feature request related to a problem? Please describe

asdf list, asdf list python, etc output version numbers that are lexicographically sorted, e.g.:

❯ asdf list python
  2.7.18
  3.10.0
  3.10.3
  3.10.6
 *3.10.9
  3.11.1
  3.6.15
  3.7.14
  3.7.6
  3.8.13
  3.9.11

But it'd be nicer to see them version-sorted (example below)

Describe the proposed solution

Would like to see this:

❯ asdf list python
  2.7.18
  3.6.15
  3.7.6
  3.7.14
  3.8.13
  3.9.11
  3.10.0
  3.10.3
  3.10.6
 *3.10.9
  3.11.1

Describe similar asdf features and why they are not sufficient

n/a

Describe other workarounds you've considered

piping to sort -V kinda works, but the * messes it up

@jthegedus
Copy link
Contributor

Any PR to resolve this should also include a test case which we expect to fail with the data from this issue, so we can guard against regressions.

I think the fix for this is actually pretty simple.

Mechanisms like sort -V and ls usage is already banned in the repo.

# sort --sort-version isn't supported everywhere
"sort.*-V"
"sort.*--sort-versions"
# ls often gets used when we want to glob for files that match a pattern
# or when we want to find all files/directories that match a pattern or are
# found in a certain location. Using shell globs is preferred over ls, and
# find is better at locating files that are in a certain location or that
# match certain filename patterns.
# https://github-wiki-see.page/m/koalaman/shellcheck/wiki/SC2012
'\bls '

The solution I would propose is moving sort_versions from

sort_versions() {
sed 'h; s/[+-]/./g; s/.p\([[:digit:]]\)/.z\1/; s/$/.z/; G; s/\n/ /' |
LC_ALL=C sort -t. -k 1,1 -k 2,2n -k 3,3n -k 4,4n -k 5,5n | awk '{print $2}'
}

into lib/utils.bash.

And then using sort_versions in lib/command-list.bash here:

versions=$(list_installed_versions "$plugin_name")

like so:

-versions=$(list_installed_versions "$plugin_name")
+versions=$(list_installed_versions "$plugin_name" | sort_versions)

I am AFK at the moment so cannot PR this myself. Whoever gets to it, the PR must include test cases for what we expect to succeed and what we expect to fail.

Whatever solution we implement here, won't order the more complex versions from plugins like Java, but the current solution is just a glob pattern, so I guess this wouldn't technically be worse 🤔

Java plugin versions:

...
sapmachine-musl-19.0.1-beta
semeru-jre-openj9-8u302-b08_openj9-0.27.0
...
trava-11.0.15+1
zulu-6.2.0.9
...

@jthegedus jthegedus changed the title aesthetic feat: asdf list commands to sort output by version, not lexical feat: asdf list commands to sort output by version, not lexical Jan 13, 2023
@stephanGarland
Copy link

stephanGarland commented Jan 15, 2023

I would alternatively suggest moving away from sort_versions() in its current incarnation, and simplifying it as much as possible by utilizing the built-in features of tools. See my PR here for asdf-plugin-template, and issue comment here for a discussion on why I think this is advantageous.

In sort_versions() specifically, there are three piped commands, and that function is being piped into from another command. The following were done on a base-model Macbook Air M1, with zsh-5.8.1.

# First downloaded tagged refs to avoid network bias
❯ git ls-remote --tags --refs "https://github.com/python/cpython" > python_versions.txt

# And again, but using git ls-remote's built-in sort
❯ git -c 'versionsort.suffix=a' -c 'versionsort.suffix=b' -c 'versionsort.suffix=r' -c 'versionsort.suffix=p' -c 'versionsort.suffix=-' -c 'versionsort.suffix=_' ls-remote --exit-code --tags --ref --sort="version:refname" "https://github.com/python/cpython" > python_sorted_versions.txt

# Gather times for both, dumping stdout to /dev/null, and running 100 times for each
❯ for i in {0..99}; do (time cat python_versions.txt | sed 'h; s/[+-]/./g; s/.p\([[:digit:]]\)/.z\1/; s/$/.z/; G; s/\n/ /' | LC_ALL=C sort -t. -k 1,1 -k 2,2n -k 3,3n -k 4,4n -k 5,5n | awk '{print $2}' >/dev/null) 2>> python_sort.txt; done

❯ for i in {0..99}; do (time cat python_sorted_versions.txt | awk -F'[/v]' '$NF ~ /^[0-9]+.*/ { print $NF }' >/dev/null) 2>> python_ls_remote.txt; done

# The result looks like this
❯ head python_sort.txt
cat python_versions.txt  0.00s user 0.00s system 71% cpu 0.001 total
sed 'h; s/[+-]/./g; s/.p\([[:digit:]]\)/.z\1/; s/$/.z/; G; s/\n/ /'  0.00s user 0.00s system 76% cpu 0.002 total
LC_ALL=C sort -t. -k 1,1 -k 2,2n -k 3,3n -k 4,4n -k 5,5n  0.00s user 0.00s system 70% cpu 0.002 total
awk '{print $2}' > /dev/null  0.00s user 0.00s system 64% cpu 0.006 total
...

❯ head python_ls_remote.txt
cat python_sorted_versions.txt  0.00s user 0.00s system 68% cpu 0.003 total
awk -F'[/v]' '$NF ~ /^[0-9]+.*/ { print $NF }' > /dev/null  0.00s user 0.00s system 74% cpu 0.01
...

# So four lines and two lines, respectively, need to have their 2nd-to-last column summed (total time AKA wall time)
❯ awk '{ s += $(NF-1) } NR % 4 == 0 { print s; s = 0 }' python_sort.txt >> python_total_sort.txt
❯ awk '{ s += $(NF-1) } NR % 2 == 0 { print s; s = 0 }' python_ls_remote.txt >> python_total_ls_remote.txt

# Finally, average them both and print
❯ awk '{ s += $(NF-1) } END { print s / NR }' python_total_sort.txt
0.01012

❯ awk '{ s += $(NF-1) } END { print s / NR }' python_total_ls_remote.txt
0.00618

The latter is ~40% faster. While these times are small, this kind of heavily piped code exists all over asdf, and they add up. Additionally, this is on a fairly zippy machine. On a much slower Debian server with spinning disks, I get this:

❯ awk '{ s += $(NF-1) } END { print s / NR }' python_total_sort.txt
0.01842

❯  awk '{ s += $(NF-1) } END { print s / NR }' python_total_ls_remote.txt
0.00851

That is over twice as fast as before, with the first taking nearly 20 milliseconds.

As far as the actual output as it relates to this question, here is a sample showing that it correctly sorts and differentiates between alpha, beta, release candidate, and final versions:

❯ awk -F'[/v]' '$NF ~ /^[0-9]+.*/ { print $NF }' <<< $(cat python_sorted_versions.txt) | grep -A14 3.9.16
3.9.16
3.10.0a1
3.10.0a2
3.10.0a3
3.10.0a4
3.10.0a5
3.10.0a6
3.10.0a7
3.10.0b1
3.10.0b2
3.10.0b3
3.10.0b4
3.10.0rc1
3.10.0rc2
3.10.0

@Stratus3D
Copy link
Member

For the list-all command we intentionally left version ordering in the output up to each plugin's bin/list-all callback. No sorting is done in asdf core because not all versions can be sorted by semantic version. For example, Erlang versions (which changed versioning schemes a while back), and versions of Java (which mix version and JVM type).

I'm not certain what we should do in this case with the list-all command. One solution would be to just sort them in asdf core as best we can (certainly no worse than what we have now). Or we could add a version sorting callback to plugins, and then pass the list of installed versions for the plugin to sort and return. Obviously this would be more hassle than what we have now, and perhaps not worth the hassle.

@jthegedus
Copy link
Contributor

jthegedus commented Jan 18, 2023

To recap:

  • asdf list doesn't currently do any sorting, ordering depends on Bash globbing
  • asdf list all <tool> is deferred to <tool>/bin/list-all command
  • sort_versions as suggested in this Issue by me is currently only used in the asdf update command. We probably don't want to increase it's usage.

After mulling this over for a few days I was going to suggest a optional (mandatory? 🤔) plugin bin/sort_versions.

@stephanGarland asdf list doesn't currently hit the network, so we can't really use the ls-remote built-in --sort for this specific issue. The current asdf-plugin-template repo relies on some sort-versions function, which we could replace as suggested by @stephanGarland (I am looking into that PR!).

I see these as our options:

  1. leave asdf list unsorted and close this issue as a no-fix 👎
  2. naively run asdf list through a local sorting method (sort_versions) 🤷
  3. add opt-in plugin callback bin/sort_versions and do nothing as current. Then update asdf-plugin-template to start adoption (with default implementation as ls-remote --sort suggested by @stephanGarland) 👍 👍 👍
  4. do 3 and fallback to 2 if no plugin callback. 👎 👎

My preference would be 3.

@stephanGarland
Copy link

Ah, gotcha. I'm still trying to work through the various parts of asdf and see where various functions are being called / used. I'm fine with any of options 2-4.

As an aside, lest it be buried in an old issue, I made this comment, with a proposed patch (no PR). It speeds up asdf list and asdf current by about 30% and 20%, respectively. It also causes a test to fail (test/where_command.bats), but I think it's a false positive based on other manual testing.

@jthegedus
Copy link
Contributor

jthegedus commented Jan 18, 2023

As an aside, lest it be buried in an old issue, I made this comment, with a proposed patch (no PR). It speeds up asdf list and asdf current by about 30% and 20%, respectively. It also causes a test to fail (test/where_command.bats), but I think it's a false positive based on other manual testing.

I did see that post. Haven't run the patch myself yet. We're happy to receive patches for our broken tests! :D We're also happy to receive performance improvements, though, as you can see from my comment in that thread (immediately above yours), I do want to get performance testing into our CI pipeline to start capturing, tracking and publishing our perf data so it can be replicated and improved. So, happy to hear your thoughts on that in the other thread. Thanks for helping out!

@Stratus3D
Copy link
Member

From your list @jthegedus , my preference would be 3. If we add support for that to asdf core plugins that need it could implement and start using it immediately after the next release of asdf.

@jthegedus
Copy link
Contributor

Great! A consensus on 3.

add opt-in plugin callback bin/sort_versions and do nothing as current. Then update asdf-plugin-template to start adoption (with default implementation as ls-remote --sort suggested by @stephanGarland) 👍 👍 👍

I will introduce this after I have completed #1445 plugin docs refactoring.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants