Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rex::Commands::Sync excludes are poorly handled #1597

Open
bbrtj opened this issue Jun 12, 2023 · 6 comments · May be fixed by #1598
Open

Rex::Commands::Sync excludes are poorly handled #1597

bbrtj opened this issue Jun 12, 2023 · 6 comments · May be fixed by #1598
Labels
triage needed A potential bug that needs to be reproduced and understood

Comments

@bbrtj
Copy link

bbrtj commented Jun 12, 2023

Describe the bug

There are two issues with sync_up and sync_down commands regarding excluded files and directories:

  1. Excludes are looked at too late, after calculating md5 of all files in the directory (recursively). This means a lot of time is spent doing meaningless work, for example like calculating md5 of all node_modules which are often excluded anyway.
  2. Excludes work well only when a flat file is excluded. Excluding nested files or whole directories does not work as one would expect from using tools like gitignore.

Expected behavior

  1. There's absolutely no need to calculate md5 sums of excluded files, as those will be thrown away unused.
  2. I can't grasp how the excluded directories currently work, but I would expect I have to pass a full path to a file which I want excluded, minus any wildcards.

How to reproduce it

  1. Have a large directory alongside your files
  2. Try syncing up with the directory excluded

Code example

sync_up 'my_dir', 'my_remote_dir', {
  exclude => ['directory_name'],
};

Additional context

No response

Rex version

(R)?ex 1.14.2

Perl version

This is perl 5, version 36, subversion 0 (v5.36.0) built for amd64-freebsd

Operating system running rex

FreeBSD 13.2 (amd64)

Operating system managed by rex

FreeBSD 13.1

How rex was installed?

cpan client

@bbrtj bbrtj added the triage needed A potential bug that needs to be reproduced and understood label Jun 12, 2023
@bbrtj bbrtj linked a pull request Jun 12, 2023 that will close this issue
5 tasks
@ferki
Copy link
Member

ferki commented Jun 16, 2023

There are two issues with sync_up and sync_down

Then I would recommend opening two separate GitHub issues ;) I'm OK treating Nr. 1 as a bug, but Nr 2. definitely feels like a new feature. History shows they tend to need different details, reasoning, discussion, timelines, etc., and bundling is rarely the simplest way to track them.

I propose focusing on the "superfluous checksumming of excluded files" bug in this issue, and when that's done we'll see if it's best to bundle the rest of the ideas here, or to split that off into its own discussion and solution.

@bbrtj
Copy link
Author

bbrtj commented Jun 17, 2023

@ferki I can chop off fixing how excludes work from the PR, but then I will not deliver as detailed tests for sync_up/down with excludes. I just don't get them, apart from excluding a single flat file. Will that do?

Also, I think making excludes work like gitignore should not be considered a new feature. I mean, it is a functional change, but the current system is underwhelming and straight up confusing to use - and I tried for several hours before doing the refactor. I consider it a bugfix.

@mrmuskrat
Copy link

It seems to me that sync_down's excludes should work the same as sync_up and allow excluding directories.

@ferki
Copy link
Member

ferki commented Oct 14, 2023

It seems to me that sync_down's excludes should work the same as sync_up and allow excluding directories.

@mrmuskrat: yes, I agree in principle 👍 I didn't look into deeper implications on the actual implementation side yet, though.

@ferki
Copy link
Member

ferki commented Oct 14, 2023

@ferki I can chop off fixing how excludes work from the PR, but then I will not deliver as detailed tests for sync_up/down with excludes. I just don't get them, apart from excluding a single flat file. Will that do?

Looking at the current tests around syncing functionality, I find them to be in a quite poor state. There's no Rex::Commands::Sync tests at all yet, and t/rsync.t has several perlcritic violations. However the scope and expected results are highly similar (if not identical).

Based on that, I would certainly start with fixing that before attempting to change anything about the code. I would probably approach further work on these topics something like the following:

  1. Preparation work:
  • clean up t/rsync.t violations one-by-one
  1. Add proper sync tests:
  • when clean, see if we can convert t/rsync.t to also run the same generated tests for each test case, by calling them with sync_down and sync_up instead of sync
    • if yes, have a single test file exercising all (r)sync-related functionality
    • if not, (re)use a similar test approach for sync_down and sync_up
  • if we identify bugs compared to the expected behavior specified by the sync tests, fix them

Note that that up until this point is about covering the existing code with tests, against current expectations. The sync code is ~10 years old, we certainly don't want to accidentally break it. It's also certainly interesting to find any bugs against original expectations that may have been hidden so far due to lack of test coverage.

The goal is to reach a state where it is safe and simple to express new expectations by new test code.

  1. Change the behavior
  • if all's well, new test cases may be added, for example proposing a change for the file exclusion logic; the new tests will fail at this point, because they are not implemented yet
  • the new expected logic can be implemented to pass the tests, proving that none of the old behavior breaks, and all of the new behavior works

In fact, I started some of this work locally as an exploration of possibilities, but I haven't progressed too much yet. I might revisit the branch and try to push it somewhere for reference on initial work (so that part does not need to be duplicated by someone else).

Also, I think making excludes work like gitignore should not be considered a new feature. I mean, it is a functional change, but the current system is underwhelming and straight up confusing to use - and I tried for several hours before doing the refactor. I consider it a bugfix.

It is a quite old part of the codebase, without any test coverage, and which was added as an escape hatch for situations when rsync is not available. I'm certain that we should past mistakes right sooner or later by adding proper tests before touching the legacy core code in any way, and before looking at how to improve it.

Going forward I see two options:

  • we specify how excludes are expected to work now and in the future by comprehensive tests, proving that the current behavior is the same, and the future behavior is supported; I don't mind introducing breaking behavior in core if warranted, since that's what feature flags are for
  • we make an external Rex::Commands::Sync::Gitignore or similar module outside of the core project (the maintainer of which is free to make any decision however they see fit for their own and users' purposes); that may either find its way to core over time, if desired, or in-core sync could be deprecated in favor of a superior external solution

@bbrtj
Copy link
Author

bbrtj commented Nov 28, 2023

I'm tempted to turn this into a new module, but I don't think leaving Sync as it is is a good idea.

Keeping backward compatibility here is likely a mistake. The old behavior seems to be exclude any file that matches the basename. If you specify test.txt it will happily exclude both test.txt and dir/test.txt. There is no way to exclude a file nested in specific tree and I have not been able to successfully exclude a directory. So the only use case right now is to exclude all files with a given name no matter in which directory, which is not very useful unless your files are very distinctively named.

I'd say breaking compatibility with this would cause less headaches than leaving it as it is and letting people find out why they are missing random files. This is a possibility given that there is no documentation explaining how these excludes work and the code is not that straightforward either.

The code I delivered in the PR is fine and I'm using it successfully, but I have difficulty living up to the standards of this repository, for which I apologize.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
triage needed A potential bug that needs to be reproduced and understood
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants