"fds forget" feature proposal #65

Open
guysmoilov opened this issue Jun 21, 2021 · 5 comments
Labels
enhancement New feature or request

Comments

@guysmoilov
Member

Scenario: You accidentally git add'ed or dvc add'ed a path that you didn't intend to.

It's a commonly googled question: https://stackoverflow.com/questions/1274057/how-to-make-git-forget-about-a-file-that-was-tracked-but-is-now-in-gitignore

What fds forget can add:

  1. Easier naming - no more googling required
  2. Automatically detect whether the file is tracked by git or DVC
  3. Remove the file from DVC cache if it is tracked by DVC (after confirmation from the user)
  4. Remove the relevant .dvc file if it exists, and also make git forget about that file
  5. More?
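A minimal sketch of point 2 (detecting which tool tracks a path), not the fds implementation. It assumes the path was added with `dvc add`, which writes a `<path>.dvc` file next to it, and it uses `git ls-files --error-unmatch` to ask git whether the path is in the index. The function name `tracked_by` and the injectable `run` parameter are hypothetical, chosen only so the logic can be exercised without a real repository. Files inside a DVC-tracked directory or pipeline outputs are not handled here.

```python
import subprocess
from pathlib import Path
from typing import Callable, Optional, Sequence


def _run(cmd: Sequence[str]) -> int:
    """Run a command quietly and return its exit code."""
    return subprocess.run(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    ).returncode


def tracked_by(path: str, run: Callable[[Sequence[str]], int] = _run) -> Optional[str]:
    """Return "dvc", "git", or None for an untracked path (hypothetical helper)."""
    # `dvc add <path>` creates a sibling file named `<path>.dvc`.
    if Path(f"{path}.dvc").exists():
        return "dvc"
    # `git ls-files --error-unmatch` exits non-zero when the path is not in the index.
    if run(["git", "ls-files", "--error-unmatch", "--", path]) == 0:
        return "git"
    return None
```

Checking for the `.dvc` file first matters because DVC adds the data path to `.gitignore`, so git would report such a path as untracked.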
@guysmoilov guysmoilov added the enhancement New feature or request label Jun 21, 2021
@indweller
Contributor

Hi @guysmoilov
I looked at the git part of this problem. There are two parts:
a) If you have not committed the file yet, a simple git restore --staged <file> will do.
b) If you want to untrack a file that has already been committed, it's trickier. Running git rm --cached (and listing the file in .gitignore) will delete the file from other people's working copies when they do a git pull. Running git update-index --assume-unchanged hides the file from the unstaged changes, but I think the file still remains in the repo.
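The two cases above could be told apart roughly like this. This is a sketch under assumptions, not fds code: `is_committed` and `git_forget_command` are hypothetical names, `path` is assumed to be relative to the repository root, and `run` is any callable that executes a command and returns its exit code. `git cat-file -e HEAD:<path>` exits with 0 only when the path exists in the HEAD commit.

```python
from typing import Callable, List, Sequence


def is_committed(path: str, run: Callable[[Sequence[str]], int]) -> bool:
    """True if `path` (relative to the repo root) exists in the HEAD commit."""
    return run(["git", "cat-file", "-e", f"HEAD:{path}"]) == 0


def git_forget_command(path: str, committed: bool) -> List[str]:
    """Pick the git command for case (a) or case (b)."""
    if committed:
        # Case (b): removes the path from the index only. Once this is
        # committed and pushed, others lose the file on their next pull.
        return ["git", "rm", "--cached", "--", path]
    # Case (a): the file was only staged, so unstaging it is enough.
    return ["git", "restore", "--staged", "--", path]
```

A caller would probably want to warn the user before running the case (b) command, since its effect reaches every clone of the repo.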

@guysmoilov
Member Author

@indweller Thanks for the research!
Yes, making git forget a committed file is next to impossible for a distributed repo.
As the first line in the issue suggests, I think we should focus on git add and dvc add - fds forget is IMO much easier to remember than git restore --staged <file> and also should handle removing the file from DVC tracking.

@indweller
Contributor

indweller commented Aug 4, 2021

OK, so for the git part it can do git restore --staged, and for the DVC part it can do dvc remove (https://dvc.org/doc/user-guide/how-to/stop-tracking-data). Can I work on this issue?

@guysmoilov
Member Author

guysmoilov commented Aug 5, 2021

@indweller I think you also need to run some form of dvc gc after dvc remove.
And sure, thank you!
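For the DVC side, the sequence discussed above (dvc remove, then some form of dvc gc after user confirmation) might be planned like this. It is a sketch only: `dvc_forget_commands` and the `confirm` callback are hypothetical, and the path is assumed to have been added with `dvc add`, so its tracking file is `<path>.dvc`. `--workspace` is one possible scope for `dvc gc`; it deletes every cached object the current workspace does not reference, not just the forgotten file, which is why the confirmation step comes first.

```python
from typing import Callable, List


def dvc_forget_commands(path: str, confirm: Callable[[str], bool]) -> List[List[str]]:
    """Build the DVC commands to stop tracking `path` (hypothetical helper)."""
    # `dvc remove` takes the .dvc file as its target and also drops the
    # matching .gitignore entry; the data file itself stays on disk.
    commands = [["dvc", "remove", f"{path}.dvc"]]
    if confirm("Also delete cached data that the workspace no longer uses?"):
        # --force skips dvc's own prompt because the user has already confirmed.
        commands.append(["dvc", "gc", "--workspace", "--force"])
    return commands
```

Returning the commands instead of running them keeps the plan easy to print for a dry run before anything is deleted.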

@guysmoilov
Member Author

An interesting, potentially relevant project: https://rtyley.github.io/bfg-repo-cleaner/


2 participants