atuin history gc? #2015

Open
AlJohri opened this issue May 13, 2024 · 10 comments

Comments

@AlJohri

AlJohri commented May 13, 2024

I seem to have accidentally re-run atuin import zsh after already using atuin for a period of time. I installed atuin on a new computer and kept syncing and was getting 0 results. I wasn't sure what was going on until I realized that the new installation was using the sync v2. I have since upgraded all installations to use sync v2, however as part of trying to figure out what was going on, I had re-run the atuin import zsh command and then run atuin sync. This appears to have doubled the size of my history from 150k records to 300k records.

Is there any way I can "garbage collect" down to a list of unique entries and clean up my atuin history?

@AlJohri
Author

AlJohri commented May 20, 2024

It seems like it's taking almost 0.5 to 1 second when I press CTRL-R (not sure how to measure it exactly). Would love to figure out some way to trim some of this history down or speed up the TUI so CTRL-R is instantaneous.

As a proxy for timing, it takes about 5 seconds to run atuin history list:

❯ atuin history list | wc -l
366974

❯ time atuin history list &> /dev/null
atuin history list &> /dev/null  4.87s user 2.81s system 172% cpu 4.448 total

@AlJohri
Author

AlJohri commented Jun 22, 2024

Looks like my slowdown is likely related to getting the unique set of commands: See #475

@zuzzurro

I have the same issue. I am actually running "atuin import zsh" multiple times in order to incrementally save my zsh history before I actually really start using atuin and commands are added multiple times. Some sort of deduplication based on the time and command would be extremely appreciated.

@zuzzurro

Any feedback on this ticket at all?

@ellie
Member

ellie commented Jul 17, 2024

@AlJohri in response to your issue, I'd suggest trying to craft an atuin search --delete that specifically targets your newly-imported data. You should be able to delete all records after a certain date.

If that is not possible, then I'm afraid the only real way forwards would be to delete and start from scratch :/

Due to how the importers work at the moment, it's not that straightforward to correctly identify duplicate imports vs commands that have been run a few times.

Otherwise, if you're able to profile the startup/searching with your data, I'd really appreciate it! You may find that the prefix search mode performs much faster.

> I am actually running "atuin import zsh" multiple times in order to incrementally save my zsh history before I actually really start using atuin and commands are added multiple times.

@zuzzurro I'm afraid this use case isn't supported. Importing is for migrating old data to Atuin, and not for keeping Atuin up-to-date. Given that the original data source doesn't always have enough information to effectively de-dupe, it's not that straightforward.

Your request isn't quite the same as the OP, so I'd appreciate you opening a new issue if there's anything you wish to have help with.

@zuzzurro

I asked before trying to solve the issue myself....
But let me ask you one question. In the zsh import scenario, spotting duplicate records should not be that hard, since history items come with a time? So if the cmd is the same and the time is the same, why import it twice? I may also be used to the fact that in zsh I have "histignorealldups", which doesn't exist in atuin if I'm not wrong...
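
To make the idea concrete, here is a minimal Python sketch of deduplicating on the (time, command) pair. It works on plain in-memory tuples, not on Atuin's database or importer; the `Record` shape, the function name `dedupe_by_time_and_command`, and the sample data are all hypothetical and only illustrate the rule "same time and same command means same record".

```python
from typing import Iterable, List, Tuple

# Hypothetical record shape: (unix timestamp in seconds, command text).
Record = Tuple[int, str]


def dedupe_by_time_and_command(records: Iterable[Record]) -> List[Record]:
    """Keep the first occurrence of each (timestamp, command) pair, in order."""
    seen = set()
    unique = []
    for record in records:
        if record not in seen:
            seen.add(record)
            unique.append(record)
    return unique


# Simulate importing the same shell history twice.
first_import = [
    (1715600000, "ls -la"),
    (1715600005, "git status"),
    (1715600100, "ls -la"),  # same command, later time: a genuine re-run
]
double_import = first_import + first_import

deduped = dedupe_by_time_and_command(double_import)
print(len(double_import), "->", len(deduped))
```

Note that the genuine re-run of `ls -la` survives because its time differs. The rule only works when the source history actually stores a time for every entry, and with second-resolution times two real runs of the same command within one second would be collapsed into one.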

@ellie
Member

ellie commented Jul 17, 2024

@zuzzurro #2290

@zuzzurro

Thanks. As I said, my personal perspective is probably skewed by the fact that in zsh I don't care about saving multiple copies of the same command... so no hard feelings at all if nobody else cares about it.

@AlJohri
Author

AlJohri commented Jul 18, 2024

> If that is not possible, then I'm afraid the only real way forwards would be to delete and start from scratch :/
>
> Due to how the importers work at the moment, it's not that straightforward to correctly identify duplicate imports vs commands that have been run a few times.

What if we ignored the distinction between commands that have been run a few times and duplicate imports and deduplicated it anyway? That way I still have at least 1 copy of every command I have ever run accessible through prefix or exact match search. And I can build up the more frequent commands again over time into the history.

Is this possible to do? Are there any implications with doing this that I am missing?
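
As a rough illustration of what I mean, here is a Python sketch that keeps exactly one copy of every distinct command, choosing the most recent run. It operates on hypothetical (timestamp, command) tuples rather than on Atuin's real storage, and the name `keep_latest_per_command` is made up for this example.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical record shape: (unix timestamp in seconds, command text).
Record = Tuple[int, str]


def keep_latest_per_command(records: Iterable[Record]) -> List[Record]:
    """Collapse history to one record per distinct command (the newest run)."""
    latest: Dict[str, int] = {}
    for timestamp, command in records:
        if command not in latest or timestamp > latest[command]:
            latest[command] = timestamp
    # Return the survivors in chronological order.
    return sorted((timestamp, command) for command, timestamp in latest.items())


history = [
    (100, "ls -la"),
    (105, "git status"),
    (200, "ls -la"),
    (105, "git status"),  # duplicate created by a second import
    (300, "cargo build"),
]

collapsed = keep_latest_per_command(history)
print(collapsed)
```

The obvious cost is that every older run of a command is dropped, so anything that depends on how often or when a command was run (usage counts, per-run metadata attached to the removed records) starts over from a single entry.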

@AlJohri
Author

AlJohri commented Jul 18, 2024

> Otherwise, if you're able to profile the startup/searching with your data, I'd really appreciate it! You may find that the prefix search mode performs much faster.

Is there a guide on profiling atuin I can follow? Or some high level steps.
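
In the meantime, the closest thing I have is repeated wall-clock timing, which is not real profiling but at least gives comparable numbers between runs. A minimal Python sketch, assuming nothing about Atuin's internals: the helper name `time_command` is hypothetical, and the demo times the Python interpreter itself so it runs on any machine. On a machine with atuin installed, the argument list could be swapped for `["atuin", "history", "list"]`, the command timed earlier in this thread.

```python
import statistics
import subprocess
import sys
import time
from typing import List, Sequence


def time_command(argv: Sequence[str], runs: int = 5) -> List[float]:
    """Run argv `runs` times and return the wall-clock duration of each run."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        # Discard output so terminal rendering does not dominate the timing.
        subprocess.run(
            argv,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,
        )
        timings.append(time.perf_counter() - start)
    return timings


# Stand-in command so the sketch is runnable without atuin installed.
samples = time_command([sys.executable, "-c", "pass"], runs=3)
print(f"median {statistics.median(samples):.3f}s over {len(samples)} runs")
```

This measures total process time only. It cannot say where the time goes inside the program, so it complements a proper profiler rather than replacing one.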
