Handle large history file properly by reading lines in the streaming way #3810

Merged
3 commits merged into PowerShell:master on Oct 3, 2023

daxian-dbw (Member) commented Sep 21, 2023

PR Summary

Fix #3771, Fix #1360, Fix #537

Handle large history file properly by reading lines in the streaming way.

PR Checklist

  • PR has a meaningful title
    • Use the present tense and imperative mood when describing your changes
  • Summarized changes
  • Make sure you've added one or more new tests
  • Make sure you've tested these changes in terminals that PowerShell is commonly used in (i.e. conhost.exe, Windows Terminal, Visual Studio Code Integrated Terminal, etc.)
  • User-facing changes
    • Not Applicable
    • OR
    • Documentation needed at PowerShell-Docs
      • Doc Issue filed:

PSReadLine/History.cs
// After seeking, the current position may point at the middle of a history record, or even at a
// byte within a UTF-8 character (history file is saved with UTF-8 encoding). So, let's ignore the
// first line read from that position.
sr.ReadLine();
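
The seek-then-discard step in the snippet above can be illustrated with a minimal standalone sketch. This is not the PR's actual code: `HistoryTailReader`, `ReadTailLines`, and the `maxBytes` parameter are hypothetical names chosen for illustration, and the file is assumed to be UTF-8 encoded, as discussed in this thread.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class HistoryTailReader
{
    // Returns the complete lines found in roughly the last `maxBytes` bytes of a
    // UTF-8 text file. `maxBytes` is a hypothetical knob, not a PSReadLine option.
    public static List<string> ReadTailLines(string path, long maxBytes)
    {
        var lines = new List<string>();
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
        {
            bool skippedAhead = fs.Length > maxBytes;
            if (skippedAhead)
            {
                // Seek before creating the reader so the reader has nothing buffered yet.
                fs.Seek(-maxBytes, SeekOrigin.End);
            }

            using (var sr = new StreamReader(fs, new UTF8Encoding(false), false))
            {
                if (skippedAhead)
                {
                    // The position may be mid-line or mid-character, so drop the first line.
                    sr.ReadLine();
                }

                // Stream the remaining lines one at a time instead of loading the whole file.
                for (var line = sr.ReadLine(); line != null; line = sr.ReadLine())
                {
                    lines.Add(line);
                }
            }
        }
        return lines;
    }

    public static void Main()
    {
        string path = Path.GetTempFileName();
        try
        {
            File.WriteAllText(path, "line0\nline1\nline2\nline3\n");
            // Only the last 8 bytes are considered; the partial "2" line is discarded.
            foreach (string line in ReadTailLines(path, 8))
            {
                Console.WriteLine(line); // prints "line3"
            }
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```

One consequence of this approach is that a complete line is also discarded when the seek happens to land exactly on a line boundary; the sketch accepts that loss in exchange for never emitting a truncated line.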
daxian-dbw (Member, Author):

@iSazonov Regarding your comment "I could save the file in Utf16": PSReadLine always saves the history file in UTF-8 encoding. Both File.CreateText and File.AppendText write UTF-8 encoded text to the file, so if a user saves the file as UTF-16 ("Unicode"), it may become unreadable once PSReadLine writes something to it. So, I will assume the file is UTF-8 encoded.

Can you please review again? Thanks!

iSazonov:

If you see no problem with the history file possibly being corrupted in very rare cases, this code looks good.


For reflection only, I have the following questions and doubts.

  1. The user can actually save the file in a different encoding.
  2. Only .NET Core started using UTF-8 by default at some point. Windows PowerShell can write in UTF-16 by default.
  3. Although StreamReader detects the encoding automatically, the current code seems to ignore this because it skips the beginning of the file.
  4. Why do we keep the beginning of a large file when we no longer use it and the user doesn't even know about it? We could probably find a way to trim it, although that is not easy to design.

daxian-dbw (Member, Author):

> The user can actually save the file in a different encoding.

If the user manually changes the encoding to, say, UTF-16 LE, the history file will be corrupted once PSReadLine writes updates to the file, because PSReadLine always writes text in UTF-8. I verified this using both Windows PowerShell and PowerShell 7+.

See the code below (both File.CreateText and File.AppendText write UTF-8 encoded text to the file, on both .NET and .NET Framework):

using (var file = overwritten ? File.CreateText(Options.HistorySavePath) : File.AppendText(Options.HistorySavePath))
{
    for (var i = start; i <= end; i++)
    {
        HistoryItem item = _history[i];
        item._saved = true;

        // Actually, skip writing sensitive items to file.
        if (item._sensitive) { continue; }

        var line = item.CommandLine.Replace("\n", "`\n");
        file.WriteLine(line);
    }
}
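
The `Replace("\n", "`\n")` call in the loop above means a multi-line command is stored as several file lines, each ending in a backtick except the last. A minimal sketch of that round trip follows. `Encode` mirrors the escaping shown above, while `Decode` is an assumed inverse written for illustration (`HistoryLineCodec` and both method names are hypothetical, not PSReadLine's API).

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class HistoryLineCodec
{
    // Same escaping as the write loop: every newline inside a command is
    // preceded by a backtick so the record can span several file lines.
    public static string Encode(string commandLine)
    {
        return commandLine.Replace("\n", "`\n");
    }

    // Assumed inverse: a file line ending in a backtick continues on the next line.
    // A trailing, unfinished record (last line ends in a backtick) is dropped.
    public static List<string> Decode(IEnumerable<string> fileLines)
    {
        var records = new List<string>();
        var sb = new StringBuilder();
        foreach (string line in fileLines)
        {
            if (line.EndsWith("`", StringComparison.Ordinal))
            {
                // Strip the continuation backtick and restore the newline.
                sb.Append(line, 0, line.Length - 1).Append('\n');
            }
            else
            {
                sb.Append(line);
                records.Add(sb.ToString());
                sb.Clear();
            }
        }
        return records;
    }

    public static void Main()
    {
        string encoded = Encode("Get-Process |\nSort-Object CPU");
        List<string> records = Decode(encoded.Split('\n'));
        Console.WriteLine(records.Count);                                    // prints 1
        Console.WriteLine(records[0] == "Get-Process |\nSort-Object CPU");   // prints True
    }
}
```

This is also why a seek into the middle of the file can land inside a record and not just inside a line: the continuation lines of a multi-line command belong to the record that started earlier.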

> Only .NET Core started using UTF-8 by default at some point. Windows PowerShell can write in UTF-16 by default.

Again, File.CreateText and File.AppendText write UTF-8 encoded text to the file, on both .NET and .NET Framework.
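
The corruption scenario described here can be reproduced with a short sketch. It is an illustration only: `HistoryEncodingDemo` and its method names are hypothetical, and a temporary file stands in for the real history file.

```csharp
using System;
using System.IO;
using System.Text;

public static class HistoryEncodingDemo
{
    // Writes with File.CreateText, then File.AppendText, and returns the raw bytes.
    public static byte[] CreateThenAppend(string path)
    {
        using (StreamWriter w = File.CreateText(path)) { w.Write("\u00E9"); }  // e with acute accent
        using (StreamWriter w = File.AppendText(path)) { w.Write("\u00FC"); }  // u with diaeresis
        return File.ReadAllBytes(path);
    }

    // Simulates a user re-saving the file as UTF-16 LE, followed by a UTF-8 append.
    public static byte[] AppendToUtf16File(string path)
    {
        File.WriteAllText(path, "ab", Encoding.Unicode);
        using (StreamWriter w = File.AppendText(path)) { w.Write("cd"); }
        return File.ReadAllBytes(path);
    }

    public static void Main()
    {
        string path = Path.GetTempFileName();
        try
        {
            // Two-byte UTF-8 sequences and no byte order mark.
            Console.WriteLine(BitConverter.ToString(CreateThenAppend(path)));   // C3-A9-C3-BC

            // UTF-16 LE BOM and text, followed by single-byte UTF-8 characters.
            Console.WriteLine(BitConverter.ToString(AppendToUtf16File(path)));  // FF-FE-61-00-62-00-63-64
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```

In the second case the file starts as UTF-16 LE but ends with UTF-8 bytes, so no single encoding decodes it correctly.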

> Although StreamReader detects the encoding automatically, the current code seems to ignore this as it skips the beginning of the file.

I tried it out, and it turns out that StreamReader detects encoding when creating the instance.
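
The BOM-based detection under discussion can be observed with a small sketch. It reads from the start of a file, so it says nothing about the behavior after a seek. `BomDetectionDemo` and `ReadFirstLine` are hypothetical names. The sketch queries `CurrentEncoding` only after a read, because the .NET documentation notes that its value can differ after the first read call.

```csharp
using System;
using System.IO;
using System.Text;

public static class BomDetectionDemo
{
    // Reads the first line with a default StreamReader and reports the code page
    // of the encoding the reader ended up using.
    public static string ReadFirstLine(string path, out int detectedCodePage)
    {
        // StreamReader(string) starts from UTF-8 and detects byte order marks by default.
        using (var sr = new StreamReader(path))
        {
            string line = sr.ReadLine() ?? string.Empty;
            detectedCodePage = sr.CurrentEncoding.CodePage;
            return line;
        }
    }

    public static void Main()
    {
        string path = Path.GetTempFileName();
        try
        {
            // Encoding.Unicode is UTF-16 LE; WriteAllText emits its byte order mark.
            File.WriteAllText(path, "Get-ChildItem\r\n", Encoding.Unicode);

            int codePage;
            string line = ReadFirstLine(path, out codePage);
            Console.WriteLine(line);      // prints "Get-ChildItem"
            Console.WriteLine(codePage);  // prints 1200 (UTF-16 LE)
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```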

> Why do we store the beginning of a large file when now we don't use it anyway and the user doesn't even know about it? Probably we can find a way to trim it, although it's not easy to think of.

I think you missed something in the code. If the user decides to set max history count to more than 20,000, we will read all content from the file. So, the history is still accessible to the user as long as they want.

daxian-dbw (Member, Author) commented:

@iSazonov Can you please approve the PR if all your concerns are resolved? I need an approval to merge the PR.

iSazonov commented Sep 28, 2023

> @iSazonov Can you please approve the PR if all your concerns are resolved? I need an approval to merge the PR.

Approved with one note, but I don't have full permissions :-)

@daxian-dbw daxian-dbw merged commit ac69010 into PowerShell:master Oct 3, 2023
2 checks passed
@daxian-dbw daxian-dbw deleted the history branch October 3, 2023 01:33