Handle large history file properly by reading lines in the streaming way #3810

daxian-dbw · 2023-09-21T22:07:54Z

PR Summary

Fix #3771, Fix #1360, Fix #537

Handle large history file properly by reading lines in the streaming way.
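"Reading lines in the streaming way" means the whole file is never loaded into memory at once, as `File.ReadAllLines` would do. Below is a minimal sketch of that idea, assuming a UTF-8 history file with one entry per line. It reads one line at a time and keeps only the newest `maxLines` lines in a bounded queue. `ReadLastLines` is a hypothetical helper for illustration, not PSReadLine's API.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical helper (not PSReadLine's API): stream the file one line at a
// time and keep only the newest `maxLines` lines, so memory use is bounded by
// maxLines rather than by the file size.
static List<string> ReadLastLines(string path, int maxLines)
{
    if (maxLines <= 0)
    {
        return new List<string>();
    }

    var window = new Queue<string>();
    using var reader = new StreamReader(path, Encoding.UTF8);
    string? line;
    while ((line = reader.ReadLine()) != null)
    {
        if (window.Count == maxLines)
        {
            window.Dequeue(); // drop the oldest line to make room
        }
        window.Enqueue(line);
    }
    return new List<string>(window); // oldest first, newest last
}

// Usage: write 100,000 short commands, then keep only the last 3.
string path = Path.GetTempFileName();
using (StreamWriter writer = File.CreateText(path))
{
    for (int i = 0; i < 100_000; i++)
    {
        writer.WriteLine($"Get-Item item{i}");
    }
}

List<string> tail = ReadLastLines(path, 3);
Console.WriteLine(string.Join(Environment.NewLine, tail));
File.Delete(path);
```

This still scans the whole file once. The seek-based variant discussed in the review avoids even that scan by starting near the end of the file.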

PR Checklist

PR has a meaningful title
- Use the present tense and imperative mood when describing your changes
Summarized changes
Make sure you've added one or more new tests
Make sure you've tested these changes in terminals that PowerShell is commonly used in (e.g. conhost.exe, Windows Terminal, Visual Studio Code Integrated Terminal, etc.)
User-facing changes
- Not Applicable
- OR
- Documentation needed at PowerShell-Docs
  - Doc Issue filed:



daxian-dbw · 2023-09-26T20:01:41Z

PSReadLine/History.cs

+                    // After seeking, the current position may point at the middle of a history record, or even at a
+                    // byte within a UTF-8 character (history file is saved with UTF-8 encoding). So, let's ignore the
+                    // first line read from that position.
+                    sr.ReadLine();
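The skip-the-first-line trick can be sketched in isolation, assuming a UTF-8 file and a caller-chosen tail size. The stream is seeked before the `StreamReader` has buffered anything, so reading starts at the new position. The first `ReadLine()` result is discarded because it may be a fragment of a record or contain a split multi-byte character (the non-throwing UTF-8 decoder turns such bytes into U+FFFD). `ReadTailLines` and `tailBytes` are hypothetical names, and this is not presented as the PR's exact code.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical helper: read only (roughly) the last `tailBytes` bytes of a
// UTF-8 text file and return the complete lines found there.
static List<string> ReadTailLines(string path, long tailBytes)
{
    var lines = new List<string>();
    using var fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite);
    using var sr = new StreamReader(fs, new UTF8Encoding(encoderShouldEmitUTF8Identifier: false));

    if (fs.Length > tailBytes)
    {
        // Seek before the reader has buffered anything, so reading starts here.
        fs.Seek(-tailBytes, SeekOrigin.End);

        // The position may fall inside a line, or even inside a multi-byte
        // UTF-8 character, so throw away the first (possibly partial) line.
        // If it happens to be a complete line, that one line is simply lost.
        sr.ReadLine();
    }

    string? line;
    while ((line = sr.ReadLine()) != null)
    {
        lines.Add(line);
    }
    return lines;
}

// Usage: 1,000 commands, then read only the last 50 bytes.
string path = Path.GetTempFileName();
using (StreamWriter writer = File.CreateText(path))
{
    for (int i = 0; i < 1000; i++)
    {
        writer.WriteLine($"cmd{i}");
    }
}

List<string> tail = ReadTailLines(path, 50);
Console.WriteLine(string.Join(", ", tail));
File.Delete(path);
```

Losing at most one boundary line is the trade-off for never touching the start of a large file.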


@iSazonov Regarding your comment "I could save the file in Utf16": PSReadLine always saves the history file in UTF-8. Both File.CreateText and File.AppendText write UTF-8 text to the file, so if a user re-saves the file as Unicode (UTF-16), it may become unreadable once PSReadLine appends to it. So I will assume the file is UTF-8 encoded.

Can you please review again? Thanks!
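The claim that both APIs emit UTF-8 can be checked with a small sketch: write one non-ASCII character (U+00E9) through each API and dump the raw bytes. UTF-8 without a byte order mark yields exactly `C3 A9`. The helper `WriteAndReadBytes` and the temp-file naming are illustrative assumptions.

```csharp
using System;
using System.IO;

// Hypothetical helper: write "é" with the given opener and return the raw bytes.
static byte[] WriteAndReadBytes(Func<string, StreamWriter> open)
{
    string path = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString("N") + ".txt");
    try
    {
        using (StreamWriter writer = open(path))
        {
            writer.Write("\u00E9"); // encoded as C3 A9 in UTF-8
        }
        return File.ReadAllBytes(path);
    }
    finally
    {
        File.Delete(path);
    }
}

byte[] created = WriteAndReadBytes(File.CreateText);
byte[] appended = WriteAndReadBytes(File.AppendText);

Console.WriteLine(BitConverter.ToString(created));   // C3-A9
Console.WriteLine(BitConverter.ToString(appended));  // C3-A9
```

No `EF BB BF` prefix appears, because the default `StreamWriter` encoding used by both APIs is UTF-8 without a BOM.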

iSazonov · 2023-09-27

If you see no problem with the history file possibly being corrupted in very rare cases, this code looks good.

Just for reflection, I have the following questions and concerns.

The user can actually save the file in a different encoding.

Only .NET Core switched to UTF-8 by default at some point; Windows PowerShell can write UTF-16 by default.

Although StreamReader detects the encoding automatically, the current code seems to ignore this as it skips the beginning of the file.

Why do we keep the beginning of a large file when we no longer use it and the user doesn't even know it's there? Perhaps we can find a way to trim it, although it's not easy to think of one.

daxian-dbw · 2023-09-27

> The user can actually save the file in a different encoding.

If the user manually changes the encoding (to UTF-16 LE, for example), the history file will be corrupted once PSReadLine writes updates to it, because PSReadLine always writes UTF-8 text. I verified this in both Windows PowerShell and PowerShell 7+.
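A minimal reproduction of that corruption, assuming the user re-saved the file as UTF-16 LE with a BOM and PSReadLine then appended one entry via `File.AppendText`. On reading, the BOM makes the whole file decode as UTF-16, so the appended UTF-8 bytes are paired into unrelated characters. The file contents are made up for illustration.

```csharp
using System;
using System.IO;
using System.Text;

string path = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString("N") + ".txt");

// Simulate a user re-saving the history file as UTF-16 LE
// (Encoding.Unicode writes the FF FE byte order mark first).
File.WriteAllText(path, "Get-Date\n", Encoding.Unicode);

// Simulate an append in the style of the save loop: File.AppendText writes UTF-8.
using (StreamWriter writer = File.AppendText(path))
{
    writer.Write("Get-Process\n");
}

// The reader sees the UTF-16 BOM and decodes everything as UTF-16, so the
// appended UTF-8 bytes come back as garbage characters.
string content = File.ReadAllText(path);
File.Delete(path);

Console.WriteLine(content.StartsWith("Get-Date"));   // True
Console.WriteLine(content.Contains("Get-Process"));  // False
```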

See the code below (both File.CreateText and File.AppendText write UTF-8 encoded text to the file, on both .NET and .NET Framework):

PSReadLine/PSReadLine/History.cs

Lines 366 to 379 in d045b50

    using (var file = overwritten ? File.CreateText(Options.HistorySavePath) : File.AppendText(Options.HistorySavePath))
    {
        for (var i = start; i <= end; i++)
        {
            HistoryItem item = _history[i];
            item._saved = true;

            // Actually, skip writing sensitive items to file.
            if (item._sensitive) { continue; }

            var line = item.CommandLine.Replace("\n", "`\n");
            file.WriteLine(line);
        }
    }
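The save loop escapes each embedded newline as a backtick followed by `\n`, so one history record can span several physical lines. A reader therefore has to rejoin those lines. The sketch below assumes a simple rule: a physical line ending in a backtick continues the current record. `Encode` mirrors the `Replace` call above; `DecodeRecords` is a hypothetical reader, not PSReadLine's actual implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Same escaping as the save loop: each embedded '\n' becomes "`\n".
static string Encode(string commandLine) => commandLine.Replace("\n", "`\n");

// Hypothetical reader: a physical line ending in a backtick continues the
// current record; drop the backtick and restore the '\n'.
static List<string> DecodeRecords(IEnumerable<string> physicalLines)
{
    var records = new List<string>();
    var sb = new StringBuilder();
    foreach (string line in physicalLines)
    {
        if (line.EndsWith("`", StringComparison.Ordinal))
        {
            sb.Append(line, 0, line.Length - 1).Append('\n');
        }
        else
        {
            sb.Append(line);
            records.Add(sb.ToString());
            sb.Clear();
        }
    }
    if (sb.Length > 0)
    {
        records.Add(sb.ToString()); // file ended mid-record (e.g. truncated)
    }
    return records;
}

string multiLine = "foreach ($i in 1..3) {\n    $i\n}";
// Two records as they would appear in the file, one WriteLine each.
string stored = Encode(multiLine) + "\n" + Encode("Get-Date") + "\n";
// Split into physical lines the way ReadLine would.
string[] physical = stored.TrimEnd('\n').Split('\n');
List<string> records = DecodeRecords(physical);
Console.WriteLine(records.Count); // 2
```

This continuation rule also explains why a tail read can start mid-record: the discarded first line may be the middle of a multi-line command.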

> Only .NET Core switched to UTF-8 by default at some point; Windows PowerShell can write UTF-16 by default.

Again, File.CreateText and File.AppendText write UTF-8 encoded text to the file, on both .NET and .NET Framework.

> Although StreamReader detects the encoding automatically, the current code seems to ignore this as it skips the beginning of the file.

I tried it out, and it turns out that StreamReader detects encoding when creating the instance.

> Why do we keep the beginning of a large file when we no longer use it and the user doesn't even know it's there? Perhaps we can find a way to trim it, although it's not easy to think of one.

I think you missed something in the code. If the user sets the maximum history count to more than 20,000, we read the entire file, so the full history remains accessible to users who want it.
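The decision described above can be sketched as a small policy function. The 20,000 threshold comes from this comment; the 1 MiB tail size is an assumed value chosen only for illustration, and `BytesToRead` is a hypothetical name.

```csharp
using System;

// Hypothetical policy: tail-read only when the configured history count is at
// most 20,000 (the threshold mentioned above); otherwise read the whole file.
static long BytesToRead(int maxHistoryCount, long fileLength)
{
    const int Threshold = 20_000;
    const long TailBytes = 1L << 20; // assumed 1 MiB tail, not taken from the PR

    if (maxHistoryCount > Threshold || fileLength <= TailBytes)
    {
        return fileLength; // read everything
    }
    return TailBytes;
}

Console.WriteLine(BytesToRead(4_096, 50_000_000));   // 1048576
Console.WriteLine(BytesToRead(50_000, 50_000_000));  // 50000000
```

Keeping the full-file path for large history counts means the optimization never silently hides entries a user has asked to retain.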

PSReadLine/History.cs

daxian-dbw · 2023-09-27T23:07:30Z

@iSazonov Can you please approve the PR if all your concerns are resolved? I need an approval to merge the PR.

iSazonov · 2023-09-28T04:32:37Z

> @iSazonov Can you please approve the PR if all your concerns are resolved? I need an approval to merge the PR.

Approved with one note, but I don't have full permissions :-)

PSReadLine/History.cs

Handle large history file properly by reading lines in the streaming way

238ff97

daxian-dbw requested a review from SteveL-MSFT September 21, 2023 22:08

iSazonov reviewed Sep 22, 2023

PSReadLine/History.cs (outdated, resolved)

daxian-dbw added 2 commits September 26, 2023 12:44

Read lines instead of bytes

59d5587

Fix for .net462

d293db1

daxian-dbw commented Sep 26, 2023

iSazonov reviewed Sep 27, 2023

PSReadLine/History.cs (resolved)

iSazonov approved these changes Sep 28, 2023

iSazonov reviewed Sep 28, 2023

PSReadLine/History.cs (resolved)

daxian-dbw merged commit ac69010 into PowerShell:master Oct 3, 2023

daxian-dbw deleted the history branch October 3, 2023 01:33
