Handle large history file properly by reading lines in the streaming way #3810

Merged
merged 3 commits into from
Oct 3, 2023
51 changes: 50 additions & 1 deletion PSReadLine/History.cs
@@ -457,12 +457,61 @@ private void ReadHistoryFile()
     {
         WithHistoryFileMutexDo(1000, () =>
         {
-            var historyLines = File.ReadAllLines(Options.HistorySavePath);
+            var historyLines = ReadHistoryLinesImpl(Options.HistorySavePath, Options.MaximumHistoryCount);
             UpdateHistoryFromFile(historyLines, fromDifferentSession: false, fromInitialRead: true);
             var fileInfo = new FileInfo(Options.HistorySavePath);
             _historyFileLastSavedSize = fileInfo.Length;
         });
     }

+    static IEnumerable<string> ReadHistoryLinesImpl(string path, int historyCount)
+    {
+        const long offset_1mb = 1048576;
+        const long offset_05mb = 524288;
+
+        // 1mb content contains more than 34,000 history lines for a typical usage, which should be
+        // more than enough to cover 20,000 history records (a history record could be a multi-line
+        // command). Similarly, 0.5mb content should be enough to cover 10,000 history records.
+        // We optimize the file reading when the history count falls in those ranges. If the history
+        // count is even larger, which should be very rare, we just read all lines.
+        long offset = historyCount switch
+        {
+            <= 10000 => offset_05mb,
+            <= 20000 => offset_1mb,
+            _ => 0,
+        };
+
+        using var fs = new FileStream(path, FileMode.Open);
+        using var sr = new StreamReader(fs);
+
+        if (offset > 0 && fs.Length > offset)
+        {
+            // When the file size is larger than the offset, we only read that amount of content from the end.
+            fs.Seek(-offset, SeekOrigin.End);
+
+            // After seeking, the current position may point at the middle of a history record, or even at a
+            // byte within a UTF-8 character (history file is saved with UTF-8 encoding). So, let's ignore the
+            // first line read from that position.
+            sr.ReadLine();
Member Author

@iSazonov Regarding your comment "I could save the file in Utf16": PSReadLine always saves the history file in UTF-8. Both File.CreateText and File.AppendText write UTF-8 encoded text, so if a user saves the file in UTF-16 ("Unicode"), it may become unreadable once PSReadLine writes to it. So I will assume the file is UTF-8 encoded.

Can you please review again? Thanks!

If you are fine with the history file possibly being corrupted in very rare cases, this code looks good.


For reflection only, I have the following questions and doubts.

  1. The user can actually save the file in a different encoding.
  2. Only .NET Core at some point started using UTF-8 by default. Windows PowerShell can write in UTF-16 by default.
  3. Although StreamReader detects the encoding automatically, the current code seems to ignore this as it skips the beginning of the file.
  4. Why do we store the beginning of a large file when now we don't use it anyway and the user doesn't even know about it? Probably we can find a way to trim it, although it's not easy to think of.

Member Author

> The user can actually save the file in a different encoding.

If the user manually changes the encoding, say to UTF-16 LE, the history file will be corrupted once PSReadLine writes updates to it, because PSReadLine always writes text in UTF-8. I verified this using both Windows PowerShell and PowerShell 7+.

See the code below (both File.CreateText and File.AppendText write UTF-8 encoded text to the file, on both .NET and .NET Framework):

using (var file = overwritten ? File.CreateText(Options.HistorySavePath) : File.AppendText(Options.HistorySavePath))
{
    for (var i = start; i <= end; i++)
    {
        HistoryItem item = _history[i];
        item._saved = true;
        // Actually, skip writing sensitive items to file.
        if (item._sensitive) { continue; }
        var line = item.CommandLine.Replace("\n", "`\n");
        file.WriteLine(line);
    }
}

> Only .NET Core at some point started using UTF-8 by default. Windows PowerShell can write in UTF-16 by default.

Again, File.CreateText and File.AppendText write UTF-8 encoded text to the file, on both .NET and .NET Framework.
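
A minimal sketch of how this claim could be checked in isolation. The class and method names (`HistoryEncodingCheck`, `WritesUtf8`) and the sample commands are hypothetical and not part of PSReadLine; the sketch only compares the bytes produced by `File.CreateText` and `File.AppendText` against the BOM-less UTF-8 encoding of the same text.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;

static class HistoryEncodingCheck
{
    // Hypothetical helper: writes one line with File.CreateText and one with
    // File.AppendText, then reports whether the raw bytes on disk equal the
    // UTF-8 (no BOM) encoding of the same text.
    public static bool WritesUtf8(string path)
    {
        using (var writer = File.CreateText(path)) { writer.WriteLine("echo 'héllo'"); }
        using (var writer = File.AppendText(path)) { writer.WriteLine("echo '日本語'"); }

        // WriteLine terminates each line with Environment.NewLine by default.
        string expectedText =
            "echo 'héllo'" + Environment.NewLine +
            "echo '日本語'" + Environment.NewLine;
        byte[] expected = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false).GetBytes(expectedText);

        return File.ReadAllBytes(path).SequenceEqual(expected);
    }
}
```

Calling `HistoryEncodingCheck.WritesUtf8(Path.GetTempFileName())` should return `true` if both methods write BOM-less UTF-8, as stated above.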

> Although StreamReader detects the encoding automatically, the current code seems to ignore this as it skips the beginning of the file.

I tried it out, and it turns out that StreamReader detects encoding when creating the instance.

> Why do we store the beginning of a large file when now we don't use it anyway and the user doesn't even know about it? Probably we can find a way to trim it, although it's not easy to think of.

I think you missed something in the code. If the user decides to set max history count to more than 20,000, we will read all content from the file. So, the history is still accessible to the user as long as they want.
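
To make the thresholds concrete, here is a small sketch of the count-to-offset mapping taken from the diff, pulled out into a standalone function. The names `HistoryOffsets` and `TailBytesFor` are hypothetical; a return value of 0 stands for "read the whole file".

```csharp
static class HistoryOffsets
{
    public const long Offset1Mb = 1048576;
    public const long Offset05Mb = 524288;

    // Same switch as in the diff: how many bytes to read from the end of the
    // history file for a given maximum history count. 0 means no seeking,
    // i.e. every line in the file is read.
    public static long TailBytesFor(int historyCount) => historyCount switch
    {
        <= 10000 => Offset05Mb,
        <= 20000 => Offset1Mb,
        _ => 0,
    };
}
```

With this mapping, a count of 10,000 reads at most the last 0.5 MB, a count of 20,000 reads at most the last 1 MB, and a count of 20,001 or more reads the file from the beginning.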


+            string line;
+            while ((line = sr.ReadLine()) is not null)
+            {
+                if (!line.EndsWith("`", StringComparison.Ordinal))
+                {
+                    // A complete history record is guaranteed to start from the next line.
+                    break;
+                }
+            }
+        }
+
+        // Read lines in the streaming way, so it won't consume too much memory even if we have to
+        // read all lines from a large history file.
+        while (!sr.EndOfStream)
+        {
+            yield return sr.ReadLine();
+        }
+    }
 }

 void UpdateHistoryFromFile(IEnumerable<string> historyLines, bool fromDifferentSession, bool fromInitialRead)
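
The skipping logic in the diff can be exercised on a tiny file. The sketch below is an adaptation, not the merged code: `TailReadSketch.ReadTail` is a hypothetical name, and the tail size is passed in as a parameter (an assumption made so that a file of a few bytes is enough to trigger the seek). It discards the first, possibly partial, line after the seek, then discards lines up to and including the first one that does not end with a backtick, so that reading resumes at the start of a complete record.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class TailReadSketch
{
    // Hypothetical helper mirroring the diff's logic with a caller-supplied tail size.
    public static IEnumerable<string> ReadTail(string path, long tailBytes)
    {
        using var fs = new FileStream(path, FileMode.Open, FileAccess.Read);
        using var sr = new StreamReader(fs);

        if (tailBytes > 0 && fs.Length > tailBytes)
        {
            // Jump to the last 'tailBytes' bytes of the file.
            fs.Seek(-tailBytes, SeekOrigin.End);

            // The position may be in the middle of a line (or of a UTF-8 character),
            // so the first line read from here is discarded.
            sr.ReadLine();

            // A line ending with a backtick continues a multi-line record. Consume lines
            // until one does not end with a backtick; the next line starts a new record.
            string line;
            while ((line = sr.ReadLine()) is not null)
            {
                if (!line.EndsWith("`", StringComparison.Ordinal)) { break; }
            }
        }

        // Stream the remaining lines one at a time.
        string next;
        while ((next = sr.ReadLine()) is not null)
        {
            yield return next;
        }
    }
}
```

For a 37-byte file containing `cmd1`, a three-line record (`multi` followed by a backtick, `line2` followed by a backtick, `line3`), `cmd2` and `cmd3`, each terminated by `\n`, calling `ReadTail(path, 29)` seeks into the middle of the `multi` line and yields only `cmd2` and `cmd3`. Calling `ReadTail(path, 0)` yields all six lines.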