Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use a GUID when creating the temp path #4645

Open
wants to merge 6 commits into
base: master
from

Conversation

@jwood803
Copy link
Contributor

jwood803 commented Jan 11, 2020

Addresses #568.

@justinormont Not sure if the placement of it is the best, but it should still prevent the collisions.

@jwood803 jwood803 requested a review from dotnet/mlnet-core as a code owner Jan 11, 2020
@@ -256,7 +256,7 @@ protected void GetPath(out string pathEnt, out string pathTemp, string dir, stri
string root = Path.GetFullPath(DirTemp ?? @"x:\dummy");
string entityPath = Path.Combine(root, dir ?? "", name);
entityPath = Path.GetFullPath(entityPath);
string tempPath = Path.Combine(root, PathMap.Count.ToString());
string tempPath = Path.Combine(root, Guid.NewGuid().ToString());

This comment has been minimized.

Copy link
@harishsk

harishsk Jan 12, 2020

Member

Can you please help me understand the problem you are trying to fix?

This comment has been minimized.

Copy link
@justinormont

justinormont Jan 12, 2020

Member

Currently, there's a race condition in creating temporary files, causing a naming collision.

The paths are created as C:\Users\jormont\AppData\Local\Temp\2\TLC_25553D77\0, where:

  1. TLC_ is fixed
  2. 25553D77 is a seeded random in hex (bad)
  3. 0 is the count [0..N] of folders existing in the parent folder (also bad).

The prefix TLC_ (1) is fine, though should be updated to the current name, perhaps to ML_NET_. Or Microsoft_ML_, though I'd worry about creating too long of paths under Windows ~260 chars, especially as a we move towards a GUID. Note users can have long usernames, plus the names of the files placed in this path.

First issue -- The seeded random number (2) is an issue since running parallelly (or rerunning) with the same seed creates the same "random" folder name. This part is a deterministic collision.

Second issue -- The race condition then occurs when counting how many folders (3) are in the parent folder. When multiple processes/threads check the contents of the folder, they all receive the same number. The purposed fix replaces this count [0..N] with a GUID.

The need is to have unique folders for creating temporary files, otherwise they clobber each other. This is causing a crash and has the possibility of silent failures, especially on macOS/Linux where you can delete and overwrite files with currently open handles.


Currently the way I run into this, is running N parallel copies of the AutoML․NET CLI, which I do for validating changes to the ML․NET AutoML API on ~30 datasets. Some of the processes crash with the error in the linked issue, unless they are started staggered to avoid the race condition.

This comment has been minimized.

Copy link
@justinormont

justinormont Jan 12, 2020

Member

Pushing deeper into the code, the random number (2) is not seeded, but uses the system seed which is set from the current time. I expect the collision occurs due the remark in System.Random docs, "On most Windows systems, Random objects created within 15 milliseconds of one another are likely to have identical seed values." (src). Hence another race condition in (2).

Hopefully the .NET GUIDs don't have the same 15ms race condition as the RNG.

This comment has been minimized.

Copy link
@harishsk

harishsk Jan 14, 2020

Member

In the current code, it looks like we are using our own random number generation, which seems odd.
It is okay to use GUIDs. But they seem a bit of overkill.

Path.GetRandomFileName seems to be intended for this kind of scenario.
Would that work better?

This comment has been minimized.

Copy link
@justinormont

This comment has been minimized.

Copy link
@jwood803

jwood803 Jan 15, 2020

Author Contributor

@harishsk I had no idea that method existed. Thanks for letting me know! Updated to use it.

@codecov

This comment has been minimized.

Copy link

codecov bot commented Jan 12, 2020

Codecov Report

❗️ No coverage uploaded for pull request base (master@0435f6b). Click here to learn what that means.
The diff coverage is 40%.

@@            Coverage Diff            @@
##             master    #4645   +/-   ##
=========================================
  Coverage          ?   75.84%           
=========================================
  Files             ?      947           
  Lines             ?   172184           
  Branches          ?    18579           
=========================================
  Hits              ?   130595           
  Misses            ?    36416           
  Partials          ?     5173
Flag Coverage Δ
#Debug 75.84% <40%> (?)
#production 71.43% <40%> (?)
#test 90.7% <ø> (?)
Impacted Files Coverage Δ
src/Microsoft.ML.Core/Data/Repository.cs 79.72% <40%> (ø)
@@ -256,7 +256,7 @@ protected void GetPath(out string pathEnt, out string pathTemp, string dir, stri
string root = Path.GetFullPath(DirTemp ?? @"x:\dummy");
string entityPath = Path.Combine(root, dir ?? "", name);
entityPath = Path.GetFullPath(entityPath);
string tempPath = Path.Combine(root, PathMap.Count.ToString());
string tempPath = Path.Combine(root, Path.GetRandomFileName());
tempPath = Path.GetFullPath(tempPath);

string parent = Path.GetDirectoryName(entityPath);

This comment has been minimized.

Copy link
@justinormont

justinormont Jan 15, 2020

Member

The current function here, GetPath(), creates the path. It's called below in CreateEntry():

GetPath(out pathEnt, out pathTemp, dir, name, true);
if (PathMap.ContainsKey(pathEnt))
throw ExceptionContext.ExceptParam(nameof(name), "Duplicate entry: '{0}'", pathEnt);
else
PathMap.Add(pathEnt, pathTemp);

There another minor race condition in how the new path is being checked/added to the dictionary. It's a nit, and minor enough to ignore if you want.

If we want to handle in this PR, we could fix by either adding a lock around the IF/ELSE, or by moving to a ConcurrentDictionary and using its TryAdd() to decide if to throw.

(I tagged this on a line of code to allow for a threaded discussion)

This comment has been minimized.

Copy link
@jwood803

jwood803 Jan 15, 2020

Author Contributor

Went the ConcurrentDictionary route, but let me know if that's not the correct implementation of it.

@@ -333,10 +334,8 @@ public Entry CreateEntry(string dir, string name)
string pathEnt;
string pathTemp;
GetPath(out pathEnt, out pathTemp, dir, name, true);
if (PathMap.ContainsKey(pathEnt))
if (PathMap.TryAdd(pathEnt, pathTemp))

This comment has been minimized.

Copy link
@justinormont

justinormont Jan 16, 2020

Member

Assuming I'm reading the docs right, TryAdd() returns false when it doesn't add (docs).

Hence need to invert the return value.

Suggested change
if (PathMap.TryAdd(pathEnt, pathTemp))
if (!PathMap.TryAdd(pathEnt, pathTemp))

This comment has been minimized.

Copy link
@jwood803

jwood803 Jan 16, 2020

Author Contributor

Whoops! I thought I added that, but I must have missed it. Thanks for catching!

@@ -525,7 +524,7 @@ public Entry OpenEntryOrNull(string dir, string name)
// Extract to a temporary file.
Directory.CreateDirectory(Path.GetDirectoryName(pathTemp));
entry.ExtractToFile(pathTemp);
PathMap.Add(pathLower, pathTemp);
PathMap.TryAdd(pathLower, pathTemp);

This comment has been minimized.

Copy link
@justinormont

justinormont Jan 16, 2020

Member

To maintain similar to the previous behavior, we'll have to throw if the TryAdd() returns false.

The previous Dictionary.Add() throws an ArgumentException if the value being added already exists (docs).

Suggested change
PathMap.TryAdd(pathLower, pathTemp);
if (!PathMap.TryAdd(pathLower, pathTemp))
throw ExceptionContext.ExceptParam(nameof(name), "Duplicate entry: '{0}'", pathLower);

This comment has been minimized.

Copy link
@jwood803

jwood803 Jan 16, 2020

Author Contributor

Updated to throw exception.

@justinormont

This comment has been minimized.

Copy link
Member

justinormont commented Jan 16, 2020

I restarted the failed tests, hopefully they succeed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants
You can’t perform that action at this time.