Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Port missing test: TestIndexWriterOnJRECrash #768

Closed
NightOwl888 opened this issue Nov 22, 2022 · 11 comments · Fixed by #786
Closed

Port missing test: TestIndexWriterOnJRECrash #768

NightOwl888 opened this issue Nov 22, 2022 · 11 comments · Fixed by #786

Comments

@NightOwl888
Copy link
Contributor

This test was excluded because we don't have a JRE. But just because we don't have a JRE doesn't mean we don't have to contend with runtime crashes.

The test would need to be refactored to emulate a crash on .NET. The idea is that when a crash occurs, it doesn't corrupt the index.

For anyone curious what it is like to port Java code to .NET, this is a relatively small task that can satisfy your curiosity and will help to ensure Lucene.NET is as stable as it can be.

@NightOwl888 NightOwl888 added up-for-grabs This issue is open to be worked on by anyone help-wanted Extra attention is needed testability good-first-issue Good for newcomers pri:normal labels Nov 22, 2022
@Jeevananthan-23
Copy link
Contributor

Jeevananthan-23 commented Dec 16, 2022

Hello @NightOwl888, I find this is interesting to me I could like to work on this issue. Totally new for FTS, Thanks!

@rclabo
Copy link
Contributor

rclabo commented Dec 16, 2022

@Jeevananthan-23 Thanks for volunteering to tackle this. In general we try to port the Java code to .NET as literally as possible while also adhering to standard .NET coding conventions.

This unit test will need to be ported a bit less literally then most code since it's written to test that Lucene handles a JRE crash without corrupting the index. While we don't have a JRE in .NET, the analogous thing to test would be to test that the index does not get corrupted if the Lucene thread is terminated at a random interval.

This could be accomplished by 1) running the Lucene code on a different thread then the unit test and then terminating the Lucene thread after a random time period or 2) by having the test launch a separate process (which will of course inherently has a thread) that runs Lucene and then killing that process after a random interval.

Either approach would probably work fine.

@NightOwl888
Copy link
Contributor Author

Hi @Jeevananthan-23. Thanks for volunteering.

I don't believe that .NET has a built-in "crash" function that can be executed from the outside of a process like Java does, so we will need to change the approach. My thought is to do something along the lines of:

  1. Have the test generate the C# code to create a basic console app that runs 2 threads.
    a. Thread 1 will execute the code in the TestNRTThreads.TestNRTThreads_Mem() method up to 10 times. This test should be modified to check for the directory to create the index in from an environment variable, and if the environment variable isn't passed to proceed with a random index directory as it currently does.
    b. Thread 2 will sleep for a random number of milliseconds (between 3000 and 4000) and then throw an uncaught exception. This will crash the application on .NET Core. InheritThreadJob to make the thread, similar to how they are done in Java. ThreadJob automatically re-throws the exception when you call Join, which if we leave uncaught will also crash on .NET Framework.
  2. The test would use System.Diagnostics.Process to call dotnet run to compile and run the console app. It should attach an in-memory listener to STDERR so it can confirm that an error occurred.
  3. The test should create a random temp directory name and pass the name to the console app using the EnvironmentVariables property of System.Diagnostics.Process.StartInfo.
  4. The test will wait for the process to exit before proceeding.
  5. Once it has exited, the test should check to ensure the output of STDERR has a length greater than 0.
  6. Next, proceed with the original checkIndexes(tempDir) test with the directory that was created in step 3.

Of course, if during your research you think you have found another approach that works, you can modify it accordingly. But please use the original TestNRTThreads.TestNRTThreads_Mem() and checkIndexes(tempDir) to generate the index and check to see that it is not corrupted.

@NightOwl888 NightOwl888 removed up-for-grabs This issue is open to be worked on by anyone help-wanted Extra attention is needed good-first-issue Good for newcomers labels Dec 16, 2022
@NightOwl888
Copy link
Contributor Author

Actually, after thinking this through a bit more, rather than wiring up a new console app, it makes more sense to simply fire up dotnet test and point it to the assembly that has the test in it with an NUnit filter to only run the one test. The test framework extends NUnit and contains a lot of plumbing that will be required to get test to set up properly so this will save a lot of work.

We don't need to dynamically compile anything because the project this test is in contains the test we need to run. So by the time we run the test there is already a DLL file that we can pass to dotnet test to run. The test just needs to use the assembly that it is in to get the full path of the DLL file to execute. See https://stackoverflow.com/a/52956.

As for target framework, we have another test where we have done that already.

  • InstallationTest reads a custom MetaDataAttribute where the moniker for the current target framework is compiled into the assembly.
  • The build adds the MetaDataAttribute here. Simply use the same approach in the Lucene.Net.Tests._I-J.csproj file.

The above test also has an example of using System.Diagnostics.Process to launch an external command.

@NightOwl888
Copy link
Contributor Author

Of course, we still would need a second thread to kill the process, so we can use a similar approach as in Java and just subclass the original test to run and execute that test in the 2nd process. It can use the environment variable that is passing the directory of the index to determine whether it is the original test or the one that is spawned from the original.

@Jeevananthan-23
Copy link
Contributor

Jeevananthan-23 commented Dec 17, 2022

Sounds Challenging to me let me dive deep into it🤿. Thank you for your explanation!

@Jeevananthan-23
Copy link
Contributor

Jeevananthan-23 commented Dec 22, 2022

I think can't stress test internally in my system maybe something is wrong with my code it taking a long time to run 0958023 . Please check the code for the test if possible is there something wrong let me know (Process and ThreadPumper issue). Thanks!

@NightOwl888
Copy link
Contributor Author

Looks like it is coming along.

  1. I think the main issue is that you are not passing the name of the test DLL to dotnet test. If not passed a name, it will run all of the tests, which could take a several minutes.
  2. I don't think that ThreadPumper makes any sense in .NET. ThreadPumper is a Java-only thing. We can just attach StandardOutput events directly from process and it should "just work" without any "pumping". See: https://stackoverflow.com/a/285841 and https://stackoverflow.com/a/9730455
  3. Since we will have a random directory name, it is important that it is passed to the spawned thread from the original thread in an environment variable so the spawned thread knows where to create the index and the final check can use the same directory after the crash to check it. You are passing one in, but the second process (in the SetUp() method) needs to use it if is passed in. This is critical because the test in the base class uses this value when it creates the index. We need to use TempDir.FullName as the value of the "tempDir" environment variable that is passed to the spawned process. We can also use the "tempDir" environment variable instead of "tests.crashmode" to detect which process we are running and to set the TempDir in SetUp() of the spawned process. See the example below.
  4. We can eliminate everything in CrashJRE and simply throw an exception to bring down the runtime. However, we need to ensure that the crash thread Join() method call is before the loop for Join() method call. I think we will need two threads for this to work 1) A thread to run the crash (which we have) and 2) A thread to run the loop for base.TestNRTThreads_Mem(). The main thread should call Join() and wait for both of the other threads to complete. We may need some experimentation with what priority will work, though. ThreadJob will catch the exception and re-throw it when Join() is called, which is definitely where this needs to happen on .NET Framework. It should work on .NET Core also, but it would be a better test in that environment if we override SafeRun(ThreadStart) and leave the implementation empty so the exception happens during the thread run instead of at the Join() on the main thread.
  5. For the test name matching, I think we might need to use the contains (~) operator instead of the equality (=) operator. Also the name to use is Name rather than TestName on NUnit. If that advice doesn't help, you might want to create a throwaway project with a couple of tests in it to experiment with getting it to execute one and not the other.
        [SetUp]
        public override void SetUp()
        {
            base. Setup();
            var tempDir = Environment.GetEnvironmentVariable("tempDir");
            if (tempDir is null)
            {
                TempDir = CreateTempDir("netcrash");
                TempDir.Delete();
                TempDir.Create();
            }
            else
            {
                TempDir = new DirectoryInfo(tempDir);
            }
        }

@NightOwl888
Copy link
Contributor Author

NightOwl888 commented Dec 22, 2022

Wait - about passing a temp directory. Let's not do that. I looked at the code and it is passing tests.seed, which will sync up the directory name (as well as make the spawned process repeatable).

So, let's NOT do:

        [SetUp]
        public override void SetUp()
        {
            base. Setup();
            var tempDir = Environment.GetEnvironmentVariable("tempDir");
            if (tempDir is null)
            {
                TempDir = CreateTempDir("netcrash");
                TempDir.Delete();
                TempDir.Create();
            }
            else
            {
                TempDir = new DirectoryInfo(tempDir);
            }
        }

And instead pass the system properties. We have renamed them in .NET using a : because . has a special meaning. This can either be passed as an environment variable or on the command line, although, the latter is a bit of black magic.

You are missing a bunch of command line arguments to get this to work. This is probably the bare minimum of what we need.

ProcessStartInfo startInfo = new ProcessStartInfo
{
    FileName = "dotnet",
    Arguments = String.Join(" ", new[] {
        "test", this.GetType().Assembly.Location,
        "--framework", GetTargetFramework(),
        "--filter", "Name~TestIndexWriterOnJRECrash",
        "--logger:\"console;verbosity=normal\"",
        "--no-build",
        // NOTE: This is special syntax that dotnet test supports to pass .runsettings on the command line
        "--", $"RunConfiguration.TargetPlatform={GetTargetPlatform()}" }),
    WorkingDirectory = theDirectory,
    EnvironmentVariables = {
        { "lucene:tests:seed", RandomizedContext.CurrentContext.RandomSeedAsHex }, 
        { "lucene:tests:culture", Thread.CurrentThread.CurrentCulture.Name }, 
        { "tests:crashmode", "true" }, 
        // passing NIGHTLY to this test makes it run for much longer, easier to catch it in the act...
        { "lucene:tests:nightly", "true" }
    },
    RedirectStandardOutput = true,
    RedirectStandardError = true,
    UseShellExecute = false
};

And yes, that means we can keep the existing SetUp() method and tests:crashmode environment variable code.

I am not sure how much testing has been done with passing system properties as environment variables has been done, but I know for certain the command line approach works if the above code does not.

NOTE: The lucene: prefix is required for environment variables as system properties only. When passing system properties on the command line, this prefix is not required. If you change the code to read SystemProperties.GetProperty("tests:crashmode") instead of Environment.GetEnvironmentVariable("tests.crashmode") then all of the properties can be passed in the same way (that is, specified as "lucene:tests:crashmode" in an environment variable or "tests:crashmode" on the command line).

Note that I don't know for certain this is the right filter. I got it working before, but I thought I specified ~= or =~ in that case.

@Jeevananthan-23
Copy link
Contributor

Jeevananthan-23 commented Dec 22, 2022

Let me check further it's consuming my memory a lot because it takes all the test classes so filter variables are not correct.

@Jeevananthan-23
Copy link
Contributor

@NightOwl888, I submitted the PR and look forward to your feedback. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants