Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reconfigure the detectors to use global code for the Binary Serializers #112

Closed
lmalenfant opened this issue Oct 6, 2023 · 65 comments · Fixed by #129
Closed

Reconfigure the detectors to use global code for the Binary Serializers #112

lmalenfant opened this issue Oct 6, 2023 · 65 comments · Fixed by #129
Assignees

Comments

@lmalenfant
Copy link
Member

The Detectors section of MonteCarlo has a lot of duplicated code and every time we add another detector, the code is duplicated more. The Binary Serializers could be consolidated for the detectors so this would reduce the code duplication.

@lmalenfant lmalenfant self-assigned this Oct 6, 2023
@dcuccia
Copy link
Contributor

dcuccia commented Oct 6, 2023

We should agree on a design for this up front before implementing - IMO a good reason to have discussions that get promoted to issues.

@lmalenfant
Copy link
Member Author

I agree but I already started a POC and needed to commit the code somewhere so I created a branch where I can try things out. This will probably take some time and I really need to try things out in code before coming up with a design. If you prefer I work on a fork, I can do that.

@dcuccia
Copy link
Contributor

dcuccia commented Oct 6, 2023

Fork could work, or, middle of the road. Could be to rename this ticket to something like "spike - experiment with decoupling detector serialization".

In my opinion, there's more refactoring to consider beyond serialization for detectors, that could make it easier to implement with a lower touch/surface area. This larger piece could interact with our serialization strategy.

I may have a little bit of time opening up to complete the initial python code and work on this with you on a spike branch where we intend to throw away the code and implement the final design we come up with.

@dcuccia
Copy link
Contributor

dcuccia commented Nov 2, 2023

Hi friends. Did a little thinking and have two (hopefully constructive) pieces of feedback on this ticket:

  1. Per our live conversation on naming, I thought a little bit about the naming and namespace location of BinaryArraySerializerFactory. I think it's appropriate. It does what it says - it has one static factory method that creates BinaryArraySerializer(s). And in terms of discovery it's probably in the right place. See this justification, for example.

The alternative would be to put the CreateSerializer implementation into the BinaryArraySerializer class directly, which would be ok with me (but see caveat below). Many classes have factory creation methods. To lead developers into "the pit of success", they also make the class constructor private, so there's "only one way" to create the class (to minimize confusion/maximize correctness). If we did that, we'd probably want to clean up (remove) the public "setter" components of the BinaryArraySerializer, so that there's truly only one way they are instantiated - through the factory method - and mutation (source of bugs) is not allowed.

Caveat - as I've leaned into more "functional programming" modalities, I've tended to see the benefit of the segregation of constructed classes from the "helper" methods that allow their creation/manipulation. Think LINQ - all those Select, Where operators etc are defined in separate classes from the collection classes that "hold" the data.

Another note: I see that the Obsolete attribute warning on the BinaryArraySerializer.DataArray property member is not present on this new feature branch. I think it's important to remove (and remove assignments in all of the implementations), because it incorrectly exposes/guides improper use of the functionality, and is never used/needed anyway.

Lmk what y'all think - I can make any of the above changes if we want.

  1. I think I figured out a relatively safe/extensible way to flatten/remove all the various "for loops" in the factory (487 lines of code down to 107). See my Gist here:

https://gist.github.com/dcuccia/552c52716211cfecaf48dd84fe254f8b

It doesn't change the surface area of the public API nothing in the detector code would change. Could make the factory method consolidation to BinaryArraySerializer a little more palatable, if we were going that direction. I haven't done much testing, but this should be a "good performance" version without any boxing or copying. Let me know what you think.

@lmalenfant
Copy link
Member Author

@dcuccia thank you for thinking this through! Let me think about your suggestions and I will also take a look at your Gist when I have more time.

Another note: I see that the Obsolete attribute warning on the BinaryArraySerializer.DataArray property member is not present on this new feature branch. I think it's important to remove (and remove assignments in all of the implementations), because it incorrectly exposes/guides improper use of the functionality, and is never used/needed anyway.

Just a quick note on this, we didn't forget it, @hayakawa16 has made a note to add it back in once all the work on the detectors is complete because it is not obsolete until then and it will avoid many warnings on the unchanged detectors while the code is in progress. Good eye noticing that though :)

@dcuccia
Copy link
Contributor

dcuccia commented Nov 2, 2023

Cool, thanks on both accounts!

@hayakawa16
Copy link
Member

I added the Obsolete attribute to DataArray in BinaryArraySerializer and all unit tests run fine. I think that means I have updated all detectors?
So outstanding items on this branch is potential modifications to BinaryArraySerializerFactory and possibly modifying the AllDetectorTests to use each detector class to write and read binary so that other binary (besides Mean and SecondMoment) get tested too (I think this was Lisa's suggestion?). I can work on latter.

@dcuccia
Copy link
Contributor

dcuccia commented Nov 2, 2023

Thanks Carole. I think there are still 53 references to BinaryArraySerializer.DataArray throughout the solution. Can we remove these obsolete assignments? (they're not used). For tests, should we create an integration test class for each named detector, to test the specific serialization expectations of that particular detector (and possibly share any boilerplate code needed between those tests)?

@hayakawa16
Copy link
Member

Hi, I did a Find on BinaryArraySerializer.DataArray and selected "Entire Solution" and the only one that is shown is in Vts.xml. How are you searching?

@hayakawa16
Copy link
Member

Are you looking at the feature/112-reconfigure-the-detectors-to-use-global-code-for-the-binary-serializers_2 (note "_2" at end) branch? This is the new branch Lisa and I made that picked out only those changes that support your implementation (other trial code was omitted).

@dcuccia
Copy link
Contributor

dcuccia commented Nov 2, 2023

I was looking at _2. Is this not the right branch?

@hayakawa16
Copy link
Member

Yes! How did you find those 53 references?

@dcuccia
Copy link
Contributor

dcuccia commented Nov 2, 2023

VS said there were 53 usages. On my way home and will try to repro there.

@hayakawa16
Copy link
Member

Thanks! Also have you pulled the latest? I've been busy updating detectors this week.

@dcuccia
Copy link
Contributor

dcuccia commented Nov 2, 2023

Yes, you have! <3

After rebuilding, it says it's actually 49 references:

image

Here's all the examples. These lines can be removed:

image

@dcuccia
Copy link
Contributor

dcuccia commented Nov 2, 2023

Oh, that's weird. Might be a Visual Studio cache issue...sorry stand by.

@hayakawa16
Copy link
Member

Wait, now I'm seeing them. Okay, let me work on those detectors. I kind of feel like I modified these detectors already. Well oh well. Deja vu all over again!

@dcuccia
Copy link
Contributor

dcuccia commented Nov 2, 2023

I cloned fresh and there were only 23, with a couple on the tests side:

image

@hayakawa16
Copy link
Member

Thanks for the helpful screen shot. I modified all of those files. The only place string "DataArray" shows up now is in Vts.xml and [Obsolete] DataArray in BinaryArraySerializer and BinaryArraySerializerFactory.
Unit tests pass and I pushed my fixes.

@lmalenfant
Copy link
Member Author

lmalenfant commented Nov 2, 2023

This conversation reminded me that when I was working with the serialization code, it doesn't appear that Dimensions in BinaryArraySerializer is used either. @dcuccia & @hayakawa16 Could you you take a look and see if that is correct or if it is being used elsewhere?

@hayakawa16
Copy link
Member

I found no references to BinaryArraySerializer.Dimensions. Shall I add Obsolete attribute?

@dcuccia
Copy link
Contributor

dcuccia commented Nov 2, 2023

Good call - that is no longer needed, and was new as of this effort. Please delete it.

@lmalenfant
Copy link
Member Author

@dcuccia it was not new, it's existing, I noticed it wasn't being used though when I first started the work on the detectors. We should probably put the obsolete attribute on it just in case it was being used externally and then remove it when we remove DataArray.

@dcuccia
Copy link
Contributor

dcuccia commented Nov 3, 2023

My mistake. I think I'd added a GetDimensions method and removed it, which is what had confused me. Sounds good.

@dcuccia
Copy link
Contributor

dcuccia commented Nov 6, 2023

Hi Carole,

I will have time later to check out the code, but I find this type of code is useful to have as a test-only library helper (it's not relevant for the production code, just the tests. It could be in a Helpers folder in the test library, or if multiple test libraries need access to it, create a Vts.Test.Common or similar library.

In terms of location, this would be testing the serialization of that specific class, so I assume we want a 1:1 relationship between the .cs test class and the detector (1 for each). This item unit tests the serialization, another could unit test the accuracy, etc.

@lmalenfant
Copy link
Member Author

@hayakawa16 when I create helper methods for unit testing, I usually create a separate folder called helpers that doesn't have a matching folder in the project being tested. You could put that in the MonteCarlo folder or at the folder level of the Detectors depending on where it will be used.

I also wanted to mention something here about the tests in the VTS. When we were adding test coverage we noticed that the majority of the tests were in fact integration tests and not true unit tests. Where possible, I added unit tests to increase the code coverage but in some case it wasn't possible due to way the original code was written, static classes and methods etc. It would be good once we have the coverage where we want it to be, to identify the areas where we are missing unit tests and add them.

This might be a good time to come up with a structure for the test application so we can easily differentiate unit tests and integration tests. In my projects I name the unit test classes the same as the class being unit tested and then I put the integration tests in a separate location under an IntegrationTests folder because they usually include many classes. Any suggestions for this would be welcome.

@dcuccia
Copy link
Contributor

dcuccia commented Nov 6, 2023

Great points Lisa. At Modulim, we go as far as to have separate libraries, e.g. Vts.Test.Unit and Vts.Test.Integration. We run the Unit tests in CI for every branch commit, and run the Integration tests once a (non-draft) PR is created.

@lmalenfant
Copy link
Member Author

@dcuccia I like that idea! I was going to say separate projects but I didn't know if that was overkill. The fact that you used it at Modulim is good because it sounds like it worked well.

@dcuccia
Copy link
Contributor

dcuccia commented Nov 6, 2023

It does. The other way to do it is to flag tests/test classes as integration or unit or e2e, and use a command line switch to run one or the other. We have some very slow running tests that depend on an outside service that we only run before we build a production release candidate, and we use flags for those.

@hayakawa16
Copy link
Member

I like these ideas. I have more nuts and bolts questions:

  1. for the single value detectors (e.g. ATotalDetector), GetBinarySerializers() => null; is in code. So I plan to leave those tests as they are?
  2. I put DetectorBinarySerializerHelper for now in Vts.Test/MonteCarlo/IO/. We may want to move around later. I just needed it somewhere to update the existing unit tests.
  3. There are currently 18 detectors in DetectorIOTests. I have 54 more tests to add. Can I go ahead and do this and after that we can move things around based on "unit" vs "integration" tests? Or would you rather I do that before I fill out these tests?
  4. These updated 18 unit tests found an error in the way that the new code in GetBinarySerializer was written in some detectors. So that is good! Turns out you need to define the SecondMoment arrays without checking if TallySecondMoment is true, i.e. for ReflectedMTOfRhoAndSubregionHistDetector:
Mean ??= new double[Rho.Count - 1, MTBins.Count - 1];
SecondMoment ??= new double[Rho.Count - 1, MTBins.Count - 1];
var allSerializers = ...

rather than

Mean ??= new double[Rho.Count - 1, MTBins.Count - 1];
if (TallySecondMoment)
{
  SecondMoment ??= new double[Rho.Count - 1, MTBins.Count - 1];
}
var allSerializers =...

Otherwise the SecondMoment array was not in the serializers list in WriteClearAndReReadArrays in DetectorBinarySerializerHelper.

I have pushed code in case you'd like to give it a look.

@dcuccia
Copy link
Contributor

dcuccia commented Nov 6, 2023

Defer to Lisa on 2 & 3.

IMO for 1, they should be returning an empty array, not null, and I'd make sure to "move" those tests to the same format as all others (testing serializers is just one part of unit testing the detector class - you'll also want to test correctness, etc).

Re: 4, I don't understand - why does SecondMoment need to be allocated, if it's never used? That said, a "fully-initialized" class is always preferable from a code reliability perspective, I just thought we'd kept those null so that we could use all the available memory for the main detector, if that was a limitation. FWIW, if we intend to keep them possibly null, we should mark with the nullable operator (i.e. double[,]? SecondMoment).

@hayakawa16
Copy link
Member

Regarding 4, you're absolutely correct. When I was debugging code and stepping through, SecondMoment wasn't getting set somehow. I reverted code and all is good now.

@hayakawa16
Copy link
Member

Regarding 1, the single valued detectors write their Mean and SecondMoment to the detector.txt ascii file, there is no binary serialization going on. Do you still feel same way?

@dcuccia
Copy link
Contributor

dcuccia commented Nov 6, 2023

Right - that's always been the case. The test, in that case, should just confirm the expectation - that GetBinarySerializers() returns an empty, non-null array.

@lmalenfant
Copy link
Member Author

I put DetectorBinarySerializerHelper for now in Vts.Test/MonteCarlo/IO/. We may want to move around later. I just needed it somewhere to update the existing unit tests.

I think that's a good place

There are currently 18 detectors in DetectorIOTests. I have 54 more tests to add. Can I go ahead and do this and after that we can move things around based on "unit" vs "integration" tests? Or would you rather I do that before I fill out these tests?

Yes add them now and we can figure it out later, try to keep integration and unit tests separate even if it is just with a comment in the same file

@lmalenfant
Copy link
Member Author

I agree with @dcuccia for 1. returning empty array vs null but if this is too much to change now (this is prevalent in our code and needs resolving globally) we can leave it as is and fix later.

@hayakawa16
Copy link
Member

Thanks for your feedback! I will continue along adding unit tests then.

@hayakawa16
Copy link
Member

I am finding I am coding the same code in multiple tests, so I was thinking of optimizing somehow. At first I tried:

        [Test]
        public void Validate_1DRealDetectors_deserialized_class_is_correct_when_using_GetBinarySerializers()
        {

            var detectors = new List<IDetector>();
            var detectorName = "testrofangle";
            detectors.Add(new ROfAngleDetector
            {
                Angle = new DoubleRange(0, 10, 3),
                TallySecondMoment = true, // tally SecondMoment
                Name = detectorName,
                Mean = new double[] { 1, 2, 3 },
                SecondMoment = new double[] { 4, 5, 6 }
            });
            detectorName = "testrofrho";
            detectors.Add(new ROfRhoDetector
            {
                Rho = new DoubleRange(0, 10, 3),
                TallySecondMoment = true, // tally SecondMoment
                Name = detectorName,
                Mean = new double[] { 1, 2, 3 },
                SecondMoment = new double[] { 4, 5, 6 }
            });
            foreach (var detector in detectors)
            {
                DetectorBinarySerializationHelper.WriteClearAndReReadArrays(detector, detector.Mean, detector.SecondMoment);

                Assert.AreEqual(1, detector.Mean[0]);
                Assert.AreEqual(2, detector.Mean[1]);
                Assert.AreEqual(3, detector.Mean[2]);
                Assert.AreEqual(4, detector.SecondMoment[0]);
                Assert.AreEqual(5, detector.SecondMoment[1]);
                Assert.AreEqual(6, detector.SecondMoment[2]);
            }
        }

but IDetector does not include Mean or SecondMoment so not possible.
So I am thinking about putting the Assert code in a unit test method.
Any thoughts on these optimizations? On the one hand I see that it is great to have an individual, full-contained test for each detector. On the other hand, there is a lot of duplicate code in this unit test file.

@dcuccia
Copy link
Contributor

dcuccia commented Nov 7, 2023

I think this is the unit test for the detector, right? So, my suggestion is a 1:1 relationship between the .cs test class and the detector (1 "XYZDetectorTests.cs" for each class). And in that case, each detector "knows" about it's own things, so you don't need to represent it with an interface, but instead the implementation type. Yes, it's repetitive, but it segregates different detector tests, and will make those files that are "fine" essentially read-only, while new detectors/tests can be written without disturbing the existing ones (more scalable for new work, new devs, etc). We can make more "helper" static libraries, like var rho = DetectorTestHelper.Generate1DRho() etc.

@lmalenfant
Copy link
Member Author

@hayakawa16 Consolidating test code is a great idea, I think I mentioned this last week in our meeting that if we can create generic code for testing using the data we have, we should go that route rather than duplicating the same code in individual tests.

If you would like to work together on this, let me know. We can probably create another helper method that can be called from each test to avoid the duplication.

@dcuccia
Copy link
Contributor

dcuccia commented Nov 7, 2023

(and in the case of the 1D, 2D, etc...those should be tests of DetectorIO libraries, with only one example for each "shape"

@hayakawa16
Copy link
Member

To respond to @dcuccia, this is the combined test for ROfAngleDetector and ROfRhoDetector. I think you are suggesting not to do this and that the current DetectorIOTests file be separated into one for each detector? If so would that detector test also test methods like Normalize or just GetBinarySerializers?

@dcuccia
Copy link
Contributor

dcuccia commented Nov 7, 2023

@hayakawa16 all tests relevant to that detector's performance and behavior (and nothing else).

@hayakawa16
Copy link
Member

Got it. Let me discuss with @lmalenfant how we organize these types of single layer tests versus the integration tests. In the meantime I will continue filling out the tests in DetectorIOTests so that I can verify new code, and I can separate later.

@dcuccia
Copy link
Contributor

dcuccia commented Nov 7, 2023

Sounds good!

@hayakawa16
Copy link
Member

I'm half way there with the unit tests! I see that there have been multiple topics brought up in this issue. I have created a list of the order I plan to proceed (this is up for discussion):

  1. complete unit tests of all detectors GetBinarySerializers (all in one file)
  2. update single value detectors to use empty array instead of null and write corresponding GetBinarySerializers unit tests (I tried this today and had trouble, may need help)
  3. write unit test for BinaryArraySerializerFactory (again will gladly welcome any help)
  4. separate detector unit tests into separate files
    ---- at this point merge in branch (option: could merge after 3)
  5. analyze PopulateFromEnumerable and possibly update application to be column-major
  6. reorganize Vts.Test to Vts.Test.Unit, Vts.Test.Integration
    I think the last two are separate topics are beyond this issue and should be solved on other issue branches.
    Let me know if I omitted anything. Just trying to lay out steps for myself.

@lmalenfant
Copy link
Member Author

lmalenfant commented Nov 8, 2023

This looks good @hayakawa16 although I think 2. could be further down the list because this is part of a larger cleanup effort. Plus the detectors that contain this code do not have the serialization code and are not being modified with this change.

Since this is already a huge change we need to keep these changes to a minimum because they will be easier to review in the PR.

I would choose the option to merge after 3.

@lmalenfant
Copy link
Member Author

@hayakawa16 would you like me to write the unit tests for BinaryArraySerializerFactory?

@dcuccia I was able to look at your Gist finally and I like it, if we write the unit tests for the code as is, we can switch it for the new code and validate that it still works the same way.

@dcuccia
Copy link
Contributor

dcuccia commented Nov 8, 2023

Sounds good! :)

@hayakawa16
Copy link
Member

@lmalenfant that would be super great!

@hayakawa16
Copy link
Member

I have finished updated all 72 detectors with the new GetBinarySerializers code and written unit tests for them all (included testing what results if TallySecondMoment is set to true and false using neat nunit [TestCase(true)] and [TestCase(false)] attributes that Lisa used in her unit tests for BinaryArraySerializerFactoryTests). These tests are currently in a single file. I plan to break each test into a separate file named by the detector it is testing. For example, ROfRhoDetectorTests.cs. The GetBinarySerializers method will be the only test for ROfRhoDetector in this file, however in the future we can add other unit tests to this file for the other methods in ROfRhoDetector class.

I plan to create a folder under Vts.Test named "Unit". Then in that folder, a subfolder named "MonteCarlo", then a subfolder in that named "Detectors". Then when I separate the unit tests into the separate files, I will put them in this "Detectors" folder and give the tests the namespace Vts.Test.Unit.MonteCarlo.Detectors.

Let me know if you have any comments or questions about this plan.

@lmalenfant
Copy link
Member Author

@hayakawa16 That sounds good to me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants