Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Various changes needed in order to handle Dedicon test books #97

Merged
merged 4 commits into from
Sep 16, 2019

Conversation

bertfrees
Copy link
Collaborator

@bertfrees bertfrees commented Apr 17, 2019

Commit bb28b93 was your attempt to fix the oscillation issue (#70).
I believe commit b13a6dd fixes a bug in your commit.
Commit 80185f8 is to improve the speed at which the volume breaking algorithm converges to an optimum.
Commit b8dc003 was part of my initial attempt to fix the oscillation issue, but it's still relevant I think.
Commits b662b0d, b13a6dd, fe7804e and 051d099 were done by me and Paul when I was at Dedicon some weeks ago. After this the oscillation issue was fixed.
Commits e34b5dd and aa8bfe9 fix two other issues that I discovered during my visit to Dedicon. The first issue was that the braille page numbering would break at volume transitions in some cases. The number would become even on right hand pages where it was uneven in the previous volume, or visa versa. The second issue was that we would sometimes get a StackOverFlowError during the resolution of a marker-reference.

@@ -191,7 +191,9 @@ private PageImpl nextPageInner(int pageNumberOffset, boolean hyphenateLastLine,
{
PageImpl current = newPage(pageNumberOffset);
if (pcbl!=null) {
blockContext.getRefs().setNextPageDetailsInSequence(pcbl, current.getDetails());
if (transitionContent.isPresent() && transitionContent.get().getType()==TransitionContent.Type.INTERRUPT) {
Copy link
Collaborator Author

@bertfrees bertfrees Apr 18, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some explanation: Only set if this is the last page of the volume (and there is a volume transition). getNextPageDetailsInSequence is only used in combination with getTransitionProperties, and getTransitionProperties is only used to get and compare the transition properties of the recto and verso pages. But these transition properties of course only make sense if they correspond to volume breaks at those pages.

@joeha480
Copy link
Contributor

joeha480 commented May 2, 2019

Commit bb28b93 and a modified version of b13a6dd (using Optional) have been merged.

Commit 80185f8 could be implemented by modifying the cost function instead of modifying the calculation provided by the EvenSizeVolumeSplitterCalculator.

I've rewritten aa8bfe9 to avoid using Object and reflection.

@joeha480
Copy link
Contributor

joeha480 commented May 2, 2019

Processed b662b0d

@joeha480
Copy link
Contributor

joeha480 commented May 3, 2019

Processed e34b5dd

@bertfrees
Copy link
Collaborator Author

bertfrees commented May 8, 2019

I'm adding three more commits. I already know you are probably not going to accept two of them, but this way you can at least help me find a solution.

  • Commit 2678e33 is a logical next step after d386eff: it also uses BlockLineLocation for storing the "breakable" properties of sheets. The downside is that the keep-with-previous-sheets feature had to be disabled. Maybe it's not that hard to implement after all. Could it just be a matter of remembering the BlockLineLocations of the previous sheets, and when encountering a keep-with-previous-sheets we call keepBreakable on those locations?

  • Commit 00e4bda prevents the CrossReferenceHandler to become dirty when it is read-only. You already said the current behavior is intentional. I can see how, but still it seems wrong, because the information that you expect to be present might never be set (because of the read-onlyness). I think maybe the solution is to not make it read-only?

    EDIT: Later Joel was confronted with this issue himself and wasn't sure anymore that it should be possible for CrossReferenceHandler to become dirty when read-only. We discussed it some more and this resulted in commit 456d999.

  • Commit 4528bf1 fixes an issue with volume-keep-priority on nested blocks, and at the same time simplified the code significantly. I couldn't completely understand the old code so it might be that I removed something that I shouldn't have. E.g. the part where you check that "the previous block's break after priority is equal to this block's break inside priority". I tested it by feeding the OBFL below and then running a debugger and checking the getAvoidVolumeBreakAfterPriority() of the resulting RowGroups. I'm not sure how to make a real test out of this because it's not really an integration test like the other ones, but it's also not a unit test because it involves several parts of the code (ObflParserImpl, BlockSequence, RowGroupRequence, RowGroupDataSource?, ...).

    <obfl xmlns="http://www.daisy.org/ns/2011/obfl" version="2011-1" xml:lang="en">
      <layout-master name="body" page-width="15" page-height="6" duplex="true">
        <default-template>
          <header/>
          <footer/>
        </default-template>
      </layout-master>
      <volume-template sheets-in-volume-max="3"/>
      <volume-transition range="sheet"/>
      <sequence master="body">
        <block volume-keep-priority="9">
          <block volume-keep-priority="1">
            <block>⠉⠉ ⠉⠉ ⠉⠉ ⠉⠉ ⠉⠉ ⠉⠉ ⠉⠉ ⠉⠉ ⠉⠉</block> <!-- 1 -->
            <block>⠉⠉ ⠉⠉ ⠉⠉ ⠉⠉ ⠉⠉ ⠉⠉ ⠉⠉ ⠉⠉</block> <!-- 9 -->
          </block>
          <block volume-keep-priority="7">
            <block volume-keep-priority="1">
              <block>⠒⠒ ⠒⠒ ⠒⠒ ⠒⠒ ⠒⠒ ⠒⠒ ⠒⠒ ⠒⠒ ⠒⠒</block> <!-- 1 -->
              <block>⠒⠒ ⠒⠒ ⠒⠒ ⠒⠒ ⠒⠒ ⠒⠒ ⠒⠒ ⠒⠒</block> <!-- 7 -->
            </block>
            <block volume-keep-priority="1">
              <block>⠤⠤ ⠤⠤ ⠤⠤ ⠤⠤ ⠤⠤ ⠤⠤ ⠤⠤ ⠤⠤ ⠤⠤</block> <!-- 1 -->
              <block>⠤⠤ ⠤⠤ ⠤⠤ ⠤⠤ ⠤⠤ ⠤⠤ ⠤⠤ ⠤⠤</block> <!-- expected EMPTY, but got 7 !!! -->
            </block>
          </block>
        </block>
        <block volume-keep-priority="1">
          <block>⠿⠿ ⠿⠿ ⠿⠿ ⠿⠿ ⠿⠿ ⠿⠿ ⠿⠿ ⠿⠿ ⠿⠿</block> <!-- 1 -->
          <block>⠿⠿ ⠿⠿ ⠿⠿ ⠿⠿ ⠿⠿ ⠿⠿ ⠿⠿ ⠿⠿</block> <!-- EMPTY -->
        </block>
      </sequence>
    </obfl>

@bertfrees bertfrees self-assigned this Jul 31, 2019
@kalaspuffar
Copy link
Member

@bertfrees will look at this pull request and resolve conflicts and see if there is any code that could be used still and either fix this PR so it's ready for pulling or creating a new PR with good code.

@kalaspuffar kalaspuffar requested review from joeha480 and removed request for joeha480 July 31, 2019 11:59
@kalaspuffar kalaspuffar changed the title Various changes needed in order to handle Dedicon test books WIP: Various changes needed in order to handle Dedicon test books Jul 31, 2019
@kalaspuffar kalaspuffar changed the title WIP: Various changes needed in order to handle Dedicon test books Various changes needed in order to handle Dedicon test books Jul 31, 2019
@kalaspuffar kalaspuffar added this to Backlog in Release 2019 Q3 Jul 31, 2019
@bertfrees
Copy link
Collaborator Author

bertfrees commented Aug 2, 2019

The following commits were already in master:

The following commits have been rebased. If needed they can be split up into multiple PRs.

@bertfrees
Copy link
Collaborator Author

bertfrees commented Aug 5, 2019

Commit 80185f8 could be implemented by modifying the cost function instead of modifying the calculation provided by the EvenSizeVolumeSplitterCalculator.

If I remember correctly what Joel meant by this is that the volume sizes calculated by EvenSizeVolumeSplitter were never meant to be maximum values, only target values.

However the cost function already allows for higher values. The problem is that the algorithm to find the optimal break points is greedy, i.e. the individual break points are calculated one by one, and there is currently no way for the cost function to look back or ahead in order to predict that the allocated volumes and suggested target sheet counts aren't going to suffice. The cost function already detects when we're in the last volume, and tries to fill it completely with the remaining sheets, but apparently that is not enough (when we have a lot of volumes).

The other problem is that EvenSizeVolumeSplitter does not provide a good prediction of the required sheets because the calculation is based on the total number of required sheets in the previous iteration, but by adding extra volumes the overhead will increase which adds to the total number of sheets. By trying to predict the added overhead for the extra volumes we can improve things.

Something that is harder to predict, is how much the "breakable" constraints push the actual average volume size down (or up). At the moment EvenSizeVolumeSplitter doesn't take into account this effect at all. If the actual average volume size is lower than the target size, this will currently result in remaining sheets which will result in a decrease in the target size, but this may push the actual size down even more.

All in all, I think my initial solution of relaxing the target size slightly towards the limit wasn't such a bad idea because this way you leave some leeway for things that are hard to predict.

@bertfrees
Copy link
Collaborator Author

By trying to predict the added overhead for the extra volumes we can improve things.

I have done this now (locally).

@kalaspuffar kalaspuffar moved this from Backlog to In progress in Release 2019 Q3 Aug 6, 2019
@bertfrees
Copy link
Collaborator Author

I'm going to test some Dedicon books with the new commit (that I haven't pushed yet) and with/without commit db1e949, but I think that's gonna be easier to do when the other PRs have been merged. So focusing on them first.

@bertfrees
Copy link
Collaborator Author

I rebased the branch onto master and left out one commit (ee43e16) for which I created a new PR.

@bertfrees
Copy link
Collaborator Author

I have tested all the Dedicon books with only these two commits (and in combination with #111) and I get no oscillations:

  • the "Only save "NextPageDetailsInSequence" and "TransitionProperties" info when needed" commit (d6ad13f) -> rebased to 45ae05b
  • and another commit that replaces a8534a9 -> c27f558

"Base 'breakable' info in CrossReferenceHandler on BlockLineLocation" (8c184d4) appears to be not needed (anymore), so I'll leave that one out, at least for now, because it broke the keep-with-previous-sheets feature.

The "Don't remember exactly why this was needed" commit (6110c1f) was probably never needed to fix the oscillation issue, but it is possible that I made the change to fix an issue that I noticed while testing the Dedicon books, or maybe it was just an issue I spotted in the code. Unfortunately I don't remember. I suggest that I move this commit to a separate PR and then we can check how/if this changes the regression test suite, and judge whether it is an improvement or not.

PMD complained about this.
@bertfrees
Copy link
Collaborator Author

A side effect of c27f558 is still that in some cases the volumes are not as "equal" as before, although it's less the case than with the the original change a8534a9, which consistently allocated too much spaces for volumes.

The volume-breaks-advanced3-input.obfl test shows that in this particular case the extra overhead for the added volume was overestimated. I don't know if there is a way to avoid this. As I explained before the way the volume breaking works is just very limiting.

@bertfrees
Copy link
Collaborator Author

@kalaspuffar I think this branch is ready now for running the regression tests on.

@kalaspuffar
Copy link
Member

Hi @bertfrees

Shouldn't I wait to run the regression test until all the unit tests passes?

Best regards
Daniel

@bertfrees
Copy link
Collaborator Author

But I explained didn't I? The behavior changed. I can change the expected result of that unit test if you want. The unit test is of course not a very realistic case because the volumes are so tiny. I'm more interested to know what happens with real books.

@kalaspuffar
Copy link
Member

Hi @bertfrees

I think you should change the test, we won't merge any code that has failing tests. And if you think the test is incomplete or incorrect you may remove it as well. But we need a solid test suite that can find the issues before we find them in production.

Best regards
Daniel

@bertfrees
Copy link
Collaborator Author

I didn't say anything about merging. Sure, I'll modify the test, no problem. I was just unsure about the result, and I'm trying to get some feedback from you. Firstly, as I said it's difficult to say whether the results of these small tests are fine. We should probably create some tests with more and larger volumes. But I figured we could maybe first look at the books from the regression test suite to get an impression. Secondly, you and I might have different goals. Dedicon's primary goal is to fix the oscillation issue that prevents some books to get through at all. I think that the volume breaking algorithm itself has room for improvements as well, although right now that is of secondary importance to me. On the other side there is MTM who might be more concerned about the continuity in the behavior of the volume breaking and might attach more importance to the "evenness" of volumes. I don't know. So the fact that I am happy with this change doesn't necessarily mean you should be happy with it.

@kalaspuffar
Copy link
Member

Hi @bertfrees

Great comment about what your goal for this change is. I've now run the regression tests and this change did also change 162 books so I need to look into these further to get a picture of the changes.

Do you have any preference on which result I look at first?

When it comes to expectations and goals, our goal is to progress the work with the product in the most stable manner.

I've not had any discussions with MTM of the specific topic but I don't see any reason why the evenness of volume breaks should be a deal-breaker if we need to improve the code in order to produce some kinds of books.

The main goal for MTM at the moment is to solve the issues Dedicon has with the product so you can produce your workload.

Best regards
Daniel

@bertfrees
Copy link
Collaborator Author

bertfrees commented Sep 11, 2019

Do you have any preference on which result I look at first?

I haven't.

I've been thinking about what is required to validate a test result, when the volume breaks may have moved. There are two parts to it:

  • Checking that only the volume breaks have changed and that the actual content is unchanged. Note that it is also possible that page breaks change near the volume breaks. Writing a tool to do this check is an option, but it's not trivial and maybe we are confident enough that it's true by manually checking a few files.
  • Checking that the volume breaks are better or equally good as before. I think that it makes little sense to visually check the results and to try to judge whether the volume breaks are good because there are so many things you need to take into account. It's not a simple checklist. It's an equation of several factors. So what about writing a tool that analyzes a PEF and computes a "volume break score" for it, which we can easily compare between different versions? That would probably be the ideal solution, however it is not that simple to write such a tool, mainly because not all required information is present in the PEF. What would be easier to do is compute this score within Dotify because Dotify has all the information it needs to compute it. I realize that this way we basically let the tool evaluate itself, which is not so great, but it's better than nothing I think. Note that it is currently not so that Dotify can simply find the solution with the best score. This is due to the limitations of the current optimization algorithm. Another downside of letting Dotify compute the score is that you can only have a single way to compute it, i.e. there can be only one interpretation of what are good volume breaks (unlike when we would have a tool that would analyze the PEF because we can have several versions of them). But on the other hand if we need multiple interpretations we also need Dotify's volume breaking to become configurable and I'm not sure we should go there.

@kalaspuffar
Copy link
Member

Hi @PaulRambags and @bertfrees

We have run our tool to verify pages and see what changes that generated. And what I can see there is actually only two documents changed with this PR. The rest is due to us not changing the baseline (Baseline must be a published version).

Looking at these there are minor changes moving some text from one volume to another. I've reviewed and approved the code so when @PaulRambags have reviewed this PR we can merge it.

Best regards
Daniel

@bertfrees
Copy link
Collaborator Author

OK cool.

Although not so urgent anymore, the ideas from my previous comment are still relevant I think. I'll add it as a topic for the next meeting.

@kalaspuffar kalaspuffar merged commit 54733b2 into brailleapps:master Sep 16, 2019
@kalaspuffar kalaspuffar moved this from In progress to Done in Release 2019 Q3 Sep 16, 2019
@bertfrees bertfrees deleted the issue-70 branch March 28, 2020 10:49
bertfrees added a commit to bertfrees/dotify.formatter.impl that referenced this pull request Apr 20, 2021
Without this change it is possible that we end up in an endless
alternation between two volume renderings. (In other words,
brailleapps#70 was not
completely fixed yet.)

I proposed this change in
brailleapps#97 already
but we did not include it at the time.
bertfrees added a commit to bertfrees/dotify.library that referenced this pull request May 6, 2021
Without this change it is possible that we end up in an endless
alternation between two volume renderings. (In other words,
brailleapps/dotify.formatter.impl#70 was not
completely fixed yet.)

I proposed this change in
brailleapps/dotify.formatter.impl#97 already
but we did not include it at the time.
bertfrees added a commit to bertfrees/dotify.library that referenced this pull request May 17, 2021
Without this change it is possible that we end up in an endless
alternation between two volume renderings. (In other words,
brailleapps/dotify.formatter.impl#70 was not
completely fixed yet.)

I proposed this change in
brailleapps/dotify.formatter.impl#97 already
but we did not include it at the time.
kalaspuffar pushed a commit to mtmse/dotify.library that referenced this pull request Jun 11, 2021
Without this change it is possible that we end up in an endless
alternation between two volume renderings. (In other words,
brailleapps/dotify.formatter.impl#70 was not
completely fixed yet.)

I proposed this change in
brailleapps/dotify.formatter.impl#97 already
but we did not include it at the time.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Development

Successfully merging this pull request may close these issues.

None yet

4 participants