Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support for multiple model occurrence in perf main #1649

Merged
merged 5 commits into from Oct 19, 2020

Conversation

bettina-gier
Copy link
Contributor

@bettina-gier bettina-gier commented May 6, 2020

Before you start, read CONTRIBUTING.md and the guide for diagnostic developers.

Please discuss your idea with the development team before getting started, to avoid disappointment later. The way to do this is to open a new issue on GitHub. If you are planning to modify an existing functionality, please discuss it with the original author(s) by tagging them in the issue.


Tasks

  • Create an issue to discuss what you are going to do, if you haven't done so already (and add the link at the bottom)
  • Give this pull request a descriptive title that can be used as a one line summary in a changelog
  • Make sure your code is composed of functions of no more than 50 lines and uses meaningful names for variables
  • Circle/CI tests pass. Status can be seen below your pull request. If the tests are failing, click the link to find out why.
  • Preferably Codacy code quality checks pass, however a few remaining hard to solve Codacy issues are still acceptable. Status can be seen below your pull request. If there is an error, click the link to find out why. If you suspect Codacy may be wrong, please ask by commenting.

Modified recipe/diagnostic

  • Assign the author(s) of the affected recipe(s) as reviewer(s)

If you need help with any of the tasks above, please do not hesitate to ask by commenting in the issue or pull request.


Closes #1605

Fixing Klaus' issue with SMPI (and also perfmetrics) not supporting the use of a single model with different ensemble/experiment. Will also change dataset name in the relevant plots for differentiation.

@mattiarighi
Copy link
Contributor

Is this ready for testing?
If so, please mark it as ready.

@bettina-gier
Copy link
Contributor Author

Was wanting to wait for @zklaus feedback to see if it fixes his problem before marking ready. Ready from my side though.

@mattiarighi mattiarighi added this to In progress in High priority issues May 22, 2020
@mattiarighi mattiarighi moved this from In progress to Review in High priority issues May 22, 2020
@mattiarighi mattiarighi removed this from Review in High priority issues May 22, 2020
@bettina-gier bettina-gier marked this pull request as ready for review June 11, 2020 12:04
@bettina-gier
Copy link
Contributor Author

Just gonna mark this as ready since there's been no word from @zklaus and I guess other people can test it just as well knowing what the issue was.

@bouweandela bouweandela added this to the v2.0.0 milestone Jul 8, 2020
@bouweandela
Copy link
Member

@mattiarighi or @zklaus It would be great if you could have a go at reviewing/testing this before the ESMValTool v2.0.0 feature freeze next week?

@mattiarighi
Copy link
Contributor

Will test once @zklaus has approved.

Copy link

@YanchunHe YanchunHe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is small bug in this snippet, which will repeatedly include the experiment/ensemble to the datasetnames.

I propose some changes to immediately exit from the loop once the experiment/ensemble names are appended.

  ; Extend model name by ensemble/experiment on multiple occurence
   if (.not. (dimsizes(datasetnames) .eq. \
       count_unique_values(datasetnames))) then
     new_datasetnames = datasetnames
     experiments = metadata_att_as_array(info_items, "exp")
     ensembles = metadata_att_as_array(info_items, "ensemble")

     do imod = 0, dimsizes(datasetnames) - 1
       do jmod = 0, dimsizes(datasetnames) - 1
         if imod.eq.jmod then
           continue
         else
           if (datasetnames(imod) .eq. datasetnames(jmod)) then
             if (experiments(imod) .ne. experiments(jmod)) then
               new_datasetnames(imod) = new_datasetnames(imod) + " " + \
                                         experiments(imod)
               break
             end if
           end if
         end if
       end do
     end do
     do imod = 0, dimsizes(datasetnames) - 1
       do jmod = 0, dimsizes(datasetnames) - 1
         if imod.eq.jmod then
           continue
         else
           if (datasetnames(imod) .eq. datasetnames(jmod)) then
             if (ensembles(imod) .ne. ensembles(jmod)) then
               new_datasetnames(imod) = new_datasetnames(imod) + " " + \
                                         ensembles(imod)
               break
             end if
           end if
         end if
       end do
     end do
     datasetnames = new_datasetnames
     delete(new_datasetnames)
   end if

Before:
pr-global_to_clt-global_BIAS_1

After:
pr-global_to_clt-global_BIAS_1

@mattiarighi
Copy link
Contributor

@bettina-gier do you want to try the suggested change?

@bettina-gier
Copy link
Contributor Author

Testing now but just looking at the code I think the change is correct, my previous test was only with 2 ensemble members so I didn't have that issue. Also thinking if there's a better way to do it than splitting it in 2 loops.

@YanchunHe
Copy link

YanchunHe commented Aug 3, 2020

Testing now but just looking at the code I think the change is correct, my previous test was only with 2 ensemble members so I didn't have that issue. Also thinking if there's a better way to do it than splitting it in 2 loops.

Sounds great, Bettina!

I don't see a straightforward way to put them into the same loop(s) myself.

But indeed one can put these two statements outside the loop (I updated above):

             experiments = metadata_att_as_array(info_items, "exp")
             ensembles = metadata_att_as_array(info_items, "ensemble")

@bettina-gier
Copy link
Contributor Author

Updated with your fix, works fine for me. Thanks for testing and finding the bug and fix!

I found a way to leave it in the single loop to check if the name already has the experiment appended before appending again, I'll attach it if you want to see. However, your version results in more consistent naming as it will always have dataset - experiment - ensemble, whereas the 1-loop version will append first whatever differs first.

  ; Extend model name by ensemble/experiment on multiple occurence
  if (.not. (dimsizes(datasetnames) .eq. \
      count_unique_values(datasetnames))) then
    new_datasetnames = datasetnames
    experiments = metadata_att_as_array(info_items, "exp")
    ensembles = metadata_att_as_array(info_items, "ensemble")
    do imod = 0, dimsizes(datasetnames) - 1
      do jmod = 0, dimsizes(datasetnames) - 1
        if imod.eq.jmod then
          continue
        else
          if (datasetnames(imod) .eq. datasetnames(jmod)) then
              if ((experiments(imod) .ne. experiments(jmod)) .and. \
                  (all(str_split(datasetnames(imod), " ") .ne. \
                   experiments(jmod)))) then
                new_datasetnames(imod) = new_datasetnames(imod) + " " + \
                                          experiments(imod)
              end if
              if ((ensembles(imod) .ne. ensembles(jmod)) .and. \
                  (all(str_split(datasetnames(imod), " ") .ne. \
                   ensembles(jmod)))) then
                new_datasetnames(imod) = new_datasetnames(imod) + " " + \
                                          ensembles(imod)
              end if
          end if
        end if
      end do
    end do
    datasetnames = new_datasetnames
    delete(new_datasetnames)
  end if   

@YanchunHe
Copy link

Updated with your fix, works fine for me. Thanks for testing and finding the bug and fix!

I found a way to leave it in the single loop to check if the name already has the experiment appended before appending again, I'll attach it if you want to see. However, your version results in more consistent naming as it will always have dataset - experiment - ensemble, whereas the 1-loop version will append first whatever differs first.

Great, thanks! Hope this will be fixed in the next release.

@bouweandela
Copy link
Member

@YanchunHe or @axel-lauer Do you think you could find someone to try this out and approve the pull request? If it gets approved and merged before Monday morning, it can still be included in v2.1 of the tool.

@axel-lauer
Copy link
Contributor

I am testing this right now...

@bouweandela
Copy link
Member

We'll do the feature freeze around 3:30 PM CEST today.

@axel-lauer axel-lauer self-requested a review October 19, 2020 11:51
Copy link
Contributor

@axel-lauer axel-lauer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have tested these changes with just one and with multiple ensemble members of individual models. Works great! Thumbs up!

@bouweandela bouweandela merged commit 48c339d into master Oct 19, 2020
1 check passed
@bouweandela bouweandela deleted the fix_perf_multi_models branch October 19, 2020 11:59
@bouweandela bouweandela changed the title support for multiple model occurence in perf main support for multiple model occurrence in perf main Oct 19, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

SMPI not working on CMIP6 data
5 participants