review potential updates to Tmpbeis4 code #84

Closed
cseppan opened this issue Nov 9, 2023 · 7 comments

@cseppan
Member

cseppan commented Nov 9, 2023

Based on October 29, 2023 email from Carlie Coats

Draft code (has not been compiled or tested). Switch to using M3UTILIO module in Tmpbeis4. Change grid-and-species loop nests. Check for failure after environment variable calls (e.g. ENVINT).

"tmpbeis4.0.f" is the un-changed reference version
"tmpbeis4.1.f" is the minimal changes-for-M3UTILIO version
"tmpbeis4.2.f" with loop-nest orders changed for efficiency
"tmpbeis4.f" further revision sent

tmpbeis.zip

@cseppan
Member Author

cseppan commented Nov 9, 2023

Discussion of the I/O API M3UTILIO module and how to convert existing code
https://cmascenter.org/ioapi/documentation/all_versions/html/M3UTILIO.html

@hnqtran

hnqtran commented Nov 9, 2023

SMOKE v5.0 compiled successfully with Carlie's modified tmpbeis4.f.
Have not yet checked how fast SMOKE runs with the update.

@hnqtran

hnqtran commented Jan 10, 2024

Summary of Carlie's updates to tmpbeis4.f:

  1. Consolidate ALLOCATE statements. For example, lines 793 - 805 in the original tmpbeis4.f:
        IF( PX_VERSION ) THEN
            ALLOCATE( SOILM( NCOLS, NROWS ), STAT=IOS )
            CALL CHECKMEM( IOS, 'SOILM', PROGNAME )

            ALLOCATE( SOILT( NCOLS, NROWS ), STAT=IOS )
            CALL CHECKMEM( IOS, 'SOILT', PROGNAME )

            ALLOCATE( SOILT2( NCOLS, NROWS ), STAT=IOS )
            CALL CHECKMEM( IOS, 'SOILT2', PROGNAME )

            ALLOCATE( ISLTYP( NCOLS, NROWS ), STAT=IOS )
            CALL CHECKMEM( IOS, 'ISLTYP', PROGNAME )
        END IF

was modified to (lines 524 - 528 in updated tmpbeis4.f):

 IF (PX_VERSION) THEN ! line 480
....
            ALLOCATE( SOILM( NCOLS, NROWS ),
     &                SOILT( NCOLS, NROWS ),
     &               SOILT2( NCOLS, NROWS ),
     &               ISLTYP( NCOLS, NROWS ), STAT=IOS )
            CALL CHECKMEM( IOS, 'SOILM...ISLTYP', PROGNAME )
  2. Re-arrange the loop structure for better execution efficiency. For example, near line 1030 in the original tmpbeis4.f:
                    DO I = 1, NCOLS
                        DO J = 1, NROWS

C.............................  If switch equal to 0 use winter normalized emissions
                          IF( SWITCH_FILE ) THEN
                            IF( SWITCH( I,J ) == 0 ) THEN
                                SEMIS( I, J, 1:NSEF   ) =
     &                              AVGEMIS( I, J, 1:NSEF  , NWINTER )
                                     .........

was modified to (~ line 1048 in updated tmpbeis4.f). Note that the I and J loops were swapped: Fortran arrays are column-major, so the first index (I) should vary fastest, in the innermost loop:

                        DO J = 1, NROWS
                        DO I = 1, NCOLS

                            IF( SWITCH( I,J ) == 0 ) THEN
                                SEMIS( I, J, 1:NSEF   ) =
     &                              AVGEMIS( I, J, 1:NSEF  , NWINTER )
                                     .........
  3. Check for failure when reading environment variables (e.g., via ENVINT). For example, the following check was added after reading the environment variable 'OUTZONE' (line 247 in the original tmpbeis4.f):
        TZONE = ENVINT( 'OUTZONE', 'Output time zone', 0, IOS )
        IF ( IOS .GT. 0 ) THEN
            CALL M3EXIT( PROGNAME,0,0, 'Bad env vble "OUTZONE"', 2 )
        END IF
  4. Introduce a USE M3UTILIO statement in place of INCLUDE statements for the I/O API include files (e.g., PARMS3.EXT, FDESC3.EXT, IODECL3.EXT), which simplifies downstream variable declarations and cross-module dependencies.

  5. Carlie also added a code block for unit conversion from mole/hr to mole/s (~ lines 1001 - 1010 in updated tmpbeis4.f). This appears to be a mistake, since this unit conversion is already handled in a later section of tmpbeis4. Furthermore, it would be more efficient to write the whole-array assignment MLFAC = MLFAC * HR2SEC rather than scaling MLFAC inside a double loop.

C............  Convert to moles/second if necessary

        IF ( UNITTYPE .EQ. 2 ) THEN
            DO L = 1, MSPCS
            DO K = 1, NSEF
                MLFAC( L, K ) = HR2SEC * MLFAC( L, K )
            END DO
            END DO
        END IF
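
Three of the patterns summarized above (a status check after reading an environment variable, a single ALLOCATE with one STAT, and whole-array scaling) can be tried outside SMOKE with a small stand-alone program. This is only a sketch under stated assumptions, not code from tmpbeis4.f: the standard intrinsic GET_ENVIRONMENT_VARIABLE stands in for the I/O API ENVINT call, ERROR STOP stands in for CHECKMEM and M3EXIT, and the array sizes and the value of `hr2sec` are hypothetical.

```fortran
! Free-form Fortran sketch (e.g. patterns.f90); not taken from tmpbeis4.f.
program patterns
    implicit none

    ! Hypothetical sizes and conversion factor, for illustration only.
    integer, parameter :: ncols = 4, nrows = 3, mspcs = 2, nsef = 5
    real,    parameter :: hr2sec = 1.0 / 3600.0   ! assumed value

    real,    allocatable :: soilm(:,:), soilt(:,:), soilt2(:,:)
    integer, allocatable :: isltyp(:,:)
    real              :: mlfac(mspcs, nsef), looped(mspcs, nsef)
    integer           :: ios, tzone, l, k
    character(len=32) :: buf

    ! Item 3 pattern: read an integer setting, stop on a malformed value.
    ! GET_ENVIRONMENT_VARIABLE is a stand-in for ENVINT (assumption).
    tzone = 0                                  ! default when unset or empty
    call get_environment_variable('OUTZONE', buf, status=ios)
    if (ios == 0 .and. len_trim(buf) > 0) then
        read (buf, *, iostat=ios) tzone
        if (ios /= 0) error stop 'Bad env vble "OUTZONE"'
    end if

    ! Item 1 pattern: one ALLOCATE with a single STAT for several arrays.
    allocate (soilm(ncols, nrows), soilt(ncols, nrows),    &
              soilt2(ncols, nrows), isltyp(ncols, nrows), stat=ios)
    if (ios /= 0) error stop 'Allocation failed for SOILM...ISLTYP'

    ! Item 5 pattern: explicit double loop versus whole-array assignment.
    call random_number(mlfac)
    looped = mlfac
    do k = 1, nsef
        do l = 1, mspcs          ! first index innermost (column-major)
            looped(l, k) = hr2sec * looped(l, k)
        end do
    end do
    mlfac = mlfac * hr2sec

    ! The two forms must agree element by element.
    if (any(looped /= mlfac)) error stop 'loop and array forms differ'

    print *, 'OUTZONE =', tzone, ' allocated:', allocated(soilm)
end program patterns
```

One trade-off of the consolidated ALLOCATE is that a single STAT value cannot identify which array failed, which is why the error message names the whole range of arrays.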

@hnqtran

hnqtran commented Jan 11, 2024

Testing tmpbeis4 with and without the update surprisingly did not show any improvement in execution time. Note that the test used the SMOKE training package over the LISTOS domain (25 rows x 25 columns). A measurable improvement in execution time may be expected for a larger domain.

Using the m3diff tool to compare the emis_mole* output files initially showed significantly lower emissions in the output from the updated tmpbeis4. This was later found to be caused by the double unit conversion in the updated tmpbeis4 (item 5 in the comment above). After the double unit conversion was removed, differences between the outputs are < 0.1%, which is within the acceptable range.

@eyth

eyth commented Jan 11, 2024 via email

@hnqtran

hnqtran commented Jan 11, 2024

> Huy, can you consider running this on the full 12US2 or 12US1 domain instead of the 25x25?

I'm working on setting up a test case based on emission platform 2020ha2 for the 12US1 domain. Currently there is an issue with the variable SOILT2 missing from the input met file METCRO2D.

@hnqtran
Copy link

hnqtran commented Jan 18, 2024

Performance Test with 2020ha2_cb6_20k emission model platform

  • tmpbeis4 was run for July 2020 over the 12US1 domain
  • Three scenarios:
    • FULL: incorporates all of Carlie's updates to tmpbeis4.f (minus the unit conversion, which is an error)
    • SIMP: incorporates only the loop re-arrangement, to follow the column-major storage order of Fortran
    • ORIG: original tmpbeis4.f

Results:
| Scenario | Try | Total run time (min:s) | Individual day run time |
|----------|-----|------------------------|-------------------------|
| FULL | 1st | 6:18.69 | Jul-01: 7 s; Jul-15: 5 s; Jul-31: 5 s |
| FULL | 2nd | 6:13.15 | Jul-01: 5 s; Jul-15: 5 s; Jul-31: 5 s |
| SIMP | 1st | 6:15.54 | Jul-01: 5 s; Jul-15: 6 s; Jul-31: 5 s |
| SIMP | 2nd | 6:18.04 | Jul-01: 5 s; Jul-15: 6 s; Jul-31: 5 s |
| ORIG | 1st | 8:01.58 | Jul-01: 10 s; Jul-15: 9 s; Jul-31: 7 s |
| ORIG | 2nd | 7:30.14 | Jul-01: 7 s; Jul-15: 8 s; Jul-31: 8 s |

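There are no significant differences in run time between FULL and SIMP, meaning that essentially all of the run-time benefit comes from the loop re-arrangement. Relative to ORIG, the loop re-arrangement shortens the sampled individual-day run times by about 35% (averaging about 5.3 s versus 8.2 s) and the total run time by about 19% (about 6.3 min versus 7.8 min).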
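
The column-major effect behind the loop re-arrangement can be reproduced with a small stand-alone program. The following is a minimal sketch, not code from tmpbeis4.f: the grid size and the array names `a`, `b`, and `c` are hypothetical, and the timings it prints depend on the compiler, optimization flags, and hardware.

```fortran
! Free-form Fortran sketch (e.g. loop_order.f90); not taken from tmpbeis4.f.
program loop_order
    use, intrinsic :: iso_fortran_env, only: int64
    implicit none

    ! Hypothetical grid size, picked only so the timing is measurable.
    integer, parameter :: ncols = 2000, nrows = 2000
    real, allocatable  :: a(:,:), b(:,:), c(:,:)
    integer            :: i, j, ios
    integer(int64)     :: t0, t1, t2, rate

    allocate (a(ncols, nrows), b(ncols, nrows), c(ncols, nrows), stat=ios)
    if (ios /= 0) error stop 'allocation failed'
    call random_number(a)

    ! ORIG-style nest: the first index I is in the outer loop, so the inner
    ! loop jumps NCOLS elements through memory on every iteration.
    call system_clock(t0, rate)
    do i = 1, ncols
        do j = 1, nrows
            b(i, j) = 2.0 * a(i, j)
        end do
    end do
    call system_clock(t1)

    ! Re-arranged nest: I is innermost, so memory is walked contiguously.
    do j = 1, nrows
        do i = 1, ncols
            c(i, j) = 2.0 * a(i, j)
        end do
    end do
    call system_clock(t2)

    ! Both nests must produce identical results; only the speed may differ.
    if (any(b /= c)) error stop 'loop orders gave different results'

    print '(a, f8.4, a)', 'I outer, J inner: ', real(t1 - t0) / real(rate), ' s'
    print '(a, f8.4, a)', 'J outer, I inner: ', real(t2 - t1) / real(rate), ' s'
end program loop_order
```

At higher optimization levels a compiler may interchange such simple loops on its own, so the measured gap can shrink or disappear; loop bodies with conditionals and array sections, as in tmpbeis4.f, are less likely to be transformed automatically.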

Additional information: modern compilers can transform code for more efficient memory access when an optimization flag such as -O3 is enabled; -O3 was enabled for the SMOKE compilation.

hnqtran pushed a commit that referenced this issue Jan 23, 2024
@hnqtran hnqtran closed this as completed Jan 23, 2024