Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Mac memory error #115

Closed
ljlamarche opened this issue Mar 3, 2023 · 14 comments
Closed

BUG: Mac memory error #115

ljlamarche opened this issue Mar 3, 2023 · 14 comments
Labels

Comments

@ljlamarche
Copy link
Collaborator

Describe the bug

Upon initializing an Apex object, python crashes with a memory error (usually either bus error or a segmentation fault). This may be related to #84 , but it shows up in a slightly different way, and the installation flag that solved that issue doesn't seem to change this one.

To Reproduce

  1. Start python session
  2. Import apexpy
  3. Initialize Apex object
>>> import apexpy
>>> A = apexpy.Apex(2022)
zsh: bus error  python

Expected behavior

Successful creation of Apex object.

Computer

Please provide the following information:

  • OS: MacOS 12.6.1 M1
  • Python version: 3.10.9
  • Compiler with version: gfortran 12.2.0
  • Apexpy version/branch: 2.0.0/develop
@ljlamarche ljlamarche added the bug label Mar 3, 2023
@ljlamarche
Copy link
Collaborator Author

This issue seems to alternate at random between a bus error, segmentation fault, and producing nan output. The following is output from repeatedly running a short test program without any changes.

Test program:

import apexpy

A = apexpy.Apex(2022)
alat, alon = A.geo2apex(60, 15, height=300)
print(alat, alon)

Output:

(apexpy_debug) e30737@Lamarchel-mbp16 Desktop % python temp_apexpy_test.py
LINE 125
LINE 1224 2022.0 /Users/e30737/miniconda3/envs/apexpy_debug/lib/python3.10/site-packages/apexpy/igrf13coeffs.txt
LINE 1227
LINE 130
/Users/e30737/miniconda3/envs/apexpy_debug/lib/python3.10/site-packages/apexpy/apex.py:556: RuntimeWarning: invalid value encountered in <lambda> (vectorized)
  alat, alon = self._geo2apex(glat, glon, height)
/Users/e30737/miniconda3/envs/apexpy_debug/lib/python3.10/site-packages/apexpy/apex.py:559: UserWarning: Apex latitude set to NaN where undefined (apex height may be < reference height)
  warnings.warn(''.join(['Apex latitude set to NaN where undefined ',
nan nan
(apexpy_debug) e30737@Lamarchel-mbp16 Desktop % python temp_apexpy_test.py
LINE 125
LINE 1224 2022.0 /Users/e30737/miniconda3/envs/apexpy_debug/lib/python3.10/site-packages/apexpy/igrf13coeffs.txt
zsh: segmentation fault  python temp_apexpy_test.py
(apexpy_debug) e30737@Lamarchel-mbp16 Desktop % python temp_apexpy_test.py
LINE 125
LINE 1224 2022.0 /Users/e30737/miniconda3/envs/apexpy_debug/lib/python3.10/site-packages/apexpy/igrf13coeffs.txt
zsh: bus error  python temp_apexpy_test.py
(apexpy_debug) e30737@Lamarchel-mbp16 Desktop % python temp_apexpy_test.py
LINE 125
LINE 1224 2022.0 /Users/e30737/miniconda3/envs/apexpy_debug/lib/python3.10/site-packages/apexpy/igrf13coeffs.txt
zsh: segmentation fault  python temp_apexpy_test.py

@ljlamarche
Copy link
Collaborator Author

To debug, I added the -fcheck=all flag to the make file to compile and run the fortran test program. Compiling generates the following warning:

igrf.f90:51:67:

   51 |     num_epochs = count([(s(i:i + 3), i = 1, len_trim(s))]== 'IGRF')
      |                                                                   ^
Warning: 'i' may be used uninitialized [-Wmaybe-uninitialized]
igrf.f90:31:44:

   31 |     integer(4)                   :: state, i, pos
      |                                            ^
note: 'i' was declared here

Running the resulting program generates the following runtime error:

At line 51 of file igrf.f90
Fortran runtime error: Substring out of bounds: lower bound (0) of 's' is less than one

Error termination. Backtrace:
#0  0x1026e3187
#1  0x1026e3d37
#2  0x1026e40f3
#3  0x102247a67
#4  0x1022487c7
#5  0x102252aeb
#6  0x102249aef
#7  0x10225464f

This refers to

num_epochs = count([(s(i:i + 3), i = 1, len_trim(s))]== 'IGRF')

@aburrell
Copy link
Owner

aburrell commented Mar 9, 2023

Perhaps if we address all the warnings upon compilation that will reduce potential memory leaks.

@ljlamarche ljlamarche mentioned this issue Mar 14, 2023
11 tasks
@ljlamarche
Copy link
Collaborator Author

One problem was an indexing error related extrapolating IGRF 5 years into the future. Some arrays were not being allocated large enough to accommodate this extra epoch but then were be indexed for it. This has been addressed by expanding the epoch array.

https://github.com/ljlamarche/apexpy/blob/be70342b66da33e3026c3baf44fdafdc22534d90/fortranapex/igrf.f90#L58

https://github.com/ljlamarche/apexpy/blob/be70342b66da33e3026c3baf44fdafdc22534d90/fortranapex/igrf.f90#L69

@ljlamarche
Copy link
Collaborator Author

Perhaps if we address all the warnings upon compilation that will reduce potential memory leaks.

I agree. I'm also trying to implement some stricter compiler flags in the Makefile at least so it does a better job warning us of anything sloppy that may cause issues down the road.

@ljlamarche
Copy link
Collaborator Author

There is potentially an issue with using apexpy for the future extrapolated epochs that are not part of the normal IGRF set of coefficients (currently anything between 2020-2025). These use fewer basis (8 vs 13), which may cause precision errors that are not handled well. Noteably, running the fortran text program will compute the coefficients for all epochs up to 2020 without issue, but produces a series of warnings about an imprecise fit for 2025.

APEX: Imprecise fit of apex: |Bdown/B| 7.0E-06

If you increase the number of iterations in apex.f90 L577, there are fewer errors but they don't go away completely.

When running the test script that generates NaN output, apexpy throws the warning

UserWarning: Apex latitude set to NaN where undefined (apex height may be < reference height)

only for years in the most recent epoch window. It is possible these two are related and the output NaNs are comming from an imprecise fit such that apexpy thinks the coordinate system is undefined.

@ljlamarche
Copy link
Collaborator Author

I think the original indexing problem for extrapolated years was the main issue. The NaN output went away after I updated apexsh.dat. @aburrell , do you know if there are any unit tests that check that time frame specifically? It looked to me like most of them used an epoch of 2000. If not, I'll add one as part of #116 .

@aburrell
Copy link
Owner

I recently added a new test for "today", but it'd be a good idea to add a test that uses the end of the current extrapolation era.

@ljlamarche
Copy link
Collaborator Author

Some of my initial speculation about this problem was incorrect. It seems to have boiled down to three main things:

  1. The syntax used in igrf.f90 L51-52 may not be valid for all compilers.

    num_epochs = count([(s(i:i + 3), i = 1, len_trim(s))]== 'IGRF')
    num_epochs = count([(s(i:i + 3), i = 1, len_trim(s))]== 'DGRF') + num_epochs

    This is addressed by counting the number of epochs more explicitly in a loop.

  2. In igrf.f90 L92, the array g must be allocated to have the number of epochs plus 1 to account for the secular variation (SV) parameters.

    allocate(g(1:num_sh, 1:num_epochs))

  3. In magfld.f90, the method by with the epoch index was found was causing an indexing overflow error beyond the last date.

    ! Set the date and time
    iy = 1
    do while (date > epoch(iy))
    iy = iy + 1
    end do

All these points should be addressed in #117.

@ljlamarche
Copy link
Collaborator Author

I recently added a new test for "today", but it'd be a good idea to add a test that uses the end of the current extrapolation era.

Is this what you're referring to?

def test_init_today(self):
"""Test Apex class initialization with today's date."""
self.apex_out = apexpy.Apex(date=self.test_date)
self.eval_date()
self.eval_refh()
return

If so, I believe initialization always worked fine for me, I only ran into issue when you actually try to use the object for conversions, so I'm not surprised that wasn't triggered.

@aburrell
Copy link
Owner

That's it. Good point about the limits of the test, something to change now.

@aburrell
Copy link
Owner

@ljlamarche is this good to close now?

@aburrell aburrell mentioned this issue Mar 28, 2023
19 tasks
@ljlamarche
Copy link
Collaborator Author

Yes, this is fixed with #117.

@ljlamarche
Copy link
Collaborator Author

One further note: This article was helpful for understanding what exactly the secular variation code was trying to do.

https://earth-planets-space.springeropen.com/articles/10.1186/s40623-021-01507-z

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants