Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix M100 Free Memory Checker #6302

Merged
merged 1 commit into from
Apr 10, 2017
Merged

Fix M100 Free Memory Checker #6302

merged 1 commit into from
Apr 10, 2017

Conversation

Roxy-3D
Copy link
Member

@Roxy-3D Roxy-3D commented Apr 10, 2017

M100 had numerious changes and quit working. Part of the problem is
the overloading of the SERIAL_PROTOCOL functions. Also, some of the
address arithmatic was changed to use char *ptr and passing ptr into the
SERIAL_PROTOCOL functions caused them to try to print a string instead
of a number. M100 is working again. Let's keep it that way!

M100 has been expanded to now have a function int
free_memory_is_corrupted() that can be called from other code to see if
the free space is still contiguous. It may make sense to add a flag to
control its verbose nature but right now, the extra chit chat is very
helpful to know int free_memory_is_corrupted() is doing the right thing
and what it found at various points when it was called. A 'Show &
Tell' is coming up with int free_memory_is_corrupted().

M100 had numerious changes and quit working.   Part of the problem is
the overloading of the SERIAL_PROTOCOL functions.   Also, some of the
address arithmatic was changed to use char *ptr and passing ptr into the
SERIAL_PROTOCOL functions caused them to try to print a string instead
of a number.     M100 is working again.   Let's keep it that way!

M100 has been expanded to now have a function  int
free_memory_is_corrupted()  that can be called from other code to see if
the free space is still contiguous.  It may make sense to add a flag to
control its verbose nature but right now, the extra chit chat is very
helpful to know int free_memory_is_corrupted()  is doing the right thing
and what it found at various points when it was called.     A 'Show &
Tell' is coming up with int free_memory_is_corrupted().
@Roxy-3D Roxy-3D merged commit ba85faa into MarlinFirmware:RCBugFix Apr 10, 2017
@thinkyhead
Copy link
Member

In consideration of the 32-bit Marlin, going forward we'll need to use void * and add a print_hex_address function instead of using uint16_t and print_hex_word.

@Roxy-3D
Copy link
Member Author

Roxy-3D commented Apr 10, 2017

Yeah... But let's leave this alone for the next couple of weeks... I need this stable and working.

I'll have something very interesting to show you later today. The constexpr float operator very well may have a bug in it. I'm not sure exactly. But I'm trying to get the code in a state where you can just turn one way on or do it the other way. With the constexpr float stuff being compiled into the code, the free memory area is getting stepped on.

@thinkyhead
Copy link
Member

overloading of the SERIAL_PROTOCOL functions

  • macros

It's an issue of C++ overloading generally, and something we need to consider in a several places. For example, when we want to print an 8-bit value as an integer we have to cast it with (int) in these functions, or the single-char version is applied. As mentioned above, we'll have to add a proper address-to-integer typedef so we can print 32-bit memory addresses in future.

@thinkyhead
Copy link
Member

I need this stable and working.

Since we don't use any dynamic allocation —only globals that never move and stack-based variables that should remain relatively small— what kinds of issues are you catching with M100?

@thinkyhead
Copy link
Member

thinkyhead commented Apr 10, 2017

The constexpr float operator very well may have a bug in it.

Interesting. All it does is define a float that the pre-compiler can also evaluate. In practice, it ends up being not too different from a #define, except it carries a type.

@Roxy-3D
Copy link
Member Author

Roxy-3D commented Apr 10, 2017

Since we don't use any dynamic allocation —only globals that never move and stack-based variables that should remain relatively small— what kinds of issues are you catching with M100?

Its the constexpr float stuff in ubl_G29.cpp.... the stack frame is getting corrupted. I know this sounds like a 1st year computer science student claiming "Compiler Bug!!!!" But.... I think I have the evidence. I'm trying to get things packaged up so you can duplicate what I'm seeing....

BUT VERY LITERALLY.... I'm seeing a big chunk of the stack frame get corrupted.

@thinkyhead
Copy link
Member

In what instance is the stack getting corrupted?

@Roxy-3D
Copy link
Member Author

Roxy-3D commented Apr 10, 2017

Patience!!!! I'll write it up and show you dumps of the stack after each command. But I'm trying to get things simplified way down so there is no argument about what is happening... But up at the top of ubl_G29.cpp there was this code:

  constexpr float ubl_3_point_1_X = UBL_PROBE_PT_1_X,
                  ubl_3_point_1_Y = UBL_PROBE_PT_1_Y,
                  ubl_3_point_2_X = UBL_PROBE_PT_2_X,
                  ubl_3_point_2_Y = UBL_PROBE_PT_2_Y,
                  ubl_3_point_3_X = UBL_PROBE_PT_3_X,
                  ubl_3_point_3_Y = UBL_PROBE_PT_3_Y;

and with this commented out code 'active' (even if not invoked) the stack gets hammered

SERIAL_PROTOCOLLNPAIR("Checkpoint 2:", free_memory_is_corrupted() );
    if (code_seen('T')) {
 /*
      const float lx1 = LOGICAL_X_POSITION(ubl_3_point_1_X),
                  lx2 = LOGICAL_X_POSITION(ubl_3_point_2_X),
                  lx3 = LOGICAL_X_POSITION(ubl_3_point_3_X),
                  ly1 = LOGICAL_Y_POSITION(ubl_3_point_1_Y),
                  ly2 = LOGICAL_Y_POSITION(ubl_3_point_2_Y),
                  ly3 = LOGICAL_Y_POSITION(ubl_3_point_3_Y);

      float z1 = probe_pt(lx1, ly1, false, g29_verbose_level),
            z2 = probe_pt(lx2, ly2, false, g29_verbose_level),
            z3 = probe_pt(lx3, ly3, true , g29_verbose_level);

      //  We need to adjust z1, z2, z3 by the Mesh Height at these points. Just because they are non-zero doesn't mean
      //  the Mesh is tilted!  (We need to compensate each probe point by what the Mesh says that location's height is)

      z1 -= ubl.get_z_correction(lx1, ly1);
      z2 -= ubl.get_z_correction(lx2, ly2);
      z3 -= ubl.get_z_correction(lx3, ly3);
*/

      float z1 = probe_pt( LOGICAL_X_POSITION(ubl_3_point_1_X), LOGICAL_Y_POSITION(ubl_3_point_1_Y), false, g29_verbose_level),
            z2 = probe_pt( LOGICAL_X_POSITION(ubl_3_point_2_X), LOGICAL_Y_POSITION(ubl_3_point_2_Y), false, g29_verbose_level),
            z3 = probe_pt( LOGICAL_X_POSITION(ubl_3_point_3_X), LOGICAL_Y_POSITION(ubl_3_point_3_Y), true , g29_verbose_level);

      //  We need to adjust z1, z2, z3 by the Mesh Height at these points. Just because they are non-zero doesn't mean
      //  the Mesh is tilted!  (We need to compensate each probe point by what the Mesh says that location's height is)

      z1 -= ubl.get_z_correction(LOGICAL_X_POSITION(ubl_3_point_1_X), LOGICAL_Y_POSITION(ubl_3_point_1_Y));
      z2 -= ubl.get_z_correction(LOGICAL_X_POSITION(ubl_3_point_2_X), LOGICAL_Y_POSITION(ubl_3_point_2_Y));
      z3 -= ubl.get_z_correction(LOGICAL_X_POSITION(ubl_3_point_3_X), LOGICAL_Y_POSITION(ubl_3_point_3_Y));

      do_blocking_move_to_xy((X_MAX_POS - (X_MIN_POS)) / 2.0, (Y_MAX_POS - (Y_MIN_POS)) / 2.0);
      tilt_mesh_based_on_3pts(z1, z2, z3);
    }
SERIAL_PROTOCOLLNPAIR("Checkpoint 3:", free_memory_is_corrupted() );

@thinkyhead
Copy link
Member

thinkyhead commented Apr 10, 2017

So it was crashing? If so, that does tend to look like a compiler problem.

@Roxy-3D
Copy link
Member Author

Roxy-3D commented Apr 10, 2017

So it was crashing? If so, that does tend to look like a compiler problem.

Some times it crashes... Some times things are just fine. It just depends on where the block of corruption happens. I can do things to move the corruption around. And it sure looks and feels like a compiler bug. But I'm very hesitant to say it is just yet. But, declaring things constexpr should not have any impact on things in the middle of the stack!!!!! Like I say.... I'll get it simplified way down, and then let you play with it. But M100 is critical to being able to see what is happening... So.... no changes to M100 until we understand this other problem!!!

@thinkyhead
Copy link
Member

Is M100 I (or M100 by itself) required before you can use M100 C, M100 D, and M100 F?

@Roxy-3D
Copy link
Member Author

Roxy-3D commented Apr 10, 2017

Well... Technically no. But practically speaking... Yes. If you want to see the stack frame from the previous run, you would not want to do a M100 I. The RAM memory inside the AVR does not get reset or cleared from the previous run. So... It is possible to do an 'autopsy' and look at previous stack information if you want.

But practically speaking... you want to do a M100 I first thing.

@thinkyhead
Copy link
Member

Perhaps the handling of I in gcode_M100 should come first, above the others, since the C, F, and D commands cause memory initialization anyway, but only afterward, unseen.

This way you can do M100 I C, M100 I D, and M100 I F too.

@Roxy-3D
Copy link
Member Author

Roxy-3D commented Apr 10, 2017

It could... I just did it that way to make turning off features closer to the top of the file. It probably would be best to have 'I' at the top. BUT... Let's wait a few days to do that. I'm might have a simplified case ready to check in for you to look at soon.

@thinkyhead
Copy link
Member

thinkyhead commented Apr 10, 2017

Ok. I will hold off. I have an update of M100 for you pending in one of my branches. For now I will just submit the commits that remove trailing spaces, fix indentation, and stuff like that.

@Roxy-3D
Copy link
Member Author

Roxy-3D commented Apr 10, 2017

Unfortunately... My simplification isn't causing the problem, and while I'm getting corruption, it is not in the middle of the stack frame. It is much more close to the 'frame pointer' now. But still... You will be able to see the problem.

Do you have a graphical LCD Panel? If so... That would be best because that squeezes the stack space the worst. But here is what you need to duplicate the problem. (I'm waiting until Travis says its OK to merge)

  • Turn on UBL (I'm using a 15x15 grid. It maybe with a 4x4 it is easier to see the middle of the stack frame get zapped. I don't know about that variable yet.)
  • (That implies you have an LCD Panel and a Z-Probe configured and EEPROM turned on)
  • Turn on M100
  • I also have DOUBLE_CLICK_Z_BABYSTEPPING and FILAMENT_CHANGE enabled. You may as well get as close to me as possible.
  • Reset printer
  • M100 I
  • G28
  • M100 F
  • M100 D (Just so you can see the stack frame is intact up until this point)
  • G29 (No options needed...)
  • M100 F (It will detect corruption if it sees the middle of the stack frame altered)
  • M100 D (to dump the stack frame)

I added a G299 command to simplify things. I was hoping that would cause the problem because that would be just a tiny bit of code causing the issue and easier to examine. But that failed. Removing (or shrinking) the G299 function very possibly will move the corruption to the middle of the stack frame for you.

But if that fails.... I can do cut & paste dumps of what I see at each step of the process.

@thinkyhead
Copy link
Member

Sounds mysterious. I'm champing at the bit to solve this one.

@Roxy-3D
Copy link
Member Author

Roxy-3D commented Apr 10, 2017

I'm wondering... I wonder if this is related to the LCD Panel and its indirect function calls ? If you want, I can merge so you can take a look. But I don't want to scream "Compiler Bug!" without knowing for a fact that is what is doing this. I'm definitely seeing corruption... And I can cut and paste the log of everything for you to look at. But I don't have rock solid proof yet.

Arrrrgh! Travis doesn't like something. It will be tomorrow before I can make progress on this. But I'll try to simplify things and get it to a very trivial example that causes the problem to show up.

@thinkyhead
Copy link
Member

thinkyhead commented Apr 11, 2017

See if you can isolate the exact line where the problem manifests and add logging.

One way the stack can blow up if you call idle() (or lcd_synchronize) from within any LCD functions without adding a barrier to prevent recursion. The no_reentrance flag exists to prevent that from happening.

@thinkyhead
Copy link
Member

thinkyhead commented Apr 11, 2017

Do you have a graphical LCD Panel?

Not at the moment. It's still in Oregon waiting to be shipped here.

that squeezes the stack space the worst

Due to U8Glib?

@thinkyhead
Copy link
Member

Arrrrgh! Travis doesn't like something.

Well, that PR needs a huge amount of cleanup, and should be squashed to a single commit before being merged anyway. If you need anything tested, just ping me and I will do so.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants