WR port of node remains down after power cycle of node AND WR switch #51

Closed
dietrichb opened this issue Oct 23, 2017 · 6 comments

Comments

@dietrichb
Contributor

There seems to be an annoying bug that occurs when a node (SCU) and a WRS are switched on simultaneously after a power cut.

The symptoms are the following:

  • PPS LED not blinking, activity LED not blinking, link LED off
  • eb-mon -v dev/wbm0 shows "LINK_DOWN" and "NO_SYNC"
  • eb-console dev/wbm0 freezes the ssh shell
  • node fails to get an IP via BOOTP
  • (but the WRS shows both "link up" and "activity" LEDs)
  • node is not accessible via the WR network
  • resetting the FPGA of the node via its Reset controller is possible and cures the symptom.
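
A minimal sketch of how the eb-mon symptom above could be checked from a script when many nodes need to be recovered. It assumes only that the strings "LINK_DOWN" and "NO_SYNC" appear somewhere in the output of `eb-mon -v dev/wbm0`, as quoted above; the rest of the output format is not shown in this report. The function names and the timeout value are hypothetical and chosen for illustration.

```python
import subprocess

# Status strings quoted in this report. The rest of the eb-mon output
# format is not known here, so only substring matching is used.
FAULT_MARKERS = ("LINK_DOWN", "NO_SYNC")


def find_fault_markers(eb_mon_output: str) -> list[str]:
    """Return the fault markers from this report that appear in the text."""
    return [marker for marker in FAULT_MARKERS if marker in eb_mon_output]


def looks_stuck(eb_mon_output: str) -> bool:
    """True only if both markers are present, as in the symptom list."""
    return len(find_fault_markers(eb_mon_output)) == len(FAULT_MARKERS)


def read_eb_mon(device: str = "dev/wbm0", timeout_s: float = 10.0) -> str:
    """Run `eb-mon -v <device>` and return its standard output.

    The timeout is a precaution picked for this sketch (assumption),
    so that a hanging tool cannot block the calling script forever.
    """
    result = subprocess.run(
        ["eb-mon", "-v", device],
        capture_output=True,
        text=True,
        timeout=timeout_s,
        check=False,
    )
    return result.stdout
```

On a node, `looks_stuck(read_eb_mon())` would then flag the state described here. Whether a flagged node should be reset automatically through its reset controller is left open in this sketch.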

Suspicion: The FPGA of the node "boots" much faster than the WRS. It somehow fails to detect "link up" after the WRS starts and remains trapped in the "link down" state.

This issue causes real annoyance in cases where large parts of the facility need to be recovered after a major power cut.

Maybe this is linked to another issue:
#50

@bradomyn
Contributor

I have updated the report for issue #50.
Issue #50 happens only on Arria V platforms.

@bradomyn
Contributor

The suspicion:
Suspicion: The FPGA of the node is much faster with "booting" compared to the WRS. It somehow misses to detect "link up" after WRS starts and remains trapped in "link down" state.

Link detection works regardless of which side booted first. If the suspicion were correct, we should have seen this issue in the lab or testing facility as well. This effect hasn't been seen in the lab so far, and the GTX of the Arria II platform hasn't changed in years... I would check the reset circuit of the GTX in the SCU, a lot of things have been changed around...

@dietrichb
Contributor Author

The same problem occurs if the optical fibre between switch and node is disconnected or reconnected.

Note:

  • tried with doomsday on pexaria -> issue
  • tried with doomsday on exploder -> it works!

(both using gateware from same nightly build)

@dietrichb
Contributor Author

solved for Arria V
not solved for Arria II (SCU and Vetar)

a fix for Arria would require a major effort

@dietrichb
Contributor Author

update (January 2021): in rare cases this is also observed with fallout gateware

@alyxazon
Collaborator

See #309
