Unreachable chips #81

Merged
merged 3 commits into from
Aug 9, 2018

Conversation

rowleya
Member

@rowleya rowleya commented Jul 17, 2018

Makes it possible to detect and map out chips that can't be used due to down links.
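As a rough illustration of what "unreachable due to down links" could mean, here is a minimal sketch. It is not the code in this pull request: the function name `find_unreachable_chips`, the `working_links` dictionary and the `boot_chip` parameter are all hypothetical, and it assumes a flat (non-wrapping) grid in which the boot chip is reached over Ethernet rather than through a link. Only the link numbering (0=E, 1=NE, 2=N, 3=W, 4=SW, 5=S) follows the usual SpiNNaker convention.

```python
# Offsets for the six chip-to-chip links, indexed by link id:
# 0=E, 1=NE, 2=N, 3=W, 4=SW, 5=S.
LINK_OFFSETS = [(1, 0), (1, 1), (0, 1), (-1, 0), (-1, -1), (0, -1)]


def find_unreachable_chips(working_links, boot_chip=(0, 0)):
    """Return the chips that no other chip has a working link into.

    working_links is a hypothetical structure: it maps each chip's
    (x, y) to the set of its outgoing link ids that are considered up.
    The boot chip is assumed to be reachable without any link.
    """
    has_incoming = {boot_chip}
    for (x, y), links in working_links.items():
        for link in links:
            dx, dy = LINK_OFFSETS[link]
            dest = (x + dx, y + dy)
            # Only count links that land on a chip known to exist.
            if dest in working_links:
                has_incoming.add(dest)
    return sorted(chip for chip in working_links if chip not in has_incoming)


# Chip (1, 1) can talk to all of its neighbours, but none of them has
# a working link back towards it.
example = {
    (0, 0): {0, 2},
    (1, 0): {3},
    (0, 1): {5},
    (1, 1): {3, 4, 5},
}
unreachable = find_unreachable_chips(example)
```

In this toy machine `unreachable` contains only `(1, 1)`, which is the kind of chip that would then be mapped out.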

@coveralls

Pull Request Test Coverage Report for Build 644

  • 31 of 36 (86.11%) changed or added relevant lines in 2 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-0.2%) to 95.009%

Changes Missing Coverage           Covered Lines   Changed/Added Lines   %
spinn_machine/virtual_machine.py   2               4                     50.0%
spinn_machine/machine.py           29              32                    90.63%

Totals Coverage Status
Change from base Build 642: -0.2%
Covered Lines: 1085
Relevant Lines: 1142

💛 - Coveralls

@alan-stokes
Contributor

Going to ask a stupid question, but given that the SpiNNMan exploration of the machine works by communicating down links and cores, how do you end up with a real machine which has an unreachable chip?

@rowleya
Member Author

rowleya commented Jul 24, 2018

SpiNNMan exploration doesn't do this any more; the machine has already done some amount of this and we use this information. In any case, we did end up with a real machine in this exact state! I think it is because the links out seem to work, whereas the links in fail, so the chip has spoken to the surrounding chips, but just hasn't been spoken to...

@rowleya
Member Author

rowleya commented Jul 24, 2018

OK, so I have checked this and it is a bit odd. We now use the P2P table to tell us what chips exist. This is fine, except that we then ask the chip itself for its chip information. If the chip fails to give us this information, we would consider it not working. That the chip is in the P2P table makes sense; the chip has sent out messages to neighbouring chips so this only requires outgoing links. The interesting bit is that we then must have spoken to the chip i.e. sent it P2P messages to get information from it. This has worked, but the links themselves are considered down! There is an oddity here certainly...

@rowleya
Member Author

rowleya commented Jul 24, 2018

A final comment - the router contains an NN router, a P2P router and an MC router. NN packets come in 2 forms: peek/poke and "normal". The P2P routing tables are built using "normal" NN packets. The links are checked by sending a peek/poke packet, which results in a normal NN response. Thus the issue here could be that the part of the NN router that processes incoming peek/poke packets is actually the bit at fault. This would produce exactly this error, since all traffic would make it to and from the chip, except exactly those packets that we use to check the links!

This highlights an issue with using this mechanism to check for working links, especially when we really want to check the multicast path rather than the NN path. This could do with some work therefore, but that doesn't affect this pull request, which gets around this at the software level.
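The hypothesis above can be restated as a small toy model. This is only a sketch of the reasoning, not of the hardware or of any SpiNNaker software: `Packet`, `ToyChip` and `diagnose` are invented names, and each check is reduced to "which packet types must the chip handle in which direction", as described in the comments above.

```python
from dataclasses import dataclass
from enum import Enum


class Packet(Enum):
    NN_NORMAL = "nn-normal"
    NN_PEEK_POKE = "nn-peek-poke"
    P2P = "p2p"


IN, OUT = "in", "out"


@dataclass(frozen=True)
class ToyChip:
    # (packet type, direction) pairs that this hypothetical chip
    # fails to handle.
    broken: frozenset = frozenset()

    def handles(self, packet, direction):
        return (packet, direction) not in self.broken


def diagnose(chip):
    """Report what each discovery step would conclude about the chip."""
    return {
        # P2P tables are built from "normal" NN packets sent by the chip.
        "in_p2p_table": chip.handles(Packet.NN_NORMAL, OUT),
        # Chip information is requested and returned over P2P.
        "answers_chip_info": (
            chip.handles(Packet.P2P, IN) and chip.handles(Packet.P2P, OUT)),
        # Link checks send a peek/poke in and expect a normal NN reply.
        "links_reported_up": (
            chip.handles(Packet.NN_PEEK_POKE, IN)
            and chip.handles(Packet.NN_NORMAL, OUT)),
    }


# A chip whose only fault is in processing incoming peek/poke packets.
suspect = ToyChip(broken=frozenset({(Packet.NN_PEEK_POKE, IN)}))
report = diagnose(suspect)
```

Under these assumptions `report` shows the chip in the P2P table and answering chip information requests while its links are reported as down, which matches the odd state seen on the real machine.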

@alan-stokes
Contributor

So basically, this is all a sticky plaster over dodgy machine detection again. I can live with that.

@rowleya
Member Author

rowleya commented Jul 24, 2018

Pretty much!

@Christian-B Christian-B merged commit 789700a into master Aug 9, 2018
@Christian-B Christian-B deleted the unreachable_chips branch August 9, 2018 09:42
4 participants