Unreachable chips #81
Going to ask a stupid question, but given that the SpiNNMan exploration of the machine works by communicating down links and cores, how do you end up with a real machine which has an unreachable chip?
SpiNNMan exploration doesn't do this any more; the machine has already done some amount of this and we use this information. In any case, we did end up with a real machine in this exact state! I think it is because the links out seem to work, whereas the links in fail, so the chip has spoken to the surrounding chips, but just hasn't been spoken to...
OK, so I have checked this and it is a bit odd. We now use the P2P table to tell us which chips exist. This is fine, except that we then ask each chip itself for its chip information. If the chip fails to give us this information, we consider it not working. That the chip is in the P2P table makes sense: the chip has sent out messages to neighbouring chips, so this only requires outgoing links. The interesting bit is that we must then have spoken to the chip, i.e. sent it P2P messages to get information from it. This has worked, yet the links themselves are considered down! There is certainly an oddity here...
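To make the two-step discovery described above concrete, here is a minimal sketch. It is not SpiNNMan's actual code: `split_by_response` and the `get_chip_info` callback are hypothetical names, and the callback is assumed to return `None` when a chip listed in the P2P table does not answer.

```python
from typing import Callable, Dict, Iterable, Optional, Set, Tuple

Chip = Tuple[int, int]  # (x, y) coordinates of a chip


def split_by_response(
    p2p_chips: Iterable[Chip],
    get_chip_info: Callable[[Chip], Optional[dict]],
) -> Tuple[Dict[Chip, dict], Set[Chip]]:
    """Split the chips listed in the P2P table by whether they answer.

    get_chip_info is a hypothetical callback standing in for the real
    chip-information request; it returns None if the chip does not reply.
    """
    responding: Dict[Chip, dict] = {}
    silent: Set[Chip] = set()
    for chip in p2p_chips:
        info = get_chip_info(chip)
        if info is None:
            # Listed in the P2P table but gave no information:
            # treat the chip as not working.
            silent.add(chip)
        else:
            responding[chip] = info
    return responding, silent
```

The chip discussed in this thread would land in `responding`, since it did answer over P2P, even though its links were reported as down. That is why a separate link-based check is still needed.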
A final comment: the router contains an NN (nearest-neighbour) router, a P2P router and an MC (multicast) router. NN packets come in two forms: peek/poke and "normal". The P2P routing tables are built using "normal" NN packets. The links are checked by sending a peek/poke packet, which results in a normal NN response. Thus the issue here could be that the part of the NN router that processes incoming peek/poke packets is actually the bit at fault. This would produce exactly this error, since all traffic would make it to and from the chip except precisely those packets that we use to check the links! This highlights an issue with using this mechanism to check for working links, especially when we really want to check the multicast path rather than the NN path. This could therefore do with some work, but that doesn't affect this pull request, which works around the problem at the software level.
So basically, this is all a sticky plaster over dodgy machine detection again. I can live with that.
Pretty much!
Makes it possible to detect and map out chips that can't be used due to down links.
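
As an illustration of what "unreachable due to down links" can mean in software, the sketch below walks the machine from a root chip over the links reported as working and returns every chip that is never visited. It is only a sketch under stated assumptions, not the code in this pull request: `find_unreachable_chips` and the `working_links` mapping are hypothetical, the link-id-to-direction table is the conventional SpiNNaker numbering as assumed here, and wrap-around links are ignored.

```python
from collections import deque
from typing import Dict, Set, Tuple

Chip = Tuple[int, int]

# Assumed link numbering: link id -> (dx, dy) of the neighbouring chip.
LINK_OFFSETS = {
    0: (1, 0),    # east
    1: (1, 1),    # north-east
    2: (0, 1),    # north
    3: (-1, 0),   # west
    4: (-1, -1),  # south-west
    5: (0, -1),   # south
}


def find_unreachable_chips(
    working_links: Dict[Chip, Set[int]], root: Chip = (0, 0)
) -> Set[Chip]:
    """Return chips that cannot be reached from root over working links.

    working_links maps each known chip to the ids of its outgoing links
    that are considered working (a hypothetical input format).
    """
    seen: Set[Chip] = {root} if root in working_links else set()
    queue = deque(seen)
    while queue:
        x, y = queue.popleft()
        for link in working_links[(x, y)]:
            dx, dy = LINK_OFFSETS[link]
            neighbour = (x + dx, y + dy)
            if neighbour in working_links and neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    # Anything never visited has no working inbound path from the root.
    return set(working_links) - seen
```

For example, with `{(0, 0): {2}, (0, 1): {5}, (1, 0): {3}}`, chip `(1, 0)` has a working outgoing link but no neighbour has a working link into it, so it is the only chip returned. This mirrors the "links out work, links in fail" case described in the conversation.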