how to visualize simulation? #18

Open
wangshuaizs opened this issue May 23, 2018 · 9 comments

Comments

@wangshuaizs

Hi,
On Ubuntu, PyViz and NetAnim can be used to visualize simulations. Does the ns3-rdma project support any tool for visualizing a simulation? I tried generating an .xml file during the simulation and opening it in NetAnim on Ubuntu, but NetAnim told me "This XML format is not supported. Minimum Version:3.106" (I am using NetAnim 3.107 and ns-3 3.26). Do you have any suggestions about visualization?
Thank you in advance!

@bobzhuyb
Owner

I never tried that. Sorry.

@wangshuaizs
Author

OK, thank you anyway!

I have run into another problem. I ran a simulation in which server nodes 0 to 126 connect to a Broadcom switch, and server node 0 sends 1 packet (payload size = 1000) to each of the other server nodes. The output prints some warnings: "WARNING: Drop because egress Port buffer full, WARNING: Drop because egress Q buffer full, WARNING: Drop because egress SP buffer full". I expected to see retransmissions, but I cannot find any retransmission in mix.tr.

When I increase the number of server nodes to 129, so that server node 0 sends 1 packet to each of server nodes 1 to 128, main.exe crashes with an error message like “0x0000010000001000 access violation occurs when the reading position.”

Does that mean I cannot simulate more than 127 flows from one server simultaneously? I have dug into your source code, but I found nothing to support this assumption. Could you please give me some suggestions? Thank you!

@bobzhuyb
Owner

The main issue is on the switch node, not on the servers/flows.

I hard-coded a max port number of 64 per switch because this is what we had in practice (64-port switches). You may try to raise this.
https://github.com/bobzhuyb/ns3-rdma/blob/master/src/network/model/broadcom-node.h#L59

Once you raise this, the switch buffer may run out easily -- remember PFC requires certain buffer headroom per port to operate, otherwise PFC cannot prevent packet losses. You may need to reconfigure buffer thresholds/capacity in https://github.com/bobzhuyb/ns3-rdma/blob/master/src/network/model/broadcom-node.cc

If you want to test 128->1 or an even more intensive incast, I recommend sticking with 64-port switches and using a multi-hop topology. The congestion point will be at the last hop anyway, so you don't need to worry about the above issues on the switch.
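
As a rough sizing aid for the multi-hop suggestion above, the sketch below estimates how many first-hop switches are needed to attach a given number of senders without exceeding the per-switch port limit. The function name `LeafSwitchesNeeded` and the `uplinksPerLeaf` parameter are hypothetical and not part of ns3-rdma; only the 64-port limit comes from this thread.

```cpp
#include <cstdint>
#include <stdexcept>

// Hypothetical helper, not part of ns3-rdma.
// Returns how many first-hop (leaf) switches are needed to attach `senders`
// hosts when each switch has `portsPerSwitch` ports and `uplinksPerLeaf` of
// those ports are reserved for links toward the next hop.
inline uint32_t LeafSwitchesNeeded(uint32_t senders,
                                   uint32_t portsPerSwitch,
                                   uint32_t uplinksPerLeaf)
{
    if (uplinksPerLeaf >= portsPerSwitch)
    {
        throw std::invalid_argument("no ports left for hosts");
    }
    const uint32_t hostPorts = portsPerSwitch - uplinksPerLeaf;
    // Ceiling division: a partially filled switch still counts as one switch.
    return (senders + hostPorts - 1) / hostPorts;
}
```

For example, under the assumption of one uplink per leaf, `LeafSwitchesNeeded(128, 64, 1)` gives 3, since each leaf then has 63 host-facing ports.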

@wangshuaizs
Author

@bobzhuyb

I tried to create a topology with 2 servers, named server 0 and server 1, connected to each other directly. Server 1 established 200 RDMA flows to server 0 at the same time, but Visual Studio reported a memory access violation. Is it a bug?

Thank you!

@bobzhuyb
Owner

I don't remember any hard-coded limitation for the number of flows per server... but I may be wrong. What is the maximum number of flows that does not have this problem? 128? 64?

@wangshuaizs
Author

wangshuaizs commented Jun 22, 2018

@bobzhuyb

In my test, 127 flows are ok, but 128 flows aren't.

@hdtjiang

The problem is caused by a parameter in point-to-point/model/qbb-net-device.h, where you will find:
static const uint32_t fCnt = 128; // Max number of flows on a NIC, for TX and RX respectively. TX+RX=fCnt*2.
You can increase this, but there is another problem: when you finish a flow and start a new one, the same problem appears again, because there is no queue recovery mechanism.
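
To illustrate what a queue recovery mechanism could look like, here is a minimal sketch of a fixed-capacity slot pool that hands a finished flow's index back for reuse. The class `FlowSlotPool` and its methods are hypothetical and are not taken from ns3-rdma; how the simulator actually indexes its per-flow queues is not shown in this thread.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch, not ns3-rdma code: a pool of flow slot indices with
// a free list, so that a slot released by a finished flow can be reused.
class FlowSlotPool
{
public:
    explicit FlowSlotPool(uint32_t capacity)
    {
        free_.reserve(capacity);
        // Push indices in reverse so that slot 0 is handed out first.
        for (uint32_t i = capacity; i > 0; --i)
        {
            free_.push_back(i - 1);
        }
    }

    // Writes a free slot index to `slot` and returns true, or returns
    // false when every slot is in use (instead of indexing out of bounds).
    bool Acquire(uint32_t& slot)
    {
        if (free_.empty())
        {
            return false;
        }
        slot = free_.back();
        free_.pop_back();
        return true;
    }

    // Returns a slot to the pool. The caller must release each acquired
    // slot exactly once; double release is not detected in this sketch.
    void Release(uint32_t slot) { free_.push_back(slot); }

    std::size_t Available() const { return free_.size(); }

private:
    std::vector<uint32_t> free_;
};
```

With a pool like this, the number of concurrently active flows is still bounded by the capacity, but the total number of flows over a simulation run is not.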

@bobzhuyb
Owner

Thanks @hdtjiang for the explanation. This is indeed something that needs to be improved.

@wangshuaizs
Author

Thanks @hdtjiang for your reply. I think the parameter in network/utils/broadcom-egress-queue.h should also be increased accordingly:

static const unsigned fCnt = 128; //max number of queues, 128 for NICs
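
One way to keep the two limits from drifting apart is to derive both from a single constant and check them at compile time. The sketch below uses hypothetical names (`sketch`, `kMaxFlowsPerNic`, `QbbNetDeviceLimits`, `EgressQueueLimits`) and an arbitrary example value of 256; it only mirrors the two `fCnt` constants quoted in this thread and is not the layout of the real headers.

```cpp
#include <cstdint>

namespace sketch
{
// Hypothetical shared limit; 256 is an arbitrary example value.
constexpr uint32_t kMaxFlowsPerNic = 256;

// Stand-in for the fCnt constant quoted from qbb-net-device.h.
struct QbbNetDeviceLimits
{
    static constexpr uint32_t fCnt = kMaxFlowsPerNic;
};

// Stand-in for the fCnt constant quoted from broadcom-egress-queue.h.
struct EgressQueueLimits
{
    static constexpr unsigned fCnt = kMaxFlowsPerNic;
};

// Compilation fails if someone raises one limit and forgets the other.
static_assert(QbbNetDeviceLimits::fCnt == EgressQueueLimits::fCnt,
              "NIC flow limit and egress queue limit must match");
} // namespace sketch
```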


3 participants