<?xml version="1.0" encoding="UTF-8"?>
<chapter id="user-advanced">
<title>Advanced Concepts</title>
<para>This chapter discusses some of the more advanced concepts of JGroups
with respect to using it and setting it up correctly.</para>
<section>
<title>Using multiple channels</title>
<para>When using a fully virtual synchronous protocol stack, the
performance may not be great because of the larger number of protocols
present. For certain applications, however, throughput is more important
than ordering, e.g. for video/audio streams or airplane tracking. In the
latter case, it is important that airplanes are handed over between
control domains correctly, but losing a (small) number of radar
tracking messages (which determine the exact location of the plane)
is not a problem. The first type of message does not occur very
often (typically a few messages per hour), whereas the second type
would be sent at a rate of 10-30 messages/second. The same
applies to a distributed whiteboard: messages that represent a video or
audio stream have to be delivered as quickly as possible, whereas messages
that represent figures drawn on the whiteboard, or new participants
joining the whiteboard have to be delivered according to a certain
order.</para>
<para>The requirements for such applications can be solved by using two
separate stacks: one for control messages such as group membership, floor
control etc., and the other for data messages such as video/audio
streams (actually one might consider using one channel for audio and one
for video). The control channel might use virtual synchrony, which is
relatively slow, but enforces ordering and retransmission, and the data
channel might use a simple UDP channel, possibly including a fragmentation
layer, but no retransmission layer (losing packets is preferred to costly
retransmission).</para>
<para>The <classname>Draw2Channels</classname> demo program (in the
<classname>org.jgroups.demos</classname> package) demonstrates how to use
two different channels.</para>
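<para>A minimal sketch of the two-channel approach in Java (the stack
configuration file names are hypothetical; <classname>JChannel</classname>
and <classname>connect()</classname> are the standard JGroups API):</para>
<screen>
// control channel: virtually synchronous stack (ordering, retransmission)
JChannel control=new JChannel("control-stack.xml");
// data channel: lightweight stack without a retransmission layer
JChannel data=new JChannel("data-stack.xml");
control.connect("whiteboard-control");
data.connect("whiteboard-data");
</screen>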
</section>
<section>
<title id="SharedTransport">The shared transport: sharing a transport between multiple channels in a JVM</title>
<para>
To save resources (threads, sockets and CPU cycles), transports of channels residing within the same
JVM can be shared. If we have 4 channels inside a JVM (as is the case in an application server
such as JBoss), then we have 4 separate transports (1 per channel), each with
its own thread pools and sockets.
</para>
<para>
If those transports happen to be the same (all 4 channels use UDP, for example), then we can share them and
only create 1 instance of UDP. That transport instance is created and started only once, when the first
channel is created, and is deleted when the last channel is closed.
</para>
<para>
Each channel created over a shared transport has to join a different cluster. An exception will be thrown
if a channel sharing a transport tries to connect to a cluster to which another channel over the same
transport is already connected.
</para>
<para>
When we have 3 channels (C1 connected to "cluster-1", C2 connected to "cluster-2" and C3 connected to
"cluster-3") sending messages over the same shared transport, the cluster name
with which the channel connected is used to multiplex messages over the shared transport: a header with
the cluster name ("cluster-1") is added when C1 sends a message.
</para>
<para>
When a message with a header of "cluster-1" is received by the shared transport, it is used to demultiplex
the message and dispatch it to the right channel (C1 in this example) for processing.
</para>
<para>
How channels can share a single transport is shown in <xref linkend="SharedTransportFig"/>.
</para>
<figure id="SharedTransportFig"><title>A shared transport</title>
<graphic fileref="images/SharedTransport.png" format="PNG" align="center" />
</figure>
<para>
Here we see 4 channels which share 2 transports. Note that the first 3 channels, which share transport
"tp_one", have the same protocols on top of the shared transport. This is <emphasis>not</emphasis>
required; the protocols above "tp_one" could be different for each of the 3 channels, as long
as all applications residing on the same shared transport have the same requirements for the transport's
configuration.
</para>
<para>
To use shared transports, all we need to do is add a property "singleton_name" to the transport
configuration. All channels whose transports have the same singleton name will share one transport instance.
</para>
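<para>For example, the following transport configuration (the singleton name
"shared-udp" is arbitrary) would share one UDP instance among all channels
whose transports carry the same name:</para>
<screen>
<UDP singleton_name="shared-udp"
     mcast_addr="228.10.10.10" mcast_port="45588" />
</screen>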
</section>
<section>
<title>Transport protocols</title>
<para>A <emphasis>transport protocol</emphasis> refers to the protocol at
the bottom of the protocol stack which is responsible for sending and
receiving messages to/from the network. There are a number of transport
protocols in JGroups. They are discussed in the following sections.</para>
<para>A typical protocol stack configuration using UDP is:</para>
<screen>
<config>
<UDP
mcast_addr="${jgroups.udp.mcast_addr:228.10.10.10}"
mcast_port="${jgroups.udp.mcast_port:45588}"
discard_incompatible_packets="true"
max_bundle_size="60000"
max_bundle_timeout="30"
ip_ttl="${jgroups.udp.ip_ttl:2}"
enable_bundling="true"
thread_pool.enabled="true"
thread_pool.min_threads="1"
thread_pool.max_threads="25"
thread_pool.keep_alive_time="5000"
thread_pool.queue_enabled="false"
thread_pool.queue_max_size="100"
thread_pool.rejection_policy="Run"
oob_thread_pool.enabled="true"
oob_thread_pool.min_threads="1"
oob_thread_pool.max_threads="8"
oob_thread_pool.keep_alive_time="5000"
oob_thread_pool.queue_enabled="false"
oob_thread_pool.queue_max_size="100"
oob_thread_pool.rejection_policy="Run"/>
<PING timeout="2000"
num_initial_members="3"/>
<MERGE2 max_interval="30000"
min_interval="10000"/>
<FD_SOCK/>
<FD timeout="10000" max_tries="5" shun="true"/>
<VERIFY_SUSPECT timeout="1500" />
<pbcast.NAKACK
use_mcast_xmit="false" gc_lag="0"
retransmit_timeout="300,600,1200,2400,4800"
discard_delivered_msgs="true"/>
<UNICAST timeout="300,600,1200,2400,3600"/>
<pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000"
max_bytes="400000"/>
<pbcast.GMS print_local_addr="true" join_timeout="3000"
shun="false"
view_bundling="true"/>
<FC max_credits="20000000"
min_threshold="0.10"/>
<FRAG2 frag_size="60000" />
<pbcast.STATE_TRANSFER />
</config>
</screen>
<para>In a nutshell the properties of the protocols are:</para>
<variablelist>
<varlistentry>
<term>UDP</term>
<listitem>
<para>This is the transport protocol. It uses IP multicasting to send messages to the entire cluster, or
individual nodes. Other transports include TCP, TCP_NIO and TUNNEL.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>PING</term>
<listitem>
<para>Uses IP multicast (by default) to find initial members. Once
found, the current coordinator can be determined and a unicast JOIN
request will be sent to it in order to join the cluster.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>MERGE2</term>
<listitem>
<para>Merges subgroups back into one group; kicks in after a cluster partition.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>FD_SOCK</term>
<listitem>
<para>Failure detection based on sockets (in a ring form between
members). Generates notification if a member fails</para>
</listitem>
</varlistentry>
<varlistentry>
<term>FD</term>
<listitem>
<para>Failure detection based on heartbeats and are-you-alive messages (in a ring form between
members). Generates notification if a member fails</para>
</listitem>
</varlistentry>
<varlistentry>
<term>VERIFY_SUSPECT</term>
<listitem>
<para>Double-checks whether a suspected member is really dead;
otherwise, the suspicion generated by the protocol below is discarded</para>
</listitem>
</varlistentry>
<varlistentry>
<term>pbcast.NAKACK</term>
<listitem>
<para>Ensures (a) message reliability and (b) FIFO. Message
reliability guarantees that a message will be received. If not,
the receiver(s) will request retransmission. FIFO guarantees that all
messages from sender P will be received in the order P sent them</para>
</listitem>
</varlistentry>
<varlistentry>
<term>UNICAST</term>
<listitem>
<para>Same as NAKACK for unicast messages: messages from sender P
will not be lost (retransmission if necessary) and will be in FIFO
order (conceptually the same as TCP in TCP/IP)</para>
</listitem>
</varlistentry>
<varlistentry>
<term>pbcast.STABLE</term>
<listitem>
<para>Deletes messages that have been seen by all members (distributed message garbage collection)</para>
</listitem>
</varlistentry>
<varlistentry>
<term>pbcast.GMS</term>
<listitem>
<para>Membership protocol. Responsible for joining/leaving members and installing new views.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>FRAG2</term>
<listitem>
<para>Fragments large messages into smaller ones and reassembles
them back at the receiver side. For both multicast and unicast messages</para>
</listitem>
</varlistentry>
<varlistentry>
<term>STATE_TRANSFER</term>
<listitem>
<para>
Ensures that state is correctly transferred from an existing member (usually the coordinator) to a
new member.
</para>
</listitem>
</varlistentry>
</variablelist>
<section>
<title>UDP</title>
<para>UDP uses IP multicast for sending messages to all members of a
group and UDP datagrams for unicast messages (sent to a single member).
When started, it opens a unicast and multicast socket: the unicast
socket is used to send/receive unicast messages, whereas the multicast
socket sends/receives multicast messages. The channel's address will be
the address and port number of the <emphasis>unicast</emphasis>
socket.</para>
<section>
<title>Using UDP and plain IP multicasting</title>
<para>A protocol stack with UDP as transport protocol is typically
used with groups whose members run on the same host or are distributed
across a LAN. Before running such a stack a programmer has to ensure
that IP multicast is enabled across subnets. It is often the case that
IP multicast is not enabled across subnets. Refer to section <xref
linkend="ItDoesntWork" /> for running a test program that determines
whether members can reach each other via IP multicast. If this does
not work, the protocol stack cannot use UDP with IP multicast as
transport. In this case, the stack has to either use UDP without IP
multicasting or other transports such as TCP.</para>
</section>
<section id="IpNoMulticast">
<title>Using UDP without IP multicasting</title>
<para>The protocol stack with UDP and PING as the bottom protocols uses
IP multicasting by default to send messages to all members (UDP) and
for discovery of the initial members (PING). However, if multicasting
cannot be used, the UDP and PING protocols can be configured to send
multiple unicast messages instead of one multicast message <footnote>
<para>Although not as efficient (and using more bandwidth), it is
sometimes the only possibility to reach group members.</para>
</footnote> (UDP) and to access a well-known server (
<emphasis>GossipRouter</emphasis> ) for initial membership information
(PING).</para>
<para>To configure UDP to use multiple unicast messages to send a
group message instead of using IP multicasting, the
<parameter>ip_mcast</parameter> property has to be set to
<literal>false</literal> .</para>
<para>To configure PING to access a GossipRouter instead of using IP
multicast the following properties have to be set:</para>
<variablelist>
<varlistentry>
<term>gossip_host</term>
<listitem>
<para>The name of the host on which GossipRouter is
started</para>
</listitem>
</varlistentry>
<varlistentry>
<term>gossip_port</term>
<listitem>
<para>The port on which GossipRouter is listening</para>
</listitem>
</varlistentry>
<varlistentry>
<term>gossip_refresh</term>
<listitem>
<para>The number of milliseconds to wait until refreshing our
address entry with the GossipRouter</para>
</listitem>
</varlistentry>
</variablelist>
<para>Before any members are started the GossipRouter has to be
started, e.g.</para>
<screen>
java org.jgroups.stack.GossipRouter -port 5555 -bindaddress localhost
</screen>
<para>This starts the GossipRouter on the local host on port 5555. The
GossipRouter is essentially a lookup service for groups and members.
It is a process that runs on a well-known host and port and accepts
GET(group) and REGISTER(group, member) requests. The REGISTER request
registers a member's address and group with the GossipRouter. The GET
request retrieves all member addresses given a group name. Each member
has to periodically ( <parameter>gossip_refresh</parameter> )
re-register its address with the GossipRouter, otherwise the entry
for that member will be removed (accommodating crashed
members).</para>
<para>The following example shows how to disable the use of IP
multicasting and use a GossipRouter instead. Only the bottom two
protocols are shown, the rest of the stack is the same as in the
previous example:</para>
<screen>
<UDP ip_mcast="false" mcast_addr="224.0.0.35" mcast_port="45566" ip_ttl="32"
mcast_send_buf_size="150000" mcast_recv_buf_size="80000"/>
<PING gossip_host="localhost" gossip_port="5555" gossip_refresh="15000"
timeout="2000" num_initial_members="3"/>
</screen>
<para>The property <parameter>ip_mcast</parameter> is set to
<literal>false</literal> in <classname>UDP</classname> and the gossip
properties in <classname>PING</classname> define the GossipRouter to
be on the local host at port 5555 with a refresh rate of 15 seconds.
If PING is parameterized with the GossipRouter's address
<emphasis>and</emphasis> port, then gossiping is enabled, otherwise it
is disabled. If only one parameter is given, gossiping will be
<emphasis>disabled</emphasis> .</para>
<para>Make sure to run the GossipRouter before starting any members,
otherwise the members will not find each other and each member will
form its own group <footnote>
<para>This can actually be used to test the MERGE2 protocol: start
two members (forming two singleton groups because they don't find
each other), then start the GossipRouter. After some time, the two
members will merge into one group</para>
</footnote> .</para>
</section>
</section>
<section>
<title>TCP</title>
<para>TCP is a replacement of UDP as bottom layer in cases where IP
Multicast based on UDP is not desired. This may be the case when
operating over a WAN, where routers will discard IP MCAST. As a rule of
thumb UDP is used as transport for LANs, whereas TCP is used for
WANs.</para>
<para>The properties for a typical stack based on TCP might look like
this (edited/protocols removed for brevity):</para>
<screen>
<TCP start_port="7800" />
<TCPPING timeout="3000"
initial_hosts="${jgroups.tcpping.initial_hosts:localhost[7800],localhost[7801]}"
port_range="1"
num_initial_members="3"/>
<VERIFY_SUSPECT timeout="1500" />
<pbcast.NAKACK
use_mcast_xmit="false" gc_lag="0"
retransmit_timeout="300,600,1200,2400,4800"
discard_delivered_msgs="true"/>
<pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000"
max_bytes="400000"/>
<pbcast.GMS print_local_addr="true" join_timeout="3000"
shun="true"
view_bundling="true"/>
</screen>
<variablelist>
<varlistentry>
<term>TCP</term>
<listitem>
<para>The transport protocol, uses TCP (from TCP/IP) to send
unicast and multicast messages. In the latter case, it sends
multiple unicast messages.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>TCPPING</term>
<listitem>
<para>Discovers the initial membership to determine coordinator.
Join request will then be sent to coordinator.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>VERIFY_SUSPECT</term>
<listitem>
<para>Double checks that a suspected member is really dead</para>
</listitem>
</varlistentry>
<varlistentry>
<term>pbcast.NAKACK</term>
<listitem>
<para>Reliable and FIFO message delivery</para>
</listitem>
</varlistentry>
<varlistentry>
<term>pbcast.STABLE</term>
<listitem>
<para>Distributed garbage collection of messages seen by all
members</para>
</listitem>
</varlistentry>
<varlistentry>
<term>pbcast.GMS</term>
<listitem>
<para>Membership services. Takes care of joining and removing
new/old members, emits view changes</para>
</listitem>
</varlistentry>
</variablelist>
<para>Since TCP already offers some of the reliability guarantees that
UDP doesn't, some protocols (e.g. FRAG and UNICAST) are not needed on
top of TCP.</para>
<para>When using TCP, each message to the group is sent as multiple
unicast messages (one to each member). Because IP
multicasting cannot be used to discover the initial members, another
mechanism has to be used to find the initial membership. There are a
number of alternatives:</para>
<itemizedlist>
<listitem>
<para>PING with GossipRouter: same solution as described in <xref
linkend="IpNoMulticast" /> . The <parameter>ip_mcast</parameter>
property has to be set to <literal>false</literal> . GossipRouter
has to be started before the first member is started.</para>
</listitem>
<listitem>
<para>TCPPING: uses a list of well-known group members that it
solicits for initial membership</para>
</listitem>
<listitem>
<para>TCPGOSSIP: essentially the same as the above PING <footnote>
<para>PING and TCPGOSSIP will be merged in the future.</para>
</footnote> . The only difference is that TCPGOSSIP allows for
multiple GossipRouters instead of only one.</para>
</listitem>
</itemizedlist>
<para>The next two sections illustrate the use of TCP with both TCPPING
and TCPGOSSIP.</para>
<section>
<title>Using TCP and TCPPING</title>
<para>A protocol stack using TCP and TCPPING looks like this (other
protocols omitted):</para>
<screen>
<TCP start_port="7800" />
<TCPPING initial_hosts="HostA[7800],HostB[7800]" port_range="5"
timeout="3000" num_initial_members="3" />
</screen>
<para>The concept behind TCPPING is that no external daemon such as
GossipRouter is needed. Instead some selected group members assume the
role of well-known hosts from which initial membership information can
be retrieved. In the example <parameter>HostA</parameter> and
<parameter>HostB</parameter> are designated members that will be used
by TCPPING to lookup the initial membership. The property
<parameter>start_port</parameter> in <classname>TCP</classname> means
that each member should try to assign port 7800 for itself. If this is
not possible it will try the next higher port (
<literal>7801</literal> ) and so on, until it finds an unused
port.</para>
<para><classname>TCPPING</classname> will try to contact both
<parameter>HostA</parameter> and <parameter>HostB</parameter> ,
starting at port <literal>7800</literal> and ending at port
<literal>7800 + port_range</literal> , in the above example ports
<literal>7800</literal> - <literal>7804</literal> . Assuming that at
least one of <parameter>HostA</parameter> or
<parameter>HostB</parameter> is up, a response will be received. To be
absolutely sure to receive a response all the hosts on which members
of the group will be running can be added to the configuration
string.</para>
</section>
<section>
<title>Using TCP and TCPGOSSIP</title>
<para>As mentioned before <classname>TCPGOSSIP</classname> is
essentially the same as <classname>PING</classname> with properties
<parameter>gossip_host</parameter> ,
<parameter>gossip_port</parameter> and
<parameter>gossip_refresh</parameter> set. However, in TCPGOSSIP these
properties have different names, as shown below (only the bottom two
protocols are shown):</para>
<screen>
<TCP />
<TCPGOSSIP initial_hosts="localhost[5555],localhost[5556]" gossip_refresh_rate="10000"
num_initial_members="3" />
</screen>
<para>The <parameter>initial_hosts</parameter> property combines
both the host and port of a GossipRouter, and it is possible to
specify more than one GossipRouter. In the example there are two
GossipRouters at ports <literal>5555</literal> and
<literal>5556</literal> on the local host. Also,
<parameter>gossip_refresh_rate</parameter> defines how many
milliseconds to wait between refreshing the entry with the
GossipRouters.</para>
<para>The advantage of having multiple GossipRouters is that, as long
as at least one is running, new members will always be able to
retrieve the initial membership. Note that the GossipRouter should be
started before any of the members.</para>
</section>
</section>
<section>
<title>TUNNEL</title>
<section>
<title>Using TUNNEL to tunnel a firewall</title>
<para>Firewalls are usually placed at the connection to the internet.
They shield local networks from outside attacks by screening incoming
traffic and rejecting connection attempts by outside machines to hosts
inside the firewall. Most firewall systems allow hosts inside the
firewall to connect to hosts outside it (outgoing traffic), however,
incoming traffic is most often disabled entirely.</para>
<para><emphasis>Tunnels</emphasis> are protocols which
encapsulate other protocols by multiplexing them at one end and
demultiplexing them at the other end. Any protocol can be tunneled by
a tunnel protocol.</para>
<para>The most restrictive setups of firewalls usually disable
<emphasis>all</emphasis> incoming traffic, and only enable a few
selected ports for outgoing traffic. In the solution below, it is
assumed that one TCP port is enabled for outgoing connections to the GossipRouter.</para>
<para>JGroups has a mechanism that allows a programmer to tunnel a
firewall. The solution involves a GossipRouter, which has to be outside of the firewall,
so other members (possibly also behind firewalls) can access it.</para>
<para>The solution works as follows. A channel inside a firewall has
to use protocol TUNNEL instead of UDP or TCP as the bottommost layer. The recommended
discovery protocol is PING; starting with the 2.8 release, you do not have to specify
any gossip routers in PING.</para>
<screen>
<TUNNEL gossip_router_hosts="127.0.0.1[12001]" />
<PING />
</screen>
<para><classname>TCPGOSSIP</classname> uses the GossipRouter (outside
the firewall) at port <literal>12001</literal> to register its address
(periodically) and to retrieve the initial membership for its
group. It is not recommended to use TCPGOSSIP for discovery if TUNNEL is
already used. TCPGOSSIP might be used in rare scenarios when registration and
initial member discovery <emphasis>have to be done </emphasis>through the gossip
router independent of the transport protocol being used. Starting with the 2.8 release,
TCPGOSSIP accepts one or more router hosts as a comma-delimited list
of host[port] elements specified in the property initial_hosts.</para>
<para><classname>TUNNEL</classname> establishes a TCP connection to the
<emphasis>GossipRouter</emphasis> process (also outside the firewall) that
accepts messages from members and passes them on to other members.
This connection is initiated by the host inside the firewall and
persists as long as the channel is connected to a group. GossipRouter will
use the <emphasis>same connection</emphasis> to send incoming messages
to the channel that initiated the connection. This is perfectly legal,
as TCP connections are full duplex. Note that, if GossipRouter tried to
establish its own TCP connection to the channel behind the firewall,
it would fail. But it is okay to reuse the existing TCP connection,
established by the channel.</para>
<para>Note that <classname>TUNNEL</classname> has to be given the
hostname and port of the GossipRouter process. This example assumes a GossipRouter
is running on the local host at port <literal>12001</literal>. Both
TUNNEL and TCPGOSSIP (or PING) access the same GossipRouter.
Starting with the 2.8 release, the TUNNEL transport accepts one or more router
hosts as a comma-delimited list of host[port] elements specified in the
property gossip_router_hosts.</para>
<para>Any time a message has to be sent, TUNNEL forwards the message
to GossipRouter, which distributes it to its destination: if the message's
destination field is null (send to all group members), then GossipRouter
looks up the members that belong to that group and forwards the
message to all of them via the TCP connection they established when
connecting to GossipRouter. If the destination is a valid member address,
then that member's TCP connection is looked up, and the message is
forwarded to it <footnote>
<para>To do so, GossipRouter has to maintain a table between groups,
member addresses and TCP connections.</para>
</footnote> .</para>
<para>
Starting with the 2.8 release, the gossip router is no longer a single
point of failure. In a setup with multiple gossip routers, the routers do
not communicate among themselves; a single point of failure is avoided
by having each channel simply connect to multiple available routers. In
case one or more routers go down, cluster members are still able to
exchange messages through the remaining available router instances, if there
are any.
For each send invocation, a channel goes through its list of router
connections and attempts to send the message on each connection
until it succeeds. If the message could not be sent on any of the
connections, an exception is raised. The default policy for connection
selection is random, but a plug-in interface for
other policies is provided as well.
The gossip router configuration is static and is not updated for the
lifetime of the channel: the list of available routers has to be provided
in the channel configuration file.</para>
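<para>For example, a TUNNEL configuration listing two (hypothetical) router
hosts would look like this:</para>
<screen>
<TUNNEL gossip_router_hosts="router1.example.com[12001],router2.example.com[12001]" />
<PING />
</screen>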
<para>To tunnel a firewall using JGroups, the following steps have to
be taken:</para>
<orderedlist>
<listitem>
<para>Check that a TCP port (e.g. 12001) is enabled in
the firewall for outgoing traffic</para>
</listitem>
<listitem>
<para>Start the GossipRouter:</para>
<screen>
java org.jgroups.stack.GossipRouter -port 12001
</screen>
</listitem>
<listitem>
<para>Configure the TUNNEL protocol layer as instructed
above.</para>
</listitem>
<listitem>
<para>Create a channel</para>
</listitem>
</orderedlist>
<para>The general setup is shown in <xref linkend="TunnelingFig" />
.</para>
<figure id="TunnelingFig">
<title>Tunneling a firewall</title>
<mediaobject>
<imageobject>
<imagedata align="center" fileref="images/Tunneling.png" />
</imageobject>
<textobject>
<phrase>A diagram representing tunneling a firewall.</phrase>
</textobject>
</mediaobject>
</figure>
<para>First, the GossipRouter process is created on host
B. Note that host B should be outside the firewall, and all channels in
the same group should use the same GossipRouter process.
When a channel on host A is created, its
<classname>TCPGOSSIP</classname> protocol will register its address
with the GossipRouter and retrieve the initial membership (assume this
is C). Now, a TCP connection with the GossipRouter is established by A; this
will persist until A crashes or voluntarily leaves the group. When A
multicasts a message to the group, GossipRouter looks up all group members
(in this case, A and C) and forwards the message to all members, using
their TCP connections. In the example, A would receive its own copy of
the multicast message it sent, and another copy would be sent to
C.</para>
<para>This scheme allows, for example, <emphasis>Java applets</emphasis>,
which are only allowed to connect back to the host from which they
were downloaded, to use JGroups: the HTTP server would be located on
host B and the GossipRouter daemon would also run on that host.
An applet downloaded to either A or C would be allowed to make a TCP
connection to B. Also, applications behind a firewall would be able to
talk to each other, joining a group.</para>
<para>However, there are several drawbacks. First, having to maintain a TCP
connection for as long as the channel is connected might use up resources in
the host system (e.g. in the GossipRouter), leading to scalability problems.
Second, this scheme is inappropriate when only a few channels are located behind
firewalls and the vast majority can indeed use IP multicast to
communicate. Finally, it is not always possible to enable outgoing
traffic on 2 ports in a firewall, e.g. when a user does not 'own' the
firewall.</para>
</section>
</section>
</section>
<section>
<title>The concurrent stack</title>
<para>
The concurrent stack (introduced in 2.5) provides a number of improvements over previous releases,
which had the following deficiencies:
<itemizedlist>
<listitem>
Large number of threads: each protocol had by default 2 threads, one for the up and one for the
down queue. They could be disabled per protocol by setting up_thread or down_thread to false.
In the new model, these threads have been removed.
</listitem>
<listitem>
Sequential delivery of messages: JGroups used to have a single queue for incoming messages,
processed by one thread. Therefore, messages from different senders were still processed in
FIFO order. In 2.5 these messages can be processed in parallel.
</listitem>
<listitem>
Out-of-band messages: when an application doesn't care about the ordering properties of a message,
the OOB flag can be set and JGroups will deliver this particular message without regard for any
ordering.
</listitem>
</itemizedlist>
</para>
<section>
<title>Overview</title>
<para>
The architecture of the concurrent stack is shown in <xref linkend="ConcurrentStackFig"/>. The changes
were made entirely inside of the transport protocol (TP, with subclasses UDP, TCP and TCP_NIO). Therefore,
to configure the concurrent stack, the user has to modify the config for (e.g.) UDP in the XML file.
</para>
<para>
<figure id="ConcurrentStackFig"><title>The concurrent stack</title>
<graphic fileref="images/ConcurrentStack.png" format="PNG" align="left" />
</figure>
</para>
<para>
The concurrent stack consists of 2 thread pools (java.util.concurrent.Executor): the out-of-band (OOB)
thread pool and the regular thread pool. Packets are received by multicast or unicast receiver threads
(UDP) or a ConnectionTable (TCP, TCP_NIO). Packets marked as OOB (with Message.setFlag(Message.OOB)) are
dispatched to the OOB thread pool, and all other packets are dispatched to the regular thread pool.
</para>
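The dispatch decision described above can be sketched with two java.util.concurrent pools. This is a simplified model, not JGroups' actual transport code; the Packet and DispatchSketch names are illustrative.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Simplified model of the transport's dispatching: OOB-flagged packets go to
// the OOB pool, all other packets to the regular pool.
public class DispatchSketch {
    private final ExecutorService oobPool = Executors.newFixedThreadPool(4);
    private final ExecutorService regularPool = Executors.newFixedThreadPool(4);

    static class Packet {
        final boolean oob;            // stands in for Message.isFlagSet(Message.OOB)
        final Runnable deliver;       // stands in for "pass the message up the stack"
        Packet(boolean oob, Runnable deliver) { this.oob = oob; this.deliver = deliver; }
    }

    // Called by the receiver thread: it only picks a pool and returns immediately
    public void receive(Packet p) {
        (p.oob ? oobPool : regularPool).execute(p.deliver);
    }

    public void shutdown() throws InterruptedException {
        oobPool.shutdown();
        regularPool.shutdown();
        oobPool.awaitTermination(5, TimeUnit.SECONDS);
        regularPool.awaitTermination(5, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws Exception {
        DispatchSketch d = new DispatchSketch();
        AtomicInteger oob = new AtomicInteger();
        AtomicInteger regular = new AtomicInteger();
        for (int i = 0; i < 10; i++) {
            final boolean isOob = i % 2 == 0;   // alternate OOB and regular packets
            d.receive(new Packet(isOob, () -> (isOob ? oob : regular).incrementAndGet()));
        }
        d.shutdown();
        System.out.println(oob.get() + " OOB, " + regular.get() + " regular"); // prints "5 OOB, 5 regular"
    }
}
```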
<para>
When a thread pool is disabled, the caller's thread (e.g. a multicast or unicast
receiver thread, or the ConnectionTable) sends the message up the stack and into the application.
Otherwise, the packet will be processed by a thread from the thread pool, which sends the message up
the stack. When all current threads are busy, another thread might be created, up to the maximum number
of threads defined. Alternatively, the packet might get queued up until a thread becomes available.
</para>
<para>
The point of using a thread pool is that the receiver threads should only receive the packets and forward
them to the thread pools for processing, because unmarshalling and processing is slower than simply
receiving the message and can benefit from parallelization.
</para>
<section>
<title>Configuration</title>
<para>Note that this is preliminary; names or properties might change.</para>
<para>
We are thinking of exposing the thread pools, so that a developer could set both
pools programmatically, e.g. using something like TP.setOOBThreadPool(Executor executor).
</para>
<para>
Here's an example of the new configuration:
<screen>
<![CDATA[
<UDP
mcast_addr="228.10.10.10"
mcast_port="45588"
thread_pool.enabled="true"
thread_pool.min_threads="1"
thread_pool.max_threads="100"
thread_pool.keep_alive_time="20000"
thread_pool.queue_enabled="false"
thread_pool.queue_max_size="10"
thread_pool.rejection_policy="Run"
oob_thread_pool.enabled="true"
oob_thread_pool.min_threads="1"
oob_thread_pool.max_threads="4"
oob_thread_pool.keep_alive_time="30000"
oob_thread_pool.queue_enabled="true"
oob_thread_pool.queue_max_size="10"
oob_thread_pool.rejection_policy="Run"/>
]]>
</screen>
</para>
<para>
The attributes for the 2 thread pools are prefixed with thread_pool and oob_thread_pool respectively.
</para>
<para>
The attributes are listed below. They roughly correspond to the options of a
java.util.concurrent.ThreadPoolExecutor in JDK 5.
<table>
<title>Attributes of thread pools</title>
<tgroup cols="2">
<colspec align="left" />
<thead>
<row>
<entry align="center">Name</entry>
<entry align="center">Description</entry>
</row>
</thead>
<tbody>
<row>
<entry>enabled</entry>
<entry>Whether or not to use a thread pool. If set to false, the caller's thread
is used.</entry>
</row>
<row>
<entry>min_threads</entry>
<entry>The minimum number of threads to use.</entry>
</row>
<row>
<entry>max_threads</entry>
<entry>The maximum number of threads to use.</entry>
</row>
<row>
<entry>keep_alive_time</entry>
<entry>Number of milliseconds until an idle thread is removed from the pool.</entry>
</row>
<row>
<entry>queue_enabled</entry>
<entry>Whether or not to use a (bounded) queue. If enabled, when all minimum
threads are busy, work items are added to the queue. When the queue is full,
additional threads are created, up to max_threads. When max_threads have been
reached, the rejection policy is consulted.</entry>
</row>
<row>
<entry>queue_max_size</entry>
<entry>The maximum number of elements in the queue. Ignored if the queue is
disabled.</entry>
</row>
<row>
<entry>rejection_policy</entry>
<entry>Determines what happens when the thread pool (and queue, if enabled) is
full. The default ("Run") is to run the message on the caller's thread. "Abort" throws a runtime
exception. "Discard" discards the message, and "DiscardOldest" discards the
oldest entry in the queue. Note that these values might change; for example, a
"Wait" value might be added in the future.</entry>
</row>
<row>
<entry>thread_naming_pattern</entry>
<entry>Determines how the threads run from the concurrent stack's thread pools
are named. Valid values are any combination of the letters "c" and "l", where
"c" includes the cluster name and "l" includes the local address of the channel.
The default is "cl".
</entry>
</row>
</tbody>
</tgroup>
</table>
</para>
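Reading the table against the JDK, the attributes map onto a java.util.concurrent.ThreadPoolExecutor roughly as sketched below. This is our reading of the correspondence, not JGroups' actual pool-construction code; the PoolConfigSketch name is illustrative.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Sketch of how the table's attributes correspond to a JDK ThreadPoolExecutor.
public class PoolConfigSketch {
    public static ThreadPoolExecutor build(int minThreads, int maxThreads, long keepAliveMs,
                                           boolean queueEnabled, int queueMaxSize) {
        // queue_enabled=false -> a hand-off queue (no buffering);
        // queue_enabled=true  -> a bounded queue of queue_max_size entries
        BlockingQueue<Runnable> queue = queueEnabled
                ? new LinkedBlockingQueue<>(queueMaxSize)
                : new SynchronousQueue<>();
        return new ThreadPoolExecutor(
                minThreads,                          // min_threads     -> corePoolSize
                maxThreads,                          // max_threads     -> maximumPoolSize
                keepAliveMs, TimeUnit.MILLISECONDS,  // keep_alive_time
                queue,
                new ThreadPoolExecutor.CallerRunsPolicy()); // rejection_policy="Run"
    }

    public static void main(String[] args) throws Exception {
        // Values mirror the thread_pool.* settings of the sample config above
        ThreadPoolExecutor pool = build(1, 100, 20000, false, 10);
        pool.execute(() -> System.out.println("task ran"));
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```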
</section>
</section>
<section>
<title>Elimination of up and down threads</title>
<para>
By removing the 2 queues per protocol and the associated 2 threads, we effectively reduce the number of
threads needed to handle a message, and thus context switching overhead. We also get clear and unambiguous
semantics for Channel.send(): now, all messages are sent down the stack on the caller's thread and
the send() call only returns once the message has been put on the network. In addition, an exception will
only be propagated back to the caller if the message has not yet been placed in a retransmit buffer.
Otherwise, JGroups simply logs the error message but keeps retransmitting the message. Therefore,
if the caller gets an exception, the message should be re-sent.
</para>
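The re-send rule above (an exception means the message never reached a retransmit buffer, so it is safe and necessary to re-send) can be sketched as a small retry loop. The Sender interface below is hypothetical; it stands in for Channel.send().

```java
// Sketch of the re-send rule. The Sender interface is hypothetical,
// standing in for Channel.send().
public class ResendSketch {
    interface Sender { void send(String msg) throws Exception; }

    // Retry up to maxAttempts times; once send() returns normally, the
    // message is on the network or in a retransmit buffer.
    static void sendWithRetry(Sender sender, String msg, int maxAttempts) throws Exception {
        Exception last = null;
        for (int i = 0; i < maxAttempts; i++) {
            try {
                sender.send(msg);
                return;
            } catch (Exception e) {
                last = e;   // not yet in a retransmit buffer: safe to re-send
            }
        }
        throw last;          // give up after maxAttempts failures
    }

    public static void main(String[] args) throws Exception {
        int[] failuresLeft = {2};   // simulate a sender that fails twice, then succeeds
        sendWithRetry(m -> { if (failuresLeft[0]-- > 0) throw new Exception("send failed"); },
                      "hello", 5);
        System.out.println("sent after retries");
    }
}
```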
<para>
On the receiving side, a message is handled by a thread pool, either the regular or OOB thread pool. Both
thread pools can be completely eliminated, so that we can save even more threads and thus further
reduce context switching. The point is that the developer is now able to control the threading behavior
almost completely.
</para>
</section>
<section>
<title>Concurrent message delivery</title>
<para>
Up to version 2.5, all received messages were processed by a single thread, even if they were
sent by different senders. For instance, if sender A sent messages 1, 2 and 3, and B sent messages 34 and 35,
and if A's messages were all received first, then B's messages 34 and 35 could only be processed after
messages 1-3 from A had been processed!
</para>
<para>
Now, we can process messages from different senders in parallel, e.g. messages 1, 2 and 3 from A can be
processed by one thread from the thread pool and messages 34 and 35 from B can be processed on a different
thread.
</para>
<para>
As a result, we get a speedup of almost N for a cluster of N nodes if every node is sending messages and we
configure the thread pool to have at least N threads. There is actually a unit test
(ConcurrentStackTest.java) which demonstrates this.
</para>
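One way to picture this (a simplified model, not JGroups' internal code) is a single-threaded executor per sender: different senders' messages are processed in parallel, while each sender's own messages stay in FIFO order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Simplified model of concurrent delivery: one single-threaded executor per
// sender keeps per-sender FIFO order while senders proceed in parallel.
public class PerSenderSketch {
    private final Map<String, ExecutorService> perSender = new ConcurrentHashMap<>();
    private final Map<String, List<Integer>> delivered = new ConcurrentHashMap<>();

    // Called for each received message: the sender's own executor delivers it
    public void receive(String sender, int msgNo) {
        perSender.computeIfAbsent(sender, s -> Executors.newSingleThreadExecutor())
                 .execute(() -> delivered
                         .computeIfAbsent(sender, s -> Collections.synchronizedList(new ArrayList<>()))
                         .add(msgNo));
    }

    public Map<String, List<Integer>> shutdown() throws InterruptedException {
        for (ExecutorService e : perSender.values()) {
            e.shutdown();
            e.awaitTermination(5, TimeUnit.SECONDS);
        }
        return delivered;
    }

    public static void main(String[] args) throws Exception {
        PerSenderSketch stack = new PerSenderSketch();
        for (int i = 1; i <= 3; i++) stack.receive("A", i);  // A sends 1, 2, 3
        stack.receive("B", 34);                              // B sends 34, 35
        stack.receive("B", 35);
        System.out.println(stack.shutdown());                // per-sender order is preserved
    }
}
```

B's messages no longer wait for A's: only ordering among messages from the same sender is enforced.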
</section>
<section id="Scopes">
<title>Scopes: concurrent message delivery for messages from the same sender</title>
<para>
In the previous paragraph, we showed how the concurrent stack delivers messages from different senders
concurrently. But all (non-OOB) messages from the same sender P are delivered in the order in which
P sent them. However, this is not good enough for certain types of applications.
</para>
<para>
Consider the case of an application which replicates HTTP sessions. If we have sessions X, Y and Z, then
updates to these sessions are delivered in the order in which they were performed, e.g. X1, X2, X3,
Y1, Z1, Z2, Z3, Y2, Y3, X4. This means that update Y1 has to wait until updates X1-3 have been delivered.
If these updates take some time, e.g. spent in lock acquisition or deserialization, then all subsequent
messages are delayed by the sum of the times taken by the messages ahead of them in the delivery order.