Everflow High Level Design

Shuotian Cheng edited this page Jul 4, 2019 · 3 revisions

Everflow in SONiC

High Level Design Document

Revision 0.1

Revision 0.2 2019/07/01 In-progress

Table of Contents

List of Tables

Table 1: Revision

| Rev | Date | Author | Change Description |
| --- | --- | --- | --- |
| 0.1 | | Oleksandr Ivantsiv | Initial version |

About this Manual

This document provides general information about the Everflow feature implementation in SONiC.

Scope

This document describes the high level design of the Everflow feature.

Definitions/Abbreviation

Table 2: Abbreviations

| Definitions/Abbreviation | Description |
| --- | --- |
| ACL | Access Control List |
| API | Application Programming Interface |
| SAI | Switch Abstraction Interface |
| ERSPAN | Encapsulated Remote Switched Port Analyzer |
| JSON | JavaScript Object Notation |

1 Sub-system Overview

1.1 System Chart

The following diagram gives a top-level overview of the SONiC switch components:

1.2 Modules description

1.2.1 swssconfig

Reads prepared JSON configuration files and injects their contents into App DB.

1.2.2 App DB

Located in the Redis DB instance #0 running inside the "database" container. Redis DB stores data as key-value tuples, needs no predefined schema, and can hold various types of data.

1.2.3 Orchestration Agent

A collection of software that provides a database interface for communicating with, and representing the state of, network applications and network switch hardware.

1.2.4 SAI Redis

SAI Redis converts data from the App DB format (key-value) into SAI objects.

1.2.5 SAI DB

Located in the Redis DB instance #1 running inside the "database" container. Holds serialized SAI objects.

1.2.6 syncd

Reads SAI DB data (SAI objects) and performs appropriate calls to Switch SAI.

1.2.7 SAI (Redis and Switch)

A unified API that represents the switch state as a set of objects. SONiC has two implementations: the SAI DB frontend and the ASIC SDK wrapper.

2 Subsystem Requirements Overview

2.1 Functional requirements

The feature set is classified into Must Have (M) and Should Have (S). Must Have features are the features already enabled in our test and production networks; they must be supported by the next release. Should Have features are the features that are partially configured in our test networks and are going to be enabled in production; they should either already be in the current image or be in the production plan. In phase #1, only the requirements marked Must Have will be implemented.

2.1.1 Setup session

  • Configure session source IP (M)
  • Configure session destination IP (M)
  • Configure session GRE protocol type (M)
  • Configure session DSCP (M)
  • Configure session TTL (M)
  • Apply ACL to the ERSPAN session to do selective RX mirroring (M)
  • Select output queue for all mirrored traffic (M)
  • Policer applied to all mirrored traffic (S)
  • Bind CPU ports to the ERSPAN session as TX port (S)
  • Bind all front-panel ports to the ERSPAN session as TX port (S)

2.1.2 Multiple ERSPAN sessions (S)

  • Each ERSPAN session can be configured independently for destination IP, bound ports, ACL, GRE protocol type, DSCP, and TTL.

2.1.3 ACL rule capability (M)

  • Match IPv4 source (M)
  • Match IPv4 destination (M)
  • Match IP protocol (M)
  • Match TCP/UDP source/destination port (M)
  • Match TCP flags (M)
  • Match IP DSCP value (M)
  • Match User Defined Field (UDF). An ACL rule must support UDF matching and the above fixed fields matching at the same time. (S)
  • Packet counter for each mirror ACL entry (M)

2.1.4 Match User Defined Field (S)

  • Define a UDF based on offset-base, offset, and length. The offset-base defines the packet header from which the offset is counted. It can be the packet start or the L3 or L4 header, inner or outer. The offset is the byte offset from the offset-base. The length is the length of the UDF field in bytes. (S)
  • The total UDF fields available on the system should be 16+ bytes. We expect to use those UDF fields to cover TCP flags/source port/destination port for an IP-in-IP packet, the IP protocol field for the inner IP header of an IP-in-IP packet, the IPID field for the outer IP header, and the destination/source IP address for the inner IP header of an IP-in-IP packet.
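
The offset-base/offset/length triple can be illustrated with a short sketch. This is only an illustration under stated assumptions: the names `OffsetBase`, `UdfSpec`, `HeaderOffsets`, and `extractUdf` are hypothetical, the header positions are assumed to be located by an earlier parsing step, and inner headers are left out for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical names; only outer headers are modeled in this sketch.
enum class OffsetBase { PacketStart, L3, L4 };

struct UdfSpec
{
    OffsetBase base;     // header the offset is counted from
    std::size_t offset;  // byte offset from the offset-base
    std::size_t length;  // UDF length in bytes
};

// Byte positions of the header starts, assumed to be found by a parser.
struct HeaderOffsets
{
    std::size_t l3;
    std::size_t l4;
};

// Returns the UDF bytes, or an empty vector if the field does not fit in the packet.
std::vector<uint8_t> extractUdf(const std::vector<uint8_t> &pkt,
                                const HeaderOffsets &hdr,
                                const UdfSpec &udf)
{
    std::size_t base = 0;
    switch (udf.base)
    {
    case OffsetBase::PacketStart: base = 0; break;
    case OffsetBase::L3:          base = hdr.l3; break;
    case OffsetBase::L4:          base = hdr.l4; break;
    }

    const std::size_t start = base + udf.offset;
    if (udf.length == 0 || start > pkt.size() || udf.length > pkt.size() - start)
        return {};

    const auto first = pkt.begin() + static_cast<std::ptrdiff_t>(start);
    return std::vector<uint8_t>(first, first + static_cast<std::ptrdiff_t>(udf.length));
}
```

For example, the IPID field of the outer IPv4 header would be described as `{OffsetBase::L3, 4, 2}`, since the Identification field occupies bytes 4 and 5 of the IPv4 header.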

2.1.5 Packet drop mirroring capability (S)

  • This is a special packet mirroring session which mirrors packets dropped in the pipeline.
  • The dropped packet can be:
    • FIB Miss (Traffic Blackholing)
    • Receiving L3 traffic when L3(IPv4/6) is disabled on the port
    • TTL Violation: TTL=0, Routed packets with TTL=1
    • STP not in Forward state/CBL Drops, STP BPDUs on L3 ports
    • VLAN Mismatch (when the packet's VLAN is not enabled on the port)
    • Traffic to Zero address
    • ACL drop

2.1.6 Misc

  • Match and mirror PFC pause frames (S)

2.2 Scalability requirements

  • 1 mirroring session
  • 256 mirror ACL rules

3 Modules Design

3.1 Modules that need to be updated

3.1.1 Swssconfig

Swssconfig is responsible for loading configuration from JSON config files into APP DB. The implementation is generic and doesn't require additional modifications.

3.1.1.1 Everflow configuration types

Everflow configuration consists of the following parts:

  • Session configuration. Contains the information required to set up a session.
  • ACL table configuration.
  • Traffic match configuration. Contains a list of ACL rules with the action "mirror", meaning that matched traffic should be encapsulated by the mirror session.

3.1.1.2 Config file example

[
    {
        "PORT_MIRROR_TABLE:session1": {
            "src_ip": "1.1.1.1",
            "dst_ip": "2.2.2.2",
            "gre_type": "0x88be",
            "dscp": "50",
            "ttl": "10",
            "queue": "5"
        },
        "OP": "SET"
    },
    {
        "ACL_TABLE:everflow_table": {
            "policy_desc": "Table contains everflow rules",
            "stage": "ingress",
            "match": "SRC_IP, DST_IP, IP_PROTOCOL, L4_SRC_PORT, L4_DST_PORT, DSCP"
        },
        "OP": "SET"
    },
    {
        "ACL_RULE_TABLE:everflow_table:rule1:match": {
            "SRC_IP": "10.10.10.0/24"
        },
        "OP": "SET"
    },
    {
        "ACL_RULE_TABLE:everflow_table:rule1:action": {
            "mirror": "session1",
            "count": ""
        },
        "OP": "SET"
    }
]

3.1.2 App DB

3.1.2.1 App DB Schema Reference

Raw schema from arch spec:

key       = PORT_MIRROR_TABLE:mirror_session_name ; mirror_session_name is      
                                                  ; unique session 
                                                  ; identifier
; field   = value
status    = "active/inactive"   ; Session state.
src_ip    = <ip_addr>           ; Session source IP address
dst_ip    = <ip_addr>           ; Session destination IP address
gre_type  = <uint16_t>          ; Session GRE protocol type
dscp      = <uint8_t>           ; Session DSCP
ttl       = <uint8_t>           ; Session TTL
queue     = <uint8_t>           ; Session output queue

3.1.2.2 App DB Schema Details

Table name: PORT_MIRROR_TABLE

Table entry key: PORT_MIRROR_TABLE:mirror_session_name (mirror_session_name is the unique session identifier)

Table entry values:

| Value | Type | Description |
| --- | --- | --- |
| status | Enum: "active/inactive" | Session state |
| src_ip | IP_ADDR | Session source IP address |
| dst_ip | IP_ADDR | Session destination IP address |
| gre_type | uint16_t | Session GRE protocol type |
| dscp | uint8_t | Session DSCP |
| ttl | uint8_t | Session TTL |
| queue | uint8_t | Session output queue |
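
As an illustration of how the string values of a PORT_MIRROR_TABLE entry might be converted into the typed fields listed above, here is a minimal sketch. `SessionConfig`, `parseBounded`, and `parseSession` are hypothetical names and not part of the design. The upper bound of 63 for `dscp` is an added assumption based on DSCP being a 6-bit field; the schema itself only says uint8_t.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical typed view of one PORT_MIRROR_TABLE entry.
struct SessionConfig
{
    std::string srcIp;
    std::string dstIp;
    uint16_t greType;
    uint8_t dscp;
    uint8_t ttl;
    uint8_t queue;
};

// Parses a decimal or 0x-prefixed value and rejects anything above max.
// Note: with base 0, a leading "0" would be read as octal.
unsigned long parseBounded(const std::string &text, unsigned long max)
{
    std::size_t used = 0;
    const unsigned long value = std::stoul(text, &used, 0);
    if (used != text.size() || value > max)
        throw std::invalid_argument("value out of range: " + text);
    return value;
}

// Throws std::out_of_range if a field is missing (from std::map::at).
SessionConfig parseSession(const std::map<std::string, std::string> &fields)
{
    SessionConfig cfg;
    cfg.srcIp = fields.at("src_ip");
    cfg.dstIp = fields.at("dst_ip");
    cfg.greType = static_cast<uint16_t>(parseBounded(fields.at("gre_type"), 0xFFFF));
    cfg.dscp = static_cast<uint8_t>(parseBounded(fields.at("dscp"), 63));  // assumption: 6-bit DSCP
    cfg.ttl = static_cast<uint8_t>(parseBounded(fields.at("ttl"), 255));
    cfg.queue = static_cast<uint8_t>(parseBounded(fields.at("queue"), 255));
    return cfg;
}
```

With the values from the config file example, `gre_type` parses to 0x88be and `dscp` to 50, while a `dscp` of "64" would be rejected by this sketch.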

3.1.3 Orchestration Agent

3.1.3.1 Callbacks mechanism

To implement a generic callback mechanism, the following classes will be added.

3.1.3.2 Subject Type

enum SubjectType
{
    SUBJECT_TYPE_NEXTHOP_CHANGE,
    SUBJECT_TYPE_NEIGH_CHANGE,
    SUBJECT_TYPE_FDB_CHANGE,
    SUBJECT_TYPE_LAG_MEMBER_CHANGE,
    SUBJECT_TYPE_VLAN_MEMBER_CHANGE,
    SUBJECT_TYPE_MIRROR_SESSION_CHANGE
};

3.1.3.3 Observer Class

class Observer
{
public:
    virtual void update(SubjectType, void *);
    virtual ~Observer();
};

3.1.3.4 Subject Class

class Subject
{
public:
    virtual void attach(Observer *);
    virtual void detach(Observer *);
    virtual void notify(SubjectType, void *);
    virtual ~Subject();

protected:
    list<Observer *> m_observers;
};
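
The declarations above leave the method bodies open. The following is a minimal, self-contained sketch of how attach, detach, and notify could behave; the bodies, the reduced `SubjectType` enum, and the `CountingObserver` class are assumptions made for illustration only.

```cpp
#include <cassert>
#include <list>

// Reduced copy of the enum, enough for the example.
enum SubjectType
{
    SUBJECT_TYPE_NEXTHOP_CHANGE,
    SUBJECT_TYPE_NEIGH_CHANGE
};

class Observer
{
public:
    virtual void update(SubjectType type, void *context) = 0;
    virtual ~Observer() = default;
};

class Subject
{
public:
    virtual void attach(Observer *observer) { m_observers.push_back(observer); }
    virtual void detach(Observer *observer) { m_observers.remove(observer); }

    virtual void notify(SubjectType type, void *context)
    {
        // Iterate over a copy so that an observer may detach itself in update().
        const std::list<Observer *> snapshot = m_observers;
        for (Observer *observer : snapshot)
            observer->update(type, context);
    }

    virtual ~Subject() = default;

protected:
    std::list<Observer *> m_observers;
};

// Hypothetical observer that only counts neighbor change events.
class CountingObserver : public Observer
{
public:
    int neighEvents = 0;

    void update(SubjectType type, void *) override
    {
        if (type == SUBJECT_TYPE_NEIGH_CHANGE)
            ++neighEvents;
    }
};
```

The `void *` context is expected to point at the update structure that matches the subject type, so each observer has to cast it according to the `SubjectType` it receives.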

3.1.3.5 Route API

To make it possible to register a callback and receive notifications about route table updates, the following changes will be added.

3.1.3.5.1 Route Update Context

struct RouteUpdate
{
    IpPrefix prefix;
    IpAddresses nexthop;
    bool add;
};

3.1.3.5.2 Route Orchestration

The RouteOrch class should be extended so that a callback can be registered and called after a next hop change. To achieve this, attach/detach methods will be added. A subscriber registers for the next hop update callback and specifies a destination IP address as a filter. Based on the destination IP, the RouteOrch class selects the next hop by matching this address against the route prefixes using the longest prefix match (LPM) algorithm.

class RouteOrch : public Orch, public Subject
{
 public:
 …
    virtual void attach(Observer *observer, IpAddresses dst_addr);
    virtual void detach(Observer *observer, IpAddresses dst_addr);
 private:
 …
    /* Map destination IP address to route prefix via LPM algorithm */
    map<IpAddresses, IpPrefix> m_nexthopObservers;
…
};
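
The LPM selection mentioned above can be sketched as follows. This is a simplified IPv4-only illustration with a linear scan, not the RouteOrch implementation; `Route`, `ipv4`, `maskOf`, and `longestPrefixMatch` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified IPv4 route entry.
struct Route
{
    uint32_t prefix;      // network address in host byte order
    int len;              // prefix length, 0..32
    std::string nexthop;  // next hop, kept as text for the example
};

constexpr uint32_t ipv4(uint32_t a, uint32_t b, uint32_t c, uint32_t d)
{
    return (a << 24) | (b << 16) | (c << 8) | d;
}

// Mask with the top len bits set; len == 0 is handled separately
// because shifting a 32-bit value by 32 is undefined.
uint32_t maskOf(int len)
{
    return len == 0 ? 0u : 0xFFFFFFFFu << (32 - len);
}

// Returns the matching route with the longest prefix, or nullptr if none matches.
const Route *longestPrefixMatch(const std::vector<Route> &routes, uint32_t dst)
{
    const Route *best = nullptr;
    for (const Route &route : routes)
    {
        const uint32_t mask = maskOf(route.len);
        if ((dst & mask) != (route.prefix & mask))
            continue;
        if (best == nullptr || route.len > best->len)
            best = &route;
    }
    return best;
}
```

A linear scan keeps the example short; a production route table would more likely use a trie or a similar structure.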

3.1.3.6 Neighbor API

To make it possible to register a callback and receive notifications about neighbor table updates, the following changes will be added.

3.1.3.6.1 Neighbor Update Context

struct NeighUpdate
{
    NeighborEntry entry;
    MacAddress mac;
    bool add;
};

3.1.3.6.2 Neighbor Orchestration

class NeighOrch : public Orch, public Subject
{
…
};

3.1.3.7 Port API

To make it possible to register a callback and receive notifications about LAG configuration changes, the following changes will be added.

3.1.3.7.1 Port Update Context

struct PortUpdate
{
    const Port *port;
    bool add;
};

3.1.3.7.2 Port Orchestration

class PortsOrch : public Orch, public Subject
{
…
};

3.1.3.8 Mirror API

Mirror Orchestration has the following responsibilities:

  • DST MAC address and port resolution based on DST IP address specified by user.
  • Session activation/deactivation based on Route and Neighbor tables.
  • Synchronization of PORT_MIRROR_TABLE in APP DB with mirroring table in SAI DB.

3.1.3.8.1 DST MAC address and port resolution algorithm

  1. Find the next hop IP address in the Route table by the DST IP address specified by the user (using the longest prefix match algorithm).
    • If the route is ECMP, take the first IP address from the list.
    • If the next hop is not found, wait for an event from the Route table.
  2. Find the neighbor entry in the Neighbor table by the next hop IP address.
    • If the neighbor entry is not found, wait for an event from the Neighbor table.
  3. Resolve the port based on the neighbor information:
    • If the neighbor's interface is a router port, use its port ID.
    • Else if the neighbor's interface is a LAG, get the ID of the first member port.
    • If the neighbor's interface is a VLAN interface:
      • Get the VLAN of the VLAN interface.
      • Find the port ID in the FDB table (if the neighbor entry exists, the FDB entry should also exist).
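
The three resolution steps can be sketched in code as follows. All types here (`Tables`, `Neighbor`, `Result`, and so on) are simplified, hypothetical stand-ins for the Route, Neighbor, and FDB tables, and the `WaitPort` state for an empty LAG or a missing FDB entry is an assumption that the algorithm above does not spell out.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the orchagent tables.
enum class IfType { RouterPort, Lag, Vlan };

struct Neighbor
{
    std::string mac;
    IfType ifType;
    std::string ifName;                   // router port, LAG, or VLAN interface name
    std::vector<std::string> lagMembers;  // used only when ifType == Lag
    int vlanId;                           // used only when ifType == Vlan
};

struct Tables
{
    std::map<std::string, std::vector<std::string>> nextHops;  // DST IP -> next hops (LPM result)
    std::map<std::string, Neighbor> neighbors;                 // next hop IP -> neighbor
    std::map<std::pair<int, std::string>, std::string> fdb;    // (VLAN, MAC) -> port
};

enum class Resolve { Ok, WaitRoute, WaitNeighbor, WaitPort };

struct Result
{
    Resolve state;
    std::string mac;
    std::string port;
};

Result resolve(const Tables &tables, const std::string &dstIp)
{
    // Step 1: next hop lookup; for ECMP take the first next hop.
    auto route = tables.nextHops.find(dstIp);
    if (route == tables.nextHops.end() || route->second.empty())
        return {Resolve::WaitRoute, "", ""};
    const std::string &nextHop = route->second.front();

    // Step 2: neighbor lookup by next hop IP.
    auto neigh = tables.neighbors.find(nextHop);
    if (neigh == tables.neighbors.end())
        return {Resolve::WaitNeighbor, "", ""};
    const Neighbor &n = neigh->second;

    // Step 3: port resolution by interface type.
    if (n.ifType == IfType::RouterPort)
        return {Resolve::Ok, n.mac, n.ifName};
    if (n.ifType == IfType::Lag)
    {
        if (n.lagMembers.empty())
            return {Resolve::WaitPort, n.mac, ""};
        return {Resolve::Ok, n.mac, n.lagMembers.front()};
    }
    auto entry = tables.fdb.find(std::make_pair(n.vlanId, n.mac));
    if (entry == tables.fdb.end())
        return {Resolve::WaitPort, n.mac, ""};
    return {Resolve::Ok, n.mac, entry->second};
}
```

Each `Wait...` state corresponds to waiting for an event from the respective table before the resolution is retried.
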

3.1.3.8.2 Mirror Update Context

struct MirrorUpdate
{
    string name;
    bool active;
};

3.1.3.8.3 Mirror Orchestration

/*
 * Contains session data specified by user in config file
 * and data required for MAC address and port resolution
 * */
struct MirrorEntry
{
    bool status;
    IpPrefix srcIp;
    IpPrefix dstIp;
    uint16_t greType;
    uint8_t dscp;
    uint8_t ttl;
    uint8_t queue;

    bool nextHopResolved;
    IpAddresses nextHop;

    bool neighborResolved;
    NeighborEntry neighbor;
    MacAddress neighborMac;

    const Port *port;

    sai_object_id_t session_id;
};

/* MirrorTable: mirror session name, mirror session data */
typedef map<string, MirrorEntry> MirrorTable;

class MirrorOrch : public Orch, public Observer
{
public:
    MirrorOrch(DBConnector *db, string tableName,
               PortsOrch *portOrch, RouteOrch *routeOrch, NeighOrch *neighOrch, FdbOrch *fdbOrch);

    void update(SubjectType, void *);
    void increaseSessionRefCount(const string&);
    void decreaseSessionRefCount(const string&);
…
    void doTask(Consumer& consumer);
};
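
The following sketch illustrates one way session activation and reference counting could interact. It is not the MirrorOrch implementation: `SessionTracker` and its methods are hypothetical, a session is assumed to be active exactly when both the next hop and the neighbor are resolved, and the reference count is assumed to be held by ACL rules that name the session in their "mirror" action.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

struct MirrorUpdate
{
    std::string name;
    bool active;
};

// Hypothetical helper that tracks only the activation state of sessions.
class SessionTracker
{
private:
    struct State
    {
        bool nextHopResolved = false;
        bool neighborResolved = false;
        bool active = false;
        int refCount = 0;
    };

    std::map<std::string, State> m_sessions;

public:
    void create(const std::string &name) { m_sessions[name]; }

    // Records the resolution state and returns the update to publish,
    // or an empty vector if the active state did not change.
    std::vector<MirrorUpdate> setResolved(const std::string &name, bool nextHop, bool neighbor)
    {
        std::vector<MirrorUpdate> updates;
        auto it = m_sessions.find(name);
        if (it == m_sessions.end())
            return updates;

        State &s = it->second;
        s.nextHopResolved = nextHop;
        s.neighborResolved = neighbor;
        const bool active = nextHop && neighbor;
        if (active != s.active)
        {
            s.active = active;
            updates.push_back(MirrorUpdate{name, active});
        }
        return updates;
    }

    void increaseRefCount(const std::string &name) { ++m_sessions.at(name).refCount; }

    void decreaseRefCount(const std::string &name)
    {
        State &s = m_sessions.at(name);
        assert(s.refCount > 0);
        --s.refCount;
    }

    // Assumption: a session that is still referenced cannot be removed.
    bool remove(const std::string &name)
    {
        auto it = m_sessions.find(name);
        if (it == m_sessions.end() || it->second.refCount > 0)
            return false;
        m_sessions.erase(it);
        return true;
    }
};
```

Publishing a MirrorUpdate only on a state flip keeps observers from being notified repeatedly for the same session state.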

3.1.3.9 FDB API

3.1.3.9.1 FDB Update Context

struct FdbUpdate
{
    MacAddress mac;
    Port port;
    Port vlan;
};

3.1.3.9.2 FDB Orchestration

class FdbOrch : public Subject
{
public:
    FdbOrch(PortsOrch *portOrch);

private:
    PortsOrch *m_portsOrch;
};

3.1.4 SAI Redis

Needs to be updated to support the latest SAI.

3.1.5 SAI DB

No update is needed to support Everflow.

3.1.6 Syncd

No changes required to support Everflow.

3.1.7 SAI

No changes required to support Everflow.

4 Flows

4.1 Init

4.2 Load config

4.3 Session create

4.4 Handle next hop update callback

4.5 Handle neighbor add callback

4.6 Handle neighbor remove callback

4.7 Handle FDB entry add callback

4.8 Handle FDB entry remove callback

4.9 Handle LAG/VLAN interface remove callback

4.10 Handle LAG/VLAN interface member remove callback

4.11 Session remove

5 Open Questions
