
Commits on Aug 18, 2021

  1. ice: make use of ice_for_each_* macros

    Go through the code base and use the ice_for_each_* macros. While at
    it, introduce an ice_for_each_xdp_txq() macro that can be used for
    looping over the xdp_rings array.
    
    This commit does not introduce any new functionality.
    
    Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
    mfijalko authored and intel-lab-lkp committed Aug 18, 2021
  2. ice: introduce XDP_TX fallback path

    Under rare circumstances the requirement of having an XDP Tx queue
    per CPU cannot be fulfilled and some of the Tx resources have to be
    shared between CPUs. This yields a need to place accesses to
    xdp_ring inside a critical section protected by a spinlock. These
    accesses happen to be in the hot path, so introduce a static branch
    that will be enabled from the control plane when the driver could
    not provide a Tx queue dedicated to XDP on each CPU.
    
    Currently, the chosen design is to allow any number of XDP Tx
    queues that is at least half the count of CPUs that the platform
    has. For a lower number, the driver will bail out with a response
    to the user that there were not enough Tx resources to allow
    configuring XDP. The sharing of rings is signalled via static
    branch enablement, which in turn indicates that the lock for
    xdp_ring accesses needs to be taken in the hot path.
    
    The static branch approach has no impact on the performance of the
    non-fallback path. One thing worth mentioning is that the static
    branch acts as a global driver switch, meaning that if one PF runs
    out of Tx resources, the other PFs serviced by the ice driver will
    suffer as well. However, given that the HW handled by the ice
    driver has 1024 Tx queues per PF, this is currently an unlikely
    scenario.
    
    Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
    mfijalko authored and intel-lab-lkp committed Aug 18, 2021
  3. ice: optimize XDP_TX workloads

    Optimize Tx descriptor cleaning for XDP. The current approach
    doesn't really scale and chokes when multiple flows are handled.
    
    Introduce two ring fields, @next_dd and @next_rs, that will keep
    track of the descriptor that should be looked at when the need for
    cleaning arises and the descriptor that should have the RS bit set,
    respectively.
    
    Note that at this point the threshold is a constant (32), but it is
    something that we could make configurable.
    
    The first thing is to get away from setting the RS bit on each
    descriptor. Do this only once NTU is higher than the current
    @next_rs value. In that case, grab tx_desc[next_rs], set the RS bit
    in the descriptor and advance @next_rs by 32.
    
    The second thing is to clean the Tx ring only when there are fewer
    than 32 free entries. In that case, check tx_desc[next_dd] for the
    DD bit. This bit is written back by HW to let the driver know that
    the xmit was successful. It will happen only for those descriptors
    that had the RS bit set. Clean only 32 descriptors and advance
    @next_dd by 32.
    
    The actual cleaning routine is moved from ice_napi_poll() down to
    ice_xmit_xdp_ring(). It is safe to do so as the XDP ring will not
    get any SKBs that would rely on interrupts for the cleaning. A nice
    side effect is that for the rare Tx fallback path (which the next
    patch introduces) we don't have to trigger the SW irq to clean the
    ring.
    
    With these two concepts, the ring is kept almost full, but it is
    guaranteed that the driver will be able to produce Tx descriptors.
    
    This approach seems to work out well even though the Tx descriptors
    are produced one by one. The test was conducted with ice HW
    bombarded with packets from a HW generator configured to generate
    30 flows.
    
    Xdp2 sample yields the following results:
    <snip>
    proto 17:   79973066 pkt/s
    proto 17:   80018911 pkt/s
    proto 17:   80004654 pkt/s
    proto 17:   79992395 pkt/s
    proto 17:   79975162 pkt/s
    proto 17:   79955054 pkt/s
    proto 17:   79869168 pkt/s
    proto 17:   79823947 pkt/s
    proto 17:   79636971 pkt/s
    </snip>
    
    As that sample reports the Rx'ed frames, let's look at the sar
    output. It shows that what we Rx'ed we do actually Tx, with no
    noticeable drops.
    Average:        IFACE   rxpck/s   txpck/s    rxkB/s    txkB/s   rxcmp/s txcmp/s  rxmcst/s   %ifutil
    Average:       ens4f1 79842324.00 79842310.40 4678261.17 4678260.38 0.00      0.00      0.00     38.32
    
    with tx_busy staying calm.
    
    When compared to a state before:
    Average:        IFACE   rxpck/s   txpck/s    rxkB/s    txkB/s   rxcmp/s txcmp/s  rxmcst/s   %ifutil
    Average:       ens4f1 90919711.60 42233822.60 5327326.85 2474638.04 0.00      0.00      0.00     43.64
    
    it can be observed that the amount of txpck/s is almost doubled,
    meaning that performance is improved by around 90%. All of this is
    due to drops in the driver; previously, the tx_busy stat was bumped
    at a 7 Mpps rate.
    
    Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
    mfijalko authored and intel-lab-lkp committed Aug 18, 2021
  4. ice: propagate xdp_ring onto rx_ring

    With the rings being split, it is now convenient to introduce a
    pointer to the XDP ring within the Rx ring. For XDP_TX workloads
    this means that the xdp_rings array access, previously executed per
    processed frame, will be skipped.
    
    Also, read the XDP prog once per NAPI poll and, if a prog is
    present, set up the local xdp_ring pointer. Reading the prog a
    single time was discussed in [1], with some concern raised by Toke
    around dispatcher handling and the need for going through the RCU
    grace period in the ndo_bpf driver callback, but ice currently
    tears down NAPI instances regardless of the prog presence on the
    VSI.
    
    Although the pointer to the XDP ring introduced to the Rx ring
    makes things a lot slimmer/simpler, I still feel that a single prog
    read per NAPI lifetime is beneficial.
    
    A further patch that introduces the fallback path will also profit
    from this, as the xdp_ring pointer will be set during the XDP rings
    setup.
    
    [1]: https://lore.kernel.org/bpf/87k0oseo6e.fsf@toke.dk/
    
    Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
    mfijalko authored and intel-lab-lkp committed Aug 18, 2021
  5. ice: do not create xdp_frame on XDP_TX

    xdp_frame is not needed for the XDP_TX data path in the ice driver
    case. For this data path, cleaning of the sent descriptor will not
    happen anywhere outside of the driver, which means that the
    information about the underlying memory model carried via xdp_frame
    will not be used. Therefore, this conversion can simply be dropped,
    which relieves the CPU a bit.
    
    Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
    mfijalko authored and intel-lab-lkp committed Aug 18, 2021
  6. ice: unify xdp_rings accesses

    There has been a long-standing issue of improper xdp_rings indexing
    for the XDP_TX and XDP_REDIRECT actions. Given that currently
    rx_ring->q_index is mixed with smp_processor_id(), there could be a
    situation where Tx descriptors are produced onto an XDP Tx ring,
    but the tail is never bumped - for example when a particular queue
    id is pinned to a non-matching IRQ line.
    
    Address this problem by ignoring the user ring count setting and
    always initializing the xdp_rings array to num_possible_cpus()
    size. Then, always use smp_processor_id() as the index into the
    xdp_rings array. This provides serialization, as at any given time
    only a single softirq can run on a particular CPU.
    
    Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
    mfijalko authored and intel-lab-lkp committed Aug 18, 2021
  7. ice: split ice_ring onto Tx/Rx separate structs

    While it was convenient to have a generic ring structure that served
    both Tx and Rx sides, next commits are going to introduce several
    Tx-specific fields, so in order to avoid hurting the Rx side, let's
    pull out the Tx ring onto new ice_tx_ring and ice_rx_ring structs.
    
    The Rx ring could be handled by the old ice_ring, which would
    reduce the code churn within this patch, but this would make things
    asymmetric.
    
    Make a union out of the ring container within ice_q_vector so that
    it is possible to iterate over the newly introduced ice_tx_ring.
    
    Remove the @size field as it is only accessed from the control path
    and it can be calculated pretty easily.
    
    Change definitions of ice_update_ring_stats and
    ice_fetch_u64_stats_per_ring so that they are ring agnostic and can be
    used for both Rx and Tx rings.
    
    The sizes of the Rx and Tx ring structs are 256 and 192 bytes,
    respectively. In the Rx ring, xdp_rxq_info occupies its own
    cacheline, so that is the major difference now.
    
    Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
    mfijalko authored and intel-lab-lkp committed Aug 18, 2021
  8. ice: move ice_container_type onto ice_ring_container

    Currently ice_container_type is scoped only to ice_ethtool.c. The
    next commit, which will split the ice_ring struct into Rx/Tx
    specific ring structs, is also going to modify the type of the
    linked list of rings within ice_ring_container. Therefore, the
    functions that take ice_ring_container as an input argument will
    need to be aware of the ring type being looked up.
    
    Embed ice_container_type within ice_ring_container and initialize it
    properly when allocating the q_vectors.
    
    Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
    mfijalko authored and intel-lab-lkp committed Aug 18, 2021
  9. ice: remove ring_active from ice_ring

    This field is dead and the driver makes no use of it. Simply remove
    it.
    
    Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
    mfijalko authored and intel-lab-lkp committed Aug 18, 2021

Commits on Aug 17, 2021

  1. ice: Implement support for SMA and U.FL on E810-T

    Expose SMA and U.FL connectors as ptp_pins on E810-T based adapters and
    allow controlling them.
    
    E810-T adapters are equipped with:
    - 2 external bidirectional SMA connectors
    - 1 internal TX U.FL
    - 1 internal RX U.FL
    
    The U.FL connectors share signal lines with the SMA connectors:
    TX U.FL1 shares the line with SMA1 and RX U.FL2 shares the line
    with SMA2. This dependency is controlled by ice_verify_pin_e810t.
    
    Additionally, add support for E810-T based devices which don't use
    the SMA/U.FL controller. If the IO expander is not detected, don't
    expose pins and use 2 predefined 1PPS input and output pins.
    
    Signed-off-by: Maciej Machnikowski <maciej.machnikowski@intel.com>
    Maciej Machnikowski authored and anguy11 committed Aug 17, 2021
  2. ice: Add support for SMA control multiplexer

    E810-T adapters have two external bidirectional SMA connectors and two
    internal unidirectional U.FL connectors. Multiplexing between U.FL and
    SMA and SMA direction is controlled using the PCA9575 expander.
    
    Add support for the PCA9575 detection and control of the respective pins
    of the SMA/U.FL multiplexer using the GPIO AQ API.
    
    Signed-off-by: Maciej Machnikowski <maciej.machnikowski@intel.com>
    Maciej Machnikowski authored and anguy11 committed Aug 17, 2021
  3. ice: Implement functions for reading and setting GPIO pins

    Implement ice_aq_get_gpio and ice_aq_set_gpio for reading and changing
    the state of GPIO pins described in the topology.
    
    Signed-off-by: Maciej Machnikowski <maciej.machnikowski@intel.com>
    Maciej Machnikowski authored and anguy11 committed Aug 17, 2021
  4. ice: Refactor ice_aqc_link_topo_addr

    Separate link topo parameters and move to the ice_aqc_link_topo_params
    in the ice_aqc_link_topo_addr.
    
    Signed-off-by: Maciej Machnikowski <maciej.machnikowski@intel.com>
    Maciej Machnikowski authored and anguy11 committed Aug 17, 2021
  5. igc: Add support for CBS offloading

    Implement support for Credit-Based Shaper (CBS) Qdisc hardware
    offload mode in the driver. There are two sets of IEEE 802.1Qav
    (CBS) HW logic in the i225 controller, and this patch supports
    enabling them in the top two priority TX queues.
    
    The driver is implemented as recommended by the Foxville External
    Architecture Specification v0.993. Idleslope and Hi-credit are the
    CBS tunable parameters for the i225 NIC, programmed in the TQAVCC
    and TQAVHC registers respectively.
    
    In order for the IEEE 802.1Qav (CBS) algorithm to work as intended
    and provide BW reservation, CBS should be enabled in the highest
    priority queue first. If we enable CBS on any of the lower priority
    queues, the traffic in the high priority queue does not allow the
    low priority queue to be selected for transmission, and bandwidth
    reservation is not guaranteed.
    
    Signed-off-by: Aravindhan Gunasekaran <aravindhan.gunasekaran@intel.com>
    Signed-off-by: Mallikarjuna Chilakala <mallikarjuna.chilakala@intel.com>
    agunasek authored and anguy11 committed Aug 17, 2021
  6. igc: Simplify TSN flags handling

    Separate the procedure done during reset from applying a
    configuration; knowing when the code is executing allows us to
    better separate what changes the hardware state from what changes
    only the driver state.
    
    Introduce a flag for bookkeeping the driver state of TSN features.
    When Qav and frame preemption are also implemented, this flag makes
    it easier to keep track of whether a TSN feature's driver state is
    enabled or not through controller state changes, say, during a
    reset.
    
    Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
    Signed-off-by: Aravindhan Gunasekaran <aravindhan.gunasekaran@intel.com>
    Signed-off-by: Mallikarjuna Chilakala <mallikarjuna.chilakala@intel.com>
    vcgomes authored and anguy11 committed Aug 17, 2021
  7. igc: Use default cycle 'start' and 'end' values for queues

    Set default values for each queue's cycle start and cycle end.
    This allows some simplification in the handling of these
    configurations, as most TSN features in i225 require a cycle to be
    configured.
    
    In i225, cycle start and end times are required to be programmed
    for CBS to work properly.
    
    Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
    Signed-off-by: Aravindhan Gunasekaran <aravindhan.gunasekaran@intel.com>
    Signed-off-by: Mallikarjuna Chilakala <mallikarjuna.chilakala@intel.com>
    vcgomes authored and anguy11 committed Aug 17, 2021
  8. ice: Add support for VF rate limiting

    Implement ndo_set_vf_rate to support setting of min_tx_rate and
    max_tx_rate; set the appropriate bandwidth in the scheduler for the
    node representing the specified VF VSI.
    
    Co-developed-by: Tarun Singh <tarun.k.singh@intel.com>
    Signed-off-by: Tarun Singh <tarun.k.singh@intel.com>
    Signed-off-by: Brett Creeley <brett.creeley@intel.com>
    Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    bcreeley13 authored and anguy11 committed Aug 17, 2021
  9. ice: ndo_setup_tc implementation for PR

    Add tc-flower support for VF port representor devices.
    
    Implement the ndo_setup_tc callback for TC HW offload on VF port
    representor devices. Both methods are implemented: adding and
    deleting tc-flower flows.
    
    Set the NETIF_F_HW_TC bit in the net device's feature set to enable
    the offload TC infrastructure for the port representor.
    
    Implement a TC filter replay function, required to restore filter
    settings while the switchdev configuration is rebuilt.
    
    Signed-off-by: Michal Swiatkowski <michal.swiatkowski@intel.com>
    Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
    mswiatko authored and anguy11 committed Aug 17, 2021
  10. ice: ndo_setup_tc implementation for PF

    Implement ndo_setup_tc net device callback for TC HW offload on PF device.
    
    ndo_setup_tc provides support for HW offloading various TC filters.
    Add support for configuring the following filter with tc-flower:
    - default L2 filters (src/dst mac addresses, ethertype, VLAN)
    - variations of L3, L3+L4, L2+L3+L4 filters using advanced filters
    (including ipv4 and ipv6 addresses).
    
    Allow for adding/removing TC flows when PF device is configured in
    eswitch switchdev mode. Two types of actions are supported at the
    moment: FLOW_ACTION_DROP and FLOW_ACTION_REDIRECT.
    
    Co-developed-by: Priyalee Kushwaha <priyalee.kushwaha@intel.com>
    Signed-off-by: Priyalee Kushwaha <priyalee.kushwaha@intel.com>
    Signed-off-by: Kiran Patil <kiran.patil@intel.com>
    Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
    kapatil authored and anguy11 committed Aug 17, 2021
  11. ice: Allow changing lan_en and lb_en on all kinds of filters

    There is no way to change the default lan_en and lb_en flags while
    adding a new rule. Add a function that allows changing these flags
    on a rule determined by rule id and recipe id.
    
    The function checks if the rule is present on the regular rules
    list or the advanced rules list and calls the appropriate function
    to update the rule entry.
    
    As rules with the ICE_SW_LKUP_DFLT recipe aren't tracked in a list,
    implement a function which updates the flags without searching for
    rules, based only on the rule id.
    
    Signed-off-by: Michal Swiatkowski <michal.swiatkowski@intel.com>
    Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
    mswiatko authored and anguy11 committed Aug 17, 2021
  12. ice: cleanup rules info

    Change ICE_SW_LKUP_LAST to ICE_MAX_NUM_RECIPES, as there can now
    also be recipes other than the default ones.
    
    Free all structures created for advanced recipes in the cleanup
    function. Write a function to clean the structures allocated for
    advanced rule info.
    
    Signed-off-by: Victor Raj <victor.raj@intel.com>
    Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
    vraj-amr authored and anguy11 committed Aug 17, 2021
  13. ice: allow deleting advanced rules

    To remove an advanced rule, the same protocol list as used when
    adding it should be sent to the function. Based on this
    information, the list of advanced rules is searched to find the
    correct rule id.
    
    Remove the advanced rule if it forwards to only one VSI. If it
    forwards to a list of VSIs, remove only the input VSI from that
    list.
    
    Introduce a function to remove a rule by id. It is used in case a
    rule needs to be removed even if it forwards to a list of VSIs.
    
    Allow removing all advanced rules from a particular VSI. This is
    useful when rebuilding a VSI path.
    
    Co-developed-by: Dan Nowlin <dan.nowlin@intel.com>
    Signed-off-by: Dan Nowlin <dan.nowlin@intel.com>
    Signed-off-by: Shivanshu Shukla <shivanshu.shukla@intel.com>
    Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
    shivanshushukla authored and anguy11 committed Aug 17, 2021
  14. ice: allow adding advanced rules

    Define dummy packet headers to allow adding advanced rules in HW.
    This header is used as an admin queue command parameter for adding
    a rule. The firmware will extract the correct fields and use them
    in look-ups.
    
    Define each supported packet header and the offsets to the words
    used in the recipe.
    Supported headers:
    - MAC + IPv4 + UDP
    - MAC + VLAN + IPv4 + UDP
    - MAC + IPv4 + TCP
    - MAC + VLAN + IPv4 + TCP
    - MAC + IPv6 + UDP
    - MAC + VLAN + IPv6 + UDP
    - MAC + IPv6 + TCP
    - MAC + VLAN + IPv6 + TCP
    
    Add code for creating an advanced rule. The rule needs to match a
    defined dummy packet; if it does not, return an error, which means
    that this type of rule isn't currently supported.
    
    The first step in adding an advanced rule is searching for an
    advanced recipe matching this kind of rule. If it doesn't exist, a
    new recipe is created. The dummy packet has to be filled with the
    correct header field values from the rule definition. It will be
    used to do the look-up in HW.
    
    Support searching for an existing advanced rule entry. It is used
    in the case of adding the same rule on a different VSI. In this
    case, instead of creating a new rule, the existing one should be
    updated with a refreshed VSI list.
    
    Add initialization of the prof_res_bm_init flag to zero so that the
    possible resource for fv in the files can be initialized.
    
    Co-developed-by: Dan Nowlin <dan.nowlin@intel.com>
    Signed-off-by: Dan Nowlin <dan.nowlin@intel.com>
    Signed-off-by: Grishma Kotecha <grishma.kotecha@intel.com>
    Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
    Grishma Kotecha authored and anguy11 committed Aug 17, 2021
  15. ice: create advanced switch recipe

    These changes introduce code for creating advanced recipes for the
    switch in hardware.
    
    There are a couple of recipes already defined in the HW. They apply
    to matching on basic protocol headers, like MAC, VLAN, MACVLAN,
    ethertype or direction (promiscuous), etc. If the user wants to
    match on other protocol headers (e.g. IP address, src/dst port,
    etc.) or on a different variation of already supported protocols,
    there is a need to create a new, more complex recipe. That new
    recipe is referred to as an 'advanced recipe', and the filtering
    rule created on top of that recipe is called an 'advanced rule'.
    
    One recipe can have up to 5 words, but the first word is always
    reserved for the match on switch id, so the driver can define up to
    4 words for one recipe. To support recipes with more words, up to 5
    recipes can be chained, so 20 words can be programmed for a look-up.
    
    The input to the add-recipe function is a list of protocols to
    support. Based on this list, the correct profile is chosen; a
    correct profile is one that contains all protocol types from the
    list. Each profile has up to 48 field vector words, and each of
    these words has a protocol id and an offset. These two fields need
    to match the input data for the add-recipe function. If the correct
    profile can't be found, the function returns an error.
    
    The next step after finding the correct profile is grouping words
    into groups. One group can have up to 4 words. This is done to
    simplify sending recipes to HW (because a recipe can also have up
    to 4 words).
    
    In the case of chaining (i.e. when the look-up consists of more
    than 4 words), the last recipe will always have the results from
    the previous recipes used as words.
    
    A recipe-to-profile map is used to store information about which
    profile is associated with a given recipe. This map is an array of
    64 elements (the max number of recipes), and each element is a
    256-bit bitmap (the max number of profiles).
    
    A profile-to-recipe map is used to store information about which
    recipe is associated with a given profile. This map is an array of
    256 elements (the max number of profiles), and each element is a
    64-bit bitmap (the max number of recipes).
    
    Signed-off-by: Dan Nowlin <dan.nowlin@intel.com>
    Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
    DanNowlin authored and anguy11 committed Aug 17, 2021
  16. ice: manage profiles and field vectors

    Implement functions to manage profiles and field vectors in hardware.
    
    In hardware, there are up to 256 profiles, and each of these
    profiles can have 48 field vector words. Each field vector word is
    described by a protocol id and an offset in the packet. To add a
    new recipe, all used profiles need to be searched. If a profile
    contains all required protocol ids and offsets from the recipe, it
    can be used. The driver has to add this profile-to-recipe
    association to tell hardware that the newly added recipe is going
    to be associated with this profile.
    
    The number of used profiles depends on the package. To avoid
    searching across unused profiles, the max profile id value is
    calculated in the init flow. A profile is considered unused when
    all field vector words in the profile are invalid (protocol id 0xff
    and offset 0x1ff).
    
    Profiles are read from the package section ICE_SID_FLD_VEC_SW.
    Empty field vector words can be used for recipe results. Store all
    unused field vector words in prof_res_bm. It is a 256-element array
    (the max number of profiles); each element is a 48-bit bitmap (the
    max number of field vector words).
    
    For now, support only the non-tunnel profile type.
    
    Co-developed-by: Grishma Kotecha <grishma.kotecha@intel.com>
    Signed-off-by: Grishma Kotecha <grishma.kotecha@intel.com>
    Signed-off-by: Dan Nowlin <dan.nowlin@intel.com>
    Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
    DanNowlin authored and anguy11 committed Aug 17, 2021
  17. ice: implement low level recipes functions

    Add code to manage recipes and profiles on admin queue layer.
    
    Allow the driver to add a new recipe and update an existing one.
    Getting a recipe and getting a recipe-to-profile association are
    mostly used in the code that updates existing recipes.
    
    Only default recipes can be updated. An update is done by reading
    recipes from HW, changing their params and calling the add recipe
    command.
    
    Support following admin queue commands:
    - ice_aqc_opc_add_recipe (0x0290) - create a recipe with protocol
    header information and other details that determine how this recipe
    filter works
    - ice_aqc_opc_recipe_to_profile (0x0291) - associate a switch recipe
    to a profile
    - ice_aqc_opc_get_recipe (0x0292) - get details of an existing recipe
    - ice_aqc_opc_get_recipe_to_profile (0x0293) - get a recipe associated
    with profile ID
    
    Define ICE_AQC_RES_TYPE_RECIPE resource type to hold a switch
    recipe. It is needed when a new switch recipe needs to be created.
    
    Co-developed-by: Dan Nowlin <dan.nowlin@intel.com>
    Signed-off-by: Dan Nowlin <dan.nowlin@intel.com>
    Signed-off-by: Grishma Kotecha  <grishma.kotecha@intel.com>
    Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
    Grishma Kotecha authored and anguy11 committed Aug 17, 2021
  18. ice: Use ether_addr_copy() instead of memcpy

    To set netdev->dev_addr, the driver uses memcpy with the length of
    a MAC address. ether_addr_copy() is meant specifically for this, so
    use it.
    
    Signed-off-by: Brett Creeley <brett.creeley@intel.com>
    bcreeley13 authored and anguy11 committed Aug 17, 2021
  19. igc: Add support for PTP getcrosststamp()

    i225 supports PCIe Precision Time Measurement (PTM), allowing us to
    support the PTP_SYS_OFFSET_PRECISE ioctl() in the driver via the
    getcrosststamp() function.
    
    The easiest way to expose the PTM registers would be to configure
    the PTM dialogs to run periodically, but the PTP_SYS_OFFSET_PRECISE
    ioctl() semantics are more aligned with a "one-shot" way of
    retrieving the PTM timestamps. This causes a bit more code to be
    written: the trigger registers for the PTM dialogs are not cleared
    automatically.
    
    i225 can be configured to send "fake" packets with the PTM
    information; adding support for handling these types of packets is
    left for the future.
    
    PTM improves the accuracy of time synchronization, for example, using
    phc2sys, while a simple application is sending packets as fast as
    possible. First, without .getcrosststamp():
    
    phc2sys[191.382]: enp4s0 sys offset      -959 s2 freq    -454 delay   4492
    phc2sys[191.482]: enp4s0 sys offset       798 s2 freq   +1015 delay   4069
    phc2sys[191.583]: enp4s0 sys offset       962 s2 freq   +1418 delay   3849
    phc2sys[191.683]: enp4s0 sys offset       924 s2 freq   +1669 delay   3753
    phc2sys[191.783]: enp4s0 sys offset       664 s2 freq   +1686 delay   3349
    phc2sys[191.883]: enp4s0 sys offset       218 s2 freq   +1439 delay   2585
    phc2sys[191.983]: enp4s0 sys offset       761 s2 freq   +2048 delay   3750
    phc2sys[192.083]: enp4s0 sys offset       756 s2 freq   +2271 delay   4061
    phc2sys[192.183]: enp4s0 sys offset       809 s2 freq   +2551 delay   4384
    phc2sys[192.283]: enp4s0 sys offset      -108 s2 freq   +1877 delay   2480
    phc2sys[192.383]: enp4s0 sys offset     -1145 s2 freq    +807 delay   4438
    phc2sys[192.484]: enp4s0 sys offset       571 s2 freq   +2180 delay   3849
    phc2sys[192.584]: enp4s0 sys offset       241 s2 freq   +2021 delay   3389
    phc2sys[192.684]: enp4s0 sys offset       405 s2 freq   +2257 delay   3829
    phc2sys[192.784]: enp4s0 sys offset        17 s2 freq   +1991 delay   3273
    phc2sys[192.884]: enp4s0 sys offset       152 s2 freq   +2131 delay   3948
    phc2sys[192.984]: enp4s0 sys offset      -187 s2 freq   +1837 delay   3162
    phc2sys[193.084]: enp4s0 sys offset     -1595 s2 freq    +373 delay   4557
    phc2sys[193.184]: enp4s0 sys offset       107 s2 freq   +1597 delay   3740
    phc2sys[193.284]: enp4s0 sys offset       199 s2 freq   +1721 delay   4010
    phc2sys[193.385]: enp4s0 sys offset      -169 s2 freq   +1413 delay   3701
    phc2sys[193.485]: enp4s0 sys offset       -47 s2 freq   +1484 delay   3581
    phc2sys[193.585]: enp4s0 sys offset       -65 s2 freq   +1452 delay   3778
    phc2sys[193.685]: enp4s0 sys offset        95 s2 freq   +1592 delay   3888
    phc2sys[193.785]: enp4s0 sys offset       206 s2 freq   +1732 delay   4445
    phc2sys[193.885]: enp4s0 sys offset      -652 s2 freq    +936 delay   2521
    phc2sys[193.985]: enp4s0 sys offset      -203 s2 freq   +1189 delay   3391
    phc2sys[194.085]: enp4s0 sys offset      -376 s2 freq    +955 delay   2951
    phc2sys[194.185]: enp4s0 sys offset      -134 s2 freq   +1084 delay   3330
    phc2sys[194.285]: enp4s0 sys offset       -22 s2 freq   +1156 delay   3479
    phc2sys[194.386]: enp4s0 sys offset        32 s2 freq   +1204 delay   3602
    phc2sys[194.486]: enp4s0 sys offset       122 s2 freq   +1303 delay   3731
    
    Statistics for this run (total of 2179 lines), in nanoseconds:
      average: -1.12
      stdev: 634.80
      max: 1551
      min: -2215
    
    With .getcrosststamp() via PCIe PTM:
    
    phc2sys[367.859]: enp4s0 sys offset         6 s2 freq   +1727 delay      0
    phc2sys[367.959]: enp4s0 sys offset        -2 s2 freq   +1721 delay      0
    phc2sys[368.059]: enp4s0 sys offset         5 s2 freq   +1727 delay      0
    phc2sys[368.160]: enp4s0 sys offset        -1 s2 freq   +1723 delay      0
    phc2sys[368.260]: enp4s0 sys offset        -4 s2 freq   +1719 delay      0
    phc2sys[368.360]: enp4s0 sys offset        -5 s2 freq   +1717 delay      0
    phc2sys[368.460]: enp4s0 sys offset         1 s2 freq   +1722 delay      0
    phc2sys[368.560]: enp4s0 sys offset        -3 s2 freq   +1718 delay      0
    phc2sys[368.660]: enp4s0 sys offset         5 s2 freq   +1725 delay      0
    phc2sys[368.760]: enp4s0 sys offset        -1 s2 freq   +1721 delay      0
    phc2sys[368.860]: enp4s0 sys offset         0 s2 freq   +1721 delay      0
    phc2sys[368.960]: enp4s0 sys offset         0 s2 freq   +1721 delay      0
    phc2sys[369.061]: enp4s0 sys offset         4 s2 freq   +1725 delay      0
    phc2sys[369.161]: enp4s0 sys offset         1 s2 freq   +1724 delay      0
    phc2sys[369.261]: enp4s0 sys offset         4 s2 freq   +1727 delay      0
    phc2sys[369.361]: enp4s0 sys offset         8 s2 freq   +1732 delay      0
    phc2sys[369.461]: enp4s0 sys offset         7 s2 freq   +1733 delay      0
    phc2sys[369.561]: enp4s0 sys offset         4 s2 freq   +1733 delay      0
    phc2sys[369.661]: enp4s0 sys offset         1 s2 freq   +1731 delay      0
    phc2sys[369.761]: enp4s0 sys offset         1 s2 freq   +1731 delay      0
    phc2sys[369.861]: enp4s0 sys offset        -5 s2 freq   +1725 delay      0
    phc2sys[369.961]: enp4s0 sys offset        -4 s2 freq   +1725 delay      0
    phc2sys[370.062]: enp4s0 sys offset         2 s2 freq   +1730 delay      0
    phc2sys[370.162]: enp4s0 sys offset        -7 s2 freq   +1721 delay      0
    phc2sys[370.262]: enp4s0 sys offset        -3 s2 freq   +1723 delay      0
    phc2sys[370.362]: enp4s0 sys offset         1 s2 freq   +1726 delay      0
    phc2sys[370.462]: enp4s0 sys offset        -3 s2 freq   +1723 delay      0
    phc2sys[370.562]: enp4s0 sys offset        -1 s2 freq   +1724 delay      0
    phc2sys[370.662]: enp4s0 sys offset        -4 s2 freq   +1720 delay      0
    phc2sys[370.762]: enp4s0 sys offset        -7 s2 freq   +1716 delay      0
    phc2sys[370.862]: enp4s0 sys offset        -2 s2 freq   +1719 delay      0
    
    Statistics for this run (total of 2179 lines), in nanoseconds:
      average: 0.14
      stdev: 5.03
      max: 48
      min: -27
    
    For reference, the statistics for runs without PCIe congestion show
    that the improvements from enabling PTM are less dramatic. For two
    runs of 16466 entries:
      without PTM: avg -0.04 stdev 10.57 max 39 min -42
      with PTM: avg 0.01 stdev 4.20 max 19 min -16
    
    One possible explanation is that when PTM is not enabled and there is
    a lot of traffic in the PCIe fabric, some register reads take more
    time than others because of congestion on the fabric.
    
    When PTM is enabled, even if the PTM dialogs take more time to
    complete under heavy traffic, the time measurements themselves do not
    depend on how long the register reads take.
    
    This was implemented following the i225 EAS version 0.993.
    
    Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
    vcgomes authored and anguy11 committed Aug 17, 2021
  20. igc: Enable PCIe PTM

    Enables PCIe PTM (Precision Time Measurement) support in the igc
    driver. Notifies the PCI subsystem that PCIe PTM should be enabled for
    the device.
    
    PCIe PTM is a protocol similar to PTP (Precision Time Protocol) that
    runs in the PCIe fabric; it allows devices to report time measurements
    from their internal clocks together with the correlation to the PCIe
    root clock.
    
    The i225 NIC has registers that expose those time measurements; later
    patches will use those registers to implement the
    PTP_SYS_OFFSET_PRECISE ioctl().
    
    Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
    vcgomes authored and anguy11 committed Aug 17, 2021
  21. PCI: Add pcie_ptm_enabled()

    Add a predicate that returns whether PCIe PTM (Precision Time
    Measurement) is enabled.
    
    It returns true only if PTM is enabled in every port on the path from
    the device to the root.
    
    Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
    Acked-by: Bjorn Helgaas <bhelgaas@google.com>
    vcgomes authored and anguy11 committed Aug 17, 2021
  22. Revert "PCI: Make pci_enable_ptm() private"

    Make pci_enable_ptm() accessible from drivers.
    
    Exposing this to drivers lets them use the 'ptm_enabled' field of
    'pci_dev' to check whether PTM is enabled.
    
    This reverts commit ac6c26d ("PCI: Make pci_enable_ptm() private").
    
    Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
    Acked-by: Bjorn Helgaas <bhelgaas@google.com>
    vcgomes authored and anguy11 committed Aug 17, 2021
  23. ice: Add support to print error on PHY FW load failure

    Some devices have support for loading the PHY FW and in some cases this
    can fail. When this fails, the FW will set the corresponding bit in the
    link info structure. Also, the FW will send a link event if the correct
    link event mask bit is set. Add support for printing an error message
    when the PHY FW load fails during any link configuration flow and the
    link event flow.
    
    Since ice_check_module_power() is already doing something very
    similar, add a new function ice_check_link_cfg_err() so any failures
    reported in the link info's link_cfg_err member can be printed in this
    one function.
    
    Also, add the new ICE_FLAG_PHY_FW_LOAD_FAILED bit to the PF's flags so
    we don't constantly print this error message during link polling if the
    value never changed.
    
    Signed-off-by: Brett Creeley <brett.creeley@intel.com>
    bcreeley13 authored and anguy11 committed Aug 17, 2021
  24. i40e: Fix spelling mistake "dissable" -> "disable"

    There is a spelling mistake in a dev_info message. Fix it.
    
    Signed-off-by: Colin Ian King <colin.king@canonical.com>
    Colin Ian King authored and anguy11 committed Aug 17, 2021
  25. i40e: Fix ATR queue selection

    Without this patch, ATR does not work: receive/transmit queue
    selection falls back to the SW DCB hashing method.
    
    If traffic classes are not configured for the PF, use the
    netdev_pick_tx function to select the queue for packet transmission.
    That is, instead of calling i40e_swdcb_skb_tx_hash, call
    netdev_pick_tx, which ensures that the packet is transmitted/received
    on the CPU that is running the application.
    
    Reproduction steps:
    1. Load i40e driver
    2. Map each MSI interrupt of i40e port for each CPU
    3. Disable ntuple, enable ATR i.e.:
    ethtool -K $interface ntuple off
    ethtool --set-priv-flags $interface flow-director-atr on
    4. Run application that is generating traffic and is bound to a
    single CPU, i.e.:
    taskset -c 9 netperf -H 1.1.1.1 -t TCP_RR -l 10
    5. Observe behavior:
    Application's traffic should be restricted to the CPU provided in
    taskset.
    
    Fixes: 821bd0c990ba ("i40e: Fix queue-to-TC mapping on Tx")
    Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com>
    Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
    kubalewski authored and anguy11 committed Aug 17, 2021
  26. igc: Use num_tx_queues when iterating over tx_ring queue

    Use num_tx_queues rather than the fixed maximum IGC_MAX_TX_QUEUES (4)
    when iterating over the tx_ring array, since the instantiated queue
    count can be less than 4 when the online CPU count is less than 4.
    
    Fixes: ec50a9d ("igc: Add support for taprio offloading")
    Signed-off-by: Toshiki Nishioka <toshiki.nishioka@intel.com>
    Tested-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com>
    Signed-off-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com>
    Acked-by: Sasha Neftin <sasha.neftin@intel.com>
    tnishiok authored and anguy11 committed Aug 17, 2021