# Generating Implementation Results

Here we will programmatically generate implementation results for different instances of our filter designs with Vivado.

We'll do a parameter sweep over:
  * parallelisms
  * number of taps
  * type of coefficient set (i.e. minimum/non-linear phase, types I/II/III/IV, or half-band)
  * and filter structure (our FFA + MCM, our Polyphase + MCM, and LogiCORE FIR Compiler)

Representations of the different coefficient sets are defined in `coeffs.py`, while the representations and implementation functions for each filter structure are defined in `filters.py`.

In [None]:
import numpy as np
import pandas as pd
import plotly.express as px
from math import *

from coeffs import ResponseType, CoeffsMinPhase, CoeffsType1Pad, CoeffsType2, CoeffsHalfBand
from filters import SsrFilter, FfaMcmFilter, PolyMcmFilter

## Utilisation Results

This section generates results for circuit resource usage, including DSPs, CLBs, LUTs, and FFs.

Let's define a list of constructors for each coefficient style and another for each filter structure. After that, all we need is a few nested loops around our `fir.impl()` call. This will take a _while_... somewhere on the order of a whole weekend!

In [None]:
coeff_constrs = [
    lambda taps : CoeffsMinPhase(ResponseType.LP, [0,0.3,0.4,1.0], taps, 16),
    lambda taps : CoeffsType1Pad(ResponseType.LP, [0,0.3,0.4,1.0], taps, 16),
    lambda taps :    CoeffsType2(ResponseType.LP, [0,0.3,0.4,1.0], taps, 16),
    lambda taps : CoeffsHalfBand(taps, 16)
]

fir_constrs = [
    lambda par, taps, ws, out_dir :     SsrFilter(par, 16, ws, 775.0, out_dir),
    lambda par, taps, ws, out_dir : PolyMcmFilter(par, 16, ws, 775.0, out_dir),
    lambda par, taps, ws, out_dir :  FfaMcmFilter(par, 16, ws, 775.0, out_dir),
]

results = []

for par in [2,4,8,16]:
    for taps in [2,4,8,16,32,64,128]:
        if par > taps:
            continue
        for f_coef in coeff_constrs:
            for f_fir in fir_constrs:
                ws = f_coef(taps)
                fir = f_fir(par, taps, ws, './outputs')
                res = fir.impl()
                print(res)
                results.append(res)
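
To get a feel for how long the sweep above will take, it can help to count the runs before launching Vivado. The sketch below mirrors the loop bounds of the cell above using only the standard library. The counts of four coefficient constructors and three filter constructors are taken from the lists above, and `count_runs` is a hypothetical helper name.

```python
from itertools import product

def count_runs(pars, taps_list, n_coeff_sets, n_structures):
    """Count implementation runs, skipping cases where parallelism exceeds taps."""
    valid = [(p, t) for p, t in product(pars, taps_list) if p <= t]
    return len(valid) * n_coeff_sets * n_structures

n_runs = count_runs([2, 4, 8, 16], [2, 4, 8, 16, 32, 64, 128], 4, 3)
print(n_runs)  # 264
```

Multiplying this count by the measured time of a single run gives a rough estimate of the total.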

So we've generated everything we need! All the Vivado runs will be in the `outputs/` folder. While we have the parsed results in memory though, let's save them as a CSV for posterity.

In [None]:
df = pd.DataFrame(results)

df.to_csv('outputs/full_impl.csv')

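And of course, if you want to load the results back in again, you can run `df = pd.read_csv('outputs/full_impl.csv', index_col=0)`. The `index_col=0` stops the saved row index from coming back as an extra unnamed column.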

Below we plot our results for CLB usage and DSP usage.

In [None]:
display(px.line(df, x='taps', facet_col='parallelism', facet_row='coeff_class',y='clbs', color='structure', height=1000))

In [None]:
display(px.line(df, x='taps', facet_col='parallelism', facet_row='coeff_class',y='dsps', color='structure', height=1000))

Let's export a flattened CSV file for each of these metrics. This is a little bit easier to work with in LaTeX/pgfplots than the earlier CSV.

In [None]:
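# Scale each count to a percentage by dividing by 4272 DSPs and 53160 CLBs
# (presumably the totals available on the target device)
for (field, scale) in [('dsps', 100/4272), ('clbs', 100/53160)]: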
    df_export = df.copy()
    df_export[field] = df_export[field]*scale
    df_export = df_export.pivot(index=['taps'], columns=['structure','parallelism', 'coeff_class'], values=field)
    df_export.columns = ['_'.join([str(c) for c in col]) for col in df_export.columns.values]

    df_export.to_csv(f'outputs/summary_{field}.csv', index = True, header=True)
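
The exported column headers follow from the flattening step above: each column of the pivoted frame is a `(structure, parallelism, coeff_class)` tuple, joined with underscores. The sketch below reproduces just that naming and the percentage scaling in plain Python. The structure and coefficient-class labels are made up for illustration, since the actual strings come from `filters.py` and `coeffs.py`.

```python
def flatten_column(col):
    """Join a (structure, parallelism, coeff_class) tuple into one header."""
    return '_'.join(str(c) for c in col)

def to_percent(count, total):
    """Scale a raw resource count to a percentage of the total."""
    return count * 100 / total

# Hypothetical labels, for illustration only
header = flatten_column(('ffa_mcm', 8, 'half_band'))
print(header)                  # ffa_mcm_8_half_band
print(to_percent(1068, 4272))  # 25.0
```

Each row of the summary CSV is then one tap count, with one such column per plotted curve.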

## Maximum Clock Frequency Results

We'll rerun a subset of the above tests (let's say only the x8 parallel half-band filters) and try to estimate the maximum clock frequency. We'll implement each design multiple times, searching through a range of frequencies for the highest target frequency which meets timing. The search is directed by the achieved clock period estimate, `clkT - WNS`, whose reciprocal gives the achieved clock rate.
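
As a worked example of the `clkT - WNS` estimate: the achieved period is the target period minus the worst negative slack (WNS), and the achieved frequency is its reciprocal. The helper below is a sketch assuming frequencies in MHz and slack in ns. `achieved_fclk_mhz` is a hypothetical name, and how `filters.py` actually computes `fclk_actual` is not shown here.

```python
def achieved_fclk_mhz(f_target_mhz, wns_ns):
    """Estimate the achieved clock rate from the target rate and WNS.

    Assumes MHz for frequency and ns for slack (an assumption of this sketch).
    """
    clk_t_ns = 1000.0 / f_target_mhz   # target period, clkT
    achieved_t_ns = clk_t_ns - wns_ns  # negative WNS lengthens the period
    return 1000.0 / achieved_t_ns

# Failing timing by 0.1 ns at a 775 MHz target
print(round(achieved_fclk_mhz(775.0, -0.1), 1))  # 719.3
```

With zero slack the estimate equals the target, and positive slack pushes it above the target.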

Let's start by implementing our frequency search helper.

In [None]:
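def fmax_search(impl_fun, iters, f_start, f_max, f_min):
    """Search for the highest target clock frequency that meets timing.

    `impl_fun(f)` implements the design at target frequency `f` and returns a
    dict with at least 'fclk_actual' and 'met_timing'. Returns the best passing
    frequency and its result (0 and {} if no run met timing).
    """
    f_cur = f_start
    f_best = 0
    res_best = {}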
    
    # Run iterations
    for _ in range(iters):
        print(f'Trying {f_cur} with range ({f_min}->{f_max})')
        
        result = impl_fun(f_cur)
        f_got = min(f_max, result['fclk_actual']) # Restrict actual range to physically possible
        met_timing = result['met_timing']
        
        print(f'Got clk of {f_got} and met = {met_timing}')
        
        # Run passed
        if met_timing:
            if f_cur > f_best:
                f_best = f_cur
                res_best = result
                
            if f_max - f_cur < 1:
                print(f'Got pretty close to maximum with {f_cur}, stopping early')
                return f_best, res_best
            # Adjust range to between achieved and max
            else:
                f_min = max(f_got, f_best)
                f_cur = (f_max + f_min)/2

        # Run failed, move range left
        else:
            f_max = max(f_cur, f_best)
            f_min = max(f_got, f_best)
            f_cur = f_min
            
    return f_best, res_best
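
Since every call to `impl_fun` is a full Vivado run, it can be worth dry-running the search logic against a stand-in first. The sketch below builds a fake implementation function that returns the two keys `fmax_search` reads, `'fclk_actual'` and `'met_timing'`. `make_fake_impl` and its idealised timing model (the achieved clock always equals a fixed true maximum) are assumptions for illustration, not part of `filters.py`.

```python
def make_fake_impl(true_fmax):
    """Return a stand-in for impl_fun with an idealised timing model."""
    def fake_impl(f_target):
        return {
            'fclk_actual': true_fmax,             # achieved clock estimate
            'met_timing': f_target <= true_fmax,  # pass at or below the true maximum
        }
    return fake_impl

impl = make_fake_impl(600.0)
print(impl(775.0)['met_timing'])  # False
print(impl(550.0)['met_timing'])  # True
```

Passing such a function as `impl_fun`, for example `fmax_search(make_fake_impl(600.0), 6, 775, 775, 400)`, exercises the search without invoking Vivado.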

Now let's loop through our tests, similar to the utilisation results. Note that we're using the "loopback" run now, which implements a full design (not out-of-context) where the filter sits in a loop with only AXIS data width converter and AXIS clock converter blocks. We're doing this because the out-of-context implementation doesn't appear to include clock routing, which gives an unrealistic set of timing results. We're implementing with the `Performance_Explore` strategy.

In [None]:
coeff_constrs = [lambda taps : CoeffsHalfBand(taps, 16)]

fir_constrs = [
    lambda par, taps, ws, out_dir : (lambda f :  FfaMcmFilter(par, 16, ws, f, out_dir).impl(run='loopback')),
    lambda par, taps, ws, out_dir : (lambda f : PolyMcmFilter(par, 16, ws, f, out_dir).impl(run='loopback')),
    lambda par, taps, ws, out_dir : (lambda f :     SsrFilter(par, 16, ws, f, out_dir).impl(run='loopback')),
]

clk_results = []

for par in [8]:
    for taps in [8,16,32,64,128]:
        for f_coef in coeff_constrs:
            for f_fir in fir_constrs:
                ws = f_coef(taps)
                f_best, res = fmax_search(
                    f_fir(par, taps, ws, './outputs/fmax'),
                    iters=6, f_start=775, f_max=775, f_min=400
                )
                res['fclk_max'] = f_best
                print(res)
                clk_results.append(res)

Let's package the results up as a pandas DataFrame, then export the full and summary results to CSV.

In [None]:
df_fmax = pd.DataFrame(clk_results)
df_fmax.to_csv('outputs/full_fmax.csv')

df_export = df_fmax.copy().pivot(index=['taps'], columns=['structure','parallelism', 'coeff_class'], values='fclk_max')
df_export.columns = ['_'.join([str(c) for c in col]) for col in df_export.columns.values]
df_export.to_csv('outputs/summary_fmax.csv', index = True, header=True)

...and finally we'll plot the results below.

In [None]:
display(px.line(df_fmax, x='taps', facet_col='parallelism', facet_row='coeff_class',y='fclk_max', color='structure'))