# Example: Counting Streams: Misra-Gries Algorithm

The <b>Misra-Gries algorithm</b> identifies candidates for the (at most N) "<b>heavy
hitters</b>" in a stream. A heavy hitter in a stream
of length <b>L</b> is an item that appears at least <b>ceiling(L/N)</b> times
in the stream. For example, with a stream of a million elements and N = 10,
a heavy hitter is an item that appears at least 100,000 times in the stream.
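
To make the definition concrete, here is a minimal brute-force sketch that counts every item and keeps those seen at least ceiling(L/N) times. The helper name `exact_heavy_hitters` is hypothetical (it is not part of the original notebook), and the input is the eight-element stream used in the test later in this example. Unlike Misra-Gries, this approach stores a counter for every distinct item, so it only serves as a reference for small inputs.

```python
import math
from collections import Counter


def exact_heavy_hitters(items, N):
    """Return {item: count} for items seen at least ceil(L/N) times.

    Hypothetical reference helper: it keeps one counter per distinct
    item, which is exactly the cost Misra-Gries avoids.
    """
    L = len(items)
    threshold = math.ceil(L / N)
    counts = Counter(items)
    return {item: c for item, c in counts.items() if c >= threshold}


stream = [3, 2, 3, 2, 4, 4, 4, 4]   # L = 8
print(exact_heavy_hitters(stream, N=2))  # threshold is 4; prints {4: 4}
```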


The algorithm guarantees that all heavy hitters
are identified as candidates; however, not all identified
candidates are necessarily heavy hitters. To check whether
a candidate is a heavy hitter we run through the stream again,
counting the number of times that each candidate appeared.
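
The second pass described above might look like the following sketch. The function name `confirm_heavy_hitters` is an assumption made for illustration, and the threshold is taken to be ceiling(L/N) occurrences. Only the candidates are counted, so the memory used stays proportional to the number of candidates rather than to the number of distinct items.

```python
import math


def confirm_heavy_hitters(stream, candidates, N):
    """Second pass: count only the candidates, then keep true heavy hitters.

    Hypothetical helper; `candidates` is the dict left by the first pass.
    """
    counts = dict.fromkeys(candidates, 0)
    length = 0
    for v in stream:
        length += 1
        if v in counts:
            counts[v] += 1
    # Require at least one occurrence so an empty stream confirms nothing.
    threshold = max(1, math.ceil(length / N))
    return {item: c for item, c in counts.items() if c >= threshold}


# Candidates {4: 2} are what the first pass leaves for this stream with N = 2.
print(confirm_heavy_hitters([3, 2, 3, 2, 4, 4, 4, 4], {4: 2}, N=2))  # {4: 4}

# For [1, 2, 3, 4, 5] with N = 2 the first pass leaves {4: 1, 5: 1},
# but neither item reaches the threshold of 3, so both are rejected.
print(confirm_heavy_hitters([1, 2, 3, 4, 5], {4: 1, 5: 1}, N=2))  # {}
```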
   
The function below updates the candidate counts for each new element
of the input stream, following the Misra-Gries algorithm.

See:
    http://www.cs.utexas.edu/users/misra/Notes.dir/HeavyHitters.pdf.

In [1]:
def misra_gries_process_element(v, candidates, inputs, N):
    """
    Parameters
    ----------
    v: object
        An element of the input stream
    candidates: dict
        key: item of input stream
        value: int
             A lower bound on the number of times the key has
             appeared on the input stream.
    inputs: dict
        key: item
        value: number of times the item appears in the stream
        THIS IS USED ONLY FOR DEBUGGING AND EXPLANATION!
        REMOVE FOR USE IN AN APPLICATION.
    N: positive integer (constant)
       The maximum number of candidates. A heavy hitter appears
       at least L/N times in a stream of length L.

    Returns
    -------
    None
        Updates candidates and inputs in place.

    """
    
    # If the input element is in candidates then increment
    # its count.
    if v in candidates:
        candidates[v] += 1
    # If the input element is not in candidates and there are
    # fewer than N candidates, insert the input element in
    # candidates.
    elif len(candidates) < N:
        candidates[v] = 1
    # If the input element is not in candidates and the
    # number of candidates is N, then decrement the counts of
    # all candidates. The input element itself is not stored.
    else:
        for key in candidates:
            candidates[key] -= 1
    # Remove candidates whose count is reduced to 0.
    zero_count_candidates = [
        key for key, count in candidates.items() if count == 0]
    for candidate in zero_count_candidates:
        del candidates[candidate]

    # FOR DEBUGGING AND EXPLANATION ONLY: UPDATE INPUTS
    inputs[v] = inputs.get(v, 0) + 1

    # PRINTS FOR EXPLANATION.
    print('inputs')
    print(inputs)
    print('candidates')
    print(candidates)
    print()

In [4]:
def test_Misra_Gries():
    from stream import Stream, run
    from example_operators import single_item

    # Declare streams.
    x = Stream('input')

    # Create the Agent
    single_item(in_stream=x,
        func=misra_gries_process_element, candidates={},
        inputs={}, N=2)

    # Put data into streams and run.
    x.extend([3])
    run()

    x.extend([2])
    run()

    x.extend([3])
    run()

    x.extend([2])
    run()

    x.extend([4])
    run()

    x.extend([4, 4, 4])
    run()

In [3]:
test_Misra_Gries()

inputs
{3: 1}
candidates
{3: 1}

inputs
{3: 1, 2: 1}
candidates
{3: 1, 2: 1}

inputs
{3: 2, 2: 1}
candidates
{3: 2, 2: 1}

inputs
{3: 2, 2: 2}
candidates
{3: 2, 2: 2}

inputs
{3: 2, 2: 2, 4: 1}
candidates
{3: 1, 2: 1}

inputs
{3: 2, 2: 2, 4: 2}
candidates
{}

inputs
{3: 2, 2: 2, 4: 3}
candidates
{4: 1}

inputs
{3: 2, 2: 2, 4: 4}
candidates
{4: 2}
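
The trace above can be checked without the `stream` and `example_operators` modules. The following is a minimal sketch in plain Python: the helper `misra_gries_update` is a hypothetical stand-in that applies the same three rules as `misra_gries_process_element`, without the debugging dictionary or the prints, and the loop replays the same eight inputs with N = 2.

```python
def misra_gries_update(v, candidates, N):
    """Apply one Misra-Gries step for element v (hypothetical helper)."""
    if v in candidates:
        candidates[v] += 1
    elif len(candidates) < N:
        candidates[v] = 1
    else:
        # Table is full: decrement every count and drop those that reach 0.
        # Iterate over a copy of the keys because entries are deleted.
        for key in list(candidates):
            candidates[key] -= 1
            if candidates[key] == 0:
                del candidates[key]


candidates = {}
trace = []
for v in [3, 2, 3, 2, 4, 4, 4, 4]:
    misra_gries_update(v, candidates, N=2)
    trace.append(dict(candidates))  # snapshot after each element

for snapshot in trace:
    print(snapshot)
# The last three lines printed are {}, {4: 1}, {4: 2},
# matching the candidates shown in the trace above.
```

The fifth input is the instructive one: the table already holds 3 and 2, so the arriving 4 is discarded and both counts drop by one. The sixth input empties the table, after which 4 can finally enter as a candidate.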

