### Author: Kubam Ivo 
### Purpose: Algorithms For Big Data Project
### Date: 25/3/2021

**Algorithm 1 (Eliminate-points(m)):** <br>
    **Input:** p1, p2, ..., pn' (in order), where n' is the number of points in the stream.<br> 
    **Output:** Skyline points S' <br>
    1. Let x = 24m. 
    2. **Pass 1:** For j = 1, 2, ..., x, let p'j be a point picked uniformly at random from the stream. <br>
    Let S be the set of such points.<br>
    3. **Pass 2:**
    4. for i = 1, ..., n' do 
    5.     for any p'j, if pi dominates p'j then p'j := pi
    6. end for 
    7. Let S' = {p'1, p'2, ..., p'x}.
    8. **Pass 3:** Delete from the stream all points in S' and all points dominated by any point in S'.
    9. return S' 
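
The pseudocode uses "dominates" without defining it. The sketch below assumes the larger-is-better convention that the `Eliminate` class applies in its comparisons: a point dominates another if it is at least as large in both coordinates and strictly larger in at least one. The names `dominates` and `brute_force_skyline` are hypothetical helpers introduced only for illustration; the quadratic reference skyline is meant as a small-scale check, not as part of the streaming algorithm.

```python
def dominates(p, q):
    """Return True if p dominates q (larger is better in both coordinates)."""
    (x2, y2), (x1, y1) = p, q
    return (x2 >= x1 and y2 >= y1) and (x2 > x1 or y2 > y1)


def brute_force_skyline(points):
    """Reference O(n^2) skyline: the distinct points that no other point dominates."""
    return {p for p in points if not any(dominates(q, p) for q in points)}


points = [(1, 5), (3, 3), (5, 1), (2, 2), (3, 3)]
print(dominates((3, 3), (2, 2)))            # True
print(dominates((3, 3), (3, 3)))            # False: equal points do not dominate each other
print(dominates((1, 5), (5, 1)))            # False: the two points are incomparable
print(sorted(brute_force_skyline(points)))  # [(1, 5), (3, 3), (5, 1)]
```

On small inputs, the output of the streaming code can be compared against `brute_force_skyline` to confirm that every returned point is a true skyline point.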

In [1]:
# generate points
import random

def generate_points(n):
    """Return n random 2-D points with integer coordinates in [1, 100]."""
    data = [(random.randint(1,100),random.randint(1,100)) for _ in range(n)] 
    return data

#stream = generate_points(1000)



In [2]:
# Class for algorithm 1: Eliminate-points (m)
import random
class Eliminate:
    
    """ Class to generate m skyline points from n stream data """
    def __init__(self, m=3):
        self._m = m
        self._x = self._m * 24

    
    #reservoir sampling

    def reservoir_sample(self, stream):
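        """Return a uniform random sample of up to 24*m points from the stream using reservoir sampling."""
        # Never ask for more points than the stream holds.
        k = min(int(24*self._m), len(stream))
        reservoir = [stream[i] for i in range(k)]

        for i in range(k,len(stream)):
            # Pick a slot in 0..i inclusive, so stream[i] is kept with probability k/(i+1).
            j = random.randint(0,i)
            if j < k:
                reservoir[j] = stream[i]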
        return reservoir

    # dominant points

    def dominate(self, stream, reservoir_point):
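        """Receives the points selected by reservoir sampling and replaces a sampled point whenever a stream point dominates it."""
        dominant_point = reservoir_point[:]
        for i in range(len(stream)):
            # Compare the stream point against one randomly chosen sample slot
            # (Pass 2 of the pseudocode checks every p'j instead).
            idx = random.randrange(len(dominant_point))

            x1, y1 = dominant_point[idx]
            x2, y2 = stream[i]

            # stream[i] dominates the sample: at least as large in both coordinates, strictly larger in one.
            if (x2 >= x1 and y2 >= y1) and (x2 > x1 or y2 > y1):
                dominant_point[idx] = stream[i]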
        return dominant_point

    # Final pass
    def remove_point_stream(self, stream, skyline_points):
        """Delete from the stream every point that is in skyline_points or is dominated by one of them."""
        skyline_set = set(skyline_points)
        output_stream = [point for point in stream if point not in skyline_set]

        for x2, y2 in skyline_set:
            # Rebuild the list instead of calling remove() while iterating,
            # which would skip the element after each removal.
            output_stream = [
                (x1, y1) for (x1, y1) in output_stream
                if not ((x2 >= x1 and y2 >= y1) and (x2 > x1 or y2 > y1))
            ]
        return output_stream
    
    


In [7]:
%%time
stream = generate_points(10000)
output_stream = stream[:]

Wall time: 111 ms


In [8]:
%%time
import math

m = 1            # initial m' from Algorithm 2
n = len(stream)  # number of points in the stream
test = Eliminate(int(m*math.log(n*math.log(n))))
reservoir_pts = test.reservoir_sample(output_stream)
sky_pts = test.dominate(output_stream,reservoir_pts)
output_stream = test.remove_point_stream(output_stream,sky_pts)

Wall time: 631 ms


**Algorithm 2 (Streaming RAND):** 
    1: Let n be the number of points in the input stream. 
    Let m' = 1. 
    2: while the input stream is not empty do: 
    3: Let n' be the current number of points in the stream. 
    4: Call Eliminate-points(m' log(n log n)). 
    5: If more than n'/2 points are left in the stream, set m' = 2m'. 
    6: end while 
    Remark: If the stream cannot be changed, we do not have to actually delete points from it. 
    We only keep the skyline points found so far and consider only the points in the stream that are not dominated by any of them. 
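
The loop and the remark above can be sketched as a self-contained function that never modifies its input. This is a minimal illustration under stated assumptions, not the notebook's implementation: `dominates`, `eliminate_points`, `streaming_rand`, and the `seed` parameter are hypothetical names; Pass 1 uses `random.Random.sample` on an in-memory list rather than a true streaming reservoir; Pass 2 replaces every sampled point that the current stream point dominates; and the guard for very small n (where log(n log n) is undefined or non-positive) is an added assumption.

```python
import math
import random


def dominates(p, q):
    """True if p is at least as large as q in both coordinates and differs from q."""
    return p[0] >= q[0] and p[1] >= q[1] and p != q


def eliminate_points(stream, m, rng):
    """Passes 1 and 2 of Algorithm 1: return the set S' of skyline points found."""
    x = min(24 * m, len(stream))
    sample = rng.sample(stream, x)  # Pass 1 (assumption: in-memory sampling)
    for p in stream:                # Pass 2: upgrade every dominated sample point
        sample = [p if dominates(p, q) else q for q in sample]
    return set(sample)


def streaming_rand(points, seed=0):
    """Sketch of Algorithm 2 that keeps the input intact, as in the remark."""
    rng = random.Random(seed)
    n = len(points)
    # Assumption: fall back to a factor of 1 when n is too small for log(n log n).
    scale = math.log(n * math.log(n)) if n > 2 else 1.0
    skyline, m = set(), 1
    live = list(points)
    while live:
        n_before = len(live)  # n' in the pseudocode
        skyline |= eliminate_points(live, max(1, int(m * scale)), rng)
        # Pass 3 without deletion: re-filter the input against the skyline so far.
        live = [p for p in points
                if p not in skyline and not any(dominates(s, p) for s in skyline)]
        if len(live) > n_before / 2:
            m *= 2
    return skyline


demo_rng = random.Random(7)
demo = [(demo_rng.randint(1, 50), demo_rng.randint(1, 50)) for _ in range(400)]
print(sorted(streaming_rand(demo, seed=3)))
```

Each round finds at least one skyline point among the live points and then excludes it, so the loop terminates. Re-filtering the full input every round costs time proportional to n times the number of skyline points found so far, which is acceptable for a demonstration but not for a large stream.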
        

In [3]:
%%time
m= 1
stream = generate_points(10000)
output_stream = stream[:]
n = len(stream)
n_prime = len(stream)

Wall time: 24.4 ms


In [4]:
%%time
import math


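while n_prime > 0:
    n_before = n_prime  # n' in Algorithm 2: points in the stream before this round
    test = Eliminate(int(m*math.log(n*math.log(n))))
    reservoir_pts = test.reservoir_sample(output_stream)
    sky_pts = test.dominate(output_stream,reservoir_pts)
    output_stream = test.remove_point_stream(output_stream,sky_pts)
    n_prime = len(output_stream)
    if n_prime > n_before/2:
        # More than half of the points survived this round: double m' and retry.
        m = 2*m
    else:
        # Stop after the first round that removes at least half of the stream;
        # Algorithm 2 itself keeps looping until the stream is empty.
        break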


    


Wall time: 596 ms


## Fixed Window

### Random Access

In [6]:

stream = generate_points(100000)
w = int(0.01 * len(stream))
output_stream = stream[:]
n = len(output_stream)

In [7]:
import math


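# Repeatedly run Eliminate-points with a sample of about w points (x = 24 * (w / 24))
# until every point has been removed from the stream.
while n > 0:
    test = Eliminate(w/24)
    reservoir_pts = test.reservoir_sample(output_stream)
    sky_pts = test.dominate(output_stream,reservoir_pts)
    output_stream = test.remove_point_stream(output_stream,sky_pts)
    n = len(output_stream)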