### Входные данные

You have a data stream (the `data_stream` generator). The fields are random variables; this simplifies data generation. There are three fields, named after the difficulty level of the task.

### Task
##### Motivation:
You have many time series and want to predict the next value from the previous 1000. A 1000-value window is too many features, so you decide to replace it with 5: the mean, variance, minimum, median and maximum. All of these features have to be computed, and quickly (within an hour).
##### For each field, do the following:

1. Slide a window of size 1000 over the data (the window moves by 1, so each window overlaps the previous one in 999 elements).

2. For each window, compute the mean of the field and its variance. `yield` these values, producing a generator of tuples.

3. For each window, find its minimum, median and maximum. `yield` these values, producing a generator of tuples.

The answer to submit to the Google Form is the element-wise mean of the tuples over the resulting stream, rounded to 2 decimal places.
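Steps 1 and 2 can be sketched as a generator that keeps running sums of the values and of their squares. With those sums, sliding the window by one costs O(1) instead of O(1000). This is a minimal sketch, not the notebook's implementation. The name `window_mean_var` is hypothetical. The sketch assumes integer values and the sample variance with `ddof=1`, which is the convention the notebook later checks against `np.var(..., ddof = 1)`.

```python
from collections import deque
from typing import Iterable, Iterator, Tuple


def window_mean_var(stream: Iterable[int], size: int = 1000) -> Iterator[Tuple[float, float]]:
    """Yield (mean, sample variance) for every full window of `size` values.

    Running sums are Python ints, so they never overflow and the
    variance numerator n*S2 - S^2 is computed exactly.
    """
    window = deque()
    s = 0   # sum of the values in the window
    s2 = 0  # sum of their squares
    for x in stream:
        window.append(x)
        s += x
        s2 += x * x
        if len(window) > size:
            old = window.popleft()  # the element leaving the window
            s -= old
            s2 -= old * old
        if len(window) == size:
            # Var (ddof=1) = (n*S2 - S^2) / (n*(n-1))
            yield s / size, (size * s2 - s * s) / (size * (size - 1))
```

For example, `list(window_mean_var([1, 2, 3, 4], size=2))` gives `[(1.5, 0.5), (2.5, 0.5), (3.5, 0.5)]`.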

### Notes

1. Pay attention to how the fields are generated. Try to understand the specific property of each field and how it can be exploited. Ideally, each field should get its own solution that makes the most of what is known about that field.
2. Useful libraries: itertools, numpy, collections, plus anything you find online that can be installed with pip install.
3. **For a sorted array arr, take the median to be arr[len(arr) // 2].**
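Note 3 means that for an even-sized window the median is the upper of the two middle elements, not their average. With a window of 1000 it is the element at index 500 of the sorted window. Here is a tiny illustration; the helper name `task_median` is hypothetical.

```python
def task_median(values):
    """Median as defined in the task: sorted(values)[len(values) // 2]."""
    arr = sorted(values)
    return arr[len(arr) // 2]


# For an even-sized input this is the upper middle element, not an average:
print(task_median([4, 1, 3, 2]))  # sorted [1, 2, 3, 4] -> index 2, prints 3
```

The textbook definition, for example `statistics.median`, would return 2.5 for the same input.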



Taking the running time of the `example` function as the unit, the approximate running times are:
- mean and variance together: 1.17
- minimum, maximum and median together:
  - easy: 0.87
  - medium: 2.11
  - nightmare: 2.85


#### Data generation

In [1]:
from collections import namedtuple
import random

Record = namedtuple('Record', 'easy medium nightmare')

def data_stream():
    random_generator = random.Random(42)
    easy = 0
    for _ in range(10000000):
        easy += random_generator.randint(0, 2) 
        medium = random_generator.randint(0, 256 - 1)
        nightmare = random_generator.randint(0, 1000000000 - 1)
        
        yield Record(
            easy=easy,
            medium=medium,
            nightmare=nightmare
        )
        
def easy_stream():
    for record in data_stream():
        yield record.easy
        
def medium_stream():
    for record in data_stream():
        yield record.medium
        
def nightmare_stream():
    for record in data_stream():
        yield record.nightmare
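The generator shows the key property of `easy`: it is a cumulative sum of increments drawn from {0, 1, 2}, so the sequence never decreases. Every window of `easy` is therefore already sorted. Its minimum, median (by the task's rule) and maximum are simply the first element, the element at index `size // 2`, and the last element. Below is a minimal sketch under that assumption. The name `easy_min_median_max` is hypothetical, and the shortcut is valid only for a non-decreasing stream.

```python
from collections import deque
from typing import Iterable, Iterator, Tuple


def easy_min_median_max(stream: Iterable[int], size: int = 1000) -> Iterator[Tuple[int, int, int]]:
    """Yield (min, median, max) per window for a NON-DECREASING stream.

    Because the stream never decreases, each window is already sorted:
    min is window[0], max is window[-1], median is window[size // 2].
    """
    window = deque(maxlen=size)  # appending to a full deque drops the oldest item
    for x in stream:
        window.append(x)
        if len(window) == size:
            yield window[0], window[size // 2], window[-1]
```

Usage would be `easy_min_median_max(easy_stream(), 1000)`. No sorting is needed at all, only an index lookup per window.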

#### Computing the mean tuple over a stream

In [2]:
import numpy as np

def get_tuple_stream_mean(stream, number_of_values):
    result = np.zeros(number_of_values, dtype='object')
    count = 0. 
    for streamed_tuple in stream:
        result += streamed_tuple
        count += 1
    return ['{:0.2f}'.format(x) for x in result / count]

In [3]:
%%time
def example(stream):
    for value in stream:
        yield (value, value + 10)
print(get_tuple_stream_mean(example(medium_stream()), 2))

['127.48', '137.48']
CPU times: user 1min 50s, sys: 424 ms, total: 1min 51s
Wall time: 1min 51s


In [4]:
%%time
# Re-generate the same stream as data_stream() (same seed, same call order)
# and materialize each field into its own list.
random_generator = random.Random(42)
easy = 0
arr_easy = []
arr_medium = []
arr_nightmare = []

for _ in range(10000000):
    easy += random_generator.randint(0, 2)
    medium = random_generator.randint(0, 256 - 1)
    nightmare = random_generator.randint(0, 1000000000 - 1)
    arr_easy.append(easy)
    arr_medium.append(medium)
    arr_nightmare.append(nightmare)

CPU times: user 57.3 s, sys: 557 ms, total: 57.9 s
Wall time: 58 s


In [35]:
%%time
import numpy as np
arr_easy = np.asarray(arr_easy)
arr_medium = np.asarray(arr_medium)
# float64 avoids int64 overflow in the per-window sum of squares
# (~3.2e20, while the int64 maximum is ~9.2e18), at the cost of rounding.
arr_nightmare = np.asarray(arr_nightmare, dtype=np.float64)

CPU times: user 26.1 ms, sys: 32 ms, total: 58.1 ms
Wall time: 56.6 ms


### Function computing the mean and variance of each window

In [6]:
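def var_mean(arr, frame):
    """Sliding-window mean and sample variance (ddof=1), O(1) per step.

    Keeps running sums of x and x**2: when the window moves by one,
    the outgoing element arr[k] is subtracted and the incoming arr[i] added.
    Returns (means, variances, window sums, window sums of squares,
             indices of outgoing elements, indices of incoming elements).
    """
    sums = sum(arr[:frame])
    squared_sums = sum(arr[:frame] * arr[:frame])
    mean = sums / frame
    # Var = (sum(x^2) - 2*mean*sum(x) + frame*mean^2) / (frame - 1)
    var = (squared_sums - 2 * mean * sums + frame * mean * mean) / (frame - 1)

    means = [mean]
    variations = [var]
    mysums = [sums]
    mysquared_sums = [squared_sums]
    kters = []  # index of the element leaving the window
    iters = []  # index of the element entering the window

    k = 0
    # Start right after the first window (the original hard-coded i > 999,
    # which is only correct for frame == 1000).
    for i in range(frame, len(arr)):
        sums = sums - arr[k] + arr[i]
        squared_sums = squared_sums - arr[k] * arr[k] + arr[i] * arr[i]

        mean = sums / frame
        var = (squared_sums - 2 * mean * sums + frame * mean * mean) / (frame - 1)

        means.append(mean)
        variations.append(var)
        mysums.append(sums)
        mysquared_sums.append(squared_sums)
        kters.append(k)
        iters.append(i)
        k += 1

    return (means, variations, mysums, mysquared_sums, kters, iters)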
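The update rule in `var_mean` follows from expanding the sample variance of a window of $n$ values in terms of the window sums $S = \sum_j x_j$ and $Q = \sum_j x_j^2$. These symbols are introduced here for illustration; in the code they are `sums` and `squared_sums`, with $n$ = `frame`.

$$
\bar{x} = \frac{S}{n}, \qquad
s^2 = \frac{1}{n-1}\sum_{j}\left(x_j - \bar{x}\right)^2
    = \frac{Q - 2\bar{x}S + n\bar{x}^2}{n-1}
    = \frac{Q - S^2/n}{n-1}.
$$

When the window slides by one, the element $x_k$ leaves and $x_i$ enters, so

$$
S' = S - x_k + x_i, \qquad Q' = Q - x_k^2 + x_i^2 .
$$

Both updates are O(1). Note that in floating point the subtraction $Q - S^2/n$ loses relative precision when the mean is large compared with the standard deviation, because the two terms are then nearly equal.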

In [7]:
%%time
means_easy = var_mean(arr_easy, 1000) 

CPU times: user 43.3 s, sys: 1.03 s, total: 44.3 s
Wall time: 44.4 s


In [8]:
means_easy[0][:4]

[503.78899999999999, 504.803, 505.81599999999997, 506.82999999999998]

In [9]:
np.mean(arr_easy[:1000])

503.78899999999999

In [10]:
%%time
means_medium = var_mean(arr_medium, 1000)

CPU times: user 43.3 s, sys: 1.27 s, total: 44.6 s
Wall time: 44.9 s


In [38]:
%%time
means_nightmare = var_mean(arr_nightmare, 1000)

CPU times: user 29.3 s, sys: 1.58 s, total: 30.8 s
Wall time: 30.9 s


In [43]:
means_medium[1][:4]

[5504.8559269269253,
 5506.6103663663671,
 5515.7187177177193,
 5518.3714714714733]

In [57]:
np.var(arr_medium[2:1002], ddof = 1) - means_medium[1][2]

-1.8189894035458565e-12

In [40]:
means_nightmare[1][:4]

[77760918748414528.0,
 77781217654627856.0,
 77739651100151280.0,
 77773432400888576.0]

In [55]:
np.var(arr_nightmare[4:1004], ddof = 1)- means_nightmare[1][4]

-48.0

In [36]:
squared_sums = arr_nightmare[:1000]**2

In [37]:
sum(squared_sums)

3.2050617974655694e+20
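The value above, the sum of squares of the first `nightmare` window, is about 3.2e20. That exceeds both 2^53 ≈ 9.0e15 and the int64 maximum of about 9.2e18. Past 2^53, float64 can no longer represent every integer, so neither NumPy dtype holds this sum exactly. The -48.0 discrepancy seen earlier is tiny relative to a variance of about 7.8e16 and is consistent with float64 rounding. Python's built-in `int` has arbitrary precision, so one option is to keep the running sums as Python ints and round only the final division. Below is a minimal sketch. The name `exact_window_variances` is hypothetical, and the function computes the same sample variance (`ddof=1`) as `var_mean`.

```python
from typing import Iterator, Sequence


def exact_window_variances(values: Sequence[float], size: int = 1000) -> Iterator[float]:
    """Yield the sample variance (ddof=1) of each window using exact integer sums.

    Python ints never overflow, so n*Q - S*S is exact and only the
    final division is rounded to float.
    """
    values = [int(v) for v in values]  # plain Python ints, not numpy scalars
    s = sum(values[:size])
    q = sum(v * v for v in values[:size])
    denom = size * (size - 1)
    yield (size * q - s * s) / denom
    for i in range(size, len(values)):
        old, new = values[i - size], values[i]
        s += new - old
        q += new * new - old * old
        yield (size * q - s * s) / denom
```

This is still a Python-level loop over 10^7 elements, so it trades speed for exactness.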

### Form submission: mean values over the resulting streams

In [12]:
%%time
print('average_easy = ', get_tuple_stream_mean(var_mean(arr_easy, 1000)[0], 1))
print('var_easy = ', get_tuple_stream_mean(var_mean(arr_easy, 1000)[1], 1))

average_easy =  ['4999675.28']
var_easy =  ['83522.86']
CPU times: user 2min 11s, sys: 4.25 s, total: 2min 16s
Wall time: 2min 17s


In [13]:
%%time
print('average_medium = ', get_tuple_stream_mean(means_medium[0], 1))
print('var_medium = ', get_tuple_stream_mean(means_medium[1], 1))

average_medium =  ['127.48']
var_medium =  ['5460.63']
CPU times: user 42 s, sys: 1.43 s, total: 43.4 s
Wall time: 43.7 s


In [49]:
%%time
print('average_nightmare = ', get_tuple_stream_mean(means_nightmare[0], 1))
print('var_nightmare = ', get_tuple_stream_mean(means_nightmare[1], 1))

average_nightmare =  ['499880345.88']
var_nightmare =  ['83312220784800192.00']
CPU times: user 45.6 s, sys: 223 ms, total: 45.8 s
Wall time: 46.1 s


### Running median

In [15]:
"""
 Purpose: The main purpose is to demonstrate how to find the running median, mode and 
          mean over a sequence (list) of integers or reals or a mix of integers and reals.
          The secondary purpose, is to inspire Python programmers to explore some of
          the powerful packages (e.g. collections) available to the Python community and 
          to learn more about list comprehension and lambda functions.
    Note:        
       1. Much of the code here has been taken from code posted to the web (e.g. stackoverflow)
          by other Python programmers (e.g. Peter Otten)

  Author: V. Stokes (vs@it.uu.se)  
 Version: 2013.03.06

"""
import numpy as np

#*******************************************************

from collections import deque,Counter
from bisect import insort, bisect_left
from itertools import islice

def RunningMode(seq,N,M):
    """
    Purpose: Find the mode for the points in a sliding window as it 
             is moved from left (beginning of seq) to right (end of seq)
             by one point at a time.
     Inputs:
          seq -- list containing items for which a running mode (in a sliding window) is 
                 to be calculated
            N -- length of sequence                      
            M -- number of items in window (window size) -- must be an integer > 1
     Outputs:
        modes -- list of modes with size N - M + 1
       Note:
         1. The mode is the value that appears most often in a set of data.
         2. In the case of ties, it is the last of the ties that is taken as the mode (this
            is not by definition).
    """    
    # Load deque with first window of seq 
    d = deque(seq[0:M]) 

    modes = [Counter(d).most_common(1)[0][0]]  # contains mode of first window

    # Now slide the window by one point to the right for each new position (each pass through 
    # the loop). Stop when the item in the right end of the deque contains the last item in seq
    for item in islice(seq,M,N):
        old = d.popleft()                      # pop oldest from left
        d.append(item)                         # push newest in from right
        modes.append(Counter(d).most_common(1)[0][0])        
    return modes    

def RunningMedian(seq, M):
    """
     Purpose: Find the median for the points in a sliding window (odd number in size) 
              as it is moved from left to right by one point at a time.
      Inputs:
            seq -- list containing items for which a running median (in a sliding window) 
                   is to be calculated
              M -- number of items in window (window size) -- must be an integer > 1
      Outputs:
         medians -- list of medians with size N - M + 1
       Note:
         1. The median of a finite list of numbers is the "center" value when this list
            is sorted in ascending order. 
         2. If M is an even number the two elements in the window that
            are close to the center are averaged to give the median (this
            is not by definition)
    """   
    seq = iter(seq)
    s = []   
    m = M // 2

    # Set up list s (to be sorted) and load deque with first window of seq
    s = [item for item in islice(seq,M)]    
    d = deque(s)

    # Simple lambda function to handle even/odd window sizes    
    median = lambda : s[m] if bool(M&1) else (s[m-1]+s[m])*0.5

    # Sort it in increasing order and extract the median ("center" of the sorted window)
    s.sort()    
    medians = [median()]   

    # Now slide the window by one point to the right for each new position (each pass through 
    # the loop). Stop when the item in the right end of the deque contains the last item in seq
    for item in seq:
        old = d.popleft()          # pop oldest from left
        d.append(item)             # push newest in from right
        del s[bisect_left(s, old)] # locate insertion point and then remove old 
        insort(s, item)            # insert newest such that new sort is not required        
        medians.append(median())  
    return medians
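`RunningMedian` averages the two central elements when `M` is even (see note 2 in its docstring). The task instead defines the median as `arr[len(arr) // 2]`, so with `M = 1000` the results may differ from what the form expects. Below is a minimal sketch of the same deque plus sorted-list technique using the task's rule. The name `running_median_task` is hypothetical.

```python
from bisect import bisect_left, insort
from collections import deque
from itertools import islice
from typing import Iterable, Iterator


def running_median_task(seq: Iterable[float], size: int) -> Iterator[float]:
    """Yield sorted_window[size // 2] for every window (the task's median).

    Same idea as RunningMedian: keep the window both in arrival order (deque)
    and in sorted order (list), updating the sorted list with bisect.
    """
    it = iter(seq)
    window = deque(islice(it, size))
    if len(window) < size:
        return  # not enough data for a single window
    s = sorted(window)
    mid = size // 2
    yield s[mid]
    for item in it:
        old = window.popleft()
        window.append(item)
        del s[bisect_left(s, old)]  # remove the element leaving the window
        insort(s, item)             # insert the new one, keeping s sorted
        yield s[mid]
```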

In [16]:
%%time
medians_easy = RunningMedian(arr_easy,1000)

CPU times: user 35.3 s, sys: 402 ms, total: 35.7 s
Wall time: 35.9 s


In [17]:
%%time
medians_medium = RunningMedian(arr_medium,1000)

CPU times: user 36.6 s, sys: 228 ms, total: 36.9 s
Wall time: 36.9 s
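The `medium` field only takes the 256 values 0 to 255, so a window can be represented by a 256-bin counter instead of a sorted list. Below is a minimal sketch of this counting approach. The name `medium_min_median_max` and its `n_values` parameter are hypothetical.

The counter is updated in O(1) per step. The median bin is tracked incrementally, together with the number of window elements below it, so it usually moves by only a few bins. The minimum and maximum come from scanning the bins from each end. For this data they are almost always 0 and 255 (compare `min_medium` = 0.02 and `max_medium` = 254.98 further down), so those scans usually stop immediately.

```python
from collections import deque
from typing import Iterable, Iterator, Tuple


def medium_min_median_max(stream: Iterable[int], size: int = 1000,
                          n_values: int = 256) -> Iterator[Tuple[int, int, int]]:
    """Yield (min, median, max) per window for integers in range(n_values).

    The median is the element at index size // 2 of the sorted window.
    """
    counts = [0] * n_values  # counts[v] = occurrences of v in the window
    window = deque()
    target = size // 2       # sorted index of the median
    med = 0                  # current median candidate
    below = 0                # number of window elements strictly less than med
    for x in stream:
        window.append(x)
        counts[x] += 1
        if x < med:
            below += 1
        if len(window) > size:
            old = window.popleft()
            counts[old] -= 1
            if old < med:
                below -= 1
        if len(window) < size:
            continue
        # Restore the invariant: below <= target < below + counts[med].
        while below + counts[med] <= target:
            below += counts[med]
            med += 1
        while below > target:
            med -= 1
            below -= counts[med]
        lo = next(v for v in range(n_values) if counts[v])
        hi = next(v for v in reversed(range(n_values)) if counts[v])
        yield lo, med, hi
```

Usage would be `medium_min_median_max(medium_stream())`. Since the median pointer only drifts between neighbouring bins, the per-step cost does not depend on the window size.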


In [18]:
%%time
medians_nightmare = RunningMedian(arr_nightmare,1000)

CPU times: user 36.8 s, sys: 251 ms, total: 37.1 s
Wall time: 37.1 s


In [85]:
%%time
print('medians_easy = ', get_tuple_stream_mean(medians_easy, 1))

medians_easy =  ['4999675.28']
CPU times: user 22.6 s, sys: 46 ms, total: 22.7 s
Wall time: 22.7 s


In [86]:
%%time
print('medians_medium = ', get_tuple_stream_mean(medians_medium, 1))

medians_medium =  ['127.47']
CPU times: user 22.4 s, sys: 331 ms, total: 22.7 s
Wall time: 22.8 s


In [87]:
%%time
print('medians_nightmare = ', get_tuple_stream_mean(medians_nightmare, 1))

medians_nightmare =  ['499938215.19']
CPU times: user 21.7 s, sys: 328 ms, total: 22.1 s
Wall time: 22.1 s


### Running maximum and minimum

In [19]:
import pandas as pd

In [20]:
frame = pd.DataFrame()
frame['easy'] = arr_easy
frame['medium'] = arr_medium
frame['nightmare'] = arr_nightmare

In [58]:
frame.head()

|   | easy | medium | nightmare |
|---|------|--------|-----------|
| 0 | 2 | 57 | 26855092 |
| 1 | 4 | 140 | 262950628 |
| 2 | 4 | 71 | 790779946 |
| 3 | 4 | 44 | 634036506 |
| 4 | 5 | 16 | 31994523 |


In [69]:
%%time
easy_max = frame['easy'].rolling(window=1000).max()
easy_min = frame['easy'].rolling(window=1000).min()

CPU times: user 771 ms, sys: 270 ms, total: 1.04 s
Wall time: 1.05 s


In [70]:
%%time
medium_max = frame['medium'].rolling(window=1000).max()
medium_min = frame['medium'].rolling(window=1000).min()

CPU times: user 943 ms, sys: 338 ms, total: 1.28 s
Wall time: 1.28 s


In [75]:
%%time
nightmare_max = frame['nightmare'].rolling(window=1000).max()
nightmare_min = frame['nightmare'].rolling(window=1000).min()

CPU times: user 891 ms, sys: 282 ms, total: 1.17 s
Wall time: 1.18 s


In [76]:
%%time
nightmare_var = frame['nightmare'].rolling(window=1000).var()

CPU times: user 397 ms, sys: 124 ms, total: 522 ms
Wall time: 521 ms


In [90]:
%%time
print('max_easy = ', get_tuple_stream_mean(easy_max[999:], 1))
print('min_easy = ', get_tuple_stream_mean(easy_min[999:], 1))

max_easy =  ['5000174.76']
min_easy =  ['4999175.79']
CPU times: user 46.5 s, sys: 107 ms, total: 46.6 s
Wall time: 46.7 s


In [91]:
%%time
print('max_medium = ', get_tuple_stream_mean(medium_max[999:], 1))
print('min_medium = ', get_tuple_stream_mean(medium_min[999:], 1))

max_medium =  ['254.98']
min_medium =  ['0.02']
CPU times: user 45.1 s, sys: 159 ms, total: 45.2 s
Wall time: 45.4 s


In [92]:
%%time
print('max_nightmare = ', get_tuple_stream_mean(nightmare_max[999:], 1))
print('min_nightmare = ', get_tuple_stream_mean(nightmare_min[999:], 1))

max_nightmare =  ['999017359.97']
min_nightmare =  ['1017512.29']
CPU times: user 45.5 s, sys: 110 ms, total: 45.7 s
Wall time: 45.7 s


In [93]:
%%time
easy_mean = frame['easy'].rolling(window=1000).mean()

CPU times: user 419 ms, sys: 369 ms, total: 788 ms
Wall time: 848 ms


In [96]:
np.mean(easy_mean[999:])

4999675.27649437
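`get_tuple_stream_mean` iterates element by element in Python. That loop likely accounts for most of the ~45 s in the cells above, whereas `np.mean` on the rolling values returns almost instantly. The rolling mean itself can also be vectorized, because window sums are differences of prefix sums. Below is a minimal sketch. The name `rolling_mean_cumsum` is hypothetical.

The sketch assumes the data are integers whose total sum fits in int64. That holds for the three fields here: the largest total is at most about 10^9 × 10^7 = 10^16, below the int64 limit of about 9.2 × 10^18.

```python
import numpy as np


def rolling_mean_cumsum(values, size: int) -> np.ndarray:
    """Mean of every full window of `size` values, computed from prefix sums.

    Integer prefix sums are exact as long as the total fits in int64;
    only the final division introduces float rounding.
    """
    a = np.asarray(values, dtype=np.int64)
    prefix = np.concatenate(([0], np.cumsum(a)))  # prefix[j] = a[0] + ... + a[j-1]
    window_sums = prefix[size:] - prefix[:-size]  # sum of a[j : j + size]
    return window_sums / size
```

`np.mean(rolling_mean_cumsum(arr_easy, 1000))` should agree with `np.mean(easy_mean[999:])` above, up to float rounding.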

In [None]:
4999675.28