# [INFO-H515 - Big Data Scalable Analytics](https://uv.ulb.ac.be/course/view.php?id=85246?username=guest)

## TP 3 - Streaming forecasting (RLS and ML) with a network socket and Spark Streaming

#### *Gianluca Bontempi, Jacopo De Stefani and Theo Verhelst*

####  29/04/2020

## Sending data to network socket

This notebook uses a network socket to send streaming data. 

In this example, the messages are data generated from a linear model with $n$ input variables, i.e., 

$$
y =x^T \beta +w
$$
with $x, \beta \in \mathbb{R}^n$, and $y, w \in \mathbb{R}$. $w$ is Gaussian noise.

A message is sent every `time_delay` seconds. Each message is a list of $n+2$ elements, where:
* The first element is the message index 
* The second element is $y$ 
* The third to last elements are the $n$ values of $x$
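
On the receiving side, a message in this layout can be split back into its three parts. The sketch below assumes the text encoding produced later in this notebook by `np.array2string` (comma-separated numbers inside square brackets); `parse_message` is a hypothetical helper name, not part of the original notebook.

```python
import numpy as np

def parse_message(line):
    """Split one serialized message into (index, y, x).

    Assumes the layout described above: [index, y, x values...],
    written as comma-separated numbers inside square brackets.
    """
    fields = line.strip().strip("[]").split(",")
    values = np.array([float(f) for f in fields])
    index = int(values[0])   # first element: message index
    y = values[1]            # second element: target y
    x = values[2:]           # remaining n elements: inputs x
    return index, y, x

# Example with n = 3, so the message has n + 2 = 5 elements
index, y, x = parse_message("[ 7.  , 0.5 ,-1.2 , 0.3 , 2.  ]\n")
print(index, y, x)
```

Since `float` ignores surrounding whitespace, the same helper also accepts messages whose values are padded with spaces.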


Let's start by importing the required libraries.

In [1]:
import time
import numpy as np

Then, let's create a socket, running on port 9999, in order to be able to send messages.

In [2]:
import socket
  
# set the server host name and port number
host = 'localhost'
port = 9999
  
# create a socket at server side
# using TCP / IP protocol
s = socket.socket(socket.AF_INET,
                  socket.SOCK_STREAM)
  
# bind the socket to the host
# and port number
s.bind((host, port))
  
# start listening; allow up to 5
# pending connections in the backlog
s.listen(5)
  
# wait until a client connects
# (this call blocks)
c, addr = s.accept()
  
# display client address
print("CONNECTION FROM:", str(addr))

CONNECTION FROM: ('127.0.0.1', 59122)
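
To check the stream without Spark Streaming, a plain Python client can connect to the same port and read newline-terminated messages. This is a minimal sketch, meant to be run from a separate process or notebook; `read_lines` and its parameters are hypothetical names introduced here. Note that the client only receives data once the producer loop of the following sections is running.

```python
import socket

def read_lines(host, port, max_lines):
    """Connect to host:port and return up to max_lines decoded text lines."""
    lines = []
    with socket.create_connection((host, port)) as client:
        # makefile() wraps the socket so that it can be read line by line
        with client.makefile("r", encoding="utf-8") as stream:
            for line in stream:
                lines.append(line.rstrip("\n"))
                if len(lines) >= max_lines:
                    break
    return lines

# Usage, once the producer loop is running:
# print(read_lines("localhost", 9999, max_lines=3))
```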


## Linear DGP (Data Generating Process)

In this example, the messages are data generated from a linear model with $n$ input variables and fixed coefficients $\beta$ (the first and last are 1, the others 0), i.e., 

$$
y =x^T \beta +w
$$
with $x, \beta \in \mathbb{R}^n$, and $y, w \in \mathbb{R}$. $w$ is Gaussian noise.

Please note that the numerical values, here encoded as a numpy array, are sent to the network socket in a serialized (string) format.
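
As a sanity check on the model above, the same data generating process can be simulated in batch and the coefficients recovered by least squares. This is a sketch that is not part of the original notebook: the seed, the sample size `N`, and the noise standard deviation of 0.1 are assumptions made for illustration, and the coefficients mirror those set in the code cell below.

```python
import numpy as np

rng = np.random.default_rng(0)   # arbitrary seed, for this sketch only
n, N = 10, 2000                  # number of inputs, number of samples

beta = np.zeros(n)
beta[0] = beta[-1] = 1.0         # first and last coefficients are 1

X = rng.standard_normal((N, n))  # one row per message
w = 0.1 * rng.standard_normal(N) # Gaussian noise, std 0.1 (assumed)
y = X @ beta + w                 # y = x^T beta + w, row by row

# Batch least-squares estimate of beta from the N samples
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(beta_hat, 2))
```

With a few thousand samples the estimate should be close to the true $\beta$, which gives a reference point for any estimator that consumes the stream one message at a time.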


In [None]:
np.random.seed(2452020515) # Fix seed to ensure repeatability
i=0 #Initialise counter

n=10   # number of inputs
time_delay = 0.01 # Time delay between the transmission of two consecutive messages

beta=np.zeros(n) 
beta[0]=1   
beta[-1]=1 ## first and last parameters are 1, others are zeros
beta.shape=(n,1)


#Infinite loop for sending messages to the client connected to the socket
while True:
    # Randomly generate x_i
    x=np.random.randn(1,n)[0]
    
    # Compute y from x_i according to formula, with Gaussian noise w (std 0.1)
    y=x.dot(beta).item()+0.1*np.random.randn() ## y =x^T beta +w

    # Serialize array and print message as a string
    message=np.array2string(np.append([i,y],x),separator=",",max_line_width=1000) +'\n'
    #print(message) # n=10 -> 12 elements in the message: cnt+y+10 xi
    
    # Send message to the client
    try:  
        c.send(message.encode())
    except socket.error:
        # If failed, client is probably disconnected. Wait for another connection
        c.close()
        c, addr = s.accept()
    
    i=i+1
    time.sleep(time_delay)
    

In [21]:
# close the client connection, then the listening socket
c.close()
s.close()

**N.B** As the cell runs an infinite loop, the producer is never going to stop by itself. 
Don't forget to stop the cell using the dedicated button (■).
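
An alternative to interrupting the cell by hand is to bound the loop and release the connection in a `finally` clause. This is a minimal sketch, assuming an already connected socket `conn`; `run_producer`, `make_message`, and `max_messages` are hypothetical names, not part of the original notebook.

```python
import socket
import time

def run_producer(conn, make_message, max_messages, delay=0.0):
    """Send max_messages newline-terminated messages over conn, then close it."""
    sent = 0
    try:
        for i in range(max_messages):
            conn.sendall((make_message(i) + "\n").encode())
            sent += 1
            time.sleep(delay)
    except KeyboardInterrupt:
        pass  # stopping the cell by hand is still a clean exit
    finally:
        conn.close()  # runs in every case, including errors
    return sent

# Demonstration on a local socket pair instead of a real client
a, b = socket.socketpair()
count = run_producer(a, lambda i: "[%d.,0.]" % i, max_messages=3)
with b, b.makefile("r") as stream:
    received = stream.read()   # reads until the producer side is closed
print(count, received.splitlines())
```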

## Non-linear DGP (Data Generating Process)

In this example, the messages are data generated from a non-linear model with $n$ input variables, i.e., 

$$
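y = \sin(x_0) + |x_1 x_2| + \log|x_{n-1}| + w
$$
with $x \in \mathbb{R}^n$ indexed from 0, and $y, w \in \mathbb{R}$. This is the formula implemented in the code below: the logarithm is applied to the last input only, the inputs are drawn uniformly from $[0, 1)$, and the noise $w$ is uniform on $[0, 0.25)$ rather than Gaussian.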

Please note that the numerical values, here encoded as a numpy array, are sent to the network socket in a serialized (string) format.
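
The deterministic part of the target, as computed in the code cell of this section (which applies the logarithm to the last input only), can be isolated in a small function and checked against one of the printed messages. `nonlinear_signal` is a hypothetical name; the sample values are copied from the first message in the output of that cell.

```python
import numpy as np

def nonlinear_signal(x):
    """Noise-free part of y: sin(x_0) + |x_1 * x_2| + log|x_last|."""
    x = np.asarray(x, dtype=float)
    return np.sin(x[0]) + abs(x[1] * x[2]) + np.log(abs(x[-1]))

# Message 0 from the notebook output: index, y, then the 10 inputs
message = [0., -2.13517467, 0.00241515, 0.02793932, 0.05161399, 0.67596521,
           0.10201457, 0.61138142, 0.55168554, 0.81702968, 0.09720055, 0.10013508]
y, x = message[1], message[2:]

# The residual is the noise term w, drawn as 0.25*np.random.rand(1)[0] in the notebook
w = y - nonlinear_signal(x)
print(w)
```

The residual should fall in $[0, 0.25)$, the range of the noise term used in the cell.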

In [6]:
np.random.seed(2452020515) # Fix seed to ensure repeatability
i=0 #Initialise counter

n=10   # number of inputs
time_delay = 1 # Time delay between the transmission of two consecutive messages

#Infinite loop for sending messages to the client connected to the socket
while True:
    # Randomly generate x_i
    x=np.random.rand(1,n)[0]
    # Compute y from x_i according to formula
    y=float(np.sin(x[0])+abs(x[1]*x[2])+np.log(abs(x[-1])))+0.25*np.random.rand(1)[0]
    
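    # Serialize array as a single newline-terminated line, as in the linear example,
    # so that a line-based receiver gets exactly one message per line
    message=np.array2string(np.append([i,y],x),separator=",",max_line_width=1000) +'\n'
    print(message, end="") # n=10 -> 12 elements in the message: cnt+y+10 xi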
    
    # Send message to the client
    try:  
        c.send(message.encode())
    except socket.error:
        # If failed, client is probably disconnected. Wait for another connection
        c.close()
        c, addr = s.accept()
    
    
    i=i+1
    time.sleep(time_delay)
    

[ 0.        ,-2.13517467, 0.00241515, 0.02793932, 0.05161399, 0.67596521,
  0.10201457, 0.61138142, 0.55168554, 0.81702968, 0.09720055, 0.10013508]
[ 1.        ,-0.0655868 , 0.11901551, 0.17727414, 0.70426614, 0.45184756,
  0.77991384, 0.63604579, 0.45115242, 0.12065051, 0.98063141, 0.57960092]
[2.        ,0.3517806 ,0.35370735,0.34970193,0.41873636,0.23228666,
 0.20912698,0.96581699,0.87180643,0.02677749,0.70675843,0.81228294]
[3.        ,1.24464592,0.99681882,0.3823275 ,0.80071452,0.70452194,
 0.16261121,0.79832087,0.88443229,0.86393125,0.88721667,0.92640682]
[ 4.        ,-0.19919233, 0.7133137 , 0.83414193, 0.66703191, 0.48517621,
  0.42816018, 0.26098702, 0.71595058, 0.12970913, 0.91358122, 0.19263513]
[5.        ,0.65050537,0.60041567,0.1450803 ,0.77535642,0.29942698,
 0.9122697 ,0.34722958,0.78218832,0.24913355,0.91958351,0.9206573 ]
[6.        ,0.24952881,0.19094915,0.3580098 ,0.41742382,0.0971004 ,
 0.01398776,0.22867529,0.26245276,0.36292156,0.15258247,0.83366979]
...

KeyboardInterrupt: 

**N.B** As the cell runs an infinite loop, the producer is never going to stop by itself. 
Don't forget to stop the cell using the dedicated button (■).