# Big Data & Machine Learning: Streaming Review

* data science
    * network science
* machine learning
    * regression
* big data
    * streaming

## Reflect

* a dataset which records connections between user preferences
* a user-product graph which records high ratings of products
    * we use an events system to record high ratings (i.e., "LIKE"s)
* we use machine learning on the event stream to estimate whether a user would like "WINE"
    * if so, we recommend the wine (on offer)

## Simulating the User-Product Graph

In [4]:
from random import choice, random

In [15]:
["ElectricToy"] * 3

['ElectricToy', 'ElectricToy', 'ElectricToy']

In [21]:
ptype = ["ElectricToy"] * 5 + ["Chocolate"] * 3 + ["Wine"]  * 2



users = [
    (
        f"User{i}", 
        round(10 * random())/10, 
        round(5 * random() + 1)/10, 
        round(10 * random())/10
    )
     for i in range(0, 100)
]

products = [ (f"Product{i}", choice(ptype)) for i in range(0, 100)]

In [22]:
users[:5]

[('User0', 0.5, 0.4, 0.5),
 ('User1', 1.0, 0.4, 0.5),
 ('User2', 0.3, 0.5, 0.4),
 ('User3', 0.5, 0.2, 0.3),
 ('User4', 0.2, 0.3, 0.1)]

In [23]:
products[:5]

[('Product0', 'ElectricToy'),
 ('Product1', 'Chocolate'),
 ('Product2', 'Wine'),
 ('Product3', 'Chocolate'),
 ('Product4', 'ElectricToy')]

In [29]:
G = []

for (user, pr_toy, pr_choc, pr_wine) in users:
    if (pr_toy > random()) and (pr_toy >= 0.5):
        product = choice([ 
            pname for (pname, ptype) in products 
            if ptype == "ElectricToy"
        ])
        
        G.append( (user, product, pr_toy) )
            
            
    if pr_choc > random() and (pr_choc >= 0.5):
        product = choice([ 
            pname for (pname, ptype) in products 
            if ptype == "Chocolate"
        ])
        
        G.append( (user, product, pr_choc) )
            
    if pr_wine > random() and (pr_wine >= 0.5):
        product = choice([ 
            pname for (pname, ptype) in products 
            if ptype == "Wine"
        ])
        
        G.append( (user, product, pr_wine) )
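
The three near-identical blocks above can be folded into one loop over the product types. The following is a minimal sketch, assuming the same tuple layouts as `users` and `products`; the name `simulate_likes`, its parameters, and the demo data are hypothetical and not part of the original notebook.

```python
from random import Random


def simulate_likes(users, products, threshold=0.5, rng=None):
    """Return (user, product, rating) edges for sufficiently strong preferences."""
    rng = rng or Random()
    # Order matches the user tuple: (name, toy, chocolate, wine).
    types = ["ElectricToy", "Chocolate", "Wine"]
    # Group product names by type once, instead of filtering inside the loop.
    by_type = {t: [name for (name, ptype) in products if ptype == t] for t in types}

    edges = []
    for (user, *prefs) in users:
        for ptype, pref in zip(types, prefs):
            # Same rule as the cell above: a like is recorded with probability
            # `pref`, and only for preferences at or above the threshold.
            if pref >= threshold and pref > rng.random() and by_type[ptype]:
                edges.append((user, rng.choice(by_type[ptype]), pref))
    return edges


# Hypothetical demo data, small enough to check by hand.
demo_users = [("UserA", 1.0, 0.2, 0.9), ("UserB", 0.1, 0.6, 0.0)]
demo_products = [("P0", "ElectricToy"), ("P1", "Chocolate"), ("P2", "Wine")]
demo_edges = simulate_likes(demo_users, demo_products, rng=Random(0))
```

Passing a seeded `Random` instance makes the simulated graph reproducible, which the module-level `random()` calls in the original cell do not guarantee.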
            

---

In [31]:
import pandas as pd

In [52]:
pdf = pd.DataFrame(products, columns=["ProductName", "Type"])

pdf.sample(3)

   ProductName         Type
3     Product3    Chocolate
41   Product41         Wine
46   Product46  ElectricToy


In [53]:
udf = pd.DataFrame(users, columns=["UserName", "Toy", "Choc", "Wine"])

In [54]:
udf.sample(3)

   UserName  Toy  Choc  Wine
21   User21  0.4   0.2   0.1
51   User51  0.4   0.3   0.4
73   User73  1.0   0.1   0.0


In [58]:
updf = pd.DataFrame(G, columns=["UserName", "ProductName", "Rating"])

In [59]:
updf.sample(3)

   UserName ProductName  Rating
29   User32   Product22     0.5
64   User65   Product49     0.8
34   User36   Product75     0.9


---

In [61]:
from sklearn.linear_model import LinearRegression

In [68]:
model_toy = LinearRegression().fit(udf[['Toy']], udf['Wine'])
model_cho = LinearRegression().fit(udf[['Choc']], udf['Wine'])


In [70]:
model_toy.predict([
    [0.1],
    [0.5]
])

array([0.52483928, 0.51363589])
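
The two predictions above differ by only about 0.01, which is what we would expect here: the simulated Toy and Wine preferences are drawn independently, so the fitted line is nearly flat. To make the fit itself concrete, here is a minimal pure-Python sketch of single-feature ordinary least squares, the kind of line `LinearRegression` fits when given one column. The function `fit_line` and the three data points are hypothetical, chosen so the result can be checked by hand.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y ~ slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    # The fitted line always passes through the point of means.
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data in which wine preference does rise with toy preference.
toy = [0.0, 0.5, 1.0]
wine = [0.2, 0.4, 0.6]
slope, intercept = fit_line(toy, wine)
prediction = slope * 0.1 + intercept  # estimated wine preference for toy = 0.1
```

A slope near zero, as in the notebook's own output, means the toy rating carries almost no information about the wine preference.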

---

## Events

In [81]:
events = [{
    "subject": u,
    "verb": "like",
    "object": v,
    "context" :{
        "time": "12pm",
        "rating": w
    }
} for (u, v, w) in G]

The list can be treated as an event database (i.e., an append-only event log):

In [83]:
events[:2]

[{'subject': 'User0',
  'verb': 'like',
  'object': 'Product29',
  'context': {'time': '12pm', 'rating': 0.5}},
 {'subject': 'User1',
  'verb': 'like',
  'object': 'Product70',
  'context': {'time': '12pm', 'rating': 1.0}}]

In [85]:
events.append({
    "subject": "User0",
    "verb": "orders",
    "object": "Product0",
    "context": {
        "time": "1pm"
    }
})
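
A plain list allows events to be edited or deleted, which an append-only log should not. The following is a minimal sketch of a wrapper that only exposes appending and reading; the class `EventLog` and its method names are hypothetical and stand in for whatever event store would be used in practice.

```python
class EventLog:
    """Hypothetical append-only log: events can be added and read, not edited."""

    def __init__(self):
        self._events = []

    def append(self, subject, verb, obj, **context):
        """Store one event and return its offset (its position in the log)."""
        self._events.append({
            "subject": subject,
            "verb": verb,
            "object": obj,
            "context": dict(context),
        })
        return len(self._events) - 1

    def read_from(self, offset=0):
        """Return a new list holding the events at or after `offset`."""
        return list(self._events[offset:])


log = EventLog()
log.append("User0", "like", "Product29", time="12pm", rating=0.5)
order_offset = log.append("User0", "orders", "Product0", time="1pm")
new_events = log.read_from(order_offset)
```

Returning the offset from `append` lets a consumer remember how far it has read and later ask only for newer events.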

To simulate a stream, we can wrap the event list in an iterator with `iter`:

In [116]:
stream = iter(events)

In [117]:
def get_from_eventsdb():
    return next(stream)
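
Calling `next` on an exhausted iterator raises `StopIteration`. A minimal sketch of a polling helper that returns `None` instead, assuming a list-backed stream; the names `poll`, `demo_events`, and `demo_stream` are hypothetical.

```python
def poll(stream):
    """Return the next event from the stream, or None if none is available."""
    return next(stream, None)


demo_events = [{"verb": "like"}, {"verb": "orders"}]
demo_stream = iter(demo_events)

first = poll(demo_stream)

# A list iterator reads the list as it is at the time of each call, so an
# event appended before the iterator is exhausted is still delivered.
demo_events.append({"verb": "RECOMMEND"})

second = poll(demo_stream)
third = poll(demo_stream)
fourth = poll(demo_stream)  # nothing left: None instead of StopIteration
```

Once a list iterator has been exhausted it stays exhausted, so events appended after that point need a fresh `iter` call to be read.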

In [120]:
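# Iterate over a snapshot so that the recommendations appended
# below do not extend the loop while it is running.
for event in list(events):
    
    if event['verb'] != "like":
        #print("SKIPPING", event)
        continue 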
        
    # PROCESSING LIKE
    
    user, product, rating = (
        event['subject'], 
        event['object'], 
        event['context']['rating']
    )
    
    # FIND THE PRODUCT TYPE
    fav_type = pdf.loc[ pdf['ProductName'] == product, 'Type' ].values[0]
    
    # AIM: RECOMMEND WINE IF SUITABLE 
    
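    if fav_type == "Wine":
        est_pref_wine = rating
        
    elif fav_type == "ElectricToy":
        # predict returns an array; take its single element
        est_pref_wine = model_toy.predict([
            [rating]
        ])[0]
        
    elif fav_type == "Chocolate":
        est_pref_wine = model_cho.predict([
            [rating]
        ])[0]
        
    else:
        # unknown product type: no estimate, so nothing to recommend
        continue
        
    if est_pref_wine > 0.85: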
        events.append({
            "subject": "RECOMMENDATION_SYSTEM",
            "verb": "RECOMMEND",
            "object": "USER",
            "context": {
                "time": "2pm",
                "product": "WINE_ON_OFFER"
            }
        })

In [121]:
events[-3:]

[{'subject': 'RECOMMENDATION_SYSTEM',
  'verb': 'RECOMMEND',
  'object': 'USER',
  'context': {'time': '2pm', 'product': 'WINE_ON_OFFER'}},
 {'subject': 'RECOMMENDATION_SYSTEM',
  'verb': 'RECOMMEND',
  'object': 'USER',
  'context': {'time': '2pm', 'product': 'WINE_ON_OFFER'}},
 {'subject': 'RECOMMENDATION_SYSTEM',
  'verb': 'RECOMMEND',
  'object': 'USER',
  'context': {'time': '2pm', 'product': 'WINE_ON_OFFER'}}]
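
Every recommendation above is addressed to the literal string `"USER"`, so the log does not say who should receive the offer. A minimal sketch of a variant that records the actual user, assuming the same event layout and the same 0.85 threshold; the function `make_recommendation` is hypothetical.

```python
def make_recommendation(user, est_pref_wine, threshold=0.85):
    """Return a RECOMMEND event addressed to `user`, or None below the threshold."""
    if est_pref_wine <= threshold:
        return None
    return {
        "subject": "RECOMMENDATION_SYSTEM",
        "verb": "RECOMMEND",
        "object": user,  # the user who triggered the recommendation
        "context": {"time": "2pm", "product": "WINE_ON_OFFER"},
    }


demo_rec = make_recommendation("User1", 0.9)
demo_none = make_recommendation("User4", 0.1)
```

Inside the processing loop, the returned event would be appended to the log only when it is not `None`.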

---