# BSc network analysis

This notebook recreates the network analysis found in the BSc project.

## Theory

The BSc project investigated one network method:
- Network Flow Matrix: the flow of shoppers between shops

**Network Flow Matrix**

Each shop is a node in the network. The flow of shoppers from one node to another (moving to a different store) is depicted by the adjacency matrix $A$. Therefore, the number of shoppers going from shop $i$ to shop $j$ is the element $A_{ij}$.

Shoppers can move from shop $i$ to shop $j$ as well as from shop $j$ to shop $i$. We can characterise the `Net Flow` ($F$) as the net number of shoppers moving from $i$ to $j$. Therefore, the `Net Flow` is

$$ F_{ij} = A_{ij} - A_{ji}. $$

Furthermore, the `Net Flow` matrix can also be written as

$$ F = A - A^{T}. $$
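
As a minimal sketch of the net flow calculation, the example below uses a small, made-up adjacency matrix for three shops (the values are illustrative only and are not taken from the mall data). It assumes the convention stated above, where `A[i, j]` is the number of shoppers moving from shop $i$ to shop $j$.

```python
import numpy as np

# Hypothetical flows between three shops:
# A[i, j] = number of shoppers moving from shop i to shop j
A = np.array([
    [0, 5, 2],
    [3, 0, 4],
    [1, 6, 0],
])

# Net flow: F[i, j] = A[i, j] - A[j, i]
F = A - A.T

print(F)
```

Because $F_{ij} = -F_{ji}$, the net flow matrix is antisymmetric and its diagonal is zero, which is a quick sanity check on any computed $F$.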

**Sliding Window**

The device signals need to be `binned` to find the number of devices (shoppers) present at a shop over a time period. This is necessary because the devices emit signals at random intervals, so the exact number of devices in a shop at a given instant cannot be known. Instead, the number of shoppers at a shop is counted over a period of time (known as a window) that is kept at a fixed width and moved forward in time at a given rate (known as sliding). For example, with a window of 20 minutes and a sliding interval of 1 minute, the number of shoppers can be approximated every minute. This process is known as a `Sliding Window`.
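
The sliding window can be sketched as follows for a single shop. The `signals` frame and the `count_devices` helper are hypothetical and exist only for this illustration; the column names `mac_address` and `date_time` follow the ones used later in this notebook, and the half-open window `[start, start + window)` is an assumption of this sketch.

```python
import pandas as pd

# Hypothetical signals: one row per signal emitted by a device inside one shop
signals = pd.DataFrame({
    'mac_address': ['a', 'b', 'a', 'c', 'b', 'c'],
    'date_time': pd.to_datetime([
        '2016-12-22 10:00', '2016-12-22 10:05', '2016-12-22 10:12',
        '2016-12-22 10:21', '2016-12-22 10:30', '2016-12-22 10:41',
    ]),
})


def count_devices(signals, window_size=20, sliding_interval=1):
    """Count the unique devices seen in each window (both sizes in minutes)."""
    window = pd.Timedelta(minutes=window_size)
    step = pd.Timedelta(minutes=sliding_interval)
    start = signals.date_time.min()
    end = signals.date_time.max()

    counts = {}
    while start + window <= end:
        # Select the signals that fall inside the current window
        in_window = (signals.date_time >= start) & (signals.date_time < start + window)
        # A device that emits several signals in the window is counted once
        counts[start] = signals.loc[in_window, 'mac_address'].nunique()
        start += step
    return pd.Series(counts)


device_counts = count_devices(signals)
print(device_counts.head())
```

Counting unique MAC addresses rather than raw signals keeps a device that emits many signals in one window from being counted more than once.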

**In-Degree Ranked Distribution**

In-degree $k_{in}$ is the sum of the adjacency matrix elements that are directed to a particular shop $i$. Since $A_{ji}$ is the number of shoppers going from shop $j$ to shop $i$, this is

$$ k_{in}^{(i)} = \sum_{j}A_{ji} $$
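
A minimal sketch of the in-degree and its ranking, using a made-up adjacency matrix for three shops (illustrative values only). It assumes the convention from the Network Flow Matrix section, where `A[i, j]` is the flow from shop $i$ to shop $j$, so the flow arriving at a shop is the sum of that shop's column.

```python
import numpy as np

# Hypothetical flows: A[i, j] = number of shoppers moving from shop i to shop j
A = np.array([
    [0, 5, 2],
    [3, 0, 4],
    [1, 6, 0],
])

# In-degree of each shop: total flow arriving at it (column sums)
k_in = A.sum(axis=0)

# Rank the shops from highest to lowest in-degree
ranked_shops = np.argsort(k_in)[::-1]
ranked_k_in = k_in[ranked_shops]

print(ranked_shops, ranked_k_in)
```

Plotting `ranked_k_in` against rank gives the ranked in-degree distribution.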


## Load dependencies

In [2]:
import pandas as pd
import numpy as np
from scipy import stats

%matplotlib inline
import matplotlib
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import matplotlib.cm as cm
matplotlib.style.use('ggplot')

import seaborn as sns

In [3]:
%load_ext autoreload
%autoreload 2

In [4]:
from msci.utils import utils

## Import raw data

In [6]:
signal_df = utils.import_signals('Mall of Mauritius', version=3, signal_type=1)

## Method

In [249]:
def create_adjacency_matrices(signal_df, sliding_interval=30, window_size=60):
    """
    Creates an array of adjacency matrices which count the shoppers that move from shop i to shop j
    
    :param signal_df: (pd.DataFrame) the signals, with `mac_address`, `store_id` and `date_time` columns
    :param sliding_interval: (int) the number of minutes between each sampled window
    :param window_size: (int) the size of the window in minutes
    :return: (np.array) an array of adjacency matrices which increase in time with intervals of `sliding_interval`
    """
    signal_df = signal_df[signal_df.date_time.dt.time > pd.Timestamp('7:00:00').time()]
    
    store_ids = np.sort(signal_df[signal_df.store_id.notnull()].store_id.unique())
    store_ids_indices = {key: value for (key, value) in zip(store_ids, range(len(store_ids)))}
    
    adjacency_matrices_with_time = []
    
    start_time = min(signal_df.date_time)
    end_time = max(signal_df.date_time)
    window_start_time = start_time
    
    while window_start_time < (end_time - pd.Timedelta(minutes=window_size)):
        print(window_start_time)
        window_end_time = window_start_time + pd.Timedelta(minutes=window_size)
        
        signal_window_df = signal_df[
            (signal_df.date_time > window_start_time) &
            (signal_df.date_time < window_end_time)
        ]
        
        # DataFrame.as_matrix was removed in pandas 1.0; select the columns and take .values instead
        signal_matrix = signal_window_df[['mac_address', 'store_id']].values
        adjacency_matrices_with_time.append(create_adjacency_matrix(store_ids_indices, signal_matrix))
        
        window_start_time = window_start_time + pd.Timedelta(minutes=sliding_interval)
            
    return np.array(adjacency_matrices_with_time)
    

In [259]:
def create_adjacency_matrix(store_ids_indices, signal_matrix):
    """
    :param store_ids_indices: (dict) the store ids as keys and index in the adjacency matrix as values
    :param signal_matrix: (np.array) the mac_address, store_id for a given window
    :return: (np.array) adjacency matrix with the number of shoppers going from store i to store j in the signal_matrix
    """
    num_stores = len(store_ids_indices)
    adjacency_matrix = np.zeros((num_stores, num_stores))
    mac_addresses = np.unique(signal_matrix[:, 0])
    
    for mac_address in mac_addresses:
        mac_address_indices = np.where(signal_matrix[:, 0] == mac_address)
        
        # remove missing stores (pd.notnull is used because identity checks against np.nan are unreliable)
        stores = [store for store in signal_matrix[mac_address_indices[0]][:, 1] if pd.notnull(store)]
        
        store_from = None
        for store_to in stores:
            
            # count a move only when the device is seen in a different store than before
            if (store_from is not None) and (store_from != store_to):
                store_from_index = store_ids_indices[store_from]
                store_to_index = store_ids_indices[store_to]
                
                adjacency_matrix[store_from_index][store_to_index] += 1
                
            store_from = store_to
                
    return adjacency_matrix
    

## Result

In [263]:
adjacency_matrix = create_adjacency_matrices(signal_df)

2016-12-22 07:00:05
