In [1]:
// nuget references
#r "nuget: FSharp.Stats, 0.5.1-preview.1"
//#r "nuget: Plotly.NET, 4.2.0"
#r "nuget: Plotly.NET.Interactive, 4.2.1"
#r "nuget: FSharp.Data, 6.3.0"
#r "nuget: Cytoscape.NET, 0.2.0"
#r "nuget: Cytoscape.NET.Interactive, 0.2.0"

open FSharp.Stats
open Plotly.NET
open Plotly.NET.StyleParam
open Plotly.NET.LayoutObjects
open FSharp.Data
open Cytoscape.NET
open System

// use a script to import data in all notebooks > 1
#load "import.fsx"
open Import

let orders = Import.orders



Loading extensions from `C:\Users\bvenn\.nuget\packages\plotly.net.interactive\4.2.1\interactive-extensions\dotnet\Plotly.NET.Interactive.dll`

Loading extensions from `C:\Users\bvenn\.nuget\packages\cytoscape.net.interactive\0.2.0\interactive-extensions\dotnet\Cytoscape.NET.Interactive.dll`

Some of FSharp.Stats functionalities require the usage of [LAPACK](https://www.netlib.org/lapack/) routines. After the initial package download you can find these at `C:\Users\USERNAME\.nuget\packages\fsharp.stats\0.5.1-preview.1\netlib_LAPACK`. In the prepared use cases it is not necessary to load it but if you want, the next two lines do the job 

In [2]:
//FSharp.Stats.ServiceLocator.setEnvironmentPathVariable (@"C:\Users\USERNAME\.nuget\packages\fsharp.stats\0.5.1-preview.1\netlib_LAPACK")
//FSharp.Stats.Algebra.LinearAlgebra.Service()

## Social network generation

The data allows the construction of a social network of drinking partners. In theory drinking partners are likely to log their drinks within a short period of time. Of course this assumption is prone to error because there are two logging devices in different building, and additionally external factors (like the end of a lecture many people attend) are likely to cause simultaneous thirst.

To start this analysis we map over all orders, and for each order isolate orders that are within a short time period (e.g. 1 minute prior and after). From these orders we can isolate the user names because that the only thing we are currently interested in. To remove self references, an additional filter step is required. 


In [3]:
let drinkingpartners = 
    orders
    |> Array.map (fun x -> 
        orders
        |> Array.filter (fun t -> 
            //x.DateTime < t.DateTime.AddMinutes 1 && x.DateTime > t.DateTime.AddMinutes -1
            let timeRange = x.DateTime - t.DateTime
            abs timeRange.TotalMinutes < 1
            )
        |> Array.map (fun drinkPartner -> 
            x.Name,drinkPartner.Name
            )
        |> Array.filter (fun (a,b) -> a <> b)
        )
    |> Array.concat

drinkingpartners

index,value
,
,
,
,
,
,
,
,
,
,

Unnamed: 0,Unnamed: 1
Item1,Douglas Powell
Item2,Nicholas Thomas

Unnamed: 0,Unnamed: 1
Item1,Douglas Powell
Item2,Patrick Holmes

Unnamed: 0,Unnamed: 1
Item1,Douglas Powell
Item2,Muhammed Sullivan

Unnamed: 0,Unnamed: 1
Item1,Nicholas Thomas
Item2,Douglas Powell

Unnamed: 0,Unnamed: 1
Item1,Nicholas Thomas
Item2,Patrick Holmes

Unnamed: 0,Unnamed: 1
Item1,Nicholas Thomas
Item2,Muhammed Sullivan

Unnamed: 0,Unnamed: 1
Item1,Patrick Holmes
Item2,Douglas Powell

Unnamed: 0,Unnamed: 1
Item1,Patrick Holmes
Item2,Nicholas Thomas

Unnamed: 0,Unnamed: 1
Item1,Patrick Holmes
Item2,Muhammed Sullivan

Unnamed: 0,Unnamed: 1
Item1,Muhammed Sullivan
Item2,Douglas Powell

Unnamed: 0,Unnamed: 1
Item1,Muhammed Sullivan
Item2,Nicholas Thomas

Unnamed: 0,Unnamed: 1
Item1,Muhammed Sullivan
Item2,Patrick Holmes

Unnamed: 0,Unnamed: 1
Item1,Emma Roman
Item2,Eleanor Macdonald

Unnamed: 0,Unnamed: 1
Item1,Eleanor Macdonald
Item2,Emma Roman

Unnamed: 0,Unnamed: 1
Item1,Eleanor Macdonald
Item2,Hannah Walters

Unnamed: 0,Unnamed: 1
Item1,Hannah Walters
Item2,Eleanor Macdonald

Unnamed: 0,Unnamed: 1
Item1,Hannah Walters
Item2,Jasmine Sutton

Unnamed: 0,Unnamed: 1
Item1,Jasmine Sutton
Item2,Hannah Walters

Unnamed: 0,Unnamed: 1
Item1,Hugo Green
Item2,Abigail Payne

Unnamed: 0,Unnamed: 1
Item1,Hugo Green
Item2,Abigail Payne


From there it's an easy task to determine the number of simultaneous drinking.

In [4]:
let partnerCounts = 
    drinkingpartners
    |> Array.countBy id

partnerCounts

As discussed earlier, it is possible to become a drinking parter by chance. To reduce the probability of getting false positives, it is recommended to filter sparse relationships. 
Therefore you could either just set an arbitrary threshold (e.g. 3) or you could visualize the count distribution and make an educated guess about an appropriate threshold.

In [5]:
partnerCounts
|> Array.map snd
|> Chart.Histogram
|> Chart.withXAxisStyle "Total number of simultaneous drinks of two people"
|> Chart.withYAxisStyle "count of occurances"


It becomes apparent, that most of the person-person relations have a simultaneous drink just a few times (<5). A threshold of e.g. 8 seems to be appropriate. Another thing you may noticed is the histogram counts are always multiple of 2.
This is due to the fact, that not only `(Hugo Green, Abigail Payne)` has a drinking count of `21`, but also `(Abigail Payne, Hugo Green)`.
Because we are not interested in a directed network where it does matter who took a drink first, we just can ignore half of the data: 

In [6]:
let filteredPartnerCounts = 
    partnerCounts
    |> Array.filter (fun (names,sharedDrinkingCount) -> sharedDrinkingCount >= 8)
    |> Array.distinctBy (fun ((name1,name2),_) -> [name1;name2] |> List.sort) 

Create a chart that visualizes these counts.

In [7]:
filteredPartnerCounts
|> Array.map (fun ((name1,name2),sharedDrinkingCount) -> $"{name1} - {name2}", sharedDrinkingCount)
|> Array.sortByDescending snd
|> Chart.Bar


We have additional information about the department the users are working in. This can be used to color the nodes respectively. Create function that takes a department name and returns a color string in the format "#fffff". Additionally create a Map that returns a deparment color when given a user name. 

In [8]:
let getDepartmentColor (department: string) = 
    match department with 
    | "Breakroom Bandits" -> "#2b3ae9"
    | "Genesis" -> "#f7da41"
    | "We Tried" -> "#008b66"
    | "No Lucks Given" -> "#987200"
    | "Toon Squad" -> "#ff7f0e"
    | "Rumor Spreaders" -> "#20b2aa"
    | "Risky Biscuits" -> "#a230ed"
    | "Recruitables" -> "#d21102"
    | "Employees of the Moment" -> "#19d3f3"
    | "Chargers" -> "#dea57b"
    | "Kickstarters" -> "#dea57b"
    | _ -> "#8b8b8b"


let person2Color = 
    orders 
    |> Array.map (fun x -> x.Name,getDepartmentColor x.Department) 
    |> Array.distinct
    |> Map.ofArray

Now we have all node, edge and styling information to generate a graph. 

Please check out the documentation: https://fslab.org/Cytoscape.NET/.

Start by creating a function that takes the `((string*string)*int) []` of the filtered partner counts and returns a sequence of Cytoscape.NET.Elements.Node. For every user we need a single node element.


In [9]:
let getCytoVertices (input: ((string*string)*int) []) = 
    input
    |> Seq.collect (fun ((s,t),w) ->
        let stylingSource = [CyParam.label s; CyParam.weight 12; CyParam.color person2Color.[s]]
        let stylingTarget = [CyParam.label t; CyParam.weight 12; CyParam.color person2Color.[t]]
        [|Elements.node s stylingSource;Elements.node t stylingTarget|]
        )
    |> Seq.distinct

Do the same with the edges!

In [10]:
let getCytoEdges (input: seq<(string*string)*int>)= 
    input 
    |> Seq.distinct
    |> Seq.mapi (fun i ((s,t),w) -> 
        //let styling = [CyParam.weight (sqrt (float w / 2.))]
        let styling = [CyParam.weight (log (float w))]
        Elements.edge ("e" + string i) s t styling
        )

Now isolate nodes and edges from the drinking parter data and create a CyGraph.

In [11]:
let socialVertices = getCytoVertices filteredPartnerCounts
let socialEdges = getCytoEdges filteredPartnerCounts

CyGraph.initEmpty ()
|> CyGraph.withElements socialVertices
|> CyGraph.withElements socialEdges

Do some styling using user names and deparment colors. The weights of the edge could represent the count of simultanous drinking.

In [12]:
let socialVertices = getCytoVertices filteredPartnerCounts
let socialEdges = getCytoEdges filteredPartnerCounts

let cytoGraph vertices edges = 
    CyGraph.initEmpty ()
    |> CyGraph.withElements vertices
    |> CyGraph.withElements edges
    |> CyGraph.withStyle "node" 
        [
            CyParam.shape "circle"
            CyParam.content =. CyParam.label
            CyParam.Text.Outline.color "#000000"
            CyParam.Text.Outline.width 1   
            CyParam.color "#FFFFFF"
            CyParam.Background.color =.CyParam.color //"grey"//
            CyParam.Border.color "#A00975"
        ]
    |> CyGraph.withStyle "edge" 
        [
            CyParam.Line.color "grey" //"#3D1244"
            CyParam.Curve.style "bezier"
            CyParam.width =. CyParam.weight
        ]
    |> CyGraph.withLayout (Layout.initCose id)   

cytoGraph socialVertices socialEdges


## Correlation network

Besides a social network we can also generate a day-based correlation network. Here we assign high correlation scores to a user-user pair, if their drinking behaviour is similar.

While this seems like that this analysis will show the same results as the network before, the readout will be different.

The most common correlation measure is the Pearsons correlation coefficient. It ranges from -1 to 1, while 0 indicates no correlation at all and 1 indicates a perfect correlation of two collections.

To be able to calculate correlations between two people, you could encode days when someone logged a drink by 1 and 0 otherwise.

Task: Create a nested collection as float [] [], that for each person contains an array of 1., -1 or 0..

```fsharp
[ //     | drank some kind of beverage on the second day since logging start
    [0.; 1.; -1.; ...] //Nicholas Powell
    [0.; 1.; 0.; ...] //Timo M.
    [-1.; 0.; 1.; ...] //Chloe Perkins
]
```


In [14]:
let allPersonNames = 
    orders
    |> Array.map (fun x -> x.Name)
    |> Array.distinct

let firstTicks = 
    allPersonNames
    |> Array.map (fun name -> 
        orders
        |> Array.find (fun order -> order.Name = name)
        |> fun x -> x.DateTime
        )       

let encodings = 
    orders
    |> Array.groupBy (fun x -> x.DateTime.Date)
    |> Array.map (fun (date,orders) -> 
        allPersonNames 
        |> Array.mapi (fun nameIndex name -> 
            let didPersonDrinkAtThisDate =
                orders 
                |> Array.exists (fun order -> order.Name = name)
            if date <= firstTicks.[nameIndex] then 
                0. 
            else 
                if didPersonDrinkAtThisDate then 1. else -1
        )
    )
    |> JaggedArray.transpose
        
encodings


index,value
0,"[ 0, 1, -1, -1, -1, 1, -1, -1, -1, -1, -1, -1, 1, -1, -1, -1, -1, -1, 1, -1 ... (1627 more) ]"
1,"[ 0, 1, -1, -1, 1, 1, 1, 1, 1, -1, 1, 1, 1, 1, 1, -1, 1, 1, 1, 1 ... (1627 more) ]"
2,"[ 0, 1, -1, -1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, -1, 1 ... (1627 more) ]"
3,"[ 0, 1, -1, -1, 1, 1, 1, 1, 1, -1, 1, 1, 1, -1, -1, 1, 1, 1, 1, 1 ... (1627 more) ]"
4,"[ 0, 1, 1, 1, 1, 1, 1, 1, 1, -1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 ... (1627 more) ]"
5,"[ 0, 1, -1, -1, 1, 1, 1, 1, -1, 1, 1, -1, 1, -1, -1, -1, -1, -1, -1, -1 ... (1627 more) ]"
6,"[ 0, 1, -1, -1, 1, 1, 1, 1, -1, -1, 1, 1, 1, 1, -1, -1, -1, -1, -1, -1 ... (1627 more) ]"
7,"[ 0, 1, -1, -1, 1, 1, 1, 1, 1, -1, 1, 1, 1, 1, 1, 1, 1, 1, -1, -1 ... (1627 more) ]"
8,"[ 0, 1, -1, -1, 1, 1, 1, 1, 1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1 ... (1627 more) ]"
9,"[ 0, 1, -1, -1, 1, 1, 1, 1, 1, -1, -1, -1, -1, -1, -1, 1, 1, 1, 1, -1 ... (1627 more) ]"


In [15]:
Array.zip allPersonNames encodings
|> DisplayExtensions.DisplayTable

Item1,Item2
Justin Bennett,"[ 0, 1, -1, -1, -1, 1, -1, -1, -1, -1, -1, -1, 1, -1, -1, -1, -1, -1, 1, -1 ... (1627 more) ]"
Timo M.,"[ 0, 1, -1, -1, 1, 1, 1, 1, 1, -1, 1, 1, 1, 1, 1, -1, 1, 1, 1, 1 ... (1627 more) ]"
Nicholas Thomas,"[ 0, 1, -1, -1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, -1, 1 ... (1627 more) ]"
Archie Nelson,"[ 0, 1, -1, -1, 1, 1, 1, 1, 1, -1, 1, 1, 1, -1, -1, 1, 1, 1, 1, 1 ... (1627 more) ]"
Benedikt V.,"[ 0, 1, 1, 1, 1, 1, 1, 1, 1, -1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 ... (1627 more) ]"
Philip Reilly,"[ 0, 1, -1, -1, 1, 1, 1, 1, -1, 1, 1, -1, 1, -1, -1, -1, -1, -1, -1, -1 ... (1627 more) ]"
Hugo Green,"[ 0, 1, -1, -1, 1, 1, 1, 1, -1, -1, 1, 1, 1, 1, -1, -1, -1, -1, -1, -1 ... (1627 more) ]"
Douglas Powell,"[ 0, 1, -1, -1, 1, 1, 1, 1, 1, -1, 1, 1, 1, 1, 1, 1, 1, 1, -1, -1 ... (1627 more) ]"
Muhammed Sullivan,"[ 0, 1, -1, -1, 1, 1, 1, 1, 1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1 ... (1627 more) ]"
Scott Woods,"[ 0, 1, -1, -1, 1, 1, 1, 1, 1, -1, -1, -1, -1, -1, -1, 1, 1, 1, 1, -1 ... (1627 more) ]"


Use a density point chart to visualize the encoding pairs of e.g. Benedikt (index 4) and Kevin (index 15).
You can additionally calculate the pearson correlation coefficient and do proper styling.

In [25]:
let nameA,exampleA = allPersonNames.[4],encodings.[4] 
let nameB,exampleB = allPersonNames.[15],encodings.[15]

let correlation = 
    FSharp.Stats.Correlation.Seq.pearson exampleA exampleB

Chart.PointDensity(exampleA,exampleB) 
|> Chart.withTitle $"Encoding pairs {nameA} vs {nameB}, correlation: %.2f{correlation}"
|> Chart.withXAxisStyle $"Encodings {nameA}"
|> Chart.withYAxisStyle $"Encodings {nameB}"
|> Chart.withAnnotation
    (Annotation.init(X=0.,Y=1.5,Text="0: not yet enroled<br>-1: not present that day<br>1: present that day",ShowArrow=false))
|> Chart.withTemplate ChartTemplates.lightMirrored

The encoded matrix already is in a form that can be used to calculate a pairwise pearson correlation matrix.

Use an appropriate function from the FSharp.Stats.Correlation module and visualize the resulting correlation matrix as heatmap.

By calling `Matrix.ofJaggedArray` or just `matrix` for short, you can convert this jagged array into a matrix.

In [18]:
let corrMat = 
    Correlation.Matrix.rowWisePearson (matrix encodings)

corrMat
|> Matrix.toJaggedArray
|> fun x -> Chart.Heatmap (x,colNames = allPersonNames,rowNames=allPersonNames)


To get an intuition of what the correlation distribution looks like, you can create a histogram out of the correlation coefficients. Note, that you should filter values of 1. since they are self-references.

In [20]:
corrMat
|> Matrix.toJaggedArray
|> Array.concat
|> Array.filter (fun x -> x <> 1.)
|> Chart.Histogram
|> Chart.withXAxisStyle "pearson correlation"
|> Chart.withYAxisStyle "count"


A method to identify an appropriate threshold for many types of correlation networks is called Random Matrix Theory. More indepth descriptions you can find [here](https://fslab.org/blog/posts/correlation-network.html).
The threshold for the coffee correlation network is precomputed because of runtime and LAPACK dependency.

Iterate over the correlation matrix, and whenever a correlation exceeds the threshold, create nodes for the people and an edge between them. 

In [21]:
let correlationThreshold = 
    //precomputed because of runtime and LAPACK dependency
    //Testing.RMT.compute 0.9 0.01 0.05 corrMat
    0.671875

let mutable nodelist : string list= []
let mutable edgelist: (string*string*float) list= []

corrMat
|> Matrix.mapi (fun r c x -> 
    if r < c then 
        if x > correlationThreshold then 
            nodelist <- allPersonNames.[r]::(allPersonNames.[c]::nodelist)
            edgelist <- (allPersonNames.[r],allPersonNames.[c],x)::edgelist
            1.
        else 0.
    else 0.)

let csbCytoVertices = 
    nodelist
    |> Seq.collect (fun s ->
        let stylingSource = [CyParam.label s; CyParam.weight 12; CyParam.color person2Color.[s]]
        [|Elements.node s stylingSource|]
        )
    |> Seq.distinct

let csbCytoEdges = 
    edgelist 
    |> Seq.distinct
    |> Seq.mapi (fun i (s,t,w) -> 
        let styling = [CyParam.weight (3. / abs w)]
        Elements.edge ("e" + string i) s t styling
        )

CyGraph.initEmpty ()
|> CyGraph.withElements csbCytoVertices
|> CyGraph.withElements csbCytoEdges
|> CyGraph.withStyle "node" 
    [
        CyParam.shape "circle"
        CyParam.content =. CyParam.label
        CyParam.Background.color  =.CyParam.color //"grey"//
        CyParam.Text.Outline.color "#000000"
        CyParam.Text.Outline.width 1   
        CyParam.color "#FFFFFF"
        CyParam.Border.color "#A00975"
    ]
|> CyGraph.withStyle "edge" 
    [
        CyParam.Line.color "grey"
        CyParam.Curve.style "bezier"
        CyParam.width =. CyParam.weight
    ]
|> CyGraph.withLayout (Layout.initCose id)  
|> CyGraph.withSize (1300,1000)
