# Baseball Demo

This notebook walks through a detailed example of the typical workflow that Graft aims to support.
The dataset used here was constructed by splicing together two separate datasets:

1. `SOCR Data MLB HeightsWeights`: heights, ages and weights of baseball players (vertex data). References:
    * Jarron M. Saint Onge, Patrick M. Krueger, Richard G. Rogers. (2008) Historical trends in height,
      weight, and body mass: Data from U.S. Major League Baseball players, 1869-1983. Economics & Human
      Biology, Volume 6, Issue 3, Symposium on the Economics of Obesity, December 2008, pages 482-488,
      ISSN 1570-677X, DOI: 10.1016/j.ehb.2008.06.008.
    * Jarron M. Saint Onge, Richard G. Rogers, Patrick M. Krueger. (2008) Major League Baseball Players'
      Life Expectancies. Southwestern Social Science Association, Volume 89, Issue 3, pages 817-830,
      DOI: 10.1111/j.1540-6237.2008.00562.x.
2. `Advogato Trust Network`: edge weights between 0 and 1 (edge data). References:
    * Advogato network dataset, KONECT, July 2016. <http://konect.uni-koblenz.de/networks/advogato>
    * Paolo Massa, Martino Salvetti, and Danilo Tomasoni. Bowling alone and trust decline in social network
      sites. In Proc. Int. Conf. Dependable, Autonomic and Secure Computing, pages 658-663, 2009.

The dataset has 6541 vertices and 51127 edges, with these properties:

* Vertex properties: `Age` (years), `Height` (cm), `Weight` (kg)
* Edge properties: `Trust` (float between 0 and 1)
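As a plain-Julia picture of this schema, a vertex can be modelled as a record carrying a label plus the three vertex properties, and an edge as a (source, target, trust) record. This is only an illustrative sketch: the type names `Player` and `TrustEdge` are hypothetical and are not part of Graft, which keeps properties in its own vertex and edge tables (displayed later as `V` and `E`).

```julia
# Hypothetical record types mirroring the dataset's schema (not Graft types)
struct Player
    label::String
    age::Float64        # years
    height_cm::Float64  # centimetres
    weight_kg::Float64  # kilograms
end

struct TrustEdge
    source::String      # label of the source vertex
    target::String      # label of the target vertex
    trust::Float64      # between 0 and 1
end

# Sample rows copied from the vertex and edge tables displayed further down
players = [
    Player("gc", 26.03, 182.88, 104.545),
    Player("prigaux", 25.43, 175.26, 84.0909),
]
trust_edges = [TrustEdge("gc", "prigaux", 0.978998)]

# Every edge endpoint should name a known player
labels = Set(p.label for p in players)
println(all(e -> e.source in labels && e.target in labels, trust_edges))
```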

In [5]:
## Load and summarize the graph.
using Graft
using StatsBase
import LightGraphs

# Download the dataset into Graft's examples directory
download(
"https://raw.githubusercontent.com/pranavtbhat/Graft.jl/gh-pages/Datasets/baseball.txt",
joinpath(Pkg.dir("Graft"), "examples/baseball.txt")
);




In [7]:
g = loadgraph(joinpath(Pkg.dir("Graft"), "examples/baseball.txt"))

Graph(6541 vertices, 51127 edges, Symbol[:Age,:Height,:Weight] vertex properties, Symbol[:Trust] edge properties)

In [8]:
# Get the graph's size: (number of vertices, number of edges)
size(g)

(6541,51127)

In [11]:
# List vertex labels
encode(g)

6541-element Array{Any,1}:
 "gc"                  
 "prigaux"             
 "fred"                
 "quintela"            
 "jgarzik"             
 "penso"               
 "leviramsey"          
 "havardk"             
 "sh"                  
 "zappy"               
 "ollesson"            
 "sander"              
 "caolan"              
 "samth"               
 "SteveMallett"        
 "duff"                
 "MikeCamel"           
 "mwk"                 
 "mobius"              
 "mbrubeck"            
 "tod"                 
 "rasmus"              
 "mrcsparker"          
 "jeje"                
 "sheath"              
 "dwaite"              
 "Erbo"                
 "pgmillard"           
 "jeremie"             
 "miguel"              
 "temas"               
 "DizzyD"              
 "eliot"               
 "julian"              
 "fingolfin"           
 "yiran"               
 "DaveGoehrig"         
 "agntdrake"           
 "tripix"              
 "beppu"               
 "Astinus"   

In [12]:
# Split the graph into vertex and edge descriptors
V,E = g;

In [13]:
# Display the vertex table
V

│ VertexID │ Labels       │ Age   │ Height │ Weight  │
├──────────┼──────────────┼───────┼────────┼─────────┤
│ 1        │ "gc"         │ 26.03 │ 182.88 │ 104.545 │
│ 2        │ "prigaux"    │ 25.43 │ 175.26 │ 84.0909 │
│ 3        │ "fred"       │ 24.51 │ 182.88 │ 93.6364 │
│ 4        │ "quintela"   │ 31.81 │ 193.04 │ 86.3636 │
│ 5        │ "jgarzik"    │ 27.32 │ 185.42 │ 90.9091 │
│ 6        │ "penso"      │ 25.5  │ 185.42 │ 86.3636 │
│ 7        │ "leviramsey" │ 32.68 │ 182.88 │ 90.9091 │
│ 8        │ "havardk"    │ 30.22 │ 182.88 │ 88.6364 │
│ 9        │ "sh"         │ 28.8  │ 182.88 │ 90.9091 │
│ 10       │ "zappy"      │ 29.54 │ 193.04 │ 104.545 │
│ 11       │ "ollesson"   │ 29.12 │ 180.34 │ 106.818 │
⋮
│ 6530     │ "boerner"    │ 30.46 │ 187.96 │ 88.1818 │
│ 6531     │ "barismetin" │ 32.68 │ 193.04 │ 89.5455 │
│ 6532     │ "baris"      │ 28.11 │ 180.34 │ 88.6364 │
│ 6533     │ "obritim"    │ 26.63 │ 187.96 │ 84.0909 │
│ 6534     │ "arabouma36" │ 24.15 │ 193.04 │ 81.8182 │
⋮

In [14]:
# Display the edge table
E

│ Index │ Source        │ Target        │ Trust     │
├───────┼───────────────┼───────────────┼───────────┤
│ 1     │ "gc"          │ "gc"          │ 0.42739   │
│ 2     │ "gc"          │ "prigaux"     │ 0.978998  │
│ 3     │ "gc"          │ "fred"        │ 0.714178  │
│ 4     │ "gc"          │ "penso"       │ 0.999861  │
│ 5     │ "gc"          │ "leviramsey"  │ 0.993962  │
│ 6     │ "gc"          │ "sh"          │ 0.336044  │
│ 7     │ "gc"          │ "fxn"         │ 0.0949308 │
│ 8     │ "gc"          │ "chromatic"   │ 0.778156  │
│ 9     │ "gc"          │ "strider"     │ 0.874019  │
│ 10    │ "gc"          │ "sdodji"      │ 0.282097  │
│ 11    │ "gc"          │ "Nyco"        │ 0.142455  │
⋮
│ 51116 │ "hulver"      │ "hulver"      │ 0.410183  │
│ 51117 │ "asanders"    │ "asanders"    │ 0.754016  │
│ 51118 │ "Aracnus"     │ "Aracnus"     │ 0.868275  │
│ 51119 │ "billstewart" │ "billstewart" │ 0.976183  │
│ 51120 │ "boerner"     │ "slok"        │ 0.864808  │
⋮

In [15]:
# Find the average BMI of baseball players
@query(g |> eachvertex(v.Weight / (v.Height / 100) ^ 2)) |> mean

26.23778373854929
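The `eachvertex` expression above evaluates BMI, weight in kilograms divided by the square of height in metres, once per vertex; the `/ 100` converts centimetres to metres. A minimal plain-Julia sketch of the same arithmetic on three rows of the vertex table (gc, prigaux, fred), using broadcasting instead of Graft's query macro:

```julia
using Statistics

# Sample rows from the vertex table: weight in kg, height in cm
weights_kg = [104.545, 84.0909, 93.6364]   # gc, prigaux, fred
heights_cm = [182.88, 175.26, 182.88]

# BMI = weight (kg) / height (m)^2; dividing by 100 converts cm to m
bmi = weights_kg ./ (heights_cm ./ 100) .^ 2

println(bmi)
println(mean(bmi))   # mean over these three players only
```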

In [16]:
# Find the median height, in feet (1 cm = 0.0328084 ft), of players in their 20s
@query(g |> filter(v.Age < 30, v.Age >= 20) |> eachvertex(v.Height * 0.0328084)) |> median

6.166666864000001
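The factor 0.0328084 converts centimetres to feet, so the result is a height in feet: 6.166666864 ft is 187.96 cm, or 74 inches. A sketch of the same filter-then-median pipeline in plain Julia, run on the first seven rows of the vertex table rather than on the full dataset:

```julia
using Statistics

feet_per_cm = 0.0328084   # same conversion factor as the query

# Age and height (cm) for the first seven rows of the vertex table
ages       = [26.03, 25.43, 24.51, 31.81, 27.32, 25.5, 32.68]
heights_cm = [182.88, 175.26, 182.88, 193.04, 185.42, 185.42, 182.88]

# Players in their 20s, mirroring filter(v.Age < 30, v.Age >= 20)
in_20s = (ages .>= 20) .& (ages .< 30)

median_ft = median(heights_cm[in_20s] .* feet_per_cm)
println(median_ft)   # 182.88 cm is 72 in, so this is very close to 6.0
```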

In [17]:
# Find the mean absolute age difference across strong (Trust > 0.8) edges
@query(g |> filter(e.Trust > 0.8) |> eachedge(s.Age - t.Age)) |> abs |> mean

4.163929464037767
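In this query, `s` and `t` refer to the source and target vertex of each edge, so the pipeline collects `s.Age - t.Age` over edges with `Trust > 0.8` and averages the absolute values. The sketch below reproduces that on the first six edges of the edge table. It writes the absolute value as the broadcast `abs.(...)`: applying `abs` directly to an array, as the notebook does, relies on vectorized methods that later Julia versions removed.

```julia
using Statistics

# Ages of the players involved (from the vertex table)
age = Dict("gc" => 26.03, "prigaux" => 25.43, "fred" => 24.51,
           "penso" => 25.5, "leviramsey" => 32.68, "sh" => 28.8)

# (source, target, trust) triples: edges 1-6 of the edge table
edge_list = [("gc", "gc", 0.42739), ("gc", "prigaux", 0.978998),
             ("gc", "fred", 0.714178), ("gc", "penso", 0.999861),
             ("gc", "leviramsey", 0.993962), ("gc", "sh", 0.336044)]

# Keep strong edges, take source-minus-target age differences, then |.|
strong = filter(e -> e[3] > 0.8, edge_list)
diffs  = [age[s] - age[t] for (s, t, w) in strong]
mean_abs_diff = mean(abs.(diffs))   # element-wise abs via broadcasting
println(mean_abs_diff)
```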

In [18]:
# Find fred's 3-hop neighborhood (friends, friends-of-friends, and their friends)
fred_nhood = hopgraph(g, "fred", 3)

Graph(1957 vertices, 29901 edges, Symbol[:Age,:Height,:Weight] vertex properties, Symbol[:Trust] edge properties)
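A k-hop neighborhood is the set of vertices that can be reached from a seed in at most k edges, which a breadth-first search finds level by level. The sketch below is a hypothetical helper, `hop_vertices`, not Graft's implementation: it returns only the vertex set (whereas `hopgraph` returns a subgraph with properties), it assumes hops follow edges in their source-to-target direction, and it accepts several seeds at once, as in the multi-seed `hopgraph` call further down.

```julia
# Hypothetical helper: vertices within k hops of any seed, found by a
# breadth-first search that follows outgoing edges only (an assumption).
function hop_vertices(adj::Dict{String,Vector{String}}, seeds::Vector{String}, k::Int)
    dist = Dict(s => 0 for s in seeds)   # hop count at which each vertex was reached
    frontier = copy(seeds)
    for hop in 1:k
        next_frontier = String[]
        for u in frontier, v in get(adj, u, String[])
            if !haskey(dist, v)          # first time we reach v
                dist[v] = hop
                push!(next_frontier, v)
            end
        end
        frontier = next_frontier
    end
    return Set(keys(dist))
end

# Toy graph: a chain a -> b -> c -> d -> e
adj = Dict("a" => ["b"], "b" => ["c"], "c" => ["d"], "d" => ["e"])
println(hop_vertices(adj, ["a"], 3))        # a, b, c, d (e is 4 hops away)
println(hop_vertices(adj, ["a", "d"], 1))   # two seeds expanded together
```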

In [19]:
# See how well the older players (over 30) in fred's neighborhood trust each other
@query(fred_nhood |> filter(v.Age > 30) |> eachedge(e.Trust)) |> mean

0.5495668265206273
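The `filter(v.Age > 30)` step removes vertices, and the mean is then taken over the edges that remain. A sketch of that two-step logic on made-up toy data, assuming (as an induced subgraph would) that an edge survives only when both of its endpoints pass the vertex filter:

```julia
using Statistics

# Hypothetical toy data: vertex ages and (source, target, trust) edges
age = Dict("a" => 35.0, "b" => 31.0, "c" => 24.0, "d" => 33.0)
edge_list = [("a", "b", 0.9), ("b", "d", 0.5), ("a", "c", 0.7), ("c", "d", 0.2)]

# Vertex filter: keep players over 30 ...
keep = Set(v for (v, a) in age if a > 30)

# ... then keep only edges whose endpoints both survived (induced subgraph)
kept = filter(e -> e[1] in keep && e[2] in keep, edge_list)
mean_trust = mean(e[3] for e in kept)
println(mean_trust)
```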

In [20]:
# Find the 3-hop neighborhood of two separate vertices (multi-seed traversal)
sg = hopgraph(g, ["nikolay", "jbert"], 3)

Graph(1615 vertices, 23569 edges, Symbol[:Age,:Height,:Weight] vertex properties, Symbol[:Trust] edge properties)

In [22]:
# Derive an edge distance property as the inverse of trust (higher trust, shorter distance)
dists = @query(sg |> eachedge(1 / e.Trust));
seteprop!(sg, :, dists, :Dist);
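Taking the reciprocal turns trust into a distance that shortest-path algorithms can minimize: a fully trusted edge (trust 1) has distance 1, and the distance grows without bound as trust approaches 0, so short paths run along chains of highly trusted edges. The two trust values below come from the edge descriptor at the end of this notebook (lkcl to chalst, lkcl to splork), whose `Dist` column lists the matching reciprocals:

```julia
# Trust values from the final edge descriptor: lkcl -> chalst, lkcl -> splork
trust = [0.753731, 0.103399]

# Distance = 1 / trust, computed element-wise
dist = 1 ./ trust
println(dist)   # close to the Dist column: 1.32673 and 9.67129
```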

In [23]:
# Drop edges with distance >= 10 (equivalently, trust <= 0.1)
sg = @query(sg |> filter(e.Dist < 10))

Graph(1615 vertices, 22108 edges, Symbol[:Age,:Height,:Weight] vertex properties, Symbol[:Trust,:Dist] edge properties)
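Because Dist = 1 / Trust and trust is positive, the condition `e.Dist < 10` keeps exactly the edges with trust above 0.1; here it removed 23569 - 22108 = 1461 edges. A quick check of that equivalence on three trust values taken from the edge tables (gc to fxn, lkcl to splork, lkcl to chalst):

```julia
# Trust values from the edge tables: gc -> fxn, lkcl -> splork, lkcl -> chalst
trust = [0.0949308, 0.103399, 0.753731]
dist  = 1 ./ trust

# filter(e.Dist < 10) on the distance is the same as trust > 0.1
kept_by_dist  = dist .< 10
kept_by_trust = trust .> 0.1
println(kept_by_dist == kept_by_trust)   # true; only the gc -> fxn edge is dropped
```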

In [24]:
# Export the graph's adjacency matrix and build a LightGraphs directed graph from it
M = export_adjacency(sg)
lg = LightGraphs.DiGraph(M)

{1615, 22108} directed graph

In [26]:
# Export the Dist edge property as a matrix, used as edge weights below
D = export_edge_property(sg, :Dist);
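Both exports are matrices indexed by vertex number. A sketch of what such matrices hold, built with Julia's `SparseArrays.sparse` from a hypothetical three-vertex edge list: the positions of the stored entries form the adjacency structure, and the stored values are the `Dist` weights. The row = source, column = target orientation is this sketch's own convention; the orientation Graft uses for `export_adjacency` and `export_edge_property` is not shown in this notebook.

```julia
using SparseArrays

# Hypothetical edge list over vertices 1..3: (source, target, distance)
src  = [1, 2, 1]
dst  = [2, 3, 3]
dist = [1.0, 1.0, 5.0]
n = 3

# Row = source, column = target; absent edges are simply not stored
D = sparse(src, dst, dist, n, n)

# The stored positions give the (unweighted) adjacency structure
rows, cols, vals = findnz(D)
println(collect(zip(rows, cols)))   # stored (source, target) pairs
println(nnz(D))                     # 3 edges
```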

In [27]:
# Compute betweenness centrality (on the unweighted graph lg; D is not used here)
centrality = LightGraphs.betweenness_centrality(lg)

1615-element Array{Float64,1}:
 0.0352864  
 0.0180542  
 0.0145245  
 1.73845e-5 
 0.00615578 
 0.0232976  
 0.00730561 
 0.00691493 
 0.00599549 
 0.0074532  
 0.0        
 0.144516   
 0.00696514 
 0.0202448  
 0.0367132  
 0.00145273 
 0.0034288  
 0.00563605 
 0.0206077  
 0.00128917 
 0.0138399  
 0.0426688  
 0.00300789 
 0.00877153 
 0.0138812  
 0.00324878 
 0.00921783 
 0.000354597
 0.00828943 
 0.00811383 
 0.00104084 
 0.0112866  
 0.00524107 
 0.000407162
 0.0902401  
 0.00505601 
 0.0180029  
 0.00609432 
 0.00511491 
 0.0125463  
 0.00277507 
 0.00211868 
 0.0012625  
 0.00116903 
 0.0014234  
 0.00200895 
 0.00588454 
 0.00256029 
 0.0029032  
 0.00339986 
 0.0113789  
 0.00861584 
 0.000581583
 0.0105805  
 0.000914177
 0.000964668
 8.13436e-5 
 0.00160023 
 3.39211e-5 
 0.00119636 
 0.0031265  
 0.0120089  
 0.000268437
 0.0017777  
 0.00208762 
 0.00363503 
 0.00518    
 0.000490992
 0.000260685
 0.00492723 
 6.58264e-5 
 0.000208845
 0.00781227 
 0.00191447 
 ⋮
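Betweenness centrality scores a vertex by how often it lies on shortest paths between other pairs of vertices. Because the cell above passes only `lg`, these scores count hops and ignore the `Dist` weights. Brandes' algorithm is the standard way to compute the unweighted version; the sketch below is an illustrative reimplementation for a directed graph given as adjacency lists, not LightGraphs' code, and it returns raw scores. The notebook's values are all below 1, which is consistent with normalized scores; for directed graphs the usual normalization divides by (n - 1)(n - 2).

```julia
# Brandes' algorithm: unnormalized betweenness on an unweighted directed
# graph, given as adjacency lists over vertices 1..n.
function betweenness(adj::Vector{Vector{Int}})
    n = length(adj)
    cb = zeros(n)
    for s in 1:n
        stack = Int[]                    # vertices in order of discovery
        preds = [Int[] for _ in 1:n]     # shortest-path predecessors
        sigma = zeros(n); sigma[s] = 1.0 # number of shortest s-v paths
        d = fill(-1, n); d[s] = 0        # BFS distance from s (-1 = unseen)
        queue = [s]
        while !isempty(queue)
            v = popfirst!(queue)
            push!(stack, v)
            for w in adj[v]
                if d[w] < 0              # w found for the first time
                    d[w] = d[v] + 1
                    push!(queue, w)
                end
                if d[w] == d[v] + 1      # edge v -> w lies on a shortest path
                    sigma[w] += sigma[v]
                    push!(preds[w], v)
                end
            end
        end
        delta = zeros(n)                 # dependency of s on each vertex
        while !isempty(stack)            # process vertices farthest first
            w = pop!(stack)
            for v in preds[w]
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            end
            if w != s
                cb[w] += delta[w]
            end
        end
    end
    return cb
end

# Directed path 1 -> 2 -> 3: only vertex 2 sits between another pair (1, 3)
println(betweenness([[2], [3], Int[]]))
```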

In [29]:
# Set the centrality as a vertex property
setvprop!(sg, :, centrality, :Centrality);

In [30]:
# Compute all-pairs shortest paths (Floyd-Warshall), weighting edges by D
apsp = LightGraphs.floyd_warshall_shortest_paths(lg, D).dists;
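Floyd-Warshall computes every pairwise shortest distance by allowing each vertex in turn as an intermediate stop and keeping any shorter route it opens up, in O(n^3) time. A minimal sketch on a dense matrix, assuming missing edges are encoded as `Inf` (a convention of this sketch; `floyd_warshall_shortest_paths` instead takes the graph plus a distance matrix):

```julia
# Floyd-Warshall on a dense distance matrix: D[u, v] is the weight of the
# edge u -> v, or Inf where there is no edge.
function floyd_warshall(D::Matrix{Float64})
    n = size(D, 1)
    dist = copy(D)
    for i in 1:n
        dist[i, i] = min(dist[i, i], 0.0)   # each vertex is 0 away from itself
    end
    for k in 1:n, i in 1:n, j in 1:n       # k (intermediate vertex) is outermost
        if dist[i, k] + dist[k, j] < dist[i, j]
            dist[i, j] = dist[i, k] + dist[k, j]
        end
    end
    return dist
end

D = [Inf 1.0 5.0;
     Inf Inf 1.0;
     Inf Inf Inf]
println(floyd_warshall(D)[1, 3])   # 2.0: the path 1 -> 2 -> 3 beats the direct edge of 5.0
```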

In [31]:
# Store each edge's shortest-path distance as a new edge property, :Shortest_Dists
eit = edges(sg);
seteprop!(sg, :, [apsp[e.second,e.first] for e in eit], :Shortest_Dists);
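Each edge then picks up the all-pairs distance between its own two endpoints. Since the direct edge is itself a path, that value can never exceed the edge's `Dist`; the edge descriptor below shows this, e.g. lkcl to splork has `Dist` 9.67129 but `Shortest_Dists` 2.17231. A sketch of the lookup with a hypothetical precomputed distance matrix indexed `[source, target]`; the notebook indexes `apsp[e.second, e.first]` instead, an order that presumably matches the orientation of the exported matrix, which this sketch does not reproduce.

```julia
# Hypothetical all-pairs distances (row = source, column = target),
# e.g. the output of a Floyd-Warshall run on a 3-vertex graph
apsp = [0.0 1.0 2.0;
        Inf 0.0 1.0;
        Inf Inf 0.0]

# Edges as (source, target, direct distance)
edge_list = [(1, 2, 1.0), (2, 3, 1.0), (1, 3, 5.0)]

# One shortest-path value per edge, in edge order
shortest = [apsp[u, v] for (u, v, w) in edge_list]
println(shortest)

# The direct edge is itself a path, so the shortest distance never exceeds it
println(all(shortest[i] <= edge_list[i][3] for i in eachindex(edge_list)))
```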

In [32]:
# Show the new vertex descriptor (now with a Centrality column)
VertexDescriptor(sg)

│ VertexID │ Labels        │ Age   │ Height │ Weight  │ Centrality  │
├──────────┼───────────────┼───────┼────────┼─────────┼─────────────┤
│ 1        │ "lkcl"        │ 30.51 │ 190.5  │ 95.4545 │ 0.0352864   │
│ 2        │ "chalst"      │ 27.16 │ 187.96 │ 79.5455 │ 0.0180542   │
│ 3        │ "jrf"         │ 27.23 │ 182.88 │ 81.8182 │ 0.0145245   │
│ 4        │ "Astinus"     │ 33.77 │ 190.5  │ 81.8182 │ 1.73845e-5  │
│ 5        │ "halcy0n"     │ 30.8  │ 187.96 │ 90.9091 │ 0.00615578  │
│ 6        │ "mbp"         │ 24.21 │ 182.88 │ 113.182 │ 0.0232976   │
│ 7        │ "sulaiman"    │ 33.15 │ 198.12 │ 100.0   │ 0.00730561  │
│ 8        │ "crackmonkey" │ 27.08 │ 185.42 │ 109.091 │ 0.00691493  │
│ 9        │ "ajv"         │ 32.84 │ 180.34 │ 90.9091 │ 0.00599549  │
│ 10       │ "lukeh"       │ 30.99 │ 185.42 │ 81.8182 │ 0.0074532   │
│ 11       │ "AndreyGolub" │ 29.84 │ 193.04 │ 86.3636 │ 0.0         │
⋮
│ 1604     │ "jwoolley"    │ 40.66 │ 180.34 │ 77.2727 │ 1.26063e-5  │
⋮

In [33]:
# Show the new edge descriptor
EdgeDescriptor(sg)

│ Index │ Source     │ Target        │ Trust    │ Dist    │ Shortest_Dists │
├───────┼────────────┼───────────────┼──────────┼─────────┼────────────────┤
│ 1     │ "lkcl"     │ "chalst"      │ 0.753731 │ 1.32673 │ 1.32673        │
│ 2     │ "lkcl"     │ "jrf"         │ 0.837243 │ 1.1944  │ 1.1944         │
│ 3     │ "lkcl"     │ "Astinus"     │ 0.620516 │ 1.61156 │ 1.61156        │
│ 4     │ "lkcl"     │ "halcy0n"     │ 0.704766 │ 1.41891 │ 1.41891        │
│ 5     │ "lkcl"     │ "mbp"         │ 0.879317 │ 1.13725 │ 1.13725        │
│ 6     │ "lkcl"     │ "sulaiman"    │ 0.352907 │ 2.83361 │ 2.33345        │
│ 7     │ "lkcl"     │ "crackmonkey" │ 0.223243 │ 4.47942 │ 3.25504        │
│ 8     │ "lkcl"     │ "ajv"         │ 0.427735 │ 2.3379  │ 2.3379         │
│ 9     │ "lkcl"     │ "AndreyGolub" │ 0.896434 │ 1.11553 │ 1.11553        │
│ 10    │ "lkcl"     │ "fxn"         │ 0.187012 │ 5.34724 │ 2.21906        │
│ 11    │ "lkcl"     │ "splork"      │ 0.103399 │ 9.67129 │ 2.17231        │