## Epilepsy Comorbidities (using Brown MySQL server)

This script runs a PubMed-Comorbidities pipeline with the following characteristics:

* Main MeSH Heading: Epilepsy
* UMLS filtering concept: Disease or Syndrome
* Articles analysed: All MEDLINE 2017AA articles tagged with the main heading (Epilepsy) as a MeSH Heading. Note that this is equivalent to searching PubMed using [MH:noexp].
  Total number of articles found: 66720
* UMLS concept filtering: Comorbidities are analysed on all other MeSH descriptors associated with the specified UMLS concept
* This script uses Brown MySQL databases:
    * medline
    * umls_meta
    * pubmed_miner

In [None]:
# addprocs(12);

In [4]:
using Revise #used during development to detect changes in module
using PubMedMiner
 
#Settings
mh = "Epilepsy"
concept = "Disease or Syndrome";

## 1. Save related occurrences to database

The following code saves to the pubmed_miner database a table containing the list of PMIDs and MeSH descriptors that match the specified filtering criteria.

In [5]:
overwrite = false
if overwrite
    info("----------------Start: umls_semantic_occurrences")             
    @time save_semantic_occurrences(mh, concept; overwrite = overwrite) 
end

## 2. Retrieve results and analyse simple occurrences and co-occurrences

In [6]:
using FreqTables

occurrence_df = get_semantic_occurrences_df(mh, concept)
@time mesh_frequencies = freqtable(occurrence_df, :pmid, :descriptor);

[1m[36mINFO: [39m[22m[36mUsing concept table: MESH_T047
[39m

  1.774082 seconds (2.40 M allocations: 883.272 MiB, 3.74% gc time)


In [8]:
using PlotlyJS
using NamedArrays

# Visualize frequency 
topn = 50
mesh_counts = vec(sum(mesh_frequencies, 1))
count_perm = sortperm(mesh_counts, rev=true)
mesh_names = collect(keys(mesh_frequencies.dicts[2]))

#traces
#most frequent is epilepsy - remove from plot for better scaling
freq_trace = PlotlyJS.bar(; x = mesh_names[count_perm[2:topn]], y= mesh_counts[count_perm[2:topn]], marker_color="orange")

data = [freq_trace]
layout = Layout(;title="$(topn)-Most Frequent MeSH ",
                 showlegend=false,
                 margin= Dict(:t=> 70, :r=> 0, :l=> 50, :b=>200),
                 xaxis_tickangle = 90,)
plot(data, layout)
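
The ranking step in the cell above can be reproduced on a toy matrix. The sketch below is written for Julia 1.x (the notebook itself uses pre-1.0 idioms such as `sum(x, 1)`). The matrix `occ` and the descriptor names are made up purely for illustration.

```julia
# Toy occurrence matrix (illustrative data, not from MEDLINE):
# rows are articles, columns are MeSH descriptors, 1 means "tagged".
names = ["Epilepsy", "Seizures", "Stroke"]
occ = [1 1 0;
       1 0 1;
       1 1 0;
       1 0 0]

# Column sums give the number of articles per descriptor.
counts = vec(sum(occ, dims=1))

# Indices of descriptors sorted from most to least frequent.
order = sortperm(counts, rev=true)

# Drop the first entry (the main heading itself) before plotting,
# as the notebook does with count_perm[2:topn].
top_names = names[order[2:end]]
top_counts = counts[order[2:end]]
```

Skipping the first entry keeps the main heading, which by construction appears in every article, from dominating the y axis.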

## 3. Pair Statistics

* Mutual information
* Chi-Square
* Co-occurrence matrix
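
As a rough illustration of these three statistics, the sketch below computes them from a small binary occurrence matrix using textbook definitions. It is written for Julia 1.x and is not necessarily what `BCBIStats.COOccur` computes internally. The toy matrix `occ` and the helper `chi2_pair` are assumptions introduced here.

```julia
using LinearAlgebra  # for diag

# Toy binary occurrence matrix (illustrative): rows are articles, columns are descriptors.
occ = [1 1 0;
       1 0 1;
       1 1 0;
       0 1 1]
n_articles = size(occ, 1)

# Co-occurrence counts: entry (i, j) is the number of articles tagged with both i and j.
# The diagonal holds the per-descriptor article counts.
coo = occ' * occ

# Marginal and joint probabilities estimated from the counts.
p = diag(coo) ./ n_articles
p_joint = coo ./ n_articles

# Pointwise mutual information: log(p(i,j) / (p(i) * p(j))).
# Pairs that never co-occur would give -Inf.
pmi = log.(p_joint ./ (p * p'))

# Pearson chi-square statistic for the 2x2 contingency table of descriptors i and j
# (hypothetical helper, standard formula without continuity correction).
function chi2_pair(coo, n, i, j)
    a = coo[i, j]          # both i and j
    b = coo[i, i] - a      # i only
    c = coo[j, j] - a      # j only
    d = n - a - b - c      # neither
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return denom == 0 ? 0.0 : n * (a * d - b * c)^2 / denom
end

chi2_12 = chi2_pair(coo, n_articles, 1, 2)
```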

In [9]:
using BCBIStats.COOccur
using StatsBase

#co-occurrence matrix - only for top MeSH 
# min_frequency = 5 -- alternatively compute topn based on min-frequency
top_occ = mesh_frequencies.array[:, count_perm[2:topn]]
top_occ_sp = sparse(top_occ)
coo_sp = top_occ_sp' * top_occ_sp

#Pointwise Mutual Information
pmi_sp = BCBIStats.COOccur.pmi_mat(coo_sp)

49×49 SparseMatrixCSC{Float64,Int64} with 2401 stored entries:
  [1 ,  1]  =  0.000503018
  [2 ,  1]  =  2.89968e-5
  [3 ,  1]  =  1.83472e-5
  [4 ,  1]  =  1.8642e-5
  [5 ,  1]  =  3.95511e-6
  [6 ,  1]  =  1.35698e-5
  [7 ,  1]  =  1.29747e-5
  [8 ,  1]  =  0.000102379
  [9 ,  1]  =  3.25866e-5
  [10,  1]  =  1.50785e-5
  ⋮
  [39, 49]  =  0.0
  [40, 49]  =  0.000180923
  [41, 49]  =  0.0
  [42, 49]  =  0.0
  [43, 49]  =  4.07349e-5
  [44, 49]  =  0.0
  [45, 49]  =  0.0
  [46, 49]  =  0.0
  [47, 49]  =  0.0
  [48, 49]  =  0.000135153
  [49, 49]  =  0.00680272

In [10]:
#chi2
top_chi2= BCBIStats.COOccur.chi2_mat(top_occ, min_freq=0);

49×49 LowerTriangular{Float64,Array{Float64,2}}:
    0.0             ⋅         …     ⋅         ⋅          ⋅       ⋅ 
   45.3346         0.0              ⋅         ⋅          ⋅       ⋅ 
    2.60182        0.0484257        ⋅         ⋅          ⋅       ⋅ 
    2.97698       55.7321           ⋅         ⋅          ⋅       ⋅ 
   23.7799         3.30606          ⋅         ⋅          ⋅       ⋅ 
    0.300441      23.4567     …     ⋅         ⋅          ⋅       ⋅ 
    0.567171       1.00305          ⋅         ⋅          ⋅       ⋅ 
  898.938          0.198816         ⋅         ⋅          ⋅       ⋅ 
   36.411          7.16654          ⋅         ⋅          ⋅       ⋅ 
    0.000945053    0.290322         ⋅         ⋅          ⋅       ⋅ 
    0.104811      16.7425     …     ⋅         ⋅          ⋅       ⋅ 
   10.5051         7.25461          ⋅         ⋅          ⋅       ⋅ 
   21.1911         0.0741725        ⋅         ⋅          ⋅       ⋅ 
    ⋮                         ⋱    ⋮                               

In [175]:
using NetworkLayout.Circular
using LightGraphs

#Build network layout from adjacency matrix (handles co-occurrence)
#Returns the coordinates of nodes in the layout
circular_net = NetworkLayout.Circular.layout(coo_sp);
G = Graph(coo_sp - spdiagm(diag(coo_sp)))


{49, 856} undirected simple Int64 graph

In [226]:
# Bezier mess:
#Translated from Python example https://plot.ly/python/chord-diagram/
function dist(A,B)
    norm(A -B)
end

"""
Returns the index of the interval the distance d belongs to
"""
function get_idx_interv(d, D)
    k=1
    while(d>D[k])
        k+=1
    end
    return  k-1
end
    
"""
Returns the point corresponding to the parameter t, on a Bézier curve of control points given in the list b
"""
function deCasteljau(b,t)
    N=length(b) 
    if(N<2)
        error("The  control polygon must have at least two points")
    end
    a=copy(b) #shallow copy of the list of control points 
    for r=1:N
        a[1:N-r,:]=(1-t)*a[1:N-r,:]+t*a[2:N-r+1,:]
    end
    return a[1,:][1]
end

"""
Returns an array of shape (nr, 2) containing the coordinates of nr points evaluated on the Bézier curve, 
at equally spaced parameters in [0,1].
"""
function BezierCv(b; nr=5)
    t=linspace(0, 1, nr)
    bp = Array{Float64}(nr, 2)
    for k=1:nr
        bp[k,:] = deCasteljau(b, t[k])
    end
    return bp
end

#unit circle control points angles 0,π/4 π/2,3π/4,π
cpoints = Array{Array{Float64}}(5)
cpoints[1] = [1, 0]
cpoints[2] = sqrt(2)/2.* [1, 1]
cpoints[3] = [0, 1]
cpoints[4] = sqrt(2)/2.* [-1, 1]
cpoints[5] = [-1, 0]

thresh_dist = [dist(cpoints[1], cpoints[i]) for i=1:5]
params=[1.2, 1.5, 1.8, 2.1]



4-element Array{Float64,1}:
 1.2
 1.5
 1.8
 2.1
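
For reference, the de Casteljau evaluation used by `deCasteljau` and `BezierCv` can be written more directly on a vector of points. This is a minimal sketch for Julia 1.x, where `linspace` has been replaced by `range`. The names `de_casteljau` and `bezier_curve` are introduced here, and the control points are the ones used in the debugging cell further down.

```julia
# Evaluate a Bézier curve at parameter t by repeated linear interpolation
# between neighbouring control points (de Casteljau's algorithm).
function de_casteljau(points::Vector{Vector{Float64}}, t::Real)
    length(points) >= 2 || error("The control polygon must have at least two points")
    a = [copy(p) for p in points]   # work on a copy so the inputs are untouched
    n = length(a)
    for r in 1:n-1                  # n-1 interpolation levels
        for i in 1:n-r
            # a[i+1] still holds the value from the previous level here
            a[i] = (1 - t) .* a[i] .+ t .* a[i+1]
        end
    end
    return a[1]
end

# Sample the curve at nr equally spaced parameters in [0, 1].
function bezier_curve(points; nr=5)
    ts = range(0, stop=1, length=nr)
    return [de_casteljau(points, t) for t in ts]
end

b = [[1.0, 0.0], [0.833333, 0.0], [0.826492, 0.106564], [0.99179, 0.127877]]
curve = bezier_curve(b; nr=5)
```

The curve starts at the first control point and ends at the last one, which is what lets each edge connect its two nodes on the circle.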

In [334]:
#plot
labels = mesh_names[count_perm[2:topn]];

max_val = maximum(coo_sp - spdiagm(diag(coo_sp)))

# colors = distinguishable_colors(length(labels), RGB(1,0,0))
locs_x = map( (point)->point[1], circular_net )
locs_y = map( (point)->point[2], circular_net )

traces=Array{PlotlyJS.GenericTrace{Dict{Symbol,Any}},1}()


# Bezier curves:
for e in edges(G)
    A = [ locs_x[e.src], locs_y[e.src]]
    B = [ locs_x[e.dst], locs_y[e.dst]]
    d= dist(A, B)
    K= get_idx_interv(d, thresh_dist)
    b= [A, A/params[K], B/params[K], B]
    pts= BezierCv(b, nr=5)
#     println(pts)
#     println("------")
    
    line_trace =scatter(x=pts[:,1],
                     y=pts[:,2],
                     mode="lines",
                     line=attr(shape="spline", color="rgba(0,51,181, 0.85)",
                               width=5*coo_sp[e.src, e.dst]./max_val#The  width is proportional to the edge weight
                              ), 
                    hoverinfo="none"
                   )
    push!(traces, line_trace)
end

node_trace = scatter(;x=locs_x, y=locs_y, mode="markers",
                marker=attr(symbol="circle-open", size=mesh_counts[count_perm[2:topn]]/50, color="rgba(0,51,181, 0.85)"),
                hoverinfo="none"
                ) #, mode="text",textposition="top", text=labels, opacity=0.8)

push!(traces, node_trace)


all_an = []
for pid = 1:length(locs_x)
    annot = attr(x=locs_x[pid],y=locs_y[pid],xref="x", yref="y", text=labels[pid], textangle=-atan(locs_y[pid]./locs_x[pid]).*180/π,
                    showarrow=true,  ax=locs_x[pid]*130, ay=-locs_y[pid]*130, arrowhead = 6)
    push!(all_an, annot)
end


layout = Layout(showlegend=false, width=900, height=1100, showgrid=false,
                xaxis=attr(showline=false, zeroline=false, showgrid=false, showticklabels=false),
                yaxis=attr(showline=false, zeroline=false, showgrid=false, showticklabels=false),
                annotations = all_an)

plot(traces, layout)
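
The way each edge picks its control points can be isolated from the plotting code. The sketch below, for Julia 1.x, rebuilds the distance thresholds and selects the `params` entry for a pair of nodes. Clamping the interval index is an addition made here so that a distance of exactly 0, or one marginally above 2 due to rounding, cannot index out of bounds. `interval_index` and `control_points` are hypothetical names.

```julia
using LinearAlgebra  # for norm

# Chord lengths from (1, 0) to the unit-circle points at angles 0, π/4, π/2, 3π/4, π.
thresholds = [norm([1.0, 0.0] .- [cos(θ), sin(θ)]) for θ in (0, π/4, π/2, 3π/4, π)]
params = [1.2, 1.5, 1.8, 2.1]

# Index of the interval (thresholds[k], thresholds[k+1]] that contains d,
# clamped to the valid range of params.
function interval_index(d, thresholds)
    k = searchsortedfirst(thresholds, d) - 1
    return clamp(k, 1, length(thresholds) - 1)
end

# Control polygon for the edge A -> B: the further apart the nodes,
# the more the inner control points are pulled towards the centre.
function control_points(A, B)
    K = interval_index(norm(A .- B), thresholds)
    return [A, A ./ params[K], B ./ params[K], B]
end

cp = control_points([1.0, 0.0], [-1.0, 0.0])
```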

In [257]:
atan(locs_y[7]/locs_x[7])*180/pi

In [216]:
t_all=linspace(0, 1, 5)
b = [[1.0, 0.0], [0.833333, 0.0], [0.826492, 0.106564], [0.99179, 0.127877]]
# deCasteljau(b, t[1])
t=t_all[1]
N=length(b) 
if(N<2)
    error("The  control polygon must have at least two points")
end
a=copy(b) #shallow copy of the list of control points 
for r=1:N
    a[1:N-r,:]=(1-t)*a[1:N-r,:]+t*a[2:N-r+1,:]
end
a[1,:]

1-element Array{Array{Float64,1},1}:
 [1.0, 0.0]

In [186]:
# #co-occurrence matrix is symmetric - we are only interested in
# # non-repeating, off-diagonal elements
# # co_occur_nr = (coo_sp - spdiagm(diag(coo_sp)))
# m1, m2 =findnz(LowerTriangular(coo_sp - spdiagm(diag(coo_sp))))
# x0 = locs_x[m1]
# y0 = locs_y[m1]
# x1 = locs_x[m2]
# y1 = locs_y[m2]
# weights = nonzeros(sparse(coo_sp - LowerTriangular(coo_sp)))
# # for i in eachindex(co_occur_nr)
# #     println(i)
# #     println("----")
# # #     push!(x0, locs_x[i[1]])
# # #     push!(y0, locs_y[i[1]])
# # #     push!(x1, locs_x[i[2]])
# # #     push!(y1, locs_y[i[2]])
# # end
        
# # line_trace  = PlotlyJS.line(x0[1:10], x1[1:10], y0[1:10], y1[1:10]; opacity=0.7, line_width = (weights./maximum(weights))[1:10], shape="spline")
# lines = []
# #represent edges as lines in a scatter
# for i=1:10 
#     x0 = locs_x[edge.source]
#     y0 = locs_y[edge.source]
#     z0 = locs_z[edge.source]

#     x1 = locs_x[edge.target]
#     y1 = locs_y[edge.target]
#     z1 = locs_z[edge.target]

#     t = PlotlyJS.scatter(;x=[x0, x1], y=[y0, y1], z=[z0, z1],
#     mode="lines", opacity=0.8, line=attr(color="grey", width=2))
#     push!(traces, t)
# end
# lines.append(Scatter(x=pts[:,0],
#                          y=pts[:,1],
#                          mode='lines',
#                          line=Line(color=color, 
#                                   shape='spline',
#                                   width=Weights[j]/5#The  width is proportional to the edge weight
#                                  ), 
#                         hoverinfo='none' 
#                        )
    
# layout = Layout(shapes=line_trace, showlegend=false, width=600, height=800, showgrid=false,
#                 xaxis=attr(showline=false, zeroline=false, showgrid=false, showticklabels=false),
#                 yaxis=attr(showline=false, zeroline=false, showgrid=false, showticklabels=false))

# plot([node_trace], layout)

In [148]:
weights = nonzeros(sparse(coo_sp - LowerTriangular(coo_sp)))
maximum(weights)

In [180]:
for e in edges(G)
       println(e.src)
end

1
1
1
⋮
12
12
12

In [11]:
using LightGraphs

function coo2LG(coo)
    #co-occurrence matrix is symmetric - we are only interested in
    # non-repeating, off-diagonal elements
    co_occur_nr = (coo - spdiagm(diag(coo)))

    ind = findn(co_occur_nr .>0)
    g = Graph(size(co_occur_nr, 1))


    for i in range(1,length(ind[1]))
        row = ind[1][i]
        col = ind[2][i]
        if col < row
            # println(row, col)
            add_edge!(g,row,col)
        end
    end

    return g

end

coo2LG (generic function with 1 method)

In [12]:
G = coo2LG(coo_sp)

{49, 856} undirected simple Int64 graph
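
The edge selection inside `coo2LG` does not depend on LightGraphs and can be checked on its own. This is a minimal sketch for Julia 1.x (where `findn` no longer exists). The helper `lower_triangle_edges` and the toy matrix are assumptions made for illustration.

```julia
# Collect one (row, col) pair per co-occurring descriptor pair by scanning
# only the strict lower triangle of a symmetric co-occurrence matrix.
function lower_triangle_edges(coo::AbstractMatrix)
    edges = Tuple{Int,Int}[]
    n = size(coo, 1)
    for col in 1:n, row in col+1:n
        # The diagonal (self co-occurrence) is never visited.
        coo[row, col] > 0 && push!(edges, (row, col))
    end
    return edges
end

# Toy symmetric co-occurrence matrix (illustrative).
coo_toy = [3 2 1;
           2 3 0;
           1 0 2]
edge_list = lower_triangle_edges(coo_toy)
```

Each pair could then be passed to `add_edge!` as in the cell above.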

In [28]:
using NetworkLayout.Circular
a = adjacency_matrix(G) # generates a sparse adjacency matrix
network = NetworkLayout.Circular.layout(a) # generate 2D layout
# using GraphPlot
# gplot(G)

49-element Array{GeometryTypes.Point{2,Float64},1}:
 [1.0, 0.0]            
 [0.99179, 0.127877]   
 [0.967295, 0.253655]  
 [0.926917, 0.375267]  
 [0.871319, 0.490718]  
 [0.801414, 0.598111]  
 [0.718349, 0.695683]  
 [0.62349, 0.781831]   
 [0.518393, 0.855143]  
 [0.404783, 0.914413]  
 [0.284528, 0.958668]  
 [0.1596, 0.987182]    
 [0.0320516, 0.999486] 
 ⋮                     
 [0.0320516, -0.999486]
 [0.1596, -0.987182]   
 [0.284528, -0.958668] 
 [0.404783, -0.914413] 
 [0.518393, -0.855143] 
 [0.62349, -0.781831]  
 [0.718349, -0.695683] 
 [0.801414, -0.598111] 
 [0.871319, -0.490718] 
 [0.926917, -0.375267] 
 [0.967295, -0.253655] 
 [0.99179, -0.127877]  
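
The coordinates printed above appear consistent with placing the 49 nodes at equally spaced angles on the unit circle, starting at angle 0. The sketch below reproduces that placement for comparison. It is an assumption about the layout, not NetworkLayout's actual implementation, and `circular_layout` is a name introduced here.

```julia
# Place n nodes at angles 2π(k-1)/n on the unit circle, k = 1..n.
circular_layout(n) = [(cos(2π * (k - 1) / n), sin(2π * (k - 1) / n)) for k in 1:n]

pts = circular_layout(49)
```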

In [None]:
g2 = Graph(coo_sp)
a2 = adjacency_matrix(g2)

In [27]:
network2 = NetworkLayout.Circular.layout(coo_sp)

49-element Array{GeometryTypes.Point{2,Float64},1}:
 [1.0, 0.0]            
 [0.99179, 0.127877]   
 [0.967295, 0.253655]  
 [0.926917, 0.375267]  
 [0.871319, 0.490718]  
 [0.801414, 0.598111]  
 [0.718349, 0.695683]  
 [0.62349, 0.781831]   
 [0.518393, 0.855143]  
 [0.404783, 0.914413]  
 [0.284528, 0.958668]  
 [0.1596, 0.987182]    
 [0.0320516, 0.999486] 
 ⋮                     
 [0.0320516, -0.999486]
 [0.1596, -0.987182]   
 [0.284528, -0.958668] 
 [0.404783, -0.914413] 
 [0.518393, -0.855143] 
 [0.62349, -0.781831]  
 [0.718349, -0.695683] 
 [0.801414, -0.598111] 
 [0.871319, -0.490718] 
 [0.926917, -0.375267] 
 [0.967295, -0.253655] 
 [0.99179, -0.127877]  

In [None]:
using BCBIVizUtils
using Colors

In [None]:
labels = mesh_names[count_perm[2:topn]];
colors = distinguishable_colors(length(labels), RGB(1,0,0))


In [None]:
graph = BCBIVizUtils.coo2graph(coo_sp);


In [None]:
# BCBIVizUtils.graph2plotlyjs2D(graph, labels, colors)

In [None]:
# BCBIVizUtils.graph2plotlyjs3D(graph, labels, colors)

In [1]:
using D3Magic

In [2]:
d3"""
<style>

body {
  font: 10px sans-serif;
}

.group-tick line {
  stroke: #000;
}

.ribbons {
  fill-opacity: 0.67;
}

</style>
<svg width="960" height="960"></svg>
<script src="https://d3js.org/d3.v4.min.js"></script>
<script>

window.headwayVsRidership=[
  [11975,  5871, 8916, 2868],
  [ 1951, 10048, 2060, 6171],
  [ 8010, 16145, 8090, 8045],
  [ 1013,   990,  940, 6907]
];

var matrix = [
  [11975,  5871, 8916, 2868],
  [ 1951, 10048, 2060, 6171],
  [ 8010, 16145, 8090, 8045],
  [ 1013,   990,  940, 6907]
];

matrix = window.headwayVsRidership

var svg = d3.select("svg"),
    width = +svg.attr("width"),
    height = +svg.attr("height"),
    outerRadius = Math.min(width, height) * 0.5 - 40,
    innerRadius = outerRadius - 30;

var formatValue = d3.formatPrefix(",.0", 1e3);

var chord = d3.chord()
    .padAngle(0.05)
    .sortSubgroups(d3.descending);

var arc = d3.arc()
    .innerRadius(innerRadius)
    .outerRadius(outerRadius);

var ribbon = d3.ribbon()
    .radius(innerRadius);

var color = d3.scaleOrdinal()
    .domain(d3.range(4))
    .range(["#000000", "#FFDD89", "#957244", "#F26223"]);

var g = svg.append("g")
    .attr("transform", "translate(" + width / 2 + "," + height / 2 + ")")
    .datum(chord(matrix));

var group = g.append("g")
    .attr("class", "groups")
  .selectAll("g")
  .data(function(chords) { return chords.groups; })
  .enter().append("g");

group.append("path")
    .style("fill", function(d) { return color(d.index); })
    .style("stroke", function(d) { return d3.rgb(color(d.index)).darker(); })
    .attr("d", arc);

var groupTick = group.selectAll(".group-tick")
  .data(function(d) { return groupTicks(d, 1e3); })
  .enter().append("g")
    .attr("class", "group-tick")
    .attr("transform", function(d) { return "rotate(" + (d.angle * 180 / Math.PI - 90) + ") translate(" + outerRadius + ",0)"; });

groupTick.append("line")
    .attr("x2", 6);

groupTick
  .filter(function(d) { return d.value % 5e3 === 0; })
  .append("text")
    .attr("x", 8)
    .attr("dy", ".35em")
    .attr("transform", function(d) { return d.angle > Math.PI ? "rotate(180) translate(-16)" : null; })
    .style("text-anchor", function(d) { return d.angle > Math.PI ? "end" : null; })
    .text(function(d) { return formatValue(d.value); });

g.append("g")
    .attr("class", "ribbons")
  .selectAll("path")
  .data(function(chords) { return chords; })
  .enter().append("path")
    .attr("d", ribbon)
    .style("fill", function(d) { return color(d.target.index); })
    .style("stroke", function(d) { return d3.rgb(color(d.target.index)).darker(); });

// Returns an array of tick angles and values for a given group and step.
function groupTicks(d, step) {
  var k = (d.endAngle - d.startAngle) / d.value;
  return d3.range(0, d.value, step).map(function(value) {
    return {value: value, angle: value * k + d.startAngle};
  });
}

</script>
"""

In [None]:
d3"""
<g></g>

<script>
d3.select("g").text("Hello World");
</script>
"""