# Analysing bat data

Hello Javier and Christoph.

This is a Jupyter notebook. It's like `Shiny` for `R`: basically a frame that lets you mix markdown, code and plots.

There is a lot of code on the page below. You can skip it if you want; I commented what is done and the graphs produced. Comments are on a white background, code on grey.

Also, this is of course online, which is why I'm not referring to the species name.

## Index

### I. Pairwise relatedness across groups.

### II. Roosting association versus genetic proximity.

### III. Distribution of Parent-Offspring, Full-Sib and Half-Sib relations across groups.

### IV. Related pairs Within and Across groups


The first thing is to import the modules that will be necessary.

In [1]:
import numpy as np
import pandas as pd
import itertools as it


import plotly
# plotly.plotly moved to the separate chart_studio package in plotly 4; it is not used below.
import plotly.graph_objs as go
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
from plotly.graph_objs import *
import plotly.figure_factory as ff

from scipy.stats import norm
from scipy.stats import binom  # used for the binomial tests in sections III and IV
from scipy import stats

from sklearn.neighbors import KernelDensity
from sklearn.decomposition import PCA
from sklearn.preprocessing import scale

from scipy.stats import pearsonr
import scipy.stats as st

from ipywidgets import interact, interactive, fixed, interact_manual
import ipywidgets as widgets
init_notebook_mode(connected=True)

import collections

def recursively_default_dict():
        return collections.defaultdict(recursively_default_dict)



## I. Pairwise relatedness across groups

Are groups more related among themselves than when the whole colony is considered?

I use the output of MLrelate run on the Genepop file that is also in this directory.

In [2]:
#### read relatedness list output of MLrelate on Seville individuals
## Header was removed from file

def read_headless_REL(filename):
    
    Relations= {}
    
    Ind_asso= recursively_default_dict()
    
    Input= open(filename,'r')
    
    for line in Input:
        line= line.split()
        
        Relations[(line[0],line[1])]= float(line[2])
        Ind_asso[line[0]][line[1]]= float(line[2])
        Ind_asso[line[1]][line[0]]= float(line[2])
    
    Input.close()
    
    return Relations, Ind_asso



In [3]:
## Read the file, get the individuals and organise them by group.
# Specific to Seville, since the three groups are encoded in the third and fourth letters of the names.

rel_list= 'Output-RelatednessList.txt'

## read file
Rel_tuples,Ind_asso= read_headless_REL(rel_list)

## get individuals (keep the same order across analyses)
Seville_order= [x for x in Ind_asso.keys()]

## get dictionaries of group by individual and individual by group
# makes things easier below.
bat_gp= recursively_default_dict()

for bat in Seville_order:
    bat_gp[bat]= bat[2:4]

### names by group
gp_bats= {w:[x for x in bat_gp.keys() if bat_gp[x] == w] for w in list(set(bat_gp.values()))}

### Indexes of Inds by group along name vector (Seville_order).
gp_index= {gp: [x for x in range(len(Seville_order)) if bat_gp[Seville_order[x]] == gp] for gp in gp_bats.keys()}


print('groups present: {}'.format([x for x in gp_bats.keys()]))

groups present: ['SR', 'SE', 'SH']


In [4]:
print([len(x) for x in gp_bats.values()])
print(sum([len(x) for x in gp_bats.values()]))

[28, 28, 28]
84


In [64]:
## We'll first look at the distribution of genetic distances across the colony and within roosting groups.
## First get genetic distance matrix:

fig_box_rel= []

###########################
### for Seville
Sev_sim_matrix= [[Ind_asso[y][x] for x in Seville_order] for y in Seville_order]
Sev_sim_matrix= np.array(Sev_sim_matrix)
iuSev= np.triu_indices(Sev_sim_matrix.shape[0],1)
Sev_sim_vector= Sev_sim_matrix[iuSev]

fig_box_rel.append(go.Box(
    y= Sev_sim_vector,
    name= 'Seville',
    marker= dict(
    color= 'blue'
    )
))
###########################
### SE 
SE_sim_matrix= [[Ind_asso[y][x] for x in gp_bats['SE']] for y in gp_bats['SE']]
SE_sim_matrix= np.array(SE_sim_matrix)
iuSE= np.triu_indices(SE_sim_matrix.shape[0],1)
SE_sim_vector= SE_sim_matrix[iuSE]

fig_box_rel.append(go.Box(
    y= SE_sim_vector,
    name= 'SE',
    marker= dict(
    color= 'blue'
    )
))
###########################
### SR
SR_sim_matrix= [[Ind_asso[y][x] for x in gp_bats['SR']] for y in gp_bats['SR']]
SR_sim_matrix= np.array(SR_sim_matrix)
iuSR= np.triu_indices(SR_sim_matrix.shape[0],1)
SR_sim_vector= SR_sim_matrix[iuSR]

fig_box_rel.append(go.Box(
    y= SR_sim_vector,
    name= 'SR',
    marker= dict(
    color= 'blue'
    )
))
###########################
### SH
SH_sim_matrix= [[Ind_asso[y][x] for x in gp_bats['SH']] for y in gp_bats['SH']]
SH_sim_matrix= np.array(SH_sim_matrix)
iuSH= np.triu_indices(SH_sim_matrix.shape[0],1)
SH_sim_vector= SH_sim_matrix[iuSH]

fig_box_rel.append(go.Box(
    y= SH_sim_vector,
    name= 'SH',
    marker= dict(
    color= 'blue'
    )
))
#####

layout= go.Layout(
    yaxis= dict(
        title= 'relatedness',
        range= [-.5,1]
    ),
    title= 'Pairwise relatedness Within and Across Groups',
    
    xaxis= dict(
    title= 'groups'
    )
)

fig = go.Figure(data=fig_box_rel,layout= layout)
iplot(fig)

**Fig. 1** Distribution of pairwise relatedness values across subsets of the data. Only pairwise-relationship values among individuals within each group are used. 

This had already been done. Here, the question is whether any group is significantly more related than another, and the answer is no.
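One way to put a number on "no group is more related than another" is a label-permutation test, since pairwise values share individuals and are not independent. The following is only a sketch under assumptions: `within_group_means` and `permutation_test_groups` are hypothetical helper names, the statistic (largest minus smallest within-group mean) is my choice, and the data are random numbers, not the bat data.

```python
import numpy as np

def within_group_means(sim, labels):
    """Mean pairwise value among members of each group (upper triangle only)."""
    means = {}
    for gp in np.unique(labels):
        idx = np.where(labels == gp)[0]
        sub = sim[np.ix_(idx, idx)]
        iu = np.triu_indices(len(idx), 1)
        means[str(gp)] = sub[iu].mean()
    return means

def permutation_test_groups(sim, labels, n_perm=999, seed=0):
    """Shuffle group labels among individuals.
    Statistic: largest minus smallest within-group mean."""
    rng = np.random.default_rng(seed)

    def stat(lab):
        m = within_group_means(sim, lab)
        return max(m.values()) - min(m.values())

    observed = stat(labels)
    null = np.array([stat(rng.permutation(labels)) for _ in range(n_perm)])
    # add-one correction so the p-value is never exactly zero
    p_value = (1 + np.sum(null >= observed)) / (n_perm + 1)
    return observed, p_value

# Toy data (assumption): 30 individuals in three equal groups, random relatedness
rng = np.random.default_rng(1)
raw = rng.uniform(0, 0.3, size=(30, 30))
toy_sim = (raw + raw.T) / 2
toy_labels = np.array(['SE'] * 10 + ['SR'] * 10 + ['SH'] * 10)

obs, p = permutation_test_groups(toy_sim, toy_labels, n_perm=199)
print(obs, p)
```

With the real data, `toy_sim` would be `Sev_sim_matrix` and `toy_labels` the group of each bat in `Seville_order`.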

Let's run a PCA on the similarity matrix to see what we get.

In [55]:
#### before we move on to the distance matrices we can have a little fun with the similarity matrix:

# begin by scaling the matrix
Sev_sim_matrix_scaled= scale(Sev_sim_matrix)
## PCA on the scaled similarity matrix
n_comp = 3

pca = PCA(n_components=n_comp, whiten=False,svd_solver='randomized').fit(Sev_sim_matrix_scaled)
features = pca.transform(Sev_sim_matrix_scaled)

print("; ".join(['PC{0}: {1}'.format(x+1,round(pca.explained_variance_ratio_[x],3)) for x in range(n_comp)]))
print('features shape: {}'.format(features.shape))

## Plot PCA
fig_data= [go.Scatter(
        x = features[gp_index[i],0],
        y = features[gp_index[i],1],
        #z = features[gp_index[i],2],
        type='scatter',
        mode= "markers",
        name= i,
        text= [Seville_order[x] for x in gp_index[i]],
        marker= {
        'line': {'width': 0},
        'size': 8,
        'symbol': 'circle',
      "opacity": .8
      }
    ) for i in gp_index.keys()]


layout = go.Layout(
    title= 'PCA on genetic similarity matrix drawn from MLrelate',
    xaxis= dict(
        title= "PC 1: {}".format(round(pca.explained_variance_ratio_[0],2))
    ),
    yaxis= dict(
        title= "PC 2: {}".format(round(pca.explained_variance_ratio_[1],2))
    )
)

fig = go.Figure(data=fig_data, layout=layout)
iplot(fig)

PC1: 0.084; PC2: 0.072; PC3: 0.057
features shape: (84, 3)


**Fig. 2** Principal component analysis of the genetic similarity matrix among Seville individuals.

## II. Roost use and genetic similarity.

The matrix provided by Ana is in this directory. 

My idea was to check whether there was a relationship between roost use and genetic relatedness.

To compare both we will first read Ana's matrix and extract the vector of roost use similarity.

We then compare it to the vector of genetic similarity.

My goal was then to permute the genetic vector to obtain a p-value for the correlation between those two vectors.
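A sketch of what such a permutation test could look like. Note the assumption: instead of shuffling the entries of the vector directly, this version shuffles individuals (rows and columns together, as in a Mantel test), because pairs that share a bat are not independent. `mantel_permutation` is a hypothetical name and the matrices are simulated, not Ana's.

```python
import numpy as np

def mantel_permutation(mat_a, mat_b, n_perm=999, seed=0):
    """Pearson r between the upper triangles of two symmetric matrices,
    with a one-sided p-value from permuting the individuals of mat_b."""
    rng = np.random.default_rng(seed)
    n = mat_a.shape[0]
    iu = np.triu_indices(n, 1)
    vec_a = mat_a[iu]
    r_obs = np.corrcoef(vec_a, mat_b[iu])[0, 1]

    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(n)
        permuted = mat_b[np.ix_(perm, perm)]  # same shuffle on rows and columns
        if np.corrcoef(vec_a, permuted[iu])[0, 1] >= r_obs:
            count += 1

    p_value = (count + 1) / (n_perm + 1)
    return r_obs, p_value

# Simulated example (assumption): 15 individuals, second matrix = first plus small noise
rng = np.random.default_rng(2)
raw = rng.uniform(0, 1, size=(15, 15))
toy_roost = (raw + raw.T) / 2
noise = rng.normal(0, 0.05, size=(15, 15))
toy_gen = toy_roost + (noise + noise.T) / 2

r_obs, p_value = mantel_permutation(toy_roost, toy_gen, n_perm=199)
print(r_obs, p_value)
```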


In [8]:
max(Sev_sim_vector)

0.67649999999999999

In [9]:
### read Ana's dissimilarity matrix, correlate it to genetic similarity matrix:

def read_ANA_matrix(filename):
    
    Rel_matrix= recursively_default_dict()
    
    Names= []
    Input= open(filename,"r")
    
    for line in Input:
        line= line.split()
        if line[0]== 'ID':
            Names= line[1:]
        else:
            for trace in range(1,len(line)):
                Rel_matrix[line[0]][Names[trace - 1]]= float(line[trace])
                Rel_matrix[Names[trace - 1]][line[0]]= float(line[trace])
    Input.close()
    
    return Rel_matrix

filename= 'Roost_sim_ANA.txt'
Anatree= read_ANA_matrix(filename)
Ana_names= Anatree.keys()

print('Ana included: {}'.format([x for x in Ana_names]))

Ana included: ['ESSE22', 'ESSE23', 'ESSH24', 'ESSH25', 'ESSH26', 'ESSH27', 'ESSH28', 'ESSE25', 'ESSE26', 'ESSE27', 'ESSR24', 'ESSR25', 'ESSR26', 'ESSR27', 'ESSR28']


In [10]:
### get ana's data onto matrix format, extract sorted vector to compare with others.

Ana_mat= np.array([[Anatree[y][x] for x in Ana_names] for y in Ana_names])
iuANA= np.triu_indices(Ana_mat.shape[0],1)
roost_diss_vector= Ana_mat[iuANA]

## get the same vector from genetic distances.
Ana_gen= np.array([[Ind_asso[y][x] for x in Ana_names] for y in Ana_names])
gen_dist_vector= Ana_gen[iuANA]

### now we scale the roost vector and calculate Pearson's r
### (r is unaffected by scaling, so the genetic vector is left as is).
roost_diss_vector= scale(roost_diss_vector)
#gen_dist_vector= scale(gen_dist_vector)

pear= pearsonr(roost_diss_vector,gen_dist_vector)[0]
## Now we plot one against the other before proceeding to a test of its significance.

fig_gen_to_dist= [go.Scatter(
    x= roost_diss_vector,
    y= gen_dist_vector,
    type='scatter',
    mode= "markers",
    marker= {
    'line': {'width': 0},
    'size': 8,
    'symbol': 'circle',
  "opacity": .8
  }
)]

layout= go.Layout(
    title= 'Genetic versus roost use similarity, Pearson r: {}'.format(round(pear,3)),
    xaxis= dict(
        title= 'roost use similarity, normalized.'
    ),
    yaxis= dict(
        title= 'genetic similarity'
    )
)


fig = go.Figure(data=fig_gen_to_dist, layout=layout)
iplot(fig)

**Fig. 3** Roost-use similarity versus genetic similarity. Only the roost-use vector was scaled.

This does not look like it will take us anywhere.

I don't think Pearson's r is a good idea here: for one, all of Ana's bats appear highly related, and secondly, the roost-sharing variable is nearly a factor.

However, it is apparent that bats that often roost together appear more related than average.

So maybe the question here shouldn't be:
- "Is there a relation between roost use and genetic relatedness?"
but instead:
- "Are bats that frequently roost together more closely related than expected given our sample?"

We'll first see about transforming the roosting variable into a factor.
We'll start by plotting its density.


In [11]:
### roost distance
X_plot = np.linspace(min(roost_diss_vector) - 2, max(roost_diss_vector) + 2, 1000)

kde = KernelDensity(kernel='gaussian', bandwidth=0.2).fit(np.array(roost_diss_vector).reshape(-1,1))

log_dens = kde.score_samples(X_plot.reshape(-1,1))

fig_roost_dens= [go.Scatter(x=X_plot, y=np.exp(log_dens), 
                            mode='lines', fill='tozeroy', name= 'roost similarity',
                            line=dict(color='blue', width=2))]

### genetic distance

X_plot = np.linspace(min(gen_dist_vector) - 2, max(gen_dist_vector) + 2, 1000)

kde = KernelDensity(kernel='gaussian', bandwidth=0.2).fit(np.array(gen_dist_vector).reshape(-1,1))

log_dens = kde.score_samples(X_plot.reshape(-1,1))

fig_roost_dens.append(go.Scatter(x=X_plot, y=np.exp(log_dens), 
                            mode='lines', fill='tozeroy', name= 'gen similarity',
                            line=dict(color='red', width=2)))

layout= go.Layout(
    title= 'Genetic and roost use pairwise distances, normalized'
)

fig = go.Figure(data=fig_roost_dens, layout= layout)
iplot(fig)

**Fig. 4** Density plot of pairwise roost-use similarity and genetic similarity among Ana's bats.


It appears we could use a **threshold of 1** on the normalized roost scores to differentiate who's sleeping together from who isn't.

We'll create a factor and look at this using a boxplot now.

In [12]:
factor_threshold= 1
Factor_roost= [int(roost_diss_vector[x] >= factor_threshold) for x in range(len(gen_dist_vector))]

y1= [gen_dist_vector[x] for x in range(len(gen_dist_vector)) if Factor_roost[x] == 0]
y2= [gen_dist_vector[x] for x in range(len(gen_dist_vector)) if Factor_roost[x] == 1]

test_stat= stats.ttest_ind(y1, y2, equal_var = False)

fig_box= [go.Box(
    y= [gen_dist_vector[x] for x in range(len(gen_dist_vector)) if Factor_roost[x] == i],
    name= ['not associated','associated'][i]
) for i in list(set(Factor_roost))]

layout= go.Layout(
    title= 'Genetic similarity by roost-sharing behaviour. P-value: {}'.format(round(test_stat[1],3))
)

fig = go.Figure(data=fig_box,layout= layout)
iplot(fig)

**Fig. 5** Comparison of genetic similarity between roost-sharing pairs and non-sharing pairs.

## III. Relations across groups

Here we will look at the distributions of pairwise relations revealed by MLrelate across and within groups.

Our question is whether related pairs of individuals are more likely to share the same group.

Our expectation will be based on the total number of pairs of individuals falling within versus across groups. We can correct this estimate by the relative size of the groups.

In [13]:
#### read from the relations file output by MLrelate.
## the file was modified so that the only non-data line left is the header.

def read_relations(filename):
    
    Relations= recursively_default_dict()
    Input= open(filename,'r')
    d= 0
    
    for line in Input:
        line= line.split()
        if d== 0:
            d += 1
        else:
            Relations[line[0]][line[1]]= line[2]
            Relations[line[1]][line[0]]= line[2]
    
    Input.close()
    
    return Relations


In [14]:
Par_file= 'Output-RelationList.txt'
Par_dict= read_relations(Par_file)

Par_matrix= np.array([[['I',Par_dict[x][y]][int(x != y)] for y in Seville_order] for x in Seville_order])
Par_vector= Par_matrix[iuSev]

Group_share= np.array([[int(bat_gp[x] == bat_gp[y] and x != y) for y in Seville_order] for x in Seville_order])
Group_share_vector= Group_share[iuSev]

print('Possible relationships: {}'.format(list(set(Par_vector))))

Possible relationships: ['HS', 'U', 'FS', 'PO']


In [17]:
### We're interested in PO and FS relationships in particular.
Inter= ['PO','FS','HS','U']

expected= [0.644,0.356]

Annote= []
conf_int= recursively_default_dict()

for z in Inter:
    Prop= len([x for x in range(len(Par_vector)) if Par_vector[x] == z and Group_share_vector[x] ==1])
    Size= float(len([x for x in Par_vector if x == z]))
    
    P_more= 1 - binom.cdf(Prop,Size,expected[1])
    conf_int[z]= P_more
    Annote.append(dict(
        xref= 'x',  # plotly expects an axis reference here, not the class label
        x= Inter.index(z),
        y= 85,
        text= 'p-value= {}'.format(round(P_more,5)),
        showarrow= False
    ))


    
fig_within= [go.Bar(
    x= ['Parent-Offspring','Full-Sibs','Half-Sibs','Unrelated'],
    y= [len([x for x in range(len(Par_vector)) if Par_vector[x] == z and Group_share_vector[x] ==i]) * 100 / float(len([x for x in Par_vector if x == z])) for z in Inter],
    name= ['Across','Within'][i]
) for i in [0,1]]

layout = go.Layout(
    title= 'Relations identified within versus across groups',
    barmode='group'
)

layout["annotations"] = Annote

fig = go.Figure(data=fig_within, layout=layout) 
iplot(fig)


**Fig. 6** Proportion of relationship classes estimated by MLrelate within and across groups.

Here we can see that there are more Mother-Daughter pairs within groups than across them. 

This is a good result! Consider the probability of two individuals choosing the same group at random.

If we take the probability of choosing each group to be its relative size:

`P(SR) = 114 / 256 = 0.445`

`P(SE) = 61 / 256 = 0.238`

`P(SH) = 81 / 256 = 0.316`

then the probability P(O) of a pair of individuals falling across groups is:

`P(O) = P(SR)(P(SE)+P(SH)) + P(SE)(P(SR)+P(SH)) + P(SH)(P(SR)+P(SE))` = **0.644**

and P(I), the probability of two individuals choosing the same group, is:

`1 - P(O)` = **0.356**

Note that this is very close to the proportions of possible pairs of individuals within and across groups in the sample (.325 and .675; the expected proportions are plotted below).
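The arithmetic above can be reproduced in a few lines. This is a sketch using only the group sizes quoted in the text and the 28 sampled bats per group; the variable names are mine. Computed from unrounded fractions the expectation comes out at about 0.645 / 0.355, which is within rounding of the 0.644 / 0.356 used in the code.

```python
from math import comb

# Group sizes quoted in the text above
group_sizes = {'SR': 114, 'SE': 61, 'SH': 81}
total = sum(group_sizes.values())

# Probability that two individuals choosing independently end up in the same group
p_within = sum((n / total) ** 2 for n in group_sizes.values())
p_across = 1 - p_within

# Proportion of sampled pairs that are within-group (28 bats sampled per group)
sampled = {'SR': 28, 'SE': 28, 'SH': 28}
pairs_within = sum(comb(n, 2) for n in sampled.values())
pairs_total = comb(sum(sampled.values()), 2)
obs_within = pairs_within / pairs_total

print(round(p_across, 3), round(p_within, 3))
print(round(1 - obs_within, 3), round(obs_within, 3))
```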

We can see there is a relationship between relationship class and choice of group: group sharing becomes more random as individuals become genetically more distant, and reaches the proportion expected under H0 at around the Half-Sib level.
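A note on the p-values annotated on the bar plot: `1 - binom.cdf(k, n, p)` is the probability of observing strictly more than `k` within-group pairs. If the intended question is "at least as many as observed", the tail should include `k` itself. A small sketch with made-up counts (the 8 out of 10 is hypothetical, and `upper_tail_pvalue` is my name for the helper):

```python
from scipy.stats import binom

def upper_tail_pvalue(k_within, n_pairs, p_expected):
    """P(X >= k_within) for X ~ Binomial(n_pairs, p_expected)."""
    # sf(k - 1) = P(X > k - 1) = P(X >= k)
    return binom.sf(k_within - 1, n_pairs, p_expected)

# Hypothetical counts: 8 within-group pairs out of 10, expected proportion 0.356
p_strict = 1 - binom.cdf(8, 10, 0.356)          # P(X > 8), excludes the observed count
p_inclusive = upper_tail_pvalue(8, 10, 0.356)   # P(X >= 8), includes it

print(p_strict, p_inclusive)
```

The difference is exactly the probability of the observed count, so it matters most for the small classes (PO, FS) and is negligible for the large ones.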


In [19]:
#expected = [len([x for x in Group_share_vector if x == z]) / float(len(Group_share_vector)) for z in list(set(Group_share_vector))]


fig_expected= [go.Bar(
    x= ['Across', 'Within'],
    y= [.644,.356]
)]

layout = go.Layout(
    title= 'Expected proportions'
)

fig = go.Figure(data=fig_expected,layout= layout)
iplot(fig)


**Fig. 7** Expected proportions of pairs of individuals within and across groups. The proportions observed across the entire Seville data set (.325 and .675) are close to these.

### IV. Related pairs Within and Across groups


In this section we will see about getting something more linear out of pairs of related individuals across groups.

We'll refer back to the vector of pairwise relatedness values (`Sev_sim_vector`) and do the following:

For increasing values of *r* we will plot the proportion of pairs with relatedness equal to or greater than *r* that fall within the same group.

We will also plot the upper 95% confidence bound for that estimate along the value of *r*. This is important because that bound will move away from the expectation as the number of pairs decreases.

The function used will be `scipy.stats.binom.interval(confidence, n, p)`.
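To see how that bound behaves, here is a minimal sketch with made-up numbers of pairs (10, 100, 1000); `upper_bound_proportion` is a hypothetical helper. The upper limit of the central 90% interval is used as a one-sided 95% bound, as in the cell below.

```python
from scipy.stats import binom

p_expected = 0.356   # expected within-group proportion from section III
alpha = 0.05         # one-sided error rate

def upper_bound_proportion(n_pairs, p=p_expected, alpha=alpha):
    """Upper limit of the central (1 - 2*alpha) binomial interval, as a proportion."""
    low, high = binom.interval(1 - 2 * alpha, n_pairs, p)
    return high / n_pairs

# Made-up numbers of pairs, only to show the trend
bounds = {n: upper_bound_proportion(n) for n in (10, 100, 1000)}
for n, b in bounds.items():
    print(n, round(b, 3))
```

The fewer pairs there are above a given *r*, the further the bound sits above the expected proportion.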


In [68]:
### the matrix is 'Sev_sim_vector'
import math
from scipy.stats import fisher_exact
from scipy.stats import binom

conf_interval= .05
Fisher_threshold= .05
step= .01

ub= []
Props= []
Rs= []
conf= []
Enes= []

for r in np.arange(min(Sev_sim_vector),.5,step):
    Classes = [int(x >= r) for x in Sev_sim_vector]
    within= len([x for x in range(len(Classes)) if Classes[x] == 1 and Group_share_vector[x] == 1])
    Prop_within= within / sum(Classes)
    
    Enes.append(sum(Classes))
    
    conf_int= binom.interval(1- 2*conf_interval,sum(Classes),expected[1])[1] / sum(Classes)
    
    ub.append(conf_int)
    
    Props.append(Prop_within)
    Rs.append(r)

## plot
fig_linear= [go.Scatter(
    x= Rs,
    y= Props,
    type='scatter',
    mode= "markers",
    name= 'Prop. within',
    marker= {
    'line': {'width': 0},
    'size': 8,
    'symbol': 'circle',
  "opacity": .8
  }
)]

fig_linear.append(go.Scatter(
    x= Rs,
    y= ub,
    type='scatter',
    mode= "markers",
    name= 'UP Binom; {}'.format(1-conf_interval),
    marker= {
    'size': 8,
    'symbol': 'circle',
  "opacity": .8
  }
))

layout= go.Layout(
    title= 'Proportion of pairs with r >= X that share group. step= {}'.format(step),
    xaxis= dict(
        title= 'r'
    ),
    yaxis= dict(
        title= 'Proportion of same-group pairs'
    ),
    shapes= [{
        'type': 'line',
        'x0': min(Sev_sim_vector),
        'y0': expected[1],
        'x1': max(Sev_sim_vector),
        'y1': expected[1],
        'line': {
            'color': 'red',
            'width': 4,
            'dash': 'dashdot',
        },
    }],
)

fig = go.Figure(data=fig_linear,layout= layout)
iplot(fig)

**Fig. 8** Proportion of pairs of individuals above a relatedness threshold that were captured within the same group, with the upper confidence bound computed from the expected within-group proportion (`expected[1]`). Blue: proportion of pairs within groups; Orange: upper 95% confidence bound; Red: expected proportion.


There, this is a result I believe we can present.

However, the plot above is subtly misleading. It answers the question *'do pairs above a certain threshold share a group more often?'*. The question *'How does group sharing change with relatedness?'* isn't answered.


The previous analysis includes every pair above a certain threshold at each step, meaning that at a low value of *r* the estimate is always inflated by the more related pairs.

We should instead move along a *window of relatedness* so we can focus only on pairs of a given range. 
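The idea can be sketched with a small vectorised helper. `windowed_proportion` is a hypothetical name and the six pairs are toy values; with the real data the inputs would be `Sev_sim_vector` and `Group_share_vector`.

```python
import numpy as np

def windowed_proportion(r_values, same_group, centers, half_width):
    """Proportion of same-group pairs among pairs with |r - center| <= half_width.
    Returns (proportions, counts); the proportion is NaN for empty windows."""
    r_values = np.asarray(r_values, dtype=float)
    same_group = np.asarray(same_group, dtype=float)
    centers = np.asarray(centers, dtype=float)

    # rows: windows, columns: pairs
    in_window = np.abs(r_values[None, :] - centers[:, None]) <= half_width
    counts = in_window.sum(axis=1)
    hits = (in_window * same_group[None, :]).sum(axis=1)

    props = np.full(len(centers), np.nan)
    nonempty = counts > 0
    props[nonempty] = hits[nonempty] / counts[nonempty]
    return props, counts

# Toy pairs: relatedness and whether the pair shares a group
toy_r = np.array([0.0, 0.05, 0.1, 0.3, 0.35, 0.5])
toy_same = np.array([0, 0, 1, 1, 1, 1])

props, counts = windowed_proportion(toy_r, toy_same, centers=[0.05, 0.35, 0.9], half_width=0.1)
print(props, counts)
```

Returning the counts alongside the proportions makes it easy to see which windows are too thin to trust, which is the concern of the next section.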

### Estimating window size

Before we can perform our moving estimate we must determine an appropriate window size.

The main consideration to bear in mind is the accuracy of our measures of deviation from the binomial model across windows of a given size. For a range of window sizes we will calculate the number of relations in every window and extract the median and minimum numbers and their standard deviation.


In [49]:
### estimating window size.

shares= recursively_default_dict()
step= .01
concentrate= recursively_default_dict()

for window in np.arange(0.05,.2,.01):
    shares[window]= []
    concentrate[window]= []
    
    for r in np.arange(window,.5, step):
        window_index= [x for x in range(len(Sev_sim_vector)) if Sev_sim_vector[x] >= r - window and Sev_sim_vector[x] <= r+window]
        local_share= [Group_share_vector[x] for x in window_index]
        
        shares[window].append(len(local_share))
        concentrate[window].append(np.std([Sev_sim_vector[x] for x in window_index]))

fig_box= [go.Scatter(
    x= [x for x in sorted(shares.keys())],
    y= [np.median(shares[x]) for x in sorted(shares.keys())],
    name= 'median'
)]

fig_box.append(go.Scatter(
    x= [x for x in sorted(shares.keys())],
    y= [np.std(shares[x]) for x in sorted(shares.keys())],
    name= 'sd'
))

fig_box.append(go.Scatter(
    x= [x for x in sorted(shares.keys())],
    y= [np.min(shares[x]) for x in sorted(shares.keys())],
    name= 'min'
))

layout= go.Layout(
    title= 'Estimating window size',
    xaxis= dict(
    title= 'Window size'
    ),
    yaxis= dict(
    title= 'Relations per window: min, median and sd'
    )
)

fig = go.Figure(data=fig_box,layout= layout)
iplot(fig)

**Fig. 9** Minimum, median and standard deviation of sample size per window across window sizes.

In [50]:
fig_box= [go.Scatter(
    x= [x for x in sorted(shares.keys())],
    y= [np.median(concentrate[x]) for x in sorted(shares.keys())],
    name= 'median r'
)]

layout= go.Layout(
    title= 'Estimating window size',
    xaxis= dict(
    title= 'Window size'
    ),
    yaxis= dict(
    title= 'median r'
    )
)

fig = go.Figure(data=fig_box,layout= layout)
iplot(fig)



In [69]:
### the matrix is 'Sev_sim_vector'
import math
from scipy.stats import fisher_exact
from scipy.stats import binom

conf_interval= .05
Fisher_threshold= .05
window= .15
step= .01

ub= []
Props= []
Rs= []
conf= []
Enes= []



for r in np.arange(window / 2,.5, step):
    
    window_index= [x for x in range(len(Sev_sim_vector)) if Sev_sim_vector[x] >= r - window and Sev_sim_vector[x] <= r+window]
    local_share= [Group_share_vector[x] for x in window_index]
    if len(local_share) == 0:
        continue
    
    box_gen= [Sev_sim_vector[x] for x in window_index]
    
    within= len([x for x in range(len(local_share)) if local_share[x] == 1])
    Prop_within= within / len(local_share)
    
    #Enes.append(sum(Classes))
    
    conf_int= binom.interval(1 - conf_interval*2,len(local_share),expected[1])[1] / len(local_share)
    
    ub.append(conf_int)
    
    Props.append(Prop_within)
    Rs.append(r)

## plot
fig_linear= [go.Scatter(
    x= Rs,
    y= Props,
    type='scatter',
    mode= "markers",
    name= 'Prop. within',
    marker= {
    'line': {'width': 0},
    'size': 8,
    'symbol': 'circle',
  "opacity": .8
  }
)]

fig_linear.append(go.Scatter(
    x= Rs,
    y= ub,
    type='scatter',
    mode= "markers",
    name= 'UP Binom; {}'.format(1 - conf_interval),
    marker= {
    'size': 8,
    'symbol': 'circle',
  "opacity": .8
  }
))

layout= go.Layout(
    title= 'Proportion of pairs within the same group across relatedness levels',
    xaxis= dict(
        title= 'relatedness, moving window: step= {}, window= {}'.format(step,window)
    ),
    yaxis= dict(
        title= 'Proportion of same-group pairs'
    ),
    shapes= [{
        'type': 'line',
        'x0': min(Sev_sim_vector),
        'y0': expected[1],
        'x1': max(Sev_sim_vector),
        'y1': expected[1],
        'line': {
            'color': 'red',
            'width': 4,
            'dash': 'dashdot',
        },
    }],
)

fig = go.Figure(data=fig_linear,layout= layout)
iplot(fig)

**Fig. 10** Moving estimate. Proportion of pairs within each relatedness window that share a group, with the upper binomial confidence bound. Dashed red line indicates the expected proportion.