Degree Correlation

Wang Cheng-Jun edited this page Dec 19, 2016 · 1 revision

计算传播学是计算社会科学的重要分支。它主要关注人类传播行为的可计算性基础,以传播网络分析、传播文本挖掘、数据科学等为主要分析工具,(以非介入地方式)大规模地收集并分析人类传播行为数据,挖掘人类传播行为背后的模式和法则,分析模式背后的生成机制与基本原理,可以被广泛地应用于数据新闻和计算广告等场景,注重编程训练、数学建模、可计算思维。

Clone this wiki locally

Table of Contents

Scaling of degree correlations and its influence on diffusion in scale-free networks

test

Lazaros K. Gallos, Chaoming Song, Hernán A Makse, 2008. Scaling of Degree Correlations and Its Influence on Diffusion in Scale-Free Networks. Physical Review Letters 100(24):248701 Impact Factor: 7.51 DOI: 10.3731/topologica.2.020 [1]

Hernán A Makse http://www-levich.engr.ccny.cuny.edu/webpage/hmakse/software-and-data/

Internet data: The DIMES project Internet As data: http://data.caida.org/datasets/as-relationships/serial-1/

Chaoming, Song. 2011 Self-similarity and Scaling Theory of Complex Networks. Proquest. Google books link

PPT: :File:Degree correlations. Gallos_CT.pdf

Joint degree distribution

<math>P(k_1, k_2) </math> denotes the probability of two nodes of degree <math>k_1</math> and <math>k_2</math> are connected to each other. If no degree correlation: <math>P(k_1, k_2) \sim k_1 P(k_1) k_2 P(k_2) </math>

For scale free networks:<math>P(k_1, k_2) \sim k_1 P(k_1) k_2 P(k_2) \sim k_1^{1-\gamma} k_2^{1-\gamma} </math>

Density conservation law

<math>\int P(k_1, k_2) d k_2 = k_1 P(k_1)</math>

等式两边表示的都是度为<math>k_1</math>的节点一共连了多少条边的概率!!!

密度守恒定律,指的是边密度从度分布算和联合度分布算结果总是一样的。

对无标度网络而言,满足 <math>P(k) \sim k^{-\gamma}</math>,所以边密度守恒的公式可以表达为:

<math>\int P(k_1, k_2) d k_2 = k_1 P(k_1) \sim k_1^{-(\gamma -1) }</math>

硬猜<math>P(k_1, k_2)</math>

缩略图

The statistical similarity of the corresponding plots suggests the invariance of <math>P(k_1, k_2)</math>. Accordingly, this suggests that the <math>k_1</math> and <math>k_2</math> dependence can be separated, and the behavior of the tail of the joint degree distribution is

<math>P(k_1, k_2) \sim k_1^{-(\gamma -1 )} k_2^{-\epsilon}</math> (1)

for completely random networks:

<math>P(k_1, k_2) \sim k_1 P(k_1) k_2 P(k_2) \sim k_1^{1-\gamma} k_2^{1-\gamma}</math> (2)

Using Eq. (1), we can see that:

  1. For low-degree nodes (<math>k_1 > k_2</math>), integrating over <math>k_2</math>, we retrieve the <math>k^{1-\gamma}</math> dependence.
  2. For the case of hubs, where integration is over <math>k_1</math>, the dependence on the degree is <math>k^{-\epsilon}</math>
600px

Maslov, S. & Sneppen, K. 2002. Specificity and stability in topology of protein networks. Science[2]

The protein interaction network is a representative of the broad class of scale-free networks (4–6 ) in which the number of nodes with a given number of neighbors (connectivity) <math>K</math>scales as a power law 􏸙 <math>K^{-\gamma}</math>􏸖. In our case the histogram of connectivities can be fitted by a power law with 􏸖 􏸡<math>\gamma = 2.5 \pm 0.3</math> for <math>K</math> ranging from 2 to about 100 (7, 8).

  1. <math>\gamma = 2.5 \pm 0.3</math>
  2. <math>\gamma -1 = 1.5 \pm 0.3</math>

<math>E_b(k)</math>

测量度相关的方式很多,Newman提出的度相关系数、<math>k_{nn}</math>等,但是不幸的是它们随着重整化并非不变的(invariant)。<math>E_b(k)</math>可以做到伴随着重整化不变(但要求是无标度网络才行)。

<math>E_b(k) \equiv \frac{\int_{bk}^{\infty} P(k | k') dk'}{ \int_{bk}^{\infty} P( k') dk' } =\frac{ \int_{bk}^{\infty} \frac{P(k, k') }{k' P(k') } dk'} {\int_{bk}^{\infty} P( k') dk'}</math>

400px

  1. 分子是两节点度为k和大于bk的链接比例;
  2. 分母是度大于bk的节点的比例
我们知道对无标度网络而言,满足 <math>P(k) \sim k^{-\gamma}</math>,所以边密度守恒的公式可以表达为:<math>\int P(k_1, k_2) d k_2 = k_1 P(k_1) \sim k_1^{-(\gamma -1) }</math>

以下哪一个推导正确:

推导1:<math>P(k | k') = \frac{P(k, k') }{\int P(k, k') dk} = \frac{k^{-(\gamma-1)} k'^{-\epsilon}}{k'^{1-\gamma}} = k^{-(\gamma -1)} k'^{-(1+\epsilon - \gamma)}</math>

已知度为k的节点存在的概率为p(k),那么连边的一头连度为k的概率为kP(k)。就是P(k,k')是连边存在的概率,p(k)为度值为k的节点存在的概率,这个没法放到分母下面,放到分布下面的是度值为k的连边存在的概率,这个值还有乘以k才对。

推导2:<math>P(k | k') = \frac{P(k, k') }{P(k')} = \frac{k^{-(\gamma-1)} k'^{-\epsilon}}{k'^{-\gamma}} = k^{-(\gamma -1)} k'^{-\epsilon + \gamma}</math>

缩略图

计算<math>E_b(k)</math>: 算法与代码














  • 将链接表达为<math>(k, k')</math>的形式
    • 使得其中的 <math>k' > k</math>
    • 选择 <math>b = 3</math>
  • 计算:
    • 对于每一个<math>(k, k')</math>, 计算<math>k' > 3k, k = k</math>的链接数量<math>N_{kk'}</math>
    • 计算<math>P(k, k') = \frac{N_{kk’}}{Number\;of\;edges}</math>
    • 对于每一个度k', 当<math>k' > 3k</math>时, 计算其比例<math>P(k')</math>和<math>k' P(k')</math>
    • 计算<math>P_1 = \sum \frac{P(k, k')}{k' P(k')}</math>和<math>P_2 = \sum P(k')</math>。
    • 计算 <math>E_b (k) = P_1/P_2 = \sum \frac{P(k, k')}{k' P(k')} / \sum P(k')</math>
%matplotlib inline
import networkx as nx
import matplotlib.cm as cm
import matplotlib.pyplot as plt
from collections import defaultdict

def ebkk(g, b):
    edge_dict = defaultdict(lambda: defaultdict(int))
    degree_dict = defaultdict(int)
    edge_degree = [sorted(g.degree(e).values()) for e in g.edges()]
    for e in edge_degree:
        edge_dict[e[0]][e[-1]] +=1
    for i in g.degree().values():
        degree_dict[i] +=1
    edge_number = g.number_of_edges()
    node_number = g.number_of_nodes()
    ebks, ks = [], []
    for k1 in edge_dict:
        p1, p2 = 0, 0
        for k2 in edge_dict[k1]:
            if k2 >= b*k1:
                pkk = float(edge_dict[k1][k2])/edge_number
                pk2 = float(degree_dict[k2])/node_number
                k2pk2 = k2*pk2
                p1 += pkk/k2pk2
                p2 += pk2
        if p2 > 0:
            ebks.append(p1/p2)
            ks.append(k1)
    return ebks, ks

  1. http://snap.stanford.edu/data/ca-CondMat.html
ca = nx.Graph() with open ('/Users/chengjun/bigdata/ca-CondMat.txt') as f: for line in f: if line[0] != '#': x, y = line.strip().split('\t') ca.add_edge(x,y) nx.info(ca) ebk, k = ebkk(ca, b=3) plt.plot(k,ebk,'r^') plt.xlabel(r'$k$', fontsize = 16) plt.ylabel(r'$E_b(k)$', fontsize = 16) plt.xscale('log') plt.yscale('log') plt.show()

600px

不再采用作者的思路,而是直接计算每一个度k所在的边中另一个度大于等于bk的比例就是ebk,为了减少数据里的波动,采用<math>\sum_{bk}^{\infty} N(k') </math>进行归一化。

def ebw(g, b):
    edge_degree = [sorted([g.degree(i), g.degree(j)]) for i, j in g.edges()]
    degree_dict = defaultdict(int)
    for i in g.degree().values():
        degree_dict[i] +=1
    node_number = g.number_of_nodes()
    edge_number = g.number_of_edges()
    ebks, ks, kv = [],[], set(g.degree().values())
    for k in kv:
        ek, ekk, k_node_num = 0, 0, 0
        for k1, k2 in edge_degree:
            if k1 == k or k2 == k:
                ek +=1
            if k1 ==k and k2 >= b*k:
                ekk += 1
        ebk = float(ekk)/ek 
        for ki in degree_dict:
            if ki>=b*k:
                k_node_num += float(degree_dict[ki])
        #k_node_num = float(degree_dict[k])
        if ebk != 0:
            ebks.append(ebk/k_node_num)
            ks.append(k)
    return ebks, ks

import numpy as np

def log_binning(x, y, bin_count=35):
    max_x = np.log10(max(x))
    max_y = np.log10(max(y))
    max_base = max([max_x,max_y])
    xx = [i for i in x if i>0]
    min_x = np.log10(np.min(xx))
    bins = np.logspace(min_x,max_base,num=bin_count)
    bin_means_y = (np.histogram(x,bins,weights=y)[0] / np.histogram(x,bins)[0])
    bin_means_x = (np.histogram(x,bins,weights=x)[0] / np.histogram(x,bins)[0])
    return bin_means_x,bin_means_y

800px

[3] :File:scaling_of_degree_correlation.pdf

参考文献