## Implementation 1
### Feature Engineering
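#### 1. SCC+

#### 2. SCC Inner

#### 3. SCC

#### 4. Preferential Attachment Score

#### 5. Total Friends


| Feature | Description | Note/Implication |
|:--------|:------------|:-----|
| SCC+ | Number of strongly connected components in the neighbourhood subgraph of the edge, with the two endpoints included (`scc_nh_plus`) | Slow to compute |
| SCC Inner | Number of strongly connected components in the inner subgraph, i.e. the edges linking a neighbour of one endpoint to a neighbour of the other (`scc_inner`) | Slow to compute |
| SCC | Number of strongly connected components in the neighbourhood subgraph of the edge (`scc_nh`) | Slow to compute |
| Preferential Attachment Score | Product of the two neighbourhood sizes, `len(Gamma(u)) * len(Gamma(v))` (`pref_attach`) | Fast to compute |
| Total Friends | Size of the union of the two neighbourhoods, `len(union(Gamma(u), Gamma(v)))` (`total_friends`) | Fast to compute |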

## Data
### File descriptions

- train.txt - the training graph adjacency lists (tab delimited; line per source node followed by sink neighbours; no header line)  
- test-public.txt - the test edge IDs (tab delimited; edge ID, source node, sink node; with header line)  
- sample.csv - a sample submission file in the correct format  
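
As a rough illustration of the two file formats described above, the sketch below parses them with the standard library only. The sample strings, the header names (`Id`, `Source`, `Sink`), and the helper names `parse_adjacency` and `parse_test_edges` are assumptions made for this example; they are not taken from the data files.

```python
import csv
import io

def parse_adjacency(lines):
    """Map each source node to the set of its sink neighbours."""
    graph = {}
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if not parts or parts[0] == "":
            continue  # skip blank lines
        source, sinks = int(parts[0]), parts[1:]
        graph.setdefault(source, set()).update(int(s) for s in sinks if s)
    return graph

def parse_test_edges(handle):
    """Read (edge ID, source, sink) rows, skipping the header line."""
    reader = csv.reader(handle, delimiter="\t")
    next(reader)  # header line
    return [tuple(int(x) for x in row) for row in reader if row]

# Made-up sample content in the two formats (assumed, not real data).
train_text = "1\t2\t3\n2\t3\n4\n"
test_text = "Id\tSource\tSink\n1\t1\t4\n2\t2\t1\n"

graph = parse_adjacency(io.StringIO(train_text))
edges = parse_test_edges(io.StringIO(test_text))
print(graph)
print(edges)
```

A source node with no sinks (node `4` in the sample) still appears as a key, with an empty neighbour set.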

### Importing All Data 

In [1]:
import pandas as pd
import networkx as nx
import matplotlib.pyplot as plt


In [2]:
G =  nx.read_adjlist('data/train.txt', delimiter='\t', create_using=nx.DiGraph(), nodetype=int)

## Sampling the Data
To speed up computation, we work with a random sample of the training data.

In [None]:
from sampler import random_sampler

num_sample = 2400000
sample_data = random_sampler('data/train.txt', num_sample)
# nodetype=int keeps node IDs as integers, matching how G is loaded above.
G_sample = nx.read_adjlist(sample_data, delimiter='\t', create_using=nx.DiGraph(), nodetype=int)
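
The `sampler` module is not shown here, so the following is only one plausible way such a helper could draw its sample: reservoir sampling, which keeps `k` lines chosen uniformly at random in a single pass without loading the whole file. The name `sample_lines`, the `seed` parameter, and the generated adjacency lines are all assumptions for illustration.

```python
import random

def sample_lines(lines, k, seed=0):
    """Reservoir sampling: keep k lines chosen uniformly at random in one pass."""
    rng = random.Random(seed)
    reservoir = []
    for i, line in enumerate(lines):
        if i < k:
            reservoir.append(line)   # fill the reservoir first
        else:
            j = rng.randint(0, i)    # inclusive on both ends
            if j < k:
                reservoir[j] = line  # replace an earlier pick
    return reservoir

# Made-up adjacency lines standing in for the contents of a training file.
adjacency_lines = [f"{n}\t{n + 1}\t{n + 2}" for n in range(100)]
sample = sample_lines(adjacency_lines, 10)
print(len(sample))
```

A list of lines like `sample` could then be turned into a graph with `nx.parse_adjlist`, which accepts an iterable of lines rather than a file path.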

### Getting Sample Statistics

In [None]:
from stats import get_network_statistics
get_network_statistics(G)

## Features

In [3]:

features = pd.read_csv('data_features.csv')


In [None]:

# The SCC-based features are slow to compute, so they stay as placeholders for now.
#features['SCC'] = 0 # features.apply(lambda row: scc_nh(row.source, row.target), axis=1)
features['SCC_p'] = 0 # features.apply(lambda row: scc_nh_plus(row.source, row.target), axis=1)

# Fast neighbourhood features (the functions are defined in the cell further below).
features['pref_attachment'] = features.apply(lambda row: pref_attach(row.source, row.target), axis=1)
features['total_friends'] = features.apply(lambda row: total_friends(row.source, row.target), axis=1)
features['common_friends'] = features.apply(lambda row: common_friends(row.source, row.target), axis=1)
features['jacard_coef'] = features.apply(lambda row: jacard_coef(row.source, row.target), axis=1)

# index=False keeps the row index out of the CSV, so re-reading the file does not add an extra column.
features.to_csv('data_features.csv', index=False)

In [4]:
import time

# Work on an explicit copy of the first six rows so that assigning new columns
# does not trigger pandas' SettingWithCopyWarning.
features_sample = features.loc[:5, :].copy()

# time.perf_counter() replaces time.clock(), which was removed in Python 3.8.
t0 = time.perf_counter()
features_sample['pref_attachment'] = features_sample.apply(lambda row: pref_attach(row.source, row.target), axis=1)
t1 = time.perf_counter()
print("Time elapsed: ", t1 - t0)
t0 = time.perf_counter()
features_sample['total_friends'] = features_sample.apply(lambda row: total_friends(row.source, row.target), axis=1)
t1 = time.perf_counter()
print("Time elapsed: ", t1 - t0)
t0 = time.perf_counter()
features_sample['common_friends'] = features_sample.apply(lambda row: common_friends(row.source, row.target), axis=1)
t1 = time.perf_counter()
print("Time elapsed: ", t1 - t0)




NameError: name 'pref_attach' is not defined

In [5]:

# Defining intersection and union of lists.
def intersection(a, b):
    return list(set(a) & set(b))
def union(a, b):
    return list(set(a) | set(b))

# Trying to be consistent with notation from the paper to avoid confusion.
# Neighborhood functions
def Gamma(u):
    return union(G.successors(u),G.predecessors(u))
# networkx 2.x returns iterators here; materialise them so len() works on the result.
def Gamma_in(u):
    return list(G.predecessors(u))
def Gamma_out(u):
    return list(G.successors(u))
def Gamma_plus(u):
    return union(Gamma(u),{u})    


# SUBGRAPH FUNCTIONS
# I represent these in subgraph form rather than edge list, as specified in the
# paper. I think it will be more efficient this way, as we can use the nx.compose
# function and others.

# nh subgraph for vertex
def nh_subgraph_vertex(u):
    return G.subgraph(Gamma(u))

def nh_subgraph_vertex_plus(u):
    return G.subgraph(Gamma_plus(u))

# nh subgraph for edge
def nh_subgraph_edge(u,v):
    return G.subgraph(union(Gamma(u),Gamma(v)))

def nh_subgraph_edge_plus(u,v):
    # Use Gamma_plus so that u and v themselves are part of the subgraph.
    return G.subgraph(union(Gamma_plus(u),Gamma_plus(v)))

def inner_subgraph(u,v):
    # Use sets so the membership tests below are O(1).
    g1 = set(Gamma(u))
    g2 = set(Gamma(v))
    # Keep only the edges of the neighbourhood subgraph that link
    # a neighbour of u to a neighbour of v (in either direction).
    g = nh_subgraph_edge(u,v)
    inner = nx.DiGraph() # create empty graph to collect the inner edges
    for a, b in g.edges():
        if ((a in g1) and (b in g2)) or ((b in g1) and (a in g2)):
            inner.add_edge(a,b) # add this edge to inner
    return inner


# DEFINING FEATURES


# COMPUTATIONALLY FAST
# TOTAL FRIENDS
def total_friends(u,v):
    return len(union(Gamma(u),Gamma(v)))

# PREFERENTIAL ATTACHMENT
def pref_attach(u,v):
    return len(Gamma(u))*len(Gamma(v))

# COMMON FRIENDS
def common_friends(u,v):
    return len(intersection(Gamma(u),Gamma(v)))

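# JACCARD'S COEFFICIENT
def jacard_coef(u,v):
    total = total_friends(u,v)
    # Guard against division by zero when neither node has any neighbours.
    return common_friends(u,v)/total if total > 0 else 0.0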

# FRIENDS MEASURE (slightly slower)
def friends_measure(u,v):
    counter = 0
    for i in Gamma(u):
        for j in Gamma(v):
            if (i == j) or G.has_edge(i,j) or G.has_edge(j,i):
                counter = counter + 1
    return counter


# COMPUTATIONALLY SLOW.
# scc nh subgraph
def scc_nh(u,v):
    return nx.number_strongly_connected_components(nh_subgraph_edge(u,v))  

# scc nh subgraph +
def scc_nh_plus(u,v):
    return nx.number_strongly_connected_components(nh_subgraph_edge_plus(u,v)) 

# scc inner subgraph
def scc_inner(u,v):
    return nx.number_strongly_connected_components(inner_subgraph(u,v))
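
To make the fast neighbourhood features concrete, here is a small worked example on a toy directed graph. It uses plain dictionaries of sets instead of networkx, and every name in it (`succ`, `pred`, `gamma`, and the `toy_*` functions) is hypothetical and chosen only for this sketch. It is meant to mirror the definitions above, not to replace them.

```python
# Toy directed graph as successor sets: an edge u -> v means v is in succ[u].
succ = {1: {2, 3}, 2: {3}, 3: {4}, 4: {1}, 5: {4}}

# Build predecessor sets so both directions can be looked up.
pred = {n: set() for n in succ}
for u, sinks in succ.items():
    for v in sinks:
        pred.setdefault(v, set()).add(u)

def gamma(u):
    """All neighbours of u, ignoring edge direction."""
    return succ.get(u, set()) | pred.get(u, set())

def toy_common_friends(u, v):
    return len(gamma(u) & gamma(v))

def toy_total_friends(u, v):
    return len(gamma(u) | gamma(v))

def toy_pref_attach(u, v):
    return len(gamma(u)) * len(gamma(v))

def toy_jaccard(u, v):
    total = toy_total_friends(u, v)
    return toy_common_friends(u, v) / total if total else 0.0

def toy_friends_measure(u, v):
    # Count pairs (a, b) that are the same node or joined by an edge either way.
    return sum(1 for a in gamma(u) for b in gamma(v)
               if a == b or b in succ.get(a, set()) or a in succ.get(b, set()))

print(gamma(1), gamma(5))
print(toy_common_friends(1, 5), toy_total_friends(1, 5),
      toy_pref_attach(1, 5), toy_jaccard(1, 5), toy_friends_measure(1, 5))
```

For the pair (1, 5), node 1 has neighbours {2, 3, 4} and node 5 has only {4}, so there is one common friend, three total friends, a preferential attachment score of 3, and a Jaccard coefficient of 1/3.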



In [7]:
# time.perf_counter() replaces time.clock(), which was removed in Python 3.8.
t0 = time.perf_counter()

pref_attach(2926583, 4380400)
t1 = time.perf_counter()
print("Time elapsed: ", t1 - t0)

Time elapsed:  0.0001239999999995689
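
The timing cells in this notebook repeat the same start/stop pattern, which a small context manager could wrap. The sketch below is one way to do that; the name `timed` and the `results` dictionary are assumptions for this example. It relies on `time.perf_counter`, since `time.clock` is no longer available from Python 3.8 onward.

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label, results=None):
    """Print (and optionally record) the wall-clock time of the enclosed block."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        if results is not None:
            results[label] = elapsed
        print(f"{label}: {elapsed:.6f} s")

timings = {}
with timed("sum of squares", timings):
    total = sum(i * i for i in range(100_000))
```

Recording the measurements in a dictionary makes it easier to compare features side by side than reading individual printed lines.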


In [8]:
import time
# Copy the slice so that new columns can be assigned without a SettingWithCopyWarning.
features_sample = features.loc[:5, :].copy()


# time.perf_counter() replaces time.clock(), which was removed in Python 3.8.
t0 = time.perf_counter()
features_sample['pref_attachment'] = features_sample.apply(lambda row: pref_attach(row.source, row.target), axis=1)
t1 = time.perf_counter()
print("Time elapsed: ", t1 - t0)

Time elapsed:  17.948119000000005


In [15]:
import numpy as np

# Split the feature table into 20 chunks so they can be processed one at a time.
features_split = np.array_split(features, 20)

In [11]:
t0 = time.perf_counter()
features_sample['total_friends'] = features_sample.apply(lambda row: total_friends(row.source, row.target), axis=1)
t1 = time.perf_counter()
print("Time elapsed: ", t1 - t0)

Time elapsed:  6.482231999999996




In [13]:
t0 = time.perf_counter()
features_sample['common_friends'] = features_sample.apply(lambda row: common_friends(row.source, row.target), axis=1)
t1 = time.perf_counter()
print("Time elapsed: ", t1 - t0)

Time elapsed:  6.328578000000022




In [19]:
import numpy as np
x = np.array_split(features_sample, 3)

In [28]:
x[2]  # splitting into 3 chunks gives valid indices 0, 1 and 2

In [24]:
x[1]['pref_attachment'] = x[1].apply(lambda row: pref_attach(row.source, row.target), axis=1)
x[1]['d'] = 2
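
The chunking idea can be sketched without numpy or pandas. The helper below mimics how `np.array_split` sizes its chunks (earlier chunks receive the extra rows when the length does not divide evenly) and then processes one chunk at a time. The name `split_into_chunks`, the sample pairs, and the placeholder per-row computation are assumptions for illustration only.

```python
def split_into_chunks(rows, n_chunks):
    """Split rows into n_chunks nearly equal parts (earlier chunks get the extras)."""
    base, extra = divmod(len(rows), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        size = base + (1 if i < extra else 0)
        chunks.append(rows[start:start + size])
        start += size
    return chunks

# Six made-up (source, target) pairs standing in for rows of the feature table.
pairs = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 1), (5, 4)]
chunks = split_into_chunks(pairs, 3)
print([len(c) for c in chunks])  # three chunks, so valid indices are 0, 1 and 2

# Process one chunk at a time and stitch the results back together in order.
results = []
for chunk in chunks:
    results.extend(u + v for u, v in chunk)  # placeholder per-row computation
print(results)
```

Because the chunks are contiguous and processed in order, the concatenated results line up with the original row order.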

In [32]:
import pickle

# Save the graph. Pickle files are binary, so open in 'wb' mode.
with open('G.pkl', 'wb') as f:
    pickle.dump(G, f)

# Load the graph back. The file must be opened in binary mode ('rb');
# reading a pickle in text mode fails with a UnicodeDecodeError.
with open('G.pkl', 'rb') as f:
    G = pickle.load(f)

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte
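
An error like the one above typically appears when a pickle file is read in text mode rather than binary mode, because pickles written with protocol 2 or later begin with the byte `0x80`, which is not valid UTF-8. The sketch below shows a binary round trip and reproduces the text-mode failure on a small stand-in object; `toy_graph` and the temporary file path are assumptions for this example.

```python
import os
import pickle
import tempfile

# A small stand-in for the graph object: node -> set of successors.
toy_graph = {1: {2, 3}, 2: {3}, 3: set()}

path = os.path.join(tempfile.mkdtemp(), "toy_graph.pkl")

with open(path, "wb") as f:  # write bytes
    pickle.dump(toy_graph, f)

with open(path, "rb") as f:  # read bytes
    restored = pickle.load(f)

with open(path, "rb") as f:
    first_byte = f.read(1)

print(restored == toy_graph)
print(first_byte)

# Reading the same file as UTF-8 text fails on the very first byte.
try:
    with open(path, "r", encoding="utf-8") as f:
        f.read()
    text_mode_error = None
except UnicodeDecodeError as exc:
    text_mode_error = exc
print(type(text_mode_error).__name__)
```

If the loading cell was ever run with `'r'` instead of `'rb'`, that would be consistent with the message shown, although the recorded output alone does not confirm it.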

In [None]:
# Load the graph back from disk:
with open('G.pkl', 'rb') as f:
    G = pickle.load(f)

In [None]:
type(G)