# Creating a feature matrix from a networkx graph

In this notebook we will look at a few ways to quickly create a feature matrix from a networkx graph.

[Networkx basic tutorial](http://pynetwork.readthedocs.io/en/latest/networkx_basics.html)

In [None]:
# %load ../_data/standard_import.txt

%matplotlib inline

import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
import networkx as nx
import pickle

# On matplotlib >= 3.6 this style is named 'seaborn-v0_8-white'
plt.style.use('seaborn-white')

## Data

### Import edges and nodes into networkx 2.X

<div class=warn>Graph files pickled with networkx 1.X cannot be read by networkx 2.X.</div>

```python
file = './major_us_cities.dms'
G = nx.read_gpickle(file)
print(nx.info(G))

list(G.edges(data=True))[:10]
list(G.nodes(data=True))[:10]

edges = pd.DataFrame(list(G.edges(data=True)))
nodes = pd.DataFrame(list(G.nodes(data=True)))
nodes.sample()
edges.sample()

# Use a lambda to pull out the attributes from the attributes dictionary in column 1
nodes['location'] = nodes.loc[:, 1].map(lambda x: x['location'])
nodes['population'] = nodes.loc[:, 1].map(lambda x: x['population'])
del nodes[1]
nodes.sample()

# Use a lambda to pull out the attributes from the attributes dictionary in column 2
edges['weight'] = edges.loc[:, 2].map(lambda x: x['weight'])
del edges[2]
edges.sample()

edges.to_csv('major_us_cities_edges.csv')
nodes.to_csv('major_us_cities_nodes.csv')
```
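
As an alternative to unpacking the attribute dictionaries by hand, the following is a minimal sketch on a made-up toy graph (the names `toy_G`, `toy_edges`, and `toy_nodes` are hypothetical). It assumes networkx 2.1 or later, where `nx.to_pandas_edgelist` is available, and lets pandas expand the per-node dictionaries into columns.

```python
import networkx as nx
import pandas as pd

# Toy graph with made-up attribute values, standing in for the real data
toy_G = nx.Graph()
toy_G.add_node('A', population=100)
toy_G.add_node('B', population=200)
toy_G.add_edge('A', 'B', weight=10)

# One row per edge; edge attributes become columns
toy_edges = nx.to_pandas_edgelist(toy_G, source='n1', target='n2')

# One row per node; each node's attribute dict becomes a row
toy_nodes = pd.DataFrame.from_dict(dict(toy_G.nodes(data=True)), orient='index')

print(toy_edges)
print(toy_nodes)
```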

In [None]:
!find .. | grep -i major_us_cities

In [None]:
nodes = pd.read_csv('../_data/major_us_cities_nodes.csv', index_col=0, 
                    names=['node', 'location', 'population'])
edges = pd.read_csv('../_data/major_us_cities_edges.csv', index_col=0,
                   names=['n1', 'n2', 'weight'])
nodes.sample(3)
edges.sample(3)

In [None]:
# First create graph from edges, then add nodes
# from_pandas_dataframe was renamed to from_pandas_edgelist in networkx 2.1
G = nx.from_pandas_edgelist(edges, 'n1', 'n2', edge_attr='weight')
print(nx.info(G))

In [None]:
list(G.edges(data=True))[:5]
list(G.nodes(data=True))[:5]

# Remove the nodes created from the CSV header row ('0' and '1').
# remove_nodes_from also drops their edges and silently ignores missing nodes.
G.remove_nodes_from(['0', '1'])

In [None]:
# Add each node with its attributes
for n in nodes.index:
    G.add_node(nodes.loc[n, 'node'],
               location=nodes.loc[n, 'location'],
               population=nodes.loc[n, 'population'])

In [None]:
list(G.nodes(data=True))[:10]
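
The same build steps can be sketched on a small made-up dataset. This version uses `nx.set_node_attributes` with a dictionary of dictionaries instead of a loop over `add_node`; the toy frames and their values are hypothetical. One difference to keep in mind: `set_node_attributes` only updates nodes that are already in the graph, so nodes without edges would still need `add_node`.

```python
import networkx as nx
import pandas as pd

# Toy data standing in for the CSV files (values are made up)
toy_edges = pd.DataFrame({
    'n1': ['A', 'A', 'B'],
    'n2': ['B', 'C', 'C'],
    'weight': [10, 20, 30],
})
toy_nodes = pd.DataFrame({
    'node': ['A', 'B', 'C'],
    'location': [(0, 0), (1, 0), (0, 1)],
    'population': [100, 200, 300],
})

# Build the graph from the edge list, keeping 'weight' as an edge attribute
toy_G = nx.from_pandas_edgelist(toy_edges, 'n1', 'n2', edge_attr='weight')

# {node: {'location': ..., 'population': ...}} attaches all attributes at once
attrs = toy_nodes.set_index('node').to_dict(orient='index')
nx.set_node_attributes(toy_G, attrs)

print(toy_G.nodes['B'])
print(toy_G['A']['C'])
```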

## Node based features

In [None]:
# Initialize the dataframe, using the nodes as the index
df = pd.DataFrame(index=list(G.nodes()))

### Extracting attributes

Using `nx.get_node_attributes` it's easy to extract the node attributes in the graph into DataFrame columns.

In [None]:
df['location'] = pd.Series(nx.get_node_attributes(G, 'location'))
df['population'] = pd.Series(nx.get_node_attributes(G, 'population'))

df.head()
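
A minimal sketch of why wrapping the result in `pd.Series` is convenient, using a made-up toy graph: `nx.get_node_attributes` only returns entries for nodes that carry the attribute, and pandas aligns the Series on the DataFrame index, so nodes without the attribute end up as `NaN`.

```python
import networkx as nx
import pandas as pd

# Toy graph; node 'C' deliberately has no population attribute
toy_G = nx.Graph()
toy_G.add_node('A', population=100)
toy_G.add_node('B', population=200)
toy_G.add_node('C')

toy_df = pd.DataFrame(index=list(toy_G.nodes()))

# The Series is aligned on the node labels, not on position
toy_df['population'] = pd.Series(nx.get_node_attributes(toy_G, 'population'))
print(toy_df)
```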

### Creating node based features

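Most of the networkx functions related to nodes return a dictionary keyed by node, which can easily be added to our dataframe. `G.degree()` is an exception in networkx 2.X: it returns a view of `(node, degree)` pairs, so it needs to be converted first.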

In [None]:
list(nx.degree(G))[:10]
pd.Series([d[1] for d in G.degree()]).head()

In [None]:
df['clustering'] = pd.Series(nx.clustering(G))
# dict(G.degree()) maps node -> degree, so the Series aligns on the node index
df['degree'] = pd.Series(dict(G.degree()))
df.head()
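
The pattern above can be sketched on a small toy graph (a triangle with one extra node attached; the graph is made up for illustration). Each function returns, or can be converted to, a `{node: value}` mapping, so every feature becomes one index-aligned column.

```python
import networkx as nx
import pandas as pd

# Triangle A-B-C with a pendant node D attached to C
toy_G = nx.Graph([('A', 'B'), ('B', 'C'), ('A', 'C'), ('C', 'D')])

toy_df = pd.DataFrame(index=list(toy_G.nodes()))

# degree() yields (node, degree) pairs, so convert it to a dict first
toy_df['degree'] = pd.Series(dict(toy_G.degree()))

# These functions already return {node: value} dictionaries
toy_df['clustering'] = pd.Series(nx.clustering(toy_G))
toy_df['degree_centrality'] = pd.Series(nx.degree_centrality(toy_G))

print(toy_df)
```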

## Edge based features

In [None]:
list(G.edges(data=True))[:5]

In [None]:
# Initialize the dataframe, using the edges as the index
df = pd.DataFrame(index=G.edges(data=True))

### Extracting attributes

Using `nx.get_edge_attributes`, it's easy to extract the edge attributes in the graph into DataFrame columns.

In [None]:
weight = list(nx.get_edge_attributes(G, 'weight').values())
weight[:5]

In [None]:
# Drop the attribute dictionary from each index entry, keeping only (n1, n2)
df.index = [(edge[0], edge[1]) for edge in df.index]

In [None]:
df['weight'] = weight
df.sample(3)
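
`nx.get_edge_attributes` returns a dictionary keyed by `(n1, n2)` tuples. The sketch below, on a made-up toy graph, looks each edge up by key rather than relying on the order of `.values()`, which also handles edges that lack the attribute. The `MultiIndex` with level names `n1` and `n2` is one possible layout, not the only one.

```python
import networkx as nx
import pandas as pd

# Toy graph; the edge A-C deliberately has no weight
toy_G = nx.Graph()
toy_G.add_edge('A', 'B', weight=10)
toy_G.add_edge('B', 'C', weight=30)
toy_G.add_edge('A', 'C')

edge_list = list(toy_G.edges())
toy_df = pd.DataFrame(
    index=pd.MultiIndex.from_tuples(edge_list, names=['n1', 'n2']))

# {(n1, n2): weight} for every edge that has a weight
weights = nx.get_edge_attributes(toy_G, 'weight')

# Look up each edge by key; missing weights become NaN
toy_df['weight'] = [weights.get(edge) for edge in edge_list]
print(toy_df)
```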

### Creating edge based features

Many of the networkx functions related to edges return nested data structures, such as an iterator of `(n1, n2, score)` tuples. We can extract the relevant data using a list comprehension.

In [None]:
df['preferential attachment'] = [i[2] for i in nx.preferential_attachment(G, df.index)]
df.sample(3)
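
As a sketch of the same idea on a made-up toy graph, the `(n1, n2, score)` triples can be collected into dictionaries keyed by edge, so that each score stays attached to its node pair. `nx.jaccard_coefficient` is included only as a second example of a function with the same return shape.

```python
import networkx as nx
import pandas as pd

# Triangle A-B-C with a pendant node D attached to C
toy_G = nx.Graph([('A', 'B'), ('B', 'C'), ('A', 'C'), ('C', 'D')])
pairs = list(toy_G.edges())

# Both functions yield (n1, n2, score) triples for the requested pairs
pa = {(u, v): score
      for u, v, score in nx.preferential_attachment(toy_G, pairs)}
jc = {(u, v): score
      for u, v, score in nx.jaccard_coefficient(toy_G, pairs)}

# Series built from tuple keys share the same edge index and align on it
toy_df = pd.DataFrame({
    'preferential attachment': pd.Series(pa),
    'jaccard': pd.Series(jc),
})
print(toy_df)
```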

When the function expects the two nodes as separate arguments, we can map a lambda function over the index.

In [None]:
df['Common Neighbors'] = df.index.map(lambda city: len(list(nx.common_neighbors(G, city[0], city[1]))))
df.sample(3)