In [None]:
import pandas as pd #module to work with dataframes
import networkx as nx #module to work with networks
import numpy as np
import scipy as scpy
from Functions import *
#%matplotlib inline

# Network Structure II: Community structure and mixing patterns (community-scale)

## Connected components

In the simple network at the start of the previous notebook, we saw that for **every** pair of nodes we can find a path connecting them. This is the definition of a **connected graph**. We can check this property for a given graph:

In [None]:
G = nx.Graph()
nx.add_cycle(G, (1,2,3))
G.add_edge(4,5)
nx.draw(G, with_labels=True)

In [None]:
nx.is_connected(G)

While we can visually identify two connected components in that graph, it is not always so simple, so we need a way to verify this. We will use the function `nx.number_connected_components(G)`:

In [None]:
nx.number_connected_components(G)

The `nx.connected_components(G)` function takes a graph and returns a generator of sets of node names, one set per connected component; wrapping it in `list` collects them. Verify that the two sets in the following list correspond to the two connected components in the drawing of the graph above:

In [None]:
list(nx.connected_components(G)) # see that they match the components in the graph figure

> In case you're not familiar with Python sets: they are **unordered** collections of items **without duplicates**. These are useful for collecting node names because node names should be unique. We can get the number of items in a set with the `len` function:

In [None]:
components = list(nx.connected_components(G))
len(components[0])  # number of nodes in the first component
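
As a small extension of the cell above, the sizes of all components can be collected at once. This is only a sketch on the same toy graph; the name `sizes` is introduced here for illustration.

```python
import networkx as nx

# Rebuild the small example graph: a triangle plus a separate edge.
G = nx.Graph()
nx.add_cycle(G, (1, 2, 3))
G.add_edge(4, 5)

# connected_components yields one set of node names per component.
components = list(nx.connected_components(G))

# Size of every component, largest first.
sizes = sorted((len(c) for c in components), reverse=True)
print(sizes)
```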

We often care about the **largest connected component**, which is sometimes referred to as the **core** of the network. We can use Python's built-in `max` function to obtain it. By default, `max` compares items using their natural ordering (lexicographic for strings, the subset relation for sets), which is not helpful here. We want the component with the most nodes, so we pass `len` as a key function:

In [None]:
max(nx.connected_components(G), key=len)

While it's often enough to just have the set of node names, sometimes we need the actual subgraph consisting of the largest connected component. One way to get this is to pass the node names to the `G.subgraph()` method:

In [None]:
core_nodes = max(nx.connected_components(G), key=len)
core = G.subgraph(core_nodes)

nx.draw(core, with_labels=True)
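
One detail worth knowing: `G.subgraph()` returns a view tied to the original graph, so its structure cannot be edited. If an independent graph is needed, calling `.copy()` on the view is one option. The sketch below assumes the same toy graph; `core_copy`, `fraction` and the node `99` are illustrative names and values only.

```python
import networkx as nx

G = nx.Graph()
nx.add_cycle(G, (1, 2, 3))
G.add_edge(4, 5)

core_nodes = max(nx.connected_components(G), key=len)

core_view = G.subgraph(core_nodes)         # view: shares data with G
core_copy = G.subgraph(core_nodes).copy()  # independent graph

# Share of all nodes that sit in the largest component.
fraction = core_copy.number_of_nodes() / G.number_of_nodes()

# Editing the copy leaves the original graph untouched.
core_copy.add_edge(3, 99)
print(fraction, 99 in G)
```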

### Directed components
Directed networks have two kinds of connectivity. **Strongly connected** means that there exists a directed path between every pair of nodes, i.e., that from any node we can get to any other node while following edge directionality. Think of cars on a network of one-way streets: they can't drive against the flow of traffic.

In [None]:
D = nx.DiGraph()  # let's use the same small directed network
D.add_edges_from([
    (1,2),
    (2,3),
    (3,2), (3,4), (3,5),
    (4,2), (4,5), (4,6),
    (5,6),
    (6,4),
])
nx.draw(D, with_labels=True)

In [None]:
nx.is_strongly_connected(D)

**Weakly connected** means that there exists a path between every pair of nodes when edge direction is ignored. Think about pedestrians on a network of one-way streets: they walk on the sidewalks, so they don't care about the direction of traffic.

In [None]:
nx.is_weakly_connected(D)

> Note 1: If a network is strongly connected, it is also weakly connected. The converse is not always true, as seen in this example.

> Note 2: The `is_connected` function for undirected graphs will raise an error when given a directed graph.
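
Both notes can be checked on the small directed network. This is a minimal sketch: the flag names are introduced here for illustration, and converting with `D.to_undirected()` is just one way to ask the undirected question.

```python
import networkx as nx

D = nx.DiGraph()
D.add_edges_from([
    (1, 2), (2, 3), (3, 2), (3, 4), (3, 5),
    (4, 2), (4, 5), (4, 6), (5, 6), (6, 4),
])

# Note 1: weakly connected, but not strongly connected.
strong = nx.is_strongly_connected(D)
weak = nx.is_weakly_connected(D)

# Note 2: is_connected is not implemented for directed graphs.
try:
    nx.is_connected(D)
    raised = False
except nx.NetworkXNotImplemented:
    raised = True

# Dropping edge directions lets us use the undirected check.
undirected_connected = nx.is_connected(D.to_undirected())
print(strong, weak, raised, undirected_connected)
```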

In the directed case, instead of `nx.connected_components` we now have `nx.weakly_connected_components` and `nx.strongly_connected_components`:

In [None]:
list(nx.weakly_connected_components(D))

In [None]:
list(nx.strongly_connected_components(D))
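
The `max(..., key=len)` idiom from the undirected case carries over to directed components. The sketch below rebuilds the same network so that it runs on its own; `largest_scc` and `largest_wcc` are names chosen here.

```python
import networkx as nx

D = nx.DiGraph()
D.add_edges_from([
    (1, 2), (2, 3), (3, 2), (3, 4), (3, 5),
    (4, 2), (4, 5), (4, 6), (5, 6), (6, 4),
])

largest_scc = max(nx.strongly_connected_components(D), key=len)
largest_wcc = max(nx.weakly_connected_components(D), key=len)

# Node 1 has no incoming edges, so no other node can reach it
# and it cannot belong to a larger strongly connected component.
in_degree_of_1 = D.in_degree(1)
print(largest_scc, largest_wcc, in_degree_of_1)
```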

<div class="alert alert-block alert-success"><b>Up to you: </b>
<h4> Exercise X</h4>
Let's work with the network of US air travel routes. The nodes in this graph are airports, represented by their [IATA codes](https://en.wikipedia.org/wiki/List_of_airports_by_IATA_code:_A).
    
Two nodes are connected with an edge if there is a scheduled flight directly connecting these two airports. We'll assume this graph to be undirected since a flight in one direction usually means there is a return flight.
Thus this graph has edges

[('HOM', 'ANC'), ('BGM', 'PHL'), ('BGM', 'IAD'), ...]

where ANC is Anchorage, IAD is Washington Dulles, etc.
These nodes also have **attributes** associated with them, containing additional information about the airports, such as their full name and location.
    
Create the network of USA flights and analyze it to answer these questions:
    
- 1) Is there a direct flight between Indianapolis (IND) and Fairbanks, Alaska (FAI)? A direct flight is one with no intermediate stops.
- 2) If I wanted to fly from Indianapolis to Fairbanks, Alaska, what would be an itinerary with the fewest flights?
- 3) Is it possible to travel from any airport in the US to any other airport in the US, possibly using connecting flights? In other words, does there exist a path in the network between every possible pair of airports?
</div>

In [None]:
# Write your code here. The line below loads the network.
G = nx.read_graphml('./data/openflights_usa.graphml.gz')

In [None]:
nodes=list(G.nodes(data=True))

In [None]:
city1 = "IND"
city2 = "FAI"

In [None]:
G.has_edge("IND","FAI")

In [None]:
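# Path with the fewest flights; shortest_path is a networkx function, not a Graph method
nx.shortest_path(G, "IND", "FAI")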
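
The three questions map onto three NetworkX calls. The sketch below shows the pattern on a toy graph with made-up airport codes and routes (`AAA`, `BBB`, and so on are placeholders, not data from the real file), so the printed values say nothing about the actual US network.

```python
import networkx as nx

# Toy route map with hypothetical codes; NOT the real dataset.
T = nx.Graph()
T.add_edges_from([
    ("AAA", "BBB"), ("BBB", "CCC"), ("CCC", "DDD"),
    ("AAA", "EEE"), ("EEE", "DDD"),
    ("XXX", "YYY"),  # an isolated pair of airports
])

# 1) Direct flight: is there an edge between the two airports?
direct = T.has_edge("AAA", "DDD")

# 2) Fewest flights: shortest path, guarded in case no path exists.
if nx.has_path(T, "AAA", "DDD"):
    itinerary = nx.shortest_path(T, "AAA", "DDD")
    n_flights = len(itinerary) - 1  # edges = nodes - 1

# 3) Can every airport reach every other one?
all_reachable = nx.is_connected(T)
print(direct, itinerary, n_flights, all_reachable)
```

Applied to the real graph, the same calls would take `G` and the codes `"IND"` and `"FAI"` in place of the toy names.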

## Partitions and Modularity

### SBM

## Clustering

## Small world

# Network Structure III: Motifs (meso-scale)