# Colaboratory Assignment 7.2

**Instructions**. Below you will find several text cells with short programming problems. You can create as many code cells as you need to answer them.

There are four problems, but you only need to solve two. You **must** choose at least one of the problems whose title is in <font color='#006633'>green</font>.


**BEFORE YOU START**

Make sure to run the code cell below to fix the adjacency matrix problem. This cell should be the first thing you evaluate; otherwise, you will have to restart your runtime and reimport `networkx`.

```python
!pip uninstall -q --yes scipy networkx && pip install -q scipy==1.8 networkx==2.7
```

## 1. Identifying v-shapes

There is a distinction between counting the v-shapes that have node $h$ as their vertex and checking whether nodes $i, h, q$ form a v-shape motif. In this problem, use the IMDB data to print all v-shapes with `Martin Scorsese` as the vertex and `Robert De Niro` as one of the endpoints. You can print each v-shape as a 3-element tuple of the nodes involved.
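To make the distinction concrete, here is a minimal sketch on a small, made-up graph stored as a dict of neighbour sets. The graph and the helpers `is_v_shape` and `count_v_shapes_at` are hypothetical illustrations, not course code. Following the `vi` formula $k(k-1)/2$ used later in this notebook, which counts every pair of neighbours, the sketch does not require the two endpoints to be unconnected.

```python
# Made-up undirected graph as a dict of neighbour sets (hypothetical example data).
adj = {
    "h": {"i", "q", "r"},
    "i": {"h"},
    "q": {"h", "r"},
    "r": {"h", "q"},
}


def is_v_shape(adj, i, h, q):
    """Check one triple: h is the vertex, i and q are two distinct neighbours of h."""
    return i != q and i in adj[h] and q in adj[h]


def count_v_shapes_at(adj, h):
    """Count all v-shapes with h as the vertex: every unordered pair of neighbours, k(k-1)/2."""
    k = len(adj[h])
    return k * (k - 1) // 2


print(is_v_shape(adj, "i", "h", "q"))  # True: h is linked to both i and q
print(is_v_shape(adj, "i", "q", "h"))  # False: q is not linked to i
print(count_v_shapes_at(adj, "h"))     # 3 pairs: (i, q), (i, r), (q, r)
```

Checking a triple answers a yes/no question about three specific nodes, while counting only needs the degree of the vertex.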

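```python
import networkx as nx
from Readlist import readlist  # course helper for loading the network files


def vCheck(G, vertex, end1):
  """Return every v-shape (end1, vertex, i) with `vertex` as the vertex and `end1` as one endpoint."""
  v_shapes = []
  # No v-shape can contain end1 unless end1 is a neighbour of the vertex.
  if not G.has_edge(vertex, end1):
    return v_shapes
  for i in G.nodes():
    # The other endpoint must be a neighbour of the vertex, distinct from end1 and from the vertex.
    if i not in (end1, vertex) and G.has_edge(vertex, i):
      v_shapes.append((end1, vertex, i))
  return v_shapes


G = readlist('imdb.pkl', 0)
v = vCheck(G, 'Martin Scorsese', 'Robert De Niro')
print(v)
```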



## <font color='#006633'>2. Counting v-shapes inefficiently</font>

Although we have already defined a function that counts how many v-shapes each node is the vertex of, we can also obtain this count with the `vCheck` function. Ideally we would use the IMDB data for this problem, but that is not feasible given how many node pairs this network has: potentially 218,133,097,260, which would make the search far too slow.
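The number of unordered node pairs grows quadratically with the number of nodes $n$: it is $n(n-1)/2$. As a quick arithmetic check, assuming the figure above is exactly this count, the sketch below shows that 218,133,097,260 pairs correspond to $n = 660505$ nodes. The helpers `node_pairs` and `nodes_from_pairs` are illustrative, not course functions.

```python
import math


def node_pairs(n):
    """Number of unordered pairs of distinct nodes in a network with n nodes."""
    return n * (n - 1) // 2


def nodes_from_pairs(p):
    """Invert node_pairs: 8p + 1 = (2n - 1)^2, so n = (1 + sqrt(8p + 1)) / 2."""
    return (1 + math.isqrt(8 * p + 1)) // 2


n = nodes_from_pairs(218133097260)
print(n)              # 660505
print(node_pairs(n))  # 218133097260
```

`math.isqrt` (Python 3.8+) keeps the square root exact for large integers, which a floating-point `math.sqrt` would not guarantee.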

We will use the airports network for this problem.

Count the v-shapes in which `ATL` is the vertex, using only the `vCheck` function. Compare that result with the one obtained using the `vi` function, defined in the slides and the videos for this lesson.
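One possible reading of "using only the `vCheck` function" is to reuse the three-argument `vCheck(G, vertex, end1)` from Problem 1: call it once with every node as `end1`, add up the lengths of the returned lists, and halve the total, because each v-shape $(a, \text{vertex}, b)$ is found twice (once with `end1 = a` and once with `end1 = b`). The sketch below assumes that reading. `ToyGraph` is a hypothetical stand-in that exposes only the two networkx methods used here (`nodes()` and `has_edge()`), so the same functions should also accept a networkx graph such as `G2`.

```python
class ToyGraph:
    """Hypothetical stand-in exposing nodes() and has_edge(), like a networkx Graph."""

    def __init__(self, edges):
        self.adj = {}
        for u, w in edges:
            self.adj.setdefault(u, set()).add(w)
            self.adj.setdefault(w, set()).add(u)

    def nodes(self):
        return list(self.adj)

    def has_edge(self, u, w):
        return w in self.adj.get(u, set())


def vCheck(G, vertex, end1):
    """Problem 1 version: all v-shapes (end1, vertex, i) centred at vertex."""
    if not G.has_edge(vertex, end1):
        return []
    return [(end1, vertex, i) for i in G.nodes()
            if i not in (end1, vertex) and G.has_edge(vertex, i)]


def count_with_vcheck(G, vertex):
    """Count v-shapes at vertex by calling vCheck once per node; each shape is seen twice."""
    total = sum(len(vCheck(G, vertex, end1)) for end1 in G.nodes())
    return total // 2


G = ToyGraph([("hub", "a"), ("hub", "b"), ("hub", "c"), ("a", "b")])
print(count_with_vcheck(G, "hub"))  # 3, matching k(k-1)/2 with k = 3
```

Each call to `vCheck` scans every node, so this count costs on the order of $n^2$ edge checks per vertex, which is what makes it the inefficient option.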

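```python
import networkx as nx
from Readlist import readlist  # course helper, as in Problem 1


def vCheck(G, vertex):
  """Count the v-shapes centred at `vertex` by checking every unordered pair of its neighbours."""
  neighs = list(nx.neighbors(G, vertex))
  count = 0
  for i in range(len(neighs)):
    # Start j at i + 1 so each unordered pair is checked once and no node is paired with itself.
    for j in range(i + 1, len(neighs)):
      # Always true here (both are neighbours); kept to mirror the check in Problem 1.
      if G.has_edge(vertex, neighs[i]) and G.has_edge(vertex, neighs[j]):
        count += 1
  return count

def vi(G, i):
    k = G.degree(i)
    return (k * (k - 1))/2

G2 = readlist('Airports.txt', 0)  # airports network

v = vCheck(G2, 'ATL')  # modified v-shape check
v2 = vi(G2, 'ATL')  # instructor's v-shape count

print('Using the modified vCheck function, the number of v-shapes calculated was:', v)
print("Using the instructor's vi function, the number of v-shapes calculated was:", v2)
```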

## <font color='#006633'>3. How slow is using `vCheck` compared to `vi`?</font>

Let's time both methods for counting v-shapes. It is often useful to have the v-shape counts for **multiple** nodes in a network. To store them, we can create a dictionary `vAir` with the node labels as keys and, as values, the number of v-shapes that have each node as their vertex.

Use the airports dataset. Since (from the previous problem) each call to `vCheck` on this network takes $\approx$ 3 seconds, restrict the computation to the last 50 nodes in the node set. Time how long it takes to build the dictionary `vAir` with `vCheck()` and with `vi()`.
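A minimal sketch of building `vAir` for the last 50 nodes with a slow pair-checking count and with the closed-form `vi` count, then checking that both agree. It assumes a dict-of-neighbour-sets graph (made-up data, so it runs without `Airports.txt`); Python dicts keep insertion order, so `list(adj)[-50:]` plays the role of `list(G2.nodes())[-50:]` for a networkx graph. The function `v_by_checking` is a hypothetical stand-in for a `vCheck`-style count.

```python
def vi(adj, i):
    """Closed-form count k(k-1)/2, where k is the degree of node i."""
    k = len(adj[i])
    return k * (k - 1) // 2


def v_by_checking(adj, vertex):
    """Slow count: test every unordered pair of other nodes (a, b) for a v-shape (a, vertex, b)."""
    others = [n for n in adj if n != vertex]
    count = 0
    for x in range(len(others)):
        for y in range(x + 1, len(others)):
            if others[x] in adj[vertex] and others[y] in adj[vertex]:
                count += 1
    return count


# Made-up ring of 60 nodes with chords (each node linked to n +/- 1 and n +/- 7).
adj = {n: set() for n in range(60)}
for n in range(60):
    for step in (1, 7):
        m = (n + step) % 60
        adj[n].add(m)
        adj[m].add(n)

last50 = list(adj)[-50:]  # with networkx: list(G2.nodes())[-50:]
vAir_check = {node: v_by_checking(adj, node) for node in last50}
vAir_vi = {node: vi(adj, node) for node in last50}
print(vAir_check == vAir_vi)  # True: both methods give the same counts
```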

To measure how long each method takes to build this dictionary, import the `time` module and call `time.time()` before and after the code, like this:

```python
import time

t0 = time.time()
# put the code you want to time here
tf = time.time()
print(f'Executed in {tf - t0:.2f} seconds')
```
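The same pattern can be wrapped in a small helper so both methods are timed identically. This sketch swaps in `time.perf_counter()`, a standard-library clock intended for measuring short intervals; `time.time()` works too. The complete graph and the functions `timed`, `v_pairs`, and `vi` below are hypothetical illustrations, not course code.

```python
import time
from itertools import combinations

# Made-up complete graph on 60 nodes: every node is linked to every other node.
adj = {n: set(range(60)) - {n} for n in range(60)}


def vi(node):
    """Closed-form count k(k-1)/2."""
    k = len(adj[node])
    return k * (k - 1) // 2


def v_pairs(node):
    """Slow method: test every unordered pair of other nodes as candidate endpoints."""
    others = [n for n in adj if n != node]
    return sum(1 for a, b in combinations(others, 2) if a in adj[node] and b in adj[node])


def timed(count_fn, nodes):
    """Build {node: count_fn(node)} and return it with the elapsed seconds."""
    t0 = time.perf_counter()
    result = {node: count_fn(node) for node in nodes}
    tf = time.perf_counter()
    return result, tf - t0


nodes = list(adj)[-50:]
v_slow, t_slow = timed(v_pairs, nodes)
v_fast, t_fast = timed(vi, nodes)
print(v_slow == v_fast)  # True
print(f'pair checking: {t_slow:.3f} s, vi: {t_fast:.6f} s')
```

Timing the dictionary construction as a whole, rather than each call, keeps the clock overhead negligible for the fast method.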

## 4. Creating histograms

Just as with triangles, it is possible to create a v-shape histogram for a network: for each possible number of v-shapes, it counts how many nodes are the vertex of exactly that many v-shapes.

Go back to the IMDB network and obtain `vIMDB`, a dictionary with the number of v-shapes for each node (use the `vi()` function). Then create a histogram `Hv` and plot it, using whichever method you think works best visually (e.g. bars or dots, linear or log scales).
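A minimal sketch of the histogram step using only the standard library. The `vIMDB` values below are made up (each is of the form $k(k-1)/2$); with the real network the dictionary would come from `vi()` over all nodes. `Hv` maps each v-shape count to the number of nodes with that count. Because v-shape counts grow roughly like $k^2$, they are likely to span several orders of magnitude, which often makes log-log axes the clearer choice; a log axis cannot show 0, so zero counts are dropped from the plotted points.

```python
from collections import Counter

# Made-up v-shape counts per node, standing in for vIMDB built with vi().
vIMDB = {"n1": 0, "n2": 1, "n3": 1, "n4": 3, "n5": 3, "n6": 3, "n7": 45, "n8": 1225}

# Hv[c] = number of nodes that are the vertex of exactly c v-shapes
Hv = Counter(vIMDB.values())

# Points for a log-log plot; zero counts cannot appear on a log axis, so skip them.
xs = sorted(c for c in Hv if c > 0)
ys = [Hv[c] for c in xs]
print(xs)  # [1, 3, 45, 1225]
print(ys)  # [2, 3, 1, 1]
```

With matplotlib available, `plt.loglog(xs, ys, 'o')` would then draw these points as dots on log-log axes.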