<a href="https://colab.research.google.com/github/abhilasha-kumar/modeling-lexical-retrieval/blob/main/DistanceExtractor.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Code Snippet for measuring network distances over single-layer and multiplex networks


In [2]:
import networkx as nx

Below we import two .txt files containing only edge lists (note: on Colab you have to drag and drop files into the Files pane each time you start a runtime, unless you read them from a mounted Drive as done here). I used these edge lists for convenience, but we should replace them with ones that are filtered/weighted appropriately.

In [3]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [4]:
import csv

# Edges of the phonological layer, one [node, node] pair per row
pholi = []
with open('/content/drive/MyDrive/LexicalRetrieval-2021/networks-data/EdgeLists/PhonologicalSimilaritiesMST.txt') as file:
    edgeread = csv.reader(file, delimiter='\t')
    for entries in edgeread:
        pholi.append(entries)

# Edges of the free association layer, same tab-separated format
freea = []
with open('/content/drive/MyDrive/LexicalRetrieval-2021/networks-data/EdgeLists/FreeAssociationsMST.txt') as file:
    edgeread = csv.reader(file, delimiter='\t')
    for entries in edgeread:
        freea.append(entries)
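Since the same reading loop is repeated for every edge list, it could be factored into a small helper. The sketch below is only an illustration: `read_edge_list` is a hypothetical name that does not appear in the notebook, and the demo writes a throwaway file so that it runs without Drive access.

```python
import csv
import os
import tempfile


def read_edge_list(path, delimiter="\t"):
    """Read a delimited edge list into a list of rows (hypothetical helper)."""
    with open(path, newline="") as handle:
        # csv.reader yields an empty list for a blank line; skip those rows.
        return [row for row in csv.reader(handle, delimiter=delimiter) if row]


# Demo on a temporary file standing in for one of the Drive edge lists.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("1\t2\n1\t3\n\n2\tbinary\n")
    demo_path = tmp.name

demo_edges = read_edge_list(demo_path)
os.remove(demo_path)
print(demo_edges)  # [['1', '2'], ['1', '3'], ['2', 'binary']]
```

Under this assumption, `pholi` and `freea` would each be a single call to the helper with the corresponding Drive path.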


These are the first 10 edges of the phonological layer and of the free association layer:

In [None]:
pholi[:10]

[['1', '2'],
 ['1', '3'],
 ['1', '4'],
 ['1', '5'],
 ['1', '6'],
 ['1', '7'],
 ['1', '8'],
 ['1', '10'],
 ['1', '12'],
 ['1', '13']]

In [None]:
freea[:10]

[['1', 'binary'],
 ['1', 'scale'],
 ['2', 'binary'],
 ['2', 'couple'],
 ['2', 'l'],
 ['2', 'PlayStation'],
 ['2', 'u'],
 ['2', 'version'],
 ['2.', 'version'],
 ['3', 'number']]

In networkx we can import individual layers as Graph objects, whereas the multiplex network can be a MultiGraph. When we add links coming from different layers, we tag each link with a colour attribute that identifies its layer. The multiplex distance is then the distance in a single-layer network where the links of all colours are available.

In [5]:
M1 = nx.Graph()
M2 = nx.Graph()
M1.add_edges_from(pholi)
M2.add_edges_from(freea)
M = nx.MultiGraph()
M.add_edges_from(pholi, color="red")
M.add_edges_from(freea, color="blue");
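To make the idea of multiplex distance concrete, here is a minimal sketch on a toy network. The words and edges are made up for illustration and are not taken from the notebook's data. It shows a pair of nodes that no single layer connects but that the multiplex connects by alternating colours.

```python
from collections import Counter

import networkx as nx

# Toy layers (made-up edges, not the notebook's data).
phon_toy = [("cat", "hat"), ("mat", "dog")]
sem_toy = [("hat", "mat"), ("cat", "pet")]

layer_phon = nx.Graph(phon_toy)
layer_sem = nx.Graph(sem_toy)

multiplex = nx.MultiGraph()
multiplex.add_edges_from(phon_toy, color="red")
multiplex.add_edges_from(sem_toy, color="blue")

# Edges per layer, recovered from the colour attribute.
edges_per_colour = Counter(c for _, _, c in multiplex.edges(data="color"))
print(edges_per_colour["red"], edges_per_colour["blue"])  # 2 2

# In the phonological layer alone, cat and dog lie in different components.
print(nx.has_path(layer_phon, "cat", "dog"))  # False
# In the semantic layer, dog is not even a node.
print("dog" in layer_sem)  # False

# On the multiplex: cat-hat (red), hat-mat (blue), mat-dog (red).
multiplex_distance = nx.shortest_path_length(multiplex, "cat", "dog")
print(multiplex_distance)  # 3
```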

In [None]:
len(list(M1.edges()))

14225

In [None]:
len(list(M2.edges()))

28533

In [None]:
len(list(M.edges()))

42758

This file contains information on the targets and cues that are all in the multiplex network. It also includes the same sentence IDs I used for mapping sentences into syntactic networks.

In [6]:
words=[]
with open('/content/drive/MyDrive/LexicalRetrieval-2021/networks-data/EdgeLists/WordsIDsinMultiplex.txt') as file:
    edgeread = csv.reader(file, delimiter='\t')
    for entries in edgeread:
      words.append(entries)


In [7]:
words[6]

['14',
 'bargain',
 'light',
 'tariff',
 'bark',
 'barter',
 'True',
 'True',
 'True',
 'True',
 'True']

In [7]:
len(words)

33

This cell imports a syntactic network (one of my textual forma mentis networks) as extracted from the prompt sentence (for details about how these networks are computed, see Stella, PeerJ Comp. Sci., 2020). Alternatively, there are WithStopwords files, which are the direct output of TextDependency as implemented in Mathematica (i.e. the Stanford Universal Parser).

In [None]:
synct=[]
with open('FormaMentisNetwork_2.txt') as file:
    edgeread = csv.reader(file, delimiter='\t')
    for entries in edgeread:
      synct.append(entries)


These are the edges we want to add to the free association layer:

In [None]:
synct

[['formal', 'renounce'], ['formal', 'throne'], ['renounce', 'throne']]

In [None]:
M1Enr = nx.MultiGraph()
M2Enr = nx.MultiGraph()
M1Enr.add_edges_from(pholi)
M2Enr.add_edges_from(freea)
M2Enr.add_edges_from(synct);

These cells try to compute the shortest network distance on a given network from a target ('abandon' here) to a source ('abdicate' here). If there is no path on that layer, 'No path' is printed instead.

In [None]:
try:
    n=nx.shortest_path_length(M2Enr,'abandon','abdicate')
    print(n)
except nx.NetworkXNoPath:
    print ('No path')

7


In [None]:
try:
    n=nx.shortest_path_length(M2,'abandon','abdicate')
    print(n)
except nx.NetworkXNoPath:
    print ('No path')

7


In [None]:
try:
    n=nx.shortest_path_length(M,'abandon','abdicate')
    print(n)
except nx.NetworkXNoPath:
    print ('No path')

7


In this case, the shortest distance (7) is the same over the multiplex (M), the original free association layer (M2), and the free association layer enriched by syntactic dependencies (M2Enr).
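The three cells above only catch `nx.NetworkXNoPath`. If a word is missing from a layer altogether, `nx.shortest_path_length` raises `nx.NodeNotFound` instead. The following is a minimal sketch of a wrapper that handles both cases; `safe_distance` is a hypothetical name, and the graph is a toy example rather than one of the notebook's layers.

```python
import networkx as nx


def safe_distance(graph, source, target):
    """Return the shortest path length, or None if it cannot be computed.

    Hypothetical helper: None covers both 'no path between the nodes'
    and 'at least one node is absent from the graph'.
    """
    try:
        return nx.shortest_path_length(graph, source, target)
    except (nx.NetworkXNoPath, nx.NodeNotFound):
        return None


# Toy graph with two components: a-b-c and x-y.
toy = nx.Graph([("a", "b"), ("b", "c"), ("x", "y")])

print(safe_distance(toy, "a", "c"))        # 2
print(safe_distance(toy, "a", "x"))        # None (different components)
print(safe_distance(toy, "a", "missing"))  # None (node not in graph)
```

Returning `None` rather than the string `'No path'` keeps the result list numeric where possible, which makes later arithmetic on distances easier to guard.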

In [13]:
withsynt = []
def with_synt():
  # Row 0 of words is skipped (presumably a header row)
  for i in range(1,len(words)):
    synct=[]
    # Syntactic edge list for this row's sentence ID (column 0)
    pathnm = '/content/drive/MyDrive/LexicalRetrieval-2021/networks-data/EdgeLists/FormaMentisNetwork_' + words[i][0] + '.txt'
    with open(pathnm) as file:
      edgeread = csv.reader(file, delimiter='\t')
      for entries in edgeread:
        synct.append(entries)
    # Free association layer enriched with this sentence's syntactic edges
    M2Enr = nx.MultiGraph()
    M2Enr.add_edges_from(freea)
    M2Enr.add_edges_from(synct)
    try:
      # Distance between the words in columns 1 and 5 of this row
      n=nx.shortest_path_length(M2Enr,words[i][1],words[i][5])
      withsynt.append(n)
    except nx.NetworkXNoPath:
      withsynt.append('No path')
  return withsynt


In [8]:
synct=[]
pathnm = '/content/drive/MyDrive/LexicalRetrieval-2021/networks-data/EdgeLists/FormaMentisNetwork_' + words[6][0] + '.txt'
with open(pathnm) as file:
  edgeread = csv.reader(file, delimiter='\t')
  for entries in edgeread:
    synct.append(entries)
M2Enr = nx.MultiGraph()
M2Enr.add_edges_from(freea)
M2Enr.add_edges_from(synct)
nx.shortest_path_length(M2Enr,words[6][1],words[6][5])

4

In [20]:
len(synct)

13

In [11]:
print(words[6][1])
print(words[6][5])
print(words[6])

bargain
barter
['14', 'bargain', 'light', 'tariff', 'bark', 'barter', 'True', 'True', 'True', 'True', 'True']


In [16]:
len(list(M2Enr.nodes()))

28535

In [15]:
len(list(M2.nodes()))

28535

In [17]:
len(list(M2Enr.edges()))

28546

In [18]:
len(list(M2.edges()))

28533
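The edge counts above differ by 13, which matches `len(synct)`: a MultiGraph keeps parallel edges, so every syntactic edge is counted even when the same pair is already linked by a free association. To count only the pairs that are new to the base layer, one option is to test each syntactic edge against the base graph. The sketch below uses a made-up base graph together with the three syntactic edges shown earlier for sentence 2.

```python
import networkx as nx

# Made-up base layer; only renounce-throne overlaps with the syntactic edges.
base = nx.Graph([("formal", "official"), ("renounce", "throne")])
syntactic_edges = [["formal", "renounce"], ["formal", "throne"], ["renounce", "throne"]]

enriched = nx.MultiGraph(base)
enriched.add_edges_from(syntactic_edges)

# The MultiGraph count includes the parallel renounce-throne edge.
print(base.number_of_edges(), enriched.number_of_edges())  # 2 5

# Pairs that were not linked in the base layer before enrichment.
new_edges = [(u, v) for u, v in syntactic_edges if not base.has_edge(u, v)]
print(len(new_edges))  # 2
```

Collecting `len(new_edges)` per sentence would give the data for a histogram of edges added by the syntactic layer.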

In [14]:
with_synt()

In [15]:
withsynt

[7,
 7,
 8,
 7,
 1,
 4,
 5,
 8,
 5,
 6,
 6,
 7,
 4,
 6,
 7,
 8,
 8,
 6,
 8,
 6,
 8,
 8,
 4,
 4,
 6,
 6,
 6,
 7,
 7,
 7,
 6,
 2]

In [16]:
nosynt = []
def no_synt():
  for i in range(1,len(words)):
    try:
      n=nx.shortest_path_length(M2,words[i][1],words[i][5])
      nosynt.append(n)
    except nx.NetworkXNoPath:
      nosynt.append('No path')
  return nosynt

In [17]:
no_synt()

[7,
 7,
 8,
 7,
 1,
 4,
 6,
 8,
 5,
 6,
 6,
 7,
 4,
 6,
 7,
 8,
 8,
 6,
 8,
 6,
 8,
 8,
 4,
 4,
 6,
 6,
 6,
 7,
 7,
 7,
 6,
 2]

In [19]:
diffs = []
for x in range(len(nosynt)):
  diffs.append(nosynt[x]-withsynt[x])
diffs

[0,
 0,
 0,
 0,
 0,
 0,
 1,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0]

Comparing M2Enr and M2, the path lengths between the 'both' primes and targets are equal in 31 of the 32 pairs; in the remaining pair, M2Enr has a path that is shorter by 1.
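The subtraction in the comparison works here because every pair has a path in both networks. If either list contained a `'No path'` entry, subtracting a string would raise a `TypeError`. Here is a minimal sketch of a guarded comparison, using short made-up lists and a hypothetical helper `path_gain`:

```python
# Made-up distance lists; 'No path' mimics the notebook's placeholder.
nosynt_demo = [7, 6, "No path", 4]
withsynt_demo = [7, 5, "No path", 4]


def path_gain(before, after):
    """Return before - after, or None if either distance is missing."""
    if isinstance(before, int) and isinstance(after, int):
        return before - after
    return None


gains = [path_gain(b, a) for b, a in zip(nosynt_demo, withsynt_demo)]
print(gains)  # [0, 1, None, 0]

# Indices where enrichment changed the distance (non-zero, non-missing gain).
changed = [i for i, g in enumerate(gains) if g]
print(changed)  # [1]
```

Note that this sketch also returns `None` when enrichment creates a path that did not exist before; that case may deserve its own label.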

1. Add a histogram of the number of edges added by the syntactic layer.

2. Try using another phonological layer (compute phonological similarities for all pairs of word nodes; any non-zero value is an edge).

3. Look at what weighted semantic and phonological networks would look like.

Look at the number of unique nodes in the phonological MST.

Look at metrics for weighted shortest paths: https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.shortest_paths.weighted.dijkstra_path.html

What happens if we use a distributional representation instead of FreeAssociationMST?

Code to create an MST: https://networkx.org/documentation/networkx-1.10/reference/algorithms.mst.html
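As a starting point for the weighted-network notes, here is a minimal sketch of weighted shortest paths and MST extraction in networkx. The graph and its weights are made up, and treating the weight as a distance (smaller means more similar) is an assumption; similarity scores would first need converting to distances.

```python
import networkx as nx

# Toy weighted graph; weights are interpreted as distances (assumption).
weighted = nx.Graph()
weighted.add_weighted_edges_from([
    ("a", "b", 1.0),
    ("b", "c", 1.0),
    ("a", "c", 5.0),
    ("c", "d", 2.0),
])

# Unweighted distance counts hops: a-c is a direct edge.
hops = nx.shortest_path_length(weighted, "a", "c")
print(hops)  # 1

# Dijkstra uses the 'weight' attribute: going through b is cheaper.
cost = nx.dijkstra_path_length(weighted, "a", "c")
path = nx.dijkstra_path(weighted, "a", "c")
print(cost, path)  # 2.0 ['a', 'b', 'c']

# Minimum spanning tree: drops the heavy a-c edge.
mst = nx.minimum_spanning_tree(weighted, weight="weight")
mst_edges = sorted(tuple(sorted(edge)) for edge in mst.edges())
print(mst_edges)  # [('a', 'b'), ('b', 'c'), ('c', 'd')]
print(mst.number_of_nodes())  # 4
```

The last line is the same call that would answer the question about the number of unique nodes in the phonological MST, applied to `M1`.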