In [27]:
import json
import pickle
import numpy as np
import requests

In [21]:
# Load the dataset of positions of all words in 3D space
url = "https://raw.githubusercontent.com/CCS-ZCU/noscemus_ETF/master/data/coordinates3s_dict.pkl"
resp = requests.get(url)
coordinates3s_dict = pickle.loads(resp.content)

In [22]:
# The dataset is a dictionary; each key corresponds to one of our six subcorpora:
coordinates3s_dict.keys()

dict_keys(['lasla', 'operamaiora', '1501-1550', '1551-1600', '1601-1650', '1651-1700'])

The value of each item holds the word positions in 3D space. Each item consists of:
* `xs` - np.array of all x coordinates
* `ys` - np.array of all y coordinates
* `zs` - np.array of all z coordinates
* `words` - np.array of all words in the vocabulary

Thus, to extract the positions of all word vectors in the 1651-1700 subcorpus, run:

In [23]:
subcorpus = "1651-1700"
xs, ys, zs, words = coordinates3s_dict[subcorpus]

For instance, if we want to get the x, y, z coordinates of a _target_ word, we have to find its position in the `words` array (a NumPy `ndarray`) and then use this position to index into `xs`, `ys`, and `zs`.

In [25]:
target = "scientia"
i = np.where(words == target) # find the positional index (a tuple holding one index array)
x, y, z = xs[i], ys[i], zs[i] # apply the positional index to xs, ys and zs
print(x, y, z)

[0.33957076] [0.27663973] [0.5570164]
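
If the target word is missing from the vocabulary, `np.where` returns an empty index and `x`, `y`, `z` silently become empty arrays. The following is a minimal sketch of a lookup that makes this case explicit. The helper name `get_coordinates` is hypothetical, and the toy arrays are made-up stand-ins for the real subcorpus data.

```python
import numpy as np

# Toy stand-ins for one subcorpus (made-up values, not the real coordinates)
words = np.array(["scientia", "sapientia", "cognitio"])
xs = np.array([0.1, 0.2, 0.3])
ys = np.array([0.4, 0.5, 0.6])
zs = np.array([0.7, 0.8, 0.9])

def get_coordinates(target, xs, ys, zs, words):
    """Return (x, y, z) as plain floats, or None if the word is not in the vocabulary."""
    matches = np.flatnonzero(words == target)  # 1D array of matching positions
    if matches.size == 0:
        return None
    i = matches[0]
    return float(xs[i]), float(ys[i]), float(zs[i])

print(get_coordinates("sapientia", xs, ys, zs, words))  # word in the toy vocabulary
print(get_coordinates("alchimia", xs, ys, zs, words))   # word absent from the toy vocabulary
```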


We can use something similar to get positions of multiple words at once:

In [26]:
wordlist = ["scientia", "sapientia", "cognitio", "disciplina"]
idx = [i for i, word in enumerate(words) if word in wordlist] # find the positional indices
# note: the results follow the order of `words`, not the order of `wordlist`
wordlist_xs, wordlist_ys, wordlist_zs = xs[idx], ys[idx], zs[idx] # extract xs, ys and zs for the words in the wordlist
print(wordlist_xs, wordlist_ys, wordlist_zs)

[0.33957076 0.48258787 0.3337054  0.35430148] [0.27663973 0.52705663 0.2713426  0.2833426 ] [0.5570164  0.7940444  0.5414133  0.57174116]
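
Because the list comprehension walks through `words`, the extracted coordinates come back in vocabulary order, which need not match the order of `wordlist`. For labelling points in a plot, the coordinates and labels have to line up. Below is a minimal sketch of an order-preserving lookup that also skips words missing from the vocabulary. The function name `lookup_wordlist` and the toy data are assumptions for illustration only.

```python
import numpy as np

# Toy stand-ins for one subcorpus (made-up values, not the real coordinates)
words = np.array(["ars", "cognitio", "disciplina", "sapientia", "scientia"])
xs = np.array([0.0, 0.1, 0.2, 0.3, 0.4])
ys = np.array([1.0, 1.1, 1.2, 1.3, 1.4])
zs = np.array([2.0, 2.1, 2.2, 2.3, 2.4])

def lookup_wordlist(wordlist, xs, ys, zs, words):
    """Return coordinates in the order of `wordlist`, plus the words actually found."""
    position = {str(w): i for i, w in enumerate(words)}  # word -> positional index
    found = [w for w in wordlist if w in position]       # keep the wordlist order
    idx = np.array([position[w] for w in found], dtype=int)
    return xs[idx], ys[idx], zs[idx], found

wl_xs, wl_ys, wl_zs, found = lookup_wordlist(
    ["scientia", "alchimia", "ars"], xs, ys, zs, words
)
print(found)   # labels aligned with the coordinate arrays below
print(wl_xs, wl_ys, wl_zs)
```

The dictionary lookup is also faster than testing `word in wordlist` for every vocabulary entry, which may matter for large vocabularies.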


Data in this shape (`wordlist_xs`, `wordlist_ys`, `wordlist_zs`, `wordlist`) should be straightforward to plot as a `plotly` 3D scatter plot.

In [28]:
wordlist_file_url = "https://raw.githubusercontent.com/CCS-ZCU/noscemus_ETF/master/data/wordlist.json"
response = requests.get(wordlist_file_url) # download the file once
wordlist = response.json() # parse the JSON body into a Python list

In [29]:
# let's take a look at the first 10 words
wordlist[:10]

['cognitio',
 'disciplina',
 'notitia',
 'philosophia',
 'ars',
 'doctrina',
 'geometria',
 'mathematicus',
 'peritia',
 'studiosus']

Now we can obtain their coordinates the same way as above.
Once again, let's start by choosing the positional data for the subcorpus we are interested in.



In [30]:
# select the positional data for the subcorpus, then look up the wordlist as above
xs, ys, zs, words = coordinates3s_dict[subcorpus]
idx = [i for i, word in enumerate(words) if word in wordlist] # find the positional indices
wordlist_xs, wordlist_ys, wordlist_zs = xs[idx], ys[idx], zs[idx]
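
The lookup silently skips any word that is absent from the chosen subcorpus, so it can help to check coverage before plotting. Here is a minimal sketch that reports, for each subcorpus, which words of the list are missing. It assumes every value unpacks as `(xs, ys, zs, words)`, as in the cells above. The function name `missing_words` and the toy dictionary are hypothetical.

```python
import numpy as np

# Toy dictionary with the same layout as coordinates3s_dict (made-up values)
toy_dict = {
    "1601-1650": (np.zeros(3), np.zeros(3), np.zeros(3),
                  np.array(["ars", "scientia", "doctrina"])),
    "1651-1700": (np.zeros(2), np.zeros(2), np.zeros(2),
                  np.array(["ars", "peritia"])),
}
toy_wordlist = ["ars", "scientia", "peritia"]

def missing_words(wordlist, coordinates_dict):
    """Map each subcorpus name to the words of `wordlist` absent from its vocabulary."""
    report = {}
    for subcorpus, (_, _, _, words) in coordinates_dict.items():
        vocabulary = set(words.tolist())  # plain Python strings for fast membership tests
        report[subcorpus] = [w for w in wordlist if w not in vocabulary]
    return report

report = missing_words(toy_wordlist, toy_dict)
for subcorpus, missing in report.items():
    print(subcorpus, missing)
```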

In [None]:
# TO-DO:
# (1) install and load plotly (maybe plotly express will be sufficient)
# (2) test plotting the positional data above as a 3D scatter plot (do not hesitate to pass all these steps to ChatGPT)
# (3) play around with the configuration of the hover
# (4) play around with axis limits, colors, font sizes etc.
# (5) develop a versatile function `plot_wordlist(wordlist, [additional parameters])`
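
As a possible starting point for steps (2) to (5), here is a rough sketch of what `plot_wordlist` might look like with Plotly Express. Everything beyond the function name from the TO-DO is an assumption: the extra parameters (`title`, `marker_size`, `font_size`, `axis_range`), the helper `prepare_points`, and the toy data are illustrative choices, not a settled design. It assumes `plotly` is installed (`pip install plotly`).

```python
import numpy as np

def prepare_points(wordlist, xs, ys, zs, words):
    """Collect coordinates for the words found in the vocabulary, in wordlist order."""
    position = {str(w): i for i, w in enumerate(words)}
    found = [w for w in wordlist if w in position]
    idx = np.array([position[w] for w in found], dtype=int)
    return xs[idx], ys[idx], zs[idx], found

def plot_wordlist(wordlist, xs, ys, zs, words, title=None,
                  marker_size=4, font_size=10, axis_range=None):
    """Return a Plotly 3D scatter figure with one labelled point per word."""
    import plotly.express as px  # imported lazily; only needed for plotting

    px_xs, px_ys, px_zs, found = prepare_points(wordlist, xs, ys, zs, words)
    fig = px.scatter_3d(x=px_xs, y=px_ys, z=px_zs,
                        text=found, hover_name=found, title=title)
    fig.update_traces(marker=dict(size=marker_size),
                      textfont=dict(size=font_size))
    if axis_range is not None:  # e.g. [0, 1] applied to all three axes
        fig.update_layout(scene=dict(xaxis=dict(range=axis_range),
                                     yaxis=dict(range=axis_range),
                                     zaxis=dict(range=axis_range)))
    return fig

# Toy stand-ins for one subcorpus (made-up values, not the real coordinates)
toy_words = np.array(["ars", "cognitio", "scientia"])
toy_xs = np.array([0.1, 0.2, 0.3])
toy_ys = np.array([0.4, 0.5, 0.6])
toy_zs = np.array([0.7, 0.8, 0.9])
points = prepare_points(["scientia", "ars"], toy_xs, toy_ys, toy_zs, toy_words)

# In the notebook, something like:
# plot_wordlist(wordlist, xs, ys, zs, words, title=subcorpus).show()
```

Keeping the lookup in a separate helper means the data preparation can be checked without rendering a figure, and the figure object is returned rather than shown so that the caller can adjust the layout further.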