### Initialization

Import the standard and third-party libraries.

In [6]:
import os
from pathlib import Path
import sys

import astroid
import owlready2 as owl

HTML_W_PX = 800
HTML_H_PX = 800

Add `codeontology` to `sys.path` so it can be imported (it is not installed in this virtual environment).

In [7]:
codeontology_path = Path("..", "..", "..", "..").resolve()
assert codeontology_path.exists()
sys.path.insert(0, str(codeontology_path))

### RDFization

We perform the triple extraction on the locally available source code of the [`okgraph`](https://github.com/atzori/okgraph) library. We use `okgraph` as our test library because it is a real repository, sufficiently (but not overly) complex, and, most importantly, well known to us. This familiarity makes testing and inspecting the results easier.

In [8]:
a = 0

In [3]:
from codeontology.__main__ import main
main([
    "python3", "local", "D:\\Coding\\PyCharm\\okgraph",
    "-d", "D:\\Users\\sandr\\CodeOntologyOut\\okgraph\\download",
    "-o", "D:\\Users\\sandr\\CodeOntologyOut\\okgraph\\output",
])

2023-02-18 11:01:44 INFO    > Installing project in 'D:\Users\sandr\CodeOntologyOut\okgraph\download\install'.
2023-02-18 11:01:49 INFO    > Installed project 'okgraph-0.0.1'.
2023-02-18 11:01:49 INFO    > Installing project with its dependencies in 'D:\Users\sandr\CodeOntologyOut\okgraph\download\install'.
2023-02-18 11:09:58 INFO    > Creating object for `Project` 'okgraph-0.0.1' (from 'D:\Coding\PyCharm\okgraph').
2023-02-18 11:10:10 INFO    > Building unique model of 'okgraph-0.0.1':
2023-02-18 11:10:10 INFO    >  - parsing project packages and actual referenced dependencies (not linear progression);
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 34/34 [01:35<00:00,  2.81s/it]
2023-02-18 11:11:45 INFO    >  - applying transformations to the ASTs of the project and of its actual referenced dependencies.
100%|███████████████████████

0

### Query

The triples have now been stored. We load them with `owlready2` to run some simple queries.

In [9]:
path = Path("D:\\Users\\sandr\\CodeOntologyOut\\okgraph\\output\\okgraph-0.0.1.nt")
ontology = owl.get_ontology(str(path)).load()

We use custom helper functions to format and display the results.

In [10]:
from query import list_query_results, show_subgraph_from_entity
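
The `query` module itself is not shown in this notebook. As a rough idea of what a formatting helper of this kind might do, the sketch below numbers result rows in the style of the outputs that follow. `format_rows` is a hypothetical name used only for illustration; it is not the actual implementation of `list_query_results`.

```python
from typing import Iterable, List, Sequence


def format_rows(rows: Iterable[Sequence[object]], limit: int) -> List[str]:
    """Return at most `limit` rows as right-aligned, numbered lines."""
    shown = [tuple(row) for row in rows][:limit]
    width = len(str(len(shown)))  # digits needed for the largest row number
    return [f"{i:>{width}} {row!r}" for i, row in enumerate(shown, start=1)]


# Two (name, IRI) rows copied from the library query output in this notebook.
sample = [
    ("abc", "http://rdf.webofcode.org/woc/library13"),
    ("os", "http://rdf.webofcode.org/woc/library2"),
]
for line in format_rows(sample, limit=50):
    print(line)
```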

#### Libraries and packages

When running the extraction, we first download the dependencies of the project by installing, into a separate download directory, all the libraries declared in the `setup.py` file. According to `pip list`, this is what we should have in our environment after a clean install:
```
annoy              1.17.1    
certifi            2022.12.7 
charset-normalizer 3.0.1     
colorama           0.4.6     
fasteners          0.18      
filelock           3.9.0     
h5py               3.8.0     
huggingface-hub    0.12.0    
idna               3.4       
joblib             1.2.0     
lz4                4.3.2     
numpy              1.24.1    
packaging          23.0      
pip                22.3.1    
pymagnitude        0.1.143   
PyYAML             6.0       
regex              2022.10.31
requests           2.28.2    
scikit-learn       1.2.1     
scipy              1.10.0    
setuptools         65.6.3    
threadpoolctl      3.1.0     
tokenizers         0.13.2    
torch              1.13.1    
tqdm               4.64.1    
transformers       4.26.0    
typing_extensions  4.4.0
urllib3            1.26.14
Whoosh             2.7.4
xxhash             3.2.0
```
This list does not reflect what is **really** used by the library.

Some dependencies may be leftovers (`tqdm` is in the requirements but is not used in the *okgraph* code). Others may be installed only because they are dependencies of our dependencies, without ever being referenced by our code or by the parts of those dependencies that we actually use.

Querying the generated triples for the libraries shows which ones are actually used, including the standard library (which never appears in the list of installed libraries/packages).

In [11]:
list_query_results("""
prefix woc: <http://rdf.webofcode.org/woc/>

SELECT DISTINCT ?n_lib ?lib
WHERE {
    ?lib rdf:type woc:Library .
    ?lib woc:hasName ?n_lib .
}
""", 50);

Results:

 1 ('abc', 'http://rdf.webofcode.org/woc/library13')
 2 ('builtins', 'http://rdf.webofcode.org/woc/library9')
 3 ('docs', 'http://rdf.webofcode.org/woc/library1')
 4 ('gensim', 'http://rdf.webofcode.org/woc/library15')
 5 ('itertools', 'http://rdf.webofcode.org/woc/library23')
 6 ('logging', 'http://rdf.webofcode.org/woc/library19')
 7 ('math', 'http://rdf.webofcode.org/woc/library24')
 8 ('nt', 'http://rdf.webofcode.org/woc/library3')
 9 ('ntpath', 'http://rdf.webofcode.org/woc/library4')
10 ('numpy', 'http://rdf.webofcode.org/woc/library8')
11 ('okgraph', 'http://rdf.webofcode.org/woc/library10')
12 ('operator', 'http://rdf.webofcode.org/woc/library20')
13 ('os', 'http://rdf.webofcode.org/woc/library2')
14 ('pymagnitude', 'http://rdf.webofcode.org/woc/library11')
15 ('re', 'http://rdf.webofcode.org/woc/library16')
16 ('requests', 'http://rdf.webofcode.org/woc/library22')
17 ('shutil', 'http://rdf.webofcode.org/woc/library5')
18 ('sphinx', 'http://rdf.webofcode.org/woc/libra
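
The comparison between installed and actually used libraries can be sketched with plain set operations. This is only an illustration under stated assumptions: `compare_libraries` is a hypothetical helper, the two input lists are short excerpts of the `pip list` output and of the query results, and `sys.stdlib_module_names` is available only from Python 3.10.

```python
import sys
from typing import Dict, Iterable, List, Optional


def compare_libraries(
    installed: Iterable[str],
    used: Iterable[str],
    stdlib: Optional[Iterable[str]] = None,
) -> Dict[str, List[str]]:
    """Split library names into unused installs, stdlib usage and third-party usage."""
    if stdlib is None:
        # Available from Python 3.10; fall back to an empty set on older versions.
        stdlib = getattr(sys, "stdlib_module_names", frozenset())
    installed_set = {name.lower() for name in installed}
    used_set = set(used)
    used_lower = {name.lower() for name in used_set}
    return {
        "unused_installed": sorted(installed_set - used_lower),
        "stdlib_used": sorted(used_set & set(stdlib)),
        "third_party_used": sorted(n for n in used_set if n.lower() in installed_set),
    }


# Excerpts of the two lists shown above, not the full data.
installed = ["numpy", "pymagnitude", "requests", "tqdm"]
used = ["abc", "numpy", "os", "pymagnitude", "requests"]
report = compare_libraries(installed, used, stdlib={"abc", "os"})
print(report["unused_installed"])
```

Distribution names reported by `pip` can differ from import names (for example `scikit-learn` is imported as `sklearn`), so a real comparison would need a mapping between the two.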

In [16]:
out = show_subgraph_from_entity('http://rdf.webofcode.org/woc/importstatement24', ontology, HTML_W_PX, HTML_H_PX, max_deep=2)
from IPython.display import IFrame
IFrame(src=out, width=HTML_W_PX+10, height=HTML_H_PX+10)

In [15]:
list_query_results("""
prefix woc: <http://rdf.webofcode.org/woc/>

SELECT DISTINCT ?n_pkg
WHERE {
    ?pkg rdf:type woc:Package .
    ?pkg woc:hasFullyQualifiedName ?n_pkg .
}
""", 100);

Results:

  1 ('abc',)
  2 ('builtins',)
  3 ('docs',)
  4 ('docs.make_docs',)
  5 ('gensim.downloader',)
  6 ('gensim.models.base_any2vec',)
  7 ('gensim.models.phrases',)
  8 ('gensim.models.word2vec',)
  9 ('gensim.utils',)
 10 ('itertools',)
 11 ('logging',)
 12 ('logging.config',)
 13 ('math',)
 14 ('nt',)
 15 ('numpy',)
 16 ('numpy.core.numerictypes',)
 17 ('okgraph',)
 18 ('okgraph.core',)
 19 ('okgraph.embeddings',)
 20 ('okgraph.evaluation',)
 21 ('okgraph.indexing',)
 22 ('okgraph.preprocessing',)
 23 ('okgraph.sliding_windows',)
 24 ('okgraph.task',)
 25 ('okgraph.task.relation_expansion',)
 26 ('okgraph.task.relation_expansion.centroid',)
 27 ('okgraph.task.relation_expansion.centroid.centroid',)
 28 ('okgraph.task.relation_expansion.intersection',)
 29 ('okgraph.task.relation_expansion.intersection.intersection',)
 30 ('okgraph.task.relation_labeling',)
 31 ('okgraph.task.relation_labeling.intersection',)
 32 ('okgraph.task.relation_labeling.intersection.intersection',)
 3

In [9]:
list_query_results("""
prefix woc: <http://rdf.webofcode.org/woc/>

SELECT ?n_pkg
WHERE {
    ?pkg rdf:type woc:Package .
    ?pkg woc:hasFullyQualifiedName ?n_pkg .
}
""", 100);

#### OKgraph Packages

Let's see which `packages` are in `okgraph`:

In [10]:
list_query_results("""
prefix woc: <http://rdf.webofcode.org/woc/>

SELECT DISTINCT ?n_pkg
WHERE {
    ?lib woc:hasProject ?prj .
    ?lib woc:hasName "okgraph" .
    ?pkg woc:hasLibrary ?lib .
    ?pkg woc:hasFullyQualifiedName ?n_pkg .
}
""", 50);

In [11]:
list_query_results("""
prefix woc: <http://rdf.webofcode.org/woc/>

SELECT DISTINCT ?n_pkg ?docs
WHERE {
    ?lib woc:hasProject ?prj .
    ?lib woc:hasName "okgraph" .
    ?pkg woc:hasLibrary ?lib .
    ?pkg woc:hasFullyQualifiedName ?n_pkg .
    ?pkg woc:hasDocumentation ?docs .
}
""", 50);

#### Classes

Let's see which `classes` are in `okgraph`:

In [10]:
list_query_results("""
prefix woc: <http://rdf.webofcode.org/woc/>

SELECT DISTINCT ?n_cls
WHERE {
    ?lib woc:hasProject ?prj .
    ?lib woc:hasName "okgraph" .
    ?pkg woc:hasLibrary ?lib .
    ?cls rdf:type woc:Class .
    ?cls woc:hasPackage ?pkg .
    ?cls woc:hasFullyQualifiedName ?n_cls .
}
""", 50);

Results:

 1 ('okgraph.core.NotExistingCorpusException',)
 2 ('okgraph.core.OKgraph',)
 3 ('okgraph.embeddings.FileConverter',)
 4 ('okgraph.embeddings.MagnitudeWordEmbeddings',)
 5 ('okgraph.embeddings.NotExistingWordException',)
 6 ('okgraph.embeddings.WordEmbeddings',)
 7 ('okgraph.indexing.Indexing',)
 8 ('okgraph.sliding_windows.SlidingWindows',)


In [11]:
list_query_results("""
prefix woc: <http://rdf.webofcode.org/woc/>

SELECT DISTINCT ?n_cls ?docs
WHERE {
    ?lib woc:hasProject ?prj .
    ?lib woc:hasName "okgraph" .
    ?pkg woc:hasLibrary ?lib .
    ?cls woc:hasPackage ?pkg .
    ?cls woc:hasFullyQualifiedName ?n_cls .
    ?cls woc:hasDocumentation ?docs .
}
""", 50);

Results:

 1 (
        okgraph.core.NotExistingCorpusException
        An exception used to represent the error that occur when the specified corpus is not existing.,
   )
 2 (
        okgraph.core.OKgraph
        A class used to extract knowledge from unstructured text corpus. This class currently focuses on the following tasks: - **set expansion**: given one or a short set of words, continues this set with a list of other 'same-type' words (`co-hyponyms <https://en.wikipedia.org/wiki/Hyponymy_and_hypernymy#Co-hyponyms>`_); - **relation expansion**: given one or a short set of word pairs, continues this set with a list of pairs having the same implicit relation of the given pairs; - **set labeling**: given one or a short set of words, returns a list of short strings (labels) describing the given set (its type or `hyperonym <https://en.wikipedia.org/wiki/Hyponymy_and_hypernymy>`_); - **relation labeling**: given one or a short set of word pairs, returns a list of short strings (labels)

#### Class fields

Let's see some `fields` from one of those `classes`:

In [12]:
class_full_name = "okgraph.core.OKgraph"

In [13]:
list_query_results(f"""
prefix woc: <http://rdf.webofcode.org/woc/>

SELECT DISTINCT ?n_fld
WHERE {{
    ?cls woc:hasFullyQualifiedName "{class_full_name}" .
    ?fld woc:isFieldOf ?cls .
    ?fld woc:hasName ?n_fld .
}}
""", 50);

Results:

 1 ('corpus',)
 2 ('dictionary',)
 3 ('embeddings',)
 4 ('index',)
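
The query cells in this section interpolate Python values into the SPARQL text with f-strings, which is why the braces of the `WHERE` block are doubled. The sketch below wraps that pattern in a small function and escapes quotes in the interpolated name. `fields_query` is a hypothetical helper introduced only for illustration; the properties it uses are the ones already queried in this notebook.

```python
WOC_PREFIX = "prefix woc: <http://rdf.webofcode.org/woc/>"


def fields_query(class_full_name: str) -> str:
    """Build the SPARQL text that lists the field names of one class."""
    # Escape backslashes and double quotes so the name stays a valid string literal.
    literal = class_full_name.replace("\\", "\\\\").replace('"', '\\"')
    # Inside an f-string, '{{' and '}}' produce the literal braces SPARQL expects.
    return f"""
{WOC_PREFIX}

SELECT DISTINCT ?n_fld
WHERE {{
    ?cls woc:hasFullyQualifiedName "{literal}" .
    ?fld woc:isFieldOf ?cls .
    ?fld woc:hasName ?n_fld .
}}
"""


print(fields_query("okgraph.core.OKgraph"))
```

Building the text in one place keeps the escaping consistent when the same query is reused for several classes.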


In [14]:
list_query_results(f"""
prefix woc: <http://rdf.webofcode.org/woc/>

SELECT DISTINCT ?n_fld ?type
WHERE {{
    ?cls woc:hasFullyQualifiedName "{class_full_name}" .
    ?fld woc:isFieldOf ?cls .
    ?fld woc:hasName ?n_fld .
    ?fld woc:hasType ?type .
}}
""", 50);

Results:

 1 ('corpus', 'http://rdf.webofcode.org/woc/class9')
 2 ('dictionary', 'http://rdf.webofcode.org/woc/class9')
 3 ('embeddings', 'http://rdf.webofcode.org/woc/class40')
 4 ('index', 'http://rdf.webofcode.org/woc/class9')


In [15]:
list_query_results(f"""
prefix woc: <http://rdf.webofcode.org/woc/>

SELECT DISTINCT ?n_fld ?docs
WHERE {{
    ?cls woc:hasFullyQualifiedName "{class_full_name}" .
    ?fld woc:isFieldOf ?cls .
    ?fld woc:hasName ?n_fld .
    ?fld woc:hasDocumentation ?docs .
}}
""", 50);

Results:

 1 ('corpus', 'path of the corpus file.')
 2 ('dictionary', 'path of the corpus dictionary.')
 3 ('embeddings', 'words embeddings (vector model).')
 4 ('index', 'path of the indexed corpus files.')


#### Methods

Let's see some `methods` from one of those `classes`:

In [16]:
class_full_name = "okgraph.core.OKgraph"

In [17]:
list_query_results(f"""
prefix woc: <http://rdf.webofcode.org/woc/>

SELECT DISTINCT ?n_mth
WHERE {{
    ?cls woc:hasFullyQualifiedName "{class_full_name}" .
    ?mth woc:isMethodOf ?cls .
    ?mth woc:hasName ?n_mth .
}}
""", 50);

Results:

 1 ('_get_dictionary',)
 2 ('_get_embeddings',)
 3 ('_get_index',)
 4 ('relation_expansion',)
 5 ('relation_labeling',)
 6 ('set_expansion',)
 7 ('set_labeling',)


In [18]:
list_query_results(f"""
prefix woc: <http://rdf.webofcode.org/woc/>

SELECT DISTINCT ?n_mth ?docs
WHERE {{
    ?cls woc:hasFullyQualifiedName "{class_full_name}" .
    ?mth woc:isMethodOf ?cls .
    ?mth woc:hasName ?n_mth .
    ?mth woc:hasDocumentation ?docs .
}}
""", 50);

Results:

 1 ('_get_dictionary', 'Loads or generates the dictionary whether or not it is already existing.')
 2 ('_get_dictionary', 'Returns: the path of the loaded/generated dictionary file.')
 3 (
        _get_embeddings
        Loads or generates the embeddings whether or not it is already existing. FIXME: do not work with URLs, just with files.,
   )
 4 ('_get_embeddings', 'Returns: the path of the loaded/generated Magnitude model.')
 5 ('_get_index', 'Loads or generates the index whether or not it is already existing.')
 6 ('_get_index', 'Returns: the path of the loaded/generated index directory.')
 7 ('relation_expansion', 'Finds tuples with the same implicit relation of the seed tuples.')
 8 ('relation_expansion', 'Returns: the list of tuples related to the seed tuples.')
 9 ('relation_labeling', 'Finds labels describing the implicit relation between the seed tuples.')
10 ('relation_labeling', 'Returns: the list of labels related to the seed tuples.')
11 ('set_expansion', 'Finds
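
In the rows above, each method appears more than once because its description and its `Returns:` note are stored as separate documentation values. A minimal sketch of how such rows could be grouped per method is shown below; `group_docs` is a hypothetical helper, and the sample rows are copied from the output above.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_docs(rows: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Collect all documentation strings that share the same method name."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for name, doc in rows:
        grouped[name].append(doc)  # keeps the order in which the rows arrive
    return dict(grouped)


rows = [
    ("_get_index", "Loads or generates the index whether or not it is already existing."),
    ("_get_index", "Returns: the path of the loaded/generated index directory."),
    ("relation_expansion", "Finds tuples with the same implicit relation of the seed tuples."),
    ("relation_expansion", "Returns: the list of tuples related to the seed tuples."),
]
for name, docs in group_docs(rows).items():
    print(name, "->", " ".join(docs))
```

The order of the strings within a method follows the order of the input rows, so it is only as stable as the ordering of the query results.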

#### Parameters

In [19]:
class_full_name = "okgraph.core.OKgraph"
method_name = "relation_expansion"

In [20]:
list_query_results(f"""
prefix woc: <http://rdf.webofcode.org/woc/>

SELECT DISTINCT ?p_par ?n_par
WHERE {{
    ?cls woc:hasFullyQualifiedName "{class_full_name}" .
    ?mth woc:isMethodOf ?cls .
    ?mth woc:hasName "{method_name}" .
    ?par woc:hasParameterPosition ?p_par .
    ?par woc:isParameterOf ?mth .
    ?par woc:hasName ?n_par .
}}
""", 50);

Results:

 1 ('0', 'self')
 2 ('1', 'seed')
 3 ('2', 'k')
 4 ('3', 'algo')
 5 ('4', 'options')
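
The positions are displayed as strings (`'0'`, `'1'`, ...). Assuming they are also returned as strings, rebuilding the signature order is safer with a numeric sort, since a plain string sort would place `'10'` before `'2'`. The sketch below uses a hypothetical helper, `order_parameters`, on the rows shown above in shuffled order.

```python
from typing import Iterable, List, Tuple


def order_parameters(rows: Iterable[Tuple[str, str]]) -> List[str]:
    """Return parameter names ordered by their numeric position."""
    return [name for _, name in sorted(rows, key=lambda row: int(row[0]))]


# The (position, name) pairs from the output above, deliberately shuffled.
rows = [("3", "algo"), ("0", "self"), ("4", "options"), ("1", "seed"), ("2", "k")]
print(order_parameters(rows))
```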


In [21]:
list_query_results(f"""
prefix woc: <http://rdf.webofcode.org/woc/>

SELECT DISTINCT ?p_par ?n_par ?docs
WHERE {{
    ?cls woc:hasFullyQualifiedName "{class_full_name}" .
    ?mth woc:isMethodOf ?cls .
    ?mth woc:hasName "{method_name}" .
    ?par woc:hasParameterPosition ?p_par .
    ?par woc:isParameterOf ?mth .
    ?par woc:hasName ?n_par .
    ?par woc:hasDocumentation ?docs .
}}
""", 50);

Results:

 1 ('1', 'seed', 'List[Tuple[str, ...]]')
 2 ('2', 'k', 'int')
 3 ('3', 'algo', 'str')
 4 ('4', 'options', 'Dict')


In [22]:
list_query_results(f"""
prefix woc: <http://rdf.webofcode.org/woc/>

SELECT DISTINCT ?p_par ?n_par ?t_par
WHERE {{
    ?cls woc:hasFullyQualifiedName "{class_full_name}" .
    ?mth woc:isMethodOf ?cls .
    ?mth woc:hasName "{method_name}" .
    ?par woc:isParameterOf ?mth .
    ?par woc:hasParameterPosition ?p_par .
    ?par woc:hasName ?n_par .
    ?par woc:hasType ?t_par .
}}
""", 50);

Results:

 1 ('0', 'self', 'http://rdf.webofcode.org/woc/class51')
 2 ('1', 'seed', 'http://rdf.webofcode.org/woc/parameterizedtype67')
 3 ('2', 'k', 'http://rdf.webofcode.org/woc/class5')
 4 ('3', 'algo', 'http://rdf.webofcode.org/woc/class9')
 5 ('4', 'options', 'http://rdf.webofcode.org/woc/class1')


In [23]:
list_query_results(f"""
prefix woc: <http://rdf.webofcode.org/woc/>

SELECT DISTINCT ?p
WHERE {{
    ?cls woc:hasFullyQualifiedName "{class_full_name}" .
    ?mth woc:isMethodOf ?cls .
    ?mth woc:hasName "{method_name}" .
    ?o ?p ?mth .
}}
""", 50);

Results:

 1 ('http://rdf.webofcode.org/woc/isParameterOf',)
 2 ('http://rdf.webofcode.org/woc/isReturnStatementOf',)


In [24]:
list_query_results(f"""
prefix woc: <http://rdf.webofcode.org/woc/>

SELECT DISTINCT ?st ?ot
WHERE {{
    ?s woc:hasBody ?o .
    ?s rdf:type ?st .
    ?o rdf:type ?ot .
}}
""", 50);

Results:

 1 ('http://rdf.webofcode.org/woc/CatchStatement', 'http://www.w3.org/2002/07/owl#NamedIndividual')
 2 ('http://rdf.webofcode.org/woc/CatchStatement', 'http://rdf.webofcode.org/woc/BlockStatement')
 3 (
        http://rdf.webofcode.org/woc/DeclarationStatement
        http://www.w3.org/2002/07/owl#NamedIndividual,
   )
 4 (
        http://rdf.webofcode.org/woc/DeclarationStatement
        http://rdf.webofcode.org/woc/BlockStatement,
   )
 5 (
        http://rdf.webofcode.org/woc/ForEachStatement
        http://www.w3.org/2002/07/owl#NamedIndividual,
   )
 6 ('http://rdf.webofcode.org/woc/ForEachStatement', 'http://rdf.webofcode.org/woc/BlockStatement')
 7 ('http://rdf.webofcode.org/woc/TryStatement', 'http://www.w3.org/2002/07/owl#NamedIndividual')
 8 ('http://rdf.webofcode.org/woc/TryStatement', 'http://rdf.webofcode.org/woc/BlockStatement')
 9 ('http://rdf.webofcode.org/woc/WhileStatement', 'http://www.w3.org/2002/07/owl#NamedIndividual')
10 ('http://rdf.webofcode.org/woc/W
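
In the rows shown, every statement type is listed twice: once paired with `owl:NamedIndividual` and once with the `woc` class of the body. A small post-processing step, sketched below with a hypothetical helper named `drop_named_individual`, keeps only the `woc` pairs. The sample rows are copied from the output above.

```python
from typing import Iterable, List, Tuple

OWL_NAMED_INDIVIDUAL = "http://www.w3.org/2002/07/owl#NamedIndividual"


def drop_named_individual(rows: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Remove rows in which either type is the generic owl:NamedIndividual."""
    return [row for row in rows if OWL_NAMED_INDIVIDUAL not in row]


rows = [
    ("http://rdf.webofcode.org/woc/CatchStatement", OWL_NAMED_INDIVIDUAL),
    ("http://rdf.webofcode.org/woc/CatchStatement", "http://rdf.webofcode.org/woc/BlockStatement"),
    ("http://rdf.webofcode.org/woc/TryStatement", OWL_NAMED_INDIVIDUAL),
    ("http://rdf.webofcode.org/woc/TryStatement", "http://rdf.webofcode.org/woc/BlockStatement"),
]
for subject_type, body_type in drop_named_individual(rows):
    print(subject_type, "->", body_type)
```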