# Generate the AudioSet ontology

Using [OwlReady2](https://owlready2.readthedocs.io/en/latest/index.html) package. Ontology documentation published at https://maastrichtu-ids.github.io/audioset-owl

First define the Notebook parameters for [papermill](https://papermill.readthedocs.io/en/latest/usage-parameterize.html)

In [1]:
# Papermill parameters. Do not delete this cell.
output_format = 'rdfxml'
audioset_ontology_uri = 'https://w3id.org/audioset'

Import the library and define the local `ontologies` folder. If an URL is given, first searches for a local copy of the OWL file and, if not found, tries to download it from the Internet.

In [2]:
from owlready2 import *
import types

if output_format == 'ntriples':
    output_extension = 'nt'
else:
    output_extension = 'rdf'

global audioset_onto 
global audioset_curated_hash
onto_path.append("/notebooks/ontologies")



### Create and load ontologies

Create the AudioSet ontology and load the Pizza ontology from the Internet (for example purpose)

In [3]:
audioset_onto = get_ontology(audioset_ontology_uri)

pizza_onto = get_ontology("http://www.lesfleursdunormal.fr/static/_downloads/pizza_onto.owl").load()

### Create AudioSet OWL ontology from the JSON

* Get [AudioSet ontology JSON from GitHub](https://github.com/audioset/ontology)
    * [AudioSet Top classes](https://research.google.com/audioset/ontology/index.html): Human sounds, Animal, Music, Sounds of things, Natural sounds, source-ambiguous things, "Channel, environment and background"
* Add classes respecting hierarchy provided in the JSON through the `child_ids` field

See [OwlReady2 documentation](https://owlready2.readthedocs.io/en/latest/index.html) for:
* [Dynamic Classes](https://owlready2.readthedocs.io/en/latest/class.html#creating-classes-dynamically)
* [Add annotations to a Class](https://owlready2.readthedocs.io/en/latest/annotations.html?highlight=comment#adding-an-annotation): `comment`, `isDefinedBy`, `label`, `seeAlso`, `backwardCompatibleWith`, `deprecated`, `incompatibleWith`, `priorVersion`, `versionInfo`
* [Properties](https://owlready2.readthedocs.io/en/latest/properties.html)

Note: classes with multiple parents are properly defined, see `ChirpTweet` or the graph visualization as example

In [4]:
import requests, json
audioset_json = json.loads(requests.get("https://raw.githubusercontent.com/audioset/ontology/master/ontology.json").text)

In [5]:
def generate_owl_class(class_json, parent_class):
    """Recursively generates OWL classes and instances, original hierarchy respected using child_ids."""
    with audioset_onto:
        NewClass = types.new_class(class_json['uri_id'], (parent_class,))
        NewClass.label = locstr(class_json['name'], lang = "en")
        NewClass.comment = locstr(class_json['description'], lang = "en")
        NewClass.comment = class_json['id']
        if class_json['citation_uri']:
            NewClass.comment = class_json['citation_uri']
        if class_json['positive_examples']:
            # Generate instances
            for youtube_example in class_json['positive_examples']:
                NewClass(comment = 'https://' + youtube_example)
    for child in class_json['child_ids']:
        generate_owl_class(audioset_curated_hash[child], NewClass)

In [6]:
c = 0
# Create a hash using google audioset ID as key
audioset_curated_hash = {}
for row in audioset_json:
    # Generate the ID that will be used for the ontology URI
    uri_id = row['name'].replace(',', '').replace(')', '').replace('(', '').replace('.', '').replace("'", '').replace(";", '')
    uri_id = uri_id.title().replace(' ', '').replace('-', '')
    audioset_curated_hash[row['id']] = row
    audioset_curated_hash[row['id']]['uri_id'] = uri_id
    c += 1
print('Number of classes in the original AudioSet JSON: ' + str(c))
    
# Recursively generates classes starting from AudioSet top classes
audioset_top_classes = ['/m/0dgw9r', '/m/0jbk', '/m/04rlf', '/t/dd00041', '/m/059j3w', '/t/dd00098', '/t/dd00123']
for top_class in audioset_top_classes:
    generate_owl_class(audioset_curated_hash[top_class], Thing)

Number of classes in the original AudioSet JSON: 632


### Example to generate properties with domain and ranges

In [7]:
# with audioset_onto:
#     class Accent(Thing):
#         pass
#     class has_accent(ObjectProperty):
#         domain    = [HumanVoice]
#         range     = [Accent]
#     class description(ObjectProperty):
#         range     = [str]

### Add metadata to the ontology

In [8]:
audioset_onto.metadata.comment.append("OWL Ontology for the AudioSet ontology from Google defined in JSON.")

### Save the ontology file

Ontology files saved in the `ontologies` folder. 

2 formats available, defined in the papermill parameters (at the start of the notebook or in the `papermill-config.json` file):
* `rdfxml`
* `ntriples`

In [9]:
audioset_onto.save(file = "ontologies/audioset." + output_extension, format = output_format)

# Explore the ontology

**With OwlReady2**, e.g. list an ontology classes and properties.

In [10]:
# Get a class IRI:
print(audioset_onto.HumanVoice.iri)
# List all 682 classes:
#print(list(audioset_onto.classes()))
# List object properties:
print(list(audioset_onto.object_properties()))
# List a class instances:
for i in audioset_onto.InsideSmallRoom.instances(): print(i)

https://w3id.org/audioset#HumanVoice
[]
audioset.insidesmallroom1
audioset.insidesmallroom2
audioset.insidesmallroom3
audioset.insidesmallroom4
audioset.insidesmallroom5
audioset.insidesmallroom6
audioset.insidesmallroom7
audioset.insidesmallroom8


### Use Ontospy to analyze the ontology

Load the ontology file with `ontospy`, then:
* print top classes and the class tree
* print instances of a class

In [11]:
import ontospy
audioset_spy = ontospy.Ontospy("ontologies/audioset.rdf", verbose=True)

[32mReading: <ontologies/audioset.rdf>[0m
.. trying rdf serialization: <xml>[0m
[1m..... success![0m
[37m----------
Loaded 12223 triples.
----------[0m
[32mRDF sources loaded successfully: 1 of 1.[0m
[37m..... 'ontologies/audioset.rdf'[0m
[37m----------[0m


[32mScanning entities...[0m
[2m----------[0m
[2mOntologies.........: 1[0m
[2mClasses............: 632[0m
[2mProperties.........: 0[0m
[2m..annotation.......: 0[0m
[2m..datatype.........: 0[0m
[2m..object...........: 0[0m
[2mConcepts (SKOS)....: 0[0m
[2mShapes (SHACL).....: 0[0m
[2m----------[0m


In [12]:
# audioset_spy.printClassTree()
audioset_spy.toplayer_classes

[<Class *https://w3id.org/audioset#Animal*>,
 <Class *https://w3id.org/audioset#ChannelEnvironmentAndBackground*>,
 <Class *https://w3id.org/audioset#HumanSounds*>,
 <Class *https://w3id.org/audioset#Music*>,
 <Class *https://w3id.org/audioset#NaturalSounds*>,
 <Class *https://w3id.org/audioset#SoundsOfThings*>,
 <Class *https://w3id.org/audioset#SourceAmbiguousSounds*>]

In [13]:
# Print instances of Sigh class
audioset_spy.get_class('Sigh')[0]
for instance in audioset_spy.get_class('Sigh')[0].instances:
        print(instance.uri, instance.qname)
        instance.printTriples()

https://w3id.org/audioset#sigh1 audioset:sigh1
[31mhttps://w3id.org/audioset#sigh1[0m[0m
[30m=> http://www.w3.org/2000/01/rdf-schema#comment[0m
[2m.... https://youtu.be/XOphuM8ZUhM?start=560&end=570[39m[0m
[30m=> http://www.w3.org/1999/02/22-rdf-syntax-ns#type[0m
[2m.... https://w3id.org/audioset#Sigh[39m[0m
[30m=> http://www.w3.org/1999/02/22-rdf-syntax-ns#type[0m
[2m.... http://www.w3.org/2002/07/owl#NamedIndividual[39m[0m

https://w3id.org/audioset#sigh2 audioset:sigh2
[31mhttps://w3id.org/audioset#sigh2[0m[0m
[30m=> http://www.w3.org/1999/02/22-rdf-syntax-ns#type[0m
[2m.... https://w3id.org/audioset#Sigh[39m[0m
[30m=> http://www.w3.org/2000/01/rdf-schema#comment[0m
[2m.... https://youtu.be/giY25pWyJxM?start=140&end=150[39m[0m
[30m=> http://www.w3.org/1999/02/22-rdf-syntax-ns#type[0m
[2m.... http://www.w3.org/2002/07/owl#NamedIndividual[39m[0m



### Visualize with Ontospy docs

Experimental, it is recommended to generate the documentation from the commandline (cf. `README.md` file) 

In [14]:
# from ontospy.ontodocs.viz.viz_html_single import *

# v = HTMLVisualizer(audioset_spy) # => instantiate the visualization object
# v.build("/notebooks/docs") # => render visualization. You can pass an 'output_path' parameter too
# v.preview() # => open in browser

### Visualize with WebVOWL

Use the URL to the ontology file:

[http://www.visualdataweb.de/webvowl/#iri=https://raw.githubusercontent.com/MaastrichtU-IDS/audioset-owl/master/ontologies/audioset.rdf](http://www.visualdataweb.de/webvowl/#iri=https://raw.githubusercontent.com/MaastrichtU-IDS/audioset-owl/master/ontologies/audioset.rdf)

### Load the ontology RDF with `rdflib`

Use `rdflib` and `networkx` to load the data in the graph and display it (not working with the ontology size, to be improved.

Visualize as graph using networkx

In [6]:
import rdflib
from rdflib import Graph, ConjunctiveGraph, plugin, Literal, RDF, URIRef, Namespace
from rdflib.serializer import Serializer
from rdflib.namespace import RDFS, XSD, DC, DCTERMS, VOID, OWL, SKOS
# from rdflib.plugins.sparql.parser import Query, UpdateUnit
# from rdflib.plugins.sparql.processor import translateQuery
# from rdflib.extras.external_graph_libs import rdflib_to_networkx_multidigraph
# import networkx as nx
# import matplotlib.pyplot as plt

g = rdflib.Graph()
result = g.parse('ontologies/audioset.rdf', format='xml')
for owl_class in g.subjects(RDF.type, OWL.Class):
    print(owl_class)
# G = rdflib_to_networkx_multidigraph(result)

# # # Plot Networkx instance of RDF Graph
# pos = nx.spring_layout(G, scale=3)
# edge_labels = nx.get_edge_attributes(G, 'r')
# nx.draw_networkx_edge_labels(G, pos, labels=edge_labels)
# nx.draw(G, with_labels=True)

ps://w3id.org/audioset#Harp
https://w3id.org/audioset#SonicBoom
https://w3id.org/audioset#Skidding
https://w3id.org/audioset#Digestive
https://w3id.org/audioset#Oboe
https://w3id.org/audioset#WindChime
https://w3id.org/audioset#BabyCryInfantCry
https://w3id.org/audioset#SoulMusic
https://w3id.org/audioset#Scrape
https://w3id.org/audioset#PowerTool
https://w3id.org/audioset#SlapSmack
https://w3id.org/audioset#AcousticGuitar
https://w3id.org/audioset#AmbulanceSiren
https://w3id.org/audioset#Mantra
https://w3id.org/audioset#Quack
https://w3id.org/audioset#AircraftEngine
https://w3id.org/audioset#ChuckleChortle
https://w3id.org/audioset#Boom
https://w3id.org/audioset#Yawn
https://w3id.org/audioset#HumanGroupActions
https://w3id.org/audioset#Animal
https://w3id.org/audioset#Sizzle
https://w3id.org/audioset#SocaMusic
https://w3id.org/audioset#Glass
https://w3id.org/audioset#Pour
https://w3id.org/audioset#Biting
https://w3id.org/audioset#WolfWhistling
https://w3id.org/audioset#Techno
https://