# Accessing IRIS NLP from Python

In this notebook, we'll leverage the NLP capabilities of InterSystems IRIS, also known as [iKnow](https://docs.intersystems.com/irislatest/csp/docbook/DocBook.UI.Page.cls?KEY=GIKNOW). The iKnow NLP engine is also available as an [open-source Python library `iknowpy`](https://github.com/intersystems/iknow), which offers the same linguistic analysis capabilities through a standalone engine for use in Python scripts and applications. However, the version embedded in IRIS leverages the platform's database capabilities to enable cross-document analyses, storing the engine output in a "domain" and offering SQL projections of the raw contents as well as rich REST and ObjectScript APIs on top. In this notebook, we'll show how to access the latter from Python.

We'll first connect to IRIS using the [Native API for Python](https://docs.intersystems.com/irislatest/csp/docbook/DocBook.UI.Page.cls?KEY=BPYNAT_about). If you haven't already, install the DB-API driver from the wheel posted [here](https://intersystems-community.github.io/iris-driver-distribution/) to get access to the `iris` module. After establishing a connection, we can use the [`%SYSTEM.iKnow`](https://docs.intersystems.com/irislatest/csp/documatic/%25CSP.Documatic.cls?LIBRARY=%25SYS&CLASSNAME=%25SYSTEM.iKnow) API to print the domains already created in this namespace. The list may be empty if you haven't used iKnow before.

In [43]:
import iris

# Open a connection to the server
args = {
        'hostname':'localhost', 
        'port': 1972,
        'namespace':'USER', 
        'username':'_SYSTEM', 
        'password':'SYS'
}
connection = iris.connect(**args)

# Create an IRIS object
irispy = iris.createIRIS(connection)

# Invoke the ListDomains() method
irispy.classMethodVoid('%SYSTEM.iKnow','ListDomains')


Domains for Namespace USER:
 Domain ID : Domain name                              : # of sources : version
 --------- : ---------------------------------------- : ------------ : -------
         1 : HelloWorld                               :            3 :       5
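
The connection settings above are hard-coded, including the default password. As a minimal sketch, assuming you prefer to keep them outside the notebook, the same `args` dictionary could be built from environment variables. The variable names (`IRIS_HOSTNAME`, `IRIS_PORT`, and so on) and the helper `connection_args_from_env()` are hypothetical and not part of the `iris` module; the fallback values are the ones used in the cell above.

```python
import os

def connection_args_from_env(env=None):
    """Build the keyword arguments for iris.connect() from environment variables.

    The IRIS_* variable names are an assumption made for this sketch.
    """
    env = os.environ if env is None else env
    return {
        'hostname': env.get('IRIS_HOSTNAME', 'localhost'),
        'port': int(env.get('IRIS_PORT', '1972')),
        'namespace': env.get('IRIS_NAMESPACE', 'USER'),
        'username': env.get('IRIS_USERNAME', '_SYSTEM'),
        'password': env.get('IRIS_PASSWORD', 'SYS'),
    }

args = connection_args_from_env()
# connection = iris.connect(**args)   # same call as in the cell above
```

Passing a plain dictionary instead of `os.environ` makes the helper easy to try out without touching the real environment.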


## Creating an iKnow Domain

iKnow Domains are repositories to which you can add "sources". Source text is indexed by the iKnow engine and the resulting concepts and relationships are then stored in the domain for you to access through the various APIs. The easiest way to create a domain is using the [iKnow Architect](https://community.intersystems.com/post/creating-domain-iknow-domain-architect), a GUI available from the System Management Portal. This will create a Domain Definition, which registers where data needs to be loaded from so you can easily rebuild it from scratch. 

In this demo, we'll use a lower-level, fully programmatic approach and interact with the [`%iKnow.Domain`](https://docs.intersystems.com/irislatest/csp/documatic/%25CSP.Documatic.cls?LIBRARY=%25SYS&CLASSNAME=%25iKnow.Domain) class directly. Note that you may also programmatically create a Domain Definition by instantiating the [`%iKnow.Model.domain`](https://docs.intersystems.com/irislatest/csp/documatic/%25CSP.Documatic.cls?LIBRARY=%25SYS&CLASSNAME=%25iKnow.Model.domain) class and then using its [`%SaveToClass()`](https://docs.intersystems.com/irislatest/csp/documatic/%25CSP.Documatic.cls?LIBRARY=%25SYS&CLASSNAME=%25iKnow.Model.domain#%25SaveToClass) method if you'd like to take advantage of the rebuildability of Domain Definitions.

In [22]:
# First drop any prior domain
if irispy.classMethodValue('%iKnow.Domain', 'Exists', 'HelloWorld') > 0:
    print('Domain already exists, dropping first...')
    irispy.classMethodVoid('%iKnow.Domain','Delete','HelloWorld')
    
# Now create a new domain. 
domain = irispy.classMethodValue('%iKnow.Domain','Create','HelloWorld')
domain_id = domain.get("Id")
print('Domain created with ID: '+str(domain_id))

# Note that we could also use the %New()/%Save() methods, but that's slightly more verbose from Python

Domain already exists, dropping first...
Domain created with ID: 1
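
If you find yourself resetting domains often, the drop-and-create steps above could be wrapped in a small helper. This is only a sketch: `recreate_domain()` is a hypothetical name, and it uses nothing beyond the `Exists`, `Delete` and `Create` calls already shown in the cell above. The IRIS object is passed in as a parameter so the function does not depend on a global.

```python
def recreate_domain(irispy, name):
    """Drop the named iKnow domain if it exists, create it again, and return its ID."""
    if irispy.classMethodValue('%iKnow.Domain', 'Exists', name) > 0:
        # Deleting a domain also discards everything indexed into it
        irispy.classMethodVoid('%iKnow.Domain', 'Delete', name)
    domain = irispy.classMethodValue('%iKnow.Domain', 'Create', name)
    return domain.get('Id')

# Usage, assuming the irispy object created earlier:
# domain_id = recreate_domain(irispy, 'HelloWorld')
```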


To add data to the domain, the easiest way to get started is using the [`%SYSTEM.iKnow`](https://docs.intersystems.com/irislatest/csp/documatic/%25CSP.Documatic.cls?LIBRARY=%25SYS&CLASSNAME=%25SYSTEM.iKnow) API again, whose indexing methods take the domain name as their first argument.

In [24]:
# Add some text straight from a string argument using IndexString()
# Note that the second argument to that method (and therefore the fourth to classMethodVoid()) is a unique "external"
# identifier we can use to look up our source afterwards. We stick to dull identifiers for now :-)
irispy.classMethodVoid('%SYSTEM.iKnow', 'IndexString', 'HelloWorld', 'Test1',
                           'Here\'s a first example of a piece of text indexed by the iKnow engine.')
irispy.classMethodVoid('%SYSTEM.iKnow', 'IndexString', 'HelloWorld', 'Test2',
                           'Let\'s add more text for the iKnow engine to process. Of course there can be multiple sentences.')

# Next we'll create a separate configuration that auto-detects whether text is English or French
languages = iris.IRISList()
languages.add('fr')
languages.add('en')
irispy.classMethodVoid('%iKnow.Configuration', 'Create', 'EnglishOrFrench', 1, languages)
irispy.classMethodVoid('%SYSTEM.iKnow', 'IndexString', 'HelloWorld', 'Test3',
                           'On peut également utiliser plusieurs langues dans la même domaine.\n'+
                           'In fact we can even use different languages in the same text!', 'EnglishOrFrench')

# If you have some text files accessible from the IRIS server, you can use IndexFile() or IndexDirectory()
#irispy.classMethodVoid('%SYSTEM.iKnow', 'IndexDirectory', 'HelloWorld', '/data/nlp/input')

# Similarly, there's a simple method to index data accessible through SQL.
# This method takes additional arguments to identify columns for building the external ID and of course the text column.
#irispy.classMethodVoid('%SYSTEM.iKnow', 'IndexTable', 'HelloWorld', 'Aviation.Event', 
#    'EventID', 'LocationCountry', 'NarrativeSummary')

# And this utility method also prints the number of sources in our domain
irispy.classMethodVoid('%SYSTEM.iKnow','ListDomains')


Domains for Namespace USER:
 Domain ID : Domain name                              : # of sources : version
 --------- : ---------------------------------------- : ------------ : -------
         1 : HelloWorld                               :            3 :       5
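
When there are more than a handful of strings to add, the repeated `IndexString` calls above could be driven from a dictionary that maps external IDs to text. The helper below is a sketch under that assumption: `index_strings()` is a hypothetical name, and it issues exactly the same `classMethodVoid()` call as the cell above, optionally passing a configuration name as the last argument.

```python
def index_strings(irispy, domain_name, texts, config=None):
    """Index every (external_id, text) pair of `texts` into the named domain.

    Returns the number of strings submitted for indexing.
    """
    for external_id, text in texts.items():
        call_args = ['%SYSTEM.iKnow', 'IndexString', domain_name, external_id, text]
        if config is not None:
            # Optional configuration name, e.g. 'EnglishOrFrench' from the cell above
            call_args.append(config)
        irispy.classMethodVoid(*call_args)
    return len(texts)

# Usage, assuming the irispy object and HelloWorld domain created earlier:
# index_strings(irispy, 'HelloWorld', {'Test4': 'Yet another sentence to index.'})
```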


## Querying the domain

The iKnow infrastructure in IRIS includes a comprehensive query API, implementing a few common types of "questions" you might want to ask of the processed data. Our HelloWorld demo domain doesn't hold much data to query right now (unless you also indexed a directory or table with interesting contents), but let's take a closer look anyway.

The iKnow query APIs return data by reference in an array with a `$list` for each result row. We'll first create a utility method for browsing these results. Note that there is also a SQL version of the API that is somewhat more straightforward to consume from Python, but when we start working with filters later on, programmatic access through the Native API will be easier again.
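
The utility in the next cell derives its column names from a class parameter named after the query method with an `RT` suffix, whose value it splits on `,` and `:`. The standalone sketch below shows just that parsing step on a sample string. The sample value is an assumption modeled on the columns that appear in the output further down, and `parse_result_spec()` is a hypothetical helper, not part of any IRIS API.

```python
def parse_result_spec(spec):
    """Split a 'name:type,name:type' string into parallel lists of names and types."""
    pairs = [item.split(':', 1) for item in spec.split(',')]
    names = [p[0] for p in pairs]
    # Tolerate entries without a type by recording None for them
    types = [p[1] if len(p) > 1 else None for p in pairs]
    return names, types

# Sample spec: an assumption for illustration, not read from a live server
sample = 'entUniId:%Integer,entity:%String,frequency:%Integer,spread:%Integer'
names, types = parse_result_spec(sample)
print(names)  # ['entUniId', 'entity', 'frequency', 'spread']
```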

In [78]:
def iknow_query(api : str, method : str, domain_id : int, *args, outGlo : str = '^result'):
    
    # figure out result columns to build a nice return object
    proxy = irispy.classMethodValue('%Dictionary.ParameterDefinition', '%OpenId', api+'||'+method+'RT')
    raw = proxy.get('Default').split(',')
    return_cols = [x.split(':')[0] for x in raw]
    return_types = [x.split(':')[1] for x in raw]  # not used below, kept for reference
    
    # call actual API method
    irispy.classMethodVoid(api, method, outGlo, domain_id, *args)
    
    # now iterate through output global
    result = []
    direction = 0
    subscript = 0
    while True:
        subscript = irispy.nextSubscript(direction, outGlo, subscript)
        if subscript is None:
            break
        raw_row = iris.IRISList(irispy.getBytes(outGlo,subscript))
        row = {}
        for i, col in enumerate(return_cols):
            row[col] = raw_row.get(i+1)
        result.append(row)
        
    # clean up results
    irispy.kill(outGlo)
    
    return result

print('Top concepts:')
print(iknow_query('%iKnow.Queries.EntityAPI', 'GetTop', domain_id))

print()
print('Top entities similar to "text":')
print(iknow_query('%iKnow.Queries.EntityAPI', 'GetSimilar', domain_id, 'text'))

Top concepts:
[{'entUniId': 6, 'entity': 'text', 'frequency': 2, 'spread': 2}, {'entUniId': 8, 'entity': 'iknow engine', 'frequency': 2, 'spread': 2}, {'entUniId': 3, 'entity': 'first example', 'frequency': 1, 'spread': 1}, {'entUniId': 5, 'entity': 'piece', 'frequency': 1, 'spread': 1}, {'entUniId': 10, 'entity': 'more text', 'frequency': 1, 'spread': 1}, {'entUniId': 14, 'entity': 'multiple sentences', 'frequency': 1, 'spread': 1}, {'entUniId': 17, 'entity': 'plusieurs langues', 'frequency': 1, 'spread': 1}, {'entUniId': 20, 'entity': 'domaine', 'frequency': 1, 'spread': 1}, {'entUniId': 23, 'entity': 'different languages', 'frequency': 1, 'spread': 1}]

Top entities similar to "text":
[{'entUniId': 6, 'entity': 'text', 'frequency': 2, 'spread': 2}, {'entUniId': 10, 'entity': 'more text', 'frequency': 1, 'spread': 1}]
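
Printing the raw list of dictionaries gets hard to read as results grow. As a small sketch, the rows returned by `iknow_query()` could be rendered as an aligned plain-text table using only the standard library. `format_rows()` is a hypothetical helper, and the sample rows are copied from the "similar to text" output above so the snippet runs without a server.

```python
def format_rows(rows):
    """Render a list of dicts that share the same keys as an aligned text table."""
    if not rows:
        return '(no rows)'
    cols = list(rows[0].keys())
    # Each column is as wide as its longest value or its header, whichever is longer
    widths = {c: max(len(str(c)), *(len(str(r[c])) for r in rows)) for c in cols}
    header = ' | '.join(str(c).ljust(widths[c]) for c in cols)
    rule = '-+-'.join('-' * widths[c] for c in cols)
    body = [' | '.join(str(r[c]).ljust(widths[c]) for c in cols) for r in rows]
    return '\n'.join([header, rule] + body)

# Sample rows copied from the output above
sample_rows = [
    {'entUniId': 6, 'entity': 'text', 'frequency': 2, 'spread': 2},
    {'entUniId': 10, 'entity': 'more text', 'frequency': 1, 'spread': 1},
]
print(format_rows(sample_rows))
```

With a live connection, the same helper could be applied directly to the result of a call such as `iknow_query('%iKnow.Queries.EntityAPI', 'GetTop', domain_id)`.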
