# Jupyter Notebooks and CONSTELLATION

This notebook is an introduction to using Jupyter notebooks with CONSTELLATION. In Part 1, we'll learn how to send data to CONSTELLATION to create and modify graphs. In Part 2, we'll learn how to retrieve graph data from CONSTELLATION. Part 3 is about getting and setting information about the graph itself. Part 4 is a quick look at types, Part 5 shows how to call plugins, and Part 6 covers data access plugins. After that, we'll have some fun (occasionally useful) with screenshots, and finish by introducing some advanced graph usage with NetworkX.

To run through the notebook, click on the triangular 'run cell' button in the toolbar to execute the current cell and move to the next cell.

Let's start by seeing if we can talk to CONSTELLATION. Make sure that CONSTELLATION is running and that you've started the external scripting server (this has already been done for you if you started the Jupyter notebook server from CONSTELLATION). The external scripting server makes a REST HTTP API available to any HTTP client.

The Python ``import`` statement looks for a library with the given name. Click the 'run cell' button to execute it.

(All of the libraries used here are included in the Anaconda Python distribution.)

In [None]:
import io
import os
import pandas as pd
import PIL.Image, PIL.ImageDraw, PIL.ImageFilter, PIL.ImageFont

# Also import some of the notebook display methods so we can display nice things.
#
from IPython.display import display, HTML, Image

# This is a convenient Python interface to the REST API.
#
import constellation_client

In [None]:
cc = constellation_client.Constellation()

When the external scripting server started, it automatically downloaded ``constellation_client.py`` into your ``.ipython`` directory. It's also important that you create a client instance **after** you start the REST server, because the server creates a secret that the client needs in order to communicate with the server.

When the import succeeds, we then create a Python object that communicates with CONSTELLATION on our behalf. CONSTELLATION provides communication with the outside world using HTTP (as if it were a web server) and JSON (a common data format). The ``constellation_client`` library hides these details so you can just use Python.

## Part 1: Sending Data to CONSTELLATION

Typically you'll have some data in a CSV file. We'll use a Python trick (in this case, ``io.StringIO``) to make it look like we're reading a separate CSV file into a dataframe. (If your data is in an Excel spreadsheet, you can use ``pd.read_excel()`` to read it directly, rather than saving it to a CSV file first.)

In [None]:
csv_data = '''
from_address,from_country,to_address,to_country,dtg
abc@example1.com,Brazil,def@example2.com,India,2017-01-01 12:34:56
abc@example1.com,Brazil,ghi@example3.com,Zambia,2017-01-01 14:30:00
jkl@example4.com,India
'''.strip()
df = pd.read_csv(io.StringIO(csv_data))
df

Putting our data in a dataframe is a good idea; not only can we easily manipulate it, but it's easy to send a dataframe to CONSTELLATION, as long as we tell CONSTELLATION what data belongs where.

A dataframe is a table of data, but CONSTELLATION deals with graphs, so we need to reconcile a data table and a graph. It shouldn't be too hard to notice (especially given the column names) that each row of data in the dataframe represents a transaction: the source node has the "from" attributes, the destination node has the "to" attributes, and the transaction has the ``dtg`` attribute. The first row therefore represents a connection from ``abc@example1.com`` with country ``Brazil`` to ``def@example2.com`` with country ``India``. The last row represents a node that is not connected to any other node.

Let's massage the data to something that CONSTELLATION likes. All of the addresses are email addresses, which CONSTELLATION should be clever enough to recognise, but we'd prefer to be explicit, so let's add the types.

In [None]:
df.from_address = df.from_address + '<Email>'
df.to_address = df.to_address + '<Email>'
df

Dataframes are clever enough to work on a column at a time; we don't have to do our own loops.

Let's check the data types.

In [None]:
df.dtypes

All of the columns are of type ``object``, which in this case means "string". However, CONSTELLATION expects datetimes to actually be of ``datetime`` type; if we try to upload datetimes as strings, CONSTELLATION won't recognise them as datetimes.

Not to worry: pandas can fix that for us.

In [None]:
df.dtg = pd.to_datetime(df.dtg)
df

The datetimes look exactly the same, but notice that the ``NaN`` (Not a Number) value in the last row has become ``NaT`` (Not a Time). If we look at the data types again, we can see that the ``dtg`` values are now datetimes, not objects.

In [None]:
df.dtypes

The ``datetime64[ns]`` type means that each datetime is stored as a 64-bit integer counting nanoseconds since the Unix epoch (1970-01-01 00:00:00). Not that we care that much about the storage: the important thing is that ``dtg`` is now a datetime column.
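To make the storage concrete, this standard-library sketch computes the nanosecond count for the first ``dtg`` value by hand, treating the naive datetime as UTC just as pandas does. In pandas, ``pd.Timestamp('2017-01-01 12:34:56').value`` should give the same integer.

```python
from datetime import datetime, timezone

# The first dtg value from the CSV, interpreted as UTC (pandas counts naive
# datetimes from the epoch in the same way).
dtg = datetime(2017, 1, 1, 12, 34, 56, tzinfo=timezone.utc)
epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Integer arithmetic avoids floating-point rounding at nanosecond precision.
delta = dtg - epoch
ns = (delta.days * 86_400 + delta.seconds) * 10**9 + delta.microseconds * 1_000
print(ns)  # 1483274096000000000
```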

CONSTELLATION recognises source, destination and transaction attributes by the prefixes of their names. It won't be too surprising to find out that the prefixes are ``source``, ``destination``, and ``transaction``, with a ``.`` separating the prefixes from the attribute names.

Let's rename the columns to match what CONSTELLATION expects. (We didn't do this first because the original column headers were valid Python identifiers, so it was easier to type ``df.dtg`` than ``df['transaction.DateTime']``.)

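Note that the email addresses go into the ``Label`` attribute in the form ``identifier<Type>``. When the schema is applied, CONSTELLATION uses this to set each node's ``Identifier`` (the value that uniquely identifies a particular node) and ``Type`` attributes, which is why later cells can refer to ``source.Identifier``.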

In [None]:
df.rename(columns={
    'from_address': 'source.Label',
    'from_country': 'source.Geo.Country',
    'to_address': 'destination.Label',
    'to_country': 'destination.Geo.Country',
    'dtg': 'transaction.DateTime'},
    inplace=True)
df

Now the dataframe is ready to be sent to CONSTELLATION. We'll create a new graph (using the ``new_graph()`` method), and send the dataframe to CONSTELLATION using the ``put_dataframe()`` method.

If you get a Python `ConnectionRefusedError` when you run this cell, you've probably forgotten to start the CONSTELLATION external scripting server from the Tools menu. If you start it now, you'll have to go back and re-execute the "`cc = constellation_client.Constellation()`" cell, then come back here.

In [None]:
cc.new_graph()
cc.put_dataframe(df)

CONSTELLATION creates a new graph, accepts the contents of the dataframe, applies the schema, and automatically arranges the graph. Finally, it resets the view so you can see the complete graph.

In this simple case, it's easy to see that the first two rows of the dataframe are correctly represented as nodes with transactions between them. The third row of the dataframe does not have a destination, so there is no transaction.

If you select the transactions, you'll see that they have the correct ``DateTime`` values.

Of course, we didn't have to create a new graph. In the same graph, let's add a new node with a transaction from an existing node (`ghi@example3.com`).

In [None]:
csv_data = '''
from_address,from_country,to_address,to_country,dtg
ghi@example3.com,Zambia,mno@example3.com,Brazil,2017-01-02 01:22:33
'''.strip()
dfn = pd.read_csv(io.StringIO(csv_data))
dfn = dfn.assign(from_address=dfn.from_address + '<Email>')
dfn = dfn.assign(to_address=dfn.to_address + '<Email>')
dfn = dfn.assign(dtg = pd.to_datetime(dfn.dtg))
dfn.rename(columns={
    'from_address': 'source.Label',
    'from_country': 'source.Geo.Country',
    'to_address': 'destination.Label',
    'to_country': 'destination.Geo.Country',
    'dtg': 'transaction.DateTime'},
    inplace=True)
cc.put_dataframe(dfn)

## Part 2: Getting Data from CONSTELLATION

We'll use the graph that we created in Part 1 to see what happens when we get data from CONSTELLATION.

In [None]:
df = cc.get_dataframe()
df

In [None]:
df.columns

We added five columns in Part 1, but we get 50+ columns back! (The number may vary depending on the version of CONSTELLATION and your default schema.)

What's going on?

Remember that CONSTELLATION applies the graph's schema to your data and performs an arrangement. Those other columns are the result of applying the schema, or (in the case of the x, y, z columns) applying the arrangement. The columns are returned in no particular order.

Let's have a look at the data types in the dataframe.

In [None]:
df.dtypes

The various ``selected`` columns are bool (that is, ``True`` or ``False`` values): an element is either selected or not selected. The ``transaction.DateTime`` column is a ``datetime64[ns]`` as expected. Everything else should be unsurprising. One thing to notice is that ``source.nradius`` may be an ``int64``, even though in CONSTELLATION it's a ``float``. This is because ``nradius`` usually has integral values (typically 1.0), so the dataframe converts it to an ``int64``. This shouldn't be a problem for us; it's still a number. The same can happen to any column that only contains integral values.

We can see what the CONSTELLATION types are using ``cc``'s ``types`` attribute: the ``Constellation`` instance remembers the types after each call to ``get_dataframe()``. (Usually you won't have to worry about these.)

In [None]:
cc.types

CONSTELLATION types such as ``boolean``, ``datetime``, ``float``, ``int``, and ``string`` convert to their obvious types in a dataframe. Other types convert to reasonable string equivalents; for example, ``icon`` converts to a string containing the name of the icon.

The ``color`` type converts to a ``[red, green, blue, alpha]`` list, where each value ranges from 0 to 1. Some people are more used to web colors (in the format #RRGGBB). The following function converts a color list to a web color.

In [None]:
def to_web_color(color):
    """Convert a [red, green, blue, alpha] list of 0..1 values to a #RRGGBB web color (alpha is ignored)."""
    
    return f'#{int(color[0]*255):02x}{int(color[1]*255):02x}{int(color[2]*255):02x}'

For example:

In [None]:
print(df['source.color'])
print(df['source.color'].apply(to_web_color))

This lets us display each node's label in the node's schema color.

In [None]:
import html

# DataFrame.get_values() was removed in pandas 1.0; to_numpy() is the replacement.
for label, color in df[['source.Label', 'source.color']].to_numpy():
    h = '<span style="color:{}">{}</span>'.format(to_web_color(color), html.escape(label))
    display(HTML(h))
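Going the other way can be handy if you want to work with web-style colors. The sketch below is a hypothetical helper, not part of ``constellation_client``; it returns the same ``[red, green, blue, alpha]`` layout that ``get_dataframe()`` uses. Whether ``put_dataframe()`` accepts colors in this list form is an assumption worth checking before you rely on it.

```python
def from_web_color(web, alpha=1.0):
    """Convert a '#RRGGBB' web color to a [red, green, blue, alpha] list of 0..1 floats."""
    digits = web.lstrip('#')
    if len(digits) != 6:
        raise ValueError(f'expected #RRGGBB, got {web!r}')
    # Each pair of hex digits is one channel in the range 0..255.
    red, green, blue = (int(digits[i:i + 2], 16) / 255 for i in (0, 2, 4))
    return [red, green, blue, alpha]

print(from_web_color('#ff8000'))
```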

### Graph elements

Calling ``get_dataframe()`` with no parameters gave us four rows representing the whole graph: one row for each transaction, and a row for the singleton node.

Sometimes we only want parts of the graph. We can ask for just the nodes.

In [None]:
df = cc.get_dataframe(vx=True)
df

Five rows, one for each node. Note that all of the columns use the ``source`` prefix.

We can ask for just the transactions.

In [None]:
df = cc.get_dataframe(tx=True)
df

Three rows, one for each transaction. Note that transactions always include the source and destination nodes.

Finally, you can get just the elements that are selected. Before you run the next cell, use your mouse to select two nodes in the current graph.

In [None]:
df = cc.get_dataframe(vx=True, selected=True)
df

Two rows, one for each selected node. (If you don't see any rows here, it's because you didn't select any nodes. Select a couple of nodes and run the cell again.)

Generally, you'll want ``vx=True`` when you're looking at nodes, or ``tx=True`` when you're looking at transactions.

### Choosing attributes

You generally don't want all of the attributes that CONSTELLATION knows about. For example, the x,y,z coordinates are rarely useful when you're analysing data. The ``get_dataframe()`` method allows you to specify only the attributes you want. Not only does this use less space in the dataframe, but particularly for larger graphs, it can greatly reduce the time taken to get the data into a dataframe.

First we'll find out what graph, node, and transaction attributes exist. The `get_attributes()` method returns a dictionary mapping attribute names to their CONSTELLATION types. For consistency with the other method return values, the attribute names are prefixed with `graph.`, `source.`, and `transaction.`.

In [None]:
attrs = cc.get_attributes()
attrs

To specify just the attributes you want, pass a list of attribute names using the ``attrs`` parameter.

In [None]:
df = cc.get_dataframe(vx=True, attrs=['source.Identifier', 'source.Type'])
df
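Because ``get_attributes()`` returns an ordinary dictionary, you can build the ``attrs`` list in code instead of typing it. In this sketch, ``pick_attrs`` is a hypothetical helper and ``sample`` is a hand-written stand-in for the real return value; the names and types in your graph depend on its schema, and the type strings here are only illustrative.

```python
def pick_attrs(attrs, prefix, exclude=()):
    """Return the sorted attribute names that start with '<prefix>.', minus any excluded names."""
    return sorted(name for name in attrs
                  if name.startswith(prefix + '.') and name not in exclude)

# Illustrative stand-in for cc.get_attributes().
sample = {
    'source.Identifier': 'string',
    'source.Label': 'string',
    'source.x': 'float',
    'source.y': 'float',
    'source.z': 'float',
    'transaction.DateTime': 'datetime',
}

print(pick_attrs(sample, 'source', exclude={'source.x', 'source.y', 'source.z'}))
```

Against a live graph, you might write ``cc.get_dataframe(vx=True, attrs=pick_attrs(cc.get_attributes(), 'source', exclude={'source.x', 'source.y', 'source.z'}))``, adding ``'source.[id]'`` to the list if you intend to send changes back.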

### Updating the graph: nodes

There is a special attribute for each element that isn't visible in CONSTELLATION: ``source.[id]``, ``destination.[id]``, and ``transaction.[id]``. These are unique identifiers for each element. These identifiers can change whenever a graph is modified, so they can't be relied on to track an element. However, they can be used to identify a unique element when you get a dataframe, modify a value, and send the dataframe back to CONSTELLATION.

For example, suppose we want to make all nodes in the ``@example3.com`` domain larger, and color them blue. We need the ``Identifier`` attribute (for the domain name), the ``nradius`` attribute so we can modify it, and the ``source.[id]`` attribute to tell CONSTELLATION which nodes to modify. We don't need to get the color, because we don't care what it is before we change it.

In [None]:
df = cc.get_dataframe(vx=True, attrs=['source.Identifier', 'source.nradius', 'source.[id]'])
df

Let's select just the ``example3.com`` nodes and double their radii.

In [None]:
e3 = df[df['source.Identifier'].str.endswith('@example3.com')].copy()
e3['source.nradius'] *= 2
e3

We don't need to send the ``source.Identifier`` column back to CONSTELLATION, so let's drop it. We'll also add the color column. (Fortunately, CONSTELLATION is quite forgiving about color values.)

In [None]:
e3.drop('source.Identifier', axis=1, inplace=True)
e3['source.color'] = 'blue'
e3

Finally, we can send this dataframe to CONSTELLATION.

In [None]:
cc.put_dataframe(e3)

The two ``example3.com`` nodes should be noticeably larger. However, their color didn't change. This is because CONSTELLATION applies the graph's schema whenever you call ``put_dataframe()``, so the color changes to blue and is then immediately overridden by the schema.

Let's put the node sizes back to 1, and call ``put_dataframe()`` again, but this time tell CONSTELLATION not to apply the schema.

In [None]:
e3['source.nradius'] = 1
cc.put_dataframe(e3, complete_with_schema=False)

Better.

Another thing that CONSTELLATION does for a ``put_dataframe()`` is a simple arrangement. If you want to create your own arrangement, you have to tell CONSTELLATION not to do this using the ``arrange`` parameter.

Let's arrange the nodes in a circle, just like the built-in circle arrangement. (Actually, with only five nodes, it's more of a pentagon.) We don't need to know anything about the nodes for this one; we just need to know they exist.

In [None]:
df = cc.get_dataframe(vx=True, attrs=['source.[id]'])
df

In [None]:
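import numpy as np

n = len(df)
# Use positions 0..n-1 rather than df.index, so the layout is correct even if the index isn't 0..n-1.
angles = 2*np.pi*np.arange(n)/n
df['source.x'] = n * np.sin(angles)
df['source.y'] = n * np.cos(angles)
df['source.z'] = 0
df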

In [None]:
cc.put_dataframe(df, arrange='')

The empty string tells CONSTELLATION not to perform any arrangement. (You could put the name of any arrangement plugin there, but there are better ways of doing that.)

Also note that the blue nodes aren't blue any more, because the schema was applied.
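The same approach works for any layout you can compute yourself. As an illustration, here is a sketch of a simple grid layout; ``grid_positions`` and its ``spacing`` parameter are hypothetical, not part of CONSTELLATION or ``constellation_client``.

```python
import math

def grid_positions(n, spacing=2.0):
    """Return one (x, y, z) tuple per node, filling a roughly square grid row by row."""
    cols = max(1, math.ceil(math.sqrt(n)))
    # Node i goes in column i % cols of row i // cols; rows extend downwards (negative y).
    return [((i % cols) * spacing, -(i // cols) * spacing, 0.0) for i in range(n)]

print(grid_positions(5))
```

To apply it to a non-empty dataframe of node ids, you could write ``xs, ys, zs = zip(*grid_positions(len(df)))``, assign ``df['source.x'] = xs`` (and likewise for ``y`` and ``z``), then call ``cc.put_dataframe(df, arrange='')`` as before.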

### Updating the graph: transactions

The graph we created earlier has a problem: the transactions have the wrong type. More precisely, they don't have any type. Let's fix that. We'll get all of the transactions from the graph, give them a type, and update the graph.

When you run this, the transactions will turn green, indicating that schema completion has happened. You can look at the Attribute Editor to see that the transaction types are now "Email".

In [None]:
tx_df = cc.get_dataframe(tx=True, attrs=['transaction.[id]'])
display(tx_df)
tx_df['transaction.Type'] = 'Email'
display(tx_df)
cc.put_dataframe(tx_df)

### Updating the graph: custom attributes

Sometimes we want to add attributes that aren't defined in the graph's schema. For example, let's add an attribute called ``Country.Chars`` that shows the number of characters in each node's country name.

In [None]:
c_df = cc.get_dataframe(vx=True, attrs=['source.[id]', 'source.Geo.Country'])
c_df['source.Country.Chars'] = c_df['source.Geo.Country'].str.len()
display(c_df)
display(c_df.dtypes)
cc.put_dataframe(c_df)

If you look at the Attribute Editor, you'll see the new node attribute ``Country.Chars``. However, if you right-click on the attribute and select ``Modify Attribute``, you'll see that the new attribute is a string, not an integer, even though the value is an integer in the dataframe. This is because CONSTELLATION assumes that everything it doesn't recognise is a string.

We can fix this by suffixing a type indicator to the column name. Let's create a new attribute called ``Country.Length`` which we turn into an integer by adding ``<integer>`` to the name.

In [None]:
c_df = cc.get_dataframe(vx=True, attrs=['source.[id]', 'source.Geo.Country'])
c_df['source.Country.Length<integer>'] = c_df['source.Geo.Country'].str.len()
display(c_df)
cc.put_dataframe(c_df)

Looking at ``Country.Length`` in the Attribute Editor, we can see that it is an integer.

Other useful types are ``float`` and ``datetime``. You can see the complete list of types by adding a custom attribute in the Attribute Editor and looking at the ``Attribute Type`` dropdown list.

(Note that there is currently no way to delete attributes externally, so if you want to delete the ``Country.Chars`` attribute, you'll have to do it manually.)
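If you add several custom attributes at once, you can pick each suffix from the column's pandas dtype. This is a sketch assuming that ``<float>`` and ``<datetime>`` suffixes work the same way as ``<integer>`` (only ``<integer>`` is demonstrated above); ``add_type_suffixes`` is a hypothetical helper. Pass only your custom columns, so schema attributes and ``[id]`` columns keep their names.

```python
import pandas as pd
from pandas.api import types as ptypes

def add_type_suffixes(df, columns):
    """Return a copy of df with <integer>, <float> or <datetime> appended to the named columns.

    The suffix is chosen from each column's dtype. Other columns (for example strings)
    are left unchanged, and CONSTELLATION will treat them as strings.
    """
    renames = {}
    for col in columns:
        dtype = df[col].dtype
        if ptypes.is_integer_dtype(dtype):
            renames[col] = f'{col}<integer>'
        elif ptypes.is_float_dtype(dtype):
            renames[col] = f'{col}<float>'
        elif ptypes.is_datetime64_any_dtype(dtype):
            renames[col] = f'{col}<datetime>'
    return df.rename(columns=renames)

c_df = pd.DataFrame({'source.[id]': [0, 1], 'source.Country.Chars': [6, 5]})
print(add_type_suffixes(c_df, ['source.Country.Chars']).columns.tolist())
```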

### Deleting nodes and transactions

The special identifier ``[delete]`` lets you delete nodes and transactions from the graph. It doesn't matter what value is in the ``source.[delete]`` (or ``transaction.[delete]``) column: the presence of the column is enough to delete the graph elements identified by the ``[id]`` column.

Let's delete all singleton nodes. These nodes have no transactions connected to them, so when we get a dataframe, the ``destination.[id]`` value will be ``NaN``.

(If we get all nodes with ``vx=True``, we won't get any data about transactions. If we get all transactions with ``tx=True``, we won't get the singleton nodes.)

In [None]:
# Get the graph. (Names are included so we can check that the dataframe matches the graph.)
#
df = cc.get_dataframe(attrs=['source.[id]', 'source.Identifier', 'destination.[id]', 'destination.Identifier'])
display(df)

# Keep the singleton rows (where the destination.[id] is null).
#
df = df[df['destination.[id]'].isnull()]
display(df)

# Create a new dataframe with a source.[id] column containing all of the values from the df source.[id] column,
# and a source.[delete] column containing any non-null value
#
del_df = pd.DataFrame({'source.[id]': df['source.[id]'], 'source.[delete]': 0})
display(del_df)

# Delete the singletons.
#
cc.put_dataframe(del_df)

Likewise, we can delete transactions. Let's delete all transactions originating from ``ghi``.

In [None]:
# Get all transactions.
# We don't need all of the attributes for the delete, but we'll get them to use below.
#
df = cc.get_dataframe(tx=True)
display(df)

# Keep the transactions originating from 'ghi'.
#
df = df[df['source.Identifier'].str.startswith('ghi@')]
display(df)

# Create a new dataframe.
#
del_df = pd.DataFrame({'transaction.[id]': df['transaction.[id]'], 'transaction.[delete]': 0})
display(del_df)

# Delete the transactions.
#
cc.put_dataframe(del_df)

Now let's put the deleted transaction back. Because we originally fetched all of the attributes, this new transaction will have the same attribute values as the one we deleted.

In [None]:
cc.put_dataframe(df)
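Both delete cells above follow the same pattern: an ``[id]`` column naming the elements and a ``[delete]`` column whose presence triggers the deletion. A hypothetical helper, ``delete_columns``, can build those columns as a plain dictionary for you to pass to ``pd.DataFrame``; it is a sketch, not part of ``constellation_client``.

```python
def delete_columns(ids, element='source'):
    """Return columns that mark the given elements for deletion.

    element is 'source' for nodes or 'transaction' for transactions.
    The [delete] value itself is ignored; only the column's presence matters.
    """
    if element not in ('source', 'transaction'):
        raise ValueError("element must be 'source' or 'transaction'")
    ids = list(ids)
    return {f'{element}.[id]': ids, f'{element}.[delete]': [0] * len(ids)}

print(delete_columns([3, 7], 'transaction'))
```

For example, the transaction delete above could be written as ``cc.put_dataframe(pd.DataFrame(delete_columns(df['transaction.[id]'], 'transaction')))``.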

## Part 3: Graph Attributes

As well as node and transaction attributes, we can also get graph attributes. (This covers the attributes you can see in CONSTELLATION's Attribute Editor.)

In [None]:
df = cc.get_graph_attributes()
df

Let's display the Geo.Country attribute in a small size above the nodes, and the country flag as a decorator on the top-right of the node icon.

A node label is defined as *``attribute-name``*``;``*``color``*``;``*``size``*, with multiple labels separated by pipes "|".

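A decorator is defined as ``"nw";"ne";"se";"sw";``: one quoted attribute name for each corner of the node icon (top-left, top-right, bottom-right, bottom-left), each followed by a semicolon. Any corner may be left blank.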

In [None]:
labels = 'Geo.Country;Orange;0.5'
df = pd.DataFrame({'node_labels_top': [labels], 'decorators': [';"Geo.Country";;;']})
cc.set_graph_attributes(df)

(You may have to zoom in to see the smaller labels.)

To add a label on the bottom in addition to the default ``Label`` attribute, you have to specify both labels.

In [None]:
labels = 'Type;Teal;0.5|Label;LightBlue;1'
df = pd.DataFrame({'node_labels_bottom': [labels]})
cc.set_graph_attributes(df)
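If you build these strings often, two small helpers can assemble them from their parts. This is a sketch: ``make_labels`` and ``make_decorators`` are hypothetical names, and they simply reproduce the formats described above.

```python
def make_labels(*labels):
    """Join (attribute, color, size) triples into a label string, separated by '|'."""
    return '|'.join(f'{attr};{color};{size}' for attr, color, size in labels)

def make_decorators(nw='', ne='', se='', sw=''):
    """Build a decorator string: a quoted attribute name (or nothing) per corner, each followed by ';'."""
    return ''.join(f'"{name}";' if name else ';' for name in (nw, ne, se, sw))

print(make_labels(('Type', 'Teal', 0.5), ('Label', 'LightBlue', 1)))  # Type;Teal;0.5|Label;LightBlue;1
print(make_decorators(ne='Geo.Country'))                              # ;"Geo.Country";;;
```

The result goes into a one-row dataframe as before, for example ``cc.set_graph_attributes(pd.DataFrame({'decorators': [make_decorators(ne='Geo.Country')]}))``.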

## Part 4: Types

CONSTELLATION defines many types. Use the ``describe_type()`` method to get a description of a particular type.

In [None]:
t = cc.describe_type('Email')
t

## Part 5: Plugins

You can call CONSTELLATION plugins from Python (if you know what they're called). Let's arrange the graph in trees.

In [None]:
cc.run_plugin('ArrangeInTrees')

If we can't see all of the graph, reset the view.

In [None]:
cc.run_plugin('ResetView')

You can also call plugins with parameters (if you know what they are). For example, the ``AddBlaze`` plugin accepts a list of node ids to add blazes to.

Let's add a blaze to each ``example3.com`` node.

In [None]:
df = cc.get_dataframe(vx=True, attrs=['source.Identifier', 'source.[id]'])
e3 = df[df['source.Identifier'].str.endswith('@example3.com')]

cc.run_plugin('AddBlaze', args={'BlazeUtilities.vertex_ids': list(e3['source.[id]'])})

Let's be neat and tidy and remove them again.

In [None]:
cc.run_plugin('RemoveBlaze', args={'BlazeUtilities.vertex_ids': list(e3['source.[id]'])})

While most parameter values are quite simple (strings, integers, etc.), some are a little more complex to deal with, such as multichoice parameters. To pass multichoice parameter values to a plugin, you need to know the possible choices, and you need to know how to select them.

Let's use the <i>select top n</i> plugin as an example. The Schema View tells us that this plugin has a multichoice parameter called <i>SelectTopNPlugin.type</i>.

The type options shown in the Data Access View vary depending on the value of the <i>SelectTopNPlugin.type_category</i> parameter. For this example we set the type category to "Online Identifier", which results in the following type options:
- Online Identifier 
- Email

In order to use this parameter, we need to create a string containing all options by joining each option with '\n'. We also need to select all the options we want by prefixing them with the '✓ ' character (i.e. Unicode character 10003 (check mark) followed by character 32 (space)). 

This is obviously not an ideal system, but this is how multichoice parameters were implemented at a time when it wasn't expected that CONSTELLATION's internal workings would be exposed via scripting or a REST API.

In [None]:
# select a node
cc.run_plugin('SelectSources')
# run the "select top n" plugin with a custom multichoice parameter value.
CHECK = '\u2713'

options = ['Online Identifier', 'Email', 'User Name']
checked = ['Email']
parameters = {
    'SelectTopNPlugin.mode': "Node",
    'SelectTopNPlugin.type_category': 'Online Identifier',
    'SelectTopNPlugin.type': '\n'.join([f'{CHECK} {v}' if v in checked else v for v in options]),
    'SelectTopNPlugin.limit': 2
}

cc.run_plugin('SelectTopN', args=parameters)
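The joining logic in the cell above can be wrapped in a pair of helpers, one to build a multichoice value and one to read the checked options back out of it. This is a sketch with hypothetical names (``build_multichoice``, ``parse_multichoice``); it only implements the newline and check-mark convention described above.

```python
CHECK = '\u2713'  # check mark, followed by a space for checked options

def build_multichoice(options, checked):
    """Join options with newlines, prefixing the checked ones with a check mark and a space."""
    unknown = set(checked) - set(options)
    if unknown:
        raise ValueError(f'not in options: {sorted(unknown)}')
    return '\n'.join(f'{CHECK} {v}' if v in checked else v for v in options)

def parse_multichoice(value):
    """Return the checked options from a multichoice string, in order."""
    prefix = CHECK + ' '
    return [line[len(prefix):] for line in value.split('\n') if line.startswith(prefix)]

value = build_multichoice(['Online Identifier', 'Email'], ['Email'])
print(repr(value))
print(parse_multichoice(value))
```

Raising an error for an unknown checked option is a design choice: a typo in ``checked`` would otherwise silently select nothing.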

So how do we know what plugins exist?

In [None]:
plugins = cc.list_plugins()
plugins

Unfortunately, at the moment there is no way of using the REST API to find out what each plugin does or what parameters it takes. However, you can go to the Schema View in CONSTELLATION and look at the ``Plugins`` tab.

## Part 6: Data Access Plugins

Data Access plugins in CONSTELLATION are like any other plugins; they just have a different user interface. This means that they can be called from an external scripting client just like any other plugin.

One caveat is that many of these plugins use the global parameters (seen at the top of the Data Access View).

- Query Name
- Range

Let's try running a data access plugin, although to avoid connectivity problems we'll use the <i>Test Parameters</i> plugin in the <strong>Developer</strong> category of the Data Access View. This plugin doesn't actually access any external data, but rather simply exists to test the mechanisms CONSTELLATION uses to build and use plugin parameters. The plugin has many parameters, but for this example we will focus on the following:

- ``GlobalCoreParameters.query_name``: A string representing the name of the query.
- ``GlobalCoreParameters.datetime_range``: The datetime range; see below.

You might want to try running this plugin manually on an empty graph before running the code below. The plugin will create two connected nodes containing attribute values reflecting the values specified by the plugin parameters. Run the plugin a few times, changing the parameters each time, to satisfy yourself that this is the case. After you've done that, let's try running it programmatically.

In [None]:
# running the "test parameters" plugin with custom parameter values.
def get_data():
    df = cc.get_dataframe()
    print('query_name     :', df.loc[0, 'source.Comment'])
    print('datetime_range :', df.loc[0, 'destination.Comment'])
    print('all_parameters :', df.loc[0, 'transaction.Comment'])

cc.new_graph()

i = 0
i += 1

parameters = {
    'GlobalCoreParameters.query_name': 'Query %d from a REST client' % i,
    'GlobalCoreParameters.datetime_range': 'P1D'
}

cc.run_plugin("TestParameters", args=parameters)

get_data()

The datetime range can be an explicit range, or a duration from the current time.

### Datetime range

A range is represented by two ISO 8601 datetime values separated by a semicolon, giving an explicit start and end point. Examples are:

- ``2016-01-01T00:00:00Z;2016-12-31T23:59:59Z``
- ``2017-06-01T12:00:00Z;2017-06-01T13:00:00Z``

### Datetime duration

A duration is represented by a single ISO 8601 duration. This is converted to an explicit datetime range when the query is run. Examples are:

- ``P1D``: one day
- ``P7D``: 7 days
- ``P1M``: one month
- ``P1Y``: one year
- ``P1M7D``: one month and seven days

Note that only years, months, and days are supported (so ``PT1H`` for one hour is not a valid period, for example). For other durations, use Python to compute an explicit range.
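For example, a minimal sketch that turns any ``timedelta`` into an explicit range string in the format shown above. ``explicit_range`` is a hypothetical helper; it requires a timezone-aware ``end`` (defaulting to now) so that the ``Z`` (UTC) suffix is accurate.

```python
from datetime import datetime, timedelta, timezone

def explicit_range(duration, end=None):
    """Return 'start;end' as ISO 8601 UTC datetimes covering `duration` up to `end` (default: now)."""
    if end is None:
        end = datetime.now(timezone.utc)
    elif end.tzinfo is None:
        raise ValueError('end must be timezone-aware')
    end = end.astimezone(timezone.utc)
    start = end - duration
    fmt = '%Y-%m-%dT%H:%M:%SZ'
    return f'{start.strftime(fmt)};{end.strftime(fmt)}'

end = datetime(2017, 7, 14, 0, 21, 15, tzinfo=timezone.utc)
print(explicit_range(timedelta(hours=6), end))  # 2017-07-13T18:21:15Z;2017-07-14T00:21:15Z
```

To query the last six hours, you could set ``parameters['GlobalCoreParameters.datetime_range'] = explicit_range(timedelta(hours=6))``.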

Let's try calling the plugin again.

In [None]:
cc.new_graph()

i += 1

parameters['GlobalCoreParameters.query_name'] = 'Query %d from a REST client' % i
parameters['GlobalCoreParameters.datetime_range'] = '2017-07-01T00:21:15Z;2017-07-14T00:21:15Z'

cc.run_plugin("TestParameters", args=parameters)

get_data()

## Part 7: Taking a Screenshot

It can be useful to include a screenshot of the graph in a notebook. It's easy to get an image of the graph as the bytes of a PNG file.

In [None]:
buf = cc.get_graph_image()
Image(buf)

Here we used the built-in notebook facilities to display the image (which is returned from CONSTELLATION as a sequence of bytes, the encoding of the image in PNG format).

If another window overlaps CONSTELLATION's graph display, you might see that window in the image. One way of avoiding this is to resize the CONSTELLATION window slightly first. Another way is to add a short ``time.sleep()`` before calling ``get_graph_image()`` and, during the pause, click in the CONSTELLATION window to bring it to the front.

We can also use PIL (the Python Imaging Library, nowadays provided by the Pillow package) to turn the bytes into an image and manipulate it.

In [None]:
img = PIL.Image.open(io.BytesIO(buf))

You might want to resize the image to fit it into a report.

In [None]:
def resize(img, max_size):
    w0 = img.width
    h0 = img.height
    s = max(w0, h0)/max_size
    w1 = int(w0//s)
    h1 = int(h0//s)
    print(f'Resizing from {w0}x{h0} to {w1}x{h1}')
    
    return img.resize((w1, h1))

In [None]:
small = resize(img, 512)

# PIL images know how to display themselves.
#
small

The image can be saved to a file. You can either write the bytes directly (remember the bytes are already in PNG format), or save the PIL image.

In [None]:
with open('my_constellation_graph.png', 'wb') as f:
    f.write(buf)

In [None]:
small.save('my_small_constellation_graph.png')

PIL is fun.

In [None]:
small.filter(PIL.ImageFilter.EMBOSS)

In [None]:
w = small.width
h = small.height
small.crop((int(w*0.25), int(h*0.25), int(w*0.75), int(h*0.75)))

In [None]:
# Fonts depend on the operating system, so fall back to PIL's built-in font
# if the TrueType font isn't installed.
#
font_name = 'calibri.ttf' if os.name == 'nt' else 'Oxygen-Sans.ttf'
try:
    font = PIL.ImageFont.truetype(font_name, 20)
except OSError:
    font = PIL.ImageFont.load_default()
draw = PIL.ImageDraw.Draw(small)
draw.text((0, 0), 'This is my graph, it is mine.', (255, 200, 40), font=font)
small

## Part 8: NetworkX

NetworkX is a Python package for the creation, manipulation, and study of the structure, dynamics, and functions of complex networks.

This notebook isn't going to teach you how to use NetworkX, but you can extract your CONSTELLATION graph into a NetworkX graph for further analysis.

We'll start by getting a dataframe containing the graph data.

In [None]:
cc.run_plugin('ArrangeInGridGeneral')
df = cc.get_dataframe()
df.head()

The ``constellation_client`` library contains a function that converts a dataframe to a NetworkX graph. You can see the documentation for it using the notebook's built-in help mechanism.

In [None]:
constellation_client.nx_from_dataframe?

When you've looked at the help, close the help window and create a NetworkX graph from the dataframe.

In [None]:
g = constellation_client.nx_from_dataframe(df)
g

We can look at a node and see that it has the expected attributes.

In [None]:
g.nodes[list(g.nodes())[0]]

We can look at an edge and see that it has the expected attributes.

In [None]:
g.edges[list(g.edges())[0]]

NetworkX can draw its graphs using a plotting library called ``matplotlib``. We just need to tell ``matplotlib`` to draw in the notebook, and get the correct positions and colors from the node and edge attributes. (We can use a convenience function provided by ``constellation_client`` to get the positions.)

In [None]:
import networkx as nx
%matplotlib inline

pos = constellation_client.get_nx_pos(g)
node_colors = [to_web_color(g.nodes[n]['color']) for n in g.nodes()]
edge_colors = [to_web_color(g.edges[e]['color']) for e in g.edges()]

nx.draw(g, pos=pos, node_color=node_colors, edge_color=edge_colors)