<h2><i>kiara</i>: Network Analysis</h2>

Welcome back! Now that we're comfortable with what <i>kiara</i> looks like and what it can do to help track your data and your research process, let's try out some of the digital analysis tools, starting with <b>Network Analysis</b>.

<h2>Why Network Analysis?</h2>

Network Analysis offers a computational and quantitative means to examine and explore relational objects, with proxies to measure structural roles and concepts such as power and influence. Doing so digitally, and at scale, also allows us to consider these kinds of questions with amounts of material or documents that were not previously manageable with qualitative or manual approaches.

>INSERT REFERENCES OR WIDER REFERENCING HERE: PROGRAMMING HISTORIAN LINKS W/ BIBLIOGRAPHY SECTION BELOW

<h3>Getting Started</h3>

Let's start by double-checking that we have all the required plugins and setting up an API so that we can use <i>kiara</i>. We'll do this all in one go this time, but if you're unsure, feel free to head back to the <a href="http://dharpa.org/kiara.documentation/latest/workshop/workshop/">installation notebook</a> to look over this section again.

In [1]:
import csv
import networkx as nx
import matplotlib.pyplot as plt

try:
    from kiara_plugin.jupyter import ensure_kiara_plugins
except ImportError:
    import sys
    print("Installing 'kiara_plugin.jupyter'...")
    !{sys.executable} -m pip install -q kiara_plugin.jupyter
    from kiara_plugin.jupyter import ensure_kiara_plugins

ensure_kiara_plugins()

from kiara.api import KiaraAPI
kiara = KiaraAPI.instance()

Great, we're all set up. Let's download some sample data for our network analysis: this is some epistolary data taken from the <i>State Papers of England</i> between 1534 and 1540.

<span style="color:blue">insert links for references when needed</span> 

As before, we can use a <i>kiara</i> function to access our material, in this case `import.local.file`, specifying the <span style="color:green">inputs</span> and running the function to get our <span style="color:red">outputs</span>.

In [2]:
correspsearch = kiara.run_job('import.local.file', inputs={'path':'correspSearch.csv'})
correspsearch

<h2>Creating a Network</h2>

Time to make our network from this data. Let's start by searching for the <i>kiara</i> modules that are included in the `kiara_plugin.network_analysis` package.

In [4]:
infos = kiara.retrieve_operations_info()
operations = {}
for op_id, info in infos.item_infos.items():
    if info.context.labels.get("package", None) == "kiara_plugin.network_analysis":
        operations[op_id] = info

print(operations.keys())

dict_keys(['create.network_data.from.tables', 'export.network_data.as.csv_files', 'export.network_data.as.graphml_file', 'export.network_data.as.sql_dump', 'export.network_data.as.sqlite_db', 'network_data.check_clusters'])


There are lots of options for analysis, but we want to make our network first. Let's have a look at what we need for the function `create.network_data.from.tables`, using `kiara.retrieve_operation_info` once more.

<span style="color:blue">this will make more sense/have more options when analysis modules are incorporated into the network plugin</span>

In [5]:
kiara.retrieve_operation_info('create.network_data.from.tables')

Like other network analysis tools, <i>kiara</i> first needs the data as an edge table. This means we first have to transform the data we downloaded earlier into a table before we can create the network data. Let's start by using the `create.table.from.file` function that we used in the first notebook and storing the result as our <b>edges</b>, then use this to create our network data using the `create.network_data.from.tables` function that we just read about. In this, we are defining two different sets of <span style="color:green">inputs</span>, overwriting the first variable once we have used it to create our table.

We're going to be using the network data quite a lot in the rest of the notebook, so let's store the output of this function, reusing the <b>correspsearch</b> variable. We could also import a separate table containing the nodes, but this is optional, and for the moment let's stick with just the edge table.
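
If the idea of an edge table is new, here is a minimal sketch of one in plain Python, independent of <i>kiara</i>. The rows, the column names `sender` and `recipient`, and the helper functions are all made up for illustration; they play the same role as the `source_column_name` and `target_column_name` inputs, but this is not how <i>kiara</i> itself builds its network data.

```python
import csv
import io

# Made-up sample rows for illustration; these are NOT from the State Papers data.
SAMPLE_CSV = """sender,recipient
Alice,Bob
Alice,Carol
Bob,Carol
Alice,Bob
"""


def read_edge_table(text, source_column, target_column):
    """Return a list of (source, target) pairs, one per row of the table."""
    reader = csv.DictReader(io.StringIO(text))
    return [(row[source_column], row[target_column]) for row in reader]


def nodes_from_edges(edge_list):
    """Collect every name that appears as a source or a target."""
    return sorted({name for edge in edge_list for name in edge})


sample_edges = read_edge_table(SAMPLE_CSV, "sender", "recipient")
sample_nodes = nodes_from_edges(sample_edges)

print(sample_edges)
print(sample_nodes)
```

Each row is one connection (here, one letter), and the list of nodes can be derived from the edges alone, which is why a separate node table is optional.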

In [6]:
inputs = {
    "file": correspsearch['file']
}

outputs = kiara.run_job('create.table.from.file', inputs=inputs)

edges = outputs['table']

inputs = {
    'edges': edges,
    'source_column_name': 'f0',
    'target_column_name': 'f1'
}

correspsearch = kiara.run_job('create.network_data.from.tables', inputs=inputs)
correspsearch

<h2>Network Info</h2>

In [7]:
correspsearch = correspsearch['network_data']
correspsearch.get_property_data('metadata.graph_properties')

In [20]:
kiara.retrieve_operation_info('network_data.check_clusters')

In [30]:
output = kiara.run_job('network_data.check_clusters', inputs={'network_data':correspsearch})
output

In [31]:
network_data = output['largest_component']

network_data.get_property_data('metadata.graph_properties')
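
The cells above keep only the `largest_component` output. Assuming this refers to the largest connected component (the biggest group of nodes that can all reach one another), the idea can be sketched in plain Python as below. The function name and the toy edges are hypothetical, and this is an illustration of the concept rather than the code <i>kiara</i> runs.

```python
from collections import defaultdict, deque


def connected_components(edge_list):
    """Group nodes into components, treating every edge as undirected."""
    neighbours = defaultdict(set)
    for source, target in edge_list:
        neighbours[source].add(target)
        neighbours[target].add(source)

    seen = set()
    components = []
    for start in neighbours:
        if start in seen:
            continue
        # A breadth-first search from an unvisited node collects one component.
        component = set()
        queue = deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.add(node)
            for other in neighbours[node]:
                if other not in seen:
                    seen.add(other)
                    queue.append(other)
        components.append(component)
    return components


# Toy data: A-B-C form one group, D-E another.
toy_edges = [("A", "B"), ("B", "C"), ("D", "E")]
components = connected_components(toy_edges)
largest = max(components, key=len)

print(len(components), sorted(largest))  # prints: 2 ['A', 'B', 'C']
```

Working on the largest component is a common choice because several centrality measures are easier to interpret when every node can reach every other node.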

<h2>Onboarding Data: An Alternative</h2>

In [25]:
kiara.retrieve_operation_info('onboard.gml_file')

In [26]:
lesmis = kiara.run_job('onboard.gml_file', inputs={'path':'lesmis.gml'})
lesmis

<h2>Network Analysis: Statistical Measures</h2>

In [27]:
kiara.retrieve_operation_info('create.degree_rank_list')

In [29]:
output = kiara.run_job('create.degree_rank_list', inputs={'network_data':network_data})
output
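
As a rough guide to what a degree ranking measures, here is a small sketch in plain Python: a node's degree is the number of edges attached to it, and the ranking sorts nodes from most to least connected. The function and the toy edges are hypothetical, and repeated edges are counted each time they appear, which may differ from how `create.degree_rank_list` treats them.

```python
from collections import Counter


def degree_rank(edge_list):
    """Return (node, degree) pairs, highest degree first, ties sorted by name."""
    counts = Counter()
    for source, target in edge_list:
        # Every edge adds one to the degree of both of its endpoints.
        counts[source] += 1
        counts[target] += 1
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


toy_edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]
ranking = degree_rank(toy_edges)

print(ranking)  # prints: [('A', 3), ('B', 2), ('C', 2), ('D', 1)]
```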

In [None]:
kiara.retrieve_operation_info('create.betweenness_rank_list')

In [None]:
centrality_network = output['centrality_network']

output = kiara.run_job('create.betweenness_rank_list', inputs={'network_data':centrality_network, 'weight_column_name':'value', 'weighted_betweenness':True})

output

In [None]:
kiara.retrieve_operation_info('create.eigenvector_rank_list')

In [None]:
centrality_network = output['centrality_network']

output = kiara.run_job('create.eigenvector_rank_list', inputs={'network_data':centrality_network, 'weight_column_name':'value'})

output
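
Eigenvector centrality scores a node highly when it is connected to other high-scoring nodes. One common way to approximate it is power iteration, sketched below in plain Python for an unweighted, undirected toy network. Everything here (the function name, the parameters, the toy edges) is an assumption for illustration; it ignores edge weights and is not the implementation behind `create.eigenvector_rank_list`.

```python
import math
from collections import defaultdict


def eigenvector_scores(edge_list, iterations=100, tolerance=1e-9):
    """Approximate eigenvector centrality by repeated neighbour-score sums."""
    neighbours = defaultdict(set)
    for source, target in edge_list:
        neighbours[source].add(target)
        neighbours[target].add(source)

    scores = {node: 1.0 for node in neighbours}
    for _ in range(iterations):
        # Each node keeps its own score and adds the scores of its neighbours.
        # Keeping the node's own score helps the iteration settle on graphs
        # where plain neighbour sums would oscillate.
        updated = {
            node: scores[node] + sum(scores[other] for other in neighbours[node])
            for node in neighbours
        }
        # Rescale so the scores have unit Euclidean length.
        norm = math.sqrt(sum(value * value for value in updated.values()))
        updated = {node: value / norm for node, value in updated.items()}
        change = sum(abs(updated[node] - scores[node]) for node in neighbours)
        scores = updated
        if change < tolerance:
            break
    return scores


# Toy star network: 'hub' is linked to three otherwise unconnected nodes.
star_scores = eigenvector_scores([("hub", "a"), ("hub", "b"), ("hub", "c")])
print(max(star_scores, key=star_scores.get))  # prints: hub
```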

In [None]:
kiara.retrieve_operation_info('compute.modularity_group')

In [None]:
network = output['centrality_network']

output = kiara.run_job('compute.modularity_group', inputs={'network_data':network})

output

In [None]:
kiara.retrieve_operation_info('create.cut_point_list')

In [None]:
centrality_network = output['centrality_network']

output = kiara.run_job('create.cut_point_list', inputs={'network_data':centrality_network})

output
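
A cut point (also called an articulation point) is a node whose removal would split the network into more pieces. The sketch below finds them in plain Python with a depth-first search, using the standard discovery-time and low-link bookkeeping. The function name and the toy edges are hypothetical, and this is an illustration of the concept rather than the code behind `create.cut_point_list`. It is recursive, so very large networks would need an iterative version.

```python
from collections import defaultdict


def cut_points(edge_list):
    """Return the sorted nodes whose removal disconnects part of the network."""
    neighbours = defaultdict(set)
    for source, target in edge_list:
        if source != target:  # self-loops never affect connectivity
            neighbours[source].add(target)
            neighbours[target].add(source)

    discovery = {}  # order in which each node was first visited
    low = {}        # earliest-visited node reachable from the node's subtree
    found = set()
    counter = 0

    def visit(node, parent):
        nonlocal counter
        discovery[node] = low[node] = counter
        counter += 1
        children = 0
        for other in neighbours[node]:
            if other == parent:
                continue
            if other in discovery:
                # A back edge to an already visited node.
                low[node] = min(low[node], discovery[other])
            else:
                children += 1
                visit(other, node)
                low[node] = min(low[node], low[other])
                # A non-root node is a cut point if this child's subtree
                # cannot reach anything visited before the node itself.
                if parent is not None and low[other] >= discovery[node]:
                    found.add(node)
        # The root of the search is a cut point only if it has several subtrees.
        if parent is None and children > 1:
            found.add(node)

    for node in list(neighbours):
        if node not in discovery:
            visit(node, None)
    return sorted(found)


# Toy data: a triangle A-B-C with a tail C-D-E hanging off it.
toy_cut_points = cut_points(
    [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D"), ("D", "E")]
)
print(toy_cut_points)  # prints: ['C', 'D']
```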

In [None]:
output['centrality_network'].lineage

In [None]:
kiara.retrieve_operation_info('export.network_data.as.graphml_file')