<h1>Hello <i>kiara</i>.</h1>

This tutorial introduces *kiara*, a data orchestration tool.
It walks you through installing the software in a Jupyter notebook, and introduces some basic but essential functions that later notebooks build on.

This tutorial assumes you already know some **Python** and **SQL**.


<h2>Installation</h2>

First, we need to check whether *kiara* and its plugins are already installed, and install them if they are not. There are currently seven plugins:

- `kiara_plugin.core_types`
- `kiara_plugin.onboarding`
- `kiara_plugin.tabular`
- `kiara_plugin.network_analysis`
- `kiara_plugin.language_processing`
- `kiara_plugin.html`
- `kiara_plugin.streamlit`

All of these will be installed automatically alongside *kiara*, using the code below.

<span style="color:blue">\* Note: would be great if this could link out to further documentation, with a separate section for each plugin, describing overall what is in each, then with specific details of the functions in each</span>

In [None]:
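try:
    from kiara_plugin.jupyter import ensure_kiara_plugins
except ImportError:
    # kiara_plugin.jupyter is not installed yet: install it into this notebook's Python environment
    import sys
    print("Installing 'kiara_plugin.jupyter'...")
    !{sys.executable} -m pip install -q kiara_plugin.jupyter
    from kiara_plugin.jupyter import ensure_kiara_plugins

# Check for kiara and its plugins, installing any that are missing
ensure_kiara_plugins()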
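If you want to confirm which plugins are importable once the cell above has run, a small check like the one below can help. It is a minimal sketch using only the standard library's `importlib.util.find_spec`; the helper name `plugin_status` is hypothetical, and it assumes each plugin's import path is its package name with hyphens turned into underscores, which you may need to adjust.

```python
import importlib.util

# Assumed import paths for the seven plugins listed above.
PLUGINS = [
    "kiara_plugin.core_types",
    "kiara_plugin.onboarding",
    "kiara_plugin.tabular",
    "kiara_plugin.network_analysis",
    "kiara_plugin.language_processing",
    "kiara_plugin.html",
    "kiara_plugin.streamlit",
]


def plugin_status(module_names):
    """Map each module name to True if Python can find it, False otherwise."""
    status = {}
    for name in module_names:
        try:
            status[name] = importlib.util.find_spec(name) is not None
        except ModuleNotFoundError:
            # Raised when a parent package (e.g. 'kiara_plugin') is missing entirely
            status[name] = False
    return status


for name, found in plugin_status(PLUGINS).items():
    print(f"{name}: {'installed' if found else 'missing'}")
```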


<h2>Running <i>kiara</i></h2>

To use *kiara*, we need to create a `KiaraAPI` instance. An API (application programming interface) lets us control and interact with *kiara* and its functions. In *kiara*, it also lets us find out what can be done to our data, and what is happening to it, as we go.

<span style="color:blue">\* Note: I'm calling the API 'kiara' at the moment because I think it simplifies it for people who don't know what an API is/how it functions and might get confused by the variable name. This keeps it grounded in 'kiara' (whilst indicating it's an API for those who want to know/already know more) - the first instance of kiara isn't used again in this notebook, but we can always reverse this if it makes things more complicated for follow-on notebooks!</span>

In [None]:
from kiara import KiaraAPI

kiara = KiaraAPI.instance()

Now that we have an API in place, we can get more information about what we can do in *kiara*. Let's start by asking *kiara* to list all the operations that come with the plugins we just installed.

The documentation for each of the `KiaraAPI` functions used in this notebook can be found [here](https://dharpa.org/kiara/latest/reference/kiara/interfaces/python_api/__init__/#kiara.interfaces.python_api.KiaraAPI).

In [None]:
kiara.list_operation_ids()

<h3>Downloading Files</h3>

Great, now we know the different kinds of operations we can use with *kiara*. Let's start by bringing some files into our notebook, using the `download.file` operation.<br/>
First we want to find out what this operation does and, just as importantly, what inputs it needs to work.

In [None]:
kiara.retrieve_operation_info('download.file')

So from this, we know that `download.file` will download a single file from a remote location for us to use in *kiara*. <br/>
We need to give the function a **url** and, if we want, a **file name**. These are the <span style="color:green">**inputs**</span>. <br/>
In return, we will get the **file** and **metadata** about the file as our <span style="color:red">**outputs**</span>.
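To make the idea of inputs and outputs concrete, here is a plain-Python analogue of what `download.file` does conceptually. It is not *kiara* code: the function name `download_file` and the output keys `file` and `metadata` are assumptions chosen to mirror the description above, and *kiara*'s real output field names may differ.

```python
import os
import urllib.request
from urllib.parse import urlparse


def download_file(url, file_name=None):
    """Hypothetical plain-Python analogue of the 'download.file' operation.

    Inputs mirror the ones described above: a url and an optional file name.
    Returns an outputs-style dict holding the file contents and some metadata.
    """
    with urllib.request.urlopen(url) as response:
        content = response.read()
    if file_name is None:
        # Fall back to the last part of the URL path, e.g. 'JournalNodes1902.csv'
        file_name = os.path.basename(urlparse(url).path)
    metadata = {"url": url, "file_name": file_name, "size_in_bytes": len(content)}
    return {"file": content, "metadata": metadata}
```

Called with the sample data URL used below, this would return the raw bytes of the CSV file plus its name and size. In the notebook itself you keep using `kiara.run_job`, which also lets *kiara* keep track of the job for you.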

Let's give this a go using some *kiara* sample data.

First we define our <span style="color:green">inputs</span>, then use `kiara.run_job` with our chosen operation, `download.file`, and save the result as our <span style="color:red">outputs</span>.

In [None]:
inputs = {
    "url": "https://raw.githubusercontent.com/DHARPA-Project/kiara.examples/main/examples/data/journals/JournalNodes1902.csv",
    "file_name": "JournalNodes1902.csv"
}

outputs = kiara.run_job('download.file', inputs=inputs)

Let's print out our <span style="color:red">outputs</span> and see what that looks like.

In [None]:
outputs

Great! We've successfully downloaded the file, and we can see there's lots of information here.

At the moment, we're most interested in the **file** output. This contains the actual *contents* of the file that we have just downloaded.

Let's store it in its own variable so we can use it later.

In [None]:
downloaded_file = outputs['file']

<h3>New Formats: Creating and Converting</h3>

What next? We could transform the downloaded file contents into a different format. <br/>
Let's go back to the operation list from earlier, and look for an operation that creates something new out of our file.

In [None]:
kiara.list_operation_ids('create')
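Passing `'create'` narrows the list down. Assuming the filter behaves like a simple substring match on the operation ID (an assumption, not something this notebook confirms), the idea can be sketched in plain Python using the three operation IDs that appear in this notebook. The helper `filter_operation_ids` is hypothetical.

```python
def filter_operation_ids(operation_ids, filter_term):
    """Return the IDs that contain filter_term, sorted (assumed substring match)."""
    return sorted(op_id for op_id in operation_ids if filter_term in op_id)


# The three operation IDs used in this notebook.
known_ids = ["download.file", "create.table.from.file", "query.table"]

print(filter_operation_ids(known_ids, "create"))
print(filter_operation_ids(known_ids, "table"))
```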

Our file was originally in CSV format, so let's make a table using `create.table.from.file`.
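Conceptually, turning a CSV file into a table means reading its header row as column names and every following line as a row. A minimal plain-Python sketch using the standard `csv` module is below; the helper `table_from_csv_text` is hypothetical, and the two-row sample is invented for illustration (it borrows the `City` and `JournalType` column names from the queries later on) rather than being the real contents of `JournalNodes1902.csv`.

```python
import csv
import io


def table_from_csv_text(text):
    """Parse CSV text into column names plus a list of rows (one dict per row)."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)  # reading the rows also fills reader.fieldnames from the header
    return {"columns": reader.fieldnames, "rows": rows}


# Invented sample, not the real JournalNodes1902.csv.
sample = "City,JournalType\nBerlin,general medicine\nVienna,psychiatry\n"

table = table_from_csv_text(sample)
print(table["columns"])
print(len(table["rows"]), "rows")
```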

Just like when we used `download.file`, we can double check what this does, and what <span style="color:green">inputs</span> and <span style="color:red">outputs</span> this involves.

This time, we're also going to store the operation ID in a variable. This is especially handy if the operation has a long name, or if you want to use the same operation more than once without retyping it.

In [None]:
op_id = 'create.table.from.file'

kiara.retrieve_operation_info(op_id)

Great, we have all the information we need now.

Let's go again.

First we define our <span style="color:green">inputs</span>, the downloaded file we saved earlier.

Then use `kiara.run_job` with our chosen operation, this time stored as `op_id`.

Once this is saved as our <span style="color:red">outputs</span>, we can print it out.

In [None]:
inputs = {
    "file": downloaded_file
}

outputs = kiara.run_job(op_id, inputs=inputs)

outputs

This has done exactly what we wanted, and shown the contents of the downloaded file as a table. But we are also interested in some general (mostly internal) information and metadata about the new table we have just created, rather than about the original file.

Let's have a look.

In [None]:
outputs_table = outputs['table']

outputs_table

<h3>Querying our Data</h3>

Now that we have downloaded our file and converted it into a table, we want to actually explore it.

To do this, we can query the table using **SQL** and some functions already included in *kiara*.

Let's take another look at that operation list, this time looking for functions that let us 'query'.


In [None]:
kiara.list_operation_ids('query')

We already know our file has been converted into a table, so let's have a look at `query.table`.

In [None]:
kiara.retrieve_operation_info('query.table')

So from this information, we only need to provide the **table** itself, and our **query**.

Let's find out which of these journals were published in Berlin.

In [None]:
inputs = {
    "table": outputs_table,
    "query": "SELECT * from data where City like 'Berlin'"
}

outputs = kiara.run_job('query.table', inputs=inputs)

outputs

The operation has returned a table containing only the rows that match our SQL query.
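`SELECT *` returns the matching rows rather than a number. If you want to know how many journals match, you can wrap the same condition in `COUNT(*)`. The sketch below tries both queries against an in-memory SQLite database with a table named `data`, mirroring the table name used in the queries above. The rows are invented, and SQLite is only a stand-in: the database engine behind `query.table` may behave differently (for example, SQLite's `LIKE` is case-insensitive for ASCII letters, which is not true of every engine).

```python
import sqlite3

# Invented rows standing in for the journal table.
ROWS = [
    ("Berlin", "general medicine"),
    ("Berlin", "psychiatry"),
    ("Vienna", "general medicine"),
]

conn = sqlite3.connect(":memory:")
# Name the table 'data' to match the queries used in this notebook.
conn.execute("CREATE TABLE data (City TEXT, JournalType TEXT)")
conn.executemany("INSERT INTO data VALUES (?, ?)", ROWS)

# The notebook's query: returns every matching row.
berlin_rows = conn.execute("SELECT * FROM data WHERE City LIKE 'Berlin'").fetchall()

# Counting instead: returns a single number.
berlin_count = conn.execute(
    "SELECT COUNT(*) FROM data WHERE City LIKE 'Berlin'").fetchone()[0]

print(berlin_rows)
print(berlin_count)
```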

Let's narrow this further, and find all the journals that are just about general medicine and published in Berlin.

We can reuse the `query.table` operation on the table we've just made, which is stored in `outputs['query_result']`.

In [None]:
inputs = {
    "table" : outputs['query_result'],
    "query" : "SELECT * from data where JournalType like 'general medicine'"
}

outputs = kiara.run_job('query.table', inputs=inputs)

outputs
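Running a second query on the result of the first gives the same rows as a single query that combines both conditions with `AND`. The SQLite sketch below checks this on invented rows, using a hypothetical helper `load_table`. Keeping the steps separate, as this notebook does, means each intermediate table is its own output that *kiara* can track.

```python
import sqlite3

# Invented rows for illustration only.
ROWS = [
    ("Berlin", "general medicine"),
    ("Berlin", "psychiatry"),
    ("Vienna", "general medicine"),
]


def load_table(rows):
    """Hypothetical helper: load rows into a fresh in-memory table called 'data'."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE data (City TEXT, JournalType TEXT)")
    conn.executemany("INSERT INTO data VALUES (?, ?)", rows)
    return conn


# Two chained queries: the second runs on the result of the first, which is
# loaded as a new 'data' table, much like passing outputs['query_result'] on.
first = load_table(ROWS).execute(
    "SELECT * FROM data WHERE City LIKE 'Berlin'").fetchall()
chained = load_table(first).execute(
    "SELECT * FROM data WHERE JournalType LIKE 'general medicine'").fetchall()

# The same result from one query with both conditions combined.
combined = load_table(ROWS).execute(
    "SELECT * FROM data WHERE City LIKE 'Berlin' "
    "AND JournalType LIKE 'general medicine'").fetchall()

print(chained)
print(sorted(chained) == sorted(combined))
```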

<h3>Recording and Tracing our Data</h3>

We've made quite a few changes to this table, so let's double-check the information about the new table our queries have created.

In [None]:
query_output = outputs['query_result']

query_output

Looks good!

We might have changed things around, but we can still get lots of information about all our data.

More importantly, *kiara* is able to trace all of these changes, tracking the inputs and outputs and giving them all different identifiers, so you know exactly what has happened to your data. <br/>Check it out!

In [None]:
query_output.lineage

Even though we are only actually asking for the **data lineage** using the *last* SQL query and the table it made, *kiara* shows us everything that has happened since we first downloaded the file. This helps us keep an eye on the research process *and* the changes we are making to the data at the same time!
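One way to picture how this kind of tracing can work is that every new value remembers which values it was made from, so you can walk backwards from the final result to the original download. The toy model below is a hypothetical illustration of that idea only: `TrackedValue`, `run_step` and `lineage_of` are invented names and do not reflect how *kiara* actually implements lineage.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class TrackedValue:
    """Hypothetical stand-in for a tracked value; not kiara's own class."""
    operation: str                                # operation that produced this value
    parents: dict = field(default_factory=dict)   # input name -> TrackedValue it came from
    value_id: str = field(default_factory=lambda: str(uuid.uuid4()))  # unique identifier


def run_step(operation, **parents):
    """Record a job: the new value keeps references to the values it was built from."""
    return TrackedValue(operation=operation, parents=parents)


def lineage_of(value, depth=0):
    """Walk back through the parents, returning (depth, operation) pairs."""
    steps = [(depth, value.operation)]
    for parent in value.parents.values():
        steps.extend(lineage_of(parent, depth + 1))
    return steps


# Re-create the chain of steps from this notebook.
downloaded = run_step("download.file")
table = run_step("create.table.from.file", file=downloaded)
berlin = run_step("query.table", table=table)
general_medicine = run_step("query.table", table=berlin)

for depth, operation in lineage_of(general_medicine):
    print("  " * depth + operation)
```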

<h3>What next...?</h3>

Well done! You've completed the first notebook: you've installed *kiara*, downloaded a file, tried out some operations, and seen how *kiara* records what they do to your data.

Now you can check out the other plugin packages to explore how this helps you manage and trace your data while using digital analysis tools!