# Tutorial 2 - Filtering and Sorting with Expressions

Welcome to the Filtering & Sorting Tutorial.
You will learn how to filter and sort requested data with the help of *IBM SMF Explorer*.


## Getting started

Initialize a Context for the dataset that you want to work with.

In [None]:
import smfexplorer
from datetime import datetime, time
from smfexplorer.fields import SMF70S1

DATASET = "YOUR.SMF.DATA"

ctx = smfexplorer.new_context(DATASET)

## Filtering 

*IBM SMF Explorer* uses Python's built-in operators for filtering.

As discussed in the previous tutorial (see *Tutorial 1 - Basics*), we provide functions that can be used in a request chain.
One such function is ``where()``, which filters the returned data.
``where()`` takes an expression built with Python operators.

> **Note**: you can also filter with pandas (see [pandas documentation](https://pandas.pydata.org/docs/) and *Tutorial 3* for more examples).
> The advantage of the ``where()`` method is that *IBM SMF Explorer* can use the provided information to reduce the amount of data that is extracted from the underlying dataset.
> Consequently, performance increases and network load decreases.

Available comparison operators:

Operator | Description
:---|:---
``>`` | Greater than 
``>=`` | Greater than or equal to 
``<`` | Less than 
``<=`` | Less than or equal to 
``==`` | Equal to 
``!=`` | Not equal to 
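
To make the idea of passing an expression to ``where()`` more concrete, here is a minimal sketch of how a field object can overload Python's comparison operators so that ``field > 1`` yields a deferred condition instead of a boolean. It is an illustration only, not the implementation of *IBM SMF Explorer*: the ``Field`` and ``Condition`` classes, the ``matches()`` method, and the sample rows are all hypothetical.

```python
import operator


class Condition:
    """A deferred comparison that can be shown as text or evaluated on a row."""

    def __init__(self, left, symbol, func, right):
        self.left, self.symbol, self.func, self.right = left, symbol, func, right

    def __repr__(self):
        return f"{self.left!r} {self.symbol} {self.right!r}"

    def matches(self, row):
        # A Field operand is looked up in the row; anything else is a literal.
        def value(operand):
            return row[operand.name] if isinstance(operand, Field) else operand

        return self.func(value(self.left), value(self.right))


class Field:
    """Hypothetical stand-in for a field object such as SMF70S1.lpar_cpu_count."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return self.name

    # Each comparison returns a Condition object rather than True/False.
    def __gt__(self, other):
        return Condition(self, ">", operator.gt, other)

    def __ge__(self, other):
        return Condition(self, ">=", operator.ge, other)

    def __lt__(self, other):
        return Condition(self, "<", operator.lt, other)

    def __le__(self, other):
        return Condition(self, "<=", operator.le, other)

    def __eq__(self, other):
        return Condition(self, "==", operator.eq, other)

    def __ne__(self, other):
        return Condition(self, "!=", operator.ne, other)


lpar_cpu_count = Field("lpar_cpu_count")
lpar_name = Field("lpar_name")
system_name = Field("system_name")

# Made-up sample rows, not real SMF data.
rows = [
    {"lpar_name": "SYS1", "system_name": "SYS1", "lpar_cpu_count": 4},
    {"lpar_name": "LP02", "system_name": "SYS1", "lpar_cpu_count": 1},
]

more_than_one = lpar_cpu_count > 1      # field compared with a literal
same_name = lpar_name == system_name    # field compared with another field

print(more_than_one)
print([row["lpar_name"] for row in rows if more_than_one.matches(row)])
print([row["lpar_name"] for row in rows if same_name.matches(row)])
```

Because the comparison is captured as an object rather than evaluated immediately, a library built this way could inspect the condition before loading any data, which is the kind of benefit described in the note above.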


In the following example, we use ``where()`` to narrow down the result to the records in which an LPAR has more than one processor available:

In [None]:
# fetching full data
df_not_filtered = ctx.samples.lpar_information().run()
display(df_not_filtered)

# fetching reduced data
df_filtered = ctx.samples.lpar_information().where(SMF70S1.lpar_cpu_count > 1).run()
display(df_filtered)

*IBM SMF Explorer* can also compare two fields with each other:

In [None]:
# fetching reduced data based on field comparison
df_fields = (
    ctx.samples.lpar_information().where(SMF70S1.lpar_name == SMF70S1.system_name).run()
)
display(df_fields)

### Logical operators

For more complex conditions, *IBM SMF Explorer* provides three logical operators:

Operator | Description
:---|:---
``&`` | Logical AND
``\|`` | Logical OR
``~`` | Logical NOT

Chaining multiple `where()` calls is equivalent to a logical **and**.

Below, we fetch the records **where** the LPAR name is identical to the system name **and** the LPAR CPU count is greater than 5.

In [None]:
# chain two where() calls
ctx.samples.lpar_information().where(SMF70S1.lpar_name == SMF70S1.system_name).where(
    SMF70S1.lpar_cpu_count > 5
).run()

In [None]:
# use a single AND expression instead of chaining where() calls
ctx.samples.lpar_information().where(
    (SMF70S1.lpar_name == SMF70S1.system_name) & (SMF70S1.lpar_cpu_count > 5)
).run()

The following example uses the logical **or** operator.

In [None]:
ctx.samples.lpar_information().where(
    (SMF70S1.lpar_cpu_count == 5) | (SMF70S1.lpar_cpu_count == 6)
).run()
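
The examples above wrap every comparison in parentheses before combining it with ``&`` or ``|``. In Python, these operators bind more tightly than ``==`` or ``>``, so leaving the parentheses out changes the meaning of the expression. The following sketch demonstrates this with plain integers, and then uses a small hypothetical ``Predicate`` class (not part of *IBM SMF Explorer*) to show how ``&``, ``|`` and ``~`` combine conditions and why chaining two filters gives the same result as a single AND. The rows are made-up values.

```python
# Operator precedence: | is evaluated before ==.
x, y = 5, 0
with_parens = (x == 5) | (y == 6)    # True | False
without_parens = x == 5 | y == 6     # parsed as x == (5 | y) == 6


class Predicate:
    """Hypothetical wrapper around a row -> bool function supporting &, | and ~."""

    def __init__(self, func):
        self.func = func

    def __call__(self, row):
        return self.func(row)

    def __and__(self, other):
        return Predicate(lambda row: self(row) and other(row))

    def __or__(self, other):
        return Predicate(lambda row: self(row) or other(row))

    def __invert__(self):
        return Predicate(lambda row: not self(row))


def select(predicate, data):
    """Return the CPU counts of all rows accepted by the predicate."""
    return [row["cpus"] for row in data if predicate(row)]


rows = [{"cpus": n} for n in (1, 5, 6, 8)]

is_five = Predicate(lambda row: row["cpus"] == 5)
is_six = Predicate(lambda row: row["cpus"] == 6)
above_one = Predicate(lambda row: row["cpus"] > 1)

either = select(is_five | is_six, rows)        # OR
neither = select(~(is_five | is_six), rows)    # NOT applied to an OR
both = select(above_one & ~is_six, rows)       # AND combined with NOT

# Applying two filters one after the other keeps the same rows as one AND.
first_pass = [row for row in rows if above_one(row)]
chained = select(~is_six, first_pass)

print(with_parens, without_parens)
print(either, neither, both, chained)
```

The same precedence rule applies to any Python object that overloads these operators, which is why the parenthesized form is the safe default in ``where()`` expressions.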

## Sorting

To sort results, *IBM SMF Explorer* provides the chain method `sort()`, which takes any number of sort expressions.
A sort expression is created with the `ASC` and `DESC` functions from the `smfexplorer` module.
`ASC` and `DESC` tell *IBM SMF Explorer* to sort a given field in ascending or descending order, respectively.
The priority of each sort expression is determined by its position in the `sort()` call (i.e., the first expression is the most important).
When no order is explicitly specified, the default is `ASC`.
*IBM SMF Explorer* sorts some fields (e.g., timestamp) by default.
Any sorting condition that you specify takes priority over this default behaviour.
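
The priority rule can be illustrated without the library. The sketch below sorts a few made-up rows (plain dictionaries, not real SMF data) by `lpar_cpu_count` ascending and then by `lpar_number` descending. The names `multi_sort` and `sort_spec` are hypothetical helpers for this illustration and say nothing about how `sort()` is implemented.

```python
from operator import itemgetter

# Made-up sample rows.
rows = [
    {"lpar_number": 1, "lpar_cpu_count": 4},
    {"lpar_number": 2, "lpar_cpu_count": 2},
    {"lpar_number": 3, "lpar_cpu_count": 4},
    {"lpar_number": 4, "lpar_cpu_count": 2},
]

# (key, descending) pairs, most important first:
# lpar_cpu_count ascending, then lpar_number descending.
sort_spec = [("lpar_cpu_count", False), ("lpar_number", True)]


def multi_sort(data, spec):
    """Sort by several keys, where earlier entries in spec take priority."""
    result = list(data)
    # Python's sort is stable, so sorting by the least important key first
    # and the most important key last produces the combined ordering.
    for key, descending in reversed(spec):
        result.sort(key=itemgetter(key), reverse=descending)
    return result


ordered = multi_sort(rows, sort_spec)
order = [(row["lpar_cpu_count"], row["lpar_number"]) for row in ordered]
print(order)
```

Rows with the same `lpar_cpu_count` stay grouped together, and only within each group does the second expression decide the order.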

In [None]:
from smfexplorer import ASC, DESC

# sort from the lowest value to the highest value
# head() returns the first 5 rows of the table
display(ctx.samples.lpar_information().sort(ASC(SMF70S1.lpar_cpu_count)).run().head())

# sort from the highest value to the lowest value
display(ctx.samples.lpar_information().sort(DESC(SMF70S1.lpar_cpu_count)).run().head())

# sort by CPU count (ascending), then by LPAR number (descending)
display(
    ctx.samples.lpar_information()
    .sort(ASC(SMF70S1.lpar_cpu_count), DESC(SMF70S1.lpar_number))
    .run()
    .head()
)