# Data Manipulation with Kotlin

In [2]:
%use dataframe

## Handling `null` values

In many datasets and data analysis applications, **missing data** is common.

Thanks to Kotlin's nullable types, we can declare a column whose values may be `null`:

In [15]:
val col by columnOf<String?>("a", "b", null)
col

We can then check, for each row, whether the value is missing, similar to pandas `DataFrame.isnull()` (note that `isNullOrEmpty()` also flags empty strings, not only `null`):

In [17]:
col.map { it.isNullOrEmpty() }
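
The difference between a strict null check and `isNullOrEmpty()` only shows up when the data contains empty strings. A minimal sketch in plain Kotlin, without DataFrame; the list `rawValues` and the two mask names are hypothetical and used only for illustration:

```kotlin
// Hypothetical sample: one regular value, one empty string, one null.
val rawValues: List<String?> = listOf("a", "", null)

// Strict null check: only the actual null is flagged.
val nullMask: List<Boolean> = rawValues.map { it == null }

// isNullOrEmpty() additionally flags the empty string.
val nullOrEmptyMask: List<Boolean> = rawValues.map { it.isNullOrEmpty() }
```

Which of the two is appropriate depends on whether an empty string should count as missing in the dataset at hand.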

The great advantage of using Kotlin in this kind of situation is that we have complete control over how missing data is handled. Unlike Python, Kotlin tracks nullability in the type system and provides built-in operators for handling null values, so a `NullPointerException` can be avoided as long as the developer keeps the code null-safe with *safe call operators* (`?.`) or explicit null checks.
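
As a small illustration of these null-handling tools, here is a sketch in plain Kotlin. The function names are hypothetical helpers, not part of any library:

```kotlin
// Safe call: evaluates to null instead of throwing when `s` is null.
fun lengthOrNull(s: String?): Int? = s?.length

// Elvis operator: supplies a default for the missing case.
fun lengthOrZero(s: String?): Int = s?.length ?: 0

// Explicit null check: inside the first branch, `s` is smart-cast to String.
fun describe(s: String?): String =
    if (s != null) "value of length ${s.length}" else "missing"
```

Calling `s.length` directly on a `String?` does not compile, which is what pushes the null handling to the call site.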

Moreover, DataFrame offers a set of methods for filtering or filling `null` values.

In [57]:
val df = dataFrameOf(
    "0" to listOf(1.0, null, null, 2.0),
    "1" to listOf(3.5, 6.0, 4.0, null),
    "2" to listOf(1.0, null, 9.6, 10.0)
)

df

By default, the method `dropNulls()` drops all the *rows* that contain a null value. 

In [59]:
df.dropNulls()

It is important to note that DataFrame provides three methods for dropping missing values:
- `dropNulls()`: drops rows with `null` values
- `dropNaNs()`: drops rows with `Double.NaN` or `Float.NaN` values
- `dropNA()`: drops rows with `null`, `Double.NaN` or `Float.NaN` values
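
To make the distinction between the three notions of "missing" concrete, the following plain-Kotlin sketch applies each one to a list. The predicate names and `sample` are hypothetical; this illustrates the semantics described in the list above and is not how the library is implemented:

```kotlin
// Hypothetical predicates mirroring what each method treats as missing.
fun isNullValue(v: Double?): Boolean = v == null
fun isNaNValue(v: Double?): Boolean = v != null && v.isNaN()
fun isNA(v: Double?): Boolean = v == null || v.isNaN()

val sample: List<Double?> = listOf(1.0, null, Double.NaN, 2.0)

// Analogous to dropNulls(): the NaN entry survives.
val withoutNulls = sample.filterNot { isNullValue(it) }

// Analogous to dropNaNs(): the null entry survives.
val withoutNaNs = sample.filterNot { isNaNValue(it) }

// Analogous to dropNA(): only regular numbers survive.
val withoutNA = sample.filterNot { isNA(it) }
```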

For each method, we can choose which columns to check and whether a row is dropped when any or all of the checked values are missing, for example:

In [62]:
df.dropNA(whereAllNA = true) // removes the rows where ALL values are null or NaN

In [64]:
df.dropNA("0") // drops all rows that have null or NaN in column "0"

In [67]:
// remove rows where both column "0" and column "2" are null or NaN
df.dropNA(whereAllNA = true) { "0" and "2" }
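
The "any" versus "all" behavior can be sketched in plain Kotlin on the same values as the `df` defined earlier, laid out row by row. The names `missing`, `rows`, and the `kept...` values are hypothetical, and the sketch only mimics the semantics rather than calling the library:

```kotlin
fun missing(v: Double?): Boolean = v == null || v.isNaN()

// Rows of the earlier data frame; positions correspond to columns "0", "1", "2".
val rows: List<List<Double?>> = listOf(
    listOf(1.0, 3.5, 1.0),
    listOf(null, 6.0, null),
    listOf(null, 4.0, 9.6),
    listOf(2.0, null, 10.0),
)

// Default: a row is dropped if ANY checked value is missing.
val keptWhenAny = rows.filterNot { row -> row.any { missing(it) } }

// whereAllNA = true: a row is dropped only if ALL checked values are missing.
val keptWhenAll = rows.filterNot { row -> row.all { missing(it) } }

// whereAllNA = true restricted to columns "0" and "2" (indices 0 and 2).
val keptWhenAllIn0And2 =
    rows.filterNot { row -> listOf(row[0], row[2]).all { missing(it) } }
```

With this data, the default keeps only the first row, `whereAllNA = true` over all columns keeps every row, and restricting the check to columns "0" and "2" drops only the second row.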

Instead of dropping null values, we may need to fill in the missing values. DataFrame offers an API for this that is similar to the one in pandas.

In [100]:
var df = dataFrameOf("a", "b", "c").randomDouble(7)

In [102]:
df = df.update { a }.at(1..4).with { Double.NaN }
  .update { b }.at(1, 2).with { Double.NaN }
df

In [103]:
df.fillNA { all() }.withZero()

pandas offers the filling methods `ffill` and `bfill`, which fill missing values with the preceding or the following row's value, respectively.

We can simulate the `ffill` behavior with:

In [117]:
df.fillNA { all() }.perRowCol { row, col -> row.prev()?.get(col) }

The example above does not take into account the new values computed for earlier rows, so in a run of consecutive missing values only the first one is filled. If we want to fill ALL missing values in a column with the nearest preceding non-missing value, we must specify the column we want to modify and use the method `newValue()`.

In [120]:
df.fillNA { a }.with { prev()?.newValue() }
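
The difference between the two approaches can be sketched on a plain list. The function names are hypothetical; `fillFromPrevious` mimics reading the original previous value, while `forwardFill` mimics reading the already updated one:

```kotlin
fun isMissing(v: Double?): Boolean = v == null || v.isNaN()

// One-step fill: a missing value takes the ORIGINAL value of the previous row,
// so a run of consecutive gaps is only repaired at its first position.
fun fillFromPrevious(column: List<Double?>): List<Double?> =
    column.mapIndexed { i, v ->
        if (isMissing(v)) column.getOrNull(i - 1) else v
    }

// Cascading forward fill: carries the last non-missing value forward.
fun forwardFill(column: List<Double?>): List<Double?> {
    var last: Double? = null
    return column.map { v ->
        if (isMissing(v)) {
            last ?: v // nothing to carry yet: leave the gap as it is
        } else {
            last = v
            v
        }
    }
}

val gappy: List<Double?> = listOf(1.0, null, null, 4.0)
val oneStep = fillFromPrevious(gappy)   // second gap is still missing
val cascaded = forwardFill(gappy)       // both gaps are filled with 1.0
```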

With `fillNA` (or `fillNulls` or `fillNaNs`) you can compute the replacement with any function, passed for example to `with` or `perCol`. The example below uses the column mean (remember to pass `skipNA = true` when computing the mean, otherwise the `NaN` values make the mean itself `NaN`):

In [133]:
df.fillNaNs { colsOf<Double>() }
    .perCol { it.mean(skipNA = true) }
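
To see why skipping `NaN` matters when computing the mean, here is a plain-Kotlin sketch of the same idea on a single column. The helper names are hypothetical and the code does not use DataFrame:

```kotlin
// Mean of the non-NaN entries; evaluates to NaN when every entry is NaN.
fun meanSkippingNaN(column: List<Double>): Double =
    column.filterNot { it.isNaN() }.average()

// Replaces each NaN with the mean of the remaining values.
fun fillNaNWithMean(column: List<Double>): List<Double> {
    val mean = meanSkippingNaN(column)
    return column.map { if (it.isNaN()) mean else it }
}

val measurements = listOf(1.0, Double.NaN, 3.0)
val naiveMean = measurements.average()            // NaN propagates through the sum
val filledMeasurements = fillNaNWithMean(measurements)
```

Without the filtering step, the naive mean is itself `NaN`, so filling with it would leave the column unchanged.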

## Data Transformation