# Daany Developer Guide - Part 1: DataFrame - a cross-platform .NET library for analysis and transformation of tabular data

Try the notebook by using [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/bhrnjica/notebooks/master)

In [2]:
//Nuget package installation
#r "nuget:Daany.DataFrame,1.3.0"
#r "nuget:Daany.DataFrame.Ext,1.3.0"
#r "nuget: Daany.Stat,1.1.0"
#r "nuget: Daany.LinA.win-x64,1.3.0"

//Plot capabilities
#r "nuget: XPlot.Plotly.Interactive"


In [3]:
//using statements for the Daany packages
using System;
using System.Collections.Generic;
using System.Linq;
using Daany;
using Daany.MathStuff;
using Daany.Ext;

//Plot support
using XPlot.Plotly;
//custom display implementation
using static Microsoft.DotNet.Interactive.Formatting.PocketViewTags;
using Microsoft.AspNetCore.Html;
using Microsoft.DotNet.Interactive.Formatting;
using static System.Diagnostics.Debug;
using System.Globalization;

Formatter.Register<DataFrame>((df, writer) =>
{
    var headers = new List<IHtmlContent>();

    headers.Add(th(i($"({df.Index.Name})")));
    headers.AddRange(df.Columns.Select(c => (IHtmlContent) th(c)));
    
    //renders the rows
    var rows = new List<List<IHtmlContent>>();
    var take = 20;
    
    //render at most `take` rows
    for (var i = 0; i < Math.Min(take, df.RowCount()); i++)
    {
        var cells = new List<IHtmlContent>();
        cells.Add(td(df.Index[i]));
        foreach (var obj in df[i])
        {
            cells.Add(td(obj));
        }
        rows.Add(cells);
    }
    
    var t = table(
        thead(
            headers),
        tbody(
            rows.Select(
                r => tr(r))));
    
    writer.Write(t);
}, "text/html");


```Daany``` – .NET DAta ANalYtics library 
====================================



![Daany - .NET DAta ANalYtics library ](img/daany_logo_small.png)

### Summary

`Daany` is a cross-platform .NET library for data analytics and linear algebra, written in C\#. It is intended as a tool for data preparation, feature engineering and other kinds of data transformation. The library is implemented on top of .NET Standard 2.1 and supports .NET Core 3.0 and above. It is split into several Visual Studio projects, each of which can be installed separately as a NuGet package. The core component is `DataFrame`, which is extended with a set of data science and linear algebra features. The library contains several implementations of time series decomposition (SSA, STL, ARIMA) and optimization methods (SGD), as well as plotting support. It also implements a set of features based on matrices, vectors and similar linear algebra operations. The main part of the library is `Daany.DataFrame`, whose design is similar to the data frame found in the Python-based pandas library.

Introduction
=======================

`Daany` is a .NET data analytics library written in `C#` that supports various kinds of data transformation, descriptive statistics and linear algebra. With `Daany`, a user can load data from a text-based file into a `DataFrame` arranged into columns, rows and an index. The user can also create a `Series` object, a special kind of `Daany.DataFrame`, in order to work with time series data. Once the data is loaded, the user can analyze it by performing various transformations, and the results can be displayed as charts or tabular data.

The library implements the `Daany.MathStuff` module, which consists of mathematical operations on matrices and vectors as well as a rich set of statistical distributions and parameters. Furthermore, `Daany.LinA` extends it with better performance and additional functionality. `Daany.LinA` is a .NET wrapper around the native LAPACK and BLAS libraries. Besides data analysis, the library implements a set of statistics and data science features, e.g. time series decomposition, optimization, performance parameters and similar.

Currently the ```Daany``` project consists of four main components, each of which can be installed separately as a NuGet package:

-   ```Daany.DataFrame```,
-   ```Daany.DataFrame.Ext```,
-   ```Daany.Stat```, and
-   ```Daany.LinA```.

### Daany Architecture

`Daany` is a classic .NET component implemented as several Visual Studio projects. The library is based on .NET and the Intel MKL implementation of the LAPACK and BLAS libraries. The architecture of the library is shown in the following figure.

![](img/daany_architecture_diagram.jpg)


The project grew out of my need to have a set of data transformation features in one library while working on machine learning, and I thought it might help others as well. The library now offers a broad set of data transformation features and might become your number one data analytics library on the .NET platform. Contributions to the project are welcome.

How to start with Daany
=======================

```Daany``` is a 100% .NET Core component and can run on any platform .NET Core supports, from Windows x86/x64 to Mac or Linux based OS. It can be used from Visual Studio or Visual Studio Code. It consists of four NuGet packages, so
the easiest way to start with it is to install the packages in your .NET
application. Within Visual Studio, create or open your .NET application and open
the NuGet packages window. Type ```Daany``` in the browse edit box and hit enter. You will
find four packages starting with Daany. There are a few options for installing the
packages.

1.  Install ```Daany.DataFrame``` only. *Use this option if you only want data
    analysis by using the data frame. Once you click the Install button, Daany.DataFrame
    and Daany.Math will be installed into your project.*

2.  Install the ```Daany.Stat``` package. This package already contains ```DataFrame```, as well as time series decomposition and related statistics features.

![](img/daany_nuget.png)

Once you install the packages, you can start developing your app using Daany
packages.



Using ```Daany``` as assembly reference
===========================================

Since ```Daany``` has no dependencies on other libraries, you can copy three DLLs and add them as references to your project.

![file explorer](img/daany_file_exp.png)

To do so, clone the project from [http://github.com/bhrnjica/daany](http://github.com/bhrnjica/daany), build it and copy ```Daany.DataFrame.dll```, ```Daany.Math.dll``` and ```Daany.Stat.dll``` to your project as assembly references. The whole project is just 270 KB.


Namespaces in Daany
================================================

The ```Daany``` project contains several namespaces that separate the different
implementations. The following list contains the relevant namespaces:

-   ```using Daany``` – data frame and related code implementation,
-   ```using Daany.Ext``` – data frame extensions, used with a dependency on a third party
    library,
-   ```using Daany.MathStuff``` – math related stuff implemented in Daany,
-   ```using Daany.Optimizers``` – set of optimizers like SGD,
-   ```using Daany.Stat``` – set of statistics implementations in the project,
-   ```using Daany.LinA``` – Intel MKL LAPACK and BLAS routines.

Working with ```Daany.DataFrame```
============================

The main part of the ```Daany``` project is ```Daany.DataFrame```, a C\# implementation of a data frame. A data frame is a software component used for handling tabular data, especially for data preparation, feature engineering and analysis during the development of machine learning models. The ```Daany.DataFrame``` implementation is built around simplicity and .NET coding standards. It represents tabular data consisting of columns and rows. Each column has a name and a type, and each row has its index and label.
By convention, rows run along axis zero, while columns run along axis one.

The following image shows the data frame structure:

![data frame structure](img/daany_data_frame_structure.png)

The basic components of the data frame are:

-   ```header``` - list of column names,
-   ```index```  – list of objects identifying each row,
-   ```data``` – list of values in the data frame,
-   ```missing value``` – data with no values in data frame.

The image above shows the data frame components visually, and how they are
positioned in the data frame.
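To make the four components concrete, here is a minimal conceptual sketch in plain C#. It is not Daany's internal implementation; the names `header`, `index`, `data` and `ValueAt` are hypothetical, and the row-by-row (flat) storage is only an assumption chosen to mirror the way a data frame is later created from a 1D list.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical mini model of a data frame, NOT Daany's actual code.
var header = new List<string> { "ID", "City", "Zip Code" };   // column names
var index  = new List<object> { 0, 1, 2 };                     // one label per row
var data   = new List<object>                                  // values stored row by row
{
    1, "Sarajevo", 71000,
    2, "Seattle",  null,      // null marks a missing value
    3, "Berlin",   10115
};

// Look up the value at (row, column) in the flat, row-major list.
object ValueAt(int row, string column)
{
    int col = header.IndexOf(column);
    if (col < 0) throw new ArgumentException($"Unknown column '{column}'.");
    return data[row * header.Count + col];
}

Console.WriteLine($"rows: {index.Count}, columns: {header.Count}");
Console.WriteLine(ValueAt(2, "City"));              // Berlin
Console.WriteLine(ValueAt(1, "Zip Code") is null);  // True
```

The position of a cell is `row * columnCount + column`, which is why a flat list plus a header is enough to describe a table.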

How to create a ```Daany.DataFrame``` .NET object
-----------------------------------------

There are several options for creating a DataFrame:

-   from a list of values, by specifying column names and row count,
-   from a dictionary, letting keys be column names and values be column values,
-   from a text-based file, where each line represents row values,
-   as the return value of almost any data frame operation.

## Create ```DataFrame``` from a list of data

```Daany.DataFrame``` can be created by passing a 1D list of data and a column header. The following code shows how:


In [4]:
//define a list of data
var lst = new List<object>() 
    { 1, "Sarajevo", 77000, "BiH", true, 3.14, DateTime.Now.AddDays(-20),
      2, "Seattle", 98101, "USA", false, 3.21, DateTime.Now.AddDays(-10),
      3, "Berlin", 10115, "GER", false, 4.55, DateTime.Now.AddDays(-5) };

//define column header for the data frame
var columns = new List<string>() { "ID", "City", "Zip Code","Country", "IsHome","Values", "Date" };

//create data frame with 3 rows and 7 columns
var df = new DataFrame(lst.ToArray(), columns);
//show df
df

(index),ID,City,Zip Code,Country,IsHome,Values,Date
0,1,Sarajevo,77000,BiH,True,3.14,2022-01-20 09:15:17Z
1,2,Seattle,98101,USA,False,3.21,2022-01-30 09:15:17Z
2,3,Berlin,10115,GER,False,4.55,2022-02-04 09:15:17Z


## Create ```DataFrame``` from dictionary

Similarly, ```Daany.DataFrame``` can be created by passing a dictionary collection. The following code shows how to create a data frame from a dictionary:

In [5]:
//define a dictionary of data
var dict = new Dictionary<string, List<object>>
{

    { "ID",new List<object>() { 1,2,3} },
    { "City",new List<object>() { "Sarajevo", "Seattle", "Berlin" } },
    { "Zip Code",new List<object>() { 71000,98101,10115 } },
    { "State",new List<object>() {"BiH","USA","GER" } },
    { "IsHome",new List<object>() { true, false, false} },
    { "Values",new List<object>() { 3.14, 3.21, 4.55 } },
    { "Date",new List<object>() { DateTime.Now.AddDays(-20) ,
    DateTime.Now.AddDays(-10) , DateTime.Now.AddDays(-5) } },

};

//create data frame with 3 rows and 7 columns
var df = new DataFrame(dict);
//show the data frame
df

(index),ID,City,Zip Code,State,IsHome,Values,Date
0,1,Sarajevo,71000,BiH,True,3.14,2022-01-20 09:15:19Z
1,2,Seattle,98101,USA,False,3.21,2022-01-30 09:15:19Z
2,3,Berlin,10115,GER,False,4.55,2022-02-04 09:15:19Z


## Create ```DataFrame``` by loading data from a file

By using the static method ```DataFrame.FromCsv```, a user can create a data frame object
from a ``csv`` file. Conversely, a data frame can be persisted to disk by calling
the static method ```DataFrame.ToCsv```.
The following code uses the static methods ```ToCsv``` and ```FromCsv``` to persist a data frame and load it back:

In [6]:
string filename = "df_file.txt";
//define a dictionary of data
var dict = new Dictionary<string, List<object>>
{
    { "ID",new List<object>() { 1,2,3} },
    { "City",new List<object>() { "Sarajevo", "Seattle", "Berlin" } },
    { "Zip Code",new List<object>() { 71000,98101,10115 } },
    { "State",new List<object>() {"BiH","USA","GER" } },
    { "IsHome",new List<object>() { true, false, false} },
    { "Values",new List<object>() { 3.14, 3.21, 4.55 } },
    { "Date",new List<object>() { DateTime.Now.AddDays(-20) , DateTime.Now.AddDays(-10) , DateTime.Now.AddDays(-5) } },

};

//create data frame with 3 rows and 7 columns
var df1 = new DataFrame(dict);

//first save the data frame to disk
DataFrame.ToCsv(filename, df1);

//then load it back from the file
var dfFromFile = DataFrame.FromCsv(filename, sep:',');
//show data frame
dfFromFile

(index),ID,City,Zip Code,State,IsHome,Values,Date
0,1,Sarajevo,71000,BiH,True,3.14,2022-01-20 09:15:20Z
1,2,Seattle,98101,USA,False,3.21,2022-01-30 09:15:20Z
2,3,Berlin,10115,GER,False,4.55,2022-02-04 09:15:20Z


First, we created a data frame from the dictionary collection. Then we stored the data frame to a file. After saving it successfully, we loaded the same data frame back from the csv file and displayed it to confirm that the round trip preserved the data.
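The same round trip can also be checked programmatically instead of by eye. The following is a minimal sketch for a C# script or notebook cell, using only calls that appear elsewhere in this guide (`ToCsv`, `FromCsv`, `RowCount`, `ColCount`, `Columns`). The file name `df_roundtrip.txt` is a hypothetical choice, and the check deliberately compares only shape and header, because cell types may change when values are parsed back from text.

```csharp
#r "nuget:Daany.DataFrame,1.3.0"
using System;
using System.Collections.Generic;
using System.Linq;
using Daany;

// hypothetical file name, used only for this check
string path = "df_roundtrip.txt";

var source = new DataFrame(new Dictionary<string, List<object>>
{
    { "ID",   new List<object>() { 1, 2, 3 } },
    { "City", new List<object>() { "Sarajevo", "Seattle", "Berlin" } },
});

// save to disk and load back, as in the cell above
DataFrame.ToCsv(path, source);
var loaded = DataFrame.FromCsv(path, sep: ',');

// compare shape and header; throw if the round trip changed either
if (loaded.RowCount() != source.RowCount() || loaded.ColCount() != source.ColCount())
    throw new InvalidOperationException("Row or column count changed.");
if (!loaded.Columns.SequenceEqual(source.Columns))
    throw new InvalidOperationException("Column names changed.");

Console.WriteLine("Round trip preserved shape and header.");
```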

When performance is important, you should pass column types to the `FromCsv` method, which can cut loading time by up to 50%.
For example, the following code loads the data from the file by passing predefined column types:

In [7]:
//define the type of each column
var colTypes1 = new ColType[] { ColType.I32, ColType.IN, ColType.I32, ColType.STR, ColType.I2, ColType.F32, ColType.DT };

//load the data frame using the predefined column types
var dfFromFile01 = DataFrame.FromCsv(filename, sep: ',', colTypes: colTypes1);
dfFromFile01

(index),ID,City,Zip Code,State,IsHome,Values,Date
0,1,Sarajevo,71000,BiH,True,3.14,2022-01-20 09:15:20Z
1,2,Seattle,98101,USA,False,3.21,2022-01-30 09:15:20Z
2,3,Berlin,10115,GER,False,4.55,2022-02-04 09:15:20Z


## Loading data from the web

Data can be loaded directly from web storage by using the `FromWeb` static method. The following code shows how to load the `Concrete Slump Test` data from the web. The data set includes 103 data points. There are 7 input variables and 3 output variables in the data set: `Cement`, `Slag`, `Fly ash`, `Water`, `SP`, `Coarse Aggr.`, `Fine Aggr.`, `SLUMP (cm)`, `FLOW (cm)`, `Strength (Mpa)`.
The following code loads the `Concrete Slump Test` data set into a Daany DataFrame:


In [8]:
//define web url where the data is stored
var url = "https://archive.ics.uci.edu/ml/machine-learning-databases/concrete/slump/slump_test.data";
//load the data set into a data frame
var df2 = DataFrame.FromWeb(url);
df2.Head(5)

(index),No,Cement,Slag,Fly ash,Water,SP,Coarse Aggr.,Fine Aggr.,SLUMP(cm),FLOW(cm),Compressive Strength (28-day)(Mpa)
0,1,273,82,105,210,9,904,680,23,62.0,34.99
1,2,163,149,191,180,12,843,746,0,20.0,41.14
2,3,162,148,191,179,16,840,743,1,20.0,41.81
3,4,162,148,190,179,19,838,741,3,21.5,42.08
4,5,154,112,144,220,10,923,658,20,64.0,26.82


## Other ways to create a ```DataFrame```

Besides the above, a data frame can be created in other ways:

- Create a data frame from an existing one. This method can be used to create a
new data frame with a different number of columns. For example, a new data frame
can be created from an existing one by selecting specific columns:

In [9]:
//now create a new data frame with only three columns
var newDf = df1["City", "Zip Code", "State"];

//show the new data frame
newDf

(index),City,Zip Code,State
0,Sarajevo,71000,BiH
1,Seattle,98101,USA
2,Berlin,10115,GER


- A data frame can be created by using the ```Create``` method and passing tuples of
existing and new column names. For example:

In [10]:
//create data frame with 3 rows and 7 columns
var df3 = DataFrame.FromCsv("data/simple_data_frame.txt", sep: ',',names: null, dformat: "MM/dd/yyyy");

//now create a new data frame with three columns which can be renamed during creation
var newDf1 = df3.Create(("City","Place"), ("Zip Code", null), ("State","Country"));
newDf1

(index),Place,Zip Code,Country
0,Sarajevo,71000,BiH
1,Seattle,98101,USA
2,Berlin,10115,GER


If you want a column name to remain the same, pass ```null``` as the second tuple
item, or repeat the same name.

- It is handy to create an empty data frame with a specific column header. To do so,
use code similar to the following:

In [11]:
var cols = new List<string> { "Place", "Country", "Zip Code", "Values" };

//create empty data frame with 4 columns
var df = DataFrame.CreateEmpty(cols);

//check the size of the data frame
display(df.Shape.ToString());
display(df.Columns.ToList());

(0, 4)

index,value
0,Place
1,Country
2,Zip Code
3,Values


- A new data frame is also returned by almost any operation performed on a data frame, such as sorting, filtering,
grouping, aggregation and similar.

Enumeration of  ```DataFrame```
--------------------------

Enumeration of a data frame means iterating over it row by row.
```Daany.DataFrame``` provides three ways to enumerate:

-   **Strongly typed enumeration** – you provide a class type
    for the enumeration, as well as the mapping logic that converts a data frame row into
    an object of that class. This is done by providing a callback mapping method.

-   **Enumeration by dictionary** – each data frame row is returned as a
    dictionary, where keys are column names and values are the row values.

-   **Enumeration by list** – each data frame row is returned as a list
    of data.

### Strongly typed enumeration

This enumeration is suitable when a data frame needs to be converted into a list of
custom types. Also, in order to convert ```Daany.DataFrame``` into an ```ML.NET``` ```IDataView```, you have to use typed enumeration. The following code shows how a data set previously loaded into ```Daany.DataFrame``` is converted into a strongly typed list, which is the first step towards an ```ML.NET``` ```IDataView```.

Assume we defined the ```Person``` class as follows:


In [12]:
//define class type
class Person
{
    public int ID { get; set; }
    public string City { get; set; }
    public int Zip { get; set; }
    public string State { get; set; }
    public bool IsHome { get; set; }
    public float Values { get; set; }
    public DateTime Date { get; set; }
}

Previously we created a ```Daany.DataFrame``` from the file containing a list of persons. To convert the ```Daany.DataFrame``` into a list of ```Person``` objects, the following code should be implemented:

In [13]:
//create data frame with 3 rows and 7 columns
var df34 = DataFrame.FromCsv($"data/simple_data_frame.txt");

//convert data frame into strongly typed list
List<Person> list = df34.GetEnumerator<Person>((oRow) =>
{
    //convert the row dictionary into a Person object

    var prRow = new Person();
    prRow.ID = Convert.ToInt32(oRow["ID"]);
    prRow.City = Convert.ToString(oRow["City"]);
    prRow.Zip = Convert.ToInt32(oRow["Zip Code"]);
    prRow.State = Convert.ToString(oRow["State"]);
    prRow.IsHome = Convert.ToBoolean(oRow["IsHome"]);
    prRow.Values = Convert.ToSingle(oRow["Values"]);
    prRow.Date = Convert.ToDateTime(oRow["Date"]);
    //
    return prRow;
}).ToList();

//
display(list)

index,ID,City,Zip,State,IsHome,Values,Date
0,1,Sarajevo,71000,BiH,True,3.14,2019-10-17 13:03:40Z
1,2,Seattle,98101,USA,False,3.21,2019-10-27 13:03:40Z
2,3,Berlin,10115,GER,False,4.55,2019-11-01 13:03:40Z


As can be seen, ```GetEnumerator``` takes a delegate as its argument. Each data frame row is passed
into the delegate as the dictionary `oRow`, and the delegate performs the conversion.

### Enumeration by dictionary

In order to enumerate a data frame by dictionary, call the ```GetEnumerator``` method without providing a custom type. The following code uses the previously loaded data frame and applies `Linq` operators to it.


In [14]:
//create data frame with 3 rows and 7 columns
var df42 = DataFrame.FromCsv($"data/simple_data_frame.txt");

//get second data frame row
//row2 is a dictionary with column names as keys
var row2 = df42.GetEnumerator().Skip(1).First();
row2

key,type,value
ID,System.Int32,2
City,System.String,Seattle
Zip Code,System.Int32,98101
State,System.String,USA
IsHome,System.String,False
Values,System.Single,3.21
Date,System.DateTime,2019-10-27 13:03:40Z


### Enumeration by list

A data frame can be enumerated by returning each row as a list. This method can be used when performance is important; otherwise use the previous method. The following code shows how to use enumeration by list.

In [15]:
//create data frame with 3 rows and 7 columns
var df46 = DataFrame.FromCsv($"data/simple_data_frame.txt");

//get the first data frame row
//row46 is a list holding the values of that row
var row46 = df46.GetRowEnumerator().FirstOrDefault();
display(row46)

Selecting data in ```Daany``` data frame
==================================

Accessing data in a data frame can be achieved in many ways, and different kinds of data can be selected from it. You can select a single value, the list of values from a single row, the list of values of a column, or a data frame that is a subset of the existing one. Let's see how a column can be selected from the data frame.

## Column selection

This code sample selects two columns separately from a data frame and converts them
into an array and a list.

In [16]:
//create data frame with 3 rows and 7 columns
var df = DataFrame.FromCsv($"data/simple_data_frame.txt");

//select single columns from the data frame
var cities = df["City"].ToArray();
var zipCodes = df["Zip Code"].ToList();

(cities,zipCodes)

Item1,Item2
"[ Sarajevo, Seattle, Berlin ]","[ 71000, 98101, 10115 ]"


## Two or more columns selection

Selecting more than one column at once returns a data frame. The following code
creates a new data frame from the selected columns:

In [17]:
//create data frame with 3 rows and 7 columns
var df54 = DataFrame.FromCsv($"data/simple_data_frame.txt");

//select two columns from the data frame
var citiesDf = df54["City", "Zip Code"];

//check for values
citiesDf

(index),City,Zip Code
0,Sarajevo,71000
1,Seattle,98101
2,Berlin,10115


## Row selection
Selecting data by row returns the whole row from the data frame. The following code
returns the third row from the existing data frame.

In [18]:
//select third row from data frame
//3, "Berlin", 10115, "GER", false, 4.55, DateTime.Now.AddDays(-5)
var row = df54[2].ToArray();
row

index,type,value
0,System.Int32,3
1,System.String,Berlin
2,System.Int32,10115
3,System.String,GER
4,System.String,False
5,System.Single,4.55
6,System.DateTime,2019-11-01 13:03:40Z


## Data selection

Selecting data is achieved by using the zero-based positions of row and column, or by column name and row position. The following code selects ```City``` from the third row:

In [19]:
//create data frame with 3 rows and 7 columns
var df57 = DataFrame.FromCsv($"data/simple_data_frame.txt");

//select city from the third row
var city = df57[2,1];
var city1 = df57["City", 2];

(city, city1)//the same value with different indexer accessors

Item1,Item2
Berlin,Berlin


# Operations in `Daany.DataFrame`

```Daany.DataFrame```  supports the following operations:

-   Add/Insert Column,
-   AddRows,
-   AddCalculatedColumns,
-   Aggregate,
-   Describe,
-   Drop, DropNA and FillNA,
-   Filter and RemoveRows,
-   SortBy and SortByDescending,
-   GroupBy and Rolling
-   Merge and Join two data frames
-   Select.

In the following sections every feature is presented.

Add/Insert new columns into data frame
--------------------------

Adding one or more new columns to a data frame can be achieved by calling the
```AddColumns``` data frame method. The following code adds two new columns, `Age` and
`Gender`, to an existing data frame:

In [20]:
//create data frame with 3 rows and 7 columns
var df = DataFrame.FromCsv($"data/simple_data_frame.txt");

//define the Age and Gender columns
var newCols = new Dictionary<string, List<object>>()
    {
        { "Age", new List<object>() { 31, 25, 45 } },
        { "Gender", new List<object>() { "male", "female", "male" } } 
    };

//add the columns
var newDf = df.AddColumns(newCols);

newDf

(index),ID,City,Zip Code,State,IsHome,Values,Date,Age,Gender
0,1,Sarajevo,71000,BiH,True,3.14,2019-10-17 13:03:40Z,31,male
1,2,Seattle,98101,USA,False,3.21,2019-10-27 13:03:40Z,25,female
2,3,Berlin,10115,GER,False,4.55,2019-11-01 13:03:40Z,45,male


Similarly, a column can be inserted at any position in the column list. To insert a column somewhere inside the column header, call `InsertColumn`:

In [21]:
//values for the new Age2 column
var newCol= new List<object>() { 131, 125, 145 };

//insert the column at zero-based position 8, between Age and Gender
var newDf01= newDf.InsertColumn("Age2", newCol, 8 );
newDf01

(index),ID,City,Zip Code,State,IsHome,Values,Date,Age,Age2,Gender
0,1,Sarajevo,71000,BiH,True,3.14,2019-10-17 13:03:40Z,31,131,male
1,2,Seattle,98101,USA,False,3.21,2019-10-27 13:03:40Z,25,125,female
2,3,Berlin,10115,GER,False,4.55,2019-11-01 13:03:40Z,45,145,male


Add new rows into data frame
------------------------------------

In order to add one or more rows to an existing data frame, `AddRow` or `AddRows`
should be called, respectively. The following code adds one row to an
existing data frame:

In [22]:
//create data frame with 3 rows and 7 columns
var df = DataFrame.FromCsv($"data/simple_data_frame.txt");

//new row
var newRow = new List<object>() { 4, "London", 11000, "GB", false, 5.55,
DateTime.Now.AddDays(-5) };

//add the row
df.AddRow(newRow);
//check for values
Assert(7==df.ColCount());
Assert(4==df.RowCount());
Assert("GB"== df["State", 3].ToString());
Assert(5.55== (double)df["Values", 3]);

When more than one row should be added to a data frame, we can use the `AddRows`
method by passing a data frame object containing the new rows. That data frame must have the same number of columns as the existing one. Alternatively, several rows can be added by calling `AddRow` in a loop.
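Adding rows in a loop can be sketched as follows for a C# script or notebook cell. The sketch relies only on calls shown earlier in this guide (the dictionary constructor, `AddRow` and `RowCount`); the variable names and sample values are illustrative, and it assumes each new row lists its values in the same order as the columns.

```csharp
#r "nuget:Daany.DataFrame,1.3.0"
using System;
using System.Collections.Generic;
using Daany;

// start with a data frame holding a single row
var cities = new DataFrame(new Dictionary<string, List<object>>
{
    { "ID",   new List<object>() { 1 } },
    { "City", new List<object>() { "Sarajevo" } },
});

// rows to append; each list follows the column order ID, City
var newRows = new List<List<object>>
{
    new List<object>() { 2, "Seattle" },
    new List<object>() { 3, "Berlin" },
};

// AddRow changes the data frame in place, one row per call
foreach (var r in newRows)
    cities.AddRow(r);

Console.WriteLine(cities.RowCount());
```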


# Add calculated column

Adding a calculated column to a data frame is a common task. We use this feature when performing feature engineering or feature selection during data preparation. To add a new column whose values are calculated from each row of the data frame, use the `AddCalculatedColumn` method, or `AddCalculatedColumns` for several columns at once. The method has two variants, which pass the current row either as a dictionary or as a list. Both are very similar, so we show the example using the first variant. The
following code adds two calculated columns to an existing data frame:

In [23]:
var dict = new Dictionary<string, List<object>>
    {
        { "col1",new List<object>() { 1,13,25,37,49} },
        { "col2",new List<object>() { 2,14,26,38,50} },
        { "col3",new List<object>() { 3,15,27,39,51} },
        { "col4",new List<object>() { 4,16,28,40,52} },
        { "col5",new List<object>() { 5,17,29,41,53} },
        { "col6",new List<object>() { 6,18,30,42,54} },
        { "col7",new List<object>() { 7,19,31,43,55} },
        { "col8",new List<object>() { 8,20,32,44,56} },
        { "col9",new List<object>() { 9,21,33,45,57} },
        { "col10",new List<object>(){ 10,22,34,46,58} },
    };

//create the data frame and add two calculated columns
var df = new DataFrame(dict);
var sCols = new string[] { "col11", "col12" };
var df01 = df.AddCalculatedColumns(sCols, (row, i) => calculate(row, i));

//local function declaration
object[] calculate(IDictionary<string, object> row, int i)
    {
        return new object[2] { i * (row.Count() +2) + row.Count() + 1,
        i * (row.Count()+ 2) + row.Count() +2};
    }
df

(index),col1,col2,col3,col4,col5,col6,col7,col8,col9,col10,col11,col12
0,1,2,3,4,5,6,7,8,9,10,11,12
1,13,14,15,16,17,18,19,20,21,22,23,24
2,25,26,27,28,29,30,31,32,33,34,35,36
3,37,38,39,40,41,42,43,44,45,46,47,48
4,49,50,51,52,53,54,55,56,57,58,59,60


As can be seen, two new columns have been added, with values calculated from the current
row.
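The values in `col11` and `col12` follow directly from the arithmetic in the callback. The plain C# sketch below repeats that arithmetic without Daany, assuming (as the output above suggests) that the callback sees the ten original columns for every row, so `row.Count()` is 10. The variable `columnCount` is a stand-in for `row.Count()`.

```csharp
using System;

int columnCount = 10; // stand-in for row.Count() inside the callback

for (int i = 0; i < 5; i++)
{
    // same expressions as in the calculate function above
    int col11 = i * (columnCount + 2) + columnCount + 1;
    int col12 = i * (columnCount + 2) + columnCount + 2;
    Console.WriteLine($"row {i}: col11 = {col11}, col12 = {col12}");
}
// row 0: col11 = 11, col12 = 12
// row 1: col11 = 23, col12 = 24
// row 2: col11 = 35, col12 = 36
// row 3: col11 = 47, col12 = 48
// row 4: col11 = 59, col12 = 60
```

Each row index `i` shifts the result by 12, which is why the two new columns continue the sequence already present in `col1` to `col10`.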

### Aggregation in data frame

The aggregation process applies arithmetic operations to the columns of a data frame. The result of
the aggregation is a new list of values or a new data frame containing the results of the
aggregation operations. The following code shows the aggregation method in action:

In [24]:
var date = DateTime.Now.AddDays(-5);
//define a dictionary of data
var dict = new Dictionary<string, List<object>>
{
    { "ID",new List<object>() { 1,2,3} },
    { "City",new List<object>() { "Sarajevo", "Seattle", "Berlin" } },
    { "Zip Code",new List<object>() { 71000,98101,10115 } },
    { "State",new List<object>() {"BiH","USA","GER" } },
    { "IsHome",new List<object>() { true, false, false} },
    { "Values",new List<object>() { 3.14, 3.21, 4.55 } },
    { "Date",new List<object>() { DateTime.Now.AddDays(-20) ,
    DateTime.Now.AddDays(-10) , date } },
    { "Age", new List<object>() { 31, 25, 45 } },
    { "Gender", new List<object>() { "male", "female", "male" } }
};

//create df
var df = new DataFrame(dict);

//define aggregation
var agg = new Dictionary<string, Aggregation>() 
        { 
            {"ID",Aggregation.Count},
            {"City",Aggregation.Top},
            {"Date", Aggregation.Max},
            {"Values",Aggregation.Avg },
        };

var row = df.Aggragate(agg);
var val = new List<object>() { 3, "Sarajevo", 3.6333333333333329, date };

for(int i=0; i< val.Count; i++)
    Assert(val[i] == row[i]);

df

(index),ID,City,Zip Code,State,IsHome,Values,Date,Age,Gender
0,1,Sarajevo,71000,BiH,True,3.14,2022-01-20 09:15:50Z,31,male
1,2,Seattle,98101,USA,False,3.21,2022-01-30 09:15:50Z,25,female
2,3,Berlin,10115,GER,False,4.55,2022-02-04 09:15:50Z,45,male


As can be seen from the code above, the aggregation process performed
different operations on four columns (`"ID"`,`"City"`,`"Date"`,`"Values"`). If the
optional argument `allColumns:true` is passed, all columns of the data frame are shown
in the aggregate result.

When more than one aggregate operation should be applied to a single column, the second variant of the aggregate method is used:

In [25]:
var date = DateTime.Now.AddDays(-5);
//define a dictionary of data
var dict = new Dictionary<string, List<object>>
{
    { "ID",new List<object>() { 1,2,3} },
    { "City",new List<object>() { "Sarajevo", "Seattle", "Berlin" } },
    { "Zip Code",new List<object>() { 71000,98101,10115 } },
    { "State",new List<object>() {"BiH","USA","GER" } },
    { "IsHome",new List<object>() { true, false, false} },
    { "Values",new List<object>() { 3.14, 3.21, 4.55 } },
    { "Date",new List<object>() { DateTime.Now.AddDays(-20) , DateTime.Now.AddDays(-10) , date } },
    { "Age", new List<object>() { 31, 25, 45 } },
    { "Gender", new List<object>() { "male", "female", "male" } }
};

//create df
var df = new DataFrame(dict);

//define aggregation
var agg = new Dictionary<string, Aggregation[]>() { {"ID",new Aggregation[]{Aggregation.Count,Aggregation.Sum }},
                                                    {"City",new Aggregation[]{Aggregation.Top,Aggregation.Frequency }},
                                                    {"Date", new Aggregation[]{Aggregation.Max }},
                                                    {"Values",new Aggregation[]{Aggregation.Avg } },
                                                };
var newDf = df.Aggragate(agg);
var val = new List<object>() { 3, null, null, null, 6, null, null, null, null, "Sarajevo", null, null, 
    null, 1, null, null, null, null, 3.633333, null, null, null, null, date};

newDf



(index),ID,City,Values,Date
Count,3,<null>,<null>,<null>
Sum,6,<null>,<null>,<null>
Top,<null>,Sarajevo,<null>,<null>
Freq,<null>,1,<null>,<null>
Mean,<null>,<null>,3.633333,<null>
Max,<null>,<null>,<null>,2022-02-04 09:15:50Z
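To see what the individual aggregations mean, the following plain C# sketch recomputes the `Count`, `Sum`, `Top`, `Freq` and `Mean` rows with `Linq`, without any Daany calls. It assumes that `Top` is the most frequent value and that ties go to the first value encountered; this is an inference from the output above, not a documented guarantee.

```csharp
using System;
using System.Globalization;
using System.Linq;

var ids    = new[] { 1, 2, 3 };
var cities = new[] { "Sarajevo", "Seattle", "Berlin" };
var values = new[] { 3.14, 3.21, 4.55 };

int count = ids.Length;   // Count of ID
int sum   = ids.Sum();    // Sum of ID

// most frequent city; the stable sort keeps the first value on ties
var top = cities.GroupBy(c => c)
                .OrderByDescending(g => g.Count())
                .First();

double mean = values.Average();   // Mean of Values

Console.WriteLine($"Count = {count}, Sum = {sum}");
Console.WriteLine($"Top = {top.Key}, Freq = {top.Count()}");
Console.WriteLine("Mean = " + mean.ToString("F6", CultureInfo.InvariantCulture));
// Count = 3, Sum = 6
// Top = Sarajevo, Freq = 1
// Mean = 3.633333
```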


### Describe data frame

The `Describe` method prints out the basic descriptive statistics for the specified columns in the data frame. The following code shows the usage of the `Describe` method.

In [26]:
var date = DateTime.Now.AddDays(-5);
//define a dictionary of data
var dict = new Dictionary<string, List<object>>
{
    { "ID",new List<object>() { 1,2,3} },
    { "City",new List<object>() { "Sarajevo", "Seattle", "Berlin" } },
    { "Zip Code",new List<object>() { 71000,98101,10115 } },
    { "State",new List<object>() {"BiH","USA","GER" } },
    { "IsHome",new List<object>() { true, false, false} },
    { "Values",new List<object>() { 3.14, 3.21, 4.55 } },
    { "Date",new List<object>() { DateTime.Now.AddDays(-20) , DateTime.Now.AddDays(-10) , date } },
    { "Age", new List<object>() { 31, 25, 45 } },
    { "Gender", new List<object>() { "male", "female", "male" } }
};

//create df
var df = new DataFrame(dict);
df.Describe()

(index),ID,Zip Code,Values,Age
Count,3.0,3.0,3.0,3.0
Unique,3.0,3.0,3.0,3.0
Top,1.0,71000.0,3.14,31.0
Freq,1.0,1.0,1.0,1.0
Mean,2.0,59738.666667,3.633333,33.666667
Std,1.0,45061.039384,0.794628,10.263203
Min,1.0,10115.0,3.14,25.0
25%,1.5,40557.5,3.175,28.0
Median,2.0,71000.0,3.21,31.0
75%,2.5,84550.5,3.88,38.0
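
The numbers in the `Values` column above can be checked by hand. The following is a minimal sketch in plain .NET, not the library's implementation. It assumes a sample standard deviation (division by `n - 1`) and quartiles obtained by linear interpolation, because those choices agree with the table; the helper `Percentile` is a hypothetical name used only here.

```csharp
using System;
using System.Linq;

// The Values column from the example above
double[] values = { 3.14, 3.21, 4.55 };

// Percentile with linear interpolation between the two nearest sorted values
// (an assumption of this sketch, chosen because it matches the 25% and 75% rows)
double Percentile(double[] data, double p)
{
    var sorted = data.OrderBy(x => x).ToArray();
    double pos = p * (sorted.Length - 1);
    int lower = (int)Math.Floor(pos);
    int upper = (int)Math.Ceiling(pos);
    return sorted[lower] + (pos - lower) * (sorted[upper] - sorted[lower]);
}

double mean = values.Average();
// Sample standard deviation: squared deviations divided by n - 1
double std = Math.Sqrt(values.Sum(v => (v - mean) * (v - mean)) / (values.Length - 1));
double q25 = Percentile(values, 0.25);
double median = Percentile(values, 0.50);
double q75 = Percentile(values, 0.75);

Console.WriteLine($"Mean   {Math.Round(mean, 6)}");
Console.WriteLine($"Std    {Math.Round(std, 6)}");
Console.WriteLine($"25%    {Math.Round(q25, 6)}");
Console.WriteLine($"Median {Math.Round(median, 6)}");
Console.WriteLine($"75%    {Math.Round(q75, 6)}");
```

Rounded to six decimals, the results agree with the `Mean`, `Std`, `25%`, `Median` and `75%` rows of the `Values` column.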


To include all columns, call the method as `df.Describe(numericOnly:false)`. The output then looks like this:

In [27]:
df.Describe(numericOnly:false)

(index),ID,City,Zip Code,State,IsHome,Values,Date,Age,Gender
Count,3.0,3,3.0,3,3,3.0,3,3.0,3
Unique,3.0,3,3.0,3,2,3.0,3,3.0,2
Top,1.0,Sarajevo,71000.0,BiH,False,3.14,2022-01-20 09:15:50Z,31.0,male
Freq,1.0,1,1.0,1,2,1.0,1,1.0,2
Mean,2.0,<null>,59738.666667,<null>,<null>,3.633333,<null>,33.666667,<null>
Std,1.0,<null>,45061.039384,<null>,<null>,0.794628,<null>,10.263203,<null>
Min,1.0,<null>,10115.0,<null>,<null>,3.14,2022-01-20 09:15:50Z,25.0,<null>
25%,1.5,<null>,40557.5,<null>,<null>,3.175,<null>,28.0,<null>
Median,2.0,<null>,71000.0,<null>,<null>,3.21,<null>,31.0,<null>
75%,2.5,<null>,84550.5,<null>,<null>,3.88,<null>,38.0,<null>


### Drop columns and missing value handling

Columns can be dropped (removed) from a data frame by name.
For example, the following code calls `Drop` to remove the columns `ID`, `Date`, `Age` and `Gender`;
the result is returned as `df2`:

In [28]:
var date1 = DateTime.Now.AddDays(-20);
var date2 = DateTime.Now.AddDays(-10);
var date3 = DateTime.Now.AddDays(-5);

//define a dictionary of data
var dict = new Dictionary<string, List<object>>
{
    { "ID",new List<object>() { 1,2,3} },
    { "City",new List<object>() { "Sarajevo", "Seattle", "Berlin" } },
    { "Zip Code",new List<object>() { 71000,98101,10115 } },
    { "State",new List<object>() {"BiH","USA","GER" } },
    { "IsHome",new List<object>() { true, false, false} },
    { "Values",new List<object>() { 3.14, 3.21, 4.55 } },
    { "Date",new List<object>() { date1 , date2 , date3 } },
    { "Age", new List<object>() { 31, 25, 45 } },
    { "Gender", new List<object>() { "male", "female", "male" } }
};

//create df
var df = new DataFrame(dict);
var df2 = df.Drop("ID", "Date", "Age", "Gender");
var lst = new List<object>() {"Sarajevo", 71000, "BiH", true, 3.14,"Seattle",
98101, "USA", false, 3.21, "Berlin",10115, "GER", false, 4.55 };

//display the original data frame; the data frame without the dropped columns is df2
df

(index),ID,City,Zip Code,State,IsHome,Values,Date,Age,Gender
0,1,Sarajevo,71000,BiH,True,3.14,2022-01-20 09:15:52Z,31,male
1,2,Seattle,98101,USA,False,3.21,2022-01-30 09:15:52Z,25,female
2,3,Berlin,10115,GER,False,4.55,2022-02-04 09:15:52Z,45,male


Rows can also be dropped in order to remove missing values. The
following code drops all rows that contain at least one missing value:

In [29]:
var date = DateTime.Now.AddDays(-20);
//define a dictionary of data
var dict = new Dictionary<string, List<object>>
{
    { "ID",new List<object>() { 1,2,3} },
    { "City",new List<object>() { "Sarajevo", "Seattle", DataFrame.NAN } },
    { "Zip Code",new List<object>() { 71000,98101,10115 } },
    { "State",new List<object>() {"BiH","USA","GER" } },
    { "IsHome",new List<object>() { true, false, false} },
    { "Values",new List<object>() { 3.14, 3.21, 4.55 } },
    { "Date",new List<object>() { date , DateTime.Now.AddDays(-10) , date } },
    { "Age", new List<object>() { 31, 25, 45 } },
    { "Gender", new List<object>() { "male", DataFrame.NAN, "male" } }
};

//create df
var df = new DataFrame(dict);
df

(index),ID,City,Zip Code,State,IsHome,Values,Date,Age,Gender
0,1,Sarajevo,71000,BiH,True,3.14,2022-01-20 09:15:52Z,31,male
1,2,Seattle,98101,USA,False,3.21,2022-01-30 09:15:52Z,25,<null>
2,3,<null>,10115,GER,False,4.55,2022-01-20 09:15:52Z,45,male


In [30]:
//drop rows with missing values
var newDf = df.DropNA();
newDf

(index),ID,City,Zip Code,State,IsHome,Values,Date,Age,Gender
0,1,Sarajevo,71000,BiH,True,3.14,2022-01-20 09:15:52Z,31,male


Missing values can be replaced with a specified value by using the `FillNA`
method. The following code replaces the missing values with `replValue`:

In [31]:
var date = DateTime.Now.AddDays(-20);
//define a dictionary of data
var dict = new Dictionary<string, List<object>>
{
    { "ID",new List<object>() { 1,2,3} },
    { "City",new List<object>() { "Sarajevo", "Seattle", DataFrame.NAN } },
    { "Zip Code",new List<object>() { 71000,98101,10115 } },
    { "State",new List<object>() {"BiH","USA","GER" } },
    { "IsHome",new List<object>() { true, false, false} },
    { "Values",new List<object>() { 3.14, 3.21, 4.55 } },
    { "Date",new List<object>() { date , DateTime.Now.AddDays(-10) , date } },
    { "Age", new List<object>() { 31, 25, 45 } },
    { "Gender", new List<object>() { "male", "female", "male" } }
};

//create df
var df = new DataFrame(dict);
df


(index),ID,City,Zip Code,State,IsHome,Values,Date,Age,Gender
0,1,Sarajevo,71000,BiH,True,3.14,2022-01-20 09:15:53Z,31,male
1,2,Seattle,98101,USA,False,3.21,2022-01-30 09:15:53Z,25,female
2,3,<null>,10115,GER,False,4.55,2022-01-20 09:15:53Z,45,male


In [32]:
//replace missing values with the specified value
string replValue = "Berlin";
df.FillNA(replValue);
df

(index),ID,City,Zip Code,State,IsHome,Values,Date,Age,Gender
0,1,Sarajevo,71000,BiH,True,3.14,2022-01-20 09:15:53Z,31,male
1,2,Seattle,98101,USA,False,3.21,2022-01-30 09:15:53Z,25,female
2,3,Berlin,10115,GER,False,4.55,2022-01-20 09:15:53Z,45,male
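
The effect of the two missing-value operations above can be pictured with plain .NET collections. This is only a sketch of the semantics, not the library's code: rows are modeled as `object[]` arrays and `null` stands in for the missing-value marker, which is an assumption of this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Each row is an array of cell values; null marks a missing value in this sketch
var rows = new List<object[]>
{
    new object[] { 1, "Sarajevo", "male" },
    new object[] { 2, "Seattle",  null   },
    new object[] { 3, null,       "male" },
};

// DropNA-like behavior: keep only the rows without any missing cell
var complete = rows.Where(r => r.All(c => c != null)).ToList();

// FillNA-like behavior: replace every missing cell with a fixed value
var filled = rows.Select(r => r.Select(c => c ?? "Berlin").ToArray()).ToList();

Console.WriteLine(complete.Count);               // 1
Console.WriteLine(string.Join(",", filled[2]));  // 3,Berlin,male
```

Note that a single replacement value is written into every missing cell regardless of the column, so it is worth checking that one value makes sense for all affected columns.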


### Filter and Conditional Remove

The `Filter` operation returns a data frame containing the rows that satisfy the specified filter conditions. The `RemoveRows` method does the opposite: it removes all rows that satisfy a condition supplied as a delegate. The following code filters the data frame to the rows whose `Date` lies between two dates:

In [33]:
var date1 = DateTime.Now.AddDays(-20);
var date2 = DateTime.Now.AddDays(-10);
var date3 = DateTime.Now.AddDays(-5);
//define a dictionary of data
var dict = new Dictionary<string, List<object>>
{
    { "ID",new List<object>() { 1,2,3} },
    { "City",new List<object>() { "Sarajevo", "Seattle", "Berlin" } },
    { "Zip Code",new List<object>() { 71000,98101,10115 } },
    { "State",new List<object>() {"BiH","USA","GER" } },
    { "IsHome",new List<object>() { true, false, false} },
    { "Values",new List<object>() { 3.14, 3.21, 4.55 } },
    { "Date",new List<object>() { date3 , date2 , date1 } },
    { "Age", new List<object>() { 31, 25, 45 } },
    { "Gender", new List<object>() { "male", "female", "male" } }
};

//create df
var df = new DataFrame(dict);

df


(index),ID,City,Zip Code,State,IsHome,Values,Date,Age,Gender
0,1,Sarajevo,71000,BiH,True,3.14,2022-02-04 09:15:53Z,31,male
1,2,Seattle,98101,USA,False,3.21,2022-01-30 09:15:53Z,25,female
2,3,Berlin,10115,GER,False,4.55,2022-01-20 09:15:53Z,45,male


In [34]:
//filter data frame between dates
var opers = new FilterOperator[2] { FilterOperator.Greather, FilterOperator.Less };
var cols = new string[] { "Date", "Date" };
var values = (new DateTime[] { DateTime.Now.AddDays(-7), DateTime.Now.AddDays(-3) }).Select(x => (object)x).ToArray();
//filter
var filteredDF = df.Filter(cols, values, opers);
filteredDF

(index),ID,City,Zip Code,State,IsHome,Values,Date,Age,Gender
0,1,Sarajevo,71000,BiH,True,3.14,2022-02-04 09:15:53Z,31,male
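
For comparison, the same selection can be written as a plain `Linq` query. This sketch uses a fixed reference date instead of `DateTime.Now` so that it is reproducible, and it assumes that the two filter conditions are combined with a logical AND, which is consistent with the single row returned above.

```csharp
using System;
using System.Linq;

// Fixed reference date (an assumption of this sketch, replacing DateTime.Now)
var now = new DateTime(2022, 2, 9, 9, 15, 53);

var rows = new[]
{
    (City: "Sarajevo", Date: now.AddDays(-5)),
    (City: "Seattle",  Date: now.AddDays(-10)),
    (City: "Berlin",   Date: now.AddDays(-20)),
};

var lowerBound = now.AddDays(-7);
var upperBound = now.AddDays(-3);

// Both conditions must hold: Date > lowerBound AND Date < upperBound
var filtered = rows.Where(r => r.Date > lowerBound && r.Date < upperBound).ToList();

foreach (var r in filtered)
    Console.WriteLine(r.City);   // Sarajevo
```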


The following code shows how to remove all rows in the data frame that contain the value `Seattle` in the `City` column:

In [35]:
var date = DateTime.Now.AddDays(-20);
//define a dictionary of data
var dict = new Dictionary<string, List<object>>
{
    { "ID",new List<object>() { 1,2,3} },
    { "City",new List<object>() { "Sarajevo", "Seattle", DataFrame.NAN } },
    { "Zip Code",new List<object>() { 71000,98101,10115 } },
    { "State",new List<object>() {"BiH","USA","GER" } },
    { "IsHome",new List<object>() { true, false, false} },
    { "Values",new List<object>() { 3.14, 3.21, 4.55 } },
    { "Date",new List<object>() { date , DateTime.Now.AddDays(-10) , date } },
    { "Age", new List<object>() { 31, 25, 45 } },
    { "Gender", new List<object>() { "male", "female", "male" } }
};

//create df
var dff = new DataFrame(dict);

dff

(index),ID,City,Zip Code,State,IsHome,Values,Date,Age,Gender
0,1,Sarajevo,71000,BiH,True,3.14,2022-01-20 09:15:54Z,31,male
1,2,Seattle,98101,USA,False,3.21,2022-01-30 09:15:54Z,25,female
2,3,<null>,10115,GER,False,4.55,2022-01-20 09:15:54Z,45,male


In [36]:
//remove rows with 'Seattle'
var newDf = dff.RemoveRows((row, i) => row["City"]?.ToString() == "Seattle");
newDf

(index),ID,City,Zip Code,State,IsHome,Values,Date,Age,Gender
0,1,Sarajevo,71000,BiH,True,3.14,2022-01-20 09:15:54Z,31,male
2,3,<null>,10115,GER,False,4.55,2022-01-20 09:15:54Z,45,male


As can be seen, the delegate returns a boolean value. A row is removed whenever the delegate returns ```true``` for it.
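
The row with the missing `City` value survives because the null-conditional call `row["City"]?.ToString()` yields `null`, and `null == "Seattle"` is `false`. The following sketch reproduces that behavior with a plain list; the names `remove` and `kept` are illustrative only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// City column from the example; null stands in for the missing value
var cities = new List<object> { "Sarajevo", "Seattle", null };

// Predicate in the same spirit as the RemoveRows delegate: true means "remove this row"
Func<object, int, bool> remove = (city, i) => city?.ToString() == "Seattle";

// Keep the rows for which the predicate is false, remembering the original index
var kept = cities
    .Select((city, i) => (Index: i, City: city))
    .Where(x => !remove(x.City, x.Index))
    .ToList();

foreach (var x in kept)
    Console.WriteLine($"{x.Index}: {x.City ?? "<null>"}");
// 0: Sarajevo
// 2: <null>
```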

### Sorting in data frame

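A data frame can be sorted by using `SortBy` or `SortByDescending`. The following code defines an unsorted data frame `df` and a second data frame `df1` that holds the same rows in the order expected after sorting `df` by `col1` in ascending order: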

In [37]:
var dict = new Dictionary<string, List<object>>
{
    { "col1",new List<object>() { 1,31,41,51,61,11,21,71,81,91} },
    { "col2",new List<object>() { 2,32,42,52,62,12,22,72,82,92 } },
    { "col3",new List<object>() { 3,43,33,63,53,13,23,73,83,93 } },
    { "col4",new List<object>() { 4,54,44,34,64,14,24,74,84,94} },

};
//
var df = new DataFrame(dict);

var dict1 = new Dictionary<string, List<object>>
{
    { "col1",new List<object>() { 1,11,21,31,41,51,61,71,81,91} },
    { "col2",new List<object>() { 2,12,22,32,42,52,62,72,82,92 } },
    { "col3",new List<object>() { 3,13,23,43,33,63,53,73,83,93 } },
    { "col4",new List<object>() { 4,14,24,54,44,34,64,74,84,94} },
};
var df1 = new DataFrame(dict1);

df1

(index),col1,col2,col3,col4
0,1,2,3,4
1,11,12,13,14
2,21,22,23,24
3,31,32,43,54
4,41,42,33,44
5,51,52,63,34
6,61,62,53,64
7,71,72,73,74
8,81,82,83,84
9,91,92,93,94


A descending sort works the same way, except that `SortByDescending` is called.
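
The expected row order can be verified independently of the library. The sketch below sorts the rows of the unsorted example with plain `Linq`, using `col1` as the sort key; it illustrates the ordering only and does not call the data frame methods.

```csharp
using System;
using System.Linq;

// Rows of the unsorted example frame: col1, col2, col3, col4
int[][] rows =
{
    new[] { 1, 2, 3, 4 },     new[] { 31, 32, 43, 54 }, new[] { 41, 42, 33, 44 },
    new[] { 51, 52, 63, 34 }, new[] { 61, 62, 53, 64 }, new[] { 11, 12, 13, 14 },
    new[] { 21, 22, 23, 24 }, new[] { 71, 72, 73, 74 }, new[] { 81, 82, 83, 84 },
    new[] { 91, 92, 93, 94 },
};

// Whole rows move together; only the first column is used as the key
var ascending = rows.OrderBy(r => r[0]).ToArray();
var descending = rows.OrderByDescending(r => r[0]).ToArray();

Console.WriteLine(string.Join(",", ascending.Select(r => r[0])));
// 1,11,21,31,41,51,61,71,81,91
Console.WriteLine(string.Join(",", ascending.Select(r => r[2])));
// 3,13,23,43,33,63,53,73,83,93
Console.WriteLine(string.Join(",", descending.Select(r => r[0])));
// 91,81,71,61,51,41,31,21,11,1
```

The `col3` values of the ascending result match the `col3` column of `df1`, which shows that the other columns follow the row order determined by `col1`.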

### GroupBy and Rolling 

The rolling operation performs a calculation over a fixed number (the window size) of successive data frame rows. The following code applies a rolling `sum` with a window size of 3 to column `A`:

In [38]:
var dict = new Dictionary<string, List<object>>
{
    { "ID",new List<object>()  { 1,2,3,4,5,6,7,8,9,10} },
    { "A",new List<object>()  { -2.385977,-1.004295,0.735167, -0.702657,-0.246845,2.463718, -1.142255,1.396598, -0.543425,-0.64050} },
    { "B",new List<object>()  { -0.102758,0.905829, -0.165272,-1.340923,0.211596, 3.157577, 2.340594, -1.647453,1.761277, 0.289374} },
    { "C",new List<object>()  { 0.438822, -0.954544,-1.619346,-0.706334,-0.901819,-1.380906,-0.039875,1.677227, -0.220481,-1.55067} },
    { "D",new List<object>()  { "chair", "label", "item", "window", "computer", "label", "chair", "item", "abaqus", "window" } },
    {"E", new List<object>() { DateTime.ParseExact("12/20/2016", "MM/dd/yyyy", CultureInfo.InvariantCulture, DateTimeStyles.None),
                                DateTime.ParseExact("6/13/2016" , "M/dd/yyyy", CultureInfo.InvariantCulture, DateTimeStyles.None),
                                DateTime.ParseExact("8/25/2016",  "M/dd/yyyy", CultureInfo.InvariantCulture, DateTimeStyles.None),
                                DateTime.ParseExact("11/4/2016" , "MM/d/yyyy", CultureInfo.InvariantCulture, DateTimeStyles.None),
                                DateTime.ParseExact("6/18/2016",  "M/dd/yyyy", CultureInfo.InvariantCulture, DateTimeStyles.None),
                                DateTime.ParseExact("3/8/2016" ,  "M/d/yyyy", CultureInfo.InvariantCulture, DateTimeStyles.None),
                                DateTime.ParseExact("9/3/2016" ,  "M/d/yyyy", CultureInfo.InvariantCulture, DateTimeStyles.None),
                                DateTime.ParseExact("11/24/2016", "MM/dd/yyyy", CultureInfo.InvariantCulture, DateTimeStyles.None),
                                DateTime.ParseExact("6/16/2016",  "M/dd/yyyy", CultureInfo.InvariantCulture, DateTimeStyles.None),
                                DateTime.ParseExact("1/31/2016",  "M/dd/yyyy", CultureInfo.InvariantCulture, DateTimeStyles.None)}
    }
};

//
var df = new DataFrame(dict);
var rollingdf = df.Rolling(3, new Dictionary<string, Aggregation> { { "A", Aggregation.Sum } });

rollingdf

(index),A
0,<null>
1,<null>
2,-2.655105
3,-0.971785
4,-0.214335
5,1.514216
6,1.074618
7,2.718061
8,-0.289082
9,0.212673
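
To make the window mechanics explicit, here is a minimal sketch of a rolling sum written with plain `Linq`. It is not the library's implementation; it assumes, as the output above suggests, that positions before the first complete window are left empty.

```csharp
using System;
using System.Linq;

// Column A from the example above
double[] a = { -2.385977, -1.004295, 0.735167, -0.702657, -0.246845,
               2.463718, -1.142255, 1.396598, -0.543425, -0.64050 };
int window = 3;

// For each position i, sum the window that ends at i;
// positions without a full window get no value (null)
double?[] rolling = Enumerable.Range(0, a.Length)
    .Select(i => i < window - 1
        ? (double?)null
        : a.Skip(i - window + 1).Take(window).Sum())
    .ToArray();

for (int i = 0; i < rolling.Length; i++)
    Console.WriteLine(rolling[i].HasValue
        ? $"{i},{Math.Round(rolling[i].Value, 6)}"
        : $"{i},<null>");
```

The first value appears at index 2 because that is the first row with three rows available, and each later value drops the oldest row of the window and adds the newest.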


The `GroupBy` operation groups the rows of a data frame that share the same value in one or more columns. The following code groups the data frame by the `Gender` column:

In [39]:
var date1 = DateTime.Now.AddDays(-20);
var date2 = DateTime.Now.AddDays(-10);
var date3 = DateTime.Now.AddDays(-5);
//define a dictionary of data
var dict = new Dictionary<string, List<object>>
{
    { "ID",new List<object>() { 1,2,3} },
    { "City",new List<object>() { "Sarajevo", "Seattle", "Berlin" } },
    { "Zip Code",new List<object>() { 71000,98101,10115 } },
    { "State",new List<object>() {"BiH","USA","GER" } },
    { "IsHome",new List<object>() { true, false, false} },
    { "Values",new List<object>() { 3.14, 3.21, 4.55 } },
    { "Date",new List<object>() { date3 , date2 , date1 } },
    { "Age", new List<object>() { 31, 25, 45 } },
    { "Gender", new List<object>() { "male", "female", "male" } }
};

//create df
var df = new DataFrame(dict);
//group df by gender
var gDf = df.GroupBy("Gender");
var swqs = gDf.ToStringBuilder();
swqs

Group By Column: Gender
male                
        ID      City    Zip CodeState   IsHome  Values  Date    Age     Gender  
0       1       Sarajevo71000   BiH     True    3.14    2/4/2022 9:15:56 AM31      male    
2       3       Berlin  10115   GER     False   4.55    1/20/2022 9:15:56 AM45      male    

female              
        ID      City    Zip CodeState   IsHome  Values  Date    Age     Gender  
1       2       Seattle 98101   USA     False   3.21    1/30/2022 9:15:56 AM25      female  



In case two grouping columns should be applied (`Gender` and `City`), the following code is used:

In [40]:
var date1 = DateTime.Now.AddDays(-20);
var date2 = DateTime.Now.AddDays(-10);
var date3 = DateTime.Now.AddDays(-5);
//define a dictionary of data
var dict = new Dictionary<string, List<object>>
{
    { "ID",new List<object>() { 1,2,3} },
    { "City",new List<object>() { "Sarajevo", "Sarajevo", "Berlin" } },
    { "Zip Code",new List<object>() { 71000,98101,10115 } },
    { "State",new List<object>() {"BiH","USA","GER" } },
    { "IsHome",new List<object>() { true, false, false} },
    { "Values",new List<object>() { 3.14, 3.21, 4.55 } },
    { "Date",new List<object>() { date3 , date2 , date1 } },
    { "Age", new List<object>() { 31, 25, 45 } },
    { "Gender", new List<object>() { "male", "female", "male" } }
};

//create df
var df = new DataFrame(dict);
//group df by gender and city
var gDf = df.GroupBy("Gender","City");
var swqs = gDf.ToStringBuilder();
swqs

Group By Column: Gender, City                
male                Sarajevo            
        ID      City    Zip CodeState   IsHome  Values  Date    Age     Gender  
0       1       Sarajevo71000   BiH     True    3.14    2/4/2022 9:15:59 AM31      male    

Berlin              
        ID      City    Zip CodeState   IsHome  Values  Date    Age     Gender  
2       3       Berlin  10115   GER     False   4.55    1/20/2022 9:15:59 AM45      male    

female              Sarajevo            
        ID      City    Zip CodeState   IsHome  Values  Date    Age     Gender  
1       2       Sarajevo98101   USA     False   3.21    1/30/2022 9:15:59 AM25      female  



Often, an aggregation is applied to each group after grouping. The following code combines grouping and rolling: it groups the data by the `Gender` column and then aggregates two columns, `Values` (sum) and `Age` (average).

In [41]:
var date1 = DateTime.Now.AddDays(-20);
var date2 = DateTime.Now.AddDays(-10);
var date3 = DateTime.Now.AddDays(-5);
//define a dictionary of data
var dict = new Dictionary<string, List<object>>
{
    { "ID",new List<object>() { 1,2,3,4} },
    { "City",new List<object>() { "Sarajevo", "Seattle", "Berlin", "Amsterdam" } },
    { "Zip Code",new List<object>() { 71000,98101,10115, 11000 } },
    { "State",new List<object>() {"BiH","USA","GER", "NL" } },
    { "IsHome",new List<object>() { true, false, false, true} },
    { "Values",new List<object>() { 3.14, 3.21, 4.55, 5.55 } },
    { "Date",new List<object>() { date3 , date2 , date1 , date2} },
    { "Age", new List<object>() { 31, 25, 45, 33 } },
    { "Gender", new List<object>() { "male", "female", "male", "female" } }
};

//create df
var df = new DataFrame(dict);
//group df by gender
var gDf = df.GroupBy("Gender")
            .Rolling(2, new Dictionary<string, Aggregation>()
                    { { "Values", Aggregation.Sum }, 
                       { "Age", Aggregation.Avg } }).TakeEvery(2);
gDf


(index),Values,Age,Gender
2,7.69,38,male
3,8.76,29,female
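
The same per-group numbers can be obtained with the standard `Linq` `GroupBy` operator. This sketch aggregates each whole group, which gives the same result here only because every group contains exactly two rows, the same as the rolling window size used above.

```csharp
using System;
using System.Linq;

// The three columns of the example that take part in the calculation
var people = new[]
{
    (Gender: "male",   Values: 3.14, Age: 31),
    (Gender: "female", Values: 3.21, Age: 25),
    (Gender: "male",   Values: 4.55, Age: 45),
    (Gender: "female", Values: 5.55, Age: 33),
};

// Group by Gender, then sum Values and average Age within each group
var summary = people
    .GroupBy(p => p.Gender)
    .Select(g => (Gender: g.Key,
                  Values: g.Sum(p => p.Values),
                  Age: g.Average(p => p.Age)))
    .ToList();

foreach (var s in summary)
    Console.WriteLine($"{s.Gender}: Values={Math.Round(s.Values, 2)}, Age={s.Age}");
```

The groups come out in the order in which their keys first appear, so `male` precedes `female`, as in the data frame output.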


### Merge and Join two data frames

The `Merge` method merges two data frames in a way similar to the SQL JOIN statement. To call it, you provide the two data frames, the left and right key columns, and the join type. Merging is based on the key columns: the values in the first left key column must be equal to the values in the first right key column. If there is more than one key column, the values in each pair of corresponding columns must be equal. The following code merges two data frames on two key columns with an inner join.

In [42]:
var dict = new Dictionary<string, List<object>>
{
    { "itemID",new List<object>() { "foo", "bar", "baz", "foo" } },
    { "catId",new List<object>() { "A", "A", "B", "B" } },
    { "value1",new List<object>() { 1,2,3,4 } },
};
var dict1 = new Dictionary<string, List<object>>
{
    { "item2ID",new List<object>() {"foo", "bar", "baz","foo" } },
    { "cat2ID",new List<object>() { "A", "B", "A", "B" } },
    { "value2",new List<object>() { 5,6,7,8 } },
};
//
var df1 = new DataFrame(dict);
var df2 = new DataFrame(dict1);
var mergedDf = df1.Merge(df2, 
                    new string[] { "itemID", "catId" }, //left key columns
                    new string[] { "item2ID", "cat2ID" }, //right key columns
                    JoinType.Inner);//join type
mergedDf

(index),itemID,catId,value1,item2ID,cat2ID,value2
0,foo,A,1,foo,A,5
3,foo,B,4,foo,B,8


The limitation for the number of key columns is 3. That means you cannot merge two data frames with more than 3 key columns.
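
The matching rule for several key columns is the same as a composite-key join in `Linq`. The sketch below reproduces the inner merge from the example with `Enumerable.Join`; the anonymous-type keys with the property names `Item` and `Cat` are chosen for this illustration only.

```csharp
using System;
using System.Linq;

var left = new[]
{
    (ItemId: "foo", CatId: "A", Value1: 1),
    (ItemId: "bar", CatId: "A", Value1: 2),
    (ItemId: "baz", CatId: "B", Value1: 3),
    (ItemId: "foo", CatId: "B", Value1: 4),
};
var right = new[]
{
    (Item2Id: "foo", Cat2Id: "A", Value2: 5),
    (Item2Id: "bar", Cat2Id: "B", Value2: 6),
    (Item2Id: "baz", Cat2Id: "A", Value2: 7),
    (Item2Id: "foo", Cat2Id: "B", Value2: 8),
};

// Inner join: a pair of rows matches only if BOTH key values are equal
var merged = left.Join(
        right,
        l => new { Item = l.ItemId, Cat = l.CatId },
        r => new { Item = r.Item2Id, Cat = r.Cat2Id },
        (l, r) => (l.ItemId, l.CatId, l.Value1, r.Value2))
    .ToList();

foreach (var m in merged)
    Console.WriteLine($"{m.ItemId},{m.CatId},{m.Value1},{m.Value2}");
// foo,A,1,5
// foo,B,4,8
```

The rows `bar` and `baz` are absent because their item keys match but their category keys do not.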

Unlike `Merge`, which works on ordinary columns, the `Join` method works only on the data frame index; the join types behave in the same way. The following code joins two data frames based on their indexes:

In [43]:
var dict1 = new Dictionary<string, List<object>>
{
    { "itemID",new List<object>() { "foo", "bar", "baz", "foo" } },
    { "value1",new List<object>() { 1,2,3,4 } },
};
var dict2 = new Dictionary<string, List<object>>
{
    { "item2ID",new List<object>() {"foo", "bar", "baz" } },
    { "value2",new List<object>() { 5,6,7 } },
};
//
var df1 = new DataFrame(dict1);
var df2 = new DataFrame(dict2);
//
var mergedDf = df1.Join(df2, JoinType.Inner);
mergedDf

(index),itemID,value1,item2ID,value2
0,foo,1,foo,5
1,bar,2,bar,6
2,baz,3,baz,7


In [44]:
//
mergedDf = df1.Join(df2, JoinType.Left);
mergedDf

(index),itemID,value1,item2ID,value2
0,foo,1,foo,5
1,bar,2,bar,6
2,baz,3,baz,7
3,foo,4,<null>,<null>
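
The difference between the two join types above can be sketched with plain arrays, where the array position plays the role of the default integer index. This is an illustration under that assumption, not the library's algorithm.

```csharp
using System;
using System.Linq;

var first = new[]
{
    (ItemId: "foo", Value1: 1), (ItemId: "bar", Value1: 2),
    (ItemId: "baz", Value1: 3), (ItemId: "foo", Value1: 4),
};
var second = new[]
{
    (Item2Id: "foo", Value2: 5), (Item2Id: "bar", Value2: 6), (Item2Id: "baz", Value2: 7),
};

// Left join: every left row is kept; right-hand cells are null where the index is absent
var leftJoin = first.Select((l, i) => (
        Index: i, l.ItemId, l.Value1,
        Item2Id: i < second.Length ? second[i].Item2Id : null,
        Value2: i < second.Length ? (int?)second[i].Value2 : null))
    .ToList();

// Inner join: only the indexes present on both sides are kept
var innerJoin = leftJoin.Where(x => x.Index < second.Length).ToList();

Console.WriteLine(innerJoin.Count);                // 3
Console.WriteLine(leftJoin.Count);                 // 4
Console.WriteLine(leftJoin[3].Value2.HasValue);    // False
```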


### Select data from data frame

Data can be selected from a data frame in many ways, and a set of `Linq`-oriented methods is useful for this. There are several methods for selecting data from a data frame:

- ```TakeEvery(int nthRow)``` - selects every nth row and returns the result as a new data frame.
- ```TakeRandom(int rows)``` - selects `rows` rows at random and returns them as a data frame.
- ```Tail(int count = 5)``` - selects the last `count` rows.
- ```Head(int count = 5)``` - selects the first `count` rows.