# A basic .NET for Apache Spark example

## Preparation

### Start the Backend in Debug mode

**_Important_**: Before you run any cells in this example, please ensure that you have [started the .NET for Apache Spark DotnetBackend in Debug mode](01-start-spark-debug.ipynb).

### Install the Microsoft.Spark NuGet package

In [None]:
#r "nuget: Microsoft.Spark,1.0.0"

---

## Coding

### Create a new SparkSession
The entry point to all .NET for Apache Spark functionality is a SparkSession. To create one, call SparkSession.Builder() followed by GetOrCreate():

In [None]:
using Microsoft.Spark.Sql;
using Microsoft.Spark.Sql.Types;
using static Microsoft.Spark.Sql.Functions;

var spark = SparkSession.Builder().GetOrCreate();

### Create a new DataFrame
There are multiple ways to create a new DataFrame. Most of the time you will read data from an external source. For this basic example, however, we define the DataFrame directly in code.

In [None]:
var data = new List<GenericRow>
    {
        new GenericRow(new object[] { "Batman", "M", 3093, true, new Date(1939, 5, 1) }),
        new GenericRow(new object[] { "Superman", "M", 2496, true, new Date(1986, 10, 1) }),
        new GenericRow(new object[] { "Wonder Woman", "F", 1231, true, new Date(1941, 12, 1) }),
        new GenericRow(new object[] { "Lois Lane", "F", 934, true, new Date(1938, 6, 1) })
    };

var schema = new StructType(new List<StructField>()
    {
        new StructField("Name", new StringType()),
        new StructField("Sex", new StringType()),
        new StructField("Appearances", new IntegerType()),
        new StructField("Alive", new BooleanType()),
        new StructField("FirstAppearance", new DateType())
    });

DataFrame df = spark.CreateDataFrame(data, schema);
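
As mentioned above, data is usually read from an external source rather than defined in code. The following is a minimal sketch of that, not part of the original example: `heroes.csv` is a hypothetical file with a header row, and `spark` is assumed to be the SparkSession created earlier.

```csharp
using Microsoft.Spark.Sql;

// Assumption: "heroes.csv" is a hypothetical file; replace it with your own path.
// `spark` is the SparkSession created in the earlier cell.
DataFrame csvDf = spark.Read()
    .Option("header", "true")       // treat the first line as column names
    .Option("inferSchema", "true")  // let Spark infer the column types
    .Csv("heroes.csv");

// Check which types Spark inferred before working with the data.
csvDf.PrintSchema();
```

With schema inference the resulting types may differ from the explicit schema defined above, so it is worth checking the output of PrintSchema().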

### Get a quick overview of your data

To display the column names and Spark data types of a DataFrame, use **PrintSchema()**.

In [None]:
df.PrintSchema();

Use **Show()** to look at the first rows of your DataFrame (20 by default).

In [None]:
df.Show();

To get basic DataFrame statistics such as count, mean, standard deviation, minimum and maximum, use **Describe()**.

In [None]:
df.Describe().Show();

### Filtering

Column-style filtering

In [None]:
df.Filter(df.Col("Name") == "Batman").Show();

In [None]:
df.Filter(df["Appearances"] > 1000).Show();
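
Column conditions can also be combined. The sketch below assumes the `df` defined above and uses the `&` and `|` operators on columns; the parentheses are there for readability.

```csharp
using Microsoft.Spark.Sql;

// Assumes `df` is the DataFrame created earlier in this notebook.

// Both conditions must hold: female characters with more than 1000 appearances.
df.Filter((df["Sex"] == "F") & (df["Appearances"] > 1000)).Show();

// Either condition may hold: Batman, or anyone with fewer than 1000 appearances.
df.Filter((df["Name"] == "Batman") | (df["Appearances"] < 1000)).Show();
```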

SQL-style filtering

In [None]:
df.Filter("Sex == 'F'").Show();

In [None]:
df.Filter("FirstAppearance >= '1971-01-01'").Show();

In [None]:
df.Filter("Name not like '%man'").Show();

### Grouping

In [None]:
df.GroupBy("Sex").Count().Show();

In [None]:
df.GroupBy("Sex")
    .Agg(Count(df["Sex"]), Avg(df["Appearances"]), Min(df["Appearances"]), Max(df["Appearances"]))
    .OrderBy(Desc("avg(Appearances)"))
    .Show();
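
The aggregation above refers to the generated column name `avg(Appearances)` when sorting. As a possible alternative, the aggregated columns can be given explicit names with Alias(). The names `Characters` and `AvgAppearances` in this sketch are chosen for illustration only.

```csharp
using Microsoft.Spark.Sql;
using static Microsoft.Spark.Sql.Functions;

// Assumes `df` is the DataFrame created earlier in this notebook.
df.GroupBy("Sex")
    .Agg(
        Count(df["Sex"]).Alias("Characters"),             // illustrative column name
        Avg(df["Appearances"]).Alias("AvgAppearances"))   // illustrative column name
    .OrderBy(Desc("AvgAppearances"))                      // sort by the aliased column
    .Show();
```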

### Cleanup
Stop your Spark session once you are done.

In [None]:
spark.Stop();