# User-Defined Functions with Complex Types in .NET for Apache Spark

A user-defined function (UDF) is a routine that takes parameters, performs a calculation, and returns a result. UDFs let you encapsulate your business logic and run it at scale with Spark. This notebook shows how to construct UDFs in C#, with examples that accept and return complex Row objects.

[Additional Reading](https://docs.microsoft.com/en-us/dotnet/spark/how-to-guides/deploy-worker-udf-binaries)

Now let's get started with some examples!

## Create a simple DataFrame

Create a DataFrame for the examples that follow. It has an `id` column holding the values 0 through 4 and a `structId` column that wraps `id` in a struct.

In [None]:
DataFrame df = spark.Range(0, 5).WithColumn("structId", Struct("id"));
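Before writing UDFs against this DataFrame, it can help to check what each column contains. The cell below is a minimal sketch that assumes the notebook's `spark` session and the `df` variable defined above; it only prints the schema and does not change the data.

```csharp
using Microsoft.Spark.Sql;

// Assumes `df` was created in the previous cell.
// PrintSchema() writes each column's name and type to the output;
// `structId` should be listed as a struct that contains the `id` field.
df.PrintSchema();
```

A UDF that is given the `structId` column receives each value as a Row, which is what the next section relies on.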

## UDF that takes in Row objects

Now define a UDF that takes a Row as input and adds 100 to the value in the Row's first field.


In [17]:
Func<Column, Column> udf1 = Udf<Row, int>(
    row => row.GetAs<int>(0) + 100);

Apply the UDF to the `structId` column of the DataFrame:

In [16]:
df.Select(udf1(df["structId"]).As("newId")).Show();

+-----+
|newId|
+-----+
|  100|
|  101|
|  102|
|  103|
|  104|
+-----+
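Positional access such as `GetAs<int>(0)` depends on field order. As a hedged alternative, the sketch below reads the field by name instead. It assumes the `df` defined earlier, and `udf1ByName` is a hypothetical name introduced only for this illustration; the field name `"id"` comes from the `Struct("id")` call used to build `structId`.

```csharp
using System;
using Microsoft.Spark.Sql;
using static Microsoft.Spark.Sql.Functions;

// Hypothetical variant of udf1: look the field up by name rather than by index.
// "id" is the name of the only field inside the structId column.
Func<Column, Column> udf1ByName = Udf<Row, int>(
    row => row.GetAs<int>("id") + 100);

// Apply it exactly as udf1 was applied above.
df.Select(udf1ByName(df["structId"]).As("newId")).Show();
```

This should yield the same `newId` values as `udf1`. Name-based access tends to be easier to read when a struct has several fields.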

## UDF that returns Row objects

Often, you might want to accept a Row as input and construct a **new** Row based on some complex business logic. A UDF that returns a Row also needs a schema describing the fields of that Row. You can do this as follows:


In [19]:
using Microsoft.Spark.Sql.Types;

// First define the schema for Row objects
var schema = new StructType(new[]
{
    new StructField("col1", new IntegerType()),
    new StructField("col2", new StringType())
});

// Then define a UDF that returns Row objects
Func<Column, Column> udf2 = Udf<int>(
    id => new GenericRow(new object[] { id, "abc" }), schema);

In [21]:
// Use UDF with DataFrames
df.Select(udf2(df["id"]).As("newStructId")).Show();

+-----------+
|newStructId|
+-----------+
|   [0, abc]|
|   [1, abc]|
|   [2, abc]|
|   [3, abc]|
|   [4, abc]|
+-----------+
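The struct returned by `udf2` can be split back into ordinary columns. The following is a small sketch, assuming the `df` and `udf2` defined above; `structDf` is a hypothetical variable name used only here.

```csharp
using Microsoft.Spark.Sql;

// Assumes `df` and `udf2` from the cells above.
DataFrame structDf = df.Select(udf2(df["id"]).As("newStructId"));

// Dot notation selects individual fields of a struct column,
// using the field names declared in the schema ("col1" and "col2").
structDf.Select("newStructId.col1", "newStructId.col2").Show();
```

Each field of the returned Row then appears as its own column, which is convenient when later steps only need one of the values.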

## Chained UDF with Row objects


In [22]:
// Chain the UDFs defined above: udf2 builds a Row from each id,
// then udf1 reads that Row's first field and adds 100.
df.Select(udf1(udf2(df["id"])).As("chainedId")).Show();

+---------+
|chainedId|
+---------+
|      100|
|      101|
|      102|
|      103|
|      104|
+---------+