## User Defined Functions (UDFs) in PySpark
**Overview:** User Defined Functions (UDFs) in PySpark apply plain Python functions to column values, row by row, much like `map` or `apply` in pandas. Because Spark DataFrames carry a typed schema, a UDF should declare its return type (the default is `StringType`), much as a schema is specified when creating a DataFrame from a list.
**Using UDFs:**
- **Importing the `udf` function:** Start by importing `udf` from `pyspark.sql.functions`.
- **Defining a Simple Function:**
    - Example: Define a simple square function that returns the square of a number.
- **Creating a UDF:**
    - Wrap the function with `udf`, for example through a lambda that passes each column value to it.
    - Specify the return type using a type from `pyspark.sql.types` (e.g., `IntegerType()`).
    - Assign the UDF to a variable (`square_udf_int`) for reuse.

## Example Implementation:
1. **Square Function UDF:**
    - Define a square function.
    - Convert it to a UDF with an integer return type.
    - Use the UDF to transform the scores in a DataFrame, producing a new column of squared values.
2. **Handling Complex Data Types (Dates):**
    - The dates are stored as a single string that includes time components.
    - Use the string `split` method to separate the dates on commas into a list.
    - Define a UDF whose return type is a list of strings (`ArrayType(StringType())`).
    - Apply the UDF to split each date string, producing a structured list of dates.

## Working with UDFs in PySpark:
- **Explode Functionality:**
    - Import `explode` from `pyspark.sql.functions` and use it to turn a column of lists into individual rows.
    - Use `withColumn` to create a new column that holds one element of the date list per row.
    - The result has one row per check-in date, each keeping its original business ID.

## Practical Example:
- Start from a DataFrame column that holds the comma-separated date strings.
- Apply a UDF to convert each string into a list of strings.
- Use `explode` to turn each list element into its own row.
- Validate the output by checking that the resulting schema and rows match the expected structure.

The exercise shows how UDFs can manage and transform data types that are not natively supported, or are awkward to handle, with PySpark's standard SQL-style operations.