
# 🐷 Pig UDF for Data Normalization

Apache Pig allows **User Defined Functions (UDFs)** to extend Pig Latin for custom processing like **data normalization**.

---

## 📂 Dataset (`numbers.txt`)

```text
10
20
30
40
50
```

---

## 🧩 Step 1: Create a Custom UDF (Java)

### `Normalize.java`

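```java
import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.DataType;
import org.apache.pig.data.Tuple;

public class Normalize extends EvalFunc<Double> {

    @Override
    public Double exec(Tuple input) throws IOException {
        // No argument, or a null field (for example a value that failed to parse on load)
        if (input == null || input.size() == 0 || input.get(0) == null)
            return null;

        // DataType.toDouble converts any numeric Pig type, not only double
        double value = DataType.toDouble(input.get(0));

        // Example normalization: scale by a fixed factor (value / 100)
        return value / 100;
    }
}
```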

✔ Scales each numeric value by a fixed factor (divides by 100)
✔ Deliberately simple logic for demonstration
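
The scaling logic can be tried without a Pig installation. The following is a minimal plain-Java sketch of the same idea, not part of the UDF itself: the class name `NormalizeLogicSketch` and the helper `normalize` are hypothetical names introduced here for illustration, and the handling of non-numeric input (returning `null`) is an assumption about how one might want the function to behave.

```java
public class NormalizeLogicSketch {

    // Hypothetical helper mirroring the UDF's logic: null in, null out;
    // any numeric value is divided by 100.
    static Double normalize(Object raw) {
        if (raw == null) {
            return null;
        }
        if (raw instanceof Number) {
            return ((Number) raw).doubleValue() / 100.0;
        }
        try {
            // Fall back to parsing the text form, e.g. "30"
            return Double.parseDouble(raw.toString().trim()) / 100.0;
        } catch (NumberFormatException e) {
            // Assumption: unparseable input is treated like a missing value
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(normalize(10.0));  // prints 0.1
        System.out.println(normalize(20));    // prints 0.2
        System.out.println(normalize("abc")); // prints null
        System.out.println(normalize(null));  // prints null
    }
}
```

Returning `null` rather than throwing keeps one bad record from failing the whole job, which is usually the more forgiving choice in a batch pipeline.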

---

## 🧩 Step 2: Compile and Create JAR

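```bash
# Pig's classes must be on the compile classpath; /path/to/pig.jar is a
# placeholder for the Pig JAR in your installation.
javac -cp /path/to/pig.jar Normalize.java
jar -cf normalize.jar Normalize.class
```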

---

## 🐖 Step 3: Pig Script

### Register the UDF

```pig
REGISTER 'normalize.jar';
```

### Load Dataset

```pig
data = LOAD 'numbers.txt'
AS (value:double);
```

### Apply UDF for Normalization

```pig
normalized_data = FOREACH data
GENERATE Normalize(value) AS normalized_value;
```

### Display Output

```pig
DUMP normalized_data;
```

---

## ✅ Output

```text
(0.1)
(0.2)
(0.3)
(0.4)
(0.5)
```
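
As a quick sanity check of the numbers above, the dataset can be replayed through the same arithmetic in plain Java. This is only a sketch: `ExpectedOutputSketch` and `dump` are hypothetical names, and wrapping each value in parentheses merely imitates how the tuples appear in the output shown here.

```java
import java.util.ArrayList;
import java.util.List;

public class ExpectedOutputSketch {

    // Divide each value by 100 and format it like a single-field tuple.
    static List<String> dump(double[] values) {
        List<String> lines = new ArrayList<>();
        for (double v : values) {
            lines.add("(" + (v / 100) + ")");
        }
        return lines;
    }

    public static void main(String[] args) {
        double[] numbers = {10, 20, 30, 40, 50}; // contents of numbers.txt
        dump(numbers).forEach(System.out::println);
    }
}
```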
