# Assignment 3: Higher-Order Functions
---
Welcome to Assignment 3, where we delve deeper into higher-order functions in Python, focusing on their applications in data science and machine learning.

In this assignment, you'll explore how these functions can be used to transform data, identify outliers, engineer features, and ultimately prepare your datasets for building robust machine learning models.

#### General Instructions 

For all questions, please adhere to the following guidelines:

- **Code Clarity:** Your code should be well-formatted, easy to understand, and include meaningful variable names.
- **Docstrings:**  Use docstrings to document your functions and explain their purpose, arguments, and return values.
- **Testing:** Run your code on the example data given in each question to demonstrate that it works.

---
## Q1. Scaling Features with `map()`

**Task:**

Create a function to scale a list of features to a range between 0 and 1. 

**Requirements:**

- **Function:**  Define a function or use an anonymous function (lambda).
- **Scaling Formula:** The function should scale each feature value using the formula:
  ```python
  (feature - min_val) / (max_val - min_val)
  ```
  where `min_val` and `max_val` are the minimum and maximum values in the input list.
- **`map()` Function:** Utilize the `map()` function to apply the scaling function to every element in the list.

**Output:** 

Return a new list containing the scaled feature values.

In [20]:
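# TODO: Define a function that scales a feature between 0 and 1
def scaler(x, min_val, max_val):
    """Scale a single value to the range [0, 1].

    Args:
        x: the value to scale
        min_val: minimum value of the feature
        max_val: maximum value of the feature (must differ from min_val)
    Returns:
        The scaled value, (x - min_val) / (max_val - min_val).
    """
    return (x - min_val) / (max_val - min_val)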

# Simulated feature data (age feature)
age_feature = [25, 32, 45, 18, 60, 55, 48, 72]

# TODO: Find the minimum and maximum values
min_val, max_val = min(age_feature), max(age_feature)

# TODO: Scale each feature using your function and map
scaled_ages = list(map(lambda x: scaler(x, min_val, max_val), age_feature))

# TODO: Print the scaled features to verify the output
print(scaled_ages)

[0.12962962962962962, 0.25925925925925924, 0.5, 0.0, 0.7777777777777778, 0.6851851851851852, 0.5555555555555556, 1.0]
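
The scaling formula divides by `max_val - min_val`, which is zero when every value in the list is the same. A minimal sketch of one way to guard against that, assuming a hypothetical helper `min_max_scale` and an assumed convention of mapping a constant feature to all zeros (neither is part of the assignment):

```python
def min_max_scale(values):
    """Scale values to [0, 1] using map(); hypothetical helper for illustration."""
    min_val, max_val = min(values), max(values)
    span = max_val - min_val
    if span == 0:
        # Assumed convention: a constant feature is mapped to all zeros
        # instead of raising ZeroDivisionError.
        return [0.0 for _ in values]
    return list(map(lambda x: (x - min_val) / span, values))


print(min_max_scale([25, 32, 45, 18, 60, 55, 48, 72]))
print(min_max_scale([5, 5, 5]))
```

Other conventions, such as mapping a constant feature to 0.5 or raising an error, may suit a given model better.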


---
## Q2. Filtering Outliers


The basic approach is to identify potential outliers based on a defined threshold. A common method is to use the **Interquartile Range (IQR)**. Here's how it works:

1. **Calculate the IQR:**
   - Find the first quartile (Q1) and the third quartile (Q3) of the list.
   - IQR = Q3 - Q1.

2. **Define Outlier Boundaries:**
   - Lower Boundary: Q1 - 1.5 * IQR
   - Upper Boundary: Q3 + 1.5 * IQR

3. **Filter the List:**
   - Remove any values that fall outside the calculated boundaries.
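
The three steps above can be sketched on a tiny list using only the standard library. The sample data is hypothetical, and `statistics.quantiles` with `method="inclusive"` is an assumed choice of quartile convention (it interpolates linearly between data points); other conventions can give slightly different boundaries.

```python
from statistics import quantiles

# Hypothetical sample; 100 is the planted outlier.
sample = [1, 2, 3, 4, 100]

# Step 1: quartiles and IQR
q1, _, q3 = quantiles(sample, n=4, method="inclusive")
iqr = q3 - q1

# Step 2: outlier boundaries
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

# Step 3: keep only the values inside the boundaries
kept = [x for x in sample if lower <= x <= upper]

print("Q1:", q1, "Q3:", q3, "IQR:", iqr)
print("Bounds:", lower, upper)
print("Kept:", kept)
```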

In [4]:
# Simulated data for temperatures
temperatures = [23, 25, 20, 23, -5, 21, 18, 19, 24, 21, 19, 24, 0,
                20, 24, 55, 22, 50, 22, 20, 21, 22, 20, 25, 19, 
                22, 26, 23, 21, 23, 17, 20, 18]

# TODO: Define a function to calculate percentiles (Q1, Q2, Q3)
import numpy as np

def calculate_percentile(data, percentile):
    """Calculate a percentile of a list of numbers.

    Args:
        data: list of numbers
        percentile: percentile to compute, between 0 and 100
    Returns:
        The requested percentile as a number.
    """
    return np.percentile(data, percentile)

# TODO: Calculate Q1 by calling the function with percentile=25
q1 = calculate_percentile(temperatures,25)
print("Q1:", q1)

# TODO: Calculate Q2 by calling the function with percentile=50
q2 = calculate_percentile(temperatures,50) 
print("Q2:", q2)

# TODO: Calculate Q3 by calling the function with percentile=75
q3 = calculate_percentile(temperatures,75) 
print("Q3:", q3)

# TODO: Calculate IQR by subtracting Q1 from Q3
iqr = q3 - q1
print("IQR:", iqr)

Q1: 20.0
Q2: 21.0
Q3: 23.0
IQR: 3.0


**Define the lower and upper bounds for outliers**

In [5]:
# TODO: Calculate lower bound using formula: Q1 - 1.5 * IQR
lower_bound = q1 - 1.5 * iqr
print("Lower Bound:", lower_bound)

# TODO: Calculate upper bound using formula: Q3 + 1.5 * IQR
upper_bound = q3 + 1.5 * iqr
print("Upper Bound:", upper_bound)


Lower Bound: 15.5
Upper Bound: 27.5


**Filter the outliers**

In [10]:
# TODO: Create a list comprehension that only includes values between the bounds

filtered_temperatures = [temp for temp in temperatures if lower_bound <= temp <= upper_bound]

**Output the results**

In [11]:
print("Original Temperatures:", temperatures)
print("Filtered Temperatures:", filtered_temperatures)

Original Temperatures: [23, 25, 20, 23, -5, 21, 18, 19, 24, 21, 19, 24, 0, 20, 24, 55, 22, 50, 22, 20, 21, 22, 20, 25, 19, 22, 26, 23, 21, 23, 17, 20, 18]
Filtered Temperatures: [23, 25, 20, 23, 21, 18, 19, 24, 21, 19, 24, 20, 24, 22, 22, 20, 21, 22, 20, 25, 19, 22, 26, 23, 21, 23, 17, 20, 18]
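
Because this assignment centers on higher-order functions, the same filtering step can also be expressed with the built-in `filter()`. The sketch below hard-codes the bounds printed earlier (15.5 and 27.5) so that it runs on its own; the names `in_bounds` and `filtered_with_filter` are illustrative and not part of the assignment.

```python
temperatures = [23, 25, 20, 23, -5, 21, 18, 19, 24, 21, 19, 24, 0,
                20, 24, 55, 22, 50, 22, 20, 21, 22, 20, 25, 19,
                22, 26, 23, 21, 23, 17, 20, 18]

# Bounds copied from the Q2 output (Q1 - 1.5 * IQR and Q3 + 1.5 * IQR)
lower_bound, upper_bound = 15.5, 27.5

# Predicate: True for values inside the boundaries (hypothetical name)
in_bounds = lambda temp: lower_bound <= temp <= upper_bound

# filter() keeps the elements for which the predicate returns True
filtered_with_filter = list(filter(in_bounds, temperatures))

# The list comprehension selects exactly the same values
comprehension = [t for t in temperatures if lower_bound <= t <= upper_bound]
print(filtered_with_filter == comprehension)
```

Both forms are equivalent here; the list comprehension is often considered more readable, while `filter()` is convenient when the predicate already exists as a named function.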


---
## Q3: Outlier Removal Function

Take the outlier-filtering steps from Q2 and encapsulate them in a reusable function. The function should be easy to apply to different features so that outliers are removed and analyzed consistently.

In [12]:
# Simulated data 
temperatures = [23, 25, 23, -5, 18, 19, 24, 21, 19, 24, 0, 
                24, 55, 50, 20, 25, 22, 26, 23, 17, 18]

humidity = [60, 65, 72, 68, 75, 80, 82, 78, 62, 68, 71, 
            69, 77, 81, 79, 64, 69, 67,  74, 68, 75, 100] 

In [13]:
# TODO: Define a function to retrieve only data without outliers. Use the previous percentiles function
def remove_outliers(data):
    """Return the values of data that fall within the IQR outlier boundaries.

    Args:
        data: list of numbers
    Returns:
        A list of the values between Q1 - 1.5 * IQR and Q3 + 1.5 * IQR.
    """
    q1 = calculate_percentile(data, 25)
    q3 = calculate_percentile(data, 75)
    iqr = q3 - q1  # IQR of this feature
    lower_bound = q1 - 1.5 * iqr
    upper_bound = q3 + 1.5 * iqr
    return [value for value in data if lower_bound <= value <= upper_bound]

In [15]:
# TODO: Apply the defined function to retrieve only data without outliers.

filtered_temperatures = remove_outliers(temperatures)
filtered_humidity = remove_outliers(humidity)

**Output the results**

In [16]:
print("Original Temperatures:", sorted(temperatures))
print("Filtered Temperatures:", filtered_temperatures)
print('='*110)
print("Original Humidity:", sorted(humidity))
print("Filtered Humidity:", filtered_humidity)

Original Temperatures: [-5, 0, 17, 18, 18, 19, 19, 20, 21, 22, 23, 23, 23, 24, 24, 24, 25, 25, 26, 50, 55]
Filtered Temperatures: [23, 25, 23, 18, 19, 24, 21, 19, 24, 24, 20, 25, 22, 26, 23, 17, 18]
Original Humidity: [60, 62, 64, 65, 67, 68, 68, 68, 69, 69, 71, 72, 74, 75, 75, 77, 78, 79, 80, 81, 82, 100]
Filtered Humidity: [60, 65, 72, 68, 75, 80, 82, 78, 62, 68, 71, 69, 77, 81, 79, 64, 69, 67, 74, 68, 75]
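
As a possible extension in the spirit of higher-order functions, the IQR rule can be wrapped in a function that returns another function, so the multiplier becomes configurable and the same filter can be applied to many features. This is a sketch under stated assumptions: `make_outlier_filter` and its parameter `k` are hypothetical names, and quartiles come from the standard library's `statistics.quantiles` with `method="inclusive"` rather than from NumPy, each feature using its own quartiles and IQR.

```python
from statistics import quantiles


def make_outlier_filter(k=1.5):
    """Return a function that drops values outside [Q1 - k*IQR, Q3 + k*IQR].

    Hypothetical helper for illustration; k is an assumed tuning parameter.
    """
    def remove_outliers(data):
        # Quartiles and IQR are computed from the data passed in,
        # so each feature gets its own boundaries.
        q1, _, q3 = quantiles(data, n=4, method="inclusive")
        iqr = q3 - q1
        lower, upper = q1 - k * iqr, q3 + k * iqr
        return list(filter(lambda value: lower <= value <= upper, data))
    return remove_outliers


features = {
    "temperatures": [23, 25, 23, -5, 18, 19, 24, 21, 19, 24, 0,
                     24, 55, 50, 20, 25, 22, 26, 23, 17, 18],
    "humidity": [60, 65, 72, 68, 75, 80, 82, 78, 62, 68, 71,
                 69, 77, 81, 79, 64, 69, 67, 74, 68, 75, 100],
}

iqr_filter = make_outlier_filter()  # default multiplier of 1.5
cleaned = {name: iqr_filter(values) for name, values in features.items()}

for name, values in cleaned.items():
    print(name, values)
```

A smaller `k` gives a stricter filter and a larger `k` a more permissive one.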
