<a href="https://colab.research.google.com/github/cloudpedagogy/data-science-programming/blob/main/python-programming/05_Functions.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


# Functions


## Overview

In Python, functions are reusable blocks of code that perform specific tasks. They allow you to break down your code into smaller, more manageable pieces, making it easier to understand, debug, and maintain. Functions in Python can take inputs, process them, and produce outputs. They promote code reusability, as you can call a function multiple times throughout your program.

**Defining Functions with `def`**:
In Python, functions are defined using the `def` keyword followed by the function name, parentheses, and a colon. You can define a function with or without parameters. For example, consider a function that calculates the average BMI (Body Mass Index) of a group of individuals from their heights in centimetres and weights in kilograms:

```python
def calculate_average_bmi(heights, weights):
    total_bmi = 0
    count = 0
    for height, weight in zip(heights, weights):
        bmi = weight / ((height / 100) ** 2)
        total_bmi += bmi
        count += 1
    average_bmi = total_bmi / count
    return average_bmi
```

**Function Arguments and Return Values**:
Functions can have arguments (inputs) and return values (outputs). Arguments are specified within the parentheses when defining the function. Return values are specified using the `return` statement. In the example above, the `calculate_average_bmi` function takes two arguments: `heights` and `weights`. It calculates the BMI for each individual and returns the average BMI.
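
As a rough illustration of how arguments go in and a return value comes out, the sketch below restates the function above in compact form and calls it on a few made-up heights (cm) and weights (kg). The sample numbers and the variable names `sample_heights`, `sample_weights`, `avg_positional`, and `avg_keyword` are assumptions for illustration only.

```python
def calculate_average_bmi(heights, weights):
    # BMI = weight (kg) / height (m) squared; heights are given in cm
    bmis = [w / ((h / 100) ** 2) for h, w in zip(heights, weights)]
    return sum(bmis) / len(bmis)

# Made-up sample values, for illustration only
sample_heights = [160, 170, 180]
sample_weights = [60, 70, 80]

# Positional arguments are matched to parameters by order
avg_positional = calculate_average_bmi(sample_heights, sample_weights)

# Keyword arguments are matched by name, so their order does not matter
avg_keyword = calculate_average_bmi(weights=sample_weights, heights=sample_heights)

print(round(avg_positional, 2))
print(avg_positional == avg_keyword)  # True
```

Both calls return the same value because the same lists reach the same parameters; only the way they are matched differs.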

**Anonymous (Lambda) Functions**:
Python supports the creation of anonymous functions using the `lambda` keyword. Lambda functions are typically used for short, one-line operations. For instance, suppose we want to calculate the squared value of each weight in a list. We can use a lambda function with the `map` function as follows:

```python
weights = [65, 70, 75, 80, 85]
squared_weights = list(map(lambda x: x ** 2, weights))
```

Here, the lambda function `lambda x: x ** 2` takes each weight `x` and returns its squared value. The `map` function applies this lambda function to each element in the `weights` list, returning a new list `squared_weights` containing the squared values.

**Map, Filter, Reduce Functions**:
The `map`, `filter`, and `reduce` functions are powerful tools for manipulating data. The `map` function applies a given function to each element in an iterable and returns an iterator over the transformed values. The `filter` function applies a function to each element in an iterable and returns an iterator over only the elements that satisfy a given condition. The `reduce` function, which in Python 3 must be imported from the `functools` module, applies a function to the elements of an iterable in a cumulative way, reducing them to a single value.

For instance, let's say we want to keep only the heights at or above a certain threshold. We can use the `filter` function along with a lambda function:

```python
heights = [160, 165, 170, 175, 180]
threshold = 170
filtered_heights = list(filter(lambda x: x >= threshold, heights))
```

In this example, the lambda function `lambda x: x >= threshold` checks if each height `x` is greater than or equal to the specified threshold. The `filter` function filters out the heights that do not meet this condition, and the result is stored in the `filtered_heights` list.
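
The overview shows `map` and `filter` but not `reduce`, so here is a minimal sketch of it using the same `heights` list. The names `total_height` and `average_height` are chosen for illustration.

```python
from functools import reduce

heights = [160, 165, 170, 175, 180]

# reduce folds the list into one value: ((((160 + 165) + 170) + 175) + 180)
total_height = reduce(lambda running_total, h: running_total + h, heights)
average_height = total_height / len(heights)

print(total_height)    # 850
print(average_height)  # 170.0
```

For a plain sum, the built-in `sum(heights)` is the more idiomatic choice; `reduce` is shown here only to make the cumulative behaviour visible.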

---



## Defining functions with def

Defining functions in Python allows you to encapsulate a block of code that performs a specific task or calculation. It enables you to reuse the code and make your program more modular and organized. In Python, you can define a function using the `def` keyword followed by the function name, parameters (if any), and a colon. The function body is indented and contains the code that defines the behavior of the function.

Here's an example using the Pima Indian Diabetes dataset to demonstrate defining a function:


```python
import pandas as pd

# Load the Pima Indian Diabetes dataset
url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv"
column_names = ["Pregnancies", "Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI", "DiabetesPedigreeFunction", "Age", "Outcome"]
dataset = pd.read_csv(url, names=column_names)

# Define a function to calculate the average BMI
def calculate_average_bmi(data):
    total_bmi = data['BMI'].sum()
    num_entries = data.shape[0]
    average_bmi = total_bmi / num_entries
    return average_bmi

# Call the function and pass the dataset as an argument
avg_bmi = calculate_average_bmi(dataset)

# Print the average BMI
print("Average BMI:", avg_bmi)
```


In this example, we load the Pima Indian Diabetes dataset using the Pandas library. We define a function named `calculate_average_bmi` using the `def` keyword. The function takes one parameter, `data`, which represents the dataset.

Within the function body, we calculate the sum of the 'BMI' column by calling the `sum()` method on `data['BMI']`. We also determine the number of entries in the dataset from `shape[0]`, the first element of the DataFrame's `shape` attribute, which holds the pair (rows, columns). Then, we calculate the average BMI by dividing the total BMI by the number of entries. The result is stored in the `average_bmi` variable.

Finally, we return the `average_bmi` value using the `return` statement. We call the `calculate_average_bmi()` function and pass the `dataset` as an argument. The returned result is assigned to the `avg_bmi` variable. We print the average BMI using the `print()` function.
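
The same sum-divided-by-count idea can be sketched without pandas. The version below is an illustrative variant, not the notebook's function: it works on a plain list, adds a docstring, and guards against empty input, where the division would otherwise raise `ZeroDivisionError`. The function name `average` and the sample values are assumptions.

```python
def average(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    if len(values) == 0:
        # Dividing by zero would raise ZeroDivisionError, so fail with a clearer message
        raise ValueError("values must not be empty")
    return sum(values) / len(values)

bmi_values = [33.6, 26.6, 23.3]  # sample values, for illustration only
avg_bmi = average(bmi_values)

print("Average BMI:", round(avg_bmi, 2))
```

The docstring is what `help(average)` displays, which is one reason to write it even for short functions.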


## Function arguments and return values

In Python, functions can have arguments and return values. Arguments are the inputs that are passed to a function, and return values are the outputs that a function produces and returns back to the caller. Functions provide a way to encapsulate reusable code and perform specific tasks.

Here's an example using the Pima Indian Diabetes dataset to demonstrate function arguments and return values:


```python
import pandas as pd

# Load the Pima Indian Diabetes dataset
url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv"
column_names = ["Pregnancies", "Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI", "DiabetesPedigreeFunction", "Age", "Outcome"]
dataset = pd.read_csv(url, names=column_names)


# Define a function to calculate the average glucose level for a given age range
def calculate_average_glucose(age_min, age_max):
    # Filter the dataset based on the age range
    filtered_data = dataset[(dataset['Age'] >= age_min) & (dataset['Age'] <= age_max)]

    # Calculate the average glucose level
    average_glucose = filtered_data['Glucose'].mean()

    # Return the average glucose level
    return average_glucose


# Call the function with arguments and receive the return value
age_range_min = 30
age_range_max = 40
avg_glucose = calculate_average_glucose(age_range_min, age_range_max)

# Print the return value
print("Average Glucose for age range", age_range_min, "-", age_range_max, ":", avg_glucose)
```


In this example, we define a function called `calculate_average_glucose` that takes two arguments: `age_min` and `age_max`, the lower and upper bounds of the age range (both inclusive). Within the function, the dataset is filtered to that age range using comparison operators combined with `&`. Then, the average glucose level of the filtered rows is calculated using the `mean()` method. Note that the function reads the module-level `dataset` variable rather than receiving the data as a parameter.

The calculated average is returned using the `return` statement. We call `calculate_average_glucose` with the arguments `age_range_min` and `age_range_max` (30 and 40 in this case), store the returned value in the variable `avg_glucose`, and print it for the specified age range.
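
One possible variation, sketched here under stated assumptions, is to pass the data in as an argument and give the age bounds default values, so the function does not depend on a global variable. The function name `average_glucose_in_range`, the default bounds, and the made-up `sample_records` (plain dictionaries standing in for dataset rows) are all hypothetical.

```python
def average_glucose_in_range(records, age_min=0, age_max=120):
    """Average the 'Glucose' field of records whose 'Age' lies in [age_min, age_max]."""
    selected = [r["Glucose"] for r in records if age_min <= r["Age"] <= age_max]
    if not selected:
        return None  # no record falls inside the requested range
    return sum(selected) / len(selected)

# Made-up records standing in for dataset rows
sample_records = [
    {"Age": 25, "Glucose": 90},
    {"Age": 33, "Glucose": 120},
    {"Age": 38, "Glucose": 140},
    {"Age": 52, "Glucose": 160},
]

avg_30_40 = average_glucose_in_range(sample_records, 30, 40)
avg_50_plus = average_glucose_in_range(sample_records, age_min=50)
avg_none = average_glucose_in_range(sample_records, 60, 70)

print(avg_30_40)    # 130.0
print(avg_50_plus)  # 160.0
print(avg_none)     # None
```

Returning `None` for an empty selection is a design choice made for this sketch; raising an exception would be an equally reasonable alternative.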


## Anonymous (Lambda) Functions


Lambda functions, also known as anonymous functions, are a way to create small, one-line functions in Python without using the `def` keyword. They are useful when you need a simple function that you don't want to define separately. Lambda functions are typically used in combination with higher-order functions like `map()`, `filter()`, and `reduce()`.

The syntax for a lambda function is as follows:

```python
lambda arguments: expression
```
Here's an example using the Pima Indian Diabetes dataset to demonstrate the use of a lambda function:



```python
import pandas as pd

# Load the Pima Indian Diabetes dataset
url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv"
column_names = ["Pregnancies", "Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI", "DiabetesPedigreeFunction", "Age", "Outcome"]
dataset = pd.read_csv(url, names=column_names)

# Use a lambda function to calculate BMI category
dataset['BMI_Category'] = dataset['BMI'].apply(lambda bmi: 'Normal' if bmi < 25 else 'Overweight')

# Print the updated dataset with BMI category
print(dataset[['BMI', 'BMI_Category']].head())
```


In this example, we load the Pima Indian Diabetes dataset using the Pandas library. We use a lambda function with the `apply()` method to calculate the BMI category for each record in the dataset.

The lambda function takes the BMI value as an argument (`bmi`) and checks if it is less than 25. If it is, the expression `'Normal'` is returned; otherwise, `'Overweight'` is returned. The result of the lambda function is assigned to a new column called 'BMI_Category' in the dataset.

Finally, we print the 'BMI' column and the newly added 'BMI_Category' column using the `head()` method to see the updated dataset with the BMI categories.
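
To make the relationship between `lambda` and `def` concrete, the following sketch writes the same categorisation rule both ways and applies it to a few made-up BMI values. The names `categorize_lambda`, `categorize_def`, and `sample_bmis` are illustrative.

```python
# A lambda bound to a name...
categorize_lambda = lambda bmi: 'Normal' if bmi < 25 else 'Overweight'

# ...behaves like this def-based function
def categorize_def(bmi):
    if bmi < 25:
        return 'Normal'
    return 'Overweight'

sample_bmis = [22.0, 25.0, 31.4]  # made-up values

lambda_result = [categorize_lambda(b) for b in sample_bmis]
def_result = [categorize_def(b) for b in sample_bmis]

print(lambda_result)  # ['Normal', 'Overweight', 'Overweight']
print(lambda_result == def_result)  # True
```

A lambda is limited to a single expression, so once the logic needs several statements or a docstring, a `def` is usually the clearer option.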


## Map, filter, reduce


In Python, `map()`, `filter()`, and `reduce()` provide powerful tools for manipulating data in different ways. `map()` and `filter()` are built-in functions, while `reduce()` must be imported from the `functools` module.

1. `map()`: The `map()` function applies a given function to each item in an iterable (e.g., a list) and returns a new iterator with the results.

Here's an example using the `map()` function with the Pima Indian Diabetes dataset:


```python
import pandas as pd

# Load the Pima Indian Diabetes dataset
url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv"
column_names = ["Pregnancies", "Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI", "DiabetesPedigreeFunction", "Age", "Outcome"]
dataset = pd.read_csv(url, names=column_names)

# Compute the square of each Glucose value using map()
squared_glucose = list(map(lambda x: x**2, dataset['Glucose']))

# Print the squared Glucose values
print(squared_glucose)
```


In this example, we load the Pima Indian Diabetes dataset using the Pandas library. We then use the `map()` function to apply a lambda function to each value in the 'Glucose' column. The lambda function squares each value, and the `map()` function returns an iterator with the squared values.

We convert the iterator to a list using the `list()` function and assign it to the variable `squared_glucose`. Finally, we print the squared Glucose values.
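
The point that `map()` returns an iterator rather than a list can be seen in a small sketch with made-up values. The variable names below are illustrative, and the list comprehension is shown only as an equivalent alternative.

```python
glucose_values = [85, 120, 155]  # made-up values

squared_iter = map(lambda x: x ** 2, glucose_values)
iter_type_name = type(squared_iter).__name__
print(iter_type_name)  # map

# Converting to a list consumes the iterator
squared_list = list(squared_iter)
print(squared_list)  # [7225, 14400, 24025]

# A second pass over the same iterator yields nothing
second_pass = list(squared_iter)
print(second_pass)  # []

# An equivalent list comprehension
squared_comp = [x ** 2 for x in glucose_values]
print(squared_comp == squared_list)  # True
```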

2. `filter()`: The `filter()` function creates a new iterator with elements from an iterable that satisfy a given condition.

Here's an example using the `filter()` function with the Pima Indian Diabetes dataset:


```python
import pandas as pd

# Load the Pima Indian Diabetes dataset
url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv"
column_names = ["Pregnancies", "Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI", "DiabetesPedigreeFunction", "Age", "Outcome"]
dataset = pd.read_csv(url, names=column_names)

# Keep only the Age values greater than or equal to 30 using filter()
filtered_data = list(filter(lambda x: x >= 30, dataset['Age']))

# Print the filtered Age values
print(filtered_data)
```


In this example, we load the Pima Indian Diabetes dataset using the Pandas library. We then use the `filter()` function to create a new iterator that contains only the values from the 'Age' column that satisfy the lambda function's condition (age greater than or equal to 30).

We convert the iterator to a list using the `list()` function and assign it to the variable `filtered_data`. Finally, we print the filtered data.
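
As a small self-contained illustration with made-up ages, the sketch below shows `filter()` next to the equivalent list comprehension, plus the special case of passing `None` as the function, which keeps only truthy items. All variable names are illustrative.

```python
ages = [21, 30, 45, 29, 62]  # made-up values

# Keep the ages for which the lambda returns True
age_30_plus = list(filter(lambda age: age >= 30, ages))
print(age_30_plus)  # [30, 45, 62]

# An equivalent list comprehension
age_30_plus_comp = [age for age in ages if age >= 30]
print(age_30_plus_comp == age_30_plus)  # True

# With None as the function, filter() drops falsy items such as 0
non_zero = list(filter(None, [0, 148, 0, 85]))
print(non_zero)  # [148, 85]
```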


3. `reduce()`: The `reduce()` function is used to apply a rolling computation to a sequence of values and returns a single result.

To use `reduce()`, we need to import it from the `functools` module:

```python
from functools import reduce
```

Here's an example using the `reduce()` function with the Pima Indian Diabetes dataset:


```python
import pandas as pd
from functools import reduce

# Load the Pima Indian Diabetes dataset
url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv"
column_names = ["Pregnancies", "Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI", "DiabetesPedigreeFunction", "Age", "Outcome"]
dataset = pd.read_csv(url, names=column_names)

# Calculate the product of all Glucose values using reduce()
glucose_product = reduce(lambda x, y: x * y, dataset['Glucose'])

# Print the product of all Glucose values
print(glucose_product)
```


In this example, we load the Pima Indian Diabetes dataset using the Pandas library. We import the `reduce()` function from the `functools` module.

We then use the `reduce()` function to calculate the product of all values in the 'Glucose' column. The lambda function takes two arguments (`x` and `y`) and returns their product. The `reduce()` function applies this lambda function cumulatively to the 'Glucose' values, resulting in a single value, which is the product of all Glucose values. Note that if any value in the column is 0, the whole product is 0.

Finally, we print the product of all Glucose values.
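
The cumulative behaviour of `reduce()` is easier to follow on a short list. The sketch below uses made-up values and also passes the optional third argument (an initial value), which determines the result for an empty input. `math.prod()` is available from Python 3.8 and is shown as an alternative for products.

```python
import math
from functools import reduce

values = [2, 3, 4]  # made-up values

# ((1 * 2) * 3) * 4, starting from the initial value 1
product = reduce(lambda x, y: x * y, values, 1)
print(product)  # 24

# A single zero anywhere makes the whole product zero
product_with_zero = reduce(lambda x, y: x * y, [2, 0, 4], 1)
print(product_with_zero)  # 0

# With an initial value, an empty input returns that value instead of raising TypeError
empty_product = reduce(lambda x, y: x * y, [], 1)
print(empty_product)  # 1

# math.prod (Python 3.8+) computes the same product
prod_builtin = math.prod(values)
print(prod_builtin)  # 24
```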


# Modules and Packages

## Overview

Python provides a modular approach to programming through the use of modules and packages. In this introduction, we will explore the concepts of modules and packages, importing modules, and delve into the exploration of built-in modules such as `math` and `os` using the Pima Indian dataset as an example.

**Modules and Packages**:
Modules are files containing Python code that define functions, variables, and classes, which can be used in other programs. A package is a collection of modules organized in a directory structure. Modules and packages help in organizing code and provide a way to reuse functionality across multiple projects. For instance, if we have a set of functions related to data analysis for the Pima Indian dataset, we can group them into a module or package for easy access and reuse.
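
The module and package idea described above can be sketched by building a throwaway package in a temporary directory and importing a module from it. Everything named here (`pima_tools`, `stats.py`, `average`) is hypothetical and exists only for this illustration; in a real project these files would sit alongside your code rather than being generated at runtime.

```python
import importlib
import os
import sys
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # A package is a directory of modules
    package_dir = os.path.join(tmp, "pima_tools")
    os.makedirs(package_dir)

    # An (empty) __init__.py marks the directory as a regular package
    with open(os.path.join(package_dir, "__init__.py"), "w") as f:
        f.write("")

    # A module is just a .py file containing definitions
    with open(os.path.join(package_dir, "stats.py"), "w") as f:
        f.write("def average(values):\n    return sum(values) / len(values)\n")

    # Make the temporary directory importable for the duration of the import
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        stats = importlib.import_module("pima_tools.stats")
        result = stats.average([20.0, 30.0, 40.0])
    finally:
        sys.path.remove(tmp)

print(result)  # 30.0
```

With the files on disk in a normal project, the import would simply read `from pima_tools import stats`.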

**Exploring Built-in Modules**:
Python comes with a rich standard library of modules that provide various functionalities and can be imported without installing anything. Two commonly used ones are `math` and `os`. The `math` module provides mathematical functions and constants, while the `os` module provides functions for interacting with the operating system.



## Importing modules

In Python, modules are files containing Python definitions and statements that can be used in other Python programs. They allow you to organize and reuse code. To use functions, classes, or variables defined in a module, you need to import the module into your program.

Here's an example using the Pima Indian Diabetes dataset to demonstrate importing modules:


```python
import pandas as pd

# Load the Pima Indian Diabetes dataset
url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv"
column_names = ["Pregnancies", "Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI", "DiabetesPedigreeFunction", "Age", "Outcome"]
dataset = pd.read_csv(url, names=column_names)

# Perform some operations using the pandas module
average_glucose = dataset['Glucose'].mean()
max_blood_pressure = dataset['BloodPressure'].max()

# Print the results
print("Average Glucose:", average_glucose)
print("Max Blood Pressure:", max_blood_pressure)
```


In this example, we import the `pandas` module using the `import` keyword and give it the conventional alias `pd` with `as`. `pandas` is a third-party package that provides functions and data structures for data manipulation and analysis.

We then use the `pd.read_csv()` function from the `pandas` module to load the Pima Indian Diabetes dataset into a DataFrame. This function reads a CSV file, here from a URL, and creates a DataFrame object.

Next, we perform some operations using methods provided by `pandas`. We calculate the average glucose level by calling the `mean()` method on the 'Glucose' column of the dataset. We also find the maximum blood pressure by calling the `max()` method on the 'BloodPressure' column.

Finally, we print the results using the `print()` function to display the average glucose level and maximum blood pressure calculated using functions from the `pandas` module.
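
The `import` statement comes in a few common forms. The sketch below shows three of them using the standard-library `math` module, so it runs without any third-party package; the alias `m` and the result variable names are arbitrary choices for illustration.

```python
import math                  # import the whole module; access names as math.<name>
import math as m             # import under an alias, as done with "pandas as pd"
from math import sqrt, pi    # import specific names directly

root_via_module = math.sqrt(16)
floor_via_alias = m.floor(2.7)
root_via_name = sqrt(9)
pi_rounded = round(pi, 2)

print(root_via_module)  # 4.0
print(floor_via_alias)  # 2
print(root_via_name)    # 3.0
print(pi_rounded)       # 3.14
```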


## Exploring built-in modules (like math, os)

Built-in modules in Python are modules that ship with the interpreter as part of the standard library, so they can be imported without installing anything. They extend the capabilities of Python by providing additional functions, classes, and constants. Two commonly used built-in modules in Python are `math` and `os`.

1. `math` module: This module provides mathematical functions and constants for numerical operations.

Here's an example using the `math` module with the Pima Indian Diabetes dataset:


```python
import pandas as pd
import math

# Load the Pima Indian Diabetes dataset
url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv"
column_names = ["Pregnancies", "Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI", "DiabetesPedigreeFunction", "Age", "Outcome"]
dataset = pd.read_csv(url, names=column_names)

# Calculate the square root of the average glucose level
average_glucose = dataset['Glucose'].mean()
square_root = math.sqrt(average_glucose)

# Print the square root
print("Square root of the average glucose level:", square_root)
```


In this example, we import the `math` module alongside Pandas and then load the Pima Indian Diabetes dataset. We calculate the average glucose level using the `mean()` method from Pandas. Afterward, we use the `sqrt()` function from the `math` module to calculate the square root of the average glucose level.

Finally, we print the square root of the average glucose level using the `print()` function.
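
For a version that runs without downloading the dataset, here is a minimal sketch using a few made-up glucose values. It also shows `math.floor()`, `math.ceil()`, and `math.isclose()`, which are part of the same module; the variable names are illustrative.

```python
import math

glucose_values = [85, 120, 155]  # made-up values
average_glucose = sum(glucose_values) / len(glucose_values)  # 120.0

root = math.sqrt(average_glucose)
print(round(root, 3))  # 10.954

# Rounding down and up to the nearest integer
root_floor = math.floor(root)
root_ceil = math.ceil(root)
print(root_floor, root_ceil)  # 10 11

# Floating-point results are best compared with a tolerance
squares_back = math.isclose(root ** 2, average_glucose)
print(squares_back)  # True
```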


2. `os` module: This module provides a way to interact with the operating system, allowing you to perform various operating system-related tasks such as file operations, directory manipulation, and more.

Here's an example using the `os` module with the Pima Indian Diabetes dataset:


```python
import pandas as pd
import os

# Load the Pima Indian Diabetes dataset
url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv"
column_names = ["Pregnancies", "Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI", "DiabetesPedigreeFunction", "Age", "Outcome"]
dataset = pd.read_csv(url, names=column_names)

# Get the current working directory
current_directory = os.getcwd()

# Create a new directory to save the dataset
# (exist_ok=True avoids a FileExistsError when the cell is run again)
new_directory = os.path.join(current_directory, "diabetes_dataset")
os.makedirs(new_directory, exist_ok=True)

# Save the dataset in the new directory
new_file_path = os.path.join(new_directory, "diabetes_data.csv")
dataset.to_csv(new_file_path, index=False)

# Print the path of the new file
print("Path of the new file:", new_file_path)
```


In this example, we import the `os` module alongside Pandas and then load the Pima Indian Diabetes dataset. We use the `getcwd()` function from the `os` module to get the current working directory.

Then, we create a new directory named "diabetes_dataset" using the `makedirs()` function from the `os` module with `exist_ok=True`. Unlike `os.mkdir()`, which raises `FileExistsError` if the directory already exists, this call succeeds when the cell is run a second time.

Afterward, we use the `os.path.join()` function to build the path of the new file within the newly created directory. We save the dataset to the new file using the `to_csv()` method provided by Pandas.

Finally, we print the path of the new file using the `print()` function.
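
The directory and path handling can be tried safely in a temporary directory, without touching the working directory or downloading data. In this sketch the file contents are made up, and `os.makedirs(..., exist_ok=True)` is used so that creating the directory twice does not fail, whereas `os.mkdir()` raises `FileExistsError` when the directory already exists.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as base_dir:
    new_directory = os.path.join(base_dir, "diabetes_dataset")

    # Safe to call repeatedly: the second call is a no-op
    os.makedirs(new_directory, exist_ok=True)
    os.makedirs(new_directory, exist_ok=True)

    # Build the file path in a platform-independent way
    new_file_path = os.path.join(new_directory, "diabetes_data.csv")
    with open(new_file_path, "w") as f:
        f.write("Glucose,BMI\n148,33.6\n")  # made-up contents

    file_exists = os.path.exists(new_file_path)
    listing = os.listdir(new_directory)

print(file_exists)  # True
print(listing)      # ['diabetes_data.csv']
```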


# Reflection Points

**Defining and Calling Functions:**

1. What is a function in Python?
   - Answer: A function is a named block of code that performs a specific task. It takes inputs (arguments) and may return a value.

2. How do you define a function in Python?
   - Answer: A function is defined using the `def` keyword, followed by the function name, parentheses for optional parameters, and a colon. The code block is indented below the function definition.

3. How do you call a function in Python?
   - Answer: To call a function, you simply write the function name followed by parentheses, optionally passing any required arguments.

**Anonymous (Lambda) Functions:**

1. What is an anonymous function in Python?
   - Answer: An anonymous function, also known as a lambda function, is a function without a name. It is a one-liner function used for simple tasks and is defined using the `lambda` keyword.

2. How do you define an anonymous function in Python?
   - Answer: An anonymous function is defined using the `lambda` keyword, followed by optional parameters, a colon, and the expression to be evaluated.

3. When are anonymous functions useful?
   - Answer: Anonymous functions are useful when you need a simple function that will not be reused elsewhere in your code. They are commonly used as arguments in higher-order functions or in situations where a function is needed temporarily.

**Modules and Packages:**

1. What are modules and packages in Python?
   - Answer: Modules are Python files containing functions, classes, and variables that can be imported and used in other Python scripts. Packages are directories containing multiple modules and, for regular packages, an additional `__init__.py` file.

2. How do you import modules in Python?
   - Answer: Modules are imported using the `import` statement, followed by the module name. You can also import specific functions or variables from a module using the `from` keyword.

3. What is the purpose of the `__init__.py` file in a package?
   - Answer: The `__init__.py` file marks a directory as a regular package. It can be left empty, or it may contain initialization code that runs when the package is imported. Since Python 3.3, a directory without `__init__.py` can still be imported as a namespace package, so the file is no longer strictly required, but including it remains the common convention.


# A quiz on Functions


1. What keyword is used to define a function in Python?
   <br>a) define
   <br>b) function
   <br>c) def
   <br>d) return

2. What is the purpose of function arguments?
   <br>a) To specify the number of times a function should execute.
   <br>b) To provide input values to a function.
   <br>c) To define the output of a function.
   <br>d) To determine the name of a function.

3. How do you return a value from a function in Python?
   <br>a) Using the keyword "result."
   <br>b) By printing the value inside the function.
   <br>c) By assigning the value to a variable.
   <br>d) Using the keyword "return."

4. What is an anonymous function in Python?
   <br>a) A function that does not have a name.
   <br>b) A function that can only be used once.
   <br>c) A function that does not accept any arguments.
   <br>d) A function that is not defined using the "def" keyword.

5. Which keyword is used to define an anonymous function in Python?
   <br>a) lambda
   <br>b) func
   <br>c) anon
   <br>d) def

6. What is the purpose of the map() function in Python?
   <br>a) To filter a sequence based on a condition.
   <br>b) To perform a specified operation on each item of a sequence.
   <br>c) To reduce a sequence to a single value.
   <br>d) To sort a sequence in ascending order.

7. What is the purpose of the filter() function in Python?
   <br>a) To apply a function to each item of a sequence.
   <br>b) To check if a condition is True or False for each item of a sequence.
   <br>c) To combine a sequence of values into a single value.
   <br>d) To create a new sequence based on a condition.

8. What is the purpose of the reduce() function in Python?
   <br>a) To apply a function to each item of a sequence.
   <br>b) To check if a condition is True or False for each item of a sequence.
   <br>c) To combine a sequence of values into a single value.
   <br>d) To create a new sequence based on a condition.

---

**Answers:**

1. c) def
2. b) To provide input values to a function.
3. d) Using the keyword "return."
4. a) A function that does not have a name.
5. a) lambda
6. b) To perform a specified operation on each item of a sequence.
7. d) To create a new sequence based on a condition.
8. c) To combine a sequence of values into a single value.
---

# A quiz on Modules and Packages



1. Which statement is used to import a module in Python?
   <br>a) `require module_name`
   <br>b) `import module_name`
   <br>c) `include module_name`
   <br>d) `load module_name`

2. What is the purpose of importing modules in Python?
   <br>a) To make the code run faster
   <br>b) To add comments to the code
   <br>c) To extend the functionality of Python
   <br>d) To reduce the number of lines of code

3. Which module in Python is commonly used for reading and manipulating data?
   <br>a) `numpy`
   <br>b) `pandas`
   <br>c) `matplotlib`
   <br>d) `datetime`

4. How can you import a module with a different name in Python?
   <br>a) `import module_name as new_name`
   <br>b) `import new_name.module_name`
   <br>c) `module_name as new_name`
   <br>d) `new_name.module_name`

5. What does the `dir()` function do in Python?
   <br>a) Prints the current directory path
   <br>b) Lists all the files in the current directory
   <br>c) Lists all the available functions and attributes of a module or object
   <br>d) Displays the documentation of a module or object

6. Which module can be used for data visualization in Python?
   <br>a) `os`
   <br>b) `sys`
   <br>c) `numpy`
   <br>d) `matplotlib`

7. How can you install a third-party module in Python?
   <br>a) Use the `pip install module_name` command
   <br>b) Download the module and manually copy it to the Python installation folder
   <br>c) Use the `apt-get install module_name` command
   <br>d) Third-party modules cannot be installed in Python

8. What is the purpose of the `importlib` module in Python?
   <br>a) It allows you to import modules written in other programming languages
   <br>b) It provides utilities for working with dynamically imported modules
   <br>c) It is used for handling network requests in Python
   <br>d) It is a built-in module for handling dates and times

9. Which module can be used for performing mathematical operations in Python?
   <br>a) `math`
   <br>b) `csv`
   <br>c) `os`
   <br>d) `random`

10. Suppose you have a file named "data.csv" in the current directory. Which module can be used to read this file in Python?
    <br>a) `os`
    <br>b) `csv`
    <br>c) `pandas`
    <br>d) `sys`

---

**Answers:**

1. b) `import module_name`
2. c) To extend the functionality of Python
3. b) `pandas`
4. a) `import module_name as new_name`
5. c) Lists all the available functions and attributes of a module or object
6. d) `matplotlib`
7. a) Use the `pip install module_name` command
8. b) It provides utilities for working with dynamically imported modules
9. a) `math`
10. c) `pandas`

---