# For and While Loops in Python for Data Science

The **for** and **while** loops are fundamental tools in Python for automating repetitive tasks, such as processing large datasets, cleaning data, or performing iterative calculations. In data science, they are widely used to iterate over DataFrame rows, filter data, or run simulations. This notebook explores loops through practical examples, best-practice tips, and real-world applications.

## Objectives
- Understand how to use **for** loops to iterate over lists, dictionaries, and DataFrames.
- Apply **while** loops for processing based on dynamic conditions.
- Combine loops with common data structures in data science.
- Learn best practices to avoid inefficient loops and common errors.

## Why Are Loops Important?
In data science, you often need to process thousands or millions of records. Loops allow you to automate tasks such as:
- Categorizing data (e.g., classifying customers by age group).
- Filtering information (e.g., finding outliers in a column).
- Transforming data (e.g., applying a formula to each row).

**Note**: Although loops are powerful, vectorized operations (using libraries like NumPy or Pandas) are generally more efficient for large datasets because they process a whole array or column in one call instead of one element at a time. This notebook focuses on loops for scenarios where they are appropriate or necessary.
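
To make the contrast in the note concrete, here is a minimal sketch that converts ages to months twice: once with an explicit loop and once with a single NumPy expression. The data and variable names are illustrative assumptions, not part of a real dataset.

```python
import numpy as np

# Hypothetical sample data: ages stored in a NumPy array
ages = np.array([22, 35, 28, 19, 40])

# Loop version: visit each element and build the result one item at a time
ages_in_months_loop = []
for age in ages:
    ages_in_months_loop.append(int(age) * 12)

# Vectorized version: one expression applied to the whole array at once
ages_in_months_vec = ages * 12

print(ages_in_months_loop)
print(ages_in_months_vec.tolist())
```

Both versions produce the same values. On five elements the difference is negligible; the vectorized form tends to pay off as the array grows.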

## 1. For Loop
The **for** loop is ideal for iterating over iterables such as lists, tuples, dictionaries, or DataFrame rows. It executes a block of code once for each element, which makes it well suited to tasks like categorization or data transformation.

### Characteristics
- **Simple syntax**: `for item in sequence:`.
- **Works with any iterable**: Can traverse lists, tuples, strings, dictionaries, `range` objects, and more.
- **Common use**: Processing elements in an ordered manner or applying rules to each item.
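
As a quick illustration of the characteristics above, the following sketch runs the same `for item in sequence:` pattern over a dictionary, a string, and a `range`. The `customer` record is a made-up example.

```python
# Hypothetical customer record used only for illustration
customer = {"name": "Ana", "age": 28, "city": "Lisbon"}

# Iterating over a dictionary yields its keys; .items() yields (key, value) pairs
fields = []
for key, value in customer.items():
    fields.append(f"{key}={value}")

# Strings are iterable too: count the vowels in the name
vowel_count = 0
for char in customer["name"].lower():
    if char in "aeiou":
        vowel_count += 1

# range() produces a sequence of integers on demand
squares = []
for n in range(1, 4):
    squares.append(n ** 2)

print(fields)
print(vowel_count)
print(squares)
```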

Let’s see a practical example of age categorization, a common case in demographic data analysis.

In [None]:
# Example: Categorizing ages
ages = [22, 35, 28, 19, 40]
categories = []

for age in ages:
    if age < 18:
        categories.append("Minor")
    elif age <= 35:
        categories.append("Young Adult")
    else:
        categories.append("Adult")

print("Ages:", ages)
print("Categories:", categories)

### Example Explanation
- **List of ages**: Represents a data column, such as in a customer dataset.
- **For loop**: Iterates over each age, applying a categorization rule.
- **Conditions**: Uses `if/elif/else` to classify each age into a category.
- **List of categories**: Stores the results, which could be used for further analysis (e.g., visualizations or reports).

### Best Practice Tips
- **Avoid unnecessary nested loops**: Multiple nested loops can slow down your code. Consider alternative structures, such as dictionaries or vectorized operations.
- **Use list comprehensions when possible**: For simple tasks like the one above, you can rewrite the loop as:
  ```python
  categories = ['Minor' if age < 18 else 'Young Adult' if age <= 35 else 'Adult' for age in ages]
  ```
  This is more concise and, in some cases, faster.
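
The first tip, replacing a nested loop with a dictionary, might look like the sketch below. The purchase and customer records are hypothetical and exist only to show the pattern.

```python
# Hypothetical data: (customer_id, amount) purchases and (customer_id, name) customers
purchases = [(101, 30.0), (102, 12.5), (101, 7.5)]
customers = [(101, "Ana"), (102, "Bruno")]

# Nested-loop version: scans the customer list again for every purchase
names_nested = []
for customer_id, _amount in purchases:
    for cid, name in customers:
        if cid == customer_id:
            names_nested.append(name)
            break

# Dictionary version: build the lookup once, then make a single pass over purchases
name_by_id = dict(customers)
names_lookup = [name_by_id[customer_id] for customer_id, _amount in purchases]

print(names_nested)
print(names_lookup)
```

The nested version does work proportional to purchases times customers, while the dictionary version does one pass over each list. This sketch assumes every purchase has a matching customer; a missing ID would raise `KeyError` in the dictionary version.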

### Additional Example: Iterating with `range`
Sometimes you need to iterate over indices rather than values. The built-in `range()` is useful for this: `range(len(ages))` yields the positions `0` through `len(ages) - 1`.

In [None]:
# Example: Calculating the cumulative sum of ages
ages = [22, 35, 28, 19, 40]
cumulative_sum = []
total = 0

for i in range(len(ages)):
    total += ages[i]
    cumulative_sum.append(total)

print("Ages:", ages)
print("Cumulative sum:", cumulative_sum)
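
When the index is not needed for anything other than element access, the same running total can be written without `range(len(...))`. The sketch below shows three standard-library options: direct iteration, `enumerate()` for cases where both index and value are wanted, and `itertools.accumulate`.

```python
from itertools import accumulate

ages = [22, 35, 28, 19, 40]

# Iterate over the values directly; no index is needed for a running total
cumulative_sum = []
total = 0
for age in ages:
    total += age
    cumulative_sum.append(total)

# enumerate() supplies the index alongside each value when both are needed
labeled = [f"{i}: {age}" for i, age in enumerate(ages)]

# itertools.accumulate computes the same running total
cumulative_sum_std = list(accumulate(ages))

print(cumulative_sum)
print(labeled)
print(cumulative_sum_std)
```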

## 2. While Loop
The **while** loop executes a block of code as long as a specific condition remains true. It is useful when the number of iterations is not known in advance, such as in searches or simulations.

### Characteristics
- **Condition-controlled**: Runs until the condition becomes false.
- **Beware of infinite loops**: Always ensure the condition will eventually become false.
- **Common use**: Processing data until a limit is reached or a specific value is found.

Let’s see an example of searching a list, simulating the search for an outlier in a dataset.

In [None]:
# Example: Finding the first passenger with age > 50
ages = [22, 35, 55, 28, 19]
i = 0

while i < len(ages) and ages[i] <= 50:
    i += 1

if i < len(ages):
    print(f"First age > 50 found: {ages[i]} at position {i}")
else:
    print("No age > 50 found")

### Example Explanation
- **List of ages**: Simulates a dataset with passenger information.
- **While loop**: Continues until an age greater than 50 is found or the end of the list is reached.
- **Index `i`**: Tracks the current position in the list, allowing access to the corresponding element.
- **Output**: Indicates whether the condition was met and, if so, which value and position were found.

### Best Practice Tips
- **Avoid infinite loops**: Always include a clear stopping condition (e.g., `i < len(ages)`).
- **Use `break` for greater control**: If you need to exit the loop under a specific condition, use `break`.
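- **Consider alternatives**: For simple searches, built-in tools may be more straightforward: `list.index()` finds the position of an exact value, and `next()` with a generator expression returns the first element that satisfies a condition.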
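
The tips above can be sketched on the same list of ages. This is one possible way to write the search, not the only one: a `for` loop with `break`, `next()` with a generator expression, and `list.index()` for an exact value.

```python
ages = [22, 35, 55, 28, 19]

# for + break: stop as soon as the first age above 50 is seen
found_position = None
for i, age in enumerate(ages):
    if age > 50:
        found_position = i
        break

# next() with a generator expression: first match, or None if nothing matches
first_match = next((age for age in ages if age > 50), None)

# list.index() only locates an exact value, not a condition
position_of_55 = ages.index(55)

print(found_position, first_match, position_of_55)
```

Note that `list.index()` raises `ValueError` when the value is absent, whereas `next()` with a default returns that default.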

### Additional Example: Simulation with While
A common case in data science is simulating processes until a threshold is exceeded, such as accumulating values until surpassing a limit.

In [None]:
# Example: Accumulating ages until exceeding 100
ages = [22, 35, 28, 19, 40]
total = 0
i = 0

while i < len(ages) and total <= 100:
    total += ages[i]
    i += 1

print(f"Accumulated sum: {total} after {i} iterations")
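
When the values are generated rather than read from a list, the number of iterations is genuinely unknown in advance. The sketch below draws random ages until the total exceeds a threshold. The seed, the age range, and the `max_draws` safety cap are assumptions chosen for illustration.

```python
import random

rng = random.Random(42)  # fixed seed (arbitrary choice) so the run is repeatable
threshold = 100
max_draws = 50  # safety cap, an assumption added to guarantee termination

total = 0
draws = 0
while total <= threshold and draws < max_draws:
    total += rng.randint(18, 60)  # hypothetical age range for simulated records
    draws += 1

print(f"Stopped after {draws} draws with total = {total}")
```

The second condition in the `while` header plays the same role as `i < len(ages)` in the list-based version: it bounds the loop even if the threshold were never reached.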

## 3. Applications in Data Science
Loops are often combined with libraries like Pandas and NumPy. Here’s an example of looping over a DataFrame column, written as a list comprehension (we’ll briefly introduce Pandas, but the focus is on the loop).

### Example: Categorizing Ages in a DataFrame

In [None]:
import pandas as pd

# Creating a simple DataFrame
data = pd.DataFrame({'Age': [22, 35, 28, 19, 40]})

# Adding a category column
data['Category'] = ['Minor' if age < 18 else 'Young Adult' if age <= 35 else 'Adult' for age in data['Age']]

print(data)

### Explanation
- **Pandas DataFrame**: A common tabular structure in data science.
- **List comprehension**: Replaces an explicit **for** loop with a more concise form. It still processes one value at a time, so the logic remains the same.
- **Application**: The new column can be used for analysis or visualizations.
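
For comparison, here is a sketch of the same categorization written two other ways: as an explicit loop over rows with `itertuples()`, and as a vectorized expression with `np.select`. The column names `Category_loop` and `Category_vec` are hypothetical, added only to keep the two results side by side.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'Age': [22, 35, 28, 19, 40]})

# Explicit loop over rows: itertuples() yields one namedtuple per row
loop_categories = []
for row in data.itertuples(index=False):
    if row.Age < 18:
        loop_categories.append('Minor')
    elif row.Age <= 35:
        loop_categories.append('Young Adult')
    else:
        loop_categories.append('Adult')
data['Category_loop'] = loop_categories

# Vectorized version: each condition is evaluated on the whole column at once;
# the first matching condition wins, and 'Adult' is the fallback
conditions = [data['Age'] < 18, data['Age'] <= 35]
choices = ['Minor', 'Young Adult']
data['Category_vec'] = np.select(conditions, choices, default='Adult')

print(data)
```

The explicit loop is easier to extend with per-row logic, while the vectorized form is usually the better fit for large columns.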

## Conclusion
The **for** and **while** loops are powerful tools for automation in data science. Although vectorized operations are preferred for large datasets, loops remain indispensable in specific scenarios, such as conditional processing or simulations. Practice with real datasets and explore libraries like Pandas to get the most out of loops!