# Looping Through Different Data Types in Python

**Introduction**

In this notebook, we'll explore how different data types behave when we loop through them in Python. Knowing how each type behaves in a loop helps us write efficient, correct code, especially when working with large datasets in data science.


Looping over sets, tuples, and strings is less common in data science, but it’s still valuable in specific scenarios: strings are central to NLP (Natural Language Processing), sets guarantee uniqueness, and tuples suit fixed data structures. Because sets are unordered and tuples and strings can’t be modified in place, they’re less versatile than lists, dictionaries, and dataframes, which are better suited to the complex data manipulation data science usually requires.

**Uses of Loops**

- Iterating over data structures: Access each element in lists, dictionaries, tuples, sets, or dataframes.

- Automating repetitive tasks: Perform the same operation multiple times (e.g., cleaning rows in a dataset).

- Generating sequences: Create ranges or series of values.

- Aggregating data: Summarize or compute metrics like sums, averages, or counts.

- Condition-based processing: Apply specific operations based on conditions for each element.

- Building new data structures: Generate new lists, dictionaries, or other structures based on existing data.

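- Parallel processing: Distribute tasks or batches across multiple workers (e.g., batches in machine learning); a plain for loop runs one item at a time, so true parallelism needs a tool such as the multiprocessing module.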

- Data visualization: Generate multiple plots in a loop.

- Batch processing: Handle data in chunks (useful for large datasets).

## 1. Lists

Lists are ordered collections of items. They can contain mixed data types and are mutable, meaning we can change elements within the list.

**Explanation**

- Lists are typically used when we need an ordered collection of items, especially when we need to process each item.
- Example use case: Iterating over a list of numerical values to calculate the mean or sum.

In [None]:
# Define a list of numbers
numbers = [10, 20, 30, 40, 50]

# Loop through each item in the list
total = 0
for num in numbers:
    total += num
    print(f"Current total: {total}")  # Show the running total after each addition

# After the loop, we can calculate the average
average = total / len(numbers)
print("Average:", average)
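Python's built-ins can shorten this pattern. The sketch below, using the same numbers list, shows enumerate() for when the position of each item matters too, and sum() for the total, which replaces the manual accumulator.

```python
numbers = [10, 20, 30, 40, 50]

# enumerate() yields (index, value) pairs; start=1 numbers the items from 1
for position, num in enumerate(numbers, start=1):
    print(f"Item {position}: {num}")

# sum() replaces the manual accumulator from the loop above
total = sum(numbers)
average = total / len(numbers)
print("Average:", average)  # Average: 30.0
```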

## 2. Strings
   
Strings in Python are sequences of characters, making them iterable. When looping through a string, each iteration gives one character.

**Explanation**

- Strings are useful for processing each character individually.
- Example use case: Validating or transforming individual characters in text processing.

In [1]:
# Define a string
text = "DataScience"

# Loop through each character in the string
for char in text:
    print(char)  # Prints each character in the string


D
a
t
a
S
c
i
e
n
c
e
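Character-level loops are also handy for transforming text. As a small, hypothetical example, this sketch counts the capital letters in "DataScience" and converts the CamelCase string to snake_case by inserting an underscore before every capital except the first. It assumes simple ASCII-style capitalization; str.isupper() and str.lower() are standard string methods.

```python
text = "DataScience"

uppercase_count = 0
snake_chars = []
for char in text:
    if char.isupper():
        uppercase_count += 1
        # Add an underscore before each capital letter except the first one
        if snake_chars:
            snake_chars.append("_")
        snake_chars.append(char.lower())
    else:
        snake_chars.append(char)

snake_case = "".join(snake_chars)
print(uppercase_count)  # 2
print(snake_case)       # data_science
```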


## 3. Dictionaries

Dictionaries store key-value pairs and are often used for structured data. When looping through a dictionary, you can choose to iterate over keys, values, or both.

**Explanation**

- Looping through dictionaries can be useful when working with structured data, like JSON files or database records.
- Example use case: Extracting and processing specific fields from a dataset.

In [3]:
# Define a dictionary with some key-value pairs
person = {
    "name": "Alice",
    "age": 30,
    "profession": "Data Scientist"
}

# Loop through the dictionary keys and values
for key, value in person.items():
    print(f"{key}: {value}")  # Prints each key-value pair in the dictionary


name: Alice
age: 30
profession: Data Scientist
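Iterating over a dictionary directly yields its keys, and .values() yields only the values. This sketch reuses the person dictionary to show both; .items(), used above, is the option when you need keys and values together.

```python
person = {"name": "Alice", "age": 30, "profession": "Data Scientist"}

# Looping over the dictionary itself (or person.keys()) yields the keys
keys = []
for key in person:
    keys.append(key)

# person.values() yields only the values
values = []
for value in person.values():
    values.append(value)

print(keys)    # ['name', 'age', 'profession']
print(values)  # ['Alice', 30, 'Data Scientist']
```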


## 4. Sets 

Sets are unordered collections of unique items. They’re helpful when you need to avoid duplicates and perform mathematical operations like union, intersection, etc.

**Explanation**

- Sets are useful for ensuring each element is unique, such as filtering unique values in a list.
- Example use case: Removing duplicate entries in a list of values.

In [4]:
# A set literal with repeated values: duplicates (2 and 3) are dropped automatically
unique_numbers = {1, 2, 3, 4, 2, 3}

# Loop through the set
for number in unique_numbers:
    print(number)  # Each value appears once; sets don't guarantee iteration order


1
2
3
4
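Building a set discards the original order, so when order matters a common pattern is to loop over the list and use a set only to remember which values have already been seen. A minimal sketch; dedupe_preserving_order is a hypothetical helper name:

```python
def dedupe_preserving_order(values):
    """Return values with duplicates removed, keeping first occurrences in order."""
    seen = set()
    result = []
    for value in values:
        if value not in seen:  # set membership checks are O(1) on average
            seen.add(value)
            result.append(value)
    return result


raw = [3, 1, 3, 2, 1, 4]
print(dedupe_preserving_order(raw))  # [3, 1, 2, 4]
```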


## 5. Tuples

Tuples are similar to lists but are immutable, meaning their values cannot be changed once defined. Tuples are often used to store related pieces of information together.

**Explanation**

- Tuples are useful for representing structured data that doesn’t need modification.
- Example use case: Iterating over a list of (latitude, longitude) coordinates.

In [5]:
# Define a tuple of coordinates
coordinates = (40.7128, -74.0060)

# Loop through each value in the tuple
for coord in coordinates:
    print(coord)  # Prints each coordinate (latitude and longitude)


40.7128
-74.006
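The use case above mentions a list of coordinate pairs, which is where tuples shine in loops: each tuple can be unpacked into named variables in the for statement itself. A sketch reusing the New York and Tokyo coordinates that appear in this notebook:

```python
# (latitude, longitude) pairs for New York and Tokyo, as used in this notebook
coordinate_list = [
    (40.7128, -74.0060),
    (35.6895, 139.6917),
]

latitudes = []
for lat, lon in coordinate_list:  # each tuple is unpacked into lat and lon
    latitudes.append(lat)
    print(f"Latitude: {lat}, Longitude: {lon}")
```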


## Additional Looping Examples for Lists, Dictionaries, and pandas DataFrames

In this section, we will look at how loops can be applied to lists, dictionaries, and dataframes. These examples demonstrate that loops are helpful for:

- Aggregating and calculating statistics
- Cleaning data and handling invalid values
- Applying conditional logic on structured data in dataframes

### Additional Examples with Lists

**Example 1: Calculating Statistics on a List of Values**

Lists are often used to store columns of data. For instance, if we have a list of daily temperatures, we can calculate statistics such as the average temperature, maximum, and minimum.

In [6]:
# List of daily temperatures in Celsius
temperatures = [22.1, 23.5, 19.8, 21.0, 25.3, 24.2, 20.5]

# Calculate the average, max, and min temperature
total = 0
for temp in temperatures:
    total += temp

average_temp = total / len(temperatures)
max_temp = max(temperatures)
min_temp = min(temperatures)

print(f"Average Temperature: {average_temp:.2f}°C")
print(f"Maximum Temperature: {max_temp}°C")
print(f"Minimum Temperature: {min_temp}°C")


Average Temperature: 22.34°C
Maximum Temperature: 25.3°C
Minimum Temperature: 19.8°C
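The example above loops only for the total and relies on max() and min() for the extremes. To see how a single loop can track all three statistics in one pass, here is a sketch using the same temperatures:

```python
temperatures = [22.1, 23.5, 19.8, 21.0, 25.3, 24.2, 20.5]

# Start min/max from the first reading, then update them in the same pass as the sum
total = 0
max_temp = temperatures[0]
min_temp = temperatures[0]
for temp in temperatures:
    total += temp
    if temp > max_temp:
        max_temp = temp
    if temp < min_temp:
        min_temp = temp

average_temp = total / len(temperatures)
print(f"Average: {average_temp:.2f}°C, Max: {max_temp}°C, Min: {min_temp}°C")
```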


**Example 2: Cleaning Data in a List**

Sometimes data can contain invalid or missing values. Using a list comprehension, we can loop through a list and replace invalid values with a placeholder or remove them.

In [7]:
# List of values with some None entries
data = [4, 8, None, 15, None, 23, 42]

# Remove None values with a list comprehension
cleaned_data = [x for x in data if x is not None]

print("Cleaned Data:", cleaned_data)


Cleaned Data: [4, 8, 15, 23, 42]
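The comprehension above removes missing values. To replace them with a placeholder instead, move the check into a conditional expression. The sketch below shows two possible placeholders, 0 and the mean of the valid values; which one is appropriate is an assumption that depends on your data.

```python
data = [4, 8, None, 15, None, 23, 42]

# Option 1: replace each None with a fixed placeholder (0 is an assumption)
zero_filled = [x if x is not None else 0 for x in data]

# Option 2: replace each None with the mean of the valid values
valid = [x for x in data if x is not None]
mean_value = sum(valid) / len(valid)
mean_filled = [x if x is not None else mean_value for x in data]

print("Zero-filled:", zero_filled)  # Zero-filled: [4, 8, 0, 15, 0, 23, 42]
print("Mean-filled:", mean_filled)  # Mean-filled: [4, 8, 18.4, 15, 18.4, 23, 42]
```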


### Additional Examples with Dictionaries

**Example 1: Aggregating Data by Categories**

Dictionaries are useful for grouping and aggregating data. For example, we might have a dictionary where keys are categories (like product types) and values are lists of sales numbers.

In [8]:
# Dictionary of sales data by product category
sales_data = {
    "electronics": [200, 150, 300],
    "furniture": [400, 100, 250],
    "clothing": [50, 70, 90]
}

# Calculate total sales per category
for category, sales in sales_data.items():
    total_sales = sum(sales)
    print(f"Total sales for {category}: {total_sales}")


Total sales for electronics: 650
Total sales for furniture: 750
Total sales for clothing: 210
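Raw data often arrives as individual records rather than pre-grouped lists. This sketch, using hypothetical transactions, shows how a loop builds the grouped totals; dict.get(category, 0) supplies a starting value the first time a category appears.

```python
# Hypothetical (category, amount) records, one per sale
transactions = [
    ("electronics", 200),
    ("furniture", 400),
    ("electronics", 150),
    ("clothing", 50),
]

totals = {}
for category, amount in transactions:
    # Add to the existing total, or start from 0 for a new category
    totals[category] = totals.get(category, 0) + amount

for category, total in totals.items():
    print(f"Total sales for {category}: {total}")
```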


**Example 2: Nested Dictionaries for Multi-Level Data**

Dictionaries can also be nested to handle more complex data. Suppose we have data on students with each student’s scores in different subjects.

In [9]:
# Nested dictionary of students and their subject scores
student_scores = {
    "Alice": {"Math": 88, "Science": 92, "English": 85},
    "Bob": {"Math": 79, "Science": 84, "English": 80},
    "Charlie": {"Math": 95, "Science": 89, "English": 91}
}

# Loop through students and calculate their average score
for student, scores in student_scores.items():
    average_score = sum(scores.values()) / len(scores)
    print(f"{student}'s average score: {average_score:.2f}")


Alice's average score: 88.33
Bob's average score: 81.00
Charlie's average score: 91.67
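The loop above averages across each student's subjects. Averaging per subject instead needs a nested loop over both levels of the dictionary; a sketch reusing student_scores:

```python
student_scores = {
    "Alice": {"Math": 88, "Science": 92, "English": 85},
    "Bob": {"Math": 79, "Science": 84, "English": 80},
    "Charlie": {"Math": 95, "Science": 89, "English": 91}
}

subject_totals = {}
for student, scores in student_scores.items():  # outer loop: students
    for subject, score in scores.items():       # inner loop: that student's subjects
        subject_totals[subject] = subject_totals.get(subject, 0) + score

for subject, total in subject_totals.items():
    average = total / len(student_scores)
    print(f"{subject} average: {average:.2f}")
```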


### Looping Through DataFrames in Data Science

In data science, pandas dataframes are the usual tool for data manipulation, and explicit loops over them are typically a fallback: looping row by row runs at Python speed and becomes slow on large datasets. Vectorized pandas operations, which work on whole columns at once, are much faster, and apply() is often more concise than an explicit loop, although it still calls a Python function for each element. Loops in dataframes are best reserved for logic that can’t easily be vectorized, such as row-wise calculations that depend on multiple conditions.


Here are a few examples of how to use loops with dataframes, although vectorized operations (using pandas functions) are usually preferred for efficiency.


**Example 1: Iterating Over Rows with iterrows()**

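If you need to access each row, you can use iterrows(), which yields each row's index together with the row's data as a pandas Series. Because every row is converted to a Series, iterrows() is slow on large dataframes and doesn't preserve column dtypes across rows.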

In [10]:
import pandas as pd

# Sample data
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [24, 27, 22],
    'Score': [88, 92, 95]
}
df = pd.DataFrame(data)

# Iterate over rows to print each student's name and score
for index, row in df.iterrows():
    print(f"Student {row['Name']} has a score of {row['Score']}")


Student Alice has a score of 88
Student Bob has a score of 92
Student Charlie has a score of 95
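When you only need to read values, DataFrame.itertuples() is usually faster than iterrows(), because it yields lightweight namedtuples instead of building a Series for each row. A sketch with the same sample data:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [24, 27, 22],
    'Score': [88, 92, 95],
})

# index=False leaves the index out of each namedtuple; columns become attributes
messages = []
for row in df.itertuples(index=False):
    messages.append(f"Student {row.Name} has a score of {row.Score}")

for message in messages:
    print(message)
```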


**Example 2: Applying Conditional Logic on DataFrames**

Sometimes, you may want to modify the dataframe based on conditions. Here’s how you could loop through rows to apply a conditional operation.

In [11]:
# Example: Update 'Score' if 'Age' is greater than 25
for index, row in df.iterrows():
    if row['Age'] > 25:
        df.at[index, 'Score'] += 5  # Increment score by 5 for students older than 25

df


| | Name | Age | Score |
|---|---|---|---|
| 0 | Alice | 24 | 88 |
| 1 | Bob | 27 | 97 |
| 2 | Charlie | 22 | 95 |
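The same update can be written without a loop by combining a boolean mask with .loc, which is the vectorized style recommended above. A sketch that rebuilds the original sample data first:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [24, 27, 22],
    'Score': [88, 92, 95],
})

# The mask selects rows where Age > 25; only those rows' Score values change
df.loc[df['Age'] > 25, 'Score'] += 5

print(df)
```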


**Example 3: Using apply() for Column-Wise Operations**

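The apply() function is usually more concise than an explicit loop and is recommended for column-wise operations. Series.apply() still calls the Python function once per element, so it isn't truly vectorized, but it avoids building a Series for every row the way iterrows() does.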

In [12]:
# Define a function to grade based on score
def grade(score):
    if score >= 90:
        return 'A'
    elif score >= 80:
        return 'B'
    else:
        return 'C'

# Apply function to the 'Score' column
df['Grade'] = df['Score'].apply(grade)

df


| | Name | Age | Score | Grade |
|---|---|---|---|---|
| 0 | Alice | 24 | 88 | B |
| 1 | Bob | 27 | 97 | A |
| 2 | Charlie | 22 | 95 | A |
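For simple threshold rules like grade(), a vectorized alternative to apply() is numpy.select, which, for each row, picks the choice matching the first condition that is true. This sketch assumes NumPy is available (pandas depends on it) and uses the scores after Example 2's update:

```python
import numpy as np
import pandas as pd

# Scores as they stand after Example 2's update
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [24, 27, 22],
    'Score': [88, 97, 95],
})

# Conditions are checked in order; rows matching none of them get the default
conditions = [df['Score'] >= 90, df['Score'] >= 80]
choices = ['A', 'B']
df['Grade'] = np.select(conditions, choices, default='C')

print(df)
```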


## What About Sets, Tuples, and Strings?

Looping through sets, tuples, and strings is generally less common in data science than looping through lists, dictionaries, and dataframes, mainly because their characteristics (sets are unordered; tuples and strings can’t be modified in place) suit narrower tasks. However, here’s how and why these types might still be used:

**1. Sets:**

- Unique Values: Sets are primarily used when we need a collection of unique items. They’re unordered and don’t allow duplicates, which makes them useful for tasks like filtering out duplicate values in a dataset.

- Common Use Cases in Data Science:
    - Checking for unique values (e.g., unique categories in a dataset).
    - Performing set operations like union, intersection, and difference to compare groups of data.

- Looping Example:
    - We don’t loop through sets as often, but it’s useful when we need to examine each unique item. Because sets are unordered, they’re a poor fit when order matters.

In [13]:
unique_labels = {"spam", "ham", "neutral"}
# String hashing is randomized per interpreter run, so this order can change between runs
for label in unique_labels:
    print(label)


neutral
ham
spam
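Set operations are where sets earn their place in data work. This sketch compares two hypothetical annotators' label sets with intersection (&), union (|), and difference (-), then loops over the sorted union, since sorting gives the stable order a bare set doesn't.

```python
# Hypothetical labels used by two annotators
annotator_a = {"spam", "ham", "neutral"}
annotator_b = {"spam", "ham", "promo"}

both = annotator_a & annotator_b    # intersection: labels both annotators used
either = annotator_a | annotator_b  # union: every label used by anyone
only_a = annotator_a - annotator_b  # difference: labels only annotator A used

# Sort before looping so the output order is the same on every run
for label in sorted(either):
    if label in both:
        status = "both"
    elif label in only_a:
        status = "A only"
    else:
        status = "B only"
    print(f"{label}: {status}")
```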


**2. Tuples:**

- Fixed Structure: Tuples are immutable, so they’re ideal for storing fixed sets of data whose values should not change, like coordinates or certain types of records.

- Common Use Cases in Data Science:
    - Representing data that doesn’t need modification (e.g., coordinates, configurations, multi-key groupings).
    - Serving as dictionary keys for fast lookups, since a tuple of hashable values is itself hashable.

- Looping Example:
    - Looping is typically straightforward, especially when processing structured, unchangeable records.

In [14]:
coordinates = (35.6895, 139.6917)  # Latitude and longitude
for coord in coordinates:
    print(coord)


35.6895
139.6917
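Because tuples are hashable, they can serve as dictionary keys, which makes multi-key groupings straightforward. A sketch with hypothetical (region, year, amount) records:

```python
# Hypothetical sales records: (region, year, amount)
records = [
    ("north", 2023, 120),
    ("south", 2023, 80),
    ("north", 2023, 30),
    ("north", 2024, 95),
]

region_year_totals = {}
for region, year, amount in records:
    key = (region, year)  # a tuple key groups by two fields at once
    region_year_totals[key] = region_year_totals.get(key, 0) + amount

for (region, year), total in region_year_totals.items():
    print(f"{region} {year}: {total}")
```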


**3. Strings:**

- Character Processing: Strings are often treated as atomic values, but looping can be useful for individual character processing, such as when parsing text.

- Common Use Cases in Data Science:
    
    - Text cleaning, character counting, or creating n-grams in natural language processing (NLP).

- Looping Example:

    - Looping through characters in text for character-level processing or encoding.

In [15]:
text = "Data"
for char in text:
    print(char)


D
a
t
a
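Character loops also drive simple NLP features such as character n-grams. A minimal sketch; char_ngrams is a hypothetical helper that slides a window of width n across the string:

```python
def char_ngrams(text, n):
    """Return the overlapping character n-grams of text, from left to right."""
    ngrams = []
    # The last window starts at len(text) - n; shorter strings give an empty list
    for i in range(len(text) - n + 1):
        ngrams.append(text[i:i + n])
    return ngrams


print(char_ngrams("Data", 2))  # ['Da', 'at', 'ta']
```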


In [16]:
import sys
print(sys.version)


3.12.3 | packaged by Anaconda, Inc. | (main, May  6 2024, 19:42:21) [MSC v.1916 64 bit (AMD64)]


## Bonus: Nested Loops

**Question** - When designing a nested loop structure for a particular problem, how do we determine the structure of the loop?

**Answer** - The starting point depends on the hierarchy of operations and how the data is organized. To determine where to start, consider the following steps:

1. Understand the Data Structure

   Analyze the data structure you're working with. Is it:

   - A single list?
   - A list of lists (e.g., a matrix)?
   - A dictionary of lists?
   - A dataframe with rows and columns?

   Knowing this helps define how many levels of loops are needed and what each loop should handle.

2. Define the Goal

   Clearly outline what you are trying to achieve:

   - Are you iterating over all rows?
   - Accessing nested elements?
   - Performing operations across combinations of elements?

   This clarifies the order in which you need to process the data.

3. Start with the Outermost Layer

   Identify the highest-level structure that defines the primary iteration:

   - For a matrix, it’s typically the rows.
   - For a dictionary of lists, it’s usually the keys.

   The outer loop should iterate over this structure.

In [17]:
matrix = [[1, 2], [3, 4], [5, 6]]  # List of lists

for row in matrix:  # Outer loop for rows
    for element in row:  # Inner loop for elements in the row
        print(element)


1
2
3
4
5
6


4. Add Nested Loops for Inner Structures

   - Once the outer loop is defined, drill down into the inner structures step by step.
   - Each nested loop should handle the next level of granularity (e.g., columns, elements, or subkeys).

In [18]:
data = {'A': [1, 2], 'B': [3, 4], 'C': [5, 6]}  # Dictionary of lists

for key, values in data.items():  # Outer loop for keys
    for value in values:  # Inner loop for list values
        print(f"{key}: {value}")


A: 1
A: 2
B: 3
B: 4
C: 5
C: 6


5. Think in Terms of Dependencies

   Nested loops often reflect a hierarchy of dependencies:

   - Outer loop: handles broad tasks (e.g., iterating over rows).
   - Inner loop: handles finer-grained tasks (e.g., iterating over columns in a row).

In [19]:
list1 = [1, 2]
list2 = [3, 4]
for item1 in list1:  # Outer loop for list1
    for item2 in list2:  # Inner loop for list2
        print(f"Combination: {item1}, {item2}")


Combination: 1, 3
Combination: 1, 4
Combination: 2, 3
Combination: 2, 4
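The nested loop above produces the Cartesian product of the two lists. The standard library's itertools.product yields the same pairs in the same order, which can read more clearly when there are several lists to combine:

```python
from itertools import product

list1 = [1, 2]
list2 = [3, 4]

# product(list1, list2) varies the last list fastest, exactly like the nested loop
pairs = list(product(list1, list2))
for item1, item2 in pairs:
    print(f"Combination: {item1}, {item2}")
```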


6. Focus on the Order of Operations

   For more complex scenarios (e.g., matrix multiplication or Cartesian products), think about the logical order:

   - What needs to happen first?
   - What depends on the outer loop vs. the inner loop?

   Then make sure the nesting of your loops matches those dependencies.

In [21]:
A = [[1, 2, 3],
     [4, 5, 6]]

B = [[7, 8],
     [9, 10],
     [11, 12]]

In [28]:
# Create a fresh zero matrix in the same cell as the loop, so rerunning
# the cell doesn't add onto the values from a previous run
result = [[0 for _ in range(len(B[0]))] for _ in range(len(A))]

for i in range(len(A)):  # Rows of A
    for j in range(len(B[0])):  # Columns of B
        for k in range(len(A[0])):  # Common dimension (columns of A / rows of B)
            result[i][j] += A[i][k] * B[k][j]

result

[[58, 64], [139, 154]]
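Because the innermost loop adds into result with +=, result must start from zeros every time the loops run; rerunning the loops without re-creating result doubles every value. One way to guarantee a fresh start is to wrap the triple loop in a function. A sketch (matmul is a hypothetical helper name) that also checks the dimensions first:

```python
def matmul(A, B):
    """Multiply two matrices stored as lists of lists (rows of A times columns of B)."""
    if len(A[0]) != len(B):
        raise ValueError("number of columns of A must equal number of rows of B")
    # result is created inside the function, so every call starts from zeros
    result = [[0 for _ in range(len(B[0]))] for _ in range(len(A))]
    for i in range(len(A)):  # rows of A
        for j in range(len(B[0])):  # columns of B
            for k in range(len(A[0])):  # shared dimension
                result[i][j] += A[i][k] * B[k][j]
    return result


A = [[1, 2, 3], [4, 5, 6]]
B = [[7, 8], [9, 10], [11, 12]]
print(matmul(A, B))  # [[58, 64], [139, 154]]
print(matmul(A, B))  # a second call gives the same result
```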

7. Test Small Pieces

   - Start with just the outer loop and make sure it behaves as expected.
   - Gradually add inner loops, checking their output at each step.

In [29]:
# Iterate over rows
for row in matrix:
    print(row)


[1, 2]
[3, 4]
[5, 6]


In [30]:
# Iterate over elements in rows
for row in matrix:
    for element in row:
        print(element)


1
2
3
4
5
6


8. General Guideline for Nested Loops

   - Outer loop: the broadest layer or highest-level task.
   - Inner loop(s): narrower layers or more detailed tasks.

   Base each loop on the level of the data structure you’re working with or on the hierarchy of the problem.

**Checklist for Building Nested Loops**

✅ Understand your data structure and goal.

✅ Identify the outermost structure to iterate over.

✅ Add inner loops for finer levels of granularity.

✅ Ensure the loops align with logical dependencies.

✅ Test incrementally, starting with the outer loop.