\[<< [Truth Value Testing](./03_truth_value_testing.ipynb) | [Index](./00_index.ipynb) | [Generators and Lazy Evaluation](./05_generators_and_lazy_evaluation.ipynb) >>\]

## Pythonic Loops and Comprehensions

### Pythonic Loop

#### Python's `for` loop works as a for-each loop

In [1]:
fruits = ["apple", "banana", "cherry", "date", "elderberry"]

In [2]:
# Good
for index in range(len(fruits)):
    print(fruits[index])

apple
banana
cherry
date
elderberry


In [3]:
# More Pythonic
for fruit in fruits:
    print(fruit)

apple
banana
cherry
date
elderberry


#### Use `enumerate` when you also need the index

In [4]:
for index, fruit in enumerate(fruits, start=1):
    print(index, fruit)

1 apple
2 banana
3 cherry
4 date
5 elderberry
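
As a small sketch of what `start` does (the `fruits` list is repeated so the cell runs on its own): `enumerate` counts from 0 by default, and `start` shifts only the counter, not which items are visited.

```python
fruits = ["apple", "banana", "cherry", "date", "elderberry"]

# Default: counting starts at 0, matching the list indices
zero_based = list(enumerate(fruits))

# start=1 shifts only the counter; every item is still visited
one_based = list(enumerate(fruits, start=1))

print(zero_based[0])  # (0, 'apple')
print(one_based[0])   # (1, 'apple')
```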


#### Use `zip()` to loop through multiple lists at once

In [5]:
fruits = ["apple", "banana", "cherry", "date", "elderberry"]
colors = ["red", "yellow", "red", "brown", "purple"]

In [6]:
# OK
n = min(len(fruits), len(colors))
for i in range(n):
    print(colors[i], fruits[i])

red apple
yellow banana
red cherry
brown date
purple elderberry


In [7]:
# More Pythonic
for color, fruit in zip(colors, fruits):
    print(color, fruit)

red apple
yellow banana
red cherry
brown date
purple elderberry
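
Like the `min(len(...))` version, `zip` stops at the shortest input. The sketch below uses deliberately mismatched lists (made up for illustration) to show that behaviour, with `itertools.zip_longest` as one option when the extra items should not be dropped silently.

```python
from itertools import zip_longest

fruits = ["apple", "banana", "cherry"]
colors = ["red", "yellow"]  # deliberately one item short

# zip stops as soon as the shortest input is exhausted
pairs = list(zip(colors, fruits))
print(pairs)  # [('red', 'apple'), ('yellow', 'banana')]

# zip_longest keeps going and pads the missing values
padded = list(zip_longest(colors, fruits, fillvalue="unknown"))
print(padded[-1])  # ('unknown', 'cherry')
```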


##### `zip` can also be used to transpose a matrix

In [8]:
# Original Matrix
# [
#     (1, 2, 3),
#     (4, 5, 6),
# ]
# Matrix Transpose
# [
#     (1, 4),
#     (2, 5),
#     (3, 6),
# ]

matrix = [(1, 2, 3), (4, 5, 6)]
matrix_transpose = list(zip(*matrix))

print(f"{matrix = }")
print(f"{matrix_transpose = }")

matrix = [(1, 2, 3), (4, 5, 6)]
matrix_transpose = [(1, 4), (2, 5), (3, 6)]
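
To make the `*` unpacking a little more explicit, here is a minimal sketch: `zip(*matrix)` passes each row as a separate argument, so the i-th output tuple collects the i-th item of every row. Transposing twice should give back the original rows.

```python
matrix = [(1, 2, 3), (4, 5, 6)]

# zip(*matrix) is the same call as zip((1, 2, 3), (4, 5, 6))
transposed = list(zip(*matrix))
print(transposed)  # [(1, 4), (2, 5), (3, 6)]

# Transposing the transpose restores the original rows
round_trip = list(zip(*transposed))
print(round_trip == matrix)  # True
```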


#### Use `reversed` to loop over the items in reverse order

In [9]:
# Good
for index in range(len(fruits) - 1, -1, -1):
    print(fruits[index])

elderberry
date
cherry
banana
apple


In [10]:
# More Pythonic
for fruit in reversed(fruits):
    print(fruit)

elderberry
date
cherry
banana
apple
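
A short sketch of what `reversed` returns, using a shortened list for illustration: it gives back an iterator over the existing list rather than a reversed copy, and the original list is left unchanged. Slicing with `[::-1]` is shown for comparison.

```python
fruits = ["apple", "banana", "cherry"]

backwards = reversed(fruits)  # an iterator; no new list is built
first = next(backwards)
print(first)  # cherry

# The original list is left untouched
print(fruits)  # ['apple', 'banana', 'cherry']

# Slicing also reverses, but builds a new list up front
sliced = fruits[::-1]
print(sliced)  # ['cherry', 'banana', 'apple']
```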


#### Use `sorted()` with a `key` to loop over a list in a custom order

In [11]:
for fruit in sorted(fruits, key=len):
    print(fruit)

date
apple
banana
cherry
elderberry
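
`sorted` is stable, so items with equal keys keep their original relative order. The sketch below uses a made-up `words` list in which the two six-letter words start out of alphabetical order, and shows a tuple key as one way to add a tie-breaker.

```python
words = ["cherry", "banana", "date", "apple"]

# Equal lengths keep their input order: "cherry" stays ahead of "banana"
by_length = sorted(words, key=len)
print(by_length)  # ['date', 'apple', 'cherry', 'banana']

# A tuple key sorts by length first, then alphabetically
by_length_then_name = sorted(words, key=lambda word: (len(word), word))
print(by_length_then_name)  # ['date', 'apple', 'banana', 'cherry']
```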


#### Read until certain character

In [12]:
# Good
def read_until_character(filename, character):
    with open(filename, "r") as file:
        content = ""
        while True:
            char = file.read(1)
            # read(1) returns "" at end of file; stop there too
            if not char or char == character:
                break
            content += char
    return content


content = read_until_character("static/serial-out.txt", "#")
print(content)

CR+PENDING();CR+PENDING();CR+PENDING();


In [13]:
# Pythonic
from functools import partial


def read_until_character(filename, character):
    with open(filename, "r") as file:
        content = ""
        for char in iter(partial(file.read, 1), character):
            content += char
    return content


content = read_until_character("static/serial-out.txt", "#")
print(content)

CR+PENDING();CR+PENDING();CR+PENDING();
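
The two-argument `iter(callable, sentinel)` form stops only when the callable returns the sentinel. If the character never occurs, `read(1)` keeps returning `""` at end of file and the loop never ends. One possible variant, sketched below with `io.StringIO` standing in for the file and a function that takes an open stream (both are assumptions for illustration), uses `""` as the sentinel and `itertools.takewhile` to stop at the character.

```python
import io
from functools import partial
from itertools import takewhile


def read_until_character(stream, character):
    # iter(callable, "") ends at end of input, because read(1) returns ""
    chars = iter(partial(stream.read, 1), "")
    # takewhile stops at the first occurrence of `character`
    return "".join(takewhile(lambda char: char != character, chars))


found = read_until_character(io.StringIO("abc#def"), "#")
missing = read_until_character(io.StringIO("abc"), "#")
print(found)    # abc
print(missing)  # abc
```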


#### Discard dummy variables

In [14]:
# OK
for i in range(4):
    print("Doing something")

Doing something
Doing something
Doing something
Doing something


In [15]:
# Pythonic
for _ in range(4):
    print("Doing something")

Doing something
Doing something
Doing something
Doing something
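
The same convention works when unpacking. As a small sketch (the `record` tuple is a made-up example shaped like an `/etc/passwd` entry), `*_` collects the fields we do not care about:

```python
record = ("root", "x", 0, 0, "root", "/root", "/bin/bash")

# Keep the first and last fields; discard everything in between
user, *_, shell = record
print(user, shell)  # root /bin/bash
```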


### Comprehension

For the examples below, we will use a copy of the Linux **/etc/passwd** file, which contains one colon-separated record per user.

In [16]:
!head static/linux-etc-passwd

# demo content from /etc/passwd file
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
bin:x:2:2:bin:/bin:/usr/sbin/nologin
sys:x:3:3:sys:/dev:/usr/sbin/nologin
sync:x:4:65534:sync:/bin:/bin/sync
games:x:5:60:games:/usr/games:/usr/sbin/nologin
man:x:6:12:man:/var/cache/man:/usr/sbin/nologin
lp:x:7:7:lp:/var/spool/lpd:/usr/sbin/nologin
mail:x:8:8:mail:/var/mail:/usr/sbin/nologin


#### Get all the users

In [17]:
# OK
users = []

with open("static/linux-etc-passwd") as f:
    for line in f:
        if ":" in line:
            user = line.split(":")[0]
            users.append(user)

print(users)

['root', 'daemon', 'bin', 'sys', 'sync', 'games', 'man', 'lp', 'mail', 'news', 'uucp', 'proxy', 'www-data', 'backup', 'list', 'irc', 'gnats', 'nobody', 'systemd-network', 'systemd-resolve', 'messagebus', 'systemd-timesync', 'syslog', '_apt', 'uuidd', 'tcpdump', 'debakarr']


In [18]:
# More Pythonic
# `with` closes the file once the comprehension has finished
with open("static/linux-etc-passwd") as f:
    users = [line.split(":")[0] for line in f if ":" in line]
print(users)

['root', 'daemon', 'bin', 'sys', 'sync', 'games', 'man', 'lp', 'mail', 'news', 'uucp', 'proxy', 'www-data', 'backup', 'list', 'irc', 'gnats', 'nobody', 'systemd-network', 'systemd-resolve', 'messagebus', 'systemd-timesync', 'syslog', '_apt', 'uuidd', 'tcpdump', 'debakarr']


#### Get unique shells

In [19]:
# OK
shells = set()

with open("static/linux-etc-passwd") as f:
    for line in f:
        if ":" in line:
            shell = line.split(":")[-1].strip()
            shells.add(shell)

print(shells)

{'/bin/bash', '/usr/sbin/nologin', '/bin/sync'}


In [20]:
# More Pythonic
# `with` closes the file once the comprehension has finished
with open("static/linux-etc-passwd") as f:
    shells = {line.split(":")[-1].strip() for line in f if ":" in line}
print(shells)

{'/bin/bash', '/usr/sbin/nologin', '/bin/sync'}


#### Get the user-to-shell mapping

In [21]:
# OK
user_shell = {}

with open("static/linux-etc-passwd") as f:
    for line in f:
        if ":" in line:
            user, shell = line.split(":")[0], line.split(":")[-1].strip()
            user_shell[user] = shell

In [22]:
print(user_shell)

{'root': '/bin/bash', 'daemon': '/usr/sbin/nologin', 'bin': '/usr/sbin/nologin', 'sys': '/usr/sbin/nologin', 'sync': '/bin/sync', 'games': '/usr/sbin/nologin', 'man': '/usr/sbin/nologin', 'lp': '/usr/sbin/nologin', 'mail': '/usr/sbin/nologin', 'news': '/usr/sbin/nologin', 'uucp': '/usr/sbin/nologin', 'proxy': '/usr/sbin/nologin', 'www-data': '/usr/sbin/nologin', 'backup': '/usr/sbin/nologin', 'list': '/usr/sbin/nologin', 'irc': '/usr/sbin/nologin', 'gnats': '/usr/sbin/nologin', 'nobody': '/usr/sbin/nologin', 'systemd-network': '/usr/sbin/nologin', 'systemd-resolve': '/usr/sbin/nologin', 'messagebus': '/usr/sbin/nologin', 'systemd-timesync': '/usr/sbin/nologin', 'syslog': '/usr/sbin/nologin', '_apt': '/usr/sbin/nologin', 'uuidd': '/usr/sbin/nologin', 'tcpdump': '/usr/sbin/nologin', 'debakarr': '/bin/bash'}


In [23]:
# Pythonic
# `with` closes the file once the comprehension has finished
with open("static/linux-etc-passwd") as f:
    user_shell = {
        line.split(":")[0]: line.split(":")[-1].strip()
        for line in f
        if ":" in line
    }

In [24]:
print(user_shell)

{'root': '/bin/bash', 'daemon': '/usr/sbin/nologin', 'bin': '/usr/sbin/nologin', 'sys': '/usr/sbin/nologin', 'sync': '/bin/sync', 'games': '/usr/sbin/nologin', 'man': '/usr/sbin/nologin', 'lp': '/usr/sbin/nologin', 'mail': '/usr/sbin/nologin', 'news': '/usr/sbin/nologin', 'uucp': '/usr/sbin/nologin', 'proxy': '/usr/sbin/nologin', 'www-data': '/usr/sbin/nologin', 'backup': '/usr/sbin/nologin', 'list': '/usr/sbin/nologin', 'irc': '/usr/sbin/nologin', 'gnats': '/usr/sbin/nologin', 'nobody': '/usr/sbin/nologin', 'systemd-network': '/usr/sbin/nologin', 'systemd-resolve': '/usr/sbin/nologin', 'messagebus': '/usr/sbin/nologin', 'systemd-timesync': '/usr/sbin/nologin', 'syslog': '/usr/sbin/nologin', '_apt': '/usr/sbin/nologin', 'uuidd': '/usr/sbin/nologin', 'tcpdump': '/usr/sbin/nologin', 'debakarr': '/bin/bash'}


In [25]:
# A bit more readable
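# `with` closes the file once the comprehension has finished
with open("static/linux-etc-passwd") as f:
    user_shell = {
        fields[0]: fields[-1].strip()
        for line in f
        # split() returns a non-empty list, so the walrus expression is truthy
        if ":" in line and (fields := line.split(":"))
    }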
print(user_shell)

{'root': '/bin/bash', 'daemon': '/usr/sbin/nologin', 'bin': '/usr/sbin/nologin', 'sys': '/usr/sbin/nologin', 'sync': '/bin/sync', 'games': '/usr/sbin/nologin', 'man': '/usr/sbin/nologin', 'lp': '/usr/sbin/nologin', 'mail': '/usr/sbin/nologin', 'news': '/usr/sbin/nologin', 'uucp': '/usr/sbin/nologin', 'proxy': '/usr/sbin/nologin', 'www-data': '/usr/sbin/nologin', 'backup': '/usr/sbin/nologin', 'list': '/usr/sbin/nologin', 'irc': '/usr/sbin/nologin', 'gnats': '/usr/sbin/nologin', 'nobody': '/usr/sbin/nologin', 'systemd-network': '/usr/sbin/nologin', 'systemd-resolve': '/usr/sbin/nologin', 'messagebus': '/usr/sbin/nologin', 'systemd-timesync': '/usr/sbin/nologin', 'syslog': '/usr/sbin/nologin', '_apt': '/usr/sbin/nologin', 'uuidd': '/usr/sbin/nologin', 'tcpdump': '/usr/sbin/nologin', 'debakarr': '/bin/bash'}
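
If the assignment expression feels too clever, one alternative is to split each line once in an inner generator expression and build the dictionary from the resulting field lists. This is only a sketch; the in-memory `lines` list is an assumption that stands in for the file so the cell runs on its own.

```python
lines = [
    "# demo content from /etc/passwd file\n",
    "root:x:0:0:root:/root:/bin/bash\n",
    "sync:x:4:65534:sync:/bin:/bin/sync\n",
]

user_shell = {
    fields[0]: fields[-1].strip()
    # The inner generator splits each matching line exactly once
    for fields in (line.split(":") for line in lines if ":" in line)
}
print(user_shell)  # {'root': '/bin/bash', 'sync': '/bin/sync'}
```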


For the next few examples, we will use data from a **movie dataset**.

In [26]:
!head static/titles.csv

id,title,type,description,release_year,age_certification,runtime,genres,production_countries,seasons,imdb_id,imdb_score,imdb_votes,tmdb_popularity,tmdb_score
ts300399,Five Came Back: The Reference Films,SHOW,"This collection includes 12 World War II-era propaganda films — many of which are graphic and offensive — discussed in the docuseries ""Five Came Back.""",1945,TV-MA,51,['documentation'],['US'],1.0,,,,0.601,
tm82169,Rocky,MOVIE,"When world heavyweight boxing champion, Apollo Creed wants to give an unknown fighter a shot at the title as a publicity stunt, his handlers choose palooka Rocky Balboa, an uneducated collector for a Philadelphia loan shark. Rocky teams up with trainer  Mickey Goldmill to make the most of this once in a lifetime break.",1976,PG,119,"['drama', 'sport']",['US'],,tt0075148,8.1,588100.0,106.361,7.782
tm17823,Grease,MOVIE,"Australian good girl Sandy and greaser Danny fell in love over the summer. But when they unexpectedly discover they're now in the same h

##### Find the top 5 most common genres for all movies released in 2022

In [27]:
# Not Pythonic. This code was generated using GPT-4-32K

# Read the CSV file and store the data in a list of dictionaries
with open("static/titles.csv", "r", encoding="utf-8") as file:
    lines = file.readlines()
    header = lines[0].strip().split(",")
    data = [dict(zip(header, line.strip().split(","))) for line in lines[1:]]

# Filter the data for movies released in 2022
movies_2022 = [movie for movie in data if movie.get("release_year") == "2022"]

# Count the occurrences of each genre
genre_counts = {}
for movie in movies_2022:
    genres = movie.get("genres", "").strip("[]").replace("'", "").split(", ")
    for genre in genres:
        genre = genre.strip()
        genre_counts[genre] = genre_counts.get(genre, 0) + 1

# Get the top 5 genres
top_5_genres = sorted(genre_counts.items(), key=lambda x: x[1], reverse=True)[:5]

print(top_5_genres)

[('"[drama', 47), ('"[comedy', 34), ('documentation', 16), ('reality', 14), ('comedy', 11)]


**Steps toward more Pythonic code**

- Use `csv.DictReader` to read the csv file.

In [28]:
import csv


with open("static/titles.csv", newline="", encoding="utf-8") as csvfile:
    reader = csv.DictReader(csvfile)
    for index, row in enumerate(reader):
        if index == 5:
            break
        print(row)

{'id': 'ts300399', 'title': 'Five Came Back: The Reference Films', 'type': 'SHOW', 'description': 'This collection includes 12 World War II-era propaganda films — many of which are graphic and offensive — discussed in the docuseries "Five Came Back."', 'release_year': '1945', 'age_certification': 'TV-MA', 'runtime': '51', 'genres': "['documentation']", 'production_countries': "['US']", 'seasons': '1.0', 'imdb_id': '', 'imdb_score': '', 'imdb_votes': '', 'tmdb_popularity': '0.601', 'tmdb_score': ''}
{'id': 'tm82169', 'title': 'Rocky', 'type': 'MOVIE', 'description': 'When world heavyweight boxing champion, Apollo Creed wants to give an unknown fighter a shot at the title as a publicity stunt, his handlers choose palooka Rocky Balboa, an uneducated collector for a Philadelphia loan shark. Rocky teams up with trainer  Mickey Goldmill to make the most of this once in a lifetime break.', 'release_year': '1976', 'age_certification': 'PG', 'runtime': '119', 'genres': "['drama', 'sport']", 'pr
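
The reason the hand-rolled `split(",")` version produced keys such as `'"[drama'` is that quoted fields can themselves contain commas. A small sketch with a made-up two-column CSV held in memory (via `io.StringIO`) shows the difference:

```python
import csv
import io

# A tiny in-memory CSV; the quoted genres field contains a comma
sample = """title,genres
Rocky,"['drama', 'sport']"
"""

# Naive splitting cuts the quoted field in two
naive = sample.splitlines()[1].split(",")
print(len(naive))  # 3 pieces instead of 2 fields

# csv.DictReader honors the quotes and maps header names to values
row = next(csv.DictReader(io.StringIO(sample)))
print(row["genres"])  # ['drama', 'sport']
```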

- Use `ast.literal_eval` to evaluate the list from string.

In [29]:
genre = "['drama', 'sport']"
type(genre)

str

In [30]:
import ast


genre = ast.literal_eval("['drama', 'sport']")
print(genre)
type(genre)

['drama', 'sport']


list
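
Unlike `eval`, `ast.literal_eval` only accepts Python literals (strings, numbers, tuples, lists, dicts, sets, booleans, `None`), which makes it a safer fit for data read from a file. A minimal sketch, with the second input string made up to show a rejected expression:

```python
import ast

genres = ast.literal_eval("['drama', 'sport']")
print(genres)  # ['drama', 'sport']

# Anything that is not a plain literal, such as a function call, is rejected
try:
    ast.literal_eval("__import__('os').getcwd()")
    rejected = False
except ValueError:
    rejected = True
print(rejected)  # True
```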

- Use `Counter` with a list comprehension to find the most common genres.

In [31]:
from collections import Counter


with open("static/titles.csv", newline="", encoding="utf-8") as csvfile:
    reader = csv.DictReader(csvfile)
    c = Counter(
        [
            one_genre
            for row in reader
            # Filter rows first so that only matching rows are parsed
            if row["release_year"] == "2022" and row["type"] == "MOVIE"
            for one_genre in ast.literal_eval(row["genres"])
        ]
    )
c.most_common(5)

[('drama', 256),
 ('comedy', 250),
 ('thriller', 127),
 ('documentation', 114),
 ('romance', 107)]
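
For a sense of what `Counter` does on its own, here is a minimal sketch with made-up genre lists. It accepts any iterable, so a generator expression works as well as a list, and `most_common(n)` returns the `n` highest counts as `(item, count)` pairs.

```python
from collections import Counter

# Hypothetical genre lists, one per title
genres_per_title = [["drama", "sport"], ["drama"], ["comedy", "drama"]]

# Flatten the nested lists and count each genre
counts = Counter(genre for genres in genres_per_title for genre in genres)
print(counts["drama"])        # 3
print(counts.most_common(1))  # [('drama', 3)]
```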

[![](https://img.youtube.com/vi/qMv1ZD2V1A4/0.jpg)](https://youtu.be/qMv1ZD2V1A4)
