# Advanced Python Techniques Exploration

I picked a project to apply and practice [advanced Python concepts](https://www.geeksforgeeks.org/top-10-advance-python-concepts-that-you-must-know/), namely:

- iterators
- lambda functions
- decorators
- collections
- list comprehensions
- generators, etc

![](advanced-python.png)

## <span style="color:red">This is a work-in-progress. It will be updated at regular intervals.</span>

---

## 1. Efficiently Loading Large Datasets in Python

Iterators and generator functions offer powerful techniques for handling large datasets in Python. This section demonstrates how to leverage these approaches to load data in chunks or line by line, improving memory efficiency and performance.


### a. Reading Data in Chunks with a Context Manager and Iterator

Here, I read the first 10,000 rows of `diamonds2.csv` one line at a time and count how often each diamond shape appears.

__Applied Concepts__: Context Managers, Iterators

In [14]:
# Open a connection to the file
with open("diamonds2.csv") as file:

    # Skip the column names
    file.readline()

    # Initialize an empty dictionary: counts_dict
    counts_dict = {}

    # Process only the first 10000 rows
    for j in range(10000):

        # Split the current line into a list: line
        line = file.readline().split(',')

        # Get the value for the `shape` column: shape (the shape of the diamond)
        shape = line[2]

        # If the column value is in the dict, increment its value
        if shape in counts_dict:
            counts_dict[shape] += 1

        # Else, add to the dict and set value to 1
        else:
            counts_dict[shape] = 1

# Print the resulting dictionary
print(counts_dict)

{'Princess': 7401, 'Round': 2599}
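
The same bounded read can be sketched with `itertools.islice`, which treats the file object as an iterator, and `collections.Counter`, which replaces the manual dictionary bookkeeping. This is a minimal sketch rather than the code that produced the output above: the helper name `count_shapes` and the in-memory `sample` text are hypothetical stand-ins so the example runs without `diamonds2.csv`. Only the position of the `shape` column (index 2) is taken from the cell above.

```python
import io
from collections import Counter
from itertools import islice


def count_shapes(file_object, max_rows):
    """Count values of the third column over at most max_rows data rows."""
    # Skip the header line; the default avoids StopIteration on an empty file
    next(file_object, None)
    # islice stops after max_rows lines or at end of file, whichever comes first
    rows = islice(file_object, max_rows)
    return Counter(line.rstrip("\n").split(",")[2] for line in rows)


# Hypothetical sample data standing in for diamonds2.csv
sample = io.StringIO(
    "id,price,shape,carat\n"
    "1,500,Round,0.3\n"
    "2,650,Princess,0.4\n"
    "3,700,Round,0.5\n"
)

shape_counts = count_shapes(sample, max_rows=2)
print(shape_counts)
```

Unlike a fixed `range(10000)` loop over `readline()`, this version simply stops early when the file has fewer rows than requested.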


### b. Reading the Whole File with a User-Defined Generator
__Applied Concepts__: Context Managers, Generators

In [15]:
# Define read_large_file()
def read_large_file(file_object):
    """A generator function to read a large file lazily."""

    # Loop indefinitely until the end of the file
    while True:

        # Read a line from the file: data
        data = file_object.readline()

        # Break if this is the end of the file
        if not data:
            break

        # Yield the line of data
        yield data
        


# Initialize an empty dictionary: counts_dict
counts_dict = {}

# Open a connection to the file
with open("diamonds2.csv") as file:

    # Skip the column names so the header is not counted as a shape
    file.readline()

    # Iterate over the generator from read_large_file()
    for line in read_large_file(file):

        # Split the current line and pick out the `shape` column
        row = line.split(',')
        shape = row[2]

        # Increment the count for this shape, starting from 0 if unseen
        counts_dict[shape] = counts_dict.get(shape, 0) + 1

# Print the resulting dictionary
print(counts_dict)

{'Princess': 7695, 'Round': 152801, 'Pear': 7177, 'Marquise': 1768, 'Emerald': 13734, 'Oval': 12940, 'Heart': 1298, 'Cushion': 7340, 'Radiant': 2394, 'Asscher': 1233}
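
Python file objects are already lazy line iterators, so a generator like `read_large_file` can also be written by looping over the file directly. The sketch below does that through `csv.DictReader`, which consumes the header row and lets the column be looked up by name. The `shape` column name is the one mentioned in part a; the helper `iter_column` and the sample text are hypothetical, used only so the example runs on its own.

```python
import csv
import io


def iter_column(file_object, column):
    """Lazily yield one column's value for every data row."""
    # DictReader consumes the header row and pulls rows only on demand
    for record in csv.DictReader(file_object):
        yield record[column]


# Hypothetical sample data standing in for diamonds2.csv
sample = io.StringIO(
    "id,price,shape\n"
    "1,500,Round\n"
    "2,650,Pear\n"
    "3,700,Round\n"
)

counts = {}
for shape in iter_column(sample, "shape"):
    counts[shape] = counts.get(shape, 0) + 1

print(counts)
```

Looking the column up by name avoids hard-coding index 2 and keeps the header out of the counts.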


## 2. List Comprehension

a. Given a paragraph as a string, write a function that returns the number of characters with odd frequencies. For example, the paragraph ``Hello world.`` has the frequency count {'l': 3, 'o': 2, 'H': 1, 'e': 1, ' ': 1, 'w': 1, 'r': 1, 'd': 1, '.': 1}, in which *8* characters occur an odd number of times, so the function should return *8*.

__Applied Concepts__: List comprehension, Collections

In [1]:
from collections import Counter

def oddFrequencyCounter(theParagraph):

    # Convert sentence to a list
    sentence_list = list(theParagraph)

    # Obtain the count of each element in the list
    count = Counter(sentence_list)
    print(count)

    # Convert counter object `count` to a dictionary
    counter = dict(count)

    # Use list comprehension to count the number of characters with odd frequencies
    odd_freq = len([i for i in counter.values() if i % 2 != 0])

    # return number of characters with odd frequencies
    return odd_freq

oddFrequencyCounter("Hello world.")


Counter({'l': 3, 'o': 2, 'H': 1, 'e': 1, ' ': 1, 'w': 1, 'r': 1, 'd': 1, '.': 1})


8
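
A more compact variant is sketched below, assuming only the count is needed and the `Counter` does not have to be printed. `Counter` accepts the string directly, and a generator expression inside `sum` avoids building the intermediate list. The name `count_odd_frequencies` is hypothetical.

```python
from collections import Counter


def count_odd_frequencies(text):
    """Return how many distinct characters occur an odd number of times."""
    # Counter accepts the string directly; no intermediate list is needed
    frequencies = Counter(text)
    # Each odd frequency contributes 1 to the total
    return sum(1 for freq in frequencies.values() if freq % 2)


print(count_odd_frequencies("Hello world."))  # 8
```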

## 3. Generator Tasks

a. Write a generator function `odd_squares_sum` that yields the squares of the odd numbers, e.g. $1^2, 3^2, 5^2, ...$, for every odd number up to ``limit``. Because it stops at ``limit``, the generator is finite rather than infinite.

In [23]:
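def odd_squares_sum(limit):
    # Start from the first odd number
    current_value = 1
    # Yield squares while the odd number itself does not exceed `limit`
    while current_value <= limit:
        yield current_value**2
        # Step to the next odd number
        current_value += 2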


In [25]:
for odd_num in odd_squares_sum(10):
    print(odd_num)

1
9
25
49
81
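
For comparison, a truly infinite version can be sketched with `itertools.count`, leaving it to the caller to decide where to stop. This is an illustration under that assumption, not the generator used in the rest of this section; the name `odd_squares` is hypothetical.

```python
from itertools import count, islice, takewhile


def odd_squares():
    """Yield 1**2, 3**2, 5**2, ... without ever stopping."""
    # count(1, 2) produces 1, 3, 5, ... indefinitely
    for odd in count(1, 2):
        yield odd**2


# Take the first five squares
first_five = list(islice(odd_squares(), 5))
print(first_five)  # [1, 9, 25, 49, 81]

# Or keep taking squares while they stay below a bound
below_200 = list(takewhile(lambda sq: sq < 200, odd_squares()))
print(below_200)  # [1, 9, 25, 49, 81, 121, 169]
```

Because the generator never ends, it must always be consumed through something that bounds it, such as `islice` or `takewhile`.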


ai. Using the `odd_squares_sum` generator defined above, collect the squares of the odd numbers up to a limit of $20$ and store the result in a `numpy.array` variable called `oddSumList`.

In [29]:
import numpy as np

oddSumList = []

# Iterate through the generator and append each square to the list
for odd_square in odd_squares_sum(20):
    oddSumList.append(odd_square)

# Convert the list to a numpy array
oddSumList = np.array(oddSumList)

# Print the resulting array
print(oddSumList)

[  1   9  25  49  81 121 169 225 289 361]
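
The generator yields individual squares, not sums. If running sums are what is wanted, as the name `odd_squares_sum` might suggest, one possible sketch wraps a squares generator in `itertools.accumulate`. The helper `odd_squares_up_to` is a hypothetical stand-in so the block runs on its own.

```python
from itertools import accumulate


def odd_squares_up_to(limit):
    """Yield the squares of the odd numbers 1, 3, 5, ... not exceeding limit."""
    for odd in range(1, limit + 1, 2):
        yield odd**2


# accumulate lazily yields 1, 1+9, 1+9+25, ...
running_sums = list(accumulate(odd_squares_up_to(10)))
print(running_sums)  # [1, 10, 35, 84, 165]
```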


aii. Compute the element-wise remainder of ``oddSumList`` when divided by $5$ and pair each remainder with its original value. The final output, stored in the variable `mergedList`, should be a list of tuples, e.g. ``[(1,1), (4,9), (0,25), ...]``.

In [31]:
elt_wise_remainder = oddSumList % 5

# Zip the lists together and convert to a list: mergedList
mergedList = list(zip(elt_wise_remainder, oddSumList))
mergedList

[(1, 1),
 (4, 9),
 (0, 25),
 (4, 49),
 (1, 81),
 (1, 121),
 (4, 169),
 (0, 225),
 (4, 289),
 (1, 361)]
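
The same pairs can be built without NumPy using a single list comprehension. This is a minimal sketch assuming plain Python integers are acceptable; the names `odd_squares` and `merged` are hypothetical.

```python
# Squares of the odd numbers 1, 3, ..., 19
odd_squares = [n**2 for n in range(1, 20, 2)]

# Pair each square with its remainder modulo 5
merged = [(sq % 5, sq) for sq in odd_squares]

print(merged[:3])  # [(1, 1), (4, 9), (0, 25)]
```

One difference from the NumPy version: the tuples here hold built-in `int` values, whereas zipping NumPy arrays produces NumPy scalar types.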

## 4. Lambda Functions

a. Write a function `get_3_nearest` that takes a point of interest ``pt`` and a **list** of points ``ptlist`` and returns a list of the 3 points nearest to ``pt``. Assume the distance between any two points is given by the `L1-norm` (the sum of the absolute coordinate differences).

__Applied Concepts__: list comprehension, lambda functions

In [32]:
def get_3_nearest(pt, ptlist):

    dict_dist = {}

    # Iterate through each point in the ptlist
    for point in ptlist:
        # Calculate the absolute coordinate-wise differences between pt and the current point
        abs_distance = np.abs(np.array(pt) - np.array(point))

        # Sum the differences over all dimensions to get the L1 distance
        total_distance = np.sum(abs_distance)
        
        # Store the total distance in the dictionary, where the key is the current point
        # and the value is the total distance from the reference point (pt) to that point
        dict_dist[point] = total_distance

    
    # Sort distances in ascending order based on values
    sorted_distances = sorted(dict_dist.items(), key=lambda item: item[1])

    # Extract the nearest 3 keys (points)
    nearest_points = [key for key, _ in sorted_distances[:3]]

    return nearest_points

In [33]:
get_3_nearest((5, 8), [(5, 9), (9, 1), (2, 4), (13, 9), (10, 12)])

[(5, 9), (2, 4), (13, 9)]

In [34]:
get_3_nearest((12, 8), [(5, 9), (9, 1), (2, 4), (13, 9), (10, 12)])

[(13, 9), (10, 12), (5, 9)]

In [35]:
get_3_nearest((3, 8), [(9, 3), (8, 5), (7, 6)])

[(7, 6), (8, 5), (9, 3)]
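
An alternative that needs neither NumPy nor an intermediate dictionary is sketched below, using `heapq.nsmallest` with a lambda as the key. The name `get_3_nearest_pure` is hypothetical, and the sketch assumes every point has the same number of coordinates as `pt`.

```python
from heapq import nsmallest


def get_3_nearest_pure(pt, ptlist):
    """Return the 3 points in ptlist closest to pt under the L1 norm."""
    # The lambda computes the L1 distance from pt to each candidate point
    return nsmallest(
        3,
        ptlist,
        key=lambda point: sum(abs(a - b) for a, b in zip(pt, point)),
    )


print(get_3_nearest_pure((12, 8), [(5, 9), (9, 1), (2, 4), (13, 9), (10, 12)]))
# [(13, 9), (10, 12), (5, 9)]
```

Because no dictionary keyed by point is involved, duplicate points in `ptlist` are kept, and the points do not need to be hashable.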

## 5. Decorators

__a. Decorators Without Arguments__

This section explores decorators that can be applied to functions without requiring any arguments themselves. I'll create a decorator that doubles the value of arguments passed to the decorated function.

In [51]:
# A decorator that doubles the value of the arguments passed to a function
def double_args(func):

    def wrapper(a, b):

        return func(a*2, b*2)

    return wrapper

In [68]:
@double_args
def multiply(a,b):

    """
        Compute the product of two numbers.
    """
    return a*b

# Call the function
multiply(1,5)

20
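
A slightly more general variant is sketched below, assuming the decorated function takes only positional arguments. It accepts any number of them and uses `functools.wraps` so the decorated function keeps its own name and docstring instead of reporting `wrapper`.

```python
from functools import wraps


def double_args(func):
    @wraps(func)  # keep func's name and docstring on the wrapper
    def wrapper(*args):
        # Double every positional argument before calling func
        return func(*(arg * 2 for arg in args))
    return wrapper


@double_args
def multiply(a, b):
    """Compute the product of two numbers."""
    return a * b


print(multiply(1, 5))     # 20
print(multiply.__name__)  # multiply
```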

__b. Decorators that take arguments__

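This section explores decorators that accept arguments of their own when they are applied. I'll build a `timer` decorator factory that measures the execution time of a decorated function and prints the result. Note that, as written, `timer` accepts arguments but does not use them, and the wrapper prints the decorated function's result instead of returning it.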

In [60]:
import time

# This function decorates another function to measure the time it takes to execute and prints the result.
def timer(*args, **kwargs):

    def decorator(func):

        def wrapper(*args, **kwargs):

            start = time.time()
            
            result = func(*args, **kwargs)
            
            total_time = round(time.time() - start,10)

            print(result)
            
            print(f"It took {func.__name__} {total_time} seconds to run")
            
        return wrapper

    return decorator


In [61]:
from functools import reduce

# A function that takes in a 2-d list, flattens it and sums the elements of the flattened list

@timer()
def sum_nested_list(superlist):

    flattened_list = [item for sublist in superlist for item in sublist]

    # Use a name other than `sum` to avoid shadowing the built-in
    total = reduce(lambda a, b: a+b, flattened_list)

    return total

sum_nested_list([[1,2],[3,4],[5,6,7,8]])

36
It took sum_nested_list 0.0 seconds to run


In [62]:
@timer()
def list_comprehension(n):

    """Calculates the square of each number from 1 to n using list comprehension."""
    result = [x**2 for x in range(1, n+1)]
    # Return None so the timer does not print the full list
    return None

In [63]:
import numpy as np

@timer()
def numpy_array(n):
    """Calculates the square of each number from 1 to n using NumPy arrays."""
    
    arr = np.arange(1, n+1)
    result = arr**2

In [64]:
# Set the number of elements for comparison
n = 500_000

# Print execution time for list comprehension
list_comprehension(n)

# Print execution time for NumPy array
numpy_array(n)

None
It took list_comprehension 0.3999843597 seconds to run
None
It took numpy_array 0.0029919147 seconds to run
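
A variant in which the decorator's argument actually changes its behaviour is sketched below. It is an illustration, not the `timer` used for the measurements above: the `precision` parameter is hypothetical, `time.perf_counter` replaces `time.time` because it is meant for measuring short intervals, and the wrapper returns the result instead of printing it.

```python
import time
from functools import wraps


def timer(precision=4):
    """Decorator factory; `precision` (hypothetical) sets the printed decimals."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            result = func(*args, **kwargs)
            elapsed = time.perf_counter() - start
            print(f"It took {func.__name__} {elapsed:.{precision}f} seconds to run")
            # Hand the result back so the decorated function stays usable
            return result
        return wrapper
    return decorator


@timer(precision=6)
def sum_nested_list(superlist):
    """Flatten a 2-d list and sum its elements."""
    return sum(item for sublist in superlist for item in sublist)


total = sum_nested_list([[1, 2], [3, 4], [5, 6, 7, 8]])
print(total)  # 36
```

Returning the result keeps the decorated function usable in further computations, and leaves it to the caller to decide whether to print it.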
