<a href="https://colab.research.google.com/github/ShaunakSen/problem-solving-with-code/blob/master/DSA_in_Python_Fundamentals.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Data Structures and Algorithms - Improving concepts

> Notes, code, and solutions from multiple resources to improve DSA fundamentals

- Data Structures and Algorithms by Michael T Goodrich: https://www.amazon.in/Structures-Algorithms-Python-Michael-Goodrich/dp/1118290275

- https://realpython.com/introduction-to-python-generators/

---

## Some advanced Python concepts

### Iterators and Generators

An instance of a list is an iterable, but not itself an iterator.
With data = [1, 2, 4, 8], it is not legal to call next(data). However, an iterator object can be produced with the syntax i = iter(data), and then each subsequent call to next(i) will return an element of that list. The for-loop syntax in Python simply automates this process, creating an iterator for the given iterable and then repeatedly calling for the next element until catching the StopIteration exception.
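
The steps described above can be sketched as follows. This is only an approximation of what a for loop does internally; the names `first`, `second`, and `collected` are introduced here for illustration.

```python
data = [1, 2, 4, 8]

# A list is iterable but is not itself an iterator, so next(data) fails.
try:
    next(data)
except TypeError as exc:
    print(f"next(data) raised TypeError: {exc}")

# iter() produces a list iterator that hands out one element per next() call.
i = iter(data)
first = next(i)   # 1
second = next(i)  # 2

# Roughly what a for loop does behind the scenes.
collected = []
it = iter(data)
while True:
    try:
        item = next(it)
    except StopIteration:
        break  # the iterator is exhausted, so the loop ends
    collected.append(item)

print(first, second, collected)
```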

More generally, it is possible to create multiple iterators based upon the same iterable object, with each iterator maintaining its own state of progress. However, iterators typically maintain their state with indirect reference back to the original collection of elements. For example, calling iter(data) on a list instance produces an instance of the list iterator class. That iterator does not store its own copy of the list of elements. Instead, it maintains a current index into the original list, representing the next element to be reported. Therefore, if the contents of the original list are modified after the iterator is constructed, but before the iteration is complete, the iterator will be reporting the updated contents of the list.

Python also supports functions and classes that produce an implicit iterable series of values, that is, without constructing a data structure to store all of its values at once. For example, the call range(1000000) does not return a list of numbers; it returns a range object that is iterable. This object generates the million values one at a time, and only as needed. Such a lazy evaluation technique has great advantage. In the case of range, it allows a loop of the form, for j in range(1000000):, to execute without setting aside memory for storing one million values. Also, if such a loop were to be interrupted in some fashion, no time will have been spent computing unused values of the range.
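
A small sketch may make both points concrete: a list iterator reads the list's current contents, and a range object does not grow with the number of values it represents. The variable names are illustrative, and the sizes reported by sys.getsizeof() depend on the Python build, so only their equality is relied on here.

```python
import sys

data = [1, 2, 4, 8]
it = iter(data)
first = next(it)       # consumes index 0

# Modify the list while the iteration is still in progress.
data[1] = 99           # overwrite an element not yet reported
data.append(16)        # grow the list

rest = list(it)        # the iterator reads the list's current contents
print(first, rest)     # 1 [99, 4, 8, 16]

# A range object stores only start, stop, and step rather than every value.
small = range(10)
large = range(1_000_000)
print(sys.getsizeof(small), sys.getsizeof(large))
```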


A generator is implemented with a syntax that
is very similar to a function, but instead of returning values, a yield statement is
executed to indicate each element of the series. As an example, consider the goal
of determining all factors of a positive integer. For example, the number 100 has
factors 1, 2, 4, 5, 10, 20, 25, 50, 100. A traditional function might produce and
return a list containing all factors, implemented as:

In [None]:
def factors(n):
    results = []
    for k in range(1, n+1):
        if n%k == 0:
            results.append(k)

    return results

In [None]:
def factors(n):
    # generator version: yield each factor instead of collecting a list
    for k in range(1, n+1):
        if n%k == 0:
            yield k

In [None]:
next(factors(200))

1

Notice the use of the keyword yield rather than return to indicate a result. This indicates to Python that we are defining a generator, rather than a traditional function.

If a programmer writes a loop such as for factor in factors(100):, an instance of our generator is created. For each iteration of the loop, Python executes our procedure until a yield statement produces the next value. At that point the procedure is suspended, and it resumes only when another value is requested.
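
The interleaving between the loop and the generator body can be observed with a small sketch. The `events` list is a hypothetical helper added here purely to record the order of execution.

```python
events = []  # records the order in which things happen

def factors(n):
    for k in range(1, n + 1):
        if n % k == 0:
            events.append(f"generator yields {k}")
            yield k  # execution pauses here until the next value is requested

for factor in factors(6):
    events.append(f"loop receives {factor}")

for line in events:
    print(line)
```

The recorded events alternate between the generator and the loop, which suggests that the generator body only runs far enough to produce one value at a time.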

In [None]:
def factors(n):
    k=1
    while k*k < n: ## while k < sqrt(n)
        if n%k == 0:
            yield k ## k is a factor
            yield n//k ## so is n/k
        k+=1
    if k*k == n: ##  special case if n is perfect square
        yield k

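This version is more efficient than the first because it only tests values of k up to the square root of n, yielding both k and n//k for each divisor it finds. We should note, however, that the factors are no longer generated in strictly increasing order. For example, factors(100) generates the series 1, 100, 2, 50, 4, 25, 5, 20, 10.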
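
If the increasing order matters, one option (a sketch, not part of the original text) is to pass the generator to sorted(), at the cost of materializing all factors in a list:

```python
def factors(n):
    k = 1
    while k * k < n:          # only test k below sqrt(n)
        if n % k == 0:
            yield k           # k is a factor
            yield n // k      # and so is its partner n // k
        k += 1
    if k * k == n:            # perfect square: sqrt(n) is reported once
        yield k

unordered = list(factors(100))
ordered = sorted(factors(100))
print(unordered)  # [1, 100, 2, 50, 4, 25, 5, 20, 10]
print(ordered)    # [1, 2, 4, 5, 10, 20, 25, 50, 100]
```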

### How to Use Generators and yield in Python

> By Kyle Stratis: https://realpython.com/introduction-to-python-generators/


Generator functions are a special kind of function that return a lazy iterator. These are objects that you can loop over like a list. However, unlike lists, lazy iterators do not store their contents in memory.


#### Example 1: Reading Large Files


What if you want to count the number of rows in a CSV file? The code block below shows one way of counting those rows:

```python

csv_gen = csv_reader("some_csv.txt")
row_count = 0

for row in csv_gen:
    row_count += 1

print(f"Row count is {row_count}")
```

Looking at this example, you might expect csv_gen to be a list. To populate this list, csv_reader() opens a file and loads its contents into csv_gen. Then, the program iterates over the list and increments row_count for each row.

This is a reasonable explanation, but would this design still work if the file is very large? What if the file is larger than the memory you have available? To answer this question, let’s assume that csv_reader() just opens the file and reads it into an array:

```python
def csv_reader(file_name):
    file = open(file_name)
    result = file.read().split("\n")
    return result
```

This function opens a given file and uses file.read() along with .split() to add each line as a separate element to a list. If you were to use this version of csv_reader() in the row counting code block you saw further up, then you’d get the following output:

```
Traceback (most recent call last):
  File "ex1_naive.py", line 22, in <module>
    main()
  File "ex1_naive.py", line 13, in main
    csv_gen = csv_reader("file.txt")
  File "ex1_naive.py", line 6, in csv_reader
    result = file.read().split("\n")
MemoryError
```

In this case, open() returns a file object, which is a lazy iterator that you can step through line by line. However, file.read().split() loads everything into memory at once, causing the MemoryError.

Before that happens, you’ll probably notice your computer slow to a crawl. You might even need to kill the program with a KeyboardInterrupt. So, how can you handle these huge data files? Take a look at a new definition of csv_reader():

```python
def csv_reader(file_name):
    for row in open(file_name, "r"):
        yield row
```
In this version, you open the file, iterate through it, and yield a row. This code should produce the following output, with no memory errors:

```
Row count is 64186394
```

What’s happening here? Well, you’ve essentially turned csv_reader() into a generator function. This version opens a file, loops through each line, and 
yields each row, instead of returning it.
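
As a runnable sketch of the same idea, the variant below wraps the file in a with block so that it is closed once iteration finishes. The temporary file and its contents are made up here so that the example runs anywhere; they are not part of the original article.

```python
import os
import tempfile

def csv_reader(file_name):
    # The with block closes the file once the generator is exhausted or closed.
    with open(file_name, "r") as file:
        for row in file:
            yield row

# Build a small throwaway file (hypothetical data) to count rows in.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write("id,name\n1,a\n2,b\n")
    path = tmp.name

row_count = 0
for row in csv_reader(path):
    row_count += 1

os.remove(path)
print(f"Row count is {row_count}")  # Row count is 3
```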

You can also define a generator expression (also called a generator comprehension), which has a very similar syntax to list comprehensions. In this way, you can use the generator without calling a function:

NOTE: here we use `()` instead of the `[]` used in a list comprehension

```python 
csv_gen = (row for row in open(file_name))
```
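
A minimal sketch of counting rows with a generator expression follows. Here io.StringIO stands in for an open file (an assumption made so the example is self-contained); like a real file object, it is iterated one line at a time.

```python
import io

# io.StringIO stands in for an open file; it is also iterated line by line.
fake_file = io.StringIO("id,name\n1,a\n2,b\n")

csv_gen = (row for row in fake_file)    # nothing is read yet
row_count = sum(1 for _ in csv_gen)     # rows are pulled one at a time
print(row_count)  # 3
```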


#### Example 2: Generating an Infinite Sequence

Let’s switch gears and look at infinite sequence generation. In Python, to get a finite sequence, you call range() and evaluate it in a list context:

```python
>>> a = range(5)
>>> list(a)
[0, 1, 2, 3, 4]
```

Generating an infinite sequence, however, will require the use of a generator, since your computer memory is finite:



In [1]:
def infinite_sequence():
    num = 0
    while True:
        yield num
        num += 1  ### this statement is executed after yield, unlike return

This code block is short and sweet. First, you initialize the variable num and start an infinite loop. Then, you immediately yield num so that you can capture the initial state. This mimics the action of range().

After yield, you increment num by 1. If you try this with a for loop, then you’ll see that it really does seem infinite:

In [None]:
for i in infinite_sequence():
    print (i, end=' ')

The program will continue to execute until you stop it manually.

Instead of using a for loop, you can also call next() on the generator object directly. This is especially useful for testing a generator in the console:

In [3]:
gen = infinite_sequence()

next(gen)

0

In [5]:
next(gen)

1

Here, you have a generator called gen, which you manually iterate over by repeatedly calling next(). This works as a great sanity check to make sure your generators are producing the output you expect.
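
Another way to sanity-check an infinite generator without looping forever is itertools.islice() from the standard library, which takes only the first few values. This is a sketch that goes beyond the original article.

```python
from itertools import islice

def infinite_sequence():
    num = 0
    while True:
        yield num
        num += 1

# islice stops after five values, so the infinite loop is never exhausted.
first_five = list(islice(infinite_sequence(), 5))
print(first_five)  # [0, 1, 2, 3, 4]
```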

#### Example 3: Detecting Palindromes

You can use infinite sequences in many ways, but one practical use for them is in building palindrome detectors. A palindrome detector will locate all sequences of letters or numbers that are palindromes. These are words or numbers that are read the same forward and backward, like 121. First, define your numeric palindrome detector:



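The detector reverses the digits of the number and compares the result with the original. As a quick trace for num = 121: temp starts at 121 and the reversed number at 0; after two passes of the loop the reversed number is 12 and temp is 1; after the third pass the reversed number is 121 and temp is 0, so the loop ends.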

In [6]:
def is_palindrome(num):
    # Skip single-digit inputs
    if num//10 == 0:
        return False

    temp = num
    reversed_num = 0

    ### calculate the reverse of num
    while temp!=0:
        reversed_num = (reversed_num * 10) + (temp % 10)
        temp = temp // 10

    if num == reversed_num:
        return num
    else:
        return False

In [None]:
for i in infinite_sequence():
    pal = is_palindrome(i)
    if pal:
        print (pal)

In this case, the only numbers that are printed to the console are those that read the same forward and backward. Because the sequence is infinite, the loop keeps running until you stop it manually.
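
To collect a bounded number of palindromes instead, the pieces can be combined with a generator expression and itertools.islice(). In this sketch, is_palindrome() is a simplified string-based stand-in for the arithmetic version above, not the article's implementation.

```python
from itertools import islice

def infinite_sequence():
    num = 0
    while True:
        yield num
        num += 1

def is_palindrome(num):
    # String-based check, a simpler stand-in for the arithmetic version.
    if num < 10:
        return False  # skip single-digit inputs, as the original does
    text = str(num)
    return text == text[::-1]

# Lazily filter the infinite sequence, then take only the first ten matches.
palindromes = (n for n in infinite_sequence() if is_palindrome(n))
first_ten = list(islice(palindromes, 10))
print(first_ten)  # [11, 22, 33, 44, 55, 66, 77, 88, 99, 101]
```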

Now that you’ve seen a simple use case for an infinite sequence generator, let’s dive deeper into how generators work.

### Understanding Generators

So far, you’ve learned about the two primary ways of creating generators: generator functions (ordinary function definitions that contain yield) and generator expressions (comprehension syntax wrapped in `()` rather than `[]`). You might even have an intuitive understanding of how generators work. Let’s take a moment to make that knowledge a little more explicit.


Generator functions look and act just like regular functions, but with one defining characteristic. Generator functions use the Python yield keyword instead of return. Recall the generator function you wrote earlier:

```python
def infinite_sequence():
    num = 0
    while True:
        yield num
        num += 1
```

This looks like a typical function definition, except for the Python yield statement and the code that follows it. yield indicates where a value is sent back to the caller, but unlike return, you don’t exit the function afterward.

Instead, the state of the function is remembered. That way, when next() is called on a generator object (either explicitly or implicitly within a for loop), the previously yielded variable num is incremented, and then yielded again. Since generator functions look like other functions and act very similarly to them, you can assume that generator expressions are very similar to other comprehensions available in Python.
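
The remembered state can be inspected with the standard library's inspect module. The sketch below uses inspect.getgeneratorstate() and inspect.getgeneratorlocals(); the variable names are illustrative.

```python
import inspect

def infinite_sequence():
    num = 0
    while True:
        yield num
        num += 1

gen = infinite_sequence()
state_before = inspect.getgeneratorstate(gen)  # body has not started running
value_a = next(gen)                            # runs up to the first yield
state_paused = inspect.getgeneratorstate(gen)  # suspended at the yield
value_b = next(gen)                            # resumes, increments num, yields again
local_num = inspect.getgeneratorlocals(gen)["num"]  # num survives between calls
gen.close()
state_closed = inspect.getgeneratorstate(gen)

print(state_before, value_a, state_paused, value_b, local_num, state_closed)
```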

#### Building Generators With Generator Expressions

Like list comprehensions, generator expressions allow you to quickly create a generator object in just a few lines of code. They’re also useful in the same cases where list comprehensions are used, with an added benefit: you can create them without building and holding the entire object in memory before iteration. In other words, you’ll have no memory penalty when you use generator expressions. Take this example of squaring some numbers:



In [8]:
nums_squared_lc = [num**2 for num in range(5)]
nums_squared_lc

[0, 1, 4, 9, 16]

In [10]:
nums_squared_gc = (num**2 for num in range(5))
nums_squared_gc

<generator object <genexpr> at 0x7f1b12c79050>

The first object used brackets to build a list, while the second created a generator expression by using parentheses. The output confirms that you’ve created a generator object and that it is distinct from a list.
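
One practical difference worth noting is that a generator can be consumed only once, whereas a list can be iterated repeatedly. A short sketch:

```python
nums_squared_gc = (num**2 for num in range(5))

first_pass = list(nums_squared_gc)   # consumes every value
second_pass = list(nums_squared_gc)  # nothing left to produce

print(first_pass)   # [0, 1, 4, 9, 16]
print(second_pass)  # []
```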



### Profiling Generator Performance

You learned earlier that generators are a great way to optimize memory. While an infinite sequence generator is an extreme example of this optimization, let’s amp up the number squaring examples you just saw and inspect the size of the resulting objects. You can do this with a call to `sys.getsizeof()`:



In [12]:
import sys
nums_squared_lc = [i ** 2 for i in range(10000)]
sys.getsizeof(nums_squared_lc)

87632

In [14]:
nums_squared_gc = (i ** 2 for i in range(10000))
print(sys.getsizeof(nums_squared_gc))

128


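In this case, the list you get from the list comprehension is 87,632 bytes, while the generator object is only 128. This means that the list is nearly 700 times larger than the generator object! Note that sys.getsizeof() reports only the size of the list object itself, not of the integers it refers to, and the exact figures vary between Python versions.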
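
To see how the two sizes scale, the sketch below repeats the measurement for several lengths. The helper functions squares_list() and squares_gen() are hypothetical names introduced here, and no exact byte counts are assumed since they differ across Python versions.

```python
import sys

def squares_list(n):
    return [i ** 2 for i in range(n)]

def squares_gen(n):
    return (i ** 2 for i in range(n))

list_sizes = {}
gen_sizes = {}
for n in (10, 1_000, 100_000):
    list_sizes[n] = sys.getsizeof(squares_list(n))  # grows with n
    gen_sizes[n] = sys.getsizeof(squares_gen(n))    # independent of n
    print(n, list_sizes[n], gen_sizes[n])
```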

__There is one thing to keep in mind, though. If the list is smaller than the running machine’s available memory, then list comprehensions can be faster to evaluate than the equivalent generator expression__. To explore this, let’s sum across the results from the two comprehensions above. You can generate a readout with cProfile.run():

In [15]:
import cProfile

In [16]:
cProfile.run('sum([i * 2 for i in range(10000)])')

         5 function calls in 0.001 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.001    0.001    0.001    0.001 <string>:1(<listcomp>)
        1    0.000    0.000    0.001    0.001 <string>:1(<module>)
        1    0.000    0.000    0.001    0.001 {built-in method builtins.exec}
        1    0.000    0.000    0.000    0.000 {built-in method builtins.sum}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}




In [17]:
cProfile.run('sum((i * 2 for i in range(10000)))')

         10005 function calls in 0.004 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    10001    0.002    0.000    0.002    0.000 <string>:1(<genexpr>)
        1    0.000    0.000    0.003    0.003 <string>:1(<module>)
        1    0.000    0.000    0.004    0.004 {built-in method builtins.exec}
        1    0.001    0.001    0.003    0.003 {built-in method builtins.sum}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}




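Here, you can see that summing across all values in the list comprehension took roughly a quarter of the time of summing across the generator (0.001 seconds versus 0.004 seconds in this run). Part of the gap is profiling overhead, since the generator version involves about 10,000 additional profiled calls.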

> If speed is an issue and memory isn’t, then a list comprehension is likely a better tool for the job.
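
Because cProfile adds overhead to every profiled call, timeit from the standard library may give a fairer comparison. The sketch below is one way to run it; the timings vary by machine and Python version, so no particular result is assumed.

```python
import timeit

# Time each variant over 50 runs; absolute numbers depend on the machine.
list_time = timeit.timeit("sum([i * 2 for i in range(10000)])", number=50)
gen_time = timeit.timeit("sum(i * 2 for i in range(10000))", number=50)

# Both approaches compute the same total.
list_total = sum([i * 2 for i in range(10000)])
gen_total = sum(i * 2 for i in range(10000))

print(f"list comprehension: {list_time:.4f}s")
print(f"generator expression: {gen_time:.4f}s")
print(list_total == gen_total)  # True
```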

Remember, list comprehensions return full lists, while generator expressions return generators. 

__Generators work the same whether they’re built from a function or an expression. Using an expression just allows you to define simple generators in a single line, with an assumed yield at the end of each inner iteration.__
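
A short sketch of this equivalence: the generator function below (squares_func is a name introduced here for illustration) and the generator expression produce objects of the same type that yield the same values.

```python
def squares_func(n):
    for num in range(n):
        yield num ** 2  # the explicit yield

squares_expr = (num ** 2 for num in range(5))  # the yield is implied

same_type = type(squares_func(5)) is type(squares_expr)
from_func = list(squares_func(5))
from_expr = list(squares_expr)

print(same_type)   # True
print(from_func)   # [0, 1, 4, 9, 16]
print(from_expr)   # [0, 1, 4, 9, 16]
```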

The Python yield statement is certainly the linchpin on which all of the functionality of generators rests, so let’s dive into how yield works in Python.



