
In [1]:
!wget https://raw.githubusercontent.com/realpython/materials/master/generators/techcrunch.csv

--2020-08-13 11:44:47--  https://raw.githubusercontent.com/realpython/materials/master/generators/techcrunch.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.0.133, 151.101.64.133, 151.101.128.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.0.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 93537 (91K) [text/plain]
Saving to: ‘techcrunch.csv’


2020-08-13 11:44:47 (9.38 MB/s) - ‘techcrunch.csv’ saved [93537/93537]



Introduced in PEP 255, a generator function is a special kind of function that returns a lazy iterator: an object you can loop over like a list. Unlike a list, however, a lazy iterator does not store all of its contents in memory; it produces each value only when it is asked for.
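
To make the memory claim concrete, here is a minimal sketch that compares the size of a list with the size of an equivalent generator. The names `numbers_list` and `numbers_gen` are illustrative and not part of the original notebook, and exact byte counts vary between Python versions, so none are quoted here.

```python
import sys

# Illustrative names, not from the original notebook
numbers_list = [n * 2 for n in range(10_000)]  # all 10,000 values exist in memory
numbers_gen = (n * 2 for n in range(10_000))   # values are produced on demand

# getsizeof reports the size of the container itself, not of the items it refers to
list_size = sys.getsizeof(numbers_list)
gen_size = sys.getsizeof(numbers_gen)
print(f"list: {list_size} bytes, generator: {gen_size} bytes")

# Both produce the same values when iterated
assert sum(numbers_gen) == sum(numbers_list)
```

The generator stays small no matter how large the range is, because it only remembers where it is in the sequence.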

Uses of generators:


*   Reading Large Files
*   Generating an Infinite Sequence
*   Detecting Palindromes

We'll go over each of these using examples.

In [2]:
# A new, cooler CSV reader: it yields one line at a time instead of loading the whole file
def csv_reader(csv_file):
  with open(csv_file) as file_obj:
    for line in file_obj:
      yield line

def line_counter(file_name):
  csv_gen = csv_reader(file_name)
  row_count = 0
  for row in csv_gen:
    row_count += 1
  print(f"Row count is {row_count}")

line_counter('techcrunch.csv')

Row count is 1461


In [3]:
# An even smarter way to create a generator is a generator expression
def line_counter(file_name):
  csv_gen = (row for row in open(file_name)) # Key line: the file object returned by open() is itself a lazy iterator over lines (note that it is never explicitly closed here)
  row_count = 0
  for row in csv_gen:
    row_count += 1
  print(f"Row count is {row_count}")

line_counter('techcrunch.csv')

Row count is 1461
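
The generator expression version opens the file without ever closing it. One possible variation, sketched below, keeps the lazy iteration but wraps it in a `with` block. The helper `count_lines` and the tiny temporary file are assumptions made so the example runs on its own; they are not part of the original notebook.

```python
import os
import tempfile

def count_lines(path):
    # The with block guarantees the file is closed once counting finishes
    with open(path) as file_obj:
        # sum() pulls one line at a time from the lazy file iterator
        return sum(1 for _ in file_obj)

# Made-up three-line sample file so the sketch is self-contained
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write("a,b\n1,2\n3,4\n")
    sample_path = tmp.name

row_count = count_lines(sample_path)
print(f"Row count is {row_count}")
os.remove(sample_path)
```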


In [4]:
# Generating an infinite sequence. This requires a generator: computer memory is finite, so an infinite list could never be stored

def infinite_pattern():
  num = 0
  while True:
    yield num
    num += 1 

In [5]:
index = 0 
for i in infinite_pattern():
  index += 1
  print(i, end=" ")
  if index == 10:
    break

0 1 2 3 4 5 6 7 8 9 
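
The manual counter and `break` above can also be expressed with `itertools.islice`, which takes a fixed number of items from any iterator. This is a small sketch of that alternative; `infinite_pattern` is repeated here only so the block runs on its own.

```python
from itertools import count, islice

def infinite_pattern():
    num = 0
    while True:
        yield num
        num += 1

# Take only the first ten values; the generator is never run past them
first_ten = list(islice(infinite_pattern(), 10))
print(first_ten)

# itertools.count() is the standard-library equivalent of infinite_pattern()
assert first_ten == list(islice(count(), 10))
```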

In [6]:
# More usefully, you can call next() on a generator to get its next value

gen = infinite_pattern()
print(next(gen))
print(next(gen))

0
1


In [7]:
# Using a generator expression
next_sum_squared = (num**2 for num in range(5))
print(next_sum_squared)

# The parentheses create a generator expression, not a tuple: Python has no tuple comprehension.
# The values are computed lazily, so printing shows only the generator object itself.

<generator object <genexpr> at 0x7f7411c7cf68>
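
As a quick check of how the bracket style changes the result, the sketch below builds the same squares three ways. The variable names are illustrative only.

```python
# Square brackets: a list comprehension, evaluated immediately
squares_list = [num**2 for num in range(5)]

# Parentheses: a generator expression, evaluated lazily
squares_gen = (num**2 for num in range(5))

# A tuple has to be requested explicitly by passing a generator expression to tuple()
squares_tuple = tuple(num**2 for num in range(5))

print(type(squares_list).__name__)
print(type(squares_gen).__name__)
print(type(squares_tuple).__name__)

# The generator only produces its values once something consumes it
materialized = list(squares_gen)
print(materialized)
```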


In [None]:
# Understanding Yield
# When the Python yield statement is hit, the program suspends function execution and returns the yielded value to the caller. 
# (In contrast, return stops function execution completely.)
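
The suspend-and-resume behaviour can be observed directly. In this sketch the hypothetical generator `two_steps` records what it is doing in an `events` list, so you can see that no code runs until `next()` is called and that execution picks up right after the previous `yield`.

```python
events = []

def two_steps():
    # Hypothetical generator used only to trace execution
    events.append("started")
    yield 1
    events.append("resumed")
    yield 2
    events.append("finished")

gen = two_steps()
before_first_next = list(events)  # calling the function runs none of its body

first = next(gen)   # runs up to the first yield, then pauses
second = next(gen)  # resumes after the first yield, pauses at the second

try:
    next(gen)       # runs to the end of the function body
    exhausted = False
except StopIteration:
    exhausted = True  # a finished generator signals the end with StopIteration

print(first, second, exhausted, events)
```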

In [14]:
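# Advanced generator methods

def is_palindrome(num):
  # Skip single-digit inputs
  if num // 10 == 0:
    return False
  temp = num
  reversed_num = 0

  # Build the reversed number digit by digit
  while temp:
    reversed_num = (reversed_num*10)+(temp%10)
    temp //= 10

  # Compare the reversed number (not temp, which is 0 by now) with the input
  return reversed_num == num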

In [15]:
def infinite_palindrome():
  num = 0
  while True:
    if is_palindrome(num):
      i = (yield num) # This might look like an anti-pattern, but yield is an expression as well as a statement
      if i is not None:
        num = i # Jump to the value passed in through .send()
    num += 1

In [16]:
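# Because yield is an expression, it can receive data that the caller passes in with .send().
# And we can happily use this data for our particular brand of chicanery: skipping ahead to a bigger number.
# Note: this loop never ends on its own, so it has to be interrupted manually.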
pal_gen = infinite_palindrome()
for i in pal_gen:
  digits = len(str(i))
  pal_gen.send(10**(digits))

KeyboardInterrupt: ignored
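
Here is a bounded sketch of the same idea that terminates on its own. It assumes the generator jumps to whatever value is sent in, and it swaps in a string-reversal palindrome check instead of the arithmetic one, purely to keep the block short. It also shows a detail that is easy to miss: `.send()` itself returns the next yielded value.

```python
def is_palindrome(num):
    # Assumption: string reversal instead of digit arithmetic; single digits are skipped
    text = str(num)
    return len(text) > 1 and text == text[::-1]

def infinite_palindrome():
    num = 0
    while True:
        if is_palindrome(num):
            sent = yield num
            if sent is not None:
                num = sent  # jump to the value passed in through .send()
        num += 1

pal_gen = infinite_palindrome()
found = [next(pal_gen)]  # a generator must be advanced to its first yield before .send()
while len(found) < 4:
    digits = len(str(found[-1]))
    # .send() resumes the generator and returns the next value it yields
    found.append(pal_gen.send(10 ** digits))

print(found)
```

Each call to `.send()` skips straight to the first palindrome with one more digit than the previous one.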

In [None]:
# This pattern is called a coroutine
# Now let's take a look at .throw(), which lets you raise an exception inside the generator at the point where it is paused.

In [None]:
pal_gen = infinite_palindrome()
for i in pal_gen:
    print(i)
    digits = len(str(i))
    if digits == 5:
        pal_gen.throw(ValueError("We don't like large palindromes"))
    pal_gen.send(10 ** (digits))
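
To see what `.throw()` does in isolation, the sketch below uses a small hypothetical generator, `countdown`, that does not handle the exception. The exception is raised at the paused `yield`, propagates back to the caller, and leaves the generator finished.

```python
def countdown(start):
    # Hypothetical generator used only for this illustration
    while start > 0:
        yield start
        start -= 1

gen = countdown(3)
first = next(gen)  # the generator is now paused at its first yield

try:
    # Raises ValueError inside the generator, at the paused yield
    gen.throw(ValueError("stop early"))
    message = None
except ValueError as exc:
    # The generator did not catch it, so it reaches the caller
    message = str(exc)

# After an uncaught exception the generator is finished and yields nothing more
remaining = list(gen)
print(first, message, remaining)
```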

In [49]:
# Use a generator expression to read the lines of the file
file_name = 'techcrunch.csv'
lines = (line for line in open(file_name))

# Then we use another generator expression to split each line into a list of fields
list_line = (line.rstrip().split(',') for line in lines)  # rstrip() removes the trailing \n from each line

# To get the first line, i.e. the column names in the CSV file, use
columns = next(list_line)

In [39]:
for i in columns:
  print(i,end=' ')

permalink company numEmps category city state fundedDate raisedAmt raisedCurrency round 

In [50]:
# Next, we turn each row into a dictionary keyed by column name, one dictionary per company, again using a generator expression
company_dict = (dict(zip(columns,data)) for data in list_line)

In [47]:
funding = ( int(company['raisedAmt']) for company in company_dict if company.get('round') == 'a')

In [48]:
total_series_a = sum(funding)
total_series_a

4376015000
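
Splitting on `','` works for this file but would break on a quoted field that contains a comma. One alternative is the standard library's `csv.DictReader`, which is itself a lazy iterator of dictionaries. The rows below are made up for illustration and are not taken from `techcrunch.csv`.

```python
import csv
import io

# Made-up rows for illustration only; note the quoted company name containing a comma
sample_csv = io.StringIO(
    "company,raisedAmt,round\n"
    '"Acme, Inc.",1000,a\n'
    "Globex,2500,b\n"
    "Initech,4000,a\n"
)

# DictReader reads the header row and lazily yields one dict per data row
rows = csv.DictReader(sample_csv)

# Same filtering pipeline as above, still a generator expression
funding = (int(row["raisedAmt"]) for row in rows if row["round"] == "a")

total_series_a = sum(funding)
print(total_series_a)
```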

In [None]:
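# A few pitfalls
# Generator expressions can produce errors that are unintuitive and difficult to debug, so be careful.
# For example, a generator can be consumed only once, and an error in a lazy pipeline surfaces only when a value is actually requested.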
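
Two of the most common surprises can be reproduced in a few lines. This is a minimal sketch with illustrative names: the first part shows a generator being silently empty on a second pass, and the second shows an error that appears only when the bad item is reached.

```python
# Pitfall 1: a generator is exhausted after one pass
values = (int(text) for text in ["1", "2", "3"])
first_total = sum(values)
second_total = sum(values)  # nothing left to consume, so the sum is 0
print(first_total, second_total)

# Pitfall 2: errors are raised lazily, far from where the expression was written
lazy = (int(text) for text in ["1", "oops"])  # no error yet
first_item = next(lazy)                       # still fine
try:
    next(lazy)                                # int("oops") fails only now
    error_seen = False
except ValueError:
    error_seen = True
print(first_item, error_seen)
```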