## This notebook covers the following topics:
- **List Comprehensions**
    - Nested list comprehensions
    - Nested loops using list comprehensions
    - List comprehensions behave like functions in terms of scope
- **Iteration using next_ method**
- **Iterators**
    - Iterators are objects that implement the __iter__ and __next__ methods.
    - The __iter__ method of an iterator just returns itself.
- **Iterators and iterables**
    - How to solve exhaustion problem
    - How to use iter on lists
- **Consuming iterators manually**
    - Use case with the cars.csv file
    

In [107]:
# Imports required for this notebook
from math import factorial
import dis
from collections import namedtuple

### List Comprehensions

In [2]:
squares = [i**2 for i in range(0,11)]
squares

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100]

In [3]:
#Even squares only
even_squares = [i**2 for i in range(0,10) if i%2 == 0]

In [4]:
even_squares

[0, 4, 16, 36, 64]

**Nested list comprehensions**

In the next cell we nest one list comprehension inside another.

You should also notice that the inner comprehension (the one that has i*j) is accessing its own local variable j, as well as a variable from the enclosing comprehension, the i variable. Just like a closure! And in fact, it is exactly that. We'll come back to that in a bit.

In [6]:
prod = [ [i*j for j in range(0,11)]
         for i in range(0,11)]

In [7]:
prod

[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
 [0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20],
 [0, 3, 6, 9, 12, 15, 18, 21, 24, 27, 30],
 [0, 4, 8, 12, 16, 20, 24, 28, 32, 36, 40],
 [0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50],
 [0, 6, 12, 18, 24, 30, 36, 42, 48, 54, 60],
 [0, 7, 14, 21, 28, 35, 42, 49, 56, 63, 70],
 [0, 8, 16, 24, 32, 40, 48, 56, 64, 72, 80],
 [0, 9, 18, 27, 36, 45, 54, 63, 72, 81, 90],
 [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]]

**Pascal's triangle**

1

1 1

1 2 1

1 3 3 1

1 4 6 4 1

Each entry is the binomial coefficient C(n, k) = n! / (k! (n-k)!), where n is the row and k is the column:

- row 0, column 0: n=0, k=0: C(0, 0) = 0! / (0! 0!) = 1/1 = 1
- row 4, column 2: n=4, k=2: C(4, 2) = 4! / (2! 2!) = 24/4 = 6
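
As a quick check of the formula, here is a minimal sketch that computes C(n, k) from factorials and compares it with `math.comb` from the standard library (available in Python 3.8+). The helper name `n_choose_k` is hypothetical and not part of the notebook.

```python
from math import comb, factorial

def n_choose_k(n, k):
    # C(n, k) = n! / (k! * (n - k)!); integer division keeps the result exact
    return factorial(n) // (factorial(k) * factorial(n - k))

# The two worked examples from the text
row0 = n_choose_k(0, 0)
row4 = n_choose_k(4, 2)
print(row0, row4)

# Cross-check against the standard library for the first 10 rows
agrees = all(n_choose_k(n, k) == comb(n, k)
             for n in range(10) for k in range(n + 1))
print(agrees)
```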

In [21]:
# Conventional way
from math import factorial as fact
pasc_lst = []
for n in range(10):
    lst = []
    for k in range(n+1):
        num = int(fact(n)//(fact(k)*fact(n-k)))        
        lst.append(num)
    if lst:
        pasc_lst.append(lst)
pasc_lst

[[1],
 [1, 1],
 [1, 2, 1],
 [1, 3, 3, 1],
 [1, 4, 6, 4, 1],
 [1, 5, 10, 10, 5, 1],
 [1, 6, 15, 20, 15, 6, 1],
 [1, 7, 21, 35, 35, 21, 7, 1],
 [1, 8, 28, 56, 70, 56, 28, 8, 1],
 [1, 9, 36, 84, 126, 126, 84, 36, 9, 1]]

In [22]:
def combo(n,k):
    num = fact(n) // (fact(k)*fact(n-k))
    return int(num)

***In the cell below, the outer comprehension accesses the global variable 'size' and creates its own local variable 'n'. The inner comprehension accesses 'n', which is nonlocal to it, and creates its own local variable 'k'.***

In [24]:
size = 10
pascals = [[combo(n,k) for k in range(n+1)]
            for n in range(size)]
pascals

[[1],
 [1, 1],
 [1, 2, 1],
 [1, 3, 3, 1],
 [1, 4, 6, 4, 1],
 [1, 5, 10, 10, 5, 1],
 [1, 6, 15, 20, 15, 6, 1],
 [1, 7, 21, 35, 35, 21, 7, 1],
 [1, 8, 28, 56, 70, 56, 28, 8, 1],
 [1, 9, 36, 84, 126, 126, 84, 36, 9, 1]]

***Nested loops - Demonstration with an example***

In [25]:
l1 = ['a', 'b', 'c']
l2 = ['x', 'y', 'z']
lst = []
for i in l1:
    for j in l2:
        ele = i+j
        lst.append(ele)
lst

['ax', 'ay', 'az', 'bx', 'by', 'bz', 'cx', 'cy', 'cz']

In [26]:
# List comprehension way (the for clauses are written in the same order as the nested loops above)

lst = [i+j for i in l1 for j in l2]
lst

['ax', 'ay', 'az', 'bx', 'by', 'bz', 'cx', 'cy', 'cz']
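
A comprehension with several `for` clauses reads left to right in the same order as the equivalent nested loops, so swapping the clauses changes the order of the result (though not its contents). A small sketch, with illustrative variable names:

```python
l1 = ['a', 'b', 'c']
l2 = ['x', 'y', 'z']

# Outer loop over l1, inner loop over l2
outer_l1 = [i + j for i in l1 for j in l2]

# Outer loop over l2, inner loop over l1
outer_l2 = [i + j for j in l2 for i in l1]

print(outer_l1)  # ['ax', 'ay', 'az', 'bx', 'by', 'bz', 'cx', 'cy', 'cz']
print(outer_l2)  # ['ax', 'bx', 'cx', 'ay', 'by', 'cy', 'az', 'bz', 'cz']
```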

In [28]:
# We don't want elements such as 'bb' and 'cc', i.e. pairs where both letters are the same

l1 = ['a', 'b', 'c']
l2 = ['b', 'd', 'c']
lst = []
for i in l1:
    for j in l2:
        if i != j:
            ele = i+j
            lst.append(ele)
lst

['ab', 'ad', 'ac', 'bd', 'bc', 'cb', 'cd']

In [29]:
# List comprehension way (same clause order as the nested loops above)
lst = [i+j for i in l1 for j in l2 if i != j]
lst

['ab', 'ad', 'ac', 'bd', 'bc', 'cb', 'cd']

***Implementing zip using list comprehension***

In [30]:
l1 = [1, 2, 3, 4, 5, 6, 7, 8, 9]
l2 = ['a', 'b', 'c', 'd']
lst = list(zip(l1, l2))
lst

[(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd')]

In [31]:
# Conventional way to achieve the same

lst = []
for index_a, i in enumerate(l1):
    for index_b, j in enumerate(l2):
        if index_a == index_b:
            tup = (i, j)
            lst.append(tup)
lst

[(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd')]

In [33]:
# List comprehension way
lst = [(i,j) for index_b, j in enumerate(l2)
       for index_a, i in enumerate(l1)
       if index_a == index_b]
lst

[(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd')]
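
The nested version above compares every pair of indexes, which costs len(l1) * len(l2) comparisons. A lighter sketch, assuming both inputs are indexable sequences, walks a single index up to the length of the shorter one:

```python
l1 = [1, 2, 3, 4, 5, 6, 7, 8, 9]
l2 = ['a', 'b', 'c', 'd']

# Stop at the shorter sequence, just like zip does
pairs = [(l1[i], l2[i]) for i in range(min(len(l1), len(l2)))]
print(pairs)  # [(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd')]
```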

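We need to recognize that, in the Python version used here, list comprehensions are essentially temporary functions that Python creates and executes, returning the resulting list. (Since Python 3.12, CPython inlines comprehensions instead of creating a function, but the loop variable is still kept out of the enclosing scope.)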

We can see this by compiling a comprehension, and then disassembling the compiled code to see what happened:

In [34]:
import dis

In [35]:
compiled_code = compile('[i**2 for i in (1, 2, 3)]', 
                        filename='', mode='eval')

In [36]:
dis.dis(compiled_code)

  1           0 LOAD_CONST               0 (<code object <listcomp> at 0x000002A7F5551B70, file "", line 1>)
              2 LOAD_CONST               1 ('<listcomp>')
              4 MAKE_FUNCTION            0
              6 LOAD_CONST               2 ((1, 2, 3))
              8 GET_ITER
             10 CALL_FUNCTION            1
             12 RETURN_VALUE

Disassembly of <code object <listcomp> at 0x000002A7F5551B70, file "", line 1>:
  1           0 BUILD_LIST               0
              2 LOAD_FAST                0 (.0)
        >>    4 FOR_ITER                12 (to 18)
              6 STORE_FAST               1 (i)
              8 LOAD_FAST                1 (i)
             10 LOAD_CONST               0 (2)
             12 BINARY_POWER
             14 LIST_APPEND              2
             16 JUMP_ABSOLUTE            4
        >>   18 RETURN_VALUE


As you can see, at offset 4 Python created a function (MAKE_FUNCTION), then called it at offset 10 (CALL_FUNCTION), and returned the result (RETURN_VALUE) in the last step.

So, comprehensions will behave like functions in terms of scope. They have local scope, and can access global and nonlocal scopes too. And nested comprehensions will also behave like nested functions and closures.
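
To make the scoping point concrete, here is a small sketch contrasting a plain `for` loop, whose loop variable stays bound in the enclosing scope, with a comprehension, whose loop variable does not. The function and variable names are illustrative only.

```python
def loop_variable_visibility():
    # A plain for loop leaves its loop variable bound after the loop ends
    for loop_var in range(3):
        pass
    after_loop = loop_var  # still accessible here

    # A comprehension keeps its loop variable to itself
    squares = [comp_var ** 2 for comp_var in range(3)]
    try:
        comp_var
        leaked = True
    except NameError:
        leaked = False
    return after_loop, squares, leaked

result = loop_variable_visibility()
print(result)  # (2, [0, 1, 4], False)
```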

In [39]:
v1 = (1, 2, 3, 4, 5, 6)
v2 = (10, 20, 30, 40, 50, 60)

In [43]:
dot = sum([v1*v2 for v1, v2 in zip(v1, v2)])

In [44]:
dot

910

**Let us check the scope of variables in list comprehension**

In [45]:
if 'number' in globals():
    del number

In [46]:
lst = [number**2 for number in v1]
lst

[1, 4, 9, 16, 25, 36]

In [48]:
number          # 'number' was local to the comprehension above, so it no longer exists once the comprehension has run

NameError: name 'number' is not defined

In [50]:
'number' in globals()

False

In [51]:
## Let us take another example
number = 11
lst = [number*i for i in range(5)]
lst

[0, 11, 22, 33, 44]

In [53]:
number       # In the example above, the comprehension read 'number' from the global scope and left it unchanged

11

Suppose we want to generate a list of functions that will calculate powers of their argument, i.e. we want to define a bunch of functions

- fn_1(arg) --> arg ** 1
- fn_2(arg) --> arg ** 2
- fn_3(arg) --> arg ** 3 etc...

**Let us see how we can effectively use lambda functions here**

In [61]:
# This won't work as intended
lst_fn = [lambda x : x**i for i in range(5)]

In [62]:
lst_fn[0](10), lst_fn[1](10), lst_fn[-1](10)  # We get the same value each time, because 'i' is looked up when the lambda is called, and by then it holds its last value, i.e. 4
# So we get 10**4 = 10_000 in all cases

(10000, 10000, 10000)

In [63]:
# We need to capture the current value of 'i' as a default argument, as below, to make it work
lst_fn = [lambda x, pow=i:x**pow for i in range(5)]

In [64]:
lst_fn[0](10), lst_fn[1](10), lst_fn[-1](10)

(1, 10, 10000)
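
Another way to get the same effect, sketched here, is a small factory function: each call creates its own scope, so every returned function closes over its own exponent. The name `make_power` is hypothetical.

```python
def make_power(exponent):
    # Each call has its own 'exponent', so the closure captures a distinct value
    def power(x):
        return x ** exponent
    return power

power_fns = [make_power(i) for i in range(5)]
results = (power_fns[0](10), power_fns[1](10), power_fns[-1](10))
print(results)  # (1, 10, 10000)
```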

***Iteration using a 'next' method***

This removes the need for the collection to be indexable. It can work with any collection, handing out the next element each time it is called.

In [65]:
# We can do as below. Example taken here is a class returning 'squares' of argument

class Squares:
    def __init__(self):
        self.i = 0
        
    def next_(self):
        result = self.i**2
        self.i += 1
        return result

In [67]:
sq = Squares()

In [70]:
sq.next_()

0

In [71]:
sq.next_()

1

In [72]:
sq.next_()

4

In [73]:
for i in range(10):
    print(sq.next_())

9
16
25
36
49
64
81
100
121
144


In [74]:
# The problem with this approach is that we cannot stop it: the next call to sq.next_() returns 169, as shown below

In [75]:
sq.next_()

169

In [77]:
# So we need to control the calls as below
class Squares:
    def __init__(self, length):
        self.i = 0
        self.length = length
        
    def next_(self):
        if self.i <= self.length:   # note: '<=' yields length + 1 values (0..length), one more than len() reports
            result = self.i**2
            self.i += 1
            return result
        else:
            raise StopIteration
            
    def __len__(self):          # We are defining __len__ so that we can call len(obj-name) to retrieve length of the obj
        return self.length

In [95]:
sq = Squares(5)

In [96]:
len(sq)

5

In [97]:
for i in range(5):
    print(sq.next_())

0
1
4
9
16


In [98]:
sq.i

5

In [99]:
for i in range(1):
    print(sq.next_())

25


In [100]:
sq.length, sq.i

(5, 6)

In [101]:
for i in range(1):
    print(sq.next_())

StopIteration: 

In [103]:
# This approach still has a problem: once exhausted, it raises StopIteration. We can handle that by catching the exception, as below

In [104]:
sq = Squares(5)
while True:
    try: 
        print(sq.next_())
    except StopIteration:
        break        

0
1
4
9
16
25


In [111]:
# We can implement __next__ instead, so that the built-in next() function works on our object
class Squares:
    def __init__(self, length):
        self.i = 0
        self.length = length
        
    def __next__(self):
        if self.i <= self.length:
            result = self.i**2
            self.i += 1
            return result
        else:
            raise StopIteration
            
    def __len__(self):          # We are defining __len__ so that we can call len(obj-name) to retrieve length of the obj
        return self.length

In [112]:
sq = Squares(5)
for i in range(3):
    print(next(sq))

0
1
4


In [113]:
# We still need to make 'Squares' iterable. Currently it is NOT.
for i in Squares(10):
    print(i)

TypeError: 'Squares' object is not iterable

## Iterators

We already saw that we could approach iterating over a collection using this concept of **next**.

But there were some downsides that we have not resolved (yet!):

- we cannot use a **for** loop
- once we **exhaust** the iteration (by repeatedly calling next), we're essentially done with the object. The only way to iterate through it again is to create a new instance of the object.

First we are going to look at making our **next** be usable in a **for** loop.

This idea of using __next__ and the StopIteration exception is exactly what Python does.

So, somehow we need to tell Python that the object we are dealing with can be used with **next**.

To do so, we create an **iterator type object**.

Iterators are objects that implement:

- a __next__ method
- an __iter__ method that simply returns the object itself

That's it - that's all there is to an iterator - two methods, __iter__ and __next__.

In [2]:
class Squares:
    def __init__(self, length):
        self.length = length
        self.i = 0 
        
    def __iter__(self):
        return self
    
    def __next__(self):
        if self.i <= self.length:
            result = self.i**2
            self.i += 1
            return result
        else:
            raise StopIteration

In [5]:
# next will continue to work here
sq = Squares(5)
print(next(sq))
print(next(sq))
print(next(sq))

0
1
4


In [6]:
# But now we can use 'for' loop too
for i in Squares(6):
    print(i)

0
1
4
9
16
25
36


In [9]:
# But the exhaustion problem still exists: once exhausted, we have to create a new instance to iterate again

sq = Squares(5)  # sq is an iterator

for i in sq:
    print(i)

0
1
4
9
16
25


In [10]:
for i in sq:   # We get nothing because sq is already exhausted
    print(i)

In [11]:
id(sq)

2095119205088

In [13]:
id(sq.__iter__())

2095119205088

In [14]:
id(iter(sq))

2095119205088
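
The matching ids above say that calling iter() on an iterator hands back the very same object. Here is a short sketch of the same check using a built-in list iterator and `is`, together with the `collections.abc.Iterator` abstract base class, which recognizes any object that has both __iter__ and __next__:

```python
from collections.abc import Iterable, Iterator

numbers = [1, 2, 3]
numbers_iter = iter(numbers)

same_object = iter(numbers_iter) is numbers_iter    # an iterator returns itself
list_is_iterator = isinstance(numbers, Iterator)    # a list has no __next__
list_is_iterable = isinstance(numbers, Iterable)    # but it does have __iter__
iter_is_iterator = isinstance(numbers_iter, Iterator)

print(same_object, list_is_iterator, list_is_iterable, iter_is_iterator)
# True False True True
```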

In [18]:
# We can use list comprehension also on iterator objects

sq = Squares(5)

lst = [i for i in sq if i%2 == 0]
lst

[0, 4, 16]

***We can even use any function that requires an iterable as an argument (iterators are iterable):***

But the exhaustion problem is still there: once we fully iterate over an iterator, it is exhausted and can no longer be used for iteration. This is why a new Squares instance is created before each call below.

In [20]:
sq = Squares(5)
list(enumerate(sq))

[(0, 0), (1, 1), (2, 4), (3, 9), (4, 16), (5, 25)]

In [21]:
sq = Squares(5)
sorted(sq, reverse=True)


[25, 16, 9, 4, 1, 0]

**Let us see how iterator flow is happening**

In [22]:
class Squares:
    def __init__(self, length):
        print('init')
        self.length = length
        self.i = 0 
        
    def __iter__(self):
        print('calling squares iter')
        return self
    
    def __next__(self):
        if self.i <= self.length:
            print('calling next')
            result = self.i**2
            self.i += 1
            return result
        else:
            print('about to stop iteration')
            raise StopIteration

In [23]:
sq = Squares(5)

init


In [24]:
lst = [i for i in sq]

calling squares iter
calling next
calling next
calling next
calling next
calling next
calling next
about to stop iteration


***The comprehension first calls iter once and then calls next until the elements are exhausted***
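
Roughly speaking, a `for` loop performs the same two steps. The following is a simplified sketch of that protocol, not the interpreter's actual implementation, and `manual_for` is a hypothetical helper:

```python
def manual_for(iterable, action):
    # Step 1: ask the iterable for an iterator (this is the __iter__ call)
    iterator = iter(iterable)
    # Step 2: call next() until the iterator signals that it is done
    while True:
        try:
            item = next(iterator)
        except StopIteration:
            break
        action(item)

collected = []
manual_for([1, 2, 3], collected.append)
print(collected)  # [1, 2, 3]
```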

#### Let us focus on below 2 items

- when we looped over the iterator using a **for** loop (or a comprehension, or other functions that do some form of iteration), we saw that __iter__ was always called first. What is the relevance of calling **iter** first?
- the iterator gets exhausted after we have finished iterating it fully - which means we have to create a new iterator every time we want to use a new iteration over the collection - can we somehow avoid having to remember to do that every time?

In [25]:
class Cities:
    def __init__(self):
        self._cities = ['Newyork', 'Delhi', 'Mumbai', 'LA']
        self._index  = 0
        
    def __iter__(self):
        return self
    
    def __next__(self):
        if self._index >= len(self._cities):
            raise StopIteration
        else:
            item = self._cities[self._index]
            self._index += 1
            return item                

In [26]:
cities = Cities()
list(enumerate(cities))

[(0, 'Newyork'), (1, 'Delhi'), (2, 'Mumbai'), (3, 'LA')]

In [27]:
list(enumerate(cities)) # Exhausted

[]

In [28]:
# As a 1st step to resolve exhaustion problem, let us separate data from iteration part

In [29]:
class Cities:
    def __init__(self):
        self._cities = ['Newyork', 'Delhi', 'Mumbai', 'LA']
        self._index  = 0
        
    def __len__(self):
        return len(self._cities)

In [37]:
class CityIterator:
    def __init__(self, city_obj):
        # city_obj is an instance of Cities
        self._city_obj = city_obj
        self._index = 0
        
    def __iter__(self):
        return self
    
    def __next__(self):
        if self._index >= len(self._city_obj):
            raise StopIteration
        else:
            result = self._city_obj._cities[self._index]
            self._index += 1
            return result

In [38]:
# With this approach, we create the Cities object only once and then iterate over it several times via different iterators

In [39]:
cities = Cities()

In [40]:
cities

<__main__.Cities at 0x1e7cdf7c430>

In [41]:
iter_1 = CityIterator(cities)

In [42]:
for city in iter_1:
    print(city)

Newyork
Delhi
Mumbai
LA


In [43]:
lst = [i for i in iter_1]    # Exhausted
lst

[]

In [44]:
iter_2 = CityIterator(cities) # So creating another iterator passing same object
lst = [i for i in iter_2]
lst

['Newyork', 'Delhi', 'Mumbai', 'LA']

In [45]:
# But we still have to create a new iterator each time, and 'cities' itself is no longer iterable because we moved the 'iter' part out

for city in cities:
    print(city)

TypeError: 'Cities' object is not iterable

In [60]:
# We can solve this by including 'iter' in Cities class also

class Cities:
    def __init__(self):
        self._cities = ['Newyork', 'Delhi', 'Mumbai', 'LA']
        self._index  = 0
        
    def __len__(self):
        return len(self._cities)
    
    def __iter__(self):
        print(' Calling Cities iter')
        return CityIterator(self)

In [61]:
class CityIterator:
    def __init__(self, city_obj):
        # city_obj is an instance of Cities
        self._city_obj = city_obj
        self._index = 0
        
    def __iter__(self):
        print('Calling CityIterator iter')
        return self
    
    def __next__(self):
        if self._index >= len(self._city_obj):
            print('About to stop iteration')
            raise StopIteration
        else:
            print('Calling Next')
            result = self._city_obj._cities[self._index]
            self._index += 1
            return result

***When we try to iterate over the Cities instance***, Python will first call __iter__. The __iter__ method should then return an iterator which Python will use for the iteration.
Because __iter__ returns a new CityIterator each time a loop or comprehension starts, the Cities object itself is never exhausted.

In [62]:
cities = Cities()

In [63]:
lst = [i for i in cities]

 Calling Cities iter
Calling Next
Calling Next
Calling Next
Calling Next
About to stop iteration


In [64]:
lst = [i for i in cities]

 Calling Cities iter
Calling Next
Calling Next
Calling Next
Calling Next
About to stop iteration


In [65]:
list(enumerate(cities))

 Calling Cities iter
Calling Next
Calling Next
Calling Next
Calling Next
About to stop iteration


[(0, 'Newyork'), (1, 'Delhi'), (2, 'Mumbai'), (3, 'LA')]

In [73]:
# Let us make the code self-contained by including 'CityIterator' class inside 'Cities' class

class Cities:
    def __init__(self):
        self._cities = ['Newyork', 'Delhi', 'Mumbai', 'LA']
        self._index  = 0
        
    def __len__(self):
        return len(self._cities)
    
    def __iter__(self):
        return self.CityIterator(self)
    
    class CityIterator:
        def __init__(self, city_obj):
            # city_obj is an instance of Cities
            self._city_obj = city_obj
            self._index = 0
        
        def __iter__(self):
            return self
    
        def __next__(self):
            if self._index >= len(self._city_obj):
                raise StopIteration
            else:
                result = self._city_obj._cities[self._index]
                self._index += 1
                return result

In [74]:
cities = Cities()

In [75]:
for city in cities:
    print(city)

Newyork
Delhi
Mumbai
LA


In [82]:
# We can also create separate iterator instances, which are exhaustible, if we need that behaviour somewhere
iter_1 = iter(cities)
iter_2 = iter(cities)
id(iter_1), id(iter_2)

(2095118042592, 2095118042160)

In [83]:
lst = [i for i in iter_1]
lst

['Newyork', 'Delhi', 'Mumbai', 'LA']

In [84]:
lst = [i for i in iter_1]
lst

[]

***Making a sequence from an iterable***

In [86]:
cities[0]

TypeError: 'Cities' object is not subscriptable

In [90]:
# We can introduce __getitem__ to make it indexable

class Cities:
    def __init__(self):
        self._cities = ['Newyork', 'Delhi', 'Mumbai', 'LA']
        self._index  = 0
        
    def __len__(self):
        return len(self._cities)
    
    def __getitem__(self, s):
        return self._cities[s]
    
    def __iter__(self):
        return self.CityIterator(self)
    
    class CityIterator:
        def __init__(self, city_obj):
            # city_obj is an instance of Cities
            self._city_obj = city_obj
            self._index = 0
        
        def __iter__(self):
            return self
    
        def __next__(self):
            if self._index >= len(self._city_obj):
                raise StopIteration
            else:
                result = self._city_obj._cities[self._index]
                self._index += 1
                return result

In [91]:
cities = Cities()
cities[0]

'Newyork'

In [92]:
cities[-1]

'LA'
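
As a side note, when a class defines __getitem__ but no __iter__, Python can still iterate over it: iter() falls back to calling __getitem__ with 0, 1, 2, ... until IndexError is raised. A minimal sketch with a hypothetical `CityList` class:

```python
class CityList:
    def __init__(self):
        self._cities = ['Newyork', 'Delhi', 'Mumbai', 'LA']

    def __getitem__(self, index):
        # Delegating to the list means an out-of-range index raises IndexError,
        # which is what ends the fallback iteration
        return self._cities[index]

city_list = CityList()
first_pass = [city for city in city_list]
second_pass = [city for city in city_list]   # a fresh iterator is built each time
print(first_pass)
```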

***How to use iter on list***

In [94]:
lst = [1, 2, 3]

# Since lists are iterables, they implement the __iter__ method, and hence we can get an iterator for the list

iter_l = iter(lst)

In [95]:
next(iter_l)

1

In [96]:
next(iter_l)

2

In [97]:
next(iter_l)

3

In [98]:
next(iter_l)

StopIteration: 
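
Because the built-in list already knows how to produce fresh iterators, a container class that wraps a list can simply delegate to it instead of defining its own iterator class. A sketch under that assumption (`CityCollection` is a hypothetical name):

```python
class CityCollection:
    def __init__(self):
        self._cities = ['Newyork', 'Delhi', 'Mumbai', 'LA']

    def __len__(self):
        return len(self._cities)

    def __iter__(self):
        # Every call returns a new list iterator, so the collection is never exhausted
        return iter(self._cities)

collection = CityCollection()
first = list(collection)
second = list(collection)
print(first == second, len(collection))  # True 4
```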

### Consuming iterators manually - Use case

In [102]:
with open(r"C:\Users\anila\Desktop\AI\EPAI-Phase1\S14_Iterables_Iterators\cars.csv", 'r') as file:
    for line in file:
        print(line)

Car;MPG;Cylinders;Displacement;Horsepower;Weight;Acceleration;Model;Origin

STRING;DOUBLE;INT;DOUBLE;DOUBLE;DOUBLE;DOUBLE;INT;CAT

Chevrolet Chevelle Malibu;18.0;8;307.0;130.0;3504.;12.0;70;US

Buick Skylark 320;15.0;8;350.0;165.0;3693.;11.5;70;US

Plymouth Satellite;18.0;8;318.0;150.0;3436.;11.0;70;US

AMC Rebel SST;16.0;8;304.0;150.0;3433.;12.0;70;US

Ford Torino;17.0;8;302.0;140.0;3449.;10.5;70;US

Ford Galaxie 500;15.0;8;429.0;198.0;4341.;10.0;70;US

Chevrolet Impala;14.0;8;454.0;220.0;4354.;9.0;70;US

Plymouth Fury iii;14.0;8;440.0;215.0;4312.;8.5;70;US

Pontiac Catalina;14.0;8;455.0;225.0;4425.;10.0;70;US

AMC Ambassador DPL;15.0;8;390.0;190.0;3850.;8.5;70;US

Citroen DS-21 Pallas;0;4;133.0;115.0;3090.;17.5;70;Europe

Chevrolet Chevelle Concours (sw);0;8;350.0;165.0;4142.;11.5;70;US

Ford Torino (sw);0;8;351.0;153.0;4034.;11.0;70;US

Plymouth Satellite (sw);0;8;383.0;175.0;4166.;10.5;70;US

AMC Rebel SST (sw);0;8;360.0;175.0;3850.;11.0;70;US

Dodge Challenger SE;15.0;8;383.0;170.

As the file layout above shows, the values are delimited by ; and the first two lines consist of the column names and the column types.

The reason for the spacing between each line is that each line ends with a newline, and print also emits a newline by default. So we'll have to strip those out.
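
Here is a small sketch of two ways to avoid the doubled newlines. It uses an in-memory `io.StringIO` object as a stand-in for the real file, and the two sample lines are shortened versions of the ones in the listing above:

```python
import io

# Stand-in for the opened file; StringIO yields lines just like a file object
sample = io.StringIO(
    "Car;MPG;Cylinders\n"
    "Chevrolet Chevelle Malibu;18.0;8\n"
)

# Option 1: strip the trailing newline from each line before printing
stripped = [line.rstrip('\n') for line in sample]
for line in stripped:
    print(line)

# Option 2: keep the newline and tell print not to add another one
sample.seek(0)
for line in sample:
    print(line, end='')
```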

**So here's what we want to do:**

- read the first line to get the column headers and create a named tuple class
- read data types from second line and store this so we can cast the strings we are reading to the correct data type
- read the data rows and parse them into a named tuple

In [2]:
cars_file_path = r"C:\Users\anila\Desktop\AI\EPAI-Phase1\S14_Iterables_Iterators\cars.csv"

In [106]:
# Conventional way 

with open(cars_file_path, 'r') as file:
    row_index = 0
    for line in file:
        if row_index == 0:
            # Header
            # Car;MPG;Cylinders;Displacement;Horsepower;Weight;Acceleration;Model;Origin
            header = line.strip('\n').split(';')      # This will give a list
            print(header)
        elif row_index == 1:
            # Data-type
            # STRING;DOUBLE;INT;DOUBLE;DOUBLE;DOUBLE;DOUBLE;INT;CAT
            data_type = line.strip('\n').split(';')
            print(data_type)
        else:
            data = line.strip('\n').split(';')
            if row_index < 6:
                print(data)
        row_index += 1    

['Car', 'MPG', 'Cylinders', 'Displacement', 'Horsepower', 'Weight', 'Acceleration', 'Model', 'Origin']
['STRING', 'DOUBLE', 'INT', 'DOUBLE', 'DOUBLE', 'DOUBLE', 'DOUBLE', 'INT', 'CAT']
['Chevrolet Chevelle Malibu', '18.0', '8', '307.0', '130.0', '3504.', '12.0', '70', 'US']
['Buick Skylark 320', '15.0', '8', '350.0', '165.0', '3693.', '11.5', '70', 'US']
['Plymouth Satellite', '18.0', '8', '318.0', '150.0', '3436.', '11.0', '70', 'US']
['AMC Rebel SST', '16.0', '8', '304.0', '150.0', '3433.', '12.0', '70', 'US']


In [3]:
# Using namedtuple

from collections import namedtuple

with open(cars_file_path, 'r') as file:
    row_index = 0
    cars = []
    for line in file:
        if row_index == 0:
            header = line.strip('\n').split(';') 
            Cars = namedtuple('Cars', header)
        elif row_index == 1:
            data_type= line.strip('\n').split(';')
            print(data_type)
        else:
            data = line.strip('\n').split(';')
            car = Cars(*data)
            cars.append(car)
            if row_index < 5:
                print(cars)
        row_index += 1

['STRING', 'DOUBLE', 'INT', 'DOUBLE', 'DOUBLE', 'DOUBLE', 'DOUBLE', 'INT', 'CAT']
[Cars(Car='Chevrolet Chevelle Malibu', MPG='18.0', Cylinders='8', Displacement='307.0', Horsepower='130.0', Weight='3504.', Acceleration='12.0', Model='70', Origin='US')]
[Cars(Car='Chevrolet Chevelle Malibu', MPG='18.0', Cylinders='8', Displacement='307.0', Horsepower='130.0', Weight='3504.', Acceleration='12.0', Model='70', Origin='US'), Cars(Car='Buick Skylark 320', MPG='15.0', Cylinders='8', Displacement='350.0', Horsepower='165.0', Weight='3693.', Acceleration='11.5', Model='70', Origin='US')]
[Cars(Car='Chevrolet Chevelle Malibu', MPG='18.0', Cylinders='8', Displacement='307.0', Horsepower='130.0', Weight='3504.', Acceleration='12.0', Model='70', Origin='US'), Cars(Car='Buick Skylark 320', MPG='15.0', Cylinders='8', Displacement='350.0', Horsepower='165.0', Weight='3693.', Acceleration='11.5', Model='70', Origin='US'), Cars(Car='Plymouth Satellite', MPG='18.0', Cylinders='8', Displacement='318.0', H

We still need to parse the data into strings, integers, floats... Currently every field is a string.

e.g. a DOUBLE column such as MPG currently holds the string '18.0' rather than the float 18.0.

We need to cast each value to a data type based on its data type string:

- STRING --> str
- DOUBLE --> float
- INT --> int
- CAT --> str

In [4]:
data_type = ['STRING', 'DOUBLE', 'INT', 'DOUBLE', 'DOUBLE', 'DOUBLE', 'DOUBLE', 'INT', 'CAT']
data_row  = ['Chevrolet Chevelle Malibu', '18.0', '8', '307.0', '130.0', '3504.', '12.0', '70', 'US']

In [5]:
# We need something like this

def cast(type_, data):
    if type_ == 'DOUBLE':
        return float(data)
    elif type_ == 'INT':
        return int(data)
    else:
        return str(data)

In [6]:
# Let us use zip and list comprehensions to convert a row of data using 'cast'

[cast(type_, data) for type_, data in list(zip(data_type, data_row))]

['Chevrolet Chevelle Malibu', 18.0, 8, 307.0, 130.0, 3504.0, 12.0, 70, 'US']
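
An alternative to the if/elif chain, sketched here, is a lookup table from type name to converter. `CONVERTERS` and `cast_with_table` are hypothetical names; unknown type names fall back to `str`, mirroring the else branch above.

```python
# Map each type string from the file to the callable that converts it
CONVERTERS = {'STRING': str, 'DOUBLE': float, 'INT': int, 'CAT': str}

def cast_with_table(type_, data):
    # Unknown type names fall back to str
    return CONVERTERS.get(type_, str)(data)

data_type = ['STRING', 'DOUBLE', 'INT', 'DOUBLE', 'DOUBLE', 'DOUBLE', 'DOUBLE', 'INT', 'CAT']
data_row  = ['Chevrolet Chevelle Malibu', '18.0', '8', '307.0', '130.0', '3504.', '12.0', '70', 'US']

converted = [cast_with_table(type_, data) for type_, data in zip(data_type, data_row)]
print(converted)
```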

In [7]:
# Let us write this as a function
def cast_row(data_type, data_row):
    return [cast(type_, data) 
            for type_, data in zip(data_type, data_row)]

In [10]:
# Let us integrate whatever enhancements we made to our namedtuple logic

with open(cars_file_path, 'r') as file:
    row_index = 0
    cars = []
    for line in file:
        if row_index == 0:
            header = line.strip('\n').split(';') 
            Cars = namedtuple('Cars', header)
        elif row_index == 1:
            data_type= line.strip('\n').split(';')
            print(data_type)
        else:
            data = line.strip('\n').split(';')
            data = cast_row(data_type, data)
            car = Cars(*data)
            cars.append(car)
            if row_index < 5:
                print(cars)
        row_index += 1

['STRING', 'DOUBLE', 'INT', 'DOUBLE', 'DOUBLE', 'DOUBLE', 'DOUBLE', 'INT', 'CAT']
[Cars(Car='Chevrolet Chevelle Malibu', MPG=18.0, Cylinders=8, Displacement=307.0, Horsepower=130.0, Weight=3504.0, Acceleration=12.0, Model=70, Origin='US')]
[Cars(Car='Chevrolet Chevelle Malibu', MPG=18.0, Cylinders=8, Displacement=307.0, Horsepower=130.0, Weight=3504.0, Acceleration=12.0, Model=70, Origin='US'), Cars(Car='Buick Skylark 320', MPG=15.0, Cylinders=8, Displacement=350.0, Horsepower=165.0, Weight=3693.0, Acceleration=11.5, Model=70, Origin='US')]
[Cars(Car='Chevrolet Chevelle Malibu', MPG=18.0, Cylinders=8, Displacement=307.0, Horsepower=130.0, Weight=3504.0, Acceleration=12.0, Model=70, Origin='US'), Cars(Car='Buick Skylark 320', MPG=15.0, Cylinders=8, Displacement=350.0, Horsepower=165.0, Weight=3693.0, Acceleration=11.5, Model=70, Origin='US'), Cars(Car='Plymouth Satellite', MPG=18.0, Cylinders=8, Displacement=318.0, Horsepower=150.0, Weight=3436.0, Acceleration=11.0, Model=70, Origin='US

In [12]:
# Now let us further clean-up using iterators here

with open(cars_file_path, 'r') as file:
    cars = []
    file_iter = iter(file)
    header = next(file_iter).strip('\n').split(';') 
    Cars = namedtuple('Cars', header)
    data_type= next(file_iter).strip('\n').split(';')
    for line in file_iter:        # continue with the same iterator; the first two lines are already consumed
        data = cast_row(data_type, line.strip('\n').split(';'))
        car = Cars(*data)
        cars.append(car)

In [13]:
cars[7]

Cars(Car='Plymouth Fury iii', MPG=14.0, Cylinders=8, Displacement=440.0, Horsepower=215.0, Weight=4312.0, Acceleration=8.5, Model=70, Origin='US')

In [14]:
# Let us use list comprehensions to make it more concise

with open(cars_file_path, 'r') as file:
    file_iter = iter(file)
    header = next(file_iter).strip('\n').split(';') 
    Cars = namedtuple('Cars', header)
    data_type= next(file_iter).strip('\n').split(';')
    car_data = [cast_row(data_type, 
                     line.strip('\n').split(';'))
                for line in file_iter]
    cars = [Cars(*item) for item in car_data]

In [15]:
cars[10]

Cars(Car='Citroen DS-21 Pallas', MPG=0.0, Cylinders=4, Displacement=133.0, Horsepower=115.0, Weight=3090.0, Acceleration=17.5, Model=70, Origin='Europe')
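
For readers without access to cars.csv, the same pipeline can be tried on an in-memory stand-in. In this sketch the four lines in `SAMPLE` are copied from the file listing shown earlier, `io.StringIO` plays the role of the open file, and `parse_cars` is a hypothetical helper that is not part of the notebook.

```python
import io
from collections import namedtuple

SAMPLE = (
    "Car;MPG;Cylinders;Displacement;Horsepower;Weight;Acceleration;Model;Origin\n"
    "STRING;DOUBLE;INT;DOUBLE;DOUBLE;DOUBLE;DOUBLE;INT;CAT\n"
    "Chevrolet Chevelle Malibu;18.0;8;307.0;130.0;3504.;12.0;70;US\n"
    "Citroen DS-21 Pallas;0;4;133.0;115.0;3090.;17.5;70;Europe\n"
)

def cast(type_, data):
    # Same casting rules as in the notebook
    if type_ == 'DOUBLE':
        return float(data)
    elif type_ == 'INT':
        return int(data)
    return str(data)

def parse_cars(file):
    file_iter = iter(file)
    # Consume the two header lines manually with next()
    header = next(file_iter).strip('\n').split(';')
    Car = namedtuple('Car', header)
    data_types = next(file_iter).strip('\n').split(';')
    # The remaining lines are data rows
    return [Car(*(cast(type_, value)
                  for type_, value in zip(data_types, line.strip('\n').split(';'))))
            for line in file_iter]

cars = parse_cars(io.StringIO(SAMPLE))
print(cars[1].Car, cars[1].MPG, cars[1].Origin)
```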