# **`1-Using iterators in PythonLand`**

# **`1.1- Introduction to iterators`**

#### Iterating with a for loop
- We can iterate over a list using a for loop

In [2]:
employees = ['Qasim','Hassan','Muneeb']
for employee in employees:
    print(employee)

Qasim
Hassan
Muneeb


- We can iterate over a string using a for loop

In [3]:
for letter in 'DataCamp':
    print(letter)

D
a
t
a
C
a
m
p


- We can iterate over a range object using a for loop

In [4]:
for i in range(4):
    print(i)

0
1
2
3


### Iterators vs. iterables
##### Iterable
- Examples: lists, strings, dictionaries, file connections
- An object with an associated `__iter__()` method, which the built-in `iter()` function calls
- Applying `iter()` to an iterable creates an iterator
    
##### Iterator
- Produces next value with `next()`

### Iterating over iterables: `next()`

In [5]:
word = 'Da'
it = iter(word)
next(it)

'D'

In [6]:
next(it)

'a'

In [8]:
next(it) # Raises StopIteration because the string is exhausted

StopIteration: 
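
A `for` loop performs these same steps behind the scenes: it calls `iter()` once, then calls `next()` repeatedly until `StopIteration` is raised. The sketch below spells that out by hand; the helper name `manual_loop` is made up for illustration and is not a Python built-in.

```python
def manual_loop(iterable):
    """Collect items the way a for loop would, using iter() and next()."""
    collected = []
    it = iter(iterable)          # get an iterator from the iterable
    while True:
        try:
            item = next(it)      # ask the iterator for its next value
        except StopIteration:    # raised once the iterator is exhausted
            break
        collected.append(item)
    return collected

letters = manual_loop('Da')
print(letters)
```

Catching `StopIteration` is what lets a `for` loop end quietly instead of showing the error above.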

### Iterating at once with *

In [9]:
word = 'Data'
it = iter(word)
print(*it)

D a t a


In [11]:
print(*it) # Prints only an empty line because the iterator is already exhausted




### Iterating over dictionaries

In [15]:
pythonistas = {'Qasim': 'Hassan','Ammad': 'Mohsin'}
for key, value in pythonistas.items():
    print(key+" S/O "+ value)

Qasim S/O Hassan
Ammad S/O Mohsin


### Iterating over file connections

In [20]:
file = open('./datasets/file.txt')
it = iter(file)
print(next(it))

Hi, You're learning Python Data Science Toolbox.



In [21]:
print(next(it))

This is the Second line & you're going really great.
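
Each line read from a file keeps its trailing newline, which is why `print(next(it))` leaves a blank line after the text. The sketch below shows this and uses a `with` block so the file is closed automatically. The file name and its contents are invented so that the example is self-contained.

```python
import os
import tempfile

# Create a small throwaway file (name and contents are made up for this sketch)
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'example.txt')
    with open(path, 'w') as f:
        f.write('first line\nsecond line\n')

    # The with statement closes the file when the block ends
    with open(path) as file:
        it = iter(file)
        first = next(it)                                  # keeps the trailing '\n'
        remaining = [line.rstrip('\n') for line in it]    # the rest, newlines stripped

print(repr(first))
print(remaining)
```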


# **`1.2- Playing with iterators`**

### Using enumerate()

In [22]:
avengers = ['hawkeye','iron man','thor','quicksilver']
e = enumerate(avengers)
print(type(e))

<class 'enumerate'>


In [23]:
e_list = list(e)
print(e_list)

[(0, 'hawkeye'), (1, 'iron man'), (2, 'thor'), (3, 'quicksilver')]


### enumerate() and unpack

In [29]:
avengers = ['hawkeye','iron man','thor','quicksilver']
for index, name in enumerate(avengers):
    print("Index no:"+str(index)+ " have "+name.upper())

Index no:0 have HAWKEYE
Index no:1 have IRON MAN
Index no:2 have THOR
Index no:3 have QUICKSILVER


In [30]:
avengers = ['hawkeye','iron man','thor','quicksilver']
for index, name in enumerate(avengers, start=10):
    print("Index no:"+str(index)+ " have "+name.upper())

Index no:10 have HAWKEYE
Index no:11 have IRON MAN
Index no:12 have THOR
Index no:13 have QUICKSILVER


### Using zip()

In [31]:
avengers = ['hawkeye','iron man','thor','quicksilver']
names = ['barton','stark','odinson','maximoff']
z = zip(avengers, names)
print(type(z))

<class 'zip'>


In [32]:
print(list(z))

[('hawkeye', 'barton'), ('iron man', 'stark'), ('thor', 'odinson'), ('quicksilver', 'maximoff')]


### `zip()` and unpack

In [42]:
avengers = ['hawkeye','iron man','thor','quicksilver']
names = ['barton','stark','odinson','maximoff']

for z1, z2 in zip(avengers, names):
    print(f"{z1.upper()} <-- real name --> {z2.upper()} ")

HAWKEYE <-- real name --> BARTON 
IRON MAN <-- real name --> STARK 
THOR <-- real name --> ODINSON 
QUICKSILVER <-- real name --> MAXIMOFF 


### Print zip with *

In [43]:
avengers = ['hawkeye','iron man','thor','quicksilver']
names = ['barton','stark','odinson','maximoff']
z = zip(avengers, names)
print(*z)

('hawkeye', 'barton') ('iron man', 'stark') ('thor', 'odinson') ('quicksilver', 'maximoff')
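
Two properties of `zip()` are easy to miss: a zip object can be consumed only once, and it stops at the shortest input. The sketch below reuses the lists from above to show both, and also shows that `zip(*pairs)` reverses the pairing.

```python
avengers = ['hawkeye', 'iron man', 'thor', 'quicksilver']
names = ['barton', 'stark', 'odinson', 'maximoff']

z = zip(avengers, names)
first_pass = list(z)     # consumes the zip iterator
second_pass = list(z)    # nothing is left, so this is an empty list

# Unpacking the pairs into zip() again separates them back into two tuples
heroes, surnames = zip(*first_pass)

# zip() stops as soon as the shortest input runs out
short = list(zip(avengers, ['barton', 'stark']))

print(second_pass)
print(heroes)
print(short)
```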


# **`1.3- Using iterators to load large files into memory`**

### Loading data in chunks
- There can be too much data to hold in memory
- Solution: load data in chunks!
- Pandas function: `read_csv()`
    - Specify the chunk size: `chunksize`
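
The idea of chunking does not depend on pandas. As a rough sketch using only the standard library, `itertools.islice` can pull a fixed number of rows at a time from any iterator. The CSV text below is invented, and this is not how pandas implements chunked reading.

```python
import csv
import io
from itertools import islice

# In-memory stand-in for a CSV file (made-up data)
csv_text = "lang\nen\nen\net\nund\nen\n"
reader = csv.DictReader(io.StringIO(csv_text))

chunk_sizes = []
it = iter(reader)
while True:
    chunk = list(islice(it, 2))   # take at most 2 rows from the iterator
    if not chunk:                 # an empty chunk means the data is exhausted
        break
    chunk_sizes.append(len(chunk))

print(chunk_sizes)
```

Only one chunk is held in memory at a time, which is the point of loading data in pieces.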


## Iterating over data

In [52]:
# importing pandas
import pandas as pd

# loading dataset
tweets_df = pd.read_csv('./datasets/tweets.csv', chunksize=10)

next(tweets_df)
# tweets_df.head()

Unnamed: 0,contributors,coordinates,created_at,entities,extended_entities,favorite_count,favorited,filter_level,geo,id,...,quoted_status_id,quoted_status_id_str,retweet_count,retweeted,retweeted_status,source,text,timestamp_ms,truncated,user
0,,,Tue Mar 29 23:40:17 +0000 2016,"{'hashtags': [], 'user_mentions': [{'screen_na...","{'media': [{'sizes': {'large': {'w': 1024, 'h'...",0,False,low,,714960401759387648,...,,,0,False,"{'retweeted': False, 'text': "".@krollbondratin...","<a href=""http://twitter.com"" rel=""nofollow"">Tw...",RT @bpolitics: .@krollbondrating's Christopher...,1459294817758,False,"{'utc_offset': 3600, 'profile_image_url_https'..."
1,,,Tue Mar 29 23:40:17 +0000 2016,"{'hashtags': [{'text': 'cruzsexscandal', 'indi...","{'media': [{'sizes': {'large': {'w': 500, 'h':...",0,False,low,,714960401977319424,...,,,0,False,"{'retweeted': False, 'text': '@dmartosko Cruz ...","<a href=""http://twitter.com"" rel=""nofollow"">Tw...",RT @HeidiAlpine: @dmartosko Cruz video found.....,1459294817810,False,"{'utc_offset': None, 'profile_image_url_https'..."
2,,,Tue Mar 29 23:40:17 +0000 2016,"{'hashtags': [], 'user_mentions': [], 'symbols...",,0,False,low,,714960402426236928,...,,,0,False,,"<a href=""http://www.facebook.com/twitter"" rel=...",Njihuni me Zonjën Trump !!! | Ekskluzive https...,1459294817917,False,"{'utc_offset': 7200, 'profile_image_url_https'..."
3,,,Tue Mar 29 23:40:17 +0000 2016,"{'hashtags': [], 'user_mentions': [], 'symbols...",,0,False,low,,714960402367561730,...,7.149239e+17,7.149239e+17,0,False,,"<a href=""http://twitter.com/download/android"" ...",Your an idiot she shouldn't have tried to grab...,1459294817903,False,"{'utc_offset': None, 'profile_image_url_https'..."
4,,,Tue Mar 29 23:40:17 +0000 2016,"{'hashtags': [], 'user_mentions': [{'screen_na...",,0,False,low,,714960402149416960,...,,,0,False,"{'retweeted': False, 'text': 'The anti-America...","<a href=""http://twitter.com/download/iphone"" r...",RT @AlanLohner: The anti-American D.C. elites ...,1459294817851,False,"{'utc_offset': -18000, 'profile_image_url_http..."
5,,,Tue Mar 29 23:40:17 +0000 2016,"{'hashtags': [], 'user_mentions': [{'screen_na...",,0,False,low,,714960401759412224,...,,,0,False,"{'retweeted': False, 'lang': 'en', 'favorite_c...","<a href=""http://twitter.com/download/iphone"" r...",RT @BIackPplTweets: Young Donald trump meets h...,1459294817758,False,"{'utc_offset': None, 'profile_image_url_https'..."
6,,,Tue Mar 29 23:40:18 +0000 2016,"{'hashtags': [], 'user_mentions': [{'screen_na...",,0,False,low,,714960402791145472,...,,,0,False,"{'retweeted': False, 'lang': 'en', 'favorite_c...","<a href=""http://twitter.com"" rel=""nofollow"">Tw...",RT @trumpresearch: @WaitingInBagdad @thehill T...,1459294818004,False,"{'utc_offset': 10800, 'profile_image_url_https..."
7,,,Tue Mar 29 23:40:17 +0000 2016,"{'hashtags': [], 'user_mentions': [{'screen_na...",,0,False,low,,714960402346598400,...,,,0,False,"{'retweeted': False, 'lang': 'en', 'favorite_c...","<a href=""http://twitter.com/download/android"" ...","RT @HouseCracka: 29,000+ PEOPLE WATCHING TRUMP...",1459294817898,False,"{'utc_offset': None, 'profile_image_url_https'..."
8,,,Tue Mar 29 23:40:18 +0000 2016,"{'hashtags': [], 'user_mentions': [{'screen_na...","{'media': [{'sizes': {'large': {'w': 384, 'h':...",0,False,low,,714960402849927168,...,,,0,False,"{'retweeted': False, 'text': 'RT for Brendon U...","<a href=""http://twitter.com/download/iphone"" r...",RT @urfavandtrump: RT for Brendon Urie\nFav fo...,1459294818018,False,"{'utc_offset': None, 'profile_image_url_https'..."
9,,,Tue Mar 29 23:40:18 +0000 2016,"{'hashtags': [{'text': 'Trump', 'indices': [34...","{'media': [{'sizes': {'large': {'w': 573, 'h':...",0,False,low,,714960402853928960,...,,,0,False,"{'retweeted': False, 'text': 'This is how I se...","<a href=""http://twitter.com/download/iphone"" r...",RT @trapgrampa: This is how I see #Trump every...,1459294818019,False,"{'utc_offset': None, 'profile_image_url_https'..."


#### Processing large amounts of Twitter data
![image.png](attachment:image.png)

In [46]:
# Initialize an empty dictionary: counts_dict
counts_dict = {}

# Iterate over the file chunk by chunk
for chunk in pd.read_csv('./datasets/tweets.csv', chunksize=10):

    # Iterate over the column in DataFrame
    for entry in chunk['lang']:
        if entry in counts_dict:
            counts_dict[entry] += 1
        else:
            counts_dict[entry] = 1

# Print the populated dictionary
print(counts_dict)


{'en': 97, 'et': 1, 'und': 2}


#### Extracting information for large amounts of Twitter data
![image.png](attachment:image.png)

In [48]:
# Define count_entries()
def count_entries(csv_file, c_size, colname):
    """Return a dictionary with counts of
    occurrences as value for each key."""
    
    # Initialize an empty dictionary: counts_dict
    counts_dict = {}

    # Iterate over the file chunk by chunk
    for chunk in pd.read_csv(csv_file, chunksize=c_size):

        # Iterate over the column in DataFrame
        for entry in chunk[colname]:
            if entry in counts_dict:
                counts_dict[entry] += 1
            else:
                counts_dict[entry] = 1

    # Return counts_dict
    return counts_dict

# Call count_entries(): result_counts
result_counts = count_entries('./datasets/tweets.csv', 10, 'lang')

# Print result_counts
print(result_counts)


{'en': 97, 'et': 1, 'und': 2}
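
The same counting pattern can be written more compactly with `collections.Counter` from the standard library. This is an alternative sketch, not the approach used above. The nested lists are made-up stand-ins for the `chunk[colname]` column of each chunk.

```python
from collections import Counter

# Made-up chunks standing in for chunk['lang'] from each DataFrame chunk
chunks = [['en', 'en', 'et'], ['und', 'en'], ['en', 'und']]

counts = Counter()
for chunk in chunks:
    counts.update(chunk)   # adds one count per entry in the chunk

print(dict(counts))
```

`Counter.update()` removes the need for the `if`/`else` branch, because missing keys start at zero.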


## Congratulations!

#### What’s next?
- List comprehensions and generators
- List comprehensions:
    - Create lists from other lists, DataFrame columns, etc.
    - Single line of code.
    - More efficient than using a loop.

# **`2- List comprehensions and generators`**

# **`2.1 List comprehensions`** 

#### Populate a list with a for loop

In [1]:
nums = [12, 8, 21, 3, 16]
new_nums = []
for num in nums:
    new_nums.append(num+1)
print(new_nums)

[13, 9, 22, 4, 17]


#### A list comprehension

In [2]:
nums = [12, 8, 21, 3, 16]
new_nums = [num+1 for num in nums]
print(new_nums)

[13, 9, 22, 4, 17]


#### For loop & list comprehension comparison
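- `for loop`
    - More verbose: needs an empty list and an explicit `append()` call
    - Usually slower when the only goal is to build a list
    
- `list comprehension`
    - Concise: builds the list in a single expression
    - Usually faster for building a list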
    
- Four characteristics of a list comprehension
    - Written on one line
    - An expression, not a named function
    - Needs no empty list created before it
    - Needs no `append()` call after it

#### List comprehension with `range()`

In [4]:
result = [ num for num in range(10)]
print(result)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]


#### List comprehensions
- Collapse for loops for building lists into a single line
- Components
    - Iterable
    - Iterator variable (represent members of iterable)
    - Output expression
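
As a small illustration of the three components, the sketch below labels each part of a comprehension and compares it with the equivalent loop. The list `words` is made-up example data.

```python
words = ['data', 'camp', 'python']   # the iterable (made-up example data)

# [ output expression   for   iterator variable   in   iterable ]
lengths = [len(word) for word in words]

# The equivalent for loop, for comparison
lengths_loop = []
for word in words:
    lengths_loop.append(len(word))

print(lengths)
```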


#### Nested loops Example

In [8]:
paired_nums = []
for num1 in range(0,2):
    for num2 in range(6,8):
        paired_nums.append((num1, num2))
print(paired_nums)

[(0, 6), (0, 7), (1, 6), (1, 7)]


#####  QUESTION: How to do this with a `list comprehension`?

In [11]:
paired_nums =[(num1, num2) for num1 in range(0,2) for num2 in range(6,8)]
print(paired_nums)

[(0, 6), (0, 7), (1, 6), (1, 7)]
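
The `for` clauses in a comprehension run in the same order as the nested loops they replace: the leftmost clause is the outer loop. The sketch below swaps the two clauses to show how the order of the results changes.

```python
# Leftmost clause is the outer loop, as in the nested for loops above
pairs = [(num1, num2) for num1 in range(0, 2) for num2 in range(6, 8)]

# Swapping the clauses makes num2 the outer loop, so the order changes
swapped = [(num1, num2) for num2 in range(6, 8) for num1 in range(0, 2)]

print(pairs)
print(swapped)
```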


##### Tradeoff: readability (nested comprehensions get easier to read with practice)

# **`2.2 Advanced comprehensions`**

#### Conditionals in comprehensions

##### Conditionals on the iterable for even numbers

In [13]:
even_nums = [num for num in range(11) if num%2==0]
print(even_nums)

[0, 2, 4, 6, 8, 10]


Python documentation on the `%` operator: The `% (modulo)`
operator yields the remainder from the division of the first argument by the second.

In [14]:
5%2

1

In [15]:
6%2

0

#### Complex conditionals in comprehensions
- Conditionals on the `output expression`

In [18]:
even_nums_power2 = [num**2 if num%2==0 else 0 for num in range(11)]
print(even_nums_power2)

[0, 0, 4, 0, 16, 0, 36, 0, 64, 0, 100]
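
The two kinds of conditional behave differently. A trailing `if` filters items out, so the result can be shorter than the input. An `if`/`else` in the output expression keeps every item and only changes its value. A short side-by-side sketch:

```python
nums = range(11)

# Conditional on the iterable: odd numbers are dropped
filtered = [num**2 for num in nums if num % 2 == 0]

# Conditional on the output expression: odd numbers are replaced by 0
replaced = [num**2 if num % 2 == 0 else 0 for num in nums]

print(len(filtered), filtered)
print(len(replaced), replaced)
```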


#### Dict comprehensions
- Create dictionaries
- Use curly braces `{}` instead of brackets `[]`

In [21]:
pos_neg = {num:-num for num in range(8)}
print(pos_neg)

{0: 0, 1: -1, 2: -2, 3: -3, 4: -4, 5: -5, 6: -6, 7: -7}


In [22]:
print(type(pos_neg))

<class 'dict'>
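
Dict comprehensions combine naturally with `zip()` and with conditionals. The sketch below reuses the avengers lists from section 1.2. The length cutoff of 4 is arbitrary and chosen only for illustration.

```python
avengers = ['hawkeye', 'iron man', 'thor', 'quicksilver']
names = ['barton', 'stark', 'odinson', 'maximoff']

# key: value pairs come from the zipped lists
real_names = {hero: name for hero, name in zip(avengers, names)}

# A trailing if filters entries, just as in a list comprehension
short_heroes = {hero: name for hero, name in real_names.items() if len(hero) <= 4}

print(real_names['iron man'])
print(short_heroes)
```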


# **`2.3 Introduction to generator expressions`**

#### Generator expressions
- Recall list comprehension

In [23]:
[2*num for num in range(8)]

[0, 2, 4, 6, 8, 10, 12, 14]

- Use `()` instead of `[]`

In [24]:
(2*num for num in range(8))

<generator object <genexpr> at 0x000001149EB3D510>

#### List comprehensions vs. generators
- List comprehension - `returns a list`
- Generators - `returns a generator object`
- Both can be iterated over
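
One practical difference is memory use: a list stores all of its elements, while a generator object stores only what it needs to produce the next value. The sketch below compares the two with `sys.getsizeof`, whose exact numbers vary by Python version. It also shows that a generator can be consumed only once.

```python
import sys

nums_list = [num for num in range(100_000)]   # all values stored at once
nums_gen = (num for num in range(100_000))    # values produced on demand

list_size = sys.getsizeof(nums_list)
gen_size = sys.getsizeof(nums_gen)
print(list_size > gen_size)

total_first = sum(nums_gen)    # consumes the generator
total_second = sum(nums_gen)   # generator is exhausted, so the sum is 0
print(total_first, total_second)
```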


#### Printing values from generators

In [26]:
result = (num for num in range(8))
for num in result:
    print(num)

0
1
2
3
4
5
6
7


In [27]:
print(list((2*num for num in range(8))))

[0, 2, 4, 6, 8, 10, 12, 14]


In [29]:
result = (num for num in range(6)) # Now will use next()

In [30]:
print(next(result)) # Lazy evaluation

0


In [31]:
print(next(result)) # Lazy evaluation

1


#### Generator vs. list comprehension

In [None]:
# Don't run this: the list comprehension would try to build the whole list in memory
# [num for num in range(10**100000)] 

In [32]:
# A generator expression handles the same range because it produces values lazily
(num for num in range(10**100000))

<generator object <genexpr> at 0x000001149EB3DF90>
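
Because values are produced only on request, a few items can be taken from even an enormous generator. One way to do this, shown here as a sketch, is `itertools.islice` from the standard library.

```python
from itertools import islice

# Creating the generator is cheap: no values are computed yet
big = (num * 2 for num in range(10**100000))

# Take only the first five values; the rest are never produced
first_five = list(islice(big, 5))
print(first_five)
```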

#### Conditionals in generator expressions

In [33]:
even_nums = (num for num in range(10) if num%2==0)
print(even_nums)

<generator object <genexpr> at 0x000001149EB3D890>


In [34]:
print(list(even_nums))

[0, 2, 4, 6, 8]


#### Generator functions
- Produces generator objects when called
- Define like a regular function - `def`
- Yields a sequence of values instead of returning a single value
- Generates a value with `yield` keyword


##### Build a generator function

In [35]:
def num_sequence(n):
    """Generate values from 0 to n."""
    i=0
    while i<n:
        yield i
        i+=1

In [44]:
result = num_sequence(2)
print(result)

<generator object num_sequence at 0x000001149ECD4580>


In [45]:
for item in result:
    print(item)

0
1


In [41]:
print(list(num_sequence(4)))

[0, 1, 2, 3]
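
A generator function pauses at each `yield` and resumes from that point on the next request, so its local variables keep their values between calls. The sketch below uses a hypothetical `countdown` function to make this visible.

```python
def countdown(n):
    """Yield n, n-1, ..., 1."""
    while n > 0:
        yield n     # pause here and hand back the current value
        n -= 1      # execution resumes here on the next request

gen = countdown(3)
first = next(gen)   # runs the body up to the first yield
rest = list(gen)    # resumes where it left off and collects the remainder

print(first)
print(rest)
```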


# **`2.4 Wrapping up comprehensions and generators.`**

#### Re-cap: list comprehensions
![image.png](attachment:image.png)