In [3]:
#Importing necessary libraries for this notebook
import pandas as pd

# **Iterators vs. Iterables**

Recall from the class that an iterable is an object that can return an iterator, while an iterator is an object that keeps state and produces the next value when you call `next()` on it. In this exercise, you will identify which object is an iterable and which is an iterator.

The variables `flash1` and `flash2` are assigned at the top of the code cell below. Try printing them with `print()` and calling `next()` on each to figure out which is an iterable and which is an iterator. Then print either A, B, or C in the code cell below.

* A. Both `flash1` and `flash2` are iterators.
* B. Both `flash1` and `flash2` are iterables.
* C. `flash1` is an iterable and `flash2` is an iterator.

In [2]:
#Assigning values to flash1 and flash2
flash1 = ['jay garrick', 'barry allen', 'wally west', 'bart allen']
flash2 = iter(flash1)

#Print and discover which one is an iterable and which one is an iterator
print('iterable')
print('iterator')

#Finally print either A, B or C as an answer to the question above
print('C')

iterable
iterator
C
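
The cell above prints the labels directly. As a minimal sketch of how the same answer could be checked in code, the abstract base classes `Iterable` and `Iterator` from `collections.abc` can be used with `isinstance()`. The variable names ending in `_is_iterable` and `_is_iterator` are illustrative and not part of the exercise.

```python
from collections.abc import Iterable, Iterator

flash1 = ['jay garrick', 'barry allen', 'wally west', 'bart allen']
flash2 = iter(flash1)

# A list can hand out iterators, but it is not an iterator itself.
flash1_is_iterable = isinstance(flash1, Iterable)
flash1_is_iterator = isinstance(flash1, Iterator)

# iter() on a list returns a list iterator, which supports next().
flash2_is_iterator = isinstance(flash2, Iterator)

print(flash1_is_iterable, flash1_is_iterator, flash2_is_iterator)

# Calling next() directly on the list fails, which confirms answer C.
try:
    next(flash1)
except TypeError as err:
    print('TypeError:', err)
```

Every iterator is also an iterable (calling `iter()` on it returns the iterator itself), so the distinguishing test is whether `next()` works on the object.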


# **Iterating over iterables (1)**

Now, you're familiar with what iterables and iterators are! In this exercise, you will reinforce your knowledge about these by iterating over and printing from iterables and iterators.

You are provided with a list of strings `flash`. You will practice iterating over the list by using a for loop. You will also create an iterator for the list and access the values from the iterator.

**Instructions:**
* Create a for loop to loop over `flash` and print the values in the list. Use `person` as the loop variable.
* Create an iterator for the list `flash` and assign the result to `superhero`.
* Print each of the items from `superhero` using `next()` 4 times.


In [5]:
# Create a list of strings: flash
flash = ['jay garrick', 'barry allen', 'wally west', 'bart allen']

# Print each list item in flash using a for loop
for person in flash:
    print(person)

# Create an iterator for the list and assign it to superhero
superhero = iter(flash)

# Use next() to access the items in the iterator, in list order
print(next(superhero))  # jay garrick
print(next(superhero))  # barry allen
print(next(superhero))  # wally west
print(next(superhero))  # bart allen



jay garrick
barry allen
wally west
bart allen
jay garrick
barry allen
wally west
bart allen
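
The exercise calls `next()` exactly four times, once per item. The sketch below shows what happens on a fifth call: the iterator is exhausted and raises `StopIteration`, unless a default value is passed as the second argument to `next()`. The names `collected`, `exhausted`, and `fallback` are illustrative.

```python
flash = ['jay garrick', 'barry allen', 'wally west', 'bart allen']
superhero = iter(flash)

# Consume all four items, one next() call each.
collected = [next(superhero) for _ in range(4)]

# A fifth call has nothing left to return and raises StopIteration.
try:
    next(superhero)
    exhausted = False
except StopIteration:
    exhausted = True

# Passing a default to next() returns it instead of raising.
fallback = next(superhero, 'no more heroes')

print(collected)
print(exhausted, fallback)
```

A `for` loop handles `StopIteration` internally, which is why looping over `flash` ends cleanly without any error handling.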


# **Iterating over iterables (2)**

One of the things you learned about in this chapter is that not all iterables are actual lists. A couple of examples that we looked at are strings and the use of the `range()` function. In this exercise, we will focus on the `range()` function.

You can use `range()` in a for loop as if it's a list to be iterated over:
```
for i in range(5):
    print(i)
```

Recall that `range()` doesn't actually create the list; instead, it creates a range object whose iterator produces the values one at a time until it reaches the limit (in the example, up to the value 4). If `range()` built the actual list, calling it with $10^{100}$ would fail, because a list that long cannot fit in any computer's memory. That value is called a googol: a 1 followed by a hundred 0s. That's a huge number!

Your task for this exercise is to show that calling `range()` with $10^{100}$  won't actually pre-create the list.

**Instructions:**
* Create an iterator object `small_value` over `range(3)` using the function `iter()`.
* Using a for loop, iterate over `range(3)`, printing the value for every iteration. Use `num` as the loop variable.
* Create an iterator object `googol` over `range(10 ** 100)`.


In [7]:
# Create an iterator for range(3): small_value
small_value = iter(range(3))

# Print the values in small_value
print(next(small_value))
print(next(small_value))
print(next(small_value))

# 2. Use a for loop to iterate over range(3) and print each value
for num in range(3):
    print(num)

# 3. Create an iterator object googol over range(10 ** 100)
googol = iter(range(10 ** 100))


# Print the first 5 values from googol
print(next(googol))
print(next(googol))
print(next(googol))
print(next(googol))
print(next(googol))


0
1
2
0
1
2
0
1
2
3
4
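
Pulling five values from `googol` shows that the iterator works, but it does not show directly that no list was built. One way to make that visible, sketched here for CPython, is to compare the memory footprint of the range objects with `sys.getsizeof()` and to test membership without iterating. The variable names are illustrative.

```python
import sys

small = range(3)
googol_range = range(10 ** 100)

# A range object stores only start, stop, and step, so its size
# does not grow with the number of values it represents.
small_size = sys.getsizeof(small)
googol_size = sys.getsizeof(googol_range)
print(small_size, googol_size)

# Membership for integers is computed arithmetically, without looping.
last_in_range = (10 ** 100 - 1) in googol_range
limit_in_range = (10 ** 100) in googol_range
print(last_in_range, limit_in_range)

# len() cannot report a length this large, even though the range exists.
try:
    len(googol_range)
    len_ok = True
except OverflowError:
    len_ok = False
print(len_ok)
```

By contrast, `list(range(10 ** 100))` would try to materialize every value and should not be run.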


# **Iterators as function arguments**

You've been using the `iter()` function to get an iterator object, as well as the `next()` function to retrieve the values one by one from the iterator object.

There are also functions that take iterators and iterables as arguments. For example, the `list()` and `sum()` functions return a list and the sum of elements, respectively.

In this exercise, you will use these functions by passing an iterable from `range()` and then printing the results of the function calls.

**Instructions:**
* Create a range object that would produce the values from `10` to `20` (inclusive) using `range()`. Assign the result to `values`.
* Use the `list()` function to create a list of values from the range object `values`. Assign the result to `values_list`.
* Use the `sum()` function to get the sum of the values from `10` to `20` from the range object `values`. Assign the result to `values_sum`.


In [8]:
# 1. Create a range object that produces values from 10 to 20 (inclusive)
values = range(10, 21)

# 2. Use the list() function to create a list of values from the range object
values_list = list(values)
print("List of values:", values_list)

# 3. Use the sum() function to get the sum of values from 10 to 20
values_sum = sum(values)
print("Sum of values:", values_sum)




List of values: [10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
Sum of values: 165
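
In the cell above, `values` is passed to both `list()` and `sum()` and works both times, because a range object is a reusable iterable. The sketch below contrasts that with an iterator created by `iter()`, which is consumed by the first function that reads it. The names `values_iter`, `list_from_iter`, and `sum_from_iter` are illustrative.

```python
values = range(10, 21)

# A range object can be iterated over as many times as needed.
list_from_range = list(values)
sum_from_range = sum(values)

# An iterator is consumed as it is read.
values_iter = iter(range(10, 21))
list_from_iter = list(values_iter)   # reads every value
sum_from_iter = sum(values_iter)     # nothing left, so the sum is 0

print(sum_from_range, sum_from_iter)
```

This is why the distinction from the first exercise matters in practice: passing the same iterator to two functions silently gives the second one no data.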


# **Using enumerate**

You've just gained several new ideas on iterators from the last class and one of them is the `enumerate()` function. Recall that `enumerate()` returns an enumerate object that produces a sequence of tuples, and each of the tuples is an index-value pair.

In this exercise, you are given a list of strings `mutants` and you will practice using `enumerate()` on it by printing out a list of tuples and unpacking the tuples using a `for` loop.

**Instructions:**
* Create a list of tuples from `mutants` and assign the result to `mutant_list`. Generate the tuples using `enumerate()` and turn the result into a list using the `list()` function.
* Complete the first `for` loop by unpacking the tuples generated by calling `enumerate()` on `mutants`. Use `index` for the index and `value` for the value when unpacking the tuple.
* Complete the second for loop similarly as with the first, but this time change the starting index to start from `100` by passing it in as an argument to the `start` parameter of `enumerate()`. Use `index_100` for the index and `value_100` for the value when unpacking the tuple.


In [9]:
# Create a list of strings: mutants
mutants = ['charles xavier',
            'bobby drake',
            'kurt wagner',
            'max eisenhardt',
            'kitty pryde']


# 1. Create a list of tuples from mutants using enumerate() and assign it to mutant_list
mutant_list = list(enumerate(mutants))
print(mutant_list)

# 2. Unpack the tuples in a for loop (default index starting from 0)

for index, value in enumerate(mutants):
    print(f"Index: {index}, Value: {value}")

# 3. Unpack the tuples with a starting index of 100

for index_100, value_100 in enumerate(mutants, start=100):
    print(f"Index: {index_100}, Value: {value_100}")



[(0, 'charles xavier'), (1, 'bobby drake'), (2, 'kurt wagner'), (3, 'max eisenhardt'), (4, 'kitty pryde')]
Index: 0, Value: charles xavier
Index: 1, Value: bobby drake
Index: 2, Value: kurt wagner
Index: 3, Value: max eisenhardt
Index: 4, Value: kitty pryde
Index: 100, Value: charles xavier
Index: 101, Value: bobby drake
Index: 102, Value: kurt wagner
Index: 103, Value: max eisenhardt
Index: 104, Value: kitty pryde
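
To make clear what `enumerate()` saves you from writing, the sketch below builds the same index-value pairs with a manually maintained counter and compares the result. It also shows that `start` changes only the index, not which items are visited. The names `manual_pairs` and `numbered` are illustrative.

```python
mutants = ['charles xavier', 'bobby drake', 'kurt wagner',
           'max eisenhardt', 'kitty pryde']

# Manual version: keep a counter alongside the loop variable.
manual_pairs = []
index = 0
for value in mutants:
    manual_pairs.append((index, value))
    index += 1

# enumerate() produces the same pairs without the bookkeeping.
same_pairs = manual_pairs == list(enumerate(mutants))
print(same_pairs)

# start=1 shifts the indices; every item is still included.
numbered = {index: value for index, value in enumerate(mutants, start=1)}
print(numbered[1], '|', numbered[5])
```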


# **Using zip**

Another interesting function that you've learned is `zip()`, which takes any number of iterables and returns a zip object, which is an iterator of tuples. To print the values of a zip object, convert it into a list and print that. Printing the zip object itself shows only its type and memory address, not the values. In this exercise, you will explore this for yourself.

You will work with three lists of strings: `mutants`, `aliases`, and `powers`. First, you will use `list()` and `zip()` on these lists to generate a list of tuples. Then, you will create a zip object using `zip()`. Finally, you will unpack this zip object in a `for` loop to print the values in each tuple. Observe the different output generated by printing the list of tuples, then the zip object, and finally, the tuple values in the `for` loop.

Recall that `mutants` is assigned a value in the previous code cell, while `aliases` and `powers` are initialized in the first few lines of the code cell below.

**Instructions:**
* Using `zip()` with `list()`, create a list of tuples from the three lists `mutants`, `aliases`, and `powers` (in that order) and assign the result to `mutant_data`.
* Using `zip()`, create a zip object called `mutant_zip` from the three lists `mutants`, `aliases`, and `powers`.
* Complete the `for` loop by unpacking the zip object you created and printing the tuple values. Use `value1`, `value2`, `value3` for the values from each of `mutants`, `aliases`, and `powers`, in that order.


In [13]:
#Initializing the aliases and powers string lists
aliases = ['prof x', 'iceman', 'nightcrawler', 'magneto', 'shadowcat']
powers = ['telepathy', 'thermokinesis', 'teleportation', 'magnetokinesis', 'intangibility']


# 1. Create a list of tuples from mutants, aliases, and powers using zip() and list()
mutant_data = list(zip(mutants, aliases, powers))

print(mutant_data)

# 2. Create a zip object using zip()
mutant_zip = zip(mutants, aliases, powers)
print("\nZip object:", mutant_zip)

# 3. Unpack the zip object in a for loop and print the values
print("\nUnpacking the zip object:")
for value1, value2, value3 in mutant_zip:
    print(f"Mutant: {value1}, Alias: {value2}, Power: {value3}")



[('charles xavier', 'prof x', 'telepathy'), ('bobby drake', 'iceman', 'thermokinesis'), ('kurt wagner', 'nightcrawler', 'teleportation'), ('max eisenhardt', 'magneto', 'magnetokinesis'), ('kitty pryde', 'shadowcat', 'intangibility')]

Zip object: <zip object at 0x7fd2a48095c0>

Unpacking the zip object:
Mutant: charles xavier, Alias: prof x, Power: telepathy
Mutant: bobby drake, Alias: iceman, Power: thermokinesis
Mutant: kurt wagner, Alias: nightcrawler, Power: teleportation
Mutant: max eisenhardt, Alias: magneto, Power: magnetokinesis
Mutant: kitty pryde, Alias: shadowcat, Power: intangibility
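
Two behaviours of `zip()` are worth seeing alongside this exercise: it stops at the shortest input, and the zip object it returns can be read only once. The sketch below uses deliberately shortened lists (an assumption made for illustration, not the exercise data) and `itertools.zip_longest` from the standard library as the padding alternative.

```python
from itertools import zip_longest

# Illustrative lists of unequal length.
mutants = ['charles xavier', 'bobby drake', 'kurt wagner']
aliases = ['prof x', 'iceman']

# zip() stops as soon as the shortest input runs out.
truncated = list(zip(mutants, aliases))
print(truncated)

# zip_longest() keeps going and fills the gaps with fillvalue.
padded = list(zip_longest(mutants, aliases, fillvalue='unknown'))
print(padded)

# A zip object is an iterator, so a second pass finds nothing.
pairs = zip(mutants, aliases)
first_pass = list(pairs)
second_pass = list(pairs)
print(len(first_pass), len(second_pass))
```

If the zipped values are needed more than once, store `list(zip(...))` or call `zip()` again.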


# **Using `*` and `zip` to 'unzip'**

You know how to use `zip()` as well as how to print out values from a zip object. Great!

Let's play around with `zip()` a little more. There is no unzip function for doing the reverse of what `zip()` does. We can, however, reverse what has been zipped together by using `zip()` with a little help from `*`! `*` unpacks an iterable such as a list or a tuple into positional arguments in a function call.

In this exercise, you will use `*` in a call to `zip()` to unpack the tuples produced by `zip()`.

Two lists of strings, `mutants` and `powers`, have been defined in the previous code cells.

**Instructions:**
* Create a zip object by using `zip()` on `mutants` and `powers`, in that order. Assign the result to `z1`.
* Print the tuples in `z1` by unpacking them into positional arguments using the `*` operator in a `print()` call.
* Because the previous `print()` call would have exhausted the elements in `z1`, recreate the zip object you defined earlier and assign the result again to `z1`.
* 'Unzip' the tuples in `z1` by unpacking them into positional arguments using the `*` operator in a `zip()` call. Assign the results to `result1` and `result2`, in that order.
* The last `print()` statements print the result of comparing `result1` to `mutants` and `result2` to `powers`. Because `zip()` yields tuples, the lists are converted with `tuple()` before comparing. Run your code to see if the unpacked `result1` and `result2` are equivalent to `mutants` and `powers`, respectively.


In [14]:


# Step 1: Create a zip object using zip() on mutants and powers
z1 = zip(mutants, powers)

# Step 2: Print the tuples in z1 by unpacking them using the * operator
print(*z1)

# Step 3: Recreate the zip object as z1 (since the previous print exhausted it)
z1 = zip(mutants, powers)

# Step 4: 'Unzip' the tuples in z1 into result1 and result2 using the * operator
result1, result2 = zip(*z1)

# Step 5: Print whether result1 is equal to mutants and result2 is equal to powers
print(result1 == tuple(mutants))  # True if result1 matches mutants
print(result2 == tuple(powers))



('charles xavier', 'telepathy') ('bobby drake', 'thermokinesis') ('kurt wagner', 'teleportation') ('max eisenhardt', 'magnetokinesis') ('kitty pryde', 'intangibility')
True
True
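
To see why `zip(*z1)` undoes the zipping, it helps to write out what `*` expands to. In the sketch below, `pairs` is a short illustrative list of tuples; `zip(*pairs)` is compared with the equivalent call in which each tuple is passed explicitly as its own argument.

```python
pairs = [('charles xavier', 'telepathy'),
         ('bobby drake', 'thermokinesis'),
         ('kurt wagner', 'teleportation')]

# * passes each tuple in pairs as a separate positional argument...
unzipped = list(zip(*pairs))

# ...so the call above is the same as writing the arguments out by hand.
spelled_out = list(zip(pairs[0], pairs[1], pairs[2]))
print(unzipped == spelled_out)

# zip() then groups the first elements together and the second elements together.
names, abilities = unzipped
print(names)
print(abilities)
```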


# **Processing large amounts of Twitter data**

Sometimes, the data we have to process is too large for a computer's memory to handle. This is a common problem faced by data scientists. A solution is to process the data source chunk by chunk instead of loading it all at once.

In this exercise, you will do just that. You will process a large CSV file of Twitter data, `'tweets.csv'`, working on it in chunks of `10` entries at a time.

The pandas package has been imported as `pd`, and the location of `'tweets.csv'` is assigned to the variable `file_path` for your use.

**Instructions:**
* Initialize an empty dictionary `counts_dict` for storing the results of processing the Twitter data.
* Iterate over the `'tweets.csv'` file by using a `for` loop. Use the loop variable `chunk` and iterate over the call to `pd.read_csv()` with a `chunksize` of `10`.
* In the inner loop, iterate over the column `'lang'` in `chunk` by using a `for` loop. Use the loop variable `entry`.


In [15]:
#Initialize file_path to the 'tweets.csv' link
file_path = 'https://github.com/DataAnalyst21/DatasetsForDataAnalytics/blob/main/tweets.csv?raw=True'

# Initialize an empty dictionary: counts_dict
counts_dict = {}

# Step 2: Iterate over the file in chunks of 10
for chunk in pd.read_csv(file_path, chunksize=10):
    # Step 3: Iterate over the 'lang' column in the current chunk
    for entry in chunk['lang']:
        # Update the counts_dict for each language
        if entry in counts_dict:
            counts_dict[entry] += 1
        else:
            counts_dict[entry] = 1

# Print the resulting counts_dict
print(counts_dict)



{'en': 97, 'et': 1, 'und': 2}
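
To see what `chunksize` does without downloading anything, the sketch below reads a small in-memory CSV through `io.StringIO`. The five-row `csv_text` is made-up sample data, not the real `'tweets.csv'`. With `chunksize=2`, `pd.read_csv()` returns an iterable that yields DataFrames of at most two rows each.

```python
import io

import pandas as pd

# Made-up sample data standing in for a much larger file.
csv_text = "lang,text\nen,hello\nen,hi\net,tere\nund,hmm\nen,bye\n"

chunk_sizes = []
langs_seen = []

# Each chunk is a regular DataFrame holding up to chunksize rows.
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2):
    chunk_sizes.append(len(chunk))
    langs_seen.extend(chunk['lang'])

print(chunk_sizes)
print(langs_seen)
```

The last chunk is smaller than the others whenever the row count is not a multiple of `chunksize`, so code that processes chunks should not assume a fixed length.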


# **Extracting information for large amounts of Twitter data**

You now know how to deal with situations where you need to process a very large file and that's a very useful skill to have!

It's good to know how to process a file in smaller, more manageable chunks, but it can become very tedious to write and rewrite the same code for the same task each time. In this exercise, you will make your code more reusable by wrapping your work from the last exercise in a function definition.

The pandas package has been imported as `pd`, and the `file_path` variable for `'tweets.csv'` was initialized previously for your use.

**Instructions:**
* Define the function `count_entries()`, which has `3` parameters. The first parameter is `csv_file` for the filename, the second is `c_size` for the chunk size, and the last is `colname` for the column name.
* Iterate over the file in `csv_file` by using a `for` loop. Use the loop variable `chunk` and iterate over the call to `pd.read_csv()`, passing `c_size` to the `chunksize` parameter.
* In the inner loop, iterate over the column given by `colname` in `chunk` by using a `for` loop. Use the loop variable `entry`.
* Call the `count_entries()` function by passing to it the filename `file_path`, the size of chunks `10`, and the name of the column to count, `'lang'`. Assign the result of the call to the variable `result_counts`.


In [18]:
# Import pandas
import pandas as pd

# Define the function count_entries()
def count_entries(csv_file, c_size, colname):

    # Initialize an empty dictionary to store counts
    counts_dict = {}

    # Iterate over the file in chunks
    for chunk in pd.read_csv(csv_file, chunksize=c_size):
        # Iterate over the specified column in the chunk
        for entry in chunk[colname]:
            # Update the count for each entry in the dictionary
            if entry in counts_dict:
                counts_dict[entry] += 1
            else:
                counts_dict[entry] = 1

    return counts_dict

# Assign file path
file_path = 'https://github.com/DataAnalyst21/DatasetsForDataAnalytics/blob/main/tweets.csv?raw=True'

# Call the function with the specified parameters
result_counts = count_entries(file_path, 10, 'lang')

# Print the result
print(result_counts)




{'en': 97, 'et': 1, 'und': 2}
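
The `if`/`else` counting pattern inside `count_entries()` can also be written with `collections.Counter` from the standard library. The sketch below is one possible variation, not the exercise's required solution: `count_entries_in_chunks()` is a hypothetical helper, and plain lists stand in for the `chunk[colname]` columns so the example runs without the CSV file.

```python
from collections import Counter


def count_entries_in_chunks(chunks):
    """Count occurrences of each entry across an iterable of chunks."""
    counts = Counter()
    for chunk in chunks:
        # update() adds one to the count of every entry in the chunk.
        counts.update(chunk)
    return dict(counts)


# Plain lists standing in for the 'lang' column of successive chunks.
sample_chunks = [['en', 'en', 'et'], ['und', 'en'], ['en']]
result = count_entries_in_chunks(sample_chunks)
print(result)
```

Inside `count_entries()`, the same idea would amount to calling `counts.update(chunk[colname])` once per chunk in place of the inner `for` loop.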
