# Iterables, Sequences, and Iterators
**Commonality:** All sequences are iterables, but not all iterables are sequences. Every iterable can produce an iterator when passed to the built-in `iter()` function.

**Use Cases:** You'd use an iterable when you want to loop over items, a sequence when you need ordered and index-based access, and an iterator when you want to lazily pull items one-by-one from a collection.

**Flexibility:** Iterators are more flexible in terms of data production (e.g. infinite sequences) due to their lazy evaluation.

**Memory Efficiency:** Iterators are more memory-efficient for large data streams or when generating large sequences on-the-fly.
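As a small illustration of the last two points, the sketch below builds an infinite stream with `itertools.count` and compares the size of a list with that of an equivalent generator. The variable names are illustrative, and the exact byte counts depend on the Python build, so only the comparison is shown.

```python
import sys
from itertools import count, islice

# An infinite iterator: count() never ends, but values are produced only on demand.
evens = (n for n in count() if n % 2 == 0)
first_five = list(islice(evens, 5))
print(first_five)  # [0, 2, 4, 6, 8]

# A list holds all of its items at once; a generator holds only its current state.
big_list = [n * n for n in range(100_000)]
big_gen = (n * n for n in range(100_000))
print(sys.getsizeof(big_list) > sys.getsizeof(big_gen))  # True
```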

## Iterables
An **iterable** is any Python object capable of returning its members one at a time, which allows it to be iterated over in a for-loop. The most common examples are strings, lists, tuples, sets, dictionaries, file objects, and generators.

## Sequences
A **sequence** is an iterable whose items have a fixed left-to-right order and a length, so any element can be accessed by its integer index. The most common examples of sequences are lists, tuples, strings, and ranges.
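One way to check which category an object falls into is to test it against the abstract base classes in `collections.abc`. This is only a sketch; the `samples` dictionary is made up for the illustration.

```python
from collections.abc import Iterable, Sequence

samples = {
    "list": [1, 2, 3],
    "tuple": (1, 2, 3),
    "str": "abc",
    "range": range(3),
    "set": {1, 2, 3},
    "dict": {"a": 1},
}

# Every sample is iterable, but only the indexable, ordered ones are sequences.
for label, obj in samples.items():
    print(f"{label:5} iterable={isinstance(obj, Iterable)} sequence={isinstance(obj, Sequence)}")

is_sequence = {label: isinstance(obj, Sequence) for label, obj in samples.items()}
```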

## Iterators

An **iterator** is an object that represents a stream of data. This is different from a sequence, which is an ordered collection of values that can be indexed. An iterator hands out its values one at a time and, once a value has been consumed, does not go back to it. Technically, in Python, an iterator is an object that implements the iterator protocol, which consists of the methods `__iter__()` and `__next__()`. Lists, tuples, dictionaries, and sets are all iterable objects: they are containers from which you can get an iterator. Each of them has an `__iter__()` method, which the built-in `iter()` function calls to obtain an iterator. They are not iterators themselves, because they do not implement `__next__()`; examples of actual iterators are generator objects, file objects, and the objects returned by `iter()`, `enumerate()`, `zip()`, and `map()`.
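The protocol can be walked through by hand with the built-in `iter()` and `next()` functions. The following is a minimal sketch using a made-up list called `numbers`.

```python
numbers = [10, 20, 30]

# A list is iterable but is not itself an iterator: it has no __next__ method.
print(hasattr(numbers, "__next__"))  # False

it = iter(numbers)      # ask the iterable for a fresh iterator
print(iter(it) is it)   # True: an iterator returns itself from __iter__()

print(next(it))  # 10
print(next(it))  # 20
print(next(it))  # 30

# Once exhausted, next() raises StopIteration; a for-loop handles this for us.
try:
    next(it)
    exhausted = False
except StopIteration:
    exhausted = True
print(exhausted)  # True
```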

### A. Strings
Strings are sequences, so we can access one or several of their items by index, and we can iterate over them in a for-loop. Each iteration returns a single character from the string.

In [1]:
name = "Data Science"

#### A.1 Index Operator
The index operator can be used to get a single character from a string. The index operator is the square brackets `[]` with the index of the character you want to get inside the brackets. The index of the first character is 0, the second character is 1, and so on.

In [2]:
print(f"The name is {name}")
print(f"The item with index 0 is {name[0]}")
# Data Science
# 0         1
# 012345678901
# -12       -1
print(f"The item with index 8 is {name[8]}")
print(f"The number of items in the string is {len(name)}")
print(f"The last item in the string is {name[len(name)-1]}")
print(f"The last item in the string is {name[-1]}")

The name is Data Science
The item with index 0 is D
The item with index 8 is e
The number of items in the string is 12
The last item in the string is e
The last item in the string is e


#### A.2 Slicing Operator
The slicing operator can be used to get a substring from a string. The slicing operator is the square brackets `[]` with the start index, colon `:`, and the end index inside the brackets. The start index is inclusive, and the end index is exclusive.

In [3]:
print(f"The first 3 items in the string are {name[0:3]}")
print(f"The first 3 items in the string are {name[:3]}")
print(f"The items starting at index 4 are {name[4:]}")
print(f"The last 3 items in the string are {name[-3:]}")
print(f"The copy of the string is {name[:]}")

The first 3 items in the string are Dat
The first 3 items in the string are Dat
The items starting at index 4 are  Science
The last 3 items in the string are nce
The copy of the string is Data Science


The slicing operator can also be used to get a substring from a string with a step. The slicing operator is the square brackets `[]` with the start index, colon `:`, the end index, colon `:`, and the step inside the brackets. The start index is inclusive, the end index is exclusive, and the step is the stride between selected indices: a step of 2 takes every second character, and a negative step walks through the string from right to left.

In [4]:
print(f"The items starting at index 5, skipping every other item are {name[5::2]}")
print(f"The items starting from the last item, skipping every other item are {name[::-2]}")

The items starting at index 5, skipping every other item are Sine
The items starting from the last item, skipping every other item are eniSaa
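To make the role of the step concrete, the sketch below rebuilds the two slices above from explicit index lists produced by `range`. The helper variable names are illustrative only.

```python
name = "Data Science"

# name[5::2] visits the same indices as range(5, len(name), 2).
forward_indices = list(range(5, len(name), 2))
forward = "".join(name[i] for i in forward_indices)
print(forward_indices, forward)  # [5, 7, 9, 11] Sine

# With a negative step the walk starts at the last index and moves left.
backward_indices = list(range(len(name) - 1, -1, -2))
backward = "".join(name[i] for i in backward_indices)
print(backward_indices, backward)  # [11, 9, 7, 5, 3, 1] eniSaa
```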


#### A.3 Iteration
A string can be iterated over in a for-loop. Each iteration returns a single character from the string. The iteration variable can be named anything, but it is common to name it `char`, `ch` or `c`. Python has no separate character type, so the iteration variable is itself a string of length 1 and can be used in any string operation. Strings are immutable, so assigning a new value to the iteration variable only rebinds that name; it does not change the string.

In [5]:
for c in name:
    print(c, end=" ")
print()
for c in name[::-1]:
    print(c, end=" ")
print()
for c in name[::2]:
    print(c, end=" ")
print()

D a t a   S c i e n c e 
e c n e i c S   a t a D 
D t   c e c 
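The following sketch shows that reassigning the loop variable leaves the string unchanged, and that a modified string has to be built as a new object. The names `shouted` and `vowel_count` are made up for the illustration.

```python
name = "Data Science"

# Rebinding the loop variable only changes what `c` refers to;
# the original string is untouched.
for c in name:
    c = c.upper()
print(name)  # Data Science

# To get a modified string, build a new one.
shouted = "".join(c.upper() for c in name)
print(shouted)  # DATA SCIENCE

# Each item is a string of length 1, so string methods work on it.
vowel_count = sum(1 for c in name if c.lower() in "aeiou")
print(vowel_count)  # 5
```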


#### A.4 Membership Operator
The membership operator can be used to check whether a character or a substring is in a string. The membership operator is the keyword `in`, written between the text you are looking for and the string you want to search, as in `"Data" in name`. It returns `True` if the text is found and `False` otherwise; `not in` returns the opposite. The check is case-sensitive.

In [6]:
if "Data" in name:
    print("Data is in the string")
else:
    print("Data is not in the string")

if "data" in name:
    print("data is in the string")
else:
    print("data is not in the string")

if "Data" not in name:
    print("Data is not in the string")
else:
    print("Data is in the string")

Data is in the string
data is not in the string
Data is in the string


#### A.5 Length Function
The length function can be used to get the number of characters in a string. It is the built-in function `len()`, called with the string inside the parentheses.

In [7]:
print(f"The number of items in the string is {len(name)}")

The number of items in the string is 12


#### A.6 Concatenation Operator
The concatenation operator can be used to join two strings. It is the plus sign `+` written between the first string and the second string, and it returns a new string with the first string followed by the second string.

In [8]:
name_and_year = name + " 2023"
print(name_and_year)

Data Science 2023


#### A.7 Repetition Operator
The repetition operator can be used to repeat a string a number of times. It is the asterisk `*` written between the string and the number of repetitions, and it returns a new string with the original repeated that many times.

In [9]:
print(name * 3)

Data ScienceData ScienceData Science


#### A.8 String Formatting
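String formatting can be used to insert values into a string. Python offers three alternatives. The `str.format()` method replaces each pair of curly braces `{}` in the string with the corresponding argument. The older formatting operator, the percent sign `%`, is written between the string and the values (a single value, or a tuple in parentheses) and fills in placeholders such as `%s` and `%d`. An f-string, a string literal prefixed with `f`, evaluates the expressions inside its curly braces directly. All three return a new string with the values inserted.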

In [10]:
# formatting alternatives
print("The name is {}".format(name))
print("The name is %s" % name)
print("The name is %s and the number is %d" % (name, 3))
print(f"The name is {name} and the number is {3}")

The name is Data Science
The name is Data Science
The name is Data Science and the number is 3
The name is Data Science and the number is 3
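Beyond plain insertion, the curly braces of an f-string can carry a format specification after a colon. The sketch below shows a few common ones; the variable `ratio` and the result names are made up for the illustration.

```python
name = "Data Science"
ratio = 2 / 3

rounded = f"{ratio:.2f}"   # fixed-point with two decimals
padded = f"{name:>15}"     # right-aligned in a field of width 15
zero_filled = f"{42:05d}"  # integer padded with zeros to width 5
quoted = f"{name!r}"       # repr() of the value, including quotes

print(rounded)      # 0.67
print(padded)       #    Data Science
print(zero_filled)  # 00042
print(quoted)       # 'Data Science'
```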


#### A.9 String Methods
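String methods can be used to inspect or manipulate a string. A method call is written as the string, a period `.`, the method name, and the arguments inside parentheses. Because strings are immutable, methods that transform a string, such as `upper()` and `replace()`, return a new string; `count()`, `find()`, and `index()` return integers, and the `startswith()`, `endswith()`, and `is...()` methods return `True` or `False`.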

In [11]:
# methods
print(name.upper())
print(name.lower())
print(name.title())
print(name.capitalize())
print(name.swapcase())
print(name.replace("Data", "Big"))
print(name.count("a"))
print(name.find("a"))
print(name.find("a", 3))
print(name.find("a", 3, 5))
print(name.find("x"))
print(name.index("a"))
print(name.index("a", 3))
print(name.index("a", 3, 5))
# print(name.index("x")) # ValueError
print(name.startswith("Data"))
print(name.startswith("data"))
print(name.endswith("Science"))
print(name.endswith("science"))
print(name.isalpha())
print(name.isalnum())
print(name.isnumeric())
print(name.isdigit())
print(name.isdecimal())
print(name.isspace())
print(name.islower())
print(name.isupper())
print(name.istitle())
print(name.isidentifier())
print(name.isprintable())
print(name.isascii())

DATA SCIENCE
data science
Data Science
Data science
dATA sCIENCE
Big Science
2
1
3
3
-1
1
3
3
True
False
True
False
False
False
False
False
False
False
False
False
True
False
True
True


The following table compares Python's primary built-in collections (list, tuple, set, and dictionary) in terms of their methods. Of these, only lists and tuples are sequences.

| Operation / Collection | **List** | **Tuple** | **Set** | **Dictionary** |
|:-------------------|:-------:|:--------:|:------:|:--------------:|
| **Add/Append**     | `append()` | N/A | `add()` | N/A (`dict[key] = value`) |
| **Remove**         | `remove()`, `pop()` | N/A | `remove()`, `discard()`, `pop()` | `pop()`, `popitem()`, `del dict[key]` |
| **Find/Index**     | `index()` | `index()` | N/A (can use `in` to check membership) | `get()`, `dict[key]`, `in` (for keys) |
| **Count**          | `count()` | `count()` | N/A (sets have unique elements) | N/A (use `len()` for total key-value pairs) |
| **Sort/Order**     | `sort()`, `sorted()` | `sorted()` | N/A (sets are unordered, but `sorted()` can be used) | N/A (use `sorted()` on keys or values) |
| **Length**         | `len()` | `len()` | `len()` | `len()` |
| **Clear**          | `clear()` | N/A | `clear()` | `clear()` |
| **Iterate**        | `for item in list:` | `for item in tuple:` | `for item in set:` | `for key, value in dict.items():` |
| **Extend/Update**  | `extend()`, `+` | `+` | `update()`, `\|` | `update()`, `\|` (Python 3.9+) |
| **Copy**           | `copy()`, `[:]` | N/A (tuples are immutable) | `copy()` | `copy()` |
| **Other Notables**   | `insert()`, `reverse()` | - | `difference()`, `intersection()`, `union()` | `keys()`, `values()`, `items()` |
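A few of the operations from the table can be sketched as follows. The variable names and values are made up for the illustration.

```python
# list: ordered and mutable
items = [3, 1, 2]
items.append(4)          # [3, 1, 2, 4]
items.remove(1)          # [3, 2, 4]
items.sort()             # [2, 3, 4]

# tuple: ordered and immutable, so only lookups and concatenation
point = (1, 2, 2)
twos = point.count(2)    # 2
longer = point + (3,)    # (1, 2, 2, 3)

# set: unordered, unique elements
tags = {"a", "b"}
tags.add("c")
tags.discard("z")        # no error if the element is missing
merged = tags | {"d"}    # union of both sets

# dict: key-value pairs
ages = {"ann": 30}
ages["bob"] = 25                 # add or overwrite via assignment
missing = ages.get("cid", 0)     # default value instead of a KeyError
removed = ages.pop("ann")        # removes the key and returns its value

print(items, twos, longer, sorted(merged), ages, missing, removed)
```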



#### Iterator Example

In [12]:
# A generator function: calling it returns a generator object, which is an iterator
def iterator():
    for i in range(10):
        yield i

In [13]:
dir(iterator())

['__class__',
 '__del__',
 '__delattr__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getstate__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__lt__',
 '__name__',
 '__ne__',
 '__new__',
 '__next__',
 '__qualname__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'close',
 'gi_code',
 'gi_frame',
 'gi_running',
 'gi_suspended',
 'gi_yieldfrom',
 'send',
 'throw']

In [14]:
for i in iterator():
    print(i, end=" ")

0 1 2 3 4 5 6 7 8 9 
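For comparison, the same stream of numbers can be produced by a class that implements `__iter__()` and `__next__()` explicitly. This is a minimal sketch; `CountUpTo` is a hypothetical class name, not something from the standard library.

```python
class CountUpTo:
    """Iterator yielding 0, 1, ..., limit - 1 (hypothetical example class)."""

    def __init__(self, limit):
        self.limit = limit
        self.current = 0

    def __iter__(self):
        # An iterator returns itself so it can be used directly in a for-loop.
        return self

    def __next__(self):
        if self.current >= self.limit:
            raise StopIteration
        value = self.current
        self.current += 1
        return value


values = list(CountUpTo(10))
print(values)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

A generator function is usually the shorter way to write this; the class form is mainly useful when the iterator needs extra state or methods.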

# NumPy for Numerical Analysis
Numeric Python (numpy.org)

In [15]:
# pip install numpy
# pip3 install numpy
# python -m pip install numpy
# python3 -m pip install numpy
# py -m pip install numpy
# !pip install numpy # if you really have to
import numpy as np

In [16]:
dir(np)

['ALLOW_THREADS',
 'BUFSIZE',
 'CLIP',
 'DataSource',
 'ERR_CALL',
 'ERR_DEFAULT',
 'ERR_IGNORE',
 'ERR_LOG',
 'ERR_PRINT',
 'ERR_RAISE',
 'ERR_WARN',
 'FLOATING_POINT_SUPPORT',
 'FPE_DIVIDEBYZERO',
 'FPE_INVALID',
 'FPE_OVERFLOW',
 'FPE_UNDERFLOW',
 'False_',
 'Inf',
 'Infinity',
 'MAXDIMS',
 'MAY_SHARE_BOUNDS',
 'MAY_SHARE_EXACT',
 'NAN',
 'NINF',
 'NZERO',
 'NaN',
 'PINF',
 'PZERO',
 'RAISE',
 'SHIFT_DIVIDEBYZERO',
 'SHIFT_INVALID',
 'SHIFT_OVERFLOW',
 'SHIFT_UNDERFLOW',
 'ScalarType',
 'True_',
 'UFUNC_BUFSIZE_DEFAULT',
 'UFUNC_PYVALS_NAME',
 'WRAP',
 '_CopyMode',
 '_NoValue',
 '_UFUNC_API',
 '__NUMPY_SETUP__',
 '__all__',
 '__builtins__',
 '__cached__',
 '__config__',
 '__deprecated_attrs__',
 '__dir__',
 '__doc__',
 '__expired_functions__',
 '__file__',
 '__former_attrs__',
 '__future_scalars__',
 '__getattr__',
 '__loader__',
 '__name__',
 '__package__',
 '__path__',
 '__spec__',
 '__version__',
 '_add_newdoc_ufunc',
 '_builtins',
 '_distributor_init',
 '_financial_names',
 '_ge

NumPy provides a function similar to the built-in `range`, called `arange`.

In [17]:
help(np.arange)

Help on built-in function arange in module numpy:

arange(...)
    arange([start,] stop[, step,], dtype=None, *, like=None)

    Return evenly spaced values within a given interval.

    ``arange`` can be called with a varying number of positional arguments:

    * ``arange(stop)``: Values are generated within the half-open interval
      ``[0, stop)`` (in other words, the interval including `start` but
      excluding `stop`).
    * ``arange(start, stop)``: Values are generated within the half-open
      interval ``[start, stop)``.
    * ``arange(start, stop, step)`` Values are generated within the half-open
      interval ``[start, stop)``, with spacing between values given by
      ``step``.

    For integer arguments the function is roughly equivalent to the Python
    built-in :py:class:`range`, but returns an ndarray rather than a ``range``
    instance.

    When using a non-integer step, such as 0.1, it is often better to use
    `numpy.linspace`.


    Parameters
    ---------

In [18]:
an_array = np.arange(10)
print(an_array)
print(type(an_array))

[0 1 2 3 4 5 6 7 8 9]
<class 'numpy.ndarray'>


In [19]:
python_list = [i for i in range(10000)]
numpy_array = np.arange(10000)

In [20]:
sum(python_list)

49995000

In [21]:
np.sum(numpy_array)

49995000

The key feature of NumPy is the performance benefit of its arrays compared to standard Python lists. We can measure the performance difference by using the `timeit` module.

In [22]:
import timeit
dir(timeit)

['Timer',
 '__all__',
 '__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__',
 '_globals',
 'default_number',
 'default_repeat',
 'default_timer',
 'dummy_src_name',
 'gc',
 'itertools',
 'main',
 'reindent',
 'repeat',
 'sys',
 'template',
 'time',
 'timeit']

Compare standard Python with NumPy:

In [23]:
help(timeit.timeit)

Help on function timeit in module timeit:

timeit(stmt='pass', setup='pass', timer=<built-in function perf_counter>, number=1000000, globals=None)
    Convenience function to create Timer object and call timeit method.



In [24]:
# standard Python
standard_statement = \
'''
s = 0
for i in range(10000):
    s += i
'''
# standard Python with list comprehension
standard_plus_statement = \
'''
n = [i for i in range(10000)]
s = sum(n)
'''
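# NumPy array (the import sits inside the timed statement, but the module is already cached, so its cost is small)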
numpy_statement = \
'''
import numpy as np
n = np.arange(10000)
s = np.sum(n)
'''

In [25]:
timeit.timeit(standard_statement, number=1000)

0.2546855840009812

In [26]:
timeit.timeit(standard_plus_statement, number=1000)

0.1598703330000717

In [27]:
timeit.timeit(numpy_statement, number=1000)

0.005080958999315044
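Single `timeit.timeit` calls can be noisy. A somewhat more careful pattern, sketched below with pure-Python statements, moves one-off work into the `setup` argument and takes the minimum over several runs with `timeit.repeat`. The statement strings and variable names are illustrative; the same pattern should work for a NumPy statement with `setup="import numpy as np"`.

```python
import timeit

loop_statement = """
s = 0
for i in data:
    s += i
"""
builtin_statement = "s = sum(data)"

# `setup` runs once per timing run and is not included in the measurement.
setup = "data = list(range(10000))"

# repeat() returns one total per run; the minimum is usually the least noisy.
loop_time = min(timeit.repeat(loop_statement, setup=setup, number=100, repeat=5))
builtin_time = min(timeit.repeat(builtin_statement, setup=setup, number=100, repeat=5))

print(f"explicit loop: {loop_time:.4f} s")
print(f"built-in sum:  {builtin_time:.4f} s")
```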

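Where does this performance improvement come from? Mostly from the way the two structures use memory: a list stores references to separately allocated Python objects, whereas a NumPy array stores the raw values side by side in a single buffer that `np.sum` can process in compiled code.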

In [28]:
n_standard = []
print(f"{hex(id(n_standard))} << Initial Memory Location")
for i in range(5):
    n_standard.append(i)
    print(f"{hex(id(n_standard))}", end=':')
    print(f"{[hex(id(x)) for x in n_standard]}")

0x108d7d780 << Initial Memory Location
0x108d7d780:['0x101450c68']
0x108d7d780:['0x101450c68', '0x101450c88']
0x108d7d780:['0x101450c68', '0x101450c88', '0x101450ca8']
0x108d7d780:['0x101450c68', '0x101450c88', '0x101450ca8', '0x101450cc8']
0x108d7d780:['0x101450c68', '0x101450c88', '0x101450ca8', '0x101450cc8', '0x101450ce8']


In [29]:
n_numpy = np.array([])
print(f"{hex(id(n_numpy))} << Initial Memory Location")
for i in range(5):
    n_numpy = np.append(n_numpy, np.array(i))
    print(f"{hex(id(n_numpy))}", end=':')
    print(f"{[hex(id(x)) for x in n_numpy]}")

0x108da4750 << Initial Memory Location
0x108da4990:['0x108df1730']
0x108da4810:['0x108df1850', '0x108df09f0']
0x108da4210:['0x108df1730', '0x108df1850', '0x108df1730']
0x108da4990:['0x108df09f0', '0x108df1850', '0x108df09f0', '0x108df1850']
0x108da4810:['0x108df1730', '0x108df09f0', '0x108df1730', '0x108df09f0', '0x108df1730']

Unlike the list, the array gets a new `id` on every step, because `np.append` returns a new array instead of modifying the existing one. The element addresses repeat because indexing a NumPy array creates a temporary Python scalar object each time; they are not the addresses of the values inside the array's buffer.


In [30]:
# if you want to go deeper, you can check a memory address by using the following code
#import ctypes
#g = (ctypes.c_char*40000).from_address(0x1103bbdc0)
#print(g.value.decode("utf-8", errors='ignore'))

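## What makes NumPy faster or slower?
NumPy arrays are stored in one contiguous block of memory, so the processor can access and manipulate them very efficiently.

Contiguous storage gives good locality of reference, as this property is called in computer science.

This is the main reason why NumPy is SOMETIMES faster than lists. It is not always faster: an array has a fixed size, so `np.append` has to allocate a new array and copy every existing element on each call, whereas `list.append` usually just adds one reference in place.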
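The difference in memory layout can be inspected directly. The sketch below assumes a 64-bit CPython build and an explicit `int64` dtype; the byte counts for the list will vary between Python versions, so only the comparison is printed.

```python
import sys
import numpy as np

values = list(range(1000))
array = np.arange(1000, dtype=np.int64)

# The array stores raw 8-byte integers back to back in one buffer.
print(array.flags["C_CONTIGUOUS"])   # True
print(array.itemsize, array.nbytes)  # 8 8000

# The list stores references; each int object lives elsewhere and has its own overhead.
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)
print(list_bytes > array.nbytes)     # True
```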

In [31]:
standard_append = \
'''
n = []
for i in range(10000):
    n.append(i)
'''
numpy_append = \
'''
import numpy as np
n = np.array([])
for i in range(10000):
    n = np.append(n, np.array(i))
'''

In [32]:
timeit.timeit(standard_append, number=100)

0.017140833000667044

In [33]:
timeit.timeit(numpy_append, number=100)

2.62140620900027
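
When an array has to be built element by element, two common ways to avoid the repeated copying of `np.append` are to collect the values in a list and convert once, or to allocate the array up front and fill it in place. The following is a sketch under the assumption that the final size is known; the names `n_items`, `collected`, and `preallocated` are illustrative.

```python
import numpy as np

n_items = 10000

# Option 1: collect in a list, convert once at the end.
collected = []
for i in range(n_items):
    collected.append(i)
from_list = np.array(collected)

# Option 2: allocate the array once, then fill it in place.
preallocated = np.empty(n_items, dtype=np.int64)
for i in range(n_items):
    preallocated[i] = i

print(np.array_equal(from_list, preallocated))  # True
```

For a simple range like this one, `np.arange(n_items)` does the same job without any Python-level loop.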