<div style="width: 38.5%;">
    <p><strong>City College of San Francisco</strong></p>
    <hr>
    <p>MATH 108 - Foundations of Data Science</p>
</div>

# Lecture 05: Sequences

Associated Textbook Sections: [5.0, 5.1, 5.2, 5.3](https://inferentialthinking.com/chapters/05/Sequences.html)

---

## Overview

* [Sequences](#Sequences)
* [Beyond the Python Library](#Beyond-the-Python-Library)
* [Arrays](#Arrays)
* [Indexing Sequences](#Indexing-Sequences)
* [NumPy](#NumPy)

## Set Up the Notebook

In [1]:
from datascience import *
import numpy as np

---

## Sequences

A sequence data type represents an ordered collection of items.

<img src="./sequence_blocks.png"  alt="A sequence visualized as a collection of blocks" width = 40%>


---

### Built-In Sequence Data Types

There are several built-in sequence data types in Python such as:
*  Strings (`str`):
    * A text sequence of characters
    * `"data science"`
*  Lists (`list`):
    * A sequence that can hold a mixture of data types
    *  `['a', 1, max]`
*  Ranges (`range`):
    * A sequence of numbers
    * `range(10)`
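
As a minimal sketch, the three example values above can be treated uniformly because they are all sequences. The names `examples` and `summary` are illustrative only, and bracket indexing (`seq[0]`) is covered later in this lecture.

```python
# The three example values from the list above.
examples = ["data science", ['a', 1, max], range(10)]

summary = []
for seq in examples:
    # len() and bracket indexing work the same way on every sequence type.
    summary.append((type(seq).__name__, len(seq), seq[0]))

for row in summary:
    print(row)
# ('str', 12, 'd')
# ('list', 3, 'a')
# ('range', 10, 0)
```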

---

### Demo: Built-In Sequence Types

Create a string and notice that its length includes the blank space.

In [2]:
a_string = "data science" # SOLUTION
a_string

'data science'

In [3]:
len(a_string) # SOLUTION

12

---

Create a list, notice that it contains a mixture of data types, and check its length.

In [4]:
a_list = ['a', 1, max] # SOLUTION
a_list

['a', 1, <function max>]

In [5]:
type(a_list) # SOLUTION

list

In [6]:
len(a_list) # SOLUTION

3

---

Create a range from 0 up to (but not including) 10.

In [7]:
a_range = range(10) # SOLUTION
a_range

range(0, 10)

In [8]:
type(a_range) # SOLUTION

range

In [9]:
len(a_range) # SOLUTION

10

---

Convert the range to a list to see the items.

In [10]:
list(a_range) # SOLUTION

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

---

## Beyond the Python Library

### Modules, Libraries, and Packages

* Modules, libraries, and packages are collections of Python code
    * Python has a built-in module called `math` that contains a collection of mathematical functions and values 
    * `datascience` is a package written by staff at UC Berkeley for the Data 8 course, from which this course is adapted
    * `Matplotlib` is a standard data visualization library
* The `import` statement loads modules, libraries, and packages into the coding environment
    * `from datascience import *` imports everything from the `datascience` package
    * `import numpy as np` imports the NumPy package and provides it with the common shorter name (alias) `np`
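
The sketch below illustrates the import forms described above using only the built-in `math` module. The alias `m` and the variable names are hypothetical choices for this example; `m` plays the same role that `np` plays for NumPy.

```python
import math        # load the module under its own name
import math as m   # load the same module under a shorter alias

root = math.sqrt(16)            # a function from the module
circle_area = m.pi * 2 ** 2     # a value from the module, reached through the alias

print(root)         # 4.0
print(m is math)    # True: the alias refers to the same module
```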

---

## Arrays

---

### Arrays

* An array (`numpy.ndarray`) is a sequence type from the `NumPy` package
* All elements in an array have the same data type
* Arithmetic operations are applied to each element of the array separately
* When two arrays of the same length are added, the elements are added pairwise, position by position
* Arrays handle large numerical datasets more efficiently than lists
* We'll frequently use arrays instead of lists in this course
* You can make an array with the `make_array` function from the `datascience` package
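
A minimal sketch of the properties listed above. It assumes only NumPy is available, so `np.array` stands in for `make_array`, and the variable names are illustrative.

```python
import numpy as np

# np.array is a stand-in here for make_array from the datascience package.
ints = np.array([1, 2, 3, 4])
mixed = np.array([1, 2.5, 3])    # the integers are converted so all elements are floats

doubled = ints * 2                           # each element is multiplied by 2
total = ints + np.array([10, 20, 30, 40])    # same-length arrays add position by position

print(mixed.dtype)   # float64
print(doubled)       # [2 4 6 8]
print(total)         # [11 22 33 44]
```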

---

### Demo: Arrays

Create an array using `make_array`.

In [11]:
my_array = make_array(1, 2, 3, 4) # SOLUTION
my_array

array([1, 2, 3, 4])

In [12]:
type(my_array) # SOLUTION

numpy.ndarray

---

Explore array arithmetic and notice that each operation creates a new array.

In [13]:
my_array * 2 # SOLUTION

array([2, 4, 6, 8])

In [14]:
my_array ** 2 # SOLUTION

array([ 1,  4,  9, 16])

In [15]:
my_array + 1 # SOLUTION

array([2, 3, 4, 5])

In [16]:
# my_array is unchanged
my_array

array([1, 2, 3, 4])

---

Use functions such as `len` and `sum` on an array.

In [17]:
len(my_array) # SOLUTION

4

In [18]:
# Built-in sum
sum(my_array) # SOLUTION

10

In [19]:
# NumPy sum - optimized for arrays
np.sum(my_array) # SOLUTION

10

---

There are some rules for array arithmetic. For example, arrays are added position by position, so adding arrays of different lengths (such as 4 and 3) raises a `ValueError`.

In [20]:
another = make_array(60, 70, 80, 90) # SOLUTION

In [21]:
my_array + another # SOLUTION

array([61, 72, 83, 94])

In [22]:
yet_another = make_array(5, 6, 7) # SOLUTION

In [23]:
# A ValueError
# my_array + yet_another
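
One way to see the error without stopping the notebook is to catch it. This is a sketch under the assumption that only NumPy is available, so `np.array` stands in for `make_array`; the names `four`, `three`, `result`, and `message` are illustrative.

```python
import numpy as np

four = np.array([1, 2, 3, 4])
three = np.array([5, 6, 7])

try:
    result = four + three          # lengths 4 and 3 do not match
except ValueError as err:
    result = None
    message = str(err)
    print("ValueError:", message)
```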

In [24]:
# The mean of the array: the sum divided by the number of items
sum(my_array) / len(my_array) # SOLUTION

2.5

---

You can make an array with non-numeric data types.

In [25]:
tunas_array = make_array('bluefin', 'albacore', 'jim') # SOLUTION
tunas_array

array(['bluefin', 'albacore', 'jim'],
      dtype='<U8')

---

## Indexing Sequences

---

### Indexes

<img src="./sequence_blocks_with_indices.png"  alt="A sequence visualized as a collection of blocks with the index values included" width = 45%>

Sequence data types have a variety of ways to access each item in the sequence:
* A standard way: Use the item's position number in the sequence (index) with bracket `[...]` notation
    * Indices start with `0`
* An array-specific way used in MATH 108: items in a NumPy array can also be accessed with the `.item()` method

---

### Demo: Indexing

Get the first character from a string.

In [26]:
another_string = "San Francisco"
another_string

'San Francisco'

In [27]:
another_string[0] # SOLUTION

'S'

---

Get the last character from a string.

In [28]:
len(another_string) # SOLUTION

13

In [29]:
# An IndexError
# another_string[13]

In [30]:
another_string[12] # SOLUTION

'o'

In [31]:
# Another way to access the last item in a sequence.
another_string[-1]

'o'

In [32]:
# And another way ...
num_items = len(another_string) # SOLUTION
another_string[num_items - 1] # SOLUTION

'o'

---

Access items in an array using `.item` and bracket notation.

In [33]:
an_array = make_array('eats', 'shoots', 'leaves')
an_array

array(['eats', 'shoots', 'leaves'],
      dtype='<U6')

In [34]:
an_array.item(0) # SOLUTION

'eats'

In [35]:
an_array[0] # SOLUTION

'eats'

In [36]:
an_array.item(-1) # SOLUTION

'leaves'

In [37]:
an_array[-1] # SOLUTION

'leaves'

---

Be careful! Check your data types because sometimes you don't get what you expect.

In [38]:
another_array = make_array(10, 20, 30)
another_array

array([10, 20, 30])

In [39]:
type(another_array.item(0)) # SOLUTION

int

In [40]:
type(another_array[0]) # SOLUTION

numpy.int64
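
The sketch below shows one consequence of this difference, assuming only NumPy is available (`np.array` stands in for `make_array`). The exact NumPy integer type can vary by platform, so the check uses the general `np.integer` type rather than `numpy.int64`.

```python
import numpy as np

values = np.array([10, 20, 30])

from_item = values.item(0)    # a plain Python int
from_brackets = values[0]     # a NumPy integer scalar

print(from_item == from_brackets)             # True: the values are equal
print(isinstance(from_item, int))             # True
print(isinstance(from_brackets, int))         # False: not a built-in int
print(isinstance(from_brackets, np.integer))  # True
```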

---

## NumPy

---

### NumPy Functions

* NumPy is a Python library with a collection of tools optimized to work with arrays
* In our reference material, array tools commonly start with `np.`
* One function, `np.arange`, is especially helpful for generating arrays of evenly spaced numbers
    * General command: `np.arange(start, stop, step)`
    * The `start` and `step` values default to 0 and 1, respectively; the `stop` value is not included in the array
    * `np.arange(5)` creates the array `array([0, 1, 2, 3, 4])`
    * `np.arange(1, 5)` creates the array `array([1, 2, 3, 4])`
    * `np.arange(1, 5, 2)` creates the array `array([1, 3])`
    * `np.arange` is not the same as `range`: it returns a NumPy array, while `range` returns a `range` object
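
A minimal sketch of two practical differences between `np.arange` and the built-in `range`. The variable names are illustrative only.

```python
import numpy as np

r = range(1, 5)        # built-in range object
a = np.arange(1, 5)    # NumPy array

doubled = a * 2        # array arithmetic works element by element

try:
    r * 2              # a range does not support arithmetic
    range_supports_math = True
except TypeError:
    range_supports_math = False

halves = np.arange(0, 2, 0.5)   # np.arange also accepts a non-integer step

print(doubled)               # [2 4 6 8]
print(range_supports_math)   # False
print(halves)                # [0.  0.5 1.  1.5]
```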

---

### Demo: NumPy Functions

Generate arrays with `arange`.

In [41]:
np.arange(10) # SOLUTION

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [42]:
np.arange(1, 10) # SOLUTION

array([1, 2, 3, 4, 5, 6, 7, 8, 9])

In [43]:
my_array = np.arange(1, 10, 2) # SOLUTION
my_array

array([1, 3, 5, 7, 9])

---

Demonstrate a few NumPy functions, attributes, and methods.

In [44]:
np.average(my_array) # SOLUTION

5.0

In [45]:
np.diff(my_array) # SOLUTION

array([2, 2, 2, 2])

In [46]:
np.cumsum(my_array) # SOLUTION

array([ 1,  4,  9, 16, 25])

In [47]:
# an array attribute
my_array.size # SOLUTION

5

In [48]:
# an array method
my_array.min() # SOLUTION

1

---

<footer>
    <p>Adapted from UC Berkeley DATA 8 course materials.</p>
    <p>This content is offered under a <a href="https://creativecommons.org/licenses/by-nc-sa/4.0/">CC Attribution Non-Commercial Share Alike</a> license.</p>
</footer>