# Lecture 5



In [None]:
from datascience import *
import numpy as np

---

# Arrays

Arrays are ordered "lists" of elements, and each element can be accessed directly by its position.

### Making Arrays

**Exercise:** Make an array of 4 elements:

In [None]:
my_array = make_array(1, 2, 3, 4)
my_array

<details><summary>Solution</summary>
   
```python
my_array = make_array(1, 2, 3, 4)
my_array
```
    
</details></br></br>

**Exercise:** Arrays can hold elements of any type. Make an array of strings called `string_array`:

In [None]:
string_array = make_array("cat", "dog", "bird")
string_array

<details><summary>Solution</summary>
   
```python
string_array = make_array("cat", "dog", "bird")
string_array
```
    
</details></br></br>


**Exercise:** Mixing types (Strings, Numbers, Booleans).  Make an array of multiple types:

<details>
    
You can do this, but NumPy will convert every element to a single type that can represent all of them (e.g., strings).

</details>

In [None]:
weird_array = make_array("cat", 3, True)
weird_array

<details><summary>Solution</summary>
   
```python
weird_array = make_array("cat", 3, True)
weird_array
```
    
</details></br></br>
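The conversion mentioned in the hint can be checked directly. This small sketch uses `np.array` in place of `make_array`, assuming that `make_array` builds an ordinary NumPy array. The exact string dtype NumPy picks (such as `'<U21'`) may vary by version, so only its kind is printed.

```python
import numpy as np

# Mix a string, an integer, and a boolean in one array
weird = np.array(["cat", 3, True])

# NumPy stores every element with one shared string type ('U' = Unicode string)
print(weird.dtype.kind)   # U
print(weird.tolist())     # ['cat', '3', 'True']

# 3 is now the string '3', so + joins text instead of adding numbers
print(weird.item(1) + "1")  # 31
```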

### Accessing Elements

For this exercise, let's start with this array of strings.

In [None]:
string_array = make_array("cat", "dog", "bird")
string_array

You can use `array_name.item( NUMBER )` to get the element at position `NUMBER`. Positions start at 0, so `.item(0)` is the first element.

**Exercise:** What will the following expression return?

In [None]:
string_array.item(1)

**Bonus!** Getting an element by its position is called **array indexing**. There is a shorter "equivalent" syntax, `array_name[NUMBER]`, that people often use. For this class you only need to know about `.item()`, but you may use whichever you prefer.

In [None]:
string_array[1]
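One likely reason "equivalent" is in quotes: both forms return the same value, but they hand back slightly different types. A minimal sketch, again using `np.array` as a stand-in for `make_array`:

```python
import numpy as np

string_array = np.array(["cat", "dog", "bird"])

# Both forms count positions from 0, so position 1 is the second element
via_item = string_array.item(1)
via_brackets = string_array[1]
print(via_item, via_brackets)  # dog dog

# .item() returns a plain Python str; square brackets return NumPy's own string scalar
print(type(via_item))
print(type(via_brackets))
```

For everyday use in this class the difference rarely matters, since NumPy's string scalar behaves like a regular Python string.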

**Exercise:** Use the `len` function to determine the length of the string array.

In [None]:
len(string_array)

<details><summary>Solution</summary>
   
```python
len(string_array)
```
    
</details></br></br>

Arrays also have a **member variable** `array_name.size` that contains the number of elements in the array.

**Exercise:** Use the size **member variable** to check the size of the array:

In [None]:
string_array.size

<details><summary>Solution</summary>
   
```python
string_array.size
```
    
</details></br></br>
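A quick comparison of the two approaches, as a sketch with `np.array` standing in for `make_array`. It also shows a common slip: `.size` is a stored value, not a function, so it takes no parentheses.

```python
import numpy as np

string_array = np.array(["cat", "dog", "bird"])

# For a one-dimensional array, len() and .size give the same count
print(len(string_array))  # 3
print(string_array.size)  # 3

# .size is a stored value, not a function, so calling it fails
size_call_error = None
try:
    string_array.size()
except TypeError as err:
    size_call_error = str(err)
print(size_call_error)  # 'int' object is not callable
```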

### Aggregation Operations

You will often need to compute summaries of an array, such as its `sum`, `max`, or `min`. These are all **member functions** of an array. Here is the documentation on all the **[member functions](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.html)** for arrays.

In [None]:
cool_numbers = make_array(0, 1, 42, np.pi, np.e)
cool_numbers

**Exercise:** Use the `sum`, `min`, `mean`, and `max` operations to summarize the cool numbers array.

In [None]:
print("sum", cool_numbers.sum())
print("min", cool_numbers.min())
print("mean", cool_numbers.mean())
print("max", cool_numbers.max())

<details><summary>Solution</summary>
   
```python
print("sum", cool_numbers.sum())
print("min", cool_numbers.min())
print("mean", cool_numbers.mean())
print("max", cool_numbers.max())
```
    
</details></br></br>

You can also apply NumPy's built-in library of math functions to arrays. Here we compute the average (two ways, with `np.average` and `np.mean`) and the `log` of each element:

In [None]:
print("np.average", np.average(my_array))
print("np.mean", np.mean(my_array))
print("np.log", np.log(my_array))
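The member-function form (`my_array.mean()`) and the NumPy function form (`np.mean(my_array)`) compute the same summary. A minimal sketch, using `np.array` in place of `make_array`:

```python
import numpy as np

my_array = np.array([1, 2, 3, 4])

# The member-function form and the np.* function form compute the same mean
print(my_array.mean(), np.mean(my_array), np.average(my_array))  # 2.5 2.5 2.5

# np.log works element by element and returns a new array of the same length
logs = np.log(my_array)
print(logs.size)      # 4
print(logs.item(0))   # 0.0, since log(1) = 0
```

`np.average` also accepts an optional `weights` argument, which is where it differs from `np.mean`; without weights the two agree.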

### Doing math with arrays

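You can do mathematical operations on arrays. Operations between two arrays are applied element by element, pairing up the elements at the same position: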

In [None]:
a = make_array(1, 2, 3, 4)
b = make_array(10, 20, 30, 40)
print("The a array:", a)
print("The b array:", b)

**Exercise:** Add and multiply the arrays: 

In [None]:
a + b

In [None]:
a * b

<details><summary>Solution</summary>
   
```python
print("Adding Arrays", a + b)
print("Multiplying Arrays", a * b)
```
    
</details></br></br>
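To make "element by element" concrete, this sketch (with `np.array` standing in for `make_array`) compares the array sum with the same sum written as an explicit loop over positions:

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([10, 20, 30, 40])

# Elements are paired by position: a[0] with b[0], a[1] with b[1], and so on
print((a + b).tolist())  # [11, 22, 33, 44]
print((a * b).tolist())  # [10, 40, 90, 160]

# The same sum written as an explicit loop over positions
manual_sum = np.array([a.item(i) + b.item(i) for i in range(len(a))])
print(np.array_equal(manual_sum, a + b))  # True
```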

You can also add or multiply an array by a single number (a scalar). The operation is applied to every element:

In [None]:
a * 3.

In [None]:
3 + b
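Written out with the values shown, assuming `np.array` behaves like `make_array` here. Note that `3.` (with a decimal point) is a float, so the result holds floats:

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([10, 20, 30, 40])

# The scalar is combined with every element of the array
scaled = a * 3.   # float scalar, so every result is a float
shifted = 3 + b   # int scalar, so the results stay integers
print(scaled.tolist())   # [3.0, 6.0, 9.0, 12.0]
print(shifted.tolist())  # [13, 23, 33, 43]
```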

### Common Bugs

**Exercise:** What happens if we run the following:

```python
bigger_array = make_array(1,2,3,4,5)
a * bigger_array
```

In [None]:
# bigger_array = make_array(1,2,3,4,5)
# a * bigger_array
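For reference, NumPy refuses to combine arrays of lengths 4 and 5 and raises a `ValueError`. This sketch (using `np.array` in place of `make_array`) catches the error so the cell keeps running:

```python
import numpy as np

a = np.array([1, 2, 3, 4])
bigger_array = np.array([1, 2, 3, 4, 5])

# Element-wise multiplication needs a partner for every element,
# so arrays of lengths 4 and 5 cannot be combined
shape_error = None
try:
    a * bigger_array
except ValueError as err:
    shape_error = str(err)

# The message says the shapes (4,) and (5,) could not be broadcast together
print(shape_error)
```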

**Exercise:** What happens if I run the following:

```python
uhoh = make_array(0,1,2,3)
a / uhoh
```

In [None]:
# uhoh = make_array(0,1,2,3)
# a / uhoh
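Unlike plain Python, where `1 / 0` raises a `ZeroDivisionError`, NumPy does not stop on division by zero: that position becomes `inf` (infinity), and NumPy normally prints a `RuntimeWarning`. A sketch, with `np.array` standing in for `make_array`:

```python
import numpy as np

a = np.array([1, 2, 3, 4])
uhoh = np.array([0, 1, 2, 3])

# 1 / 0 produces inf instead of raising an error.
# np.errstate hides the usual RuntimeWarning inside this block.
with np.errstate(divide="ignore"):
    result = a / uhoh

print(result.tolist())            # [inf, 2.0, 1.5, 1.3333333333333333]
print(np.isinf(result).tolist())  # [True, False, False, False]
```

This can be a sneaky bug: the code keeps running, and the `inf` only shows up later in sums or averages.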

**Exercise:** What happens if I run the following:

```python
a.item(4)
```

In [None]:
# a.item(4)

**Exercise:** What happens if I run the following:

```python
a.item(-1)
```

In [None]:
a.item(-1)

Negative indexing is a *common trick* to access the end of an array: `-1` refers to the last element, `-2` to the second to last, and so on.
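The valid positions for a length-4 array are `0` through `3`, or `-4` through `-1` when counting from the end. Anything outside that range, such as `a.item(4)` above, raises an `IndexError`. A sketch with `np.array` standing in for `make_array`:

```python
import numpy as np

a = np.array([1, 2, 3, 4])

# Negative positions count backwards from the end of the array
print(a.item(-1))  # 4 (the last element)
print(a.item(-4))  # 1 (the first element)

# Position 4 does not exist in a length-4 array
index_error = None
try:
    a.item(4)
except IndexError as err:
    index_error = str(err)
print(index_error)
```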

</br></br>

---

## Ranges

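We use ranges to easily make arrays of evenly spaced numbers. The NumPy function `np.arange(start, stop, step)` produces an array starting at `start` and ending *before* `stop`, in increments of `step`. If you leave out `step`, it defaults to 1. If you give only one argument, it is treated as `stop`, and `start` defaults to 0.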

**Exercise:** Make an array of the numbers 0 through 6:

In [None]:
make_array(0, 1, 2, 3, 4, 5, 6)

In [None]:
np.arange(0, 7, 1)

In [None]:
np.arange(0, 7)

In [None]:
np.arange(7)
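A quick check that the three calls above build the same array, plus a step of 2 to show that `stop` itself is never included:

```python
import numpy as np

# Leaving out step (default 1) and start (default 0) gives the same array
full = np.arange(0, 7, 1)
print(np.array_equal(full, np.arange(0, 7)))  # True
print(np.array_equal(full, np.arange(7)))     # True

# Counting by 2 from 0 up to, but not including, 7
print(np.arange(0, 7, 2).tolist())  # [0, 2, 4, 6]
```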

**Exercise:** What will the following produce:

In [None]:
np.arange(40, -1, -5) 
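With a negative step the sequence counts down, but the "stop before `stop`" rule still applies. That is why `-1` is used as the stop when we want `0` included:

```python
import numpy as np

# Counting down by 5; stopping before -1 means 0 is the last value
print(np.arange(40, -1, -5).tolist())  # [40, 35, 30, 25, 20, 15, 10, 5, 0]

# Using 0 as the stop leaves 0 out, because stop itself is excluded
print(np.arange(40, 0, -5).tolist())   # [40, 35, 30, 25, 20, 15, 10, 5]
```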

</br></br></br></br>

---

## Columns of Tables are Arrays

We are covering arrays partly because an array is what we get back when we extract a single column of a table. Here we load a table of NBA salaries from the local file `nba_salaries.csv`:

In [None]:
nba = Table.read_table('nba_salaries.csv')
nba

Let's focus on the **Golden State Warriors**.

**Exercise:** Use the `my_table.where` function to select the rows where the team is `"Golden State Warriors"`.

In [None]:
warriors = nba.where("team", "Golden State Warriors")
warriors

<details><summary>Solution</summary>
   
```python
warriors = nba.where("team", "Golden State Warriors")
warriors
```
    
</details></br></br>

We can also select columns by name. 

**Exercise**: Make a table with just the `"name"` and `"salary"` of the Warriors.


In [None]:
warriors.select("name", "salary")

<details><summary>Solution</summary>
   
```python
warriors.select("name", "salary")
```
    
</details></br></br>

**Exercise:** Compute the average salary of the Warriors. Would each of the following work?

*Option (A):*
```python
warriors.mean()
```

*Option (B):*
```python
warriors.select("salary").mean()
```

*Option (C):*
```python
warriors.column("salary").mean()
```



In [None]:
warriors.column("salary").mean()

**Exercise:** Would the following work?

```python
np.average(warriors.select("salary"))
```

In [None]:
# np.average(warriors.select("salary"))

What is going on?

In [None]:
type(warriors.select("salary"))

In [None]:
type(warriors.column("salary"))

**Exercise:** Use `np.average` to compute the average salary of the Warriors:

In [None]:
np.average(warriors.column("salary"))

<details><summary>Solution</summary>
   
```python
np.average(warriors.column("salary"))
```
    
</details></br></br>

**Exercise:** Compute the difference between the average salaries of the Warriors and the `"Los Angeles Lakers"`.

In [None]:
lakers = nba.where('team', 'Los Angeles Lakers')
warriors.column('salary').mean() - lakers.column('salary').mean()

---

</br></br>
# Ways to Create a Table 

There are many ways to create a table. Often we will load a table from a file, but we can also build a table from arrays.


## Creating a Table from Arrays

Let's start with an array of street names.

In [None]:
streets = make_array('Bancroft', 'Durant', 'Channing', 'Haste')
streets

We can make an empty table (no rows, no columns, no problems ...).

The `Table()` function makes an empty table.

In [None]:
empty_table = Table()
empty_table

**Exercise:** Check that the empty table has 0 rows and 0 columns

In [None]:
print("Rows:", empty_table.num_rows)
print("Cols:", empty_table.num_columns)

<details><summary>Solution</summary>
   
```python
print("Rows:", empty_table.num_rows)
print("Cols:", empty_table.num_columns)
```
    
</details></br></br>

**Exercise:** Use the `table.with_column` function to add a column to the table and save the new table as `southside`.

In [None]:
southside = empty_table.with_column("Streets", streets)
southside

<details><summary>Solution</summary>
   
```python
southside = empty_table.with_column("Streets", streets)
southside
```
    
</details></br></br>

**Exercise:** Can you do the same thing without using `empty_table`?

In [None]:
southside = Table().with_column("Streets", streets)
southside

**Exercise:** What is the output of:

In [None]:
empty_table.with_column("Streets", streets)
print("Number of Columns", empty_table.num_columns)

**Exercise:** Extend the southside table to include the blocks from campus. ([map](https://goo.gl/maps/7QcNgpRC52NHbM6o7))

In [None]:
southside = southside.with_column('Blocks from campus', np.arange(4))
southside

**Exercise:** Build the entire table with blocks from campus in one call to the `table.with_columns` function.

In [None]:
Table().with_columns(
    'Streets', streets,
    'Blocks from campus', np.arange(4)
)

## Loading 

# Case Study: Du Bois Visualization

In [None]:
du_bois = Table.read_table('du_bois.csv')
du_bois

In [None]:
du_bois.column('ACTUAL AVERAGE')

In [None]:
du_bois.column('FOOD')

In [None]:
du_bois.column('ACTUAL AVERAGE') * du_bois.column('FOOD')

In [None]:
food_dollars = du_bois.column('ACTUAL AVERAGE') * du_bois.column('FOOD')
du_bois.with_columns('Food $', food_dollars)

In [None]:
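# Error: with_columns above returned a new table that was never saved,
# so du_bois does not have a 'Food $' column yet.
# du_bois.select('CLASS', 'ACTUAL AVERAGE', 'FOOD', 'Food $')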

In [None]:
food_dollars = du_bois.column('ACTUAL AVERAGE') * du_bois.column('FOOD')

du_bois = du_bois.with_columns('Food $', food_dollars)

du_bois

In [None]:
du_bois.select('CLASS', 'ACTUAL AVERAGE', 'FOOD', 'Food $')

In [None]:
du_bois.labels

In [None]:
du_bois.num_rows

In [None]:
du_bois.num_columns