In [None]:
# run this cell
import numpy as np
from datascience import *

# Lecture 7: Arrays and Tables

Check out the [Data 6 Python Reference Sheet](https://data6.org/su22/reference/) to see documentation for the Array and Table methods we will cover in this lecture.

## Part 1: Arrays


Arrays are a data type that allows us to store ordered sequences of values.



In [None]:
# an example - we can make an array using the make_array function
my_array = make_array(1,2,3)
my_array

In [None]:
# arrays can contain numbers, text (strings), or booleans
text_arr = make_array("g", "o", " ", "b", "e", "a", "r", "s", "!")
bool_arr = make_array(True, False, False, True)

In [None]:
text_arr

In [None]:
bool_arr

A special function that lets us create arrays of evenly spaced numbers is `np.arange`. This function is part of the numpy module (hence the `np` prefix) and takes the following parameters, in order:
- Start: (INCLUSIVE) the numerical value at which you want your sequence to start (default: 0)
- Stop: (EXCLUSIVE) the numerical value right before which your sequence will end
- Step: the increment between successive values (default: 1)

Output: An array of numbers that form a sequence matching the above criteria.
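
As a small sketch of how the three parameters interact (the variable names and values here are made up for illustration and are not part of the lecture), note that the stop value is never included, even when the step would land exactly on it:

```python
import numpy as np

# start = 5 (included), stop = 50 (excluded), step = 10
by_tens = np.arange(5, 50, 10)
print(by_tens)        # values: 5, 15, 25, 35, 45

# the step lands exactly on the stop value 10, but 10 is still left out
up_to_ten = np.arange(0, 10, 5)
print(up_to_ten)      # values: 0, 5
```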

In [None]:
# here are some examples of the arange call
zero_to_ten_3 = np.arange(0, 11, 1)
zero_to_ten_3
# try changing the code above so that zero_to_ten_3 instead gives us the first ten even numbers (2-20)

In [None]:
# If your desired step is 1, you do not need to specify the step
zero_to_ten_2 = np.arange(0, 11)
zero_to_ten_2
# try changing the code above so that zero_to_ten_2 instead gives us the first 20 natural numbers (1-20)

In [None]:
# If your desired start is 0, you do not need to specify a start
zero_to_ten = np.arange(11)
zero_to_ten
# try changing the code above so that zero_to_ten instead gives us the first 30 whole numbers (0-29)

Arrays must always contain elements of the same type. Take a look at the code below. Without running it, would the array work as is (hint: look at the data types of the elements)? Discuss with your neighbour. Now run the cell and examine the output. How does it differ from what you entered?

In [None]:
new_array = make_array(1, "a", True)
new_array

**Array methods**

There is a lot of cool stuff you can do with arrays! Here are some functions and methods that you might find handy:

`len(array)` - length of an array

`array.item(x)` - extract the item at index x, that is, the (x+1)th item in an array (indices in Python start at 0)

`min(array)` - smallest item in an array

`max(array)` - largest item in an array

`sum(array)` - sum of all elements added together
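
Here is a minimal sketch of these calls on a small array. The array `fives` and the other variable names are hypothetical and chosen only for illustration:

```python
import numpy as np

fives = np.arange(5, 30, 5)   # values: 5, 10, 15, 20, 25

count = len(fives)            # 5 elements
second = fives.item(1)        # index 1 is the second element: 10
smallest = min(fives)         # 5
largest = max(fives)          # 25
total = sum(fives)            # 5 + 10 + 15 + 20 + 25 = 75

print(count, second, smallest, largest, total)
```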

In [None]:
example_array = np.arange(2, 100, 20)
len(example_array)

In [None]:
example_array.item(0)
# change the above code to instead get the last element of the array

In [None]:
min(example_array)

In [None]:
max(example_array)

In [None]:
sum(example_array)

In [None]:
# Why does this return 3?
sum(make_array(True, False, False, True, True))

**Array Arithmetic**

Array arithmetic falls into two broad categories:

- An array operated with another array: the operation works only if the two arrays have the same length and their elements have compatible data types (for example, both numeric). The operation is carried out element-wise, resulting in an output array with the same length as each input array.

- An array operated with a scalar (constant): the operation is applied to each element of the array.
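
The two categories can be sketched with multiplication. The arrays below are hypothetical and are built with `np.arange` so the snippet needs only numpy:

```python
import numpy as np

prices = np.arange(1, 5)             # values: 1, 2, 3, 4
quantities = np.arange(10, 50, 10)   # values: 10, 20, 30, 40

# array with another array: same lengths, so elements are paired up by position
elementwise = prices * quantities    # 1*10, 2*20, 3*30, 4*40
print(elementwise)

# array with a scalar: the constant is applied to every element
doubled = prices * 2                 # 1*2, 2*2, 3*2, 4*2
print(doubled)
```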

In [None]:
# array with another array
arr1 = make_array(1, 2, 3, 4)
arr2 = make_array(10, 20, 30, 40)
arr1 + arr2
# try to see if there is any mathematical operation you can perform between these two arrays that does not work

In [None]:
# will this work? discuss with your neighbour before running
arr3 = make_array("a", "b", "c", "d")
arr1 + arr3

In [None]:
# will this work? discuss with your neighbour before running
arr4 = make_array(100, 200, 300)
arr2 + arr4

In [None]:
# array operated with a constant
arr1 + 10
# try to see if any mathematical operation between the array and the constant 10 does not work as described above
# what would happen if arr1 were an array of strings? What operation(s) would work then?

## Part 2: Tables

Another useful way to store data is with tables! The following cell shows how you can create your own Table.

When making a Table:

*   For each column you want to create, you must pass in a column name and an array of data
*   All arrays must be of the same length

In [None]:
sunya_table = Table().with_columns("ice cream flavor", make_array("chocolate", "vanilla", "strawberry", "mint"),
                    "Ranking", make_array(1, 2, 4, 3))
sunya_table

### Practice Question:
In the following cell, create a table **my_table** with your own ice cream flavor ranking. You may add/change any flavors, add columns, and change column names.

In [None]:
my_table = ...

### tbl.sort(column_name_or_index) method
This method can be used to sort a table by a particular column. For instance, the following cell sorts the sunya_table by the "Ranking" column.

In [None]:
sunya_table.sort("Ranking")

The output above is in ascending "Ranking" order. Note that `.sort` returns a new sorted table and leaves `sunya_table` itself unchanged. We can also sort in descending order by passing a second, optional argument to the `.sort` method. See below:

In [None]:
sunya_table.sort("Ranking", descending = True)

### .sort Practice
In the following cell, see what happens when you sort the **my_table** you created earlier by the first column.

In [None]:
#sort my_table by first column
...

## Working with Larger Tables
While it is useful to create our own data tables, we can also analyze data that we import from outside sources.

Run the cell below to load the data from our Spotify playlist.

In [None]:
# might not load on Colab
playlist = Table.read_table('playlist.csv')
playlist

## Some Useful Table Methods

In the following methods, make sure you replace `tbl` with the name of your table.

`tbl.column(column_name_or_index)` -> Returns the values of a column as an array

`tbl.select(col1, col2, ...)` -> Returns a copy of the table with only the chosen columns. Each argument is a column name or index.

`tbl.num_rows` -> Returns the number of rows in the Table

`tbl.num_columns` -> Returns the number of columns in the Table

`tbl.with_column(column_name, array)` -> Returns a new table with one additional column; the original table is not changed

In [None]:
playlist.num_rows

In [None]:
playlist.num_columns

After running the following two cells, discuss with a partner what the difference is between the .column and .select methods.

In [None]:
playlist.column("track name")

In [None]:
playlist.select("track name")

### Practice Problem
In the cell below, write one line of code which will output the first element in the "artist name" column of our **playlist** table.

Hint: Remember the .item array method that we discussed earlier!

In [None]:
# this cell should output the name of the artist in the first row of playlist

...

### Question: Practice with .with_column

In [None]:
#Just run this cell
table = playlist.sample(playlist.num_rows, with_replacement=False).take(np.arange(5)).select(0,1,2,10)

In [None]:
table

Add a column to `table` that describes the data. Some ideas: the genre, whether you know the song, or whether you like the song.

Fill in the two arguments to `.with_column()`: a column name and an array with one value for each of the 5 rows in the table.

In [None]:
table.with_column(..., ...)