# Working with tables
Now that you know about the basic functions of Python and common data types, let's take a look at arrays and tables, which are the most common ways to organize data.

In [None]:
# This cell needs to be run first; don't worry about why just yet!
# Click on the cell to highlight it, then press Shift+Enter or Control+Enter to run it.
from datascience import *
import numpy as np

## Arrays
An array contains a sequence of values, and it is a fundamental data type in data science. Arrays are created with the function `make_array`, which takes any number of values as inputs. Arrays have similarities to lists, but in this course we will work with arrays much more than with lists, so don't worry too much about lists.
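
To see one practical difference between lists and arrays, here is a minimal sketch. It uses `np.array` (NumPy is imported as `np` in the setup cell) in place of `make_array`, which builds the same kind of NumPy array.

```python
import numpy as np

# A plain Python list and a NumPy array holding the same values
lst = [1, 2, 3, 4]
arr = np.array([1, 2, 3, 4])  # make_array(1, 2, 3, 4) produces this kind of array

# "+" joins two lists end to end...
list_sum = lst + lst
print(list_sum)   # [1, 2, 3, 4, 1, 2, 3, 4]

# ...but adds two arrays position by position
array_sum = arr + arr
print(array_sum)  # [2 4 6 8]
```
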

In [None]:
make_array(1, 2, 3, 4)

Let's store the array we made in a variable called "ar":

In [None]:
ar = make_array(1, 2, 3, 4)
ar

And now let's make another one, stored in a different variable, "br":

In [None]:
br = make_array(4, 3, 2, 1)
br

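Arrays of the same length can be added to or subtracted from each other. The operation is applied element by element: the first values are combined, then the second values, and so on.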
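
Here is a minimal sketch of element-by-element arithmetic, with NumPy arrays standing in for `ar` and `br`. It also shows that arrays of different lengths (here 4 and 3) cannot be lined up this way.

```python
import numpy as np

ar = np.array([1, 2, 3, 4])
br = np.array([4, 3, 2, 1])

# First with first, second with second, and so on
total = ar + br        # [5 5 5 5]
difference = ar - br   # [-3 -1  1  3]
print(total, difference)

# A length-4 array and a length-3 array cannot be paired up position by position
short = np.array([1, 2, 3])
mismatch_failed = False
try:
    ar + short
except ValueError as err:
    mismatch_failed = True
    print("ValueError:", err)
```
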

In [None]:
ar + br

In [None]:
ar - br

You can get the length of an array (and of many other objects) using the `len` function:

In [None]:
len(ar)

If you want to get a particular value from an array, you use bracket notation.
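
Because Python starts counting at 0, an array of length n has valid positions 0 through n - 1. A minimal sketch, with a NumPy array standing in for `ar`:

```python
import numpy as np

ar = np.array([1, 2, 3, 4])

first = ar[0]              # position 0 holds the first value
last = ar[len(ar) - 1]     # the last valid position is one less than the length
print(first, last)         # 1 4

# Every valid position, from 0 up to (but not including) len(ar)
for position in range(len(ar)):
    print(position, ar[position])
```
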

In [None]:
ar

In [None]:
# remember that Python starts counting at 0, so index 0 gives the first item
ar[0]

In [None]:
ar[3]

Why does the line below throw an error?

In [None]:
ar[4]

Arrays may also contain strings.

In [None]:
cr = make_array('a', 'b', 'c', 'd')
cr

An array can only contain elements of a single type. If you put different types into an array, they are all converted to the most general type among them. In the line below, there are *strings* and *integers*, so the integers will be coerced (converted) into strings.
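
A minimal sketch of this coercion with `np.array`, which should behave like `make_array('1', '2', 3, 4)` here. The array's `dtype.kind` of `'U'` marks it as an array of (Unicode) strings.

```python
import numpy as np

mixed = np.array(['1', '2', 3, 4])   # mixes strings and integers

# Every element is now a string, including the former integers
print(mixed)               # ['1' '2' '3' '4']
print(type(mixed[2]))      # <class 'numpy.str_'>
print(mixed.dtype.kind)    # U (an array of Unicode strings)
```
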

In [None]:
dr = make_array('1', '2', 3, 4)
dr

Why does the line below throw an error?

In [None]:
dr + ar

## Tables
Tables are the fundamental way we will represent data in this course. Tables are exactly what you think they are: rows and columns, with a value in each cell. The `Table` object is a type of data object that can be explored using *methods*.

Recall from your reading that you call a function by putting its arguments inside parentheses after the function's name. The arguments are one or more inputs:
```
>>> function(arguments)
>>> abs(-5)    # take the absolute value of the argument -5
>>> max(4, 10) # find the maximum of the arguments 4 and 10
```
Methods are virtually identical to functions, except that you call them by adding a period `.` *after the object*, followed by the usual `function(arguments)` format:
```
>>> name_of_table.method(arguments)
```
Or more generally:
```
>>> name_of_object.method(arguments)
```
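
As a concrete comparison, here is a minimal sketch with a NumPy array: the built-in `max` and `len` functions take the array as an argument, while `.max()` and `.tolist()` are methods called on the array itself.

```python
import numpy as np

ar = np.array([1, 2, 3, 4])

# Functions: the object goes inside the parentheses
biggest_by_function = max(ar)
length = len(ar)

# Methods: the object comes first, then a period and the method name
biggest_by_method = ar.max()
as_list = ar.tolist()

print(biggest_by_function, biggest_by_method, length, as_list)  # 4 4 4 [1, 2, 3, 4]
```
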

## Creating a `Table` from scratch
A new `Table` may be made from scratch using `Table()`. Called with no arguments like this, it creates an empty `Table`.

In [None]:
Table()

We will then use the `.with_columns()` *method* to add data to this `Table`. Its arguments alternate between column names and *arrays* containing the values for those columns. That is to say: `Table().with_columns(column1_name, column1_data, column2_name, column2_data, ...)`.

Note that when using the `.with_columns()` method, you must have an even number of arguments, matching each column name with its corresponding array of data.
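
To see why the count must be even, here is a hypothetical plain-Python helper, `pair_columns`, that mimics only the name/data pairing. It is not how the `datascience` library implements `.with_columns()`; it just shows that names sit at positions 0, 2, 4, ... and arrays at positions 1, 3, 5, ...

```python
import numpy as np

def pair_columns(*args):
    """Hypothetical helper: pair each column name with the array that follows it."""
    if len(args) % 2 != 0:
        raise ValueError("expected an even number of arguments: name, data, name, data, ...")
    names = args[0::2]   # arguments 0, 2, 4, ...
    data = args[1::2]    # arguments 1, 3, 5, ...
    return dict(zip(names, data))

columns = pair_columns('Name', np.array(['Juan', 'Mai']), 'Age', np.array([24, 27]))
print(list(columns))   # ['Name', 'Age']
```

With an odd number of arguments, one name would be left without data, which is why the helper refuses it.
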

In [None]:
Table().with_columns('column 1', ar, 'column 2', br)

You can even create new arrays within the method, nesting the `make_array()` function inside the `.with_columns()` method.

In [None]:
Table().with_columns('Name', make_array('Juan','Mai'), 'Age', make_array(24, 27))

But why does the following line throw an error?

In [None]:
Table.with_columns('Name', make_array('Juan','Mai'), 'Age', make_array(24, 27))

## Loading a `Table`
Of course, most of the time we will be loading tables from files that we've retrieved from an online corpus. Tables are loaded from a file using the `read_table(FILEPATH)` method, or one of its variations, depending on the file type.

Let's use an example from PHOIBLE (Phonetics Information Base and Lexicon), a database of phonological inventories from thousands of languages. For more information, and to browse the data interactively, visit [their website](https://phoible.org/) (Moran et al. 2019).

Part of the PHOIBLE corpus is available in the "cloud" (the DataHub server) for this course. It is stored as a `csv` file called `wk1-phoible.csv`. (CSV stands for comma-separated values. If you opened this file in a program such as Excel, you'd just see a table of rows and columns.) The file is located in the same folder as this notebook.
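
To see what "comma-separated values" means in practice, here is a minimal sketch using Python's built-in `csv` module on a few made-up lines (hypothetical values, not taken from PHOIBLE) that borrow two column names used later in this notebook. The first line holds the column names; each later line is one row, with commas separating the cells.

```python
import csv
import io

# Hypothetical CSV text: a header line, then one line per row
csv_text = """LanguageName,Phoneme
LanguageA,p
LanguageA,t
LanguageB,p
"""

reader = csv.reader(io.StringIO(csv_text))
header = next(reader)   # ['LanguageName', 'Phoneme']
rows = list(reader)     # [['LanguageA', 'p'], ['LanguageA', 't'], ['LanguageB', 'p']]

print("columns:", header)
print("number of rows:", len(rows))
```
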

Read `wk1-phoible.csv` into your notebook using the `read_table()` method. The argument of this method is the name of the file as a *string*.

In [None]:
phoible = Table.read_table('wk1-phoible.csv')

## Viewing the `Table`
If you want to take a look at our newly created object `phoible`, you can just type its name into a cell. But beware...

In [None]:
phoible

It's big! Over 105,000 rows! So most of them are left out of the preview. And look at all those columns...

Let's simplify things a little bit. Let's just show the first 5 rows of `phoible`, which conveniently can be done with the method `.show()`:

In [None]:
phoible.show(5)

## Getting the size of a table
We want to get a better idea of just how large this table is. Let's use `num_rows` and `num_columns`. (As a technical note, these are not methods, but *attributes*. Attributes are a bit different from methods; for one, they do not require parentheses.)
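
NumPy arrays make the same attribute/method distinction, so here is a minimal sketch of it outside of `Table`. A two-dimensional array's `shape` attribute (no parentheses) reports its rows and columns, a rough analogue of `num_rows` and `num_columns`, while `.tolist()` is a method and needs parentheses to run.

```python
import numpy as np

grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

# Attribute: a stored value, read without parentheses
n_rows, n_cols = grid.shape   # (2, 3)

# Method: an action, run with parentheses
as_list = grid.tolist()       # [[1, 2, 3], [4, 5, 6]]

# Leaving off the parentheses gives you the method itself, not its result
not_called = grid.tolist
print(n_rows, n_cols, as_list, callable(not_called))  # 2 3 [[1, 2, 3], [4, 5, 6]] True
```
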

In [None]:
phoible.num_rows

In [None]:
phoible.num_columns

## Accessing columns and column names
Column names can be accessed using the `labels` attribute.

In [None]:
phoible.labels

And all the values in a column can be accessed by calling the `.column()` method on the table, with the column name as a string argument. This returns an array.

In [None]:
phoible.column('LanguageName')

By the way, "Tableland Lamalama" is most likely a reference to one of the [Paman languages](https://en.wikipedia.org/wiki/Lamu-Lamu_language) spoken by the indigenous people of northeastern Australia (Queensland). And Korean is the language that I (Andrew) study!

Okay, let's access some of the elements of the array using bracket notation.

In [None]:
phoible.column('LanguageName')[0]

In [None]:
phoible.column('Phoneme')[0]

You may instead want the output to be another table. To do that, use `select(column_or_columns)`. (You may select one or more columns.) This is one way to *subset* a table.
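
One way to picture selecting is to think of a table as a set of named columns. The sketch below models a tiny hypothetical table as a Python dictionary mapping column names to NumPy arrays, and uses a hypothetical helper, `select_columns`, to keep only the requested columns. This illustrates the idea behind `select`; it is not the `datascience` API.

```python
import numpy as np

# Hypothetical mini-table: column name -> array of values
table = {
    'LanguageName': np.array(['LanguageA', 'LanguageA', 'LanguageB']),
    'Phoneme': np.array(['p', 't', 'p']),
    'InventoryID': np.array([1, 1, 2]),
}

def select_columns(tbl, *labels):
    """Hypothetical helper: return a new dict holding only the named columns."""
    return {label: tbl[label] for label in labels}

subset = select_columns(table, 'LanguageName', 'Phoneme')
print(list(subset))   # ['LanguageName', 'Phoneme']
```
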

In [None]:
phoible.select('LanguageName','Phoneme')

So what are we looking at here? All of the phonemes of many languages, along with their phonological attributes. How many languages are there? Definitely not 105,000. We can use the function `numpy.unique()` to find every unique value in a column, and then count those values.
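
To make the two steps explicit: `np.unique()` returns the distinct values (sorted), and `len()` counts them. Here is a minimal sketch on a small made-up array; the optional `return_counts=True` argument additionally reports how many times each value appears.

```python
import numpy as np

# Hypothetical column with repeated language names
names = np.array(['LanguageB', 'LanguageA', 'LanguageB', 'LanguageA', 'LanguageB'])

distinct = np.unique(names)   # sorted distinct values
print(distinct)               # ['LanguageA' 'LanguageB']
print(len(distinct))          # 2 distinct languages, out of 5 rows

values, counts = np.unique(names, return_counts=True)
print(dict(zip(values.tolist(), counts.tolist())))  # {'LanguageA': 2, 'LanguageB': 3}
```
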

We are going to call `numpy.unique()` using a shortcut you've already created by running the first cell: `np.unique()`. The argument it takes is an *array*. So why doesn't the following line work?

In [None]:
np.unique(phoible)

In [None]:
np.unique(phoible.column('LanguageName'))

And now we want to count the number of items in this array.

In [None]:
len(np.unique(phoible.column('LanguageName')))

For clarity's sake, we could first assign the array to a variable.

In [None]:
u = np.unique(phoible.column('LanguageName'))
len(u)

Functions and methods can be nested inside and chained onto one another in Python. The line below is hard to read at first glance...
```
print(np.unique(phoible.column('LanguageName')).tolist())
```
But if we replace the complex part with a variable and then break it down...
```
print(u.tolist())
```
We can see that it simply calls the method `.tolist()` on our object `u` (which is an array) and then prints the resulting list.

In [None]:
print(u.tolist())

Order of operations matters in Python! Be aware of where your parentheses start and end.
```
print(u).tolist()
```
... is very different. Try it.
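
The difference comes from what `print()` returns: it displays its argument but hands back `None`, so in `print(u).tolist()` the `.tolist()` method ends up being called on `None`. A minimal sketch, with a small NumPy array standing in for `u`:

```python
import numpy as np

u = np.array(['a', 'b'])       # stand-in for the array of language names

returned = print(u.tolist())   # prints ['a', 'b']; print itself returns None
print(returned is None)        # True

# Calling a method on None fails, which is what happens in print(u).tolist()
try:
    returned.tolist()
except AttributeError as err:
    print("AttributeError:", err)
```
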

In [None]:
print(u).tolist()

## Dropping columns
If you ever feel like you have too many columns in a table, you may remove one or more of them using the method `drop(column_or_columns)`. The column you want to drop is the argument:

In [None]:
phoible.drop('LanguageName')

Note that this method doesn't change the original table at all; it returns a new table. To keep the result, assign the new table to a variable:
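
NumPy behaves the same way for many operations. As an analogue (not `Table.drop` itself), `np.delete()` returns a new array with an element removed and leaves the original untouched; only assigning the result keeps it.

```python
import numpy as np

ar = np.array([1, 2, 3, 4])

np.delete(ar, 0)              # builds a new array without position 0, then discards it
print(ar)                     # [1 2 3 4]: unchanged

ar_new = np.delete(ar, 0)     # assign the result to keep it
print(ar_new)                 # [2 3 4]
```
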

In [None]:
phoible_new = phoible.drop('LanguageName')
phoible_new

## Selecting rows that satisfy a condition
But how can we look at specific rows? Let's use the method `where(column_name, value)` to select rows based on their value in a column. This method has two arguments: the name of the column as a string, followed by the value you want to match. Think of this method as telling Python to show you the part of the `Table` "where" `column_name` equals `value`.
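
Conceptually, selecting rows means checking every value in a column against the target and keeping the rows where the check is `True`. Here is a minimal NumPy sketch of that idea on hypothetical columns (an illustration, not the `datascience` implementation):

```python
import numpy as np

# Hypothetical columns of a small table; position i in each array belongs to row i
languages = np.array(['LanguageA', 'LanguageB', 'LanguageA', 'LanguageC'])
phonemes = np.array(['p', 't', 'k', 'm'])

# Compare every value in the column with the target: one True/False per row
is_target = languages == 'LanguageA'   # [ True False  True False]

# Keep only the rows where the comparison was True, in every column
print(languages[is_target])  # ['LanguageA' 'LanguageA']
print(phonemes[is_target])   # ['p' 'k']
```
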

In [None]:
selection = phoible.where('LanguageName', 'Spanish')

In [None]:
selection

The second argument should, of course, match the data type of the column you're searching; it's not always a string.
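
A quick illustration of why the type matters: in Python, the integer `75` and the string `'75'` are different values, so they never compare as equal. A search for `'75'` in a column of integers should therefore not be expected to find the row with `75`.

```python
# An integer and a string that look alike are still different values
match_str = (75 == '75')
print(75 == 75)               # True
print(match_str)              # False
print(type(75), type('75'))   # <class 'int'> <class 'str'>

# Converting one side to the other's type makes the comparison meaningful again
match_int = (75 == int('75'))
print(match_int)              # True
```
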

In [None]:
selection2 = phoible.where('InventoryID', 75)

In [None]:
selection2

Hm... I wonder how many phonemes are in the [Dakota language](https://en.wikipedia.org/wiki/Dakota_language)? How might I find the *length* of the "Phoneme" column?

In [None]:
len(np.unique(selection2.column('Phoneme')))

In [None]:
# Assuming that every phoneme gets its own row, we could also use the attribute .num_rows
selection2.num_rows

Let's do this the fancy way.

In [None]:
x = str(selection2.num_rows)  # number of rows, assuming one row per phoneme
print("There are " + x + " phonemes in the Dakota language.")

That's it for now! You will use PHOIBLE again for your homework assignment this week.