# Tables

The [datascience library](http://datascience.readthedocs.io/en/v0.8.1/) is a Python library developed at UC Berkeley. This add-on library lets you efficiently create, import, and manipulate data in tabular form and then plot it.

Because base Python includes neither tables nor plotting, we must import the `datascience` (for tables) and `matplotlib` (for plotting) add-on libraries.

In [None]:
from datascience import *

import matplotlib
%matplotlib inline
import matplotlib.pyplot as plots
plots.style.use('seaborn-poster')

We will soon define a variable named `book` that holds a Table. Just like strings and lists, Tables have methods. To see them, type a period and then press your tab key to view the list that pops up.

Place your cursor after `Table.` in the cell below and press the tab key to see the many methods available for Tables:

In [None]:
Table.

Here, we call `Table().with_columns` to build our table. Each column name, written in quotation marks `" "`, is immediately followed by a list in square brackets that contains that column's values. All of this is nested inside a set of round parentheses `( )` and square brackets `[ ]`.

Notice that we do not need to call the `print` function, but try it and see what happens!

In [None]:
book = Table().with_columns([
    "Chapter", [1,2,3,4,5,6,7,8],
    "Length", [4,13,21,44,56,36,21,12],
    "Setting", ["Paris", "Paris", "Tokyo", "Beijing", "New York", "Rome", "Paris", "Paris"]
])
print(type(book))
book

# Columns

Just as we can index strings and lists, we can pull a column of values out of a Table using the `.column` method:

In [None]:
# extract the first column by its index
book.column(0)

In [None]:
# extract a single column by its name
book.column("Chapter")

Or, you can simply use bracket notation:

In [None]:
book[0]

In [None]:
book["Length"]

Select multiple columns with `.select`

In [None]:
book.select("Chapter", "Length")

Count the number of columns with `.num_columns`

In [None]:
book.num_columns

# Rows

Count the number of rows with `.num_rows`

In [None]:
book.num_rows

Extract single rows with `.row`

In [None]:
book.row(4)
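
To see why index 4 gives the fifth chapter rather than the fourth, here is a minimal plain-Python sketch. It uses only built-in lists and `zip`, not the `datascience` API, and the names `chapter`, `length`, `setting`, and `rows` are illustrative.

```python
# Same data as `book`, stored as plain Python lists (one list per column).
chapter = [1, 2, 3, 4, 5, 6, 7, 8]
length = [4, 13, 21, 44, 56, 36, 21, 12]
setting = ["Paris", "Paris", "Tokyo", "Beijing", "New York", "Rome", "Paris", "Paris"]

# zip pairs up the i-th value of every column, giving one tuple per row.
rows = list(zip(chapter, length, setting))

# Indexing starts at 0, so index 4 is the fifth row.
fifth_row = rows[4]
print(fifth_row)
```

Counting from zero applies to both rows and columns, which is why `book.column(0)` returned the "Chapter" column earlier.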

`.take` also extracts rows, but it returns them as a new Table rather than as a single row:

In [None]:
book.take[0]

`.where` extracts rows that meet a condition. Let's select only the chapters whose "Length" is exactly 21 pages:

In [None]:
book.where("Length", are.equal_to(21))
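
As a rough picture of what this kind of filtering does, the sketch below keeps only the chapters whose length equals 21 using a plain list comprehension. It is an illustration in base Python, not how `datascience` implements `.where`, and the variable names are made up for the example.

```python
# Two columns from `book` as plain lists.
chapter = [1, 2, 3, 4, 5, 6, 7, 8]
length = [4, 13, 21, 44, 56, 36, 21, 12]

# Walk through the rows and keep the chapter number whenever the length is 21.
matching_chapters = [c for c, pages in zip(chapter, length) if pages == 21]
print(matching_chapters)
```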

# Adding new columns

Suppose that we know there are exactly 250 words per page in our `book`. We can even add a column that multiplies the number of pages in each chapter by 250 to produce the number of words per chapter. 

Let's create a new table named `book_words` so that we do not alter our original `book` table:

In [None]:
book_words = book.with_column("Words per chapter", book["Length"] * 250)
book_words

We can now ask questions such as:
1. How many chapters are in the book?  
2. How many pages are in the book?  
3. How many words are in the book?  

In [None]:
print("There are", book_words.num_rows, "chapters in the book.")
print("The number of pages in the book is:", sum(book_words[1]))
print(sum(book_words["Words per chapter"]), "is the number of words in the book.")
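
If you want to double-check the totals by hand, the same arithmetic can be sketched with plain lists. This assumes the 250 words per page stated above; `words_per_page` and the other names are chosen only for this example.

```python
# The "Length" column from `book` as a plain list of page counts.
length = [4, 13, 21, 44, 56, 36, 21, 12]
words_per_page = 250  # assumption stated in the text above

# Multiply every chapter's page count by 250, one element at a time.
words_per_chapter = [pages * words_per_page for pages in length]

total_pages = sum(length)
total_words = sum(words_per_chapter)
print(total_pages, total_words)
```

Because every chapter is scaled by the same factor, the total word count is simply the total page count times 250.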

# Relabeling column names

We can relabel column names using `.relabeled`, which returns a new table and leaves `book_words` unchanged:

In [None]:
book_words.relabeled("Setting", "SETTING")

# Sorting data

Use the `.sort` method to sort your data! Include the optional argument `descending=True` to sort in descending order; without it, the data is sorted in ascending order.

In [None]:
book.sort("Length", descending=True)
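
For comparison, here is a hedged base-Python sketch of sorting rows by one column in descending order with the built-in `sorted`. It illustrates the idea only; how `datascience` orders tied values (the two 21-page chapters) is not something this sketch claims to reproduce.

```python
# (chapter, length) pairs built from the `book` data.
chapter = [1, 2, 3, 4, 5, 6, 7, 8]
length = [4, 13, 21, 44, 56, 36, 21, 12]
rows = list(zip(chapter, length))

# Sort by the second item of each pair (the length), largest first.
rows_desc = sorted(rows, key=lambda row: row[1], reverse=True)

lengths_desc = [row[1] for row in rows_desc]
print(lengths_desc)
```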

# Frequency tables

Use the `.group` method to create frequency tables. 

In [None]:
book.group("Setting")
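
A frequency table is just a count of how often each value appears. The sketch below computes the same counts with `collections.Counter` from the standard library; it is an illustration of the idea rather than the `datascience` implementation.

```python
from collections import Counter

# The "Setting" column from `book` as a plain list.
setting = ["Paris", "Paris", "Tokyo", "Beijing", "New York", "Rome", "Paris", "Paris"]

# Counter tallies how many times each distinct value occurs.
setting_counts = Counter(setting)
print(setting_counts)
```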

You can also use `.pivot` to create pivot tables.

In [None]:
book_words.pivot("Setting", "Words per chapter")
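
A pivot table like this one cross-tabulates two columns: each cell counts the rows that share one value from each column. The base-Python sketch below assumes that counting behavior (which is what `.pivot` does when no values column is given, as far as this tutorial uses it) and tallies (words, setting) pairs with a `Counter`.

```python
from collections import Counter

# Columns from `book_words` as plain lists.
length = [4, 13, 21, 44, 56, 36, 21, 12]
setting = ["Paris", "Paris", "Tokyo", "Beijing", "New York", "Rome", "Paris", "Paris"]
words_per_chapter = [pages * 250 for pages in length]

# Count how many rows have each (words, setting) combination.
pair_counts = Counter(zip(words_per_chapter, setting))

# 5250 words occurs twice: once in Tokyo and once in Paris.
print(pair_counts[(5250, "Tokyo")], pair_counts[(5250, "Paris")])
```

Combinations that never occur, such as 1000 words in Tokyo, have a count of 0.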

# Challenge 1

1. Create a table that has 3 columns and 8 rows. 
2. Which Table methods might you use to verify that your Table has 3 columns and 8 rows?
3. What methods can you use to extract columns? To extract rows?
4. Add a new, fourth column to your table! 
5. Subset this table to include only two columns. 

In [None]:
## YOUR CODE HERE

# Visualizing your data

We will begin by using the `matplotlib` Python library to plot data from our `datascience` Tables. 

`plots.style.available` will give you a list of stock styles to customize your plots. Scroll back up to the import cell near the top and see that we are using 'seaborn-poster' due to its classic look.

In [None]:
plots.style.available

# Histogram

We can plot one numeric variable as a histogram, using `.hist`, to view its distribution.

We might ask the question: how are chapter lengths (in pages) distributed across the book?

In [None]:
book_words.hist("Length")

Change the number and position of the bins using the `bins = range()` argument. Change the plot height and width using the `height =` and `width =` arguments:

In [None]:
book.hist("Length", bins = range(30,60), height = 2, width=2)
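
It helps to check which chapters can actually appear in this plot. `range(30, 60)` produces bin edges 30, 31, ..., 59, so the sketch below lists the lengths that fall between the first and last edge. It assumes that values outside the bin range are left out of the histogram, which is how NumPy-style binning behaves.

```python
# The "Length" column from `book` as a plain list.
length = [4, 13, 21, 44, 56, 36, 21, 12]

# The bin edges passed to hist: 30, 31, ..., 59.
bin_edges = list(range(30, 60))
number_of_bins = len(bin_edges) - 1  # each bin sits between two neighboring edges

# Keep only the lengths that lie between the first and last edge.
inside = [pages for pages in length if bin_edges[0] <= pages <= bin_edges[-1]]
print(number_of_bins, inside)
```

Only three of the eight chapters fall inside these bins, so widen the range if you want every chapter represented.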

# Bar plot

We can also use bar plots to visualize two variables using `.bar` and `.barh`

We might ask a question such as: how can we visualize the length of each chapter relative to the other chapters?

In [None]:
book_words.bar("Chapter", "Length")

In [None]:
book.barh("Chapter", "Length")

# Scatter plot

Scatterplots are useful when we want to visualize two numeric variables. 

Ask a question such as: what is the relationship between the number of pages in each chapter ("Length") and the number of words per chapter ("Words per chapter")?

Why is this relationship positive and linear? (Hint: think back to how we computed the number of words per chapter from the chapter length!)

In [None]:
book_words.scatter("Length", "Words per chapter", fit_line=False)
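
One way to convince yourself that the points lie on a straight line is to divide each chapter's word count by its page count. In the sketch below (plain Python, illustrative names) every ratio comes out the same, which is exactly the constant slope of the line.

```python
# Columns from `book_words` as plain lists.
length = [4, 13, 21, 44, 56, 36, 21, 12]
words_per_chapter = [pages * 250 for pages in length]

# Words divided by pages for each chapter; a set keeps only the distinct values.
slopes = {words / pages for words, pages in zip(words_per_chapter, length)}
print(slopes)
```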

# Boxplot

Boxplots are useful when we want to visualize the distribution of a variable. 

For example, we might want to see how "Words per chapter" are distributed.

**NOTE**: look at how we are now using _two_ periods (methods) within a single line of code: 
1. `select` - to select the column we want to plot, and
2. `boxplot` - the way we want to visualize our data!

In [None]:
book_words.select("Words per chapter").boxplot()
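
A boxplot summarizes a column by its quartiles. As a hedged sketch, the numbers behind the box can be computed with `statistics.quantiles` from the standard library (Python 3.8 or later). Quartile conventions differ between tools; `method="inclusive"` is an assumption made here, so the plotted box may differ slightly if the plotting library uses another convention.

```python
import statistics

# The "Words per chapter" column as a plain list.
length = [4, 13, 21, 44, 56, 36, 21, 12]
words_per_chapter = [pages * 250 for pages in length]

# Split the data into four equal parts; the three cut points are Q1, median, Q3.
q1, median, q3 = statistics.quantiles(words_per_chapter, n=4, method="inclusive")
print(q1, median, q3)
```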

# Line plot

Line plots can help us in cases such as looking at change over time. 

We might ask: how does the number of words per chapter change from chapter 1 to chapter 8?

In [None]:
book_words.select("Chapter", "Words per chapter").plot("Chapter","Words per chapter")

# Importing .csv files

Comma-separated values (.csv) files are a common way to store data, and they look like a basic spreadsheet. Made-up data such as our `book` table is great for teaching exercises, but in real-life research we never fabricate data.

Instead, we design collection protocols and record data in spreadsheets, but then we need a way to import it into Python so we can do the manipulations you have learned so far. 

Fortunately, `datascience` Tables have a neat `.read_table()` method that allows us to load data from files.

In [None]:
#load the "iris" .csv file
iris = Table.read_table("iris.csv")
print(type(iris))
iris
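
If you have never opened a .csv file, the sketch below shows what one looks like inside. It writes the first three rows of our `book` table as comma-separated text and reads them back with Python's built-in `csv` module. This is only an illustration of the file format, not a description of how `Table.read_table` works internally; the text is held in memory with `io.StringIO`, so no file is needed.

```python
import csv
import io

# The first three rows of `book` written as CSV text: a header line, then one line per row.
csv_text = "Chapter,Length,Setting\n1,4,Paris\n2,13,Paris\n3,21,Tokyo\n"

# DictReader uses the header line as the keys for every row.
reader = csv.DictReader(io.StringIO(csv_text))
records = list(reader)

# The csv module reads every value as a string, so convert numbers explicitly.
lengths = [int(record["Length"]) for record in records]
print(records[0], lengths)
```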

# Challenge 2

Using the iris dataset and your help files:  
1. Create a frequency table that shows how many observations (rows) there are of each species.  
2. Define a new variable called `large_petal_lengths` that contains only the rows whose Petal_Length is greater than or equal to 4.6.
3. Define a new variable named `setosa` that contains only the setosa rows from the Species column. 
4. Create a new column in `iris` named Petal_Area that contains the product of Petal_Length and Petal_Width.
5. Create overlaid histograms for "Petal_Length" for each of the three species. 
6. Make boxplots for "Petal_Length", "Petal_Width", "Sepal_Length", and "Sepal_Width" columns. 
7. Create a scatterplot of `iris` Petal_Length versus Petal_Width. Color each point of this scatterplot according to "Species".
8. What can you say about the relationship between Petal_Length and Petal_Width for the three species?