# Columns and Rows

In [None]:
from datascience import *
from cs104 import *
import numpy as np

%matplotlib inline

## 1. Table Review: Hopkins Forest Tree Surveys

[Hopkins Forest tree survey](https://web.williams.edu/wp-etc/biology/hmfplot/index.php)

![](https://hmf.williams.edu/files/foliage.jpg)

![](https://web.williams.edu/wp-etc/biology/hmfplot/image/grid_index.jpg)




In [None]:
trees = Table().read_table('data/hopkins-plot-0011.csv')
trees

In [None]:
# Use the str function from last time to turn the numbers into text!
print("This table has " + str(trees.num_rows) + " rows and " + str(trees.num_columns) + " columns")

**Review Table operations**

In [None]:
trees.sort("count", descending=True)

In [None]:
trees.sort("count", descending=True).sort("genus", distinct=True)

In [None]:
maples = trees.where("common name", are.containing("Maple"))
maples

### Quick Array review

In [None]:
maple_counts = maples.column("count")
maple_counts

In [None]:
sum(maple_counts)

In [None]:
maple_counts.item(0)

In [None]:
maple_counts.item(2)

### Visualization

Let's explore the data with a couple of plots.

In [None]:
trees.barh('common name', 'count')

In [None]:
trees.sort('count', descending=True).barh('common name', 'count')

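A quick method chaining example. Each method returns a new table, so the next method can be called on it directly. The cell after it does the same steps with intermediate variables.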

In [None]:
trees.sort('count', descending=True).where('common name', are.containing('Maple')).barh('common name', 'count')

In [None]:
sorted_trees = trees.sort('count', descending=True)
maples = sorted_trees.where('common name', are.containing('Maple'))
maples.barh('common name', 'count')

Select columns. 

In [None]:
trees.select("common name", "count")

Q: Return the first 3 species names in alphabetical order.

In [None]:
species = trees.select("species").sort("species", descending=False).take(make_array(0,1,2))
species

What if we want the top 10? 20? 30?

## 2. Numpy methods

**Numpy** is a package for numerical computing in Python.

We will use numpy methods throughout this course to help us understand trends in data. 

In [None]:
# In this class, we will always import numpy the same way 
import numpy as np

### Creating ranges and take

What if I wanted the top 50?  `make_array(0,1,2,...,49)`?  Ugh.
We can make an array for a *range* of numbers with `np.arange(low, high)`, which gives the integers from `low` up to, but not including, `high` (the half-open range `[low, high)`).

In [None]:
np.arange(0, 3)

In [None]:
np.arange(0, 50)
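`np.arange` follows Python's half-open convention: the start value is included and the stop value is not, so `np.arange(low, high)` has `high - low` elements. With a single argument, the start defaults to 0. A minimal sketch checking both facts (the variable names are ours):

```python
import numpy as np

first_three = np.arange(0, 3)      # 0, 1, 2: the stop value 3 is excluded
first_fifty = np.arange(0, 50)     # 0 through 49

# The number of elements is always high - low.
print(len(first_three), len(first_fifty))   # 3 50

# With one argument, the range starts at 0, so these are the same integers.
print(np.array_equal(np.arange(3), np.arange(0, 3)))   # True

# The largest value is high - 1, which is why np.arange(50) gives row indices 0..49.
print(first_fifty[-1])   # 49
```

So `species.take(np.arange(10))` gives the first 10 rows, and switching to 20 or 30 means changing a single number.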

In [None]:
first3 = species.take(np.arange(0, 3))
first3

In [None]:
first3 = species.take(np.arange(3))
first3

Why not just use `show`? `show` only displays rows; it doesn't create a new table, so there is nothing to store in a variable or compute with.

In [None]:
other_first3 = species.show(3)

In [None]:
other_first3   # holds None: show only displays the table, it doesn't return one
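The difference between `take` and `show` is the general difference between a function that *returns* a value and one that only *displays* something. Here is a plain-Python sketch of that idea; `show_rows` and `take_rows` are hypothetical helpers standing in for `show` and `take`, and the names in the list are made-up example data.

```python
def show_rows(rows, n):
    """Hypothetical display-only helper: prints the first n rows and returns nothing."""
    for row in rows[:n]:
        print(row)


def take_rows(rows, n):
    """Hypothetical helper that builds and returns a new list of the first n rows."""
    return rows[:n]


species_names = ["alba", "rubrum", "saccharum", "strobus"]  # made-up example data

shown = show_rows(species_names, 3)   # prints three names to the screen
taken = take_rows(species_names, 3)   # keeps three names in a new list

print(shown)   # None: a function with no return statement returns None
print(taken)   # ['alba', 'rubrum', 'saccharum']
```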

### New numpy methods

We can measure how much the radius of a tree grows in a given year by measuring the width of the tree ring for that year:

![](https://media.istockphoto.com/id/1135929210/photo/wooden-cross-section-detail-wood-background.jpg?s=612x612&w=0&k=20&c=gL0o1C0NdLjvbp4_3AZXwSQPIMTP3xrjr-Y67PamiBU=)

Suppose we have the ring widths (in mm) for a tree for five years. Let's store this in an **array**. 

In [None]:
ring_widths = make_array(3, 2, 1, 1, 3)
ring_widths

Q: What was the total growth? 

In [None]:
np.sum(ring_widths)

In [None]:
mean_width = np.mean(ring_widths)
mean_width
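The mean is just the total divided by the number of values. A quick check with the same five widths, using `np.array` in place of `make_array` (which builds the same kind of NumPy array) so the sketch runs without the `datascience` package:

```python
import numpy as np

ring_widths = np.array([3, 2, 1, 1, 3])   # ring widths in mm

total = np.sum(ring_widths)               # 3 + 2 + 1 + 1 + 3 = 10
count = len(ring_widths)                  # 5 years of growth
mean_width = np.mean(ring_widths)

print(total, count, mean_width)           # 10 5 2.0
print(mean_width == total / count)        # True
```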

Q: How did the ring width change from year to year?

In [None]:
np.diff(ring_widths)
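`np.diff` subtracts each element from the one after it, so an array of *n* values produces *n* - 1 differences. A sketch comparing it with the same computation written as an explicit loop:

```python
import numpy as np

ring_widths = np.array([3, 2, 1, 1, 3])   # ring widths in mm

changes = np.diff(ring_widths)            # element i is ring_widths[i + 1] - ring_widths[i]

# The same year-to-year changes computed one pair at a time.
manual = []
for i in range(len(ring_widths) - 1):
    manual.append(ring_widths[i + 1] - ring_widths[i])

print(changes)        # [-1 -1  0  2]
print(len(changes))   # 4: one fewer than the 5 widths
```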

Q: Treating each ring width as a radius, compute the area of a circle with that radius, rounded to the nearest whole number of mm^2.

In [None]:
np.round(np.pi * ring_widths**2)
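The cell above squares each ring width as if it were the tree's full radius. To estimate the area actually added each year, the radius has to accumulate: each year's radius is the previous radius plus that year's ring width. A sketch of that calculation, assuming the radius was 0 before the first of these five rings and that the widths are listed oldest first:

```python
import numpy as np

ring_widths = np.array([3, 2, 1, 1, 3])   # ring widths in mm, oldest first (assumed)

# Assumption: radius is 0 before the first ring, so the radius at the end of
# each year is the running total of the widths so far.
radii = np.cumsum(ring_widths)             # [ 3  5  6  7 10]

# Cross-sectional area (mm^2) of the trunk at the end of each year.
areas = np.pi * radii**2

# Area added each year: the first year's area, then year-to-year differences.
# prepend=0 treats the area before the first ring as 0.
area_added = np.diff(areas, prepend=0)

print(np.round(area_added))   # about 28, 50, 35, 41 and 160 mm^2
```

Notice that the last ring, also 3 mm wide like the first, adds far more area because it wraps around a much larger trunk.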

### Think-pair-share: Proportion of Each Maple Species

In [None]:
trees.show()

Q: For each maple species, what proportion of the total count (across all species) does it make up?

In [None]:
counts = trees.column("count")
counts

In [None]:
total_count = sum(counts)
total_count 

In [None]:
maples = trees.where('genus', 'Acer')
maples

There will be an error in this next one. Why? 

In [None]:
maple_counts = maples.select("count")
proportion =  maple_counts / total_count
proportion

In [None]:
maple_counts = maples.column("count")
maple_counts

In [None]:
maple_proportions = maple_counts / total_count
maple_proportions

Striped maples are 13%... sugar maples are only 1%.

Q: Why use array broadcasting? 

Takeaway: Array broadcasting saves you work! Dividing an array by a single number divides every element by that number, so you don't have to apply the same conversion over and over with a loop.
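Here is what broadcasting replaces. A sketch with made-up counts for three species: the loop divides one count at a time, while `counts / total` divides every element of the array by the same number in a single expression.

```python
import numpy as np

counts = np.array([12, 3, 5])    # made-up counts for three species
total = np.sum(counts)           # 20

# Without broadcasting: divide each count separately and collect the results.
loop_proportions = []
for c in counts:
    loop_proportions.append(c / total)

# With broadcasting: one expression divides every element by total.
proportions = counts / total

print(proportions)                                   # 0.6, 0.15 and 0.25
print(np.allclose(proportions, loop_proportions))    # True
```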

### More Questions...

What is the total proportion of maples in the plot?

In [None]:
sum(maple_proportions)

What is the proportion of non-maples?

In [None]:
1 - sum(maple_proportions)

What is the greatest proportion of any species in our plot?

In [None]:
max(trees.column('count') / total_count)
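`max` tells us the largest proportion but not which species it belongs to. One way to get both, sketched here with plain NumPy arrays and made-up names and counts, is `np.argmax`, which returns the *position* of the largest value; that position can then index the names array.

```python
import numpy as np

names = np.array(["Species A", "Species B", "Species C"])   # hypothetical names
counts = np.array([12, 3, 5])                                # hypothetical counts

proportions = counts / np.sum(counts)

i = np.argmax(proportions)        # position of the largest proportion (0 here)
print(names[i], proportions[i])   # Species A 0.6
```

With a Table, sorting by `count` in descending order and looking at the first row answers the same question.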

## 3. Creating a Table from Scratch

Premise: Suppose you find some really interesting facts online, for example, the [list of the world's largest giant sequoia trees](https://www.nps.gov/seki/learn/nature/largest-trees-in-world.htm).

![](https://www.nps.gov/common/uploads/cropped_image/primary/1BF87320-E487-28A4-8E0F241A813FA447.jpg?width=600&quality=90&mode=crop)

Sometimes you may want to take the data you're viewing and type it into your Python code yourself. Let's make a table from scratch (rather than reading a `.csv` file) using arrays and the `.with_columns()` method.

In [None]:
names = make_array('General Sherman', 'General Grant', 'President')
trunk_volume = make_array(52508, 46608, 45148)

In [None]:
big_trees = Table().with_columns('Name', names)
big_trees

You can extend existing Tables with new arrays.

In [None]:
big_trees = big_trees.with_columns('Trunk Volume',trunk_volume)
big_trees

We can also create Tables with multiple arrays at the same time.  

In [None]:
big_trees2 = Table().with_columns('Name', names, 
                                 'Trunk Volume', trunk_volume)
big_trees2

### Table info

In [None]:
big_trees.labels

In [None]:
big_trees.num_rows

In [None]:
big_trees.num_columns

### Relabeling columns


In [None]:
big_trees.relabeled('Trunk Volume', 'Trunk (cubic ft)')

Recall: Table methods like `relabeled` return a new table and leave the original unchanged. If we want the result to persist, we have to reassign the variable.

In [None]:
big_trees

In [None]:
big_trees = big_trees.relabeled('Trunk Volume', 'Trunk (cubic ft)')
big_trees
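Python strings behave the same way, which can make the rule easier to remember: a method like `upper` returns a new string and leaves the original alone, so the result is kept only if you assign it to a name.

```python
name = "Trunk Volume"

name.upper()         # returns "TRUNK VOLUME", but nothing stores it
print(name)          # Trunk Volume: the original is unchanged

name = name.upper()  # reassign to keep the new value
print(name)          # TRUNK VOLUME
```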

### Adding columns


How much do these tree trunks weigh? We can estimate that by assuming their trunks weigh about [63 lbs per cubic foot](https://www.spikevm.com/list/green-weight.php).

In [None]:
weights = big_trees.column('Trunk (cubic ft)') * 63
big_trees = big_trees.with_columns('Trunk Weight (lbs)', weights)
big_trees
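The same multiplication with a plain NumPy array, as a check on the numbers in the table. The conversion to short tons (2,000 lbs each) is our addition for readability, and the weights inherit the rough 63 lbs per cubic foot assumption above.

```python
import numpy as np

trunk_volume = np.array([52508, 46608, 45148])   # cubic ft, from the table above
lbs_per_cubic_ft = 63                            # density estimate used above

weights_lbs = trunk_volume * lbs_per_cubic_ft    # broadcasting: every volume times 63
weights_tons = weights_lbs / 2000                # short tons, 2,000 lbs each

print(weights_lbs)    # [3308004 2936304 2844324]
print(weights_tons)   # roughly 1654, 1468 and 1422 tons
```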

## What other quantitative questions can we ask about this dataset?
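A couple of examples, sketched with the names and trunk volumes from the table as plain NumPy arrays: how much bigger is General Sherman than each tree, and what is the average volume of the three?

```python
import numpy as np

names = np.array(['General Sherman', 'General Grant', 'President'])
trunk_volume = np.array([52508, 46608, 45148])   # cubic ft

largest = np.max(trunk_volume)

# How many more cubic feet General Sherman has than each tree (0 for itself).
difference = largest - trunk_volume               # [   0 5900 7360]

# How many times larger General Sherman is than each tree.
ratio = np.round(largest / trunk_volume, 2)       # about 1.0, 1.13 and 1.16

average = np.mean(trunk_volume)                   # 48088.0

for name, d, r in zip(names, difference, ratio):
    print(name, d, r)
print("Average trunk volume:", average)
```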