# Selecting and Updating Data

## Slicing and indexing trees
Imagine you are a researcher working with data from New York City's tree census. Each row of the `tree_census` 2D array lists information for a different tree: the tree ID, block ID, trunk diameter, and stump diameter, in that order. Living trees do not have stump diameters, which explains why there are so many zeros in that column. Column order is important because a plain NumPy array does not have column names! The first and last three rows of `tree_census` are shown below.
```
array([[     3, 501451,     24,      0],
       [     4, 501451,     20,      0],
       [     7, 501911,      3,      0],
       ...,
       [  1198, 227387,     11,      0],
       [  1199, 227387,     11,      0],
       [  1210, 227386,      6,      0]])
```

In this exercise, you'll be working specifically with the second column, representing block IDs: your research requires you to select specific city blocks for further analysis using NumPy slicing and indexing. `numpy` is loaded as `np`, and the `tree_census` 2D array is available.

* Select all rows of data from the second column, representing block IDs; save the resulting array as `block_ids`.
* Print the first five block IDs from block_ids.

In [2]:
import numpy as np

tree_census = np.load('datasets/tree_census.npy')

In [3]:
# Select all rows of block ID data from the second column
block_ids = tree_census[:, 1]

# Print the first five block_ids
print(block_ids[:5])

[501451 501451 501911 501911 501911]


* Select the tenth block ID from `block_ids`, saving the result as `tenth_block_id`.

In [5]:
# Select the tenth block ID from block_ids
tenth_block_id = block_ids[9]
print(tenth_block_id)

501911


* Select five consecutive block IDs from `block_ids`, starting with the tenth ID, and save as `block_id_slice`.

In [6]:
# Select five block IDs from block_ids starting with the tenth ID
block_id_slice = block_ids[9:14]
print(block_id_slice)

[501911 501911 501911 501909 501909]


No mental block for you! Well done. Notice how the slicing and indexing syntax for a 1D array is exactly as it would be if you were working with a Python list!
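
To see the same selections on something small enough to check by eye, here is a minimal sketch. The `toy_census` array and its values are made up for illustration and are not rows from the real census.

```python
import numpy as np

# Hypothetical stand-in for tree_census:
# tree ID, block ID, trunk diameter, stump diameter
toy_census = np.array([
    [1, 100, 5, 0],
    [2, 100, 7, 0],
    [3, 200, 0, 9],
    [4, 200, 3, 0],
    [5, 300, 12, 0],
])

# All rows, column index 1: a 1D array of block IDs
toy_block_ids = toy_census[:, 1]

# 1D indexing and slicing use the same syntax as a Python list
third_block_id = toy_block_ids[2]
middle_block_ids = toy_block_ids[1:4]

print(toy_block_ids.shape)  # (5,)
print(third_block_id)       # 200
print(middle_block_ids)     # [100 200 200]
```

One difference from lists is worth keeping in mind: a basic NumPy slice is a view of the original data, so changing it in place also changes the array it came from.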

## Stepping into 2D
Now assume that your research requires you to take an admittedly unrepresentative sample of trunk diameters, which are located in the third column of `tree_census`. Getting just a selection of trunk diameters can be done with NumPy's slicing and stepping functionality.

* Create an array called `hundred_diameters` which contains the first 100 trunk diameters in `tree_census`.

In [7]:
# Create an array of the first 100 trunk diameters from tree_census
hundred_diameters = tree_census[:100, 2]
print(hundred_diameters)

[24 20  3  3  4  4  4  4  4  3  3  4  2  2  3  4  4  4  0 14  3  4  7  8
  7  8  7  5  6  5  5 17  0 19 21 18  4  5  3  4  3  4 13 13 13  5  4  4
  4 11  5  4  5  8 51  7  4 15  3  8  6  6  3  4  3  2  3  3  6  5  5  5
  5  9  4  4  7  7  6  5  4  4  5  5  5  7  3  5  3  3  6  6  8  7  4  5
  4  4  4  4]


* Create an array, `every_other_diameter`, which contains only every other trunk diameter for trees with row indices from 50 to 100, inclusive.

In [8]:
# Create an array of trunk diameters with even row indices from 50 to 100 inclusive
every_other_diameter = tree_census[50:101:2, 2]
print(every_other_diameter)

[ 5  5 51  4  3  6  3  3  3  6  5  5  4  7  6  4  5  5  3  3  6  8  4  4
  4  6]


Look at that slicing! Great work. Notice how you only sliced the rows. In terms of columns, you just indicated that you were working with the column at index two: trunk diameter.
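
The `start:stop:step` pattern is easier to verify on a tiny array. The sketch below uses hypothetical arrays (`toy_diameters`, `toy_2d`) whose values are chosen so that the selected row indices are obvious from the output.

```python
import numpy as np

# Hypothetical diameters: the value at index i is simply i
toy_diameters = np.arange(12)

# start:stop:step, where stop is exclusive, so stop=9 includes index 8
every_other = toy_diameters[2:9:2]
print(every_other)  # [2 4 6 8]

# Same idea in 2D: slice the rows, pick one column by its index
toy_2d = np.column_stack((np.arange(12), np.arange(12) * 10))
every_other_col = toy_2d[2:9:2, 1]
print(every_other_col)  # [20 40 60 80]
```

This is why the exercise uses `50:101:2`: the stop value has to be one past the last row index you want to keep.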

## Sorting trees
Sometimes it's easiest to understand data when it is sorted according to the value you are most interested in. Your new research task is to create an array containing the trunk diameters in the New York City tree census, sorted in order from smallest to largest.

* Create an array called `sorted_trunk_diameters` which selects only the trunk diameter column from `tree_census` and sorts it so that the smallest trunk diameters are at the top of the array and the largest at the bottom.

In [9]:
# Extract trunk diameters information and sort from smallest to largest
sorted_trunk_diameters = np.sort(tree_census[:, 2])
print(sorted_trunk_diameters)

[ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
  0  0  0  0  0  0  0  0  0  0  0  0  1  1  1  1  1  1  2  2  2  2  2  2
  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2
  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2
  2  2  2  2  2  2  2  2  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3
  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3
  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3
  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  4  4  4  4
  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4
  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4
  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4
  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4
  4  4  4  4  4  4  4  4  5  5  5  5  5  5  5  5  5  5  5  5  5  5  5  5
  5  5  5  5  5  5  5  5  5  5  5  5  5  5  5  5  5

Way to get to the root of that tree problem! Notice that `np.sort()` always sorts in ascending order and has no argument for sorting in descending order.
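
If you do need largest-to-smallest order, one common approach is to reverse the sorted result with a negative step. The values in `toy_diameters` below are hypothetical.

```python
import numpy as np

toy_diameters = np.array([4, 0, 51, 3, 7])  # hypothetical trunk diameters

# np.sort returns a sorted copy; the original array is left unchanged
ascending = np.sort(toy_diameters)

# Reverse the ascending result to get descending order
descending = np.sort(toy_diameters)[::-1]

print(ascending)   # [ 0  3  4  7 51]
print(descending)  # [51  7  4  3  0]
```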

## Filtering with masks
In the last lesson, you sorted trees from smallest to largest. Now, you'll use fancy indexing (here, indexing with a Boolean mask) to return the row of data representing the largest tree in `tree_census`. You'll also examine other trees located on the same block as the largest tree: are they also large?

* Using Boolean indexing, create an array, `largest_tree_data`, which contains the row of data on the largest tree in `tree_census` corresponding to the tree with a diameter of 51.

In [10]:
# Create an array which contains row data on the largest tree in tree_census
largest_tree_data = tree_census[tree_census[:, 2] == 51]
print(largest_tree_data)

[[    61 501882     51      0]]


* Slice `largest_tree_data` to retrieve only the block id of the block the largest tree is located on; save this block id as `largest_tree_block_id`.

In [11]:
# Slice largest_tree_data to get only the block id
largest_tree_block_id = largest_tree_data[:, 1]
print(largest_tree_block_id)

[501882]


* Using fancy indexing, create an array called `trees_on_largest_tree_block` which contains data on all trees with the same block ID as the largest tree.

In [12]:
# Create an array which contains row data on all trees with largest_tree_block_id
trees_on_largest_tree_block = tree_census[tree_census[:, 1] == largest_tree_block_id]
print(trees_on_largest_tree_block)

[[    60 501882      8      0]
 [    61 501882     51      0]
 [    62 501882      7      0]
 [    63 501882      4      0]
 [    64 501882     15      0]
 [    65 501882      3      0]
 [    66 501882      8      0]
 [    67 501882      6      0]
 [    68 501882      6      0]
 [    69 501882      3      0]]


So fancy! Great job. Based on your work, it looks like the largest tree in `tree_census` is the only really big tree on its block.
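
It can help to look at the mask itself before using it as an index. The sketch below works on a made-up `toy_census` and, as an assumption that differs from the exercise, finds the largest diameter with `.max()` instead of hard-coding the value.

```python
import numpy as np

# Hypothetical stand-in for tree_census:
# tree ID, block ID, trunk diameter, stump diameter
toy_census = np.array([
    [1, 100, 5, 0],
    [2, 100, 7, 0],
    [3, 200, 30, 0],
    [4, 200, 3, 0],
])

# The comparison produces one True/False value per row
is_largest = toy_census[:, 2] == toy_census[:, 2].max()
print(is_largest)  # [False False  True False]

# Indexing with the mask keeps only the rows where it is True
largest_row = toy_census[is_largest]
print(largest_row)  # [[  3 200  30   0]]

# Reuse the block ID from that row to select its neighbours
same_block = toy_census[toy_census[:, 1] == largest_row[0, 1]]
print(same_block.shape)  # (2, 4)
```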

## Fancy indexing vs. np.where()
You and your tree research team are double-checking collection data by visiting a few trees in person to confirm their measurements. You've been assigned to check the data for trees on block 313879, and you'd like to make a small array of just the tree data that relates to your work.

* Using fancy indexing, create an array called `block_313879` which only contains data for trees with a block ID of 313879.

In [13]:
# Create the block_313879 array containing trees on block 313879
block_313879 = tree_census[tree_census[:, 1] == 313879]
print(block_313879)

[[  1115 313879      3      0]
 [  1116 313879     17      0]]


* Using `np.where()`, create an array of `row_indices` for trees with a block ID of 313879.
* Using `row_indices`, create `block_313879`, which contains data for trees on block 313879.

In [14]:
# Create an array of row_indices for trees on block 313879
row_indices = np.where(tree_census[:, 1] == 313879)

# Create an array which only contains data for trees on block 313879
block_313879 = tree_census[row_indices]
print(block_313879)

[[  1115 313879      3      0]
 [  1116 313879     17      0]]


Where'd you get those np.where() skills!? Great filtering. You probably noticed that fancy indexing is more elegant than np.where() in this example. That's because we haven't really tapped into the power of np.where() yet. It's most useful for finding indices and then using that location information to update an array. We'll see an example of this in the next exercise, and also in the next lesson, where one of the functions takes indices as arguments!
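
To make the comparison concrete, the sketch below inspects what `np.where()` returns when it is given only a condition. The `toy_block_ids` array is hypothetical.

```python
import numpy as np

toy_block_ids = np.array([100, 100, 200, 200, 300])  # hypothetical block IDs

# With only a condition, np.where returns a tuple
# holding one index array per dimension of the input
indices = np.where(toy_block_ids == 200)
print(type(indices).__name__)  # tuple
print(indices[0])              # [2 3]

# Indexing with that tuple or with the Boolean mask selects the same elements
by_indices = toy_block_ids[indices]
by_mask = toy_block_ids[toy_block_ids == 200]
print(np.array_equal(by_indices, by_mask))  # True
```

The mask is shorter to write, but the index array is what you need when a later step asks for positions rather than values.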

## Creating arrays from conditions
Currently, the stump diameter and trunk diameter values in `tree_census` are in two different columns. Living trees have a stump diameter of zero while stumps have a trunk diameter of zero. If you'd like to include both living trees and stumps in certain research calculations, it might be useful to have their diameters together in just one column.

* Create and print a 1D array called `trunk_stump_diameters`, which replaces a tree's trunk diameter with its stump diameter if the trunk diameter is zero.



In [15]:
# Create and print a 1D array of tree and stump diameters
trunk_stump_diameters = np.where(tree_census[:, 2] == 0, tree_census[:, 3], tree_census[:, 2])
print(trunk_stump_diameters)

[24 20  3  3  4  4  4  4  4  3  3  4  2  2  3  4  4  4  3 14  3  4  7  8
  7  8  7  5  6  5  5 17 31 19 21 18  4  5  3  4  3  4 13 13 13  5  4  4
  4 11  5  4  5  8 51  7  4 15  3  8  6  6  3  4  3  2  3  3  6  5  5  5
  5  9  4  4  7  7  6  5  4  4  5  5  5  7  3  5  3  3  6  6  8  7  4  5
  4  4  4  4  6  5  3  4 12 12 12  5  6  6  6  6  6  5  5  6  7  7 25  5
  5  4  6  6  7 11  6 17 13 14 14 20 15 13  7  7 10 17 14  4  6  7  8  7
  7  6  7  5  2  2  2  2 26 25  2 15  6 20  5  9 15 13 15  3  2 13  6 12
 15 18 22 18 18 15 17  7  3  7  8  4 12 11 12  3  9 12 11 10  8  6  6  7
  7  3 15 12 12  4  5  5  5  4  4  5  4  9  2  4  4  6  5  5  2  5  5  4
  4  5  5  6 11  4  5  7  3 14 11 10  7 15 10  5  6 10 10  6  5  4  4  3
  5  4 14 12 11  8 14 12  9 12 11  7  8 10 10 12 11 12  5  5  6  9  9  8
  5  5  5  6  6 12 12 11 12  8  9  5  5  5  8  2  2  2 14 18 14 14 22 15
 19 14 18  7  7  7  8  8  5 10 14  2  2  2  2 11 12 12  3  3  3  3  3  6
  6  8  2  2 11 11 11  9 11 12 13  9 11  6  4  5  5

Looks fantastic! But this is just a 1D array without any tree or block ID information. How do we add this information back to the tree_census array? That's the subject of our next lesson!
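
The three-argument form reads as `np.where(condition, value_if_true, value_if_false)`, evaluated element by element. Here is a minimal sketch with hypothetical `toy_trunks` and `toy_stumps` arrays:

```python
import numpy as np

toy_trunks = np.array([5, 0, 12, 0])  # hypothetical trunk diameters
toy_stumps = np.array([0, 9, 0, 14])  # hypothetical stump diameters

# Where the trunk diameter is zero, take the stump diameter;
# otherwise keep the trunk diameter
combined = np.where(toy_trunks == 0, toy_stumps, toy_trunks)
print(combined)  # [ 5  9 12 14]
```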

## Adding rows
The research team has discovered two trees that were left off the `tree_census`. Your task is to add rows containing the data for these new trees to the end of the `tree_census` array. The new trees' data is saved in a 2D array called `new_trees`:
```
new_trees = np.array([[1211, 227386, 20, 0], [1212, 227386, 8, 0]])
```

* Print the shapes of `tree_census` and `new_trees` to confirm they are compatible to concatenate.

In [16]:
new_trees = np.array([[1211, 227386, 20, 0], [1212, 227386, 8, 0]])
# Print the shapes of tree_census and new_trees
print(tree_census.shape, new_trees.shape)

(1000, 4) (2, 4)


* Add rows to the end of `tree_census` containing data for the new trees from the new_trees 2D array; save the new array as `updated_tree_census`.

In [17]:
# Add rows to tree_census which contain data for the new trees
updated_tree_census = np.concatenate((tree_census, new_trees))
print(updated_tree_census)

[[     3 501451     24      0]
 [     4 501451     20      0]
 [     7 501911      3      0]
 ...
 [  1210 227386      6      0]
 [  1211 227386     20      0]
 [  1212 227386      8      0]]


Excellent concatenation work! Notice that this concatenation task was easier because the array shapes and dimensions were already compatible. Let's take a look at an example where the arrays to be concatenated are not compatible.
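
As a quick illustration of what "compatible" means for rows, the sketch below tries to append a single tree stored as a 1D array. The arrays `toy_census` and `one_new_tree` are hypothetical.

```python
import numpy as np

toy_census = np.array([[1, 100, 5, 0], [2, 100, 7, 0]])  # hypothetical, shape (2, 4)
one_new_tree = np.array([3, 200, 4, 0])                  # 1D, shape (4,)

# A 2D array and a 1D array cannot be concatenated directly
try:
    np.concatenate((toy_census, one_new_tree))
except ValueError:
    error_raised = True
else:
    error_raised = False

# Reshaping the new tree to one row of four columns makes the shapes compatible;
# axis=0 (adding rows) is the default for np.concatenate
fixed = np.concatenate((toy_census, one_new_tree.reshape((1, 4))))
print(error_raised, fixed.shape)  # True (3, 4)
```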

## Adding columns
You finished the last set of exercises by creating an array called `trunk_stump_diameters`, which combined data from the trunk diameter and stump diameter columns into a 1D array. Now, you'll add that 1D array as a column to the `tree_census` array.

* Print the shapes of both `tree_census` and `trunk_stump_diameters`.

In [18]:
# Print the shapes of tree_census and trunk_stump_diameters
print(tree_census.shape, trunk_stump_diameters.shape)

(1000, 4) (1000,)


* Reshape `trunk_stump_diameters` so that it can be appended as the last column in `tree_census`; call the reshaped array `reshaped_diameters`.

In [19]:
# Reshape trunk_stump_diameters
reshaped_diameters = trunk_stump_diameters.reshape((1000, 1))

* Concatenate `reshaped_diameters` to the end of `tree_census` so that it becomes the last column; call the new array `concatenated_tree_census`.

In [20]:
# Concatenate reshaped_diameters to tree_census as the last column
concatenated_tree_census = np.concatenate((tree_census, reshaped_diameters), axis=1)
print(concatenated_tree_census)

[[     3 501451     24      0     24]
 [     4 501451     20      0     20]
 [     7 501911      3      0      3]
 ...
 [  1198 227387     11      0     11]
 [  1199 227387     11      0     11]
 [  1210 227386      6      0      6]]


That's right! Adding a 1D array to an existing 2D array requires you to reshape the 1D array into a 2D array first. We'll dive into shape compatibility issues like this even further in the next chapter on array mathematics!
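
The exercise hard-codes the row count in `reshape((1000, 1))`. A possible variation, sketched here on hypothetical arrays, passes `-1` so that NumPy infers the number of rows from the array's length.

```python
import numpy as np

# Hypothetical stand-ins for tree_census and trunk_stump_diameters
toy_census = np.array([[1, 100, 5, 0], [2, 100, 0, 9], [3, 200, 4, 0]])
toy_diameters = np.array([5, 9, 4])  # shape (3,)

# -1 asks NumPy to work out the row count; 1 fixes a single column
as_column = toy_diameters.reshape((-1, 1))
print(as_column.shape)  # (3, 1)

# axis=1 adds the reshaped array as a new last column
with_column = np.concatenate((toy_census, as_column), axis=1)
print(with_column.shape)  # (3, 5)
```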

## Deleting with np.delete()
What if your tree research focuses only on living trees on publicly-owned city blocks? It might be helpful to delete some unneeded data like the stump diameter column and some trees located on private blocks.

You've learned that NumPy's `np.delete()` function takes three arguments: the original array, the index or indices to be deleted, and the axis to delete along. If you don't know the index or indices of the array you'd like to delete, recall that when it is passed only one argument, `np.where()` returns a tuple of index arrays marking where a condition is met!

* Delete the stump diameter column from `tree_census`, and save the new 2D array as `tree_census_no_stumps`.
* Using `np.where()`, find the indices of any trees on block 313879, a private block. Save the indices in an array called `private_block_indices`.

In [21]:
# Delete the stump diameter column from tree_census
tree_census_no_stumps = np.delete(tree_census, 3, axis=1)

# Save the indices of the trees on block 313879
private_block_indices = np.where(tree_census[:, 1] == 313879)

* Using the indices you just found using `np.where()`, delete the rows for trees on block 313879 from `tree_census_no_stumps`, saving the new 2D array as `tree_census_clean`.
* Print the shape of `tree_census_clean`.

In [22]:
# Delete the rows for trees on block 313879 from tree_census_no_stumps
tree_census_clean = np.delete(tree_census_no_stumps, private_block_indices, axis=0)

# Print the shape of tree_census_clean
print(tree_census_clean.shape)

(998, 3)


We can't stump you! Notice that the new shape reflects two fewer rows and one fewer column than tree_census started with because of your deletions: just as expected.
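
The same two-step deletion can be traced on a small array. The sketch below uses a hypothetical `toy_census` and, as one possible variation on the exercise, unpacks the first element of the `np.where()` result so that a plain index array is passed to `np.delete()`.

```python
import numpy as np

# Hypothetical stand-in for tree_census:
# tree ID, block ID, trunk diameter, stump diameter
toy_census = np.array([
    [1, 100, 5, 0],
    [2, 200, 7, 0],
    [3, 200, 3, 0],
    [4, 300, 9, 0],
])

# Delete column index 3 (stump diameter); np.delete returns a new array
no_stumps = np.delete(toy_census, 3, axis=1)

# Row indices of trees on hypothetical block 200
rows_to_drop = np.where(toy_census[:, 1] == 200)[0]

# Delete those rows from the column-trimmed array
cleaned = np.delete(no_stumps, rows_to_drop, axis=0)

print(toy_census.shape, no_stumps.shape, cleaned.shape)  # (4, 4) (4, 3) (2, 3)
```

Because deleting a column does not change row positions, indices found in the original array are still valid for the column-trimmed one.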