# Introduction

When it comes to programming in Python, libraries are where the real power is. Similar to plugins or extensions, libraries unlock your code and make it more efficient. With a few fundamentals of Python programming, you can use libraries for high-performance data analysis and data visualization. In this lesson, we'll go over one of the most user-friendly libraries out there: NumPy.

![alt text](images/1.1-m289.gif)

NumPy is very popular because it makes writing programs easy. Python is a high-level language, which means you don't have to allocate memory manually. With low-level languages, you have to define memory allocation and processing, which gives you more control over performance, but it also slows down your programming. NumPy gives you the best of both worlds: processing performance without all the allocation.

![alt text](images/1.2-m289.gif)

In this lesson, we'll learn how to use NumPy to work with databases, statistics, machine learning, and more — with real-world datasets, including New York City taxi trip data.

You'll need to be comfortable with programming in Python, but here are a few takeaways you can expect in this lesson:

* How vectorized operations speed up your code
* How to select data from NumPy ndarrays
* How to analyze data using NumPy methods



# Introduction to NumPy arrays

The core data structure in NumPy is the **ndarray** or **n-dimensional array**. In programming, **array** describes a collection of elements, similar to a list. The word **n-dimensional** refers to the fact that ndarrays can have one or more dimensions. We'll start by working with one-dimensional (1D) ndarrays.

![alt text](images/2.1-m289.svg)

To use the NumPy library, we first need to import it into our Python environment. It's common to import NumPy using the alias `np`:

```
import numpy as np
```

Then, we can directly convert a list to an ndarray using the [`numpy.array()` constructor](https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.array.html). To create a 1D ndarray, we can pass in a single list:

```
data_ndarray = np.array([5, 10, 15, 20])
```

We used the syntax `np.array()` instead of `numpy.array()` because of our `import numpy as np` code. When we introduce new syntax, we'll always use the full name to describe it, and you'll need to substitute in the shorthand where it's appropriate.
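
As a quick check of the steps above, the sketch below builds a 1D ndarray from a list and inspects the result. The variable name `prices` and its values are made up for illustration.

```python
import numpy as np

# hypothetical values, chosen only for illustration
prices_list = [5, 10, 15, 20]
prices = np.array(prices_list)

print(type(prices_list))  # a plain Python list
print(type(prices))       # a NumPy ndarray
print(len(prices))        # 4
```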

Now, let's practice creating 1D ndarrays.

```
import numpy as np
data_ndarray = np.array([10, 20, 30])
```

# Understanding Vectorization

Ndarrays and the NumPy library make it easier to manipulate and analyze data. Let's explore why.

Using standard Python, we might consider using **lists of lists** to represent datasets. While lists of lists work with small datasets, they become slow and cumbersome with larger ones.

Let's look at an example involving two columns of data. Each row contains two numbers we want to add together. Using standard Python, we could use a list-of-lists structure to store our data, and we could use **for loops** to iterate over that data:

![alt text](images/3.1-m289.svg)

Python compiles our code into bytecode, and in each iteration of our loop, the interpreter runs that bytecode and asks our computer's processor to add the two numbers together:

![alt text](images/3.2-m289.gif)

Our computer would take eight processor cycles to process the eight rows of our data.

The NumPy library takes advantage of a processor feature called **Single Instruction Multiple Data (SIMD)** to process data faster. SIMD allows a processor to perform the same operation on multiple data points in a single processor cycle:

![alt text](images/3.3-m289.gif)

As a result, using NumPy would only take two processor cycles in this simplified example, making it four times faster than standard Python alone. We call this concept of replacing for loops with operations applied to multiple data points at once **vectorization**, and ndarrays make vectorization possible.

We'll explore how **vectorization** makes our code faster and easier to write throughout this lesson. On the next screen, we'll practice converting a real-world dataset from a list of lists to an ndarray.
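
To make the comparison concrete, here is a minimal sketch of the same addition written both ways. The eight rows of numbers are made up, and the example only shows that the two approaches give the same result; it doesn't measure speed.

```python
import numpy as np

# hypothetical two-column dataset: one pair of numbers per row
rows = [[6, 5], [1, 3], [5, 6], [1, 4], [3, 7], [5, 8], [3, 5], [8, 4]]

# standard Python: a for loop adds one pair per iteration
loop_sums = []
for first, second in rows:
    loop_sums.append(first + second)

# NumPy: store each column as a 1D ndarray and add them in one operation
first_column = np.array([row[0] for row in rows])
second_column = np.array([row[1] for row in rows])
vector_sums = first_column + second_column

print(loop_sums)             # [11, 4, 11, 5, 10, 13, 8, 12]
print(vector_sums.tolist())  # [11, 4, 11, 5, 10, 13, 8, 12]
```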

# NYC Taxi-Airport Data

So far, we've only practiced creating one-dimensional ndarrays, but ndarrays can also be two-dimensional:

![alt text](images/4.1-m289.svg)

To explore two-dimensional (2D) ndarrays, we'll analyze New York City taxi trip data released by the city of New York.

![alt text](images/nyc_taxi.jpg "New York City Taxis")

We'll only work with a subset of this data — approximately 90,000 yellow taxi trips to and from New York City airports between January and June 2016. Below is information about selected columns from the dataset:

* `pickup_year`: the year of the trip
* `pickup_month`: the month of the trip (January is 1, December is 12)
* `pickup_day`: the day of the month of the trip
* `pickup_location_code`: the airport or borough where the trip started
* `dropoff_location_code`: the airport or borough where the trip ended
* `trip_distance`: the distance of the trip in miles
* `trip_length`: the length of the trip in seconds
* `fare_amount`: the base fare of the trip, in dollars
* `total_amount`: the total amount charged to the passenger, including all fees, tolls and tips

You can find information on all columns in the dataset data dictionary.

Our data is in a CSV file called `nyc_taxis.csv`. Below are the first few lines of raw data in our CSV (we're showing only the first four columns from the file to make the format easier to understand):

```
pickup_year,pickup_month,pickup_day,pickup_dayofweek
2016,1,1,5
2016,1,1,5
2016,1,1,5
2016,1,1,5
```

You may notice that we could also represent the data in a table form:

| pickup_year | pickup_month | pickup_day | pickup_dayofweek | pickup_time | pickup_location_code | dropoff_location_code | trip_distance | trip_length | fare_amount | fees_amount | tolls_amount | tip_amount | total_amount | payment_type |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2016 | 1 | 1 | 5 | 0 | 2 | 4 | 21.00 | 2037 | 52.0 | 0.8 | 5.54 | 11.65 | 69.99 | 1 |
| 2016 | 1 | 1 | 5 | 0 | 2 | 1 | 16.29 | 1520 | 45.0 | 1.3 | 0.00 | 8.00 | 54.30 | 1 | 
| 2016 | 1 | 1 | 5 | 0 | 2 | 6 | 12.70 | 1462 | 36.5 | 1.3 | 0.00 | 0.00 | 37.80 | 2 |

Does this look familiar? Compare this table to the diagram of the 2D ndarray we saw earlier. You can picture 2D ndarrays as storing data like this table.

To convert the dataset into a 2D ndarray, we'll first use Python's built-in [`csv` module](https://docs.python.org/3/library/csv.html) to import our CSV as a "list of lists." Then, we'll convert the list of lists to an ndarray. We'll again use the `numpy.array()` constructor, but to create a 2D ndarray, we'll pass in our list of lists instead of a single list:

```
# our list of lists is stored as data_list
data_ndarray = np.array(data_list)
```
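
Here is a small sketch of that conversion, using the four-column sample rows shown earlier in place of the full file. The names `raw_rows`, `float_rows`, and `sample` are illustrative only; the rows are written as strings because that is how `csv.reader` returns values.

```python
import numpy as np

# rows as csv.reader would return them: every value is a string
raw_rows = [
    ["2016", "1", "1", "5"],
    ["2016", "1", "1", "5"],
    ["2016", "1", "1", "5"],
]

# convert each value to a float, then pass the list of lists to np.array()
float_rows = [[float(value) for value in row] for row in raw_rows]
sample = np.array(float_rows)

print(sample)
print(len(sample))     # 3 rows
print(len(sample[0]))  # 4 values in the first row
```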

Let's convert our taxi CSV into a NumPy ndarray!

```
import csv
import numpy as np

# import nyc_taxis.csv as a list of lists
with open("nyc_taxis.csv", "r") as f:
    taxi_list = list(csv.reader(f))

# remove the header row
taxi_list = taxi_list[1:]

# convert all values to floats
converted_taxi_list = []
for row in taxi_list:
    converted_row = []
    for item in row:
        converted_row.append(float(item))
    converted_taxi_list.append(converted_row)

# convert the list of lists to a 2D ndarray
taxi = np.array(converted_taxi_list)
```


# Array shapes

Let's look at the data in the `taxi` variable from the previous screen by printing it using Python's `print()` function:

```
print(taxi)
```
```
[[  2016      1      1 ...  11.65  69.99      1]
 [  2016      1      1 ...      8   54.3      1]
 [  2016      1      1 ...      0   37.8      2]
 ...
 [  2016      6     30 ...      5  63.34      1]
 [  2016      6     30 ...   8.95  44.75      1]
 [  2016      6     30 ...      0  54.84      2]]
```

The ellipses (...) between rows and columns indicate that there is more data in our NumPy ndarray than can easily be printed.

It's often useful to know the number of rows and columns in an ndarray. When we can't easily print the entire ndarray, we can use the `ndarray.shape` attribute instead:

```
data_ndarray = np.array([[5, 10, 15], 
                         [20, 25, 30]])
print(data_ndarray.shape)
```
```
(2, 3)
```
The data type returned is called a **tuple**. Tuples are very similar to Python lists, but you can't modify them.

The output gives us a few important pieces of information:

* The first number tells us that there are two rows in `data_ndarray`.
* The second number tells us that there are three columns in `data_ndarray`.
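
As a short illustration of reading shapes, the sketch below compares a 1D and a 2D ndarray and unpacks the shape tuple into separate variables. The array names are made up for this example.

```python
import numpy as np

one_d = np.array([5, 10, 15, 20])
two_d = np.array([[5, 10, 15],
                  [20, 25, 30]])

print(one_d.shape)  # (4,)
print(two_d.shape)  # (2, 3)

# a tuple can be unpacked into separate variables
n_rows, n_cols = two_d.shape
print(n_rows, n_cols)  # 2 3
```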

Next, let's confirm the number of rows and columns in our dataset.

```
taxi_shape = taxi.shape
```

# Selecting and Slicing Rows and Items from Ndarrays

Next, let's compare working with ndarrays and lists of lists to select one or more rows of data:

![alt text](images/6.1-m289.svg)

As we see above, we can select rows in ndarrays very similarly to lists of lists. In reality, what we're seeing is a kind of shortcut. For any 2D array, the full syntax for selecting data is the following:

```
ndarray[row_index, column_index]

# or if you want to select all
# columns for a given set of rows
ndarray[row_index]

... where `row_index` defines the location along the row axis and `column_index` defines the location along the column axis.

As with lists, an array slice runs from the first specified index up to, but not including, the second specified index. For example, to select the items at index `1`, `2`, and `3`, we'd need to use the slice `[1:4]`.

This is how we select a single item from a 2D ndarray:

![alt text](images/6.2-m289.svg)

With a list of lists, we use two separate pairs of square brackets back-to-back. With a NumPy ndarray, we use a single pair of brackets with comma-separated row and column locations.
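
The difference in syntax can be sketched side by side on a small, made-up dataset (the names `values` and `arr` are illustrative):

```python
import numpy as np

# hypothetical 4x3 dataset, stored both ways
values = [[1, 2, 3],
          [4, 5, 6],
          [7, 8, 9],
          [10, 11, 12]]
arr = np.array(values)

# a single row: the syntax is the same for both structures
print(values[1])     # [4, 5, 6]
print(arr[1])        # [4 5 6]

# rows at index 1 and 2: the slice stops before index 3
print(arr[1:3])

# a single item: two pairs of brackets vs. one pair with a comma
print(values[2][0])  # 7
print(arr[2, 0])     # 7
```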

Let's practice selecting one row, multiple rows, and single items from our `taxi` ndarray.

```
row_0 = taxi[0]
rows_391_to_500 = taxi[391:501]
row_21_column_5 = taxi[21, 5]
```

# Selecting Columns and Custom Slicing Ndarrays

Let's continue by learning how to select one or more columns of data:

![alt text](images/7.1-m289.svg)

With a list of lists, we need to use a for loop to extract specific column(s) and append them back to a new list. With ndarrays, the process is much simpler. We again use single brackets with comma-separated row and column locations, but we use a colon (`:`) for the row locations, which gives us all of the rows.

If we want to select a partial 1D slice of a row or column, we can combine a single value for one dimension with a slice for the other dimension:

![alt text](images/7.2-m289.svg)
![alt text](images/selection_1darray_updated.svg)


Selecting partial 1D slices from a 2D ndarray

Lastly, if we want to select a 2D slice, we can use slices for both dimensions:
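
The selections described above can be sketched on a small, made-up array. The variable names are illustrative only; each comment notes whether the result is 1D or 2D.

```python
import numpy as np

arr = np.array([[1, 2, 3, 4],
                [5, 6, 7, 8],
                [9, 10, 11, 12]])

# one column: colon for the rows, a single index for the column -> 1D
col_1 = arr[:, 1]

# several specific columns: pass a list of column indices -> 2D
cols_0_and_3 = arr[:, [0, 3]]

# part of one row: a single row index with a column slice -> 1D
row_2_cols_1_to_2 = arr[2, 1:3]

# a 2D slice: slices for both dimensions -> 2D
block = arr[0:2, 1:3]

print(col_1.shape, cols_0_and_3.shape, row_2_cols_1_to_2.shape, block.shape)
```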


Let's practice everything we've learned so far to perform some more complex selections using NumPy.

```
columns_1_4_7 = taxi[:, [1, 4, 7]]
row_99_columns_5_to_8 = taxi[99, 5:9]
rows_100_to_200_column_14 = taxi[100:201, 14]
```

# Vector math 

As we saw on the previous two screens, NumPy ndarrays make selecting data much easier. Beyond this, working with ndarrays is much faster because **vectorized operations** apply to multiple data points at once.

When we first talked about vectorized operations, we used the example of adding two columns of data. With data in a list of lists, we'd have to construct a for loop and add each pair of values from each row individually:

![alt text](images/8.1-m289.svg)

At the time, we only talked about how vectorized operations make this faster; however, vectorized operations also make our code easier to write and read. Here's how we would perform the same task above with vectorized operations:

```
# convert the list of lists to an ndarray
my_numbers = np.array(my_numbers)

# select each of the columns - the result
# of each will be a 1D ndarray
col1 = my_numbers[:,0]
col2 = my_numbers[:,1]

# add the two columns
sums = col1 + col2
```

We could simplify this further if we wanted to:

```
sums = my_numbers[:,0] + my_numbers[:,1]
```

Here are some key observations about this code:

* When we selected each column, we used the syntax `ndarray[:,c]` where `c` is the column index we wanted to select. Like we saw in the previous screen, the colon selects all rows.
* To add the two 1D ndarrays, `col1` and `col2`, we simply use the addition operator (`+`) between them.

Here's what happened behind the scenes:

![alt text](images/8.2-m289.gif)

The result of adding two 1D ndarrays is a 1D ndarray of the same shape (or dimensions) as the original. In this context, we can also call ndarrays vectors, a term from linear algebra. We call adding two **vectors** together **vector addition**.
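
A self-contained version of the snippet above might look like the following. The values in `my_numbers` are made up, since the original data appears only in the diagram; the point is that the sum has the same shape as each column.

```python
import numpy as np

# made-up stand-in for the my_numbers list of lists
my_numbers = np.array([[6, 5], [1, 3], [5, 6], [1, 4]])

# each selected column is a 1D ndarray (a vector)
col1 = my_numbers[:, 0]
col2 = my_numbers[:, 1]

# vector addition pairs up the elements at matching positions
sums = col1 + col2

print(col1.shape, col2.shape, sums.shape)  # (4,) (4,) (4,)
print(sums.tolist())                       # [11, 4, 11, 5]
```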

```
fare_amount = taxi[:,9]
fees_amount = taxi[:,10]

fare_and_fees = fare_amount + fees_amount
```

<div><p>On the previous screen, we used vector addition to add two columns (or vectors) together. We can actually use any of the standard <a href="https://docs.python.org/3/library/stdtypes.html#numeric-types-int-float-complex" target="_blank">Python numeric operators</a> with vectors, including the following:</p>
<ul>
<li><code>vector_a + vector_b</code> — addition</li>
<li><code>vector_a - vector_b</code> — subtraction</li>
<li><code>vector_a * vector_b</code> — multiplication (this is unrelated to the vector multiplication used in linear algebra).</li>
<li><code>vector_a / vector_b</code> - division</li>
</ul>
<p>When we perform these operations on two 1D vectors, both vectors must have the same shape.</p>
<p>Let's look at another example from our taxi dataset. Here are the first five rows of two of the columns in the dataset:</p>
<table class="dataframe">
<thead>
<tr>
<th>trip_distance</th>
<th>trip_length</th>
</tr>
</thead>
<tbody>
<tr>
<td>21.00</td>
<td>2037.0</td>
</tr>
<tr>
<td>16.29</td>
<td>1520.0</td>
</tr>
<tr>
<td>12.70</td>
<td>1462.0</td>
</tr>
<tr>
<td>8.70</td>
<td>1210.0</td>
</tr>
<tr>
<td>5.56</td>
<td>759.0</td>
</tr>
</tbody>
</table>
<p>Let's use these columns to calculate the average travel speed of each trip in miles per hour. This is the formula for calculating miles per hour:</p>
<p></p><center>$$\text{miles per hour} = \text{distance in miles} \div \text{length in hours}$$</center><p></p>
<p>As we learned earlier in this lesson, <code>trip_distance</code> is already expressed in miles, but <code>trip_length</code> is expressed in seconds. First, we'll convert <code>trip_length</code> into hours:</p>
</div>

```
trip_distance = taxi[:,7]
trip_length_seconds = taxi[:,8]

trip_length_hours = trip_length_seconds / 3600 # 3600 seconds is one hour
```

<div>
<p>In this case, we divided each value in the vector by a <em>single number</em>, 3600, instead of another vector. Below are the first five rows of the result:</p>
<table class="dataframe">
<thead>
<tr>
<th>trip_length_hours</th>
</tr>
</thead>
<tbody>
<tr>
<td>0.565833</td>
</tr>
<tr>
<td>0.422222</td>
</tr>
<tr>
<td>0.406111</td>
</tr>
<tr>
<td>0.336111</td>
</tr>
<tr>
<td>0.210833</td>
</tr>
</tbody>
</table>
<p>Let's perform vector division again to calculate the miles per hour.</p></div>
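
To see both kinds of division on a small scale, the sketch below uses only the first five rows of `trip_distance` and `trip_length` from the tables above. The `_sample` variable names are illustrative and are not part of the exercise.

```python
import numpy as np

# first five rows of trip_distance (miles) and trip_length (seconds)
trip_distance_sample = np.array([21.00, 16.29, 12.70, 8.70, 5.56])
trip_length_sample = np.array([2037.0, 1520.0, 1462.0, 1210.0, 759.0])

# dividing by a single number applies the division to every element
hours_sample = trip_length_sample / 3600

# dividing two vectors of the same shape pairs up their elements
mph_sample = trip_distance_sample / hours_sample

print(hours_sample.round(6))
print(mph_sample.round(2))
```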

```
trip_distance_miles = taxi[:,7]
trip_length_seconds = taxi[:,8]

trip_length_hours = trip_length_seconds / 3600 # 3600 seconds is one hour

trip_mph = trip_distance_miles / trip_length_hours
```

# Calculating statistics for 1D Ndarrays

<div><p>On the previous screen, we created <code>trip_mph</code>, a 1D ndarray of the average mile-per-hour speed of each trip in our dataset. Next, we'll explore this data further and calculate the minimum, maximum, and mean values for <code>trip_mph</code>.</p>
<p>To calculate the minimum value of a 1D ndarray, we use the vectorized <a href="http://docs.scipy.org/doc/numpy-1.14.0/reference/generated/numpy.ndarray.min.html" target="_blank"><code>ndarray.min()</code> method</a>, like this:</p>
</div>

```
mph_min = trip_mph.min()
print(mph_min)
```
```
0.0
```

<div>
<p>The minimum value in our <code>trip_mph</code> ndarray is <code>0.0</code> — for a trip that didn't travel any distance at all.</p>
<p>NumPy ndarrays have methods for many different calculations. Here are a few of the key methods:</p>
<ul>
<li><a href="https://docs.scipy.org/doc/numpy-1.14.0/reference/generated/numpy.ndarray.min.html#numpy.ndarray.min" target="_blank"><code>ndarray.min()</code> to calculate the minimum value</a></li>
<li><a href="https://docs.scipy.org/doc/numpy-dev/reference/generated/numpy.ndarray.max.html" target="_blank"><code>ndarray.max()</code> to calculate the maximum value</a></li>
<li><a href="https://docs.scipy.org/doc/numpy-1.14.0/reference/generated/numpy.ndarray.mean.html#numpy.ndarray.mean" target="_blank"><code>ndarray.mean()</code> to calculate the mean or average value</a></li>
<li><a href="https://docs.scipy.org/doc/numpy-1.14.0/reference/generated/numpy.ndarray.sum.html#numpy.ndarray.sum" target="_blank"><code>ndarray.sum()</code> to calculate the sum of the values</a></li>
</ul>
<p>You can see the full list of ndarray methods in the <a href="https://docs.scipy.org/doc/numpy-1.14.0/reference/arrays.ndarray.html#calculation" target="_blank">NumPy ndarray documentation</a>.</p>
<p>It's important to become familiar with the documentation because it's not possible to remember the syntax for every variation of every data science library. However, if you remember what is possible and can read the documentation, you'll always be able to refamiliarize yourself with it when you need to.</p>
<p>Whenever you see the syntax <code>ndarray.method_name()</code>, substitute <code>ndarray</code> with the name of your ndarray (in this case, <code>trip_mph</code>) like this:</p>
<p></p><center>
<img src="https://dq-content.s3.amazonaws.com/289/10.1-m289.svg">
</center><p></p>
<p>Let's use what we've just learned to calculate the maximum and mean (average) speed from our <code>trip_mph</code> ndarray.</p></div>
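
Before applying these methods to the full dataset, it can help to try them on an array small enough to check by hand. The `speeds` array below is made up for this purpose.

```python
import numpy as np

# made-up values that are easy to verify by hand
speeds = np.array([10.0, 20.0, 30.0, 40.0])

print(speeds.min())   # 10.0
print(speeds.max())   # 40.0
print(speeds.mean())  # 25.0
print(speeds.sum())   # 100.0
```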

```
mph_min = trip_mph.min()

mph_max = trip_mph.max()
mph_mean = trip_mph.mean()
```

<div><p>Looking at the result of the code on the previous screen, we can see the following:</p>
<ul>
<li>Average (mean) trip speed (rounded): 170 mph</li>
<li>Maximum trip speed (rounded): 82,000 mph</li>
</ul>
<p>A trip speed of 82,000 mph is definitely not possible in New York traffic — that's almost 20x faster than the fastest plane in the world! And the average trip speed of 170 mph is also not possible. These results could be due to errors in the devices that record the data, or perhaps errors made somewhere in the data pipeline.</p>
<p>Before we look at other array methods, let's review the difference between methods and functions. <strong>Functions</strong> act as standalone segments of code that usually take an input, perform some processing, and return some output. For example, we can use the <code>len()</code> function to calculate the length of a <em>list</em> or the number of characters in a <em>string</em>.</p>
</div>

```
my_list = [21,14,91]
print(len(my_list))
```
```
3
```
```
my_string = 'Dataquest'
print(len(my_string))
```
```
9
```
<div><p>In contrast, <strong>methods</strong> are functions that belong to a specific type of object, and we call them on the object itself. For example, lists have an <code>append()</code> method, but strings do not, so calling it on a string raises an error:</p>
</div>

```
my_string.append(' is the best!')
```
```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'str' object has no attribute 'append'
```
<div>
<p>In NumPy, some operations are implemented as both methods and functions, which can be confusing. Let's look at some examples:</p>
<table>
<thead>
<tr>
<th>Calculation</th>
<th>Function Representation</th>
<th>Method Representation</th>
</tr>
</thead>
<tbody>
<tr>
<td>Calculate the minimum value of <code>trip_mph</code></td>
<td><code>np.min(trip_mph)</code></td>
<td><code>trip_mph.min()</code></td>
</tr>
<tr>
<td>Calculate the maximum value of <code>trip_mph</code></td>
<td><code>np.max(trip_mph)</code></td>
<td><code>trip_mph.max()</code></td>
</tr>
<tr>
<td>Calculate the <a href="https://en.wikipedia.org/wiki/Mean" target="_blank">mean average</a> value of <code>trip_mph</code></td>
<td><code>np.mean(trip_mph)</code></td>
<td><code>trip_mph.mean()</code></td>
</tr>
<tr>
<td>Calculate the <a href="https://en.wikipedia.org/wiki/Median" target="_blank">median average</a> value of <code>trip_mph</code></td>
<td><code>np.median(trip_mph)</code></td>
<td>There is no ndarray median method</td>
</tr>
</tbody>
</table>
<p>To remember the right terminology, anything that starts with <code>np</code> (e.g., <code>np.mean()</code>) is a function, and anything expressed with an object (or variable) name first (e.g., <code>trip_mph.mean()</code>) is a method. When both exist, it's up to you to decide which to use, but it's much more common to use the method approach.</p></div>
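
The rows of the table above can be checked directly. This sketch uses a made-up `speeds` array rather than `trip_mph` so that the results are easy to verify.

```python
import numpy as np

speeds = np.array([10.0, 20.0, 30.0, 40.0])

# the function form and the method form give the same result
print(np.mean(speeds) == speeds.mean())  # True
print(np.max(speeds) == speeds.max())    # True

# the median is available only as a function
print(np.median(speeds))                 # 25.0
print(hasattr(speeds, "median"))         # False
```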

# Calculating stats for 2D Ndarrays

<div><p>Next, we'll calculate statistics for 2D ndarrays.  If we use the <code>ndarray.max()</code> method on a <em>2D ndarray</em> without any additional parameters, it will return a single value, just like a 1D array:</p>
<p></p><center>
<img src="https://dq-content.s3.amazonaws.com/289/12.1-m289.gif">
</center><p></p>
<p>But what if we want to find the maximum value of each row? We need to use the <code>axis</code> parameter and specify a value of <code>1</code> to indicate that we want to calculate the maximum value for each row.</p>
<p></p><center>
<img src="https://dq-content.s3.amazonaws.com/289/12.2-m289.gif">
</center><p></p>
<p>If we want to find the maximum value of each column, we'd use an <code>axis</code> value of <code>0</code>:</p>
<p></p><center>
<img src="https://dq-content.s3.amazonaws.com/289/12.3-m289.gif">
</center><p></p>
<p>Let's use what we've learned to check the data in our taxi dataset. To remind ourselves how the data looks, let's examine the first five rows of the columns with indices 9 through 13:</p>
<table class="dataframe">
<thead>
<tr>
<th>fare_amount</th>
<th>fees_amount</th>
<th>tolls_amount</th>
<th>tip_amount</th>
<th>total_amount</th>
</tr>
</thead>
<tbody>
<tr>
<td>52.0</td>
<td>0.8</td>
<td>5.54</td>
<td>11.65</td>
<td>69.99</td>
</tr>
<tr>
<td>45.0</td>
<td>1.3</td>
<td>0.00</td>
<td>8.00</td>
<td>54.3</td>
</tr>
<tr>
<td>36.5</td>
<td>1.3</td>
<td>0.00</td>
<td>0.00</td>
<td>37.8</td>
</tr>
<tr>
<td>26.0</td>
<td>1.3</td>
<td>0.00</td>
<td>5.46</td>
<td>32.76</td>
</tr>
<tr>
<td>17.5</td>
<td>1.3</td>
<td>0.00</td>
<td>0.00</td>
<td>18.8</td>
</tr>
</tbody>
</table>
<p>You may have noticed that the sum of the first four values in each row should equal the last value, <code>total_amount</code>:</p>
<p></p><center>$$\text{fare amount} + \text{fees amount} + \text{tolls amount} + \text{tip amount} = \text{total amount}$$</center><p></p>
<p>In the next exercise, we'll check these values. We'll only review the first five rows in <code>taxi</code> so we can verify the results more easily.</p></div>
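
As a warm-up for the `axis` parameter, the sketch below applies `ndarray.max()` to the five rows shown in the table above. The array name `fares` is illustrative; the values are copied from that table.

```python
import numpy as np

# first five rows of the columns with indices 9 through 13
fares = np.array([
    [52.0, 0.8, 5.54, 11.65, 69.99],
    [45.0, 1.3, 0.00, 8.00, 54.30],
    [36.5, 1.3, 0.00, 0.00, 37.80],
    [26.0, 1.3, 0.00, 5.46, 32.76],
    [17.5, 1.3, 0.00, 0.00, 18.80],
])

# no axis: a single value for the whole array
print(fares.max())  # 69.99

# axis=1: one maximum per row
row_max = fares.max(axis=1)
print(row_max.shape)  # (5,)

# axis=0: one maximum per column
col_max = fares.max(axis=0)
print(col_max.shape)  # (5,)
```

Here both results happen to have five values because the sample has five rows and five columns; on the full `taxi` ndarray the two shapes would differ.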

```
# we'll compare against the first 5 rows only
taxi_first_five = taxi[:5]
# select these columns: fare_amount, fees_amount, tolls_amount, tip_amount
fare_components = taxi_first_five[:,9:13]

fare_sums = fare_components.sum(axis=1)
fare_totals = taxi_first_five[:, 13]
print(fare_sums)
print(fare_totals)
```
```
[69.99 54.3  37.8  32.76 18.8 ]
[69.99 54.3  37.8  32.76 18.8 ]
```
