<h1 style="font-size: 40px; margin-bottom: 0px;">16.6 DataFrames and <br /> working with your data in Python</h1>

<hr style="margin-left: 0px; border: 0.25px solid; border-color: #000000; width: 750px;"></hr>

Today, we'll build on the previous exercises and practice applying what we've learned to pandas DataFrame objects. We'll first review how to use keywords and operators to set up conditional statements and how mathematical operators can be used to perform maths on arrays. Then we'll apply these concepts as we learn about DataFrames, setting us up to work with our own data in Python tomorrow.

<strong>Today's learning objectives:</strong>
<ol>
    <li>Review conditional statements</li>
    <li>Review mathematical operations on arrays</li>
    <li>Understanding DataFrames</li>
    <li>Creating DataFrames</li>
    <li>Working with DataFrames</li>
</ol>

In [None]:
#Import your packages as needed
#I'll begin transitioning to having any required packages imported at the start of notebooks
#so all you need to do to get started is run this cell

import numpy as np
import pandas as pd

<h1 style="font-size: 40px; margin-bottom: 0px;">Exercises</h1>
<hr style="margin-left: 0px; border: 0.25px solid; border-color: #000000; width: 400px;"></hr>

<h1 style="font-size: 32px;">Exercise #1: Review conditional statements</h1>

Recall from 16-3 that you can make use of keywords and operators to set up a statement to evaluate whether or not that statement is true. 

<ul>
    <li><code>&excl;&equals;</code> - ...not equal to...</li>
    <li><code>&gt;</code> - ...greater than...</li>
    <li><code>&lt;</code> - ...less than...</li>
    <li><code>&gt;&equals;</code> - ...greater than or equal to...</li>
    <li><code>&lt;&equals;</code> - ...less than or equal to...</li>
</ul>

Previously, we used this to direct our Python interpreter to take a specific action based on whether or not specific conditions were met. In those examples, we provided Python with a non-composite data type such as a float, int, or string and assessed whether the conditional statement was <code>True</code> or <code>False</code>. 
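
As a quick refresher, here's a minimal sketch of a few of these operators applied to a single value. The variable names and the value <code>7</code> are made up for illustration and aren't part of the exercise.

```python
# Hypothetical value chosen for illustration
reading = 7

# Each comparison evaluates to a single Boolean
is_at_least_five = reading >= 5   # True, because 7 is greater than or equal to 5
is_not_five = reading != 5        # True, because 7 is not equal to 5
is_below_five = reading < 5       # False, because 7 is not less than 5

print(is_at_least_five, is_not_five, is_below_five)
```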

In the code cell below, Jack has set up a random integer generator using numpy's <code>np.random.randint()</code> function. <a href="https://numpy.org/doc/stable/reference/random/generated/numpy.random.randint.html" rel="noopener noreferrer"><u>Documentation for <code>np.random.randint()</code> can be found here.</u></a>

In the same code cell, set up a conditional statement to generate a Boolean output based on whether or not the randomly generated number is greater than or equal to 5.

In [None]:
liebchen_generator = np.random.randint(10)

<h1 style="font-size: 32px;">Exercise #2: Evaluate if elements of a compound data type satisfy a condition</h1>

Conditional statements can also be applied to compound data types like a list or an array. Dig into <a href="https://numpy.org/doc/stable/reference/random/generated/numpy.random.randint.html" rel="noopener noreferrer"><u>the <code>np.random.randint()</code> documentation</u></a> and see what argument you will need to pass to the function if you want to create a 1D array of ten randomly generated integers between 0 (inclusive) and 10 (exclusive), which is sometimes indicated in documentation as a shorthand [0, 10). 

Create the array below and assign it to a variable to be used in the next exercise.

Now set up a conditional statement as you would with a non-composite data type (like in exercise #1), but instead use the variable for your 1D array. Evaluate whether the variable is greater than or equal to 5, and assign the output to a new variable. We'll also make use of this object in exercise #3.

How does the output look compared to when you set up a conditional statement with just one randomly generated number?

What you can see is that each element of your array is evaluated against the conditional statement that you set up, so the output is a second array, which simply contains within it the Boolean result for each element in their respective position in the array.
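
To see this with values that don't change from run to run, here's a small sketch using a hand-written array instead of a randomly generated one. The names <code>scores</code> and <code>passed</code> are just placeholders.

```python
import numpy as np

# Hypothetical fixed values so the result is the same every time
scores = np.array([1, 6, 3, 8, 5])

# The comparison is applied to every element, giving a Boolean array
passed = scores >= 5

print(passed)        # False, True, False, True, True (one result per element)
print(passed.shape)  # same shape as the array being evaluated
```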

Now use <code>np.random.randint()</code> to try to set up a 2D array that's 4x4 in shape and apply our conditional statement to the 2D array.

How does the output look now? 

You should see that the resulting output is the same shape as the array that you're evaluating, with each position corresponding to that element's Boolean result. This will become important later when we perform image analyses, as this is fundamentally no different than what happens when you threshold an image.
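
Here's a minimal sketch of the same idea on a small, hand-written 2D array (the names and values are made up for illustration). Treating the numbers as pixel intensities, the Boolean output is what a thresholded image would look like.

```python
import numpy as np

# Hypothetical 2 x 3 array standing in for a tiny image
grid = np.array([[1, 6, 3],
                 [8, 5, 0]])

# Apply the threshold to every position at once
mask = grid >= 5

print(mask)
print(grid.shape, mask.shape)  # both are (2, 3)
```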

<h1 style="font-size: 32px;">Exercise #3: Review mathematical operations on arrays</h1>

Recall from 16-3 that you can make use of operators to perform the mathematical operations that we're all familiar with, and when we use these operators on an array, we instruct our Python interpreter to apply that operation to each element (element-wise operation).
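
As a reminder of what element-wise means, here's a short sketch with a hand-written array (names and values are placeholders). Each operator is applied to every element separately.

```python
import numpy as np

# Hypothetical values for illustration
values = np.array([2, 4, 6])

doubled = values * 2    # every element multiplied by 2
shifted = values + 10   # 10 added to every element
halved = values / 2     # every element divided by 2 (result holds floats)

print(doubled, shifted, halved)
```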

For example, take your array of ten randomly generated integers from exercise #2, and multiply each element by 5.

Does the output look as you would expect it to?

The same can be done with other math operators as well. Feel free to try in the cell below.

<h1 style="font-size: 32px;">Exercise #4: Mathematical operations with multiple arrays</h1>

With mathematical operators, we can also perform element-wise operations using two arrays. In this situation, the operation is performed using an element from one array and an element from another array at the same position. So the element at position 0 from the first array and the element at position 0 from the second array are used in the operation, and then Python moves to the next position in both arrays and repeats the operation. This goes on until the end of both arrays. Something to note is that the two objects must have the same shape (or shapes that NumPy can broadcast together, as with the single value in exercise #3); otherwise, you'll run into an error because the element-wise operation can't be performed for elements that aren't matched up to another.

Instead of every element of your array being multiplied (or added, subtracted, divided) by the same constant value (like in exercise #3), the value is instead dependent on the elements in the second array. 
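
Here's a minimal sketch of both situations: two arrays that line up position by position, and two arrays that don't. The arrays are made up for illustration.

```python
import numpy as np

# Two hypothetical arrays with the same shape
a = np.array([1, 2, 3])
b = np.array([10, 20, 30])

summed = a + b    # position 0 with position 0, position 1 with position 1, ...
product = a * b

print(summed, product)

# An array with a different length can't be matched up element by element
c = np.array([1, 2])
try:
    a + c
except ValueError as err:
    print("Shapes do not match:", err)
```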

Taking your array from exercise #2 and the Boolean output, multiply them together.

How does the resulting output look? 

Why are some values preserved but others are 0? Depending on the Boolean values in your second array, you may not have 0s or you may have all 0s.
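
The reason is that <code>True</code> behaves like 1 and <code>False</code> behaves like 0 in arithmetic. Here's a small sketch with fixed, made-up values so you can trace each position by hand.

```python
import numpy as np

# Hypothetical values for illustration
values = np.array([1, 6, 3, 8, 5])
mask = values >= 5

# True acts as 1 and False acts as 0, so values below the threshold become 0
kept = values * mask

print(mask)
print(kept)
```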

When we dive into image analysis in the Fall, you'll learn more about how you can apply this concept to image analysis because what you've essentially just done is segmented your array based on a specified threshold.

<h1 style="font-size: 32px;">Exercise #5: Understanding DataFrames</h1>

DataFrames will probably be among the most important compound data types that you'll be working with because the pandas package facilitates much more convenient data handling via its DataFrames and associated functions. DataFrames can be thought of as a container holding multiple Series together in a single object. Because of all the tools available, DataFrames offer a lot of versatility for the types of analyses that we often do. DataFrames, often abbreviated as <code>df</code>, are more similar to the types of tables/spreadsheets that we are familiar with through Excel and Google Sheets. 

DataFrames are two-dimensional with both rows and columns, unlike Series, which only contain positions along a single dimension. Each row and column of a DataFrame has its own positional information and can also carry a label defining what that row or column is. Again, not too different from Excel or Google Sheets.

DataFrames can also hold heterogeneous data, much like lists. So you can have floats, integers, strings, etc., all in a single DataFrame, and you won't usually run into issues.

There's a large number of ways to create a pandas DataFrame from the ground up, and you can find <a href="https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html" rel="noopener noreferrer"><u>documentation on different ways to do so here</u></a>.

By making use of the <code>pd.DataFrame()</code> function, we can construct a 2D data structure that contains rows and columns. And if we specify labels for rows and/or columns, we can access their elements using "dictionary-notation". So in a way, much like Series, a DataFrame can be thought of as dictionary-like as well.

Let's visualize the DataFrame that we want to create as if it were in an Excel spreadsheet:

<table style="margin-left: 0px;">
    <tr>
        <th style="background-color: none; border-left: none; border-top: none;">&nbsp;</th>
        <th style="background-color: #EEEEEE; border: 1px solid; border-color: #000000;">x</th>
        <th style="background-color: #EEEEEE; border: 1px solid; border-color: #000000;">y</th>
    </tr>
    <tr>
        <td style="background-color: #EEEEEE; border: 1px solid; border-color: #000000;"><strong>0</strong></td>
        <td style="background-color: #FFFFFF; border: 1px solid; border-color: #000000;">100</td>
        <td style="background-color: #FFFFFF; border: 1px solid; border-color: #000000;">200</td>
    </tr>
    <tr>
        <td style="background-color: #EEEEEE; border: 1px solid; border-color: #000000;"><strong>1</strong></td>
        <td style="background-color: #FFFFFF; border: 1px solid; border-color: #000000;">13</td>
        <td style="background-color: #FFFFFF; border: 1px solid; border-color: #000000;">10</td>
    </tr>
    <tr>
        <td style="background-color: #EEEEEE; border: 1px solid; border-color: #000000;"><strong>2</strong></td>
        <td style="background-color: #FFFFFF; border: 1px solid; border-color: #000000;">1000</td>
        <td style="background-color: #FFFFFF; border: 1px solid; border-color: #000000;">2000</td>
    </tr>
    <tr>
        <td style="background-color: #EEEEEE; border: 1px solid; border-color: #000000;"><strong>3</strong></td>
        <td style="background-color: #FFFFFF; border: 1px solid; border-color: #000000;">5</td>
        <td style="background-color: #FFFFFF; border: 1px solid; border-color: #000000;">2</td>
    </tr>    
</table>

<h1 style="font-size: 32px;">Exercise #6: Creating DataFrame from list or array</h1>

See if you can create a 2D <u>list</u> or 2D <u>array</u> that is organized like the table above. You don't need to worry about the row/column labels for now. We'll use this 2D matrix to create a DataFrame in the next step. Save the list to a variable.

Now dig into <a href="https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html" rel="noopener noreferrer"><u>the documentation for <code>pd.DataFrame()</code></u></a> and see if you can set up a DataFrame with our 2D list. Save the DataFrame to a variable.

Now print the variable to visualize your DataFrame. You can also make use of the <code>df.style</code> attribute to have Python output a stylized/formatted table for your DataFrame. <a href="https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.style.html#pandas.DataFrame.style" rel="noopener noreferrer"><u>Documentation on the style attribute can be found here.</u></a> You can replace the <code>df</code> placeholder with the variable corresponding to your DataFrame.

How does the output look between using <code>print()</code> compared to <code>df.style</code>?

What you might have noticed is that none of the columns have the labels that our example table had. Go back to your cell where you created the DataFrame. What adjustments do you need to make to the code in order to add the column labels when the DataFrame is set up?

Update the code and regenerate the DataFrame with the correct columns.

Recall that you can use dictionary-notation to access a column in a DataFrame. See if you can pull out the <code>x</code> column, and then see if you can pull out the <code>y</code> column by treating the column labels as dictionary keys.
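
If you get stuck, here's a minimal sketch of the overall pattern using different, made-up values and labels (<code>'a'</code> and <code>'b'</code>), so you can adapt it to the table from exercise #5. It assumes the <code>columns</code> argument of <code>pd.DataFrame()</code> is used to supply the labels.

```python
import pandas as pd

# Hypothetical 2D list: each inner list is one row
rows = [[1, 2],
        [3, 4],
        [5, 6]]

# The columns argument supplies the column labels
example_df = pd.DataFrame(rows, columns=['a', 'b'])
print(example_df)

# Dictionary-notation pulls out a single column as a Series
column_a = example_df['a']
print(column_a)
```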

<h1 style="font-size: 32px;">Exercise #7: Creating DataFrame from a dictionary</h1>

In addition to lists, we can also set up a DataFrame from a dictionary as well. In this situation, the dictionary key will simply become the column label and the values paired to that key populate the column. In order to pair multiple values to a single key, we'll group all those values together into a list, and then pair the list to our key.

For example, we can set up a dictionary with a single key:value pair, where the value is a list object.

```
Tony = {'cat' : [0, 1, 2, 3]}
```

And if we want, we can then set up a dictionary with multiple key:value pairs.

```
Tony = {'cat' : [0, 1, 2, 3], 'meow' : [4, 5, 6, 7], 'whisker' : [8, 9, 10, 11]}
```

Take a look at how the dictionaries print, and see if you can use dictionary-notation to pull out <code>'cat'</code>.

Now pass the dictionary to the <code>pd.DataFrame()</code> function.

How does the DataFrame look? What happens if your dictionary's key:value pairs have different amounts of values?
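
Here's a small sketch of both cases, using a shortened version of the dictionary above. The variable names are placeholders. When the lists have different lengths, pandas can't line the values up into rows and raises a <code>ValueError</code>.

```python
import pandas as pd

# Each key becomes a column label; each list fills that column
pets = {'cat': [0, 1, 2, 3], 'meow': [4, 5, 6, 7]}
pets_df = pd.DataFrame(pets)
print(pets_df)

# Lists of different lengths can't be lined up into rows
uneven = {'cat': [0, 1, 2, 3], 'meow': [4, 5]}
try:
    pd.DataFrame(uneven)
except ValueError as err:
    print("Could not build the DataFrame:", err)
```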

<h1 style="font-size: 32px;">Exercise #8: Creating DataFrame from pandas Series</h1>

You can use the <code>pd.DataFrame()</code> function to create DataFrames from Series, and how you set up the arguments can affect whether you create a DataFrame with your Series as a column or as a row.

If you want your Series to populate a column, then you can simply set up your Series, and run it through the <code>pd.DataFrame()</code> function.

Let's set up a simple Series matching the values from column <code>x</code> of our DataFrame from exercise #5. Recall that to define a Series, you will need to use the <code>pd.Series()</code> function. <a href="https://pandas.pydata.org/docs/reference/api/pandas.Series.html" rel="noopener noreferrer"><u>Documentation for <code>pd.Series()</code> can be found here.</u></a> Also specify the <code>name</code> as <code>'x'</code> when creating the Series. Save the Series to a variable.

Now use the <code>pd.DataFrame()</code> function to create a DataFrame from your Series.

How have the values populated your newly created DataFrame?

If you instead want your Series to populate a row, essentially filling in the values horizontally across the DataFrame, you can place your Series into a list when passing it to <code>pd.DataFrame()</code>.

For example,

```
Liebchen = pd.DataFrame([meow_series])
```

See if you can have your Series populate a row instead of a column when setting up your DataFrame.
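
To compare the two orientations side by side, here's a minimal sketch with a made-up Series (the values and the name <code>'a'</code> are placeholders, not the ones from the exercise).

```python
import pandas as pd

# Hypothetical Series for illustration
example_series = pd.Series([1, 2, 3], name='a')

# Passing the Series directly: values run down a single column labeled 'a'
as_column = pd.DataFrame(example_series)

# Passing the Series inside a list: values run across a single row labeled 'a'
as_row = pd.DataFrame([example_series])

print(as_column.shape)  # (3, 1)
print(as_row.shape)     # (1, 3)
```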

<h1 style="font-size: 32px;">Exercise #9: Setting up DataFrame from more than one Series</h1>

To prepare for this exercise, set up a Series with values matching column <code>y</code> from exercise #5. Also specify the <code>name</code> as <code>'y'</code> when creating this Series. Save to another variable.

To then create a DataFrame from the two Series we set up (corresponding to the <code>x</code> and <code>y</code> columns), there are different approaches we can potentially take, and below are just two examples.

The first approach will allow us to fill in each Series as its own column within a DataFrame. To do this, you can follow the same setup that we used to create a DataFrame from a dictionary containing multiple key:value pairs. Instead of pairing keys to lists, you'll pair keys to Series. See if you can set up a DataFrame with this approach.

The second approach will have us fill in each Series as its own <u>row</u> within a DataFrame. In this case, rather than having our Series in a dictionary, we group our Series together into a list. We provide the list to the <code>pd.DataFrame()</code> function. Give that a try below.

How does the DataFrame look?
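
For comparison, here's a sketch of both approaches using two made-up Series (names and values are placeholders). The only difference is whether the Series are grouped in a dictionary or in a list.

```python
import pandas as pd

# Two hypothetical Series for illustration
first = pd.Series([1, 2, 3], name='a')
second = pd.Series([4, 5, 6], name='b')

# Approach 1: a dictionary of Series, so each Series becomes a column
by_column = pd.DataFrame({'a': first, 'b': second})

# Approach 2: a list of Series, so each Series becomes a row
by_row = pd.DataFrame([first, second])

print(by_column.shape)  # (3, 2)
print(by_row.shape)     # (2, 3)
```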

<h1 style="font-size: 32px;">Exercise #10: Transposing a DataFrame</h1>

In exercise #9, you ended up with two DataFrames with different orientations. If you ever want to transpose your DataFrame (changing rows to columns and columns to rows), you can easily do so with the <code>df.transpose()</code> function or the <code>df.T</code> property. <a href="https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.transpose.html" rel="noopener noreferrer"><u>Documentation on the <code>df.transpose()</code> function is found here.</u></a>

See if you can take your last DataFrame from exercise #9 and transpose it so that the rows are now the columns.

In a new code cell below, output a stylized table (see exercise #6 on the <code>df.style</code> attribute).

How does the DataFrame look now?
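
Here's a minimal sketch showing that <code>df.T</code> and <code>df.transpose()</code> give the same result. The DataFrame and its labels are made up for illustration.

```python
import pandas as pd

# Hypothetical DataFrame with 2 rows and 3 columns
wide = pd.DataFrame([[1, 2, 3],
                     [4, 5, 6]], index=['a', 'b'])

# Rows become columns and columns become rows
tall = wide.T
also_tall = wide.transpose()

print(wide.shape, tall.shape)  # (2, 3) (3, 2)
print(tall.equals(also_tall))  # True
```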

<h1 style="font-size: 32px;">Exercise #11: Retrieving elements in a DataFrame</h1>

To prepare for this exercise, let's take what we've learned and set up a DataFrame that looks like the below table:

<table style="margin-left: 0px;">
    <tr>
        <th style="background-color: none; border-left: none; border-top: none; width: 100px;">&nbsp;</th>
        <th style="background-color: #EEEEEE; border: 1px solid; border-color: #000000; width: 100px;">size</th>
        <th style="background-color: #EEEEEE; border: 1px solid; border-color: #000000; width: 100px;">weight</th>
        <th style="background-color: #EEEEEE; border: 1px solid; border-color: #000000; width: 100px;">taste</th>
    </tr>
    <tr>
        <td style="background-color: #EEEEEE; border: 1px solid; border-color: #000000; "><strong>apple</strong></td>
        <td style="background-color: #FFFFFF; border: 1px solid; border-color: #000000;">2</td>
        <td style="background-color: #FFFFFF; border: 1px solid; border-color: #000000;">5</td>
        <td style="background-color: #FFFFFF; border: 1px solid; border-color: #000000;">8</td>
    </tr>
    <tr>
        <td style="background-color: #EEEEEE; border: 1px solid; border-color: #000000;"><strong>orange</strong></td>
        <td style="background-color: #FFFFFF; border: 1px solid; border-color: #000000;">3</td>
        <td style="background-color: #FFFFFF; border: 1px solid; border-color: #000000;">5</td>
        <td style="background-color: #FFFFFF; border: 1px solid; border-color: #000000;">7</td>
    </tr>
    <tr>
        <td style="background-color: #EEEEEE; border: 1px solid; border-color: #000000;"><strong>cherry</strong></td>
        <td style="background-color: #FFFFFF; border: 1px solid; border-color: #000000;">1</td>
        <td style="background-color: #FFFFFF; border: 1px solid; border-color: #000000;">2</td>
        <td style="background-color: #FFFFFF; border: 1px solid; border-color: #000000;">3</td>
    </tr>
    <tr>
        <td style="background-color: #EEEEEE; border: 1px solid; border-color: #000000;"><strong>pineapple</strong></td>
        <td style="background-color: #FFFFFF; border: 1px solid; border-color: #000000;">8</td>
        <td style="background-color: #FFFFFF; border: 1px solid; border-color: #000000;">5</td>
        <td style="background-color: #FFFFFF; border: 1px solid; border-color: #000000;">7</td>
    </tr>    
</table>

When working with your data, you'll oftentimes need to retrieve a subset of your data to perform calculations, run statistical analyses, or simply visualize your results. Like with other compound data types, you can pull information from your DataFrame. You can specify the column, the row (index), or both (if you wanted to retrieve a specific cell).

The syntax is like that used for a Series, where you can treat it like a dictionary and use keys to tell Python what rows and columns to retrieve data from. Depending on what you want to retrieve, you will provide Python with slightly different things.

<h3>Using dictionary-notation</h3>

You can simply use dictionary-notation to pull columns out based on their key.

```
fruits_df['size']
```

The above code will pull out the <code>'size'</code> column from our DataFrame, which is not too different from selecting a column in Excel or Google Sheets like in the below table.

<table style="margin-left: 0px;">
    <tr>
        <th style="background-color: none; border-left: none; border-top: none; width: 100px;">&nbsp;</th>
        <th style="background-color: #FFFF00; border: 1px solid; border-color: #000000; width: 100px;">size</th>
        <th style="background-color: #EEEEEE; border: 1px solid; border-color: #000000; width: 100px;">weight</th>
        <th style="background-color: #EEEEEE; border: 1px solid; border-color: #000000; width: 100px;">taste</th>
    </tr>
    <tr>
        <td style="background-color: #EEEEEE; border: 1px solid; border-color: #000000; "><strong>apple</strong></td>
        <td style="background-color: #FFFF00; border: 1px solid; border-color: #000000;">2</td>
        <td style="background-color: #FFFFFF; border: 1px solid; border-color: #000000;">5</td>
        <td style="background-color: #FFFFFF; border: 1px solid; border-color: #000000;">8</td>
    </tr>
    <tr>
        <td style="background-color: #EEEEEE; border: 1px solid; border-color: #000000;"><strong>orange</strong></td>
        <td style="background-color: #FFFF00; border: 1px solid; border-color: #000000;">3</td>
        <td style="background-color: #FFFFFF; border: 1px solid; border-color: #000000;">5</td>
        <td style="background-color: #FFFFFF; border: 1px solid; border-color: #000000;">7</td>
    </tr>
    <tr>
        <td style="background-color: #EEEEEE; border: 1px solid; border-color: #000000;"><strong>cherry</strong></td>
        <td style="background-color: #FFFF00; border: 1px solid; border-color: #000000;">1</td>
        <td style="background-color: #FFFFFF; border: 1px solid; border-color: #000000;">2</td>
        <td style="background-color: #FFFFFF; border: 1px solid; border-color: #000000;">3</td>
    </tr>
    <tr>
        <td style="background-color: #EEEEEE; border: 1px solid; border-color: #000000;"><strong>pineapple</strong></td>
        <td style="background-color: #FFFF00; border: 1px solid; border-color: #000000;">8</td>
        <td style="background-color: #FFFFFF; border: 1px solid; border-color: #000000;">5</td>
        <td style="background-color: #FFFFFF; border: 1px solid; border-color: #000000;">7</td>
    </tr>    
</table>

Where the yellow cells are the ones that you are retrieving.

Give that a try below. Try subsetting your data one column at a time.

If we wanted to pull more than one column, we can specify a list of column labels instead of just a single column label.

```
fruits_df[['size', 'weight']]
```

So then the idea isn't too different than selecting multiple columns on Excel or Google Sheets.

<table style="margin-left: 0px;">
    <tr>
        <th style="background-color: none; border-left: none; border-top: none; width: 100px;">&nbsp;</th>
        <th style="background-color: #FFFF00; border: 1px solid; border-color: #000000; width: 100px;">size</th>
        <th style="background-color: #FFFF00; border: 1px solid; border-color: #000000; width: 100px;">weight</th>
        <th style="background-color: #EEEEEE; border: 1px solid; border-color: #000000; width: 100px;">taste</th>
    </tr>
    <tr>
        <td style="background-color: #EEEEEE; border: 1px solid; border-color: #000000; "><strong>apple</strong></td>
        <td style="background-color: #FFFF00; border: 1px solid; border-color: #000000;">2</td>
        <td style="background-color: #FFFF00; border: 1px solid; border-color: #000000;">5</td>
        <td style="background-color: #FFFFFF; border: 1px solid; border-color: #000000;">8</td>
    </tr>
    <tr>
        <td style="background-color: #EEEEEE; border: 1px solid; border-color: #000000;"><strong>orange</strong></td>
        <td style="background-color: #FFFF00; border: 1px solid; border-color: #000000;">3</td>
        <td style="background-color: #FFFF00; border: 1px solid; border-color: #000000;">5</td>
        <td style="background-color: #FFFFFF; border: 1px solid; border-color: #000000;">7</td>
    </tr>
    <tr>
        <td style="background-color: #EEEEEE; border: 1px solid; border-color: #000000;"><strong>cherry</strong></td>
        <td style="background-color: #FFFF00; border: 1px solid; border-color: #000000;">1</td>
        <td style="background-color: #FFFF00; border: 1px solid; border-color: #000000;">2</td>
        <td style="background-color: #FFFFFF; border: 1px solid; border-color: #000000;">3</td>
    </tr>
    <tr>
        <td style="background-color: #EEEEEE; border: 1px solid; border-color: #000000;"><strong>pineapple</strong></td>
        <td style="background-color: #FFFF00; border: 1px solid; border-color: #000000;">8</td>
        <td style="background-color: #FFFF00; border: 1px solid; border-color: #000000;">5</td>
        <td style="background-color: #FFFFFF; border: 1px solid; border-color: #000000;">7</td>
    </tr>    
</table>

Give that a try below.

We can also rearrange our columns by re-ordering the column labels when we specify them in the list to pull out columns. Give that a try below to see if you can rearrange columns.
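
Putting the dictionary-notation examples together, here's a sketch that builds the fruit table from the start of this exercise and then selects and reorders columns. Building the table from a dictionary with the <code>index</code> argument is just one possible approach, and the result names are placeholders.

```python
import pandas as pd

# One possible way to build the fruit table shown above
fruits_df = pd.DataFrame(
    {'size': [2, 3, 1, 8], 'weight': [5, 5, 2, 5], 'taste': [8, 7, 3, 7]},
    index=['apple', 'orange', 'cherry', 'pineapple'],
)

# A single label returns one column as a Series
size_only = fruits_df['size']

# A list of labels returns a DataFrame with just those columns
size_and_weight = fruits_df[['size', 'weight']]

# The order of the labels in the list sets the order of the columns
reordered = fruits_df[['taste', 'size', 'weight']]
print(reordered)
```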

<h3>To retrieve elements using the locator function:</h3>

Recall from lecture that you can also retrieve data using a label-based or index-based locator. 

The label-based locator is specified by <code>df.loc[]</code>, where <code>df</code> is a placeholder for a DataFrame, and the label/key goes into the <code>&lbrack;&rbrack;</code> square brackets. The index-based locator is specified by <code>df.iloc[]</code>, where the row or column index position goes into the <code>&lbrack;&rbrack;</code> square brackets.

<a href="https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.loc.html" rel="noopener noreferrer"><u>The documentation for <code>df.loc[]</code> can be found here.</u></a>

<a href="https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.iloc.html" rel="noopener noreferrer"><u>The documentation for <code>df.iloc[]</code> can be found here.</u></a>

If you wanted to pull a single row, you would simply specify the key for <code>df.loc[]</code> or the index position for <code>df.iloc[]</code>. 

For example, if you wanted to pull the row <code>orange</code>, which is located at position 1:

```
fruits_df.loc['orange']
```

Or I could use the index position:

```
fruits_df.iloc[1]
```

Notice that <code>.loc</code> was swapped to using the index-based locator <code>.iloc</code> because you're providing the index value, and you don't want Python to think it's corresponding to a key.

So that would look something like this in Excel or Google Sheets:

<table style="margin-left: 0px;">
    <tr>
        <th style="background-color: none; border-left: none; border-top: none; width: 100px;">&nbsp;</th>
        <th style="background-color: #EEEEEE; border: 1px solid; border-color: #000000; width: 100px;">size</th>
        <th style="background-color: #EEEEEE; border: 1px solid; border-color: #000000; width: 100px;">weight</th>
        <th style="background-color: #EEEEEE; border: 1px solid; border-color: #000000; width: 100px;">taste</th>
    </tr>
    <tr>
        <td style="background-color: #EEEEEE; border: 1px solid; border-color: #000000; "><strong>apple</strong></td>
        <td style="background-color: #FFFFFF; border: 1px solid; border-color: #000000;">2</td>
        <td style="background-color: #FFFFFF; border: 1px solid; border-color: #000000;">5</td>
        <td style="background-color: #FFFFFF; border: 1px solid; border-color: #000000;">8</td>
    </tr>
    <tr>
        <td style="background-color: #FFFF00; border: 1px solid; border-color: #000000;"><strong>orange</strong></td>
        <td style="background-color: #FFFF00; border: 1px solid; border-color: #000000;">3</td>
        <td style="background-color: #FFFF00; border: 1px solid; border-color: #000000;">5</td>
        <td style="background-color: #FFFF00; border: 1px solid; border-color: #000000;">7</td>
    </tr>
    <tr>
        <td style="background-color: #EEEEEE; border: 1px solid; border-color: #000000;"><strong>cherry</strong></td>
        <td style="background-color: #FFFFFF; border: 1px solid; border-color: #000000;">1</td>
        <td style="background-color: #FFFFFF; border: 1px solid; border-color: #000000;">2</td>
        <td style="background-color: #FFFFFF; border: 1px solid; border-color: #000000;">3</td>
    </tr>
    <tr>
        <td style="background-color: #EEEEEE; border: 1px solid; border-color: #000000;"><strong>pineapple</strong></td>
        <td style="background-color: #FFFFFF; border: 1px solid; border-color: #000000;">8</td>
        <td style="background-color: #FFFFFF; border: 1px solid; border-color: #000000;">5</td>
        <td style="background-color: #FFFFFF; border: 1px solid; border-color: #000000;">7</td>
    </tr>    
</table>


Give it a try below and retrieve rows from a DataFrame using the index-based and/or label-based locators.
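
Here's a self-contained sketch comparing the two locators on the fruit table. The way the table is built here is just one possible approach, and the result names are placeholders.

```python
import pandas as pd

# One possible way to build the fruit table shown above
fruits_df = pd.DataFrame(
    {'size': [2, 3, 1, 8], 'weight': [5, 5, 2, 5], 'taste': [8, 7, 3, 7]},
    index=['apple', 'orange', 'cherry', 'pineapple'],
)

# Label-based: ask for the row by its label
orange_by_label = fruits_df.loc['orange']

# Index-based: ask for the row by its position (counting from 0)
orange_by_position = fruits_df.iloc[1]

print(orange_by_label)
print(orange_by_label.equals(orange_by_position))  # True
```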

One thing to keep in mind is that our DataFrames are 2D, and so if we want to do more than just pull rows using the locators, we'll need to actually specify the axes as well.

For example, if I wanted to use <code>.loc</code> to retrieve the <code>weight</code> column:

```
fruits_df.loc[:, 'weight']
```

You might notice that the information passed to the <code>.loc</code> is different now. 

<strong>Breaking down the line of code, you get:</strong>

<ul>
    <li><code>fruits_df</code> - This is our DataFrame from which we're retrieving elements.</li>
    <li><code>.loc[]</code> - This is our label-based locator.</li>
    <li><code>:</code> - The colon is slice notation that indicates that we want to retrieve everything from the specified axis.</li>
    <li><code>,</code> - The comma is a delimiter separating out the two axes of a DataFrame. To the left of the comma are the rows, and to the right are the columns. So the colon to the left of the comma indicates that we're pulling all the rows.</li>
    <li><code>'weight'</code> - This is the label for the weight column.</li>
</ul>

So overall, the code is invoking the <code>fruits_df</code> DataFrame, trying to locate something within it based on the label, and it will retrieve everything that is the intersection of all rows and the weight column.

So that will look something like the below in Excel or Google Sheets:

<table style="margin-left: 0px;">
    <tr>
        <th style="background-color: none; border-left: none; border-top: none; width: 100px;">&nbsp;</th>
        <th style="background-color: #EEEEEE; border: 1px solid; border-color: #000000; width: 100px;">size</th>
        <th style="background-color: #FFFF00; border: 1px solid; border-color: #000000; width: 100px;">weight</th>
        <th style="background-color: #EEEEEE; border: 1px solid; border-color: #000000; width: 100px;">taste</th>
    </tr>
    <tr>
        <td style="background-color: #EEEEEE; border: 1px solid; border-color: #000000; "><strong>apple</strong></td>
        <td style="background-color: #FFFFFF; border: 1px solid; border-color: #000000;">2</td>
        <td style="background-color: #FFFF00; border: 1px solid; border-color: #000000;">5</td>
        <td style="background-color: #FFFFFF; border: 1px solid; border-color: #000000;">8</td>
    </tr>
    <tr>
        <td style="background-color: #EEEEEE; border: 1px solid; border-color: #000000;"><strong>orange</strong></td>
        <td style="background-color: #FFFFFF; border: 1px solid; border-color: #000000;">3</td>
        <td style="background-color: #FFFF00; border: 1px solid; border-color: #000000;">5</td>
        <td style="background-color: #FFFFFF; border: 1px solid; border-color: #000000;">7</td>
    </tr>
    <tr>
        <td style="background-color: #EEEEEE; border: 1px solid; border-color: #000000;"><strong>cherry</strong></td>
        <td style="background-color: #FFFFFF; border: 1px solid; border-color: #000000;">1</td>
        <td style="background-color: #FFFF00; border: 1px solid; border-color: #000000;">2</td>
        <td style="background-color: #FFFFFF; border: 1px solid; border-color: #000000;">3</td>
    </tr>
    <tr>
        <td style="background-color: #EEEEEE; border: 1px solid; border-color: #000000;"><strong>pineapple</strong></td>
        <td style="background-color: #FFFFFF; border: 1px solid; border-color: #000000;">8</td>
        <td style="background-color: #FFFF00; border: 1px solid; border-color: #000000;">5</td>
        <td style="background-color: #FFFFFF; border: 1px solid; border-color: #000000;">7</td>
    </tr>    
</table>

Give that a try below. See if you can subset your DataFrame.
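Here is a minimal sketch of that subset, assuming it refers to the highlighted weight column. How <code>fruits_df</code> is built here is an assumption: the values come from the table above, with the fruit names used as row labels.

```python
import pandas as pd

# Assumed reconstruction of fruits_df from the values in the table above
fruits_df = pd.DataFrame(
    {"size": [2, 3, 1, 8], "weight": [5, 5, 2, 5], "taste": [8, 7, 3, 7]},
    index=["apple", "orange", "cherry", "pineapple"],
)

# The colon keeps every row; 'weight' picks out a single column
weight = fruits_df.loc[:, "weight"]
print(weight)
```

Selecting a single column this way returns a pandas Series rather than a DataFrame.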

As with dictionary notation, you can also provide a list of column labels to <code>.loc</code>, and it will retrieve the columns specified in the list.

For example:

```
fruits_df.loc[:, ['weight', 'taste']]
```

The above code will then pull the weight and the taste columns, which would look like the following in Excel or Google Sheets:

<table style="margin-left: 0px;">
    <tr>
        <th style="background-color: none; border-left: none; border-top: none; width: 100px;">&nbsp;</th>
        <th style="background-color: #EEEEEE; border: 1px solid; border-color: #000000; width: 100px;">size</th>
        <th style="background-color: #FFFF00; border: 1px solid; border-color: #000000; width: 100px;">weight</th>
        <th style="background-color: #FFFF00; border: 1px solid; border-color: #000000; width: 100px;">taste</th>
    </tr>
    <tr>
        <td style="background-color: #EEEEEE; border: 1px solid; border-color: #000000; "><strong>apple</strong></td>
        <td style="background-color: #FFFFFF; border: 1px solid; border-color: #000000;">2</td>
        <td style="background-color: #FFFF00; border: 1px solid; border-color: #000000;">5</td>
        <td style="background-color: #FFFF00; border: 1px solid; border-color: #000000;">8</td>
    </tr>
    <tr>
        <td style="background-color: #EEEEEE; border: 1px solid; border-color: #000000;"><strong>orange</strong></td>
        <td style="background-color: #FFFFFF; border: 1px solid; border-color: #000000;">3</td>
        <td style="background-color: #FFFF00; border: 1px solid; border-color: #000000;">5</td>
        <td style="background-color: #FFFF00; border: 1px solid; border-color: #000000;">7</td>
    </tr>
    <tr>
        <td style="background-color: #EEEEEE; border: 1px solid; border-color: #000000;"><strong>cherry</strong></td>
        <td style="background-color: #FFFFFF; border: 1px solid; border-color: #000000;">1</td>
        <td style="background-color: #FFFF00; border: 1px solid; border-color: #000000;">2</td>
        <td style="background-color: #FFFF00; border: 1px solid; border-color: #000000;">3</td>
    </tr>
    <tr>
        <td style="background-color: #EEEEEE; border: 1px solid; border-color: #000000;"><strong>pineapple</strong></td>
        <td style="background-color: #FFFFFF; border: 1px solid; border-color: #000000;">8</td>
        <td style="background-color: #FFFF00; border: 1px solid; border-color: #000000;">5</td>
        <td style="background-color: #FFFF00; border: 1px solid; border-color: #000000;">7</td>
    </tr>    
</table>

Give that a try below and see if you can rearrange the column order when using the <code>.loc</code> locator.
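One possible sketch of rearranging columns is shown below. It assumes <code>fruits_df</code> holds the values from the table above (the construction is an assumption). The columns come back in the order they appear in the list.

```python
import pandas as pd

# Assumed reconstruction of fruits_df from the values in the table above
fruits_df = pd.DataFrame(
    {"size": [2, 3, 1, 8], "weight": [5, 5, 2, 5], "taste": [8, 7, 3, 7]},
    index=["apple", "orange", "cherry", "pineapple"],
)

# Columns are returned in the order listed, not the order stored
reordered = fruits_df.loc[:, ["taste", "size"]]
print(reordered)
```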

<h1 style="font-size: 32px;">Exercise #12: Practice retrieving elements</h1>

Now see if you can take what you've learned about specifying two axes with the <code>.loc</code> locator and use that to pull two rows from a DataFrame.
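If you get stuck, here is a small sketch of one way to do it. It assumes the same <code>fruits_df</code> as in the tables above (the construction is an assumption), and the choice of rows is arbitrary.

```python
import pandas as pd

# Assumed reconstruction of fruits_df from the values in the tables above
fruits_df = pd.DataFrame(
    {"size": [2, 3, 1, 8], "weight": [5, 5, 2, 5], "taste": [8, 7, 3, 7]},
    index=["apple", "orange", "cherry", "pineapple"],
)

# A list of row labels goes in the first position; the colon keeps every column
two_rows = fruits_df.loc[["apple", "cherry"], :]
print(two_rows)
```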

<h1 style="font-size: 32px;">Exercise #13: Retrieve a single element</h1>

With the <code>.loc</code> locator, we can specify a single row and a single column, and Python will find the intersection of that row and column, pulling out a single element. That will essentially look like the table below in Excel or Google Sheets:

<table style="margin-left: 0px;">
    <tr>
        <th style="background-color: none; border-left: none; border-top: none; width: 100px;">&nbsp;</th>
        <th style="background-color: #EEEEEE; border: 1px solid; border-color: #000000; width: 100px;">size</th>
        <th style="background-color: #FFFDB6; border: 1px solid; border-color: #000000; width: 100px;">weight</th>
        <th style="background-color: #EEEEEE; border: 1px solid; border-color: #000000; width: 100px;">taste</th>
    </tr>
    <tr>
        <td style="background-color: #EEEEEE; border: 1px solid; border-color: #000000; "><strong>apple</strong></td>
        <td style="background-color: #FFFFFF; border: 1px solid; border-color: #000000;">2</td>
        <td style="background-color: #FFFDB6; border: 1px solid; border-color: #000000;">5</td>
        <td style="background-color: #FFFFFF; border: 1px solid; border-color: #000000;">8</td>
    </tr>
    <tr>
        <td style="background-color: #EEEEEE; border: 1px solid; border-color: #000000;"><strong>orange</strong></td>
        <td style="background-color: #FFFFFF; border: 1px solid; border-color: #000000;">3</td>
        <td style="background-color: #FFFDB6; border: 1px solid; border-color: #000000;">5</td>
        <td style="background-color: #FFFFFF; border: 1px solid; border-color: #000000;">7</td>
    </tr>
    <tr>
        <td style="background-color: #FFFDB6; border: 1px solid; border-color: #000000;"><strong>cherry</strong></td>
        <td style="background-color: #FFFDB6; border: 1px solid; border-color: #000000;">1</td>
        <td style="background-color: #FFFF00; border: 1px solid; border-color: #000000;">2</td>
        <td style="background-color: #FFFDB6; border: 1px solid; border-color: #000000;">3</td>
    </tr>
    <tr>
        <td style="background-color: #EEEEEE; border: 1px solid; border-color: #000000;"><strong>pineapple</strong></td>
        <td style="background-color: #FFFFFF; border: 1px solid; border-color: #000000;">8</td>
        <td style="background-color: #FFFDB6; border: 1px solid; border-color: #000000;">5</td>
        <td style="background-color: #FFFFFF; border: 1px solid; border-color: #000000;">7</td>
    </tr>    
</table>

See if you can pull a single value as shown in the above table.
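As a sketch, the highlighted cell in the table sits where the cherry row meets the weight column. The <code>fruits_df</code> construction below is an assumption based on the table values.

```python
import pandas as pd

# Assumed reconstruction of fruits_df from the values in the table above
fruits_df = pd.DataFrame(
    {"size": [2, 3, 1, 8], "weight": [5, 5, 2, 5], "taste": [8, 7, 3, 7]},
    index=["apple", "orange", "cherry", "pineapple"],
)

# One row label plus one column label returns a single value
cherry_weight = fruits_df.loc["cherry", "weight"]
print(cherry_weight)  # → 2
```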

<h1 style="font-size: 32px;">Exercise #14: Filtering data</h1>

For this exercise, we'll combine what we learned about conditional statements with what we've learned about retrieving elements from a DataFrame in order to filter a DataFrame, for example to remove data we don't want or won't need for analysis.

Like with lists, we can also set up a conditional statement with DataFrames. In the cell below, apply a conditional statement to <code>fruits_df</code> to determine which elements are greater than 2.

What does the output look like? What is the data type of the outputs?
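For reference, here is a minimal sketch of an element-wise comparison. It assumes <code>fruits_df</code> holds the values from the earlier tables (the construction is an assumption).

```python
import pandas as pd

# Assumed reconstruction of fruits_df from the values in the earlier tables
fruits_df = pd.DataFrame(
    {"size": [2, 3, 1, 8], "weight": [5, 5, 2, 5], "taste": [8, 7, 3, 7]},
    index=["apple", "orange", "cherry", "pineapple"],
)

# The comparison is applied to every element individually
mask = fruits_df > 2
print(mask)
print(mask.dtypes)
```

The result is a DataFrame with the same shape and labels as <code>fruits_df</code>, in which each element is a Boolean.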

In practice, we will often filter based on a specific condition rather than broadly across all rows and columns. For example, we might be interested in the characteristics of fruits that are greater than 2 in size, so we'll want to filter out fruits that do not meet that minimum size.

To break this down into manageable pieces, let's first see if we can retrieve just the size column and then set up a conditional statement to return a Boolean value based on whether or not the elements are greater than 2.

One useful feature of <code>.loc</code> is that we can also pass it a Boolean array, which is what we got above when we set up our conditional statement for the size column. So let's insert our conditional statement into the brackets of <code>.loc[]</code>, and don't forget to specify which DataFrame we're pulling from, so <code>fruits_df.loc[]</code>.

Before running the code, remember that we may need to specify two dimensions when invoking <code>.loc</code> to make sure that Python knows clearly how we want it to filter our DataFrame. Which axis would the conditional statement be applied to? And how would we tell Python to pull all the values along the other axis after filtering based on size?

Run your code to see if you were able to get Python to filter your DataFrame, leaving you with just the fruits whose size is greater than two and their characteristics.
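The steps above could be sketched roughly as follows, assuming <code>fruits_df</code> holds the values from the earlier tables (the construction and the variable names <code>size_mask</code> and <code>big_fruits</code> are assumptions for illustration). Note that the size column is retrieved with <code>.loc</code> rather than <code>fruits_df.size</code>, because <code>.size</code> is already a DataFrame attribute that gives the number of elements.

```python
import pandas as pd

# Assumed reconstruction of fruits_df from the values in the earlier tables
fruits_df = pd.DataFrame(
    {"size": [2, 3, 1, 8], "weight": [5, 5, 2, 5], "taste": [8, 7, 3, 7]},
    index=["apple", "orange", "cherry", "pineapple"],
)

# Step 1: Boolean Series, one True/False per row
size_mask = fruits_df.loc[:, "size"] > 2
print(size_mask)

# Step 2: the mask filters the rows; the colon keeps every column
big_fruits = fruits_df.loc[size_mask, :]
print(big_fruits)
```

The condition is applied along the rows, so it goes in the first position of <code>.loc[]</code>, and the colon in the second position keeps all of the columns for the rows that remain.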

<h1 style="font-size: 32px;">Exercise #15: Performing math operations on DataFrames</h1>

As with arrays, you can use math operators to perform element-wise calculations on DataFrames. Pandas also has a large number of functions that you can invoke to perform more complicated calculations or to do more specific operations on your DataFrame.

Play around with some basic math operations with your DataFrame. Is the output as you would expect for simple calculations?
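A few illustrative operations are sketched below, assuming <code>fruits_df</code> holds the values from the earlier tables (the construction and the particular calculations are assumptions chosen for illustration).

```python
import pandas as pd

# Assumed reconstruction of fruits_df from the values in the earlier tables
fruits_df = pd.DataFrame(
    {"size": [2, 3, 1, 8], "weight": [5, 5, 2, 5], "taste": [8, 7, 3, 7]},
    index=["apple", "orange", "cherry", "pineapple"],
)

# A scalar is applied to every element
doubled = fruits_df * 2
shifted = fruits_df + 10

# Two columns are combined row by row, matched on their row labels
weight_per_size = fruits_df.loc[:, "weight"] / fruits_df.loc[:, "size"]

print(doubled)
print(shifted)
print(weight_per_size)
```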

Pandas can also perform more specific or complicated calculations, such as calculating the mean. To do this, you'll make use of the <code>df.mean()</code> method. <a href="https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.mean.html" rel="noopener noreferrer"><u>Documentation for <code>df.mean()</code> is found here.</u></a>

In our situation, we will want to find the average of the size, weight, and taste, which means that we will operate along each column, finding the average of values within a column, rather than along each row. Look into <a href="https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.mean.html" rel="noopener noreferrer"><u>the documentation for <code>df.mean()</code></u></a> and see if you can identify what argument we will need to pass to the <code>df.mean()</code> method.
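Once you've had a look at the documentation, you can compare your answer against the sketch below. It assumes <code>fruits_df</code> holds the values from the earlier tables (the construction is an assumption) and uses the <code>axis</code> argument to choose the direction of the calculation.

```python
import pandas as pd

# Assumed reconstruction of fruits_df from the values in the earlier tables
fruits_df = pd.DataFrame(
    {"size": [2, 3, 1, 8], "weight": [5, 5, 2, 5], "taste": [8, 7, 3, 7]},
    index=["apple", "orange", "cherry", "pineapple"],
)

# axis=0 averages down each column: one mean per characteristic
column_means = fruits_df.mean(axis=0)
print(column_means)

# axis=1 averages across each row: one mean per fruit
row_means = fruits_df.mean(axis=1)
print(row_means)
```

With these values, the column means are 3.5 for size, 4.25 for weight, and 6.25 for taste.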

<h1 style="font-size: 40px; margin-bottom: 0px;">Summary</h1>
<hr style="margin-left: 0px; border: 0.25px solid; border-color: #000000; width: 400px;"></hr>

Today, you reinforced your understanding of DataFrames and began learning how to work with them. Specifically, you learned how to access and retrieve elements within a DataFrame, and then you built on that knowledge, together with your knowledge of conditional statements, to filter a DataFrame. You also learned that you can use the usual operators to perform element-wise calculations on DataFrames, and that pandas provides additional functions for more specific or more complicated calculations.