## <center><b>Python for Data Science</b></center>
## <center><b>Lesson 28</b></center>
## <center><b>Pandas Data Manipulation -- Part Two</b></center>
## <center><b>Working with Pandas DataFrames ... The Essentials Part Two (Notes)</b></center>

![7.jpg](attachment:7.jpg)

<font size="6"><center>[Link: Pandas Documentation](https://pandas.pydata.org/docs/)</center></font>

##  <span style="color:red">TABLE OF CONTENTS</span>

1. [Sorting a DataFrame](#1)<br>
a. [Sorting by a Single Column](#1a)<br>
b. [Sorting by Multiple Columns](#1b)<br>
2. [Sorting by Index](#2)<br>
a. [Sorting by row index](#2a)<br>
b. [Sorting by column index](#2b)<br>
3. [Setting a New Index](#3)<br>

In [None]:
# Set up the notebook to display multiple outputs from one cell

from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

print('The notebook is set up to display multiple outputs from one cell.')

In [None]:
import pandas as pd
import numpy as np

<div class="alert alert-block alert-warning">
    <b><font size="4">Files needed for this presentation:</font></b>
</div>

[**nba.csv**](https://drive.google.com/file/d/1QqOGTK_NWp6jtKYIhXc5tdlzYz_9y3nf/view?usp=share_link)

<hr style="border:1px solid gray">

<a class="anchor" id="1"></a>
# <span style="color:blue"><b>1. Sorting a DataFrame</b></span>

Data:  &emsp;[**nba.csv**](https://drive.google.com/file/d/1QqOGTK_NWp6jtKYIhXc5tdlzYz_9y3nf/view?usp=share_link)

In [None]:
nba = pd.read_csv('nba.csv')

In [None]:
nba

<a class="anchor" id="1a"></a>
## <span style="color:red"><b><i>a. Sorting by a Single Column</i></b></span>

- Often, a dataset’s rows arrive in a jumbled, seemingly random order.
- That’s no problem because a DataFrame can be sorted by one or more columns by using the sort_values method.
- Let’s first sort the players in alphabetical order by Name. 
- The sort_values method’s first parameter, <b>by</b>, accepts the column that pandas should use to sort the DataFrame.
- Let’s pass in the Name column as a string:

In [None]:
# The two lines below are equivalent

nba.sort_values("Name")
nba.sort_values(by = "Name")

- The sort_values method’s ascending parameter determines the sort order; it has a default argument of True. 
- By default, pandas will sort a column of numbers in increasing order, a column of strings in alphabetical order, and a column of datetimes in chronological order.
- To sort the names in reverse alphabetical order, we can pass False to the ascending parameter instead.

In [None]:
nba.sort_values("Name", ascending = False).head()

In [None]:
nba.sort_values("Name", ascending = False).tail()
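The ascending parameter described above can be tried on a small table without loading nba.csv. The sketch below uses a made-up three-row DataFrame (the names and salaries are hypothetical, not taken from the data set) and shows that each row keeps its original index label after the sort.

```python
import pandas as pd

# Hypothetical toy data standing in for a few rows of nba.csv
players = pd.DataFrame({
    "Name": ["Carla", "Abe", "Bo"],
    "Salary": [300, 100, 200],
})

# Default sort order is ascending (alphabetical for strings)
ascending_names = players.sort_values("Name")

# ascending=False reverses the order
descending_names = players.sort_values("Name", ascending=False)

print(ascending_names)
print(descending_names)

# The index labels travel with their rows, so they are no longer in order
print(ascending_names.index.tolist())
```

The shuffled index labels are what make it possible to restore the original order later with sort_index.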

<a class="anchor" id="1b"></a>
## <span style="color:red"><b><i>b. Sorting by Multiple Columns</i></b></span>

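- We can sort a DataFrame by multiple columns by passing a list to the sort_values method’s <b>by</b> parameter. 
- Pandas sorts by the columns in the order in which they appear in the list: the first column determines the main order, and each later column breaks ties among rows that share the same values in the earlier columns. 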
- The next example sorts the nba DataFrame first by the Team column and then by the Name column. 
- Pandas defaults to ascending sorts for all columns.

In [None]:
nba.sort_values(by = ["Team", "Name"])

- Here’s how to read the output: the Atlanta Hawks are the first team in the data set when we sort teams in alphabetical order. Within the Atlanta Hawks, Alex Len’s name comes first, followed by Allen Crabbe and Brandon Goodwin. Pandas repeats this sorting logic for the remaining teams and names.
- We can pass a single Boolean to the ascending parameter to apply the same sort order to each column. 
- The next example passes False, so pandas first sorts the Team column in descending order and then the Name column in descending order.

In [None]:
nba.sort_values(["Team", "Name"], ascending = False)

- What if we want to sort each column in a different order? For example, we might want to sort the teams in ascending order and then sort the salaries within those teams in descending order. 
- To accomplish this task, we can pass the ascending parameter a list of Boolean values. The lists passed to the by and ascending parameters must be equal in length. 
- Pandas will use shared index positions between the two lists to match each column with its associated sort order. 
- In the next example, the Team column occupies index position 0 in the by list; pandas matches it with the True at index position 0 in the ascending list, so it sorts the column in ascending order. Pandas applies the same logic to the Salary column and sorts it in descending order.

In [None]:
nba.sort_values(by = ["Team", "Salary"], ascending = [True, False])
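To make the pairing between the by and ascending lists easier to follow, here is a minimal sketch on a made-up four-row DataFrame (the player labels, teams, and salaries are hypothetical). It also checks what happens when the two lists differ in length; in the pandas versions I am aware of, that raises a ValueError, but treat that detail as an assumption to verify against your installed version.

```python
import pandas as pd

# Hypothetical toy data: two teams, two players each
teams = pd.DataFrame({
    "Name": ["P1", "P2", "P3", "P4"],
    "Team": ["Hawks", "Celtics", "Hawks", "Celtics"],
    "Salary": [100, 400, 300, 200],
})

# Team ascending (position 0), Salary descending (position 1)
mixed = teams.sort_values(by=["Team", "Salary"], ascending=[True, False])
print(mixed)

# The two lists must be the same length; a mismatch is expected to fail
try:
    teams.sort_values(by=["Team", "Salary"], ascending=[True, False, True])
    length_mismatch_rejected = False
except ValueError:
    length_mismatch_rejected = True

print(length_mismatch_rejected)
```

Within each team, the highest salary appears first, while the teams themselves stay in alphabetical order.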

- The data looks good, so let’s make our sort permanent. The sort_values method supports the inplace parameter, but we’ll be explicit and reassign the returned DataFrame to the nba variable.

In [None]:
nba = nba.sort_values(by = ["Team", "Salary"], ascending = [True, False])
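The difference between reassigning the result and using the inplace parameter can be seen on a tiny example. This is a sketch with hypothetical data: sort_values returns a new DataFrame and leaves the original untouched, whereas inplace = True modifies the original and returns None.

```python
import pandas as pd

# Hypothetical two-row DataFrame
scores = pd.DataFrame({"Name": ["B", "A"], "Salary": [1, 2]})

# Without inplace, the original keeps its order and a sorted copy is returned
sorted_copy = scores.sort_values("Name")
order_before = scores["Name"].tolist()

# With inplace=True, the original is reordered and the call returns None
result = scores.sort_values("Name", inplace=True)
order_after = scores["Name"].tolist()

print(order_before, order_after, result)
```

Because the inplace call returns None, writing `scores = scores.sort_values("Name", inplace = True)` would replace the DataFrame with None, which is one reason reassignment is the more explicit choice.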

<a class="anchor" id="2"></a>
# <span style="color:blue"><b>2. Sorting by Index</b></span>

- With our permanent sort, our DataFrame is in a different order from when we started. 

In [None]:
nba.head()

- How can we return it to its original form?

<a class="anchor" id="2a"></a>
## <span style="color:red"><b><i>a. Sorting by row index</i></b></span>

- Our nba DataFrame still has its numeric index, and each row kept its original index label through the sort. If we could sort the data set by index labels rather than by column values, we could return it to its original order. The sort_index method does just that.

In [None]:
# The two lines below are equivalent

nba.sort_index().head()

nba.sort_index(ascending = True).head()

- We can also reverse the sort order by passing False to the method’s ascending parameter. The next example shows the greatest index positions first.

In [None]:
nba.sort_index(ascending = False).head()

- Sorting the index in ascending order brings us back where we started, with the rows in their original index order. 
- Let’s assign this DataFrame back to the nba variable:

In [None]:
nba = nba.sort_index()

In [None]:
nba.head()
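The round trip from sort_values back to the original order can be verified on a small example. The sketch below uses hypothetical data and assumes the default numeric index created by pandas, which is what makes the original order recoverable.

```python
import pandas as pd

# Hypothetical toy data with the default 0, 1, 2 index
original = pd.DataFrame({
    "Team": ["Nets", "Hawks", "Bulls"],
    "Salary": [10, 30, 20],
})

# Sorting by a column reorders the rows; index labels move with them
shuffled = original.sort_values("Team")
print(shuffled.index.tolist())

# Sorting by index puts the rows back in their starting order
restored = shuffled.sort_index()
print(restored.equals(original))
```

If the index had been replaced or reset before sorting, sort_index would have no record of the starting order to fall back on.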

- Next up, let’s explore how we can sort our nba data on its other axis.

<a class="anchor" id="2b"></a>
## <span style="color:red"><b><i>b. Sorting by column index</i></b></span>

- A DataFrame is a two-dimensional data structure, so we can also sort along its other axis: the column axis, whose labels run horizontally across the top.
- To sort the DataFrame columns in order, we’ll again rely on the sort_index method. 
- This time, however, we’ll need to add an axis parameter and pass it an argument of "columns" or 1. 
- The next example sorts the columns in ascending order.

In [None]:
# The two lines below are equivalent

nba.sort_index(axis = "columns").head()

nba.sort_index(axis = 1).head()

- How about sorting the columns in reverse alphabetical order? 
- That task is a simple one: we can pass the ascending parameter an argument of False. 
- The next example invokes the sort_index method, targets the columns with the axis parameter, and sorts in descending order with the ascending parameter.

In [None]:
nba.sort_index(axis = "columns", ascending = False).head()
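As a quick check of what sorting on the column axis does, here is a sketch with a hypothetical one-row DataFrame. Only the order of the column labels changes; the values stay attached to their columns.

```python
import pandas as pd

# Hypothetical one-row DataFrame with columns in an arbitrary order
row = pd.DataFrame({
    "Team": ["Hawks"],
    "Name": ["Ann"],
    "Salary": [100],
    "Position": ["C"],
})

# Sort the column labels alphabetically, then in reverse
columns_ascending = row.sort_index(axis="columns")
columns_descending = row.sort_index(axis=1, ascending=False)

print(columns_ascending.columns.tolist())
print(columns_descending.columns.tolist())
```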

- Let’s take a second to reflect on the power of pandas. With two methods and a few parameters, we were able to sort the DataFrame on both axes, by one column, by multiple columns, in ascending order, in descending order, or in multiple orders.
- Pandas is remarkably flexible. We only have to combine the right method with the right arguments.

<a class="anchor" id="3"></a>
# <span style="color:blue"><b>3. Setting a New Index</b></span>

- At its core, our data set is a collection of players. Therefore, it seems fitting to use the Name column’s values as the DataFrame’s index labels. 
- Name also has the benefit of being the only column with unique values.
- The set_index method returns a new DataFrame with a given column set as the index. 
- Its first parameter, keys, accepts the column name as a string.

In [None]:
# The two lines below are equivalent

nba.set_index(keys = "Name")

nba.set_index("Name")

- Let’s overwrite our nba variable.

In [None]:
nba = nba.set_index(keys = "Name")
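The effect of set_index is easiest to see on a small table. The sketch below uses hypothetical data; it first checks that the candidate column is unique with the is_unique attribute, then shows that the column moves out of the regular columns and into the index, where its values can be used as row labels.

```python
import pandas as pd

# Hypothetical two-row roster
roster = pd.DataFrame({
    "Name": ["Ann", "Ben"],
    "Team": ["Hawks", "Nets"],
    "Salary": [100, 200],
})

# A unique column makes an unambiguous index
names_are_unique = roster["Name"].is_unique

# set_index returns a new DataFrame; roster itself is unchanged
by_name = roster.set_index("Name")

print(names_are_unique)
print(by_name.columns.tolist())

# Rows can now be looked up by player name
print(by_name.loc["Ben", "Team"])
```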

- As a side note, we can set the index when importing a data set. 
- Pass the column name as a string to the read_csv function’s index_col parameter. 
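- The following code produces a DataFrame with the same Name index. It also passes parse_dates so that pandas reads the Birthday column as datetimes, which our earlier read_csv call did not do.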

In [None]:
nba = pd.read_csv("nba.csv", parse_dates = ["Birthday"], index_col = "Name")

In [None]:
nba.head()
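To confirm that setting the index at import time matches calling set_index afterward, the sketch below reads the same CSV text both ways. The inline CSV is hypothetical and only mimics the layout assumed for nba.csv; io.StringIO lets read_csv treat a string as a file.

```python
import io
import pandas as pd

# Hypothetical CSV text with the column layout assumed for nba.csv
csv_text = (
    "Name,Team,Birthday,Salary\n"
    "Ann,Hawks,1995-01-15,100\n"
    "Ben,Nets,1990-07-04,200\n"
)

# Option 1: set the index while importing
at_import = pd.read_csv(
    io.StringIO(csv_text), parse_dates=["Birthday"], index_col="Name"
)

# Option 2: import first, then set the index
after_import = pd.read_csv(
    io.StringIO(csv_text), parse_dates=["Birthday"]
).set_index("Name")

print(at_import.equals(after_import))
print(at_import.dtypes)
```

Both routes give the same result here, so index_col is mainly a convenience that saves a step.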