# Data Manipulation with Pandas 2

## Sorting 

We can sort the data in our DataFrame according to different criteria. For example, we can sort by index using the `sort_index` method.

In [None]:
# Don't forget to import the appropriate packages when you start a new notebook
import pandas as pd
import numpy as np

In [None]:
df = pd.DataFrame(np.arange(10).reshape((2,5)),
                  index = ['two','one'],
                  columns = ['b','c','a','e','d'])
df

In [None]:
df.sort_index() #sort by row index (default option)

In [None]:
df.sort_index(axis=1) #sort by column index

Data is by default sorted in ascending order; however, we can sort in descending order as follows:

In [None]:
df.sort_index(axis=1, ascending=False) #sort columns in descending order
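One detail worth checking is that `sort_index` returns a new, sorted DataFrame and leaves the original untouched. The sketch below rebuilds the same data under the assumed name `frame`, so it can run on its own:

```python
import numpy as np
import pandas as pd

# Same data as above, rebuilt under a hypothetical name so the cell is self-contained
frame = pd.DataFrame(np.arange(10).reshape((2, 5)),
                     index=['two', 'one'],
                     columns=['b', 'c', 'a', 'e', 'd'])

by_rows = frame.sort_index()                              # rows sorted by label
by_cols_desc = frame.sort_index(axis=1, ascending=False)  # columns sorted in reverse

print(list(by_rows.index))
print(list(by_cols_desc.columns))
print(list(frame.index))  # the original DataFrame keeps its order
```

To keep the sorted result, assign it to a variable (for example, `frame = frame.sort_index()`).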

If we want to sort a DataFrame by the values in a column instead of by its index, we can use the `sort_values` method:

In [None]:
df.sort_values(by='a', ascending=False) #sort dataframe according to values in column 'a' in descending order
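`sort_values` also accepts a list of columns, which is handy when the first column contains ties. The following is a minimal sketch; the DataFrame `scores` and its values are made up for illustration:

```python
import pandas as pd

# Hypothetical data, chosen so that column 'a' contains ties
scores = pd.DataFrame({'a': [2, 1, 2, 1], 'b': [10, 40, 30, 20]},
                      index=['w', 'x', 'y', 'z'])

# Sort by 'a' in ascending order, then break ties by 'b' in descending order
sorted_scores = scores.sort_values(by=['a', 'b'], ascending=[True, False])
print(sorted_scores)
```

The `ascending` list is matched position by position with the `by` list, so each column can have its own sort direction.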

## Computing Descriptive Statistics

In [None]:
#Creating a new dataframe

df2 = pd.DataFrame([[3.1, np.nan],[3.1, -5.0],[np.nan, np.nan],[0.2, -1.3]],
                 index = ['a','b','c','d'],
                 columns = ['One','Two'])
df2

In our Pandas Overview, we already learned how to get the sum and mean of the values in our DataFrame. Let's see what happens when we have NaN values. 

In [None]:
df2.sum()

In [None]:
df2.mean()

By default, NaN values are excluded from our calculations. If we don't want this to be the case, we can pass `skipna=False`, so that any row or column containing a NaN returns NaN:

In [None]:
df2.mean(axis = 'columns', skipna = False) #Calculate the mean across columns, without skipping the NaN values
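To compare both behaviours side by side, here is a small sketch. It rebuilds the same values under the assumed name `frame` and also shows `min_count`, an option of `sum` that is not covered above:

```python
import numpy as np
import pandas as pd

# Same values as df2, rebuilt under a hypothetical name so the cell is self-contained
frame = pd.DataFrame({'One': [3.1, 3.1, np.nan, 0.2],
                      'Two': [np.nan, -5.0, np.nan, -1.3]},
                     index=['a', 'b', 'c', 'd'])

row_mean_default = frame.mean(axis='columns')               # NaN values are skipped
row_mean_strict = frame.mean(axis='columns', skipna=False)  # a NaN in the row gives NaN

row_sum_default = frame.sum(axis='columns')                 # the all-NaN row 'c' sums to 0.0
row_sum_min1 = frame.sum(axis='columns', min_count=1)       # require at least one valid value

print(pd.DataFrame({'mean': row_mean_default,
                    'mean_strict': row_mean_strict,
                    'sum': row_sum_default,
                    'sum_min1': row_sum_min1}))
```

Note that the default sum of an all-NaN row is 0.0, while its mean is NaN; `min_count=1` makes the sum return NaN in that case as well.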

If we want to find the index labels where the maximum or minimum values are located, we can use `idxmax` and `idxmin`.

In [None]:
df2.idxmax() #Returns, for each column, the index label of its maximum value

In [None]:
df2.idxmin() #Returns, for each column, the index label of its minimum value
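A common use of these labels is to pull out the whole row in which the extreme value occurs. The sketch below uses a made-up DataFrame `temps` (the names and values are illustrative only):

```python
import pandas as pd

# Hypothetical daily temperatures
temps = pd.DataFrame({'high': [21.0, 25.5, 19.0],
                      'low': [11.0, 14.5, 9.0]},
                     index=['mon', 'tue', 'wed'])

hottest_day = temps['high'].idxmax()   # index label of the largest 'high'
coldest_day = temps['low'].idxmin()    # index label of the smallest 'low'
hottest_row = temps.loc[hottest_day]   # use the label to select the full row

print(hottest_day, coldest_day)
print(hottest_row)
```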

The `describe` method gives a quick overview of the summary statistics of our DataFrame:

In [None]:
df2.describe()

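When we work with non-numeric data, `describe` returns a different set of summary statistics: `count`, `unique` (the number of distinct values), `top` (the most frequent value) and `freq` (how often the top value appears).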

In [None]:
df2['Three'] = ['a','c','d','a'] #add a column with non-numeric data to our DataFrame
df2['Three'].describe() #Obtain summary statistics of column 'Three'
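When a DataFrame mixes numeric and non-numeric columns, calling `describe()` with no arguments reports only the numeric ones. Passing `include='all'` asks for every column. The sketch below uses an assumed DataFrame `mixed` that mirrors two of the columns above:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one numeric and one non-numeric column
mixed = pd.DataFrame({'One': [3.1, 3.1, np.nan, 0.2],
                      'Three': ['a', 'c', 'd', 'a']},
                     index=['a', 'b', 'c', 'd'])

numeric_only = mixed.describe()           # only column 'One' is summarized
summary = mixed.describe(include='all')   # both columns are summarized

print(summary)
```

Statistics that do not apply to a column (for example, the mean of `'Three'`) appear as NaN in the combined table.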

## Unique values

When we work with data sets, sometimes we need to obtain all the unique values in a Series or in a given column of a DataFrame. The `unique()` method returns them as an array, in order of first appearance.

In [None]:
df2 #Printing the DataFrame again just as a reminder

In [None]:
df2['One'].unique() #Returns an array of the unique values in column 'One' of df2
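Note that `unique()` treats NaN as a value of its own, whereas the related `nunique()` method, which counts distinct values, ignores NaN unless told otherwise. A short sketch, using an assumed Series `one` with the same values as column `'One'`:

```python
import numpy as np
import pandas as pd

# Same values as column 'One', rebuilt so the cell is self-contained
one = pd.Series([3.1, 3.1, np.nan, 0.2])

values = one.unique()                    # distinct values, NaN included
n_distinct = one.nunique()               # number of distinct values, NaN excluded
n_with_nan = one.nunique(dropna=False)   # number of distinct values, NaN included

print(values, n_distinct, n_with_nan)
```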

The `value_counts()` method allows us to find the number of times each value appears in a Series or in a column of a DataFrame. NaN values are not counted by default.

In [None]:
df2['One'].value_counts()

In [None]:
# By default the counts are sorted in descending order; sort = False skips that sorting.
# Recent pandas releases deprecate the top-level pd.value_counts, so we call the method on the Series.
df2['One'].value_counts(sort = False)
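Two other options of `value_counts()` are often useful: `dropna=False` to count NaN values as well, and `normalize=True` to get relative frequencies instead of counts. A minimal sketch, again with an assumed Series `one` holding the values of column `'One'`:

```python
import numpy as np
import pandas as pd

# Same values as column 'One', rebuilt so the cell is self-contained
one = pd.Series([3.1, 3.1, np.nan, 0.2])

counts = one.value_counts()                       # NaN is left out
counts_with_nan = one.value_counts(dropna=False)  # NaN gets its own entry
shares = one.value_counts(normalize=True)         # fractions of the non-NaN values

print(counts)
print(counts_with_nan)
print(shares)
```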