# Productivity with pandas
![](http://pandas.pydata.org/_static/pandas_logo.png)

**ToC**
 - [unique - find unique rows](#unique---find-unique-rows)
   - [nunique - find number of unique rows](#nunique---find-number-of-unique-rows)
   - [value_counts - find unique values and number of occurrences](#value_counts---find-unique-values-and-number-of-occurrences)
 - [apply - batch process column values](#apply---batch-process-column-values)
 - [sort_values - sorting rows](#sort_values---sorting-rows)
 - [isnull - finding null values throughout the DataFrame](#isnull---finding-null-values-throughout-the-DataFrame)

## unique - find unique rows
`unique()` returns the distinct values of a column (a Series), in order of first appearance.

In [1]:
import pandas as pd

In [2]:
comp_data = {'Company':['GOOG','GOOG','MSFT','MSFT','FB','FB'],
       'Person':['Sam','Charlie','Amy','Vanessa','Carl','Sarah'],
       'Sales':[200,120,340,124,243,350]}

comp_df = pd.DataFrame(comp_data)
comp_df

,Company,Person,Sales
0,GOOG,Sam,200
1,GOOG,Charlie,120
2,MSFT,Amy,340
3,MSFT,Vanessa,124
4,FB,Carl,243
5,FB,Sarah,350


In [3]:
# find unique company names
comp_df['Company'].unique()

array(['GOOG', 'MSFT', 'FB'], dtype=object)

### nunique - find number of unique rows
A convenient shortcut for computing the unique array and taking its length. By default, `nunique()` excludes missing values (`dropna=True`).

In [4]:
comp_df['Company'].nunique()

3
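
The two approaches can disagree when the column contains missing values. The following is a minimal sketch using a hypothetical Series (not part of the dataset above) with one `NaN` in it:

```python
import numpy as np
import pandas as pd

# Hypothetical column with one missing value, for illustration only.
companies = pd.Series(['GOOG', 'GOOG', 'MSFT', np.nan, 'FB'])

n_with_nan = len(companies.unique())          # unique() keeps NaN as a value
n_default = companies.nunique()               # NaN is excluded (dropna=True)
n_keep_nan = companies.nunique(dropna=False)  # NaN counted as its own value

print(n_with_nan, n_default, n_keep_nan)
```

Pass `dropna=False` if missing values should be counted as a distinct value.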

### value_counts - find unique values and number of occurrences

In [5]:
comp_df['Company'].value_counts()

GOOG    2
FB      2
MSFT    2
Name: Company, dtype: int64
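
If relative frequencies are more useful than raw counts, `value_counts()` accepts `normalize=True`. A small sketch, rebuilding the `Company` column as a standalone Series so the block runs on its own:

```python
import pandas as pd

# Same company labels as in comp_df, rebuilt here so the example is self-contained.
companies = pd.Series(['GOOG', 'GOOG', 'MSFT', 'MSFT', 'FB', 'FB'], name='Company')

counts = companies.value_counts()                # absolute number of occurrences
shares = companies.value_counts(normalize=True)  # fractions that sum to 1

print(shares)
```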

## apply - batch process column values
Calling `apply()` on a column is similar to calling Python's built-in `map()`: it applies a function to every value in the selected column. For instance, to compute the squared sales:

In [6]:
comp_df['sq_sales'] = comp_df['Sales'].apply(lambda x: x * x)
comp_df

,Company,Person,Sales,sq_sales
0,GOOG,Sam,200,40000
1,GOOG,Charlie,120,14400
2,MSFT,Amy,340,115600
3,MSFT,Vanessa,124,15376
4,FB,Carl,243,59049
5,FB,Sarah,350,122500
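
For simple arithmetic like squaring, the same result can usually be obtained without `apply()` by operating on the whole column at once. The sketch below compares the three spellings on a standalone copy of the `Sales` values; which one to prefer is a judgment call, though vectorized arithmetic is generally the faster option on large columns.

```python
import pandas as pd

# Standalone copy of the Sales column so the example runs on its own.
sales = pd.Series([200, 120, 340, 124, 243, 350], name='Sales')

sq_apply = sales.apply(lambda x: x * x)  # calls the lambda once per value
sq_map = sales.map(lambda x: x * x)      # Series.map gives the same result here
sq_vec = sales ** 2                      # vectorized arithmetic on the whole column

print(sq_apply.tolist() == sq_map.tolist() == sq_vec.tolist())
```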


We can also define a named function and pass it to `apply()`. When called on the DataFrame with `axis=1`, the function receives one row at a time, so it can combine the values of several columns into a new column.

In [9]:
def cuber(row):
    return row['Sales'] * row['sq_sales']

comp_df['cu_sales'] = comp_df.apply(cuber, axis=1)
# note: the function object itself is passed (cuber, not cuber())
# note: axis must be set to 1 so the function receives rows; the default, 0, passes columns
comp_df

,Company,Person,Sales,sq_sales,cu_sales
0,GOOG,Sam,200,40000,8000000
1,GOOG,Charlie,120,14400,1728000
2,MSFT,Amy,340,115600,39304000
3,MSFT,Vanessa,124,15376,1906624
4,FB,Carl,243,59049,14348907
5,FB,Sarah,350,122500,42875000
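
Row-wise `apply()` is flexible, but when the function only multiplies columns together, column arithmetic gives the same numbers. A minimal sketch on a three-row excerpt of the data (the names `df`, `cu_apply` and `cu_vec` are introduced here for illustration):

```python
import pandas as pd

# Three-row excerpt of comp_df, rebuilt so the example is self-contained.
df = pd.DataFrame({'Sales': [200, 120, 340],
                   'sq_sales': [40000, 14400, 115600]})

def cuber(row):
    # With axis=1, `row` is a Series holding one row, indexed by column name.
    return row['Sales'] * row['sq_sales']

cu_apply = df.apply(cuber, axis=1)     # one Python call per row
cu_vec = df['Sales'] * df['sq_sales']  # element-wise product of two columns

print(cu_apply.tolist() == cu_vec.tolist())
```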


## sort_values - sorting rows

In [10]:
comp_df.sort_values('Sales')

,Company,Person,Sales,sq_sales,cu_sales
1,GOOG,Charlie,120,14400,1728000
3,MSFT,Vanessa,124,15376,1906624
0,GOOG,Sam,200,40000,8000000
4,FB,Carl,243,59049,14348907
2,MSFT,Amy,340,115600,39304000
5,FB,Sarah,350,122500,42875000


Note that each row keeps its original index label after sorting. `sort_values()` returns a new, sorted DataFrame and leaves `comp_df` itself unchanged.
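
If a fresh 0-based index is wanted after sorting, one option is to chain `reset_index(drop=True)`. A small sketch on a three-row excerpt of the data (the names `df`, `sorted_df` and `renumbered` are illustrative):

```python
import pandas as pd

# Three-row excerpt of comp_df, rebuilt so the example is self-contained.
df = pd.DataFrame({'Person': ['Sam', 'Charlie', 'Amy'],
                   'Sales': [200, 120, 340]})

sorted_df = df.sort_values('Sales')            # rows keep their original labels
renumbered = sorted_df.reset_index(drop=True)  # labels replaced by 0, 1, 2

print(sorted_df.index.tolist())
print(renumbered.index.tolist())
```

`drop=True` discards the old labels instead of keeping them as an extra column.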

In [12]:
# sort by multiple columns: first by Company, then by Sales within each company
comp_df.sort_values(['Company','Sales'])

,Company,Person,Sales,sq_sales,cu_sales
4,FB,Carl,243,59049,14348907
5,FB,Sarah,350,122500,42875000
1,GOOG,Charlie,120,14400,1728000
0,GOOG,Sam,200,40000,8000000
3,MSFT,Vanessa,124,15376,1906624
2,MSFT,Amy,340,115600,39304000
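
The sort direction can also be set per column by passing a list to `ascending`. The sketch below, on a self-contained copy of the data, sorts companies alphabetically and sales from highest to lowest within each company:

```python
import pandas as pd

# Copy of the dataset used above, rebuilt so the example runs on its own.
comp_data = {'Company': ['GOOG', 'GOOG', 'MSFT', 'MSFT', 'FB', 'FB'],
             'Person': ['Sam', 'Charlie', 'Amy', 'Vanessa', 'Carl', 'Sarah'],
             'Sales': [200, 120, 340, 124, 243, 350]}
df = pd.DataFrame(comp_data)

# One flag per sort column: Company ascending, Sales descending.
result = df.sort_values(['Company', 'Sales'], ascending=[True, False])

print(result)
```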


## isnull - finding null values throughout the DataFrame

In [13]:
comp_df.isnull()

,Company,Person,Sales,sq_sales,cu_sales
0,False,False,False,False,False
1,False,False,False,False,False
2,False,False,False,False,False
3,False,False,False,False,False
4,False,False,False,False,False
5,False,False,False,False,False
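
Since `comp_df` has no missing values, every cell above is `False`. To show what the result looks like when nulls are present, here is a sketch on a hypothetical DataFrame with two gaps; summing the boolean mask gives a per-column null count, and `any(axis=1)` selects the affected rows.

```python
import numpy as np
import pandas as pd

# Hypothetical data with missing values, for illustration only.
df = pd.DataFrame({'Company': ['GOOG', None, 'FB'],
                   'Sales': [200.0, 120.0, np.nan]})

mask = df.isnull()                       # True wherever a value is missing
nulls_per_column = mask.sum()            # True counts as 1 when summed
rows_with_nulls = df[mask.any(axis=1)]   # rows containing at least one null

print(nulls_per_column)
print(rows_with_nulls)
```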
