# Operations

Many pandas operations are really useful but don't fall into any distinct category. Let's go through them in this lecture:

In [None]:
import numpy as np
import pandas as pd

In [None]:
# The HTML cell that follows adds gridlines to rendered DataFrame tables

In [None]:
%%HTML
<style type="text/css">
table.dataframe td, table.dataframe th {
    border: 1px black solid !important;
    color: black !important;
}
</style>

In [None]:
df = pd.DataFrame({'col1':[1,2,3,4],'col2':[444,555,666,444],'col3':['abc','def','ghi','xyz']})

In [None]:
df.head()

### Info on Unique Values

In [None]:
# There are three main methods for finding unique values in a DataFrame column.
df['col2'].unique() # Returns a NumPy array of all the unique values in col2.

In [None]:
# If you only need the number of unique values rather than the array itself, either:
   # (i)  take the length of the array returned by unique(), or
   # (ii) use the built-in pandas method nunique().

In [None]:
len(df['col2'].unique())

In [None]:
df['col2'].nunique()

In [None]:
# For a table of the unique values and how many times each one shows up, use value_counts().
df['col2'].value_counts() # Returns how many times each value occurred in the column.
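
The three methods above are related, and a small self-contained sketch can make that concrete. It rebuilds the same `df` so it runs on its own; the variable names `unique_values`, `n_unique` and `counts` are just illustrative.

```python
import pandas as pd

# Same frame as in the cells above, rebuilt so this cell is self-contained.
df = pd.DataFrame({'col1': [1, 2, 3, 4],
                   'col2': [444, 555, 666, 444],
                   'col3': ['abc', 'def', 'ghi', 'xyz']})

unique_values = df['col2'].unique()   # NumPy array of the distinct values
n_unique = df['col2'].nunique()       # how many distinct values there are
counts = df['col2'].value_counts()    # Series mapping value -> number of occurrences

# With no missing values in the column, the counts add up to the number
# of rows, and there is one count per distinct value.
print(unique_values.tolist())
print(n_unique, len(counts), counts.sum())
```

Note that `nunique()` ignores missing values by default, so on a column containing NaN it can differ from `len(unique())`.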

### Selecting Data

In [None]:
# Return the rows of df where col1 > 2
df[df['col1']>2]

In [None]:
# Wrap each condition in parentheses to combine them.
df[(df['col1']>2) & (df['col2']>443)] # Gives the rows where col1 > 2 and col2 > 443 simultaneously.
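
The parentheses matter because `&` binds more tightly than `>` in Python. As a minimal sketch of the other common combinations, the cell below uses `|` for "or" and `~` for "not", and shows that the plain `and` keyword fails on a pair of boolean Series. The names `both`, `either`, `neither` and `and_failed` are illustrative only.

```python
import pandas as pd

# Same frame as in the cells above, rebuilt so this cell is self-contained.
df = pd.DataFrame({'col1': [1, 2, 3, 4],
                   'col2': [444, 555, 666, 444],
                   'col3': ['abc', 'def', 'ghi', 'xyz']})

both = df[(df['col1'] > 2) & (df['col2'] > 443)]          # condition A AND condition B
either = df[(df['col1'] > 2) | (df['col2'] == 444)]       # condition A OR condition B
neither = df[~((df['col1'] > 2) | (df['col2'] == 444))]   # NOT (A OR B)

# Python's `and` asks for a single True/False from each Series,
# which pandas refuses to guess, so it raises a ValueError.
try:
    df[(df['col1'] > 2) and (df['col2'] > 443)]
    and_failed = False
except ValueError:
    and_failed = True

print(both.index.tolist(), either.index.tolist(), neither.index.tolist(), and_failed)
```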

### Applying Functions

In [None]:
def times2(x):
    return 2*x

In [None]:
df['col1'].sum() # Returns sum of the column

In [None]:
# What if you need to apply your own custom function, such as times2?
# Pass the function itself (without calling it) to apply() on the column.
df['col1'].apply(times2) # Calls times2 on every element in the column

In [None]:
# Say we want the length of each string in col3.
df['col3'].apply(len) # Calls len on every element of col3; each string here has length 3.

In [None]:
#  To get a list of column names
df.columns

In [None]:
# apply() is even more powerful when you use it with lambda expressions.
# A lambda also saves you from defining an entire named function that you only use once.
df['col2'].apply(lambda x: x*2)
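
For simple arithmetic like doubling, `apply` is not the only route: pandas can operate on the whole column at once, and string columns have a `.str` accessor. The sketch below compares both forms on the same data; it is meant as a side-by-side illustration rather than a rule about which one to use, and the variable names are illustrative.

```python
import pandas as pd

# Same frame as in the cells above, rebuilt so this cell is self-contained.
df = pd.DataFrame({'col1': [1, 2, 3, 4],
                   'col2': [444, 555, 666, 444],
                   'col3': ['abc', 'def', 'ghi', 'xyz']})

# Element-by-element with apply()...
doubled_apply = df['col2'].apply(lambda x: x * 2)
lengths_apply = df['col3'].apply(len)

# ...and the column-at-a-time equivalents.
doubled_vectorized = df['col2'] * 2
lengths_str = df['col3'].str.len()

print(doubled_apply.tolist() == doubled_vectorized.tolist())
print(lengths_apply.tolist() == lengths_str.tolist())
```

`apply` remains the natural choice when the function has no column-level equivalent.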

In [None]:
# To get info about the index. For the default RangeIndex this reports back start, stop and step.
df.index

In [None]:
# To drop a column, remember to set axis=1. drop() returns a new DataFrame; df itself is not changed.
df.drop('col1',axis=1)
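
A quick sketch of how `drop` behaves: it hands back a new DataFrame and leaves the original alone, and without `axis=1` it looks for a row label instead of a column. The variable names below are illustrative.

```python
import pandas as pd

# Same frame as in the cells above, rebuilt so this cell is self-contained.
df = pd.DataFrame({'col1': [1, 2, 3, 4],
                   'col2': [444, 555, 666, 444],
                   'col3': ['abc', 'def', 'ghi', 'xyz']})

without_col1 = df.drop('col1', axis=1)    # new DataFrame without col1
same_result = df.drop(columns='col1')     # equivalent keyword form
without_row0 = df.drop(0)                 # default axis=0: drops the row labelled 0

# The original df still has all three columns and all four rows.
print(list(df.columns), list(without_col1.columns), without_row0.index.tolist())
```

To keep the change, assign the result back, for example `df = df.drop('col1', axis=1)`.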

### Sorting and Ordering

In [None]:
df

In [None]:
# To sort by col2
df.sort_values('col2') # Pass in the column you want to sort by.
# df.sort_values(by='col2') # Alternate way of doing the same thing.
# Notice: compare df above with the output of df.sort_values(). Each index label stays attached to its row, so you never lose that information.
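
A few variations on `sort_values` that often come up, sketched on the same data: descending order, sorting by more than one column, and resetting the index afterwards when the original labels are no longer needed. The variable names are illustrative.

```python
import pandas as pd

# Same frame as in the cells above, rebuilt so this cell is self-contained.
df = pd.DataFrame({'col1': [1, 2, 3, 4],
                   'col2': [444, 555, 666, 444],
                   'col3': ['abc', 'def', 'ghi', 'xyz']})

descending = df.sort_values('col2', ascending=False)   # largest col2 first

# Sort by col2 ascending, and break ties on col1 in descending order.
multi = df.sort_values(['col2', 'col1'], ascending=[True, False])

# The old index labels travel with the rows; reset_index(drop=True)
# replaces them with a fresh 0..n-1 index.
renumbered = df.sort_values('col2').reset_index(drop=True)

print(descending['col2'].tolist())
print(multi.index.tolist())
print(renumbered.index.tolist())
```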

* A really useful method for finding **null** values in a DataFrame is isnull().

In [None]:
df.isnull() # Returns a dataframe of booleans indicating whether values are null or not.
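
Since `df` has no missing entries, the result above is all `False`. To see `isnull()` do something, here is a minimal sketch on a small hypothetical frame, `df_missing`, which is not part of the lecture data and exists only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a few missing entries (not the df used above).
df_missing = pd.DataFrame({'a': [1.0, np.nan, 3.0],
                           'b': ['x', None, None]})

mask = df_missing.isnull()                      # DataFrame of booleans, True where a value is missing
nulls_per_column = mask.sum()                   # True counts as 1, so this counts nulls per column
rows_with_null = df_missing[mask.any(axis=1)]   # rows that have at least one missing value

print(nulls_per_column.to_dict())
print(rows_with_null.index.tolist())
```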

### Pivot Table Method

In [None]:
data = {'A':['foo','foo','foo','bar','bar','bar'],
     'B':['one','one','two','two','one','one'],
       'C':['x','y','x','y','x','y'],
       'D':[1,3,2,5,4,1]}

df = pd.DataFrame(data)

In [None]:
df # Notice how we have repeating values in columns A, B and C.
# If you're not familiar with the pivot table feature of MS Excel, don't worry: all we are going to do is create a
# multi-index out of this DataFrame.

In [None]:
df.pivot_table(values='D', index=['A','B'], columns=['C'])

* pivot_table() takes three main arguments: values, index and columns.
* values='D' means the data points in the table are taken from column D.
* index=['A','B'] makes A and B the levels of a multi-level index.
* columns=['C'] means the columns of the table are defined by the values in column C.
* Combinations of A, B and C that never occur in the data show up as NaN, and repeated combinations are averaged by default.

This topic is included for completeness. If you don't get it yet, don't worry: you'll rarely be using pivot_table, and you won't have a problem following this series of notebooks without it.
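
To make the pivot table a little more concrete, the sketch below rebuilds the same data, looks up individual cells through the (A, B) multi-index, and tries two optional arguments, `aggfunc` and `fill_value`. The names `pivoted` and `filled` are illustrative.

```python
import pandas as pd

# Same data as in the cells above, rebuilt so this cell is self-contained.
data = {'A': ['foo', 'foo', 'foo', 'bar', 'bar', 'bar'],
        'B': ['one', 'one', 'two', 'two', 'one', 'one'],
        'C': ['x', 'y', 'x', 'y', 'x', 'y'],
        'D': [1, 3, 2, 5, 4, 1]}
df = pd.DataFrame(data)

pivoted = df.pivot_table(values='D', index=['A', 'B'], columns=['C'])

# Each cell is addressed by an (A, B) row label and a C column label.
bar_one_x = pivoted.loc[('bar', 'one'), 'x']    # from the row A=bar, B=one, C=x
bar_two_x = pivoted.loc[('bar', 'two'), 'x']    # no such row in df, so this is NaN

# aggfunc chooses how repeated combinations are combined (the default is the mean);
# fill_value replaces the NaN cells.
filled = df.pivot_table(values='D', index=['A', 'B'], columns=['C'],
                        aggfunc='sum', fill_value=0)

print(bar_one_x, pd.isna(bar_two_x), filled.loc[('bar', 'two'), 'x'])
```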

# Great Job!