# Operations

A review of some of the most commonly used pandas operations.

In [1]:
import pandas as pd
import numpy as np

In [2]:
df = pd.DataFrame({'col1': [1, 2, 3, 4], 'col2': [4444, 5555, 6666, 4444], 'col3': ['abc', 'def', 'ghi', 'xyz']})
df.head()

,col1,col2,col3
0,1,4444,abc
1,2,5555,def
2,3,6666,ghi
3,4,4444,xyz


### Finding unique values.

There are **three** useful methods for finding unique values in a data frame.

**unique() & nunique()**: `unique()` returns the distinct values in a column, and `nunique()` returns how many distinct values there are.

**value_counts()**: `value_counts()` returns the number of times each unique value occurs in a column.

In [8]:
df['col2'].unique()


array([4444, 5555, 6666], dtype=int64)

In [9]:
df['col2'].nunique()


3

In [10]:
df['col2'].value_counts()

4444    2
5555    1
6666    1
Name: col2, dtype: int64
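The three methods treat missing values differently, which is easy to overlook. The sketch below uses a hypothetical Series `s` (not part of the data frame above) that contains one `NaN`, to show how the results relate to each other.

```python
import numpy as np
import pandas as pd

# Hypothetical sample column with one repeated value and one missing value.
s = pd.Series([4444, 5555, 6666, 4444, np.nan], name='col2')

unique_vals = s.unique()                     # distinct values in order of appearance, NaN included
n_unique = s.nunique()                       # NaN is excluded by default
n_unique_with_nan = s.nunique(dropna=False)  # count NaN as a value too
counts = s.value_counts()                    # NaN excluded by default, most frequent first
shares = s.value_counts(normalize=True)      # relative frequencies instead of counts

print(len(unique_vals), n_unique, n_unique_with_nan)
print(counts)
print(shares)
```

Passing `normalize=True` is handy when the proportions matter more than the raw counts.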

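### Conditional selection and data selection.

Comparing a column to a value produces a boolean Series, and indexing the data frame with it keeps only the rows where the condition is `True`. To combine conditions, wrap each one in parentheses and join them with `&` (and) or `|` (or).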



In [12]:
df[df['col1']>2]

,col1,col2,col3
2,3,6666,ghi
3,4,4444,xyz


In [14]:
df[(df['col1']>2) & (df['col2'] == 6666)]

,col1,col2,col3
2,3,6666,ghi


### Apply Method.

The **apply method** is one of the most important methods in pandas.

We can use the `apply()` method to apply a function to every value in a column, as in the examples below.

`apply()` is especially powerful when combined with lambda expressions, which save time because we don't need to define a whole named function for a simple one-off expression.

In [15]:
def times2(x):
    return x*2

In [16]:
#Using apply with a function that we created
df['col1'].apply(times2)

0    2
1    4
2    6
3    8
Name: col1, dtype: int64

In [17]:
#Using the apply function with a built-in function.
df['col3'].apply(len)

0    3
1    3
2    3
3    3
Name: col3, dtype: int64

In [19]:
#Using the apply function with a lambda expression.
df['col2'].apply(lambda x:x*2)

0     8888
1    11110
2    13332
3     8888
Name: col2, dtype: int64
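`apply()` also works on a whole data frame: with `axis=1` each row is passed to the function as a Series. For simple arithmetic, a plain vectorized expression gives the same result as `apply()` and is usually the faster option. A short sketch on the same sample frame (`labels` and the `doubled_*` names are illustrative):

```python
import pandas as pd

# Same sample frame as at the top of this section.
df = pd.DataFrame({'col1': [1, 2, 3, 4],
                   'col2': [4444, 5555, 6666, 4444],
                   'col3': ['abc', 'def', 'ghi', 'xyz']})

# Row-wise apply: each row arrives as a Series indexed by column name.
labels = df.apply(lambda row: f"{row['col3']}-{row['col1']}", axis=1)

# Element-wise apply versus the equivalent vectorized expression.
doubled_apply = df['col2'].apply(lambda x: x * 2)
doubled_vectorized = df['col2'] * 2

print(labels.tolist())
print(doubled_apply.equals(doubled_vectorized))
```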

In [21]:
#You can use .columns and .index to get the column names and the row index.
df.columns


Index(['col1', 'col2', 'col3'], dtype='object')

In [22]:
df.index

RangeIndex(start=0, stop=4, step=1)

### Sorting and ordering a data frame.

Using the `sort_values()` method, we pass in the column to sort by and it returns a new data frame sorted by that column. Each row keeps its original index label, so the index appears out of order in the result.


In [23]:
df.sort_values('col2')

,col1,col2,col3
0,1,4444,abc
3,4,4444,xyz
1,2,5555,def
2,3,6666,ghi
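`sort_values()` also accepts a list of columns and a sort direction, and `reset_index(drop=True)` renumbers the rows when the original index labels are not needed. A minimal sketch on the same sample frame (`by_two` and `renumbered` are illustrative names):

```python
import pandas as pd

# Same sample frame as at the top of this section.
df = pd.DataFrame({'col1': [1, 2, 3, 4],
                   'col2': [4444, 5555, 6666, 4444],
                   'col3': ['abc', 'def', 'ghi', 'xyz']})

# Sort by col2 ascending, breaking ties by col1 descending.
by_two = df.sort_values(['col2', 'col1'], ascending=[True, False])

# Sort by col2 descending, then replace the index with 0..n-1.
renumbered = df.sort_values('col2', ascending=False).reset_index(drop=True)

print(by_two)
print(renumbered)
```

Neither call modifies `df`; each returns a new data frame.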


In [24]:
#Checking whether each value is null

df.isnull()

,col1,col2,col3
0,False,False,False
1,False,False,False
2,False,False,False
3,False,False,False
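Because the sample frame has no missing values, every entry above is `False`. The sketch below uses a hypothetical frame `df_missing` with two missing entries to show what `isnull()` looks like when there is something to find, and how to summarize the result.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with two missing entries, for illustration only.
df_missing = pd.DataFrame({'col1': [1, np.nan, 3],
                           'col3': ['abc', None, 'ghi']})

null_mask = df_missing.isnull()                  # True wherever a value is missing
null_counts = null_mask.sum()                    # number of missing values per column
rows_with_nulls = df_missing[null_mask.any(axis=1)]  # rows with at least one missing value

print(null_mask)
print(null_counts)
print(rows_with_nulls)
```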


### Pivot table

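The `pivot_table` method summarizes a data frame by grouping rows on one or more key columns and aggregating a values column, using the mean by default. Passing several columns as `index` produces a table with a multi-level index.
We can call `df.pivot_table` to build a pivot table from the data points below.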

In [25]:
data = {'A': ['foo', 'foo', 'foo','bar', 'bar', 'bar'], 
        'B': ['one', 'one', 'two', 'two','one', 'one'],
        'C':['x', 'y','x', 'y', 'x', 'y'],
        'D': [1,3,2,5,4,1]}

df = pd.DataFrame(data)

In [26]:
df

,A,B,C,D
0,foo,one,x,1
1,foo,one,y,3
2,foo,two,x,2
3,bar,two,y,5
4,bar,one,x,4
5,bar,one,y,1


In [27]:
df.pivot_table(values='D', index=['A', 'B'], columns=['C'])

A,B,x,y
bar,one,4.0,1.0
bar,two,NaN,5.0
foo,one,1.0,3.0
foo,two,2.0,NaN
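Combinations of `A`, `B` and `C` that never occur in the data have no value in the pivot table, and the values are floats because the default aggregation is the mean. A minimal sketch of two optional parameters, `aggfunc` and `fill_value`, that change this behavior (`totals` is an illustrative name):

```python
import pandas as pd

# Same data as in the pivot table example.
data = {'A': ['foo', 'foo', 'foo', 'bar', 'bar', 'bar'],
        'B': ['one', 'one', 'two', 'two', 'one', 'one'],
        'C': ['x', 'y', 'x', 'y', 'x', 'y'],
        'D': [1, 3, 2, 5, 4, 1]}
df = pd.DataFrame(data)

# Sum the D values instead of averaging, and show 0 for missing combinations.
totals = df.pivot_table(values='D', index=['A', 'B'], columns='C',
                        aggfunc='sum', fill_value=0)

print(totals)
print(totals.loc[('bar', 'two'), 'x'])
```

Individual cells are addressed with a tuple for the multi-level row index, as in the last line.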
