# Operations

Many useful pandas operations don't fall into any one category. This lecture walks through several of them: inspecting unique values, selecting rows on multiple conditions, applying functions, and sorting.

In [3]:
import pandas as pd
df = pd.DataFrame({'col1':[1,2,3,4],'col2':[444,555,666,444],'col3':['abc','def','ghi','xyz']})
df.head()

|   | col1 | col2 | col3 |
|---|------|------|------|
| 0 | 1 | 444 | abc |
| 1 | 2 | 555 | def |
| 2 | 3 | 666 | ghi |
| 3 | 4 | 444 | xyz |


### Info on Unique Values

In [4]:
df['col2'].unique()

array([444, 555, 666])

In [5]:
df['col2'].nunique()  # Number of distinct values in the column

3

In [6]:
df['col2'].value_counts()

444    2
555    1
666    1
Name: col2, dtype: int64

In [19]:
# value_counts() returns a pandas Series, so we can filter it by count
counts = df['col2'].value_counts()
dic = counts[counts > 1].to_dict()  # keep only values that occur more than once
dic.keys()

dict_keys([444])
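
A couple of related calls are worth knowing here. The sketch below rebuilds the same `df` so it runs on its own; the variable names `shares` and `repeated_rows` are made up for illustration. It shows `value_counts(normalize=True)` for relative frequencies and `duplicated(keep=False)` for pulling out the rows that share a repeated value.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 4],
                   'col2': [444, 555, 666, 444],
                   'col3': ['abc', 'def', 'ghi', 'xyz']})

# Fraction of rows holding each value, instead of raw counts
shares = df['col2'].value_counts(normalize=True)

# keep=False marks every occurrence of a repeated value, not just the later ones
repeated_rows = df[df['col2'].duplicated(keep=False)]

print(shares)
print(repeated_rows)
```

With this data, 444 accounts for half of the rows, and `repeated_rows` holds the two rows where `col2` is 444.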

### Selecting Data

In [20]:
# Select rows using conditions on multiple columns (each condition goes in parentheses)
newdf = df[(df['col1']>2) & (df['col2']==444)]

In [21]:
newdf

|   | col1 | col2 | col3 |
|---|------|------|------|
| 3 | 4 | 444 | xyz |
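
The selection above combines two conditions with `&`. As a minimal sketch on the same data (the names `either`, `not_444`, and `picked` are only illustrative), the other common ways to build a boolean mask are `|` for OR, `~` for NOT, and `isin` for membership. Each comparison needs its own parentheses because `&` and `|` bind more tightly than `>` or `==` in Python.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 4],
                   'col2': [444, 555, 666, 444],
                   'col3': ['abc', 'def', 'ghi', 'xyz']})

# | is element-wise OR: keep rows where at least one condition holds
either = df[(df['col1'] > 2) | (df['col2'] == 444)]

# ~ negates a boolean mask: keep rows where col2 is not 444
not_444 = df[~(df['col2'] == 444)]

# isin tests membership in a list of allowed values
picked = df[df['col3'].isin(['abc', 'ghi'])]

print(either)
print(not_444)
print(picked)
```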


### Applying Functions

In [22]:
def times2(x):
    return x*2

In [23]:
df['col1'].apply(times2)

0    2
1    4
2    6
3    8
Name: col1, dtype: int64

In [25]:
df['col1'].sum()

10

In [27]:
df['col3'].apply(len)

0    3
1    3
2    3
3    3
Name: col3, dtype: int64

In [28]:
df['col1'].apply(lambda x: x*x)

0     1
1     4
2     9
3    16
Name: col1, dtype: int64
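
For simple arithmetic and string lengths like the ones above, pandas also offers vectorized forms that give the same values without calling a Python function per element. This is a sketch on the same data, with illustrative variable names; `apply` remains the general tool when no built-in operation fits.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 4],
                   'col2': [444, 555, 666, 444],
                   'col3': ['abc', 'def', 'ghi', 'xyz']})

def times2(x):
    return x * 2

# Arithmetic on a Series is applied element by element
doubled = df['col1'] * 2
squared = df['col1'] ** 2

# The .str accessor exposes string methods for a whole column
lengths = df['col3'].str.len()

# Same values as the apply-based versions
print(doubled.tolist() == df['col1'].apply(times2).tolist())
print(squared.tolist() == df['col1'].apply(lambda x: x * x).tolist())
print(lengths.tolist() == df['col3'].apply(len).tolist())
```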

### Sorting and Ordering

In [29]:
df

|   | col1 | col2 | col3 |
|---|------|------|------|
| 0 | 1 | 444 | abc |
| 1 | 2 | 555 | def |
| 2 | 3 | 666 | ghi |
| 3 | 4 | 444 | xyz |


In [31]:
df.sort_values(by=['col2','col3'])

|   | col1 | col2 | col3 |
|---|------|------|------|
| 0 | 1 | 444 | abc |
| 3 | 4 | 444 | xyz |
| 1 | 2 | 555 | def |
| 2 | 3 | 666 | ghi |
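
`sort_values` sorts in ascending order by default and returns a new DataFrame, leaving `df` itself unchanged. The sketch below, with illustrative variable names, shows the `ascending` parameter for a descending sort and for a per-column sort direction, plus `reset_index(drop=True)` for renumbering the rows afterwards.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 4],
                   'col2': [444, 555, 666, 444],
                   'col3': ['abc', 'def', 'ghi', 'xyz']})

# Largest col2 values first
by_col2_desc = df.sort_values(by='col2', ascending=False)

# col2 ascending, ties broken by col3 descending
mixed = df.sort_values(by=['col2', 'col3'], ascending=[True, False])

# Discard the old row labels and number the sorted rows from 0
renumbered = mixed.reset_index(drop=True)

print(by_col2_desc)
print(mixed)
print(renumbered)
```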
