# Introduction to the Pandas library

![title](https://www.telegraph.co.uk/content/dam/news/2016/08/23/106598324PandawaveNEWS_trans_NvBQzQNjv4Bqeo_i_u9APj8RuoebjoAHt0k9u7HhRJvuo-ZLenGRumA.jpg?imwidth=400)

#### Pandas is one of the best things that happened to Python in recent times. It is due to Pandas that Python has become the language of choice for data scientists.
#### Using Pandas, we can handle tabular data in Python with ease.

#### The goal here is for you to learn enough Pandas to replace Excel or other spreadsheet software in your analysis routines.

In [1]:
import pandas

#### The `import` statement lets you import any Python package that is already installed.
#### Since we used Anaconda to install Python, Pandas was installed automatically along with many other useful packages/libraries.

#### In most scenarios your data would be present in a file (most likely and unfortunately in an Excel file).
#### However, for pedagogic reasons we will create a Pandas table (or DataFrame, as it's called) from a regular Python data structure.

In [2]:
my_friends = [
    {
        'name': 'Jon',
        'age': 23,
        'vocation': 'nightwatch'
    },
    {
        'name': 'Gendry',
        'age': 17,
        'vocation': 'blacksmith'
    },
    {
        'name': 'Arya',
        'age': 14,
        'vocation': 'assassin'
    },
    {
        'name': 'Tywin',
        'age': 62,
        'vocation': 'advisor'
    },
    {
        'name': 'Olena',
        'age': 62,
        'vocation': 'advisor'
    },
    {
        'name': 'Jaqen',
        'age': 35,
        'vocation': 'assassin'
    },
    {
        'name': 'Jon',
        'age': 28,
        'vocation': 'coder'
    },
]

In [3]:
print(my_friends)

[{'name': 'Jon', 'age': 23, 'vocation': 'nightwatch'}, {'name': 'Gendry', 'age': 17, 'vocation': 'blacksmith'}, {'name': 'Arya', 'age': 14, 'vocation': 'assassin'}, {'name': 'Tywin', 'age': 62, 'vocation': 'advisor'}, {'name': 'Olena', 'age': 62, 'vocation': 'advisor'}, {'name': 'Jaqen', 'age': 35, 'vocation': 'assassin'}, {'name': 'Jon', 'age': 28, 'vocation': 'coder'}]


In [4]:
my_friends

[{'name': 'Jon', 'age': 23, 'vocation': 'nightwatch'},
 {'name': 'Gendry', 'age': 17, 'vocation': 'blacksmith'},
 {'name': 'Arya', 'age': 14, 'vocation': 'assassin'},
 {'name': 'Tywin', 'age': 62, 'vocation': 'advisor'},
 {'name': 'Olena', 'age': 62, 'vocation': 'advisor'},
 {'name': 'Jaqen', 'age': 35, 'vocation': 'assassin'},
 {'name': 'Jon', 'age': 28, 'vocation': 'coder'}]

#### This is where Pandas' magic begins

In [5]:
friends_df = pandas.DataFrame(my_friends)

In [6]:
friends_df

Unnamed: 0,age,name,vocation
0,23,Jon,nightwatch
1,17,Gendry,blacksmith
2,14,Arya,assassin
3,62,Tywin,advisor
4,62,Olena,advisor
5,35,Jaqen,assassin
6,28,Jon,coder
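
#### As noted earlier, real data usually lives in a file. The sketch below shows one way to get the same kind of table from CSV text with `pandas.read_csv`. The CSV content is a made-up stand-in for a file on disk, and `io.StringIO` is used only so the example runs without any file.

```python
import io
import pandas

# Hypothetical CSV text standing in for a real file on disk
csv_text = """name,age,vocation
Jon,23,nightwatch
Gendry,17,blacksmith
Arya,14,assassin
"""

# read_csv accepts a file path or any file-like object
friends_from_csv = pandas.read_csv(io.StringIO(csv_text))
print(friends_from_csv)
```

#### With a real file you would pass its path instead of the `StringIO` object.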


In [7]:
friends_df.T   # T for transpose. It is a property of our DataFrame, so no parentheses are needed

Unnamed: 0,0,1,2,3,4,5,6
age,23,17,14,62,62,35,28
name,Jon,Gendry,Arya,Tywin,Olena,Jaqen,Jon
vocation,nightwatch,blacksmith,assassin,advisor,advisor,assassin,coder


#### We can get back different kinds of Python data structures from a DataFrame

In [8]:
friends_df.to_dict('records')

[{'age': 23, 'name': 'Jon', 'vocation': 'nightwatch'},
 {'age': 17, 'name': 'Gendry', 'vocation': 'blacksmith'},
 {'age': 14, 'name': 'Arya', 'vocation': 'assassin'},
 {'age': 62, 'name': 'Tywin', 'vocation': 'advisor'},
 {'age': 62, 'name': 'Olena', 'vocation': 'advisor'},
 {'age': 35, 'name': 'Jaqen', 'vocation': 'assassin'},
 {'age': 28, 'name': 'Jon', 'vocation': 'coder'}]

In [9]:
friends_df.to_dict('list')

{'age': [23, 17, 14, 62, 62, 35, 28],
 'name': ['Jon', 'Gendry', 'Arya', 'Tywin', 'Olena', 'Jaqen', 'Jon'],
 'vocation': ['nightwatch',
  'blacksmith',
  'assassin',
  'advisor',
  'advisor',
  'assassin',
  'coder']}
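
#### Both shapes shown above can also be fed back into the DataFrame constructor. The following is a minimal sketch of that round trip on a smaller, made-up table (the variable names are illustrative only).

```python
import pandas

friends = [
    {'name': 'Jon', 'age': 23, 'vocation': 'nightwatch'},
    {'name': 'Arya', 'age': 14, 'vocation': 'assassin'},
]
df = pandas.DataFrame(friends)

records = df.to_dict('records')   # list of dicts, one per row
columns = df.to_dict('list')      # dict of lists, one per column

# Either structure can be handed straight back to the constructor
from_records = pandas.DataFrame(records)
from_columns = pandas.DataFrame(columns)

# equals() compares values, labels and dtypes
print(from_records.equals(df), from_columns.equals(df))
```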

#### In most cases, running a method of the DataFrame gives back a new DataFrame and doesn't make changes in the DataFrame itself

In [10]:
friends_df.sort_values('age')

Unnamed: 0,age,name,vocation
2,14,Arya,assassin
1,17,Gendry,blacksmith
0,23,Jon,nightwatch
6,28,Jon,coder
5,35,Jaqen,assassin
3,62,Tywin,advisor
4,62,Olena,advisor


In [11]:
friends_df

Unnamed: 0,age,name,vocation
0,23,Jon,nightwatch
1,17,Gendry,blacksmith
2,14,Arya,assassin
3,62,Tywin,advisor
4,62,Olena,advisor
5,35,Jaqen,assassin
6,28,Jon,coder


In [12]:
sorted_friends = friends_df.sort_values('age')
sorted_friends

Unnamed: 0,age,name,vocation
2,14,Arya,assassin
1,17,Gendry,blacksmith
0,23,Jon,nightwatch
6,28,Jon,coder
5,35,Jaqen,assassin
3,62,Tywin,advisor
4,62,Olena,advisor


In [13]:
friends_df.sort_values('age', ascending=False)

Unnamed: 0,age,name,vocation
3,62,Tywin,advisor
4,62,Olena,advisor
5,35,Jaqen,assassin
6,28,Jon,coder
0,23,Jon,nightwatch
1,17,Gendry,blacksmith
2,14,Arya,assassin
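
#### To make the "returns a new DataFrame" behaviour concrete, here is a small sketch on a made-up three-row table. It also shows `reset_index(drop=True)`, which is one way to renumber the rows after sorting, since the original row labels otherwise travel with the rows.

```python
import pandas

df = pandas.DataFrame({'name': ['Jon', 'Gendry', 'Arya'], 'age': [23, 17, 14]})

df.sort_values('age')             # the result is discarded, df itself is unchanged
print(df['name'].tolist())        # still in the original order

df = df.sort_values('age')        # rebind the name to keep the sorted table
print(df['name'].tolist())

# Row labels still carry the old positions; renumber them from 0
df = df.reset_index(drop=True)
print(df.index.tolist())
```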


#### Some useful and commonly used methods of the DataFrame

In [15]:
friends_df['age'].mean()

34.42857142857143

In [16]:
friends_df.age.mean()

34.42857142857143
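
#### The two spellings above give the same column, but dot access does not work for every column name. The sketch below uses a made-up table whose column names were chosen to show where only the bracket form works.

```python
import pandas

# Hypothetical table whose column names clash with attribute access
df = pandas.DataFrame({'count': [3, 5], 'shoe size': [42, 38]})

print(df['count'].sum())        # bracket access always returns the column
print(callable(df.count))       # dot access finds the DataFrame.count method instead
print(df['shoe size'].mean())   # names containing spaces are only reachable with brackets
```

#### For that reason the bracket form is the safer habit when column names come from someone else's file.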

In [17]:
friends_df['vocation'].value_counts()

assassin      2
advisor       2
blacksmith    1
coder         1
nightwatch    1
Name: vocation, dtype: int64
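
#### If shares are more useful than raw counts, `value_counts` takes a `normalize=True` argument. A minimal sketch on a made-up Series:

```python
import pandas

vocations = pandas.Series(['assassin', 'advisor', 'assassin', 'coder'])

# normalize=True returns fractions of the total instead of counts
shares = vocations.value_counts(normalize=True)
print(shares)
```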

In [18]:
friends_df.max()

age                 62
name             Tywin
vocation    nightwatch
dtype: object

In [19]:
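friends_df.max()['name']   # max() works column by column: this is the alphabetically last name, not necessarily the oldest friend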

'Tywin'

In [20]:
friends_df.min()['name']   # likewise the alphabetically first name, not necessarily the youngest friend

'Arya'
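
#### Note that `max()` and `min()` are computed separately for each column, so the name they return is the alphabetically last or first one. To get the name of the oldest or youngest friend, one option is `idxmax()` / `idxmin()` on the age column followed by `.loc`. The sketch below uses a made-up table (the friend 'Zed' is hypothetical) in which the two answers differ.

```python
import pandas

df = pandas.DataFrame({
    'name': ['Jon', 'Gendry', 'Arya', 'Tywin', 'Zed'],   # 'Zed' is a made-up entry
    'age': [23, 17, 14, 62, 30],
})

alphabetical_last = df.max()['name']      # column-wise max of the name column
oldest = df.loc[df['age'].idxmax()]       # the whole row with the largest age
youngest = df.loc[df['age'].idxmin()]     # the whole row with the smallest age

print(alphabetical_last, oldest['name'], youngest['name'])
```

#### When several rows share the maximum, `idxmax()` returns the label of the first one.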

#### Basics of filtering DataFrames

In [21]:
friends_df

Unnamed: 0,age,name,vocation
0,23,Jon,nightwatch
1,17,Gendry,blacksmith
2,14,Arya,assassin
3,62,Tywin,advisor
4,62,Olena,advisor
5,35,Jaqen,assassin
6,28,Jon,coder


In [22]:
friends_df['vocation'] == 'assassin'

0    False
1    False
2     True
3    False
4    False
5     True
6    False
Name: vocation, dtype: bool

In [23]:
friends_df[friends_df['vocation'] == 'assassin']

Unnamed: 0,age,name,vocation
2,14,Arya,assassin
5,35,Jaqen,assassin


In [24]:
friends_df['age'] > 20

0     True
1    False
2    False
3     True
4     True
5     True
6     True
Name: age, dtype: bool

In [25]:
friends_df[friends_df['age'] > 20]

Unnamed: 0,age,name,vocation
0,23,Jon,nightwatch
3,62,Tywin,advisor
4,62,Olena,advisor
5,35,Jaqen,assassin
6,28,Jon,coder


In [26]:
(friends_df['age'] > 20) & (friends_df['vocation'] != 'nightwatch')

0    False
1    False
2    False
3     True
4     True
5     True
6     True
dtype: bool

In [27]:
friends_df[(friends_df['age'] > 20) & (friends_df['vocation'] != 'nightwatch')]

Unnamed: 0,age,name,vocation
3,62,Tywin,advisor
4,62,Olena,advisor
5,35,Jaqen,assassin
6,28,Jon,coder


In [28]:
filter1 = friends_df['age'] > 20
filter1

0     True
1    False
2    False
3     True
4     True
5     True
6     True
Name: age, dtype: bool

In [29]:
filter2 = friends_df['vocation'] != 'nightwatch'
filter2

0    False
1     True
2     True
3     True
4     True
5     True
6     True
Name: vocation, dtype: bool

In [30]:
friends_df[filter1 & filter2]

Unnamed: 0,age,name,vocation
3,62,Tywin,advisor
4,62,Olena,advisor
5,35,Jaqen,assassin
6,28,Jon,coder


In [31]:
friends_df[filter2]

Unnamed: 0,age,name,vocation
1,17,Gendry,blacksmith
2,14,Arya,assassin
3,62,Tywin,advisor
4,62,Olena,advisor
5,35,Jaqen,assassin
6,28,Jon,coder


In [32]:
friends_df[filter1]

Unnamed: 0,age,name,vocation
0,23,Jon,nightwatch
3,62,Tywin,advisor
4,62,Olena,advisor
5,35,Jaqen,assassin
6,28,Jon,coder
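
#### Filters can be combined in other ways than `&`. The sketch below, on a smaller made-up table, shows `|` (or), `~` (not) and `isin` (membership). When conditions are written inline, each comparison needs its own parentheses, because `&` and `|` bind more tightly than `>` or `==` in Python.

```python
import pandas

df = pandas.DataFrame({
    'name': ['Jon', 'Gendry', 'Arya', 'Tywin'],
    'age': [23, 17, 14, 62],
    'vocation': ['nightwatch', 'blacksmith', 'assassin', 'advisor'],
})

over_20 = df['age'] > 20
is_assassin = df['vocation'] == 'assassin'

either = df[over_20 | is_assassin]    # rows matching at least one filter
not_over_20 = df[~over_20]            # rows where the filter is False
crafts = df[df['vocation'].isin(['blacksmith', 'advisor'])]   # membership test

print(either['name'].tolist())
print(not_over_20['name'].tolist())
print(crafts['name'].tolist())
```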
