# Pandas
In computer programming, pandas is a software library written for the Python programming language for data manipulation and analysis. In particular, it offers data structures and operations for manipulating numerical tables and time series.
* Created in 2008 by Wes McKinney.
* Open source, released under the new (3-clause) BSD license.



# The Series Data Structure
A pandas Series is a one-dimensional labelled array capable of holding values of any data type (integers, floats, strings, Python objects, etc.).
* The axis labels are collectively referred to as the index.

In [0]:
# Import pandas and NumPy under their conventional aliases.
import pandas as pd
import numpy as np
pd.Series?

## Create a series by using a list



In [0]:
animals = ['Tiger', 'Bear', 'Moose']
pd.Series(animals)

0    Tiger
1     Bear
2    Moose
dtype: object

**Observation -** When no index is passed, pandas assigns a default integer index starting at 0.
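The default index can be inspected directly. A minimal sketch, reusing the animal list from the cell above (the variable names are only for illustration):

```python
import pandas as pd

animals = pd.Series(['Tiger', 'Bear', 'Moose'])

# With no index argument, the labels are simply the positions 0, 1, 2.
default_labels = list(animals.index)
print(default_labels)                 # [0, 1, 2]
print(type(animals.index).__name__)   # RangeIndex
```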

In [0]:
numbers = [1, 2, 3]
pd.Series(numbers)

0    1
1    2
2    3
dtype: int64

In [0]:
animals = ['Tiger', 'Bear', None]
pd.Series(animals)

0    Tiger
1     Bear
2     None
dtype: object

In [0]:
numbers = [1, 2, None]
pd.Series(numbers)

0    1.0
1    2.0
2    NaN
dtype: float64

In [0]:
import numpy as np
np.nan == None

False

In [0]:
np.isnan(np.nan)

True
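The cells above suggest that missing numeric values are stored as NaN and that NaN cannot be found with `==`. A short sketch of how missing entries might be detected instead, using `Series.isna()`:

```python
import numpy as np
import pandas as pd

numbers = pd.Series([1, 2, None])

# None is converted to NaN, which forces the integers to become floats.
print(numbers.dtype)          # float64

# NaN is not equal to anything, not even itself.
print(np.nan == np.nan)       # False

# isna() is the reliable way to find the missing entries.
missing_mask = numbers.isna()
print(missing_mask.tolist())  # [False, False, True]
```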

## Create a Series by using a Python dictionary

In [0]:
sports = {'Archery': 'Bhutan',
          'Golf': 'Scotland',
          'Sumo': 'Japan',
          'Taekwondo': 'South Korea'}
s = pd.Series(sports)
s

Archery           Bhutan
Golf            Scotland
Sumo               Japan
Taekwondo    South Korea
dtype: object

In [0]:
s.index

Index(['Archery', 'Golf', 'Sumo', 'Taekwondo'], dtype='object')

**Observation -** Here the keys of the dictionary are the index of the Series.

In [0]:
# We can also access the index of a Series with Series.keys(), which mirrors dict.keys()
s.keys()

Index(['Archery', 'Golf', 'Sumo', 'Taekwondo'], dtype='object')

In [0]:
s.items()

<zip at 0x7f2d6407cec8>
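The `zip` object shown above is a lazy iterator of (label, value) pairs. A small sketch, with a shortened version of the sports dictionary, of how it could be materialised or looped over:

```python
import pandas as pd

s = pd.Series({'Archery': 'Bhutan', 'Golf': 'Scotland'})

# items() yields (index label, value) tuples, much like dict.items().
pairs = list(s.items())
print(pairs)  # [('Archery', 'Bhutan'), ('Golf', 'Scotland')]

for sport, country in s.items():
    print(sport, '->', country)
```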

## Creating a Series by passing a list and index of the Series

In [0]:
s = pd.Series(['Tiger', 'Bear', 'Moose'], index=['India', 'America', 'Canada'])
s

India      Tiger
America     Bear
Canada     Moose
dtype: object

In [0]:
sports = {'Archery': 'Bhutan',
          'Golf': 'Scotland',
          'Sumo': 'Japan',
          'Taekwondo': 'South Korea'}
s = pd.Series(sports, index=['Golf', 'Sumo', 'Hockey'])
s

Golf      Scotland
Sumo         Japan
Hockey         NaN
dtype: object
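When both a dictionary and an explicit index are passed, the index decides which entries survive: keys missing from the index are dropped, and index labels missing from the dictionary get NaN. A sketch with made-up numeric data (the `medals` dictionary is hypothetical):

```python
import pandas as pd

medals = {'Golf': 3, 'Sumo': 5, 'Archery': 1}  # hypothetical values
s = pd.Series(medals, index=['Golf', 'Sumo', 'Hockey'])

print(s.index.tolist())        # ['Golf', 'Sumo', 'Hockey']
print(s.isna().tolist())       # [False, False, True]  -> 'Hockey' had no value
print('Archery' in s.index)    # False -> not requested, so it was dropped
```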

# Querying a Series
A pd.Series can be queried either by the index position or the index label.


In [0]:
sports = {'Archery': 'Bhutan',
          'Golf': 'Scotland',
          'Sumo': 'Japan',
          'Taekwondo': 'South Korea'}
s = pd.Series(sports)
s

Archery           Bhutan
Golf            Scotland
Sumo               Japan
Taekwondo    South Korea
dtype: object

**iloc[]** - queries by integer position.<br>
**loc[]** - queries by index label.
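A compact sketch of the two accessors side by side, on a small Series with made-up values. It also shows one difference worth knowing: position slices exclude the end, label slices include it.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])

print(s.iloc[0])                  # 10 -> first position
print(s.loc['b'])                 # 20 -> label 'b'

# iloc slices behave like Python slices: the end position is excluded.
print(s.iloc[0:2].tolist())       # [10, 20]

# loc slices include the end label.
print(s.loc['a':'c'].tolist())    # [10, 20, 30]
```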

In [0]:
s.iloc[3]

'South Korea'

In [0]:
s.loc['Golf']

'Scotland'

In [0]:
# Returns the 4th element of the series (positional fallback; newer pandas versions deprecate this, so s.iloc[3] is safer).
s[3]

'South Korea'

In [0]:
s['Golf']

'Scotland'

In [0]:
sports = {99: 'Bhutan',
          100: 'Scotland',
          101: 'Japan',
          102: 'South Korea'}
s = pd.Series(sports)
s

99          Bhutan
100       Scotland
101          Japan
102    South Korea
dtype: object

In [0]:
s[1]  # With an integer index, s[1] is a label lookup, not s.iloc[1]; there is no label 1, so it raises an error

KeyError: 1
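With integer labels, being explicit avoids the ambiguity. A minimal sketch using the same data as the cell above:

```python
import pandas as pd

s = pd.Series({99: 'Bhutan', 100: 'Scotland', 101: 'Japan', 102: 'South Korea'})

by_position = s.iloc[1]   # second element, regardless of labels
by_label = s.loc[100]     # element whose label is 100
print(by_position, by_label)  # Scotland Scotland

# Plain brackets look for the *label* 1, which does not exist here.
try:
    s[1]
    lookup_failed = False
except KeyError:
    lookup_failed = True
print(lookup_failed)  # True
```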

### Vectorization
Pandas and the underlying NumPy library support a method of computation called **vectorization**: an operation is applied to a whole array at once instead of looping over its elements in Python.

In [0]:
s = pd.Series([100.00, 120.00, 101.00, 3.00])
s[1]

120.0

In [0]:
# Not vectorized
total = 0
for item in s:
    total+=item
print(total)

324.0


In [0]:
# Vectorized
import numpy as np

total = np.sum(s)
print(total)

324.0


In [0]:
#this creates a big series of random numbers
s = pd.Series(np.random.randint(0,1000,10000))
s.head()


0    264
1    999
2    704
3    248
4    288
dtype: int64

In [0]:
len(s)

10000

In [0]:
%%timeit -n 100
# Non vectorized (the %%timeit cell magic has to be the first line of the cell)
summary = 0
for item in s:
    summary+=item

100 loops, best of 3: 1.24 ms per loop


In [0]:
%%timeit -n 100
# Vectorized (the %%timeit cell magic has to be the first line of the cell)
summary = np.sum(s)

100 loops, best of 3: 84.4 µs per loop


**Observation** - In this run the vectorized version is roughly 15 times faster than the non-vectorized loop (84.4 µs vs 1.24 ms).
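Outside IPython the `%%timeit` magic is not available. A rough sketch of the same comparison with the standard `timeit` module (the helper `loop_sum` and the repeat count of 20 are arbitrary choices; timings will vary by machine):

```python
import timeit

import numpy as np
import pandas as pd

s = pd.Series(np.random.randint(0, 1000, 10000))

def loop_sum(series):
    """Sum the values one at a time in a Python loop."""
    total = 0
    for item in series:
        total += item
    return total

# Both approaches should agree on the result.
print(loop_sum(s) == s.sum())

loop_time = timeit.timeit(lambda: loop_sum(s), number=20)
vec_time = timeit.timeit(lambda: s.sum(), number=20)
print(f"loop: {loop_time:.4f}s, vectorized: {vec_time:.4f}s")
```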

In [0]:
# Vectorized
s+=2 #adds two to each item in s using broadcasting
s.head()

0     266
1    1001
2     706
3     250
4     290
dtype: int64
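Broadcasting works with other operators too, and the non in-place form returns a new Series. A small sketch on the four-value Series used earlier:

```python
import pandas as pd

s = pd.Series([100.0, 120.0, 101.0, 3.0])

plus_two = s + 2    # the scalar is applied to every element
doubled = s * 2

print(plus_two.tolist())  # [102.0, 122.0, 103.0, 5.0]
print(doubled.tolist())   # [200.0, 240.0, 202.0, 6.0]
print(s.tolist())         # [100.0, 120.0, 101.0, 3.0] -> original unchanged
```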

In [0]:
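# Non Vectorized
# Series.iteritems() and Series.set_value() were removed from pandas;
# items() and .at[] are the current equivalents.
for label, value in s.items():
    s.at[label] = value + 2
s.head()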

  


0     268
1    1003
2     708
3     252
4     292
dtype: int64

In [0]:
%%timeit -n 10
s = pd.Series(np.random.randint(0,1000,10000))
for label, value in s.items():  # iteritems() was removed in pandas 2.0
    s.loc[label]= value+2

In [0]:
%%timeit -n 10
s = pd.Series(np.random.randint(0,1000,10000))
s+=2


In [0]:
s = pd.Series([1, 2, 3])
s.loc['Animal'] = 'Bears'
s
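Assigning to a label that does not exist yet adds a new entry, and both the values and the index can end up holding mixed types. A sketch of what the cell above is expected to produce (the dtype shown is what recent pandas versions report, to the best of my knowledge):

```python
import pandas as pd

s = pd.Series([1, 2, 3])
s.loc['Animal'] = 'Bears'   # new label -> the Series grows by one entry

print(len(s))               # 4
print(s.index.tolist())     # [0, 1, 2, 'Animal']
print(s.dtype)              # object -> integers and a string no longer fit int64
```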

### Now let's create a Series and append it to another Series.

In [0]:
original_sports = pd.Series({'Archery': 'Bhutan',
                             'Golf': 'Scotland',
                             'Sumo': 'Japan',
                             'Taekwondo': 'South Korea'})
cricket_loving_countries = pd.Series(['Australia',
                                      'Barbados',
                                      'Pakistan',
                                      'England'], 
                                   index=['Cricket',
                                          'Cricket',
                                          'Cricket',
                                          'Cricket'])
# Series.append() was removed in pandas 2.0; pd.concat() is the replacement.
all_countries = pd.concat([original_sports, cricket_loving_countries])

In [0]:
original_sports

Archery           Bhutan
Golf            Scotland
Sumo               Japan
Taekwondo    South Korea
dtype: object

In [0]:
cricket_loving_countries

Cricket    Australia
Cricket     Barbados
Cricket     Pakistan
Cricket      England
dtype: object

In [0]:
all_countries

Archery           Bhutan
Golf            Scotland
Sumo               Japan
Taekwondo    South Korea
Cricket        Australia
Cricket         Barbados
Cricket         Pakistan
Cricket          England
dtype: object

In [0]:
all_countries.loc['Cricket']

Cricket    Australia
Cricket     Barbados
Cricket     Pakistan
Cricket      England
dtype: object
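The outputs above show two things: combining returns a new Series and leaves the originals alone, and index labels do not have to be unique. A minimal sketch with shortened data, using `pd.concat`:

```python
import pandas as pd

a = pd.Series({'Archery': 'Bhutan', 'Golf': 'Scotland'})
b = pd.Series(['Australia', 'England'], index=['Cricket', 'Cricket'])

combined = pd.concat([a, b])

print(len(a), len(combined))              # 2 4 -> 'a' itself is unchanged
print(combined.index.is_unique)           # False

# A unique label returns a single value; a repeated label returns a Series.
print(combined.loc['Golf'])               # Scotland
print(combined.loc['Cricket'].tolist())   # ['Australia', 'England']
```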