## Pandas

Pandas is built on NumPy and can be described as NumPy with labels: it is a package that handles data in tabular form but attaches general labels, not just numerical indices, to the rows and columns. Its main features are:

1. Data labels and descriptive indices 
2. Robust handling of common data formats and missing data
3. Relational-database operations (such as joins)
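
As a rough illustration of the three features above, here is a minimal sketch. The names (`heights`, `people`, `teams`) and all the values are made up for this example, and it uses a DataFrame, the two-dimensional counterpart of the Series introduced below.

```python
import numpy as np
import pandas as pd

# 1. Descriptive labels instead of bare integer positions
heights = pd.Series([1.80, 1.65, np.nan], index=["ann", "bob", "cy"], name="height_m")
print(heights["bob"])          # look up a value by its label

# 2. Missing data is stored as NaN and skipped by default in reductions
print(heights.isna().sum())    # number of missing entries
print(heights.mean())          # mean of the non-missing values only

# 3. Database-style join of two tables on a shared key column
people = pd.DataFrame({"name": ["ann", "bob"], "team_id": [1, 2]})
teams = pd.DataFrame({"team_id": [1, 2], "team": ["red", "blue"]})
joined = people.merge(teams, on="team_id", how="left")
print(joined)
```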

## Pandas Series

A Series is a one-dimensional labeled array: essentially a NumPy array of values paired with an index. This section covers:

1. Making series objects from Python lists and dicts
2. Extracting indices and values from a series object
3. Indexing series objects implicitly (using numerical indices) and explicitly (using indices that you assign to the series yourself)

In [10]:
import pandas as pd

In [11]:
# We can make a pandas Series from a Python list

s = pd.Series([1,2,4,9,16,25],name='squares')

In [12]:
s  # The series holds a NumPy array of values together with an implicit integer index

0     1
1     2
2     4
3     9
4    16
5    25
Name: squares, dtype: int64

In [13]:
s.values

array([ 1,  2,  4,  9, 16, 25])

In [14]:
s.index

RangeIndex(start=0, stop=6, step=1)

In [15]:
s[0]

1

In [16]:
s[2]

4

In [17]:
#Slicing

s[2:4]

2    4
3    9
Name: squares, dtype: int64

In [30]:
#Series with explicit indices

pop2014 = pd.Series(
    [100, 99.3, 95.5, 93.5, 92.4, 84.8, 84.5, 78.9, 74.3, 72.8],
    index=['Java', 'C', 'C++', 'Python', 'C#', 'PHP', 'JavaScript', 'Ruby', 'R', 'Matlab'],
)


In [19]:
pop2014

Java          100.0
C              99.3
C++            95.5
Python         93.5
C#             92.4
PHP            84.8
JavaScript     84.5
Ruby           78.9
R              74.3
Matlab         72.8
dtype: float64

In [20]:
pop2014.index

Index(['Java', 'C', 'C++', 'Python', 'C#', 'PHP', 'JavaScript', 'Ruby', 'R',
       'Matlab'],
      dtype='object')

In [21]:
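# We can still index by number (recent pandas versions warn that this is
# deprecated for label-indexed series; pop2014.iloc[0] is the explicit form)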
pop2014[0]

100.0

In [22]:
pop2014[0:2]

Java    100.0
C        99.3
dtype: float64

In [23]:
# Select an element by its explicit index label
pop2014['Python']

93.5

In [24]:
pop2014['C++':'C#']

C++       95.5
Python    93.5
C#        92.4
dtype: float64

**Note that here Pandas deviates from the standard Python slicing convention: when slicing by label, the end label of the slice (C#) is included.**
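
To make the difference concrete, the following sketch slices the same series once by position and once by label. The series `langs` is a shortened, hypothetical copy of the data above.

```python
import pandas as pd

langs = pd.Series([95.5, 93.5, 92.4, 84.8], index=["C++", "Python", "C#", "PHP"])

by_position = langs.iloc[0:2]       # positions 0 and 1; position 2 is excluded
by_label = langs.loc["C++":"C#"]    # labels 'C++' through 'C#'; 'C#' is included

print(list(by_position.index))
print(list(by_label.index))
```

The positional slice follows the usual half-open Python convention, while the label slice is closed on both ends.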

If you just use brackets for indexing, Pandas will do its best to decide whether you're trying to use numbers or explicit values for indices. However, you can also be explicit.

In [25]:
# To be explicit, use the 'iloc' indexer to say that you're using numerical positions
# and the 'loc' indexer to say that you're using explicit index labels.

pop2014.iloc[0:2]

Java    100.0
C        99.3
dtype: float64

In [26]:
pop2014.loc[:'Ruby']

Java          100.0
C              99.3
C++            95.5
Python         93.5
C#             92.4
PHP            84.8
JavaScript     84.5
Ruby           78.9
dtype: float64
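
Being explicit matters most when the labels are themselves integers, because a plain bracket lookup is then interpreted as a label, not a position. A small sketch with a made-up series `s` whose labels start at 1:

```python
import pandas as pd

# Hypothetical series whose integer labels do not coincide with positions
s = pd.Series(["a", "b", "c"], index=[1, 2, 3])

by_label = s.loc[1]       # the row labelled 1 (the first row)
by_position = s.iloc[1]   # the row at position 1 (the second row)
plain = s[1]              # with an integer index, plain brackets use the label

print(by_label, by_position, plain)
```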

In [27]:
# We can also use advanced indexing, for instance a boolean mask

pop2014[pop2014 > 90]

Java      100.0
C          99.3
C++        95.5
Python     93.5
C#         92.4
dtype: float64
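
The comparison itself produces a boolean series, and several conditions can be combined with `&` and `|` as long as each one is wrapped in parentheses. A minimal sketch on a shortened, hypothetical copy of the data (`scores`):

```python
import pandas as pd

scores = pd.Series([100, 99.3, 95.5, 84.8], index=["Java", "C", "C++", "PHP"])

# Each comparison yields a boolean series; & combines them element-wise
mask = (scores > 90) & (scores < 100)
print(mask)

selected = scores[mask]   # keep only the rows where the mask is True
print(selected)
```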

In [28]:
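# Another way to make a series is from a Python dictionary: the keys become the index.
# (Older pandas versions sort the keys, as in the output below; newer ones keep insertion order.)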

pop2015 = pd.Series({'Java': 100,'C': 99.9,'C++': 99.4,'Python': 96.5,'C#':91.3,'R': 84.8,'PHP': 84.5, 'JavaScript': 83.0, 'Ruby': 76.2, 'Matlab': 72.4})

In [29]:
pop2015

C              99.9
C#             91.3
C++            99.4
Java          100.0
JavaScript     83.0
Matlab         72.4
PHP            84.5
Python         96.5
R              84.8
Ruby           76.2
dtype: float64
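
When building a series from a dictionary, an explicit `index` argument can also be passed to choose which keys to keep and in what order. The sketch below uses a small hypothetical dictionary `ratings`; the label 'Go' is deliberately absent from it to show that missing keys become NaN.

```python
import pandas as pd

ratings = {"Java": 100, "C": 99.9, "Python": 96.5}

# Only the listed labels are kept, in the listed order;
# 'Go' is not a key of the dictionary, so its value is NaN
ordered = pd.Series(ratings, index=["Python", "C", "Go"])
print(ordered)
```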