### 10 minutes to Pandas

In [1]:
# importing libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

Creating a Series by passing a list of values, letting pandas create a default integer index:

In [2]:
s = pd.Series([1, 3, 5, np.nan, 6, 8])
print(s)

0    1.0
1    3.0
2    5.0
3    NaN
4    6.0
5    8.0
dtype: float64


### Series

Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floating point numbers, Python objects, etc.). The axis labels are collectively referred to as the index. The basic method to create a Series is to call:

```
mySeries = pd.Series(data, index=index)
```

Syntax for declaring a Series object

Here, data can be many different things:

- a Python dict
- an ndarray
- a scalar value (like 5)

The passed index is a list of axis labels. This gives a few cases, depending on what data is:

#### From ndarray

If data is an ndarray, index must be the same length as data. If no index is passed, one will be created with the values [0, ..., len(data) - 1].

In [3]:
s = pd.Series(np.random.randn(5), index=['a','b','c','d','e'])

print(s)

print(s.index)

print(pd.Series(np.random.randn(5)))

a    0.339193
b    0.720179
c    2.086801
d    0.460535
e   -1.278014
dtype: float64
Index([u'a', u'b', u'c', u'd', u'e'], dtype='object')
0   -0.996717
1    1.273811
2    0.356022
3   -0.590274
4    0.058864
dtype: float64
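
The two rules above can be sketched as follows. This is a minimal illustration that uses a small fixed array instead of random data so the result is reproducible; the names `values` and `default_indexed` are hypothetical and not part of the original notebook.

```python
import numpy as np
import pandas as pd

values = np.array([10.0, 20.0, 30.0])

# No index passed: pandas builds a default integer index 0..len(values)-1.
default_indexed = pd.Series(values)
print(list(default_indexed.index))

# An index whose length differs from the data is rejected with a ValueError.
try:
    pd.Series(values, index=['a', 'b'])
    length_mismatch_raised = False
except ValueError as err:
    length_mismatch_raised = True
    print("ValueError:", err)
```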


#### From dict

If data is a dict and an index is passed, the values in data corresponding to the labels in the index are pulled out. Otherwise, an index is constructed from the dict's keys: sorted where possible in older pandas versions, and in insertion order from pandas 0.23 on Python 3.6 or later.

In [4]:
d = {'a' : 0., 'b' : 1., 'c' : 2.}

print(pd.Series(d))

print(pd.Series(d, index=['b', 'c', 'd', 'a']))

a    0.0
b    1.0
c    2.0
dtype: float64
b    1.0
c    2.0
d    NaN
a    0.0
dtype: float64


**Note:** NaN (not a number) is the standard missing data marker used in pandas.
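
As a small sketch of how such missing entries can be handled, the example below rebuilds the dict-based Series and then uses `isna()` and `dropna()`, both standard Series methods. The names `s_from_dict` and `missing_mask` are illustrative only.

```python
import pandas as pd

d = {'a': 0., 'b': 1., 'c': 2.}

# 'd' is not a key of the dict, so its value is filled with NaN.
s_from_dict = pd.Series(d, index=['b', 'c', 'd', 'a'])

# isna() flags the missing entries with True.
missing_mask = s_from_dict.isna()
print(missing_mask)

# dropna() returns a new Series without the missing entries.
print(s_from_dict.dropna())
```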

#### From scalar value

If data is a scalar value, an index must be provided. The value will be repeated to match the length of index.

In [8]:
pd.Series(5., index=['a', 'b', 'c'])

a    5.0
b    5.0
c    5.0
dtype: float64

Series acts very similarly to an ndarray, and is a valid argument to most NumPy functions. However, operations such as slicing also slice the index.

In [16]:
print(s)
# .iloc selects by position explicitly
print(s.iloc[0])
print(s.iloc[:3])

print(s.median())

print(s > s.median())

print(s[s > s.median()])

print(s.iloc[[4, 3, 1]])


a    0.339193
b    0.720179
c    2.086801
d    0.460535
e   -1.278014
dtype: float64
0.339192700753
a    0.339193
b    0.720179
c    2.086801
dtype: float64
0.460535342798
a    False
b     True
c     True
d    False
e    False
dtype: bool
b    0.720179
c    2.086801
dtype: float64
e   -1.278014
d    0.460535
b    0.720179
dtype: float64
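
To make the ndarray-like behaviour concrete, here is a minimal sketch with fixed values. It assumes a pandas version that provides `Series.to_numpy()` (0.24 or later); the name `s_demo` is hypothetical.

```python
import numpy as np
import pandas as pd

s_demo = pd.Series([1.0, 4.0, 9.0, 16.0], index=['a', 'b', 'c', 'd'])

# A NumPy ufunc applied to a Series returns a Series with the same labels.
roots = np.sqrt(s_demo)
print(roots)

# Positional slicing keeps the matching slice of the index.
first_two = s_demo.iloc[:2]
print(list(first_two.index))

# to_numpy() drops the labels when a plain ndarray is needed.
raw = s_demo.to_numpy()
print(type(raw).__name__)
```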


A Series is like a fixed-size dict in that you can get and set values by index label:

In [20]:
print(s['a'])

s['e'] = 100

print(s)

print('e' in s)

0.339192700753
a      0.339193
b      0.720179
c      2.086801
d      0.460535
e    100.000000
dtype: float64
True
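
The dict analogy also covers the failure case. The sketch below, with a hypothetical `s_demo`, shows that membership tests look at labels rather than values, and that bracket access with an unknown label raises `KeyError`.

```python
import pandas as pd

s_demo = pd.Series([1.0, 2.0, 3.0], index=['a', 'b', 'c'])

# Assigning to an existing label overwrites the value in place.
s_demo['c'] = 30.0

# Membership tests check the index labels, not the values.
label_found = 'c' in s_demo
value_found = 30.0 in s_demo
print(label_found, value_found)

# Bracket access with an unknown label raises KeyError, as with a dict.
try:
    s_demo['z']
    key_error_raised = False
except KeyError:
    key_error_raised = True
print(key_error_raised)
```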


The get method returns None, or a specified default, when a label is missing:

In [24]:
print(s.get('f'))

# if not found return -1
print(s.get('f', -1))

print(s.get('f', np.nan))

None
-1
nan


### Vectorized operations and label alignment with Series

As with raw NumPy arrays, looping through a Series value by value is usually unnecessary in data analysis. A Series can also be passed to most NumPy functions that expect an ndarray.

In [26]:
print(s + s)

print(s * 2)


a      0.678385
b      1.440359
c      4.173602
d      0.921071
e    200.000000
dtype: float64
a      0.678385
b      1.440359
c      4.173602
d      0.921071
e    200.000000
dtype: float64
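
For comparison, the sketch below computes the same result once with an explicit loop and once with a vectorized expression. The names `looped` and `vectorized` are illustrative, and the arithmetic is an arbitrary example.

```python
import numpy as np
import pandas as pd

s_demo = pd.Series([1.0, 2.0, 3.0], index=['a', 'b', 'c'])

# Explicit loop: works, but is verbose and slow for large data.
looped = pd.Series({label: value * 2 + 1 for label, value in s_demo.items()})

# Vectorized: the same arithmetic applied to the whole Series at once.
vectorized = s_demo * 2 + 1

# Both approaches give the same labels and values.
print(vectorized.equals(looped))

# NumPy functions also operate element-wise on the whole Series.
print(np.exp(s_demo))
```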


A key difference between Series and ndarray is that operations between Series automatically align the data based on label. Thus, you can write computations without giving consideration to whether the Series involved have the same labels.

In [28]:
print(s[1:])

print(s[:-1])

print(s[1:] + s[:-1])

b      0.720179
c      2.086801
d      0.460535
e    100.000000
dtype: float64
a    0.339193
b    0.720179
c    2.086801
d    0.460535
dtype: float64
a         NaN
b    1.440359
c    4.173602
d    0.921071
e         NaN
dtype: float64


The result of an operation between unaligned Series will have the union of the indexes involved. If a label is not found in one Series or the other, the result will be marked as missing (NaN). Being able to write code without doing any explicit data alignment grants immense freedom and flexibility in interactive data analysis and research. The integrated data alignment features of the pandas data structures set pandas apart from the majority of related tools for working with labeled data.
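
A minimal sketch of this alignment with two small, partially overlapping Series follows. The names `left` and `right` are hypothetical. `dropna()` and `add(..., fill_value=0)` are shown as two possible ways to deal with the resulting NaN values; which one is appropriate depends on the analysis.

```python
import pandas as pd

left = pd.Series([1.0, 2.0, 3.0], index=['a', 'b', 'c'])
right = pd.Series([10.0, 20.0, 30.0], index=['b', 'c', 'd'])

# Labels are matched before adding; the result index is the union a..d.
total = left + right
print(total)

# Labels present on only one side become NaN; dropna() discards them.
print(total.dropna())

# add() with fill_value treats a label missing on one side as 0 instead.
filled = left.add(right, fill_value=0)
print(filled)
```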

#### Name attribute

Series can also have a name attribute:

In [30]:
s = pd.Series(np.random.randn(5), name="something")
print(s)

# renaming a series
s2 = s.rename("different")
print(s2)

0   -0.898311
1    0.551849
2   -1.312258
3    0.442739
4   -1.086597
Name: something, dtype: float64
0   -0.898311
1    0.551849
2   -1.312258
3    0.442739
4   -1.086597
Name: different, dtype: float64
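
Two consequences of the name attribute are sketched below with hypothetical names: `rename` returns a new Series and leaves the original name untouched, and the name becomes the column label when the Series is converted with `to_frame()`.

```python
import pandas as pd

s_named = pd.Series([1, 2, 3], name="something")

# rename() returns a new Series; the original keeps its name.
s_renamed = s_named.rename("different")
print(s_named.name, s_renamed.name)

# The name is used as the column label when converting to a DataFrame.
frame = s_renamed.to_frame()
print(list(frame.columns))
```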
