# Pandas Series

In [1]:
import numpy as np

In [2]:
import pandas as pd

A **Series** is a one-dimensional labelled array that can hold any data type (integers, strings, floats, Python objects, etc.). The axis labels are collectively referred to as the index. The basic way to create a Series is the following syntax:

In [3]:
#s = pd.Series(data, index=index)

Some examples of what the *data* parameter could be are:
* a Python dict
* an ndarray
* a scalar value

The passed *index* is a list of axis labels.

## From ndarray
If the *data* parameter is an ndarray, the index must be the same length as the data. If no index is passed, one is generated with the values **[0, ..., len(*data*) - 1]**.

In [4]:
#here, we specify our index

In [7]:
s = pd.Series(np.random.randn(5), index=['a', 'b', 'c', 'd', 'e'])

In [8]:
s

a   -0.083832
b    1.132018
c    0.567136
d    1.208829
e   -0.111735
dtype: float64

In [9]:
s.index

Index(['a', 'b', 'c', 'd', 'e'], dtype='object')

In [10]:
#Here, we let Pandas create a default index
pd.Series(np.random.randn(5))

0    0.741194
1   -1.736756
2   -0.768725
3    0.038756
4    1.125928
dtype: float64
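The length requirement described above can be checked directly. The following is a minimal sketch, assuming a recent pandas version; the array `values` and the three-label index are made up for illustration.

```python
import numpy as np
import pandas as pd

values = np.arange(5)  # hypothetical data: five values

# Passing three labels for five values should be rejected.
try:
    pd.Series(values, index=["a", "b", "c"])
    mismatch_raised = False
except ValueError as err:
    mismatch_raised = True
    print("ValueError:", err)

# Without an index, pandas generates 0 .. len(values) - 1.
default_index = list(pd.Series(values).index)
print(default_index)
```

The exact wording of the error message may differ between pandas versions, so the sketch only relies on the exception type.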

## From dict
A *Series* can also be created from a dict.

In [11]:
d = {'b': 1, 'a': 0, 'c': 2}

In [12]:
pd.Series(d)

b    1
a    0
c    2
dtype: int64

Note: When *data* is a dict and no index is passed, the *Series* index follows the dict's insertion order, and no sorting takes place. This applies if you have Python version >= 3.6 and Pandas version >= 0.23.
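When an index *is* passed together with a dict, pandas looks up each label in the dict instead of using insertion order. The sketch below reuses the dict from this section; the extra label `'d'` is an assumption added to show what happens to a label that has no matching key.

```python
import pandas as pd

d = {"b": 1, "a": 0, "c": 2}

# Values are pulled out by label, in the order given by the index.
# 'd' is not a key of the dict, so its value is missing (NaN).
selected = pd.Series(d, index=["b", "c", "d", "a"])
print(selected)
```

Because NaN is a float, the resulting Series is stored as `float64` rather than `int64`.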

## From scalar value

If **data** is a scalar value, an index must be provided. The value will be repeated to match the length of the index.

In [13]:
pd.Series(5., index=['a','b','c','d','e'])

a    5.0
b    5.0
c    5.0
d    5.0
e    5.0
dtype: float64

## Series is *ndarray*-like
Series acts very similarly to an *ndarray* from **NumPy** and is a valid argument to most NumPy functions. Operations such as slicing will also slice the index.

In [14]:
s[0]

-0.08383162335878215

In [15]:
s[:3]

a   -0.083832
b    1.132018
c    0.567136
dtype: float64

In [16]:
s[s > s.median()]

b    1.132018
d    1.208829
dtype: float64

In [17]:
s[[4,3,1]]

e   -0.111735
d    1.208829
b    1.132018
dtype: float64
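Integer keys such as `s[0]` and `s[[4,3,1]]` are treated as positions here only because the index holds strings. Recent pandas releases warn about this fallback, and the behavior may depend on your version. A hedged alternative is to state the intent explicitly with `.iloc` (position) or `.loc` (label). The Series below uses made-up values so the example is reproducible.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 0.0 .. 4.0 labelled 'a' .. 'e'.
s = pd.Series(np.arange(5, dtype=float), index=["a", "b", "c", "d", "e"])

first = s.iloc[0]                    # first element by position
picked = s.iloc[[4, 3, 1]]           # positions 4, 3, 1
by_label = s.loc[["e", "d", "b"]]    # the same elements by label

print(first)
print(picked)
```

Both selections return the same elements in the same order; only the way they are addressed differs.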

In [18]:
np.exp(s)

a    0.919586
b    3.101911
c    1.763210
d    3.349561
e    0.894281
dtype: float64

Each series has a *dtype*.

In [19]:
s.dtype

dtype('float64')

While a **Series** is ndarray-like, use *Series.to_numpy()* if you need an actual ndarray.

In [20]:
s.to_numpy()

array([-0.08383162,  1.13201841,  0.5671358 ,  1.20882916, -0.11173547])

## Series is *dict*-like
A **Series** is like a fixed-size dict in which you can get and set values by an index label.

In [21]:
s['a']

-0.08383162335878215

In [22]:
s['e'] = 12

In [23]:
s

a    -0.083832
b     1.132018
c     0.567136
d     1.208829
e    12.000000
dtype: float64

In [24]:
'e' in s

True

In [25]:
'f' in s

False

If a label is not contained in the index, an exception is raised:

In [27]:
s['f']

KeyError: 'f'
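To avoid the exception, `Series.get` can be used much like `dict.get`: it returns `None`, or a default you supply, when the label is missing. The small Series below is a stand-in with made-up values.

```python
import pandas as pd

s = pd.Series([1.0, 2.0], index=["a", "b"])  # hypothetical data

present = s.get("a")          # label exists, value is returned
missing = s.get("f")          # label missing, returns None
fallback = s.get("f", 0.0)    # label missing, returns the given default

print(present, missing, fallback)
```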

## Vectorized operations
When working with raw **NumPy** arrays, looping through value-by-value is usually not necessary. The same is true when working with *Series* in Pandas. A Series can also be passed to most NumPy functions that expect an ndarray.

In [28]:
s + s

a    -0.167663
b     2.264037
c     1.134272
d     2.417658
e    24.000000
dtype: float64

In [29]:
s * 2

a    -0.167663
b     2.264037
c     1.134272
d     2.417658
e    24.000000
dtype: float64

In [30]:
np.exp(s)

a         0.919586
b         3.101911
c         1.763210
d         3.349561
e    162754.791419
dtype: float64

A key difference between **Series** and **ndarray** is that operations between Series automatically align data based on the label. Thus, you can write computations without considering whether the **Series** involved have the same labels.

In [32]:
s1 = s[1:]
s2 = s[:-1]
s1 + s2

a         NaN
b    2.264037
c    1.134272
d    2.417658
e         NaN
dtype: float64

*The result of an operation between unaligned Series will have the union of the indexes involved. If a label is not found in one Series or the other, the result for that label will be marked as missing (NaN).*
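If the NaN entries are not wanted, two common options are to drop them afterwards or to treat a missing operand as a fill value during the operation. The sketch below uses made-up values and `Series.add` with `fill_value=0`; whether zero is a sensible substitute depends on your data.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0], index=["a", "b", "c", "d"])  # hypothetical data
s1 = s.iloc[1:]     # labels b, c, d
s2 = s.iloc[:-1]    # labels a, b, c

aligned = s1 + s2                    # NaN where only one side has the label
dropped = aligned.dropna()           # keep only labels present in both
filled = s1.add(s2, fill_value=0)    # treat the missing side as 0

print(aligned)
print(dropped)
print(filled)
```

Here `dropped` keeps only `b` and `c`, while `filled` keeps all four labels and uses the single available value for `a` and `d`.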

## Name attribute
A **Series** can also have a *name* attribute:

In [33]:
s = pd.Series(np.random.randn(5), name='something')

In [34]:
s

0   -1.792081
1   -0.253439
2    0.256264
3   -0.809442
4    0.754589
Name: something, dtype: float64

In [35]:
s.name

'something'
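A Series can be given a different name with `Series.rename`. As a minimal sketch with made-up values, passing a scalar returns a new Series and leaves the original untouched:

```python
import pandas as pd

s = pd.Series([1, 2, 3], name="something")  # hypothetical data

s2 = s.rename("different")  # returns a renamed Series

print(s.name)
print(s2.name)
```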