# Introduction to Pandas

In [19]:
import numpy as np
import pandas as pd

`Series` is a 1-dimensional labeled array capable of holding any data type (integers, strings, floating point numbers, Python objects, etc.). The axis labels are collectively referred to as the index.
```
s = pd.Series(data, index=index)
```
`data` can be many different things:
- a Python dictionary
- an ndarray
- a scalar value

The passed index is a list of axis labels.

## From `ndarray`

If `data` is an ndarray, the index must be the same length as the data. If no index is passed, one will be created having values `[0, ..., len(data) - 1]`.

In [20]:
# Here, we specify the index
s = pd.Series(np.random.randn(5), index=['a', 'b', 'c', 'd', 'e'])
print(s, '\n')
print(s.index)

a    0.318553
b   -1.289665
c    0.907064
d   -0.414245
e   -0.888388
dtype: float64 

Index(['a', 'b', 'c', 'd', 'e'], dtype='object')


In [21]:
# Here, we let Pandas create a default index
d = pd.Series(np.random.randn(5))
print(d, '\n')
print(d.index)

0   -0.295933
1   -0.214790
2    0.154161
3   -0.889950
4   -0.202108
dtype: float64 

RangeIndex(start=0, stop=5, step=1)
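
To illustrate the length requirement stated above, here is a minimal sketch of what happens when the index and the data disagree in length. The names `values` and `mismatch_error` are made up for this example.

```python
import numpy as np
import pandas as pd

values = np.arange(3)  # three values

try:
    # Two labels for three values: the lengths disagree
    pd.Series(values, index=['a', 'b'])
except ValueError as err:
    mismatch_error = err  # kept only so it can be inspected afterwards
    print('ValueError:', err)
```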


> `Pandas` supports non-unique index values. If an operation that does not support duplicate index values is attempted, an exception will be raised at that time.
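
As a rough sketch of the note on non-unique index values, the example below builds a `Series` with a repeated label and then attempts `reindex`, one operation that requires unique labels. The data and the names `dup` and `reindex_error` are illustrative assumptions.

```python
import pandas as pd

# Hypothetical data: the label 'a' appears twice
dup = pd.Series([1, 2, 3], index=['a', 'a', 'b'])

print(dup.index.is_unique)  # the index contains a repeated label
print(dup['a'])             # both rows labeled 'a' come back as a Series

try:
    # Reindexing needs unique labels, so the error surfaces only here
    dup.reindex(['a', 'b', 'c'])
except ValueError as err:
    reindex_error = err
    print('ValueError:', err)
```

Creating the `Series` succeeds; the exception appears only when the unsupported operation is attempted.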

## From `dict`

When `data` is a dict and no index is passed, the **Series** index follows the dict's insertion order rather than being sorted. This holds if you have:
- Python version >= 3.6
- Pandas version >= 0.23

In [22]:
# Create a dictionary
d = {
    'b' : 1,
    'a' : 0,
    'c' : 2
}

# Create a Series from the dictionary
pd.Series(d)

b    1
a    0
c    2
dtype: int64
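
The case where an index *is* passed along with a dict can be sketched as follows. Here the labels are looked up in the dict, and a label with no matching key gets a missing value. The name `picked` is chosen for this example only.

```python
import pandas as pd

d = {'b': 1, 'a': 0, 'c': 2}

# Values are pulled out by label; 'd' is not a key, so its entry is NaN
picked = pd.Series(d, index=['b', 'c', 'd', 'a'])
print(picked)
```

Because NaN is a floating point value, the resulting dtype is `float64` rather than `int64`.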

## From `scalar` value

If `data` is a scalar value, an index must be provided. The value will be repeated to match the length of the index.

In [23]:
n = pd.Series(5., index=['a', 'b', 'c', 'd', 'e'])
print(n)

a    5.0
b    5.0
c    5.0
d    5.0
e    5.0
dtype: float64


## Series is `ndarray`-like

`Series` acts very similarly to an `ndarray` from `NumPy` and is a valid argument to most NumPy functions. Operations such as slicing will also slice the index.

In [30]:
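# .iloc selects by position; plain s[0] on a labeled index is deprecated in recent pandas
print(s.iloc[0], '\n')
print(s[:3], '\n')
print(s[s > s.median()], '\n')
print(s.iloc[[4, 3, 1]], '\n')
print(np.exp(s))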

0.3185527493015059 

a    0.318553
b   -1.289665
c    0.907064
dtype: float64 

a    0.318553
c    0.907064
dtype: float64 

e   -0.888388
d   -0.414245
b   -1.289665
dtype: float64 

a    1.375136
b    0.275363
c    2.477040
d    0.660839
e    0.411318
dtype: float64


In [31]:
s.dtype

dtype('float64')

In [32]:
s.to_numpy()

array([ 0.31855275, -1.28966505,  0.90706435, -0.41424527, -0.88838827])

## Series is `dict`-like

A `Series` is like a fixed-size dict in which you can get and set values by index label.

In [33]:
print(s['a'])
print(s['e'])
s['e'] = 12
print(s['e'])
print('e' in s)
print('f' in s)

0.3185527493015059
-0.888388268996491
12.0
True
False
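
Continuing the dict analogy, here is a small sketch of what happens when a label is missing: bracket lookup raises `KeyError`, while `get` returns `None` or a supplied default, as with a Python dict. The values are illustrative and `missing_error` is a name made up for this example.

```python
import pandas as pd

# Illustrative values; any labeled Series behaves the same way
s = pd.Series([1.0, 2.0, 3.0], index=['a', 'b', 'c'])

try:
    s['f']  # bracket lookup of a label that is not in the index
except KeyError as err:
    missing_error = err
    print('KeyError:', err)

print(s.get('f'))       # no default given, so None is returned
print(s.get('f', 0.0))  # explicit default
```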


## Vectorized operations

When working with raw `NumPy` arrays, looping through value-by-value is usually not necessary. The same is true when working with `Series` in Pandas. `Series` can also be passed into most NumPy methods expecting an `ndarray`.

In [36]:
s + s

a     0.637105
b    -2.579330
c     1.814129
d    -0.828491
e    24.000000
dtype: float64

In [37]:
s * 2

a     0.637105
b    -2.579330
c     1.814129
d    -0.828491
e    24.000000
dtype: float64

In [38]:
np.exp(s)

a         1.375136
b         0.275363
c         2.477040
d         0.660839
e    162754.791419
dtype: float64

> The result of an operation between unaligned `Series` will involve the union of the indexes. If a label is not found in one `Series` or the other, the result will be marked as missing (`NaN`).
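
The alignment behavior described in the note can be sketched with two overlapping slices of the same `Series`. The values are chosen for illustration, and `total` is a name introduced here.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0], index=['a', 'b', 'c', 'd', 'e'])

# s.iloc[1:] lacks 'a' and s.iloc[:-1] lacks 'e';
# the sum is aligned on the union of both sets of labels
total = s.iloc[1:] + s.iloc[:-1]
print(total)
```

Labels `'a'` and `'e'` each appear in only one operand, so their results are `NaN`; the shared labels are added pairwise.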

## Name attribute

`Series` can also have a `name` attribute.

In [39]:
s = pd.Series(np.random.randn(5), name='something')

In [40]:
s

0    0.225447
1   -1.089910
2    0.991729
3    0.411017
4    1.036562
Name: something, dtype: float64

In [41]:
s.name

'something'
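
As a brief sketch of working with the name after creation, `rename` returns a new `Series` carrying a different name and leaves the original untouched. The data and the name `s2` are illustrative.

```python
import pandas as pd

s = pd.Series([1, 2, 3], name='something')

# rename with a scalar sets the name on a new Series object
s2 = s.rename('different')

print(s.name)
print(s2.name)
```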