# Pandas Series

## Pandas Data Structure
1. Series - a one-dimensional, array-based structure
2. DataFrame - a two-dimensional table, similar to an Excel sheet

In [4]:
import pandas as pd
import numpy as np

In [5]:
# Pandas Series
rand_float = pd.Series([123.45 ,55.45 ,334.12 ,332.12 ,45.54 ,87.65, 89.01])

In [6]:
rand_float

0    123.45
1     55.45
2    334.12
3    332.12
4     45.54
5     87.65
6     89.01
dtype: float64

A **Series** is an array-like data structure. It holds the data, the data's index, a name, and a data type (dtype). A Series has only one dtype at a time. pandas uses _NumPy arrays_ to store the data. A Series may look like a list at first, but it behaves more like a dictionary, because each value is reached through its index label.
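
As a minimal sketch of the single-dtype rule and the dictionary-like behaviour described above (the sample values and the names `mixed`, `labels`, and `prices` are made up for illustration):

```python
import pandas as pd

# A Series has one dtype; mixing ints and floats upcasts everything to float64.
mixed = pd.Series([1, 2.5, 3])
print(mixed.dtype)   # float64

# Mixing numbers and strings falls back to the generic object dtype.
labels = pd.Series([1, "two", 3.0])
print(labels.dtype)  # object

# Dictionary-like behaviour: index labels act as keys.
prices = pd.Series({"apple": 1.2, "pear": 0.8})
print("apple" in prices)      # True (membership checks the index, not the values)
print(list(prices.keys()))    # ['apple', 'pear']
print(prices.get("kiwi", 0))  # 0 (default returned for a missing label)
```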

In [7]:
rand_float.name = "Random Numbers"

In [8]:
rand_float

0    123.45
1     55.45
2    334.12
3    332.12
4     45.54
5     87.65
6     89.01
Name: Random Numbers, dtype: float64

In [9]:
# return the values only (as a NumPy array)
rand_float.values

array([123.45,  55.45, 334.12, 332.12,  45.54,  87.65,  89.01])

In [10]:
rand_float.index

RangeIndex(start=0, stop=7, step=1)

Like arrays and lists, a pandas Series comes with a default index: a range from 0 up to (but not including) the length of the Series, in steps of 1. Unlike a list, a Series lets you replace that index with your own labels.

In [11]:
# modifying indices
rand_float.index = [
    'Q',
    'W',
    'E',
    'R',
    'T',
    'Y',
    'U'
]

In [12]:
rand_float

Q    123.45
W     55.45
E    334.12
R    332.12
T     45.54
Y     87.65
U     89.01
Name: Random Numbers, dtype: float64

In [13]:
rand_float['W']

55.45

You can also create a Series from a dictionary; the keys become the index labels.

In [14]:
s = pd.Series({
    'Q': 123.45,
    'W': 55.45,
    'E': 334.12
}, name = "Random Float")

In [15]:
s

Q    123.45
W     55.45
E    334.12
Name: Random Float, dtype: float64

Another way to create a Series is to pass a list of values together with an `index` argument.

In [16]:
certificates_earned = pd.Series(
    [8, 2, 5, 6],
    index=['Tom', 'Kris', 'Ahmad', 'Beau']
)

print(certificates_earned)

Tom      8
Kris     2
Ahmad    5
Beau     6
dtype: int64


<br>

## Indexing

Selecting items works much like in regular Python: pass the index label in square brackets.

In [17]:
rand_float['U']

89.01

In [22]:
rand_float[['Q','U']]

Q    123.45
U     89.01
Name: Random Numbers, dtype: float64

We can still select by numeric position with `iloc`.

In [20]:
rand_float.iloc[-1]

89.01

In [21]:
rand_float.iloc[[0,-1]]

Q    123.45
U     89.01
Name: Random Numbers, dtype: float64

We can also select a range (slice) from a Series, but it behaves a little differently from regular Python.

In [24]:
l = [0,1,2,3,4]
l[:2]

[0, 1]

In [25]:
rand_float[:'E']

Q    123.45
W     55.45
E    334.12
Name: Random Numbers, dtype: float64

In [26]:
rand_float.iloc[:2]

Q    123.45
W     55.45
Name: Random Numbers, dtype: float64

Notice that in regular Python the slice starts at index 0 (value `0`) and stops before index 2 (value `2`), so it returns values only up to `1`. A label slice on a Series, by contrast, stops at the label 'E' and still returns its value. With `iloc`, the slice follows the same end-exclusive logic as ordinary lists.
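
The difference can be checked side by side. This is a small sketch on a shortened copy of the data (the names `s`, `by_label`, and `by_position` are illustrative), using `.loc`, the explicit label-based accessor, for the label slice:

```python
import pandas as pd

s = pd.Series([123.45, 55.45, 334.12], index=["Q", "W", "E"])

by_label = s.loc["Q":"W"]   # label slice: the end label "W" is included
by_position = s.iloc[0:1]   # position slice: the end position 1 is excluded

print(list(by_label.index))     # ['Q', 'W']
print(list(by_position.index))  # ['Q']
```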

We can also use conditional selection (boolean arrays) on a Series.

In [27]:
rand_float > 78

Q     True
W    False
E     True
R     True
T    False
Y     True
U     True
Name: Random Numbers, dtype: bool

In [30]:
rand_float[rand_float > 88]

Q    123.45
E    334.12
R    332.12
U     89.01
Name: Random Numbers, dtype: float64

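The operators 'not' (`~`), 'or' (`|`), and 'and' (`&`) can also be used to combine conditions. Wrap each comparison in parentheses, because these operators bind more tightly than comparisons.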

In [41]:
rand_float[(rand_float > rand_float.mean()) & ~(rand_float < 150)]

E    334.12
R    332.12
Name: Random Numbers, dtype: float64

In [45]:
rand_float[(rand_float > rand_float.mean()) | (rand_float > 100)]

Q    123.45
E    334.12
R    332.12
Name: Random Numbers, dtype: float64

In [49]:
rand_float['Q':'R'].mean()

211.285

You can use NumPy functions on a pandas Series, since pandas is built on NumPy.
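
A short sketch of what this looks like in practice, using a small made-up Series (`s` and `roots` are illustrative names):

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 4.0, 9.0], index=["a", "b", "c"])

# Element-wise NumPy functions return a Series and keep the index labels.
roots = np.sqrt(s)
print(roots["c"])    # 3.0

# Reductions such as np.max return a single value.
print(np.max(s))     # 9.0

# to_numpy() gives the underlying values as a plain NumPy array.
print(s.to_numpy())  # [1. 4. 9.]
```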