## PANDAS SERIES

A Series is very similar to a NumPy array (in fact, it is built on top of the NumPy array object). What differentiates a Series from a NumPy array is that a Series can have axis labels, meaning it can be indexed by a label instead of just a numeric position. It also doesn't need to hold numeric data; it can hold any arbitrary Python object.
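To make the difference concrete, here is a minimal sketch comparing positional access on a NumPy array with label-based access on a Series. The variable names (`arr`, `ser`, and so on) are illustrative only and are not used elsewhere in this notebook.

```python
import numpy as np
import pandas as pd

arr = np.array([5, 10, 43])
ser = pd.Series([5, 10, 43], index=["a", "b", "c"])

by_position = arr[1]         # NumPy array: access by integer position only
by_label = ser["b"]          # Series: access by label
by_iloc = ser.iloc[1]        # Series: explicit positional access is still available
underlying = ser.to_numpy()  # the values as a plain NumPy array
```

All three lookups return the same element, and `to_numpy()` hands back the values without the labels.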

In [1]:
import numpy as np
import pandas as pd

### Creating a Series

In [5]:
labels = ['a', 'b', 'c', 'd'] # list of strings
data = [5, 10, 43, 22] # list of numbers 
np_data = np.arange(1, 5) # numpy array
d = {"body-count": 10, "birth-month": 12, "age": 18, "score": 3} # dictionary

In [4]:
pd.Series(data=data, index=labels) # takes the data and an index

a     5
b    10
c    43
d    22
dtype: int64

In [6]:
pd.Series(data) # when no index is specified, a default integer index starting at 0 is used

0     5
1    10
2    43
3    22
dtype: int64

In [7]:
pd.Series(np_data, labels) # creating a series from a numpy array

a    1
b    2
c    3
d    4
dtype: int32

In [6]:
"""
* You can also create a series from a dictionary
* Here, the key becomes the index and the value becomes the data
* You don't have to pass an index
"""
pd.Series(d)

body-count     10
birth-month    12
age            18
score           3
dtype: int64

In [10]:
# You can, however, pass labels, as long as they match the keys
labels = ["body-count", "birth-month", "age", "score"]
pd.Series(d, labels)

body-count     10
birth-month    12
age            18
score           3
dtype: int64

In [11]:
# If a label does not match any key, NaN is used as its value
no_match_labels = ["x", "y", "z", "a"] # none of these are keys of d
pd.Series(d, no_match_labels)

x   NaN
y   NaN
z   NaN
a   NaN
dtype: float64
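The case above has no matching labels at all. As a small sketch of the mixed case, the example below passes two labels that are keys of the dictionary and one that is not. The label `"height"` is a hypothetical one chosen for illustration.

```python
import pandas as pd

d = {"body-count": 10, "birth-month": 12, "age": 18, "score": 3}

# "age" and "score" are keys of d; "height" is a hypothetical label that is not
partial = pd.Series(d, index=["age", "score", "height"])

missing = partial.isna()  # True only where no key matched
```

Only the requested labels appear in the result, so keys of `d` that were not requested are dropped. Because NaN is a floating-point value, the matched integers are stored as floats and the dtype is `float64`.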

### Data in a Series

A pandas Series can hold a variety of object types:

In [13]:
pd.Series(data=labels) # list

0     body-count
1    birth-month
2            age
3          score
dtype: object

In [15]:
def count(b):
    return len(b) # Python objects have no .length attribute; use len()

pd.Series(data=[sum, print, len, count]) # even functions

0                   <built-in function sum>
1                 <built-in function print>
2                   <built-in function len>
3    <function count at 0x0000017BA15ADDA0>
dtype: object
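Values stored with `dtype: object` are ordinary Python objects, so functions kept in a Series can be retrieved and called like any other. The sketch below assumes a small Series of built-ins labelled by name. The names `funcs` and `values` are illustrative.

```python
import pandas as pd

# Label each built-in function with its own name
funcs = pd.Series(data=[sum, len, max], index=["sum", "len", "max"])
values = [3, 1, 2]

total = funcs["sum"](values)            # look up by label, then call
results = [f(values) for f in funcs]    # iterating a Series yields its values
```

This works, but the object dtype gives up the speed of NumPy's numeric types, so it is better suited to small lookup tables than to numeric work.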

### Using an Index

The key to using a Series is understanding its index. pandas uses these index labels (names or numbers) for fast lookups, much like a hash table or dictionary.

In [21]:
serieA = pd.Series(data=[1, 2, 3, 4, 5, 6], index=["Banku", "Ampesi", "Waakye", "Akwele", "Fufu", "Banku"])

In [22]:
serieA

Banku     1
Ampesi    2
Waakye    3
Akwele    4
Fufu      5
Banku     6
dtype: int64

In [23]:
serieB = pd.Series(data=[1, 2, 3, 8, 5, 6], index=["Banku", "Ampesi", "Waakye", "Tuozaafi", "Fufu", "Jollof"])

In [24]:
serieB

Banku       1
Ampesi      2
Waakye      3
Tuozaafi    8
Fufu        5
Jollof      6
dtype: int64

In [25]:
serieA["Ampesi"]

2

In [26]:
serieB["Tuozaafi"]

8
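Unlike dictionary keys, index labels do not have to be unique: `serieA` contains "Banku" twice. The sketch below rebuilds `serieA` so it runs on its own and shows how lookups behave for a unique label, a duplicated label, and a missing label. The result variable names are illustrative.

```python
import pandas as pd

serieA = pd.Series(data=[1, 2, 3, 4, 5, 6],
                   index=["Banku", "Ampesi", "Waakye", "Akwele", "Fufu", "Banku"])

unique_hit = serieA["Ampesi"]               # unique label: a single value
dup_hit = serieA["Banku"]                   # duplicated label: a Series with every match
has_jollof = "Jollof" in serieA             # membership is tested against the index
fallback = serieA.get("Jollof", default=0)  # .get avoids a KeyError for a missing label
```

Checking `serieA.index.is_unique` beforehand is one way to find out whether a lookup may return more than one value.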

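Operations between Series are also aligned on the index. If you add two Series whose indexes do not fully match, any label that appears in only one of them gets NaN as its value, and the result is converted to float64. A label that is duplicated in one Series (here, "Banku" in serieA) is matched against the other Series once per occurrence.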

In [27]:
serieA + serieB

Akwele       NaN
Ampesi       4.0
Banku        2.0
Banku        7.0
Fufu        10.0
Jollof       NaN
Tuozaafi     NaN
Waakye       6.0
dtype: float64
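If NaN is not what you want for labels that appear on only one side, `Series.add` accepts a `fill_value` that stands in for the missing operand. This is a minimal sketch using two small Series with hypothetical labels, kept unique for simplicity.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=["a", "b", "c"])
s2 = pd.Series([10, 20, 30], index=["b", "c", "d"])

plain = s1 + s2                    # "a" and "d" exist on one side only, so they become NaN
filled = s1.add(s2, fill_value=0)  # the missing side is treated as 0 instead
```

With `fill_value=0`, "a" keeps its value from `s1` and "d" keeps its value from `s2`, while the shared labels are summed as before.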

# END