# Series Object

* A `Series` is a one-dimensional labeled array for homogeneous data
* Pandas assigns each `Series` value a `label`, an identifier we can use to locate the value
* A `Series` combines and extends features of Python's native data structures
* The closest native Python analogue to a `Series` is the **dictionary**


In [1]:
import pandas as pd
import numpy as np


A Series object with no data

In [7]:
pd.Series()

Series([], dtype: object)

Creating a Series object with data from a Python list

In [14]:
ice_cream_flavors = ['Chocolate', 'Vanilla', 'Strawberry', 'Rum Raisin',]

When we instantiate a Series object, we see that its output has two columns: the `index` and the `values`

The term index describes both the whole collection of identifiers and an individual identifier. By default, the index holds the integers from 0 to n-1, where n is the number of values

We distinguish between `index position` and `index label`

We can reference a value by its position or by its key/label. In a sense, each value has two identifiers

In [16]:
# Series with index position (number) by default
pd.Series(data=ice_cream_flavors)

0     Chocolate
1       Vanilla
2    Strawberry
3    Rum Raisin
dtype: object

In [18]:
# Series with both index position (inherent) and with index label (we put it)
days_of_the_week = ('Monday', 'Wednesday', 'Friday', 'Saturday')

pd.Series(data=ice_cream_flavors, index=days_of_the_week)

Monday        Chocolate
Wednesday       Vanilla
Friday       Strawberry
Saturday     Rum Raisin
dtype: object

**So we can always access values by index position, and also by index label (if one is provided)**
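As a minimal sketch of the two kinds of access, the example below rebuilds the labeled Series from above and reads the same value once by position and once by label. It assumes the standard `.iloc` (position-based) and `.loc` (label-based) accessors; the variable name `flavors` is chosen here only for illustration.

```python
import pandas as pd

flavors = pd.Series(
    data=['Chocolate', 'Vanilla', 'Strawberry', 'Rum Raisin'],
    index=['Monday', 'Wednesday', 'Friday', 'Saturday'],
)

# .iloc selects by integer position, regardless of the labels
first_by_position = flavors.iloc[0]

# .loc selects by index label
first_by_label = flavors.loc['Monday']

print(first_by_position, first_by_label)
```

Both lookups return the same value, because 'Monday' is the label at position 0.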

Duplicate index labels are allowed, but we should avoid them because a label then no longer identifies a single value

In [19]:
# Series with a duplicated index label ('Wednesday' appears twice)
days_of_the_week = ('Monday', 'Wednesday', 'Friday', 'Wednesday')

pd.Series(data=ice_cream_flavors, index=days_of_the_week)

Monday        Chocolate
Wednesday       Vanilla
Friday       Strawberry
Wednesday    Rum Raisin
dtype: object
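A short sketch of why duplicate labels are awkward: with the same data as above, a lookup on a unique label returns a single value, while a lookup on the duplicated label returns a Series. The variable names are illustrative only.

```python
import pandas as pd

flavors = pd.Series(
    data=['Chocolate', 'Vanilla', 'Strawberry', 'Rum Raisin'],
    index=['Monday', 'Wednesday', 'Friday', 'Wednesday'],
)

unique_lookup = flavors.loc['Monday']        # a single string
duplicate_lookup = flavors.loc['Wednesday']  # a Series with two rows

# The index itself reports whether its labels are unique
print(flavors.index.is_unique)
print(type(unique_lookup).__name__, type(duplicate_lookup).__name__)
```

Code that expects a scalar from a label lookup can break when the label is duplicated, which is one reason to keep labels unique.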

## missing values

Pandas represents a missing value with a `nan` object.

nan -> not a number

Notice that pandas converts the numeric values from integers to floating-point numbers when it spots a `nan` value, because `nan` is itself a floating-point value

In [21]:
temperatures = [94, 88, np.nan, 91]
pd.Series(data=temperatures)

0    94.0
1    88.0
2     NaN
3    91.0
dtype: float64
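The conversion can be checked directly by comparing dtypes. The sketch below also shows one possible way to keep integers alongside a missing value, using the nullable `"Int64"` extension dtype; that dtype is not used elsewhere in these notes and appears here only as an illustration.

```python
import numpy as np
import pandas as pd

ints_only = pd.Series([94, 88, 91])          # no missing value
with_nan = pd.Series([94, 88, np.nan, 91])   # nan forces a float dtype

# Nullable integer dtype: missing values are stored as pd.NA instead of nan
nullable = pd.Series([94, 88, None, 91], dtype="Int64")

print(ints_only.dtype, with_nan.dtype, nullable.dtype)
```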

## constructing Series with other Python structures

In [23]:
# With dictionaries
calorie_info = {
    'Cereal': 125,
    'Chocolate Bar': 406,
    'Ice Cream Sundae': 342,
}

diet = pd.Series(data=calorie_info)
diet

Cereal              125
Chocolate Bar       406
Ice Cream Sundae    342
dtype: int64
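Because the dictionary keys become index labels, a Series built this way supports several dictionary-like operations. The following is a small sketch of that similarity; the label 'Pizza' is a made-up key used only to show the default value of `get`.

```python
import pandas as pd

calorie_info = {'Cereal': 125, 'Chocolate Bar': 406, 'Ice Cream Sundae': 342}
diet = pd.Series(data=calorie_info)

has_cereal = 'Cereal' in diet   # membership checks the index labels, like dict keys
cereal = diet['Cereal']         # lookup by label, like dict[key]
pizza = diet.get('Pizza', 0)    # default when the label is absent, like dict.get
round_trip = diet.to_dict()     # back to a plain dictionary

print(has_cereal, cereal, pizza, round_trip)
```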

In [24]:
# With tuples

pd.Series(data=('Red', 'Green', 'Blue'))

0      Red
1    Green
2     Blue
dtype: object

In [25]:
# We can store any data, for example, tuples

rgb_colors = [(120, 141, 26), (196, 165, 45)]
pd.Series(data=rgb_colors)

0    (120, 141, 26)
1    (196, 165, 45)
dtype: object

**We can't create a Series object with sets**

In [26]:
pd.Series(data={1,2})

TypeError: 'set' type is unordered
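One possible workaround, sketched below, is to convert the set to an ordered structure first. Sorting is an assumption about the order we want; any explicit ordering (for example `list(...)`) would also satisfy the constructor.

```python
import pandas as pd

unique_ids = {3, 1, 2}

# sorted() returns a list, which has a well-defined order
ordered = pd.Series(data=sorted(unique_ids))
print(ordered.tolist())
```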

The Series constructor also accepts a NumPy ndarray object

In [32]:
random_data = np.random.randint(1, 101, 10)
random_data

array([91, 34, 87, 53, 70, 79, 11, 84,  2, 86])

In [35]:
pd.Series(data=random_data)

0    91
1    34
2    87
3    53
4    70
5    79
6    11
7    84
8     2
9    86
dtype: int64

# Series attributes

In [50]:
diet

Cereal              125
Chocolate Bar       406
Ice Cream Sundae    342
dtype: int64

In [39]:
# from diet series

# the values attribute returns a NumPy ndarray object
diet.values

array([125, 406, 342])

In [42]:
# the index attribute returns a pandas Index object
diet.index

Index(['Cereal', 'Chocolate Bar', 'Ice Cream Sundae'], dtype='object')

In [48]:
diet.dtype

dtype('int64')

In [49]:
diet.size

3

In [52]:
diet.shape

(3,)

In [53]:
diet.is_unique

True

In [58]:
diet.is_monotonic_increasing

False

# Series methods

In [4]:
# Create 100 values between 0 and 500 in increments of 5
values = range(0, 500, 5)
nums = pd.Series(data=values)


In [6]:
nums.head(3)

0     0
1     5
2    10
dtype: int64

In [9]:
nums.tail()

95    475
96    480
97    485
98    490
99    495
dtype: int64

## mathematical operations

In [10]:
numbers = pd.Series(data=[1, 2, 3, np.nan, 4, 5])
numbers

0    1.0
1    2.0
2    3.0
3    NaN
4    4.0
5    5.0
dtype: float64

In [11]:
# Count method counts the number of non-null values
numbers.count()

5

### sum

In [12]:
# Sum adds the values together
numbers.sum()

15.0

By default, `sum` skips nan values. Adding a number to a nan directly yields nan

In [13]:
3 + np.nan

nan

In [14]:
# see what happens when the skipna parameter is set to False
numbers.sum(skipna=False)

nan

In [18]:
# min_count sets the minimum number of valid values a Series must hold for the sum to be calculated

# when set to 0, no valid values are required to compute the sum

numbers.sum(min_count=0)  # at least 0 valid values required

15.0

By default, `min_count` is 0

Valid values are all values except nan

In [19]:
numbers.sum(min_count=6)  # requires 6 valid values; the Series holds only 5, so the result is nan

nan
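The effect of `min_count` is easiest to see on a Series with no valid values at all. In the sketch below, `all_missing` is a hypothetical Series created only for this illustration.

```python
import numpy as np
import pandas as pd

all_missing = pd.Series([np.nan, np.nan])

# With the default min_count=0, the sum of zero valid values is 0.0
default_sum = all_missing.sum()

# Requiring at least one valid value turns the result into nan
strict_sum = all_missing.sum(min_count=1)

print(default_sum, strict_sum)
```

Setting `min_count=1` is a way to distinguish "the values add up to zero" from "there was nothing to add".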

### product

Multiplies all Series values together

Also accepts `skipna` and `min_count` parameters

In [28]:
numbers.prod()

120.0

In [29]:
numbers.prod(skipna=False)

nan

In [27]:
numbers.prod(min_count=7)  # our series object has 5 valid values

nan

### cumsum

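The cumulative sum returns a new Series holding the running total of the values. By default, nan values are skipped: their position stays nan and the total continues after them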

In [32]:
numbers

0    1.0
1    2.0
2    3.0
3    NaN
4    4.0
5    5.0
dtype: float64

In [31]:
numbers.cumsum()

0     1.0
1     3.0
2     6.0
3     NaN
4    10.0
5    15.0
dtype: float64

In [35]:
# With skipna=False, every result from the first nan onward is nan
numbers.cumsum(skipna=False)

0    1.0
1    3.0
2    6.0
3    NaN
4    NaN
5    NaN
dtype: float64

### pct_change

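Returns the fractional change from one Series value to the next (multiply by 100 to get a percentage)

In the pandas version used here, it applies a *forward fill* to nan values, replacing each nan with the previous valid value before performing the operation. Newer pandas releases deprecate this implicit fill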

In [36]:
numbers

0    1.0
1    2.0
2    3.0
3    NaN
4    4.0
5    5.0
dtype: float64

In [37]:
numbers.pct_change()


0         NaN
1    1.000000
2    0.500000
3    0.000000
4    0.333333
5    0.250000
dtype: float64
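The output above can be reproduced by hand, which makes the forward fill explicit. This is a sketch of the arithmetic under that assumption, not a description of the pandas internals; the names `filled` and `manual` are illustrative.

```python
import numpy as np
import pandas as pd

numbers = pd.Series([1, 2, 3, np.nan, 4, 5])

# Step 1: replace each nan with the previous valid value
filled = numbers.ffill()

# Step 2: divide each value by its predecessor and subtract 1
manual = filled / filled.shift(1) - 1

print(manual)
```

The first result is nan because the first value has no predecessor, and the result at position 3 is 0.0 because the filled value equals the value before it.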

### mean

In [40]:
numbers.mean()

3.0

### median

In [41]:
numbers.median()

3.0

### standard deviation

ddof: delta degrees of freedom. The divisor is N - ddof; by default ddof=1, so the divisor is N-1 (sample standard deviation)

In [42]:
numbers.std()

1.5811388300841898
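To see what `ddof` changes, the sketch below compares the default sample standard deviation with the population version (`ddof=0`). The comparison with NumPy's `np.nanstd`, whose default is `ddof=0`, is added here only as a cross-check.

```python
import numpy as np
import pandas as pd

numbers = pd.Series([1, 2, 3, np.nan, 4, 5])

sample_std = numbers.std()            # ddof=1: divides by N-1 = 4
population_std = numbers.std(ddof=0)  # ddof=0: divides by N = 5

# NumPy defaults to ddof=0; nanstd ignores the nan like pandas does
numpy_std = np.nanstd(numbers.to_numpy())

print(sample_std, population_std, numpy_std)
```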

### max and min

In [43]:
numbers.max()

5.0

In [44]:
numbers.min()

1.0

For a Series of strings, `min` and `max` return the first and last values in alphabetical order

In [46]:
animals = pd.Series(data=['koala', 'aardvark', 'zebra'])
animals

0       koala
1    aardvark
2       zebra
dtype: object

In [47]:
animals.min()

'aardvark'

In [48]:
animals.max()

'zebra'
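One caveat worth sketching: the comparison is case-sensitive, and uppercase letters sort before lowercase ones. The Series below is a variation of `animals` with one capitalized entry, made up for this illustration.

```python
import pandas as pd

mixed_case = pd.Series(data=['koala', 'Zebra', 'aardvark'])

# 'Z' sorts before every lowercase letter, so 'Zebra' is the minimum
smallest = mixed_case.min()

# Lowercasing first gives a case-insensitive comparison
smallest_ignoring_case = mixed_case.str.lower().min()

print(smallest, smallest_ignoring_case)
```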

### describe()

Returns the previous summary statistics (count, mean, std, min, max) together with the quartiles in one step


In [50]:
numbers

0    1.0
1    2.0
2    3.0
3    NaN
4    4.0
5    5.0
dtype: float64

In [49]:
numbers.describe()

count    5.000000
mean     3.000000
std      1.581139
min      1.000000
25%      2.000000
50%      3.000000
75%      4.000000
max      5.000000
dtype: float64

### sample

Selects a random assortment of values from the Series

By default, it returns just one element

If `replace` is True, values can be drawn more than once, so the sample can be larger than the Series. If it is False (the default), the sample can be at most as large as the Series

In [62]:
numbers.sample(7)  # larger than total

ValueError: Cannot take a larger sample than population when 'replace=False'

In [63]:
numbers.sample(7, replace=True)  # With replace we can repeat the data

0    1.0
4    4.0
3    NaN
3    NaN
1    2.0
2    3.0
4    4.0
dtype: float64
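Since `sample` is random, repeated runs return different values. A minimal sketch of making the draw repeatable with the `random_state` parameter follows; the seed 42 is an arbitrary choice.

```python
import numpy as np
import pandas as pd

numbers = pd.Series(data=[1, 2, 3, np.nan, 4, 5])

# The same seed produces the same sample on every call
first_draw = numbers.sample(3, random_state=42)
second_draw = numbers.sample(3, random_state=42)

same = first_draw.equals(second_draw)
print(same)
```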

### unique and nunique

unique returns a NumPy ndarray of the unique values from the Series

That is, if the Series holds repeated data, the returned array contains each value only once

---

nunique returns a scalar, the number of unique elements

In [64]:
authors = pd.Series(data=['Hemingway', 'Orwell', 'Dostoevsky', 'Fitzgerald', 'Orwell'])

In [67]:
authors.unique()

array(['Hemingway', 'Orwell', 'Dostoevsky', 'Fitzgerald'], dtype=object)

In [68]:
authors.nunique()  # here, the length of the array returned by unique()

4
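The two results only match when the Series has no missing values: `unique` keeps a missing value as one of the entries, while `nunique` excludes it unless `dropna=False` is passed. The Series below is a hypothetical variation with one missing author, used only to illustrate this.

```python
import pandas as pd

authors_with_gap = pd.Series(data=['Hemingway', 'Orwell', None, 'Orwell'])

distinct = authors_with_gap.unique()                      # includes the missing value
n_default = authors_with_gap.nunique()                    # missing value excluded
n_with_missing = authors_with_gap.nunique(dropna=False)   # missing value counted

print(len(distinct), n_default, n_with_missing)
```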