# Pandas Series

**How is pandas.Series = pandas.Index + *sometimes* numpy.ndarray? And why "sometimes"?**

- A pandas Series basically maps two things: pandas.Index <-> *sometimes* numpy.ndarray.
- Why **sometimes**? Because the data we put in a Series can be of any type.
- What can this 'any' type be?
    - Simple cases: int, float, str, bool, None. Here the values are stored in a plain `numpy.ndarray` (`s.array` just wraps it in a `NumpyExtensionArray`). Newer pandas versions may store `str` data differently, since pandas 3.0 infers a dedicated string dtype.
    - Complex cases: datetime, categorical, period, nullable int. Here the values are stored in a pandas extension array (e.g. `DatetimeArray`, `Categorical`), not a `numpy.ndarray`. Hence "sometimes".

In [1]:
# Simple Case
# s.array is the public accessor; the private s._values shows the raw internal storage

import pandas as pd

s = pd.Series([10, 20, 30])
print(type(s.array))
print(type(s._values))

s = pd.Series([1.1, 2.2, 3.3])
print(type(s.array))
print(type(s._values))

s = pd.Series(["a", "b", "c"])
print(type(s.array))
print(type(s._values))

s = pd.Series([True, False, True])
print(type(s.array))
print(type(s._values))

s = pd.Series([None, None, None])
print(type(s.array))
print(type(s._values))

<class 'pandas.core.arrays.numpy_.NumpyExtensionArray'>
<class 'numpy.ndarray'>
<class 'pandas.core.arrays.numpy_.NumpyExtensionArray'>
<class 'numpy.ndarray'>
<class 'pandas.core.arrays.numpy_.NumpyExtensionArray'>
<class 'numpy.ndarray'>
<class 'pandas.core.arrays.numpy_.NumpyExtensionArray'>
<class 'numpy.ndarray'>
<class 'pandas.core.arrays.numpy_.NumpyExtensionArray'>
<class 'numpy.ndarray'>


In [2]:
# Complex Case
# s.array is the public accessor; the private s._values shows the raw internal storage

import pandas as pd

s = pd.Series(pd.date_range("2024-01-01", periods=3))
print(type(s.array))
print(type(s._values))

s = pd.Series(pd.Categorical(["a", "b", "a"]))
print(type(s.array))
print(type(s._values))

s = pd.Series(pd.period_range("2020-01", periods=3, freq="M"))
print(type(s.array))
print(type(s._values))

s = pd.Series([1, 2, None], dtype="Int64")
print(type(s.array))
print(type(s._values))

<class 'pandas.core.arrays.datetimes.DatetimeArray'>
<class 'pandas.core.arrays.datetimes.DatetimeArray'>
<class 'pandas.core.arrays.categorical.Categorical'>
<class 'pandas.core.arrays.categorical.Categorical'>
<class 'pandas.core.arrays.period.PeriodArray'>
<class 'pandas.core.arrays.period.PeriodArray'>
<class 'pandas.core.arrays.integer.IntegerArray'>
<class 'pandas.core.arrays.integer.IntegerArray'>
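
To check the simple/complex split programmatically, here is a minimal sketch. The helper `is_numpy_backed` is hypothetical (my own name), and it relies on the private `_values` attribute, so treat it as an exploration aid that may break in other pandas versions. It also shows that `to_numpy()` hands back a `numpy.ndarray` whichever storage is used.

```python
import numpy as np
import pandas as pd

def is_numpy_backed(s: pd.Series) -> bool:
    """Return True if the Series stores its data in a plain numpy.ndarray.

    Uses the private attribute `_values`, which may change between pandas
    versions; this is an exploration helper, not a stable API.
    """
    return isinstance(s._values, np.ndarray)

examples = {
    "int": pd.Series([10, 20, 30]),
    "float": pd.Series([1.1, 2.2, 3.3]),
    "bool": pd.Series([True, False, True]),
    "datetime": pd.Series(pd.date_range("2024-01-01", periods=3)),
    "category": pd.Series(pd.Categorical(["a", "b", "a"])),
    "Int64": pd.Series([1, 2, None], dtype="Int64"),
}

for name, s in examples.items():
    print(f"{name:<10} numpy-backed={is_numpy_backed(s)}  dtype={s.dtype}")

# Whatever the backing store, to_numpy() returns a numpy.ndarray
print(type(examples["category"].to_numpy()))
```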


**What is my guess about the parameters of the pandas.Series constructor?**

- One thing should be clear: the main thing the pandas.Series constructor expects is **data** (strictly speaking even `data` is optional, since `pd.Series()` creates an empty Series).
- If you don't provide an index (directly or indirectly), pandas creates a RangeIndex. Why RangeIndex? Because it's memory-efficient and optimized for fast iteration and slicing, as I noted in the last point of the pandas.Index notes.
- What do I mean by **directly or indirectly**? Directly: passing an `index=` argument. Indirectly: passing a dict, whose keys become the index labels.

In [3]:
# directly
import pandas as pd

series = pd.Series(data=[100,200,300,400,500], index=pd.Index([1,2,3,4,5]))
series

1    100
2    200
3    300
4    400
5    500
dtype: int64

In [4]:
# Indirectly
import pandas as pd

series = pd.Series(data = {
    1: 100,
    2: 200,
    3: 300,
    4: 400,
    5: 500
})
series

1    100
2    200
3    300
4    400
5    500
dtype: int64
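
A minimal sketch of the RangeIndex default and its memory advantage. The exact byte counts depend on the pandas version, so only the comparison matters; `n = 1_000_000` is an arbitrary size picked for illustration.

```python
import numpy as np
import pandas as pd

# No index given: pandas falls back to a RangeIndex 0..n-1
s = pd.Series([100, 200, 300, 400, 500])
print(type(s.index).__name__, s.index)

# A RangeIndex only stores start/stop/step, while an equivalent
# materialised integer Index stores every label in an int64 array
n = 1_000_000
range_idx = pd.RangeIndex(n)
array_idx = pd.Index(np.arange(n))
print(range_idx.memory_usage(), array_idx.memory_usage())
```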

**Now that we have a data structure, how do we perform the rest of *CRUD* (Read, Update, Delete)? *Create* we just saw, so that's done.**

**How do you read/access a desired cell in pandas.Series?**

- To read an element from a Series, I technically need just one value, because it's 1-D, and that value is an index. The question is: should it be the explicit index (label), the implicit index (position), or can we use both?
- This caused confusion early on with the notation `series[index]` when both the explicit and the implicit index are integers. Which one should pandas use? There is no single justified answer.
- The pandas team's solution was two indexer attributes: `loc` and `iloc`.
- As the name suggests, `iloc` ("integer location") uses the implicit, positional index, and `loc` uses the explicit labels.

**Problem**

In [5]:
import pandas as pd

expense_categories = pd.Series({
    1: 'food',
    4: 'clothing',
    2: 'household',
    3: 'travel',
    5: 'entertainment'
})

print(expense_categories[:3]) # pandas is considering implicit index
print(expense_categories[3])  # pandas is considering explicit index

# This is confusing

1         food
4     clothing
2    household
dtype: object
travel


**loc and iloc to the rescue**

In [6]:
import pandas as pd

expense_categories = pd.Series({
    1: 'food',
    4: 'clothing',
    2: 'household',
    3: 'travel',
    5: 'entertainment'
})

# Now pandas is considering implicit index strictly
print( expense_categories.iloc[:3] )
print( expense_categories.iloc[3] )

# Now pandas is considering explicit index strictly - BUT...
print( expense_categories.loc[:3] )
print( expense_categories.loc[3] )

1         food
4     clothing
2    household
dtype: object
travel
1         food
4     clothing
2    household
3       travel
dtype: object
travel
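
Before unpacking that BUT, here is the same label-vs-position contrast with string labels, where integers can't cause any ambiguity. The Series `spend` and its labels `a` to `d` are made up for illustration.

```python
import pandas as pd

spend = pd.Series([120, 80, 45, 60], index=["a", "b", "c", "d"])

# Label slice: the end label "c" is INCLUDED
by_label = spend.loc["a":"c"]

# Positional slice: position 2 (label "c") is EXCLUDED, like Python lists
by_position = spend.iloc[0:2]

print(by_label)
print(by_position)
```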


- What about that BUT? Just an FYI: *when slicing with the explicit index (`loc`), pandas includes the end label.* Why? I think it's for convenience. If you know you want everything up to category `c`, it would be unnatural to have to name the label that comes *after* `c`. Also note that a label slice follows the order of labels in the index, not their numeric value: `.loc[:3]` returns everything up to and including the label `3`, which is why `4` and `2` appear in the output above.

- Also, `pandas.Index` can have duplicate labels, so a single label passed to `.loc` may match more than one element.

**I know pandas.Index can have duplicate labels, so what happens when I ask for an element using a label that I know is duplicated?**

The answer is simple: if the label is duplicated, you get back all matching elements, as a Series rather than a single value.

In [7]:
import pandas as pd

nums = pd.Series(data=[2, 3, 4, 6], index=['even', 'odd', 'even', 'odd'])
nums.loc['even']

even    2
even    4
dtype: int64
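
The return *type* changes too: a duplicated label gives back a Series, while a unique label gives back a single scalar. A small sketch (the data is illustrative) that also checks `Index.is_unique`:

```python
import pandas as pd

nums = pd.Series(data=[2, 3, 4], index=['even', 'odd', 'even'])

print(nums.index.is_unique)   # False, because 'even' appears twice

dup = nums.loc['even']        # duplicated label -> a Series
single = nums.loc['odd']      # unique label -> a scalar

print(type(dup).__name__, list(dup))
print(type(single).__name__, single)
```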

In [8]:
# This is a tricky one
import pandas as pd

nums = pd.Series({
    'even': 2,
    'odd': 3,
    'even': 4,
    'odd': 5
})

nums.loc['even'] # Why am I getting only 4? A dict can't hold duplicate keys: the last 'even' wins before pandas ever sees the data

np.int64(4)
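
The tricky part in the previous cell is plain Python, not pandas: a dict literal with a repeated key keeps only the last value. A sketch of that, plus one way (my own choice, not the only one) to keep duplicate labels by passing values and labels separately:

```python
import pandas as pd

# Python builds the dict before pandas sees it; a repeated key keeps only its LAST value
raw = {'even': 2, 'odd': 3, 'even': 4, 'odd': 5}
print(raw)  # {'even': 4, 'odd': 5}

# To keep duplicate labels, pass the values and the labels as two separate lists
pairs = [('even', 2), ('odd', 3), ('even', 4), ('odd', 5)]
nums = pd.Series([value for _, value in pairs], index=[label for label, _ in pairs])
print(nums.loc['even'])
```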

**How do you update pandas.Series?**

- Using the index, of course. `.loc` and `.iloc` are helpful here, because to update something you first need to locate it, and to locate it you need these two.
- And there's no need to explain at length when to use `iloc` and when to use `loc`: with a single integer, `iloc` always targets exactly one element, while `loc` with a single label may target one or more (if the label is duplicated).

In [9]:
import pandas as pd

sales = pd.Series(data=[10, 20, 30, 40, 50], index=['Mon', 'Tue', 'Mon', 'Wed', 'Tue'])

sales.iloc[-1] = 100     # Only targeting 1 cell
sales.loc['Mon'] = 1000  # Targeting 2 cells

sales

Mon    1000
Tue      20
Mon    1000
Wed      40
Tue     100
dtype: int64
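
Updating isn't limited to one position or one label. A minimal sketch, on the same `sales` data as above, of updating several positions at once with `iloc` and every cell matching a condition with a boolean mask in `loc`:

```python
import pandas as pd

sales = pd.Series(data=[10, 20, 30, 40, 50], index=['Mon', 'Tue', 'Mon', 'Wed', 'Tue'])

# iloc with a list of positions: update exactly those cells
sales.iloc[[0, 3]] = [11, 44]

# loc with a boolean mask: update every cell matching a condition
sales.loc[sales > 40] = 0

print(sales)
```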

**How do you delete selected cells in pandas.Series?**

- To delete a cell, we first need to locate it. Locating a cell in pandas means using the index, i.e. `loc` or `iloc`. Now we have located one or more cells.
- We know pandas.Series = a mapping between pandas.Index and *sometimes* numpy.ndarray, and we also know that pandas.Index is immutable. So after we delete an element, what happens to the index?
- That's why deleting produces an entirely new Series object (with a new Index). To delete elements, we use the `drop()` method.
- Which kind of index does `drop()` take? Can we use the implicit one? No: `drop()` works on labels only. I'm not sure of the design reason, but if you want to delete by position, look up the labels through the index object first. Careful with duplicate labels, though: `drop()` then removes every row carrying that label, not just the one at that position.
- There is also an `inplace` parameter. By default it's `False`. If set to `True`, `drop()` returns `None` and modifies the original Series instead.
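
A minimal sketch of the `inplace` difference (the three-day `sales` Series here is made up):

```python
import pandas as pd

sales = pd.Series([10, 20, 30], index=['Mon', 'Tue', 'Wed'])

# Default (inplace=False): a new Series comes back, the original is untouched
trimmed = sales.drop(labels='Mon')
print(len(sales), len(trimmed))

# inplace=True: the original Series is modified and drop() returns None
result = sales.drop(labels='Tue', inplace=True)
print(result, list(sales.index))
```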

In [10]:
import pandas as pd

sales = pd.Series(
    data=[10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120],
    index=['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun', 'Mon', 'Tue', 'Wed', 'Thu', 'Fri']
)

new_sales = sales.drop(labels='Mon')            # both 'Mon' rows are deleted
print(new_sales)

new_sales = sales.drop(labels=['Mon', 'Tue'])   # both 'Mon' and both 'Tue' rows are deleted
print(new_sales)

Of course, there are fancier ways to do this, e.g. picking the labels by position:

In [11]:
import pandas as pd

sales = pd.Series(
    data=[10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120],
    index=['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun', 'Mon', 'Tue', 'Wed', 'Thu', 'Fri']
)

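sales.index                 # This returns an Index object
sales.index[ [0, 2, 4, 6] ] # This gets us the labels; this works because pandas.Index supports fancy indexing

# Fancy way. Caution: the labels are duplicated here, so EVERY row labelled Mon, Wed, Fri or Sun
# is dropped, not only the rows at positions 0, 2, 4, 6
sales.drop(labels=sales.index[ [0, 2, 4, 6] ])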

Tue     20
Thu     40
Sat     60
Tue     90
Thu    110
dtype: int64
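
The fancy way has a catch when labels are duplicated: `drop()` removes every row carrying a label, so the four labels picked from positions 0, 2, 4, 6 wipe out seven rows. If you really want to delete by position, one option (my own choice, not a dedicated pandas method) is a boolean mask passed to `iloc`:

```python
import numpy as np
import pandas as pd

sales = pd.Series(
    data=[10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120],
    index=['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun', 'Mon', 'Tue', 'Wed', 'Thu', 'Fri']
)
positions = [0, 2, 4, 6]

# Label-based: drops EVERY row whose label is Mon, Wed, Fri or Sun (7 rows)
by_label = sales.drop(labels=sales.index[positions])

# Position-based: mark the 4 positions as False and keep everything else
keep = np.ones(len(sales), dtype=bool)
keep[positions] = False
by_position = sales.iloc[keep]

print(len(by_label), len(by_position))
```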

**What properties of pandas.Series are similar to numpy.ndarray?**

- Since pandas.Series = pandas.Index <-> (sometimes) numpy.ndarray, its values behave largely like a numpy.ndarray.
- So it supports many of the attributes and functions that numpy.ndarray supports, for example:
- `.size`, `.shape`, `.dtype`, `.T`, `len()`

In [19]:
import numpy as np

nums = np.array([1,2,3,4,5])

# For 1-D data, .T (transpose) is a no-op, so nums == nums.T is all True
nums.size, nums.shape, nums.dtype, nums==nums.T, type(nums), len(nums)

(5,
 (5,),
 dtype('int64'),
 array([ True,  True,  True,  True,  True]),
 numpy.ndarray,
 5)

In [20]:
import pandas as pd

nums = pd.Series([1,2,3,4,5])

# As with numpy, .T on a 1-D Series is a no-op, so nums == nums.T is all True
nums.size, nums.shape, nums.dtype, nums==nums.T, type(nums), len(nums)

(5,
 (5,),
 dtype('int64'),
 0    True
 1    True
 2    True
 3    True
 4    True
 dtype: bool,
 pandas.core.series.Series,
 5)
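
Beyond attributes, NumPy functions also accept a Series. A sketch with a made-up three-element Series: ufuncs such as `np.abs` return a Series that keeps the index, and `to_numpy()` gives the values as a plain `numpy.ndarray`.

```python
import numpy as np
import pandas as pd

nums = pd.Series([1, -4, 9], index=['a', 'b', 'c'])

# NumPy ufuncs accept a Series and return a Series with the same index
abs_vals = np.abs(nums)
print(type(abs_vals).__name__, list(abs_vals.index))

# Reductions work too, and to_numpy() gives the values as an ndarray
print(np.sum(nums), type(nums.to_numpy()).__name__)
```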

**A pandas Series supports many of Python's built-in functions. Like what?**

In [24]:
import pandas as pd

nums = pd.Series([1, -1, 2, -2, 3, -3, 4, -4, 5, -5])

print(len(nums))

print(list(nums))

print(max(nums), min(nums))

# print(dict(nums)) # this is not useful though

print(sorted(nums))

10
[1, -1, 2, -2, 3, -3, 4, -4, 5, -5]
5 -5
[-5, -4, -3, -2, -1, 1, 2, 3, 4, 5]
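
A few more built-ins work the same way, because iterating a Series yields its values. A sketch on the same `nums` data:

```python
import pandas as pd

nums = pd.Series([1, -1, 2, -2, 3, -3, 4, -4, 5, -5])

print(sum(nums))                  # iterates over the values and adds them
print(list(abs(nums)))            # abs() returns a new Series of absolute values
print(any(nums > 4), all(nums != 0))
print(sorted(nums, key=abs))      # key functions work as with any iterable
```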


**What does pandas.Series provide that its ancestors don't?**

In [28]:
import pandas as pd

nums = pd.Series([1, -1, 2, -2, 3, -3, 4, -4, 5, -5])


# getting index
print( nums.index )

# getting values
print( nums.values )

# first 5 elements (head() shows n=5 by default)
print( nums.head() )

# last 5 elements (tail() shows n=5 by default)
print( nums.tail() )

RangeIndex(start=0, stop=10, step=1)
[ 1 -1  2 -2  3 -3  4 -4  5 -5]
0    1
1   -1
2    2
3   -2
4    3
dtype: int64
5   -3
6    4
7   -4
8    5
9   -5
dtype: int64
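
`head()` and `tail()` take an optional `n` (default 5), and a negative `n` in `head()` returns everything except the last `|n|` rows. The pandas docs also recommend `to_numpy()` (or `.array`) over `.values`. A short sketch:

```python
import pandas as pd

nums = pd.Series([1, -1, 2, -2, 3, -3, 4, -4, 5, -5])

first_three = nums.head(3)          # any n can be passed
last_two = nums.tail(2)
all_but_last_three = nums.head(-3)  # negative n: drop the last 3 rows
values = nums.to_numpy()            # preferred over .values in the pandas docs

print(first_three.index.tolist(), last_two.index.tolist(), len(all_but_last_three))
print(type(values).__name__)
```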
