
In [3]:
import pandas as pd

# What is a Series?

In [4]:
students = ['Gitanjali','Abhay','Dipti']

In [5]:
pd.Series(students)

0    Gitanjali
1        Abhay
2        Dipti
dtype: object

In [6]:
age = [30,21,28]

In [7]:
pd.Series(age)

0    30
1    21
2    28
dtype: int64

In [8]:
height = [150,162.5,152]

In [9]:
pd.Series(height)

0    150.0
1    162.5
2    152.0
dtype: float64

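Unlike a typical NumPy ndarray, whose elements all share one data type, a pandas Series **supports mixed data types**. A mixed Series is stored with the `object` dtype.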

In [10]:
mixed = [True, 'say', {'my_motivation': 100}]

In [11]:
pd.Series(mixed)

0                      True
1                       say
2    {'my_motivation': 100}
dtype: object
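
As a quick check of the point above, here is a minimal sketch showing that a mixed Series keeps each element as an ordinary Python object. The names `mixed_series` and `element_types` are illustrative and not part of the original notebook.

```python
import pandas as pd

mixed = [True, 'say', {'my_motivation': 100}]
mixed_series = pd.Series(mixed)

# Each element is still the original Python object, so it keeps its own type
element_types = [type(value).__name__ for value in mixed_series]

print(mixed_series.dtype)   # object
print(element_types)        # ['bool', 'str', 'dict']
```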

# Parameters vs. Arguments

In [12]:
pd.Series(students)

0    Gitanjali
1        Abhay
2        Dipti
dtype: object

In [13]:
pd.Series(data=students)

0    Gitanjali
1        Abhay
2        Dipti
dtype: object

# What's in the data?

In [14]:
books_list = ['Thursday Murder Club', 'Almost Single', 'The Fault in our stars']

In [15]:
list_s = pd.Series(books_list)

In [16]:
books_dict = {0: 'Thursday Murder Club', 1: 'Almost Single', 2: 'The Fault in our stars'}
books_dict2 = {1: 'Thursday Murder Club', 2: 'Almost Single', 3: 'The Fault in our stars'}

In [17]:
dict_s = pd.Series(books_dict)

In [18]:
pd.Series(books_dict2)

1      Thursday Murder Club
2             Almost Single
3    The Fault in our stars
dtype: object

In [19]:
list_s.equals(dict_s)

True
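
The comparison above succeeds because the dictionary keys 0, 1, 2 become the same labels that the list-based Series gets by default. The sketch below, using the illustrative names `dict_s2`, `same_values` and `same_series`, suggests what happens when the keys differ: the values match, but `equals()` also compares the index.

```python
import pandas as pd

books_list = ['Thursday Murder Club', 'Almost Single', 'The Fault in our stars']
books_dict2 = {1: 'Thursday Murder Club', 2: 'Almost Single', 3: 'The Fault in our stars'}

list_s = pd.Series(books_list)      # default labels 0, 1, 2
dict_s2 = pd.Series(books_dict2)    # labels 1, 2, 3, taken from the dict keys

same_values = list(list_s) == list(dict_s2)
same_series = list_s.equals(dict_s2)

print(same_values)   # True
print(same_series)   # False: the index labels differ
```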

In [20]:
pd.Series(1992)

0    1992
dtype: int64

In [21]:
pd.Series('Gitanjali')

0    Gitanjali
dtype: object

# The .dtype Attribute

In [22]:
pd.Series(age)

0    30
1    21
2    28
dtype: int64

Pandas automatically infers the dtype of a Series from the data passed to the Series constructor.

However, we can also specify the dtype ourselves.

In [23]:
pd.Series(age, dtype='float')

0    30.0
1    21.0
2    28.0
dtype: float64

In [24]:
name_series = pd.Series(students)

In [25]:
name_series

0    Gitanjali
1        Abhay
2        Dipti
dtype: object

In [26]:
name_series.dtype

dtype('O')

In [27]:
age_series = pd.Series(age)

In [28]:
age_series

0    30
1    21
2    28
dtype: int64

In [29]:
age_series.dtype

dtype('int64')

In [30]:
height_series = pd.Series(height)

In [31]:
height_series.dtype

dtype('float64')

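# What is dtype('O'), Really?
`dtype('O')` is the NumPy `object` dtype: the Series stores references to arbitrary Python objects. It mostly comes down to two facts:
- NumPy arrays are homogeneous: every element has the same type and the same fixed size in memory.
- Python strings vary in length, so they do not fit a fixed-size slot and are stored as Python objects instead.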

In [32]:
heights2 = [120.4, '102.5',167.8]

In [33]:
pd.Series(heights2)

0    120.4
1    102.5
2    167.8
dtype: object
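
A single string is enough to push the whole Series to `object`. One possible way to recover a numeric dtype, sketched here with `pd.to_numeric` (the names `raw` and `cleaned` are illustrative), is to parse the values explicitly:

```python
import pandas as pd

heights2 = [120.4, '102.5', 167.8]

raw = pd.Series(heights2)       # object dtype, because of the string '102.5'
cleaned = pd.to_numeric(raw)    # parses '102.5' into the float 102.5

print(raw.dtype)       # object
print(cleaned.dtype)   # float64
```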

# Index and RangeIndex
- We can create a custom index (our own labels).
- When no index is specified, pandas creates an immutable RangeIndex, which saves memory because it only stores start, stop, and step.

In [34]:
books_list

['Thursday Murder Club', 'Almost Single', 'The Fault in our stars']

In [35]:
list_s

0      Thursday Murder Club
1             Almost Single
2    The Fault in our stars
dtype: object

We can provide our own labels using the `index` parameter of the pd.Series() constructor.

In [36]:
# using keyword/named arguments
pd.Series(data=books_list, index=['Thriller', 'Humour', 'Romantic'])

Thriller      Thursday Murder Club
Humour               Almost Single
Romantic    The Fault in our stars
dtype: object

In [37]:
# Using positional arguments
pd.Series(books_list, ['Thriller', 'Humour', 'Romantic'])

Thriller      Thursday Murder Club
Humour               Almost Single
Romantic    The Fault in our stars
dtype: object

In [38]:
# Mixing positional and keyword arguments
pd.Series(books_list, ['Thriller', 'Humour', 'Romantic'], dtype = 'object')

Thriller      Thursday Murder Club
Humour               Almost Single
Romantic    The Fault in our stars
dtype: object

In [39]:
pd.__version__

'1.3.5'

In [40]:
# the 'string' dtype (pd.StringDtype) is available from pandas 1.0 onward
pd.Series(books_list, ['Thriller', 'Humour', 'Romantic'], dtype = 'string')

Thriller      Thursday Murder Club
Humour               Almost Single
Romantic    The Fault in our stars
dtype: string

RangeIndex (accessed via `x.index`):
- Is used when we don't specify an index explicitly.
- Is a sequence of integers starting from 0.
- Is a built-in pandas object.

In [41]:
list_s.index

RangeIndex(start=0, stop=3, step=1)

In [42]:
type(list_s.index)

pandas.core.indexes.range.RangeIndex
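
To illustrate what "immutable" means for an index, here is a small sketch (the flag `mutated` is an illustrative name). Changing one label in place is rejected, while replacing the whole index is allowed.

```python
import pandas as pd

s = pd.Series(['Thursday Murder Club', 'Almost Single', 'The Fault in our stars'])

try:
    s.index[0] = 10    # in-place edits of an Index raise TypeError
    mutated = True
except TypeError:
    mutated = False

# Assigning a complete new index to the Series is fine
s.index = ['Thriller', 'Humour', 'Romantic']

print(mutated)         # False
print(list(s.index))   # ['Thriller', 'Humour', 'Romantic']
```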

Creating our own RangeIndex

In [43]:
pd.RangeIndex(start=4, stop=7, step=1)

RangeIndex(start=4, stop=7, step=1)

In [44]:
list(pd.RangeIndex(start=4, stop=7, step=1))

[4, 5, 6]

In [45]:
list(pd.RangeIndex(start=10, stop=-11, step=-1))

[10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0, -1, -2, -3, -4, -5, -6, -7, -8, -9, -10]
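
A custom RangeIndex can be passed straight to the `index` parameter. The sketch below assumes the same book titles as earlier; `custom_index` and `numbered` are illustrative names.

```python
import pandas as pd

books_list = ['Thursday Murder Club', 'Almost Single', 'The Fault in our stars']

custom_index = pd.RangeIndex(start=4, stop=7, step=1)
numbered = pd.Series(books_list, index=custom_index)

print(list(numbered.index))   # [4, 5, 6]
print(numbered.loc[4])        # Thursday Murder Club
```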

# Series and Index Names
- A Series can have a name, which later becomes the column name in a DataFrame.
- The index of a Series can also have a name.

In [46]:
list_s

0      Thursday Murder Club
1             Almost Single
2    The Fault in our stars
dtype: object

In [47]:
# intelligible: capable of being understood

In [48]:
books_series = list_s

In [49]:
books_series

0      Thursday Murder Club
1             Almost Single
2    The Fault in our stars
dtype: object

In [50]:
# Attribute
books_series.size

3

In [51]:
# Method
list_s.equals(dict_s)

True

In [52]:
# Attribute:
books_series.dtype

dtype('O')

In [53]:
books_series.name

In [54]:
books_series.name is None

True

In [55]:
books_series.name = 'my books read in 2022'

In [56]:
books_series.name

'my books read in 2022'

In [57]:
books_series
# in addition to dtype, the output now also shows the Name attribute that we have just set

0      Thursday Murder Club
1             Almost Single
2    The Fault in our stars
Name: my books read in 2022, dtype: object

In [58]:
books_series.index.name

In [59]:
books_series.index.name is None

True

In [60]:
books_series.index.name = 'My Books'

In [61]:
books_series
# The index name now appears above the labels

My Books
0      Thursday Murder Club
1             Almost Single
2    The Fault in our stars
Name: my books read in 2022, dtype: object
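
The first bullet of this section says the Series name becomes a column name. A minimal sketch of that, using `to_frame()` and the illustrative variable `books_frame`:

```python
import pandas as pd

books_series = pd.Series(
    ['Thursday Murder Club', 'Almost Single', 'The Fault in our stars'],
    name='my books read in 2022',
)
books_series.index.name = 'My Books'

# Converting to a DataFrame carries both names over
books_frame = books_series.to_frame()

print(list(books_frame.columns))   # ['my books read in 2022']
print(books_frame.index.name)      # My Books
```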

# Skills Challenge

Task 1.
- Create a Python list of length 4 that contains some of your favourite actors.
- So this should be a list of strings.
- Assign this list to a variable: actor_names

In [62]:
actor_names = ['Leonardo', 'Blake', 'Kate', 'Reynolds']

Task 2.
- Create another list of the same length that contains your guesses of how old each actor is. Feel free to use integers or floats.
- Call this list actor_ages

In [63]:
actor_ages = [58, 44, 60, 48]

Task 3.
- Create a Series that contains the actors' ages and labels the ages with the actor names.
- Give this Series the name actors.

In [64]:
actors = pd.Series(actor_ages, index= actor_names, name = 'actors')

In [65]:
actors

Leonardo    58
Blake       44
Kate        60
Reynolds    48
Name: actors, dtype: int64

Task 4.
- Repeat Task 3, but this time create the Series from a Python dictionary.
- As an additional challenge, try not to type the dictionary manually but instead create it dynamically from the two lists defined in Task 1 and Task 2.

In [66]:
actor_dict = dict(zip(actor_names, actor_ages))  # built-in dict() and zip()
actor_dict2 = {name: age for name, age in zip(actor_names, actor_ages)}  # dict comprehension over zip()

In [67]:
pd.Series(actor_dict, name = 'actors')

Leonardo    58
Blake       44
Kate        60
Reynolds    48
Name: actors, dtype: int64

In [68]:
pd.Series(actor_dict2, name = 'actors')

Leonardo    58
Blake       44
Kate        60
Reynolds    48
Name: actors, dtype: int64

# The head() and tail() Methods

In [69]:
int_series = pd.Series(range(10))

In [70]:
int_series

0    0
1    1
2    2
3    3
4    4
5    5
6    6
7    7
8    8
9    9
dtype: int64

In [71]:
int_series.size

10

In [72]:
len(int_series)
# Python's built-in len() function also works just fine

10

In [73]:
# To peek at the first 5 elements of the series:
int_series.head()

0    0
1    1
2    2
3    3
4    4
dtype: int64

In [74]:
# To peek at the last 5 elements of the series:
int_series.tail()

5    5
6    6
7    7
8    8
9    9
dtype: int64

In [75]:
# We can also specify the number of elements we want to see using the n parameter of these methods:
int_series.head(n=3) # or int_series.head(3)

0    0
1    1
2    2
dtype: int64

In [76]:
int_series.tail(n=4) # or int_series.tail(4)

6    6
7    7
8    8
9    9
dtype: int64

In [77]:
# Notebook environments already truncate the output if we have a large series:
pd.Series(range(10000000))

0                0
1                1
2                2
3                3
4                4
            ...   
9999995    9999995
9999996    9999996
9999997    9999997
9999998    9999998
9999999    9999999
Length: 10000000, dtype: int64

In [78]:
# we can change the display options to alter the behaviour above,
# i.e. make pandas show n rows of a truncated series
pd.options.display.min_rows = 40
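
Setting `pd.options.display.min_rows` changes the option for the rest of the session. As an alternative sketch, `pd.option_context` applies display options only inside a `with` block. The values 20 and 60 and the names `original_min_rows`, `truncated_repr` and `restored_min_rows` are illustrative.

```python
import pandas as pd

original_min_rows = pd.get_option('display.min_rows')

# Options set here apply only inside the block and are restored on exit
with pd.option_context('display.min_rows', 20, 'display.max_rows', 60):
    truncated_repr = repr(pd.Series(range(1000)))

restored_min_rows = pd.get_option('display.min_rows')

print(truncated_repr)
print(restored_min_rows == original_min_rows)   # True
```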

# Extracting By Index Position

In [79]:
from string import ascii_lowercase

In [80]:
ascii_lowercase

'abcdefghijklmnopqrstuvwxyz'

In [81]:
letters = list(ascii_lowercase)

In [82]:
letters

['a',
 'b',
 'c',
 'd',
 'e',
 'f',
 'g',
 'h',
 'i',
 'j',
 'k',
 'l',
 'm',
 'n',
 'o',
 'p',
 'q',
 'r',
 's',
 't',
 'u',
 'v',
 'w',
 'x',
 'y',
 'z']

In [83]:
alphabet = pd.Series(letters)

In [84]:
alphabet.head(6)

0    a
1    b
2    c
3    d
4    e
5    f
dtype: object

In [85]:
# What is the 1st letter?
# What is the 11th letter?
# What are the first three letters?
# What are the sixth through tenth letters?
# What are the last six letters?

In [86]:
# 1
alphabet[0]

'a'

In [87]:
# 2
alphabet[10]

'k'

In [88]:
# 3
alphabet[:3]

0    a
1    b
2    c
dtype: object

In [89]:
# 4
alphabet[5:10]

5    f
6    g
7    h
8    i
9    j
dtype: object

In [90]:
# 5
alphabet[-6:]

20    u
21    v
22    w
23    x
24    y
25    z
dtype: object
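
The `[]` lookups above work as positions because the default labels happen to be 0, 1, 2, and so on. The following sketch uses a hypothetical Series `s` with integer labels 10, 20, 30 to suggest why this matters: a single integer in `[]` is matched against the labels, while `.iloc` always means position.

```python
import pandas as pd

s = pd.Series(['a', 'b', 'c'], index=[10, 20, 30])

by_label = s[10]          # 10 is looked up as a label, not a position
by_position = s.iloc[0]   # .iloc always means position

try:
    s[0]                  # there is no label 0 in this index
    found = True
except KeyError:
    found = False

print(by_label, by_position, found)   # a a False
```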

# Accessing Elements by Label

In [91]:
from string import ascii_uppercase

In [92]:
ascii_uppercase

'ABCDEFGHIJKLMNOPQRSTUVWXYZ'

In [93]:
labeled_alphabet = pd.Series(data = list(ascii_lowercase), index = list(map(lambda i: 'label_'+i, list(ascii_uppercase))))

In [94]:
labeled_alphabet.head(3)

label_A    a
label_B    b
label_C    c
dtype: object

In [95]:
# What is the 1st letter?
# What is the 11th letter?
# What are the first three letters?
# What are the sixth through tenth letters?
# What are the last six letters?

In [96]:
# 1
# labeled_alphabet[0] #the first approach by index position
labeled_alphabet['label_A']

'a'

In [97]:
# 2
# labeled_alphabet[10] #the first approach by index position
labeled_alphabet['label_K']

'k'

In [98]:
# 3
# labeled_alphabet[:3] #the first approach by index position
labeled_alphabet[:'label_C']
# Notice that the stop is inclusive when slicing by label but exclusive when slicing by position

label_A    a
label_B    b
label_C    c
dtype: object

In [99]:
# 4
# labeled_alphabet[5:10] #the first approach by index position
labeled_alphabet['label_F':'label_J']

label_F    f
label_G    g
label_H    h
label_I    i
label_J    j
dtype: object

In [100]:
# 5
# labeled_alphabet[-6:] #the first approach by index position
labeled_alphabet['label_U':]

label_U    u
label_V    v
label_W    w
label_X    x
label_Y    y
label_Z    z
dtype: object
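
The difference between inclusive label slices and exclusive position slices can be checked directly. This is a small sketch that rebuilds `labeled_alphabet` so it runs on its own; `by_label` and `by_position` are illustrative names.

```python
import pandas as pd
from string import ascii_lowercase, ascii_uppercase

labeled_alphabet = pd.Series(
    list(ascii_lowercase),
    index=['label_' + letter for letter in ascii_uppercase],
)

by_label = labeled_alphabet.loc['label_F':'label_J']   # stop label is included
by_position = labeled_alphabet.iloc[5:10]              # stop position is excluded

print(by_label.equals(by_position))            # True
print(len(labeled_alphabet.loc[:'label_C']))   # 3
print(len(labeled_alphabet.iloc[:2]))          # 2
```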

# The add_prefix() and add_suffix() Methods

In [101]:
alphabet.head(3)

0    a
1    b
2    c
dtype: object

In [102]:
alphabet.add_prefix('label_').head(3)

label_0    a
label_1    b
label_2    c
dtype: object

In [103]:
alphabet.add_suffix('_end_cool').head(3)

0_end_cool    a
1_end_cool    b
2_end_cool    c
dtype: object

In [104]:
# Redoing below logic using add_prefix() method
# labeled_alphabet = pd.Series(data = list(ascii_lowercase), index = list(map(lambda i: 'label_'+i, list(ascii_uppercase))))
prefix_alphabet = pd.Series(data = list(ascii_lowercase), index = list(ascii_uppercase)).add_prefix('label_')

In [105]:
prefix_alphabet.head(3)

label_A    a
label_B    b
label_C    c
dtype: object

In [106]:
alphabet.head(3)
# the original alphabet series is not modified by these methods; it remains intact.
# we just got back a new series whose index labels were modified.

0    a
1    b
2    c
dtype: object

In [107]:
# to actually modify the series, assign the result back to the original variable name:
alphabet = alphabet.add_suffix('_end_cool')

In [108]:
alphabet.head(3)

0_end_cool    a
1_end_cool    b
2_end_cool    c
dtype: object
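
One side effect worth noticing is that the integer labels turn into strings once a prefix or suffix is attached. A minimal sketch with a three-letter Series (`prefixed` is an illustrative name):

```python
import pandas as pd

alphabet = pd.Series(list('abc'))
prefixed = alphabet.add_prefix('label_')   # returns a new Series

print(list(alphabet.index))   # [0, 1, 2]  (original is untouched)
print(list(prefixed.index))   # ['label_0', 'label_1', 'label_2']
print(prefixed['label_1'])    # b
```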

# Using Dot Notation

In [109]:
labeled_alphabet.label_V

'v'

In [110]:
labeled_alphabet['label_V']

'v'

In [111]:
labeled_alphabet['label_V':'label_Y']

label_V    v
label_W    w
label_X    x
label_Y    y
dtype: object

In [112]:
# labeled_alphabet.label_V:label_Y # dot notation cannot express a slice; we get NameError: name 'label_Y' is not defined
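
Dot notation has further limits beyond slicing. The sketch below uses a hypothetical Series `s` whose labels are chosen to show two of them: a label that collides with an existing attribute, and a label that is not a valid Python identifier.

```python
import pandas as pd

s = pd.Series([1, 2], index=['size', 'my label'])

print(s.size)          # 2: the built-in attribute (number of elements) wins
print(s['size'])       # 1: the value stored under the label 'size'
print(s['my label'])   # 2: a label with a space has no dot-notation form
```

Square brackets or `.loc` work for every label, so they are the safer default.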

# Boolean Masks and the .loc Indexer

In [113]:
labeled_alphabet['label_F':'label_J']

label_F    f
label_G    g
label_H    h
label_I    i
label_J    j
dtype: object

In [114]:
labeled_alphabet.loc['label_F':'label_J']

label_F    f
label_G    g
label_H    h
label_I    i
label_J    j
dtype: object

In [115]:
books_series

My Books
0      Thursday Murder Club
1             Almost Single
2    The Fault in our stars
Name: my books read in 2022, dtype: object

In [116]:
books_series.loc[[True, True, True]]

My Books
0      Thursday Murder Club
1             Almost Single
2    The Fault in our stars
Name: my books read in 2022, dtype: object

In [117]:
books_series.loc[[True, False, True]]

My Books
0      Thursday Murder Club
2    The Fault in our stars
Name: my books read in 2022, dtype: object

In [118]:
# books_series.loc[[True, False]]
# the length of the boolean mask must match the length of the series

In [119]:
labeled_alphabet.loc[[True for i in range(labeled_alphabet.size)]]

label_A    a
label_B    b
label_C    c
label_D    d
label_E    e
label_F    f
label_G    g
label_H    h
label_I    i
label_J    j
label_K    k
label_L    l
label_M    m
label_N    n
label_O    o
label_P    p
label_Q    q
label_R    r
label_S    s
label_T    t
label_U    u
label_V    v
label_W    w
label_X    x
label_Y    y
label_Z    z
dtype: object

In [120]:
labeled_alphabet[[True for i in range(labeled_alphabet.size)]]

label_A    a
label_B    b
label_C    c
label_D    d
label_E    e
label_F    f
label_G    g
label_H    h
label_I    i
label_J    j
label_K    k
label_L    l
label_M    m
label_N    n
label_O    o
label_P    p
label_Q    q
label_R    r
label_S    s
label_T    t
label_U    u
label_V    v
label_W    w
label_X    x
label_Y    y
label_Z    z
dtype: object

In [121]:
# Extract every other item, starting with the first (the 1st, 3rd, 5th, ... items):
labeled_alphabet.loc[[i % 2 == 0 for i in range(labeled_alphabet.size)]]

label_A    a
label_C    c
label_E    e
label_G    g
label_I    i
label_K    k
label_M    m
label_O    o
label_Q    q
label_S    s
label_U    u
label_W    w
label_Y    y
dtype: object
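
In practice a boolean mask is rarely typed by hand. It usually comes from a comparison on the Series itself. Here is a sketch using the actor ages from the Skills Challenge; the threshold 45 and the names `mask` and `older` are illustrative.

```python
import pandas as pd

actors = pd.Series(
    [58, 44, 60, 48],
    index=['Leonardo', 'Blake', 'Kate', 'Reynolds'],
    name='actors',
)

mask = actors > 45          # a boolean Series with the same index
older = actors.loc[mask]    # keeps only the rows where the mask is True

print(mask.tolist())        # [True, False, True, True]
print(list(older.index))    # ['Leonardo', 'Kate', 'Reynolds']
```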

# Extracting by Position with .iloc

In [122]:
labeled_alphabet.iloc[0]

'a'

In [123]:
labeled_alphabet.iloc[1:3]

label_B    b
label_C    c
dtype: object

In [124]:
labeled_alphabet[1:3]
# without .iloc, plain [] slicing with integers is also positional, so the result is the same.

label_B    b
label_C    c
dtype: object

In [125]:
labeled_alphabet.iloc[[1,4,6]]

label_B    b
label_E    e
label_G    g
dtype: object

# Bonus: Using Callables with .loc And .iloc

In [126]:
labeled_alphabet.loc['label_V']

'v'

In [127]:
labeled_alphabet.loc[lambda x: 'label_V']

'v'

In [128]:
labeled_alphabet.loc[lambda x: ['label_V','label_A']]

label_V    v
label_A    a
dtype: object

In [136]:
# labeled_alphabet.loc[lambda x: [True,True]] # Throws: IndexError: Boolean index has wrong length: 2 instead of 26
labeled_alphabet.loc[lambda x: [True for i in range(x.size)]]

label_A    a
label_B    b
label_C    c
label_D    d
label_E    e
label_F    f
label_G    g
label_H    h
label_I    i
label_J    j
label_K    k
label_L    l
label_M    m
label_N    n
label_O    o
label_P    p
label_Q    q
label_R    r
label_S    s
label_T    t
label_U    u
label_V    v
label_W    w
label_X    x
label_Y    y
label_Z    z
dtype: object

In [137]:
def every_fifth(x):
    # True at positions 0, 5, 10, ...
    return [i % 5 == 0 for i in range(x.size)]

In [138]:
labeled_alphabet.iloc[every_fifth]

label_A    a
label_F    f
label_K    k
label_P    p
label_U    u
label_Z    z
dtype: object

In [139]:
def every_fifth_refined(x):
    # True at positions 4, 9, 14, ... (the 5th, 10th, 15th, ... items)
    return [(i + 1) % 5 == 0 for i in range(x.size)]

In [140]:
labeled_alphabet.iloc[every_fifth_refined]

label_E    e
label_J    j
label_O    o
label_T    t
label_Y    y
dtype: object
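
Callables are most useful in method chains, where the intermediate Series has no variable name to refer to. The following sketch, with the illustrative name `result`, filters a Series that only exists in the middle of the chain.

```python
import pandas as pd

result = (
    pd.Series(range(10))
    .add_prefix('row_')
    .loc[lambda s: s % 3 == 0]   # s is the prefixed Series built just above
)

print(result.to_dict())   # {'row_0': 0, 'row_3': 3, 'row_6': 6, 'row_9': 9}
```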

# Selecting with .get()

In [141]:
labeled_alphabet.get('label_I')

'i'

In [142]:
labeled_alphabet.loc['label_I']

'i'

In [143]:
labeled_alphabet['label_I']

'i'

In [146]:
labeled_alphabet.get('label_nonexisting')

In [147]:
labeled_alphabet.get('label_nonexisting') is None

True

In [148]:
labeled_alphabet.get('label_nonexisting', default = None)

In [149]:
labeled_alphabet.get('label_nonexisting', default = 'Could not find anything with that label')

'Could not find anything with that label'

In [150]:
labeled_alphabet.get('label_nonexisting', default = 0)

0

In [152]:
labeled_alphabet.get('label_nonexisting', default = {19:'20'})

{19: '20'}

In [153]:
labeled_alphabet.get(8)

'i'

In [154]:
labeled_alphabet.iloc[8]

'i'

In [155]:
labeled_alphabet[8]

'i'
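
The main reason to reach for `.get()` is how it handles a missing label. This sketch contrasts it with `[]` on a small hypothetical Series `s`; `raised` and `fallback` are illustrative names.

```python
import pandas as pd

s = pd.Series(['a', 'b', 'c'], index=['label_A', 'label_B', 'label_C'])

try:
    s['label_missing']    # [] raises KeyError for an unknown label
    raised = False
except KeyError:
    raised = True

fallback = s.get('label_missing', default='?')   # .get() returns the default

print(raised)     # True
print(fallback)   # ?
```

Integer lookups such as `labeled_alphabet[8]` and `labeled_alphabet.get(8)` on a string-labelled Series rely on a positional fallback that works in the pandas 1.3.5 used here but appears to be deprecated in newer releases, so `.iloc[8]` is the more explicit spelling.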

# Skills Challenge

Step 1.
Create a Series of length 100 containing the squares of the integers from 0 to 99.
Assign it to the variable squares

In [176]:
squares = pd.Series(i*i for i in range(100))
# squares = pd.Series(data = [i**2 for i in range(100)])

In [177]:
squares.head()

0     0
1     1
2     4
3     9
4    16
dtype: int64

Step 2.
Extract the last three elements from the *squares* Series using [] indexing

In [174]:
# squares[-3:]
squares.iloc[-3:]
# either of the two gives the same result

97    9409
98    9604
99    9801
dtype: int64

Step 3.
Repeat the step above, but now using the tail() method

In [168]:
squares.tail(3)

97    9409
98    9604
99    9801
dtype: int64

Step 4.
Verify that the output of Step 2 and Step 3 is the same

In [170]:
squares[-3:].equals(squares.tail(3))

True

In [179]:
a = squares.iloc[-3:]
b = squares.tail(3)

In [180]:
a.equals(b)

True

In [181]:
b.equals(a)

True

In [182]:
a == b

97    True
98    True
99    True
dtype: bool
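
`equals()` and `==` answer different questions: the first returns a single boolean for the whole Series, the second compares element by element. They can also disagree when missing values are present, as this sketch with two hypothetical Series `a` and `b` suggests.

```python
import numpy as np
import pandas as pd

a = pd.Series([1.0, np.nan, 3.0])
b = pd.Series([1.0, np.nan, 3.0])

print(a.equals(b))         # True: NaNs in the same positions count as equal
print((a == b).tolist())   # [True, False, True]: NaN never equals NaN elementwise
```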