
In [1]:
import pandas as pd

## What is a Series?

In [12]:
students = ['Gitanjali','Abhay','Dipti']

In [13]:
pd.Series(students)

0    Gitanjali
1        Abhay
2        Dipti
dtype: object

In [14]:
age = [30,21,28]

In [15]:
pd.Series(age)

0    30
1    21
2    28
dtype: int64

In [16]:
height = [150,162.5,152]

In [17]:
pd.Series(height)

0    150.0
1    162.5
2    152.0
dtype: float64

Unlike a typical homogeneous NumPy ndarray, a pandas Series **supports mixed datatypes** (they are stored with the generic object dtype).

In [18]:
mixed = [True, 'say', {'my_motivation': 100}]

In [19]:
pd.Series(mixed)

0                      True
1                       say
2    {'my_motivation': 100}
dtype: object
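
A quick way to see what "mixed datatypes" means in practice is to inspect the type of each element. The sketch below rebuilds the same mixed list (the names mixed_series and element_types are only illustrative) and shows that the Series falls back to the generic object dtype while every value keeps its own Python type.

```python
import pandas as pd

# Same mixed list as above: a bool, a str and a dict
mixed_series = pd.Series([True, 'say', {'my_motivation': 100}])

# The Series as a whole reports the generic object dtype
print(mixed_series.dtype)

# Each element is still an ordinary Python object of its own type
element_types = [type(value).__name__ for value in mixed_series]
print(element_types)
```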

# Parameters vs. Arguments

In [20]:
pd.Series(students)

0    Gitanjali
1        Abhay
2        Dipti
dtype: object

In [21]:
pd.Series(data=students)

0    Gitanjali
1        Abhay
2        Dipti
dtype: object

# What's in the data?

In [22]:
books_list = ['Thursday Murder Club', 'Almost Single', 'The Fault in our stars']

In [23]:
list_s = pd.Series(books_list)

In [24]:
books_dict = {0: 'Thursday Murder Club', 1: 'Almost Single', 2: 'The Fault in our stars'}
books_dict2 = {1: 'Thursday Murder Club', 2: 'Almost Single', 3: 'The Fault in our stars'}

In [25]:
dict_s = pd.Series(books_dict)

In [26]:
pd.Series(books_dict2)

1      Thursday Murder Club
2             Almost Single
3    The Fault in our stars
dtype: object

In [27]:
list_s.equals(dict_s)

True
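
The equals() method compares the index labels as well as the values. As a small sketch (the variable names are illustrative), a Series built from a dictionary with keys 1 to 3 holds the same titles but is not equal to the list-based Series, while the dictionary with keys 0 to 2 is:

```python
import pandas as pd

books_list = ['Thursday Murder Club', 'Almost Single', 'The Fault in our stars']
list_series = pd.Series(books_list)

# Dictionary keys become the index labels of the Series
zero_based = pd.Series(dict(zip([0, 1, 2], books_list)))
one_based = pd.Series(dict(zip([1, 2, 3], books_list)))

# Same values and same labels: equal
same_labels = list_series.equals(zero_based)
# Same values but labels shifted by one: not equal
shifted_labels = list_series.equals(one_based)
print(same_labels, shifted_labels)
```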

In [28]:
pd.Series(1992)

0    1992
dtype: int64

In [29]:
pd.Series('Gitanjali')

0    Gitanjali
dtype: object

# The .dtype Attribute

In [30]:
pd.Series(age)

0    30
1    21
2    28
dtype: int64

Pandas automatically infers the dtype of a Series from the data we pass to the Series constructor.

However, we can specify dtype ourselves.

In [31]:
pd.Series(age, dtype='float')

0    30.0
1    21.0
2    28.0
dtype: float64

In [32]:
name_series = pd.Series(students)

In [33]:
name_series

0    Gitanjali
1        Abhay
2        Dipti
dtype: object

In [34]:
name_series.dtype

dtype('O')

In [35]:
age_series = pd.Series(age)

In [36]:
age_series

0    30
1    21
2    28
dtype: int64

In [37]:
age_series.dtype

dtype('int64')

In [38]:
height_series = pd.Series(height)

In [39]:
height_series.dtype

dtype('float64')

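# What is dtype('O'), Really?
dtype('O') stands for the generic Python object dtype. Pandas falls back to it mostly because:
- NumPy only deals with homogeneous multi-dimensional arrays, where every element has the same type and the same fixed size in memory.
- Strings are not homogeneous or of fixed size, so the array stores references to Python objects instead.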

In [40]:
heights2 = [120.4, '102.5',167.8]

In [41]:
pd.Series(heights2)

0    120.4
1    102.5
2    167.8
dtype: object
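
Because one height was entered as the string '102.5', the whole Series is stored as object. If the values are really meant to be numbers, one possible fix is pd.to_numeric, shown here as a sketch (to_numeric is not used in the original notebook, and the variable names are illustrative).

```python
import pandas as pd

heights2 = [120.4, '102.5', 167.8]

# One string among the floats forces the generic object dtype
raw_heights = pd.Series(heights2)
print(raw_heights.dtype)

# pd.to_numeric parses the string and returns a numeric Series
numeric_heights = pd.to_numeric(raw_heights)
print(numeric_heights.dtype)
```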

# Index and RangeIndex
- We can create a custom index (labels).
- When no index is specified, pandas creates a RangeIndex, which is immutable and memory-efficient because it stores only start, stop, and step.

In [42]:
books_list

['Thursday Murder Club', 'Almost Single', 'The Fault in our stars']

In [43]:
list_s

0      Thursday Murder Club
1             Almost Single
2    The Fault in our stars
dtype: object

We can provide our own labels using the index parameter of the pd.Series() constructor.

In [44]:
# using keyword/named arguments
pd.Series(data=books_list, index=['Thriller', 'Humour', 'Romantic'])

Thriller      Thursday Murder Club
Humour               Almost Single
Romantic    The Fault in our stars
dtype: object

In [45]:
# Using positional arguments
pd.Series(books_list, ['Thriller', 'Humour', 'Romantic'])

Thriller      Thursday Murder Club
Humour               Almost Single
Romantic    The Fault in our stars
dtype: object

In [46]:
# using mixed arguments
pd.Series(books_list, ['Thriller', 'Humour', 'Romantic'], dtype = 'object')

Thriller      Thursday Murder Club
Humour               Almost Single
Romantic    The Fault in our stars
dtype: object

In [47]:
pd.__version__

'1.3.5'

In [48]:
# The dedicated string dtype (StringDtype) is available from pandas 1.0 onward
pd.Series(books_list, ['Thriller', 'Humour', 'Romantic'], dtype = 'string')

Thriller      Thursday Murder Club
Humour               Almost Single
Romantic    The Fault in our stars
dtype: string

RangeIndex (accessed through the .index attribute)
- It is the default index when we don't specify one explicitly.
- It is a sequence of integers starting from 0.
- It is a built-in pandas object.

In [49]:
list_s.index

RangeIndex(start=0, stop=3, step=1)

In [50]:
type(list_s.index)

pandas.core.indexes.range.RangeIndex

Creating our own RangeIndex

In [51]:
pd.RangeIndex(start=4, stop=7, step=1)

RangeIndex(start=4, stop=7, step=1)

In [52]:
list(pd.RangeIndex(start=4, stop=7, step=1))

[4, 5, 6]

In [53]:
list(pd.RangeIndex(start=10, stop=-11, step=-1))

[10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0, -1, -2, -3, -4, -5, -6, -7, -8, -9, -10]
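
A RangeIndex can also be passed to the index parameter, for example to number the books starting from 1. The sketch below (variable names are illustrative) does that and then checks the immutability mentioned earlier: item assignment on a pandas Index raises a TypeError.

```python
import pandas as pd

books_list = ['Thursday Murder Club', 'Almost Single', 'The Fault in our stars']

# Use a custom RangeIndex so the labels start at 1 instead of 0
numbered_books = pd.Series(books_list, index=pd.RangeIndex(start=1, stop=4))
print(numbered_books.index)

# Index objects are immutable: assigning to a single label is rejected
try:
    numbered_books.index[0] = 100
    mutation_error = None
except TypeError as error:
    mutation_error = error
print(type(mutation_error).__name__)
```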

# Series and Index Names
- A Series can have a name, which later becomes the column name in a DataFrame.
- The index of a Series can also have a name.

In [54]:
list_s

0      Thursday Murder Club
1             Almost Single
2    The Fault in our stars
dtype: object

In [55]:
# intelligible: capable of being understood

In [56]:
books_series = list_s

In [57]:
books_series

0      Thursday Murder Club
1             Almost Single
2    The Fault in our stars
dtype: object

In [58]:
# Attribute
books_series.size

3

In [59]:
# Method
list_s.equals(dict_s)

True

In [60]:
# Attribute:
books_series.dtype

dtype('O')

In [61]:
books_series.name

In [62]:
books_series.name is None

True

In [63]:
books_series.name = 'my books read in 2022'

In [64]:
books_series.name

'my books read in 2022'

In [65]:
books_series
# In addition to dtype, the output now also shows the Name attribute that we just set

0      Thursday Murder Club
1             Almost Single
2    The Fault in our stars
Name: my books read in 2022, dtype: object

In [66]:
books_series.index.name

In [67]:
books_series.index.name is None

True

In [68]:
books_series.index.name = 'My Books'

In [69]:
books_series
# Additional index name appears on top of labels

My Books
0      Thursday Murder Club
1             Almost Single
2    The Fault in our stars
Name: my books read in 2022, dtype: object
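
To illustrate the first bullet above, here is a minimal sketch of how the two names carry over when the Series is turned into a DataFrame. It uses to_frame() and reset_index(), which are not part of the original notebook; the variable names are illustrative.

```python
import pandas as pd

books_series = pd.Series(
    ['Thursday Murder Club', 'Almost Single', 'The Fault in our stars'],
    name='my books read in 2022',
)
books_series.index.name = 'My Books'

# The Series name becomes the column name; the index name is kept on the index
books_frame = books_series.to_frame()
print(list(books_frame.columns))

# reset_index() moves the index into a regular column named after the index
flat_frame = books_series.reset_index()
print(list(flat_frame.columns))
```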

# Skills Challenge

Task 1.
- Create a Python list of length 4 that contains some of your favourite actors.
- So this should be a list of strings.
- Assign this list to a variable named actor_names.

In [70]:
actor_names = ['Leonardo', 'Blake', 'Kate', 'Reynolds']

Task 2.
- Create another list of the same length that contains your guesses of how old each actor is; feel free to use integers or floats.
- Call this list actor_ages.

In [71]:
actor_ages = [58, 44, 60, 48]

Task 3.
- Create a series that contains the actors' ages and labels the ages using the actor names.
- Give this series the name actors.

In [72]:
actors = pd.Series(actor_ages, index= actor_names, name = 'actors')

In [73]:
actors

Leonardo    58
Blake       44
Kate        60
Reynolds    48
Name: actors, dtype: int64

Task 4.
- Repeat step 3 but this time create the series from a Python Dictionary.
- As an additional challenge, try not to type the dictionary manually but instead dynamically create it using the two lists defined in step 1 and step 2.

In [74]:
actor_dict = dict(zip(actor_names, actor_ages))  # built-in functions only
actor_dict2 = {name: age for name, age in zip(actor_names, actor_ages)}  # dict comprehension with zip

In [75]:
pd.Series(actor_dict, name = 'actors')

Leonardo    58
Blake       44
Kate        60
Reynolds    48
Name: actors, dtype: int64

In [76]:
pd.Series(actor_dict2, name = 'actors')

Leonardo    58
Blake       44
Kate        60
Reynolds    48
Name: actors, dtype: int64

# The head() and tail() Methods

In [84]:
int_series = pd.Series(range(10))

In [78]:
int_series

0    0
1    1
2    2
3    3
4    4
5    5
6    6
7    7
8    8
9    9
dtype: int64

In [85]:
int_series.size

10

In [86]:
len(int_series)
# Python's built-in len() function also works just fine

10

In [87]:
# To peek at the first 5 elements of the series:
int_series.head()

0    0
1    1
2    2
3    3
4    4
dtype: int64

In [88]:
# To peek at the last 5 elements of the series:
int_series.tail()

5    5
6    6
7    7
8    8
9    9
dtype: int64

In [91]:
# We can also specify the number of elements we want to see using the n parameter of these methods:
int_series.head(n=3) # or int_series.head(3)

0    0
1    1
2    2
dtype: int64

In [92]:
int_series.tail(n=4) # or int_series.tail(4)

6    6
7    7
8    8
9    9
dtype: int64
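
The n parameter also accepts negative values, which the cells above do not show. As a small sketch (variable names are illustrative): head(-2) returns everything except the last two elements, and tail(-3) returns everything except the first three.

```python
import pandas as pd

int_series = pd.Series(range(10))

# Negative n for head(): all elements except the last |n|
all_but_last_two = int_series.head(-2)

# Negative n for tail(): all elements except the first |n|
all_but_first_three = int_series.tail(-3)

print(all_but_last_two.tolist())
print(all_but_first_three.tolist())
```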

In [97]:
# Notebook environments already truncate the output if we have a large series:
pd.Series(range(10000000))

0                0
1                1
2                2
3                3
4                4
            ...   
9999995    9999995
9999996    9999996
9999997    9999997
9999998    9999998
9999999    9999999
Length: 10000000, dtype: int64

In [101]:
# We can change the display options to change the above behaviour,
# i.e. set how many rows pandas shows when it truncates a long series
pd.options.display.min_rows = 40
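
Setting pd.options.display.min_rows changes the display for the rest of the session. As a hedged alternative that the original notebook does not use, pd.option_context applies the setting only inside a with block and restores the previous value afterwards. The variable names below are illustrative.

```python
import pandas as pd

big_series = pd.Series(range(1000))
rows_before = pd.get_option('display.min_rows')

# The option is changed only for the duration of the with block
with pd.option_context('display.min_rows', 20):
    rows_inside = pd.get_option('display.min_rows')
    preview = repr(big_series)  # truncated text representation

# Outside the block the previous setting is back in place
rows_after = pd.get_option('display.min_rows')
print(rows_before, rows_inside, rows_after)
```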

# Extracting By Index Position

In [103]:
from string import ascii_lowercase

In [105]:
ascii_lowercase

'abcdefghijklmnopqrstuvwxyz'

In [104]:
letters = list(ascii_lowercase)

In [None]:
letters

In [107]:
alphabet = pd.Series(letters)

In [108]:
alphabet.head(6)

0    a
1    b
2    c
3    d
4    e
5    f
dtype: object

In [109]:
# What is the 1st letter?
# What is the 11th letter?
# What are the first three letters?
# What are the sixth through tenth letters?
# What are the last six letters?

In [110]:
# 1
alphabet[0]

'a'

In [111]:
# 2
alphabet[10]

'k'

In [112]:
# 3
alphabet[:3]

0    a
1    b
2    c
dtype: object

In [113]:
# 4
alphabet[5:10]

5    f
6    g
7    h
8    i
9    j
dtype: object

In [114]:
# 5
alphabet[-6:]

20    u
21    v
22    w
23    x
24    y
25    z
dtype: object
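
The square-bracket lookups above work by position here because the index is the default RangeIndex. As a sketch, the same five questions can be answered with the .iloc indexer, which always selects by position regardless of the labels. The original cells do not use .iloc, and the variable names are illustrative.

```python
import pandas as pd
from string import ascii_lowercase

alphabet = pd.Series(list(ascii_lowercase))

first_letter = alphabet.iloc[0]        # 1st letter
eleventh_letter = alphabet.iloc[10]    # 11th letter
first_three = alphabet.iloc[:3]        # first three letters
sixth_to_tenth = alphabet.iloc[5:10]   # stop position 10 is excluded
last_six = alphabet.iloc[-6:]          # negative positions count from the end

print(first_letter, eleventh_letter)
print(sixth_to_tenth.tolist())
```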

# Accessing Elements by Label

In [132]:
from string import ascii_uppercase

In [133]:
ascii_uppercase

'ABCDEFGHIJKLMNOPQRSTUVWXYZ'

In [134]:
labeled_alphabet = pd.Series(data = list(ascii_lowercase), index = list(map(lambda i: 'label_'+i, list(ascii_uppercase))))

In [135]:
labeled_alphabet.head(3)

label_A    a
label_B    b
label_C    c
dtype: object

In [136]:
# What is the 1st letter?
# What is the 11th letter?
# What are the first three letters?
# What are the sixth through tenth letters?
# What are the last six letters?

In [137]:
# 1
# labeled_alphabet[0] #the first approach by index position
labeled_alphabet['label_A']

'a'

In [138]:
# 2
# labeled_alphabet[10] #the first approach by index position
labeled_alphabet['label_K']

'k'

In [139]:
# 3
# labeled_alphabet[:3] #the first approach by index position
labeled_alphabet[:'label_C']
# Notice that the stop is inclusive when slicing by label but exclusive when slicing by position

label_A    a
label_B    b
label_C    c
dtype: object

In [140]:
# 4
# labeled_alphabet[5:10] #the first approach by index position
labeled_alphabet['label_F':'label_J']

label_F    f
label_G    g
label_H    h
label_I    i
label_J    j
dtype: object

In [141]:
# 5
# labeled_alphabet[-6:] #the first approach by index position
labeled_alphabet['label_U':]

label_U    u
label_V    v
label_W    w
label_X    x
label_Y    y
label_Z    z
dtype: object
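
The inclusive versus exclusive stop can be checked directly. The sketch below uses the explicit .loc (label) and .iloc (position) indexers, which the cells above do not use, and builds the labels with an f-string instead of map; the variable names are illustrative.

```python
import pandas as pd
from string import ascii_lowercase, ascii_uppercase

labels = [f'label_{letter}' for letter in ascii_uppercase]
labeled_alphabet = pd.Series(list(ascii_lowercase), index=labels)

# Label slice: the stop label 'label_J' is included
by_label = labeled_alphabet.loc['label_F':'label_J']

# Position slice: the stop position 10 is excluded
by_position = labeled_alphabet.iloc[5:10]

# Both slices select the sixth through tenth letters
print(by_label.equals(by_position))
```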

# The add_prefix() and add_suffix() Methods

In [128]:
alphabet.head(3)

0    a
1    b
2    c
dtype: object

In [130]:
alphabet.add_prefix('label_').head(3)

label_0    a
label_1    b
label_2    c
dtype: object

In [131]:
alphabet.add_suffix('_end_cool').head(3)

0_end_cool    a
1_end_cool    b
2_end_cool    c
dtype: object

In [150]:
# Rebuilding labeled_alphabet (the commented-out line below) using the add_prefix() method
# labeled_alphabet = pd.Series(data = list(ascii_lowercase), index = list(map(lambda i: 'label_'+i, list(ascii_uppercase))))
prefix_alphabet = pd.Series(data = list(ascii_lowercase), index = list(ascii_uppercase)).add_prefix('label_')

In [151]:
prefix_alphabet.head(3)

label_A    a
label_B    b
label_C    c
dtype: object

In [156]:
alphabet.head(3)
# The original alphabet series is not modified by these methods; it remains intact.
# Each call returns a new series with a modified index.

0    a
1    b
2    c
dtype: object

In [155]:
# To actually modify the series, assign the result back to the variable name:
alphabet = alphabet.add_suffix('_end_cool')

In [158]:
alphabet.head(3)

0_end_cool    a
1_end_cool    b
2_end_cool    c
dtype: object

# Using Dot Notation

In [159]:
labeled_alphabet.label_V

'v'

In [160]:
labeled_alphabet['label_V']

'v'

In [161]:
labeled_alphabet['label_V':'label_Y']

label_V    v
label_W    w
label_X    x
label_Y    y
dtype: object

In [163]:
# labeled_alphabet.label_V:label_Y # Dot notation cannot slice; we will get NameError: name 'label_Y' is not defined
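
Dot notation has further limits beyond slicing. As a sketch with made-up labels (they are not from the notebook): a label that collides with an existing Series attribute such as size, or one that is not a valid Python identifier, can only be reached reliably with square brackets.

```python
import pandas as pd

# Hypothetical labels chosen to show where dot notation breaks down
tricky = pd.Series([10, 20, 30], index=['size', 'my label', 'label_V'])

# A valid identifier with no name clash works with dot notation
dot_value = tricky.label_V

# 'size' is already a Series attribute, so the dot returns the element count
attribute_value = tricky.size
bracket_value = tricky['size']

# A label containing a space cannot be written after a dot at all
spaced_value = tricky['my label']

print(dot_value, attribute_value, bracket_value, spaced_value)
```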

# Boolean Masks and the .loc Indexer