# Pandas enhances NumPy with the following features:
1. Data labels (named columns) and descriptive indices
2. Robust handling of common data formats and missing data
3. Relational-database operations

## Learning Objectives
1. Creating Series objects from Python lists and dicts
2. Extracting indexes and values
3. Indexing Series objects implicitly and explicitly
4. Selecting explicitly by integer location (`iloc`) or label location (`loc`)
5. Advanced indexing with Boolean masks

In [None]:
# Installation
# In a Jupyter notebook cell:
!pip install pandas
# In a terminal, run the same command without the leading "!":
# pip install pandas

In [None]:
# import pandas -> follow the convention :)
import pandas as pd

# Series
A Series is a one-dimensional array whose elements carry labels (an index).
Series are the building blocks of pandas DataFrames, which we'll cover in the next section.

In [4]:
# creating a series from a list
squares_series = pd.Series([0, 1, 4, 9, 16, 25], name='squares')
squares_series

0     0
1     1
2     4
3     9
4    16
5    25
Name: squares, dtype: int64

In [5]:
# Extracting the underlying NumPy array from a Series
print("Values of the Series as a NumPy array:")
squares_series.values

Values of the Series as a NumPy array:


array([ 0,  1,  4,  9, 16, 25], dtype=int64)
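Besides the `.values` attribute, a Series also has a `to_numpy()` method, which the pandas documentation recommends for getting a NumPy array. The following is a minimal sketch; the variable names `squares` and `arr` are chosen here for illustration only.

```python
import numpy as np
import pandas as pd

squares = pd.Series([0, 1, 4, 9, 16, 25], name="squares")

# to_numpy() returns the Series data as a NumPy array (the labels are dropped).
arr = squares.to_numpy()

print(isinstance(arr, np.ndarray))  # True
print(arr.sum())                    # 55
```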

In [6]:
# Extracting the index
print("Index of the Series object:")
print("[INFO] This is called implicit indexing because we did not specify an index for squares_series")
squares_series.index

Index of the Series object:
[INFO] This is called implicit indexing because we did not specify an index for squares_series


RangeIndex(start=0, stop=6, step=1)

In [7]:
print("Using numeric indices to extract items")
print(f"item at index 3: {squares_series[3]}")

Using numeric indices to extract items
item at index 3: 9


In [8]:
print("Slicing")
print(f"items from index 2 to 4 (excluded):\n{squares_series[2:4]}")

Slicing
items from index 2 to 4 (excluded):
2    4
3    9
Name: squares, dtype: int64


In [9]:
print("Getting the name:")
squares_series.name

Getting the name:


'squares'

In [11]:
print("Computer programming languages' popularity in 2022")
print("Explicit indexing")
print("[NOTE] These numbers are not extracted from a survey/report whatsoever")
values = [99.3, 92.4, 95.5, 95, 84.5, 56, 84.8, 100, 74.3, 78.9]
indexes = ['Java', 'C', 'C++', 'Python', 'C#', 'PHP', 'JavaScript', 'Ruby', 'R', 'Matlab']
popularity_2022 = pd.Series(values, index=indexes)
popularity_2022

Computer programming languages' popularity in 2022
Explicit indexing
[NOTE] These numbers are not extracted from a survey/report whatsoever


Java           99.3
C              92.4
C++            95.5
Python         95.0
C#             84.5
PHP            56.0
JavaScript     84.8
Ruby          100.0
R              74.3
Matlab         78.9
dtype: float64

In [14]:
print("Creating Series using dictionaries")
popularity_dict_2021_dict = {"C": 99.9,  "C#": 91.3, "C++": 99.4, "Java": 92,  "JavaScript": 83,
                              "Matlab": 53, "PHP": 83,  "Python": 100,  "R": 84.8, "Ruby": 76.2}
popularity_dict_2021 = pd.Series(popularity_dict_2021_dict)
popularity_dict_2021

Creating Series using dictionaries


C              99.9
C#             91.3
C++            99.4
Java           92.0
JavaScript     83.0
Matlab         53.0
PHP            83.0
Python        100.0
R              84.8
Ruby           76.2
dtype: float64
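When a Series is built from a dict, the keys become the index. As a small sketch of a related behavior: passing `index=` together with a dict selects and orders the keys, and any label missing from the dict is filled with `NaN`. The `scores` dict and the `"Go"` label below are hypothetical and used only for illustration.

```python
import pandas as pd

# Hypothetical subset of scores, for illustration only.
scores = {"C": 99.9, "Java": 92.0, "Python": 100.0}

# Dict keys become the index, dict values become the data.
from_dict = pd.Series(scores)
print(list(from_dict.index))  # ['C', 'Java', 'Python']

# index= picks keys in the given order; "Go" is not in the dict, so it becomes NaN.
selected = pd.Series(scores, index=["Python", "C", "Go"])
print(selected.isna().tolist())  # [False, False, True]
```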

In [16]:
# The index now holds names/labels, not a numeric range!
popularity_2022

Java           99.3
C              92.4
C++            95.5
Python         95.0
C#             84.5
PHP            56.0
JavaScript     84.8
Ruby          100.0
R              74.3
Matlab         78.9
dtype: float64

In [17]:
print('indexes: ')
popularity_2022.index

indexes: 


Index(['Java', 'C', 'C++', 'Python', 'C#', 'PHP', 'JavaScript', 'Ruby', 'R',
       'Matlab'],
      dtype='object')

In [18]:
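print("Using numeric positions to extract items")
# With a labeled index, plain popularity_2022[0] is deprecated in recent pandas; .iloc makes positional access explicit
print(f"item at position 0: {popularity_2022.iloc[0]}")

Using numeric positions to extract items
item at position 0: 99.3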


In [19]:
print("Slicing")
print(f"items from position 0 to 2 (excluded):\n{popularity_2022[0:2]}")

Slicing
items from position 0 to 2 (excluded):
Java    99.3
C       92.4
dtype: float64


In [20]:
print("Extracting items using named indices")
print(f"item at index Python: {popularity_2022['Python']}")

Extracting items using named indices
item at index Python: 95.0


In [21]:
print("Named Slicing")
print(f"items from index C++ to C#(Included!!!!):\n{popularity_2022['C++':'C#']}")

Named Slicing
items from index C++ to C#(Included!!!!):
C++       95.5
Python    95.0
C#        84.5
dtype: float64


## Explicit Integer Location (`iloc`) or Label Location (`loc`)

In [22]:
print("Integer location: although the popularity Series is indexed by labels, iloc still selects by integer position")
popularity_2022.iloc[0:2]

Integer location: although the popularity Series is indexed by labels, iloc still selects by integer position


Java    99.3
C       92.4
dtype: float64

In [24]:
print("Label Location")
popularity_2022.loc[:'Ruby']

Label Location


Java           99.3
C              92.4
C++            95.5
Python         95.0
C#             84.5
PHP            56.0
JavaScript     84.8
Ruby          100.0
dtype: float64
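The two outputs above differ in how they treat the end of a slice: `iloc` excludes the stop position, while `loc` includes the stop label. A minimal sketch of the contrast, using a hypothetical toy Series `s`:

```python
import pandas as pd

# Hypothetical toy data, not taken from the cells above.
s = pd.Series([10, 20, 30, 40], index=["a", "b", "c", "d"])

# iloc slices by position; the stop position is excluded, as with Python lists.
by_position = s.iloc[1:3]
print(list(by_position.index))  # ['b', 'c']

# loc slices by label; the stop label is included.
by_label = s.loc["b":"d"]
print(list(by_label.index))     # ['b', 'c', 'd']
```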

In [25]:
print("loc raises an error when given integer positions on a label-based index!")
popularity_2022.loc[:1]

loc raises an error when given integer positions on a label-based index!


TypeError: cannot do slice indexing on Index with these indexers [1] of type int
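The difference between `loc` and `iloc` is easiest to overlook when the labels are themselves integers. The sketch below assumes a hypothetical Series `s` whose index is `[10, 20, 30]`: `loc` looks up the label, `iloc` looks up the position, and asking `loc` for a label that does not exist raises `KeyError`.

```python
import pandas as pd

# Hypothetical Series with integer labels that do not match the positions.
s = pd.Series(["x", "y", "z"], index=[10, 20, 30])

print(s.loc[10])   # x  (label 10)
print(s.iloc[0])   # x  (position 0)
print(s.iloc[1])   # y  (position 1)

# There is no label 1, so loc cannot find it.
try:
    s.loc[1]
    label_found = True
except KeyError:
    label_found = False

print(label_found)  # False
```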

### Advanced indexing with Boolean masks

In [26]:
print("A condition yields a boolean mask, as in NumPy, but returned as a Series.")
popularity_2022 > 90

A condition yields a boolean mask, as in NumPy, but returned as a Series.


Java           True
C              True
C++            True
Python         True
C#            False
PHP           False
JavaScript    False
Ruby           True
R             False
Matlab        False
dtype: bool

In [27]:
# The same mask as a plain NumPy boolean array
(popularity_2022 > 90).values

array([ True,  True,  True,  True, False, False, False,  True, False,
       False])

In [28]:
print("Applying the boolean mask")
popularity_2022[popularity_2022 > 90]

Applying the boolean mask


Java       99.3
C          92.4
C++        95.5
Python     95.0
Ruby      100.0
dtype: float64
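Masks can also be combined. As in NumPy, the element-wise operators are `&` (and), `|` (or), and `~` (not), and each condition needs its own parentheses. A minimal sketch with a hypothetical four-item Series `scores`:

```python
import pandas as pd

# Hypothetical subset of scores, for illustration only.
scores = pd.Series([99.3, 92.4, 84.5, 56.0], index=["Java", "C", "C#", "PHP"])

# Both conditions must hold.
mid_range = scores[(scores > 80) & (scores < 95)]
print(list(mid_range.index))  # ['C', 'C#']

# Either condition may hold.
extremes = scores[(scores < 60) | (scores > 95)]
print(list(extremes.index))   # ['Java', 'PHP']

# Negating a mask selects the complement.
not_high = scores[~(scores > 90)]
print(list(not_high.index))   # ['C#', 'PHP']
```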

In [29]:
print("Getting items with a list of noncontiguous positions")
popularity_2022.iloc[[0, 3, 5]]

Getting items with a list of noncontiguous positions


Java      99.3
Python    95.0
PHP       56.0
dtype: float64

In [30]:
print("Assigning values to a list of noncontiguous positions")
popularity_2022.iloc[[0, 3, 5]] = [99, 100, 90]
popularity_2022

Assigning values to a list of noncontiguous positions


Java           99.0
C              92.4
C++            95.5
Python        100.0
C#             84.5
PHP            90.0
JavaScript     84.8
Ruby          100.0
R              74.3
Matlab         78.9
dtype: float64

In [31]:
# Setting every value above 90 to 100 (a scalar is broadcast to all selected items)
popularity_2022[popularity_2022 > 90] = 100
popularity_2022

Java          100.0
C             100.0
C++           100.0
Python        100.0
C#             84.5
PHP            90.0
JavaScript     84.8
Ruby          100.0
R              74.3
Matlab         78.9
dtype: float64
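The cell above modifies the Series in place. If the original values should be kept, one option is `Series.mask`, which returns a new Series with the values replaced wherever the condition is `True`. This is only a sketch on hypothetical toy data; `scores` and `capped` are illustrative names.

```python
import pandas as pd

# Hypothetical toy data for illustration.
scores = pd.Series([99.0, 92.4, 84.5, 90.0], index=["Java", "C", "C#", "PHP"])

# mask() replaces values where the condition is True and returns a new Series.
capped = scores.mask(scores > 90, 100)

print(capped.tolist())  # [100.0, 100.0, 84.5, 90.0]
print(scores.tolist())  # [99.0, 92.4, 84.5, 90.0]  (original unchanged)
```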

*:)*