# Agenda
- Fundamentals of Pandas
  - Purpose
  - Features
- Data Structures
- Introduction to Series
  - Creating and Accessing Pandas Series using different methods
  - Basic Information on Pandas Series
  - Operations and Transformations
  - Querying in Series
- Introduction to DataFrame
  - Creating a DataFrame using different methods
  - Accessing DataFrame
  - Understanding DataFrame Basics
- Introduction to Statistical Operations in Pandas
  - Descriptive Statistics
  - Mean, Median and Standard Deviation
  - Correlation Analysis
- Date and TimeDelta in Pandas
  - Date Handling
  - Time Delta
    - Creating TimeDelta
    - Performing Arithmetic Ops on Date and Time using TimeDelta
    - Resampling Time Series
- Categorical Data Handling
  - Creating a Categorical Variable
  - Counting Occurrences
  - Creating Dummy Variables
  - Label Encoding
- Handling Text Data
- Iteration
  - Iterating Over Rows
  - Apply function
  - Vectorized Operations
  - Iterating over a Series
- Sorting
  - Sorting DataFrame by Column
  - Sorting DataFrame by Multiple Columns
  - Sorting DataFrame by index
  - Sorting Series
- Plotting with Pandas

## Fundamentals 
Pandas is an open-source library built on top of NumPy and used for data manipulation, data analysis, data cleaning, and data visualization. The name `pandas` is derived from the term "panel data". The library introduces data structures such as Series and DataFrame, which make working with structured data more efficient.
### __Purpose of Pandas__
![Purpose of Pandas](https://labcontent.simplicdn.net/data-content/content-assets/Data_and_AI/ADSP_Images/Lesson_04_Working_with_Pandas/1_Introduction_to_Pandas/Purpose_of_Pandas.png)

### __Features of Pandas__
![Features of Pandas](https://labcontent.simplicdn.net/data-content/content-assets/Data_and_AI/ADSP_Images/Updated_Images/Lesson_4/4_01/Features_of_Pandas.png)


## Data Structures
There are two main data structures in pandas: Series and DataFrame.

![link text](https://labcontent.simplicdn.net/data-content/content-assets/Data_and_AI/ADSP_Images/Lesson_04_Working_with_Pandas/1_Introduction_to_Pandas/Data_Structures.png)
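
As a rough illustration of how the two structures differ, the sketch below builds one of each. The names `prices` and `stock`, the column names, and all the numbers are made up for this example and do not come from the lesson data.

```python
import pandas as pd

# A Series is one-dimensional: one label per value.
prices = pd.Series([250, 400, 120], index=["Belts", "Shoes", "Socks"])

# A DataFrame is two-dimensional: row labels plus named columns.
# (Hypothetical columns and values, for illustration only.)
stock = pd.DataFrame(
    {"price": [250, 400, 120], "quantity": [2, 1, 2]},
    index=["Belts", "Shoes", "Socks"],
)

print(prices.ndim)  # 1
print(stock.ndim)   # 2

# Selecting a single column of a DataFrame gives back a Series.
print(type(stock["price"]))
```

One way to think about it: a DataFrame behaves like a collection of Series that share the same index.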


## Introduction to Series

A pandas Series is a one-dimensional, array-like structure containing a sequence of values and an associated label (index) for each value.


It can be created with different data inputs:
![Introduction to Series](https://labcontent.simplicdn.net/data-content/content-assets/Data_and_AI/ADSP_Images/Updated_Images/Lesson_4/4_01/Introduction_to_Series.png)

## Create and Access Series using different methods

To create a Series, pandas provides the `Series` class, whose constructor accepts the data and an optional list of index labels. Syntactically:<br>
$$ \text{seriesname} = \text{pandas.Series}(\text{sequence},\ \text{index} = \text{list of indices}) $$

In [1]:
# Install Pandas 
!pip install pandas



In [2]:
# Import pandas and numpy
import numpy as np
import pandas as pd


In [3]:
# Creating Pandas Series using List
data =  [1,2,3,4,5]
sr = pd.Series(data)

# Create a series with user defined index using list
idx = ['a', 'b', 'c', 'd', 'e']
sr_with_index = pd.Series(data, index =  idx)


In [4]:
print(sr)

0    1
1    2
2    3
3    4
4    5
dtype: int64


In [5]:
print(sr_with_index)

a    1
b    2
c    3
d    4
e    5
dtype: int64


In [7]:
# Convert the list to a NumPy array
arr = np.array(data)
print(arr)

[1 2 3 4 5]
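
The array above is only printed, not turned into a Series. A minimal sketch of that step follows, assuming the same values as `data`. The names `sr_from_arr`, `sr_from_arr_idx` and `sr_from_scalar` are introduced here for illustration, and the scalar case is an extra not shown in the original cells.

```python
import numpy as np
import pandas as pd

arr = np.array([1, 2, 3, 4, 5])

# A Series built from a NumPy array gets the default 0..n-1 index
sr_from_arr = pd.Series(arr)

# A custom index works the same way as it does with a list
sr_from_arr_idx = pd.Series(arr, index=["a", "b", "c", "d", "e"])

# A scalar is repeated once for every index label
sr_from_scalar = pd.Series(0, index=["a", "b", "c"])

print(sr_from_arr_idx)
print(sr_from_scalar)
```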


In [9]:
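# Access elements by index label
print(sr[4])                # default integer index: the label is 4
print(sr_with_index['e'])   # user-defined index: the label is 'e'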

5
5
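
Square brackets look values up by label here. When it matters whether a label or a position is meant, the `.loc` (label) and `.iloc` (position) accessors make the intent explicit. The sketch below rebuilds `sr_with_index` so it runs on its own; `by_label` and `by_position` are names chosen for this example.

```python
import pandas as pd

sr_with_index = pd.Series([1, 2, 3, 4, 5], index=["a", "b", "c", "d", "e"])

print(sr_with_index.loc["e"])   # by label    -> 5
print(sr_with_index.iloc[4])    # by position -> 5
print(sr_with_index.iloc[-1])   # last element -> 5

# Slicing differs between the two accessors:
by_label = sr_with_index.loc["b":"d"]    # label slice includes both endpoints
by_position = sr_with_index.iloc[1:3]    # position slice excludes the end

print(by_label.tolist())      # [2, 3, 4]
print(by_position.tolist())   # [2, 3]
```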


In [11]:
# Create a Series using Dictionary
sr_with_dict = pd.Series({'Shirts':2, 'Trousers':1, 'Shoes':1, 'Watches':2, 'Wallets':3, 'TShirt': 3, 
                          'Belts':2, 'Socks':2})

In [12]:
print(sr_with_dict)

Shirts      2
Trousers    1
Shoes       1
Watches     2
Wallets     3
TShirt      3
Belts       2
Socks       2
dtype: int64


In [14]:
sr_with_dict['Shirts']

2
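
A Series built from a dictionary can be queried much like the dictionary itself. The following sketch uses a shortened, hypothetical `wardrobe` Series to show membership tests, safe lookup with a default, and selecting several labels at once.

```python
import pandas as pd

wardrobe = pd.Series({"Shirts": 2, "Trousers": 1, "Shoes": 1})

# `in` checks the index labels, not the values
print("Shirts" in wardrobe)   # True
print(2 in wardrobe)          # False: 2 is a value, not a label

# .get() returns a default instead of raising KeyError for a missing label
print(wardrobe.get("Perfume", 0))   # 0

# A list of labels returns a new Series with just those entries
subset = wardrobe[["Shirts", "Shoes"]]
print(subset)
```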

In [16]:
# Create a Series using Dictionary with user defined index
dict1 = {'Shirts':2, 'Trousers':1, 'Shoes':1, 'Watches':2, 'Wallets':3, 'TShirt': 3, 'Belts':2, 'Socks':2}
sr_with_dict_idx = pd.Series(dict1, index = ['Shirts', 'Trousers', 'Shoes', 'Watches', 
                                                           'Wallets', 'TShirt', 'Socks', 'Perfume'])

In [17]:
sr_with_dict_idx

Shirts      2.0
Trousers    1.0
Shoes       1.0
Watches     2.0
Wallets     3.0
TShirt      3.0
Socks       2.0
Perfume     NaN
dtype: float64
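
Two things happen in the output above: `Belts` is in the dictionary but not in the index, so it is dropped, and `Perfume` is in the index but not in the dictionary, so it becomes `NaN`. Because `NaN` is a floating-point value, the dtype changes from `int64` to `float64`. A smaller sketch of the same behaviour, with a shortened dictionary and an illustrative `filled` variable:

```python
import pandas as pd

dict1 = {"Shirts": 2, "Belts": 2, "Socks": 2}
sr = pd.Series(dict1, index=["Shirts", "Socks", "Perfume"])

print(sr)
print("Belts" in sr.index)   # False: keys missing from the index are dropped
print(sr.isna())             # True only for 'Perfume'

# One possible follow-up: replace the missing value and restore an integer dtype
filled = sr.fillna(0).astype("int64")
print(filled)
```

Whether filling with 0 is appropriate depends on what a missing entry means in the data; it is shown here only to illustrate the dtype change.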

## Basic Information in pandas Series
These functions and attributes collectively help summarize and understand the characteristics of the data, facilitating effective data exploration and analysis.

In [19]:
# display the first n rows -  head()
first_n_rows = sr_with_dict_idx.head(3)  # default count is 5
first_n_rows

Shirts      2.0
Trousers    1.0
Shoes       1.0
dtype: float64

In [20]:
# display the last n rows -  tail()
last_n_rows = sr_with_dict_idx.tail(3)  # default count is 5
last_n_rows

TShirt     3.0
Socks      2.0
Perfume    NaN
dtype: float64
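
To see the default of 5 in action, here is a small sketch on a throwaway Series of eight integers (the data is invented for this example). It also shows that asking for more rows than exist simply returns the whole Series.

```python
import pandas as pd

sr = pd.Series(range(10, 18))   # 8 values: 10, 11, ..., 17

print(len(sr.head()))        # 5 (default n)
print(len(sr.tail()))        # 5 (default n)
print(sr.head(2).tolist())   # [10, 11]
print(sr.tail(2).tolist())   # [16, 17]
print(len(sr.head(20)))      # 8: n larger than the Series returns everything
```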

In [22]:
# general summary - info()
sr_with_dict_idx.info()

<class 'pandas.core.series.Series'>
Index: 8 entries, Shirts to Perfume
Series name: None
Non-Null Count  Dtype  
--------------  -----  
7 non-null      float64
dtypes: float64(1)
memory usage: 428.0+ bytes


In [30]:
print(type(sr_with_dict_idx))
print(sr_with_dict_idx.index)
print(len(sr_with_dict_idx.index))
print(sr_with_dict_idx.name)
print(sr_with_dict_idx.count())
print(sr_with_dict_idx.dtype)
print(sr_with_dict_idx.memory_usage())

<class 'pandas.core.series.Series'>
Index(['Shirts', 'Trousers', 'Shoes', 'Watches', 'Wallets', 'TShirt', 'Socks',
       'Perfume'],
      dtype='object')
8
None
7
float64
428
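
Note that the index has 8 entries while `count()` reports 7: `count()` skips missing values. The sketch below shows the difference on a three-element Series with made-up values. It also compares shallow and deep memory usage without printing exact byte counts, since those depend on the pandas version and platform.

```python
import numpy as np
import pandas as pd

sr = pd.Series([2.0, 1.0, np.nan], index=["Shirts", "Trousers", "Perfume"])

print(sr.size)           # 3: total number of elements, NaN included
print(len(sr))           # 3: same as size for a Series
print(sr.count())        # 2: non-null values only
print(sr.isna().sum())   # 1: number of missing values

# deep=True also inspects the contents of object data such as string labels
shallow = sr.memory_usage()
deep = sr.memory_usage(deep=True)
print(deep >= shallow)
```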


In [31]:
# extract index
print(sr_with_dict_idx.index)

Index(['Shirts', 'Trousers', 'Shoes', 'Watches', 'Wallets', 'TShirt', 'Socks',
       'Perfume'],
      dtype='object')


In [32]:
# extract values
print(sr_with_dict_idx.values)

[ 2.  1.  1.  2.  3.  3.  2. nan]
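
Besides `.values`, a Series also has a `to_numpy()` method that returns the data as a NumPy array. A brief sketch with invented data; `as_array` and `filled` are names used only in this example.

```python
import numpy as np
import pandas as pd

sr = pd.Series([2.0, 1.0, np.nan], index=["Shirts", "Trousers", "Perfume"])

# The result is a plain NumPy array; the index labels are not carried over
as_array = sr.to_numpy()
print(type(as_array))
print(as_array.dtype)

# NaN propagates through ordinary NumPy sums, so use a NaN-aware function
print(np.nansum(as_array))   # 3.0

# to_numpy() can also substitute a value for missing entries
filled = sr.to_numpy(na_value=0.0)
print(filled)
```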


In [33]:
# num of dimensions
print(sr_with_dict_idx.ndim)

1


In [34]:
# shape
print(sr_with_dict_idx.shape)

(8,)


In [35]:
# size
print(sr_with_dict_idx.size)

8
