# Pandas Series – Session Outline

In this session, we will cover the following topics:

1. **What is Pandas?**  
   - Introduction to the Pandas library and its importance in data analysis.

2. **Introduction to Pandas Series**  
   - Understanding Series as a one-dimensional data structure.
   - How to create and access Series.

3. **Series Methods**  
   - Commonly used methods like `head()`, `tail()`, `unique()`, `value_counts()`, and more.

4. **Series Math Methods**  
   - Performing mathematical operations on Series: `sum()`, `mean()`, `max()`, `min()`, etc.

5. **Series with Python Functionalities**  
   - Using built-in Python functions with Series.
   - Applying custom functions using `.apply()`.

6. **Boolean Indexing on Series**  
   - Filtering Series data using conditions.

7. **Plotting Graphs on Series**  
   - Visualizing Series data using Matplotlib and Pandas built-in plotting features.


## What is Pandas?

- **Pandas** is a Python library used for working with data.  
- It is built on top of **NumPy** and is mostly used for **data cleaning, analysis, and manipulation**.  
- Think of Pandas as **Excel for Python – but much more powerful!**
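Because Pandas is built on NumPy, a Series keeps its values in a NumPy array, and arithmetic is applied to every element at once. The minimal sketch below uses made-up numbers to show both points.

```python
import numpy as np
import pandas as pd

# A small Series of illustrative values
prices = pd.Series([100, 250, 80])

# The underlying data is a NumPy array
arr = prices.to_numpy()
print(type(arr))          # <class 'numpy.ndarray'>

# Vectorized arithmetic: every element is doubled without a Python loop
doubled = prices * 2
print(doubled.tolist())   # [200, 500, 160]
```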

## Why was Pandas created?

Before Pandas, handling large and messy datasets in Python was **hard and slow**.  
Pandas was created to:

- **Load data easily** from Excel, CSV, SQL, and other sources.  
- **Organize data** into rows and columns (like tables).  
- **Analyze data quickly** using built-in functions.  
- **Handle missing or incorrect data** efficiently.
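As a small illustration of loading data, `pd.read_csv` accepts a file path or any file-like object. This sketch assumes a hypothetical CSV held in memory through `io.StringIO`, so it runs without a file on disk. Selecting one column of the resulting table gives a Series.

```python
import io
import pandas as pd

# Hypothetical CSV content; with a real file you would pass a path instead
csv_text = """name,marks
Amit,85
Sara,90
Ravi,78
"""

df = pd.read_csv(io.StringIO(csv_text))

# Selecting a single column of the table returns a Series
marks = df["marks"]
print(marks.mean())
```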

## Key Features of Pandas

- **Fast and Efficient** – Vectorized operations (built on NumPy) handle large datasets quickly.  
- **Two Main Data Structures:**  
  - **Series** – like a single column (1D).  
  - **DataFrame** – like a table with rows & columns (2D).  
- **Handles Missing Data** – You can fill, drop, or replace missing values.  
- **Data Analysis** – Find sum, average, min, max, etc.  
- **Powerful Indexing** – Access data by labels or conditions.  
- **Data Cleaning** – Remove duplicates, fix data types, etc.  
- **Data Visualization** – Works well with Matplotlib & Seaborn for charts.  
- **Supports Many Data Formats** – CSV, Excel, JSON, SQL databases, etc.
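Here are a few of the cleaning features in action. `fillna`, `dropna`, `drop_duplicates`, and `astype` are standard Series methods. The data is made up for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative data with one missing value and one duplicate
s = pd.Series([10, np.nan, 30, 30])

filled = s.fillna(0)               # replace NaN with 0
dropped = s.dropna()               # remove NaN entries
unique_vals = s.drop_duplicates()  # keep the first occurrence of each value
as_int = filled.astype("int64")    # fix the data type once NaN is gone

print(filled.tolist())    # [10.0, 0.0, 30.0, 30.0]
print(dropped.tolist())   # [10.0, 30.0, 30.0]
print(unique_vals.tolist())
print(as_int.dtype)       # int64
```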

## Real-Life Example of Pandas

Imagine you are a **teacher** and have student marks in an Excel file.

With **Pandas**, you can:

- **Read** that Excel file into Python.  
- **Find average marks** of all students.  
- **Sort students** by marks.  
- **Find top 3 students.**  
- **Draw a graph** of marks.
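The teacher scenario can be sketched as follows. The student names and marks are hypothetical. With a real file you might start from `pd.read_excel(...)`, which needs an Excel engine such as openpyxl installed. This sketch builds the Series directly so that it runs on its own.

```python
import pandas as pd

# Hypothetical marks (in practice these might come from pd.read_excel on a real file)
marks = pd.Series({"Amit": 85, "Sara": 90, "Ravi": 78, "Neha": 95, "Karan": 66})

average = marks.mean()                       # average marks
ranked = marks.sort_values(ascending=False)  # highest marks first
top3 = marks.nlargest(3)                     # top 3 students

print(average)                # 82.8
print(ranked.index.tolist())  # ['Neha', 'Sara', 'Amit', 'Ravi', 'Karan']
print(top3.index.tolist())    # ['Neha', 'Sara', 'Amit']
```

For the graph step, `marks.plot(kind="bar")` draws a bar chart when Matplotlib is installed.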


In [17]:
# !pip show pandas
import numpy as np
import pandas as pd

In [18]:
pandas_version = pd.__version__
pandas_version

'2.3.1'

## What is a Pandas Series?
- A **Series** is like a **single column of data** (1-dimensional).
- It is **similar to a list or array**, but more powerful because it has **labels (indexes)**.
## Understanding Series as a One-Dimensional Data Structure
Think of a **Series** like a single column in Excel.
A Series has:
- **Data** (the values).  
- **Index** (the labels for each value; by default 0, 1, 2, ..., like row numbers).
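To see the two parts separately, you can pass your own labels through the `index` parameter. The optional `name` labels the Series itself, much like a column header. A small sketch with illustrative values:

```python
import pandas as pd

# Custom labels instead of the default 0, 1, 2, ...
s = pd.Series([85, 90, 78], index=["Amit", "Sara", "Ravi"], name="marks")

print(s.values)        # the data: [85 90 78]
print(list(s.index))   # the labels: ['Amit', 'Sara', 'Ravi']
print(s.name)          # marks
```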

## How to Create a Series?

You can create a **Series** in many ways, such as:

1. **From a Python list**
2. **From a NumPy array**
3. **From a dictionary**
4. **Using scalar values**


In [19]:
# From a Python list
marks = [89, 56, 81, 94, 98]
s = pd.Series(marks)

# From a NumPy array
marks = np.array([89, 56, 81, 94, 98])
s = pd.Series(marks)

# From a dictionary
# You can create a Series from a Python dictionary where the keys become the index and values become the data.
marks = {'Math': 90, 'Science': 85, 'English': 88}
s = pd.Series(marks)

# Using scalar values
# You can create a Series from a single scalar value; pass an index and the value is repeated for every label.
s = pd.Series(5, index=['a', 'b', 'c'])
s

a    5
b    5
c    5
dtype: int64
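When you build a Series from a dictionary, you can also pass `index` to choose and reorder the keys. A label that is missing from the dictionary becomes `NaN`, and that changes the dtype to `float64`. In this sketch, `"Art"` is an illustrative label that is deliberately absent from the dictionary.

```python
import pandas as pd

marks = {"Math": 90, "Science": 85, "English": 88}

# Pick and reorder keys; "Art" is not in the dict, so it becomes NaN
s = pd.Series(marks, index=["Science", "Math", "Art"])
print(s)
```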

## How to Access Data in Series?
### Using Position (Index Number)

In [20]:
data = {'Amit': 85, 'Sara': 90, 'Ravi': 78}
s = pd.Series(data)
print(s.iloc[0])   # position-based access; plain s[0] on a labelled index emits a FutureWarning in pandas 2.x

85




### Using Label Name

In [21]:
s["Amit"]

np.int64(85)
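The explicit accessors make your intent clear. `.iloc` always works by position and `.loc` always works by label. `.get` returns a default value instead of raising a `KeyError` when a label is absent. In this sketch, `"Zara"` is a hypothetical name that is not in the Series.

```python
import pandas as pd

s = pd.Series({"Amit": 85, "Sara": 90, "Ravi": 78})

by_position = s.iloc[0]     # first element, by position
by_label = s.loc["Amit"]    # element by label
missing = s.get("Zara", 0)  # default returned because "Zara" is not a label

print(by_position, by_label, missing)   # 85 85 0
```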

### Slicing (Like Python Lists)

In [22]:
print(s.iloc[0:2])   # First two elements, by position

Amit    85
Sara    90
dtype: int64
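One difference is worth knowing. Slicing by position excludes the end, as with Python lists, but slicing by label includes the end label. A quick check on the same illustrative data:

```python
import pandas as pd

s = pd.Series({"Amit": 85, "Sara": 90, "Ravi": 78})

# Position slice: the end position is excluded
print(s.iloc[0:2].index.tolist())          # ['Amit', 'Sara']

# Label slice: the end label is included
print(s.loc["Amit":"Ravi"].index.tolist()) # ['Amit', 'Sara', 'Ravi']
```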


## Basic Information Methods
These methods and attributes help you understand your data quickly:

In [38]:
s = pd.Series([10, 20, 30, 40, 50, 60, 70, 80, 90, 100, np.nan])
s

0      10.0
1      20.0
2      30.0
3      40.0
4      50.0
5      60.0
6      70.0
7      80.0
8      90.0
9     100.0
10      NaN
dtype: float64
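The output shows `10.0`, `20.0`, and so on, with `dtype: float64`, even though the inputs were integers. This happens because `np.nan` is a float, and a default integer Series cannot hold it. If you want to keep whole numbers alongside missing values, pandas offers the nullable `"Int64"` dtype (capital I), which displays missing entries as `<NA>`.

```python
import numpy as np
import pandas as pd

plain = pd.Series([10, 20, np.nan])                  # NaN forces float64
nullable = pd.Series([10, 20, None], dtype="Int64")  # nullable integer dtype

print(plain.dtype)                # float64
print(nullable.dtype)             # Int64
print(nullable.isna().tolist())   # [False, False, True]
```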

In [39]:
# Shows the first 5 elements.
s.head(5)

0    10.0
1    20.0
2    30.0
3    40.0
4    50.0
dtype: float64

In [40]:
# Shows the last 5 elements.
s.tail(5)

6      70.0
7      80.0
8      90.0
9     100.0
10      NaN
dtype: float64

In [41]:
# Shows the index values.
s.index

RangeIndex(start=0, stop=11, step=1)

In [42]:
# Shows the data values.
s.values

array([ 10.,  20.,  30.,  40.,  50.,  60.,  70.,  80.,  90., 100.,  nan])

In [43]:
# Shows data type (e.g., int64, float64).
s.dtype

dtype('float64')

In [44]:
# Number of elements, including NaN.
s.size

11

In [45]:
# Shape (like (5,) for 5 elements).
s.shape

(11,)

In [46]:
# Checks for missing values (True/False).
s.isnull()

0     False
1     False
2     False
3     False
4     False
5     False
6     False
7     False
8     False
9     False
10     True
dtype: bool

In [47]:
# Opposite of isnull().
s.notnull()

0      True
1      True
2      True
3      True
4      True
5      True
6      True
7      True
8      True
9      True
10    False
dtype: bool
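`isnull()` returns a boolean Series, so you can sum it to count missing values (each `True` counts as 1). You can also use `notnull()` as a mask to keep only the values that are present. A sketch on the same data:

```python
import numpy as np
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50, 60, 70, 80, 90, 100, np.nan])

missing_count = s.isnull().sum()   # True counts as 1
present = s[s.notnull()]           # boolean mask keeps non-missing values

print(missing_count)   # 1
print(len(present))    # 10
```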

## Math & Stats Methods
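These methods calculate summary statistics. By default they skip missing values (`NaN`), so the results below are based on the 10 non-missing values: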

In [48]:
# Adds all values
s.sum()

np.float64(550.0)

In [49]:
# Average value
s.mean()

np.float64(55.0)

In [51]:
# Middle value (the average of the two middle values when the count is even)
s.median()

np.float64(55.0)

In [52]:
# Biggest number
s.max()

np.float64(100.0)

In [53]:
# Smallest number
s.min()

np.float64(10.0)

In [54]:
# Standard deviation (spread of data); sample formula with ddof=1 by default
s.std()

np.float64(30.276503540974915)
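By default `std()` uses the sample formula (`ddof=1`, which divides by n - 1). Passing `ddof=0` gives the population standard deviation instead. In both cases `std()` is the square root of `var()` computed with the same `ddof`. A quick check on the same data:

```python
import math
import numpy as np
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50, 60, 70, 80, 90, 100, np.nan])

sample_std = s.std()             # ddof=1: divide by (n - 1); NaN is skipped
population_std = s.std(ddof=0)   # ddof=0: divide by n

print(math.isclose(sample_std, math.sqrt(s.var())))   # True
print(population_std)
```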

In [55]:
# Counts non-null values.
s.count()

np.int64(10)
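This sketch confirms the skipping behaviour. `mean()` equals `sum() / count()` (550 / 10), and passing `skipna=False` lets the missing value propagate instead.

```python
import numpy as np
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50, 60, 70, 80, 90, 100, np.nan])

print(s.mean() == s.sum() / s.count())   # True: 550 / 10
print(s.sum(skipna=False))               # nan: NaN propagates
print(s.mean(skipna=False))              # nan
```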