
# Pandas for Data Science

Pandas is a powerful library for data manipulation and analysis. It is built on top of NumPy and provides data structures like Series and DataFrame to handle structured data efficiently.

This notebook covers the core Pandas functionality used in everyday data science work, from creating and loading data to cleaning, grouping, merging, and summarizing it.

## Table of Contents
1. [Importing Pandas](#importing-pandas)
2. [Series](#series)
3. [DataFrame](#dataframe)
4. [Reading Data](#reading-data)
5. [Data Inspection](#data-inspection)
6. [Data Selection and Filtering](#data-selection-and-filtering)
7. [Data Cleaning](#data-cleaning)
8. [Data Aggregation and Grouping](#data-aggregation-and-grouping)
9. [Merging and Joining](#merging-and-joining)
10. [Basic Statistics](#basic-statistics)

---



## Importing Pandas

Let's start by importing the pandas library.


In [None]:

import pandas as pd



## Series

A Pandas Series is a one-dimensional labeled array capable of holding any data type. Let's create a Series.


In [None]:

# Creating a Pandas Series
data = [10, 20, 30, 40, 50]
series = pd.Series(data, index=["a", "b", "c", "d", "e"])
print(series)
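
The index labels are what distinguish a Series from a plain list. As a minimal sketch, the same Series can be read by label or by position, and arithmetic applies to every element at once. The variable names below are chosen for illustration only.

```python
import pandas as pd

# Same Series as in the cell above
series = pd.Series([10, 20, 30, 40, 50], index=["a", "b", "c", "d", "e"])

by_label = series.loc["b"]          # label-based lookup
by_position = series.iloc[1]        # position-based lookup (0-indexed)
label_slice = series.loc["b":"d"]   # label slices include both endpoints
doubled = series * 2                # arithmetic is applied element-wise

print(by_label, by_position)
print(label_slice)
print(doubled)
```

Note that `series.loc["b":"d"]` returns three elements, whereas a positional slice such as `series.iloc[1:3]` excludes its end position.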



## DataFrame

A Pandas DataFrame is a two-dimensional labeled data structure with columns of potentially different data types. It's similar to a table in a database or an Excel spreadsheet.

Let's create a DataFrame.


In [None]:

# Creating a DataFrame
data = {
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [24, 27, 22],
    "City": ["New York", "Paris", "London"]
}
df = pd.DataFrame(data)
print(df)
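
To make "columns of potentially different data types" concrete, the sketch below rebuilds the same DataFrame and looks at its parts: each column is itself a Series with its own dtype, and rows get a default integer index when none is supplied.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [24, 27, 22],
    "City": ["New York", "Paris", "London"],
})

age_column = df["Age"]              # a single column is a Series
print(type(age_column).__name__)

# Age is stored with an integer dtype; the text columns use an object
# or string dtype, depending on the pandas version.
print(df.dtypes)

# With no explicit index, rows are labeled 0, 1, 2, ...
print(df.index.tolist())
```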



## Reading Data

Pandas can read data from many file formats, including CSV, Excel, JSON, and Parquet. Here's how to read data from a CSV file.


In [None]:

# Reading data from a CSV file (assuming a CSV file is available)
# df = pd.read_csv('data.csv')  # Uncomment and replace with an actual file path
# print(df.head())
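
If no CSV file is at hand, `pd.read_csv` also accepts a file-like object, so the call can be tried on in-memory text. The sketch below assumes a small hypothetical CSV whose contents mirror the DataFrame created earlier; `csv_text` and `df_csv` are illustrative names.

```python
import io

import pandas as pd

# Hypothetical CSV content, standing in for a file on disk
csv_text = """Name,Age,City
Alice,24,New York
Bob,27,Paris
Charlie,22,London
"""

# io.StringIO wraps the text so read_csv can treat it like an open file
df_csv = pd.read_csv(io.StringIO(csv_text))
print(df_csv.head())
```

The first line of the text is used as the column header, and the `Age` values are parsed as integers.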



## Data Inspection

You can inspect your DataFrame using various methods. Here are some of the most commonly used ones:


In [None]:

# Inspecting the data
print(df.head())      # First 5 rows (or fewer if the DataFrame is shorter)
df.info()             # Prints a summary itself and returns None, so no print() is needed
print(df.describe())  # Descriptive statistics for numeric columns
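
A few attributes complement the methods above. As a brief sketch using the same example data, `shape` and `columns` describe the layout, and `describe(include="all")` extends the summary to non-numeric columns.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [24, 27, 22],
    "City": ["New York", "Paris", "London"],
})

n_rows, n_cols = df.shape                   # (rows, columns)
column_names = list(df.columns)             # column labels in order
full_summary = df.describe(include="all")   # also summarizes text columns

print(n_rows, n_cols)
print(column_names)
print(full_summary)
```

In the combined summary, statistics that do not apply to a column (for example the mean of `Name`) appear as `NaN`.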



## Data Selection and Filtering

You can select and filter specific rows and columns from your DataFrame.


In [None]:

# Selecting a column
print(df["Name"])

# Selecting multiple columns
print(df[["Name", "Age"]])

# Filtering rows based on a condition
filtered_df = df[df["Age"] > 23]
print(filtered_df)
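
Filters can be combined, and `.loc` / `.iloc` select rows and columns in a single step. The following is a minimal sketch on the same example data; the result names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [24, 27, 22],
    "City": ["New York", "Paris", "London"],
})

# Combine conditions with & (and) or | (or); wrap each condition in parentheses
young_or_paris = df[(df["Age"] < 24) | (df["City"] == "Paris")]

# .loc takes a row condition (or labels) and a column name
names_over_23 = df.loc[df["Age"] > 23, "Name"]

# .iloc selects by integer position: first two rows, first two columns
top_left = df.iloc[:2, :2]

print(young_or_paris)
print(names_over_23)
print(top_left)
```

Python's `and` / `or` keywords do not work on whole columns, which is why the element-wise operators `&` and `|` are used here.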



## Data Cleaning

Handling missing or incorrect data is crucial in data science. Pandas provides `isnull()` to detect missing values, `fillna()` to replace them, and `dropna()` to remove the rows that contain them.


In [None]:

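# Counting missing values per column (all zeros here, since df has none)
print(df.isnull().sum())

# Filling missing values with 0 (returns a new DataFrame; df is unchanged)
df_filled = df.fillna(0)

# Dropping rows that contain any missing value (also returns a new DataFrame)
df_dropped = df.dropna()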
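
The `df` built earlier contains no missing values, so the calls above leave it unchanged. To see them take effect, here is a small sketch on a hypothetical DataFrame, `df_missing`, with two gaps introduced on purpose. Filling `Age` with the column mean and `City` with `"Unknown"` is just one possible strategy.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing Age and one missing City
df_missing = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [24, np.nan, 22],
    "City": ["New York", "Paris", None],
})

# Number of missing values in each column
missing_counts = df_missing.isnull().sum()

# A dict passed to fillna sets a different fill value per column
df_filled = df_missing.fillna({
    "Age": df_missing["Age"].mean(),   # mean() skips missing values by default
    "City": "Unknown",
})

# Keep only rows with no missing values at all
df_dropped = df_missing.dropna()

print(missing_counts)
print(df_filled)
print(df_dropped)
```

Only Alice's row survives `dropna()`, which shows how quickly dropping rows can shrink a dataset compared with filling.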



## Data Aggregation and Grouping

You can aggregate data and perform group-based operations in Pandas.


In [None]:

# Grouping data by a column and calculating the mean
# (each city appears once in df, so each group mean equals that person's age)
grouped_df = df.groupby("City")["Age"].mean()
print(grouped_df)
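
Grouping becomes more informative when groups hold more than one row. The sketch below assumes a slightly larger, hypothetical dataset (`people`, with the made-up entries Dana and Evan added) and computes several statistics per city with `agg`.

```python
import pandas as pd

# Hypothetical data: Dana and Evan are added so two cities have two people each
people = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "Dana", "Evan"],
    "Age": [24, 27, 22, 31, 29],
    "City": ["New York", "Paris", "London", "Paris", "London"],
})

# agg accepts a list of aggregation names and returns one column per statistic
city_stats = people.groupby("City")["Age"].agg(["count", "mean", "max"])
print(city_stats)
```

The result is indexed by city, with the group keys sorted alphabetically by default.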



## Merging and Joining

Pandas allows you to merge and join DataFrames, similar to SQL joins.


In [None]:

# Creating another DataFrame
data2 = {
    "Name": ["Alice", "Bob", "David"],
    "Salary": [70000, 80000, 60000]
}
df2 = pd.DataFrame(data2)

# Merging DataFrames: an inner merge keeps only names found in both (Alice and Bob)
merged_df = pd.merge(df, df2, on="Name", how="inner")
print(merged_df)
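
The `how` argument controls which rows survive a merge, much like the join type in SQL. As a sketch using the same two DataFrames, a left merge keeps every row of `df`, and an outer merge keeps rows from both sides. The `indicator=True` option adds a `_merge` column recording where each row came from.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [24, 27, 22],
    "City": ["New York", "Paris", "London"],
})
df2 = pd.DataFrame({
    "Name": ["Alice", "Bob", "David"],
    "Salary": [70000, 80000, 60000],
})

# Left merge: all rows of df; Charlie has no match, so his Salary is NaN
left_df = pd.merge(df, df2, on="Name", how="left")

# Outer merge: rows from both sides; _merge shows the origin of each row
outer_df = pd.merge(df, df2, on="Name", how="outer", indicator=True)

print(left_df)
print(outer_df)
```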



## Basic Statistics

Pandas makes it easy to calculate statistics like mean, median, standard deviation, and more.


In [None]:

# Basic statistics
print(df["Age"].mean())    # Mean of the Age column
print(df["Age"].median())  # Median
print(df["Age"].std())     # Sample standard deviation (ddof=1 by default)
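
One detail worth knowing is that `Series.std()` computes the sample standard deviation (dividing by n - 1), whereas NumPy's `np.std` defaults to the population version (dividing by n). The sketch below compares the two on the same ages; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

ages = pd.Series([24, 27, 22])

sample_std = ages.std()              # pandas default: ddof=1 (divide by n - 1)
population_std = ages.std(ddof=0)    # divide by n instead
numpy_std = np.std(ages.to_numpy())  # NumPy default: ddof=0

print(sample_std, population_std, numpy_std)
```

Passing `ddof=0` makes the Pandas result match NumPy's default, which helps when comparing numbers produced by the two libraries.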
