# Introduction to the Pandas Library

Pandas is a powerful Python library for data manipulation and analysis. It provides data structures like **Series** and **DataFrame** for efficient handling of structured data.

## Key Data Structures

- **Series**: A one-dimensional labeled array capable of holding any data type.
- **DataFrame**: A two-dimensional labeled data structure with columns of potentially different data types.
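To make the two structures concrete, here is a minimal sketch that builds one of each. The names `ages` and `people` and their values are made up for illustration.

```python
import pandas as pd

# A Series: one-dimensional values, each with an index label
ages = pd.Series([23, 34, 29], index=['Alice', 'Bob', 'Charlie'], name='Age')

# A DataFrame: labeled columns, each of which may have its own data type
people = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [23, 34, 29],
})

print(ages['Bob'])          # look up a value by its label
print(people.dtypes)        # one dtype per column
print(type(people['Age']))  # a single DataFrame column is itself a Series
```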

## Key Features

1. **Data Input/Output**: Easily read and write data from various formats (CSV, Excel, SQL databases, etc.).
2. **Data Cleaning**: Handle missing data, duplicates, and outliers effectively.
3. **Data Wrangling**: Reshape, merge, and transform data for analysis.
4. **Data Analysis**: Compute descriptive statistics, perform aggregations, and apply custom functions.
5. **Time Series Analysis**: Work with time-indexed data using date ranges, resampling, and rolling windows.
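As a rough illustration of the cleaning, analysis, and time series features listed above, the sketch below uses a small made-up `sales` table. Its column names and values are assumptions chosen only for this example.

```python
import pandas as pd

# Hypothetical sales records containing one duplicate row and one missing value
sales = pd.DataFrame({
    'date': pd.to_datetime(['2024-01-01', '2024-01-01', '2024-01-02', '2024-01-03']),
    'region': ['East', 'East', 'West', 'East'],
    'amount': [100.0, 100.0, None, 50.0],
})

# Data cleaning: drop exact duplicate rows, then fill the missing amount with 0
clean = sales.drop_duplicates().fillna({'amount': 0.0})

# Data analysis: total amount per region
totals = clean.groupby('region')['amount'].sum()

# Time series: total amount per calendar day
daily = clean.set_index('date')['amount'].resample('D').sum()

print(totals)
print(daily)
```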

Pandas is widely used in data science, machine learning, and various analytical tasks. It simplifies the process of working with data in Python, enabling efficient data exploration and model building.


# Installation

In [None]:
# %pip installs into the environment used by the running kernel
%pip install pandas



# Basic Pandas Commands

Here are some basic pandas commands to help you get started with data manipulation in Python:

## Importing Pandas



In [None]:
import pandas as pd

In [None]:
print(pd.__version__)

2.2.2


## Creating a DataFrame

In [None]:
data = {
    'ID': range(1, 11),
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve',
             'Frank', 'Grace', 'Heidi', 'Ivan', 'Judy'],
    'Age': [23, 34, 29, 45, 31, 38, 27, 22, 36, 41],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston',
             'Phoenix', 'Philadelphia', 'San Antonio', 'San Diego',
             'Dallas', 'San Jose']
}

df = pd.DataFrame(data)
df

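|   | ID | Name | Age | City |
|---|---|---|---|---|
| 0 | 1 | Alice | 23 | New York |
| 1 | 2 | Bob | 34 | Los Angeles |
| 2 | 3 | Charlie | 29 | Chicago |
| 3 | 4 | David | 45 | Houston |
| 4 | 5 | Eve | 31 | Phoenix |
| 5 | 6 | Frank | 38 | Philadelphia |
| 6 | 7 | Grace | 27 | San Antonio |
| 7 | 8 | Heidi | 22 | San Diego |
| 8 | 9 | Ivan | 36 | Dallas |
| 9 | 10 | Judy | 41 | San Jose |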


In [None]:
df.to_csv('data.csv', index=False)


In [None]:
df = pd.read_csv('data.csv')
df

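|   | ID | Name | Age | City |
|---|---|---|---|---|
| 0 | 1 | Alice | 23 | New York |
| 1 | 2 | Bob | 34 | Los Angeles |
| 2 | 3 | Charlie | 29 | Chicago |
| 3 | 4 | David | 45 | Houston |
| 4 | 5 | Eve | 31 | Phoenix |
| 5 | 6 | Frank | 38 | Philadelphia |
| 6 | 7 | Grace | 27 | San Antonio |
| 7 | 8 | Heidi | 22 | San Diego |
| 8 | 9 | Ivan | 36 | Dallas |
| 9 | 10 | Judy | 41 | San Jose |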
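The `index=False` argument used when writing the file matters for the round trip. The sketch below shows what can happen without it, using in-memory text instead of a file; the `people` frame is a made-up example.

```python
import io
import pandas as pd

people = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [23, 34]})

# With no path, to_csv() returns the CSV text; by default it also writes the row labels
csv_with_index = people.to_csv()
csv_without_index = people.to_csv(index=False)

# Reading the indexed text back naively turns the row labels into an extra column
naive = pd.read_csv(io.StringIO(csv_with_index))
print(naive.columns.tolist())

# Either tell read_csv which column holds the index, or write without the index
restored = pd.read_csv(io.StringIO(csv_with_index), index_col=0)
clean = pd.read_csv(io.StringIO(csv_without_index))
print(restored.columns.tolist())
print(clean.columns.tolist())
```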


## Viewing Data

In [None]:
df.head()

|   | ID | Name | Age | City |
|---|---|---|---|---|
| 0 | 1 | Alice | 23 | New York |
| 1 | 2 | Bob | 34 | Los Angeles |
| 2 | 3 | Charlie | 29 | Chicago |
| 3 | 4 | David | 45 | Houston |
| 4 | 5 | Eve | 31 | Phoenix |


In [None]:
df.tail()

|   | ID | Name | Age | City |
|---|---|---|---|---|
| 5 | 6 | Frank | 38 | Philadelphia |
| 6 | 7 | Grace | 27 | San Antonio |
| 7 | 8 | Heidi | 22 | San Diego |
| 8 | 9 | Ivan | 36 | Dallas |
| 9 | 10 | Judy | 41 | San Jose |


In [None]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 4 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   ID      10 non-null     int64 
 1   Name    10 non-null     object
 2   Age     10 non-null     int64 
 3   City    10 non-null     object
dtypes: int64(2), object(2)
memory usage: 448.0+ bytes
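`head()` and `tail()` return five rows by default, and both accept a row count. A short sketch, using a reduced two-column frame as a stand-in for `df`:

```python
import pandas as pd

# Stand-in frame with the same ID and Age values as the example data
people = pd.DataFrame({
    'ID': range(1, 11),
    'Age': [23, 34, 29, 45, 31, 38, 27, 22, 36, 41],
})

first_three = people.head(3)   # first 3 rows instead of the default 5
last_two = people.tail(2)      # last 2 rows

print(people.shape)            # (rows, columns) as a tuple
print(people.columns.tolist()) # column names as a plain list
```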


## Selecting Data

In [None]:
df['Age']

|   | Age |
|---|---|
| 0 | 23 |
| 1 | 34 |
| 2 | 29 |
| 3 | 45 |
| 4 | 31 |
| 5 | 38 |
| 6 | 27 |
| 7 | 22 |
| 8 | 36 |
| 9 | 41 |


In [None]:
df.iloc[0]

|   | 0 |
|---|---|
| ID | 1 |
| Name | Alice |
| Age | 23 |
| City | New York |
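Beyond a single column or a single row, selections are commonly made with a list of column names, with `iloc` (by position), or with `loc` (by label). The sketch below assumes a small stand-in frame named `people`.

```python
import pandas as pd

people = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [23, 34, 29],
    'City': ['New York', 'Los Angeles', 'Chicago'],
})

# A list of column names returns a DataFrame with just those columns
subset = people[['Name', 'Age']]

# iloc selects by integer position; the end of a slice is excluded
first_two = people.iloc[0:2]

# loc selects by label: row label 1, column 'Age'
bob_age = people.loc[1, 'Age']

# Labels need not be integers once another column becomes the index
by_name = people.set_index('Name')
charlie_city = by_name.loc['Charlie', 'City']
```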


## Filtering Data

In [None]:
filtered_df = df[df['Age'] > 30]
filtered_df

|   | ID | Name | Age | City |
|---|---|---|---|---|
| 1 | 2 | Bob | 34 | Los Angeles |
| 3 | 4 | David | 45 | Houston |
| 4 | 5 | Eve | 31 | Phoenix |
| 5 | 6 | Frank | 38 | Philadelphia |
| 8 | 9 | Ivan | 36 | Dallas |
| 9 | 10 | Judy | 41 | San Jose |
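Filters can also combine several conditions. The following sketch, on a made-up `people` frame, shows three common forms.

```python
import pandas as pd

people = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [23, 34, 29, 45],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston'],
})

# Combine conditions with & (and) or | (or); wrap each condition in parentheses
thirties = people[(people['Age'] >= 30) & (people['Age'] < 40)]

# isin() keeps rows whose value appears in the given list
coastal = people[people['City'].isin(['New York', 'Los Angeles'])]

# query() expresses a filter as a string
over_30 = people.query('Age > 30')
```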


## Deleting Columns

In [None]:
# Drop the 'City' column; reassigning the result avoids inplace=True
df = df.drop(columns='City')
df

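|   | ID | Name | Age |
|---|---|---|---|
| 0 | 1 | Alice | 23 |
| 1 | 2 | Bob | 34 |
| 2 | 3 | Charlie | 29 |
| 3 | 4 | David | 45 |
| 4 | 5 | Eve | 31 |
| 5 | 6 | Frank | 38 |
| 6 | 7 | Grace | 27 |
| 7 | 8 | Heidi | 22 |
| 8 | 9 | Ivan | 36 |
| 9 | 10 | Judy | 41 |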


## Aggregating Data

In [None]:
df.describe()

|   | ID | Age |
|---|---|---|
| count | 10.0 | 10.0 |
| mean | 5.5 | 32.6 |
| std | 3.02765 | 7.589466 |
| min | 1.0 | 22.0 |
| 25% | 3.25 | 27.5 |
| 50% | 5.5 | 32.5 |
| 75% | 7.75 | 37.5 |
| max | 10.0 | 45.0 |
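`describe()` summarizes every numeric column at once. For a single statistic, or for statistics per group, a sketch along these lines may help; the `Team` column is a hypothetical grouping column that is not part of the example data.

```python
import pandas as pd

staff = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Team': ['A', 'B', 'A', 'B'],   # hypothetical grouping column
    'Age': [23, 34, 29, 45],
})

# A single statistic for one column
mean_age = staff['Age'].mean()

# Several statistics per group: one row per team, one column per statistic
summary = staff.groupby('Team')['Age'].agg(['count', 'mean', 'max'])
print(summary)
```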


## Sorting

In [None]:
sorted_df = df.sort_values(by='Age', ascending=False)
sorted_df

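|   | ID | Name | Age |
|---|---|---|---|
| 3 | 4 | David | 45 |
| 9 | 10 | Judy | 41 |
| 5 | 6 | Frank | 38 |
| 8 | 9 | Ivan | 36 |
| 1 | 2 | Bob | 34 |
| 4 | 5 | Eve | 31 |
| 2 | 3 | Charlie | 29 |
| 6 | 7 | Grace | 27 |
| 0 | 1 | Alice | 23 |
| 7 | 8 | Heidi | 22 |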
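Sorting keeps the original row labels, which is why the index above is out of order. The sketch below sorts a made-up frame by two columns and then renumbers the rows.

```python
import pandas as pd

people = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'City': ['Chicago', 'Boston', 'Chicago', 'Boston'],
    'Age': [23, 34, 29, 45],
})

# Sort by City ascending, then by Age descending within each city
ranked = people.sort_values(by=['City', 'Age'], ascending=[True, False])

# reset_index(drop=True) replaces the old labels with 0, 1, 2, ...
ranked = ranked.reset_index(drop=True)
print(ranked)
```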


## Handling Missing Data

In [None]:
df.isnull().sum()

|   | 0 |
|---|---|
| ID | 0 |
| Name | 0 |
| Age | 0 |


In [None]:
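# Option 1: drop every row that contains a missing value
df = df.dropna()

In [None]:
# Option 2 (alternative to dropping): replace missing values with 0
df = df.fillna(value=0)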
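Because `df` has no missing values, the calls above leave it unchanged. The sketch below uses a small made-up frame with one missing age to show what each call does.

```python
import pandas as pd

# None in a numeric column is stored as NaN (missing)
people = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [23, None, 29],
})

# Count missing values per column
missing_per_column = people.isnull().sum()

# Drop rows with any missing value
dropped = people.dropna()

# Or fill the gap, here with the mean of the known ages
filled = people.fillna({'Age': people['Age'].mean()})

print(missing_per_column)
print(dropped)
print(filled)
```

Filling with the column mean is only one possible choice; a constant such as 0 or the median may suit other data better.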