# Pandas: Essential Data Manipulation Library

Pandas is a powerful Python library for data manipulation and analysis. It provides:

- High-performance, easy-to-use data structures (DataFrame and Series)
- Tools for reading/writing various file formats (CSV, Excel, SQL databases)
- Data alignment and integrated handling of missing data
- Reshaping, merging, and pivoting data
- Robust time series functionality

Key features that make it essential for Data Science:
1. Efficient, vectorized handling of tabular datasets that fit in memory
2. Built-in plotting through Matplotlib (`DataFrame.plot`)
3. Powerful data aggregation and transformation operations
4. Seamless integration with other scientific Python libraries
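To make the "data alignment and integrated handling of missing data" point above concrete, here is a minimal sketch. The two Series and their labels are made up for illustration; they are not part of the sample data used in the rest of this notebook.

```python
import pandas as pd

# Two Series with partially overlapping index labels (illustrative data)
s1 = pd.Series([1.0, 2.0, 3.0], index=['a', 'b', 'c'])
s2 = pd.Series([10.0, 20.0, 30.0], index=['b', 'c', 'd'])

# Arithmetic aligns on index labels, not on position.
# Labels present in only one Series produce NaN in the result.
total = s1 + s2

# Series.add with fill_value treats a missing operand as 0 instead
total_filled = s1.add(s2, fill_value=0)

print(total)
print(total_filled)
```

Only the labels `b` and `c` exist in both Series, so the plain sum is defined only there, while the `fill_value` variant keeps all four labels.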

In [15]:
import pandas as pd

# Create a sample list of dictionaries
data = [
    {'name': 'John', 'age': 25, 'city': 'New York'},
    {'name': 'Alice', 'age': 30, 'city': 'San Francisco'},
    {'name': 'Bob', 'age': 35, 'city': 'Chicago'}
]

data.extend([
    {'name': 'Emma', 'age': 28, 'city': 'Boston'},
    {'name': 'David', 'age': 32, 'city': 'Seattle'}
])

# Convert list to DataFrame
df = pd.DataFrame(data)

# Display the DataFrame
df

    name  age           city
0   John   25       New York
1  Alice   30  San Francisco
2    Bob   35        Chicago
3   Emma   28         Boston
4  David   32        Seattle


In [9]:
# Save DataFrame to CSV
df.to_csv('sample_data.csv', index=False)

# Read back the CSV file
df_loaded = pd.read_csv('sample_data.csv')

# Display the loaded DataFrame
print("\nLoaded DataFrame:")
df_loaded


Loaded DataFrame:


    name  age           city
0   John   25       New York
1  Alice   30  San Francisco
2    Bob   35        Chicago
3   Emma   28         Boston
4  David   32        Seattle
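The `index=False` argument in the cell above matters for the round trip. The sketch below shows why, under the assumption that an in-memory `io.StringIO` buffer is an acceptable stand-in for `sample_data.csv` (it avoids touching the disk). The two-row frame is a shortened copy of the sample data.

```python
import io
import pandas as pd

df = pd.DataFrame([
    {'name': 'John', 'age': 25, 'city': 'New York'},
    {'name': 'Alice', 'age': 30, 'city': 'San Francisco'},
])

# Round trip without the index: columns come back unchanged
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)
df_roundtrip = pd.read_csv(buffer)

# Round trip with the index (the to_csv default): the index is written
# as an unnamed first column, which read_csv labels 'Unnamed: 0'
buffer_with_index = io.StringIO()
df.to_csv(buffer_with_index)
buffer_with_index.seek(0)
df_extra = pd.read_csv(buffer_with_index)

print(df_roundtrip.columns.tolist())
print(df_extra.columns.tolist())
```

If the index does need to be stored, passing `index_col=0` to `pd.read_csv` restores it as the index rather than as a data column.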


In [None]:
# Basic DataFrame information
print("Basic DataFrame Info:")
df.info()  # info() prints its report directly and returns None
print("\nDataFrame Description:")
print(df.describe())


In [10]:
# Basic data exploration
print("\nColumn Names:", df.columns.tolist())
print("\nFirst 2 rows:")
print(df.head(2))
print("\nLast 2 rows:")
print(df.tail(2))



Column Names: ['name', 'age', 'city']

First 2 rows:
    name  age           city
0   John   25       New York
1  Alice   30  San Francisco

Last 2 rows:
    name  age     city
3   Emma   28   Boston
4  David   32  Seattle


In [11]:
# Basic statistics and aggregations
print("\nAge Statistics:")
print("Average age:", df['age'].mean())
print("Minimum age:", df['age'].min())
print("Maximum age:", df['age'].max())



Age Statistics:
Average age: 30.0
Minimum age: 25
Maximum age: 35
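The three separate calls above can also be collapsed into a single `agg` call. This is a small sketch that rebuilds only the `name` and `age` columns of the sample data so it runs on its own.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['John', 'Alice', 'Bob', 'Emma', 'David'],
    'age': [25, 30, 35, 28, 32],
})

# Compute several statistics in one pass; the result is a Series
# indexed by the names of the aggregation functions
age_stats = df['age'].agg(['mean', 'min', 'max'])

print(age_stats)
```

Because the result is one Series, all values share a single dtype, so the minimum and maximum are shown as floats here rather than as integers.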


In [12]:
# Filtering and sorting
print("\nPeople older than 30:")
print(df[df['age'] > 30])

print("\nSorted by age (descending):")
print(df.sort_values('age', ascending=False))



People older than 30:
    name  age     city
2    Bob   35  Chicago
4  David   32  Seattle

Sorted by age (descending):
    name  age           city
2    Bob   35        Chicago
4  David   32        Seattle
1  Alice   30  San Francisco
3   Emma   28         Boston
0   John   25       New York
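Filters can be combined as well. The sketch below rebuilds the sample data and shows two equivalent ways to apply two conditions at once; the specific condition (older than 30 and not in Chicago) is chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['John', 'Alice', 'Bob', 'Emma', 'David'],
    'age': [25, 30, 35, 28, 32],
    'city': ['New York', 'San Francisco', 'Chicago', 'Boston', 'Seattle'],
})

# Boolean masks are combined with & (and), | (or), ~ (not).
# Each condition needs its own parentheses because & binds tighter than >.
older_not_chicago = df[(df['age'] > 30) & (df['city'] != 'Chicago')]

# The same filter written as a query string
same_via_query = df.query("age > 30 and city != 'Chicago'")

print(older_not_chicago)
```

Using Python's `and` / `or` keywords directly on Series raises a `ValueError`, which is why the mask form uses `&` and `|`.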


In [13]:
# Value counts
print("\nCity distribution:")
print(df['city'].value_counts())



City distribution:
city
New York         1
San Francisco    1
Chicago          1
Boston           1
Seattle          1
Name: count, dtype: int64
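When proportions are more useful than raw counts, `value_counts` accepts `normalize=True`. A minimal sketch on the same five cities:

```python
import pandas as pd

cities = pd.Series(
    ['New York', 'San Francisco', 'Chicago', 'Boston', 'Seattle'],
    name='city',
)

# normalize=True returns each value's share of the total instead of a count
city_share = cities.value_counts(normalize=True)

print(city_share)
```

Every city appears exactly once in the sample data, so each share is one fifth.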


In [14]:
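# Basic data transformation: bin ages into labeled groups.
# pd.cut intervals are right-inclusive by default: (20, 25], (25, 30], ...
df['age_group'] = pd.cut(df['age'],
                         bins=[20, 25, 30, 35, 40],
                         labels=['20-25', '26-30', '31-35', '36-40'])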
print("\nDataFrame with age groups:")
df


DataFrame with age groups:


    name  age           city age_group
0   John   25       New York     20-25
1  Alice   30  San Francisco     26-30
2    Bob   35        Chicago     31-35
3   Emma   28         Boston     26-30
4  David   32        Seattle     31-35
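The group a boundary age such as 25 or 30 falls into depends on which side of each bin is closed. The sketch below uses the same ages and bin edges as above, drops the labels so the actual intervals are visible, and compares the default with `right=False`.

```python
import pandas as pd

ages = pd.Series([25, 30, 35, 28, 32])
bins = [20, 25, 30, 35, 40]

# Default (right=True): each bin includes its right edge, e.g. (20, 25]
right_closed = pd.cut(ages, bins=bins)

# right=False: each bin includes its left edge instead, e.g. [25, 30)
left_closed = pd.cut(ages, bins=bins, right=False)

comparison = pd.DataFrame({
    'age': ages,
    'right_closed': right_closed,
    'left_closed': left_closed,
})
print(comparison)
```

With the default, age 25 lands in the first bin, which matches the `20-25` label in the output above. With `right=False` the same age would move up to the next bin, so the labels would have to be adjusted accordingly.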
