# Pandas: Essential Data Manipulation Library #
Pandas is a powerful Python library for data manipulation and analysis. It provides:

* High-performance, easy-to-use data structures (DataFrame and Series)
* Tools for reading/writing various file formats (CSV, Excel, SQL databases)
* Data alignment and integrated handling of missing data
* Reshaping, merging, and pivoting data
* Robust time series functionality
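As a rough illustration of the data alignment and missing-data handling listed above, the sketch below adds two Series whose labels only partly overlap. The Series names and values are made up for this example.

```python
import pandas as pd

# Hypothetical values; the two Series share only the labels "y" and "z"
a = pd.Series([1.0, 2.0, 3.0], index=["x", "y", "z"])
b = pd.Series([10.0, 20.0], index=["y", "z"])

# Arithmetic aligns on index labels; "x" has no partner in b, so it becomes NaN
total = a + b

# fill_value treats a label missing from one side as 0 instead of NaN
filled = a.add(b, fill_value=0)

print(total)
print(filled)
```

Alignment happens by label rather than by position, which is why `b` can be shorter than `a` without raising an error.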

Key features that make it essential for Data Science:

1. Efficient handling of large in-memory datasets
2. Built-in plotting through Matplotlib (`DataFrame.plot`)
3. Powerful data aggregation and transformation operations
4. Seamless integration with other scientific Python libraries
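The aggregation and transformation operations in point 3 might look like the following minimal sketch. The `sales` table and its column names are hypothetical and exist only for this illustration.

```python
import pandas as pd

# Hypothetical sample data, not taken from any real dataset
sales = pd.DataFrame({
    "region": ["east", "west", "east", "west"],
    "units": [10, 4, 6, 8],
})

# Aggregation: one row per region with the total and mean of units
summary = sales.groupby("region")["units"].agg(["sum", "mean"])

# Transformation: keep the original shape and compute each row's share
# of its region's total
region_total = sales.groupby("region")["units"].transform("sum")
sales["share"] = sales["units"] / region_total

print(summary)
print(sales)
```

`agg` collapses each group to a single row, while `transform` returns a result with the same length as the input, so it can be assigned back as a new column.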

In [1]:
import pandas as pd

# Create a sample list of dictionaries
data = [
    {'name': 'Prajwal', 'age': 21, 'city': 'lalitpur'},
    {'name': 'Anil', 'age': 20, 'city': 'Chapagau'},
    {'name': 'Luv', 'age': 22, 'city': 'Koteshwor'}
]

data.extend([
    {'name': 'Bibek', 'age': 22, 'city': 'Kathmandu'},
    {'name': 'Kritika', 'age': 21, 'city': 'Baneshwor'}
])

# Convert list to DataFrame
df = pd.DataFrame(data)

# Display the DataFrame
df

      name  age       city
0  Prajwal   21   lalitpur
1     Anil   20   Chapagau
2      Luv   22  Koteshwor
3    Bibek   22  Kathmandu
4  Kritika   21  Baneshwor


In [2]:
# Save DataFrame to CSV
df.to_csv('sample_data.csv', index=False)

# Read back the CSV file
df_loaded = pd.read_csv('sample_data.csv')

# Display the loaded DataFrame
print("\nLoaded DataFrame:")
df_loaded


Loaded DataFrame:


      name  age       city
0  Prajwal   21   lalitpur
1     Anil   20   Chapagau
2      Luv   22  Koteshwor
3    Bibek   22  Kathmandu
4  Kritika   21  Baneshwor
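The `index=False` argument in the cell above matters for the round trip. The sketch below, which uses an in-memory buffer instead of a file and a shortened two-row frame, shows what happens with and without it.

```python
import io

import pandas as pd

df_small = pd.DataFrame({"name": ["Prajwal", "Anil"], "age": [21, 20]})

# With the default index=True, the row labels are written as a first
# column with an empty header
csv_with_index = df_small.to_csv()
cols_with_index = pd.read_csv(io.StringIO(csv_with_index)).columns.tolist()

# With index=False, only the real columns are written
csv_without_index = df_small.to_csv(index=False)
cols_without_index = pd.read_csv(io.StringIO(csv_without_index)).columns.tolist()

print(cols_with_index)
print(cols_without_index)
```

If a file was already written with the index, passing `index_col=0` to `pd.read_csv` is one way to read that first column back as the index instead of as data.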


In [3]:
# Basic DataFrame information
print("Basic DataFrame Info:")
df.info()  # info() prints its report directly and returns None, so no print() wrapper is needed
print("\nDataFrame Description:")
print(df.describe())

Basic DataFrame Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   name    5 non-null      object
 1   age     5 non-null      int64 
 2   city    5 non-null      object
dtypes: int64(1), object(2)
memory usage: 252.0+ bytes

DataFrame Description:
            age
count   5.00000
mean   21.20000
std     0.83666
min    20.00000
25%    21.00000
50%    21.00000
75%    22.00000
max    22.00000
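The `std` row in `describe()` is the sample standard deviation (divisor n - 1), not the population one. A small check using the same five ages:

```python
import pandas as pd

ages = pd.Series([21, 20, 22, 22, 21], name="age")

# Series.std() uses ddof=1 by default, matching describe()
sample_std = ages.std()

# ddof=0 divides by n instead of n - 1
population_std = ages.std(ddof=0)

print("sample std:", round(sample_std, 5))
print("population std:", round(population_std, 5))
```

The sum of squared deviations from the mean of 21.2 is 2.8, so the sample value is sqrt(2.8 / 4) and the population value is sqrt(2.8 / 5).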


In [4]:
# Basic data exploration
print("\nColumn Names:", df.columns.tolist())
print("\nFirst 2 rows:")
print(df.head(2))
print("\nLast 2 rows:")
print(df.tail(2))


Column Names: ['name', 'age', 'city']

First 2 rows:
      name  age      city
0  Prajwal   21  lalitpur
1     Anil   20  Chapagau

Last 2 rows:
      name  age       city
3    Bibek   22  Kathmandu
4  Kritika   21  Baneshwor


In [5]:
# Basic statistics and aggregations
print("\nAge Statistics:")
print("Average age:", df['age'].mean())
print("Minimum age:", df['age'].min())
print("Maximum age:", df['age'].max())


Age Statistics:
Average age: 21.2
Minimum age: 20
Maximum age: 22


In [6]:
# Filtering and sorting
# Note: nobody in this sample is older than 30, so the filter
# below returns an empty DataFrame
print("\nPeople older than 30:")
print(df[df['age'] > 30])

print("\nSorted by age (descending):")
print(df.sort_values('age', ascending=False))



People older than 30:
Empty DataFrame
Columns: [name, age, city]
Index: []

Sorted by age (descending):
      name  age       city
2      Luv   22  Koteshwor
3    Bibek   22  Kathmandu
0  Prajwal   21   lalitpur
4  Kritika   21  Baneshwor
1     Anil   20   Chapagau
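Filters can also combine several conditions, and sorting can use more than one key. The sketch below rebuilds the same sample data; the thresholds and the excluded city are arbitrary choices for illustration.

```python
import pandas as pd

df = pd.DataFrame([
    {'name': 'Prajwal', 'age': 21, 'city': 'lalitpur'},
    {'name': 'Anil', 'age': 20, 'city': 'Chapagau'},
    {'name': 'Luv', 'age': 22, 'city': 'Koteshwor'},
    {'name': 'Bibek', 'age': 22, 'city': 'Kathmandu'},
    {'name': 'Kritika', 'age': 21, 'city': 'Baneshwor'},
])

# Combine boolean masks with & (and) or | (or); each condition needs
# its own parentheses because of operator precedence
older_not_ktm = df[(df['age'] > 20) & (df['city'] != 'Kathmandu')]

# Sort by age descending, then by name ascending to break ties
ranked = df.sort_values(['age', 'name'], ascending=[False, True])

print(older_not_ktm)
print(ranked)
```

Both operations return new DataFrames and keep the original index labels, which is why the sorted output above shows the index out of order.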


In [7]:
# Value counts
print("\nCity distribution:")
print(df['city'].value_counts())


City distribution:
city
lalitpur     1
Chapagau     1
Koteshwor    1
Kathmandu    1
Baneshwor    1
Name: count, dtype: int64
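Every city in the sample is unique, so each count is 1. `value_counts()` is more informative on a column with repeated values, such as the ages. A short sketch:

```python
import pandas as pd

ages = pd.Series([21, 20, 22, 22, 21], name='age')

# Absolute counts per distinct value
counts = ages.value_counts()

# normalize=True returns fractions of the total instead of counts
shares = ages.value_counts(normalize=True)

print(counts)
print(shares)
```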


In [8]:
# Basic data transformation
# include_lowest=True keeps age 20 (the first bin edge) inside the '20-25' bin;
# by default pd.cut intervals exclude their left edge, which would leave it as NaN
df['age_group'] = pd.cut(df['age'],
                        bins=[20, 25, 30, 35, 40],
                        labels=['20-25', '26-30', '31-35', '36-40'],
                        include_lowest=True)
print("\nDataFrame with age groups:")
df


DataFrame with age groups:


      name  age       city age_group
0  Prajwal   21   lalitpur     20-25
1     Anil   20   Chapagau     20-25
2      Luv   22  Koteshwor     20-25
3    Bibek   22  Kathmandu     20-25
4  Kritika   21  Baneshwor     20-25
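`pd.cut` builds right-closed intervals by default, so a value equal to the first bin edge falls outside every bin unless `include_lowest=True` is passed. A small sketch with made-up ages that sit exactly on the edges:

```python
import pandas as pd

# Hypothetical ages chosen to land on or next to the bin edges
ages = pd.Series([20, 21, 25, 26])
bins = [20, 25, 30]
labels = ['20-25', '26-30']

# Default: intervals are (20, 25] and (25, 30], so 20 is not binned
default_groups = pd.cut(ages, bins=bins, labels=labels)

# include_lowest=True closes the first interval on the left as well
inclusive_groups = pd.cut(ages, bins=bins, labels=labels, include_lowest=True)

print(default_groups.tolist())
print(inclusive_groups.tolist())
```

Passing `right=False` is another option, but it makes every interval left-closed and right-open, which would move age 25 into the second bin and no longer match the labels used here.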
