## Data Manipulation and Analysis with Pandas
Data manipulation and analysis are key tasks in any data science or data analysis project. Pandas provides a wide range of functions for data manipulation and analysis, making it easier to clean, transform, and extract insights from data. In this lesson, we will use Pandas to load and inspect a dataset, find and fill missing values, drop and rename columns, and derive new columns from existing ones.

In [1]:
import pandas as pd

In [23]:
df = pd.read_csv('data.csv')
df.tail(3)  # Display the last 3 rows of the DataFrame

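|   | Date | Category | Value | Product | Sales | Region |
|---|------|----------|-------|---------|-------|--------|
| 5 | 2023-01-06 | Electronics | 1200.0 | Tablet | 450 | South |
| 6 | 2023-01-07 | Home | 1800.0 | Refrigerator | 800 | East |
| 7 | 2023-01-08 | Clothing | NaN | Jeans | 350 | West |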
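Because `data.csv` is not included with this lesson, the sketch below reads a few of the rows shown in this lesson from an in-memory string instead (the `csv_text` stand-in is a hypothetical substitute for the file). It also shows `head()`/`tail()` with an explicit row count and the `parse_dates` option of `read_csv`, which stores `Date` as a datetime column rather than as plain strings.

```python
import io

import pandas as pd

# Hypothetical stand-in for 'data.csv': the first three rows shown in this lesson
csv_text = """Date,Category,Value,Product,Sales,Region
2023-01-01,Electronics,1000.0,Smartphone,500,North
2023-01-02,Clothing,500.0,T-shirt,300,South
2023-01-03,Electronics,1500.0,Laptop,700,East
"""

# parse_dates converts 'Date' from strings to a datetime64 column
df = pd.read_csv(io.StringIO(csv_text), parse_dates=['Date'])

print(df.head(2))        # first 2 rows
print(df.tail(1))        # last row
print(df['Date'].dtype)  # a datetime64 dtype instead of object
```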


In [24]:
df.describe()  # Summary statistics of the numeric columns ('count' excludes missing values)

|       | Value | Sales |
|-------|-------|-------|
| count | 6.0 | 8.0 |
| mean | 1133.333333 | 512.5 |
| std | 471.87569 | 174.744712 |
| min | 500.0 | 300.0 |
| 25% | 850.0 | 387.5 |
| 50% | 1100.0 | 475.0 |
| 75% | 1425.0 | 625.0 |
| max | 1800.0 | 800.0 |
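By default `describe()` only summarizes numeric columns, and its `count` row counts non-missing values, which is why `Value` reports 6 while `Sales` reports 8 above. The sketch below uses a small hypothetical sample (modeled on the `Category` and `Value` columns) to show this, plus `include='all'`, which adds `unique`, `top`, and `freq` rows for text columns.

```python
import numpy as np
import pandas as pd

# Hypothetical sample modeled on this lesson's columns, with one missing value
df = pd.DataFrame({
    'Category': ['Electronics', 'Clothing', 'Home', 'Clothing'],
    'Value': [1000.0, 500.0, np.nan, 800.0],
})

numeric_summary = df.describe()            # numeric columns only; NaN not counted
full_summary = df.describe(include='all')  # also text columns: unique, top, freq

print(numeric_summary)
print(full_summary)
```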


In [10]:
df.dtypes

Date         object
Category     object
Value       float64
Product      object
Sales         int64
Region       object
dtype: object
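The `dtypes` output shows that `Date` and `Category` are stored as generic `object` columns. A minimal sketch of converting them to more specific types, assuming a small hypothetical sample of the same columns: `pd.to_datetime` for dates, `astype('category')` for repeated labels, and `astype('float64')` for a numeric cast.

```python
import pandas as pd

# Hypothetical sample with the same column types as the lesson's DataFrame
df = pd.DataFrame({
    'Date': ['2023-01-01', '2023-01-02'],
    'Category': ['Electronics', 'Clothing'],
    'Sales': [500, 300],
})

print(df.dtypes)  # Date/Category start as object (newer pandas may show a string dtype)

df['Date'] = pd.to_datetime(df['Date'])             # strings -> datetime64
df['Category'] = df['Category'].astype('category')  # repeated labels -> categorical
df['Sales'] = df['Sales'].astype('float64')         # int64 -> float64

print(df.dtypes)
```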

In [13]:
# Handling missing values: check which columns contain at least one NaN
df.isnull().any()

Date        False
Category    False
Value        True
Product     False
Sales       False
Region      False
dtype: bool
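`isnull().any()` only says *whether* a column has missing values. Summing the boolean mask gives a count per column, and `any(axis=1)` selects the affected rows. The sketch below uses a hypothetical three-row sample taken from the lesson's products.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: two rows with a missing 'Value', as in the lesson's data
df = pd.DataFrame({
    'Product': ['Smartphone', 'Vacuum Cleaner', 'Jeans'],
    'Value': [1000.0, np.nan, np.nan],
    'Sales': [500, 400, 350],
})

missing_per_column = df.isnull().sum()           # True counts as 1 -> NaN count per column
rows_with_missing = df[df.isnull().any(axis=1)]  # rows containing at least one NaN

print(missing_per_column)
print(rows_with_missing)
```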

In [21]:
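total_value = df['Value'].sum()  # Sum of the 'Value' column; sum() skips NaN by default
value_zero_filled = df['Value'].fillna(0)  # Fill missing values with 0 (returns a new Series; df keeps its NaN for the mean-fill step below)
print(f"All values:\n{value_zero_filled.tolist()}")  # Print all values in the zero-filled 'Value' column

print(f"Total Value: {total_value}")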

All values:
[1000.0, 500.0, 1500.0, 0.0, 800.0, 1200.0, 1800.0, 0.0]
Total Value: 6800.0
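The total above is 6800.0 whether or not the NaNs are first replaced with 0, because `sum()` skips missing values unless `skipna=False` is passed. The sketch below checks this on a hypothetical four-value Series and also shows `fillna` with a dictionary, which sets a different fill value per column.

```python
import numpy as np
import pandas as pd

# Hypothetical sample values with one missing entry
values = pd.Series([1000.0, 500.0, np.nan, 800.0])

total_skipping_nan = values.sum()          # NaN skipped by default -> 2300.0
total_strict = values.sum(skipna=False)    # any NaN makes the result NaN
total_after_fill = values.fillna(0).sum()  # same total as skipping NaN

# Per-column fill values passed as a dict
df = pd.DataFrame({'Value': [1000.0, np.nan], 'Product': ['Laptop', None]})
filled = df.fillna({'Value': 0, 'Product': 'Unknown'})

print(total_skipping_nan, total_strict, total_after_fill)
print(filled)
```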


In [31]:
# Drop the 'Sales_fillNa' column if present; errors='ignore' avoids a KeyError when it does not exist
df = df.drop(columns=['Sales_fillNa'], errors='ignore')
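`drop` removes columns (`columns=`) or rows (`index=`, by index label) and returns a new DataFrame. A minimal sketch on a hypothetical three-row sample, including the `errors='ignore'` option for a column that may not exist.

```python
import pandas as pd

# Hypothetical sample of three rows from the lesson's data
df = pd.DataFrame({
    'Product': ['Smartphone', 'T-shirt', 'Laptop'],
    'Sales': [500, 300, 700],
    'Region': ['North', 'South', 'East'],
})

without_region = df.drop(columns=['Region'])               # drop a column
without_first_row = df.drop(index=[0])                     # drop a row by index label
safe = df.drop(columns=['Sales_fillNa'], errors='ignore')  # missing label: no error

try:
    df.drop(columns=['Sales_fillNa'])  # default errors='raise'
except KeyError as exc:
    print('KeyError:', exc)
```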

In [32]:
# Filling missing values with the mean of the column
df['Value_fillNa'] = df['Value'].fillna(df['Value'].mean())
df

|   | Date | Category | Value | Product | Sales | Region | Value_fillNa |
|---|------|----------|-------|---------|-------|--------|--------------|
| 0 | 2023-01-01 | Electronics | 1000.0 | Smartphone | 500 | North | 1000.0 |
| 1 | 2023-01-02 | Clothing | 500.0 | T-shirt | 300 | South | 500.0 |
| 2 | 2023-01-03 | Electronics | 1500.0 | Laptop | 700 | East | 1500.0 |
| 3 | 2023-01-04 | Home | NaN | Vacuum Cleaner | 400 | West | 1133.333333 |
| 4 | 2023-01-05 | Clothing | 800.0 | Jacket | 600 | North | 800.0 |
| 5 | 2023-01-06 | Electronics | 1200.0 | Tablet | 450 | South | 1200.0 |
| 6 | 2023-01-07 | Home | 1800.0 | Refrigerator | 800 | East | 1800.0 |
| 7 | 2023-01-08 | Clothing | NaN | Jeans | 350 | West | 1133.333333 |
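Filling with the overall mean gives the Home vacuum cleaner and the Clothing jeans the same value (1133.33). A common alternative, not used in this lesson, is group-wise mean imputation: fill each missing value with the mean of its own category. The sketch below assumes a hypothetical subset of the lesson's Home and Clothing rows and uses `groupby(...).transform('mean')` to align each group's mean back to the original rows.

```python
import numpy as np
import pandas as pd

# Hypothetical subset of the lesson's rows (Clothing and Home only)
df = pd.DataFrame({
    'Category': ['Clothing', 'Home', 'Clothing', 'Home', 'Clothing'],
    'Value': [500.0, np.nan, 800.0, 1800.0, np.nan],
})

# Overall mean of the non-missing values: (500 + 800 + 1800) / 3
overall = df['Value'].fillna(df['Value'].mean())

# Mean within each Category, returned with the same index as df
group_means = df.groupby('Category')['Value'].transform('mean')
by_category = df['Value'].fillna(group_means)  # Clothing -> 650.0, Home -> 1800.0

print(pd.DataFrame({'overall': overall, 'by_category': by_category}))
```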


In [None]:
# Rename columns: one mapping of old -> new names can rename several columns at once
df = df.rename(columns={'Value_fillNa': 'Value Filled', 'Date': 'Sales Date'})
df.head()  # Display the first 5 rows of the DataFrame (the default for head())

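|   | Sales Date | Category | Value | Product | Sales | Region | Value Filled |
|---|------------|----------|-------|---------|-------|--------|--------------|
| 0 | 2023-01-01 | Electronics | 1000.0 | Smartphone | 500 | North | 1000.0 |
| 1 | 2023-01-02 | Clothing | 500.0 | T-shirt | 300 | South | 500.0 |
| 2 | 2023-01-03 | Electronics | 1500.0 | Laptop | 700 | East | 1500.0 |
| 3 | 2023-01-04 | Home | NaN | Vacuum Cleaner | 400 | West | 1133.333333 |
| 4 | 2023-01-05 | Clothing | 800.0 | Jacket | 600 | North | 800.0 |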
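Besides a dictionary, `rename(columns=...)` also accepts a function that is applied to every column label. A minimal sketch, assuming a hypothetical one-row sample, that renames two columns without `inplace=True` and then converts all labels to snake_case.

```python
import pandas as pd

# Hypothetical one-row sample with the lesson's original column names
df = pd.DataFrame({'Date': ['2023-01-01'], 'Value_fillNa': [1000.0]})

# Several columns in one call; returns a new DataFrame and leaves df unchanged
renamed = df.rename(columns={'Date': 'Sales Date', 'Value_fillNa': 'Value Filled'})

# A function applied to each label, here producing snake_case names
snake = renamed.rename(columns=lambda c: c.lower().replace(' ', '_'))

print(list(renamed.columns), list(snake.columns))
```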


In [40]:
# Vectorized multiplication (same result as .apply(lambda x: x * 1.1)); NaN stays NaN
df['Value + 10%'] = df['Value'] * 1.1

In [41]:
df

|   | Sales Date | Category | Value | Product | Sales | Region | Value Filled | Value + 10% |
|---|------------|----------|-------|---------|-------|--------|--------------|-------------|
| 0 | 2023-01-01 | Electronics | 1000.0 | Smartphone | 500 | North | 1000.0 | 1100.0 |
| 1 | 2023-01-02 | Clothing | 500.0 | T-shirt | 300 | South | 500.0 | 550.0 |
| 2 | 2023-01-03 | Electronics | 1500.0 | Laptop | 700 | East | 1500.0 | 1650.0 |
| 3 | 2023-01-04 | Home | NaN | Vacuum Cleaner | 400 | West | 1133.333333 | NaN |
| 4 | 2023-01-05 | Clothing | 800.0 | Jacket | 600 | North | 800.0 | 880.0 |
| 5 | 2023-01-06 | Electronics | 1200.0 | Tablet | 450 | South | 1200.0 | 1320.0 |
| 6 | 2023-01-07 | Home | 1800.0 | Refrigerator | 800 | East | 1800.0 | 1980.0 |
| 7 | 2023-01-08 | Clothing | NaN | Jeans | 350 | West | 1133.333333 | NaN |
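Rows 3 and 7 are NaN in `Value + 10%` because the calculation starts from `Value`, where arithmetic on NaN yields NaN. The sketch below, on a hypothetical three-value sample, checks that `apply` with a lambda and plain vectorized multiplication give identical results, and shows that filling the missing values first avoids NaN in the derived column.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with one missing value
df = pd.DataFrame({'Value': [1000.0, np.nan, 800.0]})

with_apply = df['Value'].apply(lambda x: x * 1.1)  # Python lambda called per element
vectorized = df['Value'] * 1.1                     # single array operation, same values

# Fill with the column mean (900.0) first, so no NaN reaches the result
from_filled = df['Value'].fillna(df['Value'].mean()) * 1.1

print(pd.DataFrame({'apply': with_apply, 'vectorized': vectorized, 'filled': from_filled}))
```

The vectorized form is usually preferred: it reads the same and avoids a Python-level function call for every row.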
