# Data Manipulation

### Data manipulation is a key skill in data analysis: you reshape your data so that it can answer the questions you care about. This notebook walks through the most common pandas methods for doing that.

In [88]:
import pandas as pd

### Adding/Removing Columns (assign(), drop())

In [91]:
#Imagine you’re analyzing sales data and you need to add a new column for taxes or remove unnecessary columns.

In [92]:
data = {'Product': ['Laptop', 'Mobile', 'Tablet', 'Monitor'],
        'Price': [1200, 800, 300, 400],
        'Units_Sold': [100, 150, 80, 60]}

In [93]:
df = pd.DataFrame(data)

In [94]:
#Add a Column Using assign()
#You want to add a new column to calculate the total revenue (Price * Units_Sold):

In [95]:
df = df.assign(Total_Revenue=df['Price'] * df['Units_Sold'])

In [96]:
df

,Product,Price,Units_Sold,Total_Revenue
0,Laptop,1200,100,120000
1,Mobile,800,150,120000
2,Tablet,300,80,24000
3,Monitor,400,60,24000
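
The scenario above also mentions adding a column for taxes. The sketch below shows one way to do that with assign(), using callables so that a later column can build on an earlier one in the same call. The 10% TAX_RATE and the Tax column name are assumptions made up for this illustration, not values from the sales data.

```python
import pandas as pd

# Same sample data as in the cells above
df = pd.DataFrame({
    'Product': ['Laptop', 'Mobile', 'Tablet', 'Monitor'],
    'Price': [1200, 800, 300, 400],
    'Units_Sold': [100, 150, 80, 60],
})

TAX_RATE = 0.10  # hypothetical rate, chosen only for illustration

# Each callable receives the DataFrame built so far, so Tax can
# refer to Total_Revenue even though both are created in one call.
df_tax = df.assign(
    Total_Revenue=lambda d: d['Price'] * d['Units_Sold'],
    Tax=lambda d: d['Total_Revenue'] * TAX_RATE,
)

print(df_tax)
```

assign() returns a new DataFrame and leaves df untouched, which is why the notebook reassigns the result back to df.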


In [97]:
#Remove a Column Using drop()
#Units_Sold is no longer needed once Total_Revenue has been calculated:

In [98]:
df = df.drop(columns=['Units_Sold'])

In [99]:
df

,Product,Price,Total_Revenue
0,Laptop,1200,120000
1,Mobile,800,120000
2,Tablet,300,24000
3,Monitor,400,24000
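
A few variations on drop() are sketched below: it returns a new DataFrame rather than modifying the original, it can skip missing labels, and it can remove rows as well as columns. The Discount column is a hypothetical name that deliberately does not exist in the data.

```python
import pandas as pd

df = pd.DataFrame({
    'Product': ['Laptop', 'Mobile', 'Tablet', 'Monitor'],
    'Price': [1200, 800, 300, 400],
    'Units_Sold': [100, 150, 80, 60],
})

# drop() returns a new DataFrame; df itself still has Units_Sold
trimmed = df.drop(columns=['Units_Sold'])

# errors='ignore' skips labels that are not present instead of raising KeyError
# ('Discount' is a hypothetical column that is not in df)
safe = df.drop(columns=['Units_Sold', 'Discount'], errors='ignore')

# Passing index labels removes rows rather than columns
without_first_row = df.drop(index=[0])

print(trimmed.columns.tolist())
print(without_first_row)
```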


### Renaming

In [101]:
#Your sales data might have inconsistent column names that you want to standardize.

In [102]:
#Rename columns by passing an {old_name: new_name} mapping

In [103]:
df = df.rename(columns={'Price': 'Unit_Price', 'Total_Revenue': 'Revenue'})

In [104]:
df

,Product,Unit_Price,Revenue
0,Laptop,1200,120000
1,Mobile,800,120000
2,Tablet,300,24000
3,Monitor,400,24000


In [105]:
#Rename index labels the same way, using the index argument

In [106]:
df = df.rename(index={0: 'A', 1: 'B', 2: 'C', 3: 'D'})

In [107]:
df

,Product,Unit_Price,Revenue
A,Laptop,1200,120000
B,Mobile,800,120000
C,Tablet,300,24000
D,Monitor,400,24000
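
When many column names are inconsistent, a mapping for each one gets tedious. rename() also accepts a function that is applied to every label. The sketch below assumes some messy headers invented for this example; the clean_name helper is hypothetical too.

```python
import pandas as pd

# Hypothetical messy headers: mixed case, a space, and a trailing blank
raw = pd.DataFrame({
    'Product': ['Laptop', 'Mobile'],
    'Unit Price ': [1200, 800],
    'REVENUE': [120000, 120000],
})

def clean_name(name):
    """Trim whitespace, lower-case, and replace spaces with underscores."""
    return name.strip().lower().replace(' ', '_')

# rename() calls clean_name once per column label
standardized = raw.rename(columns=clean_name)

print(standardized.columns.tolist())
```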


### Sorting Data

In [133]:
#Sorting is often used in data analysis to rank data by importance,
#such as sorting by revenue to find your top products.

In [142]:
#Sort by column values (ascending=True puts the lowest revenue first, so the top products end up at the bottom)

In [150]:
df = df.sort_values(by='Revenue', ascending=True)

In [152]:
df

,Product,Unit_Price,Revenue
C,Tablet,300,24000
D,Monitor,400,24000
A,Laptop,1200,120000
B,Mobile,800,120000


In [148]:
#Sort by index labels to get back to the A-D order

In [154]:
df = df.sort_index()

In [156]:
df

,Product,Unit_Price,Revenue
A,Laptop,1200,120000
B,Mobile,800,120000
C,Tablet,300,24000
D,Monitor,400,24000
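
To list the top products first, the sort has to be descending. The sketch below rebuilds the DataFrame from the outputs shown above, sorts by Revenue with Unit_Price as a tie-breaker (an assumption, since the notebook does not say how ties should be ordered), and uses nlargest() as a shortcut for a top-N selection.

```python
import pandas as pd

# DataFrame as it looks after the renaming steps above
df = pd.DataFrame(
    {'Product': ['Laptop', 'Mobile', 'Tablet', 'Monitor'],
     'Unit_Price': [1200, 800, 300, 400],
     'Revenue': [120000, 120000, 24000, 24000]},
    index=['A', 'B', 'C', 'D'],
)

# Highest revenue first; rows with equal revenue are ordered by
# Unit_Price, also highest first
top_first = df.sort_values(by=['Revenue', 'Unit_Price'],
                           ascending=[False, False])

# nlargest() picks the n rows with the largest values in a column
top_two = df.nlargest(2, 'Revenue')

print(top_first)
print(top_two)
```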


### Filtering Data

In [159]:
#Filtering helps you focus on specific subsets of the data, like high-revenue products or low-priced items.

### Filtering on a single condition

In [182]:
high_revenue_df = df[df['Revenue'] > 30000]

In [184]:
high_revenue_df

,Product,Unit_Price,Revenue
A,Laptop,1200,120000
B,Mobile,800,120000


### Filtering on multiple conditions

In [187]:
filtered_df = df[(df['Unit_Price'] < 500) & (df['Revenue'] > 20000)]

In [189]:
filtered_df

,Product,Unit_Price,Revenue
C,Tablet,300,24000
D,Monitor,400,24000
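
Conditions can be combined in other ways as well. The sketch below shows OR (|), NOT (~), membership tests with isin(), and the same multi-condition filter written with query(). The specific products and thresholds are picked only to illustrate each operator. Note that each comparison needs its own parentheses, because & and | bind more tightly than < and >.

```python
import pandas as pd

df = pd.DataFrame(
    {'Product': ['Laptop', 'Mobile', 'Tablet', 'Monitor'],
     'Unit_Price': [1200, 800, 300, 400],
     'Revenue': [120000, 120000, 24000, 24000]},
    index=['A', 'B', 'C', 'D'],
)

# OR: cheap items, or the laptop regardless of price
cheap_or_laptop = df[(df['Unit_Price'] < 500) | (df['Product'] == 'Laptop')]

# NOT: everything that is not cheap
not_cheap = df[~(df['Unit_Price'] < 500)]

# Membership: keep rows whose Product is in a given list
selected = df[df['Product'].isin(['Laptop', 'Tablet'])]

# query() expresses the multi-condition filter as a string
via_query = df.query('Unit_Price < 500 and Revenue > 20000')

print(cheap_or_laptop)
print(via_query)
```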


# Real-World Data Analysis Example


### Let’s say you work as a data analyst for an e-commerce company and need to evaluate which products are generating the most revenue, remove irrelevant columns, and identify underperforming items.

  

### Add new insights by creating additional columns like Total_Revenue.


### Clean the data by dropping unnecessary columns like Units_Sold.


### Standardize your column names for consistency using rename().
  

### Sort data to analyze the highest revenue products.


### Filter to focus on relevant products (e.g., those above certain revenue thresholds).   
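
The steps above can be chained into a single pipeline. This is a minimal sketch that reuses the sample sales data from earlier; the 30000 threshold comes from the filtering example, and treating everything at or below it as "underperforming" is an assumption made for illustration.

```python
import pandas as pd

sales = pd.DataFrame({
    'Product': ['Laptop', 'Mobile', 'Tablet', 'Monitor'],
    'Price': [1200, 800, 300, 400],
    'Units_Sold': [100, 150, 80, 60],
})

REVENUE_THRESHOLD = 30000  # assumed cut-off, borrowed from the filtering example

report = (
    sales
    # 1. Add a new insight: revenue per product
    .assign(Total_Revenue=lambda d: d['Price'] * d['Units_Sold'])
    # 2. Drop a column that is no longer needed
    .drop(columns=['Units_Sold'])
    # 3. Standardize the column names
    .rename(columns={'Price': 'Unit_Price', 'Total_Revenue': 'Revenue'})
    # 4. Sort so the highest-revenue products come first
    .sort_values(by='Revenue', ascending=False)
)

# 5. Filter into the two groups of interest
top_products = report[report['Revenue'] > REVENUE_THRESHOLD]
underperformers = report[report['Revenue'] <= REVENUE_THRESHOLD]

print(top_products)
print(underperformers)
```

Because each method returns a new DataFrame, the chain never modifies sales, so the raw data stays available for other analyses.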