You are provided with two datasets stored as CSV files. The datasets have the following issues:
1) Missing values / empty cells
2) Inconsistent date formats
3) Duplicate rows
4) Incorrect data values
5) Unnecessary columns that are not relevant to the analysis
Write pandas scripts that clean these datasets by addressing each of the issues above.

In [11]:
import pandas as pd

df = pd.read_csv('Sales.csv')
df

,Order ID,Customer Name,Order Date,Product,Quantity,Unit Price,Total Revenue
0,1001,John Doe,01/01/2024,Widget A,10.0,25.0,250.0
1,1002,Jane Smith,01/02/2024,Widget B,5.0,40.0,200.0
2,1003,,2024/01/03',Widget A,,25.0,
3,1004,Alice Johnson,04/01/2024,Widget C,3.0,,210.0
4,1005,Bob Brown,2024/01/05',Widget B,10.0,40.0,400.0
5,1006,John Doe,06/01/2024,Widget A,4.0,25.0,100.0
6,1001,John Doe,01/01/2024,Widget A,10.0,25.0,250.0
7,1007,Jane Smith,07/01/2024,Widget C,-6.0,70.0,-420.0
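Before changing anything, it helps to measure each issue. Sales.csv itself is not reproduced here, so this sketch rebuilds the eight raw rows from the output above (without the index) and counts missing values per column, fully duplicated rows, negative quantities, and complete rows where Total Revenue differs from Quantity * Unit Price.

```python
import numpy as np
import pandas as pd

# The eight raw rows shown in the output above
df = pd.DataFrame({
    "Order ID": [1001, 1002, 1003, 1004, 1005, 1006, 1001, 1007],
    "Customer Name": ["John Doe", "Jane Smith", np.nan, "Alice Johnson",
                      "Bob Brown", "John Doe", "John Doe", "Jane Smith"],
    "Order Date": ["01/01/2024", "01/02/2024", "2024/01/03'", "04/01/2024",
                   "2024/01/05'", "06/01/2024", "01/01/2024", "07/01/2024"],
    "Product": ["Widget A", "Widget B", "Widget A", "Widget C",
                "Widget B", "Widget A", "Widget A", "Widget C"],
    "Quantity": [10, 5, np.nan, 3, 10, 4, 10, -6],
    "Unit Price": [25, 40, 25, np.nan, 40, 25, 25, 70],
    "Total Revenue": [250, 200, np.nan, 210, 400, 100, 250, -420],
})

# Issue 1: number of empty cells in each column
missing_per_column = df.isna().sum()
# Issue 3: rows identical to an earlier row in every column
duplicate_rows = df.duplicated().sum()
# Issue 4: quantities that cannot be negative
negative_quantities = (df["Quantity"] < 0).sum()
# Issue 4: rows with all three numbers present but revenue != quantity * price
complete = df[["Quantity", "Unit Price", "Total Revenue"]].notna().all(axis=1)
mismatch = complete & (df["Total Revenue"] != df["Quantity"] * df["Unit Price"])
revenue_mismatch = mismatch.sum()

print(missing_per_column)
print("duplicate rows:", duplicate_rows)
print("negative quantities:", negative_quantities)
print("revenue mismatches:", revenue_mismatch)
```

On these rows the revenue check finds no mismatch: order 1007's -420 is consistent with its -6 quantity, so the negative sign has to be caught by the quantity rule rather than the revenue rule.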


In [12]:
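# Handle missing data
# Order 1003 lacks a customer name, quantity and revenue, so the row is dropped
df.dropna(subset=['Customer Name'], inplace=True)

# Order 1004's unit price can be recovered: Total Revenue / Quantity = 210 / 3 = 70
df.loc[3, 'Unit Price'] = 70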
df

,Order ID,Customer Name,Order Date,Product,Quantity,Unit Price,Total Revenue
0,1001,John Doe,01/01/2024,Widget A,10.0,25.0,250.0
1,1002,Jane Smith,01/02/2024,Widget B,5.0,40.0,200.0
3,1004,Alice Johnson,04/01/2024,Widget C,3.0,70.0,210.0
4,1005,Bob Brown,2024/01/05',Widget B,10.0,40.0,400.0
5,1006,John Doe,06/01/2024,Widget A,4.0,25.0,100.0
6,1001,John Doe,01/01/2024,Widget A,10.0,25.0,250.0
7,1007,Jane Smith,07/01/2024,Widget C,-6.0,70.0,-420.0
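Setting `df.loc[3, 'Unit Price'] = 70` ties the fix to one row label. Every complete row in this data satisfies Total Revenue = Quantity * Unit Price, so a more general sketch (assuming that relation is meant to hold) fills any one missing value from the other two and shows which rows cannot be recovered. The three rows below are copied from the raw output above.

```python
import numpy as np
import pandas as pd

# Raw rows for orders 1003-1005 from the output above
df = pd.DataFrame({
    "Order ID": [1003, 1004, 1005],
    "Quantity": [np.nan, 3.0, 10.0],
    "Unit Price": [25.0, np.nan, 40.0],
    "Total Revenue": [np.nan, 210.0, 400.0],
})

# Unit Price = Total Revenue / Quantity (only NaN cells are filled)
df["Unit Price"] = df["Unit Price"].fillna(df["Total Revenue"] / df["Quantity"])
# Quantity = Total Revenue / Unit Price (stays NaN if revenue is also missing)
df["Quantity"] = df["Quantity"].fillna(df["Total Revenue"] / df["Unit Price"])
# Total Revenue = Quantity * Unit Price (stays NaN if quantity is still missing)
df["Total Revenue"] = df["Total Revenue"].fillna(df["Quantity"] * df["Unit Price"])

print(df)
```

Order 1004 gets a unit price of 70.0, while order 1003 keeps NaN for both quantity and revenue because two of the three values are missing, which supports dropping that row.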


In [13]:
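# Standardize the date format
# format='mixed' (pandas >= 2.0) infers the format of each value separately;
# the stray apostrophe on some values is stripped before parsing
df['Order Date'] = pd.to_datetime(df['Order Date'].str.strip("'"), format='mixed')

# Check for duplicate rows
df.duplicated().sum()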


1
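With `format='mixed'` each value is parsed on its own and, because `dayfirst` defaults to False, slash dates are read month-first; that is why 04/01/2024 appears as 2024-04-01 in the output further down. The order IDs run sequentially and the two unambiguous year-first dates fall on 3 and 5 January, so 04/01/2024, 06/01/2024 and 07/01/2024 may be day-first dates (4, 6 and 7 January), while 01/02/2024 only fits 2 January when read month-first. That is an inference from the data, not something the file states, and it means no single `dayfirst` setting fixes the column. This sketch flags the strings that read differently under the two conventions so they can be checked against the source.

```python
import pandas as pd

# Order Date values from the raw data above (duplicate row omitted)
raw = pd.Series(["01/01/2024", "01/02/2024", "2024/01/03'", "04/01/2024",
                 "2024/01/05'", "06/01/2024", "07/01/2024"])
# Remove the stray apostrophe seen on the year-first values
clean = raw.str.strip("'")

# Parse with each explicit slash format; strings that do not match become NaT
month_first = pd.to_datetime(clean, format="%m/%d/%Y", errors="coerce")
day_first = pd.to_datetime(clean, format="%d/%m/%Y", errors="coerce")

# Ambiguous: both formats succeed but produce different dates
ambiguous = month_first.notna() & day_first.notna() & (month_first != day_first)
print(clean[ambiguous].tolist())
```

01/01/2024 is not flagged because both readings give the same date, and the year-first values match neither slash format.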

In [14]:
# Remove duplicate rows (the first occurrence is kept)
df.drop_duplicates(inplace=True)
print(df.duplicated().sum())
df

0


,Order ID,Customer Name,Order Date,Product,Quantity,Unit Price,Total Revenue
0,1001,John Doe,2024-01-01,Widget A,10.0,25.0,250.0
1,1002,Jane Smith,2024-01-02,Widget B,5.0,40.0,200.0
3,1004,Alice Johnson,2024-04-01,Widget C,3.0,70.0,210.0
4,1005,Bob Brown,2024-01-05,Widget B,10.0,40.0,400.0
5,1006,John Doe,2024-06-01,Widget A,4.0,25.0,100.0
7,1007,Jane Smith,2024-07-01,Widget C,-6.0,70.0,-420.0
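`drop_duplicates()` with no arguments only removes rows that match in every column. If Order ID is meant to identify an order uniquely (an assumption; the task does not say so), a repeated ID with, say, a different quantity would slip through. This sketch uses a few columns from the raw rows to show how `subset` and `keep` inspect and remove repeats by key.

```python
import pandas as pd

# Order ID, Customer Name and Quantity for a few raw rows shown above
df = pd.DataFrame({
    "Order ID": [1001, 1002, 1006, 1001],
    "Customer Name": ["John Doe", "Jane Smith", "John Doe", "John Doe"],
    "Quantity": [10.0, 5.0, 4.0, 10.0],
})

# keep=False marks every copy, so all versions of a repeated ID can be compared
repeated_ids = df[df.duplicated(subset=["Order ID"], keep=False)]
print(repeated_ids)

# keep="first" (the default) retains the first row for each Order ID
deduped = df.drop_duplicates(subset=["Order ID"], keep="first")
print(deduped)
```

Inspecting `repeated_ids` before dropping matters when the copies disagree, since `keep="first"` silently discards whichever version comes later.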


In [17]:
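# Correct wrong data: a negative quantity is treated as a sign error
df['Quantity'] = df['Quantity'].abs()
# Recompute revenue so it stays consistent with quantity and unit price
df['Total Revenue'] = df['Quantity'] * df['Unit Price']
print('Cleaned Dataset')
df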

Cleaned Dataset


,Order ID,Customer Name,Order Date,Product,Quantity,Unit Price,Total Revenue
0,1001,John Doe,2024-01-01,Widget A,10.0,25.0,250.0
1,1002,Jane Smith,2024-01-02,Widget B,5.0,40.0,200.0
3,1004,Alice Johnson,2024-04-01,Widget C,3.0,70.0,210.0
4,1005,Bob Brown,2024-01-05,Widget B,10.0,40.0,400.0
5,1006,John Doe,2024-06-01,Widget A,4.0,25.0,100.0
7,1007,Jane Smith,2024-07-01,Widget C,6.0,70.0,420.0
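The fifth issue, unnecessary columns, still needs a step. Which columns are irrelevant depends on the analysis, so the list below is hypothetical: it assumes a product-revenue analysis that does not need customer names. The sketch also drops any `Unnamed: ...` column, which pandas creates when a CSV that was saved with its index is read back without `index_col=0`; the small frame is illustrative, not Sales.csv.

```python
import pandas as pd

# Illustrative frame, including an auto-generated index column
df = pd.DataFrame({
    "Unnamed: 0": [0, 1],
    "Order ID": [1001, 1002],
    "Customer Name": ["John Doe", "Jane Smith"],
    "Product": ["Widget A", "Widget B"],
    "Total Revenue": [250.0, 200.0],
})

# Drop columns pandas generated from a saved index
df = df.loc[:, ~df.columns.str.startswith("Unnamed")]

# Hypothetical choice of columns not needed for the analysis
irrelevant = ["Customer Name"]
# errors="ignore" skips names that are not present instead of raising
df = df.drop(columns=irrelevant, errors="ignore")
print(df.columns.tolist())
```

Keeping the irrelevant names in one list makes the decision explicit and easy to revise once the analysis questions are fixed.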
