You are provided with two datasets in CSV format. The datasets have the following issues:
1) Missing values / empty cells
2) Inconsistent date formats
3) Duplicate rows
4) Wrong data
5) Unnecessary columns that are not relevant to the analysis
Write pandas scripts to clean these datasets by addressing each of the issues listed above.

In [15]:
import pandas as pd

df = pd.read_csv('Mine.csv')
df.head()

   Duration         Date  Pulse  Maxpulse  Calories
0        60  2023/10/01'  110.0     130.0     409.1
1        60  2023/10/02'  117.0     145.0     479.0
2        60  2023/10/03'  103.0     135.0     340.3
3        45  2023/10/04'  109.0     175.0     282.4
4        45  2023/10/05'  117.0     150.0     405.1


In [16]:
df.dtypes

Duration      int64
Date         object
Pulse       float64
Maxpulse    float64
Calories    float64
dtype: object

In [17]:
#Check for missing values
df.isnull().sum()

Duration    0
Date        2
Pulse       1
Maxpulse    2
Calories    4
dtype: int64

In [18]:
# Handle missing values: drop rows with no Date, fill numeric gaps with the column median

df.dropna(subset = ['Date'], inplace = True)

df['Pulse'] = df['Pulse'].fillna(df['Pulse'].median())
df['Maxpulse'] = df['Maxpulse'].fillna(df['Maxpulse'].median())
df['Calories'] = df['Calories'].fillna(df['Calories'].median())

df.isnull().sum()

Duration    0
Date        0
Pulse       0
Maxpulse    0
Calories    0
dtype: int64

In [19]:
# Inconsistent date formats corrected (format='mixed' requires pandas 2.0 or later)

df['Date'] = pd.to_datetime(df['Date'], format='mixed')

#Checking for Duplicate Rows
df.duplicated().sum()

np.int64(0)
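
The check above only counts duplicates. Here the count is 0, so nothing needs removing, but when duplicates are present they still have to be dropped. A minimal sketch of counting and then removing them, using a small toy frame whose values are illustrative and not taken from Mine.csv:

```python
import pandas as pd

# Toy frame with one exact duplicate row (illustrative values only).
toy = pd.DataFrame({
    "Duration": [60, 60, 45],
    "Pulse": [110.0, 110.0, 109.0],
})

# duplicated() marks every row that repeats an earlier row.
n_dupes = int(toy.duplicated().sum())

# drop_duplicates() keeps the first occurrence; reset_index tidies the labels.
deduped = toy.drop_duplicates().reset_index(drop=True)

print(n_dupes)
print(len(deduped))
```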

In [20]:
df.dtypes

Duration             int64
Date        datetime64[ns]
Pulse              float64
Maxpulse           float64
Calories           float64
dtype: object
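
When the set of date formats is known in advance, an explicit alternative to format='mixed' is to try each format in turn and keep the first successful parse. The sketch below assumes three hypothetical formats and a toy series; the actual formats present in Mine.csv are not shown in this notebook.

```python
import pandas as pd

# Toy date strings in several styles, plus one value that is not a date.
raw = pd.Series(["2023/10/01", "2023-10-02", "20231003", "not a date"])

# Assumed candidate formats, tried in order.
formats = ["%Y/%m/%d", "%Y-%m-%d", "%Y%m%d"]

# errors="coerce" turns anything that does not match into NaT.
parsed = pd.to_datetime(raw, format=formats[0], errors="coerce")
for fmt in formats[1:]:
    # Fill only the positions that are still unparsed.
    parsed = parsed.fillna(pd.to_datetime(raw, format=fmt, errors="coerce"))

print(parsed)
```

Values that match none of the formats stay as NaT, so they can be inspected or dropped with dropna afterwards instead of raising an error during parsing.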

In [21]:
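# Wrong data
# Correct the Duration recorded in row 7 by hand
df.loc[7, 'Duration'] = 40

print('Edited mine data')

# Drop rows where Pulse exceeds Maxpulse, which cannot be valid
bad_rows = df[df['Pulse'] > df['Maxpulse']].index
df.drop(bad_rows, inplace = True)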

print(df)



Edited mine data
    Duration       Date  Pulse  Maxpulse  Calories
0         60 2023-10-01  110.0     130.0     409.1
1         60 2023-10-02  117.0     145.0     479.0
2         60 2023-10-03  103.0     135.0     340.3
3         45 2023-10-04  109.0     175.0     282.4
4         45 2023-10-05  117.0     150.0     405.1
5         60 2023-10-06  103.0     125.0     300.0
6         60 2023-10-07  110.0     135.0     374.0
7         40 2023-10-08  114.0     133.0     282.4
8         60 2023-10-09  112.0     126.0     193.8
9         30 2023-10-10  102.0     147.0     234.8
10        60 2023-10-11  100.0     129.0     375.3
11        60 2023-10-12  109.0     131.0     345.6
12        60 2023-10-13  103.0     136.0     239.2
13        60 2023-10-15  120.0     125.0     240.8
20        60 2023-10-21  100.0     106.0     280.0
21        60 2023-10-22  103.0     107.0     282.4
30        60 2023-10-31   94.0     126.0     282.4
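
Correcting a value by its row label works for a single known error. A rule-based check can find such rows without hard-coding labels. The sketch below is one possible approach on toy data; the MAX_DURATION threshold is a hypothetical value chosen for illustration, not something derived from Mine.csv.

```python
import pandas as pd

# Toy frame: row 1 has an implausibly long Duration, row 2 has Pulse > Maxpulse.
toy = pd.DataFrame({
    "Duration": [60, 450, 45],
    "Pulse": [110.0, 104.0, 150.0],
    "Maxpulse": [130.0, 134.0, 140.0],
})

MAX_DURATION = 120  # hypothetical upper bound, an assumption for this sketch

# Boolean masks, one per validity rule.
too_long = toy["Duration"] > MAX_DURATION
pulse_above_max = toy["Pulse"] > toy["Maxpulse"]

# Rows breaking any rule are set aside for review; the rest are kept.
suspect = toy[too_long | pulse_above_max]
clean = toy[~(too_long | pulse_above_max)]

print(suspect)
print(clean)
```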


In [22]:
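print('Cleaned dataset')
# index=False keeps the row index out of the saved file
df.to_csv('Updatedmine.csv', index=False)
# Runs after the save, so only the in-memory frame is re-indexed
df.set_index('Duration', inplace = True)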

Cleaned dataset
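
Issue 5, unnecessary columns, is not exercised by the cells above. A minimal sketch of removing a column before saving, assuming a hypothetical 'Notes' column that does not exist in Mine.csv, and writing to an in-memory buffer instead of a file:

```python
import io
import pandas as pd

# Toy frame with one column that is irrelevant to the analysis.
toy = pd.DataFrame({
    "Duration": [60, 45],
    "Calories": [409.1, 282.4],
    "Notes": ["none", "felt tired"],  # hypothetical column, for illustration only
})

# Drop the column by name.
trimmed = toy.drop(columns=["Notes"])

# Save without the row index so no unnamed column appears on reload.
buffer = io.StringIO()
trimmed.to_csv(buffer, index=False)
buffer.seek(0)
reloaded = pd.read_csv(buffer)

print(list(reloaded.columns))
```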
