# Cleaning Rainfall Data

This notebook cleans the monthly rainfall dataset and aggregates it into yearly totals per city.

In [None]:
import pandas as pd

df = pd.read_csv("Rainfall_Data.csv")

In [9]:
df.head()

Unnamed: 0,City,Date,Year,Month,Rainfall_mm
0,Kampala,2005-01-01,2005,1,41.632484
1,Addis Ababa,2005-01-01,2005,1,44.480564
2,Kampala,2005-02-01,2005,2,28.327267
3,Addis Ababa,2005-02-01,2005,2,19.762764
4,Kampala,2005-03-01,2005,3,154.92038


In [14]:
df.columns

Index(['City', 'Date', 'Year', 'Month', 'Rainfall_mm'], dtype='object')

I'm going to drop the **'Date'** and **'Month'** columns, since **'Year'** already holds the only date information needed for yearly totals. I'll then sum the rainfall for each city and year, and re-order the columns so they appear as "**`Year, City, Rainfall_mm`**".
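Before dropping the columns, it can be worth confirming that **'Year'** and **'Month'** really agree with **'Date'**. The following is a minimal sketch, assuming **'Date'** parses cleanly with `pd.to_datetime`. The helper name `date_parts_are_redundant` is hypothetical, and the sample rows are copied from the `df.head()` output above so the block runs without the CSV file.

```python
import pandas as pd


def date_parts_are_redundant(frame: pd.DataFrame) -> bool:
    """Return True if 'Year' and 'Month' match the values parsed from 'Date'."""
    parsed = pd.to_datetime(frame["Date"])
    year_matches = (parsed.dt.year == frame["Year"]).all()
    month_matches = (parsed.dt.month == frame["Month"]).all()
    return bool(year_matches and month_matches)


# Illustrative frame built from the first rows shown by df.head()
sample = pd.DataFrame(
    {
        "City": ["Kampala", "Addis Ababa", "Kampala"],
        "Date": ["2005-01-01", "2005-01-01", "2005-02-01"],
        "Year": [2005, 2005, 2005],
        "Month": [1, 1, 2],
        "Rainfall_mm": [41.632484, 44.480564, 28.327267],
    }
)

print(date_parts_are_redundant(sample))
```

On the real data, the same call would be `date_parts_are_redundant(df)`, run before the drop.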

In [None]:
# Drop the 'Date' and 'Month' columns; 'Year' is enough for yearly totals
df = df.drop(["Date", "Month"], axis=1)

# Group by City and Year, then sum the Rainfall_mm
yearly_rainfall = df.groupby(["City", "Year"])["Rainfall_mm"].sum().reset_index()


# Sort by Year first, then by City
yearly_rainfall = yearly_rainfall.sort_values(by=["Year", "City"]).reset_index(
    drop=True
)

# Display the result
yearly_rainfall.head()

Unnamed: 0,City,Year,Rainfall_mm
0,Addis Ababa,2005,1174.917609
1,Kampala,2005,1102.198519
2,Addis Ababa,2006,1322.667343
3,Kampala,2006,1305.71652
4,Addis Ababa,2007,1261.028727
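Summing monthly values into a yearly total quietly assumes every city has all 12 months recorded for each year. The sketch below shows one way to check that on the raw data, before **'Month'** is dropped. The helper name `incomplete_years` and the `Months_present` column are hypothetical, and the rainfall values in the sample are placeholders, not real measurements.

```python
import pandas as pd


def incomplete_years(frame: pd.DataFrame, expected_months: int = 12) -> pd.DataFrame:
    """List (City, Year) groups that have fewer distinct months than expected."""
    months_per_group = frame.groupby(["City", "Year"])["Month"].nunique()
    missing = months_per_group[months_per_group < expected_months]
    return missing.reset_index(name="Months_present")


# Placeholder data: Kampala has 12 months, Addis Ababa is missing December
sample = pd.DataFrame(
    {
        "City": ["Kampala"] * 12 + ["Addis Ababa"] * 11,
        "Year": [2005] * 23,
        "Month": list(range(1, 13)) + list(range(1, 12)),
        "Rainfall_mm": [10.0] * 23,  # placeholder values only
    }
)

print(incomplete_years(sample))
```

An empty result on the real data would suggest the yearly sums are comparable across cities and years.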


In [19]:
# Reordering the columns to have 'Year', 'City', 'Rainfall_mm'
yearly_rainfall = yearly_rainfall[["Year", "City", "Rainfall_mm"]]

yearly_rainfall.head()

Unnamed: 0,Year,City,Rainfall_mm
0,2005,Addis Ababa,1174.917609
1,2005,Kampala,1102.198519
2,2006,Addis Ababa,1322.667343
3,2006,Kampala,1305.71652
4,2007,Addis Ababa,1261.028727


In [20]:
# Export the cleaned data to a new CSV file
yearly_rainfall.to_csv("Cleaned_Rainfall_Data.csv", index=False)
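To confirm that the export wrote what was intended, the file can be read back and compared with the frame in memory. This is a rough sketch under a few assumptions: the helper name `export_and_reload` is hypothetical, it writes to a temporary directory so the real output file is not touched, and the sample rows are taken from the yearly totals shown above.

```python
import os
import tempfile

import pandas as pd


def export_and_reload(frame: pd.DataFrame, path: str) -> pd.DataFrame:
    """Write the frame to CSV without its index, then read the file back."""
    frame.to_csv(path, index=False)
    return pd.read_csv(path)


# First rows of the yearly totals shown above
sample = pd.DataFrame(
    {
        "Year": [2005, 2005, 2006],
        "City": ["Addis Ababa", "Kampala", "Addis Ababa"],
        "Rainfall_mm": [1174.917609, 1102.198519, 1322.667343],
    }
)

with tempfile.TemporaryDirectory() as tmp_dir:
    reloaded = export_and_reload(sample, os.path.join(tmp_dir, "check.csv"))

# With index=False, no extra index column is written to the file
print(list(reloaded.columns))
print(len(reloaded) == len(sample))
```

Passing `index=False` is what keeps the row index out of the file, so the reloaded columns match the cleaned frame exactly.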