# Charles' Project 1: Data Analysis of Singapore Rainfall

# Background


This notebook tracks Charles' progress on the Project 1 objectives: demonstrating proficiency in the skills taught in Week 1 of General Assembly.


### Sections and skills used:

- Data Import and Cleaning
- Forming of Hypothesis Question
- Exploratory Data Analysis
- Data Visualization
- Conclusions and Recommendations


## Data Descriptions


Monthly number of rain days from 1982 to 2022. A day is considered to have “rained” if the total rainfall for that day is 0.2 mm or more.
* [1 `rainfall-monthly-number-of-rain-days.csv`](./data/rainfall-monthly-number-of-rain-days.csv)
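
The rain-day definition above can be illustrated with a minimal sketch. It assumes a hypothetical table of daily totals with columns `date` and `daily_rainfall_mm`; the project data is already aggregated by month, and the values below are made up for illustration.

```python
import pandas as pd

# Hypothetical daily totals in mm (made-up values, not from the real dataset).
daily = pd.DataFrame({
    "date": pd.to_datetime(["1982-01-01", "1982-01-02", "1982-01-03", "1982-02-01"]),
    "daily_rainfall_mm": [0.0, 0.2, 5.4, 0.1],
})

RAIN_DAY_THRESHOLD_MM = 0.2  # threshold stated in the dataset description

# A day counts as a rain day when its total is at or above the threshold.
daily["is_rain_day"] = daily["daily_rainfall_mm"] >= RAIN_DAY_THRESHOLD_MM

# Count rain days per calendar month.
rain_days = (
    daily.groupby(daily["date"].dt.to_period("M"))["is_rain_day"]
    .sum()
    .rename("no_of_rainy_days")
)
print(rain_days)
```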


Monthly total rainfall recorded in mm (millimetres) from 1982 to 2022
* [2 `rainfall-monthly-total.csv`](./data/rainfall-monthly-total.csv)


Monthly mean relative humidity (`mean_rh`)
* [3 `relative-humidity-monthly-mean.csv`](https://data.gov.sg/dataset/relative-humidity-monthly-mean)


Highest Daily Rainfall In The Month
* [4 `rainfall-monthly-highest-daily-total.csv`](https://data.gov.sg/dataset/rainfall-monthly-maximum-daily-total)


Hourly wet-bulb temperature, with 3 columns: `wbt_date`, `wbt_time`, `wet_bulb_temperature`
* [5 `wet-bulb-temperature-hourly.csv`](https://data.gov.sg/dataset/wet-bulb-temperature-hourly)


Monthly mean sunshine hours (`mean_sunshine_hrs`)
* [6 `sunshine-duration-monthly-mean-daily-duration.csv`](https://data.gov.sg/dataset/sunshine-duration-monthly-mean-daily-duration)


The monthly mean of the minimum temperature each day (`temp_mean_daily_min`), in degrees Celsius
* [7 `surface-air-temperature-monthly-mean.csv`](https://data.gov.sg/dataset/surface-air-temperature-mean-daily-minimum)

# Data Import and Cleaning

## Import packages


In [22]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

import os
print(os.getcwd())


c:\Users\chaaa\Documents\GitHub\DSI-SG-37\project_1\code


In [27]:
# Load the seven datasets listed under Data Descriptions
one = pd.read_csv(r"..\data\rainfall-monthly-number-of-rain-days.csv")
two = pd.read_csv(r"..\data\rainfall-monthly-total.csv")
three = pd.read_csv(r"..\data\relative-humidity-monthly-mean.csv")
four = pd.read_csv(r"..\data\rainfall-monthly-highest-daily-total.csv")
five = pd.read_csv(r"..\data\wet-bulb-temperature-hourly.csv")
six = pd.read_csv(r"..\data\sunshine-duration-monthly-mean-daily-duration.csv")
seven = pd.read_csv(r"..\data\surface-air-temperature-monthly-mean.csv")
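
The backslash paths above only resolve on Windows. As a hedged alternative, the sketch below builds the same relative paths with `pathlib` so the notebook could also run on macOS or Linux. The helper name `load_all` and the dictionary `FILENAMES` are illustrative assumptions, not part of the original notebook.

```python
from pathlib import Path

import pandas as pd

# Same relative location as the cell above, expressed without backslashes.
DATA_DIR = Path("..") / "data"

# Dataframe name -> CSV filename, as used in this notebook.
FILENAMES = {
    "one": "rainfall-monthly-number-of-rain-days.csv",
    "two": "rainfall-monthly-total.csv",
    "three": "relative-humidity-monthly-mean.csv",
    "four": "rainfall-monthly-highest-daily-total.csv",
    "five": "wet-bulb-temperature-hourly.csv",
    "six": "sunshine-duration-monthly-mean-daily-duration.csv",
    "seven": "surface-air-temperature-monthly-mean.csv",
}


def load_all(data_dir, filenames):
    """Read every CSV that exists in data_dir; report the ones that do not."""
    frames = {}
    for name, filename in filenames.items():
        path = Path(data_dir) / filename
        if path.exists():
            frames[name] = pd.read_csv(path)
        else:
            print(f"Missing file, skipped: {path}")
    return frames


loaded = load_all(DATA_DIR, FILENAMES)
```

Keeping the frames in a dictionary keyed by name also makes it easy to loop over all datasets in the cleaning steps.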



## Cleaning Overview

The cleaning process involves the following steps:

1. Display the data: print the first 5 rows of each dataframe to your Jupyter notebook.
2. Check for missing values and datatype.
3. Check for any obvious issues with the observations.
4. Fix any errors you identified in steps 2-3.
6. Fix any incorrect data types found in step 2.
    - Fix any individual values preventing other columns from being the appropriate type.
    - If the month column data is better analyzed as month and year, create separate month and year columns.
7. Rename Columns.
    - Column names should be all lowercase.
    - Column names should not contain spaces (underscores will suffice--this allows for using the `df.column_name` method to access columns in addition to `df['column_name']`).
    - Column names should be unique and informative.
8. Drop unnecessary rows (if needed).
9. Merge dataframes that can be merged.
    - Since different climate metrics are in month format, you can merge them into one single dataframe for easier analysis
10. Perform any additional cleaning that you feel is necessary.
11. Save your cleaned and merged dataframes as csv files.

### 1. Display the data: print the first 5 rows of each dataframe to your Jupyter notebook.

In [51]:
print(one.head(5))
print(two.head(5))
print(three.head(5))
print(four.head(5))
print(five.head(5))
print(six.head(5))
print(seven.head(5))

     month  no_of_rainy_days
0  1982-01                10
1  1982-02                 5
2  1982-03                11
3  1982-04                14
4  1982-05                10
     month  total_rainfall
0  1982-01           107.1
1  1982-02            27.8
2  1982-03           160.8
3  1982-04           157.0
4  1982-05           102.2
     month  mean_rh
0  1982-01     81.2
1  1982-02     79.5
2  1982-03     82.3
3  1982-04     85.9
4  1982-05     83.2
     month  maximum_rainfall_in_a_day
0  1982-01                       36.5
1  1982-02                        9.4
2  1982-03                       61.7
3  1982-04                       45.1
4  1982-05                       33.0
    wbt_date  wbt_time  wet_bulb_temperature
0 1982-01-01         1                  24.7
1 1982-01-01         2                  24.5
2 1982-01-01         3                  24.3
3 1982-01-01         4                  24.2
4 1982-01-01         5                  24.2
     month  mean_sunshine_hrs
0  1982-01      

### 2. Check for missing values and datatype.

In [33]:
dataframes = {
    'one': one,
    'two': two,
    'three': three,
    'four': four,
    'five': five,
    'six': six,
    'seven': seven
}


for name, df in dataframes.items():
    print(f"Dataframe: {name}")
    print(f"Missing values:\n{df.isnull().sum()}")
    print(f"Dtypes:\n{df.dtypes}")
    print("-" * 50 )

Dataframe: one
Missing values:
month               0
no_of_rainy_days    0
dtype: int64
Dtypes:
month               object
no_of_rainy_days     int64
dtype: object
--------------------------------------------------
Dataframe: two
Missing values:
month             0
total_rainfall    0
dtype: int64
Dtypes:
month              object
total_rainfall    float64
dtype: object
--------------------------------------------------
Dataframe: three
Missing values:
month      0
mean_rh    0
dtype: int64
Dtypes:
month       object
mean_rh    float64
dtype: object
--------------------------------------------------
Dataframe: four
Missing values:
month                        0
maximum_rainfall_in_a_day    0
dtype: int64
Dtypes:
month                         object
maximum_rainfall_in_a_day    float64
dtype: object
--------------------------------------------------
Dataframe: five
Missing values:
wbt_date                0
wbt_time                0
wet_bulb_temperature    0
dtype: int64
Dtypes:
wbt_date

### 3. Check for any obvious issues with the observations.


None of the datasets checked above contains missing values.

The `five` dataframe (the wet-bulb temperature dataset) has one problem: its `wbt_date` column has dtype `object` and needs to be converted to a datetime type.


In [48]:
print("-" * 50 )
five['wbt_date'] = pd.to_datetime(five['wbt_date'])

print(f"WBT dataset five datatypes:\n{five.dtypes}")

print("-" * 50 )
print(f"WBT dataset head \n{five['wbt_date'].head()}")


print("-" * 50 )
print('It seems that the datetime format has been successful in the wbt_date column.')

print("-" * 50 )


--------------------------------------------------
WBT dataset five datatypes:
wbt_date                datetime64[ns]
wbt_time                         int64
wet_bulb_temperature           float64
dtype: object
--------------------------------------------------
WBT dataset head 
0   1982-01-01
1   1982-01-01
2   1982-01-01
3   1982-01-01
4   1982-01-01
Name: wbt_date, dtype: datetime64[ns]
--------------------------------------------------
It seems that the datetime format has been successful in the wbt_date column.
--------------------------------------------------
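
The conversion above succeeds because every value in `wbt_date` is a valid date. When a column might contain malformed entries, one possible check is to parse with `errors="coerce"` and inspect the rows that become `NaT`. The sketch below uses a made-up series, not the real dataset.

```python
import pandas as pd

# Made-up values for illustration; the last entry is deliberately invalid.
raw = pd.Series(["1982-01-01", "1982-01-02", "not a date"])

# Unparseable entries become NaT instead of raising an error.
parsed = pd.to_datetime(raw, format="%Y-%m-%d", errors="coerce")

# Original values that failed to parse, for manual inspection.
bad_rows = raw[parsed.isna()]
print(bad_rows)
```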


### 4. Fix any errors you identified in steps 2-3.

The only issue identified, the datatype of `wbt_date`, was fixed in step 3 above.

### 6. Fix any incorrect data types found in step 2.

- Fix any individual values preventing other columns from being the appropriate type.
- If the month column data is better analyzed as month and year, create separate month and year columns.


In [None]:
# Create separate year and month columns for each monthly dataframe

# Iterate over all dataframes; `five` has no 'month' column and is handled below
for name, df in dataframes.items():
    if 'month' in df.columns:
        # Parse the 'YYYY-MM' strings once, then extract both parts
        parsed = pd.to_datetime(df['month'], format='%Y-%m')
        df['year'] = parsed.dt.year
        df['month_num'] = parsed.dt.month
    print(f"Dataframe: {name}")
    print(df.head())  # print the first few rows to verify the changes
    print("-" * 50)



# Extract the same columns from five, which uses wbt_date instead of month
five['year'] = five['wbt_date'].dt.year
five['month_num'] = five['wbt_date'].dt.month

print(five.head())
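
Since the other datasets are monthly, one possible follow-up is to collapse the hourly wet-bulb readings into a monthly mean so they line up with the rest. This is a sketch on a made-up four-row frame with the same column names as `five`; the output column name `mean_wet_bulb_temperature` is an assumption.

```python
import pandas as pd

# Made-up hourly readings with the same columns as `five`.
hourly = pd.DataFrame({
    "wbt_date": pd.to_datetime(["1982-01-01", "1982-01-01", "1982-02-01", "1982-02-01"]),
    "wbt_time": [1, 2, 1, 2],
    "wet_bulb_temperature": [24.0, 25.0, 26.0, 27.0],
})
hourly["year"] = hourly["wbt_date"].dt.year
hourly["month_num"] = hourly["wbt_date"].dt.month

# Average all hourly readings within each (year, month) pair.
monthly_wbt = (
    hourly.groupby(["year", "month_num"], as_index=False)["wet_bulb_temperature"]
    .mean()
    .rename(columns={"wet_bulb_temperature": "mean_wet_bulb_temperature"})
)
print(monthly_wbt)
```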



### 7. Rename Columns.

- Column names should be all lowercase.
- Column names should not contain spaces (underscores will suffice--this allows for using the `df.column_name` method to access columns in addition to `df['column_name']`).
- Column names should be unique and informative.


In [54]:
# Define the new column names
new_column_names = {'month': 'date', 'month_num': 'month', 'mean_temp': 'monthly_mean_of_min_daily_temperature_celcius'}

# Iterate over all dataframes
for name, df in dataframes.items():
    df.rename(columns=new_column_names, inplace=True)
    print(f"Dataframe: {name}")
    print(df.head())  # print the first few rows to verify the changes
    print("-" * 50)


Dataframe: one
      date  no_of_rainy_days  year  month
0  1982-01                10  1982      1
1  1982-02                 5  1982      2
2  1982-03                11  1982      3
3  1982-04                14  1982      4
4  1982-05                10  1982      5
--------------------------------------------------
Dataframe: two
      date  total_rainfall  year  month
0  1982-01           107.1  1982      1
1  1982-02            27.8  1982      2
2  1982-03           160.8  1982      3
3  1982-04           157.0  1982      4
4  1982-05           102.2  1982      5
--------------------------------------------------
Dataframe: three
      date  mean_rh  year  month
0  1982-01     81.2  1982      1
1  1982-02     79.5  1982      2
2  1982-03     82.3  1982      3
3  1982-04     85.9  1982      4
4  1982-05     83.2  1982      5
--------------------------------------------------
Dataframe: four
      date  maximum_rainfall_in_a_day  year  month
0  1982-01                       36.5  1982

### 8. Drop unnecessary rows (if needed).
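
A minimal sketch of what this step could look like, assuming the rows to drop are exact duplicates or fall outside the 1982 to 2022 range given in the data descriptions. The frame below is a toy example; its duplicate row and 2023 row are made up.

```python
import pandas as pd

# Toy frame: one duplicated row and one made-up row outside 1982-2022.
df = pd.DataFrame({
    "date": ["1982-01", "1982-01", "1982-02", "2023-01"],
    "total_rainfall": [107.1, 107.1, 27.8, 50.0],
    "year": [1982, 1982, 1982, 2023],
})

# Remove exact duplicate rows first.
deduped = df.drop_duplicates()

# Keep only rows inside the documented period (inclusive on both ends).
in_range = deduped[deduped["year"].between(1982, 2022)].reset_index(drop=True)
print(in_range)
```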


### 9. Merge dataframes that can be merged.

- Since different climate metrics are in month format, you can merge them into one single dataframe for easier analysis
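
One way to do this, sketched under the assumption that each monthly dataframe carries the `date`, `year`, and `month` columns produced in step 7, is to fold the frames together with `functools.reduce`. The small frames below reuse the first two rows printed earlier for `one`, `two`, and `three`.

```python
from functools import reduce

import pandas as pd

# First two rows of three monthly frames, in the post-rename column layout.
one = pd.DataFrame({"date": ["1982-01", "1982-02"], "no_of_rainy_days": [10, 5],
                    "year": [1982, 1982], "month": [1, 2]})
two = pd.DataFrame({"date": ["1982-01", "1982-02"], "total_rainfall": [107.1, 27.8],
                    "year": [1982, 1982], "month": [1, 2]})
three = pd.DataFrame({"date": ["1982-01", "1982-02"], "mean_rh": [81.2, 79.5],
                      "year": [1982, 1982], "month": [1, 2]})

KEYS = ["date", "year", "month"]  # columns shared by every monthly frame

# Merge pairwise from left to right; validate guards against duplicate months.
merged = reduce(
    lambda left, right: left.merge(right, on=KEYS, how="outer", validate="one_to_one"),
    [one, two, three],
)
print(merged)
```

An outer join keeps months that are missing from some datasets, which then show up as `NaN` and can be reviewed before analysis.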


### 10. Perform any additional cleaning that you feel is necessary.


### 11. Save your cleaned and merged dataframes as csv files.
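
A hedged sketch of the save step follows. The helper `save_clean` and the file name `demo.csv` are illustrative assumptions; the demo writes to a temporary directory so that nothing in the project folder is overwritten.

```python
import tempfile
from pathlib import Path

import pandas as pd


def save_clean(df, out_dir, filename):
    """Write df to out_dir/filename without the index and return the path."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)  # create the folder if needed
    path = out_dir / filename
    df.to_csv(path, index=False)
    return path


# Small demo frame reusing two rows printed earlier in this notebook.
demo = pd.DataFrame({"date": ["1982-01", "1982-02"], "total_rainfall": [107.1, 27.8]})

with tempfile.TemporaryDirectory() as tmp:
    saved_path = save_clean(demo, tmp, "demo.csv")
    round_trip = pd.read_csv(saved_path)  # read back to confirm the write

print(round_trip)
```

In the notebook itself, the same helper could be pointed at the project's data folder once the merged dataframe exists.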