# Data Aggregation Notebook

Climate data was extracted from CliFlo, NIWA's climate database.

The statistics extracted for each station are total rainfall, mean air temperature, mean vapour pressure, mean wind speed, days of wind gusts >= 24 knots, and standard deviation of daily mean temperature. Their respective CliFlo statistic codes are 00, 02, 16, 33, 60 and 61.

Stations were chosen based on EMI's zones and their respective regions. The regional data needs to be aggregated to represent the five EMI zones: Upper North Island, Central North Island, Lower North Island, Upper South Island and Lower South Island.
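
The aggregation step itself is not shown in this notebook. As a minimal sketch of one way it could be done, the example below averages the stations of a single zone for each month. The station names, the values and the choice of an unweighted mean are all assumptions made for illustration, not the method used in this project.

```python
import pandas as pd

# Hypothetical long-format data: one row per station per month.
# Station names and values are made up for illustration only.
stations = pd.DataFrame({
    "Station": ["A", "B", "A", "B"],
    "Date": pd.to_datetime(["2020-01-01", "2020-01-01",
                            "2020-02-01", "2020-02-01"]),
    "Mean Air Temperature in Celsius": [18.0, 20.0, 19.0, 21.0],
})

# Assumption: the zone value is the unweighted mean of its stations per month.
zone = (
    stations.groupby("Date", as_index=False)["Mean Air Temperature in Celsius"]
    .mean()
)
print(zone)
```

A total such as rainfall or a count such as gust days might call for a different aggregate (for example a median or a sum), so the right function depends on the statistic.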

The statistic codes are:

| Code | Description | Units |
|------|-------------|-------|
| 0 | Total Rainfall | mm |
| 2 | Mean Air Temperature | Celsius |
| 16 | Mean Vapour Pressure | hPa |
| 33 | Mean Wind Speed | m/s |
| 60 | Days Of Wind Gust >= 24 Knots | days |
| 61 | Standard Deviation Of Daily Mean Temperature | Celsius |




### Import libraries

In [1]:
import pandas as pd

In [2]:
def CliFloReshape(filepath):
    '''
    CliFloReshape reshapes CliFlo data to prepare it for time series analysis.

        Args:
         filepath: processed, tab-separated CliFlo text file (strip all attached metadata and keep only the relevant data)

        Returns:
         a processed DataFrame with a Date column and one column per statistic, ready for time series analysis

    '''

    data = pd.read_csv(filepath, sep='\t')  # Read the tab-separated file

    data = data.rename(columns={'Year ': 'Year'})  # Drop the trailing space in 'Year ' so pd.to_datetime recognises the column

    # Convert months from wide to long
    data = pd.melt(data, id_vars=['Station', 'Year', 'Stats_Code'],
                   value_vars=['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'],
                   var_name='Month')

    # Convert Stats_Code values from long to wide
    data = pd.pivot(data, index=['Station', 'Year', 'Month'],
                    columns='Stats_Code', values='value').reset_index()
    
    # Renaming
    nameMapping = {0: 'Total Rainfall Mm', 2: 'Mean Air Temperature in Celsius', 
                   16: 'Mean Vapour Pressure Hpa', 33: 'Mean Wind Speed M/Sec',
                   60: 'Days Of Wind Gust >= 24 Knots in Day',
                   61: 'Standard Deviation Of Daily Mean Temperature in Celsius'}
    
    monthMapping = {'Apr':4, 'Aug':8, 'Dec':12, 'Feb' : 2, 'Jan' : 1, 'Jul' : 7, 'Jun' : 6, 'Mar' : 3, 'May' : 5,
       'Nov' : 11, 'Oct' : 10, 'Sep' : 9}
    
    data['Month'] = data['Month'].map(monthMapping)

    data = data.rename(columns=nameMapping)

    data['Day'] = 1  # Add a day column, as required for datetime conversion

    
    # Date formatting

    datetimes = pd.to_datetime(data[['Year', 'Month', 'Day']])  # Series of first-of-month dates, one per row

    # Keep only the dates and the measurement columns
    data = pd.concat([datetimes, data[['Total Rainfall Mm',
       'Mean Air Temperature in Celsius', 'Mean Vapour Pressure Hpa',
       'Mean Wind Speed M/Sec', 'Days Of Wind Gust >= 24 Knots in Day',
       'Standard Deviation Of Daily Mean Temperature in Celsius']]], axis=1)

    data = data.rename(columns={0: 'Date'})  # The unnamed date series becomes column 0 after concat; rename it to 'Date'

    return data
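
The reshaping steps in `CliFloReshape` can be followed on a small in-memory table. The sketch below is an illustration only: the two-month, two-statistic frame and its values are made up, and it assumes a pandas version in which `pd.pivot` accepts a list for `index` (1.1 or later). Unlike the function above, it also sorts the result by date and keeps the `Date` column alongside the others.

```python
import pandas as pd

# Toy stand-in for a cleaned CliFlo export (values are made up):
# one row per station, year and statistic code, one column per month.
wide = pd.DataFrame({
    "Station": ["A", "A"],
    "Year": [2020, 2020],
    "Stats_Code": [0, 2],
    "Jan": [80.5, 18.2],
    "Feb": [60.0, 18.9],
})

# Wide to long: one row per station, year, statistic code and month.
long_df = pd.melt(wide, id_vars=["Station", "Year", "Stats_Code"],
                  value_vars=["Jan", "Feb"], var_name="Month")

# Long to wide: one column per statistic code.
tidy = pd.pivot(long_df, index=["Station", "Year", "Month"],
                columns="Stats_Code", values="value").reset_index()

# Build a first-of-month date from the year and the month number.
tidy["Month"] = tidy["Month"].map({"Jan": 1, "Feb": 2})
tidy["Day"] = 1
tidy["Date"] = pd.to_datetime(tidy[["Year", "Month", "Day"]])

# Put the rows in chronological order.
tidy = tidy.sort_values("Date").reset_index(drop=True)
print(tidy[["Date", 0, 2]])
```

After the pivot, the statistic codes `0` and `2` are the column labels, which is why the function renames them with `nameMapping`.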



In [3]:
def Ouput_as_csv (filelist, path = ''):
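    '''
    Reshape each CliFlo file and save the result as a CSV file in the working directory.

        Args:
            filelist: a list of file names including extension (tab-separated .txt files)
            path: path to the directory holding the files, ending with a separator (only needed if they are not in the working directory)

    '''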

    for i in filelist:
        data = CliFloReshape(path+i)
        data.to_csv(f'New{i[:-4]}.csv', index=False) 
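
The function above builds file names with string concatenation (`path+i`) and slicing (`i[:-4]`), which relies on `path` ending in a separator and on a four-character extension. A possible alternative, sketched here with the standard library's `pathlib`, removes both assumptions. The helper name `output_name` is hypothetical and not part of this notebook.

```python
from pathlib import Path


def output_name(filename, prefix="New"):
    """Hypothetical helper: build the CSV name for a given input file name."""
    # Path.stem drops the final extension whatever its length,
    # unlike filename[:-4], which assumes exactly four characters.
    return f"{prefix}{Path(filename).stem}.csv"


# The "/" operator inserts the separator, so the directory
# does not need a trailing slash.
source = Path("../RegionsCSVfiles") / "UpperNorthRegions.txt"

print(source.name)
print(output_name(source.name))
```

`pd.read_csv` accepts a `Path` object directly, so `source` could be passed to `CliFloReshape` unchanged.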




In [4]:
filelist = ['UpperNorthRegions.txt', 'SouthSouthRegions.txt', 'UpperSouthARegions.txt','SouthNorthRegions.txt', 'CentralNorthRegions.txt']
path = '../RegionsCSVfiles/'

In [5]:
Ouput_as_csv(filelist, path)