**Creation of simulated dataset of climate variables, based on Phoenix Park, Dublin weather station**   
As explained in the README file for this repository, this notebook generates simulated daily data for rainfall, atmospheric pressure, and minimum and maximum temperatures, based on analysis of actual government-provided readings from the Phoenix Park weather station.  

Climate predictions are highly important given concerns about human-induced global warming.  

We start with the actual historical data, which was downloaded from the government website into an Excel spreadsheet and then exported to a csv file. The non-data header rows, the columns for variables that are not being examined, and any incomplete rows of data have been removed from the file.  

The pandas, matplotlib and numpy packages are imported for processing the input data.  
The csv file is first loaded into a pandas dataframe, and we get the data type for each column:  
(ref https://www.shanelynn.ie/python-pandas-read_csv-load-data-from-csv-files/)  
(ref https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.dtypes.html)


In [14]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
df = pd.read_csv('Phoenix Park weather data.csv')
print(df.dtypes)

date      object
maxtp    float64
mintp    float64
rain     float64
cbl       object
dtype: object


'cbl' is the 'convective boundary layer', the layer of the earth's atmosphere most affected by the heating effect of the sun on the earth's surface. So the 'cbl' column here, which is for a ground-based weather station, is the atmospheric pressure at ground level.
(ref https://en.wikipedia.org/wiki/Convective_planetary_boundary_layer)
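
The dtype listing above shows 'cbl' as `object` rather than `float64`, which suggests that at least one value in that column was read as text. A minimal sketch of one way to convert it is given here, using a small hypothetical sample in place of the real file (the blank string is an assumed example of a non-numeric entry, not something confirmed in the data):

```python
import pandas as pd

# Hypothetical sample standing in for the real csv file; the blank string
# is an assumed example of an entry that forces the column to dtype object.
sample = pd.DataFrame({
    'date': ['01-Jan-06', '02-Jan-06', '03-Jan-06'],
    'cbl': ['1002.8', ' ', '1020.5'],
})

# Convert to numbers; anything that cannot be parsed becomes NaN
# instead of raising an error.
sample['cbl'] = pd.to_numeric(sample['cbl'], errors='coerce')

print(sample.dtypes)
print('Non-numeric cbl readings:', sample['cbl'].isna().sum())
```

After a conversion like this, a null check such as `isna()` would also pick up entries that were previously hidden as text.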

Display the first few rows of the dataframe and count the number of rows:
(ref https://www.shanelynn.ie/select-pandas-dataframe-rows-and-columns-using-iloc-loc-and-ix/)  
(ref https://stackoverflow.com/questions/15943769/how-do-i-get-the-row-count-of-a-pandas-dataframe)

In [2]:
print(df.iloc[0:12])
print(' ')
print('Number of rows ', len(df.index))

         date  maxtp  mintp  rain     cbl
0   01-Jan-06    8.2    2.9   0.0  1002.8
1   02-Jan-06   10.6    0.7   0.0  1016.5
2   03-Jan-06   10.9    0.0   0.2  1020.5
3   04-Jan-06    7.2   -1.6   0.0  1022.5
4   05-Jan-06    6.4    4.1   0.0  1014.6
5   06-Jan-06    5.1    1.1   0.0  1015.6
6   07-Jan-06    4.7    1.7   0.0  1017.9
7   08-Jan-06    5.9   -2.7   1.3  1019.2
8   09-Jan-06   10.5   -4.2   0.0  1014.5
9   10-Jan-06   12.9    8.0   9.6  1002.3
10  11-Jan-06    9.0    4.1   0.3  1007.6
11  12-Jan-06   12.5    4.6   0.0  1011.6
 
Number of rows  4743


There are many data rows, so check programmatically that none of them contain null values, as these would affect the statistics, such as the mean, that will be used to create the simulated data.
(ref https://stackoverflow.com/questions/43424199/display-rows-with-one-or-more-nan-values-in-pandas-dataframe)

In [3]:
# Keep only the rows that have a NaN in at least one column
df1 = df[df.isna().any(axis=1)]
print(len(df1))

0


There aren't any null values, so we needn't worry about this.
List the column names, then set the dataframe index to the date column, which is unique:

In [4]:
#print(df.iloc[0])
for col in df.columns: 
    print(col)

date
maxtp
mintp
rain
cbl


In [7]:
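# Note: set_index returns a new dataframe. The result is not assigned here,
# so df keeps its default integer index and its 'date' column.
df.set_index('date')
print(df.iloc[0:10])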

        date  maxtp  mintp  rain     cbl
0  01-Jan-06    8.2    2.9   0.0  1002.8
1  02-Jan-06   10.6    0.7   0.0  1016.5
2  03-Jan-06   10.9    0.0   0.2  1020.5
3  04-Jan-06    7.2   -1.6   0.0  1022.5
4  05-Jan-06    6.4    4.1   0.0  1014.6
5  06-Jan-06    5.1    1.1   0.0  1015.6
6  07-Jan-06    4.7    1.7   0.0  1017.9
7  08-Jan-06    5.9   -2.7   1.3  1019.2
8  09-Jan-06   10.5   -4.2   0.0  1014.5
9  10-Jan-06   12.9    8.0   9.6  1002.3
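
The output above still shows the default integer index, because `set_index` returns a new dataframe rather than changing `df`. A minimal sketch of keeping the result and checking that the dates really are unique is given here. The sample data and the name `indexed` are hypothetical, and the result is stored in a separate variable so that a 'date' column remains available on the original dataframe:

```python
import pandas as pd

# Hypothetical sample standing in for the real dataframe
sample = pd.DataFrame({
    'date': ['01-Jan-06', '02-Jan-06', '03-Jan-06'],
    'maxtp': [8.2, 10.6, 10.9],
})

# Keep the returned dataframe. verify_integrity=True raises a ValueError
# if any date appears more than once.
indexed = sample.set_index('date', verify_integrity=True)

print(indexed.index.is_unique)
print(indexed.loc['02-Jan-06', 'maxtp'])
```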


Extract the data for each month (only January is printed for now):
(ref https://stackoverflow.com/questions/27975069/how-to-filter-rows-containing-a-string-pattern-from-a-pandas-dataframe)

In [13]:
months = ['Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec']
for i, mon in enumerate(months):
    # Only the first month (January) is examined for now
    if i == 0:
        # Select the rows whose date string contains the month abbreviation
        month_df = df[df['date'].str.contains(mon)]
        print(month_df.max())
#        print(month_df['cbl'])
#        print(month_df)
#        plt.hist(month_df['cbl'])
#        plt.xlabel('Minimum Temperature')
#        plt.title(mon)
#        plt.show()

pr=[1024,998,1016]    
    
#dm=[]
#dm[0] = df[df['date'].str.contains(mon[0])]

#print(dfjan)
#plt.hist(dm[0]['mintp'])
#plt.show()
#plt.hist(np.random.gamma(0.3,5,500))

date     31-Jan-19
maxtp         15.1
mintp         12.6
rain          23.6
cbl          999.9
dtype: object
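
The January summary above reports a maximum 'cbl' of 999.9, although higher January readings (for example 1022.5) appear in the rows printed earlier. This is consistent with 'cbl' and 'date' being compared as text, since both columns have dtype `object`. A minimal sketch of one way around this is given here: parse the dates and pressures before grouping by month. The sample values are hypothetical, and the `%b` format code assumes English month abbreviations in the current locale:

```python
import pandas as pd

# Hypothetical sample in the same layout as the real file
sample = pd.DataFrame({
    'date': ['01-Jan-06', '15-Feb-06', '31-Jan-19', '10-Feb-19'],
    'maxtp': [8.2, 9.1, 15.1, 7.4],
    'cbl': ['1002.8', '998.0', '999.9', '1024.1'],
})

# Parse day-month-year strings such as '01-Jan-06' into real dates
sample['date'] = pd.to_datetime(sample['date'], format='%d-%b-%y')
# Make sure pressure is numeric so that max() compares numbers, not text
sample['cbl'] = pd.to_numeric(sample['cbl'], errors='coerce')

# Group by calendar month (1 = January ... 12 = December) across all years
monthly_max = sample.groupby(sample['date'].dt.month)[['maxtp', 'cbl']].max()
print(monthly_max)
```

Grouping on the parsed month also avoids running a separate string match for each month name.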


Analysis of Irish climate records by a team at Maynooth University on behalf of the Environmental Protection Agency demonstrates that a Gamma probability distribution may be used to model winter and summer precipitation levels, and a Normal distribution to model summer temperatures.
(ref http://www.epa.ie/pubs/reports/research/climate/Reserach_Report_277.pdf)
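
As a rough illustration of how these two distributions could be used to generate simulated values, the sketch below estimates Gamma parameters by the method of moments and draws samples with numpy. The observed values are hypothetical placeholders rather than Phoenix Park readings, and the method of moments is only one possible way to fit the parameters; it is not necessarily the approach taken in the report:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical daily rainfall amounts (mm), placeholders for real readings
observed_rain = np.array([0.2, 1.3, 9.6, 0.3, 4.1, 2.2, 0.8, 6.5])

# Method-of-moments estimates for a Gamma distribution, using
# mean = shape * scale and variance = shape * scale**2
mean = observed_rain.mean()
var = observed_rain.var(ddof=1)
shape = mean ** 2 / var
scale = var / mean

simulated_rain = rng.gamma(shape, scale, size=1000)

# Hypothetical summer maximum temperatures (degrees C), also placeholders
observed_maxtp = np.array([18.4, 21.0, 19.7, 22.3, 20.1, 17.9])
simulated_maxtp = rng.normal(observed_maxtp.mean(),
                             observed_maxtp.std(ddof=1), size=1000)

print(simulated_rain[:5])
print(simulated_maxtp[:5])
```

Because Gamma samples are never negative, the simulated rainfall stays physically plausible, whereas a Normal model could produce negative amounts.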