### Windfarm Notebook

***

The weather data analysed in this notebook was downloaded from the Met Éireann website. We downloaded data from stations in the four corners of the country to see whether wind speed varies with location. We will also analyse whether wind speed in Ireland is changing over time: is Ireland getting windier, or less windy?


<div><img src="https://d3hnfqimznafg0.cloudfront.net/image-handler/ts/20180403085507/ri/850/src/images/Article_Images/ImageForArticle_703(1).jpg" alt="Domain Names" width="640" height="360"></div>


### Description of Project

***

### **Tasks** 

1.  You may look for your own source of historic weather information, and/or
    use the Met Eireann one (Historical Data - Met Éireann - The Irish
    Meteorological Service). Click on the download button to get a zip file that
    contains a CSV file.
1.  You may need to clean and normalize the data before doing analysis.

**Questions you can ask:**

1.  How much wind power is there at a particular location? This is quite open-ended: is it just the mean wind speed for
    an hour/day/month/year, or should you take into account that windfarms can only operate within a range of wind speeds
    (minimum and maximum speeds)?

1.  Some analysis of when the power is available (time of day/year) would be useful.

1.  Are the wind speeds likely to be the same 10 years in the future? i.e. is there a trend in recorded wind speeds over the last
    few decades?

1.  Is there any other weather metric worth analyzing (e.g. rain, temp)?

1.  What will the power output of the windfarms in Ireland be like next week, according to the weather forecasts? (OK, that is a
    tricky one, because you would need to get, or make up, information about the size and locations of the wind farms in Ireland,
    or find/make up the wind speed to power output equation.)

1.  Anything else you can think of?
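
To make the first and fifth questions more concrete, here is a minimal sketch of a wind speed to power output equation with an operating range. It is an illustration only, not the method used later in this notebook: the function name `turbine_power_kw`, the rotor diameter, the power coefficient and the cut-in, rated and cut-out speeds are all made-up values for a hypothetical turbine, and the function expects wind speed in metres per second (so speeds recorded in knots would need converting first).

```python
import math

AIR_DENSITY = 1.225      # kg/m^3, standard sea-level air density
KNOTS_TO_MS = 0.514444   # 1 knot = 1852 m / 3600 s


def turbine_power_kw(wind_speed_ms, rotor_diameter_m=100.0, power_coefficient=0.4,
                     cut_in_ms=3.0, rated_ms=12.0, cut_out_ms=25.0):
    """Estimate the output (kW) of one hypothetical turbine at a wind speed in m/s.

    All default parameter values are assumptions chosen for illustration.
    """
    # Below cut-in or at/above cut-out the turbine produces nothing
    if wind_speed_ms < cut_in_ms or wind_speed_ms >= cut_out_ms:
        return 0.0
    # Above the rated speed the output is held at the rated level
    effective_speed = min(wind_speed_ms, rated_ms)
    swept_area = math.pi * (rotor_diameter_m / 2) ** 2
    # Kinetic power in the wind (0.5 * rho * A * v^3) scaled by the power coefficient
    watts = 0.5 * AIR_DENSITY * swept_area * power_coefficient * effective_speed ** 3
    return watts / 1000.0


# Example: convert a speed in knots to m/s before estimating the power
speed_ms = 14 * KNOTS_TO_MS
print(round(turbine_power_kw(speed_ms), 1), "kW")
```

Because the available power grows with the cube of the wind speed, the mean wind speed alone can be misleading: two locations with the same mean can produce quite different amounts of power.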


### Import the Libraries

***

We use [pandas](https://pandas.pydata.org/), a Python library for data manipulation and analysis. Its DataFrame data structure lets us
load and investigate CSV files, among other features.

We use [NumPy](https://numpy.org/), a Python library for working with large multi-dimensional arrays and matrices. It also supplies a
large collection of high-level mathematical functions that operate on these arrays
([NumPy Wikipedia](https://en.wikipedia.org/wiki/NumPy)).

We use [matplotlib](https://matplotlib.org/), a plotting library for Python that is usually used in conjunction with NumPy, its
numerical mathematics extension. We also import [seaborn](https://seaborn.pydata.org/), a statistical plotting library built on
matplotlib.


In [1]:
#Import the required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from python.createdb import CreateDB as createdb
from python.createtable import CreateTable as createtable
from python.stations import Stations as stations
from python.import_data import Import_Data as import_data
from python.writedb import WriteDB as write
from python.testdb import TestDB as testdb

## Import the weather information for a number of weather stations.

Below, we import the weather data from a number of different locations around the country. We will create Python classes to download the datasets into the `data` folder in this repository, and also to load the data into an SQL database called `weather`. We will create a table in the database for each station that we download data from, named after the station's location, for example `shannon_airport`, `dublin_airport` etc.
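
The classes used in this notebook live in the repository's `python` package and are not reproduced here. As a rough illustration of the "one table per station" idea, here is a minimal sketch that swaps in an in-memory SQLite database and `DataFrame.to_sql`. The helper names `write_station` and `first_rows` and the three sample rows are hypothetical, and the notebook's own `weather` database may well use a different engine.

```python
import sqlite3

import pandas as pd

# Illustrative sample standing in for one station's cleaned hourly data
sample = pd.DataFrame({
    "date": ["01-jan-1962 01:00", "01-jan-1962 02:00", "01-jan-1962 03:00"],
    "temp": [-1.1, -1.1, -1.0],
    "wdsp": [14, 10, 12],
})


def write_station(conn, station_name, frame):
    """Store one station's data in a table named after the station (hypothetical helper)."""
    frame.to_sql(station_name, conn, if_exists="replace", index=False)


def first_rows(conn, station_name, limit=20):
    """Return the first `limit` rows of a station table (hypothetical helper)."""
    # Table names cannot be bound as SQL parameters, so only pass trusted names
    query = f'SELECT * FROM "{station_name}" LIMIT ?'
    return conn.execute(query, (limit,)).fetchall()


conn = sqlite3.connect(":memory:")  # in-memory database, nothing is written to disk
write_station(conn, "cork_airport", sample)
rows = first_rows(conn, "cork_airport", limit=2)
print(rows)
conn.close()
```

Using `if_exists="replace"` means the sketch can be rerun without duplicating rows, which matters in a notebook where cells are often executed more than once.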


In [2]:

# List of weather stations and the number of rows to skip in the data file
weather_stations = [
     [{"athenry" :1875}, 17],
     [{"cork_airport" : 3904}, 23],
     [{"shannon_airport" : 518}, 23],
     [{"dublin_airport" : 532}, 23],
     [{"mullingar": 875}, 17],
     [{"gurteen" : 1475}, 17]
]

# Create the database to store the downloaded data
'''
db = createdb()

#Create an instance of the CreateTable class to create the tables in the database
tables = createtable()

# Create the tables in the database
for i in weather_stations:
    for name, id in i[0].items():
       skiprows = i[1]
       tables.create_table(name, skiprows)

'''

# Create an instance of the import data class to import the data into the data folder
# (comment this block out once the folder has been populated, to stop it being rerun again and again)

data = import_data()

#Import the data from the weather stations
for i in weather_stations:
    for name, id in i[0].items():
        skiprows = i[1]
        data.import_data(name, id, skiprows)


# Create an instance of the write class to write the data to a database
# (it takes about 4 minutes to load the data into the SQL database, so comment this
# block out once the database has been populated to stop it being rerun again and again)

# Use a different name for the instance so that the imported `write` class is not overwritten
writer = write()

# Write the data to the database using the station name as the table name
for i in weather_stations:
    for name, id in i[0].items():
        skiprows = i[1]
        writer.write_db(name, id, skiprows)



Engine created
Engine created
Engine created
Engine created
Engine created
Engine created


In [3]:
# Test the database by querying the data
test = testdb()

# Print the first 20 rows of the table
test.test_db('cork_airport')

('01-jan-1962 01:00', 8, None, 0, -1.1, 0, -1.3, -1.6, 5.3, 94.0, 1016.0, 1, 14, 1, 340, 2, 0, None, 30000.0, 999.0, 2.0)
('01-jan-1962 02:00', 8, None, 0, -1.1, 0, -1.3, -1.6, 5.3, 94.0, 1016.5, 1, 10, 1, 340, 3, 1, None, 30000.0, 20.0, 7.0)
('01-jan-1962 03:00', 8, None, 0, -1.0, 0, -1.2, -1.6, 5.3, 94.0, 1016.7, 1, 12, 1, 320, 1, 1, None, 30000.0, 999.0, 3.0)
('01-jan-1962 04:00', 8, None, 0, -1.6, 0, -1.8, -2.2, 5.1, 94.0, 1017.2, 1, 8, 1, 330, 1, 0, None, 30000.0, 999.0, 1.0)
('01-jan-1962 05:00', 8, None, 0, -2.1, 0, -2.3, -3.3, 4.8, 93.0, 1018.0, 1, 11, 1, 320, 1, 0, None, 30000.0, 999.0, 0.0)
('01-jan-1962 06:00', 8, None, 0, -2.1, 0, -2.3, -3.3, 4.9, 93.0, 1018.1, 1, 11, 1, 330, 2, 1, None, 30000.0, 999.0, 0.0)
('01-jan-1962 07:00', 8, None, 0, -2.2, 0, -2.4, -3.3, 4.8, 93.0, 1018.8, 1, 11, 1, 340, 2, 0, None, 30000.0, 999.0, 0.0)
('01-jan-1962 08:00', 8, None, 0, -1.6, 0, -1.9, -2.7, 4.9, 92.0, 1019.0, 1, 14, 1, 340, 2, 0, None, 30000.0, 999.0, 0.0)
('01-jan-1962 09:00', 8, N

### Load the datasets

***

To compare wind speed across the four corners of the country, I downloaded several datasets from the Met Éireann website into the `data` folder of this repository (see above). The datasets are not identical: the number of metadata rows and the number of columns depend on the station they were taken from. Some datasets begin with 17 rows of metadata, while others begin with 23 rows. Some stations have 15 columns of data, while others have 21. These metadata rows must be skipped when importing each dataset.

For clarity, the initial part of this notebook demonstrates the steps taken to clean the dataset using the `dublin_airport_532.csv` file. The `skiprows=23` argument is passed to the `pd.read_csv()` function to skip the metadata at the top of that file. I have used the column descriptions in those 23 rows to rename the columns of the dataset, which makes the dataset clearer and easier to read. The `skipinitialspace=True` argument is also used when importing the dataset; the reasoning for this is explained below, when we look at the missing values in the dataset.

We then went on to drop the `indicator` columns in the dataset. 
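
The loading steps described above can be sketched on a small synthetic file. This is a minimal illustration under stated assumptions, not the notebook's actual cleaning code: the in-memory text below only imitates the layout of a Met Éireann file (three metadata lines instead of 17 or 23), and the new column names `temperature_c` and `wind_speed_kt` are hypothetical.

```python
import io

import pandas as pd

# Synthetic stand-in for a station file: 3 metadata lines, then the header and data rows
raw = """Station Name: EXAMPLE
date:  -  Date and Time (utc)
ind:   -  Indicator
date,ind,rain,ind,temp,wdsp
01-jan-1962 01:00,0, ,0,5.3,14
01-jan-1962 02:00,0,0.1,0,5.1,10
"""

# skiprows jumps over the metadata; skipinitialspace strips the space after each
# delimiter, so a field holding only a space is read as a missing value (NaN)
df = pd.read_csv(io.StringIO(raw), skiprows=3, skipinitialspace=True)

# pandas renames the repeated `ind` header to `ind`, `ind.1`, ...; drop them all
indicator_columns = [col for col in df.columns if col.startswith("ind")]
df = df.drop(columns=indicator_columns)

# Rename the terse column codes (the new names are assumptions for illustration)
df = df.rename(columns={"temp": "temperature_c", "wdsp": "wind_speed_kt"})

print(df)
print(df["rain"].isna().sum(), "missing rain value(s)")
```

For the real files only the `skiprows` value should need to change per station (17 or 23), which is why it is stored alongside each station in the `weather_stations` list.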