# Interactive plots
This notebook contains examples of plots that can be used to investigate and visualise the dataset when broken down by location. See `utils.py` for implementation details.

In [1]:
# Import packages and prepare data
import pandas as pd
import utils

raw = pd.read_csv('data/raw.csv')
raw = utils.fix_dates(raw)

# Fix units (see utils.fix_units); numerical_features is defined below, once the columns are renamed
raw_clean = utils.fix_units(raw)

raw_clean.columns = ['Datetime', 'Location', 'Latitude (deg)', 'Longitude (deg)', 'Altitude (m)', 'Season',
       'Humidity (%)', 'AmbientTemp (deg C)', 'WindSpeed (km/h)', 'Visibility (km)', 'Pressure (mbar)',
       'CloudCeiling (km)', 'Power (W)']

numerical_features = [f for f in raw_clean.columns if f not in ['Datetime', 'Season', 'Latitude (deg)', 'Longitude (deg)', 'Altitude (m)', 'Location']]

### Box and violin plots

The following plots compare the distributions of the numerical features at each location, individually or side by side, using either box or violin plots. The power output often shows an approximately bimodal distribution. The positions of the peaks vary from site to site: the most frequent values occur at high power in some cases (e.g. USAFA) and at low power in others (e.g. Malmstrom). The plots also show that all features vary substantially with location.

In [2]:
utils.compare_box_violins(raw_clean, numerical_features)

interactive(children=(Dropdown(description='Feature', index=6, options=('Humidity (%)', 'AmbientTemp (deg C)',…
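
The quartiles that a box plot draws can also be tabulated directly. The snippet below is a minimal sketch, not the implementation in `utils.py`: the helper `location_quartiles` is hypothetical, and the small DataFrame uses made-up values in place of `raw_clean`.

```python
import pandas as pd

# Synthetic stand-in for raw_clean; the power values are made up for illustration.
df = pd.DataFrame({
    "Location": ["USAFA"] * 4 + ["Malmstrom"] * 4,
    "Power (W)": [10.0, 20.0, 30.0, 40.0, 1.0, 2.0, 3.0, 4.0],
})

def location_quartiles(frame, feature):
    """Hypothetical helper: quartiles and interquartile range of `feature` per location."""
    summary = (
        frame.groupby("Location")[feature]
        .quantile([0.25, 0.5, 0.75])
        .unstack()  # one row per location, one column per quantile
    )
    summary.columns = ["q1", "median", "q3"]
    summary["iqr"] = summary["q3"] - summary["q1"]
    return summary

quartiles = location_quartiles(df, "Power (W)")
print(quartiles)
```

Comparing the `median` and `iqr` columns across rows gives a numerical counterpart to reading the boxes side by side.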

### Comparison of distributions using histograms
These histograms compare the distribution of a selected feature between locations.

In [3]:
utils.compare_histograms(raw_clean, numerical_features, bins=16)

interactive(children=(Dropdown(description='Location 1', index=10, options=('Camp Murray', 'Grissom', 'Hill We…
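
When two histograms are compared, it helps to compute both on the same bin edges so that the bars line up. The following is a sketch under that assumption, using NumPy on synthetic samples. Whether `utils.compare_histograms` shares bin edges in this way is not confirmed here, and `shared_histograms` is a hypothetical name.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic samples standing in for one feature measured at two locations.
site_a = rng.normal(loc=20.0, scale=5.0, size=500)
site_b = rng.normal(loc=30.0, scale=5.0, size=500)

def shared_histograms(a, b, bins=16):
    """Hypothetical helper: density histograms of two samples on common bin edges."""
    # Derive the edges from the pooled data so both samples are fully covered.
    edges = np.histogram_bin_edges(np.concatenate([a, b]), bins=bins)
    dens_a, _ = np.histogram(a, bins=edges, density=True)
    dens_b, _ = np.histogram(b, bins=edges, density=True)
    return edges, dens_a, dens_b

edges, dens_a, dens_b = shared_histograms(site_a, site_b, bins=16)
```

Normalising to a density (`density=True`) keeps the comparison meaningful when the two locations have different numbers of observations.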

### Scatter plots
Relationships between pairs of features can be explored using scatter plots. Because wind speed was recorded at discrete values rather than on a continuous scale, an option is provided to add a small amount of noise so that the relationship is easier to see.

In [4]:
utils.scatterplot(raw_clean, numerical_features)

interactive(children=(Dropdown(description='Location', index=11, options=('Camp Murray', 'Grissom', 'Hill Webe…
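
One common way to add such noise is uniform jitter, sketched below. This is an assumption about the general technique rather than a description of what `utils.scatterplot` does. The function `add_jitter`, its `scale` parameter, and the sample wind speeds are all hypothetical.

```python
import numpy as np

def add_jitter(values, scale=0.5, seed=0):
    """Hypothetical helper: add uniform noise in [-scale, scale) to each value."""
    values = np.asarray(values, dtype=float)
    rng = np.random.default_rng(seed)
    return values + rng.uniform(-scale, scale, size=values.shape)

# Made-up wind speeds recorded only at discrete levels.
wind_speed = np.array([0, 5, 5, 10, 10, 10, 15])
jittered = add_jitter(wind_speed, scale=1.0)
```

Keeping `scale` well below the spacing between recorded levels separates overlapping points without blurring one level into the next. The jittered values are for plotting only and should not replace the data used in any analysis.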

### Map view of PV sites with average output
The variation of average power with location can be explored using the map below. Clicking on a location produces a pop-up showing how the average power varies by month or by hour. The behaviour differs from site to site.

In [5]:
data = raw_clean.copy()
time_increment = 'month_of_year' # Can also aggregate by 'hour_of_day'
utils.create_map(data, time_increment=time_increment)


<Figure size 640x480 with 0 Axes>
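
The averages shown in the pop-ups can be approximated with a grouped aggregation. The sketch below assumes that `Datetime` is a datetime column and that the two `time_increment` values map to the month and hour of the timestamp. The helper `power_by_period` is hypothetical, the data are made up, and the actual logic lives in `utils.create_map`.

```python
import pandas as pd

# Synthetic stand-in for raw_clean; timestamps and power values are made up.
df = pd.DataFrame({
    "Datetime": pd.to_datetime([
        "2018-01-05 10:00", "2018-01-20 12:00",
        "2018-02-03 11:00", "2018-02-17 13:00",
        "2018-01-09 10:00", "2018-01-23 12:00",
    ]),
    "Location": ["USAFA"] * 4 + ["Malmstrom"] * 2,
    "Power (W)": [10.0, 14.0, 20.0, 24.0, 5.0, 7.0],
})

def power_by_period(frame, time_increment="month_of_year"):
    """Hypothetical helper: mean and standard deviation of power per location and period."""
    if time_increment == "month_of_year":
        period = frame["Datetime"].dt.month
    elif time_increment == "hour_of_day":
        period = frame["Datetime"].dt.hour
    else:
        raise ValueError(f"Unknown time_increment: {time_increment!r}")
    return (
        frame.groupby(["Location", period.rename(time_increment)])["Power (W)"]
        .agg(["mean", "std"])
    )

stats = power_by_period(df, "month_of_year")
print(stats)
```

The result is indexed by location and period, so `stats.loc["USAFA"]` returns the monthly profile for a single site.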