# Plotting the data with Plotly

## Outline

### 1. Importing the data
### 2. Working with Plotly
* Individual Scatter Plots
* Yearly Average Scatter Plots

### 3. Using Plotly for All Locations

## 1. Importing the data
First we import our packages and then load the `Locations.csv` file, which is used both for plotting the map and for forecasting the data.

In [3]:
import pandas as pd
import plotly.express as px

In [73]:
locations = pd.read_csv('Locations.csv')
locations.tail()

Unnamed: 0.1,Unnamed: 0,lat,lon,Place,p,d,q,P,D,Q,filepath
22,22,46.2199,-119.0837,"Kennewick, WA",5,0,8,0,1,1,NASA/POWER_Point_Monthly_Timeseries_1981_2020_...
23,23,46.1704,-123.7804,"Navy Heights, OR",4,0,5,0,1,0,NASA/POWER_Point_Monthly_Timeseries_1981_2020_...
24,24,46.1514,-122.8191,"Kelso, WA",0,0,6,0,1,0,NASA/POWER_Point_Monthly_Timeseries_1981_2020_...
25,25,46.0562,-118.3476,"Walla Walla, WA",6,0,7,2,0,2,NASA/POWER_Point_Monthly_Timeseries_1981_2020_...
26,26,45.4969,-122.5938,"Portland, OR",6,0,8,3,0,3,NASA/POWER_Point_Monthly_Timeseries_1981_2020_...
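
The two `Unnamed` columns in the output above are typically leftover index columns from earlier `to_csv` calls. If they get in the way, one option is to drop them after loading. This is a minimal sketch on a toy CSV string, not the real `Locations.csv`; the index values in the toy row are made up.

```python
import io
import pandas as pd

# Toy CSV text mimicking the header layout of Locations.csv
# (assumption: the two leading index values are invented for illustration)
csv_text = (
    'Unnamed: 0.1,Unnamed: 0,lat,lon,Place\n'
    '0,0,49.0362,-122.3247,"Abbotsford, Canada"\n'
)

raw = pd.read_csv(io.StringIO(csv_text))

# Keep only the columns whose name does not start with "Unnamed"
clean = raw.loc[:, ~raw.columns.str.startswith("Unnamed")]
print(clean)
```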


The map below shows all the locations used in our forecasting model. Hovering over a point gives its coordinates and the nearest town or city.

In [125]:
fig = px.scatter_mapbox(locations, lat="lat", lon="lon", hover_name = "Place", 
                        color_discrete_sequence=["darkviolet"], zoom=5.5, height=400, width = 600)
# styles: "open-street-map" or "carto-positron" are the best options 
fig.update_layout(mapbox_style="carto-positron")
fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0})
fig.show()

fig.write_html("Map.html")

## 2. Working with Plotly

### Individual Scatter Plots

Plotly expects data in a particular format, so the best way we have found to plot the SARIMAX predictions is to convert them into a new data frame and compute the year-month-day associated with each predicted value. Here we load the file `forecast_single.py` to predict the data and the file `arima_dataframe.py` to obtain the year-month-day values.

In [12]:
# %load forecast_single.py
import numpy as np
import geopandas as gp
import pandas as pd
import datetime as dt

from statsmodels.tsa.statespace.sarimax import SARIMAX

# import .py scripts from repo
from json_to_csv import geojson_to_csv
from ts_train_test_split import uni_selection


def forecast(locations, sample):
    '''use pd.read_csv('Locations.csv') to access required data.
    Sample must be integer index of desired location.
    Fits model with input data and forecasts to 2035.
    Returns a df with real and forecasted values (1984-2035).
    results[:444] = real values, results[444:] = forecasted.'''

    forecasted = []
    geojson = locations['filepath'][sample]
    df = geojson_to_csv(geojson)
    X = uni_selection(df)
    X.index = pd.DatetimeIndex(X.index.values,
                               freq=X.index.inferred_freq)
    (p, d, q) = (locations['p'][sample],
                 locations['d'][sample], locations['q'][sample])
    (P, D, Q, s) = (locations['P'][sample],
                    locations['D'][sample], locations['Q'][sample], 12)
    model = SARIMAX(X, order=(p, d, q), seasonal_order=(P, D, Q, s))
    fit_model = model.fit(maxiter=50, method='powell', disp=False)
    forecast = fit_model.get_prediction(start='2021-01-01', end='2035-12-01')
    ci = forecast.conf_int()

    forecasted.append(forecast.predicted_mean)
    forecasts = pd.DataFrame(forecasted).T
    forecasts = forecasts.rename(columns={'predicted_mean': 'value'})
    real = pd.DataFrame(X)
    real = real.rename(columns={'ALLSKY_KT': 'value'})
    results = pd.concat([real, forecasts])
    results = results.rename(columns={'value': 'solar'})
    return results


In [13]:
df = forecast(locations, (0))
df


Non-stationary starting autoregressive parameters found. Using zeros as starting parameters.


Non-invertible starting MA parameters found. Using zeros as starting parameters.



Unnamed: 0,solar
1984-01-01,0.440000
1984-02-01,0.410000
1984-03-01,0.510000
1984-04-01,0.480000
1984-05-01,0.450000
...,...
2035-08-01,0.543632
2035-09-01,0.441321
2035-10-01,0.402070
2035-11-01,0.347117
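
Since the returned data frame mixes observed and forecasted rows, it can help to label them before plotting. The sketch below assumes only what the `forecast` function shows, namely that predictions start at `2021-01-01` and that the index is a `DatetimeIndex`. The helper name `label_source` and the `source` column are hypothetical, and the demo values are invented.

```python
import pandas as pd


def label_source(results, forecast_start="2021-01-01"):
    """Return a copy of `results` with a 'source' column (hypothetical helper)."""
    out = results.copy()
    out["source"] = "observed"
    # Rows dated on or after the first predicted month are forecasts
    out.loc[out.index >= pd.Timestamp(forecast_start), "source"] = "forecast"
    return out


# Toy data around the observed/forecast boundary (values are made up)
idx = pd.date_range("2020-11-01", periods=4, freq="MS")
demo = pd.DataFrame({"solar": [0.30, 0.31, 0.40, 0.41]}, index=idx)
labelled = label_source(demo)
print(labelled)
```

A column like this could then be passed as `color="source"` to `px.scatter` to distinguish the two groups.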


In [17]:
# %load arima_dataframe.py
import pandas as pd
import datetime

# df is the data frame of real and forecasted values returned by forecast()
def arima_results(df):
    # Associating a year and a month with the ARIMA predictions
    df = df.rename(columns = {"solar": "Solar Ratio"})
    years = []
    year = 1984
    month = 1
    day = 1
    # Only works for a data frame with exactly 624 monthly rows (1984-01 to 2035-12)
    for x in range(624):
        if month == 13:
            year += 1
            month = 1
        years.append(datetime.datetime(year, month, day))
        month += 1
    # Adding the time column 'Year' to the ARIMA dataframe
    df['Year'] = years
    return df


In [18]:
df_predicted = arima_results(df)
df_predicted

Unnamed: 0,Solar Ratio,Year
1984-01-01,0.440000,1984-01-01
1984-02-01,0.410000,1984-02-01
1984-03-01,0.510000,1984-03-01
1984-04-01,0.480000,1984-04-01
1984-05-01,0.450000,1984-05-01
...,...,...
2035-08-01,0.543632,2035-08-01
2035-09-01,0.441321,2035-09-01
2035-10-01,0.402070,2035-10-01
2035-11-01,0.347117,2035-11-01
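
The loop in `arima_results` is tied to exactly 624 rows. A more flexible variant, sketched here under the assumption that the rows are consecutive months starting in January 1984, builds the dates from the row count with `pd.date_range`. The function name `add_date_column` is hypothetical and the demo values are only illustrative.

```python
import pandas as pd


def add_date_column(df, start="1984-01-01"):
    """Hypothetical alternative to arima_results that adapts to the row count."""
    out = df.rename(columns={"solar": "Solar Ratio"})
    # "MS" means month start, so every date falls on the first of a month
    out["Year"] = pd.date_range(start=start, periods=len(out), freq="MS")
    return out


# Toy input with three monthly values
demo = pd.DataFrame({"solar": [0.44, 0.41, 0.51]})
dated = add_date_column(demo)
print(dated)
```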


Once the data frame is in the format shown above, it is easy to plot the data with Plotly, as shown below for the first location, Abbotsford, Canada.

In [19]:
fig = px.scatter(df_predicted, x = 'Year', y = 'Solar Ratio', trendline="ols",
                 trendline_scope="overall", title="Abbotsford, Canada 49.0362\N{DEGREE SIGN}N 122.3247\N{DEGREE SIGN}W")
fig.show()

### Yearly Average Scatter Plots

With the py file `arima_yearly_averages.py`, we can represent the results from SARIMAX in terms of yearly averages.

In [21]:
# %load arima_yearly_averages.py
import pandas as pd
import numpy as np


def arima_averages(df):
    allSolar = df['Solar Ratio'].tolist()

    avgSolar = []
    years = []

    month = 0
    total = 0  # running sum for the current year (avoids shadowing the built-in sum)
    year = 1984

    for x in allSolar:
        total += x
        month += 1
        if month == 12:
            avgSolar.append(total/12)
            years.append(year)
            year += 1
            month = 0
            total = 0

    avg = pd.DataFrame()
    avg['Solar Ratio'] = avgSolar
    avg['Year'] = years
    return avg


In [22]:
yearly_avg = arima_averages(df_predicted)
yearly_avg

Unnamed: 0,Solar Ratio,Year
0,0.47,1984
1,0.513333,1985
2,0.485833,1986
3,0.5175,1987
4,0.48,1988
5,0.4925,1989
6,0.5,1990
7,0.5175,1991
8,0.499167,1992
9,0.505,1993
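
The same yearly averages can also be computed with a pandas `groupby` on the calendar year. This is a sketch rather than the notebook's method. It assumes the `Year` column holds datetimes, and it keeps only years with all 12 months, mirroring how the loop above ignores an incomplete trailing year. The name `yearly_means` is hypothetical and the demo values are made up.

```python
import pandas as pd


def yearly_means(df):
    """Hypothetical groupby-based counterpart to arima_averages."""
    grouped = df.groupby(df["Year"].dt.year)["Solar Ratio"]
    means = grouped.mean()
    counts = grouped.count()
    # Keep only calendar years that have all 12 monthly values
    complete = means[counts == 12]
    result = complete.rename_axis("Year").reset_index()
    return result[["Solar Ratio", "Year"]]


# Toy data: two full years plus three months of a third year
dates = pd.date_range("1984-01-01", periods=27, freq="MS")
values = [1.0] * 12 + [2.0] * 12 + [3.0] * 3
demo = pd.DataFrame({"Solar Ratio": values, "Year": dates})
averages = yearly_means(demo)
print(averages)
```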


Plot of the yearly averages for the first location: Abbotsford, Canada.

In [24]:
fig = px.scatter(yearly_avg, x = 'Year', y = 'Solar Ratio', trendline="ols",
                 trendline_scope="overall", title="Abbotsford, Canada 49.0362\N{DEGREE SIGN}N 122.3247\N{DEGREE SIGN}W")
fig.show()
# fig.write_html("file.html")  # this will save it as an html file

## 3. Using Plotly for All Locations
Here we take the same concepts as before and create data frames for all of the locations, with year-month-day values and yearly averages. The following for loop builds the new data frames `all_pred` and `allAverage`.

In [74]:
pred_frames = []
average_frames = []

# Loop over every row of Locations.csv instead of hard-coding the count
for i in range(len(locations)):
    df = forecast(locations, i)
    df_predicted = arima_results(df)
    df_average = arima_averages(df_predicted)

    # Attach the coordinates and place name to both data frames
    for frame in (df_predicted, df_average):
        frame['Lat'] = locations.loc[i, 'lat']
        frame['Lon'] = locations.loc[i, 'lon']
        frame['Place'] = locations.loc[i, 'Place']

    pred_frames.append(df_predicted)
    average_frames.append(df_average)

# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
all_pred = pd.concat(pred_frames)
allAverage = pd.concat(average_frames)


Non-stationary starting autoregressive parameters found. Using zeros as starting parameters.


Non-invertible starting MA parameters found. Using zeros as starting parameters.


Non-stationary starting seasonal autoregressive Using zeros as starting parameters.



In [90]:
all_pred

Unnamed: 0,Solar Ratio,Year,Lat,Lon,Place
1984-01-01,0.440000,1984-01-01,49.0362,-122.3247,"Abbotsford, Canada"
1984-02-01,0.410000,1984-02-01,49.0362,-122.3247,"Abbotsford, Canada"
1984-03-01,0.510000,1984-03-01,49.0362,-122.3247,"Abbotsford, Canada"
1984-04-01,0.480000,1984-04-01,49.0362,-122.3247,"Abbotsford, Canada"
1984-05-01,0.450000,1984-05-01,49.0362,-122.3247,"Abbotsford, Canada"
...,...,...,...,...,...
2035-08-01,0.580615,2035-08-01,45.4969,-122.5938,"Portland, OR"
2035-09-01,0.522725,2035-09-01,45.4969,-122.5938,"Portland, OR"
2035-10-01,0.484671,2035-10-01,45.4969,-122.5938,"Portland, OR"
2035-11-01,0.409469,2035-11-01,45.4969,-122.5938,"Portland, OR"


In [77]:
allAverage

Unnamed: 0,Solar Ratio,Year,Lat,Lon,Place
0,0.470000,1984,49.0362,-122.3247,"Abbotsford, Canada"
1,0.513333,1985,49.0362,-122.3247,"Abbotsford, Canada"
2,0.485833,1986,49.0362,-122.3247,"Abbotsford, Canada"
3,0.517500,1987,49.0362,-122.3247,"Abbotsford, Canada"
4,0.480000,1988,49.0362,-122.3247,"Abbotsford, Canada"
...,...,...,...,...,...
47,0.475577,2031,45.4969,-122.5938,"Portland, OR"
48,0.475050,2032,45.4969,-122.5938,"Portland, OR"
49,0.474960,2033,45.4969,-122.5938,"Portland, OR"
50,0.474458,2034,45.4969,-122.5938,"Portland, OR"


For the monthly data, we can make individual line or scatter plots. We can also express the data all together in one graph. 

In [107]:
fig = px.line(all_pred, x = "Year", y = "Solar Ratio", color = "Place", line_group = "Place", hover_name = "Place",
              line_shape="spline", render_mode="svg")
fig.show()

Here we can compare the monthly data to the yearly averages. 

In [108]:
fig = px.line(allAverage, x = "Year", y = "Solar Ratio", color = "Place", line_group = "Place", hover_name = "Place",
              line_shape="spline", render_mode="svg")
fig.show()

With this many locations, it can be hard to tell which line corresponds to which location. Below we provide an animated scatter plot and an animated bar graph to express the yearly averages more clearly. These graphs can also be exported as html files. Their advantage is that they are interactive: you can hit the play button to watch the animation from 1984 to 2035, or stop at any year and inspect the data by hovering over each location.

In [100]:
fig = px.scatter(allAverage, x = "Year", y = "Solar Ratio", animation_frame = "Year", animation_group = "Place",
                 color = "Place", hover_name = "Place", range_x = [1984,2035], range_y = [0.4,0.6])
fig.show()
#fig.write_html("Yearly Averages.html")  #this will save it as an html file

In [99]:
fig = px.bar(allAverage, x = "Place", y = "Solar Ratio", color="Place",
              animation_frame = "Year", animation_group = "Place", range_y=[0.4,0.6])
fig.show()

We are currently unable to get the animated map to work with Plotly.

In [124]:
fig = px.scatter_mapbox(allAverage, lat = "Lat", lon = "Lon", hover_name = "Place", color = "Place", size = "Solar Ratio",
                       animation_frame = "Year", animation_group = "Place", zoom = 10, title = "All Averages")
                        #color_discrete_sequence=["darkviolet"], zoom=5.5, height=400, width = 600)
# styles: "open-street-map" or "carto-positron" are the best options 
fig.update_layout(mapbox_style="carto-positron")
fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0})
fig.show()
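
When a map figure renders without any points, one inexpensive thing to check is whether the coordinate columns hold valid values: latitudes must lie in [-90, 90] and longitudes in [-180, 180]. The helper below is a hypothetical sketch for that check, not part of the notebook's scripts. The demo reuses the Abbotsford coordinates, once in the right order and once swapped.

```python
import pandas as pd


def invalid_coordinates(df, lat_col="Lat", lon_col="Lon"):
    """Return the rows whose latitude or longitude is out of range (hypothetical helper)."""
    bad_lat = ~df[lat_col].between(-90, 90)
    bad_lon = ~df[lon_col].between(-180, 180)
    return df[bad_lat | bad_lon]


# Row 0 has valid coordinates; row 1 has latitude and longitude swapped
demo = pd.DataFrame({
    "Lat": [49.0362, -122.3247],
    "Lon": [-122.3247, 49.0362],
    "Place": ["Abbotsford, Canada", "Abbotsford, Canada"],
})
problems = invalid_coordinates(demo)
print(problems)
```

An empty result does not guarantee that the map will render, but a non-empty one points to rows that no map trace can place.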