# Brief

Implement a time series forecasting model in Python using the FBProphet module.
The model should predict sunspot numbers (see below) with Facebook’s Prophet time series forecasting model. Prophet is a procedure for forecasting time series data based on an additive model in which non-linear trends are fit together with yearly, weekly, and daily seasonality.
Sunspots are temporary phenomena on the Sun's photosphere that appear as spots darker than the surrounding areas. They are regions of reduced surface temperature caused by concentrations of magnetic field flux that inhibit convection. Sunspots usually appear in pairs of opposite magnetic polarity. Their number varies according to the approximately 11-year solar cycle.
Source: https://en.wikipedia.org/wiki/Sunspot
You should test your forecasting model on three (3) distinct datasets: Daily, Monthly Mean, and Yearly Mean sunspots.
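Prophet's built-in seasonalities (yearly, weekly, daily) are all much shorter than the roughly 11-year solar cycle mentioned above. One possible way to account for that cycle, sketched here, is to register a custom seasonality through Prophet's `add_seasonality` method. The helper name `build_sunspot_model`, the component name `solar_cycle`, the exact 11-year period, and the Fourier order of 5 are illustrative assumptions, not values taken from this notebook.

```python
# Sketch only: the period and fourier_order below are assumptions for illustration.
DAYS_PER_YEAR = 365.25
SOLAR_CYCLE_DAYS = 11 * DAYS_PER_YEAR  # approximate solar cycle length in days


def build_sunspot_model(fourier_order=5):
    """Return an unfitted Prophet model with an ~11-year custom seasonality."""
    try:
        from prophet import Prophet  # package name since Prophet 1.0
    except ImportError:
        from fbprophet import Prophet  # legacy package name

    model = Prophet()
    # Prophet expects custom seasonality periods to be expressed in days.
    model.add_seasonality(
        name="solar_cycle",
        period=SOLAR_CYCLE_DAYS,
        fourier_order=fourier_order,
    )
    return model
```

The import is kept inside the function so that the period constant can be inspected even where Prophet is not installed. Whether an 11-year component actually improves the forecast would need to be checked against held-out data.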

# Imports

In [27]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

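try:
    from prophet import Prophet  # package name since Prophet 1.0
except ImportError:
    from fbprophet import Prophet  # legacy package name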

import sklearn


# Data and brief EDA

'Year' - Year

'Month' - Month

'Day' - Day

'Decimal_year' - Date in fraction of year

'Daily_sunspot_number' - Daily total sunspot number. A value of -1 indicates that no number is available for that day (missing value).

'Standard_deviation' - Daily standard deviation of the input sunspot numbers from individual stations.

'Number_observations' - Number of observations used to compute the daily value.

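'Provisional_indicator' - Definitive/provisional indicator. In the source description, a blank indicates that the value is definitive and a '*' symbol indicates that the value is still provisional and subject to possible revision (usually the last 3 to 6 months). In the CSV loaded below, the column holds integers instead: the oldest rows show 1 and the most recent rows show 0, which suggests that 1 marks definitive and 0 marks provisional values.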

In [28]:
colnames=['Year', 'Month', 'Day', 'Decimal_year', 'Daily_sunspot_number', 'Standard_deviation', 
          'Number_observations', 'Provisional_indicator'] 
df = pd.read_csv(r'data/daily.csv', names=colnames, header=None, delimiter=';')

In [29]:
df

,Year,Month,Day,Decimal_year,Daily_sunspot_number,Standard_deviation,Number_observations,Provisional_indicator
0,1818,1,1,1818.001,-1,-1.0,0,1
1,1818,1,2,1818.004,-1,-1.0,0,1
2,1818,1,3,1818.007,-1,-1.0,0,1
3,1818,1,4,1818.010,-1,-1.0,0,1
4,1818,1,5,1818.012,-1,-1.0,0,1
...,...,...,...,...,...,...,...,...
74625,2022,4,26,2022.316,132,21.8,43,0
74626,2022,4,27,2022.319,132,14.8,43,0
74627,2022,4,28,2022.322,135,15.5,42,0
74628,2022,4,29,2022.325,107,21.4,39,0


In [30]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 74630 entries, 0 to 74629
Data columns (total 8 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   Year                   74630 non-null  int64  
 1   Month                  74630 non-null  int64  
 2   Day                    74630 non-null  int64  
 3   Decimal_year           74630 non-null  float64
 4   Daily_sunspot_number   74630 non-null  int64  
 5   Standard_deviation     74630 non-null  float64
 6   Number_observations    74630 non-null  int64  
 7   Provisional_indicator  74630 non-null  int64  
dtypes: float64(2), int64(6)
memory usage: 4.6 MB


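No null values are reported.
Note, however, that missing observations are encoded as -1 in Daily_sunspot_number (and in Standard_deviation, as the first rows show), so `df.info()` does not count them as nulls.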
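Because the missing days are stored as -1 rather than as empty cells, one way to make them visible is to mask the sentinel as `NaN`. The following is a minimal sketch: the helper name `mask_missing` and the toy values are hypothetical, and treating -1 in `Standard_deviation` the same way is an assumption based on the rows shown above.

```python
import numpy as np
import pandas as pd


def mask_missing(frame, columns=("Daily_sunspot_number", "Standard_deviation")):
    """Return a copy of `frame` with the -1 sentinel replaced by NaN."""
    out = frame.copy()
    for col in columns:
        # Series.mask replaces values where the condition is True.
        out[col] = out[col].mask(out[col] == -1, np.nan)
    return out


# Toy data (not taken from the real file) to show the effect.
toy = pd.DataFrame(
    {
        "Daily_sunspot_number": [-1, 132, 107],
        "Standard_deviation": [-1.0, 21.8, 21.4],
    }
)
cleaned = mask_missing(toy)
n_missing = int(cleaned["Daily_sunspot_number"].isna().sum())
print(n_missing)
```

Working on a copy leaves the original frame untouched, so the raw sentinel values remain available for comparison.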

In [43]:
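# Build a daily datetime index from the Year/Month/Day columns
def make_col_datetime(df):
    # Combine Year, Month and Day into a single datetime column
    # (note: this adds 'Date' to the caller's frame in place)
    df['Date'] = pd.to_datetime(df[['Year', 'Month', 'Day']])

    # Move 'Date' to the first position
    first_column = df.pop('Date')
    df.insert(0, 'Date', first_column)

    # Use the dates as the index (returns a new frame)
    df = df.set_index('Date')

    # Sort chronologically before fixing the frequency
    df = df.sort_index()

    # Convert to a daily frequency; any absent days become NaN rows
    df = df.asfreq('D')

    return df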
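The `asfreq('D')` step matters when calendar days are absent from the file: it reindexes the frame onto a complete daily range and fills the new rows with `NaN`. The toy index below is made up purely to show that behaviour and is not taken from the sunspot data.

```python
import pandas as pd

# Toy frame with 2022-01-03 deliberately left out (made-up values).
idx = pd.to_datetime(["2022-01-01", "2022-01-02", "2022-01-04"])
toy = pd.DataFrame({"Daily_sunspot_number": [10, 12, 15]}, index=idx)

# Sort chronologically, then reindex onto a gap-free daily calendar.
daily = toy.sort_index().asfreq("D")

n_rows = len(daily)
n_gaps = int(daily["Daily_sunspot_number"].isna().sum())
print(n_rows, n_gaps)
```

Counting the `NaN` rows after `asfreq` is a quick way to check whether the original file had any missing calendar days.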

In [44]:
make_col_datetime(df)

Date,Year,Month,Day,Decimal_year,Daily_sunspot_number,Standard_deviation,Number_observations,Provisional_indicator
1818-01-01,1818,1,1,1818.001,-1,-1.0,0,1
1818-01-02,1818,1,2,1818.004,-1,-1.0,0,1
1818-01-03,1818,1,3,1818.007,-1,-1.0,0,1
1818-01-04,1818,1,4,1818.010,-1,-1.0,0,1
1818-01-05,1818,1,5,1818.012,-1,-1.0,0,1
...,...,...,...,...,...,...,...,...
2022-04-26,2022,4,26,2022.316,132,21.8,43,0
2022-04-27,2022,4,27,2022.319,132,14.8,43,0
2022-04-28,2022,4,28,2022.322,135,15.5,42,0
2022-04-29,2022,4,29,2022.325,107,21.4,39,0
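
Prophet expects its training frame to have a datestamp column named `ds` and a value column named `y`. A minimal sketch of reshaping a date-indexed frame like the one above into that layout follows. The helper name `to_prophet_frame` and the toy values are hypothetical, and masking -1 as missing is an assumption about how the gaps should be handled.

```python
import numpy as np
import pandas as pd


def to_prophet_frame(frame, value_col="Daily_sunspot_number"):
    """Build a two-column frame (`ds`, `y`) from a date-indexed frame."""
    out = pd.DataFrame(
        {
            "ds": frame.index,
            "y": frame[value_col].to_numpy(dtype=float),
        }
    )
    # Treat the -1 sentinel as a missing observation.
    out["y"] = out["y"].replace(-1.0, np.nan)
    return out


# Toy data (made-up dates and values) to show the resulting layout.
toy = pd.DataFrame(
    {"Daily_sunspot_number": [-1, 132, 107]},
    index=pd.date_range("2022-04-27", periods=3, freq="D", name="Date"),
)
prophet_df = to_prophet_frame(toy)
print(prophet_df.columns.tolist())
```

Rows whose `y` is `NaN` can be left in the frame, since Prophet ignores them when fitting.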
