## Using airline-passenger Dataset try to predict the number of passengers for next month, given the number of passengers (in units of thousands) for this month

### A. You can write a simple function to convert our single column of data into a two-column dataset: 
* The first column containing this month’s (t) passenger count and the second column containing next month’s (t+1) passenger count, to be predicted.
* Divide the data into train and test set 
* Fit an LSTM model on top of the data where the optimizer = ‘adam’ and epoch =’100’
* Build another model where optimizer = ‘sgd’ and epoch = ‘50

In [13]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import keras
from keras.models import Sequential
from keras.layers import Dense, LSTM
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error

In [14]:
airp = pd.read_csv('C:/Users/Admin/Downloads/airline-passengers.csv')
airp.head()

Unnamed: 0,"""Month"",""Passengers"""
0,"""1949-01"",112"
1,"""1949-02"",118"
2,"""1949-03"",132"
3,"""1949-04"",129"
4,"""1949-05"",121"


In [15]:
airp.columns.values

array(['"Month","Passengers"'], dtype=object)

In [16]:
airp[['Month', 'Passengers']] = airp['"Month","Passengers"'].str.split(',', 1, expand = True)

In [17]:
airp.head()

Unnamed: 0,"""Month"",""Passengers""",Month,Passengers
0,"""1949-01"",112","""1949-01""",112
1,"""1949-02"",118","""1949-02""",118
2,"""1949-03"",132","""1949-03""",132
3,"""1949-04"",129","""1949-04""",129
4,"""1949-05"",121","""1949-05""",121


In [18]:
airp = airp.drop(labels = '"Month","Passengers"', axis = 1)

In [19]:
airp['Month'] = airp['Month'].str.replace('"', '')

In [20]:
airp.head()

Unnamed: 0,Month,Passengers
0,1949-01,112
1,1949-02,118
2,1949-03,132
3,1949-04,129
4,1949-05,121


In [21]:
airp['Month']=pd.to_datetime(airp['Month'], format='%Y-%m-%d')
airp.set_index(['Month'], inplace=True)

In [22]:
airp = airp.values.astype('float32')

In [23]:
airp.dtype

dtype('float32')

In [24]:
# Normalise the dataset

scaler = MinMaxScaler(feature_range=(0, 1))
airp = scaler.fit_transform(airp)

In [25]:
# Split train test data
train_size = int(len(airp)*0.70)
test_size = len(airp) - train_size

In [26]:
train, test = airp[0:train_size,:], airp[train_size:len(airp),:]
print(len(train)), print(len(test))

100
44


(None, None)

### The first column containing this month’s (t) passenger count and the second column containing next month’s (t+1) passenger count, to be predicted

In [27]:
def create_dataset(airp, timestamp =1):
    X, Y = [], []
    for i in range(len(airp)-timestamp-1):
        a = airp[i:(i+timestamp), 0]
        X.append(a)
        Y.append(airp[i+timestamp, 0])
    return np.array(X), np.array(Y)

### Divide the data into train and test set

In [28]:
timestamp =1
X_train, Y_train = create_dataset(train, timestamp=1)
X_test, Y_test = create_dataset(test, timestamp=1)

In [29]:
print(X_train.shape)
print(Y_train.shape)
print(X_test.shape)
print(Y_test.shape)

(98, 1)
(98,)
(42, 1)
(42,)


In [30]:
X_train = np.reshape(X_train, (X_train.shape[0], 1, X_train.shape[1]))
X_test = np.reshape(X_test, (X_test.shape[0], 1, X_test.shape[1]))
print(X_train.shape)
print(Y_train.shape)
print(X_test.shape)
print(Y_test.shape)

(98, 1, 1)
(98,)
(42, 1, 1)
(42,)


### Fit an LSTM model on top of the data where the optimizer = ‘adam’ and epoch =’100’

In [31]:
model = Sequential()
model.add(LSTM(4, input_shape=(1,timestamp)))
model.add(Dense(1))
model.compile(loss = 'mean_squared_error', optimizer = 'adam', metrics = ['accuracy'])
model.fit(X_train, Y_train, epochs = 100, batch_size =1, verbose =2)

Epoch 1/100
98/98 - 0s - loss: 0.0399 - accuracy: 0.0102
Epoch 2/100
98/98 - 0s - loss: 0.0185 - accuracy: 0.0102
Epoch 3/100
98/98 - 0s - loss: 0.0144 - accuracy: 0.0102
Epoch 4/100
98/98 - 0s - loss: 0.0131 - accuracy: 0.0102
Epoch 5/100
98/98 - 0s - loss: 0.0121 - accuracy: 0.0102
Epoch 6/100
98/98 - 0s - loss: 0.0110 - accuracy: 0.0102
Epoch 7/100
98/98 - 0s - loss: 0.0100 - accuracy: 0.0102
Epoch 8/100
98/98 - 0s - loss: 0.0090 - accuracy: 0.0102
Epoch 9/100
98/98 - 0s - loss: 0.0079 - accuracy: 0.0102
Epoch 10/100
98/98 - 0s - loss: 0.0070 - accuracy: 0.0102
Epoch 11/100
98/98 - 0s - loss: 0.0060 - accuracy: 0.0102
Epoch 12/100
98/98 - 0s - loss: 0.0053 - accuracy: 0.0102
Epoch 13/100
98/98 - 0s - loss: 0.0046 - accuracy: 0.0102
Epoch 14/100
98/98 - 0s - loss: 0.0040 - accuracy: 0.0102
Epoch 15/100
98/98 - 0s - loss: 0.0036 - accuracy: 0.0102
Epoch 16/100
98/98 - 0s - loss: 0.0031 - accuracy: 0.0102
Epoch 17/100
98/98 - 0s - loss: 0.0028 - accuracy: 0.0102
Epoch 18/100
98/98 - 0s

<tensorflow.python.keras.callbacks.History at 0x1edd8c13f40>

In [33]:
train_pred = model.predict(X_train)
test_pred = model.predict(X_test)

# invert predictions
train_pred = scaler.inverse_transform(train_pred)
Y_train = scaler.inverse_transform([Y_train])
test_pred = scaler.inverse_transform(test_pred)
Y_test = scaler.inverse_transform([Y_test])

In [34]:
train_Score = np.sqrt(mean_squared_error(Y_train[0], train_pred[:,0]))
print('Train Score: %.2f RMSE' % (train_Score))
test_Score = np.sqrt(mean_squared_error(Y_test[0], test_pred[:,0]))
print('Test Score: %.2f RMSE' % (test_Score))

Train Score: 23.41 RMSE
Test Score: 48.69 RMSE


### Build another model where optimizer = ‘sgd’ and epoch = ‘50

In [38]:
timestamp =1
X_train, Y_train = create_dataset(train, timestamp=1)
X_test, Y_test = create_dataset(test, timestamp=1)
print(X_train.shape)
print(Y_train.shape)
print(X_test.shape)
print(Y_test.shape)

(98, 1)
(98,)
(42, 1)
(42,)


In [39]:
X_train = np.reshape(X_train, (X_train.shape[0], 1, X_train.shape[1]))
X_test = np.reshape(X_test, (X_test.shape[0], 1, X_test.shape[1]))
print(X_train.shape)
print(Y_train.shape)
print(X_test.shape)
print(Y_test.shape)

(98, 1, 1)
(98,)
(42, 1, 1)
(42,)


In [40]:
# Another model where optimizer = ‘sgd’ and epoch = ‘50

model1 = Sequential()
model1.add(LSTM(10, input_shape = (1, timestamp)))
model1.add(Dense(1))
model1.compile(loss = 'mean_squared_error', optimizer = 'SGD', metrics = ['accuracy'])
model1.fit(X_train, Y_train, epochs = 50, batch_size =1, verbose =2)

Epoch 1/50
98/98 - 0s - loss: 0.0288 - accuracy: 0.0102
Epoch 2/50
98/98 - 0s - loss: 0.0188 - accuracy: 0.0102
Epoch 3/50
98/98 - 0s - loss: 0.0184 - accuracy: 0.0102
Epoch 4/50
98/98 - 0s - loss: 0.0182 - accuracy: 0.0102
Epoch 5/50
98/98 - 0s - loss: 0.0178 - accuracy: 0.0102
Epoch 6/50
98/98 - 0s - loss: 0.0177 - accuracy: 0.0102
Epoch 7/50
98/98 - 0s - loss: 0.0174 - accuracy: 0.0102
Epoch 8/50
98/98 - 0s - loss: 0.0172 - accuracy: 0.0102
Epoch 9/50
98/98 - 0s - loss: 0.0169 - accuracy: 0.0102
Epoch 10/50
98/98 - 0s - loss: 0.0168 - accuracy: 0.0102
Epoch 11/50
98/98 - 0s - loss: 0.0164 - accuracy: 0.0102
Epoch 12/50
98/98 - 0s - loss: 0.0162 - accuracy: 0.0102
Epoch 13/50
98/98 - 0s - loss: 0.0160 - accuracy: 0.0102
Epoch 14/50
98/98 - 0s - loss: 0.0158 - accuracy: 0.0102
Epoch 15/50
98/98 - 0s - loss: 0.0155 - accuracy: 0.0102
Epoch 16/50
98/98 - 0s - loss: 0.0153 - accuracy: 0.0102
Epoch 17/50
98/98 - 0s - loss: 0.0152 - accuracy: 0.0102
Epoch 18/50
98/98 - 0s - loss: 0.0149 - 

<tensorflow.python.keras.callbacks.History at 0x1eddd5e4730>

In [41]:
train_pred = model.predict(X_train)
test_pred = model.predict(X_test)

# invert predictions
train_pred = scaler.inverse_transform(train_pred)
Y_train = scaler.inverse_transform([Y_train])
test_pred = scaler.inverse_transform(test_pred)
Y_test = scaler.inverse_transform([Y_test])

In [42]:
train_Score = np.sqrt(mean_squared_error(Y_train[0], train_pred[:,0]))
print('Train Score: %.2f RMSE' % (train_Score))
test_Score = np.sqrt(mean_squared_error(Y_test[0], test_pred[:,0]))
print('Test Score: %.2f RMSE' % (test_Score))

Train Score: 23.41 RMSE
Test Score: 48.69 RMSE
