In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from sklearn.preprocessing import MinMaxScaler

In [2]:
df = pd.read_csv("C:\\Users\\write\\Desktop\\LSTM Projects\\dataset\\Electricity+Consumption.csv")

In [3]:
df.dropna(inplace = True)

#### Train and Test Split

In [4]:
trained_data = df.iloc[:8712, 1:3].values
test_data = df.iloc[8712:, 1:3].values

In [5]:
trained_y = df.iloc[:8712, 3].values
test_y = df.iloc[8712:, 3].values

In [6]:
sc = MinMaxScaler(feature_range = (0, 1))

In [7]:
trained_data_scaled = sc.fit_transform(trained_data)
# Reuse the scaler fitted on the training data; refitting on the test set would leak test statistics
test_data_scaled = sc.transform(test_data)

In [8]:
trained_data_scaled.shape[1]

2
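A common convention is to learn the scaling parameters from the training split only and reuse them on the test split. The sketch below illustrates this with plain NumPy on made-up toy values; `minmax_fit` and `minmax_apply` are hypothetical helpers written for illustration, not scikit-learn functions.

```python
import numpy as np

def minmax_fit(train):
    """Return per-column (min, max) learned from the training split only."""
    return train.min(axis=0), train.max(axis=0)

def minmax_apply(data, col_min, col_max):
    """Scale columns to [0, 1] relative to the supplied training range."""
    return (data - col_min) / (col_max - col_min)

# Toy values, purely illustrative (not taken from the electricity dataset)
train = np.array([[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]])
test = np.array([[5.0, 40.0]])

col_min, col_max = minmax_fit(train)
train_scaled = minmax_apply(train, col_min, col_max)
test_scaled = minmax_apply(test, col_min, col_max)

print(train_scaled)
print(test_scaled)  # second column exceeds 1 because 40 lies outside the training range
```

Test values outside the training range map outside `[0, 1]`. That is expected: the test split is measured on the same scale the model saw during training.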

#### Create the Window Sized Dataset for Time Series Training

In [9]:
window_Size = 24

In [10]:
train_vars = []
target_vars = []

for i in range(window_Size, trained_data_scaled.shape[0]):
    train_vars.append(trained_data_scaled[i - window_Size:i, :])
    target_vars.append(trained_y[i])

In [11]:
X_train, y_train = np.array(train_vars), np.array(target_vars)

In [12]:
X_train_reshape = X_train.reshape((X_train.shape[0], X_train.shape[1], X_train.shape[2]))

In [13]:
X_train.shape

(8688, 24, 2)

In [14]:
X_train_reshape.shape

(8688, 24, 2)

In [15]:
# X_train[0]

In [16]:
y_train.shape

(8688,)
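To make the windowing shapes concrete, here is a small sketch on a toy array that follows the same loop structure as the cell above. `make_windows` is a hypothetical helper name, and the data are illustrative values only.

```python
import numpy as np

def make_windows(features, targets, window_size):
    """Pair each window of `window_size` consecutive rows with the target that follows it."""
    X, y = [], []
    for i in range(window_size, features.shape[0]):
        X.append(features[i - window_size:i, :])  # rows i-window_size .. i-1
        y.append(targets[i])                      # target at row i
    return np.array(X), np.array(y)

# Toy data: 6 timesteps, 2 features
features = np.arange(12, dtype=float).reshape(6, 2)
targets = np.arange(6, dtype=float)

X, y = make_windows(features, targets, window_size=3)
print(X.shape)  # (3, 3, 2)
print(y.shape)  # (3,)
```

The number of samples is always `rows - window_size`, which is why 8712 training rows with a window of 24 give 8688 samples.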

### Build the Multivariate LSTM Architecture

In [18]:
from keras.models import Sequential
from keras.layers import Dense, LSTM, Dropout, Reshape

In [19]:
multiLSTM = Sequential()

In [22]:
multiLSTM.add(LSTM(units = 60, return_sequences = True, input_shape = (window_Size, X_train_reshape.shape[2])))
multiLSTM.add(Dropout(0.1))
multiLSTM.add(LSTM(units = 80, return_sequences = True))
multiLSTM.add(Dropout(0.1))
multiLSTM.add(LSTM(units = 60, return_sequences = False))

## Interview Perspective: Sample for Understanding Shapes in LSTM

The following example builds a model with 5 LSTM layers and 2 Dense layers placed between the third and fourth LSTM layers, using a window size of 10 and a batch size of 32.

### Model Architecture Overview:
- **Input shape:** `(batch_size=32, timesteps=10, features=1)` → This represents a sliding window over time series data where each window has 10 timesteps and 1 feature.
- **LSTM Layers:** 3 LSTM layers before the Dense layers and 2 LSTM layers after.
- **Dense Layers:** 2 Dense layers inserted in the middle.
- **Final Dense Layer:** Output layer to predict 1 value.

### Example Code:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dropout, Dense, Reshape

# Define the model
model = Sequential()

# First LSTM Layer (input_shape=(timesteps=10, features=1))
model.add(LSTM(units=64, return_sequences=True, input_shape=(10, 1)))
# Shape: (batch_size, timesteps=10, features=64)
model.add(Dropout(0.2))

# Second LSTM Layer
model.add(LSTM(units=64, return_sequences=True))
# Shape: (batch_size, timesteps=10, features=64)
model.add(Dropout(0.2))

# Third LSTM Layer
model.add(LSTM(units=64, return_sequences=False))  # No more sequences
# Shape: (batch_size, features=64)
model.add(Dropout(0.2))

# First Dense Layer
model.add(Dense(units=32))  # Reduces dimensionality
# Shape: (batch_size, features=32)

# Second Dense Layer
model.add(Dense(units=16))  # Further reduces dimensionality
# Shape: (batch_size, features=16)

# Fourth LSTM Layer (returning sequences again)
model.add(Reshape((1, 16)))  # Reshape to (batch_size, 1 timestep, 16 features) for LSTM compatibility
model.add(LSTM(units=64, return_sequences=True))
# Shape: (batch_size, timesteps=1, features=64)
model.add(Dropout(0.2))

# Fifth LSTM Layer (final LSTM layer)
model.add(LSTM(units=64))
# Shape: (batch_size, features=64)
model.add(Dropout(0.2))

# Final Output Dense Layer
model.add(Dense(units=1))  # Predicting a single output value
# Shape: (batch_size, features=1)

# Compile the model
model.compile(optimizer='adam', loss='mean_squared_error')

# Print the model summary
model.summary()
```

### Explanation of Shapes at Each Layer:

1. **Input Shape: (batch_size=32, timesteps=10, features=1)**
   - **Explanation:** The model is processing data in batches of 32 samples, with each sample being a window of 10 timesteps and 1 feature per timestep.

2. **First LSTM Layer (64 units, return_sequences=True):**
   - **Output Shape:** `(batch_size=32, timesteps=10, features=64)`
   - **Explanation:** The LSTM layer processes each of the 10 timesteps and outputs a 64-dimensional vector for each timestep.

3. **Second LSTM Layer (64 units, return_sequences=True):**
   - **Output Shape:** `(batch_size=32, timesteps=10, features=64)`
   - **Explanation:** Similar to the first LSTM, this layer outputs a 64-dimensional vector for each timestep.

4. **Third LSTM Layer (64 units, return_sequences=False):**
   - **Output Shape:** `(batch_size=32, features=64)`
   - **Explanation:** Since `return_sequences=False`, this layer returns only the output at the last timestep: a single 64-dimensional vector per sample that summarizes the whole window. The time axis is dropped from the output.

5. **First Dense Layer (32 units):**
   - **Output Shape:** `(batch_size=32, features=32)`
   - **Explanation:** The `Dense` layer reduces the dimensionality from 64 features to 32 features.

6. **Second Dense Layer (16 units):**
   - **Output Shape:** `(batch_size=32, features=16)`
   - **Explanation:** Further reduces the dimensionality from 32 features to 16 features.

7. **Reshape Layer:**
   - **Output Shape:** `(batch_size=32, timesteps=1, features=16)`
   - **Explanation:** We reshape the output to prepare it for further LSTM layers. It converts the flat feature vector of 16 into a timestep (1 timestep with 16 features).

8. **Fourth LSTM Layer (64 units, return_sequences=True):**
   - **Output Shape:** `(batch_size=32, timesteps=1, features=64)`
   - **Explanation:** With only 1 timestep, the LSTM layer performs a single recurrent step and outputs one 64-dimensional vector for that timestep.

9. **Fifth LSTM Layer (64 units, return_sequences=False):**
   - **Output Shape:** `(batch_size=32, features=64)`
   - **Explanation:** Since `return_sequences=False`, the final output is a 64-dimensional vector per sample.

10. **Final Dense Layer (1 unit):**
    - **Output Shape:** `(batch_size=32, features=1)`
    - **Explanation:** This layer reduces the output to a single value per sample, suitable for regression tasks (like predicting a single value).
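The shape rules listed above can be checked by hand with a small tracer. This is a sketch in plain Python, not a Keras API: `trace_shapes` and its tuple-based layer description are hypothetical, and Dropout is omitted because it does not change shapes.

```python
def trace_shapes(input_shape, layers):
    """Apply simple shape rules for LSTM, Dense, and Reshape layers.

    `input_shape` excludes the batch dimension, e.g. (timesteps, features).
    `layers` is a list of (kind, argument, return_sequences) tuples.
    """
    shape = tuple(input_shape)
    shapes = []
    for kind, arg, return_sequences in layers:
        if kind == "lstm":
            if len(shape) != 2:
                raise ValueError(f"LSTM needs (timesteps, features), got {shape}")
            # Keep the time axis only when return_sequences is True
            shape = (shape[0], arg) if return_sequences else (arg,)
        elif kind == "dense":
            shape = shape[:-1] + (arg,)  # Dense acts on the last axis
        elif kind == "reshape":
            shape = tuple(arg)
        else:
            raise ValueError(f"unknown layer kind: {kind}")
        shapes.append(shape)
    return shapes

layers = [
    ("lstm", 64, True),
    ("lstm", 64, True),
    ("lstm", 64, False),
    ("dense", 32, None),
    ("dense", 16, None),
    ("reshape", (1, 16), None),
    ("lstm", 64, True),
    ("lstm", 64, False),
    ("dense", 1, None),
]

shapes = trace_shapes((10, 1), layers)
for shape in shapes:
    print(("batch",) + shape)
```

Removing the `reshape` entry makes the tracer raise a `ValueError` at the fourth LSTM, which mirrors why the model needs a `Reshape` layer there.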

### Key Points:
- Setting `return_sequences=True` makes an LSTM layer output one vector per timestep, i.e. a 3D tensor. The next LSTM layer requires this 3D input, which is why the setting is needed when stacking LSTM layers.
- The two `Dense` layers in the middle reduce the feature dimensionality.
- After the `Dense` layers, we reshape the output to make it compatible with further LSTM layers.
- The final `Dense` layer predicts a single value.

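This example shows how LSTM layers can be combined with Dense layers in between. The time axis is collapsed after the third LSTM layer, so the `Reshape` restores a 3D input with only one timestep; the fourth and fifth LSTM layers no longer see the original 10-step sequence.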

The `return_sequences` argument in an LSTM layer determines the format of the output from the LSTM. It controls whether the LSTM layer should return the entire sequence of outputs (one for each timestep) or just the output from the last timestep. Let's break this down:

### **1. `return_sequences=False` (default behavior):**
- **Output:** The LSTM will return only the output of the last timestep.
- **Shape:** `(batch_size, units)` where `units` is the number of LSTM units (neurons).

**When to use:**
- You typically use `return_sequences=False` when the next layer only needs the final state of the sequence, such as when making a prediction at the end of the sequence (e.g., classification of a sequence, or predicting the next value based on all previous timesteps).

**Example Shape:**
- Input shape: `(batch_size, timesteps, features)` (e.g., `(32, 10, 1)` for 10 timesteps with 1 feature each).
- LSTM output shape: `(batch_size, units)` (e.g., `(32, 60)` if there are 60 LSTM units).

### **2. `return_sequences=True`:**
- **Output:** The LSTM will return an output for every timestep in the input sequence, not just the final one.
- **Shape:** `(batch_size, timesteps, units)` where `units` is the number of LSTM units.

**When to use:**
- You use `return_sequences=True` when you are stacking LSTM layers, and the next LSTM layer needs the full sequence of outputs, or when you're interested in predicting a sequence of values (e.g., sequence-to-sequence models).

**Example Shape:**
- Input shape: `(batch_size, timesteps, features)` (e.g., `(32, 10, 1)` for 10 timesteps with 1 feature each).
- LSTM output shape: `(batch_size, timesteps, units)` (e.g., `(32, 10, 60)` if there are 60 LSTM units).

### **Worked Example:**
Let's say you have an input sequence of 10 timesteps, with 1 feature each (input shape `(batch_size, 10, 1)`), and you use an LSTM with 60 units.

- **`return_sequences=False`:** Only the output from the last (10th) timestep is returned. The shape would be `(batch_size, 60)` because you're getting the output from only one timestep.
  
- **`return_sequences=True`:** The LSTM returns outputs for all 10 timesteps, so the shape is `(batch_size, 10, 60)`, with 60 units for each of the 10 timesteps.
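The two cases can be reproduced with a minimal NumPy forward pass. This is a sketch of the standard LSTM recurrence, not the Keras implementation: the weights are random, the gate ordering is an assumption made for this example, and `lstm_forward` is a hypothetical function name.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_forward(x, W, U, b, return_sequences=False):
    """Run one LSTM layer over x of shape (batch, timesteps, features).

    W: (features, 4 * units), U: (units, 4 * units), b: (4 * units,).
    Gate order along the last axis here: input, forget, candidate, output.
    """
    batch, timesteps, _ = x.shape
    units = U.shape[0]
    h = np.zeros((batch, units))  # hidden state
    c = np.zeros((batch, units))  # cell state
    outputs = []
    for t in range(timesteps):
        z = x[:, t, :] @ W + h @ U + b
        i = sigmoid(z[:, 0 * units:1 * units])
        f = sigmoid(z[:, 1 * units:2 * units])
        g = np.tanh(z[:, 2 * units:3 * units])
        o = sigmoid(z[:, 3 * units:4 * units])
        c = f * c + i * g
        h = o * np.tanh(c)
        outputs.append(h)
    if return_sequences:
        return np.stack(outputs, axis=1)  # (batch, timesteps, units)
    return h  # (batch, units)

rng = np.random.default_rng(0)
batch, timesteps, features, units = 32, 10, 1, 60
x = rng.normal(size=(batch, timesteps, features))
W = rng.normal(scale=0.1, size=(features, 4 * units))
U = rng.normal(scale=0.1, size=(units, 4 * units))
b = np.zeros(4 * units)

seq = lstm_forward(x, W, U, b, return_sequences=True)
last = lstm_forward(x, W, U, b, return_sequences=False)
print(seq.shape)   # (32, 10, 60)
print(last.shape)  # (32, 60)
print(np.allclose(seq[:, -1, :], last))  # True
```

The final check shows that the `return_sequences=False` output is the last slice of the full sequence. It still depends on every earlier timestep through the recurrent state.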

### **Usage in Stacked LSTM Layers:**
- When stacking multiple LSTM layers, you need to set `return_sequences=True` for all LSTM layers except the last one so that each layer receives the full sequence of outputs from the previous one.
- For example, the first LSTM layer outputs a sequence, which the second LSTM layer can then process.

### Example:

```python
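from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM

model = Sequential()
# LSTM with return_sequences=True (returns output at each timestep)
model.add(LSTM(units=60, return_sequences=True, input_shape=(10, 1)))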

# LSTM with return_sequences=False (returns output at the final timestep)
model.add(LSTM(units=60, return_sequences=False))
```

### Summary:
- **`return_sequences=True`:** Returns the full sequence of outputs (for each timestep).
- **`return_sequences=False`:** Returns only the output at the final timestep.

