## DATA690_7513_SP2023 Applied Artificial Intelligence For Practitioners.

### Under the instruction and guidance of Professor Len Mancini

#### Final Project
#### Saathyak Rao Kasuganti 
#### UMBC ID: FV86010
#### Email: s219@umbc.edu

### Project Title : Predictive Analysis Telegram Bot using Bidirectional LSTM 

### Hypothesis: A Telegram bot backed by a bidirectional LSTM can predict the stock price trend for any given stock symbol efficiently and conveniently.

### Required Modules and Applications for the 'JET-FINANCE' Bot:

#### 1. Telegram (python-telegram-bot) – To interact with the Telegram Bot API and run the bot.
#### 2. Pandas – Data analytics tools.
#### 3. NumPy – Numerical library with mathematical functions.
#### 4. yfinance – Library for downloading market data from Yahoo Finance.
#### 5. scikit-learn (sklearn) – Machine learning library; used here for the MinMaxScaler preprocessing step.
#### 6. TensorFlow – Used for the LSTM model and Dense layers.
#### 7. Matplotlib – For visualizing the stock data and generating the graphs.

In [1]:
import telegram
from telegram.ext import Updater, CommandHandler
import pandas as pd
import numpy as np
import yfinance as yf
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Bidirectional, LSTM, Dense
import matplotlib.pyplot as plt

### Why I chose a bidirectional LSTM as the algorithm:

#### 1. Capture long-term dependencies.

#### 2. Contextual understanding – processing the input both forward and backward.

#### 3. Improved prediction accuracy.

#### 4. Flexibility of predicting over multiple time horizons.
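
To make point 2 concrete, the sketch below shows what "bidirectional" means in terms of data flow: the same window is read once forward and once reversed, and the two final states are concatenated. For brevity it assumes a plain tanh recurrent cell in NumPy with random, untrained weights; it is not an LSTM and not the Keras implementation. The names `run_rnn` and `bidirectional_summary` are hypothetical helpers introduced only for this illustration.

```python
import numpy as np

def run_rnn(sequence, w_in, w_rec):
    """Run a plain tanh recurrent cell over a sequence and return the final state."""
    state = np.zeros(w_rec.shape[0])
    for x_t in sequence:
        state = np.tanh(w_in @ x_t + w_rec @ state)
    return state

def bidirectional_summary(sequence, units, seed=0):
    """Concatenate the final states of a forward pass and a backward pass."""
    rng = np.random.default_rng(seed)
    n_features = sequence.shape[1]
    # Each direction has its own weights (random here, learned in a real model).
    fwd_in = rng.normal(size=(units, n_features))
    fwd_rec = rng.normal(size=(units, units))
    bwd_in = rng.normal(size=(units, n_features))
    bwd_rec = rng.normal(size=(units, units))
    forward_state = run_rnn(sequence, fwd_in, fwd_rec)
    backward_state = run_rnn(sequence[::-1], bwd_in, bwd_rec)  # reversed in time
    return np.concatenate([forward_state, backward_state])

window = np.linspace(-1.0, 1.0, 20).reshape(20, 1)  # 20 time steps, 1 feature
features = bidirectional_summary(window, units=50)
print(features.shape)  # (100,)
```

With 50 units per direction, the concatenated summary has 100 values, which is what the following Dense layer would receive.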

### Why I chose Jan 2022 to May 2023 as the training period for the BiLSTM model:

#### 1. If the time period is too short, the model may not generalize well.
#### 2. Therefore, it should be at least a year long for ideal model training.
#### 3. If the time period is too long, the model’s training time increases drastically.
#### 4. With a time period of around 6-7 years, the bidirectional LSTM takes too long to train and generate the prediction results.


### Why Adam (Adaptive Moment Estimation) is suitable for stock data:

#### 1. Adaptive learning rate: Adaptability aids in handling the variability of stock data.

#### 2. Faster convergence (suitable when training models on limited resources).
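
The adaptive learning rate mentioned above can be illustrated with a minimal, one-dimensional sketch of the Adam update rule. This is not the TensorFlow implementation; the function name `adam_minimize`, the toy objective, and the hyperparameter values are assumptions chosen for the example.

```python
import math

def adam_minimize(grad_fn, x0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=200):
    """Minimize a one-dimensional function with the Adam update rule."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = beta1 * m + (1 - beta1) * g        # running mean of gradients
        v = beta2 * v + (1 - beta2) * g * g    # running mean of squared gradients
        m_hat = m / (1 - beta1 ** t)           # bias correction for the early steps
        v_hat = v / (1 - beta2 ** t)
        # The step is scaled by the gradient's recent magnitude,
        # which is the "adaptive learning rate" part of Adam.
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x

# Toy objective f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
x_min = adam_minimize(lambda x: 2 * (x - 3), x0=0.0)
print(round(x_min, 2))
```

The result should land close to the true minimum at x = 3, even though the raw gradient changes by an order of magnitude along the way.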

In [2]:
def generate_image(stock_name):
    ticker = stock_name
    train_start_date = '2022-01-01'
    train_end_date = '2023-05-21'
    test_start_date = '2023-05-22'
    test_end_date = '2023-06-22'

    train_data = yf.download(ticker, start=train_start_date, end=train_end_date)
    test_data = yf.download(ticker, start=test_start_date, end=test_end_date)

    train_data['Returns'] = train_data['Close'].pct_change() * 100
    returns = train_data['Returns'].dropna().values

    scaler = MinMaxScaler(feature_range=(-1, 1))
    returns = scaler.fit_transform(returns.reshape(-1, 1))

    sequence_length = 20
    X = []
    y = []
    for i in range(sequence_length, len(returns)):
        X.append(returns[i-sequence_length:i])
        y.append(returns[i])
    X = np.array(X)
    y = np.array(y)

    train_size = int(0.8 * len(X))
    X_train, X_test = X[:train_size], X[train_size:]
    y_train, y_test = y[:train_size], y[train_size:]

    model = Sequential()
    model.add(Bidirectional(LSTM(50), input_shape=(sequence_length, 1)))  # 50 LSTM units per direction; the two outputs are concatenated
    model.add(Dense(1))  # single output: the next scaled return
    model.compile(loss='mean_squared_error', optimizer='adam')  # using the Adam optimizer
    model.fit(X_train, y_train, epochs=15, batch_size=16, verbose=2)  # epochs set to 15 after multiple rounds of testing

    predictions = model.predict(X_test)

    predictions = scaler.inverse_transform(predictions)
    y_test = scaler.inverse_transform(y_test)

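    # Note: these predictions cover the held-out last 20% of the training window;
    # they are plotted below against business days starting at test_start_date.
    next_week_predictions = predictions.flatten()

    dates = pd.date_range(start=test_start_date, periods=len(next_week_predictions), freq='B')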

    plt.figure(figsize=(10, 6))
    plt.plot(dates, next_week_predictions, label='Predicted Returns')

    # Calculate the expected monthly returns for the test period
    monthly_returns = test_data['Close'].pct_change().resample('M').sum().values * 100
    monthly_dates = pd.date_range(start=test_start_date, end=test_end_date, freq='M')

    # Repeat the monthly returns for each trading day within the corresponding month
    trading_days_per_month = len(dates) // len(monthly_dates)
    daily_monthly_returns = np.repeat(monthly_returns, trading_days_per_month)[:len(dates)]

    # Handling missing data: pad with NaN so the reference line has one value per plotted date
    missing = len(dates) - len(daily_monthly_returns)
    if missing > 0:
        daily_monthly_returns = np.append(daily_monthly_returns, np.full(missing, np.nan))

    plt.plot(dates, daily_monthly_returns, label='Reference Line', marker='o')

    plt.xlabel('Time')
    plt.ylabel('Returns')
    plt.title('Predicted Returns for the period of May 22nd to August 22nd')
    plt.legend()

    plt.xticks(rotation=45)

    plt.savefig('prediction.png', dpi=300)
    plt.close()
    return 'prediction.png'

## Workflow of the generate_image function above:

### 1. Data Retrieval and Preprocessing:
    Download historical stock price data for the specified stock name within the defined train and test date ranges.
    Calculate the percentage returns of the closing prices and store them in the 'Returns' column.
    Scale the returns using the MinMaxScaler to normalize them within the range of -1 to 1.
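
The preprocessing arithmetic can be sketched without any market data. The example below computes percentage returns and rescales them linearly to the range -1 to 1, which is the same mapping MinMaxScaler applies; the closing prices are made up, and `pct_returns` and `scale_to_range` are hypothetical helper names.

```python
import numpy as np

def pct_returns(prices):
    """Percentage change between consecutive prices (same idea as pct_change() * 100)."""
    prices = np.asarray(prices, dtype=float)
    return (prices[1:] / prices[:-1] - 1.0) * 100.0

def scale_to_range(values, low=-1.0, high=1.0):
    """Linearly map values so that min -> low and max -> high; also return the inverse map."""
    v_min, v_max = values.min(), values.max()
    scaled = low + (values - v_min) * (high - low) / (v_max - v_min)

    def inverse(s):
        return v_min + (s - low) * (v_max - v_min) / (high - low)

    return scaled, inverse

closes = [100.0, 102.0, 99.96, 104.958]   # made-up closing prices
returns = pct_returns(closes)             # approximately [2.0, -2.0, 5.0]
scaled, inverse = scale_to_range(returns)
print(np.round(returns, 2), np.round(scaled, 3))
```

Applying `inverse` to the scaled values recovers the original returns, which is the role `scaler.inverse_transform` plays after prediction.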

### 2. Sequence Generation:
    Define a sequence length and create input sequences (X) and corresponding target values (y) for the LSTM model.
    Slide a window of the sequence length through the returns data and extract the sequences and target values.
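
A minimal sketch of the sliding-window step on a toy series, so the resulting shapes are easy to check by hand. The helper name `make_windows` is hypothetical; the loop mirrors the one in generate_image.

```python
import numpy as np

def make_windows(series, sequence_length):
    """Return (X, y): each X[k] holds sequence_length values, y[k] is the value that follows."""
    series = np.asarray(series, dtype=float).reshape(-1, 1)
    X, y = [], []
    for i in range(sequence_length, len(series)):
        X.append(series[i - sequence_length:i])
        y.append(series[i])
    return np.array(X), np.array(y)

toy = np.arange(6)                 # 0, 1, 2, 3, 4, 5
X, y = make_windows(toy, sequence_length=3)
print(X.shape, y.shape)            # (3, 3, 1) (3, 1)
print(X[0].ravel(), y[0])          # [0. 1. 2.] [3.]
```

A series of length n yields n - sequence_length samples, each shaped (sequence_length, 1), which matches the `input_shape` the LSTM layer expects.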

### 3. Data Splitting:
    Split the generated sequences and target values chronologically into training and testing sets,
    using the first 80% for training and the last 20% for testing.
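
Because the samples are ordered in time, the split is a simple cut rather than a shuffle. A small sketch of the index arithmetic follows; `chronological_split` is a hypothetical helper, and the sample count of 325 is an illustrative assumption, not a figure taken from the run.

```python
def chronological_split(n_samples, train_fraction=0.8):
    """Return index ranges for a time-ordered split: earliest samples train, latest test."""
    train_size = int(train_fraction * n_samples)
    return range(0, train_size), range(train_size, n_samples)

train_idx, test_idx = chronological_split(325)  # 325 is an illustrative sample count
print(len(train_idx), len(test_idx))            # 260 65
print(train_idx[-1] < test_idx[0])              # True: no test sample precedes a training sample
```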
    
### 4. Model Building and Training:
    Create a Sequential model and add a Bidirectional LSTM layer with 50 units per direction.
    Add a Dense layer for the output and compile the model with mean squared error loss and Adam optimizer.
    Train the model using the training data for 15 epochs and a batch size of 16.

### 5. Prediction and Post-processing:
    Use the trained model to make predictions on the testing data.
    Inverse transform the predicted and actual values using the scaler to obtain the original scale of returns.
    Flatten the predictions into a one-dimensional array for plotting.

### 6. Visualization and Output:
    Generate a line graph to visualize the predicted returns for the next quarter.
    Save the graph as 'prediction.png' and return the filename as the output.
   

In [3]:
# Handler for the /start command
def start(update, context):
    context.bot.send_message(chat_id=update.effective_chat.id, text="Welcome to Saathyak Kasuganti's stock trend prediction bot.")
    context.bot.send_message(chat_id=update.effective_chat.id, text="Use any stock symbol to receive the stock price trend for the next quarter.")
    context.bot.send_message(chat_id=update.effective_chat.id, text="Enter /stock followed by any stock ticker or symbol. Example: /stock AAPL")


### The context.bot messages above welcome the user to the Telegram bot and explain how to enter a stock symbol for the model to predict.

In [4]:
# Handler for the /stock command
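def stock(update, context):
    # Guard against /stock being sent without a ticker
    if not context.args:
        context.bot.send_message(chat_id=update.effective_chat.id, text="Usage: /stock followed by a ticker, for example /stock AAPL")
        return

    stock_name = context.args[0]  # Extract the stock name from the command arguments
    image_path = generate_image(stock_name)  # Train the model and save the prediction graph

    # Debugging: Print the image path to the console
    print("Image Path:", image_path)

    try:
        # Open the file in a context manager so it is closed after sending
        with open(image_path, 'rb') as image_file:
            context.bot.send_photo(chat_id=update.effective_chat.id, photo=image_file)  # Send the image as a reply

        # Debugging: Print a success message to the console
        print("Image sent successfully!")
    except Exception as e:
        # Debugging: Print an error message and the exception to the console
        print("Error sending image:", e)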
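
The handler takes the ticker from the first command argument. Since every request triggers a download and a training run, it may be worth validating that argument first. The sketch below is one possible approach: `parse_ticker` is a hypothetical helper, and the regular expression is an assumed shape for a ticker symbol, not an official rule.

```python
import re

# Assumed shape of a ticker: 1-12 characters from letters, digits, and . - ^ =
TICKER_PATTERN = re.compile(r"^[A-Z0-9.\-^=]{1,12}$")

def parse_ticker(args):
    """Return an upper-cased ticker from command arguments, or None if missing or invalid."""
    if not args:
        return None
    candidate = args[0].strip().upper()
    return candidate if TICKER_PATTERN.match(candidate) else None

print(parse_ticker(["aapl"]))           # AAPL
print(parse_ticker([]))                 # None
print(parse_ticker(["not a ticker!"]))  # None
```

A handler could call this helper on `context.args` and reply with a usage message whenever it returns None.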


### The next step is the main function, which sets up the bot with its token and registers the two handlers above.

In [5]:
def main():
    # Setting up the Telegram bot
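    import os  # local import keeps this cell self-contained
    # The token generated by BotFather is a secret, so it is read from an environment
    # variable (the name TELEGRAM_BOT_TOKEN is an assumption) instead of being hard-coded.
    bot_token = os.environ['TELEGRAM_BOT_TOKEN']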
    updater = Updater(token=bot_token, use_context=True)
    dispatcher = updater.dispatcher

    # Registering command handlers
    start_handler = CommandHandler('start', start)
    stock_handler = CommandHandler('stock', stock)
    dispatcher.add_handler(start_handler)
    dispatcher.add_handler(stock_handler)

    # Debugging Step: Printing a message when the bot starts
    print("Bot started!")

    # Starting the bot
    updater.start_polling()
    updater.idle()


### Finally - Calling the main function to start the bot

In [None]:
if __name__ == '__main__':
    main()


Bot started!
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
Epoch 1/15
17/17 - 5s - loss: 0.0960 - 5s/epoch - 320ms/step
Epoch 2/15
17/17 - 0s - loss: 0.0917 - 169ms/epoch - 10ms/step
Epoch 3/15
17/17 - 0s - loss: 0.0903 - 168ms/epoch - 10ms/step
Epoch 4/15
17/17 - 0s - loss: 0.0900 - 170ms/epoch - 10ms/step
Epoch 5/15
17/17 - 0s - loss: 0.0908 - 168ms/epoch - 10ms/step
Epoch 6/15
17/17 - 0s - loss: 0.0916 - 168ms/epoch - 10ms/step
Epoch 7/15
17/17 - 0s - loss: 0.0894 - 178ms/epoch - 10ms/step
Epoch 8/15
17/17 - 0s - loss: 0.0893 - 176ms/epoch - 10ms/step
Epoch 9/15
17/17 - 0s - loss: 0.0900 - 170ms/epoch - 10ms/step
Epoch 10/15
17/17 - 0s - loss: 0.0892 - 175ms/epoch - 10ms/step
Epoch 11/15
17/17 - 0s - loss: 0.0887 - 175ms/epoch - 10ms/step
Epoch 12/15
17/17 - 0s - loss: 0.0899 - 174ms/epoch - 10ms/step
Epoch 13/15
17/17 - 0s - loss: 0.0887 - 179ms/epoch - 11ms/step
Epoch 14/15
17/17 - 0s - los

Image Path: prediction.png
Image sent successfully!


### Result Interpretation:
#### The results above show that each epoch runs 17 batches of training data.
#### The first epoch takes the longest (about 5 s) because it includes one-time initialization steps.
#### Each subsequent epoch takes only about 170-180 ms.
#### The initial loss (mean squared error on the scaled returns) is about 0.096.
#### By epoch 13, the last complete epoch shown in the log, the loss has decreased only slightly, to about 0.089.
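
The batch count in the log follows from the training-set size and the batch size. A small sketch of that arithmetic is given below; the helper names are hypothetical, and the sample count of 260 is an illustrative assumption (the log itself only shows 17 batches with a batch size of 16).

```python
import math

def batches_per_epoch(n_train, batch_size):
    """Number of gradient steps per epoch; the last batch may be partial."""
    return math.ceil(n_train / batch_size)

def train_size_bounds(n_batches, batch_size):
    """Smallest and largest training-set sizes that produce n_batches batches."""
    return (n_batches - 1) * batch_size + 1, n_batches * batch_size

print(batches_per_epoch(260, 16))   # 17
print(train_size_bounds(17, 16))    # (257, 272)
```

Read the other way around, 17 batches of 16 imply a training set of between 257 and 272 windows.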