Import necessary libraries

In [None]:
import requests
from bs4 import BeautifulSoup
import pandas as pd
from datetime import datetime
import numpy as np
import math
from statsmodels.tsa.ar_model import AutoReg
import matplotlib.pyplot as plt
from sklearn.metrics import mean_absolute_error, mean_squared_error
from IPython.display import clear_output
import time

The `DataFetcher` class handles retrieval of a web page. It is initialized with the target URL and sets a browser-like `User-Agent` header, which reduces the chance that the server rejects the request as automated traffic.

The `fetch` method sends a GET request to that URL with `requests.get`, passing the stored headers, and returns the response object for further processing.

Keeping the URL and headers in one place means the rest of the notebook only has to call `fetch()` to get a fresh copy of the page.

In [None]:
class DataFetcher:
    def __init__(self, url):
        self.url = url
        self.headers = {
            "User-Agent": 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.5005.63 Safari/537.36'
        }

    def fetch(self):
        # A timeout keeps a stalled connection from blocking the caller forever.
        r = requests.get(self.url, headers=self.headers, timeout=10)
        return r
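
To see what the custom header changes without touching the network, the sketch below builds (but does not send) a request with the standard library's `urllib.request`. That module is a stand-in chosen so the example has no third-party dependencies; it is not what `DataFetcher` uses. The shortened `User-Agent` string and the `example.com` URL are placeholders.

```python
from urllib.request import Request

# Placeholder values for illustration only.
url = "https://example.com/quote"
browser_agent = "Mozilla/5.0 (Windows NT 6.1; Win64; x64)"

# Building a Request does not open a connection; it only stores URL and headers.
plain = Request(url)
disguised = Request(url, headers={"User-Agent": browser_agent})

# urllib normalizes header names with str.capitalize(), hence "User-agent".
print(plain.has_header("User-agent"))
print(disguised.get_header("User-agent"))
```

With no explicit header, an HTTP client library announces itself when the request is sent (`requests` uses `python-requests/<version>`), which some servers treat differently from browser traffic.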

The `PriceParser` class extracts a price from HTML content.

It contains a static method `parse_price` that takes HTML content as input, parses it with BeautifulSoup, and reads the text of the first element whose class attribute matches a hard-coded value. The method is tailored to the layout of one particular quote page: it removes thousands separators (commas) from the text, converts the result to a float, and pairs it with the current local timestamp. The return value is a two-element list, `[timestamp, price]`. Because the lookup depends on the site's CSS class names, it stops working whenever the page markup changes.

In [None]:
class PriceParser:
    @staticmethod
    def parse_price(html_content):
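        soup = BeautifulSoup(html_content, "html.parser")
        # The class string is tied to the page's current markup and may change.
        matches = soup.find_all(class_=["Fw(b) Fz(36px) Mb(-4px) D(ib)"])
        if not matches:
            raise ValueError("Price element not found; the page markup may have changed")
        # Strip thousands separators before converting, e.g. "2,345.67" -> 2345.67.
        price = float(matches[0].text.replace(",", ""))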
        now = datetime.now().strftime('%Y-%m-%d %H:%M:%S')
        return [now, price]
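
The text-cleaning half of `parse_price` can be tried in isolation, without any HTML. The sketch below uses a hypothetical helper, `to_row`, and a made-up price string; it assumes US-style formatting (comma as thousands separator, dot as decimal point).

```python
from datetime import datetime

def to_row(price_text):
    """Turn display text such as '2,345.67' into [timestamp, price]."""
    # float() rejects thousands separators, so remove the commas first.
    price = float(price_text.replace(",", ""))
    now = datetime.now().strftime('%Y-%m-%d %H:%M:%S')
    return [now, price]

row = to_row("2,345.67")
print(row)
```

Storing the timestamp as a formatted string keeps the row easy to print; it can be converted back to a datetime later with `pd.to_datetime`.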

The `Metrics` class provides static methods for calculating key performance indicators commonly used in evaluating the accuracy of predictive models. These methods include:

- **`MAPE` (Mean Absolute Percentage Error)**: This method calculates the mean of the absolute errors expressed as a percentage of the actual values. Because it is scale-free, it is useful for comparing models across series of different magnitudes. It is undefined when any actual value is zero.

- **`ME` (Mean Error)**: This method computes the mean of `y_pred - y`, which measures the model's bias. A positive value means the model over-predicts on average; a negative value means it under-predicts.

- **`RMSE` (Root Mean Squared Error)**: This method takes the square root of the mean squared error (MSE), giving a typical error magnitude in the same units as the target variable.

`MAPE` and `ME` take the actual values (`y`) and the predicted values (`y_pred`) as input, whereas `RMSE` takes a precomputed MSE. All three return the result rounded to five decimal places.

In [2]:
class Metrics:
    @staticmethod
    def MAPE(y, y_pred):
        mape = np.mean(np.abs((y - y_pred) / y)) * 100
        return round(mape, 5)

    @staticmethod
    def ME(y, y_pred):
        me = np.mean(y_pred - y)
        return round(me, 5)

    @staticmethod
    def RMSE(MSE):
        rmse = math.sqrt(MSE)
        return round(rmse, 5)
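
As a sanity check on the formulas, here is a small worked example in plain Python that mirrors what the `Metrics` methods and the scikit-learn helpers compute. The three data points are made up for illustration.

```python
import math

y      = [100.0, 200.0, 400.0]   # actual values (made up)
y_pred = [110.0, 210.0, 400.0]   # predicted values (made up)
n = len(y)

errors = [p - a for a, p in zip(y, y_pred)]           # y_pred - y, as in ME
mape = sum(abs(e / a) for e, a in zip(errors, y)) / n * 100
me = sum(errors) / n
mae = sum(abs(e) for e in errors) / n
mse = sum(e * e for e in errors) / n
rmse = math.sqrt(mse)

print(round(mape, 5), round(me, 5), round(mae, 5), round(mse, 5), round(rmse, 5))
# prints: 5.0 6.66667 6.66667 66.66667 8.16497
```

ME and MAE coincide here only because no prediction falls below the actual value; the positive ME signals over-prediction on average.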

The `AutoRegressionModel` class wraps the full workflow of fitting and evaluating an autoregressive model for time series forecasting. It is initialized with a pandas DataFrame that must contain a `Time` column with timestamps and a `Value` column with the values to forecast.

The `split_fit_predict_plot_evaluate` method carries out several key steps:

1. **Preparation**: It converts the `Time` column to datetime and extracts the `Value` column as a NumPy array.
2. **Splitting**: It splits the series chronologically, using the first 80% of the observations for training and the remaining 20% for testing.
3. **Modeling**: It creates an `AutoReg` model with a single lag (`lags=1`) from the training data and fits it.
4. **Prediction**: It forecasts the test period out of sample, from index `train_size` to the end of the series, so forecasts beyond the first step build on earlier forecasts rather than on observed test values.
5. **Visualization**: It plots both the actual values and the predicted values over time to visually assess the model's performance.
6. **Evaluation**: It evaluates the model's predictions using several statistical metrics, including MAPE (Mean Absolute Percentage Error), ME (Mean Error), MAE (Mean Absolute Error), MSE (Mean Squared Error), and RMSE (Root Mean Squared Error), which are calculated using the actual values and the predicted values.

This class abstracts away the complexity of model training, prediction, and evaluation, providing a streamlined way to work with autoregressive models in time series analysis.

In [12]:
class AutoRegressionModel:
    def __init__(self, dataframe):
        self.dataframe = dataframe

    def split_fit_predict_plot_evaluate(self):
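        self.dataframe['Time'] = pd.to_datetime(self.dataframe['Time'])
        # Rows appended one at a time can leave the column with object dtype,
        # so cast explicitly before modeling.
        dataset = self.dataframe['Value'].astype(float).values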

        train_size = int(len(dataset) * 0.8)
        X_train, X_test = dataset[0:train_size], dataset[train_size:]

        model = AutoReg(X_train, lags=1)
        
        fit_model = model.fit()
        
        yhat = fit_model.predict(train_size, len(dataset)-1) 

        plt.plot(self.dataframe['Time'].values[train_size:], yhat, linewidth=2, label='y predicted')
        plt.plot(self.dataframe['Time'], self.dataframe['Value'], linewidth=2, label='original data')
        plt.xticks(rotation=90)
        plt.legend()

        MAPE_metric = Metrics.MAPE(X_test, yhat)
        ME_metric = Metrics.ME(X_test, yhat)
        MAE_metric = round(mean_absolute_error(X_test, yhat), 5)
        MSE = mean_squared_error(X_test, yhat)
        MSE_metric = round(MSE, 5)
        # Take the root of the unrounded MSE so rounding error is not compounded.
        RMSE_metric = Metrics.RMSE(MSE)

        metrics_text = f"""
        Model evaluation metrics:

        MAPE = {MAPE_metric}
        ME = {ME_metric}
        MAE = {MAE_metric}
        MSE = {MSE_metric}
        RMSE = {RMSE_metric}
        """

        print(metrics_text)
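
To make the split, fit, and forecast steps concrete, here is a hand-rolled sketch of a lag-1 autoregression with an intercept, fitted by ordinary least squares. It illustrates the idea only and is not the `statsmodels` implementation; the function names `fit_ar1` and `forecast_ar1` and the toy series are made up for this example.

```python
def fit_ar1(series):
    """Least-squares fit of y[t] = c + phi * y[t-1]; returns (c, phi)."""
    x = series[:-1]          # lagged values y[t-1]
    y = series[1:]           # current values y[t]
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    phi = cov_xy / var_x
    c = mean_y - phi * mean_x
    return c, phi

def forecast_ar1(c, phi, last_value, steps):
    """Multi-step forecast: each prediction is fed back in as the next lag."""
    forecasts = []
    value = last_value
    for _ in range(steps):
        value = c + phi * value
        forecasts.append(value)
    return forecasts

# Toy series generated exactly by y[t] = 2 + 0.5 * y[t-1], starting at 10.
series = [10.0]
for _ in range(9):
    series.append(2 + 0.5 * series[-1])

# Chronological 80/20 split, as in split_fit_predict_plot_evaluate.
train_size = int(len(series) * 0.8)
train, test = series[:train_size], series[train_size:]

c, phi = fit_ar1(train)
yhat = forecast_ar1(c, phi, train[-1], len(test))
print(c, phi)
print(yhat, test)
```

Because the toy series has no noise, the fit recovers the generating coefficients and the forecast matches the held-out points. On real prices, feeding forecasts back in lets errors accumulate over the horizon, and with `|phi| < 1` the forecast drifts toward the long-run level `c / (1 - phi)`.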


The `MainLoop` class repeatedly fetches a quote page from the given URL, parses the current price and a timestamp from it, and appends them as a new row to a DataFrame. Once more than 10 rows have been collected, each iteration fits the autoregression model on the first 80% of the data, forecasts the remaining 20%, prints the evaluation metrics, and plots the forecast against the actual values. The loop runs indefinitely and sleeps 60 seconds between iterations, so samples arrive slightly more than 60 seconds apart once fetch and fit time are included. If a fetch or parse fails, the error is printed and that iteration adds no row.

In [10]:
class MainLoop:
    def __init__(self, url):
        self.data_fetcher = DataFetcher(url)
        self.dataframe = pd.DataFrame(columns=['Time', 'Value'])

    def run(self):
        while True:
            try:
                r = self.data_fetcher.fetch()
                parsed_price = PriceParser.parse_price(r.text)
                self.dataframe.loc[len(self.dataframe)] = parsed_price
            except Exception as e:
                print('A problem has occurred: ', e)
            clear_output(wait=True)
            print(self.dataframe)
            if len(self.dataframe) > 10:
                model = AutoRegressionModel(self.dataframe)
                model.split_fit_predict_plot_evaluate()
                plt.pause(0.05)
            plt.show()
            time.sleep(60)
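
Because `run` never returns, it is awkward to try out. The sketch below shows the same collect-and-continue pattern with a stopping condition and injectable dependencies. `poll`, `read_price`, `max_iterations`, and `sleep` are hypothetical names introduced here, not part of the classes above, and the price source is a fake.

```python
import time
from datetime import datetime

def poll(read_price, max_iterations, interval_seconds=60, sleep=time.sleep):
    """Call read_price repeatedly and collect [timestamp, price] rows."""
    rows = []
    for _ in range(max_iterations):
        try:
            price = read_price()
            rows.append([datetime.now().strftime('%Y-%m-%d %H:%M:%S'), price])
        except Exception as e:
            # A failed read is reported and skipped, so one bad response
            # does not stop the collection.
            print('A problem has occurred: ', e)
        sleep(interval_seconds)
    return rows

# Fake price source: the second read fails, the others succeed.
readings = iter([101.5, ValueError("price element not found"), 102.0])

def fake_read_price():
    item = next(readings)
    if isinstance(item, Exception):
        raise item
    return item

# Pass a no-op sleep so the demo finishes immediately.
rows = poll(fake_read_price, max_iterations=3, sleep=lambda seconds: None)
print([price for _, price in rows])
# prints: [101.5, 102.0]
```

Passing the sleep function and the price source as arguments is one way to exercise the loop logic without waiting a minute per sample or hitting a live site.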

Example usage:

In [None]:
url = 'https://finance.yahoo.com/quote/GOOG?p=GOOG'
main_loop = MainLoop(url)
main_loop.run()