# Notes and Assumptions

# Model 1 - Heterogeneous agent model developed by Brock and Hommes (1998)

The sentiment analysis indicator ($SA$) at time $t$ can be written as:

$$Y_{SA,t} = f_{SA}(T_t, A_t, E_t)$$

where $T_t$ is the text data extracted from news articles and social media related to Bitcoin at time $t$, and $f_{SA}$ is a function that processes the text data to generate a sentiment score. The sentiment score may be based on techniques such as keyword analysis, natural language processing, or machine learning.
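The keyword-analysis option for $f_{SA}$ can be illustrated with a minimal sketch that uses only the text input $T_t$; the other arguments $A_t$ and $E_t$ are left out. The word lists `POSITIVE` and `NEGATIVE` and the helper names `keyword_sentiment` and `daily_sentiment` are hypothetical placeholders, not a validated sentiment lexicon. Each text is scored as (positive hits minus negative hits) divided by total hits. The score for one period is the mean over that period's texts.

```python
import re

# Hypothetical keyword lists (illustration only); a real lexicon would be larger and validated.
POSITIVE = {"soar", "soaring", "bullish", "rally", "gain", "gains", "surge", "high", "heights"}
NEGATIVE = {"crash", "bearish", "dump", "loss", "losses", "plunge", "fear", "ban", "hack"}


def keyword_sentiment(text):
    """Score in [-1, 1]: (positive hits - negative hits) / total hits, 0.0 if no hits."""
    words = re.findall(r"[a-z']+", text.lower())
    pos = sum(word in POSITIVE for word in words)
    neg = sum(word in NEGATIVE for word in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


def daily_sentiment(texts):
    """Hypothetical aggregation of Y_{SA,t}: mean keyword score of one period's texts."""
    if not texts:
        return 0.0
    return sum(keyword_sentiment(t) for t in texts) / len(texts)


print(keyword_sentiment("Bitcoin is soaring to new heights!"))  # → 1.0
```

Keyword scoring ignores negation and context ("not bullish" still counts as positive). That is one reason NLP or machine-learning scorers are usually preferred.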

> We have not used this indicator yet because not enough data is available for it.

The model is essentially a linear combination of Bitcoin features.

In [None]:
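import pandas as pd
# Read the tweet dataset in chunks of 100,000 rows to limit memory use (disabled for now).
# chunk = pd.read_csv("datasets/btc_tweets.csv", chunksize=100000, lineterminator='\n')
# btc_tweets = pd.concat(chunk)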

In [None]:
from textblob import TextBlob


def sentiment_analysis(text):
    """Return the TextBlob polarity of `text`, a float in [-1.0, 1.0]."""
    return TextBlob(text).sentiment.polarity

# text = 'Bitcoin is soaring to new heights!'
# btc['SA'] = btc['Text'].apply(sentiment_analysis)


> Bitcoin sentiment data is hard to find for dates before 2015.

> Some rows with NaN values are dropped: an indicator with a window of $k$ periods has no value until $k$ observations are available.

To do: add the saving step to the calculation part.

## Model 2 - WARMA-NN

- Strengths of ARIMA models:
  - ARIMA models are easy to understand and interpret.
  - ARIMA models can handle a wide range of time series. This includes non-stationary series with a trend, which they handle through differencing (the "I" in ARIMA), and seasonal series, which the seasonal extension SARIMA handles.
  - ARIMA models can provide reliable short-term forecasts.
  - ARIMA models can be used to model univariate time series data.

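- Weaknesses of ARIMA models:
  - ARIMA models assume that the series becomes stationary after differencing and that the error variance is constant, so they do not capture volatility clustering.
  - ARIMA models are limited to linear dependence of the series on its own past values and past errors.
  - ARIMA models need enough historical data to estimate their parameters reliably, especially in the seasonal case.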

- Opportunities for ARIMA models:
  - ARIMA models can be extended with external regressors (ARIMAX) to include factors that may influence the time series.
  - ARIMA models can be combined with other models, such as machine learning models, to improve forecast accuracy.

- Threats to ARIMA models:
  - ARIMA models may not predict long-term trends or major shifts (structural breaks) in the data.
  - ARIMA models may not capture complex, nonlinear relationships between variables.
  - Automated order selection, such as a grid search over $p$, $d$ and $q$, can be time-consuming on long series.

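- Strengths of VARMA models:
  - VARMA models can capture dynamic linear interdependencies between multiple variables.
  - VARMA models can be applied to non-stationary series after differencing, or through an error correction form when the series are cointegrated.
  - VARMA models can provide good short-term forecasts. For a stationary system, long-horizon forecasts revert to the unconditional mean.
  - VARMA models can be used to model multivariate time series data.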

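- Weaknesses of VARMA models:
  - VARMA models can be difficult to interpret, and their parameters are not uniquely identified without additional restrictions.
  - The number of parameters grows with the square of the number of variables, so VARMA models need a large amount of data to estimate reliably.
  - VARMA models can be computationally intensive to estimate, because the MA terms require iterative maximum likelihood rather than least squares.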

- Opportunities for VARMA models:
  - VARMA models can be used to test Granger causality between variables, which is predictive rather than structural causality.
  - VARMA models can be used to identify leading indicators for forecasting.
  - VARMA models can be combined with other models, such as machine learning models, to improve forecast accuracy.

- Threats to VARMA models:
  - VARMA models may not be effective for time series with irregular patterns or regime changes.
  - VARMA models may be sensitive to outliers in the data.
  - VARMA models may be difficult to implement and require specialized knowledge to estimate effectively.
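Two of the points above can be made concrete with a minimal NumPy sketch: differencing a non-stationary series, and modelling several series jointly. The sketch fits a VAR(1), the special case of VARMA without MA terms, by ordinary least squares on first differences. The series are simulated, and the helper names `fit_var1` and `forecast_var1` are hypothetical. This is not the model used in this notebook.

```python
import numpy as np


def fit_var1(y):
    """Fit y_t = c + A @ y_{t-1} + e_t by ordinary least squares.

    y has shape (T, m). Returns the intercept c (m,) and coefficient matrix A (m, m).
    """
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y) - 1), y[:-1]])  # regressors [1, y_{t-1}]
    B, *_ = np.linalg.lstsq(X, y[1:], rcond=None)        # B has shape (m + 1, m)
    return B[0], B[1:].T


def forecast_var1(c, A, last, steps):
    """Iterate the fitted VAR(1) forward from the last observation."""
    path = []
    for _ in range(steps):
        last = c + A @ last
        path.append(last)
    return np.array(path)


# Two simulated non-stationary series whose first differences follow a VAR(1).
rng = np.random.default_rng(0)
A_true = np.array([[0.5, 0.2], [0.0, 0.3]])
d = np.zeros((5000, 2))
for t in range(1, len(d)):
    d[t] = A_true @ d[t - 1] + rng.normal(scale=0.1, size=2)
levels = np.cumsum(d, axis=0)      # integrated (I(1)) levels, like prices
diffs = np.diff(levels, axis=0)    # differencing: the "I" step of ARIMA
c_hat, A_hat = fit_var1(diffs)
print(np.round(A_hat, 2))
```

Forecasts of the differences can be cumulated back onto the last observed level. Estimating the MA part of a full VARMA needs maximum likelihood, for example the `VARMAX` class in statsmodels; this sketch does not attempt that.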

## Model 3: ARIMA-NN + Heterogeneous Agent Model

First, ARIMA and NN models are widely used, well-established models for time series prediction. ARIMA models capture linear autocorrelation. This includes trends, through differencing, and seasonality, in their seasonal form. NN models can learn nonlinear relationships between the input variables and the output. By combining these models, we can leverage their complementary strengths and potentially improve the accuracy of our predictions.
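A common way to combine the two is to let the linear model explain the series and then fit a neural network to the linear model's residuals. The sketch below shows that idea under simplifying assumptions. The "ARIMA" part is reduced to a plain AR(p) fitted by least squares, with no differencing or MA terms. The "NN" is a one-hidden-layer network whose hidden weights are random and fixed, with only the output layer trained by ridge regression (an extreme-learning-machine-style shortcut chosen so that only NumPy is needed). The function names, the toy series and all parameter values are hypothetical.

```python
import numpy as np


def lag_matrix(y, p):
    """Row j holds [y_{t-1}, ..., y_{t-p}] for the target y_t with t = p + j."""
    n = len(y)
    return np.column_stack([y[p - i - 1 : n - i - 1] for i in range(p)])


def fit_hybrid(y, p=3, hidden=20, ridge=1e-3, seed=0):
    """Linear AR(p) part by least squares, then a small network on its residuals."""
    y = np.asarray(y, dtype=float)
    X, target = lag_matrix(y, p), y[p:]
    design = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(design, target, rcond=None)  # linear (AR) component
    resid = target - design @ beta
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(p, hidden))   # hidden-layer weights: random and kept fixed
    b = rng.normal(size=hidden)
    H = np.tanh(X @ W + b)             # hidden-layer activations
    # Train only the output layer: ridge regression of the residuals on H.
    w = np.linalg.solve(H.T @ H + ridge * np.eye(hidden), H.T @ resid)
    return {"beta": beta, "W": W, "b": b, "w": w,
            "ar_resid": resid, "hybrid_resid": resid - H @ w}


def predict_hybrid(model, lags):
    """One-step forecast from lags = [y_{t-1}, ..., y_{t-p}]: linear part + network part."""
    lags = np.asarray(lags, dtype=float)
    linear = model["beta"][0] + lags @ model["beta"][1:]
    nonlinear = np.tanh(lags @ model["W"] + model["b"]) @ model["w"]
    return float(linear + nonlinear)


# Toy series with a nonlinear term that a linear AR model cannot capture.
rng = np.random.default_rng(1)
y = np.zeros(500)
for t in range(2, 500):
    y[t] = 0.5 * y[t - 1] + 0.4 * np.sin(3 * y[t - 2]) + 0.1 * rng.normal()
model = fit_hybrid(y)
print(np.mean(model["ar_resid"] ** 2), np.mean(model["hybrid_resid"] ** 2))
print(predict_hybrid(model, y[-1:-4:-1]))
```

The in-sample residual error can only go down when the network is added, so the comparison printed here says nothing about out-of-sample gains. Those have to be checked on held-out data.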

Second, a heterogeneous agent model can help us account for the diversity of agents in the Bitcoin market, each with their own preferences and behaviors. In Brock and Hommes (1998), for example, these are fundamentalists and trend-following chartists. By modeling these agents and how they interact with each other, we can better understand the dynamics of the market and potentially make more accurate predictions.
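A stripped-down simulation of the Brock and Hommes (1998) setup makes the agent mechanism concrete. Prices are expressed as deviations $x_t$ from the fundamental value. Belief type $h$ forecasts $f_{h,t} = g_h x_{t-1} + b_h$, and the market clears via $R x_t = \sum_h n_{h,t} f_{h,t}$. The fractions $n_{h,t}$ follow a logit (discrete choice) rule on last period's realized profits minus a belief cost. The two types and the parameter values ($R = 1.1$, $g = 1.2$ for trend followers, cost $1$ for fundamentalists, intensity of choice $\beta = 4$) are illustrative assumptions, not a calibration to Bitcoin. Memory in the fitness measure and dividend noise are left out.

```python
import numpy as np


def simulate_brock_hommes(T=300, R=1.1, beta=4.0, g=(0.0, 1.2), b=(0.0, 0.0),
                          cost=(1.0, 0.0), a_sigma2=1.0, noise=0.0,
                          x0=(0.1, 0.1, 0.1), seed=0):
    """Simulate price deviations x_t with several belief types (illustrative sketch).

    Type h forecasts f_{h,t} = g_h * x_{t-1} + b_h. Returns x (T,) and fractions n (T, H).
    """
    g = np.asarray(g, dtype=float)
    b = np.asarray(b, dtype=float)
    cost = np.asarray(cost, dtype=float)
    rng = np.random.default_rng(seed)
    x = np.zeros(T)
    x[:3] = x0
    n = np.full((T, len(g)), 1.0 / len(g))
    for t in range(3, T):
        # Fitness: realized excess return at t-1 times each type's demand from t-2, minus cost
        excess = x[t - 1] - R * x[t - 2]
        fitness = excess * (g * x[t - 3] + b - R * x[t - 2]) / a_sigma2 - cost
        z = beta * fitness
        weights = np.exp(z - z.max())      # subtract the max for numerical stability
        n[t] = weights / weights.sum()     # logit (discrete choice) fractions
        forecasts = g * x[t - 1] + b
        # Market clearing: R x_t = sum_h n_{h,t} f_{h,t} (+ optional noise)
        x[t] = (n[t] @ forecasts + noise * rng.normal()) / R
    return x, n


x, n = simulate_brock_hommes(T=50)
print(np.round(x[-5:], 3), np.round(n[-1], 3))
```

With `beta=0` agents ignore past performance and split evenly between the types. Raising `beta` makes them switch faster towards the more profitable forecast, which is the mechanism Brock and Hommes link to endogenous price fluctuations.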

Third, by using indicators as inputs to our models, we can incorporate a wide range of data into our predictions, including both technical and fundamental factors that may affect the Bitcoin market. Because these predictions account for many different influences on Bitcoin prices, they can be more robust.

Finally, the ability to combine different models into a single framework allows us to capitalize on the strengths of each model while minimizing their weaknesses. This can help us produce more accurate and reliable predictions, which can be crucial for making informed decisions in the volatile and unpredictable world of cryptocurrency trading.

In conclusion, by combining ARIMA and NN models with a heterogeneous agent model, we can develop a framework for predicting Bitcoin prices that takes a wide range of factors into account and can potentially produce more accurate and reliable predictions. While there are no guarantees in cryptocurrency trading, this approach provides a solid foundation for making informed decisions and managing risk.

## Model N: Fundamental Analysis


## Model 2 - Time-series forecasting of economic and macroeconomic factors, technical indicators, and sentiment analysis

In this model, we forecast Bitcoin features and indicators using time-series techniques.

Here are the indicators we try to forecast:

- $H_t$ is the hash rate of the Bitcoin network at time $t$
- $TV_t$ is the transaction volume on the Bitcoin network at time $t$
- $MD_t$ is the mining difficulty of the Bitcoin network at time $t$
- $IR_t$ is the inflation rate at time $t$

- The moving average indicator ($MA$) 

For example, the formula for the moving average indicator ($MA$) with a window size of $k$ can be written as:

$$Y_{MA,t} = \frac{1}{k} \sum_{i=t-k+1}^{t} P_i$$

where $P_i$ is the price of Bitcoin at time $i$.
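The moving average formula can be sketched in plain Python with a running sum over a window of $k$ prices. The function name `moving_average` is hypothetical. The first $k-1$ outputs are `None` because the window is not yet full. These warm-up entries are the kind of missing values that must be dropped once indicators with window $k$ are added to the dataset.

```python
def moving_average(prices, k):
    """Simple moving average Y_{MA,t}; None until k prices are available."""
    if k < 1:
        raise ValueError("window size k must be at least 1")
    averages = []
    window_sum = 0.0
    for t, price in enumerate(prices):
        window_sum += price
        if t >= k:
            window_sum -= prices[t - k]  # drop the price that left the window
        averages.append(window_sum / k if t >= k - 1 else None)
    return averages


print(moving_average([1, 2, 3, 4, 5], k=3))  # → [None, None, 2.0, 3.0, 4.0]
```

In pandas, `prices.rolling(k).mean()` computes the same quantity and marks the warm-up rows with NaN.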

- The relative strength index ($RSI$)

Similarly, the formula for the relative strength index ($RSI$) with a window size of $k$ can be written as:

$$Y_{RSI,t} = 100 - \frac{100}{1 + RS}$$

where $RS$ is the relative strength at time $t$, which is calculated as:

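$$RS = \frac{\sum_{i=t-k+1}^{t} \max(P_i - P_{i-1}, 0)}{\sum_{i=t-k+1}^{t} \max(P_{i-1} - P_i, 0)}$$

That is, $RS$ is the total of the price gains divided by the total of the price losses over the last $k$ periods, so $Y_{RSI,t}$ ranges from 0 to 100.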
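A minimal RSI sketch in plain Python (the function name `rsi` is hypothetical). It computes $RS$ as the sum of gains divided by the sum of losses over the last $k$ price changes, the simple-sum variant. A window with no losses gives $RSI = 100$. A completely flat window is set to 50, which is a convention assumed by this sketch.

```python
def rsi(prices, k):
    """RSI with plain k-period sums of gains and losses; None until k changes exist."""
    values = [None] * len(prices)
    for t in range(k, len(prices)):
        changes = [prices[i] - prices[i - 1] for i in range(t - k + 1, t + 1)]
        gains = sum(c for c in changes if c > 0)
        losses = sum(-c for c in changes if c < 0)
        if losses == 0:
            # RS is infinite (RSI = 100); a flat window is set to 50 (assumed convention)
            values[t] = 100.0 if gains > 0 else 50.0
        else:
            values[t] = 100.0 - 100.0 / (1.0 + gains / losses)
    return values


print(rsi([1, 2, 3, 2, 3], k=2))  # → [None, None, 100.0, 50.0, 50.0]
```

Wilder's original RSI replaces the plain sums with smoothed averages, $\bar{G}_t = ((k-1)\bar{G}_{t-1} + G_t)/k$ and likewise for losses, so its values differ somewhat from this simple version.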

- The stochastic oscillator ($SO$)

The formula for the stochastic oscillator ($SO$) with a window size of $k$ can be written as:

$$Y_{SO,t} = \frac{P_t - \min_{k}(P)}{\max_{k}(P) - \min_{k}(P)} \times 100$$

where $\min_{k}(P)$ and $\max_{k}(P)$ are the minimum and maximum prices of Bitcoin over the past $k$ periods, respectively.
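A plain-Python sketch of the oscillator (the function name `stochastic_oscillator` is hypothetical). It assumes the window of $k$ periods includes the current price $P_t$ and that only one price per period is available. The classic %K uses each period's high and low instead. A flat window, where the maximum equals the minimum, is mapped to 50 by an assumed convention.

```python
def stochastic_oscillator(prices, k):
    """Oscillator on a single price series; None until k prices are available."""
    values = [None] * len(prices)
    for t in range(k - 1, len(prices)):
        window = prices[t - k + 1 : t + 1]          # the last k prices, including P_t
        low, high = min(window), max(window)
        # A flat window (high == low) is mapped to 50 by convention in this sketch
        values[t] = 100.0 * (prices[t] - low) / (high - low) if high > low else 50.0
    return values


print(stochastic_oscillator([1, 3, 2, 4], k=3))  # → [None, None, 50.0, 100.0]
```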

- The Google Trends indicator $f_{GT}(Q_t)$

Another indicator function $f_j$ is the Google Trends indicator:

$$Y_{GT,t} = f_{GT}(Q_t)$$

where $Q_t$ represents the search query related to Bitcoin at time $t$, and $f_{GT}$ is a function that processes the search data to generate a Google Trends score.

The Google Trends score is a relative measure of search interest in a particular query over time. Each data point is the query's search volume divided by the total search volume for that time and location. The resulting series is then scaled so that its highest point within the selected time range and location equals 100. A score of 100 therefore marks peak relative interest, and lower scores are proportional to that peak.
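The normalization described above can be sketched as follows. Google does not expose raw search counts, and its full procedure (sampling, rounding, privacy thresholds) is not public. The raw volumes below are therefore hypothetical, and `trends_scores` is an illustrative name, not a Google API.

```python
def trends_scores(query_volume, total_volume):
    """Scale a query's share of all searches to 0-100, with 100 at the peak share."""
    shares = [q / total for q, total in zip(query_volume, total_volume)]
    peak = max(shares)
    if peak == 0:
        return [0] * len(shares)
    return [round(100 * share / peak) for share in shares]


# Hypothetical raw counts: the query's share peaks in the second period.
print(trends_scores([10, 30, 5], [100, 200, 100]))  # → [67, 100, 33]
```

Because every value is relative to the peak of the requested window, scores downloaded for different date ranges are not directly comparable without rescaling on an overlapping period.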

Here is a step-by-step guide on how to build a model to forecast economic and macroeconomic indicators, technical indicators, and sentiment analysis for predicting future values:

1. **Gather data**: To build a model, we gather data on the economic and macroeconomic indicators, technical indicators, and sentiment analysis. We can obtain this data from public sources such as financial databases, news articles, and social media.

2. **Pre-process the data**: Once we have gathered the data, we pre-process it to make it ready for analysis. This involves cleaning the data, handling missing values, and transforming the data into a format suitable for analysis.

3. **Perform feature selection**: Feature selection involves identifying the most relevant features for the model. We use correlation analysis and mutual information to select the most important features. We use principal component analysis (PCA) to reduce the feature set to a few uncorrelated components.

4. **Build the model**: Future values can be forecast with various models, such as regression models, time-series models, and machine learning models. For example, we use a linear regression model to predict the future value of Bitcoin from the historical values of economic and macroeconomic indicators, technical indicators, and sentiment scores.

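5. **Evaluate the model**: After building the model, we evaluate its performance. We use time-series cross-validation, with walk-forward splits in which the model is never trained on data from after the period it is tested on, and residual analysis. Together these assess the model's accuracy and determine whether it meets our requirements.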

6. **Refine the model**: Based on the evaluation, we refine the model by adjusting the model parameters or selecting a different model altogether.

7. **Make predictions**: Once we are satisfied with the model's performance, we use it to make predictions about future values. However, it's important to note that the accuracy of the predictions may vary based on the quality of the data and the assumptions made in the model.
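The evaluation and refinement steps above can be sketched with a walk-forward (expanding-window) split. Each test block comes strictly after its training data, so no future information leaks into the fit. The function names and the naive last-value baseline are hypothetical illustrations. scikit-learn's `TimeSeriesSplit` provides a similar splitter.

```python
def walk_forward_splits(n_obs, initial_train, step=1):
    """Yield (train, test) index lists with an expanding training window.

    Every test block starts right after its training data, so the model is
    never fitted on observations that come after the ones it is scored on.
    """
    start = initial_train
    while start < n_obs:
        stop = min(start + step, n_obs)
        yield list(range(start)), list(range(start, stop))
        start = stop


def walk_forward_mae(series, initial_train, forecast_fn, step=1):
    """Mean absolute error of forecast_fn(history, horizon) over all test blocks."""
    errors = []
    for train, test in walk_forward_splits(len(series), initial_train, step):
        history = [series[i] for i in train]
        predictions = forecast_fn(history, len(test))
        errors.extend(abs(series[i] - p) for i, p in zip(test, predictions))
    if not errors:
        raise ValueError("initial_train leaves no observations to test on")
    return sum(errors) / len(errors)


def naive_forecast(history, horizon):
    """Baseline: repeat the last observed value (a random-walk forecast)."""
    return [history[-1]] * horizon


print(walk_forward_mae([1, 2, 3, 4, 5], initial_train=3, forecast_fn=naive_forecast))  # → 1.0
```

A candidate model is only worth keeping if it beats such a naive baseline under the same splits.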

### Linear regression model:

\begin{equation}
Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + ... + \beta_n X_n + \epsilon
\end{equation}

where $Y$ is the dependent variable (e.g., Bitcoin's future value), $X_1, X_2, \dots, X_n$ are the independent variables (e.g., economic indicators, technical indicators, sentiment analysis), $\beta_0, \beta_1, \dots, \beta_n$ are the coefficients, and $\epsilon$ is the error term.
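The coefficients of this regression can be estimated by ordinary least squares, as in the NumPy sketch below. It prepends a column of ones to carry $\beta_0$ and checks the fit on noise-free toy data. The function names and data are hypothetical. For forecasting, each row of `X` must contain only values known before the target period, for example indicators lagged by one step.

```python
import numpy as np


def fit_linear_regression(X, y):
    """Return [beta_0, beta_1, ..., beta_n] minimising the sum of squared errors."""
    X = np.asarray(X, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])  # column of ones carries beta_0
    beta, *_ = np.linalg.lstsq(design, np.asarray(y, dtype=float), rcond=None)
    return beta


def predict(beta, X):
    """Evaluate beta_0 + beta_1 X_1 + ... + beta_n X_n for each row of X."""
    return beta[0] + np.asarray(X, dtype=float) @ beta[1:]


# Noise-free toy data generated from Y = 1 + 2 X_1 - 3 X_2.
X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 3]]
y = [1, 3, -2, 0, -4]
beta = fit_linear_regression(X, y)
print(np.round(beta, 6))
```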

### PCA:

\begin{equation}
T = XW
\end{equation}

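where $T$ is the transformed data (the principal component scores) and $X$ is the original data matrix with each column centered to mean zero. $W$ is the weight (loading) matrix whose columns are the eigenvectors of the covariance matrix of $X$, ordered by decreasing eigenvalue.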
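A NumPy sketch of $T = XW$ computed through the singular value decomposition of the centered data. The right singular vectors are the eigenvectors of the covariance matrix, already sorted by explained variance. The function name `pca` and the toy data are hypothetical. Features in very different units (hash rate, prices, search scores) would normally be standardized first so that no single unit dominates the components.

```python
import numpy as np


def pca(X, n_components):
    """Return scores T = X_c W, loadings W, and explained-variance ratios."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # centre each feature at zero
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:n_components].T                      # columns: principal directions
    T = Xc @ W                                   # T = XW on the centred data
    explained = s**2 / np.sum(s**2)              # share of variance per component
    return T, W, explained[:n_components]


# Toy data whose second feature is exactly twice the first: one component suffices.
X = [[1, 2], [2, 4], [3, 6]]
T, W, ratio = pca(X, n_components=1)
print(ratio)
```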

### Mutual information:

\begin{equation}
I(X;Y) = \sum_{y \in Y} \sum_{x \in X} p(x,y) \log{\frac{p(x,y)}{p(x)p(y)}}
\end{equation}

where $I(X;Y)$ is the mutual information between $X$ and $Y$, $p(x,y)$ is the joint probability distribution, and $p(x)$ and $p(y)$ are the marginal probability distributions.
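The sum above applies directly to discrete variables. The sketch below plugs empirical frequencies into it, with the natural logarithm, so the result is in nats. The function name is hypothetical. Continuous indicators would first have to be discretized, for example into quantile bins. Alternatively, scikit-learn's `mutual_info_regression` estimates mutual information with a nearest-neighbor method instead of binning.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y), in nats, from paired discrete samples."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum(
        (c / n) * log((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in joint.items()
    )


# Identical binary series share log(2) nats; independent ones share none.
print(mutual_information([0, 0, 1, 1], [0, 0, 1, 1]))  # ≈ 0.693
print(mutual_information([0, 0, 1, 1], [0, 1, 0, 1]))
```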

