SLOT (Self-supervised Learning of Tweets for Capturing Multi-level Price Trends) aims to address two problems:
1. Tweet sparsity: the number of tweets is heavily biased towards the most popular stocks.
2. Tweet noise: tweets contain noisy information that is often irrelevant to the actual stock movement.

The first problem is addressed by having SLOT learn the stock and tweet embeddings in the same vector space through self-supervised learning. This lets the model use any tweet for any stock, even an unpopular one that is rarely mentioned.

To tackle the second problem, SLOT uses tweets to learn multi-level relationships between stocks, rather than using them as direct evidence for stock prediction (e.g. positive sentiment = up).

## Attention LSTM

In [None]:
import torch
from torch import nn

class ALSTM(nn.Module):
    def __init__(self, input_size, hidden_size):
        super().__init__()  # required so the layers and self.u are registered
        self.lstm = nn.LSTM(
            input_size=input_size,
            hidden_size=hidden_size,
            batch_first=True,
        )
        self.ln = nn.Linear(hidden_size, hidden_size)
        self.tanh = nn.Tanh()
        self.u = nn.Parameter(data=torch.randn(hidden_size))


    def forward(self, x):
        # x: (batch, seq_len, input_size)
        # output: (batch, seq_len, hidden_size)
        # h_n, c_n : (num_layers, batch, hidden_size)
        output, (h_n, c_n) = self.lstm(x)  # nn.LSTM returns (output, (h_n, c_n))
        output = self.tanh(self.ln(output))

        # query: u, key: output, values: output
        attn_scores = torch.matmul(output, self.u) # (batch, seq_len, hidden_size) @ (hidden_size) -> (batch, seq_len)
        # softmax keeps the weights positive and summing to 1 over seq_len
        weights = torch.softmax(attn_scores, dim=1) # (batch, seq_len)
        weights = weights.unsqueeze(dim=-1) # (batch, seq_len, 1)
        
        # (batch, seq_len, 1) * (batch, seq_len, hidden_size) -> (batch, seq_len, hidden_size)
        # (batch, seq_len, hidden_size) -> (batch, hidden_size)
        h_attn = (weights * output).sum(dim=1)

        # (batch, hidden_size) || (batch, hidden_size) -> (batch, 2*hidden_size)
        h_out = torch.cat((h_n[0], h_attn), dim=1) # both the general summary and the attention

        return h_out

In [None]:
class SLOT(nn.Module):
    def __init__(self, input_size, hidden_size, output_size=1):
        super().__init__()
        self.ln_1 = nn.Linear(3*input_size, 3*input_size)
        self.alstm = ALSTM(input_size=3*input_size, hidden_size=hidden_size)
        self.ln_f = nn.Linear(hidden_size*2, output_size)

    def forward(self, features, global_trend, local_trend):
        # features, global_trend, local_trend: (batch, seq_len, input_size)
        final_input = torch.cat((features, global_trend, local_trend), dim=-1) # (batch, seq_len, 3*input_size)
        final_input = self.ln_1(final_input) # (batch, seq_len, 3*input_size)
        h_out = self.alstm(final_input) # (batch, 2*hidden_size)
        y_pred = self.ln_f(h_out) # (batch, output_size)

        return y_pred


        

## Self-supervised Learning of Embeddings

The goal is to learn tweet (h_e) and stock (h_s) embeddings in the same semantic space, that is, to learn them jointly so that one embedding can be used to query for the other.
- A stock embedding and the embeddings of tweets relevant to it are close together (higher dot product).
- This solves the problem of tweet sparsity: the model can associate a stock with tweets that don't directly mention it, as long as they are close in vector space.
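
As a rough illustration of what "querying" means here, the sketch below ranks tweets for a stock by dot product. It is plain Python with hand-written toy vectors; the tickers, tweet texts, embedding values, and the helper `rank_tweets` are all made up for illustration and are not learned by SLOT.

```python
# Toy 3-d embeddings; every value below is invented for illustration.
stock_embeddings = {
    "AAPL": [0.9, 0.1, 0.0],
    "XOM": [0.0, 0.2, 0.9],
}
tweet_embeddings = {
    "new phone launch looks great": [0.8, 0.3, 0.1],
    "oil prices keep climbing": [0.1, 0.1, 0.95],
    "chip shortage hits handset makers": [0.7, 0.0, 0.2],
}


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def rank_tweets(stock, top_k=2):
    """Return the top_k tweet texts with the highest dot product to h_s."""
    h_s = stock_embeddings[stock]
    scored = sorted(
        tweet_embeddings.items(),
        key=lambda item: dot(h_s, item[1]),
        reverse=True,
    )
    return [text for text, _ in scored[:top_k]]


print(rank_tweets("AAPL"))
```

Neither of the tweets returned for `"AAPL"` mentions the ticker; they are retrieved only because their vectors lie close to the stock's vector.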

This was done by training for stock identification: predict the mentioned stock in a tweet when the stock symbol is masked.

First, every tweet is tokenized with SentencePiece. Next, the tokens that correspond to the stock the tweet mentions are replaced with the special token [MASK].
- "Thank you Apple for the new iPhone." -> "Thank you [MASK] for the new iPhone."
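
The masking step can be sketched as follows. This is a simplified stand-in: it splits on whitespace instead of running SentencePiece, and the alias set and the helper `mask_stock_mentions` are hypothetical names introduced only for this example.

```python
MASK = "[MASK]"


def mask_stock_mentions(tokens, stock_aliases):
    """Replace every token that names the stock with MASK (case-insensitive)."""
    aliases = {alias.lower() for alias in stock_aliases}
    return [
        MASK if token.lower().strip(".,!?") in aliases else token
        for token in tokens
    ]


tweet = "Thank you Apple for the new iPhone."
tokens = tweet.split()  # whitespace split, standing in for SentencePiece
masked = mask_stock_mentions(tokens, {"Apple", "AAPL", "$AAPL"})
print(" ".join(masked))  # Thank you [MASK] for the new iPhone.
```

With a real subword tokenizer a stock name may span several pieces, so the actual masking would need to cover all of them.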

### Stock Identification Model 

Use a BiLSTM, since the model needs to understand the context on both sides of the mask token.
- Using a transformer risks overfitting.

The stock embedding is a learnable parameter. 

The hidden state vector generated at the masked token is used as the tweet embedding because it
- captures the left context (via the forward LSTM) and the right context (via the backward LSTM) around the mask, which is exactly what stock identification needs, and
- corresponds to the part of the tweet most directly connected to the mentioned stock.
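
To make the identification objective concrete, here is a small plain-Python sketch that scores a tweet embedding against every stock embedding by dot product. The notes above do not state the training loss, so the softmax plus cross-entropy used here is an assumption, and the vectors and function names are invented for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def identify_stock(h_e, stock_embeddings):
    """Probability of each stock given tweet embedding h_e (dot-product scores)."""
    scores = [sum(a * b for a, b in zip(h_e, h_s)) for h_s in stock_embeddings]
    return softmax(scores)


def cross_entropy(probs, target):
    """Negative log-likelihood of the true stock index (assumed loss)."""
    return -math.log(probs[target])


# Three toy stock embeddings h_s and one tweet embedding h_e.
stock_embeddings = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
h_e = [2.0, 0.5]

probs = identify_stock(h_e, stock_embeddings)
loss = cross_entropy(probs, target=0)  # stock 0 is the masked stock here
```

Minimizing such a loss pulls h_e towards the embedding of the masked stock and away from the others, which is what places tweets and stocks in the same space.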





In [None]:
class StockIdentification(nn.Module):
    def __init__(self, vocab_size, num_stocks, embd_size, hidden_size):
        # vocab_size: size of the tokenizer vocabulary
        super().__init__()
        self.embd_size = embd_size

        self.token_embd = nn.Embedding(num_embeddings=vocab_size,
                                       embedding_dim=embd_size)
        

        self.h_s = nn.Embedding(num_embeddings=num_stocks, 
                                embedding_dim=embd_size)
        
        self.bi_lstm = nn.LSTM(
            input_size=embd_size, 
            hidden_size=hidden_size,
            bidirectional=True,
            batch_first=True
        )
        
    
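    def forward(self, x, mask_pos):
        # x: (B, T) token ids; tokenize before passing in
        # mask_pos: (B,) index of the [MASK] token in each tweet (assumed extra input)
        output, _ = self.bi_lstm(self.token_embd(x)) # (B, T, 2*hidden_size)
        batch_idx = torch.arange(x.size(0), device=x.device)
        h_e = output[batch_idx, mask_pos] # (B, 2*hidden_size), tweet embedding
        # dot product with every stock embedding; assumes 2*hidden_size == embd_size
        return h_e @ self.h_s.weight.T # (B, num_stocks)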



