##### All the code was written by Thales V. Gomes and it is available at: https://github.com/ThalesVGomes

# Quantitative Finance - Sequence Signal Market Forecast

#### The object of this study is to try to find some signals of operation based on the sequences of movements of financial assets negotiated on stock markets.

#### We will analyse the historical prices of a stock, count the distribution of sequences formed in the past and its probabilities to happen. Then we will try to generate good signs to buy or sell the stock based on the current sequence of the stock.

##### For example, if our model learned that a particular stock have only 1% chance of making a 6 streak movement (up or down) and our threshold is bigger than 1%, we will find a period when the stock have a streak of 5 and assume the oposite position. If it has a 5 streak of up movements, we will go short (sell) and if it has 5 streak of down we will go long (buy)

### If the market sequences are pure random, they need to follow a binomial distribution (50% chance of going up and 50% of going down) just like tossing a coin.

#### Lets start by importing the external libraries that we will need to use to develop our study

In [172]:
import pandas as pd
import numpy as np
from pandas_datareader import DataReader
from pandas_datareader._utils import RemoteDataError
from datetime import date

# Defining our main funcions

### Sometimes we will cross some missing number distributions. That means we could have in our past data a sequence that last 8 days but not a sequence of 7. With the function adjust_sequence we will solve this problem inputing the missing values of the sequence based on an adjusted formula of the sequence distribution with the binomial distribution

In [205]:
def adjust_sequence(sequence_distribution):
    """
    Receives dictionary containing a sequence distribution
    and adjust it to have all the numbers between 1 and
    the highest number adding to the missing number 
    the probability of the previous number.
    ----------------------------------------------------
    For example:
        A sequence distribution might jump some numbers
        having sequences of 5 and 7, but not sequences of 6.

        With the adjust, it creates a sequence of 6 with the same
        probability as the sequence of 5.
    
    """
    adjusted_sequence = sequence_distribution.copy()
    real_sequences = list(sequence_distribution)
    real_probabilities = list(sequence_distribution.values())

    highest_sequence = real_sequences[-1]

    ideal_sequences = list(range(1,highest_sequence+1))

    if real_sequences != ideal_sequences:
        for seq in list(range(1,highest_sequence+1)):
            if seq not in list(adjusted_sequence.keys()):
                prob = list(adjusted_sequence.values())[seq-2]
                
                # In a binomial distribution the next value is always equal to half of the previous one
                # Based on it, we will make the next value of the sequence distribution follows the binomial rule
                adjusted_sequence[seq] = prob / 2 
                adjusted_sequence = dict(sorted(adjusted_sequence.items()))

    adjusted_sequence = dict(sorted(adjusted_sequence.items()))
    
    return adjusted_sequence

### Function to create the sequence distribution of a given stock

In [206]:
def get_sequence_distribution(ticker, start, end=date.today()):
    
    """
    Given a ticker, a start date and an end date
    takes the adjusted closing data (avoids distortions) and counts
    how many high/low sequences the ticker had.
    
    Returns a dictionary with the sequence distribution of the ticker
    in the given time period and the current state
    of the sequence.
    ----------------------------------------------------------------
    Example of usage:
    
    seq_dist, curr_seq = get_sequence_distribution(ticker='GOLL4.SA', start='01-01-2020', end='01-01-2021')
    
    seq_dist ->  {1: 0.515, 2: 0.262, 3: 0.108, 4: 0.054, 5: 0.054, 6: 0.008}
                51.5% of the sequences are composed by one movement (1 up or 1 down) and so on.

    curr_seq -> 2
                   Means that in the end date (01-01-2021) the sequence of movements
                   in the same direction is equal to 2.
    
    """
    
    global data # For backtesting purposes
    
    data = None
    
    try:
        data = DataReader(ticker,'yahoo', start, end)['Adj Close'].to_frame()
    except RemoteDataError:
        print(f'No data found for: {ticker}')
        
    data['Returns'] = data['Adj Close'].pct_change()
    data.dropna(inplace=True)
    
    data['Direction'] = data['Returns'].apply(lambda x: 1 if x > 0 else -1) # 1 for up and -1 for down
    
    data = data.drop_duplicates(subset=['Adj Close', 'Returns', 'Direction']) 
    # Sometimes there are no tradings and some days keep repeating the same closing value
    
    n_rows = data['Direction'].shape[0] # number of rows in the dataframe
    
    sequences = np.array([], dtype='int8') # Where the sequences will be stored
    streak = 1
    
    for day in range(1, n_rows):
        if data['Direction'][day] == data['Direction'][day-1]: # If the current movement is equal to the last
            streak += 1
        else:
            sequences = np.append(sequences, streak)
            streak = 1    
    sequences = np.append(sequences, streak) # Append the last sequence
    
    current_sequence = sequences[-1]
    
    # Creates the sequence distribution
    unique, counts = np.unique(sequences, return_counts=True) 
    total_sequences = counts.sum()
    normalized_counts = np.round(counts / total_sequences, decimals=3)
    sequence_distribution = dict(zip(unique, normalized_counts))
    sequence_distribution = adjust_sequence(sequence_distribution)
    
    return sequence_distribution, current_sequence

### Probability of the movement to keep following the trend

In [207]:
def continue_seq_prob(sequence_distribution, current_sequence):
    """
    Based on the learned sequence distribution,
    gives the probability of the current sequence
    to continue its movement one day ahead.
    
    For example, if our current sequence is equal to 5,
    gives us the probability of a sequence equal to 6 happening
    based on the historical data of the given asset.
    """

    
    probabilities = np.fromiter(sequence_distribution.values(), dtype=np.float16)
    probabilities = probabilities[::-1]
    summed_probabilities = np.cumsum(probabilities)
    
    try:
        current_probability = summed_probabilities[-current_sequence]
    except IndexError:
        current_probability = 0 # If its the first time that such a long sequence happens
    except:
        pass
    
    return current_probability, summed_probabilities

In [235]:
def investment_sign(probability, threshold):
    if probability <= threshold:
        return round(probability, 4)
    return 0

### Runs the signals for a list of stocks and return the stocks that are currently (based on the end date) in a large sequence of movements in the same direction and have a probability smaller than the threshold to continue the movement.

In [236]:
def run_program(tickers, start, end=date.today(), threshold=0.05, verbose=True):
    """Runs the complete algorithm in a list of tickers
    and returns the tickers with a high chance of breaking
    the sequence streak"""
    
    results = {}
    for ticker in tickers:
        try:
            if verbose:
                print(f'Analysing {ticker}...')
            sequences_count, current_sequence = get_sequence_distribution(ticker, start)
            probability, _ = continue_seq_prob(sequences_count, current_sequence)
            sign = investment_sign(probability, threshold=threshold)
            results[ticker] = sign
        except Exception as error:
            if verbose:
                print(f'Error in: {ticker}. Error code: {error}')
            pass
        
    positive_results = {ticker: probability for ticker, probability in results.items() if probability > 0}
    positive_results = dict(sorted(positive_results.items(), key=lambda item: item[1]))

    return positive_results

## List of brazillian stocks ticker (based on the name at Yahoo Finance - https://finance.yahoo.com/)

In [222]:
tickers_small = ['AALR3.SA','AERI3.SA','AGRO3.SA','ALSO3.SA','ALUP11.SA','AMAR3.SA',
 'AMBP3.SA','ANIM3.SA','ARZZ3.SA','AZUL4.SA','BEEF3.SA','BKBR3.SA','BMGB4.SA',
 'BRPR3.SA','BRSR6.SA','CAML3.SA','CEAB3.SA','CESP6.SA','CIEL3.SA','CSMG3.SA',
 'CYRE3.SA','DIRR3.SA','DTEX3.SA','ECOR3.SA','ENBR3.SA','EVEN3.SA','FESA4.SA',
 'GOAU4.SA','GOLL4.SA','GUAR3.SA','HBOR3.SA','HBSA3.SA','HGTX3.SA','IGTA3.SA',
 'JPSA3.SA','LCAM3.SA','LEVE3.SA','LINX3.SA','LJQQ3.SA','LOGG3.SA','LOGN3.SA',
 'MEAL3.SA','MILS3.SA','MOVI3.SA','MTRE3.SA','MULT3.SA','MYPK3.SA','ODPV3.SA',
 'PETZ3.SA','PNVL3.SA','POMO4.SA','POSI3.SA','PTBL3.SA','QUAL3.SA','RAPT4.SA',
 'RRRP3.SA','SAPR11.SA','SAPR4.SA','SBFG3.SA','SEER3.SA','SEQL3.SA','SIMH3.SA',
 'SMLS3.SA','SOMA3.SA','SQIA3.SA','TASA4.SA','TEND3.SA','TGMA3.SA','TRIS3.SA',
 'TUPY3.SA','UNIP6.SA','VIVA3.SA','VLID3.SA','VULC3.SA','WIZS3.SA']

In [223]:
tickers_ibov = ['ABEV3.SA','AZUL4.SA','B3SA3.SA','BBAS3.SA','BBDC3.SA','BBDC4.SA','BBSE3.SA',
 'BEEF3.SA','BPAC11.SA','BRAP4.SA','BRDT3.SA','BRFS3.SA','BRKM5.SA','BRML3.SA','BTOW3.SA',
 'CCRO3.SA','CIEL3.SA','CMIG4.SA','COGN3.SA','CPFE3.SA','CRFB3.SA','CSAN3.SA','CSNA3.SA',
 'CVCB3.SA','CYRE3.SA','ECOR3.SA','EGIE3.SA','ELET3.SA','ELET6.SA','EMBR3.SA','ENBR3.SA',
 'ENGI11.SA','EQTL3.SA','FLRY3.SA','GGBR4.SA','GNDI3.SA','GOAU4.SA','GOLL4.SA','HAPV3.SA',
 'HGTX3.SA','HYPE3.SA','IGTA3.SA','IRBR3.SA','ITSA4.SA','ITUB4.SA','JBSS3.SA','KLBN11.SA',
 'LAME4.SA','LREN3.SA','MGLU3.SA','MRFG3.SA','MRVE3.SA','MULT3.SA','NTCO3.SA','PCAR3.SA',
 'PETR3.SA','PETR4.SA','QUAL3.SA','RADL3.SA','RAIL3.SA','RENT3.SA','SANB11.SA','SBSP3.SA',
 'SULA11.SA','SUZB3.SA','TAEE11.SA','TIMP3.SA','TOTS3.SA','UGPA3.SA','USIM5.SA','VALE3.SA',
 'VIVT4.SA','VVAR3.SA','WEGE3.SA','YDUQ3.SA']

In [224]:
tickers = tickers_ibov + tickers_small
tickers = set(tickers) # Remove duplicates

### Test with a small sample of tickers to go faster

In [233]:
import random
random.seed(1)
random_tickers = random.sample(tickers, 10)

In [237]:
run_program(random_tickers, start='01-01-2019', end=date.today(), threshold=0.05, verbose=True)

Analysing LINX3.SA...
Analysing MULT3.SA...
Analysing EQTL3.SA...
Analysing GGBR4.SA...
Analysing MYPK3.SA...
Analysing TEND3.SA...
Analysing VIVA3.SA...
Analysing YDUQ3.SA...
Analysing TOTS3.SA...
Analysing HYPE3.SA...


{'HYPE3.SA': 0.025}

### Based on the result above, we should assume the oposite movement position on HYPE3.SA stock with a chance of 2.5% of going wrong.

# *Backtesting* - The most important part

## Now that our strategy is fully working we will try to simulate our returns if this same strategy was applied at the past

## The first and most import thing to do is to select a time period to serve as train data where our model will learn the sequence distributions and a test data where our model will try to operate based on the learned distributions

In [240]:
def backtest(tickers, start_train, end_train, start_test, end_test, threshold=0.05):
    
    total_returns = []
    
    for ticker in tickers:

        # Learns the sequences distribution with the train data

        sequence_distribution, current_sequence = get_sequence_distribution(ticker=ticker,
                                                                            start=start_train, end=end_train)
        
        _, summed_probabilities = continue_seq_prob(sequence_distribution, current_sequence)
        
        seq_threshold = (len(summed_probabilities) - np.where(summed_probabilities < threshold)[0])
        
        try:
            seq_break = seq_threshold[-1]
        except IndexError:
            print(f'There is no sequence within the threshold probability for {ticker}. Try to increase the threshold.')
            continue

        # Predicts the test data based on the learned distribuitions 
        get_sequence_distribution(ticker=ticker, start=start_test, end=end_test)
        data[f'Sign{seq_break}'] = data['Direction'].rolling(window=seq_break).sum()
        
        buy_op = data[data[f'Sign{seq_break}']== -seq_break] # When to go long (buy)
        sell_op = data[data[f'Sign{seq_break}']== seq_break] # When to go short (sell)
        buy_days = buy_op.index
        sell_days = sell_op.index

        # Sold Returns
        sell_returns = []
        for day in sell_days:
            d = np.where(data.index==day)[0]
            try:
                r = data['Returns'][d+1][0]
                r = -r # In a short position you earn when the market goes down
                sell_returns.append(r)
            except IndexError:
                pass

        # Bought Returns
        buy_returns = []
        for day in buy_days:
            d = np.where(data.index==day)[0]
            try:
                r = data['Returns'][d+1][0]
                buy_returns.append(r)
            except IndexError:
                pass

        print(f'{ticker} Return = {round((sum(sell_returns) + sum(buy_returns))*100, 4)}%')
        total_returns.append(sum(sell_returns) + sum(buy_returns))
    return f'Total Return = {round(sum(total_returns)*100, 2)}%'

In [242]:
random_tickers = random.sample(tickers, 20)

In [243]:
backtest(random_tickers, start_train='01-01-2015', end_train='01-01-2021',
         start_test='02-01-2021', end_test=date.today(), threshold=0.075)

SAPR4.SA Return = 2.7074%
SULA11.SA Return = 2.8101%
KLBN11.SA Return = 0.0737%
BTOW3.SA Return = -12.9383%
CSNA3.SA Return = -2.0666%
FESA4.SA Return = 6.7637%
EGIE3.SA Return = 0.5045%
BKBR3.SA Return = 4.1761%
SEQL3.SA Return = 0.2917%
There is no sequence within the threshold probability for RRRP3.SA. Try to increase the threshold.
WEGE3.SA Return = 4.3137%
ECOR3.SA Return = 9.1864%
TGMA3.SA Return = 0.4373%
YDUQ3.SA Return = 7.2604%
BBSE3.SA Return = 2.6989%
ITUB4.SA Return = 1.1974%
POSI3.SA Return = -9.637%
ANIM3.SA Return = -7.3629%
USIM5.SA Return = -14.2699%
MYPK3.SA Return = -0.8542%


'Total Return = -4.71%'

# Conclusion

### As we can see from the backtest result, there is no strong evidence that the sequence is a good predictor of the market movement. If we had found a strong positive value, we could infere that betting on the reverse of sequence was a good strategy. On the other hand, if we had a strong negative value we could infere that following the sequence trend was a good strategy. Since we have a value close to 0% of return in a portfolio, we can't assume the influence of any strategy based ONLY in the sequence.

## Although there is no evidence for the influence of the sequence alone, maybe combined with some other signals the model could have a better performance. As Jim Simons said: "if there was strong and simple signals in the market, everybody would trade based on them and therefore they would not exist anymore." 