# Labeling: Excess Return Over Median



## Abstract

In this notebook, we demonstrate how to label financial returns data by their excess over the median. Using cross-sectional data on the returns of many different stocks, each observation is labelled according to whether, or by how much, its return exceeds the median return. These labels can then be used to look for relationships between features and the likelihood that a stock will outperform the market.

This technique is used in the following paper:
["The benefits of tree-based models for stock selection"](https://link.springer.com/article/10.1057/jam.2012.17) by _Zhu et al._ (2012). 

In that work, independent composite features are constructed as weighted averages of various parameters from fundamental and quantitative analysis, such as PE ratio, corporate cash flows, and debt. The composite features are then applied as parameters in a decision tree to predict whether a stock will outperform the market.


## How it works

A dataframe of forward total stock returns is calculated from close prices. The median return across all companies in the portfolio at time $t$ is used to represent the market return, and excess returns are calculated by subtracting this median from each stock's return over the time period $t$ \[Zhu et al. 2012\]. The numerical excess returns can then be used as is (in regression analysis), or relabelled by sign to indicate only whether the return is above or below the median (for use in classification analysis).
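
The calculation described above can be sketched in plain pandas. This is a minimal illustration, not the mlfinlab implementation: the function name `excess_over_median_sketch` and the toy prices are hypothetical, and it assumes one-period simple forward returns (the return from $t$ to $t+1$, indexed at $t$).

```python
import numpy as np
import pandas as pd


def excess_over_median_sketch(prices: pd.DataFrame, binary: bool = False) -> pd.DataFrame:
    """Hypothetical sketch: forward returns minus the cross-sectional median."""
    # Forward simple return: price at t+1 relative to price at t, indexed at t.
    # The last row has no following price, so it is NaN.
    forward_returns = prices.shift(-1) / prices - 1

    # Median across all stocks (columns) at each time step stands in for the market.
    market_median = forward_returns.median(axis=1)

    # Subtract each row's median from every stock's return in that row.
    excess = forward_returns.sub(market_median, axis=0)

    # Optionally keep only the sign: 1 above the median, -1 below, 0 if equal.
    return np.sign(excess) if binary else excess


# Toy prices for four made-up tickers (assumed values, for illustration only).
prices = pd.DataFrame(
    {
        "A": [100.0, 110.0, 121.0],
        "B": [100.0, 104.0, 104.0],
        "C": [100.0, 98.0, 98.0],
        "D": [100.0, 90.0, 99.0],
    },
    index=pd.date_range("2020-01-01", periods=3),
)

excess = excess_over_median_sketch(prices)
labels = excess_over_median_sketch(prices, binary=True)
print(excess)
print(labels)
```

With an even number of stocks, the median is the midpoint of the two middle returns, so a label of 0 only occurs when a stock's return equals that midpoint exactly.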

---
## Examples of use

In [6]:
import numpy as np
import pandas as pd
import yfinance as yf

from mlfinlab.labeling.excess_over_median import excess_over_median

import matplotlib.pyplot as plt

In [24]:
# Load price data for 22 stocks
tickers = "AAPL MSFT COST PFE SYY F GE BABA AMD CCL ZM FB WFC JPM NVDA CVX AAL UBER C UA VZ NOK"

data = yf.download(tickers, start="2019-05-20", end="2020-05-25",
                   group_by="ticker")
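# Keep only the adjusted close column for each ticker.
# Note: some yfinance versions may need auto_adjust=False in download() to return 'Adj Close'.
data = data.loc[:, (slice(None), 'Adj Close')]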
data

[*********************100%***********************]  22 of 22 completed


Date,C,PFE,FB,VZ,WFC,BABA,NVDA,JPM,AAPL,NOK,AAL,MSFT,F,CVX,COST,ZM,SYY,CCL,UBER,AMD,GE,UA
2019-05-20,62.928131,39.998817,182.720001,56.590591,43.170811,160.649994,151.224991,107.867950,180.930695,4.974850,30.638575,124.685707,9.798635,115.381088,246.727173,84.669998,72.997589,50.145004,41.590000,26.680000,9.841136,20.860001
2019-05-21,64.013100,40.075752,184.820007,57.060509,44.006683,163.429993,154.523544,108.236069,184.399307,4.994436,30.975046,125.357445,9.760509,115.858505,247.748169,85.440002,73.513718,50.413055,41.500000,27.350000,9.920822,21.160000
2019-05-22,62.637516,40.383518,185.320007,56.820759,43.788216,158.830002,151.673447,107.354523,180.624329,5.079745,29.609373,126.118073,9.503152,115.123291,247.331848,82.430000,73.435814,50.384338,41.250000,27.410000,9.861058,21.370001
2019-05-23,61.901287,40.316193,180.869995,56.456337,43.275299,156.000000,146.810318,105.242699,177.541138,4.921312,29.193733,124.646194,9.388772,112.545273,246.360397,78.760002,72.724915,49.891411,40.470001,26.360001,9.522396,20.850000
2019-05-24,62.375957,40.345047,181.059998,56.887890,43.854702,155.000000,144.647812,106.279236,176.859283,5.069843,28.867161,124.705452,9.369707,113.347321,245.141159,76.250000,73.085236,49.881748,41.509998,26.440001,9.412827,20.910000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2020-05-18,45.669998,38.070000,213.190002,55.720001,25.410000,215.279999,350.010010,90.449997,314.959991,3.650000,9.870000,184.396439,5.310000,92.550003,302.760010,164.690002,52.700001,14.720000,33.619999,54.590000,6.270000,7.300000
2020-05-19,44.430000,37.680000,216.880005,54.380001,23.950001,217.199997,352.220001,88.669998,313.140015,3.690000,9.640000,183.119995,5.300000,89.620003,304.630005,173.679993,51.380001,14.110000,33.400002,55.470001,6.210000,7.080000
2020-05-20,45.470001,37.630001,229.970001,54.259998,24.520000,216.789993,358.799988,91.330002,319.230011,3.820000,9.870000,185.660004,5.490000,93.000000,304.910004,175.479996,52.580002,14.150000,34.480000,56.389999,6.420000,7.470000
2020-05-21,45.000000,37.259998,231.389999,53.970001,24.459999,212.160004,351.010010,90.169998,316.850006,3.770000,9.890000,183.429993,5.630000,92.040001,301.970001,172.029999,52.480000,14.600000,34.259998,54.650002,6.480000,7.700000


In [25]:
# Numerical excess of each stock's forward return over the cross-sectional median
excess_over_median(data)

Date,C,PFE,FB,VZ,WFC,BABA,NVDA,JPM,AAPL,NOK,AAL,MSFT,F,CVX,COST,ZM,SYY,CCL,UBER,AMD,GE,UA
2019-05-20,0.009041,-0.006277,0.003292,0.000103,0.011161,0.009104,0.013612,-0.004788,0.010970,-0.004264,0.002781,-0.002813,-0.012091,-0.004063,-0.004062,0.000894,-0.001130,-0.002855,-0.010365,0.016912,-0.000103,0.006181
2019-05-21,-0.015995,0.013174,0.008200,0.001293,0.000530,-0.022652,-0.012950,-0.002650,-0.014978,0.022575,-0.038595,0.011562,-0.020873,-0.000852,0.003814,-0.029735,0.004435,0.004925,-0.000530,0.007688,-0.000530,0.015419
2019-05-22,0.005690,0.015777,-0.006569,0.011030,0.005730,-0.000374,-0.014619,-0.002228,0.000374,-0.013745,0.003406,0.005773,0.005408,-0.004950,0.013516,-0.027079,0.007763,0.007660,-0.001465,-0.020863,-0.016900,-0.006889
2019-05-23,0.006785,-0.000167,0.000167,0.006761,0.012506,-0.007293,-0.015613,0.008966,-0.004724,0.029298,-0.012069,-0.000408,-0.002914,0.006243,-0.005832,-0.032752,0.004071,-0.001077,0.024815,0.002152,-0.012390,0.001995
2019-05-24,-0.003444,0.004682,0.023824,-0.004072,-0.006688,0.004649,-0.006664,-0.004972,0.001740,-0.015610,-0.009210,0.005241,0.000788,0.002505,0.001871,0.025809,-0.000788,-0.001101,-0.007616,0.104588,-0.003649,0.008266
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2020-05-18,-0.018905,-0.001998,0.025554,-0.015803,-0.049212,0.017164,0.014560,-0.011434,0.002467,0.019205,-0.015057,0.001324,0.006363,-0.023413,0.014422,0.062833,-0.016802,-0.033194,0.001702,0.024366,-0.001324,-0.021891
2020-05-19,0.000026,-0.024708,0.036974,-0.025588,0.000418,-0.025269,-0.004700,0.006617,-0.003933,0.011849,0.000477,-0.009511,0.012467,0.014333,-0.022462,-0.013018,-0.000026,-0.020547,0.008954,-0.006796,0.010435,0.031703
2020-05-20,-0.001788,-0.001284,0.014724,0.003204,0.006102,-0.012808,-0.013162,-0.004152,0.001093,-0.004540,0.010575,-0.003462,0.034050,-0.001774,-0.001093,-0.011112,0.006647,0.040351,0.002168,-0.022308,0.017895,0.039339
2020-05-21,-0.017399,0.009042,0.017814,0.005010,-0.009255,-0.056128,0.031204,-0.005162,0.009040,0.023821,-0.016610,0.003037,0.006154,-0.016521,0.004125,-0.003037,-0.011309,-0.006988,0.019239,0.012116,-0.008201,-0.038957


In [26]:
# Binary labels: 1.0 if the return is above the median, -1.0 if below
excess_over_median(data, binary=True)


Date,C,PFE,FB,VZ,WFC,BABA,NVDA,JPM,AAPL,NOK,AAL,MSFT,F,CVX,COST,ZM,SYY,CCL,UBER,AMD,GE,UA
2019-05-20,1.0,-1.0,1.0,1.0,1.0,1.0,1.0,-1.0,1.0,-1.0,1.0,-1.0,-1.0,-1.0,-1.0,1.0,-1.0,-1.0,-1.0,1.0,-1.0,1.0
2019-05-21,-1.0,1.0,1.0,1.0,1.0,-1.0,-1.0,-1.0,-1.0,1.0,-1.0,1.0,-1.0,-1.0,1.0,-1.0,1.0,1.0,-1.0,1.0,-1.0,1.0
2019-05-22,1.0,1.0,-1.0,1.0,1.0,-1.0,-1.0,-1.0,1.0,-1.0,1.0,1.0,1.0,-1.0,1.0,-1.0,1.0,1.0,-1.0,-1.0,-1.0,-1.0
2019-05-23,1.0,-1.0,1.0,1.0,1.0,-1.0,-1.0,1.0,-1.0,1.0,-1.0,-1.0,-1.0,1.0,-1.0,-1.0,1.0,-1.0,1.0,1.0,-1.0,1.0
2019-05-24,-1.0,1.0,1.0,-1.0,-1.0,1.0,-1.0,-1.0,1.0,-1.0,-1.0,1.0,1.0,1.0,1.0,1.0,-1.0,-1.0,-1.0,1.0,-1.0,1.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2020-05-18,-1.0,-1.0,1.0,-1.0,-1.0,1.0,1.0,-1.0,1.0,1.0,-1.0,1.0,1.0,-1.0,1.0,1.0,-1.0,-1.0,1.0,1.0,-1.0,-1.0
2020-05-19,1.0,-1.0,1.0,-1.0,1.0,-1.0,-1.0,1.0,-1.0,1.0,1.0,-1.0,1.0,1.0,-1.0,-1.0,-1.0,-1.0,1.0,-1.0,1.0,1.0
2020-05-20,-1.0,-1.0,1.0,1.0,1.0,-1.0,-1.0,-1.0,1.0,-1.0,1.0,-1.0,1.0,-1.0,-1.0,-1.0,1.0,1.0,1.0,-1.0,1.0,1.0
2020-05-21,-1.0,1.0,1.0,1.0,-1.0,-1.0,1.0,-1.0,1.0,1.0,-1.0,1.0,1.0,-1.0,1.0,-1.0,-1.0,-1.0,1.0,1.0,-1.0,-1.0
