[![Binder](../img/badge-binder.svg)](https://mybinder.org/v2/gh/nhirschey/teaching/gh-pages?filepath=project/signal-exploration.ipynb)&emsp;
[![Script](../img/badge-script.svg)](/Teaching//project/signal-exploration.fsx)&emsp;
[![Notebook](../img/badge-notebook.svg)](/Teaching//project/signal-exploration.ipynb)

**Student Name: Omar Ben Ayed**

**Student Number: 38628**

**Signal Name (e.g., Book to Market): Piotroski F-score**

**Signal Code (e.g., be_me): f_score**

## Signal Background

The Piotroski F-Score was developed by Joseph Piotroski, a Stanford accounting professor. The score is an integer between 0 and 9. One point is awarded for each of the following nine conditions, grouped into three categories:

- Profitability:
    1. ROA (+1 if positive)
    2. Operating Cash Flow (+1 if positive)
    3. Δ ROA (+1 if positive)
    4. CFO / Total Assets > ROA (+1 if satisfied)
- Leverage, liquidity, and source of funds:
    1. Δ Leverage (+1 if negative)
    2. Δ Current Ratio (+1 if positive)
    3. Δ Shares (+1 if no new shares were issued)
- Operating Efficiency:
    1. Δ Gross Margin (+1 if positive)
    2. Δ Asset Turnover (+1 if positive)

A score of 0-2 is considered weak, a score of 3-7 is average, and a score of 8-9 is considered strong.
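The nine conditions can be written as a small scoring function over two consecutive firm-years. This is a minimal sketch, not Piotroski's exact variable definitions. The `FirmYear` record and its field names are hypothetical (they are not columns in our data). Every ratio is scaled by same-year values, whereas the paper scales by beginning-of-year total assets. "No new shares issued" is coded as "shares outstanding did not increase".

```fsharp
// Hypothetical firm-year record; field names are illustrative, not dataset columns.
type FirmYear =
    { NetIncome: float
      CashFromOps: float
      TotalAssets: float
      LongTermDebt: float
      CurrentAssets: float
      CurrentLiabilities: float
      SharesOutstanding: float
      GrossProfit: float
      Sales: float }

// Simplification: every ratio uses same-year values.
let roa (x: FirmYear) = x.NetIncome / x.TotalAssets
let cfoToAssets (x: FirmYear) = x.CashFromOps / x.TotalAssets
let leverage (x: FirmYear) = x.LongTermDebt / x.TotalAssets
let currentRatio (x: FirmYear) = x.CurrentAssets / x.CurrentLiabilities
let grossMargin (x: FirmYear) = x.GrossProfit / x.Sales
let assetTurnover (x: FirmYear) = x.Sales / x.TotalAssets

/// Counts how many of the nine binary conditions hold (0 to 9).
let fScore (prev: FirmYear) (curr: FirmYear) : int =
    [ roa curr > 0.0                                    // ROA positive
      curr.CashFromOps > 0.0                            // operating cash flow positive
      roa curr > roa prev                               // ROA improved
      cfoToAssets curr > roa curr                       // CFO / total assets > ROA
      leverage curr < leverage prev                     // leverage fell
      currentRatio curr > currentRatio prev             // current ratio improved
      curr.SharesOutstanding <= prev.SharesOutstanding  // no increase in shares
      grossMargin curr > grossMargin prev               // gross margin improved
      assetTurnover curr > assetTurnover prev ]         // asset turnover improved
    |> List.filter id
    |> List.length

/// Bands used in the text: 0-2 weak, 3-7 average, 8-9 strong.
let strength (score: int) =
    if score <= 2 then "weak"
    elif score <= 7 then "average"
    else "strong"

// Made-up numbers for two consecutive years of one firm.
let lastYear =
    { NetIncome = 5.0; CashFromOps = 8.0; TotalAssets = 100.0; LongTermDebt = 30.0
      CurrentAssets = 40.0; CurrentLiabilities = 25.0; SharesOutstanding = 10.0
      GrossProfit = 30.0; Sales = 90.0 }

let thisYear =
    { lastYear with
        NetIncome = 8.0
        CashFromOps = 12.0
        LongTermDebt = 25.0
        CurrentAssets = 45.0
        GrossProfit = 36.0
        Sales = 100.0 }

let exampleScore = fScore lastYear thisYear
printfn "%d (%s)" exampleScore (strength exampleScore)
```

With these made-up numbers every condition holds, so the score is 9 (strong). Swapping the two years gives a score of 4 (average).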

The score combines three dimensions relevant to a firm's financial health: profitability; leverage, liquidity, and source of funds; and operating efficiency. It lets us see which companies perform well on those characteristics.

Profitability rewards shareholders and allows the business to keep growing.
Leverage matters because, when it is too high, it can destroy shareholder value.
Liquidity allows for more flexible and dynamic reactions to the competitive environment.
Efficiency is evidence of competitive advantage and strong positioning.

On this basis, the score is at least an indicator of an improving balance sheet. However, it does not by itself identify cheap businesses.

In his paper "Value Investing: The Use of Historical Financial Statement Information to Separate Winners from Losers", Joseph Piotroski used a sample of high book-to-market stocks. Such stocks can be cheap, financially distressed, or both. Screening them with a score built on the three factors above therefore offers some downside protection against bankruptcy or operational failure.

In his analysis (which may be subject to data-snooping bias), high F-score stocks earned an average return 9.7% higher over a 20-year period, using 12-month buy-and-hold returns.



## Signal Analysis

This section involves analysis of your signal. I guide you through a series of programming tasks to complete. You will find sections labeled **Task** asking you to do each piece of analysis. Please make sure that you complete all of these tasks. Make use of the course resources and example code on the course website. It should be possible to complete all the requested tasks using information given below or somewhere on the course website.

Some tasks ask for a written response. You may write your response to the written question in the same cell that that question is asked in. **Please do not** delete the task question text. We need it to locate your answers to specific tasks when grading.

Load libraries.



In [None]:
#r "nuget: FSharp.Data"
#r "nuget: FSharp.Stats"
#r "nuget: Plotly.NET,2.0.0-preview.17"
#r "nuget: Plotly.NET.Interactive,2.0.0-preview.17"

// Statistics (medians, standard deviations, quantiles) come from FSharp.Stats.
open System
open FSharp.Data
open FSharp.Stats
open Plotly.NET

In [None]:
// Set dotnet interactive formatter to plaintext
Formatter.Register(fun (x:obj) (writer: TextWriter) -> fprintfn writer "%120A" x )
Formatter.SetPreferredMimeTypesFor(typeof<obj>, "text/plain")
// Make plotly graphs work with interactive plaintext formatter
Formatter.SetPreferredMimeTypesFor(typeof<GenericChart.GenericChart>,"text/html")


### First, make sure that you're referencing the correct files.

Here I'm assuming that you have a class folder containing this `signal-exploration.ipynb` notebook and a `data` folder. The folder hierarchy should look like the one below, with these files and folders accessible:

```code
/class
    signal-exploration.ipynb
    /data
        id_and_return_data.csv
        zero_trades_252d.csv
    
```

First, make sure that our working directory is the source file directory.



In [None]:
let [<Literal>] ResolutionFolder = __SOURCE_DIRECTORY__
Environment.CurrentDirectory <- ResolutionFolder


We assume the `id_and_return_data.csv` file and the signal csv file are in the `data` folder. In this example the signal file is `zero_trades_252d.csv`. You should replace that file name with your signal file name.



In [None]:
let [<Literal>] IdAndReturnsFilePath = "data/id_and_return_data.csv"
let [<Literal>] MySignalFilePath = "data/f_score.csv"


If my paths are correct, then this code should read the first few lines of the files.
If it doesn't show the first few lines, fix the above file paths.
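When a path is wrong, a quick existence check gives a clearer message than a parse error. This is a minimal sketch. `missingFiles` is a hypothetical helper, not part of the course code, and relative paths resolve against the working directory set earlier.

```fsharp
open System.IO

/// Hypothetical helper: returns the paths that do not exist
/// (relative paths are resolved against the current working directory).
let missingFiles (paths: string list) =
    paths |> List.filter (fun path -> not (File.Exists path))

match missingFiles [ "data/id_and_return_data.csv"; "data/f_score.csv" ] with
| [] -> printfn "All data files found."
| missing -> printfn "Missing files: %A" missing
```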



In [None]:
IO.File.ReadLines(IdAndReturnsFilePath) |> Seq.truncate 5


seq
  ["id(string),eom(date),source(string),sizeGrp(string),obsMain(string),exchMain(string),primarySec(bool),gvkey(string),iid(string),permno(int Option),permco(int Option),excntry(string),curcd(string),fx(string),common(bool),compTpci(string),crspShrcd(int Option),compExchg(string),crsp_exchcd(int Option),adjfct(float Option),shares(float Option),gics(int Option),sic(int Option),naics(int Option),ff49(int Option),ret(float Option),retExc(float Option),prc(float Option),marketEquity(float Option)";
   "crsp_86432,2000-01-31T00:00:00.0000000,CRSP,micro,1,1,true,115876,01,86432,16313,USA,USD,1,true,,11,,3,2,5.218,40101010,6020,522110,45,-0.003906,-0.00824925,15.9375,83.161875";
   "crsp_85640,2000-01-31T00:00:00.0000000,CRSP,small,1,1,true,002193,01,85640,20300,USA,USD,1,true,,11,,1,1,102.496,35102020,8051,623110,11,-0.157143,-0.161485863,3.6875,377.954";
   "crsp_86430,2000-01-31T00:00:00.0000000,CRSP,micro,1,1,true,115946,01,86430,16319,USA,USD,1,true,,11,,3,1,10.764,45103010,7372,511

In [None]:
IO.File.ReadLines(MySignalFilePath) |> Seq.truncate 5


seq
  ["id(string),eom(date),signal(float option)"; "comp_001034_01,2008-12-31T00:00:00.0000000,5";
   "comp_001043_01,2000-01-31T00:00:00.0000000,5"; "comp_001076_02,2010-12-31T00:00:00.0000000,"; ...]


Assuming the paths are defined correctly and you saw the first 5 rows above,
we can now read the data using the CSV provider that parses the fields in the file.

First define the Csv types from the sample files:



In [None]:
type IdAndReturnsType = 
    CsvProvider<Sample=IdAndReturnsFilePath,
                // The schema parameter is not required,
                // but I am using it to override some column types
                // to make filtering easier.
                // If I didn't do this these particular columns 
                // would have strings of "1" or "0", but explicit boolean is nicer.
                Schema="obsMain(string)->obsMain=bool,exchMain(string)->exchMain=bool",
                ResolutionFolder=ResolutionFolder>

type MySignalType = 
    CsvProvider<MySignalFilePath,
                ResolutionFolder=ResolutionFolder>


Now read in the data.



In [None]:
let idAndReturnsCsv = IdAndReturnsType.GetSample()

let mySignalCsv = MySignalType.GetSample()

Columns in the `idAndReturnsCsv` are:



In [None]:
idAndReturnsCsv.Headers


Some
  [|"id(string)"; "eom(date)"; "source(string)"; "sizeGrp(string)"; "obsMain(string)"; "exchMain(string)";
    "primarySec(bool)"; "gvkey(string)"; "iid(string)"; "permno(int Option)"; "permco(int Option)"; "excntry(string)";
    "curcd(string)"; "fx(string)"; "common(bool)"; "compTpci(string)"; "crspShrcd(int Option)"; "compExchg(string)";
    "crsp_exchcd(int Option)"; "adjfct(float Option)"; "shares(float Option)"; "gics(int Option)"; "sic(int Option)";
    "naics(int Option)"; "ff49(int Option)"; "ret(float Option)"; "retExc(float Option)"; "prc(float Option)";
    "marketEquity(float Option)"|]


Columns in the `mySignalCsv` are:



In [None]:
mySignalCsv.Headers


Some [|"id(string)"; "eom(date)"; "signal(float option)"|]


There are a lot of columns in the id and returns csv. You can look at the data documentation to figure out what they are.

Put the rows into a list (we're more familiar with lists).



In [None]:
let idAndReturnsRows = idAndReturnsCsv.Rows |> Seq.toList
let mySignalRows = mySignalCsv.Rows |> Seq.toList


### Distribution of unique stocks in the id and returns data

To get you started, I will walk you through some simple analysis of the id and returns data.

Count the total number of stocks.

First, look at a few ids.



In [None]:
idAndReturnsRows
|> List.map (fun row -> row.Id)
|> List.truncate 5


["crsp_86432"; "crsp_85640"; "crsp_86430"; "crsp_85756"; "crsp_50278"]


Now count all of them.



In [None]:
idAndReturnsRows
|> List.map (fun row -> row.Id)
|> List.distinct
|> List.length


14000


Number of stocks each month.

First, look at the date column.



In [None]:
idAndReturnsRows
|> List.map (fun row -> row.Eom)
|> List.truncate 5


[1/31/2000 12:00:00 AM; 1/31/2000 12:00:00 AM; 1/31/2000 12:00:00 AM; 1/31/2000 12:00:00 AM; 1/31/2000 12:00:00 AM]


Group by month, then count per month.



In [None]:
let idAndReturnStocksPerMonth =
    let byMonth =
        idAndReturnsRows
        |> List.groupBy (fun row -> row.Eom)
        |> List.sortBy (fun (month, rows) -> month)
    [ for (month, rows) in byMonth do
        let nStocks = 
            rows
            |> List.map (fun row -> row.Id)
            |> List.distinct
            |> List.length
        month, nStocks ]


Look at a first few months.



In [None]:
idAndReturnStocksPerMonth
|> List.sortBy (fun (month, nStocks) -> month) 
|> List.truncate 5


[(1/31/2000 12:00:00 AM, 7076); (2/29/2000 12:00:00 AM, 7031); (3/31/2000 12:00:00 AM, 7021);
 (4/30/2000 12:00:00 AM, 7037); (5/31/2000 12:00:00 AM, 7053)]


Look at the last few.



In [None]:
idAndReturnStocksPerMonth
|> List.sortByDescending (fun (month, nStocks) -> month)
|> List.truncate 5


[(12/31/2020 12:00:00 AM, 4269); (11/30/2020 12:00:00 AM, 4216); (10/31/2020 12:00:00 AM, 4156);
 (9/30/2020 12:00:00 AM, 4129); (8/31/2020 12:00:00 AM, 4105)]


Create a column chart showing the number of stocks per month (Plotly.net column chart [docs](https://plotly.net/02_1_bar-and-column-charts.html)).



In [None]:
idAndReturnStocksPerMonth
|> Chart.Column


Add some labels to the axes (Plotly.net axis styling [docs](https://plotly.net/01_0_axis-styling.html)).



In [None]:
idAndReturnStocksPerMonth
|> List.sortBy (fun (month, nStocks) -> month)
|> Chart.Column
|> Chart.withXAxisStyle (TitleText="Month")
|> Chart.withYAxisStyle (TitleText="Number of Stocks")


We have some different size groups already assigned in the data:



In [None]:
idAndReturnsRows
|> List.countBy (fun row -> row.SizeGrp)


[("micro", 486881); ("small", 263335); ("nano", 197705); ("large", 173040); ("mega", 97879)]


Let's make a plot with separate bars for each group in 2015. You can read more about multiple charts in the Plotly.net [docs](https://plotly.net/01_2_multiple-charts.html).

We'll write a function. We need to give a type hint so that
it knows the type of the input data. If we didn't include the type hint, we'd get an error saying 'Lookup of indeterminate type ..' because it doesn't know the data type of the 'rows' input. The type hint is the `: list<IdAndReturnsType.Row>` part of the function definition.
This is saying we have a list of rows from the CsvProvider type that we defined earlier for this csv file data.
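The need for the annotation can be reproduced with a small stand-in. CsvProvider rows are provided types, not F# records. The compiler therefore cannot infer the type from a property name the way it can for record fields. The `StockRow` class below is a hypothetical substitute for `IdAndReturnsType.Row`, used only to illustrate the point.

```fsharp
open System

// Hypothetical stand-in for a CsvProvider row: a class with properties, not an F# record.
type StockRow(stockId: string, eom: DateTime) =
    member _.Id = stockId
    member _.Eom = eom

// The `(rows: list<StockRow>)` annotation tells the compiler what `row` is inside the lambda.
// Without it, `row.Eom` fails with a "Lookup on object of indeterminate type" error.
let countRowsPerMonth (rows: list<StockRow>) =
    rows
    |> List.countBy (fun row -> row.Eom)   // number of rows per month
    |> List.sortBy fst

let exampleRowCounts =
    countRowsPerMonth
        [ StockRow("a", DateTime(2015, 1, 31))
          StockRow("b", DateTime(2015, 1, 31))
          StockRow("a", DateTime(2015, 2, 28)) ]

printfn "%A" exampleRowCounts
```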



In [None]:
let countIdAndReturnsRows (rows: list<IdAndReturnsType.Row>) =
    let byMonth =
        rows
        |> List.groupBy (fun row -> row.Eom)
        |> List.sortBy (fun (month, rows) -> month)
    [ for (month, rows) in byMonth do
        let nStocks = 
            rows
            |> List.map (fun row -> row.Id)
            |> List.distinct
            |> List.length
        month, nStocks ]

Look at the function output. It is a list of tuples where each tuple is a pair of month (`DateTime`) and the count (`int`).



In [None]:
idAndReturnsRows
|> countIdAndReturnsRows
|> List.truncate 3


[(1/31/2000 12:00:00 AM, 7076); (2/29/2000 12:00:00 AM, 7031); (3/31/2000 12:00:00 AM, 7021)]


Just for large caps.



In [None]:
let stockCountsLarge =
    let toPlot = 
        idAndReturnsRows
        |> List.filter (fun row -> 
            row.SizeGrp = "large" && 
            row.Eom.Year = 2015)
        |> countIdAndReturnsRows
    Chart.Column(toPlot, Name = "Large caps")

stockCountsLarge


Just for small caps.



In [None]:
let stockCountsSmall =
    let toPlot = 
        idAndReturnsRows
        |> List.filter (fun row -> 
            row.SizeGrp = "small" &&
            row.Eom.Year = 2015)
        |> countIdAndReturnsRows
    Chart.Column(toPlot, Name = "Small caps")

stockCountsSmall


Combined:



In [None]:
[ stockCountsLarge; stockCountsSmall ]
|> Chart.combine


Now all groups.



In [None]:
let stockCountsAllSizes =
    idAndReturnsRows
    |> List.filter (fun row -> row.Eom.Year = 2015)
    |> List.groupBy (fun row -> row.SizeGrp)
    |> List.map (fun (sizeGrp, rows) -> 
        let toPlot = countIdAndReturnsRows rows
        sizeGrp, toPlot)

// first few observations of all size Groups
stockCountsAllSizes
|> List.map (fun (sizeGroup, xs) ->
    sizeGroup, xs |> List.truncate 3)


[("micro", [(1/31/2015 12:00:00 AM, 1583); (2/28/2015 12:00:00 AM, 1532); (3/31/2015 12:00:00 AM, 1509)]);
 ("mega", [(1/31/2015 12:00:00 AM, 410); (2/28/2015 12:00:00 AM, 412); (3/31/2015 12:00:00 AM, 410)]);
 ("small", [(1/31/2015 12:00:00 AM, 1050); (2/28/2015 12:00:00 AM, 1058); (3/31/2015 12:00:00 AM, 1031)]);
 ("large", [(1/31/2015 12:00:00 AM, 706); (2/28/2015 12:00:00 AM, 695); (3/31/2015 12:00:00 AM, 700)]);
 ("nano", [(1/31/2015 12:00:00 AM, 538); (2/28/2015 12:00:00 AM, 580); (3/31/2015 12:00:00 AM, 617)])]


A combined chart.



In [None]:
stockCountsAllSizes
|> List.map (fun (sizeGrp, toPlot) -> 
    Chart.Column(toPlot, Name = sizeGrp))
|> Chart.combine

(** Same, but stacking each chart on top of each other. *)

stockCountsAllSizes
|> List.map (fun (sizeGrp, toPlot) -> 
    Chart.Column(toPlot, Name = sizeGrp))
|> Chart.SingleStack()


You should now have a good idea of how to work with this data.

### Distribution of unique stocks in your signal data

Do a similar analysis to the one above, but for your signal data.

> **Task:** Complete this function. It takes a list of `MySignalType.Row` as input and should return a list of the month and the integer count of unique stock ids that month (`list<DateTime * int>`).
> 



In [None]:
let countMySignalRows (rows: list<MySignalType.Row>) =
    let byMonth =
        rows
        |> List.groupBy (fun row -> row.Eom)
        |> List.sortBy (fun (month, rows) -> month)
    [ for (month, rows) in byMonth do
        let nStocks = 
            rows
            |> List.map (fun row -> row.Id)
            |> List.distinct
            |> List.length
        month, nStocks ]
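`countMySignalRows` repeats `countIdAndReturnsRows` with only the row type changed, and the same pattern is needed again for the non-missing and missing signals. A hypothetical generic helper could take the month and id accessors as arguments instead. It also swaps the `groupBy` approach for an equivalent one: keep each distinct (month, id) pair once, then count the pairs per month.

```fsharp
open System

/// Hypothetical generic helper: number of distinct ids per month for any row type,
/// given one function that extracts the month and one that extracts the id.
let countDistinctPerMonth (getMonth: 'T -> DateTime) (getId: 'T -> string) (rows: 'T list) =
    rows
    |> List.map (fun row -> getMonth row, getId row)
    |> List.distinct          // keep each (month, id) pair once
    |> List.countBy fst       // count the remaining pairs per month
    |> List.sortBy fst        // chronological order

// Plain (month, id) tuples standing in for CSV rows; "a" appears twice in January.
let exampleRows =
    [ (DateTime(2015, 1, 31), "a")
      (DateTime(2015, 1, 31), "b")
      (DateTime(2015, 1, 31), "a")
      (DateTime(2015, 2, 28), "a") ]

let exampleCounts = exampleRows |> countDistinctPerMonth fst snd
printfn "%A" exampleCounts
```

For the notebook data this could be called as, e.g., `mySignalRows |> countDistinctPerMonth (fun row -> row.Eom) (fun row -> row.Id)`. Piping the rows in first should let the compiler infer the row type inside the lambdas, so no type hint is needed.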


> **Task:** Create a column chart showing the number of stocks per month in your signal data csv file.
> 



In [None]:
let stockCountsPerMonth =
    let toPlot = 
        mySignalRows
        |> countMySignalRows
    Chart.Column(toPlot, Name = "Number of stocks per month")
    |> Chart.withXAxisStyle (TitleText="Month")
    |> Chart.withYAxisStyle (TitleText="Stocks")

stockCountsPerMonth


You may have some stocks with missing data. If you have some stocks with missing signal data, the below code will return the first 3 observations.
If you do not have missing data it will return an empty list.



In [None]:
mySignalRows
|> List.choose (fun row -> 
    // Choose the rows where row.Signal is None.
    match row.Signal with
    | None -> Some row
    | Some signal -> None )
|> List.truncate 3


[("comp_001076_02", 12/31/2010 12:00:00 AM, None); ("comp_002620_01", 11/30/2008 12:00:00 AM, None);
 ("comp_004424_01", 9/30/2000 12:00:00 AM, None)]


We can create a list that only contains stocks with non-missing signals. We define a record type to hold this data. The main change is making signal have `float` type instead of `Option<float>` because we're removing missing data.



In [None]:
type NonMissingSignal =
    {
        Id: string
        Eom: DateTime
        Signal: float
    }
    
let myNonMissingSignals =
    mySignalRows
    |> List.choose (fun row -> 
        match row.Signal with
        | None -> None
        | Some signal -> 
            Some { Id = row.Id; Eom = row.Eom; Signal = signal })
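`List.partition` splits a list into the elements that satisfy a predicate and those that do not. The non-missing and missing subsets can therefore also be built in a single pass. The sketch uses plain `(id, month, signal option)` tuples with made-up ids as a stand-in for `MySignalType.Row`.

```fsharp
open System

// Hypothetical stand-in rows: (id, month, optional signal).
let exampleSignalRows =
    [ ("comp_a", DateTime(2010, 12, 31), Some 5.0)
      ("comp_b", DateTime(2010, 12, 31), None)
      ("comp_c", DateTime(2010, 12, 31), Some 7.0) ]

// First list: rows that have a signal. Second list: rows where it is missing.
let withSignal, withoutSignal =
    exampleSignalRows
    |> List.partition (fun (_, _, signal) -> Option.isSome signal)

printfn "%d with a signal, %d missing" withSignal.Length withoutSignal.Length
```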


> **Task:** Complete this function. It takes a list of `NonMissingSignal` records as input and should return a list of the month and the integer count of unique stock ids that month (`list<DateTime * int>`).
> 



In [None]:
let countMyNonMissingSignalRows (rows: list<NonMissingSignal>) =
    let byMonth =
        rows
        |> List.groupBy (fun row -> row.Eom)
        |> List.sortBy (fun (month, rows) -> month)
    [ for (month, rows) in byMonth do
        let nStocks = 
            rows
            |> List.map (fun row -> row.Id)
            |> List.distinct
            |> List.length
        month, nStocks ]


> **Task:** Create a column chart showing the number of stocks per month in your signal data that **do not** have missing signals.
> 



In [None]:
let nonMissingStockCountsPerMonth =
    let toPlot = 
        myNonMissingSignals
        |> countMyNonMissingSignalRows
    Chart.Column(toPlot, Name = "Number of stocks per month")
    |> Chart.withXAxisStyle (TitleText="Month")
    |> Chart.withYAxisStyle (TitleText="Stocks with non missing data")

nonMissingStockCountsPerMonth

> **Task:** Create a column chart showing the number of stocks per month in your signal data that **do** have missing signals.
> 



In [None]:
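// A missing signal has no value to store, so this record only keeps the id and month
// (storing 0.0 would be misleading, since 0 is a valid F-score).
type MissingSignal =
    {
        Id: string
        Eom: DateTime
    }

let myMissingSignals =
    mySignalRows
    |> List.choose (fun row -> 
        match row.Signal with
        | None -> Some { Id = row.Id; Eom = row.Eom }
        | Some _ -> None)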

let countMyMissingSignalRows (rows: list<MissingSignal>) =
    let byMonth =
        rows
        |> List.groupBy (fun row -> row.Eom)
        |> List.sortBy (fun (month, rows) -> month)
    [ for (month, rows) in byMonth do
        let nStocks = 
            rows
            |> List.map (fun row -> row.Id)
            |> List.distinct
            |> List.length
        month, nStocks ]

let missingStockCountsPerMonth =
    let toPlot = 
        myMissingSignals
        |> countMyMissingSignalRows
    Chart.Column(toPlot, Name = "Number of stocks per month")
    |> Chart.withXAxisStyle (TitleText="Month")
    |> Chart.withYAxisStyle (TitleText="Stocks with missing data")

missingStockCountsPerMonth

### Distribution of the signal

> **Task:** Compute the minimum, maximum, median, standard deviation, and average of the non-missing signals in your dataset.
> 



In [None]:
let maxMyNonMissingSignal =   
    myNonMissingSignals
        |> List.map (fun row -> row.Signal)
        |> Seq.max
let minMyNonMissingSignal =   
    myNonMissingSignals
        |> List.map (fun row -> row.Signal)
        |> Seq.min

let medianMyNonMissingSignal =   
    myNonMissingSignals
        |> List.map (fun row -> row.Signal)
        |> Seq.median

let stDevMyNonMissingSignal =   
    myNonMissingSignals
        |> List.map (fun row -> row.Signal)
        |> Seq.stDev

let averageMyNonMissingSignal =   
    myNonMissingSignals
        |> List.map (fun row -> row.Signal)
        |> Seq.average

In [None]:
printfn "Max is %f" maxMyNonMissingSignal
printfn "Min is %f" minMyNonMissingSignal
printfn "Median is %f" medianMyNonMissingSignal
printfn "Standard Deviation is %f" stDevMyNonMissingSignal
printfn "Average is %f" averageMyNonMissingSignal

Max is 9.000000
Min is 0.000000
Median is 5.000000
Standard Deviation is 1.720113
Average is 4.847088
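As a cross-check on the library functions, the same five statistics can be computed in plain F#. This is a sketch. `SummaryStats` and `summarize` are hypothetical names. The standard deviation uses the sample (n - 1) formula, which is an assumption to compare against whichever convention `Seq.stDev` follows in FSharp.Stats.

```fsharp
type SummaryStats =
    { Min: float; Max: float; Median: float; Mean: float; StDev: float }

/// Summary statistics for a list with at least two values.
let summarize (xs: float list) : SummaryStats =
    let sorted = xs |> List.sort |> List.toArray
    let n = sorted.Length
    // Median: middle value, or the mean of the two middle values when n is even.
    let median =
        if n % 2 = 1 then sorted.[n / 2]
        else (sorted.[n / 2 - 1] + sorted.[n / 2]) / 2.0
    let mean = List.average xs
    // Sample variance: squared deviations divided by n - 1.
    let variance = (xs |> List.sumBy (fun x -> (x - mean) ** 2.0)) / float (n - 1)
    { Min = sorted.[0]
      Max = sorted.[n - 1]
      Median = median
      Mean = mean
      StDev = sqrt variance }

let exampleStats = summarize [ 4.0; 5.0; 5.0; 6.0; 2.0; 8.0 ]
printfn "%A" exampleStats
```

For this toy list the result is Min 2.0, Max 8.0, Median 5.0, Mean 5.0, and StDev 2.0. On the notebook data, `myNonMissingSignals |> List.map (fun row -> row.Signal) |> summarize` would return all five values in one record.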


It can also be useful to compute percentiles of the signal. You can calculate percentiles using the `Quantile` module in `FSharp.Stats`.



In [None]:
// 10th, 50th, and 90th percentiles
let pctlExamples = [0.1; 0.5; 0.9]

// you must have an array of values
let pctlExamplesData = 
    [ 10.0; -20.0; 0.1; -5.0; 7.0; 4.0]
    |> List.toArray


Compute the percentiles.



In [None]:
let pctlExamplesComputed =    
    [ for pctl in pctlExamples do
        Quantile.compute pctl pctlExamplesData ]
pctlExamplesComputed


[-20.0; 2.05; 10.0]
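The sketch below shows what a percentile means on a small sample. It uses one common textbook definition: sort the data, linearly interpolate at the 1-based position h = (n + 1)p, and clamp to the smallest or largest value outside that range. We do not claim this is the rule `Quantile.compute` implements. It is chosen because it reproduces `[-20.0; 2.05; 10.0]` on the example data. `percentileNPlus1` is a hypothetical name.

```fsharp
/// Percentile of `data` at probability `p` (0 < p < 1): sort, then linearly
/// interpolate at the 1-based position h = (n + 1) * p, clamping to the
/// smallest / largest observation when h falls outside [1, n].
let percentileNPlus1 (p: float) (data: float[]) : float =
    let sorted = Array.sort data
    let n = sorted.Length
    let h = float (n + 1) * p
    if h <= 1.0 then sorted.[0]
    elif h >= float n then sorted.[n - 1]
    else
        let lower = int (floor h)       // 1-based position of the lower neighbour
        let frac = h - float lower      // fraction of the way to the upper neighbour
        sorted.[lower - 1] + frac * (sorted.[lower] - sorted.[lower - 1])

let exampleData = [| 10.0; -20.0; 0.1; -5.0; 7.0; 4.0 |]
let examplePctls = [ for p in [ 0.1; 0.5; 0.9 ] -> percentileNPlus1 p exampleData ]
printfn "%A" examplePctls
```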


> **Task:** Compute the 1st, 10th, 50th, 90th, and 99th percentiles of the non-missing signals in your dataset. Once these percentiles are calculated, assign them to the values below. Explain what you learn about the distribution. Is it uniformly distributed, a skewed distribution, are there outliers, etc.?
> 



In [None]:
let pctls = [0.01; 0.1; 0.5; 0.9; 0.99]

let pctlData = 
    myNonMissingSignals
    |> List.map(fun row -> row.Signal)
    |> List.toArray

let pctlComputed =    
    [ for pctl in pctls do
        Quantile.compute pctl pctlData ]

let signalP01: float = pctlComputed.[0] 
let signalP10: float = pctlComputed.[1]
let signalP50: float = pctlComputed.[2] 
let signalP90: float = pctlComputed.[3] 
let signalP99: float = pctlComputed.[4] 

pctlComputed

Looking at the percentiles, the distribution of the signal appears close to normal. As expected, fewer than 1% of firms have a score of 0, and the same holds for a score of 9. The distribution does not look skewed. There can be no real outliers because the score is an integer bounded between 0 and 9.
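Because the F-score only takes the integer values 0 to 9, percentiles are a coarse summary, and several of them can coincide. A frequency table gives the share of observations at each score. That is a more direct way to check how rare scores of 0 and 9 are. `scoreShares` is a hypothetical helper. On the notebook data, it would be applied to `myNonMissingSignals |> List.map (fun row -> row.Signal)`.

```fsharp
/// Share of observations at each distinct score, sorted by score.
let scoreShares (scores: float list) =
    let n = float scores.Length
    scores
    |> List.countBy id
    |> List.sortBy fst
    |> List.map (fun (score, count) -> score, float count / n)

let shares = scoreShares [ 5.0; 5.0; 4.0; 9.0 ]
printfn "%A" shares
```

For the toy list this gives 25% at 4, 50% at 5, and 25% at 9.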

> **Task:** Create a [histogram](https://plotly.net/04_0_histograms.html) showing the distribution of the signal for all stocks in your dataset that have non-missing signals. Limit the data to 2015 to make it easier to plot. Explain what you learn about the distribution. Is it uniformly distributed, are there outliers, etc. How do you see this in the plot, and is there anything new that you learned relative to the percentiles?
> 



In [None]:
let myNonMissingSignalsHistogram =
    myNonMissingSignals
    |> List.filter (fun row -> row.Eom.Year = 2015)
    |> List.map (fun row -> row.Signal)
    |> Chart.Histogram

myNonMissingSignalsHistogram

**Answer**: The distribution is roughly bell-shaped, with a mean of about 5 (4.85, as computed above) and a mode of 5, so it looks close to a normal distribution. Because the score is an integer bounded between 0 and 9, there are no outliers; values of 0 and 9 are simply the least frequent.
This is consistent with the percentiles, although the histogram also shows that there are more scores below 5 than above 5.

[Winsorizing](https://en.wikipedia.org/wiki/Winsorizing) is a technique to remove the influence of outliers from a dataset. Let's create a winsorized version of your data.

Assuming that you have defined the percentile above correctly, this will create a winsorized version of your signal dataset. It is winsorized at the 1st and 99th percentiles.



In [None]:
let winsorizeSignals (signalOb: NonMissingSignal) =
    let newSignal =
        if signalOb.Signal < signalP01 then 
            signalP01
        elif signalOb.Signal > signalP99 then
            signalP99
        else
            signalOb.Signal
    // copy and update the observation with the
    // winsorized signal.
    { signalOb with Signal = newSignal }


Test on one signal (the observation at index 99).



In [None]:
winsorizeSignals myNonMissingSignals[99]


{ Id = "comp_001554_01"
  Eom = 9/30/2016 12:00:00 AM
  Signal = 6.0 }


Do this for all signals.



In [None]:
let myWinsorizedSignals =
    myNonMissingSignals
    |> List.map winsorizeSignals


> **Task:** Create a [histogram](https://plotly.net/04_0_histograms.html) showing the distribution of the **winsorized signals** for all stocks in your dataset. Limit the data to 2015 to make it easier to plot. Explain what you learn about the distribution. Is it uniformly distributed, are there outliers, etc. How do you see this in the plot, and is there anything new that you learned relative to the percentiles and non-winsorized histogram?
> 



In [None]:
let myNonMissingSignalsHistogramWinsorized =
    // Use the winsorized signals, not the raw ones.
    myWinsorizedSignals
    |> List.filter (fun row -> row.Eom.Year = 2015)
    |> List.map (fun row -> row.Signal)
    |> Chart.Histogram

myNonMissingSignalsHistogramWinsorized


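**Answer**: Since the signal is a discrete score bounded between 0 and 9, winsorizing at the 1st and 99th percentiles can at most recode the few observations at the extremes (for example, 0s become the 1st-percentile value). It therefore does not alter the distribution in any significant way.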
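For a bounded integer score, winsorizing can only move observations at the extremes. If the 1st and 99th percentiles equal the minimum and maximum (0 and 9), it changes nothing at all. Counting the observations that actually move makes this claim checkable. The sketch below uses hypothetical helpers `clamp` and `countChanged`, with made-up cut-offs of 1 and 8. `winsorizeSignals` applies the same clamping rule to each record.

```fsharp
/// Clamp `x` to the interval [lo, hi]; winsorizing applies this to every observation.
let clamp (lo: float) (hi: float) (x: float) = max lo (min hi x)

/// Number of observations that change when clamped to [lo, hi].
let countChanged (lo: float) (hi: float) (xs: float list) =
    xs |> List.filter (fun x -> clamp lo hi x <> x) |> List.length

// Toy scores with made-up cut-offs: 1.0 as the 1st and 8.0 as the 99th percentile.
let toyScores = [ 0.0; 1.0; 5.0; 5.0; 9.0 ]
let toyWinsorized = toyScores |> List.map (clamp 1.0 8.0)

printfn "%A" toyWinsorized
printfn "%d observations changed" (countChanged 1.0 8.0 toyScores)
```

On the notebook data, `countChanged signalP01 signalP99 (myNonMissingSignals |> List.map (fun row -> row.Signal))` would report how many signals winsorizing changed.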

> **Task:** Create a map collection called `byStockMonthIdAndReturnMap` where the key is a tuple of stock id as string and month as DateTime (`string * DateTime`) and the value is an `IdAndReturnsType.Row`.
> 

**Note:** I have added a type constraint of `: Map<(string * DateTime), IdAndReturnsType.Row>` to make sure that the type of the map is correct. If you fill in code below, you will get a type mismatch error until your code is correct. You don't generally need these type constraints, but I am putting it here to make the compiler check that you produce the output that I am asking for.

**Hint:** we did things like this in the momentum signal lecture. There's also a practice quiz on map collections.



In [None]:
let byStockMonthIdAndReturnMap: Map<string * DateTime, IdAndReturnsType.Row> =
    idAndReturnsRows
    |> List.map(fun x ->
        let ym = DateTime(x.Eom.Year, x.Eom.Month, 1)
        let key = x.Id, ym
        key, x)
    |> Map

byStockMonthIdAndReturnMap


map
  [(("comp_001034_01", 12/1/2008 12:00:00 AM),
    ("comp_001034_01", 12/31/2008 12:00:00 AM, "COMPUSTAT", "large", true, true, true, "001034", "01", None, None, "USA",
     "USD", "1", true, "0", None, "11", None, Some 1.0, Some 41.882, Some 35202010, None, None, None, Some 0.024931,
     Some 0.0249017502, Some 36.94, Some 1547.12108));
   (("comp_001043_01", 1/1/2000 12:00:00 AM),
    ("comp_001043_01", 1/31/2000 12:00:00 AM, "COMPUSTAT", "nano", true, true, true, "001043", "01", None, None, "USA",
     "USD", "1", true, "0", None, "12", None, Some 1.0, Some 2.848, None, None, None, None, Some 0.111111,
     Some 0.1067681111, Some 1.25, Some 3.56));
   (("comp_001076_02", 12/1/2010 12:00:00 AM),
    ("comp_001076_02", 12/31/2010 12:00:00 AM, "COMPUSTAT", "small", true, true, true, "001076", "02", None, None, "USA",
     "USD", "1", true, "0", None, "11", None, Some 1.0, Some 69.428, Some 25504060, Some 7359, Some 532299, Some 34,
     Some 0.034569, Some 0.0344291383, Some 20.6

> **Task:** Create a [histogram](https://plotly.net/04_0_histograms.html) showing the distribution of the **winsorized signals** for only **small-cap stocks** in your dataset. Limit the data to 2015 to make it easier to plot.
> 

**Hint:** if you have a stock and its signal in a particular month, the `byStockMonthIdAndReturnMap` is useful for looking up things about the stock that month.



In [None]:
let myWinsorizedSignalsMapped =
    myWinsorizedSignals
    |> List.map (fun x ->
        let s = x.Signal
        let ym = DateTime(x.Eom.Year, x.Eom.Month, 1)
        let key = x.Id, ym
        key, s)
    |> Map

let idAndReturnsSmall2015 =
    idAndReturnsRows
    |> List.filter (fun row -> row.SizeGrp = "small" && row.Eom.Year = 2015)
    |> List.map (fun x ->
        let ym = DateTime(x.Eom.Year, x.Eom.Month, 1)
        let key = x.Id, ym
        key, x)
    |> Map

let small2015Histogram =
    [for rtrn in idAndReturnsSmall2015.Keys do Map.tryFind rtrn myWinsorizedSignalsMapped]
    |> List.choose id
    |> Chart.Histogram

small2015Histogram
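Following the hint, the size group can also be looked up from the signal side. For each signal, build the (id, first-of-month) key and keep the signal only if that stock-month has the requested size group. This avoids building a separate map for each size group. The sketch below uses stand-in types and made-up ids. `SignalOb`, `monthKey`, and `signalsForSizeGroup` are hypothetical names.

```fsharp
open System

// Simplified stand-in for NonMissingSignal.
type SignalOb = { Id: string; Eom: DateTime; Signal: float }

/// First day of the month, matching how the notebook builds its map keys.
let monthKey (d: DateTime) = DateTime(d.Year, d.Month, 1)

/// Signals of stocks whose size group in that month equals `group`.
/// Stock-months that are missing from the lookup map are dropped.
let signalsForSizeGroup (sizeByStockMonth: Map<string * DateTime, string>) (group: string) (signals: SignalOb list) =
    signals
    |> List.filter (fun s ->
        match Map.tryFind (s.Id, monthKey s.Eom) sizeByStockMonth with
        | Some sizeGrp -> sizeGrp = group
        | None -> false)
    |> List.map (fun s -> s.Signal)

// Toy data with made-up ids; "c" has no size entry and is dropped.
let sizes: Map<string * DateTime, string> =
    Map.ofList
        [ (("a", DateTime(2015, 1, 1)), "small")
          (("b", DateTime(2015, 1, 1)), "large") ]

let toySignals =
    [ { Id = "a"; Eom = DateTime(2015, 1, 31); Signal = 3.0 }
      { Id = "b"; Eom = DateTime(2015, 1, 31); Signal = 8.0 }
      { Id = "c"; Eom = DateTime(2015, 1, 31); Signal = 5.0 } ]

printfn "%A" (signalsForSizeGroup sizes "small" toySignals)
```

In the notebook, the lookup map could be derived from the existing one with `byStockMonthIdAndReturnMap |> Map.map (fun _ row -> row.SizeGrp)`. The winsorized signals would be filtered to 2015 before being passed in, and the resulting list piped into `Chart.Histogram` as above.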

> **Task:** Create a [histogram](https://plotly.net/04_0_histograms.html) showing the distribution of the **winsorized signals** for only **large-cap stocks** in your dataset. Limit the data to 2015 to make it easier to plot.
> 



In [None]:
let idAndReturnsLarge2015 =
    idAndReturnsRows
    |> List.filter (fun row -> row.SizeGrp = "large" && row.Eom.Year = 2015)
    |> List.map (fun x ->
        let ym = DateTime(x.Eom.Year, x.Eom.Month, 1)
        let key = x.Id, ym
        key, x)
    |> Map

let large2015Histogram =
    [for rtrn in idAndReturnsLarge2015.Keys do Map.tryFind rtrn myWinsorizedSignalsMapped]
    |> List.choose id
    |> Chart.Histogram

large2015Histogram

> **Task:** Compare and contrast the histograms for the **small-cap** and **large-cap** stocks. Are there any differences? If we wanted to sort stocks based on the signal, do you think that we would end up with stocks that have different average sizes in the low and high signal portfolios?

> **Answer:** Both histograms are fairly similar in the 5-8 score range. However, in the 0-4 range, small-cap stocks appear more frequently.
This could be because large companies tend to have more capable management and operations, which may help them avoid mistakes that small-cap management might make. If so, sorting on the signal would likely give the low-signal portfolio a smaller average size than the high-signal portfolio.
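The raw histogram counts also depend on how many stocks each group has. In the id and returns data there are roughly 1,050 small-cap and 700 large-cap stocks per month in early 2015, so taller small-cap bars may partly be a group-size effect. A more direct way to answer the portfolio question is to compare average size across signal values. The sketch below works on made-up `(signal, market equity)` pairs, and `averageSizeBySignal` is a hypothetical helper. In the notebook, the pairs could be built by looking up `MarketEquity` (a `float option` in the CSV) through `byStockMonthIdAndReturnMap` and dropping missing values.

```fsharp
/// Average market equity for each signal value, sorted by signal.
/// Input pairs are (signal, marketEquity) with missing market equity already removed.
let averageSizeBySignal (obs: (float * float) list) =
    obs
    |> List.groupBy fst
    |> List.map (fun (signal, xs) -> signal, xs |> List.averageBy snd)
    |> List.sortBy fst

// Made-up observations: two low-score and two high-score stocks.
let toySizes = [ (2.0, 100.0); (2.0, 300.0); (8.0, 1000.0); (8.0, 3000.0) ]
printfn "%A" (averageSizeBySignal toySizes)
```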

### Towards portfolios.

> **Task:** Using your winsorized list of signals, group your stocks by month. Assign this result to a value named `byStockMonthSignals` that is a list of `DateTime * list<NonMissingSignal>` tuples. The first thing in the tuple is the month and the second thing is a list of `NonMissingSignal` records for all stocks in that month.
> 



In [None]:
let byStockMonthSignals: list<DateTime * list<NonMissingSignal>> =
    myWinsorizedSignals
    |> List.groupBy(fun x -> x.Eom)


Now assuming `byStockMonthSignals` is correct, we'll sort the stocks each month from smallest to largest based on the signal that month. Then split the stocks into 3 equal-sized portfolios (aka terciles) based on the sorted signal. We'll create a `SortedPort` record for each portfolio and assign the list to a value named `terciles`.



In [None]:
type SortedPort =
    { Portfolio: int
      Eom: DateTime
      Stocks: list<NonMissingSignal> }

let terciles =
    byStockMonthSignals
    |> List.collect (fun (eom, signals) ->
        let sortedSignals =
            signals
            |> List.sortBy (fun signalOb -> signalOb.Signal)
            |> List.splitInto 3
        sortedSignals
        |> List.mapi (fun i p -> 
            { Portfolio = i + 1
              Eom = eom
              Stocks = p }))
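Because the F-score is an integer, many stocks share the same score each month, and `List.splitInto` cuts the sorted list by position, not by value. Stocks with identical scores can therefore be split across terciles. For example, the last portfolio printed below contains stocks with a score of 5.0. Within a tie, `List.sortBy` is stable, so the original list order decides which tercile a stock falls into. A toy illustration:

```fsharp
// Six toy scores with many ties at 5.0.
let toyTercileScores = [ 3.0; 5.0; 5.0; 5.0; 5.0; 7.0 ]

// Same approach as `terciles`: sort, then cut into 3 groups of (nearly) equal length.
let toyTerciles = toyTercileScores |> List.sort |> List.splitInto 3

printfn "%A" toyTerciles
```

Here the score 5.0 appears in all three groups.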


Look at the first portfolio.



In [None]:
terciles[0]


{ Portfolio = 1
  Eom = 12/31/2008 12:00:00 AM
  Stocks =
   [{ Id = "crsp_10092"
      Eom = 12/31/2008 12:00:00 AM
      Signal = 1.0 }; { Id = "crsp_10180"
                        Eom = 12/31/2008 12:00:00 AM
                        Signal = 1.0 }; { Id = "crsp_10371"
                                          Eom = 12/31/2008 12:00:00 AM
                                          Signal = 1.0 }; { Id = "crsp_10779"
                                                            Eom = 12/31/2008 12:00:00 AM
                                                            Signal = 1.0 }; { Id = "crsp_11161"
                                                                              Eom = 12/31/2008 12:00:00 AM
                                                                              Signal = 1.0 };
    { Id = "crsp_15203"
      Eom = 12/31/2008 12:00:00 AM
      Signal = 1.0 }; { Id = "crsp_18033"
                        Eom = 12/31/2008 12:00:00 AM
                        Signal = 1.0 };

Look at the last portfolio.



In [None]:
terciles |> List.last


    { Portfolio = 3
      Eom = 4/30/2009 12:00:00 AM
      Stocks =
       [{ Id = "crsp_86097"; Eom = 4/30/2009 12:00:00 AM; Signal = 5.0 }
        { Id = "crsp_86122"; Eom = 4/30/2009 12:00:00 AM; Signal = 5.0 }
        { Id = "crsp_86211"; Eom = 4/30/2009 12:00:00 AM; Signal = 5.0 }
        { Id = "crsp_86218"; Eom = 4/30/2009 12:00:00 AM; Signal = 5.0 }
        { Id = "crsp_86322"; Eom = 4/30/2009 12:00:00 AM; Signal = 5.0 }
        { Id = "crsp_86339"; Eom = 4/30/2009 12:00:00 AM; Signal = 5.0 }
        { Id = "crsp_86474"; Eom = 4/30/2009 12:00:00 AM; Signal = 5.0 }
        ...

> **Task:** Using `terciles`, compute the average signal in each tercile portfolio each month. Plot a combined (`Chart.combine`) line chart (`Chart.Line`) showing the average signal for each tercile portfolio from the start to the end of the sample. What do you learn? Is the average signal in each tercile constant throughout the sample, or does it vary over time?
> 

> **Task:** Using `byStockMonthSignals`, sort the stocks each month from smallest to largest based on the signal that month. Then split the stocks into 5 equal-sized portfolios (aka quintiles) based on the sorted signal. Create a `SortedPort` record for each portfolio and assign the list to a value named `quintiles`.
> 

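The tercile and quintile sorts differ only in the number passed to `List.splitInto`, so one option is a single helper parameterized by the number of portfolios. The sketch below assumes hypothetical record types `FirmSignal` and `NtilePort` (deliberately named so they do not shadow `NonMissingSignal` or `SortedPort`) and illustrative data; it is not the notebook's own code.

```fsharp
open System

// Hypothetical record types for this sketch only.
type FirmSignal = { Firm: string; FormationEom: DateTime; Value: float }
type NtilePort = { Rank: int; PortEom: DateTime; Members: list<FirmSignal> }

/// Sort firms within each month by signal (ascending) and split them into
/// nPorts portfolios: Rank 1 holds the lowest signals, Rank nPorts the highest.
let formPortfolios (nPorts: int) (signals: list<FirmSignal>) : list<NtilePort> =
    signals
    |> List.groupBy (fun s -> s.FormationEom)
    |> List.collect (fun (eom, monthSignals) ->
        monthSignals
        |> List.sortBy (fun s -> s.Value)
        |> List.splitInto nPorts
        |> List.mapi (fun i members -> { Rank = i + 1; PortEom = eom; Members = members }))

// Illustrative data: 5 firms formed in February, 10 in March.
let feb = DateTime(2022, 2, 28)
let mar = DateTime(2022, 3, 31)

let sample =
    List.append
        (List.init 5 (fun i -> { Firm = sprintf "f%d" i; FormationEom = feb; Value = float (i % 3) }))
        (List.init 10 (fun i -> { Firm = sprintf "g%d" i; FormationEom = mar; Value = float i }))

let tercilesSketch = formPortfolios 3 sample
let quintilesSketch = formPortfolios 5 sample

// splitInto puts any leftover firms in the first (lowest-signal) chunks.
for p in tercilesSketch do
    printfn "%s rank %d: %d firms" (p.PortEom.ToString("yyyy-MM")) p.Rank p.Members.Length
```

With the notebook's own types substituted, calling such a helper with 3 and 5 would play the roles of `terciles` and `quintiles`.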


// Average signal of one tercile portfolio in each formation month, sorted by month.
// Each SortedPort already holds a single month (Eom), so no regrouping is needed.
let averageTercile (portfolio: int) : list<DateTime * float> =
    terciles
    |> List.filter (fun port -> port.Portfolio = portfolio)
    |> List.map (fun port ->
        port.Eom, port.Stocks |> List.averageBy (fun stock -> stock.Signal))
    |> List.sortBy fst

let averageTercile1 = averageTercile 1
let averageTercile2 = averageTercile 2
let averageTercile3 = averageTercile 3

Chart.combine (
    [ Chart.Line(averageTercile1, Name="1st Tercile")
      Chart.Line(averageTercile2, Name="2nd Tercile")
      Chart.Line(averageTercile3, Name="3rd Tercile") ])

**Answer:** The average signal within each tercile varies little over the sample, changing by at most one score point. Because the F-score is discrete (integer scores from 0 to 9), such small movements carry limited information. Average scores were lower in the periods following the dot-com bubble and the 2008 financial crisis.

let quintiles: list<SortedPort> =
    byStockMonthSignals
    |> List.collect (fun (eom, signals) ->
        let sortedSignals =
            signals
            |> List.sortBy (fun signalOb -> signalOb.Signal)
            |> List.splitInto 5
        sortedSignals
        |> List.mapi (fun i p ->
            { Portfolio = i + 1
              Eom = eom
              Stocks = p }))

> **Task:** Filter `quintiles` to the quintile portfolio of stocks each month that has the lowest signal value. This should be stocks where `SortedPort.Portfolio = 1`. Assign the filtered list to a value named `bottomQuintile`.
> 



let bottomQuintile: list<SortedPort> =
    quintiles
    |> List.filter (fun sortedPort -> sortedPort.Portfolio = 1)


> **Task:** Create a list named `bottomQuintileReturn` that contains the return of the bottom quintile portfolio each month. The portfolio return for a given month should be calculated using equal weights on every stock in the portfolio that month. The result should be given as a list of `SortedPortfolioReturn` records. **Additionally,** the month of the return should be lagged one month relative to the portfolio formation month. That means that if you formed a portfolio based on a signal known as of the end of February 2022 (Eom = DateTime(2022,02,28)), the portfolio return during the first month that you hold it will be calculated using stock returns during March 2022 (MonthOfReturn = DateTime(2022,03,31)).
> 

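The two requirements above (equal weights, one-month lag) interact with missing data: if some stocks have no return in the holding month, treating those returns as 0 and dividing by the full portfolio size pulls the average toward zero. The toy sketch below assumes a hypothetical three-stock portfolio and a hypothetical return map in the same `(id, month) -> return option` shape used in this notebook; `nextMonthEom` is a copy of the `addOneEom` logic so the block is self-contained.

```fsharp
open System

// Same end-of-month shift as addOneEom in this notebook.
let nextMonthEom (eom: DateTime) =
    DateTime(eom.Year, eom.Month, 1).AddMonths(2).AddDays(-1.0)

// Hypothetical portfolio formed at the end of February 2022.
let formation = DateTime(2022, 2, 28)
let holdings = [ "stockA"; "stockB"; "stockC" ]

// Hypothetical returns keyed by (stock, return month); None marks a missing return.
let toyReturns : Map<string * DateTime, float option> =
    Map.ofList
        [ ("stockA", DateTime(2022, 3, 31)), Some 0.04
          ("stockB", DateTime(2022, 3, 31)), Some(-0.02)
          ("stockC", DateTime(2022, 3, 31)), None ]

// Returns are measured in the month after formation: March 2022.
let returnMonth = nextMonthEom formation

// Keep only the stocks that actually have a return that month.
let available =
    holdings
    |> List.choose (fun ticker ->
        match Map.tryFind (ticker, returnMonth) toyReturns with
        | Some ret -> ret   // ret is itself an option: Some r or None
        | None -> None)     // stock absent from the map

// Equal-weighted average over stocks with a return: about 0.01.
let equalWeighted = List.average available

// Zero-filling missing returns and dividing by all holdings: about 0.02 / 3.
let zeroFilled =
    let total =
        holdings
        |> List.sumBy (fun ticker ->
            match Map.tryFind (ticker, returnMonth) toyReturns with
            | Some (Some r) -> r
            | _ -> 0.0)
    total / float holdings.Length
```

Dropping missing returns keeps the weights equal across the stocks whose returns are observed; zero-filling instead treats every missing return as a realized 0% return.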
A quick example of moving an end-of-month date forward by one month:



let endOfFebruary = DateTime(2022,02,28)

let addOneEom (eom: DateTime) =
    DateTime(eom.Year, eom.Month, 1).AddMonths(2).AddDays(-1.0)

addOneEom endOfFebruary

3/31/2022 12:00:00 AM


That will give you the end of March. So in summary, if the signal that you use to form portfolios comes from February 2022 (signal EOM = DateTime(2022,2,28)), make sure that you get returns from March 2022 (return EOM = DateTime(2022,3,31)).

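Going through the first day of the month matters because month lengths differ. The sketch below repeats the `addOneEom` construction under the hypothetical name `nextEom` and checks it across a year boundary, leap and non-leap Februaries, and a 30-day month. It also shows why calling `AddMonths(1)` directly on an end-of-month date is not enough: .NET keeps the day number whenever it is valid in the target month.

```fsharp
open System

// Same construction as addOneEom: first day of the month, forward two months,
// back one day, which lands on the last day of the following month.
let nextEom (eom: DateTime) =
    DateTime(eom.Year, eom.Month, 1).AddMonths(2).AddDays(-1.0)

let formationDates =
    [ DateTime(2021, 12, 31)   // crosses a year boundary
      DateTime(2024, 1, 31)    // next month is a leap-year February
      DateTime(2023, 1, 31)    // next month is a non-leap February
      DateTime(2022, 3, 31) ]  // next month has 30 days

for formed in formationDates do
    printfn "%s -> %s" (formed.ToString("yyyy-MM-dd")) ((nextEom formed).ToString("yyyy-MM-dd"))

// For contrast: AddMonths(1) keeps day 28, giving March 28 rather than March 31.
let naive = DateTime(2022, 2, 28).AddMonths(1)
```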


type SortedPortfolioReturn =
    { Portfolio: int
      MonthOfReturn: DateTime
      AvgReturn: float }

let bottomQuintileReturn: list<SortedPortfolioReturn> =
    bottomQuintile
    |> List.choose (fun port ->
        // Returns are earned in the month after the portfolio is formed.
        let monthOfReturn = addOneEom port.Eom
        // Next-month return of each stock in the portfolio. Stocks that are
        // missing from the return map, or whose Ret is None, are left out.
        let stockReturns =
            port.Stocks
            |> List.choose (fun stock ->
                match Map.tryFind (stock.Id, monthOfReturn) byStockMonthIdAndReturnMap with
                | Some ob -> ob.Ret
                | None -> None)
        // Equal weights on every stock with a return; skip months with none.
        if List.isEmpty stockReturns then
            None
        else
            Some { Portfolio = port.Portfolio
                   MonthOfReturn = monthOfReturn
                   AvgReturn = List.average stockReturns })
    |> List.sortBy (fun portRet -> portRet.MonthOfReturn)

bottomQuintileReturn

> **Task:** Plot a line chart of the cumulative return of the bottom quintile portfolio during the sample. For reference you will find the [plotting returns](https://nhirschey.github.io/Teaching/Momentum-Class.html#Plotting-returns) section of the momentum class lecture useful. It provides an example of calculating a portfolio's cumulative returns using `List.scan`.
> 

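`List.scan` returns the initial state followed by every intermediate state, which makes it a natural fit for a running cumulative return. The toy sketch below uses three hypothetical monthly returns and computes the cumulative return two ways: by compounding gross returns, and by summing log returns and mapping back with `exp`. A running sum of log returns is a cumulative *log* return; it only becomes a cumulative simple return after applying `exp(x) - 1`.

```fsharp
// Hypothetical monthly portfolio returns (decimals, not percentages).
let monthlyReturns = [ 0.10; -0.10; 0.05 ]

// Compound: each step multiplies the running gross return by (1 + r).
// List.scan includes the initial state 1.0, so List.tail drops it.
let cumulative =
    monthlyReturns
    |> List.scan (fun growth r -> growth * (1.0 + r)) 1.0
    |> List.tail
    |> List.map (fun growth -> growth - 1.0)

// Same result via logs: sum log(1 + r), then convert back with exp(x) - 1.
let cumulativeViaLogs =
    monthlyReturns
    |> List.scan (fun acc r -> acc + log (1.0 + r)) 0.0
    |> List.tail
    |> List.map (fun s -> exp s - 1.0)

printfn "%A" cumulative
printfn "%A" cumulativeViaLogs
```

Both lists come out at roughly 0.10, -0.01 and 0.0395 (that is, 1.1 × 0.9 × 1.05 - 1 for the last month), up to floating-point rounding.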


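let cumulativeReturns: list<SortedPortfolioReturn> =
    match bottomQuintileReturn with
    | [] -> []
    | first :: rest ->
        // Log returns add over time: log(1 + r1) + log(1 + r2) + ...
        let logFirst = { first with AvgReturn = log (1.0 + first.AvgReturn) }
        (logFirst, rest)
        ||> List.scan (fun acc next ->
            { next with AvgReturn = acc.AvgReturn + log (1.0 + next.AvgReturn) })
        // Convert each cumulative log return back to a cumulative simple return.
        |> List.map (fun c -> { c with AvgReturn = exp c.AvgReturn - 1.0 })

[for c in cumulativeReturns do
    c.MonthOfReturn, c.AvgReturn]
|> Chart.Line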