# Missing Values


## Load Packages and Extra Functions

In [1]:
using Printf

include("src/printmat.jl");

# NaN and missing

`NaN` (Not-a-Number) can be used to indicate that a floating point number (like 2.1) is missing or otherwise strange. For other types of data (for instance, the integer 2), use `missing` instead.

Most computations involving NaNs/missings give `NaN` or `missing` as a result.

In [2]:
println(2.0 + NaN + missing)

missing
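
As a small illustration of the propagation rule above, the sketch below also shows that `==` is not a reliable test for either value, which is why dedicated test functions are used later on. The variable names are only for illustration.

```julia
# Arithmetic propagates both NaN and missing
x = 2.0 + NaN            # NaN
y = 2.0 + missing        # missing

# Comparisons with == behave differently for the two
a = NaN == NaN           # false: NaN is not equal to anything, including itself
b = missing == missing   # missing: the comparison itself is unknown

# Dedicated test functions give a plain true/false
c = isnan(x)
d = ismissing(y)

println((x, y, a, b, c, d))
```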


## Loading Data

When your data (loaded from a csv file, say) uses a special value to mark missing data points (for instance, `-999.99`), you can simply replace those values with `NaN`. This works since `NaN` is itself a `Float64` value, so it can be stored in an existing array of `Float64`s without changing the array's type. More generally, use `missing`.

(See the tutorial on loading and saving data for more information.)

In [3]:
data = [1.0 -999.99;
        2.0 12.0;
        3.0 13.0]

z  = replace(data,-999.99=>NaN)    #replace -999.99 by NaN or missing
z2 = replace(data,-999.99=>missing)
printblue("z and z2: ")
printmat(z)
printmat(z2)

[34m[1mz and z2: [22m[39m
     1.000       NaN
     2.000    12.000
     3.000    13.000

     1.000   missing
     2.000    12.000
     3.000    13.000
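
The difference between the two replacements shows up in the element type of the result. The sketch below repeats the example with other variable names (`zNaN` and `zMissing` are just illustrative) and inspects `eltype()`.

```julia
data = [1.0 -999.99;
        2.0 12.0;
        3.0 13.0]

zNaN     = replace(data, -999.99 => NaN)       # NaN is a Float64, so the type is unchanged
zMissing = replace(data, -999.99 => missing)   # the element type is widened to allow missing

println("eltype with NaN:     ", eltype(zNaN))
println("eltype with missing: ", eltype(zMissing))
```

The first array remains a plain `Matrix{Float64}`, while the second has a `Union` of `Missing` and `Float64` as its element type.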



# Testing for NaNs/missings in an Array

You can test whether a number is `NaN` or missing by using `isunordered()`. (Use `isnan()` or `ismissing()` if you want to test specifically for one of them.) 

In [4]:
z = [1.0     NaN;
     2.0     12.0;
     missing 13.0;
     4       14   ]

if any(isunordered,z)                  #check if any NaNs/missings
  println("z has some NaNs/missings")  #can also do any(isunordered.(z))
end

printblue("\nThe sum of each column: ")
printmat(sum(z,dims=1))

z has some NaNs/missings

[34m[1mThe sum of each column: [22m[39m
   missing       NaN
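
Beyond a yes/no test, it can be useful to know how many NaNs/missings there are and where they sit. The following is a minimal sketch using `count()`, `sum()` and `findall()` together with `isunordered()` on the same matrix. The variable names are illustrative.

```julia
z = [1.0     NaN;
     2.0     12.0;
     missing 13.0;
     4       14   ]

nTotal  = count(isunordered, z)          # total number of NaNs/missings
nPerCol = sum(isunordered, z, dims=1)    # number of NaNs/missings in each column (1x2 matrix)
where   = findall(isunordered, z)        # positions as CartesianIndex(row, column)

println("total: ", nTotal)
println("per column: ", nPerCol)
println("positions: ", where)
```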



# Disregarding NaNs/missings in a Vector

This can often be done by applying `filter()` to the vector, keeping only the elements that are not NaN/missing.

In [5]:
z1 = z[:,1]             #the first column of `z`

sum(filter(!isunordered,z1))    #finds all elements in z1 that are not unordered, and sums them

7.0
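
An alternative (not used in the cell above) is `skipmissing()`, which iterates over the non-missing elements without creating a new vector. Notice that it only skips `missing`, not `NaN`. The sketch below uses two illustrative vectors with the same values as the columns of `z`.

```julia
col1 = [1.0, 2.0, missing, 4.0]     # same values as the first column of z
col2 = [NaN, 12.0, 13.0, 14.0]      # same values as the second column of z

s1 = sum(skipmissing(col1))         # skips the missing element
s2 = sum(skipmissing(col2))         # still NaN: NaN is not a missing
s3 = sum(filter(!isnan, col2))      # filter out the NaN instead

println((s1, s2, s3))
```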

# Getting Rid of Matrix Rows with any NaNs/missings

It is a common procedure in statistics to throw out all cases with NaNs/missing values. For instance, let `z` be a matrix and `z[t,:]` the data for period $t$  which contains one or more `NaN/missing` values. It is then common (for instance, in linear regressions) to throw out that entire row of the matrix.

This is a reasonable approach if it can be argued that the data is missing at random, that is, for reasons unrelated to the subject of the investigation. It is much less reasonable if, for instance, the returns for all poorly performing mutual funds are listed as "missing" and you want to study which fund characteristics drive performance.

The code below shows a simple way to throw out all rows of a matrix with at least one `NaN/missing`.

For statistical computations, you may also consider the [NaNStatistics.jl](https://github.com/brenhinkeller/NaNStatistics.jl) package. 

In [6]:
printblue("z:")
printmat(z)

vb = any(isunordered,z,dims=2) #indicates rows with NaNs/missings, gives a Tx1 matrix
vc = .!vec(vb)                 #indicates rows without NaNs/missings, vec to make a vector

z2 = z[vc,:]           #keep only rows without NaNs/missings
printblue("z2: a new matrix where all rows with any NaNs/missings have been pruned:")
printmat(z2)

[34m[1mz:[22m[39m
     1.000       NaN
     2.000    12.000
   missing    13.000
     4.000    14.000

[34m[1mz2: a new matrix where all rows with any NaNs/missings have been pruned:[22m[39m
     2.000    12.000
     4.000    14.000
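
If the same pruning is needed in several places, the steps can be wrapped in a small function. The sketch below is one possible version: the name `drop_unordered_rows` is hypothetical (it is not part of any package), and it also returns the indices of the rows that were kept, which helps when other arrays must be pruned in the same way.

```julia
# Hypothetical helper: keep only the rows of x without any NaNs/missings.
# Returns the pruned matrix and the indices of the rows that were kept.
function drop_unordered_rows(x::AbstractMatrix)
    keep = .!vec(any(isunordered, x, dims=2))   #true for rows without NaNs/missings
    return x[keep, :], findall(keep)
end

z = [1.0     NaN;
     2.0     12.0;
     missing 13.0;
     4       14   ]

zClean, rows = drop_unordered_rows(z)
println("rows kept: ", rows)
println(zClean)
```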



## Converting to a Standard Array

Once you have pruned all rows with missings, you may want to convert the matrix to, for instance, a `Matrix{Float64}`. This might simplify some of the later code. Notice that if there were no missings (just NaNs), then no conversion is needed, since `NaN` is already a `Float64`.

In [7]:
println("The type of z2 is ", typeof(z2))

z3 = convert.(Float64,z2)            #could also use `disallowmissing()` from the `Missings.jl` package
println("\nThe type of z3 is ", typeof(z3))

The type of z2 is Matrix{Union{Missing, Float64}}

The type of z3 is Matrix{Float64}
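
The conversion only works once all missings are gone. The sketch below, with illustrative variable names, shows that `convert.(Float64,...)` throws an error when a `missing` is still present. It also shows `identity.()`, another way to narrow the element type that needs no package: broadcasting picks the narrowest element type that fits the actual values.

```julia
# A matrix without missings, but whose element type still allows them
zPruned = Union{Missing,Float64}[2.0 12.0; 4.0 14.0]

zNarrow = identity.(zPruned)         #broadcasting narrows the element type
println("The type of zNarrow is ", typeof(zNarrow))

# Converting while a missing is still present fails
zBad = [1.0, missing]
failed = try
    convert.(Float64, zBad)
    false
catch err
    err isa MethodError              #missing cannot be converted to Float64
end
println("conversion with a missing failed: ", failed)
```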
