# United States Housing Starts m/m

Housing Starts m/m reflects the change in the number of new residential construction projects started in the reported month, compared to the previous month.
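As a minimal sketch of the arithmetic, assuming the indicator is expressed as a percentage change (the description does not state the unit), the month-over-month figure can be computed as below. The function name `mom_change` and the input figures are hypothetical and only illustrate the calculation.

```julia
# Month-over-month change in percent.
# Assumption: the indicator is a percentage change between two consecutive months.
mom_change(previous, current) = (current - previous) / previous * 100

# Illustrative figures only, not real data.
starts_previous = 1_200   # housing starts in the previous month
starts_current  = 1_260   # housing starts in the reported month

println(mom_change(starts_previous, starts_current))
```

A positive result means more projects were started than in the previous month; a negative result means fewer.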

Data for the indicator are obtained from a survey of developers and regulatory authorities. The sample covers about 95% of all residential construction projects in the country. The data are adjusted for weather and seasonal variations. The indicator is published in the second ten days of each month (roughly the 11th to the 20th), as part of the general national construction report. The report also includes construction permits and completed projects.

The volume of new housing starts is rarely interpreted in absolute terms: construction depends heavily on weather conditions, regional location, and the time of year. For this reason, analysts usually track the change in the indicator over several months. When interpreting the indicator, economists pay attention to the following reference points.

- An increase in demand for new homes points to growth in the welfare of the population.
- An increase in new housing construction leads to higher employment in the construction industry.
- An increase in demand for new homes may lead to increased demand for other products that new home buyers need, such as furniture and appliances. This may spur consumer activity and affect price indices.
- Growth in the indicator may lead to growth in the real estate market.

Taking the above points into account, higher housing starts readings may have a positive impact on US dollar quotes.
                        
source:
- https://www.mql5.com/en/economic-calendar/united-states/housing-starts-mm

<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Set-Up" data-toc-modified-id="Set-Up-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Set Up</a></span></li><li><span><a href="#Read-Data" data-toc-modified-id="Read-Data-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Read Data</a></span><ul class="toc-item"><li><span><a href="#Sample-the-data" data-toc-modified-id="Sample-the-data-2.1"><span class="toc-item-num">2.1&nbsp;&nbsp;</span>Sample the data</a></span></li></ul></li><li><span><a href="#Choose-the-KEY-value-column" data-toc-modified-id="Choose-the-KEY-value-column-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Choose the KEY value column</a></span><ul class="toc-item"><li><span><a href="#Draw-Plots-of-Original-Data" data-toc-modified-id="Draw-Plots-of-Original-Data-3.1"><span class="toc-item-num">3.1&nbsp;&nbsp;</span>Draw Plots of Original Data</a></span></li></ul></li><li><span><a href="#Extract-right-columns" data-toc-modified-id="Extract-right-columns-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Extract right columns</a></span></li><li><span><a href="#Rata-Die" data-toc-modified-id="Rata-Die-5"><span class="toc-item-num">5&nbsp;&nbsp;</span>Rata Die</a></span></li><li><span><a href="#Quantize-the-values" data-toc-modified-id="Quantize-the-values-6"><span class="toc-item-num">6&nbsp;&nbsp;</span>Quantize the values</a></span></li><li><span><a href="#Fill-in-Empty-Dates" data-toc-modified-id="Fill-in-Empty-Dates-7"><span class="toc-item-num">7&nbsp;&nbsp;</span>Fill in Empty Dates</a></span><ul class="toc-item"><li><span><a href="#Fill:-Sort" data-toc-modified-id="Fill:-Sort-7.1"><span class="toc-item-num">7.1&nbsp;&nbsp;</span>Fill: Sort</a></span></li><li><span><a href="#Fill:-Draw-Plots" data-toc-modified-id="Fill:-Draw-Plots-7.2"><span class="toc-item-num">7.2&nbsp;&nbsp;</span>Fill: Draw Plots</a></span></li></ul></li><li><span><a href="#Averages" data-toc-modified-id="Averages-8"><span 
class="toc-item-num">8&nbsp;&nbsp;</span>Averages</a></span><ul class="toc-item"><li><span><a href="#Averages:-Draw-Plots" data-toc-modified-id="Averages:-Draw-Plots-8.1"><span class="toc-item-num">8.1&nbsp;&nbsp;</span>Averages: Draw Plots</a></span></li><li><span><a href="#Insert-averages-to-DataFrame" data-toc-modified-id="Insert-averages-to-DataFrame-8.2"><span class="toc-item-num">8.2&nbsp;&nbsp;</span>Insert averages to DataFrame</a></span></li><li><span><a href="#Position:-Draw-Plots" data-toc-modified-id="Position:-Draw-Plots-8.3"><span class="toc-item-num">8.3&nbsp;&nbsp;</span>Position: Draw Plots</a></span></li></ul></li><li><span><a href="#Save-DataFrame-to-CSV-file" data-toc-modified-id="Save-DataFrame-to-CSV-file-9"><span class="toc-item-num">9&nbsp;&nbsp;</span>Save DataFrame to CSV file</a></span><ul class="toc-item"><li><span><a href="#Save:-Describe-before-saving" data-toc-modified-id="Save:-Describe-before-saving-9.1"><span class="toc-item-num">9.1&nbsp;&nbsp;</span>Save: Describe before saving</a></span></li><li><span><a href="#Write-as-CSV-file" data-toc-modified-id="Write-as-CSV-file-9.2"><span class="toc-item-num">9.2&nbsp;&nbsp;</span>Write as CSV file</a></span></li></ul></li></ul></div>

## Set Up

In [6]:
dataset_file_name = "united-states.housing-starts-mm.csv"
path_data_original = "../Data/original/"
date_original_format = "yyyy.mm.dd"
original_value_column = 3
last_position_change = "2020-09-30"

include("../Julia/functions.jl") 
println()





## Read Data

In [7]:
## show available datasets
#data = available_datasets() # uncomment to see all available datasets

# Read DataFrame from the CSV file.
df = fetch_dataset(dataset_file_name, date_original_format , path_data_original )

println()

Fetched and sorted by date ../Data/original/united-states.housing-starts-mm.csv, record count 59
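
`fetch_dataset` is defined in `functions.jl`, which is not shown here. As a rough sketch of the step it reports (parsing dates in the `yyyy.mm.dd` layout and sorting by date), and without assuming anything about its actual implementation, the same idea can be expressed with the `Dates` standard library. The sample strings and readings are invented for illustration.

```julia
using Dates

# Date strings in the "yyyy.mm.dd" layout named by `date_original_format`.
raw_dates = ["2016.02.17", "2016.01.12", "2016.03.16"]
readings  = [5.2, -2.5, -8.8]   # illustrative values only, not real data

# Parse each string into a Date object.
fmt   = DateFormat("yyyy.mm.dd")
dates = [Date(s, fmt) for s in raw_dates]

# Sort both vectors by date, oldest first.
order    = sortperm(dates)
dates    = dates[order]
readings = readings[order]

println(dates[1], " ", readings[1])
```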



### Sample the data

In [9]:
using Statistics
describe(df)

| | variable | mean | min | median | max | nunique | nmissing | eltype |
|---|---|---|---|---|---|---|---|---|
| | Symbol | Union… | Any | Union… | Any | Union… | Union… | Type |
| 1 | Date | | 2016-01-12 | | 2020-11-18 | 59 | | Date |
| 2 | ActualValue | 0.261017 | -30.2 | -0.3 | 25.5 | | | Float64 |
| 3 | ForecastValue | 6.60698 | -30.7 | 1.1 | 245.5 | | 16 | Union{Missing, Float64} |
| 4 | PreviousValue | 0.406897 | -26.4 | -0.35 | 27.4 | | 1 | Union{Missing, Float64} |


## Choose the KEY value column
- look at the Plot below
- go back to the "Set Up" section at the top and choose which column should be the original value (`original_value_column`)
- re-run the notebook

### Draw Plots of Original Data

In [11]:
using Plots

columns = print_colunms(df)
record_count = size(df)[1]
rows = 1:record_count
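# Column 1 holds the dates; column 2 is ActualValue. Passing column 2 to
# format_dates is a likely cause of the UndefRefError recorded below.
dates = format_dates( df[rows,1] , "m/d/yy")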

gr()
plot(          dates, # x-axis: dates
               [  df[rows,2]    ], # y-axis
    label    = [  columns[2]   "" ]  ,
    legend   =:topleft, 
              # :right, :left, :top, :bottom, :inside, :best, :legend, :topright, :topleft, :bottomleft, :bottomright
    xlabel   = "time",
    ylabel   = "indicators",
    size     = (980, 400), # width, height
    layout = (1, 1) # number of graphs: vertically, horizontally
    )

1 Date
2 ActualValue
3 ForecastValue
4 PreviousValue


UndefRefError: UndefRefError: access to undefined reference

## Extract right columns

In [None]:
using DataFrames
df = DataFrame( Day      = df[:,1],                     # 1 
                Date     = df[:,2],                     # 2 
                Value    = df[:,original_value_column], # 3 
                Original = df[:,original_value_column]  # 4 
               )

columns = preview_data(df)
println()

## Rata Die

In [None]:
record_count = size(df)[1]
col_ind = 1
insertcols!(df, col_ind, :Rata_Die => zeros(Int64, record_count); makeunique = true )

update_rata_die!(df, 1, 2)
#first(df, 6)
println("Inserted Rata Die")
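
Rata Die is a plain day count in which 0001-01-01 is day 1, so dates become integers that are easy to subtract and compare. `update_rata_die!` comes from `functions.jl` and is not shown; the sketch below only illustrates the conversion itself using the `Dates` standard library, with two arbitrary example dates.

```julia
using Dates

# Two arbitrary example dates.
d1 = Date(2016, 1, 12)
d2 = Date(2016, 2, 17)

# Convert each date to its Rata Die day number (0001-01-01 is day 1).
rd1 = Dates.datetime2rata(d1)
rd2 = Dates.datetime2rata(d2)

# Day numbers are integers, so the gap between dates is a subtraction.
println(rd2 - rd1)

# The conversion is reversible.
println(Date(Dates.rata2datetime(rd1)))
```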

## Quantize the values

- Quantization is a process of normalizing the data to a fixed range of discrete values
- I have decided to normalize the data to the Int8 range, as I might try to use the Google Coral NPU
- minimum = -128.0
- maximum = 127.0
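
One common way to do this is a linear min-max mapping onto the Int8 range. The sketch below assumes that approach; `quantize_column!` in `functions.jl` is not shown and may work differently. The helper name `quantize_int8` is hypothetical, and the sample values are the minimum, median and maximum of `ActualValue` from the `describe(df)` output earlier.

```julia
# Linearly map a vector of numbers onto the Int8 range [-128, 127].
# Hypothetical helper; not the notebook's own quantize_column!.
function quantize_int8(xs::AbstractVector{<:Real})
    lo, hi = extrema(xs)
    # A constant column has no spread to scale; map it to zeros.
    hi == lo && return zeros(Int8, length(xs))
    scaled = (xs .- lo) ./ (hi - lo) .* 255 .- 128
    # Clamp guards against tiny floating-point overshoot before rounding.
    return round.(Int8, clamp.(scaled, -128, 127))
end

sample = [-30.2, -0.3, 25.5]
println(quantize_int8(sample))
```

With this mapping the smallest input becomes -128 and the largest becomes 127, and the ordering of the values is preserved.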

In [None]:
data_original = df[:, 4] # keep original for display comparison later

quantize_column!(df, 3)

columns = preview_data(df, 12)
println()

In [None]:
using Plots

count = size(df)[1]
rows = 1:count
dates = format_dates( df[rows,2] , "m/d/yy")

gr()
plot(          dates, # x-axis: dates
               [  df[rows,original_value_column]    ], # y-axis
    label    = [  columns[original_value_column] ""   ]  ,
    legend   =:topleft, 
              # :right, :left, :top, :bottom, :inside, :best, :legend, :topright, :topleft, :bottomleft, :bottomright
    xlabel   = "time",
    ylabel   = "indicators",
    size     = (980, 400), # width, height
    layout = (1, 1) # number of graphs: vertically, horizontally
    )

## Fill in Empty Dates

In [None]:
populate_missing_dates!(df)
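
`populate_missing_dates!` is defined in `functions.jl` and is not shown. A minimal sketch of one way to fill the gaps, assuming the last known value is carried forward to every day until the next release, could look like this. The helper name `fill_missing_days` and the sample data are hypothetical.

```julia
using Dates

# Expand sparse (date, value) observations to one entry per calendar day,
# carrying the last known value forward.
# Assumes `dates` is sorted in ascending order and is not empty.
function fill_missing_days(dates::Vector{Date}, vals::Vector{Float64})
    all_days = collect(dates[1]:Day(1):dates[end])
    filled   = Vector{Float64}(undef, length(all_days))
    j = 1  # index of the most recent observation at or before the current day
    for (i, day) in enumerate(all_days)
        while j < length(dates) && dates[j + 1] <= day
            j += 1
        end
        filled[i] = vals[j]
    end
    return all_days, filled
end

# Illustrative data only.
days, vals = fill_missing_days([Date(2016, 1, 12), Date(2016, 1, 15)], [-2.5, 5.2])
println(collect(zip(days, vals)))
```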

### Fill: Sort

In [None]:
df = sort(df, [:Day]);
count = size(df)[1]
first(df, 8)
# columns = preview_data(df)
# println()

### Fill: Draw Plots
- if the indicator is updated only periodically (bi-weekly, monthly, quarterly), the graph will appear blocky

In [None]:
using Plots
count = size(df)[1]
rows = 1:count
dates = format_dates( df[rows,2] , "m/d/yy")

gr()
plot(          dates, # x-axis: dates
               [ df[rows,original_value_column]    ], # y-axis
    label    = [ columns[original_value_column]  ""],
    legend   =:topleft, 
              # :right, :left, :top, :bottom, :inside, :best, :legend, :topright, :topleft, :bottomleft, :bottomright
    xlabel   = "time",
    ylabel   = "indicators",
    size     = (980, 400), # width, height
    layout = (1, 1) # number of graphs: vertically, horizontally
    )

## Averages

In [None]:
column_to_average = 3
averages005 = calculate_average(df, 5,   column_to_average )
averages030 = calculate_average(df, 30,  column_to_average )
averages090 = calculate_average(df, 90,  column_to_average )
averages180 = calculate_average(df, 180, column_to_average )
averages365 = calculate_average(df, 365, column_to_average )
println()
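
`calculate_average` comes from `functions.jl` and is not shown. The sketch below assumes it computes a trailing moving average over the given number of days; the helper name `trailing_average` is hypothetical and the input is a toy series.

```julia
# Trailing moving average: element i is the mean of the last `window`
# values up to and including i (fewer values at the start of the series).
function trailing_average(xs::Vector{<:Real}, window::Integer)
    out = Vector{Float64}(undef, length(xs))
    running = 0.0
    for i in 1:length(xs)
        running += xs[i]
        # Drop the value that has just left the window.
        i > window && (running -= xs[i - window])
        out[i] = running / min(i, window)
    end
    return out
end

println(trailing_average([1, 2, 3, 4, 5], 3))
```

A longer window smooths the series more strongly but reacts more slowly to new releases, which is why several window lengths are computed side by side.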

### Averages: Draw Plots

In [None]:
using Plots

columns = names(df)
count = size(df)[1]
days_back = 365*1
rows = 1:count # count-days_back:count
dates = format_dates( df[rows,2] , "m/d/yy")

gr()
plot(          dates, # x-axis: dates
               [ df[rows,3]      ], # y-axis
    label    = [ columns[3]    ""],
    legend   =:topleft, 
              # :right, :left, :top, :bottom, :inside, :best, :legend, :topright, :topleft, :bottomleft, :bottomright
    xlabel   = "time",
    ylabel   = "indicators",
    size     = (980, 400), # width, height
    layout = (1, 1) # number of graphs: vertically, horizontally
    )

### Insert averages to DataFrame

- if the data are released, for example, every 30 days, averages over windows shorter than 30 days do not add value

In [None]:
#insertcols!(df, 5,  :Avg005   => averages005  , makeunique=true)
#insertcols!(df, 6,  :Avg030   => averages030  , makeunique=true)
insertcols!(df,  5,  :Avg090   => averages090  , makeunique=true)
insertcols!(df,  6,  :Avg180   => averages180  , makeunique=true)
insertcols!(df,  7,  :Avg365   => averages365  , makeunique=true)

using Statistics
describe(df)

### Position: Draw Plots

In [None]:
using Plots
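columns = names(df)  # refresh the names so the inserted average columns are included
count = size(df)[1]
days_back = 365
rows = 1:count
dates = format_dates( df[rows,2] , "m/d/yy")

gr()
plot(   dates, # x-axis: dates
        # NOTE: position_column is not defined in the cells shown in this notebook
        [  position_column df[rows,3] df[rows,5] df[rows,6] df[rows,7]   ], # y-axis
    label    = 
        [ "position"       columns[3] columns[5] columns[6] columns[7] ], # one label per series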
    legend   =:topleft, 
              # :right, :left, :top, :bottom, :inside, :best, :legend, :topright, :topleft, :bottomleft, :bottomright
    xlabel   = "time",
    ylabel   = "indicators",
    size     = (980, 400), # width, height
    layout = (1, 1) # number of graphs: vertically, horizontally
    )

[back to top](#Table-of-Contents)
<hr/>

## Save DataFrame to CSV file

### Save: Describe before saving

In [None]:
using Statistics
describe(df)

### Write as CSV file

In [None]:
save_dataset(df, dataset_file_name, "../Data/processed/" );
println("US_Housing_Starts_mm finished and saved to ", dataset_file_name)