# Map Reduce vs. For Loops in Julia
In this notebook I explore different language features and test the performance of different approaches. When I use data, it will likely come from the FBI's National Incident-Based Reporting System (NIBRS), a data set I've become familiar with. It is very interesting data too!

## For loops vs. map & reduce in Julia
I was curious about the performance difference between map & reduce and for loops. My example comes from the FBI Crime Data Explorer (CDE) data, where one has multiple years of data for any particular state and all of the downloaded ZIP files contain the same file names. That name collision is easy to work around: download each year's data into its own folder and load all of the CSV files into a JuliaDB table, which is effectively a union of all the CSV files for any given table. If you want more information about FBI CDE data you can [check them out here](https://crime-data-explorer.fr.cloud.gov/).

In this test example I've got a list of unique file names and a range of roughly 20 years that I'll be downloading data for. I want to combine both lists into a single list that is effectively the equivalent of a `cross join` in SQL: every year paired with every file name. I'm also very curious about the performance difference between various methods in Julia.
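
To make the "every year paired with every file name" idea concrete, here is a minimal sketch using `Iterators.product` from Base. The names `demo_years`, `demo_files`, `demo_combos` and `demo_paths` are hypothetical and use deliberately tiny inputs; they are not part of the notebook's actual data.

```julia
# Hypothetical small inputs, just to show the shape of the result.
demo_years = 2000:2001
demo_files = ["f1.csv", "f2.csv"]

# Iterators.product yields every (year, file) tuple; the first iterator varies fastest.
demo_combos = vec(collect(Iterators.product(demo_years, demo_files)))

# One path per pair, so length(demo_years) * length(demo_files) entries in total.
demo_paths = [string(year, "/", file) for (year, file) in demo_combos]
```

With two years and two files this gives four paths, which is the same pairing a SQL cross join would produce.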

In [74]:
versioninfo()

Julia Version 1.5.0
Commit 96786e22cc (2020-08-01 23:44 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin18.7.0)
  CPU: Intel(R) Core(TM) i7-5557U CPU @ 3.10GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-9.0.1 (ORCJIT, broadwell)


In [70]:
years = [2000:2020;]
files = ["f1.csv", "f2.csv"]

2-element Array{String,1}:
 "f1.csv"
 "f2.csv"

Just calling `map` twice gives me all the combinations I want, but I want them in a single array so that it can be
fed to **JuliaDB** via the `loadtable` function.

In [71]:
map(file -> 
        map(year -> string(year, "/", file), years)
        , files)

2-element Array{Array{String,1},1}:
 ["2000/f1.csv", "2001/f1.csv", "2002/f1.csv", "2003/f1.csv", "2004/f1.csv", "2005/f1.csv", "2006/f1.csv", "2007/f1.csv", "2008/f1.csv", "2009/f1.csv"  …  "2011/f1.csv", "2012/f1.csv", "2013/f1.csv", "2014/f1.csv", "2015/f1.csv", "2016/f1.csv", "2017/f1.csv", "2018/f1.csv", "2019/f1.csv", "2020/f1.csv"]
 ["2000/f2.csv", "2001/f2.csv", "2002/f2.csv", "2003/f2.csv", "2004/f2.csv", "2005/f2.csv", "2006/f2.csv", "2007/f2.csv", "2008/f2.csv", "2009/f2.csv"  …  "2011/f2.csv", "2012/f2.csv", "2013/f2.csv", "2014/f2.csv", "2015/f2.csv", "2016/f2.csv", "2017/f2.csv", "2018/f2.csv", "2019/f2.csv", "2020/f2.csv"]

What I need to do is reduce this array of arrays into a single array.

In [72]:
reduce(append!,
    map(file -> 
        map(year -> string(year, "/", file), years)
        , files)
    )

42-element Array{String,1}:
 "2000/f1.csv"
 "2001/f1.csv"
 "2002/f1.csv"
 "2003/f1.csv"
 "2004/f1.csv"
 "2005/f1.csv"
 "2006/f1.csv"
 "2007/f1.csv"
 "2008/f1.csv"
 "2009/f1.csv"
 "2010/f1.csv"
 "2011/f1.csv"
 "2012/f1.csv"
 ⋮
 "2009/f2.csv"
 "2010/f2.csv"
 "2011/f2.csv"
 "2012/f2.csv"
 "2013/f2.csv"
 "2014/f2.csv"
 "2015/f2.csv"
 "2016/f2.csv"
 "2017/f2.csv"
 "2018/f2.csv"
 "2019/f2.csv"
 "2020/f2.csv"

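For comparison, here is a sketch of a few other Base-only ways to get the same flat array from the nested `map` result. The variable names (`nested`, `via_vcat`, `via_flatten`, `via_comprehension`) are mine, chosen for illustration. One thing to keep in mind with `reduce(append!, ...)` is that `append!` mutates its first argument, so the first inner array is extended in place; that is harmless here because the inner arrays are freshly created by `map`.

```julia
years = collect(2000:2020)
files = ["f1.csv", "f2.csv"]

# The nested result: one inner array of paths per file.
nested = map(file -> map(year -> string(year, "/", file), years), files)

# Concatenate the inner arrays without mutating any of them.
via_vcat = reduce(vcat, nested)

# Lazily walk the inner arrays, then materialize into a Vector.
via_flatten = collect(Iterators.flatten(nested))

# Skip the nested intermediate entirely: the outer `for` is the slow-varying one.
via_comprehension = [string(year, "/", file) for file in files for year in years]
```

All three keep the same file-major ordering as the `reduce(append!, ...)` result above.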
### Now let's compare the performance of map, map, reduce to the nested for loops needed to accomplish the same thing
I was definitely surprised at how much faster, and how consistently faster, the `map`/`reduce` solution is compared to the `for` loop solution as written here: a median of 11.929 μs versus 28.345 μs in the runs below.

In [67]:
using BenchmarkTools
@benchmark reduce(append!,
                map(file -> 
                        map(year -> string(year, "/", file), years)
                    , files)
            )

BenchmarkTools.Trial: 
  memory estimate:  12.25 KiB
  allocs estimate:  263
  --------------
  minimum time:     10.789 μs (0.00% GC)
  median time:      11.929 μs (0.00% GC)
  mean time:        12.300 μs (0.00% GC)
  maximum time:     371.833 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1

In [68]:
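# Note: `combos` is an untyped (Vector{Any}) global that is not reset between
# samples, so it keeps growing for the whole benchmark run.
combos = []
@benchmark for year in years
                for file in files
                    # Note: append! iterates its second argument, so this adds the
                    # characters of the string one by one; push! would add the string itself.
                    append!(combos, string(year, "/", file))
                end
            end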

BenchmarkTools.Trial: 
  memory estimate:  12.80 KiB
  allocs estimate:  294
  --------------
  minimum time:     25.565 μs (0.00% GC)
  median time:      28.345 μs (0.00% GC)
  mean time:        69.208 μs (0.00% GC)
  maximum time:     400.271 ms (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1
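
A like-for-like comparison probably needs the loop to build the same result as the `map`/`reduce` version. The following is a minimal sketch under my own assumptions, not the benchmark that produced the numbers above: the function names `combos_loop` and `combos_mapreduce` are hypothetical, the loop uses `push!` into a typed `String[]`, and both versions are wrapped in functions so they do not depend on global variables.

```julia
# Loop version: typed vector, push! adds one whole string per iteration.
# File is the outer loop here so the ordering matches the map/reduce result.
function combos_loop(years, files)
    combos = String[]
    sizehint!(combos, length(years) * length(files))  # optional: reserve space up front
    for file in files
        for year in years
            push!(combos, string(year, "/", file))
        end
    end
    return combos
end

# map/reduce version, same logic as the earlier cell.
combos_mapreduce(years, files) =
    reduce(append!, map(file -> map(year -> string(year, "/", file), years), files))

years = collect(2000:2020)
files = ["f1.csv", "f2.csv"]

# Both should produce the same 42 paths in the same order.
@assert combos_loop(years, files) == combos_mapreduce(years, files)
```

Each function can then be timed with BenchmarkTools, for example `@benchmark combos_loop($years, $files)` and `@benchmark combos_mapreduce($years, $files)`; the `$` interpolation keeps global-variable lookups out of the measurement.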