## Data
Being able to load and process data easily makes any data science task more pleasant. In this notebook, we will cover the most common file types encountered in data science tasks, and we will use this data throughout the rest of this tutorial.

In [3]:
using BenchmarkTools
using DataFrames
using DelimitedFiles
using CSV
using XLSX
using Downloads

# 🗃️ Get some data
In Julia, it's easy to download a file from the web with the `download` function from the `Downloads` standard library. Alternatively, you can use your favorite command-line tool: typing `;` at the Julia REPL switches to shell mode. Let's try both.

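Note: in Julia versions before 1.6, `download` relied on external tools such as curl, wget or fetch, so you needed one of them installed. Since Julia 1.6, `Downloads.download` uses libcurl internally and needs no external tool.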

In [2]:
P = Downloads.download("https://raw.githubusercontent.com/nassarhuda/easy_data/master/programming_languages.csv",
    "programming_languages.csv")

"programming_languages.csv"

Another way is to get the same file with a shell command such as `curl` or `wget` from shell mode.

# 📂 Read your data from text files.
The goal here is to load data from files such as `csv` files, `xlsx` files, or plain text files. We will go over some Julia packages that make reading such files easy.

Let's start with the package `DelimitedFiles` which is in the standard library.

In [2]:
#=
readdlm(source, 
    delim::AbstractChar, 
    T::Type, 
    eol::AbstractChar; 
    header=false, 
    skipstart=0, 
    skipblanks=true, 
    use_mmap, 
    quotes=true, 
    dims, 
    comments=false, 
    comment_char='#')
=#
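using DelimitedFiles
# P holds the data rows and H the header row (a 1×2 matrix of column names)
P, H = readdlm("programming_languages.csv", ','; header=true);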
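As a minimal sketch of what `header=true` changes, the snippet below feeds `readdlm` an in-memory `IOBuffer` instead of a file; the two-row CSV text is a hypothetical stand-in for `programming_languages.csv`. With `header=true`, the call returns a `(data, header)` tuple, which is why the cell above destructures it into `P, H`.

```julia
using DelimitedFiles

# Hypothetical two-row stand-in for programming_languages.csv, held in memory
io = IOBuffer("year,language\n1951,Regional Assembly Language\n2012,Julia\n")

# With header=true, readdlm returns a (data, header) tuple instead of a single matrix
data, header = readdlm(io, ','; header=true)

println(header)  # the column names, as a 1×2 matrix
println(data)    # the remaining rows; numbers and strings are mixed, so the eltype is Any
```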

In [3]:
dictionary1 = Dict("a"=>5, "b"=>3)
DF1 = DataFrame(H[1] => P[:,1],
                H[2] => P[:,2])
DF1[1:10,:]
DF2 = DataFrame(dictionary1)

| Row | a (Int64) | b (Int64) |
|---|---|---|
| 1 | 5 | 3 |


In [4]:
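# To write to a text file, use writedlm (here with '-' as the delimiter):
writedlm("programminglanguages_dlm.txt", P, '-')
# readdlm without a delimiter splits on whitespace, so the '-'-delimited file comes back
# mangled (73×3, e.g. "1951-Regional" "Assembly" "Language"); pass '-' to read it back correctly
KK = readdlm("programminglanguages_dlm.txt")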


73×3 Matrix{Any}:
 "1951-Regional"          "Assembly"  "Language"
 "1952-Autocode"          ""          ""
 "1954-IPL"               ""          ""
 "1955-\"FLOW-MATIC\""    ""          ""
 "1957-FORTRAN"           ""          ""
 "1957-COMTRAN"           ""          ""
 "1958-LISP"              ""          ""
 "1958-ALGOL"           58            ""
 "1959-FACT"              ""          ""
 "1959-COBOL"             ""          ""
 "1959-RPG"               ""          ""
 "1962-APL"               ""          ""
 "1962-Simula"            ""          ""
 ⋮                                    
 "2003-Scala"             ""          ""
 "2005-F#"                ""          ""
 "2006-PowerShell"        ""          ""
 "2007-Clojure"           ""          ""
 "2009-Go"                ""          ""
 "2010-Rust"              ""          ""
 "2011-Dart"              ""          ""
 "2011-Kotlin"            ""          ""
 "2011-Red"               ""          ""
 "2011-Elixir"            ""     

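The mangled result above comes from reading with a different delimiter than the one used for writing. A minimal round-trip sketch, assuming a small hypothetical matrix `M` in the same year/language layout as `P` and a scratch file from `tempname()`, reads the file back with the same `'-'` delimiter:

```julia
using DelimitedFiles

# Hypothetical three-row stand-in for P (same year/language layout)
M = Any[1951 "Regional Assembly Language"; 1955 "FLOW-MATIC"; 2012 "Julia"]

path = tempname() * ".txt"
writedlm(path, M, '-')     # "FLOW-MATIC" contains the delimiter, so it is written in quotes
back = readdlm(path, '-')  # read with the same delimiter; quoted fields are unquoted again

println(back)
```

Because `writedlm` quotes fields that contain the delimiter, the default `quotes=true` in `readdlm` restores them intact.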
A more powerful package to use here is the `CSV` package. Called as `CSV.read(file, DataFrame)`, it imports the data directly into a DataFrame, which has several advantages, as we will see below.

In general, [`CSV.jl`](https://juliadata.github.io/CSV.jl/stable/) is the recommended way to load CSVs in Julia. Only use `DelimitedFiles` when you have a more complicated file and want to control several parsing options yourself.

In [5]:
C = CSV.read("programming_languages.csv", DataFrame);
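To see what the `DataFrame` sink buys you without touching the downloaded file, here is a small sketch that parses hypothetical CSV text from an `IOBuffer`. Each column gets its own element type, in contrast to the single `Matrix{Any}` that `readdlm` returns.

```julia
using CSV, DataFrames

# Hypothetical in-memory CSV text in the same layout as programming_languages.csv
csv_text = """
year,language
1951,Regional Assembly Language
2012,Julia
"""

# CSV.read takes a "sink" as its second argument; passing DataFrame builds a DataFrame directly
df = CSV.read(IOBuffer(csv_text), DataFrame)

# One element type per column, e.g. Int64 for year
println([eltype(col) for col in eachcol(df)])
```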

In [7]:
@show typeof(C)
C[1:10,:]
display(C.language)  # equivalent to C[!, :language]

typeof(C) = DataFrame


73-element Vector{String31}:
 "Regional Assembly Language"
 "Autocode"
 "IPL"
 "FLOW-MATIC"
 "FORTRAN"
 "COMTRAN"
 "LISP"
 "ALGOL 58"
 "FACT"
 "COBOL"
 "RPG"
 "APL"
 "Simula"
 ⋮
 "Scala"
 "F#"
 "PowerShell"
 "Clojure"
 "Go"
 "Rust"
 "Dart"
 "Kotlin"
 "Red"
 "Elixir"
 "Julia"
 "Swift"

In [8]:
@show typeof(P)
P[1:10,:]

typeof(P) = Matrix{Any}


10×2 Matrix{Any}:
 1951  "Regional Assembly Language"
 1952  "Autocode"
 1954  "IPL"
 1955  "FLOW-MATIC"
 1957  "FORTRAN"
 1957  "COMTRAN"
 1958  "LISP"
 1958  "ALGOL 58"
 1959  "FACT"
 1959  "COBOL"

In [9]:
names(C)
names(DF2)

2-element Vector{String}:
 "a"
 "b"

In [10]:
names(C)
C.year
C.language
describe(C)

| Row | variable (Symbol) | mean (Union…) | min (Any) | median (Union…) | max (Any) | nmissing (Int64) | eltype (DataType) |
|---|---|---|---|---|---|---|---|
| 1 | year | 1982.99 | 1951 | 1986.0 | 2014 | 0 | Int64 |
| 2 | language | | ALGOL 58 | | dBase III | 0 | String31 |
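`describe` can also be restricted to particular statistics by passing their names. A sketch on a hypothetical toy frame with one missing year; `mean` skips the missing value, and it is `nothing` for the non-numeric column:

```julia
using DataFrames

# Hypothetical toy frame with one missing year
df = DataFrame(year = [1951, 1952, missing],
               language = ["Regional Assembly Language", "Autocode", "IPL"])

# Ask describe for just the statistics you care about (here: mean and number of missings)
d = describe(df, :mean, :nmissing)
```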


In [11]:
using BenchmarkTools
@btime readdlm("programming_languages.csv", ','; header=true);
@btime CSV.read("programming_languages.csv", DataFrame);

  360.700 μs (2438 allocations: 111.08 KiB)
  426.000 μs (258 allocations: 46.23 KiB)


In [12]:
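# To write to a *.csv file using the CSV package
# (DataFrame(P, :auto) names the matrix columns x1 and x2)
CSV.write("programminglanguages_CSV.csv", DataFrame(P, :auto))
CSV.write("ab.csv", DF2)  # DF2 is already a DataFrame, so no conversion is needed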

"ab.csv"

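A quick way to check what `CSV.write` produced is to read the file back. This sketch writes a frame with the same contents as `DF2` to a temporary file (`tempname()` is just a convenient scratch path) and compares both the raw lines and the re-read table:

```julia
using CSV, DataFrames

df = DataFrame(a = [5], b = [3])   # same contents as DF2 above

path = tempname() * ".csv"
CSV.write(path, df)                # writes a header line, then one line per row
back = CSV.read(path, DataFrame)   # read it back to check the round trip

println(readlines(path))
```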
Another file type we often need to read is Excel's `XLSX` format. Let's read a new file.

In [6]:
T = XLSX.readdata("data/zillow_data_download_april2020.xlsx", #file name
    "Sale_counts_city", #sheet name
    "A1:F9" #cell range
    )
DF3 = DataFrame(T, :auto)


| Row | x1 (Any) | x2 (Any) | x3 (Any) | x4 (Any) | x5 (Any) | x6 (Any) |
|---|---|---|---|---|---|---|
| 1 | RegionID | RegionName | StateName | SizeRank | 2008-03 | 2008-04 |
| 2 | 6181 | New York | New York | 1 | missing | missing |
| 3 | 12447 | Los Angeles | California | 2 | 1446 | 1705 |
| 4 | 39051 | Houston | Texas | 3 | 2926 | 3121 |
| 5 | 17426 | Chicago | Illinois | 4 | 2910 | 3022 |
| 6 | 6915 | San Antonio | Texas | 5 | 1479 | 1529 |
| 7 | 13271 | Philadelphia | Pennsylvania | 6 | 1609 | 1795 |
| 8 | 40326 | Phoenix | Arizona | 7 | 1310 | 1519 |
| 9 | 18959 | Las Vegas | Nevada | 8 | 1618 | 1856 |


If you don't want to specify a cell range, you can read the whole sheet with `XLSX.readtable`, though this takes a little longer.

In [8]:
G = XLSX.readtable("data/zillow_data_download_april2020.xlsx","Sale_counts_city");


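Here, `G` is a tuple of two items: the first is a vector of vectors, where each vector corresponds to a column in the Excel file, and the second is a vector with the column names. (This is the behavior of the XLSX.jl version used here; newer versions of `readtable` return an `XLSX.DataTable`, which can be passed straight to `DataFrame`.)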

In [9]:
DF4 = DataFrame(G[1],G[2])
#@show typeof(G)
describe(DF4)

| Row | variable (Symbol) | mean (Union…) | min (Any) | median (Union…) | max (Any) | nmissing (Int64) | eltype (DataType) |
|---|---|---|---|---|---|---|---|
| 1 | RegionID | 67674.4 | 3299 | 33866.0 | 760887 | 0 | Any |
| 2 | RegionName | | Aaronsburg | | Zwolle | 0 | Any |
| 3 | StateName | | Alabama | | Wyoming | 0 | Any |
| 4 | SizeRank | 14380.0 | 1 | 14380.0 | 28759 | 0 | Any |
| 5 | 2008-03 | 7.35599 | 0 | 1.0 | 2926 | 598 | Any |
| 6 | 2008-04 | 8.17862 | 0 | 1.0 | 3121 | 598 | Any |
| 7 | 2008-05 | 8.69831 | 0 | 1.0 | 3220 | 598 | Any |
| 8 | 2008-06 | 9.19296 | 0 | 1.0 | 3405 | 597 | Any |
| 9 | 2008-07 | 9.44435 | 0 | 1.0 | 3464 | 594 | Any |
| 10 | 2008-08 | 8.73809 | 0 | 1.0 | 3371 | 589 | Any |


In [16]:
G[1][1][1:10]

10-element Vector{Any}:
  6181
 12447
 39051
 17426
  6915
 13271
 40326
 18959
 54296
 38128

In [40]:
G[2][1:10]
G[2][:]

148-element Vector{Symbol}:
 :RegionID
 :RegionName
 :StateName
 :SizeRank
 Symbol("2008-03")
 Symbol("2008-04")
 Symbol("2008-05")
 Symbol("2008-06")
 Symbol("2008-07")
 Symbol("2008-08")
 Symbol("2008-09")
 Symbol("2008-10")
 Symbol("2008-11")
 ⋮
 Symbol("2019-03")
 Symbol("2019-04")
 Symbol("2019-05")
 Symbol("2019-06")
 Symbol("2019-07")
 Symbol("2019-08")
 Symbol("2019-09")
 Symbol("2019-10")
 Symbol("2019-11")
 Symbol("2019-12")
 Symbol("2020-01")
 Symbol("2020-02")

We can also store this data in a DataFrame directly. `DataFrame(G...)` uses the "splat" operator `...` to unpack the tuple `G` and pass its two elements (the columns and the column names) to the DataFrame constructor as separate arguments.

In [10]:
D = DataFrame(G...) # equivalent to DataFrame(G[1],G[2])

| Row | RegionID (Any) | RegionName (Any) | StateName (Any) | SizeRank (Any) | 2008-03 (Any) | 2008-04 (Any) | 2008-05 (Any) |
|---|---|---|---|---|---|---|---|
| 1 | 6181 | New York | New York | 1 | missing | missing | missing |
| 2 | 12447 | Los Angeles | California | 2 | 1446 | 1705 | 1795 |
| 3 | 39051 | Houston | Texas | 3 | 2926 | 3121 | 3220 |
| 4 | 17426 | Chicago | Illinois | 4 | 2910 | 3022 | 2937 |
| 5 | 6915 | San Antonio | Texas | 5 | 1479 | 1529 | 1582 |
| 6 | 13271 | Philadelphia | Pennsylvania | 6 | 1609 | 1795 | 1709 |
| 7 | 40326 | Phoenix | Arizona | 7 | 1310 | 1519 | 1654 |
| 8 | 18959 | Las Vegas | Nevada | 8 | 1618 | 1856 | 1961 |
| 9 | 54296 | San Diego | California | 9 | 772 | 1057 | 1195 |
| 10 | 38128 | Dallas | Texas | 10 | 1158 | 1232 | 1240 |
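To make the splat concrete, here is a sketch with a hypothetical toy tuple shaped like `G` (a vector of columns followed by a vector of column names), using two rows from the table above. Both calls build the same DataFrame:

```julia
using DataFrames

# Hypothetical toy tuple with the same shape as G: (vector of columns, vector of column names)
G_toy = ([[6181, 12447], ["New York", "Los Angeles"]], [:RegionID, :RegionName])

df_splat    = DataFrame(G_toy...)            # `...` passes the two tuple elements as separate arguments
df_explicit = DataFrame(G_toy[1], G_toy[2])  # the spelled-out equivalent
```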


In [11]:
foods = ["apple", "cucumber", "tomato", "banana"]
calories = [105,47,22,105]
prices = [0.85,1.6,0.8,0.6,]
colors = ["red", "green", "red", "yellow"]
dataframe_calories = DataFrame(item=foods,calories=calories)
dataframe_prices = DataFrame(item=foods,price=prices)
dataframe_color = DataFrame(item=foods,color=colors)
#println(dataframe_calories,dataframe_prices,dataframe_color)

| Row | item (String) | color (String) |
|---|---|---|
| 1 | apple | red |
| 2 | cucumber | green |
| 3 | tomato | red |
| 4 | banana | yellow |


In [12]:
DF = innerjoin(dataframe_calories, dataframe_prices, dataframe_color, on=:item) # join the three tables on their shared :item column

| Row | item (String) | calories (Int64) | price (Float64) | color (String) |
|---|---|---|---|---|
| 1 | apple | 105 | 0.85 | red |
| 2 | cucumber | 47 | 1.6 | green |
| 3 | tomato | 22 | 0.8 | red |
| 4 | banana | 105 | 0.6 | yellow |
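`innerjoin` keeps only keys found in every table, which is why all four foods survive here: each appears in all three frames. A sketch with hypothetical toy tables where one item has no price shows the difference from `leftjoin`, which keeps every row of the left table and fills the gap with `missing`:

```julia
using DataFrames

# Hypothetical toy tables: "kiwi" has a calorie entry but no price
calories = DataFrame(item = ["apple", "banana", "kiwi"], calories = [105, 105, 42])
prices   = DataFrame(item = ["apple", "banana"], price = [0.85, 0.6])

inner = innerjoin(calories, prices, on = :item)  # keeps only items present in both tables
left  = leftjoin(calories, prices, on = :item)   # keeps every row of `calories`; kiwi's price becomes missing
```

Row order in join results is not something to rely on, so look rows up by key rather than by position.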


In [13]:
# we can also use the DataFrame constructor on a Matrix
DataFrame(T, :auto)

| Row | x1 (Any) | x2 (Any) | x3 (Any) | x4 (Any) | x5 (Any) | x6 (Any) |
|---|---|---|---|---|---|---|
| 1 | RegionID | RegionName | StateName | SizeRank | 2008-03 | 2008-04 |
| 2 | 6181 | New York | New York | 1 | missing | missing |
| 3 | 12447 | Los Angeles | California | 2 | 1446 | 1705 |
| 4 | 39051 | Houston | Texas | 3 | 2926 | 3121 |
| 5 | 17426 | Chicago | Illinois | 4 | 2910 | 3022 |
| 6 | 6915 | San Antonio | Texas | 5 | 1479 | 1529 |
| 7 | 13271 | Philadelphia | Pennsylvania | 6 | 1609 | 1795 |
| 8 | 40326 | Phoenix | Arizona | 7 | 1310 | 1519 |
| 9 | 18959 | Las Vegas | Nevada | 8 | 1618 | 1856 |


You can also easily write data to an XLSX file.

In [53]:
# if you already have a dataframe: 
# XLSX.writetable("filename.xlsx", collect(DataFrames.eachcol(df)), DataFrames.names(df))
XLSX.writetable("writefile_using_XLSX.xlsx",G[1],G[2])

In [58]:
XLSX.writetable("Items1.xlsx",DF)


In [57]:
# readtable needs a sheet name or index as its second argument (here: the first sheet),
# and the file written above is "Items1.xlsx"
DF_items = DataFrame(XLSX.readtable("Items1.xlsx", 1)...)

## ⬇️ Importing your data

Often, the data you want to import is not stored in plain text, and you might want to import different file types. Here we will go over importing `jld`, `npz`, `rda`, and `mat` files. These four formats come from four programming languages commonly used in data science: Julia (`jld`), Python/NumPy (`npz`), R (`rda`), and MATLAB (`mat`).

We will use a toy example here of a very small matrix. But the same syntax will hold for bigger files.

```
4×5 Array{Int64,2}:
 2  1446  1705  1795  1890
 3  2926  3121  3220  3405
 4  2910  3022  2937  3224
 5  1479  1529  1582  1761
```

In [23]:
using JLD
jld_data = JLD.load("data/mytempdata.jld")
save("mywrite.jld", "A", jld_data)

In [24]:
using NPZ
npz_data = npzread("data/mytempdata.npz")
npzwrite("mywrite.npz", npz_data)

In [25]:
using RData
R_data = RData.load("data/mytempdata.rda")
# We'll need RCall to save here. https://github.com/JuliaData/RData.jl/issues/56
using RCall
@rput R_data
R"save(R_data, file=\"mywrite.rda\")"

RObject{NilSxp}
NULL


In [26]:
using MAT
Matlab_data = matread("data/mytempdata.mat")
matwrite("mywrite.mat",Matlab_data)

In [27]:
@show typeof(jld_data)
@show typeof(npz_data)
@show typeof(R_data)
@show typeof(Matlab_data)
;

typeof(jld_data) = Dict{String, Any}
typeof(npz_data) = Matrix{Int64}
typeof(R_data) = Dict{String, Any}
typeof(Matlab_data) = Dict{String, Any}


In [28]:
Matlab_data

Dict{String, Any} with 1 entry:
  "tempdata" => [2 1446 … 1795 1890; 3 2926 … 3220 3405; 4 2910 … 2937 3224; 5 …

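Since `matread` returns a `Dict` keyed by MATLAB variable names, the matrix itself lives under `"tempdata"`. A round-trip sketch with the 4×5 toy matrix from above, assuming the MAT.jl package is installed and using a temporary file as the target:

```julia
using MAT

# The 4×5 toy matrix from above
A = [2 1446 1705 1795 1890;
     3 2926 3121 3220 3405;
     4 2910 3022 2937 3224;
     5 1479 1529 1582 1761]

path = tempname() * ".mat"
matwrite(path, Dict("tempdata" => A))  # each Dict key becomes a MATLAB variable name
vars = matread(path)                   # a Dict{String, Any}, one entry per variable
B = vars["tempdata"]
```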
# 🔢 Time to process the data from Julia
We will mainly cover `Matrix` (or `Vector`), `DataFrame`s, and `Dict`s (dictionaries). Let's bring back our programming languages dataset and start by working with the matrix it's stored in.

In [63]:
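# Build a DataFrame from a Dict of columns (column name => column vector).
# For a plain Dict, DataFrames sorts the columns by name, so `language` comes before `year`.
dictionary2 = Dict("year"=>P[:,1], "language"=>P[:,2])
# (to build a DataFrame from a Matrix instead, use DataFrame(matrix, :auto))
P_df = DataFrame(dictionary2)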


| Row | language (Any) | year (Any) |
|---|---|---|
| 1 | Regional Assembly Language | 1951 |
| 2 | Autocode | 1952 |
| 3 | IPL | 1954 |
| 4 | FLOW-MATIC | 1955 |
| 5 | FORTRAN | 1957 |
| 6 | COMTRAN | 1957 |
| 7 | LISP | 1958 |
| 8 | ALGOL 58 | 1958 |
| 9 | FACT | 1959 |
| 10 | COBOL | 1959 |


In [18]:
P_df.language  # select a column by name rather than by position

73-element Vector{Any}:
 "Regional Assembly Language"
 "Autocode"
 "IPL"
 "FLOW-MATIC"
 "FORTRAN"
 "COMTRAN"
 "LISP"
 "ALGOL 58"
 "FACT"
 "COBOL"
 "RPG"
 "APL"
 "Simula"
 ⋮
 "Scala"
 "F#"
 "PowerShell"
 "Clojure"
 "Go"
 "Rust"
 "Dart"
 "Kotlin"
 "Red"
 "Elixir"
 "Julia"
 "Swift"

Here are some quick questions we might want to ask about this simple data.
- In which year was a given language invented?
- How many languages were created in a given year?

In [12]:
# Q1: In which year was a given language invented?
function year_created(P, language::String)
    loc = findfirst(P[:,2] .== language)
    return P[loc,1]
end
year_created(P,"Julia")

# The reverse lookup: the first language created in a given year
# (if several languages share that year, only the first one in the file is returned)
function language_by_year(P, year::Int64)
    loc = findfirst(P[:,1] .== year)
    return P[loc,2]
end
language_by_year(P,1962)

In [31]:
year_created(P,"W")

LoadError: ArgumentError: invalid index: nothing of type Nothing

In [32]:
function year_created_handle_error(P,language::String)
    loc = findfirst(P[:,2] .== language)
    !isnothing(loc) && return P[loc,1]
    error("Error: Language not found.")
end
year_created_handle_error(P,"W")

LoadError: Error: Language not found.

In [33]:
# Q2: How many languages were created in a given year?
function how_many_per_year(P,year::Int64)
    year_count = length(findall(P[:,1].==year))
    return year_count
end
how_many_per_year(P,2011)

4
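As an alternative to `length(findall(...))`, `count` with a predicate tallies matches directly without materializing the vector of indices. A sketch on a hypothetical toy year column; the name `how_many_per_year_count` is made up here to avoid clashing with the notebook's own function:

```julia
# Hypothetical toy year column (P[:, 1] in the notebook)
years = [1951, 1962, 1962, 1962, 2011, 2011, 2011, 2011, 2012]

# count(pred, itr) tallies matches without building the index vector that findall creates
how_many_per_year_count(years, year) = count(==(year), years)

how_many_per_year_count(years, 2011)
```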

Now let's try to store this data in a DataFrame...

In [43]:
P_df = DataFrame(year = P[:,1], language = P[:,2])  # or DataFrame(P, :auto), which names the columns x1 and x2

In [42]:
# Even better, since we know the types of each column, we can create the DataFrame as follows:
# P_df = DataFrame(year = Int.(P[:,1]), language = string.(P[:,2]))


And now let's answer the same questions we just answered...

In [33]:
# Q1: In which year was a given language invented?
# With a DataFrame this is a little more intuitive: columns are accessed by name, not by index
function year_created(P_df::DataFrame, language::String)
    loc = findfirst(P_df.language .== language)
    return P_df.year[loc]
end
year_created(P_df,"Julia")

# All languages created in a given year
function languages_by_year(P_df::DataFrame, year::Int64)
    loc = findall(P_df.year .== year)
    return P_df.language[loc]
end
languages_by_year(P_df, 1962)

3-element Vector{Any}:
 "APL"
 "Simula"
 "SNOBOL"

In [37]:
year_created(P_df,"W")

LoadError: ArgumentError: invalid index: nothing of type Nothing

In [38]:
function year_created_handle_error(P_df::DataFrame, language::String)
    loc = findfirst(P_df.language .== language)
    !isnothing(loc) && return P_df.year[loc]
    error("Error: Language not found.")
end
year_created_handle_error(P_df,"W")

LoadError: Error: Language not found.

In [39]:
# Q2: How many languages were created in a given year?
function how_many_per_year(P_df::DataFrame, year::Int64)
    year_count = length(findall(P_df.year.==year))
    return year_count
end
how_many_per_year(P_df,2011)

4
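If you want the count for every year at once rather than one year at a time, the usual DataFrames idiom is `groupby` followed by `combine`. A sketch on a hypothetical five-row frame; `nrow => :count` names the output column `count`:

```julia
using DataFrames

# Hypothetical toy frame in the same year/language layout as P_df
df = DataFrame(year = [1962, 1962, 1962, 2011, 2012],
               language = ["APL", "Simula", "SNOBOL", "Dart", "Julia"])

# One row per distinct year, with the number of rows (languages) in each group
counts = combine(groupby(df, :year), nrow => :count)
```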

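Next, we'll use dictionaries. A quick way to create a dictionary is with the `Dict()` constructor, which infers the key and value types from its arguments; with no arguments, or with mixed types, you get an untyped `Dict{Any, Any}`. Here, we will specify the types of the dictionary explicitly.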

In [40]:
# A quick example to show how to build a dictionary
Dict([("A", 1), ("B", 2),(1,[1,2])])

Dict{Any, Any} with 3 entries:
  "B" => 2
  "A" => 1
  1   => [1, 2]
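To see how `Dict` picks its types, this sketch compares a few constructions: the empty and mixed cases fall back to `Dict{Any, Any}` (the mixed one is the example above), while the last fixes the types up front, as the next cells do.

```julia
d_empty    = Dict()                                   # no arguments: Dict{Any, Any}
d_inferred = Dict("a" => 5, "b" => 3)                 # types inferred from the pairs
d_mixed    = Dict([("A", 1), ("B", 2), (1, [1, 2])])  # mixed key and value types widen to Any
d_declared = Dict{Int, Vector{String}}()              # key and value types fixed up front

println(typeof(d_inferred))
```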

In [49]:
# keys are years, values are vectors of language names
P_dictionary = Dict{Integer,Vector{String}}()

Dict{Integer, Vector{String}}()

In [53]:
P_dictionary[67] = ["julia","programming"]

2-element Vector{String}:
 "julia"
 "programming"

In [43]:
# this is not going to work: the keys must be Integers, and "julia" is a String
P_dictionary["julia"] = 7

LoadError: MethodError: Cannot `convert` an object of type String to an object of type Integer
Closest candidates are:
  convert(::Type{T}, ::Base.TwicePrecision) where T<:Number at twiceprecision.jl:250
  convert(::Type{T}, ::AbstractChar) where T<:Number at char.jl:180
  convert(::Type{T}, ::CartesianIndex{1}) where T<:Number at multidimensional.jl:136
  ...

Now, let's populate the dictionary with years as keys and vectors that hold all the programming languages created in each year as their values. Even though this looks like more work, we often need to do it just once.

In [59]:
dict = Dict{Integer,Vector{String}}()
for i = 1:size(P,1)
    year,lang = P[i,:]
    if haskey(dict, year)
        push!(dict[year], lang)  # push! appends to the existing vector in place
        # note that push! is not our favorite thing to do in Julia, 
        # but we're focusing on correctness rather than speed here
    else
        dict[year] = [lang]
    end
end
dict

Dict{Integer, Vector{String}} with 45 entries:
  1985 => ["Eiffel"]
  2002 => ["Scratch"]
  1952 => ["Autocode"]
  1963 => ["CPL"]
  1964 => ["Speakeasy", "BASIC", "PL/I"]
  1967 => ["BCPL"]
  2001 => ["C#", "D"]
  1991 => ["Python", "Visual Basic"]
  1957 => ["FORTRAN", "COMTRAN"]
  1988 => ["Tcl", "Wolfram Language "]
  1955 => ["FLOW-MATIC"]
  1951 => ["Regional Assembly Language"]
  1994 => ["CLOS "]
  2011 => ["Dart", "Kotlin", "Red", "Elixir"]
  1959 => ["FACT", "COBOL", "RPG"]
  1962 => ["APL", "Simula", "SNOBOL"]
  2005 => ["F#"]
  1969 => ["B"]
  1972 => ["C", "Smalltalk", "Prolog"]
  1997 => ["Rebol"]
  1986 => ["Objective-C", "LabVIEW ", "Erlang"]
  1993 => ["Lua", "R"]
  1958 => ["LISP", "ALGOL 58"]
  1987 => ["Perl"]
  1954 => ["IPL"]
  ⋮    => ⋮
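The key check can be folded into a single call with `get!`, which returns the existing value for a key or inserts a default first. A sketch on a few hypothetical `(year, language)` rows; this is a swapped-in idiom, not the notebook's method:

```julia
# Hypothetical toy (year, language) rows in the same layout as P
rows = [(1962, "APL"), (1962, "Simula"), (1962, "SNOBOL"), (2012, "Julia")]

by_year = Dict{Int, Vector{String}}()
for (year, lang) in rows
    # get!(Vector{String}, d, k) returns d[k], first inserting an empty Vector{String} if k is absent
    push!(get!(Vector{String}, by_year, year), lang)
end
by_year
```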

In [66]:
# Though a smarter way to do this, which relies on the rows being sorted by year, is:
curyear = P_df.year[1]
P_dictionary[curyear] = [P_df.language[1]]
for (i,nextyear) in enumerate(P_df.year[2:end])
    if nextyear == curyear
        # same year as the previous row: append to the existing vector
        push!(P_dictionary[curyear], P_df.language[i+1])
    else
        # a new year starts: create a new entry
        curyear = nextyear
        P_dictionary[curyear] = [P_df.language[i+1]]
    end
end



In [68]:
length(keys(P_dictionary))

46

In [69]:
# 45 distinct years: P_dictionary above has one extra key, 67, left over from the test cell earlier
unique(P[:,1])

45-element Vector{Any}:
 1951
 1952
 1954
 1955
 1957
 1958
 1959
 1962
 1963
 1964
 1966
 1967
 1968
    ⋮
 2000
 2001
 2002
 2003
 2005
 2006
 2007
 2009
 2010
 2011
 2012
 2014

In [72]:
# Q1: In which year was a given language invented?
# now instead of looking in one long vector, we will look in many small vectors
function year_created(P_dictionary::AbstractDict, language::String)
    keys_vec = collect(keys(P_dictionary))
    lookup = map(keyid -> findfirst(P_dictionary[keyid].==language),keys_vec)
    # now the lookup vector has `nothing` or a numeric value. We want to find the index of the numeric value.
    return keys_vec[findfirst(!isnothing, lookup)]
end
year_created(P_dictionary,"Julia")

2012
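Scanning every small vector on each lookup works, but if you need many "year of language" queries, another option is to invert the dictionary once. A sketch on a hypothetical toy dictionary in the same `year => languages` layout; the name `year_of` is made up for illustration:

```julia
# Hypothetical toy dictionary in the same year => languages layout as P_dictionary
by_year = Dict(1962 => ["APL", "Simula", "SNOBOL"], 2012 => ["Julia"])

# Build the inverse mapping once: language => year
year_of = Dict(lang => year for (year, langs) in by_year for lang in langs)

year_of["Julia"]
```

Each later lookup is then a single hash access instead of a scan over every key.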

In [49]:
# Q2: How many languages were created in a given year?
how_many_per_year(P_dictionary::AbstractDict, year::Int64) = length(P_dictionary[year])
how_many_per_year(P_dictionary,2011)

4

# 📝 A note about missing data

In [76]:
# assume there were missing values in our dataframe
P[1,1] = missing
P_df = DataFrame(year = P[:,1], language = P[:,2])


| Row | year (Any) | language (Any) |
|---|---|---|
| 1 | missing | Regional Assembly Language |
| 2 | 1952 | Autocode |
| 3 | 1954 | IPL |
| 4 | 1955 | FLOW-MATIC |
| 5 | 1957 | FORTRAN |
| 6 | 1957 | COMTRAN |
| 7 | 1958 | LISP |
| 8 | 1958 | ALGOL 58 |
| 9 | 1959 | FACT |
| 10 | 1959 | COBOL |


In [51]:
dropmissing(P_df)

| Row | year (Any) | language (Any) |
|---|---|---|
| 1 | 1952 | Autocode |
| 2 | 1954 | IPL |
| 3 | 1955 | FLOW-MATIC |
| 4 | 1957 | FORTRAN |
| 5 | 1957 | COMTRAN |
| 6 | 1958 | LISP |
| 7 | 1958 | ALGOL 58 |
| 8 | 1959 | FACT |
| 9 | 1959 | COBOL |
| 10 | 1959 | RPG |
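The `year` column is still typed `Any` after `dropmissing` because it came from `P`, a `Matrix{Any}`. With a properly typed column, `dropmissing` also narrows the element type. A sketch on a hypothetical toy frame, using the column-selector form to drop rows only when `year` is missing:

```julia
using DataFrames

# Hypothetical toy frame; `year` is a Union{Missing, Int64} column rather than Any
df = DataFrame(year = [missing, 1952, 1954],
               language = ["Regional Assembly Language", "Autocode", "IPL"])

clean = dropmissing(df, :year)   # drop rows whose `year` is missing

println(eltype(df.year))     # Union{Missing, Int64}
println(eltype(clean.year))  # Int64: dropmissing also narrows the column type
```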


# Finally...
After finishing this notebook, you should be able to:
- [ ] download a data file from the web given a URL
- [ ] load data from a text file via DelimitedFiles or CSV
- [ ] write your data to a text file or CSV file
- [ ] load data from xlsx, jld, npz, mat, and rda files
- [ ] write your data to xlsx, jld, npz, mat, and rda files
- [ ] store data in a 2D array (`Matrix`), a `DataFrame`, or a `Dict`
- [ ] write functions to perform basic lookups on `Matrix`, `DataFrame`, and `Dict` types
- [ ] use some of the basic functions on `DataFrame`s such as `dropmissing`, `describe`, and `innerjoin`

# 🥳 One cool finding

Julia was created in 2012.