## Reading and writing files

#### Working with CSV and text files

Let's load a file called `programminglanguages.csv` with `readcsv`. It's already in our current working directory:

In [None]:
;ls

With the shell command `head`, we can see the first 10 lines of this file:

In [None]:
;head -10 programminglanguages.csv
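The `;` prefix passes the line to the system shell, so `;head` only works where a `head` command exists (it is usually missing on Windows). As a pure-Julia stand-in, here is a minimal sketch of a hypothetical `head_lines` helper (not part of Julia or this tutorial) that returns the first `n` lines of a file. It runs on a throwaway file rather than `programminglanguages.csv`:

```julia
# Hypothetical helper mimicking `head -n`: return the first n lines of a file
function head_lines(path::AbstractString, n::Integer = 10)
    open(path) do io
        # eachline is lazy; Iterators.take stops after n lines
        collect(Iterators.take(eachline(io), n))
    end
end

# Usage on a small throwaway file with 15 lines
path = tempname()
write(path, join(["line $i" for i in 1:15], "\n"))
foreach(println, head_lines(path))
```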

The first argument to `readcsv` is the name of the CSV file to read. Optionally, passing `header = true` makes `readcsv` treat the first row of the file as a header and return it separately, so the call returns a `(data, header)` pair.

In [None]:
data, header = readcsv("programminglanguages.csv", header = true)

In [None]:
header
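`readcsv` was deprecated in Julia 0.7 and removed in Julia 1.0. Assuming Julia 1.x, the same call goes through `readdlm` from the `DelimitedFiles` package (bundled with Julia 1.x; recent versions may need it added explicitly) with a comma as the delimiter, and `header = true` still returns a `(data, header)` tuple. A minimal sketch on a tiny made-up file with a two-column layout (the rows are illustrative, not copied from `programminglanguages.csv`):

```julia
using DelimitedFiles

# Tiny illustrative CSV with a header row and two data rows
path = tempname() * ".csv"
write(path, "year,language\n1957,FORTRAN\n1972,C\n")

# header = true splits off the first row and returns (data, header)
data, header = readdlm(path, ',', header = true)

println(header)   # 1×2 matrix holding the column names
println(data)     # 2×2 matrix; mixed column types give a Matrix{Any}
```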

The data from this file lists programming languages and the years in which they were created.

Rather than working strictly with CSV files, we can use other delimiters with the functions `readdlm` and `writedlm` (`readcsv` and `writecsv` are just these functions with the delimiter fixed to a comma).

Let's write this same data to a file with a different delimiter.

In [None]:
writedlm("programminglanguages.txt", data, '-')

Now we can check whether this worked with `;ls` and `;head`:

In [None]:
;ls

In [None]:
;head -10 programminglanguages.txt

We've successfully rewritten this file so that years and languages are separated by dashes rather than commas.
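Reading such a file back works the same way in reverse: pass `readdlm` the delimiter that was used for writing. A minimal round-trip sketch, assuming Julia 1.x and the `DelimitedFiles` package, with two illustrative rows (not the real dataset):

```julia
using DelimitedFiles

# Illustrative (year, language) rows; Any[...] allows mixed column types
data = Any[1957 "FORTRAN"; 1972 "C"]

path = tempname() * ".txt"
writedlm(path, data, '-')      # each row is written as e.g. "1957-FORTRAN"

lines = readlines(path)        # raw text of the file
back = readdlm(path, '-')      # parse it again using the same delimiter
```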

#### Storing matrices

If we want to write a matrix to a file, we can use the `MAT` package, which reads and writes MATLAB's `.mat` file format.

In [None]:
using MAT

Create matrices to store in an output file:

In [None]:
mult_table = [i * j for i in 1:5, j in 1:5]

In [None]:
add_table = [i + j for i in 1:5, j in 1:5]
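In these comprehensions, the comma between the two `for` clauses makes the result two-dimensional: `i` runs down the rows and `j` across the columns, so entry `(i, j)` of `mult_table` is `i * j`. A short sketch checking this, and showing that broadcasting a column range against a row (`(1:5)'`) builds the same tables; writing the clauses without the comma instead produces a flat 25-element vector:

```julia
mult_table = [i * j for i in 1:5, j in 1:5]
add_table  = [i + j for i in 1:5, j in 1:5]

# Entry (i, j) sits in row i, column j
println(size(mult_table))      # (5, 5)
println(mult_table[2, 3])      # 6

# Broadcasting a column (1:5) against a row (1:5)' gives the same matrices
mult_bcast = (1:5) .* (1:5)'
add_bcast  = (1:5) .+ (1:5)'

# Without the comma, the nested `for` clauses flatten into a vector
flat = [i * j for i in 1:5 for j in 1:5]
println(length(flat))          # 25
```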

Open an output file in write mode (`"w"`) where we can store these matrices:

In [None]:
storematrices = matopen("storematrices.mat", "w")

Now write these matrices with the `write` function.

We can even specify the name under which each matrix is stored as the second argument to `write`, so that we can later load individual matrices from the file by name!

In [None]:
write(storematrices, "mult_table", mult_table)

In [None]:
write(storematrices, "add_table", add_table)

In [None]:
close(storematrices)

Now we can check that the file `storematrices.mat` exists and open it for reading:

In [None]:
;ls

In [None]:
loadmatrices = matopen("storematrices.mat")

What were the names of the matrices stored in that file?

In [None]:
names(loadmatrices)

Now that we know their names, we can read either matrix by name.

In [None]:
read(loadmatrices, "add_table")

In [None]:
close(loadmatrices)
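For a quick save and load, MAT.jl also exports `matwrite` and `matread`, which write a whole `Dict` of named variables in one call and read the file back as a `Dict`. A minimal sketch, assuming the `MAT` package is installed; it writes to a temporary path rather than `storematrices.mat`:

```julia
using MAT

mult_table = [i * j for i in 1:5, j in 1:5]
add_table  = [i + j for i in 1:5, j in 1:5]

# Write both matrices under explicit names in one call;
# matwrite opens and closes the file itself
path = tempname() * ".mat"
matwrite(path, Dict("mult_table" => mult_table, "add_table" => add_table))

# Read every variable back into a Dict keyed by variable name
vars = matread(path)
println(sort(collect(keys(vars))))   # ["add_table", "mult_table"]
```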

#### Loading data in as a DataFrame

When working with larger data sets, you may find the `DataFrames` package helpful. Here we'll load a file called `houses.csv` with the `readtable` function from `DataFrames`.

In [None]:
using DataFrames

In [None]:
houses = readtable("houses.csv")

This dataset gives us information about housing prices, locations, sizes, etc. for 985 houses. We can see this data is stored in a `DataFrame`:

In [None]:
typeof(houses)

We can grab columns of this `DataFrame` by indexing with symbol versions of the column names. For example, indexing with `:price` grabs the `price` column from `houses`:

In [None]:
prices = houses[:price]

In [None]:
square_footage = houses[:sq__ft]
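Indexing with a bare symbol, as in `houses[:price]`, is the older DataFrames syntax and no longer works in DataFrames 1.x. Assuming a current DataFrames version, a column is accessed as a property (`houses.price`) or with `houses[!, :price]`. A sketch on a tiny hypothetical table that reuses the `price` and `sq__ft` column names (the values are made up):

```julia
using DataFrames

# Hypothetical three-row stand-in for houses.csv
houses = DataFrame(price = [59222, 68212, 89921], sq__ft = [836, 1167, 0])

prices = houses.price                 # property access returns the column vector
square_footage = houses[!, :sq__ft]   # `!` returns the stored column without copying
copied = houses[:, :sq__ft]           # `:` returns a fresh copy of the column
```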

Let's try plotting the prices of these houses against their square footage:

In [None]:
using Plots; gr()

scatter(square_footage, prices)
ylabel!("price")
xlabel!("size of house")

We can see that price roughly goes up with size, but we also see a number of houses listed as having 0 square feet.

It turns out those are the entries for which the size is missing. Let's filter them out and keep only houses with more than 0 square feet.

In [None]:
filtered_houses = houses[houses[:sq__ft] .> 0, :]
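The expression `houses[:sq__ft] .> 0` compares every entry with 0 and yields a vector of `true`/`false` values; using that vector as the row index keeps only the rows marked `true`, and the `:` keeps every column. The same logical indexing works on plain vectors, as in this sketch with made-up values (not from `houses.csv`):

```julia
# Made-up sizes and prices; a size of 0 stands for missing data
sizes  = [836, 0, 1167, 0, 909]
prices = [59222, 68880, 68212, 69307, 81900]

keep = sizes .> 0                 # elementwise comparison gives a Bool mask

# Applying the same mask to both vectors keeps each size paired with its price
filtered_sizes  = sizes[keep]
filtered_prices = prices[keep]
```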

Now let's grab the prices and sizes of the houses in this new `DataFrame` called `filtered_houses`:

In [None]:
prices = filtered_houses[:price]
sizes = filtered_houses[:sq__ft]

scatter(sizes, prices)
xlabel!("Sizes of houses")
ylabel!("Prices of houses")

That looks much better!

Let's put these two `DataArrays` into a new `DataFrame` and then write that new `DataFrame` to an output file.

First we can create a new `DataFrame` with the `DataFrame` function.

When we run 

```julia
myhouses = DataFrame(price = prices, square_footage = sizes)
```

we are creating a `DataFrame` with two columns. The data comes from the `DataArrays` called `prices` and `sizes`, and we give those columns the names `price` and `square_footage`.

In [None]:
myhouses = DataFrame(price = prices, square_footage = sizes)

Now that we've created this `DataFrame`, we can write it to an output file.

In [None]:
writetable("myhousedata.csv", myhouses)

Let's check whether this worked:

In [None]:
;ls

In [None]:
;head -10 myhousedata.csv
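`readtable` and `writetable` were later removed from DataFrames; in current versions the CSV.jl package reads and writes delimited files, so `readtable("houses.csv")` becomes `CSV.read("houses.csv", DataFrame)`. A minimal round-trip sketch, assuming recent versions of the `CSV` and `DataFrames` packages and made-up values:

```julia
using CSV, DataFrames

# Hypothetical filtered data in the same column layout as myhouses
myhouses = DataFrame(price = [59222, 68212, 81900], square_footage = [836, 1167, 909])

path = tempname() * ".csv"
CSV.write(path, myhouses)            # writes a header row, then one row per house

back = CSV.read(path, DataFrame)     # parse the file back into a DataFrame
println(readlines(path)[1])          # price,square_footage
```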