# Environment setup for data frames tutorial

## Bogumił Kamiński

Welcome to the DataFrames.jl introduction!

This set of Jupyter notebooks is intended to give you an overview of the functionality of DataFrames.jl, based on practical examples.

You can find task-oriented overviews of DataFrames.jl functionality (as opposed to the exercise-style approach of this tutorial) in the following locations:
* an official manual at https://juliadata.github.io/DataFrames.jl/stable/
* a tutorial covering all the functionality of DataFrames.jl at https://github.com/bkamins/Julia-DataFrames-Tutorial

We assume that you have a basic knowledge of the Julia language and the Julia ecosystem. There are great tutorials on these topics at [JuliaAcademy](https://juliaacademy.com/), so I encourage you to check them out.

As this is a hands-on tutorial, you can expect the examples to be written the way I would write them in an actual project.

The notebooks were prepared under Julia 1.5.3 and tested under Julia 1.6.1. If you have a different version of Julia installed, change the kernel using the *Kernel/Change kernel* menu option (assuming you are on Julia 1.x, all examples should work without a problem).

In [None]:
VERSION
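
If you prefer an explicit check over reading the version by eye, a minimal sketch could look like the following. The lower bound is an assumption for illustration: it is simply the version under which the notebooks were prepared, not a hard requirement.

```julia
# Assumed lower bound: the version the notebooks were prepared under.
prepared_under = v"1.5.3"

if VERSION.major != 1
    println("Julia $VERSION is not a 1.x release; the examples were not tested on it")
elseif VERSION < prepared_under
    println("Julia $VERSION is older than $prepared_under; some examples may behave differently")
else
    println("Julia $VERSION is a 1.x release at or above $prepared_under")
end
```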

Jupyter Notebook automatically activates the project environment if one is found in the working directory.

So first let us check that the Project.toml and Manifest.toml files are present (they should be if you cloned the repository of this tutorial).

In [None]:
isfile.(["Project.toml", "Manifest.toml"])

You should get `1` (meaning `true`) printed for both entries of the vector.
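
If one of the entries is `0`, it helps to see which file is missing. A small sketch of the same check with labelled output (the variable names are only illustrative):

```julia
files = ["Project.toml", "Manifest.toml"]
present = isfile.(files)          # broadcast isfile over the vector of names

for (name, ok) in zip(files, present)
    println(rpad(name, 15), ok ? "found" : "MISSING")
end

# Warn once if anything is missing from the working directory.
all(present) || @warn "Some environment files are missing from $(pwd())"
```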

Now we are sure that you are going to use exactly the same versions of the packages that I use when running this tutorial.
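
The presence of the files does not by itself show which environment is active. A hedged sketch of a direct check, assuming `Base.active_project()` returns the path of the active `Project.toml` (or `nothing`), as it does in the Julia 1.x releases I am aware of:

```julia
proj = Base.active_project()      # path of the active Project.toml, or nothing
println("Active project: ", proj)

# Compare resolved paths so that symbolic links do not cause a false mismatch.
in_working_dir = proj !== nothing && isfile(proj) &&
                 realpath(dirname(proj)) == realpath(pwd())

println(in_working_dir ?
        "The environment from the working directory is active" :
        "A different environment is active")
```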

Let us check what packages (and in what versions) we will use.

In [None]:
] status

These notebooks should work with DataFrames versions 0.22 and 1.2.
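
To read the DataFrames.jl version programmatically rather than from the `status` listing, one option is `Pkg.dependencies()`. This is only a sketch: the Pkg documentation marks this function as experimental, so its output format may differ between Julia versions.

```julia
using Pkg

# Pkg.dependencies() maps package UUIDs to records with fields such as
# `name` and `version` for everything in the active environment.
deps = Pkg.dependencies()
matches = [info for info in values(deps) if info.name == "DataFrames"]

if isempty(matches)
    println("DataFrames is not part of the active environment")
else
    println("DataFrames version: ", first(matches).version)
end
```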

If checking the status of the packages gives a warning that some of the packages are not downloaded, run the `instantiate` instruction in the following cell.

In [None]:
] instantiate
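
The `]` prefix switches the cell to package-manager mode. If you run the tutorial code from a script instead of a notebook, the same steps can be expressed with the functional Pkg API. The helper below is hypothetical (it is not part of this tutorial) and is shown only as a sketch:

```julia
using Pkg

# Hypothetical helper: activate the project in `dir` and download any
# packages recorded there that are missing locally.
function setup_environment(dir::AbstractString = pwd())
    isfile(joinpath(dir, "Project.toml")) ||
        error("no Project.toml found in $dir")
    Pkg.activate(dir)     # functional form of `] activate <dir>`
    Pkg.instantiate()     # functional form of `] instantiate`
    return nothing
end

# Usage (commented out so that nothing is downloaded by accident):
# setup_environment()
```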

<div class="alert alert-block alert-info">
    <p><b>PyPlot.jl configuration:</b></p>
    <p>In some environments, automatic installation of PyPlot.jl might fail. If you encounter this issue, please refer to <a href="https://github.com/JuliaPy/PyPlot.jl#installation">the PyPlot.jl installation instructions</a>.</p>
</div>

In particular, executing the following commands:

```
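using Pkg
# An empty PYTHON setting tells PyCall to install and use its own private Python
ENV["PYTHON"] = ""
# Rebuild PyCall so that the new setting takes effect
Pkg.build("PyCall")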
```

should resolve the PyPlot.jl installation issues. However, on OS X more configuration steps are sometimes required. You can find detailed instructions in [the OS X section of the PyPlot.jl README](https://github.com/JuliaPy/PyPlot.jl#os-x).

As you can see, we will use the following packages:

Package | Description
:-|:-
DataFrames.jl | the core package that is the subject of this tutorial; it is used for data manipulation; the exact version in use is listed in the `] status` output above
CSV.jl | a package for reading/writing of CSV files
FreqTables.jl | a very useful package for creating frequency tables
GLM.jl | a package for fitting Generalized Linear Models (as no data science tutorial would be complete without building some predictive model)
PyPlot.jl | a package for plotting; there are many options in the Julia ecosystem to choose from; in this tutorial we use PyPlot.jl because it is based on Matplotlib, so it should be familiar if you have experience with the Python data science stack
Pipe.jl | a package that makes chaining of operations super powerful (which is something you probably know from `%>%` in R)
Arrow.jl | a package for working with data in Apache Arrow format
Unitful.jl | a package for working with physical units (like kg, cm, ...)
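
As a quick sanity check, the sketch below asks Julia whether each package name from the table can be resolved in the active environment. It relies on `Base.identify_package`, which returns `nothing` for names it cannot resolve. A resolvable name only means the package is listed in the environment, so it does not guarantee that the package has already been downloaded.

```julia
pkgs = ["DataFrames", "CSV", "FreqTables", "GLM",
        "PyPlot", "Pipe", "Arrow", "Unitful"]

for name in pkgs
    id = Base.identify_package(name)   # a PkgId, or nothing if unresolved
    status = id === nothing ? "not found in the active environment" : "listed"
    println(rpad(name, 12), status)
end
```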