Before running this, please make sure to activate and instantiate the
tutorial-specific package environment, using this
[`Project.toml`](https://raw.githubusercontent.com/juliaai/DataScienceTutorials.jl/gh-pages/__generated/D0-categorical/Project.toml) and
[this `Manifest.toml`](https://raw.githubusercontent.com/juliaai/DataScienceTutorials.jl/gh-pages/__generated/D0-categorical/Manifest.toml), or by following
[these](https://juliaai.github.io/DataScienceTutorials.jl/#learning_by_doing) detailed instructions.

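```julia
# Activate and instantiate the tutorial environment (adjust the path to the folder holding Project.toml and Manifest.toml)
using Pkg; Pkg.activate("D:/JULIA/6_ML_with_Julia/D0-categorical"); Pkg.instantiate()
```

```
  Activating project at `D:\JULIA\6_ML_with_Julia\D0-categorical`
```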


This tutorial loosely follows [the CategoricalArrays.jl documentation](https://juliadata.github.io/CategoricalArrays.jl/latest/using.html).

## Defining a categorical vector

```julia
using CategoricalArrays

v = categorical(["AA", "BB", "CC", "AA", "BB", "CC"])
```

```
6-element CategoricalArray{String,1,UInt32}:
 "AA"
 "BB"
 "CC"
 "AA"
 "BB"
 "CC"
```

This declares a categorical vector, i.e. a vector whose entries represent groups or categories.
You can retrieve the group labels, called *levels*, using `levels`:

```julia
levels(v)
```

```
3-element Vector{String}:
 "AA"
 "BB"
 "CC"
```

which, by default, returns the labels in sorted (here, lexicographic) order.
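The `UInt32` in the type printed above is the type of the integer codes that a `CategoricalArray` stores for its entries. Each level is kept once, and every entry records only the position of its level. The following is a minimal sketch of inspecting those codes. It assumes the `levelcode` function exported by recent CategoricalArrays.jl releases:

```julia
using CategoricalArrays

v = categorical(["AA", "BB", "CC", "AA", "BB", "CC"])

# levelcode gives the position of each entry's label within levels(v)
codes = levelcode.(v)                  # codes == [1, 2, 3, 1, 2, 3]

# Looking the codes up in levels(v) recovers the original labels
plain = [levels(v)[c] for c in codes]
```

Storing small integer codes instead of repeated strings is what keeps a categorical vector compact when a few labels occur many times.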

## Working with categoricals

### Ordered categoricals

You can specify that categories are *ordered* by passing `ordered=true`. Comparisons then follow the order of the levels. To change that order, use the `levels!` function.
Let's see two examples.

```julia
v = categorical([1, 2, 3, 1, 2, 3, 1, 2, 3], ordered=true)

levels(v)
```

```
3-element Vector{Int64}:
 1
 2
 3
```

Here the default (sorted) order is the one we want, so there is no need to change it. Since we've specified that the categories are ordered, we can compare them:

```julia
v[1] < v[2]
```

```
true
```
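The comparison above works only because `v` was declared ordered. The following small sketch shows that an unordered vector rejects `<` until it is marked as ordered. It assumes the `isordered` and `ordered!` functions from CategoricalArrays.jl:

```julia
using CategoricalArrays

u = categorical(["b", "a", "c"])   # unordered by default, levels ["a", "b", "c"]
was_ordered = isordered(u)         # false

# Testing unordered values with < throws an ArgumentError
rejected = try
    u[1] < u[3]
    false
catch err
    err isa ArgumentError
end

ordered!(u, true)                  # mark the vector as ordered, in place
accepted = u[1] < u[3]             # "b" precedes "c" in levels(u), so this is true
```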

Let's now consider another example:

```julia
v = categorical(["high", "med", "low", "high", "med", "low"], ordered=true)

levels(v)
```

```
3-element Vector{String}:
 "high"
 "low"
 "med"
```

The levels follow the lexicographic order, which is not what we want. Since `"high"` sorts before `"med"`, the first entry compares as smaller than the second:

```julia
v[1] < v[2]
```

```
true
```
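The comparison above depends on where each value sits in `levels(v)`, not on what the labels mean. The following short sketch makes this visible with `levelcode`, which returns the position of a value in the levels:

```julia
using CategoricalArrays

v = categorical(["high", "med", "low", "high", "med", "low"], ordered=true)

# With the default sorted levels ["high", "low", "med"], "high" has code 1 and "med" has code 3
code_high = levelcode(v[1])
code_med = levelcode(v[2])

# For an ordered categorical, < compares these positions, so v[1] < v[2] holds
println(code_high < code_med)  # true
println(v[1] < v[2])           # true
```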

To re-specify the order, we use `levels!`:

```julia
levels!(v, ["low", "med", "high"])
```

```
6-element CategoricalArray{String,1,UInt32}:
 "high"
 "med"
 "low"
 "high"
 "med"
 "low"
```

Now the levels are properly ordered, so `"high"` no longer compares as smaller than `"med"`:

```julia
v[1] < v[2]
```

```
false
```
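Instead of fixing the order afterwards with `levels!`, you can also pass the order when the vector is created. Order-based functions such as `sort` and `maximum` then follow that order. The following sketch assumes the `levels` keyword argument of `categorical`, which is available in recent CategoricalArrays.jl releases:

```julia
using CategoricalArrays

w = categorical(["high", "med", "low", "high", "med", "low"];
                ordered=true, levels=["low", "med", "high"])

# Comparisons follow the levels given at construction
is_smaller = w[1] < w[2]     # "high" < "med" is false here

# sort and maximum use the same order
sorted_w = sort(w)           # low, low, med, med, high, high
top = maximum(w)             # the "high" level
```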

### Missing values

You can also have a categorical vector with missing values:

```julia
v = categorical(["AA", "BB", missing, "AA", "BB", "CC"]);
```

Missing values are not counted as levels, so the levels are unchanged:

```julia
levels(v)
```

```
3-element Vector{String}:
 "AA"
 "BB"
 "CC"
```

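The `missing` entry is kept outside the pool of levels. The following minimal sketch handles it with standard Julia tools (`ismissing`, `skipmissing`). It also uses `levelcode`, which returns `missing` for a missing entry:

```julia
using CategoricalArrays

v = categorical(["AA", "BB", missing, "AA", "BB", "CC"])

# The element type admits missing alongside categorical values
allows_missing = eltype(v) >: Missing

n_missing = count(ismissing, v)          # number of missing entries
present = collect(skipmissing(v))        # the non-missing categorical values
codes = levelcode.(v)                    # integer codes, with missing kept as missing
```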
---

*This notebook was generated using [Literate.jl](https://github.com/fredrikekre/Literate.jl).*