# Introduction

Factors represent categorical data that can take only a fixed set of possible values. A factor is built on top of an integer vector and has a `levels` attribute:

In [1]:
fruits <- factor(c('Apple', 'Banana'), levels = c('Apple', 'Banana', 'Coconut'))

fruits

In [3]:
typeof(fruits)

In [2]:
attributes(fruits)
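To see the integer codes and the `levels` attribute directly, here is a small base R sketch. The names `codes`, `lvls`, `decoded` and `counts` are illustrative. `table()` is included only to show that an unused level such as 'Coconut' still counts as a possible value.

```R
# Same factor as in the cell above
fruits <- factor(c('Apple', 'Banana'), levels = c('Apple', 'Banana', 'Coconut'))

# The stored data are integer codes that index into the levels attribute
codes <- as.integer(fruits)        # 1 2
lvls  <- attr(fruits, "levels")    # "Apple" "Banana" "Coconut"

# Indexing the levels by the codes recovers the original values
decoded <- lvls[codes]             # "Apple" "Banana"

# The unused level 'Coconut' is still part of the factor (count 0)
counts <- table(fruits)
counts
```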

# Behaviors

When you change the levels of a factor, its underlying integer codes stay the same but the labels attached to them change, so the data appear to have changed.

In [6]:
l <- factor(letters)

levels(l) <- rev(levels(l))
# the underlying integer codes stay the same
as.integer(l)

In [7]:
# both the data and levels are reversed
l

To reverse only the levels, without changing the data:

In [9]:
factor(letters, levels = rev(letters))

To reverse only the data, without changing the levels:

In [11]:
factor(rev(letters), levels = letters)
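The three behaviours above can be compared side by side on a shorter vector. This is a minimal sketch; `x`, `relabelled`, `reordered` and `rev_data` are illustrative names, and the comments show what each line returns.

```R
x <- factor(c("a", "b", "c"))

# 1. Assigning new levels relabels the existing codes, so the data change
relabelled <- x
levels(relabelled) <- rev(levels(relabelled))
as.character(relabelled)   # "c" "b" "a"
as.integer(relabelled)     # 1 2 3 (codes unchanged)

# 2. Re-creating the factor with a new level order keeps the data
reordered <- factor(x, levels = rev(levels(x)))
as.character(reordered)    # "a" "b" "c"
as.integer(reordered)      # 3 2 1 (codes recomputed)

# 3. Reversing the data while keeping the original level order
rev_data <- factor(rev(x), levels = levels(x))
as.character(rev_data)     # "c" "b" "a"
levels(rev_data)           # "a" "b" "c"
```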

# Explanation

In [1]:
help(factor)

**Usage**

```R
factor(x = character(), levels, labels = levels,
       exclude = NA, ordered = is.ordered(x), nmax = NA)

ordered(x, ...)

is.factor(x)
is.ordered(x)

as.factor(x)
as.ordered(x)

addNA(x, ifany = FALSE)
```

**Arguments**

* **`x`**: a vector of data, usually taking a small number of distinct values.

* **`levels`**: an optional vector of the unique values (as character strings) that `x` might have taken. The default is the unique set of values taken by `as.character(x)`, sorted into increasing order of `x`. Note that this set can be specified as smaller than `sort(unique(x))`.

* **`labels`**: either an optional character vector of labels for the levels (in the same order as `levels` after removing those in `exclude`), or a character string of length 1. Duplicated values in `labels` can be used to map different values of `x` to the same factor level.

* **`exclude`**: a vector of values to be excluded when forming the set of levels. This may be a factor with the same level set as `x`, or it should be a character vector.

* **`ordered`**: a logical flag that determines whether the levels should be regarded as ordered (in the order given).

* **`nmax`**: an upper bound on the number of levels; see ‘Details’ in `help(factor)`.

* **`...`**: (in `ordered(.)`) any of the above, apart from `ordered` itself.

* **`ifany`**: only add an `NA` level if it is used, i.e. if `any(is.na(x))`.
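A few of these arguments are easiest to understand by example. The sketch below assumes some made-up survey answers in `answers` (an illustrative name) and shows duplicated `labels`, `exclude`, and `addNA()` with `ifany = TRUE`.

```R
answers <- c("y", "yes", "n", "no", NA)

# Duplicated labels: several raw values are mapped onto one level
yn <- factor(answers,
             levels = c("y", "yes", "n", "no"),
             labels = c("Yes", "Yes", "No", "No"))
levels(yn)         # "Yes" "No"
as.character(yn)   # "Yes" "Yes" "No" "No" NA

# exclude: drop a value from the level set; that value becomes NA
no_b <- factor(c("a", "b", "c"), exclude = "b")
levels(no_b)       # "a" "c"

# addNA: turn missing values into an explicit level (only if any NA exists)
yn_na <- addNA(yn, ifany = TRUE)
nlevels(yn_na)     # 3
```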

# Factor creation

**`factor()`**

In [1]:
genders <- c('Male', 'Female', 'Female', 'Male', 'Female')
is.factor(genders)

In [2]:
gender_cat <- factor(genders)
gender_cat

In [3]:
is.factor(gender_cat)
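When `levels` is not given, `factor()` uses the sorted unique values; passing `levels` explicitly fixes the order, and any value not listed becomes `NA`. A short sketch (the extra 'Other' value and the name `g` are illustrative):

```R
genders <- c('Male', 'Female', 'Female', 'Male', 'Female')

# Default levels: the sorted unique values
levels(factor(genders))   # "Female" "Male"

# Explicit levels fix the order; values not listed become NA
g <- factor(c(genders, 'Other'), levels = c('Male', 'Female'))
levels(g)                 # "Male" "Female"
is.na(g[6])               # TRUE

# For a character vector, as.factor() yields the same levels as factor()
identical(levels(as.factor(genders)), levels(factor(genders)))   # TRUE
```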

**Factors in data frames**

In [5]:
# Create the vectors for data frame.
height <- c(132,151,162,139,166,147,122)
weight <- c(48,49,66,53,67,52,40)
gender <- c("male","male","female","female","male","female","male")

# Create the data frame.
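# stringsAsFactors = TRUE is needed in R >= 4.0.0, where strings are no longer converted to factors by default
input_data <- data.frame(height, weight, gender, stringsAsFactors = TRUE)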
input_data

| height | weight | gender |
|---:|---:|:---|
| 132 | 48 | male |
| 151 | 49 | male |
| 162 | 66 | female |
| 139 | 53 | female |
| 166 | 67 | male |
| 147 | 52 | female |
| 122 | 40 | male |


In [6]:
input_data$gender

In [7]:
is.factor(input_data$gender)
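Since R 4.0.0, `data.frame()` defaults to `stringsAsFactors = FALSE`, so character columns are no longer converted automatically. The sketch below shows both ways of getting a factor column; `df_chr` and `df_fct` are illustrative names.

```R
gender <- c("male", "male", "female")

# Ask for factors when building the data frame...
df_fct <- data.frame(gender, stringsAsFactors = TRUE)
is.factor(df_fct$gender)    # TRUE

# ...or convert an existing character column afterwards
df_chr <- data.frame(gender, stringsAsFactors = FALSE)
df_chr$gender <- factor(df_chr$gender)
is.factor(df_chr$gender)    # TRUE
```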

# Getting and changing the categories of a factor

**`levels()`**: provides access to the `levels` attribute of a variable. `levels(x)` returns the levels of its argument, and the replacement form `levels(x) <- value` sets them.

In [5]:
gender <- factor(c('Male', 'Female', 'Male'))
# getting the categories (levels) of factor `gender`
levels(gender)

In [7]:
# renaming the levels of `gender`, in their existing order ('Female', 'Male')
levels(gender) <- c('Girl', 'Boy')

gender
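Positional assignment to `levels()` follows the existing level order, which is alphabetical by default ('Female' before 'Male'), so it is easy to map the wrong names. A hedged alternative is the named-list form of `levels<-`, where each name is the new level and each element is the old one; `positional` and `named` are illustrative names.

```R
gender <- factor(c('Male', 'Female', 'Male'))

# Positional: 'Female' -> 'Girl', 'Male' -> 'Boy' (follows level order)
positional <- gender
levels(positional) <- c('Girl', 'Boy')

# Named list (new = old): the mapping is explicit, so the order written does not matter
named <- gender
levels(named) <- list(Boy = 'Male', Girl = 'Female')

as.character(named)   # "Boy" "Girl" "Boy"
levels(named)         # "Girl" "Boy" (original level positions are kept)
```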

# Getting number of categories

**`nlevels()`**

In [9]:
# getting the number of categories
nlevels(gender)
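`nlevels()` counts every level, including ones that no value uses. A sketch reusing the `fruits` factor from the introduction, with base R's `droplevels()` to discard unused levels:

```R
fruits <- factor(c('Apple', 'Banana'), levels = c('Apple', 'Banana', 'Coconut'))

nlevels(fruits)               # 3: unused levels are counted
length(levels(fruits))        # 3: same as nlevels(fruits)
nlevels(droplevels(fruits))   # 2: droplevels() removes unused levels
```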

# Changing the order of levels

In [8]:
factor(c('One', 'Two', 'One', 'Three'), levels = c('One', 'Two', 'Three'))

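Using the **`levels`** argument to re-create an existing factor:

In [3]:
gender <- factor(c('Male', 'Female', 'Male'))
gender

In [4]:
# re-create the factor with the new level order; the data stay the same
# (levels(gender) <- c('Male', 'Female') would rename the levels instead,
# turning every 'Male' into 'Female' and vice versa)
gender <- factor(gender, levels = c('Male', 'Female'))
gender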
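Besides re-creating the factor with `factor(x, levels = ...)`, base R's `relevel()` moves a single reference level to the front and leaves the others in their current order. A minimal comparison (variable names are illustrative):

```R
gender <- factor(c('Male', 'Female', 'Male'))

# Re-create the factor with an explicit level order (data unchanged)
by_levels <- factor(gender, levels = c('Male', 'Female'))

# Move one level to the front; the remaining levels keep their order
by_relevel <- relevel(gender, ref = 'Male')

levels(by_levels)          # "Male" "Female"
levels(by_relevel)         # "Male" "Female"
as.character(by_relevel)   # "Male" "Female" "Male"
```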

### reorder

**`reorder()`**: reorders the levels of a factor based on the values of another variable.  
<b style = 'color:red'>NOTE: `reorder` does not change the position of the values the way `sort` does; it only changes the order of the categories (levels) of the factor.</b>

In [1]:
numbers <- factor(c('Three', 'Two', 'One'))
numbers #We can see the order of levels: 'One' < 'Three' < 'Two'

In [5]:
# Reorder the levels of `numbers` using the scores c(3, 2, 1),
# which are matched to the data by position:
# 'Three' gets 3, 'Two' gets 2 and 'One' gets 1.
# Since 1 < 2 < 3, the new level order is 'One' < 'Two' < 'Three'
numbers <- reorder(numbers, c(3, 2, 1))

numbers

In [13]:
# What if there are multiple values for each category?

numbers <- factor(c('One', 'One', 'Two', 'Two', 'Three', 'Three'))
numbers #levels: 'One' <'Three' <'Two'

In [14]:
values <- c(2, 6, 1, 3, 10, 10)
# reorder by the mean value of each category
numbers <- reorder(numbers, values, FUN = mean)

numbers  #after reorder: 'Two' < 'One' <'Three'
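Conceptually, `reorder()` computes one score per level with `FUN` and then sorts the levels by that score. The sketch below reproduces that idea by hand with `tapply()` as an illustration, not as `reorder()`'s actual source; `scores` and `manual` are illustrative names.

```R
numbers <- factor(c('One', 'One', 'Two', 'Two', 'Three', 'Three'))
values  <- c(2, 6, 1, 3, 10, 10)

# One score per level: One = 4, Three = 10, Two = 2
scores <- tapply(values, numbers, mean)

# Sort the levels by their score and rebuild the factor
manual <- factor(numbers, levels = names(sort(scores)))

levels(manual)   # "Two" "One" "Three"
identical(levels(manual), levels(reorder(numbers, values, FUN = mean)))   # TRUE
```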

# Generating factor levels

We can generate a factor with the **`gl()`** function. Its two required integer arguments give the number of levels and how many times each level is repeated.

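```R
gl(n, k, length = n*k, labels = seq_len(n), ordered = FALSE)
```

* **`n`** is an integer giving the number of levels.

* **`k`** is an integer giving the number of replications (how many times each level is repeated in a row).

* **`length`** is an integer giving the length of the result; the pattern is recycled or truncated to fit.

* **`labels`** is an optional vector of labels for the resulting factor levels.

* **`ordered`** is a logical indicating whether the result should be an ordered factor.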

In [11]:

v <- gl(3, 4, labels = c("Tampa", "Seattle","Boston"))
print(v)

 [1] Tampa   Tampa   Tampa   Tampa   Seattle Seattle Seattle Seattle Boston 
[10] Boston  Boston  Boston 
Levels: Tampa Seattle Boston
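`gl()` is a shortcut for building a factor from a repeated pattern. As a sketch, the same factor can be written with `rep(..., each = k)`, and the `length` argument recycles the pattern (the names `labs`, `alt`, `alternating` and the 'Control'/'Treatment' labels are illustrative):

```R
labs <- c("Tampa", "Seattle", "Boston")

v   <- gl(3, 4, labels = labs)
alt <- factor(rep(labs, each = 4), levels = labs)

identical(as.integer(v), as.integer(alt))   # TRUE
identical(levels(v), levels(alt))           # TRUE

# With k = 1 and a longer length, the levels alternate
alternating <- gl(2, 1, length = 6, labels = c("Control", "Treatment"))
as.character(alternating)
```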


# Label

In [1]:
sex <- factor(c('Male', 'Female', 'Female', 'Male', 'Male', 'Female'))
levels(sex)

In [2]:
#mapping Female to Girl and Male to Boy
factor(sex, labels = c('Girl', 'Boy'))
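`labels` are matched to the levels by position, and the default levels are sorted alphabetically, which is why 'Female' maps to 'Girl' above. Passing `levels` together with `labels` makes the mapping explicit; a minimal sketch (`relabelled` is an illustrative name):

```R
sex <- factor(c('Male', 'Female', 'Female', 'Male', 'Male', 'Female'))

# Spell out the levels so each label pairs with the intended value
relabelled <- factor(sex, levels = c('Male', 'Female'), labels = c('Boy', 'Girl'))

levels(relabelled)              # "Boy" "Girl"
as.character(relabelled)[1:3]   # "Boy" "Girl" "Girl"
```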

# Ordinal and Nominal Categorical Variables

In [3]:
# ordinal factor: ordered = TRUE
quality <- factor(c('Good', 'Premium', 'Bad', 'Premium'), levels = c('Bad', 'Good', 'Premium'), ordered = TRUE)

print(quality)

[1] Good    Premium Bad     Premium
Levels: Bad < Good < Premium


In [4]:
# nominal factor: ordered = FALSE (the default)
fruits <- factor(c('Apple', 'Banana'), levels = c('Apple', 'Banana', 'Coconut'))

print(fruits)

[1] Apple  Banana
Levels: Apple Banana Coconut
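The practical difference between the two kinds shows up in comparisons. In this sketch, the ordered `quality` factor supports `<` and `max()` along its level order, while the nominal `fruits` factor returns `NA` for `<` (R also emits a warning, suppressed here with `suppressWarnings()`).

```R
quality <- factor(c('Good', 'Premium', 'Bad', 'Premium'),
                  levels = c('Bad', 'Good', 'Premium'), ordered = TRUE)
fruits  <- factor(c('Apple', 'Banana'),
                  levels = c('Apple', 'Banana', 'Coconut'))

# Ordered factors compare along the level order Bad < Good < Premium
quality < 'Premium'          # TRUE FALSE TRUE FALSE
as.character(max(quality))   # "Premium"

# Nominal factors have no order, so '<' is not meaningful
suppressWarnings(fruits < 'Banana')   # NA NA

is.ordered(quality)          # TRUE
is.ordered(fruits)           # FALSE
```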
