<div align="center">
    <h4> Data Structure: Factor</h4>
</div>

In [2]:
y <- c(1,4,3,5,4,2,4)

In [3]:
possible.dieface <- c(1,2,3,4,5,6)

In [4]:
labels.dieface <- c("one","two","three","four","five","six")

In [5]:
# Build a factor from y: `levels` lists the allowed values, `labels` names them in the same order
facy <- factor(y, levels = possible.dieface, labels = labels.dieface)
facy
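
Because `levels` lists all six faces, a face that never appears in `y` is still a valid level of the factor. A minimal sketch, reusing the die-roll values from the cells above, of how `table()` reports such a level with a count of zero:

```r
# Same data as above: seven rolls of a die
y <- c(1, 4, 3, 5, 4, 2, 4)
facy <- factor(y,
               levels = 1:6,
               labels = c("one", "two", "three", "four", "five", "six"))

# table() counts every level, including levels with no observations
counts <- table(facy)
print(counts)

# "six" was never rolled, but it is still one of the factor's levels
nlevels(facy)     # number of levels
counts["six"]     # count for the unobserved level
```

This is the main practical difference from a plain character vector, which has no record of values that could have occurred but did not.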

In [7]:
x <- factor(c("juice", "juice", "lemonade", "juice", "water"))
x

**unclass() function**

In [8]:
unclass(x)

In [11]:
attr(x,'levels')

In [12]:
class(x)

In [14]:
z <- unclass(x)
z

In [15]:
class(z)

In [16]:
unclass(x)

In [20]:
attr(x,'levels')

In [22]:
income <- ordered(c("high","high", "low","medium", "medium"), levels=c("low","medium", "high"))
income

In [23]:
unclass(income)

In [26]:
attr(income,'levels')

In [27]:
class(income)

In [28]:
class(unclass(income))
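
The `unclass()` calls above suggest that a factor is stored as a vector of integer codes plus a `levels` attribute. A short sketch of that relationship, using the same drinks vector (the variable names `codes`, `lev`, and `rebuilt` are illustrative only):

```r
x <- factor(c("juice", "juice", "lemonade", "juice", "water"))

codes <- as.integer(x)   # integer codes, one per element
lev   <- levels(x)       # character vector of level labels

print(codes)
print(lev)

# Indexing the levels by the codes recovers the original labels
rebuilt <- lev[codes]
identical(rebuilt, as.character(x))
```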

In [31]:
x <- c(4,5,1,2,3,4,4,5,6)
x

In [32]:
x <- as.factor(x)
x

In [33]:
class(x)

In [2]:
# create a factor by passing a character or numeric vector into the factor() function:
gender_vector <- c(rep("male", 10),rep("female",10)) # Create a character variable
gender_factor <- factor(gender_vector) # convert to factor
print(gender_factor)

 [1] male   male   male   male   male   male   male   male   male   male  
[11] female female female female female female female female female female
Levels: female male


In [3]:
# specify the levels a factor can take by passing a character vector of levels to the levels argument:
gender_factor <- factor(gender_vector, levels = c('male', 'female','other'))
print(gender_factor)

 [1] male   male   male   male   male   male   male   male   male   male  
[11] female female female female female female female female female female
Levels: male female other


In [4]:
# You can check, rename and add to the levels of a factor with the levels() function:
levels(gender_factor)    # Check levels

In [5]:
levels(gender_factor) <- c('male','female','unknown') # change levels
levels(gender_factor)

In [6]:
levels(gender_factor) <- c('male', 'female','unknown', 'no_one') # add a level
levels(gender_factor)
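
Assigning a full vector to `levels()` replaces the labels by position, so a mistake in the order silently relabels the data. One possible alternative, sketched here on a fresh copy of the factor, is to rename a single level by name and to append new levels to the existing ones:

```r
gender_vector <- c(rep("male", 10), rep("female", 10))
gender_factor <- factor(gender_vector, levels = c("male", "female", "other"))

# Rename only the level called "other", leaving the rest untouched
levels(gender_factor)[levels(gender_factor) == "other"] <- "unknown"
levels(gender_factor)

# Append a new, so far unused, level without retyping the existing ones
levels(gender_factor) <- c(levels(gender_factor), "no_one")
levels(gender_factor)

# The data values themselves are unchanged
table(gender_factor)
```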

To remove factor levels that have no data, either recreate the factor with the factor() function or use the droplevels() function:


In [7]:
gender_factor <- droplevels(gender_factor) # drop unused levels
levels(gender_factor)
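
The cell above uses `droplevels()`. As a sketch of the other option mentioned in the text, passing an existing factor back through `factor()` should leave the same set of levels (the names `dropped` and `recreated` are illustrative):

```r
gender_factor <- factor(c(rep("male", 10), rep("female", 10)),
                        levels = c("male", "female", "unknown", "no_one"))

dropped   <- droplevels(gender_factor)  # drop unused levels
recreated <- factor(gender_factor)      # recreate the factor from itself

levels(dropped)
levels(recreated)

# Both approaches keep only the levels that actually occur in the data
all(levels(dropped) == levels(recreated))
```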

R offers a second type of factor called an ordered factor for ordinal data. Ordinal data is non-numeric data that has some sense of natural ordering. For example, a variable with the levels "very low", "low", "medium", "high", and "very high" is not numeric but it has a natural ordering, so it can be encoded as an ordered factor. To create an ordered factor, use the factor() function with the additional argument ordered=TRUE or use the ordered() function.

**Note:** it is important to use the levels argument when creating an ordered factor because the levels you supply are used to create the ordering from lowest to highest.

In [11]:
dat <- rep(c("very low","low", "medium", "high","very high"), 2)
dat_factor <- factor(dat, levels = c("very low", "low", "medium", "high", "very high"), ordered = TRUE)
print(dat_factor)

 [1] very low  low       medium    high      very high very low  low      
 [8] medium    high      very high
Levels: very low < low < medium < high < very high
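
To illustrate why the `levels` argument matters, the sketch below compares an ordered factor built with explicit levels against one built without them. The vector `sizes` is made-up example data:

```r
sizes <- c("medium", "very low", "high", "low")
lvls  <- c("very low", "low", "medium", "high", "very high")

sizes_ord <- factor(sizes, levels = lvls, ordered = TRUE)

# Comparison operators respect the level order
sizes_ord > "low"
sizes_ord[sizes_ord >= "medium"]
max(sizes_ord)

# Without `levels`, the order falls back to the sorted (alphabetical) order,
# which here ranks "high" below "low"
default_ord <- factor(sizes, ordered = TRUE)
levels(default_ord)
```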


Convert a factor to a character vector using the **as.character()** function:

In [12]:
as.character(gender_factor)

If you try to convert a factor to numeric, the result is a numeric vector of the integer codes assigned to each factor level, not the level labels themselves:

In [13]:
as.numeric(gender_factor)

If you have numeric data encoded as a factor, this is usually not the desired behavior. To convert a factor with numeric levels to a numeric vector of the level values, use the following construction:

In [14]:
numeric_factor <- factor(c(-1.3,-2.6,2.6,3.2,2.6,4.5,-1.3))
# this construction lets you extract the numbers
as.numeric(levels(numeric_factor))[numeric_factor]
# converting to character first also works (but may run slower)
as.numeric(as.character(numeric_factor))
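
A small sketch of what goes wrong when the conversion is done directly, using the same `numeric_factor` as above (the names `codes` and `values` are illustrative):

```r
numeric_factor <- factor(c(-1.3, -2.6, 2.6, 3.2, 2.6, 4.5, -1.3))

codes  <- as.numeric(numeric_factor)                          # level codes, not the data
values <- as.numeric(levels(numeric_factor))[numeric_factor]  # the original numbers

print(codes)
print(values)

# A statistic computed on the codes is not a statistic of the data
mean(codes)
mean(values)
```

The codes are positions in the sorted set of levels, so they carry the rank of each value but none of its magnitude.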

If you'd like to add new values to an existing factor, you can't just use c() as you would when combining ordinary vectors, because the factor is then treated as its underlying integer codes. One way to add to a factor is to convert the factor to character, concatenate it with the new values, and then convert it back to a factor.

In [17]:
# This adds more values to the gender_factor
gender_factor <- as.factor(c(as.character(gender_factor), "Unknown", "Unknown", "Prefer not to say"))
summary(gender_factor)
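
Converting back with `as.factor()` re-sorts the levels. If the existing level order should be kept, one option is a small helper like the one sketched below. `append_to_factor()` is a hypothetical function written for this example, not part of base R, and it is meant for unordered factors only:

```r
# Hypothetical helper: append character values to an unordered factor,
# keeping the existing levels first and adding unseen values after them
append_to_factor <- function(f, new_values) {
  all_levels <- union(levels(f), new_values)
  factor(c(as.character(f), new_values), levels = all_levels)
}

gender_factor <- factor(c("male", "female", "male"), levels = c("male", "female"))
extended <- append_to_factor(gender_factor,
                             c("Unknown", "Unknown", "Prefer not to say"))

levels(extended)
summary(extended)
```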

**Factor Indexing**

In [18]:
gender_factor[2]                      # Get the second element
gender_factor[9:15]                   # Get a slice of elements
gender_factor[c(3,6,12)]              # Get a selection of specific elements
gender_factor[gender_factor=="male"]  # Get all values where the level equals male
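
Indexing a factor keeps all of its levels, even those that no longer occur in the selection. A brief sketch on a small made-up factor, showing the `drop = TRUE` argument of the subsetting operator as one way to discard them:

```r
gender_factor <- factor(c(rep("male", 3), rep("female", 2), "Unknown"))

males <- gender_factor[gender_factor == "male"]
levels(males)   # still lists every level of the original factor
table(males)    # so unused levels show up with a count of zero

# drop = TRUE removes the levels that are not present in the subset
males_only <- gender_factor[gender_factor == "male", drop = TRUE]
levels(males_only)
```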

**Factor Summary Functions**

In [20]:
summary(gender_factor)  # summary() returns counts for each level
str(gender_factor)      # str() shows the factor's structure
length(gender_factor)  # Get the length of the factor
table(gender_factor)   # table() creates a data table of counts

 Factor w/ 4 levels "female","male",..: 2 2 2 2 2 2 2 2 2 2 ...


gender_factor
           female              male Prefer not to say           Unknown 
               10                10                 3                 6 

In [21]:
language <- c(rep("python",15),rep("R",10),rep("SQL",5))

language_factor <- factor(language)
language_factor

In [22]:
summary(language_factor)
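
Counts per level are often easier to read as proportions. A possible follow-up on the same `language` data, using `table()` and `prop.table()` (the names `counts` and `shares` are illustrative):

```r
language <- c(rep("python", 15), rep("R", 10), rep("SQL", 5))
language_factor <- factor(language)

counts <- table(language_factor)
shares <- prop.table(counts)   # each count divided by the total
round(shares, 2)

# Name of the most frequent level
names(counts)[which.max(counts)]
```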