# Intro to Variable Types

**Inspecting Categorical & Quantitative Variables**

## Introduction

Datasets contain variables that describe a set of observations. For example, if we are collecting data on a group of people, we might have variables such as age, gender, and political party. Flat (also called tabular) datasets are organized with columns and rows. The columns are the variables (also called attributes or features) and the rows are observations (also called instances or records).

There are two broad types of variables: numeric (also called quantitative) and categorical. Numeric variables represent quantities that can be counted or measured. Categorical variables come in several varieties, including nominal (names for groups with no inherent order), ordinal (ordered groups, like places in a competition), and binary (two groups, like “on” and “off”).

![Schematic describing the different types of variables. Variables can be either quantitative or categorical. Quantitative variables can be discrete counts or continuous measurements. Categorical variables can be either ordinal (ordered categories) or nominal (unordered).](./assets/vartypes_article.png)

Let’s dive deeper into the different variable types to understand how to identify them in a dataset.

## Quantitative Variables

Quantitative variables are counts or measurements (e.g., the number of points earned in a game, or a person’s height). They can be used in mathematical operations and quantitative analysis, and are helpful for answering questions like “How many/much?”, “What is the average?”, or “How often?”. There are two types of quantitative variables: discrete and continuous.

### Discrete Variables

Discrete variables represent counts and are recorded as whole numbers (also called integers). For example, it doesn’t make sense to say “I have 2.3 children,” and you can’t make 4.1 coin flips. The number of children and the number of coin flips are both examples of discrete variables.

### Continuous Variables

Continuous variables are measurements that can take on fractional values. Most measurements are continuous and can be represented with decimal numbers (also called floats). For example, a bag of sugar could weigh 1.2 kg or a building could be 10.5 meters tall. A number stored as a decimal is most likely continuous. However, some software programs convert integers into decimals, so it is always the responsibility of the analyst to know which type a variable really is.
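
As a rough illustration of that last point, the sketch below checks whether a column that is stored as floats actually contains only whole numbers. The column names and values are hypothetical, and the check is only a heuristic: a continuous measurement can happen to contain nothing but whole values, so the result is a hint for the analyst rather than a verdict.

```python
# Hypothetical column values: counts that some tool exported as floats.
num_children = [2.0, 0.0, 3.0, 1.0]
# Hypothetical measurements that genuinely have fractional parts.
height_m = [10.5, 3.2, 7.0, 12.75]


def looks_discrete(values):
    """Return True if every value is a whole number, whatever its stored type."""
    return all(float(v).is_integer() for v in values)


print(looks_discrete(num_children))  # True
print(looks_discrete(height_m))      # False
```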

## Categorical Variables

Categorical variables differ from quantitative variables in that they focus on the different ways data can be grouped rather than counted or measured. With categorical variables, we want to understand how the observations in our dataset can be grouped and separated from one another based on their attributes. When the groupings have a specific order or ranking, the variable is an ordinal categorical variable. If there is no apparent order or ranking to the categories, we refer to the variable as a nominal categorical variable.

### Ordinal Variables

Have you ever worked with a column whose values are groups that are inherently greater or lesser than one another? Suppose a variable contains responses to the prompt “Rate your agreement with the statement: The minimum age to drive should be lowered.” The response options are “strongly disagree”, “disagree”, “neutral”, “agree”, and “strongly agree”. Because the options follow an order of increasing agreement, `"strongly disagree"` < `"disagree"` < `"neutral"` < `"agree"` < `"strongly agree"`, we consider the variable to be ordinal.

Other examples of ordinal variables could be the standings in a sporting competition, age ranges, and customer ratings of a product or service.


| | place | company_seniority | age_group | customer_rating |
| -- | -- | -- | -- | -- |
| A. Jacobs | 1 | junior | 20-25 | very_satisfied |
| McKensey | 3 | senior | 35-40 | satisfied |
| Joel | 7 | executive | 30-35 | satisfied |
| Barry | 4 | mid | 50-55 | very_satisfied |

It is important to keep in mind when working with ordinal variables that the differences between categories can vary. We can see in the table above that each age group in the `age_group` column has a range of five years, and so groups are evenly spaced apart. However, the same logic cannot be applied to the `customer_rating` variable. Here it is not accurate to assume that the difference between `"satisfied"` and `"very_satisfied"` is the same as the difference between `"dissatisfied"` and `"very_dissatisfied"`. This is a key difference between ordinal variables and discrete quantitative variables.
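
One way to make an ordering explicit in code is to map each category to its rank. The sketch below uses the agreement scale from the survey example earlier in this section, with a hypothetical list of responses. Because ranks only encode order, it sticks to operations that depend on order alone (sorting, comparing, taking a median) and avoids averaging the ranks, which would assume the categories are evenly spaced.

```python
from statistics import median_low

# Response options listed from lowest to highest agreement.
AGREEMENT_ORDER = ["strongly disagree", "disagree", "neutral", "agree", "strongly agree"]

# Map each category to its position in the ordering.
rank = {label: i for i, label in enumerate(AGREEMENT_ORDER)}

# Hypothetical survey responses.
responses = ["agree", "strongly disagree", "neutral", "agree"]

# Sorting by rank respects the ordinal order (alphabetical sorting would not).
sorted_responses = sorted(responses, key=rank.__getitem__)
print(sorted_responses)
# ['strongly disagree', 'neutral', 'agree', 'agree']

# Comparisons between categories are meaningful.
print(rank["disagree"] < rank["neutral"])  # True

# median_low always returns one of the observed ranks, so it maps back to a category.
median_response = AGREEMENT_ORDER[median_low(rank[r] for r in responses)]
print(median_response)  # neutral
```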

### Nominal Variables

Nominal categorical variables are those variables with two or more categories that do not have any relational order. Examples of nominal categories could be states in the U.S., brands of computers, or ethnicities. Notice how for each of these variables, there is no intrinsic ordering that distinguishes a category as greater than or less than another category.

|  | pet_type | color | favorite_food | adoption_city |
| -- | -- | -- | -- | -- |
| Fluffy | cat | orange | fish_pate | Sacramento |
| Bruno | dog | white | peanut_butter | Mt. Shasta |
| Alfie | bird | blue | sunflower_seed | San Francisco |
| Bitsy | turtle | green | apple | Los Angeles |

The number of possible values for a nominal variable can be quite large. It’s even possible that a nominal categorical variable will take on a unique value for every observation in a dataset, like in the case of unique identifiers such as `name` or `email_address`.
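
A quick way to spot identifier-like columns is to count distinct values per column and compare that count with the number of rows. The records below are hypothetical and exist only to illustrate the idea; in a very small dataset, an ordinary nominal column can also turn out to be unique in every row, so treat the result as a hint.

```python
# Hypothetical records; the column names are illustrative.
records = [
    {"email_address": "a@example.com", "pet_type": "cat"},
    {"email_address": "b@example.com", "pet_type": "dog"},
    {"email_address": "c@example.com", "pet_type": "cat"},
    {"email_address": "d@example.com", "pet_type": "bird"},
]


def cardinality(rows, column):
    """Count the distinct values that a column takes across all rows."""
    return len({row[column] for row in rows})


for column in ("email_address", "pet_type"):
    n_unique = cardinality(records, column)
    # A column with one distinct value per row behaves like a unique identifier.
    print(column, n_unique, n_unique == len(records))
# email_address 4 True
# pet_type 3 False
```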

Sometimes, identifying a nominal variable can be tricky if that variable has attributes that are ordinal or quantitative. For example, the `adoption_city` variable above is nominal; however, we could assign an ordering to `adoption_city` based on a city-specific attribute like yearly average temperature. We might do this if we want to build a model to predict whether a particular animal will be adopted and believe that temperature is relevant to that prediction. If temperature is the only thing we care about with respect to adoption city, we could assign an order to `adoption_city` based on temperature (and rename the variable to something like `adoption_city_temp`). Alternatively, we could create a new ordinal variable in our dataset named `city_rel_temp`, which is completely dependent on `adoption_city`, as shown below:

| adoption_city | city_rel_temp |
| -- | -- |
| Sacramento | cool |
| Mt. Shasta | coldest |
| San Francisco | warm |
| Los Angeles | warmest |

Now we can see that `city_rel_temp` is an ordinal variable because its categories follow an order based on the temperatures of the cities.
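
The derivation above can be sketched in plain Python. The city-to-label mapping is copied from the table; the `TEMP_ORDER` list is an assumption about how the four labels rank from coldest to warmest.

```python
# Mapping copied from the table above.
CITY_REL_TEMP = {
    "Sacramento": "cool",
    "Mt. Shasta": "coldest",
    "San Francisco": "warm",
    "Los Angeles": "warmest",
}

# Assumed order of the labels, from coldest to warmest.
TEMP_ORDER = ["coldest", "cool", "warm", "warmest"]

adoption_city = ["Sacramento", "Mt. Shasta", "San Francisco", "Los Angeles"]

# Derive the new ordinal column from the nominal one.
city_rel_temp = [CITY_REL_TEMP[city] for city in adoption_city]
print(city_rel_temp)
# ['cool', 'coldest', 'warm', 'warmest']

# The derived labels give the cities an order that the city names alone lack.
cities_cold_to_warm = sorted(
    adoption_city, key=lambda city: TEMP_ORDER.index(CITY_REL_TEMP[city])
)
print(cities_cold_to_warm)
# ['Mt. Shasta', 'Sacramento', 'San Francisco', 'Los Angeles']
```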

### Binary Variables

Binary or dichotomous variables are a special kind of nominal variable that has only two categories. The two categories are mutually exclusive: each observation belongs to exactly one of them. For example, imagine a variable that records whether a picture contains a cat or a dog. If every picture shows exactly one of the two animals, then a picture that is not a dog must be a cat, and vice versa. Binary variables can also be encoded with numbers, similar to bits, using `0` and `1` values. Likewise, you may find binary variables containing boolean values of `True` or `False`, or paired labels such as `Yes` and `No`.

|  | status | is_awake | is_stable |
| -- | -- | -- | -- |
| patient_101 | 1 | True | No |
| patient_304 | 0 | True | Yes |
| patient_107 | 1 | False | No |
| patient_514 | 1 | False | Yes |
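
Since the same two-category idea shows up under several encodings, it can help to normalize them before analysis. The helper below is a hypothetical sketch, not a standard function. It converts the three encodings used in the table above into Python booleans. Treating `1` and `Yes` as `True` is a convention chosen here, so check what each code means in your own data.

```python
def to_bool(value):
    """Convert a 0/1, True/False, or Yes/No style value to a Python bool."""
    if isinstance(value, str):
        text = value.strip().lower()
        if text in {"yes", "true", "1"}:
            return True
        if text in {"no", "false", "0"}:
            return False
    elif value in (0, 1):  # also matches True/False, since bool is a subclass of int
        return bool(value)
    raise ValueError(f"not a recognized binary value: {value!r}")


# First row of the table above.
patient_101 = {"status": 1, "is_awake": True, "is_stable": "No"}

normalized = {column: to_bool(value) for column, value in patient_101.items()}
print(normalized)
# {'status': True, 'is_awake': True, 'is_stable': False}
```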


<style>
:checked + span {
    color: red;
}
.correct_answer:checked + span {
    color: green;
}
</style>

Now that you’ve learned about variable types, test your knowledge by answering the following questions:

1. Which variable type contains numerical measurements in decimal form?

    <div>
        <label>
            <input type="radio" name="quiz1">
            <span>
                Nominal categorical
            </span>
        </label>
    </div>
    <div>
        <label>
            <input class="correct_answer" type="radio" name="quiz1">
            <span>
                Continuous quantitative
            </span>
        </label>
    </div>
    <div>
        <label>
            <input type="radio" name="quiz1">
            <span>
                Ordinal categorical
            </span>
        </label>
    </div>
    <div>
        <label>
            <input type="radio" name="quiz1">
            <span>
                Discrete quantitative
            </span>
        </label>
    </div>

2. The `status` variable records whether a person is asthmatic. The value `0` = “Not Asthmatic” and `1` = “Asthmatic”. What type of variable is the `status` variable?

    <div>
        <label>
            <input class="correct_answer" type="radio" name="quiz2">
            <span>
                Binary categorical
            </span>
        </label>
    </div>
    <div>
        <label>
            <input type="radio" name="quiz2">
            <span>
                Discrete quantitative
            </span>
        </label>
    </div>
    <div>
        <label>
            <input type="radio" name="quiz2">
            <span>
                Ordinal categorical
            </span>
        </label>
    </div>

3. Assign the appropriate variable types to the following variables from the dataset:

    | passenger_name | num_checked_bags | purchased_online | would_recommend | ticket_price |
    | -- | -- | -- | -- | -- |
    | joe | 3 | True | strongly_disagree | 1200.69 |
    | diane | 1 | True | strongly_agree | 2213.05 |
    | katy | 0 | True | neutral | 3078.58 |
    | sylvester | 2 | False | strongly_agree | 941.02 |

    A. Nominal categorical: ___

    B. Continuous quantitative: ___

    C. Ordinal categorical: ___

    D. Discrete quantitative: ___

    E. Binary categorical: ___

    - `num_checked_bags`
    - `passenger_name`
    - `purchased_online`
    - `ticket_price`
    - `would_recommend`
    
    <details>
        <summary>Answer</summary>

    A. Nominal categorical: `passenger_name`

    B. Continuous quantitative: `ticket_price`

    C. Ordinal categorical: `would_recommend`

    D. Discrete quantitative: `num_checked_bags`

    E. Binary categorical: `purchased_online`
    </details>