# Data Variables:
Data Variables are values that represent some characteristic or property of an object. Some of these variables can be counted or measured, while others only assign the object to a category. Data Variables in a given dataset can be categorized into different types, as illustrated in the figure:

![image.png](attachment:image.png)

Let's discuss these one by one, with examples:

# Numeric Variables:
Numeric Variables are data variables that use numbers to represent a particular characteristic of an instance. Numeric variables can be categorized into two main types: Discrete and Continuous.

## Discrete Variables:
Discrete Variables are numeric variables that take only separate, countable values, typically integers, rather than any value within a range. Discrete Variables can further be divided into two types: Quantitative Discrete Variables and Qualitative Discrete Variables.

### Qualitative Variables:
A qualitative variable is a variable whose values represent some sort of categorization. A discrete qualitative variable is thus a discrete variable that is qualitative in nature and whose values represent categories. Although numbers are used to store these values, they are only an encoded version of the qualitative categories, i.e. they do not support meaningful mathematical operations and do not measure or count anything. Qualitative Discrete Variables can be classified into two types: dichotomous and multichotomous.

#### Dichotomous Variables:
Dichotomous Variables, or simply Binary Variables, are qualitative discrete variables with exactly two categories. They appear in many machine learning problems, most commonly as the target of a binary classifier.

An example of a dichotomous variable is gender, i.e. a person being either male or female; another example is the answer to a true/false question, i.e. either true or false. Since we are discussing numeric variables, these values are usually encoded as 0s and 1s in a dataset (though other numbers can also be used).
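The 0/1 encoding can be sketched in plain Python as follows. This is a minimal illustration assuming the answers arrive as the strings "true" and "false"; the names `answers` and `BINARY_CODES` are hypothetical.

```python
# Hypothetical true/false answers collected from a questionnaire
answers = ["true", "false", "true", "true", "false"]

# Encode the two categories as 1 and 0; the numbers are labels, not quantities
BINARY_CODES = {"false": 0, "true": 1}
encoded = [BINARY_CODES[a] for a in answers]

print(encoded)  # [1, 0, 1, 1, 0]
```

Which category gets 0 and which gets 1 is an arbitrary choice; swapping them would describe exactly the same data.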

#### Multichotomous Variables:
Multichotomous Variables are qualitative discrete variables with three or more categories. These variables are also very common in machine learning problems, most commonly as the target of a multi-class classification problem. Multichotomous Variables can be further categorized into two types: Nominal and Ordinal.

##### Nominal Variables:
Nominal Variables are multichotomous variables with no defined order or rank, i.e. the values stored in such variables carry equal importance in terms of the defined characteristic.

A common example of a numeric nominal variable is a country code. Each country has a different code, represented by a numeric value, but no country ranks above another because of its code value, and country codes do not support mathematical operations, since they are qualitative data disguised as numeric values.

Numeric nominal data usually appears either as identifiers like country codes or as numerically encoded versions of non-numeric categories, e.g. a person's favourite color.
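The sketch below illustrates why numeric nominal codes should be treated as labels. It assumes international dialling codes (+92 Pakistan, +91 India, +1 United States) as the country codes, and the list of callers is made-up data: equality checks and frequency counts are meaningful, while the mean is just a number with no interpretation.

```python
from collections import Counter

# Hypothetical callers identified by dialling code: Pakistan (92), India (91), US (1)
callers = [92, 91, 1, 92, 92, 1]

# Arithmetic runs, but an "average country code" has no meaning
mean_code = sum(callers) / len(callers)

# Operations that respect nominal data: equality checks and frequency counts
from_pakistan = callers.count(92)
most_common_code, count = Counter(callers).most_common(1)[0]

print(mean_code)         # 61.5, a meaningless number for nominal data
print(from_pakistan)     # 3
print(most_common_code)  # 92
```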

##### Ordinal Variables:
Ordinal Variables are multichotomous variables with a defined order/rank, i.e. each value ranks above or below the other values and can be compared with them. Ordinal variables carry a rank with each value, but the difference between these ranks is not defined, i.e. we know that excellent performance is better than very good, but not by how much. A common example of a numeric ordinal variable is a student's grade.

The grade may differ from student to student, but each grade has a rank, e.g. a student with grade A is considered better at the subject than a student with grade C. The same goes for a student's performance. When these ranks are stored as numbers, they can be placed on a scale with a minimum and a maximum value.

For example, we can say that a student with 5/10 marks in a test performed fair, one with 2/10 performed poor, and one with 9/10 performed good (these can be split into a finer scale by adding categories like worse, very good, excellent, etc.).
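A minimal sketch of turning marks into ordered performance levels, assuming cut-offs of 4 and 7 out of 10 (the text gives only the three example scores, so these thresholds and the student names are hypothetical). The rank is simply the position in an ordered list, which supports comparison but says nothing about the size of the gap between levels.

```python
# Performance levels in ascending order; list position gives the rank
PERFORMANCE_LEVELS = ["poor", "fair", "good"]


def performance(marks: int) -> str:
    """Map a score out of 10 to a performance level (cut-offs are illustrative)."""
    if marks < 4:
        return "poor"
    if marks < 7:
        return "fair"
    return "good"


def rank(level: str) -> int:
    """Return the position of a level in the ordered list."""
    return PERFORMANCE_LEVELS.index(level)


levels = {name: performance(m) for name, m in {"A": 5, "B": 2, "C": 9}.items()}
print(levels)  # {'A': 'fair', 'B': 'poor', 'C': 'good'}

# Comparing ranks is meaningful...
print(rank(levels["C"]) > rank(levels["A"]))  # True
# ...but subtracting them is not: "good" minus "fair" has no defined size
```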

### Quantitative Variables:
A quantitative variable is a variable that can be measured or counted and represents some measured quantity of an instance. A discrete quantitative variable is an integer value that represents a measured or counted quantity as opposed to a category. Thus, a quantitative discrete variable is ordered just like an ordinal variable, but here the difference between two values is known, e.g. if one basket has 40 apples and another has 20 apples, the second basket ranks below the first and the difference between them is known, i.e. 20 apples. Quantitative variables also support mathematical operations.

A common example of a quantitative variable is the marks scored by a student, e.g. student A obtains 87 marks in the test whereas student B obtains 54 marks. Other examples include the number of sales of a product, the number of books written by an author, etc.
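By contrast with qualitative codes, quantitative values support arithmetic directly. A short illustration using the figures from the examples above (the variable names are only for illustration):

```python
# Marks of students A and B, and the two apple baskets from the examples
marks = {"A": 87, "B": 54}
baskets = [40, 20]

difference = marks["A"] - marks["B"]        # a meaningful gap of 33 marks
average = sum(marks.values()) / len(marks)  # a meaningful mean of 70.5 marks
apple_gap = baskets[0] - baskets[1]         # 20 apples

print(difference, average, apple_gap)  # 33 70.5 20
```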

An interesting property of quantitative values such as these is that they can be converted into qualitative values by grouping them into defined ranges, an operation known as discretization that is pretty common in machine learning problems. An example would be assigning grades to the marks obtained by students, i.e. grade A+ for 92-100, A for 85-91, etc. Although this works for discrete variables, the technique is usually applied to continuous variables.
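Discretization can be sketched with Python's standard `bisect` module, which finds the band a mark falls into from a sorted list of lower bounds. Only the A+ and A bands come from the text; the catch-all "below A" label is a placeholder for the unspecified lower grades. In pandas, `pd.cut` expresses the same idea for a whole column.

```python
from bisect import bisect_right

# Lower bounds of each grade band, in ascending order.
# A+ (92-100) and A (85-91) follow the text; "below A" is a placeholder
# for the remaining grades, which are not specified here.
CUTOFFS = [85, 92]
GRADES = ["below A", "A", "A+"]


def to_grade(marks: int) -> str:
    """Discretize a mark out of 100 into a grade label."""
    # bisect_right counts how many cut-offs are <= marks, i.e. the band index
    return GRADES[bisect_right(CUTOFFS, marks)]


for m in [100, 92, 91, 85, 84, 54]:
    print(m, to_grade(m))
```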

## Continuous Variables:
Continuous Variables are numeric variables that can take any value between two extremes, with (in principle) infinite precision. Continuous variables are always quantitative in nature, so they carry the same properties discussed for quantitative discrete variables, i.e. they are measurable, ordered, support mathematical operations and have a known difference between values. Continuous variables could also be classified as finite or infinite based on their precision, but that usually doesn't matter much in data analysis; instead, we classify continuous variables by the scale used to measure them: interval scale and ratio scale.

### Interval Scale Variables:
Interval Scale Variables are continuous variables that do not have an absolute zero, i.e. the zero point of their scale is chosen by convention rather than marking a true absence of the quantity. This means that a value of 0 in one unit of measurement is generally not 0 in another unit. Interval Scale variables therefore always need a unit of measurement when they are represented.

For example, consider measuring temperature, which can be done in Celsius, Fahrenheit or Kelvin. Celsius being the most common, say one room is at 20 degrees Celsius and another at 10 degrees Celsius. The number 20 is twice 10, but since the Celsius scale has no absolute zero, we cannot say that the first room is twice as hot as the second: in Fahrenheit the same rooms are at 68 and 50 degrees, a ratio of 1.36 rather than 2. Differences, on the other hand, are meaningful: a 10 degree Celsius gap always corresponds to an 18 degree Fahrenheit gap. (Kelvin is the exception, since it starts at absolute zero and is therefore a ratio scale.)
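The temperature example can be checked numerically with the standard conversion formulas (°F = °C × 9/5 + 32, K = °C + 273.15). The sketch shows that the ratio between two readings depends on the unit, while the difference between them converts consistently.

```python
def celsius_to_fahrenheit(c: float) -> float:
    return c * 9 / 5 + 32


def celsius_to_kelvin(c: float) -> float:
    return c + 273.15


warm, cool = 20.0, 10.0

# The ratio of the raw Celsius numbers is 2, but it changes with the unit
print(warm / cool)                                                # 2.0
print(celsius_to_fahrenheit(warm) / celsius_to_fahrenheit(cool))  # 1.36 (68 / 50)

# Differences convert consistently: a 10 degree C gap is an 18 degree F gap
print(celsius_to_fahrenheit(warm) - celsius_to_fahrenheit(cool))  # 18.0

# Kelvin starts at absolute zero, so its ratio is the physically meaningful one
print(celsius_to_kelvin(warm) / celsius_to_kelvin(cool))
```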

### Ratio Scale Variables:
Ratio Scale Variables are continuous variables that have an absolute zero, i.e. a zero point that marks a true absence of the quantity. This means that if a variable has a value of zero, it is zero in every unit of measurement, and ratios between values are meaningful.

For example, consider measuring money. Money can be measured in different currencies, e.g. Dollars, Rupees, Pounds, etc. If someone has no money, then regardless of which currency they use, they would still be considered bankrupt (since 0 USD = 0 PKR = 0 GBP = 0 in any other currency).
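The same kind of check for a ratio scale, converting dollars to rupees. The exchange rate here is a hypothetical constant chosen for illustration, not a real quote; what matters is that conversion is a pure multiplication, so zero stays zero and ratios are preserved.

```python
# Hypothetical exchange rate for illustration only (not a real quote)
PKR_PER_USD = 280.0


def usd_to_pkr(amount_usd: float) -> float:
    return amount_usd * PKR_PER_USD


savings_usd = [0.0, 100.0, 200.0]
savings_pkr = [usd_to_pkr(x) for x in savings_usd]

# Zero stays zero in every currency
print(savings_pkr[0])  # 0.0

# Ratios survive the change of unit: 200 USD is twice 100 USD, in PKR too
print(savings_usd[2] / savings_usd[1])  # 2.0
print(savings_pkr[2] / savings_pkr[1])  # 2.0
```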

# Non-Numeric Variables:
Non-Numeric variables are variables that use non-numeric values to represent a particular characteristic of an instance. Non-numeric variables can be divided into two categories: Categorical and Non-Categorical.

## Categorical Variables:
Non-numeric categorical variables, just like discrete qualitative variables, represent different categories of values; the only difference is that the values used for categorization are non-numeric rather than numeric. These variables are divided into the same types:

* Dichotomous - With two non-numeric classes e.g. Male and Female in gender, etc.
* Nominal - With unordered non-numeric classes e.g. America, Pakistan, India in country names, etc.
* Ordinal - With ordered non-numeric classes e.g. Poor, Fair, Good in performance, etc.
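When non-numeric categories are prepared for analysis, the distinction between nominal and ordinal matters. The sketch below, using the example labels from the list above, shows that ordinal labels need an explicit rank mapping (here a hypothetical `PERFORMANCE_RANK` dictionary), because their alphabetical order is not their real order, while nominal labels can receive any consistent arbitrary codes.

```python
# Ordinal labels sort alphabetically by default, which ignores their real order
ratings = ["Good", "Poor", "Fair", "Good"]
print(sorted(ratings))  # ['Fair', 'Good', 'Good', 'Poor']

# An explicit rank mapping restores the intended order Poor < Fair < Good
PERFORMANCE_RANK = {"Poor": 0, "Fair": 1, "Good": 2}
print(sorted(ratings, key=PERFORMANCE_RANK.get))  # ['Poor', 'Fair', 'Good', 'Good']

# Nominal labels get arbitrary codes; any consistent assignment works
countries = ["America", "Pakistan", "India", "Pakistan"]
codes = {name: i for i, name in enumerate(sorted(set(countries)))}
print([codes[c] for c in countries])  # [0, 2, 1, 2]
```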

## Non-Categorical Variables:
Non-categorical variables are non-numeric variables that are not meant to represent categories, but instead hold free-form non-numeric data for each instance. These variables usually need some processing before they yield insights, often turning them into one of the variable types mentioned above.

For example, a phone number of the format +xx xxx-xxx-xxxx is a non-categorical variable, but with the necessary processing it can be used to retrieve some information, e.g. we can extract the first two digits to get the country code, which then becomes a categorical variable useful for our analysis.
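A minimal sketch of that processing step using a regular expression, assuming numbers follow exactly the +xx xxx-xxx-xxxx format with a two-digit country code; the sample numbers are made up. Real country calling codes are one to three digits long, so a general-purpose parser would need more than this pattern.

```python
import re
from typing import Optional

# Matches the +xx xxx-xxx-xxxx format described above (two-digit country code)
PHONE_PATTERN = re.compile(r"^\+(\d{2}) \d{3}-\d{3}-\d{4}$")


def country_code(phone: str) -> Optional[str]:
    """Return the two-digit country code, or None if the format does not match."""
    match = PHONE_PATTERN.match(phone)
    return match.group(1) if match else None


# Hypothetical numbers for illustration
phones = ["+92 300-123-4567", "+44 207-946-0958", "not a number"]
print([country_code(p) for p in phones])  # ['92', '44', None]
```

The extracted codes can then be analysed as a categorical (nominal) variable.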

Other examples of these variables are website URLs, long text passages, sets of values, CNIC numbers, product descriptions, etc.

That's it for today! Tomorrow, we will look at how to identify these kinds of variables in real datasets using the classification above.