---
# Variables in Statistics

> The properties with varying values we call **`variables`**. The height property in our dataset is an example of a variable. In fact, all the properties described in our dataset are variables.

***
## Quantitative and Qualitative Variables


Variables in statistics can describe either `quantities`, or `qualities`.

For instance, the `Height` variable in our dataset describes *how tall* each player is. The `Age` variable describes *how much* time has passed since each player was born. The `MIN` variable describes *how many* minutes each player played in the 2016-2017 WNBA season.

Generally, a variable that describes *how much* there is of something describes a quantity, and, for this reason, it's called a **quantitative variable**.

Variables that describe qualities are called **qualitative variables** or **categorical variables**. Generally, qualitative variables describe *what* or *how* something is. Usually, qualitative variables describe qualities using words, but numbers can also be used.
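
As a rough first pass, the data type pandas infers for a column hints at the kind of variable it holds. The sketch below uses a small made-up DataFrame (the `Jersey` column and all values are hypothetical, not taken from `wnba.csv`) and shows why the hint can't be trusted blindly: numbers can describe qualities too.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Hypothetical mini-dataset, for illustration only.
players = pd.DataFrame({
    'Team': ['DAL', 'LA', 'CON'],   # words describing a quality
    'Height': [183, 185, 170],      # how tall: a quantity
    'Jersey': [23, 0, 20],          # numbers used as labels: still a quality
})

# A numeric dtype only suggests that a column may be quantitative.
dtype_hint = {col: is_numeric_dtype(players[col]) for col in players.columns}
print(dtype_hint)
```

`Jersey` comes back as numeric even though a jersey number doesn't measure *how much* of anything, so the final call still has to be made by reading what the variable describes.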

* We've already created a dictionary named variables. Each variable name is given as a dictionary key.
* If a variable is quantitative, then complete the value of the corresponding key with the string `'quantitative'`. If the variable is qualitative, then use the string `'qualitative'`.

In [6]:
import pandas as pd
wnba = pd.read_csv('wnba.csv')

variables = {'Name': '', 'Team': '', 'Pos': '', 'Height': '', 'BMI': '',
             'Birth_Place': '', 'Birthdate': '', 'Age': '', 'College': '', 'Experience': '',
             'Games Played': '', 'MIN': '', 'FGM': '', 'FGA': '',
             '3PA': '', 'FTM': '', 'FTA': '', 'FT%': '', 'OREB': '', 'DREB': '',
             'REB': '', 'AST': '', 'PTS': ''}

variables = {'Name': 'qualitative', 'Team': 'qualitative', 'Pos': 'qualitative', 'Height': 'quantitative', 'BMI': 'quantitative',
             'Birth_Place': 'qualitative', 'Birthdate': 'quantitative', 'Age': 'quantitative', 'College': 'qualitative', 'Experience': 'quantitative',
             'Games Played': 'quantitative', 'MIN': 'quantitative', 'FGM': 'quantitative', 'FGA': 'quantitative',
             '3PA': 'quantitative', 'FTM': 'quantitative', 'FTA': 'quantitative', 'FT%': 'quantitative', 'OREB': 'quantitative', 'DREB': 'quantitative',
             'REB': 'quantitative', 'AST': 'quantitative', 'PTS': 'quantitative'}
print(variables)

{'Name': 'qualitative', 'Team': 'qualitative', 'Pos': 'qualitative', 'Height': 'quantitative', 'BMI': 'quantitative', 'Birth_Place': 'qualitative', 'Birthdate': 'quantitative', 'Age': 'quantitative', 'College': 'qualitative', 'Experience': 'quantitative', 'Games Played': 'quantitative', 'MIN': 'quantitative', 'FGM': 'quantitative', 'FGA': 'quantitative', '3PA': 'quantitative', 'FTM': 'quantitative', 'FTA': 'quantitative', 'FT%': 'quantitative', 'OREB': 'quantitative', 'DREB': 'quantitative', 'REB': 'quantitative', 'AST': 'quantitative', 'PTS': 'quantitative'}


***
## Scales of Measurement

![Scale](https://s3.amazonaws.com/dq-content/284/s1m2_height_team.svg)

The system of rules that defines how each variable is measured is called a `scale of measurement` or, less often, a `level of measurement`.

***
## The Nominal Scale
<span style="background:white">

![Nominal](https://s3.amazonaws.com/dq-content/284/s1m2_nominal.svg) </span>

The Team variable is an example of a variable measured on a `nominal scale`. For any variable measured on a nominal scale:
* We can tell whether two individuals are different or not (with respect to that variable).
* We can't say anything about the direction and the size of the difference.
* We know that it can only describe qualities.
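
One way to mirror these rules in code is an unordered pandas `Categorical`. This is only a sketch with three made-up team labels, not the way the dataset itself is stored: equality checks work, while a comparison that asks about direction is rejected.

```python
import pandas as pd

# Three team labels for illustration; a Categorical is unordered by default.
teams = pd.Categorical(['DAL', 'LA', 'DAL'])

# We can tell whether individuals differ with respect to Team...
same_as_dal = teams == 'DAL'
print(list(same_as_dal))

# ...but asking about the direction of the difference raises a TypeError.
try:
    teams < 'LA'
    order_allowed = True
except TypeError:
    order_allowed = False
print(order_allowed)
```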


Inspect the dataset, and find the variables measured on a nominal scale. In the code editor:
* Add the variables measured on a nominal scale to a list named nominal_scale, and sort the elements in the list alphabetically (the sorting helps us with answer checking).
* Notice that we've added a new variable named Height_labels. Instead of showing the height in centimeters, the new variable shows labels like "short", "medium", or "tall". Using the principles that characterize the nominal scale, decide whether the new Height_labels variable should be included in your nominal_scale list.

In [8]:
nominal_scale = sorted(['Name', 'Team', 'Pos', 'Birth_Place', 'College'])
print(nominal_scale)
wnba.head(5)

['Birth_Place', 'College', 'Name', 'Pos', 'Team']


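| Unnamed: 0 | Name | Team | Pos | Height | Weight | BMI | Birth_Place | Birthdate | Age | College | ... | OREB | DREB | REB | AST | STL | BLK | TO | PTS | DD2 | TD3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Aerial Powers | DAL | F | 183 | 71.0 | 21.200991 | US | January 17, 1994 | 23 | Michigan State | ... | 6 | 22 | 28 | 12 | 3 | 6 | 12 | 93 | 0 | 0 |
| 1 | Alana Beard | LA | G/F | 185 | 73.0 | 21.329438 | US | May 14, 1982 | 35 | Duke | ... | 19 | 82 | 101 | 72 | 63 | 13 | 40 | 217 | 0 | 0 |
| 2 | Alex Bentley | CON | G | 170 | 69.0 | 23.875433 | US | October 27, 1990 | 26 | Penn State | ... | 4 | 36 | 40 | 78 | 22 | 3 | 24 | 218 | 0 | 0 |
| 3 | Alex Montgomery | SAN | G/F | 185 | 84.0 | 24.543462 | US | December 11, 1988 | 28 | Georgia Tech | ... | 35 | 134 | 169 | 65 | 20 | 10 | 38 | 188 | 2 | 0 |
| 4 | Alexis Jones | MIN | G | 175 | 78.0 | 25.469388 | US | August 5, 1994 | 23 | Baylor | ... | 3 | 9 | 12 | 12 | 7 | 0 | 14 | 50 | 0 | 0 |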


***
## The Ordinal Scale

![Scale](https://s3.amazonaws.com/dq-content/284/s1m2_ordinal_intervals.svg)

Consider the following sentences, and evaluate their truth value. If the sentence is true, then assign True to the corresponding variable (programming variable) in the code editor, otherwise assign False. Make sure you assign boolean values as answers, not strings.

1. Using the Height_labels variable only, we can tell whether player Kiah Stokes is taller than Riquna Williams. Assign your answer to a variable named `question1`.
2. We can measure the height difference between Kiah Stokes and Riquna Williams using the Height_labels variable. Assign your answer to `question2`.
3. The Height_labels and the College variables are both measured on an ordinal scale. Assign your answer to `question3`.
4. The Games Played variable is not measured on an ordinal scale. Assign your answer to `question4`.
5. The Experience variable is measured on an ordinal scale. Assign your answer to `question5`.
6. The Height_labels variable is qualitative because it is measured using words. Assign your answer to `question6`.

In [9]:
question1 = True
question2 = False
question3 = False
question4 = True
question5 = False
question6 = False
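
The reasoning behind the first two answers can be sketched with an ordered pandas `Categorical`. The labels and their order below are assumptions made for illustration, not values read from the dataset: an ordinal scale tells us the direction of a difference, but not its size.

```python
import pandas as pd

# Hypothetical labels for three players; the category order is an assumption.
height_labels = pd.Series(
    pd.Categorical(['tall', 'short', 'medium'],
                   categories=['short', 'medium', 'tall'],
                   ordered=True)
)

# Direction is available: which players are shorter than 'tall'?
shorter_than_tall = height_labels < 'tall'
print(shorter_than_tall.tolist())

# Size is not: subtracting labels is undefined and raises a TypeError.
try:
    height_labels - height_labels
    size_available = True
except TypeError:
    size_available = False
print(size_available)
```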

***
## Interval and Ratio Scales

![Ordinal_Scale](https://s3.amazonaws.com/dq-content/284/s1m2_intervals_not_known.svg)
![RealNumbers](https://s3.amazonaws.com/dq-content/284/s1m2_intervals_eq.svg)

A variable measured on a scale that preserves the order between values and has well-defined intervals using real numbers is an example of a variable measured either on an `interval scale`, or on a `ratio scale`.

***
## The Difference Between Ratio and Interval Scales

What sets apart ratio scales from interval scales is the nature of the zero point.

On a ratio scale, the zero point means no quantity. For example, the `Weight` variable is measured on a **ratio scale**, which means that 0 grams indicates the absence of weight.

On an **interval scale**, however, the zero point doesn't indicate the absence of a quantity. It actually indicates the *presence* of a quantity.

![Ratio](https://s3.amazonaws.com/dq-content/284/s1m2_interval_vs_ratio.svg)
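
A practical consequence is that ratios are meaningful only on a ratio scale, while differences are meaningful on both. The sketch below uses two made-up weights and assumes that a deviation variable such as Weight_deviation is computed as the distance from the mean weight; that definition is an assumption, not something stated above.

```python
# Hypothetical weights in kilograms for two players.
weights = [60.0, 80.0]
mean_weight = sum(weights) / len(weights)

# Assumed definition: deviation = weight - mean weight (zero means "average").
deviations = [w - mean_weight for w in weights]

# Differences agree on both scales.
diff_weight = weights[1] - weights[0]
diff_deviation = deviations[1] - deviations[0]

# Ratios only make sense when zero means "no quantity".
ratio_weight = weights[1] / weights[0]           # a meaningful "times heavier"
ratio_deviation = deviations[1] / deviations[0]  # a number with no such meaning

print(diff_weight, diff_deviation)
print(ratio_weight, ratio_deviation)
```

Both scales report a 20 kg difference, but only the ratio of the raw weights can be read as "one player is about 1.33 times as heavy as the other".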

Examine the various variables of the dataset, and find the ones that are measured on an interval or ratio scale.

* For the variables measured on an interval scale, add their names as strings to a list named interval. Sort the list alphabetically.
* For the variables measured on a ratio scale, add their names as a string to a list named ratio. Sort the list alphabetically.
* We've also added the Weight_deviation variable to the dataset, so make sure you include that one too in one of the lists.

In [17]:
interval = ['Birthdate', 'Weight_deviation']
ratio = sorted(['Height', 'Weight', 'BMI', 'Age', 'Experience', 'Games Played', 'MIN', 'FGM', 'FGA', 'FG%', '15:00', 
                '3PA', '3P%', 'FTM', 'FTA', 'FT%', 'OREB', 'DREB', 'REB', 'AST', 'STL', 'BLK', 'TO',
                'PTS', 'DD2', 'TD3'])
print(' interval:', interval,'\n','ratio:', ratio)

 interval: ['Birthdate', 'Weight_deviation'] 
 ratio: ['15:00', '3P%', '3PA', 'AST', 'Age', 'BLK', 'BMI', 'DD2', 'DREB', 'Experience', 'FG%', 'FGA', 'FGM', 'FT%', 'FTA', 'FTM', 'Games Played', 'Height', 'MIN', 'OREB', 'PTS', 'REB', 'STL', 'TD3', 'TO', 'Weight']


***
## Common Examples of Interval Scales

Generally, points in time are indicated by variables measured on an interval scale. Let's say we want to indicate the point in time of the first manned mission to the Moon. If we want to use a ratio scale, our zero point must be meaningful and denote the absence of time. For this reason, we'd basically have to begin counting at the very beginning of time.

There are many problems with this approach. One of them is that we don't know with precision when time began (assuming time actually has a beginning), which means we don't know how far away in time we are from that zero point.

To overcome this, we can set an arbitrary zero point, and measure the distance in time from there. Customarily, we use the Anno Domini system, where the zero point is arbitrarily set at the moment Jesus was born. Using this system, we can say that the first manned mission to the Moon happened in 1969. This means that the event happened 1968 years after Jesus' birth (1968 because there's no year 0 in the Anno Domini system).

![Scale](https://s3.amazonaws.com/dq-content/284/s1m2_anno_domini.svg)

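Temperature measured in degrees Celsius or Fahrenheit is another common example of an interval scale: the zero point is set by convention, and 0 °C or 0 °F does not mean that there is no temperature. Temperature can be measured on a ratio scale too, and this is done using the Kelvin scale. 0 K (0 kelvin) is not set arbitrarily: it indicates the lack of temperature, and the temperature can't possibly drop below it.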

![Temp](https://s3.amazonaws.com/dq-content/284/s1m2_temperature.svg)
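
To see why the zero point matters, compare the same two temperatures on the Celsius scale (whose zero is set by convention) and on the Kelvin scale. This is a minimal sketch; the helper name `celsius_to_kelvin` and the two readings are made up for illustration.

```python
def celsius_to_kelvin(celsius):
    """Shift a Celsius reading to the Kelvin scale (0 K is -273.15 degrees C)."""
    return celsius + 273.15

cold_c, warm_c = 10.0, 20.0

# On the Celsius scale the ratio looks like "twice as hot"...
naive_ratio = warm_c / cold_c

# ...but on the Kelvin scale, where zero is meaningful, it is only about 3.5% more.
kelvin_ratio = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cold_c)

print(naive_ratio)
print(round(kelvin_ratio, 3))
```

The 10-degree difference is the same on both scales; only the ratio changes, which is exactly what separates an interval scale from a ratio scale.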

***
## Discrete and Continuous Variables
Generally, if there's no possible intermediate value between any two adjacent values of a variable, we call that variable <span style="text-decoration: underline; font-weight: bold"> discrete</span>.

In the diagram below, we consider values between 86 and 87 kg and break the interval down into five equal parts. Then we take two values (86.2 and 86.8) from the interval 86 - 87 and break the interval between them down into five equal parts. Then we repeat the process for the interval 86.2 - 86.8. In fact, we could repeat the process infinitely.
![parts](https://s3.amazonaws.com/dq-content/284/s1m2_infinity_in_finity.svg)

Generally, if there's an infinity of values between any two values of a variable, we call that variable <span style="text-decoration: underline; font-weight: bold">continuous</span>.
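
The subdivision idea can be sketched in a few lines. The helper `split_interval` is a hypothetical name introduced here, and exact fractions are used only to keep floating-point noise out of the printed boundaries.

```python
from fractions import Fraction

def split_interval(low, high, parts=5):
    """Return the boundaries that cut [low, high] into `parts` equal pieces."""
    step = (high - low) / parts
    return [low + i * step for i in range(parts + 1)]

# First split 86 - 87 kg, then split the narrower interval 86.2 - 86.8 kg.
level_1 = split_interval(Fraction(86), Fraction(87))
level_2 = split_interval(Fraction('86.2'), Fraction('86.8'))

print([float(x) for x in level_1])
print([float(x) for x in level_2])
```

Every interval produced this way can be split again, which is why a continuous variable has no "adjacent" values.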


In [18]:
print(wnba['Height'].head())

0    183
1    185
2    170
3    185
4    175
Name: Height, dtype: int64


For every variable, indicate whether it is continuous or discrete.

* In the code editor, we've already extracted for you the names of the variables that are measured on ratio and interval scales. Every variable name is registered as a dictionary key.
* If a variable is discrete, then assign the string 'discrete' to its corresponding dictionary key.
* If a variable is continuous, then assign the string 'continuous' to its corresponding dictionary key.

In [None]:
ratio_interval_only = {'Height':'', 'Weight': '', 'BMI': '', 'Age': '', 'Games Played': '', 'MIN': '', 'FGM': '',
                       'FGA': '', 'FG%': '', '3PA': '', '3P%': '', 'FTM': '', 'FTA': '', 'FT%': '',
                       'OREB': '', 'DREB': '', 'REB': '', 'AST': '', 'STL': '', 'BLK': '', 'TO': '',
                       'PTS': '', 'DD2': '', 'TD3': '', 'Weight_deviation': ''}
ratio_interval_only = {'Height': 'continuous', 'Weight': 'continuous', 'BMI': 'continuous', 'Age': 'continuous',
                       'Games Played': 'discrete', 'MIN': 'continuous', 'FGM': 'discrete',
                       'FGA': 'discrete', 'FG%': 'continuous', '3PA': 'discrete', '3P%': 'continuous',
                       'FTM': 'discrete', 'FTA': 'discrete', 'FT%': 'continuous', 'OREB': 'discrete',
                       'DREB': 'discrete', 'REB': 'discrete', 'AST': 'discrete', 'STL': 'discrete',
                       'BLK': 'discrete', 'TO': 'discrete', 'PTS': 'discrete', 'DD2': 'discrete', 
                       'TD3': 'discrete', 'Weight_deviation': 'continuous'}

***
## Real Limits

Generally, every value of a continuous variable is an interval, no matter how precise the value is. The boundaries of the interval are sometimes called real limits. The lower boundary of the interval is called the <span style="text-decoration: underline; font-weight: bold">lower real limit</span>, and the upper boundary is called the <span style="text-decoration: underline; font-weight: bold">upper real limit</span>.

![span](https://s3.amazonaws.com/dq-content/284/s1m2_real_limits.svg)
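
Real limits sit half a unit of the recorded precision below and above the value. A minimal sketch of that rule follows; the helper name `real_limits` is hypothetical, and `Decimal` is used so the half-unit arithmetic stays exact.

```python
from decimal import Decimal

def real_limits(value, decimals):
    """Return [lower, upper] real limits of a value recorded to `decimals` places."""
    half_unit = Decimal(10) ** -decimals / 2   # half of the last recorded digit
    value = Decimal(str(value))
    return [float(value - half_unit), float(value + half_unit)]

print(real_limits(88, 0))
print(real_limits(21.201, 3))
```

With zero decimals the half unit is 0.5, and with three decimals it is 0.0005.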

In the figure above, we can see, for example, that 88.5 is halfway between 88 and 89. If we got a measurement of 88.5 kg in practice but wanted only integers in our dataset (zero decimal places of precision), we might wonder whether to assign the value to 88 or 89 kg. The answer is that 88.5 kg is exactly halfway between 88 and 89 kg, so it doesn't inherently belong to either value. The assignment depends only on how we choose to round: if we round up, 88.5 kg is assigned to 89 kg; if we round down, it is assigned to 88 kg.
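
The effect of the rounding rule on a value that sits exactly on a real limit can be checked directly. The sketch below uses Python's `decimal` module to make the rule explicit, and also shows the built-in `round`, which breaks ties toward the nearest even integer.

```python
from decimal import Decimal, ROUND_HALF_UP, ROUND_HALF_DOWN

measurement = Decimal('88.5')

# Explicit tie-breaking rules: the same measurement lands on different integers.
rounded_up = measurement.quantize(Decimal('1'), rounding=ROUND_HALF_UP)
rounded_down = measurement.quantize(Decimal('1'), rounding=ROUND_HALF_DOWN)

# Python's built-in round() sends ties to the nearest even integer.
builtin = round(88.5)

print(rounded_up, rounded_down, builtin)
```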

Find the real limits for five values of the BMI (body mass index) variable.

* We've already extracted the first five BMI values in the dataset and rounded each off to a precision of three decimal places. We stored the values as dictionary keys in a dictionary named bmi.
* For every BMI value, write its real limits in a list and assign that list as the value of the matching dictionary key. The lower real limit should come first in each list. For example:

```python
bmi = {20: [19.5, 20.5],
 21: [20.5, 21.5],
 23: [22.5, 23.5],
 24: [23.5, 24.5],
 22: [21.5, 22.5]}
 ```

In [19]:
bmi = {21.201: [],
 21.329: [],
 23.875: [],
 24.543: [],
 25.469: []}
bmi = {21.201: [21.2005, 21.2015],
 21.329: [21.3285, 21.3295],
 23.875: [23.8745, 23.8755],
 24.543: [24.5425, 24.5435],
 25.469: [25.4685, 25.4695]}

***
![Next](https://s3.amazonaws.com/dq-content/284/s1m2_second_step_done.svg)