# _Variables in Statistics_

This notebook was adapted from Dataquest's second lesson on statistics, titled _Variables in Statistics_. In this lesson/notebook, we'll focus on understanding the structural parts of a data set and how they are measured.

**Note**: throughout this notebook, 'variable' is used in its statistical sense only.

### _Quantitative and Qualitative Variables_

**Notes**
- variables can be described in terms of either quantities or qualities
- quantitative variable = describes how much there is of something (i.e. a quantity)
- qualitative/categorical variable = describes what or how something is
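
As a rough illustration of the distinction above, the sketch below builds a tiny frame from the `Team` and `Height` values of the first five rows of the data set. The name `sample` is an assumption made for this example and is not part of the original notebook. Averaging makes sense for the quantitative column, while the qualitative column can only be counted.

```python
import pandas as pd

# Team and Height values copied from the first five rows of the wnba data
sample = pd.DataFrame({
    'Team': ['DAL', 'LA', 'CON', 'SAN', 'MIN'],
    'Height': [183, 185, 170, 185, 175],
})

# quantitative: arithmetic such as the mean is meaningful
mean_height = sample['Height'].mean()

# qualitative: we can only count how often each quality occurs
team_counts = sample['Team'].value_counts()

print(mean_height)
print(team_counts)
```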


In [1]:
# import library
import pandas as pd
pd.options.display.max_columns = None

# load in data
wnba = pd.read_csv('data/wnba.csv')

# create dictionary storing each variable and whether or not it is quantitative or qualitative
variables = {'Name': 'qualitative', 'Team': 'qualitative', 'Pos': 'qualitative', 'Height': 'quantitative', 'BMI': 'quantitative',
             'Birth_Place': 'qualitative', 'Birthdate': 'quantitative', 'Age': 'quantitative', 'College': 'qualitative', 'Experience': 'quantitative',
             'Games Played': 'quantitative', 'MIN': 'quantitative', 'FGM': 'quantitative', 'FGA': 'quantitative',
             '3PA': 'quantitative', 'FTM': 'quantitative', 'FTA': 'quantitative', 'FT%': 'quantitative', 'OREB': 'quantitative', 'DREB': 'quantitative',
             'REB': 'quantitative', 'AST': 'quantitative', 'PTS': 'quantitative'}

In [2]:
variables

{'Name': 'qualitative',
 'Team': 'qualitative',
 'Pos': 'qualitative',
 'Height': 'quantitative',
 'BMI': 'quantitative',
 'Birth_Place': 'qualitative',
 'Birthdate': 'quantitative',
 'Age': 'quantitative',
 'College': 'qualitative',
 'Experience': 'quantitative',
 'Games Played': 'quantitative',
 'MIN': 'quantitative',
 'FGM': 'quantitative',
 'FGA': 'quantitative',
 '3PA': 'quantitative',
 'FTM': 'quantitative',
 'FTA': 'quantitative',
 'FT%': 'quantitative',
 'OREB': 'quantitative',
 'DREB': 'quantitative',
 'REB': 'quantitative',
 'AST': 'quantitative',
 'PTS': 'quantitative'}

### _Scales of Measurement_

**Notes**
- scale of measurement: the system of rules that defines how each variable is measured

### _Nominal Scale_

**Notes**
- the `Team` variable from the `wnba` data set is an example of a variable measured on a nominal scale
    - we can tell whether two individuals are different (with respect to that variable)
    - we can't say anything about the direction or size of the difference
    - we know the variable only describes qualities
    - when the values are numbers, they act as identifiers and do not quantify anything
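
One way to see these properties in code is to store team names as an unordered pandas categorical. This is only a sketch: the `teams` series and the `ordering_allowed` flag are names invented for the example. Equality checks work, but pandas refuses an ordering comparison because an unordered categorical has no direction.

```python
import pandas as pd

# team names from the first five rows, stored as an unordered categorical
teams = pd.Series(['DAL', 'LA', 'CON', 'SAN', 'MIN'], dtype='category')

# nominal scale: we can tell whether two values are the same or different
same_team = teams == 'DAL'

# ...but asking which value is "greater" is not defined
try:
    teams < 'LA'
    ordering_allowed = True
except TypeError:
    ordering_allowed = False

print(same_team.tolist())
print(ordering_allowed)
```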

In [3]:
# create loop that prints out keys of qualitative variables
for key, value in variables.items():
    if value == 'qualitative':
        print(key)

Name
Team
Pos
Birth_Place
College


In [4]:
# build a list of the qualitative variables found by the loop above
nominal_scale = [key for key, value in variables.items() if value == 'qualitative']

sorted(nominal_scale)

['Birth_Place', 'College', 'Name', 'Pos', 'Team']

### _The Ordinal Scale_

**Notes**
- a new variable, `Height_labels`, uses labels like `short`, `medium`, `tall`
    - we can tell whether two individuals are different or not
    - unlike the nominal scale, we can also tell the direction of the difference
        - someone labeled `tall` has a greater height than someone labeled `short`
        - we still can't determine the size of the difference, which makes this an example of a variable measured on an ordinal scale
        - a variable measured on an ordinal scale is treated as quantitative here, even when its values are words
    - examples of ordinal variables: ranks of athletes, or of horses in a race
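
A minimal sketch of how a variable like `Height_labels` could be built, using `pd.cut` on the heights from the first five rows. The cut-offs of 170 and 180 are hypothetical values chosen for this example and are not the ones used in the lesson. Because the resulting categorical is ordered, direction comparisons work, but the labels alone say nothing about how large the difference is.

```python
import pandas as pd

# Height values from the first five rows of the wnba data
heights = pd.Series([183, 185, 170, 185, 175])

# hypothetical cut-offs: up to 170 is short, up to 180 is medium, above is tall
height_labels = pd.cut(
    heights,
    bins=[0, 170, 180, 250],
    labels=['short', 'medium', 'tall'],
)

# ordinal scale: the direction of a difference is defined
taller_than_short = height_labels > 'short'

# the integer codes only encode rank, not a measured distance
ranks = height_labels.cat.codes

print(height_labels.tolist())
print(taller_than_short.tolist())
print(ranks.tolist())
```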

In [5]:
wnba.head()

Unnamed: 0,Name,Team,Pos,Height,Weight,BMI,Birth_Place,Birthdate,Age,College,Experience,Games Played,MIN,FGM,FGA,FG%,15:00,3PA,3P%,FTM,FTA,FT%,OREB,DREB,REB,AST,STL,BLK,TO,PTS,DD2,TD3
0,Aerial Powers,DAL,F,183,71.0,21.200991,US,"January 17, 1994",23,Michigan State,2,8,173,30,85,35.3,12,32,37.5,21,26,80.8,6,22,28,12,3,6,12,93,0,0
1,Alana Beard,LA,G/F,185,73.0,21.329438,US,"May 14, 1982",35,Duke,12,30,947,90,177,50.8,5,18,27.8,32,41,78.0,19,82,101,72,63,13,40,217,0,0
2,Alex Bentley,CON,G,170,69.0,23.875433,US,"October 27, 1990",26,Penn State,4,26,617,82,218,37.6,19,64,29.7,35,42,83.3,4,36,40,78,22,3,24,218,0,0
3,Alex Montgomery,SAN,G/F,185,84.0,24.543462,US,"December 11, 1988",28,Georgia Tech,6,31,721,75,195,38.5,21,68,30.9,17,21,81.0,35,134,169,65,20,10,38,188,2,0
4,Alexis Jones,MIN,G,175,78.0,25.469388,US,"August 5, 1994",23,Baylor,R,24,137,16,50,32.0,7,20,35.0,11,12,91.7,3,9,12,12,7,0,14,50,0,0


In [6]:
# With Height_labels, can we tell if one player is taller than another?
question1 = True

# Can we measure the height difference with Height_labels?
question2 = False

# Are Height_labels and College both measured on an ordinal scale?
question3 = False

# Is Games Played not measured on an ordinal scale?
question4 = True

# Is Experience measured on an ordinal scale?
question5 = False

# Is Height_labels qualitative because it is measured using words?
question6 = False

### _The Interval and Ratio Scales_

**Notes**
- a variable measured on a scale that preserves the order between values and has well-defined intervals, expressed in real numbers, is measured on either an **interval** scale or a **ratio** scale
- such variables are very common; some examples:
    - height measured in inches
    - weight measured in grams
    - time measured in seconds
    - price of an apple in dollars
    
### _The Difference Between Ratio and Interval Scales_
- what sets ratio scales apart from interval scales?
    - the nature of the zero point
- ratio scale: zero means no quantity
    - ex. 0 grams indicates the absence of weight
- interval scale: the zero point indicates the presence of a quantity
    - ex. player weight deviation (a player's deviation in weight from the average weight)
    - a value of 0 does not mean a player has no weight!
        - it means the player's weight is exactly equal to the mean
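
The difference in zero points can be sketched with the `Weight` values from the first five rows. The variable names below are assumptions made for this example, and the mean is taken over these five players only, not the full data set. Ratios are meaningful for the weights themselves, but not for the deviations, whose zero is an arbitrary reference point.

```python
import pandas as pd

# Weight values from the first five rows of the wnba data
weights = pd.Series([71.0, 73.0, 69.0, 84.0, 78.0])

# ratio scale: 0 would mean no weight at all
mean_weight = weights.mean()

# interval scale: 0 means "weighs exactly the mean", not "has no weight"
deviation = weights - mean_weight

# on the ratio scale, a quotient is meaningful: 84 is about 1.08 times 78
weight_ratio = weights[3] / weights[4]

# on the interval scale, the quotient is misleading: 9 / 3 gives 3.0,
# yet the first player is not three times as heavy as the second
deviation_ratio = deviation[3] / deviation[4]

print(mean_weight)
print(deviation.tolist())
print(weight_ratio, deviation_ratio)
```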