- Structured Data
    - Numeric or quantitative (continuous & discrete)
        - Continuous
        - Discrete
    - Categorical or qualitative (always discrete)
        - Nominal
        - Ordinal
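
As a rough illustration of the taxonomy above, the sketch below shows one way these types could be represented in pandas. The column names and values are hypothetical and are not taken from the dataset used later in this notebook.

```python
import pandas as pd

# Hypothetical toy records, made up for illustration only.
df = pd.DataFrame({
    "height_cm": [172.4, 165.0, 180.3],   # numeric, continuous
    "num_children": [0, 2, 1],            # numeric, discrete
    "blood_type": ["A", "O", "AB"],       # categorical, nominal (no order)
    "shirt_size": ["M", "S", "L"],        # categorical, ordinal (S < M < L)
})

# Nominal: categories without any ordering.
df["blood_type"] = pd.Categorical(df["blood_type"])

# Ordinal: categories with an explicit order.
df["shirt_size"] = pd.Categorical(
    df["shirt_size"], categories=["S", "M", "L"], ordered=True
)

print(df.dtypes)
# min/max are only meaningful because shirt_size is ordered.
print(df["shirt_size"].min(), df["shirt_size"].max())
```

Declaring the order explicitly lets sorting and comparisons follow `S < M < L` rather than alphabetical order.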

- In terms of DataFrame/rectangular data
    - Feature, independent variable, attribute, input, predictor, variable : these terms are used interchangeably
    - Outcome, dependent variable, response, target, output : these terms are used interchangeably

- Besides rectangular data, other data structures serve specific use cases in data science
    - Time Series data
    - Spatial Data
    - Graph

In [1]:
import pandas as pd
from scipy import stats
import numpy as np

In [19]:
data = pd.read_csv('dataset/state_murder_rate.csv')

In [29]:
data.head()

| Unnamed: 0 | State | Population | Murder rate |
|---|---|---|---|
| 0 | Alabama | 4779736 | 5.7 |
| 1 | Alaska | 710231 | 5.6 |
| 2 | Arizona | 6392017 | 4.7 |
| 3 | Arkansas | 2915918 | 5.6 |
| 4 | California | 37253956 | 4.4 |


#### Estimates of Location
- Mean
- Weighted mean
- Median
- Weighted median
- Trimmed mean

In [24]:
np.mean(data['Population'])

7694135.625

In [25]:
data['Population'].mean()

7694135.625

In [28]:
stats.trim_mean(data['Population'], 0.1)

7694135.625
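
The trimmed mean above is identical to the plain mean. `stats.trim_mean` drops `int(proportiontocut * n)` values from each end, so with fewer than ten rows a cut of `0.1` removes nothing. The notebook does not show the row count, so whether that is what happened here is an assumption. The sketch below reproduces the effect on hypothetical values:

```python
import numpy as np
from scipy import stats

# Hypothetical sample of 8 values with one extreme value (not the notebook's data).
values = np.array([1, 2, 3, 4, 5, 6, 7, 1000])

plain = values.mean()

# int(0.1 * 8) == 0, so nothing is trimmed and the result equals the mean.
trim_10 = stats.trim_mean(values, 0.1)

# int(0.25 * 8) == 2, so the two smallest and two largest values are dropped.
trim_25 = stats.trim_mean(values, 0.25)

print(plain, trim_10, trim_25)
```

On small samples, a larger `proportiontocut` is needed before any value is actually trimmed.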

In [8]:
data['Population'].median()

4176916.5

In [30]:
np.average(data['Murder rate'], weights=data['Population'])

4.376359279149048

- The basic metric for location is the mean, but it can be sensitive to extreme values (outliers).
- Other metrics (median, trimmed mean) are more robust.
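
The weighted median is listed above but not computed in the cells. NumPy has no built-in for it, so the following is a minimal sketch using a hypothetical helper, `weighted_median`. It returns the lower weighted median and uses only the five rows shown by `data.head()`, not the full file.

```python
import numpy as np

def weighted_median(values, weights):
    """Hypothetical helper: smallest value whose cumulative weight
    reaches at least half of the total weight (lower weighted median)."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(values)              # sort values, carry weights along
    values, weights = values[order], weights[order]
    cum = np.cumsum(weights)
    half = 0.5 * cum[-1]
    idx = np.searchsorted(cum, half)        # first index with cum >= half
    return values[idx]

# The five rows displayed by data.head() above.
rates = [5.7, 5.6, 4.7, 5.6, 4.4]
pops = [4779736, 710231, 6392017, 2915918, 37253956]

print(np.median(rates))               # unweighted median
print(weighted_median(rates, pops))   # population-weighted median
```

Because California holds more than half of the total population of these five states, its murder rate becomes the weighted median, while the unweighted median treats every state equally.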

#### Estimates of Variability

- Deviation/errors/residuals
- Variance/mean-squared-error
- Standard Deviation/**l2-norm**/Euclidean norm
- Mean absolute deviation/**l1-norm**/Manhattan norm : absolute value in place of squaring
- Median absolute deviation from the median (MAD)
- Range
- Percentile/quantile : The value such that $P$ percent of the values take on this value or less and $(100-P)$ percent take on this value or more.
- Interquartile range/IQR
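
The estimates listed above can be sketched with pandas as follows. The series holds only the five population values shown by `data.head()`, and the variable names are illustrative. pandas' `var()` and `std()` use the sample (n-1) denominator by default.

```python
import pandas as pd

# Population values from the five rows displayed by data.head().
x = pd.Series([4779736, 710231, 6392017, 2915918, 37253956], name="Population")

deviations = x - x.mean()                      # deviations from the mean
variance = x.var()                             # sample variance (ddof=1)
std = x.std()                                  # sample standard deviation
mean_abs_dev = deviations.abs().mean()         # mean absolute deviation
mad_median = (x - x.median()).abs().median()   # median absolute deviation from the median
value_range = x.max() - x.min()                # range
iqr = x.quantile(0.75) - x.quantile(0.25)      # interquartile range

print(std, mean_abs_dev, mad_median, value_range, iqr)
```

The median-based and quantile-based measures are much less affected by the single very large value (California) than the standard deviation is.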

**NOTE** : Although the formula for the standard deviation is more complicated and less intuitive than that of the mean absolute deviation, it is the preferred measure. The standard deviation owes its preeminence to statistical theory: mathematically, working with squared values is much more convenient than working with absolute values, especially for statistical models.