Classical statistics focused almost exclusively on inference, a sometimes complex set of procedures for drawing conclusions about large populations based on small samples.

### Elements of Structured Data

There are two basic types of structured data: numeric and categorical. Numeric data
comes in two forms: continuous, such as wind speed or time duration, and discrete,
such as the count of the occurrence of an event. Categorical data takes only a fixed set
of values, such as a type of TV screen (plasma, LCD, LED, etc.) or a state name
(Alabama, Alaska, etc.). Binary data is an important special case of categorical data that
takes on only one of two values, such as 0/1, yes/no, or true/false. Another useful type
of categorical data is ordinal data, in which the categories are ordered; an example of
this is a numerical rating (1, 2, 3, 4, or 5).
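
As a rough illustration of these types, the sketch below builds a small pandas `DataFrame` with one column per type. The column names and values are made up for the example, and using `pd.Categorical` for the categorical and ordinal columns is one reasonable representation, not the only one.

```python
import pandas as pd

# Hypothetical records; column names and values are illustrative only.
df = pd.DataFrame({
    "wind_speed": [3.2, 7.5, 5.1],                              # numeric, continuous
    "event_count": [0, 4, 2],                                   # numeric, discrete
    "screen_type": pd.Categorical(["plasma", "LCD", "LED"]),    # categorical
    "is_returned": [True, False, False],                        # binary
    "rating": pd.Categorical(
        [5, 3, 4], categories=[1, 2, 3, 4, 5], ordered=True     # ordinal
    ),
})

print(df.dtypes)

# Ordered categories support order comparisons; unordered ones do not.
high_rating = df["rating"] >= 4
print(high_rating.tolist())
```

Declaring the rating as an ordered categorical, rather than a plain integer, keeps the ordering while signaling that arithmetic on the values may not be meaningful.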

### Rectangular Data

The typical frame of reference for an analysis in data science is a rectangular data
object, like a spreadsheet or database table.

Rectangular data is the general term for a two-dimensional matrix with rows
indicating records (cases) and columns indicating features (variables); data frame is the
specific format in R and Python.

#### Data frame
Rectangular data (like a spreadsheet) is the basic data structure for statistical and
machine learning models.

### Data Frames and Indexes

Traditional database tables have one or more columns designated as an index,
essentially a row number. This can vastly improve the efficiency of certain database
queries. In Python, with the pandas library, the basic rectangular data structure is a
DataFrame object. By default, an automatic integer index is created for a DataFrame
based on the order of the rows. In pandas, it is also possible to set
multilevel/hierarchical indexes to improve the efficiency of certain operations.
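
A minimal sketch of both index styles follows. The data and column names are hypothetical; `set_index` and `sort_index` are standard pandas methods.

```python
import pandas as pd

# Hypothetical data; the columns are illustrative only.
df = pd.DataFrame({
    "state": ["Alabama", "Alabama", "Alaska", "Alaska"],
    "year": [2019, 2020, 2019, 2020],
    "value": [1.0, 2.0, 3.0, 4.0],
})

# By default, pandas assigns an integer index based on row order.
default_index = list(df.index)
print(default_index)

# A multilevel (hierarchical) index built from two columns.
indexed = df.set_index(["state", "year"]).sort_index()

# Look up a single record by its (state, year) key.
alaska_2020 = indexed.loc[("Alaska", 2020), "value"]
print(alaska_2020)
```

Sorting the index after setting it is optional here, but lookups and slices on a multilevel index generally behave more predictably when the index is sorted.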

### Nonrectangular Data Structures

There are other data structures besides rectangular data. 
- Time series data records successive measurements of the same variable.
- Spatial data structures, which are used in mapping and location analytics, are more complex and varied than rectangular data structures.
- Graph (or network) data structures are used to represent physical, social, and abstract relationships. For example, a graph of a social network, such as Facebook or LinkedIn, may represent connections between people on the network.

### Estimates of Location

#### Key Terms for Estimates of Location:

- Mean: The sum of all values divided by the number of values.
- Weighted mean: The sum of all values times a weight divided by the sum of the weights.
- Median: The value such that one-half of the data lies above it and one-half lies below.
- Percentile (Quantile): The value such that P percent of the data lies below.
- Weighted median: The value such that, with the data sorted, one-half of the sum of the weights lies above it and one-half lies below.
- Trimmed mean: The average of all values after dropping a fixed number of extreme values.
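
To make the first few terms concrete, here is a small sketch using NumPy on made-up values. Note that percentile conventions differ between tools; `np.percentile` interpolates linearly by default, which is an implementation detail of NumPy rather than part of the definition above.

```python
import numpy as np

# Hypothetical sample with one extreme value.
values = np.array([1, 2, 3, 4, 100])

mean = values.mean()              # sum of values / number of values
median = np.median(values)        # middle value of the sorted data
p25 = np.percentile(values, 25)   # 25th percentile (linear interpolation)

print(mean, median, p25)
```

The single extreme value pulls the mean far above the median, which is why the median is often described as a more robust estimate of location.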

A variation of the mean is a **trimmed mean**, which you calculate by dropping a fixed number of sorted values at each end and then taking an average of the remaining values. **A trimmed mean eliminates the influence of extreme values.**
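
The calculation described above can be sketched in a few lines. The helper name `trimmed_mean` and its parameter `k` (the number of values dropped at each end) are hypothetical names chosen for this example.

```python
import numpy as np

def trimmed_mean(values, k):
    """Average of the values after dropping the k smallest and k largest."""
    x = np.sort(np.asarray(values, dtype=float))
    if k < 0 or 2 * k >= x.size:
        raise ValueError("k must be non-negative and leave at least one value")
    return x[k:x.size - k].mean()

# Hypothetical sample with one extreme value.
values = [1, 2, 3, 4, 100]

plain = trimmed_mean(values, 0)    # no trimming: the ordinary mean
trimmed = trimmed_mean(values, 1)  # drops 1 and 100, averages [2, 3, 4]

print(plain, trimmed)
```

Libraries often express the trim as a proportion rather than a count (for example, `scipy.stats.trim_mean` takes a proportion to cut from each end), so check which convention a given function uses.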

Another type of mean is a **weighted mean**, which you calculate by multiplying each
data value $x_i$ by a user-specified weight $w_i$ and dividing their sum by the sum of the
weights.
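
A short sketch of the weighted mean, along with the weighted median from the key terms, is shown below on made-up values and weights. The `weighted_median` helper is a hypothetical function written for this example; it returns the first sorted value at which the cumulative weight reaches half of the total, which is one common convention among several.

```python
import numpy as np

# Hypothetical values and user-specified weights.
values = np.array([10.0, 20.0, 30.0])
weights = np.array([1.0, 1.0, 3.0])

# Weighted mean: sum of (value * weight) divided by the sum of the weights.
weighted_mean = np.average(values, weights=weights)
manual = (values * weights).sum() / weights.sum()

def weighted_median(values, weights):
    """First sorted value where cumulative weight reaches half the total."""
    order = np.argsort(values)
    v, w = values[order], weights[order]
    cum = np.cumsum(w)
    idx = np.searchsorted(cum, cum[-1] / 2.0)
    return v[idx]

w_median = weighted_median(values, weights)

print(weighted_mean, manual, w_median)
```

With equal weights, the weighted mean reduces to the ordinary mean, which is a quick sanity check for any implementation.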
