### Why describing data with descriptives is important

Descriptive statistics is the part of statistics that quantitatively describes or summarizes the features of a collection of data. Its aim is to summarize a sample, rather than to use the data to learn about the population that the sample is thought to represent.

For example, papers reporting on human subjects typically include a table giving the overall sample size, the sample sizes in important subgroups (e.g., for each treatment or exposure group), and demographic or clinical characteristics such as the average age and the proportion of subjects of each sex.

Measures commonly used to describe a data set fall into three groups. Measures of central tendency include the mean, median and mode. Measures of variability (dispersion) include the standard deviation (or variance) and the minimum and maximum values of the variables. Skewness and kurtosis are measures of the shape of the distribution rather than of its spread.
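As a minimal sketch of the measures listed above, the cell below computes each of them with pandas on a small series. The `ages` values are made up for illustration and are not taken from `train.csv`.

```python
import pandas as pd

# Hypothetical ages, invented for this illustration (not from train.csv).
ages = pd.Series([22, 38, 26, 35, 35, 54, 2, 27])

summary = {
    # Central tendency
    "mean": ages.mean(),            # arithmetic average
    "median": ages.median(),        # middle value of the sorted data
    "mode": ages.mode().iloc[0],    # most frequent value
    # Variability
    "std": ages.std(),              # sample standard deviation
    "min": ages.min(),
    "max": ages.max(),
    # Shape
    "skew": ages.skew(),            # asymmetry of the distribution
    "kurt": ages.kurt(),            # excess kurtosis (tail heaviness)
}

for name, value in summary.items():
    print(f"{name}: {value}")
```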

In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline

In [3]:
train = pd.read_csv('train.csv')

Getting the mean of all the numeric columns in the dataset:

In [4]:
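# numeric_only=True skips non-numeric columns; recent pandas versions
# raise an error if text columns are included in the mean.
train.mean(numeric_only=True)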

PassengerId    446.000000
Survived         0.383838
Pclass           2.308642
Age             29.699118
SibSp            0.523008
Parch            0.381594
Fare            32.204208
dtype: float64

The mean is a descriptive statistic that gives the arithmetic average of a variable, which is one measure of its center. Combined with other descriptives such as the variance and standard deviation, it summarizes the data.
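One detail worth knowing when reading a column mean: pandas skips missing values (`NaN`) by default, so the mean is taken over the non-missing entries only. The sketch below shows this on a made-up series. Whether any column in `train.csv` actually has missing values is an assumption here, as the text above does not say.

```python
import numpy as np
import pandas as pd

# Hypothetical ages with one missing entry (not taken from train.csv).
ages = pd.Series([20.0, 30.0, np.nan, 40.0])

mean_default = ages.mean()               # skipna=True by default, NaN is ignored
manual = ages.sum() / ages.count()       # count() excludes NaN as well
mean_with_nan = ages.mean(skipna=False)  # NaN propagates to the result

print(mean_default, manual, mean_with_nan)
```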

Let us find the number of people who survived on the Titanic. Rather than counting the records by hand, we use value_counts to get the frequency of each value in the Survived column.

In [9]:
train.Survived.value_counts()

0    549
1    342
Name: Survived, dtype: int64
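The counts above also explain the mean of `Survived` reported earlier: 342 / (549 + 342) = 342 / 891 ≈ 0.383838. For a 0/1 column, the mean equals the share of ones. A small sketch with a made-up series, using the `normalize=True` option of `value_counts` to get shares instead of counts:

```python
import pandas as pd

# Hypothetical 0/1 outcomes (not taken from train.csv).
survived = pd.Series([0, 1, 1, 0, 0])

counts = survived.value_counts()                # absolute frequencies
shares = survived.value_counts(normalize=True)  # relative frequencies

print(counts.loc[1], shares.loc[1], survived.mean())
```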

The mode is the value that occurs most frequently in a data column.

In [12]:
train.Pclass.mode()

0    3
dtype: int64
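Note that `mode()` returns a Series rather than a single number, which is why the output above has an index `0`. The reason is that several values can tie for the highest frequency. A short sketch on made-up class labels:

```python
import pandas as pd

# Hypothetical class labels in which 1 and 3 both occur twice.
classes = pd.Series([1, 1, 3, 3, 2])

modes = classes.mode()      # all tied values, sorted in ascending order
first_mode = modes.iloc[0]  # pick one value if a scalar is needed

print(modes.tolist(), first_mode)
```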

The standard deviation tells you how far the values typically deviate from the mean.

In [15]:
train.Age.std()

14.526497332334044
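When comparing results across libraries, keep in mind that pandas computes the sample standard deviation (dividing by n - 1, `ddof=1`) by default, while NumPy's `np.std` divides by n (`ddof=0`) by default. The sketch below uses made-up values whose mean is 5 and whose squared deviations sum to 32.

```python
import numpy as np
import pandas as pd

# Hypothetical values: mean is 5, squared deviations sum to 32.
values = pd.Series([2, 4, 4, 4, 5, 5, 7, 9])

sample_std = values.std()              # sqrt(32 / 7), pandas default ddof=1
population_std = values.std(ddof=0)    # sqrt(32 / 8) = 2
numpy_std = np.std(values.to_numpy())  # NumPy default ddof=0

print(sample_std, population_std, numpy_std)
```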

The variance is the square of the standard deviation and measures how widely the values are spread around the mean. We can use the var method to compute it.

In [17]:
train.Pclass.var()

0.6990151199889065
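To make the definition concrete, the sketch below recomputes the sample variance by hand (sum of squared deviations from the mean, divided by n - 1) and checks that it agrees with `var()` and with the square of `std()`. The values are made up for illustration.

```python
import pandas as pd

# Hypothetical class labels (not taken from train.csv).
values = pd.Series([1, 2, 3, 3, 3])

deviations = values - values.mean()                       # distance from the mean
manual_var = (deviations ** 2).sum() / (len(values) - 1)  # sample variance

print(manual_var, values.var(), values.std() ** 2)
```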

Using descriptives to understand the data gives a clear picture of how it is distributed and of its main properties.