<a href="https://colab.research.google.com/github/brenoslivio/Statistics-Python/blob/main/1-DescriptiveAnalysis/1_DescriptiveAnalysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Descriptive Analysis

*There are many ways to analyse data, but one of the best-known methods in Statistics is descriptive analysis: the field of Statistics that deals with describing and summarizing past and present data to produce accessible insights. This notebook gives some practical examples of this kind of analysis using a dataset. The recommended reading for the subject is Chapters 2 and 3 of **Introductory Statistics** by **Sheldon M. Ross**, which is our main source of knowledge in this notebook.*



---



## Table of contents


1. [Data type](#type)

  1.1 [Qualitative variables](#qualita)

  * [Nominal](#nominal)

  * [Ordinal](#ordinal)

  1.2 [Quantitative variables](#quantita)

  * [Discrete](#discrete)

  * [Continuous](#cont)

2. [Measures of position](#mespos)

  2.1 [Quantiles](#quantiles)

  * [Percentiles](#percentiles)

  * [Deciles](#deciles)

  * [Quartiles](#quartiles)

  * [Median](#median)

  2.2 [Mean](#mean)

  2.3 [Mode](#mode)

3. [Measures of dispersion](#mesdis)

  3.1 [Variance](#variance)

  3.2 [Standard deviation](#sd)

  3.3 [Range](#range)

  3.4 [Interquartile range](#iqr)

4. [Table of frequencies](#table)

  4.1 [Absolute](#absolute)

  4.2 [Relative](#relative)

  4.3 [Simple](#simple)

  4.4 [Cumulative](#cumulative)

5. [Graphs](#graphs)

  5.1 [Pie chart](#pie)

  5.2 [Bar](#bar)

  5.3 [Boxplot](#boxplot)

  5.4 [Scatter plot](#dispersion)

  5.5 [Line](#line)

6. [Practicing with a dataset](#practice)





---



## Data type <a name="type"></a>



### Qualitative variables <a name="qualita"></a>

#### Nominal <a name="nominal"></a>

#### Ordinal <a name="ordinal"></a>

### Quantitative variables <a name="quantita"></a>

#### Discrete <a name="discrete"></a>

#### Continuous <a name="cont"></a>

## Measures of position <a name="mespos"></a>

### Quantiles <a name="quantiles"></a>

#### Percentiles <a name="percentiles"></a>

#### Deciles <a name="deciles"></a>

#### Quartiles <a name="quartiles"></a>

#### Median <a name="median"></a>

### Mean <a name="mean"></a>

### Mode <a name="mode"></a>

## Measures of dispersion <a name="mesdis"></a>

### Variance <a name="variance"></a>

### Standard deviation (population and sample) <a name="sd"></a>

### Range <a name="range"></a>

### Interquartile range <a name="iqr"></a>



## Table of frequencies <a name="table"></a>

### Absolute <a name="absolute"></a>

### Relative <a name="relative"></a>

### Simple <a name="simple"></a>

### Cumulative <a name="cumulative"></a>



## Graphs <a name="graphs"></a>

### Pie chart <a name="pie"></a>

### Bar <a name="bar"></a>

### Boxplot <a name="boxplot"></a>

### Scatter plot <a name="dispersion"></a>

### Line <a name="line"></a>



## Practicing with a dataset <a name="practice"></a>

Now we will practice what we saw in this notebook using a dataset in Brazilian Portuguese.

### Loading the dataset

In [7]:
import pandas as pd
import numpy as np

dfPoll = pd.read_csv("https://raw.githubusercontent.com/brenoslivio/Statistics-Python/main/1-DescriptiveAnalysis/dataset.csv",
    dtype={
        "Qual sua idade?": np.int32,
        "Qual sua altura em metros?": np.float64,
        "Seu peso em kg.": np.int32,
        "Sexo?": str,
        "Grau de escolaridade?": str,
        "Em qual estado você nasceu?": str,
        "Em quantos irmãos vocês são (contando contigo)?": np.int32,
        "Quantos membros tem sua família? (Quantos moram contigo, ou 1 caso more sozinho)": np.int32,
        "Você trabalha/estuda atualmente?": str,
        "Qual atividade realiza com mais frequência?": str,
    },
    na_values="",
)

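# Replace the Portuguese survey questions with short English column names
# (same order as the dtype mapping above).
dfPoll.columns = ['age', 'height_m', 'weight_kg', 'sex', 'schooling', 'state', 'brothers_you', 'family_members', 'work_study', 'freq_actitivity']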

dfPoll

| | age | height_m | weight_kg | sex | schooling | state | brothers_you | family_members | work_study | freq_actitivity |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 20 | 1.82 | 85 | M | Ensino superior (completo ou incompleto) | SP | 3 | 5 | trabalho e estudo | Cozinhar |
| 1 | 25 | 1.72 | 64 | M | Pós-graduação (completo ou incompleto) | MS | 5 | 1 | estudo | Rede social |
| 2 | 29 | 1.82 | 65 | M | Pós-graduação (completo ou incompleto) | SP | 1 | 2 | estudo | Ler livro |
| 3 | 27 | 1.63 | 60 | F | Pós-graduação (completo ou incompleto) | MS | 2 | 4 | trabalho | Assistir TV |
| 4 | 28 | 1.57 | 63 | F | Ensino superior (completo ou incompleto) | MS | 2 | 3 | estudo | Rede social |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 63 | 22 | 1.81 | 92 | M | Ensino superior (completo ou incompleto) | SP | 3 | 4 | estudo | Ler livro |
| 64 | 25 | 1.55 | 56 | F | Ensino superior (completo ou incompleto) | SP | 3 | 5 | estudo | Rede social |
| 65 | 21 | 1.71 | 65 | F | Ensino superior (completo ou incompleto) | MG | 2 | 2 | trabalho e estudo | Rede social |
| 66 | 18 | 1.67 | 75 | F | Ensino superior (completo ou incompleto) | SP | 4 | 4 | estudo | Ler livro |


In [10]:
# quantile() defaults to q=0.5, so this returns the median height
dfPoll['height_m'].quantile()

1.705
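
The call above relies on a default: `Series.quantile()` uses `q=0.5`, so it returns the median. As a minimal sketch of how other quantiles could be obtained the same way, the example below uses only the heights of the nine rows visible in the truncated preview of the dataset. This is not the full dataset, so its results differ from the output above. The names `sample_heights`, `quartiles` and `iqr` are illustrative and do not come from the notebook.

```python
import pandas as pd

# Heights (in metres) of the nine rows visible in the truncated preview;
# an illustrative sample only, NOT the full dataset.
sample_heights = pd.Series([1.82, 1.72, 1.82, 1.63, 1.57, 1.81, 1.55, 1.71, 1.67])

# With no argument, quantile() uses q=0.5, which is the median.
median_default = sample_heights.quantile()
median_explicit = sample_heights.median()

# Passing a list of probabilities returns a Series indexed by those values.
quartiles = sample_heights.quantile([0.25, 0.5, 0.75])

# Interquartile range: third quartile minus first quartile.
iqr = quartiles.loc[0.75] - quartiles.loc[0.25]

print(quartiles)
print("IQR:", round(iqr, 2))
```

On the real data, the same calls could be applied to `dfPoll['height_m']`. Deciles or percentiles follow by passing other probabilities, such as `0.1` or `0.95`.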