## Summary

The aim of this tutorial (or *manual*, as old-fashioned Portuguese would have it ;) ) is to present the features of the pandas library for data analysis.
The tutorial is organized into 5 chapters:

I - Importing CSV and Excel files

II - Exploratory data analysis

III - Index, Slice, Select

IV - Data cleaning and transformation

V - Data visualization

This notebook covers chapters I and II.

Each chapter follows the same structure: introduction, summary table, sections and subsections.



## Table of contents <a class="anchor"  id="Índice"></a>
* [**I. Importing CSV and Excel files**](#c1)
* [Introduction](#Ii)
* [Summary table](#It)
* [1. CSV](#I1)
* [2. Excel](#I2)
* [**II. Exploratory data analysis**](#II)
* [Introduction](#IIi)
* [Summary table](#IIt)
* [1. First contact with the data](#II1)
    * [1.1 Table size and data preview](#II11)
    * [1.2 Columns: names and data types](#II12)
* [2. Getting to know the columns better](#II2)
    * [2.1 Distinct values: how many, which ones and how often](#II21)
    * [2.2 Identifying columns with NA](#II22)
    * [2.3 Excluding NA](#II23)
* [3. Computing statistics for each column](#II3)
    * [3.1 describe()](#II31)
    * [3.2 Descriptive statistics](#II32)
    

# [**I. Importing CSV and Excel files**](#Índice)  <a class="anchor"  id="c1"></a>

# [**Introduction**](#Índice)  <a class="anchor"  id="Ii"></a>

In this chapter we import data from text files in CSV format and from spreadsheets in Excel format.
In a Kaggle notebook, a dataset is added by selecting "Add Data" in the side panel. This tutorial uses a CSV file exported from the World Bank with several indicators.


# [**Summary table**](#Índice) <a class="anchor"  id="It"></a>

Goal|Function|Syntax|
:----|:----|:----|
Import CSV|`.read_csv()`|`table = pd.read_csv('path/to/file.csv')`|
Import Excel|`.read_excel()`|`table = pd.read_excel('path/to/file.xlsx')`|
Total number of rows and columns|`.shape`|`table.shape`|
View the first rows|`.head()`|`table.head()`|
View the last rows|`.tail()`|`table.tail()`|
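The functions in the table above can be tried without downloading anything. A minimal sketch, assuming a small hypothetical CSV held in memory with `io.StringIO`; its rows echo the first rows of the World Bank file used below, reduced to three columns:

```python
import io

import pandas as pd

# Hypothetical in-memory CSV standing in for the World Bank file (three columns only)
raw_csv = io.StringIO(
    "Time,Country Name,Fertility rate\n"
    "1990,Afghanistan,7.565\n"
    "1990,Albania,2.9\n"
    "1990,Algeria,4.556\n"
    "1990,American Samoa,\n"
)

# read_csv accepts a file path (as in the cells below) or any file-like object
table = pd.read_csv(raw_csv)

print(table.shape)    # (4, 3): (rows, columns)
print(table.head(2))  # first 2 rows; head() alone shows 5
print(table.tail(2))  # last 2 rows; tail() alone shows 5
```

In the cells below, `read_csv` receives a file path instead; the result is the same kind of DataFrame.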



# [**1. Importing CSV**](#Índice)  <a class="anchor"  id="I1"></a>

```python
# Import pandas
import pandas as pd

file = '/kaggle/input/gender-statistics-1990-2021/88c52f5e-f72c-406d-a2e2-31309c9953de_Data.csv'

table1 = pd.read_csv(file)

# Number of rows and columns: (rows, columns)
table1.shape
```

(3197, 15)

```python
table1.head(3)
```

| | Time | Country Name | Firms with female top manager (% of firms) | Labor force, female (% of total labor force) | Proportion of time spent on unpaid domestic and care work, female (% of 24 hour day) | Proportion of time spent on unpaid domestic and care work, male (% of 24 hour day) | Fertility rate, total (births per woman) | Literacy rate, adult female (% of females ages 15 and above) | Literacy rate, adult total (% of people ages 15 and above) | Literacy rate, adult male (% of males ages 15 and above) | Government expenditure on education, total (% of GDP) | Unmet need for contraception (% of married women ages 15-49) | Women making their own informed decisions regarding sexual relations, contraceptive use and reproductive health care (% of women age 15-49) | Women who believe a husband is justified in beating his wife when she goes out without telling him (%) | Women who were first married by age 15 (% of women ages 20-24) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1990 | Afghanistan | | 16.866045 | | | 7.565 | | | | | | | | |
| 1 | 1990 | Albania | | 40.805673 | | | 2.9 | | | | | | | | |
| 2 | 1990 | Algeria | | 12.238081 | | | 4.556 | | | | | | | | |


```python
table1.tail()
```

| | Time | Country Name | Firms with female top manager (% of firms) | Labor force, female (% of total labor force) | Proportion of time spent on unpaid domestic and care work, female (% of 24 hour day) | Proportion of time spent on unpaid domestic and care work, male (% of 24 hour day) | Fertility rate, total (births per woman) | Literacy rate, adult female (% of females ages 15 and above) | Literacy rate, adult total (% of people ages 15 and above) | Literacy rate, adult male (% of males ages 15 and above) | Government expenditure on education, total (% of GDP) | Unmet need for contraception (% of married women ages 15-49) | Women making their own informed decisions regarding sexual relations, contraceptive use and reproductive health care (% of women age 15-49) | Women who believe a husband is justified in beating his wife when she goes out without telling him (%) | Women who were first married by age 15 (% of women ages 20-24) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3192 | | | | | | | | | | | | | | | |
| 3193 | | | | | | | | | | | | | | | |
| 3194 | | | | | | | | | | | | | | | |
| 3195 | Data from database: World Development Indicators | | | | | | | | | | | | | | |
| 3196 | Last Updated: 10/26/2023 | | | | | | | | | | | | | | |
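The last five rows are not data: three empty rows followed by two footer lines added by the World Bank export, which is also why 'Time' ends up as a text column. One way to drop them at import time is `skipfooter`, which only the Python parser supports (`engine='python'`). The sketch below assumes the footer always takes exactly the last five lines, and uses a hypothetical sample that only mimics the layout of the real file:

```python
import io

import pandas as pd

# Hypothetical sample mimicking the exported file: data rows,
# three empty rows and two footer lines
raw_csv = (
    "Time,Country Name,Fertility rate\n"
    "1990,Afghanistan,7.565\n"
    "1990,Albania,2.9\n"
    ",,\n"
    ",,\n"
    ",,\n"
    "Data from database: World Development Indicators,,\n"
    "Last Updated: 10/26/2023,,\n"
)

# Default import: empty rows become all-NaN rows and the footer text lands in 'Time'
raw = pd.read_csv(io.StringIO(raw_csv))

# skipfooter drops the last N lines; only the Python parser engine supports it
clean = pd.read_csv(io.StringIO(raw_csv), skipfooter=5, engine="python")

print(raw.shape, raw["Time"].dtype)      # (7, 3) and a text dtype
print(clean.shape, clean["Time"].dtype)  # (2, 3) and an integer dtype
```

If the number of footer lines can change between exports, filtering after import (for example, keeping only the rows with a value in 'Country Name') is a more robust alternative.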


# [**2. Importing Excel**](#Índice) <a class="anchor"  id="I2"></a>

```python
file2 = '/kaggle/input/gender-statistics-1990-2021/P_Data_Extract_From_World_Development_Indicators.xlsx'

table2 = pd.read_excel(file2)

table2.shape
```

(271, 65)

```python
table2.head()
```


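```python
table2.tail()
```

| | Series Name | Country Name | 1960 | 1961 | 1962 | 1963 | 1964 | 1965 | 1966 | 1967 | ... | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 | 2021 | 2022 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 266 | | | | | | | | | | | ... | | | | | | | | | | |
| 267 | | | | | | | | | | | ... | | | | | | | | | | |
| 268 | | | | | | | | | | | ... | | | | | | | | | | |
| 269 | Data from database: World Development Indicators | | | | | | | | | | ... | | | | | | | | | | |
| 270 | Last Updated: 10/26/2023 | | | | | | | | | | ... | | | | | | | | | | |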


# [**II. Exploratory data analysis**](#Índice)  <a class="anchor"  id="II"></a>

# [**Introduction**](#Índice)  <a class="anchor"  id="IIi"></a>

In this chapter we go through some of the features used to explore data. In short, these are the tools that let us get to know the data we have at hand.


# [**Summary table**](#Índice) <a class="anchor"  id="IIt"></a>

Goal|Function|Syntax|
:----|:----|:----|
Total number of rows and columns|`.shape`|`table.shape`|
Table summary (column types and non-null counts)|`.info()`|`table.info()`|
View the first rows|`.head()`|`table.head()`|
View the last rows|`.tail()`|`table.tail()`|
List of column names|`.columns`|`table.columns`|
Data type of each column|`.dtypes`|`table.dtypes`|
Number of distinct values in a column|`.nunique()`|`table['column'].nunique()`|
List of distinct values in a column|`.unique()`|`table['column'].unique()`|
Frequency of each distinct value in a column|`.value_counts()`|`table['column'].value_counts()`|
Various statistics|`.describe()`|`table.describe()`|
Mean|`.mean()`|`table['column'].mean()`|
Median|`.median()`|`table['column'].median()`|
Standard deviation|`.std()`|`table['column'].std()`|
Pearson correlation coefficient|`.corr()`|`table['column1'].corr(table['column2'], method='pearson', min_periods=None)`|
Covariance|`.cov()`|`table['column1'].cov(table['column2'], min_periods=None)`|
Mode|`.mode()`|`table['column'].mode()`|
Identify NA values|`.isna()` / `.isnull()`|`table.isna()`|
Identify non-NA values|`.notna()` / `.notnull()`|`table.notna()`|
               


# [**1. First contact with the data**](#Índice)  <a class="anchor"  id="II1"></a>

## [**1.1 Table size and data preview**](#Índice)  <a class="anchor"  id="II11"></a>

```python
# Size matters! How many rows and columns does our table have?
# (rows, columns)
table1.shape
```

(3197, 15)

```python
# Peek at the first rows of the table
table1.head()
```

| | Time | Country Name | Firms with female top manager (% of firms) | Labor force, female (% of total labor force) | Proportion of time spent on unpaid domestic and care work, female (% of 24 hour day) | Proportion of time spent on unpaid domestic and care work, male (% of 24 hour day) | Fertility rate, total (births per woman) | Literacy rate, adult female (% of females ages 15 and above) | Literacy rate, adult total (% of people ages 15 and above) | Literacy rate, adult male (% of males ages 15 and above) | Government expenditure on education, total (% of GDP) | Unmet need for contraception (% of married women ages 15-49) | Women making their own informed decisions regarding sexual relations, contraceptive use and reproductive health care (% of women age 15-49) | Women who believe a husband is justified in beating his wife when she goes out without telling him (%) | Women who were first married by age 15 (% of women ages 20-24) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1990 | Afghanistan | | 16.866045 | | | 7.565 | | | | | | | | |
| 1 | 1990 | Albania | | 40.805673 | | | 2.9 | | | | | | | | |
| 2 | 1990 | Algeria | | 12.238081 | | | 4.556 | | | | | | | | |
| 3 | 1990 | American Samoa | | | | | | | | | | | | | |
| 4 | 1990 | Andorra | | | | | | | | | | | | | |


```python
# Or at the last ones...
table1.tail()
```

| | Time | Country Name | Firms with female top manager (% of firms) | Labor force, female (% of total labor force) | Proportion of time spent on unpaid domestic and care work, female (% of 24 hour day) | Proportion of time spent on unpaid domestic and care work, male (% of 24 hour day) | Fertility rate, total (births per woman) | Literacy rate, adult female (% of females ages 15 and above) | Literacy rate, adult total (% of people ages 15 and above) | Literacy rate, adult male (% of males ages 15 and above) | Government expenditure on education, total (% of GDP) | Unmet need for contraception (% of married women ages 15-49) | Women making their own informed decisions regarding sexual relations, contraceptive use and reproductive health care (% of women age 15-49) | Women who believe a husband is justified in beating his wife when she goes out without telling him (%) | Women who were first married by age 15 (% of women ages 20-24) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3192 | | | | | | | | | | | | | | | |
| 3193 | | | | | | | | | | | | | | | |
| 3194 | | | | | | | | | | | | | | | |
| 3195 | Data from database: World Development Indicators | | | | | | | | | | | | | | |
| 3196 | Last Updated: 10/26/2023 | | | | | | | | | | | | | | |


## [**1.2 Columns: names and data types**](#Índice)  <a class="anchor"  id="II12"></a>

```python
# Get the list of column names
table1.columns
```

```text
Index(['Time', 'Country Name', 'Firms with female top manager (% of firms)',
       'Labor force, female (% of total labor force)',
       'Proportion of time spent on unpaid domestic and care work, female (% of 24 hour day)',
       'Proportion of time spent on unpaid domestic and care work, male (% of 24 hour day)',
       'Fertility rate, total (births per woman)',
       'Literacy rate, adult female (% of females ages 15 and above)',
       'Literacy rate, adult total (% of people ages 15 and above)',
       'Literacy rate, adult male (% of males ages 15 and above)',
       'Government expenditure on education, total (% of GDP)',
       'Unmet need for contraception (% of married women ages 15-49)',
       'Women making their own informed decisions regarding sexual relations, contraceptive use and reproductive health care (% of women age 15-49)',
       'Women who believe a husband is justified in beating his wife when she goes out without telling him (%)',
       'Women who were f
```

```python
table1.columns.to_list()
```

```text
['Time',
 'Country Name',
 'Firms with female top manager (% of firms)',
 'Labor force, female (% of total labor force)',
 'Proportion of time spent on unpaid domestic and care work, female (% of 24 hour day)',
 'Proportion of time spent on unpaid domestic and care work, male (% of 24 hour day)',
 'Fertility rate, total (births per woman)',
 'Literacy rate, adult female (% of females ages 15 and above)',
 'Literacy rate, adult total (% of people ages 15 and above)',
 'Literacy rate, adult male (% of males ages 15 and above)',
 'Government expenditure on education, total (% of GDP)',
 'Unmet need for contraception (% of married women ages 15-49)',
 'Women making their own informed decisions regarding sexual relations, contraceptive use and reproductive health care (% of women age 15-49)',
 'Women who believe a husband is justified in beating his wife when she goes out without telling him (%)',
 'Women who were first married by age 15 (% of women ages 20-24)']
```

```python
# Check the data type of each column.
# This helps spot columns that need transforming, for example
# text (string) that should be numeric, or dates imported as text
# that may need converting to date/datetime.
table1.dtypes
```

```text
Time                                                                                    object
Country Name                                                                            object
Firms with female top manager (% of firms)                                             float64
Labor force, female (% of total labor force)                                           float64
Proportion of time spent on unpaid domestic and care work, female (% of 24 hour day)   float64
Proportion of time spent on unpaid domestic and care work, male (% of 24 hour day)     float64
Fertility rate, total (births per woman)
```
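The comment above mentions converting text to numbers. In this table 'Time' is `object` because the footer lines share the column with the years. A minimal sketch on a hypothetical sample, using `pd.to_numeric` with `errors='coerce'` (anything that cannot be parsed as a number becomes NaN); `info()`, listed in this chapter's summary table, then shows each column's type and non-null count:

```python
import pandas as pd

# Hypothetical sample: years stored as text, an empty row and a footer line
table = pd.DataFrame({
    "Time": ["1990", "2000", "2013", None, "Last Updated: 10/26/2023"],
    "Country Name": ["Afghanistan", "Albania", "Algeria", None, None],
})
print(table.dtypes)  # both columns hold text, not numbers

# errors="coerce": values that cannot be parsed as numbers (the footer) become NaN
table["Time"] = pd.to_numeric(table["Time"], errors="coerce")
print(table["Time"].dtype)  # float64, since NaN cannot be stored in an int64 column

# info() lists every column with its non-null count and dtype
table.info()
```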


# [**2. Getting to know a specific column better**](#Índice)  <a class="anchor"  id="II2"></a>

## [**2.1 Distinct values: how many, which ones and how often**](#Índice)  <a class="anchor"  id="II21"></a>

Goal|Function|Syntax|
:----|:----|:----|
Number of distinct values in a column|`.nunique()`|`table['column'].nunique()`|
List of distinct values in a column|`.unique()`|`table['column'].unique()`|
Frequency of each distinct value in a column|`.value_counts()`|`table['column'].value_counts()`|

```python
# How many distinct values are there in a specific column,
# in this case 'Time'?
table1['Time'].nunique()
```

14

The 'Time' column has 14 distinct values (`nunique()` ignores NaN by default). Although the dataset's name refers to the period 1990-2021, that is far fewer than the number of years in that period. Which years are missing?

```python
# List of the distinct values in the 'Time' column
table1['Time'].unique()
```

```text
array(['1990', '2000', '2013', '2014', '2015', '2016', '2017', '2018',
       '2019', '2020', '2021', '2022', nan,
       'Data from database: World Development Indicators',
       'Last Updated: 10/26/2023'], dtype=object)
```

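Only 12 of these values are years: 1990, 2000 and every year from 2013 to 2022. The other two are the footer lines of the exported file, and `nan` marks the empty rows. So, in this extract, the data are annual only from 2013 onwards.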
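The missing years can also be computed rather than read off the list. A sketch that rebuilds the values returned by `unique()` above and compares them with a full range; the period 1990 to 2022 is an assumption based on those values:

```python
import pandas as pd

# The distinct values of 'Time' returned by unique() above
values = pd.Series(["1990", "2000", "2013", "2014", "2015", "2016", "2017", "2018",
                    "2019", "2020", "2021", "2022", None,
                    "Data from database: World Development Indicators",
                    "Last Updated: 10/26/2023"])

# Keep only the values that can be read as numbers (the years), as integers
years_present = set(pd.to_numeric(values, errors="coerce").dropna().astype(int))

# Assumed full period 1990 to 2022 (range() excludes its end point)
missing = sorted(set(range(1990, 2023)) - years_present)

print(len(years_present))  # 12
print(missing)             # 1991 to 1999 and 2001 to 2012
```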

```python
# Frequency of each distinct value.
# Here we see that there are 266 rows for each year.
# By default value_counts(dropna=True), so NA values are not counted.
table1['Time'].value_counts()
```

```text
Time
1990                                                266
2000                                                266
2013                                                266
2014                                                266
2015                                                266
2016                                                266
2017                                                266
2018                                                266
2019                                                266
2020                                                266
2021                                                266
2022                                                266
Data from database: World Development Indicators      1
Last Updated: 10/26/2023                              1
Name: count, dtype: int64
```

```python
# Frequency of each distinct value, this time without excluding NA:
# they appear in the listing as NaN (3 rows).
# These are the rows with no value in the 'Time' column;
# the remaining columns contain many more NA.
table1['Time'].value_counts(dropna=False)
```

```text
Time
1990                                                266
2000                                                266
2013                                                266
2014                                                266
2015                                                266
2016                                                266
2017                                                266
2018                                                266
2019                                                266
2020                                                266
2021                                                266
2022                                                266
NaN                                                   3
Data from database: World Development Indicators      1
Last Updated: 10/26/2023                              1
Name: count, dtype: int64
```
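`value_counts()` can also return proportions instead of counts with `normalize=True`; combined with `dropna=False`, NA values enter the denominator too. A small sketch on a hypothetical Series, not the World Bank data:

```python
import pandas as pd

# Hypothetical column: two rows for 1990, one for 2000 and one missing value
time = pd.Series(["1990", "1990", "2000", None])

counts = time.value_counts()                                  # 1990: 2, 2000: 1 (NaN ignored)
shares = time.value_counts(normalize=True)                    # shares of the 3 non-NA values
shares_all = time.value_counts(normalize=True, dropna=False)  # shares of all 4 rows, NaN included

print(counts, shares, shares_all, sep="\n\n")
```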

## [**2.2 Identifying columns with NA**](#Índice)  <a class="anchor"  id="II22"></a>

Goal|Function|Syntax|Question|Answer|
:----|:----|:----|:----|:----|
Flag each cell as NA|`.isna()` / `.isnull()`|`table.isna()`|Is the cell NA?|True: it is NA|
Check whether every value in a column is NA|`.isna().all()` / `.isnull().all()`|`table.isna().all()`|Are all values NA?|True: the whole column is NA|
Check whether a column has at least one NA|`.isna().any()` / `.isnull().any()`|`table.isna().any()`|Is there any NA?|True: there is at least one NA in the column|
Count NA per column|`.isna().sum()` / `.isnull().sum()`|`table.isna().sum()`|How many NA in the column?|Number of NA|
Count NA in the whole table|`.isna().sum().sum()`|`table.isna().sum().sum()`|How many NA in the table?|Number of NA|
Flag each cell as not NA|`.notna()` / `.notnull()`|`table.notna()`|Is the cell not NA?|True: it is not NA|
Check that a column has no NA|`.notna().all()` / `.notnull().all()`|`table.notna().all()`|Is there no NA at all?|True: the column has no NA|
Count non-NA values per column|`.notna().sum()` / `.notnull().sum()`|`table.notna().sum()`|How many non-NA values?|Number of non-NA values|
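The difference between `.all()` and `.any()` is easy to mix up. A minimal sketch on a hypothetical three-row table: `.all()` is True only when every value in the column is NA, `.any()` as soon as one is, and `.sum()` counts them because True counts as 1:

```python
import numpy as np
import pandas as pd

# Hypothetical table: one complete column, one partly missing, one fully missing
table = pd.DataFrame({
    "complete": [1.0, 2.0, 3.0],
    "partial": [1.0, np.nan, 3.0],
    "empty": [np.nan, np.nan, np.nan],
})

mask = table.isna()         # same shape as table, True where the value is NA

print(mask.all())           # complete False, partial False, empty True
print(mask.any())           # complete False, partial True,  empty True
print(mask.sum())           # complete 0,     partial 1,     empty 3
print(mask.sum().sum())     # 4 NA in the whole table
print(table.notna().all())  # True only for the column with no NA
```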

               

### isna()

```python
# Produces a table with the same shape as the one being analysed,
# in which each cell holds:
#     True  if the value in the original table is NA;
#     False otherwise.
table1.isna()
```

| | Time | Country Name | Firms with female top manager (% of firms) | Labor force, female (% of total labor force) | Proportion of time spent on unpaid domestic and care work, female (% of 24 hour day) | Proportion of time spent on unpaid domestic and care work, male (% of 24 hour day) | Fertility rate, total (births per woman) | Literacy rate, adult female (% of females ages 15 and above) | Literacy rate, adult total (% of people ages 15 and above) | Literacy rate, adult male (% of males ages 15 and above) | Government expenditure on education, total (% of GDP) | Unmet need for contraception (% of married women ages 15-49) | Women making their own informed decisions regarding sexual relations, contraceptive use and reproductive health care (% of women age 15-49) | Women who believe a husband is justified in beating his wife when she goes out without telling him (%) | Women who were first married by age 15 (% of women ages 20-24) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | False | False | True | False | True | True | False | True | True | True | True | True | True | True | True |
| 1 | False | False | True | False | True | True | False | True | True | True | True | True | True | True | True |
| 2 | False | False | True | False | True | True | False | True | True | True | True | True | True | True | True |
| 3 | False | False | True | True | True | True | True | True | True | True | True | True | True | True | True |
| 4 | False | False | True | True | True | True | True | True | True | True | True | True | True | True | True |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 3192 | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True |
| 3193 | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True |
| 3194 | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True |
| 3195 | False | True | True | True | True | True | True | True | True | True | True | True | True | True | True |


### isna().all()

```python
# Adding .all() gives an aggregated view, one value per column:
#     False - not every value in the column is NA;
#     True  - every row of the column is NA.
table1.isna().all()
```

```text
Time                                                                                     False
Country Name                                                                             False
Firms with female top manager (% of firms)                                               False
Labor force, female (% of total labor force)                                             False
Proportion of time spent on unpaid domestic and care work, female (% of 24 hour day)     False
Proportion of time spent on unpaid domestic and care work, male (% of 24 hour day)       False
Fertility rate, total (births per woman)
```

### isna().sum()

```python
# Total number of NA in each column
table1.isna().sum()
```

```text
Time                                                                                         3
Country Name                                                                                 5
Firms with female top manager (% of firms)                                                2983
Labor force, female (% of total labor force)                                               385
Proportion of time spent on unpaid domestic and care work, female (% of 24 hour day)      3125
Proportion of time spent on unpaid domestic and care work, male (% of 24 hour day)        3125
Fertility rate, total (births per woman)
```

### isna().sum().sum()

```python
table1.isna().sum().sum()
```

30333
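Absolute counts are hard to compare between columns or tables of different sizes. Since the mean of a True/False column is the share of True values, `isna().mean()` gives the proportion of missing values per column; with the counts above, for instance, 2983 of the 3197 values of 'Firms with female top manager (% of firms)' are NA, about 93%. A sketch on a hypothetical table:

```python
import numpy as np
import pandas as pd

# Hypothetical table with missing values in both columns
table = pd.DataFrame({
    "Country Name": ["Afghanistan", "Albania", "Algeria", None],
    "Fertility rate": [7.565, 2.9, np.nan, np.nan],
})

# The mean of a True/False column is the share of True values
na_share = table.isna().mean() * 100          # % of missing values per column
print(na_share.sort_values(ascending=False))  # Fertility rate 50.0, Country Name 25.0
```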

### notna()

```python
# Produces a table with the same shape as the one being analysed,
# in which each cell holds:
#     True  if the value in the original table is not NA;
#     False if it is NA.
table1.notna()
```

| | Time | Country Name | Firms with female top manager (% of firms) | Labor force, female (% of total labor force) | Proportion of time spent on unpaid domestic and care work, female (% of 24 hour day) | Proportion of time spent on unpaid domestic and care work, male (% of 24 hour day) | Fertility rate, total (births per woman) | Literacy rate, adult female (% of females ages 15 and above) | Literacy rate, adult total (% of people ages 15 and above) | Literacy rate, adult male (% of males ages 15 and above) | Government expenditure on education, total (% of GDP) | Unmet need for contraception (% of married women ages 15-49) | Women making their own informed decisions regarding sexual relations, contraceptive use and reproductive health care (% of women age 15-49) | Women who believe a husband is justified in beating his wife when she goes out without telling him (%) | Women who were first married by age 15 (% of women ages 20-24) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | True | True | False | True | False | False | True | False | False | False | False | False | False | False | False |
| 1 | True | True | False | True | False | False | True | False | False | False | False | False | False | False | False |
| 2 | True | True | False | True | False | False | True | False | False | False | False | False | False | False | False |
| 3 | True | True | False | False | False | False | False | False | False | False | False | False | False | False | False |
| 4 | True | True | False | False | False | False | False | False | False | False | False | False | False | False | False |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 3192 | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False |
| 3193 | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False |
| 3194 | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False |
| 3195 | True | False | False | False | False | False | False | False | False | False | False | False | False | False | False |


### notna().all()

```python
# Adding .all() gives an aggregated view, one value per column:
#     False - there is at least one NA;
#     True  - there is no NA at all.
table1.notna().all()
```

```text
Time                                                                                     False
Country Name                                                                             False
Firms with female top manager (% of firms)                                               False
Labor force, female (% of total labor force)                                             False
Proportion of time spent on unpaid domestic and care work, female (% of 24 hour day)     False
Proportion of time spent on unpaid domestic and care work, male (% of 24 hour day)       False
Fertility rate, total (births per woman)
```

### notna().sum()

```python
# Total number of non-NA values in each column
table1.notna().sum()
```

```text
Time                                                                                      3194
Country Name                                                                              3192
Firms with female top manager (% of firms)                                                 214
Labor force, female (% of total labor force)                                              2812
Proportion of time spent on unpaid domestic and care work, female (% of 24 hour day)        72
Proportion of time spent on unpaid domestic and care work, male (% of 24 hour day)          72
Fertility rate, total (births per woman)
```

```python
# Applied to a single column, isna() returns one value per row:
# True - the value in that row is NA; False - it is not.
table1['Firms with female top manager (% of firms)'].isna()
```

```text
0       True
1       True
2       True
3       True
4       True
        ... 
3192    True
3193    True
3194    True
3195    True
3196    True
Name: Firms with female top manager (% of firms), Length: 3197, dtype: bool
```

```python
# With .any() we get an aggregated view of the column:
# True  - there is at least one NA in this column;
# False - there is no NA in this column.
# The parentheses matter: without them pandas returns the method itself
# instead of calling it.
# Here the result is True, since the next cell counts 2983 NA in this column.
table1['Firms with female top manager (% of firms)'].isna().any()
```

```python
# There are 2983 missing values in this column.
table1['Firms with female top manager (% of firms)'].isna().sum()
```

2983

## [**2.3 Excluding NA**](#Índice)  <a class="anchor"  id="II23"></a>

Goal|Function|Syntax|
:----|:----|:----|
Exclude the rows that have NA in a given column|`.notna()`|`table[table['column'].notna()]`|
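Besides a boolean mask, pandas provides `DataFrame.dropna()`. A sketch on a hypothetical table: `subset` limits the check to the listed columns (the same result as filtering with `notna()`), `how='all'` drops only the rows where every value is NA, like the empty rows at the end of the World Bank file, and the default `how='any'` drops every row with at least one NA:

```python
import numpy as np
import pandas as pd

# Hypothetical table: a complete row, a row with one NA and an empty row
table = pd.DataFrame({
    "Country Name": ["Albania", "Armenia", None],
    "Firms with female top manager": [12.2, np.nan, np.nan],
})

# Same result as table[table["Firms with female top manager"].notna()]
with_value = table.dropna(subset=["Firms with female top manager"])

# Drops only the rows in which every value is NA
no_empty_rows = table.dropna(how="all")

# Drops every row with at least one NA (the default, how="any")
complete_rows = table.dropna()

print(with_value, no_empty_rows, complete_rows, sep="\n\n")
```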

               

```python
# A boolean DataFrame used as a key does not remove rows: pandas applies it
# cell by cell, like table1.where(table1.notna()), keeping the shape and
# turning the cells marked False into NaN. Those cells are already NaN,
# so the result is identical to table1.
table1[table1.notna()]
```

| | Time | Country Name | Firms with female top manager (% of firms) | Labor force, female (% of total labor force) | Proportion of time spent on unpaid domestic and care work, female (% of 24 hour day) | Proportion of time spent on unpaid domestic and care work, male (% of 24 hour day) | Fertility rate, total (births per woman) | Literacy rate, adult female (% of females ages 15 and above) | Literacy rate, adult total (% of people ages 15 and above) | Literacy rate, adult male (% of males ages 15 and above) | Government expenditure on education, total (% of GDP) | Unmet need for contraception (% of married women ages 15-49) | Women making their own informed decisions regarding sexual relations, contraceptive use and reproductive health care (% of women age 15-49) | Women who believe a husband is justified in beating his wife when she goes out without telling him (%) | Women who were first married by age 15 (% of women ages 20-24) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1990 | Afghanistan | | 16.866045 | | | 7.565 | | | | | | | | |
| 1 | 1990 | Albania | | 40.805673 | | | 2.900 | | | | | | | | |
| 2 | 1990 | Algeria | | 12.238081 | | | 4.556 | | | | | | | | |
| 3 | 1990 | American Samoa | | | | | | | | | | | | | |
| 4 | 1990 | Andorra | | | | | | | | | | | | | |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 3192 | | | | | | | | | | | | | | | |
| 3193 | | | | | | | | | | | | | | | |
| 3194 | | | | | | | | | | | | | | | |
| 3195 | Data from database: World Development Indicators | | | | | | | | | | | | | | |
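The cell above returns the table unchanged. A sketch on a hypothetical table confirming that a boolean DataFrame key behaves like `DataFrame.where`, and showing that rows are only filtered when the key is a boolean Series with one value per row (here built with `notna().all(axis=1)`):

```python
import numpy as np
import pandas as pd

# Hypothetical table with NA in different places
table = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, np.nan, 6.0]})

# Boolean DataFrame as key: applied cell by cell, no row is removed
masked = table[table.notna()]
print(masked.equals(table))                       # True
print(masked.equals(table.where(table.notna())))  # True

# Boolean Series as key (one value per row): rows are filtered
rows_complete = table[table.notna().all(axis=1)]
print(rows_complete)                              # only row 2 has no NA
```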


```python
# Excludes every row that has an NA in the given column:
# notna() keeps only the rows whose value is not NA.
table1[table1['Firms with female top manager (% of firms)'].notna()]
```

| | Time | Country Name | Firms with female top manager (% of firms) | Labor force, female (% of total labor force) | Proportion of time spent on unpaid domestic and care work, female (% of 24 hour day) | Proportion of time spent on unpaid domestic and care work, male (% of 24 hour day) | Fertility rate, total (births per woman) | Literacy rate, adult female (% of females ages 15 and above) | Literacy rate, adult total (% of people ages 15 and above) | Literacy rate, adult male (% of males ages 15 and above) | Government expenditure on education, total (% of GDP) | Unmet need for contraception (% of married women ages 15-49) | Women making their own informed decisions regarding sexual relations, contraceptive use and reproductive health care (% of women age 15-49) | Women who believe a husband is justified in beating his wife when she goes out without telling him (%) | Women who were first married by age 15 (% of women ages 20-24) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 533 | 2013 | Albania | 12.2 | 41.853022 | | | 1.697 | | | | 3.539300 | | | | |
| 540 | 2013 | Armenia | 14.1 | 50.483535 | | | 1.600 | | | | 2.650180 | | | | |
| 544 | 2013 | Azerbaijan | 2.6 | 49.915723 | | | 1.980 | 99.716820 | 99.789360 | 99.865921 | 2.442130 | | | | |
| 547 | 2013 | Bangladesh | 4.8 | 28.559370 | | | 2.184 | 57.789001 | 61.015541 | 64.213768 | 1.966160 | 13.895225 | | | |
| 549 | 2013 | Belarus | 32.2 | 49.902654 | | | 1.668 | | | | 5.009420 | | | | |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 3015 | 2022 | India | 6.8 | 23.542186 | | | | 69.102150 | 76.322777 | 83.451813 | | | | | |
| 3018 | 2022 | Iraq | 1.6 | 14.402375 | | | | | | | | | | | |
| 3044 | 2022 | Madagascar | 37.3 | 48.970102 | | | | 76.029999 | 77.480003 | 78.930000 | 3.139020 | | | | |
| 3075 | 2022 | Pakistan | 3.4 | 23.311754 | | | | | | | 1.974449 | | | | |



# [**3. Getting to know the statistics of each column**](#Índice)  <a class="anchor"  id="II3"></a>

## [**3.1 describe()**](#Índice)  <a class="anchor"  id="II31"></a>

```python
table1.describe()
```

| | Firms with female top manager (% of firms) | Labor force, female (% of total labor force) | Proportion of time spent on unpaid domestic and care work, female (% of 24 hour day) | Proportion of time spent on unpaid domestic and care work, male (% of 24 hour day) | Fertility rate, total (births per woman) | Literacy rate, adult female (% of females ages 15 and above) | Literacy rate, adult total (% of people ages 15 and above) | Literacy rate, adult male (% of males ages 15 and above) | Government expenditure on education, total (% of GDP) | Unmet need for contraception (% of married women ages 15-49) | Women making their own informed decisions regarding sexual relations, contraceptive use and reproductive health care (% of women age 15-49) | Women who believe a husband is justified in beating his wife when she goes out without telling him (%) | Women who were first married by age 15 (% of women ages 20-24) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 214.0 | 2812.0 | 72.0 | 72.0 | 2840.0 | 899.0 | 900.0 | 898.0 | 1940.0 | 299.0 | 57.0 | 75.0 | 158.0 |
| mean | 17.735595 | 40.716067 | 17.473393 | 7.488555 | 2.853734 | 76.850282 | 81.081614 | 85.313455 | 4.35355 | 19.678702 | 52.021053 | 27.065333 | 6.259494 |
| std | 9.799419 | 8.948439 | 4.220799 | 3.048831 | 1.450443 | 19.950783 | 16.554112 | 13.248059 | 1.952922 | 8.707947 | 21.503386 | 16.942251 | 6.599479 |
| min | 0.9 | 7.786072 | 5.02051 | 2.22222 | 0.772 | 12.79642 | 22.31155 | 31.328791 | 0.127174 | 3.49283 | 4.9 | 0.8 | 0.0 |
| 25% | 11.675 | 38.26428 | 15.289528 | 4.561298 | 1.707823 | 61.099674 | 67.991722 | 75.301466 | 3.12499 | 12.653605 | 40.2 | 14.45 | 1.425 |
| 50% | 17.236111 | 43.91277 | 16.819445 | 8.25119 | 2.374071 | 82.291847 | 85.449062 | 88.397068 | 4.132104 | 19.7 | 56.7 | 25.5 | 4.6 |
| 75% | 22.075 | 46.620032 | 19.604105 | 9.80417 | 3.826809 | 94.119415 | 95.422932 | 96.957411 | 5.286955 | 25.559743 | 64.9 | 38.95 | 8.775 |
| max | 64.8 | 53.877323 | 29.52199 | 13.81944 | 8.606 | 100.0 | 100.0 | 100.0 | 15.585125 | 40.2 | 96.2 | 66.9 | 38.2 |
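When a table mixes types, `describe()` summarises only the numeric columns by default, which is why 'Time' and 'Country Name' do not appear above. With `include='all'`, text columns get `count`, `unique`, `top` (the most frequent value) and `freq` (its frequency). A sketch on a hypothetical table:

```python
import pandas as pd

# Hypothetical table mixing a text column and a numeric column
table = pd.DataFrame({
    "Country Name": ["Albania", "Albania", "Armenia"],
    "Fertility rate": [1.697, 1.6, 1.98],
})

numeric_only = table.describe()             # default: numeric columns only
everything = table.describe(include="all")  # text columns too: unique, top, freq

print(numeric_only)
print(everything)
```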


## [**3.2 Descriptive statistics**](#Índice)  <a class="anchor"  id="II32"></a>

Goal|Function|Syntax|
:----|:----|:----|
Mean|`.mean()`|`table['column'].mean()`|
Median|`.median()`|`table['column'].median()`|
Standard deviation|`.std()`|`table['column'].std()`|
Pearson correlation coefficient|`.corr()`|`table['column1'].corr(table['column2'], method='pearson', min_periods=None)`|
Covariance|`.cov()`|`table['column1'].cov(table['column2'], min_periods=None)`|
Mode|`.mode()`|`table['column'].mode()`|
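These methods ignore NA values by default (`skipna=True` for the first three, `dropna=True` for `mode()`), so they can be applied directly to columns full of gaps such as the ones above. A sketch on a small hypothetical Series whose results are easy to check by hand; note that `mode()` returns a Series, since a column may have several equally frequent values:

```python
import numpy as np
import pandas as pd

# Hypothetical column with a missing value, which every method below ignores
values = pd.Series([2.0, 4.0, 4.0, 6.0, np.nan])

print(values.mean())    # (2 + 4 + 4 + 6) / 4 = 4.0
print(values.median())  # middle of 2, 4, 4, 6: (4 + 4) / 2 = 4.0
print(values.std())     # sample standard deviation (ddof=1): sqrt(8 / 3)
print(values.mode())    # a Series with a single value, 4.0

# With a tie, mode() returns every most frequent value, sorted
tie = pd.Series([1, 1, 2, 2, 3])
print(tie.mode())       # 1 and 2
```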

               

```python
# Mean
table1['Firms with female top manager (% of firms)'].mean()
```

17.735595133784226

```python
# Median
table1['Firms with female top manager (% of firms)'].median()
```

17.2361111111111

```python
# Standard deviation
table1['Firms with female top manager (% of firms)'].std()
```

9.799418689843838

```python
# Mode
table1['Firms with female top manager (% of firms)'].mode()
```

```text
0    14.0
Name: Firms with female top manager (% of firms), dtype: float64
```

```python
# Correlation coefficient between two variables (columns)
table1['Firms with female top manager (% of firms)'].corr(
    table1['Proportion of time spent on unpaid domestic and care work, female (% of 24 hour day)'],
    method='pearson',
    min_periods=None,
)
```

-0.04073444659048812

```python
# Covariance between two variables (columns)
table1['Firms with female top manager (% of firms)'].cov(
    table1['Proportion of time spent on unpaid domestic and care work, female (% of 24 hour day)'],
    min_periods=None,
)
```

-2.39658245238095
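The two results are related: Pearson's coefficient is the covariance divided by the product of the standard deviations,

$$r_{xy} = \frac{\operatorname{cov}(x, y)}{s_x\, s_y}.$$

One caveat when checking this by hand: `corr()` and `cov()` only use the rows where both columns have a value, so the standard deviations must be computed on those same rows. A sketch on hypothetical data:

```python
import numpy as np
import pandas as pd

# Hypothetical columns with a missing value in different rows
x = pd.Series([1.0, 2.0, 3.0, 4.0, np.nan])
y = pd.Series([2.0, 1.0, 4.0, np.nan, 5.0])

# corr() and cov() use only the rows where both values are present
both = x.notna() & y.notna()

cov_xy = x.cov(y)
r_pandas = x.corr(y, method="pearson")
r_manual = cov_xy / (x[both].std() * y[both].std())

print(cov_xy)              # approximately 1.0
print(r_pandas, r_manual)  # the same value, up to floating-point rounding
```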

#### [Back to the table of contents](#Índice)