## Tabular data preprocessing

In [None]:
from fastai.gen_doc.nbdoc import *
from fastai.tabular import *


## Overview

This package contains the base class to define a transformation for preprocessing tabular dataframes, as well as some basic processors built on [`TabularProc`](/tabular.transform.html#TabularProc). Preprocessing includes things like
- replacing non-numerical variables by categories, then by their ids,
- filling missing values,
- normalizing continuous variables.

In all those steps we have to be careful to apply to our validation or test set the correspondence we decided on our training set (which id we give to each category, what value we put for missing data, or which mean/std we use to normalize). To deal with this, we use a special class called [`TabularProc`](/tabular.transform.html#TabularProc).

The data used in this documentation page is a subset of the [adult dataset](https://archive.ics.uci.edu/ml/datasets/adult). It gives a certain amount of data on individuals, to train a model to predict whether their salary is greater than \$50k or not.

In [None]:
path = untar_data(URLs.ADULT_SAMPLE)
df = pd.read_csv(path/'adult.csv')
train_df, valid_df = df.iloc[:800].copy(), df.iloc[800:1000].copy()
train_df.head()

| | age | workclass | fnlwgt | education | education-num | marital-status | occupation | relationship | race | sex | capital-gain | capital-loss | hours-per-week | native-country | salary |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 49 | Private | 101320 | Assoc-acdm | 12.0 | Married-civ-spouse | | Wife | White | Female | 0 | 1902 | 40 | United-States | >=50k |
| 1 | 44 | Private | 236746 | Masters | 14.0 | Divorced | Exec-managerial | Not-in-family | White | Male | 10520 | 0 | 45 | United-States | >=50k |
| 2 | 38 | Private | 96185 | HS-grad | | Divorced | | Unmarried | Black | Female | 0 | 0 | 32 | United-States | <50k |
| 3 | 38 | Self-emp-inc | 112847 | Prof-school | 15.0 | Married-civ-spouse | Prof-specialty | Husband | Asian-Pac-Islander | Male | 0 | 0 | 40 | United-States | >=50k |
| 4 | 42 | Self-emp-not-inc | 82297 | 7th-8th | | Married-civ-spouse | Other-service | Wife | Black | Female | 0 | 0 | 50 | United-States | <50k |


We can see that it contains numerical variables (like `age` or `education-num`) as well as categorical ones (like `workclass` or `relationship`). The original dataset is clean, but we removed a few values to give examples of dealing with missing variables.

In [None]:
cat_names = ['workclass', 'education', 'marital-status', 'occupation', 'relationship', 'race', 'sex', 'native-country']
cont_names = ['age', 'fnlwgt', 'education-num', 'capital-gain', 'capital-loss', 'hours-per-week']

## Transforms for tabular data

In [None]:
show_doc(TabularProc)

<h2 id="TabularProc" class="doc_header"><code>class</code> <code>TabularProc</code><a href="https://github.com/fastai/fastai/blob/master/fastai/tabular/transform.py#L116" class="source_link" style="float:right">[source]</a><a class="source_link" data-toggle="collapse" data-target="#TabularProc-pytest" style="float:right; padding-right:10px">[test]</a></h2>

> <code>TabularProc</code>(**`cat_names`**:`StrList`, **`cont_names`**:`StrList`)

<div class="collapse" id="TabularProc-pytest"><div class="card card-body pytest_card"><a type="button" data-toggle="collapse" data-target="#TabularProc-pytest" class="close" aria-label="Close"><span aria-hidden="true">&times;</span></a><p>No tests found for <code>TabularProc</code>. To contribute a test please refer to <a href="/dev/test.html">this guide</a> and <a href="https://forums.fast.ai/t/improving-expanding-functional-tests/32929">this discussion</a>.</p></div></div>

A processor for tabular dataframes.  

Base class for creating transforms for dataframes with categorical variables `cat_names` and continuous variables `cont_names`. Note that any column not in one of those lists won't be touched.

In [None]:
show_doc(TabularProc.__call__)

<h4 id="TabularProc.__call__" class="doc_header"><code>__call__</code><a href="https://github.com/fastai/fastai/blob/master/fastai/tabular/transform.py#L121" class="source_link" style="float:right">[source]</a><a class="source_link" data-toggle="collapse" data-target="#TabularProc-__call__-pytest" style="float:right; padding-right:10px">[test]</a></h4>

> <code>__call__</code>(**`df`**:`DataFrame`, **`test`**:`bool`=***`False`***)

<div class="collapse" id="TabularProc-__call__-pytest"><div class="card card-body pytest_card"><a type="button" data-toggle="collapse" data-target="#TabularProc-__call__-pytest" class="close" aria-label="Close"><span aria-hidden="true">&times;</span></a><p>No tests found for <code>__call__</code>. To contribute a test please refer to <a href="/dev/test.html">this guide</a> and <a href="https://forums.fast.ai/t/improving-expanding-functional-tests/32929">this discussion</a>.</p></div></div>

Apply the correct function to `df` depending on `test`.  

In [None]:
show_doc(TabularProc.apply_train)

<h4 id="TabularProc.apply_train" class="doc_header"><code>apply_train</code><a href="https://github.com/fastai/fastai/blob/master/fastai/tabular/transform.py#L126" class="source_link" style="float:right">[source]</a><a class="source_link" data-toggle="collapse" data-target="#TabularProc-apply_train-pytest" style="float:right; padding-right:10px">[test]</a></h4>

> <code>apply_train</code>(**`df`**:`DataFrame`)

<div class="collapse" id="TabularProc-apply_train-pytest"><div class="card card-body pytest_card"><a type="button" data-toggle="collapse" data-target="#TabularProc-apply_train-pytest" class="close" aria-label="Close"><span aria-hidden="true">&times;</span></a><p>Tests found for <code>apply_train</code>:</p><p>Some other tests where <code>apply_train</code> is used:</p><ul><li><code>pytest -sv tests/test_tabular_transform.py::test_categorify</code> <a href="https://github.com/fastai/fastai/blob/master/tests/test_tabular_transform.py#L6" class="source_link" style="float:right">[source]</a></li><li><code>pytest -sv tests/test_tabular_transform.py::test_fill_missing_leaves_no_na_values</code> <a href="https://github.com/fastai/fastai/blob/master/tests/test_tabular_transform.py#L38" class="source_link" style="float:right">[source]</a></li><li><code>pytest -sv tests/test_tabular_transform.py::test_fill_missing_returns_correct_medians</code> <a href="https://github.com/fastai/fastai/blob/master/tests/test_tabular_transform.py#L52" class="source_link" style="float:right">[source]</a></li></ul><p>To run tests please refer to this <a href="/dev/test.html#quick-guide">guide</a>.</p></div></div>

Function applied to `df` if it's the train set.  

In [None]:
show_doc(TabularProc.apply_test)

<h4 id="TabularProc.apply_test" class="doc_header"><code>apply_test</code><a href="https://github.com/fastai/fastai/blob/master/fastai/tabular/transform.py#L129" class="source_link" style="float:right">[source]</a><a class="source_link" data-toggle="collapse" data-target="#TabularProc-apply_test-pytest" style="float:right; padding-right:10px">[test]</a></h4>

> <code>apply_test</code>(**`df`**:`DataFrame`)

<div class="collapse" id="TabularProc-apply_test-pytest"><div class="card card-body pytest_card"><a type="button" data-toggle="collapse" data-target="#TabularProc-apply_test-pytest" class="close" aria-label="Close"><span aria-hidden="true">&times;</span></a><p>Tests found for <code>apply_test</code>:</p><p>Some other tests where <code>apply_test</code> is used:</p><ul><li><code>pytest -sv tests/test_tabular_transform.py::test_categorify</code> <a href="https://github.com/fastai/fastai/blob/master/tests/test_tabular_transform.py#L6" class="source_link" style="float:right">[source]</a></li><li><code>pytest -sv tests/test_tabular_transform.py::test_fill_missing_leaves_no_na_values</code> <a href="https://github.com/fastai/fastai/blob/master/tests/test_tabular_transform.py#L38" class="source_link" style="float:right">[source]</a></li><li><code>pytest -sv tests/test_tabular_transform.py::test_fill_missing_returns_correct_medians</code> <a href="https://github.com/fastai/fastai/blob/master/tests/test_tabular_transform.py#L52" class="source_link" style="float:right">[source]</a></li></ul><p>To run tests please refer to this <a href="/dev/test.html#quick-guide">guide</a>.</p></div></div>

Function applied to `df` if it's the test set.  

In [None]:
jekyll_important("Those two functions must be implemented in a subclass. `apply_test` defaults to `apply_train`.")

<div markdown="span" class="alert alert-warning" role="alert"><i class="fa fa-warning-circle"></i> <b>Important: </b>Those two functions must be implemented in a subclass. `apply_test` defaults to `apply_train`.</div>
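Since `apply_train` and `apply_test` are the only hooks a subclass has to provide, the pattern can be sketched in a few lines of plain pandas. The classes `SketchProc` and `ClipOutliers` below are hypothetical stand-ins written for illustration, not fastai code. The base class copies the documented dispatch of `__call__(df, test=False)` and the default of `apply_test` to `apply_train`. The subclass learns clipping bounds on the training set and reuses them on the validation set.

```python
import pandas as pd


class SketchProc:
    "Hypothetical stand-in for `TabularProc`: state is learned in `apply_train`, reused in `apply_test`."

    def __init__(self, cat_names, cont_names):
        self.cat_names, self.cont_names = list(cat_names), list(cont_names)

    def __call__(self, df, test=False):
        # Dispatch on `test`, like the documented `TabularProc.__call__`.
        func = self.apply_test if test else self.apply_train
        func(df)

    def apply_train(self, df):
        raise NotImplementedError  # must be implemented in a subclass

    def apply_test(self, df):
        # Default: behave as on the training set.
        self.apply_train(df)


class ClipOutliers(SketchProc):
    "Hypothetical proc: clip continuous columns to the [min, max] range seen in training."

    def apply_train(self, df):
        # Learn the bounds on the training set, then apply them.
        self.bounds = {c: (df[c].min(), df[c].max()) for c in self.cont_names}
        self.apply_test(df)

    def apply_test(self, df):
        # Reuse the training bounds; the dataframe is modified in place.
        for c, (lo, hi) in self.bounds.items():
            df[c] = df[c].clip(lo, hi)


train = pd.DataFrame({"age": [20, 30, 40], "job": ["a", "b", "a"]})
valid = pd.DataFrame({"age": [10, 35, 90], "job": ["b", "a", "c"]})
proc = ClipOutliers(cat_names=["job"], cont_names=["age"])
proc(train)
proc(valid, test=True)
print(valid["age"].tolist())  # [20, 35, 40]
```

Clipping was picked only because it makes the train-then-reuse dependency easy to see. Any statistic computed in `apply_train` and stored on `self` follows the same shape.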

The following [`TabularProc`](/tabular.transform.html#TabularProc) subclasses are implemented in the fastai library. Note that replacing categories by their codes and normalizing the continuous variables are both done automatically in a [`TabularDataBunch`](/tabular.data.html#TabularDataBunch).

In [None]:
show_doc(Categorify)

<h2 id="Categorify" class="doc_header"><code>class</code> <code>Categorify</code><a href="https://github.com/fastai/fastai/blob/master/fastai/tabular/transform.py#L133" class="source_link" style="float:right">[source]</a><a class="source_link" data-toggle="collapse" data-target="#Categorify-pytest" style="float:right; padding-right:10px">[test]</a></h2>

> <code>Categorify</code>(**`cat_names`**:`StrList`, **`cont_names`**:`StrList`) :: [`TabularProc`](/tabular.transform.html#TabularProc)

<div class="collapse" id="Categorify-pytest"><div class="card card-body pytest_card"><a type="button" data-toggle="collapse" data-target="#Categorify-pytest" class="close" aria-label="Close"><span aria-hidden="true">&times;</span></a><p>Tests found for <code>Categorify</code>:</p><ul><li><code>pytest -sv tests/test_tabular_transform.py::test_categorify</code> <a href="https://github.com/fastai/fastai/blob/master/tests/test_tabular_transform.py#L6" class="source_link" style="float:right">[source]</a></li></ul><p>To run tests please refer to this <a href="/dev/test.html#quick-guide">guide</a>.</p></div></div>

Transform the categorical variables to that type.  

Variables in `cont_names` aren't affected.

In [None]:
show_doc(Categorify.apply_train)

<h4 id="Categorify.apply_train" class="doc_header"><code>apply_train</code><a href="https://github.com/fastai/fastai/blob/master/fastai/tabular/transform.py#L135" class="source_link" style="float:right">[source]</a><a class="source_link" data-toggle="collapse" data-target="#Categorify-apply_train-pytest" style="float:right; padding-right:10px">[test]</a></h4>

> <code>apply_train</code>(**`df`**:`DataFrame`)

<div class="collapse" id="Categorify-apply_train-pytest"><div class="card card-body pytest_card"><a type="button" data-toggle="collapse" data-target="#Categorify-apply_train-pytest" class="close" aria-label="Close"><span aria-hidden="true">&times;</span></a><p>Tests found for <code>apply_train</code>:</p><p>Some other tests where <code>apply_train</code> is used:</p><ul><li><code>pytest -sv tests/test_tabular_transform.py::test_categorify</code> <a href="https://github.com/fastai/fastai/blob/master/tests/test_tabular_transform.py#L6" class="source_link" style="float:right">[source]</a></li><li><code>pytest -sv tests/test_tabular_transform.py::test_fill_missing_leaves_no_na_values</code> <a href="https://github.com/fastai/fastai/blob/master/tests/test_tabular_transform.py#L38" class="source_link" style="float:right">[source]</a></li><li><code>pytest -sv tests/test_tabular_transform.py::test_fill_missing_returns_correct_medians</code> <a href="https://github.com/fastai/fastai/blob/master/tests/test_tabular_transform.py#L52" class="source_link" style="float:right">[source]</a></li></ul><p>To run tests please refer to this <a href="/dev/test.html#quick-guide">guide</a>.</p></div></div>

Transform `self.cat_names` columns in categorical.  

In [None]:
show_doc(Categorify.apply_test)

<h4 id="Categorify.apply_test" class="doc_header"><code>apply_test</code><a href="https://github.com/fastai/fastai/blob/master/fastai/tabular/transform.py#L142" class="source_link" style="float:right">[source]</a><a class="source_link" data-toggle="collapse" data-target="#Categorify-apply_test-pytest" style="float:right; padding-right:10px">[test]</a></h4>

> <code>apply_test</code>(**`df`**:`DataFrame`)

<div class="collapse" id="Categorify-apply_test-pytest"><div class="card card-body pytest_card"><a type="button" data-toggle="collapse" data-target="#Categorify-apply_test-pytest" class="close" aria-label="Close"><span aria-hidden="true">&times;</span></a><p>Tests found for <code>apply_test</code>:</p><p>Some other tests where <code>apply_test</code> is used:</p><ul><li><code>pytest -sv tests/test_tabular_transform.py::test_categorify</code> <a href="https://github.com/fastai/fastai/blob/master/tests/test_tabular_transform.py#L6" class="source_link" style="float:right">[source]</a></li><li><code>pytest -sv tests/test_tabular_transform.py::test_fill_missing_leaves_no_na_values</code> <a href="https://github.com/fastai/fastai/blob/master/tests/test_tabular_transform.py#L38" class="source_link" style="float:right">[source]</a></li><li><code>pytest -sv tests/test_tabular_transform.py::test_fill_missing_returns_correct_medians</code> <a href="https://github.com/fastai/fastai/blob/master/tests/test_tabular_transform.py#L52" class="source_link" style="float:right">[source]</a></li></ul><p>To run tests please refer to this <a href="/dev/test.html#quick-guide">guide</a>.</p></div></div>

Transform `self.cat_names` columns in categorical using the codes decided in `apply_train`.  

In [None]:
tfm = Categorify(cat_names, cont_names)
tfm(train_df)
tfm(valid_df, test=True)

Since we haven't replaced the categories by their codes, nothing visible has changed in the dataframe yet, but we can check that the variables are now categorical and view their corresponding codes.

In [None]:
train_df['workclass'].cat.categories

Index([' ?', ' Federal-gov', ' Local-gov', ' Private', ' Self-emp-inc',
       ' Self-emp-not-inc', ' State-gov', ' Without-pay'],
      dtype='object')

The test set will be given the same category codes as the training set.

In [None]:
valid_df['workclass'].cat.categories

Index([' ?', ' Federal-gov', ' Local-gov', ' Private', ' Self-emp-inc',
       ' Self-emp-not-inc', ' State-gov', ' Without-pay'],
      dtype='object')
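As a sketch, the same train-then-reuse idea behind `Categorify` can be reproduced with plain pandas. The categories are learned from the training column and forced onto the validation column with `pd.Categorical(..., categories=...)`. A value never seen in training (the made-up `Never-worked` below) then becomes missing, with code `-1`. The toy data is invented for illustration. This shows the principle and does not claim to match every detail of how fastai orders or stores categories.

```python
import pandas as pd

# Toy data, invented for illustration.
train = pd.DataFrame({"workclass": ["Private", "State-gov", "Private", "Federal-gov"]})
valid = pd.DataFrame({"workclass": ["State-gov", "Never-worked", "Private"]})

# Learn the categories on the training set only (pandas sorts them).
train["workclass"] = train["workclass"].astype("category")
categories = train["workclass"].cat.categories

# Reuse exactly the same categories on the validation set.
valid["workclass"] = pd.Categorical(valid["workclass"], categories=categories)

print(list(categories))                       # ['Federal-gov', 'Private', 'State-gov']
print(valid["workclass"].cat.codes.tolist())  # [2, -1, 1]
```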

In [None]:
show_doc(FillMissing)

<h2 id="FillMissing" class="doc_header"><code>class</code> <code>FillMissing</code><a href="https://github.com/fastai/fastai/blob/master/fastai/tabular/transform.py#L150" class="source_link" style="float:right">[source]</a><a class="source_link" data-toggle="collapse" data-target="#FillMissing-pytest" style="float:right; padding-right:10px">[test]</a></h2>

> <code>FillMissing</code>(**`cat_names`**:`StrList`, **`cont_names`**:`StrList`, **`fill_strategy`**:[`FillStrategy`](/tabular.transform.html#FillStrategy)=***`<FillStrategy.MEDIAN: 1>`***, **`add_col`**:`bool`=***`True`***, **`fill_val`**:`float`=***`0.0`***) :: [`TabularProc`](/tabular.transform.html#TabularProc)

<div class="collapse" id="FillMissing-pytest"><div class="card card-body pytest_card"><a type="button" data-toggle="collapse" data-target="#FillMissing-pytest" class="close" aria-label="Close"><span aria-hidden="true">&times;</span></a><p>Tests found for <code>FillMissing</code>:</p><ul><li><code>pytest -sv tests/test_tabular_transform.py::test_default_fill_strategy_is_median</code> <a href="https://github.com/fastai/fastai/blob/master/tests/test_tabular_transform.py#L31" class="source_link" style="float:right">[source]</a></li></ul><p>Some other tests where <code>FillMissing</code> is used:</p><ul><li><code>pytest -sv tests/test_tabular_transform.py::test_fill_missing_leaves_no_na_values</code> <a href="https://github.com/fastai/fastai/blob/master/tests/test_tabular_transform.py#L38" class="source_link" style="float:right">[source]</a></li><li><code>pytest -sv tests/test_tabular_transform.py::test_fill_missing_returns_correct_medians</code> <a href="https://github.com/fastai/fastai/blob/master/tests/test_tabular_transform.py#L52" class="source_link" style="float:right">[source]</a></li></ul><p>To run tests please refer to this <a href="/dev/test.html#quick-guide">guide</a>.</p></div></div>

Fill the missing values in continuous columns.  

`cat_names` variables are left untouched (their missing values will be replaced by code 0 in the [`TabularDataBunch`](/tabular.data.html#TabularDataBunch)). The [`fill_strategy`](#FillStrategy) is used to replace those nans and, if `add_col` is `True`, whenever a column `c` has missing values, a column named `c_nan` is added that flags the rows where the value was missing.
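A minimal pandas sketch of the median strategy with `add_col=True`, on a toy column. The flag column is created before filling, the median is computed on the training set only, and that same median is reused on the validation set. The `_nan` suffix follows the naming in the paragraph above. The variable names (`fillers`, etc.) are illustrative and not part of fastai.

```python
import pandas as pd

# Toy data with missing values (None becomes NaN in a float column).
train = pd.DataFrame({"education-num": [12.0, None, 14.0, None, 9.0]})
valid = pd.DataFrame({"education-num": [None, 13.0]})

fillers = {}
for col in ["education-num"]:
    if train[col].isna().any():
        # Flag the rows that were missing *before* filling them.
        train[f"{col}_nan"] = train[col].isna()
        fillers[col] = train[col].median()  # MEDIAN strategy, training set only
        train[col] = train[col].fillna(fillers[col])

# On the validation set, reuse the training median.
for col, value in fillers.items():
    valid[f"{col}_nan"] = valid[col].isna()
    valid[col] = valid[col].fillna(value)

print(train)
print(valid)
```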

In [None]:
show_doc(FillMissing.apply_train)

<h4 id="FillMissing.apply_train" class="doc_header"><code>apply_train</code><a href="https://github.com/fastai/fastai/blob/master/fastai/tabular/transform.py#L155" class="source_link" style="float:right">[source]</a><a class="source_link" data-toggle="collapse" data-target="#FillMissing-apply_train-pytest" style="float:right; padding-right:10px">[test]</a></h4>

> <code>apply_train</code>(**`df`**:`DataFrame`)

<div class="collapse" id="FillMissing-apply_train-pytest"><div class="card card-body pytest_card"><a type="button" data-toggle="collapse" data-target="#FillMissing-apply_train-pytest" class="close" aria-label="Close"><span aria-hidden="true">&times;</span></a><p>Tests found for <code>apply_train</code>:</p><ul><li><code>pytest -sv tests/test_tabular_transform.py::test_fill_missing_leaves_no_na_values</code> <a href="https://github.com/fastai/fastai/blob/master/tests/test_tabular_transform.py#L38" class="source_link" style="float:right">[source]</a></li><li><code>pytest -sv tests/test_tabular_transform.py::test_fill_missing_returns_correct_medians</code> <a href="https://github.com/fastai/fastai/blob/master/tests/test_tabular_transform.py#L52" class="source_link" style="float:right">[source]</a></li></ul><p>Some other tests where <code>apply_train</code> is used:</p><ul><li><code>pytest -sv tests/test_tabular_transform.py::test_categorify</code> <a href="https://github.com/fastai/fastai/blob/master/tests/test_tabular_transform.py#L6" class="source_link" style="float:right">[source]</a></li></ul><p>To run tests please refer to this <a href="/dev/test.html#quick-guide">guide</a>.</p></div></div>

Fill missing values in `self.cont_names` according to `self.fill_strategy`.  

In [None]:
show_doc(FillMissing.apply_test)

<h4 id="FillMissing.apply_test" class="doc_header"><code>apply_test</code><a href="https://github.com/fastai/fastai/blob/master/fastai/tabular/transform.py#L169" class="source_link" style="float:right">[source]</a><a class="source_link" data-toggle="collapse" data-target="#FillMissing-apply_test-pytest" style="float:right; padding-right:10px">[test]</a></h4>

> <code>apply_test</code>(**`df`**:`DataFrame`)

<div class="collapse" id="FillMissing-apply_test-pytest"><div class="card card-body pytest_card"><a type="button" data-toggle="collapse" data-target="#FillMissing-apply_test-pytest" class="close" aria-label="Close"><span aria-hidden="true">&times;</span></a><p>Tests found for <code>apply_test</code>:</p><ul><li><code>pytest -sv tests/test_tabular_transform.py::test_fill_missing_leaves_no_na_values</code> <a href="https://github.com/fastai/fastai/blob/master/tests/test_tabular_transform.py#L38" class="source_link" style="float:right">[source]</a></li><li><code>pytest -sv tests/test_tabular_transform.py::test_fill_missing_returns_correct_medians</code> <a href="https://github.com/fastai/fastai/blob/master/tests/test_tabular_transform.py#L52" class="source_link" style="float:right">[source]</a></li></ul><p>Some other tests where <code>apply_test</code> is used:</p><ul><li><code>pytest -sv tests/test_tabular_transform.py::test_categorify</code> <a href="https://github.com/fastai/fastai/blob/master/tests/test_tabular_transform.py#L6" class="source_link" style="float:right">[source]</a></li></ul><p>To run tests please refer to this <a href="/dev/test.html#quick-guide">guide</a>.</p></div></div>

Fill missing values in `self.cont_names` like in `apply_train`.  

Fills the missing values in the `cont_names` columns with the ones picked during training.

In [None]:
train_df[cont_names].head()

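| | age | fnlwgt | education-num | capital-gain | capital-loss | hours-per-week |
|---|---|---|---|---|---|---|
| 0 | 49 | 101320 | 12.0 | 0 | 1902 | 40 |
| 1 | 44 | 236746 | 14.0 | 10520 | 0 | 45 |
| 2 | 38 | 96185 | | 0 | 0 | 32 |
| 3 | 38 | 112847 | 15.0 | 0 | 0 | 40 |
| 4 | 42 | 82297 | | 0 | 0 | 50 |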


In [None]:
tfm = FillMissing(cat_names, cont_names)
tfm(train_df)
tfm(valid_df, test=True)
train_df[cont_names].head()

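| | age | fnlwgt | education-num | capital-gain | capital-loss | hours-per-week |
|---|---|---|---|---|---|---|
| 0 | 49 | 101320 | 12.0 | 0 | 1902 | 40 |
| 1 | 44 | 236746 | 14.0 | 10520 | 0 | 45 |
| 2 | 38 | 96185 | 10.0 | 0 | 0 | 32 |
| 3 | 38 | 112847 | 15.0 | 0 | 0 | 40 |
| 4 | 42 | 82297 | 10.0 | 0 | 0 | 50 |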


Missing values in the `education-num` column are replaced by 10, which is the median of that column in `train_df`. Categorical variables are not changed, since `nan` is simply used as another category.

In [None]:
valid_df[cont_names].head()

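| | age | fnlwgt | education-num | capital-gain | capital-loss | hours-per-week |
|---|---|---|---|---|---|---|
| 800 | 45 | 96975 | 10.0 | 0 | 0 | 40 |
| 801 | 46 | 192779 | 10.0 | 15024 | 0 | 60 |
| 802 | 36 | 376455 | 10.0 | 0 | 0 | 38 |
| 803 | 25 | 50053 | 10.0 | 0 | 0 | 45 |
| 804 | 37 | 164526 | 10.0 | 0 | 0 | 40 |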


In [None]:
show_doc(FillStrategy, alt_doc_string='Enum flag that determines how `FillMissing` should handle missing/nan values', arg_comments={
    'MEDIAN':'nans are replaced by the median value of the column',
    'COMMON': 'nans are replaced by the most common value of the column',
    'CONSTANT': 'nans are replaced by `fill_val`'
})

<h2 id="FillStrategy" class="doc_header">`FillStrategy`<a class="source_link" data-toggle="collapse" data-target="#FillStrategy-pytest" style="float:right; padding-right:10px">[test]</a></h2>

> <code>Enum</code> = [MEDIAN, COMMON, CONSTANT]

<div class="collapse" id="FillStrategy-pytest"><div class="card card-body pytest_card"><a type="button" data-toggle="collapse" data-target="#FillStrategy-pytest" class="close" aria-label="Close"><span aria-hidden="true">&times;</span></a><p>Tests found for <code>FillStrategy</code>:</p><p>Some other tests where <code>FillStrategy</code> is used:</p><ul><li><code>pytest -sv tests/test_tabular_transform.py::test_default_fill_strategy_is_median</code> <a href="https://github.com/fastai/fastai/blob/master/tests/test_tabular_transform.py#L31" class="source_link" style="float:right">[source]</a></li></ul><p>To run tests please refer to this <a href="/dev/test.html#quick-guide">guide</a>.</p></div></div>

Enum flag that determines how [`FillMissing`](/tabular.transform.html#FillMissing) should handle missing/nan values

- *MEDIAN*: nans are replaced by the median value of the column
- *COMMON*: nans are replaced by the most common value of the column
- *CONSTANT*: nans are replaced by `fill_val` 
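The three strategies differ only in how the replacement value is computed. The sketch below uses only the standard library. The `Strategy` enum and the `filler` function are hypothetical mirrors of `FillStrategy` (the member values other than `MEDIAN: 1` are assumptions), and missing entries are represented by `None` for simplicity.

```python
from collections import Counter
from enum import Enum, auto
from statistics import median


class Strategy(Enum):
    "Hypothetical mirror of the three `FillStrategy` members."
    MEDIAN = auto()
    COMMON = auto()
    CONSTANT = auto()


def filler(values, strategy, fill_val=0.0):
    "Return the value used to replace missing entries (None) in `values`."
    present = [v for v in values if v is not None]
    if strategy is Strategy.MEDIAN:
        return median(present)
    if strategy is Strategy.COMMON:
        return Counter(present).most_common(1)[0][0]  # most frequent value
    return fill_val  # Strategy.CONSTANT


col = [9.0, None, 9.0, 12.0, 13.0]
print(filler(col, Strategy.MEDIAN))    # 10.5
print(filler(col, Strategy.COMMON))    # 9.0
print(filler(col, Strategy.CONSTANT))  # 0.0
```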

In [None]:
show_doc(Normalize)

<h2 id="Normalize" class="doc_header"><code>class</code> <code>Normalize</code><a href="https://github.com/fastai/fastai/blob/master/fastai/tabular/transform.py#L181" class="source_link" style="float:right">[source]</a><a class="source_link" data-toggle="collapse" data-target="#Normalize-pytest" style="float:right; padding-right:10px">[test]</a></h2>

> <code>Normalize</code>(**`cat_names`**:`StrList`, **`cont_names`**:`StrList`) :: [`TabularProc`](/tabular.transform.html#TabularProc)

<div class="collapse" id="Normalize-pytest"><div class="card card-body pytest_card"><a type="button" data-toggle="collapse" data-target="#Normalize-pytest" class="close" aria-label="Close"><span aria-hidden="true">&times;</span></a><p>Tests found for <code>Normalize</code>:</p><ul><li><code>pytest -sv tests/test_tabular_transform.py::test_normalize</code> <a href="https://github.com/fastai/fastai/blob/master/tests/test_tabular_transform.py#L86" class="source_link" style="float:right">[source]</a></li></ul><p>To run tests please refer to this <a href="/dev/test.html#quick-guide">guide</a>.</p></div></div>

Normalize the continuous variables.  

In [None]:
norm = Normalize(cat_names, cont_names)

In [None]:
show_doc(Normalize.apply_train)

<h4 id="Normalize.apply_train" class="doc_header"><code>apply_train</code><a href="https://github.com/fastai/fastai/blob/master/fastai/tabular/transform.py#L183" class="source_link" style="float:right">[source]</a><a class="source_link" data-toggle="collapse" data-target="#Normalize-apply_train-pytest" style="float:right; padding-right:10px">[test]</a></h4>

> <code>apply_train</code>(**`df`**:`DataFrame`)

<div class="collapse" id="Normalize-apply_train-pytest"><div class="card card-body pytest_card"><a type="button" data-toggle="collapse" data-target="#Normalize-apply_train-pytest" class="close" aria-label="Close"><span aria-hidden="true">&times;</span></a><p>Tests found for <code>apply_train</code>:</p><p>Some other tests where <code>apply_train</code> is used:</p><ul><li><code>pytest -sv tests/test_tabular_transform.py::test_categorify</code> <a href="https://github.com/fastai/fastai/blob/master/tests/test_tabular_transform.py#L6" class="source_link" style="float:right">[source]</a></li><li><code>pytest -sv tests/test_tabular_transform.py::test_fill_missing_leaves_no_na_values</code> <a href="https://github.com/fastai/fastai/blob/master/tests/test_tabular_transform.py#L38" class="source_link" style="float:right">[source]</a></li><li><code>pytest -sv tests/test_tabular_transform.py::test_fill_missing_returns_correct_medians</code> <a href="https://github.com/fastai/fastai/blob/master/tests/test_tabular_transform.py#L52" class="source_link" style="float:right">[source]</a></li></ul><p>To run tests please refer to this <a href="/dev/test.html#quick-guide">guide</a>.</p></div></div>

Compute the means and stds of `self.cont_names` columns to normalize them.  

In [None]:
norm.apply_train(train_df)
train_df[cont_names].head()

| | age | fnlwgt | education-num | capital-gain | capital-loss | hours-per-week |
|---|---|---|---|---|---|---|
| 0 | 0.829039 | -0.812589 | 0.981643 | -0.136271 | 4.416656 | -0.05023 |
| 1 | 0.443977 | 0.355532 | 2.07845 | 1.153121 | -0.22876 | 0.361492 |
| 2 | -0.018098 | -0.856881 | -0.115165 | -0.136271 | -0.22876 | -0.708985 |
| 3 | -0.018098 | -0.713162 | 2.626854 | -0.136271 | -0.22876 | -0.05023 |
| 4 | 0.289952 | -0.976672 | -0.115165 | -0.136271 | -0.22876 | 0.773213 |


In [None]:
show_doc(Normalize.apply_test)

<h4 id="Normalize.apply_test" class="doc_header"><code>apply_test</code><a href="https://github.com/fastai/fastai/blob/master/fastai/tabular/transform.py#L192" class="source_link" style="float:right">[source]</a><a class="source_link" data-toggle="collapse" data-target="#Normalize-apply_test-pytest" style="float:right; padding-right:10px">[test]</a></h4>

> <code>apply_test</code>(**`df`**:`DataFrame`)

<div class="collapse" id="Normalize-apply_test-pytest"><div class="card card-body pytest_card"><a type="button" data-toggle="collapse" data-target="#Normalize-apply_test-pytest" class="close" aria-label="Close"><span aria-hidden="true">&times;</span></a><p>Tests found for <code>apply_test</code>:</p><p>Some other tests where <code>apply_test</code> is used:</p><ul><li><code>pytest -sv tests/test_tabular_transform.py::test_categorify</code> <a href="https://github.com/fastai/fastai/blob/master/tests/test_tabular_transform.py#L6" class="source_link" style="float:right">[source]</a></li><li><code>pytest -sv tests/test_tabular_transform.py::test_fill_missing_leaves_no_na_values</code> <a href="https://github.com/fastai/fastai/blob/master/tests/test_tabular_transform.py#L38" class="source_link" style="float:right">[source]</a></li><li><code>pytest -sv tests/test_tabular_transform.py::test_fill_missing_returns_correct_medians</code> <a href="https://github.com/fastai/fastai/blob/master/tests/test_tabular_transform.py#L52" class="source_link" style="float:right">[source]</a></li></ul><p>To run tests please refer to this <a href="/dev/test.html#quick-guide">guide</a>.</p></div></div>

Normalize `self.cont_names` with the same statistics as in `apply_train`.  
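In pandas terms, this pair of methods amounts to computing each column's mean and standard deviation on the training set and applying those same numbers to both sets. The sketch below assumes exactly that, plus a small epsilon in the denominator to avoid dividing by zero (an assumption of this sketch). It uses a few `age` values from the tables above as toy data, so its numbers will not match the outputs on this page.

```python
import pandas as pd

# Toy data: a few `age` values taken from the tables above.
train = pd.DataFrame({"age": [49, 44, 38, 38, 42]})
valid = pd.DataFrame({"age": [45, 25]})
cols = ["age"]

# Statistics come from the training set only (pandas' .std() uses ddof=1).
means = {c: train[c].mean() for c in cols}
stds = {c: train[c].std() for c in cols}

# The same training statistics are applied to both sets.
for frame in (train, valid):
    for c in cols:
        # The 1e-7 epsilon guards against a zero std (assumption of this sketch).
        frame[c] = (frame[c] - means[c]) / (1e-7 + stds[c])

print(train["age"].round(3).tolist())
print(valid["age"].round(3).tolist())
```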

In [None]:
norm.apply_test(valid_df)
valid_df[cont_names].head()

| | age | fnlwgt | education-num | capital-gain | capital-loss | hours-per-week |
|---|---|---|---|---|---|---|
| 800 | 0.520989 | -0.850066 | -0.115165 | -0.136271 | -0.22876 | -0.05023 |
| 801 | 0.598002 | -0.023706 | -0.115165 | 1.705157 | -0.22876 | 1.596657 |
| 802 | -0.172123 | 1.560596 | -0.115165 | -0.136271 | -0.22876 | -0.214919 |
| 803 | -1.01926 | -1.254793 | -0.115165 | -0.136271 | -0.22876 | 0.361492 |
| 804 | -0.09511 | -0.267403 | -0.115165 | -0.136271 | -0.22876 | -0.05023 |


## Treat date columns

In [None]:
show_doc(add_datepart)

<h4 id="add_datepart" class="doc_header"><code>add_datepart</code><a href="https://github.com/fastai/fastai/blob/master/fastai/tabular/transform.py#L55" class="source_link" style="float:right">[source]</a><a class="source_link" data-toggle="collapse" data-target="#add_datepart-pytest" style="float:right; padding-right:10px">[test]</a></h4>

> <code>add_datepart</code>(**`df`**:`DataFrame`, **`field_name`**:`str`, **`prefix`**:`str`=***`None`***, **`drop`**:`bool`=***`True`***, **`time`**:`bool`=***`False`***)

<div class="collapse" id="add_datepart-pytest"><div class="card card-body pytest_card"><a type="button" data-toggle="collapse" data-target="#add_datepart-pytest" class="close" aria-label="Close"><span aria-hidden="true">&times;</span></a><p>Tests found for <code>add_datepart</code>:</p><ul><li><code>pytest -sv tests/test_tabular_transform.py::test_add_datepart</code> <a href="https://github.com/fastai/fastai/blob/master/tests/test_tabular_transform.py#L102" class="source_link" style="float:right">[source]</a></li></ul><p>To run tests please refer to this <a href="/dev/test.html#quick-guide">guide</a>.</p></div></div>

Helper function that adds columns relevant to a date in the column `field_name` of `df`.  

Will `drop` the column in `df` if the flag is `True`. The `time` flag decides whether we go down to the time parts or stick to the date parts.
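To see where the columns in the output below come from, here is a standard-library sketch that computes the same date parts for a single date string. `date_features` is a hypothetical helper, not part of fastai. It assumes month-first dates (consistent with `col1Month` being 2 below) and treats `Elapsed` as seconds since the Unix epoch in UTC. With those assumptions it reproduces the first row of the output below, minus the `col1` prefix.

```python
from datetime import datetime, timedelta, timezone


def date_features(text, fmt="%m/%d/%Y"):
    "Hypothetical helper: the date parts `add_datepart` adds, for one date string."
    d = datetime.strptime(text, fmt).replace(tzinfo=timezone.utc)
    is_month_end = (d + timedelta(days=1)).month != d.month
    return {
        "Year": d.year,
        "Month": d.month,
        "Week": d.isocalendar()[1],      # ISO week number
        "Day": d.day,
        "Dayofweek": d.weekday(),        # Monday=0 ... Sunday=6
        "Dayofyear": d.timetuple().tm_yday,
        "Is_month_end": is_month_end,
        "Is_month_start": d.day == 1,
        "Is_quarter_end": is_month_end and d.month in (3, 6, 9, 12),
        "Is_quarter_start": d.day == 1 and d.month in (1, 4, 7, 10),
        "Is_year_end": d.month == 12 and d.day == 31,
        "Is_year_start": d.month == 1 and d.day == 1,
        "Elapsed": int(d.timestamp()),   # seconds since the Unix epoch (UTC)
    }


print(date_features("02/03/2017"))
```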

In [None]:
df = pd.DataFrame({'col1': ['02/03/2017', '02/04/2017', '02/05/2017'], 'col2': ['a', 'b', 'a']})
add_datepart(df, 'col1') # inplace
df.head()

| | col2 | col1Year | col1Month | col1Week | col1Day | col1Dayofweek | col1Dayofyear | col1Is_month_end | col1Is_month_start | col1Is_quarter_end | col1Is_quarter_start | col1Is_year_end | col1Is_year_start | col1Elapsed |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | a | 2017 | 2 | 5 | 3 | 4 | 34 | False | False | False | False | False | False | 1486080000 |
| 1 | b | 2017 | 2 | 5 | 4 | 5 | 35 | False | False | False | False | False | False | 1486166400 |
| 2 | a | 2017 | 2 | 5 | 5 | 6 | 36 | False | False | False | False | False | False | 1486252800 |


In [None]:
show_doc(add_cyclic_datepart)

<h4 id="add_cyclic_datepart" class="doc_header"><code>add_cyclic_datepart</code><a href="https://github.com/fastai/fastai/blob/master/fastai/tabular/transform.py#L43" class="source_link" style="float:right">[source]</a><a class="source_link" data-toggle="collapse" data-target="#add_cyclic_datepart-pytest" style="float:right; padding-right:10px">[test]</a></h4>

> <code>add_cyclic_datepart</code>(**`df`**:`DataFrame`, **`field_name`**:`str`, **`prefix`**:`str`=***`None`***, **`drop`**:`bool`=***`True`***, **`time`**:`bool`=***`False`***, **`add_linear`**:`bool`=***`False`***)

<div class="collapse" id="add_cyclic_datepart-pytest"><div class="card card-body pytest_card"><a type="button" data-toggle="collapse" data-target="#add_cyclic_datepart-pytest" class="close" aria-label="Close"><span aria-hidden="true">&times;</span></a><p>No tests found for <code>add_cyclic_datepart</code>. To contribute a test please refer to <a href="/dev/test.html">this guide</a> and <a href="https://forums.fast.ai/t/improving-expanding-functional-tests/32929">this discussion</a>.</p></div></div>

Helper function that adds trigonometric date/time features to a date in the column `field_name` of `df`.  
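The idea of a cyclic feature is to place each date part on the unit circle, so that the end of a period sits next to its start (Sunday next to Monday, December next to January). The sketch below maps each part to the fraction of its period and returns the `(cos, sin)` pair. `cyclic_date_features` is a hypothetical helper, and the periods (7 for the weekday, days in the month, 12 months, days in the year) and the zero-based offsets are assumptions of this sketch. With these choices, it reproduces the first row of the example output below to six decimals.

```python
import calendar
import math
from datetime import datetime


def cyclic(fraction):
    "Map a position in [0, 1) of a cycle onto the unit circle as (cos, sin)."
    angle = 2 * math.pi * fraction
    return math.cos(angle), math.sin(angle)


def cyclic_date_features(text, fmt="%m/%d/%Y"):
    "Hypothetical helper; the period chosen for each feature is an assumption."
    d = datetime.strptime(text, fmt)
    days_in_month = calendar.monthrange(d.year, d.month)[1]
    days_in_year = 366 if calendar.isleap(d.year) else 365
    return {
        "weekday": cyclic(d.weekday() / 7),
        "day_month": cyclic((d.day - 1) / days_in_month),
        "month_year": cyclic((d.month - 1) / 12),
        "day_year": cyclic((d.timetuple().tm_yday - 1) / days_in_year),
    }


for name, (c, s) in cyclic_date_features("02/03/2017").items():
    print(f"{name}: cos={c:.6f} sin={s:.6f}")
```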

In [None]:
df = pd.DataFrame({'col1': ['02/03/2017', '02/04/2017', '02/05/2017'], 'col2': ['a', 'b', 'a']})
df = add_cyclic_datepart(df, 'col1') # returns a dataframe
df.head()

| | col2 | col1weekday_cos | col1weekday_sin | col1day_month_cos | col1day_month_sin | col1month_year_cos | col1month_year_sin | col1day_year_cos | col1day_year_sin |
|---|---|---|---|---|---|---|---|---|---|
| 0 | a | -0.900969 | -0.433884 | 0.900969 | 0.433884 | 0.866025 | 0.5 | 0.842942 | 0.538005 |
| 1 | b | -0.222521 | -0.974928 | 0.781831 | 0.62349 | 0.866025 | 0.5 | 0.833556 | 0.552435 |
| 2 | a | 0.62349 | -0.781831 | 0.62349 | 0.781831 | 0.866025 | 0.5 | 0.823923 | 0.566702 |


## Splitting data into cat and cont

In [None]:
show_doc(cont_cat_split)

<h4 id="cont_cat_split" class="doc_header"><code>cont_cat_split</code><a href="https://github.com/fastai/fastai/blob/master/fastai/tabular/transform.py#L106" class="source_link" style="float:right">[source]</a><a class="source_link" data-toggle="collapse" data-target="#cont_cat_split-pytest" style="float:right; padding-right:10px">[test]</a></h4>

> <code>cont_cat_split</code>(**`df`**, **`max_card`**=***`20`***, **`dep_var`**=***`None`***) → `Tuple`\[`List`\[`T`\], `List`\[`T`\]\]

<div class="collapse" id="cont_cat_split-pytest"><div class="card card-body pytest_card"><a type="button" data-toggle="collapse" data-target="#cont_cat_split-pytest" class="close" aria-label="Close"><span aria-hidden="true">&times;</span></a><p>Tests found for <code>cont_cat_split</code>:</p><ul><li><code>pytest -sv tests/test_tabular_transform.py::test_cont_cat_split</code> <a href="https://github.com/fastai/fastai/blob/master/tests/test_tabular_transform.py#L68" class="source_link" style="float:right">[source]</a></li></ul><p>To run tests please refer to this <a href="/dev/test.html#quick-guide">guide</a>.</p></div></div>

Helper function that returns column names of cont and cat variables from given df.  

Parameters:
- df: A pandas dataframe.
- max_card: Maximum cardinality of a numerical categorical variable.
- dep_var: A dependent variable, left out of both returned lists.

Returns:
- cont_names: A list of names of continuous variables.
- cat_names: A list of names of categorical variables.

In [None]:
df = pd.DataFrame({'col1': [1, 2, 3], 'col2': ['a', 'b', 'a'], 'col3': [0.5, 1.2, 7.5], 'col4': ['ab', 'o', 'o']})
df

| | col1 | col2 | col3 | col4 |
|---|---|---|---|---|
| 0 | 1 | a | 0.5 | ab |
| 1 | 2 | b | 1.2 | o |
| 2 | 3 | a | 7.5 | o |


In [None]:
cont_list, cat_list = cont_cat_split(df=df, max_card=20, dep_var='col4')
cont_list, cat_list

(['col3'], ['col1', 'col2'])
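The exact rule behind `max_card` is not spelled out above. The sketch below assumes one that is consistent with the example: float columns are continuous, integer columns are continuous only when they have more than `max_card` distinct values, the dependent variable is skipped, and everything else is categorical. `split_cont_cat` is a hypothetical name, not fastai's code. Lowering `max_card` shows the integer column `col1` switching sides.

```python
import pandas as pd
from pandas.api.types import is_float_dtype, is_integer_dtype


def split_cont_cat(df, max_card=20, dep_var=None):
    "Hypothetical rule consistent with the `cont_cat_split` example above."
    cont_names, cat_names = [], []
    for col in df.columns:
        if col == dep_var:
            continue  # the dependent variable goes in neither list
        s = df[col]
        # Floats are continuous; integers only if they have many distinct values.
        if is_float_dtype(s) or (is_integer_dtype(s) and s.nunique() > max_card):
            cont_names.append(col)
        else:
            cat_names.append(col)
    return cont_names, cat_names


df = pd.DataFrame({'col1': [1, 2, 3], 'col2': ['a', 'b', 'a'], 'col3': [0.5, 1.2, 7.5], 'col4': ['ab', 'o', 'o']})
print(split_cont_cat(df, max_card=20, dep_var='col4'))  # (['col3'], ['col1', 'col2'])
print(split_cont_cat(df, max_card=2, dep_var='col4'))   # (['col1', 'col3'], ['col2'])
```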

## Undocumented Methods - Methods moved below this line will intentionally be hidden

## New Methods - Please document or move to the undocumented section