## Dtypes

The data type for a column in a DataFrame or Series is known as the **dtype**. <br />
You can use the ```dtype``` property to grab the type of a specific column. For instance, we can get the dtype of the ```PRECO``` column like this:

In [2]:
import pandas as pd

br_small_caps = pd.read_csv('statusinvest-busca-avancada.csv', delimiter=';')
br_small_caps.PRECO.dtype

dtype('float64')
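The original CSV file is not reproduced here, so the sketch below builds a tiny semicolon-delimited sample in memory with `io.StringIO`. The three rows are copied from outputs later in this notebook, and the choice of columns is only illustrative. This lets you run the same `dtype` lookup anywhere. Attribute access such as `.PRECO` only works for column names that are valid Python identifiers. A name like `P/L` needs bracket access instead.

```python
import io

import pandas as pd

# Tiny in-memory stand-in for 'statusinvest-busca-avancada.csv'
# (illustrative subset of columns; values copied from the notebook outputs).
csv_text = """TICKER;PRECO;P/L
ATOM3;2.03;2.5
BOAS3;7.95;15.22
CEDO3;27.3;3.42
"""

df = pd.read_csv(io.StringIO(csv_text), delimiter=';')

# Attribute access works because PRECO is a valid Python identifier.
print(df.PRECO.dtype)    # float64
# 'P/L' contains a slash, so it has to be accessed with brackets.
print(df['P/L'].dtype)   # float64
```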

Alternatively, the ```dtypes``` property returns the ```dtype``` of *every* column in the DataFrame:

In [3]:
br_small_caps.dtypes

TICKER                     object
PRECO                     float64
DY                        float64
P/L                       float64
P/VP                      float64
P/ATIVOS                  float64
MARGEM BRUTA              float64
MARGEM EBIT               float64
MARG. LIQUIDA             float64
P/EBIT                    float64
EV/EBIT                   float64
DIVIDA LIQUIDA / EBIT     float64
DIV. LIQ. / PATRI.        float64
PSR                       float64
P/CAP. GIRO               float64
P. AT CIR. LIQ.           float64
LIQ. CORRENTE             float64
ROE                       float64
ROA                       float64
ROIC                      float64
PATRIMONIO / ATIVOS       float64
PASSIVOS / ATIVOS         float64
GIRO ATIVOS               float64
CAGR RECEITAS 5 ANOS      float64
CAGR LUCROS 5 ANOS        float64
 LIQUIDEZ MEDIA DIARIA     object
 VPA                      float64
 LPA                      float64
 PEG Ratio                float64
 VALOR DE MERC
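The last five names in the listing above are shifted one character to the right. This suggests the CSV header has a leading space before them, though that is an inference from the output, not something the notebook states. If so, a lookup like `br_small_caps['VPA']` would raise a `KeyError`. Here is a minimal sketch of detecting and stripping such whitespace, using a hypothetical two-column header:

```python
import io

import pandas as pd

# Hypothetical header with a leading space before the second column name.
csv_text = "PRECO; VPA\n2.03;1.65\n7.95;4.34\n"
df = pd.read_csv(io.StringIO(csv_text), delimiter=';')

print(list(df.columns))      # ['PRECO', ' VPA']
print('VPA' in df.columns)   # False: the stored name is ' VPA'

# Strip surrounding whitespace from every column name.
df.columns = df.columns.str.strip()
print(df.dtypes)
```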

One peculiarity to keep in mind is that columns consisting entirely of strings do not get their own type; they are instead given the ```object``` type. <br />
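To see this in isolation, we can build a small frame from a few tickers and prices shown in this notebook. With the default settings of pandas 1.x and 2.x, the ticker column is reported as `object`. Pandas also offers an opt-in dedicated string dtype through `astype('string')`. Newer releases may infer a string dtype by default, so the exact label you see can depend on your version.

```python
import pandas as pd

df = pd.DataFrame({
    'TICKER': ['ATOM3', 'BOAS3', 'CEDO3'],
    'PRECO': [2.03, 7.95, 27.3],
})

# Default dtype for a column of Python strings (object in pandas 1.x/2.x).
print(df.dtypes)

# Opt-in dedicated string dtype (available since pandas 1.0).
tickers = df['TICKER'].astype('string')
print(tickers.dtype)   # string
```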
It's possible to convert a column of one type into another wherever such a conversion makes sense by using the ```astype()``` method. For example, we may convert the ```PRECO``` column from its existing ```float64``` data type into the generic ```object``` data type, the same one pandas uses for string columns:

In [4]:
br_small_caps.PRECO.astype('object')

0      27.55
1       2.03
2      11.29
3       7.95
4       14.5
5       4.24
6      11.15
7       6.25
8       8.71
9      20.93
10     17.99
11     19.31
12      27.3
13     22.35
14     25.26
15      25.6
16      23.0
17      22.0
18     24.72
19     20.18
20     10.02
21     10.05
22      10.8
23     11.44
24    294.77
25     410.0
26     16.26
27     16.15
28      3.99
29      10.0
30     10.15
31     33.35
32      4.09
33     395.0
34     78.55
35     43.33
36     64.37
37     19.02
38     52.97
39       8.9
40       5.5
41      9.85
42     11.73
43     10.89
44     64.89
45      22.1
46      1.61
47     15.39
48      6.24
49      6.12
50     24.18
51     25.66
52     16.44
53     15.62
54     25.51
55      31.9
Name: PRECO, dtype: object
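`astype('object')` only changes the container dtype. The elements are still floats, which is why `14.5` prints without a trailing zero. If you actually need text, `astype(str)` converts every value to a Python string. Here is a quick check on a few of the prices above:

```python
import pandas as pd

preco = pd.Series([27.55, 2.03, 14.5], name='PRECO')

# Same numeric values, now stored as generic Python objects.
as_object = preco.astype('object')
# Every element converted to its string representation.
as_text = preco.astype(str)

print(as_object.dtype, type(as_object[0]))
print(as_text.dtype, type(as_text[0]))
```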

A DataFrame or Series index has its own dtype, too:

In [5]:
br_small_caps.index.dtype

dtype('int64')
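The `int64` above comes from the default `RangeIndex` that `read_csv` assigns when no index column is given. If you promote a column to the index, the index takes on that column's dtype instead. The sketch below does this with the tickers via `set_index('TICKER')`, which is an illustrative choice, not something the notebook does:

```python
import pandas as pd

df = pd.DataFrame({
    'TICKER': ['ATOM3', 'BOAS3', 'CEDO3'],
    'PRECO': [2.03, 7.95, 27.3],
})

print(type(df.index).__name__, df.index.dtype)   # RangeIndex int64

# Use the tickers as row labels; the index now has the TICKER column's dtype.
by_ticker = df.set_index('TICKER')
print(by_ticker.index.dtype)
print(by_ticker.loc['BOAS3', 'PRECO'])   # 7.95
```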

## Missing Data

Entries with missing values are given the value ```NaN```, short for "Not a Number". For technical reasons, these ```NaN``` values are always of the ```float64``` dtype. <br />
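One practical consequence is that `NaN` is a floating-point value, so an integer column with even one missing entry is upcast to `float64`. The sketch below shows this upcast on made-up counts. It also shows pandas' opt-in nullable integer dtype `Int64`, which keeps the integers and marks the gap as `pd.NA` instead:

```python
import numpy as np
import pandas as pd

# Hypothetical integer data; the second entry is missing.
counts = pd.Series([3, np.nan, 7])
print(counts.dtype)               # float64: NaN forces a float column

# Nullable integer dtype keeps integers and marks the gap as <NA>.
nullable = pd.Series([3, None, 7], dtype='Int64')
print(nullable.dtype)             # Int64
print(nullable.isna().tolist())   # [False, True, False]
```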
Pandas provides some methods specific to missing data. To select ```NaN``` entries you can use ```pd.isnull()``` (or its companion, ```pd.notnull()```). It is used like this:

In [8]:
br_small_caps[pd.isnull(br_small_caps.DY)]

| | TICKER | PRECO | DY | P/L | P/VP | P/ATIVOS | MARGEM BRUTA | MARGEM EBIT | MARG. LIQUIDA | P/EBIT | ... | PATRIMONIO / ATIVOS | PASSIVOS / ATIVOS | GIRO ATIVOS | CAGR RECEITAS 5 ANOS | CAGR LUCROS 5 ANOS | LIQUIDEZ MEDIA DIARIA | VPA | LPA | PEG Ratio | VALOR DE MERCADO |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | ATOM3 | 2.03 | NaN | 2.5 | 1.23 | 0.91 | 91.76 | -3.39 | 84.24 | -62.11 | ... | 0.74 | 0.21 | 0.43 | 31.18 | 22.25 | 13.557.37 | 1.65 | 0.81 | 0.01 | 48.323.942.94 |
| 3 | BOAS3 | 7.95 | NaN | 15.22 | 1.83 | 1.7 | 56.53 | 19.25 | 32.39 | 25.6 | ... | 0.93 | 0.07 | 0.34 | 8.81 | 74.34 | 53.310.276.50 | 4.34 | 0.52 | 5.59 | 4.212.250.617.75 |
| 12 | CEDO3 | 27.3 | NaN | 3.42 | 1.28 | 0.31 | 30.85 | 17.17 | 7.54 | 1.5 | ... | 0.25 | 0.75 | 1.21 | 11.42 | 10.4 | 13.962.73 | 21.39 | 7.97 | 0.0 | 251.750.164.80 |
| 13 | CEDO4 | 22.35 | NaN | 2.8 | 1.05 | 0.26 | 30.85 | 17.17 | 7.54 | 1.23 | ... | 0.25 | 0.75 | 1.21 | 11.42 | 10.4 | 22.541.91 | 21.39 | 7.97 | 0.0 | 251.750.164.80 |
| 24 | EEEL3 | 294.77 | NaN | 4.7 | 1.41 | 0.48 | 61.19 | 54.12 | 32.4 | 2.81 | ... | 0.34 | 0.66 | 0.32 | 12.36 | 28.25 | NaN | 208.5 | 62.71 | -0.15 | 2.840.511.499.76 |
| 25 | EEEL4 | 410.0 | NaN | 6.54 | 1.97 | 0.67 | 61.19 | 54.12 | 32.4 | 3.91 | ... | 0.34 | 0.66 | 0.32 | 12.36 | 28.25 | NaN | 208.5 | 62.71 | -0.21 | 2.840.511.499.76 |
| 32 | LJQQ3 | 4.09 | NaN | 8.31 | 1.29 | 0.24 | 34.76 | 7.07 | 3.96 | 4.65 | ... | 0.19 | 0.81 | 0.74 | 15.23 | 37.63 | 12.803.060.35 | 3.18 | 0.49 | -0.02 | 797.847.387.99 |
| 36 | NAFG3 | 64.37 | NaN | 11.8 | 2.76 | 1.15 | 41.15 | 16.49 | 9.34 | 6.68 | ... | 0.42 | 0.58 | 1.05 | 7.87 | 35.19 | NaN | 23.3 | 5.46 | -1.14 | 508.315.292.42 |
| 37 | NAFG4 | 19.02 | NaN | 3.49 | 0.82 | 0.34 | 41.15 | 16.49 | 9.34 | 1.97 | ... | 0.42 | 0.58 | 1.05 | 7.87 | 35.19 | NaN | 23.3 | 5.46 | -0.34 | 508.315.292.42 |
| 40 | RANI4 | 5.5 | NaN | 3.87 | 1.03 | 0.38 | 41.24 | 31.72 | 21.71 | 2.65 | ... | 0.37 | 0.63 | 0.45 | 11.31 | 157.54 | NaN | 5.34 | 1.42 | -1.71 | 2.134.486.279.10 |
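The same boolean-mask idea also works without the CSV. The sketch below uses the DY values of the first four rows from the outputs in this notebook: AGRO3 at 11.66, ATOM3 missing, BLAU3 at 2.48 and BOAS3 missing. It shows `pd.isnull()`, its complement `pd.notnull()`, and how summing a mask counts the missing entries:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'TICKER': ['AGRO3', 'ATOM3', 'BLAU3', 'BOAS3'],
    'DY': [11.66, np.nan, 2.48, np.nan],
})

missing = pd.isnull(df.DY)    # True where DY is NaN
present = pd.notnull(df.DY)   # exact complement of `missing`

print(df[missing].TICKER.tolist())   # ['ATOM3', 'BOAS3']
print(df[present].TICKER.tolist())   # ['AGRO3', 'BLAU3']
print(missing.sum())                 # 2 (each True counts as 1)
```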


Replacing missing values is a common operation. Pandas provides a handy method for this: ```fillna()```, which supports a few different strategies for filling in such data. For example, we can simply replace each ```NaN``` with ```0```:

In [9]:
br_small_caps.DY.fillna(0)

0     11.66
1      0.00
2      2.48
3      0.00
4      9.10
5      2.06
6      1.32
7      0.43
8      2.95
9      9.07
10    10.55
11    10.81
12     0.00
13     0.00
14    10.17
15    10.04
16     6.06
17     6.97
18     5.21
19     5.37
20     5.10
21     5.37
22     4.27
23     4.44
24     0.00
25     0.00
26     3.63
27     4.02
28     9.51
29     3.17
30    10.03
31    23.96
32     0.00
33     6.50
34     1.40
35     2.79
36     0.00
37     0.00
38     2.33
39     8.30
40     0.00
41     5.91
42     4.96
43     6.84
44     3.28
45     0.00
46     0.00
47     6.15
48     1.41
49     2.49
50     5.34
51     2.62
52    10.53
53    18.89
54     2.60
55     2.29
Name: DY, dtype: float64
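A constant is only one option. `fillna()` also accepts any scalar computed from the data, such as the column mean. On a DataFrame, it also accepts a dict that maps column names to fill values, so each column gets its own default. Whether a mean is a sensible stand-in for a missing dividend yield is a modelling choice. The values below are simply the first four DY entries from the output above.

```python
import numpy as np
import pandas as pd

dy = pd.Series([11.66, np.nan, 2.48, np.nan], name='DY')

# Constant fill, as in the cell above.
print(dy.fillna(0).tolist())   # [11.66, 0.0, 2.48, 0.0]

# Fill with a statistic of the column: the mean of the non-null values.
mean_dy = dy.mean()            # NaN entries are skipped by default
print(dy.fillna(mean_dy).tolist())

# On a DataFrame, a dict chooses the fill value per column.
df = pd.DataFrame({'TICKER': ['AGRO3', 'ATOM3', 'BLAU3', 'BOAS3'], 'DY': dy})
filled = df.fillna({'DY': 0})
print(filled)
```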

Or we could fill each missing value with the first non-null value that appears after it in the same column. This is known as the *backfill* strategy. <br />
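In current pandas the backfill strategy is spelled `.bfill()`. The older `fillna(method='bfill')` form was deprecated in pandas 2.1, and `.ffill()` is the forward-filling counterpart. The small sketch below uses illustrative values and also shows one detail worth knowing: a trailing `NaN` has no later value to copy, so it stays missing.

```python
import numpy as np
import pandas as pd

# Illustrative series with gaps, including one at the very end.
s = pd.Series([np.nan, 2.48, np.nan, np.nan, 0.43, np.nan])

# Each NaN takes the next non-null value below it.
filled = s.bfill()
print(filled.tolist())   # [2.48, 2.48, 0.43, 0.43, 0.43, nan]
```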
Alternatively, we may have a non-null value that we would like to replace. For example, let's say we want to replace the ticker ```AGRO3``` with ```TEST3```. One way we can do this is by using the ```replace()``` method:

In [10]:
br_small_caps.TICKER.replace('AGRO3', 'TEST3')

0      TEST3
1      ATOM3
2      BLAU3
3      BOAS3
4     BRBI11
5      BRIT3
6      CAMB3
7      CAMB4
8      CAML3
9      CEBR3
10     CEBR5
11     CEBR6
12     CEDO3
13     CEDO4
14     CGRA3
15     CGRA4
16     CSRN3
17     CSRN5
18     CSRN6
19     CSUD3
20     DEXP3
21     DEXP4
22     EALT3
23     EALT4
24     EEEL3
25     EEEL4
26     EUCA3
27     EUCA4
28     JHSF3
29     JSLG3
30     KEPL3
31     LEVE3
32     LJQQ3
33     MOAR3
34     MTSA3
35     MTSA4
36     NAFG3
37     NAFG4
38     NEMO5
39     RANI3
40     RANI4
41     RAPT3
42     RAPT4
43     ROMI3
44     RSUL4
45     SCAR3
46     SHOW3
47     SOJA3
48     SOMA3
49     TECN3
50     TGMA3
51     TUPY3
52     VLID3
53     VULC3
54     WLMM3
55     WLMM4
Name: TICKER, dtype: object
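`replace()` returns a new Series and leaves `br_small_caps` itself unchanged unless you assign the result back. It also accepts a dict, which performs several substitutions in one call. In the sketch below, the second replacement value (`TEST4`) is hypothetical and only for illustration.

```python
import pandas as pd

tickers = pd.Series(['AGRO3', 'ATOM3', 'BLAU3'], name='TICKER')

# Single substitution, as in the cell above.
renamed = tickers.replace('AGRO3', 'TEST3')

# Several substitutions at once via a dict (TEST4 is a hypothetical value).
renamed_many = tickers.replace({'AGRO3': 'TEST3', 'ATOM3': 'TEST4'})

print(renamed.tolist())        # ['TEST3', 'ATOM3', 'BLAU3']
print(renamed_many.tolist())   # ['TEST3', 'TEST4', 'BLAU3']
print(tickers.tolist())        # original unchanged: ['AGRO3', 'ATOM3', 'BLAU3']
```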