# Data input and output

This notebook collects our reference notes on data input and output. pandas can read a variety of file types through its pd.read_* functions. Let's look at the most common formats:

In [1]:
import pandas as pd
import numpy as np

## CSV

### CSV Input

In [2]:
df = pd.read_csv('exemplo.csv')
df

Unnamed: 0,0,1,2,3
0,-0.394813,-0.69463,0.253393,0.60259
1,-2.134217,1.141381,-0.130142,0.690936
2,-0.223754,0.365133,-0.104233,-0.182788
3,0.108066,0.331332,0.053708,-0.280468
4,0.08529,0.740834,-0.596941,0.24395


In [3]:
# sep: column separator used in the CSV; decimal: decimal mark ("," or ".")
# Here sep=";" does not match the file, so each row ends up in a single column
pd.read_csv('exemplo.csv', sep=";", decimal=".")

Unnamed: 0,"0,1,2,3"
0,"-0.3948134131725716,-0.6946298549534513,0.2533..."
1,"-2.134216836009854,1.1413807607461153,-0.13014..."
2,"-0.2237544807445792,0.3651331540271458,-0.1042..."
3,"0.1080655494664449,0.3313317379324355,0.053708..."
4,"0.0852896742374644,0.7408341891386678,-0.59694..."


In [4]:
# With the matching separator (",") the columns are split correctly
pd.read_csv('exemplo.csv', sep=",", decimal=".")

Unnamed: 0,0,1,2,3
0,-0.394813,-0.69463,0.253393,0.60259
1,-2.134217,1.141381,-0.130142,0.690936
2,-0.223754,0.365133,-0.104233,-0.182788
3,0.108066,0.331332,0.053708,-0.280468
4,0.08529,0.740834,-0.596941,0.24395

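A minimal sketch of the effect of `sep`, using a small hypothetical comma-separated text held in memory with `io.StringIO` (it stands in for `exemplo.csv`, whose exact contents are not shown here). A separator that does not occur in the file leaves every line as one text field, as in the `sep=";"` output above:

```python
import io

import pandas as pd

# Hypothetical comma-separated data (a stand-in for exemplo.csv)
text = "a,b\n1.5,2.25\n3.0,4.75\n"

# Wrong separator: ';' never occurs, so each line stays one field and the
# single column is named "a,b"
wrong = pd.read_csv(io.StringIO(text), sep=";")

# Matching separator: two numeric columns
right = pd.read_csv(io.StringIO(text), sep=",")

print(wrong.columns.tolist())  # ['a,b']
print(right.columns.tolist())  # ['a', 'b']
```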

In [5]:
df.info()  # check whether the numbers were read as strings (object), int or float

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 4 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   0       5 non-null      float64
 1   1       5 non-null      float64
 2   2       5 non-null      float64
 3   3       5 non-null      float64
dtypes: float64(4)
memory usage: 288.0 bytes
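The same dtype check shows why `decimal` matters. This is a sketch on hypothetical European-style data (`;` between columns, `,` as decimal mark): with the default `decimal="."` the values stay as text, while `decimal=","` turns them into `float64`:

```python
import io

import pandas as pd

# Hypothetical data: ';' separates columns, ',' is the decimal mark
text = "a;b\n1,5;2,25\n3,0;4,75\n"

# decimal left at the default ".": values such as "1,5" remain text
as_text = pd.read_csv(io.StringIO(text), sep=";")

# decimal="," converts them to floats
as_float = pd.read_csv(io.StringIO(text), sep=";", decimal=",")

print(as_text.dtypes)   # non-numeric (text) columns
print(as_float.dtypes)  # float64 columns
```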


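Parameters (signature as shown by pandas 1.x; some of them, such as squeeze, prefix, mangle_dupe_cols, error_bad_lines and warn_bad_lines, were removed in pandas 2.0)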

pd.read_csv(
    filepath_or_buffer: 'FilePath | ReadCsvBuffer[bytes] | ReadCsvBuffer[str]',
    sep=<no_default>,           # column separator of the CSV
    delimiter=None,
    header='infer',             # row number(s) holding the header (column names)
    names=<no_default>,
    index_col=None,             # column(s) to use as the row index
    usecols=None,
    squeeze=None,
    prefix=<no_default>,
    mangle_dupe_cols=True,
    dtype: 'DtypeArg | None' = None,
    engine: 'CSVEngine | None' = None,
    converters=None,
    true_values=None,
    false_values=None,
    skipinitialspace=False,
    skiprows=None,
    skipfooter=0,
    nrows=None,
    na_values=None,
    keep_default_na=True,
    na_filter=True,
    verbose=False,
    skip_blank_lines=True,
    parse_dates=None,           # columns to parse as dates (stored as datetime instead of string)
    infer_datetime_format=False,
    keep_date_col=False,
    date_parser=None,
    dayfirst=False,
    cache_dates=True,
    iterator=False,
    chunksize=None,
    compression: 'CompressionOptions' = 'infer',
    thousands=None,
    decimal: 'str' = '.',
    lineterminator=None,
    quotechar='"',
    quoting=0,
    doublequote=True,
    escapechar=None,
    comment=None,
    encoding=None,
    encoding_errors: 'str | None' = 'strict',
    dialect=None,
    error_bad_lines=None,
    warn_bad_lines=None,
    on_bad_lines=None,
    delim_whitespace=False,
    low_memory=True,
    memory_map=False,
    float_precision=None,
    storage_options: 'StorageOptions' = None,
)
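A short sketch combining the three annotated parameters `header`, `index_col` and `parse_dates`. The file contents are hypothetical: a title line sits above the real header row, so `header=1` (0-based) picks the second line and the line before it is skipped:

```python
import io

import pandas as pd

# Hypothetical file: a title line before the real header row
text = (
    "Sales report\n"
    "id,date,value\n"
    "10,2023-01-05,3.5\n"
    "11,2023-02-10,4.0\n"
)

df = pd.read_csv(
    io.StringIO(text),
    header=1,              # column names are on the second line (0-based)
    index_col="id",        # use the "id" column as the row index
    parse_dates=["date"],  # store "date" as datetime, not as text
)

print(df.dtypes)
print(df.index.tolist())  # [10, 11]
```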


### CSV output

In [6]:
# index=False: do not write the row index as an extra column
df.to_csv('exemplo.csv', index=False)
df

Unnamed: 0,0,1,2,3
0,-0.394813,-0.69463,0.253393,0.60259
1,-2.134217,1.141381,-0.130142,0.690936
2,-0.223754,0.365133,-0.104233,-0.182788
3,0.108066,0.331332,0.053708,-0.280468
4,0.08529,0.740834,-0.596941,0.24395

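The `Unnamed: 0` columns seen in this notebook's outputs very likely come from a written index being read back as data. A minimal round trip with a small hypothetical DataFrame shows the effect and two ways to avoid it:

```python
import io

import pandas as pd

df = pd.DataFrame({"x": [1, 2], "y": [3.0, 4.0]})

# to_csv() without a path returns the CSV text; by default it includes the index
with_index = df.to_csv()

# Read back as-is: the unnamed index column becomes "Unnamed: 0"
back_default = pd.read_csv(io.StringIO(with_index))

# Option 1: do not write the index at all
back_no_index = pd.read_csv(io.StringIO(df.to_csv(index=False)))

# Option 2: tell read_csv that the first column is the index
back_index_col = pd.read_csv(io.StringIO(with_index), index_col=0)

print(back_default.columns.tolist())    # ['Unnamed: 0', 'x', 'y']
print(back_no_index.columns.tolist())   # ['x', 'y']
print(back_index_col.columns.tolist())  # ['x', 'y']
```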

## Excel

pandas can read and write Excel files. Keep in mind that it only imports the data, not formulas or images; images or macros in the workbook may also cause the method to fail.

In [7]:
# Libraries needed to work with Excel:
# openpyxl reads and writes .xlsx files; xlrd (version 2.0+) only reads legacy .xls files
!pip install xlrd
!pip install openpyxl



### Excel input

In [8]:
pd.read_excel('Exemplo_Excel.xlsx',sheet_name='Planilha2')

Unnamed: 0.1,Unnamed: 0,0,1,2,3
0,0,-0.394813,-0.69463,0.253393,0.60259
1,1,-2.134217,1.141381,-0.130142,0.690936
2,2,-0.223754,0.365133,-0.104233,-0.182788
3,3,0.108066,0.331332,0.053708,-0.280468
4,4,0.08529,0.740834,-0.596941,0.24395


In [9]:
df1 = pd.read_excel('Exemplo_Excel.xlsx',sheet_name='Planilha2')

### Excel output

In [10]:
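# Writing to a file path replaces the whole workbook (other sheets are lost).
# index=False avoids adding yet another "Unnamed" index column on each round trip.
df1.to_excel('Exemplo_Excel.xlsx', sheet_name='Planilha2', index=False)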
df1


Unnamed: 0.1,Unnamed: 0,0,1,2,3
0,0,-0.394813,-0.69463,0.253393,0.60259
1,1,-2.134217,1.141381,-0.130142,0.690936
2,2,-0.223754,0.365133,-0.104233,-0.182788
3,3,0.108066,0.331332,0.053708,-0.280468
4,4,0.08529,0.740834,-0.596941,0.24395
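Because writing to a path replaces the whole workbook, several sheets are usually written through a single `pd.ExcelWriter`. A sketch under these assumptions: `openpyxl` is installed (see the install cell above), and the DataFrames and the temporary file `example.xlsx` are hypothetical. Reading back with `sheet_name=None` returns a dict of DataFrames keyed by sheet name:

```python
import os
import tempfile

import pandas as pd

df_a = pd.DataFrame({"x": [1, 2]})
df_b = pd.DataFrame({"y": [3.0, 4.0]})

# Hypothetical output file in a temporary directory
path = os.path.join(tempfile.mkdtemp(), "example.xlsx")

# One writer, several sheets; index=False keeps "Unnamed: 0" out of the file
with pd.ExcelWriter(path, engine="openpyxl") as writer:
    df_a.to_excel(writer, sheet_name="Planilha1", index=False)
    df_b.to_excel(writer, sheet_name="Planilha2", index=False)

# sheet_name=None reads every sheet into {sheet name: DataFrame}
sheets = pd.read_excel(path, sheet_name=None)
print(list(sheets))  # ['Planilha1', 'Planilha2']
```

To add or replace a single sheet in an existing workbook instead, `pd.ExcelWriter` also accepts `mode="a"` (openpyxl engine) together with `if_sheet_exists`.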


## HTML


### HTML input

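The pandas read_html function reads the tables found in a web page and returns a list of DataFrame objects, one per table. It needs an HTML parser to be installed (lxml, or beautifulsoup4 together with html5lib); without one, the call fails with an ImportError like the one below: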

In [11]:
df = pd.read_html('http://www.fdic.gov/bank/individual/failed/banklist.html')

ImportError: html5lib not found, please install it

In [None]:
df[0]
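A self-contained sketch of `read_html` on a small hypothetical table, so it runs without network access. It assumes a parser (lxml, or beautifulsoup4 plus html5lib) is installed; the literal HTML is wrapped in `io.StringIO` because recent pandas versions deprecate passing raw HTML strings:

```python
import io

import pandas as pd

# Hypothetical page with a single table (a stand-in for the FDIC page)
html = """
<table>
  <thead>
    <tr><th>Bank</th><th>State</th></tr>
  </thead>
  <tbody>
    <tr><td>Bank A</td><td>SP</td></tr>
    <tr><td>Bank B</td><td>RJ</td></tr>
  </tbody>
</table>
"""

# One DataFrame per <table> element found in the document
tables = pd.read_html(io.StringIO(html))

print(len(tables))  # 1
print(tables[0])
```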
