# Preparing Layout

---

This notebook extracts the layout tables from the `SeriesHistoricas_Layout.pdf` file and saves them as .csv files to be read by the `create_db` notebook.

The PDF contains tables that describe the layout of the historical data files: how to read each file and a brief description of each of its fields. Below is an example of a **layout table**.

![](./img/layout_table.png)

Here is a brief description of each column:

- **NOME DO CAMPO / DESCRIÇÃO**: name of the field and a brief description
- **CONTEÚDO**: short comments on its content
- **TIPO E TAMANHO**: kind (N, X or V, which stand for numeric, alphanumeric and implied decimal places, respectively) and size (value length)
- **POS. INIC.**: initial position
- **POS. FINAL.**: final position
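
As a minimal sketch of how the two position columns are meant to be used, the snippet below cuts fields out of a fixed-width record. It assumes the positions are 1-based and inclusive, which is the usual convention for this kind of layout but is not stated in the table itself. The `slice_field` helper, the sample record and the field names are hypothetical and not taken from the B3 layout.

```python
def slice_field(line: str, init: int, end: int) -> str:
    """Return the field occupying 1-based, inclusive positions init..end."""
    if init < 1 or end < init:
        raise ValueError(f"invalid positions: init={init}, end={end}")
    # Python slices are 0-based and end-exclusive, so only the start shifts by one.
    return line[init - 1:end]


# Hypothetical record and layout rows, for illustration only.
record = "01ABCD1234  "
layout = [("TYPE", 1, 2), ("CODE", 3, 6), ("NUMBER", 7, 10)]

fields = {name: slice_field(record, init, end) for name, init, end in layout}
print(fields)
```

The same arithmetic applies when building column specifications for a fixed-width reader: the start offset is `init - 1` and the end offset is `end`.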

---

The code in this notebook collects all layout tables and saves them into .csv files to be read by the `create_db` notebook. This information guides the reading of each B3 historical data file.

In [1]:
# imports
import camelot        # fetch tables from PDF
import numpy as np
import pandas as pd

## Getting layout info

This info is collected from the `SeriesHistoricas_Layout.pdf` file.

In [2]:
# searches tables in all pages
layout_tables = camelot.read_pdf('SeriesHistoricas_Layout.pdf', pages='all', flavor='lattice')

# fetch all tables and adjusts format
tables = []
for n in range(layout_tables.n):
    temp = layout_tables[n].df
#     temp.columns = [str(j).replace('\n','').strip() for j in temp.iloc[0].values]
    temp.columns = ['NAME', 'CONTENT','KIND AND SIZE', 'INIT', 'END']    # set standard column names
    temp.drop(0,axis=0, inplace=True)
    tables.append(temp)

In [3]:
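# concatenating tables into one big table, keeping the HEADER table apart
header = None        # stays None if no HEADER table is found
body_tables = []
for table in tables:
    if table['CONTENT'].str.contains('COTAHIST.AAAA').any():
        header = table               # HEADER table
    else:
        body_tables.append(table)    # if not HEADER, add to big table

# a single concat avoids rebuilding the big table on every iteration
tb = pd.concat(body_tables, axis=0)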

### Removing special characters

We'll remove the line-break character `'\n'` from the text columns.

In [4]:
# header
header['NAME'] = header['NAME'].str.replace('\n','')
header['CONTENT'] = header['CONTENT'].str.replace('\n','')

# big layout table
tb['NAME'] = tb['NAME'].str.replace('\n','')
tb['CONTENT'] = tb['CONTENT'].str.replace('V\n','').str.replace('\n','').str.replace(' ER ',' VER ')

### Resetting index

In [5]:
# dropping old index
header = header.reset_index(drop=True)
tb = tb.reset_index(drop=True)

## Exporting into csv

In [6]:
header.to_csv('layout_header.csv', sep=';', encoding='utf-8-sig')
tb.to_csv('layout_table.csv', sep=';', encoding='utf-8-sig')
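
As a hedged sketch of how the exported files might be read back, the snippet below parses a layout file using only the standard library. It assumes the format produced above: a `;` separator and a leading unnamed column holding the index that pandas writes by default. The `read_layout` helper and the sample rows are hypothetical; the real values come from the PDF.

```python
import csv
import io

# Hypothetical sample mimicking the exported file: ';' separator and a
# leading unnamed index column. The rows are illustrative only.
sample = (
    ";NAME;CONTENT;KIND AND SIZE;INIT;END\n"
    "0;FIELD A;FIXED '01';N(02);01;02\n"
    "1;FIELD B;SOME TEXT;X(04);03;06\n"
)


def read_layout(stream):
    """Parse layout rows and convert INIT/END to integers."""
    rows = []
    for row in csv.DictReader(stream, delimiter=";"):
        row.pop("", None)  # drop the unnamed index column, if present
        row["INIT"] = int(row["INIT"])
        row["END"] = int(row["END"])
        rows.append(row)
    return rows


# For the real file, open it with encoding='utf-8-sig' so the BOM is stripped:
# with open('layout_table.csv', encoding='utf-8-sig', newline='') as f:
#     layout_rows = read_layout(f)
layout_rows = read_layout(io.StringIO(sample))
print(layout_rows[0])
```

With pandas, the equivalent call would be `pd.read_csv('layout_table.csv', sep=';', encoding='utf-8-sig', index_col=0)`.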

**End**