# Chapter 5 – Developing web crawlers (hands-on)

## Lesson 5.1. Collecting market data

### Example 1 - Collecting data from the Yahoo! Finance API

In the following example we request the price and volume series (OHLCV) for the stock MBLY3.

Install the yfinance library

In [5]:
!pip install yfinance



Import the yfinance library

In [9]:
import yfinance as yf

Define the instrument to be requested

To do so, first check the instrument's ticker at https://finance.yahoo.com/

In [10]:
MBLY3 = yf.Ticker("MBLY3.SA")

In [11]:
df_hist = MBLY3.history(period="5y", interval="1d")

In [12]:
df_hist.head()

Date,Open,High,Low,Close,Volume,Dividends,Stock Splits
2021-02-09 00:00:00-03:00,25.0,26.4,22.709999,26.4,2560200,0.0,0.0
2021-02-10 00:00:00-03:00,25.870001,26.120001,23.700001,23.809999,1907900,0.0,0.0
2021-02-11 00:00:00-03:00,23.809999,24.34,22.559999,22.559999,1219100,0.0,0.0
2021-02-12 00:00:00-03:00,22.549999,22.549999,21.309999,21.549999,1050700,0.0,0.0
2021-02-17 00:00:00-03:00,21.549999,21.549999,20.66,21.0,764700,0.0,0.0


In [13]:
df_hist.tail()

Date,Open,High,Low,Close,Volume,Dividends,Stock Splits
2023-05-31 00:00:00-03:00,2.77,2.89,2.74,2.81,1049100,0.0,0.0
2023-06-01 00:00:00-03:00,2.83,2.84,2.68,2.79,891900,0.0,0.0
2023-06-02 00:00:00-03:00,2.8,2.86,2.67,2.77,1424300,0.0,0.0
2023-06-05 00:00:00-03:00,2.79,2.93,2.75,2.93,1042400,0.0,0.0
2023-06-06 00:00:00-03:00,2.98,3.02,2.95,2.96,85600,0.0,0.0


### Example 2 - Collecting market data from other providers via API

#### Example 2.1 - Marketstack (https://marketstack.com/)

In [14]:
key = "21341a8334623ed5dc650354da82697f"

In [15]:
import requests

url = f"http://api.marketstack.com/v1/eod?access_key={key}&symbols=MBLY3.BVMF"

r = requests.get(url)
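
The request above interpolates the API key straight into the URL. As a minimal sketch, the query string can instead be assembled from a dictionary and the key read from an environment variable, so that it does not end up in the notebook. The variable name `MARKETSTACK_KEY`, the placeholder value and the helper `build_eod_url` are assumptions made for illustration; only the endpoint and the `access_key`/`symbols` parameters come from the cell above.

```python
import os
from urllib.parse import urlencode

# Endpoint taken from the cell above
MARKETSTACK_EOD = "http://api.marketstack.com/v1/eod"


def build_eod_url(access_key, symbols, base_url=MARKETSTACK_EOD):
    """Return the end-of-day URL with URL-encoded query parameters."""
    query = urlencode({"access_key": access_key, "symbols": symbols})
    return f"{base_url}?{query}"


# MARKETSTACK_KEY is a hypothetical environment variable name;
# "YOUR_KEY" is only a placeholder used when the variable is not set.
api_key = os.environ.get("MARKETSTACK_KEY", "YOUR_KEY")
eod_url = build_eod_url(api_key, "MBLY3.BVMF")
```

The resulting `eod_url` can be passed to `requests.get` exactly as before.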

In [16]:
r.json()

{'pagination': {'limit': 100, 'offset': 0, 'count': 100, 'total': 238},
 'data': [{'open': 2.79,
   'high': 2.93,
   'low': 2.75,
   'close': 2.93,
   'volume': 1034700.0,
   'adj_high': None,
   'adj_low': None,
   'adj_close': 2.93,
   'adj_open': None,
   'adj_volume': None,
   'split_factor': 1.0,
   'dividend': 0.0,
   'symbol': 'MBLY3.BVMF',
   'exchange': 'BVMF',
   'date': '2023-06-05T00:00:00+0000'},
  {'open': 2.8,
   'high': 2.86,
   'low': 2.67,
   'close': 2.77,
   'volume': 1424300.0,
   'adj_high': None,
   'adj_low': None,
   'adj_close': 2.77,
   'adj_open': None,
   'adj_volume': None,
   'split_factor': 1.0,
   'dividend': 0.0,
   'symbol': 'MBLY3.BVMF',
   'exchange': 'BVMF',
   'date': '2023-06-02T00:00:00+0000'},
  {'open': 2.83,
   'high': 2.84,
   'low': 2.68,
   'close': 2.79,
   'volume': 891900.0,
   'adj_high': None,
   'adj_low': None,
   'adj_close': 2.79,
   'adj_open': None,
   'adj_volume': None,
   'split_factor': 1.0,
   'dividend': 0.0,
   'symbol': 

In [17]:
import pandas as pd

In [18]:
df = pd.DataFrame(r.json()["data"])
df

,open,high,low,close,volume,adj_high,adj_low,adj_close,adj_open,adj_volume,split_factor,dividend,symbol,exchange,date
0,2.79,2.93,2.75,2.93,1034700.0,,,2.93,,,1.0,0.0,MBLY3.BVMF,BVMF,2023-06-05T00:00:00+0000
1,2.80,2.86,2.67,2.77,1424300.0,,,2.77,,,1.0,0.0,MBLY3.BVMF,BVMF,2023-06-02T00:00:00+0000
2,2.83,2.84,2.68,2.79,891900.0,,,2.79,,,1.0,0.0,MBLY3.BVMF,BVMF,2023-06-01T00:00:00+0000
3,2.77,2.89,2.74,2.81,1045500.0,,,2.81,,,1.0,0.0,MBLY3.BVMF,BVMF,2023-05-31T00:00:00+0000
4,2.73,2.83,2.72,2.76,1122000.0,,,2.76,,,1.0,0.0,MBLY3.BVMF,BVMF,2023-05-30T00:00:00+0000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
95,3.05,3.05,2.95,3.00,773300.0,,,3.00,,,1.0,0.0,MBLY3.BVMF,BVMF,2022-12-28T00:00:00+0000
96,3.01,3.08,2.90,3.03,1400200.0,,,3.03,,,1.0,0.0,MBLY3.BVMF,BVMF,2022-12-27T00:00:00+0000
97,3.04,3.04,2.88,3.00,330500.0,,,3.00,,,1.0,0.0,MBLY3.BVMF,BVMF,2022-12-26T00:00:00+0000
98,3.06,3.06,2.92,2.99,546600.0,,,2.99,,,1.0,0.0,MBLY3.BVMF,BVMF,2022-12-23T00:00:00+0000
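
The `pagination` field of the response reports `limit: 100` and `total: 238`, so the DataFrame above holds at most the first 100 records. A minimal sketch of how the offsets of the missing pages could be computed follows. It assumes the API accepts `limit` and `offset` query parameters (suggested by the field names in the response, but not exercised in this notebook), and `remaining_offsets` is a hypothetical helper.

```python
def remaining_offsets(pagination):
    """Return the offsets of the pages that have not been fetched yet."""
    limit = pagination["limit"]
    total = pagination["total"]
    first_missing = pagination["offset"] + pagination["count"]
    return list(range(first_missing, total, limit))


# Pagination block copied from the response shown earlier
pagination = {"limit": 100, "offset": 0, "count": 100, "total": 238}
print(remaining_offsets(pagination))  # [100, 200]
```

Under that assumption, each offset would be sent in a new request and the `data` lists concatenated before building the DataFrame.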


#### Example 2.2 - Alpha Vantage (https://www.alphavantage.co/)

In [19]:
key = "IBDFDC9VD3MJ8FT6"

In [20]:
url = f"https://www.alphavantage.co/query?function=SYMBOL_SEARCH&keywords=mbly&apikey={key}"

In [21]:
r = requests.get(url)

In [22]:
r.json()

{'bestMatches': [{'1. symbol': 'MBLY',
   '2. name': 'Mobileye Global Inc - Class A',
   '3. type': 'Equity',
   '4. region': 'United States',
   '5. marketOpen': '09:30',
   '6. marketClose': '16:00',
   '7. timezone': 'UTC-04',
   '8. currency': 'USD',
   '9. matchScore': '1.0000'},
  {'1. symbol': 'MBLY3.SAO',
   '2. name': 'Mobly S.A',
   '3. type': 'Equity',
   '4. region': 'Brazil/Sao Paolo',
   '5. marketOpen': '10:00',
   '6. marketClose': '17:30',
   '7. timezone': 'UTC-03',
   '8. currency': 'BRL',
   '9. matchScore': '0.6667'}]}

In [23]:
url = f"https://www.alphavantage.co/query?function=TIME_SERIES_DAILY_ADJUSTED&symbol=MBLY3.SAO&apikey={key}"

In [24]:
r = requests.get(url)

In [25]:
r.json()

{'Meta Data': {'1. Information': 'Daily Time Series with Splits and Dividend Events',
  '2. Symbol': 'MBLY3.SAO',
  '3. Last Refreshed': '2023-06-05',
  '4. Output Size': 'Compact',
  '5. Time Zone': 'US/Eastern'},
 'Time Series (Daily)': {'2023-06-05': {'1. open': '2.79',
   '2. high': '2.93',
   '3. low': '2.75',
   '4. close': '2.93',
   '5. adjusted close': '2.93',
   '6. volume': '1034700',
   '7. dividend amount': '0.0000',
   '8. split coefficient': '1.0'},
  '2023-06-02': {'1. open': '2.8',
   '2. high': '2.86',
   '3. low': '2.67',
   '4. close': '2.77',
   '5. adjusted close': '2.77',
   '6. volume': '1424300',
   '7. dividend amount': '0.0000',
   '8. split coefficient': '1.0'},
  '2023-06-01': {'1. open': '2.83',
   '2. high': '2.84',
   '3. low': '2.68',
   '4. close': '2.79',
   '5. adjusted close': '2.79',
   '6. volume': '891900',
   '7. dividend amount': '0.0000',
   '8. split coefficient': '1.0'},
  '2023-05-31': {'1. open': '2.77',
   '2. high': '2.89',
   '3. low': 

In [26]:
pd.DataFrame(r.json()['Time Series (Daily)'])

,2023-06-05,2023-06-02,2023-06-01,2023-05-31,2023-05-30,2023-05-29,2023-05-26,2023-05-25,2023-05-24,2023-05-23,...,2023-01-25,2023-01-24,2023-01-23,2023-01-20,2023-01-19,2023-01-18,2023-01-17,2023-01-16,2023-01-13,2023-01-12
1. open,2.79,2.8,2.83,2.77,2.73,2.65,2.55,2.39,2.14,2.17,...,2.9,2.83,2.87,3.15,3.15,3.23,3.29,3.36,3.35,3.22
2. high,2.93,2.86,2.84,2.89,2.83,2.78,2.66,2.54,2.33,2.2,...,2.96,2.91,2.87,3.15,3.18,3.23,3.29,3.38,3.42,3.36
3. low,2.75,2.67,2.68,2.74,2.72,2.61,2.46,2.37,2.12,2.11,...,2.74,2.77,2.8,2.85,2.98,3.06,3.07,3.24,3.22,3.22
4. close,2.93,2.77,2.79,2.81,2.76,2.72,2.63,2.52,2.33,2.14,...,2.85,2.89,2.8,2.85,3.16,3.17,3.19,3.27,3.36,3.35
5. adjusted close,2.93,2.77,2.79,2.81,2.76,2.72,2.63,2.52,2.33,2.14,...,2.85,2.89,2.8,2.85,3.16,3.17,3.19,3.27,3.36,3.35
6. volume,1034700.0,1424300.0,891900.0,1049100.0,1133200.0,1157300.0,1045400.0,1110500.0,1040600.0,310200.0,...,1697700.0,818000.0,559200.0,2131100.0,1197900.0,419400.0,1233700.0,403500.0,1262700.0,1277800.0
7. dividend amount,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8. split coefficient,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0


In [27]:
pd.DataFrame(r.json()['Time Series (Daily)']).T

,1. open,2. high,3. low,4. close,5. adjusted close,6. volume,7. dividend amount,8. split coefficient
2023-06-05,2.79,2.93,2.75,2.93,2.93,1034700,0.0000,1.0
2023-06-02,2.8,2.86,2.67,2.77,2.77,1424300,0.0000,1.0
2023-06-01,2.83,2.84,2.68,2.79,2.79,891900,0.0000,1.0
2023-05-31,2.77,2.89,2.74,2.81,2.81,1049100,0.0000,1.0
2023-05-30,2.73,2.83,2.72,2.76,2.76,1133200,0.0000,1.0
...,...,...,...,...,...,...,...,...
2023-01-18,3.23,3.23,3.06,3.17,3.17,419400,0.0000,1.0
2023-01-17,3.29,3.29,3.07,3.19,3.19,1233700,0.0000,1.0
2023-01-16,3.36,3.38,3.24,3.27,3.27,403500,0.0000,1.0
2023-01-13,3.35,3.42,3.22,3.36,3.36,1262700,0.0000,1.0
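
In the JSON response every value is a string and every field name carries a numeric prefix such as `1. open`, so the transposed DataFrame above is not yet numeric. A small sketch of one way to tidy the series before handing it to pandas is shown below; `clean_daily_series` is a hypothetical helper, and the two sample entries are copied from the response shown earlier.

```python
def clean_daily_series(time_series):
    """Convert the 'Time Series (Daily)' mapping into typed records.

    Keys such as "1. open" become "open", values become floats,
    and records are sorted from the oldest to the newest date.
    """
    records = []
    for day, fields in time_series.items():
        record = {"date": day}
        for name, value in fields.items():
            # "5. adjusted close" -> "adjusted_close"
            clean_name = name.split(". ", 1)[1].replace(" ", "_")
            record[clean_name] = float(value)
        records.append(record)
    # ISO dates (YYYY-MM-DD) sort chronologically as plain strings
    return sorted(records, key=lambda rec: rec["date"])


sample = {
    "2023-06-05": {"1. open": "2.79", "2. high": "2.93", "3. low": "2.75",
                   "4. close": "2.93", "5. adjusted close": "2.93",
                   "6. volume": "1034700", "7. dividend amount": "0.0000",
                   "8. split coefficient": "1.0"},
    "2023-06-02": {"1. open": "2.8", "2. high": "2.86", "3. low": "2.67",
                   "4. close": "2.77", "5. adjusted close": "2.77",
                   "6. volume": "1424300", "7. dividend amount": "0.0000",
                   "8. split coefficient": "1.0"},
}
records = clean_daily_series(sample)
print(records[0]["date"], records[0]["close"])  # 2023-06-02 2.77
```

The list of records can then be passed to `pd.DataFrame(records)`.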


#### Example 2.3 - EOD (https://eodhistoricaldata.com/)

In [28]:
key = "647f760e9fe241.25045800"

In [29]:
url = f"https://eodhistoricaldata.com/api/exchanges-list/?api_token={key}&fmt=json"

In [30]:
r = requests.get(url)

In [31]:
r.json()

[{'Name': 'USA Stocks',
  'Code': 'US',
  'OperatingMIC': 'XNAS, XNYS',
  'Country': 'USA',
  'Currency': 'USD',
  'CountryISO2': 'US',
  'CountryISO3': 'USA'},
 {'Name': 'London Exchange',
  'Code': 'LSE',
  'OperatingMIC': 'XLON',
  'Country': 'UK',
  'Currency': 'GBP',
  'CountryISO2': 'GB',
  'CountryISO3': 'GBR'},
 {'Name': 'TSX Venture Exchange',
  'Code': 'V',
  'OperatingMIC': 'XTSX',
  'Country': 'Canada',
  'Currency': 'CAD',
  'CountryISO2': 'CA',
  'CountryISO3': 'CAN'},
 {'Name': 'NEO Exchange',
  'Code': 'NEO',
  'OperatingMIC': 'NEOE',
  'Country': 'Canada',
  'Currency': 'CAD',
  'CountryISO2': 'CA',
  'CountryISO3': 'CAN'},
 {'Name': 'Toronto Exchange',
  'Code': 'TO',
  'OperatingMIC': 'XTSE',
  'Country': 'Canada',
  'Currency': 'CAD',
  'CountryISO2': 'CA',
  'CountryISO3': 'CAN'},
 {'Name': 'Berlin Exchange',
  'Code': 'BE',
  'OperatingMIC': 'XBER',
  'Country': 'Germany',
  'Currency': 'EUR',
  'CountryISO2': 'DE',
  'CountryISO3': 'DEU'},
 {'Name': 'Hamburg Exch

In [32]:
EXCHANGE_CODE = "SA"

In [33]:
url = f"https://eodhistoricaldata.com/api/exchange-symbol-list/{EXCHANGE_CODE}?api_token={key}&fmt=json"

In [34]:
r = requests.get(url)

In [35]:
r.json()

[{'Code': 'A1AP34',
  'Name': 'Advance Auto Parts Inc.',
  'Country': 'Brazil',
  'Exchange': 'SA',
  'Currency': 'BRL',
  'Type': 'Common Stock',
  'Isin': None},
 {'Code': 'A1CR34',
  'Name': 'Amcor plc',
  'Country': 'Brazil',
  'Exchange': 'SA',
  'Currency': 'BRL',
  'Type': 'Common Stock',
  'Isin': None},
 {'Code': 'A1GI34',
  'Name': 'Agilent Technologies Inc.',
  'Country': 'Brazil',
  'Exchange': 'SA',
  'Currency': 'BRL',
  'Type': 'Common Stock',
  'Isin': None},
 {'Code': 'A1IV34',
  'Name': 'Apartment Investment and Management Company',
  'Country': 'Brazil',
  'Exchange': 'SA',
  'Currency': 'BRL',
  'Type': 'Common Stock',
  'Isin': None},
 {'Code': 'A1LB34',
  'Name': 'Albemarle Corporation',
  'Country': 'Brazil',
  'Exchange': 'SA',
  'Currency': 'BRL',
  'Type': 'Common Stock',
  'Isin': None},
 {'Code': 'A1LG34',
  'Name': 'Align Technology Inc.',
  'Country': 'Brazil',
  'Exchange': 'SA',
  'Currency': 'BRL',
  'Type': 'Common Stock',
  'Isin': None},
 {'Code': 'A

In [36]:
# Search the symbol list for Mobly (case-insensitive) and keep its ticker code
for dict_company in r.json():
    if "mobly" in dict_company["Name"].lower():
        code = dict_company["Code"]
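
The loop above keeps the last company whose name contains "mobly" and leaves `code` undefined when nothing matches. A small sketch of a more defensive lookup is shown below. `find_codes` is a hypothetical helper, the handling of a missing `Name` is a precaution rather than something observed in the output, and the Mobly entry in the sample list is illustrative (its name and code follow the Alpha Vantage search result, not the truncated EOD output).

```python
def find_codes(companies, keyword):
    """Return the codes of all companies whose name contains keyword."""
    keyword = keyword.lower()
    return [
        company["Code"]
        for company in companies
        # Fall back to an empty string if Name is missing or None
        if keyword in (company.get("Name") or "").lower()
    ]


companies = [
    {"Code": "A1AP34", "Name": "Advance Auto Parts Inc."},
    {"Code": "A1CR34", "Name": "Amcor plc"},
    {"Code": "MBLY3", "Name": "Mobly S.A"},  # illustrative entry
]

matches = find_codes(companies, "mobly")
if len(matches) != 1:
    raise ValueError(f"expected exactly one match, got {matches}")
code = matches[0]
```

Failing loudly on zero or several matches avoids building the next URL with a stale or wrong ticker.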

In [37]:
url = f"https://eodhistoricaldata.com/api/eod/{code}.{EXCHANGE_CODE}?api_token={key}&fmt=json"

In [38]:
r = requests.get(url)

In [39]:
r.json()

[{'date': '2022-06-06',
  'open': 3.49,
  'high': 3.49,
  'low': 3.25,
  'close': 3.28,
  'adjusted_close': 3.28,
  'volume': 1053500},
 {'date': '2022-06-07',
  'open': 3.24,
  'high': 3.29,
  'low': 3.09,
  'close': 3.29,
  'adjusted_close': 3.29,
  'volume': 1931000},
 {'date': '2022-06-08',
  'open': 3.25,
  'high': 3.31,
  'low': 3.14,
  'close': 3.15,
  'adjusted_close': 3.15,
  'volume': 1488600},
 {'date': '2022-06-09',
  'open': 3.15,
  'high': 3.16,
  'low': 3.02,
  'close': 3.05,
  'adjusted_close': 3.05,
  'volume': 1280500},
 {'date': '2022-06-10',
  'open': 3.03,
  'high': 3.03,
  'low': 2.8,
  'close': 2.86,
  'adjusted_close': 2.86,
  'volume': 1445800},
 {'date': '2022-06-13',
  'open': 2.83,
  'high': 2.83,
  'low': 2.33,
  'close': 2.38,
  'adjusted_close': 2.38,
  'volume': 2826100},
 {'date': '2022-06-14',
  'open': 2.4,
  'high': 2.49,
  'low': 2.29,
  'close': 2.43,
  'adjusted_close': 2.43,
  'volume': 1394000},
 {'date': '2022-06-15',
  'open': 2.47,
  'high': 

In [40]:
pd.DataFrame(r.json())

,date,open,high,low,close,adjusted_close,volume
0,2022-06-06,3.49,3.49,3.25,3.28,3.28,1053500
1,2022-06-07,3.24,3.29,3.09,3.29,3.29,1931000
2,2022-06-08,3.25,3.31,3.14,3.15,3.15,1488600
3,2022-06-09,3.15,3.16,3.02,3.05,3.05,1280500
4,2022-06-10,3.03,3.03,2.80,2.86,2.86,1445800
...,...,...,...,...,...,...,...
247,2023-05-30,2.73,2.83,2.72,2.76,2.76,1133200
248,2023-05-31,2.77,2.89,2.74,2.81,2.81,1049100
249,2023-06-01,2.83,2.84,2.68,2.79,2.79,891900
250,2023-06-02,2.80,2.86,2.67,2.77,2.77,1424300
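
Each provider used so far identifies the same stock with a different exchange suffix. The mapping below simply collects the suffixes that appear in this lesson's requests. `PROVIDER_SUFFIX` and `provider_symbol` are hypothetical names, and the suffixes are only confirmed here for this one B3 ticker.

```python
# Suffixes as they appear in the requests of Examples 1 and 2
PROVIDER_SUFFIX = {
    "yahoo": ".SA",          # yf.Ticker("MBLY3.SA")
    "marketstack": ".BVMF",  # symbols=MBLY3.BVMF
    "alphavantage": ".SAO",  # symbol=MBLY3.SAO
    "eod": ".SA",            # /api/eod/{code}.{EXCHANGE_CODE} with "SA"
}


def provider_symbol(code, provider):
    """Return the provider-specific symbol for a B3 ticker code."""
    try:
        return code + PROVIDER_SUFFIX[provider]
    except KeyError:
        raise ValueError(f"unknown provider: {provider}") from None


for name in PROVIDER_SUFFIX:
    print(name, provider_symbol("MBLY3", name))
```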


### Example 3 - Collecting derivatives data from B3

First, let's inspect the page https://www.b3.com.br/pt_br/market-data-e-indices/servicos-de-dados/market-data/historico/derivativos/ajustes-do-pregao/

In [41]:
url = "https://www2.bmf.com.br/pages/portal/bmfbovespa/lumis/lum-ajustes-do-pregao-ptBR.asp"

In [42]:
query = {
    "dData1": "10/03/2023"
}
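
The date in `dData1` is written day first (`dd/mm/yyyy`, the Brazilian convention), so `10/03/2023` presumably stands for 10 March 2023. As a sketch, the query can be built from a `datetime.date` to avoid any ambiguity; `build_query` is a hypothetical helper, and only the parameter name `dData1` comes from the cell above.

```python
from datetime import date


def build_query(trade_date):
    """Return the request parameters for a given trading date."""
    return {"dData1": trade_date.strftime("%d/%m/%Y")}


query = build_query(date(2023, 3, 10))
print(query)  # {'dData1': '10/03/2023'}
```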

In [43]:
import requests

In [44]:
# POST with plain requests (without requests-html)
r = requests.post(
    url,
    params=query
)

In [45]:
r.content

b'\r\n<!doctype html>\r\n<html class="no-js" lang="pt-br">\r\n<head>\r\n<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />\r\n<meta name="viewport" content="width=device-width, initial-scale=1.0" />\r\n<script type="text/javascript" src="/ruxitagentjs_ICA2NVfjqru_10265230425083909.js" data-dtconfig="rid=RID_-771026108|rpid=-510812684|domain=bmf.com.br|reportUrl=/rb_ea8d7b8f-040c-423b-b726-54310cab2c40|app=e44446475f923f8e|featureHash=ICA2NVfjqru|vcv=2|rdnt=1|uxrgce=1|bp=3|cuc=kju5m9kg|mel=100000|dpvc=1|ssv=4|lastModification=1686056358479|dtVersion=10265230425083909|tp=500,50,0,1|agentUri=/ruxitagentjs_ICA2NVfjqru_10265230425083909.js"></script><link rel="stylesheet" href="css/foundation.css" />\n<link rel="stylesheet" href="css/jquery.ui.datepicker.css" />\n<script src="js/vendor/modernizr.js"></script>\r\n\r\n<script type="text/javascript">\r\nfunction Retroativo(theForm) {\r\n    if(!CkDate(theForm.dData1, theForm.dData1.value))\r\n        {\r\n            \r

In [46]:
!pip install requests-html

Collecting requests-html
  Downloading requests_html-0.10.0-py3-none-any.whl (13 kB)
Collecting pyquery
  Downloading pyquery-2.0.0-py3-none-any.whl (22 kB)
Collecting fake-useragent
  Downloading fake_useragent-1.1.3-py3-none-any.whl (50 kB)
     ---------------------------------------- 0.0/50.5 kB ? eta -:--:--
     ---------------------------------------- 50.5/50.5 kB 1.3 MB/s eta 0:00:00
Collecting parse
  Downloading parse-1.19.0.tar.gz (30 kB)
  Preparing metadata (setup.py): started
  Preparing metadata (setup.py): finished with status 'done'
Collecting bs4
  Downloading bs4-0.0.1.tar.gz (1.1 kB)
  Preparing metadata (setup.py): started
  Preparing metadata (setup.py): finished with status 'done'
Collecting w3lib
  Downloading w3lib-2.1.1-py3-none-any.whl (21 kB)
Collecting pyppeteer>=0.0.14
  Downloading pyppeteer-1.0.2-py3-none-any.whl (83 kB)
     ---------------------------------------- 0.0/83.4 kB ? eta -:--:--
     ---------------------------------------- 83.4/83.4 kB 1.6 

In [47]:
import requests_html

In [49]:
# With plain requests (without requests-html) this does not work

table_html = r.html.xpath("//table[contains(@id,'tblDadosAjustes')]")

AttributeError: 'Response' object has no attribute 'html'

In [50]:
r = requests_html.HTMLSession().post(
    url, 
    params=query
)

In [51]:
table_html = r.html.xpath("//table[contains(@id,'tblDadosAjustes')]")

In [52]:
len(table_html)

1

In [53]:
table_html[0].full_text

'\n\n\nMercadoria\nVencimento\nPreço de ajuste anterior\nPreço de ajuste Atual\nVariação\nValor do ajuste por contrato (R$)\n\n\n\n\nABEVO - Contrato Futuro de ABEV3                          \nH23 \n13,64\n13,54\n-0,10\n0,10\n\n\n\nJ23 \n13,80\n13,70\n-0,10\n0,10\n\n\n\nK23 \n13,93\n13,83\n-0,10\n0,10\n\n\nAFS   - Rande da África do Sul (em USD)                   \nJ23 \n18.599,200\n18.246,200\n-353,000\n980,01\n\n\n\nK23 \n18.646,500\n18.289,300\n-357,200\n991,67\n\n\n\nM23 \n18.689,700\n18.333,500\n-356,200\n988,90\n\n\n\nN23 \n18.728,300\n18.374,000\n-354,300\n983,62\n\n\n\nQ23 \n18.767,100\n18.414,800\n-352,300\n978,07\n\n\nARB   - Peso Argentino (em Reais)                         \nJ23 \n24,6590\n24,7400\n0,0810\n12,15\n\n\n\nK23 \n23,1830\n23,3820\n0,1990\n29,85\n\n\n\nM23 \n21,8100\n22,0960\n0,2860\n42,90\n\n\n\nN23 \n20,5060\n20,7840\n0,2780\n41,70\n\n\n\nQ23 \n19,2400\n19,4820\n0,2420\n36,30\n\n\nARS   - Peso Argentino (em USD)                           \nJ23 \n208.916,9000\n2

In [54]:
for element_ in table_html[0].xpath("//tr"):
    print(element_.text.split("\n"))

['Mercadoria', 'Vencimento', 'Preço de ajuste anterior', 'Preço de ajuste Atual', 'Variação', 'Valor do ajuste por contrato (R$)']
['ABEVO - Contrato Futuro de ABEV3', 'H23', '13,64', '13,54', '-0,10', '0,10']
['J23', '13,80', '13,70', '-0,10', '0,10']
['K23', '13,93', '13,83', '-0,10', '0,10']
['AFS - Rande da África do Sul (em USD)', 'J23', '18.599,200', '18.246,200', '-353,000', '980,01']
['K23', '18.646,500', '18.289,300', '-357,200', '991,67']
['M23', '18.689,700', '18.333,500', '-356,200', '988,90']
['N23', '18.728,300', '18.374,000', '-354,300', '983,62']
['Q23', '18.767,100', '18.414,800', '-352,300', '978,07']
['ARB - Peso Argentino (em Reais)', 'J23', '24,6590', '24,7400', '0,0810', '12,15']
['K23', '23,1830', '23,3820', '0,1990', '29,85']
['M23', '21,8100', '22,0960', '0,2860', '42,90']
['N23', '20,5060', '20,7840', '0,2780', '41,70']
['Q23', '19,2400', '19,4820', '0,2420', '36,30']
['ARS - Peso Argentino (em USD)', 'J23', '208.916,9000', '211.752,1000', '2.835,2000', '725,9

['V23', '5.303,1100', '5.397,3780', '94,2680', '4.713,40']
['X23', '5.325,0920', '5.420,8320', '95,7400', '4.787,00']
['Z23', '5.343,9560', '5.443,0600', '99,1040', '4.955,20']
['F24', '5.361,6380', '5.462,3720', '100,7340', '5.036,70']
['G24', '5.386,5540', '5.490,6730', '104,1190', '5.205,95']
['H24', '5.402,7400', '5.509,8040', '107,0640', '5.353,20']
['J24', '5.422,9310', '5.532,0000', '109,0690', '5.453,45']
['N24', '5.488,6290', '5.605,3000', '116,6710', '5.833,55']
['V24', '5.565,8650', '5.690,5590', '124,6940', '6.234,70']
['F25', '5.646,1540', '5.778,5040', '132,3500', '6.617,50']
['J25', '5.736,5130', '5.873,9980', '137,4850', '6.874,25']
['N25', '5.828,6260', '5.973,0110', '144,3850', '7.219,25']
['V25', '5.944,5260', '6.095,0900', '150,5640', '7.528,20']
['F26', '6.060,0650', '6.218,6130', '158,5480', '7.927,40']
['ELETO - Contrato Futuro de ELET3', 'H23', '32,18', '31,90', '-0,28', '0,28']
['J23', '32,56', '32,27', '-0,29', '0,29']
['K23', '32,87', '32,59', '-0,28', '0,28'

['J24', '88.218,90', '88.092,25', '-126,65', '126,65']
['N24', '85.854,65', '85.688,65', '-166,00', '166,00']
['V24', '83.465,56', '83.270,17', '-195,39', '195,39']
['F25', '81.182,15', '80.974,25', '-207,90', '207,90']
['J25', '78.930,09', '78.711,60', '-218,49', '218,49']
['N25', '76.761,34', '76.538,32', '-223,02', '223,02']
['V25', '74.375,63', '74.154,16', '-221,47', '221,47']
['F26', '72.104,37', '71.884,76', '-219,61', '219,61']
['J26', '69.971,85', '69.748,60', '-223,25', '223,25']
['N26', '67.872,35', '67.647,16', '-225,19', '225,19']
['V26', '65.739,42', '65.521,25', '-218,17', '218,17']
['F27', '63.770,32', '63.557,00', '-213,32', '213,32']
['J27', '61.864,43', '61.667,08', '-197,35', '197,35']
['N27', '59.944,64', '59.732,84', '-211,80', '211,80']
['V27', '58.017,01', '57.804,70', '-212,31', '212,31']
['F28', '56.202,15', '55.990,25', '-211,90', '211,90']
['F29', '49.394,94', '49.174,06', '-220,88', '220,88']
['F30', '43.475,68', '43.251,21', '-224,47', '224,47']
['F31', '3

In [55]:
list_linhas = []

# Skip the header row. Rows with 6 fields start a new instrument;
# rows with 5 fields omit the instrument name and inherit the previous one.
for element_ in table_html[0].xpath("//tr")[1:]:
    campos = element_.text.split("\n")
    if len(campos) == 6:
        instrumento = campos[0]
        linha = campos
    elif len(campos) == 5:
        linha = [instrumento] + campos
    list_linhas.append(linha)
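
The loop relies on the instrument name appearing only on the first row of each group, and it appends whatever `linha` currently holds even when a row has neither 5 nor 6 fields. A self-contained sketch of the same carry-forward idea on plain lists, which skips rows of unexpected length instead, is given below. `fill_instrument` is a hypothetical helper and the sample rows are copied from the printed output above.

```python
def fill_instrument(rows, n_columns=6):
    """Prefix rows that lack the instrument name with the last one seen."""
    filled = []
    instrument = None
    for fields in rows:
        if len(fields) == n_columns:
            # Full row: remember the instrument for the rows that follow
            instrument = fields[0]
            filled.append(list(fields))
        elif len(fields) == n_columns - 1 and instrument is not None:
            # Short row: inherit the instrument from the previous full row
            filled.append([instrument] + list(fields))
        # Rows of any other length are ignored
    return filled


rows = [
    ["ABEVO - Contrato Futuro de ABEV3", "H23", "13,64", "13,54", "-0,10", "0,10"],
    ["J23", "13,80", "13,70", "-0,10", "0,10"],
    ["AFS - Rande da África do Sul (em USD)", "J23", "18.599,200", "18.246,200", "-353,000", "980,01"],
    ["K23", "18.646,500", "18.289,300", "-357,200", "991,67"],
]
for row in fill_instrument(rows):
    print(row)
```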

In [56]:
list_linhas

[['ABEVO - Contrato Futuro de ABEV3',
  'H23',
  '13,64',
  '13,54',
  '-0,10',
  '0,10'],
 ['ABEVO - Contrato Futuro de ABEV3',
  'J23',
  '13,80',
  '13,70',
  '-0,10',
  '0,10'],
 ['ABEVO - Contrato Futuro de ABEV3',
  'K23',
  '13,93',
  '13,83',
  '-0,10',
  '0,10'],
 ['AFS - Rande da África do Sul (em USD)',
  'J23',
  '18.599,200',
  '18.246,200',
  '-353,000',
  '980,01'],
 ['AFS - Rande da África do Sul (em USD)',
  'K23',
  '18.646,500',
  '18.289,300',
  '-357,200',
  '991,67'],
 ['AFS - Rande da África do Sul (em USD)',
  'M23',
  '18.689,700',
  '18.333,500',
  '-356,200',
  '988,90'],
 ['AFS - Rande da África do Sul (em USD)',
  'N23',
  '18.728,300',
  '18.374,000',
  '-354,300',
  '983,62'],
 ['AFS - Rande da África do Sul (em USD)',
  'Q23',
  '18.767,100',
  '18.414,800',
  '-352,300',
  '978,07'],
 ['ARB - Peso Argentino (em Reais)',
  'J23',
  '24,6590',
  '24,7400',
  '0,0810',
  '12,15'],
 ['ARB - Peso Argentino (em Reais)',
  'K23',
  '23,1830',
  '23,3820',
  '0

In [57]:
table_html[0].xpath("//tr")[0].text.split("\n")

['Mercadoria',
 'Vencimento',
 'Preço de ajuste anterior',
 'Preço de ajuste Atual',
 'Variação',
 'Valor do ajuste por contrato (R$)']

In [58]:
import pandas as pd

In [59]:
df_futuros = pd.DataFrame(list_linhas, columns=table_html[0].xpath("//tr")[0].text.split("\n"))

In [60]:
df_futuros.head()

,Mercadoria,Vencimento,Preço de ajuste anterior,Preço de ajuste Atual,Variação,Valor do ajuste por contrato (R$)
0,ABEVO - Contrato Futuro de ABEV3,H23,"13,64","13,54","-0,10","0,10"
1,ABEVO - Contrato Futuro de ABEV3,J23,"13,80","13,70","-0,10","0,10"
2,ABEVO - Contrato Futuro de ABEV3,K23,"13,93","13,83","-0,10","0,10"
3,AFS - Rande da África do Sul (em USD),J23,"18.599,200","18.246,200","-353,000","980,01"
4,AFS - Rande da África do Sul (em USD),K23,"18.646,500","18.289,300","-357,200","991,67"

In [61]:
df_futuros.tail()

,Mercadoria,Vencimento,Preço de ajuste anterior,Preço de ajuste Atual,Variação,Valor do ajuste por contrato (R$)
611,ZAR - Rande da África do Sul (em Reais),J23,"2.769,8730","2.871,2110","101,3380","3.546,83"
612,ZAR - Rande da África do Sul (em Reais),K23,"2.774,6590","2.876,4630","101,8040","3.563,14"
613,ZAR - Rande da África do Sul (em Reais),M23,"2.784,2800","2.886,2940","102,0140","3.570,49"
614,ZAR - Rande da África do Sul (em Reais),N23,"2.792,6670","2.895,1150","102,4480","3.585,68"
615,ZAR - Rande da África do Sul (em Reais),Q23,"2.800,0050","2.902,6890","102,6840","3.593,94"
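
All numeric columns in `df_futuros` are still strings in Brazilian format, with `.` as the thousands separator and `,` as the decimal mark. A minimal sketch of a converter is shown below; `parse_br_number` is a hypothetical helper and the sample values are taken from the table rows above.

```python
def parse_br_number(text):
    """Convert a string such as '18.599,200' to a float."""
    # Drop thousands separators first, then switch the decimal mark
    return float(text.strip().replace(".", "").replace(",", "."))


for raw in ["13,64", "-0,10", "18.599,200", "3.546,83"]:
    print(raw, "->", parse_br_number(raw))
```

On the DataFrame this could be applied column by column, for example with `df_futuros["Variação"].map(parse_br_number)`.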
