# Project 2 - How to get data from a website with Python? - Collecting data on ETFs from around the world.


# Challenge:

* Write code that visits etf.com and collects data on every ETF in the U.S. market and, by extension, the world: returns, assets under management, issuer, expense ratio...


# Step by step:

**Step 1** - Choose the browser that Python will drive.

**Step 2** - Import the modules and libraries.

**Step 3** - Understand how web requests work.

**Step 4** - Understand and map the data collection process on ETF.com.

**Step 5** - Find all the required elements in the site's HTML.

**Step 6** - Read the data table.

**Step 7** - Build the final table.

In [1]:
!pip install webdriver-manager

Collecting webdriver-manager
  Downloading webdriver_manager-3.8.5-py2.py3-none-any.whl (27 kB)
Collecting python-dotenv
  Downloading python_dotenv-0.21.1-py3-none-any.whl (19 kB)
Installing collected packages: python-dotenv, webdriver-manager
Successfully installed python-dotenv-0.21.1 webdriver-manager-3.8.5


In [2]:
!pip install selenium

Collecting selenium
  Downloading selenium-4.8.0-py3-none-any.whl (6.3 MB)
     ---------------------------------------- 6.3/6.3 MB 6.6 MB/s eta 0:00:00
Collecting trio~=0.17
  Using cached trio-0.22.0-py3-none-any.whl (384 kB)
Collecting trio-websocket~=0.9
  Using cached trio_websocket-0.9.2-py3-none-any.whl (16 kB)
Collecting async-generator>=1.9
  Using cached async_generator-1.10-py3-none-any.whl (18 kB)
Collecting outcome
  Using cached outcome-1.2.0-py2.py3-none-any.whl (9.7 kB)
Collecting exceptiongroup>=1.0.0rc9
  Downloading exceptiongroup-1.1.0-py3-none-any.whl (14 kB)
Collecting wsproto>=0.14
  Using cached wsproto-1.2.0-py3-none-any.whl (24 kB)
Collecting h11<1,>=0.9.0
  Using cached h11-0.14.0-py3-none-any.whl (58 kB)
Installing collected packages: outcome, h11, exceptiongroup, async-generator, wsproto, trio, trio-websocket, selenium
Successfully installed async-generator-1.10 exceptiongroup-1.1.0 h11-0.14.0 outcome-1.2.0 selenium-4.8.0 trio-0.22.0 trio-websocket-0.9.2 ws

In [3]:
!pip install html5lib



## Step 1: Choose the browser

Google Chrome will be used.

## Step 2: Import libraries

In [1]:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
import time
import pandas as pd

## Step 3: Understand how web requests work

In [2]:
driver = webdriver.Chrome(service = Service(ChromeDriverManager().install()))

url = "https://www.etf.com/etfanalytics/etf-finder"

driver.get(url)
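When `driver.get(url)` runs, the browser sends an HTTP GET request to the host named in the URL and renders the response. To make the parts of that request visible without opening a browser or touching the network, here is a small sketch using only the standard library. The `User-Agent` value is a made-up placeholder for illustration, and the request object is only built, never sent.

```python
from urllib.parse import urlsplit
from urllib.request import Request

url = "https://www.etf.com/etfanalytics/etf-finder"

# Break the URL into the pieces the browser uses to route the request.
parts = urlsplit(url)
print(parts.scheme)  # protocol used to talk to the server
print(parts.netloc)  # host that receives the request
print(parts.path)    # resource requested on that host

# Build (but do not send) the request a client would issue for this page.
# The User-Agent string below is a hypothetical placeholder.
request = Request(url, headers={"User-Agent": "etf-notebook-example/0.1"}, method="GET")
print(request.get_method())
# urllib stores header names with only the first letter capitalized.
print(request.get_header("User-agent"))
```

Selenium goes one step further than a plain request: it drives a real browser, so the JavaScript that builds the ETF table also runs.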

## Step 4: Understand and map the data collection process on ETF.com

### Data process

* Open the site
* Set the view to 100 rows per page
* Read the table
* Advance through all the pages
* Switch to another category (tab)
* Read all the tables in that other category
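The "read the table, then advance" part of this process can be sketched without a browser. `FakePaginatedTable` and `collect_all_pages` below are hypothetical names invented for this illustration; the fake class merely stands in for the site's table. The sketch skips the "next" click after the last page, which is one possible design and not necessarily what the rest of this notebook does.

```python
class FakePaginatedTable:
    """Hypothetical stand-in for the paginated table on the site."""

    def __init__(self, pages):
        self.pages = pages
        self.current = 0
        self.clicks = 0

    def read(self):
        # Return a copy of the rows shown on the current page.
        return list(self.pages[self.current])

    def next_page(self):
        # Count every click; only move if there is a following page.
        self.clicks += 1
        if self.current < len(self.pages) - 1:
            self.current += 1


def collect_all_pages(table, number_of_pages):
    """Read every page once, clicking 'next' only between pages."""
    rows = []
    for page in range(number_of_pages):
        rows.extend(table.read())
        if page < number_of_pages - 1:
            table.next_page()
    return rows


site = FakePaginatedTable([["SPY", "IVV"], ["VTI", "VOO"], ["QQQ"]])
tickers = collect_all_pages(site, number_of_pages=3)
print(tickers)
print(site.clicks)
```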

## Step 5.1: Find all the required elements in the site's HTML - Expanding the table to 100 items

In [3]:
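# Give the page a few seconds to finish loading before looking for elements.
time.sleep(5)

# Locate the button that sets the table to 100 rows per page.
botao_100 = driver.find_element("xpath", '''/html/body/div[5]/section/div/div[3]/section/div/div/div/div/div[2]
                                /section[2]/div[2]/section[2]/div[1]/div/div[4]/button/label/span''')

# Click it through JavaScript instead of a simulated mouse click.
driver.execute_script('arguments[0].click()', botao_100)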
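A fixed `time.sleep(5)` can be too short on a slow connection and wasteful on a fast one. A common alternative is to poll until a condition holds, which is the idea behind Selenium's `WebDriverWait`. The sketch below shows that idea in plain Python so it runs without a browser; `wait_until` and `element_is_ready` are hypothetical helpers written for this example, not part of Selenium.

```python
import time


def wait_until(condition, timeout=10.0, interval=0.5):
    """Call `condition` repeatedly until it returns a truthy value or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)


# Simulate an element that only becomes available on the third check.
attempts = {"count": 0}


def element_is_ready():
    attempts["count"] += 1
    return attempts["count"] >= 3


found = wait_until(element_is_ready, timeout=2.0, interval=0.01)
print(found, attempts["count"])
```

With a real driver, the condition would be a function that tries to find the element and returns it once it exists.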

## Step 5.2: Find all the required elements in the site's HTML - Getting the number of pages in the table

In [4]:
numero_paginas = driver.find_element('xpath', '''/html/body/div[5]/section/div/div[3]/section/div/div/div/div/div[2]
                                    /section[2]/div[2]/section[2]/div[2]/div/label[2]''')

numero_paginas = numero_paginas.text.replace('of ', '')
numero_paginas = int(numero_paginas)

print(numero_paginas)

31
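The cell above assumes the pagination label always reads exactly like `of 31`. As a slightly more forgiving sketch, a regular expression can pull out the first run of digits and fail with a clear message when there is none. `parse_page_count` is a hypothetical helper introduced here for illustration.

```python
import re


def parse_page_count(label_text):
    """Extract the page count from a pagination label such as 'of 31'."""
    match = re.search(r"\d+", label_text)
    if match is None:
        raise ValueError(f"no page count found in {label_text!r}")
    return int(match.group())


print(parse_page_count("of 31"))
print(parse_page_count("  of 7 "))
```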


## Step 6.1: Read the data table - Reading the basic data table

In [5]:
lista_de_tabela_por_pagina = []

for pagina in range(0, numero_paginas):
    # Locate the table element on the current page.
    tabela = driver.find_element('xpath', '''/html/body/div[5]/section/div/div[3]/section/div/div/div
                                        /div/div[2]/section[2]/div[2]/div/table''')

    # Grab the table's full HTML and let pandas parse it into a DataFrame.
    html_tabela = tabela.get_attribute('outerHTML')

    tabela_final = pd.read_html(html_tabela)[0]

    lista_de_tabela_por_pagina.append(tabela_final)

    # Click the "next page" button to move on to the following page.
    botao_avancar_pagina = driver.find_element('xpath', '''/html/body/div[5]/section/div/div[3]/section/div/div/
                                                div/div/div[2]/section[2]/div[2]/section[2]/div[2]/div/span[2]''')

    driver.execute_script('arguments[0].click()', botao_avancar_pagina)

# Stack the per-page tables into a single DataFrame indexed by ticker.
base_de_dados_basic = pd.concat(lista_de_tabela_por_pagina)
base_de_dados_basic = base_de_dados_basic.set_index('Ticker')
base_de_dados_basic

| Ticker | Name | Segment | Issuer | Expense Ratio | AUM |
|---|---|---|---|---|---|
| SPY | SPDR S&P 500 ETF Trust | Equity: U.S. - Large Cap | State Street Global Advisors | 0.09% | $383.85B |
| IVV | iShares Core S&P 500 ETF | Equity: U.S. - Large Cap | Blackrock | 0.03% | $312.01B |
| VTI | Vanguard Total Stock Market ETF | Equity: U.S. - Total Market | Vanguard | 0.03% | $285.80B |
| VOO | Vanguard S&P 500 ETF | Equity: U.S. - Large Cap | Vanguard | 0.03% | $283.28B |
| QQQ | Invesco QQQ Trust | Equity: U.S. - Large Cap | Invesco | 0.20% | $161.67B |
| ... | ... | ... | ... | ... | ... |
| AWYX | ETFMG 2x Daily Travel Tech ETF | Leveraged Equity: Global Internet & Direct Mar... | ETFMG | 0.95% | $345.20K |
| TADS | The Active Dividend Stock ETF | Equity: U.S. - Total Market | Tuttle Tactical Management, LLC | 1.68% | $294.89K |
| CRYP | AdvisorShares Managed Bitcoin Strategy ETF | Asset Allocation: Global Target Outcome | AdvisorShares | 1.59% | $218.67K |
| FLRU | Franklin FTSE Russia ETF | Equity: Russia - Total Market | Franklin Templeton | 0.19% | $8.00K |
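`pd.read_html` takes the table's `outerHTML` and returns the rows and columns as a DataFrame. To make that conversion less of a black box, here is a rough standard-library sketch that walks a tiny HTML table and collects the cell text row by row. It illustrates the idea only and is not how pandas implements it; `TableRowParser` and `sample_html` are hypothetical names, and the sample rows are copied from the output above.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None outside a row
        self._cell = None  # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


sample_html = """
<table>
  <tr><th>Ticker</th><th>Name</th><th>Expense Ratio</th></tr>
  <tr><td>SPY</td><td>SPDR S&amp;P 500 ETF Trust</td><td>0.09%</td></tr>
  <tr><td>IVV</td><td>iShares Core S&amp;P 500 ETF</td><td>0.03%</td></tr>
</table>
"""

parser = TableRowParser()
parser.feed(sample_html)
parser.close()

# The first row holds the column names; the rest are data rows.
header, *records = parser.rows
print(header)
for record in records:
    print(record)
```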


## Step 6.2: Read the data table - Reading the performance data table

In [6]:
# Switch to the performance tab
botao_aba = driver.find_element('xpath', '''/html/body/div[5]/section/div/div[3]/section/div/div
                                                /div/div/div[2]/section[2]/div[2]/ul/li[2]/span''')
driver.execute_script('arguments[0].click()', botao_aba)

# Go back to page 1
# The page-number text box is located here but not used below;
# the loop clicks the "previous page" button instead.
caixa_de_texto = driver.find_element('xpath', '''/html/body/div[5]/section/div/div[3]/section/div/div/div
                                                /div/div[2]/section[2]/div[2]/section[2]/div[2]/div/input''')

for pagina in range(0, numero_paginas):
    botao_voltar_pagina = driver.find_element('xpath', '''/html/body/div[5]/section/div/div[3]/section/div/div/
                                                        div/div/div[2]/section[2]/div[2]/section[2]/div[2]/div/span[1]''')
    
    driver.execute_script('arguments[0].click()', botao_voltar_pagina)

In [7]:
lista_de_tabela_por_pagina = []

for pagina in range(0, numero_paginas):
    # Locate the table element on the current page of the performance tab.
    tabela = driver.find_element('xpath', '''/html/body/div[5]/section/div/div[3]/section/div/div/div
                                        /div/div[2]/section[2]/div[2]/div/table''')

    # Grab the table's full HTML and let pandas parse it into a DataFrame.
    html_tabela = tabela.get_attribute('outerHTML')

    tabela_final = pd.read_html(html_tabela)[0]

    lista_de_tabela_por_pagina.append(tabela_final)

    # Click the "next page" button to move on to the following page.
    botao_avancar_pagina = driver.find_element('xpath', '''/html/body/div[5]/section/div/div[3]/section/div/div/
                                                div/div/div[2]/section[2]/div[2]/section[2]/div[2]/div/span[2]''')

    driver.execute_script('arguments[0].click()', botao_avancar_pagina)

# Stack the per-page tables into a single DataFrame indexed by ticker.
base_de_dados_performance = pd.concat(lista_de_tabela_por_pagina)
base_de_dados_performance = base_de_dados_performance.set_index('Ticker')

base_de_dados_performance

| Ticker | Name | 1 Month | 3 Month | YTD | 1 Year | 3 Years | 5 Years | 10 Years | As Of Date |
|---|---|---|---|---|---|---|---|---|---|
| SPY | SPDR S&P 500 ETF Trust | 4.03% | 8.84% | 3.52% | -9.95% | 7.75% | 8.99% | 12.39% | 01/20/23 |
| IVV | iShares Core S&P 500 ETF | 4.06% | 8.84% | 3.51% | -9.95% | 7.75% | 9.03% | 12.44% | 01/20/23 |
| VTI | Vanguard Total Stock Market ETF | 4.49% | 8.98% | 3.96% | -10.36% | 7.24% | 8.49% | 12.02% | 01/20/23 |
| VOO | Vanguard S&P 500 ETF | 4.02% | 8.82% | 3.52% | -9.98% | 7.73% | 9.02% | 12.44% | 01/20/23 |
| QQQ | Invesco QQQ Trust | 4.88% | 5.30% | 6.16% | -21.28% | 8.84% | 11.98% | 16.55% | 01/20/23 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| AWYX | ETFMG 2x Daily Travel Tech ETF | 36.11% | 38.66% | 31.27% | -48.46% | -- | -- | -- | 01/20/23 |
| TADS | The Active Dividend Stock ETF | 0% | 0% | 0% | 0% | -- | -- | -- | 01/20/23 |
| CRYP | AdvisorShares Managed Bitcoin Strategy ETF | -- | -- | -- | -- | -- | -- | -- | 01/20/23 |
| FLRU | Franklin FTSE Russia ETF | 0% | 0% | 0% | -66.37% | -31.18% | -- | -- | 01/20/23 |


In [8]:
driver.quit()

## Step 7: Build the final table

In [9]:
base_de_dados_performance = base_de_dados_performance[['1 Year', '5 Years', '10 Years']]
base_de_dados_performance

| Ticker | 1 Year | 5 Years | 10 Years |
|---|---|---|---|
| SPY | -9.95% | 8.99% | 12.39% |
| IVV | -9.95% | 9.03% | 12.44% |
| VTI | -10.36% | 8.49% | 12.02% |
| VOO | -9.98% | 9.02% | 12.44% |
| QQQ | -21.28% | 11.98% | 16.55% |
| ... | ... | ... | ... |
| AWYX | -48.46% | -- | -- |
| TADS | 0% | -- | -- |
| CRYP | -- | -- | -- |
| FLRU | -66.37% | -- | -- |
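The returns come back as text such as `-9.95%`, with `--` marking periods for which the site shows no value. They need to become numbers before any sorting or filtering by performance. The sketch below shows one way to do that conversion; `parse_percent` is a hypothetical helper, and treating `--` as a missing value is an assumption based on the output above.

```python
def parse_percent(text):
    """Convert a string like '-9.95%' to a float; return None for '--' or empty text."""
    cleaned = text.strip()
    if cleaned in ("", "--"):
        return None
    return float(cleaned.rstrip("%"))


one_year_returns = ["-9.95%", "0%", "--", "-66.37%"]
parsed = [parse_percent(value) for value in one_year_returns]
print(parsed)
```

Inside pandas, the same function could be applied to each column with `Series.map`.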


In [10]:
base_de_dados_final = base_de_dados_basic.join(base_de_dados_performance)
base_de_dados_final

| Ticker | Name | Segment | Issuer | Expense Ratio | AUM | 1 Year | 5 Years | 10 Years |
|---|---|---|---|---|---|---|---|---|
| SPY | SPDR S&P 500 ETF Trust | Equity: U.S. - Large Cap | State Street Global Advisors | 0.09% | $383.85B | -9.95% | 8.99% | 12.39% |
| IVV | iShares Core S&P 500 ETF | Equity: U.S. - Large Cap | Blackrock | 0.03% | $312.01B | -9.95% | 9.03% | 12.44% |
| VTI | Vanguard Total Stock Market ETF | Equity: U.S. - Total Market | Vanguard | 0.03% | $285.80B | -10.36% | 8.49% | 12.02% |
| VOO | Vanguard S&P 500 ETF | Equity: U.S. - Large Cap | Vanguard | 0.03% | $283.28B | -9.98% | 9.02% | 12.44% |
| QQQ | Invesco QQQ Trust | Equity: U.S. - Large Cap | Invesco | 0.20% | $161.67B | -21.28% | 11.98% | 16.55% |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| AWYX | ETFMG 2x Daily Travel Tech ETF | Leveraged Equity: Global Internet & Direct Mar... | ETFMG | 0.95% | $345.20K | -48.46% | -- | -- |
| TADS | The Active Dividend Stock ETF | Equity: U.S. - Total Market | Tuttle Tactical Management, LLC | 1.68% | $294.89K | 0% | -- | -- |
| CRYP | AdvisorShares Managed Bitcoin Strategy ETF | Asset Allocation: Global Target Outcome | AdvisorShares | 1.59% | $218.67K | -- | -- | -- |
| FLRU | Franklin FTSE Russia ETF | Equity: Russia - Total Market | Franklin Templeton | 0.19% | $8.00K | -66.37% | -- | -- |
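In the final table the `AUM` column is still text with a dollar sign and a magnitude suffix (`$383.85B`, `$8.00K`), so it cannot be sorted numerically as it stands. The sketch below converts those strings to plain dollar amounts. `parse_aum` and `SUFFIX_MULTIPLIERS` are hypothetical names; only the `B` and `K` suffixes appear in the output above, so the `M` and `T` entries are assumptions about what the site might use.

```python
# Multipliers for the magnitude suffixes; "M" and "T" are assumed, not seen in the output.
SUFFIX_MULTIPLIERS = {"K": 1e3, "M": 1e6, "B": 1e9, "T": 1e12}


def parse_aum(text):
    """Convert a string like '$383.85B' to a float amount in dollars."""
    cleaned = text.strip().lstrip("$").replace(",", "")
    if cleaned in ("", "--"):
        return None
    suffix = cleaned[-1].upper()
    if suffix in SUFFIX_MULTIPLIERS:
        return float(cleaned[:-1]) * SUFFIX_MULTIPLIERS[suffix]
    return float(cleaned)


for raw in ["$383.85B", "$345.20K", "$8.00K"]:
    print(raw, "->", f"{parse_aum(raw):,.0f}")
```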
