Updates and fixes for use on Windows 10, per the items raised in issue #22
aphonsoar authored and Aphonso Rafael committed Apr 24, 2022
1 parent 232d3b8 commit 67e05ff
Showing 3 changed files with 27 additions and 12 deletions.
6 changes: 3 additions & 3 deletions README.md
@@ -12,14 +12,14 @@ This repository contains an ETL process to **i)** download the files; **i

 ### Required infrastructure:
 - [Python 3.8](https://www.python.org/downloads/release/python-3810/)
-- [PostgreSQL 13](https://www.postgresql.org/download/)
+- [PostgreSQL 14.2](https://www.postgresql.org/download/)

 ---------------------

 ### How to use:
-1. With Postgre installed, start the server instance (it can be local) and create the database as described in the file `banco_de_dados.sql`.
+1. With Postgres installed, start the server instance (it can be local) and create the database as described in the file `banco_de_dados.sql`.

-2. According to your environment, create a `.env` file in the `code` directory, following the environment variables in the `.env_template` file:
+2. Create a `.env` file in the `code` directory with the environment variables for your working environment (localhost). Use the `.env_template` file as a reference. You can also, for example, rename `.env_template` to just `.env` and use it:
 - `OUTPUT_FILES_PATH`: destination directory for the downloaded files
 - `EXTRACTED_FILES_PATH`: destination directory for the files extracted from the .zip archives
 - `DB_USER`: database user created by the `banco_de_dados.sql` file
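
As a hedged illustration of step 2, a script could check that the variables named in `.env_template` are actually set before doing any work. The helper `missing_env_vars` and the `REQUIRED_VARS` tuple are hypothetical names, not part of this commit, and only the five variable names visible in this diff are listed; the real template may define more.

```python
import os
from typing import List, Mapping, Optional

# Only the variable names visible in this commit; the full template may list more.
REQUIRED_VARS = (
    "OUTPUT_FILES_PATH",
    "EXTRACTED_FILES_PATH",
    "DB_HOST",
    "DB_PORT",
    "DB_USER",
)


def missing_env_vars(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the required variable names that are unset or empty (hypothetical helper)."""
    if env is None:
        env = os.environ
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing variables in .env: " + ", ".join(missing))
```

Running a check like this right after the `.env` file is loaded would surface a misplaced or incomplete file early, rather than at the first path or database operation.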
5 changes: 2 additions & 3 deletions code/.env_template
@@ -1,6 +1,5 @@
-OUTPUT_FILES_PATH=C:/Aphonso/Dados_RFB
-EXTRACTED_FILES_PATH=C:/Aphonso/Dados_RFB/Extracted_files
-
+OUTPUT_FILES_PATH=C:\Aphonso_C\Dados_RFB\OUTPUT_FILES
+EXTRACTED_FILES_PATH=C:\Aphonso_C\Dados_RFB\EXTRACTED_FILES
 DB_HOST=localhost
 DB_PORT=5432
 DB_USER=postgres
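
The template switches from forward slashes to backslashes. As a small sketch of why either spelling can work once the value reaches `pathlib`, the snippet below compares the two forms with `PureWindowsPath`, which applies Windows path rules on any operating system. The directory is the one from the old template line, used here purely for illustration; how the `.env` loader itself treats backslashes is not covered.

```python
from pathlib import PureWindowsPath

# The same directory written with each separator style.
forward_style = PureWindowsPath("C:/Aphonso/Dados_RFB")
backslash_style = PureWindowsPath(r"C:\Aphonso\Dados_RFB")

# Both spellings parse to the same Windows path.
same_path = forward_style == backslash_style

# Joining with "/" avoids hand-written separators altogether.
extracted = backslash_style / "Extracted_files"

print(same_path)
print(extracted)
```

Building sub-paths with the `/` operator keeps the script independent of whichever separator the user typed in the `.env` file.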
28 changes: 22 additions & 6 deletions code/ETL_coletar_dados_e_gravar_BD.py
@@ -16,14 +16,31 @@
 import zipfile

 #%%
 # Read the environment configuration file # https://dev.to/jakewitcher/using-env-files-for-environment-variables-in-python-applications-55a1
 def getEnv(env):
     return os.getenv(env)

-load_dotenv()
+print('Specify the location of your ".env" configuration file. For example: C:\...\Receita_Federal_do_Brasil_-_Dados_Publicos_CNPJ\code')
+# C:\Aphonso_C\Git\Receita_Federal_do_Brasil_-_Dados_Publicos_CNPJ\code
+local_env = input()
+dotenv_path = Path(local_env+'\.env')
+load_dotenv(dotenv_path=dotenv_path)
+
 dados_rf = 'http://200.152.38.155/CNPJ/'
-output_files = Path(getEnv('OUTPUT_FILES_PATH'))
-extracted_files = Path(getEnv('EXTRACTED_FILES_PATH'))
+
+#%%
+# Read details from ".env" file:
+try:
+    output_files = getEnv('OUTPUT_FILES_PATH')
+    extracted_files = getEnv('EXTRACTED_FILES_PATH')
+    print('Directories set: \n' +
+          'output_files: ' + str(output_files) + '\n' +
+          'extracted_files: ' + str(extracted_files))
+except:
+    pass
+    print('Error setting the directories; check the ".env" file or the location you provided for your configuration file.')
+
+#%%
 raw_html = urllib.request.urlopen(dados_rf)
 raw_html = raw_html.read()
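
Two details in this hunk may be worth a sketch. First, the `.env` path is built by concatenating a string that contains a backslash; joining with `pathlib` avoids writing the separator by hand. Second, `os.getenv` returns `None` for an unset variable rather than raising, so a `try`/`except` around it will not detect a missing value. The helpers `dotenv_path_for` and `require_env` below are hypothetical names showing one possible alternative, not the author's code.

```python
import os
from pathlib import Path
from typing import Optional


def dotenv_path_for(folder: str) -> Path:
    """Build the path to ".env" inside `folder` without hand-written separators."""
    # strip() drops stray spaces or a newline from typed input.
    return Path(folder.strip()) / ".env"


def require_env(name: str, default: Optional[str] = None) -> str:
    """Return the variable's value, or raise if it is unset and no default is given."""
    value = os.getenv(name, default)
    if value is None:
        # os.getenv signals a missing variable with None, not an exception,
        # so the check has to be explicit.
        raise KeyError(f"{name} is not set; check the .env file or its location")
    return value
```

With helpers along these lines, the script could call `require_env('OUTPUT_FILES_PATH')` after loading the file and fail with a clear message when the location typed by the user does not contain a valid `.env`.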

@@ -51,8 +68,6 @@ def getEnv(env):
 ########################################################################################################################
 ## DOWNLOAD ############################################################################################################
 ########################################################################################################################
-
-# Download files
 # Create this bar_progress method which is invoked automatically from wget:
 def bar_progress(current, total, width=80):
     progress_message = "Downloading: %d%% [%d / %d] bytes - " % (current / total * 100, current, total)
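
The rest of `bar_progress` is not shown in this hunk. As a hedged sketch of what a complete callback might look like, assuming it receives `(current, total, width)` as the signature above suggests, the version below separates the message formatting (the hypothetical `format_progress`) from the printing and guards against a total that is zero or unknown, which would otherwise cause a division error.

```python
import sys


def format_progress(current: int, total: int) -> str:
    """Format a download progress line; tolerate a zero or negative total."""
    if total <= 0:
        # Size unknown: report only the bytes received so far.
        return "Downloading: %d bytes" % current
    percent = current * 100 // total
    return "Downloading: %d%% [%d / %d] bytes" % (percent, current, total)


def bar_progress(current, total, width=80):
    # Same parameters as the function in the diff; width is accepted but unused here.
    # "\r" returns to the start of the line so each update overwrites the last one.
    sys.stdout.write("\r" + format_progress(current, total))
    sys.stdout.flush()
```

Keeping the formatting in its own function makes the message easy to check without starting a download.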
@@ -83,14 +98,15 @@ def bar_progress(current, total, width=80):
 if not os.path.exists(extracted_files):
     os.mkdir(extracted_files)

+#%%
 # Extracting files:
 i_l = 0
 for l in Files:
     try:
         i_l += 1
         print('Extracting file:')
         print(str(i_l) + ' - ' + l)
-        with zipfile.ZipFile(output_files / l, 'r') as zip_ref:
+        with zipfile.ZipFile(output_files + '\\' + l, 'r') as zip_ref:
             zip_ref.extractall(extracted_files)
     except:
         pass
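
The loop above joins the directory and file name with a literal backslash and silently skips any archive that fails. A possible alternative, sketched under the assumption that the archive names and the two directories are plain strings, joins paths with `pathlib` and reports which archives could not be extracted. The function name `extract_all` and its parameters are hypothetical.

```python
import zipfile
from pathlib import Path
from typing import Iterable, List


def extract_all(archive_names: Iterable[str], source_dir: str, target_dir: str) -> List[str]:
    """Extract each named archive from source_dir into target_dir; return the names that failed."""
    source = Path(source_dir)
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    failed = []
    for index, name in enumerate(archive_names, start=1):
        print(f"Extracting file: {index} - {name}")
        try:
            with zipfile.ZipFile(source / name, "r") as zip_ref:
                zip_ref.extractall(target)
        except (zipfile.BadZipFile, OSError) as error:
            # Record the failure instead of silently passing over it.
            print(f"Could not extract {name}: {error}")
            failed.append(name)
    return failed
```

Returning the list of failures lets the caller decide whether to retry the download or stop before loading incomplete data into the database.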