added tests folder, modified dockerfile and readme
samparsky committed Dec 1, 2018
1 parent 6466a51 commit d24e5c6
Showing 14 changed files with 311 additions and 16 deletions.
14 changes: 14 additions & 0 deletions .travis.yml
@@ -0,0 +1,14 @@
language: python
dist: xenial
matrix:
  include:
    - python: "3.5"
      env: TOX_POSARGS="-e py35"
    - python: "3.6"
      env: TOX_POSARGS="-e py36"
    - python: "3.7"
      env: TOX_POSARGS="-e py37"
install:
  - travis_retry pip install tox
script:
  - tox $TOX_POSARGS
6 changes: 3 additions & 3 deletions Dockerfile
@@ -1,11 +1,11 @@
FROM python:3.6-alpine
MAINTAINER Eric Lim <elim0322@gmail.com>
ENV PROJECT_DIR=ethereum-etl
MAINTAINER Omidiora Samuel <samparsky@gmail.com>
ENV PROJECT_DIR=bitcoin-etl

RUN mkdir /$PROJECT_DIR
WORKDIR /$PROJECT_DIR
COPY . .
RUN apk add --no-cache gcc musl-dev #for C libraries: <limits.h> <stdio.h>
RUN pip install --upgrade pip && pip install -e /$PROJECT_DIR/

ENTRYPOINT ["python", "ethereumetl"]
ENTRYPOINT ["python", "bitcoinetl"]
2 changes: 1 addition & 1 deletion LICENSE
@@ -1,6 +1,6 @@
MIT License

Copyright (c) 2018 blockchain-etl
Copyright (c) 2018 Evgeny Medvedev, Omidiora Samuel evge.medvedev@gmail.com, samparsky@gmail.com

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
190 changes: 188 additions & 2 deletions README.md
@@ -1,2 +1,188 @@
# bitcoin-etl
ETL scripts for bitcoin. Contributions are welcome.
# Bitcoin ETL

[![Join the chat at https://gitter.im/ethereum-eth](https://badges.gitter.im/ethereum-etl.svg)](https://gitter.im/ethereum-etl/Lobby?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge)
[![Build Status](https://travis-ci.org/blockchain-etl/ethereum-etl.png)](https://travis-ci.org/blockchain-etl/ethereum-etl)
[Join Telegram Group](https://t.me/joinchat/GsMpbA3mv1OJ6YMp3T5ORQ)

Install Bitcoin ETL:

```bash
pip install bitcoin-etl
```

Export blocks and transactions ([Schema](#blockscsv), [Reference](#export_blocks_and_transactions)):

```bash
> bitcoinetl export_blocks_and_transactions --start-block 0 --end-block 500000 \
--rpc-pass 'test' --rpc-host 'localhost' --rpc-user 'test' \
--blocks-output blocks.csv --transactions-output transactions.csv
```

For the latest version, check out the repo and call
```bash
> pip install -e .
> python bitcoinetl.py
```

[LIMITATIONS](#limitations)

## Table of Contents

- [Schema](#schema)
- [blocks.csv](#blockscsv)
- [transactions.csv](#transactionscsv)
- [Exporting the Blockchain](#exporting-the-blockchain)
- [Export in 2 Hours](#export-in-2-hours)
- [Command Reference](#command-reference)
- [Bitcoin Cash Support](#bitcoin-cash-support)
- [Querying in Amazon Athena](#querying-in-amazon-athena)
- [Querying in Google BigQuery](#querying-in-google-bigquery)
- [Public Dataset](#public-dataset)


## Schema

### blocks.csv

Column | Type |
------------------------|--------------------|
number | bigint |
hash | hex_string |
parent_hash | hex_string |
nonce | hex_string |
sha3_uncles | hex_string |
logs_bloom | hex_string |
transactions_root | hex_string |
state_root | hex_string |
receipts_root | hex_string |
miner | address |
difficulty | numeric |
total_difficulty | numeric |
size | bigint |
extra_data | hex_string |
gas_limit | bigint |
gas_used | bigint |
timestamp | bigint |
transaction_count | bigint |

### transactions.csv

Column | Type |
--------------------|-------------|
hash | hex_string |
nonce | bigint |
block_hash | hex_string |
block_number | bigint |
transaction_index| bigint |
from_address | address |
to_address | address |
value | numeric |
gas | bigint |
gas_price | bigint |
input | hex_string |

You can find column descriptions in [https://github.com/medvedev1088/ethereum-etl-airflow](https://github.com/medvedev1088/ethereum-etl-airflow/tree/master/dags/resources/stages/raw/schemas)

Note: for the `address` type all hex characters are lower-cased.
`boolean` type can have 2 values: `True` or `False`.

## LIMITATIONS
[Coming Soon]

## Exporting the Blockchain

1. Install Python 3.5.3+ https://www.python.org/downloads/


### Export in 2 Hours
[Coming Soon]

### Running in Docker

1. Install Docker https://docs.docker.com/install/

1. Build a docker image
```bash
> docker build -t bitcoin-etl:latest .
> docker image ls
```

1. Run a container out of the image
```bash
> docker run -v $HOME/output:/bitcoin-etl/output bitcoin-etl:latest export_all -s 0 -e 5499999 -b 100000 -p https://mainnet.infura.io
> docker run -v $HOME/output:/bitcoin-etl/output bitcoin-etl:latest export_all -s 2018-01-01 -e 2018-01-01 -p https://mainnet.infura.io
```

### Command Reference

- [export_blocks_and_transactions](#export_blocks_and_transactions)


All the commands accept the `--help` parameter for help (note that `-h` is the short form of `--rpc-host`), e.g.:

```bash
> bitcoinetl export_blocks_and_transactions --help
Usage: bitcoinetl.py export_blocks_and_transactions [OPTIONS]

Export blocks and transactions.

Options:
  -s, --start-block INTEGER   Start block
  -e, --end-block INTEGER     End block  [required]
  -b, --batch-size INTEGER    The number of blocks to export at a time.
  -h, --rpc-host TEXT         The URI of the remote bitcoin node
  -u, --rpc-user TEXT         The RPC username of the bitcoin node  [required]
  -p, --rpc-pass TEXT         The RPC password of the bitcoin node  [required]
  -o, --rpc-port INTEGER      The RPC port of the bitcoin node
  -w, --max-workers INTEGER   The maximum number of workers.
  --blocks-output TEXT        The output file for blocks. If not provided
                              blocks will not be exported. Use "-" for stdout
  --transactions-output TEXT  The output file for transactions. If not
                              provided transactions will not be exported. Use
                              "-" for stdout
  --help                      Show this message and exit.
```

For the `--output` parameters the supported types are csv and json. The format type is inferred from the output file name.
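
As a minimal sketch of how a format could be inferred from the output file name, the snippet below maps the file extension to `csv` or `json`. The function name `infer_output_format` and the `SUPPORTED_FORMATS` mapping are hypothetical and are not taken from the bitcoin-etl source; handling of `-` (stdout) is deliberately left out.

```python
import os

# Hypothetical mapping from file extension to export format; these names are
# illustrative and do not come from the bitcoin-etl code base.
SUPPORTED_FORMATS = {'.csv': 'csv', '.json': 'json'}


def infer_output_format(filename):
    """Return 'csv' or 'json' based on the extension of `filename`."""
    _, extension = os.path.splitext(filename)
    try:
        return SUPPORTED_FORMATS[extension.lower()]
    except KeyError:
        raise ValueError('Unsupported output file type: ' + repr(filename))
```

With this sketch, `blocks.csv` would be written as CSV and `transactions.json` as JSON, while any other extension is rejected early instead of producing a file in an unexpected format.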

#### export_blocks_and_transactions

```bash
> bitcoinetl export_blocks_and_transactions --start-block 0 --end-block 500000 \
--rpc-pass 'test' --rpc-host 'localhost' --rpc-user 'test' \
--blocks-output blocks.csv --transactions-output transactions.csv
```

Omit the `--blocks-output` option to export only transactions, or the `--transactions-output` option to export only blocks.

You can tune `--batch-size` and `--max-workers` for performance.
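
To illustrate what `--batch-size` controls, here is a rough sketch of splitting a block range into batches. The helper `split_into_batches` is hypothetical, and treating `--end-block` as inclusive is an assumption rather than something this commit confirms.

```python
def split_into_batches(start_block, end_block, batch_size):
    """Yield (batch_start, batch_end) pairs covering start_block..end_block.

    Assumption: both ends of the range are inclusive.
    """
    if batch_size <= 0:
        raise ValueError('batch_size must be positive')
    for batch_start in range(start_block, end_block + 1, batch_size):
        # The last batch may be shorter than batch_size.
        batch_end = min(batch_start + batch_size - 1, end_block)
        yield batch_start, batch_end
```

Larger batches mean fewer round trips to the node but more data per request; each batch could then be handed to one of the `--max-workers` workers.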

### Running Tests

```bash
> pip install -e .[dev]
> export ETHEREUM_ETL_RUN_SLOW_TESTS=True
> pytest -vv
```

### Running Tox Tests

```bash
> pip install tox
> tox
```

### Bitcoin Cash Support
[Coming Soon]

## Querying in Amazon Athena
[Coming Soon]

### Tables for Parquet Files
[Coming Soon]

## Querying in Google BigQuery
[Coming Soon]

### Public Dataset
[Coming Soon]
13 changes: 9 additions & 4 deletions bitcoinetl/cli/export_blocks_and_transactions.py
@@ -1,6 +1,6 @@
# MIT License
#
# Copyright (c) 2018 Evgeny Medvedev, evge.medvedev@gmail.com
# Copyright (c) 2018 Evgeny Medvedev, Omidiora Samuel evge.medvedev@gmail.com, samparsky@gmail.com
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
@@ -36,19 +36,24 @@
@click.option('-e', '--end-block', required=True, type=int, help='End block')
@click.option('-b', '--batch-size', default=100, type=int, help='The number of blocks to export at a time.')
@click.option('-h', '--rpc-host', default='localhost', type=str, help='The URI of the remote bitcoin node')
@click.option('-u', '--rpc-user', default=None, type=str, help='The rpc username of the bitcion node')
@click.option('-p', '--rpc-pass', default=None, type=str, help='The RPC password of the bitcoin node')
@click.option('-u', '--rpc-user', required=True, default=None, type=str, help='The RPC username of the bitcoin node')
@click.option('-p', '--rpc-pass', required=True, default=None, type=str, help='The RPC password of the bitcoin node')
@click.option('-o', '--rpc-port', default=8332, type=int, help='The RPC port of the bitcoin node')
@click.option('-w', '--max-workers', default=5, type=int, help='The maximum number of workers.')
@click.option('--blocks-output', default=None, type=str, help='The output file for blocks. If not provided blocks will not be exported. Use "-" for stdout')
@click.option('--transactions-output', default=None, type=str, help='The output file for transactions. If not provided transactions will not be exported. Use "-" for stdout')

def export_blocks_and_transactions(start_block, end_block, batch_size, rpc_host, rpc_user, rpc_pass, rpc_port, max_workers, blocks_output, transactions_output):
    """Export blocks and transactions."""

    print("in export block and transactions")

    if blocks_output is None and transactions_output is None:
        raise ValueError('Either --blocks-output or --transactions-output options must be provided')

    if rpc_user is None or rpc_pass is None:
        raise ValueError('Both the --rpc-user and --rpc-pass must be provided')


    job = ExportBlocksJob(
        start_block=start_block,
        end_block=end_block,
3 changes: 1 addition & 2 deletions blockchainetl/executors/batch_work_executor.py
@@ -20,14 +20,13 @@
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.

from web3.utils.threads import Timeout as Web3Timeout
from requests.exceptions import Timeout as RequestsTimeout, HTTPError, TooManyRedirects
from blockchainetl.executors.bounded_executor import BoundedExecutor
from blockchainetl.executors.fail_safe_executor import FailSafeExecutor
from blockchainetl.progress_logger import ProgressLogger
from blockchainetl.utils import dynamic_batch_iterator

RETRY_EXCEPTIONS = (ConnectionError, HTTPError, RequestsTimeout, TooManyRedirects, Web3Timeout, OSError)
RETRY_EXCEPTIONS = (ConnectionError, HTTPError, RequestsTimeout, TooManyRedirects, OSError)


# Executes the given work in batches, reducing the batch size exponentially in case of errors.
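
The comment above describes shrinking the batch size when errors occur. The body of the executor is not shown in this commit, so the following is only a sketch of the general technique under stated assumptions: the function name `execute_in_batches`, its parameters, and the choice to halve the batch size are all hypothetical.

```python
def execute_in_batches(items, work_handler, initial_batch_size,
                       retry_exceptions=(ConnectionError, OSError)):
    """Process `items` in batches, halving the batch size after each failure.

    Assumption: halving is used as the exponential reduction step; the real
    executor may use a different factor or also grow the batch size again.
    """
    batch_size = initial_batch_size
    position = 0
    while position < len(items):
        batch = items[position:position + batch_size]
        try:
            work_handler(batch)
        except retry_exceptions:
            if batch_size == 1:
                raise  # cannot shrink further, so surface the error
            batch_size = max(1, batch_size // 2)
            continue  # retry from the same position with a smaller batch
        position += len(batch)
```

A failed batch is retried from the same position, so no item is skipped, and an error on a single-item batch is re-raised rather than retried forever.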
6 changes: 3 additions & 3 deletions blockchainetl/jobs/exporters/composite_item_exporter.py
@@ -21,9 +21,9 @@
# SOFTWARE.
import logging

from ethereumetl.atomic_counter import AtomicCounter
from ethereumetl.exporters import CsvItemExporter, JsonLinesItemExporter
from ethereumetl.file_utils import get_file_handle, close_silently
from blockchainetl.atomic_counter import AtomicCounter
from blockchainetl.exporters import CsvItemExporter, JsonLinesItemExporter
from blockchainetl.file_utils import get_file_handle, close_silently


class CompositeItemExporter:
2 changes: 1 addition & 1 deletion blockchainetl/progress_logger.py
@@ -23,7 +23,7 @@
import logging
from datetime import datetime

from ethereumetl.atomic_counter import AtomicCounter
from blockchainetl.atomic_counter import AtomicCounter


# Thread safe progress logger.
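
The progress logger relies on `AtomicCounter`, whose implementation is not part of this commit. As an illustration only, a thread-safe counter can be built around a lock as sketched below; the method name `increment` and its signature are assumptions, not the actual `blockchainetl.atomic_counter` API.

```python
import threading


class AtomicCounter:
    """A counter whose increment is safe to call from multiple threads."""

    def __init__(self, initial_value=0):
        self._value = initial_value
        self._lock = threading.Lock()

    def increment(self, amount=1):
        # Hold the lock so the read-modify-write cannot be interleaved.
        with self._lock:
            self._value += amount
            return self._value
```

Returning the new value from inside the lock lets a caller decide, for example, whether to log progress without a second, racy read of the counter.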
53 changes: 53 additions & 0 deletions setup.py
@@ -0,0 +1,53 @@
import os

from setuptools import setup, find_packages


def read(fname):
    # Read a file located next to setup.py and close the handle afterwards.
    with open(os.path.join(os.path.dirname(__file__), fname)) as f:
        return f.read()


long_description = read('README.md') if os.path.isfile("README.md") else ""

setup(
name='bitcoin-etl',
version='1.0.0',
author='Omidiora Samuel',
author_email='samparsky@gmail.com',
description='Tools for exporting Bitcoin blockchain data to CSV or JSON',
long_description=long_description,
long_description_content_type='text/markdown',
url='https://github.com/blockchain-etl/bitcoin-etl',
packages=find_packages(exclude=['schemas', 'tests']),
classifiers=[
'Development Status :: 5 - Production/Stable',
'Intended Audience :: Developers',
'License :: OSI Approved :: MIT License',
'Programming Language :: Python :: 3',
'Programming Language :: Python :: 3.5',
'Programming Language :: Python :: 3.6',
'Programming Language :: Python :: 3.7'
],
keywords='bitcoin',
python_requires='>=3.5.3,<3.8.0',
install_requires=[
'python-dateutil==2.7.0',
'click==6.7',
'python-bitcoinrpc==1.0'
],
extras_require={
'dev': [
'pytest~=3.2.0',
],
},
entry_points={
'console_scripts': [
'bitcoinetl=bitcoinetl.cli:cli',
],
},
project_urls={
'Bug Reports': 'https://github.com/blockchain-etl/bitcoin-etl/issues',
'Chat': 'https://gitter.im/bitcoin-etl/Lobby',
'Source': 'https://github.com/blockchain-etl/bitcoin-etl',
},
)
Empty file added tests/__init__.py
Empty file.
Empty file added tests/bitcoinetl/__init__.py
Empty file.
Empty file.
22 changes: 22 additions & 0 deletions tests/bitcoinetl/job/test_export_blocks_job.py
@@ -0,0 +1,22 @@
# MIT License
#
# Copyright (c) 2018 Omidiora Samuel, samparsky@gmail.com
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.
