## Stock Analysis Engine - Extracting Datasets

This notebook shows how to extract datasets once they have been collected. To download some data quickly, see the [distributed dataset collection tools](https://github.com/AlgoTraders/stock-analysis-engine#distributed-automation-with-docker).

### Start Services

To develop Jupyter notebooks, start the [notebook-integration](https://github.com/AlgoTraders/stock-analysis-engine/blob/master/compose/notebook-integration.yml) containers with docker-compose:
```
./compose/start.sh -j
```
Verify the containers are running:
```
docker ps -a
```

The sample data for this guide was collected using the automated dataset collection:
```
./compose/start.sh -c
```

### Verify Datasets are in Redis

By default, the datasets are [automatically archived in S3](http://localhost:9000/minio/pricing/) and cached in Redis. Until S3 extraction is supported, confirm the datasets are in Redis before continuing.

These commands assume you have the [redis client installed](https://redis.io/download):

```
redis-cli
127.0.0.1:6379> select 4
OK
127.0.0.1:6379[4]> keys NFLX_*
 1) "NFLX_2018-10-07_options"
 2) "NFLX_2018-10-07_tick"
 3) "NFLX_2018-10-07_pricing"
 4) "NFLX_2018-10-07_minute"
 5) "NFLX_2018-10-07"
 6) "NFLX_2018-10-07_company"
 7) "NFLX_2018-10-07_daily"
 8) "NFLX_2018-10-07_news"
 9) "NFLX_2018-10-07_news1"
10) "NFLX_2018-10-07_peers"
11) "NFLX_2018-10-07_dividends"
12) "NFLX_2018-10-07_stats"
```
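
The same check can be scripted from Python. The sketch below is a minimal example and not part of the Stock Analysis Engine: the helper names `expected_keys`, `missing_keys`, and `fetch_keys` are hypothetical, the suffix list is copied from the `redis-cli` output above, and `fetch_keys` assumes the `redis` Python package is installed and that Redis listens on `localhost:6379` with the datasets in database 4.

```python
# Dataset suffixes copied from the redis-cli listing above
DATASET_SUFFIXES = [
    'options', 'tick', 'pricing', 'minute', 'company', 'daily',
    'news', 'news1', 'peers', 'dividends', 'stats',
]


def expected_keys(ticker, date_str):
    """Return the base key plus one key per dataset suffix."""
    base = '{}_{}'.format(ticker, date_str)
    return [base] + ['{}_{}'.format(base, s) for s in DATASET_SUFFIXES]


def missing_keys(found_keys, ticker, date_str):
    """Return the expected keys that are absent from found_keys."""
    found = set(found_keys)
    return [k for k in expected_keys(ticker, date_str) if k not in found]


def fetch_keys(ticker, host='localhost', port=6379, db=4):
    """List the Redis keys for a ticker (assumes the redis package)."""
    import redis  # imported lazily so the helpers above work without it
    client = redis.Redis(
        host=host, port=port, db=db, decode_responses=True)
    return list(client.scan_iter(match='{}_*'.format(ticker)))
```

With Redis running, `missing_keys(fetch_keys('NFLX'), 'NFLX', '2018-10-07')` should return an empty list if every dataset in the listing above is cached.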

### Start the Logger

In [None]:
import datetime
from spylunking.log.setup_logging import build_colorized_logger
from analysis_engine.iex.utils import last_close

log = build_colorized_logger(name='intro-ds-1', handler_name='jupyter')

### Select a Ticker and Date

In [None]:
ticker = 'NFLX'
use_last_close = last_close()
last_close_str = use_last_close.strftime('%Y-%m-%d')
log.info('Using ticker={} and last close={}'.format(ticker, last_close_str))

### Extracting Entire Cache

In [None]:
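import analysis_engine.iex.extract_df_from_redis as extract_utils

# Assumption: `work` was undefined in the original cell; this minimal request
# uses only the fields this notebook references and may need more (e.g. Redis settings)
work = {'ticker': ticker, 'redis_key': '{}_{}'.format(ticker, last_close_str)}

# extract_req aliases work, so the redis_key update is visible to the call below
extract_req = work
extract_req['redis_key'] = '{}_minute'.format(work['redis_key'])

log.info('extracting - start - ticker={} from redis_key={}'.format(ticker, extract_req['redis_key']))
extract_status, minute_df = extract_utils.extract_minute_dataset(work_dict=work)
log.info('extracting - done - ticker={} from redis_key={}'.format(ticker, extract_req['redis_key']))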
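
Note that `extract_req = work` binds a second name to the same dictionary rather than copying it, which is why the extract call can be passed `work` and still see the `_minute` key. When several datasets are extracted from one base request, making a copy per dataset avoids appending suffixes to an already-suffixed key. A minimal sketch, where `build_extract_request` is a hypothetical helper (not part of `analysis_engine`) and the dataset names come from the Redis key listing earlier:

```python
import copy


def build_extract_request(base_request, dataset):
    """Return a copy of base_request whose redis_key targets one dataset."""
    req = copy.deepcopy(base_request)  # leave the caller's dictionary untouched
    req['redis_key'] = '{}_{}'.format(base_request['redis_key'], dataset)
    return req


base = {'ticker': 'NFLX', 'redis_key': 'NFLX_2018-10-07'}
minute_req = build_extract_request(base, 'minute')
daily_req = build_extract_request(base, 'daily')

print(minute_req['redis_key'])  # NFLX_2018-10-07_minute
print(daily_req['redis_key'])   # NFLX_2018-10-07_daily
print(base['redis_key'])        # NFLX_2018-10-07 (unchanged)
```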

In [None]:
import analysis_engine.charts as ae_charts
from analysis_engine.consts import SUCCESS
from analysis_engine.consts import IEX_MINUTE_DATE_FORMAT

today_str = datetime.datetime.now().strftime('%Y-%m-%d')
if extract_status == SUCCESS:
    log.info(
        'ticker={} creating chart date={}'.format(
            ticker,
            today_str))
    # Plot pricing with the volume overlay
    image_res = ae_charts.plot_overlay_pricing_and_volume(
        log_label='intro-nb-{}'.format(ticker),
        ticker=ticker,
        date_format=IEX_MINUTE_DATE_FORMAT,
        df=minute_df,
        show_plot=True)
else:
    log.error('ticker={} - did not extract a dataset from redis_key={}'.format(ticker, work['redis_key']))