## Extracting NFLX Datasets

This notebook shows how to extract datasets after they have been collected. If you have not collected any data yet, see the [distributed dataset collection tools](https://github.com/AlgoTraders/stock-analysis-engine#distributed-automation-with-docker) to download some quickly.

### Start Services

To develop Jupyter notebooks, start the [notebook-integration](https://github.com/AlgoTraders/stock-analysis-engine/blob/master/compose/notebook-integration.yml) containers with docker-compose:
```
./compose/start.sh -j
```
Verify the containers are running:
```
docker ps -a
```

The sample data for this guide was collected using the automated dataset collection:
```
./compose/start.sh -c
```

### Verify Datasets are in Redis

By default, the datasets are [automatically archived in S3](http://localhost:9000/minio/pricing/) and cached in Redis. Until S3 extraction is supported, confirm the datasets are in Redis before continuing.

These commands assume you have the [redis client installed](https://redis.io/download):

```
redis-cli
127.0.0.1:6379> select 4
OK
127.0.0.1:6379[4]> keys NFLX_*
 1) "NFLX_2018-10-05_tick"
 2) "NFLX_2018-10-05_news"
 3) "NFLX_2018-10-05_daily"
 4) "NFLX_2018-10-05_stats"
 5) "NFLX_2018-10-05"
 6) "NFLX_2018-10-05_minute"
 7) "NFLX_2018-10-05_options"
 8) "NFLX_2018-10-05_company"
 9) "NFLX_2018-10-05_dividends"
10) "NFLX_2018-10-05_pricing"
11) "NFLX_2018-10-05_peers"
12) "NFLX_2018-10-05_news1"
127.0.0.1:6379[4]> 
```
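The key names above appear to follow a `TICKER_DATE_dataset` pattern, with one bare `TICKER_DATE` key. As a rough sketch under that assumption, the listing can be parsed to check that the datasets you plan to extract are cached. The helpers `split_key` and `datasets_by_date` and the `wanted` set are hypothetical names for illustration, not part of the analysis engine:

```python
# Key names copied from the redis-cli listing above.
found_keys = [
    "NFLX_2018-10-05_tick",
    "NFLX_2018-10-05_news",
    "NFLX_2018-10-05_daily",
    "NFLX_2018-10-05_stats",
    "NFLX_2018-10-05",
    "NFLX_2018-10-05_minute",
    "NFLX_2018-10-05_options",
    "NFLX_2018-10-05_company",
    "NFLX_2018-10-05_dividends",
    "NFLX_2018-10-05_pricing",
    "NFLX_2018-10-05_peers",
    "NFLX_2018-10-05_news1",
]


def split_key(key):
    """Split 'TICKER_DATE[_dataset]' into (ticker, date, dataset or None)."""
    parts = key.split("_", 2)
    ticker, date = parts[0], parts[1]
    dataset = parts[2] if len(parts) == 3 else None
    return ticker, date, dataset


def datasets_by_date(keys):
    """Group dataset names by (ticker, date); the bare base key is skipped."""
    grouped = {}
    for key in keys:
        ticker, date, dataset = split_key(key)
        if dataset is not None:
            grouped.setdefault((ticker, date), set()).add(dataset)
    return grouped


grouped = datasets_by_date(found_keys)
available = grouped.get(("NFLX", "2018-10-05"), set())

# Hypothetical selection: the datasets charted later in this notebook.
wanted = {"minute", "tick", "daily"}
missing = wanted - available

print("available:", sorted(available))
print("missing:", sorted(missing))
```

An empty `missing` set means every dataset in `wanted` has a cached key for that ticker and date.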

### Create Imports and Logger

In [None]:
import datetime
import analysis_engine.charts as ae_charts
from IPython.display import display
from IPython.display import HTML
from analysis_engine.api_requests import get_ds_dict
from analysis_engine.consts import SUCCESS
from analysis_engine.consts import ppj
from analysis_engine.consts import IEX_MINUTE_DATE_FORMAT
from analysis_engine.consts import IEX_DAILY_DATE_FORMAT
from analysis_engine.consts import IEX_TICK_DATE_FORMAT
from analysis_engine.utils import utc_now_str
from analysis_engine.utils import get_last_close_str
from spylunking.log.setup_logging import build_colorized_logger

log_label = 'intro-ds-1'
log = build_colorized_logger(name=log_label, handler_name='jupyter')

### Select a Ticker and Date

In [None]:
ticker = 'NFLX'
today_str = utc_now_str()
last_close_str = get_last_close_str()

log.info('Using ticker={} with last close={}'.format(ticker, last_close_str))

### Load Cache Keys

In [None]:
cache_dict = get_ds_dict(ticker=ticker, label=log_label)
log.info('Cache keys for ticker={} and last close={} cache_dict={}'.format(ticker, last_close_str, ppj(cache_dict)))
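
To make the shape of `cache_dict` more concrete, here is a minimal sketch. It assumes each entry maps a dataset name to a Redis key of the form `TICKER_DATE_dataset`, as the key listing in the Redis section suggests. The function `build_cache_keys` and the hard-coded date are hypothetical, and the real `get_ds_dict` may return additional fields:

```python
import json

# Dataset names looked up on cache_dict elsewhere in this notebook.
DATASETS = [
    "minute", "tick", "daily", "stats", "peers", "news1",
    "financials", "earnings", "dividends", "company",
    "options", "pricing", "news",
]


def build_cache_keys(ticker, date_str, datasets=DATASETS):
    """Hypothetical stand-in: map each dataset name to 'TICKER_DATE_dataset'."""
    base = "{}_{}".format(ticker, date_str)
    return {name: "{}_{}".format(base, name) for name in datasets}


cache_keys = build_cache_keys("NFLX", "2018-10-05")

# json.dumps is used here as a plain stand-in for a pretty-printer.
print(json.dumps(cache_keys, indent=4, sort_keys=True))
```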

### Extracting Minute Cache

In [None]:
from analysis_engine.iex.extract_df_from_redis import extract_minute_dataset

log.info('extracting - start - ticker={}'.format(ticker))
extract_status, minute_df = extract_minute_dataset(cache_dict)
log.info('extracting - done - ticker={}'.format(ticker))

In [None]:
if extract_status == SUCCESS:
    log.info(
        'ticker={} creating chart date={}'.format(
            ticker,
            today_str))
    # Plot pricing with the volume overlay
    image_res = ae_charts.plot_overlay_pricing_and_volume(
        log_label='intro-nb-{}'.format(ticker),
        ticker=ticker,
        date_format=IEX_MINUTE_DATE_FORMAT,
        df=minute_df,
        show_plot=True)
else:
    log.error('ticker={} - did not extract a dataset from redis_key={}'.format(ticker, cache_dict['minute']))

### Extracting Tick Cache

In [None]:
from analysis_engine.iex.extract_df_from_redis import extract_tick_dataset

log.info('extracting - start - ticker={}'.format(ticker))
extract_status, tick_df = extract_tick_dataset(cache_dict)
log.info('extracting - done - ticker={}'.format(ticker))

In [None]:
if extract_status == SUCCESS:
    log.info(
        'ticker={} creating chart date={}'.format(
            ticker,
            today_str))
    # Plot pricing with the volume overlay
    image_res = ae_charts.plot_overlay_pricing_and_volume(
        log_label='intro-nb-{}'.format(ticker),
        ticker=ticker,
        date_format=IEX_TICK_DATE_FORMAT,
        df=tick_df,
        show_plot=True)
else:
    log.error('ticker={} - did not extract a dataset from redis_key={}'.format(ticker, cache_dict['tick']))

### Extracting Daily Cache

In [None]:
from analysis_engine.iex.extract_df_from_redis import extract_daily_dataset

log.info('extracting - start - ticker={}'.format(ticker))
extract_status, daily_df = extract_daily_dataset(cache_dict)
log.info('extracting - done - ticker={}'.format(ticker))

In [None]:
if extract_status == SUCCESS:
    log.info(
        'ticker={} creating chart date={}'.format(
            ticker,
            today_str))
    # Plot pricing with the volume overlay
    image_res = ae_charts.plot_overlay_pricing_and_volume(
        log_label='intro-nb-{}'.format(ticker),
        ticker=ticker,
        date_format=IEX_DAILY_DATE_FORMAT,
        df=daily_df,
        show_plot=True)
else:
    log.error('ticker={} - did not extract a dataset from redis_key={}'.format(ticker, cache_dict['daily']))

### Extracting Stats Cache

In [None]:
from analysis_engine.iex.extract_df_from_redis import extract_stats_dataset

log.info('extracting - start - ticker={}'.format(ticker))
extract_status, stats_df = extract_stats_dataset(cache_dict)
log.info('extracting - done - ticker={}'.format(ticker))

In [None]:
if extract_status == SUCCESS:
    display(stats_df)
else:
    log.error('ticker={} - did not extract a dataset from redis_key={}'.format(ticker, cache_dict['stats']))

### Extracting Peers Cache

In [None]:
from analysis_engine.iex.extract_df_from_redis import extract_peers_dataset

log.info('extracting - start - ticker={}'.format(ticker))
extract_status, peers_df = extract_peers_dataset(cache_dict)
log.info('extracting - done - ticker={}'.format(ticker))

In [None]:
if extract_status == SUCCESS:
    display(peers_df)
else:
    log.error('ticker={} - did not extract a dataset from redis_key={}'.format(ticker, cache_dict['peers']))

### Extracting News from IEX Cache

In [None]:
from analysis_engine.iex.extract_df_from_redis import extract_news_dataset

log.info('extracting - start - ticker={}'.format(ticker))
extract_status, news_iex_df = extract_news_dataset(cache_dict)
log.info('extracting - done - ticker={}'.format(ticker))

In [None]:
if extract_status == SUCCESS:
    display(news_iex_df)
else:
    log.error('ticker={} - did not extract a dataset from redis_key={}'.format(ticker, cache_dict['news1']))

### Extracting Financials Cache

In [None]:
from analysis_engine.iex.extract_df_from_redis import extract_financials_dataset

log.info('extracting - start - ticker={}'.format(ticker))
extract_status, financials_df = extract_financials_dataset(cache_dict)
log.info('extracting - done - ticker={}'.format(ticker))

In [None]:
if extract_status == SUCCESS:
    display(financials_df)
else:
    log.error('ticker={} - did not extract a dataset from redis_key={}'.format(ticker, cache_dict['financials']))

### Extracting Earnings Cache

In [None]:
from analysis_engine.iex.extract_df_from_redis import extract_earnings_dataset

log.info('extracting - start - ticker={}'.format(ticker))
extract_status, earnings_df = extract_earnings_dataset(cache_dict)
log.info('extracting - done - ticker={}'.format(ticker))

In [None]:
if extract_status == SUCCESS:
    display(earnings_df)
else:
    log.error('ticker={} - did not extract a dataset from redis_key={}'.format(ticker, cache_dict['earnings']))

### Extracting Dividends Cache

In [None]:
from analysis_engine.iex.extract_df_from_redis import extract_dividends_dataset

log.info('extracting - start - ticker={}'.format(ticker))
extract_status, dividends_df = extract_dividends_dataset(cache_dict)
log.info('extracting - done - ticker={}'.format(ticker))

In [None]:
if extract_status == SUCCESS:
    display(dividends_df)
else:
    log.error('ticker={} - did not extract a dataset from redis_key={}'.format(ticker, cache_dict['dividends']))

### Extracting Company Cache

In [None]:
from analysis_engine.iex.extract_df_from_redis import extract_company_dataset

log.info('extracting - start - ticker={}'.format(ticker))
extract_status, company_df = extract_company_dataset(cache_dict)
log.info('extracting - done - ticker={}'.format(ticker))

In [None]:
if extract_status == SUCCESS:
    display(company_df)
else:
    log.error('ticker={} - did not extract a dataset from redis_key={}'.format(ticker, cache_dict['company']))

### Extracting Option Calls Cache

In [None]:
from analysis_engine.yahoo.extract_df_from_redis import extract_option_calls_dataset

log.info('extracting - start - ticker={}'.format(ticker))
extract_status, option_calls_df = extract_option_calls_dataset(cache_dict)
log.info('extracting - done - ticker={}'.format(ticker))

In [None]:
if extract_status == SUCCESS:
    display(option_calls_df)
else:
    log.error('ticker={} - did not extract a dataset from redis_key={}'.format(ticker, cache_dict['options']))

### Extracting Option Puts Cache

In [None]:
from analysis_engine.yahoo.extract_df_from_redis import extract_option_puts_dataset

log.info('extracting - start - ticker={}'.format(ticker))
extract_status, option_puts_df = extract_option_puts_dataset(cache_dict)
log.info('extracting - done - ticker={}'.format(ticker))

In [None]:
if extract_status == SUCCESS:
    display(option_puts_df)
else:
    log.error('ticker={} - did not extract a dataset from redis_key={}'.format(ticker, cache_dict['options']))

### Extracting Pricing from Yahoo Cache

In [None]:
from analysis_engine.yahoo.extract_df_from_redis import extract_pricing_dataset

log.info('extracting - start - ticker={}'.format(ticker))
extract_status, pricing_df = extract_pricing_dataset(cache_dict)
log.info('extracting - done - ticker={}'.format(ticker))

In [None]:
if extract_status == SUCCESS:
    display(pricing_df)
else:
    log.error('ticker={} - did not extract a dataset from redis_key={}'.format(ticker, cache_dict['pricing']))

### Extracting News from Yahoo Cache

In [None]:
from analysis_engine.yahoo.extract_df_from_redis import extract_yahoo_news_dataset

log.info('extracting - start - ticker={}'.format(ticker))
extract_status, news_yahoo_df = extract_yahoo_news_dataset(cache_dict)
log.info('extracting - done - ticker={}'.format(ticker))

In [None]:
if extract_status == SUCCESS:
    display(news_yahoo_df)
else:
    log.error('ticker={} - did not extract a dataset from redis_key={}'.format(ticker, cache_dict['news']))
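
Every section above repeats the same steps: call an extractor, compare the status to `SUCCESS`, then display the DataFrame or log the Redis key that failed. One way to cut the repetition is a small wrapper, sketched below. The helper `extract_or_log`, the fake extractors, and the numeric status values are assumptions made so the example runs without Redis; in the notebook, `SUCCESS` comes from `analysis_engine.consts`:

```python
SUCCESS = 0  # stand-in value for this sketch only
FAILED = 1   # hypothetical failure status used by the fake extractor below


def extract_or_log(extract_fn, cache_dict, dataset, ticker, log_error=print):
    """Run one extractor; return its data on success, else log and return None."""
    status, df = extract_fn(cache_dict)
    if status == SUCCESS:
        return df
    log_error(
        'ticker={} - did not extract a dataset from redis_key={}'.format(
            ticker, cache_dict.get(dataset)))
    return None


# Fake extractors that mimic the (status, dataframe) tuple returned by the
# extract_*_dataset calls used in this notebook.
def fake_extract_ok(cache_dict):
    return SUCCESS, ['row-1', 'row-2']  # placeholder rows, not real data


def fake_extract_failed(cache_dict):
    return FAILED, None


errors = []
demo_cache = {
    'minute': 'NFLX_2018-10-05_minute',
    'stats': 'NFLX_2018-10-05_stats',
}

minute_rows = extract_or_log(
    fake_extract_ok, demo_cache, 'minute', 'NFLX', log_error=errors.append)
stats_rows = extract_or_log(
    fake_extract_failed, demo_cache, 'stats', 'NFLX', log_error=errors.append)

print(minute_rows)
print(errors)
```

In the notebook itself you would pass a real extractor such as `extract_minute_dataset` along with `log.error`, and call `display` on the result when it is not `None`.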