# WikiRecentFinale

For the impatient (like me). 

This notebook is functionally equivalent to the imgAna4 notebook, but it has fewer cells since
the support code has been moved into script files. The other notebooks in this directory
walk through building up this application in detail. 



## Overview 
This application locates and crops photos containing people that are submitted to Wikipedia, then performs emotion classification on the cropped faces. Once composed, built and submitted, the application looks like this:

![stillPhase5.jpg](images/stillPhase4.jpg)

Events are flowing from left to right:
- Wikipedia sends out events via an SSE connection; they enter the stream on the far left.
- Many Wikipedia events are generated by 'robots' that check for 'undesirable' content; robot actions are dropped from this stream.
- Wikipedia sends fields that are inconsequential for this application; these are pared down in the next operation. 
- Wikipedia gets data in many languages spread over many servers; the next operation aggregates by language. The result of this processing is not rendered in this notebook; refer to the prior notebooks for rendering. 
- Some events refer to images; the next operator leverages the [beautifulsoup](https://www.crummy.com/software/BeautifulSoup/bs4/doc/) library to discover referenced images. If an image is not found, the event is dropped. 
- Some of the images submitted to Wikipedia may have faces; we use the [Facial Recognizer](https://developer.ibm.com/exchanges/models/all/max-facial-recognizer/) (neural net) to extract faces from images. If no potential face is found, the event is dropped. All potential faces are cropped from the images and sent to the next operator.
- The last operator in this stream, on the far right, uses the [Facial Emotion Classifier](https://developer.ibm.com/exchanges/models/all/max-facial-emotion-classifier/) (neural net) to classify the cropped images. 
- Images with potential faces are cropped, scored and rendered below. 
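
To make the first two stages concrete, here is a minimal plain-Python sketch of the bot filter and the field paring, run over a short list instead of a live stream. The sample events are made up for illustration; only the field names are taken from the application code further down.

```python
def is_human_edit(event):
    """Keep only 'edit' events that were not produced by a bot."""
    return event.get('type') == 'edit' and event.get('bot') is False


def pare(event):
    """Reduce an event to the handful of fields the later stages use."""
    length = event['length']
    return {
        'timestamp': event['timestamp'],
        'new_len': length['new'],
        'old_len': length['old'],
        'delta_len': length['new'] - length['old'],
        'wiki': event['wiki'],
        'user': event['user'],
        'title': event['title'],
    }


# Made-up sample events (not real Wikipedia data).
sample_events = [
    {'type': 'edit', 'bot': True, 'timestamp': 1, 'length': {'new': 10, 'old': 5},
     'wiki': 'enwiki', 'user': 'SomeBot', 'title': 'A'},
    {'type': 'edit', 'bot': False, 'timestamp': 2, 'length': {'new': 120, 'old': 100},
     'wiki': 'dewiki', 'user': 'alice', 'title': 'B'},
    {'type': 'log', 'bot': False, 'timestamp': 3},
]

# Filter first, then pare: events that are dropped never reach pare().
pared = [pare(e) for e in sample_events if is_human_edit(e)]
print(pared)
```

In the Streams application the same two steps are expressed as `filter` and `map` calls on the stream, so each event is processed as it arrives rather than in a batch.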

Execute the following cells to set up, compose, submit and render the live Wikipedia data.


<a name="setup"></a>
# Setup
### Add credentials for the IBM Streams service
This notebook has been tested using ICP4D and Cloud instances. On ICP4D it runs in the system's own notebook environment. For Cloud, I've submitted from a Jupyter notebook deployed from an Anaconda install.
#### ICP4D setup
With the cell below selected, click the "Connect to instance" button in the toolbar to insert the credentials for the service.

<a target="blank" href="https://developer.ibm.com/streamsdev/wp-content/uploads/sites/15/2019/02/connect_icp4d.gif">See an example</a>.

#### Cloud setup

To use a Streams instance running in the cloud, set up a [credential.py](setup_credential.ipynb) file.
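
As a rough guide, a `credential.py` might look like the sketch below. The layout is an assumption based on the VCAP services format that `streamsx` accepts; the linked setup notebook is the authority. The only name the cells below rely on is `vcap_conf`, and every value here is a placeholder.

```python
# credential.py (sketch): placeholder values only, never commit real credentials.
SERVICE_NAME = 'Streaming3Turbine'  # must match the service name used in this notebook

vcap_conf = {
    'streaming-analytics': [
        {
            'name': SERVICE_NAME,
            'credentials': {
                # Paste the service credentials JSON copied from the
                # IBM Cloud console here (assumed layout, left empty on purpose).
            },
        }
    ]
}
```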


##  Show me
After doing the 'Setup' above you can use Menu 'Cell' | 'Run All' to compose, build, submit and start the rendering of the live Wikidata. Go to [Show me now](#showMeNow) for the rendering - be patient, a significant amount of processing/communicating is being done.


In [None]:
# Install components
!pip install sseclient
!pip install --user --upgrade streamsx

In [3]:
# Setup 

from IPython.core.debugger import set_trace
from IPython.display import display, clear_output

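import datetime  # used for the time-based window below
import os        # used to locate the datasets directory on ICP4D
import sys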

import matplotlib.pyplot as plt
import ipywidgets as widgets
from ipywidgets import Button, HBox, VBox, Layout
from matplotlib.pyplot import imshow

from streamsx.topology.topology import *
import streamsx.rest as rest
from streamsx.topology import context

if '../scripts' not in sys.path:
    sys.path.insert(0, '../scripts')
%matplotlib inline

In [4]:
# import support code that was embedded in the earlier notebooks
from streams_render import list_jobs
from streams_render import display_views

## Connect to the server : ICP4D or Cloud instance.

Attempt to import `icpd_core`. If the import fails, `cfg` is left as `None` and we know we're using Cloud.

In [5]:
def get_instance(cfg=None):
    """Set up access to your Streams instance.

    .. note:: The notebook works within both Cloud and ICP4D.
              Refer to the 'Setup' cells above.
    Returns:
        instance : Access to the Streams instance, used for submitting and rendering views.
        cfg : The ICP4D configuration, or None when running against Cloud.
    """
    try:
        from icpd_core import icpd_util
        import urllib3
        cfg[context.ConfigParams.SSL_VERIFY] = False
        instance = rest.Instance.of_service(cfg)
        print("Within ICP4D")
        urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
    except ImportError:
        cfg = None
        print("Outside ICP4D")
        import credential  
        sc = rest.StreamingAnalyticsConnection(service_name='Streaming3Turbine', 
                                               vcap_services=credential.vcap_conf)
        instance = sc.get_instances()[0]
    return instance,cfg

try:
    cfg
except NameError:
    cfg = None
instance,cfg = get_instance(cfg)

Outside ICP4D


## List jobs and cancel....

This page will submit a job named 'WikiRecentFinale'. If a previous version is running, you'll want to cancel it before submitting a new one. If the current version is already running, there is no need to cancel and resubmit; you can just proceed to the [Viewing data section](#viewingData).
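
The `list_jobs` helper lives in the support scripts and is not shown here. As a hedged illustration of the idea only, the sketch below cancels jobs by name using `get_jobs()` and `cancel()`, which the `streamsx.rest` instance and job objects provide. The function name `cancel_jobs_named` and the stand-in classes are hypothetical; they exist so the sketch runs without a Streams instance.

```python
def cancel_jobs_named(instance, name_fragment):
    """Cancel every job whose name contains name_fragment; return the cancelled names."""
    cancelled = []
    for job in instance.get_jobs():
        if name_fragment in job.name:
            job.cancel()
            cancelled.append(job.name)
    return cancelled


# Dry run with stand-in objects (hypothetical, for illustration only).
class FakeJob:
    def __init__(self, name):
        self.name = name
        self.cancelled = False

    def cancel(self):
        self.cancelled = True


class FakeInstance:
    def __init__(self, jobs):
        self._jobs = jobs

    def get_jobs(self):
        return self._jobs


jobs = [FakeJob("ipythoninput::WikiRecentFinale_2"), FakeJob("someOtherJob_7")]
cancelled = cancel_jobs_named(FakeInstance(jobs), "WikiRecentFinale")
print(cancelled)
```

Matching on a fragment rather than the full name is deliberate: submitted job names carry a generated prefix and suffix around the topology name.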


In [6]:
list_jobs(instance, cancel=True)

<a id='composeSubmit'></a>
## Compose, build and submit the Streams application.


### Compose 

In [7]:
## Import the operations that will be composed.
from streams_operations import get_events
from streams_operations import sum_aggregation
from streams_operations import tally_fields
from streams_operations import wiki_lang
from streams_operations import soup_image_extract
from streams_operations import facial_image
from streams_operations import emotion_image

## Compose the Flow
def WikiRecentFinale(jobName=None, wiki_lang_fname=None):
    """
    Compose topology. 
    -- wiki_lang_fname : csv file mapping database name to language

    """
    topo = Topology(name=jobName)
    ### make sure sseclient and bs4 are installed in the Streams environment.
    topo.add_pip_package('sseclient')
    topo.add_pip_package('bs4')

    ## wiki events
    wiki_events = topo.source(get_events, name="wikiEvents")
    ## select events generated by humans
    human_filter = wiki_events.filter(lambda x: x['type']=='edit' and x['bot'] is False, name='humanFilter')
    # pare down the humans set of columns
    pared_human= human_filter.map(lambda x : {'timestamp':x['timestamp'],
                                              'new_len':x['length']['new'],
                                              'old_len':x['length']['old'], 
                                              'delta_len':x['length']['new'] - x['length']['old'],
                                              'wiki':x['wiki'],'user':x['user'],
                                              'title':x['title']}, 
                        name="paredHuman")
    pared_human.view(buffer_time=1.0, sample_size=200, name="paredEdits", description="Edits done by humans")

    ## Define window(count)& aggregate
    sum_win = pared_human.last(100).trigger(20)
    sum_aggregate = sum_win.aggregate(sum_aggregation(sum_map={'new_len':'newSum','old_len':'oldSum','delta_len':'deltaSum' }), name="sumAggregate")
    sum_aggregate.view(buffer_time=1.0, sample_size=200, name="aggEdits", description="Aggregations of human edits")

    ## Define window(count) & tally edits
    tally_win = pared_human.last(100).trigger(10)
    tally_top = tally_win.aggregate(tally_fields(fields=['user', 'title'], top_count=10), name="talliesTop")
    tally_top.view(buffer_time=1.0, sample_size=200, name="talliesCount", description="Top count tallies: user,titles")

    ## augment filtered/pared edits with language
    if cfg is None:        
        lang_augment = pared_human.map(wiki_lang(fname='../datasets/wikimap.csv'), name="langAugment")
    else:
        lang_augment = pared_human.map(wiki_lang(fname=os.environ['DSX_PROJECT_DIR']+'/datasets/wikimap.csv'), name="langAugment")

    lang_augment.view(buffer_time=1.0, sample_size=200, name="langAugment", description="Language derived from wiki")

    ## Define window(time) & tally language
    time_lang_win = lang_augment.last(datetime.timedelta(minutes=2)).trigger(5)
    time_lang = time_lang_win.aggregate(tally_fields(fields=['language'], top_count=10), name="timeLang")
    time_lang.view(buffer_time=1.0, sample_size=200, name="talliesTime", description="Top timed tallies: language")

    ## attempt to extract image using beautifulsoup add img_desc[{}] field
    soup_image = lang_augment.map(soup_image_extract(field_name="title", url_base="https://www.wikidata.org/wiki/"),name="imgSoup")
    soup_active = soup_image.filter(lambda x: x['img_desc'] is not None and len(x['img_desc']) > 0, name="soupActive")
    soup_active.view(buffer_time=1.0, sample_size=200, name="soupActive", description="Image extracted via Bsoup")
    
    ## facial extraction  - 
    facial_images = soup_active.map(facial_image(field_name='img_desc'),name="facialImgs")
    face_image = facial_images.flat_map(name="faceImg")
    face_image.view(buffer_time=10.0, sample_size=20, name="faceImg", description="Face image analysis/extraction")

    ## emotion analysis on image - 
    face_emotion = face_image.map(emotion_image(), name="faceEmotion")
    face_emotion.view(buffer_time=10.0, sample_size=20, name="faceEmotion", description="Facial emotion analysis")
       
    return ({"topo":topo,"view":{ }})

### Build & Submit : ICP4D or Cloud

In [8]:
resp = WikiRecentFinale(jobName="WikiRecentFinale")
if cfg is not None:
    # Disable SSL certificate verification if necessary
    cfg[context.ConfigParams.SSL_VERIFY] = False
    submission_result = context.submit("DISTRIBUTED",resp['topo'], config=cfg)

if cfg is None:
    import credential
    cloud = {
        context.ConfigParams.VCAP_SERVICES: credential.vcap_conf,
        context.ConfigParams.SERVICE_NAME: "Streaming3Turbine",
        context.ContextTypes.STREAMING_ANALYTICS_SERVICE:"STREAMING_ANALYTIC",
        context.ConfigParams.FORCE_REMOTE_BUILD: True,
    }
    submission_result = context.submit("STREAMING_ANALYTICS_SERVICE",resp['topo'],config=cloud)

# The submission_result object contains information about the running application, or job
if submission_result.job:
    print("JobId: ", submission_result['id'] , "Name: ", submission_result['name'])


JobId:  2 Name:  ipythoninput779cc0244094f::WikiRecentFinale_2


<a id='viewingData'></a>
## Viewing data 

The running application has a number of views; a view enables observation of the data moving through the stream. The following cell will fetch a view's queue and display its data when the view is selected. 

| view name | description of data in the view | bot |
|---------|-------------|--------------|
|aggEdits  | summarised fields | False |
|langAugment | mapped augmented fields | False |
|paredEdits | selected fields | False |
|talliesCount | last 100 messages tallied | False | 
|talliesTime | 2 minute windowed | False |
|soupActive | extracted image links| False |
|faceImg | analyse image for faces and extract | False |
|faceEmotion | emotional analysis of facial images | False | 
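
Several of these views sit behind count-based windows such as `pared_human.last(100).trigger(20)`: keep the most recent 100 tuples and run the aggregation every 20 arrivals. The plain-Python sketch below approximates that behaviour with a `deque`, using small numbers so the result is easy to follow. It is an illustration of the idea, not how Streams implements windows, and the function name is made up.

```python
from collections import deque


def sliding_count_window(tuples, size, trigger):
    """Yield a snapshot of the last `size` tuples after every `trigger` arrivals."""
    window = deque(maxlen=size)  # older tuples fall off automatically
    for count, tup in enumerate(tuples, start=1):
        window.append(tup)
        if count % trigger == 0:
            yield list(window)


# Ten made-up 'delta_len' values; window of 4, aggregated every 2 arrivals.
deltas = list(range(1, 11))
sums = [sum(w) for w in sliding_count_window(deltas, size=4, trigger=2)]
print(sums)  # [3, 10, 18, 26, 34]
```

Note that the first trigger fires before the window is full, so early aggregates cover fewer tuples than later ones.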



You'll want to stop fetching the view data when done.

In [None]:
# View the data that is flowing.....
display_views(instance, "WikiRecentFinale")

![phase4_1.gif](attachment:phase5.gif)

# Access Views / Render Views UI

The cropped images are fetched from the server. Streams passes each image through the 
IBM Facial Recognizer, which extracts the coordinates of potential faces. A new tuple is generated
for each potential face, consisting of:
- the input tuple, which includes the URL of the image being analyzed
- a face dict() consisting of ...
  - probability : probability that it's a face
  - image_percentage : % of the original image that the found face occupies
  - bytes_PIL_b64 : binary image version of the found face
  - detection_box : region within the original image where the face was detected
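
As a rough sketch of working with one of these face records, the snippet below decodes the image field back to raw bytes. It assumes, from the field name alone, that `bytes_PIL_b64` holds base64 text. The sample record is made up, and the `detection_box` layout shown is a placeholder rather than the model's documented format.

```python
import base64


def face_image_bytes(face):
    """Return the raw bytes of a cropped face, assuming base64-encoded text."""
    return base64.b64decode(face['bytes_PIL_b64'])


# Made-up record: the 8-byte PNG signature stands in for a real cropped image.
fake_png = b'\x89PNG\r\n\x1a\n'
face = {
    'probability': 0.97,
    'image_percentage': 12.5,
    'bytes_PIL_b64': base64.b64encode(fake_png).decode('ascii'),
    'detection_box': [0.1, 0.2, 0.6, 0.7],  # placeholder layout
}

raw = face_image_bytes(face)
print(len(raw), "bytes decoded")
```

With a real record, the decoded bytes could then be handed to an image library for display.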



<a id='showMeNow'></a>
### Show me now

In [10]:
## Setup the 'Dashboard' - Display the images sent to Wikipedia, result of facial extraction followed by emotion (pie chart) analysis
##                         Next cell populates the 'Dashboard'.....
from streams_render import render_emotions
crops_bar = list()  # setup in layout section.
bar_cells = 7
full_widget = widgets.Output(layout={'border': '1px solid red','width':'100%','height':'300pt'})
vbox_bar = list()
for idx in range(bar_cells):
    vbox = {
        'probability' : widgets.Label(value="prop:{}".format(idx), layout={'border': '1px solid blue','width':'100pt'}),
        'image_percent' : widgets.Label(value="image %", layout={'border': '1px solid blue','width':'100pt'}),
        'image' : widgets.Output(layout={'border': '1px solid blue','width':'100pt','height':'120pt'}),
        'pie' : widgets.Output(layout={'border': '1px solid black','width':'100pt','height':'100pt'})
    }
    crops_bar.append(vbox)
    vbox_bar.append(widgets.VBox([vbox['probability'], vbox['image_percent'], vbox['image'], vbox['pie']]))
    
display(widgets.VBox([full_widget,widgets.HBox(vbox_bar)]))

In [None]:
# Populate the dashboard - If you want this to run longer set cnt higher
cnt = 40
_view = instance.get_views(name="faceEmotion")[0]
_view.start_data_fetch()
try:
    for idx in range(cnt):
        emotion_tuples = _view.fetch_tuples(max_tuples=10, timeout=20)
        print("Count of tuples", len(emotion_tuples))
        render_emotions(emotion_tuples, full_widget, crops_bar, bar_cells)
finally:
    # always stop fetching, even if the cell is interrupted
    _view.stop_data_fetch()

## Cancel jobs when you're done

In [12]:
list_jobs(instance, cancel=True)

# Notebook wrap up

In this notebook we composed and deployed a Streams application that processes live Wikipedia events on a server. It 
extended the previous application by applying deep learning image processing models to the 
extracted images, to derive insight into the images submitted.

## Extensions
The processing on the stream can continue on, gaining more insights into the
events occurring on the Wikipedia servers.

Possible explorations
- Continue the face analysis stream by applying the [Facial Age Estimator](https://developer.ibm.com/exchanges/models/all/max-facial-age-estimator/)
- For smaller cropped faces, apply the [Image Resolution Enhancer](https://developer.ibm.com/exchanges/models/all/max-image-resolution-enhancer/) before proceeding to the emotion analysis.
- Use the [Image Caption Generator](https://developer.ibm.com/exchanges/models/all/max-image-caption-generator/) to generate captions for the images.
- Use the result of the 'Image Caption Generator' to verify captions provided by the submitter, and check the translation of captions submitted from outside English-speaking countries. 
- Check the sentiment of submitted updates.
- On images where no faces appear, use the Object Detector (https://developer.ibm.com/exchanges/models/all/m...).
- Use [beautifulsoup](https://www.crummy.com/software/BeautifulSoup/bs4/doc/) to extract the text being submitted and apply the [Sentiment Classifier](https://developer.ibm.com/exchanges/models/all/max-text-sentiment-classifier/).

