Build your own Cognitive Search Library in Azure for all your valuable treasures.

The Library Of Corom

Create your own cognitive enriched searchable libraries in Azure for all your treasures.

You can watch the demo in action in Joseph Sirosh's Build 2017 session (starting at 5:00).

Overview

This project can be used to easily create a data enrichment pipeline that allows you to index documents and images automatically from your phone, email, scanner, etc. and search them using a Web UI. The search index is enriched using Microsoft cognitive capabilities, which currently include:

  • Handwriting / OCR (keyword search)
  • Named Entity extraction (People and Places)
  • Image caption and Tags
  • Adult / Racy image score

Services

Library Of Corom

The data pipeline is as follows:

  1. images are captured on devices (such as a phone, scanner, etc)
  2. images are uploaded to the cloud (such as OneDrive, or Office 365, Outlook.com, etc)
  3. Microsoft Flow (or Azure Logic Apps) is configured to automatically move images from various places to Azure blob storage
  4. An Azure Function is triggered by the blob store and
    1. uses the Cognitive Services Vision API to extract text information from the image
    2. uses an Azure ML web service to extract named entities from the text (People and Places)
    3. then adds the data to the Azure Search index
  5. A single page Web App uses the AzSearch.js library to search the index
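As a rough illustration of step 4, the sketch below shows one way the enrichment could be wired together. It is not the project's C# function: `ocr_text`, `enrich_image`, and the index field names (`id`, `name`, `text`, `people`, `places`) are hypothetical, and the OCR and entity extraction calls are passed in as plain callables so nothing here contacts a real service. It assumes the OCR result uses the nested regions, lines, words layout returned by the Computer Vision OCR endpoint.

```python
import hashlib
from typing import Callable, Dict, List


def ocr_text(ocr_result: dict) -> str:
    """Flatten an OCR result (regions -> lines -> words) into plain text."""
    lines = []
    for region in ocr_result.get("regions", []):
        for line in region.get("lines", []):
            lines.append(" ".join(word["text"] for word in line.get("words", [])))
    return "\n".join(lines)


def enrich_image(
    blob_name: str,
    image_bytes: bytes,
    run_ocr: Callable[[bytes], dict],
    extract_entities: Callable[[str], Dict[str, List[str]]],
) -> dict:
    """Run steps 4.1 and 4.2, then build the document for step 4.3."""
    text = ocr_text(run_ocr(image_bytes))
    entities = extract_entities(text)
    return {
        # "mergeOrUpload" asks Azure Search to insert or update the document.
        "@search.action": "mergeOrUpload",
        # Hypothetical field names; the real index schema may differ.
        "id": hashlib.sha1(blob_name.encode("utf-8")).hexdigest(),
        "name": blob_name,
        "text": text,
        "people": entities.get("people", []),
        "places": entities.get("places", []),
    }
```

Passing the service calls in as parameters keeps the sketch testable offline; in a real function they would wrap the Vision API and the Azure ML web service.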

Limitations

  1. This is just a demo to showcase a cognitive search scenario. It is not intended to demonstrate a scalable architecture.
  2. The OCR technology is not perfect and the handwriting capability is in preview. The results will vary greatly by scan and image quality.
  3. The code currently only processes images. Documents need to be in image format (.jpg, .png, .tiff, etc) rather than PDF or other document formats. Scanned documents with multiple pages should be in multi-page TIFF format. Check your scanner to see if it will generate this.
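Because only image formats are processed, it can help to check a folder before uploading it. The helper below is a minimal sketch, not part of the project: `split_supported` is a hypothetical name, and the extension set is an assumption built from the formats named above plus the common `.jpeg` and `.tif` spellings.

```python
from pathlib import Path
from typing import Iterable, List, Tuple

# Assumed set: the formats listed in the limitation, plus alternate spellings.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}


def split_supported(paths: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split file names into (likely supported images, everything else)."""
    supported, skipped = [], []
    for path in paths:
        # Compare case-insensitively so "SCAN.TIFF" counts as an image.
        if Path(path).suffix.lower() in IMAGE_EXTENSIONS:
            supported.append(path)
        else:
            skipped.append(path)
    return supported, skipped
```

Files in the second list, such as PDFs, would need to be converted to an image format first.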

Setting up your own library

Prerequisites

  1. An Azure subscription you can access. All services can use the free tiers for this demo (with the exception of Azure Storage).
  2. A Microsoft account with access to Microsoft Flow (comes with Office 365). Otherwise, use Azure Logic Apps, which works in much the same way.
  3. Visual Studio 2015 with tools for Azure Functions installed.
  4. Basic familiarity with using the Azure Portal and with cloning and compiling code from GitHub.

Create Azure Services

TIP: Create all your Azure services in the same Resource Group and Region for best performance and manageability.

  1. Create an Azure Search service for your library. The free tier works well. Copy these settings that you will use later.

    1. Service Name (under the "Properties" section)
    2. Admin key (PRIMARY ADMIN KEY under "Keys" section)
    3. Query key (click Manage query keys under "Keys" section)
  2. Create Azure Blob Storage account for your images. The default values work well. Copy these settings that you will use later.

    1. Account Name (Storage account name under the "Access Keys" section)
    2. Account Key (key1 under the "Access Keys" section)
  3. Create an Azure ML Entity Recognition Web Service experiment by clicking the "Open in Studio" button. A free Azure ML Studio workspace works fine for this. Then click the "Setup as Web Service" button, run the experiment, and click "Deploy Web service" to publish it. Copy these settings that you will use later.

    1. API Key (API key on the "entity recognition web service" page)
    2. Web service URL (Request URI on the page shown after clicking "REQUEST/RESPONSE" on the web service page)
  4. Get a 30-day Cognitive Services Trial Key for the Computer Vision API or purchase one in the Azure Portal. Copy these settings that you will use later.

    1. API Key (key1 on the "your APIs" page)
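The steps above produce eight values that are needed later. One way to keep track of them is sketched below; `LibrarySettings` and `missing_settings` are hypothetical names used only for illustration and do not correspond to the constants in the project's code.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass(frozen=True)
class LibrarySettings:
    """Values copied from the portal pages in the steps above."""
    search_service_name: str
    search_admin_key: str
    search_query_key: str
    storage_account_name: str
    storage_account_key: str
    aml_api_key: str
    aml_webservice_url: str
    vision_api_key: str


def missing_settings(settings: LibrarySettings) -> List[str]:
    """Return the names of settings that were left blank."""
    return [f.name for f in fields(settings) if not getattr(settings, f.name).strip()]
```

Checking for blanks before running the code is a quick way to catch a value that was skipped while copying.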

Update Code

  1. Git clone or download this codebase and open CognitiveSearch.sln in Visual Studio. The free Community edition will work fine. Update the configuration settings constants in the DataEnricher\EnrichFunction.cs file where indicated by comments near the top of the file.
    Set DataEnricher as the startup project and press F5 to run it. It should run without errors, create the search indexes and blob containers, and test your settings. If it fails, check your settings to ensure they are correct.

  2. Open the WebUI\index.html file in Visual Studio and update the settings near the top of the file, as indicated in the comments. You can test the UI by running the DataEnricher project with a command line argument that points to a folder containing some images to upload.

    DataEnricher.exe c:\myimages

    You can also enable images to be uploaded directly from the UI by setting the storage connection string in the html file, but see the security note in the code before publishing it to the internet. You will also need to enable CORS on your storage service in the Azure portal for this to work. The CORS rule should specify all HTTP methods, have a TTL of 500, and use "*" in the remaining fields.

    To run the UI, right-click WebUI\index.html in Visual Studio and select View in Browser. In the UI, press Enter in the search box to see all content uploaded to the library.
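To make the CORS rule described above concrete, the sketch below builds it as XML in the shape used by the Storage service properties `Cors` element, as I understand that format. The portal is the route the instructions describe; this is only meant to show the values. `cors_rule_xml` is a hypothetical helper, and the method list is my assumption of what "all HTTP methods" covers.

```python
import xml.etree.ElementTree as ET

# Assumed expansion of "all HTTP methods" for a storage CORS rule.
ALL_METHODS = ["DELETE", "GET", "HEAD", "MERGE", "POST", "OPTIONS", "PUT"]


def cors_rule_xml(max_age_seconds: int = 500) -> str:
    """Build a permissive CORS rule: every method, "*" elsewhere, TTL of 500."""
    cors = ET.Element("Cors")
    rule = ET.SubElement(cors, "CorsRule")
    ET.SubElement(rule, "AllowedOrigins").text = "*"
    ET.SubElement(rule, "AllowedMethods").text = ",".join(ALL_METHODS)
    ET.SubElement(rule, "MaxAgeInSeconds").text = str(max_age_seconds)
    ET.SubElement(rule, "ExposedHeaders").text = "*"
    ET.SubElement(rule, "AllowedHeaders").text = "*"
    return ET.tostring(cors, encoding="unicode")
```

A rule this open is convenient for a demo; for anything public, restricting `AllowedOrigins` to the site that hosts the UI would be a safer choice.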

Set up an Automated Pipeline

  1. Create an Azure Function App for your enricher. Choose a consumption plan to pay only for what you use, or create a free App Service plan that you will share with your web UI. Right-click the EnricherFunction project in Visual Studio and select Publish..., then publish to the Function App you created. Now go to the Function App in the Azure Portal, expand Functions, then EnrichImages, and click Integrate. Associate the storage account with this function by clicking "new" next to the Storage account connection and selecting the storage account you created earlier. Click Save.

    You can test the function by using Azure Storage Explorer to upload images to the library blob container on your storage account.

  2. Configure Microsoft Flow to create blobs from images that are taken on your phone, sent to your email, etc. You can also create these using Azure Logic Apps instead of Flow and they work the same way.

    This flow automatically adds pictures taken on a phone and uploaded to OneDrive to the library.

    This flow automatically adds pictures sent as attachments to emails received by an Outlook.com account to the library.

Publish your Web Application

  1. In Visual Studio, right-click the WebUI project and select Publish. You can create a new Azure Web App from Visual Studio or you can do it in the Azure portal. This application will work well on a free App Service plan, or you can use the same App Service plan as your Function App if you created one earlier.

  2. You can easily customize the UI by modifying the index.html to meet your needs. The UI is generated using the AzSearch.js library and it takes very little code to change what is shown in the search interface.