# Demo of Azure Computer Vision applied to Business Intelligence
# Basket Analysis


<b>Role:</b> Data Scientist, Data Engineer, Machine Learning Engineer

<b>Topic:</b> A generalized schema for basket analysis of data trapped in photos, using Azure Computer Vision.

<b>Purpose:</b> Demonstrate a way that the immense interest in A.I. (circa Feb 2023) can be used to unlock a tremendous amount of valuable B.I. data locked away in unstructured photos and images.

<b>Background:</b> In late November 2022, the phenomenon of ChatGPT was born. For the first time, a sufficiently impressive, performant, and most of all, easy-to-use A.I. was available to a wide audience: essentially anyone with an Internet connection.

The importance of this phenomenon is that A.I. appears to have reached the steep part of the "hockey stick" curve. A.I. has been quietly permeating our lives for the past decade through Siri, Alexa, armies of bots, and so on.

While this demo isn't about ChatGPT, it is about the equally impressive Microsoft A.I. offerings, collectively called Azure Cognitive Services. 

<b>Copyright (c) 2021 - 2023 Kyvos Insights</b>

Permission is hereby granted, free of charge, to customers of Kyvos Insights obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

<b>Authored by:</b> Eugene Asahara<br>
<b>Date Created: </b> February 12, 2023

# Computer Vision and Business Intelligence

<b>Azure <i>Computer</i> Vision</b> is a set of pre-trained models capable of recognizing objects and text in images and video. The models cover widespread, common subjects such as food, people in general, animals, famous locations, and famous people.

<b>Azure <i>Custom</i> Vision</b> is a platform for training custom, domain-specific models that go beyond the common objects Computer Vision is trained on. For example, an enterprise in the farming business could train a model to recognize things that matter to farming but not to the general public. That might include various machinery, types of feed, types of fertilizer, what healthy corn looks like, and various fungi.



# What Fires Together Wires Together

A fundamental of logical thinking is to correlate things and events that appear together.

<img src=".\images\basket_analysis.jpg"/>
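The idea of correlating things that appear together can be illustrated with a minimal sketch that counts how often each pair of items shows up in the same basket. The baskets below are made-up sample data, not output from Azure Computer Vision.

```python
from collections import Counter
from itertools import combinations

# Hypothetical baskets: each set holds the tags found in one photo.
baskets = [
    {"apple", "orange", "grape"},
    {"apple", "mango"},
    {"apple", "orange"},
    {"banana", "mango"},
]

# Count how many baskets contain each unordered pair of items.
# Sorting each basket first makes ("apple", "orange") and
# ("orange", "apple") land on the same key.
pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# Show the pairs that appear together most often.
for pair, count in pair_counts.most_common(3):
    print(pair, count)
```

Pairs with a high count are the ones that "fire together". The rest of this notebook generalizes the same idea to groups of any size.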



# Architecture

<img src=".\images\azure_computer_vision_architecture.jpg"/>

# Set up Azure Computer Vision



# Set up .env file


<img src=".\images\env_file_computer_vision.jpg"/>

<b>COMPUTER_VISION_KEY="Enter Azure Computer Vision Key here"</b>

<b>COMPUTER_VISION_ENDPOINT="Enter Azure Computer Vision Endpoint here."</b>

<b>COMPUTER_VISION_IMAGE_PATH="c:/temp/computer_vision/"</b>

<b>SAVE_FILENAME = "C:\\temp\\computer_vision.json"</b>

Location to save the raw JSON created when processing the pictures. This is the primary data source and the input for the basket analysis.

<b>FREQUENT_ITEM_SETS="C:\\temp\\computer_vision.csv"</b>

Itemsets are objects found together in each photo. This is created through a function in the apriori package.

This is the basket analysis data.
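As a rough illustration of how KEY=VALUE entries like these can be read and checked, here is a small standard-library sketch. It is not how the kypy library loads its settings (that is not shown in this notebook); the `parse_env` helper and the sample text are hypothetical.

```python
def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Tolerate spaces around "=" and surrounding double quotes.
        settings[key.strip()] = value.strip().strip('"')
    return settings

# Hypothetical .env content; the endpoint is left out on purpose.
sample = """
# Azure Computer Vision settings
COMPUTER_VISION_KEY="Enter Azure Computer Vision Key here"
COMPUTER_VISION_IMAGE_PATH = "c:/temp/computer_vision/"
"""

settings = parse_env(sample)

# Report any required keys that are missing before calling the service.
required = ["COMPUTER_VISION_KEY", "COMPUTER_VISION_ENDPOINT", "COMPUTER_VISION_IMAGE_PATH"]
missing = [key for key in required if key not in settings]
print("Missing keys:", missing)
```

Checking for missing keys up front gives a clearer error than a failed call to the service later on.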

In [None]:
import pandas as pd
import json

# Kyvos helper library that wraps Azure Computer Vision and the basket analysis functions.
from kypy.Metadata import AzureComputerVisionLib as az

# Basket Analysis Demo

For this demo, we analyze a very small set of "baskets" of fruits. Each photo is a basket.

<img src=".\images\basket_analysis_demo.jpg"/>



# Set up Computer Vision demo parameters

### Recognized Tags

Azure Computer Vision returns a large number of tags for each photo. We probably won't be interested
in all of them, so we can specify the tags we are interested in.

For the sake of simplicity, we will restrict it to a small set of fruit:

<b>recognized_tags</b> = ["mango","grapefruit","grape","oranges","apple", "pear", "clementine","banana"]

In reality, we would be interested in many more things than this.
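The filtering described above can be sketched as follows. The tag records are made-up, and the `name` and `confidence` field names are an assumption about the shape of the returned tags. The `filter_tags` helper and the 0.6 threshold are hypothetical and are not part of the kypy library.

```python
recognized_tags = ["mango", "grapefruit", "grape", "oranges", "apple", "pear", "clementine", "banana"]
confidence_threshold = 0.6  # hypothetical cut-off for this sketch

# Hypothetical tags for one photo (field names are assumed).
photo_tags = [
    {"name": "fruit", "confidence": 0.99},
    {"name": "apple", "confidence": 0.93},
    {"name": "grape", "confidence": 0.41},
    {"name": "table", "confidence": 0.88},
    {"name": "banana", "confidence": 0.75},
]

def filter_tags(tags, allowed, threshold):
    """Keep only allowed tag names whose confidence meets the threshold."""
    allowed_set = set(allowed)
    return sorted(
        tag["name"]
        for tag in tags
        if tag["name"] in allowed_set and tag["confidence"] >= threshold
    )

basket = filter_tags(photo_tags, recognized_tags, confidence_threshold)
print(basket)  # ['apple', 'banana']
```

Here "fruit" and "table" are dropped because they are not in the list, and "grape" is dropped because its confidence is too low.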

### Minimum Support

If a group of items appears in more than one basket, it is an interesting data point. It's especially interesting
if it appears in many baskets. <b>min_support</b> is the minimum fraction of baskets in which a group must appear in order
to be saved as a group.

In reality, the min_support might be very small, because rare co-occurrences can be the most interesting ones. For example, we might see President Biden and VP Kamala Harris together in many photos, but there's nothing
very newsworthy about that. But seeing a particular KGB spy and a particular CIA spy together in two or three photos out of thousands is interesting. In a
real enterprise scenario, we are able to store a larger volume of groups.
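To make support concrete, the sketch below computes it by brute force over made-up baskets: the support of a group is the number of baskets containing every item in the group, divided by the total number of baskets. This is a plain enumeration for illustration, not the Apriori algorithm, and the `frequent_itemsets` helper is hypothetical.

```python
from itertools import combinations

def frequent_itemsets(baskets, min_support, max_len):
    """Return {itemset: support} for itemsets meeting min_support.

    Brute-force enumeration of every candidate up to max_len items.
    Fine for a handful of items, far too slow for real data.
    """
    total = len(baskets)
    items = sorted(set().union(*baskets))
    results = {}
    for size in range(1, max_len + 1):
        for candidate in combinations(items, size):
            # A basket supports the candidate if it contains all its items.
            hits = sum(1 for basket in baskets if set(candidate) <= basket)
            support = hits / total
            if support >= min_support:
                results[candidate] = support
    return results

# Hypothetical baskets, one per photo.
baskets = [
    {"apple", "oranges", "grape"},
    {"apple", "mango"},
    {"apple", "oranges"},
    {"banana", "mango"},
    {"pear"},
]

result = frequent_itemsets(baskets, min_support=0.4, max_len=2)
for itemset, support in result.items():
    print(itemset, support)
```

With these five baskets and a threshold of 0.4, a group must appear in at least two photos to be kept. The pair apple and oranges qualifies, while apple and mango (one photo out of five) does not.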

In [None]:
# Set up computer vision parameters.
recognized_tags = ["mango","grapefruit","grape","oranges","apple", "pear", "clementine","banana"]
min_support = 0.2 # 20% of images should have the group.
max_len = 5 # Maximum number of items in a group.
min_confidence = 0.6 # Minimum confidence of an item recognition.

# Create the class that handles the interface between Azure Computer Vision and Kyvos OLAP Cubes.


In [None]:
# Create an ObjectGroups object that encapsulates the functionality that involves Azure Computer Vision.
cvo = az.ObjectGroups() # cvo is "Computer Vision Object".
# CreateGroup calls Azure Computer Vision. It submits each of the photos in the
# folder specified by the COMPUTER_VISION_IMAGE_PATH entry in the .env file.
cvo.CreateGroup(
    min_support=min_support, 
    recognized_tags=recognized_tags,
    max_len=max_len,
    min_confidence=min_confidence
)
print(cvo.frequent_itemsets.head(100))

In [None]:
# Show the "raw" results (json) from Azure Computer Vision.
# Note that this shows all the tags, not just the ones we requested in "recognized_tags".
print(json.dumps(cvo.tags,default=str, indent=2))

# Data files for the OLAP cube Basket Analysis Schema.

The following few cells show how to retrieve data for the tables in the basket analysis schema below:

<img src=".\images\computer_vision_schema.jpg"/>

Note that the fact tables are association/bridge tables.
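As a rough sketch of what an association/bridge table looks like, the code below builds an item dimension, a basket dimension, and the bridge table that links them from made-up baskets. The column names and surrogate keys are hypothetical and may not match the tables produced by the library.

```python
# Hypothetical baskets: file name -> items recognized in that photo.
baskets = {
    "apples_orange_grape.jpg": ["apple", "grape", "oranges"],
    "apple_mango.jpg": ["apple", "mango"],
}

# Item dimension: one row per distinct item, with a surrogate key.
item_names = sorted({item for items in baskets.values() for item in items})
item_dimension = [
    {"item_id": i, "item": name} for i, name in enumerate(item_names, start=1)
]

# Basket dimension: one row per photo.
basket_dimension = [
    {"basket_id": i, "filename": name} for i, name in enumerate(baskets, start=1)
]

# Lookups from natural key to surrogate key.
item_ids = {row["item"]: row["item_id"] for row in item_dimension}
basket_ids = {row["filename"]: row["basket_id"] for row in basket_dimension}

# Bridge table: one row per (basket, item) pair. It carries no
# measures; it only records which item appears in which basket.
item_basket_fact = [
    {"basket_id": basket_ids[filename], "item_id": item_ids[item]}
    for filename, items in baskets.items()
    for item in items
]

for row in item_basket_fact:
    print(row)
```

Because a basket holds many items and an item appears in many baskets, the relationship is many-to-many, which is why it needs its own table rather than a foreign key on either dimension.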

In [None]:
# Display the items dimension. These are recognized tags from among all the images processed.
print(cvo.item_dimension.head(100))
# Display the group and basket dimensions, then the basket-to-group fact table.
print(cvo.group_dimension.head(100))
print(cvo.basket_dimension.head(100))
print(cvo.basket_group_fact.head(100))

# This is a fact table of all recognized items and the groups they are in.
print(cvo.item_group_fact.head(100))

# A Few Sample Query functions.


Reports that could be gleaned through the schema:

<ul>
    <li>Which items appear together often?</li>
    <li>How many times did a group of items appear together?</li>
    <li>What groups does an item appear with?</li>
</ul>

## List baskets that contain apple OR mango.

Note that the create_date gives the chronological order of the images, so we know that pictures with apple or mango
appeared at these times.

If we were recognizing people, we would have a chronology of when and where a person appeared.
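The difference between an OR query and an AND query over baskets can be sketched with plain set operations. The baskets and both helper functions below are hypothetical. They illustrate the logic only and make no claim about how `cvo.get_baskets` is implemented.

```python
# Hypothetical baskets: file name -> set of items in that photo.
baskets = {
    "apples_orange_grape.jpg": {"apple", "oranges", "grape"},
    "apple_mango.jpg": {"apple", "mango"},
    "banana_pear.jpg": {"banana", "pear"},
}

def baskets_with_any(baskets, items):
    """OR query: baskets containing at least one of the items."""
    wanted = set(items)
    return sorted(name for name, contents in baskets.items() if wanted & contents)

def baskets_with_all(baskets, items):
    """AND query: baskets containing every one of the items."""
    wanted = set(items)
    return sorted(name for name, contents in baskets.items() if wanted <= contents)

any_match = baskets_with_any(baskets, ["apple", "mango"])
all_match = baskets_with_all(baskets, ["apple", "mango"])
print("apple OR mango: ", any_match)
print("apple AND mango:", all_match)
```

The OR query uses set intersection (any overlap counts), while the AND query uses the subset test (every requested item must be present).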

In [None]:
def run_sample_queries():
    print("Baskets with apple OR mango.")
    print(cvo.get_baskets(["apple","mango"]).head(100))
    
    print("List baskets that include apple AND mango.")
    print(cvo.get_baskets(["apple","mango"], item_in=False))

    print("Confidence that the apple recognized in the file is indeed an apple.")
    print(cvo.get_tag_confidence( "apple","apples_orange_grape.jpg"))

In [None]:
run_sample_queries()

# Fact table of all items in every basket. 

This is an association/bridge table.

It's a virtual table since it could be very long if there are a large number of baskets. For example, consider something like
Web searches. Google has billions of them each day. Each of those searches consists of a number of keywords, so there are several
times more keyword-search rows than search rows.

In [None]:
print(cvo.item_basket_fact.head(500))

In [None]:
# All items - not just the ones we chose to recognize.
# basket_count is the number of distinct baskets containing the item.
print(cvo.all_items)

# Save Data.

Save the data files to the directory specified by the COMPUTER_VISION_SAVE_DATA_PATH key in the .env file.

They are the files we viewed in the prior few cells:
<ul>
    <li>basket_dimension.csv</li>
    <li>basket_group_fact.csv</li>
    <li>computer_vision.json</li>
    <li>computer_vision_frequent_itemsets.csv</li>
    <li>group_dimension.csv</li>
    <li>item_basket_fact.csv</li>
    <li>item_dimension.csv</li>
    <li>item_group_fact.csv</li>
</ul>


In [None]:
cvo.save()

# Read Existing Basket Analysis Data

We will re-read the primary json file we just saved (computer_vision.json).

Then we'll run those same queries to show that the data has been reloaded.
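The save-then-reload round trip can be illustrated with the standard json module. This is a generic sketch with made-up tag data written to a temporary folder. The structure of the real computer_vision.json file is not shown here, so the layout below is an assumption.

```python
import json
import os
import tempfile

# Hypothetical raw tag data keyed by file name (layout is assumed).
tags = {
    "apples_orange_grape.jpg": [
        {"name": "apple", "confidence": 0.93},
        {"name": "grape", "confidence": 0.81},
    ]
}

# Write to a temporary folder so the sketch leaves nothing behind.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "computer_vision.json")

    with open(path, "w", encoding="utf-8") as handle:
        json.dump(tags, handle, indent=2)

    with open(path, encoding="utf-8") as handle:
        reloaded = json.load(handle)

# The reloaded data should match what was saved.
print(reloaded == tags)
```

Since the raw JSON is the primary data source, anything derived from it (itemsets, dimensions, fact tables) can be rebuilt after a reload without calling the service again.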

In [None]:
cvo.read(min_support=min_support, max_len=max_len)

In [None]:
run_sample_queries()