# Handout #3: Developing a Schedule

Content Authors:


*   Chris Malone PhD, Professor of Data Science and Statistics, Winona State University; Email: cmalone@winona.edu
*   Collin Engstrom PhD, Assistant Professor of Computer Science, Winona State University; Email: collin.engstrom@winona.edu

Content in this handout was adapted from the following source:

*   Situnayake, Daniel, and Jenny Plunkett. *AI at the Edge*. O'Reilly Media, Inc., 2023.

---



## Edge AI Workflow

Consider the following figure from the *AI at the Edge* textbook:

<p align='center'><img src="https://drive.google.com/uc?export=view&id=1n_nFEAZD8vd6BBPyU8usipEnc5jIHkQv" width='50%' height='50%'></img></p>


The earliest stages of the Edge AI workflow make up the **Discover** phase and involve:

*  Exploration
*  Goal setting
*  Bootstrapping

These stages cover up-front planning for a project, but you may need to revisit them throughout the project's lifespan.

The middle stages of the workflow make up the **Test and Iterate** phase and include:

*  Application
*  Dataset
*  Algorithms
*  Hardware

These stages are iterative and should help refine the goal(s) you outlined in the Discover phase.

---

## Exploration

In this stage, you try to get a handle on what you want your project to do. Questions to answer include:

1.  What is the problem we're trying to solve?
2.  Do I actually need Edge AI?
3.  Is this project feasible?
4.  What machine learning methodology (e.g., classification, regression, or feature selection) is most appropriate?
5.  Are there any potential risks, harms, or unintended consequences?
6.  Who are the stakeholders? What are their needs/wants?
7.  What will the data look like, and how will we explore and prepare it (e.g., data cleaning, feature selection, visualization)?

---

## Goal Setting

During goal setting, we describe what we're aiming for. Questions to answer include:

1.  What evaluation metrics will we use?
2.  What systemic (big picture, fundamentals) goals do we have?
3.  What technical goals do we have?
4.  What do the stakeholders think?
5.  What values-based framework will we use to track progress?
6.  Whom should we choose to evaluate the ongoing project?
7.  What scheme should we use for testing our algorithms and application?
8.  What support goals do we have in the longer term?
9.  When should we abort the project, if necessary?

### When to call it quits

AI projects often fail. Some key points to consider:

*  If failure is unavoidable, it is best to *fail early* before too many resources have been expended.
*  To reduce the chance of failure, *set project-specific milestones and goals*. Write these down early in the goal setting stage.
*  At each project phase, be prepared to *evaluate the status of your project*.
*  Have a firm *understanding of your budget* in terms of time, money, and other resources.


---

## Bootstrapping

In this stage, we put together the first iteration of a practical solution. In other words, this is where the "rubber meets the road." Ideally, bootstrapping quickly produces a working prototype.

Key tasks of this phase include:

*  Collecting a minimal dataset
*  Making a first attempt at determining hardware requirements
*  Developing a simple initial algorithm
*  Building the simplest possible end-to-end application
*  Doing some initial empirical testing and evaluation
*  Performing an early review of the first prototype

Key questions that should be asked:

*  What *algorithm or model* can we use as our baseline? (Simpler is better for the first prototype.)
*  Is the baseline algorithm good enough, or does it need improvement?
*  What *hardware* is needed for a prototype? (Again, simpler is better for the first prototype.)
*  After first deployment, how does the algorithm *perform*? Does it reach the desired level on our chosen metrics?
*  If this algorithm will be incorporated into an existing company workflow, what considerations need to be made for workers? Will they need to be (re)trained?
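
One common choice of "simplest possible" baseline is a majority-class model that ignores the features and always predicts the most frequent label seen in training; any real algorithm should beat it. The sketch below is illustrative only: it assumes labels coded 0/1 (as in Task #1), and the function names and toy data are hypothetical.

```python
from collections import Counter


def majority_class_baseline(train_labels):
    """Return a predict function that always outputs the most common training label."""
    majority_label, _ = Counter(train_labels).most_common(1)[0]

    def predict(samples):
        # The features are ignored entirely: every sample gets the majority label.
        return [majority_label for _ in samples]

    return predict


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return matches / len(y_true)


# Hypothetical toy data (0 = benign, 1 = malignant); not drawn from any real dataset.
train_labels = [0, 0, 0, 1, 0, 1, 0]
test_samples = [[5, 1], [8, 10], [3, 1], [4, 2]]  # hypothetical feature rows
test_labels = [0, 1, 0, 0]

predict = majority_class_baseline(train_labels)
baseline_preds = predict(test_samples)
print(accuracy(test_labels, baseline_preds))  # 0.75
```

Because this baseline can score well on imbalanced data without learning anything, it gives a useful floor: a candidate model that cannot beat it is not yet extracting useful information from the features.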


---

## Task #1: Discover Phase


Consider the Wisconsin Breast Cancer dataset. Its features are computed from a digitized image of a fine needle aspirate (FNA) of a breast mass. A brief description of some of the measured characteristics is given below. The outcome of interest (i.e., the class variable) is whether or not cancer is present.

<br>

**Some questions to consider:**

1.  What is the problem we're trying to solve?
    - We want to predict whether a breast mass is malignant or benign from its measured features.
2.  Is AI appropriate?
    - Yes. This is a binary classification problem with labeled examples, a task well suited to supervised machine learning.
3.  What does the dataset look like?
    - (See below for detailed explanation.)
4.  What machine learning methodology might we consider using?
    - Any modeling technique that supports a binary outcome (malignant or benign). Examples include decision trees, random forests, support vector machines, and neural networks (deep learning).
5.  What metrics might we use to evaluate the performance from (4) above?
    - Area under the ROC curve (AUROC) and accuracy would be good to start with. Since false positives are a notable concern, precision might also be considered; since missed cancers (false negatives) are costly as well, so might recall. (More on all these in the next module.)
    
<br>
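
The metrics named in (5) are simple to compute for a binary problem. The plain-Python sketch below is an illustration, not a library API: it assumes malignant (1) is the positive class, that a model outputs a score where higher means "more likely malignant", and that predictions use a 0.5 threshold; the labels and scores are hypothetical. AUROC is computed via its probabilistic interpretation (the chance that a randomly chosen malignant case scores above a randomly chosen benign one).

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true positives, false positives, false negatives, and true negatives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fp, fn, tn


def basic_metrics(y_true, y_pred):
    """Accuracy, precision, and recall, treating label 1 (malignant) as positive."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0  # of cases flagged malignant, share truly malignant
    recall = tp / (tp + fn) if tp + fn else 0.0  # of truly malignant cases, share that were flagged
    return accuracy, precision, recall


def auroc(y_true, scores):
    """AUROC as the chance a random positive scores above a random negative (ties count 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical labels and model scores (higher score = more likely malignant).
y_true = [0, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 decision threshold

print(basic_metrics(y_true, y_pred))
print(round(auroc(y_true, scores), 3))  # 0.889
```

AUROC uses the raw scores, while accuracy, precision, and recall depend on the chosen threshold. Here recall falls below 1 because one malignant case (score 0.35) lands under the threshold and becomes a false negative. In practice, scikit-learn's `sklearn.metrics` module provides tested versions of these metrics.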

<table>
  <tr>
    <td width='100%'>
      <ul>
        <li><strong>Class</strong>: Labels are 0 (Benign) and 1 (Malignant)</li><br>
        <li><strong>Features</strong>:</li>
        <ul>
          <li>ID - unique ID for each sample</li>
          <li>CellClumpThickness - higher values indicate malignancy</li>
          <li>UniformCellSize - higher values suggest greater likelihood of malignancy</li>
          <li>UniformCellShape - higher values indicate more variability and potential malignancy</li>
          <li>MarginalAdhesion - lower values can be indicative of cancer</li>
          <li>SingleEpithelialSize - larger sizes may indicate malignancy</li>
          <li>BareNuclei - Counts the number of nuclei that are not surrounded by cytoplasm (higher values are often associated with cancer)</li>
          <li>BlandChromatin - Measure of the texture of the chromatin in the cell nucleus (coarser chromatin is typical in cancer cells)</li>
          <li>NormalNucleoli - Counts the number of nucleoli within the nucleus (higher values are linked to malignancy)</li>
          <li>Mitoses - Measures the number of cells undergoing mitosis (higher values are associated with malignancy)</li>
         </ul>
    </ul>
    </td>
</tr>
</table>

[Data - Local Copy](https://github.com/christophermalone/mayo_ml_workshop/blob/main/WI_BreastCancer.csv)

<p align='center'><img src="https://drive.google.com/uc?export=view&id=175xKGsxAKhQlul-CEMZiiTjCoR30Jny9" width='50%' height='50%'></img></p>

Image Source: https://sphweb.bumc.bu.edu/otlt/mph-modules/ph/ph709_cancer/ph709_cancer7.html
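
A quick first look at the data helps answer question (3). The sketch below uses only the Python standard library and assumes the CSV has been downloaded locally, has a header row, and names its label column `Class` as in the table above; the function name `summarize_dataset` is hypothetical. In case missing values are encoded with a placeholder such as `?`, it also counts non-numeric entries per column.

```python
import csv
from collections import Counter


def summarize_dataset(path, label_column="Class"):
    """Return row count, class counts, and per-column counts of non-numeric entries."""
    with open(path, newline="") as f:
        rows = list(csv.DictReader(f))

    # How balanced are the benign (0) and malignant (1) classes?
    class_counts = Counter(row[label_column] for row in rows)

    # Flag values that do not parse as numbers (e.g., blanks or '?' placeholders).
    non_numeric = Counter()
    for row in rows:
        for column, value in row.items():
            try:
                float(value)
            except (TypeError, ValueError):
                non_numeric[column] += 1

    return len(rows), class_counts, non_numeric
```

For example, after saving the CSV to your working directory, `summarize_dataset("WI_BreastCancer.csv")` returns the number of rows, the count of each class label (as strings, since `csv` reads text), and the columns containing non-numeric entries that need cleaning before modeling.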

---

## Test and Iterate

Consider the following image once more:

<br>

<p align='center'><img src="https://drive.google.com/uc?export=view&id=1mKs2C7uyl9O70rURE4yVgrO-5gtkiYAH" width='50%' height='50%'></img></p>

The *Test and Iterate* phase is where we improve our prototype over numerous iterations. This phase has four components: **application**, **dataset**, **algorithms**, and **hardware**. Each component advances on its own, but progress in one influences the development of the others.

<br>

Classically, the AI development process was thought of as a simple cyclical process:

<p align='center'><img src="https://drive.google.com/uc?export=view&id=1PoxEWrn1Z2N9HXkPZuf27QCpR9R-3BrF" width='50%' height='50%'></img></p>

This step-by-step cycle is known as a **feedback loop**, and it makes the flow of development easy to grasp. In reality, it's more accurate to think of the AI development process as follows:

<p align='center'><img src="https://drive.google.com/uc?export=view&id=1tHTFaBnxOA-U86DR8rPMFOAo-3ZJkllv" width='50%' height='50%'></img></p>

Note that in this model, it is possible to transition from one component to any one of the others. Since these components are continually influencing one another, this depiction is more realistic.

<br>
A typical iterative process looks like this:

1.  Obtain data, and split it into training, validation (a.k.a. "tuning"), and test sets.
2.  Train a large model on the training set.
3.  Measure performance on the validation split.
4.  Tweak the setup (more data, tune model parameters, etc.) to optimize validation performance.
5.  Train and measure performance once more.
6.  Repeat steps 4 and 5 until you're satisfied the model is performing at its best.
7.  Evaluate the model on the held-out test set.
8.  If the model works as expected, you're ready to begin *deploying* it onto a hardware device or into the real world. Otherwise, start from scratch!
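
The split, tune, and test steps above can be sketched in plain Python. As an illustrative assumption, the model is a simple k-nearest-neighbors classifier on a single hypothetical feature: "training" means storing the training set, and the setting being tweaked is k. The toy data, split fractions, and function names are all hypothetical.

```python
import random


def three_way_split(samples, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle, then split into training, validation ("tuning"), and test sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # held out until tuning is finished
    return train, val, test


def knn_predict(train, x, k):
    """Predict the majority label among the k training points nearest to x."""
    neighbors = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    votes = sum(label for _, label in neighbors)
    return 1 if 2 * votes > len(neighbors) else 0


def accuracy(train, data, k):
    """Share of (feature, label) pairs in data that k-NN labels correctly."""
    correct = sum(1 for x, y in data if knn_predict(train, x, k) == y)
    return correct / len(data)


# Hypothetical toy data: (single feature score, label), 0 = benign, 1 = malignant.
data = [(x, 0) for x in (1, 1, 2, 2, 3, 3, 4, 5)] + [(x, 1) for x in (4, 6, 6, 7, 8, 9, 10, 10)]

# Step 1: split once, up front.
train, val, test = three_way_split(data)

# Steps 2-6: "train" (k-NN simply stores the training set), score each candidate k
# on the validation set, and keep the best-scoring setting.
candidate_ks = (1, 3, 5)
val_scores = {k: accuracy(train, val, k) for k in candidate_ks}
best_k = max(candidate_ks, key=val_scores.get)

# Step 7: evaluate the chosen setting once on the held-out test set.
test_accuracy = accuracy(train, test, best_k)
print("validation accuracy by k:", val_scores)
print("chosen k:", best_k, "test accuracy:", test_accuracy)
```

The key design choice is that the test set is touched only once, after tuning is finished; if it were used to pick k, its score would no longer be an unbiased estimate of real-world performance.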
<br><br>
**Final notes:**
*  *Deployment* is a critical part of the development process.
*  Deploy early and often. This helps confirm that you have the hardware and other resources to support your algorithm/model.
*  Your project is never really done! You'll still need to monitor and maintain its performance. A crucial part of this is deciding how to provide ongoing *support* for customers or end users, such as expert forums or e-mail and telephone support.