![](assets/solutions-microsoft-logo-small.png)
<img src="assets/ai.jpg" style="height:200px;float:right;vertical-align:text-top">


## Artificial Intelligence on IaaS++


#### From the Microsoft Cloud and AI Team

This workshop leads you through a series of [Jupyter Notebooks](https://jupyter-notebook-beginner-guide.readthedocs.io/en/latest/what_is_jupyter.html) that explain how to use the Microsoft Azure Data Science Virtual Machine (DSVM) and the Team Data Science Process (TDSP) to create Data Science and AI solutions. These solutions can be deployed on Infrastructure-as-a-Service (IaaS), on Platform-as-a-Service (PaaS), or on a mixture of both, depending on your requirements.

[There are a few pre-requisites for this course - make sure you've done them before you start.](../readme.md)

### The Microsoft Azure Data Science Virtual Machine (DSVM)

<img src="assets/keyboard.jpg" style="height:100px;float:right;vertical-align:text-top">

The Microsoft [Data Science Virtual Machine](https://docs.microsoft.com/en-us/azure/machine-learning/data-science-virtual-machine/) (DSVM) is a customized VM image on Microsoft’s Azure cloud built specifically for doing data science. It has many popular data science and other tools pre-installed and pre-configured to jump-start building intelligent applications for advanced analytics. It is available on Windows Server and on Linux. You can use it for training, and for all phases of the Data Science process. For this workshop, we'll use two Data Science Virtual Machines - one for development, and another as both the GPU-enabled training system and the Internet of Things (IoT) target.

The Data Science Virtual Machine provides a quick, simple, cost-effective way to stand up a GPU-enabled environment for creation, experimentation, deployment, and service access. It includes the tools, libraries, runtimes, and abstraction layers (such as Conda and Docker) needed for both learning and production.

### The Team Data Science Process (TDSP)

<img src="assets/pin.jpg" style="height:75px;float:right;vertical-align:text-top">

[The Team Data Science Process](https://docs.microsoft.com/en-us/azure/machine-learning/team-data-science-process/overview) (TDSP) is an agile, iterative data science methodology to deliver predictive analytics solutions and intelligent applications efficiently. TDSP helps improve team collaboration and learning. It contains a distillation of the best practices and structures from Microsoft and others in the industry that facilitate the successful implementation of data science initiatives. The goal is to help companies fully realize the benefits of their analytics program. Essentially, the TDSP provides:

1. A Data Science Lifecycle
2. A standardized project structure
3. An infrastructure and other resources
4. A set of tools and utilities

<img style="float: center;" height="1000" width="1000" src="https://azure.github.io/LearnAI-Bootcamp/lab03.1-tdsp_and_aml/resources/docs/images/tdsp.png">

*This workshop guides you through a series of exercises that teach you to implement the TDSP in your Data Science project, using Python in a Notebook. You can change the **Setup** and **Lab** cells in this Notebook to use another language or another platform, and to include more or fewer prompts based on your needs.*

For the labs that follow, there may be places where you need to enter the code for the steps you see listed. For some of the lab work, there may be sections marked: 

`# <TODO: REPLACE THIS COMMENT WITH CODE>`

Sometimes a single line of code is all that is needed, but more often it takes several lines. Read the entire code snippet to see what you need to do.
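As a purely hypothetical illustration (this cell is not taken from any of the labs, and the function names are made up), a cell with the marker and one possible completed version might look like this:

```python
# Hypothetical lab cell as it might be provided:
# the marker shows where your own code belongs.
def average_before(values):
    # <TODO: REPLACE THIS COMMENT WITH CODE>
    pass


# The same hypothetical cell after the marker has been replaced.
def average_after(values):
    # Return the arithmetic mean of a non-empty list of numbers.
    return sum(values) / len(values)
```

Until the marker is replaced, the first function does nothing and returns `None`, which is why reading the whole snippet first helps you see what the surrounding code expects.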

A Microsoft Azure Machine Learning solution has the following elements:
  - *A development environment* - this can be a local workstation or a Virtual Machine. Microsoft has a Data Science Virtual Machine (DSVM) that has all of the tools you need to create multiple types of Machine Learning and Artificial Intelligence solutions.
  - *Microsoft Azure Machine Learning (AML)* - AML is a set of cloud-based services to create, manage, and serve your experiments and models. 
  - *Contained Environments for Experimentation and Operationalization* - You can use Docker for both Experimentation and Operationalization, and you can also Operationalize your solution using Spark. 

### Lab 0.6 - Note the TDSP Structure

<img src="assets/checkmark.jpg" style="float:right;vertical-align:text-top">

We'll start by examining the Team Data Science Process (TDSP) structure for the solution. [Ensure you have completed the pre-requisites in the introduction document](../readme.md), and then perform these steps next.  

Instructions *(you do not need to run any of this code)*:
1. [Open and review this reference](https://github.com/Azure/Azure-TDSP-ProjectTemplate)
2. This course's GitHub clone already includes the TDSP structure, so you do not need to clone it again. However, for a new solution in production, you can use the following:

`cd <WhateverDirectoryYouWantForYourSolution>`

And then type:

`git clone https://github.com/Azure/Azure-TDSP-ProjectTemplate.git`

#### Lab verification
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="./assets/checkbox.png">Navigate to the 
    
`~/notebooks/gpuclass` 

folder and ensure the directory structure has at least the same directories as located here: https://github.com/Azure/Azure-TDSP-ProjectTemplate.git</p>
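If you prefer to check the structure programmatically, the comparison above can be sketched in Python. This is a minimal sketch, not part of the official lab: it assumes you have a separate local clone of the template to compare against, and the helper names `list_dirs` and `missing_dirs` are made up for this example.

```python
from pathlib import Path


def list_dirs(root):
    """Return the relative paths of all directories under root, skipping .git."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_dir() and ".git" not in p.relative_to(root).parts
    }


def missing_dirs(template_root, solution_root):
    """Return template directories that are absent from the solution."""
    return sorted(list_dirs(template_root) - list_dirs(solution_root))


# Example usage (both paths are assumptions; adjust them to your machine):
# template = Path.home() / "Azure-TDSP-ProjectTemplate"
# solution = Path.home() / "notebooks" / "gpuclass"
# print(missing_dirs(template, solution))
```

An empty list means the solution folder contains at least every directory found in the template. Extra directories in the solution folder are ignored, which matches the "at least the same directories" wording of the check.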

### Lab 0.7 - Review and Download Project Planning Documents

<img src="assets/checkmark.jpg" style="float:right;vertical-align:text-top">

Next we'll create a project plan for the solution. You'll store the finalized project plan in the `Docs/Project` folder in the solution directory on your Data Science Virtual Machine, but you can edit the plan using whatever system you like.

Instructions:
1. [Open and review this reference](https://docs.microsoft.com/en-us/azure/machine-learning/team-data-science-process/team-data-science-process-project-templates)
2. Download either the Microsoft Project template or Microsoft Excel file noted in that location. 
3. Edit and store the file in the `Docs/Project` folder in the solution directory on your Data Science Virtual Machine, in the `~/notebooks/gpuclass` directory.
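If you edit the plan elsewhere and want to script the final copy step, the following is a minimal sketch. The helper name `store_project_plan` and the file name in the usage comment are hypothetical; only the `Docs/Project` folder and the `~/notebooks/gpuclass` directory come from the instructions above.

```python
import shutil
from pathlib import Path


def store_project_plan(plan_file, solution_dir):
    """Copy plan_file into <solution_dir>/Docs/Project and return the new path."""
    target_dir = Path(solution_dir) / "Docs" / "Project"
    # Create the folder if it does not exist yet; leave it alone if it does.
    target_dir.mkdir(parents=True, exist_ok=True)
    # copy2 also preserves file metadata such as modification time.
    return Path(shutil.copy2(plan_file, target_dir))


# Example usage (the file name is an assumption, use whatever you downloaded):
# solution = Path.home() / "notebooks" / "gpuclass"
# print(store_project_plan("my-project-plan.xlsx", solution))
```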

#### Lab verification
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="./assets/checkbox.png">Open the Microsoft Project Template or the Microsoft Excel file. For the Excel file you can use Microsoft Office 365, a compatible spreadsheet viewer or the [free Microsoft Excel viewer](https://support.microsoft.com/en-us/help/273711/how-to-obtain-the-latest-excel-viewer).</p>

### Introduction wrap-up

<img src="assets/wrapup.jpg" style="float:right;vertical-align:text-top">

This workshop introduced the Team Data Science Process and walked you through each step of implementing it. Regardless of platform or technology, you can use this process to guide your projects in Advanced Analytics from start to finish.

The Notebooks are arranged in the same order as the Team Data Science Process: 

0 - *(This module)* [Introduction and Setup](./0%20-%20Introduction.ipynb)

1 - *(Proceed to this Notebook Next)* [Business Understanding](./1%20-%20Business%20Understanding.ipynb)

2 - [Data Acquisition and Understanding](./2%20-%20Data%20Acquisition%20and%20Understanding.ipynb)

3 - [Modeling](./3%20-%20Modeling.ipynb)

4 - [Deployment](./4%20-%20Deployment.ipynb)

5 - [Customer Acceptance](./5%20-%20Customer%20Acceptance.ipynb)

6 - [Workshop Wrap-up](./6%20-%20Workshop%20Wrap-up.ipynb)

<p style="border-bottom: 1px solid lightgrey;"></p> 