[View this in Azure Notebooks](https://notebooks.azure.com/library/vsdsdemo/)

# Overview of the Data Science Workload in VS2017

Visual Studio 2017 lets developers install tools according to the scenarios and platforms they are interested in. One such set of tools, or "workload", is the Data Science and Analytical Apps workload. This workload lets developers install Python tools, the Anaconda distribution, R tools, Microsoft R Client, and F# tools with one click. Developers can also select only the subset of tools they need.

![installer](https://github.com/eclectir/Visual-Studio-Data-Science-Demos/raw/master/images/ds-wrkld.png)

This demo shows how simple it is to solve data science and machine learning problems using only the tools and runtimes installed by this workload. The demo uses Python, but you can easily plug the R and F# demos available in the repository into this script.

## Feature Overview

Check out the [VS blog post on the Data Science workload](https://blogs.msdn.microsoft.com/visualstudio/2016/11/18/data-science-workloads-in-visual-studio-2017-rc/) for a lap around the workload. The idea behind the Data Science workload is to install premier data science tools, distros, and clients with one click. Previously, developers had to visit non-Microsoft sites to get their distros and install VS extensions individually. Now they can get everything they need to start with data science in a single pass through the VS installer.

The workload has the following components:

### Python Components
- Python language support
- Python web support
- Python native tools
- Anaconda 32-bit
- Anaconda 64-bit

### R Components
- R language support
- Microsoft R Client
- R Runtime

### F# Components
- F# tools for Visual Studio

## Notable Customer Scenarios

### Basic Data Science 
The data science process entails numerous steps: data import, data cleaning, data joining, model training, and plotting results. Python and R offer powerful tools and packages for each stage of the process. A very basic scenario is to predict values or classify data into categories via supervised machine learning, i.e., training a prediction or classification model on pre-existing, labeled data. Python and R support for Visual Studio lets data scientists handle every piece of this process from a polyglot IDE.
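To make "train on pre-existing data, then classify new data" concrete, here is a tiny, dependency-free sketch. It is not how the Python or R tools in the workload implement classification: it deliberately swaps in a simple nearest-centroid classifier, and the training data and the helper names `train_nearest_centroid` and `predict` are hypothetical, chosen only for illustration.

```python
from math import dist
from statistics import mean

# Hypothetical helpers and data, for illustration only (nearest-centroid classifier).

def train_nearest_centroid(samples, labels):
    """Learn one centroid (the per-feature mean) for each class label."""
    by_label = {}
    for features, label in zip(samples, labels):
        by_label.setdefault(label, []).append(features)
    return {
        label: tuple(mean(column) for column in zip(*rows))
        for label, rows in by_label.items()
    }

def predict(centroids, features):
    """Assign the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: dist(centroids[label], features))

# Hypothetical labeled training data: (height_cm, weight_kg) pairs.
train_x = [(150, 50), (155, 55), (160, 58), (180, 85), (185, 90), (190, 95)]
train_y = ["small", "small", "small", "large", "large", "large"]

model = train_nearest_centroid(train_x, train_y)
print(predict(model, (158, 54)))  # → small
print(predict(model, (183, 88)))  # → large
```

The same fit-then-predict shape carries over to the real libraries the workload installs; only the learner becomes more sophisticated.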

### Scaling R scripts
Often data scientists need to analyze large data sets but may not have sufficient local compute resources. To benefit from disk scalability, performance and speed, data scientists can use Microsoft R Client to push the compute to a production instance of Microsoft R Server in SQL Server, Linux Server, Windows Server, Hadoop/Spark, and Teradata DB.

## Demo Overview

1. Workload Installer and components of the DS workload.
2. Simple ML example using Pandas, NumPy, Sklearn, Matplotlib and the Anaconda distro. You may want to use an R or F# sample to suit your audience. 
3. Plotting & Sharing results.
4. Azure Notebooks

**Shortform Demo: 10-30 min**  
- An overview of the workload components plus a demo of a simple ML script captures the spirit of this feature.
- For shorter demos, steps 1, 2, and 3 should suffice. 

**Longform Demo: 30-60 min**  
- Drill-down into each component of the data science workload, and discussion of scenarios for each language: R, Python, F#.
- Overview of Microsoft R Services. An ML example in each language highlighting the full suite of tools available: debugging, profiling, package/environment management.
- An overview of Azure Notebooks, and how Jupyter notebooks fit in the data science workflow.

## Installing the Workload

Workloads are the same across all SKUs of Visual Studio. After the VS installer has downloaded its files, it displays the workloads to choose from. Select the Data Science workload. You may also mention related workloads, such as the Python Web Development workload, which installs the same set of Python tools as the DS workload, and the Data workload, which includes SQL Server and other database tools.

![installer](https://github.com/eclectir/Visual-Studio-Data-Science-Demos/raw/master/images/ds-wrkld.png)

Once you have selected a workload, you will notice that a few core components are checked by default. You may uncheck these to highlight how developers can reduce the size of their installation by checking only what they need.

## Workload Components
Depending on the length of the demo, you can drill into each of the workload components. Here is a brief description of each of the components, with helpful links to learn more about the products they support or integrate with. 

### Python language support
This component includes Python language support for Visual Studio: Editor, debugger, REPL, Environment and Package Management, along with project templates for Machine Learning. 

- **Editor** - complete editing experience for Python scripts, including detachable/tabbed windows, syntax highlighting, and much more.
- **IntelliSense** - (aka auto-completion) available in both the editor and the Interactive Python window.
- **Python Interactive Window** - work with the Python console directly from within Visual Studio.
- **Plotting** 
- **Debugging** - breakpoints, stepping, watch windows, call stacks and more.
- **Git** - source code control via Git and GitHub.


### Python web support
Includes templates for Bottle, Flask, and Azure Cloud Services.

### R language support
This component includes R language support for Visual Studio.
- **Editor** - complete editing experience for R scripts and functions, including detachable/tabbed windows, syntax highlighting, and much more.
- **IntelliSense** - (aka auto-completion) available in both the editor and the Interactive R window.
- **R Interactive Window** - work with the R console directly from within Visual Studio.
- **History window** - view, search, select previous commands and send to the Interactive window.
- **Variable Explorer** - drill into your R data structures and examine their values.
- **Plotting** - see all of your R plots in a Visual Studio tool window.
- **Debugging** - breakpoints, stepping, watch windows, call stacks and more.
- **R Markdown** - R Markdown/knitr support with export to Word and HTML.
- **Git** - source code control via Git and GitHub.
- **Extensions** - over 6,000 extensions covering a wide spectrum from Data to Languages to Productivity.
- **Help** - use `?` and `??` to view R documentation within Visual Studio.

### R runtime
The workload installs the CRAN-R distribution of the R language runtime. 

### Microsoft R Client
Microsoft R Client is a free, community-supported data science tool for high-performance analytics. R Client is built on top of Microsoft R Open, so you can use any open source R package to build your analytics. Additionally, R Client introduces the [powerful ScaleR technology](https://msdn.microsoft.com/en-us/microsoft-r/scaler-getting-started) and its proprietary functions to benefit from parallelization and remote computing.

R Client allows you to work with production data locally using the full set of ScaleR functions, but there are some constraints. On its own, R Client requires the data to be processed to fit in local memory, and it limits ScaleR functions to two threads. To benefit from disk scalability, performance, and speed, you can push the compute context to a production instance of Microsoft R Server, such as [SQL Server R Services](https://msdn.microsoft.com/en-us/library/mt604845.aspx) or R Server for Hadoop.

An R audience might benefit from the details of Microsoft R services, especially the differences between CRAN-R, Microsoft R Open, and Microsoft R Services. 

### F# language support


## A Basic ML example with Anaconda
This example highlights how the templates and the Anaconda distro installed by the workload make a powerful combination that helps data scientists perform basic tasks such as clustering, regression, and classification very quickly, without leaving the IDE.

The template used does the following:
1. Fetches an open data set on stock values.
2. Cleans the data set and reads it into a dataframe and matrices.
3. Trains regression models using three different learners.
4. Plots the training values (taken from the data set) and the predicted values for each learner side by side.
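The four template steps above can be approximated in plain Python to show the shape of the pipeline. This is a sketch, not the template's code: the real template uses Pandas, NumPy, Sklearn, and Matplotlib and downloads a real data set, whereas this version assumes a small hypothetical in-line CSV, swaps in three simple learners (a mean baseline, least-squares linear regression, and k-nearest-neighbors regression), and prints predictions instead of plotting them.

```python
import csv
import io
from statistics import mean

# Step 1 (stand-in): the real template fetches an open data set; this small
# hypothetical CSV plays that role so the sketch runs offline.
RAW_CSV = """day,price
1,10.0
2,10.5
3,
4,11.4
5,12.1
6,bad
7,13.0
"""

def clean(text):
    """Step 2: parse the CSV and drop rows whose values are missing or non-numeric."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        try:
            rows.append((float(record["day"]), float(record["price"])))
        except ValueError:
            continue  # skip blank or malformed values
    return rows

def fit_mean(xs, ys):
    """Baseline learner: always predict the training mean."""
    m = mean(ys)
    return lambda x: m

def fit_linear(xs, ys):
    """Ordinary least squares for y = a + b * x."""
    mx, my = mean(xs), mean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return lambda x: a + b * x

def fit_knn(xs, ys, k=2):
    """k-nearest-neighbors regression: average the k closest training targets."""
    pairs = list(zip(xs, ys))
    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return mean(y for _, y in nearest)
    return predict

data = clean(RAW_CSV)
xs, ys = [x for x, _ in data], [y for _, y in data]

# Step 3: train three different learners on the same data.
learners = {"mean": fit_mean(xs, ys), "linear": fit_linear(xs, ys), "knn": fit_knn(xs, ys)}

# Step 4 (stand-in for the plot): training value vs. each learner's prediction.
for x, y in data:
    preds = "  ".join(f"{name}={model(x):.2f}" for name, model in learners.items())
    print(f"day={x:.0f} actual={y:.2f}  {preds}")
```

Keeping each learner behind the same `x -> prediction` interface is what makes the side-by-side comparison in step 4 trivial, which mirrors how the template compares its learners.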

Use the script in the GitHub repository, or create a regression script from its template by going to File -> New Project -> Python -> Machine Learning -> Regression.

![regression](https://github.com/eclectir/Visual-Studio-Data-Science-Demos/raw/master/images/py-env-window.PNG)

Launch the Python Environments window by navigating to Tools -> Python Tools -> Python Environments. Select the Anaconda environment as the default global environment.

![anaconda](https://github.com/eclectir/Visual-Studio-Data-Science-Demos/raw/master/images/py-env-window.PNG)
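After selecting Anaconda, a short script run from the Python Interactive Window can confirm which interpreter is active and whether the packages the regression template relies on are importable. This is a sketch: `describe_environment` is a hypothetical helper, and the `conda-meta` check is a heuristic assumption (conda-based installs such as Anaconda normally keep that folder in the environment prefix), not an official detection API.

```python
import importlib.util
import os
import sys

def describe_environment(packages=("pandas", "numpy", "sklearn", "matplotlib")):
    """Hypothetical helper: report the running interpreter and which packages import."""
    return {
        "executable": sys.executable,
        "version": sys.version.split()[0],
        # Heuristic assumption: conda-based environments keep a conda-meta folder in the prefix.
        "looks_like_conda": os.path.isdir(os.path.join(sys.prefix, "conda-meta")),
        # find_spec returns None when a top-level package is not installed.
        "packages": {name: importlib.util.find_spec(name) is not None for name in packages},
    }

if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```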

Now walk through the code. Highlight how the script uses some of the key scientific computing packages: Pandas, NumPy, Sklearn, and Matplotlib. Show how three different regression learners are trained in the model training step (steps are marked in the code with comments).

Run the script to generate the plot. The takeaway is that one can get from zero to insight using the samples and the Anaconda distro installed with one click. No configuration was needed to run a reasonably insightful analysis, and the script is easily customizable to read data from a different source.
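As an illustration of swapping the data source, a loader like the one below could replace the script's fetch step. It is a minimal sketch assuming the new source is a CSV file given either as a local path or as an http(s) URL; `load_rows` is a hypothetical helper, not part of the template.

```python
import csv
import io
import urllib.request

def load_rows(source, encoding="utf-8"):
    """Hypothetical helper: read CSV rows as dicts from a local path or an http(s) URL."""
    if source.startswith(("http://", "https://")):
        with urllib.request.urlopen(source) as response:
            text = response.read().decode(encoding)
    else:
        with open(source, encoding=encoding, newline="") as handle:
            text = handle.read()
    # Each row becomes a dict keyed by the CSV header; values stay as strings.
    return list(csv.DictReader(io.StringIO(text)))
```

For example, `rows = load_rows("my_prices.csv")` (a hypothetical file) returns a list of dicts; converting values to numbers and dropping blanks would remain the script's cleaning step.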


## Plotting and Sharing
The regression sample generates three regression plots in the same window. In this step, focus on the tools available in the plot window, e.g., pan, zoom, and export diagram to PDF.

## Azure Notebooks

[Azure Notebooks](http://notebooks.azure.com) is a new Azure Service (Preview) that hosts Jupyter Notebooks in the Cloud. The Jupyter Notebook is a web application that allows you to create and share documents that contain live code, equations, visualizations and explanatory text. Uses include: data cleaning and transformation, numerical simulation, statistical modeling, machine learning and much more.

![image](https://jupyter.org/assets/jupyterpreview.png)
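Under the hood, each notebook is a JSON document in the Jupyter notebook format (`.ipynb`). The sketch below writes a minimal notebook with one markdown cell and one code cell using only the standard library. The field layout follows the nbformat version 4 structure as we understand it; `make_notebook` and `demo.ipynb` are hypothetical names, and for real work the `nbformat` package can validate a notebook against the official schema.

```python
import json

def make_notebook(markdown_text, code_text):
    """Hypothetical helper: build a minimal nbformat-4 notebook with two cells."""
    return {
        "cells": [
            {"cell_type": "markdown", "metadata": {}, "source": markdown_text},
            {
                "cell_type": "code",
                "execution_count": None,  # the cell has not been run yet
                "metadata": {},
                "outputs": [],
                "source": code_text,
            },
        ],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 2,
    }

notebook = make_notebook("# Stock regression\nNotes on the analysis.", "print(1 + 1)")
with open("demo.ipynb", "w", encoding="utf-8") as handle:
    json.dump(notebook, handle, indent=1)
```

A file like this can be uploaded to a notebook library and then shared, as described below.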

The service also provides a simple, intuitive experience for creating libraries of notebooks and sharing them via a URL. It supports the Python, R, and F# languages.


In a long-form demo, it might be worth going through one of the many samples on the homepage. Check out this notebook to learn how to demo Azure Notebooks to your customers.