# Getting Started

This is a Jupyter notebook, which uses the `.ipynb` file extension. **Jupyter Notebooks** are interactive REPL (Read-Eval-Print Loop) environments that let colleagues share work consistently and reproducibly, including in real-time sessions. Code blocks may be saved together with their outputs **or** altered and re-run in future sessions.

## Accessing Jupyter Notebooks

To access the `.ipynb` file-type, there are a few common options which we'll explore:
- Jupyter Server
- VS Code's Jupyter extension

One potential benefit that offsets the overhead of hosting a Jupyter Server is that it can let multiple team members access and edit a notebook in a shared session. This technique is commonly employed to streamline Big Data operations among Titans of Industry. For more information, check out the list of current users on the __[Official Site](https://jupyter.org/)__.

The listed enterprise users include the following:
- Microsoft
- Google
- Oracle
- IBM
- NASA
- Bloomberg

Microsoft is notorious for __[*"eating their own dog food"*](https://en.wikipedia.org/wiki/Eating_your_own_dog_food)__ and they've found uses for Jupyter notebooks in their operations!

## Blocks

Each Jupyter notebook consists of units called "blocks" (Jupyter's own documentation calls them "cells"). A **block** may either be a ***markdown block*** for storing text or a ***code block*** for storing code written in one of various interactive programming languages, including Python, R, and Julia.

<!-- NOTE: There's actually a lot of supported REPL languages -->

When a block is run, a markdown block renders its formatted text, while a code block is evaluated and any relevant output is printed. Even without an explicit `print`, a code block displays the value of its last line if that line is an expression rather than a variable assignment.
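As a minimal sketch of that last point, the code block below ends with a bare expression instead of a call to `print`. The variable name `total` is only an illustration.

```python
# An assignment on its own produces no output in a notebook
total = 2 + 3

# A bare expression on the last line of a code block is displayed
# automatically by the notebook (here it would show 5)
total
```

Run as a plain `.py` script, the same code prints nothing, because displaying the last expression is a feature of the interactive session rather than of the language.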

This document/notebook/workbook structure succeeds in hosting documentation and code alongside each other without introducing excessive comments in the code which may make it more difficult to read.

<!-- Bonus: Some of the code includes excessive comments -->

## Where's the Value?

> "The proof is in the pudding"

The reality is that VBA (and similar *scripting* equivalents) codebases quickly become unmanageable and unmaintainable.

Python encourages readability, extensibility, and cross-platform operability. Guido van Rossum, the creator of the language, favors self-documenting code over clever code where possible.

With respect to the business logic, there is often only one product expert who understands the **intended** functionality of a VBA codebase, and that single point of failure can have real-world consequences for business operations.

If you are currently maintaining a VBA codebase, I would ask you to consider the possibility of migrating your code from VBA to Python. VBA is filled with anti-patterns and antiquated conventions, and it doesn't fully support modern OOP (Object-Oriented Programming) paradigms.

For instance, to remove columns in VBA, one is forced to iterate over the columns in reverse, comparing each header cell's value against a list of names (or indexes) stored in a Variant. The reverse loop is required because deleting a column shifts every column to its right one position to the left, so a forward loop would skip the column that slides into the deleted position. More generally, it's frowned upon to modify a collection while iterating over it.

This approach carries the likelihood of creating bugs by introducing overly complicated boilerplate code.
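The pitfall is not specific to VBA. The sketch below reproduces it in Python with a plain list standing in for worksheet columns. The function names and the sample headers are hypothetical and chosen only for illustration.

```python
def delete_forward(headers, to_delete):
    """Naive forward pass: deletes while advancing the index."""
    cols = list(headers)
    i = 0
    while i < len(cols):
        if cols[i] in to_delete:
            del cols[i]  # everything to the right shifts one place left
        i += 1  # the index still advances, skipping the item that slid into i
    return cols


def delete_reverse(headers, to_delete):
    """Reverse pass: deletions only shift items that were already checked."""
    cols = list(headers)
    for i in range(len(cols) - 1, -1, -1):
        if cols[i] in to_delete:
            del cols[i]
    return cols


headers = ["Keep", "Delete", "These", "Also keep"]
to_delete = {"Delete", "These"}

print(delete_forward(headers, to_delete))  # ['Keep', 'These', 'Also keep']
print(delete_reverse(headers, to_delete))  # ['Keep', 'Also keep']
```

The forward pass leaves `"These"` behind because it moved into the position that had just been examined, which is exactly the kind of bug the reverse loop exists to avoid.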

Another potential benefit is that colleges today typically teach either Java or Python. Python is a cross-platform glue language used by students, teachers, IT professionals, data scientists, analysts, and web developers alike. It is therefore ubiquitous and will likely see continued use among professionals thanks to its ease of use and broad prototyping applications.

## Code Comparison

As an example, consider the following code samples, which perform the same operation in both VBA and Python: removing a collection of columns by their header names.

### VBA

In this sample, a subroutine named `ReverseIterateAndDeleteColumns()` is defined. It takes no arguments and iterates in reverse through the columns of the worksheet `Sheet1`, deleting every column whose header appears in a Variant array of column names.

<!-- syntax highlighting: https://docs.github.com/en/get-started/writing-on-github/working-with-advanced-formatting/creating-and-highlighting-code-blocks

supported languages: https://github.com/github-linguist/linguist/blob/master/lib/linguist/languages.yml -->

```vba
' Require every variable to be declared explicitly before use
Option Explicit

Sub ReverseIterateAndDeleteColumns()
    Dim ws As Worksheet
    Dim rng As Range
    Dim colCount As Long
    Dim colArray As Variant

    ' reference the target worksheet (string literals need double quotes in VBA)
    Set ws = ThisWorkbook.Sheets("Sheet1")

    ' define a list of column headers to delete
    colArray = Array("Delete", "These")

    ' iterate in reverse so that deleting a column does not shift the columns still to be checked
    For colCount = ws.Cells(1, ws.Columns.Count).End(xlToLeft).Column To 1 Step -1
        Set rng = ws.Cells(1, colCount)
        If Not IsError(Application.Match(rng.Value, colArray, 0)) Then
            ' header value is in the array, so delete the entire column
            rng.EntireColumn.Delete
        End If
    Next colCount
End Sub
```

### Python

By comparison, the same operation using Python + pandas + openpyxl may appear significantly more intuitive and direct.

```python
# import library dependencies (pandas relies on openpyxl to read .xlsx files)
from pathlib import Path

import pandas as pd

# create file path accessor for the workbook
workbook = Path("../data/Spreadsheet.xlsx")

# read the contents of the first sheet into a DataFrame
df = pd.read_excel(workbook)

# declare keep vs delete columns (for the two alternative approaches)
keep_columns = ["Do", "Not", "Remove"]
delete_columns = ["Delete", "These"]

# SELECT what you want to keep rather than EXCLUDING what you don't want
new_df = df[keep_columns]

# OR feel free to REMOVE the unwanted columns by name
# "Have it YOUR way"
new_df = df.drop(columns=delete_columns)
```

The Python portion was deliberately extended to cover multiple approaches so that its length would be comparable to the VBA version for similar operations, and it is still shorter.
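To experiment with column removal without a spreadsheet on disk, a small in-memory DataFrame can stand in for the workbook. This is only a sketch: the sample data and the extra `"Missing"` column name are assumptions made for illustration. It uses `DataFrame.drop` with `errors="ignore"`, which skips names that are not present, much like the VBA loop, which only deletes headers it actually finds.

```python
import pandas as pd

# Hypothetical in-memory data standing in for ../data/Spreadsheet.xlsx
df = pd.DataFrame(
    {
        "Do": [1, 2],
        "Delete": ["a", "b"],
        "Not": [3, 4],
        "These": ["c", "d"],
        "Remove": [5, 6],
    }
)

# "Missing" is deliberately not a column in df
delete_columns = ["Delete", "These", "Missing"]

# drop returns a new DataFrame and leaves df itself untouched;
# errors="ignore" silently skips column names that do not exist
trimmed = df.drop(columns=delete_columns, errors="ignore")

print(list(trimmed.columns))  # ['Do', 'Not', 'Remove']
print(list(df.columns))  # ['Do', 'Delete', 'Not', 'These', 'Remove']
```

Without `errors="ignore"`, pandas raises a `KeyError` for the unknown name, which is often the safer default when a missing column would indicate a problem with the input file.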

## A Modest Proposal

If you're still not convinced, please follow along with this series. In the following entries, I will demonstrate common applications for a full-fledged, Swiss Army knife data science and analysis toolkit in daily business operations.

```python
# Our First Code Block!
# a simple, but effective ritual required of any introduction

print('Hello World!')
```