# Why use Jupyter Notebooks?

## 1. Prototyping Workflows

Jupyter Notebooks are extremely useful when you do not yet have a defined final process and are still prototyping your scripted workflow. This is mainly because code is written in cells, each of which can be executed on its own while sharing the same session state, so variables created by earlier cells remain available. A Python user can therefore quickly test a specific step in a sequential workflow without re-executing the script from the beginning.

Many Integrated Development Environments (IDEs) allow you to do this in several ways, but I’ve found Jupyter Notebook’s concept of a “code cell” to be the most intuitive approach for prototyping logic and sequential code.

## 2. Visualizing Pandas Dataframes

Pandas (Python Data Analysis Library) provides high-performance, easy-to-use data structures that let you work with large amounts of data quickly. The core data object is a Dataframe, which is essentially an in-memory table that supports powerful indexing operations.
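
As a minimal sketch of what those indexing operations look like, the example below builds a small table and selects from it by label and by condition. The column names and values are made up for illustration and do not come from any real dataset.

```python
import pandas as pd

# Hypothetical sample data, invented for this illustration.
df = pd.DataFrame(
    {
        "city": ["Austin", "Boston", "Denver", "Fresno"],
        "population": [950_000, 650_000, 710_000, 540_000],
        "area_km2": [790.0, 232.0, 401.0, 297.0],
    }
).set_index("city")

# Label-based lookup: fetch one row by its index value.
denver = df.loc["Denver"]

# Boolean indexing: keep only the rows that satisfy a condition.
large = df[df["population"] > 700_000]

# Array-wide operation: compute a new column for every row at once.
df["density"] = df["population"] / df["area_km2"]
```

None of these steps requires writing a loop over the rows; the selection and the arithmetic are each expressed once for the whole table.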

Jupyter Notebook allows you to visualize these tables at any point in your notebook. This is extremely useful because you can view the state of your data (and the effect of all the actions your code is performing on your data) as each step of your logic executes. This capability reinforces the use of Jupyter Notebook in a prototyping workflow when you are attempting to confirm that your workflow is doing what it needs to do at each step of the way.

In [1]:

    import numpy as np
    import pandas as pd

In [2]:

    df = pd.DataFrame(np.random.randn(50, 20))

In [3]:

    df

|  | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | -1.117332 | 0.245104 | 1.735087 | 0.423161 | -0.076686 | -1.207895 | -0.03996 | -1.15932 | -0.948683 | 1.260157 | -1.176725 | 0.287948 | -1.043284 | -1.406331 | -0.54176 | -1.010286 | 0.745454 | -1.942218 | 0.443759 | 1.717936 |
| 1 | -0.902458 | -1.083397 | -0.446138 | -2.05306 | -0.206858 | -1.349132 | -0.306244 | 1.085071 | -2.102071 | 1.548899 | -1.29629 | -0.261002 | 0.029891 | -0.353946 | -0.630655 | -2.07379 | 0.099171 | -0.445244 | -0.085338 | 1.404917 |
| 2 | -0.292235 | 0.339557 | -0.572186 | 0.604958 | 0.246008 | -0.887378 | -2.24459 | 0.163571 | -0.203427 | 0.331327 | 0.017082 | -2.68635 | 0.280029 | 1.940812 | 1.024429 | 0.849074 | -1.767332 | -0.855655 | -1.173457 | 0.856602 |
| 3 | 0.839714 | -0.532032 | -1.084085 | -0.705373 | -0.231294 | -1.976499 | -0.231121 | -1.515223 | -1.191587 | -0.38852 | 1.132192 | 0.33129 | -0.236725 | -1.652234 | 0.956964 | 2.031699 | -1.597248 | -2.43349 | -0.422138 | -1.109396 |
| 4 | -0.682158 | 0.555097 | -1.302506 | 0.794329 | -1.108163 | -1.725008 | 0.767457 | 0.076811 | -1.757391 | 1.544285 | -0.802115 | -0.886394 | -0.423693 | -0.677775 | 1.459044 | -0.393353 | -1.088426 | 1.255134 | -0.17413 | 1.575953 |
| 5 | 0.74491 | -0.682966 | -0.557394 | -1.878702 | -0.743584 | 0.991557 | -2.055975 | -0.026674 | -0.098646 | -2.012554 | 0.81648 | -0.903573 | 2.788787 | -0.026275 | -0.255704 | 0.795129 | -0.722173 | 0.536471 | -0.143526 | -2.208673 |
| 6 | -1.062396 | 0.785216 | -1.870075 | -0.488419 | -2.317813 | -0.035194 | 1.764611 | -0.964631 | -0.504796 | 0.058558 | 1.432328 | 0.773008 | 1.508041 | -0.673675 | -0.501238 | -0.692182 | 0.598952 | 0.014178 | 0.513849 | 0.773536 |
| 7 | -0.655858 | -1.159901 | -0.167051 | 1.411258 | 0.768247 | 1.183202 | 0.456666 | -1.087267 | 0.435837 | 0.680023 | -0.590948 | 0.193141 | 0.474004 | 0.634106 | 0.474631 | -0.443987 | 0.079263 | 0.938627 | 0.63866 | -0.568577 |
| 8 | 0.315063 | 1.330623 | -0.157259 | 0.836876 | -0.238589 | -0.117982 | -0.100321 | 0.43294 | 0.213147 | 1.605562 | 1.018568 | -0.321297 | 0.022205 | 0.508014 | 1.404787 | 0.097984 | 0.312863 | -0.032928 | 0.538255 | 0.888724 |
| 9 | 0.062659 | -2.033388 | -1.622806 | 0.431904 | -0.955607 | 0.093652 | -0.829268 | -0.207838 | 0.087856 | 0.880506 | -0.105871 | -1.137043 | -0.05815 | 0.450034 | 1.706417 | -0.517162 | 2.22359 | -0.755476 | 0.741451 | -1.303309 |


### So why are Pandas Dataframes such a big deal?

For a GIS user, the first foray into working with Python and GIS data management typically involves some mix of arcpy’s “CalculateField”, “SearchCursors”, and “UpdateCursors”. Most examples teach these operations, and they are all fully functional, but they share the same process-intensive issue: each one must iterate over every record of your data to perform a data management operation.

In other words: imagine that you are directing a movie in production, and you find out that to change the lighting in a scene, you need to watch the movie from the very beginning… for every change. This would take forever!

Operating on a Pandas Dataframe solves this with powerful indexing that allows efficient querying and array-wide operations. You essentially find the specific scene of the movie that you need to fix, and skip to that scene. Once my GIS data analysis workflows started integrating Pandas Dataframes into heavy data operations, I saw dramatic improvements in performance.

Visualizing these Dataframes and seeing the effects of my code on each dataset became a crucial component of working efficiently.
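
The contrast between record-by-record iteration and an array-wide operation can be sketched in pandas alone. In this hedged example, a plain Python loop over rows stands in for a cursor (it is not arcpy, and the `length_ft` column and its values are invented for illustration). Both approaches produce the same result, but the array-wide version hands the whole column to pandas in a single expression.

```python
import numpy as np
import pandas as pd

# Hypothetical table: 10,000 rows with a made-up "length_ft" column.
rng = np.random.default_rng(0)
df = pd.DataFrame({"length_ft": rng.uniform(1, 1000, size=10_000)})

# Record-by-record: visit every row in turn, as a cursor would.
looped = []
for _, row in df.iterrows():
    looped.append(row["length_ft"] * 0.3048)
df["length_m_loop"] = looped

# Array-wide: one expression applied to the entire column.
df["length_m_vec"] = df["length_ft"] * 0.3048

# Indexed query: select only the rows of interest and update just those.
df["category"] = "short"
long_rows = df["length_m_vec"] > 250
df.loc[long_rows, "category"] = "long"
```

Wrapping each approach in `%timeit` in its own notebook cell is one way to compare them on your own data.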

### Let's do a race!

#### Contestant 1: Arcpy's UpdateCursor

#### Contestant 2: Pandas Dataframe

## 3. Integration with ArcGIS using the API for Python!

The newest (and most exciting) reason is the integration of Jupyter Notebooks with the ArcGIS Platform. My two main production tools had long been the ArcGIS Platform and Jupyter Notebook. When Esri announced that the ArcGIS API for Python would provide support for geographic visualizations, organization administration, and even access to the most powerful analytical capabilities of the platform within Jupyter Notebooks, I literally could not stop smiling.