## Prerequisites

  * A Google Cloud project with billing enabled.
  * The Vertex AI API and Dataproc API enabled in your project.
  * The IAM permissions needed to create and manage Vertex AI Workbench instances.
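
The APIs listed above can also be enabled from a terminal instead of the console. The following is a minimal sketch, assuming the `gcloud` CLI is installed and authenticated. It builds a `gcloud services enable` command in Python and only prints it unless `dry_run` is turned off. The project ID `my-project-id` is a placeholder, the helper names are illustrative, and including `notebooks.googleapis.com` is an assumption that Workbench instances need the Notebooks API as well.

```python
import shlex
import subprocess

# Service names for the APIs listed above. notebooks.googleapis.com is an
# assumption: Workbench instances are served by the Notebooks API.
REQUIRED_SERVICES = [
    "aiplatform.googleapis.com",  # Vertex AI API
    "dataproc.googleapis.com",    # Dataproc API
    "notebooks.googleapis.com",   # Notebooks API (assumed to be needed)
]


def build_enable_command(project_id, services=REQUIRED_SERVICES):
    """Return the gcloud command (as an argument list) that enables the APIs."""
    return ["gcloud", "services", "enable", *services, f"--project={project_id}"]


def enable_services(project_id, dry_run=True):
    """Print the command; run it only when dry_run is False."""
    command = build_enable_command(project_id)
    print(shlex.join(command))
    if not dry_run:
        subprocess.run(command, check=True)
    return command


if __name__ == "__main__":
    enable_services("my-project-id")  # placeholder project ID; dry run only
```

Keeping the dry run as the default makes it easy to inspect the command before anything changes in the project.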


## UI: Vertex AI Workbench setup

### 1. Navigating to the Vertex AI Workbench service in the Google Cloud Console

1.  **Sign in to Google Cloud Console:** Go to [console.cloud.google.com](https://console.cloud.google.com/) and log in with your Google account.
2.  **Select your Project:** Ensure you have selected the correct Google Cloud project from the project selector dropdown at the top of the console.
3.  **Navigate to Vertex AI Workbench:**
      * In the Google Cloud Console, use the **Navigation menu** (three horizontal lines, usually on the top left).
      * Scroll down and select **Vertex AI**.
      * Under the "Workbench" section in the left-hand navigation, click **Instances**.

 
### 2. Creating a Workbench Instance with a Spark Runtime Environment

Google has largely merged the older "Managed Notebooks" and "User-Managed Notebooks" products into "Vertex AI Workbench Instances", which offer options for Dataproc Serverless integration. This guide creates one of these instances with Spark capabilities.

1.  **Click "Create New":** On the Vertex AI Workbench Instances page, click the **"Create new"** button.
2.  **Configure Instance Details:**
      * **Name:** Provide a unique name for your instance (e.g., `spark-workbench`).
      * **Region and Zone:** Choose a region and zone that is geographically close to you for better performance (e.g., `europe-west1` and `europe-west1-b`).
        ![image.png](attachment:image.png)
      * **Environment:**
          * **Environment:** Typically, you'll choose a pre-configured environment like "TensorFlow Enterprise," "PyTorch," or a general "Python" environment. Many of these will come with PySpark and Dataproc Serverless integration.
          * **JupyterLab version:** You can choose the latest version, including JupyterLab 4 (Preview) if available, for the newest features.
            ![image-2.png](attachment:image-2.png)
      * **Machine configuration:**
          * **Machine type:** Select a machine type. For Spark work, an `n1-standard-4` (4 vCPUs, 15 GB memory) or larger is a good starting point. You can change the machine type later if the computations that run on the instance itself need more resources.
          * **GPUs:** You likely don't need GPUs for *Spark* processing itself, but you might add them if you plan to do deep learning on the Workbench VM.
            ![image-3.png](attachment:image-3.png)
      * **Disk:** Increase the boot disk size if you expect to store a lot of data directly on the instance.
        ![image-4.png](attachment:image-4.png)


  
3.  **Enable Dataproc Serverless Integration:**
      * **Crucially for Spark**, look for an option such as "**Enable Dataproc Serverless Interactive Sessions**", usually under "Advanced options" or directly in the main creation dialog, and **make sure its checkbox is selected.** This is what allows your PySpark kernel to use Dataproc Serverless for distributed processing.
4.  **Network and Security (Optional but important):**
      * You can configure network settings (VPC network, subnet) and security options (root access, idle shutdown). For most practical purposes the defaults are fine, but for production or specific security requirements you would customize these. **Idle shutdown is a good way to save costs!**
5.  **Click "Create":** Review your settings and click the **"Create"** button.

The instance creation process will take a few minutes. You'll see its status change from "Provisioning" to "Starting" to "Running."
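
The console steps above have a command-line counterpart. As a rough sketch, the Python below assembles a `gcloud workbench instances create` command from the same choices (name, zone, machine type) without running it. The `WorkbenchConfig` class is hypothetical, the values are the examples used in this guide plus a placeholder project ID, and the flags should be checked against `gcloud workbench instances create --help` before use. The Dataproc Serverless option and idle shutdown are not covered here.

```python
from dataclasses import dataclass
import shlex


@dataclass
class WorkbenchConfig:
    """Hypothetical container for the settings chosen in the console."""
    name: str = "spark-workbench"
    zone: str = "europe-west1-b"
    machine_type: str = "n1-standard-4"
    project_id: str = "my-project-id"  # placeholder project ID

    @property
    def region(self):
        # A zone is the region name plus a one-letter suffix,
        # so stripping the last "-x" part gives the region.
        return self.zone.rsplit("-", 1)[0]

    def create_command(self):
        """Return the gcloud invocation as an argument list (not executed)."""
        return [
            "gcloud", "workbench", "instances", "create", self.name,
            f"--location={self.zone}",
            f"--machine-type={self.machine_type}",
            f"--project={self.project_id}",
        ]


config = WorkbenchConfig()
print(config.region)
print(shlex.join(config.create_command()))
```

Returning an argument list rather than a single string keeps the command safe to pass to `subprocess.run` later without shell quoting issues.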




## Exploring the JupyterLab Interface within Workbench

1.  **Open JupyterLab:** Once your instance status is "Running," click the **"Open JupyterLab"** link next to your instance name on the Vertex AI Workbench Instances page. This will open a new browser tab with the JupyterLab interface.

2.  **Familiarize Yourself with JupyterLab:**

      * **File Browser (Left Sidebar):** This is where you navigate your files and folders on the instance's persistent disk. You can create new notebooks, folders, text files, and upload files from your local machine.
      * **Launcher:** When you first open JupyterLab, you'll often see the "Launcher" tab. This allows you to quickly create new notebooks with different kernels, open a terminal, or start other activities.
      * **Notebook Area (Main Content):** This is where your Jupyter notebooks open and where you write and execute code.
      * **Kernels:** In the top right of an open notebook, you'll see the active kernel (e.g., "Python 3"). You can change the kernel by going to `Kernel` -> `Change Kernel` in the menu.
      * **Terminal:** You can open a terminal (`File` -> `New` -> `Terminal`) to run shell commands, install additional Python packages (using `pip` or `conda`), use the `gcloud` CLI, or clone Git repositories.
      * **Extensions:** JupyterLab has many extensions. You might find extensions for browsing Cloud Storage, BigQuery integration, and even Dataproc Serverless session management directly within the UI.
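
Once a notebook is open, a quick way to see what the environment provides is to check for the tools mentioned above. The following is a minimal sketch that uses only the Python standard library. It assumes nothing about the image beyond a Python 3 kernel, and `check_environment` is an illustrative helper name. Whether `pyspark`, `gcloud`, or `conda` is actually present depends on the environment you selected.

```python
import importlib.util
import shutil
import sys


def check_environment(packages=("pyspark",), tools=("gcloud", "pip", "conda", "git")):
    """Report which Python packages and command-line tools are available."""
    report = {}
    for package in packages:
        # find_spec returns None when a top-level package cannot be found.
        report[f"package:{package}"] = importlib.util.find_spec(package) is not None
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH.
        report[f"tool:{tool}"] = shutil.which(tool) is not None
    return report


print(f"Python {sys.version.split()[0]}")
for item, available in check_environment().items():
    print(f"{item:<16} {'found' if available else 'missing'}")
```

Running this in the first cell of a new notebook shows at a glance whether a missing package needs to be installed from the terminal.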
