# Tutorial: create and run your own Jupyter notebook with R 

This tutorial walks you through using Azure Notebooks to create a complete Jupyter notebook. Along the way, you familiarize yourself with the Jupyter notebook user interface, including creating different kinds of cells, running cells, and presenting the notebook. It is adapted from an Azure guide that focused on Python; this version uses R.

The tutorial begins with a new project and an empty notebook so you can experience creating it step by step.

## Create the project

1. Go to [Azure Notebooks](https://notebooks.azure.com) and sign in.

1. From your public profile page, select **My Projects** at the top of the page:

    ![My Projects link on the top of the browser window](media/my-projects-link.png)

1. On the **My Projects** page, select **+ New Project** (keyboard shortcut: n); the button may appear only as **+** if the browser window is narrow:

    ![New Project command on My Projects page](media/new-project-command.png)

1. In the **Create New Project** popup that appears, enter or set the following details, then select **Create**:

    - **Project name**: Linear Regression Example - Cricket Chirps
    - **Project ID**: linear-regression-example
    - **Public project**: (cleared)
    - **Create a README.md**: (cleared)

1. After a few moments, Azure Notebooks navigates you to the new project.


## Upload the data file

1. On your project dashboard in Azure Notebooks, select **Upload** > **From URL**.

1. In the popup, enter the following URL in **File URL** and *cricket_chirps.csv* in **File Name**, then select **Done**.

    ```url
    https://raw.githubusercontent.com/Microsoft/AzureNotebooks/master/Samples/Linear%20Regression%20-%20Cricket%20Chirps/cricket_chirps.csv
    ```

1. The *cricket_chirps.csv* file should now appear in your project's file list:

    ![Newly created CSV file showing in the project file list](media/csv-file-in-project.png)

1. You can also upload files from your computer by selecting **Upload** > **From Computer**.
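
If you'd rather fetch the file from code than through the upload dialog, a sketch along these lines could be run in an R code cell once a notebook is open. The helper name `fetch_if_missing` is hypothetical, and the sketch assumes that the notebook's working directory is the project folder and that the kernel has outbound network access; neither is guaranteed.

```r
# URL of the sample data set used in this tutorial.
data_url <- paste0(
  "https://raw.githubusercontent.com/Microsoft/AzureNotebooks/master/",
  "Samples/Linear%20Regression%20-%20Cricket%20Chirps/cricket_chirps.csv"
)

# Hypothetical helper: download `url` to `dest` only when `dest` does not
# already exist, so rerunning the cell does not fetch the file again.
fetch_if_missing <- function(url, dest) {
  if (!file.exists(dest)) {
    download.file(url, destfile = dest, mode = "wb")
  }
  invisible(dest)
}

# Example call (needs network access, so it is left commented out here):
# fetch_if_missing(data_url, "cricket_chirps.csv")
```

Checking `file.exists` first keeps the cell safe to rerun with **Run All**, because an existing copy of the file is left untouched.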

## Create and run a notebook

Now create and open the notebook.

1. On the project dashboard, select **+ New** > **Notebook**.
1. In the popup, enter *Linear Regression Example - Cricket Chirps.ipynb* for **Item Name**, choose **R** for the language, then select **New**.
1. After the new notebook appears on the file list, select it to start the notebook. A new browser tab opens automatically.
1. The notebook opens in the Jupyter interface with a single empty code cell as the default.


## Create a Markdown cell

1. Click into the first empty cell shown on the notebook canvas. By default, a cell is a **Code** type, which means it's designed to contain runnable code for the selected kernel (which is R for us). The current type is shown in the type drop-down on the toolbar:

    ![Cell type toolbar drop-down](media/tutorial-cell-type-drop-down.png)

1. Change the cell type to **Markdown** using the toolbar drop-down; alternatively, use the **Cell** > **Cell Type** > **Markdown** menu command:

    ![Cell type menu command](media/tutorial-cell-type-menu.png)

1. Click into the cell to start editing, then enter the following Markdown:


    ```markdown
    # Example of Linear Regression

    This notebook loads some data, does a plot, calculates a correlation coefficient and fits a linear regression.
    The data, obtained from
    [college.cengage.com](https://college.cengage.com/mathematics/brase/understandable_statistics/7e/students/datasets/slr/frames/frame.html),
    relates the rate of cricket chirps to temperature from *The Song of Insects*, by Dr. G. W. Pierce, Harvard College Press.

    In this example we're looking at the relationship between the count of chirps per minute and temperature.

    A useful aspect of Notebooks is that you can use Markdown cells to explain what the code is doing rather than code comments.
    There are several benefits to doing so:

    - Markdown allows for richer text formatting, like *italics*, **bold**, `inline code`, hyperlinks, and headers.
    - Markdown cells automatically word wrap whereas code cells do not. Code comments typically use explicit line breaks for formatting, but that's not necessary in Markdown.
    - Using Markdown cells makes it easier to run the Notebook as a slide show.
    - Markdown cells help you remove lengthy comments from the code, making the code easier to scan.

    When you run a code cell, Jupyter executes the code; when you run a Markdown cell, Jupyter renders all the formatting into text that's suitable for presentation.
    ```

1. To render the Markdown as HTML for the browser, select the **Run** command on the toolbar, or use the **Cell** > **Run Cells** command. The formatting and links now appear as you would expect them to in a browser.

1. To edit the Markdown again, double-click in the rendered cell. To render HTML again after making changes, run the cell.

## Create a code cell with commands

Code cells contain commands that the kernel runs when you run the cell. You can use almost any R command, and more besides. (Jupyter notebooks run within a Linux virtual machine, so you have the full Linux command set to work with should you wish.)

1. Enter the commands below in the code cell that appeared after you used **Run** on the previous Markdown cell. If you don't see a new cell, create one with **Insert** > **Insert Cell Below** or use the **+** button on the toolbar.

    ```r
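    # Load the uploaded CSV file into a data frame.
    mydta <- read.csv("cricket_chirps.csv")

    # Show the first six rows to confirm the data loaded.
    head(mydta)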
    ```

1. Before running the cell, create a new cell with the **+** button on the toolbar, set it to Markdown, and enter the following explanation:

    ```
    Note that when you run a code block, it may take the notebook a little time to complete the task. To the left of the code block you see `In [*]` to indicate that execution is happening. The Notebook's kernel indicator on the upper right also shows a filled-in circle to indicate "busy."
    ```

1. Select the **Cell** > **Run All** command to run all the cells in the notebook. Notice that the Markdown cells render as HTML and the commands run in the kernel. Observe the kernel indicator as described in the Markdown itself.
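
Once the data cell has run, a few base R functions give a quick check that the file loaded as expected. The sketch below is self-contained: rather than reading the real `cricket_chirps.csv`, it writes a tiny stand-in CSV to a temporary file, with made-up values and assumed column names `Chirps` and `Temperature`.

```r
# Stand-in data: illustrative values only, NOT the real cricket data set.
tmp_csv <- tempfile(fileext = ".csv")
writeLines(c("Chirps,Temperature",
             "15.0,70.5",
             "17.0,78.0",
             "20.0,88.5"), tmp_csv)

demo <- read.csv(tmp_csv)

names(demo)    # column names as R parsed them from the header row
dim(demo)      # number of rows, then number of columns
str(demo)      # type of each column; both are numeric here
summary(demo)  # min, quartiles, mean, and max for each column
```

With the real file, substitute `mydta` for `demo`. If `names(mydta)` shows column names that differ from what a later cell expects, adjust that cell to match.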

## Create the remaining cells

To populate the rest of the notebook, you next create a series of Markdown and code cells. For each cell listed below, first create the new cell, then set the type, then paste in the content.

Although you can wait to run the notebook until you've created every cell, it's interesting to run each cell as you create it. Not all cells show output; if you don't see any errors, assume the cell ran normally.

Each code cell depends on the code that's been run in previous cells, and if you neglect to run one of the cells, later cells may produce errors. If you find that you've forgotten to run a cell, try using the **Cell** > **Run All Above** command before running the current cell.

If you see unexpected results (which you probably will!), check that each cell is set to "Code" or "Markdown" as necessary. For example, a syntax error (in R, typically an "unexpected symbol" message) usually occurs when you've entered Markdown into a Code cell.

1. Markdown cell:

    ```
    ## Plot the data

    We first run code to create a scatter plot. The plot is saved as a PNG file that can be used in other software.
    ```

1. Code cell; when run, this saves a png file with the chart.

    ```r
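    png("myplot.png")  # open a PNG file as the graphics device
    plot(mydta)        # scatter plot of the two columns
    dev.off()          # close the device so the file is written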
    ```

1. Markdown cell; when run, this shows the chart.

    ```
    The chart is shown below.
    ![scatter chart](myplot.png).

    ## Correlation

    What is Pearson's estimated correlation coefficient?
    ```

1. Code cell; when run, this cell gives R output for Pearson correlation.

    ```r
    # Pearson correlation test between the first and second columns.
    cor.test(mydta[, 1], mydta[, 2])
    ```

1. Markdown cell:

    ```
    ## Linear regression

    We can fit a linear model to the data. This assumes that a line describes the relationship between the independent (cricket chirps) and the dependent (temperature) variables. With a simple data set like the one we're using here, you can visualize the line on a simple x-y plot: the x-axis is the independent variable (chirp count in this example), and the y-axis is the dependent variable (temperature). Fitting the data means plotting all the points, then drawing the best-fit line through them.

    R's `lm` function here computes the line, which algebraically is of the form `y = a + b*x`, where b is the coefficient or slope of the line, and a is the intercept of the line at x=0.
    ```

1. Code cell; when run, this cell shows the output of a linear regression in R.

    ```r
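    # Fit temperature as a linear function of chirp rate.
    mylm <- lm(Temperature ~ Chirps, data = mydta)

    # Show the coefficients, standard errors, and R-squared.
    summary(mylm)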
    ```
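
To connect the `summary` output back to the equation `y = a + b*x`, the following sketch pulls the intercept and slope out of a fitted model, predicts a temperature for a new chirp rate, and overlays the fitted line on the scatter plot. It uses a small made-up data frame (`demo`, with assumed column names `Chirps` and `Temperature`) so that it runs on its own; the values are illustrative, not the real measurements.

```r
# Illustrative data only: NOT the real cricket measurements.
demo <- data.frame(
  Chirps      = c(14, 15, 16, 17, 18, 20),
  Temperature = c(68, 70, 74, 75, 80, 85)
)

fit <- lm(Temperature ~ Chirps, data = demo)

# coef() returns a named vector: "(Intercept)" is a, "Chirps" is b.
a <- coef(fit)[["(Intercept)"]]
b <- coef(fit)[["Chirps"]]

# Predict the temperature for a new chirp rate; equivalent to a + b * 19.
predicted <- predict(fit, newdata = data.frame(Chirps = 19))

# For simple linear regression, R-squared equals the squared Pearson correlation.
r <- cor(demo$Chirps, demo$Temperature)
r_squared <- summary(fit)$r.squared

# Scatter plot with the fitted line drawn on top, saved as a PNG file.
png(file.path(tempdir(), "fit_demo.png"))
plot(Temperature ~ Chirps, data = demo)
abline(fit)
dev.off()
```

With the tutorial's own objects, the same calls would apply to `mylm` and `mydta`, provided the column names in the data frame match the ones used in the model formula.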

## Clear outputs and rerun all cells


After following the steps in the previous section to populate the entire notebook, you've created a piece of running code in the context of a full tutorial on linear regression. This direct combination of code and text is one of the great advantages of notebooks!

Try rerunning the whole notebook now:

1. Clear all the kernel's session data and all cell output by selecting **Kernel** > **Restart & Clear Output**. This command is always a good one to run when you've completed a notebook, just to make sure that you haven't created any strange dependencies between code cells.

1. Rerun the notebook using **Cell** > **Run All**. Notice the kernel indicator is filled in while code is running.

1. If you have any code that runs for too long or otherwise gets stuck, you can stop the kernel by using the **Kernel** > **Interrupt** command.

1. Scroll through the notebook to examine the results. (If the plot doesn't appear, rerun the cell that displays it.)
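
Restarting the kernel is meant to expose dependencies between cells. A complementary pattern is to have a cell check for the objects it needs before using them. The sketch below is one possible approach, not a Jupyter or Azure Notebooks feature; the helper name `require_objects` is hypothetical, and the data frame values are made up.

```r
# Hypothetical helper: stop with a readable message if any of the named
# objects is not visible from the calling environment (for example, because
# an earlier cell was skipped after a kernel restart).
require_objects <- function(...) {
  wanted <- c(...)
  env <- parent.frame()
  present <- vapply(wanted, exists, logical(1), envir = env)
  missing_names <- wanted[!present]
  if (length(missing_names) > 0) {
    stop("Run the earlier cells first; missing: ",
         paste(missing_names, collapse = ", "), call. = FALSE)
  }
  invisible(TRUE)
}

# Simulate a session in which the data cell has already run (made-up values).
mydta <- data.frame(Chirps = c(15, 17), Temperature = c(70, 78))
require_objects("mydta")  # passes silently because mydta exists

ls()  # lists the objects currently defined in the session
```

Placing a call such as `require_objects("mydta")` at the top of a later cell turns a confusing "object not found" error into a reminder to run the earlier cells.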

## Save, halt, and close the notebook

While you're editing a notebook, you can save its current state with the **File** > **Save and Checkpoint** command or the save button on the toolbar. A "checkpoint" creates a snapshot that you can revert to at any time during the session. Checkpoints allow you to make a series of experimental changes, and if those changes don't work, you can just revert to a checkpoint using the **File** > **Revert to Checkpoint** command. An alternative approach is to create extra cells and comment out any code that you don't want to run; either way works.

You can also use the **File** > **Make a Copy** command at any time to make a copy of the current state of the notebook into a new file in your project. That copy opens in a new browser tab automatically.

When you're done with a notebook, use the **File** > **Close and Halt** command, which closes the notebook and shuts down the kernel that's been running it. Azure Notebooks then closes the browser tab automatically.


