# Jupyter Notebooks

Welcome to Jupyter Notebooks ("notebooks" for short).  A notebook is a document that contains text and code.  How do they compare to Excel worksheets?

Excel worksheets are a series of _cells_.  Each cell can contain _either_:
  * Data, or
  * Computation (code)

and cells can be organised in two dimensions.

Jupyter Notebooks are a series of _cells_.  Each cell can contain _either_:
  * Text, or
  * Computation (code)

and cells can be organised in one dimension.   You can't store data in a Jupyter text cell; you need to use a variable in a computation cell for that.

## Python

The code cells are all written in a language called Python.  Notebooks can actually support many different languages, but they are almost always used with Python (and we will always use Python).  Python and Jupyter are separate tools that combine to make the notebooks we use here.  When you want to know something about document flow, you learn about Jupyter.  When you want to know something about how to write what goes in a code cell, you learn about Python.


# Variables

In Excel you can refer to a piece of data by its cell address.

In Jupyter Notebooks you need to create a _variable_ in a code cell.  Variables have a _name_ and a _value_, just like data cells in Excel, but you have to create them yourself.

![An example excel table](imgs/small_table.png)

For example, we need a code cell to replicate the Excel table above.  It looks like this:

In [1]:
A1 = 5
A2 = 6
A3 = 7
B1 = A1 + 1
B2 = A2 + 1
B3 = A3 + 1

# Running Notebooks
A Jupyter cell can be _run_ or _waiting to be run_.  A cell won't automatically run.  Hit the play button next to the cell above to have it run.  You will see a "tick" mark appear to show you it has run.  The first time you run a cell, you might be asked to choose what version of Python you want to use.  We recommend the default.

In Excel, all "code" is run as-needed.  You don't need to think about it.  In Jupyter, you have complete control (and complete responsibility).  You will find buttons for:
  * "Run All" (at the top of the notebook) which runs every code cell in the notebook
  * "Execute Cell" (to the left of each cell) which will run only this cell.
  * "Execute Above" (on the top right of each cell) which will run every cell above this one (but not this one).  We use this when we need to make sure any variables this cell uses are up to date.
  * "Execute Cell and Below" (on the top right of each cell) which runs this cell to get its new output and runs all the cells below since they might have used a variable from this cell that has a new value.
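
One consequence of this control is worth a quick sketch: unlike an Excel formula, a Python variable is not recalculated when the values it was built from change.  The cell below is only an illustration.  It reuses the names `A1` and `B1` from earlier, and the value `10` is made up for the demonstration.

```python
A1 = 5
B1 = A1 + 1   # B1 is calculated once, right now, and holds 6

A1 = 10       # changing A1 later does not touch B1
print(B1)     # still 6

B1 = A1 + 1   # running the calculation again brings B1 up to date
print(B1)     # now 11
```

This is why the "Execute Above" and "Execute Cell and Below" buttons exist: after a value changes, the code that depends on it has to be run again.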

On a text cell you have a "tick" icon to render the cell instead of running it, but if you "Run All", every text cell will get rendered.  Rendering a text cell is the equivalent of running a code cell.


## Viewing the computed values in the document

Viewing computed values is simply a matter of printing them out in a code cell.  For example:

In [2]:
print(B1)
print(B2)
print(B3)

6
7
8


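When such a cell is run, its output is given just below.  Run cell `[2]` above to see its output.  A cell also displays the value of its last line, so cell `[3]` below shows the value of `A1` without calling `print`.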


In [3]:
A1

5

# Exercise

Create three variables, `C1`, `C2`, and `C3`, and fill them with twice the values of `B1`, `B2`, and `B3` respectively.  Print out those variables and see that the values you get are the same as those shown in the "Variables" tab.

In [4]:
print("put solution to exercise here")

put solution to exercise here



# Functions

Without noticing, we just used our first function.  `print` is a Python function.  Functions take in data (which we put in parentheses) and often give us data back, which we can put in a variable if we like.  If a call to a function that gives data back is the last line of a code cell, the value we get back will be printed to the notebook for us.   You can think of functions as working a lot like commands in Excel.  You will even find equivalents of most of your favourite Excel commands available in Python.

When we use a function we say "we are _calling_ the function" and "the function was _called_".

When we get values back from a function, we say "the function _returned_ a value".

When we give data to a function we say "we pass _parameters_ to the function" - they appear in parentheses.
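
As a small sketch of these three terms, the cell below uses `round` and `max`, two built-in Python functions that are not otherwise used in this notebook.  The variable names and numbers are made up for the example.

```python
# Calling a function: write its name, then the parameters in parentheses.
# round is passed a number and how many decimal places to keep.
rounded = round(3.14159, 2)   # the returned value is stored in a variable

# max is passed three parameters and returns the largest of them.
largest = max(3, 9, 4)

print(rounded)   # 3.14
print(largest)   # 9
```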

# Methods

Some functions are _attached_ to other values.  We call these functions "methods".  The extra terminology is useful because methods are said to "run _on_ the value they are attached to" and often modify it or build new versions of it.
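
A minimal sketch of both behaviours, using made-up values.  It assumes one thing this notebook has not covered: a Python _list_, which holds several values in order.  `upper` is a method on strings and `append` is a method on lists; we call a method by writing the value, a dot, and then the method name.

```python
name = "jupyter"
shouted = name.upper()   # builds a new string; name itself is unchanged
print(name)              # jupyter
print(shouted)           # JUPYTER

numbers = [5, 6, 7]      # a list holding three values
numbers.append(8)        # modifies the list it is attached to
print(numbers)           # [5, 6, 7, 8]
```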

# Exercise

Use some of the [built-in Python functions](https://docs.python.org/3/library/functions.html) to complete the following code cell.

In [5]:
v = -6
s = "this is a string"
# print out double of whatever is stored in the variable `v`
print(v)
# print out the absolute value of whatever is stored in the variable `v`
print(v)
# print out the number of characters in the variable `s`
print(s)
# print out the type of the values in both `v` and `s`
print(v)
print(s)
# print out the absolute value of whatever is stored in the variable `v`
print(v)

-6
-6
this is a string
-6
this is a string
-6


## Defining Functions

As a bonus peek into the future, here is how you define your own function.  This function `pnt` will do the same thing as `print` but it will not move to the next line for the next `print`.  This allows you to print multiple things on one line without any spaces between them.

In [6]:
def pnt(text):
    # end="" stops print from moving to the next line afterwards
    print(text, end="")
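
As a quick usage sketch, here is `pnt` called twice, followed by an ordinary `print`.  The definition is repeated so the example runs on its own, and the three pieces of text are made up.

```python
# Same behaviour as the pnt defined above, repeated so this example is self-contained.
def pnt(text):
    print(text, end="")

pnt("no")
pnt("te")
print("book")   # print ends the line as usual, so the output is: notebook
```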

# Execution Order

Python code is always executed from top to bottom within a single cell.  We can manually run cells themselves "out of order" and the last value put inside a variable is left in there until the notebook is restarted.  Consider the following three cells.  The first will set some variables, the second will print their values and the last will set those same variables to other values.  Can you work out how to make the second cell output "wrong" without making any changes to the code?  Hint: you simply need to execute the cells in the right order.

Note!  You will need the special function we wrote above (`pnt`) so make sure you execute that cell.  Even though it produces no output, it does update the state of the document.

In [7]:
foo = "r"
bar = "i"
bat = "g"
boo = "h"
bip = "t"

In [8]:
pnt(foo)
pnt(bar)
pnt(bat)
pnt(boo)
print(bip)

right


In [9]:
foo = "w"
bar = "r"
bat = "o"
boo = "n"
bip = "g"
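
The rule that a variable keeps the last value put into it also applies inside a single cell, where it is easier to see.  A minimal sketch with a made-up variable name:

```python
status = "first"
status = "second"   # replaces the earlier value
print(status)       # second
```

Across cells the rule is identical, except that "last" means the cell you ran most recently, not the cell that appears lowest in the notebook.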

# Give me my tables back!

This is all quite a hassle.  We need a way of getting the idea of a "table" back, please!

In Python, these are called "DataFrames" and they come from the `pandas` library.  [The next notebook](1-data_frames.ipynb) will discuss DataFrames.
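
As a brief preview, and only a sketch, the Excel table from the "Variables" section might look like this as a DataFrame.  It assumes the `pandas` library is installed, and the column names `A` and `B` are chosen here to mirror the Excel columns.

```python
import pandas as pd

# One column of data, like cells A1 to A3 in the Excel table
table = pd.DataFrame({"A": [5, 6, 7]})

# One line calculates the whole B column from the A column
table["B"] = table["A"] + 1

print(table)
```

Six separate variables become one table, and one line of code fills in the whole second column.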

# Concept Summary
  * Excel is one program, whereas notebooks combine a programming language (Python) and a tool (Jupyter Notebooks).
  * We program a notebook using Python
  * Data in Excel is stored in "cells"
  * Data in Python is stored in variables
  * We choose the name of our variables

# Python Concepts
  * Python code executes from top to bottom, one line at a time
  * We can bundle Python code into functions.
  * There are libraries of pre-bundled code that we can use by calling their functions

# Further Reading
  * [All the details about running Jupyter Notebooks in VSCode as we do here](https://code.visualstudio.com/docs/datascience/jupyter-notebooks)
  * If you would like to learn more about Python, keep working on [SoloLearn](https://www.sololearn.com); there are other courses for extending your knowledge.