## Introduction

These relatively brief notes contain an overview of the Jupyter Notebook environment.

## Overview of Jupyter Notebook

Jupyter Notebook is a browser-based coding environment, used extensively for prototyping and interactive development in data science applications.  Jupyter Notebook is an evolution of an older project called the IPython Notebook (this is the origin of the notebook file extension ".ipynb"), and while (as the name suggests) Jupyter Notebook supports languages other than Python, at the current time Python is by far the most common language for these notebooks.  General information about Jupyter Notebook and related projects can be found at the [Jupyter project page](http://jupyter.org).

The central units within a Jupyter Notebook are "cells".  Each cell can contain either code or Markdown (a simple formatting language, which can also include things like LaTeX equations).  The dropdown menu at the top of the screen indicates the type of the current cell.

Code cells can be executed by pressing the <i class="fa fa-step-forward"></i> button at the top of the notebook, or more commonly via the keyboard shortcuts Shift-Enter (execute and move to the next cell) or Control-Enter (execute and stay on that cell).  All Python code is executed in a single running Python process, called the "kernel" in Jupyter Notebook.  Variables are shared across all cells, and code is executed in the order in which you run the cells (not necessarily top to bottom in the notebook).  This can put your notebook into rather confusing states if you don't always execute cells in order.
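To make the shared-state idea concrete, here is a toy model of a kernel in plain Python.  It is only a sketch under a simplifying assumption, not how Jupyter is actually implemented.  The hypothetical helper `run_cell` treats each cell as a source string and executes it with `exec` in one shared namespace dictionary.

```python
# Toy model (NOT Jupyter's real implementation): one namespace dict shared by
# every "cell", mimicking how the kernel shares variables across cells.
kernel_namespace = {}

def run_cell(source):
    """Hypothetical helper: execute one cell's source in the shared namespace."""
    exec(source, kernel_namespace)

cell_1 = "x = 1"
cell_2 = "x = x + 1"

run_cell(cell_1)   # x is now 1
run_cell(cell_2)   # x is now 2
run_cell(cell_2)   # re-running the same cell changes the state again: x is now 3
print(kernel_namespace["x"])  # 3

# Running cell_2 in a fresh namespace before cell_1 fails, because x does not exist yet
fresh = {}
try:
    exec(cell_2, fresh)
except NameError as err:
    print("NameError:", err)
```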

Let's look at a few examples.

In [1]:
1+4

5

In [3]:
a = 1.0
b = 2.0

In [4]:
print(a)
b 

1.0


2.0

Any `print` statements print to the output section of the cell.  If the last line of the cell is an expression, the output also contains the string representation of its value.  Thus, in the cell above, the variable `b` appears on the last line, so its value is shown at the end of the cell output, after the output of the earlier `print` statement.
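For ordinary objects such as strings and numbers, the representation the notebook displays for the last line matches what `repr()` returns, while `print` uses `str()`.  The difference is easiest to see with strings, which is why string results further below appear with quotes.  A minimal illustration:

```python
s = "data science"
print(str(s))    # data science     (what print shows)
print(repr(s))   # 'data science'   (what the notebook shows if s is the last line of a cell)
```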

Any Python code is valid in these cells, so we can import external libraries, define classes and functions, etc.
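As a small example, the cell below imports a module and defines a function.  The standard-library `math` module stands in for an external library here, and `circle_area` is just an illustrative name.  Third-party packages are imported the same way once they are installed.

```python
import math  # standard-library module; installed third-party libraries are imported the same way

def circle_area(radius):
    """Illustrative function: return the area of a circle with the given radius."""
    return math.pi * radius ** 2

print(circle_area(2.0))
```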

### Efficient navigation in the notebook

Because you'll be spending a lot of time writing code in the notebook, it helps to become at least a little bit familiar with the features of the editor.  While the notebook in many ways makes a poor code editing environment (compared to more fully featured editors such as Sublime, vi, Emacs, or Atom), it's not that bad if you familiarize yourself with some of the keyboard shortcuts.

You can look up all the keyboard shortcuts (in addition to adding your own) in the "Help -> Keyboard Shortcuts" menu, but some of the more common ones I use are the following.  First, though, it is important to distinguish between "edit mode" (when directly editing a cell) and "command mode" (when navigating over cells).  You can switch between these using "ESC" or "Ctrl-M" (switch to command mode) and "Enter" (switch to edit mode).

Common keyboard shortcuts in command mode:
- "x": delete a cell (though be careful, because you can only undo the deletion of one cell)
- "a": insert new cell above
- "b": insert new cell below
- "m": convert cell to Markdown
- "y": convert cell to code
- Up/down: navigate up and down over cells
- Shift-Enter, Ctrl-Enter: execute cell (also works in edit mode)

Common shortcuts in edit mode:
- Ctrl-up/Ctrl-down: move to start/end of cell
- Tab: indent selected area (when text is selected) or autocomplete (when editing in the middle of a line)
- Shift-Tab: unindent selected area (when text is selected), get function help (when following a function name)
- Command-/ (Mac) or Ctrl-/ (Windows/Linux): toggle comment/uncomment selected region (American keyboard layout)

The best way to familiarize yourself with these commands is to try to use the notebook entirely without using the mouse.  In practice, you'll often rely on the mouse for some things (for instance, I prefer to have commands like restarting the kernel require a mouse click), but trying to avoid it for some time will get you familiar enough with the common commands to be fairly productive in the Notebook editing environment.

### Getting help

This was touched upon in the previous list of commands, but is important enough to warrant further discussion.  One of the nicer elements of the notebook editor is its built-in support for code autocompletion and function lookup.  After writing some portion of code, you can press the "Tab" key to autocomplete.  If there is only one viable completion of the word you are currently typing, it will simply complete the word.  If there are multiple viable completions, it will open a popup menu where you can view all possible completions.  The latter is useful for quickly browsing through all functions or variables in a library/object.

The second useful command is the inline help popup.  After typing a function name, you can get help for that function by pressing "Shift-Tab".  This will bring up a popup with a brief description of the function.  If you want additional help, it may be available for some functions by clicking the <i class="fa fa-chevron-up"></i> button in the top right of the help popup.
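The popups are notebook features, but plain Python has related built-ins that work in any environment.  `dir()` lists an object's attributes, which is roughly what Tab completion lets you browse.  `help()` or an object's `__doc__` attribute shows its documentation, which is roughly what Shift-Tab shows.  In Jupyter, appending `?` to a name (as in `len?`) also displays its documentation.  A short sketch:

```python
# Public methods of str, similar to browsing Tab completions after typing "some_string."
methods = [name for name in dir(str) if not name.startswith("_")]
print(methods[:5])

# The docstring of str.split, similar to the Shift-Tab popup
print(str.split.__doc__)

# help(str.split) prints a longer, formatted version of the same documentation
```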

## Python-specific features for the AAA course

The remainder of this notebook serves two purposes.  It illustrates some concepts in the Notebook, and it briefly highlights some of the relevant Python concepts we'll frequently use throughout this course.  Importantly, these notes are not meant to be a general tutorial on Python, which you are already expected to be familiar with.  If any of the basic Python concepts here seem difficult, we recommend you consult a standard [Python Tutorial](https://docs.python.org/3/tutorial/) or one of the following books: [Python Data Science Handbook](https://github.com/jakevdp/PythonDataScienceHandbook), [Data Science from Scratch](http://math.ecnu.edu.cn/~lfzhou/seminar/[Joel_Grus]_Data_Science_from_Scratch_First_Princ.pdf).

The most common "built-in" data types you will interact with when doing data science work are lists and dictionaries.  There are of course additional types like NumPy arrays, pandas DataFrames, and others, but these are provided by external libraries.  It's good to have a basic understanding of how to use these data structures effectively.

### Strings

Strings can be delimited by single or double quotation marks (quotes have to match):

In [5]:
A = 'data science and'
B = " machine learning"
print(A, B)

data science and  machine learning


In the following you will find some operations and methods for strings:
- Some arithmetic operators also work on strings, but with a different meaning: `+` concatenates two strings and `*` repeats a string an integer number of times.  Most others, such as `-` or `/`, are not defined for strings.
- Note that the `+` operator concatenates two strings without a space in between.  By contrast, passing several strings to `print`, separated by commas, does add a space between them (see above).

In [6]:
print (A + B)
print (B*5)


data science and machine learning
 machine learning machine learning machine learning machine learning machine learning
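The sketch below shows the limits of these operators.  It also shows `str.join`, a standard string method often used instead of chaining `+` when a separator is needed.

```python
A = 'data science and'
B = " machine learning"

# Subtraction (like most other arithmetic operators) is not defined for two strings
try:
    A - B
except TypeError as err:
    print("TypeError:", err)

# * needs an integer repeat count
print("ab" * 3)   # ababab

# str.join inserts the separator between the items, an alternative to chaining +
joined = " ".join(["data", "science", "and", "machine", "learning"])
print(joined)     # data science and machine learning
```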


In [8]:
# Capitalizing the first character of a string
A.capitalize()



'Data science and'

In [9]:
# Lower case and upper case for the entire string
print(A.lower())
print(A.upper())

data science and
DATA SCIENCE AND


In [10]:
# Counting the number of occurrences of a substring (one or more characters) within the main string
B.count("i")

2

In [46]:
# Splitting the string. It returns a list of the words in the string.
A.split(sep = None)

['data', 'science', 'and']
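The `sep` argument controls how the string is split.  With `sep=None` (the default), any run of whitespace counts as a single separator.  With an explicit separator, every occurrence counts.  A few cases to compare:

```python
csv_line = "age,height,weight"
print(csv_line.split(","))         # ['age', 'height', 'weight']

# sep=None (default): runs of whitespace act as one separator
print("data  science".split())     # ['data', 'science']

# explicit separator: consecutive separators produce empty strings
print("data  science".split(" "))  # ['data', '', 'science']
```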

In [7]:
# Finding position of a substring within the variable.
# Note: "a" is the second letter in "machine learning", so it would normally be at index 1 (as counting starts at 0).
# Here, however, an additional space is at the beginning of B (see above). Hence, the index is 2.
B.find("a")

2
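Two further details of `find` are worth knowing.  It returns `-1` when the substring does not occur at all.  Also, removing the leading space with `strip()` shifts the index back as described in the comment above:

```python
B = " machine learning"
print(B.find("a"))          # 2: index of the first occurrence
print(B.find("z"))          # -1: "z" does not occur in B
print(B.strip().find("a"))  # 1: strip() removes the leading space first
```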

### Lists

Lists are denoted by items within square brackets.

In [11]:
list1 = ["a", "b", "c", "d"]
list2 = [1, 2, 3, "4", [1,2]]

Items within a list can be accessed via several types of indexing: indexing to a given element, negative indexing (backward from the end of a list), and slice-based indexing that returns subsets of the list.

In [12]:
print(list1[0]) # first element - index 0 - here: a
print(list1[-1]) # last element - index -1 - here d
print(list1[1:3]) # part of the list from index 1 (second element, inclusive) to index 3 (exclusive): the elements at indices 1 and 2, here b and c
print(list1[0:4:2]) # part of the list from index 0 (inclusive) to index 4 (exclusive), but only every 2nd element. Here: a (index 0), c (index 2) 
for i in list1:
    print(i)

a
d
['b', 'c']
['a', 'c']
a
b
c
d
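Slice bounds can also be omitted or negative.  A few common patterns, using the same `list1`:

```python
list1 = ["a", "b", "c", "d"]
print(list1[:2])    # ['a', 'b']  (start defaults to 0)
print(list1[2:])    # ['c', 'd']  (stop defaults to the end of the list)
print(list1[-2:])   # ['c', 'd']  (the last two elements)
print(list1[::-1])  # ['d', 'c', 'b', 'a']  (a negative step walks backwards)
```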


### List comprehensions

One slightly less well-known construct that I use extremely frequently (so it may come up in both sample code and homework code) is the list comprehension.  Briefly, this is a method for constructing a list by iterating over another list.  Let's suppose we wanted to create a `list3` object that included every element from `list1`, but with an underscore after each string.  We could do it by explicitly constructing the list and adding each element, like so.

In [13]:
list3 = []
for x in list1:
    list3.append(x + "_")
list3

['a_', 'b_', 'c_', 'd_']

However, this quickly gets verbose if we want to create several new lists this way.  We could get the same result through a list comprehension, which has the syntax `[some_expression(item) for item in list]`.  It returns a new list by applying `some_expression` to each element of the list.  Here `some_expression` is not necessarily an actual function, just some expression that involves `item`.

In [14]:
list33 = [x + "_" for x in list1]
list33

['a_', 'b_', 'c_', 'd_']
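A list comprehension can also take an optional `if` clause at the end, which keeps only the items for which the condition holds.  The sketch below filters out `"b"` (an arbitrary choice for illustration) and compares the result with the equivalent explicit loop:

```python
list1 = ["a", "b", "c", "d"]

# Optional "if" clause: only items for which the condition is true are transformed and kept
list5 = [x + "_" for x in list1 if x != "b"]
print(list5)  # ['a_', 'c_', 'd_']

# Equivalent explicit loop, for comparison
list6 = []
for x in list1:
    if x != "b":
        list6.append(x + "_")
```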

#### List methods
In the following you can see some examples of methods for lists.

In [15]:
#Adding a new value to the list
list1.append("e")
list1

['a', 'b', 'c', 'd', 'e']

In [16]:
# returns the first index of this value within the list
# note that indices in Python start from 0
list1.index("c")

2

In [17]:
# removes the item and shows the removed value as the output
# the argument refers to the index of the element to be removed
# without argument, the last element of the list is removed
list1.pop(4)

'e'

In [70]:
list1

['a', 'b', 'c', 'd']

In [18]:
# Removes the first occurrence of the item
list1.remove('b')
list1

['a', 'c', 'd']
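Two related behaviours are sketched below.  First, `pop()` without an argument removes the last element, as the comment on `pop` above mentions.  Second, `remove()` (like `index()`) raises a `ValueError` when the item is not in the list.

```python
letters = ["a", "c", "d"]

last = letters.pop()   # no argument: removes and returns the last element
print(last, letters)   # d ['a', 'c']

# remove() raises a ValueError when the item is not in the list
try:
    letters.remove("z")
except ValueError:
    print("'z' is not in the list")
```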

In [75]:
list4 = [20,15,36,-1,6]
# Sorting the values in the list
list4.sort(key = None, reverse = False)
list4

[-1, 6, 15, 20, 36]
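`list.sort()` changes the list in place and returns `None`.  The built-in `sorted()` accepts the same `key` and `reverse` arguments but returns a new list, leaving the original unchanged.  A short comparison:

```python
list4 = [20, 15, 36, -1, 6]

# sorted() returns a new list; list4 itself is not modified
desc = sorted(list4, reverse=True)
print(desc)    # [36, 20, 15, 6, -1]
print(list4)   # [20, 15, 36, -1, 6]

# key= applies a function to each element before comparing (here: absolute value)
by_abs = sorted([3, -5, 1, -2], key=abs)
print(by_abs)  # [1, -2, 3, -5]
```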

### Tuples

Tuples are lists' immutable cousins.  Pretty much anything you can do to a list that doesn't involve modifying it, you can do to a tuple.  A tuple can be specified using parentheses (or nothing) instead of square brackets:

In [19]:
Tuple_1 = (1,2,6,9,8,7,3)
Tuple_2 = 3,4,2,"a",6
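The example `Tuple_2` above shows that the commas, not the parentheses, are what make a tuple.  This matters for one-element tuples, and it is also what makes assigning several names at once (tuple packing and unpacking) work:

```python
not_a_tuple = (5)    # parentheses alone just group: this is the integer 5
one_tuple = (5,)     # the trailing comma makes a one-element tuple
print(type(not_a_tuple), type(one_tuple))  # <class 'int'> <class 'tuple'>

# Tuple packing and unpacking: assign several names at once
x, y = 3, 4
print(x, y)          # 3 4
```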

In [22]:
# The try/except construct tries to execute a statement (here, changing an element of the tuple Tuple_1).
# If that statement raises the specified exception (here TypeError), the except block runs instead.
# The except block prints a message to show the programmer that the attempted statement failed.
# This way, the program can continue running even though an error occurred.
try:
    Tuple_1[1] = 100
except TypeError:
    print("Cannot modify a tuple!")

Cannot modify a tuple!


In [20]:
# Counting the occurrences of a value in the tuple
Tuple_1.count("a")

0

In [21]:
#Returns the first index of the value inside the parentheses
Tuple_2.index("a")

3

### Dictionaries

The next main built-in data type you'll use in data science is the dictionary.  Dictionaries are mappings from keys to values.  Keys can be any "immutable" (more precisely, hashable) Python type, most commonly strings, numbers, booleans, and tuples, but importantly _not_ lists or other dictionaries.  Values can be any Python type (including lists or dictionaries).

Dictionaries can be created with curly brackets, like the following:

In [23]:
dict2 = {'1':34, 'v': 'as'}
dict1 = {"a":1, "b":2, "c":3}

And then elements are accessed by square brackets.

In [24]:
dict1['a']

1

In [25]:
dict1.keys()

dict_keys(['a', 'b', 'c'])

In [25]:
dict2.values()

dict_values([34, 'as'])
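Besides `keys()` and `values()`, the `items()` method yields `(key, value)` pairs.  This is a common way to loop over a dictionary:

```python
dict1 = {"a": 1, "b": 2, "c": 3}

# items() yields (key, value) pairs, which can be unpacked in the loop header
for key, value in dict1.items():
    print(key, value)
```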

Unlike lists, dictionaries can't be indexed by slices, negative indices, or positions of any kind: items are looked up only by their key.  Since Python 3.7, dictionaries do remember the order in which keys were inserted (for example, when iterating over them), but this order cannot be used for indexing.  We can use the same square-bracket notation to assign new elements to the dictionary.

In [26]:
dict1["d"] = 4

Note that we can make this assignment even though `dict1["d"]` previously did not exist.  By contrast, trying to read `dict1["d"]` before assigning it would raise a `KeyError` exception.  You can also check whether a key belongs to a dictionary with the `in` operator:

In [27]:
"a" in dict1

True
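Besides checking with `in`, the `get()` method reads a key that may be missing without raising an exception.  It returns `None`, or a default you supply, when the key is absent:

```python
dict1 = {"a": 1, "b": 2, "c": 3}

# Reading a missing key with square brackets raises a KeyError
try:
    dict1["z"]
except KeyError:
    print("'z' is not a key of dict1")

# get() returns a default instead (None unless another default is given)
print(dict1.get("z"))      # None
print(dict1.get("z", 0))   # 0
print(dict1.get("a", 0))   # 1
```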

Finally, there is the analogue to list comprehensions: dictionary comprehensions.  These are written similarly to a list comprehension, but use the syntax `{key(item): value(item) for item in list}`.

In [28]:
{i:i**2 for i in range(10)}

{0: 0, 1: 1, 2: 4, 3: 9, 4: 16, 5: 25, 6: 36, 7: 49, 8: 64, 9: 81}
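To illustrate the rule on key types from the start of this section, the sketch below uses tuples as keys (for example, 2-D coordinates; the names are only illustrative).  Trying the same with a list raises a `TypeError`:

```python
# Tuples of immutable items are hashable, so they can be dictionary keys...
locations = {(0, 0): "origin", (1, 2): "point A"}
print(locations[(1, 2)])   # point A

# ...but lists are mutable (unhashable), so they cannot
try:
    bad = {[0, 0]: "origin"}
except TypeError as err:
    print("TypeError:", err)
```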

### Classes

Although this section will not describe them in any detail, you will also need to be familiar with classes in Python.  We will just mention a few notes about classes here, using this example:

In [20]:
class MyClass:
    def __init__(self, n):
        self.n = n
    
    def get_n(self):
        return self.n

a = MyClass(1)
a.get_n()

1

The first note is that all regular methods within a Python class take an instance of that class as the first argument, by convention named `self` (but it can be named anything).  Class methods and static methods behave differently, but we won't be using those features much in this course.  The second thing of some importance is that, unlike many other object-oriented languages, Python does not actually enforce a distinction between "public" and "private" variables.  Here it may seem that the `n` variable should be private (inaccessible outside the class) and only accessible via the method `get_n()`, but we can just as easily access it directly.

In [21]:
a.n

1

This means that Python doesn't really respect the traditional object-oriented abstractions (they are more just rules of thumb, that can be overridden by any code that wants to).  Be careful of this fact if you're used to classes in more structured languages like C++ or Java.
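The usual Python way to mark an attribute as internal is a naming convention, a leading underscore, rather than an access modifier.  The variant of `MyClass` below renames the attribute to `_n` purely for illustration.  It shows that the convention signals intent but is not enforced:

```python
class MyClass:
    def __init__(self, n):
        # Leading underscore: "internal, please don't touch" by convention only
        self._n = n

    def get_n(self):
        return self._n

a = MyClass(1)
print(a.get_n())  # 1
print(a._n)       # 1: still directly accessible, nothing enforces privacy
```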

### Control Flow

As in most programming languages, you can perform an action conditionally using `if/elif/else`.

The general structure of `if` statements is:

```
if condition_1:
    statement_1
elif condition_2:
    statement_2
else:
    statement_3
```

In [87]:
if 1>2:
    message = "if only 1 were greater than two... "
elif 1>3:
    message = "elif stands for else if"
else:
    message = "when all else fails use else (if you want to)"
print(message)

when all else fails use else (if you want to)
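One point the example above does not show is that the conditions are checked from top to bottom, and only the first branch whose condition is true runs.  The illustrative function `classify` below makes that visible:

```python
def classify(x):
    # Conditions are checked in order; only the first true branch runs
    if x > 10:
        return "large"
    elif x > 0:
        return "positive"   # reached only when x > 10 was False
    else:
        return "non-positive"

print(classify(20))   # large (x > 0 is also true, but that branch is skipped)
print(classify(5))    # positive
print(classify(-3))   # non-positive
```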


Python also has `while` loops:

In [88]:
x = 0
while x<10:
    print (x, " is less than 10")
    x+=1

0  is less than 10
1  is less than 10
2  is less than 10
3  is less than 10
4  is less than 10
5  is less than 10
6  is less than 10
7  is less than 10
8  is less than 10
9  is less than 10


We can also write the above example with `for` and `in`.  Note that `range(10)` produces 10 values, from 0 to 9, but not the number 10 itself.  The `range()` function is useful when you want a `for` loop to run for a fixed number of iterations:

In [29]:
for i in range(10):
    print(i, " is less than 10")

0  is less than 10
1  is less than 10
2  is less than 10
3  is less than 10
4  is less than 10
5  is less than 10
6  is less than 10
7  is less than 10
8  is less than 10
9  is less than 10
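`range()` also accepts a start value and a step, in the form `range(start, stop, step)`.  The start is included and the stop is excluded.  Wrapping it in `list()` makes the generated values visible:

```python
print(list(range(5)))         # [0, 1, 2, 3, 4]
print(list(range(2, 10)))     # [2, 3, 4, 5, 6, 7, 8, 9]: start included, stop excluded
print(list(range(2, 10, 3)))  # [2, 5, 8]: every third value
print(list(range(5, 0, -1)))  # [5, 4, 3, 2, 1]: a negative step counts down
```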
