# How to use Jupyter Notebooks

Jupyter notebooks mix text ("markdown") with code boxes. You can write and run Python code in code boxes, and you can write and format text in markdown boxes.

To execute Python code in a code box, click on a code box and hit "Run", for example on the box below. 

In [1]:
print("Hit 'Run' to execute this Python code.")

Hit 'Run' to execute this Python code.


You can also use a keyboard shortcut to run the code in the box you are currently in: Shift+Enter.

## Getting a new empty box

You can always get additional boxes in a Jupyter Notebook. Hit ESC B to get a new box below the one you are currently in. (Or use ESC A to get a new box above.) Then hit "Enter" to get out of command mode.

(More info: ESC gets you into command mode, and then B means "make a new box below." Enter gets you out of command mode, and into the mode where you can type in the current box.)

## Code versus Markdown

You can see in the top menu whether the box you are currently in is "Code" or "Markdown". In that drop-down menu, you can switch the current box between Code and Markdown.

Or you can hit ESC M to make the current box Markdown. (Then remember to hit Enter to get out of command mode.) ESC Y makes the current box Code.

## Prettier Markdown

In Markdown mode, ```*surrounding text by a single star on either side*``` makes it *italic*. ```**Two stars**``` make it **boldface.**

Putting triple back quotes around text makes it look like ```code```.

When you put a single hash mark at the beginning of the line, the following text will be formatted as a big heading. With two hash marks, you get a slightly less big heading, and so on.



## More Jupyter Notebook tricks

... can be found here: https://www.dataquest.io/blog/jupyter-notebook-tips-tricks-shortcuts/

# First steps in Python
You can type Python expressions into a Jupyter Notebook, in one of the boxes that say "In [...]", and have Python evaluate them. Here is an example:

In [2]:
2+2

4

Here is an expression that is a string. Python knows it is a string because it is enclosed in double quotes, like "this". Single quotes, like 'this', also work.

In [3]:
"Hello world"

'Hello world'

Python knows about basic arithmetic expressions. It also uses "+" not only to add numbers, but also to paste strings together:

In [4]:
"Hello" + " " + "world"

'Hello world'

If you want to evaluate all the boxes in this notebook in one go, you can hit the "fast forward" button above to re-run all the code boxes. A dialog will come up asking you whether you want to proceed. It is safe to do so at this point.

## Variables

On your hard drive, you store data in files. Each file has a name by which you can retrieve the data. In programming languages, you also often need to store data, for example the result of some calculation that you intend to use in another calculation later. And again, you need to give names to the stored data, so you can retrieve it later. In a programming language, you make a *variable* in which you store data, and you give it a name by which you can retrieve the data. 

Try running the following box:

In [5]:
myvar = 2 + 3
myvar

5

The output you saw is the content of ```myvar```. Here it is again:

In [6]:
myvar

5

You can change the contents of a variable by again using "=" for assignment:

In [7]:
myvar = 7

You can access the contents of a variable through its name:

In [8]:
another_number_var = myvar * 3
another_number_var

21

We store the value 5 (the value of the expression "2+3") in myvar by typing ```myvar = 2+3```. We can then retrieve the stored data by its name: If we type the name of the variable, Python supplies the stored value. We can use the variable as a stand-in for its value: After we have stored the number 7 in ```myvar```, ```myvar * 3``` has the same value as ```7 * 3```.

When you update a file, you are storing a new value under its existing name. You can do the same with a variable. In the line ```myvar = 7``` we are storing a new value in the same variable that we had before, myvar.

You choose the names for the variables you use. What can you choose the name of a variable to be?

* Variable names can contain letters, digits, and underscores.
* They must not start with a number.
* They must not be identical to one of the "reserved words" that Python has already defined.
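
As a rough check of these rules, the sketch below tests a few candidate names with the string method `isidentifier()` and the standard `keyword` module. The candidate names are made up for illustration only.

```python
import keyword

# Hypothetical candidate names, chosen only to illustrate the rules above.
candidates = ["myvar", "another_number_var", "2fast", "my-var", "class"]

usable = {}
for name in candidates:
    # A usable name must look like an identifier and must not be a reserved word.
    usable[name] = name.isidentifier() and not keyword.iskeyword(name)

print(usable)
```

Here `2fast` fails because it starts with a digit, `my-var` because it contains a hyphen, and `class` because it is a reserved word.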

*Warning*: There are some names that are already defined in Python that you can still use as a variable name, for example "sum" (which is a function that calculates the sum of multiple values). But if you re-define that as a variable name, you can't use it in its original sense anymore. That is, if you define ```sum = 2+3```, then you cannot access the function that calculates the sum of multiple values anymore: you have basically overwritten the previous value of the variable "sum". This is a very common bug.
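
Here is a small sketch of what this bug looks like and one way to recover from it. It uses `try`/`except` and the standard `builtins` module, neither of which this notebook has introduced, so treat it as an illustration rather than something you need to memorize.

```python
import builtins

sum = 2 + 3  # "sum" now names the number 5, not the built-in function

try:
    sum([1, 2, 3])
except TypeError as err:
    error_message = str(err)  # the number 5 cannot be called like a function

# The original function still lives in the builtins module.
total = builtins.sum([1, 2, 3])

# Deleting our variable makes the plain name "sum" refer to the built-in again.
del sum
print(error_message, total)
```

In a notebook, running `del sum` in a new box is usually the quickest fix after you have accidentally overwritten the name.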

You can also change a variable like this:

In [9]:
myvar = myvar + 2

Remember that ```=``` is not a statement of equality. Read as an equation, this line would be nonsensical. Instead, it is an instruction: take what is in ```myvar```, add two to it, and store the result in ```myvar```. So ```myvar + 2``` is 9, which is now stored in ```myvar```, as you can check:

In [10]:
myvar

9
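
Python also has a shorthand for this kind of update, the `+=` operator. The sketch below uses a made-up variable name, `counter`, so that it does not interfere with `myvar`.

```python
counter = 7
counter = counter + 2   # long form: read the value, add 2, store it back
counter += 2            # shorthand for the same kind of update
print(counter)
```

Both lines change the variable in the same way, so `counter` ends up at 11.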

**Try it for yourself:** Make up a variable name of your choosing, and store in it the value of the expression ```2**4```. Inspect your variable to see what it contains. Then reduce its value by 1. Now make up a second variable, and set it to have the same value as the first one.

In [11]:
# You can type your code here.

## Data types: strings and numbers

We have encountered data of (at least) three data types so far:

* Integers
* Floating point numbers
* Strings

Different data types come with different operations. Integers and floating point numbers can be added, subtracted, divided, ... Strings can be concatenated, you can count letters in them, and so on. You can subtract one number from another:

In [12]:
123 - 5

118
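
The short sketch below shows how integers and floating point numbers interact in arithmetic. The variable names are made up for illustration.

```python
whole = 7 + 2        # int + int gives an int
mixed = 7 + 2.5      # int + float gives a float
quotient = 7 / 2     # "/" always gives a float, even for two ints
floored = 7 // 2     # "//" divides and rounds down

print(whole, mixed, quotient, floored)
print(type(whole), type(mixed), type(quotient), type(floored))
```

This is why a calculation that involves `/` shows a result like `68.0` rather than `68`.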

But you cannot subtract strings. Remove the "#" at the beginning of the next line, then run, to get an error.

In [13]:
# "hello" - "world"

By the way: When something goes wrong, Python gives you an error message. Please read this message! It will help you figure out what went wrong. Here, it says that you cannot subtract a string from a string.

Especially in the beginning, you will see a lot of error messages. But don't worry, you will see fewer of them very soon.
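
If you want to look at an error message without stopping the notebook, one option is to catch the error. The sketch below uses `try`/`except`, which this notebook has not covered, so take it as a preview.

```python
try:
    "hello" - "world"
except TypeError as err:
    # Keep the text of the error message so we can inspect it.
    message = str(err)

print("Python reported:", message)
```

The stored message names the operator and the two types involved, which is usually the fastest clue to what went wrong.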

Here is something that works for strings but not numbers: strings have a length, but numbers don't. 

In [14]:
len("hello world")

11

Remove the "#" at the beginning of the following line to get an error message:

In [15]:
# len(123)

There are built-in functions in Python that convert data from one type to another. ```int()``` converts to an integer. You can apply it to a string, if it actually contains an integer. Remember: "3" is a string because of the quotes, while 3 is a number. You can also apply ```int()``` to a floating point number. ```float()``` converts to a floating point number, and ```str()``` converts to a string:

In [16]:
int("123")

123

In [17]:
int(456.7)

456

Remove the "#" at the beginning of the following line to get an error message.

(Incidentally, # is for commenting in Python code: Python will ignore anything you type after # in any line.)

In [18]:
# int("hello")

In [19]:
float(12)

12.0

In [20]:
float("3.14")

3.14

In [21]:
str(3.14)

'3.14'
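
A few details of these conversions are easy to trip over. The sketch below, with made-up variable names, shows that `int()` cuts off the fractional part rather than rounding, and that a string holding a decimal number has to go through `float()` first.

```python
# int() cuts off the fractional part; it does not round.
truncated = int(456.7)
rounded = round(456.7)

# int("456.7") would raise an error, so convert to float first.
from_text = int(float("456.7"))

# str() lets you paste a number into a longer string with "+".
label = "pi is about " + str(3.14)

print(truncated, rounded, from_text, label)
```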

You can also ask Python what the type of a piece of data is, using the function ```type()```. The type of a variable is the type of its contents.

In [22]:
type(1)

int

In [23]:
type(1.1)

float

In [24]:
type("1")

str

In [25]:
type("hello world")

str

In [26]:
type(myvar)

int
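
Because the type belongs to the contents, the same variable can report a different type after you store something else in it. A minimal sketch, using a made-up variable name:

```python
example_var = 9
first_type = type(example_var)    # the contents are an integer

example_var = "nine"
second_type = type(example_var)   # now the contents are a string

print(first_type, second_type)
```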

## Strings

Strings in Python are arbitrary sequences of characters, enclosed in either "..." or '...'. (Most of the time, it doesn't matter if you use single or double quotes. Just make sure you use the same type of quotes at the beginning and the end.)

To make a string that runs over more than one line, use ```"""..."""``` That is, three double quotes, then your string, then three double quotes. (You can also use three single quotes on either side. But you have to use the same kinds of quotes.) For example, here is a string that holds the first two paragraphs of the Wikipedia entry on Python.


In [27]:
wikipedia_on_python = """Python is a general-purpose, high-level programming language
whose design philosophy emphasizes code readability.Python claims to combine "remarkable power with very 
clear syntax", and its standard library is large and comprehensive. 
Its use of indentation for block delimiters is unique among popular programming languages.
Python supports multiple programming paradigms, primarily but not limited to object-oriented, 
imperative and, to a lesser extent, functional programming styles. 
It features a fully dynamic type system and automatic memory management, similar to that of 
Scheme, Ruby, Perl, and Tcl. 
Like other dynamic languages, Python is often used as a scripting language, but is also 
used in a wide range of non-scripting contexts. Using third-party tools, Python code can 
be packaged into standalone executable programs. Python interpreters are available for many operating systems."""
len(wikipedia_on_python)

902
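
Each line break inside a triple-quoted string is stored as a newline character, written `"\n"`, and it counts toward the length. The sketch below uses a short made-up string to make this visible. `splitlines()` is a standard string function that is not otherwise used in this notebook.

```python
two_lines = """first line
second line"""

has_newline = "\n" in two_lines       # the line break is a real character
line_list = two_lines.splitlines()    # one list entry per line
total_length = len(two_lines)         # 10 + 1 (newline) + 11

print(has_newline, line_list, total_length)
```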

As we want to process natural language text, strings and string manipulations are going to be very important. Luckily for us, Python has a lot of built-in functionality for doing things with strings.

### Some built-in string functions

A note on notation: Some functions in Python are written like functions in mathematics: function name, then the arguments in parentheses, for example
```len("hello")```

Other functions are written in a different format: First one argument, then a period, then the function name, then other arguments in parentheses, for example
```"hello".capitalize()```

For now, just know that there are these two formats, and know that you need to remember which function is written in which fashion.

Here are some useful string functions. Try them out to see what they do. 

In [28]:
"hippopotamus".count("p")

3

In [29]:
"KNIGHT".lower()

'knight'

In [30]:
"new".upper()

'NEW'

In [31]:
"new".capitalize()

'New'

In [32]:
"    a lot of spaces, then some text ".strip()

'a lot of spaces, then some text'

In [33]:
"armadillo".replace("mad", "happy")

'arhappyillo'

Also, as mentioned above, you can use "+" to concatenate strings.
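
One thing worth knowing about the string functions above: they return a new string and leave the original unchanged. If you want to keep the result, store it in a variable. A minimal sketch with made-up variable names:

```python
word = "KNIGHT"
lowered = word.lower()

print(word)      # the original string is unchanged
print(lowered)   # the result is a new string

word = word.lower()   # store the result back under the same name
print(word)
```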

Here is a string function that we will use a lot. It splits text on whitespace, returning something called a list, which we will discuss more later. We apply it to the first sentence of the Wikipedia page on Monty Python. As you can see, the result of ```split()``` is almost a separation of the sentence into words -- what does it get wrong, and why?

In [34]:
mystring = "Monty Python (sometimes known as The Pythons) was a British surreal comedy group who created their influential Monty Python's Flying Circus, a British television comedy sketch show that first aired on the BBC on 5 October 1969."
mystring.split()

['Monty',
 'Python',
 '(sometimes',
 'known',
 'as',
 'The',
 'Pythons)',
 'was',
 'a',
 'British',
 'surreal',
 'comedy',
 'group',
 'who',
 'created',
 'their',
 'influential',
 'Monty',
 "Python's",
 'Flying',
 'Circus,',
 'a',
 'British',
 'television',
 'comedy',
 'sketch',
 'show',
 'that',
 'first',
 'aired',
 'on',
 'the',
 'BBC',
 'on',
 '5',
 'October',
 '1969.']

Some string operations return either True or False. (This is another data type, called a Boolean.) For example, the operator "in" tests for substrings.

In [35]:
"eros" in "rhinoceros"

True

In [36]:
"nose" in "rhinoceros"

False

Note that you can make the word "nose" from the letters of "rhinoceros", but that is not what "in" tests.
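
To make the difference concrete, the sketch below contrasts the substring test with a letter-by-letter check. It relies on `collections.Counter` from the standard library, which this notebook does not cover, so read it as an optional aside.

```python
from collections import Counter

# Substring test: the letters must appear together, in this order.
is_substring = "nose" in "rhinoceros"

# Letter test: which letters of "nose" can "rhinoceros" NOT supply?
missing = Counter("nose") - Counter("rhinoceros")
can_be_spelled = len(missing) == 0

print(is_substring, can_be_spelled)
```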

Here are two other functions that return True or False:

In [37]:
"truism".endswith("ism")

True

In [38]:
"inconsequential".startswith("pro")

False

You can find Python documentation for Python 3.x at https://docs.python.org/3/. The subpage that you will probably use most often is the Python Standard Library at https://docs.python.org/3/library/index.html, which documents built-in functions and the packages of the standard library. String functions are described at https://docs.python.org/3/library/stdtypes.html#text-sequence-type-str

**Try it for yourself:** Use the string functions above, and the documentation on the Standard Library page, to answer the following questions. Some of the functions you need may only be on the Standard Library page.

* Split the following sequence at each occurrence of an "!": ``"ab!cde!!f!ghij!!!"``
* How many letters are in the word "jabberwocky"?
* Concatenate the following strings into a single sequence, making sure to also include a whitespace in between them: "hello", "world"
* Test whether all characters in the following strings are digits: "123456", "123.456"
* How many o's are there in "onomatopoeia"? And how many a's?

In [39]:
# type your code here.

### Accessing parts of a string

You can use *indexing* to access individual letters or substrings of a string. 

In [40]:
"rhinoceros"[3]

'n'

In [41]:
"rhinoceros"[0]

'r'

The 4th character in "rhinoceros" is an "n", and the first is "r". Note that Python indices start at 0, 
so ```"rhinoceros"[0]``` gets you the first character. Also note that indices use square brackets, not parentheses.

The following carves out a *slice* of a string: 

In [42]:
"rhinoceros"[2:5]

'ino'

The slice starts at the third letter (index 2), and ends before the 6th letter.

What happens if you try to access a single character beyond the end of the string? What happens if you do the same with a slice? Try it. 

In the following line, remove the "#" to try the code.

In [43]:
# "art"[3]

In [44]:
"art"[1:4]

'rt'

Do you think negative indices are valid? Give them a try. (They are in fact valid. Can you figure out what they do?)

In [45]:
"art"[-1]

't'

In [46]:
"art"[-3:-1]

'ar'

You can omit one of the indices in the slice. The slice ```"polyphonic"[2:]``` will contain all letters from the 3rd one to the end, and ```"chimera"[:4]``` has all letters up to and including the 4th.

In [47]:
"polyphonic"[2:]

'lyphonic'

In [48]:
"chimera"[:4]

'chim'
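
A handy consequence of the "end before" rule: cutting a string at the same index with `[:n]` and `[n:]` gives two pieces that fit back together with no overlap and no gap. The sketch below uses made-up variable names.

```python
word = "polyphonic"

head = word[:4]    # letters at indices 0, 1, 2, 3
tail = word[4:]    # letters from index 4 to the end

rejoined = head + tail
print(head, tail, rejoined == word)
```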

**Try it for yourself:** What other words can you form out of the letters in "rhinoceros"? Find at least 3 words, and construct them in Python using indices to pick letters and using "+" to concatenate them. For example,
if ```r = "rhinoceros"``` then ```r[3:5] + r[-1] + r[-4]``` gets you "nose".

In [49]:
# type your code here.

Here is another neat trick: You can make the first index be bigger than the second, and then add a "-1" as a third number, like this:
        
        "rhinoceros"[9:5:-1]
        
Then the sequence will be inverted. Here is an example:

In [50]:
"rhinoceros"[9:5:-1]

'sore'

Which is the same as:

In [51]:
"rhinoceros"[-1:-5:-1]

'sore'

If you omit both indices and just put the "-1" as the third number, the whole string gets reversed:

In [52]:
"rhinoceros"[::-1]

'soreconihr'
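
The third number in a slice is a step size, so it is not limited to -1. As a small sketch with made-up variable names, a step of 2 picks every second letter, and a step of -2 does the same while walking backwards:

```python
word = "rhinoceros"

every_second = word[::2]        # indices 0, 2, 4, 6, 8
odd_positions = word[1::2]      # indices 1, 3, 5, 7, 9
backwards_by_two = word[::-2]   # indices 9, 7, 5, 3, 1

print(every_second, odd_positions, backwards_by_two)
```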

**Try it for yourself:** Try inverting a palindrome sentence, like "No lemon, no melon". Here are some others: https://examples.yourdictionary.com/palindrome-examples.html

Or just pick a long word and see if you can find good reverse words hidden in it, like "sore" in "rhinoceros".

In [53]:
# type your code here.

# Functions

You have seen builtin functions, like type(), int(), str(), and functions from packages, like math.sqrt(). In Python, you can also define functions yourself. This is useful when you need to do the same thing (or something very similar) again and again. Suppose that you are in a country where temperature is given in Celsius, but you are more familiar with Fahrenheit. Here is how you can figure out that 20 degrees Celsius are 68 degrees Fahrenheit, and 30 degrees Celsius are 86 degrees Fahrenheit:

In [54]:
print( (20 * 9/5) + 32)

68.0


In [55]:
print( (30 * 9/5) + 32)

86.0


If you need to convert temperatures very often, this gets tedious. Instead, you can define your own function by giving a name to a piece of code:

In [56]:
def celsius_2_fahrenheit(temp):
    fahrenheit = temp * 9/5 + 32
    return fahrenheit

In [57]:
# here is a variant without the variable "fahrenheit"
def celsius_2_fahrenheit_variant(temp):
    return (temp * 9/5) + 32

After that, you can use ```celsius_2_fahrenheit``` like any built-in function:

In [58]:
celsius_2_fahrenheit(10)

50.0

## Input and output

A function has an input and an output. You know that from a built-in function like ```len()```: It takes as input an object, for example a list. You put the input between the parentheses. And it returns an output, the length of the object. You can, for example, store that output in a variable:

In [59]:
mylist = ["a", "b", "c"]
somevariable = len(mylist)
print("the output of len that I stored is", somevariable)

the output of len that I stored is 3


The same is true for functions that you define yourself. You define the inputs in parentheses after the function name. The function name is ```celsius_2_fahrenheit```, and in parentheses after that you see ```(temp)```. What is ```temp```? It is just a variable name. When you call a pre-defined function, you give it an actual input value. When you define your own function, you just prepare a container, a variable name, that will store the input. In this case, that container is ```temp```.

The output of the function is what you see after the keyword ```return```: It is the value currently stored in ```fahrenheit```.

So when we call our self-defined function with a different input, for example 100 (that is, 100 degrees Celsius), the output that we get is 212:

In [60]:
celsius_2_fahrenheit(100)

212.0

The variable ```temp``` in the function definition is used in much the same way as in function definitions in mathematics:

f(x) = x+1

There, you don't have to specify beforehand what x is. Rather, for each x, the function value is f(x) = x+1. In the same way, when you call your function saying celsius_2_fahrenheit(20), then at this time, Python stores the value 20 in the variable ```temp``` and executes the code of the function with ```temp``` set to 20.
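
The sketch below repeats the function definition so that it runs on its own. It calls the function with several inputs, then checks that the name `temp` only exists while the function is running. The `try`/`except` part and the variable names `results` and `temp_is_visible` are additions for illustration.

```python
def celsius_2_fahrenheit(temp):
    fahrenheit = temp * 9/5 + 32
    return fahrenheit

# Each call stores a different value in "temp" for the duration of that call.
results = [celsius_2_fahrenheit(c) for c in [0, 20, 100]]
print(results)

# Outside the function, "temp" was never defined, so Python reports a NameError.
try:
    temp
    temp_is_visible = True
except NameError:
    temp_is_visible = False

print(temp_is_visible)
```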


## The components of a function definition

Let's take a look at all the bits and pieces of this function definition:

In [61]:
def celsius_2_fahrenheit(temp):
    fahrenheit = temp * 9/5 + 32
    return fahrenheit

* ```def``` is a reserved word. It tells Python that you are defining a function.

* The name of the function is ```celsius_2_fahrenheit```. This is a variable that you define; only this time, the container holds Python code, not a number or a string. The names of functions are subject to the same restrictions as other Python variable names.

* After the name of the function, you see the input to the function, also called its argument, in round brackets. This is a variable name: the container that we have prepared to take the actual input when the function is called.

* After the argument there is a colon. 
* Then come some indented lines, which define what the function does.

In [62]:
# compare to:
for i in ["a", "b", "c"]:
    print(i)
    

a
b
c


So we have an overall shape of 

```keyword something:
    indented code```
    
which is the same as, for example, in for-loops and in if-conditions.
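
To see the three shapes side by side, here is a small sketch that uses `def`, `if`, and `for` together. The function name `describe_temperature` and the sample readings are made up for illustration.

```python
def describe_temperature(celsius):        # def ... :
    if celsius <= 0:                      # if ... :
        return "freezing"
    return "above freezing"

descriptions = []
for reading in [-5, 20]:                  # for ... :
    descriptions.append(describe_temperature(reading))

print(descriptions)
```

In each case, the line ending in a colon opens a block, and the indentation shows which lines belong to that block.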

**Try it for yourself:**

* Define a function that converts degrees Fahrenheit to degrees Celsius (for the benefit of Europeans like me). 

* Define a function that takes a string as input, then strips all punctuation symbols from the beginning and end and lowercases. For example, when the input is "Hello!" the output should be "hello", and when the input is "???WHAT???", the output should be "what". 

## Returning lists

The output, what you write after the keyword ```return```, can be a single value: a number, as in the Celsius-to-Fahrenheit example, or a string, as in the stripping-punctuation-and-lowercasing example. The output of a function can also be a list. Here is a function that takes a string as input, tokenizes it, lowercases, and removes punctuation:

In [63]:
import nltk
import string

def my_preprocess(inputstring):
    words = nltk.word_tokenize(inputstring)
    newwords = [w.lower() for w in words if w.strip(string.punctuation) != ""]
    return newwords

my_preprocess("What a lovely day!!")

['what', 'a', 'lovely', 'day']
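
A list that comes back from a function behaves like any other list: you can store it, index it, and measure it. The sketch below avoids `nltk` so that it runs on its own. The helper `first_and_last` is a made-up example, not part of any library.

```python
def first_and_last(word):
    # Hypothetical helper: returns a list holding the first and last letter.
    return [word[0], word[-1]]

letters = first_and_last("rhinoceros")

print(letters)        # the whole list
print(letters[0])     # a single item from the list
print(len(letters))   # the number of items
```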

**Try it for yourself:**

* Define a function that takes as input a string, splits it on whitespace, removes every word that starts with "http", and returns the remaining list of words. So if the input is ```"Go to https://abc.def.gh"```, the output should be ```["Go", "to"]```

* Define a function that takes as input a string of numbers separated by whitespace, for example ```"123 3.4  67.9"```. It should return a list with the numbers that were in the string, in our case ```[123, 3.4, 67.9]```.

## Functions with multiple arguments

A function can have more than one piece of input. Here is an example. This function repeats a string a given number of times:

In [64]:
def repeat_string(somestring, numtimes):
    return somestring * numtimes

repeat_string("hello", 5)

'hellohellohellohellohello'
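
When a function has several inputs, you can pass them by position, as above, or by name. The sketch below uses a made-up three-argument variant, `repeat_with_separator`, to show both styles. It relies on the standard string function `join()`, which this notebook has not introduced.

```python
def repeat_with_separator(somestring, numtimes, separator):
    # Build a list with numtimes copies, then glue them together with the separator.
    return separator.join([somestring] * numtimes)

# By position: the values are matched to the argument names in order.
by_position = repeat_with_separator("hello", 3, "-")

# By name: the order no longer matters, because each value is labeled.
by_name = repeat_with_separator(numtimes=3, separator="-", somestring="hello")

print(by_position, by_name)
```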