# Week 1: Coding with data in Python

We start out with the basics. The exercises in this session cover:

* Writing Python code and Markdown in Jupyter notebooks
* Introductory Python
* Getting some data from Reddit

## Exercises

### Part 1: Know thy notebook

This document is what we call a *Jupyter notebook*. We will be using these extensively throughout the course so **read this very closely**. If you understand how notebooks work, you will save yourself lots of time and frustration throughout this course.

There are two basic things you need to know about Jupyter notebooks:

1. A notebook is nothing but a list of cells. A cell can either be a **code cell** or a **Markdown cell**. Code cells are for writing executable code, and Markdown cells (like this one) are for explaining things in text and making your notebook more readable. A typical workflow that you will soon get used to is something like: solving a problem with some code in a *code cell* and explaining your reasoning or the results you obtained in a *Markdown cell*. You can toggle cell type when you are in *command mode* by pressing <kbd>y</kbd> for code and <kbd>m</kbd> for Markdown. **Try to do that**. Change this *Markdown* cell to a *code* cell, and change it back again. What happens if you execute (<kbd>shift</kbd>+<kbd>enter</kbd>) when this cell is a code cell, compared to when it is a Markdown cell?

2. The notebook has two *modes*: **edit mode** and **command mode**. You enter command mode by pressing <kbd>esc</kbd> or clicking outside a cell, and edit mode by clicking a cell and pressing <kbd>enter</kbd> or double-clicking a cell. When you're in edit mode, the left border of the current cell turns green (not in `jupyter lab`, though; there the bar is always blue) and whatever you type on your keyboard goes into that cell, whether it is a code or Markdown cell. [Here](http://maxmelnick.com/2016/04/19/python-beginner-tips-and-tricks.html)'s a nice rundown of the different commands you can use. **Beware of <kbd>x</kbd> and <kbd>dd</kbd>**. Read the full list of hotkeys by pressing <kbd>h</kbd> in command mode to figure out why.

>*Heads up:* Because we'll be using Jupyter notebooks so much in this course, I strongly recommend investing 5 more minutes playing around with cell types, modes and hotkeys. It will save you heaps of time down the road. Above all, make sure you have read and understood these ^ two points!

When you run a code cell by pressing <kbd>shift</kbd> + <kbd>enter</kbd>, the code gets evaluated by the Python interpreter installed on your computer. If the last line of the cell is an expression, its value is displayed below the cell, unless you store it in a variable. In general, you will use code cells for doing analysis and working with data.
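
As a minimal sketch of what ends up below a code cell (the variable name `total` is just an illustration, not part of the exercises), consider a cell with these three statements:

```python
# Assigning a value stores it; the notebook shows nothing for this line.
total = 2 + 3

# print() always writes to the output area below the cell.
print(total)

# A bare expression on the last line of a cell is displayed by the notebook.
# (Run as a plain script instead, this line would produce no output.)
total * 2
```

Only the last expression in a cell is displayed automatically, so use `print` for anything earlier in the cell that you want to see.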

*Markdown* is a simple markup language for formatting text (like *HTML* or $\LaTeX$). You will typically use it for writing explanations about how you solve the exercises and the results you get, and styling your notebook with sections and subsections. It can do **bold**, *italics* and $\LaTeX$ formatting (for equations), and much much more. You can read about the Markdown language [here](http://daringfireball.net/projects/markdown/).

Below is your first exercise. The exercises are numbered by the convention `[session]`.`[section]`.`[problem]`.`[subproblem]`. For example, exercise 4.2.3.1 is in week 4, section 2, problem 3, and subproblem 1.

>**Ex. 1.0.1**: In the Markdown cell below, write a short text that shows that you can:
>* Create sections
>* Write words in bold and italics
>* Write an equation in LaTeX formatting
>* Create bullet lists
>* Create [hyperlinks](https://en.wikipedia.org/wiki/Hyperlink)

>*Hint: Remember to execute the cell (<kbd>shift</kbd>+<kbd>enter</kbd>) so the Markdown gets rendered.*

[Answer to Ex. 1.0.1]

### Part 2: Essential Python

These exercises take you through some very basic Python. Use them to calibrate your expectations: if you find them hard, you must spend some more time getting up to speed (see the preparation goals for today's session in the [canvas](https://canvas.disabroad.org/courses/7424) calendar).

>**Ex. 1.1.1**: Create a list `a` that contains the numbers from $0$ to $1110$ (including $0$ and $1110$), incremented by one, using the `range` function.

In [None]:
# [Answer to Ex. 1.1.1]

>**Ex. 1.1.2**: Show that you understand [slicing](http://stackoverflow.com/questions/509211/explain-pythons-slice-notation) in Python by extracting a list `b` with the numbers from $760$ to $769$ (including both) from the list created above.

In [None]:
# [Answer to Ex. 1.1.2]

>**Ex. 1.1.3**: Define a function that takes as input a number $x$ and outputs the number multiplied by itself plus three: $f(x) = x(x+3)$.

In [None]:
# [Answer to Ex. 1.1.3]

>**Ex. 1.1.4**: Apply this function to every element of the list `b` using a `for` loop and append the results to a new list `c`. Print `c`.

In [None]:
# [Answer to Ex. 1.1.4]

>**Ex. 1.1.5**: Do the exact same thing using a *list comprehension*.

In [None]:
# [Answer to Ex. 1.1.5]

>**Ex. 1.1.6**: Write the numbers in `c` to a text file with one number per line.

In [None]:
# [Answer to Ex. 1.1.6]

>**Ex. 1.1.7**: Show that you understand how strings work in Python. You should:
>
>1. Add a comment above each block of code that explains it.
>2. Find all the lines where **a string** is put into a string. How many are there?
>3. Explain the difference between `%d`, `%s` and `%f`.
>
>[Source](https://learnpythonthehardway.org/book/ex6.html)

In [3]:
# This is an example of a comment

x = "There are %d types of people." % 10
binary = "binary"
do_not = "don't"
y = "Those who know %s and those who %s." % (binary, do_not)

print(x)
print(y)

print("I said: %s." % x)
print("I also said: '%s'." % y)

hilarious = False
joke_evaluation = "Isn't that joke so funny?! %s"

print(joke_evaluation % hilarious)

w = "This is the left side of..."
e = "a string with a right side."

print(w + e)

popu = 5.840045
print("%f million people live in Denmark" % popu)
print("About %.4f million people live in Denmark" % popu)
print("About %.2f million people live in Denmark" % popu)

There are 10 types of people.
Those who know binary and those who don't.
I said: There are 10 types of people..
I also said: 'Those who know binary and those who don't.'.
Isn't that joke so funny?! False
This is the left side of...a string with a right side.
5.840045 million people live in Denmark
About 5.8400 million people live in Denmark
About 5.84 million people live in Denmark


[Answer to Ex. 1.1.7.2]

[Answer to Ex. 1.1.7.3]

>**Ex. 1.1.8**: Why does `5 // 2` output `2` in Python 3.x? What does `5 / 2` give?

In [None]:
# [Answer to Ex. 1.1.8]

>**Ex. 1.1.9**: Explain the point of using `try` and `except` statements. Write some code that shows how to use them.
>
> *Hint: You will do a lot of Googling in this course. If you don't already know how to use `try` and `except`, start Googling now.*

In [None]:
# [Answer to Ex. 1.1.9]

>**Ex. 1.1.10**: `dict`s and `defaultdict`s.
>1. What is a `defaultdict`? How would you say it is different from a normal Python `dict`?
>2. Write some code that takes a list of tuples:
>
>        l = [("a", 1), ("b", 3), ("a", None), ("c", False), ("b", True), ("a", None)]
>
>     And produces a `defaultdict` object
>
>        defaultdict(<class 'list'>, {'a': [1, None, None], 'b': [3, True], 'c': [False]})
>
>*Hint: you can import `defaultdict` from `collections`. Your code should be a for loop that loops over the tuples in `l` and updates an initially empty defaultdict, iteration after iteration.*

In [None]:
# [Answer to Ex. 1.1.10]

>**Ex. 1.1.11**: Take a list `a = list("welcometodatasciencewithpython")` and
>1. count the number of times each element occurs using `Counter`,
>2. report the two most common elements.
>
>*Hint: you can import `Counter` from `collections`. `Counter` has a method called `most_common` that you can use.*

In [None]:
# [Answer to Ex. 1.1.11]

>**Ex. 1.1.12**: Take another list `b = list("itsgoingtobeafuncourse")` and
>1. get the `set` of characters that exist in both `a` and `b` (intersection),
>2. get the `set` of characters that exist in either `a` or `b` (union), and
>3. compute the [Jaccard similarity](https://en.wikipedia.org/wiki/Jaccard_index) between the distinct elements in `a` and `b`.
>
>*Hint: use the `set` function to get a `set`-type object of distinct elements from a list. Sets support a [number of different operations](https://snakify.org/en/lessons/sets/#section_4).*
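
To get a feel for set operations before applying them to the exercise, here is a small sketch on made-up numbers (the names `evens` and `small` are illustrative and have nothing to do with the lists `a` and `b`):

```python
# set() keeps only the distinct elements of a list.
evens = set([0, 2, 4, 6, 8, 2, 4])
small = {0, 1, 2, 3}

# & gives the elements found in both sets, | the elements found in either.
both = evens & small
either = evens | small

print(both)
print(either)
print(len(both), len(either))
```

The same results are available through the methods `evens.intersection(small)` and `evens.union(small)`.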

In [None]:
# [Answer to Ex. 1.1.12]

### Part 3: A little bit of real data

>**Ex. 1.2.1**: Learn about JSON by reading the **[Wikipedia page](https://en.wikipedia.org/wiki/JSON)**. Then answer the following questions in the cell below.
>
>1. What do the letters stand for?
>2. What is JSON?
>3. Why is JSON superior to XML? (... or why not?)

[Answers to Ex. 1.2.1.1-3]

>**Ex. 1.2.2**: Working with JSON files
>1. Use [`requests`](https://www.google.dk/search?q=python+requests+get+json&gws_rd=cr&ei=M5OdWaewD8Ti6AS54J24Bg), or another Python module, to fetch the **[data stored at this url](https://www.reddit.com/r/gameofthrones/.json)** in a new variable `data`.
>2. Show that `data` is a `dict` type object.

In [None]:
# [Answer to Ex. 1.2.2.1]

In [None]:
# [Answer to Ex. 1.2.2.2]

>**Ex. 1.2.3**: Let's try to inspect the data you retrieved. 
>
>1. Use the `json` module to print your data variable as a string with `indent=4`.
>2. Print the keys of `data`.
>
>*Hint: 1. Use the `json` function `dumps`. 2. Call `.keys()` on the variable.*
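
As a rough illustration of what `dumps` does, here is a sketch on a made-up object (`toy` is hypothetical and only stands in for real data):

```python
import json

# Hypothetical toy object; the real exercise uses the Reddit data instead.
toy = {'course': 'data science', 'weeks': [1, 2, 3], 'online': False, 'room': None}

# dumps turns a Python object into a JSON-formatted string.
text = json.dumps(toy, indent=4)
print(type(text))
print(text)

# loads goes the other way: JSON string -> Python object.
print(json.loads(text) == toy)
```

Note how Python's `False` and `None` appear as `false` and `null` in the JSON text.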

In [None]:
# [Answer to Ex. 1.2.3.1]

In [None]:
# [Answer to Ex. 1.2.3.2]

>**Ex. 1.2.4**: The URL reveals that the data is from reddit/r/gameofthrones, but can you recover that information from the data? Give your answer by 'keying' into the dictionary using square brackets.
>
>*Hint: 'Keying' is a word I just made up. By it, I mean the following. Consider a nested dictionary like:*
>
>        my_json_obj = {
>            'cats': {
>                'awesome': ['Missy'],
>                'useless': ['Kim', 'Frank', 'Sandy']
>            },
>            'dogs': {
>                'awesome': ['Finn', 'Dolores', 'Fido', 'Casper'],
>                'useless': []
>            }
>        }
>
>*I can get the list of useless cats by keying into `my_json_obj` like such:*
>
>        >>> my_json_obj['cats']['useless']
>        Out [ ]: ['Kim', 'Frank', 'Sandy']
>
>*`my_json_obj['cats']` returns the dictionary `{'awesome': ['Missy'], 'useless': ['Kim', 'Frank', 'Sandy']}` and getting '`useless`' from that gives us `['Kim', 'Frank', 'Sandy']`. If any of those list items were a list or a dictionary themselves, we could have kept keying deeper into the structure.*
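
Keying deeper might look like the sketch below, where a dictionary contains a list of dictionaries. The structure `shelter` is hypothetical and is not related to the data you fetched:

```python
# Hypothetical nested structure: a dict holding a list of dicts.
shelter = {
    'name': 'Happy Paws',
    'animals': [
        {'species': 'cat', 'names': ['Missy', 'Kim']},
        {'species': 'dog', 'names': ['Finn', 'Fido', 'Casper']},
    ],
}

# Lists are indexed by position, dictionaries by key.
second_animal = shelter['animals'][1]
print(second_animal['species'])

# The steps can be chained: key -> index -> key -> index.
print(shelter['animals'][1]['names'][0])

# A missing key raises KeyError with square brackets; .get() returns None instead.
print(shelter.get('address'))
```

When you don't know a structure yet, it helps to go one level at a time and check `type(...)` or `.keys()` at each step.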

In [None]:
# [Answer to Ex. 1.2.4]

>**Ex. 1.2.5**: Write two `for` loops (or list comprehensions) which:
>1. Count the number of spoilers.
>2. Print only the headlines that aren't spoilers.

In [None]:
# [Answer to Ex. 1.2.5.1]

In [None]:
# [Answer to Ex. 1.2.5.2]