# Innovating Journalism
## Practical Python exercise 1: Getting started with Jupyter Notebooks

*Damian Trilling and Penny Sheets*

This notebook is meant to show you what you can do with a Jupyter Notebook. Feel free to play around!

## Downloading this notebook from GitHub

You can find this file on GitHub. GitHub is a very popular platform for sharing code, and it has a central role in open-source software development and in open science. People who want to make their data analyses transparent usually share their code there.

Usually, people *clone* or download whole repositories (projects) from GitHub (and, in fact, you can do so too if you want to). If you just want a single file (such as this one), you can get it as follows:

1. Click on the file to view it on GitHub.
2. Click on the "Raw" button in the upper right corner.
3. Depending on your system, the file either downloads directly or you see the raw computer code behind it. That's fine; just click on "File/Save as" in your browser.
4. It is *very important* that you do *not* choose 'HTML', 'web page' or something similar in the Save As dialog. Choose "All files (\*.\*)" or similar as the file type, and make sure that the filename ends in ".ipynb".
5. Open the downloaded file in Jupyter Notebook on your computer.

## Cell types

There are different types of cells: code cells and markdown cells (there are two more, but you won't need them). Markdown cells contain text; code cells contain, well, code. You can edit a cell by double-clicking on it. To 'run' the cell (in the case of markdown, to format it), press CTRL-Enter.

Try it out!

If you want to know more about formatting with markdown, have a look at 
https://guides.github.com/features/mastering-markdown/

## To create a new cell
...you click the 'plus sign' (+) button at the top left of the Jupyter Notebook toolbar. By default, the new cell is a code cell, but you can change it to a markdown cell in the dropdown menu on the right side of that same toolbar. Try creating a new markdown cell below this one. Don't forget to press control+enter to format it.

# Note that there are various ways to format things; starting a line with hash signs (#) gives you bigger, bolder fonts.

### More hash signs give smaller fonts, but still bigger and bolder than plain text.

## Running Python code in Jupyter

Now we can start with some actual Python commands, actual code, instead of markdown.

Let's try to print something... don't forget to hit control+enter to run the command.

```python
print('Hello world')
```

Now create your own print command in the next cell. Print whatever you want. The key is to format the command correctly: you need parentheses and quotation marks, and you need to make sure all of them are closed again afterward. Jupyter helps you with this quite a bit (for example, look at how the colors change depending on whether you format something correctly or incorrectly), but you have to practice.

```python
print('boogers')
```

```
boogers
```
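A few more variations on `print`, as a small sketch. The variable names are made up for illustration. It shows that single and double quotes both work, and that `print` can take several items at once.

```python
# Strings (text) can use single or double quotes; both create the same text
greeting_single = 'Hello world'
greeting_double = "Hello world"
print(greeting_single == greeting_double)  # True

# Double quotes are handy when the text itself contains an apostrophe
sentence = "It's a nice day"
print(sentence)

# print() accepts several items separated by commas and puts a space between them
print('The answer is', 42)
```

Whichever quote mark opens a string must also close it. That is why `'It's a nice day'` would break: the apostrophe ends the string too early.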


Python also allows us to do very simple calculations.  Just tell it the values and make it do the work:

```python
a = 5
b = 10
c = a + b
print(c)
```
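Addition is only one of the basic operators. Here is a short sketch of the others, reusing the same `a` and `b`:

```python
a = 5
b = 10

print(a + b)   # addition: 15
print(b - a)   # subtraction: 5
print(a * b)   # multiplication: 50
print(b / a)   # division always returns a decimal number: 2.0
print(b ** 2)  # power (10 squared): 100
```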

Once you run commands, you see that output appears. You also see that a number appears in the square brackets to the left of the cell (for example `In [3]:`). This number indicates the step you've just taken during that particular session. It is convenient for checking whether you've run a command or not, especially for commands that don't give any output (like importing a module, for example). You can always clear what you've done and re-run some (or all) cells by using the "Cell" menu above.

**Just note:** if you clear everything (for example by restarting the kernel via the "Kernel" menu), then all imported data and modules (see the next section) are cleared too. So you can't just start running your commands in the middle of the notebook; you'll often have to go back to the earlier cells and start from scratch. (Since the code is already written, this sometimes takes literally only seconds.)
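The order in which you run cells matters, not the order in which they appear on the page. Here is a small sketch of why. It is written as one cell for convenience; imagine each part in its own cell.

```python
a = 5
b = 10
c = a + b
print(c)  # 15

# Suppose a later cell changes a...
a = 100

# ...c keeps its old value, because the line c = a + b has not been run again
print(c)  # 15

# Only running the calculation again updates c
c = a + b
print(c)  # 110
```

If you skip an earlier cell after a restart, a later cell that uses `a` or `c` will fail with a `NameError`, because those names don't exist yet.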


## Importing Modules

Because we want to do a lot more than print words and run simple calculations, we can import modules that help us do fancier things, and in particular help us read data easily. Our main module in this course is called "pandas". Whenever you import anything into Python, it needs a name. You can leave the original name, pandas (by just typing `import pandas`), or shorten it so you don't have to type it again and again and again. A commonly used shorthand for pandas is `pd`. Try importing it now.

```python
import pandas as pd
```
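To see that `pd` is only a nickname, you can import pandas both ways and compare. This is a minimal sketch; the printed version number will depend on your installation.

```python
import pandas
import pandas as pd

# Both names point to the very same module; pd is just a shorter alias
print(pd is pandas)  # True

# The installed version differs from computer to computer
print(pd.__version__)
```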

We'll explain a bit more about pandas in class, but pandas basically allows us to work with data more easily. So any time you see a line of code below that contains `pd.`, pandas is at work.

Here, pandas can help us read in a dataset from the web. It is a well-known example dataset that is often used to illustrate things like statistical programming. The command is simply `pd.read_` followed by the type of the file (here `pd.read_csv`, because this is a CSV file), with the file's URL in parentheses. Pandas can also read many other types of files; we'll address more of those in a minute.
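A minimal sketch of the `pd.read_` pattern, using a tiny CSV that only exists in memory. The names and ages are made-up example data, and the file names in the comments are hypothetical. This lets the cell run without an internet connection.

```python
import io

import pandas as pd

# A tiny made-up CSV "file" held in memory (hypothetical example data)
csv_text = """name,age
Anna,31
Bram,25
"""

# read_csv accepts a URL, a path on your computer, or a file-like object such as this one
people = pd.read_csv(io.StringIO(csv_text))
print(people)

# Other file types follow the same pd.read_<type> pattern (hypothetical file names):
# pd.read_excel('my_data.xlsx')  # Excel files (needs the openpyxl package installed)
# pd.read_json('my_data.json')   # JSON files
```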

In this case, the dataset comes from the URL in the cell below. Because the dataset is originally called 'iris', we will also tell Python to call it `iris`.

As for the second line of code in that cell: if you just type the name of the dataset after having read it into Jupyter, Jupyter also displays (part of) the dataset for you to see. This is handy, as long as you don't have insanely huge datasets.

```python
iris = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv')
iris
```

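|   | sepal_length | sepal_width | petal_length | petal_width | species |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |
| 5 | 5.4 | 3.9 | 1.7 | 0.4 | setosa |
| 6 | 4.6 | 3.4 | 1.4 | 0.3 | setosa |
| 7 | 5.0 | 3.4 | 1.5 | 0.2 | setosa |
| 8 | 4.4 | 2.9 | 1.4 | 0.2 | setosa |
| 9 | 4.9 | 3.1 | 1.5 | 0.1 | setosa |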
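For larger datasets, you usually don't want to print everything. Here is a sketch of two common ways to take a quick look. To keep it runnable offline, it uses a hand-typed stand-in called `iris_small` (a name made up here), built from the first five rows shown above.

```python
import pandas as pd

# A small stand-in for iris: the first five rows shown above, typed in by hand
iris_small = pd.DataFrame({
    'sepal_length': [5.1, 4.9, 4.7, 4.6, 5.0],
    'sepal_width': [3.5, 3.0, 3.2, 3.1, 3.6],
    'petal_length': [1.4, 1.4, 1.3, 1.5, 1.4],
    'petal_width': [0.2, 0.2, 0.2, 0.2, 0.2],
    'species': ['setosa'] * 5,
})

# .head(n) shows only the first n rows (5 if you leave n out)
print(iris_small.head(2))

# .shape gives (number of rows, number of columns) without printing the data
print(iris_small.shape)  # (5, 5)
```

With the real dataframe, `iris.head()` shows the first five rows and `iris.shape` gives `(150, 5)`.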


### Methods
In Python, basically everything is an "object". That's also why we assign names to them: It makes the objects re-usable. In the cell above, we created an object (more specifically, a pandas dataframe) that we called `iris`.
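You can ask Python what kind of object a name refers to with the built-in `type()` function. Here is a short sketch; the names `number`, `text` and `table` are just examples.

```python
import pandas as pd

number = 15
text = 'boogers'
table = pd.DataFrame({'x': [1, 2, 3]})

# type() tells you what kind of object a name refers to
print(type(number))  # <class 'int'>
print(type(text))    # <class 'str'>
print(type(table))   # <class 'pandas.core.frame.DataFrame'>
```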

Objects can have "methods" that are associated with them. Pandas dataframes, for example, have some methods that allow you to directly run some simple analyses on them. One of them is `.describe()`.

Note the `()` at the end. If you want to "call" (= execute, run) a method, you need to end with these parentheses. They also allow you to give some additional "arguments" (parameters, options). Compare the following two method calls:

```python
iris.describe()
```

|   | sepal_length | sepal_width | petal_length | petal_width |
|---|---|---|---|---|
| count | 150.0 | 150.0 | 150.0 | 150.0 |
| mean | 5.843333 | 3.057333 | 3.758 | 1.199333 |
| std | 0.828066 | 0.435866 | 1.765298 | 0.762238 |
| min | 4.3 | 2.0 | 1.0 | 0.1 |
| 25% | 5.1 | 2.8 | 1.6 | 0.3 |
| 50% | 5.8 | 3.0 | 4.35 | 1.3 |
| 75% | 6.4 | 3.3 | 5.1 | 1.8 |
| max | 7.9 | 4.4 | 6.9 | 2.5 |


```python
iris.describe(percentiles=[0.1, 0.9])
```

|   | sepal_length | sepal_width | petal_length | petal_width |
|---|---|---|---|---|
| count | 150.0 | 150.0 | 150.0 | 150.0 |
| mean | 5.843333 | 3.057333 | 3.758 | 1.199333 |
| std | 0.828066 | 0.435866 | 1.765298 | 0.762238 |
| min | 4.3 | 2.0 | 1.0 | 0.1 |
| 10% | 4.8 | 2.5 | 1.4 | 0.2 |
| 50% | 5.8 | 3.0 | 4.35 | 1.3 |
| 90% | 6.9 | 3.61 | 5.8 | 2.2 |
| max | 7.9 | 4.4 | 6.9 | 2.5 |
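Here is a small sketch of the difference the parentheses make. It uses a mini dataframe `df` (a name made up here) built from the first three iris rows shown earlier. `.mean()` and `.sort_values()` are standard pandas dataframe methods.

```python
import pandas as pd

# Mini dataframe built from the first three rows of iris shown earlier
df = pd.DataFrame({
    'sepal_length': [5.1, 4.9, 4.7],
    'sepal_width': [3.5, 3.0, 3.2],
})

# Without parentheses you only get the method itself; nothing is computed yet
print(df.mean)

# With parentheses the method is called and returns a result (the mean of each column)
print(df.mean())

# Arguments go inside the parentheses, here telling sort_values which column to sort by
sorted_df = df.sort_values(by='sepal_length')
print(sorted_df)
```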


One more note: just like SPSS offers syntax help, Jupyter is happy to help you. You can type a command and then put a question mark after it, and it will show you the documentation for that command. Try it here:

```python
iris.describe?
```
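The `?` shortcut is a Jupyter (IPython) feature. In a plain Python session, the built-in `help()` function shows the same documentation. A minimal sketch:

```python
import pandas as pd

# Plain-Python equivalent of `describe?`: help() prints the method's documentation
help(pd.DataFrame.describe)

# The documentation text itself is stored on the method as its "docstring"
doc = pd.DataFrame.describe.__doc__
print(doc[:300])  # print only the first 300 characters
```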


You can even get a list of all available methods and properties by pressing the TAB key after typing `iris.`, `pd.`, and so on. Try it:

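```python
# Put your cursor after the dot and press TAB. Don't run this cell:
# `pd.` on its own is incomplete code and raises a SyntaxError.
pd.
```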
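Outside Jupyter, the built-in `dir()` function gives you a similar list of names. This sketch uses it to find pandas' family of file readers, the `pd.read_` functions mentioned earlier:

```python
import pandas as pd

# dir() lists the names available on an object, similar to what TAB completion shows
all_names = dir(pd)

# Keep only the names that start with 'read_': the family of file readers
readers = [name for name in all_names if name.startswith('read_')]
print(readers)
```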

# You see...

It's actually not that difficult. It can seem overwhelming not to know the commands for things, but that's what we're going to teach you. And there are tons of resources online to help, as well.

What's wonderful about Jupyter Notebook is that we have code, results, and explanations/notes in one single file! You will also format your assignments this way, using markdown cells to provide notes. The output (the results) doesn't matter that much in the file itself, because we can always re-run the code each time we open your files. But the markdown and code cells are essential.

We are looking forward to exploring the possibilities of Jupyter Notebook, Python, and Pandas with you in the next weeks!