# Introduction to Jupyter Notebooks

![Jupyter Logo](https://avatars1.githubusercontent.com/u/7388996?s=200&v=4 "Jupyter Logo")

## Introduction

> Project Jupyter is a set of open source software projects that form the building blocks for interactive and exploratory computing that is reproducible and multi-language. The main application offered by Jupyter is the Jupyter Notebook, a web-based interactive computing platform that allows users to author documents that combine live code, equations, narrative text, interactive dashboard and other rich media. These documents provide a complete record of a computation and can be shared with others using email, Dropbox, version control systems (git/GitHub) and nbviewer.
>
> -- https://github.com/jupyter/design/wiki/Jupyter-Logo

## Where does the name come from?

* The planet Jupiter is meant to evoke associations with science
* The original backends were **Ju**lia, **Pyt**hon, and **R**
* Galileo's discovery of Jupiter's moons is an early example of reproducible science

## Support

### Backends

There are [backends for a wide range of languages](https://github.com/jupyter/jupyter/wiki/Jupyter-kernels), including:

* Python
* JavaScript
* PHP
* Go
* C++
* Java
* and many more

### Jupyter support in Visual Studio Code

<img alt="Jupyter VS Code" src="img/vscode.png" style="height: 600px;"/>


### Hosting (examples)

<img alt="Google Colab" src="img/colab.png" style="height: 600px;"/>

<img alt="Binder" src="img/binder.png" style="height: 600px;"/>

## Why use notebooks?

* Data exploration
* Plotting
* Reproducibility & documentation

### Data exploration

![](img/exploration1.png)

![](img/exploration2.png)

### Inline Plotting

<img alt="" src="https://cdn-images-1.medium.com/max/1600/1*mTfjnFoMMq8kgUUA_28LiQ.jpeg" style="height: 600px;"/>

### Reproducibility & documentation

![](img/literate-code.png)

## Downsides

* out-of-order execution
* hidden state
* tolerates sloppy work (regarding modularity, file naming, etc.)

(via [I Don't Like Notebooks - Joel Grus - #JupyterCon 2018](https://docs.google.com/presentation/d/1n2RlMdmv1p25Xy5thJUhkKGvjtV-dkAIsUXP-AL4ffI/edit#slide=id.g3b600ce1e2_0_0))

Also, at present:

* autocomplete ignores type annotations
* no linting
In [9]:
x = 1

In [7]:
x += 1

In [10]:
x

1
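
The execution counters show that the cells above ran in the order 7, 9, 10 and not top to bottom, which is why the result is 1 and not 2. The effect can be reproduced in plain Python. This is only a minimal sketch: the helper `run_cells` is hypothetical, a dict stands in for the kernel state, and the initial `x = 0` is an assumed leftover from an earlier cell that is no longer visible.

```python
def run_cells(cells, order, namespace=None):
    """Execute cell sources in the given order against one shared namespace."""
    namespace = {} if namespace is None else dict(namespace)
    for index in order:
        exec(cells[index], namespace)
    return namespace


# Cell sources as they appear top to bottom in the notebook.
cells = ["x = 1", "x += 1"]

# Reading the notebook top to bottom suggests x == 2 ...
top_to_bottom = run_cells(cells, order=[0, 1])

# ... but the kernel ran the second cell first. That only works because
# an earlier (assumed) cell had already left x in the kernel state.
hidden_state = {"x": 0}
as_executed = run_cells(cells, order=[1, 0], namespace=hidden_state)

print(top_to_bottom["x"], as_executed["x"])
```

Restarting the kernel and running all cells from the top is the usual way to check that a notebook does not depend on such leftovers.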

## Live Demo: Apollo.ai Clustering x NewsAPI.org

## Further reading

* Project homepage: https://jupyter.org/
* Slideshow Extension: [RISE](https://rise.readthedocs.io/en/docs_hot_fixes/)
* NewsAPI Doc: https://newsapi.org/docs/endpoints/top-headlines
* Tutorial: [Reproducible Data Analysis in Jupyter](http://jakevdp.github.io/blog/2017/03/03/reproducible-data-analysis-in-jupyter/)
* Blog Post: [28 Jupyter Notebook tips, tricks, and shortcuts](https://www.dataquest.io/blog/jupyter-notebook-tips-tricks-shortcuts/)
* Tool: https://github.com/kynan/nbstripout
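
nbstripout strips outputs from notebooks, typically before they are committed, so that printed data does not end up in version control. The following is only a rough sketch of the idea and not nbstripout's actual implementation. The helper `strip_outputs` and the tiny hand-written notebook are hypothetical; the dict layout follows the nbformat 4 notebook JSON structure.

```python
import json


def strip_outputs(notebook):
    """Return a copy of a notebook dict with outputs and execution counts cleared."""
    stripped = json.loads(json.dumps(notebook))  # cheap deep copy via JSON
    for cell in stripped.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return stripped


# Tiny hand-written notebook (hypothetical content) in nbformat 4 layout.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Demo"]},
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": 10,
            "source": ["x"],
            "outputs": [
                {
                    "output_type": "execute_result",
                    "execution_count": 10,
                    "metadata": {},
                    "data": {"text/plain": ["1"]},
                }
            ],
        },
    ],
}

clean = strip_outputs(notebook)
print(json.dumps(clean["cells"][1], indent=2))
```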

In [None]:
!ls *.json

In [None]:
!cat secrets.json

In [13]:
import json

# API keys are kept out of the notebook in a local JSON file
with open("secrets.json") as f:
    secrets = json.load(f)

NEWSAPI_KEY = secrets["NEWSAPI_KEY"]
APOLLO_KEY = secrets["APOLLO_KEY"]

FileNotFoundError: [Errno 2] No such file or directory: 'secrets.json'
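
The cell above fails as soon as `secrets.json` is not in the working directory. One possible way to make the lookup more forgiving is sketched below, assuming the keys may alternatively be provided as environment variables of the same name. The helper `load_secrets` and that fallback are assumptions for illustration and not part of the original demo.

```python
import json
import os
from pathlib import Path


def load_secrets(path="secrets.json", keys=("NEWSAPI_KEY", "APOLLO_KEY")):
    """Read API keys from a JSON file, falling back to environment variables."""
    file = Path(path)
    secrets = json.loads(file.read_text()) if file.exists() else {}
    # Environment variables fill in anything the file did not provide.
    for key in keys:
        if key not in secrets and key in os.environ:
            secrets[key] = os.environ[key]
    missing = [key for key in keys if key not in secrets]
    if missing:
        raise KeyError("missing secrets: " + ", ".join(missing))
    return secrets
```

With this helper the cell would read `secrets = load_secrets()`, and a missing key is reported by name instead of surfacing later as a `NameError`.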

In [11]:
import requests

In [12]:
params = {"q": "Trump", "pageSize": 100, "apiKey": NEWSAPI_KEY}
r = requests.get("https://newsapi.org/v2/top-headlines", params=params)

NameError: name 'NEWSAPI_KEY' is not defined

In [None]:
r.status_code

In [None]:
r.json()

In [None]:
from pprint import pprint
pprint(r.json())
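
Before looping over the response it can help to check that the request actually succeeded. The sketch below works on the decoded JSON only, so it runs without network access. It assumes that a successful payload carries `"status": "ok"` next to the `articles` list used in the following cells, and that an error payload carries a `message` field. The helper `extract_articles` and the sample payloads are hypothetical.

```python
def extract_articles(payload):
    """Return the article list from a decoded top-headlines response."""
    if payload.get("status") != "ok":
        # Assumed error shape: {"status": "error", "code": ..., "message": ...}
        raise RuntimeError("NewsAPI error: " + str(payload.get("message", "unknown")))
    return payload.get("articles", [])


# Hand-written sample payloads, not real API output.
ok_payload = {
    "status": "ok",
    "totalResults": 1,
    "articles": [{"title": "Example headline", "description": None}],
}
error_payload = {"status": "error", "code": "apiKeyInvalid", "message": "bad key"}

print(extract_articles(ok_payload))
try:
    extract_articles(error_payload)
except RuntimeError as exc:
    print(exc)
```

In the notebook this would be called as `extract_articles(r.json())`.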

In [None]:
for item in r.json()["articles"]:
    title = item["title"] or "<NA>"
    print(title)

In [None]:
import langdetect
langdetect.detect("Der US-Shutdown ist beendet - und Trump erleidet empfindliche Niederlage")

In [None]:
articles = []
for item in r.json()["articles"]:
    title = item["title"] or "<NA>"
    description = item["description"] or "<NA>"
    lang = langdetect.detect(title + " " + description)
    if lang != "en":
        continue
    articles.append((title, description))

In [None]:
articles[:3]

In [None]:
headers = {
    "Authorization": "Bearer " + APOLLO_KEY
}
res = requests.post("http://api.apollo.ai/clustering?lang=en&threshold=.4", headers=headers, json=articles)
res.status_code

In [None]:
articles = []
for item in r.json()["articles"]:
    title = item["title"] or "<NA>"
    description = item["description"] or "<NA>"
    lang = langdetect.detect(title + " " + description)
    if lang != "en":
        continue
    articles.append({"title": title, "content": description})

In [None]:
articles[:3]
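
The filtering loop can also be written as a small function that receives the language detector as an argument, which makes it testable without langdetect or network access. This is a sketch under assumptions: the helper `filter_by_language` and the stand-in detector are hypothetical, and texts for which the detector raises are simply skipped.

```python
def filter_by_language(items, detect, lang="en"):
    """Keep only items whose title and description are detected as `lang`."""
    kept = []
    for item in items:
        title = item.get("title") or "<NA>"
        description = item.get("description") or "<NA>"
        try:
            detected = detect(title + " " + description)
        except Exception:
            # A detector may raise on text it cannot classify; skip such items.
            continue
        if detected == lang:
            kept.append({"title": title, "content": description})
    return kept


def fake_detect(text):
    """Stand-in detector for this sketch: flags German by one keyword."""
    if not text.strip("<NA> "):
        raise ValueError("no features in text")
    return "de" if "und" in text.split() else "en"


sample = [
    {"title": "Shutdown ends", "description": "Congress reaches a deal"},
    {"title": "Der Shutdown ist beendet", "description": "und es geht weiter"},
    {"title": None, "description": None},
]
print(filter_by_language(sample, fake_detect))
```

In the notebook the call would be `filter_by_language(r.json()["articles"], langdetect.detect)`. langdetect results can vary between runs unless `langdetect.DetectorFactory.seed` is set beforehand.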

In [None]:
headers = {
    "Authorization": "Bearer " + APOLLO_KEY
}
res = requests.post("http://api.apollo.ai/clustering?lang=en", headers=headers, json=articles)
res.status_code

In [None]:
for cluster in res.json()["data"]:
    for a in cluster:
        print(a["title"])
    print("="*50)

In [None]:
res = requests.post("http://api.apollo.ai/clustering?lang=en&threshold=.4", headers=headers, json=articles)
res.status_code

In [None]:
for cluster in res.json()["data"]:
    for a in cluster:
        print(a["title"])
    print("="*50)

In [None]:
def cluster_and_print_results(threshold=.8):
    """Cluster the global `articles` list and print the titles of each cluster."""
    headers = {
        "Authorization": "Bearer " + APOLLO_KEY
    }
    params = {"lang": "en", "threshold": threshold}
    res = requests.post("http://api.apollo.ai/clustering", params=params, headers=headers, json=articles)

    for cluster in res.json()["data"]:
        for a in cluster:
            print(a["title"])
        print("="*50)

In [None]:
cluster_and_print_results(threshold=.2)
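
Separating the formatting from the HTTP call makes it possible to check the output without calling the API. The sketch below assumes the response shape used in the cells above, where `data` is a list of clusters and each cluster is a list of articles with a `title`. The helper `format_clusters` and the sample data are hypothetical.

```python
def format_clusters(clusters, separator="=" * 50):
    """Render clusters as text: one title per line, a separator after each cluster."""
    lines = []
    for cluster in clusters:
        lines.extend(article["title"] for article in cluster)
        lines.append(separator)
    return "\n".join(lines)


# Hand-written stand-in for res.json()["data"], not real API output.
sample_clusters = [
    [{"title": "Shutdown ends"}, {"title": "Government reopens"}],
    [{"title": "Unrelated story"}],
]

print(format_clusters(sample_clusters))
```

Inside the notebook the function would be used as `print(format_clusters(res.json()["data"]))`.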