# The easiest way to get Jupyter notebooks on your system
![Anaconda](./images/basics-anaconda.png)

The easiest way to get Jupyter notebooks on your own system is to install the [**Anaconda**](https://www.anaconda.com/download/) distribution of Python. Not only is it one of the better distributions, but it comes with Jupyter notebooks built in!

# Running Jupyter notebooks on your machine once it’s installed
Incredibly easy. Just open up a terminal and type

`jupyter notebook`

on the command line. It starts a local Jupyter notebook server and opens the Jupyter page in your web browser.

# If you don’t want to commit just yet...
![Try Jupyter](./images/basics-try_jupyter.png)

You can visit the [**Try Jupyter**](https://jupyter.org/try) page and try it online, in the safety of your web browser.

# You can even look at this presentation online!

1. Point your browser at [**colab.research.google.com**](https://colab.research.google.com/).
2. If you’re already logged into a Google service like Gmail, you’ll see a dialog box with an orange menu bar. Select **GitHub** from the menu bar and enter the GitHub repo name **AccordionGuy/DevFestFlorida2019**, then jump to step 4.
3. If you’re not logged into a Google service, select **File**, then **Open notebook...**. Then follow step 2.
4. You’ll be presented with a directory of files. Files with the **.ipynb** extension are Jupyter notebooks. Open them!

# Jupyter notebooks are made of cells
![Cells](./images/intro-cells.png)

Think of a notebook as a single column of cells, each of which can contain one of the following:

* Text
* Code

## This is a text cell
Text cells are usually written in **Markdown**, so you get all the formatting you’ve come to love and expect from a text markup language.

You can also use HTML when you bump up against Markdown’s limitations.

Jupyter notebooks are often used for papers with lots of math notation in them, and luckily for us, they also support **LaTeX**. The formula below was written in LaTeX:

$$c = \sqrt{a^2 + b^2}$$

You render a text cell by selecting one and pressing either:

* **Ctrl-Enter** to simply render the cell, or
* **Shift-Enter** to render the cell and move the focus to the next one.

## Below this is a code cell

In [None]:
print("This is a code cell.")
print("It contains code.")
print("You execute a code cell by selecting one and pressing either:")
print("**Ctrl-Enter** to simply execute code in the cell, or")
print("**Shift-Enter** to execute the code in the cell and move the focus to the next one.")

## Executing shell commands
Code cells can execute shell commands too — any line starting with `!` is executed as a shell command.

In [None]:
!ls
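The `!` prefix is a notebook convenience. If you want the directory listing as a Python value that works the same way on any operating system, here is a minimal sketch using only the standard library's `pathlib` module. The function name `list_directory` is just an illustration:

```python
from pathlib import Path

def list_directory(path="."):
    """Return the names of the entries in `path`, sorted alphabetically.

    This is roughly what `ls` prints. Unlike `ls`, it also includes hidden
    entries whose names start with '.'.
    """
    return sorted(entry.name for entry in Path(path).iterdir())

print(list_directory())
```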

## Magic commands, or “magics”
Jupyter also supports “magic” commands, which extend what a notebook can do. Line magics, such as `%time` and `%timeit` below, start with `%`.

In [None]:
import numpy as np
from numpy.random import randint

# A function to simulate one million dice throws.
# randint's `high` is exclusive, so each throw is an integer from 1 to 6.
def one_million_dice():
    return randint(low=1, high=7, size=1000000)

print("Let's try %time first:")
%time throws = one_million_dice()
%time mean = np.mean(throws)

print("\nLet's do the same with %timeit")
%timeit throws = one_million_dice()
%timeit mean = np.mean(throws)
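`%time` and `%timeit` are IPython features, so they only work inside a notebook or an IPython shell. In a plain Python script, the standard library's `timeit` module can take a comparable measurement. Here is a minimal sketch. The `number` and `repeat` counts are arbitrary choices, not the values `%timeit` picks automatically. Following the `timeit` docs' advice, it reports the fastest trial, whereas `%timeit` reports a mean and standard deviation:

```python
import timeit

import numpy as np
from numpy.random import randint

def one_million_dice():
    # `high` is exclusive, so each throw is an integer from 1 to 6.
    return randint(low=1, high=7, size=1000000)

throws = one_million_dice()

# Run each callable `number` times per trial, for `repeat` trials.
number, repeat = 10, 5
dice_times = timeit.repeat(one_million_dice, number=number, repeat=repeat)
mean_times = timeit.repeat(lambda: np.mean(throws), number=number, repeat=repeat)

# Each entry is the total time for `number` calls, so divide to get per-call time.
print(f"one_million_dice(): best of {repeat}: {min(dice_times) / number * 1000:.2f} ms per call")
print(f"np.mean(throws): best of {repeat}: {min(mean_times) / number * 1000:.2f} ms per call")
```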

# Example: A simple Jupyter notebook
![Basic music recommendation](./images/basics-basic_music_recommendation.jpg)

Let’s build an incredibly simple music recommendation engine — no data science libraries; just pure Python.

Consider a set of users’ ratings of a set of bands on a scale of 1 to 5, where 1 is “hated it”, and 5 is “loved it”. Here’s the data in table form:

![Music tastes chart](./images/basics-ratings.jpg)

One way to express the chart in Python form is to turn it into a dictionary where:

* The users’ names are the keys, and
* Their ratings are the values — and those ratings in turn are dictionaries where:
     * The band names are the keys, and
     * The user’s ratings for the bands are the values.

Here’s the code:

In [None]:
users = {
    "Angelica": {
        "Blues Traveler": 3.5,
        "Broken Bells": 2.0,
        "Norah Jones": 4.5,
        "Phoenix": 5.0,
        "Slightly Stoopid": 1.5,
        "The Strokes": 2.5,
        "Vampire Weekend": 2.0
    },
    "Bill": {
        "Blues Traveler": 2.0,
        "Broken Bells": 3.5,
        "Deadmau5": 4.0,
        "Phoenix": 2.0,
        "Slightly Stoopid": 3.5,
        "Vampire Weekend": 3.0
    },
    "Chan": {
        "Blues Traveler": 5.0,
        "Broken Bells": 1.0,
        "Deadmau5": 1.0,
        "Norah Jones": 3.0,
        "Phoenix": 5.0,
        "Slightly Stoopid": 1.0
    },
    "Dan": {
        "Blues Traveler": 3.0,
        "Broken Bells": 4.0,
        "Deadmau5": 4.5,
        "Phoenix": 3.0,
        "Slightly Stoopid": 4.5,
        "The Strokes": 4.0,
        "Vampire Weekend": 2.0
    },
    "Hailey": {
        "Broken Bells": 4.0,
        "Deadmau5": 1.0,
        "Norah Jones": 4.0,
        "The Strokes": 4.0,
        "Vampire Weekend": 1.0
    },
    "Jordyn": {
        "Broken Bells": 4.5,
        "Deadmau5": 4.0,
        "Norah Jones": 5.0,
        "Phoenix": 5.0,
        "Slightly Stoopid": 4.5,
        "The Strokes": 4.0,
        "Vampire Weekend": 4.0
    },
    "Sam": {
        "Blues Traveler": 5.0,
        "Broken Bells": 2.0,
        "Norah Jones": 3.0,
        "Phoenix": 5.0,
        "Slightly Stoopid": 4.0,
        "The Strokes": 5.0
    },
    "Veronica": {
        "Blues Traveler": 3.0,
        "Norah Jones": 5.0,
        "Phoenix": 4.0,
        "Slightly Stoopid": 2.5,
        "The Strokes": 3.0
    },
    # GrungeBob isn't in the chart; I’ve added him to be a user
    # with nothing in common with any of the other users.
    "GrungeBob": {
        "Nirvana": 4.5,
        "Pearl Jam": 4.0,
        "Soundgarden": 5.0
    },
}
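To make the nested structure concrete, here is a two-user excerpt of the dictionary above (a subset of each user's ratings). Looking up a rating takes two keys: the user's name, then the band's name. `dict.get` is one way to handle a band a user hasn't rated:

```python
# A two-user excerpt with the same shape as the full `users` dictionary.
sample_users = {
    "Hailey": {"Broken Bells": 4.0, "Norah Jones": 4.0, "Vampire Weekend": 1.0},
    "Veronica": {"Blues Traveler": 3.0, "Norah Jones": 5.0, "Phoenix": 4.0},
}

# First key: the user's name. Second key: the band's name.
print(sample_users["Veronica"]["Norah Jones"])  # 5.0

# Hailey hasn't rated Phoenix, so indexing would raise KeyError; .get returns None instead.
print(sample_users["Hailey"].get("Phoenix"))  # None

# Iterating over a user's ratings yields the bands they've rated.
print(list(sample_users["Hailey"]))  # ['Broken Bells', 'Norah Jones', 'Vampire Weekend']
```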

Many recommendation engines work on these principles:

* The people who like a lot of the things that you like are the best people to recommend new things to you.
* The more similar two people’s ratings for the same things, the more similar they are.
* If you think of people’s ratings for things as points on a graph, the more similar two people are, the smaller the distance between their ratings points.

One of the simplest and least computationally expensive ways to calculate the distance between two points is the **Manhattan distance**: the sum of the absolute differences between their coordinates.

![Manhattan distance](./images/basics-manhattan_distance.jpg)
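Written out for ratings, let $x_b$ and $y_b$ be two users' ratings for band $b$, and let $B$ be the set of bands both users have rated. The Manhattan distance is then:

$$d(x, y) = \sum_{b \in B} \left| x_b - y_b \right|$$

For example, if one user rated two shared bands 4.5 and 5.0, and another rated the same bands 2.0 and 3.0:

$$d = |4.5 - 2.0| + |5.0 - 3.0| = 2.5 + 2.0 = 4.5$$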

Let’s define a function to calculate the Manhattan distance between two users’ ratings:

In [None]:
def manhattan(rating1, rating2):
    """Computes the Manhattan distance between two users' ratings, counting only
    the bands both users have rated. Both rating1 and rating2 are dictionaries
    of the form {'The Strokes': 3.0, 'Slightly Stoopid': 2.5, ...}."""
    distance = 0
    for key in rating1:
        if key in rating2:
            distance += abs(rating1[key] - rating2[key])
    return distance

Let’s calculate the Manhattan distance between Jordyn and Sam.

As a reminder, here are Jordyn’s ratings for bands:

* Broken Bells: 4.5
* Deadmau5: 4.0
* Norah Jones: 5.0
* Phoenix: 5.0
* Slightly Stoopid: 4.5
* The Strokes: 4.0
* Vampire Weekend: 4.0

And here are Sam’s:

* Blues Traveler: 5.0
* Broken Bells: 2.0
* Norah Jones: 3.0
* Phoenix: 5.0
* Slightly Stoopid: 4.0
* The Strokes: 5.0

In [None]:
print(f"Manhattan distance between Jordyn and Sam: {manhattan(users['Jordyn'], users['Sam'])}")
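To see where that number comes from, this sketch copies Jordyn's and Sam's ratings from the lists above and shows how much each shared band contributes to the total. The helper name `manhattan_breakdown` is hypothetical and isn't part of the original exercise:

```python
jordyn = {
    "Broken Bells": 4.5, "Deadmau5": 4.0, "Norah Jones": 5.0, "Phoenix": 5.0,
    "Slightly Stoopid": 4.5, "The Strokes": 4.0, "Vampire Weekend": 4.0,
}
sam = {
    "Blues Traveler": 5.0, "Broken Bells": 2.0, "Norah Jones": 3.0,
    "Phoenix": 5.0, "Slightly Stoopid": 4.0, "The Strokes": 5.0,
}

def manhattan_breakdown(rating1, rating2):
    """Return {band: absolute rating difference} for every band both users rated."""
    return {
        band: abs(rating1[band] - rating2[band])
        for band in rating1
        if band in rating2
    }

breakdown = manhattan_breakdown(jordyn, sam)
for band, difference in breakdown.items():
    print(f"{band}: {difference}")
print(f"Total: {sum(breakdown.values())}")  # Total: 6.0
```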

How about Chan and Dan?

Here are Chan’s known musical tastes:

* Blues Traveler: 5.0
* Broken Bells: 1.0
* Deadmau5: 1.0
* Norah Jones: 3.0
* Phoenix: 5.0
* Slightly Stoopid: 1.0

And here are Dan’s:

* Blues Traveler: 3.0
* Broken Bells: 4.0
* Deadmau5: 4.5
* Phoenix: 3.0
* Slightly Stoopid: 4.5
* The Strokes: 4.0
* Vampire Weekend: 2.0

In [None]:
print(f"Manhattan distance between Chan and Dan: {manhattan(users['Chan'], users['Dan'])}")

What happens if we compute the Manhattan distance between Dan and himself? 

In [None]:
print(f"Manhattan distance between Dan and himself: {manhattan(users['Dan'], users['Dan'])}")

That makes sense; two people with identical tastes should have zero taste distance between them.

What about the Manhattan distance between Dan and GrungeBob?

Remember, GrungeBob is stuck in 1994, and has no ratings in common with Dan.

In [None]:
print(f"Manhattan distance between Dan and GrungeBob: {manhattan(users['Dan'], users['GrungeBob'])}")

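That does *not* make sense. A distance of 0 claims that the two users have identical tastes, when in fact they have no ratings in common at all.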

![Thinking guy](./images/basics_thinking-guy.jpg)

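We need a way to tell “identical tastes” apart from “nothing in common.” Let’s refactor our function so that it returns `None` when two users haven’t rated any of the same bands: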

In [None]:
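def better_manhattan(all_users, user1_name, user2_name):
    """Computes the Manhattan distance between two users' ratings.
    Returns None if the two users have no rated bands in common."""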
    distance = 0
    common_keys_were_found = False
    user1_ratings = all_users[user1_name]
    user2_ratings = all_users[user2_name]

    for key in user1_ratings:
        if key in user2_ratings:
            distance += abs(user1_ratings[key] - user2_ratings[key])
            common_keys_were_found = True

    if common_keys_were_found:
        return distance
    else:
        return None
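A design note: the `common_keys_were_found` flag exists only to tell “no shared bands” apart from “zero distance.” Dictionary key views support set operations, so another way to express the same logic is to compute the shared bands first. This variant, `manhattan_or_none`, is a hypothetical alternative for illustration, not the exercise's code, and it uses a small sample in the same shape as `users`:

```python
def manhattan_or_none(all_users, user1_name, user2_name):
    """Manhattan distance between two users' ratings, or None if they share no bands."""
    user1_ratings = all_users[user1_name]
    user2_ratings = all_users[user2_name]

    # dict.keys() views support set operations; & gives the bands both users rated.
    common_bands = user1_ratings.keys() & user2_ratings.keys()
    if not common_bands:
        return None
    return sum(abs(user1_ratings[band] - user2_ratings[band]) for band in common_bands)


# A small sample in the same shape as `users` (a subset of each user's ratings).
sample_users = {
    "Dan": {"Blues Traveler": 3.0, "Broken Bells": 4.0, "Phoenix": 3.0},
    "Chan": {"Blues Traveler": 5.0, "Broken Bells": 1.0, "Phoenix": 5.0},
    "GrungeBob": {"Nirvana": 4.5, "Pearl Jam": 4.0, "Soundgarden": 5.0},
}

print(manhattan_or_none(sample_users, "Dan", "Chan"))       # 7.0
print(manhattan_or_none(sample_users, "Dan", "GrungeBob"))  # None
```

Either version works. The set-based one makes the “shared bands” idea explicit in a single line.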

In [None]:
print(f"Manhattan distance between Jordyn and Sam: {better_manhattan(users, 'Jordyn', 'Sam')}")

In [None]:
print(f"Manhattan distance between Jordyn and Jordyn: {better_manhattan(users, 'Jordyn', 'Jordyn')}")

In [None]:
print(f"Manhattan distance between Jordyn and GrungeBob: {better_manhattan(users, 'Jordyn', 'GrungeBob')}")

In [None]:
for user_name in users:
    distance = better_manhattan(users, user_name, 'Hailey')
    if distance is None:
        print(f"No bands in common between Hailey and {user_name}.")
    else:
        print(f"Manhattan distance between Hailey and {user_name}: {distance}.")
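The principles listed earlier suggest the next step: the user whose ratings are closest to yours is the best source of recommendations. Here is a minimal sketch of finding that nearest neighbor. It assumes a hypothetical `nearest_neighbor` helper and a trimmed copy of the data (four users). To keep the block runnable on its own, it repeats `better_manhattan` in compact form. It skips the user themself and anyone with no bands in common:

```python
def better_manhattan(all_users, user1_name, user2_name):
    """Same logic as the refactored function above: None means no shared bands."""
    user1_ratings = all_users[user1_name]
    user2_ratings = all_users[user2_name]
    distance = 0
    common_keys_were_found = False
    for key in user1_ratings:
        if key in user2_ratings:
            distance += abs(user1_ratings[key] - user2_ratings[key])
            common_keys_were_found = True
    return distance if common_keys_were_found else None


def nearest_neighbor(all_users, user_name):
    """Return (distance, name) for the closest other user, or None if nobody shares a band."""
    candidates = []
    for other_name in all_users:
        if other_name == user_name:
            continue  # Everyone is at distance 0 from themself.
        distance = better_manhattan(all_users, user_name, other_name)
        if distance is not None:  # Skip users with nothing in common.
            candidates.append((distance, other_name))
    # min() compares tuples by distance first; ties fall back to alphabetical name order.
    return min(candidates) if candidates else None


# Ratings copied from the full `users` dictionary for four of the users.
sample_users = {
    "Hailey": {"Broken Bells": 4.0, "Deadmau5": 1.0, "Norah Jones": 4.0,
               "The Strokes": 4.0, "Vampire Weekend": 1.0},
    "Veronica": {"Blues Traveler": 3.0, "Norah Jones": 5.0, "Phoenix": 4.0,
                 "Slightly Stoopid": 2.5, "The Strokes": 3.0},
    "Jordyn": {"Broken Bells": 4.5, "Deadmau5": 4.0, "Norah Jones": 5.0,
               "Phoenix": 5.0, "Slightly Stoopid": 4.5, "The Strokes": 4.0,
               "Vampire Weekend": 4.0},
    "GrungeBob": {"Nirvana": 4.5, "Pearl Jam": 4.0, "Soundgarden": 5.0},
}

print(nearest_neighbor(sample_users, "Hailey"))     # (2.0, 'Veronica')
print(nearest_neighbor(sample_users, "GrungeBob"))  # None
```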

## Sources
* The recommendation engine exercise was taken from [A Programmer's Guide to Data Mining](http://guidetodatamining.com/). It’s a great (and free) intro to data mining in Python.