# Up and running from spreadsheets to Python

## Hello, Jupyter

This is the interface that we will use to execute `.ipynb`, or IPython notebook files.

Notebooks are divided into cells which can be either text or code, among other things.

Go ahead and click into this cell. What happens?

(From a new cell) You are now seeing the *raw Markdown* source of the cell above.

You can close out of it by **running the cell.** `Ctrl + Enter` is the keyboard shortcut.

Markdown allows us to style text using plain-text format. 

There's [a lot you can do with Markdown](https://www.markdownguide.org/cheat-sheet). Some basics:

# Big Header 1
## Smaller Header 2
### Even smaller headers
#### Still more

*Using one asterisk renders italics*

**Using two asterisks renders bold**

It's worth studying up on Markdown to write elegant text in your notebooks. 

But in this class we'll focus on the *code* block, because that's where executable code goes!


## Python as a fancy calculator

We can use Python as a highfalutin calculator, just as you might do with Excel.

Enter 

In [1]:
# This is a code block. 
# You can execute code here.

# Python can be used as a fancy calculator.

1+1

2

## Cell comments

What's the deal with the hashtags and text in the above cell?

Those are cell comments used to give us verbal instructions and reminders about our code. This helps other users -- and ourselves -- remember what we are doing with it.

![Gandalf coding meme](images/gandalf.jpg)

And yes, you can embed images into notebooks 😎.

Try writing comments in the cell below.


In [None]:
# Python follows the order of operations, just like spreadsheets.
# Note: exponents are written with ** in Python, not ^ as in Excel.

2+3*4/(5+3)*15/2**2+3*4**2
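
One place where Python departs from spreadsheet habits is exponents. As a small sketch (the variable names are made up for illustration), `**` raises a number to a power, while `^` is Python's bitwise XOR operator and does not behave the way it does in Excel:

```python
# In Python, ** is the exponent operator.
squared = 4 ** 2        # 16

# ^ is bitwise XOR, not "to the power of".
not_squared = 4 ^ 2     # 6

# The order of operations still applies: exponents first,
# then multiplication and division, then addition.
result = 2 + 3 * 4 / (5 + 3) * 15 / 2 ** 2 + 3 * 4 ** 2

print(squared)
print(not_squared)
print(result)
```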

In [2]:
# We can also call functions:
# Let's find the absolute value of -100

abs(-100)


100

In [7]:
# Neither of these will find the absolute value!
# Python stops at the first line that fails.

ABS(-100)
Abs(-100)

# Moral of the story:
### Python is case-sensitive. ###

NameError: name 'ABS' is not defined

## Assigning variables

Calling functions like `abs(100)` can be useful, but things get *really* interesting in Python when we assign the results of operations to variables.

Let's go ahead and assign the absolute value of -100 to a variable, `my_first_variable`.

In [8]:
my_first_variable = abs(-100)

The result of `abs(-100)` has been stored in a *variable*, which will make it much easier for us to refer to and use it. 

### Printing variables

To see the result of that variable, we can *print* it using the `print()` function: 

In [9]:
print(my_first_variable)

100


What do you think the result of the below will be?

In [None]:
print(MY_FIRST_VARIABLE)

## Python variable naming conventions

> There are only two hard things in Computer Science: cache invalidation and naming things. --Phil Karlton


There are some rules in naming Python variables:

- They must start with a letter or underscore.
- The rest of your variable can only contain letters, numbers or underscores.

Based on these rules, which of the following is an invalid variable name?

A. `My_string_`  
B. `string_1`  
C. `razzle.dazzle`  
D. `_`  

In [16]:
# Try assigning and printing these variables if you're not sure!
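
If you'd rather check a name without assigning anything, here is a small sketch using the built-in string method `.isidentifier()`. The candidate names are made-up examples, not the quiz options above:

```python
candidates = ["total_sales", "_tmp", "2nd_place", "first-name"]

valid_names = []

# .isidentifier() returns True when a string follows the naming rules.
for name in candidates:
    print(name, name.isidentifier())
    if name.isidentifier():
        valid_names.append(name)

print(valid_names)
```

One caveat: reserved words such as `class` pass this check but still can't be used as variable names.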

## Variable types

You can think about a variable as a box that we are putting a piece of information into. 

![variables shoebox](images/variables-shoebox.png)

Variables can be of different types, like different categories and dimensions of boxes. 

In [22]:
my_int = 2
my_float = 2.222
my_string = 'Hello'
my_boolean = True

print(type(my_int))
print(type(my_float))
print(type(my_string))
print(type(my_boolean))

<class 'int'>
<class 'float'>
<class 'str'>
<class 'bool'>


We can call functions directly on these variables.

In fact, that's what we were doing with `print()` and `type()` all along!

In [25]:
# Absolute value of my_int

abs(my_int)

2

In [26]:
# Length of my_string

len(my_string)

5

In [27]:
# Assign the product to a variable

my_nonsense = abs(my_int) * len(my_string)
print(my_nonsense)

10


# DRILLS

1. Assign the sum of -10 and 2 to `a`.
2. Assign the absolute value of `a` to `b`.
3. Assign `b` minus 1 as `d`.
4. Print the result of `d`. What is the value? What type is this variable?


# From spreadsheet ranges to Python lists

Generally in spreadsheets we want to operate on multiple cells at a time and the same is true in Python. 

In [1]:
# You know how to assign the number 1 to a variable... do it now!

What about the numbers 1, 2 and 3? Do we have to assign each to its own variable?

Thank heavens not! We can use a *collection* variable type to assign all of them at once. Let's look at a common collection data type, a list.

## Lists

Lists are denoted with brackets `[]`.

In [None]:
# Make a list

my_first_list = [1,2,3]
print(my_first_list)
type(my_first_list)

Notice that the type isn't `int` but `list`. This is its own type of variable!

Lists can contain all sorts of individual data types inside of them.

![List shoebox](images/list-shoebox.png)

In [None]:
my_other_list = [1,2,3,"Boo!"]
print(my_other_list)
print(type(my_other_list))

They can even contain *other lists*!

In [None]:
my_list_here = [1,2,3,[1,2,3,"Boo!"]]
print(my_list_here)
print(type(my_list_here))

We can find the number of *elements* in a list using the `len()` function. Any list inside a list is considered one element.

In [None]:
len(my_list_here)

# DRILL

1. Create a list containing the values `North`, `East`, `South` and `West`.  
2. What is the result of the below?

```
len(['Monday','Tuesday','Wednesday','Thursday','Friday',['Saturday','Sunday']])
```

# Modifying lists

There are several ways you might want to manipulate a list. Let's look at a couple of common ones.

## Sorting lists 

You can do this using the `.sort()` method. A method is similar to a function, but we call it by attaching it to our variable with a dot, as in `my_list.sort()`.

The method will operate directly on our variable, so we do not have to assign the results to another variable.

In [None]:
my_list = [1,4,3,2]

my_list.sort()

print(my_list)
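
To make the "operates directly on our variable" point concrete, here is a minimal sketch contrasting the `.sort()` method with the built-in `sorted()` function. The list name `scores` is just an illustration:

```python
scores = [3, 1, 2]

# sorted() is a function: it returns a new list and leaves the original alone.
scores_sorted = sorted(scores)
print(scores)          # [3, 1, 2]
print(scores_sorted)   # [1, 2, 3]

# .sort() is a method: it changes the list in place and returns None.
returned = scores.sort()
print(scores)          # [1, 2, 3]
print(returned)        # None
```

This is why assigning the result of `.sort()` to a variable leaves you with `None` rather than a sorted list.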

## Appending to lists

We can add elements to our list using the `.append()` method.

In [None]:
my_list.append(0)

print(my_list)


# Let's re-sort our list!
my_list.sort()

print(my_list)

For other list methods, [check out this article](https://www.w3schools.com/python/python_ref_list.asp).
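
One of those other methods is worth a quick look next to `.append()`. The sketch below, using made-up lists, shows that `.append()` adds its argument as a single element (even when that argument is a list), while `.extend()` adds each element individually:

```python
letters = ["a", "b"]

# .append() adds its argument as ONE element, even if it is a list.
letters.append(["c", "d"])
print(letters)       # ['a', 'b', ['c', 'd']]
print(len(letters))  # 3

more_letters = ["a", "b"]

# .extend() adds each element of the other list individually.
more_letters.extend(["c", "d"])
print(more_letters)       # ['a', 'b', 'c', 'd']
print(len(more_letters))  # 4
```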

# DRILL

1. What do you expect to be the result of the following? Run the code and see how you did.

```
my_week = (['Monday','Tuesday','Wednesday','Thursday','Friday','Saturday','Sunday'])
my_week.sort()
print(my_week)
```

2. Call the `.clear()` method on `my_week` from above. What happens?

# Lists and Python indexing

Have you ever accidentally downloaded the same file multiple times and seen something like this?

![Computer downloads are an example of zero-based indexing](zero-based-index.png)

The first time you downloaded it, there was no number given. But after that, the file was suffixed with the numbers 1, 2, 3, and so on.

This is an everyday example of *zero-based indexing*. 

We tend to count things from 1... but Python counts from *zero*. 

In [None]:
my_list = [7,12,5,10,9]

We would like to pull out the third element of this list.

We can do so using this notation:

```
list[position]
```
So let's try it:

In [None]:
# Get the third element from my list... right?
my_list[3]

### Wrong!

This gets us the *fourth* element...

...so what gives?

This is zero-based indexing at work. What we think of as the third element sits, as far as Python is concerned, in position `2`:


| `0` | `1` | `2` | `3` | `4` |
| --- | --- | --- | --- | --- |
| 7   | 12  | 5   | 10  | 9   |

Let's try again:


In [None]:
my_list[2]

Nice work!

![Kip meme](images/kip-yes.gif)
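
Here is a short sketch of what zero-based indexing implies at both ends of a list. The list `colors` is a made-up example, and `try`/`except` is used only to show the error without stopping the cell:

```python
colors = ["red", "green", "blue"]

# The first element lives at position 0.
first = colors[0]

# The last position is always one less than the length.
last = colors[len(colors) - 1]

print(first)
print(last)

# Asking for position 3 in a three-element list raises an IndexError.
try:
    colors[3]
except IndexError as error:
    message = str(error)
    print("IndexError:", message)
```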

### Negative indexing

It's also worth noting that you can index starting from the *end* of the list.

The last element will be in position `-1`.

| `0`<br>`-5` | `1`<br>`-4` | `2`<br>`-3` | `3`<br>`-2` | `4`<br>`-1` |
| ----------- | ----------- | ----------- | ----------- | ----------- |
| 7           | 12          | 5           | 10          | 9           |
  

Give it a try!

In [14]:
my_new_list = [6,10,3,9,1]

# Find the next-to-last element in the list 
# using a negative index 

my_new_list[-2]

9
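
As a quick sketch of how the two numbering schemes relate (the list `temps` is invented for illustration), a negative index `-k` points at the same element as position `len(list) - k`:

```python
temps = [61, 64, 70, 68]

# Counting back from the end: -1 is the last element, -2 the one before it.
last_temp = temps[-1]
next_to_last = temps[-2]

# A negative index -k points at the same element as len(list) - k.
same_element = temps[len(temps) - 2]

print(last_temp, next_to_last, same_element)
```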

## Slicing a list

What if we wanted to index multiple elements of a list at once?

This is called *slicing* and ... of course, it's got a catch!

The basic notation for slicing a list is

`list[starting_element:ending_element]`
 
However, the result is *exclusive* of the ending element. 🙈

Let's take an example.

In [22]:
my_list = [7,12,5,10,9]

# This gives me the 
# first through second elements... right?

my_list[0:1]

[7]

### Wrong!

The ending element is not included in the final results. You get everything *up until* that element.

Weird, right?

![Head scratch](images/confused.gif)

Let's see this in action a couple more times.

In [23]:
my_list = [7,12,5,10,9]

# First through second elements
print(my_list[0:2])

# Third through fifth elements
print(my_list[2:5])

# Fourth-last through second-last elements
print(my_list[-4:-1])

[7, 12]
[5, 10, 9]
[12, 5, 10]
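
One way to make peace with the exclusive ending position: the number of elements you get back is simply the end minus the start. A minimal sketch with a made-up list:

```python
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]

start = 1
stop = 4

# Includes positions 1, 2 and 3; position 4 is left out.
quarter = months[start:stop]
print(quarter)

# The number of elements returned equals stop - start.
print(len(quarter) == stop - start)
```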


## Drill

Practice some more slicing below:

In [None]:
my_list = [7,12,5,10,9]

# Get the first through third elements


# Get the third-last to second-last elements


# Get the second through last elements


## Slicing to/from first/last elements

If we leave part of our slice blank, Python will index *all* the remaining elements in the list:

In [3]:
my_list = [7,12,5,10,9]
# Print the second through the end element
print(my_list[1:])

my_big_list = [1,3,2,5,3,1,8,3,11,4]
# Works the same here
print(my_big_list[1:])

[12, 5, 10, 9]
[3, 2, 5, 3, 1, 8, 3, 11, 4]


Likewise, we can get everything from the *beginning* of the list to a certain element by leaving the first part of our slice blank:

In [24]:
# Get everything but the last element
my_list = [7,12,5,10,9]
print(my_list[:-1])

my_big_list = [1,3,2,5,3,1,8,3,11,4]
# Get everything up through the fourth element
print(my_big_list[:4])

# Yes, this would print the whole list 😎
my_big_list = [1,3,2,5,3,1,8,3,11,4]
print(my_big_list[:])


[7, 12, 5, 10]
[1, 3, 2, 5]
[1, 3, 2, 5, 3, 1, 8, 3, 11, 4]
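
The blank-start and blank-end forms fit together neatly. In this sketch (the list `queue` is invented for illustration), slicing at the same position from both sides splits a list into two pieces with nothing repeated or lost:

```python
queue = ["Ana", "Ben", "Cy", "Dee", "Eli"]

# Split the list at position 2.
front = queue[:2]   # positions 0 and 1
back = queue[2:]    # position 2 through the end

print(front)
print(back)

# Because the end of a slice is exclusive, the two pieces rebuild the original.
print(front + back == queue)
```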


## DRILL

Practice slicing lists below

In [28]:
this_list = ["Slicing","works","on","lists","of","strings","identically"]

# Get the third to final elements
print(this_list[2:])


# Get everything up through the fourth element
print(this_list[:4])


# Get everything starting with the second-last element
print(this_list[-2:])

['on', 'lists', 'of', 'strings', 'identically']
['Slicing', 'works', 'on', 'lists']
['strings', 'identically']


### Variable management

We've defined quite a few variables in this notebook. 

To see a list of them all, use the command

```
%who
```

In [29]:
%who

my_big_list	 my_list	 my_new_list	 os	 sys	 this_list	 


We should be aware of the variables that we create as they take memory and can bloat our environment.

If we aren't using a variable anymore, it's not a bad idea to delete it:

In [None]:
# Remove the my_big_list variable

del my_big_list

# my_big_list has left the building!
print(my_big_list)
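
Printing a deleted variable raises the same `NameError` we saw earlier with misspelled names. The sketch below uses a throwaway variable called `scratch`, with `try`/`except` only so the cell keeps running after the error:

```python
scratch = [1, 2, 3]
del scratch

# Using a deleted (or never-assigned) name raises a NameError.
try:
    print(scratch)
except NameError as error:
    error_message = str(error)
    print("NameError:", error_message)
```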

We can remove *all* assigned variables by restarting the kernel. The kernel is how our notebook communicates with the Python programming language.

When you're having coding difficulties, restarting the kernel is a good first step.

![Restarting the kernel](images/restart-kernel.gif)

Go ahead and restart the kernel in your notebook. But remember, *this will wipe any variables you created in your environment!*

## Lists and data analysis

Lists are a foundational variable type in Python. It's worth getting comfortable with them and, in turn, with concepts like *methods* and *indexing*.

All that said, lists do not easily handle many common data analysis tasks. Take doubling what we would call a "range" of cells, something we do in spreadsheets all the time:

import ihtml
%%ihtml

<center><iframe width="1000" height="500" frameborder="0" scrolling="no" src="https://onedrive.live.com/embed?resid=57D2AB2A84D54C81%21997&authkey=%21AGdlGfKL9x3bed4&em=2&wdAllowInteractivity=False&AllowTyping=True&wdDownloadButton=True&wdInConfigurator=True"></iframe></center>

This is not easily done with a list, even for the first range:

In [None]:
my_list = [1,9,5,3,8]
my_list * 2
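
To spell out what happens, here is a sketch with a made-up list called `cells`. Multiplying a list repeats it, and doubling each value with plain Python takes a *list comprehension*, a construct this notebook hasn't covered and which is shown here only for contrast:

```python
cells = [1, 9, 5, 3, 8]

# Multiplying a list by 2 repeats it rather than doubling each value.
repeated = cells * 2
print(repeated)

# A list comprehension applies the arithmetic to each element in turn.
doubled = [value * 2 for value in cells]
print(doubled)
```

It works, but it's wordier than dragging a formula down a column.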

For easier data analysis, we will make use of some external packages and modules. 

But before we do that, let's take some time to learn about ... packages and modules.

# Python modules

Python does not come as an analytics powerhouse out of the box. We need to load and install a few *modules*.

## The [Python standard library](https://docs.python.org/3/library/index.html)

Python does not have a built-in function for taking a square root, but it does come with a Python `math` module.

A *module* is a bundle of code. The `math` module comes standard with Python, but we need to call it into our session. 

We can do this with the `import` statement.

In [None]:
# Import the math module from the Python standard library

import math

We now have access to the `sqrt()` function, but when we use it, we need to tell Python *where* we got it from. We will do that by prefixing `sqrt()` with `math`:

In [None]:
# Take the square root of 100 
# by using the math.sqrt() function:

math.sqrt(100)
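
A short sketch of what the prefix buys us, using only the standard `math` module. The `try`/`except` is there just to show the error without stopping the cell:

```python
import math

# math.sqrt() always returns a float, even for perfect squares.
root = math.sqrt(100)
print(root)         # 10.0
print(type(root))   # <class 'float'>

# The module also bundles constants, accessed with the same prefix.
print(math.pi)

# Without the prefix, Python doesn't know where sqrt() lives.
try:
    sqrt(100)
except NameError as error:
    print("NameError:", error)
```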

## Drill

The `factorial()` function from `math` will take the factorial of a number `X`.

Find the factorial of 10 using this function.

# Installing modules

Python comes with an [impressive number of modules in the standard library](https://docs.python.org/3/library/index.html), but the real power comes from installing "aftermarket" modules developed by the community.

Many of these modules are shared through the [Python Package Index](https://pypi.org).  A package is a way of bundling modules.

Anyone is free to install and use these packages as they please. It's easy to install them using the `pip` package installer.

From a notebook, we can install a package with the command `!pip install [package name]`.

In [None]:
# Install a package called "pandas"

!pip install pandas

You will use packages all the time, and if you ever have an issue with one, a good place to start (after restarting the kernel!) is checking whether you have it installed, and what version.

You can see all packages you've installed with pip, along with their versions, using `pip freeze`.

In [30]:
pip freeze

alabaster==0.7.12
anaconda-client==1.7.2
anaconda-navigator==1.9.7
anaconda-project==0.8.3
asn1crypto==1.0.1
astroid==2.3.1
astropy==3.2.1
atomicwrites==1.3.0
attrs==19.2.0
Babel==2.7.0
backcall==0.1.0
backports.functools-lru-cache==1.5
backports.os==0.1.1
backports.shutil-get-terminal-size==1.0.0
backports.tempfile==1.0
backports.weakref==1.0.post1
beautifulsoup4==4.8.0
bitarray==1.0.1
bkcharts==0.2
bleach==3.1.0
bokeh==1.3.4
boto==2.49.0
Bottleneck==1.2.1
cachetools==4.0.0
certifi==2019.9.11
cffi==1.12.3
chardet==3.0.4
Click==7.0
cloudpickle==1.2.2
clyent==1.2.2
colorama==0.4.1
comtypes==1.1.7
conda==4.8.1
conda-build==3.18.9
conda-package-handling==1.6.0
conda-verify==3.4.2
contextlib2==0.6.0
cryptography==2.7
cycler==0.10.0
Cython==0.29.13
cytoolz==0.10.0
dask==2.5.2
decorator==4.4.0
defusedxml==0.6.0
distributed==2.5.2
docutils==0.15.2
entrypoints==0.3
et-xmlfile==1.0.1
fastcache==1.1.0
filelock==3.0.12
Flask==1.1.1
fsspec==0.5.2
future==0.17.1
gevent==1.4.0
glob2==0.7
google-api-
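
When you only care about one package, scrolling through the full list is overkill. As a hedged sketch, assuming Python 3.8 or later (where the standard library includes `importlib.metadata`), a small helper can look up a single version. The function name `installed_version` is hypothetical, not something pip provides:

```python
from importlib import metadata

def installed_version(package_name):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None

# Prints a version string if pandas is installed, otherwise None.
print(installed_version("pandas"))

# A name that isn't installed comes back as None.
print(installed_version("a-package-that-does-not-exist"))
```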

# Drill 

Install the `seaborn` package.