# Lecture 4 Python Basics
__Math 3080: Fundamentals of Data Science__

Reading:
* [McKinney, *Python for Data Analysis*, Chapters 2-4](https://wesmckinney.com/book/python-basics)

Class notes are found through GitHub. As changes are made, they will automatically be uploaded to GitHub. A link to the repository is on Canvas.

## Installing Python
__MacOS__:
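* macOS is Unix-based, and older versions shipped with Python preinstalled. On recent versions you may need to install Python 3 yourself (for example through the Xcode command line tools or Anaconda). Open a terminal and run `python3` (or `python`) to check.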

__Windows__:
1. Google CoLab
2. Anaconda or Miniconda
3. Install Python on WSL (Windows Subsystem for Linux)
4. Install Python directly

For now, I would recommend *Google CoLab*. If you want it locally, *Anaconda* is the easiest to work with. It does install a lot of extra (non-essential) packages, but this will work quite well until we get the hang of using Python. You can always learn how to install Python directly on your computer and/or WSL later (I'm happy to show you if you would like).

In addition to having Python on your computer, you will need a programming environment, or an Integrated Development Environment (IDE).
* Jupyter (or Python Notebooks)
  * Simple environment which separates Python into blocks of code - only need to run one block at a time instead of the entire program
  * Comes pre-installed in Anaconda
  * Google CoLab uses an environment very similar to Jupyter
* Microsoft Visual Studio Code (VS Code)
  * Once you have a kernel set up, VS Code can recognize it
  * Reads Jupyter .ipynb files
* Spyder
  * Nice layout for code, data displays, and figures all in one window
  * Readily available in *Anaconda*

### Python from the Command Line
From the command line, run `python`.

```python
1+4
print('Hello World!')
print('I am',30 + 5,'years old!')
print('I am {0} {1} old'.format(30+5,'years'))
```

To exit, type `exit()`

### Python from a File
Alternatively, save code in a file. For example, save the following in the file *Hello.py*:
```python
print('Hello World! My name is Michael.')
```

Then from the command line, type `python Hello.py`.

### IPython
An alternative to the standard `python` prompt is IPython, an enhanced interactive shell. Its advantages include:

#### Tab-Completion
Press `Tab` to complete a command, filename, variable, etc.

#### Magic Functions
Certain shell commands can be run from within IPython by prefixing them with `%`. For example, within IPython, type
```python
%pwd
%mkdir tmp
%cd tmp
```

They also make coding easier. For example, if you want to copy-paste a block of code, try:
```python
%paste
%cpaste
```

#### Introspection
If you want to know more about a function, type a `?` after the command:
```python
print?
```
This also works for variables and other objects within Python. Any information available within your Python installation will be displayed.

If you use `??`, the source code will also be displayed when it is available.

In [23]:
import seaborn as sns
sns.scatterplot?

Signature:
sns.scatterplot(
    data=None,
    *,
    x=None,
    y=None,
    hue=None,
    size=None,
    style=None,
    palette=None,
    hue_order=None,
    hue_norm=None,
    sizes=None,
    size_order=None,

## Jupyter Notebook / Lab
Jupyter Notebook runs just like IPython; it grew out of the IPython Notebook project, so the two were originally one and the same. In Jupyter Notebook, your code is separated into cells. This is a happy balance between command-line and file-stored programs. You have other advantages:

### HTML based
Output is HTML based, so it is much cleaner than text-based command-line output.

### Inline images
When we talk about images, you will be able to see your images within Notebook. You would not be able to see these images in the command-line.

### Inline results
In a Jupyter Notebook, you can run blocks of code individually from the rest of the code. You can even see the output from that block independently of the rest of the code.
* In python, ipython, or Jupyter, type `print(variable)` to see the value of your variable
* In Jupyter, you can just type `variable` to output its value

### More on Introspection
Just like IPython, you can use the introspection commands (`?` and `??`) to learn more about a function. But if you want that information without leaving your work, press `Shift-Tab` with the cursor on the function name. This produces a pop-up window with the same information as `?`, which allows quick reference while typing your program.
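
In a plain Python script or prompt, where `?` and `Shift-Tab` are not available, the standard `inspect` module gives similar information, and the built-in `help(add_one)` prints a comparable summary. The sketch below is only an illustration; the function `add_one` is a hypothetical stand-in.

```python
import inspect


def add_one(x):
    """Return x plus one."""
    return x + 1


# Roughly the same information that `add_one?` reports in IPython
signature = inspect.signature(add_one)   # the argument list
docstring = inspect.getdoc(add_one)      # the cleaned-up docstring

print(f"Signature: add_one{signature}")
print(f"Docstring: {docstring}")
```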

### Markdown
Some cells can be turned into __Markdown__ cells. Markdown is a lightweight markup language that is rendered as HTML, providing an easy way to annotate your work.

### Lab
Jupyter Lab is based on Notebook, but provides a setup with the following in one window:
* File Directory
* Tabs for all open files
  * Tabs can be tiled
* Console (Like command-line input, but with HTML output)
* Terminal (for file browsing purposes, or for executing file-stored python programs...)
* Markdown files

### Closing Jupyter Notebook or Jupyter Lab
When you started Notebook/Lab, you saw that a terminal window opened. This window logs the activity of the Jupyter server. As long as Jupyter is running, it is using RAM and processing power.

To close Notebook or Lab,
1. Shutdown all running Kernels
2. Quit from Notebook/Lab in the File Menu
3. Press `Ctrl-C` twice in the terminal window

You could do only step (3), but an abrupt shutdown risks losing unsaved work and may leave kernel processes running in the background, still holding memory and CPU time. It is best to close properly every time.

* In VSCode, closing the tab/window will do everything as it closes

## Interrupting
If you have runaway code (e.g., an infinite loop), you can:
1. use `Ctrl-C` to interrupt the command (command line, IPython, or a Jupyter console)
2. in Jupyter Notebook/Lab, go to the "Kernel" menu and select "Interrupt"

## Modules / Packages
Python comes with a lot of tools, but many more have been created that we can attach to Python. These tools are assembled in __packages__ (often loosely called __modules__). Some packages are already installed. Others we have to install by hand:
* Open a terminal window
* Run `pip install numpy`, replacing *numpy* with the name of the package you want to install

We can attach these packages using the `import` command. Then we access the tools (functions) by calling *package.function*:

In [24]:
import math
math.sin(math.pi/6)

0.49999999999999994

Common packages that we use:
* math
* NumPy
* pandas
* Matplotlib
* Seaborn
* SciPy
* ...many others...

We can also give nicknames to the packages to make them easier to access:

In [25]:
import numpy as np
np.array([i for i in range(10)])

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

We can also import specific functions or constants from a package. When you do this, there is no need to write the package name any more; just use the imported name itself.

In [26]:
from math import pi
2*pi

6.283185307179586

__What is a module?__

To best answer this, we also need to define __classes__, __objects__, and __methods__. Python can work as an object-oriented language, which relies on the following terms:
* A __class__ is a type. It defines what kind of data an object holds and what can be done with it, which differs from class to class.
* An __object__ is a specific instance of a class.
* A __module__ is a file of Python code (a package is a collection of modules) that provides tools such as functions and classes for you to import.
* A __method__ is a function that belongs to a class and is called on an object with dot notation (`object.method()`), which is why it can be mistaken for an ordinary function.

Here are two examples of the differences between classes, objects, and modules:

|     | $\sqrt{16}$ vs. "sixteen" | Building a House |
| ---: | :---: | :---: |
| __Class__ | The number 16 is an integer, and the integer class determines which operations apply to it: $\sqrt{16} = 4$. Now try to apply the square root to the word, $\sqrt{\text{sixteen}}$. You can't take the square root of a word, because a string is a different class. | For the house you are building, the __class__ is the plan for what you're building: a house. Defining this determines the tools you will need to build the house. (For example, you wouldn't bring in a 10-story crane!) |
| __Object__ | 16 is an integer. In other words, the number 16 is an object of the class integer and has the properties of an integer. | The physical house is the object. It is the physical construction of the *house* class. |
| __Module__ | The square root function is part of the math package, or *module*, and can be applied to 16 (the *object*) because it is an integer (the *class*). | The set of tools (*module*) that you can use to build your house (*object*) in the way the plans (*class*) instruct. |
| __Method__ | A function attached to a class and called on an object with dot notation. For example, an array `y` has a `y.max()` method. The square root, by contrast, is a plain function housed in the math module, not a method of integers. | A hammer is a tool in the toolset, which can be applied to build the house. |

* A function has an input and produces an output
* A method is a tool used to work on a given object

Not every function is also available as a method. `sqrt` is a function that can be applied to integers, but integers have no `.sqrt()` method.

In [27]:
import numpy as np       # numpy is a module
y = np.array([1,4,9,16]) # y is an object of class array
max(y)                   # max() is a function

np.int64(16)

In [28]:
y.max()                  # .max() is a method, called on the object y of class array

np.int64(16)
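
The house analogy can also be sketched directly in code. The class `House` and its method `add_room` below are hypothetical names invented for this illustration; only `math.sqrt` comes from Python itself.

```python
import math


class House:
    """The class: a plan describing what every house stores and can do."""

    def __init__(self, rooms):
        self.rooms = rooms            # data stored on each individual object

    def add_room(self):
        """A method: a function attached to the class, called on an object."""
        self.rooms += 1
        return self.rooms


my_house = House(rooms=3)             # an object: one specific House
my_house.add_room()                   # method call; my_house.rooms is now 4

root = math.sqrt(16)                  # a function from the math module

print(type(my_house).__name__, my_house.rooms, root)
```

Note the difference in how they are called: the method is reached through the object (`my_house.add_room()`), while the function receives its input as an argument (`math.sqrt(16)`).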

## Basic Python Programming

### Tips for Programming
* Keep code clean and organized
  * Use Whitespace
* Use comments

In [29]:
x = 4
# A comment begins with a '#' symbol
print(x)

4


## Basic Math

In [30]:
print(5 + 2)  # Addition
print(5 - 2)  # Subtraction
print(5 * 2)  # Multiplication
print(5 / 2)  # Division
print(5 // 2) # Floor Division
print(5 % 2)  # Modulus
print(5 ** 2) # Exponentiation

7
3
10
2.5
2
1
25


In [31]:
# In Notebooks, you can just type the expression to see its value
5 + 2 # Addition

7

## Variables
### Integer and Float

In [32]:
# Integers

A = 5
B = 2
print(A / B) # Regular division produces a float
print(A // B) # Floor division produces an integer

2.5
2
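
To confirm which type each operation produces, the built-in `type()` can be used. The short sketch below also checks how floor division and modulus fit together: the quotient times the divisor, plus the remainder, gives back the original number.

```python
A = 5
B = 2

quotient = A // B      # floor division
remainder = A % B      # modulus

print(type(A / B).__name__)          # float
print(type(quotient).__name__)       # int
print(quotient * B + remainder == A) # True
print(divmod(A, B))                  # (2, 1): quotient and remainder together
```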


### Input

In [1]:
x = input('Enter your name: ')
print('Hello, ' + x)

Hello, Michael
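
One detail worth knowing: `input()` always returns a string, so numeric input must be converted before doing math with it. In the sketch below, the function `years_until_100` and the variable `age_text` are hypothetical; `age_text` stands in for what `input()` would return so the example runs without a keyboard prompt.

```python
def years_until_100(age_text):
    """Convert text (as returned by input()) to an integer, then subtract."""
    age = int(age_text.strip())   # int() raises ValueError for non-numeric text
    return 100 - age


# In an interactive session you would write: age_text = input('Enter your age: ')
age_text = " 35 "                 # stand-in for what a user might type
remaining = years_until_100(age_text)
print(f"You will turn 100 in {remaining} years.")
```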


## Print
* Basic print command
* Format command
  * Using `.format()`
  * Using formatted strings `f"String with {var}"`

In [33]:
# Basic Print Command
A = 2
B = 7

print(A, "/", B, " = ", A/B)

2 / 7  =  0.2857142857142857


In [34]:
# Print using format method
A = 2
B = 7
print("{0} / {1} = {2}".format(A,B,A/B))
print("{0:0.1f} / {1:.2f} = {2:.3f}".format(A,B,A/B))

2 / 7 = 0.2857142857142857
2.0 / 7.00 = 0.286


In [35]:
# Print using f-strings (Python 3.6+)
A = 2
B = 7
print(f"{A:.1f} / {B:.2f} = {A/B:.3f}")

2.0 / 7.00 = 0.286
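
The part after the colon inside the braces is a format specification. A few more common specifications are sketched below; the variable `value` is just an example number.

```python
value = 1234.5678

two_decimals = f"{value:.2f}"     # fixed-point, 2 decimal places
padded = f"{value:10.1f}"         # at least 10 characters wide, right-aligned
with_commas = f"{value:,.0f}"     # thousands separator, no decimals
percent = f"{0.286:.1%}"          # multiply by 100 and add a percent sign

print(two_decimals)   # 1234.57
print(padded)         #     1234.6
print(with_commas)    # 1,235
print(percent)        # 28.6%
```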


## Tuples and Lists
* Try to manipulate a tuple and a list
* Append, insert, remove, pop from a list
* Indices
* Slicing
  * Negative slicing
* Splitting

In [36]:
### Tuple ###
tuple_A = (5,2)
tuple_A

# Can't edit components of a tuple - these values are set

(5, 2)

In [37]:
tuple_A[0] = 7  # This will give an error

TypeError: 'tuple' object does not support item assignment

In [38]:
### List ###
list_A = [5,2]
print(list_A)

# Lists are mutable - can edit components

# Editing the elements of a list
list_A[1] = 3
print(list_A)

[5, 2]
[5, 3]


In [39]:
# Lists can contain mixed data types

list_B = [5, 'Hello', 3.14, [1,2,3]]
print(list_B)

[5, 'Hello', 3.14, [1, 2, 3]]


In [40]:
# 1st element of the list
list_A[0]

5

In [41]:
# Adding elements to a list
list_A.append(10)
print(list_A)

[5, 3, 10]
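
The remaining list operations named at the top of this section (insert, remove, pop, slicing, and negative slicing) can be sketched as follows. The list `letters` is made up for the example.

```python
list_A = [5, 3, 10]

list_A.insert(1, 7)        # insert 7 at index 1 -> [5, 7, 3, 10]
list_A.remove(3)           # remove the first element equal to 3 -> [5, 7, 10]
last = list_A.pop()        # remove and return the last element -> [5, 7]

letters = ['a', 'b', 'c', 'd', 'e']
first_two = letters[:2]    # ['a', 'b']
middle = letters[1:4]      # ['b', 'c', 'd'] (the end index 4 is excluded)
last_two = letters[-2:]    # ['d', 'e'] (negative indices count from the end)
backwards = letters[::-1]  # ['e', 'd', 'c', 'b', 'a']

print(list_A, last)
print(first_two, middle, last_two, backwards)
```

The same slicing syntax works on tuples and strings, although those cannot be modified in place.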


## Dictionaries
* Creating and Updating
* Zipping

In [42]:
dict_1 = {'a':1, 'b':2, 'c':3}
dict_1['a']

1

In [43]:
dict_1.update({'d':4, 'e':5})
dict_1

{'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}

In [44]:
Category = ['First Name','Last Name','President #']
Data = ['Abraham','Lincoln','16']
dict_2 = dict(zip(Category,Data))
dict_2

{'First Name': 'Abraham', 'Last Name': 'Lincoln', 'President #': '16'}

In [45]:
dict_2['Last Name']

'Lincoln'

In [46]:
# Nested Dictionaries

dict_presidents = {
    1: {'First Name':'George', 'Last Name':'Washington', 'President #':1},
    16: {'First Name':'Abraham', 'Last Name':'Lincoln', 'President #':16},
    32: {'First Name':'Franklin', 'Last Name':'Roosevelt', 'President #':32}
}

print(dict_presidents,"\n")
print(dict_presidents[1],"\n")
print(dict_presidents[1]['First Name'])  # Accessing nested dictionary values

{1: {'First Name': 'George', 'Last Name': 'Washington', 'President #': 1}, 16: {'First Name': 'Abraham', 'Last Name': 'Lincoln', 'President #': 16}, 32: {'First Name': 'Franklin', 'Last Name': 'Roosevelt', 'President #': 32}} 

{'First Name': 'George', 'Last Name': 'Washington', 'President #': 1} 

George
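
Dictionaries are often processed one entry at a time. The sketch below loops over a shortened copy of the nested dictionary with `.items()` and uses `.get()` to look up a key that may be missing; the key 45 and the fallback text are arbitrary choices for illustration.

```python
dict_presidents = {
    1: {'First Name': 'George', 'Last Name': 'Washington'},
    16: {'First Name': 'Abraham', 'Last Name': 'Lincoln'},
    32: {'First Name': 'Franklin', 'Last Name': 'Roosevelt'},
}

# .items() yields (key, value) pairs
full_names = []
for number, info in dict_presidents.items():
    full_names.append(f"{number}: {info['First Name']} {info['Last Name']}")
print(full_names)

# .get() returns a fallback instead of raising KeyError for a missing key
missing = dict_presidents.get(45, 'not in dictionary')
print(missing)
print(16 in dict_presidents)   # membership test on the keys
```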


## NumPy
One of the most widely used tools in Python is *Numerical Python*, or *NumPy*. It contains functions that create and do math with arrays and matrices, which can greatly enhance our productivity as Data Scientists.

In [47]:
# 'numpy' is a module with many basic functions
import numpy as np

# 'x' is an object of the class 'ndarray' (created with 'np.array'), which is defined in 'numpy'
x = np.array([1,3,5,7,9,0,8,6,4,2,0])
x

array([1, 3, 5, 7, 9, 0, 8, 6, 4, 2, 0])

In [48]:
# 'max' is Python's built-in function; it finds the maximum value
# of the object 'x', an instance of NumPy's array class.
max(x)   # Function

np.int64(9)

In [49]:
x.max()  # Method

np.int64(9)

A matrix is a 2-D array, or an array of arrays.

In [50]:
matrix = np.array([[1,3,5],
                   [7,9,0],
                   [8,6,4],
                   [2,0,2]])

print(matrix)

[[1 3 5]
 [7 9 0]
 [8 6 4]
 [2 0 2]]
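
A few common operations on a 2-D array are sketched below, using the same matrix: checking its shape, indexing by row and column, and summing along an axis.

```python
import numpy as np

matrix = np.array([[1, 3, 5],
                   [7, 9, 0],
                   [8, 6, 4],
                   [2, 0, 2]])

shape = matrix.shape               # (4, 3): 4 rows, 3 columns
entry = matrix[1, 2]               # row 1, column 2 (counting from 0) -> 0
first_column = matrix[:, 0]        # every row, column 0 -> 1, 7, 8, 2
column_sums = matrix.sum(axis=0)   # sum down each column -> 18, 18, 11

print(shape)
print(entry)
print(first_column)
print(column_sums)
```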


## Strings
A string is a sequence of characters, and it can be indexed like an array.

In [51]:
# Showing the indices of a string
phrase = "Hello World"
print(phrase)


print(phrase[0])
print(phrase[1])
print(phrase[2])

Hello World
H
e
l


In [52]:
# Splitting a string into a list of words

World = phrase.split()
World

['Hello', 'World']

In [53]:
# Splitting a string on a specific character

World = phrase.split("o")
World

['Hell', ' W', 'rld']
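
Slicing works on strings the same way it does on lists, and `join` is the reverse of `split`. A short sketch with the same phrase:

```python
phrase = "Hello World"

length = len(phrase)           # number of characters, including the space
last_word = phrase[-5:]        # negative slicing: the last five characters
shouted = phrase.upper()       # string methods return a new string

pieces = phrase.split("o")     # ['Hell', ' W', 'rld']
rejoined = "o".join(pieces)    # put the pieces back together with "o"

print(length)                  # 11
print(last_word)               # World
print(shouted)                 # HELLO WORLD
print(rejoined == phrase)      # True
```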

## Installing Packages
Most basic functions are built directly into Python. But to keep Python versatile, additional packages are left for the user to install and load as needed.

__To install packages__:
  * To install packages within an Anaconda environment: `conda install numpy`
  * To install packages within a Python environment: `pip install numpy`

__To load packages__:
```python
import numpy
```

Often, we give the package an abbreviation to make it easier to work with:
```python
import numpy as np
```

Sometimes, we only want one or two specific functions from the package:
```python
from numpy import sqrt
from numpy.random import randint
```
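
Importing a package that has not been installed raises `ModuleNotFoundError`. One way to check first is sketched below using the standard `importlib` module; the helper name `is_installed` is hypothetical, and the second package name is deliberately fake.

```python
import importlib.util


def is_installed(package_name):
    """Return True if Python can find a top-level package with this name."""
    return importlib.util.find_spec(package_name) is not None


print(is_installed("json"))                               # part of the standard library
print(is_installed("a_package_that_does_not_exist_3080")) # deliberately fake name
```

If the check returns `False` for a package you need, install it with `pip` or `conda` as described above.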

## Virtual Environments
Within Data Science, we use many tools. However, sometimes these tools don't work well with each other. For example, installing tools for Machine Learning may interfere with tools for Data Analysis. So, we create something called *Virtual Environments* which are setups within your computer where you can customize what is installed.

* *Anaconda*
  * To create a new virtual environment in *Anaconda*: `conda create -n Math3080`
  * To start a virtual environment in *Anaconda*: `conda activate Math3080`
  * To close a virtual environment in *Anaconda*: `conda deactivate`
* *Python*
  * To create a new virtual environment in *Python*: `python3 -m venv $HOME/.virtualenvs/Math3080`
  * To start a virtual environment in *Python*: `source $HOME/.virtualenvs/Math3080/bin/activate`
  * To close a virtual environment in *Python*: `deactivate`
  
More information can be found on the *Anaconda* documentation page:
* https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html

Once virtual environments are made, you should be able to see an option within Jupyter to create Notebooks/Consoles within that environment.

### Connecting Virtual Environment to Jupyter Notebook
* After Jupyter is installed, install the kernel within the virtual environment
  * In *venv*: `pip install ipykernel`
  * In *anaconda*: `conda install -c anaconda ipykernel`
* Connect the new kernel with Jupyter
  * `python -m ipykernel install --user --name Math3080 --display-name "Python (DataScience)"`
* Reload Jupyter Lab/Notebook
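
To confirm which Python a notebook or script is actually using, the standard `sys` module can be inspected. This is only a sketch: the comparison of `sys.prefix` and `sys.base_prefix` detects environments made with `venv`, but it does not detect conda environments, so check the printed path in that case.

```python
import sys

# Full path of the Python interpreter running this code
print(sys.executable)

# Inside a venv-style environment, sys.prefix differs from sys.base_prefix
in_venv = sys.prefix != sys.base_prefix
print("Running inside a venv:", in_venv)
```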

## Creating an Array of Random Numbers

In [54]:
import numpy as np
rnd_nums = np.random.randint(50, size=120) # 120 integers from 0 to 49 (50 is excluded)

rnd_nums

array([27, 41, 43, 32, 10, 15, 15, 49, 15, 23, 42, 17, 44, 26, 11, 17,  9,
       18, 42, 36, 25, 12, 45, 35, 25,  2, 23, 11,  8, 34, 37,  2, 35,  5,
       48, 39, 39,  8, 42, 48, 35, 28, 16, 22, 30, 31, 30, 46, 49,  4,  1,
       12, 10, 12, 32, 23, 48, 11, 34, 15, 32, 10, 45,  8,  2, 30,  6, 44,
        5, 13, 28, 48, 22, 17, 38, 44, 30, 20,  2, 15, 44, 17, 45, 39, 39,
        4, 45, 49, 40, 49, 16, 42, 43, 46, 32, 44, 26, 48, 47,  8,  1, 28,
       28, 18, 10, 38, 41, 40, 47, 39, 46, 35,  2, 46, 48, 18, 13, 18, 46,
       18])

In [55]:
from numpy.random import randint
rnd_nums = randint(50, size=120)

rnd_nums

array([44, 28,  3, 37, 10,  4,  9, 38, 13, 35, 19, 42,  9,  6,  5, 16, 27,
       45,  8, 26, 34,  2, 18, 40, 12, 41, 12, 19, 19, 30, 45, 38, 27, 25,
       24, 46, 22, 30, 25, 46, 36,  3, 42, 39,  6, 21, 35, 13, 22,  0, 21,
        8, 15, 21, 29, 39, 13, 22, 30,  8,  8, 14, 45,  7, 30, 13, 33, 14,
       13, 41, 15, 14, 49, 17, 44, 17, 29, 48, 27, 46, 38, 23, 14, 47, 49,
       13,  1,  2,  2,  9, 15, 37, 15, 45, 48, 18,  6, 34, 46,  7, 32, 29,
        6, 31, 24, 38, 31, 14, 37, 29, 19, 44, 21, 26, 30,  1, 43, 40, 34,
       40])

In [56]:
np.random.rand(2,4)

array([[0.04639119, 0.88838482, 0.81627753, 0.95446558],
       [0.6739855 , 0.87415846, 0.60107025, 0.68944069]])
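
The arrays above change every time the cell is run. When reproducible results are needed, a seeded generator can be used. The sketch below uses NumPy's `default_rng` interface instead of the `np.random.randint` call shown above; the seed value 3080 is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(seed=3080)        # the seed value is arbitrary
rnd_nums = rng.integers(0, 50, size=120)      # 120 integers from 0 to 49

# A second generator with the same seed produces the same values
rng_again = np.random.default_rng(seed=3080)
same_nums = rng_again.integers(0, 50, size=120)

print(rnd_nums[:5])
print(np.array_equal(rnd_nums, same_nums))    # True
```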

## For Loops

In [57]:
for i in range(5):
    print(rnd_nums[i]*2)

88
56
6
74
20


In [58]:
total = 0
for i in range(len(rnd_nums)):
    total += rnd_nums[i]

total

np.int64(2964)

In [59]:
total = 0
for value in rnd_nums:
    total += value

total

np.int64(2964)

In [60]:
doubles = [value*2 for value in rnd_nums]
doubles[:5]

[np.int64(88), np.int64(56), np.int64(6), np.int64(74), np.int64(20)]
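
Two related tools are worth knowing: `enumerate` gives the index and the value together, and NumPy arrays can often skip the loop entirely. The sketch below uses the first five values from the array above as a small fixed example.

```python
import numpy as np

values = np.array([44, 28, 3, 37, 10])

# enumerate yields (index, value) pairs, so range(len(...)) is not needed
for index, value in enumerate(values):
    print(index, value * 2)

# NumPy applies arithmetic to every element at once (no explicit loop)
doubles = values * 2     # 88, 56, 6, 74, 20
total = values.sum()     # 122

print(doubles)
print(total)
```

For large arrays, the loop-free versions are usually both shorter and faster.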

## If Statements

In [61]:
evens = []

for value in rnd_nums:
    if (value % 2) == 0:
        evens.append(True)
    else:
        evens.append(False)

print(rnd_nums[:10])
print(evens[:10])

[44 28  3 37 10  4  9 38 13 35]
[True, True, False, False, True, True, False, True, False, False]


In [62]:
evens = []
for value in rnd_nums:
    evens.append(True if (value%2)==0 else False)

evens[:10]

[True, True, False, False, True, True, False, True, False, False]

In [63]:
evens = [True if (value%2)==0 else False for value in rnd_nums]
evens[:10]

[True, True, False, False, True, True, False, True, False, False]
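
With NumPy arrays, the same even/odd test can be written without a loop, and the resulting True/False array can be used to select values. The sketch below also shows `elif` for more than two cases; the function `describe` and its cutoffs (10 and 40) are made up for illustration, and the array holds the first ten values printed above.

```python
import numpy as np

values = np.array([44, 28, 3, 37, 10, 4, 9, 38, 13, 35])

# The comparison is applied to every element, giving an array of booleans
is_even = values % 2 == 0
even_values = values[is_even]        # keep only entries where is_even is True
print(even_values)                   # 44, 28, 10, 4, 38


def describe(value):
    """Label a number using if / elif / else."""
    if value < 10:
        return 'small'
    elif value < 40:
        return 'medium'
    else:
        return 'large'


labels = [describe(value) for value in values]
print(labels)
```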

## Functions
__What is a function?__
* How to make a function
  * def
  * lambda

In [64]:
def power_function(x,y):
    result = 1
    for i in range(y):
        result *= x
    return result

power_function(2,3)

8

In [65]:
power_ftn = lambda x,y: x**y

power_ftn(2,5)

32
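
The loop-based `power_function` above quietly assumes that `y` is a non-negative integer. One possible variation, sketched below, states that assumption in a docstring, checks it, and gives `y` a default value so the function squares its input when called with one argument. The default of 2 and the error message are choices made for this example.

```python
def power_function(x, y=2):
    """Return x raised to the power y, where y is a non-negative integer."""
    if not isinstance(y, int) or y < 0:
        raise ValueError("y must be a non-negative integer")
    result = 1
    for _ in range(y):      # multiply by x a total of y times
        result *= x
    return result


print(power_function(3))        # default y=2 -> 9
print(power_function(2, y=5))   # keyword argument -> 32
print(power_function(7, 0))     # the loop runs zero times -> 1
```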

### Docstrings

In [66]:
def subtract_one(x):
    """ Here is a description of the function. 
    And a second line to the description. """
    return x-1

In [67]:
subtract_one?

Signature: subtract_one(x)
Docstring:
Here is a description of the function. 
And a second line to the description. 
File:      /tmp/ipykernel_35594/2951862098.py
Type:      function

In [68]:
subtract_one??

Signature: subtract_one(x)
Source:   
def subtract_one(x):
    """ Here is a description of the function. 
    And a second line to the description. """
    return x-1
File:      /tmp/ipykernel_35594/2951862098.py
Type:      function

### Loading functions from a file

In [69]:
%cd code
from function_file import find_power
%cd ..

[Errno 2] No such file or directory: 'code'
/mnt/c/Users/michael.olson2/Documents/GitHubs/math3080/Notes/code
/mnt/c/Users/michael.olson2/Documents/GitHubs/math3080/Notes




In [70]:
find_power?

Signature: find_power(base, exponent)
Docstring: Returns base raised to the exponent power.
File:      /mnt/c/Users/michael.olson2/Documents/GitHubs/math3080/Notes/code/function_file.py
Type:      function

In [71]:
find_power??

Signature: find_power(base, exponent)
Source:   
def find_power(base, exponent):
    """Returns base raised to the exponent power."""
    return base ** exponent
File:      /mnt/c/Users/michael.olson2/Documents/GitHubs/math3080/Notes/code/function_file.py
Type:      function

In [72]:
find_power(2,5)

32
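
Changing directories with `%cd` only works inside IPython or Jupyter. In a plain Python script, one alternative is to add the folder that holds the file to `sys.path` before importing. The sketch below is self-contained: it writes a small stand-in file, hypothetically named `function_file_demo.py`, to a temporary folder so that it runs anywhere. With a real project you would add the path of your own folder instead.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as folder:
    # Create a stand-in module file (a real project would already have one)
    Path(folder, "function_file_demo.py").write_text(
        'def find_power(base, exponent):\n'
        '    """Returns base raised to the exponent power."""\n'
        '    return base ** exponent\n'
    )

    sys.path.insert(0, folder)          # tell Python where to look
    try:
        importlib.invalidate_caches()   # make sure the new file is noticed
        module = importlib.import_module("function_file_demo")
    finally:
        sys.path.remove(folder)         # leave sys.path as we found it

    result = module.find_power(2, 5)

print(result)   # 32
```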