# Lecture 4: Python Basics
__Math 3080: Fundamentals of Data Science__

Reading:
* [McKinney, *Python for Data Analysis*, Chapters 2-4](https://wesmckinney.com/book/python-basics)

Class notes are found through GitHub. As changes are made, they will automatically be uploaded to GitHub. A link to the repository is on Canvas.

## Installing Python
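__macOS__:
* macOS is Unix-based, so a terminal is built in. Recent versions no longer provide the `python` command. Open a terminal and run `python3` instead; macOS may offer to install the Command Line Tools, which include Python 3.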

__Windows__:
1. Google Colab
2. Anaconda or Miniconda
3. Install Python on WSL (Windows Subsystem for Linux)
4. Install Python directly

For now, I would recommend *Google Colab*. If you want Python locally, *Anaconda* is the easiest to work with. It installs a lot of extra (non-essential) packages, but it will work well until we get the hang of using Python. You can always learn how to install Python directly on your computer and/or in WSL later (I'm happy to show you if you would like).

In addition to having Python on your computer, you will need a programming environment, or an Integrated Development Environment (IDE).
* Jupyter (Python notebooks)
  * Simple environment that separates Python code into blocks (cells), so you can run one block at a time instead of the entire program
  * Comes pre-installed in Anaconda
  * Google Colab uses an environment very similar to Jupyter
* Microsoft Visual Studio Code (VS Code)
  * Once you have Python (a kernel) set up, VS Code can detect it
  * Opens and runs Jupyter `.ipynb` files
* Spyder
  * Nice layout for code, data displays, and figures all in one window
  * Readily available in *Anaconda*

### Python from the Command Line
From the command line, run `python` (on some systems, including recent macOS, the command is `python3`).

```python
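1+4                                             # the shell echoes the result: 5
print('Hello World!')
print('I am',30 + 5,'years old!')               # print() separates items with spaces
print('I am {0} {1} old'.format(30+5,'years'))  # {0} and {1} are filled in by format()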
```

To exit, type `exit()`

### Python from a file
Alternatively, save code in a file. For example, save the following in the file *Hello.py*:
```python
print('Hello World! My name is Michael.')
```

Then from the command line, type `python Hello.py`.

### IPython
IPython is an enhanced interactive Python shell (start it by running `ipython` in a terminal). Its advantages include:

#### Tab-Completion
Press `Tab` to complete a command, filename, variable, etc.

#### Magic Functions
Certain shell commands can be run from within IPython by prefixing them with `%` (these are called *magic commands*). For example, within IPython, type
```python
%pwd
%mkdir tmp
%cd tmp
```

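Magic commands also make coding easier. For example, to copy-paste a block of code into the terminal version of IPython (these two magics are not available in Jupyter), try:
```python
# Run the code currently on the clipboard
%paste
# Paste code at the prompt, then type -- on its own line to run it
%cpaste
```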

#### Introspection
If you want to know more about a function, type a `?` after the command:
```python
print?
```
This also works for variables and other objects. IPython displays whatever documentation is available in your installation, such as the object's type, signature, and docstring.

If you use `??`, the source code is also displayed (when it is available).

In [3]:
import seaborn as sns
sns.scatterplot?

Signature:
sns.scatterplot(
    data=None,
    *,
    x=None,
    y=None,
    hue=None,
    size=None,
    style=None,
    palette=None,
    hue_order=None,
    hue_norm=None,
    sizes=None,
    size_order=None,
    ...

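The `?` and `??` shortcuts are IPython features. In the plain `python` shell or in a script, the standard library offers rough equivalents: `help()`, an object's `__doc__` attribute, and the `inspect` module. A minimal sketch, using a small hypothetical function `add_numbers` defined only for illustration:

```python
import inspect

def add_numbers(a, b=2):
    """Return the sum of a and b."""
    return a + b

# Roughly what `add_numbers?` shows: the signature and the docstring
print(inspect.signature(add_numbers))   # (a, b=2)
print(add_numbers.__doc__)              # Return the sum of a and b.

# Roughly what `add_numbers??` adds: the source code.
# This only works when the source is saved in a file, so guard against failure.
try:
    print(inspect.getsource(add_numbers))
except (OSError, TypeError):
    print("Source code not available")

# help() prints the same kind of information in any Python shell
help(add_numbers)
```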
## Jupyter Notebook / Lab
Jupyter Notebook runs on IPython; the two were originally one project (the IPython Notebook). In Jupyter Notebook, your code is separated into cells, a happy balance between command-line and file-stored programs. You have other advantages:

### HTML based
Output is rendered as HTML, so it is much cleaner than text-based command-line output.

### Inline images
When we get to images (such as plots), you will be able to see them directly within the notebook. You would not be able to see these images at the command line.

### Inline results
In a Jupyter Notebook, you can run blocks of code separately from the rest of the code and see the output of each block on its own.
* In `python`, `ipython`, or Jupyter, type `print(variable)` to see the value of your variable
* In Jupyter, you can just type `variable` on the last line of a cell to output its value (only the last expression in a cell is displayed automatically)

### More on Introspection
Just like in IPython, you can use the introspection commands (`?` and `??`) to learn more about a function. If you want that information without leaving your work, press `Shift-Tab` while the cursor is on the function name. This produces a pop-up window with the same information as `?`, allowing quick reference while typing your program.

### Markdown
Cells can also be turned into __Markdown__ cells. Markdown is a lightweight markup language that is rendered as HTML, providing an easy way to annotate your work.

### Lab
JupyterLab is based on Notebook, but provides the following in one window:
* File Directory
* Tabs for all open files
  * Tabs can be tiled
* Console (like command-line input, but with HTML output)
* Terminal (for browsing files or running file-stored Python programs)
* Markdown files

## Interrupting
If your code runs away (e.g., an infinite loop), you can
1. press `Ctrl-C` to interrupt the command (command line or IPython); in Jupyter, press `I` twice in command mode
2. go to the "Kernel" menu and select "Interrupt" (Jupyter)

## Closing Jupyter Notebook or Jupyter Lab
When you started Notebook/Lab, a terminal window opened. This window runs the Jupyter server and logs its activity. As long as Jupyter is running, it uses memory (RAM) and processing power.

To close Notebook or Lab,
1. Shut down all running kernels
2. Quit Notebook/Lab from the File menu
3. Press `Ctrl-C` twice in the terminal window

You could do just step (3), but kernels may not shut down cleanly and unsaved work can be lost, and stray processes may keep using memory and processor time. It is best to close properly every time.

# More about Python

## Modules / Packages
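Python comes with many built-in tools (the *standard library*), and many more have been written by others that we can add to Python. These tools are bundled in __packages__ (often also called __modules__; strictly, a package is a collection of modules). Some packages are already installed (Anaconda and Colab include most of the ones we will use). Others we have to install ourselves:
* Open a terminal window
* Run `pip install numpy`, replacing *numpy* with the name of the package you want to install (in Jupyter or Colab, you can instead run `%pip install numpy` in a cell)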

We load these packages using the `import` command. Then we access their tools (functions) with the syntax *package.function*:

In [1]:
import math
math.sin(math.pi/6)

0.49999999999999994
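The result above is not exactly 0.5. Floating-point numbers are stored in binary, so most decimal values (including $\pi/6$) are only approximated, and tiny rounding errors appear. A minimal sketch of how to compare such results, using the standard library's `math.isclose` and `round`:

```python
import math

value = math.sin(math.pi / 6)

print(value == 0.5)              # False: off by a tiny rounding error
print(math.isclose(value, 0.5))  # True: equal within a small relative tolerance
print(round(value, 10))          # 0.5
```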

Common packages that we use:
* math
* NumPy
* pandas
* Matplotlib
* Seaborn
* SciPy
* ...many others...

We can also give nicknames (aliases) to packages with `as`, to make them easier to access:

In [3]:
import numpy as np
np.array([i for i in range(10)])

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

We can also import specific names (functions, constants) from within a package. When you do this, there is no need to prefix them with the package name; just use the name itself.

In [5]:
from math import pi
2*pi

6.283185307179586
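A short sketch (standard library only) showing that both import styles refer to the same objects, and that several names can be imported in one line:

```python
import math
from math import pi, sqrt   # import several names at once

# Both styles refer to the same objects
print(math.pi == pi)        # True
print(sqrt(16))             # 4.0  (no "math." prefix needed)
print(math.sqrt(16))        # 4.0
```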

__What is a module?__

To best answer this, we also need to define __classes__, __objects__, and __methods__. Python can work as an object-oriented language. To better understand this, consider a house that is to be built.
* A __class__ is a type of object; it determines what you can do with objects of that type, which differs from class to class.
* An __object__ is a specific instance of a class.
* A __module__ is a collection of tools (functions, classes, constants) that we load with `import`; many of these tools are designed to work on objects of particular classes.
* A __method__ is a function that belongs to a class and is called on an object with dot notation (`object.method()`), which is why it is easy to mistake for an ordinary function.

Here are two examples of the differences between classes, objects, modules, and methods:

|     | $\sqrt{16}$ vs. "sixteen" | Building a House |
| ---: | :---: | :---: |
| __Class__ | The number 16 is an integer, and we can apply certain functions to it: $\sqrt{16} = 4$. Now try to apply the square root to the word: $\sqrt{\text{sixteen}}$ makes no sense, because you can't take the square root of a word. | For the house you are building, the __class__ would be the plan for what you're building: a house. Defining this determines the tools you will need to build the house. (For example, you wouldn't bring in a 10-story crane!) |
| __Object__ | 16 is an integer. In other words, the number 16 is an object of the class integer (`int` in Python) and has all the properties of an integer. | The actual physical house would be the object. It is the physical construction of the *house* class. |
| __Module__ | The square root function is part of the `math` package, or *module*; it can be applied to 16 (the *object*) because 16 is an integer (the *class*). | The set of tools (*module*) that you can use to build your house (*object*) in the way the plans (*class*) instruct. |
| __Method__ | A function that belongs to the class itself and is called on the object with a dot. For example, strings have an `.upper()` method, so `"sixteen".upper()` gives `"SIXTEEN"`, but the integer 16 has no `.upper()` method. (The square root, `math.sqrt`, is a *function* in the `math` module, not a method.) | A hammer is a tool in the toolset, which can be applied to build the house. |

* A function has an input and produces an output
* A method is a tool used to work on a given object

Not all functions are also available as methods: `math.sqrt` is a function that can be applied to integers, but integers have no `.sqrt()` method.

In [52]:
import numpy as np       # numpy is a module
y = np.array([1,4,9,16]) # y is an object of class ndarray (NumPy's array class)
max(y)                   # max() is a function

16

In [53]:
y.max()                  # .max() is a method, called on the object y of class ndarray

16
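The same distinction shows up with built-in types, with no extra packages needed. A sketch using the `16` vs. `"sixteen"` example from the table: `math.sqrt` and `len` are *functions*, while `.upper()` is a *method* that strings carry and integers do not:

```python
import math

number = 16
word = "sixteen"

# Functions: called as function(object)
print(math.sqrt(number))         # 4.0
print(len(word))                 # 7

# Method: called as object.method()
print(word.upper())              # SIXTEEN

# Integers have no .upper() method, and math.sqrt rejects strings
print(hasattr(number, "upper"))  # False
try:
    math.sqrt(word)
except TypeError as err:
    print("TypeError:", err)
```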

## Basic Python Programming

### Tips for Programming
* Keep code clean and organized
  * Use Whitespace
* Use comments

In [22]:
x = 4
# A comment is led with a '#' symbol
print(x)

4


## Basic Math

In [None]:
print(5 + 2)  # Addition
print(5 - 2)  # Subtraction
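Python's other arithmetic operators follow the same pattern. A short sketch; it uses `print` because a notebook cell only displays the value of its last line automatically:

```python
print(5 * 2)   # Multiplication: 10
print(5 / 2)   # Division (always returns a float): 2.5
print(5 // 2)  # Floor (integer) division: 2
print(5 % 2)   # Remainder (modulo): 1
print(5 ** 2)  # Exponentiation: 25
# Note: ^ is NOT exponentiation in Python (it is bitwise XOR)
```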

## Variables
### Integer and Float

In [3]:
# Integers

A = 5
B = 2
A / B

2.5

In [4]:
# Float

A = 5.0
B = 2.0
A / B

2.5
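In Python 3, `/` always returns a float, even when both numbers are integers, which is why both cells above give 2.5. A sketch using `type()` to check the class of each value, and `//` when an integer result is wanted:

```python
A = 5
B = 2

print(type(A))       # <class 'int'>
print(type(A / B))   # <class 'float'>: / always returns a float
print(A // B)        # 2: floor division of two integers gives an integer
print(type(5.0))     # <class 'float'>
print(int(2.9))      # 2: int() truncates toward zero
print(float(A))      # 5.0
```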


### Strings


In [13]:
Hello = 'Hello World!'
Hello

'Hello World!'

## Print
* Basic print command
* Formatting with the `format` method and f-strings

In [21]:
A = 2
B = 7

print(A, "/", B, " = ", A/B)

2 / 7  =  0.2857142857142857


In [20]:
A = 2
B = 7
print("{0} / {1} = {2}".format(A,B,A/B))
print("{0:0.1f} / {1:.2f} = {2:.3f}".format(A,B,A/B))

2 / 7 = 0.2857142857142857
2.0 / 7.00 = 0.286


In [23]:
A = 2
B = 7
print(f"{A:.1f} / {B:.2f} = {A/B:.3f}")

2.0 / 7.00 = 0.286
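Format specifications go after a colon inside the braces and work the same way in `.format()` and f-strings. A sketch of a few more that are often handy (the variable `total` is just an example value):

```python
A = 2
B = 7
total = 1234567.891

print(A, B, sep=" / ")   # 2 / 7  (sep replaces the default space between items)
print(f"{A/B:.1%}")      # 28.6%  (percentage with 1 decimal place)
print(f"{total:,.2f}")   # 1,234,567.89  (thousands separator, 2 decimals)
print(f"[{A:>5}]")       # [    2]  (right-aligned in a width of 5)
```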


## Tuples and Lists
* Try to manipulate a tuple and a list
* Append, insert, remove, pop from a list
* Indices
* Slicing
  * Negative slicing
* Splitting

In [14]:
### Tuple ###
tuple_A = (5,2)
tuple_A

# Tuples are immutable: you can't change their elements once they are set

(5, 2)
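To see the difference in practice, try assigning to an element of a tuple: Python raises a `TypeError`. A small sketch:

```python
tuple_A = (5, 2)

try:
    tuple_A[1] = 3              # tuples are immutable
except TypeError as err:
    print("TypeError:", err)    # 'tuple' object does not support item assignment

# You can still read elements, or build a *new* tuple
print(tuple_A[0])               # 5
tuple_B = tuple_A + (3,)        # (3,) is a one-element tuple
print(tuple_B)                  # (5, 2, 3)
```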

In [15]:
### List ###
list_A = [5,2]
list_A

[5, 2]

In [16]:
# 1st element of the list (Python counts from 0)
list_A[0]

5

In [17]:
# Editing the elements of a list
list_A[1] = 3
list_A

[5, 3]
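The outline at the start of this section also lists appending, inserting, removing, popping, and slicing. A sketch of each on a fresh list (`nums` is just an example name):

```python
nums = [5, 3]

nums.append(8)           # add to the end            -> [5, 3, 8]
nums.insert(0, 1)        # insert at index 0         -> [1, 5, 3, 8]
nums.remove(3)           # remove the first 3        -> [1, 5, 8]
last = nums.pop()        # remove and return last    -> last = 8, nums = [1, 5]
nums.extend([9, 7, 2])   # add several items         -> [1, 5, 9, 7, 2]

# Slicing: list[start:stop] includes start but excludes stop
print(nums[1:3])         # [5, 9]
print(nums[:2])          # [1, 5]

# Negative indices count from the end
print(nums[-1])          # 2
print(nums[-2:])         # [7, 2]
print(nums[::-1])        # [2, 7, 9, 5, 1]  (reversed)
```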

## Dictionaries
* Creating and Updating
* Zipping

In [8]:
dict_1 = {'a':1, 'b':2, 'c':3}
dict_1['a']

1

In [10]:
dict_1.update({'d':4, 'e':5})
dict_1

{'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}

In [28]:
Category = ['First Name','Last Name','President #']
Data = ['Abraham','Lincoln','16']
dict_2 = dict(zip(Category,Data))
dict_2

{'First Name': 'Abraham', 'Last Name': 'Lincoln', 'President #': '16'}

In [29]:
dict_2['Last Name']

'Lincoln'
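A few more everyday dictionary operations, sketched on the `dict_1` example from above (re-created here so the cell runs on its own):

```python
dict_1 = {'a': 1, 'b': 2, 'c': 3}

dict_1['d'] = 4                # add (or overwrite) a single key
del dict_1['a']                # remove a key
print(dict_1.get('z', 0))      # 0: .get() returns a default instead of raising KeyError
print('b' in dict_1)           # True: membership tests check the keys

# Loop over key/value pairs
for key, value in dict_1.items():
    print(key, value)

# zip() pairs up items position by position; dict() turns the pairs into a dictionary
pairs = list(zip(['x', 'y'], [10, 20]))
print(pairs)                   # [('x', 10), ('y', 20)]
print(dict(pairs))             # {'x': 10, 'y': 20}
```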

## NumPy
One of the most widely used tools in Python is *Numerical Python*, or *NumPy*. It contains functions that create and do math with arrays and matrices, which can greatly enhance our productivity as data scientists.

In [5]:
# 'numpy' is a module with many basic functions
import numpy as np

# 'x' is an object of the class 'ndarray', which is defined in 'numpy'
x = np.array([1,3,5,7,9,0,8,6,4,2,0])
x

array([1, 3, 5, 7, 9, 0, 8, 6, 4, 2, 0])

In [6]:
# 'max' is Python's built-in function; it finds the maximum value
# of the object 'x' by looping over its elements.
max(x)   # Function

9

In [7]:
x.max()  # Method

9

A matrix is a 2-D array, or an array of arrays.

In [9]:
matrix = np.array([[1,3,5],
                   [7,9,0],
                   [8,6,4],
                   [2,0,2]])

print(matrix)

[[1 3 5]
 [7 9 0]
 [8 6 4]
 [2 0 2]]
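Because `matrix` is a NumPy array, we can ask for its shape, pick out rows, columns, or single entries, and do arithmetic on every element at once. A sketch (re-creating the same matrix so it runs on its own):

```python
import numpy as np

matrix = np.array([[1, 3, 5],
                   [7, 9, 0],
                   [8, 6, 4],
                   [2, 0, 2]])

print(matrix.shape)        # (4, 3): 4 rows, 3 columns
print(matrix[1, 2])        # 0: row 1, column 2 (indices start at 0)
print(matrix[:, 0])        # [1 7 8 2]: the first column
print(matrix.T.shape)      # (3, 4): the transpose swaps rows and columns
print(matrix * 2)          # every element doubled
print(matrix.max(axis=0))  # [8 9 5]: largest value in each column
```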


## Strings
A string is a sequence of characters, so, like a list, it can be indexed.

In [None]:
# Showing the indices of a string
phrase = "Hello World"
for i in range(len(phrase)):
    print("phrase[{0}] = {1}".format(i, phrase[i]))

In [None]:
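## Splitting a string
# (Hello = 'Hello World!' was defined in an earlier cell)
# split() with no argument splits on whitespace
World = Hello.split()
World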

In [None]:
World = Hello.split("o")   # split on every 'o'; the 'o' characters themselves are dropped
World
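Because strings are sequences, the indexing and slicing rules from lists apply, and `join()` reverses `split()`. One difference from lists: strings are immutable, so methods like `replace()` return a *new* string and leave the original unchanged. A sketch:

```python
phrase = "Hello World"

print(phrase[0])          # H
print(phrase[-1])         # d  (negative index counts from the end)
print(phrase[0:5])        # Hello
print(phrase[6:])         # World

words = phrase.split()                    # ['Hello', 'World']
print("-".join(words))                    # Hello-World: join() glues the pieces back together
print(phrase.replace("World", "Python"))  # Hello Python
print(phrase)                             # Hello World: the original is unchanged
```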