# 1.0 Intro to Python

This chapter focuses on the Python component of the class.  We will use Python exclusively for interactive data science work rather than for "programming" per se.

## Python Book
VanderPlas, J. <i>Python Data Science Handbook: Essential Tools for Working with Data</i>, O'Reilly Media, 2016.  Available on 
Amazon and at https://jakevdp.github.io/PythonDataScienceHandbook/.

Note that this is not a book that teaches Python or programming.  Instead, it's focused on using Python for *Data Science*. 

## Key Python Modules
We will use several common Python <i>modules</i>.  A module is similar to a library in other languages.  The modules that we will use include:
<ul>
<li />NumPy
<li />Pandas
<li />SciPy
<li />Matplotlib and Seaborn
<li />Scikit-learn
</ul>
In addition, we will learn how to create and use our own modules.
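
As a rough sketch of what importing a module looks like, the cell below uses only modules from the Python standard library (`math` and `statistics`), so it should run without installing anything. The sample values are made up for illustration. The modules listed above are imported the same way, conventionally with short aliases such as `import numpy as np`.

```python
# Import a whole module and refer to its contents through the module name.
import math

# Import a module under a shorter alias.
import statistics as st

# Import a single name directly from a module.
from math import sqrt

values = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up sample data

root = sqrt(16)               # same function as math.sqrt
mean_value = st.mean(values)  # arithmetic mean of the sample
area = math.pi * 2 ** 2       # area of a circle with radius 2

print(root, mean_value, area)
```

The three import styles differ only in how the imported names are spelled afterwards; the objects behind them are the same.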

## Introduction to Python

In [None]:
import os
# To remember this, think "get current working directory."  This is the base
# directory that will be used when looking for files.
print(os.getcwd())

In [None]:
print ("Hello World!")

In [None]:
# Unlike many languages that use {} and other start/end identifiers
#   to identify blocks, Python uses indentation.
for i in range(5):
    j = i + 1
    print (f"{j} Hello World!")

In [None]:
# If you don't use indentation as Python expects, it will let you know.
for i in range(5):
j = i + 1
print (f"{j} Hello World!")

In [None]:
# Note here that we indent for the "for" block and again for the "if" block.
# This is called nested indentation -- and it generally makes code more "readable"
for i in range(20):
    if i % 2:
        print(i)

### Python Object types - modules, statements, expressions, objects -- Slides

### Pillars of Programming -- Slides

In [None]:
# simple implementation of the pseudo code from Slide 5 of the Intro to 
#   Python slide set.
Numbers = [123, 87, 96, 24, 104, 16, 55, 24, 19, 86, 776, 1945, 87.5, 12.34]
# Uncomment the next line if you want to see the error message
#Numbers = []
Total = 0
Count = 0
for Num in Numbers:
    Total = Total + Num
    Count = Count + 1
if Count > 0:
    Average = Total / Count
else:
    Average = "Can't compute the average of a sample of size 0."
print (Average)


### Initial Core Statements

In [None]:
# Assignment
x = 12
y = x + 14
z = [1, 2, 3]
# the following returns the objects as a tuple -- we'll discuss tuples/lists/sets later, but 
#   for now, think of this as a simple way to "see" the values of objects.
x, y, z

If you want to *see* these examples in more detail, try pasting them into Spyder.

In [None]:
# Repetition
for j in [1, 2, 3, 4, 5]:
    print(j)

In [None]:
# check the current value of the variable j after the previous loop.  What should it be?
j
# The loop ends once every item in the list has been used, so j keeps the "last" value in the list

In [None]:
# Range objects (more on the Range object in the Lists section of the notes)
range(10), type(range(10)), list(range(10))
# Note that what is displayed is a tuple with three items: the range object, its type, and a list

In [None]:
# Very flexible -- arguments are start, stop, and step value
list(range(2, 20, 2))

In [None]:
for j in range(10):
    print (j)

In [None]:
# Selection
x = 5
y = 5
if x == 12:
    y = y + 1
x, y

In [None]:
if x in [1, 2, 3, 4, 5]:
    y = 35
else:
    y = 50
x,y    

### General Concepts - Dynamically typed, Strongly typed, Mutable vs. Immutable, Namespaces -- Slides/Board
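
The concepts named in this heading are covered on the slides and the board. As a companion, here is a minimal sketch of each one. All variable names and values are invented for illustration and are not taken from the slides.

```python
import math

# Dynamically typed: a name can be rebound to objects of different types.
value = 12
first_type = type(value).__name__
value = "twelve"
second_type = type(value).__name__

# Strongly typed: Python will not silently mix incompatible types.
try:
    result = "3" + 4
except TypeError:
    result = "TypeError: cannot add a str and an int"

# Mutable vs. immutable: lists can be changed in place, strings cannot.
numbers = [1, 2, 3]
alias = numbers          # both names refer to the same list object
alias.append(4)          # the change is visible through either name

word = "dog"
try:
    word[0] = "h"
    string_changed = True
except TypeError:
    string_changed = False

# Namespaces: this pi and math.pi are separate names in separate namespaces.
pi = 3.14

print(first_type, second_type)
print(result)
print(numbers, string_changed)
print(pi, math.pi)
```

Catching `TypeError` here is only a way to show the error without stopping the cell; in interactive work you would normally just read the traceback.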

### Numbers

Numeric and math modules - https://docs.python.org/3.3/library/numeric.html
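
Before looking at the individual cells, here is a short sketch of the basic arithmetic operators and how they treat ints and floats. The variable names are chosen only for this example.

```python
# Integer and float arithmetic
total = 7 + 2          # int + int -> int
ratio = 7 / 2          # true division always returns a float
whole = 7 // 2         # floor division
remainder = 7 % 2      # modulo (remainder)
power = 2 ** 10        # exponentiation
mixed = 7 + 2.0        # int + float -> float

print(total, ratio, whole, remainder, power, mixed)
# prints: 9 3.5 3 1 1024 9.0
```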

In [None]:
# Integers (ints) and floating point numbers (floats)
n1 = 3
n2 = 3.0
n1, n2

In [None]:
n1 == n2, type(n1) == type(n2)

In [None]:
# Type conversion
n3 = float(n1)
n4 = int(n2)
n3, n4, type(n3), type(n4)

In [None]:
# Many built-in functions are available in addition to those in the math module
import math
math.pi, math.sqrt(227)

In [None]:
# show the functions in the math module
dir(math)

In [None]:
# Help -- use ? to see the Docstring (bottom of the browser window)
math.factorial?

In [None]:
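# More on Namespaces - three different pi constants ...
# Overlap in constants for pi -- one in each namespace
# (older SciPy releases also exposed sp.pi at the top level; scipy.constants.pi
#  is used here because the top-level alias may be missing in recent releases)
import math
import numpy as np
import scipy.constants
math.pi, np.pi, scipy.constants.pi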

In [None]:
# Random Numbers (pseudo-random, technically)
import random
for j in range(10):
    print(random.random())

In [None]:
dir(random)

In [None]:
# simulating 10 rolls of a fair die.
for j in range(10):
    print(random.choice(list(range(1,7))))

### Strings

A *string* is a sequence of characters.  Recall that a *sequence* is a "positionally ordered collection of other objects" -- in this case, those objects are characters.
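
Because a string is a sequence, the usual sequence operations apply to it. The following is a small sketch using a hypothetical sample string; `enumerate` is a built-in that pairs each item with its position.

```python
text = "data"   # hypothetical sample string

length = len(text)          # number of characters
first = text[0]             # positions start at 0
last = text[-1]             # negative positions count from the end
has_t = "t" in text         # membership test

# A for loop visits a string one character at a time.
positions = []
for position, character in enumerate(text):
    positions.append((position, character))

print(length, first, last, has_t)
print(positions)
```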

Common string operations - https://docs.python.org/3.3/library/string.html

In [None]:
# Create a string object, a variable (s), and a reference from s to
#    the string object.
s = "The dog is hungry."
s, type(s)

In [None]:
# Using the default print function
print(s)

In [None]:
# Individual elements of the string
s[0], s[1], s[2], s[12]

In [None]:
# string length ... Why doesn't s[len(s)] == '.'?
len(s)
#s[len(s)]

In [None]:
# Strings are immutable
s[0] = 'x'
# But what if I did s = 'x' rather than s[0] = 'x'?
# s = 'x'

In [None]:
# slices - s[i:j] - give me everything from i up to (but not including) j
s[4:9]

In [None]:
a = 6
b = len(s) - 1
s[a:b]

In [None]:
# If the initial number is blank, start at the front,
#   if the second is blank, go to the end.
print(s[:3])
print(s[3:])

In [None]:
# concatenation
print(s + " So give her some food.")
# note that this does not change s
print(s)

In [None]:
# but what if I assigned the new object to s
s = s + " So give her some food."
print(s)
# Why?

In [None]:
# find substring
# Back to the original
s = "The dog is hungry."
s1 = s + " So give her some food."
s1.find('gry')

In [None]:
s1[s1.find('gry'):]

In [None]:
# replace
print(s1.replace('dog', 'cat'))
# note that this creates a new object -- it doesn't change s1 
print(s1)
# But if I did s1 = s1.replace('dog', 'cat'), what would happen?  Why?

In [None]:
# split - splits a string into substrings
s2 = "eight, nine, 12, seventy, four"
s2.split(',')

In [None]:
# chaining functions/methods
(s + " So give her some food.").split()

In [None]:
# Tokenize the string -- separate the string into "tokens" - in this case, a token
# is a word.
# the easiest way to understand this is to go inside-out and left-to-right
(s + " So give her some food.").replace('.','').split()
# Pythonic code makes significant use of chaining -- this can be confusing to beginners.

**Dealing with strings is a very important part of Python programming - especially in data science-related applications.**

Be sure to have a look at Common string operations - https://docs.python.org/3.3/library/string.html and experiment/practice.
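
As one possible starting point for that practice, the sketch below cleans up a messy comma-separated string using only built-in string methods (`split`, `strip`, `lower`, `join`, `isdigit`). The input string is made up for illustration.

```python
raw = "  Eight, NINE , 12,seventy ,  four "   # made-up messy input

# split() breaks the string at each comma; strip() removes surrounding
# whitespace from each piece; lower() normalizes the case.
tokens = []
for piece in raw.split(","):
    tokens.append(piece.strip().lower())

# join() is the inverse of split(): it glues a list of strings together.
cleaned = ", ".join(tokens)

# isdigit() is True only for strings made up entirely of digits.
numeric_tokens = []
for token in tokens:
    if token.isdigit():
        numeric_tokens.append(token)

print(tokens)
print(cleaned)
print(numeric_tokens)
```

Each method returns a new string (or list) and leaves `raw` unchanged, which is consistent with strings being immutable.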

### Formatting strings: the format() method and f-strings

https://docs.python.org/3.1/library/string.html#format-specification-mini-language

In [None]:
a = 'Jim'
b = 'Carl'
c = 'Nancy'
"{:}, {:}, and {:} are going on a trip".format(a, b, c)

In [None]:
# Can use modifiers to format the variable display
salary = 122000
"{:}'s salary is ${:} per year.".format(c, salary)

In [None]:
# Fancier number (currency) formatting
"{:}'s salary is ${:,.2f} per year.".format(c, salary)

In [None]:
# use a string format to control the output format.  Note
# that this creates a string object
math.pi, "{:.4f}".format(math.pi)

In [None]:
# Can also use the round() function to achieve a similar
# result, but this function returns a numeric rather than string object
round(math.pi, 4)

In [None]:
# Notice that format() creates a string whereas round "keeps" the float.
type("{:.4f}".format(math.pi)), type(round(math.pi,4))

Note that the preceding are just *examples* -- see the docs for all the details (https://docs.python.org/3.1/library/string.html#format-specification-mini-language).  In most of your **homework assignments** we will ask you to use "user-friendly" formats for model outputs.
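
A few more format specifications that tend to be useful for "user-friendly" output are sketched below: field width, alignment, percentages, and zero padding. The values and variable names are hypothetical.

```python
item = "Widget"        # hypothetical values for illustration
price = 1234.5
share = 0.125

left = "{:<10}|".format(item)        # left-align in a field 10 wide
right = "{:>10}|".format(item)       # right-align in a field 10 wide
money = "${:>12,.2f}".format(price)  # width 12, thousands separator, 2 decimals
percent = "{:.1%}".format(share)     # multiply by 100 and append a % sign
padded = "{:05d}".format(42)         # zero-pad an integer to 5 digits

print(left)
print(right)
print(money)
print(percent)
print(padded)
```

Fixed field widths are what make columns of numbers line up when several rows are printed one after another.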

#### Formatted String Literals -- f-strings

Introduced in Python 3.6 - https://peps.python.org/pep-0498/.  The behavior is similar to the format() method, and f-strings can be easier to read and write.
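
One thing f-strings allow beyond plain variable substitution is putting an expression or a method call directly inside the braces. The sketch below illustrates this with sample values chosen for the example.

```python
name = "Nancy"          # sample values, chosen for illustration
hours = 40
rate = 41.25

# Any expression can go inside the braces, followed by an optional format spec.
pay_line = f"{name} earned ${hours * rate:,.2f} this week."

# Method calls and alignment work too: ^11 centers the text in a field 11 wide.
label = f"[{name.upper():^11}]"

print(pay_line)
print(label)
```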

In [None]:
# f-strings - Formatted string literals
print(f"{c}'s salary is ${salary:,.2f} per year.")

# Compare this version to the format() version
print("{:}'s salary is ${:,.2f} per year.".format(c, salary))
#
# You decide which is easier for you, but it is often important to have
# user-friendly formatting :-).

In [None]:
# Import math here in case it has not been imported earlier in the session
import math
# Avoid naming a variable "str" -- that would hide Python's built-in str type.
pi_text = f"Long value of pi: {math.pi}; and a shorter value of pi: {math.pi:.3f}"
pi_text