## Chapter 8: Python Programming

Python is an open-source programming language widely used for Web applications, data mining, and machine learning. Depending on the category (most in demand, most popular, best to know, and so on), Python is one of the top programming languages. Python is an interpreted language: the Python interpreter processes Python code one statement at a time, whereas the compiler for a compiled language (such as Java or C#) converts the program’s source code into an executable file before the program runs. An advantage of Python’s interpreted environment is that you can execute Python statements one at a time, interactively, using the interpreter.

Some of the scripts presented in this notebook use several Python libraries which have been pre-installed for you. If you had been required to install these libraries on your own, you would issue the following commands:

```python
! pip install --user pandas
! pip install --user matplotlib
! pip install --user numpy
```

# Python Variables are Dynamic

Unlike statically typed programming languages, such as Java and C#, in which you must declare a variable, specifying its data type and name, before you use it, Python variables are dynamic: you create a variable simply by assigning a value to its name, and the same name can later hold a value of a different type. The following statements illustrate the use of variables within Python. As you can see, the statements use a variable named value in three different ways:

In [None]:
#####################################
# Chapter 8 (Python) / Deliverable 1
#####################################

value = 1001
print(value)
value = "Hello, world!"
print(value)
value = 3.14
print("Area equals: ", 5*5*value)
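
To see what "dynamic" means in practice, the built-in type function reports the type of the value a name currently holds. The following is a minimal sketch that reuses the variable value from the cell above; the names type_names and error_raised are introduced here only for illustration. The last few lines suggest that Python still checks types when an operation runs, even though you never declare them.

```python
# Record the type of `value` after each assignment.
type_names = []   # illustrative helper list, not part of the original script

value = 1001
type_names.append(type(value).__name__)

value = "Hello, world!"
type_names.append(type(value).__name__)

value = 3.14
type_names.append(type(value).__name__)

print(type_names)

# Mixing incompatible types in one operation raises an error at run time.
error_raised = False
try:
    "Hello" + 1
except TypeError as error:
    error_raised = True
    print("TypeError:", error)
```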

# Python Lists

An array is a data structure that contains multiple values of the same type within a single variable. Arrays can be one dimensional or multidimensional. To access the values in an array, you use an index value which you specify within left and right brackets []. Like most programming languages, the first element in a Python array is at the index location zero (array[0]), which developers call zero-based indexing. 

Python does not have a built-in array data structure, per se, but instead uses lists. Unlike a traditional array, a list can hold values of different types. The following statement, for example, creates a list of values:

numbers = [1, 5, 10]

In this case, to display the values, you would use the index values 0, 1, and 2:

In [None]:
numbers = [1, 5, 10]
print(numbers[0])
print(numbers[1])
print(numbers[2])

In a similar way, the following statement creates a list of days:

In [None]:
days = ["Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"]
print(days[0])
print(days[3])

A Python list can have multiple dimensions. The following statement creates a two-dimensional list of values:

values = [[10, 20, 30], [40, 50, 60], [70, 80, 90]]

You can visualize this list as a table.

The following script, 2DList.py, creates a two-dimensional list and then uses a for loop (discussed later in this notebook) to display the values:

In [None]:
values = [[10, 20, 30], [40, 50, 60], [70, 80, 90]]

# print lists
for index in range(0, 3):
  print(values[index])

# print individual values
for row in range(0, 3):
  for column in range(0, 3):
    print(values[row][column])

As you can see, the first for loop displays the row values and the second for loop displays the individual values.

When you use for and while loops to manipulate lists, you will often want to know the length of the list. To do so, you can use the Python len function:

count = len(someList)

To determine the number of rows and columns in a two-dimensional list, you would use:

rows = len(someList)
columns = len(someList[0])
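
As a rough sketch of how len fits into a loop, the following code applies it to the values list from the earlier script, so the loops no longer hard-code the range 0 to 3. It assumes every row has the same number of columns; the variable total is added here only for illustration.

```python
values = [[10, 20, 30], [40, 50, 60], [70, 80, 90]]

rows = len(values)         # number of inner lists
columns = len(values[0])   # number of items in the first inner list

# Visit every element using the computed dimensions.
total = 0
for row in range(rows):
    for column in range(columns):
        total += values[row][column]

print(rows, columns, total)
```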

# Python Groups Statements Based Upon Statement Indentation

If you are familiar with programming languages such as Java, JavaScript, or C#, you know that such languages group related statements within left and right braces {}. Python does not use braces, but rather, Python uses indentation to group related statements. The following if-else statement illustrates Python’s use of indentation to group related statements:

In [None]:
#####################################
# Chapter 8 (Python) / Deliverable 2
#####################################

age = 21

if age >= 21:
    print("Going to Vegas!")
    print("Feeling lucky!")
else:
    print("One of these days ...")

If you examine the statement indentation, you will find the if statement has two related statements and the else statement one. You will examine the if-else statements later in this section. For now, however, beyond the statement indentation, note that the Python syntax requires a colon after the if and else.

# Conditional Processing Using if-elif

To provide your scripts with the ability to make decisions, Python provides the *if* construct, the format of which is:

	if condition:
	    statement
	[elif condition:
	    statement]
	[else:
	    statement]
        
The brackets in the if-construct format indicate that the *elif* and *else* are optional. Note the colon that follows the conditions and the else—you must include the colon. Remember, Python, unlike many other programming languages, does not use braces to group related statements. Instead, Python relies on indentation to indicate the statements that go with the *if*, *elif*, and *else*.
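
As a small sketch of the format above, the following code places a number into one of three categories. The names temperature and label, and the cutoff values, are hypothetical. Python tests the conditions from top to bottom and runs only the first branch whose condition is true.

```python
temperature = 72   # hypothetical sample value

if temperature >= 85:
    label = "hot"
elif temperature >= 60:
    label = "mild"
else:
    label = "cold"

print(label)
```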

The following Python script, Conditional.py, illustrates the use of these statements:

In [None]:
#####################################
# Chapter 8 (Python) / Deliverable 3
#####################################

pet = "dog"

if pet == 'dog':
  print("Remember to buy dog biscuits")
elif pet == 'cat':
  print("Buy some catnip")
else:
  print("Buy something")

# Python Logical Operators

When you specify a condition within a Python *if* or *while* statement, there will be times when you must test two or more conditions. To allow you to do so, Python provides the *and*, *or*, and *not* logical operators. The *and* operator examines two conditions and evaluates to true only if both conditions are true. In contrast, the *or* operator evaluates to true if either condition is true and to false only if both conditions are false. The *not* operator reverses a single condition, so that true becomes false and false becomes true.

The following Python script, Logical.py, illustrates the use of the Python logical operators:

In [None]:
#####################################
# Chapter 8 (Python) / Deliverable 4
#####################################

day = 'Sunday'
season = 'fall'

if season == 'fall' and day == 'Sunday':
   print('Football season is here!')
else: 
   print('Football season is coming!')

language = 'C#'

if language == 'Python' or language == 'R':
  print('Learn some machine learning!')
else:
  print('Learn Python and R')

As you can see, the logical *and* and *or* operators allow you to combine two conditions. Unlike languages such as Java, C#, and JavaScript, Python does not require you to enclose the condition within parentheses.
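
The script above does not use the *not* operator. The following sketch, which uses hypothetical variable names such as is_weekend, shows one way the three operators might be combined and stored in variables:

```python
day = 'Saturday'
season = 'fall'

# or: true when at least one comparison is true
is_weekend = day == 'Saturday' or day == 'Sunday'

# and: true only when both parts are true
is_football_day = season == 'fall' and is_weekend

# not: reverses a true/false value
is_workday = not is_weekend

print(is_weekend, is_football_day, is_workday)
```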

# Iterative Processing

To let your programs repeat a set of statements, Python provides the *for* and *while* loops. The format of the Python *for* loop is different from that used by Java, C++, C#, and JavaScript. The following Python script, OneToTen.py, uses a *for* loop to display the numbers 1 through 10:

In [None]:
#####################################
# Chapter 8 (Python) / Deliverable 5
#####################################

for i in range(1, 11):
  print(i)

The loop uses the variable i as its control variable. When the loop starts, Python initializes i with the value 1. Within the loop, the code prints the value of i and repeats the loop. Note that the range specified is range(1, 11). The range function includes its start value but excludes its stop value, so range(1, 11) produces the numbers 1 through 10. If the loop were to use range(1, 10), the loop would end after i has the value 9, and the 10 would not be displayed.
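
One quick way to check which numbers a range produces is to convert it to a list. The following sketch compares the two ranges discussed above; the variable names are chosen here for illustration.

```python
one_to_ten = list(range(1, 11))    # the stop value 11 is not included
one_to_nine = list(range(1, 10))   # the stop value 10 is not included

print(one_to_ten)
print(one_to_nine)
```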

The following Python script, ForList.py, uses a *for* loop to display the elements in a list:

In [None]:
week = ['Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday']

for day in week:
    print(day)

In this case, the *for* loop uses the control variable day. Again, when the loop begins, Python will assign the variable day the first item in the list. With each iteration, the loop assigns the variable day the next list value.

Finally, the following Python script, ForDataSet.py, uses a *for* loop to display the rows and columns of a dataset. The script opens the Seattle.csv dataset which contains data about the Seattle housing market. Within the script, the *for* loop uses two control variables. The first is assigned a row index and the second the actual row values. The loop then prints the value of two specific columns:

In [None]:
#####################################
# Chapter 8 (Python) / Deliverable 6
#####################################

import pandas as pd

data = pd.read_csv('Seattle.csv')

for index, row in data.iterrows():
        print(row['price'], row['sqft_living'])
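
If the Seattle.csv file is not at hand, the same loop can be tried on a small dataframe built in memory. In the following sketch, the column names price and sqft_living come from the script above, but the three rows of values are made up for illustration, as is the price_per_sqft list.

```python
import pandas as pd

# Made-up sample rows; only the column names come from the script above.
data = pd.DataFrame({
    'price': [221900, 538000, 180000],
    'sqft_living': [1180, 2570, 770],
})

# iterrows yields an (index, row) pair for each row of the dataframe.
price_per_sqft = []
for index, row in data.iterrows():
    price_per_sqft.append(round(row['price'] / row['sqft_living'], 2))

print(price_per_sqft)
```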

# Looping with While Loops

In addition to the *for* loop, Python also provides a *while* loop, the format of which is:

```python
while condition:
    statement
```
The following Python script, WhileDemo.py, uses a *while* loop to plot data within two lists:

In [None]:
#####################################
# Chapter 8 (Python) / Deliverable 7
#####################################

import matplotlib.pyplot as plt

x = [35,34,32,37,33,33,31,27,35,34,62,54,57,47,50,57,59,52,61,47,50,48,39,40, 45,47,39,44,50,48]

y = [79,54,52,77,59,74,73,57,69,75,51,32,40,47,53,36,35,58,59,50,23,22,13,14, 22,7,29,25,9,8]

index = 0
while index < len(x):  
   plt.scatter(x[index], y[index], marker='x', color='red')
   index += 1

plt.show()
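
The looping pattern in the script above (start an index at 0, test it against len, and add 1 on each pass) can be seen without any plotting. The following sketch uses the first five values of each list and collects them in a hypothetical list named pairs:

```python
x = [35, 34, 32, 37, 33]
y = [79, 54, 52, 77, 59]

pairs = []
index = 0
while index < len(x):
    pairs.append((x[index], y[index]))
    index += 1   # without this line the condition would never become false

print(pairs)
```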

# Continue and Break Statements

Like most programming languages, Python supports the *continue* and *break* statements. When Python encounters a continue statement within a *for* or *while* loop, Python will immediately branch to the next iteration of the loop. The following Python script, ShowOdd.py, uses the continue statement to display only odd values:

In [None]:
#####################################
# Chapter 8 (Python) / Deliverable 8
#####################################

for i in range(0, 10):
  if i % 2 == 0:
    continue
  print(i)

The script uses the *modulo* (%) operator to get the remainder after dividing the variable by 2. For even values, the remainder is 0, so the *if* condition is true and the code continues with the next iteration of the loop. For odd values, the remainder is 1, so the *if* condition is false and execution moves on to the following line, which prints the current value.

When Python encounters a *break* statement within a *for* or *while* loop, Python will immediately end the loop’s processing, continuing the script’s execution at the statement that follows the loop. The following Python script, StopAt5.py, illustrates the use of the break statement to end the loop’s processing when the value 5 is encountered:

In [None]:
for i in range(0, 10):
  if i == 5:
    break
  print(i)

Although *continue* and *break* are common to most programming languages, in most instances, you can refactor code that uses them to create a more structured solution. For example, the following code displays the odd values without the need of the *continue* statement:

In [None]:
for i in range(0, 10):
  if i % 2:
     print(i)

When Python evaluates a number as a condition, it treats 0 as false and any nonzero value as true. Therefore, this *if* statement can be read as "if the remainder of this division is 1, the condition is true; if 0, the condition is false." This is why an == operator is unnecessary here.
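
The built-in bool function shows how Python interprets a number when it appears where a condition is expected. The following sketch collects the odd values in a list named odd_values (a name chosen here for illustration) and then prints the result of bool for a few sample numbers:

```python
odd_values = []
for i in range(0, 10):
    if i % 2:                 # a remainder of 0 counts as false, 1 as true
        odd_values.append(i)

print(odd_values)

# Zero is treated as false; any nonzero number is treated as true.
print(bool(0), bool(1), bool(7))
```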

# Python Supports Functions

Python supports procedural programming in that it lets you define and later call functions to perform specific tasks. To create a function within Python, you use the *def* keyword followed by the function name, and you use indentation to group the function statements. The following Python script, TwoFunctions.py, creates and calls two functions:

In [None]:
#####################################
# Chapter 8 (Python) / Deliverable 9
#####################################

def Hello(): 
    print("Hello, world!")

def AddValues(a, b):
    return a + b

# call the functions
Hello()
print("The sum of 1 + 2 is: ", AddValues(1, 2))

As you can see, the script first defines a function named Hello that contains one print statement, which displays the message "Hello, world!" Note the colon that follows the function declaration. The Hello function does not receive any parameters, which you indicate by following the function name with empty parentheses.

The script also defines the AddValues function, which receives two parameter values that Python will assign to the local variables a and b. The function uses the return statement to return the sum of the two parameter values.

Within the main program, the script first calls the Hello function by referring to the function name. When you call a function you must include the parentheses even if the function does not have any parameter values. To call the AddValues function, the script again refers to the function name, this time including values for each parameter within the parentheses.

Many Python functions make use of default parameter values, meaning that if you don’t specify a value for the parameter when you call the function, Python will use the default.

When you create your own functions using *def*, you can specify a parameter’s default values by using the assignment operator within the parameter list, as shown here:

```python
def someFunction(a = 100, b = 200):
    return a + b
```
As you can see, to specify the default values, you use the assignment operator to assign a value to one or more of the parameter variables. The following Python script, UseDefault.py, illustrates the use of default parameter values:

In [None]:
######################################
# Chapter 8 (Python) / Deliverable 10
######################################

def someFunction(a = 100, b = 200):
    return a + b

print(someFunction())
print(someFunction(500))
print(someFunction(0, 0))
print(someFunction(b=300))

This script calls the function four times, with different parameter values each time. The first uses both default values; the second specifies one parameter by order, such that a = 500; the third specifies two parameters by order, such that a = 0, b = 0; the fourth specifies one parameter by name, such that b = 300.

# Leveraging Python's Built-In Functions

Python provides a large set of built-in functions that you can call simply by referring to the function’s name and providing any needed parameters, such as print("Hello").

The following Python script, BuiltIns.py, illustrates the use of several Python built-in functions:

In [None]:
print('Hello, world')
print(1 + 2 * 5)
print('Power of 5 raised to 2 is', pow(5, 2))
print('Absolute value of -3 is', abs(-3))
print('Sorted list', sorted([3,2,1,5,4]))

# Python is Object Oriented

Python is an object-oriented programming language that lets you create objects from classes that provide attribute values and methods. Objects are specific instances of a class (a "House" may be a class, "yourhouse" may be an object of that class), and methods are essentially functions, except ones that are attached to, and called on, an object. To access an object’s attribute values or to call an object’s methods, you use dot notation:

	objectName.SomeAttribute = 7
	student.Name = "Bill Smith"
	student.ShowGrades()

The following Python script, StudentObject.py, defines a student class and then uses it to create student objects:

In [None]:
######################################
# Chapter 8 (Python) / Deliverable 11
######################################

class Student:
     def __init__(self, name, age, gpa):
         self.name = name
         self.age = age
         self.gpa = gpa

     def show(self):
        print('Name:', self.name)
        print('Age:', self.age)
        print('GPA:', self.gpa)

Jim = Student('Jim Smith', 25, 3.6)
Mary = Student('Mary Davis', 22, 4.0)

Jim.show()
Mary.show()

The script uses the *class* statement to define the Student class. Within the class, the script uses *def* to define the \__init\__ method, Python’s constructor method, which Python automatically calls each time you create an object (init stands for initializer). In this case, the method assigns three values to the object attributes (name, age, and gpa). Within the method, *self* refers to the current object. The script also defines a second method, show, which displays the student’s attribute values.

To create an object, the script calls the class by name, Student, as it would a function, passing the values to assign to the student attributes. As you can see, to refer to object attributes and methods, the script uses the *dot* operator.
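
Methods can also return values and accept parameters of their own. The following sketch repeats the Student class and adds a hypothetical method, has_honors, which is not part of StudentObject.py; the 3.5 cutoff is an assumption made only for this example.

```python
class Student:
    def __init__(self, name, age, gpa):
        self.name = name
        self.age = age
        self.gpa = gpa

    # Hypothetical method added for illustration; the cutoff is an assumption.
    def has_honors(self, cutoff=3.5):
        return self.gpa >= cutoff

jim = Student('Jim Smith', 25, 3.6)
jim.gpa = 3.4    # attributes can be changed using dot notation

print(jim.name, jim.has_honors())
print(jim.name, jim.has_honors(3.0))
```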

# Understanding Python Modules

Python programmers store Python scripts within text files with the .py extension, such as Hello.py. Such a file can be a complete program, or it can be a module that contains function and class definitions. To use a Python module within your script, you use the import statement:

import moduleName

The following Python script, Random100.py, imports the random module and then uses the module's randint function to display 100 random values in the range 1 to 25 (both endpoints included):

In [None]:
import random 

for i in range(0, 100):
  print(random.randint(1, 25))

As you can see, the program refers to functions defined within the module using dot notation: random.randint. 

Depending on the name of the Python module you are importing, there may be times when you want to create an abbreviated alias name to refer to the module. To do so, you add the *as* keyword and the alias name to the end of your import statement, as shown here:

import random as rnd

The following Python script, Rnd100.py, illustrates the use of a module alias:

In [None]:
import random as rnd 

for i in range(0, 100):
  print(rnd.randint(1, 25))

As you have seen, when you import a Python module using an import statement, you can call the module’s functions using dot notation:

	moduleName.functionName()
		or
	aliasName.functionName()

To eliminate your need to use the dot notation, you can use a different form of the import statement:

    from moduleName import functionName

The following Python script, FromImport.py, illustrates the use of this import form:

In [None]:
from random import randint

for i in range(0, 100):
  print(randint(1, 25))
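
The three import forms differ only in the name you type; they all lead to the same function. The following sketch checks this and also calls random.seed, which is not used in the scripts above, so that repeated runs produce the same values. The names same_function and samples are chosen here for illustration.

```python
import random
import random as rnd
from random import randint

# All three names refer to the same function object.
same_function = random.randint is rnd.randint is randint

random.seed(42)   # fixed seed so repeated runs produce the same values

samples = []
for i in range(0, 100):
    samples.append(randint(1, 25))

print(same_function, len(samples), min(samples) >= 1, max(samples) <= 25)
```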

# Understanding Python Dataframe Objects

Considering Python's popularity and utility in the data science community, Python programmers will often make extensive use of dataframe objects to store various datasets. You can think of a dataframe as a two-dimensional container that stores a dataset’s rows and columns. The following Python script, UseDataframe.py, builds two dataframes by hand:

In [None]:
######################################
# Chapter 8 (Python) / Deliverable 12
######################################

from pandas import DataFrame

Data = {
        'x': [35,34,32,37,33,33,31,27,35,34,62,54,57,47,50,57,59,52,61,47,50,48,39,40,45],
        'y': [79,54,52,77,59,74,73,57,69,75,51,32,40,47,53,36,35,58,59,50,23,22,13,14,22]
       }

df = DataFrame(Data,columns=['x','y'])

Sales = {
        'Day': ['Mon', 'Tue', 'Wed', 'Thu', 'Fri'],
        'Count': [79,54,52,77,59]
       }

salesdf = DataFrame(Sales,columns=['Day','Count'])

print(df.head())
print(df.size)
print(df.shape)

print(salesdf.head(7))
print(salesdf.size)
print(salesdf.shape)

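As you can see, this script creates two dataframe objects. One dataframe contains only numeric data and the second contains text and numeric data. The script uses the size and shape attributes to display the number of elements in the dataframe and its dimensions, and the head method to display the first rows of data. By default, head returns the first five rows; head(7) requests up to seven rows, which for the five-row sales dataframe returns all five.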
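
The relationship between these values can be checked directly. The following sketch rebuilds the sales dataframe from the script above and compares size with the product of the two shape values; the variable names rows and columns are introduced here for illustration.

```python
from pandas import DataFrame

Sales = {
        'Day': ['Mon', 'Tue', 'Wed', 'Thu', 'Fri'],
        'Count': [79, 54, 52, 77, 59]
       }

salesdf = DataFrame(Sales, columns=['Day', 'Count'])

rows, columns = salesdf.shape            # shape is a (rows, columns) tuple
print(salesdf.shape)
print(salesdf.size == rows * columns)    # size counts every cell

print(len(salesdf.head(2)))   # the first two rows
print(len(salesdf.head(7)))   # asks for seven rows; the dataframe has five
```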

Most of the Python scripts this book presents will load a dataframe object from a CSV file. The following Python script, LoadDataframe.py, loads the Titanic dataset, which contains data for passengers who lived and died on the Titanic.

The script uses the pandas read_csv function to read data from the Titanic.csv file. The script then uses the dataframe *describe* method to display summary statistics for each numeric column:

In [None]:
######################################
# Chapter 8 (Python) / Deliverable 13
######################################

import pandas as pd
from pandas import DataFrame

df = pd.read_csv('Titanic.csv')
print(df.describe())

# Using the pandas Package for Numerical Processing

The following Python script, UsePandas.py, uses pandas to load the Iris.data.csv file into a dataframe object. The Iris dataset, a popular dataset frequently used in data-mining operations, contains information on three species of the Iris flower. The script then displays several facts about the dataset:

In [None]:
######################################
# Chapter 8 (Python) / Deliverable 14
######################################

import pandas as pd
from pandas import DataFrame

df = pd.read_csv('Iris.data.csv')
print(df.size)
print(df.shape)
print(df.describe())

# Using matplotlib to Create Visualizations

Another staple in the Python library catalogue is the matplotlib package, used extensively in the data-mining community for data visualization. The following script, UseMatPlotLib.py, uses the matplotlib library to create a stacked bar chart:

In [None]:
import numpy as np
import matplotlib.pyplot as plt

N = 5
salesMay = (20, 35, 30, 35, 27)
salesJune = (25, 32, 34, 20, 25)
ind = np.arange(N)
width = 0.35      

p1 = plt.bar(ind, salesMay, width)
p2 = plt.bar(ind, salesJune, width, bottom=salesMay)

plt.ylabel('Sales')
plt.title('Sales by Team Member')
plt.xticks(ind, ('Bob', 'Bill', 'Mary', 'Gina', 'Wally'))
plt.legend((p1[0], p2[0]), ('May', 'June'))

plt.show()
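
In the script above, bottom=salesMay tells plt.bar to start each June bar at the top of the corresponding May bar, which is what stacks the two series. As a rough check that needs no plotting, the following sketch uses numpy to compute where the top of each stack should land; the name stack_tops is chosen here for illustration.

```python
import numpy as np

salesMay = np.array([20, 35, 30, 35, 27])
salesJune = np.array([25, 32, 34, 20, 25])

# Adding two numpy arrays adds them element by element,
# giving the combined height of each stacked bar.
stack_tops = salesMay + salesJune

print(stack_tops)
```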

# Using the numpy Package for Numerical Calculations

The following Python script, UseNumpy.py, uses numpy to create two 10x1 arrays of random values. The script then uses the matplotlib library previously discussed to create a scatter chart of the values:

In [None]:
import numpy as np
import matplotlib.pyplot as plt

x = np.random.rand(10,1)
y = np.random.rand(10,1)

plt.title('Random Numpy Values')
plt.scatter(x, y)

plt.show()
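
The shape attribute of a numpy array reports its dimensions. The following sketch inspects the arrays created above and, as one possible approach, joins them into a single 10x2 array with np.hstack; the name points is an assumption made for this example.

```python
import numpy as np

x = np.random.rand(10, 1)   # 10 rows, 1 column, values in [0, 1)
y = np.random.rand(10, 1)

points = np.hstack((x, y))  # place the two columns side by side

print(x.shape, points.shape)
print(points.min() >= 0.0, points.max() < 1.0)
```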

# Using the sklearn Package for Data Mining and Machine Learning

Python data-mining and machine-learning scripts make extensive use of the scikit-learn library. The scikit-learn library contains functions your code will use to perform cluster analysis, data classification, linear regression for prediction, and more. 

The following Python script, UseSklearn.py, uses the library to cluster a dataframe’s values:

In [None]:
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from pandas import DataFrame

Data = {
        'x': [35,34,32,37,33,33,31,27,35,34,62,54,57,47,50,57,59,52,61,47,50,48,39,40,45,47,39,44,50,48],
        'y': [79,54,52,77,59,74,73,57,69,75,51,32,40,47,53,36,35,58,59,50,23,22,13,14,22,7,29,25,9,8]
       }

df = DataFrame(Data,columns=['x','y'])

kmeans = KMeans(n_clusters=3).fit(df)
centroids = kmeans.cluster_centers_
plt.title('Clustering with Sklearn')
plt.scatter(df['x'], df['y'], c=kmeans.labels_.astype(float))
plt.scatter(centroids[:, 0], centroids[:, 1], c='red')
plt.show()
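
To look at the cluster assignments without a chart, you can print the labels_ attribute that the script above passes to plt.scatter. The following sketch uses a small, made-up set of clearly separated points rather than the dataset above, and it sets the n_init and random_state parameters (not used in the original script) so that repeated runs give the same result.

```python
from sklearn.cluster import KMeans
from pandas import DataFrame

# Made-up, clearly separated points (not the dataset used above).
Data = {
        'x': [1, 2, 1, 20, 21, 22],
        'y': [1, 1, 2, 20, 22, 21]
       }

df = DataFrame(Data, columns=['x', 'y'])

# n_init and random_state are set so repeated runs give the same result.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(df)
labels = kmeans.labels_

print(labels)                    # one cluster number per row of df
print(kmeans.cluster_centers_)   # one (x, y) center per cluster
```

The cluster numbers themselves are arbitrary; what matters is which rows share the same number.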