# Module 2 Part 1: Introduction to Python

## Introduction

This module introduces the Python programming language. It covers basic Python syntax, the main data types, and the most commonly used data collections, with examples.

This module consists of 3 parts:

- **Part 1** - Introduction to Python.
- **Part 2** - Python Strings and Lists.
- **Part 3** - Python Tuples, Dictionaries, Reading data from a file, Formatting print output.

Each part is in a separate notebook. It is recommended to follow the order of the notebooks from Part 1 to Part 3.

## Learning outcomes

In this module, you will learn and practice:
- basic Python syntax;
- main data types in Python;
- variables and expressions;
- Python modules and built-in functions;
- how to design functions in Python;
- Python strings, lists, tuples and dictionaries;
- how to work with files in Python.

## Readings and Resources

The majority of the notebook content draws from the recommended readings. We invite you to further supplement this notebook with the following recommended texts.

Downey, A. (2015). *Think Python. How to Think Like a Computer Scientist*. O'Reilly Media.   
**NOTE:** This book is also available online as a free book from Green Tea Press and can be retrieved from [http://greenteapress.com/wp/think-python-2e/](http://greenteapress.com/wp/think-python-2e/).


McKinney, W. (2017). *Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython* (2nd edition). O'Reilly Media.


VanderPlas, J. (2016). *Python Data Science Handbook. Essential Tools for Working with Data*. O'Reilly Media.    
**NOTE:** This book is also available online on the author's GitHub page and can be retrieved from [https://jakevdp.github.io/PythonDataScienceHandbook/](https://jakevdp.github.io/PythonDataScienceHandbook/).

## Defining a Problem and Preparing the Data 

<figure>
    <img src="dataScienceProcess.png" alt="This image shows the stages of creating a model from start to finish." style="width: 100%;"/>    
    <figcaption><em>This image shows the stages of creating a model from start to finish. (Course Authors, 2018)</em></figcaption>
</figure>

Models are used in a variety of industries to understand business problems and predict potential outcomes in order to make data-driven recommendations and decisions. As we saw in the last module, the first step to building a predictive model is to define the problem we are trying to solve. This provides us with the direction required to select and prepare the right data for our model.

### Defining a Problem

Projects often fail in organizations when a problem has not been clearly defined and the stakeholders do not share the same understanding of it. As a data scientist, you will often help a line of business validate and solve a problem, so it is critical for both sides to agree on what the problem is and what its potential causes are.

When defining the problem, consider some of the following business questions: what is the outcome you're trying to understand (e.g., low revenue, high expenses, lack of customer response to a product, high production error rate)? What are the inputs that may lead to the outcome (e.g., customer profiles, store locations, raw materials)? Which parties are involved (e.g., front-line employees, customers)? What is the time frame of the problem (e.g., over the past year, since product inception)?

Having a clear problem statement will help you build hypotheses to inform your models. You can use the SMART method to build your problem statement. SMART stands for: Specific, Measurable, Action-oriented, Relevant, and Time-bound. Here is an example problem statement following the SMART method:

"Credit card sales in Ontario, our largest division, have decreased 35% since January, despite employing traditional sales tactics at the branch. We need to understand which customer-driven attributes are leading to the sales decline so we can adjust our product, marketing and sales strategy."

As you can see, the problem statement is Specific (about credit card sales in Ontario), Measurable (decrease of 35%), Action-oriented (understanding the drivers will help us re-evaluate our strategy), Relevant (this is our largest business), and Time-bound (this has been happening since January).

You can use analytics to determine the root causes leading to your problem. Once you have an understanding of the drivers which cause the problem, you can use predictive modeling techniques to assess how a change in strategy will influence an outcome.

When the problem has been defined, you can select which data points are relevant to your analysis. For example, if trying to understand a customer base, you may choose to look at information such as age, purchase behavior, product selections, city of residence, and method of purchase.

### Preparing the Data

The first step of data preparation is exploration. The purpose of this step is to understand what data is available from which sources and whether this data can help you solve your problem. In this course, we will use the programming language Python for data analysis and model development. Before working with real data, in the following sections you will learn the language structure and the basic syntax of common functions. This will help you apply the appropriate functions to different data types, for example numeric vs. alphanumeric data. In Part 3 of this module, you will also learn to read data from and write data to a file. Exploration is an important phase of model development, as it will help you identify potential data inefficiencies early in your journey.

## Why Python?

Python is one of the most widely used programming languages and is very popular among data scientists. It is generally considered a first choice for social media data hacking and is widely used in the analysis of sociological data. Its use in the financial industry has also been increasing rapidly since 2005.

Growth in Python usage has been driven largely by the creation and maturation of analytics libraries: NumPy for numerical arrays, pandas for working with DataFrames, and SciPy and scikit-learn for statistics and machine learning algorithms.

Python is well-suited as an interactive analysis environment. It can also enable the development of robust systems in a fraction of the time it would have taken in Java or C++. It supports a mixture of procedural, object-oriented, functional, and imperative styles. Python has a reputation of being relatively easy to learn.

## Data types in Python. Variables and assignments.

Python is an interpreted programming language. There are two ways to use the interpreter: __interactive__ and __script__ modes.     
In interactive mode, we can type a line of Python code and the interpreter processes it immediately and displays the result.

In [1]:
1 + 2

3

Alternatively, the code can be stored in a file, which is sometimes called a script. The content of the file can be executed with the interpreter. By convention, Python scripts have filenames that end with `.py`. One of the benefits of working with an interpreted language is that you can test bits of code in interactive mode before putting them in a script.

In the code, a variable can be created and assigned a value. In Python, a variable name can be arbitrarily long and can contain letters, digits, and underscores, but it cannot start with a digit. Both uppercase and lowercase letters can be used, and names are case-sensitive; it is often recommended to start with lowercase. The underscore can appear anywhere in a name, including at the start. Quite often, underscores are used between words in long names, as in `area_of_circle`. This style is sometimes referred to as __snake case__.

It is important to remember that Python reserves a set of keywords that cannot be used as variable names. Python 3.6 has 33 of them, listed below in their exact spelling (McKinney, 2017). Python 3.7 added `async` and `await`, bringing the total to 35.

    False      class      finally    is         return
    None       continue   for        lambda     try
    True       def        from       nonlocal   while
    and        del        global     not        with
    as         elif       if         or         yield
    assert     else       import     pass
    break      except     in         raise
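The naming rules and the keyword list above can be checked programmatically. The sketch below is only an illustration: the helper `is_valid_variable_name` and the candidate names are hypothetical, while `str.isidentifier()` and the standard-library `keyword` module are part of Python itself.

```python
import keyword

# Hypothetical candidate names, chosen only for illustration.
candidates = ["area_of_circle", "_private", "2nd_value", "class", "Radius"]


def is_valid_variable_name(name):
    """Return True if `name` can be used as a variable name."""
    # str.isidentifier() checks the character rules (no leading digit, etc.);
    # keyword.iskeyword() rejects the reserved keywords.
    return name.isidentifier() and not keyword.iskeyword(name)


for name in candidates:
    print(name, is_valid_variable_name(name))

# The full list of keywords for the running interpreter version:
print(keyword.kwlist)
```

Because `keyword.kwlist` reflects the interpreter that runs it, its length depends on the Python version in use.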
 

The main data types in Python are __integer__, __float__, __string__, and __boolean__. For example, 
- `1`    is an integer
- `1.2`  is a float 
- `"Hello, world!"` is a string
- `"1.2"` is also a string.

Any string of characters enclosed in either single or double quotation marks is a value of string type.

Python is a dynamically typed language. The type of a variable is set when a value is assigned to it, so there is no need to declare a variable's type. This makes assignment easy and concise, and it also allows us to change the type of a variable further down in the code. The built-in function `type()` returns the type of an object.
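As a minimal sketch of what dynamic typing means in practice, the same name can be bound to values of different types over time. The variable names here are hypothetical.

```python
# The same name can refer to values of different types over time.
value = 8
first_type = type(value)      # int

value = 8 / 2                 # true division always produces a float
second_type = type(value)     # float

value = "eight"
third_type = type(value)      # str

print(first_type, second_type, third_type)
```

Rebinding a name to a different type is legal, but it can make code harder to follow, so it is usually worth doing only deliberately.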

Below are some examples of assignments.

In [2]:
radius = 8
pi_number = 3.14

area_of_circle = pi_number * radius**2

print(area_of_circle)
print(type(area_of_circle))

200.96
<class 'float'>


In [3]:
my_string = "Hello, world!"

print(my_string)
print(type(my_string))

Hello, world!
<class 'str'>


In [4]:
a = True

print(a)
print(type(a))

True
<class 'bool'>


In Python, assignment may take more complex form. Below is an example of a __chained__ assignment:

In [5]:
a = b = c = 10

Generally, the chained assignment has the form:   

x0 = x1 = ... = xN = value.

Another form of assignment is `y0, y1 = value_1, value_2`. In this case, the variable `y0` is assigned `value_1` and the variable `y1` is assigned `value_2`.

In [6]:
x, y, z = 10, 'plus', 20

print(x)
print(y)
print(z)

10
plus
20
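One common use of this multiple-assignment form is swapping two values without a temporary variable. The following is a small sketch with hypothetical variable names; it works because the whole right-hand side is evaluated before any name on the left is rebound.

```python
# Swap two values without a temporary variable.
low, high = 20, 10
low, high = high, low          # the right-hand side is evaluated first

# Chained assignment binds every name to the same value.
a = b = c = 0

print(low, high)
print(a, b, c)
```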


### Expressions and statements

A __statement__ is a unit of code that the Python interpreter can execute. For example, an assignment is a statement. An **expression** is a combination of values, variables, and operators. A value all by itself is considered an expression. Technically, an expression is also a statement, but it is probably simpler to think of them as different things. The important difference is that an expression has a value; a statement does not.

Python supports the following operators on numbers: addition (+), subtraction (-), multiplication (\*), division (/). Python also supports integer division (//), remainder or modulo (%), and exponentiation (\*\*).

In [7]:
'''Addition:'''

4 + 7

11

In [8]:
'''Subtraction:'''

4 - 7

-3

In [9]:
'''Multiplication:'''

4 * 7

28

In [10]:
'''Division:'''

4 / 7

0.5714285714285714

In [11]:
'''Integer division:'''

4 // 7

0

In [12]:
'''Modulo - divides 2 numbers and returns the remainder'''

4 % 7

4

Below is an example of using integer division and modulo operators.

In [13]:
'''Example: The run time of a movie is 135 minutes.
How long does the movie run, in hours and minutes?
'''


minutes = 135
print("The movie runs for {} hour(s) and {} minutes".format(minutes // 60, minutes % 60))

The movie runs for 2 hour(s) and 15 minutes


In [14]:
'''Exponentiation:'''

4 ** 7

16384

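The order of operations is as you'd expect, from highest to lowest precedence:
- Parentheses `()`
- Exponentiation `**`
- Multiplication/Division
- Addition/Subtraction

Operators with the same precedence are evaluated from left to right, except exponentiation, which is evaluated from right to left: `x ** x ** x` means `x ** (x ** x)`. Let's demonstrate: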

In [15]:
x = 3

x ** x ** x

7625597484987

In [16]:
(x ** x) ** x

19683

In [17]:
x ** x * x

81

In [18]:
x ** x * x - x

78

Checking if a variable is of a certain type can be achieved by using the **`isinstance()`** Boolean built-in function.

In [19]:
x = 1
isinstance(x, int)

True
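`isinstance()` also accepts a tuple of types, which is handy for checking whether a value is "some kind of number". The sketch below uses hypothetical variable names and also shows one detail that can be surprising: `bool` is a subclass of `int`.

```python
x = 1
y = 2.5
flag = True

# A tuple of types checks against several types at once.
x_is_number = isinstance(x, (int, float))
y_is_number = isinstance(y, (int, float))

# bool is a subclass of int, so a Boolean value also passes an int check.
flag_is_int = isinstance(flag, int)

# A string holding digits is still a string, not a number.
text_is_number = isinstance("1.2", (int, float))

print(x_is_number, y_is_number, flag_is_int, text_is_number)
```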

### Type conversion

Python offers several built-in functions for type conversion of values: `str()`, `int()`, `float()`.

A string that contains a number can be converted to integer or float type by using the functions `int()` or `float()` respectively. The function `str(arg)` converts the passed argument into a string.

`int(arg)` converts a floating-point value to an integer by truncating its fractional part (it does not round), and can also be used to convert strings into integers.

In [20]:
int(15.9)

15

In [21]:
int('1')

1

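The function `str()` converts the passed argument to a string: for example, `str(8)` returns `'8'` and `str(15.8)` returns `'15.8'`.

`bool()` converts any non-zero numerical value or non-empty string to Boolean `True`, and 0 or an empty string to `False`. Note that a string containing only a space, `' '`, is not empty, so it converts to `True`.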

In [22]:
str(8)

'8'

In [23]:
str(15.8)

'15.8'

In [24]:
bool(' ')

True
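The conversions above have a few edge cases worth trying out. The following sketch is illustrative only: the helper `to_int_or_none` is hypothetical, and it uses the `try`/`except` construct, which is not covered in this part, purely to keep the example runnable when a conversion fails.

```python
print(float("3.5"))        # string -> float
print(int(float("3.5")))   # string -> float -> int (fractional part dropped)
print(int(-15.9))          # truncation is toward zero, so this gives -15


def to_int_or_none(text):
    """Hypothetical helper: return int(text), or None if text is not an integer string."""
    try:
        return int(text)
    except ValueError:
        # int() cannot parse strings such as "15.9" or "abc".
        return None


print(to_int_or_none("42"))
print(to_int_or_none("15.9"))

# Zero and the empty string convert to False; the string "0" is non-empty, so it is True.
print(bool(0), bool(""), bool(0.1), bool("0"))
```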

## Functions

We have already seen several built-in functions: `type()`, `print()`, `str()`, `int()`, `float()`.

A **function** is a named sequence of statements that performs a computation. Functions let us package a block of statements under one name and make a program smaller by eliminating repetitive code. To define a function, one needs to specify its name and the sequence of statements it runs. Some functions produce a result, which is called the __return value__. Others perform an action (like printing) but do not return a meaningful value; these are referred to as **void functions** (in Python, such a function returns `None`).

Once defined, a function can be called by its name (__function call__) any number of times throughout the program. Expressions `type(a)` and `print(type(a))` are examples of function calls we have seen before.

The definition of a new function must start with the keyword **`def`**, followed by the function's name and a list of parameter names in parentheses. This first line, the function header, ends with a colon (`:`). If the function returns a value, it does so with a __`return`__ statement, typically at the end of the body.

The body of the function contains the statements that will be executed every time the function is called. The extent of the body is specified by indenting the code. The standard indent is 4 spaces.

The general structure of the function looks like this:

     def function_name(param_1, param_2, ...):
         do something with parameters
         ...
         return final_result
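As a concrete instance of this general structure, here is a minimal sketch with one function that returns a value and one void function. The names `area_of_circle` and `print_area` are hypothetical, and the value 3.14 is the same rough approximation of pi used earlier in this notebook.

```python
def area_of_circle(radius):
    """Return the area of a circle, using a rough approximation of pi."""
    pi_number = 3.14
    area = pi_number * radius ** 2
    return area


def print_area(radius):
    """A void function: it prints a message and has no return statement."""
    print("The area is", area_of_circle(radius))


result = area_of_circle(8)   # the return value can be stored in a variable
nothing = print_area(8)      # a void function returns None
print(result, nothing)
```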

## Conditionals

### Conditional execution

Conditional execution allows the execution of code based on certain conditions. The most common conditional statements are: `if`, `else`. The `if` statement will execute associated code if its conditions are true at the time of evaluation, otherwise the `else` code will be executed.

Example:

In [25]:
score = 92

if (score > 50):
    print("You passed!")
else:
    print("You failed.")

You passed!


The expression after `if` should be a boolean expression, i.e. that evaluates to `True` or `False`. Here is a list of the comparison operators that are useful for constructing boolean expressions:  

     x == y   x is equal to y
     x != y   x is not equal to y (5 != 6 is True)
     x > y    x is greater than y
     x < y    x is less than y
     x >= y   x is greater than or equal to y
     x <= y   x is less than or equal to y
     x is y   x is the same object as y

__NOTE:__ The `is` operator may seem the same as the equality (==) operator. In fact, operator `is` checks if two variables point to the same object, whereas the `==` checks if the values for the two variables are the same. 
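The difference between `is` and `==` is easiest to see with two separate lists that hold the same values. This is a small sketch with hypothetical variable names.

```python
first = [1, 2, 3]
second = [1, 2, 3]   # a separate list with the same contents
alias = first        # a second name for the very same list

print(first == second)   # True: the values are equal
print(first is second)   # False: they are two distinct objects
print(first is alias)    # True: both names point to one object

# `is` is the idiomatic way to compare against None.
result = None
print(result is None)
```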

Often, more than one comparison is required. Comparison expressions can be combined with the logical operators __and__, __or__, and __not__. Their meanings are similar to their meanings in English.

Examples: 
- `grade > 60 and grade < 80` returns `True` if the value of `grade` is strictly between 60 and 80;
- `not (grade > 60)` negates the expression in parentheses and returns `True` if the grade is __less than or equal to__ 60.

Operator `and` returns `True` only if both boolean expressions are True. Otherwise, `False` will be returned.    

(True and True) -> True  
(True and False) -> False  
(False and True) -> False    
(False and False) -> False.

Operator `or` returns `True` if at least one of the boolean expressions is `True`:     
(True or True) -> True    
(True or False) -> True    
(False or True) -> True  
(False or False) -> False.

In [26]:
'''Operator `or` returns True if either x or y is True'''

x = True
y = False
x or y

True
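To see how the logical operators behave at the boundaries of a range, the sketch below evaluates a few Boolean expressions for several grades. The helper `describe` and the chosen grades are hypothetical.

```python
def describe(grade):
    """Hypothetical helper that evaluates a few Boolean expressions for one grade."""
    in_range = grade > 60 and grade < 80
    not_above_60 = not (grade > 60)              # same as grade <= 60
    outside_range = grade <= 60 or grade >= 80   # the negation of in_range
    return in_range, not_above_60, outside_range


print(60, describe(60))
print(70, describe(70))
print(80, describe(80))
```

Note that a grade of exactly 60 is not "in range" here, because `>` is a strict comparison.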

### Chained conditionals

**elif** (else if) statements may be used to create additional conditions which will be evaluated in order. There is no limit to the number of `elif` statements.

In [27]:
a = 10
if (a > 10):
    print ("a is greater than 10")
elif (a < 10):
    print ("a is less than 10")
else:
    print ("a is equal to 10")

a is equal to 10
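Because the conditions in an `if`/`elif` chain are evaluated in order and only the first true branch runs, later branches can rely on earlier ones having failed. The following sketch maps a score to a letter; the function name and the cutoffs are hypothetical.

```python
def letter_grade(score):
    """Map a numeric score to a letter using hypothetical cutoffs."""
    if score >= 80:
        return "A"
    elif score >= 70:      # reached only if score < 80
        return "B"
    elif score >= 60:      # reached only if score < 70
        return "C"
    else:
        return "F"


print(letter_grade(92), letter_grade(75), letter_grade(60), letter_grade(10))
```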


### Nested conditionals

Within a block of code associated with an `if...else` statement, you can add additional `if...else` statements, creating more complex code execution logic.

Example:

In [28]:
age = 31
if (age >= 30):
    if (age < 40):
        print("He is in his 30s")
    else:
        print ("He is not in his 30s")
else:
    print("He is not in his 30s")

He is in his 30s


## Iteration

Variables may be assigned values multiple times. The value of the variable will be updated each time a new value is assigned.

In [29]:
x = 5
print(x)
x = 10
print(x)

5
10


### `while` statement

The **while** statement allows for efficient repetition of code. It first evaluates whether the given condition is `True`:
- If true: executes associated code and returns to re-evaluate condition
- If false: ends the loop and continues to the next statement in the program.

In [30]:
def counter():
    n=0
    while (n < 5):
        print("n is equal to " + str(n)) 
        n += 1

__NOTE:__ The `+=` operator was used in the last example. This operator provides a short way to update a variable.
The augmented assignment statement `n += 1` is equivalent to `n = n + 1`; it combines a binary operation and an assignment in a single statement. Other augmented assignment operators are `-=`, `*=`, `/=`, `//=`, `%=`, `**=`.

In [31]:
counter

<function __main__.counter()>

In [32]:
counter()

n is equal to 0
n is equal to 1
n is equal to 2
n is equal to 3
n is equal to 4
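The augmented assignment operators mentioned in the note can be tried one after another on a single variable. This is a minimal sketch; the starting value is arbitrary.

```python
n = 10
n += 5    # 15
n -= 3    # 12
n *= 2    # 24
n //= 5   # 4   (integer division)
n **= 3   # 64
n %= 10   # 4   (remainder)
n /= 8    # 0.5 (true division always gives a float)
print(n)
```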


### Infinite loops

Infinite loops occur when a loop is written so that the condition can never evaluate to false and thus continues endlessly.

Below is an example of such a loop. Because the value of `n` is never incremented inside the loop, the condition `n < 1` always evaluates to `True`. If a program goes into an infinite loop, it will appear to get stuck without progressing.

In [33]:
def counter():
    n=0
    while (n < 1):
        print("n is equal to " + str(n))

The __`break`__ statement allows for a loop to be exited prematurely.
This statement can be useful when the programmer doesn't know the number of iterations the loop must make.

In [34]:
def counter(n):
    while (n < 1000):
        if ((n * 11) % 7 == 0):
            print (n)
            break
        else:
            n += 1

counter(15)

21


While the __break__ statement exits a loop entirely, the __continue__ statement stops only the current iteration and continues with the next one. To illustrate the difference, imagine a student reading a book on the Python programming language chapter by chapter. Each chapter is an iteration. If, while reading chapter N, the student realizes that she is already familiar with its content, she jumps to the next chapter; this corresponds to executing the `continue` statement. If something forces her to stop reading the book altogether (e.g., she discovers that the book covers Python 2, not Python 3), this corresponds to executing the `break` statement.
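The book-reading analogy can be turned into a short sketch. The function `read_book` and its parameters are hypothetical; the point is only to contrast the two statements inside one `while` loop.

```python
def read_book(total_chapters, familiar_chapter, stop_chapter):
    """Hypothetical reader: skips one familiar chapter, abandons the book at stop_chapter.

    Returns the number of chapters actually read.
    """
    chapter = 0
    chapters_read = 0
    while chapter < total_chapters:
        chapter += 1                 # update first, so `continue` cannot loop forever
        if chapter == familiar_chapter:
            print("Skipping chapter", chapter)
            continue                 # go straight to the next iteration
        if chapter == stop_chapter:
            print("Stopping at chapter", chapter)
            break                    # leave the loop entirely
        print("Reading chapter", chapter)
        chapters_read += 1
    return chapters_read


print(read_book(6, 2, 5))
```

In this call, chapters 1, 3, and 4 are read, chapter 2 is skipped with `continue`, and the loop ends with `break` at chapter 5, so chapter 6 is never reached.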

Let's consider Newton's method for computing square roots as an example of an algorithm where the number of iterations is not known in advance. To find the square root of number `n` one can start with almost any estimate `x` and then repeatedly improve the estimate with the following formula `(x + n/x)/2`. The number of steps required to get the right answer is unknown, and a `break` statement can be used when the estimate stops changing.


In [35]:
'''The implementation below assumes that
the number used as an argument is a positive number, x > 0.''' 
 

def squareroot(x):
    est = x/2
    while True:
        print(est)
        y = (est + x/est)/2
        if y == est:
            break
        est = y
        
squareroot(256)

128.0
65.0
34.46923076923077
20.94807220229001
16.584383571973717
16.010295955761958
16.000003310579185
16.00000000000034
16.0
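The version above stops when two successive estimates are exactly equal. Comparing floats with `==` works for this input, but in principle the estimates could keep differing by a tiny amount. A slightly more defensive sketch, under the assumption that a relative tolerance and a step limit are acceptable, is shown below. The name `newton_sqrt` and the parameters `rel_tol` and `max_steps` are hypothetical; `math.isclose` is part of the standard library.

```python
import math


def newton_sqrt(n, rel_tol=1e-12, max_steps=100):
    """Approximate the square root of n > 0 with Newton's method."""
    est = n / 2                      # any positive starting estimate works
    steps = 0
    while steps < max_steps:
        new_est = (est + n / est) / 2
        # Stop when the estimate has effectively stopped changing.
        if math.isclose(new_est, est, rel_tol=rel_tol):
            return new_est
        est = new_est
        steps += 1
    return est                       # best estimate if the limit was reached


print(newton_sqrt(256))
print(newton_sqrt(2), math.sqrt(2))
```

Unlike the original, this variant returns the estimate instead of printing each step, so the result can be used in further calculations.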


A function can call another function. It can even call itself; a function that does so is referred to as a __recursive function__.  
As an example of a recursive function, let's consider the factorial function.

In [36]:
def factorial(n):
    if n == 0:
        return 1
    else:
        result = n * factorial(n-1)
        return result

* If the argument is 0, the function returns 1, since the factorial of 0 is 1 (0! = 1). If any other non-negative integer `n` is passed, the function returns `n * factorial(n - 1)`, that is, `n * (n-1)!`;
* To evaluate this, `factorial(n-1)` in turn calls `factorial(n-2)`, so the calculation becomes `n * (n-1) * factorial(n-2)`, and so on. The function keeps calling itself until the argument reaches 0.
* In the end we get `n * (n-1) * (n-2) * ... * 2 * 1`.
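The unfolding described in the bullets above can be checked on a small input. The sketch below repeats the recursive definition so that it is self-contained and compares it with `math.factorial` from the standard library.

```python
import math


def factorial(n):
    """Recursive factorial for a non-negative integer n."""
    if n == 0:
        return 1
    return n * factorial(n - 1)


# factorial(4) unfolds as 4 * factorial(3) = 4 * 3 * factorial(2)
# = 4 * 3 * 2 * factorial(1) = 4 * 3 * 2 * 1 * factorial(0) = 4 * 3 * 2 * 1 * 1
print(factorial(4))
print(factorial(10) == math.factorial(10))
```

This definition assumes a non-negative integer argument. For a negative argument the base case `n == 0` is never reached, and Python eventually stops the recursion with a `RecursionError`.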

### **EXERCISE 1:** Fibonacci sequence.

Write a function to compute the n-th element of the Fibonacci sequence recursively.  
If you need to familiarize yourself with the Fibonacci sequence, please refer to
https://en.wikipedia.org/wiki/Fibonacci_number (Fibonacci number, n.d.).
The function should take an integer `n` as an argument and return the n-th element of the Fibonacci sequence, counting from `n = 0`.
The first two numbers in the sequence are 0 and 1, and each subsequent number is the sum of the previous two.

In [None]:
'''Type your code here:'''

In [38]:
'''Exercise 1 Solution.

The first two numbers of the sequence are 0 and 1, so if n is 0 or 1
the function returns n itself.
For any other n, we take the sum of the two preceding numbers in the sequence.'''


def fibonacci(n):
    if n <= 1:
        return n
    else:
        return fibonacci(n-1) + fibonacci(n-2)
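To check the solution, it can help to print the first few elements and compare them with a non-recursive version. The sketch below repeats the recursive definition so that it is self-contained. `fibonacci_iterative` is a hypothetical alternative, not part of the exercise; it avoids the repeated work that makes the recursive version slow for large `n`.

```python
def fibonacci(n):
    """Recursive solution: the n-th Fibonacci number, counting from n = 0."""
    if n <= 1:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)


def fibonacci_iterative(n):
    """Hypothetical alternative that walks along the sequence with a loop."""
    previous, current = 0, 1
    count = 0
    while count < n:
        previous, current = current, previous + current
        count += 1
    return previous


n = 0
while n < 10:
    print(n, fibonacci(n), fibonacci_iterative(n))
    n += 1
```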

__End of Part 1.__

This notebook makes up one part of this module. Now that you have completed this part, please proceed to the next notebook in this module.

If you have any questions, please reach out to your peers using the discussion boards. If you and your peers are unable to come to a suitable conclusion, do not hesitate to reach out to your instructor on the designated discussion board.

## References

Hashemi, M. (2014). *10 Myths of Enterprise Python*. [https://www.paypal-engineering.com/2014/12/10/10-myths-of-enterprise-python/](https://www.paypal-engineering.com/2014/12/10/10-myths-of-enterprise-python/)

McKinney, W. (2017). *Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython* (2nd edition). O'Reilly Media.


Fibonacci number. (n.d.). In *Wikipedia*. https://en.wikipedia.org/wiki/Fibonacci_number