# SMU Master of Science (Economics) Programming Workshop in Python


## Introduction
This is an introductory course/workshop in Programming with Python, aimed at achieving the following objectives:
1. Give a brief introduction/overview of Python
2. Equip students with foundational programming knowledge for the MSc Programme in Economics

Throughout the course, we will learn programming using an application-based approach. For most of the course, students will be required to write code (given in the form of in-class assignments), which is probably the best way to learn programming. More often than not, we are less interested in whether you got the answer right or wrong than in the logic behind your code.

#### Class Breakdown
The specific breakdown for the class (which runs for 195 minutes) is given below (subject to change, of course):

1. Lesson Material - 50 minutes
2. Break - 15 minutes
3. Lesson Material - 50 minutes
4. Break - 15 minutes
5. Lesson Material - 50 minutes

Please feel free to ask any burning questions you may have during lesson time; if we happen to have some time left after the class, we can jump right into the next chapter. We can also include additional chapters!

#### Course Outline
The course will be split into the following modules:

1. Introduction, Syntax, Operations and Data Structures
2. Loops, Conditional Statements, Functions and Classes
3. NumPy and Pandas - Data Cleaning, Reading and Manipulation
4. Matplotlib and Seaborn - Data Plotting
5. Regular Expressions and Text Analysis
6. Advanced Topics in Programming - Machine Learning and Natural Language Processing

In today's class, we will be covering the first module on syntax, operations and data structures.

## Basic Syntax and Operations

In the early days of programming, users had to key in very specific code that the machine could interpret before it would take any action. For example, Fortran and C are 2 comparatively "low-level" languages; that is, they sit closer to what the computer actually executes (think of it as speaking more directly to the computer):

<img src="images/fortran_code.jpg">

However, in recent times, for the sake of simplicity and convenience, newer languages, such as R and Python, tend to be of a "higher level"; that is, they have a built-in interpreter that acts like a "translator": it takes what we key in as input and turns it into instructions the machine can execute.

One of the benefits of Python and R therefore stems from the fact that their code is easy to read and understand, but this comes at the expense of slightly slower computation. As long as we are not dealing with *Big Data*, however, this shouldn't be an issue.

In this chapter, we will start off with a quick introduction to syntax (grammar) and operations in Python. As a start, we will be using Python both as a way to store things that we are interested in, and as a calculator.

Through these examples, we will be using simple functions such as print, type, int and float.

<img src="images/python_code.jpeg">

In the following example, we assign the value (aka variable assignment) 200 * 17 to a variable, "x", and then we use the print function to print its value. Note that we can also assign strings (i.e. text, but we have to use quotation marks) to variables.

In [3]:
# Problem 1 (Note that to make comments in Python, we can include a #)
x = 200 * 17
print(x) 

y = "Hello World."
print(y)

3400
Hello World.


Note that in the above code block, there are 2 lines of output. The function, `print`, takes an argument, and prints it out. Below, we can use the `print` function along with the `type` function, to learn more about the type of the variables.

In [4]:
print(type(x))
print(type(y))

<class 'int'>
<class 'str'>


Not surprisingly, 3400 has the type `int`, while 'Hello World.' has the type `str`. There are many different data types in Python, but the ones we will mainly be dealing with are the following:

1. int - integers
2. float - floating point (decimals)
3. str - string variables
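
As a quick sketch of the three types listed above (the variable names `count`, `price` and `label` are illustrative and not part of the lesson), we can create one value of each type and inspect it with `type`:

```python
# One value of each of the three main types
count = 42        # int: a whole number
price = 3.14      # float: a number with a decimal part
label = "GDP"     # str: text inside quotation marks

print(type(count))   # <class 'int'>
print(type(price))   # <class 'float'>
print(type(label))   # <class 'str'>
```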

After seeing those examples, you may have the following questions in mind:
1. Can we add a string to an integer?
2. Can we add a string to a string?
3. Can we do multiplication with a string?

As it turns out, one of the best ways to understand more about programming is to **actively** code. Another way to learn more about programming is to learn to ask the right questions (to the internet). Below, we answer the 3 questions in sequential order.

In [5]:
# Question 1
x + y

TypeError: unsupported operand type(s) for +: 'int' and 'str'

As it turns out, we cannot do so. This is also our first example of an error; in this case, a TypeError. To learn more about the error, we can look at the text that comes after the error: essentially, the error says that we cannot add integers and strings together. There is a way to circumvent this (see the section on Type Conversion), but for now we move on to the second question: can we add a string to a string?

In [6]:
y + y

'Hello World.Hello World.'

It turns out that we can add a string to a string. However, note that since we did not assign the result to a variable, we cannot retrieve it later. This implies the following: whenever you're interested in a result, or think that you may need it again, save it! This reduces the complexity of your program, since you do not have to run the same operation over and over again.
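
A minimal sketch of saving a result for later use (the names `greeting` and `doubled` are made up for this illustration):

```python
greeting = "Hello World."
doubled = greeting + greeting   # store the result so it can be reused

print(doubled)        # Hello World.Hello World.
print(len(doubled))   # 24 characters; no need to redo the addition
```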

---
##### Naming Conventions and Comments
As an aside, it is pretty important to know how to name your variables. Suppose you're naming a variable which contains the mean of the GDP variable; one sensible name would be **gdp_mean**, instead of **x**. While this distinction seems pretty trivial right now, when you begin to write longer chunks of code, naming conventions make it easy for other readers to know what you're doing, and for you to follow what you're actually doing.

In addition, writing comments also helps your readers understand what each block of code (be it a loop, function or conditional statement) is doing. As you begin to write more and more code, and begin to call your previously defined functions, it becomes very difficult for your readers (and you) to follow what's going on. Writing comments can often help to reduce the reading burden of your readers.
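
To make the point concrete, here is a small sketch. The growth figures are invented purely for illustration, and the names `m` and `gdp_growth_mean` are hypothetical:

```python
# Hypothetical quarterly GDP growth figures (in percent), made up for illustration
gdp_growth = [2.1, 1.8, 2.5, 3.0]

# Hard to follow: what does m stand for?
m = sum(gdp_growth) / len(gdp_growth)

# Easier to follow: the name says what the value is
gdp_growth_mean = sum(gdp_growth) / len(gdp_growth)
print(gdp_growth_mean)
```

Both lines compute the same number; only the second tells the reader what it means.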

---

And back to the main course. Adding 2 strings together joins them end to end (this is called concatenation). Because of this, we can also multiply a string by an integer, which repeats the string that many times.

In [7]:
3 * y

'Hello World.Hello World.Hello World.'

In what follows, I will show basic mathematical operations for integers in Python, including:

1. Addition
2. Subtraction
3. Multiplication
4. Division
5. Exponentiation

In [8]:
3 + 6

9

In [9]:
4 - 7

-3

In [10]:
5 * 3

15

In [11]:
j = 5 / 3
print(type(j), j)

<class 'float'> 1.6666666666666667


This is our first instance of a floating point variable. Floating point numbers work the same way as integers; we can carry out multiplication, addition, division etc. on them. However, when we convert a floating point number to an integer using `int`, Python drops the decimal part (it truncates towards zero, which for positive numbers is the same as rounding down). We will learn more about type conversion in the next section.

In [12]:
int(j)

1
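
The difference between truncating and rounding down only shows up for negative numbers. A short sketch, using `math.floor` and `round` from standard Python for comparison (the variable names are illustrative):

```python
import math

positive = 5 / 3     # 1.666...
negative = -5 / 3    # -1.666...

print(int(positive), math.floor(positive))   # 1 1  (same for positive numbers)
print(int(negative), math.floor(negative))   # -1 -2 (int truncates, floor rounds down)
print(round(positive))                       # 2 (rounds to the nearest integer)
```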

In [13]:
2 ** 3

8

### Type Conversion

Previously, we mentioned that we can circumvent the problem of adding an integer to a string. In practice, one can convert the integer to a string, for example with the `str()` function. The reverse is more restrictive: the `int()` function works on a string only if the string represents a whole number (e.g. "3400"), not on arbitrary text. Below, we show some examples of what can be done, and what cannot be done.

In [14]:
k = str(x)
print(type(k))

<class 'str'>


In [15]:
z = int(k)
print(type(z))

<class 'int'>


In [16]:
int(y)

ValueError: invalid literal for int() with base 10: 'Hello World.'

Here, we have the second error: a ValueError. The text that follows the error says that Python was unable to convert the text, in this case "Hello World.", to an integer. No surprises here.
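
Putting the conversions to use, the following sketch returns to the earlier question of adding an integer to a string. It redefines `x` and `y` with the same values as before so that it runs on its own; `combined` and `total` are illustrative names:

```python
x = 200 * 17
y = "Hello World."

combined = str(x) + " " + y   # convert the integer first, then concatenate
print(combined)               # 3400 Hello World.

total = int("25") + x         # a string of digits can be converted to an integer
print(total)                  # 3425
```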

So far, we have seen quite a number of data types. How do we store these data types in a structure that is easily accessible, yet manipulable? 

It turns out that there are many different types of data structures that provide us with a schema for doing so. For example, we have ordered lists, tuples and more. Listed below are the data structures typically used:

1. List (ordered list)
2. Sets
3. Dictionaries (hash-tables; key-value pairs)
4. Tuples

In this subsection, we learn more about the first 3 types of data structures.

### Lists
In what follows, we give a brief summary of what lists are. Essentially, a list is a collection which is **ordered** and **changeable**, and allows for duplicate members (taken from [here](https://www.w3schools.com/python/python_lists.asp)). Lists can be constructed using square brackets, [ ], or the function, list().

That is, we can change the elements in the list - add new members to a list, remove members from a list and switch elements within the list.

In [14]:
x = []
type(x)

list

One interesting aspect of Python is that it uses zero-indexing i.e. the first element in the list is denoted by the index 0 instead of 1.

In [15]:
x = ["Alex", "Bob", "Charlie"] # assign the variable x to a list containing 3 names
print("The first element in the list is", x[0] + ".")

The first element in the list is Alex.
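
A short sketch of how the indices line up (the list is called `names` here to keep it separate from `x`; negative indices are an extra not covered above, and count from the end of the list):

```python
names = ["Alex", "Bob", "Charlie"]

print(names[0])                # Alex: the first element has index 0
print(names[2])                # Charlie: the third element has index 2
print(names[len(names) - 1])   # Charlie: the last index is always len - 1
print(names[-1])               # Charlie: negative indices count from the end
```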


Similar to strings, when we add 2 lists together, Python returns a new list in which the elements of list `y` are placed after those of list `x` (neither `x` nor `y` is changed). Thus, the order of the operation matters: x + y gives a different output compared to y + x. Similar to strings as well, we can also multiply lists.

In [19]:
y = ["Diane", "Elaine"] # assigns the variable to a list containing 2 names
print(x+y)
print(len(x+y))

['Alex', 'Bob', 'Charlie', 'Diane', 'Elaine']
5


In [18]:
x * 2

['Alex', 'Bob', 'Charlie', 'Alex', 'Bob', 'Charlie']
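
To see that the order of the operands matters, and that `+` leaves the original lists untouched, consider this small sketch (`first` and `second` are stand-in names for `x` and `y`):

```python
first = ["Alex", "Bob", "Charlie"]
second = ["Diane", "Elaine"]

print(first + second)   # ['Alex', 'Bob', 'Charlie', 'Diane', 'Elaine']
print(second + first)   # ['Diane', 'Elaine', 'Alex', 'Bob', 'Charlie']
print(first + second == second + first)   # False: same elements, different order
print(first)            # ['Alex', 'Bob', 'Charlie'] (unchanged)
```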

Lists are a simple way of storing and extracting information. One can easily verify whether an element is in a list using the `in` operator. Python returns True if the element is in the list and False otherwise. However, note that this check is case-sensitive, as evident from the following examples.

Note: There is a fundamental difference between methods and functions. (To be discussed)

In [20]:
"Alex" in x

True

In [21]:
"Bobby" in x

False

In [22]:
"alex" in x

False
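
If a case-insensitive check is needed, one possible approach (a sketch only, using the string method `lower` and a list comprehension, which we will meet again later) is to compare lower-case versions:

```python
names = ["Alex", "Bob", "Charlie"]

# Build a lower-case copy of the list, then compare in lower case
lowercase_names = [name.lower() for name in names]

print("alex" in names)                    # False: the check is case-sensitive
print("ALEX".lower() in lowercase_names)  # True: both sides are lower-case
```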

There are many other methods that we can use for lists, but we will not go through all of them (you can check them out [here](https://www.programiz.com/python-programming/methods/list)). We will go through some of them instead:

1. append
2. extend
3. insert
4. remove
5. reverse
6. sort

Note that these methods modify the list in place: after applying them, the original list is altered, and there is no automatic way to undo the change.
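
If the original order matters, one option is to keep a copy before modifying the list, or to use the built-in function `sorted`, which returns a new list instead of changing the existing one. A minimal sketch with made-up variable names:

```python
names = ["Charlie", "Alex", "Bob"]
original = names.copy()   # keep a copy before changing anything

names.sort()              # sorts the list itself
print(names)              # ['Alex', 'Bob', 'Charlie']
print(original)           # ['Charlie', 'Alex', 'Bob']

alphabetical = sorted(original)   # returns a new list; original is untouched
print(alphabetical)       # ['Alex', 'Bob', 'Charlie']
print(original)           # ['Charlie', 'Alex', 'Bob']
```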

In [24]:
# The list method, append, adds an element to the list (note that an element of a list can be a list itself)
print(x)
x.append(["Daniel", "Edgar"])
print(x)

['Alex', 'Bob', 'Charlie', ['Daniel', 'Edgar']]
['Alex', 'Bob', 'Charlie', ['Daniel', 'Edgar'], ['Daniel', 'Edgar']]


In [25]:
# The list method, extend, extends the list to include the elements of another list
print(x, y)
x.extend(y)
print(x)

['Alex', 'Bob', 'Charlie', ['Daniel', 'Edgar'], ['Daniel', 'Edgar']] ['Diane', 'Elaine']
['Alex', 'Bob', 'Charlie', ['Daniel', 'Edgar'], ['Daniel', 'Edgar'], 'Diane', 'Elaine']
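
The difference between `append` and `extend` is easiest to see side by side. In this sketch (with illustrative lists `a`, `b` and `new`), both start from the same contents:

```python
a = ["Alex", "Bob"]
b = ["Alex", "Bob"]
new = ["Diane", "Elaine"]

a.append(new)   # adds the whole list as ONE element
b.extend(new)   # adds each element of the list separately

print(a, len(a))   # ['Alex', 'Bob', ['Diane', 'Elaine']] 3
print(b, len(b))   # ['Alex', 'Bob', 'Diane', 'Elaine'] 4
```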


In [26]:
# The list method, insert, inserts an element to the list (requires 2 arguments)
x.insert(0, "James")
print(x)

['James', 'Alex', 'Bob', 'Charlie', ['Daniel', 'Edgar'], ['Daniel', 'Edgar'], 'Diane', 'Elaine']


In [27]:
# The list method, remove, removes an element from the list
x.remove('Diane')
print(x)

['James', 'Alex', 'Bob', 'Charlie', ['Daniel', 'Edgar'], ['Daniel', 'Edgar'], 'Elaine']


In [28]:
# The list method, reverse, reverses the order of the elements in the list
x.reverse()
print(x)

['Elaine', ['Daniel', 'Edgar'], ['Daniel', 'Edgar'], 'Charlie', 'Bob', 'Alex', 'James']


In [29]:
# The list method, sort, sorts the list by numerical or alphabetical order
x.sort()

TypeError: '<' not supported between instances of 'list' and 'str'

As it turns out, you cannot sort a list that contains both lists and strings. 

In-class assignment:

Can you sort a list if it has both integers and strings? Use an example to show this.

In [30]:
# Your code here

In [31]:
x.remove(['Daniel', 'Edgar'])
x.sort()
print(x)

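TypeError: '<' not supported between instances of 'list' and 'str'

The error persists because `remove` deletes only the first matching element, and the list still contains a second copy of `['Daniel', 'Edgar']` (it was appended twice earlier).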
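
One possible way to obtain a sortable list, sketched here on a separate list called `people` (a hypothetical name, with the same contents as `x` before the cell above), is to keep only the string elements and then sort:

```python
people = ['Elaine', ['Daniel', 'Edgar'], ['Daniel', 'Edgar'],
          'Charlie', 'Bob', 'Alex', 'James']

people.remove(['Daniel', 'Edgar'])          # removes only the first match
print(people.count(['Daniel', 'Edgar']))    # 1: one nested list is still there

# Keep only the elements that are strings, then sort them
names_only = [item for item in people if isinstance(item, str)]
names_only.sort()
print(names_only)   # ['Alex', 'Bob', 'Charlie', 'Elaine', 'James']
```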

Now that we have some experience with lists, here's a quick in-class assignment:

1. Create 2 lists: one that contains the numbers 3, 5, 7 and another that contains the strings "Helen", "Jake", "Betty".
2. Add the integer 4 to the first list such that it is the second element in the first list, and the string "Sarah" to the second list such that it is the first element in the second list.
3. Sort both lists in reverse order (i.e. the first list should return [7, 5, 4, 3])
4. Append the second list to the first list.
5. Check if you can sort them.

Note: there are many ways to do this problem.

In [32]:
# Your code here

### Sets

A set is a collection which is unordered and unindexed, and does not allow for duplicate members. That is, it only contains **unique** values. To create a set, we can use the curly brackets, { }, or the function, set().
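
One detail worth checking for yourself: curly brackets create a set only when they contain elements. Empty curly brackets create a dictionary (introduced later), so an empty set has to be created with `set()`. A small sketch with illustrative names:

```python
empty_braces = {}
empty_set = set()
fruits = {"Apples", "Bananas"}

print(type(empty_braces))   # <class 'dict'>
print(type(empty_set))      # <class 'set'>
print(type(fruits))         # <class 'set'>
```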

In [33]:
set_a = set()
print(type(set_a))
set_a = {"Apples", "Bananas", "Oranges", "Pears"}

<class 'set'>


Similar to lists, one can check whether an element is in the set using the `in` operator, but note that it is case-sensitive.

In [34]:
print("Bananas" in set_a)
print("bananas" in set_a)

True
False


To add an element to the set, we can use the set method, "add". To add more than one element, we can use the set method, "update". To check how many elements the set contains, we can use the function, len().

In [35]:
set_a.add("Watermelons")
set_a

{'Apples', 'Bananas', 'Oranges', 'Pears', 'Watermelons'}

In [36]:
# Note that sets are unordered, so the elements may be displayed in a different order
set_b = {"Mangoes", "Durians"}
set_a.update(set_b)
set_a

{'Apples', 'Bananas', 'Durians', 'Mangoes', 'Oranges', 'Pears', 'Watermelons'}

In [37]:
len(set_a)

7

In [38]:
set_a.add("Apples")
set_a

{'Apples', 'Bananas', 'Durians', 'Mangoes', 'Oranges', 'Pears', 'Watermelons'}

In [39]:
len(set_a)

7

As it turns out, adding the string "Apples" does not change the set in any way, since the string **already** appears in the set. This is one of the key differences between sets and lists. Another difference (as we have previously discussed) is that sets do not keep their elements in the order in which they were added, but lists do.
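
Because sets keep only unique values, converting a list to a set is a common way to drop duplicates. A minimal sketch with a made-up list of purchases:

```python
purchases = ["Apples", "Pears", "Apples", "Bananas", "Pears"]
unique_purchases = set(purchases)

print(len(purchases))             # 5: the list keeps the duplicates
print(len(unique_purchases))      # 3: the set keeps each value once
print(sorted(unique_purchases))   # ['Apples', 'Bananas', 'Pears']
```

Note that `sorted` is used for the last line because a set has no fixed display order.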

---

To remove an element from the set, we can use the set method, "remove". However, if an element does not exist in the set, this will raise an error. For that reason, some prefer to use the set method, "discard". Below, we show both methods.

In [40]:
set_a.remove("Jackfruits")

KeyError: 'Jackfruits'

In [41]:
set_a.discard("Jackfruits")
print(set_a)

{'Oranges', 'Pears', 'Mangoes', 'Durians', 'Apples', 'Watermelons', 'Bananas'}


To learn more about sets, you can refer to the link [here](https://www.w3schools.com/python/python_sets.asp).

### Dictionaries

A dictionary is a collection of key-value pairs which is changeable and indexed by its keys (since Python 3.7, dictionaries also preserve the order in which keys were inserted). In Python, dictionaries are written with curly brackets _{ }_. Dictionaries are perhaps the most useful of all data structures, since we can store 2 related pieces of information (a key and its value) in one data structure.

In [42]:
# We begin by defining a dictionary, height, that contains 6 key-value pairs (name to height)

height = {
    "John": 175,
    "Marie": 165,
    "Jack": 190,
    "Stacy": 177,
    "Dana": 152,
    "Jackson": 168
}

In [43]:
# To access the height (value) of a certain individual, we can use the key (name of the individual)
height['Dana']

152

In [44]:
# To check if a name is in the dictionary, we can use the "in" operator
'Stacy' in height

True

In [45]:
# To check the names of the dictionary, we can use the "keys" method
[key for key in height.keys()]

['John', 'Marie', 'Jack', 'Stacy', 'Dana', 'Jackson']

In [46]:
# Similarly, we can do the same for values using the "values" method
[value for value in height.values()]

[175, 165, 190, 177, 152, 168]

In [47]:
# We can also change the values of a specific key. For example, suppose Dana grew taller by 5cm this year.
height['Dana'] += 5
height['Dana']

157

In [48]:
# In addition, we can add new keys to the dictionary
height["Luke"] = 180
height

{'Dana': 157,
 'Jack': 190,
 'Jackson': 168,
 'John': 175,
 'Luke': 180,
 'Marie': 165,
 'Stacy': 177}

In [49]:
len(height)

7

In [50]:
# To remove a key-value pair from the dictionary, we can use the "pop" method
height.pop("Jackson")
height

{'Dana': 157,
 'Jack': 190,
 'John': 175,
 'Luke': 180,
 'Marie': 165,
 'Stacy': 177}

In [51]:
# Alternatively, one can use the del (which stands for delete) statement
del height['Jack']
height

{'Dana': 157, 'John': 175, 'Luke': 180, 'Marie': 165, 'Stacy': 177}
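
Two details about missing keys are worth a quick sketch: `pop` returns the value it removes and accepts an optional default, and the dictionary method `get` looks up a key without raising an error. The dictionary `heights_demo` below is a separate, illustrative copy so that `height` is left alone:

```python
heights_demo = {"Dana": 157, "John": 175, "Luke": 180}

removed = heights_demo.pop("Luke")        # pop returns the removed value
print(removed)                            # 180

missing = heights_demo.pop("Zoe", None)   # a default avoids a KeyError
print(missing)                            # None

print(heights_demo.get("Dana"))           # 157
print(heights_demo.get("Zoe"))            # None: the key is not in the dictionary
```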

In-class assignment:

1. Create a dictionary, called "weight", using the following dataset:

|Name  | Weight|
|------|-------|
|Jane  | 46.0  |
|John  | 75.2  |
|Tina  | 50.2  |
|Lena  | 48.5  |
|Kane  | 78.2  |
|Ryan  | 69.7  |

2. Suppose data for a new person (Kate) is available, and she weighs 43.0 kg. Add this information in.
3. What is the average weight for the males in the group? (Hint: a mean function will be useful, for example `mean` from the standard-library `statistics` module; there is no built-in `mean`)
4. What about the average weight for the females?
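
As a pointer for the hint above, here is a sketch of how `statistics.mean` can be called. The numbers and the dictionary `weights_demo` are invented and unrelated to the assignment data:

```python
from statistics import mean

sample = [2.0, 4.0, 9.0]
print(mean(sample))   # 5.0

# mean also works on the values of a dictionary
weights_demo = {"A": 60.0, "B": 70.0}
print(mean(list(weights_demo.values())))   # 65.0
```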

Suppose information on the height of the individuals is given:

|Name  | Height |
|------|------- |
|Jane  | 155.0  |
|John  | 181.2  |
|Tina  | 172.6  |
|Lena  | 162.3  |
|Kane  | 174.8  |
|Ryan  | 172.3  |
|Kate  | 151.8  |

1. Create another dictionary, "height" using the above dataset. Note that with the inclusion of this dataset, we have information on the height and weight of the 7 individuals.
2. Create a new dictionary, "data", which contains 2 keys (height and weight), where each key maps to the corresponding dictionary.

In [47]:
# Please do your in-class assignment here


### Application of Data Structures (Prelude to the next module)
In this section of the class, I will show an example of why lists and dictionaries are so important and powerful in computing. Consider the following problem: we have a specific text, and we need to find the number of times each word appears in it (in practice, we might have 10,000 texts with 10,000 words each). We may want to do this because we expect words that appear more frequently to be more important or to provide more information about the text.

In this case, although we are dealing with only 1 text, it is possible to loop over the 10,000 texts, after we have written code to solve for 1 text. This is one of the best approaches towards problem solving (in computing, at least): break the problem down into many segments, and solve for each segment. Then, combine the solutions and see if we can find a "global" solution.

Here, you will also see a glimpse of function definition and loops (both to be studied in the second module).

Let's proceed by converting our letters to lower-case and removing punctuation.

In [49]:
# Read and save text data to variable (text)
with open("text/intro.txt", "r") as f:  # the file is closed automatically afterwards
    text = f.read()
print(text[:100])


Why should you learn to write programs?

Writing programs (or programming) is a very creative 
and 


In [50]:
# Data cleaning using a function
def data_cleaning(text):
    '''
    This function strips leading and trailing whitespace from the input, converts it to lower case and removes all punctuation.
    In addition, it returns the text as a list of words by splitting on whitespace.
    
    Input: str
    Output: list
    '''
    text = text.strip() # Remove leading and trailing whitespace
    text = text.lower() # Convert to lower-case

    # Remove punctuation (raw string, so the backslash is treated as a literal character)
    punctuations = r'''!()-[]{};:'"\,<>./?@#$%^&*_~'''
    newtext = ""
    for char in text:
        if char not in punctuations:
            newtext += char

    text_list = newtext.split()
    return text_list

text_list = data_cleaning(text)
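
For comparison, here is an alternative sketch of the same cleaning step using `str.translate` and the standard-library constant `string.punctuation`. The function name `data_cleaning_translate` and the sample sentence are made up for illustration, and `string.punctuation` covers a slightly larger set of characters than the hand-written list above (for example `+`, `=` and `|`), so the two versions may differ on some inputs:

```python
import string

def data_cleaning_translate(text):
    """Lower-case the text, drop punctuation and split on whitespace."""
    # Assumption: string.punctuation stands in for the hand-written list
    table = str.maketrans("", "", string.punctuation)
    return text.strip().lower().translate(table).split()

sample = "Why should you learn to write programs? Writing programs (or programming) is fun."
cleaned = data_cleaning_translate(sample)
print(cleaned[:7])   # ['why', 'should', 'you', 'learn', 'to', 'write', 'programs']
```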

In [51]:
text_list[:10]

['why',
 'should',
 'you',
 'learn',
 'to',
 'write',
 'programs',
 'writing',
 'programs',
 'or']

In [52]:
# Create a dictionary with counts
word_count = {}

for word in text_list:
    if word in word_count:  # membership is checked against the keys
        word_count[word] += 1
    else:
        word_count[word] = 1
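
The counting loop above is such a common pattern that the standard library offers `collections.Counter` for it. The sketch below uses a short made-up word list and the name `counts`, so it does not touch `word_count`:

```python
from collections import Counter

words = ["why", "should", "you", "write", "programs", "you", "write", "you"]
counts = Counter(words)   # builds the word -> count mapping in one step

print(counts["you"])      # 3
print(counts["write"])    # 2
print(counts["python"])   # 0: words that never appear count as zero
```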

In [53]:
# Sort dictionary by number of times a word appears
import operator
sorted_word_count = sorted(word_count.items(), key=operator.itemgetter(1), reverse=True)
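
To unpack what the sort is doing: `items()` yields (word, count) pairs, and `operator.itemgetter(1)` tells `sorted` to order the pairs by their second entry, the count. A sketch on a tiny made-up dictionary, with an equivalent `lambda` version for comparison:

```python
import operator

small_counts = {"the": 5, "python": 3, "a": 4}

by_count = sorted(small_counts.items(), key=operator.itemgetter(1), reverse=True)
print(by_count)   # [('the', 5), ('a', 4), ('python', 3)]

# The same ordering, written with a lambda that picks out the count
by_count_lambda = sorted(small_counts.items(), key=lambda pair: pair[1], reverse=True)
print(by_count == by_count_lambda)   # True
```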

In [54]:
# Get top 50 words
sorted_word_count[:50]

[('the', 249),
 ('to', 205),
 ('a', 172),
 ('and', 164),
 ('you', 152),
 ('is', 114),
 ('of', 103),
 ('python', 103),
 ('in', 81),
 ('it', 76),
 ('that', 73),
 ('we', 67),
 ('for', 52),
 ('are', 43),
 ('be', 42),
 ('program', 41),
 ('language', 41),
 ('your', 39),
 ('with', 37),
 ('will', 34),
 ('as', 31),
 ('at', 31),
 ('or', 29),
 ('have', 27),
 ('programs', 26),
 ('our', 25),
 ('when', 25),
 ('this', 24),
 ('but', 24),
 ('write', 23),
 ('on', 23),
 ('can', 22),
 ('what', 21),
 ('computer', 21),
 ('very', 20),
 ('how', 20),
 ('programming', 19),
 ('these', 19),
 ('use', 19),
 ('words', 18),
 ('not', 18),
 ('more', 18),
 ('from', 17),
 ('so', 17),
 ('machine', 17),
 ('need', 16),
 ('problem', 15),
 ('do', 15),
 ('if', 15),
 ('word', 15)]
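
Having solved the problem for one text, extending it to many texts is mostly a matter of wrapping the steps in a function and looping. The following is only a sketch under simplifying assumptions: the three short "texts" are made up, punctuation handling is left out, and `count_words` is a hypothetical helper name:

```python
def count_words(text):
    """Return a dictionary mapping each word in text to its number of occurrences."""
    counts = {}
    for word in text.lower().split():
        counts[word] = counts.get(word, 0) + 1
    return counts

# Made-up stand-ins for a collection of texts
texts = ["the cat sat", "the dog sat down", "a cat"]

# Solve the problem for each text separately...
all_counts = [count_words(t) for t in texts]

# ...then combine the per-text results into one overall count
total = {}
for counts in all_counts:
    for word, n in counts.items():
        total[word] = total.get(word, 0) + n

print(total["the"], total["cat"], total["sat"])   # 2 2 2
```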

## Conclusion

In this lesson, we have discussed the following:

1. Different data types
2. Different data structures
3. How to use these data types and structures to store data we are interested in