# Intro to Python

## Overview
This Jupyter notebook contains notes from a two-hour introductory lesson on programming in Python. The goal of the lesson is not to make you a programmer, but to expose you to concepts and notions that will motivate you to start learning Python.

For questions, suggestions, or other feedback, feel free to email me at joaor [squiggly] stanford [period] edu

## Variable Assignment

In [None]:
# This is a comment. Lines starting with '#' will not be executed by the Python interpreter
# and can be used to annotate our code. Useful for others and for our future selves.

In [None]:
# Variables store information. In their simplest form, computer programs manipulate
# information stored in variables to achieve something.
#
# Variables in Python are assigned with <variable name> <equal sign> <value>. The name
# of a variable should be simple and meaningful to the reader. Do not name variables 'i', 'j',
# or 'var1', 'var2'.
#
# To assign text to a variable, enclose it in quotes. Single or double quotes both work, as long
# as you are consistent.
name = "Stanford"

In [None]:
# We can use the print() function to display the contents of a variable.
# Functions in Python are always followed by parentheses that may or may not
# contain values inside (called arguments).

# print() takes as its argument the variable we wish to display
print(name)

In [None]:
# We can also assign numbers to variables
year = 1891

In [None]:
# And to verify it, we use print() again.
print(year)

In [None]:
# As a dynamically-typed language, Python does not require you to specify types when declaring
# variables - the interpreter infers the type from what you specify as a value. 
# In addition, variables can hold different values of different types during the course of a
# program.
#
# Learning what types there are and what they can be used for is an important piece of knowledge
# to have as a programmer. In Python, the basic variable types are:
# - strings (str), to represent generic text
# - integers (int), to represent whole numbers
# - floats (float), to represent decimal numbers
# - booleans (bool), to represent the values True or False
#
# To check the type of a variable you can use the type() function. As with print(), pass the 
# variable name as an argument.
type(name)  # should print 'str' meaning the variable is a 'string'.

# We make use of a Jupyter notebook feature and avoid the use of print to display the value
# of type(name). In a Python script, you would have to wrap the previous expression in a
# print() call, or assign it to a variable and then print it, like this:
#
# type_of_name = type(name)
# print(type_of_name)
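
To make the point about dynamic typing concrete, here is a small sketch (the name `value` is just a hypothetical example) that reassigns one variable to values of four different types and checks each one with type(). In a plain script, print(type(...)) shows the type as `<class '...'>`.

```python
# One name, four different types over the course of the program.
value = "Stanford"
print(type(value))  # <class 'str'>

value = 1891
print(type(value))  # <class 'int'>

value = 1891.5
print(type(value))  # <class 'float'>

value = True
print(type(value))  # <class 'bool'>
```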

In [None]:
type(year)  # should print 'int', meaning the variable is an integer

In [None]:
# We can also use print to display more than one variable at once. Separate your arguments
# with commas and print will display them consecutively, separating each by one space. The print
# function automatically converts every argument to its string representation in the process.
# Moreover, you can give raw values (not assigned to any variable) to print, as below:
print(name, "was founded in", year)

In [None]:
# Python also has a boolean type, which is used in conditionals and other boolean logic
# operations. Booleans in Python are either 'True' or 'False'.

In [None]:
True

In [None]:
type(True)

In [None]:
# Capitalization matters. Trying to print 'true' will give an error because that name was
# never defined. Python is case-sensitive. Keep that in mind.
#
# Errors in Python are quite verbose, to help you pinpoint the source of the problem.
# Read them from the bottom up: last line first. The following example shows that the type
# of error we have is a NameError, followed by what that means in this case. The other lines
# include more information, namely a snippet of your code to help you see where the
# problem is coming from.
#
true

## Operations on Variables

In [None]:
# Depending on the variable type, you can do different operations.
# Take two integers. You can perform any arithmetic operation on them, e.g. '+' (addition),
# '-' (subtraction), '*' (multiplication), '/' (division), '**' (exponentiation)
2 + 2
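
A quick sketch of these operators side by side, with arbitrary example numbers. It also shows two standard operators not listed above, '//' (floor division) and '%' (remainder), and the detail that '/' always produces a float in Python 3, even when the division is exact.

```python
# Arithmetic on two integers; the numbers are arbitrary examples.
a = 7
b = 2

print(a + b)   # 9
print(a - b)   # 5
print(a * b)   # 14
print(a / b)   # 3.5  ('/' always produces a float in Python 3)
print(6 / 2)   # 3.0  (still a float, even though the division is exact)
print(a ** b)  # 49
print(a // b)  # 3    (floor division: the remainder is dropped)
print(a % b)   # 1    (modulo: only the remainder is kept)
```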

In [None]:
# We can also perform operations with values stored in variables.
year + 2

In [None]:
# However, this does not change the variable value!
print(year)

In [None]:
# You can assign the value of operations to variables
age = 2019 - year
print(age)
# Also, you can have multiple statements inside a cell of a notebook. All of them will be
# evaluated, but only the output of the last line will be displayed below (unless you use print).

In [None]:
# Different variable types work differently with the same operators.
# Adding two strings performs string concatenation: joining them.
name + name

In [None]:
# Similarly, multiplying a string by an integer n repeats that string n times.
name * 10

In [None]:
# Not all operations work though. And not all types can be mixed.
name - 10

In [None]:
# Another example of why you must be careful when mixing types.
name + year

In [None]:
# You can explicitly convert variables to a compatible type.
# The functions to convert a variable to a given type are named after the type names
# themselves: str(), int(), float(), and bool()
name + str(year)

In [None]:
# Conversions don't always work either. The values must make sense in the new variable type.
# For example, a word cannot be converted to a number. What would it mean?
int(name)
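
For contrast, the sketch below shows conversions that do make sense, using arbitrary example values. Note that int() applied to a float drops the decimal part rather than rounding, and that int() cannot parse a string containing a decimal point. The try/except part is error handling that this lesson does not cover; it is there only so the block keeps running after the error.

```python
# Conversions that make sense succeed:
print(int("2007"))    # 2007
print(float("28.8"))  # 28.8
print(str(1891))      # 1891 (now a string)
print(int(28.8))      # 28   (int() drops the decimal part, it does not round)

# A string that does not look like a whole number cannot become an int.
# try/except is used here only so this sketch keeps running after the error.
try:
    int("28.8")
except ValueError as error:
    print("ValueError:", error)
```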

## Reading Files

In [None]:
# Download the following file to the folder you launched this notebook from:
# https://pastebin.com/Trinnt7X
# If you just opened a terminal and typed 'jupyter notebook', or pressed the Jupyter button
# in the Anaconda UI, then you should download the file to your 'home' (or user) directory.
#
# Then, let's assign the file name to a variable, as a string
fname = "gapminder_lifeexp.txt"

In [None]:
print(fname)

In [None]:
# To open a file, we use the open() function. The arguments to open() are the file name as a
# string and the opening mode. The opening mode can be 'r' (read), 'w' (write), or 'a' (append).
# Careful with 'w', it will erase the contents of your file without warning!
#
# Let's open the file and assign the result of open() to a variable
infile = open(fname, "r")

In [None]:
# When we open a file, the result is a 'handle' to the file. Think of this object as a window
# to the contents of the file.
print(infile)

In [None]:
# You can extract the content of the file line by line using readline().
# Unlike print, type, and open, readline() is a method of the file object. As such, we call it
# by writing the file variable, a period, and then readline().
#
# You will find this syntax very often. In Python, values (e.g. strings, lists, files) have
# methods associated with them. These methods are accessed using this dot notation.
#
# Some functions are special, like print(), in the sense that they are available without this
# dot notation. These are called built-in functions.
#
# Each call to readline() returns ONE line of the file. Since we just opened the file, it will
# return the first line.
infile.readline()

In [None]:
# Calling it again returns the next line in the file. Importantly, readline() only moves forward:
# we cannot simply scroll back (there are ways to do it, but they are beyond the scope of this lesson).
infile.readline()
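
Here is a self-contained sketch of this forward-only behaviour. So that it does not depend on the gapminder file, it first writes a throwaway file (the name `example.txt` and its three lines are made up) into a temporary folder, then reads it back with readline(). Once every line has been read, readline() returns an empty string.

```python
import os
import tempfile

# Create a small made-up file in a temporary folder.
path = os.path.join(tempfile.mkdtemp(), "example.txt")
outfile = open(path, "w")
outfile.write("first line\nsecond line\nthird line\n")
outfile.close()

infile = open(path, "r")
first = infile.readline()      # 'first line\n'
second = infile.readline()     # 'second line\n'
third = infile.readline()      # 'third line\n'
after_end = infile.readline()  # '' : an empty string means there is nothing left to read
infile.close()

# repr() shows the hidden '\n' characters explicitly.
print(repr(first), repr(second), repr(third), repr(after_end))
```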

In [None]:
# To be more useful, we can assign the result of the readline function to a variable
line = infile.readline()
print(line)

In [None]:
# When reading a file as text, regardless of the contents, Python will always store the 
# entire line as a string.
type(line)

In [None]:
# Notice that when we display the value of the string in Jupyter (without print), we get
# a '\n' at the end of the line. When we use print, the '\n' disappears and we get an extra
# empty line instead. That '\n' is a newline character, a special value used by computers
# to represent a line break.
#
# Strings, being objects, have a method called 'strip()' that removes leading and trailing
# characters from the string. You can pass a string of characters to strip() as its argument,
# and it will remove any of those characters from both ends of the string. By default (if we
# do not provide arguments) strip() removes whitespace, including newlines.
line.strip()
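
A few strip() variations on a made-up line (not taken from the data file). The related string method rstrip() works the same way but only touches the right-hand end, which is handy when you want to remove the trailing '\n' and keep everything else.

```python
# A made-up line with leading spaces and a trailing newline.
raw = "  Europe,CountryA,60.5\n"

print(repr(raw.strip()))        # 'Europe,CountryA,60.5'   (whitespace and \n removed from both ends)
print(repr(raw.rstrip("\n")))   # '  Europe,CountryA,60.5' (only the trailing newline removed)
print("xxhixx".strip("x"))      # hi  (the given character is removed from both ends)
print(" -hi- ".strip(" -"))     # hi  (any of the characters ' ' and '-' is removed)
print("a-b".strip("-"))         # a-b (characters in the middle are left alone)
```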

In [None]:
# Another very useful method of strings is 'split()'. Like 'strip()', 'split()' takes an optional
# argument, the separator, and divides the string wherever it finds that separator. By default, it
# splits on whitespace (useful to separate sentences into words). Since our file is comma-separated
# (CSV), we will pass a single comma, as a string, as the argument.
#
# The result of this operation is a new variable type called a list. Lists (often called arrays
# in other programming languages) are so-called containers because they can store multiple values,
# even of different types. They are represented by square brackets, where every element of the list
# is separated by a comma.
#
# Jupyter does some formatting for us, putting each element in a new line to make it easier
# to read.
line.split(',')
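
The sketch below tries split() on a few made-up strings: splitting on a comma, splitting on whitespace with no argument, using a separator longer than one character, and what happens when two separators appear in a row (useful to know if a data file ever has an empty field).

```python
record = "Europe,CountryA,60.5,65.5"  # a made-up line, already stripped of its '\n'
print(record.split(","))    # ['Europe', 'CountryA', '60.5', '65.5']

# With no argument, split() breaks on any run of whitespace (spaces, tabs, newlines).
print("two   spaces\tand a tab".split())  # ['two', 'spaces', 'and', 'a', 'tab']

# The separator can be longer than one character...
print("a, b, c".split(", "))  # ['a', 'b', 'c']

# ...and two separators in a row produce an empty string between them.
print("a,,b".split(","))      # ['a', '', 'b']
```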

In [None]:
# Oops, we forgot about the nasty \n.
# Since using .strip() on a string returns another string, we can directly call .split() on this
# new string, avoiding the need to store the intermediate value. This chaining of methods is only
# possible as long as you keep track of what each method returns.
fields = line.strip().split(',')
print(fields)  # this prints the list without fancy Jupyter formatting

In [None]:
# If you try to call strip() again on the list, you will get an error.
# An AttributeError means the object has no method (or attribute) with that name: lists have no strip().
fields = line.strip().split(',').strip()

In [None]:
# A small note on variable names. Not every name is available for your variables: some names
# are reserved for special values. The boolean values True and False, for example, cannot be
# used as variable names. The same goes for 'if', 'for', 'while', and other keywords.
True = 12

In [None]:
# You can, however, assign a value to the name of a built-in function. Doing this is VERY BAD, as
# the original meaning (value) of that name is lost until you restart the kernel or delete your
# variable with 'del int'.
int = 'banana'

In [None]:
# The following will now fail, because int is just the string 'banana'. The original value of
# int(), the function to convert values to integers, is lost.
int('12')
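
If you do shadow a built-in by accident, deleting your own variable with `del` makes the name refer to the built-in function again (restarting the notebook kernel also works). A minimal sketch:

```python
int = "banana"   # shadow the built-in int() with a string (don't do this!)
print(int)       # banana

del int          # remove our variable; the name 'int' falls back to the built-in function
print(int("12") + 1)  # 13
```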

## Lists and Indexing

In [None]:
# Lists (can) have multiple elements.
print(fields)

In [None]:
# We can access each element individually by using the following notation. The number between
# the square brackets represents the positional index of that element in the list.
fields[1]

In [None]:
# Annoyingly, in Python indexing starts at 0. The first element of a list is therefore
# at position 0:
fields[0]

In [None]:
# We can assign the result of the indexing to a new variable, and we see that it is of type
# string.
continent = fields[0]
print(continent)
type(continent)

In [None]:
# Python also allows you to access items in a list in reverse order, by using negative numbers.
# -1 represents the LAST item of a list:
fields[-1]

In [None]:
# Trying to access a position that does not exist, such as an index equal to or larger than the
# number of elements in the list, will raise an IndexError:
fields[14]
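
A small sketch of which positions are valid, using a made-up list as a stand-in for `fields`. It uses the built-in len(), which this notebook introduces later: for a list of n elements, valid positions run from 0 to n - 1, and negative indexes run from -1 down to -n.

```python
# A made-up list standing in for 'fields'.
fields = ["Europe", "CountryA", "60.5", "65.5", "70.1"]

num_fields = len(fields)           # 5
last_valid_index = num_fields - 1  # 4: valid positions run from 0 to len(fields) - 1

print(fields[last_valid_index])    # 70.1
print(fields[-1])                  # 70.1 (the same element, counted from the end)
print(fields[-num_fields])         # Europe (the first element, counted from the end)
```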

In [None]:
# Strings, despite not being 'containers', can also be indexed. In this case, the return value
# is the character at that position. So, to extract the first letter of a word, you can do:
continent[0]

In [None]:
# We can use indexing and combine it with any of the other syntax we've seen so far
# For example, let's look at the content of our list again:
print(fields)

In [None]:
# We can, in one line, access the third and last values of the list, convert them to floating
# point numbers, and subtract them to obtain a new value. If you remember the header (first line)
# of the file, these columns contained the life expectancy of a particular country in 1952 and 2007,
# respectively. This difference gives us the change in life expectancy.
life_exp_change = float(fields[-1]) - float(fields[2])
print(life_exp_change)

In [None]:
# A small note on floating point operations. Numbers in computers are stored with limited
# precision. For example, 1/3 is stored as 0.33333... only up to a finite number of digits. As a
# result, operations on these numbers sometimes carry small rounding errors (as you may see above),
# and those errors can accumulate if you perform hundreds or thousands of
# operations consecutively. Keep this in mind. Precision is limited.
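
The classic illustration of limited precision is 0.1 + 0.2. The sketch below shows the tiny error, why comparing floats with '==' can be misleading, and two standard ways around it: math.isclose() from the standard library, and rounding with the built-in round().

```python
import math

total = 0.1 + 0.2
print(total)                     # 0.30000000000000004
print(total == 0.3)              # False
print(math.isclose(total, 0.3))  # True: equal within a small tolerance
print(round(total, 2))           # 0.3
```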

In [None]:
# We can combine variable names, raw values, and indexed values in print:
print(fields[1], "has a life expectancy change of", life_exp_change)

## For-loops

In [None]:
# Reading a file line by line, manually, is exhausting and boring. Instead, you can use a so-called
# for-loop to repeat actions.
#
# For example, to print every line in the file.
for line in infile:
    print(line)

In [None]:
# For-loops have a special syntax, all of which matters.
# You define a for-loop by starting a line with the keyword 'for', followed by the name
# of a variable that will hold the current item at each iteration. This variable
# is 'magically' defined for you: it is assigned internally by the for-loop as it runs.
# Be careful not to name it after any variable you defined before, or you will overwrite its value.
#
# After this name comes the keyword 'in', followed by the container over which we will iterate.
# This container can be anything with multiple elements, such as a list or a string. Files are
# special objects that can also be seen as 'containers' of sorts. Conceptually, it is as if
# Python called readline() again and again, once for every line in the file.
#
# For-loops stop after they reach the last item of the container.
#
# The first line of the for-loop ends with a colon (':') and then, on the next line, you start
# writing whatever code you want to be repeated for every item of the container. Very importantly,
# you have to prefix these lines with spaces (indentation). Python programmers usually use
# 4 spaces. Every consecutive line that is indented by these 4 spaces will be executed once
# per iteration of the for-loop.
#
# Let's see these properties in the following example: iterating over our list 'fields'
#
# Let's see these properties in the following example - iterating over our list 'fields'
for item in fields:
    print(item)
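
Strings work as containers too: a for-loop over a string visits one character per iteration. A small sketch that prints each letter of a word and counts them along the way:

```python
word = "Stanford"
letters = 0
for letter in word:  # at each iteration, 'letter' holds the next character
    print(letter)
    letters = letters + 1  # count how many characters we have seen

print(letters)  # 8
```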

In [None]:
# The variable item, which we never defined (as in item = <something>), now holds the value of the
# last element the for-loop visited.
item

In [None]:
# If we defined it before the loop, the loop overwrites that value. Also, you can iterate as many
# times as you want over a list or a string.
item = 'apple'
for item in fields:
    print(item)
print(item)  # last value of the for-loop. 'apple' is gone.

In [None]:
# Files, however, can only be iterated on ONCE. After you go over every line, the file is said to
# be 'exhausted' and you cannot iterate over it anymore. It has no more lines to show you, you saw
# them all already!
for line in infile:
    print(line)

In [None]:
# The solution is to re-open the file.
infile = open('gapminder_lifeexp.txt', 'r')

In [None]:
# Let's now write a for-loop that calculates the life expectancy change for every country in our
# dataset.

In [None]:
# The first line in the file did not contain any data. It was a header line with info on what each
# column held. We do not want to parse this line, so we can skip it by reading it once before
# the for-loop. You can assign values you do not need to an underscore '_' to signal that they are
# unimportant. It's a convention. You could also assign them to a variable named 'garbage'.
_ = infile.readline()

In [None]:
# We can now iterate over the remaining lines of the file and do our calculations.
# Let's combine all the tricks we have learned so far.
for line in infile:  # for each line in the file
    fields = line.strip().split(',')  # we make a list 'fields' by separating the line contents by ','. We also remove '\n' from the line.
    
    country = fields[1]  # the second element is the country name
    le_change = float(fields[-1]) - float(fields[2])  # the life expectancy change is calculated like this
    
    print(country, le_change)  # and now we print the result.

In [None]:
# This is great, but we lost all the information we had on the countries. What if we wanted
# to do more calculations afterwards, such as calculate the average life expectancy change over
# all countries?
#
# We can do this by creating empty lists ahead of the for-loop and then adding elements to them
# as we iterate over the file.

infile = open('gapminder_lifeexp.txt', 'r')  # re-open the file
_ = infile.readline()  # skip the header line

# We make new empty lists using a pair of square brackets and assigning them to a variable.
countries = []
life_exp_changes = []

# then we iterate over the file, like we did before
for line in infile:
    fields = line.strip().split(',')
    
    country = fields[1]
    le_change = float(fields[-1]) - float(fields[2])
    
    # and now instead of print(), we use the method 'append()' from lists.
    # append() takes one argument, which is the element we wish to add to the END of the list.
    # In other words, the last value in the list is the last value we appended.
    countries.append(country)
    life_exp_changes.append(le_change)

# As a courtesy to the computer, we should signal when we are done with the file. The .close()
# method of file objects effectively 'closes' the access to the file. Attempting to read anything
# else from the file from now on will give an error.
#
# When you are writing files, closing is particularly important to make sure you wrote everything
# you wanted.
# Python will sometimes wait until it has a lot of things to write before it actually writes, in
# an attempt to make writing faster. When you call the .close() method, Python forcefully empties
# this 'buffer', writing everything it had stored to the file.
infile.close()
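
Python also offers the `with` statement, which this lesson does not cover but which is the common way to make sure a file gets closed: the file is closed automatically when the indented block ends, even if an error happens inside it. The sketch below repeats the loop above on a made-up, simplified file (only a 1952 and a 2007 column, fictional country names), written to a temporary folder so it runs anywhere.

```python
import os
import tempfile

# Made-up file contents in a simplified layout: header line, then comma-separated rows.
path = os.path.join(tempfile.mkdtemp(), "example_lifeexp.txt")
with open(path, "w") as outfile:
    outfile.write("continent,country,1952,2007\n")
    outfile.write("Europe,CountryA,60.0,75.5\n")
    outfile.write("Asia,CountryB,50.0,48.0\n")

countries = []
life_exp_changes = []
with open(path, "r") as infile:  # the file is closed automatically when this block ends
    _ = infile.readline()         # skip the header line
    for line in infile:
        fields = line.strip().split(",")
        countries.append(fields[1])
        life_exp_changes.append(float(fields[-1]) - float(fields[2]))

print(infile.closed)     # True
print(countries)         # ['CountryA', 'CountryB']
print(life_exp_changes)  # [15.5, -2.0]
```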

In [None]:
# To make sure we got what we wanted:
print(countries)

In [None]:
# We can use the built-in function len() to get the number of elements in a list, or the size of
# the list. The return value is an integer.
len(countries)

In [None]:
# Our other list, life_exp_changes, contains the difference between the life expectancy values
# in 2007 and 1952, for every country in the dataset. The values are stored as floats, because
# they were the result of a subtraction between floats.
print(life_exp_changes)

In [None]:
# We can now iterate over this list to calculate an average number for the life expectancy change
# between 1952 and 2007 in our dataset of 142 countries.
#
# As a reminder, the average is calculated by adding all values in a dataset and dividing that
# sum by the number of values in the dataset.

total_sum = 0.0  # first, we define a new variable that will store the value of our total sum
count = 0  # then we define another variable to keep track of how many elements we have in the list

for item in life_exp_changes:  # for every item in our list
    total_sum = total_sum + item  # we add that value to our sum and assign the result to the sum variable again.
    count = count + 1  # we add 1 to the number of items we saw so far

# var = var + 1 works because Python evaluates the right-hand side first and then assigns the
# result to the variable. In other words, Python first retrieves what it knows to be the value of
# 'var', adds 1 to it, and then assigns this new value to the variable 'var'.

# Finally, we calculate the average (by dividing the sum over the count)
average = total_sum / count
print(average)
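
Updating a variable from its own value is so common that Python has a shorthand for it: `count += 1` does the same as `count = count + 1` (and likewise `-=`, `*=`, `/=`). A small sketch, with made-up values for the loop:

```python
count = 0
count = count + 1  # the long form
count += 1         # shorthand for the same thing
print(count)       # 2

total_sum = 0.0
for item in [1.5, 2.5, 3.0]:  # made-up values
    total_sum += item         # same as total_sum = total_sum + item
print(total_sum)   # 7.0
```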

In [None]:
# Python also includes a number of convenient built-in functions to simplify your life.
# For example, the built-in function sum() adds all the elements in a list. Conceptually, it does
# the same work as the for-loop we wrote above. And, as we saw, you can get the number of values
# in a list by using the built-in function len().

# As a result, we can reproduce the code above by the following statement:
average2 = sum(life_exp_changes) / len(life_exp_changes)
print(average2)

In [None]:
# Other useful built-in functions are for example min() and max(). These retrieve the smallest and
# largest element in a list, respectively. The definition of smallest/largest depends on the type
# of the variable. For numbers, min() returns the smallest value (numerically).
min(life_exp_changes)

In [None]:
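# In the case of strings, Python compares them alphabetically and returns the 'first' entry.
# (More precisely, characters are compared by their Unicode code points, so all uppercase
# letters come before all lowercase ones.)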
list_of_words = ['banana', 'zucchini', 'apple']
print(min(list_of_words))
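
The uppercase-before-lowercase rule can surprise you. In the sketch below, capitalizing one word changes the result of min(). The built-in min() (like max() and sorted()) accepts a `key` argument. With `key=str.lower`, it compares lowercased versions of the words while still returning the original word.

```python
words = ["banana", "Zucchini", "apple"]

print(min(words))     # Zucchini ('Z' comes before every lowercase letter)
print(max(words))     # banana
print(sorted(words))  # ['Zucchini', 'apple', 'banana']

# Passing key=str.lower makes min() compare lowercased copies, ignoring case.
print(min(words, key=str.lower))  # apple
```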

## Conditionals

In [None]:
# Like many other languages, Python includes functionality to perform different operations
# depending on the result of certain calculations. This ability allows us to execute different
# blocks of code depending on certain conditions.

# For example, let's write a for-loop that iterates over our list 'life_exp_changes' and prints
# only the negative values.
for value in life_exp_changes:
    if value < 0:
        print(value)

In [None]:
# This 'if' statement evaluates a conditional expression. If the condition is True, then the code
# that is indented under the if-clause is executed; otherwise it is not.
#
# The syntax is very similar to a for-loop:
# The keyword 'if' is followed by an expression. This expression is often a comparison between
# variables, or between variables and values (as in the case above). The comparison operators
# included in Python are: > (greater than), < (less than), >= (greater than or equal), <= (less than or equal).
# You can also directly compare for equality by using the operator '==' (double equals!) or inequality with '!='.
#
# You can chain multiple conditions using boolean logic and the keywords 'and'/'or'. Two conditions
# chained by 'and' return True only if both of them are True. Two conditions chained by 'or' return
# True if at least one of them is True.
#
# As you can see, comparisons produce boolean values (True or False), which is why booleans are at
# the core of conditionals in Python.
#
# After the colon, in the following lines, you indent (again, by 4 spaces) the lines of code
# you want to be executed if that condition is True.
#
# Let's see another example. Let's get countries that end with the letter 'l':
for name in countries:
    if name[-1] == 'l':  # we can use indexing directly here!
        print(name)
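
To check the rules for 'and' and 'or' yourself, this sketch loops over every combination of two booleans and prints the result of each operator, collecting the rows in a list along the way:

```python
results = []
for a in [True, False]:
    for b in [True, False]:
        # 'and' is True only when both are True; 'or' is True when at least one is True.
        print(a, b, "| and:", a and b, "| or:", a or b)
        results.append((a, b, a and b, a or b))
```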

In [None]:
# Yet another example, countries that end in 'a' and start with 'B':
for name in countries:
    if name[-1] == 'a' and name[0] == 'B':  # remember, case sensitive!
        print(name)

In [None]:
# If you want to do something ELSE when the condition is not True, you can use the keyword 'else'
# right after the if-block, like this. All the code indented under 'else' will be executed whenever
# the condition in the initial 'if' is not True. You can use this if-else combo to divide
# operations in your code.
for name in countries:
    if name[-1] == 'a' and name[0] == 'B':  # remember, case sensitive!
        print(name)
    else:
        print(name, 'does not meet our condition')

## Wrap-up

In [None]:
# The content we went over so far makes up the core of every Python program in existence.
# Most Python programs will have for-loops, if-clauses, etc. If you understand these intro
# concepts, with some practice, you will be able to read Python code and eventually start
# writing it as well.

# To consolidate everything we learned, let's write one small 'script' that does the following:
# 1. Reads our data file.
# 2. Extracts country names and calculates life expectancy changes.
# 3. Opens a new file for writing.
# 4a. Writes the index, name, and life expectancy change for countries where the change is negative.
# 4b. For all others, displays a message on screen (not in the file!).

# (1)
infile = open('gapminder_lifeexp.txt', 'r')
_ = infile.readline()

# (2)
countries = []
life_exp_changes = []

for line in infile:
    fields = line.strip().split(',')
    
    country = fields[1]
    le_change = float(fields[-1]) - float(fields[2])
    
    countries.append(country)
    life_exp_changes.append(le_change)

infile.close()

# (3)
outfile = open('negative_lifeExp.txt', 'w')

# Because we want to iterate over the two lists at the same time, we will use the range() built-in
# function to generate a sequence of indexes. There are other ways of doing this, but this one is the
# simplest for now.
#
# range() takes a number as an argument and returns a sequence of consecutive numbers from 0 up
# to but NOT including that number. (In Python 3 this is a 'range' object rather than a list, but
# you can loop over it just like a list.) This is particularly useful since positional indexes of
# lists also start from 0. We can therefore use the len() function to obtain the number of elements
# in a list and combine it with range() to obtain the numerical indexes that we can iterate on.
#
num_countries = len(countries)  # 142
index_list = range(num_countries)  # 0, 1, 2, ... until 141
for idx in index_list:
    country_name = countries[idx]  # we access the values of each list at position 'idx'
    value = life_exp_changes[idx]

    if value < 0:  # (4a)
        # The print function can take an additional keyword argument called 'file' that specifies
        # a file object to write to. We pass it as file=outfile, much like assigning a variable.
        # As a result, this print() will not display the values on the screen, but it will
        # write them to the file opened as 'outfile'.
        #
        # Also, since we are concatenating the values (+), they must all be strings,
        # otherwise we will get a TypeError.
        print(str(idx) + ',' + country_name + ',' + str(value), file=outfile)
    else:  # (4b)
        # This is a regular print(), which will display to our screen
        print('ignoring', countries[idx])

outfile.close()
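
One of the "other ways" to walk over two lists at the same time is the built-in zip(), which pairs their elements position by position. Similarly, enumerate() gives you each position together with its element. The sketch below uses made-up stand-ins for `countries` and `life_exp_changes`, and collects the negative cases in a list instead of writing a file.

```python
# Made-up stand-ins for the two lists built above.
countries = ["CountryA", "CountryB", "CountryC"]
life_exp_changes = [15.5, -2.0, 3.25]

negative = []
# zip() pairs the lists element by element: ('CountryA', 15.5), ('CountryB', -2.0), ...
for country_name, value in zip(countries, life_exp_changes):
    if value < 0:
        negative.append(country_name)
    else:
        print('ignoring', country_name)

# enumerate() gives the position and the element together, if you still need the index.
for idx, country_name in enumerate(countries):
    print(idx, country_name)

print(negative)  # ['CountryB']
```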

In [None]:
# The result is a newly created file in the folder where you have the Jupyter notebook.
# This file contains two lines, also comma-separated, with the index, name, and life expectancy
# change of the countries in our data set that had a lower life expectancy in 2007 than in 1952.