<h1 style="background-color: #cf6868; color: white; padding: 24px; line-height: 32px;">Learning Python in 90 minutes</h1>
<p>Hideki Kozima (xkozima@tohoku.ac.jp)</p>

## Contents
* <a href="#Why">Why Python?</a>
* <a href="#Action">Python in action</a>
* <a href="#Lists">Lists, tuples, dictionaries, and sets</a>
* <a href="#Structures">Control structures</a>
* <a href="#Strings">Strings</a>
* <a href="#Modules">Modules</a>
* <a href="#Comprehensions">List comprehensions</a>
* <a href="#Functions">Functions</a>
* <a href="#Files">Reading/writing files</a>

<a name="Why"></a>
<hr style="border-color: #cf6868; border-width: 12px;" />
## Why Python?

This is a hands-on tutorial of Python (Python 3), an advanced, general-purpose programming language.

* Python is an <strong>interpreted language</strong>. No need to compile. Python gives you an interactive programming environment.
* Python is <strong>dynamically typed</strong>. No need to specify types of variables and arguments. Python interpreters are equipped with an automatic garbage collector.
* Python uses <strong>white-space indentation</strong> for indicating <strong>code blocks</strong>. No need for curly braces or "begin" / "end".

Here is a simple example of a Python program that finds the minimum and maximum values in a list of numbers. Press "<strong>Control-Enter</strong>" in the following box to run it. Try it out!  (For now, you do not have to fully understand the code.)

In [None]:
theList = [14, 15, 92, 65, 35, 89, 79, 32, 38, 46]
first = True
for n in theList:
    print(n)
    if first:
        minimum = n
        maximum = n
        first = False
    elif n < minimum:
        minimum = n
    elif n > maximum:
        maximum = n
print("min =", minimum)
print("max =", maximum)

I would recommend using Python for learning "<strong>data science</strong>".  There are several reasons:
* Python is widely used in <strong>real-world applications</strong>,
* Python has strong connections with existing <strong>frameworks and libraries</strong> for "data analysis" and "machine learning", and
* Python also has advantages in graphical <strong>data visualization</strong>, which will be one of the key technologies needed in future collaboration between AI and humans.

Let us learn Python as a gateway to "data science".  I assume that you have some experience in computer programming, say, in C/C++, JavaScript, or other programming languages, but I will keep this tutorial as plain as possible.

<a name="Action"></a>
<hr style="border-color: #cf6868; border-width: 12px;" />
## Python in action

<hr style="border-color: #cf6868; border-width: 2px;" />
### Hello, world!

"<strong><tt>print()</tt></strong>" prints out the value of the argument(s).  See what the following code prints out.  Here we deal with <strong>string</strong>, <strong>int</strong> (integer), and <strong>float</strong> (real number).

In [None]:
# comment spans from "#" to the end of line
print("Hello, world!")
# "+" connects two strings
print("note" + "book")
# "print()" separates its arguments with a space
print("Welcome", "to", "Japan")
# expressions are evaluated, then printed out
print(2 * 2 + 2 / 2)
print("answer is", 2 * 2 + 2 / 2)
# "+" connects only strings, so make it string
print("answer=" + str(2 * 2 + 2 / 2))
# "+", "-", "*" between integers result in an integer
print(1+2, 2-3, 4*5, 6*7)
# if a float is among the operands, the result is also a float
print(1+2.0, 2.0-3, 4*5.0, 6.0*7.0)
# "/" always results in a float
print("10 /  3 =", 10 / 3)
# "//" for quotient, "%" for remainder
print("10 // 3 =", 10 // 3)
print("10 %  3 =", 10 % 3)
# 2 to the power of 16
print("2 ** 16 =", 2 ** 16)
# when "*" meets a string
print("Ha" * 5)
# for no "newline" (by default, end="\n")
print("Hello, world", end="")
print(" peace!")

<hr style="border-color: #cf6868; border-width: 2px;" />
### Variables and literals

<strong>No need to "declare" variables</strong>.  Just assign a value to a new identifier, and Python automatically creates the variable.  (However, you cannot read a variable before a value has been assigned to it.)

In [None]:
# slim beer glass
pi = 3.14159
radius = 2.0
height = 10.0
volume = pi * radius * radius * height
print(volume)
# fat beer glass
radius = 4.0
volume = pi * radius * radius * height
print(volume)

Identifiers are names like "<tt>skywalker</tt>", "<tt>R2D2</tt>", and "<tt>obi_wan</tt>".  Note that Python is <strong>case sensitive</strong>; "<tt>skywalker</tt>", "<tt>SkyWalker</tt>", and "<tt>SKYWALKER</tt>" are all different.

<div style="padding: 1em; border: solid 1px #cccccc;">
Below are "<strong>reserved words</strong>" or "<strong>keywords</strong>", which cannot be used for identifiers.
<pre>
False      await      else       import     pass
None       break      except     in         raise
True       class      finally    is         return
and        continue   for        lambda     try
as         def        from       nonlocal   while
assert     del        global     not        with
async      elif       if         or         yield
</pre>
</div>

A "<strong>literal</strong>" in Python is a notation in the code, like <tt>123</tt> or <tt>1.23</tt>, that produces a value of a specific data type.  For integers, you can also write <tt>0x2a</tt> (hexadecimal) or <tt>0b101010</tt> (binary).  For strings that include quotation marks, type <tt>"what's 'Python'"</tt> or <tt>'what is "Python"'</tt>.  You can also escape characters with a backslash, as in <tt>'what\'s'</tt> or <tt>'newline is \\\\n'</tt>.
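
As a quick check of the literal forms described above, the following sketch compares the three integer notations and prints the quoted strings. The variable names are made up for illustration only.

```python
# three notations for the same integer
dec = 42
hexa = 0x2a
bina = 0b101010
print(dec, hexa, bina)
print(dec == hexa == bina)
# mixing the two kinds of quotation marks
print("what's 'Python'")
print('what is "Python"')
# escaping with a backslash
escaped = 'what\'s'
print(escaped)
print('newline is \\n')
```

Note that all three integer literals produce the same value; the notation only matters in the source code.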

<hr style="border-color: #cf6868; border-width: 2px;" />
### Assignments

As in other programming languages, the expression on the right side is evaluated first, and the resulting value is then assigned to the variable on the left side.  C-like <strong>augmented assignments</strong> such as "<tt>+=</tt>" are OK.  However, "<tt>k++</tt>" and "<tt>k--</tt>" are not allowed; use "<tt>k += 1</tt>" or "<tt>k -= 1</tt>" instead.  Python also accepts <strong>multiple assignments</strong>, which are sometimes very useful.

In [None]:
k = 1
print(k)
# you cannot do "k++" or "k--" in Python
k += 1
print(k)
# "-=", "/=", "//=", etc. are also OK
k *= 10
print(k)
# exchange the values of k, j
j = 1
print("before: k, j =", k, ",", j)
j, k = k, j
print(" after: k, j =", k, ",", j)
# traditionally, you would do...
print("before: k, j =", k, ",", j)
tmp = k
k = j
j = tmp
print(" after: k, j =", k, ",", j)

<hr style="border-color: #cf6868; border-width: 2px;" />
### User input and data types

"<strong>input()</strong>" reads what the user types in (up to the "enter" key).  It always returns a <strong>string</strong>.  Take a look at the following code, which computes your BMI (Body Mass Index).  Keep it around 22.

In [None]:
# "input()" reads user input (with the prompt)
name = input("Name:")
weight = input("weight[kg]:")
height = input("height[cm]:")
# strings have to be converted to float
meters = float(height) / 100.0
BMI = float(weight) / meters ** 2
print(name + "'s BMI is", BMI)
# round(f, n) rounds the float f to n decimal places
print(name + "'s BMI is", round(BMI, 3))

When you need to convert a value to another "<strong>data type</strong>", use "<strong><tt>int()</tt></strong>", "<strong><tt>float()</tt></strong>", "<strong><tt>str()</tt></strong>", and "<strong><tt>bool()</tt></strong>".  The type "<strong>bool</strong>" is for a value of either <strong><tt>True</tt></strong> or <strong><tt>False</tt></strong>.  To get the data type of a variable, use "<strong><tt>type()</tt></strong>".

In [None]:
# strings to numbers
anInt = int("123")
print("int('123') =", anInt, type(anInt))
aFloat = float("123")
print("float('123') =", aFloat, type(aFloat))
aFloat2 = float(anInt)
print("float(123) =", aFloat2, type(aFloat2))
# numbers to strings
aString = str(anInt)
print("str(123) = '" + aString + "'", type(aString))
aString2 = str(aFloat)
print("str(123.0) = '" + aString2 + "'", type(aString2))
# bool is {True, False}
aBool = True
print("aBool = ", aBool, type(aBool))
# True <-> 1 and False <-> 0
anInt2 = int(aBool)
print("int(aBool) =", anInt2, type(anInt2))
aBool2 = bool(0)
print("bool(0) =", aBool2, type(aBool2))
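
A few conversions behave in ways that may surprise you. The sketch below is only an illustration with made-up variable names; it shows that "<tt>int()</tt>" truncates rather than rounds, and that "<tt>bool()</tt>" of a string only checks whether the string is empty.

```python
# int() truncates a float toward zero
t1 = int(3.7)
t2 = int(-3.7)
print(t1, t2)
# round() goes to the nearest integer instead
r1 = round(3.7)
r2 = round(-3.7)
print(r1, r2)
# an empty string is False; any non-empty string is True
b1 = bool("")
b2 = bool("False")
print(b1, b2)
# a string like "3.7" must go through float() first
t3 = int(float("3.7"))
print(t3)
```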

<hr style="border-color: #cf6868; border-width: 2px;" />
### Formatted printing

You might want to print numbers in a specified format, like "to the 3rd decimal place" or "in hexadecimal". Use the <a href="https://docs.python.org/3.4/library/string.html">"<tt>format()</tt>" method</a>.  Here are some examples.

In [None]:
# printing in a row
nl = [3.14, 1.59, 2.65, 3.58, 9.79]
for n in nl:
    print(n, end=", ")
print()
# printing integer and float
pi = 3.14159265358979
print("####:{0:4d}".format(32))
print("####:{0:04d}".format(38))
print("?.#####:{0:.5f}".format(pi))
print("##.#####:{0:8.5f}".format(pi))
print("#.##e-##:{0:5.2e}".format(pi/1000))
# hexadecimal (argument {0}, here 3238, is used twice)
print("xxxx:{0:04x} ({0:d})".format(3238))
# strings (left-aligned)
print("{0:s}'s weight is {1:.1f}kg, height is {2:3.0f}cm".format("Hideki", 63.25, 169.8))
print("ccccc/ccccc:{0:5s}/{1:5s}.".format("Subj", "Verb"))
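
Since Python 3.6, the same format specifications can also be written inline with formatted string literals ("f-strings"). The following is a minimal sketch of that alternative; the variable names are made up for illustration.

```python
pi = 3.14159265358979
name = "Hideki"
weight = 63.25
# the part after ":" is the same format spec as in format()
line1 = f"{pi:.5f}"
line2 = f"{3238:04x}"
line3 = f"{name}'s weight is {weight:.1f}kg"
print(line1)
print(line2)
print(line3)
```

An f-string is evaluated where it is written, so the variables inside the braces must already have values at that point.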

<a name="Lists"></a>
<hr style="border-color: #cf6868; border-width: 12px;" />
## Lists, tuples, dictionaries, and sets

Here we will review the major data structures.  For details, refer to <a href="https://docs.python.org/3/tutorial/datastructures.html">the official reference</a>.

<hr style="border-color: #cf6868; border-width: 2px;" />
### Lists (arrays)

A list is an <strong>array</strong> of objects, just as in other programming languages like JavaScript.

In [None]:
x = [14, 15, 92, 65, 35, 89, 79, 32, 38, 46]
print(x)
# x[n] accesses the n-th element (counting from n=0)
print(x[0])
# len() returns the number of elements
print(x[0], x[1], "...", x[len(x) - 2], x[len(x) - 1])
# negative index also works as follows
print(x[0], x[1], "...", x[-2], x[-1])
# min/max/sum
print("min max =", min(x), max(x))
print("sum =", sum(x))

Lists are <strong>mutable</strong> arrays, so you can change the contents, add a new element, and delete a particular element.

In [None]:
# you can change a particular element
x[0] = 314
print("change:", x)
# you can add a new element at the end
x.append(26)
print("append:", x)
# delete the k-th element
del x[5]
print("delete:", x)
# insert as the k-th element
x.insert(3, 123)
print("insert:", x)
# sort in the ascending/descending order
x.sort()
print("sort:", x)
x.sort(reverse=True)
print("sort:", x)

<hr style="border-color: #cf6868; border-width: 2px;" />
### Tuples (immutable arrays)

A tuple is similar to a list, but it is <strong>immutable</strong>, so you cannot change its elements once you create it.  Use round parentheses to enclose the elements.

In [None]:
y = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
print("y =", y)
print("len(y) =", len(y))
print(y[0], y[1], "...", y[-2], y[-1])

When you need to modify the contents of the tuple, you can change it into a list.  You can also change a list to a tuple.

In [None]:
# make the tuple y into a new list
yl = list(y)
print("list:", yl)
# so you can modify it
yl[5] = "Fri"
print("list:", yl)
# make the list x into a tuple
xt = tuple(x)
print(xt)
# the tuple xt is no longer mutable
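
To see what "immutable" means in practice, the sketch below tries to assign to an element of a small tuple. It uses "<tt>try... except...</tt>", which is explained later in this tutorial, only so that the error does not stop the cell.

```python
xt = (14, 15, 92)
try:
    # assignment to an element of a tuple is not allowed
    xt[0] = 314
except TypeError as e:
    print("TypeError:", e)
# the tuple is unchanged
print(xt)
```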

<hr style="border-color: #cf6868; border-width: 2px;" />
### Dictionaries (associative arrays)

A dictionary is an <strong>associative array</strong> in the form like <strong>JSON</strong>, namely <tt>dic = {key1:value1, key2:value2, ...}</tt>.  You can get the value for a key by "<tt>dic[key]</tt>".  You can add (or overwrite) a new value for a key by "<tt>dic[key] = value</tt>".  Note that dictionaries are mutable.

In [None]:
d = {"English":82, "Math":70, "History":58, "Science":87}
print(d)
print(len(d))
print(d["Math"])
# you can make change
d["Math"] = 95
print(d)
# you can add a new pair
d["Economy"] = 85
print(d)
# or delete a pair
del d["History"]
print(d)
# get all keys or values
print(d.keys())
print(d.values())
# above results can be converted into lists
dvl = list(d.values())
print(dvl)
print("sum =", sum(dvl))
print("avg =", sum(dvl) / len(dvl))
# items() gives all key:value pairs
kv = d.items()
print(kv)
kvl = list(kv)
print(kvl)
print(list(kv)[1][0])
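
In practice, "<tt>items()</tt>" is mostly used in a loop. Here is a small sketch (the "<tt>for</tt>" loop itself is covered in a later section). It also uses "<tt>get()</tt>", a dictionary method that returns a default value instead of raising an error for a missing key.

```python
d = {"English": 82, "Math": 95, "Science": 87}
# a "for" loop over items() yields (key, value) pairs
pairs = []
for subject, score in d.items():
    print(subject, score)
    pairs.append((subject, score))
# d["History"] would raise KeyError; get() returns the default instead
missing = d.get("History", "no score")
print(missing)
```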

<hr style="border-color: #cf6868; border-width: 2px;" />
### Sets (as in math)

A set is a collection of <strong>distinct</strong> elements.  Each of its elements appears only once in the set.  As in math, operations like "<tt><strong>&amp;</strong></tt>" (intersection), "<tt><strong>|</strong></tt>" (union), and "<tt><strong>-</strong></tt>" (difference) are defined for sets.

In [None]:
s1 = {"html", "css", "javascript", "python"}
s2 = {"javascript", "python", "c", "c++", "lisp"}
s3 = {"python", "R", "spss", "matlab"}
# some operations
print(s1, s2, s3)
print("coverage:", s1 | s2 | s3)
print("common:", s1 & s2 & s3)
# add/remove
s2.add("java")
print(s2)
s3.remove("spss")
print(s3)
# clear the set
s1.clear()
print(s1)
s1.add("actionscript")
print(s1)

Note that a set has <strong>no sequential order</strong> in its elements, so you <u>cannot</u> access an element by something like <tt>s2[1]</tt>.
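
The sketch below illustrates two points from this section: the "<tt>-</tt>" operator, and a common workaround for the missing order, namely converting the set into a sorted list with the built-in "<tt>sorted()</tt>". The variable names are only for illustration.

```python
s1 = {"html", "css", "javascript", "python"}
s2 = {"javascript", "python", "c", "c++", "lisp"}
# "-" keeps the elements of s1 that are not in s2
only1 = s1 - s2
print(sorted(only1))
# sorted() returns a list, which can be indexed
names = sorted(s2)
print(names[0], names[-1])
# duplicates are ignored when a set is created
s3 = {"python", "python", "R"}
print(len(s3))
```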

<a name="Structures"></a>
<hr style="border-color: #cf6868; border-width: 12px;" />
## Control structures

<hr style="border-color: #cf6868; border-width: 2px;" />
### "if" branches

Conditional statements with <a href="https://docs.python.org/3/tutorial/controlflow.html#if-statements">"<tt><strong>if</strong></tt>"</a> may take several patterns of <strong>control flow</strong>, as illustrated below.  Note that Python uses "<tt><strong>elif</strong></tt>" instead of "<tt><strong>else if</strong></tt>".

In [None]:
n = 123
if n % 2 == 0:
    print(n, "is an even number")
else:
    print(n, "is an odd number")

<div style="float: left;">
<pre>
if <i>&lt;condition&gt;</i>:
    <i>&lt;then-block&gt;</i>
</pre>
</div>
<div>
<img src="img/deepPyCond1.png" width="235">
</div>

<div style="float: left;">
<pre>
if <i>&lt;condition&gt;</i>:
    <i>&lt;then-block&gt;</i>
else:
    <i>&lt;else-block&gt;</i>
</pre>
</div>
<div>
<img src="img/deepPyCond2.png" width="235">
</div>

<div style="float: left;">
<pre>
if <i>&lt;condition1&gt;</i>:
    <i>&lt;block1&gt;</i>
elif <i>&lt;condition2&gt;</i>:
    <i>&lt;block2&gt;</i>
</pre>
</div>
<div>
<img src="img/deepPyCond3.png" width="300">
</div>

<div style="float: left;">
<pre>
if <i>&lt;condition1&gt;</i>:
    <i>&lt;block1&gt;</i>
elif <i>&lt;condition2&gt;</i>:
    <i>&lt;block2&gt;</i>
elif <i>&lt;condition3&gt;</i>:
    <i>&lt;block3&gt;</i>
</pre>
</div>
<div>
<img src="img/deepPyCond4.png" width="300">
</div>

<div style="float: left;">
<pre>
if <i>&lt;condition1&gt;</i>:
    <i>&lt;block1&gt;</i>
elif <i>&lt;condition2&gt;</i>:
    <i>&lt;block2&gt;</i>
elif <i>&lt;condition3&gt;</i>:
    <i>&lt;block3&gt;</i>
else:
    <i>&lt;block4&gt;</i>
</pre>
</div>
<div>
<img src="img/deepPyCond5.png" width="300">
</div>

For comparison, "<tt>==</tt>" (equal), "<tt>!=</tt>" (not equal), "<tt>&lt;</tt>", "<tt>&lt;=</tt>", "<tt>&gt;</tt>", "<tt>&gt;=</tt>" can be used as operators.  Another operator, "<tt>in</tt>", checks whether an item is in a collection (list, tuple, set, or the keys of a dictionary).

In [None]:
# works for [list], (tuple), {set}
collection = ["George", "Barack", "Donald"]
item = "Donald"
if item in collection:
    print("Yes,", item)
else:
    print("No.")

# for {dictio:nary}, "in" checks the keys
dictionary = {"George":55, "Barack":80, "Donald":45}
item = "George"
if item in dictionary:
    print("Yes,", item)
else:
    print("No.")

The following code exemplifies the conditional control flows illustrated above.  Try it out for <tt>year = 2017, 2020, 2000, 2100</tt>.  (2000 and 2020 are <a href="https://en.wikipedia.org/wiki/Leap_year">leap years</a>.)

In [None]:
year = 2017
# if-elif-else implementation
if year % 400 == 0:
    days = 366
elif year % 100 == 0:
    days = 365
elif year % 4 == 0:
    days = 366
else:
    days = 365
print("year", year, "has", days, "days")

Of course, you can embed another "if" statement inside the block of an "if" statement.

In [None]:
year = 2017
# nested if-else
if year % 4 == 0:
    if year % 100 == 0:
        if year % 400 == 0:
            days = 366
        else:
            days = 365
    else:
        days = 366
else:
    days = 365
print("year", year, "has", days, "days")

Conditional expressions (such as "<tt>year % 4 == 0</tt>") can be combined with other conditional expressions using "<strong><tt>and</tt></strong>" and "<strong><tt>or</tt></strong>".  To negate, use "<tt><strong>not</strong> (year % 4 == 0)</tt>".

In [None]:
year = 2017
# logical operation
if (year % 4 == 0) and ((year % 100 != 0) or (year % 400 == 0)):
    days = 366
else:
    days = 365
print("year", year, "has", days, "days")
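
If you want to check the rule for all four test years at once, you could compare it with "<tt>calendar.isleap()</tt>" from the standard library, as in the sketch below. It borrows "<tt>for</tt>" and "<tt>import</tt>", which are explained in later sections; the list name <tt>results</tt> is made up for illustration.

```python
import calendar

results = []
for year in [2017, 2020, 2000, 2100]:
    # the same logical expression as above
    if (year % 4 == 0) and ((year % 100 != 0) or (year % 400 == 0)):
        days = 366
    else:
        days = 365
    results.append((year, days))
    # calendar.isleap() is the standard-library check
    print(year, days, calendar.isleap(year))
```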

<hr style="border-color: #cf6868; border-width: 2px;" />
### "for" loops

<a href="https://docs.python.org/3/tutorial/controlflow.html#for-statements">"<strong><tt>for</tt></strong>"</a> loops are the most common form of iteration.  Try out the following example.  The "<tt>for</tt>" loop computes the sum of 1, 2, ..., 10. Python often uses a list, like <tt>[1, 2, ..., 10]</tt>, for such iterations.

In [None]:
total = 0
for k in [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]:
    total += k
print("1+...+10 =", total)

Never forget to put a <strong>"<tt>:</tt>" (colon)</strong> at the end of the "for" line. The colon indicates that a <strong>block</strong> starts from the next line.

A block consists of consecutive lines sharing <strong>the same level of indentation</strong> (by leading spaces). This is sometimes called the "off-side rule". It is strongly recommended to use <strong>spaces</strong>, not tabs, for this indentation. A 4-space indentation is commonly used in Python communities.

In [None]:
loops = 0
total = 0
start = 1
end   = 1001
for k in range(start, end):
    loops += 1
    total += k
print("num of loops =", loops)
print(str(start) + "+...+" + str(end-1) + " =", total)

Here, "<tt>range()</tt>" generates a range of consecutive numbers, which can be used in "for" loops.

In [2]:
print(range(1, 10))
# make it a "list" to see the contents of the range
print("range(1, 10) =", list(range(1, 10)))
# the third argument is "step"
print("range(1, 10, 2) =", list(range(1, 10, 2)))
# simply "range(n)" gives "range(0, n)"
print("range(5) =", list(range(5)))

range(1, 10)
range(1, 10) = [1, 2, 3, 4, 5, 6, 7, 8, 9]
range(1, 10, 2) = [1, 3, 5, 7, 9]
range(5) = [0, 1, 2, 3, 4]


<hr style="border-color: #cf6868; border-width: 2px;" />
### More about "for" loops

For <strong>reverse order</strong> iterations, use a negative step in "<tt>range(start, end, step)</tt>".

In [None]:
for k in range(10, 0, -1):
    print(k)

When you want to use indices as well as elements in a list, use <tt>enumerate()</tt> as in the example below.

In [None]:
tohoku = ["Miyagi", "Fukushima", "Yamagata", "Iwate", "Akita", "Aomori"]
for (i, name) in enumerate(tohoku):
    print(i, name)

When you want to scan two lists in parallel, use <tt>zip()</tt> as in the next example.

In [None]:
Greek = ["Zeus",    "Poseidon", "Dionysus", "Athena",  "Aphrodite"]
Roman = ["Jupiter", "Neptune",  "Bacchus",  "Minerva", "Venus"]
for (g, r) in zip(Greek, Roman):
    print(g, r)

Any "<tt>for</tt>" loop may include an "<strong><tt>else</tt></strong>" block.  The "<tt>else</tt>" block is executed only if no "<tt>break</tt>" occurred inside the loop.  In other words, the "else" block runs only after <strong>all</strong> elements have been processed.

In [None]:
total = 0
for score in [80, 92, 83, 57, 88]:
    # try changing the threshold from 60 to 50
    if score < 60:
        print("FAILED")
        break
    total += score
else:
    print("TOTAL:", total)
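
A typical use of the "else" block is a search: the loop breaks when something is found, and "else" handles the "not found" case. The following sketch (with made-up names) looks for the smallest divisor of each number.

```python
results = []
for n in [91, 97]:
    for k in range(2, n):
        if n % k == 0:
            # found a divisor, so stop searching
            results.append((n, k))
            break
    else:
        # reached only when the inner loop finished without "break"
        results.append((n, "prime"))
print(results)
```

Here the "else" belongs to the inner "for", not to the "if"; the indentation tells them apart.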

<hr style="border-color: #cf6868; border-width: 2px;" />
### "while" loops

Another way to make a loop is "while". Try this example. This program generates "Fibonacci sequence" and computes the "golden ratio" (1.6180339887...).

In [None]:
a = 1
b = 1
k = 1
while k <= 10:
    print(k, b, b / a)
    nxt = a + b    # "next" is a built-in function, so avoid that name
    a = b
    b = nxt
    k += 1
print("Now, k =", k)

Note that you need to update the control variable ("k" here) yourself; otherwise, Python could be trapped in an infinite loop.

<hr style="border-color: #cf6868; border-width: 2px;" />
### "break" and "continue" in a loop

You can make an infinite loop by "<strong><tt>while True:</tt></strong>" and specify the condition to terminate the loop.  "<strong><tt>break</tt></strong>" terminates the innermost loop.

In [None]:
a = 1
b = 1
k = 1
while True:
    print(k, b, b / a)
    b, a = a + b, b    # multiple value assignment
    k += 1
    if b > 100:
        break
print("Now, k =", k)

"<tt>break</tt>" can also be used in "<tt><strong>for</strong></tt>" loops.  As soon as "<tt>break</tt>" is encountered, the loop terminates.

In [None]:
a = 1
b = 1
for k in range(1, 100):
    print(k, b, b / a)
    b, a = a + b, b
    if b > 100:
        break
print("Now, k =", k)

When a loop encounters "<strong><tt>continue</tt></strong>", it skips the rest of the block, goes back to the loop head, and proceeds with the next iteration.

In [None]:
for x in range(-10, 10):
    y = x ** 3
    if (y > 100.0) or (y < -100.0):
        continue
    print(x, y)

<hr style="border-color: #cf6868; border-width: 2px;" />
### Exceptions (try... except...)

Using <a href="https://docs.python.org/3/library/exceptions.html">"<tt>try... except...</tt>"</a>, you can handle <strong>errors</strong> so that they do not stop your program.

In [None]:
try:
    age = int(input("Age:"))
except:
    print("An integer is required.")
else:
    birthYear = 2017 - age
    print("You were born in", birthYear, "or", birthYear - 1)
finally:
    print("Thank you for using me!")

The example above catches errors of all types.  You can also specify the exception type, such as <tt>ValueError</tt> or <tt>ArithmeticError</tt> (whose subtypes include <tt>OverflowError</tt> and <tt>ZeroDivisionError</tt>).  For example, use "<tt>except ValueError:</tt>" to handle <tt>ValueError</tt> only.  You can write multiple except blocks for different error types.
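
Here is a minimal sketch of multiple except blocks. The helper <tt>safe_ratio()</tt> is a hypothetical function made up for this illustration (defining functions with "<tt>def</tt>" is covered in a later section); it takes two strings so that no user input is needed.

```python
def safe_ratio(a_text, b_text):
    try:
        ratio = float(a_text) / float(b_text)
    except ValueError:
        # float() could not parse one of the strings
        return "not a number"
    except ZeroDivisionError:
        # the second number was zero
        return "division by zero"
    else:
        return ratio

print(safe_ratio("10", "4"))
print(safe_ratio("ten", "4"))
print(safe_ratio("10", "0"))
```

Listing specific exception types keeps unrelated errors (for example, a typo in a variable name) visible instead of silently swallowing them.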

<a name="Strings"></a>
<hr style="border-color: #cf6868; border-width: 12px;" />
## Strings

<hr style="border-color: #cf6868; border-width: 2px;" />
### String basics

The behavior of <a href="https://docs.python.org/3/library/stdtypes.html#text-sequence-type-str">strings</a> is intuitive.  It is similar to that in JavaScript.

In [None]:
s = "Hello, world!"
# upper/lower case
print("lower:", s.lower())
print("upper:", s.upper())
# the original string never changes
print("original:", s)
# a string is a sequence of characters, so len() also works
print(len(s))
# so indexing and slicing also work
print(s[0], s[4], s[7], s[-1])
print(s[7:12])

"<tt>format()</tt>" can be used outside of "<tt>print()</tt>".

In [None]:
base = "I'm sorry, {0:s}.  I'm afraid, but I can't do that."
ans = base.format("Dave")
print(ans)

<hr style="border-color: #cf6868; border-width: 2px;" />
### Split and connect

When you want to tokenize (split) a string into a list of words, use "<tt>split()</tt>".  Conversely, "<tt>join()</tt>" concatenates a list of words into a single string.

In [None]:
s = "We learn Python for data science. Python is great!"
# here we use " " (a space) as the delimiter
tokens = s.split(" ")
print(tokens)
# "delim".join(strList) connect strings with "delim" in between
s2 = "/".join(tokens)
print(s2)

If you omit the delimiter in "<tt>split(delimiter)</tt>", it uses default delimiters (any sequence made of "<tt> </tt>", "<tt>\t</tt>", "<tt>\n</tt>", or "<tt>\r</tt>") to split a string into words.

In [None]:
wl = "There  must\tbe\n\n   an\r\nangel".split()
print(wl)
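
The difference between "<tt>split(" ")</tt>" and "<tt>split()</tt>" shows up when words are separated by more than one space. The sketch below uses a made-up sentence to compare the two.

```python
text = "There  must\tbe an  angel"
# an explicit " " splits at every single space
a = text.split(" ")
print(a)
# no argument: a run of whitespace counts as one delimiter
b = text.split()
print(b)
```

With an explicit delimiter, two consecutive spaces produce an empty string in the result, and the tab is not treated as a delimiter at all.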

If you want to split a string into characters, you can do it in a couple of ways.

In [None]:
for c in "Wow":
    print(c)

In [None]:
s = "Wow"
cs = list(s)
print(cs)

<hr style="border-color: #cf6868; border-width: 2px;" />
### Internationalization (i18n)

You can use non-roman letters, such as Japanese letters, in Python strings.  Python 3 strings are <strong>Unicode</strong> (and source files are read as "<strong>utf-8</strong>" by default), so they can handle virtually all languages on this planet.

In [None]:
# Japanese for "This is a string written in Japanese"
j = "これは日本語で書いた文字列です"
# indexing also works
print(j[3:6] + j[-2:])
for c in (j[4:6] + j[-2:]):
    print(c)

In [None]:
# ロシア語です（日本語のコメントもOK）  = "This is Russian (Japanese comments are OK, too)"
ru = "Здравствуйте"
print(ru)
for c in ru[1:]:
    print(c)

<hr style="border-color: #cf6868; border-width: 2px;" />
### Search and replace

Searching a substring in a string is quite easy in Python.  Use "<tt>find()</tt>", which returns the index of the first occurrence (if no occurrence, -1).

In [None]:
#    012345678901234567890123456789012345678901234567890
s = "We learn Python for data science. Python is great!"
# find() returns the index of the first occurrence
print(s.find("Python"))
# if not found, find() returns -1
print(s.find("JavaScript"))
# count() returns the number of occurrences
print(s.count("Python"))
# "in" checks the occurrence and returns True/False
keyword = "Python"
if (keyword in s):
    print("Yes, we have", keyword)
else:
    print("Sorry, we don't have", keyword)

Sometimes you need to find every occurrence of a keyword in a string.  In such a case, you could use something like the following.  Note that "<tt>find(keyword, start)</tt>" searches for the keyword starting from the index "start".

In [None]:
#    0123456789012345678901234567890123456789
s = "Your Python may not work on Monday."
keyword = "on"
keylen = len(keyword)
index = s.find(keyword)
while index != -1:
    print(index)
    index = s.find(keyword, index + keylen)

If you want to perform a case-insensitive search or count, you can do something like the following.

In [None]:
# case-insensitive search: compare the "lowered" string and keyword
s = "You know pi is 3.14... and pie is something to eat."
keyword = "Pi"
index = s.lower().find(keyword.lower())
print(" " * index + "v")
print(s)
print("Found", s.lower().count(keyword.lower()), "pi(es)")

To replace a substring with another, use "<tt>replace(old, new)</tt>", which replaces all occurrences of "old" with "new".

In [None]:
s = "We learn Python for data science. Python is great!"
s2 = s.replace("Python", "JavaScript")
print(s2)
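
If you want to replace only the first occurrence, "<tt>replace()</tt>" accepts an optional third argument that limits the number of replacements. A minimal sketch (the name <tt>s3</tt> is made up here):

```python
s = "We learn Python for data science. Python is great!"
# the optional third argument limits the number of replacements
s3 = s.replace("Python", "JavaScript", 1)
print(s3)
# replace() returns a new string; the original is unchanged
print(s)
```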

<a name="Modules"></a>
<hr style="border-color: #cf6868; border-width: 12px;" />
## Modules

A module is a dynamically loadable package of functions and constants. There are standard modules like "<a href="https://docs.python.org/3/library/math.html"><tt>math</tt></a>" (exp, log, sin, etc.) and "<a href="https://docs.python.org/3/library/random.html"><tt>random</tt></a>" (randint, randrange, etc.).  You can also make your own modules.

<hr style="border-color: #cf6868; border-width: 2px;" />
### Math

To incorporate a module (for example, "<tt>math</tt>") into your program, just say "<tt>import math</tt>". After that line, you can use any function or constant in the "<tt>math</tt>" module, like <tt>math.sqrt()</tt> and <tt>math.pi</tt> (3.14159...). Take a look at this example.

In [None]:
import math
for i in range(1, 26):
    print("sqrt(" + str(i) + ") =", math.sqrt(i))

Here is a list of "math" functions and constants that you might often use.

<pre>
ceil(x)      floor(x)     trunc(x)     modf(x)
fabs(x)      log(x)       log2(x)      log10(x)
exp(x)       pow(x, y)    sqrt(x)      hypot(x, y)
sin(x)       asin(x)      cos(x)       acos(x)
tan(x)       atan(x)      radians(x)   degrees(x)
pi           e
</pre>
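
Here is a short sketch that exercises several of the functions listed above. Note that the trigonometric functions take radians, and that "<tt>log()</tt>" is the natural logarithm.

```python
import math
# floor/ceil round down/up to an integer
print(math.floor(2.7), math.ceil(2.2))
# hypot(x, y) is sqrt(x*x + y*y)
print(math.hypot(3, 4))
# convert between degrees and radians
angle = math.radians(180)
print(angle)
print(math.degrees(math.pi / 2))
# log() is the natural logarithm (base e)
print(math.log(math.e), math.log2(8), math.log10(1000))
```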

<hr style="border-color: #cf6868; border-width: 2px;" />
### Random

Sometimes the module name, such as <a href="https://docs.python.org/3/library/random.html">"<tt>random</tt>"</a>, is a bit too long to write in front of every function and constant name. In such a case, you can <strong>rename</strong> it as in the following example, where "<tt>ran.randrange(0, 4)</tt>" returns 0, 1, 2, or 3, randomly.

In [None]:
import random as ran
bases = list("AGCT")
for i in range(0, 100):
    k = ran.randrange(0, len(bases))
    print(bases[k], end="")
print()

When you want to have random numbers in floats, use "<tt>random()</tt>" or "<tt>uniform(min, max)</tt>" instead.

In [None]:
import random as ran
for i in range(0, 10):
    # uniform in [0, 1)
    r1 = ran.random()
    # uniform in [min, max]
    r2 = ran.uniform(-1.0, 1.0)
    print("r1 = {0:.5f},  r2 = {1:8.5f}".format(r1, r2))

Here are other convenient features of "<tt>random</tt>", namely "<tt>choice()</tt>" and "<tt>shuffle()</tt>".

In [None]:
import random as ran
a = ["L", "R", "F", "B"]
# feed a "seed" (from the current time)
ran.seed()
# random choice
print("random choice: ", end="")
for i in range(0, 20):
    action = ran.choice(a)
    print(action, end=" ")
print()
# shuffle it
print("original:", a)
for i in range(0, 4):
    ran.shuffle(a)
    print("shuffled:", a)
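
When you need the same "random" sequence every time (for example, to reproduce an experiment), you can pass a fixed number to "<tt>seed()</tt>". The sketch below, with an arbitrarily chosen seed value of 123, generates the same ten numbers twice.

```python
import random as ran
# a fixed seed makes the "random" sequence repeatable
ran.seed(123)
first = []
for i in range(10):
    first.append(ran.randrange(0, 4))
# feeding the same seed again restarts the same sequence
ran.seed(123)
second = []
for i in range(10):
    second.append(ran.randrange(0, 4))
print(first)
print(second)
print(first == second)
```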

<hr style="border-color: #cf6868; border-width: 2px;" />
### Other modules

There are tons of modules available. Some modules you will be using in the study of "data science" may include:

* numpy : Numerical computation tools including n-dimensional arrays.
* matplotlib : Plotting tools for data visualization.
* scipy : Works with numpy for computing integrals, differential equations, etc.
* pandas : Data manipulation tools for statistical data analysis.

<a name="Comprehensions"></a>
<hr style="border-color: #cf6868; border-width: 12px;" />
## List comprehensions

"List comprehensions" are a way of generating lists with a notation close to mathematical set notation, like the following.

* $A = \left\{ 2^n : n \in \mathbb{N}, \ n < 10 \right\}$
* $B = \left\{ x \mid x \in A, \ x < 100 \right\}$

In [None]:
# this works as a simple mapping
A = [2 ** n for n in range(0, 10)]
print(A)
# and this works as a filter
B = [x for x in A if x < 100]
print(B)

You could also filter something out of an existing list.

In [None]:
dic = [("Rod", "M"), ("Cindy", "F"), ("Brian", "M"), 
       ("Jessy", "F"), ("Matt", "M"), ("Paul", "M"), 
       ("Charlie", "M"), ("Marilyn", "F") ]
women = [p[0] for p in dic if p[1] == "F"]
print(women)

In [None]:
tests = ["English", "Math", "Science", "History", "Essay"]
scores = [90, 85, 88, 57, 64]
# you can make a new dictionary using comprehension
dic = {t : s for (t, s) in zip(tests, scores) }
print(dic)

List comprehensions are quite powerful.  You could nest another "<tt>for</tt>" inside the "<tt>for</tt>" of a comprehension.

In [None]:
# embedded comprehensions
a1 = [[i, j] for i in range(1, 4) for j in range(7, 10)]
print("a1  =", a1)
# you can rewrite above as follows
a2 = []
for i in range(1, 4):
    for j in range(7, 10):
        a2.append([i, j])
print("a2 =", a2)

You can also use another comprehension inside a comprehension. The example below generates a list of prime numbers up to 50.  Note that this is a somewhat "tricky" example.

In [None]:
pr = [p for p in range(2, 50) if p not in [np for i in range(2, 8) for np in range(i * 2, 50, i)]]
print(pr)
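
For comparison, here is a sketch of the same prime list written with trial division instead of a list of non-primes. It relies on the built-in "<tt>all()</tt>", which is not used elsewhere in this tutorial; the name <tt>pr2</tt> is made up for this example.

```python
# p is prime if no d (with d * d <= p) divides it
pr2 = [p for p in range(2, 50)
       if all(p % d != 0 for d in range(2, p) if d * d <= p)]
print(pr2)
```

This version does not depend on the hard-coded bound 8, so it keeps working when you change 50 to a larger number.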

<a name="Functions"></a>
<hr style="border-color: #cf6868; border-width: 12px;" />
## Functions

<hr style="border-color: #cf6868; border-width: 2px;" />
### Defining functions

A <strong>function</strong> in Python is, just like functions in other programming languages, a chunk of reusable code to perform a certain "function". We have already used built-in functions like "<tt>print()</tt>" and others. Here we learn to make your own functions. Look at the two examples below. Both compute the factorial of n (n! = 1 × 2 × ... × n).

In [None]:
def fact(n):
    res = 1
    for i in range(1, n + 1):
        res *= i
    return res
for i in range(1, 10):
    print("fact({0:d}) = {1:d}".format(i, fact(i)))

A function terminates and returns a value when it encounters a "<strong><tt>return</tt></strong>" statement.  If the function does not have "<tt>return</tt>", no value is returned. (Strictly speaking, such a function returns "<tt>None</tt>", which is printed as "<tt>None</tt>".)

The second example, shown below, uses <strong>recursion</strong>. The recursion stops at n = 1 (for 1! = 1); otherwise, it computes (n - 1)! by calling itself recursively, then computes n × (n - 1)! to get the result of n!.

In [None]:
def fact(n):
    if n <= 1:
        return 1
    else:
        return n * fact(n - 1)
for i in range(1, 10):
    print("fact({0:d}) = {1:d}".format(i, fact(i)))

A function can return a string, a list, or any other object.

In [None]:
def week(n):
    if n == 1:
        return "Monday"
    elif n == 2:
        return "Tuesday"
    elif n == 3:
        return "Wednesday"
    elif n == 4:
        return "Thursday"
    elif n == 5:
        return "Friday"
    elif n == 6:
        return "Saturday"
    elif n == 7:
        return "Sunday"
for i in range(1, 8):
    print(i, week(i))
def weekDay():
    return [week(1), week(2), week(3), week(4), week(5)]
def weekEnd():
    return [week(6), week(7)]
print(weekDay())
print(weekEnd())

Note that Python does not have a "<tt>switch-case</tt>" control structure.  Use "<tt>if-elif-elif-...-else</tt>" instead.  (Python 3.10 and later also provide a "<tt>match-case</tt>" statement for similar purposes.)
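
When every branch just maps one value to another, a dictionary lookup is a common alternative to a long "<tt>if-elif</tt>" chain.  The following is a sketch only; "<tt>DAY_NAMES</tt>" and "<tt>week2()</tt>" are hypothetical names, not part of the examples above.

```python
# hypothetical lookup table: day number -> day name
DAY_NAMES = {
    1: "Monday", 2: "Tuesday", 3: "Wednesday", 4: "Thursday",
    5: "Friday", 6: "Saturday", 7: "Sunday",
}

def week2(n):
    # dict.get() returns None for an unknown key,
    # just as week() returns None when no branch matches
    return DAY_NAMES.get(n)

for i in range(1, 8):
    print(i, week2(i))
```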

<hr style="border-color: #cf6868; border-width: 2px;" />
### Scope rules

Arguments of a function (for example, "<tt>n</tt>" in "<tt>def foo(n):</tt>" in the example below) work as <strong>local</strong> variables, so any assignment to them does not affect anything outside the function.

In [None]:
def foo(n):
    # arguments are "local" variables
    n = n + 1
    print("local:  n =", n)
# this is "global"
n = 1
foo(n)
# "global" variables do not change
print("global: n =", n)

You can <strong>read</strong> global variables from inside a function.

In [None]:
def foo(n):
    # arguments are "local" variables
    print("through: m =", m)
    n = m + 1
    print("local:   n =", n)
# this is "global"
m = 2
n = 1
foo(n)
# "global" variables do not change
print("global:  m =", m)
print("global:  n =", n)

However, <strong>any variable that is assigned a value inside the function</strong> is considered to be <strong>local</strong>.  Assignments to such local variables do not affect anything outside the function.

In [None]:
def foo(n):
    # assignment to the "local" variable
    m = n * 2
    print("local:  m =", m)
# this is "global"
m = 1
foo(1)
# "global" variables do not change
print("global: m =", m)
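
One consequence of this rule is worth seeing once: a variable that is assigned anywhere in the function body is local throughout the whole body, so reading it before the assignment fails even if a global variable of the same name exists.  The sketch below uses a hypothetical function "<tt>bar()</tt>" to show this.

```python
m = 1

def bar():
    # "m" is assigned further down, so it is local in the WHOLE body
    try:
        print(m)          # the local "m" has no value yet
        failed = False
    except UnboundLocalError as err:
        print("error:", err)
        failed = True
    m = 2                 # this assignment is what makes "m" local
    return failed

failed = bar()
# the global "m" is untouched
print("global: m =", m)
```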

If you really need to assign a value to a "global" variable, you need to explicitly "declare" that the variable is "<strong><tt>global</tt></strong>".

In [None]:
def foo(n):
    # now m is imported from "global"
    global m
    m = n * 2
    print("inside: m =", m)
    
# this is "global"
m = 1
foo(1)
# m is changed inside foo()
print("global: m =", m)

<hr style="border-color: #cf6868; border-width: 2px;" />
### Keywords and defaults

When you call a function, you can give an argument in "<strong><tt>keyword=value</tt></strong>" style, which lets you arrange the arguments in an arbitrary order.

In [None]:
def cylinderVolume(radius, height):
    return 3.14159 * radius * radius * height
# all produce the same result
print(cylinderVolume(5, 10))
print(cylinderVolume(radius=5, height=10))
print(cylinderVolume(height=10, radius=5))

When you define a function, you can give a <strong>default value</strong> to an argument.  When the function is called with a missing argument, the default value will be used in the function.

In [None]:
def cylinderVolume(radius, height=10, myPi=3.14159):
    return myPi * radius * radius * height
print(cylinderVolume(5, 10, 3.0))
print(cylinderVolume(5, myPi=3.0))
print(cylinderVolume(5, 10))
print(cylinderVolume(5))

<hr style="border-color: #cf6868; border-width: 2px;" />
### Arguments of mutable data

When you pass mutable data (lists, dictionaries, sets, etc.) as an argument to a function, any in-place change made to the argument in the function body is visible in the outer environment.  See the example below.

In [None]:
def addPoo(lst):
    lst.append("Poo")
a = ["Head", "Body"]
print("before:", a)
addPoo(a)
print("after: ", a)

For mutable data passed as an argument, the object outside the function and the object inside the function are identical.

In [None]:
def addPoo(lst):
    lst.append("Poo")
    print("inside:", id(lst), lst)
a = ["Head", "Body"]
print("before:", id(a), a)
addPoo(a)
print("after: ", id(a), a)

The example above shows that the outside data ("<tt>a</tt>" here) and the inside data ("<tt>lst</tt>" here) share the same <tt>id</tt>, which can be thought of as the "memory address" of the object (in CPython, "<tt>id()</tt>" actually returns that address).
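
It may help to contrast changing a list in place with assigning a new list to the argument, and to see how a copy protects the original.  This is a minimal sketch; "<tt>addTail()</tt>" and "<tt>rebind()</tt>" are hypothetical functions made up for the illustration.

```python
def addTail(lst):
    # changes the caller's list in place
    lst.append("Tail")

def rebind(lst):
    # assigns a new list to the local name only; the caller's list is untouched
    lst = ["New"]

a = ["Head", "Body"]
rebind(a)
print("after rebind:       ", a)
addTail(a.copy())    # pass a shallow copy to protect the original
print("after addTail(copy):", a)
addTail(a)
print("after addTail(a):   ", a)
```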

<hr style="border-color: #cf6868; border-width: 2px;" />
### Variable-length arguments

You can define a function that takes an arbitrary number of arguments.  The extra arguments are collected into a tuple ("<tt>rest</tt>" in the example below).

In [None]:
def vari(first, *rest):
    print(1, first, end=", ")
    for i, arg in enumerate(rest):
        print(i + 2, arg, end=", ")
    print()
vari("One")
vari("One", "Two")
vari("One", "Two", "Three", "Four")
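
Two related features are sketched below under the same idea: "<tt>**</tt>" collects extra keyword arguments into a dictionary, and "<tt>*</tt>" at the call site spreads a list over the positional arguments.  The function "<tt>vari2()</tt>" and the names "<tt>options</tt>" and "<tt>words</tt>" are hypothetical.

```python
def vari2(first, *rest, **options):
    # "rest" is a tuple of the extra positional arguments
    # "options" is a dictionary of the extra keyword arguments
    return first, rest, options

print(vari2("One"))
print(vari2("One", "Two", sep="-"))
# "*words" unpacks the list into separate positional arguments
words = ["One", "Two", "Three"]
print(vari2(*words))
```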

<hr style="border-color: #cf6868; border-width: 2px;" />
### Lambda expressions

<strong>Lambda expressions</strong> can be considered as functions without names.  As exemplified below, "<tt>lambda z:...</tt>" is a function that evaluates an expression ("<tt>...</tt>") on the argument "<tt>z</tt>" and returns its value.  Note that "<tt>A if C else B</tt>" evaluates to A if C is True, and to B otherwise.

In [None]:
# lambda expression itself does not do anything
lambda z: 1 if z >= 0 else 0
# (lambda...)() works as a function
z = 1
y = (lambda z: 1 if z >= 0 else 0)(z)
print("{0:5.2f} {1:5.2f}".format(z, y))
z = -1
y = (lambda z: 1 if z >= 0 else 0)(z)
print("{0:5.2f} {1:5.2f}".format(z, y))

In [None]:
phi = lambda z: 1 if z >= 0 else 0
for z in [-1.0, -0.5, 0.0, 0.5, 1.0]:
    y = phi(z)
    print("{0:5.2f} {1:5.2f}".format(z, y))

Lambda expressions can be passed as an argument to a function.  In the following example, the function "<tt>mother()</tt>" takes different formulas (such as $x^2$, $x^3$, and $x^4$) to compute.

In [None]:
# mother of all functions?
def mother(f):
    for x in [-2.0, -1.0, 0.0, 1.0, 2.0]:
        y = f(x)
        print("{0:5.2f} {1:5.2f}".format(x, y))
# give "lambda" to mother
print("lambda x: x ** 2")
mother(lambda x: x ** 2)
# give f3 to the mother
f3 = lambda x: x ** 3
print("lambda x: x ** 3")
mother(f3)
# give function to the mother
print("def f4(x):...")
def f4(x):
    return x ** 4
mother(f4)

<hr style="border-color: #cf6868; border-width: 2px;" />
### Mapping, filtering, and sorting

Lambda expressions can also be used for mapping, filtering, and sorting the elements of a list.

In [None]:
miles = [314, 159, 265, 358, 97]
# using loop
kms = []
for mi in miles:
    kms.append(mi * 1.609)
print("loop:  ", kms)
# using comprehension
kms = [mi * 1.609 for mi in miles]
print("compre:", kms)
# using lambda and "map()"
kms = map(lambda mi: mi * 1.609, miles)
print("lambda:", list(kms))
# using indirect lambda and "map()"
f = lambda mi: mi * 1.609
kms = map(f, miles)
print("f=lamb:", list(kms))

In [None]:
numbers = [314, 159, 265, 358, 97, 323, 84]
mod2 = lambda num: num % 2 == 0
# "filter()" selects items that give True
evens = filter(mod2, numbers)
print(list(evens))
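
As with "<tt>map()</tt>", the same filtering can be written as a comprehension with an "<tt>if</tt>" clause.  A minimal sketch using the same numbers:

```python
numbers = [314, 159, 265, 358, 97, 323, 84]
# keep only the items for which the condition is True
evens = [num for num in numbers if num % 2 == 0]
print(evens)
```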

In [None]:
# sort a list by a key
score = [("Math", 72), ("Essay", 64), ("Science", 87), ("English", 85)]
score2 = sorted(score, key=lambda p:p[1], reverse=True)
print(score2)
# for dictionary, you need to convert it into a list of tuples
dic = {"Math": 72, "Essay": 64, "Science": 87, "English": 85}
dic2lt = dic.items()    # dic -> view of pairs: ("Math", 72), ...
dic2 = sorted(dic2lt, key=lambda p:p[0])    # alphabetical order
print(dic2)
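
The sorted pairs can be turned back into a dictionary with "<tt>dict()</tt>", since dictionaries keep insertion order in Python 3.7 and later.  The sketch below sorts by value instead of by key; the names "<tt>by_score</tt>", "<tt>ranking</tt>", and "<tt>best</tt>" are introduced here for illustration.

```python
dic = {"Math": 72, "Essay": 64, "Science": 87, "English": 85}
# sort the (key, value) pairs by value, highest first
by_score = sorted(dic.items(), key=lambda p: p[1], reverse=True)
# rebuild a dictionary that remembers this order
ranking = dict(by_score)
print(ranking)
# "key=" also works with max() and min()
best = max(dic, key=lambda k: dic[k])
print(best)
```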

<hr style="border-color: #cf6868; border-width: 2px;" />
### Generators

A function that contains "<strong><tt>yield</tt></strong>" returns a <strong>generator</strong>, which trickles out the elements of a sequence one at a time.  Each time its "<tt>\_\_next\_\_()</tt>" method is called, the generator returns the "next" element in the sequence.

In [None]:
def trickle(string):
    for char in string:
        yield "letter-{0:s}".format(char)
gen = trickle("Peace")
print(gen.__next__())
print(gen.__next__())
print(gen.__next__())
print(gen.__next__())
print(gen.__next__())
# calling "__next__()" once more would raise a StopIteration exception

In [None]:
def trickle(string):
    for char in string:
        yield "letter-'{0:s}'".format(char)
for c in trickle("Happiness"):
    print(c)
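
In everyday code the built-in function "<tt>next()</tt>" is usually preferred to calling "<tt>\_\_next\_\_()</tt>" directly.  The sketch below also shows what happens when the generator runs out, and how a default value given to "<tt>next()</tt>" avoids the exception.

```python
def trickle(string):
    for char in string:
        yield "letter-{0:s}".format(char)

gen = trickle("Hi")
print(next(gen))
print(next(gen))
try:
    next(gen)            # nothing left to yield
except StopIteration:
    print("generator is exhausted")
# with a default value, next() returns it instead of raising
print(next(gen, "no more letters"))
```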

<a name="Files"></a>
<hr style="border-color: #cf6868; border-width: 12px;" />
## Reading/writing files

You can read/write text files and binary files.  Assume you have the following text file, located at "data/textfile.txt" relative to this Jupyter Notebook file.
<pre>
Yesterday, all my troubles seemed so far away
Now it looks as though they're here to stay
Oh, I believe in yesterday
</pre>

<hr style="border-color: #cf6868; border-width: 2px;" />
### Reading files

You can read a text file as a single string or as a list of lines.  You can also read the file repeatedly, line by line or character by character.

In [None]:
file = open("data/textfile.txt", "r")
s = file.read()
print("<begin>" + s + "<end>")
file.close()

In [None]:
file = open("data/textfile.txt", "r")
# read entire text as a list of lines
lineList = file.readlines()
# show each of the lines
for line in lineList:
    print("<begin>" + line + "<end>")
file.close()

In [None]:
file = open("data/textfile.txt", "r")
# read the first line
line = file.readline()
# if EOF, readline() returns ""
while line != "":
    print("<begin>" + line + "<end>")
    # read the next line
    line = file.readline()
file.close()

In [None]:
file = open("data/textfile.txt", "r")
# read the first 10 characters
for i in range(0, 10):
    # read just one character
    ch = file.read(1)
    print("'{0:s}', ".format(ch), end="")
file.close()
print()
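
A common alternative to calling "<tt>close()</tt>" by hand is the "<tt>with</tt>" statement, which closes the file automatically, even if an error occurs inside the block.  The following is a sketch only: it writes a small scratch file first so that it runs anywhere, and the file name "<tt>sample.txt</tt>" and its contents are hypothetical.

```python
import os
import tempfile

# hypothetical scratch file, created in a temporary directory
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w", encoding="utf_8") as f:
    f.write("Yesterday\nNow\nOh\n")

lines = []
# the file is closed automatically at the end of the "with" block
with open(path, "r", encoding="utf_8") as f:
    for line in f:                        # a file object yields its lines one by one
        lines.append(line.rstrip("\n"))   # drop the newline at the end
print(lines)
print("closed:", f.closed)
```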

<hr style="border-color: #cf6868; border-width: 2px;" />
### Writing files

Writing strings to a text file is fairly easy, too.  Specify "<tt>w</tt>" for writing (actually, overwriting, if the file already exists), or "<tt>a</tt>" for appending new text at the end of the existing contents.

In [None]:
# string that contains "\n" (newline)
s_out = "Why she had to go?\nI don't know, she wouldn't say\n"
file = open("data/newTextfile.txt", "w")
file.write(s_out)
file.close()
# try reading back
file = open("data/newTextfile.txt", "r")
s_in = file.read()
file.close()
print("<begin>" + s_in + "<end>")

In [None]:
# list of two lines
lineList = ["I said something wrong\n", "Now I long for yesterday.\n"]
# append to the existing file
file = open("data/newTextfile.txt", "a")
for line in lineList:
    file.write(line)
file.close()
# try reading back
file = open("data/newTextfile.txt", "r")
s_in = file.read()
file.close()
print("<begin>" + s_in + "<end>")

<hr style="border-color: #cf6868; border-width: 2px;" />
### CSV files

You will often need to handle CSV files.  CSV (comma-separated values) is a major format for open data (like those of <a href="https://resas.go.jp/">"RESAS"</a>) and for the output of spreadsheets (like Excel).

In [None]:
file = open("data/csvfile.csv", "r")
lineList = file.readlines()
file.close()
# just show the first 7 prefs
for i in range(0, 7):
    # remove "\n" at the end of each line
    if lineList[i][-1] == "\n":
        lineList[i] = lineList[i][0:-1]
    print(lineList[i])

In [None]:
file = open("data/csvfile.csv", "r")
lineList = file.readlines()
file.close()
# just show the first 7 prefs
for i in range(0, 7):
    # remove "\n" at the end of each line
    if lineList[i][-1] == "\n":
        lineList[i] = lineList[i][0:-1]
    wordList = lineList[i].split(",")
    print(wordList)

In [None]:
file = open("data/csvfile.csv", "r")
lineList = file.readlines()
file.close()
# let us make a dictionary
population = {}
# just use the first 7 prefs
for i in range(0, 7):
    # remove "\n" at the end of each line
    if lineList[i][-1] == "\n":
        lineList[i] = lineList[i][0:-1]
    wordList = lineList[i].split(",")
    population[wordList[0]] = wordList[1]
print(population)

Note that the examples above assume the file "<tt>data/csvfile.csv</tt>" is encoded in "<tt>utf_8</tt>", which is the default on many systems (the default depends on the platform, so you can specify "<tt>encoding='utf_8'</tt>" to be explicit).  As long as you are dealing with numbers and English text, there should be no problem.  However, if you are dealing with Japanese text in CSV files, you may need to cope with "encoding" methods.  Microsoft Excel usually makes CSV files in "<tt>cp932</tt>" (the Microsoft version of Shift-JIS) for Japanese text.  In such a case, try specifying "<tt>encoding='cp932'</tt>" when opening the file.

In [None]:
file = open("data/csvfile2.csv", "r", encoding="cp932")
lineList = file.readlines()
file.close()
# just show the first 7 prefs
for i in range(0, 7):
    # remove "\n" at the end of each line
    if lineList[i][-1] == "\n":
        lineList[i] = lineList[i][0:-1]
    wordList = lineList[i].split(",")
    print(wordList)
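
Splitting each line with "<tt>split(",")</tt>" works for simple files, but it breaks when a field itself contains a comma inside quotation marks.  The standard "<tt>csv</tt>" module handles such cases.  The sketch below reads from an in-memory string instead of a real file; the text and its values are made up for the illustration.

```python
import csv
import io

# hypothetical CSV text; the second name contains a comma inside quotes
text = 'Alpha,100\n"Beta, Inc.",200\n'

population = {}
# csv.reader() yields each row as a list of strings
for row in csv.reader(io.StringIO(text)):
    population[row[0]] = int(row[1])   # convert the number from str to int
print(population)
```

For a real file, you would pass the opened file object to "<tt>csv.reader()</tt>" instead; the documentation of the "<tt>csv</tt>" module recommends opening such files with "<tt>newline=''</tt>".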

<h3 style="background-color: #cf6868; color: white; padding: 24px; text-align: center;">(cc) Koziken, MMXVII</h3>