<a data-flickr-embed="true" href="https://www.flickr.com/photos/kirbyurner/51883694941/in/album-72177720296706479/" title="week2_schedule"><img src="https://live.staticflickr.com/65535/51883694941_84ef7655e9.jpg" width="359" height="500" alt="week2_schedule"></a><script async src="//embedr.flickr.com/assets/client-code.js" charset="utf-8"></script>

# Session 6:  Clarusway Mini-Bootcamp

## Control Flow Statements

* Conditionals (if elif else)
* Loops (for and while)
* List Comprehensions (combining list with for loop syntax)
* Match-Case (Python's new Switch statement)

### Useful Links: 
(sometimes repeated)

* [As We May Think (*Atlantic Monthly*, 1945)](https://www.theatlantic.com/magazine/archive/1945/07/as-we-may-think/303881/)
* [The Scientific Paper is Obsolete (*Atlantic Monthly*, 2018)](https://www.theatlantic.com/science/archive/2018/04/the-scientific-paper-is-obsolete/556676/) 
* [HOW TO: sqlite3 in Python](https://docs.python.org/3/library/sqlite3.html)
* [StackOverflow:  Field contains Text](https://stackoverflow.com/questions/68161715/does-something-like-followin-exists-select-from-table-where-in-field)
* [Online Encyclopedia of Integer Sequences](https://oeis.org/)
* [Regular Expressions in Wikipedia](https://en.wikipedia.org/wiki/Regular_expression)
* [json module](https://docs.python.org/3/library/json.html)
* [Case-Match Syntax in 3.10+](https://www.pythonpool.com/match-case-python/)
* [pandas read_sql](https://pandas.pydata.org/pandas-docs/version/0.15.0/generated/pandas.read_sql.html)
* [pandas read_csv](https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html)
* [Sampling data from a DataFrame](https://youtu.be/CBCCcssjXQY)
* [Search and Lookup in pandas](https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html)
* [Socratica Channel](https://www.youtube.com/watch?v=bY6m6_IIN94&list=PLi01XoE8jYohWFPpC17Z-wWhPOSuh8Er-)
* [Python Data Science Handbook by Jake VanderPlas](https://jakevdp.github.io/PythonDataScienceHandbook/)

### Glossary of Terms 
(not alphabetical)

* ndarray: n-dimensional array (often 2D), a rectangle of cells, in the numpy package
* range:  a type of Python object, a sequence defined by start, stop and step
* list: a type of Python object, like an array, but allowed to be heterogeneous
* array:  a type of Python object, like a list but of uniform type (more like an ndarray)
* dict: a type of Python object consisting of key-value pairs, fast lookup, core glue
* hash table: a more generic name for the dict idea, common across most languages
* tuple: a type of Python object, like a list but frozen

# Control Flow Statements

What makes computer programs so useful is their ability to take a different route through the code depending on circumstances.  During any one use of a program, large sections of code may never be executed, because the user wants to do this, and not that.

Even at a much lower level than user choices, programs need to be able to execute code "conditionally" meaning only if specific conditions are met.  

The ability to control the flow of execution is what we will take up next.

## Conditional Statements

Conditional statements revolve around three keywords: `if`, `else`, and `elif`.

In [1]:
if "A":  # bool("A")
    print("\"A\" is True")

"A" is True


In [2]:
bool(123)

True
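
The two cells above rely on what Python calls truthiness: `if` effectively applies `bool()` to whatever expression follows it. As a small sketch (the sample values are picked only for illustration), empty containers, zero, and `None` count as false, while most other objects count as true:

```python
# Sample values chosen for illustration; not from the original notebook
samples = ["A", "", 123, 0, [], [0], None]

verdicts = []
for value in samples:
    verdicts.append(bool(value))          # what `if value:` would decide
    print(repr(value), "->", bool(value))
```

Note that `[0]` is truthy: a list is judged by whether it is empty, not by what it contains.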

In [3]:
if 2 + 2 == 5:
    print("We're in a parallel universe")
else:
    print("Another day in the neighborhood")

Another day in the neighborhood


The construct below is sometimes called an if-else ladder.  Any number of elifs is OK, and `else` is optional.

In [4]:
from random import randint

guess = randint(1, 6)  # inclusive of both lower and upper bound

if guess   == 1:
    print("One is for Fun")
elif guess == 2:
    print("Two is for Blue")
elif guess == 3:
    print("Three is for Free")
elif guess == 4:
    print("Four is for door")
elif guess == 5:
    print("Five is alive")
else:
    print("Not 1-5")

Two is for Blue


Quite often, at the end of a Python module, one finds a conditional statement that says what to do if, and only if, the module is being run as the top-level script, i.e. is not being imported by some other module.

Some modules, which we could call library modules, define a lot of callable objects when run, but leave it to the top-level module to do any actual calling.  For example, the `math` module, when imported, does nothing on its own except define a lot of ready-to-hand objects (e.g. sqrt, log, cos etc.).

In [5]:
if __name__ == "__main__":
    print("Run some code only if the module is being run, not imported")

Run some code only if the module is being run, not imported
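
A common way to lay this out, sketched here with hypothetical names (`greet` and `main` are not from the original notebook), is to define everything first and put the actual calling behind the `__name__` check:

```python
def greet(name):
    """Return a greeting; importing this module only defines the function."""
    return f"Hello, {name}!"

def main():
    # The "actual calling" lives here
    print(greet("world"))

if __name__ == "__main__":
    # True when the file is run directly, False when it is imported
    main()
```

Another module could then `import` this file and call `greet` without triggering the print.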


Finally, there's the conditional expression, often called the ternary operator, which chooses between two values in a single line.

In [6]:
expr = "a" in "cat"

a = 1 if expr else 2  

a

1

That's the same as:

In [7]:
if expr:
    a = 1
else:
    a = 2
    
a

1

## Loops

Loops are among the most useful of flow control structures, and, combined with if / elif / else, give you (the programmer) a lot of power and flexibility.  You also get to be more concise, as we will see with "list comprehension syntax" -- a combination of list and for-loop syntax.

Python has only two looping statements:  `for` and `while`.  Both work with additional keywords: `else`, `break`, and `continue`.

`for` iterates over a for-loop body, an indented suite or block.  The `for` syntax causes a Python name (or names) to take on a succession of values, one per pass through the loop.  These names are available within the block to drive whatever computations it performs.

In [8]:
total = 0
for x in range(15):
    total = total + x  # keep accumulating
total

105

In [9]:
sum(range(15))

105

In [10]:
for c in "Iterating over a string is fine":
    if c == "s":
        print("found an 's'")
    if c == "f":
        print("found an 'f'")
    if c == "i":
        print("found an 'i'")

print()

found an 'i'
found an 's'
found an 'i'
found an 'i'
found an 's'
found an 'f'
found an 'i'



Remembering our dict type:

In [11]:
zoo = {'monkey': 4, 'dog': 3, 'penguin': 5}
zoo

{'monkey': 4, 'dog': 3, 'penguin': 5}
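
A dict can drive a `for` loop too. The sketch below, using the same `zoo` data, shows the two most common patterns: looping over the keys, and looping over key-value pairs with two loop names at once.

```python
zoo = {'monkey': 4, 'dog': 3, 'penguin': 5}

# Iterating over a dict directly yields its keys
for animal in zoo:
    print(animal, zoo[animal])

# .items() yields (key, value) pairs, unpacked into two names
total = 0
for animal, count in zoo.items():
    total += count

print("Total animals:", total)
```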

Poking fun at a worst practice:

In [12]:
try:
    # all your buggy code
    pass
except:
    pass

import this

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!


The loop below breaks long before s reaches 1000 - 1, or 999.

In [13]:
for s in range(1000):
    print(s, end=" ")
    if s > 10:
        print()
        break

0 1 2 3 4 5 6 7 8 9 10 11 
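
The `else` keyword mentioned earlier also pairs with loops: a loop's `else` block runs only when the loop finishes without hitting `break`. Here is a minimal sketch, where `first_multiple` is a hypothetical helper written for this illustration:

```python
def first_multiple(numbers, k):
    """Return the first multiple of k in numbers, or None if there is none."""
    for n in numbers:
        if n % k == 0:
            found = n
            break            # leaving via break skips the else clause
    else:
        found = None         # runs only if the loop never hit break
    return found

print(first_multiple([5, 8, 12, 7], 4))
print(first_multiple([5, 9, 11], 4))
```

The first call stops at 8; the second exhausts the list, so the `else` clause supplies `None`.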


Below is a Python script for playing a guessing game.  

The `while True` loop keeps giving the player a next turn, but when the player wins or runs out of turns or quits, the keyword `break` stops the action. 

The keyword `continue` does not break out of the loop, but merely jumps us to the top of the loop.  One could call it a "short cut" as it tells Python to skip executing the rest of the loop and go back to the initial `while` or `for` statement.
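
As a tiny illustration of `continue` on its own (a sketch, separate from the game), this loop skips the even numbers and collects only the odd ones:

```python
odds = []
for n in range(10):
    if n % 2 == 0:
        continue        # even: skip the rest of the body, move on to the next n
    odds.append(n)      # only reached when n is odd

print(odds)
```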

Both `break` and `continue` have a role to play in the code below.

In [14]:
from random import randint

guesses = 5
guess   = randint(1, 10)  # adjust at will

print("I'm thinking of a number from 1 to 10.\nWhat is it?")
print("You have 5 guesses, enter q to quit")

while True:
    
    # are we done?
    if guesses == 0:
        print("Sorry, no more guesses, you lose.")
        break

    ans = input("Your guess > ")
    
    # initial answer check
    if ans.upper() == "Q":
        print("OK bye")
        break
    if not ans.isdigit() or not 1 <= int(ans) <= 10:
        print("Integer between 1 and 10 please")
        # guesses -= 1
        print(f"Guesses remaining: {guesses}")
        continue
        
    # if not quitting and not illegal input...
    answer = int(ans)
    if answer == guess:
        print("Yay, you won!  Congratulations!")
        break
    elif answer < guess:
        print("Too low.")
    else:
        print("Too high.")
        
    guesses -= 1
    print(f"Guesses remaining: {guesses}")

print("Lets play again soon")

I'm thinking of a number from 1 to 10.
What is it?
You have 5 guesses, enter q to quit


Your guess >  9


Too high.
Guesses remaining: 4


Your guess >  3


Yay, you won!  Congratulations!
Lets play again soon


### List Comprehension Syntax

Now that you know about list collections, and about for loops, let's check out a pithy syntax Pythonistas tend to favor for its conciseness.

In [15]:
# https://oeis.org/A000217
A000217 = [n * (n + 1)//2 for n in range(0, 11)]

A000217

[0, 1, 3, 6, 10, 15, 21, 28, 36, 45, 55]

In [16]:
# https://oeis.org/A000290

A000290 = [n**2 for n in range(0, 11)]

A000290

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100]

In [18]:
# https://oeis.org/A005901
A005901 = [(10 * f * f + 2 if f > 0 else 1) 
           for f in range(0, 11)]

A005901

[1, 12, 42, 92, 162, 252, 362, 492, 642, 812, 1002]

List comprehensions also accept an optional if clause, which may be used to filter out unwanted values.

The if clause below only keeps `eye` if it has no factors in common with `n` other than 1.  Two integers with no factors in common other than 1 are known as "strangers" to one another, or as "relatively prime".  The number of strangers to n among the integers from 1 to n-1 is known as the "totient" of n.

In [19]:
from math import gcd
n = 20

strangers = [ eye for eye in range(n) if gcd(eye, n) == 1 ]
print(strangers)

[1, 3, 7, 9, 11, 13, 17, 19]


In [20]:
totient = len(strangers)
print(f"The totient of n={n} is {totient}")

The totient of n=20 is 8


The idea of list comprehensions gave rise to set and dict comprehension syntax.  There's also the parentheses-based "generator expression".  You might have imagined a "tuple comprehension" but that turns out to be unnecessary.  

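A generator expression doesn't actually contain all the values in precomputed form, like a list would.  It computes each next value "on demand", an idea known as lazy evaluation, which lets a program work through very long (even unbounded) sequences without holding them all in memory.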
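
A minimal sketch of these variations follows; the word list is sample data made up for illustration. It also shows why no separate "tuple comprehension" is needed: `tuple()` can simply be fed a generator expression.

```python
words = ["cat", "dog", "cat", "penguin"]       # sample data, for illustration

unique_lengths = {len(w) for w in words}       # set comprehension
length_of = {w: len(w) for w in words}         # dict comprehension

squares = (n * n for n in range(5))            # generator expression
first = next(squares)                          # computed on demand
second = next(squares)                         # computed on demand
rest = list(squares)                           # consumes whatever is left

as_tuple = tuple(n * n for n in range(5))      # tuple() fed by a generator

print(unique_lengths, length_of, first, second, rest, as_tuple)
```

Once a generator's values have been consumed they are gone, which is why `rest` holds only the squares not already pulled out by `next`.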

### Match-Case Syntax (new in 3.10)

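Until version 3.10, Python had no true switch statement, a common construct in other languages.  The `match`-`case` statement fills that gap and goes further: besides comparing against literal values, it can match on the structure of an object (structural pattern matching), which makes it a powerful addition to the flow control toolkit.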

## Some Summary Scripts

Now that we have a lot of Python behind us, let's use fragments from previous sessions to convert the `links.txt` file into .csv and .json files.

What you're checking for are signs that Python has become readable, and that the "plots" of the following "stories" come through loud and clear. Remember these scripts run top to bottom.

In each case, we're either reading from one format in order to save in another, or simply consulting the data in one form or another, perhaps pulling it up in a sorted order.

In [21]:
from csv import writer
links_obj = open("links.txt", 'r')
csv_file  = open("links.csv", 'w')

output = writer(csv_file, delimiter=",")

output.writerow(["Link", "URL"])
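for line in links_obj.readlines():
    # rstrip() drops the newline (if any), then [:-1] drops the closing ")";
    # a bare line[:-2] would eat a URL character on a last line with no newline
    row = line.rstrip()[:-1].replace("* [","").split("](")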
    output.writerow(row)
    
csv_file.close()
links_obj.close()
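
Slicing and splitting assumes every line has exactly the expected shape. A regular expression is one alternative that checks the shape as it extracts the parts. The sketch below is only an illustration: `LINK` and `parse_link` are hypothetical names, and the pattern assumes one markdown bullet link per line.

```python
import re

# Matches lines such as "* [Title](https://example.org)"
LINK = re.compile(r"^\*\s*\[(?P<title>.+)\]\((?P<url>\S+)\)\s*$")

def parse_link(line):
    """Return (title, url) for a markdown bullet link, or None if no match."""
    m = LINK.match(line)
    if m is None:
        return None
    return m.group("title"), m.group("url")

print(parse_link("* [Online Encyclopedia of Integer Sequences](https://oeis.org/)\n"))
print(parse_link("not a link"))
```

Lines that do not fit the pattern come back as `None`, so the caller can decide whether to skip them or report them.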

In [22]:
csv_file = open("links.csv", "r")
print(csv_file.read()[:1000])  # or print the whole thing
csv_file.close()

Link,URL
This Notebook on Colab,https://colab.research.google.com/github/4dsolutions/bootcamp/blob/main/session1.ipynb
This Notebook on nbviewer,https://nbviewer.org/github/4dsolutions/bootcamp/blob/main/session1.ipynb
Python.org Home Page,https://www.python.org
Beginner's Guide,https://wiki.python.org/moin/BeginnersGuide
Python PEPs,https://www.python.org/dev/peps/
PEP 8,https://www.python.org/dev/peps/pep-0008/
PEP8.org,https://pep8.org
Standard Library Modules,https://docs.python.org/3/library/index.html
Formatting Mini-Language,https://docs.python.org/3/library/string.html#formatspec
Python Documentation,https://docs.python.org/3/
Anaconda Home Page,https://anaconda.org
Jupyter Project,https://jupyter.org/
Markdown Cheat Sheet,https://notebook.community/tschinz/iPython_Workspace/00_Admin/CheatSheet/Markdown%20CheatSheet
Repl.it,https://replit.com/site/ide
Course Album,https://flic.kr/s/aHBqjzCs82
Course Repository,https://github.com/4dsolutions/bootcamp
This Notebook on Colab,https

In [23]:
import json
import csv

csv_file = open("links.csv", "r")
csv_reader = csv.reader(csv_file)

pairs = {}  # empty dict
for line in csv_reader:
    if line[0] == "Link":
        continue
    pairs[line[0]]=line[1]  # building up the dict
csv_file.close()

# print(pairs)  -- uncomment if you wish

json_file = open("links.json",'w')
json.dump(pairs, json_file)
json_file.close()

The script below uses the `links.json` file we just created to populate a SQLite database.

<a data-flickr-embed="true" href="https://www.flickr.com/photos/kirbyurner/24749338009/in/album-72177720296706479/" title="Pythonic Ecosystem"><img src="https://live.staticflickr.com/1624/24749338009_537ab57eb1.jpg" width="375" height="500" alt="Pythonic Ecosystem"></a><script async src="//embedr.flickr.com/assets/client-code.js" charset="utf-8"></script>

In [24]:
import sqlite3
import json

con = sqlite3.connect('links.db')
cur = con.cursor()

# Create table
cur.execute("""DROP TABLE IF EXISTS BookMarks""")
cur.execute('''CREATE TABLE BookMarks
               (location text, 
                url text UNIQUE)''')

json_file = open("links.json",'r')
pairs = json.load(json_file)
json_file.close()

# Insert a row of data
for key in pairs:
    value = pairs[key]
    cur.execute('INSERT OR REPLACE INTO BookMarks (location, url) VALUES(?,?)', (key, value))

# Save (commit) the changes
con.commit()

# We can also close the connection if we are done with it.
# Just be sure any changes have been committed or they will be lost.
con.close()

In [25]:
con = sqlite3.connect('links.db')
cur = con.cursor()
cur.execute("SELECT * FROM BookMarks WHERE INSTR(location, 'Python')")
for record in cur.fetchall():
    print(record)
con.close()

('Python.org Home Page', 'https://www.python.org')
('Python PEPs', 'https://www.python.org/dev/peps/')
('Python Documentation', 'https://docs.python.org/3/')
('Python Environment (XKCD)', 'https://xkcd.com/1987/')
('I-Python magics', 'https://ipython.readthedocs.io/en/stable/interactive/magics.html')
('Python-ds', 'https://www.python-ds.com/')
('Python Data Science Handbook by Jake VanderPlas', 'https://jakevdp.github.io/PythonDataScienceHandbook/')
('Python Data Structures', 'https://docs.python.org/3/tutorial/datastructures.html')
('Python driver for MongoDB', 'https://www.mongodb.com/languages/python')
('HOW TO: sqlite3 in Python', 'https://docs.python.org/3/library/sqlite3.html')


In [26]:
con = sqlite3.connect('links.db')
cur = con.cursor()
cur.execute("SELECT * FROM BookMarks WHERE INSTR(location, 'vZome')")
for record in cur.fetchall():
    print(record)
con.close()

('vZome: json case study', 'https://github.com/dekay5555555/vzome-sharing/tree/main/2022/02/16/19-38-24-icosa-test')
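
When the search term comes from a variable rather than being typed into the SQL, the same `?` placeholder used for the inserts works in a `SELECT` as well. The sketch below is self-contained: it uses a throwaway in-memory database with two sample rows instead of `links.db`, and `search` is a hypothetical helper.

```python
import sqlite3

con = sqlite3.connect(":memory:")   # throwaway database, nothing written to disk
cur = con.cursor()
cur.execute("CREATE TABLE BookMarks (location text, url text UNIQUE)")

# Two sample rows borrowed from the links list
rows = [
    ("Python.org Home Page", "https://www.python.org"),
    ("Online Encyclopedia of Integer Sequences", "https://oeis.org/"),
]
cur.executemany("INSERT INTO BookMarks (location, url) VALUES (?, ?)", rows)
con.commit()

def search(term):
    """Return rows whose location contains term."""
    cur.execute(
        "SELECT location, url FROM BookMarks WHERE INSTR(location, ?) > 0",
        (term,),
    )
    return cur.fetchall()

hits = search("Python")
misses = search("vZome")
print(hits, misses)
con.close()
```

Passing the term as a parameter keeps quoting problems out of the SQL string.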


### Previewing pandas

The star object in pandas is the DataFrame, much as the star (celebrated) object in numpy is the ndarray.  The two go together: a one-dimensional ndarray, stood up as a vertical column and given an index, becomes what pandas calls a Series, and Series standing side-by-side make up a DataFrame of multiple columns.

A DataFrame is akin to a table in a spreadsheet, a rows and columns affair.  Rows and columns remain our everyday data structure rolling forward.  Only the API has changed.

In [27]:
import pandas as pd
pd.set_option('display.max_colwidth', None)  # or 199
con = sqlite3.connect('links.db')

# read directly from a sqlite3 connection or a SQLAlchemy connectable
df = pd.read_sql("SELECT * FROM BookMarks", con)
df.head(10)

|   | location | url |
|---|---|---|
| 0 | This Notebook on Colab | https://colab.research.google.com/github/4dsolutions/bootcamp/blob/main/session1.ipynb |
| 1 | This Notebook on nbviewer | https://nbviewer.org/github/4dsolutions/bootcamp/blob/main/session1.ipynb |
| 2 | Python.org Home Page | https://www.python.org |
| 3 | Beginner's Guide | https://wiki.python.org/moin/BeginnersGuide |
| 4 | Python PEPs | https://www.python.org/dev/peps/ |
| 5 | PEP 8 | https://www.python.org/dev/peps/pep-0008/ |
| 6 | PEP8.org | https://pep8.org |
| 7 | Standard Library Modules | https://docs.python.org/3/library/index.html |
| 8 | Python Documentation | https://docs.python.org/3/ |
| 9 | Anaconda Home Page | https://anaconda.org |


In [28]:
df[20:30]

|   | location | url |
|---|---|---|
| 20 | Python Environment (XKCD) | https://xkcd.com/1987/ |
| 21 | LaTeX cheat sheet | https://joshua.smcvt.edu/undergradmath/undergradmath.pdf |
| 22 | I-Python magics | https://ipython.readthedocs.io/en/stable/interactive/magics.html |
| 23 | json module | https://docs.python.org/3/library/string.html#formatspec |
| 24 | vZome: json case study | https://github.com/dekay5555555/vzome-sharing/tree/main/2022/02/16/19-38-24-icosa-test |
| 25 | PEP 238 re Changing the Division Operator | https://www.python.org/dev/peps/pep-0238/ |
| 26 | Brief Tour of the Standard Library | https://docs.python.org/3/tutorial/stdlib.html |
| 27 | HOW TO: Unicode | https://docs.python.org/3/howto/unicode.html |
| 28 | Escape Sequences | https://www.python-ds.com/python-3-escape-sequences |
| 29 | Python-ds | https://www.python-ds.com/ |


In [29]:
df[50:51].url

50    https://www.theatlantic.com/magazine/archive/1945/07/as-we-may-think/303881/
Name: url, dtype: object

In [30]:
sorted_table = df.sort_values(["location"])

In [31]:
sorted_table.head(15)

|   | location | url |
|---|---|---|
| 43 | A Neural Network Playground | https://playground.tensorflow.org/ |
| 9 | Anaconda Home Page | https://anaconda.org |
| 50 | As We May Think (*Atlantic Montly*, 1945) | https://www.theatlantic.com/magazine/archive/1945/07/as-we-may-think/303881/ |
| 3 | Beginner's Guide | https://wiki.python.org/moin/BeginnersGuide |
| 26 | Brief Tour of the Standard Library | https://docs.python.org/3/tutorial/stdlib.html |
| 17 | Built-in Types | https://docs.python.org/3/library/stdtypes.html |
| 56 | Case-Match Syntax in 3.10+ | https://www.pythonpool.com/match-case-python/ |
| 35 | Common String Operations | https://docs.python.org/3/library/string.html |
| 13 | Course Album | https://flic.kr/s/aHBqjzCs82 |
| 14 | Course Repository | https://github.com/4dsolutions/bootcamp |


In [32]:
df.columns

Index(['location', 'url'], dtype='object')

### HTML as Structured Storage

There's still another output format that's considered a standard, and human readable:  HTML itself.  Having `links.txt` and even `links.csv` with no `links.html` would seem an oversight.  After all, even with no JavaScript running, an HTML file consisting of links, in a browser tab, will still be readable and clickable.

In [33]:
webpage = \
"""<html>
<head>
<title>{title}</title>
</head>
<body>
{body}
</body>
</html>
"""

In [34]:
title = "Useful Links"

json_file = open("links.json",'r')
pairs = json.load(json_file)
json_file.close()

links = ""
for location in sorted(pairs.keys()):
    url = pairs[location]
    links += f"<li><a href='{url}'>{location}</a></li>\n"

# insert links into body below
body = \
f"""<h1>Useful Links</h1>
<ul>
{links}</ul>"""

# save completed web page
outfile = open("links.html", "w")
print(webpage.format(title=title, body=body), file=outfile)
outfile.close()
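
One caveat when building HTML with f-strings: characters such as `&`, `<`, and quotes in a title or URL have special meaning in HTML. The standard library's `html.escape` handles them. This is a minimal sketch; `make_item` is a hypothetical helper and the sample title and URL are made up.

```python
from html import escape

def make_item(location, url):
    """Build one <li> with both the link text and the URL escaped."""
    return f'<li><a href="{escape(url, quote=True)}">{escape(location)}</a></li>'

item = make_item("Q&A <beta>", "https://example.org/?a=1&b=2")
print(item)
```

A browser displays the escaped text exactly as the original characters, so nothing is lost from the reader's point of view.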

Are we ready to play similar games using `glossary.txt` as input?  As with `links.txt`, all the markdown from each of the Notebooks has been cut and pasted into a single text file.  The job now is to parse it and save it in a more standard format, such as csv, json or sqlite3.

In [35]:
from csv import writer
links_obj = open("glossary.txt", 'r')
csv_file  = open("glossary.csv", 'w')

output = writer(csv_file, delimiter="|")

output.writerow(["Term", "Definition"])
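for line in links_obj.readlines():
    # rstrip("\n") rather than line[:-1], which would chop the final character
    # of a last line lacking a newline; split(":", 1) keeps colons in definitions
    row = line.rstrip("\n").replace("* ","").split(":", 1)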
    output.writerow(row)
    
csv_file.close()
links_obj.close()

In [36]:
gloss_df = pd.read_csv("glossary.csv", delimiter="|")

In [37]:
gloss_df.head()

|   | Term | Definition |
|---|---|---|
| 0 | localhost | your own computer, with IP number 127.0.0.1 |
| 1 | ASCII | American Standard Code for Information Interchange, 256 glyphs max. |
| 2 | BDFL | Guido's old title, Benevolent Dictator for Life (he's now BDFL emeritus) |
| 3 | Python | a general purpose, object-oriented computer language |
| 4 | Python 3 | any 3.x version of Python |


In [38]:
gloss_sorted_table = gloss_df.sort_values(["Term"])

In [39]:
gloss_sorted_table.head()

|   | Term | Definition |
|---|---|---|
| 40 | "magic method" | Python method with a special name e.g. `__invert__` |
| 1 | ASCII | American Standard Code for Information Interchange, 256 glyphs max. |
| 52 | Angular | JS framework from Google |
| 14 | Apache | popular free webserver |
| 20 | Apache Cassandra | a popular NoSQL DB |


In [40]:
gloss_sorted_table.tail()

|   | Term | Definition |
|---|---|---|
| 67 | regex | Regular Expression |
| 26 | requests | popular Python package for talking to web servers |
| 56 | scikit-learn | Machine Learning in Python |
| 27 | sphinx | package for building documentation as a website |
| 74 | tuple | a type of Python object, like a list but frozen |
