<a data-flickr-embed="true" href="https://www.flickr.com/photos/kirbyurner/51883694941/in/album-72177720296706479/" title="week2_schedule"><img src="https://live.staticflickr.com/65535/51883694941_84ef7655e9.jpg" width="359" height="500" alt="week2_schedule"></a><script async src="//embedr.flickr.com/assets/client-code.js" charset="utf-8"></script>

# Session 5:  Clarusway Mini-Bootcamp

* Main String Operations
* Collection Types


### Useful Links

* [This Notebook on Colab](https://colab.research.google.com/github/4dsolutions/bootcamp/blob/main/session1.ipynb)
* [This Notebook on nbviewer](https://nbviewer.org/github/4dsolutions/bootcamp/blob/main/session1.ipynb)
* [Kaggle](https://www.kaggle.com)
* [A Neural Network Playground](https://playground.tensorflow.org/)
* [Machine Learning Guide](https://www.datasciencecentral.com/top-down-learning-path-machine-learning-for-software-engineers/)
* [Common String Operations](https://docs.python.org/3/library/string.html)
* [PyData: Built-in Superheros by David Beazley](https://youtu.be/lyDLAutA88s)
* [Python Data Structures](https://docs.python.org/3/tutorial/datastructures.html)
* [Walrus Operator](https://youtu.be/QVIpHAsgMas)
* [Course Album](https://flic.kr/s/aHBqjzCs82)
* [Course Repository](https://github.com/4dsolutions/bootcamp)
* [MongoDB localhost edition](https://www.mongodb.com/try/download/community)
* [Python driver for MongoDB](https://www.mongodb.com/languages/python)

### Glossary of Terms (not alphabetical)

* Boole, George: inventor of Boolean algebra
* Byron, Ada (Ada Lovelace): often credited as the first computer programmer
* Hopper, Grace:  computer scientist, US Navy Rear Admiral
* DARPA: Defense Advanced Research Projects Agency
* IDLE:  Python's official IDE (which DARPA helped fund)
* Kleene, Stephen Cole:  mathematician, inventor of regex syntax
* Regular Expression: a string-based mini-language for pattern matching
* re:  Python's Standard Library module implementing Regular Expressions
* regex:  Regular Expression

## Main String Operations

Computer programming is as much about string manipulation as number crunching.
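
As a warm-up, here is a small sketch of a few everyday string methods. The sample text and the variable names are made up for illustration.

```python
# Hypothetical sample text, not from the course materials
raw = "  Python: Batteries Included  "

trimmed = raw.strip()        # drop leading and trailing whitespace
words = trimmed.split()      # split on runs of whitespace
shouted = trimmed.upper()    # strings are immutable, so a new string comes back
rejoined = "-".join(words)   # glue the pieces back together with a separator

print(trimmed)
print(words)
print(shouted)
print(rejoined)
```

None of these calls change `raw`; each one returns a new object.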



In [None]:
import string

In [None]:
string.ascii_lowercase

In [None]:
string.punctuation

In [None]:
"I like JavaScript".replace("JavaScript", "Python")  # or like both

In [None]:
webpage = \
"""
<html>
<head>
<title>{title}</title>
</head>
<body>
{body}
</body>
</html>
"""

In [None]:
body = \
"""
<p>Here is some simple HTML.  This section
will be substituted for {body} in a template
that contains the outermost tags.</p>
<ul>
<li>Python:  Batteries included</li>
<li>Python:  Fits Your Brain</li>
<li>Python:  Plays Well with Others</li>
</ul>
"""

title = "Why Python is Great"

# the with statement closes the file even if an error occurs
with open("why_python.html", "w") as outfile:
    print(webpage.format(title=title, body=body), file=outfile)
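
Notice that the body text itself contains `{body}`, yet the page still renders. A minimal sketch of why, using a made-up mini-template: `str.format` only scans the template string for placeholders, not the values it substitutes in.

```python
# Hypothetical mini-template, just for illustration
template = "<title>{title}</title>\n<p>{note}</p>"

# Braces inside a substituted value are inserted as-is, not expanded again
page = template.format(title="Demo", note="Replaces {body} later")
print(page)

# Doubled braces in the template itself come out as literal braces
literal = "{{literal}} and {value}".format(value=42)
print(literal)
```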

## Collection Types

See Slides.

Collections store multiple elements within some kind of structure.  The elements may themselves be collections, meaning collection types may be nested to arbitrary depth, which is not to say they should be nested too deeply.
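
A small sketch of nesting, with invented data: a dict whose values include a list of tuples. Reaching the innermost element chains one lookup per level.

```python
# Hypothetical nested structure: a dict holding a list of tuples
course = {
    "name": "Mini-Bootcamp",
    "sessions": [(1, "Intro"), (5, "Strings and Collections")],
}

# dict lookup, then list index, then tuple unpacking
number, topic = course["sessions"][1]
print(number, topic)
```

Three levels is still readable. Much beyond that, the chain of brackets gets hard to follow.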

### Sequences versus Mappings

Sequence types have a left-to-right and right-to-left order.  One may access their elements with numeric indexes, and also slice them.

Mappings are not necessarily ordered by a numeric index and cannot be sliced.  They do not have "leftmost" or "rightmost" elements.
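
To illustrate with a hypothetical dict: lookup is by key, and slice notation fails. Which exception gets raised appears to depend on the Python version, so the sketch catches both likely candidates.

```python
ages = {"tiger": 8, "bear": 7}   # hypothetical data

print(ages["bear"])              # lookup is by key, not by position

try:
    ages[0:1]                    # slice notation is not supported by dict
except (TypeError, KeyError) as e:
    # the exact exception type varies across Python versions
    print(type(e).__name__, e)
```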

Actually a string is a good example of a sequence, in that each character is an individual element, addressable using index and/or slice notation.
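
A quick sketch of indexing and slicing on a sample string (the variable name is arbitrary):

```python
word = "Python"      # sample string for illustration

print(word[0])       # 'P': leftmost character
print(word[-1])      # 'n': negative indexes count from the right
print(word[1:4])     # 'yth': index 1 up to, but not including, 4
print(word[::-1])    # 'nohtyP': a step of -1 walks right to left
```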

The list should seem similar, in having a left to right ascending index for each element, starting from 0.  The elements may be of different types.  The list is also mutable.

In [None]:
the_list = ["string", 1, [], 10.10]
the_list

In [None]:
type(the_list[0])

In [None]:
type(the_list[1])

In [None]:
type(the_list[2])

Lists are among the most used data structures.  Because they're sequences, one is able to "slice" them using slice notation.
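
Here is slice notation on a throwaway list, as a minimal sketch. Because lists are mutable, a slice may also appear on the left side of an assignment.

```python
letters = ["a", "b", "c", "d", "e"]   # hypothetical list

first_two = letters[:2]     # ['a', 'b']
middle = letters[1:3]       # ['b', 'c']
last_two = letters[-2:]     # ['d', 'e']
every_other = letters[::2]  # ['a', 'c', 'e']
print(first_two, middle, last_two, every_other)

# Slice assignment replaces a whole stretch of the list in place
letters[1:3] = ["X"]
print(letters)              # ['a', 'X', 'd', 'e']
```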

In [None]:
the_list

In [None]:
whole_list  = the_list.copy()

In [None]:
whole_list is the_list
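
One caveat worth a sketch: `copy()` makes a shallow copy. The outer lists are distinct objects, but a nested list is shared between them. The names below are made up so as not to disturb `the_list`.

```python
original = ["string", 1, [], 10.10]   # same shape as the list used in this section
duplicate = original.copy()

print(duplicate == original)          # True: equal contents
print(duplicate is original)          # False: two distinct list objects

# The inner list is shared, so a change through one name shows up in the other
duplicate[2].append("shared")
print(original[2])                    # ['shared']
print(duplicate[2] is original[2])    # True
```

When fully independent copies are needed, `copy.deepcopy` from the Standard Library is the usual tool.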

The Python slice is actually its own type of object.

In [None]:
the_slice = slice(1, None)
the_list[the_slice]

In [None]:
the_list[0] = 10
the_list

In [None]:
the_list[2].append('monkey')
the_list[2].append('parrot')
the_list[2].append('zebra')

the_list

In [None]:
the_list.insert(0, 'penguin')


In [None]:
the_list

The `tuple` is very like a list in being able to contain a mix of types.  However, the `tuple` itself is immutable: its elements cannot be reassigned or deleted, though a mutable element, such as a list, may still be changed in place.  Let's take a look...

In [None]:
my_tuple = ('dog', 'cat', [ ], 'monkey')
my_tuple

In [None]:
try:
    my_tuple[0] = 'snake'
except Exception as e:
    print(e)

In [None]:
my_tuple[2]

In [None]:
my_tuple[2].append("still")
my_tuple[2].append("mutable")
my_tuple

In [None]:
try:
    del my_tuple[2]  # trying to delete the whole list
except Exception as e:
    print(e)

In [None]:
from collections import namedtuple
Element  = namedtuple("Element", "Symbol Name Protons Weight Series")  # typename matches the variable
hydrogen = Element("H", "Hydrogen", 1, 1.008, "diatomic nonmetal")
lead     = Element("Pb", "Lead", 82, 207.2, "post-transition metal")

In [None]:
hydrogen

In [None]:
hydrogen.Symbol

In [None]:
list(hydrogen)

A truly immutable piece of data, such as `(2, 1, 1, 0)`, is suitable for use as a key in a dictionary.  Lists are not suitable because they are mutable.  Think of the corruption one might cause if two keys started out unique, but then were mutated to become identical.  The whole premise of the dict type is that keys are unique.
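
A minimal sketch of the point about keys, using an illustrative dict. A tuple of integers works as a key, a list is rejected, and so is a tuple that contains a list.

```python
coords = {(2, 1, 1, 0): "ball 1"}     # a tuple of ints is hashable
print(coords[(2, 1, 1, 0)])

try:
    coords[[1, 2, 1, 0]] = "ball 7"   # a list is not hashable
except TypeError as e:
    print(e)

try:
    hash((1, [2]))                    # nor is a tuple with a list inside it
except TypeError as e:
    print(e)
```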

In [None]:
from collections import deque

In [None]:
pipe = deque()

In [None]:
pipe.append('job1')
pipe.append('job2')
pipe.append('job3')

In [None]:
pipe

In [None]:
job = pipe.popleft()
job

In [None]:
pipe

## Mappings

<blockquote class="twitter-tweet"><p lang="en" dir="ltr">Remember <a href="https://twitter.com/hashtag/Python?src=hash&amp;ref_src=twsrc%5Etfw">#Python</a> sets and dicts use a hash table internally which makes them very fast - algorithmic complexity of O(1) - for lookups (e.g. using the &quot;in&quot; operator).</p>&mdash; Bob Belderbos (@bbelderbos) <a href="https://twitter.com/bbelderbos/status/1493526761195417602?ref_src=twsrc%5Etfw">February 15, 2022</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script> 
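
The `in` operator mentioned in the tweet can be sketched as follows, with made-up data. The answers match for a list and a set. The difference is in how the answer is found: a list is scanned element by element, while a set or dict hashes the value and goes straight to it.

```python
names_list = ["tiger", "bear", "monkey"]   # hypothetical data
names_set = set(names_list)

print("bear" in names_list)    # True, found by scanning
print("bear" in names_set)     # True, found by hashing
print("zebra" in names_set)    # False

# For a dict, "in" tests the keys, not the values
zoo_counts = {"tiger": 8, "bear": 7}
print("bear" in zoo_counts)    # True
print(7 in zoo_counts)         # False
```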

In [None]:
p = set([1, 4, 3, 9, 2, 2, 5, 1, 10])
p

In [None]:
q = set([2, 17, 9, 5, 2, 11, 12])

In [None]:
p.intersection(q)

In [None]:
p.union(q)

In [None]:
import numpy as np    # may need to pip install
import pandas as pd   # may need to pip install
import sys

# sys.version.split()[0] keeps the whole version number, e.g. 3.10.4
print("np:", np.__version__, "\npd:",
      pd.__version__, '\npython:', sys.version.split()[0])
sys.path

In [None]:
the_dict = {"python": '3.7.9', "numpy": '1.17.3', "pandas": '1.2.4'}

In [None]:
the_dict.get("Julia", "N/A")

In [None]:
the_dict.get("python", "N/A")

In [None]:
the_dict['pandas']

In [None]:
the_dict.keys()

In [None]:
the_dict.values()

In [None]:
list(the_dict.items())

### Russian Doll Nesting of Data Structures

Remember the comment above about lists not being suitable as dict keys, because they're mutable.  Ergo, not every data structure is able to play in every position.  Yet to a remarkable extent, the collection types introduced so far may be treated as just more everyday Python objects, meaning they interoperate and nest seamlessly.

In [None]:
rock_stars = [{"Yolandi Visser":"Die Antwoord"}, {"Mick Jagger":"Rolling Stones"},
              {"John Lennon":"The Beatles"}, {"George Harrison":"The Beatles"}]

rock_stars[1]

In [None]:
tuple_keys = {tuple([2, 1, 1, 0]): ["center", "ball 1"],
              tuple([1, 2, 1, 0]): ["center", "ball 7"]}

In [None]:
tuple_keys[(1, 2, 1, 0)]

Might we nest much more deeply?  Of course.  But have mercy on your readers, who may include yourself at a later date.

Let's show off our Notebook's ability to showcase YouTube videos.  The awesome talk below is from PyData 2016 in Chicago, and features David Beazley talking about built-in data structures as superheroes.

In [None]:
from IPython.display import YouTubeVideo

In [None]:
# https://youtu.be/lyDLAutA88s
YouTubeVideo("lyDLAutA88s")

In [None]:
zoo = {"tiger": 8, "bear": 7, "monkey": 4}

In [None]:
len(zoo)

In [None]:
list(zoo)

In [None]:
zoo.update({"bear": 10, "parrot": 101, "lizard": 4})

In [None]:
zoo

In [None]:
dir(zoo)

In [None]:
dir([])

In [None]:
data = [(1, 2), (3, 5), (10, 11), (0, 3)]

In [None]:
data.sort()

In [None]:
data

In [None]:
sorted(data, key = lambda i: i[1])

In [None]:
dir([])

In [None]:
np.ndarray

In [None]:
dir(np.ndarray)

In [None]:
np.ones((10, 10)) + 2