#### Clarusway Python

* [Instructor Landing Page](landing_page.ipynb)
* <a href="https://colab.research.google.com/github/4dsolutions/clarusway_data_analysis/blob/main/basic_python/sandbox_week_06.ipynb"><img align="left" src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open in Colab" title="Open and Execute in Google Colaboratory"></a>
* [![nbviewer](https://raw.githubusercontent.com/jupyter/design/master/logos/Badges/nbviewer_badge.svg)](https://nbviewer.org/github/4dsolutions/clarusway_data_analysis/blob/main/basic_python/sandbox_week_06.ipynb)

# Sandbox 6: Looking Back; Looking Ahead

<a data-flickr-embed="true" href="https://www.flickr.com/photos/kirbyurner/54149728768/in/dateposted/" title="Monty Python Allusion"><img src="https://live.staticflickr.com/65535/54149728768_34d3453331.jpg" width="500" height="279" alt="Monty Python Allusion"/></a><script async src="//embedr.flickr.com/assets/client-code.js" charset="utf-8"></script>

In [1]:
from IPython.display import YouTubeVideo # you only need to do this once, then go like...

In [2]:
YouTubeVideo("4vuW6tQ0218") # every Youtube comes with a unique address

## Week 6

Monday:

These will be some of the highlights although not necessarily in this order (a set, not a list):

* Upcoming Schedule (holiday breaks, final sessions)
* LMS: Modules and Packages (additional material)
* Statistics Ahead (intro)
* Transition to DA (completing Python Prep)
* DataFrame Previews (Data Analysis Ahead)
* Hands on Practice (more about callables and their args and kwargs)
* The Rest of Python (ongoing theme -- there's always a little more)

Tuesday:

* Continue in Sandbox 6
* Continue in Notebook17&18
* Using AI to plot a Bell Curve
* Sampling a Population (DA concepts)
* Dice as Objects
* The Rest of Python

Wednesday:
* Lab
  
My Python Warmup (same repo, different subfolder) has some relevant [intro material](https://github.com/4dsolutions/clarusway_data_analysis/blob/main/python_warm_up/warmup_3rd_party_datascience.ipynb).

This figure shows how web development and data science are like two branches of a river or tree. It's a simplified view as there are many more branches and lots of cross-over.

SQL is somewhat at the root of both, as both the web and data science depend on having lots of well organized data (picture file cabinets).

<br />

<a data-flickr-embed="true" href="https://www.flickr.com/photos/kirbyurner/24749338009/in/album-72177720296706479" title="Pythonic Ecosystem"><img src="https://live.staticflickr.com/1624/24749338009_537ab57eb1.jpg" width="375" height="500" alt="Pythonic Ecosystem"/></a><script async src="//embedr.flickr.com/assets/client-code.js" charset="utf-8"></script>

<br />

<a data-flickr-embed="true" href="https://www.flickr.com/photos/kirbyurner/54150809997/in/dateposted/" title="data_access_tree"><img src="https://live.staticflickr.com/65535/54150809997_e4f2596499.jpg" width="401" height="500" alt="data_access_tree"/></a><script async src="//embedr.flickr.com/assets/client-code.js" charset="utf-8"></script>

Information workers access files through a file tree.

## Playing with Foods

In [None]:
foods = [chr(codepoint) for codepoint in range(127815, 127815 + 50)]

In [None]:
print(foods)

In [None]:
from random import shuffle, randint, choice

In [None]:
test_list = (list(choice(foods) * randint(1,10)) + 
             list(choice(foods) * randint(1,10)) + 
             list(choice(foods) * randint(1,10)))
shuffle(test_list)

print(test_list)

In [None]:
test_list.count(test_list[0])

In [None]:
highest_count = max([test_list.count(food) for food in test_list])
highest_count

In [None]:
help(max)

In [None]:
most_frequent = max(test_list, key=test_list.count)  # which is highest count?
most_frequent

In [None]:
print(f"The food that occurs most frequently is {most_frequent}, it occurs {highest_count} times.")
print("Possibly other foods occur just as frequently.")
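The counting above can also be sketched with `collections.Counter` from the Standard Library, which tallies every item in one pass and makes ties easy to see. The short `sample_foods` list here is an assumed stand-in for `test_list`, and the names `tally`, `top_count` and `winners` are only for illustration.

```python
from collections import Counter

# Assumed stand-in for test_list, kept short so the counts are easy to check
sample_foods = ["🍇", "🍉", "🍇", "🍌", "🍉", "🍇"]

tally = Counter(sample_foods)      # maps each food to how often it occurs
top_count = max(tally.values())    # the highest count

# every food that reaches the highest count, so ties are not hidden
winners = [food for food, n in tally.items() if n == top_count]

print(tally.most_common(1))        # one (food, count) pair with the highest count
print(winners, top_count)
```

Unlike `max(test_list, key=test_list.count)`, which rescans the list for every element, the `Counter` is built once and then queried.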

Unicode: the basis of Python's native string type (str).

<a data-flickr-embed="true" href="https://www.flickr.com/photos/kirbyurner/29832307687/in/album-72177720296706479" title="Unicode on Windows"><img src="https://live.staticflickr.com/1847/29832307687_0aee594ec5_c.jpg" width="800" height="551" alt="Unicode on Windows"/></a><script async src="//embedr.flickr.com/assets/client-code.js" charset="utf-8"></script>

## The built-in map function

map is a higher-order function, meaning it takes a function as its first argument, followed by one or more iterables. It returns an iterator.

The function may be a named function, or perhaps a lambda expression.

Languages that let functions be passed as arguments, like any other object, are said to have first-class functions. Python is one of them.

In [None]:
help(map)

In [None]:
m = map(lambda x, y, z: x + y + z, (1,2), (3,4), (10, 10))  # reminds of zip

In [None]:
next(m)

In [None]:
next(m)

In [None]:
pow2 = map(lambda x: x**2, range(20))

In [None]:
print(list(pow2))
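As a small sketch of the same idea, `map` works with a named function just as well as with a lambda, and a list comprehension gives the same result. The function `cube` is a made-up example. The last line shows that with several iterables, `map` stops when the shortest one runs out.

```python
def cube(x):
    """A named function works with map just like a lambda does."""
    return x ** 3

cubes = list(map(cube, range(5)))          # map with a named function
same_cubes = [cube(x) for x in range(5)]   # the list comprehension equivalent

# With several iterables, map stops as soon as the shortest is exhausted
sums = list(map(lambda x, y: x + y, (1, 2, 3), (10, 20)))

print(cubes, same_cubes, sums)
```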

## Variable number of arguments

Perhaps much sooner than you will need to write your own functions, you will want to get help on what callables are already out there.  In that case, understanding the role of * and / helps a lot.

The * used alone means: only named arguments to its right.
The / used alone means: only positionals to its left.

If / is used, then the ordinary practice of being able to pass by name, even to parameters with no default value, is suspended.

In [None]:
def func1(a, b):
    return a + b

func1(b=1, a=2)

In [None]:
def func2(a, b, /):
    return a + b

func2(1, 2)

In [None]:
try:
    func2(b=1, a=2)   # a and b are positional-only, so names are rejected
except TypeError:
    print("Nope!")

In [None]:
def average(*all_args):
    return sum(all_args)/len(all_args)

In [None]:
average(3,2,4,5,1,4,0,1,8)
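The cells above show `/` and `*all_args`. As a complementary sketch, here is the bare `*` (keyword-only parameters) along with `**extras` for collecting leftover named arguments. The functions `report` and `area` are hypothetical, made up for this illustration.

```python
def report(name, /, *scores, unit="pts", **extras):
    """name is positional-only; scores soaks up extra positionals;
    unit comes after *scores, so it can only be passed by name;
    extras collects any other named arguments into a dict."""
    total = sum(scores)
    return f"{name}: {total} {unit}", extras

line, extras = report("Ada", 3, 4, 5, unit="goals", team="red")
print(line, extras)

def area(*, width, height):
    """The bare * means width and height must be passed by name."""
    return width * height

print(area(width=3, height=4))

try:
    area(3, 4)                # positionals are not accepted here
except TypeError as err:
    print("Nope!", err)
```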

## Galton Board and Pascal's Triangle

Here's some raw material for thinking more about statistics.

![](http://4dsolutions.net/ocn/graphics/randtrianim.gif)

From:  [Numeracy Series: Python + POV-Ray](http://4dsolutions.net/ocn/numeracy0.html) (Oregon Curriculum Network, early 2000s)

In [None]:
from pascal import pascal

In [None]:
p = pascal()

In [None]:
next(p)

In [None]:
for _ in range(10):
    next(p)
print(next(p))
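The `pascal` module lives alongside this notebook and is not shown here. As a rough sketch of how such a generator might work (an assumption, not necessarily how `pascal.py` is written), each new row can be built by adding neighboring entries of the row above. The name `pascal_rows` is hypothetical.

```python
def pascal_rows():
    """Yield rows of Pascal's Triangle forever, starting with [1]."""
    row = [1]
    while True:
        yield row
        # each inner entry is the sum of the two entries above it
        row = [1] + [a + b for a, b in zip(row, row[1:])] + [1]

rows = pascal_rows()
first_five = [next(rows) for _ in range(5)]
for r in first_five:
    print(r)
```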

In [None]:
from IPython.display import YouTubeVideo

In [None]:
YouTubeVideo("EvHiee7gs9Y")

The two cells below preview where we'll be going in Data Analysis with Python: into pandas, with its most important type, the DataFrame.

In [None]:
import pandas as pd  # see landing_page for other examples

The cell below says "make a data table with one column, entitled pumpkins, with the next row from Pascal's Triangle for data." The leftmost indexing column, starting at zero, shows up for free.

In [None]:
table = pd.DataFrame({"pumpkins": next(p)}) # next(p) is a whole row
table

Once you have a pandas data table, named table in this example, you have immediate access to its plot methods.

In [None]:
table.plot(title="Binomial Distribution");

Why a Bell Curve?  Think of how many ways a pumpkin, raining down through a Galton Board of pegs, might find its way to the bottom row.  It has many more ways to reach a center slot than an extreme left or right slot.

The balls falling through the Galton Board end up in buckets and the mathematical description of their counts is the Binomial Distribution. That's what the stair step pattern centers on, given enough trials.
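To make the "number of ways" idea concrete, here is a small simulation sketch. It assumes each ball bounces left or right with equal chance at every peg, so its final slot is just the number of rightward bounces. The constants `ROWS` and `BALLS` and the seed are arbitrary choices for illustration.

```python
from collections import Counter
from math import comb
from random import Random

ROWS = 10           # rows of pegs (arbitrary)
BALLS = 10_000      # how many balls to drop (arbitrary)
rng = Random(42)    # seeded so reruns give the same tallies

# number of distinct paths into each slot: a row of Pascal's Triangle
ways = [comb(ROWS, k) for k in range(ROWS + 1)]

# each bounce is 0 (left) or 1 (right); the slot is the count of rights
slots = Counter(sum(rng.randint(0, 1) for _ in range(ROWS))
                for _ in range(BALLS))

for k in range(ROWS + 1):
    expected = BALLS * ways[k] / 2**ROWS    # binomial expectation per slot
    print(k, ways[k], slots[k], round(expected))
```

The simulated counts should hover near the expected column, with the center slots far fuller than the edges.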

Smoothing the stair step pattern into a curve leads to the Normal Distribution, which is continuous rather than discrete. As the number of rows grows, the Binomial Distribution gets closer and closer to it. The Normal Distribution is also known as a Gaussian Distribution or, most informally, as the Bell Curve.

Exhibit:  [Asking Gemini to plot the Gaussian Distribution](https://colab.research.google.com/drive/1ZcZLKAjQsa3Pui-trngWvkON50Ab2tpu?usp=sharing)



In [None]:
YouTubeVideo("zeJD6dqJ5lo")

In [None]:
YouTubeVideo("_YOr_yYPytM")

Since we have the pandas DataFrame in play, let's preview creating a table with more than one column.

Remember the unnamed numeric index column on the far left comes to us for free.  

Our initializing arguments, if passed as a dictionary, take the form column_name: values, and the columns appear in the same order as the keys.

In [None]:
df = pd.DataFrame({
    'Name': ['Luke','Gina','Sam','Emma'],
    'Status': ['Father', 'Mother', 'Son', 'Daughter'],
    'Birthyear': [1976, 1984, 2013, 2016],
})

In [None]:
df

This example is from one of the linked readings about how lambda functions get used in the real world. 

Here we use the `apply` method to create a new age column, computing each entry from the corresponding Birthyear.

In [None]:
df['age'] = df['Birthyear'].apply(lambda x: 2024-x)

In [None]:
df
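For simple arithmetic like this, pandas can also operate on the whole column at once, without a lambda. The sketch below rebuilds a smaller table (named `people` here so it does not clobber `df`) and compares the two approaches. `REFERENCE_YEAR` is an assumed constant matching the 2024 used above.

```python
import pandas as pd

people = pd.DataFrame({
    "Name": ["Luke", "Gina", "Sam", "Emma"],
    "Birthyear": [1976, 1984, 2013, 2016],
})
REFERENCE_YEAR = 2024   # assumed, to match the cell above

# element by element, via a lambda
people["age_apply"] = people["Birthyear"].apply(lambda x: REFERENCE_YEAR - x)

# the whole column in one expression
people["age_vector"] = REFERENCE_YEAR - people["Birthyear"]

print(people)
```

Both columns hold the same values. `apply` earns its keep when the per-entry computation is more involved than plain arithmetic.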

### Sampling from a Population

Let's do a simulation that exercises all our skills and adds more reading fluency around the class concept.

In [None]:
if 'randint' not in dir():
    from random import randint
    
class Dino:

    def __init__(self):
        self.health  = randint(1, 21)
        self.age     = randint(1, 101)
        self.stomach = [ ]

    def eat(self, food):
        # print("Yum!")
        self.stomach.append(food)

    def __repr__(self):
        return f"Dino at {id(self)}"

In [None]:
population = [Dino() for _ in range(100)] # make 100 dinos

In [None]:
print(foods) # still available from the top of the Notebook

In [None]:
def feed_some(pop, the_food):
    for p in pop:
        if randint(1, 10) < 5:
            p.eat(the_food)
        
def feed_all(pop, the_food):
    for p in pop:
        p.eat(the_food)

In [None]:
feed_some(population, '🍔')
feed_some(population, '🍩')
feed_some(population, '🍦')

In [None]:
def sample(pop, size=30):
    shuffle(pop)
    return pop[:size]

In [None]:
sample1 = sample(population, 10)
sample1

In [None]:
for d in sample1:
    print(d, d.stomach)
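Two details in the cells above are worth a closer look: `randint(1, 10) < 5` is true for 1 through 4, so each dino has about a 40% chance of being fed, and `shuffle(pop)` reorders the population list in place. The sketch below uses a hypothetical stand-in class `Eater` and `random.sample`, which picks without replacement and leaves the original list untouched, then compares the fed share in the sample with the share in the whole population.

```python
from random import Random

rng = Random(0)   # seeded so reruns match

class Eater:
    """Hypothetical stand-in for Dino: it only has a stomach."""
    def __init__(self):
        self.stomach = []

herd = [Eater() for _ in range(100)]
for e in herd:
    if rng.randint(1, 10) < 5:       # true for 1..4, so roughly 40% get fed
        e.stomach.append("burger")

def share_fed(group):
    """Fraction of the group with a burger in the stomach."""
    return sum("burger" in e.stomach for e in group) / len(group)

before = list(herd)                  # remember the original order
picked = rng.sample(herd, 10)        # 10 distinct members, herd is not reordered

print("population:", share_fed(herd))
print("sample:    ", share_fed(picked))
```

The sample share is only an estimate of the population share; drawing several samples shows how much it bounces around.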

## More Practice with Chance

In [None]:
"\u2680"

In [None]:
class Die:
    """
    A single die that only keeps track of one value,
    and represents itself with that value.

    Method: throw
    """
    unicode_faces = {1:"\u2680", 2:"\u2681", 
                     3:"\u2682", 4:"\u2683", 
                     5:"\u2684", 6:"\u2685"}

    show_faces = True # use Unicode display
    
    def __init__(self):
        # 1 facing up to start
        self.faceup = 1
        
    def throw(self):
        """
        throw me, return int
        """
        self.faceup = randint(1, 6)
        return self.faceup
        
    def __repr__(self):
        """
        represent myself
        """
        if self.show_faces: # finds at class level
            output = Die.unicode_faces[self.faceup]
        else: 
            output = "Value: {}".format(self.faceup)
        return output

In [None]:
def shake_em(how_many):
    dice = [Die() for _ in range(how_many)]
    for d in dice:
        d.throw()
    return dice

In [None]:
my_turn = shake_em(5)
list(my_turn)

In [None]:
Die.show_faces = False
list(my_turn)
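Dice connect back to the Bell Curve discussion: add two dice and the totals pile up in the middle. This sketch uses plain `randint` calls rather than the `Die` class so it runs on its own. The seed and the number of throws are arbitrary choices.

```python
from collections import Counter
from random import Random

rng = Random(6)    # seeded so reruns match
THROWS = 3600      # arbitrary number of two-dice throws

# total of two dice, tallied over many throws
totals = Counter(rng.randint(1, 6) + rng.randint(1, 6)
                 for _ in range(THROWS))

# a rough text histogram, one star per 20 occurrences
for total in range(2, 13):
    print(f"{total:2d} {totals[total]:4d} " + "*" * (totals[total] // 20))
```

There are six ways to roll a 7 but only one way to roll a 2 or a 12, which is why the middle rows are longest.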

## itertools (continued)

We have looked at some of the iterators we're able to create using this Standard Library module. Let's look at a couple more, again with emphasis on how lambda expressions might come in handy.

In [None]:
from itertools import filterfalse, starmap, takewhile
from random import randint

In [None]:
r_nums = [randint(0, 20) for _ in range(20)]
print(r_nums)

In [None]:
# play with the lambda condition e.g. get nums not > 5
iterator = filterfalse(lambda x: x%2==0, r_nums)

In [None]:
for odd in iterator:
    print(odd, end=" ")

In [None]:
print(list(filter(lambda x: x%2==0, r_nums)))

In [None]:
output = starmap(lambda x, y: x + " " + y, zip(("Joe", "Sally", "Tom"), 
        ("Smith", "Conner", "Wells")))

In [None]:
list(output)

In [None]:
import primes

In [None]:
p = primes.iter_prime()

In [None]:
under1000 = takewhile(lambda x: x < 1000, p)  # similar to islice

In [None]:
print(list(under1000))
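The `primes` module sits next to this notebook and is not shown here. As a hedged sketch of what an endless prime generator could look like (an assumption, not the actual `primes.iter_prime`), the hypothetical `iter_prime_sketch` below uses trial division against the primes found so far. It also contrasts `takewhile`, which stops at a condition, with `islice`, which stops at a count.

```python
from itertools import islice, takewhile

def iter_prime_sketch():
    """Yield primes forever, testing each candidate against earlier primes."""
    found = []
    candidate = 2
    while True:
        # only primes up to the square root of candidate need checking
        if all(candidate % p for p in found if p * p <= candidate):
            found.append(candidate)
            yield candidate
        candidate += 1

first_ten = list(islice(iter_prime_sketch(), 10))                  # stop after 10 items
under_30 = list(takewhile(lambda x: x < 30, iter_prime_sketch()))  # stop once x >= 30

print(first_ten)
print(under_30)
```

Without `takewhile` or `islice`, calling `list()` on an endless generator would never finish.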