## In the Rear View Mirror

What have we seen already?

####  Session 1:  numpy arrays

The numpy package builds on Python and introduces a new star of the show: the numpy array, or ndarray.

In [1]:
import numpy as np

new_star = np.array(list("123456789"), dtype=int).reshape(3,3)
new_star

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
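A few attributes describe any ndarray. The following is a minimal sketch that rebuilds the same 3x3 grid with np.arange instead of parsing a string, then inspects its shape, dimensions, and element type. The name grid is illustrative only.

```python
import numpy as np

# Same 3x3 grid as above, built from a range instead of a list of characters
grid = np.arange(1, 10).reshape(3, 3)

print(grid.shape)   # (rows, columns)
print(grid.ndim)    # number of dimensions (axes)
print(grid.dtype)   # one element type shared by the whole array
print(grid[1, 2])   # row 1, column 2 -> 6
print(grid[:, 0])   # first column -> [1 4 7]
```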

####  Session 2:  numpy array powers

Array-based computing removes much of the explicit looping syntax, making concise, element-wise expressions the new reality.

In [2]:
a = np.arange(1, 10, 2)
a

array([1, 3, 5, 7, 9])

In [3]:
b = np.linspace(1, 9, 5)
b

array([1., 3., 5., 7., 9.])

In [4]:
c = a + b
c

array([ 2.,  6., 10., 14., 18.])
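To see what "removing the loop" means, here is the same addition written both ways. This minimal sketch reuses a and b from above. Note that int + float comes back as float, which is why c prints with decimal points.

```python
import numpy as np

a = np.arange(1, 10, 2)      # integers 1, 3, 5, 7, 9
b = np.linspace(1, 9, 5)     # floats 1.0, 3.0, ..., 9.0

# Explicit Python loop: element by element
c_loop = np.empty(len(a))
for i in range(len(a)):
    c_loop[i] = a[i] + b[i]

# Vectorized: one expression, the looping happens inside numpy
c_vec = a + b

print(np.array_equal(c_loop, c_vec))  # True
print(c_vec.dtype)                    # float64: int + float upcasts to float
print(a * 2 + 1)                      # scalars broadcast across every element
```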

####  Session 3:  pandas Series

The pd.Series "wraps" a one-dimensional numpy array, which one may extract using the .to_numpy method (.to_list returns a plain Python list instead).

In [5]:
import pandas as pd

c_p = pd.Series(c)
c_p

0     2.0
1     6.0
2    10.0
3    14.0
4    18.0
dtype: float64

In [6]:
c_p.to_numpy()

array([ 2.,  6., 10., 14., 18.])
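The Series adds an index on top of the array. The following minimal sketch contrasts the two extraction methods and shows that the index can carry labels. The planet labels are borrowed from the DataFrame example below purely for illustration.

```python
import numpy as np
import pandas as pd

c = np.array([2.0, 6.0, 10.0, 14.0, 18.0])
c_p = pd.Series(c)

arr = c_p.to_numpy()   # numpy ndarray
lst = c_p.to_list()    # plain Python list of Python floats
print(type(arr).__name__, type(lst).__name__)   # ndarray list

# The index is the extra "wrapper" the Series adds; it can hold labels
labeled = pd.Series(c, index=["mercury", "mars", "earth", "neptune", "saturn"])
print(labeled["earth"])   # 10.0, looked up by label
print(labeled.iloc[2])    # 10.0, looked up by position
```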

####  Session 4:  pandas DataFrame

We have many ways to initialize a pandas DataFrame, giving us a "cityscape" of side-by-side Series (columnar objects, each with its own data type).

In [7]:
rng = np.random.RandomState(0)
df = pd.DataFrame({"Col_A":a, 
                   "Col_B":b,
                   "Col_C":rng.randint(11, 30, 5)},
                   index = ['mercury', 'mars', 'earth', 
                            'neptune', 'saturn'])
df

| | Col_A | Col_B | Col_C |
|---|---|---|---|
| mercury | 1 | 1.0 | 23 |
| mars | 3 | 3.0 | 26 |
| earth | 5 | 5.0 | 11 |
| neptune | 7 | 7.0 | 14 |
| saturn | 9 | 9.0 | 14 |
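Each column keeps its own data type, which df.dtypes reports. The following minimal sketch rebuilds the same frame. Exact integer widths (int64 vs int32) can vary by platform, so only the kind of type is worth relying on.

```python
import numpy as np
import pandas as pd

a = np.arange(1, 10, 2)
b = np.linspace(1, 9, 5)
rng = np.random.RandomState(0)

df = pd.DataFrame({"Col_A": a,
                   "Col_B": b,
                   "Col_C": rng.randint(11, 30, 5)},   # integers in [11, 30)
                  index=['mercury', 'mars', 'earth', 'neptune', 'saturn'])

print(df.dtypes)        # one dtype per column: integer, float, integer
print(df["Col_B"])      # selecting one column gives back a Series
print(df.loc["earth"])  # selecting one row also gives a Series
```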


#### Session 5:  DF powers (groupby, agg, filter, lambda, regex)

With groupby, we're able to split the rows into chunks (groups), apply a function to each chunk, and combine the results (aggregate).

In [8]:
system = df.groupby(["near", "near", "near", "far", "far"])
system[['Col_A']].agg('count')

| | Col_A |
|---|---|
| far | 2 |
| near | 3 |
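The same grouping can feed several aggregations at once, and filter with a lambda can keep or drop whole groups. This is a minimal sketch. It passes the group labels as a pd.Series aligned to the index, which is an alternative to the bare list above, and the name zone is illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Col_A": np.arange(1, 10, 2),
                   "Col_B": np.linspace(1, 9, 5)},
                  index=['mercury', 'mars', 'earth', 'neptune', 'saturn'])

# Group labels aligned by index: split the rows into "near" and "far" chunks
zone = pd.Series(["near", "near", "near", "far", "far"], index=df.index)
system = df.groupby(zone)

# Apply several functions to each chunk, then combine into one table
summary = system["Col_A"].agg(["count", "sum", "mean"])
print(summary)

# filter keeps or drops whole groups; here, groups with more than 2 rows
print(system.filter(lambda g: len(g) > 2))
```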


####  Session 6:  more DF ops (apply, transform, pivot, stack)

With hierarchical indexing, we have the freedom to swap columns for rows, potentially bringing new insights into the data.

[Check It Out!](https://nbviewer.org/github/4dsolutions/clarusway_data_analysis/blob/main/DAwPy_S5_6_%28Groupby_and_Useful_Operations%29/Advanced.ipynb) (what our class work really looks like)
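Swapping columns for rows can be sketched with stack and unstack. stack moves the column labels into a second index level, producing a Series with a hierarchical (MultiIndex) index, and unstack moves them back out. This minimal example uses a tiny made-up frame.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4]}, index=["x", "y"])

tall = df.stack()          # column labels become the inner index level
print(tall)
print(tall.index.nlevels)  # 2: (row label, column label)

wide = tall.unstack()      # inner level goes back out to the columns
print(wide.equals(df))     # the round trip restores the frame

print(df.T)                # transpose: a plain rows <-> columns swap
```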

####  Session 7:  NaNs and Nones

NaNs and Nones may be fine as they are, but oftentimes we see them as problems in the data. pandas gives us cleaning tools such as isna, fillna, and dropna.

In [9]:
type(np.nan) == type(None)  # nope! (np.NaN was removed in NumPy 2.0; use np.nan)

False
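Although np.nan is a float and None is NoneType, pandas treats both as missing. The following is a minimal sketch of the basic cleaning tools.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, None, np.nan, 4.0])   # None becomes NaN in a float Series
print(s.isna())              # True where a value is missing
print(s.isna().sum())        # 2 missing values
print(s.fillna(0))           # replace missing values with 0
print(s.dropna())            # drop the rows with missing values

# In a text column, None is also detected as missing
t = pd.Series(["a", None, "c"])
print(t.isna().tolist())     # [False, True, False]
```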

####  Session 8:  Outliers

####  Session 9:  Combining DFs

####  Session 10:  Text and Time

####  Session 11:  RegEx and I/O

####  Session 12:  Recap

## Our Daily Schedule

Substitute the start times in your own timezone.

Could you write a pandas program that generates this schedule using the np.datetime64 type? Think about it.

* First lesson: 10.00 - 10.45
* Break (10 minutes): 10.45 - 10.55
* Second lesson: 10.55 - 11.40
* Break (10 minutes): 11.40 - 11.50
* Third lesson: 11.50 - 12.35
* Break (10 minutes): 12.35 - 12.45
* Fourth (last) lesson: 12.45 - 13.30
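One possible answer to the question above is sketched below. It steps a np.datetime64 clock forward by np.timedelta64 lesson and break lengths, then collects the rows in a DataFrame. The function name daily_schedule, its parameters, and the arbitrary anchor date are hypothetical. Change start to your local start time.

```python
import numpy as np
import pandas as pd

def daily_schedule(start="10:00", lesson_min=45, break_min=10, n_lessons=4):
    """Hypothetical helper: build the lesson/break timetable as a DataFrame."""
    # np.datetime64 needs a full date, so pin the clock time to an arbitrary day
    t = np.datetime64(f"2000-01-01T{start}")
    lesson = np.timedelta64(lesson_min, "m")
    pause = np.timedelta64(break_min, "m")

    def hhmm(ts):
        # Show only the clock time, in the HH.MM style of the list above
        return pd.Timestamp(ts).strftime("%H.%M")

    rows = []
    for i in range(1, n_lessons + 1):
        rows.append((f"lesson {i}", hhmm(t), hhmm(t + lesson)))
        t = t + lesson
        if i < n_lessons:              # no break after the last lesson
            rows.append(("break", hhmm(t), hhmm(t + pause)))
            t = t + pause
    return pd.DataFrame(rows, columns=["activity", "start", "end"])

print(daily_schedule())
```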

Where are the files?

The notebooks we used [were here](https://github.com/4dsolutions/clarusway_data_analysis). Feel free to clone the repo locally, then pull down updates as they are pushed to it.

## Let's Look at Git

<a data-flickr-embed="true" href="https://www.flickr.com/photos/kirbyurner/52567606601/in/dateposted-public/" title="Screen Shot 2022-12-16 at 12.44.11 PM"><img src="https://live.staticflickr.com/65535/52567606601_22a034eb5f_z.jpg" width="640" height="215" alt="Screen Shot 2022-12-16 at 12.44.11 PM"></a>

The instructor updates these notebook(s) locally (on localhost), then adds (stages) the notebook(s) for uploading, commits them, and pushes them to the GitHub repo. These are Git actions: `git add`, `git commit`, and `git push`.
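The same add, commit, push sequence can be spelled out in Python, the language of the rest of this notebook. In the minimal sketch below, the helper names publish_commands and publish and the defaults remote="origin" and branch="main" are assumptions, not taken from the course repo. By default the helper only prints the commands. Passing run=True executes them with subprocess, which requires a Git working copy.

```python
import shlex
import subprocess

def publish_commands(paths, message, remote="origin", branch="main"):
    """Hypothetical helper: the git argv lists for add -> commit -> push."""
    return [
        ["git", "add", *paths],            # stage the changed notebook(s)
        ["git", "commit", "-m", message],  # record a snapshot locally
        ["git", "push", remote, branch],   # upload the commits to the remote
    ]

def publish(paths, message, run=False, **kwargs):
    """Print each command; execute it only when run=True."""
    cmds = publish_commands(paths, message, **kwargs)
    for cmd in cmds:
        print(shlex.join(cmd))             # shell-quoted, e.g. the commit message
        if run:
            subprocess.run(cmd, check=True)
    return cmds

publish(["Advanced.ipynb"], "update notebooks")   # dry run: prints only
```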

## Topics in Python...

The rest of this Notebook is devoted to topics we took up in class.

In [10]:
class Python:

    def __repr__(self):           # <-- __rib__
        return f'Python named {self.name}'
    
    def __init__(self, nm):
        """birth method"""
        self.name = nm
        self.stomach = [ ]
        
    def eat(self, food):
        self.stomach.append(food)
        
    def __call__(self, food):     # <-- __rib__
        self.eat(food)
        
    def __getitem__(self, idx):   # <-- __rib__
        print(type(idx))
        return self.stomach[idx]

In [11]:
snake = Python("Monty")
snake

Python named Monty

In [12]:
snake2 = Python("Jerry")
snake2

Python named Jerry

In [13]:
snake.name

'Monty'

In [14]:
snake2.__dict__

{'name': 'Jerry', 'stomach': []}

In [15]:
snake.__dict__

{'name': 'Monty', 'stomach': []}
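Each instance gets its own __dict__ for its attributes, while the methods live once in the class's namespace. This minimal sketch uses a stripped-down copy of the class above.

```python
class Python:
    def __init__(self, nm):
        self.name = nm
        self.stomach = []

    def eat(self, food):
        self.stomach.append(food)

snake, snake2 = Python("Monty"), Python("Jerry")

print(vars(snake))                      # the same dict object as snake.__dict__
print("eat" in snake.__dict__)          # False: methods are not stored per instance
print("eat" in Python.__dict__)         # True: they live on the class
print(snake.stomach is snake2.stomach)  # False: each instance has its own list
```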

In [16]:
snake.eat("cheese")

In [17]:
snake.stomach

['cheese']

In [18]:
snake[0]

<class 'int'>


'cheese'

In [19]:
snake('tapioca')

In [20]:
snake.stomach

['cheese', 'tapioca']
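Defining __call__ is what lets snake('tapioca') work, because calling an instance is routed to its class's __call__ method. The following is a minimal sketch with a trimmed copy of the class.

```python
class Python:
    def __init__(self, nm):
        self.name = nm
        self.stomach = []

    def eat(self, food):
        self.stomach.append(food)

    def __call__(self, food):   # makes instances callable like functions
        self.eat(food)

snake = Python("Monty")
snake("tapioca")                 # sugar for...
Python.__call__(snake, "cake")   # ...this explicit form
print(snake.stomach)             # ['tapioca', 'cake']
print(callable(snake))           # True
print(callable(snake.name))      # False: a str instance is not callable
```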

In [21]:
snake[slice(None,None)]

<class 'slice'>


['cheese', 'tapioca']
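The square-bracket syntax is sugar for __getitem__, and the colon notation builds the slice object printed above. This minimal sketch uses a hypothetical Probe class that simply hands back whatever index it receives.

```python
class Probe:
    """Hypothetical helper: return the index object unchanged."""
    def __getitem__(self, idx):
        return idx

p = Probe()
print(p[0])        # 0
print(p[:])        # slice(None, None, None), same as slice(None, None)
print(p[1:])       # slice(1, None, None)
print(p[::2])      # slice(None, None, 2)
print(p[1, 2])     # (1, 2): a comma passes a tuple

stomach = ['cheese', 'tapioca', 'cake']
print(stomach[p[1:]])   # ['tapioca', 'cake']: a list accepts the slice object
```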

In [22]:
Python.eat(snake, "cake")

In [23]:
snake.stomach

['cheese', 'tapioca', 'cake']

In [24]:
snake2.stomach

[]

In [25]:
s = "hello, world"
str.upper(s)

'HELLO, WORLD'

In [26]:
s.upper()

'HELLO, WORLD'
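Python.eat(snake, "cake") and str.upper(s) show that obj.method(arg) is shorthand for calling the class's function with obj as the first argument (self). Looking up the method on an instance produces a bound method that remembers obj. The following is a minimal sketch.

```python
class Python:
    def __init__(self, nm):
        self.name = nm
        self.stomach = []

    def eat(self, food):
        self.stomach.append(food)

snake = Python("Monty")
bound = snake.eat                    # bound method: function + instance
print(bound.__self__ is snake)       # True: the instance is remembered
print(bound.__func__ is Python.eat)  # True: the underlying plain function
bound("cake")                        # same as Python.eat(snake, "cake")
print(snake.stomach)                 # ['cake']

s = "hello, world"
print(s.upper() == str.upper(s))     # True
```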

In [27]:
repr(snake)

'Python named Monty'

In [28]:
snake2.__repr__()

'Python named Jerry'
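repr(snake) and snake2.__repr__() agree because the built-in repr calls the special method. The class defines no __str__, so str() and f-strings fall back to __repr__ as well. The following is a minimal sketch.

```python
class Python:
    def __init__(self, nm):
        self.name = nm

    def __repr__(self):
        return f'Python named {self.name}'

snake = Python("Monty")
print(repr(snake))   # Python named Monty
print(str(snake))    # no __str__ defined, so object.__str__ uses __repr__
print(f"{snake}")    # f-strings go through format() -> str() -> __repr__ here
print([snake])       # containers show their items with repr()
```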

## Sudoku Project


In [29]:
import sudoku

In [30]:
sudoku.generate()

Complete; Time: 2.431652784347534


array([[9, 5, 8, 7, 2, 1, 6, 4, 3],
       [4, 6, 2, 9, 8, 3, 7, 5, 1],
       [1, 7, 3, 5, 6, 4, 2, 8, 9],
       [7, 8, 5, 3, 1, 6, 4, 9, 2],
       [6, 3, 4, 2, 9, 5, 1, 7, 8],
       [2, 9, 1, 8, 4, 7, 3, 6, 5],
       [8, 4, 9, 1, 7, 2, 5, 3, 6],
       [5, 1, 7, 6, 3, 8, 9, 2, 4],
       [3, 2, 6, 4, 5, 9, 8, 1, 7]])

In [31]:
B = sudoku.generate()

Complete; Time: 2.887970209121704


In [32]:
B

array([[8, 3, 9, 2, 4, 6, 1, 5, 7],
       [6, 7, 1, 8, 3, 5, 2, 4, 9],
       [4, 5, 2, 7, 1, 9, 8, 6, 3],
       [5, 2, 6, 3, 9, 1, 7, 8, 4],
       [1, 4, 8, 6, 5, 7, 9, 3, 2],
       [3, 9, 7, 4, 8, 2, 6, 1, 5],
       [7, 8, 3, 9, 6, 4, 5, 2, 1],
       [2, 6, 5, 1, 7, 3, 4, 9, 8],
       [9, 1, 4, 5, 2, 8, 3, 7, 6]])

In [33]:
sudoku.verify(B)

[45 45 45 45 45 45 45 45 45]
[45 45 45 45 45 45 45 45 45]
[45, 45, 45, 45, 45, 45, 45, 45, 45]
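The sudoku module is a local file that is not reproduced here, so its verify function may work differently from what follows. The printed 45s suggest it sums rows, columns, and 3x3 boxes. A sum of 45 is necessary but not sufficient, since a board filled with 5s also sums to 45 everywhere. A stricter check therefore compares each unit against the set {1, ..., 9}. The sketch below implements that idea under a hypothetical function name.

```python
import numpy as np

def is_valid_sudoku(board):
    """Hypothetical checker: every row, column, and 3x3 box holds 1..9 once."""
    board = np.asarray(board)
    target = set(range(1, 10))
    rows = [board[r, :] for r in range(9)]
    cols = [board[:, c] for c in range(9)]
    boxes = [board[r:r + 3, c:c + 3].ravel()
             for r in range(0, 9, 3) for c in range(0, 9, 3)]
    return all(set(unit.tolist()) == target for unit in rows + cols + boxes)

B = np.array([[8, 3, 9, 2, 4, 6, 1, 5, 7],
              [6, 7, 1, 8, 3, 5, 2, 4, 9],
              [4, 5, 2, 7, 1, 9, 8, 6, 3],
              [5, 2, 6, 3, 9, 1, 7, 8, 4],
              [1, 4, 8, 6, 5, 7, 9, 3, 2],
              [3, 9, 7, 4, 8, 2, 6, 1, 5],
              [7, 8, 3, 9, 6, 4, 5, 2, 1],
              [2, 6, 5, 1, 7, 3, 4, 9, 8],
              [9, 1, 4, 5, 2, 8, 3, 7, 6]])

print(B.sum(axis=1), B.sum(axis=0))   # row and column sums: all 45
print(is_valid_sudoku(B))             # True

fives = np.full((9, 9), 5)            # every row, column, and box sums to 45...
print(is_valid_sudoku(fives))         # ...but it is not a valid board: False
```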


In [34]:
str(1.000000000000000)

'1.0'

In [35]:
len(str(1.1230000000))

5
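Python prints a float using the shortest string that round-trips to the same value, so trailing zeros in the source literal disappear. That is why len(str(1.1230000000)) is 5. To keep a fixed number of decimals, format explicitly, as in this minimal sketch.

```python
x = 1.1230000000

print(str(x))              # 1.123: shortest round-tripping form
print(float(str(x)) == x)  # True: the short string gives back the same float
print(f"{x:.10f}")         # 1.1230000000: fixed 10 decimal places
print(format(1.0, ".3f"))  # 1.000
```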

In [36]:
import numpy as np
np.nan

nan

In [37]:
str(np.nan)

'nan'
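np.nan is an ordinary Python float (the IEEE 754 "not a number" value), which is why str() renders it as 'nan'. One quirk worth remembering is that NaN never equals anything, itself included, so test for it with a function rather than ==. The following is a minimal sketch.

```python
import math
import numpy as np

x = np.nan
print(type(x).__name__)             # float
print(x == x)                       # False: NaN is unequal even to itself
print(math.isnan(x))                # True
print(np.isnan(float("nan")))       # True: float("nan") builds the same kind of value
print(str(x) == str(float("nan")))  # True: both print as nan
```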