[View in Colaboratory](https://colab.research.google.com/github/cliffrc/orely-jupyter-live/blob/master/SafariOnline_Day2_Part1.ipynb)

# **Essential Machine Learning and Exploratory Data Analysis with Python and Jupyter Notebook**

### Part 1: IO Operations in Python and Pandas (180 minutes)

* Writing a file
* Reading a file
* Using Subprocess Module
* Reading YAML files

# Writing to a file

In [0]:
f = open('workfile.txt', 'w')
f.write("foo")
f.close()
!cat workfile.txt

foo

## Writing to a file with a context manager

In [0]:
with open("workfile.txt", "w") as workfile:
    workfile.write("bam")
!cat workfile.txt

bam

### Reading a file

In [0]:
f = open("workfile.txt", "r")
out = f.readlines()
f.close()
print(out)

['bam']
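
The read above can also be done with a context manager, which closes the file even if an error is raised partway through. A minimal sketch, assuming a scratch file named `lines_demo.txt` (a hypothetical name, not part of the original notebook):

```python
# "lines_demo.txt" is a hypothetical scratch file used only for this sketch.
with open("lines_demo.txt", "w") as demo_file:
    demo_file.write("foo\nbar\nbaz\n")

with open("lines_demo.txt", "r") as demo_file:
    # Iterating over the file object yields one line at a time,
    # so the whole file never has to be held in memory at once.
    lines = [line.rstrip("\n") for line in demo_file]

print(lines)
```

Unlike `readlines()`, which returns every line with its trailing newline, this version strips the newline from each line as it is read.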


### Write a sentence built from two values


In [0]:
number = 1.0
my_string = "My favorite number"
statement = f"{my_string} {number}"

In [0]:
print(statement)

My favorite number 1.0


In [0]:
with open("workfile2.txt", "w") as workfile:
    workfile.write(statement)
!cat workfile2.txt

My favorite number 1.0

#### How do I create a DataFrame in Pandas?
* Load a CSV file (remotely or locally)
* Load a list or dictionary to create a DataFrame
* Create an empty DataFrame and programmatically insert values into it
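
The third approach in the list above might look something like the following. This is only a sketch: the column names and row values are made up for illustration and do not come from the UFC dataset.

```python
import pandas as pd

# Start with an empty DataFrame that only declares its columns.
# Column names and rows are hypothetical, chosen for illustration.
fighters_df = pd.DataFrame(columns=["name", "wins"])

# Insert rows one at a time by assigning to the next integer label.
for name, wins in [("Fighter A", 3), ("Fighter B", 5)]:
    fighters_df.loc[len(fighters_df)] = [name, wins]

print(fighters_df)
```

Row-by-row insertion is fine for a handful of records. For larger data it is usually faster to collect the rows in a list first and build the DataFrame once.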


### Load CSV from Github (or any url) into Pandas DataFrame

Just pass in a URL that points to a CSV file. Note the use of the "raw" GitHub path.

In [0]:
import pandas as pd
csv_url = 'https://raw.githubusercontent.com/noahgift/mma/master/data/ufc_fights_all.csv'
mma_df = pd.read_csv(csv_url)
mma_df.head()

Unnamed: 0,pageurl,eid,mid,event_name,event_org,event_date,event_place,f1pageurl,f2pageurl,f1name,f2name,f1result,f2result,f1fid,f2fid,method,method_d,ref,round,time
0,/events/UFC-1-The-Beginning-7,7,8,UFC 1 - The Beginning,Ultimate Fighting Championship,11/12/93,"McNichols Arena, Denver, Colorado, United States",/fighter/Royce-Gracie-19,/fighter/Gerard-Gordeau-15,Royce Gracie,Gerard Gordeau,win,loss,19,15,Submission,Rear-Naked Choke,Helio Vigio,1,1:44
1,/events/UFC-1-The-Beginning-7,7,7,UFC 1 - The Beginning,Ultimate Fighting Championship,11/12/93,"McNichols Arena, Denver, Colorado, United States",/fighter/Jason-DeLucia-22,/fighter/Trent-Jenkins-23,Jason DeLucia,Trent Jenkins,win,loss,22,23,Submission,Rear-Naked Choke,Joao Alberto Barreto,1,0:52
2,/events/UFC-1-The-Beginning-7,7,6,UFC 1 - The Beginning,Ultimate Fighting Championship,11/12/93,"McNichols Arena, Denver, Colorado, United States",/fighter/Royce-Gracie-19,/fighter/Ken-Shamrock-4,Royce Gracie,Ken Shamrock,win,loss,19,4,Submission,Rear-Naked Choke,Helio Vigio,1,0:57
3,/events/UFC-1-The-Beginning-7,7,5,UFC 1 - The Beginning,Ultimate Fighting Championship,11/12/93,"McNichols Arena, Denver, Colorado, United States",/fighter/Gerard-Gordeau-15,/fighter/Kevin-Rosier-17,Gerard Gordeau,Kevin Rosier,win,loss,15,17,TKO,Corner Stoppage,Joao Alberto Barreto,1,0:59
4,/events/UFC-1-The-Beginning-7,7,4,UFC 1 - The Beginning,Ultimate Fighting Championship,11/12/93,"McNichols Arena, Denver, Colorado, United States",/fighter/Ken-Shamrock-4,/fighter/Patrick-Smith-21,Ken Shamrock,Patrick Smith,win,loss,4,21,Submission,Heel Hook,Helio Vigio,1,1:49


### Pandas DataFrame Column to List



In [0]:
winning_technique_list = mma_df['method_d'].tolist()
winning_technique_list[0:4]

['Rear-Naked Choke', 'Rear-Naked Choke', 'Rear-Naked Choke', 'Corner Stoppage']

### List to Pandas DataFrame 


In [0]:
techniques_df = pd.DataFrame(winning_technique_list)
techniques_df.columns = ["stoppage"]
techniques_df.head()

Unnamed: 0,stoppage
0,Rear-Naked Choke
1,Rear-Naked Choke
2,Rear-Naked Choke
3,Corner Stoppage
4,Heel Hook


### Pandas DataFrame to Dictionary

Grab a couple of records and convert them to a Python dictionary.

In [0]:
mma_dict = mma_df.head(2).to_dict()
mma_dict

{'eid': {0: 7, 1: 7},
 'event_date': {0: '11/12/93', 1: '11/12/93'},
 'event_name': {0: 'UFC 1 - The Beginning', 1: 'UFC 1 - The Beginning'},
 'event_org': {0: 'Ultimate Fighting Championship',
  1: 'Ultimate Fighting Championship'},
 'event_place': {0: 'McNichols Arena, Denver, Colorado, United States',
  1: 'McNichols Arena, Denver, Colorado, United States'},
 'f1fid': {0: 19, 1: 22},
 'f1name': {0: 'Royce Gracie', 1: 'Jason DeLucia'},
 'f1pageurl': {0: '/fighter/Royce-Gracie-19', 1: '/fighter/Jason-DeLucia-22'},
 'f1result': {0: 'win', 1: 'win'},
 'f2fid': {0: 15, 1: 23},
 'f2name': {0: 'Gerard Gordeau', 1: 'Trent Jenkins'},
 'f2pageurl': {0: '/fighter/Gerard-Gordeau-15',
  1: '/fighter/Trent-Jenkins-23'},
 'f2result': {0: 'loss', 1: 'loss'},
 'method': {0: 'Submission', 1: 'Submission'},
 'method_d': {0: 'Rear-Naked Choke', 1: 'Rear-Naked Choke'},
 'mid': {0: 8, 1: 7},
 'pageurl': {0: '/events/UFC-1-The-Beginning-7',
  1: '/events/UFC-1-The-Beginning-7'},
 'ref': {0: 'Helio Vigio',

### Dictionary to Pandas DataFrame

Take a dictionary of dictionaries and turn it into a Pandas DataFrame.


In [0]:
mma_df2 = pd.DataFrame(mma_dict)
mma_df2.head()

Unnamed: 0,eid,event_date,event_name,event_org,event_place,f1fid,f1name,f1pageurl,f1result,f2fid,f2name,f2pageurl,f2result,method,method_d,mid,pageurl,ref,round,time
0,7,11/12/93,UFC 1 - The Beginning,Ultimate Fighting Championship,"McNichols Arena, Denver, Colorado, United States",19,Royce Gracie,/fighter/Royce-Gracie-19,win,15,Gerard Gordeau,/fighter/Gerard-Gordeau-15,loss,Submission,Rear-Naked Choke,8,/events/UFC-1-The-Beginning-7,Helio Vigio,1,1:44
1,7,11/12/93,UFC 1 - The Beginning,Ultimate Fighting Championship,"McNichols Arena, Denver, Colorado, United States",22,Jason DeLucia,/fighter/Jason-DeLucia-22,win,23,Trent Jenkins,/fighter/Trent-Jenkins-23,loss,Submission,Rear-Naked Choke,7,/events/UFC-1-The-Beginning-7,Joao Alberto Barreto,1,0:52


### Pandas DataFrame to CSV

In [0]:
mma_df2.to_csv("small_mma_records.csv")
!cat small_mma_records.csv

,eid,event_date,event_name,event_org,event_place,f1fid,f1name,f1pageurl,f1result,f2fid,f2name,f2pageurl,f2result,method,method_d,mid,pageurl,ref,round,time
0,7,11/12/93,UFC 1 - The Beginning,Ultimate Fighting Championship,"McNichols Arena, Denver, Colorado, United States",19,Royce Gracie,/fighter/Royce-Gracie-19,win,15,Gerard Gordeau,/fighter/Gerard-Gordeau-15,loss,Submission,Rear-Naked Choke,8,/events/UFC-1-The-Beginning-7,Helio Vigio,1,1:44
1,7,11/12/93,UFC 1 - The Beginning,Ultimate Fighting Championship,"McNichols Arena, Denver, Colorado, United States",22,Jason DeLucia,/fighter/Jason-DeLucia-22,win,23,Trent Jenkins,/fighter/Trent-Jenkins-23,loss,Submission,Rear-Naked Choke,7,/events/UFC-1-The-Beginning-7,Joao Alberto Barreto,1,0:52
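
The CSV written above starts with an unnamed column because `to_csv` writes the row index by default. A small sketch of how that column could be left out, using a stand-in DataFrame with illustrative values and an in-memory buffer instead of a file:

```python
import io
import pandas as pd

# A tiny stand-in frame; the values are illustrative, not the UFC records.
small_df = pd.DataFrame({"eid": [7, 7], "method": ["Submission", "TKO"]})

# index=False omits the row labels, so no unnamed first column is written.
buffer = io.StringIO()
small_df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)

# Reading the text back gives the same columns and values.
round_trip_df = pd.read_csv(io.StringIO(csv_text))
print(round_trip_df)
```

Without `index=False`, reading the file back with `pd.read_csv` produces an extra `Unnamed: 0` column unless `index_col=0` is passed.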


### Google Sheets to Pandas DataFrame

**Install Google Spreadsheet Library**

In [0]:
!pip install --upgrade -q gspread

**Authenticate to API**

In [0]:
from google.colab import auth
auth.authenticate_user()

import gspread
from oauth2client.client import GoogleCredentials

gc = gspread.authorize(GoogleCredentials.get_application_default())

**Create a Spreadsheet and Put Items in It**

Note: you could also open an existing spreadsheet instead of creating a new one.

In [0]:
sh = gc.create('pramaticai-test')
worksheet = sh.sheet1
cell_list = worksheet.range('A1:A10')

# Fill cells A1 through A10 with the numbers 1 through 10
for count, cell in enumerate(cell_list, start=1):
    cell.value = count
worksheet.update_cells(cell_list)

**Convert Spreadsheet Data to Pandas DataFrame**

In [0]:
worksheet = gc.open('pramaticai-test').sheet1
rows = worksheet.get_all_values()
import pandas as pd
df = pd.DataFrame.from_records(rows)
df

Unnamed: 0,0
0,1
1,2
2,3
3,4
4,5
5,6
6,7
7,8
8,9
9,10


### Create A Flask API Around Pandas


Example of creating an API around Pandas with Flask

https://github.com/noahgift/pai-aws



### Serialize a Python Dictionary to Pickle

In [0]:
mydict = {"one":1, "two":2}


In [0]:
import pickle


In [0]:
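# Use a context manager so the file is flushed and closed after writing
with open('mydictionary.pickle', 'wb') as picklefile:
    pickle.dump(mydict, picklefile)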

In [0]:
!ls -l mydictionary.pickle

-rw-r--r-- 1 root root 32 Apr  3 13:29 mydictionary.pickle


In [0]:
!cat mydictionary.pickle

�}q (X   oneqKX   twoqKu.

In [0]:
with open('mydictionary.pickle', 'rb') as picklefile:
    res = pickle.load(picklefile)

In [0]:
print(res)

{'one': 1, 'two': 2}
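
The same round trip can be done in memory with `pickle.dumps` and `pickle.loads`, which is handy when the bytes are going to a cache or a network socket rather than a file. A minimal sketch (the variable names are illustrative):

```python
import pickle

original = {"one": 1, "two": 2}

# dumps() returns the pickled bytes instead of writing them to a file.
payload = pickle.dumps(original)

# loads() rebuilds an equal, but separate, object from those bytes.
restored = pickle.loads(payload)

print(type(payload).__name__)
print(restored == original, restored is original)
```

Only unpickle data from a source you trust: loading a pickle can execute arbitrary code.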


### Serialize a Python Dictionary to JSON


In [0]:
import json
with open('data.json', 'w') as outfile:
    json.dump(res, outfile)

In [0]:
!cat data.json

{"one": 1, "two": 2}

In [0]:
with open('data.json', 'r') as infile:
    res2 = json.load(infile)

In [0]:
print(res2)

{'one': 1, 'two': 2}
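
JSON supports fewer types than pickle, so a round trip does not always return exactly what went in. A short sketch, using a made-up record, that also shows the `indent` and `sort_keys` options for readable output:

```python
import json

# Illustrative record; the "pair" key is added to show type conversion.
record = {"two": 2, "one": 1, "pair": (1, 2)}

# indent and sort_keys make the output easier to read and to diff.
text = json.dumps(record, indent=2, sort_keys=True)
print(text)

# JSON has no tuple type, so the tuple comes back as a list.
decoded = json.loads(text)
print(decoded["pair"])
```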


### Save to Yaml

In [0]:
import yaml

In [0]:
with open("data.yaml", "w") as yamlfile:                                               
    yaml.safe_dump(res2, yamlfile, default_flow_style=False) 

In [0]:
!cat data.yaml

one: 1
two: 2


### Load Yaml

In [0]:
with open("data.yaml", "rb") as yamlfile:                                               
    res3 = yaml.safe_load(yamlfile) 

In [0]:
print(res3)

{'one': 1, 'two': 2}


## Concurrency in Python

### Using the subprocess module



In [0]:
import subprocess


In [0]:
res = subprocess.Popen("ls -l", shell=True, stdout=subprocess.PIPE)


In [0]:
out = res.stdout.readlines()

In [0]:
print(out)

[b'total 12\n', b'drwxr-xr-x 1 root root 4096 Mar 13 21:48 datalab\n', b'-rw-r--r-- 1 root root   22 Apr  3 12:42 workfile2.txt\n', b'-rw-r--r-- 1 root root    3 Apr  3 12:42 workfile.txt\n']
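
For simple cases, `subprocess.run` is a higher-level alternative to `Popen`. The sketch below passes the command as a list, so no shell is involved, and asks for decoded text instead of bytes. It runs the current Python interpreter as the child process purely to stay portable; that choice is an assumption of this example, not something from the original notebook.

```python
import subprocess
import sys

# Pass the command as a list so no shell is involved.
# sys.executable is used here only to keep the sketch portable.
result = subprocess.run(
    [sys.executable, "-c", "print('hello from a child process')"],
    capture_output=True,  # collect stdout and stderr
    text=True,            # decode bytes to str
    check=True,           # raise CalledProcessError on a non-zero exit
)

print(result.returncode)
print(result.stdout.strip())
```

Avoiding `shell=True` matters most when any part of the command comes from user input, because the shell would otherwise interpret special characters in it.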


### Multiprocessing

In [0]:
from multiprocessing import Pool

def f(x):
    return x*x

if __name__ == '__main__':
    # The context manager shuts the worker pool down when the block exits
    with Pool(5) as p:
        print(p.map(f, [1, 2, 3]))

[1, 4, 9]


In [0]:
from multiprocessing import Process, Queue

def f(q):
    q.put([42, None, 'hello'])

if __name__ == '__main__':
    q = Queue()
    p = Process(target=f, args=(q,))
    p.start()
    print(q.get())    # prints "[42, None, 'hello']"
    p.join()

[42, None, 'hello']


### Async IO in Python

More info here:  https://docs.python.org/3/library/asyncio.html




```python
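import asyncio
import logging
import time

# Assumption: LOG is a standard library logger
LOG = logging.getLogger(__name__)

# NOTE: firehose_client(), put_record(), and gen_uuid_events() are helpers
# that are not defined in this notebook; they must be supplied separately.

def send_async_firehose_events(count=100):
    """Async sends events to firehose"""

    start = time.time()
    client = firehose_client()
    extra_msg = {"aws_service": "firehose"}
    loop = asyncio.get_event_loop()
    tasks = []
    LOG.info(f"sending async events TOTAL {count}", extra=extra_msg)
    for num in range(1, count + 1):
        # Schedule one put_record call per event without waiting for it
        tasks.append(asyncio.ensure_future(put_record(gen_uuid_events(), client)))
        LOG.info(f"sending async events: COUNT {num}/{count}")
    # Block until every scheduled task has finished
    loop.run_until_complete(asyncio.wait(tasks))
    loop.close()
    end = time.time()
    LOG.info("Total time: {}".format(end - start))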
```
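
The function above depends on helpers that are not shown here, so it cannot be run on its own. The following self-contained sketch illustrates the same idea of starting many I/O-bound calls at once. `fake_put_record` is a hypothetical stand-in that sleeps instead of sending anything, and `asyncio.gather` is swapped in for the `ensure_future` / `wait` pair.

```python
import asyncio
import time


async def fake_put_record(event_id):
    """Hypothetical stand-in for a network call; sleeps instead of sending."""
    await asyncio.sleep(0.1)
    return event_id


async def send_all(count):
    # gather() schedules every coroutine and waits for all of them,
    # returning results in the order the coroutines were passed in.
    return await asyncio.gather(*(fake_put_record(i) for i in range(count)))


start = time.perf_counter()
results = asyncio.run(send_all(10))
elapsed = time.perf_counter() - start

print(results)
print(f"elapsed: {elapsed:.2f}s")
```

Ten sequential calls would take about one second; run concurrently they should finish in roughly the time of a single call. Inside a notebook that already has a running event loop, `asyncio.run` raises an error, and `await send_all(10)` in a cell can be used instead.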


### Larger Scale Concurrency



*   AWS Batch
*   Custom Architectures



### Pyomo Article

https://www.ibm.com/developerworks/cloud/library/cl-optimizepythoncloud1/

![Worker Farm](https://www.ibm.com/developerworks/cloud/library/cl-optimizepythoncloud2/figure1.gif)

### Walk Through Social Power NBA EDA and ML Project

https://github.com/noahgift/socialpowernba/blob/master/notebooks/exploring_team_valuation_nba.ipynb