# Week 1 Getting your Python On

objectives:
+ the different types of operating systems
+ how to get your python code ready
+ set up your environment
+ install python modules
+ interpreted vs compiled
+ benefits and pitfalls of automation

## Getting familiar with the Operating system

The **operating system** is the software that manages everything that goes on in the computer. It reads, writes, and deletes files on the hard drive. It handles how processes start, how they interact with each other, and how they eventually finish. It manages how memory gets allocated to different processes, how network packets are sent and received, and how each program can access the different hardware components.

An OS contains *2 parts*:
+ Kernel
+ User space

### kernel:
+ the main core of an OS
+ interacts directly with the hardware and manages the system's resources
+ as users, we don't interact with the kernel directly

### user space:
+ basically everything outside of the kernel
+ the part we users interact with directly, like system programs and the user interface

### major operating systems:
+ MacOS
    + The macOS kernel and some of its user space are also based on a kernel and user space tools from the Unix family known as BSD. So although the graphical interface is extremely different between the two, the command line is actually similar.
+ Windows
+ Linux
    + open source.
    + Most servers in the world today are running Linux.
    + Linux itself is actually the name of a kernel.
    + a lot of different organizations package their own version of it, called a *distribution*.
+ Unix
    + the fundamental ideas of how Linux works today are based on the Unix principles.
    + a lot of the tools that we use to interact with the operating system are open source versions of those originally developed for Unix, and are usually referred to as Unix.
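
A script can ask which of these operating systems it is running on. The snippet below is a minimal sketch using the standard library's platform and os modules; the variable name is only illustrative.

```python
import os
import platform

# platform.system() reports the OS family the interpreter runs on,
# e.g. "Linux", "Darwin" (macOS), or "Windows".
system_name = platform.system()
print("Operating system:", system_name)

# os.name is a coarser label: "posix" on Linux and macOS, "nt" on Windows.
print("os.name:", os.name)
```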

## Getting Your Computer Ready for Python

### version checking
Go to the *command line* and type:

python --version

or

python3 --version

to check whether Python is installed and which version it is.
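
The same check can be done from inside Python. This is a small sketch using sys.version_info from the standard library; the variable names are made up for the example.

```python
import sys

# sys.version_info is a named tuple: (major, minor, micro, releaselevel, serial)
version = "{}.{}.{}".format(*sys.version_info[:3])
print("Running Python", version)

is_python3 = sys.version_info.major >= 3
print("Python 3 or newer:", is_python3)
```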

### install module
**install additional modules**:
with *a command line tool*, which on macOS is:

pip3

### arrow module

**install arrow module** in the command line:

pip3 install arrow

In [9]:
import arrow
date = arrow.get('2020-01-18', 'YYYY-MM-DD')
date.format('MMM DD YYYY')

'Jan 18 2020'

In [6]:
#increment 6 weeks
date.shift(weeks=+6).format('MMM DD YYYY')

'Feb 29 2020'

## Interpreted vs. Compiled Languages

### Compiled Language
The source code is fed into a piece of software called a **compiler**, which translates it into machine-level language that the computer can read directly.

A compiler is *operating system specific*.

### Interpreted Language
Programs written in an interpreted language generally rely on an intermediary program called an **interpreter**. These programs use the interpreter to execute the instructions specified in the code, rather than running them through a compiler first.
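
As a rough illustration of the interpreter model, the standard CPython interpreter first turns source text into a code object (bytecode) and then executes that object itself, instead of producing a machine-level executable. The source string and variable names below are made up for this sketch.

```python
# Source code held as plain text, as it would be in a .py file.
source = "total = sum(range(5))"

# The built-in compile() turns the text into a code object (bytecode),
# not into machine-level instructions.
code_object = compile(source, "<example>", "exec")

# The interpreter then executes that code object in a namespace we provide.
namespace = {}
exec(code_object, namespace)
print(namespace["total"])  # sum of 0, 1, 2, 3, 4
```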

## How to run Python Script

see **Unix Workbench** *Writing Program* section for more

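Save Python code in a file with the .py extension.

### Shebang
A shebang line at the top of a script tells the operating system what command we want to use to execute that script:

#!/usr/bin/env python3

To run the script directly, make the file executable from the command line:

chmod +x filepath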
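
Put together, a complete script might look like the sketch below. The file name hello.py and the greeting function are hypothetical, chosen only for illustration.

```python
#!/usr/bin/env python3
# hello.py (hypothetical file name)
# After "chmod +x hello.py" the shell can start it as ./hello.py,
# because the shebang line above names the interpreter to use.

def greeting(name):
    """Build the message the script prints."""
    return "Hello, {}!".format(name)

if __name__ == "__main__":
    print(greeting("world"))
```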

## Your own Python Module

see **Beginning Python: From Novice to Professional**, *chapter 10* for more

Modules promote code reuse.
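
A module is just a .py file that other code can import. The sketch below writes a tiny module to a temporary directory and imports it. The module name greetings and its hello function are invented for the example, and in everyday use a plain `import greetings` next to your script would do the same job.

```python
import importlib
import os
import sys
import tempfile

# Write a tiny module to a temporary directory (hypothetical name: greetings.py).
module_dir = tempfile.mkdtemp()
with open(os.path.join(module_dir, "greetings.py"), "w") as f:
    f.write("def hello(name):\n    return 'Hello, ' + name\n")

# Python looks for modules in the directories listed in sys.path.
sys.path.insert(0, module_dir)
importlib.invalidate_caches()
greetings = importlib.import_module("greetings")

print(greetings.hello("Ada"))
```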

## IDEs: Integrated Development Environments

# Week 2 Ways to interact with File System using Python

## programming with File

+ Operating systems use a **file system** to control how data is stored and accessed.
+ Data is usually stored on a disk and saved in **files**, which are held in containers called **directories** or **folders**.
+ File systems are usually organized in a **tree structure**, with directories and files nested under their parents.
+ A file or folder is located within the structure by its **path**.
        (1) absolute path
        (2) relative path
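
The difference between the two kinds of path can be checked with os.path. This is a minimal sketch; the directory and file names are hypothetical and the file does not need to exist.

```python
import os

# A relative path is interpreted starting from the current working directory.
relative = os.path.join("material", "notes.txt")   # hypothetical names
print(os.path.isabs(relative))

# os.path.abspath() resolves it against the current working directory.
absolute = os.path.abspath(relative)
print(os.path.isabs(absolute))
print(absolute)
```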
  

## Reading Files
When processing large chunks of data, it's a good idea to read that data from files.

 **file object** 

In [39]:
#creating a new file object and assigning it 
#to a variable called file
file = open("spider.txt")

FileNotFoundError: [Errno 2] No such file or directory: 'spider.txt'

When doing this, the operating system checks that we have **permission** to access that file and then gives our code a **file descriptor**.
+ ***file descriptor*** - A token, generated by the OS, that allows programs to do more operations with the file. In Python, this file descriptor is stored as an *attribute* of the file object.
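
The descriptor can be read back through the file object's fileno method. The sketch below creates its own throwaway file (the name example.txt is arbitrary) so it does not depend on spider.txt.

```python
import os
import tempfile

# Create a small throwaway file so the example is self-contained.
path = os.path.join(tempfile.mkdtemp(), "example.txt")
setup = open(path, "w")
setup.write("first line\n")
setup.close()

file = open(path)
# fileno() returns the integer file descriptor the OS handed to our process.
descriptor = file.fileno()
print(descriptor)
file.close()
```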

In [16]:
#readline lets us read a single line from the file.
#each time we call this method, the file object updates the current position.
print(file.readline())

le to learning python's mechanism of dealing with file.



In [17]:
#read from the current position until the end of the file.
print(file.read())

The second line
The third line
The forth line
The fifth line.
......



In [None]:
#close the file using the close method
file.close()

This **open-use-close** pattern is a typical way of working with files in most programming languages. It's a good idea to close files after you've opened them, for a few **reasons**:
+ While a file is open in your script, your file system usually locks it down so that no other programs or scripts can use it until you're finished.
+ There's a limited number of file descriptors that you can create before your file system runs out of them.
+ Leaving open files hanging around can lead to race conditions, which occur when multiple processes try to modify and read from one resource at the same time, and which can cause all sorts of unexpected behavior.
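
One way to make sure the close step always happens, even if reading fails, is a try/finally block. This is a sketch on a throwaway file; the path and contents are made up.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "example.txt")  # throwaway file
setup = open(path, "w")
setup.write("first line\nsecond line\n")
setup.close()

file = open(path)
try:
    first = file.readline()
finally:
    # Runs even if the read raises, so the descriptor is always released.
    file.close()

print(first.strip())
print(file.closed)
```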

The **with** block

In [18]:
with open("spider.txt") as file:
    print(file.readline())    

This is a sample file to learning python's mechanism of dealing with file.



The with keyword lets us create a block of code with the work we want to do with the file inside of it, and Python will **automatically close** the file after the block.
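
The automatic close can be observed through the file object's closed attribute. A small sketch on a throwaway file (the path and variable names are illustrative):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "example.txt")  # throwaway file
with open(path, "w") as out:
    out.write("only line\n")

with open(path) as file:
    line = file.readline()
    inside = file.closed      # False: still open inside the block

after = file.closed           # True: closed automatically on leaving the block
print(inside, after)
```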

## Iterating through Files

File objects can be iterated over in the same way as Python sequences like lists or strings.

In [1]:
with open("spider.txt") as file:
    for line in file:
        print(line.upper())
#each line in the file ends with a newline character.
#print adds its own newline as well, which gives rise to the empty lines.

THIS IS A SAMPLE FILE TO LEARNING PYTHON'S MECHANISM OF DEALING WITH FILE.

THE SECOND LINE

THE THIRD LINE

THE FORTH LINE

THE FIFTH LINE.

......



In [38]:
#to improve: strip the trailing newline before printing
with open("spider.txt") as file:
  for line in file:
    print(line.strip().upper())
#the empty lines have been removed

FileNotFoundError: [Errno 2] No such file or directory: 'spider.txt'

Read the file's lines into a list.

In [3]:
file = open('spider.txt')
lines = file.readlines()
file.close()
lines

["This is a sample file to learning python's mechanism of dealing with file.\n",
 'The second line\n',
 'The third line\n',
 'The forth line\n',
 'The fifth line.\n',
 '......\n']

Reading all the lines of a file with the readlines method is useful, but be cautious: if the file is very large, this will use up a lot of *memory* and lead to poor performance. So when dealing with large files, it's better to read them **line by line**.
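
A line-by-line pass keeps only the current line in memory. The sketch below generates a stand-in file of 1000 lines (the file name and the statistics collected are made up for the example) and scans it without building a list.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "big.txt")  # stand-in for a large file
with open(path, "w") as out:
    for number in range(1000):
        out.write("line {}\n".format(number))

line_count = 0
longest = 0
with open(path) as file:
    # Iterating yields one line at a time instead of loading the whole file.
    for line in file:
        line_count += 1
        longest = max(longest, len(line.rstrip("\n")))

print(line_count, longest)
```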

## Writing Files

In [4]:
with open("novel.txt","a") as file:
    file.write("This is a sunny day.\n")

File objects can be opened in different modes. A mode is similar to a file permission: it governs what you can do with the file you've just opened.
- **r** -- read-only mode, the default.
- **w** -- write-only mode. If the file doesn't exist, Python creates it; if it does exist, its current contents are overwritten by whatever we write from our script.
- **a** -- append mode, which adds content to the end of the file.
- **r+** -- read-write mode, in which you can both read the contents and overwrite them.
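
The four modes can be compared on a throwaway file. This is a sketch only; the path and the text written are invented for the example.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "novel.txt")  # throwaway copy

with open(path, "w") as file:        # creates the file
    file.write("first draft\n")
with open(path, "a") as file:        # adds to the end
    file.write("second line\n")
with open(path) as file:             # default mode is "r"
    after_append = file.read()

with open(path, "w") as file:        # truncates the existing contents
    file.write("rewritten\n")
with open(path, "r+") as file:       # read and write, starting at position 0
    before = file.read()
    file.seek(0)
    file.write("REWRITTEN")          # overwrites in place, does not truncate
with open(path) as file:
    final = file.read()

print(repr(after_append), repr(before), repr(final))
```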

**words of caution:** 
If you open a file in write mode and the file already exists, its contents are deleted as soon as you open the file. So be sure to check whether you 


## Working with Files

We work with files using functions in the **os** module. This module provides a layer of abstraction between Python and the operating system. It lets us interact with the operating system without worrying about which platform (Windows, macOS, or Linux) we are on.

**caution**:Paths can be different across different operating systems.

In [5]:
#delete the file we created earlier
import os
os.remove("novel.txt")

In [6]:
#rename a file
#the first argument is the original file name and the second argument
#is the new name.
os.rename("first_draft.txt","finished_masterpiece.txt")

FileNotFoundError: [Errno 2] No such file or directory: 'first_draft.txt' -> 'finished_masterpiece.txt'

There is a sub-module inside the os module for dealing with file information, such as whether a file exists or not. This is the **os.path** sub-module.

In [7]:
#to check whether a file exists
os.path.exists("finished_masterpiece.txt")

True

In [8]:
os.path.exists("userlist.txt")

False

We can use the **exists** function to check whether a file exists before we read or write it, to avoid losing data.
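
A guard of this kind could look like the sketch below. The helper write_if_absent is hypothetical (it is not part of the os module) and the file name is made up.

```python
import os
import tempfile

def write_if_absent(path, text):
    """Write text to path only when nothing exists there yet.

    Returns True when the file was written, False when it was left alone.
    (Hypothetical helper, not part of the os module.)
    """
    if os.path.exists(path):
        return False
    with open(path, "w") as file:
        file.write(text)
    return True

target = os.path.join(tempfile.mkdtemp(), "report.txt")
first_try = write_if_absent(target, "original\n")
second_try = write_if_absent(target, "replacement\n")
with open(target) as file:
    contents = file.read()
print(first_try, second_try, contents.strip())
```

A related built-in option is opening the file with mode "x", which raises FileExistsError when the file already exists.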

## More File information

In [9]:
#check how big a file is
os.path.getsize("spider.txt")
#the returned number is the size in bytes

86

In [10]:
#when a file was last modified
os.path.getmtime("spider.txt")

1598782292.057646

The returned number is a **Unix timestamp**: the number of seconds since January 1st, 1970. We can use the **datetime** module to make it easier for us humans to read.

In [42]:
#make timestamp more readable
import datetime
timestamp = os.path.getmtime("spider.txt")
time = datetime.datetime.fromtimestamp(timestamp)
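
To actually display the converted value, the datetime object can be formatted with strftime. The sketch below uses a throwaway file, and the format string is just one possible choice.

```python
import datetime
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "example.txt")  # throwaway file
with open(path, "w") as out:
    out.write("hello\n")

timestamp = os.path.getmtime(path)
modified = datetime.datetime.fromtimestamp(timestamp)   # local time
# strftime turns the datetime object into text in a format we choose.
formatted = modified.strftime("%Y-%m-%d %H:%M:%S")
print(formatted)
```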

In [15]:
#to check whether a path is an existing regular file
os.path.isfile("spider.txt")

True

We can use both **relative paths** and **absolute paths**.

In [17]:
#find out the absolute path of a file
os.path.abspath("spider.txt")

'/Users/zhengshaojun/material/PYOS/spider.txt'

## Directories

In [18]:
#check the current directory
print(os.getcwd())

/Users/zhengshaojun/material/PYOS


In [22]:
#create a directory
os.mkdir("new_dir")

In [23]:
#change a directory
os.chdir("new_dir")
os.getcwd()

'/Users/zhengshaojun/material/PYOS/new_dir/new_dir'

In [24]:
#remove a directory
os.mkdir("newer_dir")
os.rmdir("newer_dir")

**caution**: *rmdir* only works if the directory is *empty*, so to delete a directory we first need to delete everything inside it.
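
One way to empty a directory before removing it is to walk it from the bottom up. This sketch builds its own small tree in a temporary location; all directory and file names are made up.

```python
import os
import tempfile

# Build a small non-empty directory tree to delete (hypothetical names).
top = os.path.join(tempfile.mkdtemp(), "new_dir")
os.makedirs(os.path.join(top, "inner"))
for name in ("a.txt", os.path.join("inner", "b.txt")):
    with open(os.path.join(top, name), "w") as out:
        out.write("data\n")

# topdown=False visits the deepest directories first, so each directory
# is already empty by the time rmdir is called on it.
for dirpath, dirnames, filenames in os.walk(top, topdown=False):
    for filename in filenames:
        os.remove(os.path.join(dirpath, filename))
    os.rmdir(dirpath)

print(os.path.exists(top))
```

The standard library's shutil.rmtree does the same job in a single call.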

In [33]:
#list everything in a directory
os.listdir("/Users/zhengshaojun/material/PYOS")

['finished_masterpiece.txt',
 'spider.txt',
 'new_dir',
 '.ipynb_checkpoints',
 'Week 2 Managing Files.ipynb']

In [34]:
d = "/Users/zhengshaojun/material/PYOS"
for i in os.listdir(d):
    #join the directory and the entry name into a full path
    path = os.path.join(d, i)
    #to see whether it's a directory
    if os.path.isdir(path):
        print("{} is a directory".format(path))
    else:
        print("{} is a file".format(path))

/Users/zhengshaojun/material/PYOS/finished_masterpiece.txt is a file
/Users/zhengshaojun/material/PYOS/spider.txt is a file
/Users/zhengshaojun/material/PYOS/new_dir is a directory
/Users/zhengshaojun/material/PYOS/.ipynb_checkpoints is a directory
/Users/zhengshaojun/material/PYOS/Week 2 Managing Files.ipynb is a file


 '..' is a relative path alias that means **"go up to the parent directory"**

In [66]:
os.path.split(os.getcwd())

('/Users/zhengshaojun/material', 'PYOS')

##  CSV file

### What's a CSV File

**q:Why different types of files?** 

**a:To promote efficiency**

To be able to process a data set, it helps to know ahead of time how that data set will be arranged. If you can expect data to be represented in a certain way, it's easier to extract meaning from it.

**Parsing**:
Analyzing a file's content to correctly structure the data.

**CSV** -- **Comma Separated Values**:
a very common data format used to store data as segments of text separated by commas. It's a very simple format: these files are stored in plain text, each line in a CSV file generally represents *a single data record*, and *each field* in that record is separated by a comma.
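
Parsing a record is slightly more than splitting on commas, because a field may itself contain a comma and is then wrapped in quotes. The sketch below compares a naive split with the standard library's csv module on one made-up record.

```python
import csv
import io

# One record whose middle field itself contains a comma, so it is quoted.
text = 'Sabrina Green,"Burlington, VT",System Administrator\n'   # made-up record

naive = text.strip().split(",")               # also splits inside the quotes
parsed = next(csv.reader(io.StringIO(text)))  # honours the quotes

print(len(naive), naive)
print(len(parsed), parsed)
```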


### Reading CSV Files

Python's standard library includes a *module* named **csv** which lets us read, create and manipulate CSV files.

In [11]:
import csv
#open the file as usual
f = open("csv_file.txt")
csv_f = csv.reader(f)
for row in csv_f:
    name, phone, role = row
    print("Name:{}, Phone:{}, Role:{}".format(name, phone, role))
f.close()

Name:Sabrina Green, Phone:802-867-5309, Role:System Administrator
Name:Eli Jones, Phone:684-3481127, Role:IT specialist
Name:Melody Daniels, Phone:846-687-7436, Role:Programmer
Name:Charlie Rivera, Phone:689-746-3357, Role:Web Developer


### Generating CSV

#### using lists

In [20]:
hosts = [["workstation.local", "192.168.25.46"], 
         ["webserver.cloud","10.2.5.6"]]
with open('host.csv',"w") as hosts_csv:
    writer = csv.writer(hosts_csv)
    #writerow: write one row at a time
    #writerows:write all the rows at a time
    writer.writerows(hosts) 

#### using dictionaries

Sometimes the first line of a CSV file contains the names of the fields.

In [21]:
#Using DictReader
with open('software.csv') as software:
    reader = csv.DictReader(software)
    for row in reader:
        print("{} has {} users".format(row['name'],row['users']))

MailTree has 324 users
CalDoor has 22 users
Chatty Chicken has 4 users


In [3]:
#Using DictWriter
users = [{'name':'Sol Mansi', 'username':'solm',
          'department':'IT infrastructure'},
        {'name':'Lio Nelson','username':'lion',
         'department':'User Experience Research'},
        {'name':'CHarlie Grey','username':'greyc',
         'department':'development'}]

keys = ['name','username', 'department']
with open('by_department.csv', 'w') as by_department:
    writer = csv.DictWriter(by_department, fieldnames=keys)
    writer.writeheader()
    writer.writerows(users)

In [12]:
with open('by_department.csv') as file:
    reader = csv.DictReader(file)
    l = list(reader)
    
l

[OrderedDict([('name', 'Sol Mansi'),
              ('username', 'solm'),
              ('department', 'IT infrastructure')]),
 OrderedDict([('name', 'Lio Nelson'),
              ('username', 'lion'),
              ('department', 'User Experience Research')]),
 OrderedDict([('name', 'CHarlie Grey'),
              ('username', 'greyc'),
              ('department', 'development')])]

# Week 3 Regular Expression

## What are regular expressions?

A regular expression, also known as **regex** or **regexp**, is essentially *a search query* for text that's expressed as a *string pattern*. When you run a search against a particular piece of text, anything that matches the regular expression pattern you specified is returned as a result of the search.

**re** module

In [18]:
#a simple example
log = "July 31 07:51:48 mycomputer bad_process[12345]: ERROR Performing package upgrade"
import re
regex = r"\[(\d+)\]"
result = re.search(regex, log)
print(result[1])

12345


## Basic Matching with grep

**grep** is a command line tool that works by printing out any *line* that matches the query we pass to it. See the *'Unix Workbench'* for more.
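
The same line-filtering idea can be sketched in Python with the re module. The grep function below is a hypothetical helper, not a standard library function, and the log lines are made up.

```python
import re

def grep(pattern, lines):
    """Return the lines that contain a match for pattern (hypothetical helper)."""
    return [line for line in lines if re.search(pattern, line)]

log_lines = [   # made-up log lines
    "July 31 07:51:48 mycomputer bad_process[12345]: ERROR Performing package upgrade",
    "July 31 07:52:01 mycomputer good_process[222]: INFO Started",
    "July 31 07:53:10 mycomputer bad_process[12345]: ERROR Disk full",
]
for line in grep(r"ERROR", log_lines):
    print(line)
```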

## Simple Matching in Python

In [5]:
import re
result = re.search(r"aza", "plaza")
#the r at the beginning of the pattern indicates that this is a
#raw string, which means that the Python interpreter should not try to
#interpret any special characters.
#It's good practice to always use raw strings for patterns in Python.
print(result)

<re.Match object; span=(2, 5), match='aza'>


In [20]:
result = re.search(r"aza", "plaza")
print(result)

<re.Match object; span=(2, 5), match='aza'>


In [7]:
#expression not matched
result = re.search(r"aza", "maze")
print(result)

None


In [21]:
#make match case insensitive
print(re.search(r"p.ng","Pangaea",re.IGNORECASE))

<re.Match object; span=(0, 4), match='Pang'>


## Wildcards and Character Classes

**Character classes**:use square brackets.

To see more, go to **Unix workbench**'s section *search*

In [28]:
import re
print(re.search(r"[Pp]ython","Python"))

<re.Match object; span=(0, 6), match='Python'>


In [29]:
print(re.search(r"[a-z]way","The end of the highway"))

<re.Match object; span=(18, 22), match='hway'>


In [30]:
print(re.search(r"[a-z]way","What a way to go"))

None


In [36]:
print(re.search(r"[^a-zA-Z0-9]","This is a sentence with spaces."))

<re.Match object; span=(4, 5), match=' '>


In [38]:
#pipe symbol
print(re.search(r"cat|dog", "I like cats."))

<re.Match object; span=(7, 10), match='cat'>


In [39]:
print(re.search(r"cat|dog", "I like dogs."))

<re.Match object; span=(7, 10), match='dog'>


In [40]:
print(re.search(r"cat|dog", "I like dogs and cats."))
#search returns only the first match

<re.Match object; span=(7, 10), match='dog'>


In [41]:
#to get all the matches, use findall
print(re.findall(r"cat|dog", "I like both cats and dogs"))

['cat', 'dog']


## Repetition Qualifiers

In [43]:
print(re.search(r"Py.*n","Pygmalion"))

<re.Match object; span=(0, 9), match='Pygmalion'>


In [44]:
print(re.search(r"Py.*n","Python programming"))

<re.Match object; span=(0, 17), match='Python programmin'>


In [45]:
print(re.search(r"Py[a-z]*n","Python programming"))

<re.Match object; span=(0, 6), match='Python'>


## Escaping Characters

**\b**:word boundaries
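
A short sketch of both ideas in this section: escaping a special character with a backslash, and using \b to match at word boundaries. The sample strings are made up.

```python
import re

# An unescaped dot matches any character, so "welcome" contains a match for ".com".
loose = re.search(r".com", "welcome")
# Escaping the dot with a backslash makes it match only a literal dot.
strict_miss = re.search(r"\.com", "welcome")
strict_hit = re.search(r"\.com", "mydomain.com")

# \b matches at a word boundary, so "cat" is found only as a whole word.
whole_words = re.findall(r"\bcat\b", "cat concatenate bobcat cat.")

print(loose, strict_miss, strict_hit, whole_words)
```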

## Capturing Groups

We may want to take the information that we matched and use it for something else.

**Capturing Groups**: Portions of the pattern that are enclosed in parentheses.

In [26]:
import re
result = re.search(r"^(\w*), (\w*)$", "Lovelace, Ada")
print(result)

<re.Match object; span=(0, 13), match='Lovelace, Ada'>


In [8]:
result.groups()

('Lovelace', 'Ada')

In [15]:
result.groups()[0]

'Lovelace'

In [14]:
result.groups()[1]

'Ada'

In [16]:
"{} {}".format(result.groups()[1],result.groups()[0])

'Ada Lovelace'

In [79]:
rearrange_name("Hopper, Grace M.")

'Grace M. Hopper'

In [78]:
import re
def rearrange_name(name):
  result = re.search(r"^([\w .-]*), ([\w .-]*)$", name)
  if result is None:
    return name
  return "{} {}".format(result[2], result[1])

rearrange_name("Kennedy, John F.")


'John F. Kennedy'

In [80]:
rearrange_name("Ritche, Dennis")

'Dennis Ritche'

## More on Repetition Qualifiers

 **numeric repetition qualifiers**: written between curly brackets and can be 1 or 2 numbers specifying a range.

In [58]:
print(re.findall(r"[a-zA-Z]{5}", "a scary ghost appeared"))

['scary', 'ghost', 'appea']


In [57]:
print(re.findall(r"\b[a-zA-Z]{5}\b", "a scary ghost appeared"))

['scary', 'ghost']
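
When two numbers are given they form a range, and the upper bound can be left out. A brief sketch on the same sentence:

```python
import re

text = "a scary ghost appeared"
# {5,10}: between 5 and 10 letters; \b keeps matches to whole words.
five_to_ten = re.findall(r"\b[a-zA-Z]{5,10}\b", text)
# {6,}: with the upper bound omitted, this means "6 or more".
six_or_more = re.findall(r"\b[a-zA-Z]{6,}\b", text)

print(five_to_ten)
print(six_or_more)
```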


## Extracting a PID Using Regexes in Python

Sometimes we need a function that extracts the process ID (PID) when possible, and does something else if not.

In [62]:
import re 
log = "July 31 07:51:48 mycomputer bad_process[12345]: ERROR Performing package upgrade"
regex = r"\[(\d+)\]"
result = re.search(regex, log)
print(result[1])

12345


**problem**: if there is no match, an error will be raised

In [66]:
log = "99 elephants in a [cage]"
result = re.search(regex, log)
print(result[1])

TypeError: 'NoneType' object is not subscriptable

We need to do something so that the error won't be raised on a line that has no match.

In [64]:
def extract_pid(log_line):
    regex = r"\[(\d+)\]"
    result = re.search(regex, log_line)
    if result is None:
        return ""
    return result[1]

In [65]:
print(extract_pid("99 elephants in a [cage]"))




## Splitting and Replacing

**re.split**

In [67]:
re.split(r"[.?!]", "One sentence. Another one? And the last one!")

['One sentence', ' Another one', ' And the last one', '']

Most characters inside square brackets, like the dot and the question mark here, are taken as literal characters rather than for their special meaning.

The splitting marks aren't included in the list. If we want them to be, we can use capturing parentheses like this:

In [68]:
re.split(r"([.?!])", "One sentence. Another one? And the last one!")

['One sentence', '.', ' Another one', '?', ' And the last one', '!', '']

**re.sub**: It's used for creating new strings by substituting all or part of them for a different string, similar to the replace string method but using regular expressions for both the matching and the replacing.

*an example:*

In [70]:
re.sub(r"[\w.%+-]+@[\w.-]+", "[REDACTED]", "Received an email for go_nuts95@my.example.com")

'Received an email for [REDACTED]'

*another example:*

In [74]:
re.sub(r"^([\w .-]*), ([\w .-]*)$", r"\2 \1", "Lovelace, Ada")

'Ada Lovelace'

## Advanced Regular Expressions Cheat-sheet

https://regexcrossword.com/

In [2]:
import re
def transform_record(record):
  new_record = re.sub(r"(\d*-\d*-?\d*)", r"+1-\1", record)
  return new_record

print(transform_record("Sabrina Green,802-867-5309,System Administrator")) 
# Sabrina Green,+1-802-867-5309,System Administrator

print(transform_record("Eli Jones,684-3481127,IT specialist")) 
# Eli Jones,+1-684-3481127,IT specialist

print(transform_record("Melody Daniels,846-687-7436,Programmer")) 
# Melody Daniels,+1-846-687-7436,Programmer

print(transform_record("Charlie Rivera,698-746-3357,Web Developer")) 
# Charlie Rivera,+1-698-746-3357,Web Developer

Sabrina Green,+1-802-867-5309,System Administrator
Eli Jones,+1-684-3481127,IT specialist
Melody Daniels,+1-846-687-7436,Programmer
Charlie Rivera,+1-698-746-3357,Web Developer


In [3]:
result = re.search(r"([aeiou]{3})", "Obviously, the queen is courageous and gracious.")

In [10]:
type(result.groups())

tuple

# Week 4 Managing Data and Processes

## Reading Data Interactively

user input *function* in python: **input**

In [11]:
name = input("Please enter your name:")
print("hello," + name)

Please enter your name:shaojun
hello,shaojun


In [14]:
def to_seconds(hours, minutes, seconds):
    return hours * 3600 + minutes * 60 + seconds

print("Welcome to this time converter")

cont = "y"
while cont.lower() == "y":
    hours = int(input("Enter the number of hours:"))
    minutes = int(input("Enter the number of minutes:"))
    seconds = int(input("Enter the number of seconds:"))
    
    print("That's {} seconds.".format(to_seconds(hours, minutes, seconds)))
    print()
    cont = input("Do you want to do another conversion? [y to continue]")
print("Good bye!")

Welcome to this time converter
Enter the number of hours:3
Enter the number of minutes:3
Enter the number of seconds:3
That's 10983 seconds.

Do you want to do another conversion? [y to continue]no
Good bye!


## Standard streams

**I/O streams**: The basic mechanism for performing input and output operations in your programs

+ standard input
+ standard output
+ standard error
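
In Python these streams are available through the sys module. The sketch below writes to standard output and standard error and captures each one separately to show that they are independent channels. The report function is made up for the example.

```python
import contextlib
import io
import sys

def report(value):
    """Send the result to standard output and a diagnostic to standard error."""
    print("result: {}".format(value))                  # standard output
    print("note: computed a value", file=sys.stderr)   # standard error

# Capture each stream separately to show they are independent channels.
out, err = io.StringIO(), io.StringIO()
with contextlib.redirect_stdout(out), contextlib.redirect_stderr(err):
    report(42)

print(repr(out.getvalue()))
print(repr(err.getvalue()))
```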

## Environment Variables

**Shell**: A command-line interface used to interact with your operating system. Popular ones include bash and zsh.

Python programs get executed inside a shell command-line environment. The variables set in that environment are another source of information that we can use in our scripts.

To check all the environment variables, type the command below in the bash shell:

**env**

We can also access environment variables **in Python** with the **environ** *dictionary* provided by the **os** module.

In [19]:
import os
#using get method of dictionaries. we can set default return value
#if the key that we specify doesn't exist.
print(os.environ.get("HOME",""))
print(os.environ.get("SHELL",""))
print(os.environ.get("FRUIT",""))



/Users/zhengshaojun
/bin/bash



## Command-Line Arguments and Exit Status

**Command-line arguments**: Parameters that are passed to a program when it's started.

We can access these values through the **argv** list in the **sys** module.

In [20]:
import sys
print(sys.argv) 

['/Users/zhengshaojun/opt/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py', '-f', '/Users/zhengshaojun/Library/Jupyter/runtime/kernel-510f8841-6949-4599-a892-12a4d7ca517d.json']
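
The first element of the list is the program that was started and the rest are its arguments. The sketch below takes a hand-written list in place of sys.argv so it behaves the same inside a notebook. The function and the script name parameters.py are hypothetical.

```python
def describe_args(argv):
    """Split an argv-style list into the script name and its arguments."""
    script, arguments = argv[0], argv[1:]
    return "{} received {} argument(s): {}".format(script, len(arguments), arguments)

# In a real script you would pass sys.argv. This list stands in for the
# command line "./parameters.py one two three" (hypothetical script).
summary = describe_args(["./parameters.py", "one", "two", "three"])
print(summary)
```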


**Exit status**: The value returned by a program to the shell. For more about shell exit statuses, see the *Unix Workbench*. In Python, when a script finishes successfully it exits with an exit value of 0; when it finishes with an error, like a TypeError or ValueError, it exits with a value other than 0.

In Python, use **sys.exit( )** to tell the shell whether the program succeeded. See the **create_file.py** file in this directory.
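
A minimal sketch of the idea (this is not the create_file.py script itself, and the main function and its usage message are made up). sys.exit works by raising SystemExit, so the requested status can be inspected here without ending the session.

```python
import sys

def main(argv):
    """Exit with status 1 when no file name is given, 0 otherwise."""
    if len(argv) < 2:
        print("usage: script.py FILENAME", file=sys.stderr)
        sys.exit(1)
    sys.exit(0)

def exit_status(argv):
    # sys.exit raises SystemExit; its code attribute holds the exit status.
    try:
        main(argv)
    except SystemExit as exit_request:
        return exit_request.code

print(exit_status(["script.py"]))
print(exit_status(["script.py", "notes.txt"]))
```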

## python subprocesses

Sometimes it's easier or faster to use a system command as part of our Python script to accomplish a task, or to use some functionality that doesn't exist in the Python modules, either built-in or external.

Python provides a way to execute system commands in our scripts, using functions provided by the **subprocess** module.

In [21]:
import subprocess

In [22]:
subprocess.run(["date"])

CompletedProcess(args=['date'], returncode=0)

The run function returns an object of the **CompletedProcess** type. This object includes information related to the execution of the command. From the information that got printed, we can see that the returncode of the command was 0.

To run the external command, a **secondary environment** is created for the child subprocess, where the command is executed. While the *parent process*, which is our script, is waiting for the subprocess to finish, it's *blocked*, which means that the parent can't do any work until the child finishes. After the external command completes its work, the child process exits and the flow of control returns to the parent. Then the script can continue with normal execution.

In [25]:
#an example
#the sleep command will wait for a number 
#of seconds that we tell it before returning.
subprocess.run(["sleep", "2"])
#check out how we call the command.

CompletedProcess(args=['sleep', '2'], returncode=0)

In the last 2 examples the commands executed successfully, and so the returncode inside the completed process instance was 0. Let's check out an example where the exit status isn't 0.

In [29]:
subprocess.run(["ls","this_file_doesn't_exist"])
#the return code is 1 (non-zero), telling us the command failed. We can use
#this to do something different in case of failure.

CompletedProcess(args=['ls', "this_file_doesn't_exist"], returncode=1)
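
Reacting to a failed command could look like the sketch below. To stay portable it uses the running Python interpreter as the external command, with a made-up one-line program that exits with status 3.

```python
import subprocess
import sys

# Use the running interpreter as the external command so the sketch works on
# any platform; "-c" runs the given statement and sys.exit sets the exit status.
result = subprocess.run([sys.executable, "-c", "import sys; sys.exit(3)"])

if result.returncode != 0:
    print("command failed with exit status", result.returncode)
else:
    print("command succeeded")
```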

Using the **run** function like this is useful if we just want to run a command and only care about whether or not it was successful. The output of the command will be printed to the screen, which means that our script has no control over it. This can be handy for system commands that either don't have useful output, or when we don't care about processing the output any further.

But if we want to use the output further, we need a different strategy.
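
One possible strategy (an assumption here, since the notes stop before naming one) is the capture_output parameter of subprocess.run, available since Python 3.7. The child command below is a made-up one-liner run with the current interpreter.

```python
import subprocess
import sys

# capture_output=True collects the child's stdout and stderr instead of
# letting them go straight to the screen.
result = subprocess.run(
    [sys.executable, "-c", "print('hello from the child')"],
    capture_output=True,
)

# The captured output is bytes; decode() turns it into a string.
text = result.stdout.decode().strip()
print(result.returncode, text)
```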