# Python Essentials I

Today, we will begin exploring some foundational Python concepts:

* Code Structure and Whitespace
* Object-Oriented Programming
* Print Statements
* Variable Assignment
* Importing Modules and Scripts
* Scalar Objects
* Control Flow

Each of these concepts (and the ones that follow) forms a building block for more functional work processes in Python. Our initial goal is to understand the syntax and purpose behind each concept individually; we will build on them as we become more proficient.

## Code Structure and Whitespace

Python code is primarily structured around *whitespace* (i.e., spaces, tabs, and line breaks). This is different from many other programming languages (e.g., C/C++, Java), which require additional punctuation in the code structure (e.g., a semicolon at the end of each statement, braces around nested statements).

* Good for readability
* Generally, one statement per line, but you can combine multiple statements on a line using the ';' separator

In [8]:
# Example of code structure and whitespace - Applies to def, class, if, for, while statements
for x in range(15):
    print(x)

0
1
2
3
4
5
6
7
8
9
10
11
12
13
14


In [9]:
# Example of multiple statements on a single line
print(5); print(6); print(7)

5
6
7
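
To see how indentation alone decides which statements belong to a block, consider this small sketch (the variable names are illustrative, not from the course material). Only the indented line repeats; the dedented line runs once after the loop finishes.

```python
# Indentation marks the body of the loop
total = 0
for x in range(5):
    total += x          # indented: runs on every iteration
message = 'finished'    # dedented: runs once, after the loop ends

print(total)    # sum of 0 + 1 + 2 + 3 + 4
print(message)
```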


### Comments

Any text preceded by '#' on a given line of code is ignored by the Python interpreter.

Comments are an excellent way to document and communicate what is happening in your code. They can help you remember what each part of your code is trying to accomplish (which you are not likely to remember in detail at a later time), and also help others understand what your code does.

Comments are primarily used to explain your code, but they can also be used to ignore code that you want to save for a later time. Or alternatively, you can ignore a part of your code that does not work but potentially test another part.

Comments can also be used to help you plan and structure your code. Spell out the steps that you need to complete using comments, then work on each step.

In [10]:
# Typically, comments are listed before a statement (or a set of statements) 
# or at the beginning of a cell
print(5)

5


In [11]:
print(5) # Comments can be inline as well

5
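
As a sketch of planning with comments, the steps of a small task can be written as comments first and then filled in one at a time. The task and the variable names here are hypothetical.

```python
# Step 1: Collect the raw values
temperatures = [61.5, 64.0, 58.5, 70.0]

# Step 2: Compute the average
average = sum(temperatures) / len(temperatures)

# Step 3: Report the result
# print('Maximum:', max(temperatures))  # commented out: saved for a later version
print('Average temperature: {0:.1f}'.format(average))
```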


## Object-Oriented Programming

**Everything** in Python is an object!

* Scalars, sequences, dictionaries, DataFrames, etc.
* Functions
* Modules
* Generators
* And more!

Each type of object has a set of:

* Attributes - Characteristics of the object
* Methods - Functions that can operate on the object and/or other objects

Attributes and methods are accessible by:

* Attributes: obj.attribute_name or getattr(obj, 'attribute_name')
* Methods: obj.method_name(*args*)

In [1]:
# Demonstrate attributes and methods using an imported module
import pandas as pd

In [2]:
# Explore attributes and methods using tab completion: type "pd." and press Tab
# (running "pd." on its own, as here, raises a SyntaxError)
pd.

SyntaxError: invalid syntax (<ipython-input-2-1200e0d169a1>, line 2)

In [4]:
# Extract module name (useful if you forget which module an alias refers to)
pd.__name__

'pandas'

In [5]:
# Extract module name
getattr(pd, '__name__')

'pandas'

In [6]:
# Extract module method
pd.concat

<function pandas.core.reshape.concat.concat>
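
The same access patterns apply to any object, not only modules. A minimal sketch with a hypothetical `Circle` class (not part of any library) shows an attribute, a method, and the `getattr` form side by side.

```python
import math

class Circle(object):
    """Hypothetical class used only to illustrate attributes and methods."""

    def __init__(self, radius):
        self.radius = radius            # attribute: a characteristic of the object

    def area(self):                     # method: a function that operates on the object
        return math.pi * self.radius ** 2

c = Circle(2.0)
print(c.radius)                  # attribute access with dot notation
print(getattr(c, 'radius'))      # equivalent access with getattr
print(c.area())                  # method call
print(hasattr(c, 'diameter'))    # False: no such attribute was defined
```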

### Mutable vs. Immutable Objects

Mutable objects can be modified via assignment or an appropriate function/method. Examples include lists, dictionaries, NumPy arrays, and instances of most classes.

In [14]:
# Example of the mutability of a list
L = [1,2,3]
L[1] = 4
L

[1, 4, 3]

Immutable objects cannot be modified after they are created. Examples include strings, tuples, and frozensets (regular sets are mutable).

In [16]:
# Example of the immutability of a string
s = 'this is a string'
s[0] = 'T'

TypeError: 'str' object does not support item assignment
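
Because a string cannot be changed in place, the usual approach is to build a new string and bind it to a name. The sketch below also shows one practical consequence of mutability: two names that refer to the same list both see a change. The variable names are illustrative.

```python
# Immutable: build a new string instead of modifying the old one
s = 'this is a string'
s_capitalized = 'T' + s[1:]
print(s_capitalized)
print(s)                 # the original string is unchanged

# Mutable: both names refer to the same list object
first = [1, 2, 3]
second = first           # no copy is made
second[1] = 4
print(first)             # the change is visible through either name
print(first is second)
```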

## Print Statements

Any expression that you enter is displayed automatically if it is the last line in the cell. Examples of expressions include computations and indexing into an appropriate object (e.g., sequence, dictionary).

In [7]:
3 + 5

8

In [13]:
'Hello World'  # the value of the last line is displayed automatically
#pass

'Hello World'

In [10]:
'Hello World'
pass

In [12]:
range(10)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In [11]:
range(10)[5]

5

### Important input 

Print statements are typically only needed if you have a more complex statement that you would like to display, or if you have multiple statements to print in a single cell.

In [21]:
# Combine string with numerical output
print('The sum of 3 and 5 is:', 3+5)  # separate the items with commas (under Python 2, as here, they print as a tuple)

('The sum of 3 and 5 is:', 8)


In [24]:
# Formatted output - Method 1
print('The values are %d, %f, and %s' % (5, 3.14157, 'Monday'))

The values are 5, 3.141570, and Monday


In [31]:
print('The values are %d, %f, and %s' % (5, 3.14157, 'Monday'))
# %d for integer, %f for float, %s for string

The values are 5, 3.141570, and Monday


In [17]:
# Formatted output - Method 2
print('The values are {0:d}, {1:f}, and {2:s}'.format(5, 3.14157, 'Monday'))

The values are 5, 3.141570, and Monday


In [33]:
print('The values are {0:s}, {1:s}, and {2:s}'.format('5, 3.14157','interesting', 'Monday'))

The values are 5, 3.14157, interesting, and Monday
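
The format specifications above also accept optional width and precision fields. The following sketch shows a few common variations with `str.format`; the values and variable names are arbitrary examples.

```python
value = 3.14157

two_decimals = '{0:.2f}'.format(value)      # fixed number of decimal places
padded = '{0:8.3f}'.format(value)           # minimum width of 8, right-aligned
zero_filled = '{0:05d}'.format(42)          # pad an integer with leading zeros
named = '{day} is day {n:d}'.format(day='Monday', n=1)  # keyword fields

print(two_decimals)
print(padded)
print(zero_filled)
print(named)
```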


Print statements also properly format special string characters.

In [34]:
s = 'This is a \tstring with special \ncharacters'
s

'This is a \tstring with special \ncharacters'

In [46]:
li='This is a cool girl \t who loves \n coding'
li

'This is a cool girl \t who loves \n coding'

In [47]:
print(li)

This is a cool girl 	 who loves 
 coding


In [48]:
print(s)  # \t is a tab, \n is a newline

This is a 	string with special 
characters


## Variable Assignment

Variable assignment is a critical task, especially if you need to use objects in multiple steps of your process. The Python operator for variable assignment is the equal sign (=). Any object can be assigned to a variable.

In [49]:
# Scalars
a = 5
b = 10
a + b

15

In [50]:
# Functions
import numpy as np
sum_func = np.sum
sum_func

<function numpy.core.fromnumeric.sum>

In [51]:
sum_func([a,b])

15

In [52]:
# Modules
import pandas as pd
pd

<module 'pandas' from '/Applications/anaconda2/envs/py27/lib/python2.7/site-packages/pandas/__init__.pyc'>

## Importing Modules and Scripts

Modules and scripts are both loaded using the **import** statement. Remember, both modules and scripts are written as Python (.py) files, so they are really the same thing. The main difference is semantic: modules are often libraries from which we want to load functionality, and scripts are often some type of automated process (that often leverages functionality from other modules).

Primary modules for this course (conventions for shorthand names in parentheses):

* numpy (np) - Data processing and analysis
* pandas (pd) - Data processing and analysis
* matplotlib.pyplot (plt) - Data visualization
* seaborn (sns) - Data visualization
* statsmodels (sm) - Statistical analysis
* sklearn - Machine learning
* scipy - Scientific computing
* nltk - Natural language processing

Standard Python library modules that will also be useful:

* os - Operating system
* re - Regular expression
* string - String processing
* urllib2 - Opening URLs (Python 2; replaced by urllib.request in Python 3)
* glob - Filename pattern matching
* csv - Data import/export for .csv files
* copy - Shallow and deep object copying
* datetime, time - Functionality for working with datetime and time objects

In [53]:
# Load entire module
import pandas
pandas

<module 'pandas' from '/Applications/anaconda2/envs/py27/lib/python2.7/site-packages/pandas/__init__.pyc'>

In [54]:
# Load module and assign to shorter variable name
import pandas as pd

In [61]:
# Load specific functionality from a module
# (import several names with a comma-separated list; "from numpy import *" loads everything, but is not recommended)
from numpy import floor as fl
fl(4.6)
# help(fl)
fl.__name__  # note the double underscores on each side

'floor'

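If you are working with a module or script that has been updated since your import, you can reload it using the **reload** function. In Python 3.4 and later it lives in the importlib module (importlib.reload); in Python 2, which produced the output in this notebook, it is a built-in function and importlib has no reload attribute. Be careful! The module object itself will be updated, but any variables that earlier statements assigned from it are not!

In [64]:
# Use importlib.reload where it exists (Python 3.4+); otherwise fall back to the built-in reload (Python 2)
try:
    from importlib import reload
except ImportError:
    pass
np = reload(np)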
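
The caveat about previously assigned variables can be sketched as follows. This is a minimal illustration that assumes Python 3.4 or later (where `importlib.reload` exists); the module name `demo_settings` and its `THRESHOLD` value are hypothetical and are written to a temporary directory only for the demonstration.

```python
import importlib
import os
import sys
import tempfile

sys.dont_write_bytecode = True   # keep the demo from caching compiled bytecode

# Create a small throwaway module on disk
tmp_dir = tempfile.mkdtemp()
module_path = os.path.join(tmp_dir, 'demo_settings.py')
with open(module_path, 'w') as f:
    f.write('THRESHOLD = 1\n')

sys.path.insert(0, tmp_dir)
importlib.invalidate_caches()
import demo_settings

old_threshold = demo_settings.THRESHOLD   # a separate variable bound to the old value

# Simulate editing the module after it was imported
with open(module_path, 'w') as f:
    f.write('THRESHOLD = 200\n')

demo_settings = importlib.reload(demo_settings)
print(demo_settings.THRESHOLD)   # the module object reflects the edit
print(old_threshold)             # the earlier variable still holds the old value
```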

## Scalar Objects

Scalar objects are basically singular data structures (i.e., they have a single value). The most common scalar objects are:

* Numerical: int, float
* Boolean: bool
* String: str
* Dates and Times (later)

There is also the None object, which is technically a singleton object (not a scalar), but also important to know.

### Numerical Scalars

Integer and floating point numbers are the most common numerical scalar types. We typically use integers to represent indivisible (discrete) quantities, whereas floating point numbers represent continuous quantities.

In [65]:
# Integer
a = 5
type(a)

int

In [66]:
# Float
b = 10.
type(b)

float

In [67]:
# Casting to an integer
int(10.6)

10

In [68]:
# Casting to a floating point number
float(a)

5.0

In [69]:
# Scientific notation
1e6

1000000.0

In [71]:
# Basic arithmetic - +, -, /, *, ** (exponentiation), % (modulo), // (floor division)
op = '-'
print(a)
print(b)
eval(str(a) + op + str(b))  # eval() interprets a string as code; use it only on trusted input

5
10.0


-5.0

In [89]:
print(eval('5'+'-'+'6'))
print(eval('5'+'6'))

-1
56
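
Selecting an operator from a string, as the `eval` cells above do, can also be done without evaluating arbitrary text. One possible sketch uses the standard `operator` module and a dictionary; the `OPERATIONS` name and the `apply_op` helper are hypothetical.

```python
import operator

# Map each operator symbol to the function that implements it
OPERATIONS = {
    '+': operator.add,
    '-': operator.sub,
    '*': operator.mul,
    '/': operator.truediv,
    '//': operator.floordiv,
    '%': operator.mod,
    '**': operator.pow,
}

def apply_op(left, op, right):
    """Apply the arithmetic operator named by the string op."""
    return OPERATIONS[op](left, right)

print(apply_op(5, '-', 10.))   # same result as eval('5' + '-' + '10.0')
print(apply_op(7, '//', 2))    # floor division
print(apply_op(7, '%', 2))     # remainder
print(apply_op(2, '**', 10))   # exponentiation
```

An unknown symbol raises a `KeyError` here rather than being executed as code, which is the main reason to prefer a lookup table over `eval` when the operator comes from user input.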


In [None]:
# Combine arithmetic with assignment
a += b
a

### Boolean

Boolean objects convey truthiness and can take only one of two possible values, True or False.

In [72]:
True

True

In [73]:
False

False

In [75]:
not not not False

True

Boolean objects are most often created via comparison operators or casting (using the **bool** function). Objects that are equivalent to zero or emptiness are cast as False, otherwise they are True.

In [79]:
# Comparison operators - <, >, ==, !=, <=, >=, is, is not
print(a < b)
print(a != b)
print(a >= b)
print(a is None)
print(type(a) is not str)

True
True
False
False
True


In [80]:
a = None
print(a is None)
print(a == None)  # works, but "is None" is the preferred test

True
True


In [81]:
# Combine boolean objects
print(True & False) # and (the keyword form is "and")
print(True | False) # or (the keyword form is "or")
print(True ^ False) # xor

False
True
True


In [83]:
# xor: True when exactly one operand is True; returns False if both are True
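
The `&`, `|`, and `^` operators always evaluate both operands, while the keywords `and` and `or` short-circuit: the right-hand side is skipped when the left-hand side already decides the result. A small sketch, using a hypothetical `check` helper that records when it is called:

```python
calls = []

def check(label, value):
    """Record that this operand was evaluated, then return its value."""
    calls.append(label)
    return value

result_and = check('left', False) and check('right', True)
evaluated_by_and = list(calls)     # only the left operand was evaluated

del calls[:]                       # empty the log before the second test
result_amp = check('left', False) & check('right', True)
evaluated_by_amp = list(calls)     # both operands were evaluated

print(evaluated_by_and)
print(evaluated_by_amp)
```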

In [85]:
# Chaining comparisons
print(5 < 10 and 10 < 25)
print(5 < 10 < 25)

True
True


In [86]:
# Casting
print(bool(0))      # False
print(bool(1))      # True
print(bool([]))     # False
print(bool([0]))    # True
print(bool(""))     # False
print(bool({}))     # False
print(bool(()))     # False
print(bool(set()))  # False
print(bool(None))   # False

False
True
False
True
False
False
False
False
False


### Strings

Strings are sequences of characters (which means that they have a length and can be indexed), and they are how we work with text. You can create strings by enclosing any text within single (''), double (""), or triple ('''''') quotes.

In [92]:
# Example string
s = 'This is a string'
len(s)  # strings can be sliced, but they are immutable

16

In [91]:
# Index substring (from beginning)
s[0:4] # Python is zero indexed; a slice stops one position before its end index

'This'

In [93]:
# Index substring (from end)
s[-6:] # a negative index -k is equivalent to n - k, where n is the length of the sequence

'string'

In [94]:
s[-1]

'g'
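
A negative index `-k` refers to the same position as `len(s) - k`, and slices accept an optional step. A brief sketch using the same example string:

```python
s = 'This is a string'
n = len(s)

print(s[-1] == s[n - 1])     # last character, written two ways
print(s[-6:] == s[n - 6:])   # last six characters, written two ways
print(s[5:7])                # characters at positions 5 and 6
print(s[::2])                # every second character
print(s[::-1])               # the string reversed
```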

Similar to other scalars, there are implementations of addition (concatenation) and multiplication (replication), as well as boolean comparisons.

In [95]:
'This is ' + 'a string'  # note the trailing space in the first string

'This is a string'

In [99]:
'Beetlejuice - ' * 3

'Beetlejuice - Beetlejuice - Beetlejuice - '

In [100]:
'Apple' < 'Banana'  # strings are compared character by character, using each character's numeric code

True

You can also cast numeric types to strings (primarily for output purposes) and vice versa (typically, when input data is interpreted as text but is actually numerical).

In [101]:
str(4.6)

'4.6'

In [104]:
'4.6 '*2

'4.6 4.6 '

In [102]:
float('4.6') * 2

9.2

There are many built-in methods for manipulating strings:

* Case: s.capitalize, s.lower, s.swapcase, s.title, s.upper 
* Conditions: s.isalnum, s.isalpha, s.isdigit, s.isupper, s.islower, s.istitle, s.isspace
* Basic search: s.startswith, s.endswith, s.find, s.index, s.rfind, s.rindex 
* Format: s.format  
* Split: s.split, s.rsplit, s.partition  
* Strip: s.strip, s.lstrip, s.rstrip  
* Replace: s.replace  
* Join: s.join(t) joins the strings in sequence t with s as a separator

As strings are immutable, methods that transform a string return a new string object (which must be assigned to a variable if you want to keep it).

In [110]:
# Example of case
s.upper()

'THIS IS A STRING'

In [111]:
# Example of conditions
s.isupper()

False

In [112]:
u=s.upper()
u.isupper()

True

In [120]:
# Example of search
s.find('is')  # index of the first occurrence of the substring

2

In [119]:
s.find('string')

10

In [132]:
# Example of format
'Today, the temperature is {0:f} {1:f} degrees.'.format(35.8,10)

'Today, the temperature is 35.800000 10.000000 degrees.'

In [121]:
# Example of split
s.split()

['This', 'is', 'a', 'string']

In [126]:
# Example of strip
s.lstrip('This')  # strips any leading characters found in the set 'T', 'h', 'i', 's'

' is a string'

In [117]:
# Example of replace
s.replace('string', 'new string')

'This is a new string'

In [118]:
# Example of join
', '.join(['Apple', 'Banana', 'Orange'])

'Apple, Banana, Orange'
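
Because each of these methods returns a new string, calls can be chained. The sketch below cleans a hypothetical raw record using several of the methods listed above; the input text is made up for illustration.

```python
raw = '  name: ada LOVELACE ;  '

cleaned = raw.strip().rstrip(';').strip()      # remove surrounding spaces and the ';'
key, sep, value = cleaned.partition(':')       # split once at the first ':'
value = value.strip().title()                  # normalize the capitalization

print(key)
print(value)
print(key.startswith('na'))
print(raw)                                     # the original string is unchanged
```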

## Control Flow

Control flow is the phrase used to describe the order in which code is executed, and allows for more complex tasks to be accomplished than a linear sequence of statements. This allows you to write code that can be used for multiple purposes, under multiple conditions, or on different types of objects. We will focus mostly on two categories of control flow:

* Conditionals (if statements)
* Loops (for, while statements)

### Conditionals

**if** statements are the most common type of conditional. They allow you to execute a different set of statements under different conditions (i.e., value of a particular boolean scalar).

In [2]:
# if-else statement
today = 'Saturday'
if today in ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday']:
    print('Today is a weekday')
else:
    print('Today is a weekend day')

Today is a weekend day


In [3]:
# if-elif-else statement
today = 'Someday'
if today in ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday']:
    print('Today is a weekday')
elif today in ['Saturday', 'Sunday']:
    print('Today is a weekend day')
else:
    print('Invalid day given')

Invalid day given


In [10]:
# Nested if-else statement
today = 'Someday'
if today in ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday']:
    print('Today is a weekday')
else:
    if today in ['Saturday', 'Sunday']:
        print('Today is a weekend day')
    else:
        print('Invalid day given')
        # note the indentation: each nested block is indented one more level

Invalid day given


In [14]:
# pass statement
b = True
if b:
    pass
else:
    print(False)

In [15]:
b = False
if b:
    pass  # does nothing; a placeholder where a statement is required
else:
    print(False)

False


**Ternary expressions** are a nice (Pythonic) construction when you want to return one value under one condition and a different value otherwise:

In [13]:
# if statement approach
b = True
if b:
    x = 1
else:
    x = 0
x

1

In [12]:
# Ternary expression
b = True
x = 1 if b else 0
x

1

### Loops

Loops are used to perform a series of steps repeatedly. Similar to conditionals, loops can be nested or combined with other control flow constructs. There are two primary constructs for loops:

* **for** loops are used to iterate through each element of a sequence
* **while** loops are used to repeat a series of steps as long as a particular condition remains True

**for** loops in Python are more concise than in many other languages because they iterate directly over the elements of a sequence, whereas **while** loops are quite similar to those in other languages. In general, **for** loops are much more commonly used in Python. Although a **while** loop can usually do the same job, do not use one where a **for** loop is the natural fit.

**Be wary about whether a loop is the most appropriate approach for a particular task!** Think about whether a particular loop will take a long time to complete or whether it will run forever. We will also discuss comprehensions and vectorization, which are often more efficient than a loop for many common tasks. These two constructs play a major role in making Python a convenient language for processing data.
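
As a brief preview of comprehensions (covered in more detail later), the following sketch builds the same list with an explicit loop and with a list comprehension. The variable names are illustrative, and vectorization, which relies on NumPy, is not shown here.

```python
# Explicit loop: append each squared value
squares_loop = []
for x in range(10):
    squares_loop.append(x ** 2)

# List comprehension: the same result in a single expression
squares_comp = [x ** 2 for x in range(10)]

print(squares_loop == squares_comp)
print(squares_comp)
```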

In [16]:
# Standard for loop - Print each element from a list
for x in range(10):
    print(x)

0
1
2
3
4
5
6
7
8
9


In [17]:
# for loop - Enumerate each element from a list
for i, x in enumerate(range(10)):  # enumerate yields each element together with its index
    print(i, x ** 2)

(0, 0)
(1, 1)
(2, 4)
(3, 9)
(4, 16)
(5, 25)
(6, 36)
(7, 49)
(8, 64)
(9, 81)


In [20]:
my_list = ['apple', 'banana', 'grapes', 'pear']
for c, value in enumerate(my_list, 1):  # the second argument starts the count at 1
    print(c, value)

(1, 'apple')
(2, 'banana')
(3, 'grapes')
(4, 'pear')


In [19]:
enumerate(range(10))

<enumerate at 0x1093c9a00>

In [22]:
# Nested for loop
for x in range(5):
    for y in range(5):
        print(x * y)
# prints all 25 products of a 5 x 5 multiplication table

0
0
0
0
0
0
1
2
3
4
0
2
4
6
8
0
3
6
9
12
0
4
8
12
16


In [24]:
# for loop over tuples - Unpack each element
for x, y in zip(range(5), range(5)):  # zip pairs up corresponding elements as tuples
    print(x * y)

0
1
4
9
16


In [25]:
# for loop over tuples - Leave elements packed
for pair in zip(range(5), range(5)):
    print(pair[0] * pair[1])

0
1
4
9
16


In [26]:
# break statement
for x in range(10):
    if x > 5:
        break
    else:
        print(x)

0
1
2
3
4
5


In [29]:
# while loop as a for loop
i = 0
L = range(5)
while i < len(L):
    print(L[i])
    i += 1

0
1
2
3
4


In [30]:
# (Slightly) More appropriate while loop
x = 5
factorial = 1
while x > 0:
    factorial *= x
    x -= 1
factorial

120
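
A **while** loop fits best when the number of iterations is not known in advance. One illustrative sketch (not from the course material) counts the steps of the Collatz sequence, where each value is halved if it is even or replaced by 3n + 1 if it is odd, until it reaches 1.

```python
n = 6
steps = 0
while n != 1:
    if n % 2 == 0:
        n = n // 2       # even: halve the value
    else:
        n = 3 * n + 1    # odd: triple it and add one
    steps += 1

print(steps)
```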

## Next Time: Python Essentials II