# Illustrative Examples and Clear Explanations in Python

Please use the search function of your browser to find a topic, or feel free to browse the TOC (to be enhanced) or scroll manually.

<h1 id="tocheading">Table of Contents</h1>
<div id="toc"></div>

In [403]:
%%javascript
$.getScript('https://kmahelona.github.io/ipython_notebook_goodies/ipython_notebook_toc.js')

<IPython.core.display.Javascript object>

# SYNTAX

| Expression    | Description                                                 |
| -----------   | :----------------                                           |
|               |                                                             |
| GENERAL       |                                                             |
| '#'           | Comment character                                           |
|               |                                                             |
| DECLARATION   |                                                             |
| {}            | Key-value mapping, indexed with keys (i.e., dictionary)     |
| []            | Ordered list, can be indexed with integers                  |
| ()            | Ordered, immutable list (i.e., tuple)                       |
|               |                                                             |
| SPECIFICATION |                                                             |
| x()           | Function call without arguments                             |
| x(a,b)        | Function call with multiple arguments                       |
| x[0]          | Indexing                                                    |
| x[0][0]       | Nested indexing                                             |
| x[a:b]        | Slicing                                                     |
|               |                                                             |
| SPECIAL       |                                                             |
| True/False    | Boolean values                                              |
| None          | Absence of a value                                          |
| pass          | Placeholder statement that does nothing                     |


## Scientific Notation

In ae+x:
    - a   : the significand (the leading number)
    - e+x : the exponent; multiply a by 10 to the power of x.
            For a whole number a, read 'e+x' as 'x zeroes after a'.
Examples:
     - 1e+02 is 100
     - 12e+4 is 120000

In [186]:
1e+02 == 100

True

In [187]:
12e+4 == 120000

True
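
As a quick check of the reading above, the sketch below shows two details that the examples do not cover: scientific-notation literals always produce floats, and a negative exponent shifts the decimal point to the left. The variable names are illustrative and not part of the original notebook.

```python
# Scientific-notation literals are floats, even when the value is a whole number.
large = 12e+4
print(type(large).__name__)   # float
print(int(large))             # 120000

# A negative exponent divides by a power of ten instead of multiplying.
small = 1.5e-3
print(small == 0.0015)        # True
```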

# SEMANTICS

## Built-in Types

| Truth Value Testing  | Numeric              | Sequence      | Text Sequence | Binary Sequence  | Set        | Mapping  | Other                      | ... |
| :------------------- | :------------------- | :------------ | :------------ | :--------------- | :--------- | :------- | :----------------------    | --- |
| False                | int                  | list          | str           | ...              | set        | dict     | Modules                    |     |
| True                 | float                | tuple         |               |                  | frozenset  |          | Classes and Class Instances |     |
|                      | complex              | range         |               |                  |            |          | Functions                  |     |
|                      |                      |               |               |                  |            |          | Methods                    |     |
|                      |                      |               |               |                  |            |          | Boolean Values             |     |
|                      |                      |               |               |                  |            |          | ...                        |     |

Types: https://docs.python.org/3/library/stdtypes.html

## Built-In Functions, Methods, and Operators

| General | Types        | Iteration and flow | Boolean           | Comparisons   | Numeric (Int, Float[?]) | Sequence (list, set, range[?])  | String                           |
| :------ | :----------- | :------------      | :---------------- | :------------ | :--------------------   | :------------------------------ | :------------------------------  |
| print() | float()      | for i in x:        | or                | <             | +                       | s[i]                            | str.capitalize()                 |
| help()  | set()        | enumerate()        | and               | <=            | -                       | s[i:j]                          | str.upper()                      |
| open()  | list()       |                    | not               | >             | *                       |                                 | str.lower()                      |
| read()  | dict()       | if: else:          |                   | >=            | /                       | x in s                          | str.islower()                    |
|         |              |                    |                   |               |                         | x not in s                      | str.swapcase()                   |
|         | bool()       | try: except:       |                   | ==            | //                      |                                 | str.casefold()                   |
|         | tuple()      |                    |                   | !=            | %                       | len(s)                          |                                  |
|         | int()        |                    |                   | is            | abs()                   | min(s)                          | str.partition(sep)               |
|         |              |                    |                   | is not        | pow(x,y)                | max(s)                          | str.split(sep=None, maxsplit=-1) |
|         | type()       |                    |                   |               | x**y                    | sum([x,y])                      |                                  |
|         | isinstance() |                    |                   |               |                         | s.count(x)                      | str.find(sub[, start[, end]])    |
|         | issubclass() |                    |                   |               | sum()                   |                                 | str.replace(old, new[, count])   |
|         |              |                    |                   |               | range(x,y)              | s + t                           | str.strip([chars])               |
|         |              |                    |                   |               |                         | s * n                           |                                  |
|         |              |                    |                   |               | a += x                  | s *= n                          | \\                               |
|         |              |                    |                   |               | a -= x                  | zip(s, t)                       | \n                               |
|         |              |                    |                   |               |                         |                                 | \t                               |
|         |              |                    |                   |               |                         | s.append(x)                     | \r 	                             |
|         |              |                    |                   |               |                         | s.extend(t)                     |                                  |
|         |              |                    |                   |               |                         | s.pop([i])                      |                                  |
|         |              |                    |                   |               |                         | del s[i:j]                      |                                  |
|         |              |                    |                   |               |                         | s.insert(i, x)                  |                                  |
|         |              |                    |                   |               |                         | s.remove(x)                     |                                  |
|         |              |                    |                   |               |                         | s.clear()                       |                                  |
|         |              |                    |                   |               |                         | s.reverse()                     |                                  |
|         |              |                    |                   |               |                         | ...                             |                                  |

- Functions: https://docs.python.org/3/library/functions.html
- Types and methods: https://docs.python.org/3/library/stdtypes.html
- Constants: https://docs.python.org/3/library/constants.html

### Print Function
**print()**

#### Printing to Python Console

*Print to console:*

In [555]:
a = "1"
b = "2"
c = 3

print(a, b, c)

print(a)
print(b)
print(c)

1 2 3
1
2
3


#### Printing to File

Python's print is the standard "print with newline" function in Python.

*Print to file:*

In [554]:
my_file = open('test_data//string_print.txt', mode='w')

#Python 3.x:
print('my string', file=my_file)
print('my other string', file=my_file)

# Python 2.x (would give error in Python 3.x):
# print  >> my_file, 'my string'

my_file.close()
my_file = open('test_data//string_print.txt')

print(my_file.read())

my string
my other string



#### Print special characters

The print function interprets escape sequences such as \n (new line) instead of displaying them literally.

In [189]:
print("this is a \nmultiline string")

this is a 
multiline string


To display the escape sequences themselves, the string can be placed in a list and then printed, because printing a list shows the repr() form of each item:

In [190]:
print(["this is a \n multiline string"])

['this is a \n multiline string']
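
An alternative that avoids wrapping the string in a list is to call the built-in repr() function directly, or to write the string as a raw literal. The sketch below assumes the same example string; the variable names are illustrative.

```python
text = "this is a \nmultiline string"

# repr() returns the string as it would be written in source code,
# so the escape sequence stays visible instead of becoming a line break.
visible = repr(text)
print(visible)

# A raw string literal (r"...") keeps the backslash and the 'n' as two ordinary characters.
raw_text = r"this is a \nmultiline string"
print(raw_text)
```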


# INDEXING AND SELECTING

## Selecting Single Variables

In [191]:
my_list = [['a','b'], [1, 2], [True, False], ['word']]  # a list of lists

print(my_list[0])    # address an inner list
print(my_list[0][1]) # address an element within an inner list

['a', 'b']
b


## Selecting Ranges (Slicing)

In [192]:
print(my_list[0:4])     # select whole range 
                        # note that the second index number '4' is not included in the selection. 
                        # so, to select from the first element of a list to the fourth, index values 0:4 should be provided. 

print(my_list[3])       # as can be seen, the last element of the list corresponds to index value of '3' in single-value indexing (instead of the '4' as was the case in slicing)

print(my_list[1:4])     # slice (leave first element out)

[['a', 'b'], [1, 2], [True, False], ['word']]
['word']
[[1, 2], [True, False], ['word']]


### Meditation on Indexing and Slicing:

#### Conclusion

Addressing **single items**:
  - When indexing a single value, [x] refers to the x+1'th element
  - When indexing in two dimensions (e.g., in numpy), [x, y] refers to the x+1'th row and the y+1'th column  
  
The reason why **slicing** is not intuitive: double asymmetry.
  - Asymmetry 1: When slicing, in [x:y] x points to the x+1'th element
  - Asymmetry 2: When slicing, in [x:y] y points to the y'th element (hence, y does not follow the x+1 convention)


#### Indexing with Native Python

In [193]:
my_list = ["a", "b", "c", "d"]

# Show the first element:
print(my_list[0])   # <-- prints "a"
                    # '0' does not mean zeroth element. It means the first element.
                    #  Counting indeed starts from the actually entered index number, '0'.
                    #
                    #####################################################
print(my_list[3])   #  Read '[3]' as: '4th element.                     #
                    #  When indexing single value, read '[x]' as 'x+1'  #
                    #####################################################
                    # <-- prints "d"
                    # '3' does not mean the third element. It means the fourth element.
                    # Because indexing starts from 0, all numbers mean one more than their actual value. 
                    # e.g., 0 --> 1st      1 --> 2nd      2 --> 3rd
                    #
                    #    Actual order :  0      1st      2nd      3rd    4th
                    #                         /        /        /       /
                    #                        /        /        /      /
                    #                       /        /        /      /
                    #    Index number : [0]     [1]      [2]      [3]
                    #
                    #
# Show the first three elements:
                    #####################################################
print(my_list[0:3]) #  Read '[0:3]' as: 'From 1st to 3rd element.       #
                    #  When slicing, read '[x:y]' as 'x+1 to y'         #
                    #####################################################
                    # 
                    # prints " 'a', 'b', 'c' "
                    # <-- '0-3' does not mean 'from zeroth element to third element.' 
                    #  As each index number points to one higher actual position (i.e., 0 --> 1,   3 --> 4)...
                    #  ... It means 'from first element to the 4th element—but non inclusively. 
                    #  The confusion happens because of the non-inclusive last item:
                    #  ":3" means until (but not including) the 4th element! (i.e., < 4)
                    #   -- i.e., "[1, 4)" or "1 =< x < 4".  
                    #
                    #                         [  =<                      <   )
                    #    Actual order :  0      1st*    2nd*     3rd*  | 4th
                    #                          /       /        /      |/
                    #                         /      /        /       /|
                    #                       /      /        /       /  |
                    #    Index number : [0]    [1]      [2]      [3]   | 
                    #

a
d
['a', 'b', 'c']
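
One way to make the half-open convention feel less arbitrary is to look at what it buys: the length of a slice is the difference of its two index numbers, and a list split at any position joins back together without overlap. A minimal sketch, with illustrative variable names:

```python
my_list = ["a", "b", "c", "d"]

# The length of a slice [x:y] is simply y - x.
first_three = my_list[0:3]
print(len(first_three))              # 3

# Splitting at any position k gives two parts that join back into the original list.
k = 2
head, tail = my_list[:k], my_list[k:]
print(head, tail)                    # ['a', 'b'] ['c', 'd']
print(head + tail == my_list)        # True
```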


<a name="indexing-with-numpy"></a>
#### Indexing with numpy
**[x,y] | [:, y] | [a:b,x:y] | [:,x:y]**

In [194]:
import numpy

my_matrix = numpy.array([
    [5, 10, 15],
    [20, 25, 30],
    [35, 40, 43]
])

In [195]:
####################################################################
print(my_matrix[0,2])  # Read '[0,2]'as '1st and 3rd'                                     #
                       # When indexing in two dimensions, read '[x, y]' as 'x+1 and y+1'  #
                       ####################################################################
                       # Read as: Select...
                       # first row
                       # third column 
        
                       # Selects first row's third column  

print(my_matrix[0,:])  # Read as: Select...
                       # first row
                       # all columns (':' means 'all' in numpy) 
                       #
                       # Selects first row 
                       #
                       # my_matrix[0] would also select the first row, 
                       # but spelling out both axes makes the selection explicit.                

                       
            
print(my_matrix[:,0])  # Read as: Select...
                       # all rows
                       # the first column
                       # 
                       # Selects column 1 


print(my_matrix[1:3,2]) # Read as: Select...
                        # rows 2 and 3 (reminder: read "[x:y]" as "x+1 to y")
                        # column 3 (reminder    : read "[z]" as "z+1"))
                        #
                        # Selects rows 2 and 3 of 3rd column

15
[ 5 10 15]
[ 5 20 35]
[30 43]


***Is Python's indexing style a problem when selecting the last value of a list?***

In [196]:
my_list = ["a", "b", "c", "d"]

print(len(my_list))            # For a 4 item-long list, the length is naturally '4'

print(my_list[0:len(my_list)]) # <-- This is the equivalent of '[0:4]'.
                               # Since [0] --> 1st    and     [4] --> 5th   
                               # That means: 'Print from first up to (but not including) the fifth element'
                               # i.e., 1st, 2nd, 3rd, 4th
                               #
                               #                          [  =<                             <  )
                               #    Actual order :  0      1st*    2nd*     3rd*   4th* |  5th
                               #                          /       /        /      /     | /
                               #                         /      /        /       /      /
                               #                       /      /        /       /      / |
                               #    Index number : [0]    [1]      [2]      [3]    [4]  |

4
['a', 'b', 'c', 'd']


So, no; Python's indexing style is not a problem when selecting the last value of a list. [0:len()] simply does the job of selecting the whole list without the need to add or subtract '1'.

***Is Python's indexing style a problem during iteration?***

In [197]:
my_list = ["a", "b", "c", "d"]

for i in range(0, len(my_list)): # < equivalent of [0:4], which is 1st to 5th element, non-inclusive at end
    print (my_list[i])           # 
                                 #    Actual order :  0      1st*    2nd*     3rd*   4th* |  5th
                                 #                          /       /        /      /     | /
                                 #                         /      /        /       /      /
                                 #                       /      /        /       /      / |
                                 #    Index number : [0]    [1]      [2]      [3]    [4]  |

a
b
c
d


So, no it is not a problem.

***Is Python's indexing style a problem when selecting the first x elements?***

In [409]:
my_list = ["a", "b", "c", "d"]

print(my_list[0:3]) # Print first to 3rd element of my_list
print(my_list[0:4]) # Print first to 4th element of my_list

['a', 'b', 'c']
['a', 'b', 'c', 'd']


So, no, it is not a problem.

## Selecting Nested Variables

In [416]:
my_list = [
    ['a', 'b', 'c', 'd'],
    ['x', 'y', 'z', 't'],
    [1 ,2, 3, 4],
    [ ['one', 1], ['two', 2], ['three', 3], ['four', 4] ]
]

In [417]:
print(my_list[0][1])    # display the second element of the first inner list in my_list

print(my_list[3])       # print the 4th inner list
print(my_list[3][0])    # print the 1st element of the 4th inner list
print(my_list[3][0][0]) # print the 1st element of the 1st element of the 4th inner list (which is a string)

print(my_list[1:2][0])  # slice and display the first element of the newly sliced list

b
[['one', 1], ['two', 2], ['three', 3], ['four', 4]]
['one', 1]
one
['x', 'y', 'z', 't']


## Selecting All Values Except The Last X 

In [418]:
my_list = [1,2,3,4,5,6,7]
my_list[:-3]

[1, 2, 3, 4]
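
Negative index numbers count from the end of the list, which is what makes the selection above work. The following sketch shows a few related selections on the same list; the variable names are illustrative.

```python
my_list = [1, 2, 3, 4, 5, 6, 7]

last_item = my_list[-1]       # the last element
last_three = my_list[-3:]     # the last three elements
all_but_last = my_list[:-1]   # everything except the last element

print(last_item)      # 7
print(last_three)     # [5, 6, 7]
print(all_but_last)   # [1, 2, 3, 4, 5, 6]
```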

# WORKING WITH FILES

## Working Directory Operations

In [201]:
import os
print(os.getcwd())                   # print current working directory
print(os.listdir(os.getcwd()))       # print contents of the current working directory 
print(os.path.split(os.getcwd())[0]) # print the parent directory. Repeat as needed to get to higher directories.

C:\Users\Clokman\Google Drive\__Projects__\Code\Notebooks
['.git', '.gitattributes', '.gitignore', '.ipynb_checkpoints', 'Add to Python notebook.md', 'LICENSE', 'PyCharm.md', 'Python.ipynb', 'README.md', 'Windows Command Line Notebook.md']
C:\Users\Clokman\Google Drive\__Projects__\Code
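
The same three operations can also be written with the standard library's pathlib module, which some readers may find easier to chain. This is only an alternative sketch; the output depends on where the code is run, so none is shown.

```python
import os
from pathlib import Path

current_dir = Path.cwd()                                  # same location as os.getcwd()
entries = sorted(p.name for p in current_dir.iterdir())   # names of the directory's contents
parent_dir = current_dir.parent                           # one level up; chain .parent to go higher

print(current_dir)
print(entries)
print(parent_dir)
```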


## Relative Locations, and Accessing Files Nested in Subfolders

In [472]:
import csv
data = list(csv.reader(open("test_data//dummy.csv")))
print(data)

[['a', 'b', 'c'], ['1', '2', '3']]


## Importing Text File to Variable

In [473]:
data = open("test_data//dummy.txt").read()

## Opening and Closing Files

**Note**: In this section, open files are handled in a liberal way for purposes of simplicity and clarity, but they should always be closed properly once a process is completed. This can be accomplished by assigning the open file to a variable and calling the .close() method on this variable when reading from or writing to the file is finished.

e.g.,:

In [659]:
my_file = open('test_data//open_close_test.txt')
my_file.close()

Also see the section on the 'with' keyword for a more convenient approach to opening and closing files.

#### Opening a File for Reading

*Open a file as an object:*

In [616]:
# file path, mode, and encoding can be seen when file object is called
open('test_data//open_close_test.txt')

<_io.TextIOWrapper name='test_data//open_close_test.txt' mode='r' encoding='cp1252'>

*Open a file with specific encoding:*

In [644]:
# notice the change in encoding in the output
open('test_data//open_close_test.txt', encoding='utf8')

<_io.TextIOWrapper name='test_data//open_close_test.txt' mode='r' encoding='utf8'>

*Open and read a file:*

In [618]:
open('test_data//open_close_test.txt').read()

'a about after almost along alternating an and anguish around at back ball became best but buy came cold come crop crazy curled day days decided decided eat even ever everyone farmer farmer find finest for found freezing farmer gave go going great grow growing guidance had hard he heat him his if immediately in into it journey julius juliuss keep knew last long magic managed many much months named never night noble noon of on once one out people persevered potatoes probably praises raining reggie roadside sang searing secret seek seen set shouldnt sign sky sleep so soaked started stopped store storekeeper that the there this to told travelled trees tried try umbrella underneath undeterred village was were who whole wondered world\n'

*Close an open file:*

In [621]:
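# note: this opens a new file object and closes it right away;
# it does not close file objects that were opened earlier
open('test_data//open_close_test.txt').close()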

#### Opening a File for Writing

*Open file for writing:*

In [652]:
# if the file does not exist, it will be created (its folder must already exist)
open('test_data//write_test.txt', mode='w')

<_io.TextIOWrapper name='test_data//write_test.txt' mode='w' encoding='cp1252'>

While opened in write mode, files are not readable: 

In [653]:
try:
    open('test_data//write_test.txt', mode='w').read()
except Exception as error_message:
    print(error_message)

not readable


*Close the file:*

In [654]:
open('test_data//write_test.txt', mode='w').close()

#### Opening files with 'With' Keyword

The 'with' keyword, when used in combination with the open() function, creates a block in which the open file is assigned to a variable. The file is closed automatically upon exiting this block, even if an error occurs inside it.

*Open a file with 'with' keyword and print a part of it:*

In [663]:
with open('test_data//open_close_test.txt') as my_file:
    print (my_file.read()[0:100])

a about after almost along alternating an and anguish around at back ball became best but buy came c


*Open two files with the 'with' keyword and copy a part of one into the other:*

In [670]:
with open('test_data//open_close_test.txt') as input_file:
    with open('test_data//write_test.txt', mode='w') as output_file:
        output_file.write(input_file.read()[0:25])
        
# since output_file is open in write mode, it has to be closed (by simply exiting the 'with' environment by unindenting) 
# before attempting to read it.
with open('test_data//write_test.txt') as output_file:
    print(output_file.read())

a about after almost alon
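
The automatic closing can be verified through the file object's 'closed' attribute. The sketch below uses a temporary folder (created with the standard tempfile module) so that it does not depend on the test_data folder; the file name is made up for illustration.

```python
import os
import tempfile

# Create a throwaway location so the example does not depend on test_data.
tmp_dir = tempfile.mkdtemp()
file_path = os.path.join(tmp_dir, "with_demo.txt")

with open(file_path, mode="w") as output_file:
    output_file.write("a about after")
    print(output_file.closed)   # False: still inside the 'with' block

print(output_file.closed)       # True: closed automatically on exit

with open(file_path) as input_file:
    content = input_file.read()
print(content)                  # a about after
```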


## CSV Files

### Tokenizing CSV Files: The native way with open()
When first opened and read, a CSV file is just one large string.
This string needs to be... 
1. Divided row by row (i.e., each row of the CSV needs to become a list item, using '\n' as the separator)
2. And then, these rows need to be further divided into values (using the comma as the separator) 

#### Step #1: Open file

When the file is opened and simply read, the result is one string, so indexing goes character by character. data[0] outputs only a single letter.

In [474]:
data = open("test_data//legislators.csv").read()
print(data[0])

l


#### Step #2: Split line by line

Splitting line by line allows indexing by row. data_splitted[0] now outputs a row: 

In [475]:
data_splitted = data.split("\n")
print(data_splitted[0])

last_name,first_name,birthday,gender,type,state,party


#### Step #3: Tokenize

Further splitting each row by comma allows indexing individual values within rows. data_tokenized[0][0] is now a value:

In [476]:
data_tokenized = []
for item in data_splitted:
    item = item.split(",")
    data_tokenized.append(item)

print(data_tokenized[0][0])

last_name
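
The three steps above can be condensed into a single list comprehension. The sketch below works on an in-memory string with made-up rows instead of legislators.csv, and uses splitlines(), which avoids the empty last item that split("\n") leaves when the text ends with a line break.

```python
# In-memory stand-in for the contents of a CSV file (made-up sample rows).
data = "last_name,first_name\nDoe,Jane\nRoe,John\n"

# split("\n") leaves an empty string at the end when the text ends with a newline.
print(data.split("\n")[-1] == "")        # True

# splitlines() avoids that, and a list comprehension replaces the explicit loop.
data_tokenized = [row.split(",") for row in data.splitlines()]
print(data_tokenized[0][0])              # last_name
print(len(data_tokenized))               # 3
```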


### Tokenizing CSV Files: csv.reader()

csv.reader() does the same thing as the process above, and also handles quoted values that contain commas:

In [477]:
# open file, read it, and tokenize:
import csv
data = list(csv.reader(open("test_data//my_data.csv")))
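
One practical difference between the manual approach and csv.reader() shows up when a value itself contains a comma. The sketch below uses an in-memory string (via io.StringIO) with made-up data in place of a real file.

```python
import csv
import io

# In-memory stand-in for a CSV file; the second row has a quoted value containing a comma.
csv_text = 'name,city\n"Doe, Jane",Amsterdam\n'

# Manual splitting breaks the quoted value into two pieces.
naive_rows = [row.split(",") for row in csv_text.splitlines()]
print(naive_rows[1])     # ['"Doe', ' Jane"', 'Amsterdam']

# csv.reader() understands the quotes and keeps the value intact.
reader_rows = list(csv.reader(io.StringIO(csv_text)))
print(reader_rows[1])    # ['Doe, Jane', 'Amsterdam']
```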

# PRIMITIVES AND CONVERSIONS

## Query: type()

In [478]:
type(1)

int

In [433]:
type("t")

str

## Conversions

In [434]:
str(1)

'1'

In [435]:
int("1")

1

In [436]:
float(1)

1.0

In [437]:
[1] # <-- this syntax should be used when converting to a list.
    # See "Special Case: Conversion to List" section for more.

[1]

In [438]:
set(["a", "b", "b"])

{'a', 'b'}

In [439]:
eval("print('hello')") # evaluates the string as a Python expression
                       # see the 'Converting Strings to Commands' section for more

hello


### Special Case: Conversion to List

The list() function cannot wrap an arbitrary object in a list: it builds a list from the items of an iterable, so it fails for non-iterable objects such as integers:

In [440]:
try:
    list(1)
except TypeError:  # raised because an integer cannot be iterated over
    print("Error: Integer is not iterable.")

Error: Integer is not iterable.


The list() function works with strings though, as strings are iterable objects. 
(But this does not wrap the string in a list; it iterates over the string and makes each character a list item.)

In [441]:
list("string")

['s', 't', 'r', 'i', 'n', 'g']

To wrap a single object in a list, use the '[]' syntax instead of the list() function:

In [442]:
["string"]  # <-- wraps the string in a one-item list

['string']

### Special Case: String to List Conversions

When list() is applied to a string, each character of the string becomes a list item. Adjacent string literals with no comma between them are first joined into one string.

#### list()

*Conversion of a string (an iterable object):*

In [443]:
list("1" "2" "abc" "Two words.")

['1', '2', 'a', 'b', 'c', 'T', 'w', 'o', ' ', 'w', 'o', 'r', 'd', 's', '.']

#### []

*Conversion to a list*:

In [220]:
print(["1" "2" "abc" "Two words."])
print(["1", "2", "abc", "Two words."])

['12abcTwo words.']
['1', '2', 'abc', 'Two words.']


#### .split()

Split method can be used to convert a string to a list, while using a separator character. (This is the basis of CSV files [see more in 'Working with Files' section]).

*Tokenizing in a list:*

In [221]:
("1 2 abc Two words.").split(" ")

['1', '2', 'abc', 'Two', 'words.']
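
The choice of separator matters when the text contains repeated spaces. The sketch below compares split(" ") with split() called without arguments, which splits on any run of whitespace; the sample string is a variation of the one above with double spaces added.

```python
text = "1  2 abc  Two words."   # note the double spaces

# An explicit single-space separator produces empty strings between repeated spaces.
by_single_space = text.split(" ")
print(by_single_space)   # ['1', '', '2', 'abc', '', 'Two', 'words.']

# Without arguments, split() treats any run of whitespace as one separator.
by_whitespace = text.split()
print(by_whitespace)     # ['1', '2', 'abc', 'Two', 'words.']
```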

### Converting Strings to Commands
**eval()**

Use scenario: Using eval to iteratively construct commands from strings
<br>[TODO] Revise: Unexplained code, and insignificant use case.

In [222]:
my_dataset            = [["id", "year"],[123, 1995],[441,1996], [123, 1995], [123, 1995], [123, 1995], [123, 1995]]
dataset_name       = "my_dataset"

indexes_list       = [1, 2, 3, 4, 5]

parameters_string = ""

# build the string "my_dataset[1], my_dataset[2], ..." one index at a time
for index_value in indexes_list:
    parameters_string = parameters_string + dataset_name + "[" + str(index_value) + "], "
print(parameters_string)

# eval() turns the string into a tuple of rows; '*' unpacks these rows into zip(),
# which then groups the values column by column (i.e., transposes the table)
transposed_table = list(zip(*eval(parameters_string)))
print(transposed_table)

my_dataset[1], my_dataset[2], my_dataset[3], my_dataset[4], my_dataset[5], 
[(123, 441, 123, 123, 123), (1995, 1996, 1995, 1995, 1995)]
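
For comparison, selecting rows by index and transposing them does not require eval() at all. The sketch below is one possible alternative using a list comprehension and zip() with argument unpacking, on a shortened version of the dataset; 'selected_rows' is an illustrative name.

```python
my_dataset = [["id", "year"], [123, 1995], [441, 1996], [123, 1995]]
indexes_list = [1, 2, 3]

# Pick the rows directly by index instead of building a command string.
selected_rows = [my_dataset[i] for i in indexes_list]

# '*' passes each row as a separate argument, so zip() groups values by column.
transposed_table = list(zip(*selected_rows))
print(transposed_table)   # [(123, 441, 123), (1995, 1996, 1995)]
```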


# SYNTACTIC OPERATORS AND METHODS

## Logic Operators

- and
- or
- not

## Comparison Operators

- == (equal value)
- != (unequal value)
- 'is' and 'is not' (same object / not the same object; not equivalent to == and !=)
- '>' 
- '<'

In [223]:
1 == 1

True

In [224]:
1 is 1  # identity test, not equality; Python 3.8+ emits a SyntaxWarning for 'is' with a literal

True

In [225]:
1 != 1

False

In [226]:
1 is not 1  # likewise an identity test rather than a value comparison

False
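
The difference between comparing values and comparing identity becomes visible with mutable objects such as lists. A minimal sketch, with illustrative variable names:

```python
list_a = [1, 2]
list_b = [1, 2]     # a second, separate list with the same contents
list_c = list_a     # another name for the very same list object

print(list_a == list_b)   # True:  same values
print(list_a is list_b)   # False: two different objects
print(list_a is list_c)   # True:  one object with two names
```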

## Search and Query Operators

- in
- type()

'in':

In [227]:
a = [1, 2]

1 in a

True

## Special Values

- True
- False
- None

### None

In [228]:
x = 0
print(x)

x = None
print(x)

0
None


Usage scenario:

In [229]:
# For positive numbers, this loop would return the maximum value.
# But for negative numbers, it does not work, as the initial '0' is greater than all of them.
# Initializing max_value with '0' is therefore problematic.
# Something else should be assigned as the initial value.
values = [-50, -80, -100]
max_value = 0
for i in values:
    if i > max_value:
        max_value = i
print("max_value (buggy): " + str(max_value))


max_value = None
for i in values:
    if max_value is None or i > max_value: # This change does the trick.
        max_value = i
print("max_value (fixed): " + str(max_value))

max_value (buggy): 0
max_value (fixed): -50


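**Note:** In the above example, 'is' is used instead of '=='. Using == would not change the result in this example, but the two are not the same: '==' compares values, while 'is' checks whether both sides are the same object. Using 'is' with 'None' is the recommended practice, and may prevent problems that can arise (e.g., with objects that define their own '==' behavior).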
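
The same pattern can be wrapped in a function, where None also serves as a natural answer for an empty list. The function name 'find_max' is hypothetical and only used for this sketch; in practice the built-in max() covers the non-empty case.

```python
def find_max(values):
    """Return the largest item, or None when the list is empty (hypothetical helper)."""
    max_value = None
    for value in values:
        if max_value is None or value > max_value:
            max_value = value
    return max_value

print(find_max([-50, -80, -100]))   # -50
print(find_max([]))                 # None
print(find_max([0]) is None)        # False: 0 is a real value, not 'no value'
```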

## Special Methods

### 'pass'
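
'pass' is a statement that does nothing. It is useful wherever Python's syntax requires an indented block but there is nothing to execute (yet). The sketch below shows three common placements; all names are made up for illustration.

```python
def not_implemented_yet():
    pass                      # placeholder body; calling the function returns None

class EmptyClass:
    pass                      # a class with no properties or methods of its own

kept = []
for number in [1, 2, 3]:
    if number == 2:
        pass                  # deliberately do nothing for this value
    else:
        kept.append(number)

print(not_implemented_yet())  # None
print(kept)                   # [1, 3]
```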

# FUNCTIONS

In [230]:
def function_name(argument1, argument2=True): # argument 2 has a default value
    "function body here"
    return(argument1)
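
To see what the default value does, the sketch below defines a function with the same signature (under the hypothetical name 'describe') and calls it in three ways: relying on the default, overriding it by position, and passing both arguments by keyword.

```python
def describe(argument1, argument2=True):   # same signature as above, hypothetical name
    """Return both arguments so the effect of the default is visible."""
    return (argument1, argument2)

print(describe("a"))                              # ('a', True)
print(describe("a", False))                       # ('a', False)
print(describe(argument2=False, argument1="a"))   # ('a', False)
```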

# OBJECTS, CLASSES, AND METHODS

- **Class**: Template for constructing objects. Classes can define default values for the properties of their objects.
- **Objects**: Entities with their own properties and behaviors. An instance of a class.
- **Properties**: Attributes of objects.  

## Constructing a Class and Setting its Properties

In [231]:
class Car():                   # Class names begin with capital letters.
    def __init__(self):        # '__init__' is a method. It allows us to specify the code that will be run each time an instance/object of this class is created. 
                               # 'self' is an argument of the __init__ method. It is simply a placeholder that addresses the future object/instance (e.g., 'my_car').
        self.color = "black"   # Each of these lines sets a 'property' of an object.
        self.make = "honda"    # Read as 'my_car.color', 'my_car.make', etc., as 'self' refers to the future instance/object, which will be 'my_car'
        self.model = "accord"  # If a value is initially specified, it is interpreted as a default property of this class, and is assigned to each instance unless overridden by another value afterwards.

my_car = Car()        # This creates the new instance/object 'my_car' and assigns the default properties of the 'Car' class to it.

print(my_car.color)   # Dot notation can be used to address properties of an object. 
                      # Note that even though properties and methods both use dot notation, parentheses are not used at the end when accessing properties, but they are used when calling methods.

black


## Using Initial Arguments as Class Properties

Classes can also have custom arguments that can be passed on as class properties at the time of instantiation of new objects.

In [232]:
class Student():
    def __init__(self, name, number): # <-- Initial arguments
        self.name = name
        self.number = number

student_1 = Student("John", 231)
print(student_1.name)
print(student_1.number)

John
231


## Creating Variables within Classes and Calling Them from Outside

In [233]:
class My_class():
    def __init__(self):
        self.property1 = "value"
        
    x = 5 # This is a class variable: it is defined in the class body and shared by all instances.
    
my_instance = My_class()

print(my_instance.x)  # x can be accessed the same way an instance property is accessed.
print(my_instance.property1)

5
value
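
Variables defined in the class body are shared by all instances until an instance assigns its own value under the same name. The sketch below illustrates this with a hypothetical 'Counter' class.

```python
class Counter:
    x = 5                      # class variable, shared by all instances

first = Counter()
second = Counter()

first.x = 10                   # creates an instance attribute on 'first' only
print(first.x, second.x, Counter.x)   # 10 5 5

Counter.x = 7                  # changes the shared class variable
print(first.x, second.x)              # 10 7
```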


## Adding Attributes to a Class at Runtime

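- It is also possible to add attributes to an instance or a class dynamically at runtime. 
- However, a function assigned to an instance is stored as a plain attribute: it does not automatically receive the instance as 'self'. To receive 'self', the function must be added to the class instead (or bound to the instance explicitly).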

#### Adding an External Property

In [234]:
class My_class():
    def __init__(self):
        self.attribute1 = "value"

my_instance = My_class()

# Add an external variable as an instance property at runtime
my_instance.attribute2 = "another value"
print(my_instance.attribute2)

another value


#### Adding an External Function

***As an externally added function***:

In [235]:
my_instance.special_print = print   # Assign the function itself (no parentheses), so that it can be called later
my_instance.special_print("x")

x


***A common pitfall: assigning the result of a call instead of the function itself***

In [236]:
my_instance.special_print = print()   # print() is called right here; its return value (None) is what gets stored
try:
    my_instance.special_print()       # <- Fails, because None is not callable
except TypeError as error_message:
    print(error_message)


'NoneType' object is not callable


***A slightly dynamic externally-added function***:

In [237]:
class My_class(object):
     pass

my_instance = My_class()
my_instance.my_var = 3
My_class.my_func = property(lambda self: self.my_var + 2)  # <-- Assigned to the class (not the instance), so 'self' is passed in...
my_instance.my_func                                        # ...automatically. As a property, it is read without parentheses...
                                                           # ...and therefore cannot take additional arguments.

5
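
Functions added at runtime can also take arguments and use 'self'. The following is a minimal sketch with hypothetical names (`Robot`, `greet`, `shout`): a function is either bound to a single instance with `types.MethodType`, or assigned to the class so that every instance receives it as a regular method.

```python
import types

class Robot:
    def __init__(self, name):
        self.name = name

def greet(self, greeting):
    # An ordinary function whose first parameter will play the role of 'self'
    return greeting + ", I am " + self.name

robot = Robot("R2")

# Option 1: bind the function to this one instance only
robot.greet = types.MethodType(greet, robot)
greeting_text = robot.greet("Hello")
print(greeting_text)    # prints: Hello, I am R2

# Option 2: attach the function to the class; all instances get it as a method
Robot.shout = lambda self, text: (text + " from " + self.name).upper()
shout_text = robot.shout("hi")
print(shout_text)       # prints: HI FROM R2
```

Option 2 works because functions stored on a class are turned into bound methods when accessed through an instance; functions stored directly on an instance are not.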

## Iterating over Class Instances and Their Properties at Runtime

***Class registry to hold all instances created in a class:***

In [238]:
class Person(object):
    _instance_registry = []  # class-level list that holds all instances created from this class.

    def __init__(self, name): 
        self._instance_registry.append(self)  # add each new instance to the _instance_registry upon creation
        self.name = name

john = Person("John")
erik = Person("Erik")
        
for p in Person._instance_registry:
    print(p, p.name)  # print each instance and its name

<__main__.Person object at 0x0000020DF918F390> John
<__main__.Person object at 0x0000020DF918F320> Erik


***Class registry for instance properties, and querying these like lists:***

In [239]:
class Person(object):
    _name_registry = []  # class-level list that holds all names registered by this class.

    def __init__(self, name): 
        self.name = name
        self._name_registry.append(name)  # add each new name to the _name_registry upon creation

john = Person("John")
erik = Person("Erik")

for p in Person._name_registry:
    print(p)  # print each registered name

# Registry allows performing list search operations in classes.
if "John" in Person._name_registry:
    print ("\nName found")


John
Erik

Name found


Further reading:
https://stackoverflow.com/questions/739882/iterating-over-object-instances-of-a-given-class-in-python

In [240]:
class Person(object):
    _name_registry = []  # class-level list that holds all names registered by this class.

    def __init__(self, name): 
        self.name = name
        self._name_registry.append(name)  # add each new name to the _name_registry upon creation
    
    def add(self, attr, val):
        setattr(self, attr, val)  # 'self.attr = val' would always create an attribute literally named 'attr'
         

john = Person("John")
erik = Person("Erik")

for p in Person._name_registry:
    print(p)  # print each registered name

# Registry allows performing list search operations in classes.
if "John" in Person._name_registry:
    print("\nName found")

john.add("myattribute", "myvalue")
john.myattribute


John
Erik

Name found


'myvalue'

## Methods

Methods are very similar to functions, and we define them with the same syntax. ***The only difference is that methods are defined inside a class and are "attached" to its instances, while functions aren't***. We can use methods to run custom code that interacts with those instances.

To use a method:

- Define it in the code for the class.
- Call it on an instance of that class.

### Methods without Arguments

In [241]:
## new class
class Student():                    
    def __init__(self, name, surname, number):
        self.name = name
        self.surname = surname
        self.number = number

    ## New method (note that it is within the class!)
    def print_details(self):    
        print(self.name, self.surname, self.number)

# The print_details method can only be called on objects that belong to the same class
# print_details("John") would raise an error, for instance, as print_details is not a standalone function but a method.
# "John".print_details() would also not work, as the 'print_details' method is only available to instances of its own class.

## Calling the new method
# First, create an instance of the class, i.e., make the object a member of the class.
john = Student("John", "X", 242)
# Now, the variable/object 'john' is an instance of the class 'Student', and it can use the methods available in this class.
john.print_details()

John X 242


### Methods with Arguments

Note that the 'year' argument of the .count_wins_in_year method is not an object property; it is not referenced anywhere except within the method itself.

In [445]:
import csv
class Team():
    def __init__(self, name):
        self.name = name
        with open("test_data//nfl.csv", 'r') as f:  # 'with' closes the file once the block ends
            csvreader = csv.reader(f)
            self.nfl = list(csvreader)

    def count_wins_in_year(self, year):
        count = 0
        for i in self.nfl:
            if i[2] == self.name and i[0] == year:  # 'and' is the logical operator; '&' is the bitwise one
                count = count + 1
        return count

niners = Team("San Francisco 49ers")
niners_wins_2013 = niners.count_wins_in_year("2013")
print(niners_wins_2013)

12


## Objects

Objects are variables that are instances of a class. Unlike lists or primitives, printing an object shows its class and memory address by default, rather than its contents:



In [446]:
class Student():
    def __init__(self, name, number):
        self.name = name
        self.number = number

student_1 = Student("John", 231)
print("### 1")
print(student_1)
student_2 = Student("Sven", 532)
student_3 = Student("Erik", 242)

student_list = [student_1, student_2, student_3]

print("### 2")
print(student_list)

### 1
<__main__.Student object at 0x0000020DF9ABC550>
### 2
[<__main__.Student object at 0x0000020DF9ABC550>, <__main__.Student object at 0x0000020DF9ABC6A0>, <__main__.Student object at 0x0000020DF9ABC6D8>]


A list of objects can be indexed with integers and iterated over, but the individual objects cannot be indexed themselves:

In [244]:
# The list that holds the objects can be indexed with integers
print("### 1: Print first object")
print(student_list[0])

print("### 2: Print name of first and second object")
print(student_list[0].name)  
print(student_list[1].name)

# The list can also be iterated over, but each element is then a single Student object,
# which cannot be indexed. This would therefore raise a TypeError:
#for i in student_list:
#    print(i[0].name)
# The working form inside the loop is print(i.name)

### 1: Print first object
<__main__.Student object at 0x0000020DF9180C18>
### 2: Print name of first and second object
John
Sven
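
Iterating over a list of objects can be sketched as below. The snippet repeats the `Student` class so that it runs on its own; the variable `student_names` is introduced only for this illustration.

```python
class Student:
    def __init__(self, name, number):
        self.name = name
        self.number = number

student_list = [Student("John", 231), Student("Sven", 532), Student("Erik", 242)]

student_names = []
for student in student_list:            # each 'student' is one Student object
    student_names.append(student.name)  # use dot notation, not student[0]

print(student_names)   # prints: ['John', 'Sven', 'Erik']
```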


Objects can have properties and methods. These properties and methods are defined within the class.

Properties can simply be primitives, or be more sophisticated variables such as functions and datasets.

In [452]:
class Student():
    def __init__(self, name, number):
        self.school = "VU"     # This is a simple property with a default value.
        self.name = name       # This is a simple property
        self.number = number
        import csv                                        # Code (here, an import) can be run upon initialization.
        self.data = list(csv.reader(open("test_data//my_data.csv"))) # This property holds a data set.
    
    def print_name(self): # This is a method
        print(self.name)

john = Student("John", 234)

print("### Data")
print(john.data)

print("### Name")
print(john.name)

print("### Number")
print(john.number)

### Data
[['a about after almost along alternating an and anguish around at back ball became best but buy came cold come crop crazy curled day days decided decided eat even ever everyone farmer farmer find finest for found freezing farmer gave go going great grow growing guidance had hard he heat him his if immediately in into it journey julius juliuss keep knew last long magic managed many much months named never night noble noon of on once one out people persevered potatoes probably praises raining reggie roadside sang searing secret seek seen set shouldnt sign sky sleep so soaked started stopped store storekeeper that the there this to told travelled trees tried try umbrella underneath undeterred village was were who whole wondered world']]
### Name
John
### Number
234


### More on \_\_init\_\_(self)

\_\_init\_\_ is a method that is called when an instance of a class is being created. It allows us to specify which properties will be assigned to the new instance/object, and which code will be run while the instance is being created.

'self' is a parameter of the \_\_init\_\_ method. 'self' refers to the current instance being created (e.g., for my_car = Car(), 'self' is the new instance of the 'Car' class, which becomes 'my_car').

In the example below, the Python interpreter automatically calls \_\_init\_\_, with self and "Tampa Bay Buccaneers" as its arguments. In this example, self is really just a reference to bucs, so when we set self.name inside the \_\_init\_\_ method, we are really setting bucs.name. If we created a new team named my_team, we'd be setting my_team.name within the \_\_init\_\_ method. However, we don't need to worry about the instance's name when we are defining a method, since we can always access it using self.

In [456]:
class Team():
    def __init__(self, name):
        self.name = name # for the instance created below, this is effectively:
      # bucs.name = "Tampa Bay Buccaneers"
    def print_name(self): # This is a method
        print(self.name)
        
bucs = Team("Tampa Bay Buccaneers")

### Instance vs. Automatic Methods

In [457]:
bucs = Team("Tampa Bay Buccaneers") ## Creates an instance of the Team class and auto-calls `__init__`, based on the class definition.
bucs.print_name() ## Calls the method ***on the instance***.

Tampa Bay Buccaneers


## Instances 

#### 'as'

In an 'except ... as ...' clause, 'as' captures the raised exception (an instance of an exception class) and binds it to a name.

In [458]:
try:
    int("")
except Exception as exc:
    
    # Demonstration:
    print("• 'Exception' is a class, so its type is 'type':")
    print(type(Exception))
    print("\n • The type of exc is 'ValueError', a subclass of 'Exception' (an instance of a subclass is also an instance of the parent class):")
    print(type(exc))
    print("\n")
    
    # Actual code block:
    print(str(exc))

• 'Exception' is a class, so its type is 'type':
<class 'type'>

 • The type of exc is 'ValueError', a subclass of 'Exception' (an instance of a subclass is also an instance of the parent class):
<class 'ValueError'>


invalid literal for int() with base 10: ''


## Sub- and Superclasses, Inheritance, Overriding and Overloading


### Sub- and Superclasses

***Create a subclass with multiple superclasses and call the \_\_init\_\_ method of each superclass:***

In [459]:
# class subclass(superclass_1, superclass_2):
#   def __init__(self, param):
#       superclass_1.__init__(self)
#       superclass_2.__init__(self, param)
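
A runnable version of this template could look as follows. The class names (`Walker`, `Swimmer`, `Duck`) and their attributes are hypothetical; the sketch only shows each superclass initializer being called explicitly, as in the commented outline.

```python
class Walker:
    def __init__(self):
        self.legs = 2

class Swimmer:
    def __init__(self, speed):
        self.speed = speed

class Duck(Walker, Swimmer):
    def __init__(self, speed):
        Walker.__init__(self)           # superclass without extra parameters
        Swimmer.__init__(self, speed)   # superclass that needs a parameter

my_duck = Duck(3)
print(my_duck.legs, my_duck.speed)      # prints: 2 3

# The order in which Python searches the classes for attributes and methods:
lookup_order = [cls.__name__ for cls in Duck.__mro__]
print(lookup_order)                     # prints: ['Duck', 'Walker', 'Swimmer', 'object']
```

Calling the initializers by class name keeps the example close to the template; cooperative `super()` calls are an alternative when all superclasses are designed for it.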

### Overriding Methods

Overriding can be thought of as ***'Whenever my string representation is requested (e.g., by str() or print()), do not use the inherited \_\_str\_\_() method, but use my own \_\_str\_\_() method!'***

***As an important note***: A class does not need to name a superclass explicitly in order to override \_\_str\_\_(). Every Python 3 class implicitly inherits from the base 'object' class, which supplies a default \_\_str\_\_(). In the examples below, for instance, the Person class overrides this inherited default; it does not override a method of the str class, and Person is not a subclass of str.

A more formal definition of method overriding: 

"Method overriding allows a subclass to provide a different implementation of a method that is already defined by its superclass or by one of its superclasses. The implementation in the subclass overrides the implementation of the superclass by providing a method with the same name, same parameters or signature, and same return type as the method of the parent class."
(Source: https://www.python-course.eu/python3_inheritance.php)

#### A Method Override Example: Person & Employee (With Step-by-step Annotation)

***Create a superclass and subclass with overriding methods:***

In [250]:
# Example taken from: https://www.python-course.eu/python3_inheritance.php

class Person:

    def __init__(self, first, last, age):
        self.firstname = first
        self.lastname = last
        self.age = age
        
    # override the default __str__() method inherited from 'object'.
    def __str__(self):
        # when this object is called as a string (e.g., by print() or str()), return this:
        return self.firstname + " " + self.lastname + ", " + str(self.age)

    def constructName(self):
        return self.firstname + ' ' + self.lastname
    
class Employee(Person):
    
    # Employee defines its own __init__() method, which overrides the __init__() inherited from Person:
    # instantiating Employee runs this method instead of Person.__init__().
    
    # when __init__() method is called (i.e., when instantiating this class)...
    def __init__(self, first, last, age, staffnum):
        
        # run the superclass initializer with the required parameters, so that it sets
        # the firstname, lastname, and age attributes on this Employee instance.
        # (Person's METHODS are already inherited through 'class Employee(Person)'; this call only sets the ATTRIBUTES.)
        super().__init__(first, last, age)
        
        # on top of the attributes imported from the Person superclass, add this attribute
        self.staffnumber = staffnum

    # also override the __str__() method inherited from 'Person' (which itself overrides the default from 'object')
    # by defining a method with the same name; __str__() returns the string representation of the object
    def __str__(self):
        # when this object is called as a string (e.g., by print() or str()), return this:
        return super().__str__() + ", " +  self.staffnumber
    
    def getEmployee(self):
        # when this method is called, constructName() is available on the Employee instance even though it is
        # defined in the Person class, because Employee inherits all of Person's methods
        # through 'class Employee(Person)'.
        return self.constructName() + ", " +  self.staffnumber
    
my_person   = Person("Marge", "Simpson", 36)
my_employee = Employee("Homer", "Simpson", 28, "1007")

my_employee.getEmployee()

'Homer Simpson, 1007'

***Call object as string to execute its \_\_str\_\_() method:***

When the object is called as a string, its \_\_str\_\_() method is executed.

In [251]:
print(my_person)
print(my_employee)

Marge Simpson, 36
Homer Simpson, 28, 1007


***Call object as object:***

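When the object is NOT called as a string, however, its \_\_str\_\_() method is not used. Instead, the notebook displays the result of the object's \_\_repr\_\_() method, which by default shows the class name and the memory address: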

In [252]:
my_person

<__main__.Person at 0x20df91ac2b0>

In [253]:
my_employee

<__main__.Employee at 0x20df91ac2e8>
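
The bare display of an object can be customized in the same way as its string form, by defining `__repr__()` next to `__str__()`. A minimal sketch, using a hypothetical `Pet` class:

```python
class Pet:
    def __init__(self, name, species):
        self.name = name
        self.species = species

    def __str__(self):
        # Used by print() and str()
        return self.name + " the " + self.species

    def __repr__(self):
        # Used when the object is displayed directly (e.g., as the last line of a notebook cell)
        return "Pet(" + repr(self.name) + ", " + repr(self.species) + ")"

my_pet = Pet("Rex", "dog")
print(str(my_pet))    # prints: Rex the dog
print(repr(my_pet))   # prints: Pet('Rex', 'dog')
```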

***Manually call \_\_str\_\_() method of object:***

The string representation of the object can also be requested by manually triggering its \_\_str\_\_() method.

In [254]:
my_person.__str__()

'Marge Simpson, 36'

In [255]:
my_employee.__str__()

'Homer Simpson, 28, 1007'

***Manually call attributes of object:***

In [256]:
my_person.firstname

'Marge'

In [257]:
my_person.lastname

'Simpson'

#### Method Overriding with an Example: \_\_str\_\_() method


***Create a string in shorthand and formal ways:***

Although a string is generally created and called like this:

In [258]:
my_string = 'my text'
my_string

'my text'

...it can also be created and called like this:

In [259]:
my_string = str('my text')
my_string.__str__()

'my text'

Therefore, asking for the string form of 'my_string' (e.g., with print() or str()) essentially calls my_string.\_\_str\_\_(), i.e., the \_\_str\_\_() method of 'my_string', which is a 'str' class object. my_string.\_\_str\_\_() means: "return the string representation of this str object". For a str object, this is simply the text itself, but this is not the case for other object types; instead of returning the value of an attribute inside them (e.g., a string), they return a generic description such as <preprocessor.csv_tools.CSV_File object at 0x0000019A11BE8C18>.

This can be changed, however, by defining a \_\_str\_\_() method inside a class, and thereby specifying what the string representation of an instance would be. (This overrides the default \_\_str\_\_() method that every class inherits from Python's base 'object' class.)

NOTE: **A class does not need to be a subclass of str in order to define its own \_\_str\_\_() method; every class already inherits a default one from 'object'.**


***Override a method of a base class:***

Define a \_\_str\_\_() method in the Person class. When an instance of Person is called as a string (e.g., with str() or print()), this Person.\_\_str\_\_() will be executed instead of the default \_\_str\_\_() that Python provides through the base 'object' class:


In [260]:
class Person:
    def __init__(self, first_name, last_name):
        self.first_name = first_name
        self.last_name = last_name
    def __str__(self):
        return self.first_name + ' ' + self.last_name

When called directly, object does not return a string:

In [261]:
person_1 = Person('John', 'Doe')
person_1

<__main__.Person at 0x20df918f748>

But when called ***as a string*** (that is, by a function that attempts to treat it like a string), the object does return a string. This is because those 'string-expecting' calls are directed to the person\_1.\_\_str\_\_() method, instead of the person\_1 object directly:

In [262]:
str(person_1)

'John Doe'

In [263]:
print(person_1)

John Doe


***Make class a subclass of a base class***

Another way to return a string when a class instance is called, could be by making the class a subclass of str class:


In [264]:
class Person(str):
    def __init__(self, first_name):
        self.first_name = first_name
        str().__init__(self)
person_1 = Person('John')
person_1

'John'

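However, this limits the initialization of the class: the arguments are also passed to the constructor of str itself, which builds the string value before \_\_init\_\_ runs, so they must form a valid str() call. With two strings, str() treats the second one as an encoding and fails: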

In [265]:
class Person(str):
    def __init__(self, first_name, last_name):
        self.first_name = first_name
        self.last_name = last_name
        str().__init__(self)

try:        
    person_1 = Person('John', 'Doe')
except Exception as exception_message:
    print('Error as expected: ' + str(exception_message))

Error as expected: decoding str is not supported
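
One possible way around this limitation, shown here as a sketch rather than as a recommendation, is to override `__new__()`, which is the method that actually builds the immutable string value. The class name `FullName` is hypothetical.

```python
class FullName(str):
    def __new__(cls, first_name, last_name):
        # Build the underlying string value first...
        instance = super().__new__(cls, first_name + " " + last_name)
        # ...then attach the extra attributes to the new object
        instance.first_name = first_name
        instance.last_name = last_name
        return instance

full_name = FullName("John", "Doe")
print(full_name)             # prints: John Doe
print(full_name.upper())     # prints: JOHN DOE (regular str methods still work)
print(full_name.last_name)   # prints: Doe
```

For most classes, defining `__str__()` as in the earlier example is the simpler choice; subclassing str is mainly useful when the object must behave like a string everywhere.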


### Overloading

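Overloading is the ability to define several versions of a method that share the same name but differ in the number or types of their arguments, so that one name can perform different tasks depending on how it is called. Python does not support this directly: a second definition with the same name simply replaces the first one. Similar behavior is usually achieved with default arguments, variable-length arguments, or type checks inside a single function.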

JCL: e.g., if a method does different things for string and list inputs, this is likely called overloading.
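
Type-dependent behavior of this kind can be sketched with `functools.singledispatch` from the standard library, which selects an implementation based on the type of the first argument. The function name `describe` is hypothetical.

```python
from functools import singledispatch

@singledispatch
def describe(value):
    # Fallback for types without a registered implementation
    return "something else: " + repr(value)

@describe.register(str)
def _(value):
    return "a string of length " + str(len(value))

@describe.register(list)
def _(value):
    return "a list with " + str(len(value)) + " items"

print(describe("abc"))    # prints: a string of length 3
print(describe([1, 2]))   # prints: a list with 2 items
print(describe(3.5))      # prints: something else: 3.5
```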

## Chaining Methods (Fluent Interfaces/Patterns) 

This is one way to construct and call methods: 

In [578]:
class Poem(object):
    def __init__(self, content):
        self.content = content

    def indent(self, spaces):
        self.content = " " * spaces + self.content
        
    def suffix(self, content):
        self.content = self.content + " - " + content
        
my_poem = Poem("Road Not Travelled")
my_poem.indent(4)
my_poem.suffix("Robert Frost")
my_poem.content

'    Road Not Travelled - Robert Frost'

The above class does not allow chaining because its methods return nothing, i.e., None (Python's default behavior):

In [579]:
try:
    Poem("Road Not Travelled").indent(4).suffix("Robert Frost").content
except AttributeError as error_message:
    print(error_message)

'NoneType' object has no attribute 'suffix'


Returning `self` from each method is one way to implement method chaining (a fluent interface).

In [580]:
class Poem(object):
    def __init__(self, content):
        self.content = content

    def indent(self, spaces):
        self.content = " " * spaces + self.content
        return self

    def suffix(self, content):
        self.content = self.content + " - " + content
        return self

Poem("Road Not Travelled").indent(4).suffix("Robert Frost").content

'    Road Not Travelled - Robert Frost'

### Sources for Classes and Inheritance: 
- https://www.python-course.eu/python3_inheritance.php


### Examples for Working with Classes and Inheritance:
- preprocessor.string_tools.File_Path


# MODULES AND PACKAGES

Hierarchy of nested structures in Python:

1. **Package**: Python folder (with an \_\_init\_\_.py file in it) <br>
2. **Module**: Python file (can contain functions or classes)<br>
3. **Class**:  A dynamic way to group functions and variables <br>
4. **Function**: A way to group commands <br>
5. **Command and Variable**

Gist of section:
   

- Dot notation is used both in import statements and in attribute access, but a plain import statement can only *directly* import packages and modules using dot notation (e.g., with *import my_package* or *import my_package.my_module*), and NOT variables, functions, or classes. The latter elements can either be imported with the 'from' keyword (e.g., *from my_package.my_module import my_function*), or be accessed with dot notation AFTER (so, not during) the import of the module they belong to (e.g., *my_package.my_module.my_function*).   


- Use import paths relative to the directory that is one level above the top-level package, no matter at which directory level inside the package the import statement appears (i.e., everything is relative to the top level).


- The 'send to IPython console' feature of IDEs (e.g., Alt + Shift + E in PyCharm) may not work well with import statements, because the directory the console uses to look up modules can differ from the directory of the file that the command came from.


- In \_\_init\_\_.py files, explicitly import modules (and also sub-packages, if they are not going to be imported separately later).

## Working with Directories and Import Function

The current working directory looks like this and it will be used throughout this section:

![image.png](attachment:image.png)

- Nested structures (i.e., folders/packages, files/modules) should be addressed in Python imports in the following fashion:<br> 


    folder/subfolder/file.py or rather package/sub_package/module.py becomes:<br>
    folder.subfolder.file    or rather package.sub_package.module

- Python searches for packages and modules ***only*** in three places:
    1. Own directory of the .py script file that called the module or package
    2. The 'PYTHONPATH'
    3. The installation-dependent default directory (e.g., Anaconda's python directory, or PyCharm's current project directory)


- These three directories are added by default to **sys.path** variable.


- The current ***'working directory' is not the same as the 'import directory'***. The working directory can be changed with os.chdir(), but doing so does not rewrite the directories already stored in sys.path.
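
Both locations can be inspected at any time. The short sketch below prints the working directory and the first few entries of sys.path; the actual values depend entirely on the environment in which it is run.

```python
import os
import sys

working_directory = os.getcwd()
print("Working directory:", working_directory)

# sys.path is an ordinary list of directory names that is searched in order
print("First import locations:")
for entry in sys.path[:3]:
    print("   ", repr(entry))
```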

#### Add a path to the sys.path variable

The way to import a file that is not on the sys.path variable is to insert the location of the file into sys.path as follows:

inside my_file.py:<br>
import sys<br>
sys.path.insert(0, '/path/to/file/B') # inserting at position 0 guarantees that this location takes priority in case another location on the path contains a file/module with the same name<br>
<br>
outside my_file.py:<br>
import my_file<br>
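
The effect of sys.path.insert can be tried out without touching any real project folder. The sketch below writes a throwaway module into a temporary directory; the module name `hypothetical_module` and its `GREETING` variable are made up for this illustration.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    # Create a module file in a directory that is not on sys.path yet
    Path(tmp_dir, "hypothetical_module.py").write_text("GREETING = 'hello'\n")

    sys.path.insert(0, tmp_dir)           # make the directory importable, with top priority
    try:
        importlib.invalidate_caches()     # make sure the newly created file is noticed
        module = importlib.import_module("hypothetical_module")
        greeting = module.GREETING
    finally:
        sys.path.remove(tmp_dir)          # leave sys.path as it was

print(greeting)   # prints: hello
```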

### The Relative Paths Issue
While a package (or one of its modules) is being imported by the current file, the \_\_init\_\_.py file of the package (and those of its sub-packages) are executed. The import statements inside these \_\_init\_\_.py files are NOT resolved relative to the directory where the \_\_init\_\_.py files exist; they are resolved against sys.path, which typically starts with the directory of the file that was run. This is why:

1 - Path names in the import statements of \_\_init\_\_.py files must be relative to the file that issued the first import command.<br>
2 - "Send to terminal" functions in IDEs may not work (as the terminal's working directory and the file's directory may be different).

These items are explained further below:

#### **1. All paths used in import statements must be relative to the file from which the first import command originates**<br>
When addressing nested structures (which is always the case one way or the other with packages and modules), the perspective of the file that is addressing the package (i.e., origin_of_import_command file) should be taken. Because this file would be outside the package, the relative path names in the package must always start from the beginning of the package (i.e., be relative to the directory that is one level higher than the package). For instance, in this directory structure:

-origin_of_import_command.py<br>
--package<br>
----subpackage<br>
-------\_\_init\_\_.py <br>
-------module_2<br>
----\_\_init\_\_.py <br>
----module_1<br>

module 1 and 2 should be addressed this way:<br>
- import package.module_1<br>
- import package.subpackage.module_2<br>

and not this way:<br>
- import module_1<br>
- import module_2<br>

(The latter way could work in some places and not in others (e.g., it works in PyCharm but does not in Jupyter); therefore, it should not be used.)

This is especially true for import statements used in the \_\_init\_\_.py files of packages. There, the latter way is tempting, because it is relative to the package itself [and not to origin_of_import_command.py], but the full path starting from the top-level package should still be used.

#### 2. 'Send to IPython Terminal' May not Work Well for Commands that Have Relative Paths
When a code segment is sent to the IPython terminal and run (Shift + Alt + E), the command uses the project root directory that PyCharm set as the place to look for modules, and not the (sub-)directory of the file that the command came from. This is not the case when the file is run directly (Shift + F10); in that case, the place to look for modules is the (sub-)folder of the file itself.

### \_\_init\_\_.py files

- If a folder contains an \_\_init\_\_.py file, this folder will be seen by Python as a package. 
- \_\_init\_\_.py files are generally left empty. However, PyCharm does not seem to accept this (while Jupyter seems to be OK with it). Therefore, for compatibility purposes, it is likely to be a good idea to explicitly import modules and subpackages in \_\_init\_\_.py.
    


#### Manually importing modules and subpackages of a package with \_\_init\_\_.py
- Seems to be a requirement in PyCharm
- Import statements in \_\_init\_\_.py files should use paths that start from one directory level above the package folder (see "The Relative Paths Issue" section for an explanation why).
- In the example directory, the two \_\_init\_\_.py files contain these:

test_package's \_\_init\_\_.py:

In [276]:
import test_package.print_hello_function_container
import test_package.print_hello_class_container
import test_package.print_hello_direct # note that the paths should include the root (i.e., package name)
                                       # a path without the root package name does not always work (e.g., it works in PyCharm, but not in Jupyter)

# If the subpackage will not be imported later with a separate import command, it can also be included in the initial import 
# import test_package.package_within_package 

package_within_package's \_\_init\_\_.py:

In [277]:
import test_package.package_within_package.print_bye_function_container
import test_package.package_within_package.print_bye_class_container
import test_package.package_within_package.print_bye_direct

#### Other Elements that can be Added to an \_\_init\_\_.py file

- **\_\_all\_\_ = []**: \_\_init\_\_.py files can also contain a statement such as \_\_all\_\_ = ["print_hello_direct", "print_hello_function_container"] (a list of names, given as strings). This restricts what a statement like "from test_package import \*" loads to the listed modules only, and thereby excludes the others. This could be useful if the excluded modules are used as internal modules.


- Alternatively, an \_\_init\_\_.py file can contain **commands to be run on initialization** of the package.
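
The filtering effect of `__all__` on star imports can be sketched with a throwaway module written to a temporary directory. The module name `hypothetical_all_demo` and its two functions are made up for this illustration; `exec` is used only so that the star import lands in a separate dictionary that can be inspected afterwards.

```python
import importlib
import sys
import tempfile
from pathlib import Path

module_source = (
    "__all__ = ['public_function']\n"
    "def public_function():\n"
    "    return 'public'\n"
    "def internal_helper():\n"
    "    return 'internal'\n"
)

with tempfile.TemporaryDirectory() as tmp_dir:
    Path(tmp_dir, "hypothetical_all_demo.py").write_text(module_source)
    sys.path.insert(0, tmp_dir)
    try:
        importlib.invalidate_caches()
        namespace = {}
        exec("from hypothetical_all_demo import *", namespace)
    finally:
        sys.path.remove(tmp_dir)

# Ignore the entries that exec() adds on its own (such as '__builtins__')
imported_names = sorted(name for name in namespace if not name.startswith("__"))
print(imported_names)   # prints: ['public_function']
```

Names left out of `__all__` are only hidden from star imports; they can still be reached explicitly, e.g., with `from hypothetical_all_demo import internal_helper`.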


### Import Function ###
**import** (a statement, so no parentheses are required)


Only packages and modules can be imported with a plain import statement, and not variables, functions, or classes. The latter elements can, however, be imported with 'from ... import ...', or be assigned to variables after import (see examples below). 

***Import a package***

In [278]:
import test_package

***Import a package within a package***

In [279]:
import test_package.package_within_package

In [280]:
from test_package import package_within_package

***Import a module of a package***

In [281]:
import test_package.print_hello_function_container

In [282]:
from test_package import print_hello_function_container

***Import a function of a module***

A function cannot be directly imported with dot notation:

In [283]:
try:
    import test_package.print_hello_function_container.print_hello_function
except:
    print("A function cannot be imported directly with dot notation")

A function cannot be imported directly with dot notation
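
The same behavior can be reproduced with the standard library, which makes the reason visible in the error message: everything in a dotted import path must be a package or a module. Here `os.path` is a module and `join` is a function inside it.

```python
error_text = None
try:
    import os.path.join               # 'join' is a function, not a module
except ModuleNotFoundError as error_message:
    error_text = str(error_message)
    print(error_text)

# The 'from ... import ...' form can import the function itself
from os.path import join
joined_path = join("folder", "file.txt")
print(joined_path)
```

Catching the specific `ModuleNotFoundError` instead of using a bare `except` also keeps unrelated errors from being hidden.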


The correct way to import a function is to first import the module that the function belongs to, and then call the function from this now-imported module (***Style 1***):

In [284]:
# Call function after import without variable assignment
import test_package.print_hello_function_container #import module
test_package.print_hello_function_container.print_hello_function() # call function from the imported copy of library
# Note that line 2 does not address any files outside the namespace; it addresses a variable in the now-imported module, 
# ... which is in the current namespace.  


# OR, assign the function to a variable first, and then call it

import test_package.print_hello_function_container
print_hello = test_package.print_hello_function_container.print_hello_function
print_hello()

hello
hello


An easier way to accomplish the same thing with ***Style 2*** can be:

In [285]:
from test_package.print_hello_function_container import print_hello_function as print_hello
print_hello()


# TO IMPORT A FUNCTION WITH ITS ORIGINAL NAME:
from test_package.print_hello_function_container import print_hello_function
print_hello_function()



# print_hello = from test_package.print_hello_function_container import print_hello_function
# print_hello = import test_package.print_hello_function_container
# The commented lines above give "invalid syntax" errors. 
# ... This is because import statements (in any of their forms, including those with 'from') cannot be assigned to variables in a single command.
# The correct way to assign a name on one line is to use the 'as' keyword


hello
hello


***Import a class, instantiate it, call a property and method*** 

Style 1:

In [286]:
import test_package.print_hello_class_container                          # Import module that the class is in
classified = test_package.print_hello_class_container.Print_hello_class  # Extract the class from the module and assign it to a variable
print_hello_instance = classified()                                      # Instantiate the class
print(print_hello_instance.name)                                         # Print a property of the instance
print_hello_instance.print_hello_method_within_class()                   # Call a method of the instance

Name
hello


Style 2:

In [287]:
from test_package.print_hello_class_container import Print_hello_class # Import the class
print_hello_instance = Print_hello_class()                             # Instantiate the class
print(print_hello_instance.name)                                       # Print a property of the instance
print_hello_instance.print_hello_method_within_class()                 # Call a method of the instance 

Name
hello


***Import a command***

In [288]:
import test_package.print_hello_direct

- The content of print_hello_direct.py is: <br>
    print("hello")
  
  
- In PyCharm, importing this file automatically prints "hello" to a console. This did not happen in Jupyter here, most likely because the module had already been imported earlier in the session (e.g., through test_package's \_\_init\_\_.py), and Python runs a module's code only on its first import.

- A command in an imported file that is not wrapped in a function or class statement is run directly when the file is first imported.
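
The 'runs on first import only' behavior can be checked with a throwaway module. In the sketch below, the module name `hypothetical_direct` is made up, and the output of the import is captured so that it can be counted.

```python
import contextlib
import importlib
import io
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    # A module whose only content is a top-level command
    Path(tmp_dir, "hypothetical_direct.py").write_text("print('hello')\n")
    sys.path.insert(0, tmp_dir)
    captured = io.StringIO()
    try:
        importlib.invalidate_caches()
        with contextlib.redirect_stdout(captured):
            import hypothetical_direct   # first import: the module body runs
            import hypothetical_direct   # second import: the cached module in sys.modules is reused
    finally:
        sys.path.remove(tmp_dir)

hello_count = captured.getvalue().count("hello")
print(hello_count)   # prints: 1
```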

***Import a subpackage*** (i.e., a package within package)

In [479]:
import test_package.package_within_package

***Import all modules, functions, or classes***

The method below can be used for this purpose, but it is frowned upon: it does not make explicit what is being imported, and it can silently overwrite names that already exist.

In [480]:
from test_package import *

In [481]:
from test_package.print_hello_function_container import * 

### Additional Notes on Directories and PyCharm

- PyCharm adds a project's root directory to sys.path variable (and it does this per PyCharm session, not per open project)

    - When **more than one project** is open in PyCharm, only one of the projects is added to sys.path, and this may lead to issues. In case of import-related problems, try closing all other projects and restarting PyCharm. 


## Type-1 Import: Module

In [482]:
import test_package

test_package.print_hello_function_container.print_hello_function()

hello


In [483]:
import test_package.print_hello_class_container

# Create an instance of the class
my_instance = test_package.print_hello_class_container.Print_hello_class()
my_instance

<test_package.print_hello_class_container.Print_hello_class at 0x20df9b28cf8>

In [484]:
import test_package.print_hello_function_container

test_package.print_hello_function_container.print_hello_function()

hello


In [485]:
try:
    import test_package.print_hello_function_container.print_hello_function
except ImportError:  # 'import a.b.c' requires c to be a module or package, not a function
    print("Functions cannot be imported directly")

Functions cannot be imported directly


In [486]:
import test_package.print_hello_direct

## Type-2 Import: Function or Class

Modules can contain functions or classes. With type-2 imports, individual functions or classes are imported from the module by name, and can then be used without the module prefix.

Import function with its own name:

In [487]:
from test_package.print_hello_function_container import print_hello_function

print_hello_function()

hello


Import function with alias:

from test_package.print_hello_function_container import print_hello_function as print_hello

print_hello()

# FOR LOOPS

## Simple For Loop

In [298]:
my_list = ["a", "b", "1", "2"]

for my_element in my_list:
    print(my_element)

a
b
1
2


***Usage scenario - Modify a column and update dataset:***

In [299]:
data = [
    ["a", 1],
    ["b", 2]
]

for row in data:
    row[0] = row[0].upper()
print(data)

[['A', 1], ['B', 2]]


***Usage scenario - Extract column:***

In [300]:
data = [
    ["a", 1],
    ["b", 2]
]

extracted_column = []
for row in data:
    extracted_column.append(row[0])
print(extracted_column)

['a', 'b']


## Iterating Over Multiple Lists Simultaneously

### enumerate()

**Usage:**<br>
enumerate(LIST_OR_DICTIONARY)

- 'enumerate' pairs each item of the input (e.g., the items of a list, or the keys of a dictionary) with its index position, producing (index, item) pairs.

- When put in a for loop, the index and the item are both available in each iteration.

- The result is similar to iterating over range(len(LIST)) and indexing the list, but without the manual indexing.
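
What enumerate() produces can be made visible by wrapping it in list(). This is a small sketch with a shortened animals list; the `start` argument is a standard optional parameter of enumerate().

```python
animals = ["Dog", "Tiger", "Cow"]

# enumerate() is lazy; list() shows the (index, item) pairs it produces
pairs = list(enumerate(animals))
print(pairs)        # [(0, 'Dog'), (1, 'Tiger'), (2, 'Cow')]

# The optional 'start' argument changes the first index
numbered = list(enumerate(animals, start=1))
print(numbered)     # [(1, 'Dog'), (2, 'Tiger'), (3, 'Cow')]
```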

***Iterate over one list:***

In [301]:
animals = ["Dog", "Tiger", "SuperLion", "Cow", "Panda"]

for i, animal in enumerate(animals):
    print(i, animal)

0 Dog
1 Tiger
2 SuperLion
3 Cow
4 Panda


***Iterate over two lists***:

In [302]:
animals = ["Dog", "Tiger", "SuperLion", "Cow", "Panda"]
viciousness = [1, 5, 10, 10, 1]

for i, animal in enumerate(animals):
    print(animal)
    print(viciousness[i])

Dog
1
Tiger
5
SuperLion
10
Cow
10
Panda
1


***Iterate over a dictionary:***

In [303]:
animal_counts = {"Dog": 4, "Tiger": 1, "SuperLion": 0, "Cow": 2, "Panda":1}

for i, key in enumerate(animal_counts):
    print(i, key, animal_counts[key])

0 Dog 4
1 Tiger 1
2 SuperLion 0
3 Cow 2
4 Panda 1

### range()

In [304]:
animals = ["Dog", "Tiger", "SuperLion", "Cow", "Panda"]

for i in range(0,5):
    print(i)
    print(animals[i])

0
Dog
1
Tiger
2
SuperLion
3
Cow
4
Panda


#### range() in combination with len():

In [305]:
animals = ["Dog", "Tiger", "SuperLion", "Cow", "Panda"]

for i in range(0,len(animals)):
    print(i)
    print(animals[i])

0
Dog
1
Tiger
2
SuperLion
3
Cow
4
Panda


The index ('i') can be used to address several lists of the same length inside one loop (e.g., animals[i] and viciousness[i]).
The same thing can also be achieved with enumerate() or zip().

## zip()

***Iterate over multiple lists:***

In [306]:
a = [1,2,3]
b = [10,20,30]
c = [100,200,300]

for m,n,o in zip(a,b,c): 
    print(m)
    print(n)
    print(o)

1
10
100
2
20
200
3
30
300


***Usage scenario - calculate row totals:***

In [307]:
a = [1,2,3]
b = [10,20,30]
c = [100,200,300]

row_totals = []
for m,n,o in zip(a,b,c): 
    row_totals.append(m + n + o)
print(row_totals)

[111, 222, 333]
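
One detail worth knowing when the lists differ in length: zip() stops at the shortest input. The sketch below uses two made-up lists and `itertools.zip_longest` from the standard library as the padding alternative.

```python
from itertools import zip_longest

short = [1, 2]
long = [10, 20, 30]

# zip() stops as soon as the shortest list is exhausted
stopped_early = list(zip(short, long))
print(stopped_early)   # [(1, 10), (2, 20)]

# zip_longest() continues to the longest list and fills the gaps
padded = list(zip_longest(short, long, fillvalue=0))
print(padded)          # [(1, 10), (2, 20), (0, 30)]
```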


## Using Templates to Represent Structures of Input Variables 

Note the x and y in the statement below. They are simply placeholder names, one for each element in the structure of the items being iterated over (in this case, the key and the value of each dictionary entry). This technique is known as unpacking.

In [308]:
dictionary = {"a":1, "b":2}

for x, y in dictionary.items(): # .items() is needed to get (key, value) pairs; iterating over the dictionary itself yields only the keys
    print(x)

a
b



Placeholders can be used to represent more complex structures as well.

In [309]:
my_list = [["a", "A", 1], ["b", "B", 2]]

for x, y, z in my_list:
    print(z)

1
2


In [310]:
for arbitrarily, named, placeholders in my_list:
    print(named)

A
B
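
Two common variations of the placeholder idea, sketched with the same kind of nested list: an underscore as a conventional name for positions that are not needed, and a starred name that collects the remaining items. The variable names are illustrative only.

```python
records = [["a", "A", 1], ["b", "B", 2]]

# '_' is a conventional name for positions that are not needed
numbers = []
for _, _, number in records:
    numbers.append(number)
print(numbers)        # [1, 2]

# A starred name collects "everything else" into a list
first_items = []
rests = []
for first, *rest in records:
    first_items.append(first)
    rests.append(rest)
print(first_items)    # ['a', 'b']
print(rests)          # [['A', 1], ['B', 2]]
```

The number of placeholders must match the number of elements in each item, unless a starred name is used.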


## List Comprehensions for For Loops and Control Flows

For loops that build a list can be compacted by writing them inside a list.

List comprehensions follow this template:

***result = [transformation | iteration | filter]***
- transformation: an expression that is evaluated for each item
- iteration: for loop
- filter: if statement (optional)

Explanatory example:

In [311]:
animals = ["Dog", "Tiger", "SuperLion", "Cow", "Panda"]

# Traditional for loop (takes 3 lines)
# List lengths of strings in the animals list:
animal_lengths = []
for item in animals:
    animal_lengths.append(len(item))
print("animal_lengths (traditional expression)           : " + str(animal_lengths))

# POSSIBLE USAGE
# The same for loop can be written in a more compact form using list comprehension (takes 1 line):
# In this example, the code follows this structure:
# [result]
# [command | iteration] (this usage is not recommended over the one below)
animal_lengths = []
[animal_lengths.append(len(item)) for item in animals]
print("animal_lengths (comprehended expression)          : " + str(animal_lengths))
    
# CORRECT USAGE
# Comprehensions work slightly differently from normal code blocks.
# A comprehension collects the value of its expression from every iteration and returns these values as a new list.
# So, the result can be assigned directly to a variable, and the code above can be further shortened as follows:
# result = [transformation | iteration]
animal_lengths = [ len(item) for item in animals ]
print("animal_lengths (even more comprehended expression): " + str(animal_lengths))

animal_lengths (traditional expression)           : [3, 5, 9, 3, 5]
animal_lengths (comprehended expression)          : [3, 5, 9, 3, 5]
animal_lengths (even more comprehended expression): [3, 5, 9, 3, 5]


In comparison to for loops, this is what happens in a list comprehension. The two code fragments below are identical in their results; the important difference between them is the 'append' method. Appending is intrinsic to list comprehensions and does not need to be written explicitly: the result of each iteration is collected, and all results are returned together as a list.

![_auto_0](attachment:_auto_0)
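
The comparison can also be sketched in code. This version includes the optional filter part of the template as well; the length threshold of 3 is an arbitrary choice for illustration.

```python
animals = ["Dog", "Tiger", "SuperLion", "Cow", "Panda"]

# For loop: explicit 'if' and explicit 'append'
long_names_loop = []
for item in animals:
    if len(item) > 3:
        long_names_loop.append(item.upper())

# List comprehension: [transformation | iteration | filter]
long_names_comprehension = [item.upper() for item in animals if len(item) > 3]

print(long_names_loop)            # ['TIGER', 'SUPERLION', 'PANDA']
print(long_names_comprehension)   # ['TIGER', 'SUPERLION', 'PANDA']
```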

#### Use case for list comprehensions: Extract columns from data

In [312]:
data = [
    ["score", "team", "year"], 
    [103, "blue", 2015], 
    [99, "red", 2015],
    [86, "green", 2015]
]

# Extract columns in data
scores_column = [row[0] for row in data[1:]]  # data[1:] skips the header row
teams_column = [row[1] for row in data[1:]]
years_column = [row[2] for row in data[1:]]

print("scores_column: " + str(scores_column))
print("teams_column: " + str(teams_column))
print("years_column: " + str(years_column))

scores_column: [103, 99, 86]
teams_column: ['blue', 'red', 'green']
years_column: [2015, 2015, 2015]


## Itertools

Also see: Itertools module
https://docs.python.org/2/library/itertools.html
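
As a brief taste of the module, here is a sketch of two of its functions, `chain` and `product`, applied to made-up lists.

```python
import itertools

letters = ["a", "b"]
numbers = [1, 2]

# chain() iterates over several iterables as if they were one
chained = list(itertools.chain(letters, numbers))
print(chained)   # ['a', 'b', 1, 2]

# product() yields every combination, like nested for loops
pairs = list(itertools.product(letters, numbers))
print(pairs)     # [('a', 1), ('a', 2), ('b', 1), ('b', 2)]
```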

# CONTROL FLOW

## If-Else Block

In [313]:
a = 5
b = 0

if (a < b):
    print("a is smaller")
else:
    print("a is bigger")

a is bigger


## Try-Except Block: Exception Handling

***Catch error and print custom error message:***

In [314]:
try:
    float("hello")
except Exception: # Catches any error that derives from the 'Exception' class. A bare 'except:' also runs, but it additionally catches things like KeyboardInterrupt; naming the specific error (here, ValueError) is more precise.
    print("Error converting to float.")

Error converting to float.


***Catch and print error message:***

In [315]:
try:        
    1/0
except Exception as exception_message:
    print('This is the error message that returned: ' + str(exception_message))

This is the error message that returned: division by zero
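
A slightly fuller sketch of the same idea: catching a specific exception class, plus the optional `else` and `finally` clauses. The helper name `to_float` is hypothetical and only serves this example.

```python
def to_float(text):
    """Convert text to float; return None if the text is not a number."""
    try:
        value = float(text)
    except ValueError as error:      # only conversion errors are handled here
        print("Could not convert:", error)
        return None
    else:                            # runs only if no exception was raised
        return value
    finally:                         # runs in every case, even when returning
        print("Finished processing", repr(text))

good = to_float("3.5")
bad = to_float("hello")
print(good, bad)
```

Catching only `ValueError` means that unrelated problems (e.g., a misspelled variable name) still surface as errors instead of being silently hidden.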


## pass

'pass' does nothing. Use it where the syntax requires a statement but no action is desired (e.g., to deliberately ignore an error, as below).

In [316]:
try:
    int('')
except Exception:
    pass

# TUPLES

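- Tuples are immutable (i.e., unchangeable) sequences: like lists, they are ordered and can be indexed, but their items cannot be reassigned, added, or removed after creation.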

In [317]:
a = (1, 2)
b = ("a", "b")

print(a)
print(b)

print(type(a), type(b))

(1, 2)
('a', 'b')
<class 'tuple'> <class 'tuple'>
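
A short sketch of what can and cannot be done with a tuple; the variable names are illustrative.

```python
point = (1, 2)

first = point[0]          # indexing works, as with lists
x, y = point              # unpacking works too

message = ""
try:
    point[0] = 99         # item assignment is not allowed
except TypeError as error:
    message = str(error)

print(first, x, y)
print(message)
print(point)              # the tuple is unchanged
```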


# LISTS

Lists...
- Can have duplicate values
- Are ordered (each value has an index value)

*This is a list*:

In [318]:
["Dog", "Cat", "Hippo", "Dog"]

['Dog', 'Cat', 'Hippo', 'Dog']

Lists can be created this way:

In [319]:
my_animals = ["Dog", "Cat", "Hippo", "Dog"]
print(my_animals)

['Dog', 'Cat', 'Hippo', 'Dog']


## Insert item at given index position
**.insert()**

Usage:<br>
LIST.insert(INDEX, OBJECT)

In [320]:
my_list = [1, 2, 3]

my_list.insert(1, "insert")
print(my_list)

[1, 'insert', 2, 3]


## Pair up values from multiple lists into a list of tuples
**list(zip())**

In [321]:
list_1 = [1, 2, 3]
list_2 = [4, 5, 6]

zipped_list = list(zip(list_1, list_2))
print(zipped_list)

[(1, 4), (2, 5), (3, 6)]


# SETS

In [322]:
my_animals = list(["Dog", "Cat", "Hippo", "Dog"])
my_animals

['Dog', 'Cat', 'Hippo', 'Dog']

Sets... 
- Cannot have duplicate values. 
 (If a duplicate item is attempted to be added, it is ignored.)
- Are unordered.
 (Not indexed)

*This is a set*:

In [323]:
{"Dog", "Cat", "Hippo"}  # curly braces create a set; the order of items in the output may differ

{'Dog', 'Cat', 'Hippo'}

## set()


- creates sets 
- accepts lists (or any other iterable) as input

Note the curly braces in the output below. They indicate a set, which is unordered and cannot be indexed.

In [324]:
unique_animals = set(["Dog", "Cat", "Hippo", "Dog", "Cat", "Dog", "Dog", "Cat"])
print(unique_animals)

{'Dog', 'Cat', 'Hippo'}


## add()

In [325]:
unique_animals.add("Tiger")

## remove()

In [326]:
unique_animals.remove("Dog")
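
The cells above produce no output, so here is a small sketch of how add() and remove() behave, together with discard() and a membership test. It uses its own set so that it does not depend on the cells above.

```python
unique_animals = {"Dog", "Cat", "Hippo"}

unique_animals.add("Dog")              # already present: nothing changes
size_after_duplicate = len(unique_animals)

unique_animals.discard("Unicorn")      # absent item: silently ignored

remove_failed = False
try:
    unique_animals.remove("Unicorn")   # absent item: raises KeyError
except KeyError:
    remove_failed = True

has_cat = "Cat" in unique_animals      # membership test

print(size_after_duplicate, remove_failed, has_cat)   # 3 True True
```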

# DICTIONARIES

Create a dictionary:

In [327]:
# Ergonomic notation:
fruits = {
    "apple": 2,
    "orange": 5,
    "melon": 10
}

# Compact notation:
fruits = {"apple": 2, "orange": 5, "melon": 10}

Add item to or change a value in dictionary:

In [328]:
dictionary["c"]=5
print(dictionary)

{'a': 1, 'b': 2, 'c': 5}


In [329]:
dictionary["a"]="..."
print(dictionary)

{'a': '...', 'b': 2, 'c': 5}


## Iterating Over Dictionaries

### Basic Iteration for Retrieving Values (Traditional For Loop)
**for | dict[key]**

- For loops iterate over the keys of a dictionary.
- Therefore, the loop variable holds the current KEY, and the corresponding VALUE is retrieved with dict[key], much like an integer index retrieves an item from a list. The name of the loop variable (the word between 'for' and 'in') can be anything (e.g., 'my_key', 'item', or 'i'); its function remains the same.

In [330]:
my_dictionary = {"a":1, "b":2, "c":3}

for key in my_dictionary:
    print(my_dictionary[key]) # Prints the VALUE of the currently addressed KEY in the DICTIONARY

1
2
3


### Basic Iteration for Retrieving Keys
**in | not in**

In [331]:
my_dictionary = {"a":1, "b":2, "c":3}

for key in my_dictionary:
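    if key == "a":  # exact comparison; an 'in' test on the key itself would be a substring test and would also match keys such as 'banana'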
        print("key 'a' is found")
    else:
        print("key 'a' is not found")

key 'a' is found
key 'a' is not found
key 'a' is not found
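
When the goal is only to check whether a key exists, no loop is needed: `in` and `not in` can be applied to the dictionary directly. A minimal sketch:

```python
my_dictionary = {"a": 1, "b": 2, "c": 3}

has_a = "a" in my_dictionary                 # 'in' checks the keys
lacks_z = "z" not in my_dictionary           # "z" is not a key
has_value_1 = 1 in my_dictionary.values()    # values must be checked explicitly

print(has_a, lacks_z, has_value_1)           # True True True
```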


### Advanced Iteration for Retrieving both Keys and Values
**.items()**



Print all keys and values (for loop with .items method):

In [332]:
dictionary = {"a":1, "b":2}

for key, value in dictionary.items():
    print(key)
    print(value)

a
1
b
2


### Changing a Key in a Dictionary
**indexing operator**

- Unfortunately, a key cannot be directly changed.
- But it can be addressed.
- So, a workaround is to create a new key by addressing the old one.
- And then removing the old key.

**Usage**:<br> 
  dictionary[new_key] = dictionary[old_key]<br>
  del dictionary[old_key]

In [333]:
my_dictionary = { 'first':1, 'second':2, 'third':3 }

my_dictionary['first_number'] = my_dictionary['first'] # This adds the new key as the last item of the dictionary ...
print(my_dictionary)                                   # ... (dictionaries keep insertion order since Python 3.7), so the renamed key moves to the end.
del my_dictionary['first']

print(my_dictionary)

{'first': 1, 'second': 2, 'third': 3, 'first_number': 1}
{'second': 2, 'third': 3, 'first_number': 1}
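
The two steps can also be combined with the .pop() method, which removes a key and returns its value. A minimal sketch with the same dictionary:

```python
my_dictionary = {"first": 1, "second": 2, "third": 3}

# pop() removes 'first' and returns its value, which is stored under the new key
my_dictionary["first_number"] = my_dictionary.pop("first")

print(my_dictionary)   # {'second': 2, 'third': 3, 'first_number': 1}
```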


### Iterating a Dictionary in combination with an 'i' counter
**enumerate()** | **dictionaries**

In [334]:
my_dictionary = { 'first':1, 'second':2, 'third':3 }

for i, key in enumerate(my_dictionary):
    print(i)
    print(key)
    print(my_dictionary[key])
    print("")

0
first
1

1
second
2

2
third
3



# SCOPES

## Inheritance Hierarchy
This is the order in which scopes are searched when a variable name is looked up:

1. Local scope 
2. Enclosing scopes
3. Global scope 
4. Built-in scope (i.e., is the name a built-in function, constant, or other built-in name?)

This hierarchy means that even a built-in function can be shadowed if its name is assigned a new value in a scope that is searched earlier:



In [335]:
# Sum works normally
x = sum([1,2])
print("[1] x is " + str(x))

# Back up sum() before changing it
sum_backup = sum
print("[2] sum's type is 'function': " + str(type(sum)))
print("[3] sum_backup's type is 'function': " + str(type(sum_backup))) # the backup variable does indeed hold the sum() function itself


# Change built-in variable sum to an integer
sum = 0 # sum is now not a function, but the integer '0'
print("[4] sum is now " + str(sum))  
print("[5] sum is now 'int':" + str(type(sum))) 

try:                                           # test what happened to sum() 
    print(sum([1,2]))
except TypeError:                              # an integer is not callable
    print("[6] sum() does not work.")

sum = sum_backup                               # restore sum() function

try:
    print("[7] sum() works again:" + str(sum([1,2])))
except TypeError:
    print("Sum does not work.")

[1] x is 3
[2] sum's type is 'function': <class 'builtin_function_or_method'>
[3] sum_backup's type is 'function': <class 'builtin_function_or_method'>
[4] sum is now 0
[5] sum is now 'int':<class 'int'>
[6] sum() does not work.
[7] sum() works again:3
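
A backup variable is not the only way to reach a shadowed built-in. As a sketch, the standard `builtins` module always holds the original functions; the function name `total_with_shadowed_name` is hypothetical.

```python
import builtins

def total_with_shadowed_name(values):
    sum = 0                      # local name 'sum' shadows the built-in inside this function
    for value in values:
        sum = sum + value
    # The built-in is still reachable through the 'builtins' module
    return sum, builtins.sum(values)

result = total_with_shadowed_name([1, 2])
print(result)   # (3, 3)
```

Because the shadowing name is local to the function, the built-in sum() is unaffected everywhere else.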


## Local Scopes

- Local variables are **destroyed upon exiting** the environment in which they are created.
- <font color=red>Assignments in local scopes do not modify variables from a higher environment</font> (an assignment inside a function creates a new local variable with the same name and changes only this local variable, which is destroyed upon exit from the scope [see the example below]). To rebind a name from a higher environment, the 'global' (or 'nonlocal') keyword is needed.

In [336]:
b = 1

# This is a function with a local variable:
def my_function():
    b = 10  # This is not a new value assignment to the file-level 'b'... 
            # ... but creates a local variable that is also named 'b'.
            # This local variable is destroyed upon exit, and therefore cannot be printed from outside.
            # This function does not have any effect on the file-level 'b' variable.

my_function()
print("local_b: " + str(b))
# print(b) prints '1' (the file-level value), not the '10' assigned inside the function. 
# This is because local variables are destroyed upon exiting the local environment.

local_b: 1


## Enclosing Scopes and File Level Variables

In Python, the file (module) level is itself the global scope: a variable assigned at the top level of a file is global to that file, and other files can see it only by importing it. An enclosing scope, strictly speaking, is the body of an outer function that contains a nested function. The 'global' keyword does not create a different kind of variable; it lets a function assign to a file-level name (see the next section).


In [339]:
b = 2  # a file level variable 

# This is a function with a local variable that is on the highest scope... 
# ... within this file:
def my_function():
    b = 20  # Assignment inside a function creates a local 'b'; the file-level 'b' is untouched.
           # This value won't be printed when b is called from the outside...
           # ...because it only exists within the inside of the scope.

my_function()
print("file_b: " + str(b))

file_b: 2
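
For nested functions, the enclosing scope is the body of the outer function. The sketch below contrasts a plain assignment with the `nonlocal` keyword, which is the counterpart of 'global' for enclosing scopes. All function names are made up for this example.

```python
def outer():
    b = 2                      # lives in the enclosing scope of the inner functions

    def inner_without_nonlocal():
        b = 20                 # creates a new local 'b'; outer's 'b' is untouched

    def inner_with_nonlocal():
        nonlocal b             # refer to outer's 'b' instead of creating a local one
        b = 200

    inner_without_nonlocal()
    after_plain_assignment = b
    inner_with_nonlocal()
    after_nonlocal_assignment = b
    return after_plain_assignment, after_nonlocal_assignment

result = outer()
print(result)   # (2, 200)
```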


## Global Scope
'global' keyword

In [340]:
b = 3

# This is a function with a global variable:
def my_function():
    global b # 'global' must be declared on its own line; declaring and assigning in one statement is a syntax error.
    b = 30
    
my_function()
print("global_b: " + str(b))

global_b: 30


It is generally not recommended to use global variables, because they make functions dependent on (and able to change) state that is defined outside of them, which makes code harder to test and debug.

## For Loops and Scopes

### For Loops and If Statements Do Not Create Isolated Environments

- Unlike functions, for loops and if statements do not create new environments.
- For loops and if statements can modify variables outside their own enclosures.

In [341]:
# For loops don't create new environments
b = 4
for i in range(0, 1):
    b = 40

print("for loop b: " + str(b))

for loop b: 40
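
The same holds for if statements, which the example above does not cover. A minimal sketch; the name `created_inside` is illustrative.

```python
b = 4
if b > 0:
    b = 40                  # modifies the 'b' defined outside the if block
    created_inside = True   # a name created inside the block stays visible afterwards

print(b)                # 40
print(created_inside)   # True
```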


### But Temporary Variables in For Loops are Still Desroyed After Loop<a name="test"></a>

- Assigning a new value to the *temporary* loop variable does not change the list being iterated over; it only rebinds the temporary name. To change the list, its items must be addressed explicitly.

#### ***Value Reassignment to External Variables - The Correct Way*** 

The section below explains through demonstrative examples.

**Correct usage**: ***a[i]*** and ***enumerate(a)*** (Explicit value assignment by addressing the external variable's index position with an integer, and not with a list element).

In [342]:
a = ["1","2"] # <-- Changes!
for i, item in enumerate(a):
    a[i] = 0
print(a)

[0, 0]


**Common mistake**: ***item = 0***  (Implicit value assignment with temporary variable.)

In [343]:
a = ["1","2"]  # <-- Does not change
for item in a:
    item = 0
print(a)

['1', '2']
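
When every item needs a new value, another option is to build a new list with a list comprehension and assign it back to the variable. A minimal sketch, converting the strings to integers as an arbitrary example:

```python
a = ["1", "2"]

as_integers = [int(item) for item in a]   # builds a new list; 'a' is unchanged so far
print(a)             # ['1', '2']

a = as_integers      # explicit reassignment of the variable
print(a)             # [1, 2]
```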


**Faulty usage**: ***item[i] = x*** (Using the temporary index (i) to index the current element instead of the list)

In [344]:
a = ["1","2"]  # <-- Does not change.
try: 
    for i, item in enumerate(a):
        item[i] = 0
    print(a)
except:
    print("Does not work.")

Does not work.


**Faulty usage**: ***a[item]*** (Using an element of the list a as an index value)

(However, see [Iterating with a Custom Order](#Iterating-with-a-Custom-Order) section for a potential use case of this style of writing a for loop.)

In [345]:
# A 'lucky' case:
# This (i.e., a[item]) is not a regular for loop style, and is faulty usage!
a = [1, 2] # <-- Changes. But (unless specifically intended) by chance. 
for item in a:
    a[item] = 0
print(str(a) + " <--Works, but only because list 'a' consists of integers.") # <-- Works only because the elements of this list are sequential integers, and they can be used as index positions! 

# A not-so-lucky case:
try:
    a = ["1","2"] # <-- Does not change
    for item in a:
        a[item] = 0
    print(a)
except:
    print("Does not work. Indexes must be integers.")

[0, 0] <--Works, but only because list 'a' consists of integers.
Does not work. Indexes must be integers.


**On why the previous usage style works, and a potential use case for it**:


What happened above (and what happens below) is that the elements of the list are being simultaneously used as:
 - List's items
 - List's own 'custom order' index values (i) -- this is possible only because they are integers, just like the temporary index values that would be created by iteration commands like enumerate() (these temporary index values are not created in regular for loops in Python, which simply iterate over the ***objects*** of a list).

In [346]:
# Values of a are used as both elements of the list and index values.
a = [1, 0, 2]
for item in a:
    print(a[item])

print("\n")
    
#This can be used to create custom indexing orders:
iteration_order = [1, 0, 2]
a = ["a", "b", "c"]
for item in iteration_order:
    print(a[item])

0
1
2


b
a
c


# Iterating with a Custom Order

In [347]:
iteration_order = [1, 0, 2]
a = ["a", "b", "c"]
for item in iteration_order:
    print(a[item])

b
a
c


**IMPORTANT:** This iteration style (i.e., ***a[item]***) should not be used for regular for loops, as it only works because the elements of the list are integers. The same for loop structure would break if the list elements were, for instance, strings. For more information, see the [For Loops and Scopes](#But-Temporary-Variables-in-For-Loops-are-Still-Desroyed-After-Loop) section.


An example for this faulty usage (taken from the aforementioned section):

In [348]:
# A 'lucky' case:
# This (i.e., a[item]) is not a regular for loop style, and is faulty usage!
a = [1, 2] # <-- Changes. But (unless specifically intended) by chance. 
for item in a:
    a[item] = 0
print(str(a) + " <--Works, but only because list 'a' consists of integers.") # <-- Works only because the elements of this list are sequential integers, and they can be used as index positions! 

# A not-so-lucky case:
try:
    a = ["1","2"] # <-- Does not change
    for item in a:
        a[item] = 0
    print(a)
except:
    print("Does not work. Indexes must be integers.")

[0, 0] <--Works, but only because list 'a' consists of integers.
Does not work. Indexes must be integers.


# SIMPLE STRING MANIPULATION (str methods)

## String Concatenation

### Simple String Concatenation

In [534]:
'my' + ' ' + 'string'

'my string'

### Advanced String Concatenation with Placeholder Characters

Single string substitution:

In [533]:
"this is a %s" % 'string'

'this is a string'

Multiple string substitution:

In [538]:
"%s is a %s" % ('this', 'string')

'this is a string'

String and integer substitution:

In [539]:
"this is %s number %i" % ('string', 1)

'this is string number 1'
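
For comparison, the same substitution can be sketched with the two newer formatting styles, the .format() method and f-strings (the latter require Python 3.6 or later). The variable names are illustrative.

```python
kind = "string"
number = 1

with_percent = "this is %s number %i" % (kind, number)
with_format = "this is {} number {}".format(kind, number)
with_f_string = f"this is {kind} number {number}"

print(with_percent)
print(with_format)
print(with_f_string)
```

All three lines print the same text; f-strings are usually the most readable because the variable names sit inside the string itself.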

## Substring Search

### Find First Occurrence
**.find()**

***Find a character (get index position of a character) :***

In [503]:
"my string".find('s')

3

***Find multiple characters (get index position of multiple characters):***

In [504]:
"my string".find('ng')

7

***Return -1 if character not found:***

In [511]:
'my string my'.find('x')

-1

### Index
**.index()**

.index() works in the same way as .find(), except that it raises a ValueError when the substring is not found:

***Raise ValueError if character not found:***

In [515]:
try:
    'my string'.index('x')
except Exception as error_message:
    print(error_message)

substring not found


***Find a character (get index position of a character or characters):***

Everything else works as it does in .find():

In [510]:
'my string'.index('s')

3

In [520]:
'my string my'.index('ng')

7

### Find Last Occurrence (Reverse Find)

**.rfind()**

***Find last occurrence of a character:***

In [505]:
'my string my'.rfind('my')

10

### Count Occurrences
**.count()**

In [525]:
number_string = 'aaaXXaaaXXaaa'
number_string.count('XX')

2

# REGULAR EXPRESSIONS

| "Expression"  | Description                                      | "\Escape character" | Description  |
| :------------ | :-----------                                     | :------------------ | :----------- |
| .a            | a preceded by any single character               | .                   | Dot          |
| ^a            | strings that start with a                        | n                   | New line     |
| a$            | strings that end with a                          |                     |              |
| [abc]def      | 'adef', 'bdef', 'cdef'                           |                     |              |
| a│b           | a or b                                           |                     |              |
| [0-9]         | any single digit from 0 to 9                     |                     |              |
| [a-m]         | lowercase letters between a and m                |                     |              |
| [0-9][0-9]    | any two digits in a row                          |                     |              |
| [a-z][0-9]    | any lowercase letter-digit pair (e.g., 'a1')     |                     |              |
| [0-9]{4}      | any four digits in a row                         |                     |              |
| [a-z]{10}     | any ten lowercase letters in a row               |                     |              |
| \             | Escape character                                 |                     |              |
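
The rows of the table can be checked with a few lines of code. This is a sketch with made-up target strings; each entry holds a pattern, a target, and whether a match is expected.

```python
import re

checks = [
    (r".a", "ba", True),             # any character followed by a
    (r"^a", "apple", True),          # starts with a
    (r"a$", "banana", True),         # ends with a
    (r"[abc]def", "cdef", True),     # a, b, or c, followed by def
    (r"a|b", "xyz", False),          # neither a nor b is present
    (r"[0-9]{4}", "year 2016", True),
    (r"[a-z][0-9]", "A1", False),    # uppercase A is outside a-z
    (r"\.", "no dots here", False),  # an escaped dot matches a literal dot only
]

results = [bool(re.search(pattern, text)) == expected
           for pattern, text, expected in checks]
print(all(results))   # True
```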

## String Search
**re.search()**


**Usage**: <br>
re.search("REGEX_SEARCH_PATTERN", "TARGET_STRING")

- **Regex search pattern**: Search expression. Accepts regex; escape characters should be used when specifying special characters in the search pattern.
- **Target string**:  The target string in which the regex search will be performed.
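
A small sketch of what re.search() returns: a match object when the pattern is found, and None otherwise. The two target strings are made up for illustration.

```python
import re

match = re.search("of [Rr]eddit", "People of reddit, what is your story?")
no_match = re.search("of [Rr]eddit", "A title without the phrase")

if match:                       # a match object counts as True in an if statement
    print(match.group())        # the text that matched: of reddit
    print(match.start())        # index where the match begins: 7
print(no_match)                 # None when nothing matches
```

This is why the result of re.search() can be used directly as the condition of an if statement.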


In [405]:
# Prepare data
import csv
with open("test_data/askreddit_2015.csv", encoding="utf8") as file:  # 'with' closes the file after reading
    data_with_header = list(csv.reader(file))
data = data_with_header[1:]  # skip the header row

### Search for Multiple Versions of a String
**'Alternative characters' enclosure "[ ]"**

In [355]:
import re

of_reddit_count = 0    # Number of rows whose first column contains 'of Reddit' or 'of reddit'
for row in data:
    if re.search("of [Rr]eddit", row[0]):
        of_reddit_count = of_reddit_count + 1
print(of_reddit_count)

102


###  Use Escape Character to Find a Pattern Containing Special Characters
**Escape character ("\")**

When searching for the pattern, account for capitalization and square brackets.
In the code below, "[" and "]" are escaped with the "\[" and "\]" expressions:

In [356]:
serious_count = 0
for row in data:
    if re.search(r"\[[Ss]erious\]", row[0]):  # a raw string (r"...") keeps the backslashes exactly as written
        serious_count = serious_count + 1
print(serious_count)

77


When searching for the pattern, account for capitalization, square brackets, and parentheses. In the code below, "[\[\(]" and "[\]\)]" are used to match either an escaped bracket or an escaped parenthesis:

In [357]:
serious_count = 0
for row in data:
    if re.search(r"[\[\(][Ss]erious[\]\)]", row[0]):
        serious_count = serious_count + 1
print(serious_count)

80


### Search in the Beginning / End of String
**(^...) | (...$)**

Find expression at the beginning of target string (^...):

In [358]:
serious_start_count = 0
for row in data:
    if re.search(r"^[\[\(][Ss]erious[\]\)]", row[0]):
        serious_start_count = serious_start_count + 1
print("[1] serious_start_count: " + str(serious_start_count))

[1] serious_start_count: 69


Find expression at the end of target string (...$):

In [359]:
serious_end_count = 0
for row in data:
    if re.search(r"[\[\(][Ss]erious[\]\)]$", row[0]):
        serious_end_count = serious_end_count + 1
print("[2] serious_end_count: " + str(serious_end_count))

[2] serious_end_count: 11


### Or Expression 
**("|")**

Find questions that start or end with "[Serious]" and its all possible variants with different capitalization and enclosure types.


In [360]:
serious_count_final = 0
for row in data:
    if re.search(r"^[\[\(][Ss]erious[\]\)]|[\[\(][Ss]erious[\]\)]$", row[0]):
        serious_count_final = serious_count_final + 1
print("[3] serious_count_final: " + str(serious_count_final))

[3] serious_count_final: 80
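
The repeated part of the pattern can be written once and reused, and capitalization can be handled with the `re.IGNORECASE` flag. This is a sketch under stated assumptions: the titles below are made up and only stand in for row[0] of the dataset.

```python
import re

# Hypothetical titles, standing in for row[0] of the dataset
titles = [
    "[Serious] What changed your mind?",
    "What is your best advice? (serious)",
    "This is a [serious] matter, right?",
    "Nothing to see here",
]

tag = r"[\[(]serious[\])]"   # an opening [ or (, the word, then a closing ] or )
pattern = re.compile(rf"^{tag}|{tag}$", flags=re.IGNORECASE)

tagged = [title for title in titles if pattern.search(title)]
print(len(tagged))   # 2 (the third title has the tag in the middle)
```

Note that re.IGNORECASE also accepts variants such as "[SERIOUS]", which is slightly broader than the "[Ss]erious" pattern used above.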


### Regex Ranges and Search Templates
**[a-z][1-9] | [a]{x}**

*"We can indicate that we're looking for integers in a pattern by using square brackets ("[" and "]"), along with a dash ("-"). For example, "[0-9]" will match any character that falls between 0 and 9 (all of which will be one-digit integers). Similarly, "[a-z]" would match any lowercase letter. We can also specify smaller ranges like "[3-5]" or "[d-g]"*. (from DQ)

Search for 4-digit combinations between 1000 and 2999 (i.e., ones that look like years) using regular expressions:

In [361]:
import re
strings = ['War of 1812', 'There are 5280 feet to a mile', 'Happy New Year 2016!']

strings_with_year = []
for string in strings:
    if re.search("[1-2][0-9][0-9][0-9]", string):
        strings_with_year.append(string)

print("[1] strings: " + str(strings))        
print("[2] strings_with_year: " + str(strings_with_year))

[1] strings: ['War of 1812', 'There are 5280 feet to a mile', 'Happy New Year 2016!']
[2] strings_with_year: ['War of 1812', 'Happy New Year 2016!']


Alternative notation with repeat "{ }":

In [362]:
strings = ['War of 1812', 'There are 5280 feet to a mile', 'Happy New Year 2016!']

strings_with_year = []
for string in strings:
    if re.search("[1-2][0-9]{3}", string): # {3} repeats the preceding [0-9] exactly 3 times
        strings_with_year.append(string)
print("[1] strings_with_year: " + str(strings_with_year))

[1] strings_with_year: ['War of 1812', 'Happy New Year 2016!']


### Find All Occurrences of a Pattern and Return Them as a List
**re.findall()**

**Usage**: <br>
re.findall("REGEX", "TARGET_STRING")

In [363]:
years_string = "2015 was a good year, but 2016 will be better!"

re.findall("[1-2][0-9]{3}", years_string)

['2015', '2016']
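
One caveat of this pattern is that it also matches four digits inside a longer number. A sketch of a stricter variant using the word boundary marker `\b`; the text is made up for illustration.

```python
import re

text = "Order 120345 shipped in 2015 and arrived in 2016."

# Matches any four digits starting with 1 or 2, even inside a longer number
loose = re.findall(r"[1-2][0-9]{3}", text)
print(loose)    # ['1203', '2015', '2016']

# \b requires a word boundary on both sides of the four digits
strict = re.findall(r"\b[1-2][0-9]{3}\b", text)
print(strict)   # ['2015', '2016']
```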

## String Substitution
**re.sub()**

**Usage**: <br>
changed_string = re.sub("REGEX_SEARCH_PATTERN", "REPLACEMENT_STRING", "TARGET_STRING")

- **Regex search pattern**: Search expression; works the same way as in re.search(). ***Regex (and escape characters) should be used*** when specifying the search pattern.
- **Replacement string**: String to be injected
- **Target string**:  The target string in which the regex search will be performed.



In [364]:
import re
re.sub("Hi", "Hello", "Hi world")

'Hello world'

re.sub() **does not automatically replace the input variable**; it simply returns a substituted version of the input string, and this output should be stored in a variable: 

In [365]:
my_string = "Hi world"
re.sub("Hi", "Hello", my_string)
print("[1] my_string is unchanged: " + str(my_string))

my_string = re.sub("Hi", "Hello", my_string)
print("[2] my_string is changed after variable assignment: " + str(my_string))

[1] my_string is unchanged: Hi world
[2] my_string is changed after variable assignment: Hello world


### Substitution with Advanced Regex Search 

In [366]:
string_list = ["[hi]", "[Hi]", "[hello]", "[Hello]"]

string_list_new = []
for row in string_list:
    string_list_new.append(re.sub(r"\[[Hh]i\]|\[hello\]", "[Hello]", row))
print("[0] string_list: " + str(string_list_new))

[0] string_list: ['[Hello]', '[Hello]', '[Hello]', '[Hello]']
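
A variation on the same substitution, sketched with a group for the alternatives and the `re.IGNORECASE` flag instead of listing capitalizations by hand. The extra items "[HELLO]" and "[bye]" are added here only for illustration.

```python
import re

string_list = ["[hi]", "[Hi]", "[hello]", "[HELLO]", "[bye]"]

# (hi|hello) groups the alternatives; IGNORECASE covers every capitalization
pattern = re.compile(r"\[(hi|hello)\]", flags=re.IGNORECASE)
normalized = [pattern.sub("[Hello]", item) for item in string_list]

print(normalized)   # ['[Hello]', '[Hello]', '[Hello]', '[Hello]', '[bye]']
```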


# ENCODING
**.encode | .decode**

Encoding: Text (str) to bytes conversion.

In [367]:
"maté".encode("utf8")

b'mat\xc3\xa9'

Decoding: Bytes to text (str) conversion.

In [368]:
b'mat\xc3\xa9'.decode('utf8')

'maté'

In [369]:
len("maté")

4

In [370]:
len(b'mat\xc3\xa9')

5

In [371]:
try:
    b'mat\xc3\xa9'.decode('ascii')
except UnicodeDecodeError:
    print("The first statement gives an error because it is not possible to find an ascii equivalent of 'é' character\n")

    print(b'mat\xc3\xa9'.decode('ascii', errors="ignore")) # errors="ignore" drops the bytes that cannot be decoded

The first statement gives an error because it is not possible to find an ascii equivalent of 'é' character

mat
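
Besides "ignore", the `errors` argument also accepts "replace", which marks the problematic positions instead of dropping them. A minimal sketch:

```python
raw = "maté".encode("utf8")

ignored = raw.decode("ascii", errors="ignore")      # undecodable bytes are dropped
replaced = raw.decode("ascii", errors="replace")    # each undecodable byte becomes U+FFFD
ascii_bytes = "maté".encode("ascii", errors="replace")   # when encoding, '?' is used instead

print(ignored)        # mat
print(len(replaced))  # 5 (three letters plus two replacement characters)
print(ascii_bytes)    # b'mat?'
```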


### Bytes and Byte Arrays
**bytes() | bytearray()**

In [372]:
print( bytes    ("maté", encoding="utf8") )
print( bytearray("maté", encoding="utf8") )

print (b"easy bytes creation")

b'mat\xc3\xa9'
bytearray(b'mat\xc3\xa9')
b'easy bytes creation'


Slices of a bytes object or bytearray are also bytes or bytearrays:

In [373]:
print( bytes    ("maté", encoding="utf8")      )
print( bytes    ("maté", encoding="utf8")[:2]  )
print( bytes    ("maté", encoding="utf8")[3:5] )

print("")

print( bytearray("maté", encoding="utf8")[:1]  )
print( bytearray("maté", encoding="utf8")[3:5] )

b'mat\xc3\xa9'
b'ma'
b'\xc3\xa9'

bytearray(b'm')
bytearray(b'\xc3\xa9')


When a range of values is addressed (e.g., via slicing, as in the previous example), bytes and bytearray objects return bytes or bytearrays. 

When a single position is addressed, however, they return integers.

In [374]:
print( bytes       ("maté", encoding="utf8")[0] )  # returns an integer
print( bytes       ("maté", encoding="utf8")[4] )  # returns an integer
print( type (bytes ("maté", encoding="utf8")[0]) )

print("")

print( bytearray("maté", encoding="utf8")[0] )        # returns an integer
print( type (bytearray("maté", encoding="utf8")[0]) )

109
169
<class 'int'>

109
<class 'int'>
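A related difference worth sketching: a bytes object is immutable, while a bytearray can be changed in place by assigning integers. The variable names below are illustrative only.

```python
immutable = bytes("maté", encoding="utf8")
mutable = bytearray("maté", encoding="utf8")

# Indexing a single position yields an integer in the range 0..255
first = immutable[0]

# A bytearray can be modified in place by assigning an integer
mutable[0] = ord("M")

# A bytes object cannot be modified; item assignment raises TypeError
try:
    immutable[0] = ord("M")
    assignment_failed = False
except TypeError:
    assignment_failed = True

# A one-element slice keeps the bytes type, unlike plain indexing
one_byte = immutable[0:1]

print(first, mutable, assignment_failed, one_byte)
```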


## String and Integer Methods on Bytes and Byte Arrays

Applying a string method to bytes object:

In [375]:
x = "maté".encode("utf8", errors="ignore") # .encode() always returns a bytes object
print(x)                                   # a bytes literal (b"...") may contain only ASCII characters, so write b"mat\xc3\xa9" rather than b"maté"
print(type(x), "\n")

print(x.upper()) # string-like methods such as .upper() also exist on bytes; only ASCII letters are affected
print(type(x.upper()))

b'mat\xc3\xa9'
<class 'bytes'> 

b'MAT\xc3\xa9'
<class 'bytes'>


Applying regex to bytes object:

In [376]:
import re

x = "maté".encode("utf8", errors="ignore")
print(type(x), "[0]\n")

try:
    y = re.sub("m", "", x) # raises TypeError: a str pattern cannot be applied to a bytes object (unrelated to the 'é' character)

except TypeError:
    print("Try section did not work.\n")
    # Error from the try block: 'cannot use a string pattern on a bytes-like object'
    # Note: y is not assigned when re.sub() raises, so it must not be read before the next line.

    y = re.sub("m", "", x.decode("utf8")) # decode the bytes to str first, then apply the str pattern
    print(y)
    print(type(y), "[1]")

<class 'bytes'> [0]

Try section did not work.

até
<class 'str'> [1]
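Decoding is not the only option. As a minimal sketch, regex can be applied to a bytes object directly when both the pattern and the replacement are bytes as well:

```python
import re

x = "maté".encode("utf8")

# Pattern and replacement are bytes, so they match the type of x
y = re.sub(b"m", b"", x)

print(y)
print(type(y))
```

The result stays a bytes object, so it still has to be decoded before it can be used as text.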


Additional reading on encoding:
https://www.safaribooksonline.com/library/view/fluent-python/9781491946237/ch04.html

# WORKING WITH TIME

## 'time' Module

### Unix Timestamps
**time.time()**

A Unix timestamp is a floating point value with no explicit mention of day, month, or year. This value represents the number of seconds that have passed since the "epoch", i.e., 00:00:00 UTC on January 1, 1970. So, a timestamp of 0.0 represents the epoch, and a timestamp of 60.0 represents one minute after the epoch. Any moment after the epoch can be represented this way; moments before it correspond to negative values.

In [377]:
import time

current_time = time.time() # returns the number of seconds passed since the epoch (current Unix timestamp)
print(current_time)

1515521151.9000075
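The two example timestamps mentioned above (0.0 and 60.0) can be checked with a short sketch that converts them to UTC with `time.gmtime()`:

```python
import time

# Timestamp 0.0 is the epoch itself
epoch = time.gmtime(0.0)

# Timestamp 60.0 is one minute after the epoch
one_minute_later = time.gmtime(60.0)

print(epoch.tm_year, epoch.tm_mon, epoch.tm_mday, epoch.tm_hour, epoch.tm_min)
print(one_minute_later.tm_hour, one_minute_later.tm_min)
```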


### Structured Time
**time.gmtime() | time.gmtime().tm_year**

In [378]:
time.gmtime()

time.struct_time(tm_year=2018, tm_mon=1, tm_mday=9, tm_hour=18, tm_min=5, tm_sec=52, tm_wday=1, tm_yday=9, tm_isdst=0)

In [379]:
time.gmtime().tm_year

2018
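A structured time can be converted in both directions. The following is a minimal sketch using a fixed example timestamp; `calendar.timegm()` is used as the inverse of `time.gmtime()`.

```python
import calendar
import time

timestamp = 1433213314  # an example Unix timestamp

# Unix timestamp -> structured time (UTC)
structured = time.gmtime(timestamp)

# Structured time -> formatted string
as_text = time.strftime("%Y-%m-%d %H:%M:%S", structured)

# Structured time (UTC) -> Unix timestamp
round_trip = calendar.timegm(structured)

print(as_text)
print(round_trip == timestamp)
```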

## 'datetime' Module

### Datetime Object Creation and Queries
**datetime.datetime()**

**Usage**: <br>
- **Creation**:<br>
datetime.datetime(year = INT, month = INT, ...)


- **Query**:<br>
datetime.datetime(year = INT, month = INT, ...).ATTR

Attributes of 'datetime.datetime' objects:
- year
- month
- day
- hour
- minute
- second
- microsecond

***Create datetime object with specific date:***

In [380]:
import datetime 
datetime.datetime(year = 2010, month = 3, day=3)

datetime.datetime(2010, 3, 3, 0, 0)

In [381]:
data = [
    ["first", datetime.datetime(year = 2010, month = 3, day=3)],
    ["second", datetime.datetime(year = 2011, month = 6, day=12)],
    ["third", datetime.datetime(year = 2011, month = 4, day=30)]
]

data

[['first', datetime.datetime(2010, 3, 3, 0, 0)],
 ['second', datetime.datetime(2011, 6, 12, 0, 0)],
 ['third', datetime.datetime(2011, 4, 30, 0, 0)]]

***Extract day from a specific date:***

In [382]:
datetime.datetime(year = 2010, month = 3, day=3).day

3

***Select cases in dataset:<br>***

In [383]:
print("[1] data:\n" + str(data) + "\n")
print("[2] data[0][1]:\n" + str(data[0][1]) + "\n")

march_count = 0
for row in data:
    if row[1].month == 3:
        march_count += 1
print("[1] march_count: " + str(march_count))

[1] data:
[['first', datetime.datetime(2010, 3, 3, 0, 0)], ['second', datetime.datetime(2011, 6, 12, 0, 0)], ['third', datetime.datetime(2011, 4, 30, 0, 0)]]

[2] data[0][1]:
2010-03-03 00:00:00

[1] march_count: 1


***Get current date-time***:

In [384]:
import datetime
datetime.datetime.now()

datetime.datetime(2018, 1, 9, 19, 5, 57, 732398)

***Get current year***:

In [385]:
datetime.datetime.now().year

2018

### Calculating Time Differences with 'timedelta' Class
**datetime.timedelta()**

For calculating differences between dates.

**Usage**:
- **Creation**: <br>
DIFF_VAR = datetime.timedelta(weeks=INT, days=INT, hours=INT, ...)
- **Calculation**: <br>
CALCULATED_TIME = DATETIME_OBJ +- DIFF_VAR

Arguments accepted by datetime.timedelta() (years and months are not supported, because their lengths vary):
- weeks
- days
- hours
- minutes
- seconds
- milliseconds
- microseconds

A timedelta object itself exposes only the attributes days, seconds, and microseconds.

***Find the date that lies a given time span after today:***

In [386]:
import datetime

today = datetime.datetime.now()
diff = datetime.timedelta(weeks = 5, days = 2)
result = today + diff

print(result)

2018-02-15 19:06:01.068562
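The difference between two dates can be sketched the other way around: subtracting one datetime object from another yields a timedelta. The two dates below are arbitrary examples.

```python
import datetime

start = datetime.datetime(year=2010, month=3, day=3)
end = datetime.datetime(year=2011, month=6, day=12, hour=6)

# Subtracting two datetime objects yields a timedelta
diff = end - start

print(diff)                  # days plus the remaining time
print(diff.days)             # whole days only
print(diff.seconds)          # remaining seconds within the last day
print(diff.total_seconds())  # the full difference expressed in seconds
```

Note that `diff.seconds` holds only the part below one day; `total_seconds()` is the method to use for the whole span.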


### Building Strings that Express Dates with 'strftime' Method
**datetime.datetime.strftime()**

**Meaning:**<br>
strftime: STRing-Format-Time

**Usage:**<br>
DATETIME_OBJECT.strftime(STRING_TEMPLATE)

***Transform datetime object to custom formatted string:***

In [387]:
import datetime
datetime.datetime(2015, 12, 31, 0, 0).strftime("%I:%M%p on %A %B %d, %Y")

'12:00AM on Thursday December 31, 2015'

Full list of format codes for strftime and strptime: <br> https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior

***Shorten long microseconds:***

In [388]:
long_microseconds = datetime.datetime(2015, 12, 31, 23, 59, 12, 999999).strftime("%f")
print("long", long_microseconds)

short_microseconds = datetime.datetime(2015, 12, 31, 23, 59, 12, 999999).strftime("%f")[:-3]
print("short", short_microseconds)

long 999999
short 999
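As an alternative to slicing the "%f" string, milliseconds can be obtained as sketched below, either with `isoformat()` and its `timespec` argument (available from Python 3.6) or from the `microsecond` attribute. Both truncate rather than round.

```python
import datetime

moment = datetime.datetime(2015, 12, 31, 23, 59, 12, 999999)

# isoformat() can cut the fractional seconds down to milliseconds
with_milliseconds = moment.isoformat(sep=" ", timespec="milliseconds")

# Equivalent manual approach: integer-divide the microsecond attribute
milliseconds = moment.microsecond // 1000

print(with_milliseconds)
print(milliseconds)
```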


### Parsing a 'datetime' Object from a String
**datetime.datetime.strptime()**

**Meaning:**<br>
strptime: STRing-Parse-TIME

**Usage:**<br>
     datetime.datetime.strptime("STRING", "TEMPLATE")

In [389]:
datetime.datetime.strptime("Mar 03, 2010", "%b %d, %Y")

datetime.datetime(2010, 3, 3, 0, 0)
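Parsing and formatting mirror each other when the same template is used. A minimal sketch follows, which also shows that a string not matching the template raises ValueError (month names such as "Mar" assume an English locale):

```python
import datetime

template = "%b %d, %Y"

parsed = datetime.datetime.strptime("Mar 03, 2010", template)

# Formatting with the same template reproduces the original string
formatted = parsed.strftime(template)

# A string that does not match the template raises ValueError
try:
    datetime.datetime.strptime("2010-03-03", template)
    mismatch_detected = False
except ValueError:
    mismatch_detected = True

print(parsed)
print(formatted)
print(mismatch_detected)
```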

### Conversion from Unix Timestamp to 'datetime' Object 
**datetime.datetime.fromtimestamp()**

**Usage:**<br>
datetime.datetime.fromtimestamp(FLOAT_UNIX_TIME_STAMP)

In [390]:
datetime.datetime.fromtimestamp(1433213314.0)

datetime.datetime(2015, 6, 2, 4, 48, 34)
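The result above is expressed in the local time zone of the machine that ran the cell, so it may differ elsewhere. A sketch of a machine-independent conversion, passing the optional `tz` argument:

```python
import datetime

timestamp = 1433213314.0

# Without tz, the result is a naive datetime in the machine's local time zone
local_naive = datetime.datetime.fromtimestamp(timestamp)

# With tz, the result is timezone-aware and identical on every machine
utc_aware = datetime.datetime.fromtimestamp(timestamp, tz=datetime.timezone.utc)

print(local_naive)
print(utc_aware)
print(utc_aware.timestamp() == timestamp)
```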

In [460]:
import csv
import datetime
with open("test_data/askreddit_2015.csv", encoding="utf8") as csv_file:  # the file is closed automatically
    data = list(csv.reader(csv_file))

data = data[1:]  # drop the header row
print("[1] an example time stamp in the 'data' is: " + str(data[0][2]) + "\n")


for row in data:
    row[2] = float(row[2])
    row[2] = datetime.datetime.fromtimestamp(row[2])

print("[2] an example time stamp in the updated data is: " + str(data[0][2]) + "\n")
print("[3] updated data is: \n" + str(data[0:2]) + "\n" + "..." + "\n")

[1] an example time stamp in the 'data' is: 1433213314.0

[2] an example time stamp in the updated data is: 2015-06-02 04:48:34

[3] updated data is: 
[['What\'s your internet "white whale", something you\'ve been searching for years to find with no luck?', '11510', datetime.datetime(2015, 6, 2, 4, 48, 34), '1', '26195'], ["What's your favorite video that is 10 seconds or less?", '8656', datetime.datetime(2015, 6, 13, 16, 25, 17), '4', '8479']]
...
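The same conversion can be tried without the data file. The sketch below uses a small in-memory stand-in; the column names and titles are hypothetical placeholders, and only the two timestamps are taken from the output above.

```python
import csv
import datetime
import io

# Hypothetical stand-in for a CSV file: title, score, Unix timestamp
sample_file = io.StringIO(
    "title,score,time\n"
    "First question,11510,1433213314.0\n"
    "Second question,8656,1434205517.0\n"
)

reader = csv.reader(sample_file)
header = next(reader)  # read the header row separately

rows = []
for title, score, stamp in reader:
    # Convert the timestamp text to float, then to a timezone-aware datetime
    moment = datetime.datetime.fromtimestamp(float(stamp), tz=datetime.timezone.utc)
    rows.append([title, int(score), moment])

# Count the rows whose timestamp falls in June
june_count = sum(1 for row in rows if row[2].month == 6)

print(header)
print(june_count)
```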



# NUMPY 

## Importing Data into a numpy Array


- A numpy array can only contain one type of variable (e.g., str or int)
- When reading a file with numpy.genfromtxt(), numpy assumes by default that the data consists of floating point values (float). Thus, the data type (dtype) should be specified when the data contains anything else.
    - There are codes for this:
        - 'float64'   : 64-bit floating-point number
        - 'uint32'    : 32-bit unsigned integer 
        - 'U75'       : unicode string of up to 75 characters
        - (str, 35)   : 35-character string
        - ('U', 10)   : 10-character unicode string
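The effect of the single-type rule and of an explicit dtype can be sketched on small hand-made arrays (the values are arbitrary examples):

```python
import numpy

# A mixed list is coerced to one common type; here everything becomes a string
mixed = numpy.array([1986, "Wine", 0.5])

# An explicit dtype fixes the element type up front
as_text = numpy.array(["1986", "0.5"], dtype="U75")

# Strings that hold numbers can be converted afterwards
as_float = as_text.astype("float64")

print(mixed.dtype.kind)  # "U" stands for unicode string
print(as_text.dtype)
print(as_float)
```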

In [463]:
import numpy

my_data = numpy.genfromtxt("test_data//world_alcohol.csv", delimiter = "," , dtype="U75", skip_header=1)

In [464]:
print(my_data)

[['1986' 'Western Pacific' 'Viet Nam' 'Wine' '0']
 ['1986' 'Americas' 'Uruguay' 'Other' '0.5']
 ['1985' 'Africa' "Cte d'Ivoire" 'Wine' '1.62']
 ..., 
 ['1986' 'Europe' 'Switzerland' 'Spirits' '2.54']
 ['1987' 'Western Pacific' 'Papua New Guinea' 'Other' '0']
 ['1986' 'Africa' 'Swaziland' 'Other' '5.15']]


## Creating numpy Arrays

In [465]:
my_vector = numpy.array([10, 20, 30])
my_matrix = numpy.array([[1, 2, 3], [10, 20, 30]])

print(my_vector)
print("")
print(my_matrix)

[10 20 30]

[[ 1  2  3]
 [10 20 30]]


## Queries on a numpy Array

### Array Dimensions
**.shape**

In [466]:
my_vector = numpy.array([10, 20, 30])
my_matrix = numpy.array([[1, 2, 3], [10, 20, 30]])

print(my_vector.shape)
print(my_matrix.shape)

(3,)
(2, 3)


### Array Type
**.dtype**

In [467]:
my_matrix.dtype

dtype('int32')

## Indexing and Slicing numpy Arrays
**[x,y] | [:, y] | [a:b,x:y] | [:,x:y]**

See [Indexing with numpy](#indexing-with-numpy) section.

# Docstrings

A good and comprehensive docstring usage example from https://thomas-cokelaer.info/tutorials/sphinx/docstring_python.html 

In [468]:
"""This module illustrates how to write your docstring in OpenAlea
and other projects related to OpenAlea."""

__license__ = "Cecill-C"
__revision__ = " $Id: actor.py 1586 2009-01-30 15:56:25Z cokelaer $ "
__docformat__ = 'reStructuredText'


class MainClass1(object):
    """This class docstring shows how to use sphinx and rst syntax

    The first line is brief explanation, which may be completed with 
    a longer one. For instance to discuss about its methods. The only
    method here is :func:`function1`'s. The main idea is to document
    the class and methods's arguments with 

    - **parameters**, **types**, **return** and **return types**::

          :param arg1: description
          :param arg2: description
          :type arg1: type description
          :type arg2: type description
          :return: return description
          :rtype: the return type description

    - and to provide sections such as **Example** using the double colon syntax::

          :Example:

          followed by a blank line !

      which appears as follow:

      :Example:

      followed by a blank line

    - Finally special sections such as **See Also**, **Warnings**, **Notes**
      use the sphinx syntax (*paragraph directives*)::

          .. seealso:: blabla
          .. warning:: blabla
          .. note:: blabla
          .. todo:: blabla

    .. note::
        There are many other Info fields but they may be redundant:
            * param, parameter, arg, argument, key, keyword: Description of a
              parameter.
            * type: Type of a parameter.
            * raises, raise, except, exception: That (and when) a specific
              exception is raised.
            * var, ivar, cvar: Description of a variable.
            * returns, return: Description of the return value.
            * rtype: Return type.

    .. note::
        There are many other directives such as versionadded, versionchanged,
        rubric, centered, ... See the sphinx documentation for more details.

    Here below is the results of the :func:`function1` docstring.

    """

    def function1(self, arg1, arg2, arg3):
        """returns (arg1 / arg2) + arg3

        This is a longer explanation, which may include math with latex syntax
        :math:`\\alpha`.
        Then, you need to provide optional subsection in this order (just to be
        consistent and have a uniform documentation. Nothing prevent you to
        switch the order):

          - parameters using ``:param <name>: <description>``
          - type of the parameters ``:type <name>: <description>``
          - returns using ``:returns: <description>``
          - examples (doctest)
          - seealso using ``.. seealso:: text``
          - notes using ``.. note:: text``
          - warning using ``.. warning:: text``
          - todo ``.. todo:: text``

        **Advantages**:
         - Uses sphinx markups, which will certainly be improved in future
           version
         - Nice HTML output with the See Also, Note, Warnings directives


        **Drawbacks**:
         - Just looking at the docstring, the parameter, type and  return
           sections do not appear nicely

        :param arg1: the first value
        :param arg2: the second value
        :param arg3: the third value
        :type arg1: int, float,...
        :type arg2: int, float,...
        :type arg3: int, float,...
        :returns: arg1/arg2 +arg3
        :rtype: int, float

        :Example:

        >>> import template
        >>> a = template.MainClass1()
        >>> a.function1(1,1,1)
        2

        .. note:: can be useful to emphasize
            important feature
        .. seealso:: :class:`MainClass2`
        .. warning:: arg2 must be non-zero.
        .. todo:: check that arg2 is non zero.
        """
        return arg1/arg2 + arg3




if __name__ == "__main__":
    import doctest
    doctest.testmod()

**********************************************************************
File "__main__", line 104, in __main__.MainClass1.function1
Failed example:
    import template
Exception raised:
    Traceback (most recent call last):
      File "C:\ProgramData\Anaconda3\lib\doctest.py", line 1330, in __run
        compileflags, 1), test.globs)
      File "<doctest __main__.MainClass1.function1[0]>", line 1, in <module>
        import template
    ModuleNotFoundError: No module named 'template'
**********************************************************************
File "__main__", line 105, in __main__.MainClass1.function1
Failed example:
    a = template.MainClass1()
Exception raised:
    Traceback (most recent call last):
      File "C:\ProgramData\Anaconda3\lib\doctest.py", line 1330, in __run
        compileflags, 1), test.globs)
      File "<doctest __main__.MainClass1.function1[1]>", line 1, in <module>
        a = template.MainClass1()
    NameError: name 'template' is not defined
********
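The failures above occur because the docstring example imports a module named `template`, which does not exist in this session. A minimal self-contained sketch of a passing doctest is given below; the function name `divide_and_add` is a hypothetical stand-in for `function1`, and the expected value is written as 2.0 because `/` returns a float in Python 3.

```python
import doctest


def divide_and_add(arg1, arg2, arg3):
    """Return (arg1 / arg2) + arg3.

    :param arg1: the numerator
    :param arg2: the denominator, must be non-zero
    :param arg3: the value added to the quotient
    :returns: arg1 / arg2 + arg3
    :rtype: float

    :Example:

    >>> divide_and_add(1, 1, 1)
    2.0
    """
    return arg1 / arg2 + arg3


# Collect the examples from this one docstring and run them
parser = doctest.DocTestParser()
test = parser.get_doctest(
    divide_and_add.__doc__,
    {"divide_and_add": divide_and_add},  # names visible to the examples
    "divide_and_add",
    None,
    0,
)
runner = doctest.DocTestRunner(verbose=False)
results = runner.run(test)

print(results)
```

Running a single docstring this way avoids picking up unrelated examples from the rest of the session, which `doctest.testmod()` would include.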

# PyCharm 

- Alt + Shift + E: Run the selection in the console
- Ctrl + Left Click on a variable or function: Go to its first relevant occurrence (if the clicked element comes later than the original), or see all places where it is used (if the clicked element is the first occurrence)
- Set multiple cursors in the editor area: Alt + Mouse Click. Note that on some systems you also have to use Shift with the shortcuts mentioned.
- Select/unselect the next occurrence: Alt + J / Shift + Alt + J 
- Select all occurrences: Shift + Ctrl + Alt + J 
- Live templates
- Installing packages: Project interpreter

- ALT + Enter: Apply suggested fix

# TO BE ADDED

In [184]:
import decimal
import pandas

# NaN ("not a number")
float("NaN")
decimal.Decimal("NaN")

# NaT ("not a time")
pandas.NaT

# Neither NaN nor NaT compares equal to itself
float("NaN") == float("NaN")   # False
pandas.NaT == pandas.NaT       # False
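Because NaN never compares equal to anything, a test with `==` cannot detect it. A minimal sketch of detection and filtering with `math.isnan()` (the list of values is an arbitrary example):

```python
import math

values = [1.5, float("NaN"), 3.0]

# Comparing with == finds nothing, because NaN != NaN
found_by_equality = [v for v in values if v == float("NaN")]

# math.isnan() is the reliable test
found_by_isnan = [v for v in values if math.isnan(v)]

# Keep only the real numbers
clean = [v for v in values if not math.isnan(v)]

print(len(found_by_equality), len(found_by_isnan), clean)
```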

In [185]:
for i, each_bed_plan, each_bed_time in zip(range(0, len(bed_times)), bed_plans, bed_times):

SyntaxError: unexpected EOF while parsing (<ipython-input-185-a8b08ca65b2f>, line 1)
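The loop above fails because it has no body. A runnable sketch of the same idea is given below, using `enumerate()` together with `zip()` instead of `range(0, len(...))`; the two lists hold hypothetical placeholder values, since the original data is not shown.

```python
# Hypothetical placeholder values; the original lists are not shown
bed_plans = ["22:00", "22:30", "23:00"]
bed_times = ["22:15", "23:10", "22:55"]

paired = []
# enumerate() supplies the index, zip() walks both lists in parallel
for i, (each_bed_plan, each_bed_time) in enumerate(zip(bed_plans, bed_times)):
    paired.append((i, each_bed_plan, each_bed_time))
    print(i, each_bed_plan, each_bed_time)
```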