# IEMS 308 Lab Session 1 
## TA: Andrea Treviño Gavito 
### January 13th, 2020.

## Python Basics 

When switching from R to Python, there are two major differences to begin with:

1. Indexing: Indices start at 0 instead of 1. Python also supports negative indexing, i.e. indexing from the end of the sequence.
2. Indentation: Blocks are delimited by indentation instead of curly brackets { }. For example, nested for loops:

In [None]:
for i in range(10):
    for j in range(3):
        if i<j:
            print("%d < %d" %(i,j))
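
To make the first point concrete, here is a minimal sketch of zero-based and negative indexing. The list `letters` is a made-up example and not part of the lab material.

```python
# Hypothetical example list, used only to illustrate indexing
letters = ['a', 'b', 'c', 'd', 'e']

first = letters[0]         # index 0 is the first element
last = letters[-1]         # index -1 is the last element
second_last = letters[-2]  # index -2 is the one before the last

print(first, last, second_last)
```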

There are primarily three data-structures which hold collections of objects: lists, tuples and dictionaries.

In [None]:
# lists are quite similar to the ones in R: can hold different types
l = [1,2.0,3.14e-5,'What?',9.99]  
print(l)

# One useful method - append (to the end)
l.append(34)
l.append([4,5,6])
print("\nList after append operations:")
print(l)

# Negative indexing
print(l[-2])

In [None]:
# Tuples are immutable versions of lists - once created, elements can't be added or modified
tup1 = (1,2.0,3.14e-5,'What?',9.99)
tup1[3] = 0.4e-4 # raises a TypeError

# Useful as arguments or return values of functions

In [None]:
# Dictionaries map keys to values: you store a value under a key and use the key to retrieve it.
# They can be created with either { } or dict() and consist of two kinds of elements: keys and values.
# Keys must be hashable (immutable) objects such as numbers, strings or tuples.
# Values can be anything: numbers, lists, lists within lists, etc.
# Since Python 3.7, dictionaries preserve insertion order; elements are accessed by key, not by position.

d = {'list1':l,'tup1':tup1,'color':'orange','temperature':-30, 'example': 1234}


# Adding new key-value pair
d['zzz'] = "sleeping"

# Deleting a key-value pair from the dictionary - access it by its key
del d['example']

print(d)

In [None]:
# Iterators - objects that you can loop over like a list. (A generator is a special kind of function that returns an iterator.)

# Dictionaries provide built-in view objects that can be used in for loops
d.items() # returns a view of the key-value pairs
d.keys() # returns a view of the keys
d.values() # returns a view of the values

# Example Usage in for loop
for key,value in d.items():
    print((key,value))
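
In Python 3, `keys()`, `values()` and `items()` return view objects rather than lists. The sketch below, built on a made-up dictionary named `greek`, suggests how a view tracks later changes to the dictionary and how to take a list snapshot when one is needed.

```python
# Hypothetical dictionary, used only for illustration
greek = {'a': 'alpha', 'g': 'gamma'}

keys_view = greek.keys()       # a view object, not a list
greek['o'] = 'omega'           # the view reflects later changes

keys_list = list(keys_view)    # take a snapshot as a list
print(keys_list)
```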

In [None]:
# Overwrite a dictionary
d = {0: [0,1,2], 1.2 :[], '2': 2, 3.2: 3}
print(d)

# Alternative syntax to create a dictionary
d = dict()  # equivalent to d = {}
d['a'] = 'alpha'
d['g'] = 'gamma'
d['o'] = 'omega'
print(d)

In [None]:
# List comprehension: shorthand for building a list in one line.
# 'enumerate' adds a counter to an iterable, yielding (index, element) pairs
l2 = [(i,z) for i,z in enumerate(l)] 
print(l2)
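
As a rough illustration, the comprehension above is equivalent to an explicit loop that appends to a list. The list `words` below is a made-up input.

```python
# Hypothetical input list, used only for illustration
words = ['x', 'y', 'z']

# List comprehension with enumerate
pairs = [(i, w) for i, w in enumerate(words)]

# Equivalent explicit loop
pairs_loop = []
for i, w in enumerate(words):
    pairs_loop.append((i, w))

# A comprehension can also filter elements with a trailing 'if'
even_pairs = [(i, w) for i, w in enumerate(words) if i % 2 == 0]

print(pairs)
print(even_pairs)
```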

### Functions

Functions in Python have the following syntax:

```python
def function_name(arg1,arg2,...,argn):
    # some operations
    return out1,out2,...,outm
```
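
A concrete instance of this template might look as follows. The function `sum_and_mean` is a hypothetical helper written for this illustration; note that multiple outputs come back as a single tuple, which can be unpacked on assignment.

```python
def sum_and_mean(values):
    # Hypothetical helper: returns two outputs at once
    total = sum(values)
    return total, total / len(values)

# The outputs come back as a tuple, which can be unpacked directly
total, mean = sum_and_mean([1, 2, 3, 4])

# Without unpacking, the result is a single tuple
result = sum_and_mean([1, 2, 3, 4])

print(total, mean)
print(type(result))
```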

**Exercise**: Complete the following function to return the minimum element of a list and its index within the list. Test it on the given list.

In [None]:
def min_list(x):
    '''
    Returns the minimum and its index in the list-like object
    
    Input:
    ---------
    x: list-like object
        
    Output:
    ---------
    min_x: The minimum value in x
    min_index: Index of the minimum value in x
    '''
    
    min_x = x[0]
    min_index = 0
    
    ## Loop through the elements to find the minimum
    #### Your code goes here ######
    
    return min_x, min_index

# Test list
x = [4,9,1,2,2,7,3,5,6,8]
min_list(x)

Notice the content within the ''' ''' block. This is called the function's docstring and is a convenient way of documenting functions and providing help text for the user. To access the docstring, use either the `.__doc__` attribute of the function,

In [None]:
print(min_list.__doc__)

or use the `help` function.

In [None]:
help(min_list)

Docstrings are also used for documenting class methods, as shown later in the `Rectangle` example.

### Classes and Objects

Classes and objects are the basic concepts of object-oriented programming. A class is a user-defined prototype from which objects are created.
Example: in a bank's system you might have a "customer" class, where all its attributes like transaction details, withdrawal and deposit details, outstanding debt, etc. would be listed out.

In Python, classes are defined by the 'class' keyword, and inside classes, you can define functions or methods that are part of this class.

The 'self' argument refers to the object itself: inside a method, 'self' refers to the specific instance that is being operated on. By convention, 'self' is the name of the first parameter of instance methods in Python.

In [None]:
class myClass:
    def myfunction(self):
        print("Hello world.")
        

To create an instance of our class:

In [None]:
c = myClass()
c.myfunction()   # Note that when calling methods, we do not pass the 'self' argument; Python supplies it automatically.
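
One way to see what 'self' does is to call the method through the class and pass the instance explicitly. The class `Greeter` below is a hypothetical example; both calls are intended to be equivalent.

```python
class Greeter:
    # Hypothetical class, used only to illustrate 'self'
    def greet(self):
        return "Hello world."

g = Greeter()
via_instance = g.greet()       # Python passes g as 'self'
via_class = Greeter.greet(g)   # the same call, with 'self' written out

print(via_instance == via_class)
```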

A constructor is a class method that initializes a newly created object with the provided values.
In Python, this method is named `__init__` (double underscore, init, double underscore).

In [None]:
class User:
    def __init__(self, name):
        self.name = name     # Assign provided 'name' to this instance's 'name' attribute.

    def welcome(self):
        print("Welcome to IEMS 308, " + self.name)

User1 = User("Alex")
User1.welcome()

Let's check out a more complex example:

In [None]:
# Example of class and object

class Rectangle:  
    def __init__(self, length, height):  
        '''
        We write in here what we need to define the object. 
        Inside the class, we use the defined self-variables as "self.NameOfVariable" 
        '''  
        self.length = length  
        self.height = height  
    def area(self):  # Uses only the instance's own attributes
        '''
        Remember to always specify "self" in our class functions
        '''
        rec_area=self.length*self.height  
        return rec_area  
    def horizontal_Concatenation(self, other_rectangle):  
        '''
        Method that involves the current object and another one.
        It creates a new rectangle by merging the current and another rectangle.
        As it is a Horizontal concatenation we need equal heights.
        Input: 
        ----- 
        self: We always have to specify that 
        other_rectangle: The other rectangle we want to merge 
         
        Output: 
        ------ 
        new_rectangle: The merged rectangle '''    
        if (self.height != other_rectangle.height):  
            print('Ups! I cant do that...Different heights!')  
            new_rectangle = 'Error'  
        else:  
            new_rectangle = Rectangle(self.length+other_rectangle.length, self.height)  
        return new_rectangle  
          
    def vertical_Concatenation(self,other_rectangle):  
        ''' 
        Method that involves the current object and another one.
        It creates a new rectangle by merging the current and another rectangle.
        As it is a vertical concatenation we need equal lengths.
 
        Input: 
        ----- 
        self: We always have to specify that 
        other_rectangle: The other rectangle we want to merge
        
        Output: 
        ------ 
        new_rectangle: The merged rectangle 
        '''          
        if self.length != other_rectangle.length:  
            print('Agh! I cant do that...Different lengths!')  
            new_rectangle = 'Error'  
        else:  
            new_rectangle = Rectangle(self.length, self.height+other_rectangle.height)  
        return new_rectangle    


a=Rectangle(2,2)  
b=Rectangle(2,1)  
c = a.vertical_Concatenation(b)  
print(c.area())  


A common practice is to define a top-level function named 'main' that serves as the designated entry point of your program. The idiom for calling 'main' when the file is run as a script is shown below:

In [None]:
class User:
    def __init__(self, name):
        self.name = name     # Assign provided 'name' to this instance's 'name' attribute.

    def welcome(self):
        print("Welcome to IEMS 308, " + self.name)

def main():
    User1 = User("Alex")
    User1.welcome()
    
if __name__ == "__main__":
    main()

### Importing Libraries

In [None]:
# Import libraries
import os  # operating system dependent functionality. 
import numpy as np # support for large, multi-dimensional arrays and matrices, and mathematical functions.
import matplotlib.pyplot as plt  #plotting

# import specific submodules/functions
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score 

# needed if using jupyter notebook
%matplotlib inline

### Working directory (Get and set)

In [None]:
# Print current working directory
print(os.getcwd())

In [None]:
# Change working directory
# Path can be relative or absolute 
# Note: on Windows, backslashes in absolute paths must be escaped ("\\")
#       or the path written as a raw string (r"...")
os.chdir(<path>)  # replace <path> with a path string of your choice

# Print new working directory
### YOUR CODE GOES HERE
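
The note about Windows paths can be illustrated with a short sketch. The path below is made up for this example. In an ordinary string each backslash has to be doubled, while a raw string (prefix `r`) takes backslashes literally. `os.path.join` avoids the issue for relative paths by inserting the separator of the current system.

```python
import os

# Hypothetical Windows-style path, used only for illustration
escaped = "C:\\Users\\student\\lab1"   # backslashes escaped by doubling
raw = r"C:\Users\student\lab1"         # raw string: backslashes taken literally
print(escaped == raw)

# os.path.join builds a path with the separator of the current system
relative = os.path.join("..", "data")
print(relative)
```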

In [None]:
# Change working directory back to the original one
### YOUR CODE GOES HERE

## NumPy 

**Question**: How would you store vectors and matrices using Python built-in types?

**Answer**: Lists

In [None]:
# create a vector
a = [1,2,3,4]

# create a Matrix
A = [[1,2,4],[2,4,5],[9,0,1]]

NumPy (short for Numerical Python) provides an easy and efficient way to deal with vectors, matrices and multi-dimensional arrays. 

In [None]:
import numpy as np
np.__version__

### Creating arrays

#### From lists

In [None]:
np.array([1,2,3,4]) # 1D array

In [None]:
# 2-D array
A = np.array([[1,2,4],[2,4,5],[9,0,1]])

If we want to explicitly set the data type of the resulting array, we can use the `dtype` keyword:

In [None]:
np.array([1,2,3,4],dtype="float32")

#### Generating arrays from scratch

In [None]:
# Initialize 4x6 matrix of ones
np.ones((4,6))

In [None]:
# Integer vector of zeros
np.zeros(10,dtype="int")

In [None]:
# First 10 non-negative integers
np.arange(10)

In [None]:
# Sequence of numbers from -1 (inclusive) up to 1 (exclusive), spaced by 0.4
np.arange(-1,1,0.4)
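
Because `np.arange` excludes the stop value, the sequence above ends before 1. A possible alternative, when the endpoint should be included, is `np.linspace`, which takes the number of points instead of the step. The sketch below compares the two.

```python
import numpy as np

# arange: fixed step, stop value excluded
a = np.arange(-1, 1, 0.4)

# linspace: fixed number of points, stop value included by default
b = np.linspace(-1, 1, 6)

print(a)
print(b)
```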

In [None]:
# 1d array of uniformly distributed random variables
np.random.seed(1245) # set seed for reproducibility
np.random.rand(3)

In [None]:
# 2d array of standard normal distributed variables
np.random.seed(1245)
np.random.randn(3,5)

In [None]:
# 3d array of random integers from 0 to 9
np.random.seed(1245)
np.random.randint(10, size=(3, 4, 5))

**Exercise**: Generate a 1d array (size 6) and a 2d array (dimensions 5x4) of standard normal random variables using the `randn` function. Use seed 678. Denote these arrays by x and x2 respectively.

In [None]:
# Arrays to be used for demonstration
### YOUR CODE GOES HERE

### Array attributes

In [None]:
# Basic array attributes
print("Array ndim: ",x.ndim)
print("Array shape: ",x.shape)
print("Array size: ",x.size)
print("Array dtype: ",x.dtype)

In [None]:
# Basic array attributes
print("Array ndim: ",x2.ndim)
print("Array shape: ",x2.shape)
print("Array size: ",x2.size)
print("Array dtype: ",x2.dtype)

### Array subsetting

In [None]:
# Single elements on a 1d array
# Print the first, last and the element at index 2
# Remember that the index begins at 0
#### YOUR CODE GOES HERE

In [None]:
# Single elements on a 2d array
# Need two indices
# Print the element in the last row and the last-but-one column
### YOUR CODE GOES HERE

#### Slices and subarrays

The syntax for a 1d array is as follows:

```python
x[start:stop:step]
```

If not provided, the default values are start = 0, stop = size of dimension, step = 1. Note that stop is exclusive, while start is inclusive.

This is similar for multi-dimensional arrays with multiple slices separated by commas.
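
As a quick illustration of the slice syntax, the sketch below uses two made-up arrays, `v` and `M`, rather than the arrays from the exercises.

```python
import numpy as np

v = np.arange(10)                # hypothetical 1d array: 0, 1, ..., 9
print(v[1:4])                    # indices 1, 2, 3 (stop is exclusive)
print(v[:3])                     # start defaults to 0
print(v[::3])                    # every third element
print(v[::-1])                   # negative step reverses the array

M = np.arange(12).reshape(3, 4)  # hypothetical 3x4 array
print(M[:2, 1:])                 # first two rows, columns from index 1 on
print(M[:, 0])                   # first column, all rows
```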

**Exercise**: For the 1d array, do the following:

1. subset elements from index 2 to index 4, both included
2. subset all elements except the first
3. subset all elements except the last
4. subset every other element

In [None]:
### YOUR CODE GOES HERE

**Exercise**: For the 2d array, do the following:

1. subset first to third rows and first and second columns
2. assign 1 to all elements of the third column (column 2)
3. subset the last-but-one row

In [None]:
### YOUR CODE GOES HERE

#### Logical subsetting

In [None]:
# subset only positive elements
x[x>0]
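
Logical subsetting works by building a boolean array (a mask) and using it as an index. The sketch below, on a made-up array `v`, shows the mask itself and how two conditions might be combined.

```python
import numpy as np

v = np.array([-1.5, 0.2, 0.0, 3.1, -0.4])  # hypothetical values
mask = v > 0                   # boolean array, one entry per element
print(mask)
print(v[mask])                 # same as v[v > 0]

# Combine conditions with & (and) and | (or); the parentheses are required
inside = v[(v > -1) & (v < 1)]
print(inside)
```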

### Basic math and broadcasting

Basic arithmetic operators work on an element-by-element basis. For example:

In [None]:
np.array([2.,4.,0.5]) * np.array([3,4,5])

NumPy has a feature called **broadcasting** which allows these operations to be performed on arrays with different shapes. Subject to certain constraints, the smaller array is “broadcast” across the larger array so that they have compatible shapes. 

**General broadcasting rules:** (from the NumPy documentation)

When operating on two arrays, NumPy compares their shapes element-wise. It starts with the trailing dimensions, and works its way forward. Two dimensions are compatible when

1. they are equal, or
2. one of them is 1

If these conditions are not met, a `ValueError: operands could not be broadcast together` exception is raised, indicating that the arrays have incompatible shapes. The size of the resulting array is the maximum size along each dimension of the input arrays.

For a more detailed explanation and examples of compatible shapes, refer to https://docs.scipy.org/doc/numpy-1.15.0/user/basics.broadcasting.html
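
The rules can be checked on small arrays. The sketch below uses made-up arrays `A`, `row` and `col` to show one compatible pair of shapes, one incompatible pair, and a reshape that makes the second pair compatible.

```python
import numpy as np

A = np.ones((5, 3))
row = np.array([1, 2, 3])        # shape (3,)
col = np.arange(5)               # shape (5,)

print((A + row).shape)           # trailing dimensions 3 and 3 match

try:
    A + col                      # trailing dimensions 3 and 5 do not match
except ValueError as err:
    print("ValueError:", err)

# Reshaping to (5, 1) makes the trailing dimensions 3 and 1 compatible
print((A + col.reshape(5, 1)).shape)
```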

In [None]:
# Application: shifting each column of a 2-d array by the corresponding entry of a 1-d array
np.random.seed(1245)
x = np.random.rand(5,3)

print(x)
print(x+np.array([3,4,5]))

###  Common array methods

Some examples:

| Method call            | Function alternative      | Description                                                 |
|------------------------|---------------------------|-------------------------------------------------------------|
| x.min()                | np.amin(x)                | Minimum of all elements in array                            |
| x.sum()                | np.sum(x)                 | Sum of all elements in array                                |
| x.mean(axis=0)         | np.mean(x,axis=0)         | Mean of elements across axis 0 (if 2d array - column means) |
| x.sort(axis=1)         | np.sort(x,axis=1)         | Sort along axis 1 in increasing order (method sorts in place, function returns a copy) |
| x.std(axis=0)          | np.std(x,axis=0)          | Standard deviation of elements across axis 0                |
| x.max(axis=1)          | np.amax(x,axis=1)         | Maximum of elements across axis 1                           |
| x.argmax()             | np.argmax(x)              | Index of the maximum value in the flattened array           |
| x.reshape(<new_shape>) | np.reshape(x,<new_shape>) | Reshape array according to new shape                        |
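
A few of these calls behave differently from what the table layout might suggest, so here is a small sketch on a made-up array `m`. It assumes nothing beyond standard NumPy: `np.sort` returns a sorted copy, the `sort` method sorts in place and returns `None`, and `argmax` without an axis returns an index into the flattened array.

```python
import numpy as np

m = np.array([[3, 1, 2],
              [9, 7, 8]])          # hypothetical 2x3 array

sorted_copy = np.sort(m, axis=1)   # returns a new array, m is unchanged
flat_index = m.argmax()            # index into the flattened array
row_col = np.unravel_index(flat_index, m.shape)  # convert to (row, column)
col_means = m.mean(axis=0)         # one mean per column

ret = m.sort(axis=1)               # sorts m in place and returns None

print(sorted_copy)
print(flat_index, row_col)
print(col_means)
```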

**Exercise**: Write a function named `normalize` which accepts a 1d or a 2d array and then performs the following operation:

- if the input is a 1d array, center it by its mean and then scale by its standard deviation
- if the input is a 2d array, center and scale each column by its respective mean and standard deviation

The function should return three values: the normalized array (x_normalized), centering factor (loc) and scale factor (scale).

In [None]:
# input arrays to test on
np.random.seed(234)
x = 4 + 2*np.random.randn(10,1)
x2 = np.array([2,5,3]) + np.array([2,1,3])*np.random.rand(10,3)

### YOUR CODE GOES HERE

## Load input from files 

In [None]:
import csv
iris_list = []

# You can use the "with open()" syntax for any file - just remember to replace csv.reader
# with the corresponding file parser. We will be using this syntax for saving
# model data in the later labs
with open("../data/iris.csv") as f:
    csv_reader = csv.reader(f,delimiter=',')
    # extract header and array data
    line_count = 0
    for row in csv_reader:
        if line_count == 0:
            col_names = row
        else:
            iris_list.append(row)
        line_count += 1

iris = np.array(iris_list,dtype=float)

print("Column names:")
print(col_names)
print("\nFirst 5 rows:")
print(iris[range(5),:])
        

In [None]:
# Using NumPy directly; the header row is skipped, so column names are not returned
iris = np.genfromtxt("../data/iris.csv",delimiter=",",names=None,skip_header=1)
print(iris[range(5),:])
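
If the column names are needed, `np.genfromtxt` can read them from the header with `names=True`, in which case it returns a structured array indexed by column name. The sketch below uses a small in-memory CSV with made-up values instead of the iris file, so it runs without any data on disk.

```python
import io
import numpy as np

# Hypothetical in-memory CSV standing in for a file on disk (values are made up)
csv_text = io.StringIO(
    "Sepal_length,Sepal_width\n"
    "1.0,2.0\n"
    "3.0,4.0\n"
)

# names=True reads the column names from the first line
data = np.genfromtxt(csv_text, delimiter=",", names=True)

print(data.dtype.names)        # column names taken from the header
print(data["Sepal_length"])    # access a column by its name
```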

## Plots

matplotlib is a widely used plotting library. Its submodule pyplot is used for 2d plots.

In [None]:
# uncomment the following lines if you haven't imported pyplot before
# import matplotlib.pyplot as plt
# %matplotlib inline

# Data for scatter plot demonstration
np.random.seed(3456)
x = np.random.randn(100)
y = np.random.randn(100) + 0.01*x
c = 1*(x > 0) + 1*(y>0) # color labels for the points

# Scatter plot
f = plt.figure()
plt.scatter(x,y,c=c);
# Axis labels
plt.xlabel("X-label")
plt.ylabel("Y-label")
plt.show()

In [None]:
# Data for line plot demonstration
x = np.array([0,2,3,4,5,10,15,20,27,40])
y1 = np.exp(-0.3-0.5*x)
y2 = np.exp(-0.5-0.2*x)

# Adjust figure size
plt.rcParams['figure.figsize']  = [6,4]

# Line plot with two different series
f = plt.figure()
plt.plot(x,y1,"-ro",label="Red")
plt.plot(x,y2,"-bo",label="Blue")
# Axis labels
plt.xlabel("X-label")
plt.ylabel("Y-label")
# Legend - loc = 0:best, 1:upper right, 2:upper left, 3:lower left, 4:lower right, 5:right
plt.legend(loc = 0)
plt.show()
f.savefig("test.png")
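
Setting `plt.rcParams['figure.figsize']` changes the default size for every figure created afterwards. As an alternative sketch, the same line plot can be written with matplotlib's object-oriented interface, where `figsize` applies to a single figure and all settings go through the `Axes` object.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.array([0, 2, 3, 4, 5, 10, 15, 20, 27, 40])
y1 = np.exp(-0.3 - 0.5 * x)
y2 = np.exp(-0.5 - 0.2 * x)

# figsize here applies to this figure only, unlike plt.rcParams
fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(x, y1, "-ro", label="Red")
ax.plot(x, y2, "-bo", label="Blue")
ax.set_xlabel("X-label")
ax.set_ylabel("Y-label")
ax.legend(loc="best")
```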

**Exercise**: Do a scatter plot of the first two columns of the dataset and color code the points using the last column. The first two columns are 'Sepal_length' and 'Sepal_width'.

In [None]:
### YOUR CODE GOES HERE

For a more detailed tutorial on pyplot, see https://matplotlib.org/users/pyplot_tutorial.html