# Working with the Class System in Python 

## Chapter 1: Getting ready for object-oriented programming

### Intro to Object Oriented Programming in Python

#### What's Object-Oriented Programming (OOP)?
* A way to build flexible, reproducible code
* A way to develop building blocks for more advanced modules and libraries

#### Imperative Style and OOP Style

* **Imperative**

In [1]:
our_list = [1, 2, 3]

for item in our_list:
    print(f"Item {item}")

Item 1
Item 2
Item 3


* **OOP**

In [2]:
class PrintList:
    
    def __init__(self, numberlist):
        self.numberlist = numberlist
        
    def print_list(self):
        for item in self.numberlist:
            print(f"Item {item}")
            
A = PrintList([1, 2, 3])
A.print_list()

Item 1
Item 2
Item 3


### Introduction to NumPy Internals

#### What's NumPy?
NumPy is a package for scientific computing in Python.
* Uses n-dimensional arrays (vectors and matrices) as its core data structure
* Perfect for data science, where data is laid out in table-like formats

#### NumPy Array example
Example:

In [3]:
import numpy as np

our_array = np.array([2,3,4])
print(our_array)

[2 3 4]


In [4]:
print(type(our_array))

<class 'numpy.ndarray'>


#### Creating Multi-Dimensional Arrays
Example 1:

In [5]:
np.array([[0, 1, 2, 3, 4],
           [5, 6, 7, 8, 9],
           [10, 11, 12, 13, 14]])

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

Example 2:

In [6]:
np.array([6, 7, 8])

array([6, 7, 8])

### Introduction to Objects and Classes

#### What is a class?

A reusable chunk of code that has methods and variables.

#### OOP Vocabulary

Imperative    | OOP
--------------|--------
Variable      | Attribute/Field
Function      | Method

#### A Class is a template for an object

Class --> Object

Think of classes as cookie cutters and objects as the actual cookies.

#### Declaring a Class

In [7]:
class Dinosaur:
    pass

In [8]:
# Used in Python 3, with/without parentheses
class Dinosaur():
    pass

# Used in Python 2
class Dinosaur(object):
    pass

An object is an instance of a class.
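
To make the cookie-cutter idea concrete, here is a minimal sketch of one class producing two separate objects. The variable names `rex` and `blue` are made up for illustration.

```python
class Dinosaur:
    """An empty class: the cookie cutter."""
    pass

# Calling the class creates objects (instances): the cookies.
rex = Dinosaur()
blue = Dinosaur()

print(isinstance(rex, Dinosaur))   # rex is an instance of Dinosaur
print(type(blue) is Dinosaur)      # the object's type is the class
print(rex is blue)                 # each call builds a separate object
```

Both objects come from the same template, yet they are distinct objects in memory.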

## Chapter 2: Deep dive into classes and objects

### Intro to Classes

#### Working with DataFrames

#### Introducing the DataShell
From our DataShell, we'll build towards a DataFrame class that takes a file of data as input and gives us back some information about it.

#### Full Class
This class takes a CSV filename, loads the file into a NumPy array, and provides methods that inspect or modify that array.

* Note: Methods are functions defined inside a class

In [9]:
from scipy import stats  # needed by five_figure_summary

class DataShell:
    def __init__(self, filename):
        self.filename = filename
        
    def create_datashell(self):
        self.array = np.genfromtxt(self.filename, delimiter=',', dtype=None)
        return self.array
    
    def rename_column(self, old_colname, new_colname):
        for index, value in enumerate(self.array[0]):
            if value == old_colname.encode('UTF-8'):
                self.array[0][index] = new_colname
        return self.array
    
    def show_shell(self):
        print(self.array)
        
    def five_figure_summary(self, col_pos):
        # np.float was removed in NumPy 1.24; the built-in float is equivalent
        statistics = stats.describe(self.array[1:,col_pos].astype(float))
        return f"Five-figure stats of column {col_pos}: {statistics}"

#### Parts of Class in Detail

**Review:**  The basic features of a class in Python are:
* Constructors
* Attributes
* Methods

In [10]:
class DataShell:
    
    # This is the constructor for the class
    def __init__(self, filename): # `filename` is a parameter; it is stored below as an instance attribute
        self.filename = filename
        
    # This is a method    
    def create_datashell(self):
        self.array = np.genfromtxt(self.filename, delimiter=',', dtype=None)
        return self.array
    
    # This is another method
    def rename_column(self, old_colname, new_colname):
        for index, value in enumerate(self.array[0]):
            if value == old_colname.encode('UTF-8'):
                self.array[0][index] = new_colname
        return self.array

#### How to Call the Class

In [11]:
our_data_shell = DataShell('mtcars.csv')
our_data_shell

<__main__.DataShell at 0x10a8afcd0>

### Initializing a Class and Self

#### Understanding Constructors with Init
Empty Constructor

In [12]:
class Dinosaur:
    
    def __init__(self):
        pass

Constructor with Attributes

In [13]:
class Dinosaur:
    
    def __init__(self):
        self.tail = 'Yes'

#### Init and Our DataShell

In [17]:
# Modeled on Pandas read_csv
import pandas as pd

pd.read_csv('datasets/mtcars.csv').head()

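Unnamed: 0 | model | mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb
-----------|-------|-----|-----|------|----|------|----|------|----|----|------|-----
0 | Mazda RX4 | 21.0 | 6 | 160.0 | 110 | 3.9 | 2.62 | 16.46 | 0 | 1 | 4 | 4
1 | Mazda RX4 Wag | 21.0 | 6 | 160.0 | 110 | 3.9 | 2.875 | 17.02 | 0 | 1 | 4 | 4
2 | Datsun 710 | 22.8 | 4 | 108.0 | 93 | 3.85 | 2.32 | 18.61 | 1 | 1 | 4 | 1
3 | Hornet 4 Drive | 21.4 | 6 | 258.0 | 110 | 3.08 | 3.215 | 19.44 | 1 | 0 | 3 | 1
4 | Hornet Sportabout | 18.7 | 8 | 360.0 | 175 | 3.15 | 3.44 | 17.02 | 0 | 0 | 3 | 2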


Creating the DataShell with a Constructor

In [18]:
class DataShell:
    
    def __init__(self, filename):
        self.filename = filename

#### Understanding Self

* Self represents the instance of the class, or the specific object
* *Review:* An object is an instance of a class
* That object needs a way to reference that instance
* The first parameter of every instance method is a reference to the current instance of the class
* By convention that parameter is named `self`; Python passes the instance in automatically when you call a method on it
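
The points above can be checked directly. The following is a minimal sketch showing that calling a method on an instance is the same as calling the function on the class and passing the instance by hand. The `describe` method is a hypothetical helper added only for this illustration.

```python
class DataShell:
    def __init__(self, filename):
        # self is the object being built, so this stores the value on that object
        self.filename = filename

    def describe(self):  # hypothetical helper, not part of the original class
        return f"DataShell for {self.filename}"

car_data_shell = DataShell('mtcars.csv')

# These two calls do the same thing: Python fills in self automatically
# when the method is looked up on the instance.
via_instance = car_data_shell.describe()
via_class = DataShell.describe(car_data_shell)

print(via_instance)
print(via_instance == via_class)
```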

In [19]:
class DataShell:
    
    def __init__(self, filename):
        self.filename = filename

Initializing the Car DataShell

In [20]:
car_data_shell = DataShell('mtcars.csv')

In [None]:
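# Conceptually, the call above runs __init__ with self bound to the new object:
#     DataShell.__init__(car_data_shell, 'mtcars.csv')
# so the assignment inside it becomes:
#     car_data_shell.filename = 'mtcars.csv'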

Initializing the ForestFire DataShell

In [22]:
forest_fires_data_shell = DataShell('forestfires.csv')

In [None]:
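# Conceptually, the call above runs __init__ with self bound to the new object:
#     DataShell.__init__(forest_fires_data_shell, 'forestfires.csv')
# so the assignment inside it becomes:
#     forest_fires_data_shell.filename = 'forestfires.csv'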

#### Self is not a Python keyword but we use it like one

In [23]:
# Printing out the keyword list
import keyword
print(keyword.kwlist)

['False', 'None', 'True', 'and', 'as', 'assert', 'async', 'await', 'break', 'class', 'continue', 'def', 'del', 'elif', 'else', 'except', 'finally', 'for', 'from', 'global', 'if', 'import', 'in', 'is', 'lambda', 'nonlocal', 'not', 'or', 'pass', 'raise', 'return', 'try', 'while', 'with', 'yield']


In [24]:
# Using this as an object reference
def __init__(this, filename):
    this.filename = filename

#### Example 1 from exercises

In [25]:
# Create class: DataShell
class DataShell:
  
    # Initialize class with self and integerInput arguments
    def __init__(self, integerInput):
      
        # Set data as instance variable, and assign the value of integerInput
        self.data = integerInput

# Declare variable x with value of 10
x = 10      

# Instantiate DataShell passing x as argument: my_data_shell
my_data_shell = DataShell(x)

# Print my_data_shell
print(my_data_shell.data)

10


#### Example 2 from exercises

In [26]:
# Create class: DataShell
class DataShell:
  
    # Initialize class with self, identifier and data arguments
    def __init__(self, identifier, data):
      
        # Set identifier and data as instance variables, assigning value of input arguments
        self.identifier = identifier
        self.data = data

# Declare variable x with value of 100, and y with list of integers from 1 to 5
x = 100
y = [1, 2, 3, 4, 5]

# Instantiate DataShell passing x and y as arguments: my_data_shell
my_data_shell = DataShell(x, y)

# Print my_data_shell.identifier
print(my_data_shell.identifier)

# Print my_data_shell.data
print(my_data_shell.data)

100
[1, 2, 3, 4, 5]


### More on Self and Passing in Variables

#### Class Variables

In [27]:
# Our Dinosaur class

class Dinosaur():
    
    eyes = 2  # This is a class variable -- defined on the class and shared by every instance

    def __init__(self, teeth):  # `teeth` is passed in when we construct the object
        self.teeth = teeth  # This is an instance variable -- each object stores its own value

In [28]:
# Building a Stegosaurus
stegosaurus = Dinosaur(40)
stegosaurus.teeth

40

In [29]:
stegosaurus.eyes

2

#### Instance Variables

In [30]:
triceratops = Dinosaur(5)
triceratops.teeth

5

In [31]:
triceratops.eyes

2

#### Passing in parameters to objects

In [32]:
class DataShell(object):
    
    def __init__(self, filename):
        self.filename = filename

Results:

In [33]:
my_data_shell = DataShell('mtcars.csv')
print(my_data_shell.filename)

mtcars.csv


#### Example 1 from exercises

In [34]:
# Create class: DataShell
class DataShell:
  
    # Declare a class variable family, and assign value of "DataShell"
    family = "DataShell"
    
    # Initialize class with self, identifier arguments
    def __init__(self, identifier):
      
        # Set identifier as instance variable of input argument
        self.identifier = identifier

# Declare variable x with value of 100
x = 100

# Instantiate DataShell passing x as argument: my_data_shell
my_data_shell = DataShell(x)

# Print my_data_shell class variable family
print(my_data_shell.family)

DataShell


#### Example 2 from exercises: Overriding class variables

In [41]:
# Create class: DataShell
class DataShell:
  
    # Declare a class variable family, and assign value of "DataShell"
    family = "DataShell"
    
    # Initialize class with self, identifier arguments
    def __init__(self, identifier):
      
        # Set identifier as instance variables, assigning value of input arguments
        self.identifier = identifier

# Declare variable x with value of 100
x = 100

# Instantiate DataShell passing x as the argument: my_data_shell
my_data_shell = DataShell(x)

# Print my_data_shell class variable family
print(my_data_shell.family)

# Override the my_data_shell.family value with "NotDataShell"
my_data_shell.family = "NotDataShell"

# Print my_data_shell class variable family once again
print(my_data_shell.family)

DataShell
NotDataShell
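
One detail worth checking here: assigning `my_data_shell.family` does not change the class variable itself. The sketch below, which adds a second instance purely for illustration, suggests that the assignment creates an instance attribute that shadows the class variable on that one object.

```python
class DataShell:
    family = "DataShell"  # class variable, shared through the class

    def __init__(self, identifier):
        self.identifier = identifier

first = DataShell(100)
second = DataShell(200)  # second instance, added for illustration

# Assigning through an instance creates an instance attribute that shadows
# the class variable; the class and other instances are unaffected.
first.family = "NotDataShell"

print(first.family)
print(second.family)
print(DataShell.family)
print('family' in vars(first), 'family' in vars(second))
```

To change the value for every instance, assign on the class instead, for example `DataShell.family = "NotDataShell"`.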


### Methods in Classes

#### Methods

In [42]:
class DataShell:
    
    # init method
    def __init__(self, filename): # `filename` is a parameter; it is stored below as an instance attribute
        self.filename = filename
        
    # create_datashell method   
    def create_datashell(self):
        self.array = np.genfromtxt(self.filename, delimiter=',', dtype=None)
        return self.array
    
    # rename_column method
    def rename_column(self, old_colname, new_colname):
        for index, value in enumerate(self.array[0]):
            if value == old_colname.encode('UTF-8'):
                self.array[0][index] = new_colname
        return self.array

#### Initializing Methods in Classes

In [37]:
    def create_datashell(self): # Name the method; `self` gives it access to the instance it is called on
        self.array = np.genfromtxt(self.filename, delimiter=',', dtype=None)  # reads the CSV file into an array
        return self.array

#### Methods with other parameters

In [38]:
    def rename_column(self, old_colname, new_colname):
        
        for index, value in enumerate(self.array[0]):
            if value == old_colname.encode('UTF-8'):  # If column name is equal to the old column name
                self.array[0][index] = new_colname  # Update it with the new column name
                
        return self.array

#### How to call methods

In [46]:
myDatashell = DataShell('datasets/mtcars.csv')

In [49]:
# Calling without passing in parameters
myDatashell.create_datashell()


array([[b'model', b'mpg', b'cyl', b'disp', b'hp', b'drat', b'wt',
        b'qsec', b'vs', b'am', b'gear', b'carb'],
       [b'Mazda RX4', b'21', b'6', b'160', b'110', b'3.9', b'2.62',
        b'16.46', b'0', b'1', b'4', b'4'],
       [b'Mazda RX4 Wag', b'21', b'6', b'160', b'110', b'3.9', b'2.875',
        b'17.02', b'0', b'1', b'4', b'4'],
       [b'Datsun 710', b'22.8', b'4', b'108', b'93', b'3.85', b'2.32',
        b'18.61', b'1', b'1', b'4', b'1'],
       [b'Hornet 4 Drive', b'21.4', b'6', b'258', b'110', b'3.08',
        b'3.215', b'19.44', b'1', b'0', b'3', b'1'],
       [b'Hornet Sportabout', b'18.7', b'8', b'360', b'175', b'3.15',
        b'3.44', b'17.02', b'0', b'0', b'3', b'2'],
       [b'Valiant', b'18.1', b'6', b'225', b'105', b'2.76', b'3.46',
        b'20.22', b'1', b'0', b'3', b'1'],
       [b'Duster 360', b'14.3', b'8', b'360', b'245', b'3.21', b'3.57',
        b'15.84', b'0', b'0', b'3', b'4'],
       [b'Merc 240D', b'24.4', b'4', b'146.7', b'62', b'3.69', b'3.19',
  

In [51]:
# Calling by passing in a parameter
myDatashell.rename_column('cyl', 'cylinders')

array([[b'model', b'mpg', b'cylinders', b'disp', b'hp', b'drat', b'wt',
        b'qsec', b'vs', b'am', b'gear', b'carb'],
       [b'Mazda RX4', b'21', b'6', b'160', b'110', b'3.9', b'2.62',
        b'16.46', b'0', b'1', b'4', b'4'],
       [b'Mazda RX4 Wag', b'21', b'6', b'160', b'110', b'3.9', b'2.875',
        b'17.02', b'0', b'1', b'4', b'4'],
       [b'Datsun 710', b'22.8', b'4', b'108', b'93', b'3.85', b'2.32',
        b'18.61', b'1', b'1', b'4', b'1'],
       [b'Hornet 4 Drive', b'21.4', b'6', b'258', b'110', b'3.08',
        b'3.215', b'19.44', b'1', b'0', b'3', b'1'],
       [b'Hornet Sportabout', b'18.7', b'8', b'360', b'175', b'3.15',
        b'3.44', b'17.02', b'0', b'0', b'3', b'2'],
       [b'Valiant', b'18.1', b'6', b'225', b'105', b'2.76', b'3.46',
        b'20.22', b'1', b'0', b'3', b'1'],
       [b'Duster 360', b'14.3', b'8', b'360', b'245', b'3.21', b'3.57',
        b'15.84', b'0', b'0', b'3', b'4'],
       [b'Merc 240D', b'24.4', b'4', b'146.7', b'62', b'3.69', b'3.1

#### Example 1 from exercises

In [52]:
# Create class: DataShell
class DataShell:
  
    # Initialize class with self argument
    def __init__(self):
        pass
      
    # Define a method which takes self argument: print_static
    def print_static(self):
        # Print string
        print("You just executed a class method!")
        
# Instantiate DataShell taking no arguments: my_data_shell
my_data_shell = DataShell()

# Call the print_static method of your newly created object
my_data_shell.print_static()

You just executed a class method!


#### Example 2 from exercises

In [53]:
# Create class: DataShell
class DataShell:
  
    # Initialize class with self and dataList as arguments
    def __init__(self, dataList):
        # Set data as instance variable, and assign it the value of dataList
        self.data = dataList
        
    # Define a method which takes self argument: show
    def show(self):
        # Print the instance variable data
        print(self.data)

# Declare variable with list of integers from 1 to 10: integer_list   
integer_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
        
# Instantiate DataShell taking integer_list as argument: my_data_shell
my_data_shell = DataShell(integer_list)

# Call the show method of your newly created object
my_data_shell.show()

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]


#### Example 3 from exercises

In [54]:
# Create class: DataShell
class DataShell:
  
    # Initialize class with self and dataList as arguments
    def __init__(self, dataList):
        # Set data as instance variable, and assign it the value of dataList
        self.data = dataList
        
    # Define method that prints data: show
    def show(self):
        print(self.data)
        
    # Define method that prints average of data: avg 
    def avg(self):
        # Declare avg and assign it the average of data
        avg = sum(self.data)/float(len(self.data))
        # Print avg
        print(avg)
        
# Instantiate DataShell taking integer_list as argument: my_data_shell
my_data_shell = DataShell(integer_list)

# Call the show and avg methods of your newly created object
my_data_shell.show()
my_data_shell.avg()

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
5.5


## Chapter 3: Fancy classes, fancy objects

### Working with a DataSet to Create DataFrames

#### MTCars

#### Creating our Cars Analysis DataShell
Creating an instance of a DataShell

In [78]:
car_data = DataShell('datasets/mtcars.csv')

Print the instance of the object

In [79]:
print(car_data)

<__main__.DataShell object at 0x11b891990>


#### Creating a method to introspect the object

In [82]:
class DataShell:
    
    def __init__(self, filename):
        self.filename = filename
        
    def create_datashell(self):
        # In this first line, this is where we define the "array" attribute
        self.array = np.genfromtxt(self.filename, delimiter=',', dtype=None)
        return self.array
    
    def show_shell(self):
        print(self.array)

#### Printing the array

In [83]:
# Must call this method first
car_data.create_datashell()

# Before you're able to see the array
print(type(car_data.array))

<class 'numpy.ndarray'>


  


In [84]:
print(car_data.array)

[[b'model' b'mpg' b'cyl' b'disp' b'hp' b'drat' b'wt' b'qsec' b'vs' b'am'
  b'gear' b'carb']
 [b'Mazda RX4' b'21' b'6' b'160' b'110' b'3.9' b'2.62' b'16.46' b'0' b'1'
  b'4' b'4']
 [b'Mazda RX4 Wag' b'21' b'6' b'160' b'110' b'3.9' b'2.875' b'17.02' b'0'
  b'1' b'4' b'4']
 [b'Datsun 710' b'22.8' b'4' b'108' b'93' b'3.85' b'2.32' b'18.61' b'1'
  b'1' b'4' b'1']
 [b'Hornet 4 Drive' b'21.4' b'6' b'258' b'110' b'3.08' b'3.215' b'19.44'
  b'1' b'0' b'3' b'1']
 [b'Hornet Sportabout' b'18.7' b'8' b'360' b'175' b'3.15' b'3.44'
  b'17.02' b'0' b'0' b'3' b'2']
 [b'Valiant' b'18.1' b'6' b'225' b'105' b'2.76' b'3.46' b'20.22' b'1'
  b'0' b'3' b'1']
 [b'Duster 360' b'14.3' b'8' b'360' b'245' b'3.21' b'3.57' b'15.84' b'0'
  b'0' b'3' b'4']
 [b'Merc 240D' b'24.4' b'4' b'146.7' b'62' b'3.69' b'3.19' b'20' b'1'
  b'0' b'4' b'2']
 [b'Merc 230' b'22.8' b'4' b'140.8' b'95' b'3.92' b'3.15' b'22.9' b'1'
  b'0' b'4' b'2']
 [b'Merc 280' b'19.2' b'6' b'167.6' b'123' b'3.92' b'3.44' b'18.3' b'1'
  b'0' b'4' b'4']

### Renaming Columns and the Five-Figure Summary

#### Taking a second look at our column names

In [92]:
print(car_data.array)

[[b'model' b'mpg' b'cyl' b'disp' b'hp' b'drat' b'wt' b'qsec' b'vs' b'am'
  b'gear' b'carb']
 [b'Mazda RX4' b'21' b'6' b'160' b'110' b'3.9' b'2.62' b'16.46' b'0' b'1'
  b'4' b'4']
 [b'Mazda RX4 Wag' b'21' b'6' b'160' b'110' b'3.9' b'2.875' b'17.02' b'0'
  b'1' b'4' b'4']
 [b'Datsun 710' b'22.8' b'4' b'108' b'93' b'3.85' b'2.32' b'18.61' b'1'
  b'1' b'4' b'1']
 [b'Hornet 4 Drive' b'21.4' b'6' b'258' b'110' b'3.08' b'3.215' b'19.44'
  b'1' b'0' b'3' b'1']
 [b'Hornet Sportabout' b'18.7' b'8' b'360' b'175' b'3.15' b'3.44'
  b'17.02' b'0' b'0' b'3' b'2']
 [b'Valiant' b'18.1' b'6' b'225' b'105' b'2.76' b'3.46' b'20.22' b'1'
  b'0' b'3' b'1']
 [b'Duster 360' b'14.3' b'8' b'360' b'245' b'3.21' b'3.57' b'15.84' b'0'
  b'0' b'3' b'4']
 [b'Merc 240D' b'24.4' b'4' b'146.7' b'62' b'3.69' b'3.19' b'20' b'1'
  b'0' b'4' b'2']
 [b'Merc 230' b'22.8' b'4' b'140.8' b'95' b'3.92' b'3.15' b'22.9' b'1'
  b'0' b'4' b'2']
 [b'Merc 280' b'19.2' b'6' b'167.6' b'123' b'3.92' b'3.44' b'18.3' b'1'
  b'0' b'4' b'4']

#### Accessing  Column Names

The column names can be accessed with `self.array[0]`, because the first row of the two-dimensional array holds the header.
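
As a minimal sketch of that indexing, the snippet below builds a tiny stand-in array by hand (an assumption made so the example runs without the CSV file) and separates the header row from a data column. Unlike the `genfromtxt` output shown above, the stand-in holds ordinary strings rather than bytes.

```python
import numpy as np

# A tiny stand-in for the mtcars array (values copied from the rows shown above)
table = np.array([
    ['model', 'mpg', 'cyl'],
    ['Mazda RX4', '21', '6'],
    ['Datsun 710', '22.8', '4'],
])

header = table[0]                     # first row: the column names
mpg = table[1:, 1].astype(float)      # every row after the header, column 1

print(header)
print(mpg)
```

The slice `table[1:, 1]` is the same pattern that `five_figure_summary` uses to skip the header before converting a column to numbers.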

#### Renaming the columns by passing in multiple parameters

In [121]:
from scipy import stats

class DataShell:
    def __init__(self, filename):
        self.filename = filename
        
    def create_datashell(self):
        # In this first line, this is where we define the "array" attribute
        self.array = np.genfromtxt(self.filename, delimiter=',', dtype=None)
        return self.array
        
    def rename_column(self, old_colname, new_colname):
        for index, value in enumerate(self.array[0]):
            if value == old_colname.encode('UTF-8'):
                self.array[0][index] = new_colname
        return self.array
    
    def five_figure_summary(self, col_pos):
        # np.float was removed in NumPy 1.24; the built-in float is equivalent
        statistics = stats.describe(self.array[1:,col_pos].astype(float))
        return f"Five-figure status of column {col_pos}: {statistics}"

In [125]:
my_data_shell = DataShell('datasets/mtcars.csv')

#### Completing the Rename

In [126]:
my_data_shell.create_datashell()
my_data_shell.rename_column('cyl', 'cylinders')
print(my_data_shell.array)

[[b'model' b'mpg' b'cylinders' b'disp' b'hp' b'drat' b'wt' b'qsec' b'vs'
  b'am' b'gear' b'carb']
 [b'Mazda RX4' b'21' b'6' b'160' b'110' b'3.9' b'2.62' b'16.46' b'0' b'1'
  b'4' b'4']
 [b'Mazda RX4 Wag' b'21' b'6' b'160' b'110' b'3.9' b'2.875' b'17.02' b'0'
  b'1' b'4' b'4']
 [b'Datsun 710' b'22.8' b'4' b'108' b'93' b'3.85' b'2.32' b'18.61' b'1'
  b'1' b'4' b'1']
 [b'Hornet 4 Drive' b'21.4' b'6' b'258' b'110' b'3.08' b'3.215' b'19.44'
  b'1' b'0' b'3' b'1']
 [b'Hornet Sportabout' b'18.7' b'8' b'360' b'175' b'3.15' b'3.44'
  b'17.02' b'0' b'0' b'3' b'2']
 [b'Valiant' b'18.1' b'6' b'225' b'105' b'2.76' b'3.46' b'20.22' b'1'
  b'0' b'3' b'1']
 [b'Duster 360' b'14.3' b'8' b'360' b'245' b'3.21' b'3.57' b'15.84' b'0'
  b'0' b'3' b'4']
 [b'Merc 240D' b'24.4' b'4' b'146.7' b'62' b'3.69' b'3.19' b'20' b'1'
  b'0' b'4' b'2']
 [b'Merc 230' b'22.8' b'4' b'140.8' b'95' b'3.92' b'3.15' b'22.9' b'1'
  b'0' b'4' b'2']
 [b'Merc 280' b'19.2' b'6' b'167.6' b'123' b'3.92' b'3.44' b'18.3' b'1'
  b'0' b'4'


#### Five-figure summary

In [120]:
def five_figure_summary(self, col_pos):
    # np.float was removed in NumPy 1.24; the built-in float is equivalent
    statistics = stats.describe(self.array[1:,col_pos].astype(float))
    return f"Five-figure status of column {col_pos}: {statistics}"

Note that an f-string such as `f"a {b}"` builds a string in which each `{...}` is replaced by the value of the expression inside it, here the variable `b`.
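
A short sketch of f-string behavior, using made-up variable names and a value taken from the `mpg` statistics in these notes:

```python
col_pos = 1
mean_mpg = 20.090625

# Expressions inside {} are evaluated and inserted into the string
message = f"Mean of column {col_pos}: {mean_mpg}"
# A format spec after ':' controls presentation, here two decimal places
rounded = f"Mean of column {col_pos}: {mean_mpg:.2f}"

print(message)
print(rounded)
```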

In [127]:
my_data_shell.five_figure_summary(1)

'Five-figure status of column 1: DescribeResult(nobs=32, minmax=(10.4, 33.9), mean=20.090625000000003, variance=36.32410282258064, skewness=0.6404398640318834, kurtosis=-0.20053320971549793)'
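
As the output shows, `stats.describe` reports the count, min/max, mean, variance, skewness, and kurtosis, which is not quite the classic five-number summary (minimum, first quartile, median, third quartile, maximum). If the quartiles are what you want, one possible approach is `np.percentile`. The function name `five_number_summary` below is hypothetical and not part of the course's `DataShell`.

```python
import numpy as np

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a 1-D sequence of numbers."""
    data = np.asarray(values, dtype=float)
    # np.percentile accepts a list of percentiles and returns them in order
    return tuple(np.percentile(data, [0, 25, 50, 75, 100]))

# First five mpg values from the rows shown above
summary = five_number_summary([21, 21, 22.8, 21.4, 18.7])
print(summary)
```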

#### Example 1 from exercises

In [130]:
# Import numpy as np, pandas as pd
import numpy as np
import pandas as pd

# Create class: DataShell
class DataShell:
  
    # Define initialization method
    def __init__(self, filepath):
        # Set filepath as instance variable  
        self.filepath = filepath
        # Set data_as_csv as instance variable
        self.data_as_csv = pd.read_csv(filepath)

# Instantiate DataShell as us_data_shell
us_data_shell = DataShell('datasets/mtcars.csv')

# Print your object's data_as_csv attribute
print(us_data_shell.data_as_csv.head())

               model   mpg  cyl   disp   hp  drat     wt   qsec  vs  am  gear  \
0          Mazda RX4  21.0    6  160.0  110  3.90  2.620  16.46   0   1     4   
1      Mazda RX4 Wag  21.0    6  160.0  110  3.90  2.875  17.02   0   1     4   
2         Datsun 710  22.8    4  108.0   93  3.85  2.320  18.61   1   1     4   
3     Hornet 4 Drive  21.4    6  258.0  110  3.08  3.215  19.44   1   0     3   
4  Hornet Sportabout  18.7    8  360.0  175  3.15  3.440  17.02   0   0     3   

   carb  
0     4  
1     4  
2     1  
3     1  
4     2  


#### Example 2 from exercises

In [131]:
# Create class DataShell
class DataShell:
  
    # Define initialization method
    def __init__(self, filepath):
        self.filepath = filepath
        self.data_as_csv = pd.read_csv(filepath)
    
    # Define method rename_column, with arguments self, column_name, and new_column_name
    def rename_column(self, column_name, new_column_name):
        self.data_as_csv.columns = self.data_as_csv.columns.str.replace(column_name, new_column_name)

# Instantiate DataShell as us_data_shell
us_data_shell = DataShell('datasets/mtcars.csv')

# Print the datatype of your object's data_as_csv attribute
print(us_data_shell.data_as_csv.dtypes)

# Try to rename column 'code' to 'country_code' (mtcars has no 'code' column, so the column names are unchanged)
us_data_shell.rename_column('code', 'country_code')

# Again, print the datatype of your object's data_as_csv attribute
print(us_data_shell.data_as_csv.dtypes)

model     object
mpg      float64
cyl        int64
disp     float64
hp         int64
drat     float64
wt       float64
qsec     float64
vs         int64
am         int64
gear       int64
carb       int64
dtype: object
model     object
mpg      float64
cyl        int64
disp     float64
hp         int64
drat     float64
wt       float64
qsec     float64
vs         int64
am         int64
gear       int64
carb       int64
dtype: object


#### Example 3 from exercises

In [132]:
# Create class DataShell
class DataShell:

    # Define initialization method
    def __init__(self, filepath):
        self.filepath = filepath
        self.data_as_csv = pd.read_csv(filepath)

    # Define method rename_column, with arguments self, column_name, and new_column_name
    def rename_column(self, column_name, new_column_name):
        self.data_as_csv.columns = self.data_as_csv.columns.str.replace(column_name, new_column_name)
        
    # Define get_stats method, with argument self
    def get_stats(self):
        # Return a description data_as_csv
        return self.data_as_csv.describe()
    
# Instantiate DataShell as us_data_shell
us_data_shell = DataShell('datasets/mtcars.csv')

# Print the output of your objects get_stats method
print(us_data_shell.get_stats())

             mpg        cyl        disp          hp       drat         wt  \
count  32.000000  32.000000   32.000000   32.000000  32.000000  32.000000   
mean   20.090625   6.187500  230.721875  146.687500   3.596563   3.217250   
std     6.026948   1.785922  123.938694   68.562868   0.534679   0.978457   
min    10.400000   4.000000   71.100000   52.000000   2.760000   1.513000   
25%    15.425000   4.000000  120.825000   96.500000   3.080000   2.581250   
50%    19.200000   6.000000  196.300000  123.000000   3.695000   3.325000   
75%    22.800000   8.000000  326.000000  180.000000   3.920000   3.610000   
max    33.900000   8.000000  472.000000  335.000000   4.930000   5.424000   

            qsec         vs         am       gear     carb  
count  32.000000  32.000000  32.000000  32.000000  32.0000  
mean   17.848750   0.437500   0.406250   3.687500   2.8125  
std     1.786943   0.504016   0.498991   0.737804   1.6152  
min    14.500000   0.000000   0.000000   3.000000   1.0000  

### OOP Best Practices

#### Reading Other People's Code
1. Check out GitHub Code.
2. Check out good examples of Python code.
3. Read the codebase.

#### Pandas and Spark

Full codebase for this class found [here](https://github.com/apache/spark/blob/feeca63198466640ac461a2a34922493fa6162a8/python/pyspark/context.py):

In [None]:
class SparkContext(object):

    """
    Main entry point for Spark functionality. A SparkContext represents the
    connection to a Spark cluster, and can be used to create :class:`RDD` and
    broadcast variables on that cluster.
    .. note:: Only one :class:`SparkContext` should be active per JVM. You must `stop()`
        the active :class:`SparkContext` before creating a new one.
    .. note:: :class:`SparkContext` instance is not supported to share across multiple
        processes out of the box, and PySpark does not guarantee multi-processing execution.
        Use threads instead for concurrent processing purpose.
    """

    _gateway = None
    _jvm = None
    _next_accum_id = 0
    _active_spark_context = None
    _lock = RLock()
    _python_includes = None  # zip and egg files that need to be added to PYTHONPATH

#### Spark Class: The Constructor
* Contains several attributes and parameters initialized with the class

#### Spark Class: A Method

* `printSchema` takes a DataFrame and prints out its schema or column structure for us to look at

#### PEP Style
* PEP 8 -- Style Guide for Python Code
* Some preferred guidelines:
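    * Spaces, four per level, are the preferred indentation method
    * Maximum line length of 79 characters
    * Docstrings for public modules, classes, and functions, describing parameters and return values (docstring conventions are covered in PEP 257)
    * Class names should normally use the CapWords convention (also called CamelCase or PascalCase), e.g. `DataShell`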
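
A small sketch of a class written with those guidelines in mind. The `ColumnSummary` class is hypothetical and exists only to illustrate naming, indentation, and docstrings.

```python
class ColumnSummary:
    """Hold a named column of numbers and report simple statistics.

    The class name uses CapWords; methods and variables use
    lowercase_with_underscores; indentation is four spaces.
    """

    def __init__(self, name, values):
        self.name = name
        self.values = list(values)

    def mean(self):
        """Return the arithmetic mean of the stored values."""
        return sum(self.values) / len(self.values)


mpg_summary = ColumnSummary('mpg', [21.0, 22.8, 21.4])
print(mpg_summary.mean())
```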
    
#### Separation of Concerns
* Remember: you want your class to do a single thing or group of things, like a cookie cutter
* Separation of concerns is the practice of organizing a program into distinct sections, so that each part of a program addresses one unique unit of work.

## Chapter 4: Inheritance, polymorphism and composition 

### Inheritance

#### Extending our DataShells
Current state of the world:
> DataShell --> "I only process CSV datasets"

But what if we wanted a class that reads a file in JSON or TXT format?

We could modify DataShell, but that would take some work.

Or we could create another class that has the same functionality as DataShell, plus some of its own.

This is known as **inheritance**: a class takes on the attributes and methods of another "parent" class and adds some functionality of its own.
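
As a rough sketch of that idea, the example below defines a hypothetical `JsonDataShell` that inherits from a simplified `DataShell` and adds JSON loading. The simplified parent, the `show_filename` method, and the temporary file are all assumptions made so the example runs on its own.

```python
import json
import os
import tempfile

class DataShell:
    def __init__(self, filename):
        self.filename = filename

    def show_filename(self):
        return f"Reading from {self.filename}"

class JsonDataShell(DataShell):  # hypothetical subclass for illustration
    def create_datashell(self):
        # New behavior: parse the file as JSON instead of CSV
        with open(self.filename) as handle:
            self.records = json.load(handle)
        return self.records

# Write a small JSON file so the example is runnable on its own
with tempfile.NamedTemporaryFile('w', suffix='.json', delete=False) as tmp:
    json.dump([{"model": "Mazda RX4", "mpg": 21.0}], tmp)

shell = JsonDataShell(tmp.name)
print(shell.show_filename())      # inherited from DataShell
print(shell.create_datashell())   # defined on JsonDataShell
os.remove(tmp.name)
```

The subclass never defines `__init__` or `show_filename`, yet both are available because they are inherited from the parent.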

#### Is-a and Has-a Relationships
* A Pterodactyl **is-a** Dinosaur
* A Tyrannosaurus **is-a** Dinosaur
* Is a Pterodactyl a dinosaur? Yes, Pterodactyl inherits from Dinosaur.
* Is a Tyrannosaurus a pterodactyl? No, but they're both dinosaurs.
* Is a dinosaur a pterodactyl? No, so it doesn't work the other way, either.
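
These questions can be asked in code with the built-ins `isinstance` and `issubclass`. A minimal sketch, using empty classes named after the bullets above:

```python
class Dinosaur:
    pass

class Pterodactyl(Dinosaur):
    pass

class Tyrannosaurus(Dinosaur):
    pass

ptero = Pterodactyl()
rex = Tyrannosaurus()

print(isinstance(ptero, Dinosaur))           # a Pterodactyl is-a Dinosaur
print(isinstance(rex, Pterodactyl))          # a Tyrannosaurus is not a Pterodactyl
print(issubclass(Dinosaur, Pterodactyl))     # the relationship does not run backwards
```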

#### Inheriting a DataShell

In [133]:
# new class which inherits from DataShell: StDevDataShell

class StDevDataShell(DataShell): # pass in DataShell here
    # now this class inherits all attributes and methods from DataShell
    pass

#### Example 1 from exercises

In [135]:
# Create a class Animal
class Animal:
    def __init__(self, name):
        self.name = name

# Create a class Mammal, which inherits from Animal
class Mammal(Animal):
    def __init__(self, name, animal_type):
        super().__init__(name)  # let Animal set self.name
        self.animal_type = animal_type

# Create a class Reptile, which also inherits from Animal
class Reptile(Animal):
    def __init__(self, name, animal_type):
        super().__init__(name)  # let Animal set self.name
        self.animal_type = animal_type

# Instantiate a mammal with name 'Daisy' and animal_type 'dog': daisy
daisy = Mammal('Daisy', 'dog')

# Instantiate a reptile with name 'Stella' and animal_type 'alligator': stella
stella = Reptile('Stella', 'alligator')

# Print both objects
print(daisy)
print(stella)

<__main__.Mammal object at 0x12bd86310>
<__main__.Reptile object at 0x12bd86c50>


#### Example 2 from exercises

In [136]:
# Create a class Vertebrate
class Vertebrate:
    spinal_cord = True
    def __init__(self, name):
        self.name = name

# Create a class Mammal, which inherits from Vertebrate
class Mammal(Vertebrate):
    def __init__(self, name, animal_type):
        super().__init__(name)  # let Vertebrate set self.name
        self.animal_type = animal_type
        self.temperature_regulation = True

# Create a class Reptile, which also inherits from Vertebrate
class Reptile(Vertebrate):
    def __init__(self, name, animal_type):
        super().__init__(name)  # let Vertebrate set self.name
        self.animal_type = animal_type
        self.temperature_regulation = False

# Instantiate a mammal with name 'Daisy' and animal_type 'dog': daisy
daisy = Mammal('Daisy', 'dog')

# Instantiate a reptile with name 'Stella' and animal_type 'alligator': stella
stella = Reptile('Stella', 'alligator')

# Print stella's attributes spinal_cord and temperature_regulation
print("Stella Spinal cord: " + str(stella.spinal_cord))
print("Stella temperature regulation: " + str(stella.temperature_regulation))

# Print daisy's attributes spinal_cord and temperature_regulation
print("Daisy Spinal cord: " + str(daisy.spinal_cord))
print("Daisy temperature regulation: " + str(daisy.temperature_regulation))

Stella Spinal cord: True
Stella temperature regulation: False
Daisy Spinal cord: True
Daisy temperature regulation: True


### Inheritance with DataShells

#### DataShell with Standard Deviation
* Same DataShell class with additional functionality

#### Changing the DataShell

In [144]:
from scipy import stats

class DataShell:
    def __init__(self, filename):
        self.filename = filename
        
    def create_datashell(self):
        # In this first line, this is where we define the "array" attribute
        self.array = np.genfromtxt(self.filename, delimiter=',', dtype=None)
        return self.array
        
    def rename_column(self, old_colname, new_colname):
        for index, value in enumerate(self.array[0]):
            if value == old_colname.encode('UTF-8'):
                self.array[0][index] = new_colname
        return self.array
    
    def five_figure_summary(self, col_pos):
        statistics = stats.describe(self.array[1:,col_pos].astype(float))
        return f"Five-figure status of column {col_pos}: {statistics}"

#### Allowing for a standard deviation

In [145]:
def get_stdev(self, col_pos):
    column = self.array[1:, col_pos].astype(float)
    stdev = column.std(axis=0)
    return f"Standard Deviation of column {col_pos}: {stdev}"

#### Inheritance with DataShells

In [146]:
class DataStDev(DataShell):
    
    def __init__(self, filename):
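        # Initialize through the parent so that filename is stored on the
        # instance rather than on the DataShell class itself
        super().__init__(filename)
        
    def get_stdev(self, col_pos):
        column = self.array[1:, col_pos].astype(float)
        stdev = column.std(axis=0)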
        return f"Standard Deviation of column {col_pos}: {stdev}"

#### Calling our new DataShell

In [147]:
car_data = 'datasets/mtcars.csv'

my_st_dev_shell = DataStDev(car_data)
my_st_dev_shell.create_datashell()
my_st_dev_shell.get_stdev(1)

'Standard Deviation of column 1: 5.932029552301218'
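
To try the same pattern without a CSV file, here is a minimal sketch with in-memory data. `ArrayShell` and `ArrayStDev` are hypothetical stand-ins for `DataShell` and `DataStDev`, and the rows are made-up sample values. The subclass adds only `get_stdev` and inherits everything else.

```python
import numpy as np


class ArrayShell:
    """Hypothetical stand-in for DataShell that holds an in-memory array."""

    def __init__(self, rows):
        # The first row is the header, the remaining rows are data (all strings)
        self.array = np.array(rows)


class ArrayStDev(ArrayShell):
    """Inherits __init__ from ArrayShell and adds one method."""

    def get_stdev(self, col_pos):
        # Skip the header row and convert the column to floats
        column = self.array[1:, col_pos].astype(float)
        return column.std(axis=0)


rows = [["name", "mpg"], ["a", "21.0"], ["b", "22.8"], ["c", "18.7"]]
shell = ArrayStDev(rows)
stdev = shell.get_stdev(1)
print(f"Standard Deviation of column 1: {stdev}")
```

Note that `ndarray.std` computes the population standard deviation by default (`ddof=0`).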

### Composition

#### Inheritance versus Composition
* Inheritance: builds a new class on top of the structure of an existing class. This is an "is-a" relationship
* Composition: combines elements of several different classes to create a kind of "Frankenstein" class. This is a "has-a" relationship
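
The two relationships above can be sketched side by side. All class names here (`Loader`, `SortedLoader`, `Summarizer`, `Report`) are hypothetical and only illustrate the idea: `SortedLoader` *is a* `Loader`, while `Report` *has a* loader and a summarizer.

```python
class Loader:
    def load(self):
        return [3, 1, 2]


class Summarizer:
    def summarize(self, values):
        return {"min": min(values), "max": max(values)}


# Inheritance: SortedLoader is a Loader and extends its behavior
class SortedLoader(Loader):
    def load(self):
        return sorted(super().load())


# Composition: Report holds other objects and delegates work to them
class Report:
    def __init__(self, loader, summarizer):
        self.loader = loader
        self.summarizer = summarizer

    def run(self):
        return self.summarizer.summarize(self.loader.load())


sorted_values = SortedLoader().load()
report = Report(SortedLoader(), Summarizer())
summary = report.run()

print(sorted_values)               # [1, 2, 3]
print(summary)                     # {'min': 1, 'max': 3}
print(isinstance(report, Loader))  # False
```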

#### Composition In a DataShell - 1
Five-Figure Summary Composition

In [150]:
# Doesn't use anything else from the class
def five_figure_summary(self, col_pos):
    statistics = stats.describe(self.array[1:,col_pos].astype(float))
    return f"Five-figure status of column {col_pos}: {statistics}"

#### Composition In a DataShell - 2
Create DataShell Composition

In [149]:
# Also doesn't use anything else from the class
def create_datashell(self):
    data_array = np.genfromtxt(self.filename, delimiter=',', dtype=None)
    self.array = data_array
    return self.array

#### Composing with Pandas
Create DataShell Composition:


In [156]:
# Original class
class DataShell:
    
    def __init__(self, filename):
        self.filename = filename
        
    def create_datashell(self):
        data_array = np.genfromtxt(self.filename, delimiter=',', dtype=None)
        self.array = data_array
        return self.array 

In [159]:
# Class replaced with pandas
import pandas as pd

class DataShellComposed:
    
    def __init__(self, filename):
        self.filename = filename
        
    def create_datashell(self):
        # Delegating to pandas is concise; the only downside is that
        # you don't see what's going on under the hood
        self.df = pd.read_csv(self.filename)
        return self.df

#### What does our new class look like?

In [160]:
car_data = 'datasets/mtcars.csv'
my_data_shell = DataShellComposed(car_data)
my_data_shell.create_datashell()
print(type(my_data_shell.df))

<class 'pandas.core.frame.DataFrame'>
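
The composed class can also be exercised without a file on disk. This is a minimal sketch, assuming a small made-up CSV held in an `io.StringIO` buffer, which `pd.read_csv` accepts in place of a path. The `source` parameter name is a change made for this illustration.

```python
import io

import pandas as pd


class DataShellComposed:

    def __init__(self, source):
        # source may be a file path or a file-like object
        self.source = source

    def create_datashell(self):
        # All parsing is delegated to pandas
        self.df = pd.read_csv(self.source)
        return self.df


csv_buffer = io.StringIO("mpg,cyl\n21.0,6\n22.8,4\n")  # made-up sample rows
shell = DataShellComposed(csv_buffer)
df = shell.create_datashell()

print(type(shell.df))
print(list(df.columns))  # ['mpg', 'cyl']
```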


#### Example 1 from exercises

In [161]:
# Define abstract class DataShell
class DataShell:
    # Class variable family
    family = 'DataShell'
    # Initialization method with arguments, and instance variables
    def __init__(self, name, filepath): 
        self.name = name
        self.filepath = filepath

# Define class CsvDataShell      
class CsvDataShell(DataShell):
    # Initialization method with arguments self, name, filepath
    def __init__(self, name, filepath):
        # Set name and filepath through the parent initializer
        super().__init__(name, filepath)
        # Instance variable data
        self.data = pd.read_csv(filepath)
        # Instance variable stats
        self.stats = self.data.describe()

# Instantiate CsvDataShell as us_data_shell
us_data_shell = CsvDataShell("US", 'datasets/mtcars.csv')

# Print us_data_shell.stats
print(us_data_shell.stats)

             mpg        cyl        disp          hp       drat         wt  \
count  32.000000  32.000000   32.000000   32.000000  32.000000  32.000000   
mean   20.090625   6.187500  230.721875  146.687500   3.596563   3.217250   
std     6.026948   1.785922  123.938694   68.562868   0.534679   0.978457   
min    10.400000   4.000000   71.100000   52.000000   2.760000   1.513000   
25%    15.425000   4.000000  120.825000   96.500000   3.080000   2.581250   
50%    19.200000   6.000000  196.300000  123.000000   3.695000   3.325000   
75%    22.800000   8.000000  326.000000  180.000000   3.920000   3.610000   
max    33.900000   8.000000  472.000000  335.000000   4.930000   5.424000   

            qsec         vs         am       gear     carb  
count  32.000000  32.000000  32.000000  32.000000  32.0000  
mean   17.848750   0.437500   0.406250   3.687500   2.8125  
std     1.786943   0.504016   0.498991   0.737804   1.6152  
min    14.500000   0.000000   0.000000   3.000000   1.0000  
2
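
The class variable `family` is looked up through the inheritance chain, so subclasses and their instances see it without redefining it. The sketch below uses a simplified `DataShell` that takes only `name` (an assumption made to keep the example short). It also shows that assigning through an instance creates a separate instance attribute instead of changing the class variable.

```python
class DataShell:
    family = 'DataShell'  # class variable

    def __init__(self, name):
        self.name = name


class CsvDataShell(DataShell):
    pass  # inherits both __init__ and family


us = CsvDataShell('US')
fr = CsvDataShell('France')

print(us.family, fr.family)  # DataShell DataShell

fr.family = 'Custom'         # creates an instance attribute on fr only
print(us.family, fr.family)  # DataShell Custom
print(CsvDataShell.family)   # DataShell
```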

#### Example 2 from exercises

In [163]:
# Define abstract class DataShell
class DataShell:
    family = 'DataShell'
    def __init__(self, name, filepath): 
        self.name = name
        self.filepath = filepath

# Define class CsvDataShell
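class CsvDataShell(DataShell):
    def __init__(self, name, filepath):
        # Set name and filepath through the parent initializer
        super().__init__(name, filepath)
        self.data = pd.read_csv(filepath)
        self.stats = self.data.describe()

# Define class TsvDataShell
class TsvDataShell(DataShell):
    # Initialization method with arguments self, name, filepath
    def __init__(self, name, filepath):
        # Set name and filepath through the parent initializer
        super().__init__(name, filepath)
        # Instance variable data (read_table expects tab-separated input by default)
        self.data = pd.read_table(filepath)
        # Instance variable stats
        self.stats = self.data.describe()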

# Instantiate CsvDataShell as us_data_shell, print us_data_shell.stats
us_data_shell = CsvDataShell("US", 'datasets/mtcars.csv')
print(us_data_shell.stats)

# Instantiate TsvDataShell as france_data_shell, print france_data_shell.stats
france_data_shell = TsvDataShell("France", 'datasets/mtcars.csv')
print(france_data_shell.stats)

             mpg        cyl        disp          hp       drat         wt  \
count  32.000000  32.000000   32.000000   32.000000  32.000000  32.000000   
mean   20.090625   6.187500  230.721875  146.687500   3.596563   3.217250   
std     6.026948   1.785922  123.938694   68.562868   0.534679   0.978457   
min    10.400000   4.000000   71.100000   52.000000   2.760000   1.513000   
25%    15.425000   4.000000  120.825000   96.500000   3.080000   2.581250   
50%    19.200000   6.000000  196.300000  123.000000   3.695000   3.325000   
75%    22.800000   8.000000  326.000000  180.000000   3.920000   3.610000   
max    33.900000   8.000000  472.000000  335.000000   4.930000   5.424000   

            qsec         vs         am       gear     carb  
count  32.000000  32.000000  32.000000  32.000000  32.0000  
mean   17.848750   0.437500   0.406250   3.687500   2.8125  
std     1.786943   0.504016   0.498991   0.737804   1.6152  
min    14.500000   0.000000   0.000000   3.000000   1.0000  
2
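
The difference between the two readers is easiest to see on small in-memory samples. This is a sketch with made-up data: `pd.read_csv` splits on commas by default, while `pd.read_table` splits on tabs, so pointing `read_table` at comma-separated text yields a single column.

```python
import io

import pandas as pd

csv_text = "mpg,cyl\n21.0,6\n22.8,4\n"      # comma-separated sample
tsv_text = "mpg\tcyl\n21.0\t6\n22.8\t4\n"  # tab-separated sample

csv_df = pd.read_csv(io.StringIO(csv_text))
tsv_df = pd.read_table(io.StringIO(tsv_text))

# Comma-separated text parsed with the tab-based reader: nothing is split
mismatched_df = pd.read_table(io.StringIO(csv_text))

print(csv_df.shape)         # (2, 2)
print(tsv_df.shape)         # (2, 2)
print(mismatched_df.shape)  # (2, 1)
```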