# Data Science at UCSB
# Python for Data Science: Introduction
## Jason Freeberg, Fall 2016

Welcome to the first tutorial! Let's start with the foundations of OOP (Object-Oriented Programming) in Python and cover some useful functions. Specifically, we will go over *classes*, *objects*, *modules*, basic *data structures*, and the functions for getting information about them.

Some helpful resources to come back to:
- [Data Camp](https://www.datacamp.com): where you'll be doing homework.
- [Python for Data Analysis](http://shop.oreilly.com/product/0636920023784.do): written by the creator of the pandas module. A free PDF is available online.
- [Reddit Datasets](https://www.reddit.com/r/datasets/): a good place to get data for personal projects.
- [Python Walkthrough](https://docs.python.org/3/tutorial/index.html): the official Python tutorial. It covers the basics of Python in detail and is essentially a guided tour through the Python documentation.
- [Stack Exchange](http://stackexchange.com/): because why figure it out on your own when someone smarter already did?
- [Jason](https://www.linkedin.com/in/jfreeberg): I'll share my email and cell number in lab, since these notebooks are public on GitHub.


# Classes

Have you taken an intro programming class? Congratulations, you already know what a class is.

If not, ever been in a car? Congratulations, you already know what a class is.

At a high level, classes are the instructions (or blueprints) for creating objects. They define the objects' initial values and the functions that manipulate them. Let's use an example that's not related to programming... The Ford Motor Company has the schematics for building its trucks and cars, like the Ford F-150. The schematics tell the employees where to put each piece and the starting values for things like the speedometer, fluid tank levels, and so on. The schematics can also tell the employees to change the truck based on input from the customer. Think of trim options like fog lights and leather steering wheels.

The cell below defines a class for Ford F-150 trucks.

In [10]:
class F150:
    """
    A basic class definition. We have attributes that are numeric, logical, and strings.
    
    The methods are simple, they only update the attributes. 
    """
    # These are the shared attributes of the class. When an 
    # 'F150' object is created, it will have these values.
    make = 'Ford'
    year = 2016
    
    # This method defines the initial values when an object is created. 
    # Notice that the function parameters have default values.
    def __init__(self, cyl=6, speedoTop=120, fluidsOK=True, 
                 SteeringWheel='Standard', fogLights=True):
        self.cylinders = cyl  # Number of cylinders in the engine; defaults to 6 (a V6)
        self.speedoMeterTop = speedoTop  # Top speed printed on the speedometer, in MPH
        self.fluidLevelsOK = fluidsOK
        self.SteeringWheel = SteeringWheel
        self.fogLights = fogLights
    
    # And here are the methods...
    # This method changes the type of steering wheel that will be built into the truck.
    # Prefix a name with "self." to access the object's own attributes.
    def updateWheel(self, NewWheelType):
        self.SteeringWheel = NewWheelType
    
    # This updates the top speed that will be printed on the speedometer
    def updateMeterTop(self, NewTop):
        self.speedoMeterTop = NewTop

**Note**: *In Python, we generally don't write methods just to edit attributes (cylinders, speedoMeterTop, etc.). We access them directly with the dot operator, and Python doesn't have truly private variables anyway. I defined updateWheel() and updateMeterTop() only to show the syntax for defining methods and to preview the dot operator.*
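Here is a minimal sketch of what "accessing attributes directly" looks like. It uses a hypothetical, stripped-down `MiniTruck` class rather than `F150`. Reading and assigning both go through the dot operator, so no update method is needed.

```python
class MiniTruck:
    """Hypothetical, stripped-down truck class used only for this illustration."""
    def __init__(self, steeringWheel='Standard'):
        self.steeringWheel = steeringWheel

truck = MiniTruck()
print(truck.steeringWheel)        # read the attribute directly: Standard

truck.steeringWheel = 'Leather'   # assign to it directly, no updateWheel()-style method
print(truck.steeringWheel)        # Leather
```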

# Objects

Let's continue with our truck example and tie it back to a programming context. Now we have the blueprint (class) for making F150 trucks! In the physical world, we use blueprints and instructions to make *things*. For instance, we follow the instructions in a Lego set to build the final toy. Likewise, we can make *things* from our class and interact with them in our script, and we call them **objects**. We can make an entire fleet of F150 truck objects and have them race... if we edit the above class definition to allow for a throttle and brakes ;).
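A fleet is easy to sketch with a list comprehension. The `FleetTruck` class below is hypothetical and much smaller than `F150`, which lets the example run on its own. It shows that every object gets its own instance attributes (set in `__init__`), while class attributes such as `make` are shared by the whole fleet.

```python
class FleetTruck:
    """Hypothetical minimal truck class, used only for this sketch."""
    make = 'Ford'                 # class attribute: shared by every FleetTruck object

    def __init__(self, cyl):
        self.cylinders = cyl      # instance attribute: each object gets its own

# Build a "fleet" of three trucks from the same blueprint (class)
fleet = [FleetTruck(cyl) for cyl in (4, 6, 8)]

for truck in fleet:
    print(truck.make, truck.cylinders)   # same make, different engines

print(fleet[0] is fleet[1])   # False: separate objects made from one class
```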

In this hypothetical, the blueprints also have the instructions for different trims, engines and other bells and whistles. So when we create a new truck object, we can change those attributes and use its methods/functions. To access the attributes of an object, or to use functions to edit those attributes, we use the **dot operator**. The syntax is: *object*.*attribute* or *object*.*method()*

In the cell below we create two objects, **truck1** and **truck2**. Look closely at the print statements to see the dot operator for accessing object attributes.

**FYI:** class instance == object of a class
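You can check the FYI in code. The built-ins `type()` and `isinstance()` report which class an object was created from. Here is a minimal sketch with a hypothetical, empty `Bike` class:

```python
class Bike:
    """Hypothetical empty class used only to illustrate instances."""
    pass

bike1 = Bike()
bike2 = Bike()

print(type(bike1) is Bike)        # True: bike1 was created from the Bike class
print(isinstance(bike2, Bike))    # True: bike2 is also an instance (object) of Bike
print(bike1 is bike2)             # False: two distinct objects, one class
```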

In [11]:
# Here is the syntax for creating an instance of a class:
# truck1 uses the defaults, while truck2 overrides every parameter


truck1 = F150()
truck2 = F150(cyl=4, speedoTop=100, fluidsOK=False, 
              SteeringWheel='Leather', fogLights=False)

print('The number of cylinders in truck1 = ', truck1.cylinders)
print('The type of steering wheel in truck1 is', truck1.SteeringWheel)
if truck1.fogLights == True:
    print('truck1 has fog lights')
else:
    print('truck1 does not have fog lights')

print('------------------------------------------------------------')

print('The number of cylinders in truck2 = ', truck2.cylinders)
print('The type of steering wheel in truck2 is', truck2.SteeringWheel)

# Since .fogLights is a boolean, we don't actually need to check "== True"
if truck2.fogLights:
    print('truck2 has fog lights')
else:
    print('truck2 does not have fog lights')

In [12]:
# Adjusting the values

truck1.updateMeterTop(160)
truck1.updateWheel('Batmobile Steering Wheel')

# Python doesn't enforce types here: updateMeterTop() stores a string as happily as a number
truck2.updateMeterTop('One Hundred and Thirty')
truck2.updateWheel('GameCube Controller')

# Print the new values

print("truck1's top speedo tick =", truck1.speedoMeterTop,
      "and its new steering wheel is a", truck1.SteeringWheel)
print("truck2's top speedo tick =", truck2.speedoMeterTop,
      "and its new steering wheel is a", truck2.SteeringWheel)

# Modules

If you end up using Python in a data-related role, chances are you won't often define your own classes. Mostly you'll use classes defined by other people. NumPy and pandas are collections of data structures and classes, neatly packaged into modules\*. Once the modules are installed on your computer via pip or conda, you can quickly bring them into your script with `import`. Let's take a look.

\* Also known as libraries or packages if you come from an R background

In [13]:
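# The first three lines all import NumPy, with slight differences in how you refer to it
# afterward (numpy.mean, np.mean, or plain mean). For more information, see the Python
# documentation. For this quarter's tutorials, we will use the second convention.
# "from numpy import *" is best avoided in real code: it dumps every NumPy name into your
# namespace, where it can silently replace names you already use.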

import numpy
import numpy as np # np is the naming convention for importing numpy
from numpy import *
import pandas as pd
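The same import styles work for any module. So that this sketch runs without NumPy installed, the standard-library `math` module stands in for NumPy. The only difference between the styles is which name you type before the function. The last line uses a named import instead of `*`, which keeps it clear where `sqrt` came from.

```python
import math                 # full module name: math.sqrt
import math as m            # alias, like "import numpy as np": m.sqrt
from math import sqrt       # pull one name into your namespace: sqrt

print(math.sqrt(16), m.sqrt(16), sqrt(16))   # 4.0 4.0 4.0
print(m is math)                             # True: the alias points to the same module object
```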

# Helpful Functions

As stated above, you will often use modules created by other people. This raises the question: *how do I learn how to use their classes and functions?* Well, that requires a two-part answer.

* **Abstraction**: Suppose we have a list of numbers and call NumPy's `average()` function on it. We can safely assume that it returns the average of the list. Do we need to know exactly **how** it computes that? Absolutely not! It would be a waste of time to learn and check the implementation of every function in the NumPy package.
* **Inspection**: Python has built-in functions (plus the standard-library `inspect` module) for inspecting classes, objects, and methods. With them we can list the attributes and methods of any object we're working with. Let's take a look at some.
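Here is a concrete version of both ideas, using the standard library's `statistics.mean()` (available since Python 3.4) as the example function. For abstraction, we only need to know what it returns, not how it computes it. For inspection, the built-ins `type()` and `callable()` give a first taste.

```python
import statistics

scores = [1, 2, 3, 4, 4]

# Abstraction: we trust mean() to return the average without reading its source
print(statistics.mean(scores))      # 2.8

# Inspection: basic built-ins that describe what we are working with
print(type(scores))                 # <class 'list'>
print(callable(statistics.mean))    # True: it's a function we can call
```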

In [14]:
import inspect
anotherTruck = F150()

print("Output of dir():")
print(dir(anotherTruck), '\n')  # Built-in function; returns a sorted list of attribute and method names

print("Output of inspect.getmembers() called on the class:")
for item in inspect.getmembers(F150):   # Returns a list of (name, value) tuples
    print(item)

print("\nOutput of inspect.getmembers() called on a class instance:")
for item in inspect.getmembers(anotherTruck):
    print(item)

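The output of `dir()` is cluttered with double-underscore ("dunder") names that Python defines for every class. A common trick is to filter them out with a list comprehension. `inspect.signature()` then shows which parameters a method accepts. This sketch uses a small hypothetical `DemoTruck` class so it runs on its own. With the `F150` class above, you would pass `F150.__init__` instead.

```python
import inspect

class DemoTruck:
    """Hypothetical minimal class used only for this illustration."""
    make = 'Ford'

    def __init__(self, cyl=6, fogLights=True):
        self.cylinders = cyl
        self.fogLights = fogLights

    def updateCylinders(self, newCyl):
        self.cylinders = newCyl

truck = DemoTruck()

# Keep only the names that don't start with an underscore
publicNames = [name for name in dir(truck) if not name.startswith('_')]
print(publicNames)   # ['cylinders', 'fogLights', 'make', 'updateCylinders']

# Show the parameters __init__ accepts, including their defaults
print(inspect.signature(DemoTruck.__init__))   # (self, cyl=6, fogLights=True)
```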
# Built-In Data Structures

Next week we will learn about pandas DataFrames, a way of representing tabular data. However, there are more basic data types in Python that you should get familiar with...
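- [Tuples](https://docs.python.org/3/tutorial/datastructures.html#tuples-and-sequences)
- [Lists](https://docs.python.org/3/tutorial/datastructures.html#more-on-lists)
- [Dictionaries](https://docs.python.org/3/tutorial/datastructures.html#dictionaries)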

### Tuples
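Tuples hold an ordered sequence of items, like lists, but they are *immutable*: once a tuple is created, its elements can't be changed. They're written with parentheses, and you access elements with square brackets: *a_tuple*\[index\]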
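Here is a short sketch of tuples in action. It covers indexing, unpacking into separate names, and the error you get if you try to change an element.

```python
pair = ("Monty", "Python")

print(pair[0])            # Monty: indexing works like a list

first, last = pair        # unpacking assigns each element to its own name
print(last)               # Python

try:
    pair[0] = "Flying"    # tuples are immutable, so this raises a TypeError
except TypeError as err:
    print("TypeError:", err)
```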

### Lists
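Lists hold an ordered sequence of items and, unlike tuples, they are mutable: you can change, add, and remove elements. Because a list can contain other lists, you can nest them to represent data in two dimensions (or three, or ten), although one or two is usually all you'll need. You can access the elements using square bracket notation: *a_list*\[index1\]\[index2\]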
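Here is a minimal sketch of a two-dimensional list, meaning a list whose elements are themselves lists. It shows double-bracket indexing and the fact that lists can be changed in place.

```python
# A 2-D "grid" is just a list whose elements are lists
grid = [[1, 2, 3],
        [4, 5, 6]]

print(grid[1][2])        # 6: row index 1, then column index 2

grid[0][0] = 100         # lists are mutable, so elements can be replaced
grid.append([7, 8, 9])   # and new rows can be added
print(grid)              # [[100, 2, 3], [4, 5, 6], [7, 8, 9]]
print(len(grid))         # 3 rows
```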

### Dictionaries
Dictionaries hold data in key-value pairs. The simplest dictionaries (or dicts) have strings as keys and numbers as values, but the values could just as well be strings, tuples, lists, or other dictionaries! If you hear the term JSON, think of dictionaries: a JSON object maps naturally onto a Python dict.
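The JSON connection is easy to see with the standard-library `json` module. `json.loads()` turns a JSON string into a Python dictionary, and `json.dumps()` goes the other way. The student record below is made-up example data.

```python
import json

# A JSON string (made-up example data)
text = '{"name": "Mary", "gpa": 3.0, "courses": ["Math", "Stats"]}'

student = json.loads(text)      # JSON object -> Python dict
print(type(student))            # <class 'dict'>
print(student['courses'][0])    # Math: values can be lists, too

student['gpa'] = 3.2            # dicts are mutable: update a value by its key
print(student.get('major', 'undeclared'))   # .get() returns a default for a missing key

print(json.dumps(student))      # Python dict -> JSON string
```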

In [15]:
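# Data type examples!
# Each variable below is reassigned a few times to show different valid forms;
# only the last assignment is the one that gets printed.

# Tuples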

foo = 9
tupl = (1,2,3,4,5)
tupl = ("Monty", "Python") 
tupl = ("More", "Complex", 1234, foo)
print('- Tuple:')
print('0.', tupl)

# Lists

lst = [1, 2, 3, 4, 5]
lst = ['a', 'b', 'c', 'd']
lst = ['a', 2, 'c', foo]
lst = ['a', 'list', ['within', 'a', 'list']]
print('- Lists:')
print('1.', lst[2])
print('2.', lst[2][0])

# Dictionaries

dct = {'key' : 'value',
       'key2' : 2,
       'key3' : foo
      }
dct = {'key' : (1,2),
       'key2' : ['list', 'as', 'value'],
       'key3' : {'inner_key': 1,
                 'inner_key2' : 2}
      }
print('- Dictionaries:')
print('3.', dct)
print('4.', dct['key2'])
print('5.', dct['key3']['inner_key2'])

# Example of nested data types

bigList = [tupl, lst, dct]
print('- Big list:')
for thing in bigList:
    print(thing)

# Your Turn!

Now it's time for you to try writing some class and function definitions. Use the previous cells as a reference, but try it on your own first! The exercises start simple and become increasingly difficult. The last couple of exercises might require you to consult the Python documentation... or search Stack Exchange.

Look for the `<FILL IN>` bits.

In [49]:
# Define a class called collegeCourse. 
# It should have the following attributes; give them reasonable 
# default values in the initializer.
# 1. the department abbreviation
# 2. the course number
# 3. the instructor's last name
# 4. the hour the class starts (24-hour clock)
# 5. the room number

# Then finish the method info(), which neatly prints the information about the course.

class collegeCourse:
    
    def __init__(self):
        self.<FILL IN> = <FILL IN>
        self.<FILL IN> = <FILL IN>
        self.<FILL IN> = <FILL IN>
        self.<FILL IN> = <FILL IN>
        self.<FILL IN> = <FILL IN>
    
    def info(self):
        print("Professor", <FILL IN>, "will be teaching", <FILL IN>, <FILL IN>, 
        "at", <FILL IN>, "in room number", <FILL IN>)

In [47]:
# Let's instantiate our class and try our method 

coolCourse = collegeCourse()

#coolCourse.<FILL IN>
coolCourse.info()

# Well it works, but it doesn't hold much information, does it? 
# Edit the attributes so the existing object describes the course "DSUCSB 101", 
# which is taught by Professor Freeberg in room 1007 at 19:00.
coolCourse.<FILL IN> = "Freeberg"
coolCourse.<FILL IN> = "DSUCSB"
coolCourse.<FILL IN> = 101
coolCourse.<FILL IN> = 19.00
coolCourse.<FILL IN> = 1007 

# Let's print again!
coolCourse.info()

In [None]:
# Wouldn't it be great if we could just put in all that information on instantiation?
# Yes, yes it would be. Let's do that.

# Copy your class definition into this cell and replace its __init__ method with the one 
# below, which sets the default values when an object is made.

"""
def __init__(self, 
             dept = <FILL IN>, 
             instr = <FILL IN>, 
             course = <FILL IN>, 
             room = <FILL IN>, 
             hour = <FILL IN>):
    self.dept = <FILL IN>
    self.instr = <FILL IN>
    self.course = <FILL IN>
    self.room = <FILL IN>
    self.hour = <FILL IN>
"""

# PASTE OLD DEFINITION HERE:




# Now let's try instantiating with the same course information as the last cell.

coolCourse2 = collegeCourse(dept = <FILL IN>, 
                            instr = <FILL IN>,
                            course = <FILL IN>, 
                            room = <FILL IN>, 
                            hour = <FILL IN>)

coolCourse2.info()

In [30]:
# Now let's try something more challenging. Like before, copy your class
# definition into this cell. 

# Then add a new attribute, 'students'. It should be a dictionary with the names of the 
# students below as keys and their GPAs as values. You don't need to make it a default 
# parameter in the initializer.
# Mary, 3.0
# Jane, 4.0
# Jerry, 3.5

# The last challenge is to finish the meanGPA() method. It should return the mean
# GPA of the students in the course.

"""
def meanGPA(self):
    for <FILL IN> in self.<FILL IN>:
        <FILL IN>
        
    return avgGPA
"""

# PASTE OLD DEFINITION HERE:





# Instantiate a new collegeCourse.
lastCourse = <FILL IN>



if lastCourse.meanGPA() == 3.5:
    print('Nice! You\'re all done! Good job.')
else:
    print('Not quite. Try iterating across the values and putting them in a list.')
