# Data Science at UCSB
# Python for Data Science: Introduction
**Jason Freeberg, Fall 2016**

Welcome to the club! This quarter we will write Python code to load, manipulate, and analyze data using the pandas and NumPy packages. The labs and corresponding notebooks are designed to *supplement* what you will learn from DataCamp.com. You will learn Python syntax on their website, and reinforce it in lab. To put it another way, the labs assume that you are keeping up with the assigned DataCamp.com tutorials. Towards the end of the lab you will **work in groups of 3 or 4 on quick exercises**. Weeks 6 through 10 will be focused on a small-scale group project.

Some resources:
* [DataCamp](https://www.datacamp.com) Where you'll be doing homework.
* [Python for Data Analysis](http://shop.oreilly.com/product/0636920023784.do) Written by the creator of the pandas module. A free PDF is available online.
* [Reddit Datasets](https://www.reddit.com/r/datasets/) A good place to get data for personal projects.
* [Python Walkthrough](https://docs.python.org/3/tutorial/index.html) Covers the basics of Python at a low level, essentially a guided tour through the Python documentation.
* [Stack Exchange](http://stackexchange.com/) Because why figure it out on your own when someone smarter already did?
* [Jason](https://www.linkedin.com/in/jfreeberg) I'll put up my email and cell number in lab (these notebooks are public). 


Today we'll go over some foundations of OOP (Object-Oriented Programming) with Python and cover some useful functions. Specifically, we will go over *classes*, *objects*, *modules*, and the functions for getting information on them.

# Classes

Have you taken an intro programming class? Congratulations, you already know what a class is.

If not, ever been in a car? Congratulations, you already know what a class is.

At a high level, classes are the instructions (or blueprints) for creating objects, their initial values, and the functions to manipulate them. Let's use an example that's not related to programming... Ford Motor Company has the schematics for building their trucks and cars, like the Ford F-150. The schematics tell the employees where to put each piece, the starting values for things like the speedometer, fluid tanks, and so on. The schematics can also tell the employees to change the truck based on input--think of trim options like fog lights and leather steering wheels. 

The cell below defines a class for Ford F-150 trucks.

In [None]:
class F150:
    """
    A basic class definition. We have attributes that are numeric, logical, and strings.
    
    The methods are simple, they only update the attributes. 
    """
    # These are the shared attributes of the class. When an 'F150' object is created, it will have these values.
    make = 'Ford'
    year = 2016
    
    # This method defines the initial values when an object is created. 
    # Notice that the function parameters have default values.
    def __init__(self, cyl = 6, speedoTop = 120, fluidsOK = True, SteeringWheel = 'Standard', fogLights = True):
        self.cylinders = cyl # Number of cylinders in the engine, 6 by default
        self.speedoMeterTop = speedoTop # Top speed on the speedometer, in MPH
        self.fluidLevelsOK = fluidsOK # True if the fluid levels are OK
        self.SteeringWheel = SteeringWheel
        self.fogLights = fogLights
    
    # And here are the methods...
    # This method changes the type of steering wheel that will be built into the truck
    # Use "self." to indicate that you are accessing the attributes of this particular object
    def updateWheel(self, NewWheelType):
        self.SteeringWheel = NewWheelType
    
    # This updates the top speed that will be printed on the speedometer
    def updateMeterTop(self, NewTop):
        self.speedoMeterTop = NewTop

**Note**: *In Python, we generally don't write methods just to edit instance attributes (cylinders, speedoMeterTop, etc.); we access them directly. I defined updateWheel() and updateMeterTop() only to show the syntax for defining methods and to preview the dot operator.*
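To make the note above concrete, here is a minimal sketch that changes an attribute both ways: through a method and by direct assignment. The `Truck` class and its names are hypothetical, a trimmed-down stand-in for `F150` so the snippet runs on its own.

```python
# Hypothetical, trimmed-down stand-in for the F150 class above
class Truck:
    def __init__(self, steering_wheel='Standard'):
        self.steering_wheel = steering_wheel

    # Setter-style method, shown only for comparison
    def update_wheel(self, new_wheel_type):
        self.steering_wheel = new_wheel_type


truck = Truck()

# Option 1: call a method that edits the attribute
truck.update_wheel('Leather')
wheel_after_method = truck.steering_wheel

# Option 2 (more common in Python): assign to the attribute directly
truck.steering_wheel = 'Wood'
wheel_after_assignment = truck.steering_wheel

print(wheel_after_method)
print(wheel_after_assignment)
```

Both options end up changing the same attribute; direct assignment simply skips the extra method.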

# Objects

Let's continue with our truck example and tie it back to a programming context. Now we have the blueprint (class) for making F150 trucks! In the physical world, we use blueprints and instructions to make *things*. For example, we use the instructions in a Lego set to build the final toy. Likewise, we can make *things* from our class and interact with them in our script, and we call them **objects**. We can make an entire fleet of F150 truck objects and have them race... if we edit the above class definition to allow for a throttle and brakes ;).

In this hypothetical, the blueprints also have the instructions for different trims, engines and other bells and whistles. So when we create a new truck object, we can change those attributes and use its methods/functions. To access the attributes of an object, or to use functions to edit those attributes, we use the **dot operator**. The syntax is: *object*.*attribute* or *object*.*method()*

In the cell below we create two objects, **truck1** and **truck2**. Look closely at the print statements to see the dot operator for accessing object attributes.

**FYI:** class instance == object of a class

In [None]:
# Here is the syntax for creating an instance of a class, using the defaults
truck1 = F150()
# And here we override the defaults with keyword arguments
truck2 = F150(cyl = 4, speedoTop = 100, fluidsOK = False, SteeringWheel = 'Leather', fogLights = False)

print('The number of cylinders in truck1 = {}'.format(truck1.cylinders))
print('The type of steering wheel in truck1 is {}'.format(truck1.SteeringWheel))
if truck1.fogLights == True:
    print('truck1 has fog lights')
else:
    print('truck1 does not have fog lights')

print('------------------------------------------------------------')

print('The number of cylinders in truck2 = {}'.format(truck2.cylinders))
print('The type of steering wheel in truck2 is {}'.format(truck2.SteeringWheel))

# Since .fogLights is a boolean, we don't actually need to check "== True"
if truck2.fogLights:
    print('truck2 has fog lights')
else:
    print('truck2 does not have fog lights')
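To illustrate the "class instance == object" idea, the sketch below uses the built-in `type()` and `isinstance()` functions. `Truck` is a hypothetical stand-in for `F150`, kept small so the snippet is self-contained.

```python
# Hypothetical stand-in for the F150 class
class Truck:
    def __init__(self, cylinders=6):
        self.cylinders = cylinders


first = Truck()
second = Truck(cylinders=4)

# Both objects are instances of the same class...
same_class = type(first) is type(second)
is_truck = isinstance(first, Truck)

# ...but they are separate objects with their own attribute values
are_distinct = first is not second

print(same_class, is_truck, are_distinct)
print(first.cylinders, second.cylinders)
```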

In [None]:
# Adjusting the values

truck1.updateMeterTop(160)
truck1.updateWheel('Batmobile Steering Wheel')

truck2.updateMeterTop('One Hundred and Thirty')
truck2.updateWheel('GameCube Controller')

# Print the new values

print("truck1's top speedo tick = {} and its new steering wheel is a {}".format(truck1.speedoMeterTop, truck1.SteeringWheel))
print("truck2's top speedo tick = {} and its new steering wheel is a {}".format(truck2.speedoMeterTop, truck2.SteeringWheel))
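Adjusting one truck leaves the other untouched because each object stores its own instance attributes, while class attributes such as `make` are shared. Here is a small sketch of that difference, again using a hypothetical `Truck` class rather than the full `F150` definition.

```python
# Hypothetical stand-in for the F150 class
class Truck:
    make = 'Ford'  # class attribute, shared by every instance

    def __init__(self, steering_wheel='Standard'):
        self.steering_wheel = steering_wheel  # instance attribute


truck_a = Truck()
truck_b = Truck()

# Changing an instance attribute affects only that object
truck_a.steering_wheel = 'Leather'

# Changing the class attribute is visible through every instance
# that has not set its own value for it
Truck.make = 'Lincoln'

print(truck_a.steering_wheel, truck_b.steering_wheel)
print(truck_a.make, truck_b.make)
```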

# Modules

If you end up using Python in a data-related role, chances are that you won't often be defining your own classes... you will be using those defined by other people. NumPy and pandas are collections of data structures, classes, and functions all neatly packaged into modules\*. Once the modules are installed on your computer via pip or conda, you can quickly bring them into your script. Let's take a look.

\* Also known as libraries or packages if you come from an R background ;)

In [None]:
# These all accomplish the same task of importing the module, with slight differences.
# For more information, look to the Python documentation.
# For this quarter's tutorial, we will use the second convention.

import numpy
import numpy as np # This is the common convention for importing numpy
from numpy import * # Brings every public name into your script; can overwrite names you already defined
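The practical difference between the import styles is which names end up in your script's namespace. The sketch below uses the standard library's `math` module as a stand-in so it runs without NumPy installed; the alias `m` is an arbitrary choice for illustration.

```python
import math                # access names as math.sqrt, math.pi, ...
import math as m           # same module under a shorter alias
from math import sqrt      # copies one name directly into the namespace

# All three calls reach the same function
root_full = math.sqrt(16)
root_alias = m.sqrt(16)
root_direct = sqrt(16)

print(root_full, root_alias, root_direct)
print(m is math)  # the alias is just another name for the same module
```

A wildcard import (`from math import *`) would bring in every public name at once, which can silently replace names you already defined. That is one reason the aliased form is the usual convention.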

# Helpful Functions

As stated above, you will often use modules created by other people. This raises the question: *how do I learn how to use the classes and functions?* Well, that requires a two-part answer.

* Abstraction: Suppose we have a list of numbers and call a function such as "numpy.mean(*list*)" on it. We can safely assume that it will return the average of the list. Do we need to know exactly **how** it accomplishes this? Absolutely not! It would be a waste of time to learn and check the implementations of every function in the NumPy package.
* Inspection: Python has built-in functions, plus the standard library's inspect module, for inspecting classes, objects, and methods. This way we can get the attributes and methods of an object we're working with. Let's take a look at some.

In [None]:
import inspect
anotherTruck = F150()

print("Output of dir() function:")
print(dir(anotherTruck))  # Built-in function, returns a list of attribute names
print('')

print("Output of getmembers() function called on class:")
for item in inspect.getmembers(F150):  # From inspect module, returns a list of (name, value) tuples
    print(item)

print('')
print("Output of getmembers() function called on class instance:")
for item in inspect.getmembers(anotherTruck):  # Same call, now on an instance instead of the class
    print(item)
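The output of `dir()` and `getmembers()` includes many double-underscore names that Python adds automatically. As a rough sketch of how to narrow things down, the snippet below filters those out and also uses the built-in `vars()` and `inspect.getdoc()`. The `Truck` class is a hypothetical stand-in for `F150`.

```python
import inspect


class Truck:
    """Hypothetical stand-in for the F150 class."""
    make = 'Ford'

    def __init__(self, cylinders=6):
        self.cylinders = cylinders

    def update_cylinders(self, new_count):
        """Set the number of cylinders."""
        self.cylinders = new_count


truck = Truck()

# Keep only the names that do not start with an underscore
public_names = [name for name in dir(truck) if not name.startswith('_')]

# vars() returns the instance's own attributes as a dictionary
instance_attributes = vars(truck)

# inspect.getdoc() returns the cleaned-up docstring of an object
method_doc = inspect.getdoc(Truck.update_cylinders)

print(public_names)
print(instance_attributes)
print(method_doc)
```

In an interactive session, calling `help(Truck)` prints similar information in a readable layout.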