# Lesson 1: Introduction to Python and Object Oriented Programming (OOP)

In order to use this file, please do the following:
*   Go to the top left corner of Google Colab and click  `File -> Save a copy in Drive`

Here's a rough overview of our plan for the weekly curriculum:

* **Week 2: Introduction to Python and OOP**
* Week 3: How to Code Like a Pro
* Week 4: Data Preparation and Cleaning
* Week 5: Data Visualization and Exploratory Data Analysis
* Week 6: Miscellaneous Topics: Command Line, Pathing, and GitHub
* Week 7: Introduction to Machine Learning
* Week 8: Advanced Topics in Machine Learning
* Week 9: How to Succeed Going Forward

Additionally, here is what you can expect in terms of assignments during the curriculum this quarter:
* Homework 1: Assigned during Week 3, due by April 20th at 6PM
* Homework 2: Assigned during Week 4, due by April 27th at 6PM
* Mid-quarter assignment: Assigned during Week 5, due by May 11th at 6PM
* EDA Contest: Assigned during Week 7, due by May 24th at 11PM

These assignments will be important to reinforce understanding of various concepts introduced throughout this curriculum, but we do not intend for them to take obscene amounts of time. We will always be available on Slack throughout the quarter to answer any questions that may arise! We will share further details regarding each assignment later.

Going forward, we will share all curriculum content and assignments via Google Drive. We'll also set up a folder for each of you to submit assignments.

# Outline




1. What is Python?
  * "The basics"
  * If statements
  * Lists & dicts
  * Loops & list comprehension
  * Functions
2. What is OOP?
  * Objects
  * Methods
  * Attributes
3. Objects in Python
  * Basic data types (int, string, float)
  * Built-in data structures (lists, dicts, tuples)
  * Custom objects (classes)

Without further ado, let's begin!

# What is Python?

* "The best programming language." - *Daniel*

* "The greatest programming language." - *Nick*

* "Far superior to R." - *Madison Kohls, DSU Executive Advisor*

Python is a programming language with a wide range of applications. Because it is quick and convenient to prototype in, it's very popular among data scientists, who constantly need to prototype and try new ideas. Its main drawback is that it runs code more slowly than some other programming languages like C++ and Java, but for the most part it's considered the go-to programming language for data science.

![Example python code from Daniel's code editor](https://media.discordapp.net/attachments/738679791804219442/959967183268626472/Screen_Shot_2022-04-02_at_5.06.00_PM.png)

Example Python code from Daniel's code editor (notice the serif font, used by only the most esteemed programmers)

# Python Basics

## Basic operations and data types

Basic Python operations work the way you think they do. Addition, subtraction, multiplication, and division are all incredibly intuitive.

In [1]:
2 + 2

4

In [2]:
100 - 31

69

A common operator you should know about is the modulo (%) operator. It shows you the remainder when you divide the left by the right. 

In [3]:
4 % 3

1

Exponentiation is a bit different in Python than you may be used to. Instead of using `^` to denote the exponent, Python uses `**`.

In [4]:
3**2

9

Note that we can also work with decimals (known as *floats* in Python) by simply writing them out. Internally, 1 is different from 1.0 in Python: 1 is stored as an *integer* and 1.0 is stored as a *float*.

In [5]:
2.3 * 4.5

10.35
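To make the integer/float distinction concrete, here is a small sketch using the built-in `type()` function. The variable names are illustrative only, and the floor-division operator `//` is an extra that this lesson does not otherwise cover.

```python
# Integers and floats are different types, even when they are numerically equal.
whole = 1
decimal = 1.0

print(type(whole))       # <class 'int'>
print(type(decimal))     # <class 'float'>
print(whole == decimal)  # True: equal in value, different in type

# True division (/) always returns a float; floor division (//) rounds down.
quotient = 7 / 2         # 3.5
floored = 7 // 2         # 3
remainder = 7 % 2        # 1

# The floor-division result and the remainder reconstruct the original number.
print(floored * 2 + remainder)  # 7
```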

## Variables

Variables help us store information. Below, we assign the number 4 to the variable `my_variable`. We can then see which value is stored in a given variable by calling `print(my_variable)`.

In [6]:
my_variable = 4
print(my_variable)

4


In [7]:
my_variable * 2
print(my_variable)

4


### Question: Why did the value of my_variable not change?

In [8]:
my_variable = my_variable * 2
print(my_variable)

8


### Question: Why did the value of my_variable change?

When printing the value of variables in Python, we often rely on **f-strings** to format our output in an intuitive way. The general form of an f-string is
```
f"Some text here {var1}. Additional text here {var2}."
```
The variables can be of any of the types discussed so far. Consider the example below.

In [9]:
name = "Nick"
age = 20
print(f"This is {name}. He is {age} years old.")

This is Nick. He is 20 years old.
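F-strings can also evaluate expressions and apply format specifiers inside the braces. The sketch below uses made-up values (`price`, `quantity`) purely for illustration.

```python
# Hypothetical values, chosen only to illustrate f-string formatting.
price = 3.14159
quantity = 3

# Expressions inside the braces are evaluated before being inserted.
total_line = f"Total: {price * quantity:.2f}"

# :.2f rounds to two decimal places; :>6 right-aligns in a field of width 6.
rounded = f"{price:.2f}"
padded = f"[{quantity:>6}]"

print(total_line)  # Total: 9.42
print(rounded)     # 3.14
print(padded)      # [     3]
```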


**Strings** are how Python handles text data. They're surrounded by quotes to tell Python that it's dealing with text.

In [10]:
my_string = "Hello!"
print(my_string)

Hello!


You can add text to strings by using the + operator. Note that you can't subtract from strings like you would with basic arithmetic.

In [11]:
my_string + " Kevin!"

'Hello! Kevin!'

In order to access a specific character in a string, we use something called *indexing*. Each character has a position in a string called an index (starting at position 0), and in order to access a character in a given position we simply write ``string_name[index]``.

In [12]:
my_string[5]

'!'

In order to access a *subset* of a string, we can index with a colon. ``my_string[start:end]`` gives us the characters from the start index to the end index. Note that start and end **MUST** be integers, **NOT** floats. This is a common error.

In [13]:
my_string[0:2]

'He'

When indexing a subset of a string, the integer value passed in for *end* is exclusive. For example `"hello"[0:1]` will return `h` rather than `he` as one might expect.

When indexing a string, we can also define a *step* that defines how often Python should access characters over a given interval. The general form of this indexing is ``my_string[start:end:step]``. If the value of step is not defined, it is taken to be 1.

In [14]:
my_string[0:5:1]

'Hello'
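The start, end, and step values can each be omitted or made negative. The following sketch reuses the `"Hello!"` string from above to show a few common variations; negative indices and negative steps are an addition not covered elsewhere in this lesson.

```python
my_string = "Hello!"

every_other = my_string[0:6:2]     # characters at indices 0, 2, 4
from_start = my_string[:3]         # omitting start means "from the beginning"
to_end = my_string[3:]             # omitting end means "through the last character"
last_char = my_string[-1]          # negative indices count from the end
reversed_string = my_string[::-1]  # a step of -1 walks the string backwards

print(every_other)      # Hlo
print(from_start)       # Hel
print(to_end)           # lo!
print(last_char)        # !
print(reversed_string)  # !olleH
```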

Note that we can also denote a string with single quotation marks. However, we run into issues if the string itself contains single quotation marks. In that case, we can wrap the string in double quotation marks instead.

In [15]:
my_string2 = 'Hello!'
print(my_string2)

Hello!


In [16]:
my_string3 = 'He said 'hello' to me'
print(my_string3)

SyntaxError: invalid syntax (<ipython-input-16-53b82cb22491>, line 1)

In [18]:
my_string3 = "He said 'hello' to me"
print(my_string3)

He said 'hello' to me


In Python, we can insert **comments** into our code by using the hash character `#`. Comments are extremely useful when writing code because they allow us to insert commentary and describe what a function does, further enabling our code to be shared and better understood. Consider the code block below.

In [19]:
# creates new string
greeting = "Hello there!"
# prints contents of string
print(greeting)

Hello there!


We can also comment out lines of code so that they don't run. Notice that nothing is printed by the code block below.

In [20]:
# creates new string
greeting = "Hello there!"
# prints contents of string
# print(greeting)


Here, our comments allow us to guide the reader through exactly what our code does. When you write more complicated code, this is extremely useful--especially for yourself.

## If statements and conditionals

In [21]:
if my_variable > 5:
    print("Let's gooooooo")

Let's gooooooo


If statements are ways to let us interact with our variables conditionally. They are some of the fundamental building blocks of all programming languages.

```
if condition:
  do_stuff()
```

In [22]:
3>=4

False

If statements in Python are super easy to write and tend to read like English. Above is the general format for all if statements. Very clean and easy!

## WTF does a condition **look** like?

* statements with relational operators
  - ``(3 > 4)``, ``(4 == 3)``, ``(9 <= 3)``, ``(x >= y)``, ``(3 != 4)``
  

*  statements with boolean operators (``and``, ``or``, ``not``)
  - ``(not True)``, ``(True or True)``, ``(True and False)``
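Here is a brief sketch of how relational and boolean operators combine inside an if statement. The variables (`temperature`, `is_raining`, `plan`) are hypothetical, and the `elif`/`else` branches are shown as an extension of the basic `if` form above.

```python
# Hypothetical values, used only to illustrate how conditions combine.
temperature = 72
is_raining = False

# A relational comparison evaluates to a boolean (True or False).
is_warm = temperature >= 65

# Boolean operators combine or negate conditions.
good_beach_day = is_warm and not is_raining

# elif and else handle the cases where earlier conditions were False.
if good_beach_day:
    plan = "beach"
elif is_raining:
    plan = "movie"
else:
    plan = "hike"

print(is_warm)         # True
print(good_beach_day)  # True
print(plan)            # beach
```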

## Lists and Dicts

Sometimes, variables can hold more than one value. We call these things lists, and they work similarly to vectors in R.

In [23]:
x = [1,2,3]
x

[1, 2, 3]

To access an element in a list, we index it just as we would a string.

In [24]:
x[0]

1

In [25]:
x[0] + x[2]

4

In [26]:
x[3]

IndexError: list index out of range

The last code block failed because we tried to find the element indexed at 3, whereas our list *x* only has elements indexed up to 2 (indices 0, 1, and 2). Whenever you try to access an index that doesn't exist, Python raises an **IndexError**.

In [27]:
x.append('Daniel')
x

[1, 2, 3, 'Daniel']

To add a new element to a list, simply write ``list_name.append(value)``. Note that a list can hold multiple different types of data; unlike vectors in R, lists aren't restricted to holding just one data type.

Also note that lists have a fixed order in which elements appear, which allows us to access the elements in the fashion seen above.
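The list operations described so far can be sketched together as follows. The list contents are made up for illustration, and negative indexing and slicing are carried over from the string section on the assumption that they are useful here too.

```python
# A list can mix data types and keeps its elements in order.
mixed = [1, 2.5, "three", True]

first = mixed[0]      # indexing starts at 0
last = mixed[-1]      # negative indices count from the end
middle = mixed[1:3]   # slicing works as it does for strings (end is exclusive)

mixed.append("new")   # append adds one element to the end, in place
length = len(mixed)   # len counts the elements

print(first)   # 1
print(last)    # True
print(middle)  # [2.5, 'three']
print(length)  # 5
```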

## Dicts

In [28]:
y = {"DSU_members": ["Daniel", "Nick", "Vince"]}

Dictionaries are a bit fancier than lists. Instead of storing elements by position, dictionaries contain pairs of **KEYS** and **VALUES**. The general format looks like:

``{key1: value1, key2: value2, ...}``

In [29]:
y['DSU_members']

['Daniel', 'Nick', 'Vince']

In [30]:
# Does not access value of first key
y[0]

KeyError: 0

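To access elements, the syntax is ``dict_name[key]``. Keys are most often strings (aka words or phrases surrounded by quotes), but they can be any hashable (roughly, immutable) value, such as an integer or a tuple. Values can be any data type (integer, string, list, etc.). Unlike lists, dictionaries are looked up by key rather than by position, so we cannot access values contained within the dictionary by index: ``y[0]`` above looks for a key equal to 0, which doesn't exist.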
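As a rough illustration of key lookups, the sketch below builds a hypothetical dictionary named `lookup`. Strings are by far the most common keys, but any hashable value works, and the `in` operator and the `.get()` method offer ways to avoid a `KeyError` when a key may be missing.

```python
# Keys can be strings, numbers, or tuples (hypothetical data for illustration).
lookup = {"name": "Nick", 42: "an integer key", (1, 2): "a tuple key"}

print(lookup["name"])   # Nick
print(lookup[42])       # an integer key
print(lookup[(1, 2)])   # a tuple key

# The `in` operator checks whether a key exists.
has_age = "age" in lookup            # False

# .get() returns a default value instead of raising KeyError for a missing key.
age = lookup.get("age", "unknown")   # 'unknown'

print(has_age)
print(age)
```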

In [31]:
y['Nick_is_awesome'] = True
y

{'DSU_members': ['Daniel', 'Nick', 'Vince'], 'Nick_is_awesome': True}

In [32]:
y['Nick_is_awesome'] = False
y

{'DSU_members': ['Daniel', 'Nick', 'Vince'], 'Nick_is_awesome': False}

In order to add a new entry, you simply access the new key you want to add and set it equal to the value you want. If the key doesn't yet exist, it'll create a new one, but if it already exists, you'll overwrite the existing value.

## Loops and List Comprehension

In [35]:
best_dsu_members = ['Daniel', 'Nick']

for member in best_dsu_members:

    if member != 'Daniel':
        print('yoo whats good')
        print('Yoooo I like ' + member)
    else:
        print('Damnn ' + member + ' is kinda mid lowkey...')

Damnn Daniel is kinda mid lowkey...
yoo whats good
Yoooo I like Nick



Like if statements, loops also read like English. The general syntax is:

```
for variable in list:
  do_stuff()
```
The for loop goes through each element in the list, one at a time, and runs the indented code once per element. For loops are a good way to iterate through all the elements of a list or dictionary.
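Iterating over a dictionary works slightly differently from iterating over a list. The sketch below uses a made-up `prices` dictionary: looping over it directly yields the keys, while the `.items()` method yields key/value pairs.

```python
# Hypothetical data: item names mapped to prices.
prices = {"apple": 0.50, "banana": 0.25, "cherry": 3.00}

# Looping over a dictionary directly yields its keys.
items = []
for item in prices:
    items.append(item)

# .items() yields (key, value) pairs, which can be unpacked in the loop header.
total = 0
for item, price in prices.items():
    total = total + price

print(items)  # ['apple', 'banana', 'cherry']
print(total)  # 3.75
```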

In [36]:
for index in range(10):
    if index % 5 == 0:
        print(index)

0
5


In [37]:
x = [1,2,43]
len(x)

3

If you want to iterate through a list of numbers, the ``range()`` function is super useful. 

If you call ``range(n)``, you get back an *iterable* (similar to a list, but not the same thing) containing the numbers 0 through n-1. So ``range(3)`` will give you (0, 1, 2) to iterate through.

In [38]:
list(range(0, 10, 2))

[0, 2, 4, 6, 8]

You can also pass in a start and stop, as well as a step parameter. ``range(2, 5)`` will return (2, 3, 4), and ``range(0, 10, 2)`` will give you the numbers from 0 up to (but not including) 10 in steps of 2, so the iterable will look like (0, 2, 4, 6, 8).
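A few `range()` calls side by side may help. Wrapping each call in `list()` is only there to make the contents visible, and the `letters` list is a hypothetical example of looping over indices.

```python
print(list(range(3)))          # [0, 1, 2]
print(list(range(2, 5)))       # [2, 3, 4]
print(list(range(10, 0, -3)))  # [10, 7, 4, 1]  (a negative step counts down)

# A common pattern: loop over the indices of a list.
letters = ["a", "b", "c"]
labelled = []
for index in range(len(letters)):
    labelled.append(f"{index}: {letters[index]}")

print(labelled)  # ['0: a', '1: b', '2: c']
```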

In [39]:
all_dsu_members = ['Sean', 'Tristan', 'Vince', 'Emily', 'Nick', 'Daniel']

short_names = [member for member in all_dsu_members if len(member) <= 5]
short_names

['Sean', 'Vince', 'Emily', 'Nick']

Python lets you define lists with for loops inside (which is pretty dang cool). This is called **list comprehension**. In the case of this list comprehension, we iterate through each element of `all_dsu_members`, and then add it to the `short_names` list if the length of their name is less than or equal to 5 characters.

This can be VERY convenient when trying to create lists with filtered data.
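To see what a list comprehension is doing, it can help to write out the equivalent for loop. The sketch below reuses `all_dsu_members` from above; `short_names_loop` and `name_lengths` are names introduced here for illustration.

```python
all_dsu_members = ['Sean', 'Tristan', 'Vince', 'Emily', 'Nick', 'Daniel']

# The explicit loop that the list comprehension replaces.
short_names_loop = []
for member in all_dsu_members:
    if len(member) <= 5:
        short_names_loop.append(member)

# The comprehension form: [expression for item in iterable if condition].
short_names = [member for member in all_dsu_members if len(member) <= 5]

# The expression can also transform each element, here into its length.
name_lengths = [len(member) for member in all_dsu_members]

print(short_names_loop == short_names)  # True
print(name_lengths)                     # [4, 7, 5, 5, 4, 6]
```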

In [40]:
i = 0

while i < 10:

    print(i)
    i = i + 1

0
1
2
3
4
5
6
7
8
9


Another common loop is the *while* loop, which works similarly to a for loop. However, instead of iterating through a list, while loops repeat UNTIL their condition is false. In the case of the above loop, the variable `i` is initialized to 0, and then we keep increasing the value of `i` each time the loop runs. The loop will only stop when the value of `i` is NOT less than 10, i.e., it's greater than or EQUAL to 10. 

Note: while loops can be dangerous. If their condition never becomes false, they will run forever, and your program will hang (and may eventually crash). This is called an infinite loop, and you should always try to avoid them.
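One possible way to guard against an infinite loop is to cap the number of iterations and exit with `break`. This is only a sketch: the halving example and the `max_steps` value are illustrative assumptions, not a general rule.

```python
# Hypothetical example: halve a number until it drops below 1.
value = 100
steps = 0
max_steps = 1000  # safety cap (an illustrative choice)

while value >= 1:
    value = value / 2
    steps = steps + 1
    if steps >= max_steps:
        # break exits the loop immediately, even if the condition is still True.
        break

print(steps)  # 7
print(value)  # 0.78125
```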

## Functions

In [41]:
def is_name_short(name):
    return len(name) <= 5
  

is_name_short('Sean')

True

Functions are super useful in Python and let us package a bunch of code that does a useful operation with ease and convenience.

The general form of most functions is:

```
def function_name(param1, param2, ...):
  do_stuff_with_params()
  ...
  ...
  return stuff
```

A few key pointers:

- Each function has *parameters*, which are what you input to the function. These can take on any data type, but functions with a set purpose often expect a specific data type for their inputs.

- Functions often have multiple lines of code that do something meaningful with the inputs.

- At the end of the function, if you want to output some information, be sure to include a **return** statement, which ensures that your function outputs what you wanted.
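The pointers above can be sketched with a small function. The function `greet` and its default parameter value are hypothetical, chosen only to show parameters, a return statement, and the different ways of passing arguments.

```python
def greet(name, greeting="Hello"):
    """Return a greeting for `name`; `greeting` falls back to "Hello"."""
    return f"{greeting}, {name}!"

default_message = greet("Sean")                       # uses the default greeting
custom_message = greet("Emily", "Welcome")            # overrides it by position
keyword_message = greet(name="Vince", greeting="Hi")  # arguments passed by keyword

print(default_message)  # Hello, Sean!
print(custom_message)   # Welcome, Emily!
print(keyword_message)  # Hi, Vince!
```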

In [47]:
def is_name_cool(name):
    cool_names = ['Daniel', 'Nick']
    cool_name = True
    if name in cool_names:
        cool_name = True
    else:
        cool_name = False

print(is_name_cool('Sean'))

None


## Question: This code didn't throw an error, but the function didn't output what we wanted to. What went wrong here?

In [45]:
def mean(data_list):

    total_value = 0
    for item in data_list:
        total_value = total_value + item

    total_value = total_value / len(data_list)

    return total_value

mean([1, 2, 3, 4, 9])

3.8

This is an implementation of a function which calculates the mean of a list of numbers. There are a few ways it could be improved, but this example is meant to illustrate the use of functions and a simple for loop algorithm.
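As one possible improvement, the sketch below uses the built-in `sum()` function in place of the explicit loop and raises a clearer error for an empty list. The choice of `ValueError` and its message are assumptions made for illustration.

```python
def mean(data_list):
    """Return the arithmetic mean of a non-empty list of numbers."""
    if len(data_list) == 0:
        # Dividing by zero would raise ZeroDivisionError; fail with a clearer message.
        raise ValueError("mean() requires at least one value")
    # sum() adds up the elements, replacing the explicit for loop.
    return sum(data_list) / len(data_list)

print(mean([1, 2, 3, 4, 9]))  # 3.8
```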

In [13]:
def mystery_function(my_string):

    out_str = ''
    for idx in range(0, len(my_string), 2):
        out_str += my_string[idx]

    return out_str

## Question: What does this function do?

In [14]:
mystery_function('Daniel and Nick!')

'Dne n ik'

# Classes

Python is an *object-oriented* programming language, meaning that it is designed so that most of the code we write manipulates **objects**.

**Everything in Python is an object** with a specific data `type`. *Classes* in Python provide the mechanism for creating *objects* or *instances* of each type, and defining each type's *attributes* and *methods*. Attributes are variables that belong to an object/class instance, and methods are functions that belong to an object/class instance.

Here's an example of how to use them, taken from the Python tutorial:

In [48]:
class MyClass:
    """A simple example class"""
    def __init__(self, data):
        self.i = data
        self.name = 'Nick'
        print("I have been initialized!")

    def f(self):
        return 'hello world'

In [49]:
my_object = MyClass(1234)

I have been initialized!


Note that most classes define an `__init__()` method. This tells the class how to initialize its attributes. When defining a class, `__init__()` determines how people create *instances* of your class.

Another note: every function WITHIN a class (we call these *methods*) takes a first parameter that is conventionally called `self`. Methods access object attributes (see below) through `self`, so you need to include `self` as the first parameter of each method in the class. Python fills it in automatically when you call the method on an instance. You can think of the `self` parameter as holding all the attribute information for a given object.
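To illustrate what `self` refers to, here is a minimal sketch with a hypothetical `Counter` class. Calling a method on an instance is equivalent to calling it on the class and passing the instance explicitly.

```python
class Counter:
    """Hypothetical class used only to illustrate `self`."""
    def __init__(self, start):
        self.count = start  # store the starting value on this instance

    def increment(self):
        # `self` refers to the instance the method was called on.
        self.count = self.count + 1
        return self.count

first = Counter(0)
second = Counter(10)

first.increment()          # Python passes `first` as `self` automatically
Counter.increment(second)  # equivalent to second.increment()

print(first.count)   # 1
print(second.count)  # 11
```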

In [50]:
print(my_object.name)  # print instance attribute
my_object.f()          # call instance method

Nick


'hello world'

Here, ``my_object`` is called an *instance* of the class `MyClass`, and `i` is an *attribute* of ``my_object``, or a piece of information about ``my_object``. We can also create classes that allow us to specify attributes of our own. For example, consider the `Person` class below with attributes name and height.

In [51]:
class Person:
    """A class to give information about your name"""
    def __init__(self, name = 'Anonymous', height = 0):
        self.name = name
        self.height = height
        print(f'Hi, {name}! Class initialized.')

    def is_tall(self):
        if self.height > 64:
            return True
        else:
            return False



In [52]:
person1 = Person()

Hi, Anonymous! Class initialized.


Here, we didn't specify attributes so the instance is initialized with the default attributes. We can access these attributes as follows:

In [53]:
person1.name

'Anonymous'

In [54]:
person1.height

0

Instead, let's now initialize an instance of the class `Person` with name attribute Nick and height attribute 72.

In [55]:
person2 = Person('Nick', 72)

Hi, Nick! Class initialized.


In [56]:
person2.height

72

In our class definition above, we also defined a method `is_tall()`. We can *call* the method on an instance person2 of the class `Person` as follows:

In [57]:
person2.is_tall()

True

We will initialize another instance of the class `Person` with name attribute Danny (DeVito) and height attribute 58.

In [58]:
person2 = Person('Danny', 58)

Hi, Danny! Class initialized.


In [59]:
person2.is_tall()

False

When defining classes, we can also use docstrings (in triple quotations) to provide information about what the class does. Consider the class ``Car`` below.

In [60]:
class Car():
    """
    Simple class that has the make and model of a car and allows user to accelerate it.
    """
    def __init__(self, make="Unknown", model="Unknown"):
        self.make = make
        self.model = model
        self.speed = 0
        print(f'You have created a {make} {model}. Nice choice!')

    def accelerate(self, x):
        self.speed += x  # += is shorthand for incrementing a variable's value

    def get_speed(self):
        print(f'You are going {self.speed} miles per hour!')

## Question: Is this a good docstring?

Below, we will create an instance `civic` of the class `Car`. We will then call the method `accelerate()` on this instance to increase its speed by 10 mph. We will call the method again to increase its speed by 15 mph.

In [61]:
civic = Car('Honda', 'Civic')

You have created a Honda Civic. Nice choice!


In [62]:
civic.accelerate(10)
civic.get_speed()

You are going 10 miles per hour!


In [63]:
civic.accelerate(15)
civic.get_speed()

You are going 25 miles per hour!


![](https://cdn.discordapp.com/attachments/400839337052209152/960292893262753883/Screen_Shot_2022-04-03_at_2.40.57_PM.png)

An example docstring from Daniel's job.
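For comparison, here is a sketch of a more detailed docstring on a hypothetical standalone function. The `Args:`/`Returns:` layout is one common convention among several, not a requirement of Python.

```python
def accelerate(speed, increase):
    """Return the new speed after accelerating.

    Args:
        speed: Current speed in miles per hour.
        increase: Amount to add to the speed, in miles per hour.

    Returns:
        The updated speed in miles per hour.
    """
    return speed + increase

# The docstring is stored on the function object and is what help() displays.
first_line = accelerate.__doc__.splitlines()[0]

print(first_line)          # Return the new speed after accelerating.
print(accelerate(10, 15))  # 25
```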

# EVERYTHING is an object (even functions)

In [64]:
x = 4
type(x)

int

Almost everything we've covered so far has been an object. Integers, floats, strings, lists, dicts, etc. are all objects which Python has implemented for you. Things like strings, lists, and dicts also have attributes and methods.

In [65]:
x = {"Hey there!": "What's up?"}
x.keys()

dict_keys(['Hey there!'])

You can also initialize built-in objects like lists and dictionaries as you would a custom object.

In [66]:
x = dict(x=4, y=3)
x

{'x': 4, 'y': 3}
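The same pattern holds for the other built-in types: calling the type name creates a new object, and the resulting object has methods of its own. The values below are arbitrary examples.

```python
# Calling a built-in type creates (or converts to) an object of that type.
number = int("42")
text = str(3.5)
letters = list("abc")
pairs = dict([("x", 4), ("y", 3)])

# Built-in objects have methods, just like instances of custom classes.
shout = "hello".upper()
letters.reverse()  # reverses the list in place

print(number + 1)               # 43
print(text)                     # 3.5
print(letters)                  # ['c', 'b', 'a']
print(pairs)                    # {'x': 4, 'y': 3}
print(shout)                    # HELLO
print(isinstance(number, int))  # True
```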

This is all we'll cover about classes for now, but more detailed information can be found in the Python documentation: https://docs.python.org/3/tutorial/classes.html#a-first-look-at-classes.

All of the data science libraries we will be using introduce new classes, and we need to understand how to use them effectively for our data science needs. Below, we'll provide examples of just how fundamental classes will be throughout your data science journey. Don't worry too much about the details yet--we'll cover them in later weeks, but wanted to give you a glimpse into what you'll be learning.

## Imports and Packages

One of the coolest things about Python is the variety of **packages** available. Packages are collections of code that people publish online for anybody using Python to use. The number of packages available for data science is one of the main draws of Python.

In [67]:
import numpy
x = numpy.array([1,2,3])
type(x)

numpy.ndarray

Notice how x is NOT a list. Instead, it's a `numpy.ndarray` (NumPy n-dimensional array). This is a custom object implemented in the numpy package, and it has a lot of cool features that make it more efficient and convenient to use than default Python lists.

In the above code snippet, the package "numpy" was imported. In general, after you've installed an external package (we'll go over how to do this in detail later), you simply write `import [package_name]` to get access to all the classes and methods in that package.

## What does a package actually **LOOK** like?

Under the hood, packages are just collections of Python files. Oftentimes they provide useful class definitions and methods which help you speed up certain computation tasks.

![](https://cdn.discordapp.com/attachments/492430462053253123/960285868596293682/Screen_Shot_2022-04-03_at_2.12.56_PM.png)

Above is the Array class as seen in the NumPy source code. Fun fact: NumPy is built to be efficient, so a lot of the computations are actually done in compiled C code and exposed to Python through something called a wrapper.

In [68]:
import numpy as np 
np.array([1, 2, 3])

array([1, 2, 3])

Note that you can choose a custom name to reference packages when you import them. In this case, instead of writing ``numpy.array`` or ``numpy.[anything]``, we write ``np.[anything]`` since we included the ``as np`` term in the import statement.

In [69]:
import numpy as np
from numpy.linalg import norm
x1 = np.array([3, 4])

norm(x1)

5.0

Note that we can also import *specific* functions and classes directly by using the **from** keyword. In the above code example, we import the *norm* function (which finds the magnitude of an array, like you would compute for a vector) from `numpy.linalg`. Without that import, we'd have to write `np.linalg.norm(x1)`, but because we wrote `from numpy.linalg import norm`, we can just write `norm(x1)`, which saves us time.
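The three import styles can be compared side by side. The sketch below uses the standard-library `math` module instead of NumPy so that it runs without installing anything; the alias `m` is an arbitrary choice.

```python
import math            # access names as math.sqrt, math.pi, ...
import math as m       # the same module under a shorter alias
from math import sqrt  # import one name directly

print(math.sqrt(16))  # 4.0
print(m.sqrt(16))     # 4.0
print(sqrt(16))       # 4.0
print(math is m)      # True: both names refer to the same module object

# The magnitude of the vector (3, 4), computed by hand.
length = sqrt(3**2 + 4**2)
print(length)         # 5.0
```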

## Imports in action #DataScience

Lasso is a type of regression model commonly used in machine learning (we plan to touch on this during Week 7, so stay tuned) that uses shrinkage to perform variable selection and reduce model complexity. Below, we import the `Lasso` class from the `sklearn.linear_model` module. We then create an instance of the class `Lasso` called `clf`.

In [70]:
from sklearn.linear_model import Lasso
clf = Lasso()

Next, we call the method `.fit()` on the instance `clf` of the class `Lasso` to fit a regression model to some arbitrary data points.

In [71]:
clf.fit([[1,4], [2,6], [6,6], [7,10]], [1,2,6,7])

Lasso()

We can then extract the model's coefficients, which are stored as attributes of `clf`.

In [72]:
print(clf.coef_)

[0.84615385 0.        ]


In [73]:
print(clf.intercept_)

0.6153846153846154


Don't worry--you don't need to understand the specifics of this yet. However, this illustrates just how important classes will be in your exploration of data science.

## Jupyter Notebook

### How familiar are you with Jupyter Notebook? 

a. I have it up and running already! 

b. I tried to install it, but I still have some questions.

c. I've got no clue what to do and I wouldn't mind some help. 

### A quick demonstration of Anaconda & Jupyter Notebook


1.   Open the Anaconda Navigator
2.   Scroll down and launch Jupyter Notebook
3.   A webpage should automatically pop up with many folders
4.   For example, if you downloaded the notebook we posted earlier in Slack, you should be able to find it under the Downloads folder (or whichever folder you saved it in)
5.   By double clicking on the .ipynb file, you can follow along within the Jupyter notebook in the browser

Here's a 30-minute introduction to Jupyter Notebook: https://youtu.be/HW29067qVWk







### Basic Shortcuts in Jupyter Notebook

* Run cell: `Ctrl + Enter`
* Run cell & select below: `Shift + Enter`
* Run cell & insert below: `Option + Enter` (mac); `Alt + Enter` (windows)

### Essential Shortcuts for Fast Workflow

Understand the difference between **edit** and **command** mode.
*   Activate Edit Mode: Click on a cell or Press `Enter`
*   Activate Command Mode: `Esc`

Requires Command Mode:
* Add a cell above the current cell: `a`
* Add a cell below the current cell: `b`
* Delete the current cell: `dd`
* Markdown (to type text): `m`
* Cut cell: `x`
* Copy cell: `c`
* Paste cell below: `v`







# Anonymous feedback

If you have any feedback for us, please let us know! The feedback form is completely anonymous, and we promise we'll take your suggestions into consideration for future meetings: https://forms.gle/eve5noVB9ccXbomM7

# References

Throughout the quarter, we will mainly be drawing our material from the following sources. Most of your learning will be done through trial and error, so we strongly encourage you to experiment by running code that you write from scratch!

For basic Python:
* The Python Tutorial: https://docs.python.org/3/tutorial/
* Basics of Python 3: https://www.learnpython.org/
* Codecademy Python 3 Course: https://www.codecademy.com/learn/learn-python-3

And for the rest of the quarter:
* Introducing Data Science: http://bedford-computing.co.uk/learning/wp-content/uploads/2016/09/introducing-data-science-machine-learning-python.pdf 
* Python for Data Analysis: http://bedford-computing.co.uk/learning/wp-content/uploads/2015/10/Python-for-Data-Analysis.pdf 
* Pandas user guide: https://pandas.pydata.org/pandas-docs/stable/user_guide/index.html 
* Sklearn user guide: https://scikit-learn.org/stable/user_guide.html 

# Credits

Primary Contributors:
* Daniel Mendelevitch
* Nick Monozon



Secondary Contributors:
*   Sean Tjoa
*   Tristan Dewing
*   Andy Chen
*   Emily Gong
*   Tara Jaigopal

