## Python vs. SQL

#### SQL

- specialized (domain-specific) language, tied to data stored in relational databases
- goal when writing queries is to optimize them for faster data I/O
- white-box, meaning everything about the data extraction and transformation can be understood on a fundamental level from the query itself
- queries often take more lines than other languages need for a similar task
- no loops in plain queries and limited support for functions (procedural extensions such as PL/pgSQL or T-SQL add both)

#### Python

- general-purpose language, with packages to solve a variety of problems (data analysis, web development, game design, etc.)
- can query data much as SQL does, either by running SQL directly (e.g. the built-in `sqlite3` module) or through packages such as pandas
- black-box, meaning the internals of imported packages usually aren't fully understood and are taken on faith to work correctly (generally a fair assumption for widely used packages)
- emphasis on writing minimalist code, with as few lines as necessary
- object-oriented language with loops, functions, classes, and other advanced features
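To make the comparison concrete, here is a minimal sketch that answers the same question twice: once with a SQL `GROUP BY` run through Python's built-in `sqlite3` module against an in-memory database, and once with a plain Python loop. The `sales` table and its rows are made-up example data, not from the original notes.

```python
import sqlite3

# Hypothetical example data: (region, amount) pairs
rows = [("east", 10), ("west", 5), ("east", 7), ("west", 3)]

# SQL version: load the rows into an in-memory SQLite table and aggregate declaratively
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
sql_totals = dict(
    conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region")
)
conn.close()

# Python version: the same aggregation written as an explicit loop
py_totals = {}
for region, amount in rows:
    py_totals[region] = py_totals.get(region, 0) + amount

print(sql_totals)  # {'east': 17, 'west': 8}
print(py_totals)   # {'east': 17, 'west': 8}
```

The SQL version states *what* result is wanted; the Python version spells out *how* to build it step by step.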

## Python Data-Types

### Built-in types
#### Basic types
- int: integers
- float: floating-point numbers
- complex: complex numbers
- str: strings
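A quick sketch (not from the original notebook) that checks one value of each basic type with the built-in `type()` function; the values themselves are arbitrary examples.

```python
# One literal of each basic type, checked with the built-in type() function
values = [42, 3.14, 2 + 3j, "apples"]
for v in values:
    print(type(v).__name__, v)
# int 42
# float 3.14
# complex (2+3j)
# str apples

# True division (/) always returns a float; floor division (//) of two ints returns an int
print(7 / 2, 7 // 2)  # 3.5 3
```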

#### Storage containers
- list - ordered sequence of items that can be modified (mutable)
- tuple - ordered sequence of items that CAN'T be modified (immutable)
- set - unordered collection of unique items (duplicates are discarded)
- dictionary (dict) - collection of key-value pairs with no duplicate keys allowed

#### Extra:
- range()

In [55]:
l = ['a', 'b', 'c', 'a'] # list
t = ('a', 'b', 'c', 'a') # tuple
s = {'a', 'b', 'c', 'a'} # set
d = {'a':1, 'b':2, 'c':3, 'd':4} # dictionary

In [56]:
l

['a', 'b', 'c', 'a']

In [57]:
t

('a', 'b', 'c', 'a')

In [58]:
# Note that the duplicate 'a' is dropped from the set (set order is not guaranteed)
s

{'a', 'b', 'c'}

In [59]:
d

{'a': 1, 'b': 2, 'c': 3, 'd': 4}

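#### List Operations

Tuples support only the read-only operations below (`in`, iteration, indexing, slicing). Sets support `in`, iteration, and `.remove()`, but use `.add()` instead of `.append()` and have no indexing or slicing.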

In [60]:
l.append('d')
l

['a', 'b', 'c', 'a', 'd']

In [61]:
l.remove('d')
l

['a', 'b', 'c', 'a']

In [62]:
# remove() deletes only the first matching element
l.remove('a')
l

['b', 'c', 'a']

In [63]:
l[2] = 'e'
l

['b', 'c', 'e']

In [64]:
'e' in l

True

In [65]:
# Iterating through each element in the list l
# x is bound to the next item in l on each iteration
for x in l:
    print(x)

b
c
e


In [66]:
if 'a' in l:
    print(True)
else:
    print(False)

False


In [67]:
# different ways of accessing values in a list:
# the whole list, the first element, the last element ([-1] counts from the end),
# everything except the last element, and everything from index 1 onward
for i in [l, l[0], l[-1], l[:-1], l[1:]]:
    print(i)

['b', 'c', 'e']
b
e
['b', 'c']
['c', 'e']
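Because tuples and sets don't share every list operation, here is a short sketch of how they differ, reusing the same example values as above.

```python
t = ('a', 'b', 'c', 'a')
s = {'a', 'b', 'c', 'a'}

# Tuples are immutable: there is no append(), and item assignment fails
try:
    t[0] = 'z'
except TypeError as err:
    print("tuple:", err)  # 'tuple' object does not support item assignment

# Read-only operations still work on tuples
print(t[-1], t[1:3], t.count('a'))  # a ('b', 'c') 2

# Sets are mutable but unordered: add() instead of append(), and no indexing
s.add('d')
s.remove('a')
print(sorted(s))  # ['b', 'c', 'd']
try:
    s[0]
except TypeError as err:
    print("set:", err)  # 'set' object is not subscriptable
```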


#### Dictionary (dict) Operations

In [68]:
d

{'a': 1, 'b': 2, 'c': 3, 'd': 4}

In [69]:
d['e'] = 'apples'
d

{'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 'apples'}

In [70]:
if 'e' in d:
    del d['e']
d

{'a': 1, 'b': 2, 'c': 3, 'd': 4}

In [71]:
# Compare this to the list iteration, where there was only one loop variable (x)
# Now there are two (key, value) b/c dicts store key-value pairs
# .items() yields each pair as a (key, value) tuple; iterating over d directly yields only the keys
for key, value in d.items():
    print(key, value)

a 1
b 2
c 3
d 4
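A few other everyday dict methods, shown as a small sketch on the same example dict (insertion order is preserved in Python 3.7+).

```python
d = {'a': 1, 'b': 2, 'c': 3, 'd': 4}

# values() returns just the values, in insertion order
print(list(d.values()))  # [1, 2, 3, 4]

# get() returns a default instead of raising KeyError for a missing key
print(d.get('b', 0))  # 2
print(d.get('z', 0))  # 0

# pop() removes a key and returns its value; the default avoids a KeyError
print(d.pop('d', None))  # 4
print(d.pop('e', None))  # None
print(d)  # {'a': 1, 'b': 2, 'c': 3}
```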


#### Extra: range()

In [109]:
# the stop value is never included, so range(5) covers 0 through 4
# with a single argument, that argument is the stop value
# the start value defaults to 0
range(5)

range(0, 5)

In [78]:
range(2,5)

range(2, 5)

In [80]:
range(2, 10, 2) # arguments are (start, stop, step); range() only accepts them positionally

range(2, 10, 2)

In [81]:
for i in range(2, 10, 2):
    print(i)

2
4
6
8
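A range object doesn't store its values, but it can be measured, indexed, and tested for membership. A short sketch:

```python
r = range(2, 10, 2)

# range() produces values lazily; list() materializes them
print(list(r))   # [2, 4, 6, 8]
print(len(r))    # 4
print(6 in r)    # True
print(r[-1])     # 8

# A negative step counts down; the stop value is still excluded
print(list(range(10, 0, -3)))  # [10, 7, 4, 1]
```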


### Imported Types
- datetime
- numpy arrays
- pandas Series and DataFrames

### Notes on numpy arrays and pandas DataFrames
- Each DataFrame column is a Series object
- DataFrames speed up operations significantly because an operation is applied to a whole column, or the entire dataset, in one vectorized sweep instead of element by element in a Python loop
- The same principle applies to numpy arrays, which pandas is built on
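A minimal sketch of the difference between element-wise and vectorized code: the loop squares one value at a time, while the numpy and pandas versions apply `** 2` to the whole array or column in one expression. No timing is shown here; the speed advantage of the vectorized form grows with the size of the data.

```python
import numpy as np
import pandas as pd

values = list(range(1, 6))

# Element-wise: an explicit Python loop touches one value at a time
squared_loop = []
for v in values:
    squared_loop.append(v ** 2)

# Vectorized: one expression applied to the whole array at once
arr = np.array(values)
squared_vec = arr ** 2

# The same idea on a DataFrame column (each column is a Series)
df = pd.DataFrame({'x': values})
df['x_squared'] = df['x'] ** 2

print(squared_loop)              # [1, 4, 9, 16, 25]
print(squared_vec)               # [ 1  4  9 16 25]
print(df['x_squared'].tolist())  # [1, 4, 9, 16, 25]
```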

#### datetime

In [82]:
import datetime

In [84]:
datetime.datetime.today()

datetime.datetime(2017, 9, 27, 10, 35, 20, 518788)

In [85]:
datetime.date.today()

datetime.date(2017, 9, 27)

In [89]:
datetime.date.today() - datetime.timedelta(days=1)

datetime.date(2017, 9, 26)
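Beyond subtracting a fixed `timedelta`, dates can be subtracted from each other and converted to and from text. This sketch uses fixed example dates so the output is reproducible.

```python
import datetime

# Fixed example dates so the output is reproducible
start = datetime.date(2017, 9, 1)
end = datetime.date(2017, 9, 27)

# Subtracting two dates gives a timedelta
gap = end - start
print(gap.days)  # 26

# Adding a timedelta moves a date forward
print(start + datetime.timedelta(weeks=2))  # 2017-09-15

# Format a date as text, and parse text back into a date
print(end.strftime('%Y-%m-%d'))  # 2017-09-27
parsed = datetime.datetime.strptime('2017-09-27', '%Y-%m-%d').date()
print(parsed == end)  # True
```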

#### numpy arrays

In [90]:
import numpy as np

In [108]:
# pass a Python list to np.array() to create a 1-D array
np.array([2,3,4])

array([2, 3, 4])

In [107]:
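# a nested list gives a 2-D array; the ints are converted to floats so every element shares one dtype
np.array([[2,   3,   4],
          [1.5, 2.5, 6]])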

array([[ 2. ,  3. ,  4. ],
       [ 1.5,  2.5,  6. ]])

In [106]:
# np.zeros() creates an array of the given shape filled with 0s, ready to be filled in later
np.zeros((2,3))

array([[ 0.,  0.,  0.],
       [ 0.,  0.,  0.]])

In [104]:
# np.ones() fills the array with 1s; arithmetic applies to every element, so 1 * 5 - 1 = 4
# (np.full((3, 2), 4.0) creates the same array directly)
c = np.ones((3,2))
c * 5 - 1

array([[ 4.,  4.],
       [ 4.,  4.],
       [ 4.,  4.]])

In [105]:
# np.arange(start, stop, step) works like range(), but returns an array
np.arange(5, 15, 3)

array([ 5,  8, 11, 14])

In [112]:
# create custom 2-d arrays
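# np.arange(12) gives 0 through 11 (the stop is excluded and the start defaults to 0, as with range())
# reshape(4, 3) arranges those 12 values into 4 rows and 3 columns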
np.arange(12).reshape(4,3)

array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11]])
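Once a 2-D array exists, its shape, individual elements, and whole rows or columns can be accessed directly. A short sketch on the same `np.arange(12).reshape(4, 3)` array:

```python
import numpy as np

a = np.arange(12).reshape(4, 3)

print(a.shape)        # (4, 3)
print(a[1, 2])        # 5  (row 1, column 2)
print(a[:, 0])        # [0 3 6 9]  (every row, first column)
print(a.sum(axis=0))  # [18 22 26]  (column totals)
print(a.sum(axis=1))  # [ 3 12 21 30]  (row totals)
```

`axis=0` collapses the rows (one result per column), while `axis=1` collapses the columns (one result per row).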

#### pandas Series and DataFrames

In [115]:
import pandas as pd

In [125]:
# an index is automatically assigned to Series objects (default is incrementing integers)
pd.Series([1,2,3, 20, 'apples'])

0         1
1         2
2         3
3        20
4    apples
dtype: object

In [127]:
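# can pass numpy arrays to Series as well
# (because this array mixes numbers with a string, numpy stores every element as a string, e.g. '1')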
pd.Series(np.array([1,2,3, 20, 'apples']))

0         1
1         2
2         3
3        20
4    apples
dtype: object

In [128]:
# can also specify the index of Series
pd.Series(np.array([1,2,3, 20, 'apples']),index=[100,101,102,103, 104])

100         1
101         2
102         3
103        20
104    apples
dtype: object

**Quick note:** you will rarely build Series by hand. In practice you read data from files or databases directly into a DataFrame and modify it from there.
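As a sketch of that workflow, `pd.read_csv` accepts a file path or any file-like object. Here an in-memory `io.StringIO` buffer with made-up contents stands in for a CSV file on disk.

```python
import io
import pandas as pd

# Stand-in for a CSV file on disk (hypothetical contents)
csv_text = """name,qty
apples,3
pears,5
"""

# read_csv parses the header row into column names and the rest into rows
df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)         # (2, 2)
print(df['qty'].sum())  # 8
```

With a real file, `io.StringIO(csv_text)` would simply be replaced by the file's path.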

In [117]:
data = np.arange(12).reshape(4,3)

df = pd.DataFrame(data)
df

   0   1   2
0  0   1   2
1  3   4   5
2  6   7   8
3  9  10  11


In [118]:
df = pd.DataFrame(data, columns=['a', 'b', 'c'], index=['w', 'x', 'y', 'z'])
df

   a   b   c
w  0   1   2
x  3   4   5
y  6   7   8
z  9  10  11


In [122]:
# Access a column by name (returns a Series)
df['a']

w    0
x    3
y    6
z    9
Name: a, dtype: int32

In [123]:
# Access a row by its index label
df.loc['w']

a    0
b    1
c    2
Name: w, dtype: int32

In [135]:
# Access rows using index positions, and not names
df.iloc[1] # selects second row

a    3
b    4
c    5
Name: x, dtype: int32

In [132]:
# Slicing works as it did with lists: [:2] selects the first two rows
df.iloc[:2]

   a  b  c
w  0  1  2
x  3  4  5


In [137]:
# Reassigning columns and index of the DataFrame
df

   a   b   c
w  0   1   2
x  3   4   5
y  6   7   8
z  9  10  11


In [139]:
df.columns = ['w', 'x', 'y']
df.index = ['a', 'b', 'c', 'd']
df

   w   x   y
a  0   1   2
b  3   4   5
c  6   7   8
d  9  10  11
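Reassigning `df.columns` replaces every label at once. When only some labels need to change, `rename()` is an alternative, and `.loc` / `.iloc` can also pick a single cell. A short sketch rebuilding the same relabeled DataFrame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(4, 3),
                  columns=['w', 'x', 'y'], index=['a', 'b', 'c', 'd'])

# rename() relabels only the columns you list and returns a new DataFrame
renamed = df.rename(columns={'w': 'first'})
print(list(renamed.columns))  # ['first', 'x', 'y']
print(list(df.columns))       # ['w', 'x', 'y']  (original unchanged)

# .loc takes a row label and a column label; .iloc takes positions
print(df.loc['b', 'y'])  # 5
print(df.iloc[1, 2])     # 5
```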


## Functions

### Purpose of functions: 
- block of organized, reusable code that is used to perform a single, related action
- allow code reuse and keep programs shorter (the same idea as not rewriting a SQL query more than once)

### Notes for Functions
- function definitions always begin with the 'def' keyword
- define any input parameters in parentheses after the function name
- the return statement at the end of a function is optional. If none is provided, the function returns None
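A small sketch of those rules: `greet` (a hypothetical example function) takes one required parameter and one with a default value, and has no `return` statement, so calling it produces `None`.

```python
# A hypothetical function with a default parameter value and no return statement
def greet(name, greeting="Hello"):
    print(f"{greeting}, {name}!")

result = greet("Ada")        # prints: Hello, Ada!
greet("Ada", greeting="Hi")  # prints: Hi, Ada!
print(result)                # None, because greet() has no return
```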

In [145]:
# Example
def printme(value):
    # print the argument, then hand it back to the caller
    print(value)
    return value

printme(5), printme('apples')

5
apples


(5, 'apples')

In [147]:
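# Squares each number in place: the list passed in is modified and then returned
def square_numbers(numbers):
    for i in range(len(numbers)):
        numbers[i] = numbers[i] ** 2
    return numbers

square_numbers([2,3,4,5])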

[4, 9, 16, 25]
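The function above writes the squared values back into the list it receives, so the caller's list changes too. If the original should stay intact, one option is a list comprehension that builds a new list; `square_numbers_copy` is a hypothetical name used only for this sketch.

```python
def square_numbers_copy(numbers):
    # Build a new list instead of overwriting the caller's list
    return [n ** 2 for n in numbers]

original = [2, 3, 4, 5]
squared = square_numbers_copy(original)
print(squared)   # [4, 9, 16, 25]
print(original)  # [2, 3, 4, 5]  (unchanged)
```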