#  ------------------------------------------------------
# ---------- TOPIC 1: ON PROGRAMMING ---------
#  ------------------------------------------------------

# `Declarative vs. Imperative`


![Screen%20Shot%202018-04-19%20at%204.14.59%20PM.png](attachment:Screen%20Shot%202018-04-19%20at%204.14.59%20PM.png)

# `Algorithms`
* #### Specify a sequence of instructions for execution by a machine that produces output when provided with input
   ## Properties of Algorithms
   * ### 1. Precise:
   Each instruction and the next possible instruction to be taken must be unambiguous
   * ### 2. Effective:
   Each instruction must be executable by the underlying machine

# `Programs`
* Algorithms that can be computed automatically by a computer
### Fixed-Program Computers
these only have one purpose (ex. digital clock)
### Stored-Program Computers
store a program's sequence of instructions in memory, so the same machine can execute any program it is given

# `Differences in Programming Languages`


![Screen%20Shot%202018-04-19%20at%204.34.31%20PM.png](attachment:Screen%20Shot%202018-04-19%20at%204.34.31%20PM.png)

![Screen%20Shot%202018-04-19%20at%204.36.56%20PM.png](attachment:Screen%20Shot%202018-04-19%20at%204.36.56%20PM.png)

#  
#  
# -------------------------------------------------------
# -------------- TOPIC 2: EXPRESSIONS -----------
# -------------------------------------------------------

`Object:` Something we can operate on
* `int:` positive & negative integers 
* `float:` real numbers as floating point numbers 
    * do not have perfect precision
    * cannot represent all real numbers

`Scalar:` Indivisible; not comprised of sub-components

`Literal:` Represented directly in source code rather than being computed from a different operation

## Variables
* names of objects
* can contain uppercase, lowercase, digits and "_"
* can't start w/ a digit
* can't be a reserved word (keyword, ex. def/import/return, etc.)

## Boolean Operations
* scalar object type w/ values either True or False
* uses comparators to compare things to yield a Boolean

![Screen%20Shot%202018-04-19%20at%205.05.03%20PM.png](attachment:Screen%20Shot%202018-04-19%20at%205.05.03%20PM.png)

![Screen%20Shot%202018-04-19%20at%205.11.00%20PM.png](attachment:Screen%20Shot%202018-04-19%20at%205.11.00%20PM.png)

## Syntax of Functions

![Screen%20Shot%202018-04-19%20at%205.06.45%20PM.png](attachment:Screen%20Shot%202018-04-19%20at%205.06.45%20PM.png)

#  
#  
# -------------------------------------------------------
# -------------- TOPIC 3: STATEMENTS -------------
#  -------------------------------------------------------

# `Structured Flowcharts`

![Screen%20Shot%202018-04-19%20at%206.59.22%20PM.png](attachment:Screen%20Shot%202018-04-19%20at%206.59.22%20PM.png)

### Factorial Iteration Example 

In [6]:
def factorial(n):
    i = 0
    f = 1
    while i < n:
        f = f * (i + 1)
        i += 1
    return f

factorial(5)

120

### Division Example
Write a function that determines whether a number is divisible by 2, 5, or 10.

In [7]:
def divisibility(n):
    if (n % 2 == 0) and (n % 5 == 0):
        return "by 10"
    elif n % 2 == 0:
        return "by 2"
    elif n % 5 == 0:
        return "by 5"
    else:
        return "by neither 2 nor 5"  # without this branch the function would return None
    
print(divisibility(5) + "," + divisibility(10))

by 5,by 10


# `Break Statement`
Can use the `break` keyword to stop execution of a loop and jump to the first statement after the end of the loop
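
A minimal sketch of `break` in a search loop. The helper name `first_multiple` is made up for illustration and is not from the lecture:

```python
def first_multiple(numbers, k):
    """Return the first element of numbers divisible by k, or None."""
    found = None
    for x in numbers:
        if x % k == 0:
            found = x
            break  # stop the loop; execution continues after the loop
    return found

print(first_multiple([3, 8, 10, 12], 5))
```

Once 10 is found, 12 is never examined.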

# `The Range Object`
* `range(n):` generates the sequence 0, 1, ..., n-1
* `range(a,b):` generates the sequence a, a+1, ..., b-1
* `range(a,b,step):` generates the sequence starting at a, increasing by 'step', and stopping before b 
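
The three forms can be checked quickly by turning each range into a list (the variable names are only for illustration):

```python
one_arg = list(range(5))           # 0 up to 4
two_args = list(range(2, 6))       # 2 up to 5
with_step = list(range(1, 10, 3))  # 1, then +3 each time, stopping before 10

print(one_arg, two_args, with_step)
```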



# `For vs. While Loops`

`For loops:` 
* automatically initialize and increment the loop variable
* can have multiple exits using "return" or "break" statements w/ if statements

`While loops:`
* have a single exit w/ an explicit termination condition (unless "break" or "return" is also used)
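
As a rough comparison, here is the same sum written both ways. The function names `sum_for` and `sum_while` are assumptions for this sketch:

```python
def sum_for(n):
    total = 0
    for i in range(n):   # i is initialized and incremented automatically
        total += i
    return total

def sum_while(n):
    total = 0
    i = 0                # explicit initialization
    while i < n:         # explicit termination condition
        total += i
        i += 1           # explicit increment
    return total

print(sum_for(5), sum_while(5))
```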

#  
#  
# -------------------------------------------------------
# ----- TOPIC 4: STRUCTURED DATA TYPES -----
# ------------------------------------------------------

# `Lists vs. Tuples vs. Sets`

![Screen%20Shot%202018-04-19%20at%209.46.53%20PM.png](attachment:Screen%20Shot%202018-04-19%20at%209.46.53%20PM.png)

# `Dictionaries`
consist of key-value pairs 
* `Key:` the index used to look up an entry (each key is unique)
* `Value:` the object stored under that key
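
A small sketch of the basic dictionary operations, using made-up sample data:

```python
ages = {"ana": 31, "ben": 27}   # hypothetical sample data
ages["cy"] = 45                 # add a new key-value pair
print(ages["ana"])              # look up the value stored under a key
print("dee" in ages)            # membership tests check the keys
for name, age in ages.items():  # iterate over key-value pairs
    print(name, age)
```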


## Computing Change Example
Compute the smallest number of coins/bills that sums to the target amount

We will use a `greedy algorithm`
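* Take out as many of the biggest coin as possible, until we can't take out any more
* Repeat with the next biggest coin, and so on down to the smallest coin
* Note: the greedy choice gives the fewest coins for denominations like the ones below, but not for every possible set of coins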

In [11]:
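def change(target, coins):
    """Greedy change-making; coins must be sorted in increasing order."""
    result = {}
    i = len(coins) - 1
    while i >= 0:  # include index 0 so the smallest coin is also considered
        coin = coins[i]
        numOfThisCoin = target // coin
        result[coin] = numOfThisCoin
        target -= coin * numOfThisCoin
        i -= 1
    return result

change(185, (5, 10, 25, 100, 200))

{5: 0, 10: 1, 25: 3, 100: 1, 200: 0}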

#  
#  
# -------------------------------------------------------
# ----- TOPIC 5: FUNCTIONS & RECURSION -----
# -------------------------------------------------------

# `Functions`
A way of grouping together a sequence of operations under a common name so we can refer to it multiple times later
* `Formal Parameters:` included in function definition 
* `Actual Parameters:` provided when the function is called 
    * Python uses `pass by assignment`: each formal parameter is assigned (bound) to the corresponding actual parameter
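
One way to see what pass by assignment means in practice; `rebind` and `mutate` are hypothetical helper names:

```python
def rebind(items):
    items = [0]        # rebinds the local name only; the caller's list is untouched

def mutate(items):
    items.append(0)    # changes the object that the caller also refers to

data = [1, 2]
rebind(data)
after_rebind = list(data)
mutate(data)
after_mutate = list(data)
print(after_rebind, after_mutate)
```

Assigning to the parameter inside the function never affects the caller, but modifying a mutable object through it does.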

# `Scoping Variables`
Where in your code a variable is available
* `Global Variables:` available to the main program and to all functions
    * Functions can read global variables defined outside of them: the global variable doesn't have to be defined before the function definition, only before the function is called
    * #### Functions CANNOT REASSIGN global variables w/o permission
        * must use the `global x` statement inside the function to allow it to assign to the global variable x 
     
 
* `Local variables:` only available to the function they're defined in
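
A short sketch of the scoping rules above; the names `counter`, `increment`, and `shadow` are chosen just for this example:

```python
counter = 0

def read_counter():
    return counter     # reading a global needs no declaration

def increment():
    global counter     # needed in order to assign to the global name
    counter += 1

def shadow():
    counter = 99       # without `global`, this creates a new local variable
    return counter

increment()
shadow()
print(read_counter())
```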

# `Software Design`
* ### Modularity / Decomposition
    * Program is broken into functions that are
        * self-contained
        * achieve a clear purpose
        * can be reused
* ### Abstraction
    * A function of a program can be used without knowing how it achieves its goal

* ### Docstring
    * provided with a function to describe 
        * #### Input assumptions / requirements / preconditions
        * #### Output guarantees / postconditions
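
A docstring along these lines might look as follows; the function `mean` is only an illustration:

```python
def mean(values):
    """Return the arithmetic mean of values.

    Precondition: values is a non-empty sequence of numbers.
    Postcondition: the result equals sum(values) / len(values).
    """
    return sum(values) / len(values)

print(mean([1, 2, 3, 4]))
print(mean.__doc__)   # the docstring is stored on the function object
```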

# `Recursion`
A function calls itself on a smaller input; each call creates a new scope for the variables in the function, and a base case stops the recursion 

### Factorial Recursion Example

In [14]:
def factorial(n):
    if n == 0:
        return 1
    else:
        return n * factorial(n-1)

factorial(5)

120

### Theoretical vs. Practical Efficiency 

![Screen%20Shot%202018-04-19%20at%2010.17.11%20PM.png](attachment:Screen%20Shot%202018-04-19%20at%2010.17.11%20PM.png)

### Memoization to Improve Recursion Efficiency 
* save intermediate results during recursion

#### Fibonacci Sequence Example

In [None]:
def fib_fast(n, memo):
    global counter
    counter += 1
    if (n == 0) or (n == 1):
        return 1
    else:
        if n-1 not in memo:
            memo[n-1] = fib_fast(n-1, memo)
        if n-2 not in memo:
            memo[n-2] = fib_fast(n-2, memo)
        return memo[n-1] + memo[n-2]

counter = 0
fib_fast(4, {})
print("Only went through the function " + str(counter) + " times")

## `Functions as Objects`
we can pass functions as arguments to other functions

## `Anonymous Functions`
* ### Notation
    * lambda "variable names" : "expression"
* Examples:
    * lambda x, y: x ** y
    * lambda x: x ** 2
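
To tie the last two sections together, a sketch of passing functions (including lambdas) as arguments. `apply_twice` is a made-up helper:

```python
def apply_twice(f, x):
    """Apply the function f to x two times."""
    return f(f(x))

square = lambda x: x ** 2
print(apply_twice(square, 3))                     # square(square(3))
print(sorted([3, -5, 1], key=lambda v: abs(v)))   # sort by absolute value
```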

#  
#  
# -------------------------------------------------------
# ---- TOPIC 6: NUMERICAL COMPUTATIONS ---
# -------------------------------------------------------

# `Representing Integers`

![Screen%20Shot%202018-04-19%20at%2010.43.40%20PM.png](attachment:Screen%20Shot%202018-04-19%20at%2010.43.40%20PM.png)

![Screen%20Shot%202018-04-19%20at%2010.51.32%20PM.png](attachment:Screen%20Shot%202018-04-19%20at%2010.51.32%20PM.png)

## Binary to Decimal Conversion

In [23]:
def binaryToDec(b: "string of 0s and 1s"): 
    d = 0
    for i in range(len(b)):
        d = 2*d + int(b[i])
    return d

binaryToDec('1101')

13

## Decimal to Binary Conversion

In [22]:
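def decToBinary(d):
    if d == 0:
        return "0"  # the loop below would otherwise return an empty string
    b = ""
    while d > 0:
        b = str(d % 2) + b  # prepend the lowest remaining bit
        d = d // 2
    return b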

decToBinary(13)

'1101'

## `Rounding Errors`
Fractions cannot always be represented with a finite number of binary digits, sometimes leading to rounding errors
* thus never compare floating point numbers for exact equality
    * a == b ---> `abs(b-a) <= epsilon`
### Instead of Floating Point
* #### Arbitrary Precision Decimal
    * a = Decimal("0.1")
* #### Rational Numbers
    separates numerator and denominator and stores them as a pair
    * a = Fraction(1) / Fraction(10)
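
A quick sketch of the rounding problem and the alternatives. The tolerance `eps = 1e-9` is an arbitrary choice for illustration:

```python
from decimal import Decimal
from fractions import Fraction
import math

float_equal = (0.1 + 0.2 == 0.3)                    # exact comparison fails
eps = 1e-9                                          # assumed tolerance
close_enough = abs((0.1 + 0.2) - 0.3) <= eps        # tolerance comparison
also_close = math.isclose(0.1 + 0.2, 0.3)           # standard-library helper
decimal_equal = Decimal("0.1") + Decimal("0.2") == Decimal("0.3")
fraction_equal = Fraction(1, 10) + Fraction(2, 10) == Fraction(3, 10)

print(float_equal, close_enough, also_close, decimal_equal, fraction_equal)
```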

# `Approximation Algorithms`

## Computing the X-intersect

![Screen%20Shot%202018-04-19%20at%2011.29.12%20PM.png](attachment:Screen%20Shot%202018-04-19%20at%2011.29.12%20PM.png)

In [26]:
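def x_intersect(f, a, b, eps):
    """Bisection: assumes f(a) <= 0 < f(b); halves [a, b] until it is shorter than eps."""
    while b - a > eps:
        m = (a + b) / 2
        if f(m) <= 0:
            a = m  # the sign change is in the right half
        else:
            b = m  # the sign change is in the left half
    return a, b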

x_intersect(lambda x: x**3-17, 0, 100, 1e-8)

(2.571281587006524, 2.5712815928272903)

## Approximating Square Root

In [27]:
def linearSqrt(n): 
    a=0
    while (a+1)*(a+1) <= n:
        a = a+1
    return a

linearSqrt(10)

3

## Pythagorean Triples
A triple (a,b,c) of integers is Pythagorean if a^2 + b^2 = c^2

In [29]:
def printPythagoreanTriples1(n):
    for a in range(1, n+1):
        for b in range(1, n+1):
            for c in range(1, n+1):
                if a**2+b**2 == c**2:
                    print(a, b, c)

printPythagoreanTriples1(10)

3 4 5
4 3 5
6 8 10
8 6 10


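## Quadratic Formula w/o Loss of Significant Digits
Computes one root with the quadratic formula, choosing the sign that avoids subtracting two nearly equal numbers, then uses Vieta's formula (x1 * x2 = c/a) to get the other root without loss of significant digits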

In [31]:
import math
def quadraticEquationSolutionPlus(a, b, c):
    d = math.sqrt(b*b-4*a*c)
    x1 = -(b+d)/(2*a) if b>=0 else (d-b)/(2*a)
    x2 = c/(x1*a)
    return x1, x2
quadraticEquationSolutionPlus(1, -2e8, 1)

(200000000.0, 5e-09)

#  
#  
# -------------------------------------------------------
# ------ TOPIC 7: TESTING & EXCEPTIONS ------
# -------------------------------------------------------

**`Testing:`** determining whether a program works as intended or not

**`Debugging:`** Process of trying to fix a program that you already know has a bug

# Testing Methods

## `White-box Techniques`
* rely on knowing the program's source code
* **`CODE REVIEW:`** Process whereby 2+ developers visually inspect program code several times 
    * Tools exist to help the process, but it is still very time consuming
* **`CODE COVERAGE:`** Check if test cases exercise all lines of the source code 
* **`STATIC ANALYSIS:`** Run automated tools to find flaws in programs 

## `Black-Box Techniques`
* only rely on ability to put in an input and observe corresponding output 
* Focus on developing **test cases** to see if we get the expected output


* **`UNIT TESTING:`** uses **modularity** (program broken into functions) such that we can develop a test case for every function within the program 
* **`INTEGRATION TESTING:`** test cases to determine if the program as a whole works as expected
    * **Continuous Integration Testing:** Automated testing framework that runs all your tests every time you make a change to the code
* **`REGRESSION TESTING:`** when a bug is found, develop a test case that (1) fails if the bug is present and (2) passes once the bug is fixed, then keep it in the test suite 
    * aims to avoid the situation where a bug is fixed but later changes in the source code reintroduce the same/similar bug

### Developing Test Cases
* **`EDGE CASES:`** when the input is at some extreme or might require special handling 
    * **For numbers:** positive, negative, 0, 1, odd, even, prime, composite, ...
    * **For strings:** empty string, 1 character string, ...
    * **For lists:** empty list, 1 element list, ...
    * **For subsequences:** at the beginning, at the end
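
As a sketch of how edge cases turn into a unit test, here is a small function with its test cases. `largest` and `test_largest` are hypothetical names, and plain `assert` statements stand in for a testing framework:

```python
def largest(values):
    """Return the largest element of a non-empty list."""
    best = values[0]
    for v in values[1:]:
        if v > best:
            best = v
    return best

def test_largest():
    assert largest([7]) == 7             # 1 element list
    assert largest([-3, -1, -2]) == -1   # all negative numbers
    assert largest([0, 0]) == 0          # zero and duplicates
    assert largest([9, 1, 2]) == 9       # answer at the beginning
    assert largest([1, 2, 9]) == 9       # answer at the end

test_largest()
print("all tests passed")
```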

# Debugging
Process of identifying and fixing bugs within your program 
* **(1)** Need a test case that reliably fails to allow us to identify the bug
* **(2)** Trace through the program and understand the internal state as it runs through a problem 
    * can do this using print statements 
    * can also use a `debugger:` steps through the source code line by line so we can observe the variables, and lets us set **breakpoints** at points in the code where execution pauses so the developer can inspect the program state

# Exceptions
Allow Python to deal w/ situations where a statement isn't well defined or violates some condition 
* Python will raise an exception 
* `Exception Handler:` tries to recover the program 

**Unhandled Exception:** Exception is raised and the program crashes 

**Handled Exception:** When the exception is raised, other code that we specified runs instead 

* Put the code in a `"try"` block and `"except ErrorName"` in another block which specifies what to do if the code causes ErrorName to be raised
* A `"finally"` block can be used as well; its code runs afterward whether or not an exception was raised
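
A minimal sketch of the three blocks working together; `safe_divide` is a made-up function name:

```python
def safe_divide(a, b):
    try:
        result = a / b
    except ZeroDivisionError:
        result = None                  # recovery code: runs only if the exception was raised
    finally:
        print("division attempted")    # runs whether or not an exception was raised
    return result

print(safe_divide(6, 3))
print(safe_divide(1, 0))
```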

## Raising Exceptions + Common Exceptions
Can raise an exception in our code if it enters a situation that it isn't intended to handle (ex. division by 0) 
--> "raise XError" 
* `IndexError:` when the index to a list/string/etc. is out of bounds
* `TypeError:` when a function/operator is applied to an object of the inappropriate type
* `ValueError:` when a function/operator is applied to an object of the right type but with an inappropriate value
* `NotImplementedError:` when a function has not been implemented
* `RecursionError:` when the maximum recursion depth has been exceeded
* `RuntimeError:` some error occurred that isn't classified elsewhere
* `ZeroDivisionError:` division by zero
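
A sketch of raising built-in exceptions for inputs the code isn't meant to handle. The function `average` and its error messages are assumptions for this example:

```python
def average(values):
    if not isinstance(values, list):
        raise TypeError("values must be a list")        # wrong type
    if len(values) == 0:
        raise ValueError("values must not be empty")    # right type, bad value
    return sum(values) / len(values)

try:
    average([])
except ValueError as err:
    message = str(err)   # the text passed to the exception

print(message)
```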

## Assertions
The `assert` statement raises an `AssertionError` if the expression evaluates to False
* assert a == b 
    * if a does not equal b, AssertionError is raised


#  
#  
# -------------------------------------------------------
# --------------- TOPIC 8: CLASSES ----------------
# -------------------------------------------------------

# Custom Exceptions
### can create a new type of exception that is specific to our code using classes

![Screen%20Shot%202018-04-20%20at%2012.48.16%20AM.png](attachment:Screen%20Shot%202018-04-20%20at%2012.48.16%20AM.png)

# Custom Datatypes
* classes that are built from an existing data type 

Must define a "template" for a class
* (1) How to initialize it
* (2) What operations you can do with it

We can then create **instances** of classes called **objects**

### Define a new class
`class MyClassName(WhatItsBuiltOn):`
* MyClassName will **extend** WhatItsBuiltOn

### Initializing an Instance
`def __init__(self):`
* special function called an **initializer** that runs when a new instance of the class is made 
* inside this function, define **instance variables** that act as properties/attributes of the instance
    * these are referenced using `self.myAttribute` inside the class and `myObject.myAttribute` outside it 

### Methods/ Member Functions 
Can add functions to a class
"def myFunction(self):" 
* called using dot notation 


![Screen%20Shot%202018-04-20%20at%201.03.09%20AM.png](attachment:Screen%20Shot%202018-04-20%20at%201.03.09%20AM.png)

### `STRING METHOD:` 
* overrides the default string representation of an object (used by `print` and `str`) 
* `def __str__(self): return "whatever you wanted to print"`

### `EQUALITY OF OBJECTS:`
By default "==" compares object identity (memory address), so two separate objects with the same attribute values compare as False
* Override this with a `def __eq__(self, other):` method
    * `return self.x == other.x and self.y == other.y` 

![Screen%20Shot%202018-04-20%20at%201.25.35%20AM.png](attachment:Screen%20Shot%202018-04-20%20at%201.25.35%20AM.png)
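
The pieces from this topic (initializer, instance variables, a method, the string method, and equality) can be put together in one sketch. The `Point` class and its `shifted` method are assumptions for illustration, not the class from the lecture:

```python
class Point(object):
    def __init__(self, x, y):
        self.x = x   # instance variables
        self.y = y

    def __str__(self):
        return "(" + str(self.x) + ", " + str(self.y) + ")"

    def __eq__(self, other):
        return self.x == other.x and self.y == other.y

    def shifted(self, dx, dy):
        """Return a new Point moved by (dx, dy)."""
        return Point(self.x + dx, self.y + dy)

p = Point(1, 2)
print(p)                  # uses __str__
print(p == Point(1, 2))   # uses __eq__
print(p.shifted(1, 1))    # method called with dot notation
```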

## Inheritance
All the definitions of the base class (parent class) are inherited, as if they were copy-and-pasted

![Screen%20Shot%202018-04-20%20at%201.11.28%20AM.png](attachment:Screen%20Shot%202018-04-20%20at%201.11.28%20AM.png)

## `TYPE OF AN OBJECT`
`isinstance(x,class)` returns `True` or `False` depending on whether x is an instance of that particular class, respecting inheritance
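
A small sketch of what "respecting inheritance" means, using two made-up classes:

```python
class Animal:
    pass

class Dog(Animal):   # Dog inherits from Animal
    pass

d = Dog()
print(isinstance(d, Dog))          # d is a Dog
print(isinstance(d, Animal))       # d also counts as an Animal
print(isinstance(Animal(), Dog))   # a plain Animal is not a Dog
print(type(d) == Animal)           # type() compares the exact class only
```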

# Abstraction in Classes
Classes can be used without needing to know the data structures of the implementation
* assumes that the attributes are not manipulated from outside the class
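* If you prefix a member with `__` (double underscore), the member is treated as private
    * Python mangles the name (to `_ClassName__member`), so code outside the class cannot access it by its plain name; this is a strong convention rather than strict enforcement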
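
A sketch of how the double-underscore prefix behaves in practice. The `Account` class is hypothetical:

```python
class Account:
    def __init__(self, balance):
        self.__balance = balance   # stored internally as _Account__balance

    def balance(self):
        return self.__balance      # access from inside the class works

acct = Account(100)
try:
    acct.__balance                 # plain name is not visible from outside
    blocked = False
except AttributeError:
    blocked = True

print(blocked, acct.balance())
```

The mangled name `acct._Account__balance` is still reachable, so this is protection against accidents rather than a hard guarantee.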

#  
#  
# -------------------------------------------------------
# ----------------- TOPIC 9: FILES ------------------
# -------------------------------------------------------

# Basic procedure for working with files
1. Open the file for reading or writing to obtain a "file handle".
    * A file is normally opened either for reading or for writing, not both at once (Python does offer combined modes such as "r+", but they are not needed for this procedure).
2. Read or write one or more lines to the file by referring to its file handle.
3. Close the file.
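
The three steps can be sketched as follows. The file name `notes.txt` and the temporary directory are assumptions so the example can run anywhere:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "notes.txt")   # hypothetical file

f = open(path, "w")          # 1. open for writing
f.write("first line\n")      # 2. write lines via the file handle
f.write("second line\n")
f.close()                    # 3. close

with open(path, "r") as f:   # `with` closes the file automatically
    lines = [line.rstrip("\n") for line in f]

print(lines)
```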

## `Reading a File`


![Screen%20Shot%202018-04-20%20at%201.29.48%20AM.png](attachment:Screen%20Shot%202018-04-20%20at%201.29.48%20AM.png)

![Screen%20Shot%202018-04-20%20at%201.57.59%20PM.png](attachment:Screen%20Shot%202018-04-20%20at%201.57.59%20PM.png)

## `Writing to a File`

![Screen%20Shot%202018-04-20%20at%201.57.14%20PM.png](attachment:Screen%20Shot%202018-04-20%20at%201.57.14%20PM.png)

![Screen%20Shot%202018-04-20%20at%202.02.10%20PM.png](attachment:Screen%20Shot%202018-04-20%20at%202.02.10%20PM.png)

## `Reading Data from the Internet`
* Same idea as reading data from a file, except our file handle is a remote resource rather than a local file
* Use the **urllib module** to open remote resources

![Screen%20Shot%202018-04-20%20at%202.05.33%20PM.png](attachment:Screen%20Shot%202018-04-20%20at%202.05.33%20PM.png)

* Data read from a URL is a sequence of bytes.
    * It could be binary data or it could represent a string.
* If it represents a string, special characters (accents, emoji, ...) could be encoded in one of many different **character sets**
    * So we need to decode it.
* The `character set` tells Python how to decode the bytes
    * the website usually specifies the character set; if it does not, a default such as **ASCII** can be tried 
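
A sketch of decoding without touching the network: the bytes below stand in for data read from a URL, and the sample word is made up:

```python
raw = "caf\u00e9".encode("utf-8")   # stand-in for bytes read from a URL
text = raw.decode("utf-8")          # decode with the matching character set

try:
    raw.decode("ascii")             # the accented character is not ASCII
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

print(raw, text, ascii_ok)
```

Decoding with the wrong character set either fails, as here, or silently produces garbled text.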

![Screen%20Shot%202018-04-20%20at%202.08.19%20PM.png](attachment:Screen%20Shot%202018-04-20%20at%202.08.19%20PM.png)

## `Handling File Errors`

![Screen%20Shot%202018-04-20%20at%202.09.37%20PM.png](attachment:Screen%20Shot%202018-04-20%20at%202.09.37%20PM.png)

![Screen%20Shot%202018-04-20%20at%202.09.24%20PM.png](attachment:Screen%20Shot%202018-04-20%20at%202.09.24%20PM.png)

# File Formats
## `CSV: Comma-separated values`
* A plain-text format for storing tabular data
    * Commonly used for importing/exporting spreadsheets between applications
    * however, there is no single binding standard for what a CSV file is, so files can have compatibility issues
    * Each line of the file is a data record
        * Each record of the file contains multiple fields separated by commas
    * Can optionally have a **header row** containing field names
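
A sketch of reading CSV data with the standard `csv` module. The content is made up and read from an in-memory string via `io.StringIO` instead of a real file:

```python
import csv
import io

text = "title,year\nAlien,1979\nHeat,1995\n"   # hypothetical CSV content

rows = list(csv.reader(io.StringIO(text)))          # each row is a list of strings
records = list(csv.DictReader(io.StringIO(text)))   # first line is used as the header row

print(rows[0])
print(records[0]["title"], records[0]["year"])
```

Every field comes back as a string, so numbers need an explicit `int(...)` or `float(...)`.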

![Screen%20Shot%202018-04-20%20at%202.15.18%20PM.png](attachment:Screen%20Shot%202018-04-20%20at%202.15.18%20PM.png)

![Screen%20Shot%202018-04-20%20at%202.16.48%20PM.png](attachment:Screen%20Shot%202018-04-20%20at%202.16.48%20PM.png)


### If the first line of the CSV file is a `header row`, we can use `csv.DictReader` instead to return each row as a dictionary

![Screen%20Shot%202018-04-20%20at%202.17.55%20PM.png](attachment:Screen%20Shot%202018-04-20%20at%202.17.55%20PM.png)

## `JSON: JavaScript Object Notation`
* Originally a subset of the JavaScript programming language for representing objects
    * Now a language-independent plain-text format for **representing dictionaries and lists**
    * Commonly used for transmitting objects between applications
* Basic format: overall file = list/ dictionary 
* Every entry can be one of the following:
    * List (enclosed in [ ... ])
    * Dictionary
        * Enclosed in { ... }
        * Every key is a quoted string
        * Colon between key and value
        * Value can be any kind of entry
    * Number (int or float)
    * String (enclosed in " ... ")
    * Boolean (true or false)
    * null
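
A sketch of how these entry types map onto Python objects with the standard `json` module. The sample document is made up:

```python
import json

text = '{"name": "Ada", "tags": ["math", "code"], "age": 36, "active": true, "boss": null}'
data = json.loads(text)   # JSON text -> Python dict/list/str/int/bool/None

print(data["tags"][0])
print(data["active"], data["boss"])

encoded = json.dumps({"ok": False, "items": [1, 2.5]})   # Python -> JSON text
print(encoded)
```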

![Screen%20Shot%202018-04-20%20at%202.25.43%20PM.png](attachment:Screen%20Shot%202018-04-20%20at%202.25.43%20PM.png)

## `HTML: Hypertext Markup Language`
* Plaintext language used for describing the content of web pages
* Every page you browse on the web is an HTML file
* Hierarchical structure
* Similar to XML
* Possible to extract data from HTML files ("scraping"), but this can be tricky since websites change their HTML source code frequently
    * Preferable to use CSV/JSON/XML if available

# Dates
* The datetime module provides classes to represent dates and times, and to manipulate them
    * `datetime.date`: Class representing dates in Gregorian calendar (y, m, d)
    * `datetime.time`: Class representing a time on an abstract day of 24*60*60 seconds (h, min, sec, microsecond, timezone)
    * `datetime.datetime`: Class representing a date and a time on that date
    * `datetime.timedelta`: Class representing a difference between two time/date objects for arithmetic

## Datetime Objects > Strings 
* use the `strftime` method 


![Screen%20Shot%202018-04-20%20at%202.38.13%20PM.png](attachment:Screen%20Shot%202018-04-20%20at%202.38.13%20PM.png)

## Strings > Datetime Objects
* use the `strptime` method
* must specify the exact format of the string 
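
A sketch of a round trip between strings and datetime objects. The date and the format strings are chosen only for illustration:

```python
from datetime import datetime, timedelta

d = datetime.strptime("2018-04-20 14:38", "%Y-%m-%d %H:%M")   # string -> datetime
later = d + timedelta(days=10)                                 # date arithmetic
formatted = later.strftime("%d/%m/%Y")                         # datetime -> string

print(d, later, formatted)
```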

![Screen%20Shot%202018-04-20%20at%202.40.12%20PM.png](attachment:Screen%20Shot%202018-04-20%20at%202.40.12%20PM.png)

#  
#  
# -------------------------------------------------------
# ---------- TOPIC 10: VISUALIZATION ------------
# -------------------------------------------------------

### `%matplotlib inline` 
* displays non-interactive images in the notebook

### `%matplotlib nbagg` 
* inserts interactive plots into the notebook that can be dragged etc.

![Screen%20Shot%202018-04-20%20at%203.17.00%20PM.png](attachment:Screen%20Shot%202018-04-20%20at%203.17.00%20PM.png)

### `frange ( x1, x2, floating point step)`
* function that allows us to create a range with a floating point step 

# Plotting Data from a CSV File
1. Read data from CSV file using approach from previous lectures
2. Construct xvalues and yvalues lists by iterating through the rows of the CSV file

![Screen%20Shot%202018-04-20%20at%203.22.26%20PM.png](attachment:Screen%20Shot%202018-04-20%20at%203.22.26%20PM.png)

#  
#  
# -------------------------------------------------------
# -------------TOPIC 11: DATABASES --------------
# -------------------------------------------------------

### Databases allow us to process and store data:
* data stored in dedicated files and loaded as needed
    * no need to explicitly open files and read data
* can be accessed "simultaneously" by "clients" either locally or remotely
* atomically updated: an update is either stored completely or not at all, so the data is never left corrupt
* must structure data in specific ways 
    * `Relational Databases:` data items and their relationships are stored in tables 
    * `Database Schema:` specifies the names and types of the fields of each table 


### **`SQL: Structured Query Language:`**
* database query language that can be used to interact with a database:
    * Fetch a set of records
    * Add data to a table
    * Modify data
    * Delete data


![Screen%20Shot%202018-04-20%20at%205.31.19%20PM.png](attachment:Screen%20Shot%202018-04-20%20at%205.31.19%20PM.png)

`PRIMARY KEY:` a field that uniquely identifies a record; declared by adding the PRIMARY KEY keyword to the field.

`REQUIRED FIELDS:` fields that must be given a value; in standard SQL (including SQLite) these are declared with the NOT NULL constraint.
* If you try to insert a record but omit a required field, the command fails.

### Inserting data into a table
* insert rows one at a time
* any omitted fields get a default value of NULL 

![Screen%20Shot%202018-04-20%20at%205.34.43%20PM.png](attachment:Screen%20Shot%202018-04-20%20at%205.34.43%20PM.png)

### Selecting data from a table
Have to specify four things:
1. Which fields to return
    * SELECT title
2. Which table
    * SELECT title FROM movies
3. Which rows to select
    * SELECT title FROM movies WHERE genre LIKE "%drama%" 
        * `LIKE and %` = wildcard matching on strings
4. What order to sort
    * SELECT title FROM movies WHERE genre LIKE "%drama%" ORDER BY title; 
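
The query above can be tried out with Python's built-in `sqlite3` module. This is only a sketch: the table layout and the three sample rows are assumptions, and the database lives in memory rather than in a file:

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # temporary in-memory database
cur = conn.cursor()
cur.execute("CREATE TABLE movies (id INTEGER PRIMARY KEY, title TEXT NOT NULL, genre TEXT)")
cur.executemany(
    "INSERT INTO movies (title, genre) VALUES (?, ?)",
    [("Heat", "crime drama"), ("Alien", "horror"), ("Amour", "drama")],   # made-up rows
)
conn.commit()

cur.execute("SELECT title FROM movies WHERE genre LIKE '%drama%' ORDER BY title")
titles = [row[0] for row in cur.fetchall()]
print(titles)
conn.close()
```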

### **`SQLite:`**
serverless database manager (files stored locally) 

![Screen%20Shot%202018-04-20%20at%205.29.26%20PM.png](attachment:Screen%20Shot%202018-04-20%20at%205.29.26%20PM.png)

#  
#  
# -------------------------------------------------------
# ---------TOPIC 12: MACHINE LEARNING--------
# -------------------------------------------------------

A machine learning system is a computer system whose performance on a particular task improves as it gains experience

# Types of Machine Learning

![Screen%20Shot%202018-04-20%20at%206.01.06%20PM.png](attachment:Screen%20Shot%202018-04-20%20at%206.01.06%20PM.png)

# Applications of Machine Learning

![Screen%20Shot%202018-04-20%20at%206.03.52%20PM.png](attachment:Screen%20Shot%202018-04-20%20at%206.03.52%20PM.png)

## Linear Regression
1. Obtain data
2. Divide the data into two sets: a `training set` and a `testing set`
    * this allows us to see if our model can predict another dataset well 
3. Run linear regression on the training data to get proposed coefficient and intercept
   
4. Use the proposed coefficient and intercept to predict the y value of the testing data from its x values
5. Measure how far the predicted y values are from the real y values in the testing data set
    * The `mean squared error` is the average (mean) of the squared errors for all samples (smaller is better)
    * `R2 Score:` Coefficient of determination, a measure of the "goodness of fit" of a regression model
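
The steps above can be sketched in plain Python with no machine learning library. The data is made up (it follows y = 2x + 1 exactly), the split is a simple slice rather than a random one, and the helper names `fit_line`, `mse`, and `r2` are assumptions:

```python
def fit_line(xs, ys):
    """Least-squares fit of y = coef * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    coef = sxy / sxx
    return coef, my - coef * mx

def mse(ys, preds):
    """Mean squared error: average of the squared differences."""
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)

def r2(ys, preds):
    """Coefficient of determination: 1 - (residual SS / total SS)."""
    my = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

xs = [0, 1, 2, 3, 4, 5, 6, 7]            # 1. obtain (made-up) data
ys = [2 * x + 1 for x in xs]
train_x, test_x = xs[:6], xs[6:]         # 2. split into training and testing sets
train_y, test_y = ys[:6], ys[6:]
coef, intercept = fit_line(train_x, train_y)        # 3. fit on the training data
preds = [coef * x + intercept for x in test_x]      # 4. predict the testing data
print(coef, intercept, mse(test_y, preds), r2(test_y, preds))   # 5. measure
```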
    
    
## Classification 
Measures similarity by constructing a numerical vector to represent features then computing the distance between the vectors
* `Feature Vectors:` Represent the features of each data item as a numerical vector
* `Minkowski Distance:` generalization of Euclidean distance

 1. Obtain labelled data
 2. Construct feature vectors for the labelled data
 3. Divide the data into two sets: a training set and a testing set
 4. Train a classifier on the training data to develop a model
 5. Use the classifier to predict the labels for the testing data
 6. Measure the accuracy of the predicted label on the testing data
    * Accuracy = (number of correctly predicted labels) / (total number of test items) 
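
A sketch of steps 2 to 6 in plain Python. The notes don't name a specific classifier, so a 1-nearest-neighbor rule is assumed here; the data points and labels are made up:

```python
def minkowski(u, v, p=2):
    """Minkowski distance; p=2 gives Euclidean, p=1 gives Manhattan distance."""
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1 / p)

def predict(train, x):
    """1-nearest-neighbor: return the label of the closest training vector."""
    return min(train, key=lambda item: minkowski(item[0], x))[1]

# (feature vector, label) pairs
train = [((1.0, 1.0), "small"), ((1.5, 2.0), "small"),
         ((8.0, 9.0), "large"), ((9.0, 8.5), "large")]
test = [((2.0, 1.0), "small"), ((8.5, 9.5), "large")]

correct = sum(1 for x, label in test if predict(train, x) == label)
accuracy = correct / len(test)
print(accuracy)
```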

## `Sentiment Analysis`
special type of classification problem focused on identifying the sentiment of a piece of text

**Basic Approach:** label individual words as being associated with a sentiment, then compute a sentiment score for a particular piece of text

**Machine Learning Approach:** label pieces of text as having a particular sentiment, then use classification to predict sentiment

### Natural Language Processing
transforms natural language text into a form that is more effectively processed

`Tokenizing:`  splitting a passage of text into a list of words ("tokens")

`Tagging:` marking the tokens (words) in a passage of text with tags representing their grammatical role

`Stemming:` reducing tokens (words) to their root ("stem") so that we can identify recurrences of the same notion

`Stopwords:` common words with little meaning, which are removed before analysis
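
A toy sketch of tokenizing, stopword removal, and stemming in plain Python. The stopword list and the suffix-stripping rule are crude assumptions for illustration; a real NLP library would use much better ones, and tagging is left out:

```python
import re

STOPWORDS = {"the", "a", "is", "of", "and"}   # tiny assumed stopword list

def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def remove_stopwords(tokens):
    return [t for t in tokens if t not in STOPWORDS]

def crude_stem(token):
    """Very rough suffix stripping, for illustration only."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

tokens = remove_stopwords(tokenize("The dogs barked and the cat is sleeping"))
stems = [crude_stem(t) for t in tokens]
print(tokens, stems)
```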