>>> Work in progress. (The following are my notes on the Stanford CS221 lectures by Prof. Percy Liang and Prof. Dorsa Sadigh. This is my interpretation of their excellent teaching, and I take full responsibility for any misinterpretation or misinformation herein.)

## Lecture 01: Machine Learning 1 - AI Principles and Techniques | Stanford CS221

<img src="images/01_history.png" width=400 height=400>

#### Course overview:

- An intelligent agent
  - Perception

- Bias in machine translation - no distinction between he/she
  - open areas of research

- AI agents - achieving human-level intelligence
- AI tools - consequences; building systems that help humans
  - predicting poverty
  - saving energy in data centers
  - self-driving cars
  - bias in machine translation (he/she)

- Paradigm - how to bridge the gap between a complex AI problem and its solution
  - learn how to solve the problem
    - Modeling
      - take a real-world problem and formulate a simple mathematical model of it
        - e.g., model getting around a city as a graph
    - Inference
      - asking questions to the model
        - what is the shortest path from point A to point B
    - Learning
      - where did this model come from
        - Model without parameters (skeleton)
        - by adding data
        - Model with parameters (trained)
        - go back and do inference and ask questions
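
The modeling and inference steps above can be sketched on the city example. This is a minimal illustration, not code from the lecture: the graph `city`, its travel times, and the function `shortest_path_cost` are hypothetical names and made-up numbers. Modeling is writing the city down as a weighted graph; inference is asking for the cheapest route, here with Dijkstra's algorithm. In the learning step, the edge weights would be estimated from data instead of being typed in by hand.

```python
import heapq

# Modeling: a hypothetical city as a weighted graph (travel times are made up).
city = {
    "A": {"C": 2, "D": 5},
    "C": {"D": 1, "B": 7},
    "D": {"B": 3},
    "B": {},
}

def shortest_path_cost(graph, start, goal):
    """Inference: Dijkstra's algorithm. Returns the minimum cost, or None if unreachable."""
    frontier = [(0, start)]          # priority queue of (cost so far, node)
    best = {start: 0}                # cheapest known cost to each node
    while frontier:
        cost, node = heapq.heappop(frontier)
        if node == goal:
            return cost
        if cost > best.get(node, float("inf")):
            continue                 # stale queue entry, a cheaper route was found already
        for neighbor, weight in graph[node].items():
            new_cost = cost + weight
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                heapq.heappush(frontier, (new_cost, neighbor))
    return None

print(shortest_path_cost(city, "A", "B"))  # prints 6 (A -> C -> D -> B)
```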
        
----

#### Course plan
- go from low level to high level intelligence
  - Machine learning
    - you have data, fit a model to it, then trust the model to predict on future inputs - generalization
  - Reflex based model (low level intelligence)
    - requires a fixed set of computations
      - linear classifiers, deep neural networks
  - State based model
    - powerful and gives foresight
    - games (e.g., chess) - agents that can plan and think ahead from a position in the game
    - robotics - motion planning
    - NLP - machine translation
    - 3 types of state based model problems
      - Search problems 
        - you control everything - looking for best path
      - Markov decision processes 
        - randomness - e.g., there is traffic on the way from point A to point B, or you are rolling dice
      - Adversarial games
        - playing against opponents - chess


  - Variable based models
    - for example - Sudoku - a set of constraints; the order in which you solve the problem is not important, whereas in chess it is
    - 2 types
      - Constraint satisfaction problems
        - hard constraints - e.g., Sudoku, scheduling, a person cannot be at 2 places at once
      - Bayesian networks
        - soft dependencies - trying to track car over time using positions and sensor readings
  - Logic based model (high level intelligence)
    - Motivation - Virtual assistant
      - talk to using NLP
      - ask question
        - digest heterogeneous information
        - reason deeply with the given information
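
To make the "fixed set of computations" of a reflex-based model concrete, here is a minimal sketch of a linear classifier. The function `predict` and the weights are hypothetical and hand-picked for illustration; in the course the weights would be learned from data.

```python
def predict(weights, features):
    """Reflex-based prediction: one dot product followed by a sign check."""
    score = sum(w * x for w, x in zip(weights, features))
    return 1 if score >= 0 else -1

weights = [1.0, -2.0]                  # hypothetical, hand-picked rather than learned
print(predict(weights, [3.0, 1.0]))    # score = 1.0, prints 1
print(predict(weights, [1.0, 2.0]))    # score = -3.0, prints -1
```

There is no search or planning here: every input goes through the same arithmetic, which is what separates this level from the state-based models.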
        
-----

#### Homeworks
- **Introduction:** foundations
- **Machine learning:** sentiment classification
- **Search:** text reconstruction
- **MDPs:** blackjack
- **Games:** Pac-Man (+ competition with extra credit)
- **CSPs:** course scheduling
- **Bayesian networks:** car tracking
- **Logic:** language and logic

------

### Optimization
- **Discrete optimization:** find the best discrete object (e.g., the path that minimizes the cost)
  - Algorithm tool: dynamic programming
  > $\min_{p \in \text{Paths}} \text{Cost}(p)$
    - the best path is the path $p$ that minimizes the cost
    - the number of paths is huge, so enumerating them all is not practical
- **Continuous optimization:** find the best vector of real numbers (the one that minimizes the objective)
  - Algorithm tool: gradient descent
  > $\min_{w \in \mathbb R^{d}} \text{TrainingError}(w)$
    - the best vector $w$ minimizes the objective function
    
-----

### Discrete optimization
- Input: two strings s and t
- Output: minimum number of character insertions, deletions, and substitutions needed to change s into t
- Examples:
  > "cat", "cat" $\rightarrow$ 0  
  > "cat", "dog" $\rightarrow$ 3  
  > "cat", "at" $\rightarrow$ 1  
  > "cat", "cats" $\rightarrow$ 1  
  > "a cat!", "the cats!" $\rightarrow$ 4  
  
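- Solve using DP
  - simplify the problem: reduce it to smaller subproblems on prefixes of s and t
  - use a recurrence

- Memoization
  - use a cache so each subproblem is solved only once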
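
One way to write the recurrence down, as a sketch: let $d(m, n)$ be the edit distance between the first $m$ letters of $s$ and the first $n$ letters of $t$ (the symbol $d$ is notation introduced here, not taken from the lecture).

$$
d(m, n) =
\begin{cases}
n & \text{if } m = 0 \\
m & \text{if } n = 0 \\
d(m-1,\, n-1) & \text{if } s_m = t_n \\
1 + \min\{\, d(m-1,\, n-1),\ d(m-1,\, n),\ d(m,\, n-1) \,\} & \text{otherwise}
\end{cases}
$$

The three terms inside the minimum correspond to a substitution, a deletion, and an insertion. There are only $(|s|+1)(|t|+1)$ distinct pairs $(m, n)$, so caching each $d(m, n)$ keeps the total work proportional to $|s| \cdot |t|$.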
  
----

### Example - compute edit distance
- basic dynamic programming  

<img src="images/01_editDistance.png" width=400 height=400>
$\tiny{\text{YouTube-Stanford-CS221-Percy Liang}}$   


----

In [1]:
def computeEditDistance(s,t):
    cache = {} # (m,n) => result
    def recurse(m,n):
        """
        Return the minimum edit distance between
        - first m letters of s
        - first n letters of t
        """
        if (m,n) in cache:
            return cache[(m,n)]
        elif m == 0:
            result = n
        elif n == 0:
            result = m
        elif s[m-1] == t[n-1]: # Last letter matches
            result = recurse(m-1, n-1)
        else:
            subCost = 1 + recurse(m-1, n-1)
            delCost = 1 + recurse(m-1, n  )
            insCost = 1 + recurse(m  , n-1)
            result = min(subCost, delCost, insCost)
        cache[(m,n)] = result
        return result
    return recurse(len(s), len(t))

# print(computeEditDistance('a cat!', 'the cats!'))
# print(computeEditDistance('cat', 'cat'))
# print(computeEditDistance('cat', 'dog'))
print(computeEditDistance('a cat!'*10, 'the cats!'*10))


40
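
The same recurrence can also be filled in bottom-up, without recursion. The sketch below is an alternative to the memoized version above, not code from the lecture, and the name `edit_distance_table` is made up for this example. It avoids Python's recursion depth limit on long strings.

```python
def edit_distance_table(s, t):
    """Bottom-up edit distance: dist[m][n] covers the first m letters of s and first n of t."""
    rows, cols = len(s) + 1, len(t) + 1
    dist = [[0] * cols for _ in range(rows)]
    for m in range(rows):
        dist[m][0] = m               # delete all m letters of s
    for n in range(cols):
        dist[0][n] = n               # insert all n letters of t
    for m in range(1, rows):
        for n in range(1, cols):
            if s[m - 1] == t[n - 1]:             # last letters match, no edit needed
                dist[m][n] = dist[m - 1][n - 1]
            else:
                dist[m][n] = 1 + min(
                    dist[m - 1][n - 1],          # substitution
                    dist[m - 1][n],              # deletion
                    dist[m][n - 1],              # insertion
                )
    return dist[len(s)][len(t)]

print(edit_distance_table("a cat!", "the cats!"))  # prints 4
```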


----

### Continuous optimization
  - how do you do regression?
    - for a given slope $w$, measure how bad the fit is

      <img src="images/01_regression.png" width=400 height=400>
      $\tiny{\text{YouTube-Stanford-CS221-Percy Liang}}$   

    - how to solve the regression problem, i.e., how to optimize
      - abstract away the details
        - minimize F(w)
        - take the derivative
        - use gradient descent
        
        <img src="images/01_regressionOpt.png" width=400 height=400>
        $\tiny{\text{YouTube-Stanford-CS221-Percy Liang}}$   
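
Written out, the three steps look roughly as follows. This is a sketch assuming training points $(x_i, y_i)$, a single scalar slope $w$, and a step size $\eta$; the symbols are chosen here for illustration.

$$
F(w) = \sum_i (w x_i - y_i)^2, \qquad F'(w) = \sum_i 2\,(w x_i - y_i)\, x_i
$$

$$
w \leftarrow w - \eta\, F'(w)
$$

Each update moves $w$ a small step against the derivative, which decreases $F(w)$ when $\eta$ is small enough.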

-----

In [2]:
import pandas as pd

points = [(2,4), (4,2)]

def F(w):
    return sum((w * x - y)**2 for x, y in points)

def dF(w):
    return sum(2*(w * x - y) * x for x, y in points)

def gradientDescent():
    w = 0
    eta = 0.01  # step size

    lst = []
    for t in range(100):
        value = F(w)
        gradient = dF(w)
        lst.append([w, value])  # log w together with its own objective value
        w = w - eta * gradient  # gradient step
    df = pd.DataFrame(lst, columns = ['w', 'F(w)'])
    df.index.name = 'Iteration'
    return df

print(gradientDescent())


                  w       F(w)
Iteration                     
0          0.000000  20.000000
1          0.320000  11.808000
2          0.512000   8.858880
3          0.627200   7.797197
4          0.696320   7.414991
...             ...        ...
95         0.800000   7.200000
96         0.800000   7.200000
97         0.800000   7.200000
98         0.800000   7.200000
99         0.800000   7.200000

[100 rows x 2 columns]
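
As a sanity check on the run above: for this one-dimensional least-squares objective, setting the derivative to zero gives the minimizer directly, $w^* = \sum_i x_i y_i / \sum_i x_i^2$. The sketch below is not part of the lecture code and the helper name `closed_form_slope` is made up. It computes that value for the same two points, and it agrees with the $w \approx 0.8$ and $F(w) \approx 7.2$ that gradient descent converges to.

```python
points = [(2, 4), (4, 2)]

def closed_form_slope(points):
    """Minimizer of sum((w*x - y)**2) over w: sum(x*y) / sum(x*x)."""
    return sum(x * y for x, y in points) / sum(x * x for x, y in points)

w_star = closed_form_slope(points)
objective = sum((w_star * x - y) ** 2 for x, y in points)
print(w_star)      # prints 0.8
print(objective)   # approximately 7.2
```

A closed form like this is rarely available for larger models, which is why gradient descent is the general-purpose tool.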


----

## TODO

- Overview of AI
  - AI agents
    - Perceive, communicate, reason, learn
  - Variable step size
