### Session 3 - Intro to Machine Learning - Intelligence and Learning

[Playlist link](https://www.youtube.com/playlist?list=PLRqwX-V7Uu6bCN8LKrcMa6zF4FPtXyXYj)

[Lesson README link](https://github.com/shiffman/NOC-S17-2-Intelligence-Learning/tree/master/week3-classification-regression)

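#### What is ML?

There is some sort of input. That input goes into some ML recipe (an algorithm), and out of that recipe we get an output.

Two common kinds of output:

1. Classification: the output is a category or label (e.g. spam / not spam)

2. Regression: the output is a continuous number (e.g. a house price)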

#### But how does this happen?

We have to train the system by applying some learning method to it. Three common approaches:

- **Supervised Learning**

    Here we have training data, test data, and unknown data. The training data comes with labels (the response variable, i.e. the correct answers). The inputs go into the ML recipe, come out the other side, and some sort of guess is made. For house price prediction, say y_actual = 1 million and y_predicted = 1.5 million: the recipe got it wrong. So we turn some knobs in the recipe to get a better result and minimize the error, and we do this over and over again with lots of training data. Then we use the test data. Test data is like training data (inputs with their correct outputs), but we did not use it during training. We feed the test data in, compare the guesses to the correct outputs, and evaluate how well the model performs. If it performs well, we publish it to the world to make guesses on the unknown data.
    

- **Unsupervised Learning**

    This is generally applied to data we know nothing about (there are no labels). It is typically used for clustering problems: say we have lots of data and want to arrange it into groups, but we don't know anything about the data in advance. The algorithm figures out the clusters from the patterns in the data.


- **Reinforcement Learning**

    Here an agent observes its environment and chooses an action. Think of a mouse trying to get through a maze. The mouse looks around and finds walls. Say the mouse decides to go left. Once it makes that decision, it receives a reward, positive or negative. As the mouse receives more positive rewards for certain kinds of actions, it takes more of those actions over the long term and gets better and better at the task.
    
    
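The supervised loop described above (guess, measure the error, turn the knobs, then check on held-out test data) can be sketched in plain JavaScript. This is a toy under stated assumptions: a one-input linear model, made-up house data, and gradient descent as the knob-turning method; none of it is from the lesson code.

```javascript
// Toy supervised learning sketch (made-up data, not from the lesson):
// learn price = w * size + b by repeatedly nudging w and b ("turning knobs")
// to reduce the error on labeled training data, then check held-out test data.

function predict(model, x) {
  return model.w * x + model.b;
}

// Mean squared error between the model's guesses and the true labels.
function meanSquaredError(model, xs, ys) {
  let total = 0;
  for (let i = 0; i < xs.length; i++) {
    const err = predict(model, xs[i]) - ys[i];
    total += err * err;
  }
  return total / xs.length;
}

// Gradient descent: each step moves w and b a little in the direction
// that lowers the mean squared error on the training set.
function train(xs, ys, learningRate = 0.02, steps = 5000) {
  const model = { w: 0, b: 0 };
  const n = xs.length;
  for (let step = 0; step < steps; step++) {
    let gradW = 0;
    let gradB = 0;
    for (let i = 0; i < n; i++) {
      const err = predict(model, xs[i]) - ys[i];
      gradW += (2 / n) * err * xs[i];
      gradB += (2 / n) * err;
    }
    model.w -= learningRate * gradW;
    model.b -= learningRate * gradB;
  }
  return model;
}

// Labeled training data generated from price = 2 * size + 1 (made up).
const trainX = [1, 2, 3, 4, 5];
const trainY = [3, 5, 7, 9, 11];
// Test data: same kind of labeled pairs, but never shown to train().
const testX = [6, 7];
const testY = [13, 15];

const model = train(trainX, trainY);
console.log('learned w, b:', model.w.toFixed(3), model.b.toFixed(3));
console.log('test MSE:', meanSquaredError(model, testX, testY));
```

The test points were never used in training, so a low test error suggests the model generalizes instead of just memorizing the training pairs.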

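For the clustering case, k-means is one common unsupervised algorithm (chosen here as an example; the notes don't name a specific one). A minimal sketch with made-up 2D points: it alternates between assigning each point to its nearest cluster center and moving each center to the average of its points, without ever seeing a label.

```javascript
// Minimal k-means sketch (not from the lesson): group unlabeled 2D points
// into k clusters using only the points themselves.
function distance(p, q) {
  return Math.hypot(p[0] - q[0], p[1] - q[1]);
}

function kMeans(points, k, maxIterations = 100) {
  // Naive initialization: use the first k points as starting centroids.
  let centroids = points.slice(0, k).map((p) => [...p]);
  const labels = new Array(points.length).fill(-1);

  for (let iter = 0; iter < maxIterations; iter++) {
    // Assignment step: each point joins its nearest centroid.
    let changed = false;
    points.forEach((p, i) => {
      let best = 0;
      for (let c = 1; c < k; c++) {
        if (distance(p, centroids[c]) < distance(p, centroids[best])) best = c;
      }
      if (labels[i] !== best) {
        labels[i] = best;
        changed = true;
      }
    });
    if (!changed) break; // assignments are stable, so we are done

    // Update step: move each centroid to the mean of its member points.
    centroids = centroids.map((old, c) => {
      const members = points.filter((_, i) => labels[i] === c);
      if (members.length === 0) return old; // keep an empty cluster's centroid
      const sumX = members.reduce((s, p) => s + p[0], 0);
      const sumY = members.reduce((s, p) => s + p[1], 0);
      return [sumX / members.length, sumY / members.length];
    });
  }
  return { labels, centroids };
}

// Made-up points: two obvious groups, but no labels are given.
const points = [[1, 1], [1.5, 2], [1, 1.5], [8, 8], [9, 8.5], [8.5, 9]];
const { labels } = kMeans(points, 2);
console.log(labels); // [0, 0, 0, 1, 1, 1]
```

The naive first-k initialization keeps the sketch short; practical implementations usually choose starting centers more carefully.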

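A one-step, bandit-style simplification of the mouse example (a single junction rather than a whole maze) shows the core reinforcement loop: choose an action, receive a reward, update the estimate of how good that action is. The environment, the reward values, and the epsilon-greedy exploration rule are assumptions for illustration, not part of the lesson.

```javascript
// Toy reinforcement learning sketch (not from the lesson): a mouse at one
// junction can go 'left' or 'right'. Made-up environment: left leads toward
// the cheese (+1), right hits a wall (-1). The mouse is never told this; it
// learns it only from the rewards it receives.
const ACTIONS = ['left', 'right'];

function reward(action) {
  return action === 'left' ? 1 : -1;
}

function trainMouse(episodes = 500, learningRate = 0.1, explore = 0.1) {
  // The mouse's current estimate of how good each action is.
  const values = { left: 0, right: 0 };
  const counts = { left: 0, right: 0 };
  for (let e = 0; e < episodes; e++) {
    // Epsilon-greedy: usually pick the best-looking action, sometimes explore.
    let action;
    if (Math.random() < explore) {
      action = ACTIONS[Math.floor(Math.random() * ACTIONS.length)];
    } else {
      action = values.left >= values.right ? 'left' : 'right';
    }
    const r = reward(action);
    // Nudge the estimate for the chosen action toward the reward it got.
    values[action] += learningRate * (r - values[action]);
    counts[action]++;
  }
  return { values, counts };
}

const result = trainMouse();
console.log(result.values, result.counts);
```

Over time the positive rewards make "left" look better, so the mouse chooses it more and more; the occasional random move keeps it checking the other option.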
### K Nearest Neighbors Recommendation Engine - Part 1

[YouTube link](https://www.youtube.com/watch?v=N8Fabn1om2k&list=PLRqwX-V7Uu6bCN8LKrcMa6zF4FPtXyXYj&index=2)

[Link to code](https://github.com/CodingTrain/website/tree/master/CodingChallenges/CC_070_3_movie_recommender)

We have ratings from various people for the Star Wars movies.

**We want to determine if people have similar tastes based on their movie ratings**

Thus we have to calculate a **similarity score**. The kind of similarity score we are going to use here is **Euclidean distance** (ED).

We can calculate ED between two points in 2D or 3D space. Now imagine we have two RGB color values: (R1, G1, B1) and (R2, G2, B2).

We can use ED to calculate the distance between these two colors, treating each color as a point in 3D space.

Similarly, we can use ED in hundreds of dimensions, not just 2 or 3. The math remains the same.

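What is the math?
- Say we have two points, A and B, in 2D, and a and b are the differences between them along the x and y axes. By Pythagoras, the ED is $\sqrt{a^2 + b^2}$.
- In $n$ dimensions this generalizes to $d(A, B) = \sqrt{\sum_{i=1}^{n} (A_i - B_i)^2}$: square the difference in each dimension, add them up, and take the square root.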

![](./data/img/data1.png)
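As a sketch of the formula above in plain JavaScript (no p5.js needed), here is a generic n-dimensional distance applied to RGB colors. The function name `euclidean` and the example colors are illustrative choices, not part of the lesson code.

```javascript
// Euclidean distance between two points with the same number of dimensions.
function euclidean(p, q) {
  let sumSquares = 0;
  for (let i = 0; i < p.length; i++) {
    const diff = p[i] - q[i];
    sumSquares += diff * diff;
  }
  return Math.sqrt(sumSquares);
}

// RGB colors treated as 3D points: [R, G, B]
const red = [255, 0, 0];
const darkRed = [200, 0, 0];
const orange = [255, 165, 0];

console.log(euclidean(red, darkRed)); // 55
console.log(euclidean(red, orange)); // 165
```

The same function works unchanged for 100-dimensional points; only the array length changes.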

---

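Say A and B are two users, and each has rated the same five movies.

We just compute the ED between their ratings, treating each user's ratings as a point in 5-dimensional space (one dimension per movie).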
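A minimal sketch of comparing two users' ratings, assuming each user's ratings are stored as an object keyed by movie title. The titles are real Star Wars films, but the ratings and the `ratingDistance` helper are made up for illustration. Skipping movies that either user has not rated is an assumption about how to handle missing ratings.

```javascript
// Hypothetical ratings, 1 to 5 stars, keyed by movie title.
const userA = {
  'A New Hope': 5,
  'The Empire Strikes Back': 5,
  'Return of the Jedi': 4,
  'The Phantom Menace': 2,
  'Attack of the Clones': 1,
};
const userB = {
  'A New Hope': 4,
  'The Empire Strikes Back': 5,
  'Return of the Jedi': 4,
  'The Phantom Menace': 1,
  'Attack of the Clones': 2,
};

// Euclidean distance over the movies both users rated.
function ratingDistance(ratings1, ratings2) {
  let sumSquares = 0;
  for (const title of Object.keys(ratings1)) {
    const r1 = ratings1[title];
    const r2 = ratings2[title];
    if (r1 == null || r2 == null) continue; // skip movies one user hasn't rated
    const diff = r1 - r2;
    sumSquares += diff * diff;
  }
  return Math.sqrt(sumSquares);
}

// sqrt(1 + 0 + 0 + 1 + 1) = sqrt(3), about 1.732
console.log(ratingDistance(userA, userB));
```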



We want a simple interface:

two dropdowns where we can select two reviewers, plus a button that shows the similarity score between them.

```javascript

var data;
// the dropdowns are global so the button callback can read their values
var dropdown1;
var dropdown2;

function preload(){
    data = loadJSON('./data/movie_data.json');
}

function setup(){
    noCanvas();
    //console.log(data);

    var users = data.users;

    // using p5.js to create dropdowns (select menus)
    dropdown1 = createSelect();
    dropdown2 = createSelect();

    for (var i = 0; i < users.length; ++i){
        // populate the dropdowns with names
        dropdown1.option(users[i].name);
        dropdown2.option(users[i].name);
    }

    // create a button that computes the similarity when clicked
    var button = createButton('Submit');
    button.mousePressed(euclideanSimilarity);
}

```

Now we have a little interface set up.

When the Submit button is clicked, we want the function **euclideanSimilarity()** to run:

```javascript
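// Assumption: each user object in movie_data.json stores its ratings keyed by
// movie title, and data.titles lists those titles. Check your JSON.
function euclideanSimilarity(){
    // dropdown1 and dropdown2 must be globals so this callback can see them
    var name1 = dropdown1.value();
    var name2 = dropdown2.value();

    // look up the two user objects by name
    var user1 = data.users.find(function (u) { return u.name === name1; });
    var user2 = data.users.find(function (u) { return u.name === name2; });

    // sum of squared rating differences over the movies both users rated
    var sumSquares = 0;
    for (var i = 0; i < data.titles.length; ++i){
        var title = data.titles[i];
        var rating1 = user1[title];
        var rating2 = user2[title];
        if (rating1 != null && rating2 != null){
            var diff = rating1 - rating2;
            sumSquares += diff * diff;
        }
    }

    //distance
    var d = sqrt(sumSquares);

    // similarity is inversely related to the distance.
    // 1 + d avoids dividing by 0 and keeps the score between 0 and 1:
    // if d = 0, score = 1; as d grows, the score tends to 0
    var similarity = 1/(1+d);

    // output the similarity value
    createP(similarity);
}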
```

**At this stage we have a similarity score based on the Euclidean distance between the ratings.**

#### Euclidean Distance - Problems and Alternative

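Let's say A rates all the movies 4 or 5 stars, and B rates them 1 or 2 stars. For every movie A rates 4, B rates 1, and for every movie A rates 5, B rates 2. Based on ED, they are far apart, so their similarity score is low. But their tastes are actually quite similar: they agree on which movies are better, they just use different parts of the rating scale.

**Pearson's correlation** is a way of calculating a similarity score that accounts for that difference in range.

Instead of measuring the distance between the two sets of ratings, it measures how well they fit a straight line when plotted against each other. A constant offset (B always rating 3 stars lower) or a different scale does not lower the score; the result ranges from -1 (opposite tastes) to 1 (perfectly matching tastes).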
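To make the difference concrete, here is a small sketch that scores the A/B example above both ways: with the 1 / (1 + d) similarity from Part 1 and with a standard Pearson correlation. The function names are illustrative, and returning 0 when a user gives every movie the same rating (where the correlation is undefined) is an assumption.

```javascript
// Similarity from Euclidean distance: 1 / (1 + d), as in Part 1.
function distanceSimilarity(a, b) {
  let sumSquares = 0;
  for (let i = 0; i < a.length; i++) {
    const diff = a[i] - b[i];
    sumSquares += diff * diff;
  }
  return 1 / (1 + Math.sqrt(sumSquares));
}

// Pearson's correlation: how strongly the deviations from each user's mean
// rating move together (the 1/n factors cancel, so plain sums are used).
function pearsonCorrelation(a, b) {
  const n = a.length;
  const meanA = a.reduce((sum, x) => sum + x, 0) / n;
  const meanB = b.reduce((sum, x) => sum + x, 0) / n;
  let cov = 0;
  let varA = 0;
  let varB = 0;
  for (let i = 0; i < n; i++) {
    const da = a[i] - meanA;
    const db = b[i] - meanB;
    cov += da * db;
    varA += da * da;
    varB += db * db;
  }
  if (varA === 0 || varB === 0) return 0; // assumption: no spread, no correlation
  return cov / Math.sqrt(varA * varB);
}

// A rates 4 or 5; B rates the same movies 1 or 2 (always 3 stars lower).
const ratingsA = [4, 5, 4, 5];
const ratingsB = [1, 2, 1, 2];

console.log(distanceSimilarity(ratingsA, ratingsB)); // 1 / (1 + 6), about 0.143
console.log(pearsonCorrelation(ratingsA, ratingsB)); // 1
```

Euclidean similarity treats A and B as quite different, while Pearson's correlation scores them as perfectly matched.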



### Nearest Neighbors Recommendation Engine - Part 2

[YouTube link](https://www.youtube.com/watch?v=Lo89NLmSgl0&index=3&list=PLRqwX-V7Uu6bCN8LKrcMa6zF4FPtXyXYj)

When we select a user, we want to see the five most similar users (the k nearest neighbors, with k = 5).

1. Create an object that maps each user's name to their similarity score

```javascript
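// inside the function that runs when a user `name` is selected
var similarityScores = {};

for (var i = 0; i < data.users.length; ++i){
    var otherUser = data.users[i].name;

    if (otherUser != name){
        // compute similarity score: euclideanDistance() is expected to return
        // the 1/(1+d) similarity from Part 1, despite its name
        var similarity = euclideanDistance(name, otherUser);
        similarityScores[otherUser] = similarity;
    }
    else{
        // same user: assign -1 so they sort below everyone else
        similarityScores[otherUser] = -1;
    }
}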
```

2. Sort the users in descending order of similarity score

```javascript

// sort in place, highest similarity first
data.users.sort(compareSimilarity);

function compareSimilarity(a, b){
    var score1 = similarityScores[a.name];
    var score2 = similarityScores[b.name];
    // score2 - score1 gives descending order
    return score2 - score1;
}

```

3. Display the top 5 most similar users

```javascript

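var k = 5;

// users are now sorted by similarity, so the first k are the nearest neighbors
// (Math.min guards against having fewer than k users)
for (var i = 0; i < Math.min(k, data.users.length); ++i){
    // a separate variable, so the selected user's `name` is not overwritten
    var neighborName = data.users[i].name;
    var div = createDiv(neighborName + ': Similarity Score: ' + similarityScores[neighborName]);
    resultDivs.push(div);
    // attach the div inside the results container (resultP, created elsewhere)
    div.parent(resultP);
}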

```
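The three steps above can be condensed into one plain JavaScript function, outside p5.js. This is a sketch under stated assumptions: users are `{ name, ratings }` objects with ratings for the same movies in the same order, similarity is 1 / (1 + d) from Part 1, and the names and ratings are made up. Instead of the -1 trick, it simply leaves the selected user out, and it sorts a new array so `users` keeps its original order.

```javascript
// Similarity from Euclidean distance, as in Part 1: 1 / (1 + d).
function similarity(ratings1, ratings2) {
  let sumSquares = 0;
  for (let i = 0; i < ratings1.length; i++) {
    const diff = ratings1[i] - ratings2[i];
    sumSquares += diff * diff;
  }
  return 1 / (1 + Math.sqrt(sumSquares));
}

// Return the k users most similar to the user called `name`.
function nearestNeighbors(users, name, k) {
  const target = users.find((u) => u.name === name);
  return users
    .filter((u) => u.name !== name) // leave the selected user out
    .map((u) => ({ name: u.name, score: similarity(target.ratings, u.ratings) }))
    .sort((a, b) => b.score - a.score) // highest similarity first
    .slice(0, k); // keep the top k
}

// Hypothetical users; ratings are for the same five movies, in the same order.
const users = [
  { name: 'Ana', ratings: [5, 5, 4, 2, 1] },
  { name: 'Ben', ratings: [4, 5, 4, 1, 2] },
  { name: 'Cy', ratings: [1, 1, 2, 5, 5] },
  { name: 'Dee', ratings: [5, 4, 4, 2, 1] },
];

// Dee (distance 1, score 0.5) comes first, then Ben (distance sqrt(3), score about 0.366)
console.log(nearestNeighbors(users, 'Ana', 2));
```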

---

Now what if a new person comes on the scene? They rate a few movies but haven't seen one of them. We want to **guess a rating for that movie for the new user, based on their similarity to their k nearest neighbors**.

This is a Reg