# Machine Learning Basics
In this module, you will implement a simple Linear Regressor and a simple Logistic Regressor, using the Salary Data for both tasks.

**Pipeline:**
* Acquiring the data - done
* Handling files and formats - done
* Data Analysis - done
* Prediction
* Analysing results

## Imports
You may need NumPy, pandas, matplotlib and scikit-learn for this module. However, do not use scikit-learn's built-in linear and logistic regression models.

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split

## Dataset
You can load the dataset and perform any dataset-related operations here. Split the data into training and testing sets, separately for the regression and classification problems.

In [None]:
data = pd.read_csv('/home/cupgreek/Documents/ML_Assignments/Data/SalaryData.csv')
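
The split described above can be sketched as follows. This is a minimal illustration on made-up data rather than the real CSV: the column name `YearsExperience`, the toy values, and the variable names are all assumptions. Passing features and target to `train_test_split` in one call keeps the rows aligned.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the salary data; the column name 'YearsExperience'
# and all values are assumptions made for illustration.
rng = np.random.default_rng(0)
years = np.round(np.linspace(1.0, 10.5, 30), 1)
toy = pd.DataFrame({
    'YearsExperience': years,
    'Salary': 25000 + 9500 * years + rng.normal(0, 5000, size=30),
})

X = toy[['YearsExperience']].to_numpy()                  # features, shape (30, 1)
y_reg = toy['Salary'].to_numpy()                         # regression target
y_clf = (toy['Salary'] < 60000).astype(int).to_numpy()   # classification target

# One call per problem; features and target are shuffled together.
Xr_train, Xr_test, yr_train, yr_test = train_test_split(
    X, y_reg, test_size=0.3, random_state=2)
Xc_train, Xc_test, yc_train, yc_test = train_test_split(
    X, y_clf, test_size=0.3, random_state=2)

print(Xr_train.shape, Xr_test.shape)
```

Using the same `random_state` for both problems means the two splits contain the same rows, which makes the regression and classification results easier to compare.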

## Task 1a - Linear Regressor
Code your own Linear Regressor here, and fit it to your training data. You will be predicting salary based on years of experience.

In [None]:
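def calc_coef(x, y):
    """Return (b0, b1) of the least-squares line y = b0 + b1*x."""
    size = np.size(x)
    mean_x, mean_y = np.mean(x), np.mean(y)
    # Sum of cross-deviations and sum of squared deviations about the means
    SS_xy = np.sum(y*x) - size*mean_y*mean_x
    SS_xx = np.sum(x*x) - size*mean_x*mean_x
    b1 = SS_xy / SS_xx         # slope
    b0 = mean_y - b1*mean_x    # intercept
    return b0, b1

def predict(x, y, b):
    """Plot the data with the fitted line and return the predictions."""
    y_pred = b[0] + b[1]*x
    plt.scatter(x, y)
    plt.plot(x, y_pred, color="g")
    plt.xlabel('Years Of Experience')
    plt.ylabel('Salary')
    plt.show()
    return y_pred

# Column 0 holds years of experience, column 1 holds salary
x = np.array(data.iloc[:, 0:1])
y = np.array(data.iloc[:, 1:2])

# Split features and target in one call so the rows stay aligned
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=2)

# Fit on the training set, then predict and plot on the testing set
b = calc_coef(x_train, y_train)
y_pred = predict(x_test, y_test, b)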
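
One way to sanity-check the closed-form coefficients is to fit points that lie exactly on a known line and compare against NumPy's `np.polyfit`. The sketch below is illustrative only: `fit_line` is a hypothetical restatement of the same formulas, and the data are made up. `np.polyfit` is used purely as a cross-check, not as the regressor itself.

```python
import numpy as np

def fit_line(x, y):
    """Closed-form least-squares intercept and slope for y = b0 + b1*x."""
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    n = x.size
    ss_xy = np.sum(x * y) - n * x.mean() * y.mean()
    ss_xx = np.sum(x * x) - n * x.mean() ** 2
    b1 = ss_xy / ss_xx              # slope
    b0 = y.mean() - b1 * x.mean()   # intercept
    return b0, b1

# Made-up points that lie exactly on y = 3 + 2x
x_demo = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_demo = 3.0 + 2.0 * x_demo

b0, b1 = fit_line(x_demo, y_demo)

# np.polyfit returns coefficients from the highest degree down: [slope, intercept]
slope_ref, intercept_ref = np.polyfit(x_demo, y_demo, deg=1)
print(b0, b1, intercept_ref, slope_ref)
```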

## Task 1b - Logistic Regressor
Code your own Logistic Regressor here, and fit it to your training data. You will first have to create a column, 'Salary<60000', which contains '1' if salary is less than 60000 and '0' otherwise. This is your target variable, which you will aim to predict based on years of experience.
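
For reference, the model and a gradient-ascent update for it can be written as follows. This is a sketch under the usual assumptions: $x_i$ is years of experience, $y_i \in \{0, 1\}$ is the 'Salary<60000' label, and $\alpha$ is a learning rate you choose.

$$
p_i = \sigma(\beta_0 + \beta_1 x_i), \qquad \sigma(z) = \frac{1}{1 + e^{-z}}
$$

$$
\ell(\beta) = \sum_i \left[ y_i \log p_i + (1 - y_i) \log(1 - p_i) \right]
$$

$$
\beta_0 \leftarrow \beta_0 + \alpha \sum_i (y_i - p_i), \qquad \beta_1 \leftarrow \beta_1 + \alpha \sum_i (y_i - p_i)\, x_i
$$

A sample is predicted as class 1 when $p_i > 0.5$, which is the same condition as $\beta_0 + \beta_1 x_i > 0$.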

In [None]:
# Target 'Salary<60000': 1 if salary is less than 60000, 0 otherwise
y = np.array((data['Salary'] < 60000).astype(int))
# Feature: years of experience (first column), kept as an (n, 1) array
x = np.array(data.iloc[:,0:1])

In [None]:
import random

# Split features and target in one call so the rows stay aligned
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=2)

# Build (2, n) design matrices: a row of ones for the intercept, then the feature row.
# The number of ones is taken from the data instead of being hard-coded.
x_train = np.vstack([np.ones(x_train.shape[0]), x_train.ravel()])
x_test = np.vstack([np.ones(x_test.shape[0]), x_test.ravel()])

In [None]:
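# Start from a zero intercept and a small random slope, stored as a (2, 1) column
beta = np.array([0,random.randint(1,101)/100]).reshape((2,-1))
for i in range(10000):
    sig = 1.0/(1 + np.exp(-np.dot(beta.T, x_train)))   # predicted probabilities, shape (1, n)
    grad = np.dot((y_train-sig),(x_train.T)).T          # log-likelihood gradient, shape (2, 1)
    # Stop once the gradient has effectively vanished (the tolerance is a chosen value)
    if np.linalg.norm(grad) < 1e-6:
        break
    beta = beta + 0.5*grad

# Linear scores on the testing set; a score > 0 corresponds to a probability > 0.5
y_pred = np.dot(beta.T,x_test)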
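
The training loop can also be tried in isolation on a tiny made-up dataset. The sketch below is a variant under stated assumptions, not the cell above: it averages the gradient over the samples, uses a step size of 0.1, and stores the design matrix as (n, 2) instead of (2, n). The names `sigmoid` and `fit_logistic` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.1, n_iter=10000):
    """Gradient ascent on the mean log-likelihood.

    X has shape (n, 2) with a leading column of ones; y holds 0/1 labels.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = sigmoid(X @ beta)              # predicted probabilities
        grad = X.T @ (y - p) / len(y)      # averaged gradient (an assumption)
        beta = beta + lr * grad
    return beta

# Made-up data: label 1 for few years of experience, 0 for many
years = np.array([1.0, 2.0, 3.0, 4.0, 6.0, 7.0, 8.0, 9.0])
labels = np.array([1, 1, 1, 1, 0, 0, 0, 0])
X = np.column_stack([np.ones_like(years), years])

beta = fit_logistic(X, labels)
pred = (sigmoid(X @ beta) > 0.5).astype(int)
print(beta, pred)
```

Averaging the gradient keeps the step size independent of the number of training samples, which tends to make a fixed learning rate easier to pick.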

In [None]:
# Convert linear scores to class labels: 1 where the score is positive, 0 otherwise
lst = (y_pred.ravel() > 0).astype(int)
# Actual test labels against years of experience
plt.scatter(x_test[1],y_test)

In [None]:
# Predicted test labels against years of experience
plt.scatter(x_test[1],lst)

## Task 2 - Results
Analyse the quality of the ML models you built using metrics such as R², MAE (mean absolute error) and RMSE (root mean squared error) for the Linear Regressor, and accuracy for the Logistic Regressor. Evaluate their performance on the testing set.
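
The metrics named above can be computed with a few lines of NumPy. The following is a minimal sketch on made-up numbers; the function names are hypothetical, and in practice you would pass in your own test targets and predictions.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true, y_pred = np.ravel(y_true), np.ravel(y_pred)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def mae(y_true, y_pred):
    """Mean absolute error."""
    return np.mean(np.abs(np.ravel(y_true) - np.ravel(y_pred)))

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return np.sqrt(np.mean((np.ravel(y_true) - np.ravel(y_pred)) ** 2))

def accuracy(labels_true, labels_pred):
    """Fraction of labels predicted correctly."""
    return np.mean(np.ravel(labels_true) == np.ravel(labels_pred))

# Made-up regression targets and predictions
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_hat = np.array([2.5, 5.0, 7.5, 9.0])

# Made-up class labels and predictions
c_true = np.array([1, 0, 1, 1])
c_hat = np.array([1, 0, 0, 1])

print(r_squared(y_true, y_hat), mae(y_true, y_hat), rmse(y_true, y_hat))
print(accuracy(c_true, c_hat))
```

The `np.ravel` calls let the same functions accept both flat arrays and the (n, 1) or (1, n) shapes produced earlier in the notebook.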