## Logistic Regression Model from scratch for Divorce Prediction

### Logistic regression
Logistic regression uses an equation as the representation, very much like linear regression.

Input values (x) are combined linearly using weights or coefficient values (referred to as w) to predict an output value (y). A key difference from linear regression is that the output being modeled is a binary value (0 or 1) rather than a continuous numeric value.<br>

##### $\hat{y}(w, x) = \dfrac{1}{1 + e^{-(w_0 + w_1 x_1 + \dots + w_p x_p)}}$
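
A minimal sketch of this equation in NumPy may help make it concrete. The names `sigmoid` and `predict_proba` and the toy values are illustrative assumptions, and the sketch assumes X already carries a leading column of ones so that w_0 acts as the intercept.

```python
import numpy as np

def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z)), applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(X, w):
    """Return y_hat for each row of X; X must include a leading column of ones."""
    return sigmoid(X @ w)

# Toy input (made-up values): two samples, the bias column plus one feature
X_demo = np.array([[1.0, 0.0],
                   [1.0, 2.0]])
w_demo = np.array([0.0, 1.0])

probs_demo = predict_proba(X_demo, w_demo)
print(probs_demo)
```

Each output lies strictly between 0 and 1, so it can be read as the estimated probability of class 1.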

#### Dataset
The dataset is available at http://archive.ics.uci.edu/ml/machine-learning-databases/00520/data.zip. Unzip the file and use either the CSV or the XLSX file. The dataset is based on questionnaire ratings given by people who have already divorced and by people who are happily married.<br>
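
As a rough sketch of reading a semicolon-delimited file with pandas, the example below parses a small in-memory string instead of the real download. The column names `Atr1`, `Atr2` and `Class` and all values are made up for illustration and are not taken from the actual file.

```python
import io
import pandas as pd

# Synthetic stand-in for the real file: three made-up rows, two attributes plus Class
csv_text = "Atr1;Atr2;Class\n2;0;1\n4;3;1\n0;1;0\n"

# sep=';' tells pandas the fields are separated by semicolons rather than commas
df_demo = pd.read_csv(io.StringIO(csv_text), sep=";")

print(df_demo.shape)
print(df_demo.dtypes)
```

For the real data, the `io.StringIO(...)` argument would be replaced by the path of the extracted CSV file.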

#### Features (X)
1. If one of us apologizes when our discussion deteriorates, the discussion ends.
2. I know we can ignore our differences, even if things get hard sometimes.
3. When we need it, we can take our discussions with my spouse from the beginning and correct it.
4. When I discuss with my spouse, to contact him will eventually work.
5. The time I spent with my wife is special for us.
6. We don't have time at home as partners.
7. We are like two strangers who share the same environment at home rather than family.
8. I enjoy our holidays with my wife.
9. I enjoy traveling with my wife.
10. Most of our goals are common to my spouse.
11. I think that one day in the future, when I look back, I see that my spouse and I have been in harmony with each other.
12. My spouse and I have similar values in terms of personal freedom.
13. My spouse and I have similar sense of entertainment.
14. Most of our goals for people (children, friends, etc.) are the same.
15. Our dreams with my spouse are similar and harmonious.
16. We're compatible with my spouse about what love should be.
17. We share the same views about being happy in our life with my spouse
18. My spouse and I have similar ideas about how marriage should be
19. My spouse and I have similar ideas about how roles should be in marriage
20. My spouse and I have similar values in trust.
21. I know exactly what my wife likes.
22. I know how my spouse wants to be taken care of when she/he sick.
23. I know my spouse's favorite food.
24. I can tell you what kind of stress my spouse is facing in her/his life.
25. I have knowledge of my spouse's inner world.
26. I know my spouse's basic anxieties.
27. I know what my spouse's current sources of stress are.
28. I know my spouse's hopes and wishes.
29. I know my spouse very well.
30. I know my spouse's friends and their social relationships.
31. I feel aggressive when I argue with my spouse.
32. When discussing with my spouse, I usually use expressions such as ‘you always’ or ‘you never’ .
33. I can use negative statements about my spouse's personality during our discussions.
34. I can use offensive expressions during our discussions.
35. I can insult my spouse during our discussions.
36. I can be humiliating when we discussions.
37. My discussion with my spouse is not calm.
38. I hate my spouse's way of open a subject.
39. Our discussions often occur suddenly.
40. We're just starting a discussion before I know what's going on.
41. When I talk to my spouse about something, my calm suddenly breaks.
42. When I argue with my spouse, I only go out and I don't say a word.
43. I mostly stay silent to calm the environment a little bit.
44. Sometimes I think it's good for me to leave home for a while.
45. I'd rather stay silent than discuss with my spouse.
46. Even if I'm right in the discussion, I stay silent to hurt my spouse.
47. When I discuss with my spouse, I stay silent because I am afraid of not being able to control my anger.
48. I feel right in our discussions.
49. I have nothing to do with what I've been accused of.
50. I'm not actually the one who's guilty about what I'm accused of.
51. I'm not the one who's wrong about problems at home.
52. I wouldn't hesitate to tell my spouse about her/his inadequacy.
53. When I discuss, I remind my spouse of her/his inadequacy.
54. I'm not afraid to tell my spouse about her/his incompetence.

#### Target (y)
55. Class: 1 - Divorced, 0 - Not Divorced (Yet)

#### Objective
To gain an understanding of logistic regression by implementing the model from scratch.

#### Tasks
- Download the data from the dataset link above.
- Extract the zip file and use either the CSV file or the XLSX file (note: the CSV file uses a semicolon `;` as the delimiter). Load the data and define X and y as NumPy arrays.
- Add a column at position 0 with all values equal to 1 (see the pandas.DataFrame.insert function). This column is the input for the bias weight w_0.
- Print the shape and datatype of both X and y.
- The dataset may contain missing values (NA); check for them and fill any you find by performing missing value prediction (imputation).
- The features take values on a range wider than 0 to 1, so the columns can be rescaled to a smaller range (such as 0 to 1) by scaling, standardizing or any other suitable preprocessing technique. Use the built-in functions from sklearn.preprocessing. Leave the all-ones bias column out of this step, since rescaling a constant column turns it into zeros.
- Split the dataset into 60% for training and the remaining 40% for testing. You can use the built-in function train_test_split in sklearn for this task.
- Follow the code cells to implement simple logistic regression from scratch:
    - Write the hypothesis function to predict values
    - Write a function for calculating the cross entropy loss (or log loss)
    - Write a function to return the gradients for given weights
    - Perform gradient descent using the functions above
    - Write a function for calculating accuracy
- Train the model on the training set using your implementation.
- Predict the output for the testing set samples and compute the accuracy.
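
The preprocessing tasks above can be sketched as follows, under stated assumptions: the data is synthetic (random ratings with one value deliberately removed), the column names are hypothetical, and mean imputation with `SimpleImputer` plus `MinMaxScaler` are just one possible choice among several. The ones column is added after scaling here, because a min-max scaler maps a constant column to zeros.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in data: 10 samples, 2 rating-like features, one value missing
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 5, size=(10, 2)).astype(float), columns=["f1", "f2"])
df.loc[3, "f2"] = np.nan
df["Class"] = [0, 1] * 5

y = df["Class"].to_numpy()
features = df.drop(columns="Class")

# Fill missing values with the column mean (one simple imputation strategy)
filled = SimpleImputer(strategy="mean").fit_transform(features)

# Rescale every feature to the range [0, 1]
scaled = MinMaxScaler().fit_transform(filled)

# Prepend the all-ones column that pairs with the bias weight w_0
X = np.column_stack([np.ones(len(scaled)), scaled])

# 60% training, 40% testing; random_state fixes the shuffle for reproducibility
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=0)
print(X_train.shape, X_test.shape)
```

Fitting the scaler on the training split only and reusing it on the test split is a common refinement that avoids leaking test statistics into training.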

#### Further Fun (will not be evaluated)
- Play with learning_rate and max_iterations.
- Test whether a label encoder or a one-hot encoder gives better results for categorical features.
- Run the model with different feature scaling methods (scaling, normalization, standardization, etc. using sklearn).
- Train the model with different dataset split sizes such as 60-40, 50-50, 70-30, 80-20, 90-10, 95-5, etc.
- Shuffle the training samples with different random seed values in the train_test_split function, and check the model error on the testing data for each setup.
- Write functions for other classification metrics such as the confusion matrix, precision, recall and F1 score.
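
For the last item, one possible starting point is sketched below. The function names `confusion_counts` and `precision_recall_f1` are hypothetical, the labels are made up, and returning 0.0 when a denominator is zero is an assumed convention rather than the only valid one.

```python
import numpy as np

def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels in {0, 1}."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    return tp, fp, fn, tn

def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall and F1; a zero denominator yields 0.0."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1

# Made-up labels, only to exercise the functions
y_true_demo = [1, 1, 1, 0, 0, 0]
y_pred_demo = [1, 1, 0, 1, 0, 0]
print(confusion_counts(y_true_demo, y_pred_demo))
print(precision_recall_f1(y_true_demo, y_pred_demo))
```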


#### Helpful links
- How Logistic Regression works: https://machinelearningmastery.com/logistic-regression-for-machine-learning/
- Feature Scaling: https://scikit-learn.org/stable/modules/preprocessing.html
- Training testing splitting: https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html
- Use slack for doubts: https://join.slack.com/t/deepconnectai/shared_invite/zt-givlfnf6-~cn3SQ43k0BGDrG9_YOn4g


In [None]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

In [None]:
# Download the dataset from the source
!wget http://archive.ics.uci.edu/ml/machine-learning-databases/00520/data.zip

In [1]:
# Unzip the file into the working directory
!unzip data.zip

In [None]:
# Read the data from the working directory
data = 

In [None]:
# Add column which has all 1s
# The idea is that weight corresponding to this column is equal to intercept
# This way it is efficient and easier to handle the bias/intercept term


In [None]:
# Print the dataframe rows to just see some samples


In [None]:
# Define X (input features) and y (output feature) 
X = 
y = 

In [None]:
X_shape = 
X_type  = 
y_shape = 
y_type  = 
print(f'X: Type-{X_type}, Shape-{X_shape}')
print(f'y: Type-{y_type}, Shape-{y_shape}')

In [None]:
# Perform missing value prediction



In [None]:
# Perform feature scaling



In [None]:
# Split the dataset into training and testing here
X_train, X_test, y_train, y_test = 



In [None]:
# Print the shape of features and target of training and testing: X_train, X_test, y_train, y_test



##### Let us start implementing logistic regression from scratch. Just follow code cells, see hints if required.
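
For orientation, the pieces fit together as follows. With $m$ samples, the cross entropy loss is $L(w) = -\frac{1}{m}\sum_{i=1}^{m}\left[y_i \log \hat{y}_i + (1 - y_i)\log(1 - \hat{y}_i)\right]$, and its gradient is $\nabla_w L = \frac{1}{m} X^T(\hat{y} - y)$. The sketch below applies these formulas to a tiny made-up dataset; the names `log_loss` and `fit`, the zero initialization, the clipping constant and the hyperparameters are assumptions for illustration and differ from the function signatures used in the cells that follow.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_loss(y_true, y_hat, eps=1e-12):
    """Mean cross entropy; clipping keeps log() away from 0."""
    y_hat = np.clip(y_hat, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_hat) + (1 - y_true) * np.log(1 - y_hat))

def fit(X, y, learning_rate=0.1, max_iterations=500):
    """Batch gradient descent; X must already contain the all-ones bias column."""
    w = np.zeros(X.shape[1])
    history = []
    for _ in range(max_iterations):
        y_hat = sigmoid(X @ w)
        history.append(log_loss(y, y_hat))
        grad = X.T @ (y_hat - y) / len(y)  # gradient of the mean cross entropy
        w -= learning_rate * grad
    return w, history

# Made-up, linearly separable toy data: bias column plus one centered feature
X_toy = np.array([[1, -1.0], [1, -0.6], [1, -0.2], [1, 0.2], [1, 0.6], [1, 1.0]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

w_toy, history = fit(X_toy, y_toy)
labels = (sigmoid(X_toy @ w_toy) >= 0.5).astype(int)
print("final loss:", history[-1])
print("training accuracy:", np.mean(labels == y_toy))
```

Thresholding the predicted probabilities at 0.5 is what turns them into class labels before accuracy is computed.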

In [None]:
def predict(X, weights):

    ### START CODE HERE ###
    y_pred = 
    ### END CODE HERE ###
    
    assert (y_pred.shape==(X.shape[0],)), 'Wrong implementation of predict function. Check carefully'
    
    return y_pred

In [None]:
def cross_entropy_loss(y_true, y_pred):
    '''
    y_true : (m,) true labels (0 or 1)
    y_pred : (m,) predicted probabilities
    
    Returns the cross entropy loss (or log loss)
    '''
    ### START CODE HERE ###
    loss = 
    ### END CODE HERE ###
    
    return loss

In [None]:
def gradient(X_train, y_train, y_pred):

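    # Initialize the gradient vector: one entry per weight, including the bias
    grad = np.zeros(X_train.shape[1])
    
    ### START CODE HERE ###
    
    # Fill every entry of grad (a vectorized expression avoids indexing each weight)
    grad = 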
    
    ### END CODE HERE ###
    
    return grad

In [None]:
def gradient_descent(X_train, y_train, learning_rate=0.01, max_iterations=100):

    # Initialise the weights vector with random values, one per column of X_train
    weights = np.random.rand(X_train.shape[1])
    # Initialize a list to record all the losses 
    losses  = []
    
    ### START CODE HERE ###
    
    ### END CODE HERE ###
    
    return weights, losses

In [None]:
def accuracy(y_pred, y_test):
    
    ### START CODE HERE ###
    accuracy = 
    ### END CODE HERE ###
    
    return accuracy

##### Congratulations! You have implemented logistic regression from scratch. Let's see this in action.

In [None]:
# Perform gradient descent
optimal_weights, losses = gradient_descent(X_train, y_train)

In [None]:
# Print final loss
print("Cross Entropy loss:", losses[-1])

In [None]:
# Plot the loss curve
plt.plot(range(len(losses)), losses)
plt.title("Loss curve")
plt.xlabel("Iteration")
plt.ylabel("Loss")
plt.show()

In [None]:
# Make predictions for the testing set using the trained weights
y_pred = predict(X_test, optimal_weights)

In [None]:
# Calculate accuracy score for the testing set
# (stored under a new name so the accuracy function is not overwritten)
test_accuracy = 