## What is this all about?
### I once came up with an idea for plain Logistic Regression: what if we split one big model, fitted on all features, into a few smaller models, each fitted on only some of the features?

### Not only does the theory suggest that training will be faster, *it actually is*!
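The splitting step itself can be sketched in a few lines. This is a minimal sketch, not the code of `ALogisticRegression`: the helper name `split_features`, the random shuffling, and the near-equal subset sizes are all assumptions made for illustration.

```python
import random


def split_features(n_features, m, seed=None):
    """Partition feature indices 0..n_features-1 into m disjoint subsets.

    Hypothetical helper: indices are shuffled, then dealt out round-robin,
    so subset sizes differ by at most one.
    """
    if not 1 <= m <= n_features:
        raise ValueError("m must be between 1 and n_features")
    indices = list(range(n_features))
    random.Random(seed).shuffle(indices)
    # Every m-th index starting at offset i goes to subset i
    return [sorted(indices[i::m]) for i in range(m)]


# 13 features (as in the wine data) split across 4 sub-models
subsets = split_features(13, 4, seed=0)
for cols in subsets:
    print(cols)
```

Each subset could then select the columns for one sub-model, for example `X_train.iloc[:, cols]`, before fitting a separate `LogisticRegression` on it. How the outputs of the sub-models are combined into one prediction is left open here.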

##### The theoretical complexity of LogisticRegression is $O(k^2\cdot(n + k))$, where $k$ is the number of features and $n$ is the number of data points (though the complexity may be lower in practice).
##### If we divide the features into $m$ disjoint subsets and fit $m$ models, one per subset, the resulting complexity will be:
$$
m \cdot O\left(\frac{k^2}{m^2}\cdot\left(n + \frac{k}{m}\right)\right)
$$
##### This simplifies to $O\left(\frac{k^2}{m}\cdot\left(n + \frac{k}{m}\right)\right)$, which is roughly $m$ times less than fitting one model on all features.
### In practice, however, the result depends on the data and on the hyperparameters (such as the number of iterations).
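The complexity estimate above can be checked numerically. The sketch below only evaluates the formula $k^2\cdot(n + k)$ with constants dropped, so it gives an idealized upper bound on the speedup rather than a prediction of wall-clock time. The function names are made up for illustration, and the values `n = 124`, `k = 13` are assumed to match the wine training split used in the experiment.

```python
def full_cost(n, k):
    """Cost model k^2 * (n + k) for one model on k features (constants dropped)."""
    return k ** 2 * (n + k)


def split_cost(n, k, m):
    """Cost of m models, each fitted on k / m features."""
    return m * full_cost(n, k / m)


def speedup(n, k, m):
    """Idealized ratio: cost of one full model over cost of m sub-models."""
    return full_cost(n, k) / split_cost(n, k, m)


# Assumed sizes: 124 training rows and 13 features
n, k = 124, 13
for m in (1, 2, 4):
    print('m =', m, 'idealized speedup:', round(speedup(n, k, m), 2))
```

Under this model the ratio equals $m\cdot(n + k)/(n + k/m)$, so it comes out slightly above $m$. A measured ratio can fall well below it, since solver overhead and the number of iterations are not part of the formula.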

## To demonstrate, here are some experiments on toy datasets:

In [36]:
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import warnings
import numpy as np
import pandas as pd
import random as rd
import time

from AdditiveLogisticRegression import ALogisticRegression
from sklearn.linear_model import LogisticRegression

In [38]:
warnings.filterwarnings('ignore')

## 1. Test on the wine data (a classification task with only numerical features)

In [39]:
data = load_wine()
target = pd.DataFrame(data.target)
data = pd.DataFrame(data.data)
X_train, X_test, y_train, y_test = train_test_split(data, target, test_size=0.3)
random_sample = rd.randint(0, 100)
print('Random sample data:', data.iloc[random_sample], sep='\n')
print('Target(one of the following: (0, 1, 2)):', target.iloc[random_sample], sep='\n')

Random sample data:
0       14.02
1        1.68
2        2.21
3       16.00
4       96.00
5        2.65
6        2.33
7        0.26
8        1.98
9        4.70
10       1.04
11       3.59
12    1035.00
Name: 29, dtype: float64
Target(one of the following: (0, 1, 2)):
0    0
Name: 29, dtype: int32


## 2. Now let's run experiments to estimate the expected accuracy and the execution time ratio:

In [43]:
ratios = []
accuracy_sub = []
accuracy_full = []

for i in range(100):
    # ALogisticRegression is my own class, a rewrite of the original LogisticRegression
    # k is the number of sub-models (m in the complexity formula above); you can change it
    # max_iter turned out to have the biggest effect here; you can set it too,
    # and all the other parameters of the plain LogisticRegression can be passed as well
    additive_logistic = ALogisticRegression(k=4, max_iter=1000)
    additive_logistic.fit(X_train, y_train)

    y_prediction = additive_logistic.predict(X_test)
    score = accuracy_score(y_test, y_prediction)
    accuracy_sub.append(score)

    # time_evaluation is a built-in attribute of the class holding its measured time
    sub_time = additive_logistic.time_evaluation

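    # Now fit the standard LogisticRegression on the whole feature set
    model_check = LogisticRegression(max_iter=1000)
    # perf_counter is a monotonic, high-resolution clock suited to timing short runs
    start = time.perf_counter()
    model_check.fit(X_train, np.ravel(y_train))
    end = time.perf_counter()

    full_model_time = end - start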
    
    y_preds = model_check.predict(X_test)
    score = accuracy_score(y_test, y_preds)
    accuracy_full.append(score)

    ratios.append(full_model_time/sub_time)

print('Accuracy of standard model:', np.mean(accuracy_full), 'Accuracy of split model:', np.mean(accuracy_sub))
print('Mean ratio of execution time (standard / split):', np.mean(ratios))

Accuracy of standard model: 0.9259259259259257 Accuracy of split model: 0.962962962962963
Mean ratio of execution time (standard / split): 1.4471698807695474
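Single timings of such short fits are noisy, so it may help to report the spread along with the mean. The following is a minimal sketch using only the standard library; `time_call` and `summarize` are hypothetical helpers, and the two lambdas are stand-in workloads rather than real model fits.

```python
import statistics
import time


def time_call(fn, repeats=5):
    """Return the median wall-clock time of calling fn() `repeats` times."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def summarize(values):
    """Return (mean, sample standard deviation) of a list of numbers."""
    return statistics.mean(values), statistics.stdev(values)


# Stand-in workloads; in the experiment these would be the two fit calls
slow = time_call(lambda: sum(i * i for i in range(200_000)))
fast = time_call(lambda: sum(i * i for i in range(50_000)))
print('time ratio (slow / fast):', slow / fast)

# Placeholder numbers only, to show the call; not measured results
mean, spread = summarize([1.0, 2.0, 3.0])
print('mean:', mean, 'std:', spread)
```

In the loop above, `summarize(ratios)` would give the mean ratio together with its standard deviation, which shows how much of the difference is timing noise.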
