# Logistic Regression
You should build a machine learning pipeline using a logistic regression model. In particular, you should do the following:
- Load the `mnist` dataset using [Pandas](https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html). You can find this dataset in the datasets folder.
- Split the dataset into training and test sets using [Scikit-Learn](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html). 
- Train and test a logistic regression model using [Scikit-Learn](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html).
- Check the documentation to identify the most important hyperparameters, attributes, and methods of the model. Use them in practice.
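
As a rough illustration of the last point, the sketch below shows where hyperparameters, attributes, and methods appear on a `LogisticRegression` object. It assumes scikit-learn's bundled 8x8 digits dataset (`load_digits`) as a stand-in for `mnist.csv`, and the values `C=1.0` and `max_iter=5000` are illustrative choices, not recommendations.

```python
import sklearn.datasets
import sklearn.linear_model

# Stand-in data (assumption): scikit-learn's bundled 8x8 digits, not the course's mnist.csv.
x, y = sklearn.datasets.load_digits(return_X_y=True)

# Hyperparameters are set in the constructor, before training.
model = sklearn.linear_model.LogisticRegression(C=1.0, max_iter=5000)
params = model.get_params()
print(params["C"], params["max_iter"])

# Attributes (names ending in an underscore) exist only after fit().
model.fit(x, y)
print(model.classes_)     # labels seen during training
print(model.coef_.shape)  # learned weights, shaped (n_classes, n_features)
print(model.n_iter_)      # iterations the solver actually used

# Methods: predict() returns labels, predict_proba() returns class probabilities.
print(model.predict(x[:3]))
print(model.predict_proba(x[:3]).shape)
```

`C` (inverse regularization strength) and `max_iter` are probably the two hyperparameters most worth experimenting with first in this exercise.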

## Import libraries

In [13]:
import pandas as pd
import sklearn.model_selection
import sklearn.linear_model
import sklearn.metrics

## Import dataset

In [14]:
mnist_db = pd.read_csv('/Users/adolfomytr/Documents/Alemania/Master/GISMA/Materias/teaching-main/datasets/mnist.csv')
mnist_db = mnist_db.set_index('id')
mnist_db.head()

id,class,pixel1,pixel2,pixel3,pixel4,pixel5,pixel6,pixel7,pixel8,pixel9,...,pixel775,pixel776,pixel777,pixel778,pixel779,pixel780,pixel781,pixel782,pixel783,pixel784
31953,5,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
34452,8,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
60897,5,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
36953,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1981,3,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


## Split into training set and testing set

- The target value (`class`) is a discrete variable: each row is labelled with a digit class, so this is a classification problem.
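
Whether the target is discrete can be checked directly by counting its distinct values. The sketch below is a minimal example that assumes a hypothetical miniature of `mnist_db` built only from the five `id` and `class` values visible in the preview above; on the real DataFrame the same three calls would apply.

```python
import pandas as pd

# Hypothetical miniature of mnist_db: only the ids and classes shown in the preview.
mnist_db = pd.DataFrame(
    {"class": [5, 8, 5, 0, 3]},
    index=pd.Index([31953, 34452, 60897, 36953, 1981], name="id"),
)

y = mnist_db["class"]
print(y.dtype)                        # an integer dtype suggests discrete labels
print(y.nunique())                    # number of distinct classes in this sample
print(y.value_counts().sort_index())  # number of rows per class
```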

In [15]:
x = mnist_db.drop(['class'], axis=1)
y = mnist_db['class']
x_train, x_test, y_train, y_test = sklearn.model_selection.train_test_split(x, y)

print('mnist_db', mnist_db.shape)
print('x_train', x_train.shape)
print('x_test', x_test.shape)
print('y_train', y_train.shape)
print('y_test', y_test.shape)

mnist_db (4000, 785)
x_train (3000, 784)
x_test (1000, 784)
y_train (3000,)
y_test (1000,)
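
The call above relies on the defaults of `train_test_split`, which hold out 25% of the rows and shuffle with a different seed on each run. The sketch below makes those choices explicit and adds stratification. It assumes randomly generated stand-in data with the same shape as the dataset (4000 rows, 784 features, ten balanced classes), and the seed `42` is arbitrary.

```python
import numpy as np
import sklearn.model_selection

# Hypothetical stand-in for the features and labels: 4000 rows, 10 balanced classes.
rng = np.random.default_rng(0)
x = rng.integers(0, 256, size=(4000, 784))
y = np.repeat(np.arange(10), 400)

x_train, x_test, y_train, y_test = sklearn.model_selection.train_test_split(
    x,
    y,
    test_size=0.25,   # the default value, stated explicitly
    random_state=42,  # fixed seed so the split is reproducible
    stratify=y,       # keep class proportions the same in both sets
)

print(x_train.shape, x_test.shape)
print(np.bincount(y_test))  # number of test rows per class
```

Fixing `random_state` makes accuracy figures comparable between runs, and `stratify=y` avoids a test set in which some digit is under-represented by chance.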


## Train the model

- We reached the maximum number of iterations. What does that mean?
    - The solver stopped before converging, so we must increase `max_iter` (for example, to 10,000). Review the documentation.
- Which function computes the maximum likelihood, so that we can validate that the algorithm is accurate?

In [16]:
model = sklearn.linear_model.LogisticRegression(max_iter=5000)
model.fit(x_train, y_train)

LogisticRegression(max_iter=5000)
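
To see what hitting the iteration limit looks like in practice, the sketch below trains once with a deliberately small `max_iter` and once with a larger one on rescaled inputs, then compares the fitted `n_iter_` attribute. It assumes scikit-learn's bundled digits dataset as a stand-in for `mnist.csv`. The scaling step is an addition that often helps the solver converge faster; it is not part of the original pipeline.

```python
import warnings

import sklearn.datasets
import sklearn.exceptions
import sklearn.linear_model

# Stand-in data (assumption): bundled 8x8 digits with pixel values from 0 to 16.
x, y = sklearn.datasets.load_digits(return_X_y=True)

# With very few iterations the solver stops early and emits a ConvergenceWarning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    small = sklearn.linear_model.LogisticRegression(max_iter=5).fit(x, y)
hit_limit = any(
    issubclass(w.category, sklearn.exceptions.ConvergenceWarning) for w in caught
)
print(hit_limit, small.n_iter_)

# Rescaling the pixels to [0, 1] and raising max_iter gives the solver room to converge.
# For MNIST pixels the divisor would be 255, assuming the usual 0 to 255 range.
x_scaled = x / 16.0
big = sklearn.linear_model.LogisticRegression(max_iter=5000).fit(x_scaled, y)
print(big.n_iter_, big.n_iter_[0] < 5000)
```

If `n_iter_` equals `max_iter`, the solver used its whole budget and the coefficients may not be at the optimum yet.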

## Test the model

In [17]:
y_predicted = model.predict(x_test)
# print(y_predicted)
accuracy = sklearn.metrics.accuracy_score(y_test, y_predicted)
print(accuracy)

[1 7 3 7 2 7 7 7 4 1 3 0 6 9 8 8 8 7 9 8 7 5 7 1 4 5 0 4 8 6 6 4 7 1 0 5 5
 0 1 0 5 4 5 0 7 1 0 6 2 0 9 5 3 1 2 0 9 8 3 3 2 2 0 4 9 8 8 1 1 5 3 3 2 9
 5 9 7 1 6 0 1 8 3 2 1 6 6 2 7 2 8 8 8 1 8 4 1 0 2 9 6 5 5 3 0 9 9 2 0 3 7
 4 0 2 5 2 6 0 1 6 3 1 2 5 0 3 6 7 5 4 0 7 6 8 6 8 8 0 7 8 4 2 8 5 1 0 3 0
 4 9 7 4 9 9 8 8 7 3 8 9 4 3 1 8 1 1 6 7 5 1 8 7 7 0 1 0 8 9 6 1 8 2 3 0 4
 7 4 2 1 7 2 1 1 5 3 1 4 7 1 4 0 6 7 0 1 6 4 9 2 2 5 7 9 0 6 2 4 1 8 0 6 0
 3 4 6 1 6 6 7 0 0 4 7 6 7 8 1 7 2 8 5 9 8 4 5 1 3 6 8 7 6 4 9 9 5 8 9 6 3
 2 3 2 0 6 3 8 7 4 3 3 3 2 7 1 7 9 8 4 3 4 6 7 7 2 8 3 6 2 7 2 3 2 4 5 9 2
 1 7 7 1 7 9 7 6 9 9 0 7 1 2 4 6 5 1 1 3 4 4 7 1 4 1 2 4 6 8 6 8 1 2 7 3 5
 8 6 8 1 5 5 8 9 3 6 0 1 3 0 1 0 4 3 2 6 2 8 9 0 2 7 3 9 0 0 4 0 2 4 2 8 2
 7 9 7 5 8 2 1 0 1 2 6 1 5 4 4 7 9 0 3 9 5 7 7 6 8 4 0 1 4 8 5 6 9 4 8 4 2
 6 3 9 4 9 3 4 6 6 5 1 7 2 5 8 6 0 0 4 4 4 4 4 0 0 1 9 6 2 7 2 6 1 6 8 6 9
 9 2 6 6 9 5 5 2 5 3 2 2 2 3 2 7 9 2 0 6 5 6 3 7 6 0 5 1 1 9 2 3 1 5 4 2 6
 6 9 3 4 9 3 3 0 4 2 7 8 
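
Accuracy is only one summary of the test results. For the likelihood question raised in the training notes, one candidate is `sklearn.metrics.log_loss`, which reports the mean negative log-likelihood of the true labels under the predicted probabilities. The sketch below computes it alongside accuracy and a confusion matrix. It assumes the bundled digits dataset as a stand-in for `mnist.csv`, with an arbitrary seed and a scaling step that are not part of the original cells.

```python
import sklearn.datasets
import sklearn.linear_model
import sklearn.metrics
import sklearn.model_selection

# Stand-in data (assumption): bundled 8x8 digits, pixels rescaled to [0, 1].
x, y = sklearn.datasets.load_digits(return_X_y=True)
x_train, x_test, y_train, y_test = sklearn.model_selection.train_test_split(
    x / 16.0, y, random_state=0
)
model = sklearn.linear_model.LogisticRegression(max_iter=5000).fit(x_train, y_train)

y_predicted = model.predict(x_test)
y_proba = model.predict_proba(x_test)

accuracy = sklearn.metrics.accuracy_score(y_test, y_predicted)
# Mean negative log-likelihood of the true labels (lower is better).
nll = sklearn.metrics.log_loss(y_test, y_proba, labels=model.classes_)
# Rows are true classes, columns are predicted classes.
cm = sklearn.metrics.confusion_matrix(y_test, y_predicted, labels=model.classes_)

print(accuracy, nll)
print(cm)
```

The diagonal of the confusion matrix counts correct predictions, so its off-diagonal entries show which digits the model confuses with each other.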