# Let's Get This Snowball Rolling

Evaluate how well a logistic regression model performs. The context is a fintech startup that wants to grow its user base more quickly. By applying the ML code to customer data, you'll explore how machine learning could accelerate the firm's growth.

## Instructions

1. Read in the dataset about the current customers of the startup.

2. Split the data into X and y and then into testing and training sets.

3. Fit a logistic regression classifier.

4. Create the predicted values for the testing and the training data.

5. Print a confusion matrix for the training data.

6. Print a confusion matrix for the testing data.

7. Print the training classification report.

8. Print the testing classification report.

9. Answer the following question: How does the model performance compare between the training data and the testing data?


## Resources:

The following links point to the scikit-learn modules used in this activity:

[Logistic Regression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html)

[train_test_split](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html)

[classification_report](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.classification_report.html)

[confusion_matrix](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.confusion_matrix.html)


In [1]:
# Import the required modules
import pandas as pd
from pathlib import Path
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report


## Step 1: Read in the dataset about the current customers of the startup.

In [2]:
# Read the usage_stats.csv file from the Resources folder into a Pandas DataFrame
customer_df = pd.read_csv(Path('./Resources/usage_stats.csv'))

# Review the DataFrame
customer_df.head()


| Unnamed: 0 | Usage Stats | Referral History | Customer Rank | target |
|---|---|---|---|---|
| 0 | 1.054075 | -2.010163 | -0.918689 | 0 |
| 1 | 2.033251 | -0.212776 | -2.947451 | 0 |
| 2 | 1.049233 | -2.239878 | -0.77708 | 0 |
| 3 | 0.837035 | -1.926558 | -1.113686 | 0 |
| 4 | 1.19377 | -1.550953 | -1.539586 | 0 |
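
The `Unnamed: 0` column in the preview may be a row index that was saved into the CSV (it could also be an artifact of how the preview was exported). The following is a minimal sketch, assuming the first interpretation, of how `index_col=0` in `pd.read_csv` keeps such a column out of the features. The inline CSV text is a stand-in built from the first preview rows with an assumed empty first header; it is not the real file.

```python
import io
import pandas as pd

# Stand-in for usage_stats.csv (assumption: the file's first header cell is empty).
csv_text = """,Usage Stats,Referral History,Customer Rank,target
0,1.054075,-2.010163,-0.918689,0
1,2.033251,-0.212776,-2.947451,0
2,1.049233,-2.239878,-0.77708,0
"""

# Default read: the unnamed first column becomes a regular data column.
default_df = pd.read_csv(io.StringIO(csv_text))

# index_col=0 treats the first column as the row index instead.
indexed_df = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(default_df.columns))
print(list(indexed_df.columns))
```

The results printed in this notebook come from the DataFrame as it was read in `In [2]`, so changing how the file is read would likely change the numbers in the later steps.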


## Step 2: Split the data into X and y and then into testing and training sets.

In [3]:
# Split the data into X (features) and y (target)

# The y variable should focus on the target column
y = customer_df['target']

# The X variable should include all features except the target
X = customer_df.drop(columns='target')


In [5]:
# Split into testing and training sets using train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y)
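
The call above relies on the `train_test_split` defaults: 25% of the rows are held out for testing, and because no `random_state` is passed, each run shuffles the rows differently. The sketch below uses hypothetical synthetic data (not the customer dataset) to show how a seed and the `stratify` argument make the split repeatable and keep the class ratio the same in both sets. The seed value `1` is arbitrary.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 1,000 rows, 3 features, 10% positives.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(1000, 3))
y_demo = np.array([1] * 100 + [0] * 900)

# random_state fixes the shuffle; stratify keeps the 10% positive rate
# in both the training and the testing portion.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, random_state=1, stratify=y_demo
)

print(len(X_tr), len(X_te))
print(y_tr.mean(), y_te.mean())
```

Stratifying can matter for a target as imbalanced as this one, where an unlucky shuffle could leave very few positive rows in the testing set.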


## Step 3: Fit a logistic regression classifier.

In [10]:
# Declare a logistic regression model.
# Apply a random_state of 9 to the model
logistic_regression_model = LogisticRegression(random_state=9)

# Fit and save the logistic regression model using the training data
lr_model = logistic_regression_model.fit(X_train,y_train)


## Step 4: Create the predicted values for the testing and the training data.

In [12]:
# Generate training predictions
training_predictions = lr_model.predict(X_train)

# Generate testing predictions
testing_predictions = lr_model.predict(X_test)
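
For a binary problem, `predict` returns the class with the higher estimated probability, which `predict_proba` exposes directly. The sketch below illustrates that relationship on hypothetical data generated with `make_classification`; the variable names and dataset are assumptions for illustration and are not part of the activity.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Hypothetical stand-in data with 3 informative features.
X_demo, y_demo = make_classification(
    n_samples=200, n_features=3, n_informative=3, n_redundant=0, random_state=9
)

demo_model = LogisticRegression(random_state=9).fit(X_demo, y_demo)

# One column of probabilities per class, in the order given by classes_.
probabilities = demo_model.predict_proba(X_demo)

# Hard labels, as used for the confusion matrices in this notebook.
labels = demo_model.predict(X_demo)

# Picking the most probable class by hand should match predict().
labels_from_probabilities = demo_model.classes_[np.argmax(probabilities, axis=1)]
print(np.array_equal(labels, labels_from_probabilities))
```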


## Step 5: Print a confusion matrix for the training data.

In [13]:
# Import the confusion_matrix function from sklearn.metrics
from sklearn.metrics import confusion_matrix

# Create and save the confusion matrix for the training data
training_matrix = confusion_matrix(y_train,training_predictions)

# Print the confusion matrix for the training data
print(training_matrix)


[[806   6]
 [ 19  76]]
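
In scikit-learn's layout, rows are the actual classes and columns are the predicted classes, so the matrix reads `[[TN, FP], [FN, TP]]` for labels 0 and 1. As a quick check, the sketch below recomputes the class 1 precision and recall and the overall accuracy by hand from the training matrix printed above.

```python
# Training confusion matrix copied from the output above.
training_matrix = [[806, 6],
                   [19, 76]]

# Unpack the four cells: true negatives, false positives,
# false negatives, true positives.
(tn, fp), (fn, tp) = training_matrix

precision_1 = tp / (tp + fp)                 # 76 / 82
recall_1 = tp / (tp + fn)                    # 76 / 95
accuracy = (tn + tp) / (tn + fp + fn + tp)   # 882 / 907

print(round(precision_1, 2), round(recall_1, 2), round(accuracy, 2))
# prints: 0.93 0.8 0.97
```

These values agree with the class 1 row and the accuracy line of the training classification report in Step 7.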


## Step 6: Print a confusion matrix for the testing data.

In [14]:
# Create and save the confusion matrix for the testing data
test_matrix = confusion_matrix(y_test,testing_predictions)

# Print the confusion matrix for the testing data
print(test_matrix)


[[277   0]
 [  3  23]]


## Step 7: Print the training classification report.

In [15]:
# Create and save the training classification report
training_report = classification_report(y_train,training_predictions)

# Print the training classification report
print(training_report)


              precision    recall  f1-score   support

           0       0.98      0.99      0.98       812
           1       0.93      0.80      0.86        95

    accuracy                           0.97       907
   macro avg       0.95      0.90      0.92       907
weighted avg       0.97      0.97      0.97       907



## Step 8: Print the testing classification report.

In [16]:
# Create and save the testing classification report
testing_report = classification_report(y_test,testing_predictions)

# Print the testing classification report
print(testing_report)


              precision    recall  f1-score   support

           0       0.99      1.00      0.99       277
           1       1.00      0.88      0.94        26

    accuracy                           0.99       303
   macro avg       0.99      0.94      0.97       303
weighted avg       0.99      0.99      0.99       303
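
To compare the two reports number by number rather than by eye, `classification_report` can return a dictionary via `output_dict=True`. The sketch below is one possible way to do that. Because the original split is not reproducible, it rebuilds label arrays from the two confusion matrices printed in Steps 5 and 6; the helper `labels_from_matrix` is hypothetical and exists only for this illustration.

```python
import numpy as np
from sklearn.metrics import classification_report

def labels_from_matrix(matrix):
    """Rebuild (y_true, y_pred) arrays that reproduce a confusion matrix."""
    y_true, y_pred = [], []
    for actual, row in enumerate(matrix):
        for predicted, count in enumerate(row):
            y_true += [actual] * count
            y_pred += [predicted] * count
    return np.array(y_true), np.array(y_pred)

# Confusion matrices copied from the outputs of Steps 5 and 6.
train_true, train_pred = labels_from_matrix([[806, 6], [19, 76]])
test_true, test_pred = labels_from_matrix([[277, 0], [3, 23]])

# output_dict=True returns nested dictionaries keyed by class label.
train_report = classification_report(train_true, train_pred, output_dict=True)
test_report = classification_report(test_true, test_pred, output_dict=True)

# Show how far the class 1 testing scores sit above the training scores.
for metric in ("precision", "recall", "f1-score"):
    diff = test_report["1"][metric] - train_report["1"][metric]
    print(f"class 1 {metric}: test - train = {diff:+.3f}")
```

In the notebook itself, the same dictionaries could be produced directly from `y_train`, `training_predictions`, `y_test`, and `testing_predictions`.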



## Step 9: Answer the following question

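**Question:** How does the model performance compare between the training data and the testing data?

**Answer:** The model scored higher on the testing data than on the training data for every metric in the classification report. Accuracy rose from 0.97 to 0.99, and for class 1, precision rose from 0.93 to 1.00, recall from 0.80 to 0.88, and F1 from 0.86 to 0.94. Because the testing scores are not lower than the training scores, there is no sign of overfitting. The testing set contains only 26 class 1 rows, though, and the split was not seeded, so the size of the gap may partly reflect which rows landed in each set.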