![](COMM_493_Banner.png)

# This is a Jupyter Notebook

This is where you will develop the "Business" part of your assignments.

You can create…


**Headings**

# Heading 1

## Heading 2

### Heading 3


**Lists**

1. One
2. Two
3. Three


[Markdown Cheatsheet](https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet)

In [None]:
# You can also run code
print("Hello World")

# We will be using the following libraries throughout this course

![](NumPy.png)

NumPy performs a wide variety of mathematical operations on arrays. It adds array and matrix data structures to Python that support efficient calculations, and it supplies a large library of high-level mathematical functions that operate on them.

**Why use NumPy**

NumPy arrays are faster and more compact than Python lists. Because every element of an array has the same data type, an array uses much less memory than an equivalent list. NumPy also lets you specify that data type explicitly, which allows the code to be optimized even further.
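As a rough illustration of the memory claim, the sketch below compares a Python list with NumPy arrays holding the same 1,000 integers. The variable names and the choice of 1,000 values are assumptions made for this example, and the exact list size depends on the Python version.

```python
import sys
import numpy as np

values = list(range(1000))

# A list stores pointers to separate int objects, so count both parts
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)

# An array stores raw values; the dtype sets the bytes used per element
arr64 = np.array(values, dtype=np.int64)  # 8 bytes per element
arr16 = np.array(values, dtype=np.int16)  # 2 bytes per element

print("list :", list_bytes, "bytes")
print("int64:", arr64.nbytes, "bytes")
print("int16:", arr16.nbytes, "bytes")
```

Choosing the smaller `int16` type works here only because every value fits in 16 bits.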

[NumPy Documentation](https://numpy.org/devdocs/index.html)

In [None]:
# Python list
a = [1, 2, 3, 4, 5]
print(a)

In [None]:
import numpy as np

b = np.array([1,2,3,4,5])
print(b)

Not that exciting, is it? It will be when we start dealing with larger datasets.
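To hint at what changes with larger data, here is a minimal sketch that applies one operation to a million elements at once. The array size and the variable names are assumptions chosen for illustration.

```python
import numpy as np

# One million integers; int64 is set explicitly so the squares cannot overflow
big = np.arange(1_000_000, dtype=np.int64)

# One expression squares every element; no explicit Python loop is needed
squared = big ** 2

print(squared[:5])   # [ 0  1  4  9 16]
print(big.mean())    # 499999.5
```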

![](matplotlib_logo.png)

# Matplotlib: Visualization with Python

Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python. Matplotlib makes easy things easy and hard things possible.

* Create publication quality plots.
* Make interactive figures that can zoom, pan, update.
* Customize visual style and layout.
* Export to many file formats.
* Embed in JupyterLab and Graphical User Interfaces.
* Use a rich array of third-party packages built on Matplotlib.

[matplotlib Documentation](https://matplotlib.org/stable/index.html)
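The points above about customizing style and exporting to files can be sketched as follows. The file name `squares.png` and the styling choices are assumptions for this example, not course requirements.

```python
from pathlib import Path
import matplotlib.pyplot as plt

# Create a figure and one set of axes, then style the line
fig, ax = plt.subplots(figsize=(6, 4))
ax.plot([1, 2, 3, 4], [1, 4, 9, 16], color="tab:green", linestyle="--", marker="o")
ax.set_title("Squares")
ax.set_xlabel("x")
ax.set_ylabel("x squared")

out_path = Path("squares.png")   # hypothetical file name
fig.savefig(out_path, dpi=150)   # the format is inferred from the extension
```

Changing the extension to `.pdf` or `.svg` should produce those formats instead.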

In [None]:
import matplotlib.pyplot as plt

plt.plot([1, 2, 3, 4])
plt.ylabel('some numbers')
plt.show()

In [None]:
plt.plot([1, 2, 3, 4], [1, 4, 9, 16])
plt.ylabel('Y Axis Numbers')
plt.xlabel('X Axis Numbers')
plt.show()

In [None]:
names = ['group_a', 'group_b', 'group_c']
values = [1, 10, 100]

plt.figure(figsize=(9, 3))

plt.subplot(131)
plt.bar(names, values)
plt.subplot(132)
plt.scatter(names, values)
plt.subplot(133)
plt.plot(names, values)
plt.suptitle('Categorical Plotting')
plt.show()

![](Seaborn.png)

# Seaborn: Fancy visualization with Python

Seaborn is a Python data visualization library based on matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics.

[Seaborn Documentation](https://seaborn.pydata.org/tutorial.html)
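Seaborn plots directly from a DataFrame by naming its columns. The sketch below uses a small made-up table (the `toy` DataFrame and its values are invented for illustration only), so it runs without downloading an example dataset.

```python
import pandas as pd
import seaborn as sns

# Made-up values, used only to show the call pattern
toy = pd.DataFrame({
    "hours": [1, 2, 3, 4, 5, 6],
    "score": [52, 58, 61, 70, 74, 81],
    "group": ["a", "b", "a", "b", "a", "b"],
})

sns.set_theme()

# Column names map to the axes, and hue colors the points by group
ax = sns.scatterplot(data=toy, x="hours", y="score", hue="group")
ax.set_title("Toy data")
```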

In [None]:
# Import seaborn
import seaborn as sns

# Apply the default theme
sns.set_theme()

# Load an example dataset
tips = sns.load_dataset("tips")

# Create a visualization
sns.relplot(
    data=tips,
    x="total_bill", y="tip", col="time",
    hue="smoker", style="smoker", size="size",
)

In [None]:
dots = sns.load_dataset("dots")
sns.relplot(
    data=dots, kind="line",
    x="time", y="firing_rate", col="align",
    hue="choice", size="coherence", style="choice",
    facet_kws=dict(sharex=False),
)

![](Pandas_logo.png)

# pandas: Data Analysis and Manipulation with Python

**pandas** is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool,
built on top of the Python programming language.

[Pandas Documentation](https://pandas.pydata.org/docs/user_guide/index.html)

In [None]:
# Remember numpy?
import numpy as np
import pandas as pd

## Object creation

Creating a Series by passing a list of values, letting pandas create a default integer index:

In [None]:
s = pd.Series([1, 3, 5, np.nan, 6, 8])

print(s)
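The `np.nan` entry marks a missing value. As a small sketch of how pandas treats it, the snippet below rebuilds the same Series and checks its type, counts the missing entries, and takes the mean, which skips missing values by default.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 3, 5, np.nan, 6, 8])

print(s.dtype)         # float64, because NaN is a floating-point value
print(s.isna().sum())  # 1 missing value
print(s.mean())        # 4.6, the mean of the five non-missing values
```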

**Creating a DataFrame by passing a NumPy array, with a datetime index using date_range() and labeled columns:**

In [None]:
dates = pd.date_range("20130101", periods=6)

print("These are the dates")
print("===================")
print(dates)

df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list("ABCD"))

print("")
print("This is the DataFrame")
print("=====================")
print(df)
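A datetime index makes it possible to select rows by date. The sketch below rebuilds a similar frame and slices it with `.loc`. The seeded generator is an assumption added here so the random numbers repeat between runs.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20130101", periods=6)
rng = np.random.default_rng(0)  # seeded so the numbers repeat between runs
df = pd.DataFrame(rng.standard_normal((6, 4)), index=dates, columns=list("ABCD"))

# Label-based slicing on a datetime index includes both endpoints
subset = df.loc["2013-01-02":"2013-01-04", ["A", "B"]]
print(subset.shape)  # (3, 2)
print(subset)
```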

**Creating a DataFrame by passing a dictionary of objects that can be converted into a series-like structure:**

In [None]:
df2 = pd.DataFrame(
    {
        "A": 1.0,
        "B": pd.Timestamp("20130102"),
        "C": pd.Series(1, index=list(range(4)), dtype="float32"),
        "D": np.array([3] * 4, dtype="int32"),
        "E": pd.Categorical(["test", "train", "test", "train"]),
        "F": "foo",
    }
)


print(df2)

**The columns of the resulting DataFrame have different dtypes:**

In [None]:
df2.dtypes
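Because each column has its own dtype, columns can be filtered by type. As a minimal sketch, `select_dtypes` keeps only the numeric columns of the same frame.

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame(
    {
        "A": 1.0,
        "B": pd.Timestamp("20130102"),
        "C": pd.Series(1, index=list(range(4)), dtype="float32"),
        "D": np.array([3] * 4, dtype="int32"),
        "E": pd.Categorical(["test", "train", "test", "train"]),
        "F": "foo",
    }
)

# Keep float and integer columns; dates, categories, and text are dropped
numeric = df2.select_dtypes(include="number")
print(list(numeric.columns))  # ['A', 'C', 'D']
```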

## Viewing data

Use DataFrame.head() and DataFrame.tail() to view the top and bottom rows of the frame respectively:

In [None]:
df.head()


In [None]:
df.tail(3)

![](scikit-learn_logo.png)

# scikit-learn: Machine Learning Libraries and More

* Simple and efficient tools for predictive data analysis
* Accessible to everybody, and reusable in various contexts

[scikit-learn Documentation](https://scikit-learn.org/stable/user_guide.html)

In [None]:
# Import the packages needed

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
# Load the dataset into a DataFrame

data=pd.read_csv('diabetes2.csv')

In [None]:
# Let's take a look at the data

data.head()

In [None]:
# Now let's look at the info of the dataset

data.info()

In [None]:
# Describe will allow us to get more insights into the data

data.describe()

In [None]:
# Let's take a look at the correlations

data.corr()

In [None]:
# Matplotlib heatmap

fig, ax = plt.subplots(figsize=(9,9))
im = ax.imshow(data.corr(), interpolation='nearest')
fig.colorbar(im, orientation='vertical')

In [None]:
# Seaborn Heatmap

f,ax = plt.subplots(figsize=(8,6))
sns.heatmap(data.corr(), cmap="GnBu", annot=True, linewidths=0.5, fmt= '.1f',ax=ax)
plt.show()

In [None]:
# Let's sort the correlations

data.corr().Outcome.sort_values()

In [None]:
# Separate the target variable (y) from the features (x)

y = data.loc[:,"Outcome"].values
x = data.drop(['Outcome'], axis = 1)


In [None]:
# Split the data into testing and training

from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.33, random_state = 123)
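To see what `test_size` and `random_state` do, the sketch below splits 100 rows of placeholder data (invented for this example, not the diabetes dataset) with the same settings and prints the resulting shapes.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: 100 rows, 3 feature columns, binary labels
x = np.arange(300).reshape(100, 3)
y = np.array([0, 1] * 50)

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.33, random_state=123
)

# About one third of the rows are held out for testing
print(x_train.shape, x_test.shape)  # (67, 3) (33, 3)
```

Reusing the same `random_state` value should give the same split on every run, which keeps results comparable.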

In [None]:
# Create logistic regression object

from sklearn import linear_model
logreg = linear_model.LogisticRegression(max_iter=150)

In [None]:
# Train the model using the training sets

logreg.fit(x_train,y_train)

In [None]:
# Make predictions using the testing set

predicted = logreg.predict(x_test)

In [None]:
# Show the accuracy of the model

print("Test accuracy: {} ".format(logreg.score(x_test, y_test)))

In [None]:
# Create a confusion matrix to identify TP, FP, FN, TN

from sklearn.metrics import confusion_matrix
cf_matrix = confusion_matrix(y_test,predicted)
cf_matrix
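For binary labels, scikit-learn arranges the matrix as `[[TN, FP], [FN, TP]]`. The sketch below uses made-up label lists (not the diabetes results) to unpack the four counts and recompute accuracy by hand.

```python
from sklearn.metrics import confusion_matrix

# Made-up labels, used only to show the layout of the matrix
y_true = [0, 0, 0, 1, 1, 1, 1, 0]
y_pred = [0, 1, 0, 1, 0, 1, 1, 0]

# ravel() flattens the 2x2 matrix row by row: TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tn, fp, fn, tp)  # 3 1 1 3

# Accuracy is the share of predictions that were correct
accuracy = (tp + tn) / (tp + tn + fp + fn)
print(accuracy)  # 0.75
```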

In [None]:
# Create a heatmap of the Confusion Matrix

sns.heatmap(cf_matrix/np.sum(cf_matrix), annot=True, 
            fmt='.2%', cmap='Blues')
plt.show()

![](Confusion_Matrix.png)

In [None]:
# Create your own DataFrame and run specific predictions

column_names=["Pregnancies", "Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI", "DiabetesPedigreeFunction", "Age"]
test_prediction_array = [ [6, 148, 72, 35, 0, 33.6, 0.627, 50],
                          [1, 85, 66, 29, 0, 26.6, 0.351, 31] ]
test_prediction_df = pd.DataFrame(test_prediction_array, columns=column_names)

test_prediction_df

In [None]:
# Predict
logreg.predict(test_prediction_df)
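`predict` returns only the class label. A logistic regression model can also report how confident it is through `predict_proba`. The sketch below shows this on synthetic data generated for the example; `x_demo`, `y_demo`, and `new_rows` are hypothetical names, not part of the diabetes workflow.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic data: the label is 1 when the two features sum to more than 0
rng = np.random.default_rng(0)
x_demo = rng.normal(size=(200, 2))
y_demo = (x_demo[:, 0] + x_demo[:, 1] > 0).astype(int)

model = LogisticRegression().fit(x_demo, y_demo)

new_rows = np.array([[2.0, 2.0], [-2.0, -2.0]])

# One row per sample; columns follow the order of model.classes_
proba = model.predict_proba(new_rows)
print(model.classes_)
print(proba.round(3))
```

Each row of probabilities sums to 1, and `predict` picks the class with the larger value.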