Version 08.08.2022, A. S. Lundervold

# Python, Numpy, Pandas, Matplotlib

If you're able to successfully run through this notebook then your Python environment is likely correctly configured.

If you get any error messages when running the code in this notebook, go to https://github.com/alu042/DAT158-2022 for instructions.

## How to use Jupyter Notebook?

[Jupyter Notebook](http://jupyter.org/) is a convenient tool for experimenting with code. Text cells are written in Markdown (optionally mixed with HTML), and code cells are written in Python.

Use the arrow keys to navigate between cells. Press ENTER on a cell to enter edit mode, and ESC to return to command mode. (Try it now!)

In [None]:
print("This is a Jupyter cell containing Python code. Hit 'Run' in the menu to run the cell. ")

You can also run cells using **Shift+Enter** (run the cell and move to the next one) and **Ctrl+Enter** (run the cell and stay on it). Try running the above cell using both of these.

Use Jupyter's Help menu above for more information.

In DAT158ML we'll use Jupyter Notebook for most of our coding, so you'll get plenty of practice with it during the course.

### Exercise
- Experiment with *Tab completion* and *tooltips* in Jupyter
- Read about Jupyter *magic commands*.

Hint: Google. Or have a look [here](http://nbviewer.jupyter.org/github/jvns/pandas-cookbook/blob/v0.2/cookbook/A%20quick%20tour%20of%20IPython%20Notebook.ipynb).

# Import libraries

These are some Python libraries we'll use frequently:

In [None]:
# To display plots directly in notebooks:
%matplotlib inline

In [None]:
# A commonly used plotting library:
import matplotlib
import matplotlib.pyplot as plt

In [None]:
# An extension of matplotlib that can generate nicer plots:
import seaborn as sns

In [None]:
# A library for efficient manipulation of matrices (and more):
import numpy as np

In [None]:
# To read, write and process tabular data:
import pandas as pd

In [None]:
# For machine learning:
import sklearn

# Test libraries

**NB:** The purpose of the following is to test your installation. Don't worry if things don't make much sense to you right now. It'll all become familiar during the course.
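Before running the individual tests, it can help to see which of the libraries are actually importable and in which version. The following is a minimal sketch, not part of the original course material: the helper name `installed_versions` and the `LIBRARIES` list are assumptions made for illustration, and the check relies only on the standard-library `importlib` module.

```python
import importlib

# Import names of the libraries used in this notebook.
LIBRARIES = ["numpy", "pandas", "matplotlib", "seaborn", "sklearn"]


def installed_versions(names):
    """Map each import name to its version string, or None if the import fails.

    Hypothetical helper for illustration only.
    """
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            # The library is missing from the current environment.
            versions[name] = None
        else:
            # Most libraries expose __version__; fall back to "unknown" otherwise.
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


for name, version in installed_versions(LIBRARIES).items():
    print(f"{name:<12} {version if version is not None else 'NOT INSTALLED'}")
```

If any line reports `NOT INSTALLED`, the corresponding test cell below will most likely fail with an `ImportError`, and the setup instructions linked at the top of this notebook are the place to look.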

## `Numpy`

In [None]:
import numpy as np

In [None]:
a = np.array([1, 2, 3])
print(type(a))

In [None]:
e = np.random.random((3,3))
e

## `matplotlib`: a simple plot 

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt

This should result in a figure displaying a sine function.

In [None]:
# Data to be plotted (generated using Numpy)
t = np.arange(0.0, 2.0, 0.01)
s = 1 + np.sin(2 * np.pi * t)

# Create a figure of a certain size
f = plt.figure(figsize=(8,4))

# Plot t versus s
plt.plot(t, s)

# Add title and labels:
plt.title('A simple plot')
plt.xlabel('time (s)')
plt.ylabel('voltage')

# Show the plot:
plt.show()

## `Seaborn`: a more advanced plot

In [None]:
import seaborn as sns

The example below is taken from the [seaborn example gallery](https://seaborn.pydata.org/examples/scatterplot_categorical.html). You'll find many more via this link.

In [None]:
sns.set(style="whitegrid", palette="muted")

# Load the example iris dataset
iris = sns.load_dataset("iris")

# "Melt" the dataset to "long-form" or "tidy" representation
iris = pd.melt(iris, "species", var_name="measurement")

# Set up figure
f, ax = plt.subplots(figsize=(8,8))

# Draw a categorical scatterplot to show each observation
sns.swarmplot(x="measurement", y="value", hue="species", data=iris, size=3, ax=ax)

plt.show()

## `Pandas`

In [None]:
import pandas as pd

In [None]:
df = pd.read_csv('data/0.0-test_data.csv')

In [None]:
df.head()

In [None]:
df['age'].hist()
plt.title("Histogram of age")
plt.xlabel("Age")
plt.show()

## `scikit-learn`: machine learning

In [None]:
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

In [None]:
data = datasets.load_breast_cancer()

In [None]:
X = data['data']
y = data['target']
features = data['feature_names']
labels = data['target_names']

In [None]:
print(features)

In [None]:
print(labels)

In [None]:
# Split the data into training and test sets (by default, 25% is held out for testing)
X_train, X_test, y_train, y_test = train_test_split(X, y)

In [None]:
# A random forest classifier built from 100 decision trees
rf = RandomForestClassifier(n_estimators=100)

In [None]:
rf.fit(X_train, y_train)

In [None]:
predictions = rf.predict(X_test)

In [None]:
accuracy_score(y_test, predictions) * 100
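The last cell reports the accuracy as a percentage. As a rough illustration of what that number means, the sketch below computes the same kind of quantity in plain Python: the fraction of predictions that match the true labels. The function name `accuracy` and the toy labels are made up for this example and are not taken from the breast cancer data.

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("cannot compute accuracy of empty input")
    # Count the positions where prediction and truth agree.
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Made-up labels for illustration only.
toy_true = [0, 1, 1, 0, 1]
toy_pred = [0, 1, 0, 0, 1]

# Four of the five toy predictions match, so this prints the accuracy in percent.
print(accuracy(toy_true, toy_pred) * 100)
```

Because `train_test_split` and `RandomForestClassifier` both involve randomness, the accuracy you see in the notebook will typically vary a little from run to run.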