# Welcome to Jupyter!

Good job making it this far into the solar system.

Google Colab is a free, hosted Jupyter notebook environment that runs in your browser and is tied to your Google account. If you want additional resources on Colab, you can find them here:

- [Welcome To Colaboratory - Colaboratory](https://colab.research.google.com/notebooks/intro.ipynb#scrollTo=5fCEDCU_qrC0)
- [Overview of Colaboratory Features - Colaboratory](https://colab.research.google.com/notebooks/basic_features_overview.ipynb)

## The Building Blocks
Jupyter notebooks are made of two types of blocks.

- Code blocks, written in Python
- Blocks of **markdown**, a language used for rendering text that can be learned extremely quickly

Being able to interweave code and rich text allows ML programmers to communicate about their code in an expressive, effective way.

If possible, it's a good idea to open your Quantic lesson on a separate monitor, or split your screen vertically so you can interact with both interfaces at the same time. We'll be switching back and forth throughout this lesson. Check back in on the Quantic interface for now!

# Markdown Blocks
That was a **big header**.
## This is a smaller one.
That's *much* better.
### Maybe an h3?
#### It's rare to make headers smaller than h4.

`print("Not bad!")`
Wait, I want more code than that!
```
print("That's much more space!")
x = 5
y = 8
print(x+y)
```
I can even make it highlight syntax if I specify the language:
```python
print("That's much more beautiful!")
x = 5 # colored comments!
y = 8 # incredible.
print(x+y)
```

**Links** begin with their *title* in `[brackets]` and their URL in `(parentheses)`, like this: [Markdown Crash Course](https://www.youtube.com/watch?v=HUBNt18RFbo)

What about some math? $y=f(x)$. Wrap inline math in single `$` signs, and use double `$$` signs for equations on a line of their own.

$$\dfrac{1}{2m} \sum_{i=1}^{m}(f(x_i) - y_i)^2$$

Math is rendered using $LaTeX$. You can learn more about it [here](https://web.mit.edu/rsi/www/pdfs/new-latex.pdf). Cool!

## Python Code Blocks

In [None]:
# Try holding Ctrl/Cmd + Enter while your cursor's inside the block
print("Beep boop! Isn't that neat?")

In [None]:
# Try playing the block below *before* playing this one
your_name = "Joe" # put your name here!

In [None]:
print("Hey there,", your_name)

## NumPy

In [None]:
# programmer convention is to abbreviate numpy as np when importing
import numpy as # !fill this in!

# NumPy arrays are Python lists with superpowers
# create them by converting a normal list to a numpy array like so:
# numpy_array = np.array(pylist)
pylist = [1,2,3,4]
numpy_array = # !create a NumPy array with pylist!
print(f"A Python list looks like: \n{pylist}\n")
print(f"A NumPy array looks like: \n{numpy_array}\n")

# if we have more than one feature, our array becomes two dimensional
bigger_numpy_array = np.array([[1,5],[2,6],[3,7], [4,8]])
print(f"A two dimensional array looks like: \n{bigger_numpy_array}\n")
print(f"Dimensions in numpy_array: \n{numpy_array.ndim}\nDimensions in bigger_numpy_array: \n{bigger_numpy_array.ndim}\n")
print(f"Shape of numpy_array: \n{numpy_array.shape}\nShape of bigger_numpy_array: \n{bigger_numpy_array.shape}\n")

In [None]:
numpy_array = np.array([1,2,3,4])
# you can access an item in a NumPy array via its index, seen below
print(numpy_array[0], numpy_array)
numpy_array[0] = 2
print(numpy_array[0], numpy_array)
# fill in the following line to change the third item to 8
# (remember the array is zero-indexed)
numpy_array[] = 
print(numpy_array)

In [None]:
numpy_array = np.array([1,2,3,4])
second_np_array = np.array([2,4,6,8])

# increase each item in the array by a single number
array_addition = numpy_array + 5
print(array_addition)

# multiply each item in the array by a single number
array_product = numpy_array * 5
print(array_product)

# add two arrays with each other
arrays_added = numpy_array + second_np_array
print(arrays_added)

# and multiply arrays with each other
arrays_multiplied = numpy_array * second_np_array
print(arrays_multiplied)

# look at the output for array-on-array operations
# and make sure you understand the gist of how they work!

# sum adds each item in the array into a single sum
print(numpy_array.sum())

# the size attribute tells us how many items are in our array
# this will be useful to find m, our count of training examples, later on
print(numpy_array.size)
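
To see what sets NumPy arrays apart from plain Python lists, here is a minimal sketch that applies the same operators to both. The variable names are illustrative and not part of the lesson's exercises.

```python
import numpy as np

pylist = [1, 2, 3, 4]
numpy_array = np.array(pylist)

# on a Python list, + concatenates and * repeats
list_added = pylist + pylist
list_repeated = pylist * 2
print(list_added)     # [1, 2, 3, 4, 1, 2, 3, 4]
print(list_repeated)  # [1, 2, 3, 4, 1, 2, 3, 4]

# on a NumPy array, the same operators work element by element
array_added = numpy_array + numpy_array
array_doubled = numpy_array * 2
print(array_added)    # [2 4 6 8]
print(array_doubled)  # [2 4 6 8]
```

Element-by-element math is what lets us apply one formula to a whole dataset without writing a loop.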

## The Dot Product
One important bit of math we've got to cover is the dot product.

It means multiplying two arrays element by element, then summing the products until you're left with a single number. It's used in many places in ML to combine sets of numbers into a single value. Think of it like a sci-fi laser gun that condenses its target from two sets of numbers all the way into just one number.

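Note: For one-dimensional arrays like the ones below, both arrays must have the same length for the dot product to work. For two-dimensional arrays, the number of columns in the first array must match the number of rows in the second.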

In [None]:
a1 = np.array([1,2,3])
a2 = np.array([2,3,4])
# first we'll do it by hand
product = a1 * a2
dot_product = product.sum()
print(dot_product)
# there are two fast and easy ways to do this with numpy
# with the inbuilt function:
print(np.dot(a1,a2))
# or with the @ operator:
print(a1 @ a2)
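
The same operation extends to two-dimensional arrays, as long as the shapes line up. The following is a minimal sketch using made-up arrays named `features` and `weights` (hypothetical names, not from the lesson), where the number of columns in the first array equals the length of the second.

```python
import numpy as np

# 4 examples with 2 features each: shape (4, 2)
features = np.array([[1, 5], [2, 6], [3, 7], [4, 8]])
# one weight per feature: shape (2,)
weights = np.array([10, 1])

# each row is dotted with weights, giving one number per example
combined = features @ weights
print(combined)        # [15 26 37 48]
print(combined.shape)  # (4,)

# mismatched shapes raise a ValueError
try:
    features @ np.array([1, 2, 3])
except ValueError as err:
    print("Shapes do not line up:", err)
```

Each output value is the dot product of one row with `weights`, so a single `@` condenses every example at once.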

## Matplotlib
Our next library, Matplotlib, is simpler.

We'll use Pyplot, Matplotlib's set of graphing functions, to graph our data.

In [None]:
import matplotlib.pyplot as plt

# our first example is a scatter plot
# we need coordinates for x and y data
x_data = np.array([1,4,2,3,5,9,7])
y_data = np.array([2,1,4,7,3,4,8])
plt.scatter(x_data,y_data)

In [None]:
# to graph a line instead of a scatter, use plot
plt.plot(x_data,y_data)

We can also plot equations. For this example, we'll generate some values for $x$ using the NumPy function `np.linspace()`. The function takes the start, stop, and number of values as arguments and returns an array of numbers spaced evenly between the start and stop values. (Unlike similar functions such as Python's `range()`, the stop argument of `np.linspace()` is inclusive by default, so the last value in the array is the stop value itself.) To get our $y$-values, we'll perform math operations on those $x$-values. The `plt.plot()` function then creates a graph from those two arrays.

In [None]:
x_data = np.linspace(1,10,10) # from 1-10, give us 10 numbers
plt.plot(x_data, 1.5 * x_data + 5) # what's the function?
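
The inclusive stop value is easy to check directly. Here is a small sketch, with illustrative variable names, that compares `np.linspace()` to `np.arange()`, which excludes its stop value just as Python's `range()` does.

```python
import numpy as np

# np.linspace includes the stop value by default
inclusive = np.linspace(1, 10, 10)
print(inclusive)  # [ 1.  2.  3.  4.  5.  6.  7.  8.  9. 10.]

# np.arange stops just before the stop value
exclusive = np.arange(1, 10)
print(exclusive)  # [1 2 3 4 5 6 7 8 9]

# the y-values come from ordinary array math on the x-values
y_values = 1.5 * inclusive + 5
print(y_values[0], y_values[-1])  # 6.5 20.0
```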

Finally, let's learn how to label graphs. Thoughtfully documenting your work enables others to quickly comprehend it and build on it!

In [None]:
x_data = np.linspace(1,10,10) # from 1-10, give us 10 numbers
plt.plot(x_data, 2*x_data**4, color="purple", label="Our Rocket")
plt.plot(x_data, x_data**4, color="gold", label="Our Competitor's Rocket")
plt.title("Spaceflight to Jupiter")
plt.legend()
plt.xlabel("Time Since Liftoff (in minutes)")
plt.ylabel("Altitude (in miles)")

## Scikit-learn
Our final library is the ultra high-powered ML toolkit scikit-learn. It is built on top of NumPy, SciPy, and Matplotlib, so it integrates very easily with what we're doing.

For our first project in the next lesson, we'll implement the learning algorithms from scratch so we understand them as well as possible, but scikit-learn can automate most popular ML tasks.

To begin getting acquainted with the library, we'll use its `train_test_split` function to separate our data into training and test sets.

We'll also provide an optional playground with scikit-learn's built-in `make_regression` function, which randomly generates datasets with linear relationships so you can practice more linear regression.

In [None]:
from sklearn.model_selection import train_test_split
import numpy as np

all_X = np.array([1,2,3,4,5,6,7,8,9,10])
all_y = np.array([1,2,3,4,5,6,7,8,9,10])
# random_state is a seed value for replicating results
prepared_data = train_test_split(all_X, all_y, test_size=0.3, random_state=3242)
print(prepared_data)
# unpack these arrays for easier access
X_train, X_test, y_train, y_test = prepared_data
print(f"Features of our training set: {X_train}")
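
To get a feel for what `test_size` and `random_state` do, here is a minimal sketch that reuses the same data and checks the size of each split. The name `X_train_again` is illustrative and only there to show that a fixed seed reproduces the same split.

```python
import numpy as np
from sklearn.model_selection import train_test_split

all_X = np.arange(1, 11)
all_y = np.arange(1, 11)

X_train, X_test, y_train, y_test = train_test_split(
    all_X, all_y, test_size=0.3, random_state=3242
)

# with 10 examples and test_size=0.3, 3 go to the test set and 7 to training
print(X_train.size, X_test.size)  # 7 3

# features and labels are shuffled together, so pairs stay aligned
# (all_X equals all_y here, so the training arrays must match too)
print(np.array_equal(X_train, y_train))  # True

# the same random_state reproduces the same split
X_train_again, _, _, _ = train_test_split(
    all_X, all_y, test_size=0.3, random_state=3242
)
print(np.array_equal(X_train, X_train_again))  # True
```

Changing `random_state` to a different number produces a different shuffle, while leaving it out gives a new split on every run.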

## Putting It All Together
Now we'll put what we've just learned to use and graph a scatter plot of our ML 101 Grade Predictor data. After that's done, we'll be well prepared to fit a linear regression model to it in the next lesson.

In [None]:
# turn these two lists into NumPy arrays named X and y, respectively
py_X = [10,24,37,51,65,79,88]
py_y = [57,66,79,75,90,87,98]

# !!CODE HERE!!

# use train_test_split to separate the X and y data
X_train, X_test, y_train, y_test = # !!CODE HERE!!

# create two scatter plots, m1 and m2, that graph training and testing sets
# give them different colors and labels
m1 = plt.scatter() # !!CODE HERE!!
m2 = plt.scatter() # !!CODE HERE!!

# title the graph "ML 101 Grade Predictor" and show the legend
# then label x and y axes "Minutes studied" and "Smartcase grade," respectively
# !!CODE HERE!!

## Further Practice
As we all know, coding skills are developed by actually coding. With that in mind, here are some ways you can take what we've learned today a little further. If anything failed to stick, or you're overwhelmed by anything in the next lesson, feel free to come back here and practice!

In [None]:
from sklearn import datasets

# if you want to play with scikit, you can also make a random regression problem like so:
X, y = datasets.make_regression(n_samples=100, n_features=1, noise=20, random_state=4)
# try graphing this data instead of our grade predictor data for added practice!

# try creating a train_test split for this data

# also try coming up with your own values for slope and intercept, such as:
w = .5
b = 53
plt.plot(X, w * X + b)
# you can overlay them on top of the scatter plot to start getting a feel for linear regression!