Machine Learning with TensorFlow and Python
===

These are JulianNF's notes from following [freeCodeCamp's online Machine Learning with Python certification](https://www.freecodecamp.org/learn/machine-learning-with-python), supplemented by [Google's TensorFlow documentation](https://www.tensorflow.org/guide/tensor).

Feel free to benefit from them if you're studying on your own.

---

In [1]:
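# Required in notebook:
# (the PyPI package is scikit-learn; `sklearn` is only the import name)
%pip install -q scikit-learn
# Colab-only magic; other Jupyter environments raise a UsageError here
%tensorflow_version 2.x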

Note: you may need to restart the kernel to use updated packages.


UsageError: Line magic function `%tensorflow_version` not found.


In [25]:
from __future__ import absolute_import, division, print_function, unicode_literals
import numpy as np # library for handling arrays better
import pandas as pd # library for data manipulation
import matplotlib.pyplot as plt # library for graphing

import tensorflow as tf

## Prepping our datasets for math
To use our data, every value in it must be numeric so that we can apply math to it. Non-numeric values therefore need to be encoded as numbers (e.g. male = 0, female = 1).
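
As a minimal sketch of that encoding idea, the snippet below maps text labels to integers with pandas. The `toy` DataFrame and the `sex_codes` dictionary are made up for illustration and are not part of the course material; only the male = 0, female = 1 convention comes from the text above.

```python
import pandas as pd

# Hypothetical toy data; the column name mirrors the Titanic dataset
toy = pd.DataFrame({"sex": ["male", "female", "female", "male"]})

# Assumed encoding, following the example in the text: male = 0, female = 1
sex_codes = {"male": 0, "female": 1}

# Series.map looks up each value in the dictionary and returns its code
toy["sex_encoded"] = toy["sex"].map(sex_codes)

print(toy)
```

In these notes the encoding is not done by hand like this; the feature columns defined further down handle it based on each column's vocabulary.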

There are two types of data:
1. **Categorical** Columns
	- These columns have non-numerical data. In the case of the Titanic datasets that we're working with, some examples include sex, class, and deck.
	- These columns have a limited set of possible values (aka "categories"), such as:
		- male, female
		- Queenstown, Cherbourg, Southampton, unknown
		- first class, second class, third class
2. **Numeric** Columns
	- These columns are already numerical (e.g. age and fare in the Titanic datasets).
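
One rough way to see the two column types in practice is to ask pandas which columns hold numbers and which do not. The sketch below uses a hypothetical `toy` DataFrame with made-up values; only the column names are borrowed from the Titanic datasets.

```python
import pandas as pd

# Hypothetical toy data with two text columns and two number columns
toy = pd.DataFrame({
    "sex": ["male", "female", "female"],
    "class": ["Third", "First", "Second"],
    "age": [22.0, 38.0, 26.0],
    "fare": [7.25, 71.28, 7.92],
})

# select_dtypes splits the columns by their stored data type
numeric = toy.select_dtypes(include="number").columns.tolist()
categorical = toy.select_dtypes(exclude="number").columns.tolist()

print("numeric:", numeric)
print("categorical:", categorical)

# The distinct values of a categorical column are its categories
print(sorted(toy["class"].unique()))
```

The stored data type is only a starting point. In these notes, `n_siblings_spouses` and `parch` hold integers but are still treated as categorical, because they take only a handful of distinct values.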

Feature columns are a TensorFlow tool for describing each input column (aka feature 😋) of the dataset: its name, its data type, and, for categorical columns, the set of values it can take.

In [26]:
training_dataframe = pd.read_csv(
	'https://storage.googleapis.com/tf-datasets/titanic/train.csv')
testing_dataframe = pd.read_csv(
	'https://storage.googleapis.com/tf-datasets/titanic/eval.csv')

training_survived = training_dataframe.pop('survived')
testing_survived = testing_dataframe.pop('survived')

categorical_columns = [
	'sex',
	'n_siblings_spouses',
	'parch',
	'class',
	'deck',
	'embark_town',
	'alone'
]

numeric_columns = [
	'age',
	'fare'
]

feature_columns = []

for feature_name in categorical_columns:
	# get all unique possible values (aka categories) in the given column
	vocabulary = training_dataframe[feature_name].unique()
	feature_columns.append(
		tf.feature_column.categorical_column_with_vocabulary_list(feature_name, vocabulary)
	)

for feature_name in numeric_columns:
	feature_columns.append(
		tf.feature_column.numeric_column(feature_name, dtype=tf.float32)
	)
print(feature_columns)


[VocabularyListCategoricalColumn(key='sex', vocabulary_list=('male', 'female'), dtype=tf.string, default_value=-1, num_oov_buckets=0), VocabularyListCategoricalColumn(key='n_siblings_spouses', vocabulary_list=(1, 0, 3, 4, 2, 5, 8), dtype=tf.int64, default_value=-1, num_oov_buckets=0), VocabularyListCategoricalColumn(key='parch', vocabulary_list=(0, 1, 2, 5, 3, 4), dtype=tf.int64, default_value=-1, num_oov_buckets=0), VocabularyListCategoricalColumn(key='class', vocabulary_list=('Third', 'First', 'Second'), dtype=tf.string, default_value=-1, num_oov_buckets=0), VocabularyListCategoricalColumn(key='deck', vocabulary_list=('unknown', 'C', 'G', 'A', 'B', 'D', 'F', 'E'), dtype=tf.string, default_value=-1, num_oov_buckets=0), VocabularyListCategoricalColumn(key='embark_town', vocabulary_list=('Southampton', 'Cherbourg', 'Queenstown', 'unknown'), dtype=tf.string, default_value=-1, num_oov_buckets=0), VocabularyListCategoricalColumn(key='alone', vocabulary_list=('n', 'y'), dtype=tf.string, def