38 changes: 38 additions & 0 deletions machine-learning/00_series-intro_outline.md
@@ -0,0 +1,38 @@
# Series Intro Outline

Still looking at ideas more than syntax/diction at this point.

## Code Rabbit Intro
* I'm guessing we'll have some sort of standardized theme music/opening animation here.

## What is machine learning?
* Machine Learning is the process of using algorithms to automatically analyze data to
make predictions or identify interesting trends
* The term "data" used to refer to tables of facts and figures
* "Data" can now mean anything that has been recorded
* Pictures of cats, daily rainfall, the words in a novel, the number of views on a video
* What sort of trends and predictions? (We'll probably only use a few of these)
* Companies use machine learning to identify which employees are most likely to quit their jobs
* Geneticists use machine learning to predict how genes in your DNA will express themselves
* Advertisers use machine learning to choose which ad appeals most to you
* Real estate companies use machine learning to predict the price of a house
* The NSA uses machine learning to identify potential threats to US citizens
* Meteorologists use machine learning to predict the weather.
* YouTube uses machine learning to identify which videos you might want to watch
* Amazon uses machine learning to predict what products you are most interested in
* Personal assistants use machine learning to both understand commands and use those commands
to give the user useful information
## Why is machine learning so important/powerful?
* Machine Learning enables computers to do things that only humans could do up until recently
* Recognize faces and voices, drive cars
* Machine Learning makes things previously done by computers faster, more accurate, and automatic
* Machine Learning is continually becoming more powerful
* Computers are becoming faster and memory is becoming cheaper
* Companies are becoming more creative with what they consider data
* See points on pictures of cats, etc. above
* Companies have realized the importance of data and are now collecting more than ever.
## Conclusion
* A seasoned machine learning practitioner can get computers to do things that are impossible
without machine learning
* The high level of interest and talent involved in the field of machine learning ensures that
applications will continue to get more impressive in the future.
33 changes: 33 additions & 0 deletions machine-learning/series-structure.md
@@ -0,0 +1,33 @@
# Proposed Series Structure

## Intro
1. What is machine learning?
* Define Machine Learning
* Why is it important?

## Machine Learning Basics

I'm actually not sure where we would like to start

2. The anatomy of a dataset
* Tabular vs non-tabular data
* Independent and Dependent variables
3. Supervised vs Unsupervised Learning
4. Supervised Learning: Regression and Classification
5. Interpreting Results
* Precision vs. Recall
* Standard Error
* Overfitting
* Correlation vs. Causation
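
A possible illustration for the "anatomy of a dataset" episode, as a minimal sketch: the housing records, column names, and values below are made up for the example. Each record is one observation (a row), the features are the independent variables, and the value we want to predict is the dependent variable.

```python
# Hypothetical housing records (made-up values): one dict per observation (row).
rows = [
    {"square_feet": 1400, "bedrooms": 3, "price": 240_000},
    {"square_feet": 1900, "bedrooms": 4, "price": 310_000},
    {"square_feet": 850, "bedrooms": 2, "price": 150_000},
]

feature_names = ["square_feet", "bedrooms"]  # independent variables (assumed names)
target_name = "price"                        # dependent variable (assumed name)

# X holds one list of feature values per observation; y holds the matching targets.
X = [[row[name] for name in feature_names] for row in rows]
y = [row[target_name] for row in rows]

print(X)
print(y)
```

Splitting the table into `X` and `y` like this mirrors the shape most supervised learning libraries expect, which may make the later algorithm episodes easier to follow.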

## The Algorithms

6. Linear Regression
7. Logistic Regression
8. K Nearest Neighbors
9. K Means Clustering
10. Decision Trees and Random Forests
11. Extreme Gradient Boosting Machines
12. Neural Networks


125 changes: 125 additions & 0 deletions machine-learning/series_brainstorm.md
@@ -0,0 +1,125 @@
# Machine Learning

(!) = Definitely needs to be reworded or explained more

Standardized CodeRabbit introduction

Introduction
------------------------------------------------------------------------------
* Data is becoming more easily available
* Data sets are becoming larger
* Both private companies and government institutions are becoming better at
generating, streaming, and storing data

* We need algorithms/models (!) that can help us gain insight from increasingly
large and complex datasets

* Machine Learning is a discipline that studies how computers can automatically
analyze and draw conclusions from data. (!)
* Technically, though, it is grounded in mathematics and statistics and does
not necessarily need to use computers.

* Examples (I'll probably choose three):
* Facebook and Snapchat use machine learning to build technology that can
recognize faces
* Online advertisers use machine learning to offer you the products you are
most likely to buy
* Large banks use machine learning to detect credit fraud
* Self-driving cars use machine learning to safely navigate roads.
* Hedge Funds use machine learning to predict movements in the stock market.
* Netflix uses machine learning to recommend movies and TV shows you might
like.

Body
------------------------------------------------------------------------------
I'm actually not sure where to go with the body for the first video's transcript...

Here are some potential options:

1. How are institutions getting more data?
* Many free applications on both your browser and your phone are designed to
record and store your data
* retweets, likes, views, right swipes, your location, and even your tax
return information are all saved by these applications
* Some companies store your credit card information, social security number,
and spending history
* This is why data breaches like those at Target and Experian are so
dangerous (!)
* Some countries have strong privacy laws that require companies to keep
your data secure and anonymous (!), others do not
* This is a very controversial issue

2. The inner workings of a machine learning model
* There are two main types of supervised models:
* Classification - We try to predict a category
* Is an email important or spam?
* Is this a picture of a cat or a dog?
* Regression - We try to predict a number
* How much does this house cost?
* What is this person's credit score?
* We can have algorithms generate these models for us
* Most algorithms use calculus and matrix algebra to create models
* Luckily, most programming languages have libraries or packages that allow
us to use these algorithms while the pre-packaged code deals with the
heavy lifting in the background. (!)
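
To make the classification/regression split concrete, here is a minimal sketch using only the Python standard library. The helper names (`fit_line`, `nearest_neighbor_label`) and all data values are hypothetical and chosen for illustration. In practice a library would do this work, as the last bullet notes.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for a single feature: returns (slope, intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    variance = sum((x - x_bar) ** 2 for x in xs)
    slope = covariance / variance
    return slope, y_bar - slope * x_bar


def nearest_neighbor_label(point, examples):
    """1-nearest-neighbor: return the label of the closest labeled example."""
    def squared_distance(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    closest = min(examples, key=lambda example: squared_distance(example[0], point))
    return closest[1]


# Regression: predict a number (made-up prices in thousands vs. square feet).
sizes = [1000, 1500, 2000]
prices = [200, 300, 400]
slope, intercept = fit_line(sizes, prices)
predicted_price = slope * 1800 + intercept

# Classification: predict a category (made-up weight in kg, ear length in cm).
pets = [((4.0, 6.5), "cat"), ((30.0, 10.0), "dog"), ((3.5, 7.0), "cat")]
predicted_label = nearest_neighbor_label((25.0, 9.0), pets)

print(predicted_price, predicted_label)
```

The regression model outputs a number on a continuous scale, while the classifier can only answer with one of the labels it has already seen.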

3. What are the steps involved in training a machine learning model?
* Acquire and preprocess data
* Getting data
* Companies that provide services or applications can record user
or client activity (See retweets, likes, etc. from earlier)
* Governments often get data through surveys
* Buy data from large companies
* There is even data available for free online
* [UCI Machine Learning Repository](http://archive.ics.uci.edu/ml/index.php)
* [Kaggle](https://www.kaggle.com/datasets)
* [US Government](https://www.data.gov/)
* Web Scraping
* Preprocessing Data
* Many models require data to be in a specific format before they can
work on a dataset.
* It is very common for models to require tabular data.
* Tabular data is data that can be stored in a table or spreadsheet
with one column for each variable and one row for each observation.
* Some specialized models can take in non-tabular data with very little
intervention
* Convolutional Neural Networks can train on pictures
* Recurrent Neural Networks can train on sounds
* Train a model
* See Regression vs Classification from earlier
* There is a wide array of models that are useful for different circumstances
* Linear Regression
* Logistic Regression
* K Nearest Neighbors
* Decision Trees
* Support Vector Machines
* Random Forests
* Extreme Gradient Boosting
* Neural Networks
* This does not include Unsupervised Learning Algorithms
* K Means Clustering
* Hierarchical Clustering
* Singular Value Decomposition
* Principal Component Analysis
* Interpret results and draw conclusions
* Two of the most common metrics machine learning researchers and engineers
use to evaluate the quality of a classification model's results are
precision and recall.
* Precision
* Models can suffer from overfitting
* Drawing conclusions that are either too complex or based on too
small an amount of data
* Models can be misinterpreted
* Correlation does not guarantee causation
* (I'm sure we could find some interesting correlation from the
internet to post here)
* At the end of the day, the only purpose of machine learning is to
provide information to people or software so they can make better
decisions
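
For the "interpret results" step, precision and recall could be shown with a small worked example. This is a sketch under assumptions: the spam/ham labels and the `precision_recall` helper are invented for illustration and do not come from any library. Precision is the share of predicted positives that were correct; recall is the share of actual positives that were found.

```python
def precision_recall(actual, predicted, positive="spam"):
    """Compute (precision, recall) for one positive class from paired labels."""
    pairs = list(zip(actual, predicted))
    true_pos = sum(1 for a, p in pairs if a == positive and p == positive)
    false_pos = sum(1 for a, p in pairs if a != positive and p == positive)
    false_neg = sum(1 for a, p in pairs if a == positive and p != positive)

    # Guard against division by zero when nothing was predicted or nothing was positive.
    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    return precision, recall


# Made-up email labels: what each email really was vs. what the model said.
actual = ["spam", "spam", "spam", "ham", "ham", "ham", "ham", "spam"]
predicted = ["spam", "spam", "ham", "ham", "spam", "ham", "ham", "ham"]

precision, recall = precision_recall(actual, predicted)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here the model flags three emails as spam and two of them really are (precision of 2/3), but it finds only two of the four actual spam emails (recall of 1/2), which shows why the two numbers need to be read together.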

4. Something else you think would be better to talk about in an introductory
video.

Conclusion
-------------------------------------------------------------------------------
Machine Learning is pretty cool.