From b36f63c549165b4d3df5af503c6bdb615e0f778e Mon Sep 17 00:00:00 2001 From: Substructures Date: Tue, 13 Mar 2018 02:54:54 +0000 Subject: [PATCH 1/5] Create 00_series-intro_outline.md This is an (still incomplete) outline before I put an official script together. If you have requests like "be more specific", or "cover this in a later video", or "don't cover this at all", feel free to mention it. --- machine-learning/00_series-intro_outline.md | 94 +++++++++++++++++++++ 1 file changed, 94 insertions(+) create mode 100644 machine-learning/00_series-intro_outline.md diff --git a/machine-learning/00_series-intro_outline.md b/machine-learning/00_series-intro_outline.md new file mode 100644 index 0000000..78703d9 --- /dev/null +++ b/machine-learning/00_series-intro_outline.md @@ -0,0 +1,94 @@ +# Machine Learning + +(!) = Definitely needs to be reworded or explained more + +-Standardized CodeRabbit introduction + +Introduction +------------------------------------------------------------------------------ +-Data is becoming more easily available + +-Data sets are becoming larger + +-Both the private companies and government institutions are becoming better at + generating, streaming, and storing data + +-We need algorithms/models (!) that can help us gain insight from increasingly + large and complex datasets + +-Machine Learning is a discipline that studies how computers can automatically + analyze and draw conclusions from data. (!) + -*Even though technically it is a subfield of mathematics and does not + necessarily need to use computers.* + +-Examples (I'll probably choose three): + + -Facebook and Snapchat use machine learning to build technology that can + recognize faces + + -Online advertisers use machine learning to offer you the products you are + most likely to buy + + -Large banks use machine learning to detect credit fraud + + -Self-Driving cars use machine learning to saftely navagate through roads. 
+ + -Hedge Funds use machine learning to predict movements in the stock market. + + -Netflix uses machine learning to recommend movies and TV shows you might + like. + +Body +------------------------------------------------------------------------------ +I'm actually not sure where to go with the body for the first video's transcript... + +Options: +1. How these institutions getting more data? + + -Many free applications on both your browser and your phone are designed to + record and store your data + + -retweets, likes, views, right swipes, your location, and even your tax + return information are all saved by these applications + + -Some companies store your credit card information, social security number, + and spending history + + -This is why data breaches like those at Target and Experian are so + dangerous (!) + + -Some countries have strong privacy laws that require companies to keep + your data secure and anonymous (!), others do not + + -This is a very controversial issue + +2. At the heart of any machine learning project is a model + + -There are two types of models: + + -Classification - We try to predict a category + + -Is an email important or spam? + + -Is this a picture of a cat or a dog? + + -Regression - We try to predict a number + + -How much does this house cost? + + -What is this person's credit score? + + -We can have algorithms generate these models for us + + -Models allow us to explain trends in our data or predict trends in the + future + + -Most algorithms use calculus and matrix algebra to create models + + -luckily, most programming languages have libraries or packages that allow + us to use these algorithms while the pre-packaged code deals with the + heavy lifting in the background. (!) + +Conclusion +------------------------------------------------------------------------------- +Machine Learning is pretty cool. 
From bd7ac334943798481736d32df6605a31ccda88ee Mon Sep 17 00:00:00 2001 From: Substructures Date: Tue, 13 Mar 2018 02:11:07 -0500 Subject: [PATCH 2/5] Update 00_series-intro_outline.md Cleaned up the file so it's actually readable. Added some ideas. Still very far from a complete outline. --- machine-learning/00_series-intro_outline.md | 175 ++++++++++++-------- 1 file changed, 103 insertions(+), 72 deletions(-) diff --git a/machine-learning/00_series-intro_outline.md b/machine-learning/00_series-intro_outline.md index 78703d9..81b4b88 100644 --- a/machine-learning/00_series-intro_outline.md +++ b/machine-learning/00_series-intro_outline.md @@ -2,92 +2,123 @@ (!) = Definitely needs to be reworded or explained more --Standardized CodeRabbit introduction +Standardized CodeRabbit introduction Introduction ------------------------------------------------------------------------------ --Data is becoming more easily available - --Data sets are becoming larger - --Both the private companies and government institutions are becoming better at - generating, streaming, and storing data +* Data is becoming more easily available +* Data sets are becoming larger +* Both the private companies and government institutions are becoming better at + generating, streaming, and storing data --We need algorithms/models (!) that can help us gain insight from increasingly - large and complex datasets - --Machine Learning is a discipline that studies how computers can automatically - analyze and draw conclusions from data. (!) - -*Even though technically it is a subfield of mathematics and does not - necessarily need to use computers.* +* We need algorithms/models (!) that can help us gain insight from increasingly + large and complex datasets --Examples (I'll probably choose three): +* Machine Learning is a discipline that studies how computers can automatically + analyze and draw conclusions from data. (!) 
+ * Even though technically it is a subfield of mathematics and does not + necessarily need to use computers. - -Facebook and Snapchat use machine learning to build technology that can - recognize faces - - -Online advertisers use machine learning to offer you the products you are - most likely to buy - - -Large banks use machine learning to detect credit fraud - - -Self-Driving cars use machine learning to saftely navagate through roads. - - -Hedge Funds use machine learning to predict movements in the stock market. - - -Netflix uses machine learning to recommend movies and TV shows you might - like. +* Examples (I'll probably choose three): + * Facebook and Snapchat use machine learning to build technology that can + recognize faces + * Online advertisers use machine learning to offer you the products you are + most likely to buy + * Large banks use machine learning to detect credit fraud + * Self-Driving cars use machine learning to safely navigate through roads. + * Hedge Funds use machine learning to predict movements in the stock market. + * Netflix uses machine learning to recommend movies and TV shows you might + like. Body ------------------------------------------------------------------------------ I'm actually not sure where to go with the body for the first video's transcript... -Options: -1. How these institutions getting more data? +Here are some potential options: - -Many free applications on both your browser and your phone are designed to - record and store your data - - -retweets, likes, views, right swipes, your location, and even your tax - return information are all saved by these applications - - -Some companies store your credit card information, social security number, - and spending history - - -This is why data breaches like those at Target and Experian are so - dangerous (!) 
- - -Some countries have strong privacy laws that require companies to keep - your data secure and anonymous (!), others do not - - -This is a very controversial issue - -2. At the heart of any machine learning project is a model - - -There are two types of models: - - -Classification - We try to predict a category - - -Is an email important or spam? - - -Is this a picture of a cat or a dog? - - -Regression - We try to predict a number - - -How much does this house cost? - - -What is this person's credit score? - - -We can have algorithms generate these models for us - - -Models allow us to explain trends in our data or predict trends in the - future +1. How are institutions getting more data? + * Many free applications on both your browser and your phone are designed to + record and store your data + * retweets, likes, views, right swipes, your location, and even your tax + return information are all saved by these applications + * Some companies store your credit card information, social security number, + and spending history + * This is why data breaches like those at Target and Experian are so + dangerous (!) + * Some countries have strong privacy laws that require companies to keep + your data secure and anonymous (!), others do not + * This is a very controversial issue - -Most algorithms use calculus and matrix algebra to create models +2. The Inner workings of a machine learning model + * There are two types of models: + * Classification - We try to predict a category + * Is an email important or spam? + * Is this a picture of a cat or a dog? + * Regression - We try to predict a number + * How much does this house cost? + * What is this person's credit score? 
+ * We can have algorithms generate these models for us + * Most algorithms use calculus and matrix algebra to create models + * Luckily, most programming languages have libraries or packages that allow + us to use these algorithms while the pre-packaged code deals with the + heavy lifting in the background. (!) - -luckily, most programming languages have libraries or packages that allow - us to use these algorithms while the pre-packaged code deals with the - heavy lifting in the background. (!) + 3. What are the steps involved in training a machine learning model? + * Acquire and preprocess data + * Getting data + * Companies that provide services or applications can record user + or client activity (See retweets, likes, etc. from earlier) + * Governments often get data through surveys + * Buy data from large companies + * There is even data available for free online + * [UCI Machine Learning Repository](http://archive.ics.uci.edu/ml/index.php) + * [Kaggle](https://www.kaggle.com/datasets) + * [US Government](https://www.data.gov/) + * Web Scraping + * Preprocessing Data + * Many models require data to be in a specific format before they can + work on a dataset. + * It is very common for models to require tabular data. + * Tabular data is data that can be stored in a table or spreadsheet + with one column for each variable and one row for each observation. 
+ * Some specialized models can take in non-tabular data with very little + intervention + * Convolutional Neural Networks can train on pictures + * Recurrent Neural Networks can train on sounds + * Train a model + * See Regression vs Classification from earlier + * There is a wide array of models that are useful for different circumstances + * Linear Regression + * Logistic Regression + * K Nearest Neighbors + * Decision Trees + * Support Vector Machines + * Random Forests + * Extreme Gradient Boosting + * Neural Networks + * This does not include Unsupervised Learning Algorithms + * K Means Clustering + * Hierarchical Clustering + * Singular Value Decomposition + * Principal Component Analysis + * Interpret results and draw conclusions + * Two of the most common metrics machine learning researchers and engineers + use to evaluate a classification model's results are precision and + recall. + * Precision + * Models can suffer from overfitting + * Drawing conclusions that are either too complex or based on too + small an amount of data + * Models can be misinterpreted + * Correlation does not guarantee causation + * (I'm sure we could find some interesting correlation from the + internet to post here) + * At the end of the day, the only purpose of machine learning is to + provide information to people or software so they can make better + decisions + +4. Something else you think would be better to talk about in an introductory + video. Conclusion ------------------------------------------------------------------------------- From 65fda7d33fb8da02cd1d60f3f794c0d1072f9319 Mon Sep 17 00:00:00 2001 From: Substructures Date: Wed, 14 Mar 2018 03:12:26 +0000 Subject: [PATCH 3/5] Create series-structure.md The start of a general course outline. 
--- machine-learning/series-structure.md | 33 ++++++++++++++++++++++++++++ 1 file changed, 33 insertions(+) create mode 100644 machine-learning/series-structure.md diff --git a/machine-learning/series-structure.md b/machine-learning/series-structure.md new file mode 100644 index 0000000..39e6961 --- /dev/null +++ b/machine-learning/series-structure.md @@ -0,0 +1,33 @@ +# Proposed Series Structure + +## Intro +1. What is machine learning? + * Define Machine Learning + * Why is it important? + +## Machine Learning Basics + +I'm actually not sure where we would like to start + +2. The anatomy of a dataset + * Tabular vs non-tabular data + * Independent and Dependent variables +3. Supervised vs Unsupervised Learning +4. Supervised Learning: Regression and Classification +5. Interpreting Results + * Precision vs. Recall + * Standard Error + * Overfitting + * Correlation vs. Causation + +## The Algorithms + +8. Linear Regression +9. Logistic Regression +10. K Nearest Neighbors +11. K Means Clustering +12. Decision Trees and Random Forests +13. Extreme Gradient Boosting Machines +14. Neural Networks + + From 15d4683477314e9183c5be64c163e006d006b819 Mon Sep 17 00:00:00 2001 From: Substructures Date: Wed, 14 Mar 2018 03:15:35 +0000 Subject: [PATCH 4/5] Rename 00_series-intro_outline.md to series_brainstorm.md --- .../{00_series-intro_outline.md => series_brainstorm.md} | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename machine-learning/{00_series-intro_outline.md => series_brainstorm.md} (100%) diff --git a/machine-learning/00_series-intro_outline.md b/machine-learning/series_brainstorm.md similarity index 100% rename from machine-learning/00_series-intro_outline.md rename to machine-learning/series_brainstorm.md From 256e472a56eb0d80f8064e2fb6b7f8f377889507 Mon Sep 17 00:00:00 2001 From: Substructures Date: Wed, 14 Mar 2018 04:58:15 +0000 Subject: [PATCH 5/5] Create new outline for series intro An attempt at a more focused outline. 
--- machine-learning/00_series-intro_outline.md | 38 +++++++++++++++++++++ 1 file changed, 38 insertions(+) create mode 100644 machine-learning/00_series-intro_outline.md diff --git a/machine-learning/00_series-intro_outline.md b/machine-learning/00_series-intro_outline.md new file mode 100644 index 0000000..78195c3 --- /dev/null +++ b/machine-learning/00_series-intro_outline.md @@ -0,0 +1,38 @@ +# Series Intro Outline + +Still looking at ideas more than syntax/diction at this point. + +## Code Rabbit Intro +* I'm guessing we'll have some sort of standardized theme music/opening animation here. + +## What is machine learning? +* Machine Learning is the process of using algorithms to automatically analyze data to + make predictions or identify interesting trends + * The term "data" used to refer to tables of facts and figures + * "Data" can now mean anything that has been recorded + * Pictures of cats, daily rainfall, the words in a novel, the number of views on a video +* What sort of trends and predictions? (We'll probably only use a few of these) + * Companies use machine learning to identify which employees are most likely to quit their jobs + * Geneticists use machine learning to predict how genes in your DNA will express themselves + * Advertisers use machine learning to choose which ad appeals most to you + * Real Estate companies use machine learning to predict the price of a house + * The NSA uses machine learning to identify potential threats to US citizens + * Meteorologists use machine learning to predict the weather. + * YouTube uses machine learning to identify which videos you might want to watch + * Amazon uses machine learning to predict what products you are most interested in + * Personal assistants use machine learning to both understand commands and use those commands + to give the user useful information +## Why is machine learning so important/powerful? 
+* Machine Learning enables computers to do things that only humans could do up until recently + * Recognize faces and voices, drive cars, +* Machine Learning makes things previously done by computers faster, more accurate, and automatic +* Machine Learning is continually becoming more powerful + * Computers are becoming faster and memory is becoming cheaper + * Companies are becoming more creative with what they consider data + * See points on pictures of cats, etc. above + * Companies have realized the importance of data and are now collecting more than ever. +## Conclusion +* A seasoned machine learning practitioner can get computers to do things that are impossible + without machine learning +* The high level of interest and talent involved in the field of machine learning ensures that + applications will continue to get more impressive in the future.