# Machine Learning with Python
> This post is about Machine Learning using Python

- toc: true
- branch: master
- badges: true
- comments: true
- categories: [machine learning, python, course]
- image: images/Machine-Learning-Python.jpg
- hide: false

(Ref: https://machinelearningmastery.com)

 Machine learning is broken down into a 5-step process:
 
 
    Step 1: Adjust Mindset. Believe you can practice and apply machine learning.
        What is Holding you Back From Your Machine Learning Goals?
        Why Machine Learning Does Not Have to Be So Hard
        How to Think About Machine Learning
        Find Your Machine Learning Tribe
        
    Step 2: Pick a Process. Use a systematic process to work through problems.
        Applied Machine Learning Process
        
    Step 3: Pick a Tool. Select a tool for your level and map it onto your process.
        Beginners: Weka Workbench.
        Intermediate: Python Ecosystem.
        Advanced: R Platform.
        Best Programming Language for Machine Learning
        
    Step 4: Practice on Datasets. Select datasets to work on and practice the process.
        Practice Machine Learning with Small In-Memory Datasets
        Tour of Real-World Machine Learning Problems
        Work on Machine Learning Problems That Matter To You
        
    Step 5: Build a Portfolio. Gather results and demonstrate your skills.
        Build a Machine Learning Portfolio
        Get Paid To Apply Machine Learning
        Machine Learning For Money


## __What Is Holding You Back From Your Machine Learning Goals?__

The first question I ask people who want to get into machine learning is: what is stopping you from getting started?

In this post, I want to touch on some self-limiting beliefs I see crop up in my email exchanges and discussions with coaching students.



### _Self-Limiting Belief_

A self-limiting belief is something that you assume to be true that is limiting your progress. You presuppose something about yourself or about the thing you want to achieve. The problem is you hold that belief to be true and you don’t question it.

There are three types of self-limiting belief:


- __If-then Beliefs:__ e.g. If I get started in machine learning, I will fail because I am not good enough.
- __Universal Beliefs:__ e.g. All Data Scientists have a Ph.D. and are mathematics rock gods.
- __Personal and Self-Esteem Beliefs:__ e.g. I’m not good enough to be a machine learner.


### _Waiting To Get Started_

I think the biggest class of limiting belief I see is the belief that you cannot get started until you have some specific prior knowledge.

The problem is that the prior knowledge you think you need is either not required or is so vast in scope that even experts in that subject don’t know it all.

For example: “I need to KNOW statistics”. See how ambiguous that belief is. How much statistics, what areas of statistics, and why do you need to know them before you can start your investigation into machine learning?

I can’t get into machine learning until…

    …I get a degree or higher degree
    …I complete a course
    …I am good at linear algebra
    …I know statistics and probability theory
    …I have mastered the R programming language


__You can get started in machine learning today, right now. Run your first classifier in 5 minutes. You’re in. Now, start working out what it is that you really want from machine learning.__
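
To make the "first classifier in 5 minutes" claim concrete, here is a minimal sketch. It assumes scikit-learn is installed; the iris dataset, the k-nearest neighbours algorithm, and the value of `n_neighbors` are illustrative choices of mine, not anything this post prescribes.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load a small built-in dataset: 150 iris flowers, 4 measurements each, 3 species.
X, y = load_iris(return_X_y=True)

# Hold back a quarter of the rows so the model is scored on data it has not seen.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1, stratify=y
)

# Fit a k-nearest neighbours classifier (k=5 is an illustrative choice).
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)

# Accuracy is the fraction of held-out flowers whose species was predicted correctly.
accuracy = model.score(X_test, y_test)
print(f"Held-out accuracy: {accuracy:.2f}")
```

You do not need to understand how the algorithm works to run this. That understanding can come later, when you need it.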

### _Awaiting Perfect Conditions_

Another class of __self-limiting belief__ is where you are __waiting__ for the perfect environment or conditions before taking the leap. Things will never be perfect. Leap and make a mess, then leap again.

I can’t get started in machine learning because…

    …I don’t have the time right now
    …I don’t have a fast CPU, GPU or a bazillion MB of RAM
    …I am just a student right now
    …I am not a good programmer at the moment
    …I am very busy at work right now


__It does take a lot of time and effort to get good at machine learning, but not all at once and not all at the beginning.__

You can make good progress with __a few hours a week, or tens of minutes per day__. There are plenty of __small snack-sized tasks__ you could take on to get started in machine learning. 

You can get started, __it is just going to take some sacrifice__, like all good things in life.

### _Struggling or Tried and Failed_

Machine learning is hard but no harder than other technical skills like programming. 
__It takes persistence and dedication.__ It’s applied and empirical and demands trial and error.

I can’t get into machine learning because…

    …I feel overwhelmed
    …I don’t understand x
    …I will never be as good as y
    …I don’t know what to do next
    …I can’t get my program to work

__My advice is to cut scope or change direction.__

### _What is your self-limiting belief?_

Do you have a self-limiting belief? Think about it. What are your goals and why do you think you are not there yet?

Do you have a goal to get into machine learning, to become a data scientist or a machine learning engineer but have not taken the first step?

    Are you waiting to acquire some perfect set of skills before getting started?
    Are you waiting for the perfect conditions before getting started?
    Have you taken a first step and abandoned the trail?
    Where do you want to be and what are you struggling with?

## __What If I Am Not Good At Mathematics__

Problem: Many people think that mathematicians are smarter than they are and that they cannot excel in a subject until they “know the math”.

I have seen this first hand, and I have seen it stop people from getting started.

__In this post, I want to convince you that you can get started and make great progress in machine learning without being strong in mathematics.__

### _Get Started and Learn by Doing_

I didn’t learn boolean logic before I started programming.

I followed an empirical path that involved __trial and error__. It was slow and I wrote a lot of bad code, __but__ I was passionately interested and I could see progress.

I hunted for conceptual and practical tools I could use to overcome the limitations __I was actually experiencing. This was a powerful learning tool.__

### _The Danger Zone_

__I like it when my programs don’t work.__ It means I have to roll up my sleeves and really understand what is going on.

You can get a long way by copying and pasting code without really understanding it. __You only need to understand blocks of code as functional units that do a thing you need done.__ Glue enough of them together and you have a program that solves the problem you need solved.

This __empirical hackery__ is a great way to learn fast, but __a terrifying way__ to build production systems.

This is an important distinction to make. The often spoken of __“danger zone”__ is when systems built from empirical learning are made operational and the author does not really know how it works or what the results actually mean.

### _The Technician_

You can get started in machine learning today, __empirically__. Three options available to you are:

   1. Learn to drive a tool like scikit-learn.
   2. Use libraries that provide algorithms and write little programs.
   3. Implement algorithms yourself from tutorials and books.


This can be the path of the technician, __from beginner to intermediate__, who learns the mathematics required for a technique just-in-time.

Define __small problems__, solve them methodically and present the results of what you have learned on your blog.
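
As a sketch of the third option (implementing an algorithm yourself), here is a tiny k-nearest neighbours classifier written with only the Python standard library. The algorithm choice, the function name `knn_predict`, and the toy data are hypothetical illustrations, not taken from a particular tutorial or book.

```python
import math
from collections import Counter


def knn_predict(train_rows, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest rows."""
    # Pair every training row's Euclidean distance to the query with its label.
    distances = sorted(
        (math.dist(row, query), label)
        for row, label in zip(train_rows, train_labels)
    )
    # Keep the labels of the k closest rows and return the most common one.
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical toy data: two well-separated clusters of 2D points.
train_rows = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (4.0, 4.2), (4.1, 3.9), (3.8, 4.0)]
train_labels = ["small", "small", "small", "large", "large", "large"]

print(knn_predict(train_rows, train_labels, (1.1, 0.9)))
print(knn_predict(train_rows, train_labels, (3.9, 4.1)))
```

The only mathematics needed here is the Euclidean distance, which is the kind of thing you can pick up just-in-time.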

There will be interesting algorithms that you will want to know more about, such as __what a particular parameter actually does when you change it or how to get better results from a particular algorithm.__

This will drive you to want (need) to understand __how that technique really works and what it is doing.__
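
One empirical way to find out what a parameter does is to change it and watch the result. The sketch below assumes scikit-learn is installed; the decision tree, its `max_depth` parameter, and the iris dataset are illustrative choices of mine.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Try several values of one parameter and record the mean 5-fold accuracy for each.
results = {}
for depth in [1, 2, 3, 5, None]:  # None lets the tree grow until its leaves are pure
    model = DecisionTreeClassifier(max_depth=depth, random_state=1)
    scores = cross_val_score(model, X, y, cv=5)
    results[depth] = scores.mean()

for depth, mean_accuracy in results.items():
    print(f"max_depth={depth}: mean accuracy {mean_accuracy:.3f}")
```

A tree of depth 1 can only split the data once, so it cannot separate three species. Seeing that in the numbers is often what prompts the question of how the technique really works.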

You can remain the empiricist. I call this the path of the technician.

You can __build up an empirical intuition of which methods__ to use and how to use them. You can also learn just enough algebra to be able to read algorithm descriptions and turn them into code.

There is a path here for the __skilled technician__ to create tools, plug-ins and even operational systems that use machine learning.

The __technician is contrasted to the theoretician__ at the other end of the scale. The theoretician can:

    1. Internalize existing methods.
    2. Propose extensions to existing methods.
    3. Devise entirely new methods.

The __theoretician__ may be able to demonstrate the capability of a method in the abstract, but is likely __insufficiently skilled to turn the methods into code__ beyond prototype demonstration systems at best.

___You can learn as little or as much mathematics as you like, just in time. Focus on your strengths and be honest about your limitations.___

### _Mathematics is Critical, Later_

If you have to learn linear algebra __just-in-time__, why not learn it more completely up front and understand the machine learning methods at this deep level from the beginning?

__This is certainly an option__, perhaps the most efficient option, which is why it is the path used to teach in university. It’s just not the only option available to you.

Just like learning to program by starting with logic and abstract concepts, internalizing machine learning theory may __not be the most efficient way for you to get started.__

___You learned that the technician can learn the mathematical representations and descriptions of machine learning algorithms just-in-time. You also learned that the danger zone for the technician is overconfidence and the risk of putting systems into production that are poorly understood.___

## __Why Machine Learning Does Not Have to Be So Hard__

Technical topics like mathematics, physics, and even computer science are taught using a __bottom-up approach__.

In contrast, useful skills we use every day like reading, driving, and programming were not learned this way and were in fact learned using an inverted top-down approach.

__This top-down approach can be used to learn technical subjects directly such as machine learning, which can make you a lot more productive a lot sooner, and be a lot of fun.__

After reading this post, you will know:

    - The bottom-up approach used in universities to teach technical subjects and the problems with it.
    - How people learn to read, drive, and program in a top-down manner and how the top-down approach works.
    - How to frame machine learning, and even mathematics, using the top-down approach to learning, and how to start making rapid progress as a practitioner.


This is an __important blog post__, because I think it can really help to shake you out of the bottom-up, university-style way of learning machine learning.

This post is divided into seven parts; they are:

    1. Bottom-Up Learning
    2. Learning to Read
    3. Learning to Drive
    4. Learning to Code
    5. Top-Down Learning
    6. Learn Machine Learning
    7. Learning Mathematics


### _Bottom-Up Learning_

Think back to __high-school or undergraduate studies__ and the fundamental fields you may have worked through, such as mathematics (as mentioned), biology, and chemistry.

Think about how the material was laid out, __week-by-week, semester-by-semester, year-by-year. Bottom-up, logical progression.__

__The problem is, the logical progression through the material may not be the best way to learn the material in order to be productive.__

We are not robots executing a learning program. We are emotional humans that need motivation, interest, attention, encouragement, and results.

If you have completed __a technical subject__, think back to how you actually learned it. __I bet it was not bottom-up.__

#### _Learning to Drive_

I remember hiring a driving instructor and doing driving lessons. Every single lesson was practical, in the car, practicing the skill I was required to master, driving the vehicle in traffic.

Here’s what I did not study or discuss with my driving instructor:

    The history of the automobile.
    The theory of combustion engines.
    The common mechanical faults in cars.
    The electrical system of the car.
    The theory of traffic flows.

In fact, I never expect to learn these topics. I have zero need or interest and they will not help me realize the thing I want and need, which is safe and easy personal mobility.

If the car breaks, I’ll call an expert.

#### _Learning to Code_

I started programming without any idea of what coding or software engineering meant.

### _Top-Down Learning_

The __bottom-up__ approach is not just a common way for teaching technical topics; it looks like the only way.

The designers of university courses, masters of their subject area, are trying to help. They are laying everything out to give you the logical progression through the material that they think will get you to the skills and capabilities that you require (hopefully).

And as I mentioned, it can work for some people.

__It does not work for me, and I expect it does not work for you.__

__Don’t start with definitions and theory. Instead, start by connecting the subject with the results you want and show how to get results immediately.__

Lay out a program that focuses on practicing this process of getting results, __going deeper into some areas as needed__, but always in the context of the result the learner requires.

___Be careful not to use traditional ways of thinking or comparison if you take this path.___


    It is iterative:       Topics are revisited many times with deeper understanding.
    It is imperfect:       Results may be poor in the beginning, but improve with practice.
    It requires discovery: The learner must be open to continual learning and discovery.
    It requires ownership: The learner is responsible for improvement.
    It requires curiosity: The learner must pay attention to what interests them and follow it.


### _Learning Machine Learning_

Are you following a top-down type approach but are riddled with guilt, math envy, and insecurities?

You are not alone; I see this every single day in helping beginners on this website.

To connect the dots for you, __I strongly encourage you to study machine learning using the top-down approach.__

    Don’t start with precursor math.
    Don’t start with machine learning theory.
    Don’t code every algorithm from scratch.


__1- Start by learning how to work through very simple predictive modeling problems using a fixed framework with free and easy-to-use open source tools.__

__2- Practice on many small projects and slowly increase their complexity.__

__3- Show your work by building a public portfolio.__

__You can learn machine learning by practicing predictive modeling, not by studying math and theory.__

Not only is this the way I learned and continue to practice machine learning, but it has helped tens of thousands of my students (and the many millions of readers of this blog).

__A top-down approach might be to:__

    - Implement the method in a high-level library such as scikit-learn and get a result.
    - Implement the method in a lower-level library such as NumPy/SciPy and reproduce the result.
    - Implement the method directly using matrices and matrix operations in NumPy or Octave.
    - Study and explore the matrix arithmetic operations involved.
    - Study and explore the matrix decomposition operations involved.
    - Study methods for approximating the eigendecomposition of a matrix.
    And so on…
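
The list above does not name the method. The sketch below assumes it is principal component analysis (PCA), because of the eigendecomposition step, and walks through the first two bullets: get a result from scikit-learn, then reproduce it with NumPy. The iris dataset is an illustrative choice.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)

# Step 1: high-level library. Project the data onto its first two principal components.
pca = PCA(n_components=2)
projected_sklearn = pca.fit_transform(X)

# Step 2: reproduce the result with NumPy alone.
X_centered = X - X.mean(axis=0)                # PCA works on mean-centered data
covariance = np.cov(X_centered, rowvar=False)  # covariance matrix of the 4 features
eigenvalues, eigenvectors = np.linalg.eigh(covariance)  # eigh is for symmetric matrices

# eigh returns eigenvalues in ascending order; PCA wants the largest first.
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

projected_numpy = X_centered @ eigenvectors[:, :2]

# Eigenvectors are only defined up to sign, so compare absolute values.
variances_match = np.allclose(pca.explained_variance_, eigenvalues[:2])
projections_match = np.allclose(np.abs(projected_sklearn), np.abs(projected_numpy))
print(variances_match, projections_match)
```

If the two routes agree, both checks print `True`. The open questions left behind (why center the data, why the covariance matrix, how `eigh` works) are the natural next steps down the list.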


The goal provides the context and you can let your curiosity define the depth of study.

Painted this way, studying math is no different to studying any other topic in programming, machine learning, or other technical subjects.


In summary, you learned:

    - The bottom-up approach used in universities to teach technical subjects and the problems with it.
    - How people learn to read, drive, and program in a top-down manner and how the top-down approach works.
    - How to frame machine learning, and even mathematics, using the top-down approach to learning, and how to start making rapid progress as a practitioner.


## __How to Think About Machine Learning__

You can achieve impressive results with machine learning and find solutions to very challenging problems. 

But this is __only a small__ corner of the broader field of machine learning, a corner often __called predictive modeling or predictive analytics__.

In this post, you will discover _how to change the way you think about machine learning in order to best serve you as a machine learning practitioner._


    - What machine learning is and how it relates to artificial intelligence and statistics.
    - The corner of machine learning that you should focus on.
    - How to think about your problem and the machine learning solution to your problem.


__Machine learning is a large field of study, and not much of it is going to be relevant to you if you’re focused on solving a problem.__

### __What is Machine Learning?__

Machine learning is a field of computer science concerned with programs that learn.

There are many types of learning, many types of feedback to learn from, and many things that can be learned.

This could encompass diverse types of learning, such as:

    - Developing code to investigate how populations of organisms “learn” to adapt to their environment over evolutionary time.
    - Developing code to investigate how one neuron in the brain “learns” in response to stimulus from other neurons.
    - Developing code to investigate how ants “learn” the optimal path from their home to their food source.

Another case that you may be more familiar with is:

    - Developing code to investigate how to “learn” patterns in historical data.

This is less glamorous, but is the basis of the small corner of machine learning in which we as practitioners are deeply interested.

### __What About Statistics?__

Statistics, or applied statistics with computers, is a sub-field of mathematics that is concerned with __describing and understanding the relationships in data__.

This could encompass diverse types of modeling, such as:

    - Developing models to summarize the distribution of a variable.
    - Developing models to best characterize the relationship between two variables.
    - Developing models to test the similarity between two populations of observations.

__It also overlaps with the corner of machine learning interested in learning patterns in data.__

Many methods used for understanding data in statistics can be used in machine learning to __learn patterns in data__. These tasks could be called machine learning or applied statistics.

### __Your Machine Learning__

Machine learning is a large field of study, and it can help you solve specific problems.

__But you don’t need to know about all of it.__



In fact, when it comes to learning relationships in data:

    - You’re not investigating the capabilities of an algorithm.
    - You’re not developing an entirely new theory or algorithm.
    - You’re not extending an existing machine learning algorithm to new cases.


__So what parts of machine learning do you need to focus on?__

I think there are two ways to think about machine learning:

    - In terms of the problem you are trying to solve.
    - In terms of the solution you require.


__Your Machine Learning Problem__

Your problem can best be described as the following:

    Find a model or procedure that makes best use of historical data comprised of inputs and outputs in order to skillfully predict outputs given new and unseen inputs in the future.

Based on this description:
        
    It discards entire sub-fields of machine learning, such as unsupervised learning, to focus on one type of learning called supervised learning.

__Your Machine Learning Solution__

The solution you require is best described as the following:

    A model or procedure that automatically creates the most likely approximation of the unknown underlying relationship between inputs and associated outputs in historical data.

In fact, problems of this type resist top-down hand-coded solutions. If you could sit down and write some if-statements to solve your problem, you would not need a machine learning solution. It would be a programming problem.

___The type of machine learning methods that you need will learn the relationship between the inputs and outputs in your historical data.___

## __Applied Machine Learning Process__

Over time, working on applied machine learning problems, ___you develop a pattern or process for quickly getting to good, robust results.___

Once developed, ___you can use this process again and again on project after project. The more robust and developed your process, the faster you can get to reliable results.___

### __The skeleton of the process for working a machine learning problem__

This can be used as a __starting point or template__ on your next project.

#### __5-step process__

    
    1. Define the Problem
    2. Prepare Data
    3. Spot Check Algorithms
    4. Improve Results
    5. Present Results
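
The five steps can be sketched as a small script. This is a skeleton under assumptions, not the definitive process: it assumes scikit-learn is installed, and the iris dataset, the three candidate algorithms, and the tuned parameter are all illustrative choices of mine.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# 1. Define the problem: predict the species of an iris flower from 4 measurements.
# 2. Prepare data: load inputs X and outputs y (scaling happens inside the pipelines).
X, y = load_iris(return_X_y=True)

# 3. Spot check algorithms: score a few different methods in exactly the same way.
candidates = {
    "logistic": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "knn": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "tree": DecisionTreeClassifier(random_state=1),
}
spot_check = {
    name: cross_val_score(model, X, y, cv=5).mean()
    for name, model in candidates.items()
}

# 4. Improve results: tune one parameter of one candidate (k-nearest neighbours here).
search = GridSearchCV(
    make_pipeline(StandardScaler(), KNeighborsClassifier()),
    param_grid={"kneighborsclassifier__n_neighbors": [1, 3, 5, 7, 9]},
    cv=5,
)
search.fit(X, y)

# 5. Present results: report the spot check ranking and the tuned configuration.
for name, score in sorted(spot_check.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: mean accuracy {score:.3f}")
print(f"tuned knn: {search.best_params_}, mean accuracy {search.best_score_:.3f}")
```

Because every step is code, the whole thing can be re-run from the top whenever one step changes.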


There is a lot of __flexibility in this process__.

For example, the “prepare data” step is typically broken down into analyze data (summarize and graph) and prepare data (prepare samples for experiments).
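
A minimal sketch of that split, using only the standard library: first summarize each column, then prepare train and test samples. The rows are hypothetical values invented for illustration, the 75/25 split is an assumed ratio, and graphing is left out to keep the example short.

```python
import random
import statistics

# Hypothetical dataset: rows of (feature_1, feature_2, label). Invented for illustration.
rows = [
    (5.1, 3.5, "a"), (4.9, 3.0, "a"), (4.7, 3.2, "a"), (5.0, 3.6, "a"),
    (6.4, 3.2, "b"), (6.9, 3.1, "b"), (5.5, 2.3, "b"), (6.5, 2.8, "b"),
]


def summarize(values):
    """Return basic descriptive statistics for one numeric column."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),
    }


# Analyze data: summarize each numeric column (indexes 0 and 1).
summary = {index: summarize([row[index] for row in rows]) for index in (0, 1)}

# Prepare data: shuffle a copy with a fixed seed, then split 75% train / 25% test.
shuffled = rows[:]
random.Random(1).shuffle(shuffled)
split = int(len(shuffled) * 0.75)
train, test = shuffled[:split], shuffled[split:]

for index, stats in summary.items():
    print(f"feature {index}: {stats}")
print(f"{len(train)} training rows, {len(test)} test rows")
```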

It’s __a great big production line__ that I try to move through __in a linear manner.__

The great thing in __using automated tools__ is that you __can go back a few steps__ (say from “Improve Results” back to “Prepare Data”) and insert a new transform of the dataset and __re-run experiments__ in the intervening steps ___to see what interesting results come out and how they compare to the experiments you executed before.___
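
Going back to insert a transform and re-running the experiment might look like the sketch below. It assumes scikit-learn is installed; the wine dataset, the k-nearest neighbours model, the scaling transform, and the helper name `run_experiment` are illustrative choices of mine.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)


def run_experiment(steps):
    """Build a pipeline from the given steps and return its mean 5-fold accuracy."""
    return cross_val_score(Pipeline(steps), X, y, cv=5).mean()


# The experiment as first executed: the model on the raw data.
baseline = run_experiment([("model", KNeighborsClassifier())])

# Go back to "Prepare Data", insert a new transform, and re-run the same experiment.
with_scaling = run_experiment(
    [("scale", StandardScaler()), ("model", KNeighborsClassifier())]
)

print(f"baseline: {baseline:.3f}")
print(f"with scaling: {with_scaling:.3f}")
```

Keeping the transform inside the pipeline means the evaluation step is untouched, so the two scores are directly comparable.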

___NOTE:___ The process I use has been adapted from the standard data mining process of knowledge discovery in databases (or KDD).