# Can you write code to tell the difference between an apple and an orange?

#### _A problem that sounds very easy but is very hard to solve without machine learning._

![Apples&Oranges](https://cdn.shopify.com/s/files/1/0580/0965/products/Tseng_AppleAndOrange_a28c0f98-0dfb-4a0e-9667-a5cb2fe6c85a.jpg)

Consider **a program that takes an image as input, does some analysis and outputs the type of fruit**. What would be the basic approach?

One approach is to write a lot of manual rules. For example, a rule could compare the number of orange pixels to the number of red pixels.
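A hand-written rule of this kind might look like the sketch below. The colour thresholds and the function names are assumptions made up for illustration, and the image is simplified to a plain list of `(r, g, b)` tuples.

```python
def is_orange_pixel(r, g, b):
    # Hypothetical threshold: strong red, moderate green, little blue.
    return r > 200 and 100 <= g <= 180 and b < 80

def is_red_pixel(r, g, b):
    # Hypothetical threshold: strong red, little green and blue.
    return r > 150 and g < 80 and b < 80

def classify_by_colour(pixels):
    """Guess the fruit by comparing orange-pixel and red-pixel counts."""
    pixels = list(pixels)
    orange = sum(is_orange_pixel(*p) for p in pixels)
    red = sum(is_red_pixel(*p) for p in pixels)
    return "orange" if orange > red else "apple"

# Toy "images": 100 pixels each.
mostly_red = [(200, 30, 40)] * 90 + [(255, 140, 0)] * 10
mostly_orange = [(255, 140, 0)] * 90 + [(200, 30, 40)] * 10
grey = [(128, 128, 128)] * 100

print(classify_by_colour(mostly_red))     # apple
print(classify_by_colour(mostly_orange))  # orange
print(classify_by_colour(grey))           # apple, although no pixel is red or orange
```

The last call hints at the weakness of this approach: the rule always gives an answer, even for an input it was never designed for.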

But these rules _start to break because real-world inputs are messy_.

A few questions to ask are:

1. What if the image is black and white?
2. How would you handle images in which there are no apples/oranges?
3. How would you handle images in which there are both of the classes?
4. What if the apple is a green apple or a golden apple?

Even if many rules are written carefully, exceptions are still very likely to slip through.

And all that trouble is just to differentiate between apples and oranges. **If the problem is changed to differentiate between grapes and bananas, one would have to start all over again.**

### Machine Learning
To solve this problem, we need an algorithm that can figure out these rules for us, so that we don't have to write them by hand every time a new problem comes up.

To do that, we're going to train a classifier.

**A classifier can be thought of as a function that takes _data as input_ and returns the _class as its output._**

For example, one could have a classifier that would classify images as apples or oranges or a classifier that marks an email as spam or not spam.
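The "classifier as a function" idea can be written down as a type. The sketch below is only an illustration: `Classifier`, `heavy_means_orange` and the 145 g threshold are hypothetical names and values, and the rule is still hand-written rather than learned.

```python
from typing import Callable, Sequence

# A classifier takes the data describing one example and returns its class.
Classifier = Callable[[Sequence[float]], str]

def heavy_means_orange(features: Sequence[float]) -> str:
    # Hypothetical hand-written rule: features[0] is the weight in grams.
    return "orange" if features[0] > 145 else "apple"

def describe(classifier: Classifier, example: Sequence[float]) -> str:
    """Run any classifier on one example and format the result."""
    return f"{list(example)} -> {classifier(example)}"

print(describe(heavy_means_orange, [150, 0]))  # [150, 0] -> orange
```

Any function with this shape can be passed to `describe`, whether its rules were written by hand or learned from data.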

The technique for creating this classifier automatically is called supervised learning, and it begins with examples of the problem that you want to solve.

In [17]:
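import pandas as pd

# One row per piece of fruit: two features (Weight, Texture) and a label (Fruit).
# DataFrame.append was removed in pandas 2.0, so the row goes straight into the constructor.
df = pd.DataFrame([{'Weight': 140, 'Texture': 'smooth', 'Fruit': 'apple'}],
                  columns=['Weight', 'Texture', 'Fruit'])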

In [19]:
df

| | Weight | Texture | Fruit |
|---|---|---|---|
| 0 | 140 | smooth | apple |


# Six lines of machine learning code that can solve this problem

To use supervised learning, we'll follow a recipe with a few standard steps.

1. Collect training data.
2. Train the classifier.
3. Make predictions.

For our problem, we'll create a classifier to classify a piece of fruit.

Suppose we go to a farm, collect data about apples and oranges, and write the measurements that describe them in a table. Those measurements are called **features.**

For simplicity, we'll use 2 features here: 
1. The weight of the fruit in grams.
2. The texture of the fruit.

"A good feature makes it easy to discriminate between different types of fruit".

Each row in our training data is an example. It describes one piece of fruit.

The last column is called the label. It identifies what type of fruit is in each row.

In this case, there are only two possibilities: apples and oranges. The whole table is our training data.

**NOTE:** _In general, the more training data you have, the better a classifier you can create._

For now, we'll solve the problem with tabulated data.

We'll create two lists, `features` and `output`.

Each element of the `features` list is a list of its own. The first element is the weight in grams and the second element is the texture. The texture is converted to a number so that the classifier can work with it: 0 is smooth and 1 is bumpy.

We'll also use 0 for apple and 1 for orange in our data.
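The encoding described above can be sketched as follows. The dictionary names `TEXTURE_CODES` and `FRUIT_CODES` and the `rows` table are illustrative assumptions; the values are the ones used in this tutorial's lists, decoded back into words.

```python
# Numeric codes for the text values, as described above.
TEXTURE_CODES = {"smooth": 0, "bumpy": 1}
FRUIT_CODES = {"apple": 0, "orange": 1}

# The training table: (weight in grams, texture, fruit).
rows = [
    (140, "smooth", "apple"),
    (130, "smooth", "apple"),
    (150, "bumpy", "orange"),
    (170, "bumpy", "orange"),
]

# Split each row into its features and its label, encoding text as numbers.
features = [[weight, TEXTURE_CODES[texture]] for weight, texture, _ in rows]
output = [FRUIT_CODES[fruit] for _, _, fruit in rows]

print(features)  # [[140, 0], [130, 0], [150, 1], [170, 1]]
print(output)    # [0, 0, 1, 1]
```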

In [4]:
# current state of the code.
import sklearn
features = [[140, 0], [130, 0], [150, 1], [170, 1]]
output = [0, 0, 1, 1]

### Next, we need to train a classifier. 
The classifier that we'll be using is called a **decision tree.**
For now, it's okay to think of a classifier as a box of rules. There are many different types of classifiers, but the kind of input and output they work with is always the same.

In [6]:
# Current state of the code.
from sklearn import tree
features = [[140, 0], [130, 0], [150, 1], [170, 1]]
output = [0, 0, 1, 1]
# At this point, our classifier is just an empty box: it doesn't know about apples and oranges yet.
clf = tree.DecisionTreeClassifier()
# To train it, we need a learning algorithm. fit() runs it on the training data.
clf.fit(features, output)
# Predict the label of a 150 g smooth fruit. In this tiny dataset either feature separates the
# classes perfectly, so the tree may split on weight or on texture and the answer can vary between runs.
print(clf.predict([[150, 0]]))

[1]
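To look inside the "box of rules", scikit-learn's `tree.export_text` can print what the tree learned. The sketch below repeats the training step so that it runs on its own. The feature names, the `label_names` dictionary and `random_state=0` (used only to make repeated runs agree) are assumptions added for illustration.

```python
from sklearn import tree

features = [[140, 0], [130, 0], [150, 1], [170, 1]]
output = [0, 0, 1, 1]
label_names = {0: "apple", 1: "orange"}  # reverse of the encoding used above

# Fixing random_state makes the choice between equally good splits repeatable.
clf = tree.DecisionTreeClassifier(random_state=0)
clf.fit(features, output)

# Print the learned rule as indented text.
print(tree.export_text(clf, feature_names=["weight", "texture"]))

# Turn the numeric prediction back into a fruit name.
prediction = clf.predict([[150, 0]])[0]
print(label_names[int(prediction)])
```

Because weight and texture each separate the four training examples perfectly, the printed tree contains a single rule on one of the two features, and the prediction for a 150 g smooth fruit depends on which feature was picked.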


## Finished!

We've created our first classifier: a decision tree that can now predict whether a fruit is an apple or an orange, given its weight and texture.

**NOTE:** This was just sample data. Real-life problems usually come with far more data, which tends to make predictions more accurate but also brings challenges of its own.