How Machines Learn
Not so long ago, if you had picked up your phone and asked it what to do in a particular situation, it would have ignored you, and people might have looked at you like you were losing your mind. Today, opening ChatGPT and asking exactly that is normal. Technology is advancing faster than ever, and Machine Learning (ML) is at the center of this transformation.

Back in the 1990s, one of the earliest widely used real-world applications of ML was the spam filter. Imagine checking your email back then: your inbox would often be flooded with unwanted messages about dubious offers, miracle diets, or suspicious job proposals. Engineers started building systems that could learn to recognize these spam patterns from examples. The idea was not to hard-code rules (like "if the email contains the word 'lottery', mark it as spam") but to let the system learn from actual spam and non-spam ("ham") emails.

Over time, as users kept marking messages as spam, the system learned to catch new spam before the user had to flag it. That was early machine learning in action.

What Machine Learning Is Not
If you download hundreds of books onto your computer, it doesn’t automatically become smart. Your computer doesn’t start writing like Shakespeare or summarizing those books for you. It just stores data. That’s not machine learning. That’s just data storage and retrieval.

Machine Learning kicks in when a computer can improve at a task by learning from data and experience—without being manually programmed for every scenario.

What Machine Learning Is
Machine Learning is the science (and art) of programming computers so they can learn from data.

Here’s a slightly more general definition:

“Machine learning is the field of study that gives computers the ability to learn without being explicitly programmed.” — Arthur Samuel, 1959

And a more engineering-focused one:

“A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.” — Tom Mitchell, 1997

Let’s break that down with the spam filter example:

Task (T): Identify whether an email is spam or not
Experience (E): Examples of labeled emails (spam or ham)
Performance measure (P): Accuracy, the fraction of emails that are correctly classified
If the system keeps getting better at flagging spam based on past emails, it is said to be learning.
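To make the performance measure P concrete, here is a minimal sketch that computes accuracy as the fraction of correct predictions. The function name `accuracy` and the four sample labels are illustrative assumptions, not part of any library.

```python
def accuracy(true_labels, predicted_labels):
    """Return the fraction of predictions that match the true labels."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("both lists must have the same length")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


# Hypothetical labels for four emails and a filter's guesses
true_labels = ["spam", "ham", "spam", "ham"]
predicted_labels = ["spam", "ham", "ham", "ham"]

print(accuracy(true_labels, predicted_labels))  # 3 of 4 correct -> 0.75
```

If the same measurement goes up as the filter sees more labeled emails, the system is learning in Mitchell's sense.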

The examples used to train the system are called the training set, and each individual example is a training instance (or sample). The part of the ML system that does the actual learning and prediction is called the model. Think of the model as the brain that’s being trained.

Some popular models include:

Neural Networks
Random Forests
Support Vector Machines

Traditional Programming vs Machine Learning
To truly understand what makes ML different, let’s compare it with the traditional approach.

Traditional Approach
In traditional programming:

A human writes explicit rules for the computer to follow.
Input + Rules = Output
Example: If an email contains the phrase “free money,” mark it as spam. But what if spammers start writing “frëe m0ney”? The rule no longer matches.
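The brittleness of a hand-written rule is easy to see in code. This is a minimal sketch; the function name `is_spam_by_rule` and the sample messages are hypothetical.

```python
def is_spam_by_rule(email_text):
    """Hand-written rule: flag any email containing the phrase 'free money'."""
    return "free money" in email_text.lower()


print(is_spam_by_rule("Claim your FREE MONEY today"))  # True
print(is_spam_by_rule("Claim your frëe m0ney today"))  # False: the rule misses it
```

Every new spelling trick would need another rule, which is exactly the maintenance burden the ML approach tries to avoid.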

 

Machine Learning Approach
In the ML approach:

The computer is given examples (data) and learns the rules by itself.
Input + Output (examples) → Learning → Model (Rules)
Then: New Input + Model = Prediction
 

So, instead of hardcoding logic for every possible spam variation, the system learns patterns in spammy emails over time and generalizes them to new messages—even ones it hasn’t seen before.
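As a rough illustration of "examples in, model out", the sketch below learns word counts from a few labeled emails and scores new messages in the style of a naive Bayes classifier with add-one smoothing. It is a toy under stated assumptions: the four training emails, the function names `train` and `predict`, and the choice to skip unseen words are all made up for this example, and it assumes both classes appear in the training data.

```python
import math
from collections import Counter


def train(examples):
    """Learn per-class word counts from (text, label) pairs."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    class_counts = Counter()
    for text, label in examples:
        class_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, class_counts


def predict(model, text):
    """Return the label with the higher log-probability score."""
    word_counts, class_counts = model
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    total_emails = sum(class_counts.values())
    scores = {}
    for label in ("spam", "ham"):
        n_words = sum(word_counts[label].values())
        score = math.log(class_counts[label] / total_emails)
        for word in text.lower().split():
            if word not in vocab:
                continue  # ignore words never seen during training
            # Add-one smoothing keeps unseen (word, class) pairs from zeroing the score
            score += math.log((word_counts[label][word] + 1) / (n_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


# Input + Output (examples) -> Learning -> Model
examples = [
    ("win free money now", "spam"),
    ("free lottery prize claim now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday with the team", "ham"),
]
model = train(examples)

# New Input + Model = Prediction
print(predict(model, "claim your free prize"))      # spam
print(predict(model, "agenda for the team lunch"))  # ham
```

Neither test message appears in the training examples; the predictions come from word patterns the model picked up, not from a rule someone wrote.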

Summary
Just storing data is not machine learning.
ML is about improving performance with experience.
The system learns patterns from data rather than following hardcoded rules.
The model is what makes predictions.
One of the earliest ML use-cases was spam detection.
Unlike traditional programming, ML systems adapt as they see more data.

Installing Scikit-learn
Open your terminal and run:

pip install scikit-learn

Or, in a Jupyter Notebook or Google Colab cell:

!pip install scikit-learn

In [2]:
%pip install scikit-learn

# 1. Linear Regression
# Question:
# Build a linear regression model to predict house prices using features like area and number of bedrooms.

# Dataset Suggestion: Use a synthetic or small CSV with 2-3 columns (area, bedrooms, price).

from sklearn.linear_model import LinearRegression
import pandas as pd

# Small synthetic dataset: area, number of bedrooms, and price
data = {
    "area": [1000, 1500, 2000, 2500],
    "bedroom": [2, 3, 3, 4],
    "price": [20000, 30000, 35000, 50000]
}
df = pd.DataFrame(data)

# Features (x) and target (y)
x = df[["area", "bedroom"]]
y = df["price"]

model = LinearRegression()
model.fit(x, y)

# Predict with a DataFrame that uses the same column names as the training data,
# which avoids scikit-learn's "X does not have valid feature names" warning
new_house = pd.DataFrame({"area": [1800], "bedroom": [2]})
print("Predicted price for area 1800 and 2 bedrooms:", model.predict(new_house))

Predicted price for area 1800 and 2 bedrooms: [26750.]
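To see where a number like 26750 comes from, the sketch below solves the same least-squares problem by hand for two features, using only the standard library. It is an illustration of ordinary least squares on this four-row dataset, not scikit-learn's actual implementation; the variable names are assumptions made for the example.

```python
areas = [1000, 1500, 2000, 2500]
bedrooms = [2, 3, 3, 4]
prices = [20000, 30000, 35000, 50000]


def mean(values):
    return sum(values) / len(values)


# Center each column so the intercept can be recovered at the end
a_mean, b_mean, p_mean = mean(areas), mean(bedrooms), mean(prices)
da = [a - a_mean for a in areas]
db = [b - b_mean for b in bedrooms]
dp = [p - p_mean for p in prices]

# Sums of products that make up the 2x2 normal equations
s_aa = sum(u * u for u in da)
s_ab = sum(u * v for u, v in zip(da, db))
s_bb = sum(v * v for v in db)
s_ap = sum(u * p for u, p in zip(da, dp))
s_bp = sum(v * p for v, p in zip(db, dp))

# Solve the 2x2 system with Cramer's rule
det = s_aa * s_bb - s_ab * s_ab
w_area = (s_ap * s_bb - s_ab * s_bp) / det
w_bedroom = (s_aa * s_bp - s_ab * s_ap) / det
intercept = p_mean - w_area * a_mean - w_bedroom * b_mean


def predict_price(area, bedroom):
    return intercept + w_area * area + w_bedroom * bedroom


print(w_area, w_bedroom, intercept)  # 10.0 7500.0 -6250.0
print(predict_price(1800, 2))        # 26750.0
```

In other words, the fitted model is price = -6250 + 10 * area + 7500 * bedroom for this data, and plugging in area 1800 with 2 bedrooms gives 26750.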




In [4]:
# 3. Data Preprocessing with Scikit-Learn
# Question:
# Load a dataset and:

# Handle missing values

# Encode categorical variables using LabelEncoder or OneHotEncoder

# Normalize or scale features using StandardScaler or MinMaxScaler
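import pandas as pd
from sklearn.preprocessing import StandardScaler, LabelEncoder

# Small synthetic dataset with one categorical and two numeric columns.
# It has no missing values, so no imputation step is needed here.
data = {
    'gender':['M','F','M','F'],
    'age':[25,30,22,35],
    'income':[40000,35000,55000,45000]
}

df = pd.DataFrame(data)

# Encode the categorical column as integers (classes are sorted, so F -> 0, M -> 1).
# LabelEncoder is designed for target labels; for input features with more than
# two categories, OneHotEncoder is usually the safer choice.
le = LabelEncoder()
df['gender'] = le.fit_transform(df['gender'])

# Scale the numeric columns to zero mean and unit variance
scaler = StandardScaler()
df[['age','income']] = scaler.fit_transform(df[['age','income']])
print(df)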


   gender       age    income
0       1 -0.606092 -0.507093
1       0  0.404061 -1.183216
2       1 -1.212183  1.521278
3       0  1.414214  0.169031
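The scaled values follow the z-score formula z = (x - mean) / std, where std is the population standard deviation (dividing by n, not n - 1). As a quick check, this standard-library sketch recomputes the age column by hand; the variable names are illustrative.

```python
from statistics import mean, pstdev

ages = [25, 30, 22, 35]

mu = mean(ages)       # 28
sigma = pstdev(ages)  # population standard deviation (divides by n)

scaled_ages = [(age - mu) / sigma for age in ages]
print([round(z, 6) for z in scaled_ages])
# [-0.606092, 0.404061, -1.212183, 1.414214]
```

These match the age column in the output above, which confirms what the scaling step does to each value.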


In [16]:
# 4. Train-Test Split
# Question:
# Split a dataset into training and testing sets and evaluate the model accuracy.

# Use: train_test_split() from sklearn.model_selection

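from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
x = iris.data
y = iris.target

# Hold out 20% of the samples for testing.
# No random_state is set, so the split (and the accuracy printed below) can
# change from run to run; pass random_state=<int> for a reproducible split.
x_train, x_test, y_train, y_test  = train_test_split(x,y, test_size=0.2)

# Fit on the training set only, then score on the unseen test set
model = DecisionTreeClassifier()
model.fit(x_train,y_train)
print("Accuracy : ", model.score(x_test,y_test))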

Accuracy :  0.9666666666666667
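Conceptually, a train-test split just shuffles the samples and sets a portion aside. The sketch below shows that idea with the standard library only. It is a simplified stand-in, not scikit-learn's implementation: the function name `simple_train_test_split`, the `seed` parameter, and the rounding of the test size are assumptions made for illustration.

```python
import random


def simple_train_test_split(items, test_size=0.2, seed=0):
    """Shuffle a copy of items and split off a test portion."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # a fixed seed makes the split repeatable
    n_test = round(len(shuffled) * test_size)
    return shuffled[n_test:], shuffled[:n_test]


samples = list(range(10))
train, test = simple_train_test_split(samples, test_size=0.2, seed=42)

print(len(train), len(test))            # 8 2
print(sorted(train + test) == samples)  # True: every sample lands in exactly one set
```

Scoring the model on samples it never saw during training is what makes the accuracy figure an estimate of how it will behave on new data.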


In [19]:
#  5. K-Nearest Neighbors (KNN)
# Question:
# Classify iris flowers into species using the KNN algorithm.

# Dataset: from sklearn.datasets import load_iris

from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
x,y = iris.data, iris.target

# Classify each sample by a majority vote among its 3 nearest neighbors
model = KNeighborsClassifier(n_neighbors=3)
model.fit(x,y)

# Note: x[0] is part of the training data, so this is only a sanity check
print("Prediction for first flower:", model.predict([x[0]]))

Prediction for first flower: [0]
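The idea behind KNN fits in a few lines: measure the distance from the query to every training point, take the k closest, and let them vote. The sketch below uses only the standard library; the function name `knn_predict`, the 2D points, and the labels "small" and "large" are hypothetical and are not the iris data.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Predict the majority label among the k training points nearest to query."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical 2D training data forming two well-separated groups
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = ["small", "small", "small", "large", "large", "large"]

print(knn_predict(points, labels, (1.1, 1.0)))  # small
print(knn_predict(points, labels, (5.1, 5.0)))  # large
```

There is no separate training step here: the "model" is the stored data itself, which is why KNN is often described as a lazy learner.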
