<h4 style="text-align:center"><b>Machine Learning</b></h4>

<b>Machine Learning</b> is the science of getting computers to learn and act as humans do, and to improve their learning over time autonomously, by feeding them data and information in the form of observations and real-world interactions.

<img src='https://media.geeksforgeeks.org/wp-content/uploads/20250110153147721466/Machine-Learning-Techniques.webp' alt='Machine learning techniques'>

### Types of Machine Learning

Machine Learning algorithms can be broadly categorized into **three main types** based on their learning approach and the nature of the data they work with.

---

#### 1. Supervised Machine Learning  
Supervised learning algorithms are trained on **labelled data** to map input features to targets.

- **Regression**: Predicts continuous values (e.g., house prices, temperature).  
- **Classification**: Assigns data to specific categories (e.g., spam detection, disease diagnosis).  
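
As a minimal sketch of the regression case, the snippet below fits a straight line to a few labelled points with the closed-form least-squares formulas. The data values and the helper name `fit_line` are made up for illustration; a real project would more likely use a library model.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical labelled data: house size (input feature) -> price (target)
sizes = [50, 60, 80, 100]
prices = [150, 180, 240, 300]

slope, intercept = fit_line(sizes, prices)
predicted = slope * 70 + intercept  # prediction for an unseen size of 70
```

The labels (`prices`) are what make this supervised: the fit is driven entirely by known input/target pairs.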

---

#### 2. Unsupervised Machine Learning  
Unsupervised learning algorithms identify **patterns in unlabelled data**.  

Two common types of unsupervised learning are:  

- **Clustering**: Groups similar data points together (e.g., customer segmentation, document grouping).  
- **Dimensionality Reduction**: Reduces the number of variables while retaining key information (e.g., PCA, t-SNE).  
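
To make clustering concrete, here is a small sketch of k-means (Lloyd's algorithm) on one-dimensional data. The spending values, the starting centroids, and the function name `kmeans_1d` are illustrative assumptions, not part of any library.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Cluster 1-D points around the given initial centroids."""
    centroids = list(centroids)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical customer spend values with two visible groups; no labels are given
spend = [10, 12, 11, 50, 52, 49]
centroids, clusters = kmeans_1d(spend, centroids=[0, 60])
```

No targets are supplied anywhere; the groups emerge purely from distances between the points.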

---

#### 3. Reinforcement Machine Learning  
In Reinforcement Learning, an **agent learns by interacting with an environment**, performing actions, and receiving **rewards/penalties**.  
It aims to maximize **cumulative reward**.  

There are two main types of reinforcement learning:  

- **Model-based Reinforcement Learning**: The agent learns a model of the environment to plan actions.  
- **Model-free Reinforcement Learning**: The agent learns directly from experience without a model.  
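
The model-free case can be sketched with tabular Q-learning on a toy environment. Everything below (the five-state corridor, the reward of 1 at the goal, the hyperparameters, and the names `step` and `train`) is a hypothetical setup chosen for illustration.

```python
import random

N_STATES = 5          # states 0..4 on a line; state 4 is the goal
ACTIONS = (-1, +1)    # index 0 = move left, index 1 = move right
ALPHA, GAMMA = 0.5, 0.9


def step(state, action_idx):
    """Toy environment: returns (next_state, reward, done)."""
    next_state = min(max(state + ACTIONS[action_idx], 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def train(episodes=500, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Behave uniformly at random; Q-learning is off-policy, so it
            # still learns the values of the greedy policy
            action = rng.randrange(2)
            next_state, reward, done = step(state, action)
            # Move the estimate toward reward + discounted best next value
            target = reward + (0.0 if done else GAMMA * max(q[next_state]))
            q[state][action] += ALPHA * (target - q[state][action])
            state = next_state
    return q


q_table = train()
# Greedy policy for the non-terminal states
policy = ["right" if q[1] > q[0] else "left" for q in q_table[:-1]]
```

The agent never sees the transition rules directly; it only uses the `(next_state, reward)` samples returned by `step`, which is what distinguishes it from a model-based approach.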

---


### 📊 Types of Variables in Machine Learning

Understanding the types of variables is important for preprocessing, feature engineering, and choosing the right ML models.

---

#### 1. **Categorical Variables (Qualitative Data)**  
Variables that represent **categories or labels**.

- **Nominal** → Categories without order  
  - Example: `Gender = {Male, Female}`, `Color = {Red, Blue, Green}`  
  - 🔧 Encoding: One-Hot Encoding (usually preferred); Label Encoding only where the arbitrary integer order is harmless, e.g. tree-based models  

- **Ordinal** → Categories with **order/rank**, but no fixed distance  
  - Example: `Education = {High School < Graduate < Postgraduate}`  
  - 🔧 Encoding: Label Encoding (with order), Ordinal Encoding  
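
A minimal sketch of the two encodings, written in plain Python so the mechanics are visible. The helper names `one_hot` and `ordinal` are illustrative; libraries such as pandas and scikit-learn provide their own encoders.

```python
def one_hot(value, categories):
    """Nominal: one 0/1 column per category, so no order is implied."""
    return [1 if value == c else 0 for c in categories]


def ordinal(value, ordered_categories):
    """Ordinal: integer rank that preserves the category order."""
    return ordered_categories.index(value)


colors = ["Red", "Blue", "Green"]
education_levels = ["High School", "Graduate", "Postgraduate"]  # lowest to highest

blue_encoded = one_hot("Blue", colors)                 # [0, 1, 0]
graduate_rank = ordinal("Graduate", education_levels)  # 1
```

The ordinal ranks only say that one level is higher than another; the gap between rank 0 and 1 is not claimed to equal the gap between 1 and 2.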

---

#### 2. **Numerical Variables (Quantitative Data)**  
Variables that take **numeric**, measurable values.  

- **Discrete** → Countable values  
  - Example: `Number of children = {0, 1, 2}`, `Exam Score = {45, 60, 90}`  
  - 🔧 Encoding: Usually kept as integers  

- **Continuous** → Values on a continuous scale  
  - Example: `Height = 175.6 cm`, `Weight = 68.4 kg`, `Temperature = 36.7°C`  
  - 🔧 Encoding: Normalization, Standardization  
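
The two scaling options for continuous values can be sketched as follows. The sample heights and function names are assumptions for illustration; min-max normalization maps values to [0, 1], while standardization rescales to zero mean and unit standard deviation.

```python
from statistics import mean, pstdev


def min_max_normalize(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Subtract the mean and divide by the (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


heights_cm = [160.0, 170.0, 180.0]
normalized = min_max_normalize(heights_cm)   # [0.0, 0.5, 1.0]
standardized = standardize(heights_cm)
```

Both helpers assume the values are not all identical; otherwise the denominator would be zero.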

---

#### 3. **Binary Variables**  
A special case of categorical variables with only **two categories**.  
- Example: `Yes/No`, `0/1`, `True/False`  
- 🔧 Encoding: Directly as `0` and `1`  

---

#### 4. **Time Variables**  
Variables related to **time**.  
- Example: `Date = 2025-08-28`, `Month = August`, `Day of Week = Monday`  
- 🔧 Encoding: Extract features (Year, Month, Day), Cyclical Encoding (sin/cos for hours, days)  
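
Both time encodings can be sketched with the standard library. The helper name `cyclical` and the choice of hour-of-day as the cyclic feature are illustrative assumptions.

```python
import math
from datetime import date


def cyclical(value, period):
    """Map a periodic value (e.g. hour 0..23) to a point on the unit circle."""
    angle = 2 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


# Feature extraction from a date
d = date(2025, 8, 28)
features = {
    "year": d.year,
    "month": d.month,
    "day": d.day,
    "weekday": d.weekday(),  # Monday = 0 ... Sunday = 6
}

# Cyclical encoding keeps 23:00 close to 00:00
hour_23 = cyclical(23, 24)
hour_0 = cyclical(0, 24)
hour_12 = cyclical(12, 24)
```

With a plain integer encoding, hours 23 and 0 look 23 units apart; on the circle they are neighbours, which is usually what a model should see.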

---

#### 5. **Text Variables**  
Variables containing **string/text data**.  
- Example: `"Great product!"`, `"The weather is sunny."`  
- 🔧 Encoding: Bag of Words (BoW), TF-IDF, Word Embeddings (Word2Vec, GloVe, BERT)  
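
As a minimal sketch of Bag of Words, the snippet below builds a shared vocabulary and counts word occurrences per document. The tokenization rule (lowercase, letters and apostrophes only) and the function name `bag_of_words` are simplifying assumptions.

```python
import re
from collections import Counter


def bag_of_words(documents):
    """Return (vocabulary, count vectors) for a list of text documents."""
    tokenized = [re.findall(r"[a-z']+", doc.lower()) for doc in documents]
    vocabulary = sorted({word for tokens in tokenized for word in tokens})
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # One count per vocabulary word, in vocabulary order
        vectors.append([counts[word] for word in vocabulary])
    return vocabulary, vectors


docs = ["Great product!", "The weather is sunny."]
vocabulary, vectors = bag_of_words(docs)
```

Word order is discarded; each document becomes a fixed-length numeric vector that a model can consume.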

---

#### 6. **Derived/Engineered Variables**  
Created from existing features to improve ML performance.  
- Example: `BMI = Weight / Height^2`, `Age = Current Year - Birth Year`, `Season` from Date  
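
The derived features above can be sketched as small functions. Two assumptions are worth flagging: BMI is computed with height in metres (so a height recorded in centimetres must be divided by 100 first), and the `season` mapping uses Northern Hemisphere meteorological seasons.

```python
def bmi(weight_kg, height_m):
    """Body mass index: weight in kilograms divided by height in metres squared."""
    return weight_kg / height_m ** 2


def age(current_year, birth_year):
    """Approximate age in years, ignoring month and day."""
    return current_year - birth_year


def season(month):
    """Season for a month number (assumes the Northern Hemisphere)."""
    if month in (12, 1, 2):
        return "Winter"
    if month in (3, 4, 5):
        return "Spring"
    if month in (6, 7, 8):
        return "Summer"
    return "Autumn"


person_bmi = bmi(80.0, 2.0)       # 20.0
person_age = age(2025, 1990)      # 35
august_season = season(8)         # "Summer"
```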

---

#### ✅ Summary Table

| Variable Type     | Sub-Type     | Example                  | Encoding Method |
|-------------------|--------------|--------------------------|----------------|
| Categorical       | Nominal      | Color = {Red, Blue}      | One-Hot Encoding |
|                   | Ordinal      | Education = {HS, UG, PG} | Ordinal Encoding |
| Numerical         | Discrete     | No. of Cars = {0,1,2}    | Integer Encoding |
|                   | Continuous   | Weight = 65.5 kg         | Scaling/Normalization |
| Binary            | -            | Gender = {M, F}          | 0/1 Encoding |
| Time              | -            | Date = 2025-08-28        | Feature Extraction |
| Text              | -            | "Great product!"         | BoW, TF-IDF, Embeddings |

---

In [1]:
import pandas as pd

# Load the loan dataset; assumes loan.csv is in the working directory
df = pd.read_csv("loan.csv")
df  # display the DataFrame

Unnamed: 0,Loan_ID,Gender,Married,Dependents,Education,Self_Employed,ApplicantIncome,CoapplicantIncome,LoanAmount,Loan_Amount_Term,Credit_History,Property_Area,Loan_Status
0,LP001002,Male,No,0,Graduate,No,5849,0.0,,360.0,1.0,Urban,Y
1,LP001003,Male,Yes,1,Graduate,No,4583,1508.0,128.0,360.0,1.0,Rural,N
2,LP001005,Male,Yes,0,Graduate,Yes,3000,0.0,66.0,360.0,1.0,Urban,Y
3,LP001006,Male,Yes,0,Not Graduate,No,2583,2358.0,120.0,360.0,1.0,Urban,Y
4,LP001008,Male,No,0,Graduate,No,6000,0.0,141.0,360.0,1.0,Urban,Y
...,...,...,...,...,...,...,...,...,...,...,...,...,...
609,LP002978,Female,No,0,Graduate,No,2900,0.0,71.0,360.0,1.0,Rural,Y
610,LP002979,Male,Yes,3+,Graduate,No,4106,0.0,40.0,180.0,1.0,Rural,Y
611,LP002983,Male,Yes,1,Graduate,No,8072,240.0,253.0,360.0,1.0,Urban,Y
612,LP002984,Male,Yes,2,Graduate,No,7583,0.0,187.0,360.0,1.0,Urban,Y


In [5]:
df.head(10)  # first 10 rows

Unnamed: 0,Loan_ID,Gender,Married,Dependents,Education,Self_Employed,ApplicantIncome,CoapplicantIncome,LoanAmount,Loan_Amount_Term,Credit_History,Property_Area,Loan_Status
0,LP001002,Male,No,0,Graduate,No,5849,0.0,,360.0,1.0,Urban,Y
1,LP001003,Male,Yes,1,Graduate,No,4583,1508.0,128.0,360.0,1.0,Rural,N
2,LP001005,Male,Yes,0,Graduate,Yes,3000,0.0,66.0,360.0,1.0,Urban,Y
3,LP001006,Male,Yes,0,Not Graduate,No,2583,2358.0,120.0,360.0,1.0,Urban,Y
4,LP001008,Male,No,0,Graduate,No,6000,0.0,141.0,360.0,1.0,Urban,Y
5,LP001011,Male,Yes,2,Graduate,Yes,5417,4196.0,267.0,360.0,1.0,Urban,Y
6,LP001013,Male,Yes,0,Not Graduate,No,2333,1516.0,95.0,360.0,1.0,Urban,Y
7,LP001014,Male,Yes,3+,Graduate,No,3036,2504.0,158.0,360.0,0.0,Semiurban,N
8,LP001018,Male,Yes,2,Graduate,No,4006,1526.0,168.0,360.0,1.0,Urban,Y
9,LP001020,Male,Yes,1,Graduate,No,12841,10968.0,349.0,360.0,1.0,Semiurban,N


In [3]:
df.shape  # (number of rows, number of columns)

(614, 13)

In [8]:
df.isnull()  # True wherever a value is missing

Unnamed: 0,Loan_ID,Gender,Married,Dependents,Education,Self_Employed,ApplicantIncome,CoapplicantIncome,LoanAmount,Loan_Amount_Term,Credit_History,Property_Area,Loan_Status
0,False,False,False,False,False,False,False,False,True,False,False,False,False
1,False,False,False,False,False,False,False,False,False,False,False,False,False
2,False,False,False,False,False,False,False,False,False,False,False,False,False
3,False,False,False,False,False,False,False,False,False,False,False,False,False
4,False,False,False,False,False,False,False,False,False,False,False,False,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...
609,False,False,False,False,False,False,False,False,False,False,False,False,False
610,False,False,False,False,False,False,False,False,False,False,False,False,False
611,False,False,False,False,False,False,False,False,False,False,False,False,False
612,False,False,False,False,False,False,False,False,False,False,False,False,False


In [9]:
df.isnull().sum()  # number of missing values per column

Loan_ID               0
Gender               13
Married               3
Dependents           15
Education             0
Self_Employed        32
ApplicantIncome       0
CoapplicantIncome     0
LoanAmount           22
Loan_Amount_Term     14
Credit_History       50
Property_Area         0
Loan_Status           0
dtype: int64
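
Raw counts are easier to judge next to the share of rows they represent. The sketch below computes both on a small hand-built frame; the `sample` data is hypothetical and only stands in for `loan.csv`, so the numbers are not those of the real dataset.

```python
import pandas as pd

# Hypothetical mini-frame standing in for the loan data
sample = pd.DataFrame({
    "Gender": ["Male", None, "Female", "Male"],
    "LoanAmount": [None, 128.0, 66.0, None],
    "Education": ["Graduate", "Graduate", "Not Graduate", "Graduate"],
})

missing_count = sample.isnull().sum()        # missing values per column
missing_pct = sample.isnull().mean() * 100   # mean of True/False = fraction missing

summary = pd.DataFrame({"missing": missing_count, "percent": missing_pct})
# Keep only columns that actually have gaps, worst first
summary = summary[summary["missing"] > 0].sort_values("missing", ascending=False)
print(summary)
```

Replacing `sample` with `df` would give the same summary for the loan data loaded earlier.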