# Machine Learning: Basic & Important Concepts with Examples
This notebook covers foundational concepts in Machine Learning with simple Python examples using `scikit-learn` and `pandas`.

## 1. Supervised vs Unsupervised Learning
- **Supervised Learning**: Learn from labeled data (e.g., classification, regression)
- **Unsupervised Learning**: Learn patterns from unlabeled data (e.g., clustering)

In [None]:
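# Example for Supervised Learning: Linear Regression
from sklearn.linear_model import LinearRegression, LogisticRegression
import pandas as pd
import numpy as np

# Sample data: one feature, continuous target
X = np.array([[10], [20], [30], [40], [50], [60], [70]])
y = np.array([1200, 1900, 3200, 3900, 5100, 6000, 6980])

# The model must be fitted before it can predict
model = LinearRegression()
model.fit(X, y)
print('Prediction for 83:', model.predict([[83]]))

# Classification on the predictive maintenance dataset
df = pd.read_csv("predictive_maintenance.csv")
# Keep only the numeric sensor columns; IDs, "Type", and the label
# columns ("Target", "Failure Type") are left out of the features
feature_cols = ["Air temperature [K]", "Process temperature [K]",
                "Rotational speed [rpm]", "Torque [Nm]", "Tool wear [min]"]
X_pm = df[feature_cols]
y_pm = df["Failure Type"]
# max_iter is raised because unscaled features can be slow to converge
lg_model = LogisticRegression(max_iter=1000)
lg_model.fit(X_pm, y_pm)
print('Training accuracy:', lg_model.score(X_pm, y_pm))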

In [17]:
df=pd.read_csv("predictive_maintenance.csv")
df.head(2)

Unnamed: 0,UDI,Product ID,Type,Air temperature [K],Process temperature [K],Rotational speed [rpm],Torque [Nm],Tool wear [min],Target,Failure Type
0,1,M14860,M,298.1,308.6,1551,42.8,0,0,No Failure
1,2,L47181,L,298.2,308.7,1408,46.3,3,0,No Failure


## 2. Regression vs Classification
- **Regression**: Predict continuous values
- **Classification**: Predict discrete labels

In [16]:
# Classification Example: Logistic Regression
from sklearn.linear_model import LogisticRegression

# Dummy data
X = [[0.5], [1.5], [2.5], [3.5]]
y = [0, 0, 1, 1]
clf = LogisticRegression()
clf.fit(X, y)
print('Predicted class for 2.0:', clf.predict([[2.0]]))

Predicted class for 2.0: [1]


## 3. Overfitting vs Underfitting
- **Overfitting**: High training accuracy, poor test performance
- **Underfitting**: Poor training accuracy, poor test performance

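Compare accuracy on the training set with accuracy on a held-out test set to detect these.

In [None]:
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier

# Simple data
X = [[i] for i in range(10)]
y = [0]*5 + [1]*5
# Fix random_state so the split is reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
model = DecisionTreeClassifier()
model.fit(X_train, y_train)
# A large gap between the two scores points to overfitting;
# low scores on both point to underfitting
print('Training accuracy:', accuracy_score(y_train, model.predict(X_train)))
print('Test accuracy:', accuracy_score(y_test, model.predict(X_test)))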
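
The toy data above is too clean to show either problem, so here is a minimal sketch on noisy synthetic data. The dataset from `make_classification`, the `flip_y=0.2` label noise, and the two tree depths are all assumptions chosen for illustration, not part of the original notebook.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic noisy data (assumed for illustration); flip_y randomly flips some labels
X_noisy, y_noisy = make_classification(
    n_samples=300, n_features=10, n_informative=3, flip_y=0.2, random_state=0
)
Xn_train, Xn_test, yn_train, yn_test = train_test_split(
    X_noisy, y_noisy, test_size=0.3, random_state=0
)

# An unrestricted tree can memorize the training set; a depth-1 tree is very simple
results = {}
for name, depth in [('unrestricted tree', None), ('depth-1 tree', 1)]:
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    tree.fit(Xn_train, yn_train)
    results[name] = (tree.score(Xn_train, yn_train), tree.score(Xn_test, yn_test))
    print(f'{name}: train={results[name][0]:.2f}, test={results[name][1]:.2f}')
```

A training score well above the test score for the unrestricted tree would suggest overfitting, while low scores on both sets for the depth-1 tree would suggest underfitting. The exact numbers depend on the generated data.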

## 4. Train/Test Split
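Hold out part of the data as a test set so that the model is evaluated on examples it did not see during training. This estimates how well it generalizes to unseen data.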
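
A minimal sketch of a hold-out split on the same kind of toy data used in the earlier cells. The `random_state=42` and `stratify=y` arguments are illustrative choices, not settings from the original notebook.

```python
from sklearn.model_selection import train_test_split

# Toy data (assumed for illustration): 10 samples, two balanced classes
X = [[i] for i in range(10)]
y = [0] * 5 + [1] * 5

# Hold out 30% of the samples; stratify keeps the class ratio similar in both parts
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

print('Training samples:', len(X_train))
print('Test samples:', len(X_test))
```

The model should be fitted on `X_train` and `y_train` only, and `X_test` should be used only for the final evaluation.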

## 5. Cross Validation
Gives a more reliable performance estimate than a single split by averaging the score over multiple train/test splits.

In [None]:
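from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# X and y are the toy data defined in the earlier cell
scores = cross_val_score(DecisionTreeClassifier(), X, y, cv=5)
print('Cross-validation scores:', scores)
print('Mean score:', scores.mean())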
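
To make the averaging explicit, the sketch below writes the cross-validation loop by hand with `StratifiedKFold`. It approximates what `cross_val_score` does for a classifier. The variable names, the shuffling, and the `random_state` values are assumptions made for this example.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.tree import DecisionTreeClassifier

# Toy data (assumed for illustration), as NumPy arrays so folds can be indexed
X_cv = np.arange(10).reshape(-1, 1)
y_cv = np.array([0] * 5 + [1] * 5)

fold_scores = []
splitter = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, test_idx in splitter.split(X_cv, y_cv):
    fold_model = DecisionTreeClassifier(random_state=0)
    fold_model.fit(X_cv[train_idx], y_cv[train_idx])
    # Score on the fold that was held out from training
    fold_scores.append(fold_model.score(X_cv[test_idx], y_cv[test_idx]))

print('Per-fold accuracy:', fold_scores)
print('Mean accuracy:', np.mean(fold_scores))
```

Every sample is used for testing exactly once, so the mean depends less on one lucky or unlucky split.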

## 6. Feature Scaling
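- Normalize (rescale to a fixed range such as [0, 1]) or standardize (zero mean, unit variance) feature values.
- Important for models that rely on distances or margins, such as SVM and KNN.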

In [None]:
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
scaled_X = scaler.fit_transform(X)
print(scaled_X)
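
The difference between standardizing and normalizing is easier to see with two features on very different scales. The following is a small sketch with made-up values; `X_raw` and the other variable names are assumptions for illustration.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Two features on very different scales (values assumed for illustration)
X_raw = np.array([[1.0, 1000.0], [2.0, 2000.0], [3.0, 3000.0], [4.0, 4000.0]])

# Standardization: subtract the column mean, divide by the column standard deviation
X_std = StandardScaler().fit_transform(X_raw)

# Normalization (min-max): rescale each column to the range [0, 1]
X_minmax = MinMaxScaler().fit_transform(X_raw)

print('Standardized column means:', X_std.mean(axis=0))
print('Standardized column stds:', X_std.std(axis=0))
print('Min-max column ranges:', X_minmax.min(axis=0), X_minmax.max(axis=0))
```

When a train/test split is involved, the scaler is usually fitted on the training data only and then applied to the test data with `transform`, so that no information from the test set leaks into preprocessing.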

## 7. Evaluation Metrics
- Accuracy, Precision, Recall, F1-score, Confusion Matrix, and ROC-AUC for classification
- MSE, RMSE, MAE, and R² score for regression
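
The metrics listed above are all available in `sklearn.metrics`. Here is a minimal sketch on hand-made labels and predictions. All the values are assumptions chosen for illustration, and RMSE is computed as the square root of MSE.

```python
from sklearn.metrics import (
    accuracy_score, precision_score, recall_score, f1_score,
    confusion_matrix, roc_auc_score,
    mean_squared_error, mean_absolute_error, r2_score,
)

# Classification: hand-made labels, predictions, and scores (assumed for illustration)
y_true_cls = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred_cls = [1, 0, 0, 1, 0, 1, 1, 0]
y_score_cls = [0.9, 0.2, 0.4, 0.8, 0.3, 0.7, 0.6, 0.1]  # predicted probability of class 1

print('Accuracy:', accuracy_score(y_true_cls, y_pred_cls))
print('Precision:', precision_score(y_true_cls, y_pred_cls))
print('Recall:', recall_score(y_true_cls, y_pred_cls))
print('F1-score:', f1_score(y_true_cls, y_pred_cls))
# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_true_cls, y_pred_cls)
print('Confusion matrix:\n', cm)
# ROC-AUC is computed from scores or probabilities, not from hard class labels
auc = roc_auc_score(y_true_cls, y_score_cls)
print('ROC-AUC:', auc)

# Regression: hand-made targets and predictions (assumed for illustration)
y_true_reg = [3.0, 5.0, 7.0, 9.0]
y_pred_reg = [2.5, 5.0, 8.0, 9.5]

mse = mean_squared_error(y_true_reg, y_pred_reg)
rmse = mse ** 0.5
mae = mean_absolute_error(y_true_reg, y_pred_reg)
r2 = r2_score(y_true_reg, y_pred_reg)
print('MSE:', mse, 'RMSE:', rmse, 'MAE:', mae, 'R2:', r2)
```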

## 8. Common Algorithms
- Linear Regression, Logistic Regression
- Decision Trees, Random Forest, Gradient Boosting, AdaBoost, XGBoost
- KNN, Naive Bayes
- SVM
- KMeans, PCA (unsupervised)
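
The earlier examples all use supervised models, so here is a minimal unsupervised sketch using KMeans from the list above. The toy points, `n_clusters=2`, and the other parameter values are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

# Unlabeled toy points forming two clearly separate groups (assumed for illustration)
points = np.array([[1.0, 1.0], [1.5, 2.0], [1.0, 1.5],
                   [8.0, 8.0], [8.5, 9.0], [9.0, 8.0]])

# n_clusters=2 is chosen by hand; n_init and random_state make the run repeatable
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(points)

print('Cluster assignments:', cluster_ids)
print('Cluster centers:', kmeans.cluster_centers_)
```

No labels are passed to `fit_predict`. The algorithm groups the points purely by their positions, and the cluster numbers it assigns are arbitrary identifiers rather than class labels.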