
## 🧾 Scikit-learn Cheatsheet

| **Action**                     | **Import Example**                                     | **Python Example**                                                                   | **Explanation**                                                 |
| ------------------------------ | ------------------------------------------------------ | ------------------------------------------------------------------------------------ | --------------------------------------------------------------- |
| Load dataset                   | `from sklearn import datasets`                         | `iris = datasets.load_iris()`                                                        | Built-in toy datasets (e.g., Iris, Digits) for quick testing.   |
| Train-test split               | `from sklearn.model_selection import train_test_split` | `X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)`           | Splits data into training and testing sets.                     |
| Standardize features           | `from sklearn.preprocessing import StandardScaler`     | `scaler = StandardScaler(); X_scaled = scaler.fit_transform(X)`                      | Standardizes each feature to zero mean and unit variance.       |
| One-hot encoding               | `from sklearn.preprocessing import OneHotEncoder`      | `enc = OneHotEncoder(sparse_output=False); X_enc = enc.fit_transform([[1],[2],[3]])` | Binary column per category (`sparse_output`: sklearn ≥ 1.2).    |
| Label encoding                 | `from sklearn.preprocessing import LabelEncoder`       | `le = LabelEncoder(); y_enc = le.fit_transform(["cat","dog","cat"])`                 | Encodes target labels as integers (use on y, not features).     |
| Logistic Regression            | `from sklearn.linear_model import LogisticRegression`  | `clf = LogisticRegression(); clf.fit(X_train, y_train)`                              | Classification algorithm for binary/multiclass problems.        |
| Decision Tree                  | `from sklearn.tree import DecisionTreeClassifier`      | `tree = DecisionTreeClassifier(); tree.fit(X_train, y_train)`                        | Splits data by features to classify outcomes.                   |
| Random Forest                  | `from sklearn.ensemble import RandomForestClassifier`  | `rf = RandomForestClassifier(); rf.fit(X_train, y_train)`                            | Averages many randomized trees to reduce overfitting.           |
| KNN (classification)           | `from sklearn.neighbors import KNeighborsClassifier`   | `knn = KNeighborsClassifier(n_neighbors=3); knn.fit(X_train, y_train)`               | Classifies based on nearest neighbors.                          |
| SVM                            | `from sklearn.svm import SVC`                          | `svm = SVC(kernel='linear'); svm.fit(X_train, y_train)`                              | Finds the maximum-margin hyperplane that separates classes.     |
| Naive Bayes                    | `from sklearn.naive_bayes import GaussianNB`           | `nb = GaussianNB(); nb.fit(X_train, y_train)`                                        | Probabilistic; assumes features are independent given class.    |
| KMeans Clustering              | `from sklearn.cluster import KMeans`                   | `kmeans = KMeans(n_clusters=3); kmeans.fit(X)`                                       | Groups data into K clusters (unsupervised).                     |
| PCA (Dimensionality Reduction) | `from sklearn.decomposition import PCA`                | `pca = PCA(n_components=2); X_pca = pca.fit_transform(X)`                            | Projects data onto the 2 directions of greatest variance.       |
| Pipeline                       | `from sklearn.pipeline import Pipeline`                | `pipe = Pipeline([("scaler", StandardScaler()), ("clf", LogisticRegression())])`     | Chains preprocessing and model steps together.                  |
| Accuracy Score                 | `from sklearn.metrics import accuracy_score`           | `y_pred = clf.predict(X_test); accuracy_score(y_test, y_pred)`                       | Fraction of predictions that match the true labels.             |
| Confusion Matrix               | `from sklearn.metrics import confusion_matrix`         | `confusion_matrix(y_test, y_pred)`                                                   | Rows are true classes; columns are predicted classes.           |
| Classification Report          | `from sklearn.metrics import classification_report`    | `print(classification_report(y_test, y_pred))`                                       | Per-class precision, recall, F1-score, and support.             |
| Cross Validation               | `from sklearn.model_selection import cross_val_score`  | `cross_val_score(clf, X, y, cv=5)`                                                   | Returns one score per fold (here, 5-fold cross-validation).     |
| Grid Search                    | `from sklearn.model_selection import GridSearchCV`     | `grid = GridSearchCV(clf, {"C": [0.1, 1, 10]}, cv=5); grid.fit(X, y)`                | Tries every grid combination with CV; see `grid.best_params_`.  |
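
The rows above are independent snippets, so it may help to see several of them combined. The following is a minimal sketch, not a recommended recipe: it assumes the Iris dataset, and the `random_state=42`, `stratify=y`, and `max_iter=1000` settings are illustrative choices that do not appear in the table. Placing the scaler inside the pipeline means it is refitted on the training portion of each cross-validation fold, which avoids leaking test statistics into preprocessing.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Load features and labels as NumPy arrays.
X, y = load_iris(return_X_y=True)

# Hold out 20% for testing; random_state and stratify are illustrative choices.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# The scaler lives inside the pipeline, so it is fitted on training data only.
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Parameters of a pipeline step are addressed as <step name>__<parameter>.
grid = GridSearchCV(pipe, {"clf__C": [0.1, 1, 10]}, cv=5)
grid.fit(X_train, y_train)

# After fitting, the grid search predicts with the best refitted pipeline.
y_pred = grid.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred)

print("Best parameters:", grid.best_params_)
print("Test accuracy:", test_accuracy)
print(classification_report(y_test, y_pred))
```

Note the `clf__C` key: once the estimator is wrapped in a `Pipeline`, the plain `{"C": [...]}` grid from the table no longer applies, because `GridSearchCV` needs the step name as a prefix.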

