🛠️ Scikit-Learn Essential Cheat Sheet

1. Core scikit-learn API

This is the heart of sklearn. Almost every model follows this pattern.

The Universal Pattern

- model = Model(parameters)
- model.fit(X_train, y_train)
- predictions = model.predict(X_test)

Methods you MUST know 
- fit(): Trains the model on the data.
- predict(): Generates predictions.
- predict_proba(): Returns probability estimates (for classification).
- score(): Returns the estimator's default metric: mean accuracy for classifiers, R² (coefficient of determination) for regressors.
- get_params() / set_params(): View or modify model hyperparameters.
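
A minimal sketch of the pattern and methods above. The iris dataset, LogisticRegression, and the split settings are illustrative choices made here, not something this sheet prescribes.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Small bundled dataset, split into train and test portions.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

model = LogisticRegression(max_iter=1000)    # 1. instantiate with hyperparameters
model.fit(X_train, y_train)                  # 2. train
predictions = model.predict(X_test)          # 3. predict class labels

probabilities = model.predict_proba(X_test)  # one column per class, rows sum to 1
accuracy = model.score(X_test, y_test)       # mean accuracy for a classifier

default_C = model.get_params()["C"]          # read a hyperparameter
model.set_params(C=0.5)                      # change it (refit to take effect)
```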
-----------
2. Dataset Handling (sklearn.datasets)

Used for practice and prototyping without worrying about external CSVs.

Important Functions
- Classification: load_iris(), load_breast_cancer(), load_digits()
- Regression: load_diabetes(), fetch_california_housing()
- Synthetic Data: make_classification(), make_regression(), make_blobs()
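
As a rough illustration of the loaders and generators listed above, the sketch below loads one bundled dataset and generates one synthetic one. The sample counts and feature counts are arbitrary values picked for this example.

```python
from sklearn.datasets import load_iris, make_classification

# Bundled dataset: returns a Bunch with .data, .target, .feature_names, etc.
iris = load_iris()
X_iris, y_iris = iris.data, iris.target

# Synthetic binary classification problem with a fixed seed for reproducibility.
X_syn, y_syn = make_classification(
    n_samples=200,
    n_features=10,
    n_informative=4,
    random_state=42,
)
```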
--------
3. Train-Test Split & Validation (sklearn.model_selection)

Essential for detecting overfitting and estimating how well your model generalizes to new data.
- train_test_split(): Splits data into training and testing sets.
- KFold() / StratifiedKFold(): For robust cross-validation.
- cross_val_score(): Quickly evaluate a model using cross-validation.
- GridSearchCV() / RandomizedSearchCV(): Tools for hyperparameter tuning.
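
One way these tools might fit together, sketched under assumptions: KNeighborsClassifier, the iris data, and the small `n_neighbors` grid are placeholders chosen for this example.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import (
    GridSearchCV,
    StratifiedKFold,
    cross_val_score,
    train_test_split,
)
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Hold out a test set that is never used during tuning.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# 5-fold stratified cross-validation on the training portion only.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(KNeighborsClassifier(), X_train, y_train, cv=cv)

# Exhaustive search over a small, illustrative hyperparameter grid.
search = GridSearchCV(
    KNeighborsClassifier(),
    param_grid={"n_neighbors": [3, 5, 7]},
    cv=cv,
)
search.fit(X_train, y_train)

# The best estimator is refit on all training data; score it on the held-out set.
test_accuracy = search.score(X_test, y_test)
```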
--------------
4. Preprocessing & Feature Engineering (sklearn.preprocessing)

Critical for handling real-world data scales and types.
- Scaling: StandardScaler(), MinMaxScaler(), RobustScaler()
- Encoding: OneHotEncoder() (categorical features), LabelEncoder() (target labels)
- Feature Creation: PolynomialFeatures()
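
A small sketch of one transformer from each group above, using tiny hand-made arrays (the values are made up for illustration).

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder, PolynomialFeatures, StandardScaler

# Scaling: each column ends up with mean 0 and unit variance.
X_num = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
X_scaled = StandardScaler().fit_transform(X_num)

# Encoding: one binary column per category (categories are sorted: green, red).
X_cat = np.array([["red"], ["green"], ["red"]])
encoder = OneHotEncoder(handle_unknown="ignore")
X_onehot = encoder.fit_transform(X_cat).toarray()  # default output is sparse

# Feature creation: adds squares and the pairwise product of the two columns.
poly = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly.fit_transform(X_num)  # columns: x0, x1, x0^2, x0*x1, x1^2
```

In practice the scaler and encoder would be fit on training data only and then applied to test data with transform().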

5. Pipelines (sklearn.pipeline)

Pipelines bundle preprocessing and modeling into a single object. Because every step is fit only on the data passed to fit(), they help prevent data leakage during cross-validation.

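from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("model", LogisticRegression())
])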
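
To show why this matters, here is a hedged sketch that cross-validates the same kind of pipeline. The breast cancer dataset and `max_iter=1000` are assumptions added for the example. Inside each fold the scaler is fit on the training part only, so no statistics from the validation part leak into preprocessing.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("model", LogisticRegression(max_iter=1000)),
])

# The whole pipeline is refit per fold: scaling never sees the validation fold.
scores = cross_val_score(pipeline, X, y, cv=5)

# A pipeline behaves like any other estimator.
pipeline.fit(X, y)
fitted_scaler = pipeline.named_steps["scaler"]  # access a step by its name
```

Step hyperparameters can be addressed as `<step name>__<parameter>`, for example `model__C` in a GridSearchCV grid.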

6. Supervised Learning Models (CORE SET)

|Regression Models|Classification Models|
|---|---|
|LinearRegression()|LogisticRegression()|
|Ridge() / Lasso()|KNeighborsClassifier()|
|RandomForestRegressor()|RandomForestClassifier()|
|SVR() (Support Vector)|SVC()|
|DecisionTreeRegressor()|GaussianNB()|

7. Model Evaluation (sklearn.metrics)

How we measure if the model is actually "good."

Classification Metrics
- accuracy_score(), precision_score(), recall_score(), f1_score()
- confusion_matrix(), classification_report(), roc_auc_score()

Regression Metrics
- mean_squared_error() (MSE)
- mean_absolute_error() (MAE)
- r2_score()
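
A worked sketch of the metrics above on tiny hand-made labels and predictions (all values are invented for illustration). Note that roc_auc_score() takes scores or probabilities rather than hard class predictions.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    mean_absolute_error,
    mean_squared_error,
    precision_score,
    r2_score,
    recall_score,
    roc_auc_score,
)

# Classification: true labels, hard predictions, and predicted scores.
y_true = [0, 1, 1, 0, 1, 0]
y_pred = [0, 1, 0, 0, 1, 1]
y_score = [0.1, 0.9, 0.4, 0.2, 0.8, 0.6]

accuracy = accuracy_score(y_true, y_pred)    # 4 of 6 correct
precision = precision_score(y_true, y_pred)  # TP / (TP + FP) = 2 / 3
recall = recall_score(y_true, y_pred)        # TP / (TP + FN) = 2 / 3
f1 = f1_score(y_true, y_pred)                # harmonic mean of the two
cm = confusion_matrix(y_true, y_pred)        # rows: true class, cols: predicted
auc = roc_auc_score(y_true, y_score)         # 8 of 9 pos/neg pairs ranked correctly

# Regression: true values and predictions.
y_true_reg = [3.0, 5.0, 7.0]
y_pred_reg = [2.5, 5.0, 8.0]

mse = mean_squared_error(y_true_reg, y_pred_reg)   # (0.25 + 0 + 1) / 3
mae = mean_absolute_error(y_true_reg, y_pred_reg)  # (0.5 + 0 + 1) / 3
r2 = r2_score(y_true_reg, y_pred_reg)              # 1 - 1.25 / 8
```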

8. Feature Selection (sklearn.feature_selection)

Used for improving performance and interpretability by removing "noise" features.

- SelectKBest(): Selects features based on statistical tests.
- RFE(): Recursive Feature Elimination.
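
A sketch of both selectors, assuming the breast cancer dataset, the `f_classif` ANOVA test, and a target of 5 features (all chosen here for illustration). For brevity the data is scaled up front; in a real workflow the scaler and selector would sit inside a Pipeline.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE, SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)   # 569 samples, 30 features
X_scaled = StandardScaler().fit_transform(X)

# Univariate selection: keep the 5 features with the highest ANOVA F-score.
selector = SelectKBest(score_func=f_classif, k=5)
X_kbest = selector.fit_transform(X_scaled, y)
kbest_mask = selector.get_support()          # boolean mask over the 30 features

# Recursive elimination: repeatedly fit the model and drop the weakest feature.
rfe = RFE(estimator=LogisticRegression(max_iter=1000), n_features_to_select=5)
X_rfe = rfe.fit_transform(X_scaled, y)
rfe_mask = rfe.support_
```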

9. Dimensionality Reduction (sklearn.decomposition)

- PCA(): Principal Component Analysis; reduces the feature count while retaining as much variance as possible.
- TruncatedSVD(): Useful for sparse data (like text).
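
A short sketch of both reducers. The digits dataset, the component counts, and the random sparse matrix standing in for text features are assumptions made for this example.

```python
from scipy.sparse import random as sparse_random
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA, TruncatedSVD

# PCA on dense data: 64 pixel features reduced to 10 components.
X, _ = load_digits(return_X_y=True)
pca = PCA(n_components=10, random_state=0)
X_reduced = pca.fit_transform(X)
explained = pca.explained_variance_ratio_.sum()  # fraction of variance kept

# TruncatedSVD accepts sparse input directly (PCA would need to center it first).
X_sparse = sparse_random(100, 50, density=0.1, format="csr", random_state=0)
svd = TruncatedSVD(n_components=5, random_state=0)
X_svd = svd.fit_transform(X_sparse)
```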

10. Ensemble Methods (High-Impact)

These models combine multiple learners to create a stronger overall model.
- RandomForestClassifier()
- GradientBoostingClassifier()
- AdaBoostClassifier()
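
Since all three share the same estimator interface, they can be compared in a loop. This is only a sketch with default hyperparameters on the breast cancer dataset (an illustrative choice), not a benchmark.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import (
    AdaBoostClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=0, stratify=y
)

models = {
    "random_forest": RandomForestClassifier(random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
    "adaboost": AdaBoostClassifier(random_state=0),
}

# Fit each ensemble and record its test accuracy.
results = {
    name: model.fit(X_train, y_train).score(X_test, y_test)
    for name, model in models.items()
}
```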

11. Clustering & Unsupervised Learning
- KMeans(): Grouping data based on centroids.
- DBSCAN(): Density-based clustering.
- AgglomerativeClustering(): Hierarchical clustering.
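
A sketch running the three algorithms on synthetic blobs. The blob settings and the DBSCAN `eps` / `min_samples` values are guesses that suit this toy data, and they usually need tuning on real data.

```python
from sklearn.cluster import DBSCAN, AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs

# Three well-separated 2-D blobs; true labels are ignored (unsupervised).
X, _ = make_blobs(n_samples=150, centers=3, cluster_std=0.5, random_state=0)

# KMeans: the number of clusters must be given up front.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
kmeans_labels = kmeans.fit_predict(X)

# DBSCAN: finds the number of clusters itself; label -1 marks noise points.
dbscan_labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)

# Agglomerative: merges points bottom-up until n_clusters remain.
agglo_labels = AgglomerativeClustering(n_clusters=3).fit_predict(X)
```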

12. Model Persistence (Real-World Use)

How to save your model so you don't have to retrain it every time.

import joblib

# Save
joblib.dump(model, "model.pkl")

# Load
loaded_model = joblib.load("model.pkl")
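
A self-contained round trip might look like the sketch below. The temporary directory and the iris / LogisticRegression model are assumptions added so the example runs without leaving files behind.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000).fit(X, y)

# Save to a temporary location, then load it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = Path(tmp_dir) / "model.pkl"
    joblib.dump(model, model_path)
    loaded_model = joblib.load(model_path)

# The restored model should reproduce the original predictions exactly.
same_predictions = np.array_equal(model.predict(X), loaded_model.predict(X))
```

Only load files from sources you trust, since joblib files are pickle-based and can execute code on load. Loading is also safest with the same scikit-learn version that saved the model.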

13. Visualization Helpers

- plot_tree(): Visualize a Decision Tree.
- ConfusionMatrixDisplay(): A quick way to plot confusion matrices using Matplotlib.