<a href="https://colab.research.google.com/github/datagrad/1.ML/blob/main/ML_model_Fitting_Codes_.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

The examples below assume that `X_train` and `y_train` hold the training features and labels, respectively.
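For readers who want to run the snippets without a dataset at hand, the following is a minimal sketch that builds these variables from synthetic data. The dataset sizes, the `random_state` values, and the `X_test` / `y_test` names are assumptions made for illustration; they are not part of the original notebook. The labels are binary, so a continuous target would be needed for the regression examples.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic binary classification data, used only as a stand-in
# for a real dataset (sizes and random_state are arbitrary choices).
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=5, random_state=42
)

# Hold out 20% of the rows for evaluation; stratify keeps the class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

print(X_train.shape, X_test.shape)
```

Ten features are generated so that settings such as `max_features=5` in the later examples remain valid.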

### 1. Linear Regression

```python
from sklearn.linear_model import LinearRegression

# Initialize the model (the `normalize` argument was removed in scikit-learn 1.2)
linear_reg = LinearRegression()
# Fit the model
linear_reg.fit(X_train, y_train)
```

**Parameter Importance:**
- `normalize`: Previously, this parameter normalized the input variables before regression by subtracting the mean and dividing by the l2-norm. It was deprecated in scikit-learn 1.0 and removed in 1.2, so passing it now raises a `TypeError`. Scale the features with `StandardScaler` in a preprocessing step instead.
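The `StandardScaler` recommendation can be sketched as a pipeline, so that scaling and regression are fitted together. This is an illustrative sketch on synthetic data; `X_demo`, `y_demo`, and `scaled_linear_reg` are hypothetical names, and standardization is not numerically identical to what `normalize=True` used to do.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic regression data, a stand-in for a real X_train / y_train.
X_demo, y_demo = make_regression(
    n_samples=200, n_features=5, noise=10.0, random_state=0
)

# The scaler learns means and variances during fit, then the
# standardized features are passed on to the regression.
scaled_linear_reg = make_pipeline(StandardScaler(), LinearRegression())
scaled_linear_reg.fit(X_demo, y_demo)

print("R^2 on the training data:", scaled_linear_reg.score(X_demo, y_demo))
```

Because the scaler lives inside the pipeline, calling `predict` on new data applies the same scaling learned during `fit`.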

### 2. Logistic Regression

```python
from sklearn.linear_model import LogisticRegression

# Initialize the model with regularization
logistic_reg = LogisticRegression(C=1.0, penalty='l2', solver='liblinear')
# Fit the model
logistic_reg.fit(X_train, y_train)
```

**Parameter Importance:**
- `C`: Inverse of regularization strength; smaller values specify stronger regularization. Helps prevent overfitting.
- `penalty`: Specifies the norm used in the penalization (regularization).
- `solver`: Algorithm to use in the optimization problem. For small datasets, `'liblinear'` is a good choice.
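To see how `C` affects the model, one option is to compare a few values with cross-validation. The sketch below uses synthetic data, and the candidate values are arbitrary; `X_demo`, `y_demo`, and `scores_by_C` are hypothetical names introduced for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic data, a stand-in for a real X_train / y_train.
X_demo, y_demo = make_classification(n_samples=300, n_features=10, random_state=0)

# Smaller C means stronger regularization; the grid below is arbitrary.
scores_by_C = {}
for C in [0.01, 0.1, 1.0, 10.0]:
    model = LogisticRegression(C=C, solver="liblinear")
    # Mean accuracy over 5 cross-validation folds.
    scores_by_C[C] = cross_val_score(model, X_demo, y_demo, cv=5).mean()
    print(f"C={C}: mean CV accuracy = {scores_by_C[C]:.3f}")
```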

### 3. Decision Tree

```python
from sklearn.tree import DecisionTreeClassifier

# Initialize the model with depth and feature constraints
decision_tree = DecisionTreeClassifier(max_depth=5, max_features=5)
# Fit the model
decision_tree.fit(X_train, y_train)
```

**Parameter Importance:**
- `max_depth`: The maximum depth of the tree. Limits the number of nodes in the tree to prevent overfitting.
- `max_features`: The number of features to consider when looking for the best split. Helps in reducing variance and making the model more robust.

### 4. Random Forest

```python
from sklearn.ensemble import RandomForestClassifier

# Initialize the model with more options
random_forest = RandomForestClassifier(n_estimators=100, max_features=5, max_depth=5, min_samples_split=4)
# Fit the model
random_forest.fit(X_train, y_train)
```

**Parameter Importance:**
- `n_estimators`: The number of trees in the forest. More trees generally make predictions more stable, with diminishing returns, at the cost of more computation.
- `max_features`: The number of features to consider when looking for the best split.
- `max_depth`: The maximum depth of the trees.
- `min_samples_split`: The minimum number of samples required to split an internal node. Higher values prevent creating nodes that represent too few samples, thus avoiding overfitting.

### 5. XGBoost

```python
from xgboost import XGBClassifier

# Initialize the model with more detailed parameters
# (`use_label_encoder` is deprecated in recent XGBoost releases, so it is omitted)
xgboost_model = XGBClassifier(
    eval_metric='logloss',
    n_estimators=100,
    max_depth=5,
    learning_rate=0.1,
    min_child_weight=1,
)
# Fit the model
xgboost_model.fit(X_train, y_train)
```

**Parameter Importance:**
- `use_label_encoder`: Older tutorials pass `use_label_encoder=False` to silence a warning. The parameter is deprecated in recent XGBoost releases, so the example above omits it. Class labels are expected to be integers starting at 0.
- `eval_metric`: The metric used to evaluate validation data. When it is not set, the default depends on the objective.
- `n_estimators`: Number of gradient boosted trees. Equivalent to the number of boosting rounds.
- `max_depth`: Maximum depth of a tree. Increasing this value makes the model more complex and more likely to overfit.
- `learning_rate`: Step-size shrinkage applied to each update to help prevent overfitting. It makes the model more robust by shrinking the weights of the new trees at each boosting step.
- `min_child_weight`: Minimum sum of instance weights (Hessian) needed in a child. If a split would produce a leaf whose sum of instance weights is less than `min_child_weight`, the tree stops partitioning at that node.




### Support Vector Machines (SVM) for Classification

```python
from sklearn.svm import SVC

# Initialize the model with kernel choice
svm_model = SVC(kernel='linear', C=1.0)
# Fit the model
svm_model.fit(X_train, y_train)

# kernel: Specifies the kernel type to be used in the algorithm.
# C: Regularization parameter. The strength of the regularization is inversely proportional to C.
```

### K-Nearest Neighbors (KNN)

```python
from sklearn.neighbors import KNeighborsClassifier

# Initialize the model with number of neighbors
knn_model = KNeighborsClassifier(n_neighbors=5)
# Fit the model
knn_model.fit(X_train, y_train)

# n_neighbors: Number of neighbors to use for kneighbors queries.
```

### Naive Bayes for Classification

```python
from sklearn.naive_bayes import GaussianNB

# Initialize the model
naive_bayes_model = GaussianNB()
# Fit the model
naive_bayes_model.fit(X_train, y_train)

# No specific parameters needed for the basic model. It assumes that the features follow a normal distribution.
```

### Gradient Boosting for Classification

```python
from sklearn.ensemble import GradientBoostingClassifier

# Initialize the model with learning rate and number of estimators
gb_model = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1)
# Fit the model
gb_model.fit(X_train, y_train)

# n_estimators: The number of boosting stages to be run.
# learning_rate: Learning rate shrinks the contribution of each tree by learning_rate.
```

### Ridge Regression for Regression Tasks

```python
from sklearn.linear_model import Ridge

# Initialize the model with regularization strength
ridge_model = Ridge(alpha=1.0)
# Fit the model
ridge_model.fit(X_train, y_train)

# alpha: Regularization strength; must be a non-negative float. Regularization improves the conditioning of the problem.
```

### Lasso Regression for Regression Tasks

```python
from sklearn.linear_model import Lasso

# Initialize the model with regularization strength
lasso_model = Lasso(alpha=1.0)
# Fit the model
lasso_model.fit(X_train, y_train)

# alpha: Regularization strength; similarly to Ridge, it controls the amount of shrinkage.
```


Here are additional examples across various machine learning tasks, including ensemble methods, clustering, and dimensionality reduction techniques. Each example includes the model initialization, fitting, and a brief commentary on the parameters.

### Ensemble Methods

#### AdaBoost for Classification

```python
from sklearn.ensemble import AdaBoostClassifier

# Initialize the model with the number of estimators
ada_boost_model = AdaBoostClassifier(n_estimators=100)
# Fit the model
ada_boost_model.fit(X_train, y_train)

# n_estimators: The maximum number of estimators at which boosting is terminated.
```

#### Bagging for Classification

```python
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# Initialize the model with base estimator and number of base estimators
# (the argument is named `estimator` since scikit-learn 1.2; it was `base_estimator` before)
bagging_model = BaggingClassifier(estimator=DecisionTreeClassifier(), n_estimators=10)
# Fit the model
bagging_model.fit(X_train, y_train)

# estimator: The base model to ensemble. Default is Decision Tree.
# n_estimators: The number of base estimators in the ensemble.
```

### Clustering

#### K-Means Clustering

```python
from sklearn.cluster import KMeans

# Initialize the model with number of clusters
kmeans = KMeans(n_clusters=3)
# Fit the model
kmeans.fit(X_train)

# n_clusters: The number of clusters to form as well as the number of centroids to generate.
```

#### DBSCAN for Clustering

```python
from sklearn.cluster import DBSCAN

# Initialize the model with eps and min_samples
dbscan = DBSCAN(eps=0.5, min_samples=5)
# Fit the model
dbscan.fit(X_train)

# eps: The maximum distance between two samples for one to be considered as in the neighborhood of the other.
# min_samples: The number of samples in a neighborhood for a point to be considered as a core point.
```
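DBSCAN does not take a number of clusters; the result is read from the fitted model's `labels_` attribute, where `-1` marks noise. The sketch below counts clusters and noise points on synthetic blobs. `X_demo`, `dbscan_demo`, and the blob settings are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs

# Synthetic blobs, a stand-in for X_train (settings are arbitrary).
X_demo, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.4, random_state=0)

dbscan_demo = DBSCAN(eps=0.5, min_samples=5).fit(X_demo)
labels = dbscan_demo.labels_

# Label -1 marks noise points; every other label is a cluster id.
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
n_noise = int(np.sum(labels == -1))
print("clusters found:", n_clusters, "| noise points:", n_noise)
```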

### Dimensionality Reduction

#### Principal Component Analysis (PCA)

```python
from sklearn.decomposition import PCA

# Initialize PCA with number of components
pca = PCA(n_components=2)
# Fit and transform the data
X_train_pca = pca.fit_transform(X_train)

# n_components: The number of components to keep.
```
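When choosing `n_components`, it can help to check how much variance the retained components explain. The following sketch reads `explained_variance_ratio_` from a fitted PCA on synthetic data; `X_demo` and `pca_demo` are hypothetical names.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA

# Synthetic features, a stand-in for X_train.
X_demo, _ = make_classification(n_samples=200, n_features=10, random_state=0)

pca_demo = PCA(n_components=2)
X_demo_pca = pca_demo.fit_transform(X_demo)

# Fraction of the total variance captured by each retained component.
print("shape after PCA:", X_demo_pca.shape)
print("explained variance ratio:", pca_demo.explained_variance_ratio_)
print("total retained:", pca_demo.explained_variance_ratio_.sum())
```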

#### t-Distributed Stochastic Neighbor Embedding (t-SNE)

```python
from sklearn.manifold import TSNE

# Initialize t-SNE with number of components
tsne = TSNE(n_components=2, perplexity=30.0)
# Fit and transform the data
X_train_tsne = tsne.fit_transform(X_train)

# n_components: The dimension of the embedded space.
# perplexity: The perplexity is related to the number of nearest neighbors that is used in other manifold learning algorithms.
```

These models cover a broad spectrum of machine learning applications: supervised learning (both classification and regression), ensemble methods that improve prediction robustness, and unsupervised techniques such as clustering and dimensionality reduction, which are useful for data exploration. As with any model, the choice and tuning of parameters are critical to performance, so experimentation and validation are key steps in the modeling process.
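One common way to carry out that experimentation is a cross-validated grid search. The sketch below tunes two random forest parameters on synthetic data. The grid values, `cv=3`, and the names `X_demo`, `y_demo`, `param_grid`, and `search` are illustrative assumptions rather than recommended settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic data, a stand-in for a real X_train / y_train.
X_demo, y_demo = make_classification(n_samples=300, n_features=10, random_state=0)

# Small, arbitrary grid: every combination is evaluated with 3-fold CV.
param_grid = {"max_depth": [3, 5], "min_samples_split": [2, 4]}

search = GridSearchCV(
    RandomForestClassifier(n_estimators=50, random_state=0),
    param_grid,
    cv=3,
    scoring="accuracy",
)
search.fit(X_demo, y_demo)

print("best parameters:", search.best_params_)
print("best mean CV accuracy:", search.best_score_)
```

After fitting, `search.best_estimator_` holds a model refitted on all of the supplied data with the best parameter combination.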

# Performance Metrics Code for Classification Models

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix

# X_test and y_test are assumed to be the held-out features and labels.
# Keep the prediction line for the model being evaluated and comment out the others.
# y_pred_1 = logistic_reg.predict(X_test)
y_pred_1 = decision_tree.predict(X_test)
# y_pred_1 = random_forest.predict(X_test)
# y_pred_1 = xgboost_model.predict(X_test)

# Metrics (average='binary' assumes a two-class problem)
print("Decision Tree Performance:")
print("Accuracy:", accuracy_score(y_test, y_pred_1))
print("Precision:", precision_score(y_test, y_pred_1, average='binary'))
print("Recall:", recall_score(y_test, y_pred_1, average='binary'))
print("F1 Score:", f1_score(y_test, y_pred_1, average='binary'))
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred_1))
```
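The metric calls above use `average='binary'`, which applies only to two-class targets. For more than two classes, an averaging mode such as `'macro'` or `'weighted'` is needed, and `classification_report` prints per-class values in one call. The following is a sketch on a synthetic three-class problem; all names in it (`X_demo`, `clf`, `y_pred_demo`, and so on) are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import classification_report, f1_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic three-class data, a stand-in for a real multi-class dataset.
X_demo, y_demo = make_classification(
    n_samples=300, n_features=10, n_informative=5, n_classes=3, random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=0
)

clf = DecisionTreeClassifier(max_depth=5, random_state=0).fit(X_tr, y_tr)
y_pred_demo = clf.predict(X_te)

# 'macro' averages the per-class F1 scores with equal weight per class.
macro_f1 = f1_score(y_te, y_pred_demo, average="macro")
print("Macro F1:", macro_f1)

# Per-class precision, recall, F1, and support in one table.
print(classification_report(y_te, y_pred_demo))
```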



```python
from sklearn.linear_model import LogisticRegression

# Initialize the model
logistic_reg = LogisticRegression()

# Fit the model
logistic_reg.fit(X_train, y_train)
```


```python
from sklearn.tree import DecisionTreeClassifier

# Initialize the model
decision_tree = DecisionTreeClassifier()

# Fit the model
decision_tree.fit(X_train, y_train)
```


```python
from sklearn.ensemble import RandomForestClassifier

# Initialize the model
random_forest = RandomForestClassifier(n_estimators=100, max_features=5, max_depth=5)  # You can adjust n_estimators

# Fit the model
random_forest.fit(X_train, y_train)
```


```python
from xgboost import XGBClassifier

# Initialize the model
# (`use_label_encoder` is deprecated in recent XGBoost releases, so it is omitted)
xgboost_model = XGBClassifier(eval_metric='logloss')

# Fit the model
xgboost_model.fit(X_train, y_train)
```
