# Applying Probability Distributions in Machine Learning
In this exercise, you will apply common probability distributions to a machine learning problem. Follow the steps below to complete the exercise.

## Step 1: Import Necessary Libraries
Import the necessary libraries to load, preprocess, and analyze the data. You'll need libraries like NumPy, Pandas, Matplotlib, and Scikit-learn.

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

## Step 2: Load and Preprocess the Data
Load a dataset of your choice, preprocess it as needed, and split it into training and testing sets. You can use a dataset from the Scikit-learn library, or you can load your own dataset.

In [None]:
iris = load_iris()
X = iris.data
y = iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
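
Before modeling, it can help to look at the empirical class distribution of each split, since a skewed split changes what the accuracy in later steps means. The sketch below is illustrative only: the `stratify=y` argument and the `class_proportions` helper are assumptions added here, not part of the exercise's own cell.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
X, y = iris.data, iris.target

# Assumption: stratify=y is added here so both splits keep similar class proportions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

def class_proportions(labels, n_classes):
    """Hypothetical helper: empirical (categorical) distribution of integer labels."""
    counts = np.bincount(labels, minlength=n_classes)
    return counts / counts.sum()

n_classes = len(iris.target_names)
train_props = class_proportions(y_train, n_classes)
test_props = class_proportions(y_test, n_classes)

print("Train class proportions:", train_props)
print("Test class proportions: ", test_props)
```

If the two sets of proportions differ noticeably without stratification, that is a hint that the test score may not reflect the training distribution.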

## Step 3: Select a Machine Learning Algorithm
Choose a machine learning algorithm that is suitable for your dataset. You can use a classification algorithm like logistic regression, or a clustering algorithm like k-means.

In [None]:
# A higher max_iter than the default helps the solver converge on unscaled features
clf = LogisticRegression(random_state=0, max_iter=1000)

## Step 4: Train the Model
Train your model on the training dataset. You can use the fit() method of the algorithm you selected in Step 3.

In [None]:
clf.fit(X_train, y_train)
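
A fitted logistic regression model is itself a source of probability distributions: for each sample it outputs a categorical distribution over the classes. The following is a minimal sketch of inspecting that output with `predict_proba`; the `max_iter=1000` setting is an assumption added to avoid convergence warnings.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=0
)

# Assumption: max_iter is raised from the default so the solver converges.
clf = LogisticRegression(random_state=0, max_iter=1000)
clf.fit(X_train, y_train)

# One row per sample, one column per class; each row is a probability distribution.
proba = clf.predict_proba(X_test[:5])
print(np.round(proba, 3))
print("Row sums:", proba.sum(axis=1))

# The predicted label is the class with the highest probability.
pred_from_proba = clf.classes_[np.argmax(proba, axis=1)]
print("Labels from probabilities:", pred_from_proba)
print("Labels from predict():    ", clf.predict(X_test[:5]))
```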


## Step 5: Apply Probability Distributions
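Calculate the probability density function (PDF) or cumulative distribution function (CDF) for the distributions of interest (e.g., Gaussian), or the probability mass function (PMF) for discrete distributions such as Poisson, and apply them to the training and testing datasets. You can use NumPy for these calculations. The cell below estimates the mean and standard deviation from the training set, evaluates the Gaussian PDF at the training values, and standardizes both sets using those training statistics.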

In [None]:
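# Estimate the mean and standard deviation of each feature from the training set only
mu, std = X_train.mean(axis=0), X_train.std(axis=0)
# Gaussian PDF of each training value under its feature's fitted distribution
pdf = 1 / (std * np.sqrt(2 * np.pi)) * np.exp(-(X_train - mu) ** 2 / (2 * std ** 2))
# Standardize (z-score) both sets with the training statistics to avoid test-set leakage
X_train_gauss = (X_train - mu) / std
X_test_gauss = (X_test - mu) / std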
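
The step above mentions the CDF and the Poisson distribution, which the cell does not compute. As a rough sketch using only NumPy and the standard library, the functions below evaluate the Gaussian PDF and CDF and the Poisson PMF. The function names and the sample values are hypothetical and chosen only for illustration.

```python
import math
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    x = np.asarray(x, dtype=float)
    return np.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

def gaussian_cdf(x, mu, sigma):
    """P(X <= x) for X ~ N(mu, sigma^2), via the error function."""
    z = (np.asarray(x, dtype=float) - mu) / (sigma * np.sqrt(2))
    erf = np.vectorize(math.erf)  # math.erf is scalar-only, so vectorize it
    return 0.5 * (1 + erf(z))

def poisson_pmf(k, lam):
    """P(K = k) for K ~ Poisson(lam), computed in log space for stability."""
    k = np.asarray(k)
    log_factorial = np.vectorize(math.lgamma)(k + 1)  # log(k!)
    return np.exp(k * np.log(lam) - lam - log_factorial)

# Hypothetical values, roughly in the range of a sepal-length measurement (cm)
x = np.array([4.9, 5.8, 6.7])
mu, sigma = 5.8, 0.8

print("Gaussian PDF:", gaussian_pdf(x, mu, sigma))
print("Gaussian CDF:", gaussian_cdf(x, mu, sigma))
print("Poisson PMF for k = 0..4, lam = 2:", poisson_pmf(np.arange(5), lam=2.0))
```

If SciPy is available, `scipy.stats.norm` and `scipy.stats.poisson` provide the same quantities and are usually the more convenient choice.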


## Step 6: Evaluate Performance
Use evaluation metrics such as accuracy, precision, recall, and F1 score to evaluate the performance of the model. You can use the predict() method of the algorithm you selected in Step 3 to make predictions on the testing dataset, and then use Scikit-learn to calculate the evaluation metrics.

In [None]:
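# Step 4 fit the model on raw features; refit on the standardized features
# so that the training inputs match the standardized test inputs
clf.fit(X_train_gauss, y_train)
y_pred = clf.predict(X_test_gauss)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)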
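
The text above also names precision, recall, and F1 score. One possible way to compute them is sketched below, using `precision_recall_fscore_support` with macro averaging. The use of `StandardScaler` for the standardization step and the choice of `average="macro"` are assumptions made for this example.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=0
)

# Assumption: StandardScaler stands in for the manual z-scoring; it is fit on
# the training set only and then applied to both sets.
scaler = StandardScaler().fit(X_train)
X_train_std = scaler.transform(X_train)
X_test_std = scaler.transform(X_test)

clf = LogisticRegression(random_state=0, max_iter=1000)
clf.fit(X_train_std, y_train)
y_pred = clf.predict(X_test_std)

accuracy = accuracy_score(y_test, y_pred)
# Macro averaging: compute each metric per class, then take the unweighted mean.
precision, recall, f1, _ = precision_recall_fscore_support(
    y_test, y_pred, average="macro"
)

print(f"Accuracy:  {accuracy:.3f}")
print(f"Precision: {precision:.3f}")
print(f"Recall:    {recall:.3f}")
print(f"F1 score:  {f1:.3f}")
```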

## Step 7: Visualize the Results
Visualize the results of your analysis using graphs, charts, or other visualizations. You can use Matplotlib to create visualizations of your results.

In [None]:
plt.scatter(X_train_gauss[:, 0], X_train_gauss[:, 1], c=y_train)
plt.xlabel("Sepal length (standardized)")
plt.ylabel("Sepal width (standardized)")
plt.title("Iris Training Data: Standardized Features")
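
To see how well a Gaussian describes a feature, one option is to overlay the fitted PDF on a normalized histogram. The sketch below does this for the first Iris feature (sepal length). The bin count and the grid range are arbitrary choices made for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

iris = load_iris()
feature = iris.data[:, 0]  # first feature: sepal length in cm

# Fit a Gaussian by estimating its mean and standard deviation from the data
mu, sigma = feature.mean(), feature.std()

# Evaluate the fitted PDF on a grid that extends slightly past the data range
grid = np.linspace(feature.min() - 1, feature.max() + 1, 200)
pdf = np.exp(-((grid - mu) ** 2) / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

fig, ax = plt.subplots()
# density=True rescales the histogram so its total area is 1, like a PDF
ax.hist(feature, bins=15, density=True, alpha=0.5, label="Observed (normalized histogram)")
ax.plot(grid, pdf, label="Fitted Gaussian PDF")
ax.set_xlabel(iris.feature_names[0])
ax.set_ylabel("Density")
ax.set_title("Sepal Length with Fitted Gaussian")
ax.legend()
plt.show()
```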