# SciPy and scikit-learn

SciPy is an open-source Python library for scientific and technical computing. It provides functions and tools for tasks such as optimization, integration, interpolation, eigenvalue problems, algebraic equations, differential equations, and statistics.

SciPy is built on top of NumPy and extends NumPy's array functionality with additional mathematical algorithms and functions.

### SciPy Usage and Subpackages

- SciPy is built on top of NumPy, so it is efficient for numerical computations.
- Its subpackages group functions by scientific domain (for example, `scipy.interpolate`, `scipy.optimize`, `scipy.integrate`).
- You typically import only the specific subpackage you want to use.
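
As a rough illustration of the points above, the sketch below imports two subpackages, `scipy.integrate` and `scipy.optimize`, and calls one function from each. The integrand and the quadratic are arbitrary examples chosen for this sketch; they are not part of the original notebook.

```python
import numpy as np
from scipy import integrate, optimize

# Integrate sin(x) over [0, pi]; the exact value is 2
area, abs_err = integrate.quad(np.sin, 0, np.pi)

# Minimize the (arbitrarily chosen) quadratic (x - 3)^2; its minimum is at x = 3
result = optimize.minimize_scalar(lambda x: (x - 3) ** 2)

print(area, result.x)
```

Both functions accept plain Python callables and NumPy-compatible values, which is what "built on top of NumPy" means in practice here.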

### Interpolation

#### Importing the Necessary Libraries

In [None]:
from scipy.interpolate import CubicSpline
import matplotlib.pyplot as plt
import numpy as np

#### Generating Sample Data

Generate 11 evenly spaced points on the interval [0, 10] (endpoints included) and evaluate a test function at each of them.

In [None]:
x = np.linspace(0, 10, num=11)
y = np.cos(-x**2 / 9.)

#### Cubic Spline Interpolation

Create a cubic spline interpolation object from the x and y data points.

In [None]:
spl = CubicSpline(x, y)
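
The object returned by `CubicSpline` is callable. A minimal sketch of what that means is shown below: the spline reproduces the data at the sample points and can be evaluated anywhere in between. The point `x_mid = 2.5` is an arbitrary choice made for this illustration.

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Same sample data as in the cells above
x = np.linspace(0, 10, num=11)
y = np.cos(-x**2 / 9.)
spl = CubicSpline(x, y)

# The spline passes through the original data points
print(np.allclose(spl(x), y))

# It can also be evaluated between sample points (x_mid is an arbitrary choice)
x_mid = 2.5
print(spl(x_mid), np.cos(-x_mid**2 / 9.))
```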

#### Generating New Data Points

Generate a new, denser set of evenly spaced points on the interval [0, 10] (1001 points, endpoints included) at which to evaluate the spline.

In [None]:
xnew = np.linspace(0, 10, num=1001)

#### Creating Subplots

In [None]:
fig, (ax1, ax2) = plt.subplots(2, 1)

#### Plotting Cubic Spline Interpolation

In [None]:
ax1.plot(xnew, spl(xnew), label='Cubic Spline Interpolation')
ax1.plot(x, y, 'o', label='Data', markersize=5)
ax1.legend(loc='best')

#### Plotting First Derivative of the Spline

In [None]:
ax2.plot(xnew, spl(xnew, nu=1), '--', label='1st Derivative')
ax2.legend(loc='best')
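
The `nu=1` argument used above asks the spline for its first derivative. As a small sketch, assuming the same data as before, the derivative can also be obtained by building a separate derivative spline with `spl.derivative()`; the two approaches should agree to floating-point precision.

```python
import numpy as np
from scipy.interpolate import CubicSpline

x = np.linspace(0, 10, num=11)
y = np.cos(-x**2 / 9.)
spl = CubicSpline(x, y)
xnew = np.linspace(0, 10, num=1001)

# Two equivalent ways to evaluate the first derivative of the spline
d1_call = spl(xnew, nu=1)        # pass the derivative order when calling the spline
d1_obj = spl.derivative()(xnew)  # build a derivative spline, then evaluate it

print(np.allclose(d1_call, d1_obj))
```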

#### Adjusting Layout and Displaying Plot

In [None]:
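# Adjust the layout of the subplots to prevent overlap
fig.tight_layout()
# Display the plot. With Jupyter's inline backend a figure is rendered in the cell
# that creates it, so run the figure-creation and plotting code in a single cell.
plt.show()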

## scikit-learn

- Machine learning package for Python
- Built on NumPy, SciPy, and matplotlib
- Useful for classification, regression, clustering, dimensionality reduction, and preprocessing
- Subpackages are imported in a similar way to SciPy's
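
As a minimal sketch of the import pattern described above, the example below pulls `StandardScaler` from the `sklearn.preprocessing` subpackage and applies it to a small made-up array. The data values are invented for this illustration only.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# A small made-up feature matrix: 4 samples, 2 features on very different scales
data = np.array([[1.0, 100.0],
                 [2.0, 200.0],
                 [3.0, 300.0],
                 [4.0, 400.0]])

# Fit the scaler to the data and transform it in one step
scaler = StandardScaler()
scaled = scaler.fit_transform(data)

# After scaling, each column has mean 0 and standard deviation 1
print(scaled.mean(axis=0), scaled.std(axis=0))
```

Most scikit-learn objects follow this same pattern: construct the object, then call `fit` (and `transform` or `predict`) on the data.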

### Clustering

In [None]:
# Generating Synthetic Data for Clustering

# Importing Necessary Libraries
from sklearn.datasets import make_blobs
from pandas import DataFrame

# Generating 2D Dataset
# Create a synthetic dataset with 100 samples, 3 centers, and 2 features (dimensions)
X, y = make_blobs(n_samples=100, centers=3, n_features=2, random_state=42)
# Create a DataFrame to organize the generated data:
# the 'x' and 'y' columns hold the feature values and the 'label' column holds the cluster labels
test_data = DataFrame(dict(x=X[:,0], y=X[:,1], label=y))
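
Before plotting, it can help to check what `make_blobs` actually returned. The sketch below regenerates the same dataset and inspects its shape, its first rows, and the number of samples per label.

```python
from sklearn.datasets import make_blobs
from pandas import DataFrame

# Same synthetic dataset as in the cell above
X, y = make_blobs(n_samples=100, centers=3, n_features=2, random_state=42)
test_data = DataFrame(dict(x=X[:, 0], y=X[:, 1], label=y))

# X has one row per sample and one column per feature
print(X.shape)

# First few rows of the organized data
print(test_data.head())

# Number of samples generated for each of the three centers
print(test_data['label'].value_counts())
```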


#### Installing Seaborn and Visualizing the Data

In [None]:
# Install Seaborn into the environment used by the current kernel
%pip install seaborn
import seaborn as sns

# Create a scatter plot using Seaborn to visualize the synthetic data
# Scatter plot represents the 'x' and 'y' features from the test_data DataFrame, with different colors indicating different cluster labels
sns.scatterplot(data=test_data, x='x', y='y', hue='label')

#### Splitting Data into Training and Testing Sets

In [None]:
from sklearn.model_selection import train_test_split

# Splitting Data
# Split the synthetic data into training and testing sets
# X_train and X_test contain the feature values (in this case, 'x' and 'y') for training and testing sets respectively
# y_train and y_test contain the corresponding labels for the training and testing sets respectively
# The data is split into training and testing sets with a ratio of 67% training and 33% testing, specified by the test_size parameter
# random_state ensures reproducibility of results by fixing the random seed for data splitting

X_train, X_test, y_train, y_test = train_test_split(test_data[['x', 'y']], test_data['label'], test_size=0.33, random_state=0)
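
A quick way to confirm the split described in the comments above is to check the sizes of the resulting sets. The sketch below repeats the data generation so that it runs on its own.

```python
from pandas import DataFrame
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split

# Recreate the synthetic data used above
X, y = make_blobs(n_samples=100, centers=3, n_features=2, random_state=42)
test_data = DataFrame(dict(x=X[:, 0], y=X[:, 1], label=y))

X_train, X_test, y_train, y_test = train_test_split(
    test_data[['x', 'y']], test_data['label'], test_size=0.33, random_state=0
)

# With 100 samples and test_size=0.33, the split is 67 training and 33 testing rows
print(len(X_train), len(X_test))

# No row appears in both sets
print(X_train.index.intersection(X_test.index).empty)
```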

#### Clustering with KMeans

In [None]:
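from sklearn.cluster import KMeans

# Initialize a KMeans clustering object with the desired number of clusters
# (2 here, even though the data was generated with 3 centers; set n_clusters=3 to match the blobs)
# n_init and random_state are set explicitly so that the result is reproducible
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
# Fit the KMeans model to the training data to cluster the data points into two clusters based on their features
kmeans.fit(X_train)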

# Visualizing Clustered Data
# Create a scatter plot using Seaborn to visualize the clustered data points
# Each data point is represented by its 'x' and 'y' features, with different colors indicating the cluster to which it belongs
sns.scatterplot(data=X_train, x='x', y='y', hue=kmeans.labels_, palette="deep")
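
The held-out test set is not used in the cells above. As one possible follow-up, the sketch below assigns the test points to the fitted clusters with `predict` and compares the within-cluster sum of squares (`inertia_`) for 2 and 3 clusters. The comparison of `k` values, and the explicit `n_init` and `random_state` settings, are additions made for this illustration.

```python
from pandas import DataFrame
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split

# Recreate the data and the split used above
X, y = make_blobs(n_samples=100, centers=3, n_features=2, random_state=42)
test_data = DataFrame(dict(x=X[:, 0], y=X[:, 1], label=y))
X_train, X_test, y_train, y_test = train_test_split(
    test_data[['x', 'y']], test_data['label'], test_size=0.33, random_state=0
)

# Fit on the training data, then assign each test point to its nearest cluster center
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_train)
test_clusters = kmeans.predict(X_test)
print(test_clusters[:10])

# Compare inertia (within-cluster sum of squared distances) for 2 and 3 clusters;
# a large drop suggests the extra cluster captures real structure in the data
inertias = {}
for k in (2, 3):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_train)
    inertias[k] = model.inertia_
print(inertias)
```

Because the data was generated with three centers, the inertia for `k = 3` is expected to be noticeably lower than for `k = 2`.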