# Lab Exam - Set 2

This notebook contains implementations for all questions in Set 2.

## Question 5: Pandas DataFrame - Ranking, Sorting, and Aggregation

**Concepts:**
- **DataFrame from dictionary**: Creating structured data from key-value pairs
- **Ranking**: Assigning rank positions based on values
- **Sorting**: Ordering data by specific columns
- **Aggregation**: Computing summary statistics (sum, mean, max, etc.)
- **GroupBy**: Grouping data by categories for analysis

In [None]:
import pandas as pd

data = {"A":[10,20,30],"B":[5,15,25]}
df = pd.DataFrame(data)

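df["Rank_A"] = df["A"].rank()            # rank 1 = smallest A; ties share the average rank
sorted_df = df.sort_values("B")          # ascending order by column B
agg = df.agg({"A":"sum","B":"mean"})     # one summary value per column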

print(df)
print(sorted_df)
print(agg)
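
The concept list mentions GroupBy, which the cell above does not exercise. A minimal sketch, assuming a hypothetical `Group` column (not part of the original data) added to the same `A`/`B` values, might look like this:

```python
import pandas as pd

# "Group" is a hypothetical category column added for illustration only
df = pd.DataFrame({
    "Group": ["x", "y", "x"],
    "A": [10, 20, 30],
    "B": [5, 15, 25],
})

# Per-group aggregation: sum of A and mean of B within each group
grouped = df.groupby("Group").agg({"A": "sum", "B": "mean"})
print(grouped)

# Rank in descending order so the largest A gets rank 1
df["Rank_A_desc"] = df["A"].rank(ascending=False)
print(df)
```

`rank()` returns floats because, by default, tied values share the average of their positions.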


## Question 6: Skewness Detection and Min-Max Normalization

**Concepts:**
- **Skewness**: Measure of asymmetry in data distribution (positive/negative/zero)
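- **Skew values**: As a rule of thumb, -0.5 to 0.5 = roughly symmetric; below -0.5 = left-skewed, above 0.5 = right-skewed
- **Min-Max Scaling**: Rescales each feature to the range [0, 1] using the formula: (x - min) / (max - min)
- **Why normalize**: Puts features on a common scale, which helps scale-sensitive models such as k-NN, SVMs, and models trained with gradient descent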

In [None]:
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

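df = pd.read_csv("data.csv")
num = df.select_dtypes(include="number")   # skewness and scaling only apply to numeric columns

print(num.skew())

scaler = MinMaxScaler()
scaled = pd.DataFrame(scaler.fit_transform(num), columns=num.columns)   # keep column names

print(scaled)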
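
To make the skewness rule of thumb and the min-max formula concrete, here is a small sketch on a made-up sample (the values and the `label` variable are illustrative assumptions, not taken from `data.csv`):

```python
import pandas as pd

# Small sample with one large value, made up for illustration
s = pd.Series([1, 2, 2, 3, 10])

# Classify the skewness with the -0.5 / 0.5 rule of thumb
skew = s.skew()
if skew > 0.5:
    label = "right-skewed"
elif skew < -0.5:
    label = "left-skewed"
else:
    label = "roughly symmetric"
print(label)

# Min-max scaling by hand: (x - min) / (max - min)
scaled = (s - s.min()) / (s.max() - s.min())
print(scaled.tolist())
```

The smallest value maps to 0 and the largest to 1. A single outlier such as 10 compresses the remaining values toward 0, which is one reason to check skewness before scaling.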

## Question 7: Scatter Plots and Correlation Heatmap

**Concepts:**
- **Scatter plot**: Shows relationship between two continuous variables
- **Correlation**: Measure of linear relationship (-1 to +1)
- **Positive correlation**: Both variables increase together
- **Negative correlation**: One increases, other decreases
- **Heatmap**: Color-coded matrix showing correlation strengths

In [None]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

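df = pd.read_csv("data.csv")

# scatter plot of the first two columns (assumed to be numeric)
sns.scatterplot(x=df.columns[0], y=df.columns[1], data=df)

# draw the heatmap on a new figure so it does not overwrite the scatter plot
plt.figure()
sns.heatmap(df.corr(numeric_only=True), annot=True)

plt.show()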
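
The correlation values shown in the heatmap are Pearson coefficients. As a rough sketch of what that number means, the helper `pearson` below (a hypothetical name, written for illustration) computes it by hand on made-up data and compares it with `np.corrcoef`:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_pos = 2 * x + 1      # rises as x rises
y_neg = -3 * x + 20    # falls as x rises

def pearson(a, b):
    """Pearson correlation: covariance divided by the product of the spreads."""
    a_c = a - a.mean()
    b_c = b - b.mean()
    return (a_c * b_c).sum() / np.sqrt((a_c ** 2).sum() * (b_c ** 2).sum())

r_pos = pearson(x, y_pos)
r_neg = pearson(x, y_neg)
print(r_pos, r_neg)

# NumPy's built-in version should agree with the manual calculation
print(np.corrcoef(x, y_pos)[0, 1])
```

Perfectly linear data gives a coefficient at the extremes of the range: +1 for the rising line and -1 for the falling one.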

## Question 8: Support Vector Machine (SVM) Classifier with Decision Boundary

**Concepts:**
- **SVM**: Supervised learning algorithm that finds the hyperplane separating the classes with the largest margin
- **Linear kernel**: Produces a straight-line (linear) decision boundary
- **Decision boundary**: Line or surface that separates the predicted classes
- **Support vectors**: Training points on or inside the margin; they alone determine the boundary
- **Margin**: Distance between the decision boundary and the nearest training points

In [None]:
from sklearn import datasets
from sklearn.svm import SVC
import numpy as np
import matplotlib.pyplot as plt

iris = datasets.load_iris()

X = iris.data[:, :2]     # take only 2 features
y = iris.target

model = SVC(kernel='linear')
model.fit(X, y)

plt.scatter(X[:, 0], X[:, 1], c=y)

# create grid
x_min, x_max = X[:,0].min()-1, X[:,0].max()+1
y_min, y_max = X[:,1].min()-1, X[:,1].max()+1

xx, yy = np.meshgrid(
    np.linspace(x_min, x_max, 30),
    np.linspace(y_min, y_max, 30)
)

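# With three classes, decision_function returns one score per class, so a single
# zero-level contour is not the class boundary. Predict a class for every grid
# point instead; the colour changes mark the decision boundaries.
Z = model.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)

plt.contourf(xx, yy, Z, alpha=0.3)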
plt.show()
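
The support vectors and the margin from the concept list can be inspected directly on a fitted model. The sketch below is an assumption-laden simplification: it keeps only classes 0 and 1 so the problem is binary, and it uses the standard linear-SVM result that the margin lines lie at distance 1/||w|| from the boundary:

```python
import numpy as np
from sklearn import datasets
from sklearn.svm import SVC

iris = datasets.load_iris()

# Keep classes 0 and 1 and the first two features (binary problem for clarity)
mask = iris.target < 2
X = iris.data[mask, :2]
y = iris.target[mask]

model = SVC(kernel="linear")
model.fit(X, y)

w = model.coef_[0]                   # normal vector of the separating line
b = model.intercept_[0]              # offset of the separating line
margin = 1.0 / np.linalg.norm(w)     # distance from the boundary to each margin line

print("support vectors per class:", model.n_support_)
print("margin half-width:", margin)
```

Only the points in `model.support_vectors_` influence `w` and `b`; removing any other training point and refitting would leave the boundary unchanged.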
