# Lab Exam - Set 3

This notebook contains implementations for all questions in Set 3.

## Question 9: NumPy Arrays - Indexing, Slicing, and Universal Functions

**Concepts:**
- **Creating from lists**: Convert Python lists to NumPy arrays
- **Indexing**: Accessing single elements using position
- **Slicing**: Extracting subarrays using start:stop:step
- **Universal functions (ufuncs)**: Fast element-wise operations (sqrt, exp, sin, etc.)
- **Broadcasting**: Combining arrays of different but compatible shapes in element-wise operations

In [None]:
import numpy as np

# create NumPy array from a list
a = np.array([10, 20, 30, 40, 50])

print("Array:", a)

# indexing
print("First element:", a[0])
print("Last element:", a[-1])

# slicing
print("First 3 elements:", a[:3])
print("Middle elements:", a[1:4])
print("Every second element:", a[::2])

# universal functions (applied element-wise)
print("Square root:", np.sqrt(a))
print("Square:", np.square(a))
print("Sine:", np.sin(a))  # values are interpreted as radians
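
Broadcasting is listed among the concepts but not exercised in the cell above. The following is a minimal sketch, assuming the same five-element array; the names `shifted`, `col`, and `table` are illustrative only.

```python
import numpy as np

a = np.array([10, 20, 30, 40, 50])

# scalar broadcasting: the scalar 5 is stretched to match a's shape (5,)
shifted = a + 5
print("a + 5:", shifted)

# a (3, 1) column combined with a (5,) row broadcasts to shape (3, 5)
col = np.array([[1], [2], [3]])
table = col * a
print("Broadcast shape:", table.shape)
print(table)
```

Each row of `table` is `a` multiplied by one entry of `col`, without an explicit Python loop.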


## Question 10: Label Encoding and Missing Value Imputation

**Concepts:**
- **Categorical data**: Non-numeric data (colors, categories, names)
- **Label encoding**: Converting categories to numbers (0, 1, 2, ...)
- **Imputation**: Filling missing values with statistical measures
- **SimpleImputer**: sklearn tool for handling missing values
- **Strategies**: mean, median, most_frequent, constant

In [None]:
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.impute import SimpleImputer

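df = pd.read_csv("data.csv")

# encode the categorical column as integer labels (0, 1, 2, ...)
le = LabelEncoder()
df["Category"] = le.fit_transform(df["Category"])

# mean imputation only applies to numeric columns, so select them explicitly
num_cols = df.select_dtypes(include="number").columns
imp = SimpleImputer(strategy="mean")
df[num_cols] = imp.fit_transform(df[num_cols])

print(df)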
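
Because the contents of `data.csv` are not shown, the same two steps can be checked on a small in-memory table. This is a sketch under assumptions: the `toy` DataFrame, its `Value` column, and the values in it are hypothetical stand-ins, not the exam data.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.impute import SimpleImputer

# hypothetical stand-in for data.csv
toy = pd.DataFrame({
    "Category": ["red", "green", "red", "blue"],
    "Value": [1.0, np.nan, 3.0, 5.0],
})

# LabelEncoder assigns integers in sorted order of the class names
le = LabelEncoder()
toy["Category"] = le.fit_transform(toy["Category"])
print("Classes:", list(le.classes_))

# the missing Value is replaced by the mean of the observed values (1, 3, 5)
imp = SimpleImputer(strategy="mean")
toy[["Value"]] = imp.fit_transform(toy[["Value"]])
print(toy)
```

The integer codes carry no real ordering; they only reflect the alphabetical order of the category names.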


## Question 11: Scatter Plots with Regression Lines and Bar Plots with Error Bars

**Concepts:**
- **Regression line**: Best-fit line showing the trend between two variables
- **Linear regression**: Finding the line that minimizes the sum of squared residuals
- **Error bars**: Show variability or uncertainty in the data
- **Standard error**: Standard deviation of the sample mean, estimated as s/√n
- **Confidence intervals**: Range likely to contain the true value at a chosen confidence level (e.g. 95%)

In [None]:
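import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.read_csv("data.csv")

# scatter plot with a fitted regression line (first two columns, assumed numeric)
plt.figure()
sns.regplot(x=df.columns[0], y=df.columns[1], data=df)

# bar plot of column means on its own figure; error bars show one standard deviation
plt.figure()
means = df.mean(numeric_only=True)
means.plot(kind="bar", yerr=df.std(numeric_only=True))

plt.show()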
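
The quantities behind these plots can also be computed directly. The sketch below uses hypothetical data lying exactly on y = 2x + 1, fits the line with `np.polyfit`, and derives the standard error of the mean. The 95% interval uses the normal multiplier 1.96, which is an assumption; for a sample this small a t-multiplier would be more appropriate.

```python
import numpy as np

# hypothetical data lying exactly on y = 2x + 1
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 * x + 1.0

# least-squares fit of a degree-1 polynomial; returns [slope, intercept]
slope, intercept = np.polyfit(x, y, 1)
print("Slope:", round(slope, 6), "Intercept:", round(intercept, 6))

# standard error of the mean: sample standard deviation (ddof=1) over sqrt(n)
sem = y.std(ddof=1) / np.sqrt(len(y))
print("Standard error of mean(y):", sem)

# approximate 95% confidence interval for the mean (normal approximation)
ci_low = y.mean() - 1.96 * sem
ci_high = y.mean() + 1.96 * sem
print("Approx. 95% CI:", (ci_low, ci_high))
```

Passing `df.sem()` instead of `df.std()` as `yerr` would make the bar plot's error bars show the standard error rather than the standard deviation.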

## Question 12: Decision Tree Classifier using Gini Index

**Concepts:**
- **Decision Tree**: Tree-like model making decisions based on feature values
- **Gini Index**: Measure of impurity (0 = pure; the maximum is 1 - 1/K for K classes, i.e. 0.5 for two classes)
- **Splitting**: Dividing data based on feature thresholds
- **Leaf nodes**: Final classification decisions
- **Tree depth**: Length of the longest path from the root to a leaf

In [None]:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

iris = load_iris()
X = iris.data
y = iris.target

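# hold out 20% of the samples for testing; random_state fixes the split for reproducibility
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Gini is the default criterion; it is stated explicitly here for clarity
dt = DecisionTreeClassifier(criterion="gini", random_state=42)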
dt.fit(X_train, y_train)

pred = dt.predict(X_test)

print("Accuracy:", accuracy_score(y_test, pred))
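
To make the Gini index concrete, the impurity of a set of labels can be computed by hand as 1 minus the sum of squared class proportions. The helper `gini_impurity` below is a hypothetical illustration, not part of scikit-learn.

```python
import numpy as np

def gini_impurity(labels):
    """Return 1 - sum(p_k^2), where p_k is the proportion of class k in labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

print("Pure node:", gini_impurity([0, 0, 0, 0]))
print("Two balanced classes:", gini_impurity([0, 0, 1, 1]))
print("Three balanced classes:", gini_impurity([0, 1, 2]))
```

At each split the tree prefers the threshold that most reduces the weighted impurity of the resulting child nodes.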
