# Lab Exam - Set 8

This notebook contains implementations for all questions in Set 8.

## Question 29: Pandas DataFrame - Advanced String Manipulations

**Concepts:**
- **String operations**: Methods to manipulate text data in DataFrames
- **Case conversion**: Converting text to upper/lower case using `.str.upper()`, `.str.lower()`
- **String splitting**: Breaking strings into parts using `.str.split()`
- **String replacement**: Replacing substrings using `.str.replace()`
- **Pattern matching**: Finding patterns using `.str.contains()`, `.str.extract()`
- **String slicing**: Extracting portions of strings using `.str[]` or `.str.slice()`

In [None]:
import pandas as pd

df = pd.DataFrame({
    'Name': ['John Doe', 'Alice Smith', 'Bob Johnson']
})

# String manipulations
df['Upper'] = df['Name'].str.upper()               # convert to upper case
df['First_Name'] = df['Name'].str.split().str[0]   # first whitespace-separated token
df['Replaced'] = df['Name'].str.replace('o', '0', regex=False)  # literal 'o' -> '0'

print(df)
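
The cell above covers case conversion, splitting, and replacement. The remaining concepts from the list (pattern matching and slicing) can be sketched as follows. This is a minimal illustration on the same three names; the variable names and the regular expression are assumptions chosen for the example, not part of the original answer.

```python
import pandas as pd

names = pd.Series(['John Doe', 'Alice Smith', 'Bob Johnson'])

# Case conversion to lower case
lower = names.str.lower()

# Pattern matching: literal substring test (note that 'Johnson' also contains 'John')
has_john = names.str.contains('John', regex=False)

# Pattern extraction: capture the last word after whitespace as the surname
last_name = names.str.extract(r'\s(\w+)$', expand=False)

# Slicing: first three characters of each name
first3 = names.str.slice(0, 3)

print(pd.DataFrame({
    'Lower': lower,
    'Has_John': has_john,
    'Last_Name': last_name,
    'First3': first3,
}))
```

`expand=False` makes `.str.extract()` return a Series when the pattern has a single capture group, which is convenient for assigning to one column.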


## Question 30: Missing Data Imputation and Feature Scaling

**Concepts:**
- **Missing data**: Incomplete values in dataset (NaN or None)
- **Imputation**: Filling missing values with substitutes (mean, median, mode)
- **Feature scaling**: Transforming features to similar ranges
- **Standardization**: Scaling data to mean=0 and std=1 using StandardScaler
- **Normalization**: Scaling data to range [0,1] using MinMaxScaler
- **SimpleImputer**: Scikit-learn tool for filling missing values

In [None]:
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({
    'Age': [25, 30, None, 35, 40],
    'Salary': [50000, 60000, 55000, None, 65000]
})

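# Fill missing values with each column's mean.
# Assign the result back: fillna(inplace=True) on a column selection is
# deprecated in recent pandas versions and may leave df unchanged.
df['Age'] = df['Age'].fillna(df['Age'].mean())
df['Salary'] = df['Salary'].fillna(df['Salary'].mean())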

# Feature scaling
scaler = StandardScaler()
df[['Age', 'Salary']] = scaler.fit_transform(df[['Age', 'Salary']])

print(df)


## Question 31: Scatter Plots and Line Charts for Feature Visualization

**Concepts:**
- **Scatter plot**: Shows relationship between two continuous variables as points
- **Line chart**: Displays data points connected by lines, good for trends over time
- **Correlation**: Measure of the strength and direction of the linear relationship between two variables; the Pearson coefficient ranges from -1 to +1
- **Trend analysis**: Identifying patterns in data over time or across variables
- **Matplotlib/Seaborn**: Python libraries for data visualization

In [None]:
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 7, 11]

# Scatter plot
plt.scatter(x, y)
plt.title('Scatter Plot')
plt.xlabel('X')
plt.ylabel('Y')
plt.show()

# Line chart
plt.plot(x, y, marker='o')
plt.title('Line Chart')
plt.xlabel('X')
plt.ylabel('Y')
plt.show()
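
The plots show the relationship visually. The correlation mentioned in the concepts can also be computed numerically. The sketch below assumes the Pearson coefficient is the intended measure and uses NumPy, which the original cell does not import.

```python
import numpy as np

x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 7, 11]

# np.corrcoef returns the 2x2 correlation matrix; the off-diagonal entry is r(x, y)
r = np.corrcoef(x, y)[0, 1]

print(f"Pearson correlation between x and y: {r:.3f}")
```

A value close to +1 indicates a strong positive linear trend, which matches the upward pattern in the scatter plot.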


## Question 32: Support Vector Machine (SVM) for Classification

**Concepts:**
- **SVM**: Supervised learning algorithm that finds the hyperplane separating the classes with the largest margin
- **Hyperplane**: Decision boundary that separates different classes in feature space
- **Kernel**: Function that measures similarity between points as if they had been mapped into a higher-dimensional space (linear, RBF, polynomial)
- **Support Vectors**: Training points closest to the decision boundary; they alone determine its position
- **RBF Kernel**: Radial Basis Function, useful for non-linear classification
- **Decision boundary**: The line/surface that separates different classes

In [None]:
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
import matplotlib.pyplot as plt

# Load dataset
iris = datasets.load_iris()
X = iris.data[:, :2]  # first two features
y = iris.target

# Train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train SVM model
model = SVC(kernel='linear')
model.fit(X_train, y_train)

# Visualize predictions
plt.scatter(X_test[:, 0], X_test[:, 1], c=model.predict(X_test), cmap='coolwarm')
plt.title('SVM Classification Results')
plt.xlabel(iris.feature_names[0])  # sepal length (cm)
plt.ylabel(iris.feature_names[1])  # sepal width (cm)
plt.show()
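
The cell above trains a linear kernel only and does not report a score. As a rough sketch, the linear and RBF kernels from the concepts list can be compared on the same split. Using `accuracy_score` as the metric and default hyperparameters are assumptions made for this example.

```python
from sklearn import datasets
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Same data and split as the cell above
iris = datasets.load_iris()
X = iris.data[:, :2]  # first two features
y = iris.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

results = {}
for kernel in ('linear', 'rbf'):
    clf = SVC(kernel=kernel)
    clf.fit(X_train, y_train)
    acc = accuracy_score(y_test, clf.predict(X_test))
    n_sv = int(clf.n_support_.sum())  # total support vectors across classes
    results[kernel] = (acc, n_sv)
    print(f"{kernel:>6}: accuracy = {acc:.3f}, support vectors = {n_sv}")
```

With only two features, two of the three iris classes overlap considerably, so neither kernel is expected to separate the test set perfectly.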