In [10]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

# Load the dataset
df = pd.read_excel("RealEstate.xlsx")

# Split features and target variable
X = df.drop(columns=["Y house price of unit area"])
y = df["Y house price of unit area"]

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train the linear regression model
model = LinearRegression()
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)

# Evaluate the model
mse = mean_squared_error(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print("Mean Squared Error:", mse)
print("Mean Absolute Error:", mae)
print("R-squared:", r2)



Mean Squared Error: 54.59884830498453
Mean Absolute Error: 5.418032735899173
R-squared: 0.6745414195692574
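The three metrics printed above have short closed-form definitions. MSE and MAE average the squared and absolute residuals, and R-squared is one minus the ratio of the residual sum of squares to the total sum of squares around the mean. The sketch below computes them with NumPy and cross-checks them against scikit-learn. It uses made-up toy values rather than the real data, and the helper name `regression_metrics` is hypothetical. Because MSE is in squared units, its square root is often easier to read: √54.5988 ≈ 7.39, in the units of `Y house price of unit area`.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

def regression_metrics(y_true, y_pred):
    """Return (MSE, MAE, R^2) computed straight from their definitions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    mse_val = np.mean(residuals ** 2)               # mean of squared residuals
    mae_val = np.mean(np.abs(residuals))            # mean of absolute residuals
    ss_res = np.sum(residuals ** 2)                 # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares around the mean
    r2_val = 1.0 - ss_res / ss_tot
    return mse_val, mae_val, r2_val

# Made-up toy values for illustration (not taken from RealEstate.xlsx);
# distinct names avoid overwriting mse/mae/r2, which the PCA cell prints later
y_true_toy = [30.0, 42.5, 25.1, 47.3, 38.0]
y_pred_toy = [32.1, 40.0, 27.9, 44.0, 39.5]

toy_mse, toy_mae, toy_r2 = regression_metrics(y_true_toy, y_pred_toy)
print("By hand:", toy_mse, toy_mae, toy_r2)
print("sklearn:", mean_squared_error(y_true_toy, y_pred_toy),
      mean_absolute_error(y_true_toy, y_pred_toy), r2_score(y_true_toy, y_pred_toy))
```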


In [14]:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Load the dataset
df = pd.read_excel("RealEstate.xlsx")

# Split features and target variable
X = df.drop(columns=["Y house price of unit area"])
y = df["Y house price of unit area"]

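# Standardize the features
# (note: the scaler and PCA are fitted on all rows before the train/test split,
# so the test rows also influence the preprocessing)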
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Apply PCA
pca = PCA(n_components=3)
X_pca = pca.fit_transform(X_scaled)

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X_pca, y, test_size=0.2, random_state=42)

# Train the linear regression model
model = LinearRegression()
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)

# Evaluate the model
mse_pca = mean_squared_error(y_test, y_pred)
mae_pca = mean_absolute_error(y_test, y_pred)
r2_pca = r2_score(y_test, y_pred)

print("Performance Metrics using PCA:")
print("Mean Squared Error:", mse_pca)
print("Mean Absolute Error:", mae_pca)
print("R-squared:", r2_pca)

print("Performance Metrics without PCA:")
print("Mean Squared Error:", mse)
print("Mean Absolute Error:", mae)
print("R-squared:", r2)

"""Comparing the performance of the model using PCA with the performance obtained in Q1, we can observe the following:

1. Mean Squared Error (MSE):
   - Without PCA: 54.5988
   - With PCA: 58.7746
   The MSE rises by about 4.18 (roughly 7.6%) when using PCA, so the model's predictions have larger squared errors on average than those of the model trained on the original features.

2. Mean Absolute Error (MAE):
   - Without PCA: 5.4180
   - With PCA: 5.8288
   The MAE also rises, by about 0.41 (roughly 7.6%), so the typical absolute error of a prediction is somewhat larger with the reduced-dimensional representation.

3. R-squared (R2):
   - Without PCA: 0.6745
   - With PCA: 0.6496
   R-squared drops by about 0.025, so the PCA model explains less of the variance in the target variable than the model trained on the original features.

These results suggest that while PCA reduces the dimensionality of the dataset, it also costs some predictive performance. The loss is modest, which suggests that the first three principal components retain most of the information the linear model uses to predict price. PCA, however, chooses components by how much variance of the (standardized) features they capture, not by how well they predict the target, so some predictive signal is discarded, and the model performs slightly worse than the one trained on the full set of features. Both models are evaluated on the same test rows, since train_test_split is called with the same test_size and random_state on the same number of samples.
"""

Performance Metrics using PCA:
Mean Squared Error: 58.77464185501734
Mean Absolute Error: 5.82883266736957
R-squared: 0.6496499084264946
Performance Metrics without PCA:
Mean Squared Error: 54.59884830498453
Mean Absolute Error: 5.418032735899173
R-squared: 0.6745414195692574


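In the PCA cell, `StandardScaler` and `PCA` are fitted on every row before `train_test_split`, so the test rows shape both the scaling and the principal axes. A leakage-free variant wraps the steps in a `Pipeline` that is fitted on the training split only. The sketch below assumes a synthetic stand-in from `make_regression` (the spreadsheet is not reproduced here), so its numbers will not match the outputs above. It also returns `explained_variance_ratio_`, which is the direct way to check how much feature variance the three components retain. The helper name `evaluate_pca_regression` is hypothetical. Separately, if RealEstate.xlsx is the UCI "Real estate valuation" file, it probably also contains a row-number column named `No`; since `df.drop` removes only the target, that identifier would end up among the features in both regression cells and is worth dropping as well.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def evaluate_pca_regression(features, target, n_components=3, random_state=42):
    """Split first, then fit scaler + PCA + regression on the training rows only."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        features, target, test_size=0.2, random_state=random_state
    )
    pipe = make_pipeline(
        StandardScaler(),
        PCA(n_components=n_components),
        LinearRegression(),
    )
    pipe.fit(X_tr, y_tr)  # scaler and PCA never see the test rows
    preds = pipe.predict(X_te)
    # Share of standardized training-feature variance kept by each component
    explained = pipe.named_steps["pca"].explained_variance_ratio_
    return mean_squared_error(y_te, preds), r2_score(y_te, preds), explained

# Synthetic stand-in with 6 features (an assumption); with the real file you would
# pass df.drop(columns=["Y house price of unit area"]) and df["Y house price of unit area"]
X_demo, y_demo = make_regression(n_samples=400, n_features=6, noise=10.0, random_state=0)
demo_mse, demo_r2, demo_ratio = evaluate_pca_regression(X_demo, y_demo)
print("MSE:", demo_mse, "R^2:", demo_r2)
print("Cumulative explained variance:", np.cumsum(demo_ratio))
```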

In [15]:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Load the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Standardize the features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Apply PCA
pca = PCA(n_components=3)
X_pca = pca.fit_transform(X_scaled)

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X_pca, y, test_size=0.2, random_state=42)

# Train the logistic regression model
model = LogisticRegression()
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)

# Calculate performance metrics
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred, average='weighted')
recall = recall_score(y_test, y_pred, average='weighted')
f1 = f1_score(y_test, y_pred, average='weighted')

# Print performance metrics
print("Performance Metrics:")
print("Accuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)
print("F1 Score:", f1)


Performance Metrics:
Accuracy: 1.0
Precision: 1.0
Recall: 1.0
F1 Score: 1.0
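With 150 samples and `test_size=0.2`, the test set holds 30 flowers, so a single misclassification moves accuracy by 1/30 ≈ 0.033, and a perfect score on one split is a fragile basis for comparing models. A sketch using stratified 5-fold cross-validation gives a mean and spread instead of a single number. Here scaling and PCA sit inside a `Pipeline`, so they are refitted on each training fold rather than on all rows as in the cell above. The fold count and shuffling seed are arbitrary choices.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

iris_X, iris_y = load_iris(return_X_y=True)

# Scaling and PCA are refitted inside every training fold
iris_pipe = make_pipeline(StandardScaler(), PCA(n_components=3), LogisticRegression())

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
cv_scores = cross_val_score(iris_pipe, iris_X, iris_y, cv=cv, scoring="accuracy")
print("Fold accuracies:", cv_scores)
print(f"Mean accuracy: {cv_scores.mean():.3f} (std {cv_scores.std():.3f})")
```

The same pipeline with `LogisticRegression(penalty='l1', solver='liblinear')` swapped in would give a like-for-like cross-validated comparison with the L1 model below.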


In [17]:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Load the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Standardize the features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Apply PCA
pca = PCA(n_components=3)
X_pca = pca.fit_transform(X_scaled)

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X_pca, y, test_size=0.2, random_state=42)

# Train the logistic regression model with L1 regularization
model = LogisticRegression(penalty='l1', solver='liblinear')  # liblinear supports the L1 penalty; it fits one-vs-rest binary models for the 3 classes
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)

# Calculate performance metrics
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred, average='weighted')
recall = recall_score(y_test, y_pred, average='weighted')
f1 = f1_score(y_test, y_pred, average='weighted')

# Print performance metrics
print("Performance Metrics with L1 Regularization:")
print("Accuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)
print("F1 Score:", f1)

"""Let's compare the performance metrics of the logistic regression model with L1 regularization to the performance reported in Q3:

Performance Metrics with L1 Regularization:
- Accuracy: 0.9667
- Precision: 0.9694
- Recall: 0.9667
- F1 Score: 0.9664

Performance Metrics from Q3 (default LogisticRegression):
- Accuracy: 1.0
- Precision: 1.0
- Recall: 1.0
- F1 Score: 1.0

With L1 regularization, every metric drops slightly: accuracy falls from 1.0 to 0.9667. On this 30-sample test set, that corresponds to a single misclassified flower (29 of 30 correct), so the difference is small and may not be meaningful on its own.

Explanation:
- The two models differ in more than the penalty. Q3 was not unregularized: it used scikit-learn's defaults, an L2 penalty with C=1.0 and the lbfgs solver, which fits one multinomial model over all three classes. The L1 model uses the liblinear solver, which handles the three classes with a one-vs-rest scheme. Either change could account for the one extra error.
- The model sees only three PCA features, so the feature-selection effect that L1 is known for has little to work with: there are no irrelevant features to prune, and shrinking useful weights can cost a little accuracy.

Overall, logistic regression is well suited to the Iris dataset, and adding L1 regularization did not improve performance here. L1 tends to pay off on high-dimensional data with many irrelevant or redundant features, or when there is a clear risk of overfitting; with three standardized principal components and 120 training samples, neither condition applies."""

Performance Metrics with L1 Regularization:
Accuracy: 0.9666666666666667
Precision: 0.9694444444444444
Recall: 0.9666666666666667
F1 Score: 0.9664109121909632


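Because Q3 and the L1 cell differ in both penalty and solver, one way to separate the two effects is to hold the solver fixed and vary only the penalty and its strength `C` (smaller `C` means stronger regularization). The sketch below does this on the same PCA features and split as the cells above. It wraps `LogisticRegression` in `OneVsRestClassifier` explicitly, which is the scheme liblinear uses for multiclass targets anyway; recent scikit-learn versions may warn when liblinear is applied to a multiclass target directly. It also counts coefficients that are exactly zero, the sparsity L1 is known for. The `C` grid is an arbitrary illustrative choice.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import StandardScaler

# Mirror the preprocessing and split of the cells above so the test rows match
iris_X, iris_y = load_iris(return_X_y=True)
iris_Z = PCA(n_components=3).fit_transform(StandardScaler().fit_transform(iris_X))
Z_tr, Z_te, yz_tr, yz_te = train_test_split(iris_Z, iris_y, test_size=0.2, random_state=42)

sweep = {}
for penalty in ("l1", "l2"):
    for C in (0.01, 0.1, 1.0, 10.0):
        # One binary liblinear model per class; only the penalty and C vary
        ovr = OneVsRestClassifier(LogisticRegression(penalty=penalty, C=C, solver="liblinear"))
        ovr.fit(Z_tr, yz_tr)
        # L1 can set weights exactly to zero; count them across the three binary models
        n_zero = sum(int(np.sum(est.coef_ == 0)) for est in ovr.estimators_)
        sweep[(penalty, C)] = (ovr.score(Z_te, yz_te), n_zero)
        print(f"penalty={penalty}  C={C:<5}  accuracy={sweep[(penalty, C)][0]:.3f}  zero weights={n_zero}/9")
```

If the L1 and L2 rows at `C=1.0` score the same, the gap between Q3 and the L1 cell is more likely due to the solver and multiclass scheme than to the penalty itself.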