<p style="text-align:center">
    <a href="https://skills.network" target="_blank">
    <img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/assets/logos/SN_web_lightmode.png" width="300" alt="Skills Network Logo">
    </a>
</p>


# Test Environment for Generative AI classroom labs

This lab provides a test environment for code generated in the Generative AI classroom.

Follow the instructions below to set up this environment for further use.


# Setup


### Install required libraries

If your task requires additional Python libraries, install them as shown below.


In [1]:
%pip install seaborn

### Dataset URL from the GenAI lab
Paste the URL provided in the GenAI lab into the cell below.


In [2]:
URL = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DA0101EN-Coursera/laptop_pricing_dataset_mod2.csv"

### Downloading the dataset

Execute the following code to download the dataset into the notebook environment.

> Note that this step is essential in JupyterLite. If you are running a downloaded copy of this notebook in JupyterLab, you can skip this step and pass the URL directly to `pandas.read_csv()` to read the dataset as a dataframe.


In [3]:
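from pyodide.http import pyfetch

async def download(url, filename):
    # Fetch the file and save it to the JupyterLite virtual file system
    response = await pyfetch(url)
    if response.status != 200:
        # Fail loudly instead of silently skipping the write
        raise RuntimeError(f"Download failed with HTTP status {response.status}")
    with open(filename, "wb") as f:
        f.write(await response.bytes())

path = URL

await download(path, "dataset.csv")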
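
For the JupyterLab route mentioned in the note above, the download step can be skipped. The sketch below wraps `pandas.read_csv()` in a hypothetical helper, `load_dataset`, which is not part of the original lab. The in-memory CSV holds placeholder values for illustration only, not rows from the lab dataset.

```python
import io

import pandas as pd


def load_dataset(source):
    """Read a CSV from a URL, a local path, or a file-like object into a DataFrame."""
    return pd.read_csv(source)


# Offline demonstration with a tiny in-memory CSV (placeholder values, not lab data)
sample_csv = io.StringIO("CPU_frequency,Price\n1.6,978\n2.0,634\n")
sample_frame = load_dataset(sample_csv)
print(sample_frame.shape)
```

Outside JupyterLite, `data_frame = load_dataset(URL)` should read the lab dataset directly, assuming the machine has network access.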

---


# Test Environment


In [4]:
# Keep appending the code generated to this cell, or add more cells below this to execute in parts
import pandas as pd

file_path = "dataset.csv"
data_frame = pd.read_csv(file_path)



In [6]:
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Define the source and target variables
source_variable = data_frame["CPU_frequency"]
target_variable = data_frame["Price"]

# Create and train the linear regression model
model = LinearRegression()
model.fit(source_variable.values.reshape(-1, 1), target_variable)

# Make predictions using the trained model
predictions = model.predict(source_variable.values.reshape(-1, 1))

# Calculate and display the MSE and R^2 values
mse = mean_squared_error(target_variable, predictions)
r2 = r2_score(target_variable, predictions)

print("Mean Squared Error:", mse)
print("R^2 Score:", r2)

Mean Squared Error: 284583.44058686297
R^2 Score: 0.13444363210243238
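
The two metrics printed above can also be computed by hand, which may help when interpreting them. The following is a minimal sketch in plain Python; the function names and the four-point sample are illustrative assumptions, not part of the lab. MSE is the average squared residual, and R^2 is one minus the ratio of the residual sum of squares to the total sum of squares around the mean of the target.

```python
def mean_squared_error_manual(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score_manual(y_true, y_pred):
    """1 - SS_res / SS_tot, where SS_tot is measured around the mean of y_true."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Illustrative values only (not taken from the laptop dataset)
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 9.0]

mse_manual = mean_squared_error_manual(y_true, y_pred)
r2_manual = r2_score_manual(y_true, y_pred)
print("Mean Squared Error:", mse_manual)
print("R^2 Score:", r2_manual)
```

Read this way, an R^2 of about 0.13 for the single-feature model means it explains roughly 13% of the variance in `Price`.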


In [7]:
# Define the source and target variables
source_variables = data_frame[["CPU_frequency", "RAM_GB", "Storage_GB_SSD", "CPU_core", "OS", "GPU", "Category"]]
target_variable = data_frame["Price"]

# Create and train the linear regression model
model = LinearRegression()
model.fit(source_variables, target_variable)

# Make predictions using the trained model
predictions = model.predict(source_variables)

# Calculate and display the MSE and R^2 values
mse = mean_squared_error(target_variable, predictions)
r2 = r2_score(target_variable, predictions)

print("Mean Squared Error:", mse)
print("R^2 Score:", r2)

Mean Squared Error: 161680.57263893107
R^2 Score: 0.5082509055187374


In [10]:
from sklearn.preprocessing import PolynomialFeatures
import numpy as np

# Define the source and target variables
source_variable = data_frame["CPU_frequency"]
target_variable = data_frame["Price"]

# Create polynomial features for different orders
orders = [2, 3, 5]
mse_values = []
r2_values = []

for order in orders:
    polynomial_features = PolynomialFeatures(degree=order)
    source_variable_poly = polynomial_features.fit_transform(source_variable.values.reshape(-1, 1))

    # Create and train the polynomial regression model
    model = LinearRegression()
    model.fit(source_variable_poly, target_variable)

    # Make predictions using the trained model
    predictions = model.predict(source_variable_poly)

    # Calculate and display the MSE and R^2 values
    mse = mean_squared_error(target_variable, predictions)
    r2 = r2_score(target_variable, predictions)

    mse_values.append(mse)
    r2_values.append(r2)

    print(f"Polynomial Regression (Order {order}):")
    print("Mean Squared Error:", mse)
    print("R^2 Score:", r2)
    print()

# Compare the performance of the models
best_order = orders[np.argmin(mse_values)]
best_mse = np.min(mse_values)
best_r2 = r2_values[np.argmin(mse_values)]

print(f"Best Performing Polynomial Regression (Order {best_order}):")
print("Mean Squared Error:", best_mse)
print("R^2 Score:", best_r2)



Polynomial Regression (Order 2):
Mean Squared Error: 249022.66596751186
R^2 Score: 0.2426012074542374

Polynomial Regression (Order 3):
Mean Squared Error: 241024.863038488
R^2 Score: 0.2669264079653102

Polynomial Regression (Order 5):
Mean Squared Error: 229137.29548054413
R^2 Score: 0.30308227064436244

Best Performing Polynomial Regression (Order 5):
Mean Squared Error: 229137.29548054413
R^2 Score: 0.30308227064436244
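
For a single input column, `PolynomialFeatures(degree=d)` with its default settings produces the columns 1, x, x^2, ..., x^d, and the linear model is then fitted on those columns. A plain-Python sketch of that expansion follows; `polynomial_expand` is a hypothetical helper written for illustration, not a scikit-learn function.

```python
def polynomial_expand(values, degree):
    """Return one row [1, x, x**2, ..., x**degree] for each x in values."""
    return [[x ** power for power in range(degree + 1)] for x in values]


# Two arbitrary inputs expanded to order 3
rows = polynomial_expand([2.0, 3.0], degree=3)
for row in rows:
    print(row)
```

Because each higher order contains every column of the lower orders, the least-squares error on the training data cannot increase (in exact arithmetic) as the order grows. Comparing orders on the same data used for fitting therefore tends to favour the highest order tried.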


In [12]:
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Define the source and target variables
source_variables = data_frame[["CPU_frequency", "RAM_GB", "Storage_GB_SSD", "CPU_core", "OS", "GPU", "Category"]]
target_variable = data_frame["Price"]

# Create the pipeline
pipeline = make_pipeline(
    StandardScaler(),
    PolynomialFeatures(degree=2),
    LinearRegression()
)

# Train the pipeline
pipeline.fit(source_variables, target_variable)

# Make predictions using the trained pipeline
predictions = pipeline.predict(source_variables)

# Calculate and display the MSE and R^2 values
mse = mean_squared_error(target_variable, predictions)
r2 = r2_score(target_variable, predictions)

print("Mean Squared Error:", mse)
print("R^2 Score:", r2)


Mean Squared Error: 120934.2421875
R^2 Score: 0.6321802730109751
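
`StandardScaler`, the first step of the pipeline above, rescales each column to zero mean and unit variance using the population standard deviation. The sketch below reproduces that calculation for one column in plain Python; `standardize` is an illustrative helper and the input values are arbitrary.

```python
def standardize(values):
    """Scale values to zero mean and unit variance (population standard deviation).

    Assumes the values are not all equal, otherwise the standard deviation is zero.
    """
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    std = variance ** 0.5
    return [(v - mean) / std for v in values]


# Arbitrary example column
scaled = standardize([2.0, 4.0, 6.0, 8.0])
print(scaled)
```

Scaling before the polynomial expansion keeps the squared and interaction terms on comparable scales, which is one plausible reason to place it first in the pipeline.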


In [13]:
# Step 1: Import the required libraries
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, cross_val_score


# Step 2: Define the source and target variables
source_variables = data_frame[["CPU_frequency", "RAM_GB", "Storage_GB_SSD", "CPU_core", "OS", "GPU", "Category"]]
target_variable = data_frame["Price"]

# Step 3: Create polynomial features for selected attributes
polynomial_features = PolynomialFeatures(degree=2)
source_variables_poly = polynomial_features.fit_transform(source_variables)

# Step 4: Define the hyperparameter grid for Grid Search
param_grid = {
    "alpha": [0.0001, 0.001, 0.01, 0.1, 1, 10],
    "fit_intercept": [True, False]
}

# Step 5: Create and train the Ridge regression model with Grid Search
model = Ridge()
grid_search = GridSearchCV(model, param_grid, cv=4)
grid_search.fit(source_variables_poly, target_variable)

# Step 6: Evaluate the resulting model using cross-validation
mse_scores = -cross_val_score(grid_search.best_estimator_, source_variables_poly, target_variable, cv=5, scoring="neg_mean_squared_error")
r2_scores = cross_val_score(grid_search.best_estimator_, source_variables_poly, target_variable, cv=5, scoring="r2")

# Step 7: Display the MSE and R^2 values
mse = np.mean(mse_scores)
r2 = np.mean(r2_scores)

print("Mean Squared Error:", mse)
print("R^2 Score:", r2)



  return linalg.solve(A, Xy, assume_a="pos", overwrite_a=True).T


Mean Squared Error: 175973.83952536914
R^2 Score: 0.26605903748461024
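
The cell above tunes the model with `cv=4` and then scores the best estimator with `cv=5`. For a regressor, an integer `cv` in scikit-learn means unshuffled k-fold splitting into contiguous blocks. The sketch below shows how such folds are formed in plain Python; `kfold_indices` is a hypothetical helper for illustration, not the library's implementation.

```python
def kfold_indices(n_samples, n_splits):
    """Split range(n_samples) into contiguous folds; return (train, test) index lists."""
    # The first n_samples % n_splits folds receive one extra sample
    fold_sizes = [n_samples // n_splits] * n_splits
    for i in range(n_samples % n_splits):
        fold_sizes[i] += 1

    folds = []
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train_idx, test_idx))
        start += size
    return folds


# Ten samples split into four folds
folds = kfold_indices(10, 4)
for train_idx, test_idx in folds:
    print("train:", train_idx, "test:", test_idx)
```

Each fold is scored on rows the model did not see during fitting, so the cross-validated values above are not directly comparable to the training-set scores reported in the earlier cells.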


## Authors


[Abhishek Gagneja](https://www.linkedin.com/in/abhishek-gagneja-23051987/)


## Change Log


|Date (YYYY-MM-DD)|Version|Changed By|Change Description|
|-|-|-|-|
|2023-12-10|0.1|Abhishek Gagneja|Initial Draft created|


Copyright © 2023 IBM Corporation. All rights reserved.
