<p style="text-align:center">
    <a href="https://skills.network" target="_blank">
    <img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/assets/logos/SN_web_lightmode.png" width="300" alt="Skills Network Logo">
    </a>
</p>


# Test Environment for Generative AI classroom labs

This lab provides a test environment for code generated in the Generative AI classroom.

Follow the instructions below to set up this environment for further use.


# Setup


### Install required libraries

If your task requires additional Python libraries, install them as shown below.


In [1]:
%pip install seaborn
import piplite

await piplite.install(['nbformat', 'plotly'])

### Dataset URL from the GenAI lab
Use the URL provided in the GenAI lab in the cell below. 


In [2]:
URL = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DA0101EN-Coursera/laptop_pricing_dataset_mod2.csv"

### Downloading the dataset

Execute the following code to download the dataset into the interface.

> Please note that this step is essential in JupyterLite. If you are running a downloaded copy of this notebook in JupyterLab, you can skip this step and pass the URL directly to the `pandas.read_csv()` function to read the dataset as a data frame.


In [3]:
from pyodide.http import pyfetch

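async def download(url, filename):
    # Fetch the file and write it to the JupyterLite file system
    response = await pyfetch(url)
    if response.status != 200:
        # Fail loudly instead of silently skipping the write
        raise RuntimeError(f"Download failed with HTTP status {response.status}")
    with open(filename, "wb") as f:
        f.write(await response.bytes())

await download(URL, "laptop_pricing_dataset_mod2.csv")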
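
As the note above says, the download step is only needed in JupyterLite. The choice can also be made at run time. The following is a minimal sketch, assuming that the interpreter reports `sys.platform == "emscripten"` when it runs under Pyodide (the engine behind JupyterLite). The helper names `running_in_pyodide` and `dataset_source` are hypothetical and not part of the lab.

```python
import sys


def running_in_pyodide() -> bool:
    # Assumption: Pyodide builds of CPython report the "emscripten" platform.
    return sys.platform == "emscripten"


def dataset_source(url: str, local_name: str) -> str:
    # In JupyterLite, read the file saved by the download step;
    # elsewhere, pandas can read the remote URL directly.
    return local_name if running_in_pyodide() else url


source = dataset_source(
    "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DA0101EN-Coursera/laptop_pricing_dataset_mod2.csv",
    "laptop_pricing_dataset_mod2.csv",
)
print(source)
```

The returned string can then be passed to `pandas.read_csv()` in place of a hard-coded file name.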

---


# Test Environment


In [5]:
# Model Development
# Importing data set
import pandas as pd

# Define the file path
file_path = 'laptop_pricing_dataset_mod2.csv'

# Read the CSV file into a pandas data frame
df = pd.read_csv(file_path, header=0)
df.head()

|Unnamed: 0.2|Unnamed: 0.1|Unnamed: 0|Manufacturer|Category|GPU|OS|CPU_core|Screen_Size_inch|CPU_frequency|RAM_GB|Storage_GB_SSD|Weight_pounds|Price|Price-binned|Screen-Full_HD|Screen-IPS_panel|
|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|
|0|0|0|Acer|4|2|1|5|14.0|0.551724|8|256|3.528|978|Low|0|1|
|1|1|1|Dell|3|1|1|3|15.6|0.689655|4|256|4.851|634|Low|1|0|
|2|2|2|Dell|3|1|1|7|15.6|0.931034|8|256|4.851|946|Low|1|0|
|3|3|3|Dell|4|2|1|5|13.3|0.551724|8|128|2.6901|1244|Low|0|1|
|4|4|4|HP|4|2|1|7|15.6|0.62069|8|256|4.21155|837|Low|1|0|


In [6]:
# Model Development
# Importing data set
import pandas as pd

df = pd.read_csv('laptop_pricing_dataset_mod2.csv', header=0)
print(df)

     Unnamed: 0.1  Unnamed: 0 Manufacturer  Category  GPU  OS  CPU_core  \
0               0           0         Acer         4    2   1         5   
1               1           1         Dell         3    1   1         3   
2               2           2         Dell         3    1   1         7   
3               3           3         Dell         4    2   1         5   
4               4           4           HP         4    2   1         7   
..            ...         ...          ...       ...  ...  ..       ...   
233           233         233       Lenovo         4    2   1         7   
234           234         234      Toshiba         3    2   1         5   
235           235         235       Lenovo         4    2   1         5   
236           236         236       Lenovo         3    3   1         5   
237           237         237      Toshiba         3    2   1         5   

     Screen_Size_inch  CPU_frequency  RAM_GB  Storage_GB_SSD  Weight_pounds  \
0                14.

In [7]:
# Model Development
# Linear regression in one variable
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# 'CPU_frequency' is the source variable and 'Price' is the target variable
X = df[['CPU_frequency']]
y = df['Price']

# Initialize the linear regression model
model = LinearRegression()

# Train the model using the source and target variables
model.fit(X, y)

# Make predictions using the trained model
y_pred = model.predict(X)

# Calculate MSE and R^2
mse = mean_squared_error(y, y_pred)
r2 = r2_score(y, y_pred)

print('Mean Squared Error:', mse)
print('R^2 Score:', r2)

Mean Squared Error: 284583.44058686297
R^2 Score: 0.13444363210243238
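
The two metrics printed above can be computed by hand, which makes their meaning easier to see. The following is a minimal pure-Python sketch of the standard definitions: MSE is the mean of the squared residuals, and R^2 is one minus the ratio of the residual sum of squares to the total sum of squares around the mean. The function names and the toy values are made up for illustration and are not taken from the laptop dataset.

```python
def mean_squared_error_manual(y_true, y_pred):
    # Average of the squared residuals.
    n = len(y_true)
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n


def r2_score_manual(y_true, y_pred):
    # 1 - (residual sum of squares / total sum of squares around the mean).
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Made-up toy values for illustration only
y_true = [1.0, 2.0, 3.0]
y_pred = [1.0, 2.0, 4.0]
print(mean_squared_error_manual(y_true, y_pred))
print(r2_score_manual(y_true, y_pred))
```

Read this way, an R^2 of about 0.13 means that the single-variable model accounts for roughly 13% of the variance in `Price` on the data it was fitted to.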


In [8]:
# Model Development
# Linear regression in multiple variables
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Extract the source variables and the target variable from the data frame
X = df[['CPU_frequency', 'RAM_GB', 'Storage_GB_SSD', 'CPU_core', 'OS', 'GPU', 'Category']]
y = df['Price']

# Create and train the linear regression model
model = LinearRegression()
model.fit(X, y)

# Make predictions using the trained model 
y_pred = model.predict(X)

# Calculate the mean squared error (MSE) & coefficient of determination (R^2)
mse = mean_squared_error(y, y_pred)
r2 = r2_score(y, y_pred)

print("Mean Squared Error (MSE):", mse)
print("Coefficient of Determination (R^2):", r2)

Mean Squared Error (MSE): 161680.57263893107
Coefficient of Determination (R^2): 0.5082509055187374


In [9]:
# Model Development
# Polynomial Regression
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.metrics import mean_squared_error, r2_score

# Fit one polynomial model per degree and collect the MSE and R^2 for each
def polynomial_regression(df, source_column, target_column, degrees=(2, 3, 5)):
    # Extract the source variable and target variable from the data frame
    X = df[[source_column]]
    y = df[target_column]
    results = {}

    # Loop through the polynomial orders
    for degree in degrees:
        # Create polynomial features & Initialize a linear regression model
        poly_features = PolynomialFeatures(degree=degree)
        X_poly = poly_features.fit_transform(X)
        model = LinearRegression()
        # Train the model using the polynomial features and target variable
        model.fit(X_poly, y)
        # Make predictions using the trained model
        y_pred = model.predict(X_poly)
        # Calculate the mean squared error (MSE) & the coefficient of determination (R^2)
        mse = mean_squared_error(y, y_pred)
        r2 = r2_score(y, y_pred)
        results[f'Degree {degree}'] = {'Mean Squared Error': mse, 'R^2': r2}
    return results

results = polynomial_regression(df, 'CPU_frequency', 'Price')
for degree, values in results.items():
    # Display the MSE and R^2 values for the current model
    print(f'Polynomial Order {degree}:')
    print('Mean Squared Error:', values['Mean Squared Error'])
    print('R^2:', values['R^2'])

# Compare the performance of the models: lowest MSE is best, highest MSE is worst
labels = list(results)
mse_values = [results[label]['Mean Squared Error'] for label in labels]
best_order = labels[np.argmin(mse_values)]
worst_order = labels[np.argmax(mse_values)]

print("Model Comparison:")
print(f"Best Polynomial Order: {best_order}")
print(f"Worst Polynomial Order: {worst_order}")

Polynomial Order Degree 2:
Mean Squared Error: 249022.66596751186
R^2: 0.2426012074542374
Polynomial Order Degree 3:
Mean Squared Error: 241024.863038488
R^2: 0.2669264079653102
Polynomial Order Degree 5:
Mean Squared Error: 229137.29548054413
R^2: 0.30308227064436244
Model Comparison:
Best Polynomial Order: Degree 5
Worst Polynomial Order: Degree 2


In [10]:
# Compare the performance of the models
def compare_models_performance(results):
    for degree, values in results.items():
        print(f'Polynomial Order {degree}:')
        print('Mean Squared Error:', values['Mean Squared Error'])
        print('R^2:', values['R^2'])

compare_models_performance(results)

Polynomial Order Degree 2:
Mean Squared Error: 249022.66596751186
R^2: 0.2426012074542374
Polynomial Order Degree 3:
Mean Squared Error: 241024.863038488
R^2: 0.2669264079653102
Polynomial Order Degree 5:
Mean Squared Error: 229137.29548054413
R^2: 0.30308227064436244
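
The polynomial models compared above are still linear regressions. The only change is the input, which is expanded into powers of the source variable before fitting. The following is a minimal pure-Python sketch of that expansion for a single column, assuming the default `PolynomialFeatures` settings (a constant bias column is included). The function name and the toy inputs are hypothetical.

```python
def polynomial_features_1d(values, degree):
    # One row per sample: [x**0, x**1, ..., x**degree]
    return [[x ** power for power in range(degree + 1)] for x in values]


# Made-up toy inputs for illustration only
rows = polynomial_features_1d([2.0, 3.0], degree=2)
print(rows)
```

A degree-5 model on one column therefore fits six coefficients, one per power from 0 to 5.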


In [11]:
# Model Development
# Creating a Pipeline
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Extract the source variables and the target variable from the data frame
X = df[['CPU_frequency', 'RAM_GB', 'Storage_GB_SSD', 'CPU_core', 'OS', 'GPU', 'Category']]
y = df['Price']

# Create a pipeline that performs feature scaling, polynomial feature generation, and linear regression
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('poly_features', PolynomialFeatures(degree=3)),
    ('linear_reg', LinearRegression())
])

# Train the model using the source and target variables
pipeline.fit(X, y)

# Make predictions using the trained model
y_pred = pipeline.predict(X)

# Calculate the mean squared error (MSE) & coefficient of determination (R^2)
mse = mean_squared_error(y, y_pred)
r2 = r2_score(y, y_pred)

# Display the MSE and R^2 values
print("Mean Squared Error (MSE):", mse)
print("Coefficient of Determination (R^2):", r2)

Mean Squared Error (MSE): 91018.05777310925
Coefficient of Determination (R^2): 0.7231699098980529
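
It helps to know how many columns the polynomial step in this pipeline produces. With default settings, a full polynomial expansion of `n` inputs up to degree `d` contains one column per monomial of total degree at most `d`, including the constant term, which gives `C(n + d, d)` columns. The sketch below computes that count. The function name is hypothetical.

```python
from math import comb


def n_polynomial_features(n_inputs, degree):
    # Number of monomials of total degree <= `degree` in `n_inputs` variables,
    # counting the constant (bias) term.
    return comb(n_inputs + degree, degree)


print(n_polynomial_features(7, 3))  # seven source variables, degree 3
print(n_polynomial_features(7, 2))  # seven source variables, degree 2
```

For the seven source variables used here, degree 3 yields 120 columns. The data frame printed earlier shows row labels up to 237, so the scores above are measured on the same rows the model was fitted to and should be read with that in mind.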


In [12]:
# Model Development
# Grid search and Ridge regression
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error, r2_score

# Extract the source variables and the target variable from the data frame
X = df[['CPU_frequency', 'RAM_GB', 'Storage_GB_SSD', 'CPU_core', 'OS', 'GPU', 'Category']]
y = df['Price']

# Create polynomial features & Transform the source variables into polynomial features
polynomial_features = PolynomialFeatures(degree=2)
X_poly = polynomial_features.fit_transform(X)

# Initialize a ridge regression model
ridge = Ridge()

# Set up the hyperparameters to search
param_grid = {
    'alpha': [0.0001, 0.001, 0.01, 0.1, 1, 10]
}

# Perform grid search with cross-validation
grid_search = GridSearchCV(ridge, param_grid, cv=4)

# Train the model using the polynomial features and target variable
grid_search.fit(X_poly, y)

# Get the best model
best_model = grid_search.best_estimator_ 

# Make predictions using the trained model
y_pred = best_model.predict(X_poly)

# Calculate the mean squared error (MSE) & coefficient of determination (R^2)
mse = mean_squared_error(y, y_pred)
r2 = r2_score(y, y_pred)

# Display the MSE and R^2 values
print("Mean Squared Error (MSE):", mse)
print("Coefficient of Determination (R^2):", r2)

Mean Squared Error (MSE): 128987.04078699532
Coefficient of Determination (R^2): 0.6076878039733669
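
The `alpha` values searched above set the strength of the penalty that ridge regression places on large coefficients. The following is a deliberately simplified sketch with one feature and no intercept, where the penalized least-squares problem has the closed-form solution `w = sum(x*y) / (sum(x*x) + alpha)`. scikit-learn's `Ridge` handles many features and fits an intercept by default, so this illustrates the idea and is not a description of the library's implementation. The function name and the toy data are made up.

```python
def ridge_slope(x, y, alpha):
    # Closed-form minimiser of sum((y_i - w * x_i)**2) + alpha * w**2
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return sxy / (sxx + alpha)


# Made-up toy data where the unpenalised slope is exactly 2
x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]
for alpha in [0.0001, 0.001, 0.01, 0.1, 1, 10]:
    print(alpha, ridge_slope(x, y, alpha))
```

As `alpha` grows, the fitted slope shrinks toward zero. The grid search trades this shrinkage off against fit quality by scoring each candidate `alpha` with cross-validation.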


## Authors


[Abhishek Gagneja](https://www.linkedin.com/in/abhishek-gagneja-23051987/)


## Change Log


|Date (YYYY-MM-DD)|Version|Changed By|Change Description|
|-|-|-|-|
|2023-12-10|0.1|Abhishek Gagneja|Initial Draft created|


Copyright © 2023 IBM Corporation. All rights reserved.
