<p style="text-align:center">
    <a href="https://skills.network" target="_blank">
    <img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/assets/logos/SN_web_lightmode.png" width="300" alt="Skills Network Logo">
    </a>
</p>


# Test Environment for Generative AI classroom labs

This lab provides a test environment for the code generated in the Generative AI classroom.

Follow the instructions below to set up this environment for further use.


# Setup


### Install required libraries

If your task requires additional Python libraries, install them as shown below.


In [1]:
%pip install seaborn
import piplite

await piplite.install(['nbformat', 'plotly'])
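
Before installing anything, it can help to check which libraries are already importable. The following is a minimal sketch using only the standard library; `missing_packages` is a hypothetical helper name introduced here for illustration and is not part of the lab.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be imported."""
    # find_spec returns None when no module with that top-level name is found
    return [name for name in names if importlib.util.find_spec(name) is None]


# Library names taken from the install cell above
required = ["seaborn", "nbformat", "plotly"]
print("Still to install:", missing_packages(required))
```

Only the names reported as missing would then need to be passed to `%pip install` or `piplite.install`.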

### Dataset URL from the GenAI lab
Use the URL provided in the GenAI lab in the cell below. 


In [2]:
URL = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DA0101EN-Coursera/laptop_pricing_dataset_mod2.csv"

### Downloading the dataset

Execute the following code to download the dataset into the interface.

> Note that this step is essential in JupyterLite. If you are running a downloaded copy of this notebook in JupyterLab, you can skip this step and pass the URL directly to the `pandas.read_csv()` function to read the dataset as a DataFrame.
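
For readers working outside JupyterLite, the direct-read alternative mentioned in the note might look like the sketch below. `load_dataset` is a hypothetical wrapper introduced for illustration, and the two sample rows are made up so that the example runs without network access; they are not taken from the real dataset.

```python
import io

import pandas as pd

# Dataset URL copied from the notebook's URL cell
URL = (
    "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/"
    "IBMDeveloperSkillsNetwork-DA0101EN-Coursera/laptop_pricing_dataset_mod2.csv"
)


def load_dataset(source):
    """Read a CSV from a URL, local path, or file-like object into a DataFrame."""
    # header=0 treats the first row as the column names
    return pd.read_csv(source, header=0)


# With network access in JupyterLab, the call would be: df = load_dataset(URL)
# Offline illustration with made-up rows (not real dataset values):
sample_csv = io.StringIO("CPU_frequency,Price\n0.5,1000\n0.75,1500\n")
df_demo = load_dataset(sample_csv)
print(df_demo.shape)  # (2, 2)
```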


In [3]:
from pyodide.http import pyfetch

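async def download(url, filename):
    # Fetch the file and write it to the JupyterLite file system
    response = await pyfetch(url)
    if response.status != 200:
        # Fail loudly instead of silently skipping the write
        raise RuntimeError(f"Download failed with HTTP status {response.status}")
    with open(filename, "wb") as f:
        f.write(await response.bytes())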

path = URL

await download(path, "dataset.csv")
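
The `pyodide` module used above exists only in JupyterLite. As a rough sketch of the same download step for a regular Python kernel, assuming only the standard library is available, something like the following could be used. `download_sync` is a hypothetical name introduced here and is not part of the lab.

```python
import shutil
import urllib.request


def download_sync(url, filename):
    """Fetch `url` and write the response body to `filename`."""
    # urlopen raises an exception (e.g. HTTPError) if the request fails,
    # so a failed download does not leave a silently missing file
    with urllib.request.urlopen(url) as response, open(filename, "wb") as f:
        shutil.copyfileobj(response, f)
    return filename


# Example call (requires network access), using the URL defined earlier:
# download_sync(URL, "dataset.csv")
```

The rest of the notebook can then read `dataset.csv` exactly as it does in JupyterLite.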

---


# Test Environment


In [17]:
## Building the prompt: Importing data set
# PROMPT 1: Write a Python code that can perform the following tasks.
# Read the CSV file, located on a given file path, into a pandas data frame, assuming that the first row of the file can be used as the headers for the data.

# Import the pandas library
import pandas as pd

# Read the CSV file into a DataFrame
# The header parameter is set to 0, meaning the first row will be taken as column headers
df = pd.read_csv("dataset.csv", header=0)

# To verify, let's print the DataFrame
print(df.head())

# Show column dtypes and non-null counts (reveals columns with NaN values)
df.info()

## Linear regression in one variable
# PROMPT 2: Write a Python code that performs the following tasks.
# 1. Develops and trains a linear regression model that uses one attribute of a data frame as the source variable and another as a target variable.
# 2. Calculate and display the MSE and R^2 values for the trained model

# Import necessary libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Define feature (X) and target (y)
X = df[['CPU_frequency']]
y = df['Price']

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create and train the Linear Regression model
model_1 = LinearRegression()
model_1.fit(X_train, y_train)

# Predict on the test set
y_pred_1 = model_1.predict(X_test)

# Calculate and print Mean Squared Error (MSE)
mse = mean_squared_error(y_test, y_pred_1)
print(f'Mean Squared Error for the model with 1 feature: {mse}')

# Calculate and print R^2 (coefficient of determination)
r2 = r2_score(y_test, y_pred_1)
print(f'R^2 Score for the model with 1 feature: {r2}')

## Linear regression in multiple variables
# PROMPT 3: Write a Python code that performs the following tasks.
# 1. Develops and trains a linear regression model that uses some attributes of a data frame as the source variables and one of the attributes as a target variable.
# 2. Calculate and display the MSE and R^2 values for the trained model.

# Define the features (X) and target (y)
X = df[['CPU_frequency', 'RAM_GB', 'Storage_GB_SSD', 'CPU_core', 'OS', 'GPU', 'Category']]
y = df['Price']

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create and train the Linear Regression model
model_2 = LinearRegression()
model_2.fit(X_train, y_train)

# Predict on the test set
y_pred_2 = model_2.predict(X_test)

# Calculate and print Mean Squared Error (MSE)
mse = mean_squared_error(y_test, y_pred_2)
print(f'Mean Squared Error for the model with 7 features: {mse}')

# Calculate and print R^2 (coefficient of determination)
r2 = r2_score(y_test, y_pred_2)
print(f'R^2 Score for the model with 7 features: {r2}')

## Polynomial regression
# PROMPT 4: Write a Python code that performs the following tasks.
# 1. Develops and trains multiple polynomial regression models, with orders 2, 3, and 5, that use one attribute of a data frame as the source variable and another as a target variable.
# 2. Calculate and display the MSE and R^2 values for the trained models.
# 3. Compare the performance of the models.

# Import necessary libraries
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Assumes the DataFrame `df` has already been loaded from dataset.csv

# Define features (X) and target (y)
X = df[['CPU_frequency']]
y = df['Price']

# Define model orders
degrees = [2, 3, 5]

# Function to train and evaluate polynomial models
def evaluate_polynomial_model(degree):
    polynomial_features = PolynomialFeatures(degree=degree)
    X_poly = polynomial_features.fit_transform(X)

    # Split the data
    X_train, X_test, y_train, y_test = train_test_split(X_poly, y, test_size=0.2, random_state=42)

    # Create and train the Polynomial Regression model
    model_poly = LinearRegression()
    model_poly.fit(X_train, y_train)

    # Predict on the test set
    y_pred_poly = model_poly.predict(X_test)

    # Calculate and return MSE and R^2
    mse = mean_squared_error(y_test, y_pred_poly)
    r2 = r2_score(y_test, y_pred_poly)
    return mse, r2

# Evaluate each polynomial model
results = {}
for degree in degrees:
    mse, r2 = evaluate_polynomial_model(degree)
    results[f'Degree {degree}'] = {'MSE': mse, 'R^2': r2}

# Display results
for deg, metrics in results.items():
    print(f'{deg} Polynomial Regression Model Metrics: MSE = {metrics["MSE"]:.4f}, R^2 = {metrics["R^2"]:.4f}')

# Compare performances
best_model = min(results, key=lambda x: results[x]['MSE'])
print(f'\nBest Performing Polynomial Regression Model is {best_model} with MSE = {results[best_model]["MSE"]:.4f} and R^2 = {results[best_model]["R^2"]:.4f}')


## Creating a Pipeline
# PROMPT 5: Write a Python code that performs the following tasks.
# 1. Create a pipeline that performs parameter scaling, Polynomial Feature generation, and Linear regression. Use the set of multiple features as before to create this pipeline.
# 2. Calculate and display the MSE and R^2 values for the trained model.

# Import necessary libraries
import pandas as pd
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

# Define features and target
X = df[['CPU_frequency', 'RAM_GB', 'Storage_GB_SSD', 'CPU_core', 'OS', 'GPU', 'Category']]
y = df['Price']
# Create the pipeline
pipeline = Pipeline([
    ('scaler', StandardScaler()),  # Scale the features
    ('poly', PolynomialFeatures(degree=2, include_bias=False)),  # Generate Polynomial Features
    ('linear', LinearRegression())  # Use Linear Regression
])

# Split dataset into training set and test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit the pipeline on the training data
pipeline.fit(X_train, y_train)

# Make predictions
y_pred_pipe = pipeline.predict(X_test)

# Calculate MSE and R^2
mse = mean_squared_error(y_test, y_pred_pipe)
r2 = r2_score(y_test, y_pred_pipe)

# Print the results
print(f"Pipeline Mean Squared Error (MSE): {mse}")
print(f"Pipeline R^2 Score: {r2}")

## Grid search and Ridge regression
# PROMPT 6: Write a Python code that performs the following tasks.
# 1. Use polynomial features for some of the attributes of a data frame.
# 2. Perform Grid search on a ridge regression model for a set of values of hyperparameter alpha and polynomial features as input.
# 3. Use cross-validation in the Grid search.
# 4. Evaluate the resulting model's MSE and R^2 values.
# Set of values for alpha: 0.0001,0.001,0.01, 0.1, 1, 10
# Cross Validation: 4-fold
# Polynomial Feature order: 2

# Import necessary libraries
import pandas as pd
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.metrics import mean_squared_error, r2_score

# Define features and target
X = df[['CPU_frequency', 'RAM_GB', 'Storage_GB_SSD', 'CPU_core', 'OS', 'GPU', 'Category']]
y = df['Price']

# Specify the pipeline stages
poly = PolynomialFeatures(degree=2, include_bias=False)  # Create quadratic features
ridge = Ridge()

pipeline = Pipeline([
    ('poly', poly),
    ('ridge', ridge)
])

# Define the parameter grid for GridSearchCV
param_grid = {
    'ridge__alpha': [0.0001, 0.001, 0.01, 0.1, 1, 10]  # Alpha values spanning 5 orders of magnitude (1e-4 to 10)
}

# Set up GridSearchCV with 4-fold cross-validation
grid_search = GridSearchCV(estimator=pipeline, param_grid=param_grid, cv=4)

# Split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Run Grid Search
grid_search.fit(X_train, y_train)

# Access the best model
best_model = grid_search.best_estimator_

# Predict on the test set
y_pred = best_model.predict(X_test)

# Calculate MSE and R^2 of the best model
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

# Print out results
print(f"Best Alpha: {grid_search.best_params_['ridge__alpha']}")
print(f"Mean Squared Error (MSE): {mse}")
print(f"R^2 Score: {r2}")



   Unnamed: 0.1  Unnamed: 0 Manufacturer  Category  GPU  OS  CPU_core  \
0             0           0         Acer         4    2   1         5   
1             1           1         Dell         3    1   1         3   
2             2           2         Dell         3    1   1         7   
3             3           3         Dell         4    2   1         5   
4             4           4           HP         4    2   1         7   

   Screen_Size_inch  CPU_frequency  RAM_GB  Storage_GB_SSD  Weight_pounds  \
0              14.0       0.551724       8             256        3.52800   
1              15.6       0.689655       4             256        4.85100   
2              15.6       0.931034       8             256        4.85100   
3              13.3       0.551724       8             128        2.69010   
4              15.6       0.620690       8             256        4.21155   

   Price Price-binned  Screen-Full_HD  Screen-IPS_panel  
0    978          Low               0   
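
Every prompt in the test cell asks for MSE and R^2 values. To make those two metrics concrete, here is a small pure-Python sketch of the standard definitions (mean of squared errors, and one minus the ratio of residual to total sum of squares). `mse_and_r2` is a hypothetical helper and the numbers are made up for illustration; they are not results from the laptop dataset.

```python
def mse_and_r2(y_true, y_pred):
    """Compute mean squared error and R^2 from two equal-length sequences."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    # Residual sum of squares: squared gaps between actual and predicted values
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: squared gaps between actual values and their mean
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return ss_res / n, 1 - ss_res / ss_tot


# Made-up values for illustration only
mse_demo, r2_demo = mse_and_r2([3.0, 5.0, 7.0], [2.5, 5.0, 7.5])
print(f"MSE = {mse_demo:.4f}, R^2 = {r2_demo:.4f}")  # MSE = 0.1667, R^2 = 0.9375
```

A lower MSE and an R^2 closer to 1 both indicate predictions that track the target more closely, which is the basis for comparing the models in the test cell.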

## Authors


[Abhishek Gagneja](https://www.linkedin.com/in/abhishek-gagneja-23051987/)


## Change Log


|Date (YYYY-MM-DD)|Version|Changed By|Change Description|
|-|-|-|-|
|2023-12-10|0.1|Abhishek Gagneja|Initial Draft created|


Copyright © 2023 IBM Corporation. All rights reserved.
