<p style="text-align:center">
    <a href="https://skills.network" target="_blank">
    <img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/assets/logos/SN_web_lightmode.png" width="300" alt="Skills Network Logo">
    </a>
</p>


# Test Environment for Generative AI classroom labs

This lab provides a test environment for the code generated in the Generative AI classroom.

Follow the instructions below to set up this environment for further use.


# Setup


### Install required libraries

If your task requires additional Python libraries, install them as shown below.


In [1]:
%pip install seaborn

### Dataset URL from the GenAI lab
Use the URL provided in the GenAI lab in the cell below. 


In [2]:
URL = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DA0101EN-Coursera/laptop_pricing_dataset_mod1.csv"

### Downloading the dataset

Execute the following code to download the dataset into the interface.

> Please note that this step is essential in JupyterLite. If you are running a downloaded copy of this notebook in JupyterLab, you can skip this step and pass the URL directly to the pandas.read_csv() function to read the dataset as a data frame.
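
One way to let the same notebook run in both environments is to pick the data source at run time. The sketch below is an illustration, not part of the lab: `running_in_pyodide` and `dataset_source` are hypothetical helper names, and the check assumes that Pyodide (the Python runtime behind JupyterLite) reports `sys.platform` as `"emscripten"`.

```python
import sys


def running_in_pyodide() -> bool:
    # Assumption: Pyodide-based kernels such as JupyterLite report "emscripten" here
    return sys.platform == "emscripten"


def dataset_source(url: str, local_copy: str = "dataset.csv") -> str:
    # In the browser, read the file that the download step saved locally;
    # elsewhere, hand the URL straight to pandas.read_csv()
    return local_copy if running_in_pyodide() else url
```

The returned string can then be passed to pandas.read_csv() in place of a hard-coded file path.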


In [3]:
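from pyodide.http import pyfetch

async def download(url, filename):
    # Fetch the file over HTTP and save it to the notebook's local file system
    response = await pyfetch(url)
    if response.status != 200:
        # Fail loudly instead of silently skipping the write
        raise RuntimeError(f"Download failed with HTTP status {response.status}")
    with open(filename, "wb") as f:
        f.write(await response.bytes())

path = URL

await download(path, "dataset.csv")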

---


# Test Environment


In [7]:
# Keep appending the code generated to this cell, or add more cells below this to execute in parts

import pandas as pd

# Specify the file path of the CSV file
file_path = "dataset.csv"

# Read the CSV file into a Pandas data frame
data_frame = pd.read_csv(file_path)


In [8]:
# Check for missing values in each column
missing_values = data_frame.isnull().sum()

# Get the columns with missing values
columns_with_missing_values = missing_values[missing_values > 0].index.tolist()

# Print the columns with missing values
print("Columns with missing values:", columns_with_missing_values)

Columns with missing values: ['Screen_Size_cm', 'Weight_kg']


In [9]:
# Replace missing values in the "Screen_Size_cm" column with the most frequent value
# (assign the result back; calling fillna(inplace=True) on a selected column is deprecated in recent pandas)
most_frequent_value = data_frame["Screen_Size_cm"].mode()[0]
data_frame["Screen_Size_cm"] = data_frame["Screen_Size_cm"].fillna(most_frequent_value)

# Replace missing values in the "Weight_kg" column with the mean value
mean_value = data_frame["Weight_kg"].mean()
data_frame["Weight_kg"] = data_frame["Weight_kg"].fillna(mean_value)
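
The two imputation strategies used above can be tried in isolation on a small frame. This is a minimal sketch with made-up values and hypothetical column names (`size`, `weight`), not the lab's dataset. A plausible rationale for the split is that screen sizes cluster around a few standard values, so the mode is a natural fill, while weight is continuous, so the mean is a reasonable fill.

```python
import numpy as np
import pandas as pd

# Toy frame with one gap per column (values are illustrative only)
toy = pd.DataFrame({
    "size": [35.56, 39.624, np.nan, 39.624],
    "weight": [1.6, 2.2, 1.8, np.nan],
})

# Mode imputation: fill the gap with the most frequent observed value
toy["size"] = toy["size"].fillna(toy["size"].mode()[0])

# Mean imputation: fill the gap with the average of the observed values
toy["weight"] = toy["weight"].fillna(toy["weight"].mean())

print(toy)
```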

In [10]:
# Check for missing values in each column
missing_values = data_frame.isnull().sum()

# Get the columns with missing values
columns_with_missing_values = missing_values[missing_values > 0].index.tolist()

# Print the columns with missing values
print("Columns with missing values:", columns_with_missing_values)

Columns with missing values: []


In [11]:
# Change the data type of the "Screen_Size_cm" attribute to float
data_frame["Screen_Size_cm"] = data_frame["Screen_Size_cm"].astype(float)

# Change the data type of the "Weight_kg" attribute to float
data_frame["Weight_kg"] = data_frame["Weight_kg"].astype(float)

In [12]:
data_frame.head(3)


Unnamed: 0.1,Unnamed: 0,Manufacturer,Category,Screen,GPU,OS,CPU_core,Screen_Size_cm,CPU_frequency,RAM_GB,Storage_GB_SSD,Weight_kg,Price
0,0,Acer,4,IPS Panel,2,1,5,35.56,1.6,8,256,1.6,978
1,1,Dell,3,Full HD,1,1,3,39.624,2.0,4,256,2.2,634
2,2,Dell,3,Full HD,1,1,7,39.624,2.7,8,256,2.2,946


In [13]:
# Convert the data under the 'Screen_Size_cm' attribute from centimeters to inches (1 cm = 0.393701 in)
# The conversion is done in place so that the rename below does not create a duplicate column name
data_frame['Screen_Size_cm'] = data_frame['Screen_Size_cm'] * 0.393701

# Modify the name of the attribute from 'Screen_Size_cm' to 'Screen_Size_inch'
data_frame = data_frame.rename(columns={'Screen_Size_cm': 'Screen_Size_inch'})

# Convert the data under the 'Weight_kg' attribute from kilograms to pounds (1 kg = 2.20462 lb)
data_frame['Weight_kg'] = data_frame['Weight_kg'] * 2.20462

# Modify the name of the attribute from 'Weight_kg' to 'Weight_pounds'
data_frame = data_frame.rename(columns={'Weight_kg': 'Weight_pounds'})

data_frame.head(2)

Unnamed: 0.1,Unnamed: 0,Manufacturer,Category,Screen,GPU,OS,CPU_core,Screen_Size_inch,CPU_frequency,RAM_GB,Storage_GB_SSD,Weight_pounds,Price
0,0,Acer,4,IPS Panel,2,1,5,14.000008,1.6,8,256,3.527392,978
1,1,Dell,3,Full HD,1,1,3,15.600008,2.0,4,256,4.850164,634


In [15]:
# Normalize the content under the "CPU_frequency" attribute by dividing by its maximum value
data_frame["CPU_frequency"] = data_frame["CPU_frequency"] / data_frame["CPU_frequency"].max()

data_frame.head(2)

Unnamed: 0.1,Unnamed: 0,Manufacturer,Category,Screen,GPU,OS,CPU_core,Screen_Size_inch,CPU_frequency,RAM_GB,Storage_GB_SSD,Weight_pounds,Price
0,0,Acer,4,IPS Panel,2,1,5,14.000008,0.551724,8,256,3.527392,978
1,1,Dell,3,Full HD,1,1,3,15.600008,0.689655,4,256,4.850164,634


In [17]:
# Convert the "Screen" attribute into indicator variables
# dtype=int keeps the indicators as 0/1 across pandas versions (newer versions default to booleans)
df1 = pd.get_dummies(data_frame["Screen"], prefix="Screen", dtype=int)

# Append df1 into the original data frame
data_frame = pd.concat([data_frame, df1], axis=1)

# Drop the original "Screen" attribute from the data frame
data_frame.drop("Screen", axis=1, inplace=True)

data_frame.head(2)

Unnamed: 0.1,Unnamed: 0,Manufacturer,Category,GPU,OS,CPU_core,Screen_Size_inch,CPU_frequency,RAM_GB,Storage_GB_SSD,Weight_pounds,Price,Screen_Full HD,Screen_IPS Panel
0,0,Acer,4,2,1,5,14.000008,0.551724,8,256,3.527392,978,0,1
1,1,Dell,3,1,1,3,15.600008,0.689655,4,256,4.850164,634,1,0


In [18]:
# Convert the data under the 'Price' attribute from dollars to Euros
# (uses a fixed rate of 0.85 EUR per USD)
data_frame['Price'] = data_frame['Price'] * 0.85

data_frame.head()

Unnamed: 0.1,Unnamed: 0,Manufacturer,Category,GPU,OS,CPU_core,Screen_Size_inch,CPU_frequency,RAM_GB,Storage_GB_SSD,Weight_pounds,Price,Screen_Full HD,Screen_IPS Panel
0,0,Acer,4,2,1,5,14.000008,0.551724,8,256,3.527392,831.3,0,1
1,1,Dell,3,1,1,3,15.600008,0.689655,4,256,4.850164,538.9,1,0
2,2,Dell,3,1,1,7,15.600008,0.931034,8,256,4.850164,804.1,1,0
3,3,Dell,4,2,1,5,13.300007,0.551724,8,128,2.689636,1057.4,0,1
4,4,HP,4,2,1,7,15.600008,0.62069,8,256,4.210824,711.45,1,0


In [20]:
# Normalize the content under the "CPU_frequency" attribute using min-max normalization
# The column was already divided by its maximum above; min-max scaling gives the same result either way
data_frame["CPU_frequency"] = (data_frame["CPU_frequency"] - data_frame["CPU_frequency"].min()) / (data_frame["CPU_frequency"].max() - data_frame["CPU_frequency"].min())

data_frame.head()

Unnamed: 0.1,Unnamed: 0,Manufacturer,Category,GPU,OS,CPU_core,Screen_Size_inch,CPU_frequency,RAM_GB,Storage_GB_SSD,Weight_pounds,Price,Screen_Full HD,Screen_IPS Panel
0,0,Acer,4,2,1,5,14.000008,0.235294,8,256,3.527392,831.3,0,1
1,1,Dell,3,1,1,3,15.600008,0.470588,4,256,4.850164,538.9,1,0
2,2,Dell,3,1,1,7,15.600008,0.882353,8,256,4.850164,804.1,1,0
3,3,Dell,4,2,1,5,13.300007,0.235294,8,128,2.689636,1057.4,0,1
4,4,HP,4,2,1,7,15.600008,0.352941,8,256,4.210824,711.45,1,0
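
The notebook applies two different scalings to "CPU_frequency": division by the maximum, then min-max normalization. The sketch below compares them on a small series of illustrative values (not read from the dataset); `max_scale` and `min_max_scale` are hypothetical helper names. It also checks that min-max normalization gives the same result whether or not the column was divided by its maximum first, since a positive rescaling cancels out of the min-max formula.

```python
import pandas as pd

# Illustrative CPU frequencies in GHz (assumed values, not the lab's data)
freq = pd.Series([1.6, 2.0, 2.7, 1.2, 2.9], name="CPU_frequency")


def max_scale(s: pd.Series) -> pd.Series:
    # Simple feature scaling: the largest value maps to 1, the smallest stays above 0
    return s / s.max()


def min_max_scale(s: pd.Series) -> pd.Series:
    # Min-max normalization: the smallest value maps to 0, the largest to 1
    return (s - s.min()) / (s.max() - s.min())


direct = min_max_scale(freq)              # min-max on the raw values
chained = min_max_scale(max_scale(freq))  # min-max after dividing by the maximum

print(pd.DataFrame({"max_scaled": max_scale(freq), "min_max": direct}))
print("Largest difference between direct and chained:", (direct - chained).abs().max())
```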


## Authors


[Abhishek Gagneja](https://www.linkedin.com/in/abhishek-gagneja-23051987/)


## Change Log


|Date (YYYY-MM-DD)|Version|Changed By|Change Description|
|-|-|-|-|
|2023-12-10|0.1|Abhishek Gagneja|Initial Draft created|


Copyright © 2023 IBM Corporation. All rights reserved.
