<p style="text-align:center">
    <a href="https://skills.network" target="_blank">
    <img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/assets/logos/SN_web_lightmode.png" width="300" alt="Skills Network Logo">
    </a>
</p>


# Test Environment for Generative AI classroom labs

This lab provides a test environment for the code generated in the Generative AI classroom.

Follow the instructions below to set up this environment for further use.


# Setup


### Install required libraries

If your task requires additional Python libraries, install them as shown below.


In [2]:
%pip install seaborn
import piplite

await piplite.install(['nbformat', 'plotly'])

### Dataset URL from the GenAI lab
Use the URL provided in the GenAI lab in the cell below. 


In [3]:
URL = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DA0101EN-Coursera/laptop_pricing_dataset_mod1.csv"

### Downloading the dataset

Execute the following code to download the dataset into the notebook environment.

> Please note that this step is essential in JupyterLite. If you are running a downloaded copy of this notebook in JupyterLab, you can skip this step and pass the URL directly to the pandas.read_csv() function to read the dataset as a data frame.


In [4]:
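from pyodide.http import pyfetch

async def download(url, filename):
    """Fetch url and save the response body to filename."""
    response = await pyfetch(url)
    if response.status != 200:
        # Fail loudly instead of silently leaving the file unwritten
        raise RuntimeError(f"Download failed with HTTP status {response.status}")
    with open(filename, "wb") as f:
        f.write(await response.bytes())

path = URL

# Save the dataset locally so pandas can read it by file name
await download(path, "dataset.csv")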
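
When the notebook runs outside JupyterLite, the download step is not needed because `pandas.read_csv()` accepts a URL as well as a local path or a file-like object. The following is a minimal sketch of that idea; the helper name `load_dataset` is a hypothetical wrapper, and the two in-memory sample rows stand in for the real file so the sketch runs without network access.

```python
import io

import pandas as pd


def load_dataset(source):
    """Read a CSV from a URL, a local path, or a file-like object."""
    return pd.read_csv(source)


# In JupyterLab the lab's URL variable could be passed directly:
#     df = load_dataset(URL)
# Here a small in-memory sample (illustrative only) replaces the download.
sample = io.StringIO("Manufacturer,Price\nAcer,978\nDell,634\n")
sample_df = load_dataset(sample)
print(sample_df.shape)
```

The same call therefore works for both setups: pass `"dataset.csv"` in JupyterLite after downloading, or the URL itself elsewhere.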

---


# Test Environment


In [None]:
# Keep appending the code generated to this cell, or add more cells below this to execute in parts


In [16]:
import pandas as pd
file_name="dataset.csv"
# Read the CSV file into a Pandas data frame
df = pd.read_csv(file_name)

# Display the data frame
df.head()

Unnamed: 0.1,Unnamed: 0,Manufacturer,Category,Screen,GPU,OS,CPU_core,Screen_Size_cm,CPU_frequency,RAM_GB,Storage_GB_SSD,Weight_kg,Price
0,0,Acer,4,IPS Panel,2,1,5,35.56,1.6,8,256,1.6,978
1,1,Dell,3,Full HD,1,1,3,39.624,2.0,4,256,2.2,634
2,2,Dell,3,Full HD,1,1,7,39.624,2.7,8,256,2.2,946
3,3,Dell,4,IPS Panel,2,1,5,33.782,1.6,8,128,1.22,1244
4,4,HP,4,Full HD,2,1,7,39.624,1.8,8,256,1.91,837


In [17]:
# Identify columns with missing values
missing_values = df.columns[df.isnull().any()].tolist()
print("Columns with missing values:", missing_values)

Columns with missing values: ['Screen_Size_cm', 'Weight_kg']
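
Knowing how many values are missing in each column can help when choosing an imputation strategy. The sketch below counts missing entries per column on a small toy data frame; the data frame `toy` and its values are hypothetical and only mimic the shape of the lab dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with gaps in two columns
toy = pd.DataFrame({
    "Screen_Size_cm": [35.56, np.nan, 39.624],
    "Weight_kg": [1.6, 2.2, np.nan],
    "Price": [978, 634, 946],
})

# Number of missing entries in each column
missing_counts = toy.isnull().sum()

# Keep only the columns that have at least one missing entry
columns_with_missing = missing_counts[missing_counts > 0].index.tolist()

print(missing_counts)
print("Columns with missing values:", columns_with_missing)
```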


In [18]:
# Replace missing values in the "Screen_Size_cm" column with the most frequent value
most_frequent_screen_size = df['Screen_Size_cm'].mode()[0]
df['Screen_Size_cm'] = df['Screen_Size_cm'].fillna(most_frequent_screen_size)

# Replace missing values in the "Weight_kg" column with the mean value
mean_weight = df['Weight_kg'].mean()
df['Weight_kg'] = df['Weight_kg'].fillna(mean_weight)

df.head()


Unnamed: 0.1,Unnamed: 0,Manufacturer,Category,Screen,GPU,OS,CPU_core,Screen_Size_cm,CPU_frequency,RAM_GB,Storage_GB_SSD,Weight_kg,Price
0,0,Acer,4,IPS Panel,2,1,5,35.56,1.6,8,256,1.6,978
1,1,Dell,3,Full HD,1,1,3,39.624,2.0,4,256,2.2,634
2,2,Dell,3,Full HD,1,1,7,39.624,2.7,8,256,2.2,946
3,3,Dell,4,IPS Panel,2,1,5,33.782,1.6,8,128,1.22,1244
4,4,HP,4,Full HD,2,1,7,39.624,1.8,8,256,1.91,837
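
Both columns can also be filled in a single call by passing a column-to-value mapping to `DataFrame.fillna`. This is a minimal sketch on hypothetical toy data, not the lab dataset; it returns a new data frame rather than modifying a column selection in place.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one gap per column
toy = pd.DataFrame({
    "Screen_Size_cm": [39.624, 39.624, np.nan, 35.56],
    "Weight_kg": [1.0, 2.0, np.nan, 3.0],
})

# One fill value per column: mode for screen size, mean for weight.
# mode() returns a Series, so its first entry is taken.
fill_values = {
    "Screen_Size_cm": toy["Screen_Size_cm"].mode()[0],
    "Weight_kg": toy["Weight_kg"].mean(),
}

# fillna with a dict fills each listed column with its own value
toy = toy.fillna(fill_values)
print(toy)
```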


In [19]:
# Ensure both columns use the float data type
df['Screen_Size_cm'] = df['Screen_Size_cm'].astype(float)
df['Weight_kg'] = df['Weight_kg'].astype(float)

print(df.dtypes)

Unnamed: 0          int64
Manufacturer       object
Category            int64
Screen             object
GPU                 int64
OS                  int64
CPU_core            int64
Screen_Size_cm    float64
CPU_frequency     float64
RAM_GB              int64
Storage_GB_SSD      int64
Weight_kg         float64
Price               int64
dtype: object


In [20]:
# Convert 'Screen_Size_cm' from centimeters to inches and rename the column
df['Screen_Size_inch'] = df['Screen_Size_cm'] * 0.393701
df.drop(columns=['Screen_Size_cm'], inplace=True)

# Convert 'Weight_kg' from kilograms to pounds and rename the column
df['Weight_pounds'] = df['Weight_kg'] * 2.20462
df.drop(columns=['Weight_kg'], inplace=True)
df.head()

Unnamed: 0.1,Unnamed: 0,Manufacturer,Category,Screen,GPU,OS,CPU_core,CPU_frequency,RAM_GB,Storage_GB_SSD,Price,Screen_Size_inch,Weight_pounds
0,0,Acer,4,IPS Panel,2,1,5,1.6,8,256,978,14.000008,3.527392
1,1,Dell,3,Full HD,1,1,3,2.0,4,256,634,15.600008,4.850164
2,2,Dell,3,Full HD,1,1,7,2.7,8,256,946,15.600008,4.850164
3,3,Dell,4,IPS Panel,2,1,5,1.6,8,128,1244,13.300007,2.689636
4,4,HP,4,Full HD,2,1,7,1.8,8,256,837,15.600008,4.210824
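
An alternative way to express the same conversion is to scale the values in place and then rename the columns, which keeps them in their original position instead of moving them to the end. The sketch below assumes a hypothetical two-row data frame; the constant names `CM_PER_INCH` and `LB_PER_KG` are illustrative. Dividing by 2.54 uses the exact definition of the inch, which avoids the small rounding residue that comes from multiplying by 0.393701.

```python
import pandas as pd

CM_PER_INCH = 2.54    # exact by definition of the inch
LB_PER_KG = 2.20462   # rounded factor, same as used in the lab

# Hypothetical toy data taken from the first rows shown in the lab
toy = pd.DataFrame({
    "Screen_Size_cm": [35.56, 39.624],
    "Weight_kg": [1.6, 2.2],
})

# Convert the values first, then rename so the column order is preserved
toy["Screen_Size_cm"] = toy["Screen_Size_cm"] / CM_PER_INCH
toy["Weight_kg"] = toy["Weight_kg"] * LB_PER_KG
toy = toy.rename(columns={
    "Screen_Size_cm": "Screen_Size_inch",
    "Weight_kg": "Weight_pounds",
})
print(toy.round(3))
```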


In [21]:
# Normalize 'CPU_frequency' by dividing each value by the column maximum
max_cpu_frequency = df['CPU_frequency'].max()
df['CPU_frequency'] = df['CPU_frequency'] / max_cpu_frequency

df.head()

Unnamed: 0.1,Unnamed: 0,Manufacturer,Category,Screen,GPU,OS,CPU_core,CPU_frequency,RAM_GB,Storage_GB_SSD,Price,Screen_Size_inch,Weight_pounds
0,0,Acer,4,IPS Panel,2,1,5,0.551724,8,256,978,14.000008,3.527392
1,1,Dell,3,Full HD,1,1,3,0.689655,4,256,634,15.600008,4.850164
2,2,Dell,3,Full HD,1,1,7,0.931034,8,256,946,15.600008,4.850164
3,3,Dell,4,IPS Panel,2,1,5,0.551724,8,128,1244,13.300007,2.689636
4,4,HP,4,Full HD,2,1,7,0.62069,8,256,837,15.600008,4.210824
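
Dividing by the maximum maps the largest value to 1 and keeps the ratios between values unchanged. The sketch below illustrates this on a short hypothetical Series; the maximum of 2.9 is an assumption chosen to be consistent with the scaled values displayed above, not a value stated in the lab.

```python
import pandas as pd

# Hypothetical frequencies; 2.9 is an assumed column maximum
freq = pd.Series([1.6, 2.0, 2.7, 2.9], name="CPU_frequency")

# Simple feature scaling: every value becomes a fraction of the maximum
scaled = freq / freq.max()
print(scaled.round(6).tolist())
```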


In [22]:
# Convert 'Screen' attribute into indicator variables
df1 = pd.get_dummies(df['Screen'], prefix='Screen')

# Append df1 into the original data frame df
df = pd.concat([df, df1], axis=1)

# Drop the original 'Screen' attribute from the data frame df
df.drop(columns=['Screen'], inplace=True)

df.head()

Unnamed: 0.1,Unnamed: 0,Manufacturer,Category,GPU,OS,CPU_core,CPU_frequency,RAM_GB,Storage_GB_SSD,Price,Screen_Size_inch,Weight_pounds,Screen_Full HD,Screen_IPS Panel
0,0,Acer,4,2,1,5,0.551724,8,256,978,14.000008,3.527392,False,True
1,1,Dell,3,1,1,3,0.689655,4,256,634,15.600008,4.850164,True,False
2,2,Dell,3,1,1,7,0.931034,8,256,946,15.600008,4.850164,True,False
3,3,Dell,4,2,1,5,0.551724,8,128,1244,13.300007,2.689636,False,True
4,4,HP,4,2,1,7,0.62069,8,256,837,15.600008,4.210824,True,False
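
The three steps above (encode, append, drop) can also be done in one call by passing the `columns` argument to `pd.get_dummies`, and `dtype=int` yields 0/1 indicators instead of booleans. This is a minimal sketch on hypothetical toy data, offered as an alternative rather than as the lab's method.

```python
import pandas as pd

# Hypothetical toy data with one categorical column
toy = pd.DataFrame({
    "Manufacturer": ["Acer", "Dell", "HP"],
    "Screen": ["IPS Panel", "Full HD", "Full HD"],
})

# Encode 'Screen', drop the original column, and keep the rest unchanged
encoded = pd.get_dummies(toy, columns=["Screen"], dtype=int)
print(encoded)
```

The indicator columns are named with the original column name as prefix, so they match the `Screen_Full HD` and `Screen_IPS Panel` names produced by the step-by-step version.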


In [23]:
# Convert 'Price' from USD to Euros and rename the column
df['Euro_Price'] = df['Price'] * 0.85  # Assuming 1 USD = 0.85 Euros
df.drop(columns=['Price'], inplace=True)

In [24]:
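# Apply min-max scaling to 'CPU_frequency' so values span 0 to 1.
# The column was already divided by its maximum above; that earlier
# rescaling does not change the min-max result.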
min_cpu_frequency = df['CPU_frequency'].min()
max_cpu_frequency = df['CPU_frequency'].max()
df['CPU_frequency'] = (df['CPU_frequency'] - min_cpu_frequency) / (max_cpu_frequency - min_cpu_frequency)

df.head()

Unnamed: 0.1,Unnamed: 0,Manufacturer,Category,GPU,OS,CPU_core,CPU_frequency,RAM_GB,Storage_GB_SSD,Screen_Size_inch,Weight_pounds,Screen_Full HD,Screen_IPS Panel,Euro_Price
0,0,Acer,4,2,1,5,0.235294,8,256,14.000008,3.527392,False,True,831.3
1,1,Dell,3,1,1,3,0.470588,4,256,15.600008,4.850164,True,False,538.9
2,2,Dell,3,1,1,7,0.882353,8,256,15.600008,4.850164,True,False,804.1
3,3,Dell,4,2,1,5,0.235294,8,128,13.300007,2.689636,False,True,1057.4
4,4,HP,4,2,1,7,0.352941,8,256,15.600008,4.210824,True,False,711.45
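
Min-max scaling maps the smallest value to 0 and the largest to 1. Because it is unaffected by multiplying the column by a positive constant beforehand, applying it to a column that was already divided by its maximum gives the same result as applying it to the raw values. The sketch below checks this on hypothetical data; the helper name `min_max` and the sample frequencies are illustrative assumptions.

```python
import numpy as np
import pandas as pd


def min_max(series):
    """Rescale a numeric Series linearly so it spans 0 to 1."""
    return (series - series.min()) / (series.max() - series.min())


# Hypothetical raw frequencies
raw = pd.Series([1.2, 1.6, 2.0, 2.9])

# Same data after dividing by the maximum first
pre_scaled = raw / raw.max()

direct = min_max(raw)
two_step = min_max(pre_scaled)

# Both routes agree up to floating-point rounding
print(np.allclose(direct, two_step))
```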


## Authors


[Abhishek Gagneja](https://www.linkedin.com/in/abhishek-gagneja-23051987/)


## Change Log


|Date (YYYY-MM-DD)|Version|Changed By|Change Description|
|-|-|-|-|
|2023-12-10|0.1|Abhishek Gagneja|Initial Draft created|


Copyright © 2023 IBM Corporation. All rights reserved.
