<p style="text-align:center">
    <a href="https://skills.network" target="_blank">
    <img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/assets/logos/SN_web_lightmode.png" width="300" alt="Skills Network Logo">
    </a>
</p>


# Test Environment for Generative AI classroom labs

This lab provides a test environment for code generated in the Generative AI classroom.

Follow the instructions below to set up this environment for further use.


# Setup


### Install required libraries

If your task requires additional Python libraries, install them as shown below.


In [1]:
%pip install seaborn
import piplite

await piplite.install(['nbformat', 'plotly'])

### Dataset URL from the GenAI lab
Use the URL provided in the GenAI lab in the cell below. 


In [2]:
URL = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DA0101EN-Coursera/laptop_pricing_dataset_mod1.csv"

### Downloading the dataset

Execute the following code to download the dataset into the interface.

> Please note that this step is essential in JupyterLite. If you are running a downloaded copy of this notebook in JupyterLab, you can skip this step and pass the URL directly to `pandas.read_csv()` to read the dataset as a dataframe.

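
For the JupyterLab case described in the note above, `pandas.read_csv` accepts a URL as well as a local path or file-like object, so the download helper is not needed there. The sketch below is illustrative only: `sample_csv` is a made-up two-row stand-in so the call can run without network access. In JupyterLab you would pass `URL` in its place.

```python
import io
import pandas as pd

# Hypothetical stand-in for the remote file; in JupyterLab use pd.read_csv(URL, header=0)
sample_csv = io.StringIO("Manufacturer,Price\nAcer,978\nDell,634\n")

# header=0 treats the first row as the column names
sample_df = pd.read_csv(sample_csv, header=0)
print(sample_df)
```
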

In [3]:
from pyodide.http import pyfetch

async def download(url, filename):
    response = await pyfetch(url)
    if response.status == 200:
        with open(filename, "wb") as f:
            f.write(await response.bytes())

path = URL

await download(path, "dataset.csv")
file_name  = "dataset.csv"

---


# Test Environment


In [None]:
#AI instructions for the type of code to generate

# "Generate plain basic Python code without using definitions, exceptions, or if-else statements. Response should be concise."

In [35]:
# Keep appending the code generated to this cell, or add more cells below this to execute in parts
#Write a Python code that can perform the following tasks:
#Read the CSV file, located on a given file path, into a Pandas data frame, assuming that the first row of the file contains the headers for the data.

import pandas as pd

# Path to the CSV file (update this to your actual file path)
file_path = file_name

# Read the CSV into a DataFrame, treating the first row as the header
df = pd.read_csv(file_path, header=0)

df.head()

| |Unnamed: 0|Manufacturer|Category|Screen|GPU|OS|CPU_core|Screen_Size_cm|CPU_frequency|RAM_GB|Storage_GB_SSD|Weight_kg|Price|
|-|-|-|-|-|-|-|-|-|-|-|-|-|-|
|0|0|Acer|4|IPS Panel|2|1|5|35.56|1.6|8|256|1.6|978|
|1|1|Dell|3|Full HD|1|1|3|39.624|2.0|4|256|2.2|634|
|2|2|Dell|3|Full HD|1|1|7|39.624|2.7|8|256|2.2|946|
|3|3|Dell|4|IPS Panel|2|1|5|33.782|1.6|8|128|1.22|1244|
|4|4|HP|4|Full HD|2|1|7|39.624|1.8|8|256|1.91|837|


In [36]:
#Write a Python code that identifies the columns with missing values in a pandas data frame and gives missing value counts per column.

# Compute missing value counts per column
missing_counts = df.isnull().sum()

# Identify columns with at least one missing value
columns_with_missing = missing_counts[missing_counts > 0].index.tolist()

# Output results
print("Missing value counts per column:")
print(missing_counts)

print("Columns with missing values:")
print(columns_with_missing)


Missing value counts per column:
Unnamed: 0        0
Manufacturer      0
Category          0
Screen            0
GPU               0
OS                0
CPU_core          0
Screen_Size_cm    4
CPU_frequency     0
RAM_GB            0
Storage_GB_SSD    0
Weight_kg         5
Price             0
dtype: int64
Columns with missing values:
['Screen_Size_cm', 'Weight_kg']


In [37]:
#Write a Python code to replace the missing values in a pandas data frame, per the following guidelines.
#1. For a categorical attribute "Screen_Size_cm", replace the missing values with the most frequent value in the column.
#2. For a continuous value attribute "Weight_kg", replace the missing values with the mean value of the entries in the column.

# 1) Replace missing Screen_Size_cm values with the most frequent value (mode)
most_freq_screen = df['Screen_Size_cm'].mode().iloc[0]

# 2) Replace missing Weight_kg values with the mean of the column
mean_weight = df['Weight_kg'].mean()

# Apply replacements for both columns in a single operation
df.fillna({'Screen_Size_cm': most_freq_screen, 'Weight_kg': mean_weight}, inplace=True)

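
The imputation logic in the cell above can be checked on a small frame. This is a minimal sketch on a hypothetical toy frame (`toy` is not the lab dataset). Two behaviours are worth noting: `Series.mode()` returns its values sorted, so `.iloc[0]` picks the smallest one when several values tie, and `Series.mean()` skips missing entries by default.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with one missing value per column
toy = pd.DataFrame({
    "Screen_Size_cm": [39.624, 39.624, np.nan, 35.56],
    "Weight_kg": [1.0, 2.0, np.nan, 3.0],
})

mode_screen = toy["Screen_Size_cm"].mode().iloc[0]  # most frequent value: 39.624
mean_weight = toy["Weight_kg"].mean()               # NaN is skipped: (1 + 2 + 3) / 3 = 2.0

# Fill each column with its own replacement value
toy = toy.fillna({"Screen_Size_cm": mode_screen, "Weight_kg": mean_weight})
print(toy)
```
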
In [38]:
#Write a Python code snippet to change the data type of the attributes "Screen_Size_cm" and "Weight_kg" of a data frame to float.

# Convert the specified columns to float (coerce non-numeric values to NaN)
df['Screen_Size_cm'] = pd.to_numeric(df['Screen_Size_cm'], errors='coerce')
df['Weight_kg'] = pd.to_numeric(df['Weight_kg'], errors='coerce')

In [39]:
#Write a Python code to modify the contents under the following attributes of the data frame as required.
#1. Data under 'Screen_Size_cm' is assumed to be in centimeters. Convert this data into inches. Modify the name of the attribute to 'Screen_Size_inch'.
#2. Data under 'Weight_kg' is assumed to be in kilograms. Convert this data into pounds. Modify the name of the attribute to 'Weight_pounds'.

# 1) Convert Screen_Size_cm (cm) to Screen_Size_inch (inches) and drop the old column
df['Screen_Size_inch'] = df['Screen_Size_cm'] / 2.54
df.drop(columns=['Screen_Size_cm'], inplace=True)

# 2) Convert Weight_kg (kg) to Weight_pounds (lb) and drop the old column
df['Weight_pounds'] = df['Weight_kg'] * 2.2046226218
df.drop(columns=['Weight_kg'], inplace=True)

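
As a quick check of the conversion factors, the sketch below applies them to the values from row 0 of the first preview (35.56 cm and 1.6 kg). It also shows an alternative, assumed here for illustration only: converting in place and then calling `rename`, which keeps the columns in their original positions, whereas the create-and-drop approach in the cell above appends the new columns at the end of the frame. The names `demo`, `CM_PER_INCH`, and `LB_PER_KG` are hypothetical.

```python
import pandas as pd

CM_PER_INCH = 2.54
LB_PER_KG = 2.2046226218

# Values copied from row 0 of the first preview
demo = pd.DataFrame({"Screen_Size_cm": [35.56], "Weight_kg": [1.6]})

# Convert in place, then rename so the column order is preserved
demo["Screen_Size_cm"] = demo["Screen_Size_cm"] / CM_PER_INCH
demo["Weight_kg"] = demo["Weight_kg"] * LB_PER_KG
demo = demo.rename(columns={"Screen_Size_cm": "Screen_Size_inch",
                            "Weight_kg": "Weight_pounds"})

print(demo.round(6))
```
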
In [40]:
#Write a Python code to normalize the content under the attribute "CPU_frequency" in a data frame df with respect to its maximum value. Make changes to the original data, and do not create a new attribute.

# Normalize the 'CPU_frequency' column in place by its maximum value
# Assumes 'df' is a pre-loaded DataFrame containing a numeric 'CPU_frequency' column
max_val = df['CPU_frequency'].max()
df['CPU_frequency'] = df['CPU_frequency'].div(max_val)

In [41]:
#Write a Python code to perform the following tasks.
#1. Convert a data frame df attribute "Screen", into indicator variables, saved as df1, with the naming convention "Screen_<unique value of the attribute>".
#2. Append df1 into the original data frame df.
#3. Drop the original attribute from the data frame df.

# One-hot encode the 'Screen' column into separate indicator columns
# The resulting df1 will have column names like 'Screen_<category>'
df1 = pd.get_dummies(df['Screen'], prefix='Screen')

# Append the new indicator columns to the original dataframe
df = pd.concat([df, df1], axis=1)

# Drop the original 'Screen' column
df.drop(columns=['Screen'], inplace=True)

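
Recent pandas versions return boolean indicator columns from `get_dummies` by default, which is why the final preview in this notebook shows `True`/`False` values. If 0/1 integers are preferred, the `dtype` argument can request them. A minimal sketch on a hypothetical three-row frame (`demo` and `dummies` are illustrative names, not part of the lab):

```python
import pandas as pd

# Hypothetical toy frame using the two screen types seen in the preview
demo = pd.DataFrame({
    "Screen": ["IPS Panel", "Full HD", "Full HD"],
    "Price": [978, 634, 946],
})

# dtype=int yields 0/1 indicators instead of booleans
dummies = pd.get_dummies(demo["Screen"], prefix="Screen", dtype=int)

# Drop the original column and append the indicator columns
demo = pd.concat([demo.drop(columns=["Screen"]), dummies], axis=1)
print(demo)
```
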
In [42]:
#Convert the values under a df column named Price from USD to Euros

# Convert the 'Price' column from USD to EUR in place
# Update the exchange rate as needed
USD_TO_EUR = 0.92  # example rate: 1 USD = 0.92 EUR

# In-place conversion
df['Price'] = df['Price'] * USD_TO_EUR

In [45]:
#Write a Python code to normalize the content under the attribute "CPU_frequency" in a data frame df with respect to its minimum and maximum values. Make changes to the original data, and do not create a new attribute.

import numpy as np

# In-place min-max normalization of the 'CPU_frequency' column
min_val = df['CPU_frequency'].min()
max_val = df['CPU_frequency'].max()
range_val = max_val - min_val
# Use a safe denominator: 1 when range is 0 to avoid division by zero; this retains 0s when all values are identical
df['CPU_frequency'] = (df['CPU_frequency'] - min_val) / np.where(range_val != 0, range_val, 1)

df.head()

| |Unnamed: 0|Manufacturer|Category|GPU|OS|CPU_core|CPU_frequency|RAM_GB|Storage_GB_SSD|Price|Screen_Size_inch|Weight_pounds|Screen_Full HD|Screen_IPS Panel|
|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|
|0|0|Acer|4|2|1|5|0.235294|8|256|899.76|14.0|3.527396|False|True|
|1|1|Dell|3|1|1|3|0.470588|4|256|583.28|15.6|4.85017|True|False|
|2|2|Dell|3|1|1|7|0.882353|8|256|870.32|15.6|4.85017|True|False|
|3|3|Dell|4|2|1|5|0.235294|8|128|1144.48|13.3|2.68964|False|True|
|4|4|HP|4|2|1|7|0.352941|8|256|770.04|15.6|4.210829|True|False|

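
In this notebook the min-max cell runs after `CPU_frequency` has already been divided by its maximum. That ordering does not change the result, because min-max scaling is unaffected by a prior rescaling with a positive constant. The sketch below illustrates this on four values taken from the first preview. The output differs from the table above because the minimum and maximum here come from these four values only, not from the full column. All variable names are hypothetical.

```python
import pandas as pd

# Four CPU_frequency values from the first preview (not the full column)
raw = pd.Series([1.6, 2.0, 2.7, 1.8], name="CPU_frequency")

# Step 1 in the notebook: scale by the maximum
max_scaled = raw / raw.max()

# Min-max scaling applied to the raw and to the max-scaled values
minmax_raw = (raw - raw.min()) / (raw.max() - raw.min())
minmax_scaled = (max_scaled - max_scaled.min()) / (max_scaled.max() - max_scaled.min())

# The largest absolute difference is zero up to floating-point error
print((minmax_raw - minmax_scaled).abs().max())
```
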

## Authors


[Abhishek Gagneja](https://www.linkedin.com/in/abhishek-gagneja-23051987/)


## Change Log


|Date (YYYY-MM-DD)|Version|Changed By|Change Description|
|-|-|-|-|
|2023-12-10|0.1|Abhishek Gagneja|Initial Draft created|


Copyright © 2023 IBM Corporation. All rights reserved.
