<p style="text-align:center">
    <a href="https://skills.network" target="_blank">
    <img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/assets/logos/SN_web_lightmode.png" width="300" alt="Skills Network Logo">
    </a>
</p>


# Test Environment for Generative AI classroom labs

This lab provides a test environment for code generated in the Generative AI classroom.

Follow the instructions below to set up this environment for further use.


# Setup


### Install required libraries

If your task requires additional Python libraries, install them as shown below.


In [1]:
%pip install seaborn
import piplite

await piplite.install(['nbformat', 'plotly'])

### Dataset URL from the GenAI lab
Use the URL provided in the GenAI lab in the cell below. 


In [2]:
URL = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DA0101EN-Coursera/laptop_pricing_dataset_mod1.csv"

### Downloading the dataset

Execute the following code to download the dataset into the interface.

> Please note that this step is essential in JupyterLite. If you are running a downloaded copy of this notebook in JupyterLab, you can skip this step and pass the URL directly to `pandas.read_csv()` to read the dataset as a DataFrame.
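As a rough sketch of the shortcut mentioned in the note: `pandas.read_csv` accepts a URL as well as a local path or file-like object. The helper name `load_dataset` and the two-row sample are hypothetical, used here only so the example runs offline.

```python
import io

import pandas as pd


def load_dataset(source):
    # `source` may be a URL string, a local path, or a file-like object.
    return pd.read_csv(source)


# Offline stand-in for the real file (hypothetical sample rows).
# Outside JupyterLite you could pass the dataset URL instead.
sample_csv = io.StringIO("Manufacturer,Price\nAcer,978\nDell,634\n")
sample_df = load_dataset(sample_csv)
print(sample_df.shape)
```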


In [3]:
from pyodide.http import pyfetch

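async def download(url, filename):
    response = await pyfetch(url)
    if response.status != 200:
        # Fail loudly instead of silently skipping the write
        raise RuntimeError(f"Download failed with HTTP status {response.status}")
    with open(filename, "wb") as f:
        f.write(await response.bytes())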

path = URL

await download(path, "dataset.csv")
file_name = "dataset.csv"

---


# Test Environment


In [6]:
# Keep appending the code generated to this cell, or add more cells below this to execute in parts
#This script reads a CSV file from the given path into a Pandas DataFrame. The first row is treated as headers by default (header=0).

import pandas as pd

# Path to the CSV file
file_path = "dataset.csv"

# Read the CSV into a DataFrame. By default, header=0, so the first row is treated as column names.
df = pd.read_csv(file_path)

# Optional: display basic information about the loaded DataFrame
print(df.head())
print('Rows:', len(df), 'Columns:', len(df.columns))

   Unnamed: 0 Manufacturer  Category     Screen  GPU  OS  CPU_core  \
0           0         Acer         4  IPS Panel    2   1         5   
1           1         Dell         3    Full HD    1   1         3   
2           2         Dell         3    Full HD    1   1         7   
3           3         Dell         4  IPS Panel    2   1         5   
4           4           HP         4    Full HD    2   1         7   

   Screen_Size_cm  CPU_frequency  RAM_GB  Storage_GB_SSD  Weight_kg  Price  
0          35.560            1.6       8             256       1.60    978  
1          39.624            2.0       4             256       2.20    634  
2          39.624            2.7       8             256       2.20    946  
3          33.782            1.6       8             128       1.22   1244  
4          39.624            1.8       8             256       1.91    837  
Rows: 238 Columns: 13


In [9]:
#This code defines a reusable function missing_value_counts that computes, for each column in a Pandas DataFrame, how many entries are missing and what percentage of the total rows those missing entries represent. By default it returns only columns with at least one missing value; set include_all_columns=True to include all columns (with 0 counts for columns with no missing values). Includes a small demonstration when run as a script.

import pandas as pd


def missing_value_counts(df: pd.DataFrame, *, include_all_columns: bool = False) -> pd.DataFrame:
    # Compute per-column missing value counts for a DataFrame.
    # Result includes 'column', 'missing_count', 'missing_percentage'.
    if not isinstance(df, pd.DataFrame):
        raise TypeError('df must be a pandas DataFrame')

    total_rows = len(df)
    # Count missing values per column (NaN/NA considered missing)
    missing_counts = df.isna().sum(axis=0)

    # Build a small DataFrame with the results
    result = missing_counts.reset_index().rename(columns={'index': 'column', 0: 'missing_count'})
    result['missing_percentage'] = (result['missing_count'] / max(total_rows, 1)) * 100

    if not include_all_columns:
        # Keep only columns that have at least one missing value
        result = result[result['missing_count'] > 0]

    # Sort by missing_count descending, then by column name for stable output
    result = result.sort_values(by=['missing_count', 'column'], ascending=[False, True]).reset_index(drop=True)
    return result


if __name__ == '__main__':
    # Simple demonstration with a small DataFrame
    df_demo = pd.DataFrame({
        'A': [1, None, 3],
        'B': ['x', 'y', None],
        'C': [None, None, None],
    })
    print('Missing value counts:')
    print(missing_value_counts(df_demo))
    print('\nAll columns (including zero-missing):')
    print(missing_value_counts(df_demo, include_all_columns=True))



Missing value counts:
  column  missing_count  missing_percentage
0      C              3          100.000000
1      A              1           33.333333
2      B              1           33.333333

All columns (including zero-missing):
  column  missing_count  missing_percentage
0      C              3          100.000000
1      A              1           33.333333
2      B              1           33.333333
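In the demonstration above every column has at least one missing value, so the two printed tables are identical. The sketch below uses a hypothetical toy frame with one complete column (`D`) to show how filtering on the missing count changes the result. It computes the same counts and percentages with `isna()` directly rather than calling `missing_value_counts`, so that it stays self-contained.

```python
import pandas as pd

# Hypothetical toy frame: column 'D' is complete, column 'A' is not.
toy = pd.DataFrame({
    "A": [1, None, 3],
    "D": [10, 20, 30],
})

# One row per column: count of missing entries and their share of all rows.
summary = pd.DataFrame({
    "missing_count": toy.isna().sum(),
    "missing_percentage": toy.isna().mean() * 100,
})

# Equivalent of the default behaviour: keep only columns with missing values.
only_missing = summary[summary["missing_count"] > 0]

print(summary)
print(only_missing)
```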


In [10]:
df.head()

| |Unnamed: 0|Manufacturer|Category|Screen|GPU|OS|CPU_core|Screen_Size_cm|CPU_frequency|RAM_GB|Storage_GB_SSD|Weight_kg|Price|
|-|-|-|-|-|-|-|-|-|-|-|-|-|-|
|0|0|Acer|4|IPS Panel|2|1|5|35.56|1.6|8|256|1.6|978|
|1|1|Dell|3|Full HD|1|1|3|39.624|2.0|4|256|2.2|634|
|2|2|Dell|3|Full HD|1|1|7|39.624|2.7|8|256|2.2|946|
|3|3|Dell|4|IPS Panel|2|1|5|33.782|1.6|8|128|1.22|1244|
|4|4|HP|4|Full HD|2|1|7|39.624|1.8|8|256|1.91|837|


In [16]:
df.info()



<class 'pandas.core.frame.DataFrame'>
RangeIndex: 238 entries, 0 to 237
Data columns (total 13 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   Unnamed: 0      238 non-null    int64  
 1   Manufacturer    238 non-null    object 
 2   Category        238 non-null    int64  
 3   Screen          238 non-null    object 
 4   GPU             238 non-null    int64  
 5   OS              238 non-null    int64  
 6   CPU_core        238 non-null    int64  
 7   Screen_Size_cm  234 non-null    float64
 8   CPU_frequency   238 non-null    float64
 9   RAM_GB          238 non-null    int64  
 10  Storage_GB_SSD  238 non-null    int64  
 11  Weight_kg       233 non-null    float64
 12  Price           238 non-null    int64  
dtypes: float64(3), int64(8), object(2)
memory usage: 22.4+ KB


In [17]:
#This script fills missing values in Screen_Size_cm with the column's most frequent value (mode) and in Weight_kg with the column's mean, using a single fillna call with a per-column mapping. Assumes the DataFrame df contains both columns.

import pandas as pd

# df is the existing DataFrame
# 1) Replace missing values in the categorical column with its most frequent value
screen_size_mode = df['Screen_Size_cm'].mode().iloc[0]

# 2) Replace missing values in the continuous column with its mean value
weight_kg_mean = df['Weight_kg'].mean()

# Apply the replacements
df = df.fillna({
    'Screen_Size_cm': screen_size_mode,
    'Weight_kg': weight_kg_mean
})

In [18]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 238 entries, 0 to 237
Data columns (total 13 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   Unnamed: 0      238 non-null    int64  
 1   Manufacturer    238 non-null    object 
 2   Category        238 non-null    int64  
 3   Screen          238 non-null    object 
 4   GPU             238 non-null    int64  
 5   OS              238 non-null    int64  
 6   CPU_core        238 non-null    int64  
 7   Screen_Size_cm  238 non-null    float64
 8   CPU_frequency   238 non-null    float64
 9   RAM_GB          238 non-null    int64  
 10  Storage_GB_SSD  238 non-null    int64  
 11  Weight_kg       238 non-null    float64
 12  Price           238 non-null    int64  
dtypes: float64(3), int64(8), object(2)
memory usage: 22.4+ KB
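The imputation step can be checked on a small scale. The sketch below applies the same mode and mean strategy to a hypothetical four-row frame; the values are invented for illustration and are not taken from the dataset.

```python
import pandas as pd

# Hypothetical toy data with one gap in each column.
toy = pd.DataFrame({
    "Screen_Size_cm": [39.624, None, 39.624, 35.56],
    "Weight_kg": [2.0, 1.0, None, 3.0],
})

fill_values = {
    # mode() returns a Series; take its first entry as the most frequent value
    "Screen_Size_cm": toy["Screen_Size_cm"].mode().iloc[0],
    # mean() skips NaN by default
    "Weight_kg": toy["Weight_kg"].mean(),
}

# fillna with a dict applies each value only to its own column.
filled = toy.fillna(fill_values)
print(filled)
```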


In [20]:
#Convert the two specified DataFrame columns to float dtype using astype for numeric operations. Assumes the columns exist and contain values that can be represented as floats.

df['Screen_Size_cm'] = df['Screen_Size_cm'].astype(float)
df['Weight_kg'] = df['Weight_kg'].astype(float)

In [21]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 238 entries, 0 to 237
Data columns (total 13 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   Unnamed: 0      238 non-null    int64  
 1   Manufacturer    238 non-null    object 
 2   Category        238 non-null    int64  
 3   Screen          238 non-null    object 
 4   GPU             238 non-null    int64  
 5   OS              238 non-null    int64  
 6   CPU_core        238 non-null    int64  
 7   Screen_Size_cm  238 non-null    float64
 8   CPU_frequency   238 non-null    float64
 9   RAM_GB          238 non-null    int64  
 10  Storage_GB_SSD  238 non-null    int64  
 11  Weight_kg       238 non-null    float64
 12  Price           238 non-null    int64  
dtypes: float64(3), int64(8), object(2)
memory usage: 22.4+ KB


In [22]:
# Convert Screen_Size_cm (cm) to Screen_Size_inch and rename the column
# Create a new column with inches and drop the original centimeter column
df['Screen_Size_inch'] = df['Screen_Size_cm'] / 2.54
df = df.drop(columns=['Screen_Size_cm'])
# Convert Weight_kg (kg) to Weight_pounds and rename the column
# Create a new column with pounds and drop the original kilogram column
df['Weight_pounds'] = df['Weight_kg'] * 2.2046226218
df = df.drop(columns=['Weight_kg'])
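A compact way to express the same conversion is sketched below, assuming the conversion factors used in the cell above. The toy frame and the constant names `CM_PER_INCH` and `POUNDS_PER_KG` are hypothetical; `assign` builds both new columns in one step before the metric columns are dropped.

```python
import pandas as pd

CM_PER_INCH = 2.54
POUNDS_PER_KG = 2.2046226218  # same factor as in the cell above

# Hypothetical two-row frame for illustration.
toy = pd.DataFrame({
    "Screen_Size_cm": [35.56, 39.624],
    "Weight_kg": [1.6, 2.2],
})

converted = (
    toy.assign(
        Screen_Size_inch=toy["Screen_Size_cm"] / CM_PER_INCH,
        Weight_pounds=toy["Weight_kg"] * POUNDS_PER_KG,
    )
    .drop(columns=["Screen_Size_cm", "Weight_kg"])
)
print(converted.round(6))
```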

In [23]:
df.head()

| |Unnamed: 0|Manufacturer|Category|Screen|GPU|OS|CPU_core|CPU_frequency|RAM_GB|Storage_GB_SSD|Price|Screen_Size_inch|Weight_pounds|
|-|-|-|-|-|-|-|-|-|-|-|-|-|-|
|0|0|Acer|4|IPS Panel|2|1|5|1.6|8|256|978|14.0|3.527396|
|1|1|Dell|3|Full HD|1|1|3|2.0|4|256|634|15.6|4.85017|
|2|2|Dell|3|Full HD|1|1|7|2.7|8|256|946|15.6|4.85017|
|3|3|Dell|4|IPS Panel|2|1|5|1.6|8|128|1244|13.3|2.68964|
|4|4|HP|4|Full HD|2|1|7|1.8|8|256|837|15.6|4.210829|


In [25]:
#Normalize the CPU_frequency column by its maximum value in place, replacing the original data without creating a new attribute.

df['CPU_frequency'] /= df['CPU_frequency'].max()
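Dividing by the maximum maps the largest value to exactly 1 and scales the rest proportionally. A minimal sketch on a hypothetical series follows; the values, including the maximum of 2.9, are chosen for illustration only.

```python
import pandas as pd

# Hypothetical frequencies in GHz.
freq = pd.Series([1.6, 2.0, 2.7, 2.9], name="CPU_frequency")

# Simple feature scaling: every value divided by the column maximum.
scaled = freq / freq.max()
print(scaled.round(6).tolist())
```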

In [26]:
df.head()

| |Unnamed: 0|Manufacturer|Category|Screen|GPU|OS|CPU_core|CPU_frequency|RAM_GB|Storage_GB_SSD|Price|Screen_Size_inch|Weight_pounds|
|-|-|-|-|-|-|-|-|-|-|-|-|-|-|
|0|0|Acer|4|IPS Panel|2|1|5|0.551724|8|256|978|14.0|3.527396|
|1|1|Dell|3|Full HD|1|1|3|0.689655|4|256|634|15.6|4.85017|
|2|2|Dell|3|Full HD|1|1|7|0.931034|8|256|946|15.6|4.85017|
|3|3|Dell|4|IPS Panel|2|1|5|0.551724|8|128|1244|13.3|2.68964|
|4|4|HP|4|Full HD|2|1|7|0.62069|8|256|837|15.6|4.210829|


In [34]:
Valor_max = df["CPU_frequency"].max()
Valor_max

1.0

In [35]:
#This code performs one-hot encoding on the 'Screen' column, stores the resulting indicator columns in df1, appends them to the original DataFrame, and then removes the original 'Screen' column.

# 1) Convert 'Screen' into indicator variables named 'Screen_<value>'
df1 = pd.get_dummies(df['Screen'], prefix='Screen')

# 2) Append the indicator variables to the original DataFrame
df = df.join(df1)

# 3) Drop the original 'Screen' column from the DataFrame
df.drop(columns=['Screen'], inplace=True)
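The three steps above can be tried on a hypothetical three-row frame to see which columns `get_dummies` produces. Each row ends up with exactly one indicator set, and the indicator columns are named after the category values in sorted order.

```python
import pandas as pd

# Hypothetical toy frame with the two screen types seen in the dataset preview.
toy = pd.DataFrame({
    "Screen": ["IPS Panel", "Full HD", "Full HD"],
    "Price": [978, 634, 946],
})

# One indicator column per distinct value, prefixed with 'Screen_'.
dummies = pd.get_dummies(toy["Screen"], prefix="Screen")

# Attach the indicators and drop the original text column.
encoded = toy.join(dummies).drop(columns=["Screen"])
print(encoded.columns.tolist())
```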

In [36]:
df.head()

| |Unnamed: 0|Manufacturer|Category|GPU|OS|CPU_core|CPU_frequency|RAM_GB|Storage_GB_SSD|Price|Screen_Size_inch|Weight_pounds|Screen_Full HD|Screen_IPS Panel|
|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|
|0|0|Acer|4|2|1|5|0.551724|8|256|978|14.0|3.527396|False|True|
|1|1|Dell|3|1|1|3|0.689655|4|256|634|15.6|4.85017|True|False|
|2|2|Dell|3|1|1|7|0.931034|8|256|946|15.6|4.85017|True|False|
|3|3|Dell|4|2|1|5|0.551724|8|128|1244|13.3|2.68964|False|True|
|4|4|HP|4|2|1|7|0.62069|8|256|837|15.6|4.210829|True|False|


In [38]:
import pandas as pd

# Example exchange rate (this can be dynamically fetched if desired)
usd_to_eur_rate = 0.92  # 1 USD = 0.92 EUR

# Convert the 'Price' column from USD to EUR
df["Price_EUR"] = df["Price"] * usd_to_eur_rate

# Overwrite the original column as well, so 'Price' now also holds EUR values
df["Price"] = df["Price"] * usd_to_eur_rate

# Display the updated DataFrame
print(df.head())

   Unnamed: 0 Manufacturer  Category  GPU  OS  CPU_core  CPU_frequency  \
0           0         Acer         4    2   1         5       0.551724   
1           1         Dell         3    1   1         3       0.689655   
2           2         Dell         3    1   1         7       0.931034   
3           3         Dell         4    2   1         5       0.551724   
4           4           HP         4    2   1         7       0.620690   

   RAM_GB  Storage_GB_SSD    Price  Screen_Size_inch  Weight_pounds  \
0       8             256   899.76              14.0       3.527396   
1       4             256   583.28              15.6       4.850170   
2       8             256   870.32              15.6       4.850170   
3       8             128  1144.48              13.3       2.689640   
4       8             256   770.04              15.6       4.210829   

   Screen_Full HD  Screen_IPS Panel  Price_EUR  
0           False              True     899.76  
1            True             
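The cell above writes the converted values into both `Price` and `Price_EUR`, so the USD figures are no longer available afterwards. If you would rather keep them, a sketch like the following adds only the new column. The toy frame is hypothetical and the rate is the example value from the cell above, not a live exchange rate.

```python
import pandas as pd

USD_TO_EUR = 0.92  # example rate reused from the cell above; not a live rate

# Hypothetical toy frame with prices in USD.
toy = pd.DataFrame({"Price": [978, 634]})

# Add the converted column and leave the USD column untouched.
toy["Price_EUR"] = toy["Price"] * USD_TO_EUR
print(toy)
```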

In [39]:
import pandas as pd

# Example: assume df is your DataFrame
# df = pd.read_csv("your_file.csv")

# Perform Min-Max normalization on "CPU_frequency"
min_val = df["CPU_frequency"].min()
max_val = df["CPU_frequency"].max()

df["CPU_frequency_normalized"] = (df["CPU_frequency"] - min_val) / (max_val - min_val)

# Display the updated DataFrame and basic stats
print(df[["CPU_frequency", "CPU_frequency_normalized"]].head())
print(f"\nMin of normalized: {df['CPU_frequency_normalized'].min():.2f}")
print(f"Max of normalized: {df['CPU_frequency_normalized'].max():.2f}")

   CPU_frequency  CPU_frequency_normalized
0       0.551724                  0.235294
1       0.689655                  0.470588
2       0.931034                  0.882353
3       0.551724                  0.235294
4       0.620690                  0.352941

Min of normalized: 0.00
Max of normalized: 1.00
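Min-max normalization is unaffected by a prior division by the maximum, because the common factor cancels in `(x - min) / (max - min)`. Applying it to the already scaled `CPU_frequency` column therefore gives the same result as applying it to the raw values. The sketch below checks this on a hypothetical series; the helper name `min_max` and the values are assumptions for illustration.

```python
import pandas as pd


def min_max(s):
    # Rescale so the smallest value maps to 0 and the largest to 1.
    return (s - s.min()) / (s.max() - s.min())


# Hypothetical raw frequencies in GHz.
raw = pd.Series([1.2, 1.6, 2.0, 2.9])

from_raw = min_max(raw)
from_scaled = min_max(raw / raw.max())  # max-scaled first, then min-max

# Largest difference between the two results; expected to be near zero.
print((from_raw - from_scaled).abs().max())
```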


## Authors


[Abhishek Gagneja](https://www.linkedin.com/in/abhishek-gagneja-23051987/)


## Change Log


|Date (YYYY-MM-DD)|Version|Changed By|Change Description|
|-|-|-|-|
|2023-12-10|0.1|Abhishek Gagneja|Initial Draft created|


Copyright © 2023 IBM Corporation. All rights reserved.
