<p style="text-align:center">
    <a href="https://skills.network" target="_blank">
    <img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/assets/logos/SN_web_lightmode.png" width="300" alt="Skills Network Logo">
    </a>
</p>


# Test Environment for Generative AI classroom labs

This lab provides a test environment for the codes generated using the Generative AI classroom.

Follow the instructions below to set up this environment for further use.


# Setup


### Install required libraries

In case of a requirement of installing certain python libraries for use in your task, you may do so as shown below.


In [2]:
%pip install seaborn
import piplite

await piplite.install(['nbformat', 'plotly'])

### Dataset URL from the GenAI lab
Use the URL provided in the GenAI lab in the cell below. 


In [3]:
URL = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DA0101EN-Coursera/laptop_pricing_dataset_mod1.csv"

### Downloading the dataset

Execute the following code to download the dataset in to the interface.

> Please note that this step is essential in JupyterLite. If you are using a downloaded version of this notebook and running it on JupyterLabs, then you can skip this step and directly use the URL in pandas.read_csv() function to read the dataset as a dataframe


In [4]:
from pyodide.http import pyfetch

async def download(url, filename):
    response = await pyfetch(url)
    if response.status == 200:
        with open(filename, "wb") as f:
            f.write(await response.bytes())

path = URL

await download(path, "dataset.csv")

---


# Test Environment


In [5]:
### DATA PREPARATION LAB

# Importing the dataset
## PROMPT 1: Write a Python code that can perform the following tasks.
# Read the CSV file, located on a given file path, into a Pandas data frame, 
#assuming that the first rows of the file are the headers for the data.

import pandas as pd
# Read the CSV file into a Pandas data frame
df = pd.read_csv("dataset.csv")
print(df.head())

# Building the prompt: Handle missing data
## PROMPT 2: Write a Python code that identifies the columns with missing values in a pandas data frame.
def find_missing_columns(df):
    """
    This function identifies the columns in a DataFrame that contain any missing values.
    
    Parameters:
    df (pandas.DataFrame): The DataFrame to check for missing values.
    
    Returns:
    list: A list of column names with at least one missing value.
    """
    try:
        # Check for any NaN values in each column
        missing_columns = df.columns[df.isnull().any()].tolist()
        return missing_columns
    except Exception as e:
        return f"An error occurred: {str(e)}"
    
missing_cols = find_missing_columns(df)
    
if missing_cols:
    print("Columns with missing values:")
    for col in missing_cols:
        print(col)
else:
    print("No columns with missing values found.")


## PROMPT 3: Write a Python code to replace the missing values in a pandas data frame, per the following guidelines.
#1. For a categorical attribute "Screen_Size_cm", replace the missing values with the most frequent value in the column.
#2. For a continuous value attribute "Weight_kg", replace the missing values with the mean value of the entries in the column.

import numpy as np

def impute_missing_values(df):
    """
    Imputes missing values in a DataFrame based on column data type.
    
    Columns 'Screen_Size_cm' are treated as categorical and imputed with the mode,
    while 'Weight_kg' is treated as numerical and imputed with the mean.
    
    Parameters:
    df (pd.DataFrame): Input DataFrame containing 'Screen_Size_cm' and 'Weight_kg' columns.
    
    Returns:
    pd.DataFrame: DataFrame with missing values imputed.
    """
    # Ensure target columns exist
    if 'Screen_Size_cm' not in df.columns or 'Weight_kg' not in df.columns:
        raise ValueError("DataFrame must contain 'Screen_Size_cm' and 'Weight_kg' columns.")
# Separate categorical and continuous columns
    categorical_cols = df.select_dtypes(include=['object']).columns
    numerical_cols = df.select_dtypes(include=['int64', 'float64']).columns
    
    # Impute 'Screen_Size_cm' with mode
    if 'Screen_Size_cm' in categorical_cols:
        mode_value = df['Screen_Size_cm'].mode()[0]
        df['Screen_Size_cm'].fillna(mode_value, inplace=True)
    
    # Impute 'Weight_kg' with mean
    if 'Weight_kg' in numerical_cols:
        mean_value = df['Weight_kg'].mean()
        df['Weight_kg'].fillna(mean_value, inplace=True)

    return df

# Apply given function:
imputed_df = impute_missing_values(df)
print(imputed_df.head())

#Building the prompt: Modify data type
## PROMPT 4: Write a Python code snippet to change the data type of the attributes "Screen_Size_cm" and "Weight_kg" of a data frame to float.
def change_dtypes_to_float(df):
    """
    Changes the data type of specified columns in a DataFrame to float.
    
    Parameters:
    df (pd.DataFrame): Input DataFrame containing 'Screen_Size_cm' and 'Weight_kg' columns.
    
    Returns:
    pd.DataFrame: DataFrame with specified columns converted to float.
    """
    # Check and change the data types
    target_columns = ['Screen_Size_cm', 'Weight_kg']
    
    for col in target_columns:
        if col in df.columns:
            df[col] = df[col].astype(float)
    
    return df

# Apply change_dtypes_to_float function
float_imputed_df = change_dtypes_to_float(imputed_df)
print(float_imputed_df.head())

# Building the prompt: Standardization and Normalization
## PROMPT 5: Write a Python code to modify the contents under the following attributes of the data frame as required.
#1. Data under 'Screen_Size_cm' is assumed to be in centimeters. Convert this data into inches. Modify the name of the attribute to 'Screen_Size_inch'.
#2. Data under 'Weight_kg' is assumed to be in kilograms. Convert this data into pounds. Modify the name of the attribute to 'Weight_pounds'.

def convert_units_and_rename(df):
    """
    Converts units of 'Screen_Size_cm' to inches and 'Weight_kg' to pounds,
    then renames the columns accordingly.
    
    Parameters:
    df (pd.DataFrame): Input DataFrame with 'Screen_Size_cm' and 'Weight_kg' columns.
    
    Returns:
    pd.DataFrame: DataFrame with converted data and renamed columns.
    """
    # Define conversion factors
    cm_to_inch = 0.393701
    kg_to_pound = 2.20462
    
    # Perform conversions
    df['Screen_Size_inch'] = df['Screen_Size_cm'].apply(lambda x: x * cm_to_inch if pd.notna(x) else x)
    df['Weight_pounds'] = df['Weight_kg'].apply(lambda x: x * kg_to_pound if pd.notna(x) else x)
    
    # Drop old columns and rename new ones
    df.drop(columns=['Screen_Size_cm', 'Weight_kg'], inplace=True)
    
    return df

# Apply conver_units_and_rename function:
new_df = convert_units_and_rename(float_imputed_df)
print(new_df.head())

## PROMPT 6: Write a Python code to normalize the content under the attribute "CPU_frequency" in a data frame df concerning its maximum value. 
# Make changes to the original data, and do not create a new attribute.
import numpy as np

def normalize_cpu_frequency(df):
    """
    Normalizes the 'CPU_frequency' column in the DataFrame to have values between 0 and 1.
    
    Args:
    df (pd.DataFrame): The input DataFrame containing 'CPU_frequency' column.
    
    Returns:
    pd.DataFrame: DataFrame with 'CPU_frequency' normalized.
    """
    # Ensure 'CPU_frequency' is present in the DataFrame
    if 'CPU_frequency' not in df.columns:
        raise ValueError("DataFrame must contain 'CPU_frequency' column.")

    # Find the maximum value in 'CPU_frequency'
    max_value = df['CPU_frequency'].max()
    
    # Handle the case where max_value is zero to avoid division by zero
    if max_value == 0:
        df['CPU_frequency'] = 0
    else:
        df['CPU_frequency'] = df['CPU_frequency'].apply(lambda x: x / max_value)
    
    return df

# Apply normalize_cpu_frequency function:
max_normalized_df = normalize_cpu_frequency(new_df)
print(max_normalized_df.head())
    
# Building the prompt: Categorical to Numerical
## PROMPT 7: Write a Python code to perform the following tasks.
# 1. Convert a data frame df attribute "Screen", into indicator variables, saved as df1, with the naming convention "Screen_<unique value of the attribute>".
# 2. Append df1 into the original data frame df.
# 3. Drop the original attribute from the data frame df.

def convert_and_append_indicator_vars(df):
    """
    Converts 'Screen' column in DataFrame df into indicator (dummy) variables, appends them back to df,
    and removes the original 'Screen' column.
    
    Args:
    df (pd.DataFrame): The input DataFrame containing 'Screen' column.
    
    Returns:
    pd.DataFrame: DataFrame with 'Screen' column converted to indicator variables and appended.
    """
    # Ensure 'Screen' column exists in the DataFrame
    if 'Screen' not in df.columns:
         raise ValueError("'Screen' column must be present in the DataFrame.")
    
    # Convert 'Screen' column to indicator variables
    df1 = pd.get_dummies(df, columns=['Screen'], prefix='Screen')
    
    # Append df1 to df
    df = pd.concat([df, df1], axis=1)
    
    # Drop the original 'Screen' column
    df.drop(columns=['Screen'], inplace=True)
     
    return df

# Apply convert_and_append_indicator_vars function:
cat_to_num_df = convert_and_append_indicator_vars(max_normalized_df)
print(cat_to_num_df.head())

# Practice Problems
## PROMPT 8: Create a prompt to generate a Python code that converts the values under Price from USD to Euros
def convert_price_to_eur(df, conversion_rate):
    """
    Converts 'Price' column from USD to EUR in the DataFrame.
    
    Args:
    df (pd.DataFrame): The input DataFrame containing 'Price' column in USD.
    conversion_rate (float): Conversion rate from USD to EUR (e.g., 0.85 for 1 EUR = 0.85 USD).
    
    Returns:
    pd.DataFrame: DataFrame with 'Price' converted to EUR.
    """
    if 'Price' not in df.columns:
        raise ValueError("DataFrame must contain 'Price' column.")
    
    # Convert 'Price' to EUR
    df['Price'] *= conversion_rate
    
    return df

# Apply convert_price_to_eur function:
conversion_rate = 0.85
df_euros = convert_price_to_eur(cat_to_num_df, conversion_rate)
print(df_euros.head())

## PROMPT 9: Modify the normalization prompt to perform min-max normalization on the CPU_frequency parameter.
def min_max_normalize_cpu_frequency(df):
    """
    Performs min-max normalization on the 'CPU_frequency' column of a DataFrame.
    
    Args:
    df (pd.DataFrame): The input DataFrame containing 'CPU_frequency' column.
    
    Returns:
    pd.DataFrame: DataFrame with 'CPU_frequency' normalized using min-max normalization.
    """
    # Ensure 'CPU_frequency' column exists in the DataFrame
    if 'CPU_frequency' not in df.columns:
        raise ValueError("DataFrame must contain 'CPU_frequency' column.")

    # Find min and max values for normalization
    min_value = df['CPU_frequency'].min()
    max_value = df['CPU_frequency'].max()
    
    # Apply min-max normalization directly on 'CPU_frequency' column
    df['CPU_frequency'] = (df['CPU_frequency'] - min_value) / (max_value - min_value)

    return df
# Apply convert_price_to_eur function:
min_max_normalized_df = min_max_normalize_cpu_frequency(df_euros)
min_max_normalized_df.head()


Pyarrow will become a required dependency of pandas in the next major release of pandas (pandas 3.0),
(to allow more performant data types, such as the Arrow string type, and better interoperability with other libraries)
but was not found to be installed on your system.
If this would cause problems for you,
please provide us feedback at https://github.com/pandas-dev/pandas/issues/54466
        
  import pandas as pd


   Unnamed: 0 Manufacturer  Category     Screen  GPU  OS  CPU_core  \
0           0         Acer         4  IPS Panel    2   1         5   
1           1         Dell         3    Full HD    1   1         3   
2           2         Dell         3    Full HD    1   1         7   
3           3         Dell         4  IPS Panel    2   1         5   
4           4           HP         4    Full HD    2   1         7   

   Screen_Size_cm  CPU_frequency  RAM_GB  Storage_GB_SSD  Weight_kg  Price  
0          35.560            1.6       8             256       1.60    978  
1          39.624            2.0       4             256       2.20    634  
2          39.624            2.7       8             256       2.20    946  
3          33.782            1.6       8             128       1.22   1244  
4          39.624            1.8       8             256       1.91    837  
Columns with missing values:
Screen_Size_cm
Weight_kg


The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.

For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.


  df['Weight_kg'].fillna(mean_value, inplace=True)


   Unnamed: 0 Manufacturer  Category     Screen  GPU  OS  CPU_core  \
0           0         Acer         4  IPS Panel    2   1         5   
1           1         Dell         3    Full HD    1   1         3   
2           2         Dell         3    Full HD    1   1         7   
3           3         Dell         4  IPS Panel    2   1         5   
4           4           HP         4    Full HD    2   1         7   

   Screen_Size_cm  CPU_frequency  RAM_GB  Storage_GB_SSD  Weight_kg  Price  
0          35.560            1.6       8             256       1.60    978  
1          39.624            2.0       4             256       2.20    634  
2          39.624            2.7       8             256       2.20    946  
3          33.782            1.6       8             128       1.22   1244  
4          39.624            1.8       8             256       1.91    837  
   Unnamed: 0 Manufacturer  Category     Screen  GPU  OS  CPU_core  \
0           0         Acer         4  IPS Panel

Unnamed: 0.1,Unnamed: 0,Manufacturer,Category,GPU,OS,CPU_core,CPU_frequency,RAM_GB,Storage_GB_SSD,Price,...,OS.1,CPU_core.1,CPU_frequency.1,RAM_GB.1,Storage_GB_SSD.1,Price.1,Screen_Size_inch,Weight_pounds,Screen_Full HD,Screen_IPS Panel
0,0,Acer,4,2,1,5,0.235294,8,256,831.3,...,1,5,0.235294,8,256,831.3,14.000008,3.527392,False,True
1,1,Dell,3,1,1,3,0.470588,4,256,538.9,...,1,3,0.470588,4,256,538.9,15.600008,4.850164,True,False
2,2,Dell,3,1,1,7,0.882353,8,256,804.1,...,1,7,0.882353,8,256,804.1,15.600008,4.850164,True,False
3,3,Dell,4,2,1,5,0.235294,8,128,1057.4,...,1,5,0.235294,8,128,1057.4,13.300007,2.689636,False,True
4,4,HP,4,2,1,7,0.352941,8,256,711.45,...,1,7,0.352941,8,256,711.45,15.600008,4.210824,True,False


## Authors


[Abhishek Gagneja](https://www.linkedin.com/in/abhishek-gagneja-23051987/)


## Change Log


|Date (YYYY-MM-DD)|Version|Changed By|Change Description|
|-|-|-|-|
|2023-12-10|0.1|Abhishek Gagneja|Initial Draft created|


Copyright © 2023 IBM Corporation. All rights reserved.
