<p style="text-align:center">
    <a href="https://skills.network" target="_blank">
    <img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/assets/logos/SN_web_lightmode.png" width="300" alt="Skills Network Logo">
    </a>
</p>


# Test Environment for Generative AI classroom labs

This lab provides a test environment for the code generated in the Generative AI classroom.

Follow the instructions below to set up this environment for further use.


# Setup


### Install required libraries

If your task requires additional Python libraries, you can install them as shown below.


In [1]:
%pip install seaborn nbformat plotly

Note: you may need to restart the kernel to use updated packages.





In [None]:
# piplite is only available in JupyterLite; skip this cell in a regular Jupyter/JupyterLab kernel
import piplite

await piplite.install(['nbformat', 'plotly'])

### Dataset URL from the GenAI lab
Set `URL` in the cell below to the dataset URL provided in the GenAI lab.


In [2]:
URL = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DA0101EN-Coursera/laptop_pricing_dataset_mod1.csv"

### Downloading the dataset

Run the following code to download the dataset into the notebook environment.

> Note: this step is essential in JupyterLite. If you are running a downloaded copy of this notebook in JupyterLab, you can skip it and pass the URL directly to `pandas.read_csv()` to read the dataset into a DataFrame.
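The choice described in the note above can be made in code. The sketch below assumes that Pyodide (the Python runtime behind JupyterLite) can be detected through `sys.platform == "emscripten"`; the helper name `choose_csv_source` is hypothetical and not part of this lab.

```python
import sys


def choose_csv_source(url, local_filename):
    """Return the source that pandas.read_csv() should read in this environment.

    Hypothetical helper. Assumes Pyodide (the runtime behind JupyterLite)
    reports sys.platform == "emscripten": there, the dataset must be
    downloaded first, so the local file name is returned. Elsewhere (for
    example, a local JupyterLab kernel) the URL is returned so pandas can
    read it directly.
    """
    if sys.platform == "emscripten":
        return local_filename
    return url
```

Usage, once the download cell has run: `df = pd.read_csv(choose_csv_source(URL, "dataset.csv"))`.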


In [3]:
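from pyodide.http import pyfetch

async def download(url, filename):
    # Fetch the file from the browser (Pyodide) and save it to the local virtual file system
    response = await pyfetch(url)
    if response.status == 200:
        with open(filename, "wb") as f:
            f.write(await response.bytes())
    else:
        # Fail loudly instead of silently leaving no file behind
        raise RuntimeError(f"Download failed with HTTP status {response.status}")

file_name = "dataset.csv"
await download(URL, file_name)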

---


# Test Environment


In [9]:
# Keep appending the generated code to this cell, or add more cells below to run it in parts
import pandas as pd
import numpy as np
# Specify the file path.
# In JupyterLite, pandas cannot read the URL directly; use file_name (the file downloaded above) instead.
file_path = URL
# Read the CSV file into a pandas DataFrame
df = pd.read_csv(file_path)

In [4]:
import os
import urllib.request

import pandas as pd

# Download the CSV file from a URL to a local file.
# Note: urllib may not work in JupyterLite (Pyodide); use the pyfetch-based download there.
def download_file(url, local_filename):
    with urllib.request.urlopen(url) as response, open(local_filename, 'wb') as out_file:
        out_file.write(response.read())

# Local path to save the downloaded file
local_filename = "laptop_pricing_dataset_mod1.csv"

# Download the file
download_file(URL, local_filename)

# Read the downloaded CSV file into a pandas DataFrame
df = pd.read_csv(local_filename)

# Display the first five rows
print(df.head())

# Optional cleanup: remove the local file once it has been loaded
if os.path.exists(local_filename):
    os.remove(local_filename)
    print("Local file deleted.")

   Unnamed: 0 Manufacturer  Category     Screen  GPU  OS  CPU_core  \
0           0         Acer         4  IPS Panel    2   1         5   
1           1         Dell         3    Full HD    1   1         3   
2           2         Dell         3    Full HD    1   1         7   
3           3         Dell         4  IPS Panel    2   1         5   
4           4           HP         4    Full HD    2   1         7   

   Screen_Size_cm  CPU_frequency  RAM_GB  Storage_GB_SSD  Weight_kg  Price  
0          35.560            1.6       8             256       1.60    978  
1          39.624            2.0       4             256       2.20    634  
2          39.624            2.7       8             256       2.20    946  
3          33.782            1.6       8             128       1.22   1244  
4          39.624            1.8       8             256       1.91    837  
Local file deleted.


In [5]:
# Assuming you already have a Pandas data frame named 'df'
# Identify columns with missing values
columns_with_missing_values = df.columns[df.isnull().any()]

In [6]:
def find_missing_columns(df):
    """
    Identify columns with missing values in a DataFrame.
    
    Parameters:
    df (pd.DataFrame): Input DataFrame to analyze
    
    Returns:
    list: List of column names that contain missing values
    """
    # Count the missing values in each column
    missing_values_presence = df.isnull().sum()
    
    # Filter columns that have at least one missing value
    cols_with_missing = missing_values_presence[missing_values_presence > 0].index.tolist()
    
    return cols_with_missing

# Sample usage
if __name__ == "__main__":
    # Assuming df is your DataFrame
    # df = pd.read_csv('path_to_your_csv_file.csv')
    
    example_df = pd.DataFrame({
        'A': [1, 2, None, 4],
        'B': [5, None, 7, 8],
        'C': [9, 10, 11, 12]
    })
    
    missing_cols = find_missing_columns(example_df)
    
    if missing_cols:
        print("Columns with missing values: ", missing_cols)
    else:
        print("No columns have missing values.")

Columns with missing values:  ['A', 'B']
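Besides listing which columns have gaps, it often helps to see how large each gap is before choosing a replacement strategy. A minimal sketch on a made-up frame like the one above; the helper name `missing_value_report` is hypothetical:

```python
import pandas as pd


def missing_value_report(df):
    """Count and percentage of missing values for each column that has any."""
    counts = df.isnull().sum()
    report = pd.DataFrame({
        "missing_count": counts,
        # mean() of a boolean mask is the fraction of True values
        "missing_percent": df.isnull().mean() * 100,
    })
    # Keep only columns with at least one missing value
    return report[report["missing_count"] > 0]


example_df = pd.DataFrame({
    "A": [1, 2, None, 4],
    "B": [5, None, 7, 8],
    "C": [9, 10, 11, 12],
})
print(missing_value_report(example_df))
```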


In [7]:
# Replace missing values in the 'Screen_Size_cm' column with the most frequent value (mode).
# Assign the result back instead of calling fillna(..., inplace=True) on a single column:
# that pattern triggers a chained-assignment FutureWarning and stops working in pandas 3.0.
most_frequent_value = df['Screen_Size_cm'].mode()[0]
df['Screen_Size_cm'] = df['Screen_Size_cm'].fillna(most_frequent_value)
# Replace missing values in the 'Weight_kg' column with the mean value
mean_value = df['Weight_kg'].mean()
df['Weight_kg'] = df['Weight_kg'].fillna(mean_value)
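pandas also accepts a dictionary in `DataFrame.fillna` that maps each column to its own fill value, so several columns can be filled in one call on the DataFrame itself, without chained assignment. A sketch on a small made-up frame whose column names mirror the dataset (the values are illustrative, not taken from it):

```python
import numpy as np
import pandas as pd

example_df = pd.DataFrame({
    "Screen_Size_cm": [35.56, 39.624, np.nan, 39.624],
    "Weight_kg": [1.6, 2.2, np.nan, 1.8],
})

# One fill value per column: mode for screen size, mean for weight
fill_values = {
    "Screen_Size_cm": example_df["Screen_Size_cm"].mode()[0],
    "Weight_kg": example_df["Weight_kg"].mean(),
}
example_df = example_df.fillna(fill_values)
print(example_df)
```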


In [10]:
def fill_missing_values(df):
    """
    Replace missing values in a DataFrame.
    
    For 'Screen_Size_cm' column, replace missing values with the mode.
    For 'Weight_kg' column, replace missing values with the mean.
    
    Parameters:
    df (pd.DataFrame): Input DataFrame
    
    Returns:
    pd.DataFrame: DataFrame with filled missing values
    """
    
    # Copies of columns to work with
    screen_size_cm = df['Screen_Size_cm'].copy()
    weight_kg = df['Weight_kg'].copy()
    
    # Find mode for 'Screen_Size_cm'
    mode_screen_size = screen_size_cm.mode()[0]
    
    # Fill NaNs in 'Screen_Size_cm' with mode
    screen_size_cm.fillna(mode_screen_size, inplace=True)
    
    # Calculate mean for 'Weight_kg'
    mean_weight = weight_kg.mean()
    
    # Fill NaNs in 'Weight_kg' with mean
    weight_kg.fillna(mean_weight, inplace=True)
    
    # Update DataFrame
    df['Screen_Size_cm'] = screen_size_cm
    df['Weight_kg'] = weight_kg
    
    return df

# Sample usage
if __name__ == "__main__":
    # Build a small example DataFrame; a separate name keeps the real df intact
    example_df = pd.DataFrame({
        'Screen_Size_cm': ['14', '15.6', np.nan, '13.3', np.nan],
        'Weight_kg': [1.8, 2.2, np.nan, 1.9, 2.1]
    })
    
    filled_df = fill_missing_values(example_df)
    
    print(filled_df)

  Screen_Size_cm  Weight_kg
0             14        1.8
1           15.6        2.2
2           13.3        2.0
3           13.3        1.9
4           13.3        2.1


In [11]:
# Change the data type of 'Screen_Size_cm' and 'Weight_kg' to float
df['Screen_Size_cm'] = df['Screen_Size_cm'].astype(float)
df['Weight_kg'] = df['Weight_kg'].astype(float)

In [12]:
def convert_columns_to_float(df, columns):
    """
    Convert specified columns in a DataFrame to float.
    
    Parameters:
    df (pd.DataFrame): Input DataFrame
    columns (list): List of column names to convert to float
    
    Returns:
    pd.DataFrame: DataFrame with specified columns converted to float
    """
    for col in columns:
        if col in df.columns:
            df[col] = df[col].astype(float)
    return df

# Sample usage
if __name__ == "__main__":
    # Build a small example DataFrame; a separate name keeps the real df intact
    example_df = pd.DataFrame({
        'Screen_Size_cm': ['14.0', '15.6', '13.3', '14.1'],
        'Weight_kg': ['1.8', '2.2', '1.9', '2.1']
    })
    
    columns_to_convert = ['Screen_Size_cm', 'Weight_kg']
    converted_df = convert_columns_to_float(example_df, columns_to_convert)
    
    print(converted_df)

   Screen_Size_cm  Weight_kg
0            14.0        1.8
1            15.6        2.2
2            13.3        1.9
3            14.1        2.1
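`astype(float)` raises a `ValueError` as soon as one entry cannot be parsed as a number. When a column may contain stray text, `pd.to_numeric` with `errors="coerce"` turns unparseable entries into `NaN` instead, so they can then be handled like any other missing value. A sketch with made-up values:

```python
import pandas as pd

example_df = pd.DataFrame({
    "Screen_Size_cm": ["14.0", "15.6", "n/a", "14.1"],
    "Weight_kg": ["1.8", "2.2", "1.9", "unknown"],
})

for col in ["Screen_Size_cm", "Weight_kg"]:
    # Unparseable strings become NaN instead of raising an error
    example_df[col] = pd.to_numeric(example_df[col], errors="coerce")

print(example_df.dtypes)
print(example_df.isnull().sum())
```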


In [13]:
# Convert 'Screen_Size_cm' from centimeters to inches and modify the attribute name
df['Screen_Size_inch'] = df['Screen_Size_cm'] * 0.393701
df.drop('Screen_Size_cm', axis=1, inplace=True)
# Convert 'Weight_kg' from kilograms to pounds and modify the attribute name
df['Weight_pounds'] = df['Weight_kg'] * 2.20462
df.drop('Weight_kg', axis=1, inplace=True)
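The factors used above are rounded. Both conversions have exact definitions (1 inch = 2.54 cm and 1 lb = 0.45359237 kg), so dividing by those constants avoids the rounding error. A sketch on a made-up frame; the constant names are hypothetical:

```python
import pandas as pd

CM_PER_INCH = 2.54          # exact by definition
KG_PER_POUND = 0.45359237   # exact by definition

example_df = pd.DataFrame({
    "Screen_Size_cm": [35.56, 39.624],
    "Weight_kg": [1.6, 2.2],
})

# Add the converted columns under their new names, then drop the originals
example_df["Screen_Size_inch"] = example_df["Screen_Size_cm"] / CM_PER_INCH
example_df["Weight_pounds"] = example_df["Weight_kg"] / KG_PER_POUND
example_df = example_df.drop(columns=["Screen_Size_cm", "Weight_kg"])
print(example_df)
```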


In [14]:
def convert_units(df):
    """
    Convert 'Screen_Size_cm' to 'Screen_Size_inch' and 'Weight_kg' to 'Weight_pounds'.
    
    Parameters:
    df (pd.DataFrame): Input DataFrame
    
    Returns:
    pd.DataFrame: DataFrame with converted and renamed columns
    """
    # Conversion logic
    df['Screen_Size_inch'] = (df['Screen_Size_cm'] * 0.393701).astype(float)
    df['Weight_pounds'] = (df['Weight_kg'] * 2.20462).astype(float)
    
    # Drop the original columns
    df.drop(columns=['Screen_Size_cm', 'Weight_kg'], inplace=True)
    
    return df

# Sample usage
if __name__ == "__main__":
    # Build a small example DataFrame; a separate name keeps the real df intact
    example_df = pd.DataFrame({
        'Screen_Size_cm': [14, 15.6, 13.3, 14.1],
        'Weight_kg': [1.8, 2.2, 1.9, 2.1]
    })
    
    converted_df = convert_units(example_df)
    
    print(converted_df)

   Screen_Size_inch  Weight_pounds
0          5.511814       3.968316
1          6.141736       4.850164
2          5.236223       4.188778
3          5.551184       4.629702


In [15]:
# Normalize the content under 'CPU_frequency' with respect to its maximum value
max_value = df['CPU_frequency'].max()
df['CPU_frequency'] = df['CPU_frequency'] / max_value


In [16]:
def normalize_cpu_frequency(df):
    """
    Normalize 'CPU_frequency' column by making its max value 1.
    
    Parameters:
    df (pd.DataFrame): Input DataFrame with 'CPU_frequency' column
    
    Returns:
    pd.DataFrame: DataFrame with 'CPU_frequency' normalized
    """
    # Find the maximum value in 'CPU_frequency'
    max_cpu_freq = df['CPU_frequency'].max()
    
    # Normalize 'CPU_frequency' by dividing each value by max_cpu_freq
    df['CPU_frequency'] = df['CPU_frequency'] / max_cpu_freq
    
    return df

# Sample usage
if __name__ == "__main__":
    # Build a small example DataFrame; a separate name keeps the real df intact
    example_df = pd.DataFrame({
        'CPU_frequency': [2000, 2500, 3000, 1500]
    })
    
    normalized_df = normalize_cpu_frequency(example_df)
    
    print(normalized_df)

   CPU_frequency
0       0.666667
1       0.833333
2       1.000000
3       0.500000


In [17]:
# Convert the 'Screen' attribute into indicator variables
df1 = pd.get_dummies(df['Screen'], prefix='Screen')
# Append df1 into the original data frame df
df = pd.concat([df, df1], axis=1)
# Drop the original 'Screen' attribute from the data frame
df.drop('Screen', axis=1, inplace=True)


In [18]:
def convert_to_indicators(df):
    """
    Convert 'Screen' column into indicator variables and append them to the original DataFrame.
    
    Parameters:
    df (pd.DataFrame): Input DataFrame with 'Screen' column
    
    Returns:
    pd.DataFrame: DataFrame with 'Screen' converted to indicator variables and appended back
    """
    # Create indicator (dummy) variables from the 'Screen' column only
    df_indicators = pd.get_dummies(df['Screen'], prefix='Screen')
    
    # Append the indicator columns to the original DataFrame
    df = pd.concat([df, df_indicators], axis=1)
    
    # Drop the original 'Screen' column
    df.drop(columns=['Screen'], inplace=True)
    
    return df

# Sample usage
if __name__ == "__main__":
    # Build a small example DataFrame; a separate name keeps the real df intact
    example_df = pd.DataFrame({
        'ID': [1, 2, 3, 4],
        'Value': [10, 20, 30, 40],
        'Screen': ['A', 'B', 'A', 'C']
    })
    
    df1 = convert_to_indicators(example_df)
    
    print(df1)

   ID  Value  Screen_A  Screen_B  Screen_C
0   1     10      True     False     False
1   2     20     False      True     False
2   3     30      True     False     False
3   4     40     False     False      True
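When the whole DataFrame is passed to `pd.get_dummies` together with `columns=`, pandas replaces the listed column with its indicator columns in a single call, so no separate concat or drop is needed. Passing `dtype=int` produces 0/1 values instead of `True`/`False`. A sketch on the same kind of made-up frame:

```python
import pandas as pd

example_df = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "Value": [10, 20, 30, 40],
    "Screen": ["A", "B", "A", "C"],
})

# Replace 'Screen' with Screen_A, Screen_B, Screen_C holding 0/1 integers
encoded = pd.get_dummies(example_df, columns=["Screen"], prefix="Screen", dtype=int)
print(encoded)
```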


In [19]:
def convert_price_to_euro(df, exchange_rate=0.85):
    """
    Convert the 'Price' values in the DataFrame from USD to EUR.
    
    Values are multiplied by the given exchange rate (0.85 by default, corresponding to 1 USD = 0.85 EUR).
    
    Parameters:
    df (pd.DataFrame): Original DataFrame with a 'Price' column in USD.
    exchange_rate (float): Exchange rate to use (optional, defaults to 0.85).
    
    Returns:
    pd.DataFrame: The original DataFrame with the 'Price' column converted to EUR.
    """
    df['Price_EUR'] = df['Price'] * exchange_rate
    df.drop('Price', axis=1, inplace=True)
    df.rename(columns={'Price_EUR': 'Price'}, inplace=True)
    
    return df

# Sample usage
if __name__ == "__main__":
    # Build a small example DataFrame; a separate name keeps the real df intact
    example_df = pd.DataFrame({
        'ID': [1, 2, 3, 4],
        'Item': ['Laptop', 'Phone', 'Tablet', 'Glasses'],
        'Price': [899, 249, 399, 149]
    })

    df_euro = convert_price_to_euro(example_df)

    print(df_euro)

   ID     Item   Price
0   1   Laptop  764.15
1   2    Phone  211.65
2   3   Tablet  339.15
3   4  Glasses  126.65


In [20]:
def normalize_cpu_frequency(df):
    """
    Normalize the DataFrame's 'CPU_frequency' column using min-max normalization.
    
    Note: this redefines normalize_cpu_frequency from an earlier cell, which divided by the maximum instead.
    
    Parameters:
    df (pd.DataFrame): Original DataFrame with a 'CPU_frequency' column.
    
    Returns:
    pd.DataFrame: The original DataFrame with the 'CPU_frequency' column normalized.
    """
    # Compute the minimum and maximum of 'CPU_frequency'
    min_cpu_freq = df['CPU_frequency'].min()
    max_cpu_freq = df['CPU_frequency'].max()
    
    # Apply min-max normalization
    df['CPU_frequency_normalized'] = (df['CPU_frequency'] - min_cpu_freq) / (max_cpu_freq - min_cpu_freq)
    
    # Drop the original 'CPU_frequency' column
    df.drop(columns=['CPU_frequency'], inplace=True)
    
    # Rename the new column back to 'CPU_frequency'
    df.rename(columns={'CPU_frequency_normalized': 'CPU_frequency'}, inplace=True)
    
    return df

# Sample usage
if __name__ == "__main__":
    # Build a small example DataFrame; a separate name keeps the real df intact
    example_df = pd.DataFrame({
        'ID': [1, 2, 3, 4],
        'Item': ['Laptop', 'Phone', 'Tablet', 'Glasses'],
        'CPU_frequency': [2000, 2500, 3000, 1500]
    })

    df_normalized = normalize_cpu_frequency(example_df)

    print(df_normalized)

   ID     Item  CPU_frequency
0   1   Laptop       0.333333
1   2    Phone       0.666667
2   3   Tablet       1.000000
3   4  Glasses       0.000000
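Min-max normalization divides by `max - min`, which is zero when every value in the column is the same; pandas then returns `NaN` for the whole column (0/0). One possible guard, sketched under the assumption that a constant column should map to 0.0 (the helper name `min_max_scale` is hypothetical):

```python
import pandas as pd


def min_max_scale(series):
    """Scale a numeric Series to [0, 1]; a constant Series maps to 0.0."""
    min_val = series.min()
    value_range = series.max() - min_val
    if value_range == 0:
        # Avoid 0/0 (which would give NaN) when all values are equal
        return pd.Series(0.0, index=series.index, name=series.name)
    return (series - min_val) / value_range


print(min_max_scale(pd.Series([2000, 2500, 3000, 1500], name="CPU_frequency")).tolist())
print(min_max_scale(pd.Series([2.5, 2.5, 2.5], name="CPU_frequency")).tolist())
```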


## Authors


[Abhishek Gagneja](https://www.linkedin.com/in/abhishek-gagneja-23051987/)


## Change Log


|Date (YYYY-MM-DD)|Version|Changed By|Change Description|
|-|-|-|-|
|2023-12-10|0.1|Abhishek Gagneja|Initial Draft created|


Copyright © 2023 IBM Corporation. All rights reserved.
