<p style="text-align:center">
    <a href="https://skills.network" target="_blank">
    <img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/assets/logos/SN_web_lightmode.png" width="300" alt="Skills Network Logo">
    </a>
</p>


# Test Environment for Generative AI classroom labs

This lab provides a test environment for the code generated in the Generative AI classroom.

Follow the instructions below to set up this environment for further use.


# Setup


### Install required libraries

If your task requires additional Python libraries, you can install them as shown below.


In [26]:
%pip install seaborn
import piplite

await piplite.install(['nbformat', 'plotly'])

### Dataset URL from the GenAI lab
Use the URL provided in the GenAI lab in the cell below. 


In [27]:
URL = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DA0101EN-Coursera/laptop_pricing_dataset_mod1.csv"

### Downloading the dataset

Execute the following code to download the dataset into the interface.

> Please note that this step is essential in JupyterLite. If you are running a downloaded version of this notebook in JupyterLab, you can skip this step and pass the URL directly to the `pandas.read_csv()` function to read the dataset as a dataframe.
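
As a rough sketch of the JupyterLab route described in the note: `pandas.read_csv()` accepts a URL, a local path, or a file-like object. The helper name `load_dataset` and the two-row inline CSV are illustrative assumptions, not part of the lab. The inline sample only stands in for the real file so that the snippet runs without network access.

```python
import io
import pandas as pd

def load_dataset(source):
    """Read a CSV from a URL, a local path, or a file-like object (hypothetical helper)."""
    return pd.read_csv(source)

# Outside JupyterLite, the dataset URL could be passed directly, e.g. load_dataset(URL).
# Here a tiny made-up in-memory CSV stands in for the real file.
sample_csv = io.StringIO("Manufacturer,Price\nAcer,978\nDell,634\n")
sample_df = load_dataset(sample_csv)
print(sample_df)
```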


In [28]:
from pyodide.http import pyfetch

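async def download(url, filename):
    # Fetch the file and save it to the local (JupyterLite) file system
    response = await pyfetch(url)
    if response.status != 200:
        # Fail loudly instead of silently skipping the write
        raise RuntimeError(f"Download failed with HTTP status {response.status}")
    with open(filename, "wb") as f:
        f.write(await response.bytes())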

path = URL

await download(path, "dataset.csv")

In [29]:
import pandas as pd

# Specify the file path
file_path = "dataset.csv"

# Read the CSV file into a Pandas data frame
df = pd.read_csv(file_path)

# The first row of the file is assumed to contain the headers, so no additional parameters are needed

# Additional details:
# - The `pd.read_csv()` function is used to read a CSV file into a Pandas data frame.
# - By default, it assumes that the first row of the file contains the headers for the data.
# - If your file doesn't have headers, you can specify `header=None` as an additional parameter.
# - You can also specify other parameters, such as `sep` to specify the delimiter used in the file.
# - Make sure you have the Pandas library installed in your Python environment before running this code.

In [30]:
df


|Unnamed: 0.1|Unnamed: 0|Manufacturer|Category|Screen|GPU|OS|CPU_core|Screen_Size_cm|CPU_frequency|RAM_GB|Storage_GB_SSD|Weight_kg|Price|
|-|-|-|-|-|-|-|-|-|-|-|-|-|-|
|0|0|Acer|4|IPS Panel|2|1|5|35.560|1.6|8|256|1.60|978|
|1|1|Dell|3|Full HD|1|1|3|39.624|2.0|4|256|2.20|634|
|2|2|Dell|3|Full HD|1|1|7|39.624|2.7|8|256|2.20|946|
|3|3|Dell|4|IPS Panel|2|1|5|33.782|1.6|8|128|1.22|1244|
|4|4|HP|4|Full HD|2|1|7|39.624|1.8|8|256|1.91|837|
|...|...|...|...|...|...|...|...|...|...|...|...|...|...|
|233|233|Lenovo|4|IPS Panel|2|1|7|35.560|2.6|8|256|1.70|1891|
|234|234|Toshiba|3|Full HD|2|1|5|33.782|2.4|8|256|1.20|1950|
|235|235|Lenovo|4|IPS Panel|2|1|5|30.480|2.6|8|256|1.36|2236|
|236|236|Lenovo|3|Full HD|3|1|5|39.624|2.5|6|256|2.40|883|


In [31]:
def find_missing_values(df):
    missing_values = df.isnull().sum()
    columns_with_missing_values = missing_values[missing_values > 0].index.tolist()
    return columns_with_missing_values

# Check the data frame 'df' for columns with missing values
columns_with_missing_values = find_missing_values(df)

# Display the columns with missing values
print("Columns with missing values:", columns_with_missing_values)


Columns with missing values: ['Screen_Size_cm', 'Weight_kg']
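
Knowing which columns have gaps is often followed by asking how large the gaps are. The sketch below extends the same `isnull().sum()` idea to report counts and percentages. The function name `summarize_missing` and the small sample frame are made up for illustration and are not taken from the lab dataset.

```python
import numpy as np
import pandas as pd

# Made-up sample with gaps in two columns (not the lab dataset)
sample = pd.DataFrame({
    "Screen_Size_cm": [35.56, np.nan, 39.624, 33.782],
    "Weight_kg": [1.6, 2.2, np.nan, np.nan],
    "Price": [978, 634, 946, 1244],
})

def summarize_missing(frame):
    """Return count and percentage of missing values for columns that have any."""
    counts = frame.isnull().sum()
    summary = pd.DataFrame({
        "missing_count": counts,
        "missing_pct": counts / len(frame) * 100,
    })
    # Keep only the columns that actually contain missing values
    return summary[summary["missing_count"] > 0]

missing_summary = summarize_missing(sample)
print(missing_summary)
```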


# Test Environment


In [32]:
import numpy as np

# Step 2: Creating a sample DataFrame
data = {
    'User_ID': [1, 2, 3, 4, 5, 6],
    'Screen_Size_cm': ['15', np.nan, '15', '20', '15', '20'],
    'Weight_kg': [70, 80, np.nan, 65, 72, 85]
}

df = pd.DataFrame(data)

# Step 3: Replace missing values

# For the categorical column 'Screen_Size_cm'
most_frequent_value = df['Screen_Size_cm'].mode()[0]
# Assign the result back instead of calling fillna(..., inplace=True) on a column selection
df['Screen_Size_cm'] = df['Screen_Size_cm'].fillna(most_frequent_value)

# For the continuous column 'Weight_kg'
mean_weight = df['Weight_kg'].mean()
df['Weight_kg'] = df['Weight_kg'].fillna(mean_weight)

# Display the resulting DataFrame
print(df)


   User_ID Screen_Size_cm  Weight_kg
0        1             15       70.0
1        2             15       80.0
2        3             15       74.4
3        4             20       65.0
4        5             15       72.0
5        6             20       85.0
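
One possible variation on the cell above is to pass a dictionary of per-column fill values to `DataFrame.fillna()`, which fills several columns in a single call and returns a new frame. The variable names `fill_values` and `filled` are illustrative; the sample data repeats the values used above.

```python
import numpy as np
import pandas as pd

sample = pd.DataFrame({
    "User_ID": [1, 2, 3, 4, 5, 6],
    "Screen_Size_cm": ["15", np.nan, "15", "20", "15", "20"],
    "Weight_kg": [70, 80, np.nan, 65, 72, 85],
})

# One fill value per column: mode for the categorical column, mean for the continuous one
fill_values = {
    "Screen_Size_cm": sample["Screen_Size_cm"].mode()[0],
    "Weight_kg": sample["Weight_kg"].mean(),
}

# fillna returns a new DataFrame; 'sample' itself is left unchanged
filled = sample.fillna(fill_values)
print(filled)
```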


In [33]:
# Step 2: Creating a sample DataFrame
data = {
    'Screen_Size_cm': ['15', '15', '15', '20', np.nan, '20'],
    'Weight_kg': [70, 80, np.nan, 65, 72, 85]
}

df = pd.DataFrame(data)

# Step 3: Change data types

# Convert 'Screen_Size_cm' to float, converting strings into numeric values (e.g., '15' to 15.0)
df['Screen_Size_cm'] = df['Screen_Size_cm'].astype('float64')

# Convert 'Weight_kg' to float directly (already numeric)
df['Weight_kg'] = df['Weight_kg'].astype('float64')

# Display the resulting DataFrame with new data types
print(df)

   Screen_Size_cm  Weight_kg
0            15.0       70.0
1            15.0       80.0
2            15.0        NaN
3            20.0       65.0
4             NaN       72.0
5            20.0       85.0
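
The `astype('float64')` call works here because every non-missing entry is a clean numeric string. As a hedged illustration of what may happen with messier input, the sketch below adds a made-up `'unknown'` entry: `astype` raises an error, while `pd.to_numeric(..., errors='coerce')` turns the unparseable entry into NaN.

```python
import numpy as np
import pandas as pd

# Made-up series with one entry that cannot be parsed as a number
raw = pd.Series(["15", "20", "unknown", np.nan], name="Screen_Size_cm")

# astype raises on text that is not a valid number
try:
    raw.astype("float64")
    astype_failed = False
except ValueError:
    astype_failed = True

# to_numeric with errors='coerce' replaces unparseable entries with NaN
converted = pd.to_numeric(raw, errors="coerce")
print(astype_failed)
print(converted)
```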


In [34]:
# Step 2: Creating a sample DataFrame
data = {
    'Screen_Size_cm': ['15', '15', '15', '20', np.nan, '20'],
    'Weight_kg': [70, 80, np.nan, 65, 72, 85]
}

df = pd.DataFrame(data)

# Step 3: Convert to numeric and then apply conversion functions

# Convert 'Screen_Size_cm' to float first ('15' and '20' become 15.0 and 20.0; NaN stays NaN)
df['Screen_Size_cm'] = pd.to_numeric(df['Screen_Size_cm'], errors='coerce')

# Function to convert centimeters to inches
def cm_to_inch(cm):
    return cm / 2.54

# Function to convert kilograms to pounds
def kg_to_pounds(kg):
    return kg * 2.20462

# Apply conversions and store the results in new, renamed columns
df['Screen_Size_inch'] = df['Screen_Size_cm'].apply(cm_to_inch)
df['Weight_pounds'] = df['Weight_kg'].apply(kg_to_pounds)

# Drop the original columns after transformation
df.drop(columns=['Screen_Size_cm', 'Weight_kg'], inplace=True)

# Display the resulting DataFrame
print(df)



   Screen_Size_inch  Weight_pounds
0          5.905512      154.32340
1          5.905512      176.36960
2          5.905512            NaN
3          7.874016      143.30030
4               NaN      158.73264
5          7.874016      187.39270
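
Because both conversions are simple arithmetic, they can also be written as vectorized column operations without `apply`. This is a minimal sketch using the same conversion factors as the cell above; the constant names and the three-row sample are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

CM_PER_INCH = 2.54
POUNDS_PER_KG = 2.20462  # same rounded factor as in the cell above

# Made-up numeric sample; NaN entries propagate through the arithmetic
sample = pd.DataFrame({
    "Screen_Size_cm": [15.0, 20.0, np.nan],
    "Weight_kg": [70.0, np.nan, 85.0],
})

# Arithmetic on a whole column is applied element-wise
converted = pd.DataFrame({
    "Screen_Size_inch": sample["Screen_Size_cm"] / CM_PER_INCH,
    "Weight_pounds": sample["Weight_kg"] * POUNDS_PER_KG,
})
print(converted.round(3))
```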



In [38]:
# Example DataFrame created for demonstration (replace it with your own df)
data = {
    'CPU_frequency': [2000, 2500, 3000, 2200, 1800, 2400]
}

df = pd.DataFrame(data)

# Step 2: Find the maximum value of 'CPU_frequency'
max_cpu_freq = df['CPU_frequency'].max()

# Step 3: Normalize the 'CPU_frequency' column
df['CPU_frequency'] = df['CPU_frequency'] / max_cpu_freq

# Display the updated DataFrame
print(df)


   CPU_frequency
0       0.666667
1       0.833333
2       1.000000
3       0.733333
4       0.600000
5       0.800000


In [39]:
# Sample DataFrame creation for demonstration
data = {
    'Screen': ['A', 'B', 'A', 'C', 'B', 'D'],
    'Value1': [1, 2, 3, 4, 5, 6]
}

df = pd.DataFrame(data)

# Step 1: Create indicator variables
# Create a new DataFrame df1 from the 'Screen' column only,
# so that the other columns are not duplicated when df1 is appended
df1 = pd.get_dummies(df['Screen'], prefix='Screen')

# Append df1 to df
df = pd.concat([df, df1], axis=1)

# Drop the original 'Screen' column
df.drop(columns='Screen', inplace=True)

# Display the updated DataFrame
print(df)

   Value1  Screen_A  Screen_B  Screen_C  Screen_D
0       1      True     False     False     False
1       2     False      True     False     False
2       3      True     False     False     False
3       4     False     False      True     False
4       5     False      True     False     False
5       6     False     False     False      True
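
A more compact sketch of the same encoding: when the whole frame is passed to `pd.get_dummies()` with `columns=`, the call encodes the listed column and drops the original in one step, so no separate concat or drop is needed. The `dtype=int` argument is an optional choice made here to get 0/1 values instead of booleans; the variable names are illustrative.

```python
import pandas as pd

sample = pd.DataFrame({
    "Screen": ["A", "B", "A", "C", "B", "D"],
    "Value1": [1, 2, 3, 4, 5, 6],
})

# Encode 'Screen' in place of the original column; other columns pass through unchanged
encoded = pd.get_dummies(sample, columns=["Screen"], prefix="Screen", dtype=int)
print(encoded)
```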


In [41]:
import pandas as pd

def convert_price_usd_to_eur(df, exchange_rate):
    """
    Convert prices from USD to Euros.
    
    Args:
        df (DataFrame): The input DataFrame containing a 'Price_USD' column.
        exchange_rate (float): The exchange rate from USD to Euros.
    
    Returns:
        DataFrame: The same DataFrame with a new 'Price_EUR' column added ('Price_USD' is kept).
    """
    # Check if 'Price_USD' column exists in the DataFrame
    if 'Price_USD' not in df.columns:
        raise KeyError("'Price_USD' column not found in the DataFrame")
    
    # Convert 'Price_USD' to 'Price_EUR'
    df['Price_EUR'] = df['Price_USD'] * exchange_rate
    return df

# Sample DataFrame creation
data = {
    'Product': ['Laptop', 'Phone', 'Tablet'],
    'Price_USD': [800, 450, 250]
}

df = pd.DataFrame(data)

# Exchange rate (for demonstration, a simplified rate of 1.1 is used)
exchange_rate = 1.1

try:
    # Convert prices
    df = convert_price_usd_to_eur(df, exchange_rate)
    # Print the updated DataFrame
    print(df)
except KeyError as e:
    print(e)

  Product  Price_USD  Price_EUR
0  Laptop        800      880.0
1   Phone        450      495.0
2  Tablet        250      275.0
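
The function above adds the new column to the DataFrame it receives. If the caller's frame should stay untouched, one possible variation is to build the result with `DataFrame.assign()`, which returns a new frame. The helper name `add_price_eur` is hypothetical, and 1.1 is the same demonstration rate as above, not a real quote.

```python
import pandas as pd

def add_price_eur(frame, exchange_rate):
    """Return a new DataFrame with a 'Price_EUR' column (hypothetical helper)."""
    if "Price_USD" not in frame.columns:
        raise KeyError("'Price_USD' column not found in the DataFrame")
    # assign() returns a copy with the extra column; 'frame' is not modified
    return frame.assign(Price_EUR=frame["Price_USD"] * exchange_rate)

prices = pd.DataFrame({
    "Product": ["Laptop", "Phone", "Tablet"],
    "Price_USD": [800, 450, 250],
})

converted = add_price_eur(prices, exchange_rate=1.1)  # demonstration rate only
print(converted)
```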


In [42]:
def minmax_normalize(df, column_name):
    """
    Normalize the values of a specific column in a DataFrame using min-max scaling.
    
    Args:
    df (DataFrame): The original DataFrame.
    column_name (str): Name of the column to normalize.
    
    Returns:
    DataFrame: The DataFrame with the normalized column.
    """
    # Check that the column exists in the DataFrame
    if column_name not in df.columns:
        raise ValueError(f"Column '{column_name}' not found in the DataFrame")
    
    # Get the minimum and maximum of the column
    min_val = df[column_name].min()
    max_val = df[column_name].max()
    
    # Apply min-max scaling and replace the original values
    df[column_name] = (df[column_name] - min_val) / (max_val - min_val)
    return df

# Demonstrate with a sample DataFrame
data = {
    'CPU_frequency': [2000, 2500, 3000, 2200, 1800, 2400],
    'Other_Value': [1, 2, 3, 4, 5, 6]
}

df = pd.DataFrame(data)

# Normalize the 'CPU_frequency' column
df = minmax_normalize(df, 'CPU_frequency')

# Print the DataFrame with the normalized column
print(df)

   CPU_frequency  Other_Value
0       0.166667            1
1       0.583333            2
2       1.000000            3
3       0.333333            4
4       0.000000            5
5       0.500000            6
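
Min-max scaling divides by `max - min`, which is zero when every value in the column is the same. The sketch below shows one way to guard against that case. Mapping a constant column to 0.0 is an arbitrary choice made for this illustration, and the helper name `minmax_scale_safe` is hypothetical.

```python
import pandas as pd

def minmax_scale_safe(series):
    """Min-max scale a Series; a constant column maps to 0.0 (an arbitrary choice)."""
    value_range = series.max() - series.min()
    if value_range == 0:
        # Avoid dividing by zero when all values are identical
        return pd.Series(0.0, index=series.index, name=series.name)
    return (series - series.min()) / value_range

freq = pd.Series([2000, 2500, 3000, 2200, 1800, 2400], name="CPU_frequency")
constant = pd.Series([5, 5, 5], name="Constant")

scaled = minmax_scale_safe(freq)
scaled_constant = minmax_scale_safe(constant)
print(scaled)
print(scaled_constant)
```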


## Authors


[Abhishek Gagneja](https://www.linkedin.com/in/abhishek-gagneja-23051987/)


## Change Log


|Date (YYYY-MM-DD)|Version|Changed By|Change Description|
|-|-|-|-|
|2023-12-10|0.1|Abhishek Gagneja|Initial Draft created|


Copyright © 2023 IBM Corporation. All rights reserved.
