<p style="text-align:center">
    <a href="https://skills.network" target="_blank">
    <img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/assets/logos/SN_web_lightmode.png" width="300" alt="Skills Network Logo">
    </a>
</p>


# Test Environment for Generative AI classroom labs

This lab provides a test environment for code generated in the Generative AI classroom.

Follow the instructions below to set up this environment for further use.


# Setup


### Install required libraries

If your task requires additional Python libraries, you can install them as shown below.


In [1]:
%pip install seaborn
import piplite

await piplite.install(['nbformat', 'plotly'])

### Dataset URL from the GenAI lab
Use the URL provided in the GenAI lab in the cell below. 


In [2]:
URL = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DA0101EN-Coursera/laptop_pricing_dataset_mod1.csv"

### Downloading the dataset

Execute the following code to download the dataset into the environment.

> Please note that this step is essential in JupyterLite. If you are using a downloaded version of this notebook and running it in JupyterLab, you can skip this step and pass the URL directly to the `pandas.read_csv()` function to read the dataset as a data frame.


In [3]:
from pyodide.http import pyfetch

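async def download(url, filename):
    """Fetch url and save the response body to filename."""
    response = await pyfetch(url)
    # Fail loudly instead of silently skipping the write on a bad response
    if response.status != 200:
        raise RuntimeError(f"Download failed with HTTP status {response.status}")
    with open(filename, "wb") as f:
        f.write(await response.bytes())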

path = URL

await download(path, "laptop_pricing_dataset_mod1.csv")
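
As the note above says, outside JupyterLite the download step can be skipped because `pandas.read_csv()` accepts a URL as well as a local path or file-like object. The following is a minimal sketch of that idea; the helper name `read_laptop_data` and the two-row inline sample are illustrative assumptions, and the inline sample only stands in for the real file so the snippet runs without network access.

```python
import io

import pandas as pd


def read_laptop_data(source):
    """Read the laptop pricing CSV from a URL, a local path, or a file-like object."""
    return pd.read_csv(source)


# In JupyterLab (not JupyterLite) the URL could be passed directly, for example:
# df = read_laptop_data(URL)

# Offline stand-in: two rows and two columns copied from the dataset preview
sample_csv = io.StringIO("Manufacturer,Price\nAcer,978\nDell,634\n")
sample_df = read_laptop_data(sample_csv)
print(sample_df.head())
```

The same call works for the downloaded file name, so only the argument changes between JupyterLite and JupyterLab.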

In [4]:
import pandas as pd

# 1. Read the CSV file into a pandas data frame
# Assuming the first row of the file can be used as the headers for the data
file_path = "laptop_pricing_dataset_mod1.csv"
df = pd.read_csv(file_path)

# 2. Print the first 5 rows of the dataframe
print(df.head())

Pyarrow will become a required dependency of pandas in the next major release of pandas (pandas 3.0),
(to allow more performant data types, such as the Arrow string type, and better interoperability with other libraries)
but was not found to be installed on your system.
If this would cause problems for you,
please provide us feedback at https://github.com/pandas-dev/pandas/issues/54466
        
  import pandas as pd


   Unnamed: 0 Manufacturer  Category     Screen  GPU  OS  CPU_core  \
0           0         Acer         4  IPS Panel    2   1         5   
1           1         Dell         3    Full HD    1   1         3   
2           2         Dell         3    Full HD    1   1         7   
3           3         Dell         4  IPS Panel    2   1         5   
4           4           HP         4    Full HD    2   1         7   

   Screen_Size_cm  CPU_frequency  RAM_GB  Storage_GB_SSD  Weight_kg  Price  
0          35.560            1.6       8             256       1.60    978  
1          39.624            2.0       4             256       2.20    634  
2          39.624            2.7       8             256       2.20    946  
3          33.782            1.6       8             128       1.22   1244  
4          39.624            1.8       8             256       1.91    837  


---


# Test Environment


In [5]:
import pandas as pd

# Assuming 'df' is the Pandas data frame you want to check for missing values
missing_values = df.isnull().sum()
columns_with_missing_values = missing_values[missing_values > 0].index.tolist()

print("Columns with missing values:")
print(columns_with_missing_values)

Columns with missing values:
['Screen_Size_cm', 'Weight_kg']


In [6]:
import pandas as pd


# Replace missing values in 'Screen_Size_cm' with the most frequent value
# (assign the result back instead of calling fillna(..., inplace=True) on a column selection)
most_frequent_screen_size = df['Screen_Size_cm'].mode()[0]
df['Screen_Size_cm'] = df['Screen_Size_cm'].fillna(most_frequent_screen_size)

# Replace missing values in 'Weight_kg' with the mean value
mean_weight = df['Weight_kg'].mean()
df['Weight_kg'] = df['Weight_kg'].fillna(mean_weight)

print("Data frame with missing values replaced:")
print(df)


Data frame with missing values replaced:
     Unnamed: 0 Manufacturer  Category     Screen  GPU  OS  CPU_core  \
0             0         Acer         4  IPS Panel    2   1         5   
1             1         Dell         3    Full HD    1   1         3   
2             2         Dell         3    Full HD    1   1         7   
3             3         Dell         4  IPS Panel    2   1         5   
4             4           HP         4    Full HD    2   1         7   
..          ...          ...       ...        ...  ...  ..       ...   
233         233       Lenovo         4  IPS Panel    2   1         7   
234         234      Toshiba         3    Full HD    2   1         5   
235         235       Lenovo         4  IPS Panel    2   1         5   
236         236       Lenovo         3    Full HD    3   1         5   
237         237      Toshiba         3    Full HD    2   1         5   

     Screen_Size_cm  CPU_frequency  RAM_GB  Storage_GB_SSD  Weight_kg  Price  
0            35
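
The imputation step can be tried on a small frame before running it on the full dataset. The sketch below uses hypothetical toy values (not rows from the dataset) and assigns the filled column back to the data frame, which is the `df[col] = df[col].method(value)` form that pandas recommends over `inplace=True` on a selected column.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical values) with one missing entry per column
toy = pd.DataFrame({
    "Screen_Size_cm": [35.56, 39.624, 39.624, np.nan],
    "Weight_kg": [1.6, 2.2, np.nan, 1.2],
})

# Most frequent value for the screen size, mean for the weight
toy["Screen_Size_cm"] = toy["Screen_Size_cm"].fillna(toy["Screen_Size_cm"].mode()[0])
toy["Weight_kg"] = toy["Weight_kg"].fillna(toy["Weight_kg"].mean())

# Confirm that no missing values remain
print(toy.isnull().sum())
```

Using the mode for screen size and the mean for weight mirrors the choice made in the cell above; other strategies (for example the median) would follow the same pattern.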

In [7]:
import pandas as pd

# Assuming 'df' is the Pandas data frame you want to change the data types in

# Change the data type of 'Screen_Size_cm' to float
df['Screen_Size_cm'] = df['Screen_Size_cm'].astype(float)

# Change the data type of 'Weight_kg' to float
df['Weight_kg'] = df['Weight_kg'].astype(float)

# Display the data frame with updated data types
print(df.dtypes)


Unnamed: 0          int64
Manufacturer       object
Category            int64
Screen             object
GPU                 int64
OS                  int64
CPU_core            int64
Screen_Size_cm    float64
CPU_frequency     float64
RAM_GB              int64
Storage_GB_SSD      int64
Weight_kg         float64
Price               int64
dtype: object


In [8]:
import pandas as pd

# Assuming 'df' is the Pandas data frame you want to modify

# Convert 'Screen_Size_cm' from centimeters to inches
df['Screen_Size_inch'] = df['Screen_Size_cm'] * 0.393701

# Convert 'Weight_kg' from kilograms to pounds
df['Weight_pounds'] = df['Weight_kg'] * 2.20462

# Drop the original columns 'Screen_Size_cm' and 'Weight_kg'
df.drop(['Screen_Size_cm', 'Weight_kg'], axis=1, inplace=True)

# Display the data frame with modified attributes
print(df)



     Unnamed: 0 Manufacturer  Category     Screen  GPU  OS  CPU_core  \
0             0         Acer         4  IPS Panel    2   1         5   
1             1         Dell         3    Full HD    1   1         3   
2             2         Dell         3    Full HD    1   1         7   
3             3         Dell         4  IPS Panel    2   1         5   
4             4           HP         4    Full HD    2   1         7   
..          ...          ...       ...        ...  ...  ..       ...   
233         233       Lenovo         4  IPS Panel    2   1         7   
234         234      Toshiba         3    Full HD    2   1         5   
235         235       Lenovo         4  IPS Panel    2   1         5   
236         236       Lenovo         3    Full HD    3   1         5   
237         237      Toshiba         3    Full HD    2   1         5   

     CPU_frequency  RAM_GB  Storage_GB_SSD  Price  Screen_Size_inch  \
0              1.6       8             256    978         14.000

In [9]:
import pandas as pd

# Assuming 'df' is the Pandas data frame you want to normalize

# Find the maximum value in the 'CPU_frequency' column
max_cpu_frequency = df['CPU_frequency'].max()

# Normalize the 'CPU_frequency' column by dividing each value by the maximum value
df['CPU_frequency'] = df['CPU_frequency'] / max_cpu_frequency

# Display the data frame with the normalized 'CPU_frequency' values
print(df)

     Unnamed: 0 Manufacturer  Category     Screen  GPU  OS  CPU_core  \
0             0         Acer         4  IPS Panel    2   1         5   
1             1         Dell         3    Full HD    1   1         3   
2             2         Dell         3    Full HD    1   1         7   
3             3         Dell         4  IPS Panel    2   1         5   
4             4           HP         4    Full HD    2   1         7   
..          ...          ...       ...        ...  ...  ..       ...   
233         233       Lenovo         4  IPS Panel    2   1         7   
234         234      Toshiba         3    Full HD    2   1         5   
235         235       Lenovo         4  IPS Panel    2   1         5   
236         236       Lenovo         3    Full HD    3   1         5   
237         237      Toshiba         3    Full HD    2   1         5   

     CPU_frequency  RAM_GB  Storage_GB_SSD  Price  Screen_Size_inch  \
0         0.551724       8             256    978         14.000

In [10]:
import pandas as pd

# Assuming 'df' is the Pandas data frame you want to modify

# Convert the 'Screen' attribute into indicator variables and save as df1
df1 = pd.get_dummies(df['Screen'], prefix='Screen')

# Append df1 into the original data frame df
df = pd.concat([df, df1], axis=1)

# Drop the original 'Screen' attribute from the data frame df
df.drop(['Screen'], axis=1, inplace=True)

# Display the modified data frame df
print(df)



     Unnamed: 0 Manufacturer  Category  GPU  OS  CPU_core  CPU_frequency  \
0             0         Acer         4    2   1         5       0.551724   
1             1         Dell         3    1   1         3       0.689655   
2             2         Dell         3    1   1         7       0.931034   
3             3         Dell         4    2   1         5       0.551724   
4             4           HP         4    2   1         7       0.620690   
..          ...          ...       ...  ...  ..       ...            ...   
233         233       Lenovo         4    2   1         7       0.896552   
234         234      Toshiba         3    2   1         5       0.827586   
235         235       Lenovo         4    2   1         5       0.896552   
236         236       Lenovo         3    3   1         5       0.862069   
237         237      Toshiba         3    2   1         5       0.793103   

     RAM_GB  Storage_GB_SSD  Price  Screen_Size_inch  Weight_pounds  \
0         8     
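
To see what the indicator-variable step produces, here is a minimal sketch on a three-row toy frame (the rows are illustrative, using the two screen types visible in the preview). Passing `dtype=int` is an assumption about the desired output: in pandas 2.x `get_dummies` returns boolean columns by default, and `dtype=int` yields 0/1 values instead.

```python
import pandas as pd

# Toy frame (illustrative rows) with a categorical 'Screen' column
toy = pd.DataFrame({
    "Screen": ["IPS Panel", "Full HD", "Full HD"],
    "Price": [978, 634, 946],
})

# One indicator column per category, encoded as 0/1 integers
indicators = pd.get_dummies(toy["Screen"], prefix="Screen", dtype=int)

# Replace the original column with its indicator columns
toy = pd.concat([toy.drop(columns="Screen"), indicators], axis=1)
print(toy)
```

Each row has exactly one indicator set to 1, so the new columns carry the same information as the dropped `Screen` column.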

In [11]:
import pandas as pd

# Assuming 'df' is the Pandas data frame that holds the 'Price' column in USD

# Define the exchange rate from USD to Euros (fixed value used for this exercise)
exchange_rate = 0.82

# Convert 'Price' values from USD to Euros
df['Price'] = df['Price'] * exchange_rate

# Display the updated DataFrame with prices in Euros
print(df)

     Unnamed: 0 Manufacturer  Category  GPU  OS  CPU_core  CPU_frequency  \
0             0         Acer         4    2   1         5       0.551724   
1             1         Dell         3    1   1         3       0.689655   
2             2         Dell         3    1   1         7       0.931034   
3             3         Dell         4    2   1         5       0.551724   
4             4           HP         4    2   1         7       0.620690   
..          ...          ...       ...  ...  ..       ...            ...   
233         233       Lenovo         4    2   1         7       0.896552   
234         234      Toshiba         3    2   1         5       0.827586   
235         235       Lenovo         4    2   1         5       0.896552   
236         236       Lenovo         3    3   1         5       0.862069   
237         237      Toshiba         3    2   1         5       0.793103   

     RAM_GB  Storage_GB_SSD    Price  Screen_Size_inch  Weight_pounds  \
0         8   

In [12]:
# Import the necessary library
import pandas as pd


# Perform min-max normalization on the 'CPU_frequency' parameter
df['CPU_frequency'] = (df['CPU_frequency'] - df['CPU_frequency'].min()) / (df['CPU_frequency'].max() - df['CPU_frequency'].min())

# Display the updated DataFrame with min-max normalized values
print(df)

     Unnamed: 0 Manufacturer  Category  GPU  OS  CPU_core  CPU_frequency  \
0             0         Acer         4    2   1         5       0.235294   
1             1         Dell         3    1   1         3       0.470588   
2             2         Dell         3    1   1         7       0.882353   
3             3         Dell         4    2   1         5       0.235294   
4             4           HP         4    2   1         7       0.352941   
..          ...          ...       ...  ...  ..       ...            ...   
233         233       Lenovo         4    2   1         7       0.823529   
234         234      Toshiba         3    2   1         5       0.705882   
235         235       Lenovo         4    2   1         5       0.823529   
236         236       Lenovo         3    3   1         5       0.764706   
237         237      Toshiba         3    2   1         5       0.647059   

     RAM_GB  Storage_GB_SSD    Price  Screen_Size_inch  Weight_pounds  \
0         8   
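
Note that cell 12 applies min-max scaling to a `CPU_frequency` column that cell 9 had already divided by its maximum. Because dividing by a positive constant cancels out of the min-max formula, the result should match min-max scaling of the raw values. The sketch below checks this on hypothetical toy frequencies; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Toy CPU frequencies (hypothetical values)
freq = pd.Series([1.2, 1.6, 2.0, 2.9])

# Simple feature scaling: divide by the maximum, giving values in (0, 1]
max_scaled = freq / freq.max()

# Min-max scaling of the raw values, giving values in [0, 1]
min_max = (freq - freq.min()) / (freq.max() - freq.min())

# Min-max scaling applied on top of the max-scaled values
min_max_after_max = (max_scaled - max_scaled.min()) / (max_scaled.max() - max_scaled.min())

# Both routes agree up to floating-point rounding
print(np.allclose(min_max, min_max_after_max))
```

The practical difference between the two methods is the lower bound: dividing by the maximum keeps the smallest value above 0, while min-max scaling maps it to exactly 0.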

## Authors


[Abhishek Gagneja](https://www.linkedin.com/in/abhishek-gagneja-23051987/)


## Change Log


|Date (YYYY-MM-DD)|Version|Changed By|Change Description|
|-|-|-|-|
|2023-12-10|0.1|Abhishek Gagneja|Initial Draft created|


Copyright © 2023 IBM Corporation. All rights reserved.
