<p style="text-align:center">
    <a href="https://skills.network" target="_blank">
    <img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/assets/logos/SN_web_lightmode.png" width="300" alt="Skills Network Logo">
    </a>
</p>


# **Hands-on Practice Lab: Importing Dataset - Laptops Pricing**

Estimated time needed: **20** minutes

In this lab, you will practice loading a dataset and drawing basic insights from it, as covered in the module. You are provided with a new 'Laptop Pricing' dataset, which is used for all the practice labs throughout the course.


# Objectives

After completing this lab you will be able to:

 - Import a dataset from a CSV file to a Pandas dataframe
 - Develop some basic insights about the dataset


# Setup


For this lab, we will be using the following libraries:

*   `skillsnetwork` for downloading the dataset.
*   [`pandas`](https://pandas.pydata.org/?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMML0187ENSkillsNetwork31430127-2021-01-01) for managing the data.
*   [`numpy`](https://numpy.org/?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMML0187ENSkillsNetwork31430127-2021-01-01) for mathematical operations.


### Importing Required Libraries


In [1]:
import pandas as pd
import numpy as np

The dataset used in this lab is available at the URL stored in `file_path` below.


The function below downloads the dataset into your browser:


In [2]:
from pyodide.http import pyfetch

async def download(url, filename):
    response = await pyfetch(url)
    if response.status == 200:
        with open(filename, "wb") as f:
            f.write(await response.bytes())

In [3]:
file_path = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DA0101EN-Coursera/laptop_pricing_dataset_base.csv"

To obtain the dataset, call the `download()` function defined above:


In [4]:
await download(file_path, "laptops.csv")
file_name="laptops.csv"

In [5]:
df = pd.read_csv(file_name)

> Note: This version of the lab runs on JupyterLite, which requires the dataset to be downloaded to the interface. If you are working on a downloaded copy of this notebook on your local machine, you can **skip the steps above** and pass the URL directly to the `pandas.read_csv()` function. To do so, uncomment and run the statements in the cell below.


In [6]:
#filepath = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DA0101EN-Coursera/laptop_pricing_dataset_base.csv"
#df = pd.read_csv(filepath, header=None)
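
The `header=None` argument matters because the lab's file has no header row. The sketch below illustrates the difference on a small in-memory sample; the two rows and the name `sample_csv` are assumptions made up for this illustration, not the lab's actual file.

```python
import io
import pandas as pd

# Hypothetical two-row sample without a header row (illustration only).
sample_csv = "Acer,4,IPS Panel,978\nDell,3,Full HD,634\n"

# Default behaviour: the first line is consumed as column names.
with_default = pd.read_csv(io.StringIO(sample_csv))

# header=None: every line is treated as data and columns are numbered 0..n-1.
with_header_none = pd.read_csv(io.StringIO(sample_csv), header=None)

print(with_default.shape)      # (1, 4)
print(with_header_none.shape)  # (2, 4)
```

Without `header=None`, the first record ends up as the column labels and is no longer counted as a data row.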

<h1> Task #1: </h1>
<h3>Load the dataset to a pandas dataframe named 'df'</h3>
Print the first 5 entries of the dataset to confirm loading.


In [7]:
# Write your code below and press Shift+Enter to execute.
print(df.head())

   Acer  4  IPS Panel  2  1  5   35.56  1.6  8  256 1.6.1   978
0  Dell  3    Full HD  1  1  3  39.624  2.0  4  256   2.2   634
1  Dell  3    Full HD  1  1  7  39.624  2.7  8  256   2.2   946
2  Dell  4  IPS Panel  2  1  5  33.782  1.6  8  128  1.22  1244
3    HP  4    Full HD  2  1  7  39.624  1.8  8  256  1.91   837
4  Dell  3    Full HD  1  1  5  39.624  1.6  8  256   2.2  1016


<details><summary>Click here for solution</summary>

```python
df = pd.read_csv(file_name, header=None)
print(df.head())
```
</details>


<h1> Task #2: </h1>
<h3>Add headers to the dataframe</h3>
The headers for the dataset, in sequence, are "Manufacturer", "Category", "Screen", "GPU", "OS", "CPU_core",
"Screen_Size_inch", "CPU_frequency", "RAM_GB", "Storage_GB_SSD", "Weight_kg" and "Price".

Confirm insertion by printing the first 10 rows of the dataset.


In [8]:
# Write your code below and press Shift+Enter to execute.
headers = ["Manufacturer", "Category", "Screen", "GPU", "OS", "CPU_core", "Screen_Size_inch", "CPU_frequency", "RAM_GB", "Storage_GB_SSD", "Weight_kg", "Price"]
df.columns = headers
print(df.head(10))

  Manufacturer  Category     Screen  GPU  OS  CPU_core Screen_Size_inch  \
0         Dell         3    Full HD    1   1         3           39.624   
1         Dell         3    Full HD    1   1         7           39.624   
2         Dell         4  IPS Panel    2   1         5           33.782   
3           HP         4    Full HD    2   1         7           39.624   
4         Dell         3    Full HD    1   1         5           39.624   
5           HP         3    Full HD    3   1         5           39.624   
6         Acer         3  IPS Panel    2   1         5             38.1   
7         Dell         3    Full HD    1   1         5           39.624   
8         Acer         3  IPS Panel    3   1         7             38.1   
9         Dell         3    Full HD    1   1         7           39.624   

   CPU_frequency  RAM_GB  Storage_GB_SSD Weight_kg  Price  
0            2.0       4             256       2.2    634  
1            2.7       8             256       2.2    

<details><summary>Click here for solution</summary>

```python
# create headers list
headers = ["Manufacturer", "Category", "Screen", "GPU", "OS", "CPU_core", "Screen_Size_inch", "CPU_frequency", "RAM_GB", "Storage_GB_SSD", "Weight_kg", "Price"]
df.columns = headers
print(df.head(10))
```
</details>
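
As an alternative to assigning `df.columns` afterwards, the headers can also be supplied while reading the file through the `names` parameter of `pandas.read_csv()`. This is a minimal sketch on a made-up three-column sample; `sample_csv` and `sample_headers` are hypothetical names used only here.

```python
import io
import pandas as pd

# Hypothetical header-less sample with three of the lab's columns (illustration only).
sample_csv = "Dell,3,634\nHP,4,837\n"
sample_headers = ["Manufacturer", "Category", "Price"]

# names= assigns the column labels at read time; header=None keeps the first row as data.
named_df = pd.read_csv(io.StringIO(sample_csv), header=None, names=sample_headers)

print(named_df.columns.tolist())
print(len(named_df))
```

Either approach ends with the same labels; setting `df.columns` is convenient when the dataframe is already loaded.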


<h1> Task #3: </h1>
<h3>Replace '?' with 'NaN'</h3>
Replace the '?' entries in the dataset with the NaN value provided by the NumPy package.


In [9]:
# Write your code below and press Shift+Enter to execute.
df.replace('?', np.nan, inplace=True)

<details><summary>Click here for solution</summary>

```python
df.replace('?',np.nan, inplace = True)
```
</details>
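
To check that the replacement worked, you can count the missing values per column. The sketch below uses a small made-up dataframe (`sample` is a hypothetical name, and its values are illustrative) rather than the lab's data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample where '?' marks a missing entry (illustration only).
sample = pd.DataFrame({
    "Screen_Size_inch": ["39.624", "?", "38.1"],
    "Weight_kg": ["2.2", "1.22", "?"],
})

# Replace the placeholder with NaN; this returns a new dataframe.
cleaned = sample.replace("?", np.nan)

# isna().sum() counts the NaN entries in each column.
missing_counts = cleaned.isna().sum()
print(missing_counts)
```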


<h1> Task #4: </h1>
<h3>Print the data types of the dataframe columns</h3>
Make a note of the data types of the different columns of the dataset.


In [10]:
# Write your code below and press Shift+Enter to execute.
print(df.dtypes)

Manufacturer         object
Category              int64
Screen               object
GPU                   int64
OS                    int64
CPU_core              int64
Screen_Size_inch     object
CPU_frequency       float64
RAM_GB                int64
Storage_GB_SSD        int64
Weight_kg            object
Price                 int64
dtype: object


<details><summary>Click here for solution</summary>

```python
print(df.dtypes)
```
</details>
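
Columns that contained '?' entries, such as `Screen_Size_inch` and `Weight_kg`, are reported as `object` because their values were read as strings. One possible way to obtain a numeric column is `pandas.to_numeric()`; the sketch below is an illustration on made-up values, not a step required by this lab.

```python
import pandas as pd

# Hypothetical column that mixes numeric strings with a '?' placeholder.
sample = pd.DataFrame({"Weight_kg": ["2.2", "?", "1.22"]})
print(sample.dtypes)

# errors="coerce" turns anything that cannot be parsed as a number into NaN.
weight_numeric = pd.to_numeric(sample["Weight_kg"], errors="coerce")
print(weight_numeric.dtype)
```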


<h1> Task #5: </h1>
<h3>Print the statistical description of the dataset, including that of 'object' data types.</h3>


In [11]:
# Write your code below and press Shift+Enter to execute.
print(df.describe(include='all'))

       Manufacturer    Category   Screen         GPU          OS    CPU_core  \
count           237  237.000000      237  237.000000  237.000000  237.000000   
unique           11         NaN        2         NaN         NaN         NaN   
top            Dell         NaN  Full HD         NaN         NaN         NaN   
freq             71         NaN      161         NaN         NaN         NaN   
mean            NaN    3.202532      NaN    2.151899    1.059072    5.632911   
std             NaN    0.776450      NaN    0.639556    0.236258    1.243736   
min             NaN    1.000000      NaN    1.000000    1.000000    3.000000   
25%             NaN    3.000000      NaN    2.000000    1.000000    5.000000   
50%             NaN    3.000000      NaN    2.000000    1.000000    5.000000   
75%             NaN    4.000000      NaN    3.000000    1.000000    7.000000   
max             NaN    5.000000      NaN    3.000000    2.000000    7.000000   

       Screen_Size_inch  CPU_frequency 

<details><summary>Click here for solution</summary>

```python
print(df.describe(include='all'))
```
</details>
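
By default, `describe()` summarizes only the numeric columns; `include='all'` adds the `unique`, `top`, and `freq` rows for `object` columns. A minimal sketch on a made-up dataframe (the name `sample` and its values are assumptions for illustration):

```python
import pandas as pd

# Hypothetical sample with one text column and one numeric column.
sample = pd.DataFrame({
    "Manufacturer": ["Dell", "Dell", "HP"],
    "Price": [634, 946, 837],
})

numeric_summary = sample.describe()           # numeric columns only
full_summary = sample.describe(include="all")  # all columns

print(numeric_summary.columns.tolist())
print(full_summary.loc[["unique", "top", "freq"], "Manufacturer"])
```

Statistics that do not apply to a column's type appear as NaN in the combined summary, as in the output above.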


<h1> Task #6: </h1>
<h3>Print the summary information of the dataset.</h3>


In [12]:
# Write your code below and press Shift+Enter to execute.
print(df.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 237 entries, 0 to 236
Data columns (total 12 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   Manufacturer      237 non-null    object 
 1   Category          237 non-null    int64  
 2   Screen            237 non-null    object 
 3   GPU               237 non-null    int64  
 4   OS                237 non-null    int64  
 5   CPU_core          237 non-null    int64  
 6   Screen_Size_inch  233 non-null    object 
 7   CPU_frequency     237 non-null    float64
 8   RAM_GB            237 non-null    int64  
 9   Storage_GB_SSD    237 non-null    int64  
 10  Weight_kg         232 non-null    object 
 11  Price             237 non-null    int64  
dtypes: float64(1), int64(7), object(4)
memory usage: 18.6+ KB
None


<details><summary>Click here for solution</summary>

```python
print(df.info())
```
</details>
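
The trailing `None` in the output above appears because `info()` writes its summary directly and returns `None`, which `print()` then displays. The sketch below illustrates this on a made-up dataframe by capturing the summary in a text buffer; `sample` and `buffer` are hypothetical names.

```python
import io
import pandas as pd

# Hypothetical two-row sample (illustration only).
sample = pd.DataFrame({"Manufacturer": ["Dell", "HP"], "Price": [634, 837]})

# info() writes its summary to the given buffer and returns None.
buffer = io.StringIO()
result = sample.info(buf=buffer)
summary_text = buffer.getvalue()

print(result is None)
print(summary_text)
```

Calling `df.info()` on its own, without `print()`, shows the same summary without the extra `None` line.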


---


# Congratulations! You have completed the lab


## Authors


[Abhishek Gagneja](https://www.coursera.org/instructor/~129186572)

[Vicky Kuo](https://author.skills.network/instructors/vicky_kuo)


## Change Log


|Date (YYYY-MM-DD)|Version|Changed By|Change Description|
|-|-|-|-|
|2023-09-15|0.1|Abhishek Gagneja|Initial Version Created|
|2023-09-18|0.2|Vicky Kuo|Reviewed and Revised|


Copyright © 2023 IBM Corporation. All rights reserved.
