<p style="text-align:center">
    <a href="https://skills.network" target="_blank">
    <img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/assets/logos/SN_web_lightmode.png" width="300" alt="Skills Network Logo">
    </a>
</p>


# **Hands-on Practice Lab: Importing Dataset - Laptops Pricing**

Estimated time needed: **20** minutes

In this lab, you will practice loading a dataset and drawing basic insights from it, as covered in the module. You are provided with a new 'Laptop Pricing' dataset, which is used for all the practice labs throughout the course.


# Objectives

After completing this lab you will be able to:

 - Import a dataset from a CSV file to a Pandas dataframe
 - Develop some basic insights about the dataset


# Setup


For this lab, we will be using the following libraries:

* `skillsnetwork` for downloading the dataset

*   [`pandas`](https://pandas.pydata.org/?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMML0187ENSkillsNetwork31430127-2021-01-01) for managing the data.
*   [`numpy`](https://numpy.org/?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMML0187ENSkillsNetwork31430127-2021-01-01) for mathematical operations.


### Importing Required Libraries


In [2]:
import pandas as pd
import numpy as np

The dataset is available at the URL assigned to `file_path` below.


The function below downloads the dataset into your browser environment:


In [3]:
from pyodide.http import pyfetch

async def download(url, filename):
    response = await pyfetch(url)
    if response.status == 200:
        with open(filename, "wb") as f:
            f.write(await response.bytes())

In [4]:
file_path = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DA0101EN-Coursera/laptop_pricing_dataset_base.csv"

To obtain the dataset, call the `download()` function defined above:


In [5]:
await download(file_path, "laptops.csv")
file_name="laptops.csv"

In [6]:
# The file has no header row, so header=None keeps the first record as data.
df = pd.read_csv(file_name, header=None)

> Note: This version of the lab runs on JupyterLite, which requires the dataset to be downloaded to the interface. If you are working on a downloaded copy of this notebook on your local machine, you can **skip the steps above** and pass the URL directly to the `pandas.read_csv()` function. To do so, uncomment and run the statements in the cell below.


In [None]:
#filepath = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DA0101EN-Coursera/laptop_pricing_dataset_base.csv"
#df = pd.read_csv(filepath, header=None)
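
For local work where you would rather keep a saved copy of the file than read the URL each time, the `download()` helper above will not run, because `pyodide` only exists in the browser. The following is a minimal sketch of a standard-library stand-in. The name `download_local` and the temporary file names are hypothetical, and the demonstration uses a `file://` URL so that it runs without network access.

```python
import shutil
import tempfile
import urllib.request
from pathlib import Path


def download_local(url, filename):
    """Stream the resource at `url` into `filename` (hypothetical local stand-in for download())."""
    with urllib.request.urlopen(url) as response, open(filename, "wb") as f:
        shutil.copyfileobj(response, f)


# Demonstration without network access: a file:// URL pointing at a temporary file.
workdir = Path(tempfile.mkdtemp())
source = workdir / "source.csv"
source.write_text("Acer,4,IPS Panel\n")
target = workdir / "laptops_copy.csv"

download_local(source.as_uri(), target)
print(target.read_text())
```

The same function should accept the `https://` dataset URL, since `urllib.request.urlopen` handles both schemes.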

<h1> Task #1: </h1>
<h3>Load the dataset to a pandas dataframe named 'df'</h3>
Print the first 5 entries of the dataset to confirm loading.


In [7]:
# Write your code below and press Shift+Enter to execute.
df = pd.read_csv(file_name, header=None)
print(df.head(5))

     0   1          2   3   4   5       6    7   8    9     10    11
0  Acer   4  IPS Panel   2   1   5   35.56  1.6   8  256   1.6   978
1  Dell   3    Full HD   1   1   3  39.624  2.0   4  256   2.2   634
2  Dell   3    Full HD   1   1   7  39.624  2.7   8  256   2.2   946
3  Dell   4  IPS Panel   2   1   5  33.782  1.6   8  128  1.22  1244
4    HP   4    Full HD   2   1   7  39.624  1.8   8  256  1.91   837


<details><summary>Click here for solution</summary>

```python
df = pd.read_csv(file_name, header=None)
print(df.head())
```
</details>
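
The `header=None` argument matters because the file has no header row. As a small sketch of the difference, the snippet below parses the first two rows from the preview above out of an in-memory string. The variable names are illustrative only.

```python
import io
import pandas as pd

# First two rows of the dataset, as shown in the preview above (no header row).
sample_csv = (
    "Acer,4,IPS Panel,2,1,5,35.56,1.6,8,256,1.6,978\n"
    "Dell,3,Full HD,1,1,3,39.624,2.0,4,256,2.2,634\n"
)

# Default behaviour: the first data row is consumed as column names.
with_default = pd.read_csv(io.StringIO(sample_csv))

# header=None: every row is data and the columns are numbered 0..11.
with_header_none = pd.read_csv(io.StringIO(sample_csv), header=None)

print(with_default.shape)
print(with_header_none.shape)
print(list(with_header_none.columns))
```

Without `header=None`, the Acer row silently becomes the column labels and one record is lost.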


<h1> Task #2: </h1>
<h3>Add headers to the dataframe</h3>
The headers for the dataset, in sequence, are "Manufacturer", "Category", "Screen", "GPU", "OS", "CPU_core",
"Screen_Size_inch", "CPU_frequency", "RAM_GB", "Storage_GB_SSD", "Weight_kg" and "Price".

Confirm insertion by printing the first 10 rows of the dataset.


In [11]:
# Write your code below and press Shift+Enter to execute.
headers=[ "Manufacturer", "Category", "Screen", "GPU", "OS", "CPU_core", "Screen_Size_inch", "CPU_frequency", "RAM_GB", "Storage_GB_SSD", "Weight_kg","Price"]
df.columns=headers
print(df.head(10))

  Manufacturer  Category     Screen  GPU  OS  CPU_core Screen_Size_inch  \
0         Acer         4  IPS Panel    2   1         5            35.56   
1         Dell         3    Full HD    1   1         3           39.624   
2         Dell         3    Full HD    1   1         7           39.624   
3         Dell         4  IPS Panel    2   1         5           33.782   
4           HP         4    Full HD    2   1         7           39.624   
5         Dell         3    Full HD    1   1         5           39.624   
6           HP         3    Full HD    3   1         5           39.624   
7         Acer         3  IPS Panel    2   1         5             38.1   
8         Dell         3    Full HD    1   1         5           39.624   
9         Acer         3  IPS Panel    3   1         7             38.1   

   CPU_frequency  RAM_GB  Storage_GB_SSD Weight_kg  Price  
0            1.6       8             256       1.6    978  
1            2.0       4             256       2.2    

<details><summary>Click here for solution</summary>

```python
# create headers list
headers = ["Manufacturer", "Category", "Screen", "GPU", "OS", "CPU_core", "Screen_Size_inch", "CPU_frequency", "RAM_GB", "Storage_GB_SSD", "Weight_kg", "Price"]
df.columns = headers
print(df.head(10))
```
</details>
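
As an alternative to assigning `df.columns` after loading, the headers can be supplied while reading through the `names` parameter of `pd.read_csv`. The sketch below assumes the same two preview rows held in an in-memory string; `df_named` is an illustrative name.

```python
import io
import pandas as pd

headers = ["Manufacturer", "Category", "Screen", "GPU", "OS", "CPU_core",
           "Screen_Size_inch", "CPU_frequency", "RAM_GB", "Storage_GB_SSD",
           "Weight_kg", "Price"]

# First two rows of the dataset, as shown in the Task #1 preview.
sample_csv = (
    "Acer,4,IPS Panel,2,1,5,35.56,1.6,8,256,1.6,978\n"
    "Dell,3,Full HD,1,1,3,39.624,2.0,4,256,2.2,634\n"
)

# names= labels the columns at read time; header=None states that no row holds labels.
df_named = pd.read_csv(io.StringIO(sample_csv), header=None, names=headers)

print(df_named[["Manufacturer", "Price"]])
```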


<h1> Task #3: </h1>
<h3>Replace '?' with 'NaN'</h3>
Replace the '?' entries in the dataset with the NaN value from the NumPy package.


In [14]:
# Write your code below and press Shift+Enter to execute.
# Use np.nan; the np.NaN alias was removed in NumPy 2.0.
df = df.replace('?', np.nan)
print(df.head(5))

   Manufacturer  Category     Screen  GPU  OS  CPU_core Screen_Size_inch  \
0          Acer         4  IPS Panel    2   1         5            35.56   
1          Dell         3    Full HD    1   1         3           39.624   
2          Dell         3    Full HD    1   1         7           39.624   
3          Dell         4  IPS Panel    2   1         5           33.782   
4            HP         4    Full HD    2   1         7           39.624   
5          Dell         3    Full HD    1   1         5           39.624   
6            HP         3    Full HD    3   1         5           39.624   
7          Acer         3  IPS Panel    2   1         5             38.1   
8          Dell         3    Full HD    1   1         5           39.624   
9          Acer         3  IPS Panel    3   1         7             38.1   
10         Dell         3    Full HD    1   1         7           39.624   
11           HP         3    Full HD    2   1         3           39.624   
12         A

<details><summary>Click here for solution</summary>

```python
df.replace('?',np.nan, inplace = True)
```
</details>
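
To check that the replacement worked, it helps to count the missing values per column. The sketch below uses a hypothetical three-column sample in which '?' marks missing entries, and also shows the `na_values` parameter of `pd.read_csv` as one possible way to do the same conversion while loading.

```python
import io
import numpy as np
import pandas as pd

# Hypothetical sample: three columns only, with '?' marking missing values.
sample_csv = "Acer,35.56,1.6\nDell,?,2.2\nHP,39.624,?\n"
cols = ["Manufacturer", "Screen_Size_inch", "Weight_kg"]

# Approach used in this task: load first, then replace '?' with NaN.
raw = pd.read_csv(io.StringIO(sample_csv), header=None, names=cols)
replaced = raw.replace("?", np.nan)

# Alternative: tell the parser to treat '?' as missing while reading.
at_read = pd.read_csv(io.StringIO(sample_csv), header=None, names=cols, na_values="?")

print(replaced.isna().sum())  # missing-value count per column
print(at_read.dtypes)
```

With `na_values`, the affected columns are parsed as floats straight away, whereas `replace` leaves them as text columns that still need converting.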


<h1> Task #4: </h1>
<h3>Print the data types of the dataframe columns</h3>
Make a note of the data types of the different columns of the dataset.


In [17]:
# Write your code below and press Shift+Enter to execute.
print(df.dtypes)

Manufacturer         object
Category              int64
Screen               object
GPU                   int64
OS                    int64
CPU_core              int64
Screen_Size_inch     object
CPU_frequency       float64
RAM_GB                int64
Storage_GB_SSD        int64
Weight_kg            object
Price                 int64
dtype: object


<details><summary>Click here for solution</summary>

```python
print(df.dtypes)
```
</details>
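
In the output above, `Screen_Size_inch` and `Weight_kg` are listed as `object` even though they hold numbers. A likely reason is that the '?' placeholders forced pandas to read those columns as text. The sketch below, built on a hypothetical mini-frame, shows one way to convert such columns with `pd.to_numeric` once the placeholders are replaced.

```python
import numpy as np
import pandas as pd

# Hypothetical mini-frame mimicking the two columns that load as text.
demo = pd.DataFrame({
    "Screen_Size_inch": ["35.56", "?", "39.624"],
    "Weight_kg": ["1.6", "2.2", "?"],
})

demo = demo.replace("?", np.nan)
print(demo.dtypes)  # still text columns at this point

# Convert the numeric strings to floats; NaN entries stay NaN.
for col in ["Screen_Size_inch", "Weight_kg"]:
    demo[col] = pd.to_numeric(demo[col])

print(demo.dtypes)
```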


<h1> Task #5: </h1>
<h3>Print the statistical description of the dataset, including that of 'object' data types.</h3>


In [18]:
# Write your code below and press Shift+Enter to execute.
df.describe(include='all')

| | Manufacturer | Category | Screen | GPU | OS | CPU_core | Screen_Size_inch | CPU_frequency | RAM_GB | Storage_GB_SSD | Weight_kg | Price |
|-|-|-|-|-|-|-|-|-|-|-|-|-|
| count | 238 | 238.0 | 238 | 238.0 | 238.0 | 238.0 | 234.0 | 238.0 | 238.0 | 238.0 | 233.0 | 238.0 |
| unique | 11 | | 2 | | | | 9.0 | | | | 77.0 | |
| top | Dell | | Full HD | | | | 39.624 | | | | 2.2 | |
| freq | 71 | | 161 | | | | 89.0 | | | | 21.0 | |
| mean | | 3.205882 | | 2.151261 | 1.058824 | 5.630252 | | 2.360084 | 7.882353 | 245.781513 | | 1462.344538 |
| std | | 0.776533 | | 0.638282 | 0.23579 | 1.241787 | | 0.411393 | 2.482603 | 34.765316 | | 574.607699 |
| min | | 1.0 | | 1.0 | 1.0 | 3.0 | | 1.2 | 4.0 | 128.0 | | 527.0 |
| 25% | | 3.0 | | 2.0 | 1.0 | 5.0 | | 2.0 | 8.0 | 256.0 | | 1066.5 |
| 50% | | 3.0 | | 2.0 | 1.0 | 5.0 | | 2.5 | 8.0 | 256.0 | | 1333.0 |
| 75% | | 4.0 | | 3.0 | 1.0 | 7.0 | | 2.7 | 8.0 | 256.0 | | 1777.0 |


<details><summary>Click here for solution</summary>

```python
print(df.describe(include='all'))
```
</details>
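
The `include='all'` argument is what brings the text columns into the summary. As a small sketch of the difference, the snippet below compares three `describe()` calls on a hand-made frame whose values are taken from the preview rows. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Small hand-made frame using values from the preview rows.
demo = pd.DataFrame({
    "Manufacturer": ["Acer", "Dell", "Dell", "HP"],
    "Price": [978, 634, 946, 837],
})

# Default: numeric columns only (count, mean, std, min, quartiles, max).
numeric_only = demo.describe()

# include='all': adds unique, top and freq rows for non-numeric columns.
everything = demo.describe(include="all")

# exclude=[np.number]: summarises only the non-numeric columns.
non_numeric = demo.describe(exclude=[np.number])

print(numeric_only)
print(everything)
print(non_numeric)
```

Statistics that do not apply to a column, such as `mean` for `Manufacturer`, appear as NaN in the combined summary.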


<h1> Task #6: </h1>
<h3>Print the summary information of the dataset.</h3>


In [19]:
# Write your code below and press Shift+Enter to execute.
# Call info() with parentheses. print(df.info) prints only the bound method
# and the dataframe's repr, not the summary.
df.info()

<details><summary>Click here for solution</summary>

```python
print(df.info())
```
</details>
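
Note that `info()` prints its report itself and returns `None`, so wrapping it in `print()` adds a trailing `None` line. The sketch below, on a hypothetical three-row frame, shows this and also how the `buf` parameter can capture the report as a string.

```python
import io
import pandas as pd

# Hypothetical frame with one missing weight.
demo = pd.DataFrame({
    "Manufacturer": ["Acer", "Dell", "HP"],
    "Weight_kg": [1.6, None, 1.91],
})

# info() writes its report to stdout and returns None.
result = demo.info()

# Passing a text buffer captures the report as a string instead.
buffer = io.StringIO()
demo.info(buf=buffer)
report = buffer.getvalue()

print(result)
print(report)
```

The non-null counts in the report give a quick view of which columns contain missing values.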


---


# Congratulations! You have completed the lab


## Authors


[Abhishek Gagneja](https://www.coursera.org/instructor/~129186572)

[Vicky Kuo](https://author.skills.network/instructors/vicky_kuo)


Copyright © 2023 IBM Corporation. All rights reserved.


<!--
## Change Log

|Date (YYYY-MM-DD)|Version|Changed By|Change Description|
|-|-|-|-|
|2023-09-15|0.1|Abhishek Gagneja|Initial Version Created|
|2023-09-18|0.2|Vicky Kuo|Reviewed and Revised|
-->
