# **Hands-on Practice Lab: Importing Dataset - Laptops Pricing**

In this lab, you will practice loading a dataset and drawing basic insights from it, as covered in the module. You are provided with a fresh 'Laptop Pricing' dataset, which will be used for all the practice labs throughout the course.


# Objectives

After completing this lab you will be able to:

 - Import a dataset from a CSV file to a Pandas dataframe
 - Develop some basic insights about the dataset


# Setup


For this lab, we will be using the following libraries:

* `skillsnetwork` for downloading the dataset

*   [`pandas`](https://pandas.pydata.org/?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMML0187ENSkillsNetwork31430127-2021-01-01) for managing the data.
*   [`numpy`](https://numpy.org/?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMML0187ENSkillsNetwork31430127-2021-01-01) for mathematical operations.


### Importing Required Libraries


In [1]:
import pandas as pd
import numpy as np

The data set to be used is available on the link below.


The function below downloads the dataset into your browser:


In [None]:
# from pyodide.http import pyfetch

# async def download(url, filename):
#     response = await pyfetch(url)
#     if response.status == 200:
#         with open(filename, "wb") as f:
#             f.write(await response.bytes())

In [None]:
# file_path = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DA0101EN-Coursera/laptop_pricing_dataset_base.csv"

To obtain the dataset, call the `download()` function defined above:


In [None]:
# await download(file_path, "laptops.csv")
# file_name="laptops.csv"

In [None]:
# df = pd.read_csv(file_name)

> Note: This version of the lab runs on JupyterLite, which requires the dataset to be downloaded to the interface. If you are working with a downloaded copy of this notebook on your local machine, you can **skip the steps above** and pass the URL directly to the `pandas.read_csv()` function, as in the cell below.


In [4]:
filepath = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DA0101EN-Coursera/laptop_pricing_dataset_base.csv"
file_name="laptops.csv"
df = pd.read_csv(filepath, header=None)
df.to_csv(file_name)
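
By default, `DataFrame.to_csv()` writes both the row index and a header row. Reading such a file back with `header=None` therefore yields one extra column and one extra row, which is what the outputs in the tasks below show. The following is a minimal sketch of a round trip that avoids this, assuming a small in-memory sample (the two rows are copied from the dataset preview) and a hypothetical temporary file path used only for illustration:

```python
import os
import tempfile

import pandas as pd

# Two sample rows copied from the dataset preview (no header row, as in the source CSV).
sample = pd.DataFrame([
    ["Acer", 4, "IPS Panel", 2, 1, 5, "35.56", 1.6, 8, 256, "1.6", 978],
    ["Dell", 3, "Full HD", 1, 1, 3, "39.624", 2.0, 4, 256, "2.2", 634],
])

# Hypothetical temporary location, used only for this illustration.
tmp_path = os.path.join(tempfile.mkdtemp(), "laptops_sample.csv")

# index=False and header=False keep the file in the same shape as the original CSV.
sample.to_csv(tmp_path, index=False, header=False)

# Reading it back with header=None now gives exactly the 12 data columns.
round_trip = pd.read_csv(tmp_path, header=None)
print(round_trip.shape)
```

With this variant, the 12-entry header list given in Task #2 would match the number of columns directly.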

<h1> Task #1: </h1>
<h3>Load the dataset to a pandas dataframe named 'df'</h3>
Print the first 5 entries of the dataset to confirm loading.


In [5]:
# Write your code below and press Shift+Enter to execute.
df = pd.read_csv(file_name, header=None)
df.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12
0,,0,1,2,3,4,5,6.0,7.0,8,9,10.0,11
1,0.0,Acer,4,IPS Panel,2,1,5,35.56,1.6,8,256,1.6,978
2,1.0,Dell,3,Full HD,1,1,3,39.624,2.0,4,256,2.2,634
3,2.0,Dell,3,Full HD,1,1,7,39.624,2.7,8,256,2.2,946
4,3.0,Dell,4,IPS Panel,2,1,5,33.782,1.6,8,128,1.22,1244


<details><summary>Click here for solution</summary>

```python
df = pd.read_csv(file_name, header=None)
print(df.head())
```
</details>


<h1> Task #2: </h1>
<h3>Add headers to the dataframe</h3>
The headers for the dataset, in sequence, are "Manufacturer", "Category", "Screen", "GPU", "OS", "CPU_core",
"Screen_Size_inch", "CPU_frequency", "RAM_GB", "Storage_GB_SSD", "Weight_kg" and "Price".

Confirm insertion by printing the first 10 rows of the dataset.


In [10]:
# Write your code below and press Shift+Enter to execute.
header = ["symboling","Manufacturer", "Category", "Screen", "GPU", "OS", "CPU_core","Screen_Size_inch", "CPU_frequency", "RAM_GB", "Storage_GB_SSD", "Weight_kg","Price"]
df.columns = header

<details><summary>Click here for solution</summary>

```python
# create headers list
headers = ["Manufacturer", "Category", "Screen", "GPU", "OS", "CPU_core", "Screen_Size_inch", "CPU_frequency", "RAM_GB", "Storage_GB_SSD", "Weight_kg", "Price"]
df.columns = headers
print(df.head(10))
```
</details>
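
Assigning to `df.columns` only works when the list has exactly as many labels as the dataframe has columns; otherwise pandas raises a `ValueError` for the length mismatch. As an alternative sketch, the labels can also be supplied while reading the file through the `names` parameter of `read_csv`. The sample text below stands in for the CSV file and uses two rows copied from the dataset preview; the variable names `csv_text` and `df_named` are assumptions for illustration:

```python
import io

import pandas as pd

headers = ["Manufacturer", "Category", "Screen", "GPU", "OS", "CPU_core",
           "Screen_Size_inch", "CPU_frequency", "RAM_GB", "Storage_GB_SSD",
           "Weight_kg", "Price"]

# Two header-less rows copied from the dataset preview, standing in for the CSV file.
csv_text = (
    "Acer,4,IPS Panel,2,1,5,35.56,1.6,8,256,1.6,978\n"
    "Dell,3,Full HD,1,1,3,39.624,2.0,4,256,2.2,634\n"
)

# names= assigns the column labels while reading; header=None marks the first line as data.
df_named = pd.read_csv(io.StringIO(csv_text), header=None, names=headers)
print(df_named.head(10))
```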


<h1> Task #3: </h1>
<h3>Replace '?' with 'NaN'</h3>
Replace the '?' entries in the dataset with the NaN value provided by the NumPy package.


In [11]:
df.head(10)

Unnamed: 0,symboling,Manufacturer,Category,Screen,GPU,OS,CPU_core,Screen_Size_inch,CPU_frequency,RAM_GB,Storage_GB_SSD,Weight_kg,Price
0,,0,1,2,3,4,5,6.0,7.0,8,9,10.0,11
1,0.0,Acer,4,IPS Panel,2,1,5,35.56,1.6,8,256,1.6,978
2,1.0,Dell,3,Full HD,1,1,3,39.624,2.0,4,256,2.2,634
3,2.0,Dell,3,Full HD,1,1,7,39.624,2.7,8,256,2.2,946
4,3.0,Dell,4,IPS Panel,2,1,5,33.782,1.6,8,128,1.22,1244
5,4.0,HP,4,Full HD,2,1,7,39.624,1.8,8,256,1.91,837
6,5.0,Dell,3,Full HD,1,1,5,39.624,1.6,8,256,2.2,1016
7,6.0,HP,3,Full HD,3,1,5,39.624,1.6,8,256,2.1,1117
8,7.0,Acer,3,IPS Panel,2,1,5,38.1,1.6,4,256,2.2,866
9,8.0,Dell,3,Full HD,1,1,5,39.624,2.5,4,256,2.3,812


In [12]:
df.tail(10)

Unnamed: 0,symboling,Manufacturer,Category,Screen,GPU,OS,CPU_core,Screen_Size_inch,CPU_frequency,RAM_GB,Storage_GB_SSD,Weight_kg,Price
229,228.0,HP,2,Full HD,2,1,5,31.75,2.3,8,256,1.26,2120
230,229.0,Dell,4,Full HD,2,1,5,35.56,2.5,8,256,1.36,2082
231,230.0,Dell,4,Full HD,2,1,5,?,2.5,8,256,1.36,1870
232,231.0,Dell,4,Full HD,2,1,7,35.56,2.8,8,256,1.36,2255
233,232.0,Toshiba,3,Full HD,2,1,5,33.782,2.3,8,256,1.2,1855
234,233.0,Lenovo,4,IPS Panel,2,1,7,35.56,2.6,8,256,1.7,1891
235,234.0,Toshiba,3,Full HD,2,1,5,33.782,2.4,8,256,1.2,1950
236,235.0,Lenovo,4,IPS Panel,2,1,5,30.48,2.6,8,256,1.36,2236
237,236.0,Lenovo,3,Full HD,3,1,5,39.624,2.5,6,256,2.4,883
238,237.0,Toshiba,3,Full HD,2,1,5,35.56,2.3,8,256,1.95,1499


In [13]:
# Write your code below and press Shift+Enter to execute.
df = df.replace("?", np.nan)  # np.NaN was removed in NumPy 2.0; np.nan works in all versions

<details><summary>Click here for solution</summary>

```python
df.replace('?',np.nan, inplace = True)
```
</details>
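
To confirm that the replacement worked, it can help to count the missing values per column. The sketch below also shows a second option, assuming you control the loading step: `read_csv` can treat '?' as missing while parsing through its `na_values` parameter. The sample rows are copied from the `df.tail(10)` output above; `csv_text`, `raw`, `replaced`, and `parsed` are hypothetical names:

```python
import io

import numpy as np
import pandas as pd

# Header-less sample rows; the second row has '?' in the screen-size field.
csv_text = (
    "Dell,4,Full HD,2,1,5,35.56,2.5,8,256,1.36,2082\n"
    "Dell,4,Full HD,2,1,5,?,2.5,8,256,1.36,1870\n"
)

raw = pd.read_csv(io.StringIO(csv_text), header=None)

# Option 1: replace after loading, as in this task.
replaced = raw.replace("?", np.nan)

# Option 2: let read_csv treat '?' as missing while parsing.
parsed = pd.read_csv(io.StringIO(csv_text), header=None, na_values="?")

# Count missing values per column to confirm the replacement.
print(replaced.isna().sum())

# With na_values, the affected column is parsed as numeric right away.
print(parsed[6].dtype)
```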


<h1> Task #4: </h1>
<h3>Print the data types of the dataframe columns</h3>
Make a note of the data types of the different columns of the dataset.


In [14]:
# Write your code below and press Shift+Enter to execute.
df.dtypes

symboling           float64
Manufacturer         object
Category              int64
Screen               object
GPU                   int64
OS                    int64
CPU_core              int64
Screen_Size_inch     object
CPU_frequency       float64
RAM_GB                int64
Storage_GB_SSD        int64
Weight_kg            object
Price                 int64
dtype: object

<details><summary>Click here for solution</summary>

```python
print(df.dtypes)
```
</details>
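
In the output above, `Screen_Size_inch` and `Weight_kg` are reported as `object` because they were parsed as text while the '?' entries were still present. A minimal sketch of converting such columns to a numeric type with `pd.to_numeric` follows; it assumes a small stand-in dataframe named `demo` rather than the full dataset:

```python
import numpy as np
import pandas as pd

# Small stand-in frame: values stored as strings, with missing entries left by the '?' replacement.
demo = pd.DataFrame({
    "Screen_Size_inch": ["35.56", np.nan, "39.624"],
    "Weight_kg": ["1.6", "2.2", np.nan],
})
print(demo.dtypes)

# Convert each text column to a numeric dtype; NaN entries stay NaN.
for col in ["Screen_Size_inch", "Weight_kg"]:
    demo[col] = pd.to_numeric(demo[col])

print(demo.dtypes)
```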


<h1> Task #5: </h1>
<h3>Print the statistical description of the dataset, including that of 'object' data types.</h3>


In [15]:
# Write your code below and press Shift+Enter to execute.
df.describe(include='all')

Unnamed: 0,symboling,Manufacturer,Category,Screen,GPU,OS,CPU_core,Screen_Size_inch,CPU_frequency,RAM_GB,Storage_GB_SSD,Weight_kg,Price
count,238.0,239,239.0,239,239.0,239.0,239.0,235.0,239.0,239.0,239.0,234.0,239.0
unique,,12,,3,,,,10.0,,,,78.0,
top,,Dell,,Full HD,,,,39.624,,,,2.2,
freq,,71,,161,,,,89.0,,,,21.0,
mean,118.5,,3.196653,,2.154812,1.07113,5.627615,,2.379498,7.882845,244.790795,,1456.271967
std,68.848868,,0.787927,,0.639301,0.302585,1.239846,,0.508539,2.477393,37.922718,,581.033661
min,0.0,,1.0,,1.0,1.0,3.0,,1.2,4.0,9.0,,11.0
25%,59.25,,3.0,,2.0,1.0,5.0,,2.0,8.0,256.0,,1061.5
50%,118.5,,3.0,,2.0,1.0,5.0,,2.5,8.0,256.0,,1333.0
75%,177.75,,4.0,,3.0,1.0,7.0,,2.7,8.0,256.0,,1777.0


<details><summary>Click here for solution</summary>

```python
print(df.describe(include='all'))
```
</details>
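
The `include='all'` argument matters because `describe()` summarizes only numeric columns by default. The sketch below contrasts the two calls on a small stand-in dataframe (`demo`, with values taken from the dataset preview), so the extra `unique`, `top`, and `freq` rows for text columns are easy to see:

```python
import pandas as pd

# Stand-in frame with one text column and one numeric column.
demo = pd.DataFrame({
    "Manufacturer": ["Acer", "Dell", "Dell", "HP"],
    "Price": [978, 634, 946, 837],
})

# Default: only numeric columns are summarized.
numeric_only = demo.describe()
print(numeric_only)

# include="all": text columns get count, unique, top, and freq rows as well.
everything = demo.describe(include="all")
print(everything)
```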


<h1> Task #6: </h1>
<h3>Print the summary information of the dataset.</h3>


In [16]:
# Write your code below and press Shift+Enter to execute.
# Call the method with parentheses; a bare `df.info` only displays the bound method object.
df.info()

<details><summary>Click here for solution</summary>

```python
print(df.info())
```
</details>
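
Note that `DataFrame.info()` prints its summary itself and returns `None`, so wrapping it in `print()` displays an extra `None` after the summary. If you want the summary as a string instead, one option is to pass a text buffer through the `buf` parameter. A minimal sketch, assuming a small stand-in dataframe named `demo`:

```python
import io

import pandas as pd

demo = pd.DataFrame({
    "Manufacturer": ["Acer", "Dell"],
    "Price": [978, 634],
})

# info() writes its summary to standard output and returns None.
result = demo.info()
print(result is None)

# To keep the summary as a string, pass a text buffer through the buf parameter.
buffer = io.StringIO()
demo.info(buf=buffer)
summary = buffer.getvalue()
print(summary)
```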


---


#### Author
[Ahmad Mubarak](https://www.linkedin.com/in/ahmad-mubarak-19861a177/)

<!-- ## Change Log

|Date (YYYY-MM-DD)|Version|Changed By|Change Description|
|-|-|-|-|
|2023-09-15|0.1|Abhishek Gagneja|Initial Version Created|
|2023-09-18|0.2|Vicky Kuo|Reviewed and Revised|
-->
