# **Loading and Drawing Basic Insights on a Dataset - Laptops Pricing**


This notebook practices loading a dataset and drawing basic insights from it.


# Objectives

 - Import a dataset from a CSV file to a Pandas dataframe
 - Develop some basic insights about the dataset


# Setup


I am using the following libraries:

*   `requests` for downloading the dataset.

*   [`pandas`](https://pandas.pydata.org/?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMML0187ENSkillsNetwork31430127-2021-01-01) for managing the data.
*   [`numpy`](https://numpy.org/?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMML0187ENSkillsNetwork31430127-2021-01-01) for mathematical operations.


### Importing Required Libraries


In [1]:

%pip install requests
%pip install pandas
%pip install numpy

Note: you may need to restart the kernel to use updated packages.


In [2]:
import pandas as pd
import numpy as np

The function below downloads the dataset and saves it to a local file:


In [3]:
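import requests

def download(url, filename):
    """Fetch url and write the response body to filename."""
    response = requests.get(url, timeout=30)
    # Raise an HTTPError on a 4xx/5xx response instead of silently skipping the write.
    response.raise_for_status()
    with open(filename, "wb") as f:
        f.write(response.content)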

In [4]:
file_path = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DA0101EN-Coursera/laptop_pricing_dataset_base.csv"

To obtain the dataset, call the `download()` function defined above:


In [5]:
# The ../data directory must already exist.
file_name = "../data/laptops.csv"
download(file_path, file_name)

In [6]:
df = pd.read_csv(file_name)

<h3>Load the dataset to a pandas dataframe named 'df'</h3>
The CSV file has no header row, so pass `header=None` to keep the first record as data. Print the first 5 entries of the dataset to confirm loading.


In [7]:
df = pd.read_csv(file_name, header=None)
df.head()

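| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Acer | 4 | IPS Panel | 2 | 1 | 5 | 35.56 | 1.6 | 8 | 256 | 1.6 | 978 |
| 1 | Dell | 3 | Full HD | 1 | 1 | 3 | 39.624 | 2.0 | 4 | 256 | 2.2 | 634 |
| 2 | Dell | 3 | Full HD | 1 | 1 | 7 | 39.624 | 2.7 | 8 | 256 | 2.2 | 946 |
| 3 | Dell | 4 | IPS Panel | 2 | 1 | 5 | 33.782 | 1.6 | 8 | 128 | 1.22 | 1244 |
| 4 | HP | 4 | Full HD | 2 | 1 | 7 | 39.624 | 1.8 | 8 | 256 | 1.91 | 837 |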
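To see why `header=None` matters for a file without a header row, the sketch below reads a tiny in-memory CSV both ways. The two rows are copied from the output above; the `sample_csv` name and the use of `io.StringIO` in place of the real file are assumptions made for illustration.

```python
import io
import pandas as pd

# Two data rows copied from the output above; there is no header line.
sample_csv = (
    "Acer,4,IPS Panel,2,1,5,35.56,1.6,8,256,1.6,978\n"
    "Dell,3,Full HD,1,1,3,39.624,2.0,4,256,2.2,634\n"
)

# Default behaviour: the first line is consumed as column names.
with_default = pd.read_csv(io.StringIO(sample_csv))

# header=None: every line is treated as data and columns are numbered 0..11.
with_none = pd.read_csv(io.StringIO(sample_csv), header=None)

print(len(with_default), len(with_none))
print(list(with_none.columns))
```

With the default setting, the first laptop record would be lost to the header, which is why the notebook reads the file with `header=None`.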


<h3>Add headers to the dataframe</h3>
The headers for the dataset, in sequence, are "Manufacturer", "Category", "Screen", "GPU", "OS", "CPU_core",
"Screen_Size_inch", "CPU_frequency", "RAM_GB", "Storage_GB_SSD", "Weight_kg" and "Price".

Confirm insertion by printing the first 10 rows of the dataset.


In [8]:
headers = ["Manufacturer", "Category", "Screen", "GPU", "OS", "CPU_core", "Screen_Size_inch", "CPU_frequency", "RAM_GB", "Storage_GB_SSD", "Weight_kg", "Price"]
df.columns = headers
df.head(10)

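| | Manufacturer | Category | Screen | GPU | OS | CPU_core | Screen_Size_inch | CPU_frequency | RAM_GB | Storage_GB_SSD | Weight_kg | Price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Acer | 4 | IPS Panel | 2 | 1 | 5 | 35.56 | 1.6 | 8 | 256 | 1.6 | 978 |
| 1 | Dell | 3 | Full HD | 1 | 1 | 3 | 39.624 | 2.0 | 4 | 256 | 2.2 | 634 |
| 2 | Dell | 3 | Full HD | 1 | 1 | 7 | 39.624 | 2.7 | 8 | 256 | 2.2 | 946 |
| 3 | Dell | 4 | IPS Panel | 2 | 1 | 5 | 33.782 | 1.6 | 8 | 128 | 1.22 | 1244 |
| 4 | HP | 4 | Full HD | 2 | 1 | 7 | 39.624 | 1.8 | 8 | 256 | 1.91 | 837 |
| 5 | Dell | 3 | Full HD | 1 | 1 | 5 | 39.624 | 1.6 | 8 | 256 | 2.2 | 1016 |
| 6 | HP | 3 | Full HD | 3 | 1 | 5 | 39.624 | 1.6 | 8 | 256 | 2.1 | 1117 |
| 7 | Acer | 3 | IPS Panel | 2 | 1 | 5 | 38.1 | 1.6 | 4 | 256 | 2.2 | 866 |
| 8 | Dell | 3 | Full HD | 1 | 1 | 5 | 39.624 | 2.5 | 4 | 256 | 2.3 | 812 |
| 9 | Acer | 3 | IPS Panel | 3 | 1 | 7 | 38.1 | 1.8 | 8 | 256 | 2.2 | 1068 |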
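As an alternative to assigning `df.columns` after loading, the column labels can be supplied at read time through the `names` parameter of `pd.read_csv`. The sketch below is not what the notebook does; it uses one row copied from the output above and an in-memory `sample_csv` string (an assumption for illustration) instead of the real file.

```python
import io
import pandas as pd

headers = ["Manufacturer", "Category", "Screen", "GPU", "OS", "CPU_core",
           "Screen_Size_inch", "CPU_frequency", "RAM_GB", "Storage_GB_SSD",
           "Weight_kg", "Price"]

# One data row copied from the output above, standing in for the file.
sample_csv = "Acer,4,IPS Panel,2,1,5,35.56,1.6,8,256,1.6,978\n"

# header=None keeps the first line as data; names= labels the columns.
df_named = pd.read_csv(io.StringIO(sample_csv), header=None, names=headers)

print(df_named.columns.tolist())
print(df_named.loc[0, "Price"])
```

Both approaches give the same labels; assigning `df.columns` afterwards is handy when the frame is already loaded.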


<h3>Replace '?' with 'NaN'</h3>
Replace the '?' entries in the dataset with NaN, using `np.nan` from the NumPy package.


In [9]:
df.replace('?', np.nan, inplace=True)
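The same placeholder can also be handled while reading the file, through the `na_values` parameter of `pd.read_csv`. The following is a minimal sketch on hypothetical rows (only the '?' marker mirrors the dataset) that compares the two routes; `sample_csv`, `cols`, and the frame names are illustrative assumptions.

```python
import io
import numpy as np
import pandas as pd

# Hypothetical rows; only the '?' placeholder mirrors the dataset.
sample_csv = "35.56,1.6\n?,2.2\n39.624,?\n"
cols = ["Screen_Size_inch", "Weight_kg"]

# Route 1: load as-is, then replace '?' with NaN (as in the cell above,
# but assigning the result instead of using inplace=True).
replaced = pd.read_csv(io.StringIO(sample_csv), header=None, names=cols)
replaced = replaced.replace("?", np.nan)

# Route 2: tell the parser that '?' means missing.
parsed = pd.read_csv(io.StringIO(sample_csv), header=None, names=cols,
                     na_values="?")

print(replaced.isna().sum())
print(parsed.dtypes)
```

In the second route the parser never sees '?' as text, so the affected columns come out as floats rather than strings.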

<h3>Print the data types of the dataframe columns</h3>
Make a note of the data types of the different columns of the dataset.


In [10]:
df.dtypes

Manufacturer         object
Category              int64
Screen               object
GPU                   int64
OS                    int64
CPU_core              int64
Screen_Size_inch     object
CPU_frequency       float64
RAM_GB                int64
Storage_GB_SSD        int64
Weight_kg            object
Price                 int64
dtype: object
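`Screen_Size_inch` and `Weight_kg` are reported as `object` because they were read as strings: the '?' placeholders prevented pandas from parsing them as numbers. A hedged sketch of one way to convert such a column once the placeholders are NaN is shown below; the `weights` series is hypothetical and the conversion is not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical column: numeric strings plus one missing value.
weights = pd.Series(["1.6", "2.2", np.nan, "1.22"],
                    name="Weight_kg", dtype=object)

# pd.to_numeric parses the strings and leaves NaN untouched.
weights_numeric = pd.to_numeric(weights)

print(weights_numeric.dtype)
print(weights_numeric.mean())
```

After the conversion, numeric summaries such as the mean become available for the column.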

<h3>Print the statistical description of the dataset, including that of 'object' data types.</h3>


In [11]:
df.describe(include='all')

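| | Manufacturer | Category | Screen | GPU | OS | CPU_core | Screen_Size_inch | CPU_frequency | RAM_GB | Storage_GB_SSD | Weight_kg | Price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 238 | 238.0 | 238 | 238.0 | 238.0 | 238.0 | 234.0 | 238.0 | 238.0 | 238.0 | 233.0 | 238.0 |
| unique | 11 | NaN | 2 | NaN | NaN | NaN | 9.0 | NaN | NaN | NaN | 77.0 | NaN |
| top | Dell | NaN | Full HD | NaN | NaN | NaN | 39.624 | NaN | NaN | NaN | 2.2 | NaN |
| freq | 71 | NaN | 161 | NaN | NaN | NaN | 89.0 | NaN | NaN | NaN | 21.0 | NaN |
| mean | NaN | 3.205882 | NaN | 2.151261 | 1.058824 | 5.630252 | NaN | 2.360084 | 7.882353 | 245.781513 | NaN | 1462.344538 |
| std | NaN | 0.776533 | NaN | 0.638282 | 0.23579 | 1.241787 | NaN | 0.411393 | 2.482603 | 34.765316 | NaN | 574.607699 |
| min | NaN | 1.0 | NaN | 1.0 | 1.0 | 3.0 | NaN | 1.2 | 4.0 | 128.0 | NaN | 527.0 |
| 25% | NaN | 3.0 | NaN | 2.0 | 1.0 | 5.0 | NaN | 2.0 | 8.0 | 256.0 | NaN | 1066.5 |
| 50% | NaN | 3.0 | NaN | 2.0 | 1.0 | 5.0 | NaN | 2.5 | 8.0 | 256.0 | NaN | 1333.0 |
| 75% | NaN | 4.0 | NaN | 3.0 | 1.0 | 7.0 | NaN | 2.7 | 8.0 | 256.0 | NaN | 1777.0 |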
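By default, `describe()` summarizes only the numeric columns; `include='all'` adds the `unique`, `top`, and `freq` rows for `object` columns and fills the statistics that do not apply with NaN. The sketch below illustrates the difference on a hypothetical three-row frame (`demo` and its values are made up for illustration).

```python
import pandas as pd

# Hypothetical mini-frame with one text column and one numeric column.
demo = pd.DataFrame({
    "Manufacturer": pd.Series(["Dell", "Dell", "HP"], dtype=object),
    "Price": [634, 946, 837],
})

numeric_only = demo.describe()             # numeric columns only
everything = demo.describe(include="all")  # adds unique/top/freq rows

print(numeric_only.columns.tolist())
print(everything.loc[["unique", "top", "freq"], "Manufacturer"])
```

Reading the `top` and `freq` rows together gives the most common value in a text column and how often it occurs.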


<h3>Print the summary information of the dataset.</h3>


In [12]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 238 entries, 0 to 237
Data columns (total 12 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   Manufacturer      238 non-null    object 
 1   Category          238 non-null    int64  
 2   Screen            238 non-null    object 
 3   GPU               238 non-null    int64  
 4   OS                238 non-null    int64  
 5   CPU_core          238 non-null    int64  
 6   Screen_Size_inch  234 non-null    object 
 7   CPU_frequency     238 non-null    float64
 8   RAM_GB            238 non-null    int64  
 9   Storage_GB_SSD    238 non-null    int64  
 10  Weight_kg         233 non-null    object 
 11  Price             238 non-null    int64  
dtypes: float64(1), int64(7), object(4)
memory usage: 22.4+ KB
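The non-null counts above imply 4 missing values in `Screen_Size_inch` (238 - 234) and 5 in `Weight_kg` (238 - 233). These counts can also be computed directly with `isna().sum()`; the sketch below shows this on a small hypothetical frame, so its numbers are illustrative and not the dataset's.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a few missing entries.
demo = pd.DataFrame({
    "Screen_Size_inch": [35.56, np.nan, 39.624, 39.624],
    "Weight_kg": [1.6, 2.2, np.nan, np.nan],
    "Price": [978, 634, 946, 1244],
})

missing = demo.isna().sum()         # count of NaN per column
missing_share = demo.isna().mean()  # fraction of NaN per column

# Show only the columns that actually have missing values.
print(missing[missing > 0])
print(missing_share[missing > 0])
```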


<!--
## Change Log

|Date (YYYY-MM-DD)|Version|Changed By|Change Description|
|-|-|-|-|
|2023-09-15|0.1|Abhishek Gagneja|Initial Version Created|
|2023-09-18|0.2|Vicky Kuo|Reviewed and Revised|
-->
