# Data Science portfolio project

#### This is a portfolio project for the Codecademy Data Scientist path, completed as the path's final project.

##### The goal of this project is to analyze a dataset of popular laptops sold by Best Buy in 2024.

##### These are the questions to answer by the end:

1. Most popular brand
2. Average cost of laptops
3. Cheapest laptop
4. Most expensive laptop
5. Ratings and reviews

In [2]:
import pandas as pd
laptops_2024 = pd.read_csv('best_buy_laptops_2024.csv')

laptops_2024.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 489 entries, 0 to 488
Data columns (total 10 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   brand                        489 non-null    object 
 1   model                        488 non-null    object 
 2   aggregateRating/ratingValue  357 non-null    float64
 3   aggregateRating/reviewCount  357 non-null    float64
 4   offers/price                 262 non-null    float64
 5   offers/priceCurrency         489 non-null    object 
 6   depth                        431 non-null    float64
 7   width                        431 non-null    float64
 8   features/0/description       486 non-null    object 
 9   features/1/description       483 non-null    object 
dtypes: float64(5), object(5)
memory usage: 38.3+ KB


Some column names are too long, so they are renamed first.

In [4]:
new_names = {
    'aggregateRating/ratingValue' : 'rating', 
    'aggregateRating/reviewCount' : 'reviewCount', 
    'offers/price' : 'price', 
    'offers/priceCurrency' : 'currency', 
}
laptops_2024 = laptops_2024.rename(
    mapper = new_names, 
    axis = 1,
)
laptops_2024.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 489 entries, 0 to 488
Data columns (total 10 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   brand                   489 non-null    object 
 1   model                   488 non-null    object 
 2   rating                  357 non-null    float64
 3   reviewCount             357 non-null    float64
 4   price                   262 non-null    float64
 5   currency                489 non-null    object 
 6   depth                   431 non-null    float64
 7   width                   431 non-null    float64
 8   features/0/description  486 non-null    object 
 9   features/1/description  483 non-null    object 
dtypes: float64(5), object(5)
memory usage: 38.3+ KB


Columns 8 and 9 both hold feature descriptions, so they are combined into a single column.

In [5]:
laptops_2024['features & description'] = laptops_2024['features/0/description'].str.cat(
    laptops_2024['features/1/description'], sep=' + '
)
column_drop = ['features/0/description', 'features/1/description']
laptops_2024 = laptops_2024.drop(
    labels=column_drop, 
    axis=1
)

laptops_2024.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 489 entries, 0 to 488
Data columns (total 9 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   brand                   489 non-null    object 
 1   model                   488 non-null    object 
 2   rating                  357 non-null    float64
 3   reviewCount             357 non-null    float64
 4   price                   262 non-null    float64
 5   currency                489 non-null    object 
 6   depth                   431 non-null    float64
 7   width                   431 non-null    float64
 8   features & description  481 non-null    object 
dtypes: float64(5), object(4)
memory usage: 34.5+ KB
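
The combined column has 481 non-null values, fewer than either source column, because `Series.str.cat` by default returns a missing value whenever either side is missing. The sketch below reproduces this on made-up data and shows one possible alternative that keeps whichever part is present. The `toy` frame and its values are assumptions for illustration, not rows from the dataset.

```python
import pandas as pd

# Toy stand-in for the two description columns (values are made up).
toy = pd.DataFrame({
    "features/0/description": ["8GB RAM", None, "16GB RAM"],
    "features/1/description": ["256GB SSD", "512GB SSD", None],
})

# Default behaviour: a missing value on either side makes the result missing.
default_join = toy["features/0/description"].str.cat(
    toy["features/1/description"], sep=" + "
)

# Alternative: join only the parts that are present in each row.
safe_join = toy.apply(
    lambda row: " + ".join(row.dropna().astype(str)), axis=1
)

print(default_join.isna().sum())  # 2
print(safe_join.tolist())
```

Whether partial descriptions are worth keeping depends on how the column is used later; here it is only displayed, so either choice works.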


Next, a separate DataFrame is created from the original to analyze the brands.

In [6]:
# Copy so that laptops_brand is independent of laptops_2024
laptops_brand = laptops_2024.copy()

laptops_brand.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 489 entries, 0 to 488
Data columns (total 9 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   brand                   489 non-null    object 
 1   model                   488 non-null    object 
 2   rating                  357 non-null    float64
 3   reviewCount             357 non-null    float64
 4   price                   262 non-null    float64
 5   currency                489 non-null    object 
 6   depth                   431 non-null    float64
 7   width                   431 non-null    float64
 8   features & description  481 non-null    object 
dtypes: float64(5), object(4)
memory usage: 34.5+ KB
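
Plain assignment only binds a second name to the same DataFrame, while `.copy()` creates an independent one. A minimal sketch of the difference, using a made-up two-row frame (the names `original`, `alias` and `independent` are hypothetical):

```python
import pandas as pd

original = pd.DataFrame({"brand": ["Lenovo", "Dell"], "price": [143.99, 4489.99]})

alias = original               # same object, just a second name
independent = original.copy()  # separate object with its own data

print(alias is original)        # True
print(independent is original)  # False

# A change made through the copy leaves the original untouched.
independent.loc[0, "brand"] = "changed"
print(original.loc[0, "brand"])  # Lenovo
```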


In [9]:
col_drop = ['price','currency','depth','width', 'features & description' ]

laptops_brand = laptops_brand.drop(
    labels = col_drop,
    axis=1
)

laptops_brand.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 489 entries, 0 to 488
Data columns (total 4 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   brand        489 non-null    object 
 1   model        488 non-null    object 
 2   rating       357 non-null    float64
 3   reviewCount  357 non-null    float64
dtypes: float64(2), object(2)
memory usage: 15.4+ KB


# Most popular brand

In [10]:
laptops_brand['brand'].value_counts()

brand
Lenovo       99
Dell         83
HP           68
ASUS         63
Microsoft    51
Acer         32
Razer        22
MSI          22
GIGABYTE     14
Samsung      13
LG            9
Alienware     6
HP OMEN       6
Thomson       1
Name: count, dtype: int64

In [11]:
brand_rating = (laptops_brand['rating'] == laptops_brand['rating'].max())
brand_review = (laptops_brand['reviewCount'] == laptops_brand['reviewCount'].max())

highest_rating = laptops_brand[brand_rating]
highest_review = laptops_brand[brand_review]

In [14]:
print('Highest rating:')
print(highest_rating['brand'].value_counts())
print()
print('Highest review:')
print(highest_review['brand'].value_counts())

Highest rating:
brand
ASUS         11
Dell         11
Lenovo       10
HP           10
Razer         5
Acer          5
Microsoft     3
LG            2
GIGABYTE      2
HP OMEN       1
Thomson       1
Name: count, dtype: int64

Highest review:
brand
HP    3
Name: count, dtype: int64


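1. Lenovo has the most models in the dataset (99), but it does not have the most top-rated models.
2. ASUS and Dell have the most models with the maximum rating (11 each), followed by Lenovo and HP (10 each).
3. The three laptops with the highest review count are all HP models.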
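
Counting models that reach the maximum rating tends to favor brands with many listings. A complementary view is a per-brand summary of mean rating and total reviews. The sketch below assumes the same column names as `laptops_brand` but runs on made-up rows; the brand labels `A` and `B` and all values are hypothetical.

```python
import pandas as pd

# Made-up rows with the same column names as laptops_brand.
toy = pd.DataFrame({
    "brand": ["A", "A", "B", "B", "B"],
    "rating": [5.0, 3.0, 4.5, None, 4.5],
    "reviewCount": [10, 2, 100, None, 50],
})

summary = (
    toy.groupby("brand")
    .agg(
        models=("rating", "size"),         # all rows, including unrated ones
        rated_models=("rating", "count"),  # rows that have a rating
        mean_rating=("rating", "mean"),    # missing ratings are skipped
        total_reviews=("reviewCount", "sum"),
    )
    .sort_values("mean_rating", ascending=False)
)
print(summary)
```

A mean over only a handful of rated models is noisy, so it helps to read `mean_rating` next to `rated_models` and `total_reviews`.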

# Average cost of laptops

In [15]:
laptops_2024['price'].describe().round(decimals=2)

count     262.00
mean     1159.04
std       782.61
min       143.99
25%       639.99
50%       959.99
75%      1494.99
max      4489.99
Name: price, dtype: float64

1. Average cost is: 1159.042. Most expensive laptop : 4489.9
3. Cheapest laptop: 143.9999
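
`describe()` skips missing values, so these statistics cover only the 262 of 489 rows that have a price. The sketch below shows, on a made-up `prices` series, how to check the share of missing prices and how the mean compares with the median when a few expensive models skew the distribution.

```python
import pandas as pd

# Made-up prices; None stands for a listing without a price.
prices = pd.Series([143.99, 959.99, None, 4489.99, None], name="price")

share_missing = prices.isna().mean()  # fraction of rows without a price
mean_price = prices.mean()            # missing values are skipped by default
median_price = prices.median()        # less sensitive to expensive outliers

print(f"missing: {share_missing:.0%}")
print(f"mean: {mean_price:.2f}, median: {median_price:.2f}")
```

In the real output the mean (1159.04) also sits above the median (959.99), which suggests a right-skewed price distribution.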

# Most expensive and cheapest

In [16]:
expensive = (laptops_2024['price'] == laptops_2024['price'].max())
cheapest = (laptops_2024['price'] == laptops_2024['price'].min())

expensive_laptop = laptops_2024[expensive]
cheapest_laptop = laptops_2024[cheapest]

#Let's print out the cheapest and most expensive 2024 laptops
print('Cheapest laptop in 2024: \n')
print(cheapest_laptop)
print()
print('Most expensive laptop in 2024: \n')
print(expensive_laptop)

Cheapest laptop in 2024: 

      brand       model  rating  reviewCount   price currency  depth  width  \
176  Lenovo  82LV003WUS     3.0          2.0  143.99      USD   9.29  14.18   

                      features & description  
176  4GB Soldered DDR4-2933 + 128GB eMMC 5.1  

Most expensive laptop in 2024: 

    brand  model  rating  reviewCount    price currency  depth  width  \
335  Dell  3GCR6     1.0          1.0  4489.99      USD   10.0   15.0   

                                features & description  
335  Microsoft Windows 11 Pro gives you a user-frie...  


1. The Lenovo 82LV003WUS is the cheapest laptop, with a 3.0 rating from 2 reviews.
2. The Dell 3GCR6 is the most expensive laptop, with a 1.0 rating from 1 review.
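
The same lookup can be written without boolean masks. The sketch below uses `DataFrame.nsmallest` and `DataFrame.nlargest` on made-up rows and selects a few columns so the printed result fits on one line; `keep="all"` retains ties, as the mask approach does. The `toy` frame and the `cols` list are hypothetical.

```python
import pandas as pd

# Made-up rows; one listing has no price.
toy = pd.DataFrame({
    "brand": ["A", "B", "C", "D"],
    "model": ["a1", "b1", "c1", "d1"],
    "price": [500.0, None, 150.0, 3000.0],
})

cols = ["brand", "model", "price"]
cheapest = toy.nsmallest(1, "price", keep="all")[cols]
priciest = toy.nlargest(1, "price", keep="all")[cols]

print(cheapest)
print(priciest)
```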

### Conclusion:

1. Lenovo has the most laptop models listed in 2024 (99).
2. ASUS and Dell have the most top-rated models (11 each), with Lenovo and HP close behind (10 each).
3. HP laptops have the highest review counts.
4. The average laptop price in 2024 is about 1159.04 dollars, based on the 262 listings that include a price.
5. The cheapest laptop in 2024 costs 143.99 dollars.
6. The most expensive laptop in 2024 costs 4489.99 dollars.
7. The Lenovo 82LV003WUS is the cheapest laptop, with a 3.0 rating from 2 reviews.
8. The Dell 3GCR6 is the most expensive laptop, with a 1.0 rating from 1 review.