# Top_laptop_2024 Project (Beginner project)
This project aims to analyze the most popular laptops purchased in 2024.
Throughout the project, we will try to answer the following questions:
1. What is the most popular brand? ✔
   - Is it also the brand with the most reviews and the best rating? ✔
2. What is the average laptop price? ✔
3. What is the price of the cheapest laptop? ✔
4. What is the price of the most expensive laptop? ✔
5. What ratings and reviews do they have? ✔

To answer these questions, we will need to clean the dataset of information we don't need.
However, it might be useful to keep the original dataframe for any analysis we may want to do later.

First, we import the dataset and look at the column information.

In [122]:
import pandas as pd
laptops_2024 = pd.read_csv('best_buy_laptops_2024.csv')

laptops_2024.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 489 entries, 0 to 488
Data columns (total 10 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   brand                        489 non-null    object 
 1   model                        488 non-null    object 
 2   aggregateRating/ratingValue  357 non-null    float64
 3   aggregateRating/reviewCount  357 non-null    float64
 4   offers/price                 262 non-null    float64
 5   offers/priceCurrency         489 non-null    object 
 6   depth                        431 non-null    float64
 7   width                        431 non-null    float64
 8   features/0/description       486 non-null    object 
 9   features/1/description       483 non-null    object 
dtypes: float64(5), object(5)
memory usage: 38.3+ KB
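
The `info()` output shows that several columns are incomplete: for example, only 262 of the 489 rows have a price. As a minimal sketch, the missing values could be counted per column as follows. The `toy` dataframe and its values are made up for illustration and are not taken from the dataset.

```python
import pandas as pd

# Toy stand-in for the real dataset (values are made up for illustration).
toy = pd.DataFrame({
    "brand": ["Lenovo", "Dell", "HP", "ASUS"],
    "rating": [4.5, None, 3.0, None],
    "price": [499.99, None, None, None],
})

# Number of missing values per column.
missing_count = toy.isna().sum()

# Share of rows that are missing, per column (0.75 means 75% missing).
missing_share = toy.isna().mean().round(2)

print(missing_count)
print(missing_share)
```

Applied to `laptops_2024`, the same two calls would show how much of each column can actually be used in the later analysis.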


Now that we have this information, we can clean up the dataframe. 
The first thing we notice is that some column names are too long or not very descriptive, so let's rename them. 

In [123]:
# Let's rename the columns.
new_names = {
    'aggregateRating/ratingValue' : 'rating', 
    'aggregateRating/reviewCount' : 'reviewCount', 
    'offers/price' : 'price', 
    'offers/priceCurrency' : 'currency', 
}
# We'll take a closer look at columns 8 and 9 in the cells below
laptops_2024 = laptops_2024.rename(
    mapper = new_names, 
    axis = 1,
)
laptops_2024.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 489 entries, 0 to 488
Data columns (total 10 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   brand                   489 non-null    object 
 1   model                   488 non-null    object 
 2   rating                  357 non-null    float64
 3   reviewCount             357 non-null    float64
 4   price                   262 non-null    float64
 5   currency                489 non-null    object 
 6   depth                   431 non-null    float64
 7   width                   431 non-null    float64
 8   features/0/description  486 non-null    object 
 9   features/1/description  483 non-null    object 
dtypes: float64(5), object(5)
memory usage: 38.3+ KB


The second noticeable thing is that columns 8 and 9 seem to hold the same kind of information. Maybe we can concatenate them into a single column called "description & features".

In [124]:
laptops_2024['description & features'] = laptops_2024['features/0/description'].str.cat(
    laptops_2024['features/1/description'], sep=' + '
)
# Since the two columns have been concatenated into one, we can safely drop the original ones
column_drop = ['features/0/description', 'features/1/description']
laptops_2024 = laptops_2024.drop(
    labels=column_drop, 
    axis=1
)

laptops_2024.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 489 entries, 0 to 488
Data columns (total 9 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   brand                   489 non-null    object 
 1   model                   488 non-null    object 
 2   rating                  357 non-null    float64
 3   reviewCount             357 non-null    float64
 4   price                   262 non-null    float64
 5   currency                489 non-null    object 
 6   depth                   431 non-null    float64
 7   width                   431 non-null    float64
 8   description & features  481 non-null    object 
dtypes: float64(5), object(4)
memory usage: 34.5+ KB
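
Note that the combined column has 481 non-null values, fewer than either source column (486 and 483). This is consistent with how `str.cat` handles missing values: by default, a row that is missing on either side is missing in the result. The sketch below uses made-up strings to show that behaviour, along with the `na_rep` parameter as one possible way to keep partial descriptions.

```python
import pandas as pd

# Made-up feature descriptions, with one missing value on each side.
first = pd.Series(["8GB RAM", None, "16GB RAM"])
second = pd.Series(["256GB SSD", "512GB SSD", None])

# Default behaviour: a missing value on either side makes the result missing.
strict = first.str.cat(second, sep=" + ")

# With na_rep, missing values are replaced by the given string before joining.
lenient = first.str.cat(second, sep=" + ", na_rep="")

print(strict.isna().sum())   # number of rows lost with the default behaviour
print(lenient.tolist())
```

With `na_rep=""`, the separator is still inserted, so a partial row looks like `" + 512GB SSD"`. Whether that is preferable to a missing value depends on how the column is used afterwards.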


# Dataframe Brand creation
Now that we have renamed the columns and dropped the ones we no longer need, I suggest creating another dataframe, separate from the original. 
This dataframe will help us answer the questions about laptop brands.


In [125]:
# .copy() creates an independent dataframe instead of a second name for laptops_2024
laptops_brand_analyze = laptops_2024.copy()

laptops_brand_analyze.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 489 entries, 0 to 488
Data columns (total 9 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   brand                   489 non-null    object 
 1   model                   488 non-null    object 
 2   rating                  357 non-null    float64
 3   reviewCount             357 non-null    float64
 4   price                   262 non-null    float64
 5   currency                489 non-null    object 
 6   depth                   431 non-null    float64
 7   width                   431 non-null    float64
 8   description & features  481 non-null    object 
dtypes: float64(5), object(4)
memory usage: 34.5+ KB
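
When creating a separate dataframe, it helps to know that plain assignment (`new = old`) only binds a second name to the same object, whereas `.copy()` produces an independent dataframe. A minimal sketch with made-up data (the names `original`, `alias` and `independent` are hypothetical):

```python
import pandas as pd

original = pd.DataFrame({"brand": ["Lenovo", "Dell"], "price": [499.99, 899.99]})

alias = original                # second name for the same object
independent = original.copy()   # separate object with its own data

# Adding a column through the alias also changes `original`.
alias["in_stock"] = True

print(alias is original)                    # same object
print("in_stock" in original.columns)       # the original was modified
print("in_stock" in independent.columns)    # the copy was not
```

Methods such as `drop()` return a new dataframe by default, so reassigning their result does not touch the original either way, but in-place changes like the column assignment above do.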


Now, let's drop the columns we don't need for this analysis, to make our new dataframe a little cleaner:
- description & features
- depth
- width
- currency
- price


In [126]:
drop_columns = ['price','currency','depth','width', 'description & features' ]

laptops_brand_analyze = laptops_brand_analyze.drop(
    labels = drop_columns,
    axis=1
)

laptops_brand_analyze.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 489 entries, 0 to 488
Data columns (total 4 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   brand        489 non-null    object 
 1   model        488 non-null    object 
 2   rating       357 non-null    float64
 3   reviewCount  357 non-null    float64
dtypes: float64(2), object(2)
memory usage: 15.4+ KB


# Brand comparison
Nice, now we can focus on the first question: 
which brand is the most popular? 
To answer it, we can simply use the value_counts() method. 

In [128]:
laptops_brand_analyze['brand'].value_counts()

brand
Lenovo       99
Dell         83
HP           68
ASUS         63
Microsoft    51
Acer         32
MSI          22
Razer        22
GIGABYTE     14
Samsung      13
LG            9
Alienware     6
HP OMEN       6
Thomson       1
Name: count, dtype: int64

From this result, we can see that Lenovo has the most laptops in the dataset, which makes it the most popular brand in 2024 by this measure. 
But is it also the best-rated and most-reviewed brand? 
Let's investigate.

In [129]:
laptop_brand_rating = (laptops_brand_analyze['rating'] == laptops_brand_analyze['rating'].max())
laptop_brand_review = (laptops_brand_analyze['reviewCount'] == laptops_brand_analyze['reviewCount'].max())

best_rating_score = laptops_brand_analyze[laptop_brand_rating]
best_review_score = laptops_brand_analyze[laptop_brand_review]

Now that we have filtered the rows with the highest rating and the highest review count into two new dataframes, we can apply the same process as for brand popularity. 
With the value_counts() method, we can check whether Lenovo comes out on top for both the maximum rating and the maximum review count. 

In [130]:
print('Best rating score')
print(best_rating_score['brand'].value_counts())
print()
print('Best review score')
print(best_review_score['brand'].value_counts())

Best rating score
brand
ASUS         11
Dell         11
Lenovo       10
HP           10
Razer         5
Acer          5
Microsoft     3
LG            2
GIGABYTE      2
HP OMEN       1
Thomson       1
Name: count, dtype: int64

Best review score
brand
HP    3
Name: count, dtype: int64


From this analysis, we can see that Lenovo has neither the best rating score nor the best review score.  
Therefore, we can draw the following conclusions: 
1. Lenovo is the most popular brand among the laptops in the 2024 dataset. 
2. ASUS and Dell have the most top-rated laptops (11 each), while Lenovo is second, tied with HP (10 each). 
3. HP has the best review score: the 3 laptops with the highest review count are all HP models. 
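
The comparison above only looks at laptops that reach the maximum rating or the maximum review count. As a complementary sketch, and not the method used in this project, ratings and reviews could also be summarised per brand with `groupby`. The data below is made up, and the output column names (`rated_laptops`, `mean_rating`, `total_reviews`) are hypothetical.

```python
import pandas as pd

# Toy data (made up): one HP laptop has no rating or reviews.
toy = pd.DataFrame({
    "brand": ["Lenovo", "Lenovo", "Dell", "HP", "HP"],
    "rating": [4.0, 5.0, 5.0, 3.0, None],
    "reviewCount": [10.0, 2.0, 1.0, 300.0, None],
})

# One row per brand: how many laptops are rated, their mean rating,
# and the total number of reviews. Missing values are skipped.
per_brand = (
    toy.groupby("brand")
    .agg(
        rated_laptops=("rating", "count"),
        mean_rating=("rating", "mean"),
        total_reviews=("reviewCount", "sum"),
    )
    .sort_values("total_reviews", ascending=False)
)

print(per_brand)
```

A per-brand mean is sensitive to brands with very few rated laptops, so it is worth reading `mean_rating` together with `rated_laptops`.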


# Laptop cost Information

Now let's focus on laptop cost. This section will let us answer the questions about the average, lowest and highest laptop price. 
We go back to the original dataframe: laptops_2024

In [131]:
laptops_2024['price'].describe().round(decimals=2)

count     262.00
mean     1159.04
std       782.61
min       143.99
25%       639.99
50%       959.99
75%      1494.99
max      4489.99
Name: price, dtype: float64

From this result, computed over the 262 laptops with a known price, pandas gives us the following figures: 
1. The average price is 1159.04.
2. The most expensive laptop costs 4489.99.
3. The cheapest laptop costs 143.99.
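
The same figures can also be obtained one at a time, which is convenient when only a few of them are needed. In the output above, the mean (1159.04) is higher than the median (959.99), which suggests that a few expensive models pull the average up. A minimal sketch on made-up prices (the `prices` series and `summary` dictionary are hypothetical):

```python
import pandas as pd

# Made-up prices, including one missing value.
prices = pd.Series([100.0, 200.0, None, 600.0], name="price")

# Each method skips missing values by default.
summary = {
    "count": int(prices.count()),
    "mean": prices.mean(),
    "median": prices.median(),
    "min": prices.min(),
    "max": prices.max(),
}

print(summary)
```

Here too the mean (300.0) sits above the median (200.0) because of the single high price.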

# Cheapest and most expensive laptop
Now that we have the price information we wanted, let's look at the characteristics of the laptops with the lowest and highest prices.

In [137]:
expensive_statement = (laptops_2024['price'] == laptops_2024['price'].max())
cheapest_statement = (laptops_2024['price'] == laptops_2024['price'].min())

most_expensive_laptop = laptops_2024[expensive_statement]
cheapest_laptop = laptops_2024[cheapest_statement]

#Let's print out the cheapest and most expensive 2024 laptops
print('Cheapest laptop in 2024 : \n')
print(cheapest_laptop)
print()
print('Most expensive laptop in 2024 : \n')
print(most_expensive_laptop)

Cheapest laptop in 2024 : 

      brand       model  rating  reviewCount   price currency  depth  width  \
176  Lenovo  82LV003WUS     3.0          2.0  143.99      USD   9.29  14.18   

                      description & features  
176  4GB Soldered DDR4-2933 + 128GB eMMC 5.1  

Most expensive laptop in 2024 : 

    brand  model  rating  reviewCount    price currency  depth  width  \
335  Dell  3GCR6     1.0          1.0  4489.99      USD   10.0   15.0   

                                description & features  
335  Microsoft Windows 11 Pro gives you a user-frie...  


The result reveals the following information: 
1. The Lenovo 82LV003WUS is the cheapest laptop, with a rating of 3.0 based on 2 reviews.
2. The Dell 3GCR6 is the most expensive laptop, with a rating of 1.0 based on 1 review.
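
As an alternative to the boolean masks used above, `idxmin()` and `idxmax()` return the index label of the extreme value, and `nsmallest()` returns the rows with the lowest values. The sketch below uses made-up laptops, and the model names are hypothetical.

```python
import pandas as pd

# Made-up laptops; the third one has no price.
toy = pd.DataFrame({
    "brand": ["Lenovo", "Dell", "HP"],
    "model": ["A1", "B2", "C3"],
    "price": [199.99, 2499.99, None],
})

# idxmin/idxmax skip missing values and return the row label of the extreme.
cheapest = toy.loc[toy["price"].idxmin()]
priciest = toy.loc[toy["price"].idxmax()]

# nsmallest returns the n rows with the lowest prices, cheapest first.
two_cheapest = toy.nsmallest(2, "price")

print(cheapest["model"], priciest["model"])
print(two_cheapest)
```

One difference to keep in mind: `idxmin()` and `idxmax()` return only the first matching row, whereas the boolean mask keeps every laptop tied at the minimum or maximum price.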

# Conclusion
This beginner project led us to several observations about brand popularity and laptop prices. 
With the given dataset, we can conclude:
## About the brand:
1. Lenovo is the brand with the most laptops in the 2024 dataset.
2. ASUS and Dell have the most top-rated laptops (11 each), while Lenovo is second, tied with HP (10 each).
3. HP has the best review score: the 3 laptops with the highest review count are all HP models.

## About laptop prices: 
1. The average laptop price is about 1159.04 USD.
2. The cheapest laptop costs 143.99 USD.
3. The most expensive laptop costs 4489.99 USD.
4. The Lenovo 82LV003WUS is the cheapest laptop, with a rating of 3.0 based on 2 reviews.
5. The Dell 3GCR6 is the most expensive laptop, with a rating of 1.0 based on 1 review.
