# **iPhone Sales Analysis**

## **Summary**

This project analyzes a dataset of 62 Apple iPhone listings on Flipkart to gain insights into their pricing, ratings, discounts, and other attributes. After a few columns are renamed, the dataset contains the columns 'Product', 'URL', 'Brand', 'Price', 'MRP', 'Discount Percentage', 'Number Of Ratings', 'Number Of Reviews', 'UPC', 'Rating', and 'RAM'. The analysis explores the relationships between these attributes and visualizes aspects such as the distribution of prices, the relationship between price and discount, and how the number of ratings varies with price and discount percentage.

## **Problem Statement**

The primary objectives of this project are:
1. **Understand Pricing and Discounts:** To explore how the pricing of iPhones relates to the discount percentages offered and to identify any patterns or trends in the distribution of discounts.
2. **Analyze Ratings and Reviews:** To determine which iPhone models are most highly rated and to examine how the number of ratings correlates with other factors such as price and discount percentage.
3. **Visualize Data Relationships:** To create visualizations that illustrate key relationships within the dataset, such as the impact of price on discount percentages and the distribution of ratings and reviews among different models.

## **Importing Libraries**

In [94]:
# Import the necessary libraries

import pandas as pd  # pandas for data manipulation and analysis
import numpy as np  # numpy for numerical operations
import plotly.express as px  # plotly.express for simple visualizations
import plotly.graph_objects as go  # plotly.graph_objects for more complex and detailed visualizations

## **Loading Dataset**

In [95]:
# Load the dataset from a CSV file
df = pd.read_csv('/content/apple_products.csv')  # Read the 'apple_products.csv' file into a pandas DataFrame 'df'

In [96]:
# Display the first 5 rows of the DataFrame
df.head()

Unnamed: 0,Product Name,Product URL,Brand,Sale Price,Mrp,Discount Percentage,Number Of Ratings,Number Of Reviews,Upc,Star Rating,Ram
0,"APPLE iPhone 8 Plus (Gold, 64 GB)",https://www.flipkart.com/apple-iphone-8-plus-g...,Apple,49900,49900,0,3431,356,MOBEXRGV7EHHTGUH,4.6,2 GB
1,"APPLE iPhone 8 Plus (Space Grey, 256 GB)",https://www.flipkart.com/apple-iphone-8-plus-s...,Apple,84900,84900,0,3431,356,MOBEXRGVAC6TJT4F,4.6,2 GB
2,"APPLE iPhone 8 Plus (Silver, 256 GB)",https://www.flipkart.com/apple-iphone-8-plus-s...,Apple,84900,84900,0,3431,356,MOBEXRGVGETABXWZ,4.6,2 GB
3,"APPLE iPhone 8 (Silver, 256 GB)",https://www.flipkart.com/apple-iphone-8-silver...,Apple,77000,77000,0,11202,794,MOBEXRGVMZWUHCBA,4.5,2 GB
4,"APPLE iPhone 8 (Gold, 256 GB)",https://www.flipkart.com/apple-iphone-8-gold-2...,Apple,77000,77000,0,11202,794,MOBEXRGVPK7PFEJZ,4.5,2 GB


In [97]:
# Rename specific columns of the DataFrame
df.rename(columns={'Product Name': 'Product', 'Product URL': 'URL', 'Sale Price': 'Price',
                   'Mrp': 'MRP', 'Upc': 'UPC', 'Star Rating': 'Rating', 'Ram': 'RAM'}, inplace=True)

In [98]:
# Display 5 random rows from the DataFrame
df.sample(5)

Unnamed: 0,Product,URL,Brand,Price,MRP,Discount Percentage,Number Of Ratings,Number Of Reviews,UPC,Rating,RAM
13,"Apple iPhone XR (White, 128 GB) (Includes EarP...",https://www.flipkart.com/apple-iphone-xr-white...,Apple,41999,52900,20,79512,6796,MOBF9Z7ZZY3HCDZZ,4.6,4 GB
32,"APPLE iPhone 12 Pro Max (Graphite, 128 GB)",https://www.flipkart.com/apple-iphone-12-pro-m...,Apple,120900,129900,6,580,45,MOBFWBYZFDGQSDWS,4.6,6 GB
47,"APPLE iPhone 12 Pro Max (Pacific Blue, 128 GB)",https://www.flipkart.com/apple-iphone-12-pro-m...,Apple,120900,129900,6,580,45,MOBFWBYZZABKHZQA,4.6,6 GB
26,"APPLE iPhone 12 Mini (White, 128 GB)",https://www.flipkart.com/apple-iphone-12-mini-...,Apple,64900,74900,13,740,64,MOBFWBYZAGXJRDGB,4.5,4 GB
60,"APPLE iPhone 11 (Black, 64 GB)",https://www.flipkart.com/apple-iphone-11-black...,Apple,46999,54900,14,43470,3331,MOBFWQ6BXGJCEYNY,4.6,4 GB


In [99]:
# Get the dimensions of the DataFrame
df.shape

(62, 11)

In [100]:
# Display summary information about the DataFrame
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 62 entries, 0 to 61
Data columns (total 11 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   Product              62 non-null     object 
 1   URL                  62 non-null     object 
 2   Brand                62 non-null     object 
 3   Price                62 non-null     int64  
 4   MRP                  62 non-null     int64  
 5   Discount Percentage  62 non-null     int64  
 6   Number Of Ratings    62 non-null     int64  
 7   Number Of Reviews    62 non-null     int64  
 8   UPC                  62 non-null     object 
 9   Rating               62 non-null     float64
 10  RAM                  62 non-null     object 
dtypes: float64(1), int64(5), object(5)
memory usage: 5.5+ KB


In [101]:
# Generate descriptive statistics of the DataFrame
df.describe()

Unnamed: 0,Price,MRP,Discount Percentage,Number Of Ratings,Number Of Reviews,Rating
count,62.0,62.0,62.0,62.0,62.0,62.0
mean,80073.887097,88058.064516,9.951613,22420.403226,1861.677419,4.575806
std,34310.446132,34728.825597,7.608079,33768.58955,2855.88383,0.05919
min,29999.0,39900.0,0.0,542.0,42.0,4.5
25%,49900.0,54900.0,6.0,740.0,64.0,4.5
50%,75900.0,79900.0,10.0,2101.0,180.0,4.6
75%,117100.0,120950.0,14.0,43470.0,3331.0,4.6
max,140900.0,149900.0,29.0,95909.0,8161.0,4.7


In [102]:
# Check for missing values in each column of the DataFrame
df.isnull().sum()

Unnamed: 0,0
Product,0
URL,0
Brand,0
Price,0
MRP,0
Discount Percentage,0
Number Of Ratings,0
Number Of Reviews,0
UPC,0
Rating,0


None of the columns contains missing values, which agrees with the `df.info()` output above (62 non-null entries in every column). No imputation or row removal is therefore needed, and the analysis can proceed on all 62 rows.
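
A dataset without nulls can still contain inconsistent values, so a few extra sanity checks may be worthwhile. The sketch below is illustrative only: it uses a handful of rows copied from the `df.head()` and `df.sample(5)` outputs above rather than the full CSV, and the two rules it checks (UPCs are unique, the sale price never exceeds the MRP) are assumptions about the data, not documented guarantees.

```python
import pandas as pd

# A few rows copied from the outputs shown above (a subset, not the full dataset)
sample = pd.DataFrame({
    "Price": [49900, 84900, 77000, 41999, 120900],
    "MRP": [49900, 84900, 77000, 52900, 129900],
    "UPC": ["MOBEXRGV7EHHTGUH", "MOBEXRGVAC6TJT4F", "MOBEXRGVMZWUHCBA",
            "MOBF9Z7ZZY3HCDZZ", "MOBFWBYZFDGQSDWS"],
})

# Total number of missing cells across all columns
missing_cells = int(sample.isnull().sum().sum())

# Assumption: every listing should carry its own UPC
duplicate_upcs = int(sample["UPC"].duplicated().sum())

# Assumption: a sale price should never exceed the MRP
overpriced_rows = int((sample["Price"] > sample["MRP"]).sum())

print(missing_cells, duplicate_upcs, overpriced_rows)
```

Running the same three expressions on `df` would apply the checks to all 62 rows.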

## **Distribution of iPhone Prices**

In [103]:
# Create a histogram to visualize the distribution of iPhone prices
price_distribution = px.histogram(df, x="Price", title="Distribution of iPhone Prices")

# Show the histogram
price_distribution.show()

**Conclusion**:

The distribution of iPhone prices is bimodal, indicating the presence of two distinct price ranges within the dataset. This suggests that there are two major categories of iPhones based on price: a lower-priced category and a higher-priced category.

## **Top 10 Products by Rating**

In [104]:
# Sort the DataFrame by 'Rating' in descending order to prioritize higher ratings
highest_rated = df.sort_values(by=['Rating'], ascending=False)

# Select the top 10 rows from the sorted DataFrame, which represent the highest-rated products
top_10_highest_rated = highest_rated.head(10)

# Extract and display the 'Product' column of the top 10 highest-rated products
top_10_highest_rated['Product']

Unnamed: 0,Product
20,"APPLE iPhone 11 Pro Max (Midnight Green, 64 GB)"
17,"APPLE iPhone 11 Pro Max (Space Grey, 64 GB)"
16,"APPLE iPhone 11 Pro Max (Midnight Green, 256 GB)"
15,"APPLE iPhone 11 Pro Max (Gold, 64 GB)"
14,"APPLE iPhone 11 Pro Max (Gold, 256 GB)"
0,"APPLE iPhone 8 Plus (Gold, 64 GB)"
29,"APPLE iPhone 12 (White, 128 GB)"
32,"APPLE iPhone 12 Pro Max (Graphite, 128 GB)"
35,"APPLE iPhone 12 (Black, 128 GB)"
36,"APPLE iPhone 12 (Blue, 128 GB)"


**Conclusion:**

The top 10 products by rating are predominantly iPhone 11 Pro Max and iPhone 12 series variants in different colors and storage capacities, joined by one iPhone 8 Plus variant. Because ratings in this dataset only range from 4.5 to 4.7, many products tie, so which tied products make the top 10 depends on how the sort happens to order them. The high ratings suggest that buyers are satisfied with these models, although the dataset does not record the reasons.
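
Since `sort_values` on a single column gives no meaningful order among tied rows, one option is to add a tie-breaker or to keep every row tied at the cutoff. The sketch below shows both on a few rows copied from the outputs above; using 'Number Of Ratings' as the tie-breaker is an assumption made for illustration, not part of the original analysis.

```python
import pandas as pd

# Rows copied from the outputs shown earlier (a subset, not the full dataset)
sample = pd.DataFrame({
    "Product": [
        "APPLE iPhone 8 Plus (Gold, 64 GB)",
        "APPLE iPhone 8 (Silver, 256 GB)",
        "APPLE iPhone 12 Pro Max (Graphite, 128 GB)",
        "APPLE iPhone 12 Mini (White, 128 GB)",
        "APPLE iPhone 11 (Black, 64 GB)",
    ],
    "Rating": [4.6, 4.5, 4.6, 4.5, 4.6],
    "Number Of Ratings": [3431, 11202, 580, 740, 43470],
})

# Break ties on 'Rating' with 'Number Of Ratings' so the order is reproducible
ranked = sample.sort_values(by=["Rating", "Number Of Ratings"], ascending=[False, False])

# keep="all" returns every row tied at the cutoff, so the result may exceed n rows
top_with_ties = sample.nlargest(2, "Rating", keep="all")

print(ranked["Product"].tolist())
print(len(top_with_ties))
```

With `keep="all"`, asking for the top 2 returns all three rows rated 4.6, which makes the effect of ties visible instead of hiding it.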

## **Number of Ratings for Highest Rated iPhones**

In [105]:
# Create a bar chart of the number of ratings for each of the top 10 highest-rated iPhones
# Column names are passed for x and y so that each product stays aligned with its own rating count
figure = px.bar(top_10_highest_rated, x="Product", y="Number Of Ratings",
                title="Number of Ratings of Highest Rated iPhones")

# Display the bar chart
figure.show()

**Conclusion:**

Among the top 10 highest-rated iPhones, the iPhone 8 Plus (Gold, 64 GB) has the highest number of ratings on Flipkart. Despite being an older model, it has drawn more user engagement than the other top-rated listings, which suggests sustained consumer interest in it in India.

## **Number of Reviews for Highest Rated iPhones**

In [106]:
# Create a bar chart of the number of reviews for each of the top 10 highest-rated iPhones
# Column names are passed for x and y so that each product stays aligned with its own review count
figure = px.bar(top_10_highest_rated, x="Product", y="Number Of Reviews",
                title="Number of Reviews of Highest Rated iPhones")

# Display the bar chart
figure.show()

**Conclusion:**

Among the top 10 highest-rated iPhones, the iPhone 8 Plus (Gold, 64 GB) has the highest number of reviews. This indicates that this particular model has attracted the most user feedback, reflecting a high level of consumer engagement and interaction compared to other top-rated iPhones.
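
In the `df.head()` and `df.sample(5)` outputs, variants of the same model show identical rating and review counts, which hints that Flipkart may report these figures per model rather than per color and storage variant. The sketch below checks this on rows copied from those outputs; the `Model` column and the regular expression that derives it are hypothetical helpers introduced only for this example.

```python
import pandas as pd

# Rows copied from the outputs shown earlier (a subset, not the full dataset)
sample = pd.DataFrame({
    "Product": [
        "APPLE iPhone 8 Plus (Gold, 64 GB)",
        "APPLE iPhone 8 Plus (Space Grey, 256 GB)",
        "APPLE iPhone 8 Plus (Silver, 256 GB)",
        "APPLE iPhone 8 (Silver, 256 GB)",
        "APPLE iPhone 8 (Gold, 256 GB)",
        "APPLE iPhone 12 Pro Max (Graphite, 128 GB)",
        "APPLE iPhone 12 Pro Max (Pacific Blue, 128 GB)",
    ],
    "Number Of Ratings": [3431, 3431, 3431, 11202, 11202, 580, 580],
    "Number Of Reviews": [356, 356, 356, 794, 794, 45, 45],
})

# Hypothetical helper column: the product name without the "(color, storage)" suffix
sample["Model"] = sample["Product"].str.replace(r"\s*\(.*$", "", regex=True)

# Distinct counts per model; a value of 1 means all variants share the same figure
distinct_counts = sample.groupby("Model")[["Number Of Ratings", "Number Of Reviews"]].nunique()

print(distinct_counts)
```

If the same pattern holds on the full DataFrame, statements about a single variant such as "(Gold, 64 GB)" are better read as statements about the model as a whole.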

## **Number of Ratings Vs RAM**

In [107]:
# Create a box plot to visualize the distribution of the number of ratings for different RAM sizes
ratings_vs_ram = px.box(df, x="RAM", y="Number Of Ratings", title="Number of Ratings vs. RAM")

# Show the box plot
ratings_vs_ram.show()
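
The `df.info()` output lists 'RAM' as an `object` column holding strings such as "2 GB", so the box plot treats it as an unordered category. A minimal sketch of converting it to integers is shown below; the `RAM_GB` name is hypothetical, and the values are the RAM sizes that appear in the outputs above.

```python
import pandas as pd

# RAM strings as they appear in the outputs above
ram = pd.Series(["2 GB", "4 GB", "6 GB", "4 GB"], name="RAM")

# Extract the digits and convert them to integers (hypothetical name 'RAM_GB')
ram_gb = ram.str.extract(r"(\d+)", expand=False).astype(int).rename("RAM_GB")

# Sorted unique sizes, as plain Python integers
ram_sizes = sorted(ram_gb.unique().tolist())
print(ram_sizes)
```

A numeric column sorts naturally on the x-axis. Alternatively, the `category_orders` argument of `px.box` can fix the order of the original string labels.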

## **Relationship Between Price and Number of Ratings**

In [108]:
# Create a scatter plot to visualize the relationship between the number of ratings and the price of products
# The size of the points represents the discount percentage, and an OLS trendline shows the overall trend
# (trendline='ols' requires the statsmodels package to be installed)

figure = px.scatter(df, x="Number Of Ratings", y="Price", size="Discount Percentage", trendline='ols',
                   title="Relationship Between Price and Number of Ratings")

# Display the scatter plot
figure.show()

**Conclusion:**

The scatter plot shows a negative linear relationship between the sale price of iPhones and the number of ratings: as the price decreases, the number of ratings generally increases, and the trendline slopes downward accordingly. The dataset contains no sales figures, so the number of ratings serves only as a proxy for sales. Read that way, the plot suggests that more affordable models sell in higher volumes in India.
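
The direction of the relationship can also be checked numerically. The sketch below computes a Pearson correlation and a least-squares slope on the ten rows visible in the `df.head()` and `df.sample(5)` outputs; it is meant to illustrate the calculation, and the values it prints describe only this subset, not the full 62-row dataset.

```python
import numpy as np
import pandas as pd

# Price and rating counts of the ten rows shown in the outputs above
sample = pd.DataFrame({
    "Price": [49900, 84900, 84900, 77000, 77000, 41999, 120900, 120900, 64900, 46999],
    "Number Of Ratings": [3431, 3431, 3431, 11202, 11202, 79512, 580, 580, 740, 43470],
})

# Pearson correlation between price and number of ratings
r = sample["Price"].corr(sample["Number Of Ratings"])

# Least-squares line Price = slope * ratings + intercept, matching the axes of the scatter plot
slope, intercept = np.polyfit(sample["Number Of Ratings"], sample["Price"], deg=1)

print(f"r = {r:.2f}, slope = {slope:.3f}")
```

Replacing `sample` with `df` gives the figures for the whole dataset. A negative `r` and a negative slope would agree with the downward trendline in the plot.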

## **Relationship Between Discount Percentage and Number of Ratings**

In [109]:
# Create a scatter plot to visualize the relationship between the number of ratings and the discount percentage of products
# The size of the points represents the price, and an OLS trendline (requires statsmodels) shows the overall trend

figure = px.scatter(df, x="Number Of Ratings", y="Discount Percentage", size='Price', trendline='ols',
                   title="Relationship Between Discount Percentage and Number of Ratings")

# Display the scatter plot
figure.show()

**Conclusion:**

1. **Most Appreciated iPhone:** Among the top-rated products, the iPhone 8 Plus (Gold, 64 GB) has the most ratings and reviews, which makes it the most appreciated iPhone in this dataset by that measure.

2. **Price and Sales Volume:** iPhones with lower sale prices tend to receive more ratings. If the number of ratings tracks sales, affordability plays a significant role in driving sales volume in India.

3. **Impact of Discounts:** iPhones offered at higher discounts also tend to receive more ratings, which suggests that significant discounts make a model more attractive to buyers.

## **Relationship Between Discount Percentage and Price**

In [110]:
# Create a scatter plot to visualize the relationship between discount percentage and price
discount_vs_price = px.scatter(df, x="Price", y="Discount Percentage", title="Relationship Between Discount Percentage and Price")

# Show the scatter plot
discount_vs_price.show()

**Conclusion:**

The scatter plot of discount percentage against price shows that the discount percentage tends to decrease as the price increases: higher-priced models typically carry smaller discounts, while more affordable models often come with larger ones.

The highest discount observed is 29%, demonstrating that significant price reductions are applied to some lower-priced models. This pricing strategy likely aims to boost sales volumes and make more budget-friendly options appealing to a broader audience.
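
The 'Discount Percentage' column appears to be derived from 'Price' and 'MRP'. The sketch below reproduces it for rows copied from the outputs in this notebook, assuming the percentage is truncated to a whole number. That rounding rule is inferred from those rows (for example, 41,999 against an MRP of 52,900 is about 20.6% but is listed as 20) and is not documented by the data source. The function name is hypothetical.

```python
def discount_percentage(price: int, mrp: int) -> int:
    """Percentage discount off the MRP, truncated to a whole number (assumed rounding rule)."""
    return (mrp - price) * 100 // mrp


# (Price, MRP, listed Discount Percentage) triples copied from the outputs in this notebook
rows = [
    (49900, 49900, 0),
    (41999, 52900, 20),
    (120900, 129900, 6),
    (64900, 74900, 13),
    (46999, 54900, 14),
    (140900, 149900, 6),
]

for price, mrp, listed in rows:
    # Print the listed value next to the recomputed one for comparison
    print(price, mrp, listed, discount_percentage(price, mrp))
```

Under this assumption, the recomputed value matches the listed one for every row shown.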

## **Most and Least Expensive iPhones**

In [111]:
# Identify the most and least expensive products in the DataFrame

# Locate the row with the maximum price in the DataFrame
most_expensive = df.loc[df['Price'].idxmax()]

# Locate the row with the minimum price in the DataFrame
least_expensive = df.loc[df['Price'].idxmin()]

# Print the details of the most expensive product
print("Most Expensive Product:")
print(most_expensive)

# Print the details of the least expensive product
print("\nLeast Expensive Product:")
print(least_expensive)

Most Expensive Product:
Product                             APPLE iPhone 12 Pro (Silver, 512 GB)
URL                    https://www.flipkart.com/apple-iphone-12-pro-s...
Brand                                                              Apple
Price                                                             140900
MRP                                                               149900
Discount Percentage                                                    6
Number Of Ratings                                                    542
Number Of Reviews                                                     42
UPC                                                     MOBFWBYZ5UY6ZBVA
Rating                                                               4.5
RAM                                                                 4 GB
Name: 24, dtype: object

Least Expensive Product:
Product                                   APPLE iPhone SE (White, 64 GB)
URL                    https://www.flipkart.com/ap

**Conclusion:**

1. **Most Expensive Product:** The most expensive iPhone in the dataset is the APPLE iPhone 12 Pro (Silver, 512 GB), priced at ₹140,900. This high-end model features substantial storage capacity and is positioned at the premium end of the market.

2. **Least Expensive Product:** The least expensive iPhone is the APPLE iPhone SE (White, 64 GB), priced at ₹29,999. Despite its lower price, it still maintains a competitive rating and has a significant number of ratings and reviews.

The comparison highlights the range of iPhone prices available, from high-end models with advanced features to more affordable options, catering to diverse consumer preferences and budgets.
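
Note that `idxmax()` and `idxmin()` return only the first matching row label when several products share the extreme price. The sketch below illustrates this on four rows copied from the `df.sample(5)` output, with their original index labels. In this subset the highest price is 120,900, which differs from the maximum of the full dataset.

```python
import pandas as pd

# Four rows copied from the df.sample(5) output, keeping their original index labels
sample = pd.DataFrame(
    {
        "Product": [
            "APPLE iPhone 12 Pro Max (Graphite, 128 GB)",
            "APPLE iPhone 12 Pro Max (Pacific Blue, 128 GB)",
            "APPLE iPhone 12 Mini (White, 128 GB)",
            "APPLE iPhone 11 (Black, 64 GB)",
        ],
        "Price": [120900, 120900, 64900, 46999],
    },
    index=[32, 47, 26, 60],
)

# idxmax() reports only the first label that holds the maximum price
first_label = sample["Price"].idxmax()

# A boolean mask returns every row tied at the maximum price
all_at_max = sample[sample["Price"] == sample["Price"].max()]

print(first_label)
print(all_at_max["Product"].tolist())
```

Applying the same mask to `df` would show whether the most or least expensive listing in the full dataset is unique.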

## **Conclusion**

The analysis provides several key insights:

1. **Pricing and Discounts:** There is a notable trend where higher-priced iPhones tend to have lower discount percentages, while lower-priced models often offer higher discounts. The maximum discount observed is 29%, indicating significant reductions for some of the more affordable models.

2. **Top-Rated Products:** Among the highest-rated products, the iPhone 8 Plus (Gold, 64 GB) has the most ratings and reviews, despite being an older model. This suggests sustained consumer interest in and satisfaction with it.

3. **Price vs. Number of Ratings:** The scatter plot shows a negative linear relationship between iPhone prices and the number of ratings. Lower-priced models generally receive more ratings, which, taking ratings as a proxy for sales, suggests higher sales volumes for these more affordable options.

4. **Discount Percentage vs. Price:** The analysis confirms that higher discounts are associated with lower prices, supporting the strategy of offering more attractive discounts on budget-friendly models to drive sales.

Overall, the project highlights the dynamics of pricing, discounts, and consumer ratings in the iPhone market, providing valuable insights into how these factors influence purchasing decisions and market trends.