# Importing Libraries

In [None]:
import pandas as pd     # analyzing, cleaning, exploring, and manipulating tabular data
import numpy as np      # fast mathematical operations on arrays
import seaborn as sns   # statistical graphics built on top of matplotlib
import matplotlib.pyplot as plt  # static, animated, and interactive visualizations in Python

# Loading Data

In [None]:
df = pd.read_csv("/kaggle/input/sales-data/all_data.csv")
df.sort_index(inplace = True)   # sort rows by their index labels, ascending (read_csv already returns them in order)
df.head()   # display the first 5 rows of the dataset

# Data cleaning

In [None]:
df.columns     # return the column labels of the DataFrame

In [None]:
df.shape     # (number of rows, number of columns) of the DataFrame

In [None]:
df.info()   #prints information about the DataFrame.

**FINDING AND DELETING NULL VALUES**

In [None]:
df.isnull().sum()   #function returns the number of NaN values in all columns of a Pandas DataFrame.

In [None]:
na_df = df[df.isna().any(axis = 1)]     # another way to see null values: keep only rows with at least one null
na_df.head()

In [None]:
df.dropna(inplace = True) #DELETING NULL VALUES

In [None]:
df.isnull().sum()   #checking the null values after deleting them
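A minimal sketch on a small made-up DataFrame (the values are hypothetical; only the column names follow the notebook) showing how the three null-handling calls relate: `isnull().sum()` counts missing values per column, `isna().any(axis=1)` selects the affected rows, and `dropna()` removes them.

```python
import numpy as np
import pandas as pd

# Hypothetical rows: column names follow the notebook, values are made up.
df = pd.DataFrame({
    "Order ID": ["176558", np.nan, "176559"],
    "Product": ["USB-C Charging Cable", np.nan, "AAA Batteries"],
})

null_counts = df.isnull().sum()        # number of NaN values in each column
na_rows = df[df.isna().any(axis=1)]    # rows containing at least one NaN
clean = df.dropna()                    # new DataFrame without those rows

print(null_counts.tolist())            # [1, 1]
print(len(na_rows), len(clean))        # 1 2
```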

**CHANGING DTYPES**

In [None]:
df = df[df['Order ID'] != 'Order ID'].copy()    # drop rows that repeat the header (their 'Order ID' is the literal text 'Order ID'); they would block the type conversions below

In [None]:
df.info()

Convert the columns to appropriate data types so they can be used in numeric and date calculations.

In [None]:
df['Order ID'] = df['Order ID'].astype('int64')  #changing the datatype to int
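The header-row filter and the type conversions work together: a row that repeats the CSV header holds the column name as its value, so converting 'Order ID' to `int64` fails until such rows are removed. Below is a sketch on made-up rows, assuming that is how the stray strings arose. The `.copy()` makes the filtered frame independent, so later column assignments do not trigger pandas' `SettingWithCopyWarning`.

```python
import pandas as pd

# Made-up rows imitating a concatenated CSV in which a header line
# ended up in the data (each column holds its own column name).
raw = pd.DataFrame({
    "Order ID": ["176558", "Order ID", "176559"],
    "Quantity Ordered": ["2", "Quantity Ordered", "1"],
    "Price Each": ["11.95", "Price Each", "99.99"],
})

# Keep only rows whose 'Order ID' is not the literal header text.
clean = raw[raw["Order ID"] != "Order ID"].copy()

# With the header rows gone, the string columns convert cleanly.
clean["Order ID"] = clean["Order ID"].astype("int64")
clean["Quantity Ordered"] = pd.to_numeric(clean["Quantity Ordered"])
clean["Price Each"] = clean["Price Each"].astype(float)

print(clean.dtypes)
```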

In [None]:
# add a Month column: the first two characters of 'Order Date' hold the month

df['Month'] = df['Order Date'].str[0:2]
df['Month'] = df['Month'].astype('int64')
df.head()

In [None]:
df['Order Date'] = pd.to_datetime(df['Order Date'])   #changing the dtype to date time column
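Slicing the first two characters of 'Order Date' works only if every date starts with a two-digit month. An alternative sketch, assuming the strings follow the `MM/DD/YY HH:MM` layout that the slicing implies, parses the column with an explicit `format` and reads the month from the `.dt` accessor. The made-up timestamps below give the same month either way.

```python
import pandas as pd

# Made-up timestamps in the assumed MM/DD/YY HH:MM layout.
orders = pd.DataFrame({"Order Date": ["04/19/19 08:46", "12/30/19 21:00", "01/01/20 03:15"]})

# Approach used in the notebook: slice the month out of the string.
orders["Month_from_str"] = orders["Order Date"].str[0:2].astype("int64")

# Alternative: parse with an explicit format, then use the datetime accessor.
orders["Order Date"] = pd.to_datetime(orders["Order Date"], format="%m/%d/%y %H:%M")
orders["Month"] = orders["Order Date"].dt.month

print(orders[["Month_from_str", "Month"]])
```

An explicit `format` also avoids pandas guessing the layout element by element.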

In [None]:
df['Price Each'] = df['Price Each'].astype("float")   #changing the dtype to float

In [None]:
df['Quantity Ordered'] = pd.to_numeric(df['Quantity Ordered'])   #changing the dtype to numeric

In [None]:
# adding sales col
df['Sales'] = df['Quantity Ordered'] * df['Price Each']
df
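Each order line's revenue is Quantity Ordered × Price Each, computed element-wise across the whole column, which is why both columns had to be converted to numbers first. A small sketch with made-up products and prices (not values from the dataset):

```python
import pandas as pd

# Made-up order lines after the dtype conversions.
lines = pd.DataFrame({
    "Product": ["USB-C Charging Cable", "AAA Batteries", "USB-C Charging Cable"],
    "Quantity Ordered": [2, 3, 1],
    "Price Each": [11.95, 2.99, 11.95],
})

# Vectorized multiplication: one revenue figure per order line.
lines["Sales"] = lines["Quantity Ordered"] * lines["Price Each"]

print(lines["Sales"].round(2).tolist())   # [23.9, 8.97, 11.95]
print(round(lines["Sales"].sum(), 2))     # 44.82
```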

Add a City column, derived from the purchase address, for the city-level analysis.

In [None]:
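def get_city(address):
    # 'street, city, ST zip' -> 'city' (surrounding spaces removed)
    return address.split(",")[1].strip()

def get_state(address):
    # 'street, city, ST zip' -> 'ST'
    return address.split(",")[2].split(" ")[1]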


In [None]:
df['City'] = df['Purchase Address'].apply(lambda x: f"{get_city(x)} ({get_state(x)})")  # label like 'City (ST)'; the state keeps same-named cities in different states apart
df['City']
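The two helpers assume every address has the shape 'street, city, ST zip'. The sketch below (the function name `city_label` and the addresses are hypothetical) folds both into one function that unpacks exactly three comma-separated parts, so an address with a different number of commas raises a `ValueError` instead of silently producing a wrong label. Appending the state keeps cities that share a name apart.

```python
import pandas as pd

def city_label(address: str) -> str:
    """Return 'City (ST)' from an address shaped like 'street, city, ST zip'."""
    _street, city, state_zip = address.split(",")   # fails loudly unless exactly two commas
    state = state_zip.split()[0]                     # split() ignores the leading space
    return f"{city.strip()} ({state})"

# Hypothetical addresses.
addresses = pd.Series([
    "917 1st St, Dallas, TX 75001",
    "682 Chestnut St, Boston, MA 02215",
    "333 8th St, Portland, OR 97035",
    "12 Elm St, Portland, ME 04101",
])

labels = addresses.apply(city_label).tolist()
print(labels)   # ['Dallas (TX)', 'Boston (MA)', 'Portland (OR)', 'Portland (ME)']
```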

In [None]:
df.head()

The analysis below addresses four questions:
* Which city has the highest and lowest sales?
* At what time of day do most sales happen?
* Which product is bought most often?
* Which month recorded the highest sales?

In [None]:
Monthly_Sales = df[['Month', 'Sales']].groupby(by = 'Month').sum()
plt.figure(figsize = (12,5))
plt.bar(Monthly_Sales.index, Monthly_Sales["Sales"])   # one bar per month present in the data
plt.title("Sales by Month", fontsize = 15)
plt.xlabel("Month")
plt.ylabel("Sales")
plt.xticks(np.arange(1,13),['JAN','FEB','MAR','APR','MAY','JUN','JUL','AUG','SEP','OCT','NOV','DEC'])
plt.xlim(0.5, 12.5)   # show all 12 month slots
plt.show()

**In the sales data analysis, December emerges as the peak month with the highest sales figures. On the other hand, January records the lowest sales, indicating a seasonal fluctuation in consumer purchasing behavior. This information provides valuable insights for business planning and resource allocation. It suggests the importance of strategic marketing and inventory management to capitalize on the peak season in December and mitigate the slower sales trend in January.**
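Plotting monthly totals against the labels JAN to DEC assumes the grouped result has exactly one row per calendar month. A sketch with made-up data showing how `reindex` can guarantee twelve rows, filling months that have no orders with zero, so each bar stays aligned with its month label:

```python
import pandas as pd

# Made-up order lines that cover only three months.
lines = pd.DataFrame({
    "Month": [1, 1, 4, 12, 12, 12],
    "Sales": [10.0, 5.0, 20.0, 30.0, 25.0, 15.0],
})

monthly = (
    lines.groupby("Month")["Sales"].sum()
    .reindex(range(1, 13), fill_value=0.0)   # one row per calendar month, zeros for gaps
)

print(monthly.idxmax(), monthly.max())   # 12 70.0
print(len(monthly))                       # 12
```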

In [None]:
df['Product'].unique()

In [None]:
# top 10 products by number of order lines (rows), not by units sold
highly_sold = df.groupby('Product')['Sales'].count().sort_values(ascending = False)[:10].reset_index()
highly_sold

In [None]:
plt.figure(figsize = (8,6))
sns.barplot(y = 'Product', x = 'Sales', data = highly_sold, palette = 'flare')
plt.title("Most Frequently Ordered Products (Top 10)", fontsize = 15)
plt.xlabel("Number of orders")
plt.show()

**The graph above shows the 10 products that appear in the most orders. The most frequently ordered products are the "USB-C Charging Cable," "Lightning Charging Cable," and "AAA Batteries." These items stand out as top performers by order count, suggesting they are popular with consumers. This insight can inform inventory management and marketing strategies that capitalize on demand for these products, and points to possible opportunities for cross-selling or bundling.**
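The ranking above counts order lines per product (`count()` on 'Sales'), so a product bought in bulk counts once per order. If "most bought" should instead mean units sold, summing 'Quantity Ordered' is an alternative, and the two rankings can differ. A sketch on made-up rows; whether the real top three changes is not checked here.

```python
import pandas as pd

# Made-up order lines: one product is ordered often in ones,
# the other less often but in larger quantities.
lines = pd.DataFrame({
    "Product": ["Cable", "Cable", "Cable", "Batteries", "Batteries"],
    "Quantity Ordered": [1, 1, 1, 4, 3],
})

by_orders = lines.groupby("Product").size().sort_values(ascending=False)                      # order lines
by_units = lines.groupby("Product")["Quantity Ordered"].sum().sort_values(ascending=False)   # units sold

print(by_orders.index.tolist())   # ['Cable', 'Batteries']
print(by_units.index.tolist())    # ['Batteries', 'Cable']
```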

In [None]:
df['Hour'] = df['Order Date'].dt.hour
df['Minute'] = df['Order Date'].dt.minute
df

In [None]:
Peak_hours = df['Hour'].value_counts().sort_index()   # number of orders in each hour of the day
plt.figure(figsize = (10,5))
plt.title('Peak Hours for Sales', fontsize = 15)
plt.plot(Peak_hours)
plt.xticks(np.arange(0,24))   # hours run from 0 to 23
plt.xlabel('Hour of day')
plt.ylabel('Number of orders')
plt.grid()
plt.show()

**The chart above shows that the number of orders is consistently high after 10 am and peaks between 12 pm and 9 pm. These active hours reflect the strongest consumer presence, with the majority of purchases occurring in this window. Businesses should align marketing efforts and inventory management with this peak period to maximize sales and customer engagement, and could schedule advertisements during these hours to increase visibility and capitalize on consumer activity.**
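Hours from `.dt.hour` run from 0 to 23, and `value_counts()` omits hours with no orders, which a line plot then silently skips. A sketch with made-up timestamps that reindexes the counts to all 24 hours:

```python
import pandas as pd

# Made-up order timestamps.
orders = pd.DataFrame({"Order Date": pd.to_datetime([
    "2019-04-19 08:46", "2019-04-19 12:10", "2019-04-20 12:55",
    "2019-04-21 19:05", "2019-04-22 19:30", "2019-04-22 19:59",
])})

orders["Hour"] = orders["Order Date"].dt.hour

# Orders per hour of day, including hours with no orders (0 to 23).
per_hour = orders["Hour"].value_counts().reindex(range(24), fill_value=0)

print(per_hour.idxmax(), per_hour.max())   # 19 3
```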

In [None]:
df['City'].unique()

In [None]:
df.dtypes

In [None]:
df['City'] = df['City'].str.strip()   # remove leading and trailing whitespace from the labels

In [None]:
City_with_Sales = df.groupby(['City'])[['Sales']].sum().sort_values(by = 'Sales') 
City_with_Sales.reset_index(inplace = True)
City_with_Sales            

In [None]:
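plt.figure(figsize = (10,8))
City_with_Sales['Sales'].plot(kind='pie', labels=City_with_Sales['City'], autopct='%1.1f%%')   # each slice shows the city's share of total sales
plt.title("Share of Sales by City", fontsize = 15)
plt.ylabel('')   # hide the default 'Sales' label beside the pie
plt.show()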

**San Francisco, Los Angeles, and New York are the standout cities in terms of sales performance, recording the highest sales figures. This data suggests that these urban centers are key revenue generators, likely due to their large and diverse customer bases. Businesses should pay particular attention to these markets, allocating resources and marketing efforts accordingly to maximize sales and capture the robust consumer demand in these cities.**
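The percentages on the pie are each city's share of total sales. A sketch with made-up totals that computes the same shares as a sorted table, which can make the highest and lowest cities easier to compare than many similar-sized slices:

```python
import pandas as pd

# Made-up per-city totals.
city_sales = pd.DataFrame({
    "City": ["Austin (TX)", "San Francisco (CA)", "Los Angeles (CA)"],
    "Sales": [200.0, 500.0, 300.0],
})

# Percentage share of total sales, the value autopct='%1.1f%%' prints on each slice.
city_sales["Share (%)"] = (100 * city_sales["Sales"] / city_sales["Sales"].sum()).round(1)
city_sales = city_sales.sort_values("Sales", ascending=False)

print(city_sales.to_string(index=False))
```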

Based on the findings of this exploratory data analysis (EDA), here are some recommendations for businesses:

1. **Seasonal Sales Strategy**:
   - Develop a robust seasonal sales strategy that capitalizes on the peak in December and prepares for the lower sales in January. This could include special promotions, discounts, and holiday-themed marketing campaigns to attract more customers during the holiday season.

2. **Product Promotion**:
   - Given that "USB-C Charging Cable," "Lightning Charging Cable," and "AAA Batteries" are top-selling products, consider giving them more prominent placement in your marketing materials and on your website. Offering bundled deals or discounts on these items can also boost sales.

3. **Marketing Timing**:
   - Concentrate marketing efforts and customer engagement activities during the peak sales hours, which are between 12 pm and 9 pm. Utilize targeted advertisements, email campaigns, and social media promotions during this time to maximize visibility and engagement.

4. **Market Focus**:
   - Pay special attention to San Francisco, Los Angeles, and New York as key markets. Invest in localized marketing strategies, partnerships with local businesses, and tailored promotions to cater to the high consumer demand in these cities.

5. **Inventory Management**:
   - Optimize inventory management to match the seasonal demand. Stock up on popular products in preparation for the December peak and adjust inventory levels in January to minimize carrying costs.

6. **Data Monitoring and Feedback**:
   - Continuously monitor sales data and customer feedback to adapt your strategies in real time. Stay agile and responsive to changing market conditions and consumer preferences.

7. **Customer Engagement**:
   - Enhance customer engagement by offering excellent customer service, convenient shopping experiences, and loyalty programs. Satisfied customers are more likely to make repeat purchases.

8. **Competitor Analysis**:
   - Keep an eye on competitors' strategies and pricing to ensure that your offerings remain competitive in the market.

9. **Diversification**:
   - Explore opportunities to diversify your product offerings based on customer preferences and market trends.

Implementing these recommendations can help businesses make the most of their sales data insights and drive growth, particularly in terms of seasonality, product selection, and market focus.

In [None]:
sns.set_palette('flare')