# Sales Dashboard Tutorial

## 1. Loading and Inspecting the Dataset

<p style="background:black">
<code style="background:black;color:white">C:\Users\YOUR_USERNAME> pip install pandas
</code>
</p>

```python
import pandas as pd

# Dataset URL
DATA_URL = "https://raw.githubusercontent.com/Sven-Bo/datasets/master/store_sales_2022-2023.csv"

# Load dataset
data = pd.read_csv(DATA_URL)

# Display the first few rows
data.head()
```

|   | order_id | product_id | store_id | product_name | product_category | city | date_of_sale | quantity_sold | sales_amount |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 52 | 1 | CodeComet | Software Development Tools | Tokyo | 1/1/2022 | 8 | 303.29 |
| 1 | 2 | 83 | 3 | SyntaxScribe | Software Development Tools | Yokohama | 1/1/2022 | 8 | 173.53 |
| 2 | 3 | 24 | 3 | CodeCanvas | Software Development Tools | Yokohama | 1/2/2022 | 6 | 37.72 |
| 3 | 4 | 88 | 2 | VarVista Pro | Educational Tools | Osaka | 1/2/2022 | 6 | 10.47 |
| 4 | 5 | 60 | 1 | LoopLantern | Creative & Design Tools | Tokyo | 1/3/2022 | 1 | 159.1 |



### Explanation
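- **Data URL:** The CSV file is loaded directly from a GitHub repository.
- **`pd.read_csv`:** Reads the CSV into a pandas DataFrame. It accepts a URL as well as a local file path.
- **Inspect data:** `.head()` returns the first five rows by default. The unlabeled leftmost column is the DataFrame's row index, not a column from the CSV.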
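To follow along without network access, you can paste the five preview rows above into an in-memory buffer. `pd.read_csv` accepts any file-like object, so `io.StringIO` can stand in for the URL. This minimal sketch uses only those sample rows, not the full dataset.

```python
import io

import pandas as pd

# The five rows shown in the preview above (a sample, not the full dataset)
SAMPLE_CSV = """order_id,product_id,store_id,product_name,product_category,city,date_of_sale,quantity_sold,sales_amount
1,52,1,CodeComet,Software Development Tools,Tokyo,1/1/2022,8,303.29
2,83,3,SyntaxScribe,Software Development Tools,Yokohama,1/1/2022,8,173.53
3,24,3,CodeCanvas,Software Development Tools,Yokohama,1/2/2022,6,37.72
4,88,2,VarVista Pro,Educational Tools,Osaka,1/2/2022,6,10.47
5,60,1,LoopLantern,Creative & Design Tools,Tokyo,1/3/2022,1,159.1
"""

# A StringIO buffer behaves like an open text file, so read_csv can consume it
sample = pd.read_csv(io.StringIO(SAMPLE_CSV))

print(sample.shape)          # (rows, columns)
print(list(sample.columns))  # column names in file order
print(sample.head())
```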


## 2. Checking Data Types

```python
# Check data types of columns
data.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1730 entries, 0 to 1729
Data columns (total 9 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   order_id          1730 non-null   int64  
 1   product_id        1730 non-null   int64  
 2   store_id          1730 non-null   int64  
 3   product_name      1730 non-null   object 
 4   product_category  1730 non-null   object 
 5   city              1730 non-null   object 
 6   date_of_sale      1730 non-null   object 
 7   quantity_sold     1730 non-null   int64  
 8   sales_amount      1730 non-null   float64
dtypes: float64(1), int64(4), object(4)
memory usage: 121.8+ KB
```



### Explanation
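- **`.info()`:** Displays each column's name, non-null count, and data type, along with the total number of rows and the memory usage.
- Here every column has 1730 non-null values, so no data is missing. However, `date_of_sale` is stored as `object` (plain text) rather than as a date. The next section fixes this.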
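`.info()` is meant for reading. To check the same properties in code, for example as a guard in a script, `isna().sum()` and the `pandas.api.types` helpers return values you can test. The sketch below runs on a trimmed copy of the preview rows (only four of the nine columns).

```python
import io

import pandas as pd

# Trimmed copy of the preview rows (subset of columns)
SAMPLE_CSV = """order_id,date_of_sale,quantity_sold,sales_amount
1,1/1/2022,8,303.29
2,1/1/2022,8,173.53
3,1/2/2022,6,37.72
"""
sample = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Missing values per column; all zeros means there are no gaps
missing = sample.isna().sum()
print(missing)

# read_csv does not parse dates unless asked, so date_of_sale arrives as text
is_datetime = pd.api.types.is_datetime64_any_dtype(sample["date_of_sale"])
print("date_of_sale is datetime:", is_datetime)

# The numeric columns are inferred automatically
print(sample.dtypes)
```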


## 3. Transforming Data for DateTime and Extracting Features

```python
# Convert 'date_of_sale' to datetime
data['date_of_sale'] = pd.to_datetime(data['date_of_sale'])
data.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1730 entries, 0 to 1729
Data columns (total 9 columns):
 #   Column            Non-Null Count  Dtype         
---  ------            --------------  -----         
 0   order_id          1730 non-null   int64         
 1   product_id        1730 non-null   int64         
 2   store_id          1730 non-null   int64         
 3   product_name      1730 non-null   object        
 4   product_category  1730 non-null   object        
 5   city              1730 non-null   object        
 6   date_of_sale      1730 non-null   datetime64[ns]
 7   quantity_sold     1730 non-null   int64         
 8   sales_amount      1730 non-null   float64       
dtypes: datetime64[ns](1), float64(1), int64(4), object(3)
memory usage: 121.8+ KB
```


```python
# Extract month and year
data['month'] = data['date_of_sale'].dt.month
data['year'] = data['date_of_sale'].dt.year

# Display the transformed DataFrame
data.head()
```

|   | order_id | product_id | store_id | product_name | product_category | city | date_of_sale | quantity_sold | sales_amount | month | year |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 52 | 1 | CodeComet | Software Development Tools | Tokyo | 2022-01-01 | 8 | 303.29 | 1 | 2022 |
| 1 | 2 | 83 | 3 | SyntaxScribe | Software Development Tools | Yokohama | 2022-01-01 | 8 | 173.53 | 1 | 2022 |
| 2 | 3 | 24 | 3 | CodeCanvas | Software Development Tools | Yokohama | 2022-01-02 | 6 | 37.72 | 1 | 2022 |
| 3 | 4 | 88 | 2 | VarVista Pro | Educational Tools | Osaka | 2022-01-02 | 6 | 10.47 | 1 | 2022 |
| 4 | 5 | 60 | 1 | LoopLantern | Creative & Design Tools | Tokyo | 2022-01-03 | 1 | 159.1 | 1 | 2022 |



### Explanation
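- **`pd.to_datetime`:** Converts the `date_of_sale` strings into `datetime64[ns]` values, as the second `.info()` output confirms. Values such as `1/2/2022` are parsed month-first (`2022-01-02`). Passing `format="%m/%d/%Y"` would make that interpretation explicit.
- **`.dt.month` and `.dt.year`:** The `.dt` accessor extracts the month and the year from each datetime value into new integer columns.
- The later steps group by these columns: `year` for the city revenues, and `month` plus `year` for the bar chart.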
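A value like `1/2/2022` could mean January 2 or February 1, so stating the format explicitly can be safer than relying on inference. The sketch below uses strings copied from the preview. `format="%m/%d/%Y"` is an assumption chosen to match the month-first result shown above, where `1/2/2022` became `2022-01-02`.

```python
import pandas as pd

raw_dates = pd.Series(["1/1/2022", "1/2/2022", "1/3/2022"])

# An explicit format removes any month-first vs. day-first guessing
dates = pd.to_datetime(raw_dates, format="%m/%d/%Y")
print(dates.dt.month.tolist(), dates.dt.year.tolist())

# The same text read day-first yields a different date
print(pd.to_datetime("1/2/2022", dayfirst=True))  # 2022-02-01 00:00:00
```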


## 4. Chaining Transformations

```python
# Perform the same transformations using method chaining
data = data.assign(
    date_of_sale=lambda df: pd.to_datetime(df['date_of_sale']),
    month=lambda df: df['date_of_sale'].dt.month,
    year=lambda df: df['date_of_sale'].dt.year
)

# Display the transformed DataFrame
data.head()
```

|   | order_id | product_id | store_id | product_name | product_category | city | date_of_sale | quantity_sold | sales_amount | month | year |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 52 | 1 | CodeComet | Software Development Tools | Tokyo | 2022-01-01 | 8 | 303.29 | 1 | 2022 |
| 1 | 2 | 83 | 3 | SyntaxScribe | Software Development Tools | Yokohama | 2022-01-01 | 8 | 173.53 | 1 | 2022 |
| 2 | 3 | 24 | 3 | CodeCanvas | Software Development Tools | Yokohama | 2022-01-02 | 6 | 37.72 | 1 | 2022 |
| 3 | 4 | 88 | 2 | VarVista Pro | Educational Tools | Osaka | 2022-01-02 | 6 | 10.47 | 1 | 2022 |
| 4 | 5 | 60 | 1 | LoopLantern | Creative & Design Tools | Tokyo | 2022-01-03 | 1 | 159.1 | 1 | 2022 |



### Explanation
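- Method chaining expresses the conversion and the feature extraction as a single expression instead of three separate assignments.
- `.assign` returns a new DataFrame with the given columns added or overwritten. Its keyword arguments are evaluated in order, so the `month` and `year` lambdas operate on the already converted `date_of_sale`.
- Because `data` was already converted in Section 3, this cell produces the same result as before.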
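A fuller chain would start from the raw CSV rather than from an already converted `data`. Below is a minimal sketch with an in-memory sample. The first two rows come from the preview; the third row is hypothetical and was added only so that the month and year values differ.

```python
import io

import pandas as pd

# First two rows from the preview; the third row is hypothetical
SAMPLE_CSV = """city,date_of_sale,sales_amount
Tokyo,1/1/2022,303.29
Yokohama,1/2/2022,37.72
Osaka,2/15/2023,10.47
"""
raw = pd.read_csv(io.StringIO(SAMPLE_CSV))

# One chain from raw text to enriched frame. assign() evaluates its keyword
# arguments in order, so month and year see the converted datetime column.
enriched = raw.assign(
    date_of_sale=lambda df: pd.to_datetime(df["date_of_sale"], format="%m/%d/%Y"),
    month=lambda df: df["date_of_sale"].dt.month,
    year=lambda df: df["date_of_sale"].dt.year,
)
print(enriched)

# assign() returns a new DataFrame; the original still holds the raw strings
print(raw["date_of_sale"].tolist())
```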


## 5. Calculating City Revenues

```python
YEAR = 2023

# Calculate total revenue for each city and year, then calculate percentage change
city_revenues = (
    data.groupby(['city', 'year'])['sales_amount']
    .sum()
    .unstack()
    .assign(change=lambda x: x.pct_change(axis=1)[YEAR] * 100)
)

city_revenues
```

| city | 2022 | 2023 | change |
|---|---|---|---|
| Osaka | 76914.92 | 81202.93 | 5.575004 |
| Tokyo | 79961.13 | 72717.66 | -9.058739 |
| Yokohama | 63216.9 | 67110.89 | 6.159729 |



### Explanation
- **Grouping:** Data is grouped by city and year.
- **Aggregation:** Sum of `sales_amount` for each group.
- **Unstacking:** Reshapes the DataFrame to have years as columns.
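- **Percentage change:** `.pct_change(axis=1)` compares each year column with the column to its left, `[YEAR]` keeps the 2023 result, and multiplying by 100 expresses the year-over-year change in percent.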
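To see exactly what `pct_change(axis=1)` computes here, the sketch below rebuilds the city totals from the output table and compares the result with the formula written out by hand. For Osaka, (81202.93 - 76914.92) / 76914.92 × 100 ≈ 5.575.

```python
import pandas as pd

# City totals copied from the output table above
city_revenues = pd.DataFrame(
    {2022: [76914.92, 79961.13, 63216.9], 2023: [81202.93, 72717.66, 67110.89]},
    index=pd.Index(["Osaka", "Tokyo", "Yokohama"], name="city"),
)

# pct_change(axis=1) divides each column by the one to its left and subtracts 1;
# the first column (2022) has no predecessor and becomes NaN
via_pct_change = city_revenues.pct_change(axis=1)[2023] * 100

# The same quantity written out explicitly
explicit = (city_revenues[2023] - city_revenues[2022]) / city_revenues[2022] * 100

print(explicit.round(6))
```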


## 6. Example for One City

```python
# Example: Get revenue and change for a specific city
city = "Tokyo"

revenue = city_revenues.loc[city, YEAR]
change = city_revenues.loc[city, "change"]

f"Revenue for {city}: ${revenue:,.2f}, Change: {change:.2f}%"
```

```
'Revenue for Tokyo: $72,717.66, Change: -9.06%'
```


### Explanation
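- `.loc[city, YEAR]` looks up Tokyo's 2023 revenue, and `.loc[city, "change"]` looks up its percentage change.
- The f-string formats the revenue with thousands separators and two decimals (`:,.2f`) and rounds the change to two decimals (`:.2f`).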
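If the dashboard needs this text for every city, wrapping the lookup in a function avoids repeating the `.loc` calls. In this minimal sketch, `city_summary` is a hypothetical name. The table is rebuilt from the Section 5 output so that the example runs on its own.

```python
import pandas as pd

# Revenue and change values copied from the Section 5 output table
city_revenues = pd.DataFrame(
    {
        2022: [76914.92, 79961.13, 63216.9],
        2023: [81202.93, 72717.66, 67110.89],
        "change": [5.575004, -9.058739, 6.159729],
    },
    index=["Osaka", "Tokyo", "Yokohama"],
)


def city_summary(table: pd.DataFrame, city: str, year: int) -> str:
    """Hypothetical helper: format one city's revenue and change as text."""
    if city not in table.index:
        raise KeyError(f"No revenue data for city {city!r}")
    revenue = table.loc[city, year]
    change = table.loc[city, "change"]
    return f"Revenue for {city}: ${revenue:,.2f}, Change: {change:.2f}%"


# One summary line per city
for name in city_revenues.index:
    print(city_summary(city_revenues, name, 2023))
```

Raising a `KeyError` with a readable message makes a misspelled city name easier to diagnose than the bare error `.loc` would produce.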


## 7. Preparing Data for Bar Chart

```python
# Variables for filtering
selected_city = "Tokyo"
visualization_year = 2023

# Filter, group, and sum data for bar chart
filtered_data = (
    data.query("city == @selected_city & year == @visualization_year")
    .groupby("month", dropna=False, as_index=False)["sales_amount"]
    .sum()
)

filtered_data
```

|   | month | sales_amount |
|---|---|---|
| 0 | 1 | 7289.7 |
| 1 | 2 | 6281.59 |
| 2 | 3 | 6568.99 |
| 3 | 4 | 6307.56 |
| 4 | 5 | 5881.92 |
| 5 | 6 | 4662.89 |
| 6 | 7 | 6116.09 |
| 7 | 8 | 5553.45 |
| 8 | 9 | 5327.64 |
| 9 | 10 | 8650.56 |



### Explanation
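- **Querying:** `query` keeps only the rows for the selected city and year. The `@` prefix refers to the Python variables `selected_city` and `visualization_year`.
- **Grouping and summing:** Groups the filtered rows by month and sums `sales_amount`. `as_index=False` keeps `month` as a regular column instead of the index.
- The result has one row per month that has sales (months 1 to 10 for Tokyo in 2023) and is ready to be plotted as a bar chart.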
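`groupby` only returns months that actually have rows. A month without sales would therefore be absent from the chart rather than shown as an empty bar. One way to guarantee all twelve months is to reindex the monthly sums. This minimal sketch uses a small hypothetical sample, and `monthly_sales` is an illustrative name that is not part of the tutorial.

```python
import pandas as pd

# Hypothetical sample: Tokyo 2023 has sales only in January and March
sample = pd.DataFrame({
    "city": ["Tokyo", "Tokyo", "Tokyo", "Osaka"],
    "year": [2023, 2023, 2023, 2023],
    "month": [1, 1, 3, 1],
    "sales_amount": [100.0, 50.0, 25.0, 999.0],
})


def monthly_sales(df: pd.DataFrame, selected_city: str, visualization_year: int) -> pd.DataFrame:
    """Sum sales_amount per month for one city and year, listing all 12 months."""
    return (
        df.query("city == @selected_city & year == @visualization_year")
        .groupby("month")["sales_amount"]
        .sum()
        # Months with no rows are added back with a total of 0
        .reindex(range(1, 13), fill_value=0)
        .rename_axis("month")
        .reset_index()
    )


result = monthly_sales(sample, "Tokyo", 2023)
print(result)
```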
