# **Retail Sales Data Analysis**

## Objectives

**Fetch Data from Kaggle and Save as Raw Data:**

Purpose: Download the retail sales dataset from Kaggle and store it in a raw format for initial analysis.

**Initial Data Exploration and Cleaning:**

Purpose: Conduct exploratory data analysis (EDA) to understand the structure and quality of the data, and perform initial cleaning tasks such as handling missing values and correcting inconsistencies.

**Data Transformation and Feature Engineering:**

Purpose: Transform the raw data into a suitable format for analysis, create new features (e.g., sales differences between holiday and non-holiday weeks), and standardize data formats.

**Descriptive Statistics and Visualization:**

Purpose: Generate descriptive statistics and create visualizations to summarize the data and identify initial trends and patterns.

**Hypothesis Testing and Statistical Analysis:**

Purpose: Formulate and validate hypotheses using statistical tests, such as t-tests and ANOVA, to uncover insights from the data.

**Predictive Modeling and Forecasting:**

Purpose: Build predictive models to forecast future sales based on historical data and identified trends.

**Impact Analysis of Promotional Markdowns:**

Purpose: Analyze the impact of promotional markdowns on sales during holiday and non-holiday periods, and visualize the results.

**Comparative Performance Analysis:**

Purpose: Compare sales performance across different stores and regions, taking into account store types and sizes.

**Project Documentation and Sharing:**

Purpose: Document the project process, findings, and code in a structured format, and share the results via GitHub or other platforms.
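The holiday versus non-holiday feature mentioned under Data Transformation and Feature Engineering could be sketched roughly as follows. This is only an illustration: the toy values are made up, the frame assumes `Weekly_Sales` and `IsHoliday` have already been combined into one table, and the names `Period` and `Holiday_Sales_Diff` are hypothetical.

```python
import pandas as pd

# Toy data using column names from the Inputs section; the values are invented.
weekly = pd.DataFrame({
    "Store": [1, 1, 1, 2, 2, 2],
    "IsHoliday": [False, True, False, False, True, False],
    "Weekly_Sales": [100.0, 150.0, 120.0, 200.0, 260.0, 220.0],
})

# Turn the boolean flag into a readable label (hypothetical column name).
weekly["Period"] = weekly["IsHoliday"].map({True: "holiday", False: "non_holiday"})

# Mean weekly sales per store and period, one column per period.
means = weekly.groupby(["Store", "Period"])["Weekly_Sales"].mean().unstack("Period")

# New feature: holiday mean minus non-holiday mean for each store.
holiday_diff = (means["holiday"] - means["non_holiday"]).rename("Holiday_Sales_Diff")
print(holiday_diff)
```

A positive value means the store sold more, on average, in holiday weeks than in non-holiday weeks.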

## Inputs

**Retail Sales Dataset:**

* Stores Data: Information about each store, including store type and size.

Filename: Stores.xlsx

Columns: Store, Type, Size

* Features Data: Additional information about each store, such as average temperature, fuel price, CPI, and unemployment rate.

Filename: Features.xlsx

Columns: Store, Date, Temperature, Fuel_Price, CPI, Unemployment, IsHoliday

* Sales Data: Weekly sales data for each department within each store.

Filename: Sales.xlsx

Columns: Store, Dept, Date, Weekly_Sales
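Given the columns listed above, the three tables could be combined on their shared keys roughly as follows. This is a minimal sketch with invented values standing in for the real files; it assumes `Store` and `Date` identify a single row in the features table and `Store` identifies a single row in the stores table.

```python
import pandas as pd

# Small stand-ins for the three input tables; every value here is invented.
stores = pd.DataFrame({"Store": [1, 2], "Type": ["A", "B"], "Size": [1000, 2000]})
features = pd.DataFrame({
    "Store": [1, 2],
    "Date": pd.to_datetime(["2010-02-05", "2010-02-05"]),
    "Temperature": [40.0, 42.0],
    "Fuel_Price": [2.5, 2.6],
    "CPI": [210.0, 211.0],
    "Unemployment": [8.0, 8.5],
    "IsHoliday": [False, False],
})
sales = pd.DataFrame({
    "Store": [1, 1, 2],
    "Dept": [1, 2, 1],
    "Date": pd.to_datetime(["2010-02-05", "2010-02-05", "2010-02-05"]),
    "Weekly_Sales": [100.0, 200.0, 300.0],
})

# Left joins keep every sales row; validate= raises if a key is unexpectedly duplicated.
merged = (
    sales
    .merge(features, on=["Store", "Date"], how="left", validate="many_to_one")
    .merge(stores, on="Store", how="left", validate="many_to_one")
)
print(merged.head())
```

Starting from the sales table keeps one row per store, department, and week, which is the grain most of the later analysis steps work at.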

**Python Libraries:**

* pandas: For data manipulation and analysis.

* numpy: For numerical operations.

* matplotlib: For creating static visualizations.

* seaborn: For creating enhanced visualizations.

* statsmodels: For statistical analysis and regression modeling.

* scipy: For additional statistical tests.

## Outputs

**Visualizations:**

* Descriptive Statistics and Data Distributions:

Files/Code: Python code to generate histograms, box plots, and summary tables.

Artefacts: PNG or JPEG images of plots and charts visualizing data distributions.

* Sales Trends and Patterns:

Files/Code: Python code for line graphs and time series plots.

Artefacts: PNG or JPEG images of sales trends over time.

**Statistical Analysis Reports:**

* Hypothesis Testing and Correlation Analysis:

Files/Code: Python code for t-tests, ANOVA, and correlation matrices.

Artefacts: Text files or Jupyter Notebook cells documenting the results of statistical tests.

**Predictive Models:**

* Regression Models and Forecasting:

Files/Code: Python code for regression analysis and forecasting models.

Artefacts: Model outputs and summary statistics saved as text files or displayed in the notebook.

**Impact Analysis Visualizations:**

* Markdowns and Promotions Impact:

Files/Code: Python code for bar charts, heatmaps, and box plots comparing sales during promotional and non-promotional periods.

Artefacts: PNG or JPEG images of impact analysis visualizations.

**Comparative Performance Visualizations:**

* Store and Region Performance:

Files/Code: Python code for bar charts, scatter plots, and heatmaps comparing sales across different stores and regions.

Artefacts: PNG or JPEG images of comparative performance visualizations.

**Comprehensive Report:**

* Final Analysis Report:

Files/Code: Jupyter Notebook with markdown cells documenting the entire analysis process, findings, and visualizations.

Artefacts: Exported PDF or HTML report summarizing key insights and recommendations.

**Project Documentation:**

* README File:

Files/Code: A detailed README file explaining the project objectives, methodology, and steps taken.

Artefacts: README.md file in the GitHub repository.
## Additional Comments

* If you have any additional comments that don't fit in the previous bullets, please state them here. 



---

# Change working directory

* The notebooks are assumed to be stored in a subfolder, so when running a notebook in the editor you need to change the working directory.

We need to change the working directory from the current folder to its parent folder.
* We access the current directory with os.getcwd()

In [19]:
import os
current_dir = os.getcwd()
current_dir

'/workspace/CIproject1'

We want to make the parent of the current directory the new current directory
* os.path.dirname() gets the parent directory
* os.chdir() defines the new current directory

In [20]:
os.chdir(os.path.dirname(current_dir))
print("You set a new current directory")

You set a new current directory


Confirm the new current directory

In [21]:
current_dir = os.getcwd()
current_dir

'/workspace'
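Re-running the `os.chdir(os.path.dirname(...))` cell moves up one more level each time, so the working directory can end up above the project folder. One way to make the step safe to repeat is to search upward for a marker file instead. This is only a sketch: the helper `find_project_root` and the choice of `README.md` as the marker are assumptions, not part of the original notebook.

```python
from pathlib import Path


def find_project_root(start, marker="README.md"):
    """Return the nearest directory at or above `start` that contains `marker`."""
    start = Path(start).resolve()
    # Check the starting directory first, then each parent in turn.
    for candidate in (start, *start.parents):
        if (candidate / marker).exists():
            return candidate
    raise FileNotFoundError(f"No directory containing {marker!r} at or above {start}")


# Possible use in a notebook cell (left commented out so the sketch has no side effects):
# import os
# os.chdir(find_project_root(os.getcwd()))
```

Because the search starts at the current directory, running the cell a second time leaves the working directory unchanged.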

In [25]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

ModuleNotFoundError: No module named 'pandas'

# Section 1

Section 1 content


---

# Section 2

Section 2 content

In [None]:
# These files have a .csv extension, so read them with read_csv rather than read_excel
stores = pd.read_csv('Stores.csv')
features = pd.read_csv('Features data set.csv')
sales = pd.read_csv('Sales.csv')


NameError: name 'pd' is not defined

In [15]:
print(stores.head())
print(features.head())
print(sales.head())


NameError: name 'stores' is not defined
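The Inputs section lists `.xlsx` filenames while the loading cell above uses `.csv` names, so a small helper that picks the pandas reader from the file extension may be useful. This is a sketch only; `load_table` is a hypothetical helper, and reading Excel files additionally assumes an Excel engine such as openpyxl is installed.

```python
from pathlib import Path

import pandas as pd


def load_table(path):
    """Load a CSV or Excel file into a DataFrame, chosen by file extension."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".csv":
        return pd.read_csv(path)
    if suffix in {".xlsx", ".xls"}:
        # Requires an Excel engine (for example openpyxl for .xlsx files).
        return pd.read_excel(path)
    raise ValueError(f"Unsupported file type: {suffix!r}")
```

With this helper, a call such as `load_table('Stores.csv')` would work the same way whichever of the two formats the downloaded files turn out to use.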

---

NOTE

* You may add as many sections as you want, as long as it supports your project workflow.
* All notebook cells should run top-down. Don't create a workflow in which, at a given point, you need to go back to a previous cell to execute some task (for example, to refresh the content of a variable).

---

# Push files to Repo

* In cases where you don't need to push files to Repo, you may replace this section with "Conclusions and Next Steps" and state your conclusions and next steps.

In [None]:
import os
try:
  # create your folder here
  # os.makedirs(name='')
  pass  # placeholder so the try block is valid Python until the line above is filled in
except Exception as e:
  print(e)
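As a sketch of how the folder-creation step could look once a folder name is chosen, `os.makedirs` accepts `exist_ok=True`, which avoids an error when the folder already exists. The helper name `ensure_folder` is hypothetical, and the actual folder name is left for you to choose.

```python
import os


def ensure_folder(path):
    """Create `path` (and any missing parent folders) if it does not exist yet."""
    # exist_ok=True makes the call safe to repeat when the notebook is re-run.
    os.makedirs(path, exist_ok=True)
    return path
```

Calling the helper twice with the same path is harmless, which fits the rule that notebook cells should be safe to run top-down more than once.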
