Effective data modeling lays the foundation for smooth data analysis, which lets us create key performance indicators (KPIs) to monitor performance and make smarter decisions. This saves time, resources, and money. So far, we have learned the key Power Query, DAX, and calculation functions needed for modeling, exploratory data analysis, and data transformation. It's time to practice those concepts!

Sales data modeling can cover many aspects (sales, production, customer experience, employee efficiency, etc.), so it presents many challenges. Having a project like this in our portfolio can help demonstrate our skills. The goal of this project is to model and analyze data from the sales records of a superstore and extract information for decision-making.

Good analysis starts with questions. Imagine we're analysts at a company, and we've been tasked with producing a report that answers a set of questions. We'll work through this project step by step, doing the analysis and creating the visualizations needed to answer them. Below are the questions we'll consider in this project.

* **Question 1:** What were the total sales for the company?
* **Question 2:** Which market generated the most sales on average?
* **Question 3:** What was the profit by segment? Which segment has the most profit?

First, we'll explore the data. The Super Store dataset contains order details for the customers of a superstore in the US. It contains 13 columns:

* `Order ID`: identifier of the order
* `Customer ID`: identifier of the customer
* `Customer Name`: name of the customer
* `Segment`: customer segment (Consumer, Corporate, or Home Office)
* `Country`: country of the sale
* `Market`: market (continent) of the sale
* `Product ID`: identifier of the product
* `Category`: category of the product
* `Sub-Category`: sub-category of the product
* `Product Name`: name of the product
* `Sales`: sales amount for the product
* `Quantity`: quantity of the product sold
* `Profit`: profit made on the sale

It's important to explore the data to understand what each column contains. We can do this under the **Browse Data** tab.

![image.png](attachment:image.png)

After we've successfully loaded the superstore dataset in Power BI, we need to clean the data before using it for modeling, exploratory data analysis, and data transformation, and before creating visualizations.

By inspecting the dataset, we found a few issues.

1. The current header is not meaningful. The first row is not part of the data, and the second row should be the header.

    * We'll have to remove the first row and then promote the second row to the header.

2. There is a discrepancy between the content of some columns and their data types.

    * We'll have to set a data type for each column that matches its content.

3. The format of the `Date` column values is not consistent. The values use two different delimiters: some follow the `dd-mm-yyyy` format and others `dd/mm/yyyy`.

    * We'll have to replace every `/` with `-` so that all dates are formatted the same way.
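The delimiter replacement is typically done as a replace-values step while cleaning the data. Purely as an illustration of the logic, here is a minimal sketch of the same idea written as a DAX calculated column instead. It assumes the table is named `'superstore data'` and that `Date` is still stored as text; the column name `Date clean` is hypothetical.

```dax
-- Sketch only: a calculated column that swaps "/" for "-" in a text date.
-- Assumes 'superstore data'[Date] is a text column; "Date clean" is a made-up name.
Date clean = SUBSTITUTE('superstore data'[Date], "/", "-")
```

After this step, the column would still need to be converted to the Date data type before any date functions are applied to it.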

![image.png](attachment:image.png)

Now that we've cleaned the dataset and removed the inconsistencies, let's add some columns needed for the analysis.

We're interested in analyzing the data at different date granularities, such as year and month. Therefore, at this step we'll add two columns to our table: one for the year and one for the month.

To do so, we can use a powerful tool we've learned: **DAX**. Let's use it to perform these operations.

![image.png](attachment:image.png)
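As a rough guide, the two calculated columns could look like the sketch below. It assumes the table is named `'superstore data'` and that the cleaned `Date` column has the Date data type; the column names `Year` and `Month` are our own choice and may differ from the ones in the screenshot.

```dax
-- Calculated column: the year of each order date (e.g. 2016).
Year = YEAR('superstore data'[Date])

-- Calculated column: the month number of each order date (1 to 12).
Month = MONTH('superstore data'[Date])
```

Each definition is entered separately with **New column**. If we prefer month names to month numbers, `FORMAT('superstore data'[Date], "MMMM")` is one alternative for the second column.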

Our dataset is now ready for visualization and analysis. Our first question was: **What were the total sales for the company?**

To answer this question, we'll compute the total sales and visualize the result. Let's create a new measure for that.

![image.png](attachment:image.png)

![image.png](attachment:image.png)
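For reference, a measure of this kind can be written as a simple sum over the `Sales` column. This is a minimal sketch that assumes the table is named `'superstore data'`; the exact formula in the screenshots may differ.

```dax
-- Measure: total of the Sales column, evaluated in the current filter context.
Total sales = SUM('superstore data'[Sales])
```

Because it is a measure rather than a calculated column, the same definition returns the grand total in a card and a per-group total once it is placed in a visual that groups by another field.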

Now that we have the `Total sales` measure, we can do several things with it.

Above, we were interested only in the total sales. Now, we want to display the total sales by year and category using a **Table visual**.

![image.png](attachment:image.png)

Now we want to answer our second question: **Which market generated the most sales on average?**

In this case, market refers to country. Our problem is then to find the top three countries by average sales.

We need to compute the average sales and then display it by country.

![image.png](attachment:image.png)
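One possible way to express this in DAX is sketched below. It assumes the table is named `'superstore data'`; the measure names `Average sales` and `Country rank by average sales` are hypothetical. The ranking measure is optional: a Top N filter on the visual would achieve the same result.

```dax
-- Measure: average of the Sales column in the current filter context.
Average sales = AVERAGE('superstore data'[Sales])

-- Optional measure: rank of each country by average sales (1 = highest).
Country rank by average sales =
RANKX(
    ALL('superstore data'[Country]),  -- rank against every country, ignoring the visual's country filter
    [Average sales],
    ,
    DESC
)
```

Placing `Country` and `Average sales` in a table or bar chart and keeping only the rows where the rank is 3 or lower would then show the top three countries.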

The last questions we want to answer are:

* **What was the profit by segment?**
* **Which segment has the most profit?**

To answer these questions, let's find the net profit for each customer segment (Consumer, Corporate, and Home Office) in 2016.

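To do so, we'll use the `CALCULATE` function we learned before. Here is an example that uses it to compute the total sales for 2013. The filter is applied to the year column we added earlier (assumed here to be named `Year`), not to `Sales`.

```
#total sales 2013 =
CALCULATE(
  SUM('superstore data'[Sales]),
  -- Keep only the rows from 2013; [Year] is the assumed name of the year column.
  'superstore data'[Year] = 2013
)
```

Let's use a similar syntax to compute the net profit in 2016.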
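A minimal sketch of such a measure follows. It assumes the table is named `'superstore data'` and that the year column we added earlier is called `Year`; the measure name `Net profit 2016` is hypothetical, and the formula in the screenshots may differ.

```dax
-- Measure: sum of Profit restricted to the rows where Year is 2016.
Net profit 2016 =
CALCULATE(
    SUM('superstore data'[Profit]),
    'superstore data'[Year] = 2016
)
```

Adding `Segment` and this measure to a visual splits the 2016 profit by segment, which lets us see which segment has the most profit.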

![image.png](attachment:image.png)

![image.png](attachment:image.png)

Congratulations! We have successfully loaded, cleaned and transformed a real-life dataset! 

If our organization is looking for effective data modeling to support data analysis, Power BI provides a powerful and simple platform. DAX, Power Query, and calculation functions are a few of the tools covered here that help us clean, transform, and model our data prior to analysis.

We can open the report we provided, which contains example solutions to the questions we asked.

These are a few next steps to consider:

* After the next course, choose the right visuals to show/analyze our data.

* Find other questions and try to answer them using techniques similar to those we used here.

* Find similar datasets and try to answer the same questions after cleaning and transforming them.

Curious to see what others have done on this project? [Head over to our Community to check them out](https://community.dataquest.io/tag/692). 

And of course, share your own project (the link to the online version) and show off your hard work. Head over to our community to [share your finished Guided Project!](https://community.dataquest.io/tag/692)

![image.png](attachment:image.png)