#  Welcome to the Databricks Notebook 

The Notebook is the primary code authoring tool in Databricks. With it, you can do anything from simple exploratory data analysis to training ML models or building multi-stage data pipelines.

Let's dive in and explore bakehouse data to analyze sales!

### Step 1: Let's get started

To run code in a Notebook cell, type your code and either click the Run cell button (top-left of the cell) or press `Cmd + Enter`.
<br>
<br>
<img src=https://docs.databricks.com/en/_images/cell-run-new.png>


Try running the provided statement: `print("Let's execute some Python code!")`.

In [0]:
print("Let's execute some Python code!")

#### Try the command palette

Use the `Cmd + Shift + P` keyboard shortcut to open the command palette for key notebook actions such as inserting new cells, showing results side by side, and more.

Use it to insert a cell. 

<img src="https://docs.databricks.com/en/_images/command-palette.gif" width=500>


#### The Notebook is a multi-language authoring experience

You're not limited to just Python in Databricks! You can write and execute code in Python and SQL, and you can annotate your notebooks using Markdown.

##### Let's go ahead and change the language using the top-right drop-down.

- Insert a new cell below the current cell using the command palette. 
- Use the language switcher to change its language to SQL. Notice that `%sql` shows up at the top of the cell. This is called a [magic command](https://docs.databricks.com/en/notebooks/notebooks-code.html#mix-languages). 
- Type in `select "hello world";` and run the cell.

### Step 2: Access sample data

In this exercise, we’ll use the Bakehouse dataset stored in the `samples.bakehouse` schema, which simulates bakery franchise data. Start by viewing the `samples.bakehouse.sales_transactions` table.

On the left side of the Notebook, access the Catalog view to browse catalogs, schemas, and tables.

To find the `sales_transactions` table, open the schema browser by clicking ![](https://docs.databricks.com/en/_images/notebook-data-icon.png) on the left. Navigate to **Samples**, select the **bakehouse** schema, and find the **sales_transactions** table. Then click the kebab menu <img src="https://docs.databricks.com/en/_images/kebab-menu.png"> next to the table and choose **Preview in new cell**.


### Step 3: Explore and analyze data
Our bakehouse operates franchises across multiple countries, offering a variety of products.

We’ll begin by identifying the most popular product by querying the sample data using Python.

In [0]:
import pandas as pd

# Read the sample bakehouse transactions and franchises tables into PySpark DataFrames.
df_transactions = spark.read.table('samples.bakehouse.sales_transactions')
df_franchises = spark.read.table('samples.bakehouse.sales_franchises')

# Convert the PySpark DataFrames to pandas DataFrames.
# Note: toPandas() collects every row to the driver, so use it only on data that fits in driver memory.
pdf_transactions = df_transactions.toPandas()
pdf_franchises = df_franchises.toPandas()

# Which product sold the most units?
top_product = (
    pdf_transactions.groupby('product')['quantity']
    .sum()
    .reset_index()
    .sort_values(by='quantity', ascending=False)
)
        
display(top_product)

**Golden Gate Ginger** is our best-selling cookie! 
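To see what the aggregation above does, the same group, sum, and sort pattern can be tried on a few hand-made rows. This is a minimal sketch: the helper name `units_by_product` and the sample rows are hypothetical, made up for illustration, and are not part of the bakehouse dataset.

```python
import pandas as pd


def units_by_product(pdf: pd.DataFrame) -> pd.DataFrame:
    """Return total quantity per product, best seller first."""
    return (
        pdf.groupby("product", as_index=False)["quantity"]
        .sum()
        .sort_values(by="quantity", ascending=False, ignore_index=True)
    )


# Hypothetical rows for illustration only (not the real sample data).
sample = pd.DataFrame(
    {
        "product": ["Cookie A", "Cookie B", "Cookie A", "Cookie C"],
        "quantity": [3, 10, 4, 1],
    }
)

ranked = units_by_product(sample)
best_seller = ranked.iloc[0]["product"]  # first row holds the largest total
print(ranked)
print("Best seller:", best_seller)
```

In the notebook, calling `units_by_product(pdf_transactions)` should produce the same ranking as `top_product`.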

To identify the top-performing city for its sales, we’ll join the `transactions` table with the `franchises` table. This will allow us to analyze which city sells the most units of Golden Gate Ginger.

In [0]:
# Which city sells the most units of Golden Gate Ginger?
top_city = (
    pdf_franchises.merge(
        pdf_transactions[pdf_transactions['product'] == 'Golden Gate Ginger'],
        on='franchiseID',
        how='right',
    )
    .groupby('city')['quantity']
    .sum()
    .reset_index()
    .sort_values(by='quantity', ascending=False)
    .rename(columns={'quantity': 'units'})
)

display(top_city)
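One detail of the join above is worth knowing: `how='right'` keeps every filtered transaction, including any whose `franchiseID` has no match in the franchises table, and `groupby('city')` then drops rows whose city is missing (the pandas default). The sketch below illustrates this on hypothetical rows; the function name `units_by_city` and all data values are assumptions for illustration only.

```python
import pandas as pd


def units_by_city(pdf_franchises, pdf_transactions, product):
    """Units of one product per city, highest first."""
    sold = pdf_transactions[pdf_transactions["product"] == product]
    return (
        pdf_franchises.merge(sold, on="franchiseID", how="right")
        .groupby("city")["quantity"]  # rows with a missing city are dropped here
        .sum()
        .reset_index()
        .sort_values(by="quantity", ascending=False, ignore_index=True)
        .rename(columns={"quantity": "units"})
    )


# Hypothetical data: franchise 3 has no entry in the franchises table.
franchises = pd.DataFrame({"franchiseID": [1, 2], "city": ["Paris", "Tokyo"]})
transactions = pd.DataFrame(
    {
        "franchiseID": [1, 2, 2, 3, 1],
        "product": ["Golden Gate Ginger"] * 4 + ["Other Cookie"],
        "quantity": [5, 2, 4, 9, 7],
    }
)

result = units_by_city(franchises, transactions, "Golden Gate Ginger")
print(result)
```

The 9 units from the unmatched franchise do not appear in the result, so if totals look low it may be worth checking for unmatched IDs.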

### Step 4: Search and filter the Results table

**Sort Results:**
Hover over a column name in the results table above, then click the arrow icon that appears to sort by that column’s values.
<br>
<img src="https://docs.databricks.com/en/_images/result-table-sort.png" width=350>
<br>

- Try sorting the results table above in ascending order to find the city that sells the fewest Golden Gate Ginger cookies.
<br>
<br>

**Filter results:**
To create a filter, click <img src="https://docs.databricks.com/en/_images/filter-icon.png"> at the upper-right of the cell results. In the dialog that appears, select the column to filter on and the filter rule and value to apply. 
- Try filtering for all cities selling **more than 100 units of Golden Gate Ginger** by typing in `units > 100`.
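The same sort and filter can also be done in code rather than in the results table UI. This is a minimal sketch, assuming a DataFrame shaped like `top_city` (columns `city` and `units`); the city names and numbers below are placeholders, not real results.

```python
import pandas as pd

# Placeholder values for illustration only.
city_units = pd.DataFrame(
    {"city": ["City A", "City B", "City C"], "units": [150, 80, 120]}
)

# Ascending sort: the first row is the city with the fewest units.
ascending = city_units.sort_values(by="units", ignore_index=True)
fewest_city = ascending.iloc[0]["city"]

# Boolean mask equivalent to the `units > 100` filter rule.
over_100 = city_units[city_units["units"] > 100]

print("Fewest units:", fewest_city)
print(over_100)
```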




### Step 5: Visualize the data

Let’s visualize **_weekly sales of Golden Gate Ginger across all locations_**.

- Run the cell below to display the Golden Gate Ginger sales data. 
- To create a visualization, click the **+** button at the top of the results, then follow the steps in the visualization builder. 
- Choose your preferred chart type and set the chart values to complete your visualization.



<img src="https://docs.databricks.com/en/_images/new-visualization-menu.png" width=600>

- View a sample visualization by clicking the **Golden Gate Ginger Sales** tab in the Results section below.

In [0]:
%sql
-- How many units of Golden Gate Ginger are being sold across all locations every week?
SELECT
  f.name AS franchise_name,
  date_trunc('week', t.dateTime) AS week,
  sum(t.quantity) AS quantity
FROM samples.bakehouse.sales_transactions t
JOIN samples.bakehouse.sales_franchises f
  ON t.franchiseID = f.franchiseID
WHERE t.product = 'Golden Gate Ginger'
GROUP BY 1, 2

-- Click Golden Gate Ginger Sales tab in the results section
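For reference, `date_trunc('week', ...)` in Spark SQL truncates each timestamp to the Monday of the week it falls in. The plain-Python sketch below mirrors that behavior so the `week` column is easier to interpret; `week_start` is a hypothetical helper name, not a Databricks function.

```python
from datetime import datetime, timedelta


def week_start(ts: datetime) -> datetime:
    """Truncate a timestamp to 00:00 on the Monday of its week."""
    # weekday() is 0 for Monday, so subtracting it lands on Monday.
    monday = ts - timedelta(days=ts.weekday())
    return monday.replace(hour=0, minute=0, second=0, microsecond=0)


wednesday = datetime(2024, 5, 15, 13, 45)
sunday = datetime(2024, 5, 19, 23, 59)

# Both timestamps fall in the week that starts on Monday, 2024-05-13.
print(week_start(wednesday))
print(week_start(sunday))
```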



### Step 6: Use the Databricks Assistant for code suggestions
Notebooks come equipped with the context-aware [Databricks Assistant](https://docs.databricks.com/en/notebooks/use-databricks-assistant.html), which can help generate, explain, and fix code using natural language.
To use the Assistant, create a new cell and press `Cmd + I`, or click <img src="https://docs.databricks.com/en/_images/help-assistant-icon.png"> in the top-right corner of the new cell.

Enter a prompt for the Assistant to provide code suggestions in the cell below. Here is a sample prompt:
- _Python code to show total units of Golden Gate Ginger sold._

Click **Generate** or press **Enter** on the prompt and watch the Assistant suggest code to answer it. Click **Accept** to save the code suggestion, then run the cell to view the results!
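The Assistant's suggestion will vary from run to run. For comparison, a hand-written answer to the sample prompt might look like the following sketch; `total_units` is a hypothetical helper, and the rows are made up for illustration.

```python
import pandas as pd


def total_units(pdf: pd.DataFrame, product: str) -> int:
    """Sum the quantity column over rows matching one product."""
    mask = pdf["product"] == product
    return int(pdf.loc[mask, "quantity"].sum())


# Hypothetical rows for illustration only (not the real sample data).
demo = pd.DataFrame(
    {
        "product": ["Golden Gate Ginger", "Other Cookie", "Golden Gate Ginger"],
        "quantity": [5, 3, 2],
    }
)

ginger_total = total_units(demo, "Golden Gate Ginger")
print(ginger_total)
```

In the notebook, `total_units(pdf_transactions, 'Golden Gate Ginger')` applies the same logic to the pandas DataFrame created in Step 3.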


#### The Assistant can also help fix errors
Run the query below. When an error occurs, **[Quick Fix](https://docs.databricks.com/en/notebooks/use-databricks-assistant.html)** will automatically suggest solutions for basic issues that can be resolved with a single-line change.
Click **Accept and run** to apply the fix and continue running your code.

<br>
<img src="https://docs.databricks.com/en/_images/assistant-quick-fix.png" width=500>

In [0]:
%sql
SELECT date_trunc('WEEK', dateTime), sum(quantity) as totals
from samples.bakehouse.sales_transactions
where product = 'Golden Gate Ginger'



#### Continue exploring Notebooks!

- To learn about adding data from CSV files to Unity Catalog and visualizing it, see [Get started: Import and visualize CSV data from a notebook](https://docs.databricks.com/en/getting-started/import-visualize-data.html).
- To learn how to load data into Databricks using Apache Spark, see [Tutorial: Load and transform data using Apache Spark DataFrames](https://docs.databricks.com/en/getting-started/dataframes.html).
- To learn more about visualizations, see *Visualizations in Databricks notebooks* in the Databricks documentation.
