#  Welcome to the Databricks Notebook 

 The Notebook is the primary code authoring tool in Databricks.  With it, you can do anything from simple exploratory data analysis to training ML models or building multi-stage data pipelines.

 
 
 Let’s dive in and execute some code!

----------------------------------------------------------------------------------------------------------------


Running code in the Notebook is simple: write some code in a cell, then either click the “Run” button or use one of the following keyboard shortcuts:
- Cmd + Enter: Run the current cell
- Shift + Enter: Run the current cell and move the cursor to the next cell

Try running the `print("Let's execute some Python code!")` statement that has been provided.

In [0]:
print("Let's execute some Python code!")

### Try the command palette

Use the `[Cmd + Shift + P]` keyboard shortcut to open the command palette for key notebook actions such as adding new cells, showing results side by side, and more.


### The Notebook is a multi-language authoring experience

You're not limited to just Python in Databricks!  You can write and execute code in Python, SQL, Scala, or R, and you can annotate your notebooks using Markdown.

Change the language of a cell using the following magic commands or the language switcher in the top-right of the cell:

* `%python` for Python
* `%sql` for SQL
* `%scala` for Scala
* `%r` for R
* `%md` for Markdown


### Access sample data

The Notebook includes an embedded Catalog explorer on the left-hand side. Here, you can explore the data available to you in Unity Catalog's catalogs, schemas, and tables. You can reference data in Unity Catalog as `catalog.schema.table`.

This exercise uses the Bakehouse dataset stored in `samples.bakehouse`, which simulates a bakery franchise business.

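As a minimal sketch of the three-level naming described above, the helper below splits a fully qualified name into its catalog, schema, and table parts. `split_table_name` and `TableName` are hypothetical names written for illustration, not part of any Databricks API, and the sketch assumes plain identifiers (no backtick-quoted names that contain dots).

```python
from typing import NamedTuple


class TableName(NamedTuple):
    """The three parts of a Unity Catalog table reference."""
    catalog: str
    schema: str
    table: str


def split_table_name(full_name: str) -> TableName:
    """Split 'catalog.schema.table' into its parts (hypothetical helper)."""
    parts = full_name.split(".")
    # Require exactly three non-empty parts.
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected catalog.schema.table, got {full_name!r}")
    return TableName(*parts)


name = split_table_name("samples.bakehouse.sales_transactions")
print(name.catalog, name.schema, name.table)
```

Here `samples` is the catalog, `bakehouse` is the schema, and `sales_transactions` is the table.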
#### View the data using SQL
Let's start by viewing the `samples.bakehouse.sales_transactions` table.

In [0]:
%sql
SELECT * FROM samples.bakehouse.sales_transactions LIMIT 50;


#### Back to Python! 
Let's determine the most popular product and identify the top city contributing to its sales by querying sample data stored in Unity Catalog using Python.

In [0]:
import pandas as pd

# Read the sample bakehouse transactions and franchises tables into PySpark DataFrames, then convert them to pandas DataFrames
df_transactions = spark.read.table('samples.bakehouse.sales_transactions')
df_franchises = spark.read.table('samples.bakehouse.sales_franchises')
pdf_transactions = df_transactions.toPandas()
pdf_franchises = df_franchises.toPandas()

# Which product sold the most units?
top_product = (pdf_transactions.groupby('product')['quantity']
          .sum()
          .reset_index()
          .sort_values(by='quantity', ascending=False)
         )
        
display(top_product)

In [0]:
# Which city sold the most units of Outback Oatmeal?
top_city = (pdf_franchises.merge(pdf_transactions[pdf_transactions['product'] == 'Outback Oatmeal'], 
                            on='franchiseID', 
                            how='right')
            .groupby('city')['quantity']
            .sum()
            .reset_index()
            .sort_values(by='quantity', ascending=False)
            .rename(columns={'quantity': 'units'})
         )

display(top_city)

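To make the logic of the two cells above easier to follow, here is a plain-Python sketch of the same steps on a handful of made-up rows: sum units per product, then filter to one product, look up each franchise's city, and sum units per city. The rows, the product `Example Cookie`, and the cities `City A` and `City B` are invented for illustration and are not taken from `samples.bakehouse`. The helper `units_by` is hypothetical, and only the column names come from the cells above.

```python
from collections import Counter

# Made-up rows standing in for sales_transactions and sales_franchises.
transactions = [
    {"franchiseID": 1, "product": "Outback Oatmeal", "quantity": 4},
    {"franchiseID": 2, "product": "Outback Oatmeal", "quantity": 6},
    {"franchiseID": 1, "product": "Example Cookie", "quantity": 3},
    {"franchiseID": 2, "product": "Outback Oatmeal", "quantity": 2},
]
franchises = [
    {"franchiseID": 1, "city": "City A"},
    {"franchiseID": 2, "city": "City B"},
]


def units_by(rows, key):
    """Sum 'quantity' per distinct value of `key`, largest total first."""
    totals = Counter()
    for row in rows:
        totals[row[key]] += row["quantity"]
    return totals.most_common()


# Step 1: units sold per product.
top_product = units_by(transactions, "product")

# Step 2: keep one product, attach each franchise's city, sum per city.
# Every transaction in this toy data has a matching franchise.
city_of = {f["franchiseID"]: f["city"] for f in franchises}
oatmeal_rows = [
    {**t, "city": city_of[t["franchiseID"]]}
    for t in transactions
    if t["product"] == "Outback Oatmeal"
]
top_city = units_by(oatmeal_rows, "city")

print(top_product)  # [('Outback Oatmeal', 12), ('Example Cookie', 3)]
print(top_city)     # [('City B', 8), ('City A', 4)]
```

The pandas cells do the same work with `groupby(...).sum()` and `merge(...)`, which is shorter and scales better than hand-written loops.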
### Visualize the data

You can visualize the results of a query by clicking the + button at the top of the results and completing the visualization builder dialog.
Select your preferred visualization type and fill in the chart values to prepare the chart.

Let's visualize Outback Oatmeal sales across all franchises every week.

In [0]:
%sql
-- How many units of Outback Oatmeal are sold across all locations each week?
SELECT
  f.name AS franchise_name,
  date_trunc('week', datetime) AS week,
  sum(quantity) AS quantity
FROM samples.bakehouse.sales_transactions t
JOIN samples.bakehouse.sales_franchises f
  ON t.franchiseID = f.franchiseID
WHERE product = 'Outback Oatmeal'
GROUP BY 1, 2


Databricks visualization. Run in Databricks to view.

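The query above buckets rows by `date_trunc('week', datetime)`. As a rough illustration of what that bucketing means, the sketch below truncates a Python `datetime` to midnight on the Monday that starts its week, which is how Spark SQL documents the `WEEK` unit. `week_start` is a hypothetical helper for illustration, and the sample timestamp is made up.

```python
from datetime import datetime, timedelta


def week_start(ts: datetime) -> datetime:
    """Return midnight on the Monday of the week containing `ts`."""
    # weekday() is 0 for Monday, so subtracting it lands on Monday.
    monday = ts - timedelta(days=ts.weekday())
    return monday.replace(hour=0, minute=0, second=0, microsecond=0)


# A Wednesday afternoon falls into the week starting Monday 2024-05-13.
print(week_start(datetime(2024, 5, 15, 13, 45)))  # 2024-05-13 00:00:00
```

Every transaction from the same Monday-to-Sunday span maps to the same value, so grouping by it gives one row per franchise per week.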
### Use the Databricks Assistant for code suggestions
Notebooks come equipped with the context-aware Databricks Assistant, which can help generate, explain, and fix code using natural language.
To use the Assistant, create a new cell and press `Cmd + I`, or click the Assistant icon in the top-right corner of the new cell.

Enter a prompt for the Assistant to provide code suggestions. Here is a sample prompt:
- _SQL query to show total units of Outback Oatmeal sold per day_

Press Return or click the submit button to send the prompt, and watch the Assistant suggest code in response. Click “Accept” to keep the suggestion, then run the cell to view the results!


#### Databricks Assistant can also help debug code 
Run the query below and click "Diagnose Error" at the bottom of the cell to get the Assistant's help in fixing the error.

Once the Assistant suggests a fix, click the `<<` icon at the top of the suggestion to replace the cell content, then rerun the cell to view the results!

In [0]:
%sql

SELECT date_trunc('Q', dateTime), sum(quantity)
from samples.bakehouse.sales_transactions
where product = 'Outback Oatmeal'
-- GROUP BY 1

### Share the notebook with colleagues

- Collaboration with teammates in Databricks is easy. Copy and share a link to begin working together!

- To manage access to the notebook, click “Share” at the top of the notebook to open the permissions dialog. To share with any colleague, add the “All Users” group to the notebook’s access list with “Can View” or “Can Run” permission, then send them the notebook’s URL, which you can copy with the “Copy link” button.

- If you are using the SQL Editor, click “Save” to save your work. Click “Share” to manage access for the query using the permissions dialog, using the same process as above.




#### Next steps

- To learn about adding data from CSV files to Unity Catalog and visualizing it, see Get started: Import and visualize CSV data from a notebook.
- To learn how to load data into Databricks using Apache Spark, see Tutorial: Load and transform data using Apache Spark DataFrames.
- To learn more about ingesting data into Databricks, see Ingest data into a Databricks lakehouse.
- To learn more about querying data with Databricks, see Query data.
- To learn more about visualizations, see Visualizations in Databricks notebooks.
