
<div style="text-align: center; line-height: 0; padding-top: 9px;">
  <img src="https://databricks.com/wp-content/uploads/2018/03/db-academy-rgb-1200px.png" alt="Databricks Learning" style="width: 600px">
</div>



# Data Visualization Lab

Let's import **`pandas`** and **`seaborn`**, then load the avocado dataset so we can visualize and analyze it.

In [0]:
%run "../Includes/Classroom-Setup"

In [0]:
import pandas as pd
import seaborn as sns

# Set a larger default figure size (width, height in inches) so plots are easier to read
sns.set(rc={"figure.figsize": (15, 8)})

In [0]:
file_path = f"{DA.paths.datasets}/avocado/avocado.csv".replace("dbfs:", "/dbfs")
# Drop the unnamed index column ("Unnamed: 0") that was saved into the CSV
df = pd.read_csv(file_path).drop("Unnamed: 0", axis=1)
df
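One common way an `"Unnamed: 0"` column appears is when a DataFrame is written with `to_csv()` while keeping its index, then read back with `read_csv()`. That this is how the avocado file was produced is an assumption. The sketch below reproduces the effect on a small hypothetical frame and shows two ways to handle it: dropping the column, as the cell above does, or passing `index_col=0` to `read_csv`.

```python
import io

import pandas as pd

# Hypothetical frame; to_csv() writes the index as an unnamed first column by default
original = pd.DataFrame({"AveragePrice": [1.33, 1.35], "year": [2015, 2015]})
csv_text = original.to_csv()  # header row starts with an empty field for the index

# Reading it back turns that unnamed header field into an "Unnamed: 0" column
round_trip = pd.read_csv(io.StringIO(csv_text))
print(round_trip.columns.tolist())  # ['Unnamed: 0', 'AveragePrice', 'year']

# Option 1: drop the extra column after reading (what the notebook cell does)
dropped = round_trip.drop("Unnamed: 0", axis=1)

# Option 2: tell read_csv to use the first column as the index instead
indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)
```

Dropping the column discards the stored index values, while `index_col=0` keeps them as the row labels; either way the data columns are the same.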



## Problem 1: Databricks Plotting

Use the built-in plotting feature in Databricks to plot the average **`Total Volume`** per **`year`**.

Remember to use **`display(df)`** to access built-in plotting.

In [0]:
# ANSWER
display(df)



<button onclick="myFunction2()" >Click for Hint</button>

<div id="myDIV2" style="display: none;">
  Select the bar chart, open Plot Options, set the aggregation function to average, and use year as the key and Total Volume as the value.
</div>
<script>
function myFunction2() {
  var x = document.getElementById("myDIV2");
  if (x.style.display === "none") {
    x.style.display = "block";
  } else {
    x.style.display = "none";
  }
}
</script>
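The built-in chart groups rows by the key and averages the value. As a sketch for cross-checking the bar heights, the same aggregation can be written in pandas; the small `sample` DataFrame below is hypothetical and only reuses the avocado column names.

```python
import pandas as pd

# Hypothetical rows using the avocado column names; the values are made up
sample = pd.DataFrame({
    "year": [2015, 2015, 2016, 2016],
    "Total Volume": [100.0, 300.0, 200.0, 400.0],
})

# Group on the key (year) and average the value (Total Volume)
avg_volume_by_year = sample.groupby("year")["Total Volume"].mean()
print(avg_volume_by_year)
```

On the notebook's own data, something like `display(df.groupby("year")["Total Volume"].mean().reset_index())` should give a table whose values match the bars, assuming the chart is set to average as described in the hint.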



## Problem 2: `pandas` plotting

Create a histogram of the **`AveragePrice`** of avocados using the pandas **`.hist()`** method.

In [0]:
# ANSWER
df["AveragePrice"].hist()



<button onclick="myFunction2()" >Click for Hint</button>

<div id="myDIV2" style="display: none;">
  Select the column to get a Series, then call .hist() on that Series.
</div>
<script>
function myFunction2() {
  var x = document.getElementById("myDIV2");
  if (x.style.display === "none") {
    x.style.display = "block";
  } else {
    x.style.display = "none";
  }
}
</script>
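`Series.hist()` draws with matplotlib and returns the Axes it drew on, so the plot can be adjusted afterward. The `bins` parameter sets how many equal-width intervals the value range is split into. The sketch below uses hypothetical normally distributed prices in place of `df["AveragePrice"]`; the `matplotlib.use("Agg")` line is only there so it runs outside a notebook and can be omitted in Databricks.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import numpy as np
import pandas as pd

# Hypothetical prices standing in for df["AveragePrice"]
rng = np.random.default_rng(0)
prices = pd.Series(rng.normal(loc=1.4, scale=0.4, size=500), name="AveragePrice")

# Split the price range into 30 bins; hist() returns the matplotlib Axes
ax = prices.hist(bins=30)
ax.set_xlabel("AveragePrice")
ax.set_ylabel("Count")
```

Each bar is one bin, and the bar heights add up to the number of values plotted.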



## Datetime

Unfortunately, our **`Date`** column is stored with the `object` dtype (as strings), but we want it to be a **`datetime`** type so we can perform time-based operations (e.g., plotting in chronological rather than lexicographical order).

Luckily, `pandas` provides a function called [to_datetime()](https://pandas.pydata.org/docs/reference/api/pandas.to_datetime.html?highlight=to_datetime#pandas.to_datetime) that takes a Series (among other inputs) and converts its values to the **`datetime`** type.

In [0]:
# Notice the dtype of Date
df.dtypes

In [0]:
# Convert the Date strings to datetime values, then confirm the new dtype
df["Date"] = pd.to_datetime(df["Date"])
df.dtypes
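To see why the conversion matters, the sketch below sorts a few hypothetical month/day/year strings before and after calling `pd.to_datetime`. The `format="%m/%d/%Y"` argument matches these made-up strings and is not necessarily the layout of the avocado file's `Date` column.

```python
import pandas as pd

# Hypothetical month/day/year strings (not taken from the avocado data)
dates = pd.Series(["10/1/2015", "9/1/2015", "11/1/2015"])

# As strings (object dtype), sorting compares characters left to right
lexicographic = dates.sort_values().tolist()
print(lexicographic)  # ['10/1/2015', '11/1/2015', '9/1/2015']

# After conversion, sorting follows the calendar
parsed = pd.to_datetime(dates, format="%m/%d/%Y")
chronological = parsed.sort_values().dt.strftime("%Y-%m-%d").tolist()
print(chronological)  # ['2015-09-01', '2015-10-01', '2015-11-01']
```

Because `"1"` sorts before `"9"` character by character, September lands last as a string but first as a date.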



## Problem 3: `seaborn` plotting

Using **`seaborn`**, which is imported as **`sns`** above, create a scatter plot of the Total Volume of organic avocado sales over time for the whole US (i.e., filter **`region`** to the **`TotalUS`** region and **`type`** to just **`organic`**). Use **`Date`** as the x-axis.

In [0]:
# ANSWER
# Keep only organic sales for the TotalUS region
plot_df = df[(df["region"] == "TotalUS") & (df["type"] == "organic")]
sns.scatterplot(data=plot_df, x="Date", y="Total Volume")



<button onclick="myFunction2()" >Click for Hint</button>

<div id="myDIV2" style="display: none;">
  Recall that the seaborn call looks like sns.scatterplot(data=..., x=..., y=...). Create a DataFrame filtered to the region and type we want, and pass it as data.
</div>
<script>
function myFunction2() {
  var x = document.getElementById("myDIV2");
  if (x.style.display === "none") {
    x.style.display = "block";
  } else {
    x.style.display = "none";
  }
}
</script>
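The filter in Problem 3 builds one boolean Series per condition and combines them element-wise with `&`. The sketch below shows this on a small hypothetical DataFrame that mirrors the avocado column names; the numbers are made up.

```python
import pandas as pd

# Hypothetical rows mirroring the avocado columns used in Problem 3
sample = pd.DataFrame({
    "region": ["TotalUS", "TotalUS", "Albany", "TotalUS"],
    "type": ["organic", "conventional", "organic", "organic"],
    "Total Volume": [10.0, 500.0, 2.0, 12.0],
})

# Each comparison yields a boolean Series aligned with the rows.
# The parentheses are required because & binds more tightly than ==.
is_total_us = sample["region"] == "TotalUS"
is_organic = sample["type"] == "organic"
mask = is_total_us & is_organic

# Indexing with the mask keeps only rows where both conditions are True
organic_us = sample[mask]
print(organic_us)
```

Python's `and` keyword does not work here: it tries to reduce each Series to a single `True`/`False` and raises a `ValueError`, which is why the element-wise `&` operator is used instead.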



## Problem 4: What about conventional avocados?

Create the same scatter plot, but for conventional avocados instead of organic ones. What differences do you notice between the two? Pay attention to the axis scales.

In [0]:
# ANSWER
# Keep only conventional sales for the TotalUS region
conventional_df = df[(df["region"] == "TotalUS") & (df["type"] == "conventional")]
sns.scatterplot(data=conventional_df, x="Date", y="Total Volume")



<button onclick="myFunction2()" >Click for Hint</button>

<div id="myDIV2" style="display: none;">
  Use the same code as before, but filter with df["type"] == "conventional" this time.
</div>
<script>
function myFunction2() {
  var x = document.getElementById("myDIV2");
  if (x.style.display === "none") {
    x.style.display = "block";
  } else {
    x.style.display = "none";
  }
}
</script>
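Because each scatter plot scales its own y-axis, two plots can look similar even when their magnitudes differ a lot. One way to compare them directly is to summarize `Total Volume` per `type`. The sketch below does this on hypothetical TotalUS rows whose numbers are made up and say nothing about the real dataset.

```python
import pandas as pd

# Hypothetical TotalUS rows for both avocado types; the numbers are made up
sample = pd.DataFrame({
    "region": ["TotalUS"] * 4,
    "type": ["organic", "organic", "conventional", "conventional"],
    "Total Volume": [1.0e6, 1.2e6, 3.0e7, 3.4e7],
})

totalus = sample[sample["region"] == "TotalUS"]

# Summarize each type's Total Volume so the magnitudes sit side by side
volume_by_type = totalus.groupby("type")["Total Volume"].agg(["min", "mean", "max"])
print(volume_by_type)

# How many times larger the conventional mean is than the organic mean
ratio = volume_by_type.loc["conventional", "mean"] / volume_by_type.loc["organic", "mean"]
```

On the notebook's data, the same pattern (`df[df["region"] == "TotalUS"].groupby("type")["Total Volume"].agg(["min", "mean", "max"])`) should show whether the two plots' axis ranges reflect a real difference in volume.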

&copy; 2023 Databricks, Inc. All rights reserved.<br/>
Apache, Apache Spark, Spark and the Spark logo are trademarks of the <a href="https://www.apache.org/">Apache Software Foundation</a>.<br/>
<br/>
<a href="https://databricks.com/privacy-policy">Privacy Policy</a> | <a href="https://databricks.com/terms-of-use">Terms of Use</a> | <a href="https://help.databricks.com/">Support</a>