# Visualizing Bee Health  🍯 🐝

Quick recap: this dataset of about 600 rows records honey production in many U.S. states. For each state you can look at the number of colonies, yield per colony, total production, stocks, price per pound, and value of the honey produced, with one row per state per year from 1998 to 2012.
This span covers the devastating Colony Collapse Disorder (CCD), first seen in 2006 (http://npic.orst.edu/envir/ccd.html), so it is a poignant dataset to study. 🐝

![Schema.png](attachment:f353d722-e825-458b-81f5-635ce5f7e9d8.png)

1. Honey-producing colonies are the maximum number of colonies from which honey was harvested during the year. It is possible to harvest honey from colonies which did not survive the entire year.
2. Stocks held by producers.
3. Average price per pound based on expanded sales.
4. Value of production is equal to production multiplied by average price per pound.
5. Includes data for States not published in this table.
6. Due to rounding, total colonies multiplied by total yield may not exactly equal production.
7. United States value of production will not equal the summation of States.
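
Notes 4 and 6 describe two arithmetic relationships that can be checked row by row. The sketch below is one way to do that, not part of the original notebook. It assumes the CSV has columns named `priceperlb` and `prodvalue` (these names do not appear in the notebook code and are an assumption), and that `numcol`, `yieldpercol` and `totalprod` are stored in compatible units. The helper name `check_production_identities` and the tolerance are hypothetical, and the demo rows are made up for illustration.

```python
import pandas as pd

def check_production_identities(df, rel_tol=0.01):
    """Flag rows where the identities in notes 4 and 6 are off by more than rel_tol."""
    out = df.copy()
    # Note 6: colonies x yield per colony should be close to total production
    out['prod_from_yield'] = out['numcol'] * out['yieldpercol']
    out['prod_mismatch'] = (
        (out['prod_from_yield'] - out['totalprod']).abs() > rel_tol * out['totalprod']
    )
    # Note 4: production x average price per pound should be close to the value of production
    # ('priceperlb' and 'prodvalue' are assumed column names)
    out['value_from_price'] = out['totalprod'] * out['priceperlb']
    out['value_mismatch'] = (
        (out['value_from_price'] - out['prodvalue']).abs() > rel_tol * out['prodvalue']
    )
    return out

# Made-up rows for illustration only, not taken from honey.csv
toy = pd.DataFrame({
    'state': ['AA', 'BB'],
    'numcol': [1000.0, 2000.0],
    'yieldpercol': [50, 60],
    'totalprod': [50000.0, 120000.0],
    'priceperlb': [1.0, 2.0],
    'prodvalue': [50000.0, 300000.0],  # second row is deliberately inconsistent
})
checked = check_production_identities(toy)
print(checked[['state', 'prod_mismatch', 'value_mismatch']])
```

If the real columns use different names or a different scale (for example thousands of pounds), the column names or a scale factor would need adjusting before running this on `honey`.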

> Source @United States Department of Agriculture (USDA)

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

honey = pd.read_csv('/kaggle/input/honey-production-usa-1998-2012/honey.csv')
honey.head()

In [None]:
# Number of colonies per state
sns.relplot(data=honey, x='numcol', y='state', height=8, aspect=.7)

The number of colonies varies widely across states, with the highest counts in four states: FL, CA, SD and ND.
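
A scatter plot makes the ranking visible, but it can also be computed directly. The following is a minimal sketch, assuming the mean of `numcol` over all years is a reasonable way to rank states. The helper name `top_states_by_colonies` is hypothetical and the demo rows are made up.

```python
import pandas as pd

def top_states_by_colonies(df, n=4):
    """Return the n states with the highest mean number of colonies."""
    return (
        df.groupby('state')['numcol']
        .mean()
        .sort_values(ascending=False)
        .head(n)
    )

# Made-up rows for illustration only, not taken from honey.csv
toy = pd.DataFrame({
    'state': ['AA', 'AA', 'BB', 'BB', 'CC', 'CC'],
    'numcol': [10, 20, 100, 120, 50, 50],
})
top2 = top_states_by_colonies(toy, n=2)
print(top2)
```

With the notebook's data loaded, calling `top_states_by_colonies(honey)` should list the four states that stand out in the plot.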

1. Below, we'll look at how the number of colonies progresses year by year in these states.

In [None]:
sns.relplot(data=honey, x='numcol', y='state', hue='year', height=8, aspect=.7, palette='crest')

The number of colonies has declined over the years, except in states such as MT, WA, OR and the **highest performer, ND**.
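
Reading a trend from a colour gradient can be tricky, so a per-state comparison of the first and last recorded year may help. This is a rough sketch under the assumption that comparing only the endpoints is informative enough; it ignores everything in between. The helper name `colony_change` is hypothetical and the demo rows are made up.

```python
import pandas as pd

def colony_change(df):
    """Percentage change in colonies between each state's first and last year."""
    ordered = df.sort_values('year')
    grouped = ordered.groupby('state')['numcol']
    change = pd.DataFrame({'first': grouped.first(), 'last': grouped.last()})
    change['pct_change'] = (change['last'] - change['first']) / change['first'] * 100
    # Most negative change first, so declining states appear at the top
    return change.sort_values('pct_change')

# Made-up rows for illustration only, not taken from honey.csv
toy = pd.DataFrame({
    'state': ['AA', 'BB', 'AA', 'BB'],
    'year': [2012, 1998, 1998, 2012],
    'numcol': [50, 200, 100, 300],
})
change = colony_change(toy)
print(change)
```

States with a positive `pct_change` are the ones where the colony count grew between the two endpoints.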

2. Does the same hold for the yield per colony?

In [None]:
sns.relplot(data=honey, x='yieldpercol', y='state', hue='year', height=10, aspect=.5, palette='crest')

It fluctuates for the states with a progressively increasing number of colonies over the years; however, the states of **ME, MS and TN**

## Understanding bee health from a production view

---

To get a better picture of the trend in bee health, we will revisit the earlier price examples from the [README file](https://github.com/JimmyKurui/Data-Science-For-Beginners/blob/main/3-Data-Visualization/12-visualization-relationships/README.md). Price has progressively increased over the period:

![line1.png](attachment:13dc4a6a-7452-4e13-9ecb-b9d3319472e7.png)

We can then check the relationship between yield and colony population:
![dual-line.png](attachment:ffa04566-48dd-4da8-bf5b-4ac2de66f390.png)

Despite a relatively stable number of bee colonies, the yield has continued to decrease over the years. This could be due to several factors, such as sickness, nutrient quality, invasions and others. We will investigate possible production factors that might explain it.
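
The comparison between colony numbers and yield can also be put into a small table. The sketch below averages both columns per year, which mirrors what a seaborn line plot draws by default (the mean across states for each year). The helper name `yearly_summary` is hypothetical and the demo rows are made up.

```python
import pandas as pd

def yearly_summary(df):
    """Mean yield per colony and mean number of colonies for each year."""
    return df.groupby('year').agg(
        mean_yield=('yieldpercol', 'mean'),
        mean_colonies=('numcol', 'mean'),
    )

# Made-up rows for illustration only, not taken from honey.csv
toy = pd.DataFrame({
    'year': [1998, 1998, 2012, 2012],
    'yieldpercol': [80, 60, 50, 40],
    'numcol': [100, 200, 110, 190],
})
summary = yearly_summary(toy)
print(summary)
```

In the toy rows the mean colony count stays flat while the mean yield drops, which is the pattern described for the real data.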

In [None]:
sns.set_style('dark')

# Left axis: total production per year (lineplot draws the yearly mean across states)
fig, ax = plt.subplots()
sns.lineplot(data=honey, x='year', y='totalprod', ax=ax, color='orange', label='Total Production', legend=False)
ax.set_ylabel('Total Production (1000 pounds)')
ax.set_xlabel('Year')

# Right axis: stocks held by producers, sharing the same x-axis
ax2 = ax.twinx()
sns.lineplot(data=honey, x='year', y='stocks', ax=ax2, label='Stocks', legend=False)
ax2.set_ylabel('Stocks (1000 pounds)')

# Remove the top and right spines on both axes
sns.despine(fig=fig)

# One combined legend for both lines
fig.legend()
plt.show()

Stocks are most likely not the cause of declining bee health; however, their management explains the low price in 2003, a result of producers selling off stock ahead of a foreseen decline in honey yield per colony.
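
One way to look at stock management more closely is the share of each year's production that is still held as stocks. This is a minimal sketch, assuming `stocks` and `totalprod` are in the same units; a falling share from one year to the next would be consistent with producers selling off stock. The helper name `stock_share_by_year` is hypothetical and the demo rows, including the year labels, are made up.

```python
import pandas as pd

def stock_share_by_year(df):
    """Stocks held by producers as a fraction of total production, per year."""
    yearly = df.groupby('year')[['stocks', 'totalprod']].sum()
    yearly['stock_share'] = yearly['stocks'] / yearly['totalprod']
    return yearly

# Made-up rows for illustration only, not taken from honey.csv
toy = pd.DataFrame({
    'year': [1, 1, 2, 2],
    'stocks': [20.0, 30.0, 10.0, 10.0],
    'totalprod': [100.0, 100.0, 100.0, 100.0],
})
shares = stock_share_by_year(toy)
print(shares)
```

Running `stock_share_by_year(honey)` on the notebook's data would show whether the share actually dips around 2003.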