# Exploring Mushrooms

When asked to think of a mushroom, you might imagine the following:

<img src="https://static-assets.codecademy.com/Paths/data-analyst-career-path/mushroom-analysis-project/generic_mushroom.jpg" alt="this shows a photo of some generic button mushrooms in a forest." style="background-color:white;" width=500>

It is a beige, convex mushroom top with a uniform, thick stem. Maybe you thought of it cut up on a slice of delicious pizza or braised with sauce over rice. Regardless, you most likely did not consider this:

<img src="https://static-assets.codecademy.com/Paths/data-analyst-career-path/mushroom-analysis-project/pretty_mushroom.jpg" alt="A photo of some beautiful mushrooms in a forest. They do not have the signature 'mushroom top' and have visually appealing grooves that make them look almost floral." style="background-color:white;" width=500>

Mushrooms exist in a variety of different colors, shapes, sizes, textures, etc. In this project, you will analyze an extensive mushroom dataset from <a href="https://archive.ics.uci.edu/ml/datasets/Mushroom">UCI</a> using bar charts and acquaint yourself with the diverse array of mushrooms that exist worldwide.

    import matplotlib.pyplot as plt
    import pandas as pd
    import seaborn as sns

    df = pd.read_csv("mushroom_data.csv")
    columns = df.columns.tolist()

    df.head()

| Unnamed: 0 | Class | Cap Shape | Cap Surface | Cap Color | Bruises | Odor | Gill Attachment | Gill Spacing | Gill Size | Gill Color | ... | Stalk Surface Below Ring | Stalk Color Above Ring | Stalk Color Below Ring | Veil Type | Veil Color | Ring Number | Ring Type | Spore Print Color | Population | Habitat |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | Poisonous | Convex | Smooth | Brown | True | Pungent | Free | Close | Narrow | Black | ... | Smooth | White | White | Partial | White | One | Pendant | Black | Scattered | Urban |
| 1 | Edible | Convex | Smooth | Yellow | True | Almond | Free | Close | Broad | Black | ... | Smooth | White | White | Partial | White | One | Pendant | Brown | Numerous | Grasses |
| 2 | Edible | Bell | Smooth | White | True | Anise | Free | Close | Broad | Brown | ... | Smooth | White | White | Partial | White | One | Pendant | Brown | Numerous | Meadows |
| 3 | Poisonous | Convex | Scaly | White | True | Pungent | Free | Close | Narrow | Brown | ... | Smooth | White | White | Partial | White | One | Pendant | Black | Scattered | Urban |
| 4 | Edible | Convex | Smooth | Gray | False |  | Free | Crowded | Broad | Black | ... | Smooth | White | White | Partial | White | One | Evanescent | Brown | Abundant | Grasses |


1. Take a look at the cell above where we have loaded `mushroom_data.csv`. It contains 23 columns of data describing thousands of mushrooms. The first five rows are shown.

    Read through this table to get a sense of the type(s) of variables in the data and the structure of the table. It may also be helpful to read through the information on <a href="https://www.kaggle.com/uciml/mushroom-classification">Kaggle</a>.

    Before you move on to plotting any of this data, answer the following questions:

    * What type(s) of variables does `mushroom_data.csv` contain?
    * How many of the variables can we visualize effectively with a bar graph?

*If you read through the header of `mushroom_data.csv`, you will see that every single variable is a categorical variable. Therefore, you can create bar charts for the counts of every variable in the csv file.*
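
One way to check this yourself is to look at each column's data type and its number of distinct values. The sketch below is only an illustration: it uses a small stand-in DataFrame built from a few columns of the five rows shown above, not the full `mushroom_data.csv`.

```python
import pandas as pd

# Stand-in for mushroom_data.csv: a few columns from the five rows shown above.
df = pd.DataFrame({
    "Class": ["Poisonous", "Edible", "Edible", "Poisonous", "Edible"],
    "Cap Shape": ["Convex", "Convex", "Bell", "Convex", "Convex"],
    "Bruises": [True, True, True, True, False],
    "Habitat": ["Urban", "Grasses", "Meadows", "Urban", "Grasses"],
})

# Text columns typically load as an object or string dtype, True/False columns as bool.
# Neither is numeric, which is a hint that the variables are categorical.
print(df.dtypes)

# Distinct values per column, i.e. how many bars that column's chart would have.
print(df.nunique())
```

Running the same two calls on the full DataFrame from the first cell should give you a quick overview of all 23 variables.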

## Plotting Bar Graphs

2. There are 23 variables in this dataset (one for each column). Graphing each one individually would be tedious; luckily, a loop can do the repetitive work for you.

    If you look at the first cell, you will see a variable called `columns`. This list holds the name of each variable in `mushroom_data.csv`.

    * Create a loop that traverses each `column` in the `columns` list.
    * Print each `column` in `columns` while iterating through the loop. This is to check that your `for` loop is working correctly.

3. In the terminal, you should see 23 column names pop up, starting with `Class` and ending with `Habitat`.

    Great! Your `for` loop is working, so feel free to comment out your print statement.

    You can now plot your data using the `.countplot()` method from the seaborn library. Follow these steps:

    * Call `.countplot()` in the `for` loop.
    * Use `column` and the `df` pandas DataFrame to graph the value counts of each variable in `mushroom_data.csv`.

    Please wait until the next task to use `plt.show()`. Since you are creating 23 plots in the browser, you will need an additional line of code.

4. At the end of your `for` loop, add the following lines of code to show your plots:

        plt.show()
        plt.clf()

    The `.show()` Matplotlib method should look familiar from previous lessons, but <a href="https://matplotlib.org/3.1.1/api/_as_gen/matplotlib.pyplot.clf.html">`.clf()`</a> might be unfamiliar. This method is also from the Matplotlib library. It clears the current figure, including everything drawn by the previous plot. This keeps your graphs from being drawn on top of each other. Instead, your plots will be neatly stacked one after another, with spacing ideal for viewing.
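
Tasks 2 through 4 can be put together roughly as follows. This is a minimal sketch of one possible solution, not the only one; the small DataFrame is a stand-in built from a few columns of the rows shown earlier so that the example runs without `mushroom_data.csv`.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Stand-in for mushroom_data.csv: a few columns from the five rows shown earlier.
df = pd.DataFrame({
    "Class": ["Poisonous", "Edible", "Edible", "Poisonous", "Edible"],
    "Cap Shape": ["Convex", "Convex", "Bell", "Convex", "Convex"],
    "Habitat": ["Urban", "Grasses", "Meadows", "Urban", "Grasses"],
})
columns = df.columns.tolist()

for column in columns:
    # print(column)  # the check from task 2, commented out once the loop works
    sns.countplot(x=column, data=df)  # one bar per category in this column
    plt.show()  # display the current plot
    plt.clf()   # clear the current figure so the next plot starts empty
```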

5. After using `plt.show()` and `plt.clf()`, 23 plots should appear.

    To take a closer look, click the fullscreen icon that sits in the top right of the learning environment.

    Scroll through each of the graphs, and see what sort of trends you immediately see.

    * What variables have an obvious mode?
    * Do any of them have a notably diverse array of values?
    * What habitat are you most likely to find mushrooms in?

    What questions did you have before seeing the graphs? What questions are popping up now that you see them?

    When you want to go back to your original view of the code editor and tasks, click the icon again.

    Feel free to do this throughout the project if you want to take a closer look at your graphs and want to continue your analysis! In the next few steps, you will clean up the graphs and make them more readable and useful for finding patterns.

## Cleaning the Bar Graphs

6. As you scroll through the graphs, you may notice some imperfections. For example, some of the x-axis labels overlap each other. The font size for the labels along the x-axis is also pretty small, making them tough to read.

    Let us fix these up with two lines of code.

    Following your `.countplot()` method, add the following two lines of code in your `for` loop:

        # rotates the value labels slightly so they don’t overlap, also slightly increases font size
        plt.xticks(rotation=30, fontsize=10)
        # increases the variable label font size slightly to increase readability
        plt.xlabel(column, fontsize=12)

7. One more thing you can do to increase readability is to add an informative title. Using `.title()` from the Matplotlib library, give your graph the following title:

        {Variable Name} Value Counts

    Use `column` to capture each variable name. Be sure to call this method after `.countplot()` inside of your `for` loop.
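
With the label and title formatting from tasks 6 and 7 added, the loop might look like the sketch below. The f-string is just one way to build the title, and the small DataFrame is again a stand-in for `mushroom_data.csv` so that the example runs on its own.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Stand-in for mushroom_data.csv: a few columns from the five rows shown earlier.
df = pd.DataFrame({
    "Class": ["Poisonous", "Edible", "Edible", "Poisonous", "Edible"],
    "Cap Shape": ["Convex", "Convex", "Bell", "Convex", "Convex"],
    "Habitat": ["Urban", "Grasses", "Meadows", "Urban", "Grasses"],
})
columns = df.columns.tolist()

for column in columns:
    sns.countplot(x=column, data=df)
    # Rotate the value labels so they do not overlap, and enlarge them slightly.
    plt.xticks(rotation=30, fontsize=10)
    # Enlarge the variable label along the x-axis.
    plt.xlabel(column, fontsize=12)
    # Build the title from the column name, e.g. "Habitat Value Counts".
    title = f"{column} Value Counts"
    plt.title(title)
    plt.show()
    plt.clf()
```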

## Ordering the Bars for Analysis

8. The graphs are readable, but you can take them a step further.

    * Add the `order` parameter to your `.countplot()` method.
    * Set this parameter so that the value counts in each column are in descending order.

    You will need to use the `.value_counts()` pandas method and the `.index` attribute of its result.
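
To see why these two pieces give a descending order, it can help to try them on a single small column first. The values below are made up for illustration and are not taken from the dataset.

```python
import pandas as pd

# Made-up habitat values, chosen so that no two categories tie.
habitat = pd.Series(
    ["Urban", "Grasses", "Meadows", "Urban", "Grasses", "Grasses"],
    name="Habitat",
)

# value_counts() sorts from the most to the least frequent value by default.
counts = habitat.value_counts()

# The index of the result holds the category names in that same order.
order = counts.index.tolist()
print(order)  # ['Grasses', 'Urban', 'Meadows']
```

Inside your loop, the same idea applies to each column: `df[column].value_counts().index` can be passed to the `order` parameter of `.countplot()`.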

9. Great job! In relatively few lines of code, you have created 23 informative plots. Now that they are titled, labeled clearly, and ordered, you can really dive into your analysis.

    Think about how someone could best use these visualizations. It is easy to tell which features of mushrooms are common and rare, and we get insights into the variety of mushrooms in the fungi kingdom.

    Spend some time looking over the graphs. Write down exciting insights you find. Here are some examples to get you started:

    * The dataset is split roughly evenly between edible and poisonous mushrooms.
    * The majority of mushrooms in this dataset have a scaly surface.
    * A non-negligible number of mushrooms give off an almond scent.
    * Most top surfaces of mushrooms in this dataset are scaly rather than smooth.

    Some of your analysis may also require research into mushroom features for any of the x-labels. We hope you enjoy continuing to explore the world of these fun guys!

## Extensions

10. Feel free to play around with the graphs and customize them any way you want to help in your analysis! Here are some ideas to get yourself started:

    * Turn any bar graph with fewer than six bars into a pie chart (hint: use a conditional statement!).
    * Create your bar charts using a list comprehension instead of a `for` loop.
    * Change the color theme of your graphs using the seaborn <a href="https://seaborn.pydata.org/generated/seaborn.countplot.html">color</a> or <a href="https://seaborn.pydata.org/tutorial/color_palettes.html">palette</a> parameters.
    * Remove any graphs you find uninformative.
    * Change around the current title or label formatting.
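
As a starting point for the first idea, here is a rough sketch of a conditional that draws a pie chart for columns with fewer than six categories and a bar chart otherwise. The data is made up for illustration, and the `chart_kinds` dictionary is a hypothetical helper that only records which chart type each column received.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up stand-in data: "Class" has two categories, "Cap Color" has six.
df = pd.DataFrame({
    "Class": ["Poisonous", "Edible", "Edible", "Poisonous", "Edible", "Edible"],
    "Cap Color": ["Brown", "Yellow", "White", "Gray", "Red", "Pink"],
})

chart_kinds = {}  # hypothetical helper: column name -> "pie" or "bar"

for column in df.columns:
    counts = df[column].value_counts()
    if len(counts) < 6:
        # Few categories: one pie slice per category.
        plt.pie(counts, labels=counts.index)
        chart_kinds[column] = "pie"
    else:
        # Six or more categories: keep the ordered bar chart.
        sns.countplot(x=column, data=df, order=counts.index)
        plt.xticks(rotation=30, fontsize=10)
        chart_kinds[column] = "bar"
    plt.title(f"{column} Value Counts")
    plt.show()
    plt.clf()
```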