# Project Mushroom Edibility


## There are 4 parts to this project:
1. Download mushroom field guide data and clean the data up (Part I)
2. Perform exploratory data analysis (Part II)
3. Apply machine learning techniques to train a model to predict mushroom edibility (Part III)
4. Visualize a decision tree (Part IV)

In [1]:
# Import pandas
import pandas as pd

### Download mushroom data
We'll be getting our data from the <a href='https://archive.ics.uci.edu/ml/datasets/Mushroom'>UCI Machine Learning Repository</a>. 

More specifically, the data is drawn from the Audubon Society Field Guide, a nature reference.

Download the data <a href = 'https://archive.ics.uci.edu/ml/machine-learning-databases/mushroom/'>here</a>. You will need only two files:
1. agaricus-lepiota.data
2. agaricus-lepiota.names
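If you would rather fetch the two files from code than through the browser, the step above could be sketched roughly as follows. This is only a sketch under assumptions: `file_urls` and `download_all` are hypothetical helper names, the file names use the `agaricus-lepiota` spelling from the `read_csv` cell, and it assumes the directory linked above still serves the files at those paths.

```python
from pathlib import Path
from urllib.parse import urljoin
from urllib.request import urlretrieve

# Directory URL taken from the download link above.
BASE_URL = 'https://archive.ics.uci.edu/ml/machine-learning-databases/mushroom/'
FILES = ['agaricus-lepiota.data', 'agaricus-lepiota.names']


def file_urls(base_url=BASE_URL, files=FILES):
    """Return the full URL for each file name (hypothetical helper)."""
    return [urljoin(base_url, name) for name in files]


def download_all(target_dir='.'):
    """Fetch each file into target_dir, skipping files that already exist."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    for url in file_urls():
        destination = target / url.rsplit('/', 1)[-1]
        if not destination.exists():
            urlretrieve(url, str(destination))


# download_all()  # uncomment to fetch the files (requires network access)
```

The download call is left commented out so the cell can run without network access. Skipping existing files avoids re-downloading on every run.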

In [3]:
# Read agaricus-lepiota.data 
df = pd.read_csv('agaricus-lepiota.data', header=None)
df

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,13,14,15,16,17,18,19,20,21,22
0,p,x,s,n,t,p,f,c,n,k,...,s,w,w,p,w,o,p,k,s,u
1,e,x,s,y,t,a,f,c,b,k,...,s,w,w,p,w,o,p,n,n,g
2,e,b,s,w,t,l,f,c,b,n,...,s,w,w,p,w,o,p,n,n,m
3,p,x,y,w,t,p,f,c,n,n,...,s,w,w,p,w,o,p,k,s,u
4,e,x,s,g,f,n,f,w,b,k,...,s,w,w,p,w,o,e,n,a,g
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
8119,e,k,s,n,f,n,a,c,b,y,...,s,o,o,p,o,o,p,b,c,l
8120,e,x,s,n,f,n,a,c,b,y,...,s,o,o,p,n,o,p,b,v,l
8121,e,f,s,n,f,n,a,c,b,n,...,s,o,o,p,o,o,p,b,c,l
8122,p,k,y,n,f,y,f,c,n,b,...,k,w,w,p,w,o,e,w,v,l


### Rename your columns
The data file has no header row, so pandas labels the columns 0 through 22. We will give them descriptive names.

With reference to agaricus-lepiota.names, we will rename the columns as follows, in this order:
1. class
2. cap_shape
3. cap_surface
4. cap_color
5. isBruised
6. odor
7. gill_attachment
8. gill_spacing
9. gill_size
10. gill_color
11. stalk_shape
12. stalk_root
13. stalk_surface_above_ring
14. stalk_surface_below_ring
15. stalk_color_above_ring
16. stalk_color_below_ring
17. veil_type
18. veil_color
19. ring_number
20. ring_type
21. spore_print_color
22. population
23. habitat
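As an aside, pandas can also attach names like these while reading the file, through the `names` parameter of `read_csv`. The sketch below illustrates the idea on a tiny in-memory sample, not the real file: it uses only the first three fields of the first three records shown earlier, and `sample`, `names` and `sample_df` are names chosen for this illustration.

```python
import io

import pandas as pd

# First three fields of the first three records shown earlier.
# The real file has 23 fields per row; this is a cut-down illustration.
sample = io.StringIO("p,x,s\ne,x,s\ne,b,s\n")
names = ['class', 'cap_shape', 'cap_surface']

# header=None tells pandas the first line is data, not column labels;
# names= supplies the labels in positional order.
sample_df = pd.read_csv(sample, header=None, names=names)
print(sample_df.columns.tolist())
```

Both approaches depend on the list being in the same order as the columns in the file, since the names are matched by position.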

In [4]:
# Give your DataFrame column names
new_names = ['class', 'cap_shape', 'cap_surface', 'cap_color', 'isBruised', 'odor', 'gill_attachment', 'gill_spacing', 'gill_size', 'gill_color', 'stalk_shape', 'stalk_root', 'stalk_surface_above_ring', 'stalk_surface_below_ring', 'stalk_color_above_ring', 'stalk_color_below_ring', 'veil_type', 'veil_color', 'ring_number', 'ring_type', 'spore_print_color', 'population', 'habitat']
df.columns = new_names
df

Unnamed: 0,class,cap_shape,cap_surface,cap_color,isBruised,odor,gill_attachment,gill_spacing,gill_size,gill_color,...,stalk_surface_below_ring,stalk_color_above_ring,stalk_color_below_ring,veil_type,veil_color,ring_number,ring_type,spore_print_color,population,habitat
0,p,x,s,n,t,p,f,c,n,k,...,s,w,w,p,w,o,p,k,s,u
1,e,x,s,y,t,a,f,c,b,k,...,s,w,w,p,w,o,p,n,n,g
2,e,b,s,w,t,l,f,c,b,n,...,s,w,w,p,w,o,p,n,n,m
3,p,x,y,w,t,p,f,c,n,n,...,s,w,w,p,w,o,p,k,s,u
4,e,x,s,g,f,n,f,w,b,k,...,s,w,w,p,w,o,e,n,a,g
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
8119,e,k,s,n,f,n,a,c,b,y,...,s,o,o,p,o,o,p,b,c,l
8120,e,x,s,n,f,n,a,c,b,y,...,s,o,o,p,n,o,p,b,v,l
8121,e,f,s,n,f,n,a,c,b,n,...,s,o,o,p,o,o,p,b,c,l
8122,p,k,y,n,f,y,f,c,n,b,...,k,w,w,p,w,o,e,w,v,l


In [5]:
# Export your DataFrame into a CSV
df.to_csv('mushroom_data_clean.csv', index=False)
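A quick way to confirm that an export like this keeps the column names and leaves out the row index is to read the file back in. The following is a minimal sketch on a two-row stand-in frame written to a temporary directory; `small`, `reloaded` and `roundtrip.csv` are hypothetical names, not part of the project files.

```python
import os
import tempfile

import pandas as pd

# Two-row stand-in for the cleaned DataFrame (hypothetical sample values).
small = pd.DataFrame({'class': ['p', 'e'], 'cap_shape': ['x', 'b']})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'roundtrip.csv')
    # Write without the row index, then read the file straight back.
    small.to_csv(path, index=False)
    reloaded = pd.read_csv(path)

# The first line of the CSV becomes the header, so the names survive the trip.
print(reloaded.columns.tolist())
```

If the index had been written as well, the reloaded frame would gain an extra leading column, which is why the export omits it.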