<div style="background-color: #78E8A3; padding: 20px">
<h3>Project Scenario</h3>
<p>You're preparing to trek through a forest soon.</p>
<p>You enjoy mushrooms a lot, and anticipate them appearing during your trek. But you know that some are more edible than others, and some are just downright poisonous.</p> 
<p>Luckily for you, you remember that there's a field guide containing mushroom information, but you would prefer to train a model to tell you whether a mushroom is poisonous.</p>
<p>In this project, you will use the field guide data to do exactly that, and hope that the model is 100% accurate.</p> 
</div>

### Step 1: Import pandas as pd
First up, import pandas as pd to work with the tabular data.

In [21]:
# Step 1: Import pandas
import pandas as pd

### Step 2: Download mushroom data
We'll be getting our data from the <a href='https://archive.ics.uci.edu/ml/datasets/Mushroom'>UCI Machine Learning Repository</a>. 

More specifically, the data comes from the Audubon Society Field Guide, a nature reference. 

Download the data <a href = 'https://archive.ics.uci.edu/ml/machine-learning-databases/mushroom/'>here</a>. You will need only two files:
1. agaricus-lepiota.data
2. agaricus-lepiota.names

If the server is down, click <a href = 'https://uplevelsg.s3-ap-southeast-1.amazonaws.com/ProjectMushroom/MushroomFiles.zip'>here</a> to download the zipped files.
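
If you would rather fetch the two files from Python than through the browser, the sketch below is one way to do it. It assumes both files are served directly under the directory URL linked above; `file_urls` and `download_files` are hypothetical helper names, not part of any library.

```python
from pathlib import Path
from urllib.parse import urljoin
from urllib.request import urlretrieve

# Directory URL taken from the download link above.
BASE_URL = "https://archive.ics.uci.edu/ml/machine-learning-databases/mushroom/"
FILENAMES = ["agaricus-lepiota.data", "agaricus-lepiota.names"]


def file_urls(base_url=BASE_URL, filenames=FILENAMES):
    """Map each file name to its full URL (assumes files sit directly under base_url)."""
    return {name: urljoin(base_url, name) for name in filenames}


def download_files(dest_dir="."):
    """Download both files into dest_dir, skipping any that already exist."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    for name, url in file_urls().items():
        target = dest / name
        if not target.exists():
            urlretrieve(url, str(target))
    return [dest / name for name in FILENAMES]
```

Calling `download_files()` once in the notebook's folder should leave both files next to the notebook. If the request fails, fall back to the zipped files.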

### Step 3: Read agaricus-lepiota.names with a text editor
First, open agaricus-lepiota.names in a text editor. A few programs you can use:
1. Notepad
2. Notepad++
3. Microsoft Word

You'll see something like this:
![ReadingAgaricuslepiotaNames.png](attachment:ReadingAgaricuslepiotaNames.png)

It's important to read this document because it explains what you will see in agaricus-lepiota.data, i.e. the columns and the values in each column.

![AttributeInformation.png](attachment:AttributeInformation.png)

### Step 4: Read agaricus-lepiota.data into a DataFrame
Treat agaricus-lepiota.data like a CSV, and read it into a DataFrame.

The data does not have a header row, so remember to set the <strong>header</strong> parameter to None when you read the CSV. Otherwise, pandas treats the first mushroom as the column names and you end up one row short.

Sanity check:
1. 8,124 rows
2. 23 columns

In [22]:
# Step 4: Read agaricus-lepiota.data 
df = pd.read_csv("agaricus-lepiota.data", header = None)
df.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22
0,p,x,s,n,t,p,f,c,n,k,e,e,s,s,w,w,p,w,o,p,k,s,u
1,e,x,s,y,t,a,f,c,b,k,e,c,s,s,w,w,p,w,o,p,n,n,g
2,e,b,s,w,t,l,f,c,b,n,e,c,s,s,w,w,p,w,o,p,n,n,m
3,p,x,y,w,t,p,f,c,n,n,e,e,s,s,w,w,p,w,o,p,k,s,u
4,e,x,s,g,f,n,f,w,b,k,t,e,s,s,w,w,p,w,o,e,n,a,g
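
To see why the header parameter matters, here is a small sketch that reads the five rows shown above from an in-memory string instead of the real file. The `shape_matches` helper is a hypothetical name used only for illustration.

```python
import io

import pandas as pd

# The first five rows of agaricus-lepiota.data, copied from the output above.
raw = (
    "p,x,s,n,t,p,f,c,n,k,e,e,s,s,w,w,p,w,o,p,k,s,u\n"
    "e,x,s,y,t,a,f,c,b,k,e,c,s,s,w,w,p,w,o,p,n,n,g\n"
    "e,b,s,w,t,l,f,c,b,n,e,c,s,s,w,w,p,w,o,p,n,n,m\n"
    "p,x,y,w,t,p,f,c,n,n,e,e,s,s,w,w,p,w,o,p,k,s,u\n"
    "e,x,s,g,f,n,f,w,b,k,t,e,s,s,w,w,p,w,o,e,n,a,g\n"
)

# header=None keeps every line as data and numbers the columns 0..22.
with_header_none = pd.read_csv(io.StringIO(raw), header=None)

# The default treats the first line as column names, so one row is lost.
default_header = pd.read_csv(io.StringIO(raw))


def shape_matches(frame, expected_rows, expected_cols):
    """Return True when the DataFrame has the expected number of rows and columns."""
    return frame.shape == (expected_rows, expected_cols)


print(with_header_none.shape, default_header.shape)
```

On the full dataset, the same check would be `shape_matches(df, 8124, 23)`.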


### Step 5: Rename your columns
As mentioned, the data does not have any column names, so we will give it some.

![RenameColumns.png](attachment:RenameColumns.png)

With reference to agaricus-lepiota.names, we will rename the columns using the following names, in this order:
1. class
2. cap_shape
3. cap_surface
4. cap_color
5. isBruised
6. odor
7. gill_attachment
8. gill_spacing
9. gill_size
10. gill_color
11. stalk_shape
12. stalk_root
13. stalk_surface_above_ring
14. stalk_surface_below_ring
15. stalk_color_above_ring
16. stalk_color_below_ring
17. veil_type
18. veil_color
19. ring_number
20. ring_type
21. spore_print_color
22. population
23. habitat

Prepare a list containing these strings in order, and set it to your DataFrame's .columns attribute.

In [23]:
# Step 5: Give your DataFrame column names
new_columns = ['class', 'cap_shape', 'cap_surface', 'cap_color', 'isBruised', 'odor', 'gill_attachment', 'gill_spacing', 'gill_size', 'gill_color', 'stalk_shape', 'stalk_root', 'stalk_surface_above_ring', 'stalk_surface_below_ring', 'stalk_color_above_ring', 'stalk_color_below_ring', 'veil_type', 'veil_color', 'ring_number', 'ring_type', 'spore_print_color', 'population', 'habitat']
df.columns = new_columns  # assign the list directly; pandas raises ValueError if the length does not match
df.head()

Unnamed: 0,class,cap_shape,cap_surface,cap_color,isBruised,odor,gill_attachment,gill_spacing,gill_size,gill_color,stalk_shape,stalk_root,stalk_surface_above_ring,stalk_surface_below_ring,stalk_color_above_ring,stalk_color_below_ring,veil_type,veil_color,ring_number,ring_type,spore_print_color,population,habitat
0,p,x,s,n,t,p,f,c,n,k,e,e,s,s,w,w,p,w,o,p,k,s,u
1,e,x,s,y,t,a,f,c,b,k,e,c,s,s,w,w,p,w,o,p,n,n,g
2,e,b,s,w,t,l,f,c,b,n,e,c,s,s,w,w,p,w,o,p,n,n,m
3,p,x,y,w,t,p,f,c,n,n,e,e,s,s,w,w,p,w,o,p,k,s,u
4,e,x,s,g,f,n,f,w,b,k,t,e,s,s,w,w,p,w,o,e,n,a,g
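
As a quick illustration of assigning a list to .columns, the sketch below uses a one-row stand-in DataFrame rather than the real data. It also shows that pandas rejects a list whose length does not match the number of columns, which makes this a cheap guard against a miscounted name list. The `demo` and `mismatch_rejected` names are made up for this example.

```python
import pandas as pd

new_columns = [
    'class', 'cap_shape', 'cap_surface', 'cap_color', 'isBruised', 'odor',
    'gill_attachment', 'gill_spacing', 'gill_size', 'gill_color',
    'stalk_shape', 'stalk_root', 'stalk_surface_above_ring',
    'stalk_surface_below_ring', 'stalk_color_above_ring',
    'stalk_color_below_ring', 'veil_type', 'veil_color', 'ring_number',
    'ring_type', 'spore_print_color', 'population', 'habitat',
]

# Stand-in for the real DataFrame: the first data row, 23 unnamed columns.
row = "p,x,s,n,t,p,f,c,n,k,e,e,s,s,w,w,p,w,o,p,k,s,u".split(",")
demo = pd.DataFrame([row])

# A list that is one name short is rejected with a ValueError.
try:
    demo.columns = new_columns[:-1]
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True

# The full list has one name per column, so the assignment succeeds.
demo.columns = new_columns
print(demo[['class', 'odor', 'habitat']])
```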


### Step 6: Export your DataFrame into a CSV
There we have it - we have given our dataset proper column names. 

Let's export this into a CSV for subsequent analysis in Parts II and III. 

In [24]:
# Step 6: Export your DataFrame into a CSV
df.to_csv("part1_data.csv", index=False)
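
The index=False argument is worth a second look. The sketch below writes a tiny made-up DataFrame to an in-memory buffer, with and without the index, and reads it back. It is an illustration on toy data, not the real export.

```python
import io

import pandas as pd

# Toy data standing in for the mushroom DataFrame.
demo = pd.DataFrame({"class": ["p", "e"], "odor": ["p", "a"]})

# With index=False, only the named columns are written.
buf = io.StringIO()
demo.to_csv(buf, index=False)
buf.seek(0)
clean = pd.read_csv(buf)

# With the default (index=True), the row index is written as an extra,
# unnamed first column, which comes back as "Unnamed: 0" on reading.
buf_idx = io.StringIO()
demo.to_csv(buf_idx)
buf_idx.seek(0)
with_index = pd.read_csv(buf_idx)

print(list(clean.columns))
print(list(with_index.columns))
```

Leaving the index out keeps part1_data.csv free of a stray extra column when it is read back in Parts II and III.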