# Data understanding

#### Relevant Information:
This data set includes descriptions of hypothetical samples
corresponding to 23 species of gilled mushrooms in the Agaricus and
Lepiota family, drawn from The Audubon Society Field Guide to North
American Mushrooms (1981, pp. 500-525). Each species is identified as
definitely edible, definitely poisonous, or of unknown edibility and
not recommended. This last class was merged into the poisonous one,
so the data set has two classes. The Guide clearly states that there
is no simple rule for determining the edibility of a mushroom, nothing
like "leaflets three, let it be" for poison oak and poison ivy.

#### Number of Instances:
- 8124

#### Number of Attributes:
- 22 (all nominal), plus the class label

#### Has Missing Values?
- Yes: 2,480 values, all in attribute 11 (`stalk-root`), where they are encoded as `?`
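
Because the placeholder is stored as the literal string `?`, pandas will not treat it as missing unless told to. A minimal sketch, assuming two toy rows in the file's comma-separated layout (the `?` in the second row is made up for illustration, not taken from the data): passing `na_values="?"` to `pd.read_csv` turns the placeholder into `NaN`, so `isna()` can count it.

```python
import io

import pandas as pd

# Two toy rows in the file's layout: class label, then the 22 attributes.
# Illustrative only: the "?" in the second row (column 11, stalk-root)
# is made up to show how the placeholder is handled.
TOY = (
    "e,x,s,y,t,a,f,c,b,k,e,c,s,s,w,w,p,w,o,p,n,n,g\n"
    "p,x,s,y,t,a,f,c,b,k,e,?,s,s,w,w,p,w,o,p,n,n,g\n"
)

# Default parsing: "?" is just another category value, not a missing value.
as_text = pd.read_csv(io.StringIO(TOY), header=None)
print(as_text.isna().sum().sum())    # no NaN at all
print((as_text == "?").sum().sum())  # one "?" placeholder

# With na_values="?", the placeholder becomes NaN and isna() finds it.
with_nan = pd.read_csv(io.StringIO(TOY), header=None, na_values="?")
missing = with_nan.isna().sum()
print(missing[missing > 0])  # only column 11 (stalk-root) is listed
```

On the real file the same argument would combine with the column names defined further down, e.g. `pd.read_csv('data/agaricus-lepiota.data', names=labels, na_values='?')`.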

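#### Attribute information
The first column of the file is the class label (edible=e, poisonous=p); the 22 attributes that follow use these one-letter codes: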

     1. cap-shape:                bell=b,conical=c,convex=x,flat=f,knobbed=k,sunken=s
     2. cap-surface:              fibrous=f,grooves=g,scaly=y,smooth=s
     3. cap-color:                brown=n,buff=b,cinnamon=c,gray=g,green=r,pink=p,purple=u,red=e,white=w,yellow=y
     4. bruises?:                 bruises=t,no=f
     5. odor:                     almond=a,anise=l,creosote=c,fishy=y,foul=f,musty=m,none=n,pungent=p,spicy=s
     6. gill-attachment:          attached=a,descending=d,free=f,notched=n
     7. gill-spacing:             close=c,crowded=w,distant=d
     8. gill-size:                broad=b,narrow=n
     9. gill-color:               black=k,brown=n,buff=b,chocolate=h,gray=g,green=r,orange=o,pink=p,purple=u,red=e,white=w,yellow=y
    10. stalk-shape:              enlarging=e,tapering=t
    11. stalk-root:               bulbous=b,club=c,cup=u,equal=e,rhizomorphs=z,rooted=r,missing=?
    12. stalk-surface-above-ring: fibrous=f,scaly=y,silky=k,smooth=s
    13. stalk-surface-below-ring: fibrous=f,scaly=y,silky=k,smooth=s
    14. stalk-color-above-ring:   brown=n,buff=b,cinnamon=c,gray=g,orange=o,pink=p,red=e,white=w,yellow=y
    15. stalk-color-below-ring:   brown=n,buff=b,cinnamon=c,gray=g,orange=o,pink=p,red=e,white=w,yellow=y
    16. veil-type:                partial=p,universal=u
    17. veil-color:               brown=n,orange=o,white=w,yellow=y
    18. ring-number:              none=n,one=o,two=t
    19. ring-type:                cobwebby=c,evanescent=e,flaring=f,large=l,none=n,pendant=p,sheathing=s,zone=z
    20. spore-print-color:        black=k,brown=n,buff=b,chocolate=h,green=r,orange=o,purple=u,white=w,yellow=y
    21. population:               abundant=a,clustered=c,numerous=n,scattered=s,several=v,solitary=y
    22. habitat:                  grasses=g,leaves=l,meadows=m,paths=p,urban=u,waste=w,woods=d
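
The one-letter codes are compact but hard to read in tables. A sketch of decoding them with a plain dictionary and `Series.map`, shown for attribute 5 (`odor`); `ODOR_CODES` and `decode` are hypothetical names, and the sample values are the odor codes of the first five lines of the file as they appear in the outputs further down.

```python
import pandas as pd

# One-letter codes for attribute 5 (odor), transcribed from the list above.
# ODOR_CODES is a hypothetical helper name, not part of the data set.
ODOR_CODES = {
    "a": "almond", "l": "anise", "c": "creosote", "y": "fishy", "f": "foul",
    "m": "musty", "n": "none", "p": "pungent", "s": "spicy",
}


def decode(column: pd.Series, codes: dict) -> pd.Series:
    """Replace one-letter codes with full names; unknown codes become NaN."""
    return column.map(codes)


# Odor codes of the first five lines of the file.
odor = pd.Series(["p", "a", "l", "p", "n"])
print(decode(odor, ODOR_CODES).tolist())
# ['pungent', 'almond', 'anise', 'pungent', 'none']
```

Mapping to `NaN` for unknown codes (rather than keeping the letter) makes typos or undocumented codes easy to spot with `isna()`.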

```python
import pandas as pd
```

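```python
# Import data from csv. The file has no header row, so by default
# pandas uses the first sample as the column names.
df = pd.read_csv('data/agaricus-lepiota.data')

# Show the first 10 samples
print(df.head(10).to_string())
```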

```
   p  x  s  n  t p.1  f  c n.1  k  e e.1 s.1 s.2  w w.1 p.2 w.2  o p.3 k.1 s.3  u
0  e  x  s  y  t   a  f  c   b  k  e   c   s   s  w   w   p   w  o   p   n   n  g
1  e  b  s  w  t   l  f  c   b  n  e   c   s   s  w   w   p   w  o   p   n   n  m
2  p  x  y  w  t   p  f  c   n  n  e   e   s   s  w   w   p   w  o   p   k   s  u
3  e  x  s  g  f   n  f  w   b  k  t   e   s   s  w   w   p   w  o   e   n   a  g
4  e  x  y  y  t   a  f  c   b  n  e   c   s   s  w   w   p   w  o   p   k   n  g
5  e  b  s  w  t   a  f  c   b  g  e   c   s   s  w   w   p   w  o   p   k   n  m
6  e  b  y  w  t   l  f  c   b  n  e   c   s   s  w   w   p   w  o   p   n   s  m
7  p  x  y  w  t   p  f  c   n  p  e   e   s   s  w   w   p   w  o   p   k   v  g
8  e  b  s  y  t   a  f  c   b  g  e   c   s   s  w   w   p   w  o   p   k   s  m
9  e  x  y  y  t   l  f  c   b  g  e   c   s   s  w   w   p   w  o   p   n   n  g
```
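
The header row above is made of data values (`p`, `x`, `s`, ...), with repeated values renamed `p.1`, `n.1` and so on: `read_csv` consumed the first sample as column names. A small sketch on the first three lines of the file (copied from the output above) showing the difference between the default and `header=None`:

```python
import io

import pandas as pd

# First three lines of agaricus-lepiota.data, copied from the output above.
SAMPLE = (
    "p,x,s,n,t,p,f,c,n,k,e,e,s,s,w,w,p,w,o,p,k,s,u\n"
    "e,x,s,y,t,a,f,c,b,k,e,c,s,s,w,w,p,w,o,p,n,n,g\n"
    "e,b,s,w,t,l,f,c,b,n,e,c,s,s,w,w,p,w,o,p,n,n,m\n"
)

# Default (header inferred): the first sample becomes the column names,
# and pandas de-duplicates repeated names as p.1, p.2, ...
inferred = pd.read_csv(io.StringIO(SAMPLE))

# header=None: every line is kept as a data row; columns are numbered 0..22.
raw = pd.read_csv(io.StringIO(SAMPLE), header=None)

print(len(inferred), len(raw))     # 2 3
print(list(inferred.columns[:6]))  # ['p', 'x', 's', 'n', 't', 'p.1']
```

With the default, one sample is silently lost from the data, which is why the file is re-read with explicit names below.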


```python
# Column names: the class label followed by the 22 attributes, in file order
labels = ["poisonous_edible",
          "cap-shape",
          "cap-surface",
          "cap-color",
          "bruises?",
          "odor",
          "gill-attachment",
          "gill-spacing",
          "gill-size",
          "gill-color",
          "stalk-shape",
          "stalk-root",
          "stalk-surface-above-ring",
          "stalk-surface-below-ring",
          "stalk-color-above-ring",
          "stalk-color-below-ring",
          "veil-type",
          "veil-color",
          "ring-number",
          "ring-type",
          "spore-print-color",
          "population",
          "habitat"]
```

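```python
# Check that there is one label per column in the table.
# Only the last expression of a cell is displayed, so show both as a tuple.
len(labels), len(df.columns)
```

```
(23, 23)
```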
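
Comparing the two lengths by eye works here, but `read_csv` does not necessarily raise when `names` is shorter than the number of fields; pandas may instead use the leading fields as the index. A hedged sketch of a hypothetical helper, `read_with_names`, that fails fast on a mismatch; placeholder names stand in for the real `labels`.

```python
import io

import pandas as pd


def read_with_names(source, names):
    """Hypothetical helper: read a header-less CSV and attach column names,
    raising ValueError if the number of names differs from the number of fields."""
    raw = pd.read_csv(source, header=None)
    if raw.shape[1] != len(names):
        raise ValueError(f"{len(names)} names for {raw.shape[1]} columns")
    raw.columns = names
    return raw


# First line of the file; placeholder names stand in for the real labels.
LINE = "p,x,s,n,t,p,f,c,n,k,e,e,s,s,w,w,p,w,o,p,k,s,u\n"
names = ["poisonous_edible"] + [f"attr_{i}" for i in range(1, 23)]

df = read_with_names(io.StringIO(LINE), names)
print(df.shape)  # (1, 23)

try:
    read_with_names(io.StringIO(LINE), names[:-1])
except ValueError as err:
    print(err)  # 22 names for 23 columns
```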

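```python
# Re-read the file with explicit column names. Passing names= also makes
# pandas treat the file as header-less, so the first sample is kept.
df = pd.read_csv('data/agaricus-lepiota.data', names=labels)
```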

```python
# Show the first 10 samples
print(df.head(10).to_string())
```

```
  poisonous_edible cap-shape cap-surface cap-color bruises? odor gill-attachment gill-spacing gill-size gill-color stalk-shape stalk-root stalk-surface-above-ring stalk-surface-below-ring stalk-color-above-ring stalk-color-below-ring veil-type veil-color ring-number ring-type spore-print-color population habitat
0                p         x           s         n        t    p               f            c         n          k           e          e                        s                        s                      w                      w         p          w           o         p                 k          s       u
1                e         x           s         y        t    a               f            c         b          k           e          c                        s                        s                      w                      w         p          w           o         p                 n          n       g
2                e         b           s         w
```

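```python
# Use describe() to get a quick summary. For categorical columns it reports
# count (non-null values), unique (distinct values), top (most frequent
# value) and freq (how often top occurs).
df.describe()
```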

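| | poisonous_edible | cap-shape | cap-surface | cap-color | bruises? | odor | gill-attachment | gill-spacing | gill-size | gill-color | ... | stalk-surface-below-ring | stalk-color-above-ring | stalk-color-below-ring | veil-type | veil-color | ring-number | ring-type | spore-print-color | population | habitat |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 8124 | 8124 | 8124 | 8124 | 8124 | 8124 | 8124 | 8124 | 8124 | 8124 | ... | 8124 | 8124 | 8124 | 8124 | 8124 | 8124 | 8124 | 8124 | 8124 | 8124 |
| unique | 2 | 6 | 4 | 10 | 2 | 9 | 2 | 2 | 2 | 12 | ... | 4 | 9 | 9 | 1 | 4 | 3 | 5 | 9 | 6 | 7 |
| top | e | x | y | n | f | n | f | c | b | b | ... | s | w | w | p | w | o | p | w | v | d |
| freq | 4208 | 3656 | 3244 | 2284 | 4748 | 3528 | 7914 | 6812 | 5612 | 1728 | ... | 4936 | 4464 | 4384 | 8124 | 7924 | 7488 | 3968 | 2388 | 4040 | 3148 |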
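
Two observations follow from the `unique` and `freq` rows. The classes are roughly balanced: 4208 of 8124 samples are edible, leaving 8124 - 4208 = 3916 poisonous. And some attributes use fewer values than the attribute list documents; `veil-type` takes a single value, so it cannot help separate the classes. A sketch that makes both checks explicit; the numbers are transcribed by hand from the attribute list and the `describe()` output above (columns hidden behind `...` are left out). On the full DataFrame, `df.nunique().to_dict()` would be one way to obtain the observed counts instead of typing them.

```python
# Number of documented codes per attribute (transcribed from the attribute list).
DOCUMENTED = {
    "cap-shape": 6, "cap-surface": 4, "cap-color": 10, "bruises?": 2,
    "odor": 9, "gill-attachment": 4, "gill-spacing": 3, "gill-size": 2,
    "gill-color": 12, "stalk-surface-below-ring": 4,
    "stalk-color-above-ring": 9, "stalk-color-below-ring": 9,
    "veil-type": 2, "veil-color": 4, "ring-number": 3, "ring-type": 8,
    "spore-print-color": 9, "population": 6, "habitat": 7,
}

# "unique" row of df.describe() above, same columns.
OBSERVED = {
    "cap-shape": 6, "cap-surface": 4, "cap-color": 10, "bruises?": 2,
    "odor": 9, "gill-attachment": 2, "gill-spacing": 2, "gill-size": 2,
    "gill-color": 12, "stalk-surface-below-ring": 4,
    "stalk-color-above-ring": 9, "stalk-color-below-ring": 9,
    "veil-type": 1, "veil-color": 4, "ring-number": 3, "ring-type": 5,
    "spore-print-color": 9, "population": 6, "habitat": 7,
}

# Attributes where some documented codes never occur: (documented, observed).
unused = {col: (DOCUMENTED[col], n) for col, n in OBSERVED.items() if n < DOCUMENTED[col]}
print(unused)
# {'gill-attachment': (4, 2), 'gill-spacing': (3, 2), 'veil-type': (2, 1), 'ring-type': (8, 5)}

# Attributes with a single observed value carry no information for classification.
constant = [col for col, n in OBSERVED.items() if n == 1]
print(constant)  # ['veil-type']

# Class balance from the poisonous_edible column: top="e", freq=4208.
total, edible = 8124, 4208
print(round(edible / total, 3), round((total - edible) / total, 3))  # 0.518 0.482
```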
