# Logistic Regression

## Context
Although this dataset was originally contributed to the UCI Machine Learning repository nearly 30 years ago, mushroom hunting (otherwise known as "shrooming") is enjoying new peaks in popularity. Learn which features spell certain death and which are most palatable in this dataset of mushroom characteristics. And how certain can your model be?

## Content
This dataset includes descriptions of hypothetical samples corresponding to 23 species of gilled mushrooms in the Agaricus and Lepiota family, drawn from The Audubon Society Field Guide to North American Mushrooms (1981). Each species is identified as definitely edible, definitely poisonous, or of unknown edibility and not recommended. This last class was combined with the poisonous one. The Guide clearly states that there is no simple rule for determining the edibility of a mushroom; no rule like "leaflets three, let it be" for poisonous oak and ivy.



## Attribute Information: (classes: edible=e, poisonous=p)

• cap-shape: bell=b,conical=c,convex=x,flat=f,knobbed=k,sunken=s

• cap-surface: fibrous=f,grooves=g,scaly=y,smooth=s

• cap-color: brown=n,buff=b,cinnamon=c,gray=g,green=r,pink=p,purple=u,red=e,white=w,yellow=y

• bruises: bruises=t,no=f

• odor: almond=a,anise=l,creosote=c,fishy=y,foul=f,musty=m,none=n,pungent=p,spicy=s

• gill-attachment: attached=a,descending=d,free=f,notched=n

• gill-spacing: close=c,crowded=w,distant=d

## Part 1 - Data Preprocessing

### Importing the dataset

In [33]:
# To import a library, write it like this: import pandas as pd
import pandas as pd
# To load the dataset: dataset = pd.read_csv('<file name or path>')
dataset = pd.read_csv('mushrooms.csv')

In [34]:
dataset.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8124 entries, 0 to 8123
Data columns (total 23 columns):
 #   Column                    Non-Null Count  Dtype 
---  ------                    --------------  ----- 
 0   class                     8124 non-null   object
 1   cap-shape                 8124 non-null   object
 2   cap-surface               8124 non-null   object
 3   cap-color                 8124 non-null   object
 4   bruises                   8124 non-null   object
 5   odor                      8124 non-null   object
 6   gill-attachment           8124 non-null   object
 7   gill-spacing              8124 non-null   object
 8   gill-size                 8124 non-null   object
 9   gill-color                8124 non-null   object
 10  stalk-shape               8124 non-null   object
 11  stalk-root                8124 non-null   object
 12  stalk-surface-above-ring  8124 non-null   object
 13  stalk-surface-below-ring  8124 non-null   object
 14  stalk-color-above-ring  

In [35]:
# Move 'class' column to the end
dataset['class'] = dataset.pop('class')


In [36]:
# To see the first 10 rows of the dataset, use dataset.head(10)
dataset.head(10)

Unnamed: 0,cap-shape,cap-surface,cap-color,bruises,odor,gill-attachment,gill-spacing,gill-size,gill-color,stalk-shape,...,stalk-color-above-ring,stalk-color-below-ring,veil-type,veil-color,ring-number,ring-type,spore-print-color,population,habitat,class
0,x,s,n,t,p,f,c,n,k,e,...,w,w,p,w,o,p,k,s,u,p
1,x,s,y,t,a,f,c,b,k,e,...,w,w,p,w,o,p,n,n,g,e
2,b,s,w,t,l,f,c,b,n,e,...,w,w,p,w,o,p,n,n,m,e
3,x,y,w,t,p,f,c,n,n,e,...,w,w,p,w,o,p,k,s,u,p
4,x,s,g,f,n,f,w,b,k,t,...,w,w,p,w,o,e,n,a,g,e
5,x,y,y,t,a,f,c,b,n,e,...,w,w,p,w,o,p,k,n,g,e
6,b,s,w,t,a,f,c,b,g,e,...,w,w,p,w,o,p,k,n,m,e
7,b,y,w,t,l,f,c,b,n,e,...,w,w,p,w,o,p,n,s,m,e
8,x,y,w,t,p,f,c,n,p,e,...,w,w,p,w,o,p,k,v,g,p
9,b,s,y,t,a,f,c,b,g,e,...,w,w,p,w,o,p,k,s,m,e


In [37]:
dataset.dtypes

cap-shape                   object
cap-surface                 object
cap-color                   object
bruises                     object
odor                        object
gill-attachment             object
gill-spacing                object
gill-size                   object
gill-color                  object
stalk-shape                 object
stalk-root                  object
stalk-surface-above-ring    object
stalk-surface-below-ring    object
stalk-color-above-ring      object
stalk-color-below-ring      object
veil-type                   object
veil-color                  object
ring-number                 object
ring-type                   object
spore-print-color           object
population                  object
habitat                     object
class                       object
dtype: object

In [38]:
from sklearn.preprocessing import LabelEncoder
label_encoder = LabelEncoder()
label_encoder

In [39]:
# LabelEncoder is used for the conversion because the values need to be numerical.
# The same encoder object is refitted on each column in turn, including 'class'.
for column in dataset.columns:
    dataset[column] = label_encoder.fit_transform(dataset[column])

dataset.head()

Unnamed: 0,cap-shape,cap-surface,cap-color,bruises,odor,gill-attachment,gill-spacing,gill-size,gill-color,stalk-shape,...,stalk-color-above-ring,stalk-color-below-ring,veil-type,veil-color,ring-number,ring-type,spore-print-color,population,habitat,class
0,5,2,4,1,6,1,0,1,4,0,...,7,7,0,2,1,4,2,3,5,1
1,5,2,9,1,0,1,0,0,4,0,...,7,7,0,2,1,4,3,2,1,0
2,0,2,8,1,3,1,0,0,5,0,...,7,7,0,2,1,4,3,2,3,0
3,5,3,8,1,6,1,0,1,5,0,...,7,7,0,2,1,4,2,3,5,1
4,5,2,3,0,5,1,1,0,4,1,...,7,7,0,2,1,0,3,0,1,0
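
To see what the integer codes mean, a fitted encoder's `classes_` attribute can be inspected. The following is a minimal sketch on a toy `cap-shape` column; the sample values are made up for illustration and only the letter codes come from the attribute list above.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy column for illustration; these are not rows from mushrooms.csv
cap_shape = pd.Series(['x', 'b', 'x', 'f', 'k'], name='cap-shape')

encoder = LabelEncoder()
codes = encoder.fit_transform(cap_shape)

# classes_ is sorted, so each letter's code is its position in that array
mapping = {str(label): int(code) for code, label in enumerate(encoder.classes_)}
print(mapping)
print(codes)

# inverse_transform recovers the original letters from the codes
restored = encoder.inverse_transform(codes)
print(restored)
```

Because the notebook reuses a single `label_encoder` for every column, that object only remembers the last column it was fitted on (`class`). Decoding any other column later would need one encoder kept per column.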


In [40]:
# Count the missing (NaN) values in each column; isnull() is an alias of isna()
print(dataset.isna().sum())

cap-shape                   0
cap-surface                 0
cap-color                   0
bruises                     0
odor                        0
gill-attachment             0
gill-spacing                0
gill-size                   0
gill-color                  0
stalk-shape                 0
stalk-root                  0
stalk-surface-above-ring    0
stalk-surface-below-ring    0
stalk-color-above-ring      0
stalk-color-below-ring      0
veil-type                   0
veil-color                  0
ring-number                 0
ring-type                   0
spore-print-color           0
population                  0
habitat                     0
class                       0
dtype: int64
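
`isna()` only counts true NaN values. The UCI description of this dataset reports missing `stalk-root` entries marked with `?`. Assuming `mushrooms.csv` keeps that convention, those entries would have to be counted on the raw strings, before label encoding turns them into ordinary integer codes. A sketch on a made-up frame:

```python
import pandas as pd

# Made-up rows for illustration; '?' stands in for a missing value
raw = pd.DataFrame({
    'stalk-root': ['e', '?', 'b', '?', 'c'],
    'odor': ['p', 'a', 'l', 'n', 'f'],
})

# isna() reports nothing because '?' is an ordinary string
nan_counts = raw.isna().sum()

# Comparing against the placeholder reveals the hidden gaps
placeholder_counts = (raw == '?').sum()

print(nan_counts)
print(placeholder_counts)
```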


### Getting the inputs and output

In [41]:
# ':-1' means the last column is excluded,
# so this selects every column except the last one.
# X holds the independent variables (features)
X = dataset.iloc[:,:-1].values
X

array([[5, 2, 4, ..., 2, 3, 5],
       [5, 2, 9, ..., 3, 2, 1],
       [0, 2, 8, ..., 3, 2, 3],
       ...,
       [2, 2, 4, ..., 0, 1, 2],
       [3, 3, 4, ..., 7, 4, 2],
       [5, 2, 4, ..., 4, 1, 2]])

In [42]:
# Variant: X = dataset.iloc[:,1:-1].values (would also skip the first column)
# y is the dependent variable (target)
y = dataset.iloc[:,-1].values  # to inspect the values of X or y, just evaluate X or y
y

array([1, 0, 0, ..., 0, 1, 0])
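
The two `iloc` slices above split the frame into a feature matrix and a target vector. A small sketch on a hypothetical three-column frame shows the resulting shapes:

```python
import pandas as pd

# Hypothetical encoded frame: two features and the target in the last column
df = pd.DataFrame({
    'cap-shape': [5, 0, 2],
    'odor': [6, 3, 5],
    'class': [1, 0, 0],
})

X = df.iloc[:, :-1].values   # all rows, every column except the last
y = df.iloc[:, -1].values    # all rows, only the last column

print(X.shape)
print(y.shape)
```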

### Creating the Training Set and the Test Set

In [43]:
# Create the training set and the test set
# train_test_split is the function that does this
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)
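
With `test_size = 0.2` on 8124 rows, the split should give 1625 test rows and 6499 training rows, which matches the total of the confusion matrix later in the notebook. The sketch below checks this on placeholder arrays of the same length. The `stratify` argument is an optional extra and not part of the original call.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data with the same number of rows as the mushroom dataset
X = np.arange(8124 * 2).reshape(8124, 2)
y = np.arange(8124) % 2

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
print(X_train.shape, X_test.shape)

# Optional: stratify=y keeps the class proportions equal in both parts
_, _, y_train_s, y_test_s = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)
print(y_train_s.mean(), y_test_s.mean())
```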

In [44]:
X_train

array([[3, 2, 3, ..., 7, 3, 1],
       [5, 0, 8, ..., 7, 2, 1],
       [5, 2, 8, ..., 2, 3, 3],
       ...,
       [5, 2, 3, ..., 3, 3, 1],
       [2, 0, 4, ..., 3, 4, 0],
       [5, 3, 3, ..., 3, 5, 0]])

In [45]:
X_test

array([[5, 3, 4, ..., 2, 3, 5],
       [2, 3, 3, ..., 2, 5, 0],
       [5, 3, 9, ..., 3, 2, 3],
       ...,
       [2, 3, 2, ..., 7, 4, 4],
       [0, 2, 8, ..., 2, 2, 3],
       [5, 3, 4, ..., 3, 4, 0]])

In [46]:
y_train

array([0, 0, 0, ..., 0, 0, 0])

In [47]:
y_test

array([1, 0, 0, ..., 1, 0, 0])

### Feature Scaling

In [48]:
# Use another library class
# Feature scaling is applied per column (feature), not per row
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
# The scaler is fitted on X_train only,
# because it is the 80 percent used for training
X_train = sc.fit_transform(X_train)

In [49]:
X_train

array([[-0.2134376 ,  0.14364556, -0.58042016, ...,  1.42385604,
        -0.5040274 , -0.29674863],
       [ 1.03510511, -1.47851339,  1.3980497 , ...,  1.42385604,
        -1.29620976, -0.29674863],
       [ 1.03510511,  0.14364556,  1.3980497 , ..., -0.6737141 ,
        -0.5040274 ,  0.86329309],
       ...,
       [ 1.03510511,  0.14364556, -0.58042016, ..., -0.25420007,
        -0.5040274 , -0.29674863],
       [-0.83770895, -1.47851339, -0.18472619, ..., -0.25420007,
         0.28815496, -0.8767695 ],
       [ 1.03510511,  0.95472503, -0.58042016, ..., -0.25420007,
         1.08033733, -0.8767695 ]])
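
Standardization subtracts each column's training mean and divides by its training standard deviation. The test set is then transformed with those same statistics rather than refitted. A sketch on made-up numbers:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up feature matrices for illustration
X_train = np.array([[0.0, 10.0], [2.0, 20.0], [4.0, 30.0]])
X_test = np.array([[2.0, 40.0]])

sc = StandardScaler()
X_train_scaled = sc.fit_transform(X_train)  # learns mean_ and scale_ per column
X_test_scaled = sc.transform(X_test)        # reuses the training statistics

print(sc.mean_)
print(X_train_scaled.mean(axis=0))
print(X_test_scaled)
```

Calling `fit_transform` on the test set instead would leak its statistics into the preprocessing, which is why the notebook uses `sc.transform(X_test)` at prediction time.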

## Part 2 - Building and training the model

### Building the model

In [50]:
from sklearn.linear_model import LogisticRegression
model = LogisticRegression(random_state = 0)

### Training the model

In [51]:
model.fit(X_train, y_train)

### Inference

Making the predictions for the data points in the test set

In [52]:
y_pred = model.predict(sc.transform(X_test))

In [53]:
y_pred

array([1, 0, 0, ..., 1, 0, 0])
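
`predict` returns only the hard label. For the question raised in the Context section of how certain the model can be, `predict_proba` gives per-class probabilities. The sketch below uses synthetic data from `make_classification` as a stand-in, not the mushroom features.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data; the notebook itself would use X and y from above
X, y = make_classification(n_samples=500, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

sc = StandardScaler()
model = LogisticRegression(random_state=0)
model.fit(sc.fit_transform(X_train), y_train)

# One row per sample, one column per class, in model.classes_ order
proba = model.predict_proba(sc.transform(X_test))
print(proba[:3])

# predict picks the class with the highest probability in each row
y_pred = model.predict(sc.transform(X_test))
print(y_pred[:3])
```

Rows with probabilities close to 0.5 are the ones where the model is least certain.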

Making the prediction of a single data point with the following label-encoded values:

1. cap-shape = 1
2. cap-surface = 2
3. cap-color = 3                  
4. bruises = 4             
5. odor = 5                      
6. gill-attachment = 6             
7. gill-spacing = 7                
8. gill-size = 8           
9. gill-color= 9               
10. stalk-shape = 10                
11. stalk-root = 11            
12. stalk-surface-above-ring = 12    
13. stalk-surface-below-ring = 13
14. stalk-color-above-ring = 14
15. stalk-color-below-ring = 15 
16. veil-type = 16 
17. veil-color = 17              
18. ring-number = 18             
19. ring-type = 19            
20. spore-print-color = 20           
21. population = 21      
22. habitat = 22             

In [54]:
model.predict(sc.transform([[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22]]))

array([1])
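
The values 1 to 22 above are integer codes rather than raw letters. To predict from raw letter codes, a new sample would have to pass through the same encoding as the training data. One way to sketch this is scikit-learn's `OrdinalEncoder`, which encodes all columns at once and remembers each column's categories. This is a swapped-in alternative to the per-column `LabelEncoder` calls, shown on a made-up two-column frame.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Made-up training rows for illustration
train = pd.DataFrame({
    'cap-shape': ['x', 'b', 'f', 'x'],
    'odor': ['p', 'a', 'n', 'l'],
})

encoder = OrdinalEncoder()
encoder.fit(train)

# A new raw sample, described with the same letter codes
sample = pd.DataFrame({'cap-shape': ['b'], 'odor': ['n']})
encoded = encoder.transform(sample)

print(encoder.categories_)
print(encoded)
```

The encoded row could then go through `sc.transform` and `model.predict` in the same way as the cell above.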

## Part 3 - Evaluating the model

### Confusion Matrix

In [55]:
from sklearn.metrics import confusion_matrix
confusion_matrix(y_test, y_pred)

array([[820,  32],
       [ 38, 735]])
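
In scikit-learn's layout, rows are the true classes and columns are the predicted classes. With edible encoded as 0 and poisonous as 1, the 38 in the bottom-left cell counts poisonous mushrooms that were predicted as edible. The sketch below unpacks the matrix above and derives precision and recall for the poisonous class.

```python
import numpy as np

# Values copied from the confusion matrix output above
cm = np.array([[820, 32],
               [38, 735]])

# ravel() flattens row by row: true negatives, false positives,
# false negatives, true positives
tn, fp, fn, tp = cm.ravel()

precision = tp / (tp + fp)   # of those predicted poisonous, how many truly are
recall = tp / (tp + fn)      # of those truly poisonous, how many were caught

print(round(precision, 4))
print(round(recall, 4))
```

For this dataset recall on the poisonous class is arguably the more important number, since a false negative means a poisonous mushroom labelled as edible.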

### Accuracy

In [56]:
(820+735)/(820+735+38+32)

0.9569230769230769

In [57]:
from sklearn.metrics import accuracy_score
accuracy_score(y_test,y_pred)

0.9569230769230769