In [None]:
import pandas as pd
import seaborn as sns
import numpy as np

In [None]:
df=sns.load_dataset("iris")
df.head() #show the data

In [None]:
df["species"].unique()

In [None]:
df.isnull().sum()

In [None]:
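# Drop the setosa rows from the dataset and store the result back in df.
# .copy() makes df an independent frame, so the column assignment below does not raise SettingWithCopyWarning.
df = df[df['species'] != 'setosa'].copy()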

In [None]:
# The model cannot take the species labels directly as strings,
# so we have to assign each label a numeric value. The simplest way
# to do that is to use map.
df['species'] = df['species'].map({'versicolor': 0, 'virginica': 1})
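
As a small illustration of the mapping step, the sketch below applies `Series.map` to a hypothetical toy column (`toy` is not part of this notebook). It also shows why the setosa rows have to be removed first: any label that is missing from the dictionary becomes `NaN`.

```python
import pandas as pd

# Hypothetical toy labels, used only for illustration.
toy = pd.Series(["versicolor", "virginica", "setosa"])

# Labels present in the dictionary are replaced by their integer code;
# labels that are absent (here "setosa") become NaN.
encoded = toy.map({"versicolor": 0, "virginica": 1})

print(encoded.tolist())
print(encoded.isna().sum())
```

Checking `isna().sum()` after a `map` call is a cheap way to confirm that every label was covered by the dictionary.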

### Splitting the Dataset into Independent and Dependent Features

We split the dataset into **independent features (X)** and a **dependent feature (y)**.

- **Independent Features (X)**: The features we give to the machine learning model as input. These are generally all the columns of the dataset except the **last column**.
- **Dependent Feature (y)**: The feature we want to predict. This is typically the **last column** of the dataset.

### Code Explanation:
1. **X = df.iloc[:, :-1]**:
   - `iloc`: This indexer selects rows and columns by integer position.
   - `:` (before `,`): Selects all rows.
   - `:-1`: Selects all columns except the **last column**.

2. **y = df.iloc[:, -1]**:
   - `-1`: Selects the last column.

This divides the dataset into independent (X) and dependent (y) features.

#### Summary:
- **X**: Independent features that are given to the model as input for training.
- **y**: Dependent feature that we predict.
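
The positional slicing described above can be tried on a tiny frame. This is a minimal sketch: the frame `toy` and its column names are hypothetical and chosen only so that the target sits in the last column.

```python
import pandas as pd

# Hypothetical toy frame: two feature columns and a target in the last column.
toy = pd.DataFrame({
    "feature_a": [1.0, 2.0, 3.0],
    "feature_b": [4.0, 5.0, 6.0],
    "target": [0, 1, 0],
})

X_toy = toy.iloc[:, :-1]  # all rows, every column except the last -> DataFrame
y_toy = toy.iloc[:, -1]   # all rows, only the last column -> Series

print(list(X_toy.columns))
print(y_toy.name)
print(X_toy.shape, y_toy.shape)
```

Because `iloc` works by position, this split relies on the target being the last column. Selecting by name, for example with `df.drop(columns="species")`, is an alternative when the column order is not guaranteed.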


In [None]:
### Split the dataset into independent and dependent features
# [:, :-1] means: take all columns except the last one
X=df.iloc[:,:-1]
y=df.iloc[:,-1]

In [None]:
df.head()

In [None]:
X

In [None]:
y

### Splitting the Data into Training and Testing Sets

We use **X_train** and **y_train** to train the model, and **X_test** and **y_test** to evaluate its performance.

- **X_train**: The independent features of the training data.
- **y_train**: The dependent feature of the training data.
- **X_test**: The independent features of the testing data, on which we evaluate the trained model.
- **y_test**: The dependent feature of the testing data, which we compare against the model's predictions.
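
To make the four pieces concrete, here is a minimal sketch of `train_test_split` on hypothetical toy arrays (`X_toy` and `y_toy` are made up for illustration, not taken from the iris data). It shows how `test_size=0.25` divides 100 rows and that a fixed `random_state` makes the split repeatable.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 100 samples, 4 features, binary labels.
X_toy = np.arange(400).reshape(100, 4)
y_toy = np.array([0, 1] * 50)

# test_size=0.25 holds out 25% of the rows; random_state fixes the shuffle.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.25, random_state=42)

print(X_tr.shape, X_te.shape)

# Repeating the call with the same random_state gives the same split.
X_tr2, X_te2, _, _ = train_test_split(
    X_toy, y_toy, test_size=0.25, random_state=42)
print(np.array_equal(X_te, X_te2))
```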

In [None]:
# We use X_train and y_train to train the model
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

In [None]:
from sklearn.linear_model import LogisticRegression
classifier=LogisticRegression()

### Using GridSearchCV for Hyperparameter Tuning

**GridSearchCV** is used to tune a **model's hyperparameters**. It tries different combinations of hyperparameters, trains the model on each combination, and selects the best-performing one.

- **Hyperparameters**: Settings that control the model's behavior, such as the type of regularization (`penalty`), the inverse regularization strength (`C`), and the maximum number of solver iterations (`max_iter`).
- **Cross-Validation**: GridSearchCV evaluates each combination on multiple splits (folds), so the selected hyperparameters are less likely to overfit a single split and more likely to generalize.
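
To get a feel for how much work a grid search does, the sketch below enumerates every combination in a grid using only the standard library. The grid values mirror the ones used in this notebook; `candidates` and `n_folds` are illustrative names. It only counts candidates and does not train anything.

```python
from itertools import product

param_grid = {
    "penalty": ["l1", "l2", "elasticnet"],
    "C": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40, 50],
    "max_iter": [100, 200, 300],
}

# Every candidate is one choice per hyperparameter (a Cartesian product).
names = list(param_grid)
candidates = [dict(zip(names, values))
              for values in product(*param_grid.values())]

n_folds = 5  # illustrative: each candidate is trained once per fold
total_fits = len(candidates) * n_folds

print(len(candidates))
print(total_fits)
```

The count grows multiplicatively with every hyperparameter added to the grid, which is worth keeping in mind before widening the search.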

### Code Explanation:
```python
from sklearn.model_selection import GridSearchCV

parameter = {'penalty': ['l1', 'l2', 'elasticnet'], 
             'C': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40, 50], 
             'max_iter': [100, 200, 300]}

classifier_regression = GridSearchCV(classifier, param_grid=parameter, scoring='accuracy', cv=5)
```

In [None]:
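# GridSearchCV tries different combinations of hyperparameters,
# trains the model on each combination, and keeps the best-performing one as the final model.
# Note: in recent scikit-learn versions the default solver (lbfgs) does not support the 'l1' or
# 'elasticnet' penalties, so those combinations fail to fit and are scored as NaN.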
from sklearn.model_selection import GridSearchCV
parameter={'penalty':['l1','l2','elasticnet'],'C':[1,2,3,4,5,6,7,8,9,10,20,30,40,50],'max_iter':[100,200,300]}
classifier_regression=GridSearchCV(classifier,param_grid=parameter,scoring='accuracy',cv=5)

In [None]:
classifier_regression.fit(X_train,y_train)

In [None]:
print(classifier_regression.best_params_)

In [None]:
print(classifier_regression.best_score_)

In [None]:
#prediction
y_pred=classifier_regression.predict(X_test)

In [None]:
## Accuracy score
from sklearn.metrics import accuracy_score,classification_report
score=accuracy_score(y_test,y_pred)  # argument order is (y_true, y_pred)
print(score)

In [None]:
## Classification report
# Argument order is (y_true, y_pred); swapping them swaps precision and recall.
print(classification_report(y_test,y_pred))

In [None]:
sns.pairplot(df,hue='species')

In [None]:
## To see the pairwise correlations directly
df.corr()