# 🧠 Introduction to Scikit-learn
Scikit-learn is an open-source Python library for machine learning, built on NumPy and SciPy. It provides simple, efficient tools for classification, regression, clustering, and model evaluation.

## 🔧 Installation
To install scikit-learn, along with pandas, which the examples below use, run the following command in a notebook cell:

In [1]:
!pip install scikit-learn pandas
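
A quick way to confirm the installation worked is to import the packages and print their versions. This is a minimal sketch; the version numbers you see depend on your environment.

```python
import sklearn
import pandas as pd

# Both packages expose their version as a string attribute.
print("scikit-learn:", sklearn.__version__)
print("pandas:", pd.__version__)
```

If either import fails with `ModuleNotFoundError`, the install ran against a different Python environment than the one the notebook kernel uses.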



## 📚 Load a Sample Dataset
Scikit-learn ships with small built-in datasets. One of them is the Iris dataset: 150 flower samples, each with four measurements and one of three species labels.

In [2]:
from sklearn.datasets import load_iris
import pandas as pd

iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df['target'] = iris.target
df.head()

|   | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) | target |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | 0 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | 0 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | 0 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | 0 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | 0 |
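
The `target` column holds integer codes rather than species names. As a small sketch, the names can be recovered from the `target_names` attribute of the loaded dataset; the variable `species` below is introduced only for illustration.

```python
from sklearn.datasets import load_iris

iris = load_iris()

# target_names[i] is the species name for target code i.
for code, name in enumerate(iris.target_names):
    print(code, name)

# Translate the first five numeric targets into plain-string species names.
species = [str(iris.target_names[t]) for t in iris.target[:5]]
print(species)
```

The first five rows all have target `0`, so they all map to the same species.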


## 📈 Classification: Predict Iris Species
We will use a **Decision Tree Classifier**.

In [4]:
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

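# Separate the four feature columns from the label column
X = df.drop('target', axis=1)
y = df['target']

# Hold out 20% of the rows for testing; random_state fixes the shuffle
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit the tree on the training rows, then measure accuracy on the held-out rows
model = DecisionTreeClassifier()
model.fit(X_train, y_train)
predictions = model.predict(X_test)
accuracy_score(y_test, predictions)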

1.0
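
A perfect score on a single 30-row test set can be optimistic. One common way to get a steadier estimate is k-fold cross-validation, sketched below with `cross_val_score`. The choice of 5 folds, the `random_state=0` seed, and the names `X_cv`, `y_cv`, and `cv_model` are assumptions made for this illustration, not part of the original notebook.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X_cv, y_cv = load_iris(return_X_y=True)

# Seed the tree so repeated runs build the same model.
cv_model = DecisionTreeClassifier(random_state=0)

# Train and evaluate on 5 different train/test partitions of the data.
scores = cross_val_score(cv_model, X_cv, y_cv, cv=5)

print("Per-fold accuracy:", scores)
print("Mean accuracy:", scores.mean())
```

The spread across folds gives a rough sense of how much the accuracy depends on which rows happen to land in the test set.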

## 📉 Regression: Predict House Prices
We will use the **California Housing dataset** and **Linear Regression**.

In [5]:
from sklearn.datasets import fetch_california_housing
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

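# The dataset is downloaded and cached on the first call
housing = fetch_california_housing()
X_h = pd.DataFrame(housing.data, columns=housing.feature_names)
# Target: median house value per district, in units of $100,000
y_h = housing.target

# Hold out 20% of the rows for testing
Xh_train, Xh_test, yh_train, yh_test = train_test_split(X_h, y_h, test_size=0.2, random_state=42)

# Fit ordinary least squares, then compute the mean squared error on the test rows
reg = LinearRegression()
reg.fit(Xh_train, yh_train)
preds = reg.predict(Xh_test)
mean_squared_error(yh_test, preds)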

0.5558915986952437
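
Mean squared error is expressed in squared target units, which makes it hard to read directly. Its square root (RMSE) is in the same units as the target: the California Housing target is in units of $100,000, so an MSE of about 0.556 corresponds to an RMSE of roughly 0.75, or about $75,000. The sketch below works through MSE, RMSE, and R² on four made-up values (not taken from the housing data) so the arithmetic can be checked by hand.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Toy values chosen for illustration only.
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

# MSE is the average of the squared errors.
mse = mean_squared_error(y_true, y_pred)
manual_mse = np.mean((y_true - y_pred) ** 2)

# RMSE brings the error back to the units of the target.
rmse = np.sqrt(mse)

# R² compares the model against always predicting the mean of y_true.
r2 = r2_score(y_true, y_pred)

print("MSE:", mse)
print("Manual MSE:", manual_mse)
print("RMSE:", rmse)
print("R²:", r2)
```

Here the squared errors are 0.25, 0.25, 0, and 1, so the MSE is 1.5 / 4 = 0.375.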

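## 🎯 The `score()` Method
Every fitted scikit-learn estimator has a `score()` method that evaluates it in one call. For classifiers it returns mean accuracy; for regressors it returns the R² coefficient of determination.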

In [5]:
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

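# return_X_y=True returns the features and labels directly as NumPy arrays
X, y = load_iris(return_X_y=True)
# No test_size or random_state: 25% of the rows are held out, and the split changes on every run
X_train, X_test, y_train, y_test = train_test_split(X, y)

model = LogisticRegression()
model.fit(X_train, y_train)

# For a classifier, score() predicts on X_test and returns the mean accuracy
print("Accuracy:", model.score(X_test, y_test))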


Accuracy: 0.9473684210526315
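
What `score()` returns depends on the estimator type: mean accuracy for classifiers, R² for regressors. The sketch below checks both against the equivalent functions in `sklearn.metrics`. The synthetic regression data from `make_regression`, the `max_iter=1000` setting, the seeds, and all variable names are assumptions chosen to keep the example self-contained.

```python
from sklearn.datasets import load_iris, make_regression
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.metrics import accuracy_score, r2_score
from sklearn.model_selection import train_test_split

# Classifier: score() matches accuracy_score on the same predictions.
X_c, y_c = load_iris(return_X_y=True)
Xc_train, Xc_test, yc_train, yc_test = train_test_split(X_c, y_c, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(Xc_train, yc_train)
clf_score = clf.score(Xc_test, yc_test)
clf_acc = accuracy_score(yc_test, clf.predict(Xc_test))
print("Classifier score():", clf_score, "accuracy_score:", clf_acc)

# Regressor: score() matches r2_score on the same predictions.
X_r, y_r = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
Xr_train, Xr_test, yr_train, yr_test = train_test_split(X_r, y_r, random_state=0)
reg_demo = LinearRegression().fit(Xr_train, yr_train)
reg_score = reg_demo.score(Xr_test, yr_test)
reg_r2 = r2_score(yr_test, reg_demo.predict(Xr_test))
print("Regressor score():", reg_score, "r2_score:", reg_r2)
```

Because the two kinds of `score()` measure different things, a classifier's score and a regressor's score are not comparable with each other.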


## ✅ Summary
- Classification is for predicting categories.
- Regression is for predicting continuous values.
- Scikit-learn covers the whole workflow shown here: loading data, splitting it, fitting a model, and evaluating the result.

Keep practicing with different datasets and models!