# Machine Learning with Scikit-learn

## Introduction

Scikit-learn is one of the most widely used and comprehensive machine learning libraries for Python. This lesson lays the foundation for moving from data science tools to machine learning tools.

## Learning Objectives

By the end of this lesson, you will:
- Understand the core structure and API of Scikit-learn
- Learn supervised and unsupervised learning algorithms
- Grasp model evaluation and validation techniques
- Become proficient in feature engineering and preprocessing
- Be able to perform hyperparameter tuning and model optimization

## Contents

### 1. Scikit-learn Basics

#### 1.1 Library Structure

In [None]:
# Main modules
from sklearn import datasets, preprocessing, model_selection, metrics
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.ensemble import RandomForestClassifier, GradientBoostingRegressor
from sklearn.svm import SVC, SVR
from sklearn.cluster import KMeans, DBSCAN
from sklearn.decomposition import PCA

#### 1.2 Estimator API
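All Scikit-learn estimators share a common API:
- `fit()`: learns from the training data (implemented by every estimator)
- `predict()`: makes predictions (predictors such as classifiers and regressors)
- `score()`: evaluates the model (by default mean accuracy for classifiers, R² for regressors)
- `transform()`: transforms the data (transformers such as scalers and PCA); `fit_transform()` combines fitting and transforming in one call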
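A minimal sketch of the shared API, assuming the iris dataset and `LogisticRegression` as the example predictor (any classifier would do). A transformer and a predictor are both driven with the same `fit()` call:

```python
# Minimal demonstration of the shared estimator API on the iris data
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Transformer: fit() learns the per-feature mean/std, transform() applies them
scaler = StandardScaler()
scaler.fit(X)
X_std = scaler.transform(X)

# Predictor: fit() learns the model, predict() returns class labels,
# score() returns mean accuracy for classifiers
clf = LogisticRegression(max_iter=1000)
clf.fit(X_std, y)
labels = clf.predict(X_std[:5])
acc = clf.score(X_std, y)
print(labels, round(acc, 3))
```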

### 2. Data Preparation and Preprocessing

#### 2.1 Loading Data

In [None]:
# Note: load_boston was removed in scikit-learn 1.2; load_diabetes is a built-in regression alternative
from sklearn.datasets import load_iris, make_classification
from sklearn.model_selection import train_test_split

# Built-in datasets
iris = load_iris()
X, y = iris.data, iris.target

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
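For classification it is often useful to keep the class proportions identical in both splits. A small sketch using the `stratify` parameter of `train_test_split`, assuming the iris data (50 samples per class):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# stratify=y keeps the class proportions of y in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
print(np.bincount(y_train), np.bincount(y_test))
```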

#### 2.2 Feature Scaling

In [None]:
from sklearn.preprocessing import StandardScaler, MinMaxScaler, RobustScaler

# Standardization (Z-score normalization)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Min-Max scaling
minmax_scaler = MinMaxScaler()
X_minmax = minmax_scaler.fit_transform(X)
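The cell above fits the scalers on the whole of `X` for brevity. When a test set exists, a safer pattern is to learn the scaling statistics from the training data only and reuse them on the test data, so no information from the test set leaks into preprocessing. A sketch assuming the iris split used earlier:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

scaler = StandardScaler()
# Learn mean and standard deviation from the training data only...
X_train_scaled = scaler.fit_transform(X_train)
# ...and reuse those statistics on the test data (no refitting)
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled.mean(axis=0).round(6))  # approximately 0 per feature
print(X_train_scaled.std(axis=0).round(6))   # approximately 1 per feature
```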

#### 2.3 Encoding Categorical Variables

In [None]:
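from sklearn.preprocessing import LabelEncoder, OneHotEncoder
from sklearn.compose import ColumnTransformer
import numpy as np

# Label encoding (intended for target labels, not input features)
le = LabelEncoder()
y_encoded = le.fit_transform(["cat", "dog", "cat", "bird"])  # -> [1, 2, 1, 0]

# One-hot encoding for categorical input features
# (hypothetical example data; `sparse` was renamed to `sparse_output` in scikit-learn 1.2)
X_categorical = np.array([["red"], ["green"], ["blue"], ["green"]])
ohe = OneHotEncoder(sparse_output=False)
X_encoded = ohe.fit_transform(X_categorical)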
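`ColumnTransformer` is imported above but not used. A minimal sketch of how it applies different preprocessing to different columns, assuming a hypothetical two-column dataset (one categorical color column, one numeric size column):

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data: column 0 is categorical (color), column 1 is numeric (size)
X_mixed = np.array([
    ["red", 1.0],
    ["green", 2.0],
    ["blue", 3.0],
    ["green", 4.0],
], dtype=object)

preprocessor = ColumnTransformer([
    # One-hot encode the categorical column; unseen categories become all zeros
    ("cat", OneHotEncoder(handle_unknown="ignore"), [0]),
    # Standardize the numeric column
    ("num", StandardScaler(), [1]),
])

X_prepared = preprocessor.fit_transform(X_mixed)
print(X_prepared.shape)  # 3 one-hot columns + 1 scaled column
```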

### 3. Supervised Learning

#### 3.1 Linear Models

In [None]:
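# Linear Regression
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.model_selection import train_test_split

# Regression needs a continuous target, so use the diabetes dataset
# (iris labels are class IDs, which suit classification instead)
X_reg, y_reg = load_diabetes(return_X_y=True)
Xr_train, Xr_test, yr_train, yr_test = train_test_split(
    X_reg, y_reg, test_size=0.2, random_state=42
)

lr = LinearRegression()
lr.fit(Xr_train, yr_train)
yr_pred = lr.predict(Xr_test)

# Ridge Regression (L2 regularization)
ridge = Ridge(alpha=1.0)
ridge.fit(Xr_train, yr_train)

# Lasso Regression (L1 regularization)
lasso = Lasso(alpha=0.1)
lasso.fit(Xr_train, yr_train)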
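The practical difference between L2 and L1 regularization shows up in the coefficients: Lasso tends to set some of them exactly to zero, which acts as built-in feature selection. A sketch on synthetic data from `make_regression`, where the choice of 10 features with only 3 informative ones is an illustrative assumption:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# Synthetic data: 10 features, only 3 of which actually influence the target
X_reg, y_reg = make_regression(
    n_samples=200, n_features=10, n_informative=3, noise=5.0, random_state=0
)

ridge = Ridge(alpha=1.0).fit(X_reg, y_reg)
lasso = Lasso(alpha=1.0).fit(X_reg, y_reg)

# L2 shrinks coefficients but rarely makes them exactly zero;
# L1 tends to set the coefficients of uninformative features to exactly zero
print("Ridge zero coefficients:", np.sum(ridge.coef_ == 0))
print("Lasso zero coefficients:", np.sum(lasso.coef_ == 0))
```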

#### 3.2 Classification Models

In [None]:
# Logistic Regression
from sklearn.linear_model import LogisticRegression

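# max_iter raised from the default 100 to avoid a ConvergenceWarning on unscaled data
log_reg = LogisticRegression(max_iter=1000, random_state=42)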
log_reg.fit(X_train, y_train)

# Support Vector Machines
from sklearn.svm import SVC

svm = SVC(kernel='rbf', C=1.0, gamma='scale')
svm.fit(X_train, y_train)

# Random Forest
from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)
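Because all classifiers expose the same `fit()`/`score()` interface, they can be compared in a simple loop. A sketch assuming the same iris train/test split and hyperparameters as above:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000, random_state=42),
    "svm": SVC(kernel="rbf", C=1.0, gamma="scale"),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

# Every classifier exposes the same fit/score interface
test_scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    test_scores[name] = model.score(X_test, y_test)  # mean accuracy on the test set

print(test_scores)
```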

### 4. Unsupervised Learning

#### 4.1 Clustering

In [None]:
# K-Means Clustering
from sklearn.cluster import KMeans

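# n_init set explicitly: its default changed across scikit-learn versions
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
kmeans_labels = kmeans.fit_predict(X)

# DBSCAN (finds the number of clusters itself; noise points get the label -1)
from sklearn.cluster import DBSCAN

dbscan = DBSCAN(eps=0.5, min_samples=5)
dbscan_labels = dbscan.fit_predict(X)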
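Since iris comes with true labels, the clusterings can be checked against them. A sketch using the adjusted Rand index (ARI), assuming the labels are used only for evaluation and never during fitting:

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import adjusted_rand_score

X, y = load_iris(return_X_y=True)

kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X)
dbscan_labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)

# ARI is 1.0 for a perfect match and close to 0.0 for random labelings
kmeans_ari = adjusted_rand_score(y, kmeans_labels)
print("K-Means ARI:", round(kmeans_ari, 3))

# DBSCAN marks noise points with the label -1
n_clusters = len(set(dbscan_labels)) - (1 if -1 in dbscan_labels else 0)
n_noise = int(np.sum(dbscan_labels == -1))
print("DBSCAN clusters:", n_clusters, "noise points:", n_noise)
```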

#### 4.2 Dimensionality Reduction

In [None]:
# Principal Component Analysis
from sklearn.decomposition import PCA

pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)

# t-SNE
from sklearn.manifold import TSNE

tsne = TSNE(n_components=2, random_state=42)
X_tsne = tsne.fit_transform(X)
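A common way to judge how much information PCA keeps is the `explained_variance_ratio_` attribute. A sketch on iris (t-SNE has no comparable attribute and, unlike PCA, offers only `fit_transform()` with no `transform()` for new data):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)

pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)

# Fraction of the total variance captured by each principal component
print(pca.explained_variance_ratio_.round(3))
print("Total:", pca.explained_variance_ratio_.sum().round(3))
```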

### 5. Model Evaluation

#### 5.1 Cross-Validation

In [None]:
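from sklearn.model_selection import cross_val_score, KFold, StratifiedKFold
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier(random_state=42)

# K-fold cross validation (shuffle, since iris is sorted by class)
kf = KFold(n_splits=5, shuffle=True, random_state=42)
cv_scores = cross_val_score(model, X, y, cv=kf)

# Stratified K-fold (for classification: preserves class proportions in every fold;
# an integer cv such as cv=5 already uses it for classifiers)
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
stratified_scores = cross_val_score(model, X, y, cv=skf)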
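What `cross_val_score` does can be written out as an explicit loop: for each split, train a fresh copy of the model on the training folds and score it on the held-out fold. A sketch assuming `LogisticRegression` on iris, showing that the manual loop and `cross_val_score` agree when given the same splitter:

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# Manual loop: train a fresh copy on 4 folds, evaluate on the held-out fold
manual_scores = []
for train_idx, val_idx in skf.split(X, y):
    fold_model = clone(model)
    fold_model.fit(X[train_idx], y[train_idx])
    manual_scores.append(fold_model.score(X[val_idx], y[val_idx]))

# cross_val_score performs the same procedure with the same splitter
auto_scores = cross_val_score(model, X, y, cv=skf)
print(np.round(manual_scores, 3), np.round(auto_scores, 3))
print("Mean: %.3f, std: %.3f" % (auto_scores.mean(), auto_scores.std()))
```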

#### 5.2 Performance Metrics

In [None]:
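from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Classification metrics (class predictions from the random forest trained above)
y_pred = rf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred, average='weighted')
recall = recall_score(y_test, y_pred, average='weighted')
f1 = f1_score(y_test, y_pred, average='weighted')

# Regression metrics need a continuous target (diabetes dataset)
X_reg, y_reg = load_diabetes(return_X_y=True)
Xr_train, Xr_test, yr_train, yr_test = train_test_split(X_reg, y_reg, test_size=0.2, random_state=42)
yr_pred = LinearRegression().fit(Xr_train, yr_train).predict(Xr_test)
mse = mean_squared_error(yr_test, yr_pred)
r2 = r2_score(yr_test, yr_pred)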
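`confusion_matrix` and `classification_report` are imported above but not used. A sketch of both, assuming a random forest on the iris split:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

iris = load_iris()
X, y = iris.data, iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
y_pred = RandomForestClassifier(random_state=42).fit(X_train, y_train).predict(X_test)

# Rows are true classes, columns are predicted classes;
# the diagonal counts correct predictions
cm = confusion_matrix(y_test, y_pred)
print(cm)

# Per-class precision, recall and F1 in a single table
print(classification_report(y_test, y_pred, target_names=iris.target_names))
```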

### 6. Hyperparameter Tuning

#### 6.1 Grid Search

In [None]:
from sklearn.model_selection import GridSearchCV

param_grid = {
    'C': [0.1, 1, 10, 100],
    'gamma': [0.001, 0.01, 0.1, 1],
    'kernel': ['rbf', 'linear']
}

grid_search = GridSearchCV(SVC(), param_grid, cv=5, scoring='accuracy')
grid_search.fit(X_train, y_train)

best_params = grid_search.best_params_
best_score = grid_search.best_score_
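`best_score_` is a cross-validation score on the training data. A sketch of the usual final step, assuming a smaller grid for speed: with `refit=True` (the default) the best configuration is retrained on all of `X_train` and exposed as `best_estimator_`, which is then evaluated once on the untouched test set.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

param_grid = {"C": [0.1, 1, 10], "kernel": ["rbf", "linear"]}
grid_search = GridSearchCV(SVC(), param_grid, cv=5, scoring="accuracy")
grid_search.fit(X_train, y_train)

# Best configuration, retrained on all of X_train
best_model = grid_search.best_estimator_
print(grid_search.best_params_, round(grid_search.best_score_, 3))

# The test set is used exactly once, after tuning is finished
test_acc = best_model.score(X_test, y_test)
print("Test accuracy:", round(test_acc, 3))
```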

#### 6.2 Random Search

In [None]:
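from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import uniform

# uniform(loc, scale) samples from the interval [loc, loc + scale]
param_distributions = {
    'C': uniform(0.1, 10),        # C in [0.1, 10.1]
    'gamma': uniform(0.001, 1),   # gamma in [0.001, 1.001]
    'kernel': ['rbf', 'linear']
}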

random_search = RandomizedSearchCV(
    SVC(), param_distributions, n_iter=100, cv=5, random_state=42
)
random_search.fit(X_train, y_train)
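For scale-type parameters such as `C` and `gamma`, a log-uniform distribution is often preferred over a uniform one because it samples each order of magnitude equally often. A sketch using `scipy.stats.loguniform` (available in SciPy 1.4 and later); the ranges and `n_iter=20` are illustrative assumptions:

```python
from scipy.stats import loguniform
from sklearn.datasets import load_iris
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# loguniform(a, b) samples evenly across orders of magnitude between a and b
param_distributions = {
    "C": loguniform(1e-2, 1e2),
    "gamma": loguniform(1e-4, 1e0),
}

random_search = RandomizedSearchCV(
    SVC(kernel="rbf"), param_distributions, n_iter=20, cv=5, random_state=42
)
random_search.fit(X, y)
print(random_search.best_params_)
```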

### 7. Pipeline and Feature Union

#### 7.1 Pipeline

In [None]:
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier

pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('classifier', RandomForestClassifier(random_state=42))
])

pipeline.fit(X_train, y_train)
y_pred = pipeline.predict(X_test)
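A pipeline behaves like a single estimator, so it can be passed directly to cross-validation and hyperparameter search. A sketch assuming an SVC step named `svc`: the scaler is refit inside each training fold, and step parameters are addressed with the `<step name>__<parameter>` syntax.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("svc", SVC()),
])

# The scaler is refit on each training fold only, so the validation
# fold never leaks into preprocessing
scores = cross_val_score(pipe, X, y, cv=5)

# Parameters of a step are addressed as <step name>__<parameter name>
search = GridSearchCV(pipe, {"svc__C": [0.1, 1, 10]}, cv=5)
search.fit(X, y)
print(scores.mean().round(3), search.best_params_)
```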

#### 7.2 Feature Union

In [None]:
from sklearn.pipeline import FeatureUnion
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest

feature_union = FeatureUnion([
    ('pca', PCA(n_components=3)),
    ('select', SelectKBest(k=3))
])

X_transformed = feature_union.fit_transform(X, y)
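`FeatureUnion` concatenates the outputs of its transformers column-wise. A sketch confirming this on iris, with `f_classif` (the default scoring function of `SelectKBest`) written out explicitly:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.pipeline import FeatureUnion

X, y = load_iris(return_X_y=True)

feature_union = FeatureUnion([
    ("pca", PCA(n_components=3)),
    ("select", SelectKBest(score_func=f_classif, k=3)),
])

# 3 PCA components + 3 selected original features = 6 columns
X_transformed = feature_union.fit_transform(X, y)
print(X.shape, "->", X_transformed.shape)
```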

### 8. Ensemble Methods

#### 8.1 Voting Classifier

In [None]:
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier

estimators = [
    ('lr', LogisticRegression(max_iter=1000, random_state=42)),
    ('svc', SVC(random_state=42)),
    ('rf', RandomForestClassifier(random_state=42))
]

voting_clf = VotingClassifier(estimators=estimators, voting='hard')
voting_clf.fit(X_train, y_train)
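With `voting='soft'` the ensemble averages predicted class probabilities instead of counting votes. Every member then needs `predict_proba`, which for `SVC` means setting `probability=True`. A sketch assuming the same three models on the iris split:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

estimators = [
    ("lr", LogisticRegression(max_iter=1000, random_state=42)),
    # SVC needs probability=True to provide predict_proba for soft voting
    ("svc", SVC(probability=True, random_state=42)),
    ("rf", RandomForestClassifier(random_state=42)),
]

# Soft voting averages the predicted class probabilities
soft_clf = VotingClassifier(estimators=estimators, voting="soft")
soft_clf.fit(X_train, y_train)
proba = soft_clf.predict_proba(X_test)
soft_acc = soft_clf.score(X_test, y_test)
print("Soft voting test accuracy:", round(soft_acc, 3))
```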

#### 8.2 Bagging and Boosting

In [None]:
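from sklearn.ensemble import BaggingClassifier, AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

# Bagging: many deep trees trained on bootstrap samples
bagging = BaggingClassifier(
    DecisionTreeClassifier(), n_estimators=100, random_state=42
)
bagging.fit(X_train, y_train)

# AdaBoost: shallow trees (decision stumps) trained sequentially;
# a fully grown tree fits the training data perfectly and stops boosting early
adaboost = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=1), n_estimators=100, random_state=42
)
adaboost.fit(X_train, y_train)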
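A small comparison of the two approaches with cross-validation, assuming iris and 50 estimators each. Bagging mainly reduces variance by averaging deep trees, while boosting mainly reduces bias by fitting weak learners one after another, each focusing on earlier mistakes:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Bagging: deep trees on bootstrap samples, predictions combined by voting
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50, random_state=42)
# Boosting: decision stumps fit sequentially on reweighted samples
adaboost = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1), n_estimators=50, random_state=42)

results = {}
for name, clf in [("bagging", bagging), ("adaboost", adaboost)]:
    results[name] = cross_val_score(clf, X, y, cv=5).mean()
    print(name, round(results[name], 3))
```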

### 9. Model Persistence

In [None]:
import joblib

# Save a trained model (here the random forest `rf` trained above)
joblib.dump(rf, 'model.pkl')

# Load the model
loaded_model = joblib.load('model.pkl')
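A round-trip check is a cheap way to confirm that a saved model restores correctly. The sketch below writes to a temporary directory (an assumption, to avoid leaving files behind). joblib files are pickle-based, so only load them from trusted sources, ideally with the same scikit-learn version that saved them.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(random_state=42).fit(X, y)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model.joblib")
    joblib.dump(model, path)
    loaded_model = joblib.load(path)

# The restored model should produce identical predictions
same_predictions = np.array_equal(model.predict(X), loaded_model.predict(X))
print(same_predictions)
```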

### 10. Best Practices

1. **Data Splitting**: Always hold out a test set; use a separate validation set (or cross-validation) for model selection
2. **Cross-Validation**: Use CV for a more reliable performance estimate than a single split
3. **Feature Scaling**: Scale features, especially for distance-based algorithms, and fit the scaler on the training data only
4. **Hyperparameter Tuning**: Use grid search or random search
5. **Ensemble Methods**: Consider an ensemble instead of a single model
6. **Model Interpretability**: Start with simple models and add complexity only when needed
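The data-splitting practice above can be implemented with two calls to `train_test_split`. A sketch assuming a 60/20/20 train/validation/test ratio on iris:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# First split off a test set (20%) that stays untouched until the very end
X_temp, X_test, y_temp, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Then split the remainder into training (75%) and validation (25%),
# giving a 60/20/20 split overall
X_train, X_val, y_train, y_val = train_test_split(
    X_temp, y_temp, test_size=0.25, random_state=42, stratify=y_temp
)
print(len(X_train), len(X_val), len(X_test))
```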

## Next Steps

After completing this lesson:
1. **Deep Learning**: TensorFlow/Keras or PyTorch
2. **Advanced ML**: XGBoost, LightGBM, CatBoost
3. **AutoML**: Auto-sklearn, H2O AutoML
4. **MLOps**: Model deployment and monitoring

## Resources

- [Scikit-learn Documentation](https://scikit-learn.org/stable/)
- [Scikit-learn User Guide](https://scikit-learn.org/stable/user_guide.html)
- [Scikit-learn Examples](https://scikit-learn.org/stable/auto_examples/)
- [Hands-On Machine Learning](https://github.com/ageron/handson-ml2) 