### Inductive vs Deductive Reasoning
Inductive reasoning involves deriving general principles or patterns from specific observations. For instance, if a person visits several restaurants and notices that all of them serve pizza, they might generalize that pizza is a popular dish in that area. Such a conclusion is probable rather than certain: a single restaurant without pizza would weaken it.

Deductive reasoning, on the other hand, is the process of reaching a specific conclusion based on general principles or premises. For example, if it is known that all birds have feathers, and a robin is a bird, then one can deduce that a robin has feathers. If the premises are true, the conclusion must be true as well.
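
The contrast between the two can be sketched in a few lines of Python. This is only a toy illustration: the function names and the menu data are hypothetical and chosen to mirror the restaurant and robin examples.

```python
def deduce_has_feathers(animal, known_birds):
    """Deduction: premise 'all birds have feathers' plus 'animal is a bird'.

    Returns True when the conclusion follows from the premises,
    and None when the premises say nothing about this animal.
    """
    return True if animal in known_birds else None


def induce_everyone_serves(dish, observed_menus):
    """Induction: generalize from the menus seen so far (a provisional claim)."""
    return all(dish in menu for menu in observed_menus)


# Deductive step: the conclusion is guaranteed by the premises.
robin_conclusion = deduce_has_feathers("robin", {"robin", "sparrow"})

# Inductive step: the generalization holds only for the observations so far.
observed_menus = [{"pizza", "pasta"}, {"pizza", "salad"}, {"pizza", "burgers"}]
before = induce_everyone_serves("pizza", observed_menus)

# One new observation is enough to overturn the generalization.
observed_menus.append({"sushi", "ramen"})
after = induce_everyone_serves("pizza", observed_menus)

print(robin_conclusion, before, after)
```

The deductive result never changes once the premises are fixed, while the inductive result depends on which observations have been collected.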

In [46]:
import pandas as pd
from sklearn.model_selection import train_test_split, GridSearchCV, RandomizedSearchCV
from sklearn.preprocessing import OneHotEncoder, LabelEncoder
from sklearn.compose import ColumnTransformer
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier

def load_data(filepath):
    return pd.read_csv(filepath)

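def preprocess_data(df):
    # Categorical columns to one-hot encode. The names must match the CSV
    # header exactly (including the spelling 'martial-status').
    categorical_features = ['workclass', 'education', 'martial-status', 'occupation', 'relationship', 'race', 'sex', 'native-country']
    # Separate the features from the target column.
    X = df.drop('income', axis=1)
    y = df['income']
    # One-hot encode the categorical columns; the remaining columns pass through unchanged.
    preprocessor = ColumnTransformer(transformers=[('cat', OneHotEncoder(), categorical_features)], remainder='passthrough')
    X = preprocessor.fit_transform(X)
    return X, y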

def decision_tree_model(X_train, y_train):
    param_grid = {'max_depth': [None, 10, 20, 30], 'min_samples_split': [2, 5, 10]}
    grid = GridSearchCV(DecisionTreeClassifier(random_state=42), param_grid, cv=5)
    grid.fit(X_train, y_train)
    return grid.best_estimator_

def random_forest_model(X_train, y_train):
    param_grid = {'n_estimators': [50, 100, 200], 'max_depth': [None, 10, 20], 'min_samples_leaf': [1, 2, 4]}
    grid = RandomizedSearchCV(RandomForestClassifier(random_state=42), param_grid, n_iter=10, cv=5, random_state=42)
    grid.fit(X_train, y_train)
    return grid.best_estimator_

def xgboost_model(X_train, y_train):
    # XGBoost expects integer class labels, so encode the income strings first.
    # The encoder stays local, so the returned model predicts integer codes, not the original labels.
    label_encoder = LabelEncoder()
    y_train_encoded = label_encoder.fit_transform(y_train)
    param_grid = {'max_depth': [3, 5, 7], 'learning_rate': [0.01, 0.1, 0.2], 'n_estimators': [50, 100, 200]}
    grid = GridSearchCV(XGBClassifier(eval_metric='logloss', random_state=42), param_grid, cv=5)
    grid.fit(X_train, y_train_encoded)
    return grid.best_estimator_

# Load and preprocess data
df = load_data('adult.csv')
X, y = preprocess_data(df)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train models
dt_model = decision_tree_model(X_train, y_train)
rf_model = random_forest_model(X_train, y_train)
xgb_model = xgboost_model(X_train, y_train)

print(dt_model)
print(rf_model)
print(xgb_model)

DecisionTreeClassifier(max_depth=10, min_samples_split=10, random_state=42)
RandomForestClassifier(min_samples_leaf=4, n_estimators=200, random_state=42)
XGBClassifier(base_score=None, booster=None, callbacks=None,
              colsample_bylevel=None, colsample_bynode=None,
              colsample_bytree=None, device=None, early_stopping_rounds=None,
              enable_categorical=False, eval_metric='logloss',
              feature_types=None, gamma=None, grow_policy=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=0.2, max_bin=None, max_cat_threshold=None,
              max_cat_to_onehot=None, max_delta_step=None, max_depth=3,
              max_leaves=None, min_child_weight=None, missing=nan,
              monotone_constraints=None, multi_strategy=None, n_estimators=200,
              n_jobs=None, num_parallel_tree=None, random_state=42, ...)
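
To put the three searches in perspective, the sketch below counts how many candidate settings and cross-validation fits each one involves. It reuses the parameter grids from the code above; the variable names (`dt_grid`, `rf_fits`, and so on) are introduced here for illustration only.

```python
from sklearn.model_selection import ParameterGrid

# Parameter grids copied from the three model functions.
dt_grid = {'max_depth': [None, 10, 20, 30], 'min_samples_split': [2, 5, 10]}
rf_grid = {'n_estimators': [50, 100, 200], 'max_depth': [None, 10, 20], 'min_samples_leaf': [1, 2, 4]}
xgb_grid = {'max_depth': [3, 5, 7], 'learning_rate': [0.01, 0.1, 0.2], 'n_estimators': [50, 100, 200]}

cv = 5       # folds used by every search
n_iter = 10  # candidates sampled by RandomizedSearchCV for the random forest

# GridSearchCV tries every combination in the grid.
dt_candidates = len(ParameterGrid(dt_grid))
xgb_candidates = len(ParameterGrid(xgb_grid))

# RandomizedSearchCV samples at most n_iter combinations from the grid.
rf_candidates = min(n_iter, len(ParameterGrid(rf_grid)))

# Each candidate is fitted once per fold (the final refit on all data is extra).
dt_fits = dt_candidates * cv
rf_fits = rf_candidates * cv
xgb_fits = xgb_candidates * cv

print(dt_fits, rf_fits, xgb_fits)
```

Because the random forest search samples only 10 of its 27 combinations, the setting it reports is the best of those sampled, not necessarily the best in the whole grid.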


#### Decision Tree Explanation

    Max Depth (10): This means that the decision tree can have a maximum depth of 10 levels. Imagine a tree-like structure where each level represents a decision based on a feature. A depth of 10 implies that the tree can make up to 10 sequential decisions before reaching a prediction.

    Min Samples Split (10): This parameter specifies that a node in the tree must have at least 10 samples to be eligible for further splitting. It helps control the complexity of the tree by preventing splits on nodes with too few samples, which could lead to overfitting.
    
    Random State (42): This ensures that the random process used to build the tree is reproducible. In other words, if you run the model with the same random state multiple times, you'll get the same results each time.
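
The three settings can be checked directly on a fitted tree. The sketch below uses a synthetic dataset from `make_classification` as a stand-in (an assumption; it is not the `adult.csv` data) and inspects the tree's internal arrays.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption: any tabular classification set would do).
X, y = make_classification(n_samples=2000, n_features=10, n_informative=5, random_state=0)

tree = DecisionTreeClassifier(max_depth=10, min_samples_split=10, random_state=42)
tree.fit(X, y)

# max_depth: the fitted tree is never deeper than the limit.
depth = tree.get_depth()

# min_samples_split: every node that was split held at least 10 samples.
t = tree.tree_
is_split = t.children_left != -1  # leaves have children_left == -1
smallest_split_node = t.n_node_samples[is_split].min()

# random_state: refitting with the same seed gives the same predictions.
tree_again = DecisionTreeClassifier(max_depth=10, min_samples_split=10, random_state=42).fit(X, y)
same_predictions = np.array_equal(tree.predict(X), tree_again.predict(X))

print(depth, smallest_split_node, same_predictions)
```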

#### XGBoost Model Explanation

    Learning Rate (0.2): This is the step size shrinkage used in the gradient boosting process. A learning rate of 0.2 means that each new tree's output is multiplied by 0.2 before it is added to the ensemble, so each tree corrects only part of the remaining error. Smaller values help prevent overfitting but usually require more trees.

    Max Depth (3): Similar to the decision tree, this sets the maximum depth of each individual tree in the ensemble. With a max depth of 3, each tree can only make up to 3 sequential decisions before reaching a prediction.

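    Number of Estimators (200): This sets the number of boosting rounds, and therefore the number of trees in the ensemble. Unlike a random forest, where trees are built independently, each boosted tree is fitted to correct the errors of the trees before it.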
    
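    Eval Metric ('logloss'): Logarithmic loss ('logloss') is the metric XGBoost reports when it evaluates the model on validation data. It measures how well the predicted probabilities match the actual labels, penalizing confident wrong predictions most heavily. Because no evaluation set is passed to fit here, the setting does not influence the grid search, which scores candidates by accuracy by default.

    Random State (42): Ensures reproducibility of results, similar to the other models.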
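
Two of these ideas are easy to work through by hand: log loss and learning-rate shrinkage. The sketch below is a toy in plain Python, not XGBoost's implementation. The helper names `log_loss` and `boost_constant` are hypothetical, and the "tree" in each boosting round is reduced to a single constant so the arithmetic stays visible.

```python
import math


def log_loss(y_true, p_pred):
    """Mean binary log loss; p_pred holds predicted probabilities of class 1."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


def boost_constant(target, learning_rate, n_rounds):
    """Toy boosting on one target value with squared error.

    Each round 'fits' the current residual exactly, then adds only
    learning_rate times that fit to the running prediction.
    """
    prediction = 0.0
    history = []
    for _ in range(n_rounds):
        residual = target - prediction          # what the next learner should fit
        prediction += learning_rate * residual  # shrink the learner's contribution
        history.append(prediction)
    return history


# Log loss: predicting 0.5 for everything costs ln(2); confident correct predictions cost less.
uninformative = log_loss([1, 0], [0.5, 0.5])
confident = log_loss([1, 0], [0.9, 0.1])

# Shrinkage: with a learning rate of 0.2, each round closes 20% of the remaining gap.
history = boost_constant(target=1.0, learning_rate=0.2, n_rounds=3)

print(uninformative, confident, history)
```

With a learning rate of 0.2 the running prediction moves toward the target in steps of roughly 0.2, 0.36, and 0.488, which is why lower learning rates generally need more boosting rounds.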

#### Random Forest Model Explanation

    Min Samples Leaf (4): In a random forest, each decision tree is trained on a bootstrap sample of the data (rows drawn with replacement). This parameter specifies the minimum number of samples required at a leaf node of each individual tree. Setting it to 4 means that a split is only allowed if it leaves at least 4 samples in each resulting branch.

    Number of Estimators (200): Random forests consist of an ensemble of decision trees. This parameter defines how many trees are in the forest. Having 200 trees means that the model aggregates predictions from 200 individual decision trees.
    
    Random State (42): As with the decision tree, this ensures reproducibility of results across different runs.
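
As a rough check of these settings, the sketch below fits a forest with the same parameters on synthetic data (an assumption; `make_classification` stands in for `adult.csv`) and inspects the individual trees through scikit-learn's `estimators_` attribute.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data (assumption: not the notebook's dataset).
X, y = make_classification(n_samples=500, n_features=10, n_informative=5, random_state=0)

forest = RandomForestClassifier(n_estimators=200, min_samples_leaf=4, random_state=42)
forest.fit(X, y)

# n_estimators: the fitted forest holds one tree per estimator.
n_trees = len(forest.estimators_)

# min_samples_leaf: find the smallest leaf over all trees, as counted by n_node_samples.
smallest_leaf = min(
    est.tree_.n_node_samples[est.tree_.children_left == -1].min()
    for est in forest.estimators_
)

# Aggregation: the forest's class probabilities are the average over its trees.
tree_average = np.mean([est.predict_proba(X) for est in forest.estimators_], axis=0)
matches_forest = np.allclose(tree_average, forest.predict_proba(X))

print(n_trees, smallest_leaf, matches_forest)
```

Averaging the per-tree probabilities, rather than counting hard votes, is how scikit-learn combines the trees for classification.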