Second-Round Code

**Overview**

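This notebook builds a feed-forward neural network for binary classification: predicting whether a customer responds to an insurance cross-selling offer (the `Response` column of the Kaggle Playground Series S4E7 data).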

**1. Import Libraries**

Purpose: 
- Import the libraries needed for data manipulation, preprocessing, and building the neural network.

In [None]:
import pandas as pd
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
import tensorflow as tf


**2. Data Loading**

Purpose: 
- Load the training and test datasets.

In [3]:
# Data loading
train_data = pd.read_csv('/kaggle/input/playground-series-s4e7/train.csv')
test_data = pd.read_csv('/kaggle/input/playground-series-s4e7/test.csv')

**3. Preprocessing**

Purpose:
- Scale numerical features and one-hot encode categorical features.
- Prepare data for training.

In [None]:
# Feature and target separation
X_train = train_data.drop('Response', axis=1)
y_train = train_data['Response']
X_test = test_data  # Assuming test_data does not contain 'Response' column

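# Select the categorical and numerical features to preprocess.
# Columns not listed here (such as 'id') are dropped, because
# ColumnTransformer defaults to remainder='drop'.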
categorical_features = ['Region_Code', 'Vehicle_Age', 'Vehicle_Damage', 'Policy_Sales_Channel']
numerical_features = ['Age', 'Driving_License', 'Previously_Insured', 'Annual_Premium', 'Vintage']

preprocessor = ColumnTransformer(
    transformers=[
        ('num', StandardScaler(), numerical_features),
        ('cat', OneHotEncoder(handle_unknown='ignore'), categorical_features)
    ])


# Applying the preprocessing pipeline
X_train_preprocessed = preprocessor.fit_transform(X_train)
X_test_preprocessed = preprocessor.transform(X_test)
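
To illustrate what the fitted transformer does, here is a minimal sketch on toy data. The frames `toy_train` and `toy_test` and their values are hypothetical, not taken from the competition files, and `sparse_threshold=0.0` is set only so that the result is a dense array that is easy to inspect. The point it shows: scaling statistics and category lists are learned from the training frame alone, and a category that appears only in the test frame is encoded as all zeros because of `handle_unknown='ignore'`.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder

# Toy frames with hypothetical values (assumption: not real competition data)
toy_train = pd.DataFrame({
    "Age": [20, 40, 60],
    "Vehicle_Age": ["< 1 Year", "1-2 Year", "> 2 Years"],
})
toy_test = pd.DataFrame({
    "Age": [40, 80],
    "Vehicle_Age": ["1-2 Year", "unseen"],  # "unseen" never occurs in toy_train
})

toy_preprocessor = ColumnTransformer(
    transformers=[
        ("num", StandardScaler(), ["Age"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["Vehicle_Age"]),
    ],
    sparse_threshold=0.0,  # always return a dense array (for readability only)
)

# Fit on the training frame only, then reuse the fitted statistics on the test frame
toy_train_out = toy_preprocessor.fit_transform(toy_train)
toy_test_out = toy_preprocessor.transform(toy_test)

# One scaled column plus one column per category seen during fit
print(toy_test_out.shape)
# The row with the unseen category has zeros in every one-hot column
print(toy_test_out[1, 1:])
```

An Age of 40 maps to 0 in the test frame because 40 is the mean of the training ages, not of the test ages.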

**4. Model Definition**

Purpose:
- Define a neural network with three hidden layers and one output layer.
- Compile the model with loss and evaluation metrics.

In [4]:
# Neural network model configuration
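model = Sequential([
    # Explicit input layer; passing input_dim to Dense triggers a Keras warning
    tf.keras.Input(shape=(X_train_preprocessed.shape[1],)),
    Dense(128, activation='relu'),
    Dense(64, activation='relu'),
    Dense(32, activation='relu'),
    Dense(1, activation='sigmoid')  # outputs the probability of Response == 1
])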

model.compile(optimizer='sgd', loss='binary_crossentropy', metrics=['accuracy', tf.keras.metrics.AUC()])
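
As a rough size check, the number of trainable parameters in a stack of fully connected layers can be worked out by hand: each layer contributes `inputs * units` weights plus `units` biases. The sketch below uses a hypothetical helper, `dense_param_count`, and an assumed input width of 10; the real width is `X_train_preprocessed.shape[1]`, which depends on how many one-hot columns the encoder produces.

```python
def dense_param_count(input_width, layer_widths):
    """Count weights and biases for a stack of fully connected layers."""
    total = 0
    previous = input_width
    for units in layer_widths:
        total += previous * units + units  # weight matrix plus bias vector
        previous = units
    return total

# Layer widths from the model above; input width of 10 is an assumption
example_params = dense_param_count(10, [128, 64, 32, 1])
print(example_params)  # 11777
```

With a fixed set of hidden layers, only the first term grows with the input width, so a wide one-hot encoding mainly inflates the first layer.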

**5. Model Training**

Purpose:
- Train the model on the preprocessed data for five epochs with a batch size of 4096, holding out the last 10% of the rows for validation.

In [None]:
# Model training
model.fit(X_train_preprocessed, y_train, epochs=5, batch_size=4096, validation_split=0.1)

Epoch 1/5
2528/2528 ━━━━━━━━━━━━━━━━━━━━ 83s 32ms/step - accuracy: 0.8446 - auc: 0.7089 - loss: 0.3733 - val_accuracy: 0.8774 - val_auc: 0.8492 - val_loss: 0.2686
Epoch 2/5
2528/2528 ━━━━━━━━━━━━━━━━━━━━ 81s 32ms/step - accuracy: 0.8770 - auc: 0.8501 - loss: 0.2684 - val_accuracy: 0.8774 - val_auc: 0.8538 - val_loss: 0.2656
Epoch 3/5
2528/2528 ━━━━━━━━━━━━━━━━━━━━ 78s 30ms/step - accuracy: 0.8770 - auc: 0.8537 - loss: 0.2663 - val_accuracy: 0.8774 - val_auc: 0.8559 - val_loss: 0.2645
Epoch 4/5
2528/2528 ━━━━━━━━━━━━━━━━━━━━ 78s 31ms/step - accuracy: 0.8771 - auc: 0.8557 - loss: 0.2650 - val_accuracy: 0.8774 - val_auc: 0.8573 - val_loss: 0.2638
Epoch 5/5
2528/2528 ━━━━━━━━━━━━━━━━━━━━ 79s 31ms/step - accuracy: 0.8767 - auc: 0.8567 - loss: 0.2650 - val_accuracy: 0.8774 - val_auc: 0.8583 - val_loss: 0.2632

<keras.src.callbacks.history.History at 0x7e7ce697fb20>
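
In the training log, `val_accuracy` stays at 0.8774 in every epoch while `val_auc` keeps rising. One plausible reading, not verified here, is that almost no prediction crosses the 0.5 threshold, so accuracy simply reflects the share of the majority class, whereas AUC measures how well the scores rank positives above negatives. The sketch below illustrates that difference on made-up labels and scores; `rank_auc` is a hypothetical helper written for this example, not part of Keras or scikit-learn.

```python
import numpy as np

def rank_auc(y_true, y_score):
    """AUC as the chance that a random positive outscores a random negative.

    Ties count as half. Uses all positive/negative pairs, so it is only
    suitable for small illustrations, not for millions of rows.
    """
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score, dtype=float)
    pos = y_score[y_true == 1]
    neg = y_score[y_true == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

# Made-up example: 8 negatives, 2 positives, every score below 0.5
y_true = np.array([0, 0, 0, 0, 0, 0, 0, 1, 0, 1])
y_score = np.array([0.05, 0.10, 0.08, 0.20, 0.12, 0.30, 0.15, 0.40, 0.35, 0.45])

# Thresholding at 0.5 predicts class 0 everywhere, so accuracy equals the
# share of negatives, even though the ranking of the scores is perfect
accuracy = np.mean((y_score >= 0.5).astype(int) == y_true)
auc = rank_auc(y_true, y_score)
print(accuracy, auc)
```

Here accuracy equals the negative-class share (0.8) while the AUC is 1.0, which is why AUC is the more informative number to watch when the classes are imbalanced.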

**6. Prediction and Submission**

Purpose: 
- Generate predictions for the test data and save results in a submission file.

In [None]:
final_predictions = model.predict(X_test_preprocessed, batch_size=200000).flatten()

# Creating and saving the results DataFrame
submission_df = pd.DataFrame({
    'id': test_data['id'],
    'Response': final_predictions
})
submission_df.to_csv('/kaggle/working/submission.csv', index=False)


[1m39/39[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m27s[0m 686ms/step
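
Before uploading, it can help to sanity-check the submission frame. The sketch below is one possible set of checks, under the assumption that the expected format is exactly an `id` column and a `Response` column holding probabilities; `check_submission` and the example frames are hypothetical and not part of the original notebook.

```python
import numpy as np
import pandas as pd

def check_submission(df):
    """Return a list of problems found in a submission frame (empty if none)."""
    problems = []
    if list(df.columns) != ["id", "Response"]:
        problems.append("unexpected columns")
        return problems  # the remaining checks need both columns
    if df["id"].duplicated().any():
        problems.append("duplicate ids")
    if df["Response"].isna().any():
        problems.append("missing predictions")
    # Sigmoid outputs should lie in [0, 1]; NaN values are ignored here
    # because they are already reported by the check above
    valid = df["Response"].dropna()
    if not valid.between(0.0, 1.0).all():
        problems.append("predictions outside [0, 1]")
    return problems

# Hypothetical frames used only to exercise the checks
good = pd.DataFrame({"id": [1, 2, 3], "Response": [0.10, 0.85, 0.40]})
bad = pd.DataFrame({"id": [1, 1, 3], "Response": [0.10, np.nan, 1.50]})

print(check_submission(good))
print(check_submission(bad))
```

In the notebook, the same function could be called on `submission_df` just before `to_csv`.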
