<a href="https://colab.research.google.com/github/JoacoUnab/MODELO-1-ENTREGA-2/blob/main/Modulo_2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>



*   Data description




In [22]:
import pandas as pd
import matplotlib.pyplot as plt

# Read the dataset, which uses a semicolon as the separator
data = pd.read_csv('data.csv', sep=';')


# Print the column names
print("\nColumn names:")
for column in data.columns:
    print(column)

# Number of records
num_records = data.shape[0]

# Number of variables (columns, including the target)
num_variables = data.shape[1]

# Name of the target column; replace it if your dataset uses a different name
target_column = 'Target'

# Fail early with a helpful message if the target column is missing
if target_column not in data.columns:
    print(f"The dataset columns are: {list(data.columns)}")
    raise KeyError(f"Column '{target_column}' does not exist in the dataset. Please check the column name.")

# Number of classes
num_classes = len(data[target_column].unique())

# Show the results
print("\nNumber of records:", num_records)
print("Number of variables:", num_variables)
print("Number of classes:", num_classes)



Column names:
Marital status
Application mode
Application order
Course
Daytime/evening attendance	
Previous qualification
Previous qualification (grade)
Nacionality
Mother's qualification
Father's qualification
Mother's occupation
Father's occupation
Admission grade
Displaced
Educational special needs
Debtor
Tuition fees up to date
Gender
Scholarship holder
Age at enrollment
International
Curricular units 1st sem (credited)
Curricular units 1st sem (enrolled)
Curricular units 1st sem (evaluations)
Curricular units 1st sem (approved)
Curricular units 1st sem (grade)
Curricular units 1st sem (without evaluations)
Curricular units 2nd sem (credited)
Curricular units 2nd sem (enrolled)
Curricular units 2nd sem (evaluations)
Curricular units 2nd sem (approved)
Curricular units 2nd sem (grade)
Curricular units 2nd sem (without evaluations)
Unemployment rate
Inflation rate
GDP
Target

Number of records: 4424
Number of variables: 37
Number of classes: 3
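
The class count alone does not show how balanced the classes are. As a minimal sketch, assuming a small made-up DataFrame in place of `data.csv` (the rows and the name `toy` are illustrative, not taken from the dataset), per-class counts and shares can be inspected like this:

```python
import pandas as pd

# Hypothetical toy data standing in for the real dataset
toy = pd.DataFrame({
    "Admission grade": [120.0, 135.5, 110.0, 142.0, 128.5],
    "Target": ["Graduate", "Dropout", "Graduate", "Enrolled", "Graduate"],
})

# Absolute number of rows per class
class_counts = toy["Target"].value_counts()

# Share of each class, useful for spotting imbalance
class_shares = toy["Target"].value_counts(normalize=True)

print(class_counts)
print(class_shares.round(2))
```

On the real data the same two calls would be applied to `data['Target']`.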


*   Importing libraries
*   Data preprocessing
*   Imputation of missing values and encoding of categorical variables





In [69]:
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier, VotingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, f1_score
from sklearn.impute import SimpleImputer

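# Data preprocessing
# Impute missing values with the most frequent value of each column
imputer = SimpleImputer(strategy='most_frequent')
# Note: on a DataFrame with mixed types, fit_transform returns an object array,
# so every column of data_imputed is likely to come back with dtype object
data_imputed = pd.DataFrame(imputer.fit_transform(data), columns=data.columns)

# Encode categorical variables
# (if all dtypes are object, this loop also label-encodes the numeric columns)
label_encoders = {}
for column in data_imputed.select_dtypes(include=['object']).columns:
    if column != 'Target':
        le = LabelEncoder()
        data_imputed[column] = le.fit_transform(data_imputed[column])
        label_encoders[column] = le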
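
One detail worth checking in this step is what happens to column dtypes after imputation. The sketch below assumes a made-up three-column frame (`toy` and its values are hypothetical) and shows one possible way to restore numeric dtypes with `infer_objects()` before selecting object columns for encoding. It is an illustration, not the notebook's own method.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical toy frame mixing numeric columns and a string target
toy = pd.DataFrame({
    "Admission grade": [120.5, np.nan, 130.0, 120.5],
    "Course": [33, 171, 171, np.nan],
    "Target": ["Dropout", "Graduate", "Graduate", "Enrolled"],
})

imputer = SimpleImputer(strategy="most_frequent")
imputed = pd.DataFrame(imputer.fit_transform(toy), columns=toy.columns)

# Dtypes right after imputation, before any repair
dtypes_before = imputed.dtypes.astype(str).to_dict()

# infer_objects() converts object columns that hold only numbers back to numeric
restored = imputed.infer_objects()
dtypes_after = restored.dtypes.astype(str).to_dict()

print(dtypes_before)
print(dtypes_after)
```

With numeric dtypes restored, `select_dtypes(include=['object'])` would pick up only the columns that really hold strings.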



*   Encoding the target variable
*   Splitting the dataset into training and test sets
*   Feature normalization





In [70]:
# Encode the target variable as integer labels
le_target = LabelEncoder()
y = le_target.fit_transform(data_imputed['Target'])

# Split the dataset into training (80%) and test (20%) sets
X_train, X_test, y_train, y_test = train_test_split(data_imputed.drop('Target', axis=1), y, test_size=0.2, random_state=42)

# Normalize the features: fit the scaler on the training set only, then apply it to both sets
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
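
The split above is purely random. When classes are imbalanced, a stratified split keeps the class proportions the same in both sets. A minimal sketch, assuming made-up labels (`X_toy` and `y_toy` are hypothetical) and the `stratify` parameter of `train_test_split`:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced labels: 60% class 0, 30% class 1, 10% class 2
y_toy = np.array([0] * 60 + [1] * 30 + [2] * 10)
X_toy = np.arange(len(y_toy)).reshape(-1, 1)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=42, stratify=y_toy
)

# Class shares in the test split mirror those of the full label vector
test_shares = np.bincount(y_te) / len(y_te)
print(test_shares)
```

Whether stratification changes the scores reported in this notebook is not something the sketch can show. It only illustrates the mechanism.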



*   Training the Random Forest model
*   Training the Gradient Boosting model
*   Training the Decision Tree model
*   Training the SVM model
*   Creating the Voting model



In [71]:
# Train a RandomForestClassifier
model_rf = RandomForestClassifier(random_state=42)
model_rf.fit(X_train, y_train)

# Train a GradientBoostingClassifier
model_gb = GradientBoostingClassifier(random_state=42)
model_gb.fit(X_train, y_train)

# Train a DecisionTreeClassifier
model_dt = DecisionTreeClassifier(random_state=42)
model_dt.fit(X_train, y_train)

# Train an SVM classifier
model_svc = SVC(random_state=42)
model_svc.fit(X_train, y_train)

# Create the VotingClassifier (hard voting: each model casts one vote per sample)
# VotingClassifier fits clones of these estimators, so the fits above are not reused
voting_clf = VotingClassifier(estimators=[('rf', model_rf), ('gb', model_gb), ('dt', model_dt), ('svc', model_svc)], voting='hard')
voting_clf.fit(X_train, y_train)
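
To make `voting='hard'` concrete, here is a minimal NumPy sketch of majority voting. The prediction matrix and the helper `majority_vote` are made up for illustration. Ties go to the smallest class label, which matches the ascending-order tie rule described in the scikit-learn documentation.

```python
import numpy as np

# Hypothetical predictions from four classifiers (rows) for five samples (columns)
predictions = np.array([
    [0, 1, 2, 2, 1],  # e.g. random forest
    [0, 1, 2, 0, 1],  # e.g. gradient boosting
    [1, 1, 0, 0, 2],  # e.g. decision tree
    [0, 2, 2, 2, 2],  # e.g. SVM
])
n_classes = 3

def majority_vote(column):
    """Return the most voted class; ties go to the smallest class label."""
    return int(np.argmax(np.bincount(column, minlength=n_classes)))

hard_votes = [majority_vote(predictions[:, i]) for i in range(predictions.shape[1])]
print(hard_votes)  # [0, 1, 2, 0, 1]
```

With four voters, 2-2 ties like the last two samples are possible, so the tie rule can decide the final label.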



*   10-fold cross-validation





In [74]:
# 10-fold cross-validation on the training set
models = [model_rf, model_gb, model_dt, model_svc, voting_clf]
model_names = ['RandomForest', 'GradientBoosting', 'DecisionTree', 'SVM', 'VotingClassifier']
for model, name in zip(models, model_names):
    cv_scores = cross_val_score(model, X_train, y_train, cv=10, scoring='accuracy')
    print(f"{name} CV Score: {cv_scores.mean()}")

RandomForest CV Score: 0.7739496807029338
GradientBoosting CV Score: 0.7829884284822587
DecisionTree CV Score: 0.682962820697492
SVM CV Score: 0.7668819321073606
VotingClassifier CV Score: 0.7824226564875721
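
The scores above are means over folds computed on features that were scaled once before cross-validation. A possible variant, sketched here on synthetic data from `make_classification` (the data and the names `X_demo`, `y_demo`, `pipeline`, `folds` are assumptions, not part of the notebook), puts the scaler inside a pipeline so it is refit on each fold, and reports the spread alongside the mean:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic three-class data standing in for the real features
X_demo, y_demo = make_classification(
    n_samples=300, n_features=10, n_informative=5,
    n_classes=3, random_state=42,
)

# Scaling inside the pipeline is refit on the training part of every fold
pipeline = make_pipeline(StandardScaler(), SVC(random_state=42))
folds = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_val_score(pipeline, X_demo, y_demo, cv=folds, scoring="accuracy")

print(f"Mean accuracy: {scores.mean():.3f} (std {scores.std():.3f})")
```

The standard deviation gives a rough sense of whether differences between models, such as those between the top three scores above, exceed fold-to-fold noise.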




*   Predictions and evaluation



In [75]:
# Predictions and evaluation on the held-out test set
for model, name in zip(models, model_names):
    y_pred = model.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    f1 = f1_score(y_test, y_pred, average='weighted')
    print(f"\n{name} - Accuracy: {accuracy}, F1-Score: {f1}")


RandomForest - Accuracy: 0.7559322033898305, F1-Score: 0.7384726059789364

GradientBoosting - Accuracy: 0.7615819209039548, F1-Score: 0.7480891952898038

DecisionTree - Accuracy: 0.6779661016949152, F1-Score: 0.6807140038188586

SVM - Accuracy: 0.7570621468926554, F1-Score: 0.739037352194901

VotingClassifier - Accuracy: 0.7638418079096045, F1-Score: 0.7475623053526819
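
The F1 scores above use `average='weighted'`, the mean of per-class F1 values weighted by each class's support. The sketch below uses made-up labels (`y_true` and `y_hat` are hypothetical) to reproduce that number by hand and to print a confusion matrix, which shows which classes are confused with each other.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, f1_score

# Hypothetical labels and predictions for a three-class problem
y_true = np.array([0, 0, 0, 0, 1, 1, 2, 2, 2, 2])
y_hat = np.array([0, 0, 1, 0, 1, 2, 2, 2, 0, 2])

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_true, y_hat)

# Per-class F1, then the support-weighted mean that average='weighted' reports
per_class_f1 = f1_score(y_true, y_hat, average=None)
support = np.bincount(y_true)
weighted_by_hand = float(np.sum(per_class_f1 * support) / support.sum())
weighted_sklearn = f1_score(y_true, y_hat, average="weighted")

print(cm)
print(per_class_f1.round(3), round(weighted_by_hand, 3))
```

In the notebook, the same calls would take `y_test` and each model's `y_pred`.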
