# Implementation

The "Voices Heard" dataset is a collection of reports and complaints submitted by students in a university setting. The reports range from academic grievances to concerns about campus safety, giving university administrators and educators direct feedback on the student experience. This breadth makes the dataset useful for understanding students' needs and concerns and for developing data-driven ways to improve university life.

In [3]:
# Data handling
import numpy as np
import pandas as pd
import string
import re
import nltk
from nltk import pos_tag
from nltk.corpus import stopwords
from nltk.corpus import wordnet
from nltk.stem import WordNetLemmatizer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import VarianceThreshold
from scipy.spatial.distance import cosine
from unidecode import unidecode
from sklearn.tree import plot_tree

# Plotting
import matplotlib.pyplot as plt
import matplotlib.style as style
import seaborn as sns
from wordcloud import WordCloud

# Preprocessing and modeling
from sklearn import svm
from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import confusion_matrix
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.metrics import accuracy_score


# Warning configuration
import warnings
warnings.filterwarnings('ignore')

# NLTK resources: stop-word lists, POS tagger, and WordNet
nltk.download('stopwords')
nltk.download('averaged_perceptron_tagger')
nltk.download('wordnet')

[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\dania\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     C:\Users\dania\AppData\Roaming\nltk_data...
[nltk_data]   Unzipping taggers\averaged_perceptron_tagger.zip.
[nltk_data] Downloading package wordnet to
[nltk_data]     C:\Users\dania\AppData\Roaming\nltk_data...


True

Reading the dataset

In [4]:
data = pd.read_csv("voices_heard.csv")
print(data.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1005 entries, 0 to 1004
Data columns (total 8 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   Genre        1005 non-null   object 
 1   Reports      1005 non-null   object 
 2   Age          1005 non-null   int64  
 3   Gpa          1005 non-null   float64
 4   Year         1005 non-null   int64  
 5   Count        1005 non-null   int64  
 6   Gender       1005 non-null   object 
 7   Nationality  1005 non-null   object 
dtypes: float64(1), int64(3), object(4)
memory usage: 62.9+ KB
None
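
The `info()` summary can also be checked programmatically, so that a changed or truncated CSV is noticed early. The following is a minimal sketch, not part of the original notebook: `EXPECTED_KINDS`, `KIND_CHECKS`, and `check_schema` are hypothetical names, the expected kinds are transcribed from the output above, and the toy frame holds made-up values standing in for the real file.

```python
import pandas as pd
from pandas.api import types as ptypes

# Expected column kinds, transcribed from the info() output above.
EXPECTED_KINDS = {
    "Genre": "text", "Reports": "text", "Age": "integer", "Gpa": "float",
    "Year": "integer", "Count": "integer", "Gender": "text", "Nationality": "text",
}

# Hypothetical mapping from a kind label to a pandas dtype predicate.
KIND_CHECKS = {
    "text": ptypes.is_string_dtype,
    "integer": ptypes.is_integer_dtype,
    "float": ptypes.is_float_dtype,
}

def check_schema(df, expected):
    """Return a list of problems; an empty list means the frame matches."""
    problems = []
    for column, kind in expected.items():
        if column not in df.columns:
            problems.append(f"missing column: {column}")
        elif not KIND_CHECKS[kind](df[column]):
            problems.append(f"{column}: expected {kind}, found {df[column].dtype}")
    return problems

# Toy frame with made-up values (assumption: stands in for the CSV contents).
sample = pd.DataFrame({
    "Genre": ["Academic Support and Resources", "Academic Support and Resources"],
    "Reports": ["Example report A", "Example report B"],
    "Age": [20, 23],
    "Gpa": [3.0, 2.5],
    "Year": [2, 2],
    "Count": [1, 1],
    "Gender": ["F", "M"],
    "Nationality": ["Egypt", "Egypt"],
})

schema_problems = check_schema(sample, EXPECTED_KINDS)
print(schema_problems)
```

With the real file, the same call would presumably be made on `data` right after `pd.read_csv`, while the `Count` column is still present.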


In [7]:
print(data.head())

                            Genre  \
0  Academic Support and Resources   
1  Academic Support and Resources   
2  Academic Support and Resources   
3  Academic Support and Resources   
4  Academic Support and Resources   

                                             Reports  Age   Gpa  Year Gender  \
0  The limited access to research databases and m...   27  2.18     2      M   
1  I'm having trouble finding the course material...   23  3.11     2      F   
2  It's frustrating to have limited access to res...   20  3.68     2      F   
3  I'm really struggling in one of my classes but...   20  1.30     2      F   
4   I am really struggling with understanding the...   26  2.50     2      F   

  Nationality  
0       Egypt  
1       Egypt  
2       Egypt  
3       Egypt  
4       Egypt  


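Removing the Count column. This cell was executed as In [6], before the In [7] preview above, so that preview already omits Count.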

In [6]:
data = data.drop(columns=['Count'])
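
A column can be dropped by position or by name. The sketch below, on a toy frame with made-up values whose column order mirrors the dataset, illustrates why the by-name form is safer when a notebook cell might be re-run: a positional drop removes whichever column currently sits at that index. The variable names are illustrative only.

```python
import pandas as pd

# Toy frame with made-up values; only the column order mirrors the dataset.
toy = pd.DataFrame(
    [["Academic Support and Resources", "Example report", 20, 3.0, 2, 1, "F", "Egypt"]],
    columns=["Genre", "Reports", "Age", "Gpa", "Year", "Count", "Gender", "Nationality"],
)

# By position: a second run silently removes whatever now sits at index 5.
by_position = toy.drop(toy.columns[5], axis=1)
by_position_twice = by_position.drop(by_position.columns[5], axis=1)

# By name: errors="ignore" turns a repeated run into a no-op.
by_name = toy.drop(columns=["Count"], errors="ignore")
by_name_twice = by_name.drop(columns=["Count"], errors="ignore")

print(list(by_position_twice.columns))  # "Gender" has been removed as well
print(list(by_name_twice.columns))      # only "Count" has been removed
```

Without `errors="ignore"`, a repeated by-name drop raises a `KeyError`, which is still preferable to losing an unrelated column unnoticed.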

Showing the remaining columns of the dataset

In [8]:
data[['Genre','Reports', 'Age', 'Gpa', 'Year', 'Gender', 'Nationality']].head()

|   | Genre | Reports | Age | Gpa | Year | Gender | Nationality |
|---|---|---|---|---|---|---|---|
| 0 | Academic Support and Resources | The limited access to research databases and m... | 27 | 2.18 | 2 | M | Egypt |
| 1 | Academic Support and Resources | I'm having trouble finding the course material... | 23 | 3.11 | 2 | F | Egypt |
| 2 | Academic Support and Resources | It's frustrating to have limited access to res... | 20 | 3.68 | 2 | F | Egypt |
| 3 | Academic Support and Resources | I'm really struggling in one of my classes but... | 20 | 1.3 | 2 | F | Egypt |
| 4 | Academic Support and Resources | I am really struggling with understanding the... | 26 | 2.5 | 2 | F | Egypt |


## Data inspection and cleaning

Checking that no column in the dataset has missing or null values

In [9]:
data.isnull().sum()

Genre          0
Reports        0
Age            0
Gpa            0
Year           0
Gender         0
Nationality    0
dtype: int64
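
Note that `isnull()` flags only `NaN`/`None`; a text field that is empty or contains only whitespace is not counted. The following is a minimal sketch of a stricter check, assuming blank strings should also be treated as missing. `count_missing_or_blank` is a hypothetical helper and the toy frame uses made-up values.

```python
import pandas as pd

def count_missing_or_blank(df):
    """Per column, count null entries and string entries that are blank after stripping."""
    rows = {}
    for column in df.columns:
        # True only for str values that are empty or whitespace-only.
        is_blank = df[column].map(lambda v: isinstance(v, str) and v.strip() == "")
        rows[column] = {
            "null": int(df[column].isnull().sum()),
            "blank": int(is_blank.sum()),
        }
    return pd.DataFrame.from_dict(rows, orient="index")

# Toy frame with made-up values: one null and one whitespace-only report.
toy_reports = pd.DataFrame({
    "Reports": ["The library closes too early.", "   ", None],
    "Age": [20, 23, 26],
})

summary = count_missing_or_blank(toy_reports)
print(summary)
```

Applied to the real frame, the call would be `count_missing_or_blank(data)`; a nonzero `blank` count for `Reports` would indicate rows that `isnull().sum()` reports as complete but that carry no text.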

Counting the number of rows in the dataset

In [10]:
total_filas = data.shape[0]
print("Total rows:", total_filas)

Total rows: 1005
