In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score,classification_report, confusion_matrix

csv_path = '/content/Indicadores_municipales_sabana_DA.csv'

# Try UTF-8 first and fall back to Latin-1 if the file is not valid UTF-8.
# 'latin1' and 'ISO-8859-1' are aliases for the same codec in Python, and
# Latin-1 can decode any byte sequence, so a third attempt is never needed.
try:
    df = pd.read_csv(csv_path, encoding='utf-8')
except UnicodeDecodeError:
    df = pd.read_csv(csv_path, encoding='latin1')

This cell imports the libraries used for data management (pandas and numpy), visualization (matplotlib), and machine learning (scikit-learn, including MLPClassifier and the evaluation metrics). TensorFlow is also imported, although the cells shown here do not use it.

The variable csv_path holds the path of the CSV file to read.

The CSV file (Indicadores_municipales_sabana_DA.csv) is read with 'utf-8' first, and if that raises a UnicodeDecodeError the read is retried with 'latin1'. This handles files that contain accented characters saved in a non-UTF-8 encoding. Note that 'latin1' and 'ISO-8859-1' are two names for the same encoding in Python, so a single fallback is enough. The successful read is stored in the DataFrame df.
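The fallback logic can also be written as a small helper that reports which encoding worked. This is a minimal sketch, not part of the original notebook: the function name `read_csv_with_fallback` and the temporary demo file are assumptions made for illustration.

```python
import os
import tempfile

import pandas as pd


def read_csv_with_fallback(path, encodings=("utf-8", "latin1")):
    """Try each encoding in order; return (DataFrame, encoding that worked)."""
    last_error = None
    for encoding in encodings:
        try:
            return pd.read_csv(path, encoding=encoding), encoding
        except UnicodeDecodeError as exc:
            last_error = exc  # remember the failure and try the next encoding
    raise last_error


# Demo file (hypothetical): one accented name saved as Latin-1, not UTF-8
with tempfile.NamedTemporaryFile(suffix=".csv", delete=False) as handle:
    handle.write("nom_mun\nCosío\n".encode("latin1"))
    demo_path = handle.name

demo_df, used_encoding = read_csv_with_fallback(demo_path)
os.remove(demo_path)
print(used_encoding, demo_df["nom_mun"].tolist())
```

Knowing which encoding was used is helpful when accented names such as "Cosío" or "Jesús María" need to be checked after loading.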

In [2]:
df.head()

Unnamed: 0,ent,nom_ent,mun,clave_mun,nom_mun,pobtot_ajustada,pobreza,pobreza_e,pobreza_m,vul_car,...,pobreza_alim_10,pobreza_cap_90,pobreza_cap_00,pobreza_cap_10,pobreza_patrim_90,pobreza_patrim_00,pobreza_patrim_10,gini_90,gini_00,gini_10
0,1,Aguascalientes,1,1001,Aguascalientes,794304,30.531104,2.264478,28.266627,27.98332,...,11.8057,20.4,12.7,18.4746,43.4,33.7,41.900398,0.473,0.425,0.422628
1,1,Aguascalientes,2,1002,Asientos,48592,67.111172,8.040704,59.070468,22.439389,...,21.993299,39.9,29.0,30.980801,64.2,48.9,59.1758,0.379,0.533,0.343879
2,1,Aguascalientes,3,1003,Calvillo,53104,61.360527,7.241238,54.119289,29.428583,...,19.2668,39.5,33.1,28.259199,63.9,57.9,56.504902,0.414,0.465,0.386781
3,1,Aguascalientes,4,1004,Cosío,14101,52.800458,4.769001,48.031458,27.128568,...,14.3032,35.2,21.0,22.386101,59.7,40.1,51.164501,0.392,0.541,0.344984
4,1,Aguascalientes,5,1005,Jesús María,101379,45.338512,6.084037,39.254475,26.262912,...,15.0851,36.6,22.6,22.139999,60.6,42.2,45.703899,0.391,0.469,0.458083


df.head() displays the first five rows of the DataFrame, giving a first look at the columns and the values they contain.

In [3]:
# Suppose 'df' is your DataFrame
dimensions = df.shape

# The first component of the tuple represents the number of rows
number_of_rows = dimensions[0]

# The second component of the tuple represents the number of columns
number_of_columns = dimensions[1]

print("Number of rows:", number_of_rows)
print("Number of columns:", number_of_columns)

Number of rows: 2456
Number of columns: 139


To see the dimensions of the data, the code prints the number of rows and columns of the DataFrame. This gives basic information about the structure of the data set: df has 2456 observations (rows) and 139 features (columns).

In [4]:
# Suppose 'df' is your DataFrame
# To count null values in each column
null_values_per_column = df.isnull().sum()

# To count null values in total in the DataFrame
total_null_values = df.isnull().sum().sum()

print("Null values per column:")
print(null_values_per_column)

print("\nTotal null values in the DataFrame:", total_null_values)

Null values per column:
ent                  0
nom_ent              0
mun                  0
clave_mun            0
nom_mun              0
                    ..
pobreza_patrim_00    3
pobreza_patrim_10    0
gini_90              2
gini_00              3
gini_10              0
Length: 139, dtype: int64

Total null values in the DataFrame: 305


Because the data set contains null values, we first count them and then fill them. The code prints the number of null values per column (null_values_per_column) and the total for the whole DataFrame (total_null_values), which is 305 here. This information about the completeness of the data is needed to decide how to handle the null values in the rest of the analysis.
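With 139 columns, the printed summary is truncated, so it can help to list only the columns that actually contain nulls. The following is a small sketch on a made-up DataFrame (`toy` and its values are illustrative assumptions, not rows from the real file):

```python
import numpy as np
import pandas as pd

# Toy data standing in for the real DataFrame (values are made up)
toy = pd.DataFrame({
    "gini_90": [0.47, np.nan, 0.41, np.nan],
    "gini_00": [0.43, 0.53, np.nan, 0.54],
    "pobreza": [30.5, 67.1, 61.4, 52.8],
})

# Count nulls per column, then keep only columns with at least one null
null_counts = toy.isnull().sum()
columns_with_nulls = null_counts[null_counts > 0].sort_values(ascending=False)
print(columns_with_nulls)
```

Applied to df, the same two lines would show exactly which columns account for the 305 missing values.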

In [5]:
# Suppose 'df' is your DataFrame
# Fill null values in each numeric column with that column's mean.
# numeric_only=True skips text columns, for which a mean cannot be computed.
df.fillna(df.mean(numeric_only=True), inplace=True)


This line of code fills the null values in df with the mean of the corresponding column, which avoids dropping rows that have missing values. A mean exists only for numeric columns, so null values in text columns are left untouched by this step.
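To make the effect of mean imputation concrete, here is a minimal sketch on a two-column toy DataFrame (the data is invented for illustration). It shows the numeric column being filled while the text column keeps its null:

```python
import numpy as np
import pandas as pd

# Toy data: one numeric column and one text column, each with a missing value
toy = pd.DataFrame({
    "gini_00": [0.4, np.nan, 0.6],
    "gdo_rezsoc00": ["Alto", None, "Bajo"],
})

# Means are computed for numeric columns only, so only those are filled
toy = toy.fillna(toy.mean(numeric_only=True))

remaining_nulls = toy.isnull().sum()
print(toy)
print(remaining_nulls)
```

The missing gini_00 value is replaced by the mean of the other two values, and gdo_rezsoc00 still reports one null.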

In [6]:
# Suppose 'df' is your DataFrame
# To count null values in each column
null_values_per_column = df.isnull().sum()

# To count null values in total in the DataFrame
total_null_values = df.isnull().sum().sum()

print("Null values per column:")
print(null_values_per_column)

print("\nTotal null values in the DataFrame:", total_null_values)

Null values per column:
ent                  0
nom_ent              0
mun                  0
clave_mun            0
nom_mun              0
                    ..
pobreza_patrim_00    0
pobreza_patrim_10    0
gini_90              0
gini_00              0
gini_10              0
Length: 139, dtype: int64

Total null values in the DataFrame: 16


The null values are counted again to check that they have been filled. The total drops from 305 to 16, so some remain: mean imputation only applies to numeric columns, and the leftover nulls are in columns for which no mean is computed. We fill these remaining values with 0.

In [7]:
# Suppose 'df' is your DataFrame
df.fillna(0, inplace=True)

Filling with 0 is reasonable when a null value means that the information is absent and you prefer to keep the affected rows and columns rather than remove them. One side effect is that, in a text column, 0 becomes an extra category of its own: the value counts for 'gdo_rezsoc00' further down show 14 rows with the value 0.
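An alternative for text columns is to fill with an explicit placeholder string instead of the number 0, which keeps the column's values all of one type. This is only a sketch of that option, not what the notebook does; the placeholder "Unknown" and the toy data are assumptions.

```python
import pandas as pd

# Toy data: a text column with a missing value next to a complete numeric column
toy = pd.DataFrame({
    "gdo_rezsoc00": ["Alto", None, "Bajo"],
    "gini_00": [0.4, 0.5, 0.6],
})

# Select the non-numeric columns and fill their nulls with a labeled placeholder
text_columns = toy.select_dtypes(exclude="number").columns
toy[text_columns] = toy[text_columns].fillna("Unknown")  # "Unknown" is a hypothetical label

print(toy["gdo_rezsoc00"].tolist())
print("Total nulls:", toy.isnull().sum().sum())
```

Because every value in the column is then a string, a later conversion with astype(str) before label encoding would no longer be needed for this column.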

In [8]:
# Suppose 'df' is your DataFrame
# To count null values in each column
null_values_per_column = df.isnull().sum()

# To count null values in total in the DataFrame
total_null_values = df.isnull().sum().sum()

print("Null values per column:")
print(null_values_per_column)

print("\nTotal null values in the DataFrame:", total_null_values)

Null values per column:
ent                  0
nom_ent              0
mun                  0
clave_mun            0
nom_mun              0
                    ..
pobreza_patrim_00    0
pobreza_patrim_10    0
gini_90              0
gini_00              0
gini_10              0
Length: 139, dtype: int64

Total null values in the DataFrame: 0


In [10]:
from sklearn.preprocessing import LabelEncoder
# 'nom_mun' holds the municipality names and 'nom_ent' holds the state names
label_encoder = LabelEncoder()
df['codified_municipality'] = label_encoder.fit_transform(df['nom_mun'])
# Refitting the same encoder replaces its mapping, so afterwards it can only
# decode the state column
df['encoder_state'] = label_encoder.fit_transform(df['nom_ent'])

The purpose of this encoding process is to convert text labels into numbers so that they can be used as inputs in machine learning algorithms, since most algorithms require numerical data as input. Each unique value in the 'nom_mun' and 'nom_ent' columns is mapped to a corresponding unique number in the new 'codified_municipality' and 'encoder_state' columns.
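The mapping that LabelEncoder builds can be inspected through its `classes_` attribute and reversed with `inverse_transform`. The sketch below keeps one encoder per column so that both mappings stay available; the `encoders` dictionary and the toy rows are illustrative assumptions.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy rows (made up) with the same column names as the real data
toy = pd.DataFrame({
    "nom_ent": ["Aguascalientes", "Aguascalientes", "Baja California"],
    "nom_mun": ["Asientos", "Calvillo", "Ensenada"],
})

# One encoder per column, so each mapping can be decoded later
encoders = {}
for column in ["nom_ent", "nom_mun"]:
    encoders[column] = LabelEncoder()
    toy[column + "_encoded"] = encoders[column].fit_transform(toy[column])

# classes_ lists the labels in sorted order; position i is encoded as integer i
state_classes = list(encoders["nom_ent"].classes_)
decoded_municipality = encoders["nom_mun"].inverse_transform([2])[0]
print(state_classes, decoded_municipality)
```

Note that the integers are assigned in sorted order of the labels and carry no meaning beyond identifying each category.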

In [11]:
# Count how many rows fall into each category of 'gdo_rezsoc00'
unique_categories = df['gdo_rezsoc00'].value_counts()

print("Unique categories in column 'gdo_rezsoc00':")
print(unique_categories)

Unique categories in column 'gdo_rezsoc00':
Alto        695
Muy bajo    660
Medio       484
Bajo        428
Muy alto    175
0            14
Name: gdo_rezsoc00, dtype: int64


The code prints how many times each distinct value occurs in the 'gdo_rezsoc00' column, which shows how the rows are distributed across its categories. The 14 rows with the value 0 are the null values that were filled with 0 earlier.

In [12]:
# Convert the three 'gdo_rezsoc' columns to strings so that every value,
# including the filled-in 0, has the same type
df['gdo_rezsoc00'] = df['gdo_rezsoc00'].astype(str)
df['gdo_rezsoc05'] = df['gdo_rezsoc05'].astype(str)
df['gdo_rezsoc10'] = df['gdo_rezsoc10'].astype(str)
# Create a LabelEncoder object
label_encoder = LabelEncoder()

# Encode each of the three columns into a new '_encoded' column
df['gdo_rezsoc00_encoded'] = label_encoder.fit_transform(df['gdo_rezsoc00'])
df['gdo_rezsoc05_encoded'] = label_encoder.fit_transform(df['gdo_rezsoc05'])
df['gdo_rezsoc10_encoded'] = label_encoder.fit_transform(df['gdo_rezsoc10'])

These lines of code use the LabelEncoder to transform the columns 'gdo_rezsoc00', 'gdo_rezsoc05', and 'gdo_rezsoc10' from strings to encoded numeric values. Unique values in these columns are mapped to corresponding unique integers. The results of this encoding are stored in the new columns 'gdo_rezsoc00_encoded', 'gdo_rezsoc05_encoded' and 'gdo_rezsoc10_encoded' in the DataFrame df. These integers are used to represent the original categories in a numerical format, which is useful for many machine learning algorithms.
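LabelEncoder numbers the categories in sorted (alphabetical) order, so 'Alto' receives a smaller integer than 'Bajo' and 'Muy bajo' receives the largest one. If the natural order of the levels matters to the model, an explicit mapping is one option. The sketch below assumes the order Muy bajo < Bajo < Medio < Alto < Muy alto and uses -1 for anything outside that list; both choices are assumptions for illustration, not part of the original notebook.

```python
import pandas as pd

# Assumed natural order of the levels, from lowest to highest
level_order = ["Muy bajo", "Bajo", "Medio", "Alto", "Muy alto"]
ordinal_map = {label: rank for rank, label in enumerate(level_order)}

# Toy column (made up), including the '0' placeholder left by fillna(0)
toy = pd.Series(["Alto", "Muy bajo", "Medio", "0"], name="gdo_rezsoc00")

# Unmapped values become NaN in map(); -1 marks them as "unknown" (an assumption)
encoded = toy.map(ordinal_map).fillna(-1).astype(int)
print(encoded.tolist())
```

With this mapping, a larger integer always corresponds to a higher level, which is not the case with the alphabetical numbering.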

In [13]:
df.head(10)

Unnamed: 0,ent,nom_ent,mun,clave_mun,nom_mun,pobtot_ajustada,pobreza,pobreza_e,pobreza_m,vul_car,...,pobreza_patrim_00,pobreza_patrim_10,gini_90,gini_00,gini_10,codified_municipality,encoder_state,gdo_rezsoc00_encoded,gdo_rezsoc05_encoded,gdo_rezsoc10_encoded
0,1,Aguascalientes,1,1001,Aguascalientes,794304,30.531104,2.264478,28.266627,27.98332,...,33.7,41.900398,0.473,0.425,0.422628,35,0,5,5,4
1,1,Aguascalientes,2,1002,Asientos,48592,67.111172,8.040704,59.070468,22.439389,...,48.9,59.1758,0.379,0.533,0.343879,124,0,5,5,4
2,1,Aguascalientes,3,1003,Calvillo,53104,61.360527,7.241238,54.119289,29.428583,...,57.9,56.504902,0.414,0.465,0.386781,235,0,5,5,4
3,1,Aguascalientes,4,1004,Cosío,14101,52.800458,4.769001,48.031458,27.128568,...,40.1,51.164501,0.392,0.541,0.344984,430,0,5,5,4
4,1,Aguascalientes,5,1005,Jesús María,101379,45.338512,6.084037,39.254475,26.262912,...,42.2,45.703899,0.391,0.469,0.458083,755,0,5,5,4
5,1,Aguascalientes,6,1006,Pabellón de Arteaga,43783,46.95833,4.141798,42.816532,23.619567,...,47.3,50.8694,0.429,0.466,0.416903,1070,0,5,5,4
6,1,Aguascalientes,7,1007,Rincón de Romos,51284,56.136204,9.163437,46.972767,19.930609,...,50.5,60.598301,0.458,0.486,0.428327,1165,0,5,5,4
7,1,Aguascalientes,8,1008,San José de Gracia,7826,66.619419,7.340833,59.278584,20.860502,...,49.1,53.021801,0.403,0.567,0.387633,1340,0,5,5,4
8,1,Aguascalientes,9,1009,Tepezalá,22027,58.659202,6.033756,52.625448,26.434201,...,47.9,56.040901,0.386,0.53,0.34815,1979,0,5,5,4
9,1,Aguascalientes,10,1010,El Llano,17634,60.638497,9.544885,51.093614,28.606346,...,43.5,56.348099,0.377,0.51,0.323685,519,0,5,5,4


In [15]:
# Remove unwanted columns
columns_to_delete = ['nom_ent', 'nom_mun','gdo_rezsoc00', 'gdo_rezsoc05', 'gdo_rezsoc10']
df.drop(columns_to_delete, axis=1, inplace=True)

After running this code, the columns 'nom_ent', 'nom_mun', 'gdo_rezsoc00', 'gdo_rezsoc05' and 'gdo_rezsoc10' will be deleted from the DataFrame df, leaving the DataFrame with the remaining columns that were not specified in columns_to_delete.

In [16]:
df.head()

Unnamed: 0,ent,mun,clave_mun,pobtot_ajustada,pobreza,pobreza_e,pobreza_m,vul_car,vul_ing,npnv,...,pobreza_patrim_00,pobreza_patrim_10,gini_90,gini_00,gini_10,codified_municipality,encoder_state,gdo_rezsoc00_encoded,gdo_rezsoc05_encoded,gdo_rezsoc10_encoded
0,1,1,1001,794304,30.531104,2.264478,28.266627,27.98332,8.419106,33.066469,...,33.7,41.900398,0.473,0.425,0.422628,35,0,5,5,4
1,1,2,1002,48592,67.111172,8.040704,59.070468,22.439389,5.557604,4.891835,...,48.9,59.1758,0.379,0.533,0.343879,124,0,5,5,4
2,1,3,1003,53104,61.360527,7.241238,54.119289,29.428583,2.921336,6.289554,...,57.9,56.504902,0.414,0.465,0.386781,235,0,5,5,4
3,1,4,1004,14101,52.800458,4.769001,48.031458,27.128568,7.709276,12.361698,...,40.1,51.164501,0.392,0.541,0.344984,430,0,5,5,4
4,1,5,1005,101379,45.338512,6.084037,39.254475,26.262912,8.279864,20.118712,...,42.2,45.703899,0.391,0.469,0.458083,755,0,5,5,4


In [17]:
# Suppose 'df' is your DataFrame
# Use the 'dtypes' attribute to get the column data types
data_types = df.dtypes

# Filters columns that have type 'object', which are commonly used for categorical data
categorical_columns = data_types[data_types == 'object'].index.tolist()

print("Categorical columns:")
print(categorical_columns)

Categorical columns:
[]


The code prints the names of the columns whose dtype is 'object', which is how pandas usually stores text and therefore categorical data. The list is empty, which confirms that no text columns remain, so every column in the data set is now numeric and can be passed to a machine learning model.
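A slightly broader version of this check asks pandas for every column that is not numeric, which would also catch other non-numeric dtypes such as category or datetime columns. This is a minimal sketch on invented data using `DataFrame.select_dtypes`:

```python
import pandas as pd

# Toy data (made up): two numeric columns and one text column
toy = pd.DataFrame({
    "pobreza": [30.5, 67.1],
    "encoder_state": [0, 0],
    "nom_mun": ["Asientos", "Calvillo"],
})

# List every column whose dtype is not numeric
non_numeric_columns = toy.select_dtypes(exclude="number").columns.tolist()
print(non_numeric_columns)

# After dropping the text column, nothing non-numeric is left
toy_numeric = toy.drop(columns=["nom_mun"])
all_numeric = toy_numeric.select_dtypes(exclude="number").empty
print(all_numeric)
```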



In [18]:
# 'N_npnv' is the target (label) column; every other column is used as a feature

# Select features
X = df.drop('N_npnv', axis=1)  # The target column 'N_npnv' is excluded

# Select the label
y = df['N_npnv']

# Split the data set into training and testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In this step, a variable X is created that holds the features of the model. The column 'N_npnv' is removed from the DataFrame df to form X, since that column is the target (label) of the model and must not be included in the features. A variable y is created that holds the label (or target) of the model and contains the values of the 'N_npnv' column of df.

Using scikit-learn's train_test_split function, the data is split into training (X_train, y_train) and test (X_test, y_test) sets. In this case, 20% of the data is used as the test set (test_size=0.2) and random_state=42 is used to make the results reproducible (the number 42 is arbitrary and can be any number).

After these steps, X_train and y_train contain the features and labels of the training set respectively, while X_test and y_test contain the features and labels of the test set respectively.
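To see what an 80/20 split means for a data set of this size, the sketch below runs train_test_split on placeholder arrays with the same number of rows as df (2456). The arrays are filled with zeros and have three made-up columns; only the shapes matter here.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data with the same row count as df; the content is irrelevant here
X_demo = np.zeros((2456, 3))
y_demo = np.zeros(2456)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)
print(X_tr.shape, X_te.shape)  # (1964, 3) (492, 3)
```

The test set therefore has 492 rows, so each correctly classified test example contributes about 0.2% to the reported accuracy.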

In [19]:
# Activation function (step function): returns 1 if the input x is greater
# than or equal to zero, and 0 otherwise. It is the binary activation
# commonly used in perceptrons.
def step_function(x):
    return 1 if x >= 0 else 0

# Initialization of weights and bias with random values from np.random.randn()
np.random.seed(0)
weights = np.random.randn(X_train.shape[1])
bias = np.random.randn()

# Perceptron training: during each epoch, the perceptron loops through the
# training examples (X_train, y_train) and adjusts the weights and bias when
# the prediction is incorrect. The learning rate controls the size of each
# adjustment.
learning_rate = 0.1
epochs = 100

for epoch in range(epochs):
    for i in range(X_train.shape[0]):
        # Compute the output of the perceptron
        weighted_sum = np.dot(X_train.iloc[i], weights) + bias
        prediction = step_function(weighted_sum)

        # Update the weights and bias if the prediction is wrong
        if prediction != y_train.iloc[i]:
            error = y_train.iloc[i] - prediction
            weights += learning_rate * error * X_train.iloc[i]
            bias += learning_rate * error

# Classification with the trained perceptron: predict(x) takes one input row x
# and returns the perceptron's prediction for it.
def predict(x):
    weighted_sum = np.dot(x, weights) + bias
    return step_function(weighted_sum)

# Evaluation on the test set: predict every row of X_test and compute the
# accuracy as the fraction of predictions equal to the actual labels in y_test.
predictions = X_test.apply(predict, axis=1)
accuracy = np.mean(predictions == y_test)
print("Model accuracy:", accuracy)

Model accuracy: 0.01016260162601626


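This code implements a perceptron, which is a basic supervised learning algorithm for binary classification of linearly separable data.

The accuracy of about 1% has a direct explanation. The step function can only output 0 or 1, so a prediction can only be correct on rows where 'N_npnv' is exactly 0 or 1. Such a low score therefore suggests that 'N_npnv' is not a binary label (its name suggests a count), in which case a perceptron is not a suitable model for it without first turning the target into two classes. In addition, the features are on very different scales (for example, 'pobtot_ajustada' is in the tens or hundreds of thousands while the Gini columns lie between 0 and 1), which makes the weight updates unstable.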
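For comparison, here is a minimal sketch of the same perceptron rule on a problem it is designed for: a binary target and standardized features. The data is synthetic (two well-separated clusters generated with NumPy), and names such as `train_perceptron` and the `_toy` variables are assumptions for illustration; it does not use or reproduce results from the real data set.

```python
import numpy as np


def train_perceptron(X, y, learning_rate=0.1, epochs=50, seed=0):
    """Perceptron rule with a step activation; stops early once an epoch has no mistakes."""
    rng = np.random.default_rng(seed)
    weights = rng.normal(size=X.shape[1])
    bias = rng.normal()
    for _ in range(epochs):
        mistakes = 0
        for x_i, y_i in zip(X, y):
            prediction = 1 if x_i @ weights + bias >= 0 else 0
            error = y_i - prediction
            if error != 0:
                weights += learning_rate * error * x_i
                bias += learning_rate * error
                mistakes += 1
        if mistakes == 0:
            break
    return weights, bias


# Synthetic binary problem: two clusters that a straight line can separate
rng = np.random.default_rng(42)
X_toy = np.vstack([
    rng.normal(loc=-2.0, scale=0.5, size=(100, 2)),  # class 0
    rng.normal(loc=2.0, scale=0.5, size=(100, 2)),   # class 1
])
y_toy = np.array([0] * 100 + [1] * 100)

# Shuffle, then hold out the last 40 rows for testing
order = rng.permutation(len(X_toy))
X_toy, y_toy = X_toy[order], y_toy[order]
X_train_toy, X_test_toy = X_toy[:160], X_toy[160:]
y_train_toy, y_test_toy = y_toy[:160], y_toy[160:]

# Standardize with training-set statistics so all features share one scale
mean, std = X_train_toy.mean(axis=0), X_train_toy.std(axis=0)
X_train_toy = (X_train_toy - mean) / std
X_test_toy = (X_test_toy - mean) / std

weights_toy, bias_toy = train_perceptron(X_train_toy, y_train_toy)
predictions_toy = (X_test_toy @ weights_toy + bias_toy >= 0).astype(int)
accuracy_toy = np.mean(predictions_toy == y_test_toy)
print("Toy accuracy:", accuracy_toy)
```

Working on NumPy arrays instead of calling `.iloc` on every row also makes the training loop considerably faster.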

**CONCLUSION**

Throughout this project, numerous challenges arose. Firstly, understanding the dataset we were working with was crucial, and working out how each feature affected the model we were building proved to be a complex and educational process. Then came the challenge of cleaning the dataset to prepare it properly for the model. This task involved handling null or missing values, something I hadn't encountered much before. Converting categorical features into numerical ones was also a new task for me. We had to carefully consider which features were essential and what dimensions of the dataset were sufficient for the model we planned to use.

Once the dataset was ready, the next challenge was building the classification model. Although the result was poor (an accuracy of about 1% on the test set), this process was fundamental for learning to handle the data. I am aware that there is much room for improvement in this procedure by applying more appropriate methods.

Regarding the application of machine learning in robotics, a complex yet exciting landscape emerges. Integrating machine learning into robotics is crucial for a wide range of applications, from obstacle avoidance in mobile robots to detecting moving objects, mapping, and localization for precise path following. However, it's not all sunshine and rainbows. Applying machine learning in robotics can be costly and time-consuming, especially when considering the size of simulations and the specific nature of real-world tasks.

Simulations are essential in this domain as they provide a controlled environment to test and refine algorithms before deploying them on real robots. Despite being a close representation of reality, simulations can fail in some cases due to limitations in cost and processing power. It's a delicate balance between accuracy and economic viability.

In summary, machine learning offers tremendous possibilities in the field of robotics, but its application requires careful consideration of the technical and financial challenges involved. Ongoing research and development in this field are essential to overcoming these barriers and enabling significant advancements at the intersection of machine learning and robotics.