# P01_survey 

### Program to obtain variable k, "Industry 4.0 and 5.0 Knowledge"

### This new variable encompasses knowledge of the Information and Communication Technologies required by Industry 4.0 and 5.0.

1. Select questions 15 to 22 (see Appendix I), which cover knowledge of information technologies: artificial intelligence, big data, cybersecurity, software development and programming, networks and communications, robotics and automation, IoT, and cloud computing. These topics align with the study programs offered at the research center.
2. From Table 1 of the document, we selected the subset of results that corresponds to these technologies. This excerpt (Table 5 of the document) is stored in the file "Tec_x_sector.csv".

3. The adoption percentages of these information technologies in the industrial sectors of the 26 countries were normalized so that they could serve as coefficients of the weighted sum.

4. Applying Equations 7 and 8 yields Table 6. Applying Equation 9 then gives the normalized coefficient of each information technology across all sectors; the results are shown in Table 7.


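$$ sum_j = \sum_{i=1}^{nit} p_{ij}, \quad j = 1, \dots, nsec $$
$$ n_{ij} = \frac{p_{ij}}{sum_j}, \quad i = 1, \dots, nit; \; j = 1, \dots, nsec $$

$$ C_i = \frac{\sum_{j=1}^{nsec} n_{ij}}{nsec}, \quad i = 1, \dots, nit $$

where $p_{ij}$ is the adoption percentage of technology $i$ in sector $j$, $nit$ is the number of information technologies and $nsec$ is the number of industrial sectors ($nit = 8$ and $nsec = 14$ in "Tec_x_sector.csv").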

   
5. The normalized values $C_i$ are the coefficients used to obtain the new variable "IT Industry 4.0 and 5.0 Knowledge" ($k$) as the weighted sum of the responses to questions 15 to 22 of the survey questionnaire, according to Equation 10:

$$ k = \sum_{i=1}^{nit} C_i \, P_{14+i} $$

where $P_{14+i}$ is the response to question $14+i$, that is, $P_{15}, \dots, P_{22}$.
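The arithmetic of Equations 7 to 10 can be sketched in NumPy as follows. The adoption table `p` (3 technologies × 2 sectors) and the answers `P` are hypothetical numbers chosen only so that the steps are easy to check by hand; they are not taken from Table 5 or from the survey.

```python
import numpy as np

# Hypothetical adoption percentages p[i, j]: 3 technologies (rows) x 2 sectors (columns)
p = np.array([[60.0, 80.0],
              [90.0, 70.0],
              [50.0, 50.0]])
nit, nsec = p.shape

# Equation 7: sum_j, the total of each sector's column
sum_j = p.sum(axis=0)        # shape (nsec,)

# Equation 8: n_ij = p_ij / sum_j; broadcasting divides column j by sum_j[j]
n = p / sum_j

# Equation 9: C_i, the mean of n_ij over the nsec sectors
C = n.sum(axis=1) / nsec

# Equation 10: k for one respondent, using hypothetical answers to the nit questions
P = np.array([4, 2, 5])
k = float(C @ P)

print(C.round(4), round(k, 4))
```

With these placeholder values each sector totals 200, so `C` is (0.35, 0.40, 0.25) and `k` is 0.35·4 + 0.40·2 + 0.25·5 = 3.45.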



### Reading the file with an excerpt of the technologies likely to be adopted by 2025

```python
import pandas as pd
import numpy as np
from pathlib import Path

# Read the excerpt of technologies likely to be adopted by 2025, given as the
# share of surveyed companies in selected sectors (Table 5 of the document).
# The DataFrame has 14 industrial sectors (columns) and 8 information technologies (rows).

ruta_entrada = Path('..') / 'Datasets' / 'Tec_x_sector.csv'
# The file starts with a UTF-8 byte-order mark: "utf-8-sig" strips it, whereas
# "ISO-8859-1" turns it into an "ï»¿" prefix on the first column name.
data = pd.read_csv(ruta_entrada, delimiter=",", encoding="utf-8-sig")
```
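Reading `Tec_x_sector.csv` with `encoding="ISO-8859-1"` turns the file's UTF-8 byte-order mark (BOM) into the characters `ï»¿` at the start of the first column name (`AGRI`). The self-contained check below uses a small in-memory CSV (two columns and two rows of the AGRI and AUTO values, not the full file) to compare the two encodings.

```python
import io

import pandas as pd

# In-memory CSV that starts with a UTF-8 BOM, like Tec_x_sector.csv
raw = "\ufeffAGRI,AUTO\n62,76\n86,88\n".encode("utf-8")

# Decoding as ISO-8859-1 keeps the three BOM bytes as visible characters
latin = pd.read_csv(io.BytesIO(raw), delimiter=",", encoding="ISO-8859-1")

# "utf-8-sig" recognizes the BOM and drops it
clean = pd.read_csv(io.BytesIO(raw), delimiter=",", encoding="utf-8-sig")

print(list(latin.columns))
print(list(clean.columns))  # ['AGRI', 'AUTO']
```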

### Obtaining the normalized coefficients

```python
tecnologias = ['Artificial Intelligence', 'Big Data', 'Cybersecurity', 'IoT',
               'Robots', 'Cloud Computing', 'Software and Programming', 'Networks']


df = pd.DataFrame(data)

# Normalize the values by column (sector) and add the coefficient Ci
def normalizar_por_columnas_y_calcular_Ci(df):
    # Equations 7 and 8: divide each column by its sum, so each sector sums to 1
    df_normalizado = df.div(df.sum(axis=0), axis=1)

    # Equation 9: Ci is the mean of the normalized values over all sectors
    df_normalizado['Ci'] = df_normalizado.mean(axis=1)
    return df_normalizado

# Normalize the DataFrame
df_normalizado = normalizar_por_columnas_y_calcular_Ci(df)

# Create a new DataFrame that represents Table 7, "Adoption percentages of IT
# across all industrial sectors and their normalized values"
nuevo_df = pd.DataFrame({
    '(%)': df.mean(axis=1).round(0).astype(int),
    'Ci': df_normalizado['Ci']
})

nuevo_df.insert(0, "Technology adopted by all sectors", tecnologias)

# Display the original and normalized DataFrames

# Configure pandas to display 9 decimal places
pd.set_option('display.float_format', '{:.9f}'.format)

print("Original DataFrame (adoption percentages):")
print(df)
print("\n" + "="*50 + "\n")
print("Normalized DataFrame (normalized values by column):")
print(df_normalizado)
print("\n" + "="*50 + "\n")
print("DataFrame Table 7. Adoption percentages of IT across all industrial sectors and their normalized values")
print()
print(nuevo_df)
```


```
Original DataFrame (adoption percentages):
      AGRI  AUTO  CON  DIGICIT  EDU  ENG  FS  GOV  HE  MANF  MIM  OILG  PS  \
0       62    76   73       95   76   81  90   65  89    71   76    71  76   
1       86    88   91       95   95   76  91   85  89    81   90    86  86   
2       47    88   85       95   86   88  95   95  84    72   83    71  78   
3       88    82   94       92   62   94  88   79  95    84   90    93  74   
4       54    60   52       61   59   65  53   50  56    79   90    79  35   
5       75    80   82       95   95   88  98   95  84    92   87    86  88   
6       80    82   92       95   95   88  98   95  89    80   80    83  80   
7       80    82   92       92   95   88  98   95  89    80   80    83  80   

   TRANS  
0     88  
1     94  
2     75  
3     76  
4     69  
5     94  
6     92  
7     92  


Normalized DataFrame (normalized values by column):
         AGRI        AUTO         CON     DIGICIT         EDU         ENG  \
0 0.108391608 0.11912225
```
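As a check of Equations 7 and 8, the first normalized values in the output above can be reproduced from the AGRI and AUTO columns of the original table: each entry is divided by its column total (572 for AGRI, 638 for AUTO). The sketch below repeats this with the same `DataFrame.div` call used in `normalizar_por_columnas_y_calcular_Ci`, restricted to those two sectors. Averaging only these two columns would not give the $C_i$ of Table 7, which averages all 14 sectors.

```python
import pandas as pd

# AGRI and AUTO columns copied from the original DataFrame printed above (8 technologies)
excerpt = pd.DataFrame({
    'AGRI': [62, 86, 47, 88, 54, 75, 80, 80],
    'AUTO': [76, 88, 88, 82, 60, 80, 82, 82],
})

# Equations 7 and 8: divide each column by its column total
totals = excerpt.sum(axis=0)                 # AGRI -> 572, AUTO -> 638
normalized = excerpt.div(totals, axis=1)

print(totals.tolist())                       # [572, 638]
print(normalized.iloc[0].round(9).tolist())  # first row, as in the normalized output
# Every normalized column sums to 1
print(normalized.sum(axis=0).round(12).tolist())
```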

### Obtaining the new variable "IT Industry 4.0 and 5.0 Knowledge" (k)
$k$ is the weighted sum of the responses to questions 15 to 22 of the survey questionnaire (Equation 10):
$$ k = \sum_{i=1}^{nit} C_i \, P_{14+i} $$
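Equations 8 and 9 also imply that the coefficients $C_i$ add up to 1, because each column of $n_{ij}$ sums to 1:

$$ \sum_{i=1}^{nit} C_i = \frac{1}{nsec} \sum_{j=1}^{nsec} \sum_{i=1}^{nit} n_{ij} = \frac{1}{nsec} \sum_{j=1}^{nsec} \frac{\sum_{i=1}^{nit} p_{ij}}{sum_j} = \frac{1}{nsec} \sum_{j=1}^{nsec} 1 = 1 $$

Since the $C_i$ are non-negative and sum to 1, $k$ is a weighted average of a respondent's eight answers and always lies between the smallest and largest of them.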


```python
# Read the file with the students' responses to the questionnaire.
ruta_entrada = Path('..') / 'Datasets' / 'enc_datosnum.csv'
data_enc = pd.read_csv(ruta_entrada, delimiter=",", encoding="ISO-8859-1")
df_enc = pd.DataFrame(data_enc)

print("\n" + "="*50 + "\n")
print("DataFrame with the students' responses to the questionnaire")
print(df_enc)

# Select questions 15 to 22 from the responses DataFrame.
# Positions 15 to 22 (0-based); position 0 holds 'Genero', so these are P15 to P22.
preguntas_seleccionadas = df_enc.iloc[:, 15:23]

list_sumprod = []
list_sumprod_rounded = []

# Obtain k (Equation 10): weighted sum of the responses to questions 15 to 22
for i in range(len(df_enc)):
    # Multiply Ci (third column, position 2, of nuevo_df) by the answer to P(15 + j);
    # this assumes row j of nuevo_df corresponds to question P(15 + j)
    suma = sum(nuevo_df.iloc[j, 2] * preguntas_seleccionadas.iloc[i, j] for j in range(8))
    list_sumprod.append(suma)
    # Subtracting 0.2 before rounding means k is rounded up only when its
    # fractional part is above about 0.7
    list_sumprod_rounded.append(round(suma - 0.2))

# Create a new DataFrame to store the weighted sum k and its rounded value kr
KI45 = pd.DataFrame({'k': list_sumprod,
                     'kr': list_sumprod_rounded})
print('------KI45------')
print(KI45)

# Export DataFrame KI45 to a file named KI45.csv
ruta_salida = Path('..') / 'Datasets' / 'KI45.csv'
KI45.to_csv(ruta_salida, index=False)
```



```
DataFrame with the students' responses to the questionnaire
     Genero  P01  P02  P03  P04  P05  P06  P07  P08  P09  ...  P19  P20  P21  \
0         1    1    2    4    2    1    1    3    3    3  ...    2    1    1   
1         1    1    2    4    2    1    1    3    3    3  ...    2    1    1   
2         1    1    2    4    3    1    1    4    3    3  ...    2    1    2   
3         1    1    2    4    3    1    1    4    4    3  ...    2    1    2   
4         1    1    2    4    3    1    1    4    4    4  ...    3    2    3   
..      ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...   
107       2    2    2    3    5    2    2    5    5    5  ...    4    3    4   
108       2    2    2    3    5    2    2    5    5    5  ...    4    4    4   
109       2    2    4    5    5    3    3    5    5    5  ...    4    4    5   
110       2    2    1    5    5    3    3    5    5    5  ...    4    4    5   
111       3    2    1    5    5    3    3    5    5    5  
```
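The loop in the last code cell accumulates each respondent's weighted sum one term at a time. A vectorized equivalent is a single matrix-vector product of the answers and the coefficients. The sketch below keeps the loop's assumption that the $j$-th coefficient (0-based) pairs with question P(15 + j). The coefficients `ci` and the answers in `respuestas` are hypothetical placeholders, not the values of Table 7 or of `enc_datosnum.csv`.

```python
import numpy as np
import pandas as pd

# Hypothetical coefficients C_i (8 values summing to 1), standing in for nuevo_df['Ci']
ci = np.array([0.10, 0.15, 0.12, 0.13, 0.11, 0.14, 0.13, 0.12])

# Hypothetical answers of three respondents to P15..P22, standing in for preguntas_seleccionadas
respuestas = pd.DataFrame(
    [[1, 2, 2, 1, 1, 2, 1, 1],
     [4, 4, 3, 4, 3, 4, 3, 4],
     [5, 5, 5, 5, 4, 5, 4, 5]],
    columns=[f'P{q}' for q in range(15, 23)],
)

# Equation 10 for every respondent at once: (respondents x 8) @ (8,) -> (respondents,)
k = respuestas.to_numpy() @ ci

# Same rounding rule as the loop: subtract 0.2, then round half to even like Python's round()
kr = np.round(k - 0.2).astype(int)

KI45_sketch = pd.DataFrame({'k': k, 'kr': kr})
print(KI45_sketch)
```

With these placeholder numbers, k = 3.64 gives kr = 3 while k = 4.76 gives kr = 5, which shows the shifted rounding threshold at work.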