# Navigating Labor Market Transitions in the Age of AI Language Models in Mexico

## 2. Organization Name

<div align='right'><b>Inter-American Development Bank</b></div>

## 3. Problem

### 3.1. What is the business or public policy problem you are facing?

We need to understand how advances in artificial intelligence, specifically Large Language Models (LLMs), could impact the labor market in Mexico.

*Why it's important:* This knowledge is critical for decision-makers in government, industries, and educational organizations to adapt labor and education policies to technological evolution. Lack of clear understanding could result in unmitigated job loss, increasing labor inequalities, and ultimately a less competitive economy nationally and internationally.

*Immediate Task:* This project aims to replicate and adapt an existing study to evaluate the potential impact of large language models on various industries and regions in Mexico, using reliable local data. Our goal is to provide an empirical basis on which more effective labor and education policies can be developed.

### 3.2. Who or what is affected by this problem?

- *Workers:* Particularly those in sectors susceptible to automation and jobs involving repetitive tasks or data processing. This group could face job displacement if they do not adapt to new technologies.
- *Businesses:* Companies will need to restructure and change their business models to adapt. Those that do not will not remain competitive.
- *Universities:* Educational institutions will need to adapt their curricula to prepare future workers with skills more in line with the labor market.
- *Government:* The government requires data about this technological change to make effective decisions.
- *National Economy:* On a larger scale, if Mexico does not adapt, it will lag behind the rest of the world, resulting in impoverishment.

### 3.3. How many of these people/organizations/places/etc. are affected by the problem, and to what extent are they affected (an order of magnitude is sufficient)?

- *Workers:* It is estimated that a significant percentage of workers in certain industries, such as manufacturing or data processing services, could be affected by automation. For example, some studies suggest that up to 30-40% of jobs in these categories could be at risk in the next 5 to 10 years.
- *Businesses:* Organizations that do not adapt to the new technological reality could lose on the order of millions of dollars in annual revenue due to reduced competitiveness.
- *Universities:* Universities that do not adjust their curricula face a potential decrease in the employability of their graduates.
- *Government:* The lack of effective policy could result in significant social costs, which could be measured in billions of dollars in terms of unemployment subsidies, professional training, and other adjustment programs.
- *National Economy:* Failure to adapt to emerging technologies could result in a significant loss in terms of GDP, in the order of percentage points, affecting the economy on a national scale.

### 3.4. Why is solving this problem a priority for your organization at this time?

- *Temporal Urgency:* With the rapid adoption of artificial intelligence technologies in various industries, time is essential to understand their implications before changes in the labor market become irreversible.
- *National Competitiveness:* Staying up-to-date with global technology trends is crucial for Mexico's long-term competitiveness. Failure to adapt could have significant consequences for our global positioning.
- *Socioeconomic Inequalities:* Technological adaptation can amplify existing inequalities if not managed properly.
- *Education and Employability:* Failure to take action could result in a generation of professionals ill-prepared for future challenges.
- *Innovation Opportunity:* This problem represents an opportunity to develop new strategies, policies, and technologies that could not only mitigate associated challenges but also open up new fields of employment and economic growth.

### 3.5. How have you attempted to address this problem, and what has been the outcome of your efforts?

- *Preliminary Research:* We conducted a review of existing literature and studies internationally, allowing us to identify knowledge gaps and areas where our research could contribute significantly.
- *Methodology Development:* We have outlined a research plan to address specific questions we have about the impact of AI on various industries and regions in Mexico.
- *Outcomes of Efforts:* Although we are in the early stages of the project, we have already achieved:
  - Identifying data sources and methodologies that will be useful for further analysis.
  - Generating interest and commitment from key stakeholders, including support in the form of mentorship.

### 3.6. What other groups or stakeholders within and outside your organization need to be involved in defining and implementing this project?

- *Within the Organization:*
  - Department of Computer Science: The academic department is crucial to ensure that the project aligns with educational objectives and competencies of the institution.
- *Outside the Organization:*
  - Oliver and the Inter-American Development Bank (IDB): Their mentorship and expert guidance are crucial for the project's relevance, as Oliver has already worked on a similar project.
  - National Institute of Statistics and Geography (INEGI): Since we use their databases, their collaboration could be crucial for accessing more detailed or updated information.
  
<hr>


## 4. Goals

A goal is a concrete, specific, measurable aim or outcome that the organization will accomplish by addressing the problem. Building a technical solution, such as a predictive model, dashboard, or map, is not itself the goal of a data science project even if one of these tools might help you achieve your goals.

### 4.1 What are your social, policy, or business goals, and what constraints do you have?

Goals should directly relate to the problem you’ve identified, and will typically improve/maximize/increase or decrease/mitigate/reduce a relevant outcome or metric (e.g. increase the percentage of high school students who graduate on time).

Goals often need to balance efficiency (e.g. help the most number of people in need with limited resources), effectiveness (e.g. maximize the total improvement in outcomes from the help you provide to people), and equity (e.g. allocate resources across groups to achieve equity in outcomes).

Common goal-related constraints are limited budget, people and/or time; legal restrictions or lack of political will; or lack of social license.


| Number | Goal | Goal Type (Efficiency, Effectiveness and Equity) | Constraints around the Goal |
| --- | --- | --- | --- |
| **1**| Find the most exposed occupations and skills and, by extension, the most exposed industries. | Efficiency | Lack of clear job and skill descriptions, with no information on the weight and importance of each skill for an industry. |
| **2**| Quantify the percentage of jobs within each identified industry that will be exposed to LLMs. | Effectiveness | Lack of specificity and granularity in the data of job descriptions. No knowledge on the weight and importance of a job inside an industry. |
| **3**| Quantify the exposure of each region based on the industries in the region, for a nationwide view of the impact of LLMs. | Equity | Disparity in data quality and granularity for each region of the country. |


1. **Trade-off**: The goal of identifying the top skills might conflict with the goal of quantifying the percentage of jobs within each industry exposed to LLMs. Given our time constraints (4 months) we might focus on quickly identifying the top industries but might not delve deep enough into each industry to accurately quantify job exposure.
   - If a trade-off must be made, placing more emphasis on the second goal might be beneficial. This is because while identifying the top industries is important, understanding the depth of exposure within these industries provides more valuable insights. However, it’s much more realistic to focus on the first goal given the time and resources we have at our disposal.
2. **Trade-off**: Quickly identifying the top industries might lead to overlooking certain regions that are not as prominent in those industries but are still significantly exposed to LLMs. This could lead to an inequitable representation of regions. However, a robust analysis that focuses on finding the most exposed jobs, irrespective of their region and industry, would still be a valuable outcome and a much more solid base for future projects than a project that is too narrowly focused on region- or industry-specific research.

In summary, given the time and data constraints, we should focus first on the first goal: producing a list of the jobs that are most exposed to LLMs and trying to quantify how exposed they are. Even if we don’t have the time or resources to extend the report to regional or industry-level results, a solid, robust approach to the first goal will help future work build on ours and will provide a better tool for policymakers who might already be familiar with an industry’s or a region’s composition.
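
As a rough illustration of how goals 2 and 3 could build on goal 1, the sketch below aggregates occupation-level exposure scores into an employment-weighted exposure per region or per industry. The occupations, scores, employment counts, and the `weighted_exposure` helper are all hypothetical placeholders chosen for illustration; they are not results, and this is not necessarily the methodology the project will adopt.

```python
from collections import defaultdict

# Hypothetical inputs: exposure scores in [0, 1] per occupation (goal 1)
# and employment counts per (region, industry, occupation).
# All values are made up for illustration only.
exposure = {"clerk": 0.8, "machinist": 0.2}

employment = [
    # (region, industry, occupation, workers)
    ("North", "services", "clerk", 300),
    ("North", "manufacturing", "machinist", 700),
    ("South", "services", "clerk", 100),
    ("South", "manufacturing", "machinist", 100),
]


def weighted_exposure(rows, scores, key_index):
    """Employment-weighted mean exposure, grouped by the chosen column.

    key_index selects the grouping column: 0 for region, 1 for industry.
    Rows whose occupation has no score are skipped rather than guessed.
    """
    totals = defaultdict(float)   # sum of workers * exposure per group
    workers = defaultdict(float)  # sum of workers per group
    for row in rows:
        occupation, count = row[2], row[3]
        if occupation not in scores:
            continue
        totals[row[key_index]] += count * scores[occupation]
        workers[row[key_index]] += count
    return {group: totals[group] / workers[group] for group in totals}


by_region = weighted_exposure(employment, exposure, key_index=0)
by_industry = weighted_exposure(employment, exposure, key_index=1)
```

Weighting by employment is one possible design choice: it keeps a small but highly exposed occupation from dominating a region's score. It also makes explicit the constraint noted in the goals table, namely that the weight of each job within an industry has to come from somewhere in the data.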

<hr>

## 5. Actions

An action is an activity, intervention, or program that your organization has, or will perform, to reach the goal(s) you’ve outlined. Actions are generally performed routinely and often involve allocating resources, such as providing preventative services, outreach attempts, or after-school programs to people, or prioritizing inspection of certain homes or facilities. The data and the analysis in steps 6 and 7 should inform these actions to help achieve our goals. 

### 5.1 What actions will your organization take to address the problem?

| | Action 1 | Action 2 | Action 3 | Action 4 |
| --- |--- |---| --- | --- |
| **What is the action?** | Collect data on various industries, occupations, and regions in Mexico. | Data analysis to study the distribution and relationships in our datasets; understanding of the industries, occupations, and regions we have available. | Bi-weekly meetings with Oliver Azuara to give direction to the project and to get help interpreting and understanding the results. | We are still in the early stages of the project so we still have to decide on the methodology and the models that we will be using to analyze the data provided. |
| **Which goal does this action help achieve?** | Creating a comprehensive dataset with information on industry classifications and the skills associated with jobs inside those industries, in an attempt to replicate the O*NET survey used in "GPTs are GPTs". | Familiarization with the dataset, specifically with the unknown labor variables. | Giving direction to the project and adjusting it to the expert’s needs and views. | Having a clear understanding of the problems ahead, of the actions that need to be taken, and of the potential models that we’ll need to look into. |
| **Who is executing this action?** | Us, aided by the data source suggestions provided by Oliver and Andrés. | Data analysts, with help from researchers at the Inter-American Development Bank. | Us and Oliver Azuara. | Us |
| **Who or what is the action being taken on?** | Industries, occupations, and potentially regions in Mexico. | The collected data | Oliver Azuara, who is being informed of the project. | On the data |
| **How often is the decision to take this action made?** | Initially at the start of the project, ideally enriched later by other data sources and experts we can get access to. | Once the data is collected. | Once a week for the first three weeks and bi-weekly from then on. | Once, potentially but hopefully not, twice or three times between here |
| **What channels are or can be used to take this action?** | In-person collaboration between the team working on the available data (INEGI and government sources, as well as databases the IDB has access to). | Data analysis software and tools. | Via Microsoft Teams. | In-person communication and collaborative environments. |
| **Are there any resource or capacity constraints with this action?** | Limited access to some industry-specific data and potential problem-domain constraints, such as lack of knowledge of the skills/abilities for specific occupations or tasks. | Requires skilled data analysts and appropriate analysis, taking care not to over-analyze or introduce bias. | Time and scheduling problems; it’s hard to plan a meeting that accommodates everyone. | Theoretical and computational constraints. |
| **What are the ethical issues associated with this action?** | Ensuring data privacy and avoiding any biases in data collection. | Ensuring unbiased analysis and interpretation of data. | None | If we don’t have clear goals (efficiency, effectiveness, equity), we could return results that adversely affect some groups, that are too biased, or that are not trustworthy. |
| **Will acting on someone who does not need this intervention have adverse consequences?** |  Not really, unless we decide to omit certain regions or industries without a clear, informed reason. | No |  If we include people in the meetings that shouldn’t be there, we will only waste their time and ours. | No |
| **Can you provide any other useful information about this action?** | Data collection is the foundational step for the research. It needs to be comprehensive and try to remain unbiased. |  Proper data analysis is crucial for the accuracy and credibility of the research, since it will lay the foundations for further analysis. | No | Given our exploratory data analysis, we will have to dive deep into the existing models to tackle similar problems |
| **Has it been tested to be effective?** | Yes, the IDB has already done some work collecting and consolidating the data in an effective, problem-domain-informed way; INEGI data has also previously been shown to combine easily with other data sources. | Yes, data analysis is a standard research procedure. | Yes, catch-up meetings are always positive if they are short, concise, and regular but not daily. | Yes |

<hr>

## 6. Data

In the current context of technological advances, the introduction of natural language artificial intelligence models represents a significant milestone in the transformation of industrial sectors and the economy in Mexico. To understand and assess the impact of these advances at the sectoral and regional levels, we have gathered and will analyze data from three key sources: the National Survey of Household Income and Expenditure (ENIGH), the National Directory of Economic Units (DENUE), and the North American Industry Classification System (NAICS).

### 6.1 and 6.2 Internal and External Data

|  | Data Source 1 | Data Source 2 | Data Source 3 |
|-----------|-----------|-----------|-----------|
| **NAME OF DATA SOURCE**   | National Survey of Household Income and Expenditure (ENIGH) | National Directory of Economic Units (DENUE) | North American Industry Classification System (NAICS) |
| **WHAT IT CONTAINS**   | Information about the income and expenses of Mexican households, as well as other aspects related to household economics. Examples include family income, family expenses, socio-economic characteristics, access to goods and services, among others. | Detailed information about economic units in Mexico, including companies, establishments, businesses, and other entities engaged in economic activities. Some of the data includes economic information, such as the economic sector to which they belong, size of the economic unit, georeferencing (geographical coordinates), and industrial classification, specifically the activity code under SCIAN, the Mexican edition of NAICS. | A classification system used in North America to categorize and standardize economic and business activities into different industrial sectors and subsectors. |
| **LEVEL OF GRANULARITY**    | The household is the basic unit of analysis. The level of detail in the data depends on the type; for example, income and expenses are recorded in very specific subcategories, while demographic data includes information about household members and their employment situations. | The basic unit of analysis is the "economic unit," which can be companies, establishments, or other entities engaged in economic activities. Specific data is available for each economic unit, such as name, address, contact information, economic sector, size, number of employees, and date of operation. | Sector, subsector, industry group, industry, and detailed (national) industry. The most granular level consists of a 6-digit number that uniquely identifies each detailed industry. |
| **TEMPORALITY**    | Data is available from 2012 to 2022 | Data is available from 2015 to 2022 | Data remains constant |
| **FREQUENCY OF UPDATES**    | Every two years in the survey conducted by the National Institute of Statistics and Geography (INEGI) | Annually in the survey conducted by the National Institute of Statistics and Geography (INEGI) | Undetermined |
| **RELIABLE IDENTIFIERS**    | Yes. There is a "folioviv" that uniquely identifies each household, in addition to codes for municipalities, states, and localities where each is located. | Yes. There is an "id" column that identifies each economic unit in the database. There is also a "clee," which is a statistical identification key assigned by INEGI to each of the establishments and companies registered in the RENEM to identify establishments of the same company. | Yes. The 6-digit code uniquely identifies each detailed industry in North American countries. |
| **INTERNAL OWNER OF DATA**    | These data are in the public domain. We, the researchers, will make use of them. | These data are in the public domain. We, the researchers, will make use of them. | These data are in the public domain. We, the researchers, will make use of them. |
| **STORAGE OF DATA**    | Local on the researchers' machines. | Local on the researchers' machines. | Local on the researchers' machines. |
| **ETHICAL ISSUES ASSOCIATED**    | Being publicly accessible mitigates risks of data privacy. It is worth noting that all data is directly anonymized by INEGI. However, due to the type of information it contains, there is a potential for confusion or misunderstanding if used inappropriately or without proper context. | Being publicly accessible mitigates risks of data privacy. However, due to the level of detail in the economic units, precautions should be taken to avoid uses that could be considered unethical, including industrial espionage and promoting unfair competition. | None |
| **ANY OTHER INFORMATION**   | Access to the data through https://www.inegi.org.mx/programas/enigh/nc/2022/#datos_abiertos | Access to the data through https://www.inegi.org.mx/app/descarga/ | Access to the data through https://www.census.gov/naics/ |
| **TYPE OF DATA**   | Public | Public | Public |
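
Because the 6-digit industry code is hierarchical, the same identifier can be truncated to roll economic units up to coarser levels. The sketch below shows one way this could be done; the digit counts follow the published NAICS structure, while the sample records, the field names `state` and `naics`, and the helper functions are hypothetical and do not reflect the actual column names in DENUE.

```python
from collections import Counter

# Number of leading digits that identify each NAICS level.
NAICS_LEVELS = {
    "sector": 2,
    "subsector": 3,
    "industry_group": 4,
    "naics_industry": 5,
    "national_industry": 6,
}


def naics_levels(code):
    """Split a 6-digit NAICS code into its nested prefixes.

    Note: a few sectors span several 2-digit prefixes (for example,
    manufacturing covers 31-33), so prefixes may need extra grouping.
    """
    code = str(code).strip()
    if len(code) != 6 or not code.isdigit():
        raise ValueError(f"expected a 6-digit NAICS code, got {code!r}")
    return {level: code[:digits] for level, digits in NAICS_LEVELS.items()}


def count_units(records, level="sector"):
    """Count economic units per (state, NAICS prefix) at the given level."""
    counts = Counter()
    for record in records:
        prefix = naics_levels(record["naics"])[level]
        counts[(record["state"], prefix)] += 1
    return counts


# Hypothetical sample records; field names are assumptions for illustration.
records = [
    {"state": "09", "naics": "541511"},
    {"state": "09", "naics": "541512"},
    {"state": "19", "naics": "311830"},
]

sector_counts = count_units(records)
subsector_counts = count_units(records, level="subsector")
```

Codes are kept as strings rather than integers so that truncation is a simple prefix operation and no leading characters are lost when files are read or merged.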




### 6.3 Other data that would be great to have

If we had access to any data, we believe it would be especially beneficial for this study to know precisely the **tasks performed in each of the economic units** we are investigating, as well as the **number of people who carry them out** and their **level of education**. In addition, the level of **Internet access** and the **trends in the use** of new technologies would be significant contributions to the analysis. It would be equally relevant to determine how much companies in Mexico know about the new technologies that are about to arrive, since this could influence their ability to adopt them effectively, possibly easing the transition and avoiding job losses.

<hr>

## 7. Analysis

### 7.1. What analyses will you complete to inform your actions?



## 8. Ethical Considerations

### 8.1. Privacy, Confidentiality, and Security

**Do you work with personally identifiable or sensitive data?** 
We have access to databases containing aggregated information, such as the National Directory of Economic Units (DENUE) and the 2019 Economic Census from INEGI. These data are publicly available and do not contain individually identifiable information. Therefore, we are not handling personally identifiable or sensitive data at an individual level.

### 8.2. Transparency

**Policy Formulators (Not necessarily politicians):**  
Need to know: The potential impact of the project on the labor market and how findings can inform the formulation of public policies.

**Frontline Workers (such as teachers, technology employees, etc.):**  
Need to know: How the study's results could affect their field of work and any policy changes that may arise.

**How We Will Communicate:**  
Presentations and direct meetings to discuss findings and recommendations; we do not want to limit ourselves to just an academic article.

### 8.3. Discrimination/Equity

While our project primarily focuses on the labor market, we will seek to collaborate with experts in related areas to better understand and contextualize our conclusions within the broader framework of social inequality.

### 8.4. Social License

**General Acceptance:**  
If the entire population of the country learns about our project, there is likely to be a mostly positive response. The project's goal is to understand the potential impact of large language models on the labor market, which is a matter of public interest. Furthermore, the results could inform public policies, offer recommendations for different industries, and possibly improve working conditions.

**Possible Concerns:**  
- Labor Impact: "Will this study result in job loss?"
- Equity: "Do the findings apply to my community, or do they only benefit a few?"
- Implementation Costs: "Who will pay for the reforms suggested by the study?"

**Headline on the Front Page of the Newspaper:**  
If the project is conducted transparently, the headline is likely to be positive. Something like:  
"Innovative Data Science Project Aims to Optimize the Labor Market and Formulate More Effective Public Policies"  
However, a study that is limited to being purely descriptive, without offering minimum solution proposals, could result in something like:  
"Inequality Concerns Arise from New Study on the Labor Impact of AI"

### 8.5. Responsibility

**Responsibility in Building the Data Science System:**  
- **Us:** We are responsible for ensuring that data is handled securely and ethically, for identifying and mitigating biases in models, and for being transparent in methodology and results.
- **Oliver:** Feedback mechanisms with Oliver to ensure transparency and error correction.
