In my [previous blog post](https://www.carlos-toruno.com/blog/classification-system/03-using-chat-gpt-as-classifier/), I talked about Large Language Models (LLMs) and how you can use them to perform specific tasks, as long as you pay special attention to how you phrase your instructions to the model. However, we did all of this using ChatGPT, which, technically speaking, is an app, not an LLM. If you want to incorporate the power of LLMs into your own programming workflow, you will have to access them through their official APIs. In this blog post, I will go step by step through how to bring GPT and Gemini models into your own framework in order to solve specific tasks.

![AI generated photo using DALL-E 2.0](featured.png)

The task at hand will be very easy. First, we want to classify news articles into two groups: those related to the Rule of Law, and those unrelated to the Rule of Law. Second, for those articles that are related to the Rule of Law, we want to see how related they are to each one of the pillars of the Rule of Law. If you are a bit lost or feel unfamiliar with the theoretical framework, don't worry... so is the model. Therefore, I will be providing context along the way.

Let's see the data we will be working with.

## Loading the data
I will be working with a subset of news articles from European newspapers that we downloaded using a News API. For more information on how we were able to get this data, you can check [this blog post](https://www.carlos-toruno.com/blog/classification-system/01-gathering-data/). We begin by reading our data using the [Pandas Python Library](https://pandas.pydata.org/) as follows:

In [2]:
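import pandas as pd

# Load the 100-article subset stored as a gzip-compressed Parquet file
master = pd.read_parquet("data_subset.parquet.gzip")
# Preview the first five rows
master.head(5)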

| | country | journal | asspillar | language | article_id | title | link | keywords | creator | video_url | ... | pubDate | image_url | source_id | source_priority | category | language_id | compiler | title_trans | description_trans | content_trans |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2302 | [austria] | https://www.vn.at/ | order_and_security | german | 8535537712990c01388421e4aa6c8247 | Kein Ende des Kriegs in Israel in Sicht | https://epaper.vn.at/titelblatt/2023/10/09/kei... | [Titelblatt] | [importuser] | | ... | 2023-10-09 20:45:15 | | vn | 1083701.0 | [top] | de | carlos | No end to the war in Israel in sight | Jerusalem The EU is suspending all payments to... | Israel orders total blockade of the Gaza Strip... |
| 2307 | [austria] | https://www.vn.at/ | order_and_security | german | c09c8b1761eed395062e595869472551 | Bewährungsprobe für Markus Söder | https://epaper.vn.at/politik/2023/10/06/bewaeh... | [Politik] | [importuser] | | ... | 2023-10-06 20:49:05 | | vn | 1083701.0 | [politics] | de | carlos | Test for Markus Söder | Election in Bavaria on Sunday with some open q... | The old Prime Minister will also be the new on... |
| 2313 | [austria] | https://www.vn.at/ | order_and_security | german | b1783608728593e7d811d95e1b4f263a | Bilder des Tages | https://epaper.vn.at/politik/2023/10/02/bilder... | [Politik] | [importuser] | | ... | 2023-10-02 20:44:43 | | vn | 1083701.0 | [politics] | de | carlos | Pictures of the day | Translation through API failed. Reason: expect... | The 2018 Nobel Peace Prize winner Denis Mukweg... |
| 2326 | [austria] | https://www.vn.at/ | order_and_security | german | ed6711a119adb88769cda28ddd6c9b43 | Diebe stehlen 20 Tonnen Äpfel von Obstplantage | https://epaper.vn.at/welt/2023/09/24/diebe-ste... | [Welt] | [importuser] | | ... | 2023-09-24 20:41:08 | | vn | 1083701.0 | [top] | de | carlos | Thieves steal 20 tons of apples from orchard | Dieterskirchen thieves stole around 20 tons of... | Dieterskirchen thieves stole around 20 tons of... |
| 2328 | [austria] | https://www.vn.at/ | order_and_security | german | baa36e05329fb8381a1d2c10f85b1134 | Politik in Kürze | https://epaper.vn.at/politik/2023/09/21/politi... | [Politik] | [importuser] | | ... | 2023-09-21 20:49:39 | | vn | 1083701.0 | [politics] | de | carlos | Politics in brief | hangzhou Syrian President Bashar al-Assad is v... | The Syrian ruler traveled to Hangzhou. Sana/AF... |


The data we will be using is a small subset of 100 news articles taken from a much larger data file that contains information on more than 32,000 news articles from European newspapers. As you might have noticed, the data was stored as a Parquet file. For large quantities of data, [Apache Parquet](https://parquet.apache.org/) is a more efficient format than JSON or CSV in terms of storage, memory usage, and reading speed. If you want to learn more about the advantages of formats that are not human-readable, I suggest reading [this article](https://towardsdatascience.com/which-data-format-to-use-for-your-big-data-project-837a48d3661d) from Towards Data Science.

For the purpose of this blog post, I will only work with 5 specific news articles from the above subset and save them as a pandas data frame called `extract`. Let's take a look at the wonderful winners of this lotto: 

In [3]:
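# Index labels of the five articles used in the rest of the post
idx = [2307, 2357, 2383, 2439, 2951]
# Keep the identifying columns and the translated title, summary, and body
extract = master.loc[idx, ["country", "journal", "article_id", "link", "title_trans", "description_trans", "content_trans"]]
extract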

| | country | journal | article_id | link | title_trans | description_trans | content_trans |
|---|---|---|---|---|---|---|---|
| 2307 | [austria] | https://www.vn.at/ | c09c8b1761eed395062e595869472551 | https://epaper.vn.at/politik/2023/10/06/bewaeh... | Test for Markus Söder | Election in Bavaria on Sunday with some open q... | The old Prime Minister will also be the new on... |
| 2357 | [austria] | https://www.salzburg24.at/ | edfa1c27e299213b2e2bf4325361a0ba | https://www.salzburg24.at/news/oesterreich/dre... | Three dead in fire in LK Mödling: investigatio... | After three patients died in a fire at the Möd... | 0 Published: 18. October 2023 3:33 p.m. After ... |
| 2383 | [austria] | https://www.salzburg24.at/ | 27abe7fe74f897c0192e5e59188a9d1f | https://www.salzburg24.at/news/welt/klima-prot... | Climate protest in The Hague: 2,400 arrests | In the Dutch city of The Hague, police broke u... | 0 Wij verafschuwen het geweld the word was use... |
| 2439 | [austria] | https://www.salzburg24.at/ | 80cea065a92d0e20c0b260a612fd1b87 | https://www.salzburg24.at/sport/fussball/oeste... | Failed qualifying dress rehearsal for Austria | Austria's national soccer team failed in the d... | 0 Published: 07. September 2023 10:38 p.m. Aus... |
| 2951 | [belgium] | https://www.lesoir.be/ | 8739dea5ef18c8f95991f1b063804e1f | https://www.lesoir.be/520042/article/2023-06-1... | Tax reform in Belgium, an emergency for sixty ... | The federal government is trying to reform tax... | The big tax reform, everyone wants it, but no ... |


## Writing the instructions

As I mentioned before, we are going to divide the task at hand into two stages. In the first stage, we will classify articles according to their relation to the Rule of Law. For this, we will use a contextual prompt as follows:

>_You are an assistant with knowledge and subject-matter expertise on Rule of Law, justice, governance, global politics, 
>social sciences, and related fields in the European Union. Your task is to carefully read a news article and determine 
>whether it is related to the definitions of Rule of Law, Justice, and Governance that I will give you. To successfully 
>perform this task, you should carefully read the definitions that I will provide, and use the knowledge of global politics, 
>law, and social sciences that you have._

In [5]:
context_stage_1 = """
You are an assistant with knowledge and subject-matter expertise on Rule of Law, justice, governance, global politics, 
social sciences, and related fields in the European Union. Your task is to carefully read a news article and determine 
whether it is related to the definitions of Rule of Law, Justice, and Governance that I will give you. To successfully 
perform this task, you should carefully read the definitions that I will provide, and use the knowledge of global politics, 
law, and social sciences that you have.
"""

We will then write the supporting definitions, the news article, and the set of instructions in a separate prompt that we will call `instructions_stage_1`. First, I will provide some _key concepts_ to the model so that it doesn't base its answers entirely on the information used during its training:

>#### Key macro concepts
>_Here are the definitions of Rule of Law, Justice, and Governance:_
>
>_The term Rule of Law refers to ..._
>
>_We define Justice as ..._
>
>_Finally, we define Governance as ..._

In the same prompt, after passing the key concepts, I will pass the headline, summary, and full content of the news article:

>_Now, given the following news article:_
>
>_News title:_ {headline}
>
>_News summary:_ {summary}
>
>_News body:_ {body}

Finally, I will provide some specific instructions telling the model what to do with the information I just passed and I will ask the model to structure its answer following a specific JSON format:

>_Please analyze the news article and its context, and answer the following question:_
>1. _Based on the definitions that I just provided above, is this news article narrating events related to the Rule of _
>_Law, Justice, or Governance?_
>
>_Use the following JSON format to answer:_
>
>{{
>
>    _rule_of_law_related: answer to the question number 1. if the news article is not related to the Rule of Law, Justice,_ 
>    _or Governance answer with "Unrelated", otherwise answer with "Yes"._
>
>}}

I will make use of [Markdown](https://www.markdownguide.org/basic-syntax/) syntax such as headers (#) and lists to structure and pass the information to the model. You can check the FULL instruction prompt that I'm using for the first stage [here](https://www.carlos-toruno.com/blog/classification-system/04-Langchain-GeminiPro/instructions_stage_1.txt).
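
Because the model is asked to answer in JSON, its reply can be turned into a Python value and checked before it is used. The following is only a minimal sketch: the function name `parse_stage_1_reply` and both sample replies are hypothetical, written by hand for illustration, and a real model may format its answer differently.

```python
import json
import re


def parse_stage_1_reply(reply: str) -> str:
    """Return "Yes" or "Unrelated" from a reply that should contain a JSON object."""
    # Models sometimes surround the JSON with extra text, so take the span
    # from the first "{" to the last "}" instead of parsing the whole reply.
    match = re.search(r"\{.*\}", reply, flags=re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in reply")
    answer = json.loads(match.group(0))["rule_of_law_related"]
    if answer not in ("Yes", "Unrelated"):
        raise ValueError(f"unexpected answer: {answer!r}")
    return answer


# Hypothetical replies, not real model output.
clean_reply = '{"rule_of_law_related": "Yes"}'
wordy_reply = 'Here is my answer:\n{"rule_of_law_related": "Unrelated"}\nHope this helps.'

print(parse_stage_1_reply(clean_reply))
print(parse_stage_1_reply(wordy_reply))
```

Rejecting anything other than the two allowed labels makes formatting problems surface early instead of silently ending up in the results.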

In [6]:
instructions_stage_1 = """
# Key macro concepts
Here are the definitions of Rule of Law, Justice, and Governance:

## What is the Rule of Law?
The term Rule of Law refers to a system in which law is able to impose meaningful restraints on the state and individual 
members of the ruling elite. It refers to a governance principle in which all persons, institutions, and entities, public 
and private, including the State itself, are accountable to laws that are publicly promulgated, equally enforced, and 
independently adjudicated, and which are consistent with international human rights norms and standards.

We extend this concept further by defining the Rule of Law as a rules-based system in which the following four universal 
principles are upheld. First, the government and its officials and agents are accountable under the law. Second, the laws 
are clear, publicized, stable, and fair, and protect fundamental rights, including the security of persons and property. 
Third, the process by which the laws are enacted, administered, and enforced is accessible, fair, and efficient. Lastly, 
access to justice is provided by competent, independent, and ethical adjudicators, attorneys or representatives, and 
judicial officers who are of sufficient number, have adequate resources, and reflect the makeup of the communities they serve.

Given the previous definitions, we can infer that the Rule of Law is a multidimensional concept that extends over eight 
basic pillars: limited government powers; absence of corruption; order and security; fundamental rights; open government; 
effective regulatory enforcement; access to civil justice; and an effective criminal justice.

From a political science perspective, these factors draw on four basic ideas related to the relationship between the state 
and the civil society: the checks and balances on the government's power, the effectiveness of the state in performing its 
basic functions, participation and collaboration between the state and its citizens, and absence of arbitrary abuse by the 
authority.

## What is Justice?
We define Justice as the ethical, philosophical idea that people are to be treated impartially, fairly, properly, 
and reasonably by the law and by arbiters of the law, that laws are to ensure that no harm befalls another, and that, 
where harm is alleged, a remedial action is taken - both the accuser and the accused receive a morally right 
consequence merited by their actions.

Justice also refers more specifically to the set of institutions and services that allow for the impartial mediation 
and adjudication of disputes arising from the violation or conflicting interpretations of laws, including the enforcement 
of decisions or agreements reached through their operation.

## What is Governance?
Finally, we define Governance as the mechanisms, processes, and structures through which public institutions operate and 
make decisions for the welfare and benefit of their respective communities. This encompasses the way governments and local 
authorities are organized, how policies are formulated and implemented, how resources are allocated and managed, and how 
public services are delivered to citizens.

# News article and instructions
Now, given the following news article:
News title: {headline}
News summary: {summary}
News body: {body}

Please analyze the news article and its context, and answer the following question:
1. Based on the definitions that I just provided above, is this news article narrating events related to the Rule of 
Law, Justice, or Governance?

Use the following JSON format to answer:
{{
    rule_of_law_related: answer to the question number 1. if the news article is not related to the Rule of Law, Justice, 
    or Governance answer with "Unrelated", otherwise answer with "Yes".
}}
When answering, strictly adhere to the definitions of the Rule of Law, Justice, and Governance and keep in mind the following points:
1. We do not include the right to access to healthcare in the conceptual framework of the Rule of Law because we are not contacting public 
health experts in this project, so news articles whose main topic is the right to healthcare should not be related to the Rule of Law.
2. It is very important to distinguish between different types of news: description of events, interviews, op-eds, public statements, to 
clarify whether the news article has any actual impact on the Rule of Law, Justice, or Governance.
3. Opinion columns and opinion articles in general should not be related to the Rule of Law, Justice, and Governance.
There is a very important key point to keep in mind:
Our conceptual framework does NOT include war, armed conflict, or peace agreements. Therefore, news articles narrating events that derive 
from war, armed conflict, or peace agreements, or events that are a direct consequence from the previous situations should NOT be related 
to the Rule of Law.

Remember to ONLY answer following the JSON format explained above.

Begin!
"""
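
The prompt above mixes single-brace placeholders such as `{headline}` with doubled braces around the JSON format. Assuming the template is filled with Python's built-in `str.format` (the post may use a templating library instead), the placeholders are replaced and the doubled braces come out as literal `{` and `}`. The sketch below uses a shortened stand-in template and a hypothetical article; mapping `title_trans`, `description_trans`, and `content_trans` to `headline`, `summary`, and `body` is also an assumption.

```python
# Shortened stand-in for the stage-1 template, using the same placeholder
# names and the same doubled-brace escaping as the full prompt.
template = """
News title: {headline}
News summary: {summary}
News body: {body}

Use the following JSON format to answer:
{{
    rule_of_law_related: "Unrelated" or "Yes"
}}
"""

# Hypothetical article. In the post these values would presumably come from
# the title_trans, description_trans, and content_trans columns.
article = {
    "headline": "Example headline",
    "summary": "Example summary",
    "body": "Example body text",
}

# str.format fills {headline}, {summary}, {body} and turns {{ }} into { }.
prompt = template.format(**article)
print(prompt)
```

Without the doubled braces, `str.format` would try to treat the JSON skeleton as a placeholder and raise an error.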

In [7]:
context_stage_2 = """
You are an assistant with knowledge and expertise in global politics, social sciences, rule of law, and related fields. Your task is to assist 
me in classifying news articles according to which pillar of the Rule of Law they belong to. To successfully accomplish this task, you will 
have to carefully read a news article and the definitions of each pillar that I will give you, as well as use the knowledge of global politics, 
social sciences, and law that you have. Once you have read the news article, you will proceed to determine the extent to which the events 
described in the news article are related to each pillar.
"""
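
With a context prompt for each stage, the overall control flow can be pictured as follows: every article goes through stage 1, and only the articles answered with "Yes" move on to stage 2. This is a rough sketch under stated assumptions, not the implementation used in the post: `classify_article`, the `ask_model` callable, and `fake_model` are hypothetical names, and the fake model returns hand-written JSON so the example runs without any API access.

```python
import json
from typing import Callable, Optional

# Hypothetical signature: (context, instructions) -> raw reply text.
AskModel = Callable[[str, str], str]


def classify_article(
    article: dict,
    ask_model: AskModel,
    context_1: str,
    instructions_1: str,
    context_2: str,
    instructions_2: str,
) -> Optional[dict]:
    """Run stage 1, then stage 2 only for Rule of Law related articles."""
    stage_1 = json.loads(ask_model(context_1, instructions_1.format(**article)))
    if stage_1["rule_of_law_related"] != "Yes":
        return None  # unrelated articles never reach stage 2
    return json.loads(ask_model(context_2, instructions_2.format(**article)))


# Stand-in for a real API call; the replies are invented for illustration.
def fake_model(context: str, instructions: str) -> str:
    if "pillar" in context:
        return '{"pillars_relation": [{"1. Constraints on Government Powers": 7}]}'
    return '{"rule_of_law_related": "Yes"}'


article = {"headline": "Example headline", "summary": "Example summary", "body": "Example body"}
result = classify_article(
    article,
    fake_model,
    context_1="stage 1 context",
    instructions_1="Title: {headline}\nSummary: {summary}\nBody: {body}",
    context_2="stage 2 context about each pillar",
    instructions_2="Title: {headline}\nSummary: {summary}\nBody: {body}",
)
print(result)
```

Passing the model call in as a function keeps the flow independent of any particular provider, so the same sketch could wrap a GPT or a Gemini client.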

In [8]:
instructions_stage_2 = """
To help you contextualize, this is the definition of the Rule of Law that we are using for the task at hand:

# Definition of the Rule of Law
The term Rule of Law refers to a system in which law is able to impose meaningful restraints on the state and individual members of the ruling elite. 
It refers to a governance principle in which all persons, institutions, and entities, public and private, including the State itself, are accountable 
to laws that are publicly promulgated, equally enforced, and independently adjudicated, and which are consistent with international human rights norms 
and standards.

We extend this concept further by defining the Rule of Law as a rules-based system in which the following four universal principles are upheld. First, 
the government and its officials and agents are accountable under the law. Second, the laws are clear, publicized, stable, and fair, and protect fundamental 
rights, including the security of persons and property. Third, the process by which the laws are enacted, administered, and enforced is accessible, fair, 
and efficient. Lastly, access to justice is provided by competent, independent, and ethical adjudicators, attorneys or representatives, and judicial officers 
who are of sufficient number, have adequate resources, and reflect the makeup of the communities they serve.

From a political science perspective, these factors draw on four basic ideas related to the relationship between the state and the civil society: the checks 
and balances on the government's power, the effectiveness of the state in performing its basic functions, participation and collaboration between the state 
and its citizens, and absence of arbitrary abuse by the authority.

Given the previous definitions, we can infer that the Rule of Law is a multidimensional concept that extends over eight basic pillars: limited government 
powers; absence of corruption; order and security; fundamental rights; open government; effective regulatory enforcement; access to civil justice; and an 
effective criminal justice.

Next, I will provide you with detailed definitions for each pillar.

# Pillars of the Rule of Law
1. Constraints on Government Powers: 
    Measures the extent to which those who govern are bound by law. It comprises the means, both constitutional and institutional, by which the powers of the 
    government and its officials and agents are limited and held accountable under the law. It also includes non-governmental checks on the government’s power, 
    such as a free and independent press. Additionally, it measures the absence of authoritarianism, which includes the accountability of the Chief Executive, 
    and its respect for checks and balances.

    In more detail, this pillar is related to the following topics:
    -	The active and reactive transparency of legislative bodies (senate, parliament, assemblies), its ability to impose disciplinary measures to government 
    officials, and if there is representation of disadvantaged groups, as well as citizen participation. It also measures if the opposition can express its 
    opinions and if it exercises its functions of overseeing and investigating the government.
    -	The independence of the judiciary, by means of having sufficient resources and professional judges, with adequate rights and competencies, as well as 
    its ability to impose disciplinary measures on government officials.
    -	The independence and effectiveness of oversight institutions, such as the Supreme Audit Institution or comptroller, anticorruption authority, human rights 
    institution, the office of the ombudsman, and prosecution services, as well as if civil servants are free from political pressure, and are effective in implementing 
    public policies.
    -	Whether elections are free of barriers of entry, intimidation, corruption, and criminality, in accordance with the law, with equitable access to resources, 
    free from misinformation, secure from cyberattacks and with protection of personal data. It also measures the effectiveness and independence of the electoral authority.
    -	The effectiveness of non-governmental checks on the government’s power, such as the media, CSOs, political parties, activists, and citizens. It includes their 
    effective exercise of the freedoms of assembly, association, opinion, and expression, and the rights to petition and civic engagement.
    -	The accountability of the Chief Executive or the Head of Government, and its respect for the constitutional order, the law-making process, the outcome and quality 
    of elections, civil liberties, political opponents, as well as the independence of the judiciary and oversight institutions. It also measures the likelihood of 
    sanctions to officials for misconduct.
    -	Whether members of the legislature, judiciary, elected leaders or high-ranking government officials, public sector employees, and police officers, who abuse their 
    power are sanctioned for misconduct.

2. Absence of Corruption:
Measures the control of corruption in various forms: bribery in administrative and political proceedings, graft, embezzlement, fraud, payroll fraud, asset misappropriation 
and skimming, nepotism, favoritism, patronage, illegal campaign financing, electoral fraud and vote buying.

In more detail, this pillar is related to the following topics:
    -	The prevalence of bribery in the delivery of public services and regulations, as well as in the political process. 
    -	The prevalence of graft by elected officials, public sector employees, by awarding contracts without competitive bidding processes, exerting influence for their 
    private benefit, and using insider knowledge to profit.
    -	The prevalence of embezzlement or misappropriation of public funds, payroll fraud, asset misappropriation and skimming, as well as of nepotism, favoritism, 
    and patronage.
    -	The prevalence of nepotism and favoritism by elected officials, public sector employees, judges, and prosecutors. It also measures the prevalence of patronage 
    by elected officials.
    -	The prevalence of corruption in elections, including illegal campaign financing, manipulation of elections, and vote buying.

3. Open Government
Measures the openness of government defined by the extent to which a government shares information, empowers people with tools to hold the government accountable, and 
fosters citizen participation in public policy deliberations while maintaining an open civic space. This factor measures whether basic laws and information on legal 
rights are publicized and evaluates the quality of information published by the government.

In more detail, this pillar is related to the following topics:
    -	Whether requests for information from the public to government agencies and institutions are granted, and if these requests are granted within a reasonable 
    time period, with complete and pertinent answers, at a reasonable cost and without having to pay a bribe for the information.
    -	Whether people are aware of their right to information, and whether relevant records are accessible to the public upon request.
    -	The effectiveness of non-governmental checks on the government’s power, as well as the openness of the civic space and the extent of citizen participation. 
    It includes the effective exercise of the freedoms of assembly, association, opinion, and expression, and the rights to petition and civic engagement.

4. Fundamental Rights
This pillar recognizes that a system of positive law that fails to respect core human rights established under international law is at best “rule by law,” and does not 
deserve to be called a rule of law system. This aspect focuses on rights that are firmly established under the Charter of Fundamental Rights of the European Union, but 
only includes a relatively small number of rights which are most closely related to rule of law concerns.

In more detail, this pillar is related to the following topics:
    -	The protection of rights related to dignity, including the prohibition of torture, slavery and forced labor.  
    -	The protection of rights related to freedom, including freedom of thought, conscience, religion, peaceful assembly, association, opinion, and expression, as well 
    as the right to property and to asylum.
    -	Whether civil society organizations are free to comment on government policies, without fear of retaliation. In addition, it also measures if quarantines and 
    lockdowns are applied proportionately, not discriminatory, with limited duration and in accordance with the law.
    -	Whether independent media, civil society organizations, members of the opposition, opposing factions of the governing party, activists, whistleblowers, and 
    individuals are free to report and comment on government policies, without fear of retaliation.
    -	The extent to which the media publish credible fact-checked information, and if its editorial content is independent from political influences, as well as if it 
    can conduct investigations and expose cases of corruption without facing surveillance, harassment, threats, legal actions, or administrative sanctions.
    -	The extent to which the government allows the formation or operation of CSOs, and the formation or operation of opposition parties.
    -	Whether the government respects the property rights of people and corporations, refrains from the illegal seizure of private property, and provides adequate reasons 
    and compensation when property is legally expropriated. It also measures if the process for transferring a property is simple and quick. In addition, it measures if 
    foreign investors receive fair and equitable treatment from the government, and the effective enforcement of anti-squatting laws and intellectual property rights.
    -	Whether the government respects the right to asylum and refugees can move freely within the host country, have access to legal work and state services such as education, 
    accommodation, meals, healthcare, and cash benefits, as well as legal documentation and economic opportunities.
    -	The protection of rights related to equality, including equality before the law, and absence of discrimination based on socio-economic status, gender, ethnicity, religion, 
    national origin, sexual orientation, or gender identity.
    -	The protection of rights related to solidarity, including labor rights such as workers’ right to information, consultation, collective bargaining, access to placement services, 
    and protection in the event of unjustified dismissal, as well as the prohibition of child labor.
    -	The protection of rights related to citizens’ rights, including political rights such as the right to vote and to stand as a candidate at European Parliament and municipal 
    elections, to information, petition, movement, and of residence.
    -	The protection of rights related to justice, including the right to effective remedy and to a fair trial, presumption of innocence, and to defense, as well as rights of the 
    accused including principles of legality and proportionality of criminal offenses and penalties, and the right not to be tried or punished twice.

5. Security
Measures the assurance of the security of persons and property. Security is one of the defining aspects of any rule of law society and is a fundamental function of the state. It is 
also a precondition for the realization of the rights and freedoms that the rule of law seeks to advance. We are excluding armed conflict from this pillar.

In more detail, this pillar is related to the following topics:
    -	The extent to which people feel safe and secure in their city, town, or village, and in their neighborhood.
    -	The extent to which the State is able to keep crime and violence to minimum levels.

6. Regulatory Enforcement and Enabling Business Environment
Measures the extent to which regulations are fairly and effectively implemented and enforced. Regulations, both legal and administrative, structure behaviors within and outside of the 
government. This factor does not assess which activities a government chooses to regulate, nor does it consider how much regulation of a particular activity is appropriate. Rather, 
it examines how regulations are implemented and enforced.
In more detail, this pillar is related to the following topics:
    -	The extent to which the legal framework for businesses is clear, accessible, and predictable, as well as the respect for property rights.
    -	Whether the government respects the property rights of people and corporations, refrains from the illegal seizure of private property, and 
    provides adequate reasons and compensation when property is legally expropriated. In addition, it measures if foreign investors receive fair and equitable treatment from the 
    government, and the effective enforcement of anti-squatting laws and intellectual property rights. 
    -	Whether the process for transferring a property is simple and quick. 
    -	The extent to which the government audits and inspections are conducted in accordance with the law and are free of corruption; if complaint mechanisms are simple, accessible, 
    and cost-effective, and if regulation authorities are impartial, and if they sanction violations.
    -	Whether environmental and labor regulation authorities sanction violations, such as occupational safety and business license or zoning violations.

7. Civil Justice
Measures whether ordinary people can resolve their grievances through formal institutions of justice in a peaceful and effective manner, as well as in accordance with generally 
accepted social norms rather than resorting to violence or self-help. Access to civil justice requires that the system be accessible, affordable, effective, impartial, and culturally 
competent.

In more detail, this pillar is related to the following topics:
    -	Whether justice institutions help prevent legal and justice needs. It also measures whether authorities tolerate illegal activities such as squatting, street vending 
    or informal labor arrangements.
    -	Whether people are aware of their rights, formal justice, and alternative justice mechanisms, and know where to get information and advice when facing a legal problem.
    -	Whether people can access and afford legal advice and representation; and can access the court system without incurring unreasonable fees.
    -	Whether the civil justice system is impartial and free of discrimination, corruption and undue influence from the government and private interests.
    -	The effectiveness and timeliness of the enforcement of civil justice decisions and judgments in practice.
    -	Whether alternative dispute resolution mechanisms (ADRs) are available, accessible, affordable, impartial, timely, effective, enforceable, and free of corruption and undue 
    influence from the government and private interests.

8. Criminal Justice
Evaluates a country’s criminal justice system. An effective criminal justice system is a key aspect of the rule of law, as it constitutes the conventional mechanism to redress 
grievances and bring action against individuals for offenses against society. An assessment of the delivery of criminal justice should take into consideration the entire system, 
including the police, lawyers, prosecutors, judges, and prison officers.

In more detail, this pillar is related to the following topics:
    -	Whether criminal investigations are effective, timely, impartial, and free of corruption and undue influence from criminal organizations and political and private 
    interests.
    -	The effectiveness of prosecutors in investigating crimes, and if pre-trial proceedings are timely, outcome-oriented, impartial, and free of corruption and undue influence 
    from political and private interests.
    -	Whether trials are timely, outcome-oriented, impartial, and free of corruption and undue influence from political and private interests.
    -	Whether alternative dispute resolution mechanisms (ADRs) are available, accessible, affordable, impartial, timely, effective, enforceable, and free of corruption and undue 
    influence from the government and private interests.
    -	Whether victim’s rights are effectively guaranteed, including treatment with respect, absence of discrimination, timely and sufficient information, and protection.
    -	Whether the basic rights of criminal suspects are respected, including the presumption of innocence and the freedom from arbitrary arrest and unreasonable pre-trial detention. 
    It also measures whether criminal suspects are able to access and challenge evidence used against them, whether they are subject to abusive treatment, and whether they are provided 
    with adequate legal assistance. In addition, it measures whether the basic rights of prisoners are respected once they have been convicted of a crime.
    -	Whether the prison system guarantees conditions of safety and order and respects the rights of people deprived of their liberty. It also measures the absence of corruption and 
    the effectiveness of the prison system in reducing recidivism.

# News article and instructions
Now, given the following news article:
News title: {headline}
News summary: {summary}
News body: {body}

Analyze the news article and its context, and answer the following question:
1.	How related is this news article to each one of the pillars of the Rule of Law? Produce a score for each pillar ranging from 0 to 10, where 10 reflects that the news article is 
HIGHLY related to a specific pillar, and 0 means that the news article is TOTALLY UNRELATED to a given pillar.

Use the following JSON format to structure your answer:
    {{
        "pillars_relation": answer to the question number 1. Answer using a list where each element of the list is a dictionary like this: 
        {{'[number of pillar]. name of pillar': relevance score as an integer number}}
    }}

IMPORTANT: As you saw, each pillar measures a wide range of elements, and they tend to overlap in various aspects. On the other hand, the news articles describe events that take 
place in a specific country, territory, or region. Therefore, a pillar being related to a news article means that the events narrated by the article can be measured by that pillar, 
or that the events happening in the news article would have an impact on that pillar, either positive, negative, or neutral.
Remember to ONLY answer following the JSON format explained above.

When answering, strictly adhere to the definitions of the Rule of Law, Justice, and Governance that I provided above and keep in mind the following points:
    1. We do not include the right to access to healthcare in the conceptual framework of the Rule of Law because we are not contacting public health experts in this project, so news 
    articles whose main topic is the right to healthcare should not be related to the Rule of Law.
    2. It is very important to distinguish between different types of news: description of events, interviews, op-eds, public statements, to clarify whether the news article has any 
    actual impact on the Rule of Law, Justice, or Governance.
    3. Opinion columns and opinion articles in general should not be related to the Rule of Law, Justice, and Governance.
    4. We are only interested in current or recent events related to the Rule of Law. Therefore, please exclude historical events in the far past.
    5. Any news article referring to foreign policy or international diplomacy should be labeled as unrelated to the Rule of Law.
    There is a very important key point to keep in mind:
    Our conceptual framework does NOT include war, armed conflict, or peace agreements. Therefore, news articles narrating events that derive from war, armed conflict, or peace 
    agreements, or events that are a direct consequence from the previous situations should NOT be related to the Rule of Law.
    Remember to ONLY answer following the JSON format explained above.

Begin!
"""

## Accessing LLMs through their API endpoints

Now that we have loaded the data and written (and tested) our prompt, we can use an LLM to perform the task for us. To send requests to an API, we need an API key. In this guide, we will be using the [GPT model from OpenAI](https://openai.com/gpt-4) and the [Gemini model from GoogleAI](https://deepmind.google/technologies/gemini/#introduction). Therefore, to follow this guide, you will have to create an account and an API key with each of these two developers. As of January 2024, accessing the Gemini Pro model through its official API is free and open to the public, given that they are introducing the product to the market.

[In a previous post](https://www.carlos-toruno.com/blog/classification-system/01-gathering-data/#managing-you-api-key), I talked about how to manage your API keys through environment variables. Just in case, here is the video explaining how to do this using the [dotenv Python library](https://github.com/theskumar/python-dotenv).

<iframe width="100%" height="325" src="https://www.youtube.com/embed/CJjSOzb0IYs?si=Klk-0E98DidAOf1_" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>

We start by loading our API keys as follows:

In [24]:
import os
from dotenv import load_dotenv

# Loading API KEY from environment
load_dotenv()
OpenAI_key   = os.getenv("openai_key")
GoogleAI_key = os.getenv("GOOGLE_API_KEY") 
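
Note that `os.getenv` silently returns `None` when a variable is missing, so a typo in the variable name only shows up later as an authentication error. A minimal sketch of a fail-early check follows; the helper name `require_env` and the variable `EXAMPLE_API_KEY` are my own inventions for illustration, not part of dotenv:

```python
import os

def require_env(name):
    """Return the value of an environment variable or raise a clear error."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable '{name}' is missing or empty")
    return value

# Hypothetical variable, set here only so the example runs on its own
os.environ["EXAMPLE_API_KEY"] = "abc123"
example_key = require_env("EXAMPLE_API_KEY")
```

In this post you could call it as `require_env("openai_key")` right after `load_dotenv()`.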

Once we have our API keys loaded from the environment, we can use the [OpenAI official Python library](https://github.com/openai/openai-python) to send calls and access their models. OpenAI has different model families depending on the capability you are interested in. [DALL·E](https://openai.com/dall-e-3) is a family of models focused on image generation, [Whisper](https://openai.com/research/whisper) is a family of models focused on speech recognition, and [GPT](https://openai.com/gpt-4) is their signature model focused on text generation. For a task such as text classification, what we need is a text model. Therefore, we will be using the `GPT-4-Turbo` model to classify our news articles.

First, let's focus only on the news article about the elections in Bavaria "_A Test for Markus Söder_" with row index `2307` in our `extract` data frame:

In [18]:
headline = extract.loc[2307, "title_trans"]
summary  = extract.loc[2307, "description_trans"]
body     = extract.loc[2307, "content_trans"]

We pass the headline, summary, and content of the news article to the `instructions_stage_1` prompt using the `format()` method:

In [19]:
instructions_stage_1_2307 = instructions_stage_1.format(headline = headline, 
                                                        summary  = summary, 
                                                        body     = body)
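
If you are wondering why the JSON template inside the prompt is written with double braces (`{{` and `}}`), it is because `format()` treats single braces as placeholders. Doubled braces are emitted as literal braces, while `{headline}` and friends get replaced. A small illustration with a made-up miniature template (not the actual prompt used in this post):

```python
# Hypothetical miniature template, only to illustrate brace escaping
template = """News title: {headline}
Use the following JSON format:
    {{
        "rule_of_law_related": your answer
    }}
"""

# {headline} is substituted, {{ and }} become single literal braces
filled = template.format(headline="Test for Markus Söder")
print(filled)
```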

We will use the Chat Completion endpoint to pass the information and classify the article at hand as RELATED or UNRELATED to the definition of Rule of Law that we have. The Chat Completion API takes a list of messages as input and generates an output. However, each message needs to be assigned to a specific role. According to the [official documentation](https://platform.openai.com/docs/guides/text-generation/chat-completions-api), there are three roles available: "_system_", "_user_", and "_assistant_". The system message sets the general behavior of the model across the conversation. The user messages provide requests or comments for the assistant to respond to. Assistant messages store previous assistant responses, but you can also pass example responses to signal a desired behavior for the model.
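
To make the roles concrete, here is a rough sketch of how a message list with one example exchange could be assembled. The helper `build_messages` and the short prompts are hypothetical and only for illustration; in this post we will only use one system message and one user message.

```python
def build_messages(system_prompt, user_prompt, examples=None):
    """Assemble a message list; `examples` holds (user, assistant) pairs."""
    messages = [{"role": "system", "content": system_prompt}]
    # Example exchanges go between the system message and the real request
    for example_user, example_assistant in (examples or []):
        messages.append({"role": "user", "content": example_user})
        messages.append({"role": "assistant", "content": example_assistant})
    messages.append({"role": "user", "content": user_prompt})
    return messages

messages = build_messages(
    system_prompt="You classify news articles.",
    user_prompt="News title: Test for Markus Söder",
    examples=[("News title: Failed qualifying dress rehearsal for Austria",
               '{"rule_of_law_related": "No"}')],
)
```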

Having this in mind, we will pass our `context_stage_1` prompt as the **_system message_**, and our `instructions_stage_1` prompt as a **_user_** message. Additionally, we will require the response provided by the model (or assistant) to be in a JSON format. We use the API endpoint as follows:

In [33]:
from openai import OpenAI

client = OpenAI(api_key = os.getenv("openai_key"))

completion = client.chat.completions.create(
    model = "gpt-4-0125-preview",
    messages = [
        {"role": "system", "content": context_stage_1},
        {"role": "user",   "content": instructions_stage_1_2307}
    ],
    response_format = {"type": "json_object"},
    temperature = 0.2
)

We inspect the answer by looking into the choices returned by the model. As we can observe, the model has classified the news article as RELATED to the Rule of Law, which, to be honest, makes sense because the article is about elections in Bavaria, Germany.

If you are wondering about the `temperature` parameter, it is a numeric value (between 0 and 2 in the OpenAI API) that signals how deterministic or random the assistant should be when constructing its answer. Lower values make the answer more deterministic and focused, while higher values increase the randomness of the output.
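
As an intuition for what temperature does, the textbook idea is that the model's raw scores for each candidate token are divided by the temperature before being turned into probabilities. The sketch below uses made-up scores for three candidate tokens; it illustrates the general technique and is not the providers' actual implementation (which, for instance, handles a temperature of 0 as a special case).

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; temperature must be > 0 here."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
low  = softmax_with_temperature(logits, 0.2)
high = softmax_with_temperature(logits, 1.0)
print([round(p, 3) for p in low])
print([round(p, 3) for p in high])
```

With the lower temperature, almost all the probability mass concentrates on the top-scoring token, which is why the answers become more repeatable.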

In [36]:
print(completion.choices[0].message.content)

{
    "rule_of_law_related": "Yes"
}


We have successfully completed the first stage. Now that we know that the article is related to the Rule of Law, we can proceed to the second stage of our task: rating from zero to ten how related this article is to each one of the pillars of the Rule of Law. For this, we will be making some adjustments to our prompts. For our _context prompt_, we will be using the following text:

>_You are an assistant with knowledge and expertise in global politics, social sciences, rule of law, and related fields. Your task is to assist 
>me in classifying news articles according to which pillar of the Rule of Law do they belong to. To successfully accomplish this task, you will 
>have to carefully read a news article and the definitions of each pillar that I will give you, as well as use the knowledge of global politics, 
>social sciences, and law that you have. Once you have read the news article, you will proceed to determine the extent to which the events 
>described in the news article are related to each pillar._

For our _instructions prompt_ we will be using a rather large text that you can check [here](https://www.carlos-toruno.com/blog/classification-system/04-Langchain-GeminiPro/instructions_stage_2.txt). Having these new inputs, we can process our request again:

In [37]:
# Introducing the news article into the instruction prompt
instructions_stage_2_2307 = instructions_stage_2.format(headline = headline, 
                                                        summary  = summary, 
                                                        body     = body)

# Making a request to the GPT Chat Completions API
completion = client.chat.completions.create(
    model = "gpt-4-0125-preview",
    messages = [
        {"role": "system", "content": context_stage_2},
        {"role": "user",   "content": instructions_stage_2_2307}
    ],
    response_format = {"type": "json_object"},
    temperature = 0.2
)

# Printing the output
print(completion.choices[0].message.content)

{
    "pillars_relation": [
        {"1. Constraints on Government Powers": 8},
        {"2. Absence of Corruption": 5},
        {"3. Open Government": 7},
        {"4. Fundamental Rights": 2},
        {"5. Order and Security": 1},
        {"6. Regulatory Enforcement and Enabling Business Environment": 1},
        {"7. Civil Justice": 1},
        {"8. Criminal Justice": 1}
    ]
}


**Beautiful!** **Subarashī!**

We can see that the model thinks that the news article is highly related to pillars 1 "_Constraints on Government Powers_" and 3 "_Open Government_", somewhat related to pillar 2 "_Absence of Corruption_", and barely related to all the other pillars. Honestly... that was amazing, because those are quite similar to the ratings that I (an alleged expert in the topic) would give to the article. The model only needed some context and instructions to start generating text. Behind the curtains, the model is just constructing sentences by predicting the most probable next word given the inputs that you just passed. Again, isn't that amazing?

Let's continue with our journey. We can do the exact same task using another large language model that Google released in December 2023: **Gemini Pro**. Unlike GPT-4-Turbo, Google has granted free access to the Gemini Pro capabilities through their official API. In order to use this model, we will need to adjust some things:

- First, the Google API only accepts two roles in the list of messages: "_user_" and "_model_", so we will have to turn the context prompt into a user message.
- Second, the API only accepts multi-turn conversations. That means that, for our specific case in which we pass two user messages, we also need to provide a model answer to the first message. A short message like: "_Sure, I can assist you in classifying news articles according to the pillars of the Rule of Law._" will be enough.
- Third, we need to set up some safety settings to avoid getting rejections in our calls. Please check the [official Python documentation](https://ai.google.dev/tutorials/python_quickstart) for more information.
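
The first two adjustments amount to a simple translation between message formats. A rough sketch of that translation follows, assuming an OpenAI-style list of messages as input; the helper `openai_to_gemini` is a hypothetical name of my own and not part of either library:

```python
def openai_to_gemini(messages, model_ack):
    """Map system/user/assistant messages to user/model turns.

    Returns the chat history plus the final user prompt, which is the
    message to send once the chat has been started.
    """
    *earlier, last = messages
    history = []
    for message in earlier:
        if message["role"] == "system":
            # The context becomes a user turn, answered by a canned model reply
            history.append({"role": "user",  "parts": [message["content"]]})
            history.append({"role": "model", "parts": [model_ack]})
        elif message["role"] == "assistant":
            history.append({"role": "model", "parts": [message["content"]]})
        else:
            history.append({"role": "user",  "parts": [message["content"]]})
    return history, last["content"]

history, prompt = openai_to_gemini(
    [{"role": "system", "content": "You classify news articles."},
     {"role": "user",   "content": "News title: Test for Markus Söder"}],
    model_ack="Sure, I can assist you in classifying news articles.",
)
```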

Let's begin by authenticating and setting up a channel through the Generative Model endpoint as follows:

In [38]:
model_answer_stage_1 = """
"Sure, I can help you with that. I have knowledge and subject-matter expertise on Rule of Law, justice, governance, global politics, social sciences, 
and related fields in the European Union. I can carefully read a news article and determine whether it is related to the definitions of Rule of Law, 
Justice, and Governance that you will provide."
"""

model_answer_stage_2 = """
Sure, I can assist you in classifying news articles according to the pillars of the Rule of Law. 
"""

In [60]:
import google.generativeai as genai

# Authenticating our API key
genai.configure(api_key = GoogleAI_key)

# Set up the model config
generation_config = {
  "temperature": 0.2,
  "top_p": 1,
  "top_k": 1,
  "max_output_tokens": 1000,
}

# Safety presettings
safety_settings = [
  {
    "category": "HARM_CATEGORY_HARASSMENT",
    "threshold": "BLOCK_ONLY_HIGH"
  },
  {
    "category": "HARM_CATEGORY_HATE_SPEECH",
    "threshold": "BLOCK_ONLY_HIGH"
  },
  {
    "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
    "threshold": "BLOCK_ONLY_HIGH"
  },
  {
    "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
    "threshold": "BLOCK_ONLY_HIGH"
  },
]

# Set-up a model
model = genai.GenerativeModel(model_name        = "gemini-pro",
                              generation_config = generation_config,
                              safety_settings   = safety_settings)

Once we have set up a model, we can start a multi-turn instance with it to provide the information contained in `context_stage_1` and `instructions_stage_1`. As we can observe, Gemini Pro also thinks that this news article is related to the Rule of Law:

In [55]:
# Start an instance
instance = model.start_chat(history = [
  {
    "role": "user",
    "parts": [context_stage_1]
  },
  {
    "role": "model",
    "parts": [model_answer_stage_1]
  }
])

# Sending instructions
instance.send_message(instructions_stage_1_2307)

# Previewing answer
print(instance.last.text)

```
{
    "rule_of_law_related": "Yes"
}
```


Let's try it again, this time with the second stage, and compare the ratings assigned by GPT and Gemini to the same news article:

In [61]:
# Start an instance
instance = model.start_chat(history = [
  {
    "role": "user",
    "parts": [context_stage_2]
  },
  {
    "role": "model",
    "parts": [model_answer_stage_2]
  }
])

# Sending instructions
instance.send_message(instructions_stage_2_2307)

# Previewing answer
print(instance.last.text)

```
{
    "pillars_relation": [
        {
            "1. Constraints on Government Powers": 7
        },
        {
            "2. Absence of Corruption": 5
        },
        {
            "3. Open Government": 4
        },
        {
            "4. Fundamental Rights": 6
        },
        {
            "5. Security": 3
        },
        {
            "6. Regulatory Enforcement and Enabling Business Environment": 2
        },
        {
            "7. Civil Justice": 1
        },
        {
            "8. Criminal Justice": 1
        }
    ]
}
```


As we can observe, the ratings are somewhat similar between GPT and Gemini. However, Gemini scored Pillar 4 "_Fundamental Rights_" higher than GPT, while also giving a lower score to Pillar 3 "_Open Government_". Both answers are acceptable, and I can also see the potential reasons behind the differences in the scores.
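
For a quick side-by-side, the two sets of scores shown above can be compared pillar by pillar. This is just a small sketch with the scores typed in by hand from the two outputs:

```python
pillars = [
    "1. Constraints on Government Powers",
    "2. Absence of Corruption",
    "3. Open Government",
    "4. Fundamental Rights",
    "5. Order and Security",
    "6. Regulatory Enforcement and Enabling Business Environment",
    "7. Civil Justice",
    "8. Criminal Justice",
]
gpt_scores    = [8, 5, 7, 2, 1, 1, 1, 1]  # copied from the GPT output above
gemini_scores = [7, 5, 4, 6, 3, 2, 1, 1]  # copied from the Gemini output above

# Positive values mean Gemini scored the pillar higher than GPT
differences = {name: gem - gpt
               for name, gpt, gem in zip(pillars, gpt_scores, gemini_scores)}

# The two pillars where the models disagree the most
largest = sorted(differences, key=lambda name: abs(differences[name]), reverse=True)[:2]
for name in largest:
    print(f"{name}: Gemini minus GPT = {differences[name]:+d}")
```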

We can extract the individual scores for each pillar by parsing the string content into a Python dictionary and extracting the dictionary values using a list comprehension:

In [69]:
import json

json_content  = json.loads(instance.last.text[3:-3])
pillar_scores = [list(x.values())[0] for x in json_content["pillars_relation"]]
pillar_scores

[7, 5, 4, 6, 3, 2, 1, 1]
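
The `[3:-3]` slice works here because Gemini wrapped its answer in exactly three backticks on each side. If the model ever adds a language tag after the opening backticks or a trailing newline, the slice would break. A slightly more forgiving sketch, assuming the answer contains a single JSON object (the helper name `extract_json` is my own):

```python
import json
import re

def extract_json(text):
    """Parse the JSON object spanning the first '{' to the last '}' in `text`."""
    match = re.search(r"\{.*\}", text, flags=re.DOTALL)
    if match is None:
        raise ValueError("No JSON object found in the model output")
    return json.loads(match.group(0))

# Made-up raw answer with a fence and a language tag around the JSON
raw = '```json\n{"pillars_relation": [{"1. Constraints on Government Powers": 7}]}\n```'
parsed = extract_json(raw)
```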

## Setting up a data workflow

By now, you are able to use the OpenAI and GoogleAI APIs to access their models in order to generate text outputs. However, we have been targeting a single news article. This gives us no advantage over using the official apps such as ChatGPT or Bard. The main purpose of accessing the models through their API is to process large amounts of information without depending on a user interface, that is, to access the model programmatically. Instead of running the code individually for each article, having access to the API allows us to set up a workflow to process a whole data file.

For example, in our case, we will define a single function that will perform the task automatically for us. The function will work as follows:
- First, it will extract the relevant information for each news article (headline, summary, and content) and format an instruction prompt for that specific article.
- Then, it will send the instructions to the model through their respective API.
- Finally, it will parse and process the string output sent by the model and store it as new variables in our data frame.

We will use the GeminiPro API for this example as follows:

In [82]:
def classify_article(row, stage):
    """
    A function that takes a row as an input, formats the prompt for the given stage (1 or 2), sends a
    conversation request to the GeminiPro API and returns the parsed answer from the model.
    """
    if stage == 1:
        instprompt = instructions_stage_1
        conprompt  = context_stage_1
        ansprompt  = model_answer_stage_1
    elif stage == 2:
        # Articles unrelated to the Rule of Law get zeros without spending an API call
        if row["stage_1"] != "Yes":
            return [0, 0, 0, 0, 0, 0, 0, 0]
        instprompt = instructions_stage_2
        conprompt  = context_stage_2
        ansprompt  = model_answer_stage_2
    else:
        raise ValueError("stage must be 1 or 2")

    # Formatting prompt
    prompt = instprompt.format(headline = row["title_trans"], 
                               summary  = row["description_trans"], 
                               body     = row["content_trans"])
    
    # Start an instance
    instance = model.start_chat(history = [
    {
        "role": "user",
        "parts": [conprompt]
    },
    {
        "role": "model",
        "parts": [ansprompt]
    }
    ])
    
    # Sending instructions
    instance.send_message(prompt)

    # Parsing results (the [3:-3] slice drops the backticks that wrap the answer)
    out = json.loads(instance.last.text[3:-3])
    if stage == 1:
        # Stage 1: return the single value of the JSON answer
        return list(out.values())[0]

    # Stage 2: return the list with the eight pillar scores
    return [list(x.values())[0] for x in out["pillars_relation"]]
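
One thing to keep in mind is that the function trusts the model to return the eight pillars in order and with scores in range. As we saw earlier, the model may also shorten a label (Gemini returned "_5. Security_" instead of "_5. Order and Security_"). A defensive sketch that keys the scores by the leading pillar number instead of the full label; the helper `scores_by_pillar_number` is a hypothetical addition, not something used in the rest of this post:

```python
def scores_by_pillar_number(payload, n_pillars=8):
    """Return a list of n_pillars integer scores ordered by pillar number."""
    scores = {}
    for item in payload["pillars_relation"]:
        for label, score in item.items():
            number = int(label.split(".")[0])  # "5. Security" -> 5
            if not 0 <= int(score) <= 10:
                raise ValueError(f"Score out of range for pillar {number}: {score}")
            scores[number] = int(score)
    missing = [n for n in range(1, n_pillars + 1) if n not in scores]
    if missing:
        raise ValueError(f"Missing pillars: {missing}")
    return [scores[n] for n in range(1, n_pillars + 1)]

# The Gemini answer shown earlier in this post, typed in by hand
payload = {"pillars_relation": [
    {"1. Constraints on Government Powers": 7}, {"2. Absence of Corruption": 5},
    {"3. Open Government": 4}, {"4. Fundamental Rights": 6}, {"5. Security": 3},
    {"6. Regulatory Enforcement and Enabling Business Environment": 2},
    {"7. Civil Justice": 1}, {"8. Criminal Justice": 1},
]}
checked_scores = scores_by_pillar_number(payload)
```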

Once we have the function defined, we can apply it row by row to a whole data frame using the `apply()` method. We will test it using the `extract` data file with our 5 news articles.

In [85]:
extract["stage_1"] = extract.apply(lambda row: classify_article(row, stage=1), axis = 1)
extract["stage_2"] = extract.apply(lambda row: classify_article(row, stage=2), axis = 1)
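
When you move from 5 articles to thousands, some calls will likely fail because of rate limits or network hiccups, and a single exception would abort the whole `apply()`. A generic retry wrapper with exponential backoff is one way to handle this. The sketch below is my own assumption of how it could look (the name `call_with_retries` and its defaults are hypothetical, and it retries on any exception for simplicity):

```python
import time

def call_with_retries(func, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func(); on an exception wait base_delay * 2**attempt and retry."""
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # give up after the last attempt
            sleep(base_delay * 2 ** attempt)

# Demonstration with a fake function that fails twice before succeeding
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("temporary failure")
    return "ok"

delays = []  # record the waits instead of actually sleeping
result = call_with_retries(flaky, sleep=delays.append)
```

In our workflow, the call inside the lambda would become `call_with_retries(lambda: classify_article(row, stage=1))`.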

In [86]:
extract.loc[:,["title_trans", "description_trans", "content_trans", "stage_1", "stage_2"]]

Unnamed: 0,title_trans,description_trans,content_trans,stage_1,stage_2
2307,Test for Markus Söder,Election in Bavaria on Sunday with some open q...,The old Prime Minister will also be the new on...,Yes,"[7, 5, 4, 6, 2, 3, 2, 2]"
2357,Three dead in fire in LK Mödling: investigatio...,After three patients died in a fire at the Möd...,0 Published: 18. October 2023 3:33 p.m. After ...,Yes,"[3, 2, 1, 5, 7, 1, 2, 8]"
2383,"Climate protest in The Hague: 2,400 arrests","In the Dutch city of The Hague, police broke u...",0 Wij verafschuwen het geweld the word was use...,Yes,"[7, 0, 0, 8, 7, 0, 0, 9]"
2439,Failed qualifying dress rehearsal for Austria,Austria's national soccer team failed in the d...,0 Published: 07. September 2023 10:38 p.m. Aus...,Not related to Rule of Law,"[0, 0, 0, 0, 0, 0, 0, 0]"
2951,"Tax reform in Belgium, an emergency for sixty ...",The federal government is trying to reform tax...,"The big tax reform, everyone wants it, but no ...",Not related to Rule of Law,"[0, 0, 0, 0, 0, 0, 0, 0]"


**Sehr schön!**

You can now try processing and generating data with Large Language Models by extending these examples to your own data project. As I mentioned in my previous post, there are hundreds of LLMs out there for you to play with, and some of them have their own Python library. In this situation, setting up workflows with different providers can easily become chaotic. The [Langchain framework](https://www.langchain.com/) comes in handy for this. Langchain is a framework for developing applications powered by language models. It provides tools for connecting your application with different models, managing prompt templates, and parsing outputs, among many other features. In a future blog post, I will elaborate a bit more on the basic use of this framework to facilitate many of the steps we implemented in this example. Until then, farewell my dear three readers.